Graph and minimum spanning tree in Python

Question

Here is my Graph class that implements a graph and has nice a method to generate its spanning tree using Kruskal's algorithm.

I want to:

Make it pythonic
Improve readability
Improve the abstraction (but not changing the use of outer and inner dicts to represent the graph)

Performance is not a concern.

Code:

class Graph(object):
    def __init__(self):
        self.g = {}

    def add(self, vertex1, vertex2, weight):
        if vertex1 not in self.g:
            self.g[vertex1] = {}

        if vertex2 not in self.g:
            self.g[vertex2] = {}

        self.g[vertex1][vertex2] = weight
        self.g[vertex2][vertex1] = weight

    def has_link(self, v1, v2):
        return v2 in self[v1] or v1 in self[v2]

    def edges(self):
        data = []

        for from_vertex, destinations in self.g.items():
            for to_vertex, weight in destinations.items():
                if (to_vertex, from_vertex, weight) not in data:
                    data.append((from_vertex, to_vertex, weight))

        return data

    def sorted_by_weight(self, desc=False):
        return sorted(self.edges(), key=lambda x: x[2], reverse=desc)

    def spanning_tree(self, minimum=True):
        mst = Graph()
        parent = {}
        rank = {}

        def find_parent(vertex):
            while parent[vertex] != vertex:
                vertex = parent[vertex]

            return vertex

        def union(root1, root2):
            if rank[root1] > rank[root2]:
                parent[root2] = root1
            else:
                parent[root2] = root1

                if rank[root2] == rank[root1]:
                    rank[root2] += 1

        for vertex in self.g:
            parent[vertex] = vertex
            rank[vertex] = 0

        for v1, v2, weight in self.sorted_by_weight(not minimum):
            parent1 = find_parent(v1)
            parent2 = find_parent(v2)

            if parent1 != parent2:
                mst.add(v1, v2, weight)
                union(parent1, parent2)

            if len(self) == len(mst):
                break

        return mst

    def __len__(self):
        return len(self.g.keys())

    def __getitem__(self, node):
        return self.g[node]

    def __iter__(self):
        for edge in self.edges():
            yield edge

    def __str__(self):
        return "\n".join('from %s to %s: %d' % edge for edge in self.edges())

graph = Graph()

graph.add('a', 'b', 2)
graph.add('a', 'd', 3)
graph.add('a', 'c', 3)

graph.add('b', 'a', 2)
graph.add('b', 'c', 4)
graph.add('b', 'e', 3)

graph.add('c', 'a', 3)
graph.add('c', 'b', 4)
graph.add('c', 'd', 5)
graph.add('c', 'e', 1)

graph.add('d', 'a', 3)
graph.add('d', 'c', 5)
graph.add('d', 'f', 7)

graph.add('f', 'd', 7)
graph.add('f', 'e', 8)
graph.add('f', 'g', 9)

print(graph.spanning_tree())
print()
print(graph.spanning_tree(False))

Gareth Rees · Accepted Answer · 2016-05-18 11:41:14Z

There are no docstrings. How do I use this class? What arguments should I pass to the methods and what do they return?
There's no way to add a vertex with no edges to a graph.
The test code shows that it is quite laborious to initialize a graph. It would make sense for the constructor to take an iterable of edges.
The attribute g is not intended for use outside the class (callers should use the public methods). It is conventional to name internal attributes with names starting with one underscore.

The code in add could be simplified by making use of collections.defaultdict:

def __init__(self, edges=()):
    """Construct a graph containing the edges from the given iterable.
    An edge between vertices v1 and v2 with weight w should be
    specified as a tuple (v1, v2, w).

    """
    self._g = defaultdict(dict)
    for edge in edges:
        self.add(*edge)

def add(self, v1, v2, w):
    """Add an edge between vertices v1 and v2 with weight w.
    If an edge already exists between these vertices, set its
    weight to w.

    """
    self._g[v1][v2] = self._g[v2][v1] = w

The has_link method uses "link", but the rest of the code uses "edge". It would be better to be consistent.
The __getitem__ method uses "node", but the rest of the code uses "vertex". It would be better to be consistent.

Since add ensures that edges are stored in both directed, it's only necessary to test one direction:

def has_edge(self, v1, v2):
    """Return True if the graph has an edge between vertices v1 and v2."""
    return v2 in self._g[v1]

The edges method accumulates a list of edges. To avoid an edge appearing twice, the code checks each edge to see if it is already in the list. But lists do not have an efficient membership test, so the runtime is quadratic in the number of edges. It would be more efficient to make a set of edges:

def edges(self):
    """Return the edges in the graph as a set of tuples (v1, v2, w)."""
    edges = set()
    for v1, destinations in self._g.items():
        for v2, w in destinations.items():
            if (v2, v1, w) not in edges:
                edges.add((v1, v2, w))
    return edges

In the sorted_by_weight method, it would be better to name the keyword argument reverse, for consistency with Python's built-in sorted.
In spanning_tree, the parent and rank dictionaries can be initialized directly:
```
parent = {v: v for v in self._g}
rank = {v: 0 for v in self._g}
```
The parent data structure is known as a disjoint-set forest. An important optimization on this data structure is path compression: that is, when you search the parent data structure for the root of the tree containing a vertex, you update the parent data structure so that future searches run quickly:
```
def find_root(v):
    if parent[v] != v:
        parent[v] = find_root(parent[v])
    return parent[v]
```
In the union function, you always attach the tree for root2 to the tree for root1:
```
if rank[root1] > rank[root2]:
    parent[root2] = root1
else:
    parent[root2] = root1
```
It's more efficient to always attach the smaller tree to the bigger tree, so that the length of the searches in find_root doesn't increase:
```
if rank[root1] > rank[root2]:
    parent[root2] = root1
else:
    parent[root1] = root2
```
The __len__ and __getitem__ methods treat the graph as a collection of vertices, but __iter__ treats it as a collection of edges. This seems likely to lead to confusion. It would be better to be consistent and have all special methods treat the graph as a collection in the same way.

asked	5 months ago
viewed	298 times
active	5 months ago

current community

your communities

more stack exchange communities

Graph and minimum spanning tree in Python

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged python algorithm graph or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Graph and minimum spanning tree in Python

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python algorithm graph or ask your own question.

Related

Hot Network Questions