Code Review Stack Exchange is a question and answer site for peer programmer code reviews. Join them; it only takes a minute:

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

Here is my Graph class that implements a graph and has nice a method to generate its spanning tree using Kruskal's algorithm.

I want to:

  • Make it pythonic
  • Improve readability
  • Improve the abstraction (but not changing the use of outer and inner dicts to represent the graph)

Performance is not a concern.

Code:

class Graph(object):
    def __init__(self):
        self.g = {}

    def add(self, vertex1, vertex2, weight):
        if vertex1 not in self.g:
            self.g[vertex1] = {}

        if vertex2 not in self.g:
            self.g[vertex2] = {}

        self.g[vertex1][vertex2] = weight
        self.g[vertex2][vertex1] = weight

    def has_link(self, v1, v2):
        return v2 in self[v1] or v1 in self[v2]

    def edges(self):
        data = []

        for from_vertex, destinations in self.g.items():
            for to_vertex, weight in destinations.items():
                if (to_vertex, from_vertex, weight) not in data:
                    data.append((from_vertex, to_vertex, weight))

        return data

    def sorted_by_weight(self, desc=False):
        return sorted(self.edges(), key=lambda x: x[2], reverse=desc)

    def spanning_tree(self, minimum=True):
        mst = Graph()
        parent = {}
        rank = {}

        def find_parent(vertex):
            while parent[vertex] != vertex:
                vertex = parent[vertex]

            return vertex

        def union(root1, root2):
            if rank[root1] > rank[root2]:
                parent[root2] = root1
            else:
                parent[root2] = root1

                if rank[root2] == rank[root1]:
                    rank[root2] += 1

        for vertex in self.g:
            parent[vertex] = vertex
            rank[vertex] = 0

        for v1, v2, weight in self.sorted_by_weight(not minimum):
            parent1 = find_parent(v1)
            parent2 = find_parent(v2)

            if parent1 != parent2:
                mst.add(v1, v2, weight)
                union(parent1, parent2)

            if len(self) == len(mst):
                break

        return mst

    def __len__(self):
        return len(self.g.keys())

    def __getitem__(self, node):
        return self.g[node]

    def __iter__(self):
        for edge in self.edges():
            yield edge

    def __str__(self):
        return "\n".join('from %s to %s: %d' % edge for edge in self.edges())

graph = Graph()

graph.add('a', 'b', 2)
graph.add('a', 'd', 3)
graph.add('a', 'c', 3)

graph.add('b', 'a', 2)
graph.add('b', 'c', 4)
graph.add('b', 'e', 3)

graph.add('c', 'a', 3)
graph.add('c', 'b', 4)
graph.add('c', 'd', 5)
graph.add('c', 'e', 1)

graph.add('d', 'a', 3)
graph.add('d', 'c', 5)
graph.add('d', 'f', 7)

graph.add('f', 'd', 7)
graph.add('f', 'e', 8)
graph.add('f', 'g', 9)

print(graph.spanning_tree())
print()
print(graph.spanning_tree(False))
share|improve this question
up vote 2 down vote accepted
  1. There are no docstrings. How do I use this class? What arguments should I pass to the methods and what do they return?

  2. There's no way to add a vertex with no edges to a graph.

  3. The test code shows that it is quite laborious to initialize a graph. It would make sense for the constructor to take an iterable of edges.

  4. The attribute g is not intended for use outside the class (callers should use the public methods). It is conventional to name internal attributes with names starting with one underscore.

  5. The code in add could be simplified by making use of collections.defaultdict:

    def __init__(self, edges=()):
        """Construct a graph containing the edges from the given iterable.
        An edge between vertices v1 and v2 with weight w should be
        specified as a tuple (v1, v2, w).
    
        """
        self._g = defaultdict(dict)
        for edge in edges:
            self.add(*edge)
    
    def add(self, v1, v2, w):
        """Add an edge between vertices v1 and v2 with weight w.
        If an edge already exists between these vertices, set its
        weight to w.
    
        """
        self._g[v1][v2] = self._g[v2][v1] = w
    
  6. The has_link method uses "link", but the rest of the code uses "edge". It would be better to be consistent.

  7. The __getitem__ method uses "node", but the rest of the code uses "vertex". It would be better to be consistent.

  8. Since add ensures that edges are stored in both directed, it's only necessary to test one direction:

    def has_edge(self, v1, v2):
        """Return True if the graph has an edge between vertices v1 and v2."""
        return v2 in self._g[v1]
    
  9. The edges method accumulates a list of edges. To avoid an edge appearing twice, the code checks each edge to see if it is already in the list. But lists do not have an efficient membership test, so the runtime is quadratic in the number of edges. It would be more efficient to make a set of edges:

    def edges(self):
        """Return the edges in the graph as a set of tuples (v1, v2, w)."""
        edges = set()
        for v1, destinations in self._g.items():
            for v2, w in destinations.items():
                if (v2, v1, w) not in edges:
                    edges.add((v1, v2, w))
        return edges
    
  10. In the sorted_by_weight method, it would be better to name the keyword argument reverse, for consistency with Python's built-in sorted.

  11. In spanning_tree, the parent and rank dictionaries can be initialized directly:

    parent = {v: v for v in self._g}
    rank = {v: 0 for v in self._g}
    
  12. The parent data structure is known as a disjoint-set forest. An important optimization on this data structure is path compression: that is, when you search the parent data structure for the root of the tree containing a vertex, you update the parent data structure so that future searches run quickly:

    def find_root(v):
        if parent[v] != v:
            parent[v] = find_root(parent[v])
        return parent[v]
    
  13. In the union function, you always attach the tree for root2 to the tree for root1:

    if rank[root1] > rank[root2]:
        parent[root2] = root1
    else:
        parent[root2] = root1
    

    It's more efficient to always attach the smaller tree to the bigger tree, so that the length of the searches in find_root doesn't increase:

    if rank[root1] > rank[root2]:
        parent[root2] = root1
    else:
        parent[root1] = root2
    
  14. The __len__ and __getitem__ methods treat the graph as a collection of vertices, but __iter__ treats it as a collection of edges. This seems likely to lead to confusion. It would be better to be consistent and have all special methods treat the graph as a collection in the same way.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.