Can someone help me optimize my DFS implementation in Python?

Question

def dfs(graph, node):
"""Run DFS through the graph from the starting 
node and return the nodes in order of finishing time.
"""
stack = [[node, True]]
while True in [x[1] for x in stack]:
    i = 0
    for x in xrange(len(stack)):
        if stack[x][1] == True:
            i = x
            break
    stack[i][1] = False
    stack = [[x, True] for x in graph[stack[i][0]] \
                 if (([x, True] not in stack) and ([x, False] not in stack))] + stack
return [x[0] for x in stack]

Input: Graph as adjacency list Eg: {1:[2, 3], 2:[1], 3:[]}
Node is an integer value from where depth first search begins Eg: 1
Output: The list of nodes in order of their finishing times.
Eg: [3, 2, 1]

For large graphs, the above code is taking too long to execute. Is there a better way to do this?
Edit: I could do better time by implementing the following code. Is it possible to do even better?

def dfs(graph, node):
"""Run DFS through the graph from the starting 
node and return the nodes in order of finishing time.
"""
seen = defaultdict(bool)
stack = [node]

while True:
    flag = True
    for x in stack:
        if not seen[x]:
            stack = [a for a in graph[x] if not seen[a]] + stack
            seen[x] = True
            flag = False
            break
     if flag:
        break
return stack

Edit: I've changed seen and stack variables to sets. But I'm not getting the required performance boost. What am I doing wrong?

def dfs(graph, node):
    """Run DFS through the graph from the starting 
    node and return the nodes in order of finishing time.
    """
    seen = set()
    stack = {node}
    while True:
       flag = True
       for x in stack:
           if x not in seen:
                stack = stack.union(set(graph[x]))
                seen = seen.union({x})
                flag = False
                break
        if flag:
            break
    return list(stack)

This does not do depth-first search, it does breadth-first search. Which do you want? I'll write an answer later today when I have time. Btw: The third function doesn't work because sets aren't ordered (unless all you want to do is to find all nodes that are reachable from the starting node). — flornquake, Aug 6 '13 at 12:17
Winston has already answered it better than I could, so nevermind. — flornquake, Aug 7 '13 at 6:18

Roman Susi · Accepted Answer · 2013-08-08 16:07:19Z

Some hints.

Set type can be used for seen variable.

That stack = [...] + stack is a monster, both + operation and list comprehension.

Is it possible to use set there (what type graph[x] is?). Maybe, its more efficient to have stack in a list of lists form?

UPDATE: loop over the whole stack again and again can be another point of optimization.

UPDATE2: in case you have not solved the problem, one possible way is to eliminate the inefficient filtering of the stack against seen values. This can be achieved by having a set of not yet visited nodes and using set operations to get the next level from the current level. This is why I've advised to keep levels in the stack (list of sets - as particular order inside a level is not important, is it?). Operating on relatively small number of nodes in the latest level is more efficient than looping over the whole stack. Actually, already seen nodes can also be subtracted from "next level", so if the speed is ok, set of unvisited nodes can be redundant.

UPDATE3:

I think, the following produces result, which is stated in the docstring. Finishing time is a time when the node has no more children to process and is popped from the stack. Reverse order can also be easily achieved by changing .append(x) to insert(0, x).

def dfs(graph, node):
    """Run DFS through the graph from the starting
    node and return the nodes in order of finishing time.
    """
    seen = {node}
    stack = [node]
    result = []
    while stack:
        x = stack[-1]
        nxt = set(graph.get(x, [])) - seen
        if nxt:
            stack.extend(list(nxt))
            seen |= nxt
        else:
            result.append(x)
            stack.pop()
    return result

Remark: the algorithm does not preserve the order of children. Minor modification may be needed to preserve the order of graph.get(x, []), in if nxt:

stack.extend(nx for nx in graph[x] if nx in nxt)

I tried using sets, I've edited the question description to include the edited code. The running time is not very different from my second attempt. Can you tell me where I'm going wrong? — vaishaks, Aug 6 '13 at 10:52
@Roman The type of graph[x] is a list: (quoting vaishaks) "Input: Graph as adjacency list Eg: {1:[2, 3], 2:[1], 3:[]}" — flornquake, Aug 6 '13 at 12:20
Those can be easily converted to sets, of course. (if the order of traversal is not important) — Roman Susi, Aug 6 '13 at 15:19
@Roman Update3 is achieving exactly what I needed and it's running in good time. Thanks a lot Roman! — vaishaks, Aug 9 '13 at 7:23

vaishaks · Answer 2 · 2013-08-07 17:12:29Z

def dfs(graph, node):

This isn't a depth first search

    """Run DFS through the graph from the starting 
    node and return the nodes in order of finishing time.
    """
    seen = set()
    stack = {node}

You aren't using this as a stack, don't call it that

    while True:
       flag = True

avoid flag variables, they confound code. Also that's a very undescriptive name.

       for x in stack:
           if x not in seen:

Seen these are sets, you should be probably use: for x in stack - seen: which should be more efficient

                stack = stack.union(set(graph[x]))

Using this approach will make things rather slow, you just keep making stack bigger and bigger.

                seen = seen.union({x})

Use seen.add to add things to the set. Using union like this will be much slower

                flag = False
                break
        if flag:
            break
    return list(stack)

The set gives you no guarantees as to the order of the elements. I'm guessing that's now what you wanted. As it stands,this will end up giving you all the elements in your graph in a random order.

My approach

def breadth_first_search(graph, start):
    seen = set()
    current = {start}
    history = []

    while current:
         history.extend(current) # record when we found all these elements
         seen.update(current) # add them to the list of elements we've already seen
         neighbours = set() # generate a set of all elements that we can reach from these ones
         for node in current:
             neighbours.update(graph[node])
         current = neighbours - seen # the next round of elements is this one, except those we've already seen


    return history

That is a brilliant answer Winston. I learnt a lot from that. But my application requires me to return the nodes in order of finishing times the way dfs would finish. I agree my iterative code looks a lot like bfs. And your code works perfectly well if I needed the nodes to be returned in order of finishing times the way bfs would. But the code I wrote gives me the right answer for small graphs. It's not at all random(because of using sets) as one would expect it to be. I find that very weird now that I know sets are un-ordered. Can you please tell me how you would do it for dfs? — vaishaks, Aug 7 '13 at 14:09
@vaishaks, your code working for small graphs is greatly mysterious. Any chance you can get me the data you ran it against? — Winston Ewert, Aug 7 '13 at 17:13
Sorry the output of the set code makes no sense actually. My bad. I've now reverted to function 2. The one with the defaultdict. Graph: {1: [3], 2: [1], 3: [2], 4: [3, 5, 6], 6: [8], 7: [6], 8: [7]} Start: 8, Your approach Output: [8, 7, 6] Required output: [6, 7, 8]. Start: 4, Your approach Output: [4, 3, 5, 6, 8, 2, 1, 7], Required output: [7, 8, 1, 2, 3, 5, 6, 4] — vaishaks, Aug 7 '13 at 18:20
@vaishaks: You're right that it's not bfs (I was mistaken earlier), but it's not dfs either. (This may or may not be what you want.) E.g., in your last example, dfs would start at 4, go to one of the successors of 4, (let's say 3). Then dfs would take one of the children of 3, and so on. The other successor of 4 would only be traversed later. Also, is it important that the output be in "reversed" order (starting point at the end)? — flornquake, Aug 8 '13 at 7:18

asked	7 months ago
viewed	539 times
active	6 months ago

current community

more stack exchange communities

Can someone help me optimize my DFS implementation in Python?

2 Answers

Your Answer

Not the answer you're looking for? Browse other questions tagged python optimization algorithm performance graph or ask your own question.

Community Bulletin

Hot Network Questions

current community

more stack exchange communities

Can someone help me optimize my DFS implementation in Python?

2 Answers

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python optimization algorithm performance graph or ask your own question.

Community Bulletin

Related

Hot Network Questions