Tell me more ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I have an indented text file that will be used to build a tree. Each line represents a node, and indents represent depth as well as node the current node is a child of.

For example, a file might look like

ROOT
   Node1
      Node2
         Node3
            Node4
   Node5
   Node6

Which indicates that ROOT contains three children: 1, 5, and 6, Node1 has one child: 2, and Node2 has one child: 3, etc.

I have come up with a recursive algorithm and have programmed it and it works, but it's kind of ugly and especially treats the example above very crudely (when going from node 4 to node 5)

It uses "indent count" as the basis for recursion, so if the number of indents = current depth + 1, I would go one level deeper. But this means when I read a line with less indents, I have to come back up one level at a time, checking the depth each time.

Here is what I have

def _recurse_tree(node, parent, depth):
    tabs = 0

    while node:
        tabs = node.count("\t")
        if tabs == depth:
            print "%s: %s" %(parent.strip(), node.strip())
        elif tabs == depth + 1:
            node = _recurse_tree(node, prev, depth+1)
            tabs = node.count("\t")

            #check if we have to surface some more
            if tabs == depth:
                print "%s: %s" %(parent.strip(), node.strip())
            else:
                return node
        else:
            return node

        prev = node
        node = inFile.readline().rstrip()

inFile = open("test.txt")
root = inFile.readline().rstrip()
node = inFile.readline().rstrip()
_recurse_tree(node, root, 1)

Right now I am just printing out the nodes to verify that the parent node is correct for each line, but maybe there is a cleaner way to do it? Especially the case in the elif block when I'm coming back from each recursion call.

share|improve this question
You will have to write a bunch of code to make the right parsing of this and the validation. Can you use xml? Its base structure is a tree. – guax May 20 '11 at 18:15
Unfortunately no, as this is more of a recursion exercise than anything. I assumed this sort of problem would be quite common though. – Keikoku May 20 '11 at 18:20
Might this be a homework question? If so, it's etiquette to add the homework tag. – Mu Mind May 20 '11 at 18:40
Nope, personal interest. Haven't done recursion in awhile. – Keikoku May 20 '11 at 19:00
If so, this is really not Python-specific. More of a general algorithm musing. – Santa May 20 '11 at 19:06

3 Answers

up vote 3 down vote accepted

The big issue is the "lookahead" that I think caused the ugliness in question. It can be shortened slightly:

def _recurse_tree(parent, depth, source):
    last_line = source.readline().rstrip()
    while last_line:
        tabs = last_line.count('\t')
        if tabs < depth:
            break
        node = last_line.strip()
        if tabs >= depth:
            if parent is not None:
                print "%s: %s" %(parent, node)
            last_line = _recurse_tree(node, tabs+1, source)
    return last_line

inFile = open("test.txt")
_recurse_tree(None, 0, inFile)

Since we're talking recursion, I took pains to avoid any global variables (source and last_line). It would be more pythonic to make them members on some parser object.

share|improve this answer
inFile is a global variable... – martineau May 20 '11 at 20:44
@martineau: you're right, I meant to replace inFile with source inside the function, fixed that now. – Mu Mind May 20 '11 at 21:00
Also looks to me like the last_line argument is always passed in as None -- so it could be just be a local variable with an initial value of source.readline().rstrip() set before the while loop (and the check for it being None removed). – martineau May 22 '11 at 7:04
@martineau: right again, edited accordingly. When I was writing it, I was tinkering a bit and wasn't sure whether each recursion/return would correspond to moving to the next line of input. Since I mentioned this being a "shortened" version, I guess I'd better squeeze all the air out, huh? – Mu Mind May 22 '11 at 14:31
1  
+1 even though recursing to read each line of the file makes me nervous... – martineau May 22 '11 at 15:20

If you don't insist on recursion, this works too:

from itertools import takewhile

is_tab = '\t'.__eq__

def build_tree(lines):
    lines = iter(lines)
    stack = []
    for line in lines:
        indent = len(list(takewhile(is_tab, line)))
        stack[indent:] = [line.lstrip()]
        print stack

source = '''ROOT
\tNode1
\t\tNode2
\t\t\tNode3
\t\t\t\tNode4
\tNode5
\tNode6'''

build_tree(source.split('\n'))

Result:

['ROOT']
['ROOT', 'Node1']
['ROOT', 'Node1', 'Node2']
['ROOT', 'Node1', 'Node2', 'Node3']
['ROOT', 'Node1', 'Node2', 'Node3', 'Node4']
['ROOT', 'Node5']
['ROOT', 'Node6']
share|improve this answer

I would not use recursion for something like this at all (Ok, maybe I would if I was coding this in a language like Scheme, but this is Python here). Recursion is great for iterating over data that is shaped like a tree, and in those cases it would simplify your design greatly when compared to normal loops.

However, this is not the case here. Your data surely represents a tree, but it's formatted sequentially, i.e. it is a simple sequence of lines. Such data is most easily processed with a simple loop, although you could make the design more general, if you wish, by separating it into three different layers: the sequential reader (which will parse the tabs as a specification of depth level), the tree inserter (which inserts a node into a tree in a specific depth level, by keeping track of the last node which was inserted into the tree) and the tree itself:

import re

# *** Tree representation ***
class Node(object):
    def __init__(self, title):
        self.title = title
        self.parent = None
        self.children = []

    def add(self, child):
        self.children.append(child)
        child.parent = self

# *** Node insertion logic ***
class Inserter(object):
    def __init__(self, node, depth = 0):
        self.node = node
        self.depth = depth

    def __call__(self, title, depth):
        newNode = Node(title)
        if (depth > self.depth):
            self.node.add(newNode)
            self.depth = depth
        elif (depth == self.depth):
            self.node.parent.add(newNode)
        else:
            parent = self.node.parent
            for i in xrange(0, self.depth - depth):
                parent = parent.parent
            parent.add(newNode)
            self.depth = depth

        self.node = newNode

# *** File iteration logic ***
with open(r'tree.txt', 'r') as f:    
    tree = Node(f.readline().rstrip('\n'))
    inserter = Inserter(tree)

    for line in f:
        line = line.rstrip('\n')
        # note there's a bug with your original tab parsing code:
        # it would count all tabs in the string, not just the ones
        # at the beginning
        tabs = re.match('\t*', line).group(0).count('\t')
        title = line[tabs:]
        inserter(title, tabs)

When I had to test this code before pasting it here, I wrote a very simple function to pretty print the tree that I read to memory. For this function, the most natural thing was to use recursion of course, because now the tree is indeed represented as tree data:

def print_tree(node, depth = 0):
    print '%s%s' % ('  ' * depth, node.title)
    for child in node.children:
        rec(child, depth + 1)

print_tree(tree)
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.