Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Background:

I was working on a project were I needed to write some rules for text-processing. After working on this project for a couple of days and implementing some rules, I realized I needed to determine the order of the rules. No problem; we have topological sorting to help. But then I realized that I can't expect the graph to always be full. So I came up with this idea, that given a single rule with a set of dependencies (or a single dependency) I need to check the dependencies of the dependencies. Sounds familiar? Yes. This subject is very similar to depth-first-search of a graph. I am not a mathematician, nor did I study C.S. Hence, Graph Theory is a new field for me. Nevertheless, I implemented something below which works (inefficiently, I suspect).

The code:

This is my search and yield algorithm. If you run it on the examples below, you will see it visits some nodes more than once. Hence, the speculated inefficiency. A word about the input. The rules I wrote are basically Python classes, which have a class property depends. I was criticized for not using inspect.getmro- But this would complicate thing terribly because the class would need to inherit from each other (See example here).

def _yield_name_dep(rules_deps):
    global recursion_counter
    recursion_counter = recursion_counter +1 
    # yield all rules by their named and dependencies
    for rule, dep in rules_deps.items():
        if not dep:
            yield rule, dep
            continue
        else:
            yield rule, dep
            for ii in dep:
                i = getattr(rules, ii)
                instance = i()
                if instance.depends:
                    new_dep={str(instance): instance.depends}
                    for dep in _yield_name_dep(new_dep):
                        yield dep    
                else:
                    yield str(instance), instance.depends

Input to test:

demo_class_content ="""
class A(object):
    depends = 'B'
    def __str__(self):
        return self.__class__.__name__

class B(object):
    depends = ('C','F')
    def __str__(self):
        return self.__class__.__name__

class C(object):
    depends = ('D', 'E')
    def __str__(self):
        return self.__class__.__name__

class D(object):
    depends = None
    def __str__(self):
        return self.__class__.__name__   

class F(object):
    depends = 'E'
    def __str__(self):
        return self.__class__.__name__

class E(object):
    depends = None  
    def __str__(self):
        return self.__class__.__name__
"""       

with open('demo_classes.py', 'w') as clsdemo:
    clsdemo.write(demo_class_content)

import demo_classes as rules

rule_start={'A': 'B'}

def _yield_name_dep(rules_deps):
    # yield all rules by their named and dependencies
    for rule, dep in rules_deps.items():
        if not dep:
            yield rule, dep
            continue
        else:
            yield rule, dep
            for ii in dep:
                i = getattr(rules, ii)
                instance = i()
                if instance.depends:
                    new_dep={str(instance): instance.depends}
                    for dep in _yield_name_dep(new_dep):
                        yield dep    
                else:
                    yield str(instance), instance.depends

if __name__ == '__main__':
    # this is yielding nodes visited multiple times, 
    # list(_yield_name_dep(rule_start))
    # hence, my work around was to use set() ...
    rule_dependencies = list(set(_yield_name_dep(rule_start)))
    print rule_dependencies

The questions:

  • I tried classifying my work, and I think what I did is similar to DFS. Can you really classify it like this?
  • How can I improve this function to skip visited nodes, and still use generators?

answers ...

Using the feedback from Gareth and other kind users of Stackoverflow, here is what I came up with. It is clearer, and also more general:

def _dfs(start_nodes, rules, visited):
    """
    Depth First Search
    start_nodes - Dictionary of Rule with dependencies (as Tuples):    

        start_nodes = {'A': ('B','C')}

    rules - Dictionary of Rules with dependencies (as Tuples):
    e.g.
    rules = {'A':('B','C'), 'B':('D','E'), 'C':('E','F'), 
             'D':(), 'E':(), 'F':()}
    The above rules describe the following DAG:

                    A
                   / \
                  B   C
                 / \ / \
                D   E   F
    usage:
    >>> rules = {'A':('B','C'), 'B':('D','E'), 'C':('E','F'), 
                 'D':(), 'E':(), 'F':()}
    >>> visited = []
    >>> list(_dfs({'A': ('B','C')}, rules, visited))
    [('A', ('B', 'C')), ('B', ('D', 'E')), ('D', ()), ('E', ()), 
    ('C', ('E', 'F')), ('F', ())]
    """

    for rule, dep in start_nodes.items():
        if rule not in visited:
            yield rule, dep
            visited.append(rule)
            for ii in dep:
                new_dep={ ii : rules[ii]}
                for dep in _dfs(new_dep, rules, visited):
                    if dep not in visited:
                        yield dep
share|improve this question
    
At Code Review, we review working code (see here). For help getting your code to work, try Stack Overflow. –  Gareth Rees Oct 14 '13 at 9:55
    
@GarethRees, the code is working ... but it could be better –  Oz123 Oct 14 '13 at 10:06

1 Answer 1

up vote 1 down vote accepted

This code is not ready for review, as it's not working—it visits some rules more than once. If you need help writing a topological sort algorithm, Stack Overflow is the right place, not Code Review.

But I'll make one important observation about your code: there is no documentation. It is not clear from your code what it is supposed to do and how to use it. When you have a function whose behaviour is complex, it's important to start by writing a specification for it: what inputs does it take, and what output does it produce?

Once you've written the specification, stop and think. Is this specification clear? Is it easy for people to understand and use? If the answers are "no", then think about how you could change the specification.

If you try to do this, then you'll immediately see the problem. I had a go myself and this is what I came up with:

def _yield_name_dep(rules_deps):
    """Generate rules from rules_deps, together with their dependencies,
    in topological order: that is, each rule is generated before any of 
    the rules it depends on.

    The argument rules_deps must be a dictionary mapping a rule 
    object R to either
    (i) the name of the (single) rule that R depends on; or
    (ii) a sequence of names of rules that R depends on; or
    (iii) None if R has no dependencies.

    The names of the dependencies must be names of objects in the rules
    module, each of which must have a "depends" attribute containing the
    names of the rule's dependencies (as above).

    The generated output consists of pairs (rule, dependencies), with
    rule being a rule object (or the rule name, if the rule has no          
    dependencies), and dependencies as above.
    """

Did I get this right? Who knows? This is a terrible specification. Imagine being a programmer trying to read the documentation and figure out how to use the function!

So you must start by simplifying and clarifying the specification. Design the interface so that is easy to understand and easy to use, and then fixing the code will be straightforward.

share|improve this answer
    
thanks for your feedback. See my answer above. –  Oz123 Oct 16 '13 at 7:29

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.