Tell me more ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I am trying to finish an algorithm that computes the "most balanced tree". I have most of it done, but have been stuck really trying to get it correct for a while. It seems to be quite simple, which is embarrassing to me that it is taking so long. Balance in this case is different from the normal idea of a balanced tree, but in the fact that vertices have weights. If we have a bipartite graph $G= (A, B, E)$ all the weights of vertices in A are positive and the weights of vertices in B are negative. the balance of the tree is the sum of its weights. Here is the general idea for the algorithm (using dynamic programming):

Give root r a node u in part A with frequency f(u) has balance f(u) a node u in part B has balance -f(u) a subtree has balance that is sum of balances of nodes root r is possible only if either it belongs to A and the balance of S is between 0 and f(r) it belongs to B and the balance of S is between 0 and -f(r)

A subtree with root r in part A should have children subtrees with roots u1 .. ui that have negative balances with sum higher than -f(r) these balances are (minus) frequencies of quasispeies A symmetric condition applies to a root in part B

Dynamic programming counting or minimizing: instance is (S, r), if r is not a valid root, the result is null

One can see that the number of instances is roughly $2^n$ (for sets, we also specify roots) The number of steps where we look up two entries is roughly $3^n$ (times the selections of roots) The counting of comes from T subset S subset V We partition V into T, S-T, V-S

Comment: we select T that contains minimum index node of S this should avoid the possibility of creating/counting the same tree more than once

recurrance:

MinTree(S, root) =

-Case 1: empty set if if r is not valid in S

-Case 2: min_{S’ = partitions of S with respect to r} MinTree(S’ \ {r}, i) for all i \in S

Finally, here is the Python code. (Dynamic Programming part is _min_entropy_tree)

def min_entropy_fork_resolution(graph):
    """ Resolve forks locally using minimum entropy trees.

        Parameters
        ----------
        graph : DecliquedAmpliconReadGraph
            Amplicon read graph with fork vertices added.

        Returns
        -------
        graph : DecliquedAmpliconReadGraph
            Modified read graph with all forks resolved.
    """
    if not graph:
        raise ValueError("Non-empty Read-Graph expected!")

    def _all_subsets_with(my_set, item):
        # return all subsets with respect to the item
        return reduce(lambda z, x: z + [y | {x} for y in z], my_set, [{item}])

    def _balance(node, left, right):
        # return the balance of the node i.e., f(node)
        freq, _ = node
        return freq if node in left else -freq

    def _tree_balance(tree, left, right):
        # return the balance of the tree
        return sum(_balance(node, left, right) for node in tree)

    def _is_valid(root, tree, left, right):
        # check whether this tree has a valid balance
        tbal = _tree_balance(tree, left, right)
        rbal = _balance(root, left, right)
        s = rbal + tbal
        if root in left:
            val = s >= 0.0 and s <= rbal
        else:
            val = s <= 0.0 and s >= rbal

        return val

    def _children(root, subset, left, right):
        # get the immediate children of the root
        if root in left:
            return right & subset
        elif root in right:
            return left & subset
        else:
            return left | right

    def _min_entropy_tree(nodes, root, left, right, trees):
        # actual DP algorithm. compute the minimum entropy tree
        # trees is our lookup table
        if not _is_valid(root, nodes, left, right):
            return None

        rfreq, rnode = root
        min_child = None
        min_subtree = None
        min_subtree_bal = 0
        others = nodes - {root}
        for subset in _all_subsets_with(others, root):
            for child in _children(root, subset, left, right):
                childfreq, childnode = child
                if not trees[childnode]:
                    subtree = _min_entropy_tree(subset - {root}, child, left, \
                                                right, trees)
                    # invalid subtree
                    if not subtree:
                        continue
                else:
                    subtree = trees[childnode]

                balance = _tree_balance(subtree, left, right)
                if balance < min_subtree_bal:
                    min_subtree_bal = balance
                    min_subtree = trees[childnode]
                    min_child = child

        if not others or not min_subtree:
            tree = netx.Graph()
            tree.add_node(root)
        else:
            tree = min_subtree.copy()
            tree.add_edge(root, min_child)

        trees[rnode] = tree
        return tree

Finally, here is the code that calls the above functions.

left = set(lreads)
    right = set(rreads)
    allreads = left | right
    min_tree_bal = sys.maxint
    min_tree = None
    # get the min entropy tree. DFS over the tree then
    # and use the edge assignments to resolve the fork
    for readpair in allreads:
        rest = allreads - {readpair}
        trees = {read: None for freq, read in rest}
        tree = _min_entropy_tree(rest, readpair, left, right, trees)
        balance = _tree_balance(tree, left, right)
        if balance < min_tree_bal:
            min_tree_bal = balance
            min_tree = tree
share|improve this question
Does this code work? – Winston Ewert Feb 5 '12 at 20:16
It runs, but does not produce the correct answer. I believe it has to do with how I am storing results in my look-up table. It requires the use of NetworkX and will run. It just does not produce the min forest. – Nicholas Mancuso Feb 5 '12 at 20:38
Code that doesn't produce the right answer is off-topic here. We improve working code. If you need help with fixing your code the place to look is Stackoverflow. But they won't like the question in its current state. You should at least provide a failing case for the code, and preferably simplify it as much as possible. – Winston Ewert Feb 5 '12 at 21:12
Ok. Is it possible to transfer it there? Or would it be better for me to repost? – Nicholas Mancuso Feb 5 '12 at 21:31
1  
FYI, I've started a discussion on meta about changing the FAQ: meta.codereview.stackexchange.com/questions/464/… – Winston Ewert Feb 6 '12 at 3:08
show 6 more comments

closed as off topic by Winston Ewert Feb 5 '12 at 21:13

Questions on Code Review Stack Exchange are expected to relate to code review request within the scope defined in the FAQ. Consider editing the question or leaving comments for improvement if you believe the question can be reworded to fit within the scope. Read more about closed questions here.