I am trying to finish an algorithm that computes the "most balanced tree". I have most of it done, but have been stuck really trying to get it correct for a while. It seems to be quite simple, which is embarrassing to me that it is taking so long. Balance in this case is different from the normal idea of a balanced tree, but in the fact that vertices have weights. If we have a bipartite graph $G= (A, B, E)$ all the weights of vertices in A are positive and the weights of vertices in B are negative. the balance of the tree is the sum of its weights. Here is the general idea for the algorithm (using dynamic programming):
Give root r a node u in part A with frequency f(u) has balance f(u) a node u in part B has balance -f(u) a subtree has balance that is sum of balances of nodes root r is possible only if either it belongs to A and the balance of S is between 0 and f(r) it belongs to B and the balance of S is between 0 and -f(r)
A subtree with root r in part A should have children subtrees with roots u1 .. ui that have negative balances with sum higher than -f(r) these balances are (minus) frequencies of quasispeies A symmetric condition applies to a root in part B
Dynamic programming counting or minimizing: instance is (S, r), if r is not a valid root, the result is null
One can see that the number of instances is roughly $2^n$ (for sets, we also specify roots) The number of steps where we look up two entries is roughly $3^n$ (times the selections of roots) The counting of comes from T subset S subset V We partition V into T, S-T, V-S
Comment: we select T that contains minimum index node of S this should avoid the possibility of creating/counting the same tree more than once
recurrance:
MinTree(S, root) =
-Case 1: empty set if if r is not valid in S
-Case 2: min_{S’ = partitions of S with respect to r} MinTree(S’ \ {r}, i) for all i \in S
Finally, here is the Python code. (Dynamic Programming part is _min_entropy_tree)
def min_entropy_fork_resolution(graph):
""" Resolve forks locally using minimum entropy trees.
Parameters
----------
graph : DecliquedAmpliconReadGraph
Amplicon read graph with fork vertices added.
Returns
-------
graph : DecliquedAmpliconReadGraph
Modified read graph with all forks resolved.
"""
if not graph:
raise ValueError("Non-empty Read-Graph expected!")
def _all_subsets_with(my_set, item):
# return all subsets with respect to the item
return reduce(lambda z, x: z + [y | {x} for y in z], my_set, [{item}])
def _balance(node, left, right):
# return the balance of the node i.e., f(node)
freq, _ = node
return freq if node in left else -freq
def _tree_balance(tree, left, right):
# return the balance of the tree
return sum(_balance(node, left, right) for node in tree)
def _is_valid(root, tree, left, right):
# check whether this tree has a valid balance
tbal = _tree_balance(tree, left, right)
rbal = _balance(root, left, right)
s = rbal + tbal
if root in left:
val = s >= 0.0 and s <= rbal
else:
val = s <= 0.0 and s >= rbal
return val
def _children(root, subset, left, right):
# get the immediate children of the root
if root in left:
return right & subset
elif root in right:
return left & subset
else:
return left | right
def _min_entropy_tree(nodes, root, left, right, trees):
# actual DP algorithm. compute the minimum entropy tree
# trees is our lookup table
if not _is_valid(root, nodes, left, right):
return None
rfreq, rnode = root
min_child = None
min_subtree = None
min_subtree_bal = 0
others = nodes - {root}
for subset in _all_subsets_with(others, root):
for child in _children(root, subset, left, right):
childfreq, childnode = child
if not trees[childnode]:
subtree = _min_entropy_tree(subset - {root}, child, left, \
right, trees)
# invalid subtree
if not subtree:
continue
else:
subtree = trees[childnode]
balance = _tree_balance(subtree, left, right)
if balance < min_subtree_bal:
min_subtree_bal = balance
min_subtree = trees[childnode]
min_child = child
if not others or not min_subtree:
tree = netx.Graph()
tree.add_node(root)
else:
tree = min_subtree.copy()
tree.add_edge(root, min_child)
trees[rnode] = tree
return tree
Finally, here is the code that calls the above functions.
left = set(lreads)
right = set(rreads)
allreads = left | right
min_tree_bal = sys.maxint
min_tree = None
# get the min entropy tree. DFS over the tree then
# and use the edge assignments to resolve the fork
for readpair in allreads:
rest = allreads - {readpair}
trees = {read: None for freq, read in rest}
tree = _min_entropy_tree(rest, readpair, left, right, trees)
balance = _tree_balance(tree, left, right)
if balance < min_tree_bal:
min_tree_bal = balance
min_tree = tree