Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I need to resolve "multiple paths" specified like so:

p1 = 'item.element.[compact|fullsize].documents.[note|invoice]'

A list like this is expected:

['item.element.compact.documents.note', 'item.element.fullsize.documents.note',
 'item.element.compact.documents.invoice', 'item.element.fullsize.documents.invoice']

Code:

def resolve_paths(path):
    parts = path.split('.')
    depth = len(parts)
    new_paths = []
    for level, part in enumerate(parts):
        mult_branches = re.findall(r'\[(\w+)(?:\|(\w+))*\]', part)
        if mult_branches:
            mult_branches = flatten_iterable(mult_branches)
            for branch in mult_branches:
                interm_path = '.'.join(parts[:level] + [branch] + parts[level+1:])
                new_paths.extend(resolve_paths(interm_path))
            return new_paths
        elif level == depth - 1:
            new_paths.append(path)
    return new_paths

Several tests I wrote for this function pass, but I'm not entirely happy with this solution, it's somewhat convoluted. Better solutions? Simplifications?

share|improve this question
    
What is flatten_iterable()? –  Curt F. yesterday

3 Answers 3

What do you think this will return for part = '[compact|fullsize|x|y]'

re.findall(r'\[(\w+)(?:\|(\w+))*\]', part)

It will give:

[('compact', 'y')]

Because, when a group matches multiple times, the last match overwrites previous matches, as per the docs. If you want to support multiple patterns, I'm afraid you will have to do in two steps: matching and then splitting.


The part I don't like about this is that you split path to parts, then when you need to recurse, you re-join the parts again, and in the recursion step it will be split again, and so on. That's a lot of splitting and joining and iterations. The recursive logic can be also tricky to understand.

Here's an alternative, without using recursion and unnecessary splitting, joining, iteration:

from collections import deque


def resolve_paths(path):
    parts = deque(path.split('.'))
    paths = [[]]

    while parts:
        part = parts.popleft()
        branches = re.findall(r'\[(\w+)(?:\|(\w+))*\]', part)
        if branches:
            orig_paths = paths[:]
            paths = []
            for branch in branches[0]:
                for path in orig_paths:
                    paths.append(path + [branch])
        else:
            for path in paths:
                path.append(part)

    return ['.'.join(path) for path in paths]
share|improve this answer

This combinatorial problems typically have a compact and elegant solution using itertools. In this case, it is itertools.product you want to use:

from itertools import product

def resolve_paths(path):
    subpaths = path.split('.')
    for idx, subpath in enumerate(subpaths):
        if subpath[0] == '[' and subpath[-1] == ']':
            subpaths[idx] = subpath[1:-1].split('|')
        else:
            subpaths[idx] = [subpath]
    for path in product(*subpaths):
        yield '.'.join(path)

I have made it an iterator, which I like better for this type of problems.

>>> path = 'item.element.[compact|fullsize].documents.[note|invoice]'
>>> list(resolve_paths(path))
['item.element.compact.documents.note',
 'item.element.compact.documents.invoice',
 'item.element.fullsize.documents.note',
 'item.element.fullsize.documents.invoice']
share|improve this answer
    
I feel like this is the best solution presented so far. It avoids recursion, is the most compact, and doesn't even use any regular expressions. –  Curt F. yesterday
    
+1 I agree, awesome solution –  mjgpy3 19 hours ago
    
Elegant, best solution +1. It also treats the issue I pointed out in my first point. –  janos 17 hours ago

The question's not too specific, so here's a "competing" solution to compare:

def resolve_paths(components, results = []):
  if not components:
    return results

  current = components[0]
  branches = split_branches(current) if has_branches(current) else [current]

  return resolve_paths(components[1:], list(append_all(branches, results)) or branches)

def append_component(acc, component):
  return acc + '.' + component

def split_branches(component):
  return component[1:-1].split('|')

def split_components(path):
  return path.split('.')

def has_branches(component):
  return component.startswith('[')

def append_all(branches, results):
  for branch in branches:
    for result in results:
      yield append_component(result, branch)


p1 = 'item.element.[compact|fullsize].documents.[note|invoice]'

print resolve_paths(split_components(p1))

Positives

  • resolve_paths is fairly simple and straight-forward. It's using an recursion and accumulator pattern which clears up a lot of bookkeeping.
  • Overall the code's pretty explicit (I believe). For example segments like split_branches(current) if has_branches(current) else [current] are usually pretty nasty, arguably, however, the named helper methods enhance explicitness.
  • I made a lot of assumptions about parsing that simplified out the regular expression. If these assumptions are wrong, that's okay, they're pretty easily changeable as they are in their own methods.

Negatives

  • Still recursive, so it could break in extreme edge-cases
  • Some expressions are overloaded (e.g. list(append_all(branches, results)) or branches)
  • The list(list(append_all(branches, results)) or branches)/yield pattern is kind of gross. I did it to alleviate the need for an accumulator list in append_all.
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.