Suppose I have the following string:

trend = '(A|B|C)_STRING'

I want to expand this to:

A_STRING
B_STRING
C_STRING

The OR condition can be anywhere in the string, and there can be more than one, e.g. STRING_(A|B)_STRING_(C|D)

would expand to

STRING_A_STRING_C
STRING_B_STRING_C
STRING_A_STRING_D
STRING_B_STRING_D

I also want to cover the case of an empty conditional:

(|A_)STRING would expand to:

A_STRING
STRING

Here's what I've tried so far:

def expandOr(trend):
    parenBegin = trend.index('(') + 1
    parenEnd = trend.index(')')
    orExpression = trend[parenBegin:parenEnd]
    originalTrend = trend[0:parenBegin - 1]
    expandedOrList = []

    for oe in orExpression.split("|"):
        expandedOrList.append(originalTrend + oe)

But this is obviously not working.

Is there any easy way to do this using regex?

You realize you're discarding everything after the closing parenthesis, right? Do you not see a way to fix that? –  jwodder 1 hour ago
 
Not sure what you mean. The code works for the case where the parentheses come at the end of the string, e.g. STRING_(A|B) –  Mark Kennedy 53 mins ago
 
Right, the code works there because there's nothing after the parentheses to discard, but if you input FOO_(A|B)_BAR, you get FOO_A and FOO_B, with the _BAR being discarded. Do you not realize that this is what's wrong with your code? Do you not see how you forgot to handle the substring after the )? –  jwodder 48 mins ago

1 Answer

I would do this to extract the groups:

def extract_groups(trend):
    l_parens = [i for i,c in enumerate(trend) if c == '(']
    r_parens = [i for i,c in enumerate(trend) if c == ')']
    assert len(l_parens) == len(r_parens)
    return [trend[l+1:r].split('|') for l,r in zip(l_parens,r_parens)]
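
As a cross-check, the same extraction can be sketched with re.findall. This is only an illustration, assuming the parentheses are never nested; the name extract_groups_re is made up for this example and is not part of the code above. It also shows that an empty alternative such as (|A_) comes back as an empty string in the group:

```python
import re

def extract_groups_re(trend):
    # Hypothetical helper: each match is the text between one "(" and
    # the next ")". Assumes no nested parentheses.
    bodies = re.findall(r'\(([^()]*)\)', trend)
    # Splitting on "|" keeps empty alternatives as '' entries.
    return [body.split('|') for body in bodies]

print(extract_groups_re('STRING_(A|B)_STRING_(C|D)'))  # [['A', 'B'], ['C', 'D']]
print(extract_groups_re('(|A_)STRING'))                # [['', 'A_']]
```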

And then you can evaluate the product of those extracted groups using itertools.product:

expr = 'STRING_(A|B)_STRING_(C|D)'
from itertools import product
list(product(*extract_groups(expr)))
Out[92]: [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]

Now it's just a question of splicing those back onto your original expression. I'll use re for that :)

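# Python 3.3+
import re

def _gen(it):
    yield from it

# Non-greedy match of one parenthesised group, parentheses included.
p = re.compile(r'\(.*?\)')

for tup in product(*extract_groups(expr)):
    gen = _gen(tup)
    # Each group is replaced, left to right, by the next item of the tuple.
    print(p.sub(lambda m: next(gen), expr))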

STRING_A_STRING_C
STRING_A_STRING_D
STRING_B_STRING_C
STRING_B_STRING_D

There's probably a more readable way to get re.sub to sequentially substitute things from an iterable, but this is what came off the top of my head.
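
One possibly more readable variant, as a sketch only: re.split with a capturing group returns the literal text and the group bodies interleaved, so the whole expansion can be written as a single product with no substitution step. The function name expand_or is invented for this example, and the pattern assumes the parentheses are not nested.

```python
import re
from itertools import product

def expand_or(trend):
    # With a capturing group, re.split keeps the captured text:
    # even indices are literal text, odd indices are "(...)" bodies.
    parts = re.split(r'\(([^()]*)\)', trend)
    # A literal piece has exactly one choice; a group has one per alternative.
    choices = [part.split('|') if i % 2 else [part]
               for i, part in enumerate(parts)]
    return [''.join(combo) for combo in product(*choices)]

for s in expand_or('STRING_(A|B)_STRING_(C|D)'):
    print(s)
```

Because an empty alternative splits to an empty string, expand_or('(|A_)STRING') should give both STRING and A_STRING, and the text after the last closing parenthesis is kept rather than discarded.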
