Python regex string expansion

Question

Suppose I have the following string:

trend = '(A|B|C)_STRING'

I want to expand this to:

A_STRING
B_STRING
C_STRING

The OR condition can appear anywhere in the string. For example, STRING_(A|B)_STRING_(C|D)

would expand to

STRING_A_STRING_C
STRING_B_STRING_C
STRING_A_STRING_D
STRING_B_STRING_D

I also want to cover the case of an empty conditional:

(|A_)STRING would expand to:

A_STRING
STRING

Here's what I've tried so far:

def expandOr(trend):
    parenBegin = trend.index('(') + 1
    parenEnd = trend.index(')')
    orExpression = trend[parenBegin:parenEnd]
    originalTrend = trend[0:parenBegin - 1]
    expandedOrList = []

    for oe in orExpression.split("|"):
        expandedOrList.append(originalTrend + oe)

But this is obviously not working.

Is there any easy way to do this using regex?

You realize you're discarding everything after the closing parenthesis, right? Do you not see a way to fix that? — jwodder, 1 hour ago
Not sure what you mean. The code works for the case where the parentheses come at the end of the string, i.e. STRING_(A|B) — Mark Kennedy, 53 mins ago
Right, the code works there because there's nothing after the parentheses to discard, but if you input FOO_(A|B)_BAR, you get FOO_A and FOO_B, with the _BAR being discarded. Do you not realize that this is what's wrong with your code? Do you not see how you forgot to handle the substring after the )? — jwodder, 48 mins ago
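One way to address the point raised in these comments is to keep the text after the closing parenthesis and expand it as well. The following is only a minimal sketch, assuming groups are balanced and never nested; `expand_or` is a hypothetical name, not the asker's original function:

```python
def expand_or(trend):
    """Expand the first (a|b|...) group, then recurse on what follows it."""
    if '(' not in trend:
        return [trend]  # nothing left to expand

    paren_begin = trend.index('(')
    paren_end = trend.index(')', paren_begin)

    prefix = trend[:paren_begin]
    options = trend[paren_begin + 1:paren_end].split('|')
    suffix = trend[paren_end + 1:]  # the part the original attempt discarded

    # Expand any later groups in the suffix once, then combine
    tails = expand_or(suffix)
    return [prefix + option + tail for option in options for tail in tails]


print(expand_or('FOO_(A|B)_BAR'))  # ['FOO_A_BAR', 'FOO_B_BAR']
print(expand_or('(|A_)STRING'))    # ['STRING', 'A_STRING']
```

Because the suffix is expanded recursively, a pattern with several groups such as STRING_(A|B)_STRING_(C|D) should also come out fully expanded.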

roippi · Answer 1 · 2013-11-19 02:07:19Z

I would do this to extract the groups:

def extract_groups(trend):
    l_parens = [i for i,c in enumerate(trend) if c == '(']
    r_parens = [i for i,c in enumerate(trend) if c == ')']
    assert len(l_parens) == len(r_parens)
    return [trend[l+1:r].split('|') for l,r in zip(l_parens,r_parens)]
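As a quick check, here is the same function run on the three patterns from the question, including the empty conditional. Treat it as a sketch: it assumes the parentheses are balanced and never nested, since zip simply pairs the i-th '(' with the i-th ')'.

```python
def extract_groups(trend):
    # Positions of every opening and closing parenthesis, in order
    l_parens = [i for i, c in enumerate(trend) if c == '(']
    r_parens = [i for i, c in enumerate(trend) if c == ')']
    assert len(l_parens) == len(r_parens)
    # Slice out the text between each pair and split it on '|'
    return [trend[l + 1:r].split('|') for l, r in zip(l_parens, r_parens)]


print(extract_groups('(A|B|C)_STRING'))             # [['A', 'B', 'C']]
print(extract_groups('STRING_(A|B)_STRING_(C|D)'))  # [['A', 'B'], ['C', 'D']]
print(extract_groups('(|A_)STRING'))                # [['', 'A_']]
```

The empty alternative in (|A_) shows up as an empty string in the group, which is what lets the later substitution produce plain STRING.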

And then you can evaluate the product of those extracted groups using itertools.product:

from itertools import product

expr = 'STRING_(A|B)_STRING_(C|D)'
list(product(*extract_groups(expr)))
# [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]

Now it's just a question of splicing those back onto your original expression. I'll use re for that :)

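import re

# Non-greedy match for one parenthesized group, parentheses included
p = re.compile(r'\(.*?\)')

for tup in product(*extract_groups(expr)):
    # Each match takes the next replacement from this combination, left to right
    replacements = iter(tup)
    print(p.sub(lambda m: next(replacements), expr))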

STRING_A_STRING_C
STRING_A_STRING_D
STRING_B_STRING_C
STRING_B_STRING_D

There's probably a more readable way to get re.sub to sequentially substitute things from an iterable, but this is what came off the top of my head.
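One possibly more readable variant skips the sequential substitution entirely. This is a sketch of a different technique, not the approach above: re.split with a capturing group returns the literal text and the group contents interleaved, so everything can go through itertools.product in one pass. The name `expand` is hypothetical, and nested parentheses are assumed not to occur.

```python
import re
from itertools import product


def expand(trend):
    # With a capturing group, re.split keeps the captured text:
    # even indices are literal text, odd indices are the "a|b|..." bodies
    parts = re.split(r'\(([^()]*)\)', trend)

    # Literal pieces become one-element lists so product leaves them fixed
    choices = [part.split('|') if i % 2 else [part]
               for i, part in enumerate(parts)]

    return [''.join(combo) for combo in product(*choices)]


for s in expand('STRING_(A|B)_STRING_(C|D)'):
    print(s)
# STRING_A_STRING_C
# STRING_A_STRING_D
# STRING_B_STRING_C
# STRING_B_STRING_D

print(expand('(|A_)STRING'))  # ['STRING', 'A_STRING']
```

Since a '|' is only split when it sits inside parentheses, any literal '|' elsewhere in the string is left untouched.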
