I have a regular expression like this:

import re

regex_string = r'(?P<x>\d+)\s(?P<y>\w+)'
r = re.compile(regex_string)

and, before I start matching things with it, I'd like to replace the regex group named x with a particular value, say 2014. That way, when I search for matches to this regular expression, I will only find matches that have x=2014. What is the best way to approach this?

The challenge here is that both the original regular expression regex_string and the arbitrary replacement value x=2014 are specified by an end user. In my head, the ideal thing would be to have a function like replace_regex_variables:

r = re.compile(regex_string)
r = replace_regex_variables(r, x=2014)
for match in r.finditer(really_big_string):
    do_something_with_each_match(match)

I'm open to any solution, but I'm specifically interested in whether it's possible to do this without checking matches after they are returned by finditer, so that I can take advantage of re's performance. In other words, preferably NOT this:

r = re.compile(regex_string)
for match in r.finditer(really_big_string):
    # Groups are captured as strings, so compare against '2014', not 2014.
    if match.group('x') == '2014':
        do_something_with_each_match(match)
  • Not without re-building the regular expression pattern itself, no. That requires parsing the string pattern, replacing the group with the literal text it must match, recompiling the pattern and returning that. Commented Apr 18, 2014 at 13:19
  • It'll be much easier to just verify that r.group('x') is equal to '2014'. The parsing will have to take into account nested groups, for example. Commented Apr 18, 2014 at 13:20
  • @MartijnPieters Recompiling the regular expression is totally fine by me. Any suggestions on how to replace the original variable in the regex string with values in a smart way? Commented Apr 18, 2014 at 13:21
  • Care to limit this to a subset of regex? Can the pattern match literal parenthesis, question marks and angle brackets, for example? Can there be nested groups? Commented Apr 18, 2014 at 13:22
  • @MartijnPieters The pattern for all intents and purposes could match anything, but I think it is safe to assume that there will not be nested groups. Commented Apr 18, 2014 at 13:29
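
The rebuild-and-recompile approach described in the comments above could look roughly like the sketch below. It is not a full regex parser: it assumes the groups being replaced contain no nested groups (as stated in the last comment), that the text (?P<name> does not appear escaped or inside a character class, and that a character class does not start with a literal ]. The helper name _group_end is made up for illustration; replace_regex_variables is the name used in the question.

```python
import re


def _group_end(source, start):
    """Return the index of the unescaped ')' that closes a non-nested
    group whose body begins at index `start` (hypothetical helper)."""
    i = start
    in_class = False  # True while inside a [...] character class
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == ")":
            return i
        i += 1
    raise ValueError("unterminated group in pattern")


def replace_regex_variables(pattern, **values):
    """Return a newly compiled pattern in which each named group listed in
    `values` matches only the given literal text.

    `pattern` may be a pattern string or an already compiled pattern.
    """
    source = getattr(pattern, "pattern", pattern)
    flags = getattr(pattern, "flags", 0)
    for name, value in values.items():
        opener = "(?P<%s>" % name
        start = source.find(opener)
        if start == -1:
            raise KeyError("no group named %r in pattern" % name)
        body_start = start + len(opener)
        end = _group_end(source, body_start)
        # re.escape makes the user-supplied value match literally.
        source = source[:body_start] + re.escape(str(value)) + source[end:]
    return re.compile(source, flags)
```

With the pattern from the question, replace_regex_variables(regex_string, x=2014) compiles (?P<x>2014)\s(?P<y>\w+), so finditer only yields matches where x is 2014 and both group names remain available. One caveat: this is not strictly equivalent to filtering afterwards, because the literal can match inside a longer run of digits (for example the 2014 in 12014), where the original \d+ would have captured 12014.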

1 Answer

You want something like this, don't you?

r = r'(?P<x>%(x)s)\s(?P<y>\w+)'
r = re.compile(r % {'x': 2014})
for match in r.finditer(really_big_string):
    do_something_with_each_match(match)
  • No, the OP wants to be able to use the original pattern still too, and/or do the same to an arbitrary number of named patterns. Commented Apr 18, 2014 at 13:21
  • Nice idea, but as @MartijnPieters mentioned, the regex is provided by an end user and, separately, so is the x=2014 bit. I don't know a priori which parts of the regex will be matched. I'll clarify that in the question. Commented Apr 18, 2014 at 13:24
