Using regex to extract string position Python

Question

I'm trying to extract the position (index) of a substring using regex. I need to use regex because the string won't be exactly the same. I want to get the position of the substring (either starting or ending position), so I can take the 1,000 characters following that substring.

For example, given "while foreign currencies are traded frequently, very little money is made by most.", I want to find the position of "foreign currencies" so I can get all the words after it.

f5 is the text.

I've tried:

p = re.compile("((^\s*|\.\s*)foreign\s*(currency|currencies))?")
for m in p.finditer(f5):
    print m.start(), m.group()

to get the location. This gives me (0,0) even though I've checked to make sure the regex picks up what I'm looking for in the text.

I've also tried:

location = re.search(r"((^\s*|\.\s*)foreign\s*(currency|currencies))?", f5)
print location

Output is <_sre.SRE_Match at 0x297d3328>

If I try

location.span()

I get (0,0) again.

Basically, I want to convert <_sre.SRE_Match at 0x297d3328> into an integer that gives the location of the search term.

I've spent half a day searching for a solution. Thanks for any help.

Can you give a short, copyable example of an f5 which doesn't work which should? — DSM, May 13 '14 at 15:27
The SRE_Match is a match object in Python, so you're not going to be converting it at all. You need to extract your matches out of the object via group(), for instance. — Signus, May 13 '14 at 15:38
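A minimal sketch of getting integers out of the match object, assuming `f5` holds the example sentence from the question and using a corrected pattern without the trailing `?`. The `start()`, `end()` and `span()` methods belong to Python's `re` match objects; the 1,000-character slice comes from the question.

```python
import re

# Example text from the question (assumed value of f5).
f5 = ("while foreign currencies are traded frequently, "
      "very little money is made by most.")

# Corrected pattern: no trailing "?", so an empty match is impossible.
pattern = re.compile(r"foreign\s+currenc(?:y|ies)")

m = pattern.search(f5)
if m is not None:
    start = m.start()          # index where the match begins
    end = m.end()              # index just past the end of the match
    print(start, end, m.span())
    # Take up to 1,000 characters following the matched phrase.
    following = f5[end:end + 1000]
    print(repr(following))
```

Slicing past the end of a string is safe in Python, so the slice simply returns whatever remains when fewer than 1,000 characters follow the match.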

fredtantini · Accepted Answer · 2014-05-13 15:44:59Z


In addition to previous solutions/comments, if you want all the words after, you can just do something like:

>>> location = re.search(r".*foreign\s*currenc(y|ies)(.*)", f5)
>>> location.group(2)
' are traded frequently, very little money is made by most.'

The .group(2) part returns the text matched by (.*) in the regexp.


Use a non-capturing group (?:y|ies) and (.*) will be captured in group 1 (slightly more logical/readable). – Sam May 13 '14 at 16:08

That did the trick! Thanks so much. – user2649353 May 13 '14 at 17:29
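Sam's suggestion applied to the accepted answer's pattern, as a sketch assuming the same example `f5`: with `(?:y|ies)` the only capturing group left is `(.*)`, so the trailing text becomes `group(1)`. Dropping the leading `.*` is an editorial choice, not part of the answer or the comment; `re.search` already scans the whole string, so it is not needed.

```python
import re

f5 = ("while foreign currencies are traded frequently, "
      "very little money is made by most.")

# Answer's form: (y|ies) is group 1, so the rest of the text is group 2.
original = re.search(r".*foreign\s*currenc(y|ies)(.*)", f5)

# Non-capturing alternative: (?:y|ies) captures nothing, so (.*) is group 1.
tidier = re.search(r"foreign\s*currenc(?:y|ies)(.*)", f5)

print(repr(original.group(2)))
print(repr(tidier.group(1)))
```

Without the `re.DOTALL` flag, `(.*)` stops at the first newline, so for multi-line text the captured tail ends at the end of the current line.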


Hans Then · Answer 2 · 2014-05-13 15:37:00Z

Your pattern includes everything before the word "foreign", so Python will consider that part of your match. If you want to discard it, simply remove it from your search pattern.

Try:

 import re
 p = re.compile(r'foreign\s+(currency|currencies)')
 m = p.search(f5)
 print(m.start())

This also works with finditer:

 for m in p.finditer(f5):
     print(m.start())
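One more detail behind the (0, 0) in the question, shown as a small sketch with the same assumed example text: the trailing `?` makes the entire group optional, so the pattern can succeed by matching the empty string at index 0, and `re.search` returns that first (empty) match. Without the optional group, the first match is the actual phrase.

```python
import re

f5 = ("while foreign currencies are traded frequently, "
      "very little money is made by most.")

# The question's pattern: the final "?" makes the whole group optional,
# so an empty match at index 0 counts as success.
loose = re.compile(r"((^\s*|\.\s*)foreign\s*(currency|currencies))?")
print(loose.search(f5).span())

# Required group: the search has to find the phrase itself.
strict = re.compile(r"foreign\s+(currency|currencies)")
print(strict.search(f5).span())

# finditer yields every occurrence; this sample text contains only one.
print([m.start() for m in strict.finditer(f5)])
```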

Sam · Answer 3 · 2014-05-13 15:29:54Z

I don't have much experience in Python, so I can't answer your question directly. But if you want the substring starting with the match, why not just match the rest of the string, or remove everything before the match?

Example 1:

Match foreign currenc(y|ies) followed by every remaining character in the string. I used the s modifier so that the dot matches newlines as well.

foreign\s+currenc(?:y|ies).*
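In Python, the "s modifier" corresponds to the `re.DOTALL` flag (alias `re.S`). A minimal sketch of Example 1, assuming a two-line version of the question's sentence so the effect of the flag is visible:

```python
import re

# Assumed sample text with a line break after the comma.
f5 = ("while foreign currencies are traded frequently,\n"
      "very little money is made by most.")

pattern = r"foreign\s+currenc(?:y|ies).*"

with_dotall = re.search(pattern, f5, re.DOTALL)
without_dotall = re.search(pattern, f5)

print(repr(with_dotall.group(0)))     # continues past the newline
print(repr(without_dotall.group(0)))  # stops at the end of the first line
```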

Example 2:

Replace this expression with an empty string. It lazily matches everything up to the point where the lookahead for foreign currenc(y|ies) succeeds.

.*?(?=foreign\s+currenc(?:y|ies))

Note: I changed (currency|currencies) to currenc(?:y|ies) because it is slightly more efficient.
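Example 2 translated to Python as a sketch: `re.sub` with an empty replacement deletes the text in front of the phrase. Passing `count=1` and `re.DOTALL` are additions beyond the answer: the first limits the deletion to the text before the first occurrence, and the second lets the lazy `.*?` cross line breaks.

```python
import re

# Assumed sample text with an extra line before the phrase.
f5 = ("Some introductory text.\n"
      "while foreign currencies are traded frequently, "
      "very little money is made by most.")

# Lazily consume everything up to (but not including) the first
# "foreign currency"/"foreign currencies"; the lookahead consumes nothing.
prefix = r".*?(?=foreign\s+currenc(?:y|ies))"

trimmed = re.sub(prefix, "", f5, count=1, flags=re.DOTALL)
print(repr(trimmed))
```

If the phrase does not occur at all, the lookahead never succeeds and `re.sub` returns the text unchanged.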

