I'm trying to extract the position (index) of a substring using regex. I need to use regex because the string won't be exactly the same. I want to get the position of the substring (either starting or ending position), so I can take the 1,000 characters following that substring.

For example, if I had "while foreign currencies are traded frequently, very little money is made by most." I want to find the position of "foreign currencies" so I can get all the words after.

f5 is the text.

I've tried:

p = re.compile("((^\s*|\.\s*)foreign\s*(currency|currencies))?")
for m in p.finditer(f5):
    print m.start(), m.group()

to get the location. This gives me (0,0) even though I've checked to make sure the regex picks up what I'm looking for in the text.

I've also tried:

location = re.search(r"((^\s*|\.\s*)foreign\s*(currency|currencies))?", f5)
print location

Output is <_sre.SRE_Match at 0x297d3328>

If I try

location.span() 

I get (0,0) again.

Basically, I want to convert <_sre.SRE_Match at 0x297d3328> into an integer that gives the location of the search term.

I've spent half a day searching for a solution. Thanks for any help.

    
Can you give a short, copyable example of an f5 which doesn't work which should? – DSM May 13 '14 at 15:27
    
The SRE_Match is a match object in Python, so you're not going to be converting it at all. You need to extract your matches out of the object via group(), for one instance. – Signus May 13 '14 at 15:38
up vote 1 down vote accepted

In addition to previous solutions/comments, if you want all the words after, you can just do something like:

>>> location = re.search(r".*foreign\s*currenc(y|ies)(.*)", f5)
>>> location.group(2)
' are traded frequently, very little money is made by most.'

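The .group(2) call returns whatever the (.*) at the end of the regexp captured. Note that, by default, the dot does not match newlines, so this returns the text up to the end of the line.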
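A minimal sketch of the same idea with a non-capturing group, so the trailing text lands in group 1. The sample sentence is the one from the question; the variable names match and rest are only illustrative, and the code assumes Python 3.

```python
import re

f5 = "while foreign currencies are traded frequently, very little money is made by most."

# (?:y|ies) groups the alternatives without capturing, so (.*) becomes group 1.
match = re.search(r"foreign\s*currenc(?:y|ies)(.*)", f5)

# re.search returns None when nothing matches, so guard before calling group().
rest = match.group(1) if match else ""
print(rest)
```

The leading .* from the pattern above is dropped here because re.search already scans forward through the string on its own.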

    
Use a non-capturing group (?:y|ies) and (.*) will be captured in group 1 (slightly more logical/readable). – Sam May 13 '14 at 16:08
    
That did the trick! Thanks so much. – user2649353 May 13 '14 at 17:29

Your pattern also matches whatever precedes the word "foreign" (the start of the string or a period, plus any whitespace), so Python counts that as part of your match. If you want to discard it, simply remove it from your pattern.

Try:

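 import re
 # Raw string so \s reaches the regex engine unchanged; f5 is the text from the question.
 p = re.compile(r'foreign\s+(currency|currencies)')
 m = p.search(f5)
 m.start()

This also works with finditer:

 for m in p.finditer(f5):
     print(m.start())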
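
As a rough sketch of why the question's pattern reports (0, 0), and how a stricter pattern yields usable positions: the whole expression in the question is wrapped in an optional group, so an empty match at index 0 satisfies it. The sample sentence is the one from the question; the names original, pattern and following are only illustrative, and the code assumes Python 3.

```python
import re

f5 = "while foreign currencies are traded frequently, very little money is made by most."

# The question's pattern: the outer group ends in '?', so it may match nothing.
original = re.compile(r"((^\s*|\.\s*)foreign\s*(currency|currencies))?")
empty_span = original.search(f5).span()

# Without the optional wrapper, the match must contain the phrase itself.
pattern = re.compile(r"foreign\s+(?:currency|currencies)")
m = pattern.search(f5)
if m is not None:
    start, end = m.start(), m.end()
    # Slicing past the end of a string is safe; it returns whatever is there.
    following = f5[end:end + 1000]
    print(empty_span, start, end)
    print(following)
```

Here m.end() is the index just past the matched phrase, so the slice gives up to 1,000 characters following it.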

I don't have much experience in Python, so I can't directly answer your question. But if you want the substring starting with the match, why not either match the rest of the string along with it or remove everything before the match?

Example 1:

Match foreign currenc(y|ies) followed by every other character in the string. I used the s modifier (re.DOTALL in Python) so that the dot matches newlines as well.

foreign\s+currenc(?:y|ies).*
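
In Python, Example 1 might look roughly like the sketch below. The two-line sample text is made up for illustration (the question's sentence with a line break added), and the names text and tail are hypothetical; the code assumes Python 3.

```python
import re

# Hypothetical two-line sample, to show the effect of re.DOTALL.
text = "while foreign currencies are traded frequently,\nvery little money is made by most."

# re.DOTALL is Python's spelling of the s modifier: '.' also matches '\n'.
m = re.search(r"foreign\s+currenc(?:y|ies).*", text, re.DOTALL)
tail = m.group(0) if m else ""
print(tail)
```

Without re.DOTALL, the match would stop at the end of the first line.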

Example 2:

Replace whatever this expression matches with an empty string. It lazily matches everything up to the point where the lookahead for foreign currenc(y|ies) succeeds.

.*?(?=foreign\s+currenc(?:y|ies))
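
A possible Python rendering of Example 2, using re.sub. The sample sentence is the one from the question; the names text and trimmed are illustrative, and count=1 is an added assumption to make it explicit that only the leading stretch is removed. The code assumes Python 3.

```python
import re

text = "while foreign currencies are traded frequently, very little money is made by most."

# '.*?' is lazy, so it stops at the first place the lookahead succeeds.
# count=1 limits the substitution to that leading stretch of text.
trimmed = re.sub(r".*?(?=foreign\s+currenc(?:y|ies))", "", text, count=1, flags=re.DOTALL)
print(trimmed)
```

If the phrase does not occur at all, the pattern never matches and the text comes back unchanged.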

Note: I changed (currency|currencies) to currenc(?:y|ies) because it is slightly more efficient.

