I want to match the numeric values in a string:

1,000 metric tonnes per contract month
Five cents ($0.05) per tonne
Five cents ($0.05) per tonne
1,000 metric tonnes per contract month

My current approach:

size = re.findall(r'(\d+(,?\d*).*?)', my_string)

What I get with my approach:

print size
[(u'1,000', u',000')]

As you can see, the leading 1 is cut off in the second element of the tuple. Why is that? Also, could I get a hint on how to match the $0.05 terms?

I've edited my code to include '$' as well. –  Ashwini Chaudhary Jun 20 '13 at 12:24

5 Answers

Accepted answer (score 3)

Something like this:

>>> import re
>>> strs = """1,000 metric tonnes per contract month
Five cents ($0.05) per tonne
Five cents ($0.05) per tonne
1,000 metric tonnes per contract month"""
>>> [m.group(0) for m in re.finditer(r'\$?\d+([,.]\d+)?', strs)]
['1,000', '$0.05', '$0.05', '1,000']

Demo: http://rubular.com/r/UomzIY3SD3

This also matches 1,00000 and doesn't match 1,000.05. –  Tim Pietzcker Jun 20 '13 at 12:12
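Tim's comment is easy to check in Python. A minimal sketch reusing the accepted answer's pattern; the helper name whole_matches is only illustrative:

```python
import re

# The accepted answer's pattern: optional "$", digits, then at most one
# "," or "." followed by more digits.
pattern = re.compile(r'\$?\d+([,.]\d+)?')

def whole_matches(text):
    # finditer + group(0) yields the full match even though the pattern
    # has a capturing group (findall would return only that group).
    return [m.group(0) for m in pattern.finditer(text)]

print(whole_matches("1,00000"))   # ['1,00000']  (bad grouping still accepted)
print(whole_matches("1,000.05"))  # ['1,000', '05']  (split into two matches)
```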

If the pattern contains capturing groups, re.findall() returns the groups instead of the whole match (one tuple of all groups per match when there are several), and each set of plain parentheses creates one such group. Use non-capturing (?:...) groups instead:

size = re.findall(r'\d{1,3}(?:,\d{3})*(?:\.\d+)?', my_string)

Explanation:

\d{1,3}      # One to three digits
(?:,\d{3})*  # Optional thousands groups
(?:\.\d+)?   # Optional decimal part
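The commented breakdown above is valid re.VERBOSE syntax, so it can serve as the pattern itself. A minimal sketch; the name NUMBER is illustrative:

```python
import re

# In re.VERBOSE mode, whitespace and "#" comments inside the pattern are ignored.
NUMBER = re.compile(r"""
    \d{1,3}       # one to three digits
    (?:,\d{3})*   # optional thousands groups
    (?:\.\d+)?    # optional decimal part
""", re.VERBOSE)

my_string = """1,000 metric tonnes per contract month
Five cents ($0.05) per tonne
Five cents ($0.05) per tonne
1,000 metric tonnes per contract month"""

# No capturing groups, so findall returns whole matches as strings.
# Note that the "$" is not part of this pattern.
print(NUMBER.findall(my_string))  # ['1,000', '0.05', '0.05', '1,000']
```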

This assumes that all numbers use commas as thousands separators, i.e., no numbers like 1000000. If you need to match those too, use

size = re.findall(r'\d+(?:,\d{3})*(?:\.\d+)?', my_string)
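A short sketch contrasting the question's grouped pattern with the two non-capturing versions; the variable names strict and loose are only for illustration:

```python
import re

text = "1,000 metric tonnes per contract month"

# Two capturing groups: findall returns one (group1, group2) tuple per match,
# which is why the second element is ",000" without the leading 1.
print(re.findall(r'(\d+(,?\d*).*?)', text))  # [('1,000', ',000')]

strict = r'\d{1,3}(?:,\d{3})*(?:\.\d+)?'  # assumes comma thousands separators
loose = r'\d+(?:,\d{3})*(?:\.\d+)?'       # also accepts ungrouped digits

# Only non-capturing groups: findall returns the whole matches.
print(re.findall(strict, "1,000 and 0.05"))  # ['1,000', '0.05']

# The strict pattern splits numbers written without separators...
print(re.findall(strict, "1000"))     # ['100', '0']
print(re.findall(strict, "1000000"))  # ['100', '000', '0']
# ...while the loose one keeps them whole.
print(re.findall(loose, "1000000"))   # ['1000000']
```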
This fails on simple numbers such as '1000'. –  larsmans Jun 20 '13 at 12:11

Try this regex:

(\$?\d+(?:[,.]?\d*(?:\.\d+)?)).*?

Live demo
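A quick check of this pattern in Python against the question's string (a sketch; strs mirrors the accepted answer's variable). The outer parentheses are the only capturing group, so findall returns plain strings:

```python
import re

strs = """1,000 metric tonnes per contract month
Five cents ($0.05) per tonne
Five cents ($0.05) per tonne
1,000 metric tonnes per contract month"""

# The trailing lazy .*? always matches the empty string because nothing
# follows it, so it has no effect on the result.
pattern = r'(\$?\d+(?:[,.]?\d*(?:\.\d+)?)).*?'
print(re.findall(pattern, strs))                 # ['1,000', '$0.05', '$0.05', '1,000']
print(re.findall(pattern, "1,000.05 and 1000"))  # ['1,000.05', '1000']
```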

Have a look at my solution. –  NeverHopeless Jun 20 '13 at 12:26

Why are you grouping your regex? Try this: r'\$?\d+,?\d*\.?\d*'
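A minimal check of this ungrouped pattern; with no capturing groups, findall returns whole matches. Because everything after \d+ is optional, it can also absorb a trailing sentence period, as the second call shows (the sample "Price: 1,000." is made up for illustration):

```python
import re

strs = """1,000 metric tonnes per contract month
Five cents ($0.05) per tonne
Five cents ($0.05) per tonne
1,000 metric tonnes per contract month"""

pattern = r'\$?\d+,?\d*\.?\d*'
print(re.findall(pattern, strs))             # ['1,000', '$0.05', '$0.05', '1,000']
print(re.findall(pattern, "Price: 1,000."))  # ['1,000.']  (period included)
```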


I would try this regex:

r'[0-9]+(?:,[0-9]+)?(?:\.[0-9]+)?'

Add \$? at the beginning to optionally match the $ sign.
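A quick check of this approach with \$? prepended, assuming the comma group is optional (followed by ?) and the decimal point is escaped as \. so it only matches a literal dot:

```python
import re

strs = """1,000 metric tonnes per contract month
Five cents ($0.05) per tonne
Five cents ($0.05) per tonne
1,000 metric tonnes per contract month"""

# Optional "$", digits, an optional ",digits" group, an optional ".digits" group.
pattern = r'\$?[0-9]+(?:,[0-9]+)?(?:\.[0-9]+)?'
print(re.findall(pattern, strs))  # ['1,000', '$0.05', '$0.05', '1,000']

# For comparison: if the comma group is mandatory, a number without a comma,
# such as 0.05, cannot match at all.
print(re.findall(r'\$?[0-9]+(?:,[0-9]+)(?:.[0-9])?', "($0.05)"))  # []
```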
