0

I'm trying to read a .txt file (plain ASCII text from a textbook) that has numbers scattered throughout it. I want to extract those numbers into a list with a regex, convert them to integers, add them up in a sum variable, and print the result. The problem is when I run this code:

import re

hand = open('regexTextData.txt')
numbers = list()
for line in hand:
        if len(line) == 0: continue
        extractedNumbers = re.findall('[0-9+]', line)
        numbers = extractedNumbers + numbers

total = 0
for i in range(len(numbers)):
        value = int(numbers[i])
        total = total + value

print(total)

I run into an error:

Traceback (most recent call last):
  File "sum_numbers_in_text_regex.py", line 13, in <module>
    value = int(numbers[i]) 
ValueError: invalid literal for int() with base 10: '+'

What exactly went wrong here? I tried looking at other solutions but to no avail. If I missed a page that covered it I would like to know please.

Thanks ahead of time for reading

2 Answers

2
Use

for n in range(len(numbers)):

not

for n in len(numbers):

because len(numbers) is an integer, and an integer cannot be iterated over.

FINAL EDIT: FINISHED PROGRAM

import re

hand = open('regexTextData.txt')
numbers = [] # no need of writing out list(), just use []
for line in hand:
        if len(line) == 0: continue
        extractedNumbers = re.findall('[0-9]+', line) # Keep '+' outside the brackets; inside them it matches a literal '+'.
        numbers = extractedNumbers + numbers

total = 0
for i in range(len(numbers)):
        value = int(numbers[i]) # Now all your values in numbers should be in numerical string form.
        total = total + value

print(total)

The only change needed was the regex pattern: '[0-9]+' (or '([0-9]+)', which gives the same result with findall) matches every run of one or more digits. This fixed the program.
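The finished program can also be wrapped in a small function, which makes it easy to try on a few sample lines. This is only a sketch: the name sum_numbers and the sample lines are made up for illustration, and it adds the matches up directly instead of building the list first.

```python
import re

def sum_numbers(lines):
    """Add up every run of digits found in an iterable of text lines."""
    total = 0
    for line in lines:
        # findall returns a list of digit strings, e.g. ['12', '5']
        for match in re.findall('[0-9]+', line):
            total += int(match)
    return total

# Made-up sample lines standing in for the contents of the text file.
sample = ["Chapter 12 begins", "no digits here", "pages 37820 and 5"]
print(sum_numbers(sample))
```

An open file is also an iterable of lines, so the same function should work with `with open('regexTextData.txt') as hand: print(sum_numbers(hand))`.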

Your main problem was the regex. Let's say we had some example text as line = "0 and 1 and 2 and 2 + and yes mate"

re.findall('[0-9+]', line) # Outputs: ['0', '1', '2', '2', '+']. The '+' is matched because the plus sign sits inside the brackets, where it is a literal character.

Solution (move the + outside the brackets, where it means "one or more"):

re.findall('([0-9]+)', line) # Outputs: ['0', '1', '2', '2'] # No more '+'.
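The difference is easier to see on a line that contains a multi-digit number. A small sketch, using a made-up sample string rather than anything from the asker's file:

```python
import re

# Hypothetical sample line, chosen only to show both behaviours.
line = "12 apples + 37820 pears"

# '+' inside the brackets is a literal plus sign; each match is a single character.
inside = re.findall('[0-9+]', line)

# '+' after the brackets is a quantifier; each match is a whole run of digits.
outside = re.findall('[0-9]+', line)

print(inside)   # single characters, including the '+'
print(outside)  # whole numbers as strings
```

With the quantifier outside the brackets, every item in the result is a digit string that int() can convert.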

BONUS: If you are interested, you can also replace this code:

total = 0
for i in range(len(numbers)):
        value = int(numbers[i]) # Now all your values in numbers should be in numerical string form.
        total = total + value

with this simplified code:

total = sum(map(lambda x: int(x), numbers))

lambda creates an anonymous function that takes x as input and returns int(x). map applies a function (here, our lambda) to each element of numbers and returns an iterable of the results. Finally, sum adds up the items of that iterable, which by then are all integers.
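Since int is already a function of one argument, the lambda wrapper can be dropped, and a generator expression does the same job. A short sketch with a made-up list standing in for the findall results:

```python
# Hypothetical list of digit strings, as re.findall('[0-9]+', ...) would return.
numbers = ['0', '1', '2', '37820']

# Pass int directly to map; no lambda needed.
total_map = sum(map(int, numbers))

# Equivalent generator expression.
total_gen = sum(int(n) for n in numbers)

print(total_map, total_gen)
```

Both forms give the same total; which one reads better is mostly a matter of taste.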

I like the solution you posted and it is probably more efficient but for the purposes of understanding regex I need to use regex. Appreciate the alternative solution though.

5
  • thanks. I totally mistyped and fixed the past error. What is wrong now though? Commented Dec 12, 2016 at 8:12
  • Your numbers list does not contain only numeric strings. Please see my edited answer for the new solution. Commented Dec 12, 2016 at 8:28
  • I understand the list of numerical strings but the numbers are greater than 1 digit sometimes. "37820" might be an example of a numerical string you would see in this txt file. Sorry for not being more specific! Commented Dec 12, 2016 at 8:52
  • last edit finished the program. Thank you for your help in identifying the source of the problem your explanation was very clear Commented Dec 12, 2016 at 8:57
  • No worries - I just made a change to the regex code. It will now match numbers that are more than one digit long. Commented Dec 12, 2016 at 9:14
1

You are trying to iterate an integer. Instead, try iterating a range:

for n in range(len(numbers)):
    value = int(numbers[n])
    total = total + value

Also note the change from numbers[i] to numbers[n].
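When only the values are needed, the index can be dropped altogether by looping over the list itself. A minimal sketch with a made-up list of digit strings:

```python
# Hypothetical list of digit strings; in the real program it comes from re.findall.
numbers = ['3', '14', '159']

total = 0
for text in numbers:      # iterate over the items, not over range(len(numbers))
    total += int(text)

print(total)
```

This avoids both the "iterating an integer" mistake and any mismatch between the loop variable and the index name.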

1
  • thanks. I totally mistyped and fixed the past error. What is wrong now though? Commented Dec 12, 2016 at 8:12
