To get a grasp of what to speedup I ran a profiler on your code.
To get the 1 million integers I made a function to generate them, with the following:
def generate_random_numbers(amount=1000000):
import random
n = [str(int(random.normalvariate(0, amount))) for _ in range(amount)]
with open('data', 'w') as f:
f.write('\n'.join(n))
Yes I did put the sigma as one million too.
I then copied your code into a function, removed the file part, and ran the profiler over it.
Resulting in the following code:
import cProfile
def read_nums():
with open('data', 'r') as fp:
return [int(line) for line in fp]
def has_two_sum(n,num_set,nums):
res = any(((n-x) in num_set) and 2*x !=n for x in nums)
return res
def two_sum(nums):
nums = [int(line) for line in nums]
num_set = set(nums)
return sum(1 for n in range(-10000,10001) if has_two_sum(n,num_set,nums))
nums = read_nums()
cProfile.run('two_sum({!r})'.format(nums))
Which resulted in the following profile of your code:
166366 function calls in 6.966 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.362 0.362 0.900 0.900 2sum.py:11(two_sum)
1 0.395 0.395 0.395 0.395 2sum.py:12(<listcomp>)
20002 0.015 0.000 0.139 0.000 2sum.py:14(<genexpr>)
20001 0.025 0.000 0.124 0.000 2sum.py:7(has_two_sum)
106356 0.052 0.000 0.052 0.000 2sum.py:8(<genexpr>)
1 0.216 0.216 1.116 1.116 <string>:1(<module>)
20001 0.047 0.000 0.095 0.000 {built-in method builtins.any}
1 5.851 5.851 6.966 6.966 {built-in method builtins.exec}
1 0.004 0.004 0.143 0.143 {built-in method builtins.sum}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Which is not 25 minuets for 6200 of the 20000.
Instead it's a lot less. It says it took 6.966, but cumtime of two_sums
is 0.900.
I then used a timer on it, which says it runs in ~0.899s.
Using:
print(timeit('two_sum({!r})'.format(nums), 'from __main__ import two_sum', number=100)/100)
So performance wise, I'd say it's not Python's fault, and would be very hard to optimize.
Instead I'd change your code to be a 'three line' function, but format the comprehensions to be more readable.
- I'd remove un-needed brackets
- Use a set compression.
- Remove odd whitespace. (
(1 for ...
)
def two_sum(nums):
nums = {int(n) for n in nums}
return sum(
1
for n in range(-10000, 10001)
if any(n - x in nums and 2 * x != n for x in nums)
)