Optimizing Project Euler Solution #12

Question

Just to reiterate my prev. post, as I think it helps structure my weaknesses.

Coding methodology: Is the algorithm I used feasible or mostly correct? If not, how do I correct it and why won't it work or scale? I realize in some cases there are always going to be optimal solutions that will be over my head mathematically, but generally you can yield a solution that is within a reasonable range of optimal that is passable without knowing a ton of number theory in my experience. Where possible, I hope to learn algorithms along the way (such as the Sieve of Eratosthenes for prime ranges).
Coding execution: Given some pseudocode that we determine algorithmically will work, is the code written executed without redundancies and minimizing runtime?
Being more pythonic: Is something I'm not doing pythonic or can it be done simpler with some sort of library I don't know about? I believe this is mostly semantic, though in some cases can have large effects on performance (for instance, using map() instead of a for loop to apply int()).

I submitted the following code as a solution to Project Euler's #12. It's extremely slow, running in mean 7.13 seconds over 100 trials (I think; just using other submitted solutions as a benchmark. It seems as though solutions <1 seconds are generally decent.)

def factors(n):    
    return set(reduce(list.__add__, 
                ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0)))

def triangle_number(n):
    for i in range(n):
        n += i  
    return n

n = 1
while True:
    if len(factors(triangle_number(n))) >= 500:
        print n, triangle_number(n)   
        break
    else:
        n +=1

The code for the factors function was borrowed from here.

If you're referring to the project euler problem, here you go — mburke05, 6 hours ago

Caridorc · Answer 1 · 2015-09-20 21:23:27Z

up vote 6 down vote

Use itertools

Itertools is a seriously overpowered module in the Python stlib, so use it as much as possible, for example the following:

reduce(list.__add__,

should be:

itertools.chain.from_iterable

or, for people less used to itertools:

flatten = itertools.chain.from_iterable

and you can just use flatten.

And manually incrementing in a while loop should seriously be avoided, use itertools.count, like this:

for i in itertools.count():

(i is usually used as a loop counter)

Math formula

The function:

triangle_number

should be written as return sum(range(n)), but there is a fast O(1) formula for it.

edited 11 hours ago

answered 12 hours ago

Caridorc
8,7602949

Thanks for the input. I didn't totally follow the syntax for list.__add__ so .chain is also confusing for me. Anyway you could clarify on add and .chain and what each do? Also looking at the .count(), isn't it just a wrapper of the same while True function I just wrote out? Or am I missing some nuance that makes it way faster. – mburke05 11 hours ago

@mburke05 Count is in C faster than Python. Avoiding explicit incrementing is good anyway. – Caridorc 11 hours ago

@mburke05 About chain, I mean using flatten instead of reduce and add – Caridorc 11 hours ago

add a comment |

WChargin · Answer 2 · 2015-09-21 04:18:20Z

This is an addition to Caridorc's answer.

The usual reason for using itertools is that you don't want to construct lists when you don't need to. For example, in Python 2, you'd write for i in xrange(100) instead of for i in range(100) so that you don't have to construct a whole list of 100 numbers when you don't need it. If this doesn't make sense to you, familiarize yourself with the difference between generator expressions (genexprs) and list types.

In your case, however, there's a much more serious problem. You've got a Shlemiel the painter's algorithm!

When you write reduce(list.__add__, list_of_lists), you're essentially writing list_1 + ... + list_n. Python will evaluate this as follows:

Take list_1 and list_2. Create a new list object of the proper size, and copy the elements from lists 1 and 2 into this new list. Call it intermediate_1.
Take intermediate_1 and list_3. Create a new list of the proper size, and copy the elements from these two lists into the new list. Call it intermediate_2.
…
Take intermediate_(n-1) and list_n. Create a new list of the proper size, and copy the elements from these two lists into the new list. This is the final result.

Do you see the problem? This code is quadratic in both runtime and memory usage.

Here's some hard data. Consider the following function:

def fn(n):
    lists = [[0] * 10000 for i in xrange(n)]
    combined = reduce(list.__add__, lists)

I ran this for values of n of 1, 10, 100, 200, 250, 300, 400, and 500 (using IPython's %timeit fn(100), which does proper timing), and got the following results:

This definitely looks quadratic, and the coefficient of determination is R² = 0.99655, which indicates an extremely strong correlation.

When we instead use

def fast(n):
    combined = list(itertools.chain.from_iterable(
        [0] * 10000 for i in xrange(n)))

the time for fast(500) is just 65.8 ms instead of 7.42 s. (This shows that it is indeed the Shlemiel factor and not just large lists that's causing the slowdown.)

So, tl;dr, yes, use iterators—but make sure you know why!

So when use `reduce`?

Just a review of what reduce does for binary operators:

reduce(int.__add__, [x1, x2, x3], x0) = (((x0 + x1) + x2) + x3)

This is extended to general functions as follows:

reduce(f, [x1, x2, x3], x0) = f(f(f(x0, x1), x2), x3)

You should use reduce when you want to write code that looks like either of the above examples. Basically, reduce just saves you a loop.

Conceptually, you use reduce when you want to collapse a bunch of things into one value.

It may help to think of reduce in comparison with its other core functional operations:

Use map when you want to transform a bunch of values to other things (like "reverse all these strings"). You use map in list comprehensions when you write [s[::-1] for s in strings].
Use filter when you want to take a bunch of values and keep just some of them (like "take just the strings that are palindromes"). You use reduce in list comprehensions when you write [s for s in strings if s == s[::-1]].
Use reduce when you want to collapse a bunch of things into one value (like "concatenate all these strings"). You can't use reduce in list comprehensions, because you're not creating a new list (just a value).

Notes:

The problem in your case is that the version with list_1 + ... + list_n is just as bad in terms of efficiency. reduce wasn't your problem; it was what reduce expanded to.
With reduce, you can omit the third argument x0, and reduce will use the first element of the sequence, or raise a TypeError if the sequence is empty.

Before noting some examples, it may be helpful to realize that some operations from __builtins__ are just reduce under the hood. For example:

sum(ints) = reduce(int.__add__, ints, 0);
all(bools) = reduce(lambda x, y: x and y, bools, True), and, similarly;
any(bools) = reduce(lambda x, y: x or y, bools, False);
min(ints) is like reduce(lambda x, y: x if x < y else y, ints) (it's actually a bit smarter), and, similarly;
max(ints) is like reduce(lambda x, y: x if x > y else y, ints).

A few quick examples:

def factorial(n):
    return reduce(int.__mul__, xrange(1, n + 1), 1)

def replay_chess_game(moves):
    return reduce(lambda board, move: board.then(move),
                  moves,
                  INITIAL_BOARD)

def perform_git_rebase(commits):
    return reduce(lambda head, commit: head.cherry_pick(commit),
                  commits,
                  git_globals.refs.HEAD)

I don't use reduce all that much in Python, but you should definitely know when to use it.

This is awesome thanks! I had no clue reduce worked in such a fashion. In what scenarios would reduce() be optimal to use rather than something that forwent using intermediary lists in memory? Also, WOW i had no clue generators could make much of a difference. I need to better understand generators and itertools better. Any good resources you'd recommend? — mburke05, 8 hours ago
@mburke05 added a section about reduce. Yeah, you should read up on generators! They're awesome—both in that they save you a ton of memory and runtime, and also in that they're a super powerful language feature that actually lets you "pause" execution of a function and return at a later time. (Look up "coroutines" if you're interested.) I'm not sure what aspect of them you want to learn more about, so I don't know any specific resources. Any thoughts? — WChargin, 4 hours ago

SuperBiasedMan · Answer 3 · 2015-09-21 08:47:42Z

In it's current state, it's hard for me to realise what factors is doing. The name indicates something, and I see it returns a set. So I'd assume that you're getting factors from n but your code is complicated and inscrutable to me. You should add a docstring explaining the basics of your function.

def factors(n):    
    """Return a set of factors by fooing the bar"""
    return set(reduce(list.__add__, 
                ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0)))

It doesn't need to be extensive but a clue about implementation is useful.

triangle_number is also unclearly named, because what it returns is just a sum of numbers up to n. If triangle_number is a particular mathematical or programming concept, you should note that (though you don't need to explain it for people, they can look it up themselves). In this case I knew as you linked to the challenge itself, but bear in mind that the code itself shows no such association.

Also you could just use sum. sum is a built in function that takes a generator expression that @WChargin talked about. In simple terms, a generator is like a for loop condensed into a one line expression. So you could just do this:

def triangle_number(n):
    return sum(i for i in range(n))

asked	today
viewed	207 times
active	today

current community

your communities

more stack exchange communities

Optimizing Project Euler Solution #12

3 Answers 3

So when use `reduce`?

Your Answer

Not the answer you're looking for? Browse other questions tagged python programming-challenge python-2.7 or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Optimizing Project Euler Solution #12

3 Answers 3

So when use reduce?

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python programming-challenge python-2.7 or ask your own question.

Related

Hot Network Questions

So when use `reduce`?