Python implementation of the longest increasing subsequence problem

Question

Prompted by this question on Stack Overflow, I wrote a Python implementation of the longest increasing subsequence problem. In a nutshell, the problem is: given a sequence of numbers, remove the fewest possible elements to obtain an increasing subsequence (the answer is not necessarily unique). It is perhaps best illustrated by an example.

>>> elems
[25, 72, 31, 32, 8, 20, 38, 43, 85, 39, 33, 40, 98, 37, 14]
>>> subsequence(elems)
[25, 31, 32, 38, 39, 40, 98]

The code below works, but I am sure it could be made shorter and/or more readable. Can any more experienced Python coders offer some suggestions?

Edited to add a description: The algorithm iterates over the input array, X, while keeping track of the length of the longest increasing subsequence found so far (L). It also maintains an array M of length L, where M[j] is the index in X of the final element of the best subsequence of length j found so far; "best" means the one that ends on the lowest element. Finally, it maintains an array P that forms a linked list of indices in X of the best possible subsequences (e.g. P[j], P[P[j]], P[P[P[j]]], ... is the best subsequence ending with X[j], in reverse order). P is not needed if only the length of the longest increasing subsequence is required.
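To illustrate the last point of the description, here is a minimal sketch (Python 3) of a length-only version that needs neither M's indices nor P. The names lis_length and tails are hypothetical and not part of the posted code; tails stores the values X[M[j]] directly rather than their indices.

```python
from bisect import bisect_left


def lis_length(seq):
    """Return the length of the longest strictly increasing subsequence."""
    # tails[k] = smallest value that ends an increasing subsequence of
    # length k + 1 found so far; tails is always sorted.
    tails = []
    for value in seq:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)   # value extends the longest subsequence
        else:
            tails[k] = value      # value is a lower ending for length k + 1
    return len(tails)


elems = [25, 72, 31, 32, 8, 20, 38, 43, 85, 39, 33, 40, 98, 37, 14]
print(lis_length(elems))  # 7
```

Because only the values are kept, this sketch cannot reconstruct the subsequence itself; that is what the predecessor links are for.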

from random import randrange
from bisect import bisect_left


def randomList(N, max):
    return [randrange(max) for x in xrange(N)]


def subsequence(seq):
    """Returns the longest subsequence (non-contiguous) of seq that is
    strictly increasing.

    """
    # head[j] = index in 'seq' of the final member of the best subsequence
    # of length 'j + 1' yet found
    head = [0]
    # predecessor[j] = linked list of indices of best subsequence ending
    # at seq[j], in reverse order
    predecessor = [-1]
    for i in xrange(1, len(seq)):
        ## Find j such that:  seq[head[j - 1]] < seq[i] <= seq[head[j]]
        ## seq[head[j]] is increasing, so use binary search.
        j = bisect_left([seq[head[idx]] for idx in xrange(len(head))], seq[i])

        if j == len(head):
            head.append(i)
        if seq[i] < seq[head[j]]:
            head[j] = i

        predecessor.append(head[j - 1] if j > 0 else -1)

    ## trace subsequence back to output
    result = []
    trace_idx = head[-1]
    while (trace_idx >= 0):
        result.append(seq[trace_idx])
        trace_idx = predecessor[trace_idx]

    return result[::-1]


l1 = randomList(15, 100)

Could you please provide a description of your algorithm? It would be very helpful in reviewing your code. What is the role of M and P?
@rik The algorithm, including the M terminology, is taken from the linked Wikipedia article. I will edit to add a description of what it does in my own words...
I have just realized that I had mistitled this post as the longest "decreasing subsequence problem", which differs from this implementation by a sign change.

Winston Ewert · Accepted Answer · 2012-03-21 22:26:46Z

from random import randrange
from itertools import islice

def randomSeq(max):

The Python style guide recommends lowercase_with_underscores for function names.

  while True: yield randrange(max)



def randomList(N,max):
  return list(islice(randomSeq(max),N))

I suspect that's not a very efficient way to produce a random list. I think the following may be a better option:

return [randrange(max) for x in range(N)]

I've not done any benchmarking so I may be wrong.
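For anyone who wants to check, a rough timing sketch along these lines should settle it (Python 3). The function names, the list size of 1000 and the repeat count of 200 are arbitrary choices for illustration, and the numbers will vary by machine, so no result is claimed here.

```python
from itertools import islice
from random import randrange
from timeit import timeit


def random_seq(max_value):
    """Yield random integers in [0, max_value) forever."""
    while True:
        yield randrange(max_value)


def random_list_islice(n, max_value):
    # Original approach: slice n items off an endless generator.
    return list(islice(random_seq(max_value), n))


def random_list_comprehension(n, max_value):
    # Suggested approach: a plain list comprehension.
    return [randrange(max_value) for _ in range(n)]


for func in (random_list_islice, random_list_comprehension):
    seconds = timeit(lambda: func(1000, 100), number=200)
    print("%s: %.4f s" % (func.__name__, seconds))
```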

## Returns the longest subsequence (non-contiguous) of X that is strictly increasing.
def subsequence(X):
    L = 1     ## length of longest subsequence (initially: just first element)
    M = [0]   ## M[j] = index in X of the final member of the lowest subsequence of length 'j' yet found
    P = [-1]

Consider using names instead of these single letter variables.

    for i in range(1,X.__len__()):

Don't use X.__len__(); use len(X).

        ## Find largest j <= L such that: X[M[j]] < X[i].
        ## X[M[j]] is increasing, so use binary search over j.

This whole bit about the binary search would do well in a separate function, so as not to clutter the subsequence implementation. Actually, Python already provides a binary search function in the bisect module.

        j = -1
        start = 0
        end = L - 1
        going = True
        while going:
            if (start == end):

You don't need those parentheses.

                if (X[M[start]] < X[i]):
                    j = start
                going = False

Use break. break isn't pretty when you can avoid it, but setting boolean flags is worse.

            else:
                partition = 1 + ((end - start - 1) / 2)

Notice how you only use start + partition? Calculate the middle element instead. It'll make this whole section a bit neater.

                if (X[M[start + partition]] < X[i]):
                    start += partition
                    j = start
                else:
                    end = start + partition - 1

        if (j >= 0):
            P.append(M[j])
        else:
            P.append(-1)

        j += 1
        if (j == L):
            M.append(i)
            L += 1
        if (X[i] < X[M[j]]):
            M[j] = i

    ## trace subsequence back to output
    result = []
    trace_idx = M[L-1]
    while (trace_idx >= 0):
        result.append(X[trace_idx])
        trace_idx = P[trace_idx]

    return list(result.__reversed__())

Use reversed(result) instead of calling result.__reversed__() directly.

l1 = randomList(15,100)
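Putting the binary-search remarks together (a separate function, a midpoint instead of start + partition, no boolean flag), the hand-written search might look roughly like the sketch below (Python 3). The name last_index_below and its parameters are hypothetical; it assumes, as the original comments state, that seq[head[j]] is increasing in j.

```python
def last_index_below(seq, head, value):
    """Return the largest j with seq[head[j]] < value, or -1 if none exists.

    Assumes seq[head[0]], seq[head[1]], ... is sorted in increasing order.
    """
    low, high = 0, len(head)          # search window is head[low:high]
    while low < high:
        mid = (low + high) // 2       # middle element of the window
        if seq[head[mid]] < value:
            low = mid + 1             # head[mid] qualifies; look further right
        else:
            high = mid                # head[mid] is too large; look left
    return low - 1                    # low counts the entries below value


seq = [25, 72, 31]
head = [0, 2]                         # seq[head[j]] is [25, 31]
print(last_index_below(seq, head, 30))   # 0
print(last_index_below(seq, head, 10))   # -1
```

The loop computes the same position as bisect_left on the values seq[head[j]], so in practice the bisect module can replace it entirely.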

Thanks, I've made some of the simpler edits, I will be looking at the rest...
Using 'bisect' rather than custom code is a big improvement, I think - it's getting there!

Rik Poggi · Answer 2 · 2012-03-22 11:45:04Z

PEP 8 is the Python style guide. A good start for improving your style is to run the pep8 script against your file. Other style "problems" you'll have to find yourself (or ask here on Code Review :)

Comments and docstrings

Check the PEP 8 section on comments.

This:

## Returns the longest subsequence (non-contiguous) of X that is strictly increasing.
def subsequence(X):

Should be a docstring:

def subsequence(X):
    """Returns the longest subsequence (non-contiguous) of X that is strictly
    increasing.

    """

Variable names

I know that you kept the Wikipedia variable names, but you should try to make your script consistent with itself. So stretch your imagination and choose names that are short, sweet and meaningful. You can always leave a note at the beginning that links to the Wikipedia page and says which variable is which.

With this:

L = 1     ## length of longest subsequence (initially: just first element)

I'd do:

# length of longest subsequence (initially: just first element)
len_longest = 1

Since the comment was a bit too long to be an inline comment, I moved it. A good name for X might be seq. For M, maybe pos_smallest, and for P, pos_predecessor or something like that. With a better understanding of the underlying algorithm you may find better names :)

`range()` and `xrange()`

Check the documentation and you'll see that, in Python 2, range() returns a list while xrange() returns an object that produces its values lazily.

So, this:

for i in range(1,len(X)):

Should be:

for i in xrange(1, len(seq)):

Note that in Python 3 xrange() is gone and range() behaves lazily, as xrange() did.
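As a small illustration of the Python 3 behaviour (the variable names are only for this example):

```python
indices = range(1, 1000000)        # lazy: no million-element list is built
print(type(indices).__name__)      # range
print(indices[10], len(indices))   # indexing and len() still work: 11 999999

as_list = list(range(1, 5))        # build a real list only when one is needed
print(as_list)                     # [1, 2, 3, 4]
```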

Reverse a list

This line:

return list(reversed(result))

Should be:

return result[::-1]

Besides being clearer, it works the same way in Python 2 and Python 3.
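A quick check that the two spellings give the same list (Python 3; the sample values are the traced-back result from the question's example):

```python
result = [98, 40, 39, 38, 32, 31, 25]   # values collected from last to first

by_slice = result[::-1]                 # reversed copy via slicing
by_reversed = list(reversed(result))    # same contents via an iterator

print(by_slice)                 # [25, 31, 32, 38, 39, 40, 98]
print(by_slice == by_reversed)  # True
```

Both forms build a new list and leave result untouched; result.reverse() would instead reverse it in place.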

Useless brackets are useless

Python is not C, so don't put parentheses everywhere.

Here:

while (trace_idx >= 0):

Could just be:

while trace_idx >= 0:

Do the same thing in another couple of places.

Binary search

Your use of bisect_left looks really strange:

j = bisect_left([X[M[idx]] for idx in range(L)], X[i])

And the previous version looks even stranger:

while True:
    if (start == end):
        if (X[M[start]] < X[i]):
            j = start
        break
    else:
        partition = 1 + ((end - start - 1) / 2)
        if (X[M[start + partition]] < X[i]):
            start += partition
            j = start
        else:
            end = start + partition - 1

Which could have easily been:

while start != end:
    partition = 1 + (end - start - 1) / 2
    if X[M[start + partition]] < X[i]:
        start += partition
        j = start
    else:
        end = start + partition - 1

if X[M[start]] < X[i]:
    j = start

But you shouldn't do it like that anyway. Take a look at Binary Search in Python: it has both solutions. The accepted one is in "pure" Python (the one that I like the most), and there is also one that uses bisect_left. If you care about performance, time them both.
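For reference, the bisect_left flavour of a plain binary search usually looks something like this sketch (Python 3). The function name and the -1 "not found" convention are choices made for this example, not taken from the linked question.

```python
from bisect import bisect_left


def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    pos = bisect_left(sorted_values, target)   # leftmost insertion point
    if pos < len(sorted_values) and sorted_values[pos] == target:
        return pos
    return -1


values = [8, 20, 32, 38]
print(binary_search(values, 32))   # 2
print(binary_search(values, 33))   # -1
```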

This is just a pointer. I would have really liked to give you the code for the binary search, but you didn't really explain how your algorithm works (and it's quite different from the one shown in the Wikipedia article), so I wasn't able to improve yours without changing its core.

Usually one would allocate the entire M and P lists at the beginning, something like:

M = [0] * len(seq)

That way you won't have to deal with append and so on, which is quite hard to follow.
I also couldn't make sense of your -1 offsets; are they really necessary? That's the main difference I've found from the "official" algorithm.
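To make the preallocation idea concrete, here is a sketch (Python 3) that follows the layout of the Wikipedia pseudocode as I understand it, using the names suggested earlier. Slot 0 of pos_smallest acts as a sentinel, which is one way to avoid the special case for "no predecessor"; treat the details as an assumption rather than a drop-in replacement for your code.

```python
def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence of seq."""
    n = len(seq)
    # pos_smallest[k] = index in seq of the smallest final element of any
    # increasing subsequence of length k found so far (slot 0 is a sentinel).
    pos_smallest = [-1] * (n + 1)
    # pos_predecessor[i] = index of the element before seq[i] in the best
    # subsequence ending at seq[i], or -1 if seq[i] starts it.
    pos_predecessor = [-1] * n
    longest = 0
    for i in range(n):
        # Binary search over lengths 1..longest for where seq[i] fits.
        low, high = 1, longest + 1
        while low < high:
            mid = (low + high) // 2
            if seq[pos_smallest[mid]] < seq[i]:
                low = mid + 1
            else:
                high = mid
        # low is the length of the best subsequence ending at seq[i].
        pos_predecessor[i] = pos_smallest[low - 1]
        pos_smallest[low] = i
        longest = max(longest, low)

    # Walk the predecessor links backwards, filling the result from the end.
    result = [None] * longest
    k = pos_smallest[longest]
    for j in range(longest - 1, -1, -1):
        result[j] = seq[k]
        k = pos_predecessor[k]
    return result


elems = [25, 72, 31, 32, 8, 20, 38, 43, 85, 39, 33, 40, 98, 37, 14]
print(longest_increasing_subsequence(elems))
# [25, 31, 32, 38, 39, 40, 98]
```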

Thanks; I've incorporated the stylistic changes. I'll be looking into the bit about the binary search.
The reason for growing head (previously M) as the algorithm proceeds is that it need not grow to the size of seq, and in fact can end up much smaller. But yes, it could just be allocated initially to be the size of seq.
The core idea of the algorithm is that seq[head[idx]] (formerly X[M[j]]) stays sorted on idx -- yes it looks a bit odd, but I can't think of a better way to write it...
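One possible way to write it, sketched here in Python 3 as an assumption rather than a tested replacement: keep a second list holding the values seq[head[idx]] alongside head, so that bisect_left can search it directly instead of rebuilding a list on every iteration. The name head_values is hypothetical. Unlike the posted code, this version also overwrites head[j] when the values are equal and returns an empty list for empty input.

```python
from bisect import bisect_left


def subsequence(seq):
    """Return one longest strictly increasing subsequence of seq."""
    head = []         # head[j] = index in seq ending the best subsequence of length j + 1
    head_values = []  # head_values[j] == seq[head[j]]; always sorted
    predecessor = []  # predecessor[i] = index before i in its subsequence, or -1
    for i, value in enumerate(seq):
        j = bisect_left(head_values, value)
        if j == len(head):
            head.append(i)
            head_values.append(value)
        else:
            head[j] = i
            head_values[j] = value
        predecessor.append(head[j - 1] if j > 0 else -1)

    # Trace the subsequence back from the end of the longest one.
    result = []
    trace_idx = head[-1] if head else -1
    while trace_idx >= 0:
        result.append(seq[trace_idx])
        trace_idx = predecessor[trace_idx]
    return result[::-1]


elems = [25, 72, 31, 32, 8, 20, 38, 43, 85, 39, 33, 40, 98, 37, 14]
print(subsequence(elems))  # [25, 31, 32, 38, 39, 40, 98]
```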

