Python implementation of the longest increasing subsequence

Question

Prompted by this question on Stack Overflow, I wrote an implementation in Python of the longest increasing subsequence problem. In a nutshell, the problem is: given a sequence of numbers, remove the fewest possible to obtain an increasing subsequence (the answer is not unique).

Perhaps it is best illustrated by example:

>>> elems
[25, 72, 31, 32, 8, 20, 38, 43, 85, 39, 33, 40, 98, 37, 14]
>>> subsequence(elems)
[25, 31, 32, 38, 39, 40, 98]

The algorithm iterates over the input array, X, while keeping track of the length of the longest increasing subsequence found so far (L). It also maintains an array M of length L where M[j] = "the index in X of the final element of the best subsequence of length j + 1 found so far" (M is 0-indexed), where best means the one that ends on the lowest element.

It also maintains an array P which constitutes a linked list of indices in X of the best possible subsequences (i.e. j, P[j], P[P[j]], P[P[P[j]]], ... is the best subsequence ending with X[j], in reverse order, terminated by -1). P is not needed if only the length of the longest increasing subsequence is required.
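To illustrate the last point, here is a minimal length-only sketch (the name `lis_length` is hypothetical). It swaps the hand-written binary search for `bisect.bisect_left` and stores tail values rather than indices into X, since without P there is nothing to reconstruct:

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of a longest strictly increasing subsequence of seq.

    tails[k] is the smallest value that ends an increasing subsequence of
    length k + 1 seen so far. tails stays sorted, so bisect can search it.
    """
    tails = []
    for x in seq:
        k = bisect_left(tails, x)  # first tail >= x, so equal values never extend
        if k == len(tails):
            tails.append(x)        # x extends the longest subsequence found so far
        else:
            tails[k] = x           # x is a lower ending for length k + 1
    return len(tails)
```

On the `elems` list from the example above it returns 7, the length of `[25, 31, 32, 38, 39, 40, 98]`.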

The code below works, but I am sure it could be made shorter and/or more readable. Can any more experienced Python coders offer some suggestions?

from random import randrange
from itertools import islice

def randomSeq(max):
  while True: yield randrange(max)

def randomList(N,max):
  return list(islice(randomSeq(max),N))

## Returns the longest subsequence (non-contiguous) of X that is strictly increasing.
def subsequence(X):
    L = 1     ## length of longest subsequence (initially: just first element)
    M = [0]   ## M[j] = index in X of the final member of the lowest subsequence of length 'j' yet found
    P = [-1]
    for i in range(1,X.__len__()):
        ## Find largest j <= L such that: X[M[j]] < X[i].
        ## X[M[j]] is increasing, so use binary search over j.
        j = -1
        start = 0
        end = L - 1
        going = True
        while going:
            if (start == end):
                if (X[M[start]] < X[i]):
                    j = start
                going = False
            else:
                partition = 1 + ((end - start - 1) // 2)
                if (X[M[start + partition]] < X[i]):
                    start += partition
                    j = start
                else:
                    end = start + partition - 1

        if (j >= 0):
            P.append(M[j])
        else:
            P.append(-1)

        j += 1
        if (j == L):
            M.append(i)
            L += 1
        if (X[i] < X[M[j]]):
            M[j] = i

    ## trace subsequence back to output
    result = []
    trace_idx = M[L-1]
    while (trace_idx >= 0):
        result.append(X[trace_idx])
        trace_idx = P[trace_idx]

    return list(result.__reversed__())


l1 = randomList(15,100)

See the revised version below, posted as a separate answer.

Winston Ewert · Accepted Answer · 2012-03-21 22:26:46Z

from random import randrange
from itertools import islice

def randomSeq(max):

The Python style guide (PEP 8) recommends lowercase_with_underscores for function names, e.g. random_seq.

  while True: yield randrange(max)



def randomList(N,max):
  return list(islice(randomSeq(max),N))

I suspect that's not a really efficient way to produce a random list. I think the following may be a better option:

return [randrange(max) for x in range(N)]

I've not done any benchmarking so I may be wrong.
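If you want to check, a rough timing sketch with the standard `timeit` module could look like this. The function names are hypothetical, and `max` is renamed `max_value` to avoid shadowing the built-in. No results are claimed here, since they depend on the machine and Python version:

```python
import timeit
from functools import partial
from itertools import islice
from random import randrange


def random_seq(max_value):
    """Endless stream of random ints in [0, max_value)."""
    while True:
        yield randrange(max_value)


def random_list_islice(n, max_value):
    """The question's approach: slice n items off the endless generator."""
    return list(islice(random_seq(max_value), n))


def random_list_comprehension(n, max_value):
    """The suggested approach: a plain list comprehension."""
    return [randrange(max_value) for _ in range(n)]


if __name__ == "__main__":
    for fn in (random_list_islice, random_list_comprehension):
        # best of 5 runs, each building 20 lists of 1000 numbers
        best = min(timeit.repeat(partial(fn, 1000, 100), number=20, repeat=5))
        print(f"{fn.__name__}: {best:.4f} s")
```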

## Returns the longest subsequence (non-contiguous) of X that is strictly increasing.
def subsequence(X):
    L = 1     ## length of longest subsequence (initially: just first element)
    M = [0]   ## M[j] = index in X of the final member of the lowest subsequence of length 'j' yet found
    P = [-1]

Consider using descriptive names instead of these single-letter variables.

    for i in range(1,X.__len__()):

Don't use X.__len__(); use len(X).

        ## Find largest j <= L such that: X[M[j]] < X[i].
        ## X[M[j]] is increasing, so use binary search over j.

This whole binary-search section would do well in a separate function, so as not to clutter the subsequence implementation. Actually, Python already provides a binary search function in the bisect module.
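A minimal sketch of that refactoring, assuming the invariant described in the question (X[M[0]] < X[M[1]] < ...). The helper name `longest_prefix_below` is hypothetical; it returns the same j as the hand-written loop (the largest j with X[M[j]] < x, or -1). Building the `tails` list on every call costs O(len(M)) time, so this trades some speed for clarity:

```python
from bisect import bisect_left


def longest_prefix_below(X, M, x):
    """Largest j with X[M[j]] < x, or -1 if there is none.

    Hypothetical helper; assumes X[M[0]] < X[M[1]] < ... as in the question.
    """
    tails = [X[m] for m in M]         # the sorted values the loop searched over
    return bisect_left(tails, x) - 1  # bisect_left counts the tails that are < x
```

Inside the main loop the whole search then becomes `j = longest_prefix_below(X, M, X[i])`.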

        j = -1
        start = 0
        end = L - 1
        going = True
        while going:
            if (start == end):

You don't need those parentheses.

                if (X[M[start]] < X[i]):
                    j = start
                going = False

Use break. break isn't pretty when you can avoid it, but setting boolean flags is worse.

            else:
                partition = 1 + ((end - start - 1) // 2)

Notice how you only use start + partition? Calculate the middle element instead. It'll make this whole section a bit neater.
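With the middle element computed directly, the search might look like this sketch. It is written as a stand-alone function under a hypothetical name, `longest_prefix_below`, returning the largest j with X[M[j]] < x (or -1), which is what the question's loop computes; it assumes X[M[0]] < X[M[1]] < ... as described in the question:

```python
def longest_prefix_below(X, M, x):
    """Largest j with X[M[j]] < x, or -1 if there is none.

    Hypothetical helper; assumes X[M[0]] < X[M[1]] < ... as in the question.
    """
    j = -1
    lo, hi = 0, len(M) - 1
    while lo <= hi:
        mid = (lo + hi) // 2   # the middle element, instead of start + partition
        if X[M[mid]] < x:
            j = mid            # mid qualifies; look for a larger one to the right
            lo = mid + 1
        else:
            hi = mid - 1       # X[M[mid]] is too big; look to the left
    return j
```

The loop condition replaces both the `going` flag and the `start == end` special case.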

                if (X[M[start + partition]] < X[i]):
                    start += partition
                    j = start
                else:
                    end = start + partition - 1

        if (j >= 0):
            P.append(M[j])
        else:
            P.append(-1)

        j += 1
        if (j == L):
            M.append(i)
            L += 1
        if (X[i] < X[M[j]]):
            M[j] = i

    ## trace subsequence back to output
    result = []
    trace_idx = M[L-1]
    while (trace_idx >= 0):
        result.append(X[trace_idx])
        trace_idx = P[trace_idx]

    return list(result.__reversed__())

Use reversed(result) instead of calling result.__reversed__() directly.

l1 = randomList(15,100)

Rik Poggi · Answer 2 · 2012-03-22 11:45:04Z

PEP 8 is the Python style guide. A good start on improving your style is to run the pep8 script (since renamed pycodestyle) against your file. Other style "problems" you'll have to find yourself (or ask here on Code Review :)

Comments and docstrings

Check the Comments section of PEP 8.

This:

## Returns the longest subsequence (non-contiguous) of X that is strictly increasing.
def subsequence(X):

Should be a docstring:

def subsequence(X):
    """Returns the longest subsequence (non-contiguous) of X that is strictly
    increasing.

    """

Variable names

I know that you kept the Wikipedia variable names, but you should try to make your script consistent with itself. So, stretch your variable-naming imagination and choose something short, sweet and meaningful. You can always leave a note at the beginning, linking to the Wikipedia page and saying which variable is which.

With this:

L = 1     ## length of longest subsequence (initially: just first element)

I'd do:

# length of longest subsequence (initially: just first element)
len_longest = 1

Since the comment was a bit too long to be an inline comment, I moved it. A good name for X might be seq; for M maybe pos_smallest, and for P pos_predecessor or something like that. With a better understanding of the underlying algorithm you may find better names :)

`range()` and `xrange()`

In Python 2, range() builds the whole list in memory, while xrange() returns a lazy object that produces the numbers on demand.

So, this:

for i in range(1,len(X)):

Should be:

for i in xrange(1, len(seq)):

Note that in Python 3, xrange() was renamed range() (the list-building range() is gone).

Reverse a list

This line:

return list(reversed(result))

Should be:

return result[::-1]

Besides being clearer, it returns a list directly, without the extra list() call. (Both forms work in Python 2 and Python 3.)
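For comparison, here are the three usual ways to reverse a list, as a small self-checking sketch:

```python
result = [1, 2, 3]

by_slice = result[::-1]               # new list; result is unchanged
by_reversed = list(reversed(result))  # reversed() gives an iterator; list() copies it
assert by_slice == by_reversed == [3, 2, 1]
assert result == [1, 2, 3]

result.reverse()                      # reverses in place and returns None
assert result == [3, 2, 1]
```

reversed() on its own is the better choice when you only need to iterate once, since it does not build a new list.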

Useless brackets are useless

Python is not C so don't put brackets everywhere.

Here:

while (trace_idx >= 0):

Could just be:

while trace_idx >= 0:

Do the same thing in another couple of places.

Binary search

Your use of bisect_left looks really strange:

j = bisect_left([X[M[idx]] for idx in range(L)], X[i])

And the previous version even more so:

while True:
    if (start == end):
        if (X[M[start]] < X[i]):
            j = start
        break
    else:
        partition = 1 + ((end - start - 1) // 2)
        if (X[M[start + partition]] < X[i]):
            start += partition
            j = start
        else:
            end = start + partition - 1

Which could have easily been:

while start != end:
    partition = 1 + (end - start - 1) // 2
    if X[M[start + partition]] < X[i]:
        start += partition
        j = start
    else:
        end = start + partition - 1

if X[M[start]] < X[i]:
    j = start

But you shouldn't do it like that anyway. Take a look at Binary Search in Python: it has both solutions, a "pure" Python one (the accepted answer, and the one I like most) and one using bisect_left. If you care about performance, time them both.

This is just a pointer. I would really have liked to give you the code for the binary search, but you didn't really explain how your algorithm works (and it's quite different from the one shown in the Wikipedia article), so I wasn't able to improve yours without changing its core.
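One way to follow the "time them both" advice is sketched below. `binary_search_left` is a hypothetical pure-Python stand-in written to return the same index as `bisect.bisect_left`, which in CPython is backed by a C implementation. The list sizes are arbitrary assumptions, and no timings are claimed:

```python
import timeit
from bisect import bisect_left
from random import randrange


def binary_search_left(a, x):
    """Pure-Python equivalent of bisect_left for a sorted list a."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1   # everything up to mid is < x
        else:
            hi = mid       # a[mid] >= x, so the answer is at most mid
    return lo


def compare(n=10_000, queries=1_000, repeat=5):
    """Best-of-`repeat` times for `queries` lookups in a sorted list of n ints."""
    a = sorted(randrange(10 * n) for _ in range(n))
    xs = [randrange(10 * n) for _ in range(queries)]
    # Sanity check first: timing two searches only makes sense if they agree.
    assert all(binary_search_left(a, x) == bisect_left(a, x) for x in xs)
    pure = min(timeit.repeat(lambda: [binary_search_left(a, x) for x in xs],
                             number=10, repeat=repeat))
    builtin = min(timeit.repeat(lambda: [bisect_left(a, x) for x in xs],
                                number=10, repeat=repeat))
    return pure, builtin


if __name__ == "__main__":
    pure, builtin = compare()
    print(f"pure Python: {pure:.4f} s   bisect_left: {builtin:.4f} s")
```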

Usually what one should do is allocate the entire M and P lists at the beginning, something like:

M = [0] * len(seq)

That way you won't have to deal with append and so on, which are quite hard to follow.
I also couldn't get my head around your -1 offsets: are they really necessary? That's the main difference I found from the "official" algorithm.
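A sketch of the preallocation idea, assuming that spending O(len(seq)) memory up front is acceptable (the growing list can end up much smaller). All names are hypothetical. It keeps a parallel list of tail values so that the `lo`/`hi` arguments of `bisect_left` can restrict the search to the filled prefix, and it uses None instead of -1 to mark the end of a predecessor chain:

```python
from bisect import bisect_left


def subsequence_preallocated(seq):
    """Longest strictly increasing subsequence, with lists sized up front.

    Only the first `length` entries of tail_idx/tail_val are meaningful.
    """
    n = len(seq)
    if n == 0:
        return []
    tail_idx = [0] * n        # index in seq of the best tail for each length
    tail_val = [None] * n     # seq[tail_idx[k]], kept sorted for bisect
    predecessor = [None] * n  # previous index in the best subsequence ending at i
    length = 0
    for i, x in enumerate(seq):
        k = bisect_left(tail_val, x, 0, length)  # search only the used prefix
        predecessor[i] = tail_idx[k - 1] if k > 0 else None
        tail_idx[k] = i
        tail_val[k] = x
        if k == length:
            length += 1
    # Walk the predecessor links back from the end of the longest subsequence.
    result = []
    i = tail_idx[length - 1]
    while i is not None:
        result.append(seq[i])
        i = predecessor[i]
    return result[::-1]
```

On the `elems` example from the question this returns `[25, 31, 32, 38, 39, 40, 98]`.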

The reason for growing head (previously M) as the algorithm proceeds is that it need not grow to the size of seq, and in fact can end up much smaller. But yes, it could just be allocated initially to be the size of seq. — gcbenison, Mar 22 '12 at 18:36
The core idea of the algorithm is that seq[head[idx]] (formerly X[M[j]]) stays sorted on idx -- yes it looks a bit odd, but I can't think of a better way to write it... — gcbenison, Mar 22 '12 at 18:50

arekolek · Answer 3 · 2016-07-12 19:16:40Z

Note: some of the suggestions here concern the revised version of the code, which was posted as a separate answer.

List comprehension in bisect

You could rewrite:

j = bisect_left([seq[head[idx]] for idx in xrange(len(head))], seq[i])

trivially as:

j = bisect_left([seq[k] for k in head], seq[i])

There's nothing to gain with the original approach.

Or, if you make seq a numpy.array, then you could use:

j = bisect_left(seq[head], seq[i])

If-statements in the for-loop

You don't really need to check if seq[i] < seq[head[j]].

From bisect_left line we know that all(seq[i] <= v for v in seq[head][j:]), so your check is equivalent to if seq[i] != seq[head[j]].

But I see no reason why you can't update head[j] = i in that case. Values in seq[head] stay the same, since seq[i] == seq[head[j]]. It also won't affect predecessor paths of longer subsequences.

On the other hand, when j == len(head) the code has just appended i, so seq[i] == seq[head[j]] and the comparison is false anyway; so maybe you just meant:

if j == len(head): head.append(i)
else: head[j] = i

to avoid unnecessary assignments. And that's ok. But in this particular case, I propose to ask forgiveness, not permission:

try: head[j] = i
except IndexError: head.append(i)

Parentheses

Also, you don't need the parentheses in while (trace_idx >= 0):, this should be just:

while trace_idx >= 0:

Generating random numbers

You don't need randomList if you use numpy:

from numpy.random import randint

l1 = list(randint(100, size=15))

Miscellaneous suggestions

You may consider renaming the function to include the full name.
You may consider handling the empty sequence correctly.
I think it is possible to have a better name for the head variable.
Using None gives better readability to mark that there are no predecessors.
In some applications, it may be beneficial to return indices instead of actual values.
It is possible to generate the indices in correct order directly, using recursion.

Here's the function with all remarks applied:

from bisect import bisect_left

def longest_increasing_subsequence(seq):
  if not seq: return seq

  lastoflength = [0] # end position of subsequence with given length
  predecessor = [None] # penultimate element of l.i.s. ending at given position

  for i in range(1, len(seq)):
    # seq[i] can extend a subsequence that ends with a smaller element
    j = bisect_left([seq[k] for k in lastoflength], seq[i])
    # update existing subsequence of length j or extend the longest
    try: lastoflength[j] = i
    except IndexError: lastoflength.append(i)
    # remember element before seq[i] in the subsequence
    predecessor.append(lastoflength[j-1] if j > 0 else None)

  # return indices ..., p(p(p(i))), p(p(i)), p(i), i
  def trace(i):
    if i is not None:
      yield from trace(predecessor[i])
      yield i
  return list(trace(lastoflength[-1]))
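One caveat on the recursive trace: each step nests another generator, so in CPython with the default recursion limit (1000, see `sys.getrecursionlimit()`) it can raise RecursionError when the longest increasing subsequence is longer than roughly that, for example on an already sorted list of a few thousand elements. A variant with an iterative trace is sketched below; apart from the loop, it uses an explicit length check instead of try/except and returns `[]` for empty input:

```python
from bisect import bisect_left


def longest_increasing_subsequence(seq):
    """Indices of a longest strictly increasing subsequence of seq."""
    if not seq:
        return []
    lastoflength = [0]    # end position of subsequence with given length
    predecessor = [None]  # penultimate element of l.i.s. ending at given position
    for i in range(1, len(seq)):
        j = bisect_left([seq[k] for k in lastoflength], seq[i])
        if j == len(lastoflength):
            lastoflength.append(i)  # extend the longest subsequence
        else:
            lastoflength[j] = i     # lower ending for length j + 1
        predecessor.append(lastoflength[j - 1] if j > 0 else None)
    # Follow the predecessor chain with a loop, then flip it into order.
    indices = []
    i = lastoflength[-1]
    while i is not None:
        indices.append(i)
        i = predecessor[i]
    indices.reverse()
    return indices
```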

Hi, and welcome to Code Review! Sorry about the mess around this question and your suggested edit. The question is from the old days, when we did not yet actively discourage users from revising the code in their question. Revising the question creates a mess, as it invalidates existing answers and invites new answers that are not comparable to the existing ones... The recommended practice today is to post revised code as a new question, so this kind of thing doesn't happen anymore with more recent posts (2014 and later). Hope you'll enjoy the site nonetheless! — janos♦, Jul 10 '16 at 13:36

janos · Answer 4 · 2016-07-10 13:29:25Z

Here's the revised code, based on the suggestions from code reviews, excluding @arekolek's answer, which appears to be a review of this revised code here.

from random import randrange
from bisect import bisect_left


def randomList(N, max):
    return [randrange(max) for x in xrange(N)]


def subsequence(seq):
    """Returns the longest subsequence (non-contiguous) of seq that is
    strictly increasing.

    """
    # head[j] = index in 'seq' of the final member of the best subsequence 
    # of length 'j + 1' yet found
    head = [0]
    # predecessor[j] = linked list of indices of best subsequence ending
    # at seq[j], in reverse order
    predecessor = [-1]
    for i in xrange(1, len(seq)):
        ## Find j such that:  seq[head[j - 1]] < seq[i] <= seq[head[j]]
        ## seq[head[j]] is increasing, so use binary search.
        j = bisect_left([seq[head[idx]] for idx in xrange(len(head))], seq[i])

        if j == len(head):
            head.append(i)
        if seq[i] < seq[head[j]]:
            head[j] = i

        predecessor.append(head[j - 1] if j > 0 else -1)

    ## trace subsequence back to output
    result = []
    trace_idx = head[-1]
    while (trace_idx >= 0):
        result.append(seq[trace_idx])
        trace_idx = predecessor[trace_idx]

    return result[::-1]


l1 = randomList(15, 100)
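A cheap way to gain confidence in a function like this is to compare it against a straightforward O(n²) dynamic programme on many small random inputs. The sketch below assumes Python 3, so it ports the revised code with `range` in place of `xrange`, and it adds an empty-input guard (the revised version raises IndexError on `[]`). `lis_length_quadratic` and `check` are hypothetical helper names:

```python
from bisect import bisect_left
from random import randrange


def subsequence(seq):
    """Python 3 port of the revised function, plus an empty-input guard."""
    if not seq:
        return []
    head = [0]
    predecessor = [-1]
    for i in range(1, len(seq)):
        j = bisect_left([seq[k] for k in head], seq[i])
        if j == len(head):
            head.append(i)
        if seq[i] < seq[head[j]]:
            head[j] = i
        predecessor.append(head[j - 1] if j > 0 else -1)
    result = []
    trace_idx = head[-1]
    while trace_idx >= 0:
        result.append(seq[trace_idx])
        trace_idx = predecessor[trace_idx]
    return result[::-1]


def lis_length_quadratic(seq):
    """Reference O(n^2) DP: best[i] = length of the longest run ending at i."""
    best = []
    for i, x in enumerate(seq):
        best.append(1 + max((best[k] for k in range(i) if seq[k] < x), default=0))
    return max(best, default=0)


def check(trials=500, n=30, max_value=20):
    """Random tests; the small max_value makes repeated values likely."""
    for _ in range(trials):
        seq = [randrange(max_value) for _ in range(n)]
        out = subsequence(seq)
        assert all(a < b for a, b in zip(out, out[1:])), seq  # strictly increasing
        assert len(out) == lis_length_quadratic(seq), seq     # of maximal length
        it = iter(seq)
        assert all(v in it for v in out), seq                 # in order within seq
    return True
```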

asked	4 years ago
viewed	3867 times
active	6 months ago

Not the answer you're looking for? Browse other questions tagged python dynamic-programming or ask your own question.
