Code Review Stack Exchange is a question and answer site for peer programmer code reviews. Join them; it only takes a minute:

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

Problem statement

There are two sorted arrays nums1 and nums2 of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be \$\mathcal{O}(\log (m+n))\$.

Implementation

 def median(A, B):
    m, n = len(A), len(B)
    if m > n:
        A, B, m, n = B, A, n, m
    if n == 0:
        raise ValueError

    imin, imax, half_len = 0, m, (m + n + 1) / 2
    while imin <= imax:
        i = (imin + imax) / 2
        j = half_len - i
        if j > 0 and i < m and B[j-1] > A[i]:
            # i is too small, must increase it
            imin = i + 1
        elif i > 0 and j < n and A[i-1] > B[j]:
            # i is too big, must decrease it
            imax = i - 1
        else:
            # i is perfect
            if i == 0: max_of_left = B[j-1]
            elif j == 0: max_of_left = A[i-1]
            else: max_of_left = max(A[i-1], B[j-1])

            if (m + n) % 2 == 1:
                return max_of_left

            if i == m: min_of_right = B[j]
            elif j == n: min_of_right = A[i]
            else: min_of_right = min(A[i], B[j])

            return (max_of_left + min_of_right) / 2.0

Concern

I suspect that it is safe to change this line of code

j > 0 and i < m and B[j-1] > A[i]

to

i < m and B[j-1] > A[i]

and also it is safe to change this line of code

i > 0 and j < n and A[i-1] > B[j]

to

i > 0 and A[i-1] > B[j]

I think remove the condition check of j is safe since we already making sure size of A is no bigger than size of B.

share|improve this question
2  
Please note that on Code Review, answers may address any aspect of the code, not necessarily your specific concerns. – 200_success Apr 10 '16 at 1:21
2  
This question was cross-posted from Stack Overflow. Please declare such cross-posts in the future. – 200_success Apr 10 '16 at 1:31
1  
What is the median of two arrays? – vnp Apr 10 '16 at 4:28
2  
Sorry still not clear. I know what is the median of an array. What is the the median of two arrays? – vnp Apr 10 '16 at 6:40
1  
@LinMa IIRC median is a value which is not less than the half items of the collection and not greater than the half of items. For arrays 1,2,2,2,3 and '2,7' there exist no such element which is bigger than half and smaller than half of all items, anyway the median exists and it is 2. Am I wrong? – CiaPan Apr 21 '16 at 7:31
up vote 1 down vote accepted
+50

First lines can be written as (it's just a personal taste):

A, B = sorted([A, B], key=len, reverse=True)

Also the (m + n + 1) / 2 is possibly a bug, use (m + n + 1) // 2 instead (in python / is floating point division and // is integer division)

I am still looking for further improvements

To be honest I still can't get why it's O(log(n+m)) :-\

Edit:

I can't tell by sure whether the changes will break or not, but I've tested it with 10^6 random inputs multiples times and it seems that the changes are safe:

#!/usr/bin/env python

import random

def median(A, B):
    m, n = len(A), len(B)
    if m > n:
        A, B, m, n = B, A, n, m
    if n == 0:
        raise ValueError

    imin, imax, half_len = 0, m, (m + n + 1) / 2
    while imin <= imax:
        i = (imin + imax) / 2
        j = half_len - i
        if i < m and B[j-1] > A[i]:
            # i is too small, must increase it
            imin = i + 1
        elif i > 0 and A[i-1] > B[j]:
            # i is too big, must decrease it
            imax = i - 1
        else:
            # i is perfect
            if i == 0: max_of_left = B[j-1]
            elif j == 0: max_of_left = A[i-1]
            else: max_of_left = max(A[i-1], B[j-1])

            if (m + n) % 2 == 1:
                return max_of_left

            if i == m: min_of_right = B[j]
            elif j == n: min_of_right = A[i]
            else: min_of_right = min(A[i], B[j])

            return (max_of_left + min_of_right) / 2.0

def random_seq(max_len):
    l = random.randint(0, max_len)
    return list(sorted(random.choice(range(max_len)) for i in range(l)))

def median_slow(A, B):
    C = A + B
    C.sort()
    if not C: raise ValueError()
    m1 = (len(C) - 1) / 2
    m2 = len(C) / 2
    return (C[m1] + C[m2]) / 2.0

for i in range(1000000):
    A = random_seq(10)
    B = random_seq(10)
    if len(A) + len(B) > 0:
        me = median(A, B)
        ma = median_slow(A, B)
        assert me == ma

print "Validated!"

Last Edit:

m + n is larger than one so m + n + 1 is larger than two so j = half_length - 1 will always be positive. also n > m so always n + m <= 2*n so half_length = (n + m + 1) / 2 is always n or smaller but if you subtract one (j = half_length - 1) then it will be always less than n. So these checks are really redundant

share|improve this answer
1  
Thanks AmirHossein, vote up for your advice. Do you have any advice on my original concerns? – Lin Ma Apr 21 '16 at 22:58
2  
I still can't get why it's O(log(n+m)): maybe you missed sorted (or its implications) from the title/problem statement. (A bit difficult to tell without you stating a tighter upper bound - or argue the O(log(n+m)) overly optimistic. But, then, you suggest sorting where merging would produce a combined sorted array.) – greybeard Apr 22 '16 at 4:18
1  
@greybeard, yes and vote up, original array is sorted, good catch. I will update question. And appreciate if you could comment on my original question. :) – Lin Ma Apr 22 '16 at 16:50
1  
Hi AmirHossein, my original two arrays are both sorted (asc). Do you have any advice on my original concerns? Thanks. – Lin Ma Apr 22 '16 at 16:51
1  
@LinMa check it out :-) – AmirHossein Apr 22 '16 at 19:51

Sorry for being late, but had no time to study your code until today. The results are the same which AmirHossein presented before.

After you conditionally swap the input arrays:

    if m > n:
        A, B, m, n = B, A, n, m

you have m <= n, so

    half_len = (m + n + 1) // 2

(after the fix by AmirHossein) is not less than m. That makes

    j = half_len - i

greater than zero if i < m, so j > 0 and i < m is redundant.

Of course half_len is also is not greater than n, so similary j < n if i > 0.

As for the complexity, the code bisects the A array in each iteration, so the complexity is actually \$\mathcal{O}(\log m)\$ where m is the length of A after the conditional swap, or \$\mathcal{O}(\log (\min(m,n)))\$ in terms of input values of m,n. And this can be bounded by \$\mathcal{O}(\log (m+n))\$.

share|improve this answer
    
Thanks CiaPan, vote up first before study. :)) – Lin Ma Apr 29 '16 at 23:33

I don't have enough test cases to be sure if this is the proper result, but you could do something like this:

def median(a, b):
    # Combine lists
    m = a + b
    # Gnome sort
    i = 0
    while True:
        i += 1 if i == 0 else 0
        if i < len(m):
            if m[i-1] < m[i]:
                m[i-1], m[i] = m[i], m[i-1]
                i -= 1
            else:
                i += 1
        else:
            break
    # Get median
    while len(m) >= 3:
        m.pop(0)
        m.pop(-1)
    # Average
    return sum(m) / len(m)
  1. First we put the lists together and sort them in reverse numerical order with a Gnome sort, which runs in \$\mathcal{O}(n)\$.

  2. Then, we take the median elements from the list by poping from the top and bottom until we only have the middle element(s) left, which runs in \$\mathcal{O}(1)\$. If the list has an even length, we end up with the middle pair of elements, but if the list has an odd length, we get the middle element.

  3. Finally, we return the average of the element(s).

share|improve this answer
2  
This is how I used to always do it manually! I like the simplicity, +1. (I unfortunately dislike the mix.pop and so I'd personally change the loop to: mid = int((len(mix) - 1) / 2);mix = mix[mid:-mid]) – Peilonrayz Apr 13 '16 at 15:57
1  
@JoeWallis: I had considered that, but I wasn't sure on the complexity of gathering the items and redefining. – Zach Gates Apr 13 '16 at 16:05
1  
I timed the while and the slice, the slice was 1/4 of the time for me on Python3. timeit.timeit('mix=list(range(20))\nwhile len(mix)>=3:\n\tmix.pop(0)\n\tmix.pop(-1)') and timeit.timeit('mix=list(range(20))\nmid = int((len(mix) - 1) / 2)\nmix = mix[mid:-mid]'). It's not the most scientific test, but 1/4 shows a clear difference. – Peilonrayz Apr 13 '16 at 16:12
3  
Please review code, rather than posting an alternate solution. – QPaysTaxes Apr 14 '16 at 2:32
2  
@LinMa No, in Python there is a special kind of assignment which is performed as if it was parallel: a,b,c = p,q,r means 'evaluate exressions p, q and r, and assign their values to variables a, b and c, respectively'. So the variables' swap can be done simply with a,b = b,a with no (explicit) intermediate storage. – CiaPan Apr 21 '16 at 20:46

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.