Using binary search to find the index of an item in a sequence

Question

Here's my script:

def index(seq, item):
    """Returns the index of an item in an increasingly-sorted
       sequence (if exists) otherwise returns the index where
       it would be inserted, using binary search.

       seq: a sequence.

       precondition: seq is sorted increasingly.
       """
    # Updates the sequence with the item using a
    # set to avoid duplicates.
    seq = set(seq)
    seq.update([item])

    # Converts the sequence to a list and re-sorts it. 
    sorted_seq = sorted(seq)

    middle = len(sorted_seq) // 2
    middle_value = sorted_seq[middle]

    first_half = sorted_seq[:middle]
    second_half = sorted_seq[middle:]

    if item == middle_value:
        return middle

    elif item < middle_value:
        return index(first_half, item)

    elif item > middle_value:
        return index(second_half, item) + len(first_half)

How can it be refactored and optimized?

Notes:

I'm a hobbyist and beginner (in programming in general, and Python in particular).
I'm new to binary search I just understand the basic idea.

coderodde · Accepted Answer · 2016-10-02 18:27:11Z

seq = set(seq)
seq.update([item])

# Converts the sequence to a list and re-sorts it. 
sorted_seq = sorted(seq)

In worst case, sorted() will take \$\Theta(n \log n)\$, where \$n = \$ len(seq). Even worse, if you did not found the value, you recur to half of the original sequence, and so on.

So, instead of presorting the sequence, just proceed to doing the actual search; if the caller fails to input a sorted list, it's not your fault.

I have this in mind, runs in worst case logarithmic time and behaves exactly as your implementation:

def binary_search(lst, value):
    count = len(lst)
    first = 0

    while count:
        it = first
        step = count // 2
        it += step
        if value > lst[it]:
            it += 1
            first = it
            count -= step + 1
        else:
            count = step

    return first


def main():
    lst = [1, 2, 5, 10, 11, 17, 20]

    for value in range(0, 25):
        print("OP:", index(lst, value))
        print("cr:", binary_search(lst, value))
        print("---")

if __name__ == "__main__":
    main()

Hope that helps.

Thank you! That's far better than my algorithm, and not buggy as it. — Mahmud Muhammad Naguib, 19 hours ago
Can we use it = first+step instead of initializing it with first and incrementing with step? I've got the total idea and the logic, but can you, please, explain the logic behind count -= step + 1 to me? — Mahmud Muhammad Naguib, 19 hours ago

j4nu5 · Answer 2 · 2016-10-03 02:11:57Z

up vote 3 down vote

seq = set(seq)

This will cause index calculations to be incorrect. Consider this invocation:

index([1, 1, 2, 3], 2)

This should return 2 but your function returns 1. As a caller of your index method, I would expect the returned index to have item in seq.

~~seq.update([item])~~

~~This is a bug. It's causing print(index([1, 1, 2, 3], 15)) to return 3, when it should return -1 or some other sentinel value to denote absence of an item in seq.~~

EDIT: The function's documentation clearly says that index will return the final index after insertion if item does not exist in seq.

sorted_seq = sorted(seq)

This is unnecessary. Validate/assume your input seq to be sorted. This should be the standard expectation from any caller invoking your index function.

Be careful about:

first_half = sorted_seq[:middle]
second_half = sorted_seq[middle:]

Python fortunately will not create copies of the list but other languages may create copies of the original list for this operation.

After you have removed seq.update([item]), you will need to check for the base of your recursion, namely when seq is empty, causing

middle_value = sorted_seq[middle]

to fail.

As an aside, try implementing this as an iterative algorithm.

It will run faster
The recursive version will crash for large seq.

Totally aside, have a look at Binary Search tutorial on TopCoder Its a pretty good article.

edited yesterday

answered yesterday

j4nu5

657

Thank you! I've stated in the docstring otherwise returns the index where it would be inserted, theerfore I've used seq.update([item]). But Using seq = set(seq) is buggy as you've mentioned. Can you explain, please, what you mean by Python fortunately will not create copies of the list but other languages may create copies of the original list for this operation., ? – Mahmud Muhammad Naguib yesterday

@MahmudMuhammadNaguib I was pointing at the memory requirements of your algorithm. In each recursive step of your algorithm, you probably do not want to create fresh copies of the first and second halves of the arrays. Python slices do this by default but in a language other than python, you would probably want to pass the list by reference along with high/end and low/start pointers/indexes that mark the boundaries of the sequence to be searched for. This would incur the cost of 3 pointers in each recursive step, making extra memory requirements O(1). – j4nu5 yesterday

Thanks! I suggest you remove sorted_seq = sorted(seq) and the section about it. After inserting item the sequence won't be sorted anymore, so we have to re-sort it. – Mahmud Muhammad Naguib 20 hours ago

Please do not do that. As the accepted answer suggests, that would make the time complexity O(n lg n). Beats the point of binary search, which is supposed to run in O(lg n). Instead this should be your binary search criteria: "Find the index of the leftmost element in seq which is >= item". This will take care of duplicates and missing elements. – j4nu5 17 hours ago

add a comment |

asked	yesterday
viewed	163 times
active	yesterday

current community

your communities

more stack exchange communities

Using binary search to find the index of an item in a sequence

2 Answers 2

Your Answer

Not the answer you're looking for? Browse other questions tagged python algorithm python-3.x reinventing-the-wheel binary-search or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Using binary search to find the index of an item in a sequence

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python algorithm python-3.x reinventing-the-wheel binary-search or ask your own question.

Related

Hot Network Questions