Faster and pythonic way to write this code

Question

I am trying to write a code which should return true if any element of list is present in a dictionary. Performance of this piece is really important. I know I can just loop over list and break if I find the first search hit. Is there any faster or more Pythonic way for this than given below?

for x in someList:
     if x in someDict:
           return True
return False

EDIT: I am using Python 2.7. My first preference would be a faster method.

"any faster or more pythonic way" --- pick one. You cannot have a dress to be both blue and white. — zerkms, 8 hours ago
@zerkms actually there is a famous dress over there that has 2 colors at the same time x). (But this is off topic and I know and agree with you) ;) — Pablo Palácios, 7 hours ago
@wim: Are you justifying your downvote by saying that the answer does not deserve such high upvotes or is there some reason? Aparently your comment sounds in the manner :-) — Abhijit, 7 hours ago
The answer is fine it's just a very basic question that could have been solved with some simple searching and reading. I'm used to +7 questions being a bit more interesting and meaty in the python tag — wim, 7 hours ago

Abhijit · Accepted Answer · 2015-04-24 11:45:58Z

Use of builtin any can have some performance edge over two loops

any(x in someDict for x in someList)

but you might need to measure your mileage. If your list and dict remains pretty static and you have to perform the comparison multiple times, you may consider using set

someSet = set(someList) 
someDict.viewkeys() & someSet

Note Python 3.X, by default returns views rather than a sequence, so it would be straight forward when using Python 3.X

someSet = set(someList) 
someDict.keys() & someSet

In both the above cases you can wrap the result with a bool to get a boolean result

bool(someDict.keys() & set(someSet ))

Heretic Note

My curiosity took better of me and I timed all the proposed solutions. It seems, your original solution is better performance wise. Here is the result

Sample Randomly generated Input

>>> import timeit
>>> import random
>>> some_list = random.sample(range(1,10000),10)
>>> some_dict = dict(zip(random.sample(range(1,10000),10), random.sample(range(1,10000),10)))

Test Result

>>> def foo_set(some_list, some_dict):
    return some_dict.viewkeys() & set(some_list )

>>> def foo_nested(some_list, some_dict):
    for x in some_list:
         if x in some_dict:
           return True
    return False

>>> def foo_any(some_list, some_dict):
    return any(x in some_dict for x in some_list)

>>> def foo_ifilter(some_list, some_dict):
    return bool(next(ifilter(some_dict.__contains__, some_list), False))

>>> def foo_ifilter_new(some_list, some_dict):
    return not not next(ifilter(someDict.__contains__, someList), False)
>>> def foo_ifilter_v3(some_list, some_dict):
    return any(ifilter(some_dict.__contains__, some_list))
>>>  # Timeit wrapper
>>> def time(fn):
    return timeit.timeit("{}(some_list, some_dict)".format(fn),
                 setup="from __main__ import {}, ifilter, some_list, some_dict".format(fn))  


>>> time("foo_nested")
1.054499578614525
>>> time("foo_ifilter")
1.2866317889381662

>>> time("foo_any")
1.4705059825669196
>>> time("foo_set")
5.068478196019896
>>> time("foo_ifilter")
1.268290360447736
>>> time("foo_ifilter_v3")
1.070190605300013

I understand, that the solution utilizing set is not a good use case here, so I retested it without recreating the set on each iteration

>>> def foo_set(some_list, some_dict):
    return some_dict.viewkeys() & some_set

>>> some_set = set(some_list) 
>>> print timeit.timeit("foo_set(some_list, some_dict)",
                 setup="from __main__ import foo_set, ifilter, some_list, some_dict")
2.72747207852

I also tested what Ashwin suggested and there is a 2 fold increase in performance.

>>> def foo_set(some_list, some_dict):
    return not set(some_dict).isdisjoint(some_list)

>>> time("foo_set")
2.357448644513852

Summarizing the result

╔══════════════════════════════════╦══════╦════════════════════╗
║            Procedure             ║ Rank ║       Result       ║
╠══════════════════════════════════╬══════╬════════════════════╣
║ Nested Loop                      ║    1 ║ 1.054499578614525  ║
║ ifilter/all (edited)             ║    2 ║ 1.070190605300013  ║
║ ifilter/next (suggested)         ║    3 ║ 1.268290360447736  ║
║ ifilter/next                     ║    4 ║ 1.2866317889381662 ║
║ any                              ║    5 ║ 1.4705059825669196 ║
║ set (without dict views        ) ║    6 ║ 2.357448644513852  ║
║ set (optimized for dynamic data) ║    7 ║ 2.72747207852      ║
║ set                              ║    8 ║ 5.068478196019896  ║
╚══════════════════════════════════╩══════╩════════════════════╝

If my list and dicts are quite dynamic and changing a lot then is the first option my best bet? — Naman, 8 hours ago
The timeit results are quite interesting. I would probably use nested or any then. — Naman, 8 hours ago
Instead of set.intersection I would use set.isdisjoint: not set(some_dct).isdisjoint(some_lst). Dictviews are slow because there you're calling set() on both dict and list(implicitly and explictly), OTOH set(some_dct).isdisjoint(some_lst) will convert some_dict to set first and then we loop over some_list's items and check if it is contained in the set, if yes short-circuit. — Ashwini Chaudhary, 7 hours ago
Please add to the list Raymond's latest revision: any(ifilter(someDict.__contains__, someList)) — twasbrillig, 6 hours ago

Raymond Hettinger · Answer 2 · 2015-04-24 07:59:41Z

up vote 8 down vote

The cleanest and fastest way is to use any() and itertools.ifilter():

any(ifilter(someDict.__contains__, someList))

This code uses:

a bound method, someDict.__contains__ as the predicate
the ifilter() itertool lazily scans for a true result at C speed
the any() built-in drives the filter to find a single matching occurrence or to return False if the iterator is exhausted.

You could also use itertools.imap() instead of ifilter().

edited 6 hours ago

answered 8 hours ago

Raymond Hettinger
58.8k11112195

2

I'd be interested in the fastest way if there is even 10-20% speed improvement. Could you please edit the answer with the specific for ifilter? – Naman 8 hours ago

2

In Abhit's answer, he timed the different approaches and found that ifilter/next was not the fastest; any idea why this would be the case? – twasbrillig 7 hours ago

add a comment |

twasbrillig · Answer 3 · 2015-04-24 05:30:09Z

up vote 4 down vote

s = set(someList)
return bool(s.intersection(someDict.keys()))

answered 8 hours ago

twasbrillig
2,7252926

add a comment |

asked	today
viewed	889 times
active	today

current community

your communities

more stack exchange communities

Faster and pythonic way to write this code

3 Answers 3

Test Result

Summarizing the result

Your Answer

Not the answer you're looking for? Browse other questions tagged python performance list dictionary or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Faster and pythonic way to write this code

3 Answers 3

Test Result

Summarizing the result

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python performance list dictionary or ask your own question.

Related

Hot Network Questions