Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I am trying to write a code which should return true if any element of list is present in a dictionary. Performance of this piece is really important. I know I can just loop over list and break if I find the first search hit. Is there any faster or more Pythonic way for this than given below?

for x in someList:
     if x in someDict:
           return True
return False

EDIT: I am using Python 2.7. My first preference would be a faster method.

share|improve this question
7  
"any faster or more pythonic way" --- pick one. You cannot have a dress to be both blue and white. –  zerkms 8 hours ago
2  
@zerkms actually there is a famous dress over there that has 2 colors at the same time x). (But this is off topic and I know and agree with you) ;) –  Pablo Palácios 7 hours ago
2  
+7 for this question?? –  wim 7 hours ago
    
@wim: Are you justifying your downvote by saying that the answer does not deserve such high upvotes or is there some reason? Aparently your comment sounds in the manner :-) –  Abhijit 7 hours ago
3  
The answer is fine it's just a very basic question that could have been solved with some simple searching and reading. I'm used to +7 questions being a bit more interesting and meaty in the python tag –  wim 7 hours ago

3 Answers 3

up vote 27 down vote accepted

Use of builtin any can have some performance edge over two loops

any(x in someDict for x in someList)

but you might need to measure your mileage. If your list and dict remains pretty static and you have to perform the comparison multiple times, you may consider using set

someSet = set(someList) 
someDict.viewkeys() & someSet 

Note Python 3.X, by default returns views rather than a sequence, so it would be straight forward when using Python 3.X

someSet = set(someList) 
someDict.keys() & someSet 

In both the above cases you can wrap the result with a bool to get a boolean result

bool(someDict.keys() & set(someSet ))

Heretic Note

My curiosity took better of me and I timed all the proposed solutions. It seems, your original solution is better performance wise. Here is the result

Sample Randomly generated Input

>>> import timeit
>>> import random
>>> some_list = random.sample(range(1,10000),10)
>>> some_dict = dict(zip(random.sample(range(1,10000),10), random.sample(range(1,10000),10)))

Test Result

>>> def foo_set(some_list, some_dict):
    return some_dict.viewkeys() & set(some_list )

>>> def foo_nested(some_list, some_dict):
    for x in some_list:
         if x in some_dict:
           return True
    return False

>>> def foo_any(some_list, some_dict):
    return any(x in some_dict for x in some_list)

>>> def foo_ifilter(some_list, some_dict):
    return bool(next(ifilter(some_dict.__contains__, some_list), False))

>>> def foo_ifilter_new(some_list, some_dict):
    return not not next(ifilter(someDict.__contains__, someList), False)
>>> def foo_ifilter_v3(some_list, some_dict):
    return any(ifilter(some_dict.__contains__, some_list))
>>>  # Timeit wrapper
>>> def time(fn):
    return timeit.timeit("{}(some_list, some_dict)".format(fn),
                 setup="from __main__ import {}, ifilter, some_list, some_dict".format(fn))  


>>> time("foo_nested")
1.054499578614525
>>> time("foo_ifilter")
1.2866317889381662

>>> time("foo_any")
1.4705059825669196
>>> time("foo_set")
5.068478196019896
>>> time("foo_ifilter")
1.268290360447736
>>> time("foo_ifilter_v3")
1.070190605300013

I understand, that the solution utilizing set is not a good use case here, so I retested it without recreating the set on each iteration

>>> def foo_set(some_list, some_dict):
    return some_dict.viewkeys() & some_set

>>> some_set = set(some_list) 
>>> print timeit.timeit("foo_set(some_list, some_dict)",
                 setup="from __main__ import foo_set, ifilter, some_list, some_dict")
2.72747207852

I also tested what Ashwin suggested and there is a 2 fold increase in performance.

>>> def foo_set(some_list, some_dict):
    return not set(some_dict).isdisjoint(some_list)

>>> time("foo_set")
2.357448644513852

Summarizing the result

╔══════════════════════════════════╦══════╦════════════════════╗
║            Procedure             ║ Rank ║       Result       ║
╠══════════════════════════════════╬══════╬════════════════════╣
║ Nested Loop                      ║    1 ║ 1.054499578614525  ║
║ ifilter/all (edited)             ║    2 ║ 1.070190605300013  ║
║ ifilter/next (suggested)         ║    3 ║ 1.268290360447736  ║
║ ifilter/next                     ║    4 ║ 1.2866317889381662 ║
║ any                              ║    5 ║ 1.4705059825669196 ║
║ set (without dict views        ) ║    6 ║ 2.357448644513852  ║
║ set (optimized for dynamic data) ║    7 ║ 2.72747207852      ║
║ set                              ║    8 ║ 5.068478196019896  ║
╚══════════════════════════════════╩══════╩════════════════════╝
share|improve this answer
    
If my list and dicts are quite dynamic and changing a lot then is the first option my best bet? –  Naman 8 hours ago
1  
@Naman: Yes, that is what I mentioned in my answer. –  Abhijit 8 hours ago
    
The timeit results are quite interesting. I would probably use nested or any then. –  Naman 8 hours ago
2  
Instead of set.intersection I would use set.isdisjoint: not set(some_dct).isdisjoint(some_lst). Dictviews are slow because there you're calling set() on both dict and list(implicitly and explictly), OTOH set(some_dct).isdisjoint(some_lst) will convert some_dict to set first and then we loop over some_list's items and check if it is contained in the set, if yes short-circuit. –  Ashwini Chaudhary 7 hours ago
1  
Please add to the list Raymond's latest revision: any(ifilter(someDict.__contains__, someList)) –  twasbrillig 6 hours ago

The cleanest and fastest way is to use any() and itertools.ifilter():

any(ifilter(someDict.__contains__, someList))

This code uses:

  • a bound method, someDict.__contains__ as the predicate
  • the ifilter() itertool lazily scans for a true result at C speed
  • the any() built-in drives the filter to find a single matching occurrence or to return False if the iterator is exhausted.

You could also use itertools.imap() instead of ifilter().

share|improve this answer
2  
I'd be interested in the fastest way if there is even 10-20% speed improvement. Could you please edit the answer with the specific for ifilter? –  Naman 8 hours ago
2  
In Abhit's answer, he timed the different approaches and found that ifilter/next was not the fastest; any idea why this would be the case? –  twasbrillig 7 hours ago
s = set(someList)
return bool(s.intersection(someDict.keys()))
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.