Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

What is the most Pythonic way to take a dict of lists and produce a new dict with the list items as keys and the previous dict's keys as list items?

Here's a visual explanation:

favorite_fruits = {"alice": {"apple", "orange"}, "bob": {"apple"}, "carol": {"orange"}}
people_by_fruit = {"orange": {"carol", "alice"}, "apple": {"bob", "alice"}}

Here's the best I have at the moment:

from collections import defaultdict
favorite_fruits = {"alice": {"apple", "orange"}, "bob": {"apple"}, "carol": {"orange"}}
people_by_fruit = defaultdict(set)
for person, fruit in favorite_fruits.items():
    for fruit in fruit:
        people_by_fruit[fruit].add(person)
    
@kushj: that solves a slightly different problem – Trey Hunner Nov 14 '15 at 6:54
    
@TreyHunner you did it in your code, and this code looks fine. Please post your specific concerns, as I don't see any issues with this. – Maxim Galushka Nov 14 '15 at 18:59
    
You can write it in just one line by using a comprehension and map(). But IMHO that is not really better, and it is even a bit of a trick: in Python 3 map() returns a lazy iterator that only runs when it is consumed, so you would have to force evaluation, for example by wrapping the map call in list(). – Michele d'Amico Nov 14 '15 at 21:42
    
@Micheled'Amico: that's an interesting idea but I'm having trouble envisioning that. – Trey Hunner Nov 15 '15 at 0:50

Actually, I believe your code is quite good as it is. The simple inversion listed in this similar question (from the comments) does not work when you want to split up the set values of your first dict. You could try something like a dict comprehension with a double for loop, but that doesn't work either: the second time a fruit comes up, its entry overwrites the first one instead of adding to it.
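To make the overwriting problem concrete, here is a minimal sketch using the question's data. The name naive is only illustrative and does not appear in the question.

```python
favorite_fruits = {"alice": {"apple", "orange"}, "bob": {"apple"}, "carol": {"orange"}}

# A dict comprehension with a double for loop: each fruit key is assigned
# again every time it comes up, so only one person per fruit survives.
naive = {fruit: {person}
         for person, fruits in favorite_fruits.items()
         for fruit in fruits}

# Every value is a one-element set; the other people have been lost.
print(naive)
```

A comprehension can only assign a value to a key, not update one that is already there, which is why the accumulating defaultdict(set) loop is needed.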

The only thing I would change in your code is to use the plural of fruit, fruits, for the set, so that you don't write for fruit in fruit. That looks kind of hairy and can break code, because the loop variable rebinds the name of the very collection being iterated over. Not good. In other words:

people_by_fruit = defaultdict(set)
for person, fruits in favorite_fruits.items():
    for fruit in fruits:
        people_by_fruit[fruit].add(person)
    
Nice catch on the repeat variable name. Thanks for the feedback! – Trey Hunner Nov 15 '15 at 6:32

First of all, my opinion is that your version is quite close to the best one. But it is possible to use a single for loop, or to write it in just one line with map() and a list comprehension, instead of nested for loops:

from collections import defaultdict

direct = {"a": [1, 2, 3], "b": [3], "c": [2, 4, 5], "d": [6]}


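def invert(d):
    # Straightforward version: nested loops over keys and their values.
    ret = defaultdict(set)
    for key, values in d.items():
        for value in values:
            ret[value].add(key)
    return ret


def invert_alt(d):
    # One-line version: map() is used only for its side effect on ret,
    # and list() forces evaluation because map() is lazy in Python 3.
    ret = defaultdict(set)
    list(map(lambda h: ret[h[1]].add(h[0]), [(key, value) for key in d for value in d[key]]))
    return ret


def invert_final(d):
    # Single-loop version: flatten to (key, value) pairs first, then loop once.
    ret = defaultdict(set)
    for key, value in [(key, value) for key in d for value in d[key]]:
        ret[value].add(key)
    return ret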


print(invert(direct))
print(invert_alt(direct))
print(invert_final(direct))

Clearly, invert_alt() has too many issues to be usable:

  1. You need the list() trick only in Python 3, because there map() returns a lazy iterator that is not evaluated until something consumes its elements. In Python 2 map() returns a list, so you don't need it.
  2. This implementation relies on a side effect inside map() to do its job, and my position is to avoid side effects for the core work.
  3. It is really hard to understand.
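The laziness in point 1 can be checked with a small sketch. It assumes Python 3, and the names pairs, lazy, before and after are illustrative only.

```python
from collections import defaultdict

d = {"a": [1, 2], "b": [2]}
pairs = [(key, value) for key in d for value in d[key]]

ret = defaultdict(set)
lazy = map(lambda h: ret[h[1]].add(h[0]), pairs)

# Nothing has run yet: in Python 3, map() only builds an iterator.
before = dict(ret)

# Consuming the iterator is what triggers the side effects on ret.
list(lazy)
after = dict(ret)

print(before)
print(after)
```

Without the list() call, ret would stay empty, which is exactly the kind of surprise that makes side effects inside map() fragile.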

For invert_final() you pay a little in clarity to remove one level of nested indentation: maybe a good compromise. Because Python's formatting depends on indentation, removing a nesting level is, in my view, always a good goal.
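As a sketch only, and not part of the code above, the same single-loop idea can be written with a generator expression, assuming the intermediate list of pairs is not needed for anything else. The name invert_flat is hypothetical.

```python
from collections import defaultdict


def invert_flat(d):
    # Hypothetical variant of invert_final(): the parentheses make this a
    # generator expression, so pairs are produced one at a time instead of
    # being collected in a list first.
    pairs = ((key, value) for key, values in d.items() for value in values)
    ret = defaultdict(set)
    for key, value in pairs:
        ret[value].add(key)
    return ret


direct = {"a": [1, 2, 3], "b": [3], "c": [2, 4, 5], "d": [6]}
print(dict(invert_flat(direct)))
```

This keeps the single level of loop indentation but still adds a second construct to read, so the readability trade-off is the same as for invert_final().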
    
In invert_final you don't remove a nested loop; you add a loop and a list comprehension. In invert_alt you add a list construction to force execution of the iterator from an extra lambda/map expression. Both versions go a little backwards, which shows that the OP's version is actually a good one: readable, memory efficient and effective. – holroy Nov 15 '15 at 11:12
    
Sorry, I used the wrong term. I meant removing nested indentation. Removing nested indentation is always a good thing in Python. As I wrote, invert_alt is wrong, but I included it only because the OP asked me to in the comments. I pointed out that you pay a little in readability but you remove indentation; nested indentation is a well-known Python issue, so removing one level can be a valuable goal. – Michele d'Amico Nov 15 '15 at 11:23
    
@holroy just for the record, the Python developers put a lot of effort into list comprehension syntax and tools like zip and map to make it simple to remove some nested indentation and make code more compact. Try writing five nested for loops with if logic inside... You will love the zip construct. – Michele d'Amico Nov 15 '15 at 11:31
    
Readability and good line lengths are also good, if not better. Removing indentation at the cost of clarity is not a good trade-off in my book. – holroy Nov 15 '15 at 11:33
    
I do love the different comprehensions available; I'm not that keen on the not-so-recommended map, though it is useful in some contexts. Choose the right tool for the right job. Nesting five for loops with an if sounds like something in need of refactoring. – holroy Nov 15 '15 at 11:39
