For some post-processing, I need to flatten a structure like this:
{'foo': { 'cat': {'name': 'Hodor', 'age': 7}, 'dog': {'name': 'Mordor', 'age': 5}}, 'bar': { 'rat': {'name': 'Izidor', 'age': 3}} }
Each bottom-level entry will appear as a row in the output. The heading keys will appear in each row, flattened. Perhaps an example is better than my mediocre explanation:
[{'age': 5, 'animal': 'dog', 'foobar': 'foo', 'name': 'Mordor'}, {'age': 7, 'animal': 'cat', 'foobar': 'foo', 'name': 'Hodor'}, {'age': 3, 'animal': 'rat', 'foobar': 'bar', 'name': 'Izidor'}]
I first wrote this function:
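import copy

def flatten(data, primary_keys):
    out = []
    # Work on a reversed copy so that pop() yields the keys outermost first
    # without touching the caller's list.
    keys = copy.copy(primary_keys)
    keys.reverse()

    def visit(node, primary_values, prim):
        if len(prim):
            p = prim.pop()
            # items() works on both Python 2 and Python 3
            for key, child in node.items():
                primary_values[p] = key
                # Each branch gets its own copy of the remaining keys.
                visit(child, primary_values, copy.copy(prim))
        else:
            # Leaf: copy it before merging so the input stays untouched.
            new = copy.copy(node)
            new.update(primary_values)
            out.append(new)

    visit(data, {}, keys)
    return out

# a is the nested structure shown above
out = flatten(a, ['foobar', 'animal'])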
I was not really satisfied because I have to use copy.copy to protect my input arguments. Obviously, when using flatten, one does not want the input data to be altered.
So I thought about an alternative that uses more shared variables (global to flatten, at least) and passes an index to visit instead of passing primary_keys directly. However, this does not really help me get rid of the ugly initial copy:
keys = copy.copy(primary_keys)
keys.reverse()
So here is my final version:
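import copy

def flatten(data, keys):
    # deepcopy, because the leaf dicts are updated in place below;
    # a shallow copy.copy would still share them with the caller's data.
    data = copy.deepcopy(data)
    keys = copy.copy(keys)
    keys.reverse()
    out = []
    values = {}

    def visit(node, id):
        if id:
            id -= 1
            # items() works on both Python 2 and Python 3
            for key, child in node.items():
                values[keys[id]] = key
                visit(child, id)
        else:
            # Leaf: merge in the heading keys collected on the way down.
            node.update(values)
            out.append(node)

    visit(data, len(keys))
    return out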
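One thing worth checking with any version that updates the leaf nodes in place: copy.copy makes only a shallow copy, so the nested dicts stay shared with the original. A small sketch of the difference (the sample dict and the variable names here are made up for illustration):

```python
import copy

original = {'foo': {'cat': {'name': 'Hodor', 'age': 7}}}

# Shallow copy: only the outer dict is new, the nested dicts are shared.
shallow = copy.copy(original)
shallow['foo']['cat']['animal'] = 'cat'
leaked = 'animal' in original['foo']['cat']        # the input was altered

# Deep copy: every nested dict is duplicated, so the input is isolated.
original = {'foo': {'cat': {'name': 'Hodor', 'age': 7}}}
deep = copy.deepcopy(original)
deep['foo']['cat']['animal'] = 'cat'
isolated = 'animal' not in original['foo']['cat']  # the input is unchanged
```

The trade-off is that deepcopy duplicates the whole structure up front, which may be wasteful when only the leaves need to be fresh.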
I am sure some Python magic will help in this case.
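For illustration, here is one direction such a rewrite could take. It is a minimal sketch, not a definitive answer: it assumes every branch is nested exactly len(keys) levels deep, and it swaps the shared out/values state for a recursive generator that builds a fresh dict per leaf, so no copy.copy and no reverse() are needed.

```python
def flatten(data, keys):
    """Yield one flat dict per leaf; keys names each level, outermost first."""
    if not keys:
        # Leaf level: yield a fresh dict so the caller's data is never shared.
        yield dict(data)
        return
    # Slicing builds a new list, so the caller's keys list is not modified.
    first, rest = keys[0], keys[1:]
    for key, child in data.items():
        for row in flatten(child, rest):
            row[first] = key  # row is a fresh dict, safe to modify
            yield row


a = {'foo': {'cat': {'name': 'Hodor', 'age': 7},
             'dog': {'name': 'Mordor', 'age': 5}},
     'bar': {'rat': {'name': 'Izidor', 'age': 3}}}

rows = list(flatten(a, ['foobar', 'animal']))
```

As in the versions above, a heading key overwrites a leaf entry of the same name; the order of the rows follows the iteration order of the dicts.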
Where do 'foobar' and 'animal' come from? They appear nowhere in your calling example. Shouldn't it be out = flatten(a, ['foobar', 'animal']) instead? – 301_Moved_Permanently May 25 '16 at 11:36