I have to flatten a large number (>300k) of dicts and write them to a CSV file.
Example:
d = {
    'a': 'b',
    'c': [
        {'d': 'e'},
        {'f': 'g'}
    ]
}
becomes:
a, c.0.d, c.1.f
b, e, g
The dicts can be really big, with many nested dicts as values.
My function to flatten them generically is:
def flatten(self, d, parent_key='', sep='.'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, collections.abc.MutableMapping):
            items.extend(self.flatten(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            if v and isinstance(v[0], dict):
                for counter, entry in enumerate(v):
                    new_count_key = new_key + sep + str(counter)
                    items.extend(self.flatten(entry, new_count_key, sep=sep).items())
            else:
                items.append((new_key, v))
                if new_key not in self.key_list:
                    self.key_list.append(new_key)
        else:
            items.append((new_key, v))
            if new_key not in self.key_list:
                self.key_list.append(new_key)
    return dict(items)
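For reference, here is a standalone sketch of the same logic (without the `self`/`self.key_list` bookkeeping, which in my class just collects the CSV column names), runnable against the example above:

```python
import collections.abc

def flatten(d, parent_key='', sep='.'):
    # Recursively flatten nested dicts; list entries that are dicts
    # get their index inserted into the key path (c.0.d, c.1.f, ...).
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, collections.abc.MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        elif isinstance(v, list) and v and isinstance(v[0], dict):
            for i, entry in enumerate(v):
                items.extend(flatten(entry, new_key + sep + str(i), sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

print(flatten({'a': 'b', 'c': [{'d': 'e'}, {'f': 'g'}]}))
# {'a': 'b', 'c.0.d': 'e', 'c.1.f': 'g'}
```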
When I measure the execution time, I find that most of it is spent in the isinstance checks. With 2500 dicts, for example, the instance checks take around 6 seconds out of 12 seconds total.
Is there any way I can speed up this function?