Need an algorithm to optimize grouping python dictionaries in hierarchical form

Question

Previously I asked a question on how to group dictionaries in a list in a hierarchical structure.

Initially I wanted to group a list of dictionaries that looks like the following, using multiple keys:

[{'dept':1, 'age':10, 'name':'Sam'},
{'dept':1, 'age':12, 'name':'John'},
.
.
.
{'dept':2,'age':20, 'name':'Mary'},
{'dept':2,'age':11, 'name':'Mark'},
{'dept':2,'age':11, 'name':'Tom'}]

And the output would be:

[
    {
        2: {
            20: [
                {
                    'age': 20,
                    'dept': 2,
                    'name': 'Mary'
                }
            ]
        },
        {
            11: [
                {
                    'age': 11,
                    'dept': 2,
                    'name': 'Mark'
                },
                {
                    'age': 11,
                    'dept': 2,
                    'name': 'Tom'
                }
            ]
        }
    },
    {
        1: {
            10: [
                {
                    'age': 10,
                    'dept': 1,
                    'name': 'Sam'
                }
            ]
        },
        {
            12: [
                {
                    'age': 12,
                    'dept': 1,
                    'name': 'John'
                }
            ]
        }
    }
]

Using this code:

import itertools, operator

l = [{'dept':1, 'age':10, 'name':'Sam'},
        {'dept':1, 'age':12, 'name':'John'},
        {'dept':2,'age':20, 'name':'Mary'},
        {'dept':2,'age':11, 'name':'Mark'},
        {'dept':2,'age':11, 'name':'Tom'}]

groups = ['dept', 'age', 'name'] 

groups.reverse()
def hierachical_data(data, groups):
    g = groups[-1]
    g_list = []
    for key, items in itertools.groupby(data, operator.itemgetter(g)):
        g_list.append({key:list(items)})
    groups = groups[0:-1]
    if(len(groups) != 0):
        for e in g_list:
            for k,v in e.items():
                e[k] = hierachical_data(v, groups)

    return g_list

print hierachical_data(l, groups)

This method works fine, but now I need to optimize. The dictionaries has a big memory overhead and this grouping gets pretty slow when we are dealing with huge number of records.

I was wondering if there is any algorithm I could use to reduce the time needed to do the grouping.

P.S: I wouldn't mind changing the data-structure as long as it gives me the same hierarchical format, or any better suggestion of course.

Thanks.

Welcome to Code Review. If you read the FAQ, you will find that you need to post your code into your question or we'll have to close it. Please edit your post and paste in your code so that we can review it. For a really great question (worthy of upvoting) please include everything needed (including data) so that others can execute your code easily.
@Leonid Actually that's currently the solution I am using. To make it clear these records come from a database and then they are serialized into a dictionary. I use grouping and aggregation functions on the database side to reduce the number of records returned, but I was wondering if I can make my python code (the part that does the rest of the processing) even better.
It would be useful to have an actual sample of large data so we could run that. Otherwise, we are optimizing in the dark.
it seams weird that your output data structure is a list of dictionaries with only one key in each. IMHO you need a list only at the last nesting level (where you store you originals record).
The perhaps you should use the power of the database partly - select dept, age, name from something order by dept, age; This is O(n), but if this is too slow, then yes - hack away.

Oved D · Answer 1 · 2012-10-22 15:39:20Z

Try doing a recursive sequence that iterates through each group, by passing a group index to each call, and find and build the tree as it receives inputs:

def generate_hierarchical_hash(entries, groups):
  """Returns hierarchical hash of entries, grouped at each level by the groups"""
  hierarchy = {}
  for to_insert in entries:
    insert_into_hierarchy(to_insert, hierarchy, groups, 0)
  return hierarchy


def insert_into_hierarchy(to_insert, hierarchy, groupings, current_grouping_level):
  if current_grouping_level == len(groupings):
    # if at final level, do not do anything - protection case
    pass
  else:
    grouping_at_level = groupings[current_grouping_level]
    grouping_key = to_insert[grouping_at_level]
    is_final_group = current_grouping_level == len(groupings) - 1
    if is_final_group:
      # if at last grouping, nodes will contain lists, so create or retrieve list
      # and append value to it
      if grouping_key in hierarchy:
        list_for_group = hierarchy[grouping_key]
      else:
        # create new list and insert it into hierarchy
        list_for_group = list()
        hierarchy[grouping_key] = list_for_group

      # insert to insert into end of list
      list_for_group.append(to_insert)
    else:
      # otherwise, find or create hash for this grouping,
      if grouping_key in hierarchy:
        hierarchy_for_group = hierarchy[grouping_key]
      else:
        # create hash and insert into parent hash
        hierarchy_for_group = {}
        hierarchy[grouping_key] = hierarchy_for_group
      # recursive call, go down a level and group
      insert_into_hierarchy(to_insert, hierarchy_for_group, groupings, current_grouping_level + 1)

This could be called like:

input =[{'dept':1, 'age':10, 'name':'Sam'},
        {'dept':1, 'age':12, 'name':'John'},
        {'dept':2,'age':20, 'name':'Mary'},
        {'dept':2,'age':11, 'name':'Mark'},
        {'dept':2,'age':11, 'name':'Tom'}]

groups = ['dept', 'age', 'name']

hierarchical_hash = generate_hierarchical_hash(input, groups)

Thanks a lot for your answer. Please allow me sometime to test it before I mark the question as answered.
I am sorry I haven't checked it yet things have been crazy at my end.

asked	6 months ago
viewed	152 times
active	6 months ago

Need an algorithm to optimize grouping python dictionaries in hierarchical form

1 Answer

Your Answer

Not the answer you're looking for? Browse other questions tagged python optimization algorithm performance or ask your own question.

Need an algorithm to optimize grouping python dictionaries in hierarchical form

1 Answer

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python optimization algorithm performance or ask your own question.

Related