Tell me more ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I have written this implementation of Huffman coding. Please suggest ways to make this code better, i.e. more Pythonic.

from collections import Counter

#####################################################################

class Node(object):
  def __init__(self, pairs, frequency):
    self.pairs = pairs
    self.frequency = frequency

  def __repr__(self):
    return repr(self.pairs) + ", " + repr(self.frequency)

  def merge(self, other):
    total_frequency = self.frequency + other.frequency
    for p in self.pairs:
      p[1] = "1" + p[1]
    for p in other.pairs:
      p[1] = "0" + p[1]
    new_pairs = self.pairs + other.pairs
    return Node(new_pairs, total_frequency)

#####################################################################

def huffman_codes(s):
  frequency_table = Counter(s)
  initial_table = []
  for k, v in frequency_table.items():
    initial_table.append([k, v])
  initial_table = sorted(initial_table, key = lambda p : p[1])
  # print(initial_table)
  table = []
  for entry in initial_table:
    ch = entry[0]
    freq = entry[1]
    table.append(Node([[ch, ""]], freq))
  # print(table)
  while len(table) > 1:
    first_node = table.pop(0)
    second_node = table.pop(0)
    new_node = first_node.merge(second_node)
    table.append(new_node)
    table = sorted(table, key = lambda n : n.frequency)
    # print(table)
  return dict(map(lambda p: (p[0], p[1]), table[0].pairs))

#####################################################################

print(huffman_codes('yashaswita'))

Thanks

share|improve this question
2  
Also, if you want to compare methods, there is an early version of a Huffman coding module I wrote as an exercise on pastebin. The core is the same as what I ended up with, but it ended up getting significantly refactored and simplified. – agf Aug 21 '11 at 14:44
1  
"pythonic" rocks – Joset Aug 22 '11 at 1:42

migrated from stackoverflow.com Aug 21 '11 at 19:11

4 Answers

up vote 4 down vote accepted

I agree with agf except for point 4. This is my try on your code:

from collections import Counter
import heapq

class Node(object):
    def __init__(self, pairs, frequency):
        self.pairs = pairs
        self.frequency = frequency

    def __repr__(self):
        return repr(self.pairs) + ", " + repr(self.frequency)

    def merge(self, other):
        total_frequency = self.frequency + other.frequency
        for p in self.pairs:
            p[1] = "1" + p[1]
        for p in other.pairs:
            p[1] = "0" + p[1]
        new_pairs = self.pairs + other.pairs
        return Node(new_pairs, total_frequency)

    def __lt__(self, other):
        return self.frequency < other.frequency

def huffman_codes(s):
    table = [Node([[ch, '']], freq) for ch, freq in Counter(s).items()]
    heapq.heapify(table)
    while len(table) > 1:
        first_node = heapq.heappop(table)
        second_node = heapq.heappop(table)
        new_node = first_node.merge(second_node)
        heapq.heappush(table, new_node)
    return dict(table[0].pairs)

print(huffman_codes('yashaswita'))
share|improve this answer
2  
Nice, (I'm out of votes), but you actually did exactly what I did in #4 except for not needing to sort because you added the heapq, and better variable naming. – agf Aug 21 '11 at 14:18

From my part I will start to fix some pep8 issues...

from collections import Counter


class Node(object):
    def __init__(self, pairs, frequency):
        self.pairs = pairs
        self.frequency = frequency

    def __repr__(self):
        return repr(self.pairs) + ", " + repr(self.frequency)

    def merge(self, other):
        total_frequency = self.frequency + other.frequency
        for p in self.pairs:
            p[1] = "1" + p[1]
        for p in other.pairs:
            p[1] = "0" + p[1]
        new_pairs = self.pairs + other.pairs
        return Node(new_pairs, total_frequency)


def huffman_codes(s):
    frequency_table = Counter(s)
    initial_table = []
    for k, v in frequency_table.items():
        initial_table.append([k, v])
    initial_table = sorted(initial_table, key=lambda p: p[1])
    # print(initial_table)
    table = []
    for entry in initial_table:
        ch = entry[0]
        freq = entry[1]
        table.append(Node([[ch, ""]], freq))
    # print(table)
    while len(table) > 1:
        first_node = table.pop(0)
        second_node = table.pop(0)
        new_node = first_node.merge(second_node)
        table.append(new_node)
        table = sorted(table, key=lambda n: n.frequency)
        # print(table)
    return dict(map(lambda p: (p[0], p[1]), table[0].pairs))

print(huffman_codes('yashaswita'))
share|improve this answer

Really does belong on codereview.

  1. Use four space indentation. This is the Python standard, and will make any code more "Pythonic". Edit: You're following PEP-8 in all other respects as far as I can tell, so if you like two space indentation, it's not a big deal. Your code is very easy to read.

  2. You don't need to sort every time; you don't even need a fully sorted list. Python has a datatype for this -- it's called heapq. Use it's heappush and heappop methods, so you don't have to sort every time. You can actually write a single line loop to do the tree building. Read what that page says about priority queues -- it's what you're using.

  3. You don't need anything like

    return dict(map(lambda p: (p[0], p[1]), table[0].pairs))
    

    just

    return dict(table[0].pairs)
    

    does exactly the same thing.

  4. If you really want to minimize lines, everything before the while can be written as one line:

    table = sorted((Node([[p[0], ""]], p[1]) for p in Counter(s).iteritems()), key = lambda n : n.frequency)
    
share|improve this answer

Generally, loops aren't encouraged in python, since it is interpreted rather than compiled, thus the loop body will be "recompiled" at every iteration.

using map/reduce is one good way to avoid loops. another is using the for.. in, e.g.:

initial_table=[[k, v] for k, v in frequency_table.items()]

and similarly for your 2nd for loop.

share|improve this answer
3  
Loops aren't encouraged in Python? The loop body will be recompiled at every iteration? That's completely and totally wrong. You actual advice for how to write it is fine, but the first paragraph is entirely wrong. -1 if you don't edit this. – agf Aug 21 '11 at 13:13
Sorry, that's what i've been taught. I guess I was told wrong, then. Good to know, thanks :) – Eran Zimmerman Aug 21 '11 at 13:27
1  
Python code is bytecompiled once, before it's executed. The names in the loop are looked up again on every iteration, because Python is a dynamic language and you could be changing what they mean; this is also generally true for a list comprehension. With map, the function you call repeatedly is only looked up once, however. The Python interpreter does a good job optimizing list comprehensions though (and they tend to be a bit faster than explicit loops) so map usually doesn't help much. – agf Aug 21 '11 at 13:33

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.