Is there a better way to implement this Huffman Algorithm in Python?

Question

I have written this implementation of Huffman coding. Please suggest ways to make this code better, i.e. more Pythonic.

from collections import Counter

#####################################################################

class Node(object):
  def __init__(self, pairs, frequency):
    self.pairs = pairs
    self.frequency = frequency

  def __repr__(self):
    return repr(self.pairs) + ", " + repr(self.frequency)

  def merge(self, other):
    total_frequency = self.frequency + other.frequency
    for p in self.pairs:
      p[1] = "1" + p[1]
    for p in other.pairs:
      p[1] = "0" + p[1]
    new_pairs = self.pairs + other.pairs
    return Node(new_pairs, total_frequency)

#####################################################################

def huffman_codes(s):
  frequency_table = Counter(s)
  initial_table = []
  for k, v in frequency_table.items():
    initial_table.append([k, v])
  initial_table = sorted(initial_table, key = lambda p : p[1])
  # print(initial_table)
  table = []
  for entry in initial_table:
    ch = entry[0]
    freq = entry[1]
    table.append(Node([[ch, ""]], freq))
  # print(table)
  while len(table) > 1:
    first_node = table.pop(0)
    second_node = table.pop(0)
    new_node = first_node.merge(second_node)
    table.append(new_node)
    table = sorted(table, key = lambda n : n.frequency)
    # print(table)
  return dict(map(lambda p: (p[0], p[1]), table[0].pairs))

#####################################################################

print(huffman_codes('yashaswita'))

Thanks

Also, if you want to compare methods, there is an early version of a Huffman coding module I wrote as an exercise on pastebin. The core is the same as what I ended up with, but it ended up getting significantly refactored and simplified.

cldy · Accepted Answer · 2011-08-21 14:10:23Z

I agree with agf except for point 4. This is my try on your code:

from collections import Counter
import heapq

class Node(object):
    def __init__(self, pairs, frequency):
        self.pairs = pairs
        self.frequency = frequency

    def __repr__(self):
        return repr(self.pairs) + ", " + repr(self.frequency)

    def merge(self, other):
        total_frequency = self.frequency + other.frequency
        for p in self.pairs:
            p[1] = "1" + p[1]
        for p in other.pairs:
            p[1] = "0" + p[1]
        new_pairs = self.pairs + other.pairs
        return Node(new_pairs, total_frequency)

    def __lt__(self, other):
        return self.frequency < other.frequency

def huffman_codes(s):
    table = [Node([[ch, '']], freq) for ch, freq in Counter(s).items()]
    heapq.heapify(table)
    while len(table) > 1:
        first_node = heapq.heappop(table)
        second_node = heapq.heappop(table)
        new_node = first_node.merge(second_node)
        heapq.heappush(table, new_node)
    return dict(table[0].pairs)

print(huffman_codes('yashaswita'))

Nice, (I'm out of votes), but you actually did exactly what I did in #4 except for not needing to sort because you added the heapq, and better variable naming.

user6532 · Answer 2 · 2011-08-22 05:55:54Z

From my part I will start to fix some pep8 issues...

from collections import Counter


class Node(object):
    def __init__(self, pairs, frequency):
        self.pairs = pairs
        self.frequency = frequency

    def __repr__(self):
        return repr(self.pairs) + ", " + repr(self.frequency)

    def merge(self, other):
        total_frequency = self.frequency + other.frequency
        for p in self.pairs:
            p[1] = "1" + p[1]
        for p in other.pairs:
            p[1] = "0" + p[1]
        new_pairs = self.pairs + other.pairs
        return Node(new_pairs, total_frequency)


def huffman_codes(s):
    frequency_table = Counter(s)
    initial_table = []
    for k, v in frequency_table.items():
        initial_table.append([k, v])
    initial_table = sorted(initial_table, key=lambda p: p[1])
    # print(initial_table)
    table = []
    for entry in initial_table:
        ch = entry[0]
        freq = entry[1]
        table.append(Node([[ch, ""]], freq))
    # print(table)
    while len(table) > 1:
        first_node = table.pop(0)
        second_node = table.pop(0)
        new_node = first_node.merge(second_node)
        table.append(new_node)
        table = sorted(table, key=lambda n: n.frequency)
        # print(table)
    return dict(map(lambda p: (p[0], p[1]), table[0].pairs))

print(huffman_codes('yashaswita'))

agf · Answer 3 · 2011-08-21 13:12:39Z

Really does belong on codereview.

Use four space indentation. This is the Python standard, and will make any code more "Pythonic". Edit: You're following PEP-8 in all other respects as far as I can tell, so if you like two space indentation, it's not a big deal. Your code is very easy to read.
You don't need to sort every time; you don't even need a fully sorted list. Python has a datatype for this -- it's called heapq. Use it's heappush and heappop methods, so you don't have to sort every time. You can actually write a single line loop to do the tree building. Read what that page says about priority queues -- it's what you're using.

You don't need anything like

return dict(map(lambda p: (p[0], p[1]), table[0].pairs))

just

return dict(table[0].pairs)

does exactly the same thing.

If you really want to minimize lines, everything before the while can be written as one line:

table = sorted((Node([[p[0], ""]], p[1]) for p in Counter(s).iteritems()), key = lambda n : n.frequency)

Eran Zimmerman · Answer 4 · 2011-08-21 13:12:08Z

up vote 3 down vote

~~Generally, loops aren't encouraged in python, since it is interpreted rather than compiled, thus the loop body will be "recompiled" at every iteration.~~

using map/reduce is one good way to avoid loops. another is using the for.. in, e.g.:

initial_table=[[k, v] for k, v in frequency_table.items()]

and similarly for your 2nd for loop.

answered Aug 21 '11 at 13:12

Eran Zimmerman

3

Loops aren't encouraged in Python? The loop body will be recompiled at every iteration? That's completely and totally wrong. You actual advice for how to write it is fine, but the first paragraph is entirely wrong. -1 if you don't edit this. – agf Aug 21 '11 at 13:13

Sorry, that's what i've been taught. I guess I was told wrong, then. Good to know, thanks :) – Eran Zimmerman Aug 21 '11 at 13:27

1

Python code is bytecompiled once, before it's executed. The names in the loop are looked up again on every iteration, because Python is a dynamic language and you could be changing what they mean; this is also generally true for a list comprehension. With map, the function you call repeatedly is only looked up once, however. The Python interpreter does a good job optimizing list comprehensions though (and they tend to be a bit faster than explicit loops) so map usually doesn't help much. – agf Aug 21 '11 at 13:33

asked	1 year ago
viewed	755 times
active	1 year ago

Is there a better way to implement this Huffman Algorithm in Python?

migrated from stackoverflow.com Aug 21 '11 at 19:11

4 Answers

Your Answer

Not the answer you're looking for? Browse other questions tagged python algorithm or ask your own question.

Linked

Is there a better way to implement this Huffman Algorithm in Python?

migrated from stackoverflow.com Aug 21 '11 at 19:11

4 Answers

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python algorithm or ask your own question.

Linked

Related