
I'm trying to write a program in C for Huffman coding, but I am stuck. For input I have:

Sample input:
4      // number of letters to read
A 00   // each letter, followed by its code given as a string
B 10     
C 01
D 11
001010010101001011010101010110011000 // this is a suboptimal Huffman code

So first I have to decode this string and find out how many times each letter appears, and I have already done that. But now I have to find out how many bits each letter gets using a Huffman tree, and in the output I have to print the average number of bits per symbol.

The output for this example has to be:

Sample output
1.722

So now, how do I find out how many bits each letter gets with Huffman coding?

In your example, since every letter uses two bits, I don't see why the average isn't 2.00. The message decodes to ABBCCCABDCCCCCBCBA. –  Jim May 1 at 1:22

2 Answers

To solve this you need to build the Huffman tree and compute the number of bits needed to represent each symbol. Then you can compute the total number of bits needed for the original string in the Huffman encoding and divide by the number of characters.

First you decode your input string using the original character encoding:

00 A
10 B
10 B
01 C
01 C
01 C
00 A
10 B
11 D
01 C
01 C
01 C
01 C
01 C
10 B
01 C
10 B
00 A

Next you count the number of occurrences of each character:

3 00,A
9 01,C
5 10,B
1 11,D
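
For the decoding and counting step, a minimal C sketch might look like the following. The function name count_symbols and its parameters are my own invention, not something from the question. It assumes the input code table is prefix-free, so the first code that matches the front of the remaining bits is the right one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode `bits` with the given code table and tally
 * how often each code occurs. codes[i] is the bit string of symbol i and
 * counts[i] receives its number of occurrences.
 * Returns the number of decoded symbols, or -1 if the remaining bits do
 * not start with any code in the table. */
int count_symbols(const char *bits, int n, const char *codes[], int counts[])
{
    int total = 0;

    assert(n > 0);
    memset(counts, 0, (size_t)n * sizeof counts[0]);

    while (*bits != '\0') {
        int matched = -1;

        /* Find the code that is a prefix of the remaining input. */
        for (int i = 0; i < n; i++) {
            size_t len = strlen(codes[i]);
            if (len > 0 && strncmp(bits, codes[i], len) == 0) {
                matched = i;
                bits += len;   /* consume the matched code */
                break;
            }
        }
        if (matched < 0)
            return -1;         /* malformed input */

        counts[matched]++;
        total++;
    }
    return total;
}
```

With the sample table in the order A, B, C, D and the sample bit string, this should fill counts with 3, 5, 9, 1 and return 18, which matches the tally above.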

Now we make a min priority queue using the occurrence count as the key. It looks like this:

[(1,D), (3,A), (5, B), (9,C)]

Keep applying the Huffman process ( http://en.wikipedia.org/wiki/Huffman_coding ): repeatedly remove the two nodes with the smallest keys and combine them. So first you combine D and A to make a new node 'DA' whose key = 1 + 3 = 4. Put this back in the priority queue:

[(4, DA), (5, B), (9,C)]

Now DA and B combine to give DAB:

[(9, DAB), (9,C)]

Now DAB and C combine to give the root node 'DABC':

[(18, DABC)]
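
Here is one way the merging could be sketched in C with a hand-written binary min-heap. Everything named here (huffman_code_lengths, MAX_SYMBOLS, the global arrays) is an assumption made for illustration. Instead of building pointer-based tree nodes, it records only a parent index per node and reads each symbol's code length off as its depth. Ties between equal weights are broken arbitrarily, which can change the codes but not the total number of bits.

```c
#include <assert.h>

#define MAX_SYMBOLS 256                 /* assumed upper bound on alphabet size */
#define MAX_NODES (2 * MAX_SYMBOLS - 1) /* n leaves plus n - 1 merged nodes */

/* Globals keep the sketch short; this makes it non-reentrant. */
static long weight[MAX_NODES];  /* occurrence count of each node */
static int parent[MAX_NODES];   /* parent index, -1 while a node has none */
static int heap[MAX_NODES];     /* min-heap of node indices, keyed by weight */
static int heap_size;

static void heap_push(int node)
{
    int i = heap_size++;
    /* Sift up: move heavier ancestors down until the slot fits. */
    while (i > 0 && weight[heap[(i - 1) / 2]] > weight[node]) {
        heap[i] = heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    heap[i] = node;
}

static int heap_pop(void)
{
    int top = heap[0];
    int last = heap[--heap_size];
    int i = 0;
    /* Sift down: pull the lighter child up until `last` fits. */
    for (;;) {
        int child = 2 * i + 1;
        if (child >= heap_size)
            break;
        if (child + 1 < heap_size &&
            weight[heap[child + 1]] < weight[heap[child]])
            child++;
        if (weight[heap[child]] >= weight[last])
            break;
        heap[i] = heap[child];
        i = child;
    }
    heap[i] = last;
    return top;
}

/* Fill lengths[i] with the Huffman code length of symbol i.
 * A lone symbol (n == 1) ends up with length 0. */
void huffman_code_lengths(const int counts[], int n, int lengths[])
{
    int next = n;  /* index of the next merged node */

    assert(n >= 1 && n <= MAX_SYMBOLS);
    heap_size = 0;
    for (int i = 0; i < n; i++) {
        weight[i] = counts[i];
        parent[i] = -1;
        heap_push(i);
    }
    /* Repeatedly merge the two lightest nodes, as in the steps above. */
    while (heap_size > 1) {
        int a = heap_pop();
        int b = heap_pop();
        weight[next] = weight[a] + weight[b];
        parent[next] = -1;
        parent[a] = parent[b] = next;
        heap_push(next);
        next++;
    }
    /* A symbol's code length is its distance from the root. */
    for (int i = 0; i < n; i++) {
        int depth = 0;
        for (int p = parent[i]; p != -1; p = parent[p])
            depth++;
        lengths[i] = depth;
    }
}
```

For counts given in the order A, B, C, D, that is 3, 5, 9, 1, this should yield the lengths 3, 2, 1, 3.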

Now the process stops, and each character gets a new code whose length is its distance from the root node. 'C' was combined last, so it gets only one bit. Let's say I always use '0' for the second element of the two picked from the priority queue, and '1' for the first. The bits inherited from the parent nodes are shown in parentheses:

C =      0, DAB =      1
B = (1)  0, DA  = (1)  1
A = (11) 0, D   = (11) 1

So you get the encoding:

C = 0
B = 10
A = 110
D = 111

Encoding original message:

Total bits needed = 9 * 1 + 5 * 2 + 3 * 3 + 1 * 3
= 9 + 10 + 9 + 3 
= 31

Number of Characters = 18

Average bits = 31 / 18 = 1.722222
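
The arithmetic above could be written in C roughly like this. The name average_bits is hypothetical; it assumes you already have the occurrence counts and the code lengths in two parallel arrays.

```c
#include <assert.h>

/* Hypothetical helper: average code length in bits per symbol, i.e. the
 * code lengths weighted by how often each symbol occurs.
 * Returns 0.0 if no symbols occurred at all. */
double average_bits(const int counts[], const int lengths[], int n)
{
    long total_bits = 0;
    long total_symbols = 0;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        total_bits += (long)counts[i] * lengths[i];
        total_symbols += counts[i];
    }
    if (total_symbols == 0)
        return 0.0;
    return (double)total_bits / (double)total_symbols;
}
```

Printing the result with printf("%.3f\n", ...) for the counts 9, 5, 3, 1 and lengths 1, 2, 3, 3 should give 1.722, the expected sample output.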
Thanks, this was really helpful, but I don't know how to do this part in the C programming language: "Now we make a min priority queue using the occurrence as key, this looks like: [(1,D), (3,A), (5, B), (9,C)]" –  Maria May 1 at 17:48
    
I have explained this as a concept. Doing it in C requires coding up a priority queue data structure, which is usually implemented as a heap (en.wikipedia.org/wiki/Priority_queue). In C++, if you use the STL, you can use the priority_queue container to achieve this. –  user3585718 May 6 at 16:42

Once you have the Huffman coding tree, the optimum code for each symbol is given by the path to the symbol in the tree.

For instance, let's take this tree and say that left is 0 and right is 1 (this is arbitrary):

 /\
A  \
   /\
  B  \
     /\
    C  D

The path to A is left, so its optimum code is 0 and the length of this code is 1 bit. The path to B is right, left: its code is 10, length 2 bits. C is right, right, left: code 110, 3 bits. D is right, right, right: code 111, 3 bits.
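
Reading the codes off a tree like this is a short recursive walk. The sketch below is only illustrative: struct node, assign_codes and MAX_CODE_LEN are names I made up, and it assumes every internal node has at least one child and that no code is longer than MAX_CODE_LEN bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CODE_LEN 32  /* assumed upper bound on code length in bits */

/* Hypothetical tree node: a leaf has no children and carries a symbol. */
struct node {
    char symbol;               /* meaningful only for leaves */
    const struct node *left;   /* edge labelled '0' */
    const struct node *right;  /* edge labelled '1' */
};

/* Walk the tree and store each leaf's path as a string of '0'/'1' in
 * codes[symbol]. `path` is scratch space of MAX_CODE_LEN + 1 bytes and
 * `depth` is the number of edges taken so far (0 at the root). */
void assign_codes(const struct node *t, char *path, int depth,
                  char codes[][MAX_CODE_LEN + 1])
{
    assert(t != NULL && depth <= MAX_CODE_LEN);

    if (t->left == NULL && t->right == NULL) {
        /* Leaf: the path taken so far is this symbol's code. */
        path[depth] = '\0';
        strcpy(codes[(unsigned char)t->symbol], path);
        return;
    }
    if (t->left != NULL) {
        path[depth] = '0';
        assign_codes(t->left, path, depth + 1, codes);
    }
    if (t->right != NULL) {
        path[depth] = '1';
        assign_codes(t->right, path, depth + 1, codes);
    }
}
```

For the tree drawn above, where D sits three edges below the root, calling assign_codes on the root with depth 0 and a codes table of 256 rows should store "0" for A, "10" for B, "110" for C and "111" for D. The code length of a symbol is then just strlen of its entry.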

Now you have the length of each code and you already computed the frequency of each symbol. The average bits per symbol is the average across these code lengths weighted by the frequency of their associated symbols.
