algorithm


Improvements requested by Nick Larsen:

  • This topic would benefit from additional syntax notation, explanation of parameters, or remarks. - 2d ago
    Comments:
    • Does not explain what a greedy algorithm is at all. - Nick Larsen

This draft deletes the entire topic.

inline side-by-side expand all collapse all

Examples

  • Improvements requested by Nick Larsen:

    • This was posted as an example, but it does not attempt to illustrate the topic. It should possibly be an edit to the topic, a separate topic, or deleted altogether. - 2d ago
      Comments:
      • Huffman Encoding could be an entire topic with examples that are implementations in various languages. The Huffman encoding page should then be linked from the Greedy Algorithms page. - Nick Larsen
    0

    The Huffman Encoding algorithm is an extremely simple algorithm for compressing data. In this example, we compress text data to binary data of variable lengths. This algorithm is called "greedy" because it compresses the most common characters to shorter strings than less common characters.

    (Example in Python 2.7.11)

    from heapq import heapify, heappush, heappop
    
    def huffman(char_freqs):
        ''' The Huffman Encoding algorithm, taking a dictionary of
            characters and their frequencies.
        '''
        min_queue = [[val, [key, '']] for key, val in char_freqs.iteritems()]
        heapify(min_queue)
        while len(min_queue) > 1:
            low = heappop(min_queue)
            high = heappop(min_queue)
            for pair in low[1:]:
                pair[1] = '0' + pair[1]
            for pair in high[1:]:
                pair[1] = '1' + pair[1]
            heappush(min_queue, [low[0] + high[0]] + low[1:] + high[1:])
    
        return sorted(heappop(min_queue)[1:], key=lambda v: (len(v[-1]), v))
    
    char_freqs = {'A': 0.25, 'T: 0.5, 'C': 0.2, 'G': 0.05}
    encoded = huffman(char_freqs)
    print(encoded)
    # prints: [['T', '1'], ['A', '00'], ['C', '011'], ['G', '010']]
    

    Now, imagine we had a long string representing a genome. That string would only have four characters, repeated many times (A, C, T, and G). We could compress this long string for storage by replacing each letter with it's small binary representation, saving us a lot of space.

I am downvoting this example because it is...

Syntax

Syntax

Parameters

Parameters

Remarks

Remarks

Still have question about Greedy Algorithms? Ask Question

Huffman Encoding

0

The Huffman Encoding algorithm is an extremely simple algorithm for compressing data. In this example, we compress text data to binary data of variable lengths. This algorithm is called "greedy" because it compresses the most common characters to shorter strings than less common characters.

(Example in Python 2.7.11)

from heapq import heapify, heappush, heappop

def huffman(char_freqs):
    ''' The Huffman Encoding algorithm, taking a dictionary of
        characters and their frequencies.
    '''
    min_queue = [[val, [key, '']] for key, val in char_freqs.iteritems()]
    heapify(min_queue)
    while len(min_queue) > 1:
        low = heappop(min_queue)
        high = heappop(min_queue)
        for pair in low[1:]:
            pair[1] = '0' + pair[1]
        for pair in high[1:]:
            pair[1] = '1' + pair[1]
        heappush(min_queue, [low[0] + high[0]] + low[1:] + high[1:])

    return sorted(heappop(min_queue)[1:], key=lambda v: (len(v[-1]), v))

char_freqs = {'A': 0.25, 'T: 0.5, 'C': 0.2, 'G': 0.05}
encoded = huffman(char_freqs)
print(encoded)
# prints: [['T', '1'], ['A', '00'], ['C', '011'], ['G', '010']]

Now, imagine we had a long string representing a genome. That string would only have four characters, repeated many times (A, C, T, and G). We could compress this long string for storage by replacing each letter with it's small binary representation, saving us a lot of space.

Topic Outline