

4.8 Huffman Codes and Data Compression

The Problem

  • Encoding Symbols Using Bits:
    • We need schemes that encode text written in richer alphabets as long strings of bits.
    • A first thought would be to represent every symbol by the same number of bits (a fixed-length encoding).
    • However, this is not efficient, since some symbols occur far more frequently than others.
    • Thus symbols should be represented by codewords of different lengths depending on their frequency.
    • Data compression algorithms are devoted to representing large files as compactly as possible: they take files as input and reduce the space they occupy through efficient encoding schemes.
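As a concrete sketch of the fixed-length baseline (the five-letter alphabet and sample text below are hypothetical, not from the text), every symbol costs ⌈log2 |S|⌉ bits no matter how rare it is:

```python
import math

# Assumed example alphabet of 5 symbols; a fixed-length code needs
# ceil(log2(5)) = 3 bits for every symbol, frequent or not.
alphabet = ["a", "b", "c", "d", "e"]
bits_per_symbol = math.ceil(math.log2(len(alphabet)))
code = {s: format(i, f"0{bits_per_symbol}b") for i, s in enumerate(alphabet)}

text = "aabab"
encoded = "".join(code[s] for s in text)
print(bits_per_symbol)   # 3
print(encoded)           # 15 bits for a 5-letter text
```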


Variable-Length Encoding Schemes:

  • Basically, more frequent letters are represented by shorter bit strings than less frequent ones.
  • The goal is to represent the text with as few bits as possible.


Prefix codes:

  • An encoding of symbols in which no codeword is a prefix of any other.
  • A prefix code for a set S of letters is a function f that maps each letter x ∈ S to some sequence of zeros and ones such that:
    • For distinct x, y ∈ S, the sequence f(x) is not a prefix of the sequence f(y).
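The prefix condition can be checked directly; a small sketch (the helper name and example codes are made up):

```python
def is_prefix_code(code):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(code.values())
    # After lexicographic sorting, any prefix relation must appear
    # between adjacent entries, so checking neighbours suffices.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_code({"a": "11", "b": "01", "c": "001"}))  # True
print(is_prefix_code({"a": "1", "b": "11"}))               # False: "1" prefixes "11"
```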


Optimal Prefix Codes:

  • For each letter x ∈ S, there is a frequency fx that represents the fraction of letters in the text that are equal to x.
  • Assuming the text has n letters, n·fx of these letters are equal to x.
  • The frequencies sum to 1: ∑x ∈ S fx = 1.
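Since the section builds toward Huffman codes, here is a sketch of the standard greedy construction (repeatedly merge the two least-frequent items, so rarer letters end up deeper in the tree); the frequencies below are assumed for illustration:

```python
import heapq
import itertools

def huffman_code(freq):
    """Sketch of the standard Huffman construction for a frequency dict."""
    counter = itertools.count()   # tie-breaker so the heap never compares dicts
    heap = [(f, next(counter), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lowest-frequency subtrees into one.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}  # assumed example
assert abs(sum(freq.values()) - 1.0) < 1e-9                     # frequencies sum to 1
code = huffman_code(freq)
avg = sum(freq[s] * len(code[s]) for s in freq)
print(code)
print(avg)
```

Here the expected number of bits per letter, ∑x ∈ S fx·|f(x)|, comes out to about 2.23 for these frequencies, beating the 3 bits a fixed-length code would need.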
courses/cs211/winter2012/journals/jeanpaul/chapterfour_sectionviii.1331008279.txt.gz · Last modified: 2012/03/06 04:31 by mugabej
CC Attribution-Noncommercial-Share Alike 4.0 International