Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
courses:cs211:winter2018:journals:boyese:chapter4 [2018/03/12 16:38] – [Section 4.8: Huffman Codes and Data Compression] boyesecourses:cs211:winter2018:journals:boyese:chapter4 [2018/03/12 18:25] (current) – [Section 4.8: Huffman Codes and Data Compression] boyese
Line 263: Line 263:
 ====Section 4.8: Huffman Codes and Data Compression==== ====Section 4.8: Huffman Codes and Data Compression====
  
 +===Huffman Codes===
 +**Problem: Encoding**
 +  * Computers use bits: 0s and 1s
 +  * Need to represent what humans know to what computers know
 +  * Map symbol → unique sequence of 0s and 1s
 +  * Process is called encoding
 +  * The least number of bits needed to encode 26 letters, space, and 5 punctuation marks (, . ? ! ‘) (32 characters = 25), so 5 bits
 +  * For a longer string of characters, we can use shorter encodings for frequently used characters like ETAOINSHRDLU
 +      * Goal: optimal encoding that takes advantage of non-uniformity of letter frequencies 
 +      * Looking to represent data as compactly as possible
 +      * Example: Morse code, which is an example of variable-length encoding
 +      * We run into a problem when doing this: ambiguity caused by the encoding of one character being a prefix of encoding another
  
 +**Optimal Prefix Codes**
 +  * Solution to ambiguity: Prefix codes, which are a map of letters to bit strings such that no encoding is a prefix of any other
 +      * Goal: minimize the average number of bits per letter (ABL):
 +      * Σ<sub>x∈S</sub> : frequency of x * length of encoding x
 +      * fx : frequency that the letter x occurs
 +      * |ϒ(x)|: length of encoding of x
 +      * Minimize ABL =  Σ<sub>x∈S</sub> f<sub>x</sub>|ϒ(x)|
 +  * Binary tree to represent prefix codes
 +      * Exposes structure netter than list of mappings
 +          * Each leaf node is a letter
 +          * Follow path to the letter
 +              * Going left: 0, Going right: 1
 +  * Recursively generate tree
 +      * All letters are in root node
 +      * For all letters in node:
 +          * If encoding begins with 0, letter belongs in left subtree 
 +          * Otherwise (encoding begins with 1), letter belongs in right subtree 
 +          * If last bit of encoding, make the letter a leaf node of that subtree
 +          * Shift encoding one bit 
 +          * Process left and right children 
 +  * Tree Properties
 +      * The length of the path from root to leaf is the depth
 +      * The binary tree T corresponding to the optimal prefix code is full, i.e., each internal node has two children
 +  * Conclusions
 +      * The binary tree corresponding to the optimal prefix code is full, i.e., each internal node has two children
 +      * We want to label the leaf nodes of the binary tree corresponding to the optimal prefix code such that nodes with greatest depth have least frequency
 +      * The two letters with least frequency are definitely going to be siblings 
 +
 +**Huffman’s Algorithm**
 +  * Data structures needed:
 +      * Binary tree for the prefix codes
 +      * Priority queue for choosing the node with lowest frequency 
 +  * Costs
 +      * Inserting and extracting node into PQ: O(log n) 
 +      * Number of insertions and extractions: O(n)
 +      * O(n log n)
  
 <code> <code>
Line 271: Line 319:
     Else     Else
         Let y* and z* be the two lowest-frequency letters         Let y* and z* be the two lowest-frequency letters
-        Form a new alphabet S’ by deleting y* and z* and replacing them with a new letter ω of frequency f<sub>y*</sub> + f<sub)z*</sub>+        Form a new alphabet S’ by deleting y* and z* and replacing them with a new letter ω of frequency f<sub>y*</sub> + f<sub>z*</sub>
         Recursively construct a prefix code γ’ for S’, with tree T’         Recursively construct a prefix code γ’ for S’, with tree T’
         Define a prefix code for S as follows:         Define a prefix code for S as follows:
courses/cs211/winter2018/journals/boyese/chapter4.1520872684.txt.gz · Last modified: by boyese
CC Attribution-Noncommercial-Share Alike 4.0 International
Driven by DokuWiki Recent changes RSS feed Valid CSS Valid XHTML 1.0