courses:cs211:winter2018:journals:nasona:chapter4 · 2018/03/11 22:01 (current) · nasona
======= 4.8 Huffman Codes =======
==Summary==
We want to map encodings to letters so that no encoding is a prefix of another, in order to avoid ambiguity. A prefix code that minimizes the average number of bits per letter is optimal; in other words, more frequently used letters should have shorter encodings so that the data is compressed. A brute-force algorithm is infeasible because the search space is so large and so variable. We will use a full binary tree to represent prefix codes: all encoded letters will be leaf nodes, so that no encoding can be the prefix of another, and meta-letters will be interior nodes (parents of letters). The greedy aspect of Huffman's algorithm is that you merge the two lowest-frequency letters at each step. This bottom-up approach is optimal, and the runtime of the algorithm is O(klogk).
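To make the objective concrete, here is a small sketch (the four-letter alphabet, its frequencies, and the codewords are invented for illustration; they are not from the book) that checks the prefix-free property and computes the average bits per letter that an optimal prefix code minimizes:

```python
# Hypothetical alphabet with frequencies summing to 1 (illustration only)
freq = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}
# A candidate prefix code: more frequent letters get shorter codewords
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

# Prefix-free check: no codeword is a prefix of any other codeword
words = list(code.values())
assert not any(u != v and v.startswith(u) for u in words for v in words)

# Average bits per letter: the quantity an optimal prefix code minimizes
avg_bits = sum(freq[x] * len(code[x]) for x in freq)
print(round(avg_bits, 2))  # 1.75 bits per letter, versus 2 for a fixed-length code
```

Decoding is unambiguous precisely because of the prefix-free property: reading bits left to right, the first codeword that matches is the only one that can.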
  
==The Problem==
  * A greedy rule is used to shrink the size of the problem instance so that an equivalent smaller problem can then be solved by recursion
  * The global consequences of the initial greedy decision do not become fully apparent until the full recursion is complete
  * Data compression, an area that forms part of the foundation for digital communication
  * The mapping of bit strings to symbols is arbitrary
  * The letters in most human alphabets do not get used equally frequently
==Huffman’s Algorithm==
     To construct a prefix code for an alphabet S, with given frequencies:
     If S has two letters:
          Encode one letter using 0 and the other using 1
     Else:
          Let y* and z* be the two lowest-frequency letters
          Form a new alphabet S’ by deleting y* and z* and replacing them with a new letter w of frequency fy*+fz*
          Recursively construct a prefix code y’ for S’, with tree T’
          Define a prefix code for S as follows:
               Start with T’
               Take the leaf labeled w and add two children below it labeled y* and z*
     Endif
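The pseudocode above translates almost line for line into a short recursive function. This is my own sketch, not the book's code; the dict representation of frequencies and the use of a tuple as the meta-letter label are implementation choices made for the example.

```python
def huffman(freq):
    """Recursive Huffman construction following the pseudocode.
    freq: dict mapping letters to frequencies (at least 2 letters)."""
    if len(freq) == 2:
        a, b = freq  # base case: two letters, encoded 0 and 1
        return {a: "0", b: "1"}
    # y*, z*: the two lowest-frequency letters
    y, z = sorted(freq, key=freq.get)[:2]
    # Form S' by deleting y* and z* and adding meta-letter w of frequency fy*+fz*
    w = (y, z)  # a tuple serves as a fresh "meta-letter" label
    reduced = {x: f for x, f in freq.items() if x not in (y, z)}
    reduced[w] = freq[y] + freq[z]
    # Recursively construct a prefix code for S'
    code = huffman(reduced)
    # Replace the leaf w with two children labeled y* and z*
    prefix = code.pop(w)
    code[y] = prefix + "0"
    code[z] = prefix + "1"
    return code

print(huffman({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}))
# a valid prefix code; the exact codewords can depend on tie-breaking
```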
  
  * The greedy rule underlying Huffman’s Algorithm – the merging of the two lowest frequency letters – fits into the structure of the algorithm as a whole
    * Each iteration, which performs three of these operations, takes time O(logk)
    * Summing over all k iterations, we get a total runtime of O(klogk)
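The priority-queue implementation described above might look like this in Python, a sketch using the standard-library heapq module; the tuple-based tree representation and the tie-breaking counter are my own choices, not the book's:

```python
import heapq

def huffman_heap(freq):
    """Build a Huffman tree with a min-heap keyed on frequency.
    Each heap entry is (frequency, counter, tree); the counter breaks
    ties so heapq never has to compare trees directly."""
    heap = [(f, i, letter) for i, (letter, f) in enumerate(freq.items())]
    heapq.heapify(heap)  # O(k)
    count = len(heap)
    while len(heap) > 1:  # k - 1 merges, each costing O(logk)
        f1, _, t1 = heapq.heappop(heap)  # lowest-frequency letter/meta-letter
        f2, _, t2 = heapq.heappop(heap)  # second lowest
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))  # merged meta-letter
        count += 1
    return heap[0][2]  # root of the full binary tree

def codewords(tree, prefix=""):
    """Read codewords off the tree: left edge = 0, right edge = 1."""
    if not isinstance(tree, tuple):  # leaf: an original letter
        return {tree: prefix or "0"}
    left, right = tree
    return {**codewords(left, prefix + "0"), **codewords(right, prefix + "1")}

tree = huffman_heap({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10})
print(codewords(tree))  # frequent letters end up near the root, so their codes are short
```

The heapify is O(k) and each of the k - 1 iterations does a constant number of O(logk) heap operations, matching the O(klogk) total given above.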
==Extensions==
  * The challenge here is to define an encoding scheme in which the notion of using fractions of bits is well-defined
  * Another drawback of prefix codes is that they cannot adapt to changes in the text
  
==Additional Information==
A concept that was troubling me in the beginning stages of understanding Huffman codes was the meta-letter that serves as the parent of leaf nodes. As I read the section after class, it made more sense. The meta-letter is there because a letter node cannot be: a letter's encoding cannot sit at an interior node, because that would make it the prefix of another encoding, which defeats the whole point of Huffman codes. There can be no encoding that is the prefix of another. Moreover, meta-letters are helpful and make more sense in Huffman's optimal bottom-up approach anyway. The first meta-letter joins the two letters of lowest frequency; that way it acts like a super-letter, and the alphabet now has one less letter. This explanation of the purpose of the meta-letter also makes it clear why we would implement the algorithm with a priority queue.

On a scale of 1 to 10, the readability of this section is a 9. The book explained the notation and the concepts behind Huffman codes really well. Also, the main part of the algorithm involves binary trees, which we've already seen multiple times in this class, so it was really building on knowledge we already had.
courses/cs211/winter2018/journals/nasona/chapter4.1520781960.txt.gz · Last modified: by nasona
CC Attribution-Noncommercial-Share Alike 4.0 International