Differences

This shows you the differences between two versions of the page.

--- courses:cs211:winter2018:journals:ahmadh:ch4 [2018/03/11 06:34] – ahmadh
+++ courses:cs211:winter2018:journals:ahmadh:ch4 [2018/03/11 06:49] (current) – [4.8.3 Comments] ahmadh
@@ Line 223: / Line 223: @@
 As such, the search for an optimal prefix code can be viewed as the search for a binary tree T, together with a labeling of the leaves of T, that minimizes the average number of bits per letter. Note that the length of the encoding of a letter x ∈ S is simply the length of the path from the root to the leaf labeled x. Thus we dre seeking the labeled tree that minimizes the weighted average of the depths of all leaves, where the average is weighted by the frequencies of the letters that label the leaves: ∑ (x ∈ S) f_x × depth(x). We will use ABL(T) to denote this quantity.
+=== 4.8.2.1 An Algorithm to Construct an Optimal Prefix Code ===
+	To construct a prefix code for an alphabet S, with given frequencies:
+	   If S has two letters then
+	      Encode one letter using 0 and the other letter using 1
+	   Else
+	      Let y* and z* be the two lowest-frequency letters
+	      Form a new alphabet S’ by deleting y* and z* and
+	        replacing them with a new letter ω of frequency f_y* + f_z*
+	      Recursively construct a prefix code γ’ for S’, with tree T’
+	      Define a prefix code for S as follows:
+	         Start with T’
+	         Take the leaf labeled ω and add two children below it labeled y* and z*
+	   Endif
+The proof of correctness and optimality for this algorithm can be found on pages 174-5 of the textbook, and is omitted here for redundancy purposes.
+This algorithm is referred to as Huffman's Algorithm, and the prefix code it generates is called a Huffman Code. With the most basic implementation, identifying the lowest-frequency letters can be done in a single scan of the alphabet, in time O(k), and so summing this over the k - 1 iterations gives O(k^2) time. However, this time can be improved upon. We can maintain the alphabet S in a priority queue, using each letter’s frequency as its key. In each iteration we just extract the minimum twice (this gives us the two lowest-frequency letters), and then we insert a new letter whose key is the sum of these two minimum frequencies. Our priority queue now contains a representation of the alphabet that we need for the next iteration. We can make each insertion and extraction of the minimum run in time O(log k); hence, each iteration--which performs just three of these operations--takes time O(log k). Summing over all k iterations, we get a total running time of O(k log k).
+==== 4.8.3 Comments ====
+I feel like it is safe to say that this was by far the most interesting section that I remember reading in the book. I am not sure why, but the problem seemed really practical/important to me, and I can totally see how big of a deal this algorithm was when it came out. The idea of encoding the most frequent letters using fewer bits seems like a no brainer. The section attracted and then kept my attention //throughout// the reading--and that is a first. 10/10. :)