===== 4.8: Huffman Codes and Data Compression =====
Encoding information into binary sequences is at the heart of data compression. One key goal is storing information in as little physical memory as possible, and that is where taking advantage of the nonuniform frequencies of letter appearances comes into play. "Letters" here means the fundamental pieces of whatever "alphabet" is being used; this does not necessarily mean ASCII text, it could be any form of encoded information.

A //prefix code// for a set S of letters is a function g that maps each letter to some sequence of zeros and ones, in such a way that for distinct letters x and y in S, the sequence g(x) is not a prefix of the sequence g(y). This allows an encoded message to be perfectly decoded by scanning the bit sequence left to right. //Optimal// prefix codes are those that minimize the average number of bits required per letter, ABL(g), which is the sum over all letters x of f<sub>x</sub>·|g(x)|, where f<sub>x</sub> is the relative frequency of x in the text (the frequencies sum to 1) and |g(x)| is the length of x's codeword.

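To make the definition concrete, here is a minimal Python sketch using made-up frequencies and a hand-built (not necessarily optimal) prefix code; it checks the prefix property and computes ABL(g):

<code python>
# Hypothetical letter frequencies (summing to 1) and a hand-built
# prefix code g -- illustrative values, not from the book.
freq = {'a': 0.32, 'b': 0.25, 'c': 0.20, 'd': 0.18, 'e': 0.05}
g = {'a': '11', 'b': '01', 'c': '001', 'd': '10', 'e': '000'}

# Prefix property: no codeword is a prefix of any other codeword.
words = list(g.values())
assert not any(x != y and y.startswith(x) for x in words for y in words)

# ABL(g) = sum over letters x of f_x * |g(x)|.
abl = sum(f * len(g[x]) for x, f in freq.items())
print(round(abl, 2))  # 2.25 for this code
</code>
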
Note that we can represent a prefix code as a binary tree whose leaves are the letters: starting from the root, taking the edge to a left child is a 0 and taking the edge to a right child is a 1, so each letter's codeword is the path from the root to its leaf. Additionally, the tree corresponding to an optimal prefix code is //full//: every non-leaf node has two children. Note also that ABL(g) can therefore be thought of as the weighted average of the depths of the leaves, where each leaf's weight is the frequency of its letter.

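A quick sketch of left-to-right decoding with such a tree, using a nested-dict representation of the hypothetical code above (my own representation, not the book's):

<code python>
# The same hypothetical code as a tree: '0' selects the left child,
# '1' the right child; leaves hold the letters themselves.
tree = {'0': {'0': {'0': 'e', '1': 'c'}, '1': 'b'},
        '1': {'0': 'd', '1': 'a'}}

def decode(bits, tree):
    out, node = [], tree
    for bit in bits:
        node = node[bit]           # follow one edge per bit
        if isinstance(node, str):  # reached a leaf: emit its letter
            out.append(node)
            node = tree            # restart at the root
    return ''.join(out)

print(decode('110100111', tree))  # 'abca'
</code>
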
Proven in the book, we have that there is an optimal prefix code with a corresponding tree T in which the two lowest-frequency letters are assigned to leaves that are siblings in T. So, given the relative frequencies of the letters, //Huffman's Algorithm// builds such a tree recursively from the bottom up (a Python sketch follows the list):
  * If S has two letters: encode one using 0 and the other using 1.
  * Else:
    * Let y and z be the two lowest-frequency letters.
    * Form a new alphabet S' by deleting y and z and replacing them with a new letter w of frequency //f<sub>y</sub>+f<sub>z</sub>//.
    * Recursively construct a prefix code for S' with tree T'.
    * Define a prefix code for S as follows:
      * Start with T'.
      * Take the leaf labeled w and add two children below it, labeled y and z.

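Here is a heap-based Python sketch of this construction (my own names and representation; the iterative merging loop plays the role of the recursion on S', with each merged pair acting as the new letter w):

<code python>
import heapq
from itertools import count

def huffman(freq):
    """Build a Huffman code for a dict {letter: frequency} using a
    heap-based priority queue; returns {letter: bitstring}.
    A sketch assuming the alphabet has at least two letters."""
    tiebreak = count()  # unique tiebreaker keeps comparisons off the subtrees
    # Heap entries: (frequency, tiebreaker, subtree). A leaf is a letter;
    # an internal node is a (left, right) pair -- the merged letter w.
    heap = [(f, next(tiebreak), x) for x, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fy, _, y = heapq.heappop(heap)   # two lowest-frequency subtrees
        fz, _, z = heapq.heappop(heap)
        heapq.heappush(heap, (fy + fz, next(tiebreak), (y, z)))
    code = {}
    def walk(node, bits):
        if isinstance(node, tuple):    # internal node: recurse into children
            walk(node[0], bits + '0')  # edge to the left child is a 0
            walk(node[1], bits + '1')  # edge to the right child is a 1
        else:
            code[node] = bits          # leaf: record the finished codeword
    walk(heap[0][2], '')
    return code
</code>
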
The codes produced by this algorithm are called //Huffman codes//, and they achieve the minimum ABL of any prefix code. Implementing the algorithm with a heap-based priority queue (as in the sketch above) gives a running time of //O(n log n)// for an alphabet of n letters, since each of the n-1 merges costs O(log n) heap work.

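For instance, running the sketch above on the hypothetical frequencies from before (the exact codewords can vary with tie-breaking, but the ABL is always the minimum):

<code python>
freq = {'a': 0.32, 'b': 0.25, 'c': 0.20, 'd': 0.18, 'e': 0.05}
code = huffman(freq)
print(code)  # e.g. {'c': '00', 'e': '010', 'd': '011', 'b': '10', 'a': '11'}
abl = sum(f * len(code[x]) for x, f in freq.items())
print(round(abl, 2))  # 2.23, slightly better than the hand-built 2.25 code
</code>
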
I very much liked this section. It did a very good job of explaining everything, as well as the nuances involved in data compression. It certainly "fits" in this chapter of the book due to the greedy nature of the algorithm, but the introduction to information theory made the field seem fascinating and all I want is more! 9/10.
  