==== Section 4.8: Huffman Codes and Data Compression ====
  
Here, we focus on how computers store and transfer information using "bits". With b bits per character, we can represent 2^b characters uniquely. However, fixed-length encodings are not always optimal. Bits cost memory to store, and shorter representations can become ambiguous when codewords are run together. For instance, suppose that a=1 and b=11. The word ab would be encoded as the bits 111, which could also be read as aaa; the problem is that the encoding of a is a prefix of the encoding of b. If we added spaces to separate codewords, we would technically be using a 3-symbol system instead of 2 (0, 1, space), which we want to avoid here. Thus, we are torn between minimizing the amount of space required and avoiding ambiguity. What we need is an algorithm that generates unambiguous codes, called prefix codes (codes in which no codeword is a prefix of another), using the fewest bits on average. To achieve this, the algorithm uses fewer bits for characters that occur more frequently and more bits for rarer ones, which decreases the average number of bits per letter. That is, our algorithm should generate prefix codes that are "optimal", satisfying both goals.

We can view any prefix code as a binary tree whose leaves are the letters, and clarify some basic facts. First, the tree for an optimal code is full: every node that is not a leaf has two children, since a tree containing a one-child node can always be converted into a strictly better one. Second, more frequently occurring characters should sit closer to the root, because a character's codeword length equals the number of edges on its path to the root, so smaller depth means fewer bits. Third, the lowest-frequency letter must be a leaf at maximum depth; in fact, the two lowest-frequency letters can be assumed to be sibling leaves at the deepest level.

Thus, we arrive at Huffman's algorithm, a greedy recursive procedure for constructing such a tree. It builds the tree bottom-up, repeatedly merging the two lowest-frequency letters into a single "meta-letter" whose frequency is their sum (a sketch appears below). There is a simple equation relating the average bit-lengths (ABL) of the trees before and after a merge: if ω is the meta-letter formed from the two lowest-frequency letters, then the ABL of the smaller tree equals the ABL of the full tree minus the frequency of ω. This reflects the fact that splitting ω back into its two letters makes each of their codewords exactly one bit longer, so the average grows by exactly the frequency of ω; the higher that frequency, the bigger the jump in ABL between the two trees. The concrete proof of optimality is very complex and I do not fully understand the math behind it. The total running time of Huffman's algorithm is O(k log k), where k is the number of letters in the alphabet. We could perhaps improve compression further by using fractions of bits instead of full bits, but this holds deep challenges and complexities.

Overall, I give this section a 4/10. The concepts were mostly straightforward, but I could not really understand Huffman's algorithm or many of the proofs discussed.
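To make the procedure concrete, here is a minimal sketch of Huffman's algorithm in Python (not from the textbook; the letter frequencies in the example are made up for illustration). It repeatedly pops the two lowest-frequency subtrees off a heap and merges them, then reads the codewords off the finished tree:

<code python>
import heapq

def huffman_code(freq):
    """Build a prefix code from {letter: frequency}.
    A tree is either a letter (leaf) or a (left, right) pair."""
    # heap entries: (frequency, tie-breaker, tree); the tie-breaker
    # keeps heapq from ever comparing two trees directly
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)                 # lowest frequency
        f2, _, t2 = heapq.heappop(heap)                 # second lowest
        heapq.heappush(heap, (f1 + f2, tie, (t1, t2)))  # merged meta-letter
        tie += 1
    _, _, tree = heap[0]

    code = {}
    def walk(t, bits):
        if isinstance(t, tuple):     # internal node: recurse both ways
            walk(t[0], bits + "0")
            walk(t[1], bits + "1")
        else:                        # leaf: record the codeword
            code[t] = bits or "0"
    walk(tree, "")
    return code

freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
code = huffman_code(freq)
abl = sum(freq[ch] * len(code[ch]) for ch in freq)
print(code)  # frequent letters get short codewords (2 bits for a)
print(abl)   # average bits per letter; 2.23 for these frequencies
</code>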
  
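Continuing from the sketch above, we can also check the bit-length equation directly: merging the two lowest-frequency letters (here d and e) into a hypothetical meta-letter w with frequency f_w = f_d + f_e should lower the average bit-length by exactly f_w:

<code python>
# smaller alphabet: d and e replaced by the meta-letter "w"
freq2 = {"a": 0.32, "b": 0.25, "c": 0.20, "w": 0.23}  # f_w = 0.18 + 0.05
code2 = huffman_code(freq2)
abl2 = sum(freq2[ch] * len(code2[ch]) for ch in freq2)
print(abl2)        # 2.00
print(abl - abl2)  # 0.23 (up to float rounding) == f_w
</code>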
  
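Finally, the prefix property is exactly what removes the ambiguity discussed above: scanning the bit stream left to right, the first complete codeword we see is always the right one. A small decoder, reusing the code dictionary built earlier:

<code python>
def decode(bits, code):
    """Decode a bit string with a prefix code {letter: codeword}.
    Since no codeword is a prefix of another, greedy matching is unique."""
    inverse = {w: ch for ch, w in code.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in inverse:           # complete codeword found
            out.append(inverse[cur])
            cur = ""
    return "".join(out)

msg = "badce"
bits = "".join(code[ch] for ch in msg)
print(decode(bits, code) == msg)  # True: the encoding round-trips
</code>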