| - | ===== 4.8 Huffman Codes and Data COmpression | + | ===== 4.8 Huffman Codes and Data Compression |
* Greedy rule to shrink the size of the problem instance so that an equivalent smaller problem can then be solved by recursion
* can lock the two lowest-frequency letters y* and z* together because we know they end up as sibling leaves below a common parent
* Replace y* and z* with a meta-letter whose frequency is the sum of their frequencies, then recurse on the resulting smaller alphabet
| + | // | ||
| {{: | {{: | ||
| Line 410: | Line 411: | ||
| {{: | {{: | ||
| + | * bottom-up approach: focuses on the leaves representing the two lowest-frequency letters, and then continues by recursion. | ||
| + | |||
| + | **Analyzing the Algorithm** | ||
| + | * //The Optimality of the Algorithm// | ||
| + | * Recursive algorithm: proof by induction | ||
| + | * Base: optimal for all two letter alphabets (one bit per letter) | ||
| + | * By induction: optimal for all alphabets of size k-1 | ||
| + | * Consider: alphabet S of size k | ||
| + | * The Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code | ||
| + | * // | ||
| + | * Can be polynomial time in k, the number of letters in the alphabet | ||
| + | * recursive calls: sequence of k-1 iterations | ||
| + | * each iteration consists simply of identifying the two lowest-frequency letters -> O(k) | ||
| + | * combined: O(k^2) | ||
| + | * Use a priority queue to maintain a set of k elements | ||
| + | * allows for insertion and extraction | ||
| + | * Each iteration: extract the minimum twice; insert a new letter whose key is the sum of these two minimum frequencies | ||
| + | * Using priority queues via heaps: each insertion and extraction: O(logk) -> over k iterations, O(klogk) | ||
| + | |||
| + | **Extensions** | ||
| + | * Data Compression: | ||
| + | * need to be highly compressed: use a " | ||
| + | * Cannot adapt to changes in text: frequency is stable for half the text, but then changes radically | ||
| + | * Halfway into the sequence insert some kind of instruction to completely change the encoding | ||
| + | * This will pay off in the long run: called adaptive compression schemes | ||
| + | |||
| + | ==== Personal Thoughts ==== | ||
| + | |||
| + | While arguably one of the more interesting topics of this textbook, this section was long and very dense. There was a lot of information that was hard to digest all at once. Our discussion in class was helpful, but not long enough to help make this section very clear. I will most likely need to review this chapter section again. | ||
| + | Readability: | ||
