| + | ===== 4.8: Huffman Codes and Data Compression ===== | ||
Encoding complex information into binary sequences is a crucial part of data compression. One component focuses on storing information in as little physical memory as possible, and that's where taking advantage of the nonuniform frequencies of letter appearances comes into play: letters that occur often should receive short encodings, while rare letters can afford longer ones.

A //prefix code// for a set S of letters is a function g that maps each letter to some sequence of zeros and ones, in such a way that for distinct letters x and y in S, the sequence g(x) is not a prefix of the sequence g(y). This allows an encoded message to be decoded unambiguously by scanning the bit sequence left to right. //Optimal// prefix codes are those that minimize the average number of bits required per letter, ABL(g), which for a given text depends on the relative frequencies with which the letters occur.

Note that we can represent prefix codes as binary trees: starting from the root, taking the path to the left child is a 0 and taking the path to the right child is a 1, with each letter sitting at a leaf. Additionally, the binary tree corresponding to an optimal prefix code is full (every internal node has exactly two children); otherwise some codeword could be shortened.

As proven in the book, there is an optimal prefix code, with corresponding tree T, in which the two lowest-frequency letters are assigned to leaves that are siblings in T. So, given the relative frequencies of each letter, //Huffman's Algorithm// constructs an optimal prefix code:
  * If S has two letters: encode one using 0 and the other using 1.
  * Else:
    * Let y and z be the two lowest-frequency letters.
    * Form a new alphabet S' by deleting y and z and replacing them with a new letter w of frequency //f(w) = f(y) + f(z)//.
    * Recursively construct a prefix code for S', with tree T'.
    * Define a prefix code for S as follows:
      * Start with T'.
      * Take the leaf labeled w and add two children below it, labeled y and z.

The codes produced by this algorithm are called //Huffman codes//, and they achieve the minimum ABL of any prefix code. Implementing the algorithm using heap-based priority queues, we get a running time of //O(n log n)// for an alphabet of n letters.

I very much liked this section. It did a very good job explaining everything, as well as the nuances involved in data compression.
