Just like CS112
We need 5 bits to encode whole aphabet.
We can do it in a better way.
We can have shorter encodings for more frequent letters.
This will work as long as prefixes don't overlap.
Goal is to minimize ABL (Average # of bits per letter)
Calculate ABL with ABL = sum ( frequency * #bits)
It would actually be pretter interesting if they explained how they got this formula. It wouldn't help me learn at all, but it would have been cool
How to minimize it? Brute force? If that was the answer we wouldn't be studying this.
Use binary tree to represent encoding. encoding is the path you take to get to the letter (all letters are on leaves)
0 left 1 right
There's a proof in the book about how the tree has to be full. I thought this was pretty intuitive. I don't know if this is because I've seen this before or if it actually is just pretty intuitive.
Proof is on page 168
We know it makes a lot of sense to have the most frequent letters near the top of the tree and to have letter with similar frequencies to have similar bit lengths.
This is how we are going to put them in the tree.
More frequent → higher up in teh tree → fewer bits
Similar frequency → sibilings or similar # bits
Algorithm on Page 172.
Informal and intuitive explanation below
Put em all in a priority Q
Grab two with lowest frequencies (so that they can be toward the bottom of the tree) and make them have a common parent. make the parent's frequency the sum of those of it's children. this is O(log n)
Do this over and over until we only have on thingy in the PQ.
This is our tree.
This leads us to have similar frequency letters on similar levels.
We only do this n-1 times because we remove two items from the PQ and add one, with a net loss of 1 item, and we do this until there is only one item left in the PQ
So, the whole algorithm is O(nlogn)
Proof on page 174-175 for optimality
Informal explanation of proof
Contradiction proof
Suppose this algorithm is not optimal.
This means there is some subtree Z such that the ABL of Z is less than the ABL of the entire tree
We know that there is at least one subtree Z where (because the entire tree is full) two letters are the leaves of some node in Z. Let us call these letters y and x.
If we delete y and x and label their former parent w
I don't really understand the rest of the proof. It's very confusing. I'll have to keep looking at it.