Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
courses:cs211:winter2012:journals:jeanpaul:chapterfour_sectionviii [2012/03/06 03:42] – created mugabej | courses:cs211:winter2012:journals:jeanpaul:chapterfour_sectionviii [2012/03/06 05:27] (current) – [Implementation and Running Time] mugabej | ||
---|---|---|---|
Line 2: | Line 2: | ||
===== The Problem ===== | ===== The Problem ===== | ||
+ | |||
+ | |||
+ | * ** Encoding Symbols Using Bits**: | ||
+ | * We need schemes to encode text written in richer alphabets and converts this text into long strings of bits.\\ | ||
+ | * The first thought would be to represent all of the symbol by the same number of bits | ||
+ | * However, this is not efficient as some symbols are frequent while others are less frequent. | ||
+ | * Thus symbols are represented differently depending on their frequency. | ||
+ | * //Data Compression Algorithms// | ||
+ | |||
+ | \\ | ||
+ | ** Variable-Length Encoding Schemes**: | ||
+ | * Basically more frequent letters are represented differently from the less frequent ones. | ||
+ | * The goal is to represent symbols with as short bits as possible | ||
+ | \\ | ||
+ | ** Prefix codes**: | ||
+ | * It's the encoding of symbols in such a way that no encoding is a prefix of any other | ||
+ | * A //Prefix code// for a set S of letters is a function γ that maps each letter x∈ S to some sequence of zeros and ones such that: | ||
+ | * For distinct x,y ∈ S, the sequence γ(x) is not a prefix of the sequence γ(y). | ||
+ | \\ | ||
+ | ** Optimal Prefix Codes **: | ||
+ | * For each letter x in S, there is a frequency f< | ||
+ | * Assuming there are n letters, n*f< | ||
+ | * The frequencies sum to 1 --> | ||
+ | * |γ(x)|: Length of encoding of x | ||
+ | * ABL = Σ< | ||
+ | * ABL is the Average number of Bits per Letter | ||
+ | \\ | ||
+ | \\ | ||
+ | ==== Designing the Algorithm ==== | ||
+ | |||
+ | * The search space of the problem includes all of the possible ways of mapping letters to bit strings | ||
+ | * We need to develop a tree-based means of representing prefix codes that exposes their structure more clearly | ||
+ | \\ | ||
+ | \\ | ||
+ | ** Representing Prefix Codes using Binary Trees ** | ||
+ | * We label each leaf of the binary tree with a distinct letter in the size of the alphabet S | ||
+ | * For each letter x in S, we follow the path from the root to the leaf labeled x: | ||
+ | * Each time the path goes from a node to its left child, we write down a 0 | ||
+ | * Each time the path goes from a node to its right child, we write down a 1 | ||
+ | * The Encoding of S constructed from T is a prefix code | ||
+ | * The search for an optimal prefix code reduces to the search of a binary Tree T, together with a labeling of the leaves of T, that minimizes the ABL. | ||
+ | * Thus the length of the encoding of a letter x in S is simply the length from the root to the leaf labeled x: This length is termed the ** depth** of the leaf, and the depth of a leaf v in T will be denoted by depth< | ||
+ | * So now our goal is to find the labeled tree T that minimizes the weighted average of the depths of all leaves, where the average is weighted by the frequencies of the letters that label the leaves: ∑< | ||
+ | * The Binary Tree corresponding to the optimal prefix code is full | ||
+ | ==== Algorithm to Construct an Optimal Prefix Code ==== | ||
+ | |||
+ | >>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | \\ | ||
+ | * The algorithm above is referred to as the //Huffman Algorithm// and the prefix code it produces is termed the //Huffman code// | ||
+ | * This algorithm produces an optimal prefix code | ||
+ | * This algorithm also follows a bottom-up approach as seen: It focuses on the leaves representing the two lowest-frequency letters, and then continues by recursion. | ||
+ | \\ | ||
+ | \\ | ||
+ | ==== Analyzing the Algorithm ==== | ||
+ | |||
+ | * The algorithm is optimal | ||
+ | * ABL(T' | ||
+ | * The Huffman code for a given alphabet achieves the minimum ABL of any prefix code | ||
+ | |||
+ | ==== Implementation and Running Time ==== | ||
+ | |||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | ||
+ | \\ | ||
+ | So overall takes O(nlogn).\\ | ||
+ | \\ | ||
+ | Very intriguing section although its proofs were long. I give it an 8/10. | ||
+ | |||
+ | |||