As the algorithms we explore become more complicated, it is more important than ever to work through the proofs slowly.  It is much harder to fully understand an optimal solution at first glance when the problems become large and complex.  I found the similarity between BFS and Dijkstra's Algorithm interesting; I don't see much of a difference, just an ordering of the edges added (not very complicated).

==== 4.7 - Clustering ====

Clustering is the problem of breaking a collection of objects down into coherent groups.  This is done with a distance function, under the assumption that closer objects are more related.  The spacing of a clustering is the minimum distance between any pair of points in different clusters.  The goal is to find a k-clustering with the maximum possible spacing.

One method is to add edges between nodes in order of increasing distance.  If an edge connects two nodes that are already in the same cluster, it is not added.  The algorithm terminates once it has reached the desired number of connected components, k.  This is essentially Kruskal's Algorithm stopped early: the components C(1), …, C(k) formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.

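A minimal sketch of this procedure, not taken from the book's pseudocode: Kruskal's Algorithm run until only k components remain, using a simple union-find structure.  The function name, the labeling of points as 0..n-1, and the edge list of (distance, u, v) tuples are all assumptions made for illustration.

<code python>
# Sketch: k-clustering of maximum spacing via Kruskal's Algorithm stopped early.
# Assumes points are labeled 0..n-1 and edges is a list of (distance, u, v) tuples.

def k_clustering(n, edges, k):
    parent = list(range(n))                # union-find forest over the points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    components = n
    for d, u, v in sorted(edges):          # edges in order of increasing distance
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                       # endpoints already in the same cluster: skip
        if components == k:
            return d                       # cheapest edge between clusters = the spacing
        parent[ru] = rv                    # merge the two clusters
        components -= 1
    return None                            # fewer than k points: no k-clustering exists

# Four points on a line at positions 0, 1, 2, 10 (made-up example).
pts = [0, 1, 2, 10]
edges = [(abs(pts[i] - pts[j]), i, j) for i in range(4) for j in range(i + 1, 4)]
print(k_clustering(4, edges, k=2))         # -> 8, the gap between {0, 1, 2} and {10}
</code>
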
==== 4.8 - Huffman Codes and Data Compression ====

Data compression deals with reducing the space files take up through efficient encoding schemes.  A prefix code is one in which no codeword is a prefix of any other, so letters can be given codewords of different lengths and the resulting bit string can still be split unambiguously.  This is useful when choosing bit encodings for letters based on their frequencies.

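For example, with the made-up prefix code below (the letters and codewords are invented for illustration), a bit string can be decoded greedily from left to right without any separators:

<code python>
# Illustrative prefix code: no codeword is a prefix of another.
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

def decode(bits, code):
    lookup = {word: letter for letter, word in code.items()}
    out, current = [], ''
    for b in bits:
        current += b
        if current in lookup:        # a complete codeword has been read
            out.append(lookup[current])
            current = ''
    return ''.join(out)

print(decode('0110111', code))       # -> 'acd'
</code>
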
We use a Greedy Algorithm to find the optimal prefix code.  A prefix code can be represented by a binary tree T whose leaves are the letters we wish to encode: following the path from the root to a letter's leaf gives its codeword.  Every branch to the left we write as a 0, and every branch to the right as a 1; the sequence of bits along the path is the encoding.  The process can be applied in reverse to build a tree from a prefix code.  The length of a letter's encoding is the number of edges on its path, i.e. the depth of its leaf.  The binary tree corresponding to an optimal prefix code is full (every internal node has exactly 2 children).

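A small sketch of reading the codewords off such a tree; the Node class and the example tree are assumptions made up for illustration, chosen to match the prefix code above:

<code python>
# Derive codewords from a code tree: 0 for a left branch, 1 for a right branch,
# leaves hold the letters (assumed node layout).

class Node:
    def __init__(self, letter=None, left=None, right=None):
        self.letter, self.left, self.right = letter, left, right

def codewords(node, prefix='', table=None):
    if table is None:
        table = {}
    if node.letter is not None:              # leaf: the path so far is the codeword
        table[node.letter] = prefix
    else:                                    # internal node of a full tree: two children
        codewords(node.left, prefix + '0', table)
        codewords(node.right, prefix + '1', table)
    return table

# Tree for the code {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
tree = Node(left=Node('a'),
            right=Node(left=Node('b'),
                       right=Node(left=Node('c'), right=Node('d'))))
print(codewords(tree))                       # -> {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
</code>
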
Producing a tree whose leaves are as close to the root as possible gives us a small average leaf depth.  A top-down approach is to split the alphabet into two sets whose total frequencies are as balanced as possible, then recursively construct prefix codes for both sets and make them the two children of the root node.  The question arises: how do we know what counts as an acceptably balanced split of the frequencies?

Huffman's Algorithm instead works bottom-up: we repeatedly merge the two lowest-frequency letters into a single meta-letter, which is later expanded back into its letters.  The Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code.  Finding the two lowest-frequency letters by scanning takes O(k) time, so over k-1 iterations the running time is O(k^2).  However, a priority queue allows insertion and extraction of the minimum in O(log k) time, so over k iterations the total running time is O(k log k).
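
A sketch of Huffman's Algorithm using Python's heapq module as the priority queue.  The frequencies are invented for illustration, and the counter stored in each heap entry only breaks ties so that entries with equal frequency are never compared by their nested lists.

<code python>
import heapq

def huffman(freqs):
    """Return a letter -> codeword table for the given frequency dict."""
    # Each heap entry: (frequency, tie-breaker, [(letter, codeword-so-far), ...])
    heap = [(f, i, [(letter, '')]) for i, (letter, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)   # two lowest-frequency (meta-)letters
        f2, _, group2 = heapq.heappop(heap)
        merged = ([(l, '0' + c) for l, c in group1] +   # first group hangs off the 0 branch
                  [(l, '1' + c) for l, c in group2])    # second group off the 1 branch
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return dict(heap[0][2])

# Made-up frequencies: the most frequent letter gets the shortest codeword.
print(huffman({'a': 0.5, 'b': 0.25, 'c': 0.15, 'd': 0.10}))
# -> {'a': '0', 'b': '10', 'd': '110', 'c': '111'} (codeword lengths 1, 2, 3, 3)
</code>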
  
  