Differences

This shows you the differences between two versions of the page.

--- courses:cs211:winter2011:journals:wendy:chapter4 [2011/03/01 00:51] – [Section 7: Clustering] shangw
+++ courses:cs211:winter2011:journals:wendy:chapter4 [2011/03/02 04:59] (current) – shangw
@@ Line 79: / Line 79: @@
 This section talks about another problem-Clustering-and how to solve it with greedy algorithm. Clustering (of maximum spacing) can be precisely described as partitioning a collection of objects U such that the maximum possible spacing s achieved (That is, the minimum distance between any pair of points lying in different clusters is minimized).
-The algorithm is exactly the Kruskal's algorithm except that we stop it just before adding the last k-1th edge. Basically, it is constructing the minimum spanning tree without the k-1 most weighted edges.
+The algorithm is exactly the Kruskal's algorithm except that we stop it just before adding the last k-1th edge, hence its running time is O(mlogn). Basically, it is constructing the minimum spanning tree without the k-1 most weighted edges. We can prove it using a contradiction.
+Practical applications can be finding out the clusters and using low-cost network within the cluster and using more robust network between each cluster.
+This section is quite short but the application of the minimum spanning tree is interesting.
+===== Section 8: Huffman Code and Data Compression =====
+This section introduces us the Huffman Code and how to use greedy algorithm to achieve.
+First, the book gradually introduces the origin of Huffman coding from using 1,0 bits representing letters, the different frequency of usage of letters in English, the Morse code which is similar but easier with the pausing, Prefix codes and the way to calculate the optimal prefix codes (minimize the sum of frequency*bit-length).
+To begin design the algorithm, intuitively the Binary Tree comes to mind because if we only use the leaves to represent letters then it is gonna be prefix code since no parent is involved. It is not hard to see that the optimal prefix code corresponds with some full binary tree. Now we attempt to construct the algorithm to build such a full binary tree. The Top-Down Approach as in the S-F code is the first try, unfortunately the example used earlier is already a counter example. To perfectly modify this approach needs to borrow knowledge from dynamic programming, which has not come up yet. However, if given a tree structure (that is, all the frequencies of the letters), we can use the two lowest-frequency letters as leaves (then add up and regard as another leave) to plant the tree.This is the Huffman coding. The greedy part is to merge the two lowest-frequency leaves together in every recursive call, which is not hard to prove to be the optimal one. Priority Queue is the best data structure to implement the Huffman Coding, because the two lowest ones are easy to be extracted and push back the combined element to the queue it will automatically be in position.
+Compression is not always as easy as the Huffman coding. Tow major challenges are the "fractional bit representation" and, as mentioned before, the variation of text. Also, there are occasions when compression is not necessary such as all letters appear at similar frequency. Doing the compression in turn adds more work.
+I like the structure of this section because it tends to "guide" us to seek the right solution. Also, it stirred in some interesting stories along with the Huffman coding.
+Readability is 8.