===== 4.7 Clusters =====

Clustering arises whenever one has a collection of objects that one is trying to classify or organize into coherent groups. One common approach to this problem is to define a distance function on the objects, with the interpretation that objects at a larger distance from one another are less similar to each other. Given a distance function on the objects, the clustering problem seeks to divide them into groups so that, intuitively, objects within the same group are "close," and objects in different groups are "far apart."

Suppose we are given a set U of n objects, labeled p_1, p_2, ..., p_n. For each pair, p_i and p_j, we have a numerical distance d(p_i, p_j). Suppose we are seeking to divide the objects in U into k groups, for a given parameter k. We say that a k-clustering of U is a partition of U into k non-empty sets C_1, C_2, ..., C_k. We define the spacing of a k-clustering to be the minimum distance between any pair of points lying in different clusters.
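For concreteness, the spacing of a given clustering can be computed directly from this definition. Below is a tiny sketch of that computation (my own notation, not the book's), where labels assigns each point its cluster and dist maps index pairs to distances:

  # Hypothetical helper: spacing = min distance between points in different clusters.
  def spacing(labels, dist):
      return min(d for (i, j), d in dist.items() if labels[i] != labels[j])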

There are exponentially many different k-clusterings of a set U. How, then, can we efficiently find the one that has maximum spacing?

==== 4.7.1 Designing the Algorithm ====

To find a clustering of maximum spacing, we consider growing a graph on the vertex set U. The connected components will be the clusters, and we will try to bring nearby points together into the same cluster as rapidly as possible. Thus we start by drawing an edge between the closest pair of points, then an edge between the next closest pair, and we continue adding edges between pairs of points in order of increasing distance d(p_i, p_j). In this way, we are growing a graph H on U edge by edge, with connected components corresponding to clusters. If we are about to add the edge (p_i, p_j) and find that p_i and p_j already belong to the same cluster, we refrain from adding the edge--it's not necessary, because it won't change the set of components. In this way, our graph-growing process never creates a cycle, so H will actually be a union of trees. Each time we add an edge that spans two distinct components, it is as though we have merged the two corresponding clusters.

Although our graph-growing procedure was motivated by this cluster-merging idea, it is precisely Kruskal's Minimum Spanning Tree Algorithm. We are doing exactly what Kruskal's Algorithm would do if given a graph G on U in which there was an edge of cost d(p_i, p_j) between each pair of nodes (p_i, p_j). The only difference is that we seek a k-clustering, so we stop the procedure once we obtain k connected components.
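Concretely, this stopped-early Kruskal procedure can be implemented with a union-find structure. Here is a minimal Python sketch of the idea described above; the representation (points as indices 0..n-1, dist as a dict of pair distances) and all names are my own assumptions, not from the book:

  def max_spacing_clusters(n, dist, k):
      # dist maps index pairs (i, j), i < j, to the distance d(p_i, p_j).
      parent = list(range(n))                 # union-find forest: one tree per point
      def find(i):                            # root of i's component, with path halving
          while parent[i] != i:
              parent[i] = parent[parent[i]]
              i = parent[i]
          return i
      components = n
      for (i, j), d in sorted(dist.items(), key=lambda e: e[1]):  # increasing distance
          if components == k:                 # stop once k connected components remain
              break
          ri, rj = find(i), find(j)
          if ri != rj:                        # an edge inside one cluster would form a cycle
              parent[ri] = rj                 # merge the two clusters
              components -= 1
      return [find(i) for i in range(n)]      # a cluster label for each point

The returned labels can be fed straight into the spacing helper above to see what spacing the resulting clustering achieves.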

**Claim:** The components C_1, C_2, ..., C_k formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing. (Deleting the k-1 most expensive edges of T leaves exactly the k components that our early-stopping procedure produces.)

The detailed proof of this claim can be found on pages 160-161 of the book, and is omitted here to avoid redundancy.

==== 4.7.2 Comments ====

I felt like this was one of the less interesting sections we've done. I don't mean to imply that the problem was insignificant--I just felt that the section did not captivate my attention as much. And since we could not really discuss the algorithm in detail in class on Friday (due to lack of time), I found myself spending more time than expected understanding the algorithm and its proofs. In terms of how interesting the section was, I'd say about 6/10.

===== 4.8 Huffman Codes and Data Compression =====

==== 4.8.1 Prefix and Optimal Prefix Codes ====

We say that a prefix code for a set S of letters is a function γ that maps each letter x ∈ S to some sequence of zeros and ones, in such a way that for distinct x, y ∈ S, the sequence γ(x) is not a prefix of the sequence γ(y). If we have a text consisting of a sequence of letters x_1 x_2 x_3 ... x_n, we can convert it to a sequence of bits by simply encoding each letter as a bit sequence using γ and then concatenating all these bit sequences together: γ(x_1) γ(x_2) ... γ(x_n).

Any prefix code allows this kind of encoding. However, given an alphabet and a set of frequencies for the letters, we would ideally like to produce a prefix code that is as efficient as possible--namely, a prefix code that minimizes the average number of bits per letter (ABL). We call such a prefix code optimal.
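As a quick illustration (my own example, not the book's), here is a small prefix code on four letters and its ABL under some hypothetical frequencies:

  gamma = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}  # no codeword is a prefix of another
  freq = {'a': 0.5, 'b': 0.25, 'c': 0.15, 'd': 0.10}     # hypothetical frequencies
  encoded = ''.join(gamma[x] for x in 'abad')            # '0100111'
  abl = sum(freq[x] * len(gamma[x]) for x in gamma)      # 0.5*1 + 0.25*2 + 0.15*3 + 0.10*3 = 1.75

Prefix-freeness is what makes the concatenated bit string unambiguous to decode.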

==== 4.8.2 Designing the Algorithm ====

We describe a greedy method to construct an optimal prefix code efficiently. First, however, it is useful to develop a tree-based means of representing prefix codes that exposes their structure more clearly than simply a list of function values γ(x).

A labeled, //full// binary tree T naturally describes a prefix code. For each letter x ∈ S, we follow the path from the root to the leaf labeled x: each time the path goes from a node to its left child, we write down a 0, and each time the path goes from a node to its right child, we write down a 1. We take the resulting string of bits as the encoding of x.

We claim that the encoding of S derived from T is a prefix code. This is true because, in order for the encoding of x to be a prefix of the encoding of y, the path from the root to x would have to be a prefix of the path from the root to y. But that is the same as saying that x would lie on the path from the root to y, which isn't possible if x is a leaf node. What's more, this relationship between binary trees and prefix codes works in the other direction as well: every prefix code can be represented by such a tree.

As such, the search for an optimal prefix code can be viewed as the search for a binary tree T, together with a labeling of the leaves of T, that minimizes the average number of bits per letter. Note that the length of the encoding of a letter x ∈ S is simply the length of the path from the root to the leaf labeled x. Thus we are seeking the labeled tree that minimizes the weighted average of the depths of all leaves, where the average is weighted by the frequencies of the letters that label the leaves: ∑_{x ∈ S} f_x · depth(x). We will use ABL(T) to denote this quantity.
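To make this correspondence concrete, here is a small sketch in my own representation (not the book's): an internal node is a (left, right) pair, a leaf is a letter, and walking left or right appends a 0 or a 1.

  def codes_from_tree(node, path=''):
      if isinstance(node, str):                # leaf: the path so far is the codeword
          return {node: path}
      left, right = node                       # internal node of a full binary tree
      out = codes_from_tree(left, path + '0')
      out.update(codes_from_tree(right, path + '1'))
      return out
  tree = ('a', ('b', ('c', 'd')))              # the tree behind the gamma shown earlier
  assert codes_from_tree(tree) == {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

Since depth(x) is just len(γ(x)), ABL(T) for this tree is the same 1.75 computed in the previous snippet.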

=== 4.8.2.1 An Algorithm to Construct an Optimal Prefix Code ===

  To construct a prefix code for an alphabet S, with given frequencies:
     If S has two letters then
        Encode one letter using 0 and the other letter using 1
     Else
        Let y* and z* be the two lowest-frequency letters
        Form a new alphabet S' by deleting y* and z* and
          replacing them with a new letter ω of frequency f_y* + f_z*
        Recursively construct a prefix code γ' for S', with tree T'
        Define a prefix code for S as follows:
           Start with T'
           Take the leaf labeled ω and add two children below it labeled y* and z*
     Endif

The proof of correctness and optimality for this algorithm can be found on pages 174-175 of the textbook, and is omitted here to avoid redundancy.

This algorithm is referred to as Huffman's Algorithm, and the prefix code it generates is called a Huffman code. With the most basic implementation, identifying the two lowest-frequency letters can be done in a single scan of the alphabet, in time O(k), and summing this over the k - 1 iterations gives O(k^2) time. This running time can be improved, however. We can maintain the alphabet S in a priority queue, using each letter's frequency as its key. In each iteration we extract the minimum twice (this gives us the two lowest-frequency letters), and then insert a new letter whose key is the sum of these two minimum frequencies. The priority queue then contains a representation of the alphabet we need for the next iteration. Each insertion and extraction of the minimum runs in time O(log k); hence each iteration--which performs just three of these operations--takes time O(log k). Summing over all k - 1 iterations, we get a total running time of O(k log k).
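Here is a minimal sketch of that heap-based construction, assuming the frequencies arrive as a dict and reusing the (left, right)/leaf tree representation and the codes_from_tree helper from the earlier sketch:

  import heapq
  from itertools import count
  def huffman_tree(freq):                      # assumes S has at least two letters
      tiebreak = count()                       # keeps ties from ever comparing trees
      heap = [(f, next(tiebreak), x) for x, f in freq.items()]
      heapq.heapify(heap)                      # priority queue keyed on frequency
      while len(heap) > 1:
          f1, _, t1 = heapq.heappop(heap)      # extract the two lowest-frequency letters
          f2, _, t2 = heapq.heappop(heap)
          heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))  # merged letter ω
      return heap[0][2]
  codes = codes_from_tree(huffman_tree({'a': 0.5, 'b': 0.25, 'c': 0.15, 'd': 0.10}))

Each iteration performs two extract-mins and one insertion, which is exactly the three O(log k) operations counted above.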

==== 4.8.3 Comments ====

I feel like it is safe to say that this was by far the most interesting section I remember reading in the book. I am not sure why, but the problem seemed really practical and important to me, and I can totally see how big a deal this algorithm was when it came out. The idea of encoding the most frequent letters using fewer bits seems like a no-brainer. The section attracted and then kept my attention //throughout// the reading--and that is a first. 10/10. :)