===== Section 7: Clustering =====
  
In this section, we look at clustering problems: problems where we want to group a set of objects so that similar objects end up in the same group and dissimilar objects end up in different groups.  In these problems, we typically define a distance function on the objects to quantify their similarity, and then use these distances to build some "good" grouping.

One basic type of clustering we might consider is a k-clustering of maximum spacing.  Here we want to divide the objects into k groups so as to maximize the spacing, defined as the minimum distance between any pair of points in different clusters.  We can find this clustering with a single-link greedy algorithm that repeatedly joins the closest pair of objects not already in the same cluster, stopping when there are k clusters.  This is exactly Kruskal's algorithm stopped once k connected components remain, which is equivalent to removing the k-1 most expensive edges from the MST (p159).
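As an illustrative sketch (not code from the book), the single-link procedure above can be written as Kruskal's algorithm over all pairwise distances with a simple union-find, stopping once k components remain.  The function name and interface here are made up for the example:

```python
from itertools import combinations

def k_clustering(points, dist, k):
    """Group points into k clusters of maximum spacing (single-link)."""
    parent = list(range(len(points)))  # union-find over point indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # All pairwise edges, sorted by distance (Kruskal's order).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    clusters = len(points)
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            if clusters == k:
                break  # k components reached: the k-1 priciest MST edges stay cut
            parent[ri] = rj
            clusters -= 1

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())
```

For example, clustering the 1-D points 0, 1, 10, 11, 20 with k = 3 groups the two close pairs together and leaves 20 alone.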

Readability: 7
===== Section 8: Huffman Codes and Data Compression =====
  
We can also use greedy algorithms to help with encoding, the process of mapping symbols (letters, for example) to bits.  There are many possible ways to encode an alphabet.  We want to find an optimal encoding by taking advantage of the non-uniform frequencies of the letters, so that the average number of bits per letter (ABL) is as small as possible.  We also want an encoding that does not require spaces or other separators between letters; that is, we want a prefix code, an encoding in which no letter's codeword is a prefix of another's.
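As a quick illustration of why the prefix property makes separators unnecessary, consider a hypothetical three-letter code (the code table below is made up for the example): a decoder can scan the bit string left to right, and the moment its buffer matches a codeword it knows it has a complete letter.

```python
# Hypothetical prefix code: no codeword is a prefix of another.
code = {'a': '0', 'b': '10', 'c': '11'}
decode_map = {bits: letter for letter, bits in code.items()}

def encode(text):
    """Concatenate codewords with no separators."""
    return ''.join(code[ch] for ch in text)

def decode(bits):
    """Scan left to right; a buffer matching a codeword is unambiguous."""
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in decode_map:  # exactly one codeword can match here
            out.append(decode_map[buf])
            buf = ''
    return ''.join(out)
```

So "abc" encodes to "01011", and decoding recovers it with no ambiguity, which would fail if, say, "0" were a prefix of some other codeword.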

We represent a prefix code using a binary tree, where each letter is a leaf and the path from the root to a letter spells out its encoding.  The length of a letter's encoding is its depth in the tree, so we are interested in minimizing the frequency-weighted average of the depths of the leaves.  Given a set of letter frequencies, we can find this optimal prefix code using Huffman's Algorithm (p172).  In this algorithm, we build the tree from the bottom up: we take the two letters with lowest frequencies, make them siblings, and create a meta-letter as their parent whose frequency is the sum of theirs.  We then recursively repeat this process on the reduced alphabet until the tree is complete.  Using a priority queue, this algorithm has O(n log n) run time on an alphabet of n characters.
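A minimal Python sketch of this bottom-up merging, using the standard heapq module as the priority queue (the function name and the frequency values in the usage note are made up for the example; counters break ties so the heap never compares tree nodes):

```python
import heapq
from itertools import count

def huffman(freqs):
    """Return {letter: bitstring} for a dict of letter frequencies."""
    tiebreak = count()  # break frequency ties without comparing nodes
    heap = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lowest-frequency (meta-)letters into a parent
        # whose frequency is the sum of its children's.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}
    def walk(node, path):
        if isinstance(node, tuple):   # internal meta-letter
            walk(node[0], path + '0')
            walk(node[1], path + '1')
        else:                         # leaf: an original letter
            codes[node] = path or '0'  # single-letter alphabet edge case
    walk(heap[0][2], '')
    return codes
```

For instance, with frequencies a:45, b:13, c:12, d:16, e:9, f:5, the most common letter gets a 1-bit codeword while the two rarest get 4-bit codewords, minimizing the ABL.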
  
Readability: 7
  
  
  
  
courses/cs211/winter2012/journals/virginia/chapter4.1330283063.txt.gz · Last modified: 2012/02/26 19:04 by lovellv
CC Attribution-Noncommercial-Share Alike 4.0 International