We need a data structure to help us implement Kruskal's algorithm.  The Union-Find data structure will help us by allowing us to maintain information about and perform operations on disjoint sets of a graph.  The three operations supported by Union-Find are:
  
  MakeUnionFind(S), which returns a Union-Find structure where every element is in its own set,
  Find(u), which returns the name of the set containing u, and
  Union(A, B), which merges the sets A and B into a single set.
  
One implementation for Union-Find uses an array Component to store the name of the set that contains each element.  When two sets are merged, the name of the larger set is kept.  In this implementation, Find is O(1), MakeUnionFind is O(n), and k Union operations take at most O(k log k) time.
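A minimal sketch of this array-based implementation (the class and method names here are my own, not from the text); keeping the larger set's name on a merge is what bounds k Union operations by O(k log k), since each element changes its name at most O(log k) times:

```python
class UnionFind:
    """Array-based Union-Find: Component maps each element to the name
    of its set; Union keeps the name of the larger set."""

    def __init__(self, elements):
        # MakeUnionFind(S): every element starts in its own set -- O(n)
        self.component = {e: e for e in elements}
        self.members = {e: [e] for e in elements}  # contents of each named set

    def find(self, u):
        # Find(u): O(1) lookup of the name of the set containing u
        return self.component[u]

    def union(self, a, b):
        # Union(A, B): merge the sets named a and b, keeping the larger set's name
        if a == b:
            return
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a  # ensure a is the larger set
        for e in self.members[b]:
            self.component[e] = a  # relabel every element of the smaller set
        self.members[a].extend(self.members.pop(b))


uf = UnionFind(['u', 'v', 'w', 'x'])
uf.union(uf.find('u'), uf.find('v'))
uf.union(uf.find('u'), uf.find('w'))
print(uf.find('w') == uf.find('u'))  # True: u, v, w now share one set
```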
===== Section 7: Clustering =====
  
In this section, we look at clustering problems: problems where we want to group objects so that similar objects end up in the same cluster and dissimilar objects end up in different clusters.  In these problems, we typically define a distance function on the objects to quantify their similarity and use these distances to build some "good" grouping.

One basic type of clustering we might consider is a k-clustering of maximum spacing.  Here we want to divide the objects into k groups so as to maximize the spacing, the minimum distance between any pair of points in different clusters.  We can find this clustering with a single-link greedy clustering algorithm that, on each pass, joins the closest pair of objects not already in the same cluster, stopping when there are k clusters.  This is exactly Kruskal's algorithm, stopped once k connected components are obtained, and it is equivalent to removing the k-1 most expensive edges from the MST (p 159).
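The single-link procedure can be sketched as below (function name and the naive relabeling merge are my own illustrative choices; a real implementation would use the Union-Find structure from Section 6):

```python
from itertools import combinations


def k_clustering(points, k, dist):
    """Single-link greedy clustering: repeatedly merge the closest pair of
    points in different clusters until only k clusters remain -- Kruskal's
    algorithm stopped at k connected components."""
    cluster = {p: p for p in points}  # each point starts in its own cluster

    # Consider pairs in order of increasing distance, as Kruskal's does with edges
    edges = sorted(combinations(points, 2), key=lambda e: dist(*e))
    n_clusters = len(points)
    for u, v in edges:
        if n_clusters == k:
            break  # stop once k clusters are obtained
        a, b = cluster[u], cluster[v]
        if a != b:
            for p in points:           # merge cluster b into cluster a
                if cluster[p] == b:
                    cluster[p] = a
            n_clusters -= 1

    groups = {}
    for p in points:
        groups.setdefault(cluster[p], []).append(p)
    return list(groups.values())


# 1-D points: with k=2, the large gap between 3 and 10 becomes the spacing
pts = [0, 1, 3, 10, 11]
print(sorted(map(sorted, k_clustering(pts, 2, lambda a, b: abs(a - b)))))
# [[0, 1, 3], [10, 11]]
```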

Readability: 7
===== Section 8: Huffman Codes and Data Compression =====
  
We also use greedy algorithms to help with encoding, the process of mapping symbols (letters, for example) to bits.  There are many possible ways to encode our alphabet.  We want to find an optimal encoding by taking advantage of the non-uniform frequencies of the letters, so that the average number of bits per letter (ABL) is as small as possible.  We also want an encoding that does not require spaces or other separators between letters; that is, we want a prefix code, an encoding where no letter's codeword is a prefix of another's.

We represent a prefix code as a binary tree, where each letter is a leaf and the path to a letter is its encoding.  The length of a letter's encoding is its depth in the tree, so we are interested in minimizing the weighted average of the depths of the leaves.  Given a set of letter frequencies, we can find this optimal prefix code using Huffman's Algorithm (p 172).  In this algorithm, we begin at the bottom of the tree, making the two letters with lowest frequencies siblings and creating a meta-letter as their parent whose frequency is the sum of theirs.  We then recursively repeat this process until the tree is complete.  This algorithm has O(n log n) run time when using a priority queue on an alphabet of n characters.
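A sketch of the priority-queue version (function name and the choice to track only code lengths, rather than build the tree explicitly, are my own): each merge of the two lowest-frequency (meta-)letters pushes every letter beneath them one level deeper, which is exactly its final code length.

```python
import heapq


def huffman_lengths(freqs):
    """Huffman's algorithm via a priority queue: repeatedly merge the two
    lowest-frequency (meta-)letters into a parent meta-letter whose
    frequency is the sum of theirs.  Returns each letter's code length
    (its depth in the Huffman tree).  O(n log n) for n letters."""
    # Heap entries: (frequency of this (meta-)letter, letters beneath it)
    heap = [(f, [letter]) for letter, f in freqs.items()]
    heapq.heapify(heap)
    depth = {letter: 0 for letter in freqs}
    while len(heap) > 1:
        f1, letters1 = heapq.heappop(heap)  # two lowest-frequency entries
        f2, letters2 = heapq.heappop(heap)
        # Every letter under the new meta-letter moves one level deeper
        for letter in letters1 + letters2:
            depth[letter] += 1
        heapq.heappush(heap, (f1 + f2, letters1 + letters2))
    return depth


freqs = {'a': 0.45, 'b': 0.25, 'c': 0.15, 'd': 0.15}
print(huffman_lengths(freqs))  # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```

With these frequencies the ABL is 0.45·1 + 0.25·2 + 0.15·3 + 0.15·3 = 1.85 bits per letter, versus 2 bits for a fixed-length code on four letters.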
  
Readability: 7
  
  
  
  
courses/cs211/winter2012/journals/virginia/chapter4.1330283029.txt.gz · Last modified: 2012/02/26 19:03 by lovellv