===== Section 4: Shortest Paths in a Graph =====
  
Our next problem is to find the shortest path from a node s to every other node in a directed graph where edge lengths vary.  We can use **Dijkstra's algorithm** to solve this problem.  In this algorithm, we begin with an explored set that contains only s.  At each pass, for each unexplored node, we determine the shortest path that can be constructed by traversing through the explored part and then going along one more edge to the unexplored node.  We choose the unexplored node that has the shortest of these paths to add to the explored set (p 138).
  
We can prove Dijkstra's algorithm works using an inductive "greedy stays ahead" proof built on the invariant that, at any point in the algorithm's execution, the path found to each node in the explored set is a shortest path.  By using a priority queue, we can implement this algorithm in O(m log n) time.
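To make the priority queue implementation concrete, here is a minimal Python sketch (my own, not the book's pseudocode), assuming the graph is stored as a dict mapping each node to a list of (neighbor, length) pairs:

<code python>
import heapq

def dijkstra(graph, s):
    """Shortest-path distances from s in a directed graph.

    graph: dict mapping each node to a list of (neighbor, length) pairs.
    Returns a dict mapping each reachable node to its distance from s.
    """
    dist = {s: 0}
    pq = [(0, s)]  # priority queue of (candidate distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale entry: u was already reached by a shorter path
        for v, length in graph.get(u, []):
            if d + length < dist.get(v, float("inf")):
                dist[v] = d + length  # shorter path to v through u
                heapq.heappush(pq, (dist[v], v))
    return dist
</code>

Each edge is relaxed once and each heap operation costs O(log n), which is where the O(m log n) bound comes from.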

===== Section 5: The Minimum Spanning Tree Problem =====
In the minimum spanning tree problem, we want to find a subset of a graph's edges that keeps the graph connected and has the smallest total edge weight.  There are three algorithms that accomplish this.
  
**Kruskal's Algorithm**: Add edges in order of increasing cost as long as a cycle is not created.
  
**Prim's Algorithm**: Choose a root and build a tree from it, adding whichever node is cheapest to add to the current partial tree.
  
**Reverse-Delete Algorithm**: Remove edges in order of descending cost, as long as the graph is not disconnected.
  
To prove that these algorithms work, we first need to note two properties that hold when all edge costs are distinct: the Cut Property and the Cycle Property.  The Cut Property says that if S is a proper, nonempty subset of the nodes V of a graph and e is the minimum cost edge from S to V\S, then e must be in the minimum spanning tree.  The Cycle Property says that the most expensive edge in a cycle cannot be in a minimum spanning tree.  The proofs of Kruskal's and Prim's algorithms follow from the Cut Property.  The proof of Reverse-Delete follows from the Cycle Property.
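As a concrete illustration, here is a rough Python sketch of Prim's Algorithm (mine, under the assumption that the graph is an adjacency list with each undirected edge listed in both directions):

<code python>
import heapq

def prim(graph, root):
    """Grow a spanning tree from root, always attaching the node that is
    cheapest to connect to the current partial tree.

    graph: dict mapping each node to a list of (neighbor, weight) pairs.
    Returns the MST as a list of (u, v, weight) edges.
    """
    in_tree = {root}
    mst = []
    pq = [(w, root, v) for v, w in graph[root]]  # candidate edges leaving the tree
    heapq.heapify(pq)
    while pq:
        w, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue  # this edge now leads back into the tree
        in_tree.add(v)
        mst.append((u, v, w))
        for x, wx in graph[v]:
            if x not in in_tree:
                heapq.heappush(pq, (wx, v, x))
    return mst
</code>

Every edge this adds is the cheapest one crossing the cut between the current tree and the rest of the graph, so by the Cut Property it belongs to the MST.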

===== Section 6: Union-Find Data Structure =====
We need a data structure to help us implement Kruskal's algorithm.  The Union-Find data structure will help us by allowing us to maintain information about and perform operations on disjoint sets of a graph's nodes.  The three operations supported by Union-Find are:
  
  * MakeUnionFind(S), which returns a Union-Find structure where every element is in its own set,
  * Find(u), which returns the name of the set containing u, and
  * Union(A, B), which merges the sets A and B into a single set.
  
One implementation for Union-Find uses an array Component to store the name of the set that contains each element.  When two sets are merged, the name of the larger set is kept.  In this implementation, Find is O(1), MakeUnionFind is O(n), and a sequence of k Union operations takes at most O(k log k) time.
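A small Python sketch of this array-based implementation (my own; a dict plays the role of the Component array):

<code python>
class UnionFind:
    """Union-Find with explicit set names, as described above."""

    def __init__(self, elements):  # MakeUnionFind(S): O(n)
        self.component = {u: u for u in elements}  # set name for each element
        self.members = {u: [u] for u in elements}  # contents of each named set

    def find(self, u):  # Find(u): O(1) lookup
        return self.component[u]

    def union(self, a, b):  # Union(A, B): relabel the smaller set
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a  # keep the name of the larger set
        for u in self.members.pop(b):
            self.component[u] = a
            self.members[a].append(u)
</code>

Kruskal's algorithm uses Find to check whether an edge's endpoints are already in the same component (adding such an edge would create a cycle) and Union to merge the two components when an edge is accepted.  The O(k log k) bound holds because an element is only relabeled when its set at least doubles in size.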
===== Section 7: Clustering =====
  
In this section, we look at clustering problems, problems where we want to group objects so that similar objects end up close together and dissimilar objects end up far apart.  In these problems, we typically define a distance function on the objects to quantify their similarity, and use these distances to build some "good" grouping.

One basic type of clustering we might consider is a k-clustering of maximum spacing.  Here we want to divide the objects into k groups so as to maximize the spacing, the minimum distance between any pair of points in different clusters.  We can find this clustering with a single-link greedy clustering algorithm that, at every pass, joins the closest pair of objects that are not already in the same cluster, stopping when there are k clusters.  This algorithm is the same as Kruskal's algorithm, stopped when k connected components are obtained.  It is equivalent to removing the k-1 most expensive edges from the MST (p 159).
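A sketch of this single-link procedure in Python (mine, reusing the UnionFind class sketched in Section 6; dist is an assumed distance function on pairs of objects):

<code python>
def k_clustering(points, dist, k):
    """Kruskal's algorithm on all pairwise distances, stopped at k clusters.

    points: list of hashable objects; dist(p, q): distance between two objects.
    Returns a UnionFind whose sets are the k clusters.
    """
    uf = UnionFind(points)
    pairs = sorted(((dist(p, q), p, q)
                    for i, p in enumerate(points)
                    for q in points[i + 1:]),
                   key=lambda t: t[0])  # edges in order of increasing distance
    clusters = len(points)
    for d, p, q in pairs:
        if clusters == k:
            break  # k components remain: this is the desired clustering
        if uf.find(p) != uf.find(q):  # p and q are in different clusters
            uf.union(uf.find(p), uf.find(q))
            clusters -= 1
    return uf
</code>

The shortest distance the loop has not yet used to join two different clusters is exactly the spacing this clustering maximizes.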
Readability: 7
===== Section 8: Huffman Codes and Data Compression =====
  
We also use greedy algorithms to help with encoding, the process of mapping symbols (letters, for example) to bits.  There are many possible ways to encode our alphabet.  We want to find an optimal encoding by taking advantage of the nonuniform frequencies of the letters, so that the average number of bits per letter (ABL) is as small as possible.  We also want an encoding that does not require spaces or other separators between letters; that is, we want a prefix code, an encoding in which no letter's code word is a prefix of another's.
 +
We will represent a prefix code using a binary tree, where each letter is a leaf and the path from the root to a letter is its encoding.  The length of a letter's encoding is its depth in the tree, so we are interested in minimizing the frequency-weighted average of the depths of the leaves.  Given a set of letter frequencies, we can find this optimal prefix code using Huffman's Algorithm (p 172).  In this algorithm, we begin at the bottom of the tree, making the two letters with lowest frequencies siblings and creating a meta-letter as their parent whose frequency is the sum of their frequencies.  We then recursively repeat this process until the tree is complete.  This algorithm has O(n log n) running time when using a priority queue on an alphabet of n letters.
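Here is a compact Python sketch of this bottom-up merging (my own; letters are strings, meta-letters are tuples, and a counter breaks frequency ties so the heap never compares trees):

<code python>
import heapq
from itertools import count

def huffman(freqs):
    """Build an optimal prefix code from a dict of letter frequencies.

    Returns a dict mapping each letter to its bit string.
    """
    tie = count()  # tie-breaker for equal frequencies
    pq = [(f, next(tie), letter) for letter, f in freqs.items()]
    heapq.heapify(pq)
    while len(pq) > 1:
        f1, _, t1 = heapq.heappop(pq)  # the two lowest-frequency (meta-)letters
        f2, _, t2 = heapq.heappop(pq)  # become siblings under a new meta-letter
        heapq.heappush(pq, (f1 + f2, next(tie), (t1, t2)))
    _, _, tree = pq[0]

    codes = {}
    def assign(node, path):
        if isinstance(node, tuple):  # meta-letter: recurse on both children
            assign(node[0], path + "0")
            assign(node[1], path + "1")
        else:
            codes[node] = path  # leaf: its path from the root is its code
    assign(tree, "")
    return codes
</code>

For example, huffman({"a": 5, "b": 2, "c": 1, "d": 1}) merges c and d first, then that meta-letter with b, and the frequent letter a ends up with a one-bit code.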
  
Readability: 7
  
  
  
  