===== Section 4.5 =====
  
   * **Summary & motivations:** Section 4.5 is The Minimum Spanning Tree Problem. The authors begin by telling us that this section uses an exchange argument to address algorithms for solving the "second fundamental problem on graphs: minimum spanning tree" (p142). A spanning tree is derived from a graph G = (V,E) by taking a subset of E such that the graph on V with just those edges is still connected. A minimum spanning tree (MST) does so with the lowest total cost. An example of where this algorithm is used is communication networks: the cable or phone company wants to connect all the houses using the least amount of wire. We learn that, when all edge costs are positive, the minimum cost solution is a tree (**4.16**). If it were to contain a cycle, two nodes u,v would be connected in more than one way, so one of the cycle's edges could be deleted; doing so lowers the cost but leaves all the nodes connected. A graph can have exponentially many spanning trees, so we need an efficient algorithm to find a minimum one. The section closes with examples of ways to extend the MST problem, but they are all much more computationally complicated than the version we study here.
   * **About the Algorithms:** We actually explore three greedy algorithms that correctly find a minimum spanning tree. To prove optimality of the algorithms, we need to have rules that tell us when an edge should or should not be in the MST. The Cut Property tells when an edge //should// be in the MST, and the proof for the Cut Property utilizes an exchange argument. Kruskal's and Prim's Algorithms use the Cut Property for their optimality proofs. The Cycle Property tells when an edge should //not// be in the MST, and the proof for the Cycle Property utilizes an exchange argument as well. The Reverse-Delete Algorithm uses the Cycle Property for its optimality proof.
      * The first algorithm is Kruskal's Algorithm. You sort the edges by increasing cost and start with an empty tree. You add edges in increasing cost order so long as adding them does not create a cycle. If the addition of an edge would create a cycle, it's discarded/skipped. Once you go through all the edges you have the MST. This algorithm has a runtime of O(m log n) when using the right data structures. Implementation is left until Section 4.6.
===== Section 4.6 =====
  
   * **Summary & motivations:** Section 4.6 is Implementing Kruskal's Algorithm: The Union-Find Data Structure. Here we explore the implementation that we deferred in the previous section. It gets its own section, likely because it involves creating a whole new problem-specific data structure. This new data structure is called the Union-Find data structure. It allows us to quickly determine whether the connected components of two nodes are the same and to quickly merge/combine the connected components if they are different. The Union-Find data structure allows us to achieve the O(m log n) runtime for Kruskal's Algorithm that we mentioned in the previous section. It's worth noting that the Union-Find data structure only works when adding edges, not when deleting edges.
   * **About the Algorithm:** This is really "About the Data Structure," but I digress. The Union-Find data structure has three operations: MakeUnionFind(S), Find(u), and Union(A,B). MakeUnionFind(S) returns a Union-Find data structure on S in which every node starts in its own connected component. The goal runtime for MakeUnionFind(S) is O(n), where n is the number of nodes. Find(u) returns the name of the connected component containing node u. The goal runtime for Find(u) is O(log n), though some implementations achieve O(1). Union(A,B) merges the connected components named A and B into one, updating the data structure in the process. The goal runtime for Union(A,B) is O(log n). We implement Kruskal's Algorithm by first calling MakeUnionFind(V) on V, the set of nodes in the graph, and then considering edges (u,v) in order of increasing cost: whenever Find(u) != Find(v), we add the edge and call Union(Find(u), Find(v)). We can use a "simple" Union-Find data structure, or we can use a "better" Union-Find data structure:
-     * In the "simple" version, we keep an array Component of size n to track the name of the connected component for each node. We set Component[s] = s for all nodes for MakeUnionFind with O(n) runtime, as desired. Because Component is an array, Find(v) is Component[v] with O(1) runtime, which is better than desired. We have to optimize to get the desired Union(A,B) runtime. If we keep track of the list of nodes in each connected component explicitly and keep track of connected component sizes in an array Size of length n, we can get the desired runtime. By keeping track of the nodes explicitly, we don't have to iterate through the whole array to find nodes that get changed in a Union operation. By maintaining information about connected component size, we can transfer over the minimum possible number of nodes between connected components A and B in a Union. We show that k Union operations takes at most O(k log k) time. +     * In the "simple" version, we keep an array Component of size n to track the name of the connected component for each node. We set Component[s] = s for all nodes for MakeUnionFind with O(n) runtime, as desired. Because Component is an array, Find(v) is Component[v] with O(1) runtime, which is better than desired. We have to optimize to get a decent Union(A,B) runtime. We keep track of the list of nodes in each connected component explicitly and keep track of connected component sizes in an array Size of length n. By keeping track of the nodes explicitly, we don't have to iterate through the whole array to find nodes that get changed in a Union operation. By maintaining information about connected component size, we can transfer over the minimum possible number of nodes between connected components A and B in a Union. We show that k Union operations takes at most O(k log k) time. 
-     * In the "better" version, we use pointers. For MakeUnionFind, each node points to itself (or the pointer is null), which is O(n). Names of connected components are just a node in that connected component. For Union(A,B), we must update the name of either A or B. Say the name of A is given by node v and the name of B is given by node u. We can merge the connected components by updating u's pointer to v. This implementation, though, causes Find(u) to run more slowly than in the "simple" version. Now we must trace through pointers to get the name of a node's connected component, since Union operations add on pointers. We show that this takes O(log n) time if we keep the larger connected component as the name when doing a Union. We maintain the extra connected component size information as an extra field in/on the nodes (where node here means a node and pointer node, not a graph node). +     * In the "better" version, we use pointers. For MakeUnionFind, each node points to itself (or the pointer is null), which is O(n) as desired. Names of connected components are just a node in that connected component. For Union(A,B), we must update the name of either A or B. Say the name of A is given by node v and the name of B is given by node u. We can merge the connected components by updating u's pointer to v. This implementation, though, causes Find(u) to run more slowly than in the "simple" version. Now we must trace through pointers to get the name of a node's connected component, since Union operations add on pointers. We show that this takes O(log n) time if we keep the larger connected component as the name when doing a Union. We maintain the extra connected component size information as an extra field in/on the nodes (where node here means a node and pointer node, not a graph node). 
   * **My Questions:** I'd ask the authors why they describe the desired runtime for Union(A,B) as being O(log n) but then in the "simple" data structure explanation they land on an O(k log k) runtime for k Union operations and in the "better" explanation they land on O(1) runtime. Why would you say the goal runtime is x, but example 1 gives y and example 2 gives z?
   * **Second Time Around:** Since we kind of breezed through this topic in class, I'd say I understand it better after reading about it more in depth and reading about the implementation.
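A minimal sketch, assuming a pointer-based Union-Find with union by size and edges given as (cost, u, v) tuples; the names UnionFind and kruskal are my own and not from the textbook:

<code python>
class UnionFind:
    """Pointer-style Union-Find with union by size (the "better" version)."""

    def __init__(self, nodes):                  # MakeUnionFind(S): O(n)
        self.parent = {v: v for v in nodes}     # each node starts as its own component
        self.size = {v: 1 for v in nodes}       # component sizes, for union by size

    def find(self, u):                          # Find(u): O(log n) with union by size
        while self.parent[u] != u:              # follow pointers up to the component's name
            u = self.parent[u]
        return u

    def union(self, a, b):                      # Union(A, B): O(1) given two component names
        if self.size[a] < self.size[b]:
            a, b = b, a                         # keep the larger component's name
        self.parent[b] = a
        self.size[a] += self.size[b]


def kruskal(nodes, edges):
    """edges: list of (cost, u, v) tuples. Returns the MST as (u, v, cost) edges."""
    uf = UnionFind(nodes)
    mst = []
    for cost, u, v in sorted(edges):            # consider edges in increasing cost order
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv:                            # adding (u, v) creates no cycle
            uf.union(ru, rv)
            mst.append((u, v, cost))
    return mst


# Tiny example graph
nodes = ['a', 'b', 'c', 'd']
edges = [(1, 'a', 'b'), (4, 'a', 'c'), (3, 'b', 'c'), (2, 'c', 'd'), (5, 'b', 'd')]
print(kruskal(nodes, edges))   # [('a', 'b', 1), ('c', 'd', 2), ('b', 'c', 3)]
</code>

Sorting the edges costs O(m log n), and the Union-Find operations add at most O(m log n) more, which matches the runtime claimed above.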
===== Section 4.7 =====
  
   * **Summary & motivations:** Section 4.7 is Clustering. The problem of clustering arises when we want to classify subsets of a collection into "coherent groups" (p 158). To do so, we have to have a way of quantifying how similar the entities are. That can be done with a distance function. Distances don't have to be physical distances, though; rather, they're some measure of similarity. In the end, entities in the same cluster should be similar and entities in different clusters should be dissimilar. The flavor of clustering that we explore here is Clusterings of Maximum Spacing. Given a set of entities where we can determine the distance between any two entities, we seek to find a k-clustering (a division of the entities into k clusters). The spacing of a k-clustering is "the minimum distance between any pair of points lying in different clusters" (p 158). Maximum spacing means the individual k clusters are as far apart as possible. Since there can be exponentially many k-clusterings of a set of entities, is there an efficient algorithm for finding one with maximum spacing?
   * **About the Algorithm:** The algorithm is quite simple. We just add edges in increasing distance order, skipping any edge whose endpoints already lie in the same cluster, until we have k clusters. This algorithm is essentially Kruskal's Algorithm stopped "before it adds its last k - 1 edges" (p 159), which is the same as taking the MST produced by Kruskal's Algorithm and deleting the k - 1 most expensive edges. The text does not provide any further elaboration on this algorithm's implementation or runtime, which I assume is because it is so similar to Kruskal's Algorithm. I believe the runtime of this algorithm is O(m log n); see the sketch after this list.
   * **My Questions:** Am I correct that the runtime is O(m log n)?
   * **Second Time Around:** Either we didn't finish the lecture on this in class or I'm experiencing amnesia. Either way, I did not know how to get the k-clustering of maximum spacing before reading this section, so that's a huge chunk of information that makes more sense now that I've read it.
   * **Note to Self:** Another example of using algorithms we already know in a new context.
   * **Readability:** I was ready to give this section a 9, but then I got to the proof for **4.26**. I didn't process that fully the first time around, so I'm giving this section a 7 instead.
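A minimal sketch of the max-spacing k-clustering, assuming the UnionFind class from the Section 4.6 sketch above and pairwise distances given as (distance, u, v) tuples; the function name k_clustering is my own:

<code python>
def k_clustering(points, distances, k):
    """Run Kruskal's edge-adding loop, but stop once only k clusters remain."""
    uf = UnionFind(points)                      # the pointer-based structure sketched earlier
    components = len(points)
    for d, u, v in sorted(distances):           # consider pairs in increasing distance order
        if components == k:
            break                               # i.e., skip Kruskal's last k - 1 additions
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv:                            # endpoints currently in different clusters
            uf.union(ru, rv)
            components -= 1
    clusters = {}                               # group the points by component name
    for p in points:
        clusters.setdefault(uf.find(p), []).append(p)
    return list(clusters.values())
</code>

As with Kruskal's Algorithm, the dominant cost is sorting the m distances, so the runtime is O(m log n).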
  
===== Section 4.8 =====
 
   * **Summary & Motivations:** Section 4.8 is Huffman Codes and Data Compression. We explore encoding the symbols of an alphabet using bits: converting an alphabet that's readable to humans into sequences of 0's and 1's that are readable to computers. A simple approach would be a fixed-length encoding. Let's say the alphabet we want to encode is 26 characters (a through z, presumably), the space, the comma, the period, the question mark, the exclamation point, and the apostrophe. Since b bits can form 2^b different sequences, we can use 5 bits per symbol to cover 2^5 = 32 symbols. The ASCII code does this, just with a larger alphabet and therefore more bits per symbol. A better approach is to let the length of each symbol's encoding depend on the symbol's frequency in the language, hoping to use fewer than b bits per symbol on average. Even low-tech Morse code does this! In Morse code, higher-frequency letters use fewer dots/dashes. Morse code, however, effectively uses a three-letter alphabet, because pauses are required between symbols to avoid ambiguity. Ambiguity would arise because some symbols have encodings that are prefixes of other symbols' encodings. We can solve this problem by disallowing any symbol's encoding from being a prefix of any other symbol's encoding; encodings with this property are called "prefix codes." Furthermore, if we have a frequency for each letter x, equal to "the fraction of letters in the text that are equal to x" (p 165), we can compute the average bits per letter (ABL) of an encoding as the sum over all symbols of each symbol's frequency times the length of its encoding. Our goal is to find an algorithm that minimizes the ABL, given the alphabet and a set of frequencies for the alphabet's symbols.
   * **About the Algorithm:** We use a binary tree to represent a prefix code, since that's easier to visualize than a list of function mappings. If we only allow the leaves of the binary tree to represent the symbols of the alphabet, the code is naturally a prefix code: because no symbol sits on the path from the root to another symbol's leaf, no encoding can be a prefix of another. Furthermore, we say that going left equates to a 0 and going right equates to a 1. The binary tree that gives an optimal prefix code is full, because if there were an internal node with only one child, you could simply replace that node with its one child, thus shortening the code. We use the observation that, in an optimal tree, leaves of minimum depth should correspond to letters of maximum frequency and leaves of maximum depth to letters of minimum frequency. From there, we formulate an algorithm to determine the optimal prefix code. We simply pick the two lowest-frequency letters y* and z* in the alphabet S and combine them into a "meta-letter" with frequency f_{y*} + f_{z*}. Then we "recursively find a prefix code for the smaller alphabet" (p 172), where the alphabet is now one letter smaller. When we get to the point where the alphabet only has two letters, we arbitrarily assign one the code 0 and the other the code 1. We can then form the full prefix code by "unfold[ing] the result back through the recursive calls" (p 173). If we maintain a heap-based priority queue using each letter's frequency as its key, this algorithm is O(n log n), where n is the number of letters in the alphabet. A sketch of this construction appears after this list.
   * **My Questions:** I find the term "prefix code" confusing. To me, it sounds like a "prefix code" //would// be the prefix of another code, rather than it strictly not being the prefix of another code.
   * **Second Time Around:** Something that actually made less sense the second time around: the algorithm itself. I didn't really understand why the book separates the "Define a prefix code for S as follows" block from the rest.
   * **Note to Self:** Approaching a problem top-down vs bottom-up can affect optimality, as in the case of Shannon-Fano codes vs Huffman codes.
   * **Readability:** 8 - I almost gave this section a 9 or 10, but the algorithm made less sense in the reading than it did in class.
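A minimal sketch of the greedy construction described above, assuming Python's heapq module as the frequency-keyed priority queue; the function name huffman and the tuple-based tree representation are my own choices, not the book's:

<code python>
import heapq
from itertools import count

def huffman(freqs):
    """freqs: dict of letter -> frequency. Returns a dict of letter -> bit string."""
    tiebreak = count()                       # unique counter so ties never compare trees
    heap = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)      # two lowest-frequency letters/meta-letters
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))   # merge into a meta-letter
    _, _, tree = heap[0]

    code = {}
    def walk(node, bits):                    # going left is a 0, going right is a 1
        if isinstance(node, tuple):          # internal node: (left subtree, right subtree)
            walk(node[0], bits + "0")
            walk(node[1], bits + "1")
        else:                                # leaf: an actual letter of the alphabet
            code[node] = bits or "0"         # guard for a one-letter alphabet
    walk(tree, "")
    return code

freqs = {'a': 0.32, 'b': 0.25, 'c': 0.20, 'd': 0.18, 'e': 0.05}
print(huffman(freqs))   # {'c': '00', 'e': '010', 'd': '011', 'b': '10', 'a': '11'}
</code>

With n letters there are n - 1 merges, each costing O(log n) heap work, so the whole construction is O(n log n) as stated above.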