===== 4.6: Implementing Kruskal's Algorithm: The Union-Find Data Structure =====
  
The Union-Find data structure stores a representation of the connected components of a graph in a way that lets us search and update them quickly and efficiently; this turns out to be exactly the data structure Kruskal's Algorithm needs. It maintains a collection of disjoint sets. Find(u) returns the name of the set containing node u, and Union(A, B) takes two sets and merges them into a single set. To add an edge (u, v) to the graph, we first test whether u and v are already in the same connected component; if they are not, we perform a Union to combine their two components into one. Note that the structure only supports adding edges, not deleting them.
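
Here is a minimal Python sketch of this interface, using the naive scheme where each node simply records the name of its component (the names make_set, find, union, and add_edge are my own, not the book's):

<code python>
component = {}   # node -> the name of its connected component

def make_set(node):
    component[node] = node            # each node starts as its own singleton set

def find(node):
    return component[node]            # O(1) lookup in this naive version

def union(a, b):
    # relabel everything in b's set with a's name: O(n) in the worst case
    name_a, name_b = find(a), find(b)
    for node in component:
        if component[node] == name_b:
            component[node] = name_a

def add_edge(u, v):
    # an edge only merges components if u and v were separated
    if find(u) != find(v):
        union(u, v)

for node in "abcd":
    make_set(node)
add_edge("a", "b")
add_edge("c", "d")
print(find("a") == find("b"))   # True
print(find("a") == find("c"))   # False
</code>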
  
To build the simplest version of this structure, we create an array Component of size n, where Component[s] holds the name of the set currently containing element s; initially, each element is its own singleton set. We can optimize by choosing the name of the larger of the two sets as the name of the union, so that only the elements of the smaller set need to be relabeled. Even with this optimization, a single Union can take O(n) time in the worst case, although any sequence of k Union operations takes at most O(k log k) time overall. We can improve further by using pointers: each set becomes a tree of elements pointing toward the element that names it. The pointer version implements Union in O(1) time, but Find is no longer constant; following pointers to the root takes O(log n) time when we always point the smaller set at the larger one.
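
Here is a rough sketch of the pointer-based version with the smaller-set-points-to-larger rule. One caveat: the book's Union takes set names directly and is O(1), while this convenience version calls find on arbitrary elements first:

<code python>
parent = {}   # node -> parent pointer; roots point to themselves
size = {}     # root -> number of elements in its set

def make_set(node):
    parent[node] = node
    size[node] = 1

def find(node):
    # follow parent pointers up to the root that names the set: O(log n)
    while parent[node] != node:
        node = parent[node]
    return node

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return
    if size[ra] < size[rb]:           # always point the smaller tree at the
        ra, rb = rb, ra               # larger one, keeping find paths short
    parent[rb] = ra
    size[ra] += size[rb]
</code>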
  
We can implement Kruskal's Algorithm using the union-find data structure so that it runs in O(m log n) time: sort the m edges by cost, then for each edge use Find to test whether its endpoints already lie in the same component, and if they do not, add the edge and Union the two components.
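
A small self-contained sketch of how that might look (with a simplified inline union-find instead of the full structure above; the input format is my own choice):

<code python>
def kruskal(nodes, edges):
    # edges is a list of (cost, u, v) tuples; returns the edges of the MST
    parent = {v: v for v in nodes}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):   # consider edges in order of increasing cost
        ru, rv = find(u), find(v)
        if ru != rv:                   # skip edges that would close a cycle
            parent[ru] = rv
            tree.append((cost, u, v))
    return tree

print(kruskal("abcd", [(1, "a", "b"), (4, "b", "c"), (3, "a", "c"), (2, "c", "d")]))
# [(1, 'a', 'b'), (2, 'c', 'd'), (3, 'a', 'c')]
</code>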
  
===== 4.7: Clustering =====

This section discusses clustering, which is useful whenever we are trying to organize a collection of objects into coherent groups. There are various ways to measure how similar objects are; one natural way is by distance, such as with points in a plane. Minimum spanning trees turn out to play a part in finding clusterings of maximum spacing. If we want to divide the objects into k groups, a k-clustering is a partition of the set of objects U into k nonempty clusters. The spacing of a k-clustering is the minimum distance between any pair of points in different clusters.

To find a clustering of maximum spacing, we repeatedly merge the closest pair of points that lie in different clusters. Equivalently, we grow a graph edge by edge in order of increasing distance, with connected components corresponding to clusters, skipping any edge whose endpoints already share a cluster. This process never creates a cycle, but rather a union of trees, and it is exactly Kruskal's Minimum Spanning Tree Algorithm; stopping just before it adds its final k-1 edges (equivalently, deleting the k-1 most expensive edges of the minimum spanning tree) yields a k-clustering of maximum spacing.
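
Here is a sketch of this clustering-by-Kruskal idea; the function name, input format, and example points are all my own. It merges clusters until k remain and then reports the spacing, which is the first inter-cluster distance it would have added next:

<code python>
def k_clustering_spacing(points, k, dist):
    parent = {p: p for p in points}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    # every pair of points, sorted by increasing distance
    pairs = sorted((dist(p, q), p, q)
                   for i, p in enumerate(points) for q in points[i + 1:])
    clusters = len(points)
    for d, p, q in pairs:
        rp, rq = find(p), find(q)
        if rp == rq:
            continue                  # already in the same cluster
        if clusters == k:
            return d                  # first inter-cluster distance = the spacing
        parent[rp] = rq               # merge the two closest clusters
        clusters -= 1

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (9, 9)]
print(k_clustering_spacing(pts, 3, lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])))
# 7: the distance between the cluster {(5,5), (5,6)} and the point (9,9)
</code>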

===== 4.8: Huffman Codes and Data Compression =====

This section discusses data compression, where the problem is encoding symbols using bits. To convert the various "alphabets" we use into a language computers understand, we use bits. Typically we use a fixed number of bits per letter: for the English alphabet plus a few extra symbols, 32 distinct values suffice, which means 5 bits per symbol, since 2^5 = 32. Taking Computer Organization really helps in understanding how binary and bits work, so this is a somewhat familiar topic for me. Although a fixed-length code spends exactly 5 bits on every symbol, what really matters is the average number of bits per symbol, because some symbols are used far more often than others. It is a waste to spend just as many bits on the most frequent letters as on the rarest ones; it makes much more sense to use fewer bits on letters used more frequently and more bits on those used less frequently. This is where data compression comes into play, and the big question is how to find an optimal way of compressing the data.

One example of data compression is Morse code, which translates each letter into a sequence of dots and dashes. Because some letters' encodings are prefixes of others, Morse code removes the ambiguity by inserting a short pause between letters. If we translated this scheme directly into bits, the ambiguity would remain, since certain encodings are prefixes of others. The fix is a prefix code, an encoding in which no codeword is a prefix of any other. Decoding then works as follows: read bits from left to right, and as soon as the bits read so far match a codeword, output that letter, then continue with the next bit and repeat.
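
A small sketch of this decoding loop, with a made-up prefix code rather than one from the book:

<code python>
code = {"0": "e", "10": "t", "110": "a", "111": "n"}   # no codeword is a prefix of another

def decode(bits):
    message, current = [], ""
    for bit in bits:
        current += bit
        if current in code:                 # a complete codeword: unambiguous,
            message.append(code[current])   # since no codeword extends another
            current = ""
    return "".join(message)

print(decode("110101100"))   # "atae"
</code>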

Now we must account for letter frequencies to construct an optimal encoding, and a greedy algorithm finds the optimal prefix code. First, we build a binary tree to represent the encoding: each time we go from a node to its left child we append a 0, and each time we go from a node to its right child we append a 1, so a letter's codeword is the path from the root to its leaf. In particular, the length of each letter's encoding in bits is the depth of its leaf in the tree. We want to build the tree so that the average number of bits per letter, weighted by frequency, is as small as possible.
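
Here is a sketch of reading the codewords off such a tree; the tuple representation of the tree is just my own shorthand for this example:

<code python>
# leaves are one-character strings; internal nodes are (left, right) pairs
tree = (("a", "b"), ("c", ("d", "e")))

def assign_codes(node, prefix=""):
    if isinstance(node, str):                        # a leaf: the path to it is its codeword
        return {node: prefix}
    left, right = node
    codes = assign_codes(left, prefix + "0")         # left edge appends a 0
    codes.update(assign_codes(right, prefix + "1"))  # right edge appends a 1
    return codes

print(assign_codes(tree))
# {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}
</code>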

A first thought is to build the tree top-down, but it is not obvious how to split the alphabet optimally. However, if we knew the shape of an optimal tree, filling it in would be easy: we would take the letters in order of decreasing frequency and assign them to leaves in order of increasing depth. Huffman's Algorithm achieves this from the bottom up: it repeatedly merges the two lowest-frequency letters into a single meta-letter whose frequency is their sum, and the tree produced by these merges gives us an optimal prefix code, called a Huffman code.

Huffman's Algorithm runs in time polynomial in k, the number of letters in the alphabet. It performs a sequence of k-1 merge iterations, and a naive implementation spends O(k) time per iteration finding the two lowest-frequency letters, which overall gives an O(k^2) runtime. However, if a priority queue is used to implement the algorithm, each iteration costs only O(log k), and the runtime drops to a nice O(k log k).
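
Here is a sketch of the priority-queue version using Python's heapq; the example frequencies are made up:

<code python>
import heapq

def huffman(freq):
    # freq maps letters to frequencies; returns a dict of letter -> codeword
    # heap entries are (frequency, tie-breaker, node); leaves are plain letters
    heap = [(f, i, letter) for i, (letter, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)      # pull the two lowest-frequency
        f2, _, right = heapq.heappop(heap)     # nodes and merge them into a
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))   # meta-letter
        counter += 1
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):
            codes[node] = prefix or "0"        # guard for a one-letter alphabet
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(root, "")
    return codes

print(huffman({"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.15}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
</code>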