===== 4.4: Shortest Paths in a Graph =====

The problem for this section is: given a directed graph with a designated start node s and a length l_e on each edge, we want to find the shortest path from s to each of the other nodes. The algorithm solving this problem was presented by none other than Edsger Dijkstra.

<code>
Dijkstra's Algorithm (G, l):
  Let S be the set of explored nodes
    For each u in S, we store a distance d(u)
  Initially S = {s} and d(s) = 0
  While S != V:
    Select a node v not in S with at least one edge from S
    for which d'(v) = min over e = (u, v) with u in S of ( d(u) + l_e )
    is as small as possible
    Add v to S and define d(v) = d'(v)
  EndWhile
</code>

The above pseudocode describes the idea behind the greedy algorithm.
In order to actually implement this algorithm, we employ the trusty priority queue, using the value d'(v) as the key for v. We can use the heap-based priority queue implementation to remove nodes from the queue and reorder the heap as the d'(v) values change. Since each of these heap operations takes O(log n) time, we can ultimately implement this algorithm in O(m log n) time for a graph with m edges and n nodes.
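
To make that concrete, here is a minimal sketch of the idea in Python using the standard-library heapq module. The adjacency-dict representation of the graph is my own assumption (the book does not fix one), and since heapq has no decrease-key operation, this sketch pushes duplicate entries and skips the stale ones, a common workaround that stays within O(m log n).

<code python>
import heapq

def dijkstra(graph, s):
    """Shortest-path distances from s in a directed graph.

    graph: dict mapping each node to a list of (neighbor, length) pairs.
    Returns d, a dict with d[u] = length of the shortest s-u path.
    """
    d = {}                       # d(u) for explored nodes u (the set S)
    pq = [(0, s)]                # heap of (d'(v), v) candidates
    while pq:
        dist, v = heapq.heappop(pq)
        if v in d:               # stale entry: v was already explored
            continue
        d[v] = dist              # add v to S and define d(v) = d'(v)
        for w, length in graph.get(v, []):
            if w not in d:
                heapq.heappush(pq, (dist + length, w))
    return d
</code>

For example, dijkstra({'s': [('a', 1), ('b', 4)], 'a': [('b', 2)], 'b': []}, 's') returns {'s': 0, 'a': 1, 'b': 3}.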

Of all of the greedy algorithms we have looked at, this one is fairly conceptually straightforward, yet, like the others, I find it much harder to actually "perform" by hand as we did in class. There is significantly more to keep track of, as is to be expected with a graph. Despite this, I actually found this section to be an easier read than the previous ones, so I give this section a 6/10.

===== 4.5: The Minimum Spanning Tree Problem =====

This problem focuses on finding the cheapest way to connect a set of locations, so that we can get from any node to any other node as cheaply as possible. If the node set is V and the solution is the edge set T, then (V, T) will be a tree. In general, a spanning tree is a set of edges that makes all nodes reachable from one another. Unlike in previous problems, there are actually a number of intuitive, viable ways to go about finding the tree. One method, Kruskal's Algorithm, involves adding edges in increasing order of cost so long as they do not create a cycle. Another approach, called Prim's Algorithm, is simply an adaptation of Dijkstra's Algorithm. Lastly, the Reverse-Delete Algorithm starts with the full set of edges and removes them in decreasing order of cost so long as removing them does not disconnect the graph. By cleverly picking data structures, we can implement Kruskal's and Prim's algorithms in O(m log n). As seems to be the case often, we implement Prim's algorithm using a priority queue. This problem can be appropriately extended to handle things such as how to connect nodes in networks and spread out traffic so that no single edge finds itself overloaded.
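
Since Prim's Algorithm is described as an adaptation of Dijkstra's, here is a minimal sketch along the same lines, under the same assumed adjacency-dict representation (each undirected edge listed in both directions, and node labels assumed comparable so the heap can break cost ties). The only real change from the Dijkstra sketch is that a node's key is the cost of a single edge into the tree rather than a path length.

<code python>
import heapq

def prim(graph, s):
    """Minimum spanning tree of a connected undirected graph.

    graph: dict mapping each node to a list of (neighbor, cost) pairs,
    with every edge listed in both directions.
    Returns the set of MST edges as (u, v) pairs.
    """
    in_tree = {s}
    tree_edges = set()
    pq = [(cost, s, w) for w, cost in graph[s]]  # candidate edges out of the tree
    heapq.heapify(pq)
    while pq and len(in_tree) < len(graph):
        cost, u, v = heapq.heappop(pq)
        if v in in_tree:             # stale entry: v was already attached
            continue
        in_tree.add(v)
        tree_edges.add((u, v))
        for w, c in graph[v]:
            if w not in in_tree:
                heapq.heappush(pq, (c, v, w))
    return tree_edges
</code>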

I did like that this section focused on a more intuitive algorithm, but at the same time, I often find the sections riddled with proofs to be a little tiresome to read, which was the case here. As such, I give this section a 5/10.

===== 4.6: Implementing Kruskal's Algorithm: The Union-Find Data Structure =====

The problem setup for this section is that we have a set of nodes, but we add edges to our graph over time. As we add edges, we wish to maintain the set of connected components of the graph. To do this, the section develops a data structure, the Union-Find data structure, which stores a representation of the components in a way that allows for rapid searching and updating. When considering an edge, we can decide whether or not we need to add it by checking whether its two endpoints lie in the same component. Given a node u, we will be able to call Find(u) to get the name of its component; then, for an edge e from u to v, we check whether Find(u) = Find(v). We can also call Union(A, B) to merge the components A and B into a single set. The three operations for the data structure are thus MakeUnionFind(S), Find(u), and Union(A, B). We seek to implement MakeUnionFind in O(n), Find(u) in O(log n), and Union(A, B) in O(log n). We ultimately use a pointer-based implementation, in which each node has a pointer leading to the name of its connected component. Finally, using this to implement Kruskal's algorithm, we end up with a respectable O(m log n) runtime.
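
As a sketch of the pointer-based idea, here is a small Python class using union by size, so that Find follows at most O(log n) pointers. The names simply mirror the three operations above; this is an illustration rather than the book's exact implementation.

<code python>
class UnionFind:
    """Pointer-based Union-Find with union by size."""

    def __init__(self, nodes):                # MakeUnionFind(S): O(n)
        self.parent = {v: v for v in nodes}   # each node starts as its own component
        self.size = {v: 1 for v in nodes}

    def find(self, u):                        # Find(u): O(log n) pointer hops
        while self.parent[u] != u:
            u = self.parent[u]
        return u

    def union(self, a, b):                    # Union(A, B): relink smaller under larger
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
</code>

Kruskal's algorithm then just sorts the edges by cost and, for each edge (u, v), adds it exactly when find(u) != find(v), followed by union(u, v), giving the O(m log n) total mentioned above.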

This section did a good job of clearly laying out how to construct the necessary data structure. Additionally, I feel as if it gave me a decent idea of how I could do this for myself in the future, that is, develop my own data structure. Given this, I give the section a 6.5/10.

===== 4.7: Clustering =====

While we primarily discussed minimum spanning trees in the context of networks, they also play a big role in the field of clustering. Clustering is the idea of taking a set of objects and coherently sorting them into groups. A first step for this involves constructing a distance function on the set. Intuitively, given this distance function, we want to group items close in proximity such that objects in different groups are far apart from one another. A k-clustering is a separation of the set into k clusters, and one natural goal is to maximize the spacing, the minimum distance between points in different clusters. An approach to this is growing a graph on the vertex set and making the clusters the connected components. We try to bring nearby points into clusters as quickly as possible, so we add the closest pairs of points first and keep doing so, never adding an edge between points already in the same cluster, since that would create a cycle. Thus, the graph we create ends up being a union of trees. When we merge two clusters, we do so by connecting one node in each; this is single-link clustering, which, to give another technical term, is a form of "hierarchical agglomerative clustering." To actually do this, we use Kruskal's algorithm and stop when we have k connected components.
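
Here is a minimal sketch of that procedure in Python, reusing the UnionFind class sketched in the previous section; the interface (a dict mapping frozenset pairs of points to distances) is my own assumption for illustration.

<code python>
def k_clustering(points, dist, k):
    """Single-link k-clustering: Kruskal's algorithm stopped at k components.

    points: list of items; dist: dict mapping frozenset({p, q}) to distance.
    Returns a list of k clusters (sets of points).
    """
    uf = UnionFind(points)
    edges = sorted(dist.items(), key=lambda kv: kv[1])  # closest pairs first
    components = len(points)
    for pair, _ in edges:
        if components == k:               # stop at k connected components
            break
        p, q = tuple(pair)
        if uf.find(p) != uf.find(q):      # skip edges that would form a cycle
            uf.union(p, q)
            components -= 1
    clusters = {}
    for p in points:
        clusters.setdefault(uf.find(p), set()).add(p)
    return list(clusters.values())
</code>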

While I understand why this stands alone as a section, I do feel as if it could have been shrunk enough to be added as an "extension" to the previous section, as is often done. Regardless, it was not a particularly dense section and made sense, so I give it a 6/10.

===== 4.8: Huffman Codes and Data Compression =====

Given that computers encode information in bits, it is necessary for computers to have a means of encoding different alphabets into strings of 0's and 1's. There are a number of viable means of doing this, but we want to encode things with, on average, the fewest number of bits. This was especially important in the past, when memory was hard to come by. This is where data compression comes in: data compression and compression algorithms allow files to be taken as input and compressed down to much smaller sizes. It is still important, however, to reduce the size of things up front. Morse code serves as a good example of this, where producing common letters took fewer input characters. Variable-length encodings, while easy to disambiguate in Morse code by pausing between letters, are a problem in binary, where there are only two symbols and no pauses. To avoid this ambiguity, we ensure that no letter's encoding is a prefix of another's; such prefix codes are ultimately the way encodings have their ambiguity removed, and we use binary trees to represent them.
We ultimately end up with Huffman's Algorithm, which builds the optimal tree bottom-up by repeatedly merging the two lowest-frequency letters into a single meta-letter. The prefix codes it produces are Huffman Codes. We are able to implement Huffman's Algorithm in O(k log k) time for an alphabet of size k.
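
Here is a minimal sketch of Huffman's Algorithm in Python, again with heapq; the input format (a dict of letter frequencies) and the tie-breaking counter, which keeps the heap from ever comparing two trees, are my own choices rather than the book's.

<code python>
import heapq
from itertools import count

def huffman(freq):
    """Build Huffman codes for a dict mapping letters to frequencies.

    Returns a dict mapping each letter to its bit string. A heap entry
    is (total frequency, tiebreaker, tree), where a tree is either a
    letter or a (left, right) pair.
    """
    tie = count()                      # breaks frequency ties deterministically
    heap = [(f, next(tie), letter) for letter, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:               # k - 1 merges, O(log k) each: O(k log k)
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):    # internal node: 0 goes left, 1 goes right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes
</code>

Running it on, say, {'a': 0.32, 'b': 0.25, 'c': 0.20, 'd': 0.18, 'e': 0.05} gives the two rarest letters the longest codewords, as any Huffman code must.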

This section seemed particularly long and contained a lot of proofs, which generally can be difficult to read. Overall, I give the section a 5/10.