Section 4.5

Section 4.5 discusses minimum spanning trees and the efficiency of the algorithms that build them. A minimum spanning tree, by definition, is a tree that connects all the nodes of a graph with its edges, without creating any cycles, and with the least total cost. To produce a minimum spanning tree we use a greedy algorithm, and the three simple ones the book discusses, all of which produce an MST, are Kruskal's Algorithm, Prim's Algorithm, and the Reverse-Delete Algorithm. Both Kruskal's and Prim's Algorithms only add an edge to the tree if it satisfies the Cut Property; by using the Cut Property we assure that the tree is optimal. The Reverse-Delete Algorithm instead deletes the most expensive edges that do not disconnect the graph, so each edge it deletes satisfies the Cycle Property, which also assures that the tree is optimal. Lastly, Prim's Algorithm can be implemented using a priority queue that supports the operations ChangeKey and ExtractMin, giving a running time of O(m log n): each ChangeKey or ExtractMin takes O(log n) time, and across the run there are n ExtractMins and at most m ChangeKeys.
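As a rough illustration (not the book's code), here is a Python sketch of this priority-queue version of Prim's Algorithm. Python's heapq module has no ChangeKey operation, so the sketch uses the standard lazy-deletion substitute: instead of changing a key it pushes a duplicate entry and skips entries that are already stale. The adjacency-list format and all names are assumptions made for the example.

<code python>
import heapq

def prim_mst(adj, root=0):
    # adj maps each node to a list of (weight, neighbor) pairs.
    visited = set()
    mst_edges = []
    # Heap entries: (edge weight, node outside the tree, node inside the tree).
    heap = [(0, root, None)]
    while heap and len(visited) < len(adj):
        w, u, parent = heapq.heappop(heap)      # ExtractMin
        if u in visited:
            continue                            # stale duplicate entry; skip it
        visited.add(u)
        if parent is not None:
            mst_edges.append((parent, u, w))
        for weight, v in adj[u]:
            if v not in visited:                # stands in for ChangeKey
                heapq.heappush(heap, (weight, v, u))
    return mst_edges

# Example: a triangle with costs 1, 2, 4 -- the MST keeps the cost-1 and cost-2 edges.
graph = {0: [(1, 1), (4, 2)], 1: [(1, 0), (2, 2)], 2: [(4, 0), (2, 1)]}
print(prim_mst(graph))   # [(0, 1, 1), (1, 2, 2)]
</code>

Since each edge contributes at most one heap entry, the heap holds at most m items, so every push or pop costs O(log m) = O(log n) and the O(m log n) bound is preserved.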

Section 4.6

Section 4.6 looks specifically at Kruskal's Algorithm, its implementation, and its efficiency. Kruskal's Algorithm uses a new kind of data structure called a Union-Find structure, which stores a representation of the connected components in a way that supports rapid searching and updating. Union-Find supports three operations: MakeUnionFind(S), which creates the structure whose sets correspond to the connected components of a graph with no edges; Find(u), which returns the name of the set containing u; and Union(A, B), which updates the structure by merging the sets A and B into a single set. With an array implementation of the Union-Find data structure for a set S of size n, the Find operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and any sequence of k Union operations takes at most O(k log k) time. With a pointer-based implementation for a set S of size n, the trade-off reverses: a Union operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and a Find operation takes O(log n) time. At the end of the section we prove that this implementation of Kruskal's Algorithm (using the Union-Find structure) takes O(m log m) time, which is O(m log n) since m is at most n-squared, so log m is O(log n).
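To make the operations concrete, here is a small Python sketch of the pointer-based structure, using union-by-size (which is what caps the pointer chains at length log n; the book's analysis does not use path compression), together with Kruskal's Algorithm built on top of it. The edge-list format and names are illustrative assumptions.

<code python>
def make_union_find(n):
    # MakeUnionFind(S): every element starts as its own one-element set.  O(n)
    parent = list(range(n))
    size = [1] * n
    return parent, size

def find(parent, u):
    # Find(u): follow parent pointers up to the set's representative.
    # With union-by-size the chain has length at most log n, so this is O(log n).
    while parent[u] != u:
        u = parent[u]
    return u

def union(parent, size, a, b):
    # Union(A, B): given two set names (roots), point the smaller set's
    # root at the larger one's.  A constant number of updates: O(1).
    if size[a] < size[b]:
        a, b = b, a
    parent[b] = a
    size[a] += size[b]

def kruskal(n, edges):
    # edges is a list of (weight, u, v) triples.  Sorting costs O(m log m);
    # the loop does two Finds per edge, so the total is O(m log n).
    parent, size = make_union_find(n)
    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:                 # u and v are in different components
            mst.append((u, v, w))
            union(parent, size, ru, rv)
    return mst
</code>

For instance, kruskal(3, [(1, 0, 1), (2, 1, 2), (4, 0, 2)]) returns [(0, 1, 1), (1, 2, 2)], skipping the cost-4 edge because it would close a cycle.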

Section 4.7

Section 4.7 discusses the idea of clustering. Clustering arises whenever there is a collection of objects that needs to be classified or organized into specific groups. To find a clustering of maximum spacing we use a minimum spanning tree as the basic formalization. For example, if we have a set U of n objects, then for each pair of objects pi and pj we have a numerical distance d(pi, pj). If we want to divide the objects into k groups, we say that a k-clustering of U is a partition of U into k nonempty sets C1, C2, ..., Ck. The spacing of a clustering is the minimum distance between any pair of points lying in different clusters. For the design of the algorithm we run Kruskal's Algorithm but stop just before it adds its last k-1 edges. Thus, iteratively merging clusters in this way is equivalent to computing a full minimum spanning tree and deleting its most expensive edges. Ultimately we find that the components C1, C2, ..., Ck formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.
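Here is a hedged sketch of that stopped-early run of Kruskal's Algorithm, reusing the make_union_find/find/union helpers from the Section 4.6 sketch above; representing the distances as (d, pi, pj) triples is an assumption of the example.

<code python>
from collections import defaultdict

def k_clustering(n, edges, k):
    # Kruskal's loop over the pairwise distances, stopped early: the last
    # k-1 merges are never performed, leaving exactly k clusters.
    parent, size = make_union_find(n)        # helpers from the 4.6 sketch
    components, spacing = n, None
    for w, u, v in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru == rv:
            continue                         # already in the same cluster
        if components == k:
            spacing = w                      # cheapest inter-cluster distance
            break
        union(parent, size, ru, rv)
        components -= 1
    clusters = defaultdict(list)
    for p in range(n):
        clusters[find(parent, p)].append(p)
    return list(clusters.values()), spacing
</code>

Because the distances are scanned in sorted order, the first merge the loop refuses to perform is the cheapest distance between two different clusters, which is exactly the spacing of the resulting k-clustering.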

Section 4.8

Section 4.8 discusses Huffman Codes and Data Compression, and the problem that computers operate on sequences of bits, so one needs encoding schemes that take text written in richer alphabets and convert it into long strings of bits. In this process we are confronted with the issue of reducing the average number of bits per letter, which is where data compression becomes very significant. After discussing these initial issues, the section goes on to define prefix codes, which are a major element of data compression. The section then discusses how to use Huffman's Algorithm to compress data, such that the Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code, and gives its implementation and running time: using a priority queue, the total running time is O(k log k) for an alphabet of k letters.
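To make the running time concrete, here is a small Python sketch of Huffman's Algorithm using heapq as the priority queue; the tuple-based tree and the frequency table are illustrative assumptions, not the book's code.

<code python>
import heapq
from itertools import count

def huffman_code(freq):
    # freq maps each letter to its frequency.  Build the tree bottom-up:
    # repeatedly ExtractMin twice and insert the merged subtree.  With k
    # letters there are k-1 merges at O(log k) each: O(k log k) total.
    order = count()  # tie-breaker so equal frequencies never compare the trees
    heap = [(f, next(order), letter) for letter, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(order), (left, right)))
    _, _, root = heap[0]

    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):          # internal node: 0 left, 1 right
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                                # leaf: record the codeword
            codes[node] = prefix or "0"
    assign(root, "")
    return codes

print(huffman_code({"a": .32, "b": .25, "c": .20, "d": .18, "e": .05}))
</code>

On this illustrative frequency table the rare letters d and e receive 3-bit codewords while a, b, and c receive 2-bit ones, for an average of 2.23 bits per letter; no prefix code for these frequencies does better.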
