In this case, the graph evolves through the addition of edges. Our goal is to maintain the set of connected components of such a graph throughout this evolution process. We will develop a data structure called the Union-Find structure, which stores a representation of the components in a way that supports rapid updating and searching. When we add an edge e = (v, w), we first check whether the two nodes it connects are in the same connected component. If the components containing v and w are different, then there is no path between v and w, so adding e merges the two components into one; if they are in the same component, v and w are already connected, and adding e would create a cycle. The Union-Find data structure can handle the addition of edges but not the effects of edge deletion, since deleting an edge may split one component into two.

The Problem. Find(u) returns the name of the set containing u; we check whether u and v are connected by testing whether Find(u) = Find(v). Union(A, B) is the operation that merges two sets A and B into one. MakeUnionFind(S) sets up the structure with each element of S in its own singleton set.

A simple implementation stores the set names in an array indexed by the elements of S. Consider this array implementation of the Union-Find data structure for some set S of size n, where unions keep the name of the larger set. The Find operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and any sequence of k Union operations takes at most O(k log k) time. This is proven on page 153.

A better data structure for Union-Find. We will try to do better and reduce the worst-case time required for a Union operation, at the expense of raising the time required for the Find operation to O(log n). This alternate implementation uses pointers: each element points toward the name of its set. Consider this pointer-based implementation of the Union-Find data structure for some set S of size n, where unions keep the name of the larger set. A Union operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and a Find operation takes O(log n) time. This is proven on page 155.
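To make the pointer-based description concrete, here is a minimal sketch in Python (the names UnionFind, find, and union are my own choices, not the book's): each element keeps a pointer toward the name of its set, and unions keep the name of the larger set.

```python
class UnionFind:
    """Pointer-based Union-Find; unions keep the name of the larger set."""

    def __init__(self, elements):
        # MakeUnionFind(S): every element starts as the name of its own singleton set.
        self.parent = {v: v for v in elements}  # pointer toward the set's name
        self.size = {v: 1 for v in elements}    # set sizes; only entries for set names matter

    def find(self, u):
        # Follow parent pointers until we reach the set's name.
        while self.parent[u] != u:
            u = self.parent[u]
        return u

    def union(self, a, b):
        # a and b are set names; point the smaller set's name at the larger one (O(1)).
        if a == b:
            return
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
```

With this layout a Union moves only one pointer, which is why it runs in constant time, while a Find may have to walk up a chain of at most O(log n) pointers.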

Further Improvements. The bad case for the running time of the pointer-based Union-Find data structure is when we repeatedly take Unions of equal-sized sets. Then Find(v) takes O(log n) time, and having to follow the same sequence of log n pointers every time we ask for the name of the set containing v is quite redundant. So in the improved implementation, we compress the path we followed after every Find operation by resetting all pointers along the path to point directly to the current name of the set. How does this change the time required for a Find(v)? A sequence of n Find operations employing compression requires an amount of time that is extremely close to linear in n; the actual upper bound is O(n α(n)), where α(n) is an extremely slow-growing function of n called the inverse Ackermann function.
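A rough sketch of what path compression changes, assuming the same parent-pointer dictionary as in the UnionFind sketch above (the function name is hypothetical): the first pass finds the set's name, the second pass repoints every node on the path directly at it.

```python
def find_with_compression(parent, u):
    # First pass: walk up to the name of the set containing u.
    root = u
    while parent[root] != root:
        root = parent[root]
    # Second pass: reset every pointer along the path to point at the name,
    # so later Finds on these nodes follow a single pointer.
    while parent[u] != root:
        parent[u], u = root, parent[u]
    return root
```

Calling this as find_with_compression(uf.parent, v) in place of uf.find(v) does not change which name is returned; it only flattens the path so that repeated Finds stop re-walking it.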

Implementing Kruskal's Algorithm. Now we use the Union-Find data structure to implement Kruskal's algorithm. First we sort the edges by cost, which takes O(m log m) time; since m ≤ n², this is also O(m log n). We then use the Union-Find data structure to maintain the connected components of (V, T) as edges are added: for each edge (v, w) considered in sorted order, we compute Find(v) and Find(w); if the names differ, we add the edge and take the Union of the two components, and otherwise we discard the edge because it would create a cycle. Over the course of Kruskal's algorithm we perform a total of at most 2m Find and n-1 Union operations, and with either form of the Union-Find data structure described above this takes a total of O(m log n) time. To sum up, Kruskal's Algorithm can be implemented on a graph with n nodes and m edges to run in O(m log n) time.
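Putting the pieces together, here is a minimal sketch of Kruskal's loop using the hypothetical UnionFind class from above; the edge-list format (cost, v, w) is my own assumption, not the book's.

```python
def kruskal(nodes, edges):
    """Return a spanning-tree edge list; `edges` is a list of (cost, v, w) tuples."""
    uf = UnionFind(nodes)               # MakeUnionFind(V): O(n)
    tree = []
    for cost, v, w in sorted(edges):    # sort edges by cost: O(m log m)
        a, b = uf.find(v), uf.find(w)   # at most 2m Find operations in total
        if a != b:                      # different components: the edge creates no cycle
            tree.append((v, w))
            uf.union(a, b)              # at most n - 1 Union operations in total
    return tree
```

For example, kruskal(['a', 'b', 'c'], [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'a', 'c')]) keeps the two cheapest edges and discards (a, c), since by then a and c are already in the same component.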

This section had some twists but was not too long, so I give it an 8/10.
