Minimum Spanning Trees also play a role in the area of clustering.
The Problem. Clustering happens when one has a collection of objects such as a set of documents that one is trying to organize into a coherent group. It is important to find the similarityies between each pair of objects. One way to define similarities is by using a distance function on the objects with an interpretation that objects at a larger distance from one another are less similar to each other. But what do we mean by distance??? Distance could mean a variety of things but in our case, distance will help us cluster objects “closer” to each other together and put those “far apart” in different groups. Clustering of Maximum Spacing. Minmum Spanning trees play a role in cases where we are given a set of n objects whose distance between any two objects is positive and symmetric. How possible is it to divide the objects into k groups, where k is the minimum distance between any pair of points?? Our goal is to seek k clusters or groups of objects with the maximum possible spacing. Our next question involves finding a most efficient k-clustering with the maximum spacing. Designing the Algorithm. We start by considering growing a graph on vertex set U. Then draw an edge between the closets pair of points. Then draw another edge between the next closest pair in increasing order. If we are about to add an edge between two nodes that already belong to the same cluster, we will refrain from adding the edge because it wont change the set of components. In this way we wont have a cycle; thus giving us a union of trees. What is the connection to MST's? Our procedure is precisely applying Kruskal's MST Algorithm. We stop when we obtain k connected components. This is similar to taking a full MST , and deleting k-1 most expensive edges, and defining the k-clustering to be the resulting connected components. Analyzing the Algorithm. We claim that the components formed by deleting k-1 most expensive edges of the MST constitute a k-clustering of maximum spacing, proof on page 160-161. What is the running time of finding k-clusters?? It will take the running time Kruskal's Algorithm would take.
This section was easy to read through but required some attention, its scale is 8.5/10.