** Clustering of Maximum Spacing **
\\
\\
--> Let's consider a set U of n objects p<sub>1</sub>,...,p<sub>n</sub>\\
--> For each pair p<sub>i</sub>, p<sub>j</sub>, we have a numerical distance d(p<sub>i</sub>,p<sub>j</sub>)\\
--> The distance function must satisfy:\\
--> d(p<sub>i</sub>,p<sub>i</sub>) = 0\\
--> d(p<sub>i</sub>,p<sub>j</sub>) > 0 for distinct p<sub>i</sub> and p<sub>j</sub>\\
--> Distances are symmetric: d(p<sub>i</sub>,p<sub>j</sub>) = d(p<sub>j</sub>,p<sub>i</sub>)\\
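These three requirements can be checked mechanically; here is a minimal Python sketch of my own (not from the text), assuming the distances are stored in a nested dict d[p][q]:

```python
def is_valid_distance(points, d):
    """Check the three requirements above: d(p,p) = 0, d(p,q) > 0
    for distinct points, and symmetry d(p,q) = d(q,p)."""
    for p in points:
        if d[p][p] != 0:
            return False
        for q in points:
            if p != q and (d[p][q] <= 0 or d[p][q] != d[q][p]):
                return False
    return True

# Toy example: points on a line, distance = absolute difference.
points = [0, 1, 8, 9]
d = {p: {q: abs(p - q) for q in points} for p in points}
print(is_valid_distance(points, d))  # True
```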
\\
Now, some definitions:\\
\\
--> A k-clustering of U is a partition of U into k nonempty sets C<sub>1</sub>,...,C<sub>k</sub>\\
--> Spacing of a k-clustering: the minimum distance between any pair of points lying in different clusters\\
\\
--> So, our goal is to find the k-clustering with the maximum possible spacing.\\
--> ** Thus, The Problem **:\\
\\
Since there are exponentially many different k-clusterings of a set U, how can we efficiently find the one that has maximum spacing?
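To make the objective concrete, here is a short Python sketch (my illustration, assuming a distance table d[p][q] as above) that computes the spacing of a given k-clustering:

```python
def spacing(clusters, d):
    """Spacing of a k-clustering: the minimum distance between any
    pair of points that lie in different clusters."""
    return min(
        d[p][q]
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for p in a
        for q in b
    )

# Toy example: 4 points on a line, distance = absolute difference.
points = [0, 1, 8, 9]
d = {p: {q: abs(p - q) for q in points} for p in points}
# Closest cross-cluster pair is 1 and 8, at distance 7.
print(spacing([[0, 1], [8, 9]], d))  # 7
```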
\\
** Designing the Algorithm **
\\
--> Again, our goal is to find a clustering of maximum spacing \\
--> To achieve this, we grow a graph on the vertex set U, starting with each point as its own cluster (n clusters in total)\\
--> Our aim is to bring nearby points together into the same cluster as fast as possible\\
\\
--> We start by drawing an edge between the closest pair of points \\
--> We then draw an edge between the next closest pair of points\\
--> We continue adding edges between pairs of points in order of increasing distance d(p<sub>i</sub>,p<sub>j</sub>)\\
\\
--> Thus we're growing a graph H on U edge by edge, whose connected components correspond to clusters\\
--> Each time we add an edge that spans two distinct components, we merge the two corresponding clusters: this iterative merging of clusters is called **single-link clustering**\\
--> Single-link clustering is a special case of //hierarchical agglomerative clustering//\\
\\
--> This algorithm is precisely an application of Kruskal's Minimum Spanning Tree algorithm\\
--> Only difference: we stop once we obtain k connected components. So we're running Kruskal's algorithm, but stopping it just before it adds its last k-1 edges\\
\\
--> So, it's basically taking the full Minimum Spanning Tree T produced by Kruskal's algorithm, deleting the k-1 most expensive edges, and defining the k-clustering to be the resulting connected components C<sub>1</sub>,...,C<sub>k</sub>\\
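The procedure above can be sketched in Python with a simple union-find; this is my own illustration of the idea (names are mine), assuming points and a distance table d[p][q] as in the earlier examples:

```python
def single_link_clusters(points, d, k):
    """Single-link clustering: run Kruskal's algorithm on the complete
    graph over `points`, stopping once only k components remain."""
    parent = {p: p for p in points}

    def find(x):
        # Find the root of x's component, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairs of points, in order of increasing distance.
    edges = sorted(
        (d[p][q], p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )
    components = len(points)
    for dist, p, q in edges:
        if components == k:
            break  # stop just before the last k-1 merges
        rp, rq = find(p), find(q)
        if rp != rq:  # edge spans two distinct clusters: merge them
            parent[rp] = rq
            components -= 1

    # Group the points by the root of their component.
    clusters = {}
    for p in points:
        clusters.setdefault(find(p), []).append(p)
    return list(clusters.values())

points = [0, 1, 8, 9]
d = {p: {q: abs(p - q) for q in points} for p in points}
print(single_link_clusters(points, d, 2))  # [[0, 1], [8, 9]]
```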
\\
** Analyzing the Algorithm **
\\
\\
--> Claim: the components C<sub>1</sub>,...,C<sub>k</sub> obtained by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing\\
--> The idea behind the claim is that the spacing of any other k-clustering can be no larger than that of the clustering found by the single-link algorithm.\\
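On a tiny instance the claim can be sanity-checked by brute force: enumerate every 2-clustering, compute its spacing, and observe that the best one matches what the single-link algorithm produces (for the 4 points used in the earlier sketches, that was {0,1} and {8,9}). The code below is my own check, not the book's proof:

```python
from itertools import combinations

points = [0, 1, 8, 9]

def dist(p, q):
    # Distance = absolute difference, as in the earlier toy example.
    return abs(p - q)

def spacing(a, b):
    # Minimum distance across the two clusters.
    return min(dist(p, q) for p in a for q in b)

# Enumerate all 2-clusterings as (subset, complement) pairs; each
# partition shows up twice, which is harmless for finding the max.
best = None
for r in range(1, len(points)):
    for a in combinations(points, r):
        b = [p for p in points if p not in a]
        s = spacing(a, b)
        if best is None or s > best[0]:
            best = (s, set(a), set(b))

print(best)  # (7, {0, 1}, {8, 9})
```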
\\
This section was interesting in that it presented a very important idea as concisely as possible. I give it an 8/10.