\\
--> Let's consider a set U of n objects {p<sub>1</sub>,p<sub>2</sub>,...,p<sub>n</sub>}\\
--> For each pair p<sub>i</sub> and p<sub>j</sub>, there's a numerical distance d(p<sub>i</sub>, p<sub>j</sub>)\\
--> For this distance:\\
--> d(p<sub>i</sub>,p<sub>i</sub>) = 0\\
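As a small illustration of such a distance function (Euclidean distance is just one valid choice, and the point coordinates here are my own made-up example):

```python
import math

# two hypothetical points, indexed like p1, p2 in the text
p = {1: (0.0, 0.0), 2: (3.0, 4.0)}

def d(i, j):
    """Euclidean distance between points p_i and p_j."""
    return math.dist(p[i], p[j])

print(d(1, 1))  # → 0.0, i.e. d(pi, pi) = 0
print(d(1, 2))  # → 5.0
print(d(1, 2) == d(2, 1))  # → True, the distance is symmetric
```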
\\
--> So, our goal is to find the k-clustering with the maximum possible spacing.\\
--> **Thus, The Problem**:\\
\\
Since there are exponentially many different k-clusterings of a set U, how can we efficiently find the one that has maximum spacing?\\
\\
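To get a feel for "exponentially many": the number of ways to partition n objects into k nonempty clusters is the Stirling number of the second kind. A quick sketch (the function name is my own):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of k-clusterings of n labeled objects (Stirling number of
    the second kind), via the recurrence S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    if n == k:
        return 1            # everything in its own cluster: one way
    if k == 0 or k > n:
        return 0            # no valid clustering
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

print(stirling2(4, 2))   # → 7
print(stirling2(20, 3))  # already a huge number: brute force is hopeless
```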
**Designing the Algorithm**
\\
--> Again, our goal is to find a clustering of maximum spacing\\
--> To achieve this, we grow a graph on the vertex set U, starting with each point in its own cluster (n clusters in all)\\
--> Our aim is to bring nearby points together into the same cluster as quickly as possible\\
\\
--> We start by drawing an edge between the closest pair of points\\
--> We then draw an edge between the next closest pair of points\\
--> We continue adding edges between pairs of points in order of increasing distance d(p<sub>i</sub>,p<sub>j</sub>)\\
\\
--> Thus we're growing a graph H on U edge by edge, whose connected components correspond to clusters.\\
--> Each time we add an edge that spans two distinct components, we merge the two corresponding clusters: this iterative merging of clusters is called **single-link clustering**, which basically means that we combine clusters as soon as a single link joins them together.\\
--> Single-link clustering is a special case of //hierarchical agglomerative clustering//\\
\\
--> This algorithm is precisely an application of Kruskal's algorithm we saw in the previous section\\
--> Only difference: we stop once we obtain k connected components. So we're running Kruskal's algorithm until just before it adds its last k-1 edges.\\
\\
--> So, it's basically taking the full Minimum Spanning Tree T produced by Kruskal's algorithm and deleting the k-1 most expensive edges.\\
\\
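The procedure above can be sketched in Python — a minimal illustration with my own names, assuming Euclidean distances, using a simple union-find to track the components Kruskal's algorithm grows:

```python
import math
from itertools import combinations

def single_link_clustering(points, k):
    """Single-link k-clustering: run Kruskal's algorithm on the complete
    graph over the points, stopping once only k components remain (i.e.
    the full MST minus its k-1 most expensive edges)."""
    n = len(points)
    parent = list(range(n))            # union-find forest, one node per point

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # all pairwise edges, in order of increasing distance d(pi, pj)
    edges = sorted(combinations(range(n), 2),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))

    components = n                     # start with n singleton clusters
    for i, j in edges:
        if components == k:            # stop before the last k-1 merges
            break
        ri, rj = find(i), find(j)
        if ri != rj:                   # edge spans two clusters: merge them
            parent[ri] = rj
            components -= 1

    # group point indices by the root of their component
    clusters = {}
    for x in range(n):
        clusters.setdefault(find(x), []).append(x)
    return list(clusters.values())

# two tight groups far apart; k = 2 should separate them
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(sorted(map(sorted, single_link_clustering(pts, 2))))
# → [[0, 1, 2], [3, 4]]
```

Note that edges joining points already in the same component are simply skipped, exactly as in Kruskal's algorithm — adding them would create a cycle without changing the clustering.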
**Analyzing the Algorithm**
\\
\\
--> The components C<sub>1</sub>,C<sub>2</sub>,..., C<sub>k</sub> formed by deleting the k-1 most expensive edges of the Minimum Spanning Tree T constitute a k-clustering of maximum spacing. Proof: Book, Page 160.\\
--> The idea behind the claim is that the spacing of any other clustering can be no larger than that of the clustering found by the single-link algorithm.\\
\\
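The spacing being compared here is the minimum distance between any two points lying in different clusters. A small sketch of how one would compute it (names and example points are my own):

```python
import math

def spacing(points, clusters):
    """Spacing of a clustering: min distance over all pairs of points
    that lie in two different clusters."""
    return min(math.dist(points[a], points[b])
               for ci in range(len(clusters))
               for cj in range(ci + 1, len(clusters))
               for a in clusters[ci]
               for b in clusters[cj])

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
# the single-link result for k = 2 keeps the two far-apart groups separate
print(spacing(pts, [[0, 1], [2, 3]]))
# any other 2-clustering splits a nearby pair, giving a smaller spacing
print(spacing(pts, [[0], [1, 2, 3]]))  # → 1.0
```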
This section was interesting in that it presented a very important idea as concisely as it can get. I give it an 8/10.
  
courses/cs211/winter2012/journals/jeanpaul/chapterfour_sectionvii.1331002264.txt.gz · Last modified: 2012/03/06 02:51 by mugabej
CC Attribution-Noncommercial-Share Alike 4.0 International