This is an old revision of the document!
4.7 Clustering
The Problem
Clustering arises when one is trying to classify a collection of objects into coherent groups.The first question that comes into mind is whether there exist some measure under which each pair of objects is similar or dissimilar to another. A common approach used is to define a distance function on the objects, where objects at a larger distance from one another are less similar to each other. However, the notion of distance usually takes on a very abstract meaning in many applications.
Given a distance function on the objects, the clustering problem seeks to divide them into groups such that objects within the same group are “close”, and objects in different groups are “far apart”. So the main challenge is figuring out what a good set of groups might look like.
Clustering of Maximum Spacing
–> Let's consider a set U of n objects{p1,p2,…,pn}
–> For each pair pi and pj, there's a numerical distance d(pi,pj)
–> For this distance:
–> d(pi,pi) = 0
–> d(pi,pj ≥ 0 for distinct i and j
–> Distances are symmetric: d(pi,pj = d(pj,pi
Now,let's suppose we're dividing the objects in U into k groups, for a certain parameter k.
–> A k-clustering of U is a partition of U into k nonempty sets C1,C2,…, Ck
–> Spacing of a k-clustering: the minimum distance between any pair of points lying in different clusters.
–> So,our goal is to find the k-clustering with the maximum possible spacing.
–> So, The Problem :
Since there are exponentially many different k-clusterings of a set U, how can we efficiently find the one that has maximum spacing?