Stephen's Sidebar

Chapter 5.4: Finding the Closest Pair of Points

The problem: Given n points in a plane, find the pair that is closest together.

Designing the Algorithm

Let us denote the set of points by P = {p1, … , pn}, where pi has coordinates (x(i), y(i); and for two points we use d(pi, pj) to denote the distance. Our goal is to find a pair of points that minimizes this distance. We will apply the style of divide and conquer similar to what we used in Mergesort: we find the closest pair among the points in the left half of P and the closest pair among the points in the right half of P. Our main issue with this problem is that there could be a pair of points between the two halves that minimizes the distance which is not accounted for in the mergesort style algorithm.

Suppose q0 and q1 are correctly returned as the closest pair of points on the left side, Q, and r0 and r1 are correctly returned as the closest pair of points on the right side R. Let delta be the minimum distance of the two smallest pairs of each side. If there exists a q in Q and an r in R for which d(q,r) < delta, then each of q and r lies within a distance delta of the midline L.

Here is the algorithm:

Closest-Pair(P)
  Construct P(x) and P(y)
  (p0, p1) = Closest-Pair-Rec(P(x), P(y))
  
Closest-Pair-Rec(P(x), P(y))
  If the absolute value of P <= 3
    find closest pair by measuring all pairwise distances
    
  Construct Q(x), Q(y), R(x), R(y)
  (q0, q1) = Closest-Pair-Rec(Q(x), Q(y))
  (r0, r1) = Closest-Pair-Rec(R(x), R(y))
  
  delta = min(d(q0,q1), d(r0,r1))
  x' = maximum x-coordinate of a point in set Q
  L = {(x,y) : x = x'}
  S = points in P within a distance delta of L
  
  Construct S(y)
  For each point s in S(y), compute distance from s to each of the next 15 points in S(y)
    Let s, s' be pair achieving minimum of these distances
    
  If d(s, s') < delta
    return (s, s')
  Else if d(q0, q1) < d(r0, r1)
    return (q0, q1)
  Else
    return (r0, r1)

This algorithm correctly outputs a closest pair of points in P with a running time of O(nlogn).

The initial sorting of P by x and y coordinate takes O(nlogn) time, and the running time of the remainder of the algorithm satisfies the recurrence relation T(n) ⇐ 2T(n/2) + cn, so the rest of the algorithm runs at O(nlogn) as well.

This was a difficult section to follow, but ingenious nonetheless. 9.5/10.