Table of Contents
5.1 The Mergesort Algorithm
- Mergesort sorts a list by dividing it in half, recursively sorting the halves, and then merging the halves back together, exploiting two facts: it is easy to combine sorted lists into a sorted list, and if you divide a list often enough the parts become sorted automatically (becoming singletons). Given any algorithm of this sort, in which the data is divided in half and then combined in linear time, we have T(n) ≤ 2T(n/2) + cn and T(2) ≤ c. The sum of this function as it is recursively applied can be found by looking at each level of recursion: the first takes cn time, the second 2·c(n/2) = cn, and so forth; there are log_2 n levels, giving a running time of O(n log_2 n). One can also solve the recurrence directly by substituting c(n/2)log_2(n/2) for T(n/2); this gives T(n) ≤ 2c(n/2)log_2(n/2) + cn = cn[(log_2 n) - 1] + cn = cn log_2 n, i.e. O(n log n). More generally, given a guessed asymptotic running time, one can plug it in with generic constants and simplify to find the constant (here c) that creates the desired cancellation. (A sketch of the algorithm follows this section's notes.)
- No insights.
- No questions.
- No complaints.
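A minimal Python sketch of the divide-and-merge pattern described above; the function name and structure are mine, not the book's:

```python
def mergesort(a):
    """Sort a list: divide in half, recurse on each half, merge in linear time."""
    if len(a) <= 1:                  # a singleton (or empty) list is already sorted
        return a
    mid = len(a) // 2
    left = mergesort(a[:mid])
    right = mergesort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # linear-time merge of sorted halves
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])          # at most one side still has elements
    merged.extend(right[j:])
    return merged
```

For example, mergesort([5, 2, 4, 1, 3]) returns [1, 2, 3, 4, 5]; the merge loop is the linear-time combination step that the cn term in the recurrence accounts for.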
5.2 Further Recurrence Relations
- Recursive algorithms can be classified by the form of their recurrence relation: the divisions they make and the complexity of the combination step. A generalization of the mergesort recurrence is T(n) ≤ qT(n/2) + cn, for q subproblems of half the size. The total work in level j is (q/2)^j·cn, and for q > 2 the implied geometric sum is dominated by its last level, simplifying to O(n^(log_2 q)). One can solve such recurrences, where what to plug in is not obvious, by partial substitution: plug in an expression with extra degrees of freedom, then identify the values that complete the substitution. A different extension is given by T(n) ≤ 2T(n/2) + cn^2; here the top level dominates, giving O(n^2). (A numerical check of the first recurrence follows this section's notes.)
- No insights.
- No questions.
- No complaints.
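Not from the book, but a quick numerical sanity check of the q > 2 case (names mine): unrolling T(n) = qT(n/2) + cn exactly and dividing by n^(log_2 q) should produce a ratio that levels off as n grows.

```python
import math

def T(n, q, c=1.0):
    """Unroll T(n) = q*T(n/2) + c*n exactly, with T(1) = c."""
    if n <= 1:
        return c
    return q * T(n // 2, q, c) + c * n

# If T(n) = Theta(n**log2(q)) for q > 2, these ratios should stabilize.
for q in (3, 4):
    for n in (2**10, 2**14, 2**18):
        print(f"q={q}, n={n}: {T(n, q) / n ** math.log2(q):.3f}")
```

Summing the geometric series suggests the ratio should approach q/(q - 2): 3 for q = 3 and 2 for q = 4, which is what the printout shows.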
5.3 Counting Inversions
- An inversion is a pair of elements appearing in opposite orders in two rankings; the brute-force count looks at each pair, in O(n^2) time. To improve on that, divide the list being analyzed in two and recursively sort each sublist while counting the inversions within it. Then merge the two into a sorted list, counting the crossing inversions (pairs where a number in the first sublist is greater than a number in the second); concretely, each time an element is taken from the second sublist, add the number of elements not yet taken from the first. Since this counting is a linear-time rider on the linear-time combination step of mergesort, the algorithm shares mergesort's O(n log n) runtime. (See the sketch after this section's notes.)
- No insights.
- No questions.
- No complaints.
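A Python sketch of merge-and-count, extending the mergesort sketch above (names mine):

```python
def sort_and_count(a):
    """Return (sorted copy of a, number of inversions in a)."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = sort_and_count(a[:mid])
    right, inv_right = sort_and_count(a[mid:])
    merged, i, j, crossing = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # Taking from the second half: every element still waiting
            # in the first half is larger, so each forms an inversion.
            crossing += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_left + inv_right + crossing
```

For example, sort_and_count([2, 4, 1, 3, 5]) returns ([1, 2, 3, 4, 5], 3), for the inversions (2,1), (4,1), and (4,3).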
5.4 Finding the Closest Pair of Points
- Finding the closest pair among n points by brute force entails looking at each of the O(n^2) pairs. To improve on that, sort the points (separately along each axis), then divide the plane in half (with half the points on each side) and recursively find the closest pair in each half. Any still-closer pair must span the dividing line, so if d is the minimum of the distances of the closest pairs in the halves, only points within d of the line need be considered. Now take the points in that 2d-wide strip about the line, sorted by y-coordinate, and suppose two of them are closer than d. Then there can be no more than 14 points between them in y-order, for there is not room to fit more than that many points in the intervening rectangle without two on the same side of the line being closer than d. Therefore, regardless of the number of points in the strip, the closest pair within it can be found in at most 15n comparisons, each point against its next 15 successors in y-order. This shares mergesort's recurrence, and since the initial sort is also O(n log n), the entire algorithm is O(n log n). (A sketch follows this section's notes.)
- No insights.
- This chapter seems really focused on improving on O(n^2) runtimes. Having seen the difference between O(n log n) and O(n^2) in practice (in Economics I have Ns ranging above 10e7), I am not complaining. But why the fixation on polynomial time as the standard for efficiency, given how often brute force achieves it and how often we attempt to improve on it?
- No complaints.
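A Python sketch of the scheme, assuming at least two distinct points; the function names and the (distance, p, q) return shape are mine:

```python
import math

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def closest_pair(points):
    """Return (distance, p, q) for the closest pair among distinct (x, y) points."""
    px = sorted(points)                      # sorted by x, computed once
    py = sorted(points, key=lambda p: p[1])  # sorted by y, computed once
    return _closest(px, py)

def _closest(px, py):
    if len(px) <= 3:                         # base case: brute force
        return min((_dist(p, q), p, q)
                   for i, p in enumerate(px) for q in px[i + 1:])
    mid = len(px) // 2
    mid_x = px[mid][0]
    left = set(px[:mid])                     # relies on the points being distinct
    best = min(_closest(px[:mid], [p for p in py if p in left]),
               _closest(px[mid:], [p for p in py if p not in left]))
    d = best[0]
    # Only points within d of the line can beat d, and each needs
    # comparing against at most its next 15 successors in y-order.
    strip = [p for p in py if abs(p[0] - mid_x) < d]
    for i, p in enumerate(strip):
        for q in strip[i + 1:i + 16]:
            best = min(best, (_dist(p, q), p, q))
    return best
```

For example, closest_pair([(0, 0), (3, 4), (1, 1), (5, 5)]) returns (1.414..., (0, 0), (1, 1)); the strip loop is where the 15n comparison bound enters.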
5.5 Integer Multiplication
- The conventional multiply-and-carry multiplication algorithm is O(n^2) for multiplying two n-digit numbers: n multiplications of an n-digit number by one digit, each O(n), plus n O(n) additions. One can also multiply by divide-and-conquer; instead of the digit-at-a-time identity (if a = 10q + r, then ab = 10qb + rb), one uses a = 2^(n/2)·a_1 + a_0 and b = 2^(n/2)·b_1 + b_0, so that ab = a_1b_1·2^n + (a_1b_0 + a_0b_1)·2^(n/2) + a_0b_0. This creates 4 subproblems of size n/2, plus a constant number of O(n) additions, and so comes back to the earlier recurrence relation T(n) ≤ 4T(n/2) + cn; but this gives O(n^2). Improving requires a reduction in the number of multiplications; the equality ab = a_1b_1·2^n + ((a_1 + a_0)(b_1 + b_0) - a_1b_1 - a_0b_0)·2^(n/2) + a_0b_0 uses just three, with running time O(n^(log_2 3)) = O(n^1.59). (A sketch follows this section's notes.)
- No insights.
- No questions.
- No complaints. I appreciated their giving the derivation of the second identity.
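A Python sketch of the three-multiplication (Karatsuba) scheme on machine integers, splitting on bit length as in the 2^(n/2) decomposition above; the base-case threshold and names are mine:

```python
def karatsuba(a, b):
    """Multiply non-negative integers using three recursive multiplications."""
    if a < 10 or b < 10:                       # base case: small enough to multiply directly
        return a * b
    half = max(a.bit_length(), b.bit_length()) // 2
    a1, a0 = a >> half, a & ((1 << half) - 1)  # a = a1*2^half + a0
    b1, b0 = b >> half, b & ((1 << half) - 1)  # b = b1*2^half + b0
    high = karatsuba(a1, b1)
    low = karatsuba(a0, b0)
    # (a1+a0)(b1+b0) - a1b1 - a0b0 recovers the cross terms a1b0 + a0b1.
    cross = karatsuba(a1 + a0, b1 + b0) - high - low
    return (high << (2 * half)) + (cross << half) + low
```

For example, karatsuba(12345, 67890) == 12345 * 67890 holds; replacing the naive split's four recursive calls with three is exactly what drops the exponent from log_2 4 = 2 to log_2 3 ≈ 1.59.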