Chapter 4 covers greedy algorithms: algorithms that build up a solution in small steps, making a decision at each step so as to optimize some criterion.
4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead We can model the input as a set of requests, each with a start and a finish time. A subset of these requests is compatible if no two of them overlap in time, and optimal if it is a compatible set of maximum size.
When we design a greedy algorithm we do so in terms of some rule or objective. The book outlines several candidate rules: accepting the available request with the earliest start time, accepting the request that requires the smallest interval of time, or accepting the request with the fewest incompatible requests. All three have counterexamples, but the book identifies a greedy rule that does lead to an optimal solution: always accept the available request that finishes first, which ensures that the resource becomes free as soon as possible while still satisfying one request (algorithm on p. 118). The induction proof on p. 120 shows that with this rule, for each r >= 1, the rth accepted request in the algorithm's schedule finishes no later than the rth request in the optimal schedule, so the greedy algorithm "stays ahead." The proof on p. 121 then shows that the algorithm returns an optimal set. Sorting the n requests in order of finishing time and labeling them takes O(n log n) time, and the subsequent pass that selects requests in order of increasing f(i) takes O(n) time; the sort is a one-time preprocessing step, while the selection is a single linear scan over the sorted list, so the sort dominates the overall running time.
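To make the earliest-finish-time rule concrete, here is a minimal Python sketch; the function name schedule_intervals and the (start, finish) pair format are my own choices, not the book's:

    def schedule_intervals(requests):
        # Sort once by finish time: this O(n log n) step dominates the cost.
        ordered = sorted(requests, key=lambda r: r[1])
        accepted = []
        last_finish = float("-inf")
        # Single linear pass over the sorted requests: O(n).
        for start, finish in ordered:
            if start >= last_finish:  # compatible with all accepted so far
                accepted.append((start, finish))
                last_finish = finish
        return accepted

    # Example: (1, 4) and (5, 7) are compatible; the other two overlap them.
    print(schedule_intervals([(0, 6), (1, 4), (5, 7), (3, 8)]))
    # -> [(1, 4), (5, 7)]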
4.2 Scheduling to Minimize Lateness: An exchange argument This section also discusses scheduling, but this time the requests don't have fixed start and finish times. Instead, each job i has a length t(i) and a deadline d(i), and the algorithm itself must choose the start and finish time for each job. A job that finishes after its deadline is "late," with lateness defined as l(i) = max(0, f(i) - d(i)). Our goal is to minimize the maximum lateness over all jobs. The greedy rule that achieves this is Earliest Deadline First: sort the jobs in increasing order of their deadlines and schedule them back to back in that order, so jobs with earlier deadlines are finished first (algorithm p. 127). The proof on p. 129 is an exchange argument: it shows that an optimal schedule can be transformed, without increasing its maximum lateness, into one with no inversions and no idle time, and the greedy schedule has exactly this form.
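A small sketch of this Earliest Deadline First rule, assuming each job is given as a (length, deadline) pair (the names here are mine, not the book's):

    def minimize_max_lateness(jobs):
        ordered = sorted(jobs, key=lambda j: j[1])  # sort by deadline d(i)
        t = 0  # current time; jobs run back to back with no idle time
        schedule, max_lateness = [], 0
        for length, deadline in ordered:
            start, finish = t, t + length
            schedule.append((start, finish))
            # lateness l(i) = max(0, f(i) - d(i)); max_lateness starts at 0
            max_lateness = max(max_lateness, finish - deadline)
            t = finish
        return schedule, max_lateness

    # Example: the job with deadline 4 runs first, and nothing ends up late.
    print(minimize_max_lateness([(3, 9), (2, 4), (4, 8)]))
    # -> ([(0, 2), (2, 6), (6, 9)], 0)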
4.4 Shortest Paths in a Graph This section looks at how to find shortest paths between nodes in a graph. Dijkstra's algorithm is a greedy algorithm that solves the single-source shortest-paths problem: it determines the length of the shortest path from the source s to every other node (algorithm on p. 138). The proof on p. 139 shows that Dijkstra's algorithm always produces shortest paths from s; the one case where it can fail is when edges have negative lengths. A straightforward implementation runs in O(mn) time for a graph of m edges and n nodes. If we use a priority queue to implement Dijkstra's algorithm, it runs in O(m) time plus the time for n ExtractMin and m ChangeKey operations, which comes to O(m log n) overall with a heap-based priority queue (pp. 141-2).
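Here is a hedged sketch of Dijkstra's algorithm using Python's heapq module as the priority queue; the adjacency-dict graph format is my own assumption, and pushing a fresh heap entry stands in for the ChangeKey operation:

    import heapq

    def dijkstra(graph, s):
        dist = {s: 0}
        pq = [(0, s)]  # (distance estimate, node)
        while pq:
            d, u = heapq.heappop(pq)  # ExtractMin
            if d > dist.get(u, float("inf")):
                continue  # stale entry; u was already finalized
            for v, length in graph.get(u, []):
                nd = d + length
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(pq, (nd, v))  # stands in for ChangeKey
        return dist  # shortest-path length from s to each reachable node

    graph = {"s": [("u", 1), ("v", 4)], "u": [("v", 2)], "v": []}
    print(dijkstra(graph, "s"))  # {'s': 0, 'u': 1, 'v': 3}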
4.5 The Minimum Spanning Tree Problem
Given a connected graph with positive edge costs, the goal is to find a minimum-cost subset of edges that keeps the graph connected; such a subset is always a spanning tree. Kruskal's algorithm starts without any edges and builds a spanning tree by successively inserting edges from E in order of increasing cost, skipping any edge that would create a cycle. Prim's algorithm starts with a root node and greedily grows a tree from the root outward, at each step adding the cheapest edge that connects the tree to a node outside it. The reverse-delete algorithm is defined in the book as a backward version of Kruskal's algorithm: it starts with the full graph and deletes edges in order of decreasing cost, as long as removing an edge does not disconnect the graph. The cut property, proved on p. 145, says that for any set S of nodes that is neither empty nor all of V, the minimum-cost edge with exactly one end in S belongs to every minimum spanning tree (assuming edge costs are distinct). Pages 146-7 use the cut property to prove that Kruskal's and Prim's algorithms produce minimum spanning trees. The cycle property, proved on p. 147, says that the most expensive edge on any cycle belongs to no minimum spanning tree. Prim's algorithm has a running time of O(m), plus the time it takes to run ExtractMin and ChangeKey operations, just like Dijkstra's.
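A rough Python sketch of Prim's algorithm using the same heap idea as the Dijkstra sketch above; the adjacency-dict format and the names are my assumptions, and each undirected edge is listed in both adjacency lists:

    import heapq

    def prim(graph, root):
        in_tree = {root}
        tree_edges = []
        pq = [(cost, root, v) for v, cost in graph[root]]
        heapq.heapify(pq)
        while pq and len(in_tree) < len(graph):
            cost, u, v = heapq.heappop(pq)  # cheapest edge leaving the tree
            if v in in_tree:
                continue  # both endpoints already attached; skip
            in_tree.add(v)
            tree_edges.append((u, v, cost))
            for w, c in graph[v]:
                if w not in in_tree:
                    heapq.heappush(pq, (c, v, w))
        return tree_edges

    graph = {"a": [("b", 1), ("c", 4)],
             "b": [("a", 1), ("c", 2)],
             "c": [("a", 4), ("b", 2)]}
    print(prim(graph, "a"))  # [('a', 'b', 1), ('b', 'c', 2)]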
4.6 Implementing Kruskal's Algorithm: The Union-Find data structure The union-find data structure stores a representation of the connected components in a way that makes it easy to search and update the collection: MakeUnionFind(S) sets up the structure with each element of S in its own set in O(n) time, Find(u) returns the name of the set containing u in O(log n) time, and Union(A, B) merges the sets A and B in O(1) time (these bounds are for the pointer-based implementation). We can use union-find to implement Kruskal's algorithm: sorting the edges takes O(m log m) time, and since m <= n^2 this is also O(m log n), which is the running time of the whole algorithm.
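To tie the pieces together, here is a minimal sketch of a union-find structure (union by size, in the spirit of the book's pointer-based version) and Kruskal's algorithm on top of it; the class and function names are my own:

    class UnionFind:
        def __init__(self, elements):  # MakeUnionFind(S): O(n)
            self.parent = {x: x for x in elements}
            self.size = {x: 1 for x in elements}

        def find(self, u):
            # follow parent pointers to the set's name: O(log n) w/ union by size
            while self.parent[u] != u:
                u = self.parent[u]
            return u

        def union(self, a, b):
            # merge the sets containing a and b; the book's Union takes the
            # set names directly and runs in O(1), but locating them here
            # costs O(log n) per find
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return
            if self.size[ra] < self.size[rb]:
                ra, rb = rb, ra  # point the smaller tree at the larger one
            self.parent[rb] = ra
            self.size[ra] += self.size[rb]

    def kruskal(nodes, edges):
        uf = UnionFind(nodes)
        tree = []
        for u, v, cost in sorted(edges, key=lambda e: e[2]):  # O(m log m) sort
            if uf.find(u) != uf.find(v):  # adding (u, v) creates no cycle
                uf.union(u, v)
                tree.append((u, v, cost))
        return tree

    edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 4)]
    print(kruskal(["a", "b", "c"], edges))  # [('a', 'b', 1), ('b', 'c', 2)]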
Section 4.6 was the most difficult to follow in its discussion of the data structures used for union-find. Overall, I would give this chapter a 7 – it had a lot of information and presented several new concepts that will probably take me more time and practice to understand.