Chapter 4: Greedy Algorithms
Section 1: Interval Scheduling: The Greedy Algorithm Stays Ahead
The Interval Scheduling Problem consists of a set of requests {1, 2, …, n}, where the ith request corresponds to an interval of time starting at s(i) and finishing at f(i). When using a greedy algorithm for interval scheduling, we want a simple rule for selecting the first request. Once the first request is accepted, all other requests that are not compatible with it are rejected. Then another compatible request is accepted, and again all remaining requests that conflict with it are rejected. This process repeats until no requests remain. The main challenge is deciding what the rule should be for picking the next request. There are four natural possibilities for ordering the requests: by earliest start time, by shortest interval length, by smallest number of conflicting requests, and by earliest finish time. Choosing by earliest start time does not yield an optimal solution, because if the earliest request covers a very long interval, we may reject many requests with shorter intervals. Choosing by shortest interval does not yield an optimal solution either, because the algorithm may accept one short request that conflicts with two longer requests that could both have been accepted. Choosing by fewest conflicts also fails: the request with the fewest conflicts can still block a combination of requests that would form a larger schedule. The correct rule is to choose the request with the earliest finish time, which we prove using the "greedy stays ahead" idea. Because the greedy algorithm always picks the compatible request with the earliest finish time, after each selection its finish time is at least as early as the corresponding finish time in any optimal solution, so the start time available for the next request is always at least as good.
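The earliest-finish-time rule described above can be sketched in a few lines. This is an illustrative implementation, assuming requests are given as (start, finish) pairs; the sample data is made up.

```python
# Sketch of the earliest-finish-time greedy for interval scheduling.
# Request names and sample data are illustrative, not from the text.

def schedule_intervals(requests):
    """Return a maximum set of mutually compatible (start, finish) requests.

    Sorting by finish time and repeatedly taking the first compatible
    request is the greedy rule the text argues is optimal.
    """
    accepted = []
    last_finish = float("-inf")
    for start, finish in sorted(requests, key=lambda r: r[1]):
        if start >= last_finish:  # compatible with everything accepted so far
            accepted.append((start, finish))
            last_finish = finish
    return accepted

requests = [(0, 6), (1, 4), (3, 5), (3, 8), (4, 7), (5, 9), (6, 10), (8, 11)]
print(schedule_intervals(requests))   # [(1, 4), (4, 7), (8, 11)]
```

The sort dominates the running time, giving the O(n log n) bound discussed below; the scan itself is linear.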
It follows that the optimal solution cannot contain an extra request: if it did, the greedy algorithm, whose last finish time is at least as early as the optimal solution's, would still have room to accept another compatible request and would have done so. The algorithm runs in O(n log n) time, since it takes O(n log n) time to sort the requests by finish time; selecting the requests then takes linear time overall. A related problem is trying to schedule all of the requests using as few resources as possible. In any instance of the Interval Partitioning Problem, the number of resources needed is at least the depth of the set of intervals, where the depth is the maximum number of intervals that pass over any single point on the time-line. The algorithm for this problem is simple: sort the requests by start time, then consider them one by one. For each request, look at the resources already allocated; if the request can be added to some resource without conflicting with a request already assigned there, add it to that resource, otherwise open a new resource for it. This schedules every request using a number of resources equal to the depth of the set of intervals, which is the optimal number of resources needed. This section was very easy to read and understand, I'd give it an 8/10.
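The partitioning algorithm can be sketched as follows. This is an illustrative version that uses a min-heap keyed on each resource's last finish time to find a compatible resource quickly, rather than scanning resources one by one as the text describes; the sample intervals are made up.

```python
# Sketch of interval partitioning: assign each (start, finish) request to
# a resource, using a number of resources equal to the depth.
# The heap-based lookup and the sample data are illustrative choices.
import heapq

def partition_intervals(requests):
    """Map each (start, finish) request to a resource index."""
    resources = []          # heap of (last_finish_time, resource_index)
    assignment = {}
    for start, finish in sorted(requests):   # process in start-time order
        if resources and resources[0][0] <= start:
            _, idx = heapq.heappop(resources)   # reuse a freed resource
        else:
            idx = len(resources)                # open a new resource
        assignment[(start, finish)] = idx
        heapq.heappush(resources, (finish, idx))
    return assignment

requests = [(0, 3), (1, 4), (2, 5), (4, 7), (5, 8)]
print(partition_intervals(requests))
```

Here three intervals overlap just before time 3, so the depth is 3, and the sketch indeed uses exactly three resources.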
Section 2: Scheduling to Minimize Lateness
We now consider a different scheduling problem. As before there are requests, each requiring a given length of time to complete, but now each request has a deadline and can be scheduled over any interval of that length. Each accepted request is assigned an interval of time equal to its length, and different requests must not have overlapping intervals. A request is considered late if its interval's finish time is after its deadline; its lateness is its finish time minus its deadline. The goal is to minimize the maximum lateness over all requests. There are several possible approaches to this problem. The first is to schedule jobs in order of increasing length. This does not produce an optimal ordering: a short job might have a very distant deadline, while a longer job has an earlier deadline and needs to be scheduled first. Another approach is to order requests by slack time, calculated as the deadline minus the job's length. This also fails to produce an optimal solution: the maximum lateness suffers when a long job with a late deadline but small slack is scheduled before a short job with an earlier deadline but larger slack. The approach that always produces an optimal solution is to sort the jobs in increasing order of deadline, a rule termed 'Earliest Deadline First'. It ensures that jobs with earlier deadlines get completed earlier, which minimizes the maximum lateness. The schedule produced by this approach has no idle time, since any idle period could instead be used to finish jobs sooner.
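The Earliest Deadline First rule can be sketched briefly. This is an illustrative version, assuming each job is a (length, deadline) pair and jobs run back-to-back from time 0; the sample jobs are made up.

```python
# Sketch of Earliest Deadline First. Job tuples (length, deadline)
# and the sample data are illustrative.

def schedule_edf(jobs):
    """Schedule all (length, deadline) jobs back-to-back in order of
    increasing deadline; return (list of (start, finish), max lateness)."""
    t = 0                  # current finish time; the schedule has no idle time
    max_lateness = 0
    schedule = []
    for length, deadline in sorted(jobs, key=lambda j: j[1]):
        start, t = t, t + length
        schedule.append((start, t))
        max_lateness = max(max_lateness, t - deadline)  # lateness = finish - deadline
    return schedule, max_lateness

jobs = [(1, 2), (2, 4), (3, 6)]   # each job is (length, deadline)
print(schedule_edf(jobs))          # ([(0, 1), (1, 3), (3, 6)], 0)
```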
Another key fact is that all schedules with no inversions and no idle time have the same maximum lateness. An inversion occurs when a job a with a later deadline is scheduled before a job b with an earlier deadline; the greedy algorithm would have scheduled these two jobs in the opposite order. If a schedule has an inversion, swapping the two adjacent inverted jobs removes that inversion and yields a maximum lateness no larger than before. Thus we can keep removing inversions, with the maximum lateness only decreasing or staying the same, until none remain. It follows that there is an optimal schedule with no inversions and no idle time, which is exactly what the greedy algorithm produces. This section was easy to understand after having talked about it extensively in class. I'd give it a 7/10.
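The exchange argument can be checked numerically on a small example. This is an illustrative sketch with made-up job data, showing that swapping an adjacent inversion does not increase the maximum lateness.

```python
# Checking the exchange argument: swapping an adjacent inversion (a job
# with a later deadline scheduled first) never increases max lateness.
# Job data is made up for illustration.

def max_lateness(jobs):
    """Run (length, deadline) jobs back-to-back in the given order and
    return the maximum lateness."""
    t, worst = 0, 0
    for length, deadline in jobs:
        t += length
        worst = max(worst, t - deadline)
    return worst

inverted = [(3, 10), (2, 4)]   # deadline 10 scheduled before deadline 4
swapped = [(2, 4), (3, 10)]    # the same jobs with the inversion removed
print(max_lateness(inverted), max_lateness(swapped))  # 1 0
```

Swapping the pair lowers the maximum lateness here from 1 to 0, consistent with the claim that each swap can only help or leave the lateness unchanged.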
Section 4: Shortest Paths in a Graph
Another application of greedy algorithms is finding the shortest paths in a graph from a given node, which results in the creation of a shortest-path tree. The setup of the shortest-paths problem consists of a directed graph G, with a designated start node s that has a path to every other node in G. Each edge of the graph is given a length (weight) denoting the time it takes to traverse that edge, and the length of a path is the sum of the lengths of the edges it traverses. The goal is to determine the shortest path from s to every other node in the graph. An algorithm that solves this problem is Dijkstra's algorithm. It works by growing a set of explored nodes: at each step it adds the unexplored node that can be reached most cheaply by an edge from the explored set, recording the shortest distance from s to that node. There is a limitation to this algorithm: it does not always find shortest paths if some of the edges can have negative lengths. A more complex algorithm - due to Bellman and Ford - is required for cases with negative edge lengths. Dijkstra's algorithm can also be thought of as a 'continuous' version of Breadth-First Search for traversing a graph, where the tree created records the shortest path from the start node to every node. A direct implementation runs in O(mn) time: there are n-1 iterations, each adding a new node to the explored set, and each iteration may have to examine all m edges between the set and the remaining nodes to find the minimum-distance node. However, if the algorithm is implemented with a priority queue, it can be made considerably faster. 
Each priority queue operation runs in O(log n), for a total overall time of O(m log n) for the implementation. This section was rather difficult to follow, I would give it a 4/10.
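The priority-queue version of Dijkstra's algorithm can be sketched as follows. This is an illustrative implementation, assuming the graph is given as an adjacency list of (neighbor, weight) pairs with non-negative weights; the sample graph is made up.

```python
# Sketch of Dijkstra's algorithm with a priority queue, matching the
# O(m log n) bound described in the text. The graph is an illustrative
# adjacency-list example with non-negative edge weights.
import heapq

def dijkstra(graph, source):
    """Return shortest-path distances from source to every reachable node.

    graph maps each node to a list of (neighbor, weight) pairs. Each heap
    operation costs O(log n), and each edge is relaxed at most once.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry; skip it
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w           # found a shorter path to v
                heapq.heappush(heap, (d + w, v))
    return dist

graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("c", 6)],
    "b": [("c", 3)],
    "c": [],
}
print(dijkstra(graph, "s"))   # {'s': 0, 'a': 1, 'b': 3, 'c': 6}
```

Note that instead of decreasing a node's key in the queue, this sketch pushes a fresh entry and discards stale ones on pop, which keeps the same asymptotic bound with Python's heapq.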
