This is an old revision of the document!


An algorithm is greedy if it builds up a solution in small steps choosing a decision at each step myopically to optimize some underlying criterion.

4.1 - Interval Scheduling: The Greedy Algorithm Stays Ahead

Designing a Greedy Algorithm (using the Interval Scheduling Problem) Use a simple rule to select a first request i1; once that is accepted, we reject all requests that are not compatible with i1. We then select the next request i2 to be accepted and again reject all requests that are not compatible with i2. Do this until we run out of requests. We tried a bunch of ideas out and decided the best one was: accept first the request that finishes first. Then we can maximize our time. Initially let R be teh set of all requests and let A be empty While R is not yet empty

 Choose a request i ∈ R that has the smallest finishing time
 Add request i to A
 Delete all requests from R that are not compatible with request i
Endwhile

Return the set A as the set of accepted requests

Analyzing the Algorithm See p199-120 for reasoning for greedy rules. Mainly look at the pictures.

We can make our algorithm run in time O*n logn) as follows. We begin by sorting the n requests in order of finishing time and labeling them in this order; that is, we will assume that f(i) ⇐ f(j) when i < j. In an additional O(n) time, we construct an array S[1…n] with the property that S[i] contains the value s(i).

We now select requests by processing the intervals in order of increasing f(i). We always select the first and then iterate through the intervals in order until reaching the first j for which s(j = f(1); we then select this one as well. Thus, this part of the algorithm takes time O(n).

Related Problem: Scheduling All Intervals Problem: If we have many identical resources and wish to schedule all the requests using as few resources as possible

 4.4 In any instances of Interval Partitioning, the number of resources needed is at least the depth of the set of intervals.

Can we design an efficient algorithm that schedules all intervals using the min number of resources possible? Is there always a schedule using a number of resources that is equal to the depth?

Designing the Algorithm: Let d be the depth of teh set of intervals. Sort the intervals by their start times, breaking ties arbitrarily Let I1, I2,…,In denote the intervals in this order For j = 1, 2, 3, …, n

  For each interval Ii that precedes Ij in sorted order and overlaps it
     Exclude the label of Ii from consideration for Ij
  Endfor
  If there is any label from {1,2,...,d} that has not been excluded then
     Assign a nonexcluded label to Ij
  Else
     Leave Ij unlabeled
  Endif
Endfor

If we use the greedy algorithm above, every interval will be assigned a label, and not two overlapping intervals will receive the same label. Since our algorithm is using d labels, we can conclude it is always using the minimum possible number of labels.

I give this section a 8.5 on interesting and readability. It explained in greater detail the discussions and lectures in class about it.

4.2 - Scheduling to Minimize Lateness: an exchange argument

The Problem Consider again a situation in which we have a single resource and a set of n requests to use the resource for an interval of time and assume that the resources is available starting at time s. BUT, in contrast to before, each request is now more flexible. Instead of a start time and a finish time, the request i has a deadline di and requires a contiguous time interval of length ti, but is willing to be scheduled at any time before the deadline. Each accepted request must be assigned an interval of time of length ti and different requests must be assigned non-overlapping intervals. Let's plan to satisfy each request, but we are allowed to let certain requests return late. So, at our overall start time s, we will assign each request i an interval of time of length ti; this interval is denoted by [s(i), f(i)] with f(i) = s(i) + ti. Then, (UNLIKE the previous problem), the algorithm must actually determine a start time for each interval. We say that a request is late if it misses the deadline; the lateness is defined by li = f(i) - di. We will say that li = 0 if request i is not late. The goal in our problem will be to schedule all requests, using nonoverlapping intervals, so as to minimize the maximum lateness, L = max li. (We will refer to our requests as jobs.)

Designing the algorithm We simply sort the jobs in increasing order of their deadlines di and schedule them in this order. There is an intuitive basis to this rule: we should make sure that jobs with earlier deadlines get completed earlier.

Order the jobs in order of their deadlines
Assume for simplicity of notation that d1 ≤ ... ≤ dn
Initially, f = s
Consider the jobs i = 1,.., n in this order
    Assign job i to the time interval from s(i) = f to f(i) = f + ti
    Let f = f + ti
End 
Return the set of scheduled intervals [s(i), f(i)] for i = 1,..., n.

Analyzing the algorithm Observe that the schedule it produces has no “gaps” - times when the machine is not working yet there are jobs left. Time that passes during a gap will be called idle time: there is work to be done, yet for some reason, the machine is sitting idle. (the schedule produced by the algorithm has no idle time.)

How can we prove our schedule has the smallest maximum lateness possible? (128-129 for details)

We say schedule Z has an inversion if a job i with deadline di is scheduled before another job j with earlier deadline dj < di. By definition, our schedule can have no inversions. Also note that all schedules with no inversions and no idle time have the same maximum lateness.

There is an optimal schedule that has no inversions. (Since there is an optimal schedule with no idle time and you can always undo the inversion to make a better schedule.)

The schedule A produced by the greedy algorithm has optimal maximum lateness L.

4.4 - Shortest Paths in a Graph

The Problem Given a start node s, what is the shortest path from s to each other node?

The concrete setup of the shortest paths problem is as follows. We are given a directed graph G = (V,E), with a start node s. We assume that s has a path to every other node in G. Each edge e has a length l_e ≥ 0, indicating the time/distance/cost it takes to reverse e. For a path P, the length of P, denoted l(P), is the sum of the lengths of all edges in P. Our goal is to determine the shortest path from s to every other node in the graph.

Designing the Algorithm We begin with an algorithm that just determines the length of the shortest path from s to each other node in the graph; it is then easy to produce the paths. The algorithm maintains a set S of vertices u for which we have determined a shortest-path distance d(u) from s; this is the “explored” part of the graph. Initially S = {s} and d(s) = 0. Now, for each node vV - S, we determine the shortest path that can be constructed by traveling along a path through the explored part S to some uS, followed by the single edge (u,v). We choose the node v ∈ V - S for which d is the smallest, add v to S and define d(v).

Dijkstra's Algorithm (G, l)
Let S be the set of explored nodes
   For each u ∈ S, we store a distance d(u)
Initially S = {s} and d(s) = 0
While S ≠ V
   Select a node v ∉ S with at least one edge from S for which d'(v)  = min(d(u)) + l_e
   Add v to S and define d(v) = d'(v)
Endwhile

Analyzing the Algorithm Is it always true that when Dijkstra's Algorithm adds a node v, we get the true shortest-path distance to v? YES :-D

DA (Dijkstra's Algorithm) is greedy because we always form the shortest new s-v path we can make from a path in S followed by a single edge.

  • Consider the set S at any point in the algorithm's execution. For each u ∈ S, the path P_u is a shortest s-u path.

DA does not always find shortest paths if some of the edges have negative lengths.

Implementation and Running Time There are n - 1 iterations of the While loop for a graph with n nodes, as each iteration adds a new node v to S. Selecting the correct node v efficiently is a more subtle issue. We explicitly maintain the values of the minima d'(v) for each node v ∈ V - S and keep the nodes V - S in a pq with d'(v) as their keys. Using the pq, we'll use the operations ChangeKey and ExtractMin.

How do we implement Da using a pq? We put the nodes V in a priority queue with d'(v) as the key for v ∈ V. To select the node v that should be added to the set S, we need the ExtractMin operation. During an iteration in which each node v is added to S and w∉ S is a node that remains in the pq, we note: If (v,w) is not an edge, then we don't do anything. If e' = (v,w) ∈ E, then there is a new value for the key. If d'(w) > d(v) + l_e', then we need to use the ChangeKey operation to decrease the key of node w appropriately. This ChangeKey operation can occur at most once per edge, when the tail of the edge e' is added to S.

Using a pq, Dijkstra's algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations.

Using the heap-based pq, each pq operations can be made to run in O(log n) time. Thus the overall time for the implementation is O(m log n).

4.5 - The Minimum Spanning Tree Problem

The Problem Suppose we have a set of locations to connect but we want to build it as cheaply as possible. For certain pairs (vi, vj), we may build a direct link between vi and vj for a certain cost c(vi, vj) > 0. thus we can represent the set of possible links that may be built using a graph G = (V,E), with a positive cost c_e associated with each edge e = (vi, vj). The problem is to find a subset of the edges T ⊆ E so that the graph (V,T) is connected and the total cost sum(ce) is as small as possible.

(4.16) Let T be a minimum-cost solution to the network design problem defined above. Then (V,T) is a tree.

We will call a subset T ⊆ E a spanning tree of G if (V, T) is a tree.

Designing Algorithms Three greedy algorithms for this problem:

  • Kruskal's - Starts with no edges and builds a spanning tree by successively inserting edges from E in order of increasing cost. As we move through the edges in this order, we insert each edge e as long as it does not create a cycle when added ot the edges we've already inserted. If, on the other hand, inserting e would result in a cycle, we don't and move on.
  • Prim's - Like Dijkstra's. Start with a root node s and try to grow a tree from s outward. At each step, add the node that can be attached as cheaply as possibly. We maintain a set S ⊆ V on which a spanning tree has been constructed so far. Initially, S = {s}. In each iteration, we grow S by one node, adding the node that minimizes the attachment cost and including the edge e = (u,v) that achieves this minimum in the tree.
  • Reverse-Delete - Start with the full graph (V, E) and begin deleting edges in order of decreasing cost. Starting from the most expensive, we delete each edge e as long as it doesn't disconnect the graph.

Analyzing the algorithms The three algorithms work by repeatedly inserting or deleting edges from a partial solution. So, when is it safe to include or eliminate an edge? (Note: assume all edge costs are distinct from one another.

Cut Property: Assume all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to 
all of V and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V - S. then every minimum spanning
tree contains the edge e.

Cycle Property: Assume that all edge costs are distinct. Let C be any cycle in G and let edge e = (v,w) be the most 
expensive edge belonging to C. Then e does not belong to any minimum spanning tree of G.

To eliminate the assumption that all distances are distinct, we just “perturb” them all by realllly small amounts.

All three algorithms produce a minimum spanning tree of G.

Implementing Prim's The implementation of Prim's and Dijkstra's are almost identical. We need to be ale to decide which node v to add next to the growing set S by maintaining the attachment costs for each node v ∈ V - S. We keep the nodes in a pq with the attachment costs a(v) as the keys; we select a node with an ExtractMin operation and update the attachment costs using ChangeKey operations. There are n - iterations in which we perform ExtractMin and we perforem ChangeKey at most once for each edge. Thus,

Using a pq, Prim's can be implemented on a graph with n nodes and m edges to run in O(m) time,
plus the time for n ExtractMin and mChangeKey operations. Overall running time is O(m log n).

I found this section to be a 9.5. I really felt like I was confident in understanding the concepts.

4.6 - Implementing Kruskal's: Union-Find

The Problem The Union-Fidn data structure allows us to maintan disjoint sets. Given a node u, the operation Find(u) will return the name of the set containing u. This op can be used to test if two nodes u and v are in the same set by checking Find(u) == Find(v). The data structure will also implement an operation Union(X, Y) to merge two sets X and Y.

Find returns the name of the component containing u. If the components aren't the same, we can add the edge.

UF will support three operations:

  • MakeUnionFind(S) will return a UF on set S where all elements are in separate sets.
  • Find(u) will return the name of the set containing u.
  • Union(X,Y) will merge the sets X and Y into a single set.

The simple structure for UF is to maintain an array. For an array, Find takes O(one) (my one key won't work :( ), MakeUnionFind(S) takes O(n) and k Union operations takes at most O(k log k) time.

The better structure for UF uses pointers. Then, Union takes O(one), MakeUnionFind(S) takes O(n) time and Find takes O(log n) time.

courses/cs211/winter2014/journals/deirdre/chapter4.1393384297.txt.gz · Last modified: 2014/02/26 03:11 by tobind
CC Attribution-Noncommercial-Share Alike 4.0 International
Driven by DokuWiki Recent changes RSS feed Valid CSS Valid XHTML 1.0