

Chapter 4: Greedy Algorithms

An algorithm is greedy if it builds up a solution in small steps, choosing a decision at each step myopically to optimize some underlying criterion. One can often design many different greedy algorithms for the same problem, each one locally, incrementally optimizing some different measure on its way to a solution.

4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead

The interval scheduling problem, as we discussed in Chapter 1, is an ideal example to explain the “Greedy Stays Ahead” approach. We have a set of requests {1, 2, …, n}; the i-th request corresponds to an interval of time starting at s(i) and finishing at f(i). We’ll say that a subset of the requests is compatible if no two of them overlap in time, and our goal is to accept as large a compatible subset as possible. Compatible sets of maximum size will be called optimal.

The basic idea in a greedy algorithm for interval scheduling is to use a simple rule to select a first request i_1. Once a request i_1 is accepted, we reject all requests that are not compatible with i_1. We then select the next request i_2 to be accepted, and again reject all requests that are not compatible with i_2. We continue in this fashion until we run out of requests.

A greedy rule that does lead to the optimal solution is based on the following idea: we should accept first the request that finishes first, that is, the request i for which f(i) is as small as possible. This is also quite a natural idea: we ensure that our resource becomes free as soon as possible while still satisfying one request. In this way we can maximize the time left to satisfy other requests.

We can formalize the idea using the following algorithm (we will use R to denote the set of requests that we have neither accepted nor rejected yet, and use A to denote the set of accepted requests):

Initially let R be the set of all requests, and let A be empty
While R is not yet empty
   Choose a request i ∈ R that has the smallest finishing time
   Add request i to A
   Delete all requests from R that are not compatible with request i
Endwhile
Return the set A as the set of accepted requests
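
As a concrete illustration, here is a minimal Python sketch of this rule (my own, not from the textbook), assuming requests are given as (start, finish) pairs. Sorting by finish time once replaces the repeated deletions from R, but it makes the same greedy choices:

def interval_scheduling(requests):
    # Earliest-finish-time greedy: accept a request whenever it is
    # compatible with everything accepted so far.
    accepted = []
    last_finish = float("-inf")
    for s, f in sorted(requests, key=lambda r: r[1]):  # smallest f(i) first
        if s >= last_finish:          # compatible with the accepted set A
            accepted.append((s, f))
            last_finish = f
    return accepted

For example, interval_scheduling([(0, 6), (1, 4), (3, 5), (5, 7), (8, 11)]) returns [(1, 4), (5, 7), (8, 11)].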

Detailed proofs of the algorithm's correctness and, more importantly, its optimality can be found on pages 118-122 of the textbook. (It does not make sense to repeat all of that text here.)

4.1.1 Comments

I initially had trouble getting used to the idea of greedy algorithms in general, and optimality. Class discussions definitely helped, as did going back and re-reading the textbook sections after the lectures. Now that I am more comfortable with the idea of optimality, I feel like the idea behind these algorithms is genius. The idea behind the proofs (showing that the greedy algorithm stays ahead of any optimal solution with respect to some measure at all times, and therefore produces an optimal solution) was pretty intuitive. I'd say that the section was a 7/10 in terms of how interesting the content was.

4.2 Scheduling to Minimize Lateness: An Exchange Argument

Consider a situation in which we have a single resource and a set of n requests to use the resource, each for an interval of time. Assume that the resource is available starting at time s. However, each request is now more flexible than in the previous problem. Instead of a start time and a finish time, request i has a deadline d_i and requires a contiguous time interval of length t_i, but it is willing to be scheduled at any time before the deadline. Each request must be assigned an interval of time of length t_i, and different requests must be assigned non-overlapping intervals.

Beginning at our overall start time s, we will assign each request i an interval of time of length t_i; let us denote this interval by [s(i), f(i)], with f(i) = s(i) + t_i. Also, we will say that a request i is late if it misses the deadline, i.e., f(i) > d_i. The lateness of such a request i is defined to be l_i = f(i) - d_i (l_i = 0 if request i is not late).

There is a basic greedy algorithm that always produces an optimal solution. We sort the jobs in increasing order of their deadlines d_i, and schedule them in this order. There is an intuitive basis to this rule: we should make sure that jobs with earlier deadlines get completed earlier.

For this algorithm, we can assume that the jobs are labeled in the order of their deadlines, that is, we have d_1 ≤ ... ≤ d_n. We will schedule all jobs in this order. Again, let s be the start time for all jobs. Job 1 will start at time s = s(1) and end at time f(1) = s(1) + t_1; Job 2 will start at time s(2) = f(1) and end at time f(2) = s(2) + t_2; and so on. We will use f to denote the finishing time of the last scheduled job.

Order the jobs in order of their deadlines 
Assume for simplicity of notation that d_1 ≤ ... ≤ d_n
Initially f = s
Consider the jobs i = 1, ..., n in this order
   Assign job i to the time interval from s(i)=f to f(i)=f + t_i 
   Let f = f + t_i
End
Return the set of scheduled intervals [s(i), f(i)] for i = 1, ..., n
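
The following Python sketch (my own, with hypothetical names) implements this earliest-deadline-first rule, assuming each job is a (t, d) pair of length and deadline and that the resource is free at time s:

def schedule_min_lateness(jobs, s=0):
    # Sort by deadline d_i and pack the jobs one after another.
    f = s
    schedule = []
    max_lateness = 0
    for t, d in sorted(jobs, key=lambda job: job[1]):
        start, finish = f, f + t
        schedule.append((start, finish))
        max_lateness = max(max_lateness, finish - d)  # l_i = max(0, f(i) - d_i)
        f = finish
    return schedule, max_lateness

Note that the rule never needs the lengths t_i to decide the order; the deadlines alone determine it.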

Detailed proofs of the algorithm's correctness and, more importantly, its optimality can be found on pages 128-131 of the textbook.

4.2.1 Comments

The exchange argument was less challenging to understand than the “Greedy Stays Ahead” approach. The concept was very easy to understand: swapping two adjacent jobs makes sense as the way to transform an optimal schedule into the greedy one. 8/10 in terms of how interesting I found the content to be.

4.4 Shortest Path in a Graph

A basic algorithmic problem is to determine the shortest path between nodes in a graph. We may ask this as a point-to-point question: Given nodes u and v, what is the shortest u-v path? Or we may ask the single-source question: Given a start node s, what is the shortest path from s to each other node?

We are given a directed graph G = (V, E), with a designated start node s. We assume that s has a path to every other node in G. Each edge e has a length l_e ≥ 0, indicating the time (or distance, or cost) it takes to traverse e. For a path P, the length of P–denoted l(P)–is the sum of the lengths of all edges in P. Our goal is to determine the shortest path from s to every other node in the graph.

We can solve the problem using the following algorithm:

Dijkstra’s Algorithm (G, l)
Let S be the set of explored nodes
   For each u ∈ S, we store a distance d(u)
Initially S = {s} and d(s) = 0
While S ≠ V
   Select a node v ∉ S with at least one edge from S for which
     d’(v) = min_{e=(u,v): u ∈ S} (d(u) + l_e) is as small as possible
   Add v to S and define d(v) = d’(v)
Endwhile

We can use a set to represent the explored nodes, and a priority queue keyed on d’(v) to maintain the next node to be explored. With this implementation, the algorithm runs in O(m log n) time.
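
Here is a compact Python sketch of this implementation (my own, using the standard heapq module). Instead of a ChangeKey operation it pushes duplicate heap entries and skips stale ones on extraction, which preserves the O(m log n) bound:

import heapq

def dijkstra(graph, s):
    # graph maps each node to a list of (neighbor, length) pairs, lengths >= 0.
    d = {}                                # d(u) for explored nodes u in S
    heap = [(0, s)]                       # entries are (tentative distance, node)
    while heap:
        dist, u = heapq.heappop(heap)
        if u in d:
            continue                      # stale entry: u was already explored
        d[u] = dist                       # add u to the explored set S
        for v, length in graph[u]:
            if v not in d:
                heapq.heappush(heap, (dist + length, v))
    return d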

Detailed proofs of the algorithm's correctness and optimality can be found on pages 139-142 of the textbook.

4.4.1 Comments

I had a somewhat superficial prior familiarity with Dijkstra's algorithm. As such, following it was not that difficult for me personally. I will say, however, that it was significantly easier to understand it in class compared to understanding it during my readings. I feel like that was because we spent a good amount of time in class tracing through the algorithm.

One thing that still bothers me, however, is that the runtime of the algorithm is given as O(m log n) based on the “fact” that there are always more edges than nodes, as we discussed in class. I still feel like, if we had a tree whose nodes were connected linearly, e.g., A connected to B, B connected to C, C connected to D, and D connected to E, we would have 5 nodes and 4 edges. In this (special) case, the number of nodes is more than the number of edges.

Other than that, I will rate this section a 9/10 in terms of interestingness. :)

4.5 The Minimum Spanning Tree Problem

Suppose we have a set of locations V = {v_1, v_2, …, v_n}, and we want to build a communication network on top of them. The network should be connected, i.e., there should be a path between every pair of nodes. We wish to build this network as cheaply as possible.

For certain pairs (v_i, v_j), we may build a direct link between v_i and v_j for a certain cost c(v_i, v_j) > 0. Thus we can represent the set of possible links that may be built using a graph G = (V, E), with a positive cost c_e associated with each edge e = (v_i, v_j). The problem is to find a subset of the edges T ⊆ E so that the graph (V, T) is connected, and the total cost is as small as possible.

Lemma: Let T be a minimum-cost solution to the network design problem defined above. Then (V, T) is a tree.

Proof: By definition, (V, T) must be connected. We show that it also contains no cycles. Suppose it contained a cycle C, and let e be any edge on C. We claim that (V, T - {e}) is still connected, since any path that previously used the edge e can now go “the long way” around the remainder of the cycle C instead. It follows that (V, T - {e}) is also a valid solution to the problem, and, since all edge costs are positive, it is cheaper. This contradicts the minimality of T.

4.5.1 Designing the Algorithms

This problem is a case where many of the first greedy algorithms that come to mind turn out to be correct: each one solves the problem optimally.

4.5.1.1 Kruskal's Algorithm

One simple algorithm starts without any edges at all and builds a spanning tree by successively inserting edges from E in order of increasing cost. As we move through the edges in this order, we insert each edge e as long as it does not create a cycle when added to the edges we’ve already inserted. If, on the other hand, inserting e would result in a cycle, then we simply discard e and continue. This approach is called Kruskal’s Algorithm.

4.5.1.2 Prim's Algorithm

Another simple greedy algorithm can be designed by analogy with Dijkstra’s Algorithm for paths, although, in fact, it is even simpler to specify than Dijkstra’s Algorithm. We start with a root node s and try to greedily grow a tree from s outward. At each step, we simply add the node that can be attached as cheaply as possible to the partial tree we already have. This approach is called Prim's Algorithm.

4.5.1.3 The Reverse-Delete Algorithm

Finally, we can design a greedy algorithm by running a “backward” version of Kruskal’s Algorithm. Specifically, we start with the full graph (V, E) and begin deleting edges in order of decreasing cost. As we get to each edge e (starting from the most expensive), we delete it as long as doing so would not actually disconnect the graph we currently have. This approach is generally called the Reverse-Delete Algorithm.
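
A simple (and deliberately unoptimized) Python sketch of Reverse-Delete, assuming nodes 0, ..., n-1, a connected input graph, and edges given as (cost, u, v) triples; the connectivity test is a plain depth-first search, so this favors clarity over speed:

def reverse_delete(n, edges):
    kept = {(u, v) for _, u, v in edges}

    def connected():
        # Depth-first search from node 0 over the currently kept edges.
        adj = {x: [] for x in range(n)}
        for u, v in kept:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return len(seen) == n

    for _, u, v in sorted(edges, reverse=True):   # most expensive edge first
        kept.discard((u, v))
        if not connected():
            kept.add((u, v))      # deleting e would disconnect the graph
    return kept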

4.5.2 Analyzing the Algorithms

When is it safe to include an edge in the minimum spanning tree? When can we guarantee that an edge is not in the minimum spanning tree? The answer to these questions lies in the Cut and Cycle Properties.

4.5.2.1 The Cut Property

Assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V - S. Then every minimum spanning tree contains the edge e.

A detailed proof of the Cut Property can be found on page 145 of the textbook (and is omitted here for redundancy purposes).

4.5.2.2 The Cycle Property

Assume that all edge costs are distinct. Let C be any cycle in G, and let e = (v, w) be the most expensive edge belonging to C. Then e does not belong to any minimum spanning tree of G.

A detailed proof of the Cycle Property can be found on pages 147-148 of the textbook (and is omitted here for redundancy purposes).

4.5.3 Proving Optimality of the Algorithms

4.5.3.1 Kruskal's Algorithm

Claim: Kruskal’s Algorithm produces a minimum spanning tree of G.

Proof: Consider any edge e = (v, w) added by Kruskal’s Algorithm, and let S be the set of all nodes to which v has a path at the moment just before e is added. Clearly v ∈ S, but w ∉ S, since adding e does not create a cycle. Moreover, no edge from S to V - S has been encountered yet, since any such edge could have been added without creating a cycle, and hence would have been added by Kruskal’s Algorithm. Thus e is the cheapest edge with one end in S and the other in V - S, and so, by the Cut Property, it belongs to every minimum spanning tree.

So, if we can show that the output (V, T) of Kruskal’s Algorithm is in fact a spanning tree of G, then we will be done. Clearly (V, T) contains no cycles, since the algorithm is explicitly designed to avoid creating cycles. Further, if (V, T) were not connected, then there would exist a nonempty subset of nodes S (not equal to all of V) such that there is no edge from S to V - S. But this contradicts the behavior of the algorithm: we know that since G is connected, there is at least one edge between S and V - S, and the algorithm will add the first of these that it encounters.

4.5.3.2 Prim's Algorithm

Claim: Prim’s Algorithm produces a minimum spanning tree of G.

Proof: For Prim’s Algorithm, it is also very easy to show that it only adds edges belonging to every minimum spanning tree. Indeed, in each iteration of the algorithm, there is a set S ⊆ V on which a partial spanning tree has been constructed, and a node v and edge e are added that minimize the quantity min_{e=(u,v): u ∈ S} c_e. By definition, e is the cheapest edge with one end in S and the other end in V - S, and so by the Cut Property it is in every minimum spanning tree. It is also straightforward to show that Prim’s Algorithm produces a spanning tree of G, and hence it produces a minimum spanning tree.

4.5.3.3 The Reverse-Delete Algorithm

Claim: The Reverse Delete Algorithm produces a minimum spanning tree of G.

Proof: Consider any edge e = (v, w) removed by Reverse-Delete. At the time that e is removed, it lies on a cycle C; and since it is the first edge encountered by the algorithm in decreasing order of edge costs, it must be the most expensive edge on C. Thus by the Cycle Property, e does not belong to any minimum spanning tree.

So if we show that the output (V, T) of Reverse-Delete is a spanning tree of G, we will be done. Clearly (V, T) is connected, since the algorithm never removes an edge when this will disconnect the graph. Now, suppose by way of contradiction that (V, T) contains a cycle C. Consider the most expensive edge e on C, which would be the first one encountered by the algorithm. This edge should have been removed, since its removal would not have disconnected the graph, and this contradicts the behavior of Reverse-Delete.

4.5.4 Implementing Prim's Algorithm

By analogy with Dijkstra’s Algorithm, we need to be able to decide which node v to add next to the growing set S, by maintaining the attachment costs a(v) = min_{e=(u,v): u ∈ S} c_e for each node v ∈ V - S. As before, we keep the nodes in a priority queue with these attachment costs a(v) as the keys; we select a node with an ExtractMin operation, and update the attachment costs using ChangeKey operations. There are n - 1 iterations in which we perform ExtractMin, and we perform ChangeKey at most once for each edge.

Using a priority queue, Prim’s Algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations. If we use a heap-based priority queue, we can implement both ExtractMin and ChangeKey in O(log n) time, and so get an overall running time of O(m log n).
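
A heap-based Python sketch of this implementation (my own), using the same lazy-deletion trick as the Dijkstra sketch above in place of ChangeKey; graph maps each node to a list of (neighbor, cost) pairs for a connected graph:

import heapq

def prim(graph, s):
    in_tree = {s}                                   # the growing set S
    tree_edges = []
    heap = [(cost, s, v) for v, cost in graph[s]]   # attachment costs a(v)
    heapq.heapify(heap)
    while heap:
        cost, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue                                # stale entry: v already attached
        in_tree.add(v)
        tree_edges.append((u, v, cost))
        for w, c in graph[v]:
            if w not in in_tree:
                heapq.heappush(heap, (c, v, w))
    return tree_edges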

4.5.5 Comments

Overall, I found this section pretty interesting. I liked how we intuitively came up with the Cut and Cycle properties, and then used those properties to prove the optimality of the algorithms. I feel like this was a way of proving optimality that we hadn't visited before. I'd give it a 9/10 in terms of how interesting I found the section to be.

4.6 Implementing Kruskal’s Algorithm: The Union-Find Data Structure

The Union-Find data structure allows us to maintain disjoint sets (such as the components of a graph) in the following sense. Given a node u, the operation Find(u) will return the name of the set containing u. This operation can be used to test if two nodes u and v are in the same set, by simply checking if Find(u) = Find(v). The data structure will also implement an operation Union(A, B) to take two sets A and B and merge them into a single set.

These operations can be used to maintain connected components of an evolving graph G = (V, E) as edges are added. The sets will be the connected components of the graph. For a node u, the operation Find(u) will return the name of the component containing u. If we add an edge (u, v) to the graph, then we first test if u and v are already in the same connected component (by testing if Find(u) = Find(v)). If they are not, then Union(Find(u),Find(v)) can be used to merge the two components into one. It is important to note that the Union-Find data structure can only be used to maintain components of a graph as we add edges; it is not designed to handle the effects of edge deletion, which may result in a single component being “split” into two.

4.6.1 Implementing the Algorithm

We’ll use the Union-Find data structure to implement Kruskal’s Algorithm. First we need to sort the edges by cost. This takes O(m log m) time. Since we have at most one edge between any pair of nodes, we have m < n^2, so log m = O(log n), and hence this running time is also O(m log n).

After the sorting operation, we use the Union-Find data structure to maintain the connected components of (V, T) as edges are added. As each edge e = (v, w) is considered, we compute Find(v) and Find(w) and test whether they are equal, to see if v and w belong to different components. If the algorithm decides to include edge e in the tree T, we use Union(Find(v), Find(w)) to merge the two components.

We perform a total of at most 2m Find and n - 1 Union operations over the course of Kruskal’s Algorithm. Together with the initial sorting, this gives a total running time of O(m log n).
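
Putting the pieces together, here is a Python sketch of Kruskal’s Algorithm with a simple Union-Find (my own; it uses union by size with path compression, which is at least as fast as the pointer-based structure analyzed in the textbook). Nodes are assumed to be 0, ..., n-1 and edges are (cost, u, v) triples:

def kruskal(n, edges):
    parent = list(range(n))
    size = [1] * n

    def find(u):
        # Find(u): follow parent pointers, compressing the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    def union(a, b):
        # Union: attach the smaller component to the larger one.
        if size[a] < size[b]:
            a, b = b, a
        parent[b] = a
        size[a] += size[b]

    tree = []
    for cost, u, v in sorted(edges):          # the O(m log n) sorting step
        ru, rv = find(u), find(v)
        if ru != rv:                          # e connects two components: no cycle
            union(ru, rv)
            tree.append((u, v, cost))
    return tree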

4.6.2 Comments

This section was also pretty intuitive and easy to follow. Before knowing that Union-Find was an actual thing, I was wondering how on earth we would implement Kruskal's Algorithm. With the introduction of the new data structure, implementing the algorithm becomes much easier. This section was around 7/10 in terms of interesting-ness (if that is even a word?).

4.7 Clustering

Clustering arises whenever one has a collection of objects that one is trying to classify or organize into coherent groups. One common approach to this problem is to define a distance function on the objects, with the interpretation that objects at a larger distance from one another are less similar to each other. Given a distance function on the objects, the clustering problem seeks to divide them into groups so that, intuitively, objects within the same group are “close,” and objects in different groups are “far apart.”

Suppose we are given a set U of n objects, labeled p_1, p_2, …, p_n. For each pair, p_i and p_j, we have a numerical distance d(p_i, p_j). Suppose we are seeking to divide the objects in U into k groups, for a given parameter k. We say that a k-clustering of U is a partition of U into k non-empty sets C_1, C_2, …, C_k. We define the spacing of a k-clustering to be the minimum distance between any pair of points lying in different clusters.

There are exponentially many different k-clusterings of a set U. How, then, can we efficiently find the one that has maximum spacing?

4.7.1 Designing the Algorithm

To find a clustering of maximum spacing, we consider growing a graph on the vertex set U. The connected components will be the clusters, and we will try to bring nearby points together into the same cluster as rapidly as possible. Thus we start by drawing an edge between the closest pair of points. We then draw an edge between the next closest pair of points. We continue adding edges between pairs of points, in order of increasing distance d(p_i, p_j). In this way, we are growing a graph H on U edge by edge, with connected components corresponding to clusters. If we are about to add the edge (p_i, p_j) and find that p_i and p_j already belong to the same cluster, we will refrain from adding the edge–it’s not necessary, because it won’t change the set of components. In this way, our graph-growing process will never create a cycle; so H will actually be a union of trees. Each time we add an edge that spans two distinct components, it is as though we have merged the two corresponding clusters.

Although our graph-growing procedure was motivated by this cluster-merging idea, our procedure is precisely Kruskal’s Minimum Spanning Tree Algorithm. We are doing exactly what Kruskal’s Algorithm would do if given a graph G on U in which there was an edge of cost d(p_i, p_j) between each pair of nodes (p_i, p_j). The only difference is that we seek a k-clustering, so we stop the procedure once we obtain k connected components.

Claim: The components C_1, C_2, …, C_k formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.

The detailed proof of this claim can be found on pages 160-161 of the textbook, and is omitted here for redundancy purposes.
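
Equivalently, one can just run Kruskal’s Algorithm and stop early. The Python sketch below (my own, reusing the find logic from the Kruskal sketch above) merges clusters in order of increasing distance until k components remain, and reports the spacing as the next distance that would join two different clusters; distances are (d, i, j) triples over points 0, ..., n-1:

def k_clustering(n, distances, k):
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    components = n
    spacing = None
    for d, i, j in sorted(distances):
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                  # p_i, p_j already in the same cluster
        if components == k:
            spacing = d               # first inter-cluster distance = spacing
            break
        parent[rj] = ri               # merge the two clusters
        components -= 1

    clusters = {}
    for p in range(n):
        clusters.setdefault(find(p), []).append(p)
    return list(clusters.values()), spacing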

4.7.2 Comments

I felt like this was one of the less interesting sections that we've done. I don't mean to imply that the problem was insignificant–I just felt like the section did not captivate my attention as much. And since we could not really discuss the algorithm in detail in class on Friday (due to lack of time), I found myself spending more time than expected in understanding the algorithm and its proofs. In terms of how interesting the section was, I'd say about 6/10.

4.8 Huffman Codes and Data Compression

4.8.1 Prefix and Optimal Prefix Codes

We say that a prefix code for a set S of letters is a function γ that maps each letter x ∈ S to some sequence of zeros and ones, in such a way that for distinct x, y ∈ S, the sequence γ(x) is not a prefix of the sequence γ(y). If we have a text consisting of a sequence of letters x_1 x_2 x_3 … x_n, we can convert this to a sequence of bits by simply encoding each letter as a bit sequence using γ and then concatenating all these bit sequences together: γ(x_1) γ(x_2) … γ(x_n).

However, given an alphabet and a set of frequencies for the letters, we would ideally like to produce a prefix code that is as efficient as possible–namely, a prefix code that minimizes the average number of bits per letter (ABL). We call such a prefix code optimal.
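
As a small illustration (the code table γ below is hypothetical, chosen only for this example), checking the prefix condition and encoding a string are both short one-liners in Python:

gamma = {"a": "0", "b": "10", "c": "11"}   # a made-up prefix code on S = {a, b, c}

def is_prefix_code(code):
    words = list(code.values())
    # No codeword may be a prefix of a different codeword.
    return not any(w != x and x.startswith(w) for w in words for x in words)

def encode(text, code):
    # gamma(x_1) gamma(x_2) ... gamma(x_n), concatenated.
    return "".join(code[x] for x in text)

assert is_prefix_code(gamma)
print(encode("abca", gamma))   # prints 010110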

4.8.2 Designing the Algorithm

We describe a greedy method to construct an optimal prefix code efficiently. First, however, it is useful to develop a tree-based means of representing prefix codes that exposes their structure more clearly than a simple list of the values of γ.

A labeled binary tree T naturally describes a prefix code. For each letter x ∈ S, we follow the path from the root to the leaf labeled x: each time the path goes from a node to its left child, we write down a 0, and each time the path goes from a node to its right child, we write down a 1. We take the resulting string of bits as the encoding of x.

We claim that the encoding of S from T is a prefix code. This is true because, in order for the encoding of x to be a prefix of the encoding of y, the path from the root to x would have to be a prefix of the path from the root to y. However, this is the same as saying that x would lie on the path from the root to y, which isn’t possible if x is a leaf node. Moreover, this relationship between binary trees and prefix codes works in the other direction as well.

As such, the search for an optimal prefix code can be viewed as the search for a binary tree T, together with a labeling of the leaves of T, that minimizes the average number of bits per letter. Note that the length of the encoding of a letter x ∈ S is simply the length of the path from the root to the leaf labeled x. Thus we are seeking the labeled tree that minimizes the weighted average of the depths of all leaves, where the average is weighted by the frequencies of the letters that label the leaves: ∑_{x ∈ S} f_x · depth(x). We will use ABL(T) to denote this quantity.
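
The quantity ABL(T) is easy to compute recursively from the tree representation. In the Python sketch below (my own), a leaf is represented by its letter and an internal node by a (left, right) pair; freq maps each letter to its frequency:

def abl(tree, freq, depth=0):
    if isinstance(tree, str):
        return freq[tree] * depth        # leaf x contributes f_x * depth(x)
    left, right = tree
    return abl(left, freq, depth + 1) + abl(right, freq, depth + 1)

tree = ("a", ("b", "c"))                 # encodes a -> 0, b -> 10, c -> 11
print(abl(tree, {"a": 0.5, "b": 0.25, "c": 0.25}))   # 0.5*1 + 0.25*2 + 0.25*2 = 1.5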
