Chapter 4

Greedy Algorithms

“Greed… is good. Greed is right. Greed works.” - Gordon Gekko, played by Michael Douglas in “Wall Street”, 1987

A greedy algorithm builds up a solution in small steps, making a locally optimal decision at each step in the hope that the overall solution comes out optimal.

4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead

For this, we consider the same problem as in Chapter 1: we have a set of requests for a single resource and need to figure out the optimal way of scheduling them.

Designing a Greedy Algorithm

The simple rule is to select a request, reject all requests that are not compatible with it, and then continue the process with the remaining requests. Some possible rules for ordering the selection are:

  • Start time: Select the one that starts the earliest. This would let us use the resource as early as possible. However, it is not optimal, because if the earliest request is very long, we would be rejecting many other jobs for this one. (Picture 4.1(a), page 117)
  • Shortest duration: Select the one that has the shortest duration. This does better than the previous rule but is still not optimal. (Picture 4.1(b), page 117)
  • Fewest conflicts: Select the request that has the fewest conflicts with the other requests. Setting up a counterexample for this one is hard, but it is possible. (Picture 4.1(c), page 117)
  • Ending time: Select the one that finishes earliest. This rule is optimal: by always freeing the resource as soon as possible, we leave room for as many later requests as possible. (Algorithm on page 118; a sketch follows this list.)
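As a quick illustration, here is a minimal sketch of the earliest-finish-time rule in Python; the (start, finish) request format and the function name are my own, not the book's:

  def schedule_intervals(requests):
      """Greedy interval scheduling: accept requests in order of finish time."""
      accepted = []
      last_finish = float("-inf")
      for start, finish in sorted(requests, key=lambda r: r[1]):
          if start >= last_finish:   # compatible with everything accepted so far
              accepted.append((start, finish))
              last_finish = finish
      return accepted

  # The long early-starting request loses to the two short ones it would block.
  print(schedule_intervals([(0, 10), (1, 3), (4, 6)]))   # [(1, 3), (4, 6)]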

Analyzing the Algorithm

  • It returns a compatible set of requests. (Proof on page 119)
  • For all indices r ≤ k we have f(i_r) ≤ f(j_r), where i_r is the rth element of the set of k elements that the algorithm returns and j_r is the rth element of an optimal set. (Proof on page 120)
  • The greedy algorithm returns an optimal set. (Proof on page 121)

Runtime: O(n log(n)) (Proof on page 121)

Extensions

There are problems that might arise in a practical situation:

  • We might not be able to know all the requests beforehand.
  • Every request could have different values for us.

Interval Partitioning Problem: We have multiple resources and need to schedule all requests using as few resources as possible. (Pages 122, 123) “In any instance of Interval Partitioning, the number of resources needed is at least the depth of the set of intervals.” (Proof on pages 123, 124)

Interval Partitioning Problem Algorithm: Page 124

If we use the greedy algorithm as provided, every request will be assigned to a resource and no two overlapping requests would be assigned to the same resource. (Proof on page 124, 125)

The algorithm schedules every request on a resource, using a number of resources equal to the depth of the set of intervals. This is the optimal number of resources needed.
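A minimal sketch of this greedy partitioning idea, assuming requests are (start, finish) pairs and resource labels are just the integers 1, 2, …; the heap trick for reusing freed resources is an implementation choice of mine, not the book's:

  import heapq

  def partition_intervals(requests):
      """Assign each request a resource label; labels used = depth of the intervals."""
      assignment = {}
      free_at = []          # min-heap of (finish_time, label) for busy resources
      next_label = 0
      for start, finish in sorted(requests):       # process in order of start time
          if free_at and free_at[0][0] <= start:   # some resource has already freed up
              _, label = heapq.heappop(free_at)
          else:                                    # depth increases: open a new resource
              next_label += 1
              label = next_label
          assignment[(start, finish)] = label
          heapq.heappush(free_at, (finish, label))
      return assignment

  # Three pairwise-overlapping requests need three resources; the fourth reuses label 1.
  print(partition_intervals([(0, 3), (1, 4), (2, 5), (3, 6)]))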

4.2 Scheduling to Minimize Lateness: An Exchange Argument

The Problem

The Minimize Lateness problem, like the Interval Scheduling Problem, has a single resource, but it differs in that each request has a duration and a deadline rather than a fixed start and finish time. Requests can be scheduled at any time, and the goal is to minimize the maximum lateness, i.e. the largest amount by which any job finishes past its deadline.

Designing the Algorithm

  • Order the requests according to their duration, shortest first. This fails because it ignores the deadlines of the requests.
  • Order the requests according to their slack (deadline minus duration). This fails too. (Counterexample on page 127)
  • Earliest Deadline First: Sort the jobs by deadline and run them in that order, so jobs with earlier deadlines finish earlier. This rule works. (Algorithm on pages 127, 128; a sketch follows this list.)
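A minimal sketch of the Earliest Deadline First rule; jobs here are (duration, deadline) pairs, which is my own format:

  def schedule_edf(jobs):
      """Run jobs back to back in order of deadline; return the maximum lateness."""
      time = 0
      max_lateness = 0
      for duration, deadline in sorted(jobs, key=lambda j: j[1]):
          time += duration                         # no idle time
          max_lateness = max(max_lateness, time - deadline)
      return max_lateness

  # Shortest-duration-first would run (1, 100) first and make (10, 10) late;
  # EDF runs (10, 10) first and nothing is late.
  print(schedule_edf([(1, 100), (10, 10)]))   # 0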

Analyzing the Algorithm

  • There is an optimal schedule with no idle time. (Proof on page 128)
  • All schedules with no inversions and no idle time have the same maximum lateness. (Proof on page 128, 129)
  • There is an optimal schedule that has no inversions and no idle time. (Proof on page 129, 130)
  • The schedule produced by the greedy algorithm has optimal maximum lateness. (The two statements above prove this)

Extensions

A much harder problem would occur when there also exists, for each request, an earliest possible starting time, or earliest release time.

4.4 Shortest Paths in a Graph

The Problem

“Given a start node s, what is the shortest path from s to each other node?”

Designing the Algorithm

Dijkstra proposed a very simple algorithm for the single-source shortest-path problem: maintain a set of explored nodes and, at each step, add the node outside the set that can be reached by the shortest path using only explored nodes as intermediates. (Algorithm on page 138; a sketch follows.)
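A minimal sketch of Dijkstra's algorithm using a priority queue; the adjacency-list format {node: [(neighbor, length), ...]} is my own assumption, and edge lengths must be nonnegative:

  import heapq

  def dijkstra(graph, source):
      """Return the shortest distance from source to every reachable node."""
      dist = {source: 0}
      heap = [(0, source)]
      while heap:
          d, u = heapq.heappop(heap)
          if d > dist.get(u, float("inf")):        # stale queue entry, skip it
              continue
          for v, length in graph.get(u, []):
              if d + length < dist.get(v, float("inf")):
                  dist[v] = d + length
                  heapq.heappush(heap, (dist[v], v))
      return dist

  graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": []}
  print(dijkstra(graph, "s"))   # {'s': 0, 'a': 1, 'b': 3}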

Analyzing the Algorithm

For each node u already added to the explored set, the path the algorithm has recorded from the source s to u is a shortest s-u path. (Proof on pages 139, 140)

This algorithm, however, does not work if any edge has a negative length. Dijkstra's algorithm can also be seen as a continuous version of BFS: the explored region grows outward in order of distance rather than number of edges.

Runtime

Total runtime: O(m log n) on a graph with n nodes and m edges. (Explanation in page 141, 142)

4.5 The Minimum Spanning Tree Problem

The Problem

We need to build a communication network connecting a set of locations: the chosen edges must keep the graph connected, and their total cost should be as small as possible.

The minimum-cost solution to the problem is a tree. (Proof on page 142, 143)

Designing Algorithms

Some greedy algorithms to consider, all of which actually work, are:

  • Kruskal's Algorithm: Inserting edges in the order of increasing cost, as long as it does not create a cycle.
  • Prim's Algorithm: Starting at a node, we repeatedly add the node that can be attached most cheaply to the tree built so far, until all the nodes are connected.
  • Reverse-Delete Algorithm: This is the reverse of Kruskal's algorithm. Remove edges in order of decreasing cost, as long as removing an edge does not disconnect the graph.

Analyzing the Algorithms

Cut Property: “Assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V - S. Then every minimum spanning tree contains the edge e.” (Proof on page 145)

Cycle Property: “Assume that all edge costs are distinct. Let C be any cycle in G, and let edge e = (v, w) be the most expensive edge belonging to C. Then e does not belong to any minimum spanning tree of G.” (Proof on page 147, 148)

All three of the algorithms above produce a minimum spanning tree of G. (Proofs on pages 146-149)

Eliminating the Assumption that All Edge Costs Are Distinct (Please see question section below): How do we know that the algorithms we discussed still work if the edge costs are not distinct?

Implementing Prim's Algorithm

Prim's algorithm runs in O(m log n) time, on a graph with n nodes and m edges with n ExtractMin and m ChangeKey operations, as before. (Explanation on page 150)
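A minimal sketch of Prim's algorithm with a heap (a lazy-deletion variant rather than the book's ChangeKey-based priority queue); the graph format is the same adjacency-list dictionary as in the Dijkstra sketch, with each undirected edge listed in both directions:

  import heapq

  def prim(graph, root):
      """Return the edges of a minimum spanning tree of root's component."""
      in_tree = {root}
      tree_edges = []
      heap = [(cost, root, v) for v, cost in graph[root]]
      heapq.heapify(heap)
      while heap:
          cost, u, v = heapq.heappop(heap)
          if v in in_tree:
              continue                             # cheapest edge leaving the tree wins
          in_tree.add(v)
          tree_edges.append((u, v, cost))
          for w, c in graph[v]:
              if w not in in_tree:
                  heapq.heappush(heap, (c, v, w))
      return tree_edges

  graph = {"a": [("b", 1), ("c", 3)], "b": [("a", 1), ("c", 2)], "c": [("a", 3), ("b", 2)]}
  print(prim(graph, "a"))   # [('a', 'b', 1), ('b', 'c', 2)]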

Extension

  • Tension between goals (Please see question section below)
  • Congestion on the edges.
  • Trees are fragile: destroying any one edge disconnects the network.

4.6 Implementing Kruskal's Algorithm: The Union-Find Data Structure

The Problem

The Union-Find data structure lets us determine whether two nodes, u and v, belong to the same set. It supports the following three operations:

  • MakeUnionFind(S): create the data structure with each element of S in its own singleton set
  • Find(u): return the name of the set containing element u
  • Union(A, B): merge the sets A and B into a single set

A Simple Data Structure for Union-Find

We use an array Component to represent the Union-Find data structure: Component[s] is the name of the set containing element s. If S is a set of size n, this makes Find(u) run in O(1) time and Union(A, B) run in O(n) time. We add some optimizations:

  • Explicitly maintain the list of elements in each set, so that we don't have to scan the whole array again and again on every Union.
  • Name the union after the larger of the two sets, so that only the elements of the smaller set need to be updated.

Even with these optimizations, the worst case for a single Union is still O(n), if both sets A and B are very large and contain a constant fraction of all the elements. However, this cannot happen very often.

“The Find operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and any sequence of k Union operations takes at most O(k log k) time.” (Proof on page 153, 154)
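A minimal sketch of this array-based version, assuming the elements are simply 0 .. n-1 and set names are element indices (the class and method names are mine):

  class SimpleUnionFind:
      def __init__(self, n):
          self.component = list(range(n))            # component[u] = name of u's set
          self.members = {i: [i] for i in range(n)}  # explicit element lists

      def find(self, u):                             # O(1): a single array lookup
          return self.component[u]

      def union(self, a, b):                         # a and b are set names
          if len(self.members[a]) < len(self.members[b]):
              a, b = b, a                            # keep the larger set's name
          for u in self.members[b]:                  # relabel only the smaller set
              self.component[u] = a
          self.members[a].extend(self.members.pop(b))

  uf = SimpleUnionFind(4)
  uf.union(0, 1); uf.union(2, 3); uf.union(0, 2)
  print(uf.find(3) == uf.find(0))   # True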

A Better Data Structure for Union-Find

We can also implement Union-Find using pointers. This implementation makes Union run in O(1) time, since only one pointer has to be updated, but Find(u) may now take up to O(n) time. To prevent this, we again use the optimization mentioned above and name the union after the larger set.

Using the above information, “A Union operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and a Find operation takes O(log n) time.” (Proof on page 155)

Further Improvements

In an improved implementation, we use path compression: while performing Find(v), we update every pointer on the path so that it points directly to the current name of the set. A single Find still takes at most O(log n) time, but later Finds through the same elements become much faster.
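A minimal sketch of the pointer-based structure with union-by-size and path compression; element names can be any hashable values, and (unlike the book's Union, which takes set names) this union accepts any two elements and merges their sets:

  class UnionFind:
      def __init__(self, elements):
          self.parent = {x: x for x in elements}
          self.size = {x: 1 for x in elements}

      def find(self, u):
          root = u
          while self.parent[root] != root:           # follow pointers to the set's name
              root = self.parent[root]
          while self.parent[u] != root:              # path compression: repoint the path
              self.parent[u], u = root, self.parent[u]
          return root

      def union(self, u, v):
          ru, rv = self.find(u), self.find(v)
          if ru == rv:
              return
          if self.size[ru] < self.size[rv]:          # the larger set keeps its name
              ru, rv = rv, ru
          self.parent[rv] = ru
          self.size[ru] += self.size[rv]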

Implementing Kruskal's Algorithm

We sort the edges by cost, which takes O(m log n) time, then use the Union-Find data structure to maintain the connected components as edges are added. For each edge (u, v) in this order, we check whether Find(u) and Find(v) are equal; if they are not, we add the edge and apply Union(Find(u), Find(v)).

At most, we perform 2m Find and n - 1 Union operations. Thus “Kruskal's algorithm can be implemented on a graph with n nodes and m edges to run in O(m log n) time.” (Page 157) A sketch follows.
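A minimal sketch of Kruskal's algorithm, reusing the UnionFind class sketched above; edges are assumed to be (cost, u, v) triples:

  def kruskal(nodes, edges):
      """Return the edges of a minimum spanning tree (assumes the graph is connected)."""
      uf = UnionFind(nodes)
      tree = []
      for cost, u, v in sorted(edges):               # O(m log n) sort by cost
          if uf.find(u) != uf.find(v):               # adding (u, v) creates no cycle
              uf.union(u, v)
              tree.append((u, v, cost))
      return tree

  edges = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c")]
  print(kruskal(["a", "b", "c"], edges))   # [('a', 'b', 1), ('b', 'c', 2)]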

4.7 Clustering

The Problem

Clustering refers to problems in which we are given distances between pairs of objects and we classify the objects into clusters depending on how far apart they are from each other.

Clusterings of Maximum Spacing: Suppose we have n objects and wish to divide them into a given number k of groups. The spacing of a clustering is the minimum distance between any pair of points that lie in different clusters. Our goal is to find the k-clustering whose spacing is maximum.

Designing the Algorithm

We use Kruskal's algorithm: add edges in order of increasing cost, but if an edge joins two nodes that already belong to the same cluster, skip it. We stop once we have k clusters. (See page 160 for an example; a sketch follows.)
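A minimal sketch, again reusing the UnionFind class from Section 4.6 and the (cost, u, v) edge format; only the stopping condition differs from the Kruskal sketch:

  def max_spacing_clustering(nodes, edges, k):
      """Greedily merge closest clusters until exactly k remain."""
      uf = UnionFind(nodes)
      clusters = len(nodes)
      for cost, u, v in sorted(edges):
          if clusters == k:
              break                                  # stop once k clusters remain
          if uf.find(u) != uf.find(v):               # edge joins two different clusters
              uf.union(u, v)
              clusters -= 1
      groups = {}
      for x in nodes:                                # group nodes by cluster name
          groups.setdefault(uf.find(x), []).append(x)
      return list(groups.values())

This is just Kruskal's algorithm stopped early, which matches the claim below about deleting the k - 1 most expensive edges of the minimum spanning tree.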

Analyzing the Algorithm

“The components C1, C2, … Ck formed by deleting the k - 1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.” (Proof on page 160 and 161)

4.8 Huffman Codes and Data Compression

The Problem

Encoding Symbols Using Bits: We use sequences of bits to represent each letter of the English alphabet in a computer. Since there are 26 letters, we need at least 5 bits per letter if every letter gets a unique fixed-length code. However, we can do better by using fewer bits for the most frequently occurring letters, like e, t, a, o, i and n, than for the least frequently occurring ones, like q, j, x and z. We save a lot of space if data is stored that way. This is an example of data compression: we take advantage of the nonuniform frequencies of the letters.

Variable-Length Encoding Schemes: Samuel Morse developed Morse code, which translates each letter into a sequence of dots (short pulses) and dashes (long pulses) for sending telegraphs. For now, let's treat a dot as a 0 and a dash as a 1. Morse consulted local printing presses for frequency estimates of the letters in English and, using these, mapped e to 0, t to 1, a to 01, and so on. With only this information, 0101 could represent eta, aa, etet or aet. To make the encoded message readable, Morse code includes pauses between letters, which means it really uses three symbols: 0, 1 and pause. Thus, if we want to use only 0 and 1, we need a different method of encoding the letters.

Prefix Codes: The problem with the encoding above is that there exist pairs of letters where the bit string that encodes one letter is a prefix of the bit string that encodes another. The following prefix code eliminates this problem:

  • a = 11
  • b = 01
  • c = 001
  • d = 10
  • e = 000

The above prefix code has no letter whose encoding is a prefix of another letter's encoding.
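As a small sanity check (my own, not the book's), the following snippet decodes a bit string under this code; it works precisely because no codeword is a prefix of another, so the first codeword we complete is the only possible one:

  code = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}

  def decode(bits, code):
      decoding = {word: letter for letter, word in code.items()}
      out, current = [], ""
      for bit in bits:
          current += bit
          if current in decoding:    # a codeword just ended; emit it and start over
              out.append(decoding[current])
              current = ""
      return "".join(out)

  print(decode("010001110", code))   # "bead"  (b=01, e=000, a=11, d=10)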

Optimal Prefix Codes: With the above codes, and the frequencies as follows:

  • a = .32
  • b = .25
  • c = .20
  • d = .18
  • e = .05

…, we have the average number of bits per letter as follows:

.32 * 2 + .25 * 2 + .20 * 3 + .18 * 2 + .05 * 3 = 2.25

But the above code is still not optimal. (See pages 165 and 166 for more details.) We now need to find the prefix code with the smallest average number of bits per letter.

Designing the Algorithm

Representing Prefix Codes Using Binary Trees: We build a binary tree T in which each node is either a leaf or has two children. The number of leaves equals the number of letters in the alphabet S, and each leaf is labeled with a distinct letter.

“The encoding of S constructed from T is a prefix code.” (Proof on page 166, 167)

For example, we can construct a binary tree whose root has two children, the right child being a leaf labeled with the most frequently occurring letter and the left child being the parent of two further children. Every time we follow a right edge the next bit of the code is 1, and for a left edge it is 0. Thus, the letter at the right child of the root has the one-bit code 1. (Picture on page 168)

“The binary tree corresponding to the optimal prefix code is full.” (Proof on page 168, 169)

A First Attempt: The Top-Down Approach: We want to pack the leaves of the tree as tightly as possible. There are different ways to do this, and the Shannon-Fano codes are a scheme that makes this precise. For our example the resulting code is the one given above, which we already know is not optimal. Huffman, a graduate student in Fano's class, solved the problem a few years later.

What if We Knew the Tree Structure of the Optimal Prefix Code?: If we already had the tree but not the labeling of the leaves, it would be very easy to assign the codes.

“Suppose that u and v are two different leaves of the optimal tree T* such that depth of u is less than depth of v. Also, suppose that in a labeling of T* corresponding to an optimal prefix code, leaf u is labeled with y and leaf v is labeled with z, both belonging to S. Then fy >= fz.” (Proof on page 170)

Suppose that v is the deepest node in T* and its parent is u. Since the tree is a full binary tree, u has another child, w. “w is a leaf of T*.” (Proof on page 171)

“There is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*.”

The algorithm to create such a tree is given on pages 172 and 173, with an explanation; a sketch follows.
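A minimal sketch of the bottom-up construction with a priority queue keyed on frequency; merging dictionaries of partial codewords (instead of building an explicit tree) is my own shortcut:

  import heapq
  import itertools

  def huffman_codes(freqs):
      """Return a dict letter -> codeword for the given frequency dict."""
      tiebreak = itertools.count()               # keeps heap tuples comparable
      heap = [(f, next(tiebreak), {letter: ""}) for letter, f in freqs.items()]
      heapq.heapify(heap)
      while len(heap) > 1:
          f1, _, codes1 = heapq.heappop(heap)    # the two lowest-frequency "letters"
          f2, _, codes2 = heapq.heappop(heap)
          merged = {x: "0" + c for x, c in codes1.items()}
          merged.update({x: "1" + c for x, c in codes2.items()})
          heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
      return heap[0][2]

  freqs = {"a": .32, "b": .25, "c": .20, "d": .18, "e": .05}
  codes = huffman_codes(freqs)
  print(sum(f * len(codes[x]) for x, f in freqs.items()))   # about 2.23, beating 2.25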

Analyzing the Algorithm

“The Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code.” (Proof on pages 174 and 175, statements 4.32 and 4.33)

Implementation and Running Time: Using a priority queue, the running time of the algorithm would be O(k log k).

Extensions

A black-and-white picture of 1000 x 1000 pixels is hard to compress this way: each pixel is already represented by a single symbol, black or white, so with only two symbols a prefix code cannot get below one bit per pixel; any further compression has to exploit other structure in the image.

If a and b occur throughout the first half of a text with no c and d at all, and c and d occur throughout the second half with no a and b at all (in a four-letter alphabet), we might want to change the code halfway through the text so that the average number of bits per letter stays low in both halves instead of being fixed by the overall frequencies.

Questions and Comments

Questions for Feb 29th

As for questions, I don't understand the following two:

  • Eliminating the Assumption that All Edge Costs Are Distinct (Section 4.5's Analysis)
  • Tension between goals (Section 4.5's Extension)

Most of the rest we have discussed in class, and I have no problems with it.

Questions for Mar 7th

No questions and no problems; we dealt with mostly everything in class. Readable, probably because of the class. Interesting things:

  • So my Organizational Behaviour professor showed us Gordon Gekko's speech in his class today. It was quite interesting that two of my classes are dealing with GREED at the same time, though not really in an immoral way.
  • It was interesting to know that Huffman solved a problem that his teacher couldn't solve.
  • Despite referring to the book a million times, section 4.8 is the longest section on this wiki so far. I know it's too long, but I guess writing such notes will help me find the important parts of a chapter without having to go through most of the book.