Chapter 4: Greedy Algorithms

A greedy algorithm builds a solution by making the locally optimal choice at each step, in the hope that the result is globally optimal. The easiest example of this is making change using the fewest coins. There are several ways to prove that a greedy solution is the optimal solution; the two this chapter discusses are the "greedy stays ahead" argument and the exchange argument. The motivation behind greedy algorithms is to solve non-trivial problems optimally with simple rules, and they can be used to solve many interesting problems.
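As a quick illustration, here is a minimal sketch of the change-making idea in Python, assuming US-style denominations (for which the greedy choice happens to give the fewest coins; it is not optimal for every coin system):

```python
# Greedy change-making: repeatedly take as many of the largest coin as possible.
def make_change(amount, denominations=(25, 10, 5, 1)):
    coins = {}
    for d in denominations:              # largest denomination first
        coins[d], amount = divmod(amount, d)
    return coins

print(make_change(67))  # {25: 2, 10: 1, 5: 1, 1: 2}
```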

Section 4.1: Interval Scheduling: The Greedy Algorithm Stays Ahead

The Interval Scheduling problem gives us n intervals, each with a specific start and finish time. We want to perform as many of these n tasks as possible, but the intervals we select cannot overlap, i.e. they must be compatible. The section traces through several ways we could order the intervals before running the greedy algorithm, providing counterexamples for the orderings that are not optimal. The optimal ordering turns out to be by finish time: we always choose the job that finishes first, which maximizes the time we have left to perform other jobs. If we order the intervals this way, the algorithm (text page 118) returns an optimal set of intervals. To prove that the greedy solution is the optimal solution, we use the "Greedy Stays Ahead" argument: comparing against any optimal set of intervals, at every step the greedy algorithm has chosen an interval that ends either sooner or at the same time as the corresponding interval in the optimal set. From there we can see that the optimal solution cannot contain more intervals than the greedy one. The greedy algorithm runs in O(n log n) time: sorting the intervals takes O(n log n) time and the loop over them runs in O(n) time.

In addition to solving the Interval Scheduling problem, a greedy algorithm also solves the Interval Partitioning problem. Here we have a set of intervals or jobs that we want to fit into the smallest number of resources possible (e.g. a set of lectures that we want to fit into the smallest number of classrooms). We immediately get a lower bound on the number of resources from the depth of the set of intervals: if 3 intervals overlap at any given point, we know we will need at least 3 resources to fit everything. Then we can design a greedy algorithm (text page 124) to place the intervals into the minimum number of resources. After analyzing the algorithm, we find that the optimal number of resources is exactly the depth of the set of intervals.
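Here is a minimal sketch of the earliest-finish-time rule in Python; the interval representation as (start, finish) pairs is my own choice, not the book's pseudocode:

```python
# Greedy interval scheduling: sort by finish time, take each interval that is
# compatible with everything selected so far.
def schedule_intervals(intervals):
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:         # compatible with the last selected interval
            selected.append((start, finish))
            last_finish = finish
    return selected

print(schedule_intervals([(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (8, 11)]))
# -> [(1, 4), (5, 7), (8, 11)]
```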

This was a pretty straightforward section. I would give it a 9/10 because it provided solutions for some cool problems and explained everything in a clear manner.

Section 4.2: Scheduling to Minimize Lateness: An Exchange Argument

The problem discussed in this section is similar to the schedule-all-intervals problem except that each request now has a flexible start time. Each job has a specific deadline and a continuous length of time it must run. The goal is to minimize the maximum lateness, where a job's lateness is how long after its deadline it finishes. When designing the algorithm, we sort the jobs in increasing order of their deadlines and then schedule them in this order, so the job with the earliest deadline is scheduled first. This algorithm is outlined on page 127 of the text, and it always returns an optimal solution, even if that does not seem obvious at first. The greedy schedule has no idle time, which would be time between jobs where the machine was not processing anything. We can prove that this greedy algorithm is optimal by using an exchange argument: the greedy schedule has no inversions, which are places in the schedule where a job with a later deadline is scheduled before a job with an earlier deadline, and any optimal schedule can be converted into one with no inversions without increasing the maximum lateness. This problem can also be extended to the version where each job has a deadline, a required run time, and an earliest possible start time. The motivation behind this problem could be scheduling jobs for a computer to process or for a printer to print. The section did not really discuss the running time of the algorithm. This section was straightforward and helped to reinforce what we had discussed in class. I would give the section an 8/10.
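A minimal sketch of the earliest-deadline-first rule, assuming jobs are given as (length, deadline) pairs and everything may start at time 0:

```python
# Schedule jobs in order of increasing deadline with no idle time and report
# the maximum lateness of the resulting schedule.
def schedule_min_lateness(jobs):
    t = 0
    schedule, max_lateness = [], 0
    for length, deadline in sorted(jobs, key=lambda j: j[1]):
        start, finish = t, t + length
        schedule.append((start, finish, deadline))
        max_lateness = max(max_lateness, finish - deadline)
        t = finish
    return schedule, max_lateness

print(schedule_min_lateness([(1, 2), (2, 4), (3, 6)]))
# -> ([(0, 1, 2), (1, 3, 4), (3, 6, 6)], 0)
```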

Section 4.4: Shortest Paths in a Graph

The problem of finding the shortest path in a graph can be solved by using a greedy algorithm. We use this algorithm on directed graphs where each of the edges has a specific nonnegative length. The algorithm was designed by Edsger Dijkstra: we maintain a set S of explored nodes, and at each step we add the node outside S that can be reached by the shortest path through S. The algorithm, on page 138 of the text, seems a little confusing at points, but it does manage to produce an optimal solution every time. We just have to be careful not to grow the set S by the wrong node, which would give us an incorrect shortest path. Thus at any point in the algorithm, the path we have chosen to each node in S is a shortest path. This algorithm does not work for negative edge lengths and is essentially an extension of breadth-first search. Looking at the running time, a direct implementation that scans the edges to find the minimum in each iteration takes O(mn) time. If we instead keep the nodes in a priority queue and use the ChangeKey and ExtractMin operations, we are able to run this algorithm in O(m log n) time.
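A minimal sketch of Dijkstra's algorithm using Python's heapq module as the priority queue (with lazy deletion of stale entries instead of a ChangeKey operation); the adjacency-list format is my own assumption:

```python
import heapq

# graph[u] is a list of (v, length) pairs with nonnegative lengths.
def dijkstra(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):    # stale entry, skip it
            continue
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {"s": [("u", 1), ("v", 4)], "u": [("v", 2)], "v": []}
print(dijkstra(g, "s"))  # {'s': 0, 'u': 1, 'v': 3}
```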

The motivation behind this problem is to determine how quickly we can get from one point to another, such as from one website to another.

We went over this algorithm pretty extensively in class and that was much more helpful than the book. Trying to read about the algorithm was extremely confusing.

I would give this section a 6/10 as it was confusing to read and class was much more helpful.

Section 4.5: The Minimum Spanning Tree Problem

The minimum spanning tree problem looks for a way to connect all of the nodes that is built as cheaply as possible. A graph G can have exponentially many spanning trees; a subset T of the edges is a spanning tree if (V, T) is a tree. There are three greedy algorithms that solve the minimum spanning tree problem: Kruskal's algorithm, Prim's algorithm, and the Reverse-Delete algorithm. Each of these is guaranteed to produce an optimal solution, which shows a certain robustness in the minimum spanning tree problem. The algorithms work differently, but they are essentially doing the same thing: repeatedly inserting or deleting edges from a partial solution.

The section discusses the Cut Property, which is covered on page 145 of the text. We then use this Cut Property to prove the optimality of the different algorithms. Prim's and Kruskal's algorithms only add an edge when it is justified by the Cut Property.

There is another property called the Cycle Property, discussed on page 147 of the text. We can use the Cycle Property to prove that the Reverse-Delete algorithm is optimal, as it only removes edges that the property shows cannot belong to a minimum spanning tree.

Prim's and Kruskal's algorithms can both be implemented to run in O(m log n) time; Prim's algorithm uses a priority queue in much the same way Dijkstra's does, while Kruskal's sorts the edges and relies on the Union-Find structure covered in Section 4.6.
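A minimal sketch of Prim's algorithm with heapq, again using lazy deletion rather than ChangeKey; the adjacency-list format for an undirected, connected graph is my own assumption:

```python
import heapq

# graph[u] is a list of (v, cost) pairs; returns the edges of a minimum spanning tree.
def prim_mst(graph, root):
    in_tree, mst_edges = set(), []
    heap = [(0, root, None)]                 # (attachment cost, node, tree parent)
    while heap:
        cost, u, parent = heapq.heappop(heap)
        if u in in_tree:
            continue
        in_tree.add(u)
        if parent is not None:
            mst_edges.append((parent, u, cost))
        for v, c in graph[u]:
            if v not in in_tree:
                heapq.heappush(heap, (c, v, u))
    return mst_edges

g = {"a": [("b", 1), ("c", 4)], "b": [("a", 1), ("c", 2)], "c": [("a", 4), ("b", 2)]}
print(prim_mst(g, "a"))  # [('a', 'b', 1), ('b', 'c', 2)]
```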

I missed the class where we discussed this section, so it was a little confusing to read. This summary is a tad short, but that is because I am still processing what I have read. I think I understand what is happening, but I will probably go through and add more to this after reading the section again.

I give this section a 7/10 as I am still working on understanding it.

Section 4.6: Implementing Kruskal's Algorithm: The Union-Find Data Structure

We did not have a class on this section, so it is taking me a little longer than I expected to go through it and understand it.

In this section we look at graphs whose set of connected components changes over time as edges are added, which is exactly the bookkeeping Kruskal's algorithm needs. The Union-Find data structure enables us to maintain disjoint sets, supporting a Find operation (which set does a node belong to?) and a Union operation (merge two sets).
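A minimal sketch of a Union-Find structure with union by size and path compression (a standard implementation, not necessarily the exact variant the text describes):

```python
class UnionFind:
    def __init__(self, elements):
        self.parent = {x: x for x in elements}
        self.size = {x: 1 for x in elements}

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:        # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                     # already in the same component
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx                 # attach the smaller tree under the larger
        self.size[rx] += self.size[ry]
        return True

uf = UnionFind("abcd")
uf.union("a", "b"); uf.union("c", "d")
print(uf.find("a") == uf.find("b"), uf.find("a") == uf.find("c"))  # True False
```

This is exactly what Kruskal's algorithm needs: for each edge (u, v) considered in order of cost, a Find test tells us whether u and v are already in the same component, and a Union merges the two components when the edge is added.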

I think I may need to come in to speak with you about this section as I am pretty confused trying to read it from the chapter. I am having difficulty discerning what the pertinent information is. I am going to revisit this later in the week and figure out what exactly is going on!

Section 4.7: Clustering

Clustering is the problem of organizing objects into coherent groups, where the closer two objects are, the more they have in common. We want to find a clustering of maximum spacing: given a value k, we want to partition the objects into k clusters that are as far apart from each other as possible. To find this maximum-spacing clustering we use a modified version of Kruskal's algorithm. We start by adding an edge between the two closest points, then the next two closest, and so on. We do not add edges between nodes that are already in the same cluster, as that would create a cycle. We run Kruskal's algorithm but stop once we have created k connected components; equivalently, we leave out the k-1 most expensive edges of the minimum spanning tree.
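A minimal sketch of this stopped-early Kruskal idea, reusing the UnionFind class sketched in the previous section; the point set and distance function are assumptions for illustration, and the function returns the spacing (the shortest distance between two different clusters):

```python
# Merge the closest pairs until only k clusters remain, then report the spacing.
def k_clustering(points, dist, k):
    uf = UnionFind(points)
    edges = sorted((dist(p, q), p, q)
                   for i, p in enumerate(points)
                   for q in points[i + 1:])
    components = len(points)
    for d, p, q in edges:
        if uf.find(p) == uf.find(q):
            continue                         # already in the same cluster, skip
        if components == k:
            return d                         # shortest distance between two clusters
        uf.union(p, q)
        components -= 1

pts = [0, 1, 2, 10, 11, 20]
print(k_clustering(pts, lambda a, b: abs(a - b), 3))
# -> 8, the gap between the clusters {0, 1, 2} and {10, 11}
```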

This section was pretty straightforward and we went over it pretty well in class. It helps that we were using an algorithm that we had already proved and discussed. I would give this section an 8/10. It was helpful and easy to follow.

Section 4.8: Huffman Codes and Data Compression

When communicating with computers, we must encode English characters as binary digits, and we want to minimize the number of bits needed to encode each letter. The problem with encoding is that, as in Morse code, there could potentially be ambiguity: the code for one letter could be a prefix of the code for another. A prefix code is a set of encodings in which no codeword is a prefix of any other. Once we have created such a code, we are able to decode a string of bits without any ambiguity. This section also looks at optimal prefix codes, which have the smallest average number of bits required per letter.

Prefix codes correspond to binary trees, and we can move between the two representations in either direction. We can start with the encoding and build a binary tree from it, or we can start with a binary tree and read off the encoding: each time a path to a leaf branches left we write down a zero, and each time it branches right we write down a one. We say that a binary tree is full if every internal node has two children. When we finally design the algorithm, we take a bottom-up approach, since the top-down approach amounts to brute force rather than producing the optimal code. The algorithm, discussed on page 172 of the text, starts with all of the letters and their frequencies. We then merge the two letters with the lowest frequencies under a meta-letter whose frequency is the sum of their frequencies, and repeat this process until all of the letters have been merged and the final meta-letter has frequency one. This is called Huffman's algorithm, and it is greedy because at each step we merge the two lowest-frequency letters. It runs in O(k log k) time if we use a priority queue to keep track of the letters and their frequencies.

Prefix codes have some drawbacks, including that they cannot adapt as more text is added and that they cannot use fractional bits. However, we choose prefix codes as a trade-off between the power of compression and the computational cost.
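A minimal sketch of Huffman's algorithm with a priority queue; the input format (a dict of letter frequencies) and the output format (a dict of bit strings) are my own choices for illustration:

```python
import heapq
from itertools import count

def huffman_code(freqs):
    tiebreak = count()                       # unique counter keeps heap entries comparable
    heap = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two lowest-frequency "letters"
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))  # meta-letter
    code = {}

    def assign(node, prefix):                # walk the tree: left edge = 0, right edge = 1
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"       # edge case: one-letter alphabet
    assign(heap[0][2], "")
    return code

print(huffman_code({"a": 0.45, "b": 0.30, "c": 0.15, "d": 0.10}))
# e.g. {'a': '0', 'b': '11', 'c': '101', 'd': '100'} -- codeword lengths 1, 2, 3, 3
```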

This was a little confusing to read, but the explanations in class were very helpful. I would give this section an 8/10 as I had never thought of this problem before, and I think it is pretty cool that they have created a solution to it!
