Chapter 4: Greedy Algorithms

4.1 - Interval Scheduling: The Greedy Algorithm Stays Ahead

A subset of requests is compatible if no two of them overlap in time. The goal is to accept the largest compatible subset possible; compatible sets of maximum size are optimal.

Rules for selecting requests can vary, but not all are desirable: selecting by earliest start time, by shortest interval, and by fewest conflicts all have counterexamples. The best choice is to select based on earliest finish time.

Let A be the compatible set of requests returned by the greedy algorithm. The goal is to show that A is as large as an optimal solution, by verifying that the r-th interval in A finishes at a time less than or equal to the r-th interval in any optimal set (the greedy algorithm "stays ahead"). From this we conclude that the greedy algorithm returns an optimal set A. The algorithm runs in O(n log n) time.
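A minimal sketch of the earliest-finish-time rule in Python (the tuple representation of requests and the example data are my own illustrative choices, not from the text):

<code python>
def interval_scheduling(requests):
    """Greedy interval scheduling: sort by finish time, then
    repeatedly accept the request that finishes earliest among
    those compatible with what has been accepted so far."""
    accepted = []
    last_finish = float("-inf")
    # Sorting dominates the running time: O(n log n).
    for start, finish in sorted(requests, key=lambda r: r[1]):
        if start >= last_finish:  # compatible with the accepted set
            accepted.append((start, finish))
            last_finish = finish
    return accepted

# Example: returns [(1, 3), (3, 5), (5, 8)]
print(interval_scheduling([(1, 3), (2, 5), (3, 5), (4, 9), (5, 8)]))
</code>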

Another problem may ask for all intervals to be satisfied using as few resources as possible. In any instance of Interval Partitioning, the number of resources needed is at least the depth of the set of intervals: the maximum number of intervals passing over any single point in time. Altering our greedy algorithm to process intervals in order of start time and assign each one a label ensures that no two overlapping intervals receive the same label. This schedules every interval on a resource, using a number of resources equal to the depth of the set of intervals, which is the optimal number of resources needed.
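A possible sketch, assuming half-open intervals so that one interval may start exactly when another finishes; the heap of resource finish times is an implementation choice of mine, not prescribed by the text:

<code python>
import heapq

def interval_partitioning(intervals):
    """Assign each interval a resource label so that no two
    overlapping intervals share a label; the number of labels
    used equals the depth of the set of intervals."""
    busy = []        # min-heap of (finish_time, label) for resources in use
    next_label = 1
    assignment = []
    for start, finish in sorted(intervals):   # process by start time
        if busy and busy[0][0] <= start:
            # The earliest-finishing resource is already free: reuse it.
            _, label = heapq.heappop(busy)
        else:
            label = next_label                 # depth increases here
            next_label += 1
        assignment.append(((start, finish), label))
        heapq.heappush(busy, (finish, label))
    return assignment
</code>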

Discussion

I found the most interesting part of this section to be the lecture where we analyzed the algorithms with counterexamples; it gave the most clarity to the problem-solving method used here. In addition, I was surprised to see that solving for all intervals was as simple as adding a depth tracker. I expected to have to implement new rules to obtain the optimal solution.

4.2 - Scheduling to Minimize Lateness: An Exchange Argument

Reexamine scheduling where the goal is to minimize lateness: each job has a processing time and a deadline, and its lateness is the amount by which its finish time exceeds its deadline. The optimal solution is found by arranging jobs in increasing order of deadline (Earliest Deadline First). The schedule has no idle time, that is, no gaps between jobs. An inversion occurs when a job with a later deadline is scheduled before a job with an earlier deadline. All schedules with no inversions and no idle time have the same maximum lateness; therefore, there is an optimal schedule that has no inversions and no idle time. The schedule produced by this greedy algorithm, with Earliest Deadline First sorting, achieves the minimum possible maximum lateness.
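A short sketch of Earliest Deadline First under these assumptions (jobs given as (processing_time, deadline) pairs, all available at time 0):

<code python>
def min_max_lateness(jobs):
    """jobs: list of (processing_time, deadline) pairs.
    Schedule in order of increasing deadline with no idle time
    and return the maximum lateness of the schedule."""
    t = 0
    max_lateness = 0
    for length, deadline in sorted(jobs, key=lambda j: j[1]):
        t += length                            # this job finishes at time t
        max_lateness = max(max_lateness, t - deadline)
    return max_lateness

# Example: jobs finish at times 1, 3, 6; no job is late, so this prints 0.
print(min_max_lateness([(3, 6), (1, 2), (2, 4)]))
</code>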

4.4 - Shortest Paths in a Graph

The goal is to determine the shortest path from a starting node s to every other node in a graph. To do this, maintain a set S of explored nodes, storing for each the length of a shortest path from s; at each step, add the node outside S that can be reached most cheaply through S. A distance is stored only once it is truly the shortest, so the algorithm stays ahead. Consider the set S at any point in the algorithm's execution: for each node in S, the stored path is a shortest path to that node. This is known as Dijkstra's Algorithm.

The algorithm doesn't find shortest paths when edges have negative lengths. Also, the algorithm is very similar to breadth-first search. On a graph with n nodes and m edges, the algorithm performs O(m) priority queue operations; using a heap-based priority queue, each operation takes O(log n) time. Overall implementation: O(m log n).
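A sketch of a heap-based implementation; Python's heapq has no decrease-key operation, so this version pushes duplicate entries and skips stale ones, which keeps the same O(m log n) bound:

<code python>
import heapq

def dijkstra(graph, s):
    """graph: dict mapping node -> list of (neighbor, edge_length),
    with all lengths nonnegative. Returns shortest distances from s."""
    dist = {s: 0}
    pq = [(0, s)]                         # (tentative distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale entry; u already settled
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd              # found a shorter path to v
                heapq.heappush(pq, (nd, v))
    return dist
</code>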

4.5 - The Minimum Spanning Tree Problem

The goal is to connect every pair of nodes as cheaply as possible (or with the least total distance). A minimum-cost solution will not contain cycles, so it is a spanning tree. A minimum spanning tree can be found using several different greedy algorithms. One is to begin with no edges and add them in order of increasing cost, skipping any edge that would create a cycle (Kruskal's Algorithm). Another, similar to Dijkstra's Algorithm, begins at a root node and repeatedly adds the cheapest edge with exactly one end in the tree (Prim's Algorithm). A third is to begin with the full graph and delete edges in order of decreasing cost, as long as removal does not disconnect the graph (the Reverse-Delete Algorithm).

Assuming all edge costs are distinct: let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V - S. Then every minimum spanning tree contains the edge e (the Cut Property). Let C be any cycle in the graph G, and let edge e = (v, w) be the most expensive edge belonging to C. Then e does not belong to any minimum spanning tree of G (the Cycle Property).

Both Prim’s and Kruskal’s Algorithms can be implemented in O(m log n) time.
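A sketch of Prim's Algorithm with a heap, assuming a symmetric adjacency-list representation (each edge listed from both endpoints):

<code python>
import heapq

def prim(graph, root):
    """graph: dict mapping node -> list of (neighbor, cost).
    Grow a tree from root, always adding the cheapest edge with
    exactly one end inside the tree. O(m log n) with a heap."""
    in_tree = {root}
    tree_edges = []
    pq = [(cost, root, v) for v, cost in graph[root]]
    heapq.heapify(pq)
    while pq and len(in_tree) < len(graph):
        cost, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                  # edge no longer crosses the cut
        in_tree.add(v)                # cheapest crossing edge: take it
        tree_edges.append((u, v, cost))
        for w, c in graph[v]:
            if w not in in_tree:
                heapq.heappush(pq, (c, v, w))
    return tree_edges
</code>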

4.6 - Implementing Kruskal’s Algorithm: The Union-Find Data Structure

Consider a graph with a fixed set of nodes that grows over time through the addition of edges. We want to keep track of the connected components of the graph. The Union-Find data structure maintains disjoint sets. Given a node u, Find(u) returns the name of the set containing u; this can be used to test whether u and v are in the same set by checking Find(u) = Find(v). Union(A, B) takes two sets and merges them into a single set. This structure only supports the addition of edges, not deletion.

In an array implementation of the Union-Find data structure for some set S of size n, where unions keep the name of the larger set, the Find operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and any sequence of k Union operations takes at most O(k log k) time.

In a pointer-based implementation of the data structure, a Union operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and a Find operation takes O(log n) time.
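A minimal sketch of the pointer-based structure with union by size; here Union takes the two set representatives directly, which is what makes it O(1):

<code python>
class UnionFind:
    """Pointer-based Union-Find with union by size: Union is O(1)
    given two set representatives, and Find is O(log n) because
    each pointer hop at least doubles the size of the set."""
    def __init__(self, elements):          # MakeUnionFind(S): O(n)
        self.parent = {x: x for x in elements}
        self.size = {x: 1 for x in elements}

    def find(self, u):                     # follow pointers to the root
        while self.parent[u] != u:
            u = self.parent[u]
        return u

    def union(self, a, b):                 # a, b must be roots
        if self.size[a] < self.size[b]:
            a, b = b, a                    # keep the name of the larger set
        self.parent[b] = a
        self.size[a] += self.size[b]
</code>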

Kruskal’s Algorithm can be implemented on a graph with n nodes and m edges to run in O(m log n) time.
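A sketch of Kruskal's Algorithm reusing the UnionFind class sketched above; sorting the m edges dominates at O(m log m) = O(m log n):

<code python>
def kruskal(nodes, edges):
    """edges: list of (cost, u, v). Add edges in order of increasing
    cost whenever their endpoints lie in different components, using
    Union-Find to test connectivity."""
    uf = UnionFind(nodes)                  # from the sketch above
    tree = []
    for cost, u, v in sorted(edges):
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv:                       # edge joins two components
            uf.union(ru, rv)
            tree.append((u, v, cost))
    return tree
</code>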

Discussion

As the algorithms we explore become more complicated, it is more important than ever to work through proofs slowly. It is much harder to fully understand optimal solutions at first glance when the problems become large and complex. I found the similarities between BFS and Dijkstra's Algorithm interesting; I don't see much of a difference, just an ordering of the edges added (not very complicated).

4.7 - Clustering

Clustering is the breaking down of a collection of objects into coherent groups. This is possible through a distance function, under the assumption that closer items are more related. The spacing of a clustering is the minimum distance between any pair of points in different clusters. The goal is to find a k-clustering with the maximum possible spacing.

One method is to add edges between nodes in order of increasing distance. If an edge connects nodes already in the same cluster, it is not added. The algorithm terminates when it has reached the target number of connected components. The components C1, ..., Ck formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing. This is essentially Kruskal's Algorithm, stopped early.
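A sketch of this single-linkage procedure; the dist argument and the choice to return the spacing rather than the clusters themselves are my own:

<code python>
def k_clustering(points, k, dist):
    """Run Kruskal's Algorithm but stop once k components remain;
    the resulting components are a k-clustering of maximum spacing.
    Reuses the UnionFind class sketched in 4.6."""
    uf = UnionFind(points)
    edges = sorted((dist(p, q), p, q)
                   for i, p in enumerate(points)
                   for q in points[i + 1:])
    components = len(points)
    for d, p, q in edges:
        rp, rq = uf.find(p), uf.find(q)
        if rp == rq:
            continue                  # endpoints already clustered together
        if components == k:
            return d                  # first cross-cluster edge = spacing
        uf.union(rp, rq)
        components -= 1
    return None                       # k == 1: spacing is undefined
</code>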

4.8 - Huffman Codes and Data Compression

Data compression deals with reducing the space files occupy through efficient encoding schemes. A prefix code is one in which no codeword is a prefix of another; this lets variable-length bit strings be decoded unambiguously without separators. This technique is useful when assigning bit encodings to letters based on their frequency.

We use a greedy algorithm to find the optimal prefix code. In a binary tree T with leaves representing the elements we wish to encode, following the path from the root to a desired leaf gives that element's codeword: every branch to the left is written as a 0, and every branch to the right is written as a 1. The sequence of bits along the path is the encoding. The process can be applied in reverse to build a tree from a prefix code. The length of an encoding corresponds to the number of edges on its path. The binary tree corresponding to the optimal prefix code is full (every internal node has exactly two children).
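A small sketch of decoding with such a code tree; the nested-tuple representation (a leaf is a letter, an internal node is a (left, right) pair) is my own convention:

<code python>
def decode(tree, bits):
    """Walk the code tree: 0 means go left, 1 means go right;
    on reaching a leaf, emit its letter and restart at the root."""
    letters = []
    node = tree
    for bit in bits:
        node = node[int(bit)]          # 0 -> left child, 1 -> right child
        if isinstance(node, str):      # leaf: a decoded letter
            letters.append(node)
            node = tree
    return "".join(letters)

# Leaf depths give codeword lengths: 'a' = 0, 'b' = 10, 'c' = 11.
tree = ("a", ("b", "c"))
print(decode(tree, "010011"))          # -> "abac"
</code>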

Producing a tree with leaves as close to the root as possible gives a small average leaf depth. A top-down approach is to split the alphabet into two sets of nearly balanced total frequency, then recursively construct prefix codes for both sets and make them the two children of the root node. The question arises: how do we know what counts as an acceptably balanced split?

Huffman's Algorithm repeatedly groups the two lowest-frequency letters and replaces them with a meta-letter, which is later expanded back into its letters. The Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code. Finding the two lowest-frequency letters takes O(k) time, so over k-1 iterations the running time is O(k^2). However, a priority queue allows insertion and extraction in O(log k) time; over k-1 iterations, the total running time is O(k log k).
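A sketch of Huffman's Algorithm with a heap-based priority queue; the tiebreaker counter is only there to avoid comparing subtrees when frequencies tie:

<code python>
import heapq
from itertools import count

def huffman(freqs):
    """freqs: dict letter -> frequency. Repeatedly merge the two
    lowest-frequency items into a meta-letter; with a priority
    queue each merge costs O(log k), for O(k log k) total."""
    tiebreak = count()
    pq = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(pq)
    while len(pq) > 1:
        f1, _, t1 = heapq.heappop(pq)  # the two lowest-frequency nodes
        f2, _, t2 = heapq.heappop(pq)
        heapq.heappush(pq, (f1 + f2, next(tiebreak), (t1, t2)))
    return pq[0][2]                    # the code tree

def codewords(tree, prefix=""):
    """Read codewords off the tree: left edge 0, right edge 1."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}
    left, right = tree
    codes = codewords(left, prefix + "0")
    codes.update(codewords(right, prefix + "1"))
    return codes

# Higher-frequency letters get shorter codewords.
print(codewords(huffman({"a": 0.45, "b": 0.3, "c": 0.2, "d": 0.05})))
</code>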
