Chapter 4

Definition of greedy algorithm: An algorithm is greedy if it builds up a solution in small steps, choosing a decision at each step myopically to optimize some underlying criterion. In other words, it takes the best step locally.

4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead

Section Summary:

Problem Motivation: We have a set of requests {1, 2, ..., n}; the ith request corresponds to an interval of time starting at s(i) and finishing at f(i). We'll say that a subset of the requests is compatible if no two of them overlap in time, and our goal is to accept as large a compatible subset as possible. Compatible sets of maximum size will be called optimal.

  • Most natural attempt: select the available request that starts earliest.
  • Result: not optimal.
  • Counter-example: if the earliest request i is for a very long interval, then by accepting request i we may have to reject a lot of requests for shorter time intervals.
  • Attempt 2: select the smallest interval of time.
  • Result: not optimal.
  • Counter-example: p142 fig.b
  • Optimal solution: choose the request that finishes the earliest (see the sketch after this list).
  • Proof of optimality: the greedy algorithm stays ahead at each step.
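
A minimal sketch of the earliest-finish-time rule, assuming each request is a (start, finish) pair (the function name is mine, not the book's):

```python
# Greedy interval scheduling: sort by finish time, then accept every
# request that is compatible with the last accepted one.
def interval_schedule(requests):
    accepted = []
    last_finish = float('-inf')
    for s, f in sorted(requests, key=lambda r: r[1]):  # by finish time
        if s >= last_finish:       # compatible with all accepted so far
            accepted.append((s, f))
            last_finish = f
    return accepted
```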

A Related Problem: Scheduling All Intervals (the Interval Partitioning Problem): schedule all of the requests, using as few resources as possible.

Algorithm on P149 (a sketch of the idea follows).
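
A minimal sketch of the idea (not the book's exact pseudocode): sort by start time and reuse a resource whose last interval has already finished, tracking finish times in a heap:

```python
import heapq

# Interval partitioning: the number of resources used equals the
# maximum number of intervals overlapping at any point (the "depth").
def interval_partition(requests):
    finish_heap = []                          # finish times of open resources
    for s, f in sorted(requests):             # sort by start time
        if finish_heap and finish_heap[0] <= s:
            heapq.heapreplace(finish_heap, f) # reuse a freed resource
        else:
            heapq.heappush(finish_heap, f)    # open a new resource
    return len(finish_heap)                   # resources used = depth
```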

Proof Approaches:
  • Greedy algorithm stays ahead: does better than any other algorithm at each step.
  • Exchange argument: transform any solution into the greedy solution.
  • Structural argument: figure out some structural bound that all solutions must meet.

Readable/Interesting: 6/6

4.2 Scheduling to Minimize Lateness: An Exchange Argument

The Problem: We have a single resource, and a set of job requests, each with a length and a deadline. We are trying to schedule the jobs (choosing each job's start time) to minimize the maximum lateness.

Optimal algorithm: Earliest Deadline First; sort the jobs by deadline and schedule them in that order, with no idle time.
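
A minimal sketch of the greedy rule, assuming each job is a (length, deadline) pair and the resource is free at time 0 (names are mine):

```python
# Earliest Deadline First: schedule jobs in order of deadline,
# back to back, and track the worst lateness seen.
def edf_schedule(jobs):
    jobs = sorted(jobs, key=lambda job: job[1])  # sort by deadline
    t = 0                                        # no idle time
    schedule, max_lateness = [], 0
    for length, deadline in jobs:
        start, t = t, t + length                 # job runs in [start, t]
        schedule.append((start, t))
        max_lateness = max(max_lateness, t - deadline)
    return schedule, max_lateness
```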

The main step in showing the optimality of our algorithm is to establish that there is an optimal schedule that has no inversions and no idle time.

Analysis: Our plan here is to gradually modify O while preserving its optimality at each step, eventually transforming it into a schedule identical to the schedule A found by the greedy algorithm. We refer to this type of analysis as an exchange argument.

Def. An inversion in schedule S is a pair of jobs i and j such that d(i) < d(j) but j is scheduled before i. Facts: there is an optimal schedule with no idle time, and all schedules with no inversions and no idle time have the same maximum lateness.

Greedy Exchange Proofs on slide 4.2 p7.

Running time: O(n log n), since we need to sort the jobs by deadline first.

Readable/Interesting: 6/6

4.4 Shortest Paths in a Graph

Problem:

Given a directed graph with nonnegative edge lengths and a start node s, what is the shortest path from s to every other node?

Dijkstra's Algo: The algorithm maintains a set S of vertices u for which we have determined a shortest-path distance d(u) from s; this is the "explored" part of the graph. Initially S = {s}, and d(s) = 0.

Now, for each node v ∈ V − S, we determine the shortest path that can be constructed by traveling along a path through the explored part S to some u ∈ S, followed by the single edge (u, v). That is, we consider the quantity d'(v) = min over edges e = (u, v) with u ∈ S of d(u) + ℓ(e). We choose the node v ∈ V − S for which this quantity is minimized, add v to S, and define d(v) to be the value d'(v).
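
A minimal sketch of the algorithm with a binary heap, assuming the graph is given as a dict mapping each node to a list of (neighbor, length) pairs (function and variable names are mine):

```python
import heapq

# Dijkstra with lazy deletion: d holds the best distance found so far
# for each node, and becomes final the first time the node is popped.
def dijkstra(graph, s):
    d = {s: 0}
    pq = [(0, s)]                       # (tentative distance, node)
    while pq:
        dist, u = heapq.heappop(pq)
        if dist > d.get(u, float('inf')):
            continue                    # stale entry; u already finalized
        for v, length in graph[u]:
            if dist + length < d.get(v, float('inf')):
                d[v] = dist + length
                heapq.heappush(pq, (d[v], v))
    return d
```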

We always form the shortest new s-v path from a path inside S followed by a single edge. Proof of optimality: the greedy algorithm stays ahead of all other solutions; each time it selects a path to a node v, that path is shorter than every other possible path to v.

Proof of correctness: induction on |S|.
  • Inductive hypothesis: assume the claim holds for |S| = k, k ≥ 1.
  • Let v be the next node added to S by the greedy algorithm, and let (u, v) be the chosen edge.
  • The shortest s→u path plus the edge (u, v) is an s→v path of length π(v).
  • Consider any other s→v path P; it is no shorter than π(v).
  • Let (x, y) be the first edge in P that leaves S, and let P' be the subpath of P up to x.
  • P is already at least as long as π(v) as soon as it leaves S.

Question: I still cannot see why the order matters. If we do a BFS, visit all vertices, and update all the distance labels, wouldn't we get the same result?

Readable/Interesting: 8/8

4.5 The Minimum Spanning Tree Problem

Section Summary

Motivation of the problem/applications: How do we minimize the cost of laying cables? How do we find a minimum spanning tree of a connected weighted graph?

Definition: Minimum spanning tree. Spanning: spans all nodes in the graph. Tree: contains no cycles. Minimum: has the smallest total edge cost among all spanning trees.

Claim: the minimum-cost connected subgraph contains no cycle (i.e., it is a tree). Proof: by contradiction; suppose it contains a cycle. Then we can remove an edge of the cycle and keep the subgraph connected and spanning while reducing the cost, so it was not minimum.

Two properties. When is it safe to include an edge in the MST? Cut property: let S be any subset of nodes, and let e be the minimum-cost edge with exactly one endpoint in S. Then the MST contains e. Proof on P170.

When is it safe to remove an edge from the MST? Cycle property: let C be any cycle, and let f be the maximum-cost edge belonging to C. Then the MST does not contain f. Proof on P170.

cycle-cut intersection proof

Three algorithms

Prim: Start with an arbitrary node s and add it to the tree T. Grow T outward by choosing the cheapest edge e with one endpoint in T and the other outside of T. Proof on P172 via the cut property. A sketch follows.
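
A minimal sketch of Prim with a heap of crossing edges, assuming the graph is a connected dict mapping each node to a list of (neighbor, cost) pairs (names are mine):

```python
import heapq

# Prim's algorithm: repeatedly take the cheapest edge that crosses
# the cut between the tree built so far and the rest of the graph.
def prim(graph, s):
    in_tree = {s}
    tree_edges = []
    pq = [(cost, s, v) for v, cost in graph[s]]   # candidate edges (cost, u, v)
    heapq.heapify(pq)
    while pq and len(in_tree) < len(graph):
        cost, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                  # edge no longer crosses the cut
        in_tree.add(v)
        tree_edges.append((u, v, cost))
        for w, c in graph[v]:
            if w not in in_tree:
                heapq.heappush(pq, (c, v, w))
    return tree_edges
```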

Kruskal: Sort the edges of G in ascending order of cost; repeatedly pick the cheapest edge that does not create a cycle. (See the implementation sketch in Section 4.6 below.)

Reverse-Delete: the reverse of Kruskal's algorithm; sort edges in descending order of cost and delete each edge in turn, as long as the graph stays connected.

Implementations of the algorithms: Prim with a priority queue here; Kruskal's implementation is deferred to Section 4.6.

Questions: I had some trouble understanding the proof of Cayley's theorem…

4.6 Implementing Kruskal’s Algorithm: The Union-Find Data Structure

Section Summary:

Problem Motivation: How to implement Kruskal's algorithm efficiently.

Pointer-based implementation

Find operation: O(log n) (following name pointers; bounded via union by size)

Union operation: O(1) (just link one set's representative to the other)

The Kruskal algorithm in detail (a sketch follows)
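
A minimal sketch of a Union-Find with union by size plus path compression (a slightly stronger variant than the book's pointer-based structure), and Kruskal's algorithm on top of it; edges are assumed to be (cost, u, v) triples:

```python
# Union-Find with union by size and path halving.
class UnionFind:
    def __init__(self, nodes):
        self.parent = {v: v for v in nodes}
        self.size = {v: 1 for v in nodes}

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False              # already in the same component
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru          # hang the smaller tree under the larger
        self.size[ru] += self.size[rv]
        return True

# Kruskal: scan edges in ascending cost, keeping those that join
# two different components (i.e., that do not create a cycle).
def kruskal(nodes, edges):
    uf = UnionFind(nodes)
    tree = []
    for cost, u, v in sorted(edges):
        if uf.union(u, v):
            tree.append((u, v, cost))
    return tree
```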

Interesting/readable: 5/5

4.7 Clustering

Section Summary

Definition: Divide objects into clusters so that points in different clusters are far apart.

Motivation/Application: identify patterns in gene expression.

K-Clustering: partition the objects into k clusters so as to maximize the spacing, i.e., the minimum distance between any pair of points lying in different clusters.

It is basically Kruskal's algorithm, except we stop once we reach the k connected components we seek (equivalently, delete the k-1 most expensive edges of the MST). A sketch follows.
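
A minimal sketch, reusing the hypothetical UnionFind class from the Section 4.6 sketch above; edges are (distance, u, v) triples:

```python
# Greedy k-clustering: run Kruskal's edge-adding loop, but stop when
# only k components remain; the next crossing edge gives the spacing.
def k_clustering(nodes, edges, k):
    uf = UnionFind(nodes)                 # class from the 4.6 sketch
    components = len(nodes)
    for dist, u, v in sorted(edges):      # ascending distance
        if uf.find(u) == uf.find(v):
            continue                      # endpoints already clustered together
        if components == k:
            return uf, dist               # first crossing edge = the spacing
        uf.union(u, v)
        components -= 1
    return uf, None                       # fewer than k components possible
```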

Proof on P184.

Interesting/readable: 7/7

4.8 Huffman Codes and Data Compression

Section Summary

Motivation: How do we encode data so that transmission is more efficient? Answer: use fewer bits on the data while still being able to tell the symbols apart! (Low-entropy data?)

We use Huffman codes.

If all letters are used with the same frequency, then there is nothing to gain by recoding; but when frequencies differ, which is often the case, we can represent the more frequently used letters with fewer bits.

Watch out for prefix ambiguity! The code must be prefix-free: no codeword may be a prefix of another, or decoding becomes ambiguous.

variable-length encoding

Goal: minimize the Average number of Bits per Letter (ABL), computed as an expected value: ABL = Σ_x f(x)·|γ(x)|, where f(x) is the frequency of letter x and |γ(x)| is the length of its codeword.

Algorithm: repeatedly merge the two lowest-frequency letters into a single meta-letter until one remains; the resulting binary merge tree gives the prefix code.

Implementation

  • Binary tree for the prefix codes
  • Priority queue (with a heap) for choosing the node with the lowest frequency

Cost: O(n log n). Why? Each insert and extract-min costs O(log n), and we do them O(n) times.
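
A minimal sketch of Huffman's algorithm using a heap, assuming freq maps each letter to a positive frequency (the tiebreaker field just keeps the heap from ever comparing trees):

```python
import heapq

# Huffman coding: repeatedly merge the two lowest-frequency trees;
# the final merge tree yields the prefix-free code.
def huffman_code(freq):
    # Heap entries are (frequency, tiebreaker, tree); a tree is either
    # a letter or a (left, right) pair of subtrees.
    pq = [(f, i, letter) for i, (letter, f) in enumerate(freq.items())]
    heapq.heapify(pq)
    tiebreak = len(pq)
    while len(pq) > 1:
        f1, _, t1 = heapq.heappop(pq)      # two lowest-frequency trees
        f2, _, t2 = heapq.heappop(pq)
        heapq.heappush(pq, (f1 + f2, tiebreak, (t1, t2)))
        tiebreak += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):        # internal node: recurse
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                              # leaf: record the codeword
            codes[tree] = prefix or "0"    # lone-letter edge case
    walk(pq[0][2], "")
    return codes
```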

Interesting/readable: 5/8
