Chapter 4
Greedy Algorithms
“Greed… is good. Greed is right. Greed works” - Wall Street.
Greedy algorithm definition: build up a solution in small steps, making a decision at each step that is locally optimal (myopically best at that moment).
Two methods of proof that greedy is optimal: show the greedy algorithm "stays ahead" of any other solution, or use an exchange argument.
All this leads up to minimum spanning tree!
4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead
Greedy rule that works: always accept the compatible request with the earliest finish time.
Show that this algorithm is optimal
need to show that this algorithm produces a set of intervals the same size as the optimal solution
To prove this algorithm is optimal we need a lemma: greedy "stays ahead", i.e. for every r, the r-th interval greedy selects finishes no later than the r-th interval in any optimal solution.
now we can prove the greedy algorithm is optimal. see page 121. (a Python sketch of the rule is below)
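here's a minimal Python sketch of the earliest-finish-time rule (my own toy version; I'm assuming intervals are (start, finish) pairs and that touching endpoints count as compatible):

```python
# Earliest-finish-time greedy for interval scheduling.
# intervals: list of (start, finish) pairs.
def interval_schedule(intervals):
    accepted = []
    last_finish = float("-inf")
    # sort by finish time, then take each request that is
    # compatible with the last one accepted
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:
            accepted.append((start, finish))
            last_finish = finish
    return accepted

print(interval_schedule([(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9)]))
# -> [(1, 4), (5, 7)]
```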
Scheduling all Intervals
This is similar to the classroom multi-resource problem we did in class.
4.4: in any instance of Interval Partitioning, the number of resources needed is at least the depth of the set of intervals.
algorithm on p. 124
not sure how to code the "exclusion" idea in python; one way is sketched after (4.6) below.
4.5: if we use the greedy algorithm above, every interval will be assigned a label, and no two overlapping intervals will receive the same label.
4.6: the greedy algorithm above schedules every interval on a resource, using a number of resources equal to the depth of the set of intervals. This is the optimal number of resources needed.
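a sketch of one way to do the "exclusion" in Python: keep a min-heap of busy resources keyed on finish time; a resource whose finish time is at most the new start is no longer excluded, so its label can be reused (the heap is my substitute for the book's explicit exclusion step):

```python
import heapq

# interval partitioning: sort by start time, give each interval the
# first label not excluded by an overlapping interval
def partition_intervals(intervals):
    busy = []        # heap of (finish_time, label) for in-use resources
    assignment = []  # (interval, label) pairs
    next_label = 1
    for start, finish in sorted(intervals):
        if busy and busy[0][0] <= start:
            _, label = heapq.heappop(busy)  # reuse a freed resource
        else:
            label = next_label              # open a new resource
            next_label += 1
        assignment.append(((start, finish), label))
        heapq.heappush(busy, (finish, label))
    return assignment

jobs = [(0, 3), (1, 4), (2, 5), (4, 7), (5, 8)]
print(partition_intervals(jobs))  # 3 labels used = depth of the set
```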
4.2 Scheduling to Minimize Lateness: An Exchange Argument
4.3 Optimal Caching: A More Complex Exchange Argument
A problem that focuses on a sequence of requests of a different form.
I don't think we covered this in class and it's not listed on the schedule, so I'm not going to write on it. If you want me to, just let me know!
4.4 Shortest Paths in a Graph
Problem = how to find the shortest path from one node to another, or from one node to all other nodes.
Dijkstra's algorithm, on page 138
4.14: consider the set S at any point in the algorithm's execution. For each u ∈ S, the path Pu is a shortest s-u path.
4.15: using a priority queue, Dijkstra's algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations.
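a small Python sketch of Dijkstra (my own version: heapq stands in for the priority queue, and I use lazy deletion instead of a ChangeKey operation; the graph format is an assumption):

```python
import heapq

# Dijkstra's algorithm with a binary heap.
# graph: {node: [(neighbor, cost), ...]} with nonnegative costs.
def dijkstra(graph, s):
    dist = {s: 0}
    explored = set()      # the set S from (4.14)
    pq = [(0, s)]         # (distance estimate, node)
    while pq:
        d, u = heapq.heappop(pq)
        if u in explored:
            continue      # stale heap entry, skip it
        explored.add(u)
        for v, cost in graph.get(u, []):
            if v not in dist or d + cost < dist[v]:
                dist[v] = d + cost
                heapq.heappush(pq, (dist[v], v))
    return dist

g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 3)]}
print(dijkstra(g, "s"))  # {'s': 0, 'a': 1, 'b': 3, 't': 6}
```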
4.5 The Minimum Spanning Tree Problem
problem: find the cheapest set of edges that connects all the vertices.
4.16: Let T be a minimum-cost solution to the network design problem defined above. Then (V,T) is a tree.
Three algorithms that work:
Kruskal's: orders edges from low cost to high cost and keeps adding the minimum-cost edge that does not create a cycle.
Prim's: pick a source node and repeatedly add the node that can be attached to the tree most cheaply.
Reverse-Delete: delete the highest-cost edges, as long as doing so does not disconnect the graph.
4.17: assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V − S. Then every minimum spanning tree contains the edge e.
on pgs 146-148 the book discusses why the three algorithms above are optimal.
pgs 149-150 talk about the implementation of Prim's algorithm (a heap-based sketch is below).
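my toy version of Prim's with heapq (again lazy deletion instead of ChangeKey; the graph format and names are my own assumptions):

```python
import heapq

# Prim's algorithm: grow the tree from a source node, always
# attaching the node that can be added most cheaply.
# graph: {node: [(neighbor, cost), ...]}, undirected and connected.
def prim(graph, source):
    in_tree = {source}
    tree_edges = []
    pq = [(cost, source, v) for v, cost in graph[source]]
    heapq.heapify(pq)
    while pq and len(in_tree) < len(graph):
        cost, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                 # edge leads back into the tree
        in_tree.add(v)
        tree_edges.append((u, v, cost))
        for w, c in graph[v]:
            if w not in in_tree:
                heapq.heappush(pq, (c, v, w))
    return tree_edges

g = {"a": [("b", 1), ("c", 4)],
     "b": [("a", 1), ("c", 2)],
     "c": [("a", 4), ("b", 2)]}
print(prim(g, "a"))  # [('a', 'b', 1), ('b', 'c', 2)]
```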
4.6 Implementing Kruskal's Algorithm: The Union-Find Data Structure
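I didn't take notes here, but here's a minimal sketch of the idea as I understand it: union-find keeps track of the connected components, so Kruskal's can quickly ask "does this edge close a cycle?" (this version with path compression and union by size is my own; the names are mine, not the book's):

```python
# union-find: each component has a representative; find() follows
# parent pointers to it, union() merges two components.
class UnionFind:
    def __init__(self, elements):
        self.parent = {x: x for x in elements}
        self.size = {x: 1 for x in elements}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # compress path
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False        # same component: edge would make a cycle
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx    # hang smaller component under larger
        self.size[rx] += self.size[ry]
        return True

def kruskal(nodes, edges):
    # edges: list of (cost, u, v); keep an edge iff it joins two
    # different components, i.e. it does not create a cycle
    uf = UnionFind(nodes)
    return [(u, v, c) for c, u, v in sorted(edges) if uf.union(u, v)]

edges = [(1, "a", "b"), (2, "b", "c"), (4, "a", "c")]
print(kruskal(["a", "b", "c"], edges))  # [('a', 'b', 1), ('b', 'c', 2)]
```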
4.7 Clustering
The problem
clustering arises when you have a collection of objects and want to divide them into coherent groups.
a common approach is to define a distance function on the objects, where the larger the distance, the less "related" the objects are to each other. In a physical setting this function could be literal distance.
Given a distance function, clustering seeks to divide the objects into groups so that objects in the same group are close together and objects in different groups are far apart.
clustering of maximum spacing
further def of distance function: d(i,i) = 0, d(i,j) > 0 for i ≠ j, and d(i,j) = d(j,i).
if we want k groups we look for a k-clustering: a partition of the objects into k non-empty sets.
we also want the clustering of maximum possible spacing: the spacing of a clustering is the minimum distance between points in different clusters, and we want it as large as possible.
Designing the algorithm:
Analyzing the algorithm
4.26: The components C1, C2, …, Ck formed by deleting the k − 1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.
proof on page 150.
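here's my sketch of (4.26) in code: run Kruskal's but stop when k clusters remain, which is the same as deleting the k − 1 most expensive MST edges; the first skipped inter-cluster distance is the spacing (the function names and the brute-force edge list are my own):

```python
# max-spacing k-clustering via Kruskal's stopped early
def max_spacing(points, dist, k):
    rep = {p: p for p in points}     # point -> cluster representative

    def find(p):
        while rep[p] != p:
            rep[p] = rep[rep[p]]
            p = rep[p]
        return p

    edges = sorted((dist(p, q), p, q)
                   for i, p in enumerate(points) for q in points[i + 1:])
    n_clusters = len(points)
    for d, p, q in edges:
        rp, rq = find(p), find(q)
        if rp == rq:
            continue                 # already in the same cluster
        if n_clusters == k:
            return d                 # spacing of the k-clustering
        rep[rp] = rq                 # merge the two closest clusters
        n_clusters -= 1

pts = [0, 1, 2, 10, 11, 25]
print(max_spacing(pts, lambda p, q: abs(p - q), 3))  # -> 8
```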
4.8 Huffman Codes and Data Compression
encoding symbols using bits
convert text into binary aka long strings of bits
could assign a fixed number of bits to each letter. b bits give 2^b distinct codewords, so k letters need about log2(k) bits each (rounded up).
this might not be the most efficient.
how do you lower the average number of bits per letter?
Variable-length encoding schemes
prefix code
means that no codeword is a prefix of another codeword. don't need spaces between letters.
ex. a = 00, b = 10, c = 11. 110010 is clearly cab.
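a tiny sketch of why no spaces are needed: scan bits and emit a letter as soon as the buffer matches a codeword (the prefix property guarantees the first match is the right one; the dict format is my own):

```python
# decode a bit string under a prefix code, no separators needed
code = {"00": "a", "10": "b", "11": "c"}   # the example above

def decode(bits, code):
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in code:          # prefix property: first match is correct
            out.append(code[buf])
            buf = ""
    return "".join(out)

print(decode("110010", code))   # -> "cab"
```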
we will want an optimal prefix code, i.e. fewer bits for more frequent letters.
see def. of encoding length on page 165 (the sum over letters of frequency × length of the letter's encoding).
designing the algorithm: build the tree bottom-up by repeatedly merging the two lowest-frequency letters (sketched below).
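my heapq sketch of the bottom-up merge (the tie-breaking counter is just so the heap never has to compare tree nodes; the freqs format and example frequencies are my own):

```python
import heapq
from itertools import count

# Huffman's algorithm: repeatedly merge the two lowest-frequency
# letters/subtrees into a node whose frequency is their sum.
def huffman_code(freqs):
    tiebreak = count()
    heap = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    _, _, tree = heap[0]

    codes = {}
    def walk(node, prefix):                  # read codewords off the tree
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(tree, "")
    return codes

freqs = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
print(huffman_code(freqs))
# -> {'c': '00', 'e': '010', 'd': '011', 'b': '10', 'a': '11'}
```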
Analyzing Huffman's algorithm
proof of optimality on p. 174 (very mathy, uses summation notation rather than words).
the second part of the proof is by contradiction.
see pages 174-175
Running time = O(k log k) with a priority queue, where k is the number of letters.
extensions
can use it with images and to encode messages.
I can't remember where or why I went to this (was it at UDel?), but I remember going to a talk and listening to a woman describe how her job was to do this to encode messages in images.