Chapter 4
Greedy Algorithms
“Greed… is good. Greed is right. Greed works” - Wall Street.
Greedy algorithm definition: build up a solution in small steps, making a decision at each step that is locally optimal (myopically best at that moment).
Two methods of proof that greedy is optimal: show the greedy algorithm "stays ahead" of any other solution, or use an exchange argument.
All this leads up to minimum spanning tree!
4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead
Greedy rule that works: always accept the compatible request with the earliest finish time.
Show that this algorithm is optimal
need to show that this algorithm produces a set of intervals the same size as the optimal solution
To prove this algorithm is optimal we need a lemma: greedy "stays ahead", i.e. for every r, the r-th interval greedy selects finishes no later than the r-th interval in any optimal solution.
now we can prove the greedy algorithm is optimal. see page 121. (a Python sketch of the rule is below)
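here's a minimal Python sketch of the earliest-finish-time rule (my own toy version; I'm assuming intervals are (start, finish) pairs and that touching endpoints count as compatible):

```python
# Earliest-finish-time greedy for interval scheduling.
# intervals: list of (start, finish) pairs.
def interval_schedule(intervals):
    accepted = []
    last_finish = float("-inf")
    # sort by finish time, then take each request that is
    # compatible with the last one accepted
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:
            accepted.append((start, finish))
            last_finish = finish
    return accepted

print(interval_schedule([(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9)]))
# -> [(1, 4), (5, 7)]
```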
Scheduling all Intervals
This is similar to the classroom multi-resource problem we did in class.
4.4: in any instance of Interval Partitioning, the number of resources needed is at least the depth of the set of intervals.
algorithm on p. 124
not sure how to code the "exclusion" idea in python; one way is sketched after (4.6) below.
4.5: if we use the greedy algorithm above, every interval will be assigned a label, and no two overlapping intervals will receive the same label.
4.6: the greedy algorithm above schedules every interval on a resource, using a number of resources equal to the depth of the set of intervals. This is the optimal number of resources needed.
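a sketch of one way to do the "exclusion" in Python: keep a min-heap of busy resources keyed on finish time; a resource whose finish time is at most the new start is no longer excluded, so its label can be reused (the heap is my substitute for the book's explicit exclusion step):

```python
import heapq

# interval partitioning: sort by start time, give each interval the
# first label not excluded by an overlapping interval
def partition_intervals(intervals):
    busy = []        # heap of (finish_time, label) for in-use resources
    assignment = []  # (interval, label) pairs
    next_label = 1
    for start, finish in sorted(intervals):
        if busy and busy[0][0] <= start:
            _, label = heapq.heappop(busy)  # reuse a freed resource
        else:
            label = next_label              # open a new resource
            next_label += 1
        assignment.append(((start, finish), label))
        heapq.heappush(busy, (finish, label))
    return assignment

jobs = [(0, 3), (1, 4), (2, 5), (4, 7), (5, 8)]
print(partition_intervals(jobs))  # 3 labels used = depth of the set
```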
4.2 Scheduling to Minimize Lateness: An Exchange Argument
4.3 Optimal Caching: A More Complex Exchange Argument
A problem that focuses on a sequence of requests of a different form.
I don't think we covered this in class and it's not listed on the schedule, so I'm not going to write on it. If you want me to, just let me know!
4.4 Shortest Paths in a Graph
Problem = how to find the shortest path from one node to another, or from one node to all other nodes.
Dijkstra's algorithm, on page 138
4.14: consider the set S at any point in the algorithm's execution. For each u ∈ S, the path Pu is a shortest s-u path.
4.15: using a priority queue, Dijkstra's algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations.
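a small Python sketch of Dijkstra (my own version: heapq stands in for the priority queue, and I use lazy deletion instead of a ChangeKey operation; the graph format is an assumption):

```python
import heapq

# Dijkstra's algorithm with a binary heap.
# graph: {node: [(neighbor, cost), ...]} with nonnegative costs.
def dijkstra(graph, s):
    dist = {s: 0}
    explored = set()      # the set S from (4.14)
    pq = [(0, s)]         # (distance estimate, node)
    while pq:
        d, u = heapq.heappop(pq)
        if u in explored:
            continue      # stale heap entry, skip it
        explored.add(u)
        for v, cost in graph.get(u, []):
            if v not in dist or d + cost < dist[v]:
                dist[v] = d + cost
                heapq.heappush(pq, (dist[v], v))
    return dist

g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 3)]}
print(dijkstra(g, "s"))  # {'s': 0, 'a': 1, 'b': 3, 't': 6}
```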
4.5 The Minimum Spanning Tree Problem
problem: find the cheapest set of edges that connects all the vertices.
4.16: Let T be a minimum-cost solution to the network design problem defined above. Then (V,T) is a tree.
Three algorithms that work:
Kruskal's: orders edges from low cost to high cost and keeps adding the minimum-cost edge that does not create a cycle.
Prim's: pick a source node and repeatedly add the node that can be attached to the tree most cheaply.
Reverse-Delete: delete the highest-cost edges, as long as doing so does not disconnect the graph.
4.17: assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V − S. Then every minimum spanning tree contains the edge e.
on pgs 146-148 the book discusses why the three algorithms above are optimal.
pgs 149-150 talk about the implementation of Prim's algorithm (a heap-based sketch is below).
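my toy version of Prim's with heapq (again lazy deletion instead of ChangeKey; the graph format and names are my own assumptions):

```python
import heapq

# Prim's algorithm: grow the tree from a source node, always
# attaching the node that can be added most cheaply.
# graph: {node: [(neighbor, cost), ...]}, undirected and connected.
def prim(graph, source):
    in_tree = {source}
    tree_edges = []
    pq = [(cost, source, v) for v, cost in graph[source]]
    heapq.heapify(pq)
    while pq and len(in_tree) < len(graph):
        cost, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                 # edge leads back into the tree
        in_tree.add(v)
        tree_edges.append((u, v, cost))
        for w, c in graph[v]:
            if w not in in_tree:
                heapq.heappush(pq, (c, v, w))
    return tree_edges

g = {"a": [("b", 1), ("c", 4)],
     "b": [("a", 1), ("c", 2)],
     "c": [("a", 4), ("b", 2)]}
print(prim(g, "a"))  # [('a', 'b', 1), ('b', 'c', 2)]
```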
4.6 Implementing Kruskal's Algorithm: The Union-Find Data Structure
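I didn't take notes here, but here's a minimal sketch of the idea as I understand it: union-find keeps track of the connected components, so Kruskal's can quickly ask "does this edge close a cycle?" (this version with path compression and union by size is my own; the names are mine, not the book's):

```python
# union-find: each component has a representative; find() follows
# parent pointers to it, union() merges two components.
class UnionFind:
    def __init__(self, elements):
        self.parent = {x: x for x in elements}
        self.size = {x: 1 for x in elements}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # compress path
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False        # same component: edge would make a cycle
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx    # hang smaller component under larger
        self.size[rx] += self.size[ry]
        return True

def kruskal(nodes, edges):
    # edges: list of (cost, u, v); keep an edge iff it joins two
    # different components, i.e. it does not create a cycle
    uf = UnionFind(nodes)
    return [(u, v, c) for c, u, v in sorted(edges) if uf.union(u, v)]

edges = [(1, "a", "b"), (2, "b", "c"), (4, "a", "c")]
print(kruskal(["a", "b", "c"], edges))  # [('a', 'b', 1), ('b', 'c', 2)]
```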
4.7 Clustering
The problem
clustering arises when you have a collection of objects and want to divide them into coherent groups.
a common approach is to define a distance function on the objects, where the larger the distance, the less "related" the objects are to each other. In a physical setting this function could be literal distance.
Given a distance function, clustering seeks to divide the objects into groups so that objects in the same group are close together and objects in different groups are far apart.
clustering of maximum spacing
further def of distance function: d(i,i) = 0, d(i,j) > 0 for i ≠ j, and d(i,j) = d(j,i).
if we want k groups we look for a k-clustering: a partition of the objects into k non-empty sets.
we also want the clustering of maximum possible spacing: the spacing of a clustering is the minimum distance between points in different clusters, and we want it as large as possible.
Designing the algorithm:
Analyzing the algorithm
4.26: The components C1, C2, …, Ck formed by deleting the k − 1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.
proof on page 150.
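here's my sketch of (4.26) in code: run Kruskal's but stop when k clusters remain, which is the same as deleting the k − 1 most expensive MST edges; the first skipped inter-cluster distance is the spacing (the function names and the brute-force edge list are my own):

```python
# max-spacing k-clustering via Kruskal's stopped early
def max_spacing(points, dist, k):
    rep = {p: p for p in points}     # point -> cluster representative

    def find(p):
        while rep[p] != p:
            rep[p] = rep[rep[p]]
            p = rep[p]
        return p

    edges = sorted((dist(p, q), p, q)
                   for i, p in enumerate(points) for q in points[i + 1:])
    n_clusters = len(points)
    for d, p, q in edges:
        rp, rq = find(p), find(q)
        if rp == rq:
            continue                 # already in the same cluster
        if n_clusters == k:
            return d                 # spacing of the k-clustering
        rep[rp] = rq                 # merge the two closest clusters
        n_clusters -= 1

pts = [0, 1, 2, 10, 11, 25]
print(max_spacing(pts, lambda p, q: abs(p - q), 3))  # -> 8
```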
4.8 Huffman Codes and Data Compression
encoding symbols using bits
convert text into binary aka long strings of bits
could assign a fixed number of bits to each letter. b bits give 2^b distinct codewords, so k letters need about log2(k) bits each (rounded up).
this might not be the most efficient.
how do you lower the average number of bits per letter?
Variable-length encoding schemes
prefix code
means that no codeword is a prefix of another codeword. don't need spaces between letters.
ex. a = 00, b = 10, c = 11. 110010 is clearly cab.
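a tiny sketch of why no spaces are needed: scan bits and emit a letter as soon as the buffer matches a codeword (the prefix property guarantees the first match is the right one; the dict format is my own):

```python
# decode a bit string under a prefix code, no separators needed
code = {"00": "a", "10": "b", "11": "c"}   # the example above

def decode(bits, code):
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in code:          # prefix property: first match is correct
            out.append(code[buf])
            buf = ""
    return "".join(out)

print(decode("110010", code))   # -> "cab"
```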
we will want an optimal prefix code, i.e. fewer bits for more frequent letters.
see def. of encoding length on page 165 (the sum over letters of frequency × length of the letter's encoding).
designing the algorithm: build the tree bottom-up by repeatedly merging the two lowest-frequency letters (sketched below).
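my heapq sketch of the bottom-up merge (the tie-breaking counter is just so the heap never has to compare tree nodes; the freqs format and example frequencies are my own):

```python
import heapq
from itertools import count

# Huffman's algorithm: repeatedly merge the two lowest-frequency
# letters/subtrees into a node whose frequency is their sum.
def huffman_code(freqs):
    tiebreak = count()
    heap = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    _, _, tree = heap[0]

    codes = {}
    def walk(node, prefix):                  # read codewords off the tree
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(tree, "")
    return codes

freqs = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
print(huffman_code(freqs))
# -> {'c': '00', 'e': '010', 'd': '011', 'b': '10', 'a': '11'}
```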
Analyzing Huffman's algorithm
proof of optimality on p. 174 (very mathy, uses summation notation rather than words).
the second part of the proof is by contradiction.
see pages 174-175
Running time = O(k log k) with a priority queue, where k is the number of letters.
extensions
can use it with images and to encode messages.
I can't remember where or why I went to this (was it at UDel?), but I remember going to a talk and listening to a woman describe how her job was to do this to encode messages in images.