Chapter 4

My notes on Chapter 4 readings

Intro & 4.1: Interval Scheduling

  • A greedy algorithm builds up a solution through a sequence of locally optimal choices that together yield a globally optimal solution.
  • We can show optimality by showing that greedy "stays ahead" of every other solution, or by an exchange argument: transform an optimal solution step by step into the greedy solution without ever making it worse.
  • We can find a greedy algorithm to solve interval scheduling problems
    • We solve this by taking all requests and, while requests remain to be processed, choosing a request W with the smallest finishing time, adding it to our schedule, then deleting from the pool all requests incompatible with W (first sketch after this section's notes).
  • We can complete this in O(n log n) time: sort once by finishing time, then do a single linear scan.
  • We can extend this to interval partitioning, in which we must schedule every interval and the number of resources needed is at least the depth of the set of intervals - that is, the maximum number of intervals overlapping at any single point in time.
    • We can solve this by sorting intervals by start time, then for each interval either placing it on the first resource that has freed up, or allotting a new resource if none is available; this uses exactly depth-many resources (second sketch below).
  • This section wasn't very interesting to read. 6/10.
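
Here's a quick Python sketch I wrote of the earliest-finish-time rule (my own toy code, not the book's; I'm assuming requests come as (start, finish) pairs and that a request starting exactly when another finishes counts as compatible):

<code python>
def interval_schedule(requests):
    """Greedy interval scheduling: repeatedly take the compatible
    request with the earliest finish time."""
    schedule = []
    last_finish = float("-inf")
    # Sorting by finish time is the O(n log n) step; the scan is O(n).
    for start, finish in sorted(requests, key=lambda r: r[1]):
        if start >= last_finish:     # compatible with everything chosen so far
            schedule.append((start, finish))
            last_finish = finish
    return schedule

# interval_schedule([(0, 3), (2, 5), (4, 7), (6, 9)]) -> [(0, 3), (4, 7)]
</code>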
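
And a sketch of interval partitioning under the same assumptions, with a min-heap of busy resources standing in for "the first resource available":

<code python>
import heapq

def interval_partition(intervals):
    """Assign each (start, finish) interval a numbered resource;
    returns (interval, resource) pairs using exactly depth-many resources."""
    assignment = []
    busy = []                    # min-heap of (finish_time, resource_id)
    resources = 0
    for start, finish in sorted(intervals):     # sort by start time
        if busy and busy[0][0] <= start:
            _, r = heapq.heappop(busy)          # reuse the earliest-freed resource
        else:
            r = resources                       # none free: allot a new resource
            resources += 1
        assignment.append(((start, finish), r))
        heapq.heappush(busy, (finish, r))
    return assignment
</code>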

4.2: Minimizing Lateness

  • Say we want to schedule jobs, each with a length and a deadline, back to back on a single resource so that the maximum lateness of any job (its finish time minus its deadline, when positive) is minimized.
  • The interval scheduling approach doesn't carry over: there we served as many requests as we could and dropped the rest, but here every job must be scheduled.
  • We can't do jobs in increasing order of length, nor of slack time - both fail on simple counterexamples. Instead, we order by deadline (see the sketch after this section's notes)!
  • We prove this by exchange argument: any optimal schedule that isn't the greedy one must contain idle time or inversions (a job with a later deadline scheduled before one with an earlier deadline). Removing the idle time and swapping away the inversions never increases the maximum lateness, and leaves us with exactly the greedy schedule.
  • Not that interesting a section, kind of overly wordy. 5/10.
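
A minimal Earliest-Deadline-First sketch (my own illustration; jobs are (length, deadline) pairs and time starts at 0):

<code python>
def minimize_lateness(jobs):
    """Schedule jobs in order of deadline, back to back with no idle time.
    Returns ((start, finish, deadline) triples, maximum lateness)."""
    schedule, now, max_late = [], 0, 0
    for length, deadline in sorted(jobs, key=lambda j: j[1]):  # by deadline
        start, now = now, now + length
        schedule.append((start, now, deadline))
        max_late = max(max_late, now - deadline)  # lateness = finish - deadline
    return schedule, max_late
</code>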

4.4: Shortest Paths

  • We want to find the shortest path in a graph, either from some node s to some other node t or from s to every other node.
  • We can use Dijkstra's algorithm to do this for directed graphs:
    • Explore nodes by repeatedly selecting the next (unvisited) node v that minimizes d(u) + l(u, v) over edges (u, v) leaving the already-discovered set, and setting d(v) to that minimum.
    • Add v to the discovered set and repeat.
  • Using a priority queue, Dijkstra's algorithm runs in O(m log n) time for m edges and n nodes (sketch below).
  • 6/10.
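
A heap-based Dijkstra sketch (my own code; I'm assuming the graph is an adjacency dict {node: [(neighbor, length), ...]} with nonnegative lengths):

<code python>
import heapq

def dijkstra(graph, s):
    """Shortest-path distances from s to every reachable node."""
    dist = {}                            # finalized distances
    heap = [(0, s)]                      # (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if u in dist:
            continue                     # already finalized via a shorter path
        dist[u] = d                      # smallest tentative distance is final
        for v, w in graph.get(u, ()):
            if v not in dist:
                heapq.heappush(heap, (d + w, v))
    return dist
</code>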

4.5: Minimum Spanning Trees

  • A minimum spanning tree of a graph G is a minimum-cost subset of G's edges that keeps all of G's vertices connected.
  • Such a minimum-cost subset will necessarily be a tree: if it contained a cycle, we could delete a cycle edge and stay connected for less.
  • Three algorithms to build a minimum spanning tree consist of doing the following:
    • Adding edges to the MST in increasing order of cost, skipping any edge that would form a cycle with the edges already added,
    • Beginning at a vertex s and growing a tree, at each step adding the cheapest edge with exactly one endpoint inside the tree,
    • Beginning with the original G and, in decreasing order of edge cost, deleting each edge provided the deletion does not disconnect the graph.
  • These three algorithms are Kruskal's, Prim's, and the Reverse-Delete algorithms, respectively.
  • We can implement Prim's algorithm in O(m log n) time with a priority queue, much as in Dijkstra's algorithm (sketch below).
  • This section wasn't interesting. 4/10.
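
A heap-based Prim sketch (same adjacency-dict assumption as my Dijkstra sketch, with each undirected edge listed from both endpoints):

<code python>
import heapq

def prim(graph, s):
    """Grow a tree from s, always attaching the cheapest edge that
    leaves the tree; each edge costs O(log n) heap work."""
    in_tree = {s}
    mst = []
    heap = [(cost, s, v) for v, cost in graph[s]]
    heapq.heapify(heap)
    while heap:
        cost, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue                     # edge no longer leaves the tree
        in_tree.add(v)
        mst.append((u, v, cost))
        for w, c in graph[v]:
            if w not in in_tree:
                heapq.heappush(heap, (c, v, w))
    return mst
</code>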

4.6: Kruskal's Algorithm

  • We can implement Kruskal's algorithm using a pointer-based Union-Find structure supporting MakeSet (create a singleton set), Find (which set contains this element?), and Union (merge two sets).
  • With this structure, sorting the edges dominates, and Kruskal's algorithm runs in O(m log n) time (sketch below).
  • Not that interesting, 5/10
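
A Kruskal sketch with an array-based Union-Find (my own simplification of the book's pointer-based structure; nodes are 0..n-1 and edges are (cost, u, v) triples):

<code python>
def kruskal(n, edges):
    """Union by size plus path compression keeps each Find nearly
    constant, so sorting the edges dominates: O(m log n) overall."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path compression (halving)
            x = parent[x]
        return x

    mst = []
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                         # edge would close a cycle
        if size[ru] < size[rv]:              # union by size
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]
        mst.append((u, v, cost))
    return mst
</code>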

4.7: Clustering

  • We can use clustering when we want to group objects based on their similarity.
  • In order to find a k-clustering with maximum spacing (points in different clusters are as far apart as possible), we run Kruskal's algorithm but stop just before the merge that would leave fewer than k connected components - equivalently, build the full MST and delete its k-1 most expensive edges (sketch below).
  • I had a phone interview with a clustering question and I had no idea what to do. This section is a 10/10 because Epic pays really well and asks about clustering in interviews.
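
A sketch of Kruskal-stopped-early (reusing the Union-Find idea above; edges are (distance, u, v) and nodes are 0..n-1):

<code python>
def k_clusters(n, edges, k):
    """Merge the closest pair of clusters until exactly k remain.
    Returns (a cluster label per node, the spacing achieved)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for dist, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                 # u and v already share a cluster
        if components == k:
            # Cheapest edge between two different clusters = the spacing.
            return [find(x) for x in range(n)], dist
        parent[rv] = ru
        components -= 1
    return [find(x) for x in range(n)], None
</code>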

4.8: Huffman Codes & Data Compression

  • Data compression asks how to encode information in as few bits as possible, and Huffman's solution works by making the problem recursive: shrink the alphabet, solve the smaller problem, and expand back.
  • Applications include encoding symbols in bits, finding prefix codes (1:1 mappings from letters to bit strings in which no code word is a prefix of another), etc.
  • We can represent prefix codes by using binary trees:
    • Take a binary tree whose number of leaves equals the size of our alphabet, and label the leaves with the distinct letters of the alphabet.
      • To encode a letter s, follow the path from the root to the leaf labeled s, writing a 0 each time we move to a left child and a 1 each time we move to a right child.
  • Huffman's algorithm takes this a step further: if the alphabet S has more than two letters, form an alphabet S' by deleting the two lowest-frequency letters and replacing them with a single new letter whose frequency is their sum. Recursively construct a prefix code for S', then obtain a prefix code for S by expanding the new letter's leaf back into the two letters it replaced.
  • This algorithm achieves the minimum possible average number of bits per letter over all prefix codes, and runs in O(k^2) time naively. If we use priority queues, we get O(k log k) time (sketch below).
  • I know this should have been interesting, but since I missed class I'm just basing my opinion on the book which was not interesting. 4/10.
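
A Huffman sketch using Python's heapq (my own code; it builds the bit strings directly instead of an explicit tree, and assumes freqs maps each letter to its frequency):

<code python>
import heapq
from itertools import count

def huffman(freqs):
    """Return {letter: bit string}. Repeatedly merges the two
    lowest-frequency subtrees; the heap makes this O(k log k)."""
    tiebreak = count()      # break frequency ties without comparing dicts
    heap = [(f, next(tiebreak), {letter: ""}) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two rarest subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update((s, "1" + code) for s, code in right.items())
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2] if heap else {}

# huffman({"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.15})
#   -> {"a": "0", "b": "10", "c": "110", "d": "111"}
</code>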