Paul's Journal

Preface

  • Mentions the problem of P vs. NP. I was never really all that good on this topic. Interesting. Must refresh myself.
  • Mainly a pep talk on why I should take this course.
  • Designed for undergrad.

Chapter 1

  • First Problem: Stable Matching.
    • w = women
    • m = men
    • Matching is stable if the following conditions hold:
      • If w prefers m' over her husband, m, m' does not prefer w over his own wife, w'.
      • If m prefers w' over his wife, w, w' does not prefer m over her own husband, m'.
    • M = set of n men.
    • W = set of n women.
    • A matching S is a set of ordered pairs from M×W with the property that each member of M and each member of W appears in at most one pair in S.
    • A perfect matching S' is a matching with the property that each member of M and each member of W appears in exactly one pair in S'.
    • Gale-Shapley Stable Matching Algorithm
      • Initialize each person to be free
      • while (some man is free and hasn't proposed to every woman)
        • Choose such a man m
        • w = 1st woman on m's list to whom m has not yet proposed
        • if (w is free)
          • assign m and w to be engaged
        • else if (w prefers m to her fiancé m')
          • assign m and w to be engaged and m' to be free
        • else
          • w rejects m
    • (1.1) w remains engaged from the point at which she receives her first proposal; and the sequence of partners to which she is engaged gets better and better (in terms of her preference list)
    • (1.2) The sequence of women to whom m proposes gets worse and worse (in terms of his preference list).
    • (1.3) G-S algorithm takes at most n^2 iterations.
    • (1.4) If m is free at some point in the execution of the algorithm, then there is a woman to whom he has not yet proposed.
    • (1.5) G-S algorithm always returns a perfect matching.
    • (1.6) G-S algorithm always returns a stable matching.
    • Because men propose, the resulting stable matching is man-optimal: every man ends up with the best partner he could have in any stable matching, while every woman ends up with her worst valid partner. Unfair towards women.
    • (1.7) G-S algorithm always returns the same stable matching set.
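  • To make sure I could actually implement G-S, here's my own rough Python sketch (not the book's code; names like man_pref / fiance are just what I made up):

    def gale_shapley(man_pref, woman_pref):
        # man_pref[m] = list of women in m's order of preference
        # woman_pref[w] = list of men in w's order of preference
        n = len(man_pref)
        # woman_rank[w][m] = position of m on w's list, for O(1) "does w prefer?" checks
        woman_rank = [{m: i for i, m in enumerate(prefs)} for prefs in woman_pref]
        next_proposal = [0] * n      # Next[m]: index of the next woman on m's list
        fiance = [None] * n          # Current[w]: the man w is engaged to
        free_men = list(range(n))    # stack of free men
        while free_men:
            m = free_men.pop()
            w = man_pref[m][next_proposal[m]]
            next_proposal[m] += 1
            if fiance[w] is None:                                  # w is free
                fiance[w] = m
            elif woman_rank[w][m] < woman_rank[w][fiance[w]]:      # w prefers m
                free_men.append(fiance[w])                         # dump the old fiance
                fiance[w] = m
            else:                                                  # w rejects m
                free_men.append(m)
        return [(fiance[w], w) for w in range(n)]                  # perfect, stable matching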

Chapter 2

  • Polynomial time:
    • There exist constants c > 0 and d > 0 such that on every input of size N, the running time is bounded by c·N^d steps.
  • Good rule to go by: An algorithm is efficient if its running time is polynomial.
  • T(n) is the worst case running time of an algorithm.
  • We say that T(n) is O(f(n)) if there exist constants c > 0 and n0 ≥ 0 such that for all n ≥ n0, we have T(n) ≤ c · f(n).
    • T(n) is asymptotically upper bounded by f.
    • T(n) is order f(n).
  • T(n) is Ω(f(n)) if there exist constants ε > 0 and n0 ≥ 0 such that for all n ≥ n0 , we have T(n) ≥ εf(n).
    • T(n) is asymptotically lower bounded by f.
  • T(n) is Θ(f(n)) if T(n) is both O(f(n)) and Ω(f(n)).
  • O == upper bound == T grows no faster than f (this is about growth rate, not the same thing as "worst case")
  • Ω == lower bound == T grows at least as fast as f
  • Θ == tight bound == T grows at the same rate as f; it fits the running time just right
  • Properties:
    • Transitivity:
      • If f = O(g) and g = O(h) then f = O(h).
      • Also works with Ω and Θ.
    • Additivity:
      • If f = O(h) and g = O(h) then f + g = O(h).
      • Also works with Ω and Θ.
  • Constant time is better than logarithmic time.
  • Logarithmic time is better than polynomial time.
  • Polynomial time is better than exponential time.
  • Same stuff from CS 111/112 about run times. Linear, quadratic, logarithmic, etc.
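  • To convince myself about the O(f(n)) definition, a tiny Python check (my own toy example, nothing from the book): it tests that T(n) = 3n^2 + 10n is O(n^2) by trying the constants c = 4 and n0 = 10 on a range of n.

    def T(n):
        return 3 * n**2 + 10 * n

    c, n0 = 4, 10
    # c*n^2 >= 3n^2 + 10n whenever n >= 10, since then n^2 >= 10n
    assert all(T(n) <= c * n**2 for n in range(n0, 10000))
    print("T(n) <= 4*n^2 for all tested n >= 10, so T(n) = O(n^2)")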

Section 2.3

  • Journal Entry for week 2 starts here
  • Transitive property applies to O, Ω, Θ.
  • (2.6) If g = O(f), then f + g = O(f).
  • Goes over the running times of arrays and lists. Same stuff we did in CS 112.
  • Goes over efficiency of G-S algorithm
    1. Identify free man.
    2. Identify highest ranking woman on his list
    3. Find out marital status
    4. Decide, according to her preference list, whether she gets engaged to the man in question or stays engaged to her current fiancé
  • Implementation of G-S Algorithm
    • Since they are implementing a lot of this using arrays, this will run pretty quickly since a lot of the things that need to be done take constant time.
    • Linked list to keep track of the men.
        • Use it like a stack. As soon as a man is engaged, take him out of the list. Constant time since we always deal with the person in the front. If a man's fiancée leaves him, he is put back at the top of the list, which also takes constant time.
        • O(1)
      • Identify highest ranking woman on his preference list.
      • Next[m] keeps track of which woman man m should propose to next
      • Initially Next[m] = 1 for all m
      • ManPref is a 2D matrix of the preference lists for the men.
      • ManPref[m, Next[m]] returns the woman m should propose to next
      • Once m proposes to the woman, Next[m] is incremented regardless of what the woman's response is.
        • O(1)
    • Check if woman is engaged already
      • Current is an array of length n.
        • Current[w] is woman w's current fiancé (or null if she is free)
        • Thus, checking whether a woman is engaged is fast
        • O(1)
    • Maintaining the preferences of the women.
      • At the beginning, create an nxn array named Ranking
      • Ranking[m,w] would return how highly ranked m is on w's preference list
        • O(1)
    • The G-S Algorithm runs in O(n^2)
  • Book then slowly goes over common running times. Same stuff from CS 112 AGAIN.
  • It does then discuss times that are worse than polynomial time, which we did not discuss in CS 112, but we did in CS 312 and CS 313.
  • Exponential time is bad. Booo. We do run into it often with brute-force algorithms, but that is only because we must often explore all possible subsets of an input. This means going through every element in the power set. That means exponential time (2^n).
  • We also occasionally run into a n! problem. This grows faster than exponential. Usually happens when we must explore all possible matchings in a set. Also arises when we must find the number of ways to arrange n items in order.
  • Summary, brute-force algorithms are bad.
  • Priority Queues
    • Each element has a priority value or a key
    • Can be used to sort a list
    • Heap to implement
    • Heaps have heap order: each parent's key is less than or equal to its children's keys (min-heap). This makes finding the root/min easy.
      • We can use either a tree with pointers or an array with a funky way of finding children to implement the heap
        • Tree with nodes should be pretty simple.
        • Array implementation
          • Root at index 1 (1-based indexing)
          • Left child of i at 2i
          • Right child of i at 2i+1
          • Parent of i is floor(i/2)
          • Length of array tells how many elements are in it
      • Because smallest element is at the root, finding the min is easy.
      • Adding elements
        • We added something (we gave a node a child)
        • Now we must put it in its right place.
        • So we move things up.
        • Pseudo code:
          • Heapify-up(H, i):
            • if i > 1 then (if we are not at the root)
              • j=parent(i)=floor(i/2) (Make j the parent)
              • if key[H[i]] < key[H[j]] (If the element at i is less than the parent j, swap em)
                • swap array entries H[i] and H[j]
                • Heapify-up(H, j) (Do this until we are at the root)
        • Fixes everything and puts it in the right order
        • Should only be O(log n) because that is the height of the tree. Also, every other step in the function is constant.
      • Removing elements
        • We removed the element at position i, leaving a hole. We fill the hole with the last element of the array; call that value w.
        • If w is too small for its current position in the tree (smaller than its parent), we use Heapify-up to get it into its right place.
        • If it is too large for its position (bigger than one of its children), we call Heapify-down.
        • This part is kind of confusing, because how do we know if we need to use Heapify-down or Heapify-up? The trick seems to be: compare w to its parent and to its children; at most one of the two fixes will actually do anything. (See the Python sketch after the heap-operations list below.)
        • Pseudo code:
          • Heapify-down(H, i):
            • n = length(H)
            • if 2i > n then
              • Terminate with H unchanged
            • else if 2i < n then
              • left=2i and right=2i+1
              • let j be whichever of left and right has the smaller key, i.e. j minimizes key[H[j]] over {left, right}
            • else if 2i = n then
              • j=2i
            • if key[H[j]] < key[H[i]] then
              • swap array entries H[i] and H[j]
              • Heapify-down(H, j)
        • This is also O(log n)
    • Heap Operations
      • StartHeap(N)
        • Creates an empty heap that can hold N elements
        • Is O(N) because we need to create N spaces? Book says N, but I thought it would be constant.
      • Insert(v)
        • Inserts item v into heap
        • Is O(log n) because the heapify-up function takes log n time.
      • FindMin()
        • Identifies minimum element in the heap.
        • Since this is the root and we are not removing or adding anything, this is O(1).
      • Delete(i)
        • Deletes element in heap at a specified position
        • Is O(log n) because we will use either Heapify-up or Heapify-down, which are both O(log n)
      • ExtractMin()
        • Finds smallest element of the heap and removes it.
        • Smallest element is the root, so finding it is O(1)
        • Removing it takes either Heapify-up or Heapify-down, so it is O(log n)
        • So, the entire ExtractMin operation will take O(log n)
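    • My attempt at a Python sketch of the array-based min-heap (1-indexed; my own code, so the details may differ from the book's, and delete(i) is how I answered my own up-vs-down question):

      class MinHeap:
          def __init__(self):
              self.h = [None]          # index 0 unused so the root sits at index 1

          def heapify_up(self, i):
              while i > 1 and self.h[i] < self.h[i // 2]:
                  self.h[i], self.h[i // 2] = self.h[i // 2], self.h[i]
                  i //= 2

          def heapify_down(self, i):
              n = len(self.h) - 1
              while 2 * i <= n:
                  j = 2 * i
                  if j < n and self.h[j + 1] < self.h[j]:
                      j += 1                       # pick the smaller child
                  if self.h[j] >= self.h[i]:
                      break
                  self.h[i], self.h[j] = self.h[j], self.h[i]
                  i = j

          def insert(self, v):                     # O(log n)
              self.h.append(v)
              self.heapify_up(len(self.h) - 1)

          def find_min(self):                      # O(1)
              return self.h[1]

          def delete(self, i):                     # O(log n)
              self.h[i] = self.h[-1]               # fill the hole with the last element w
              self.h.pop()
              if i < len(self.h):
                  self.heapify_up(i)               # at most one of these two actually moves w
                  self.heapify_down(i)

          def extract_min(self):                   # O(log n)
              m = self.h[1]
              self.delete(1)
              return m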
  • How readable this chapter was:
    • The part where it explained the removing an element from the heap was very confusing. I had to read it multiple times. I'm still not sure if I understand it. As I see it, we either use heapify-up or heapify-down when deleting an element. How do we know which one to use? We know that at this point in the heap, things can be out of order. We removed an element, we are not replacing it, so it would make sense that we would just put an infinity at the node, heapify-down, and then remove the infinity node at the end. The book talks about it like we are replacing the value. WHAT IF I JUST WANNA DELETE THE VALUE? If you couldn't tell, this part of the chapter was very frustrating.
    • The rest of the book was very readable. The parts where it went over things I already learned in CS 112 were kind of annoying.

Chapter 3

  • GRAPHS!
    • Useful for a lot of things.
    • Book mentions a lot that I don't care about.
    • Cool things? Helps us think about DFAs. Cool stuff.
    • A lot of fluff about something that's pretty simple.
    • Cycle = loop in graph.
  • Representation
    • Adjacency Matrix
      • Symmetric Matrix
      • Matrix[i,j]=1 implies that there is an edge between node i and j
      • Θ(n^2) space
      • Constant time access
      • Identifying all edges is Θ(n^2). This kind of confused me. I thought it would only take n(n+1)/2 time because a lot of the data is repeated. Wouldn't it be O(n^2) and Θ(something else)? The big theta/tight bound concept confuses me. The book says I have to “inflate” the running time, and n(n+1)/2 does inflate to n^2, but it seems like this defeats the point of the tight bound (I assume that the point of identifying the tight bound is to find a more precise measurement of efficiency) if it “inflates” the running time? Is it used to find a precise measurement of efficiency if the upper bound is really huge (let's say O(n^5)) and the lower bound is really small (let's say Ω(n))? If that is the case, what possible values could we have for the tight bound? This is confusing. (Though I guess n(n+1)/2 is still Θ(n^2), since constant factors and lower-order terms get ignored.)
    • Adjacency List
      • Not really a list. It's an array of lists. SO IT'S ALMOST A MATRIX.
      • Array[i] holds a list of all the edges incident to node i (the ith node).
      • There are two representations of each edge because each edge appears in two separate lists
      • m = number of edges
      • n = number of nodes
      • Space is 2m+n = O(m+n).
      • Identifying all edges is Θ(m + n). This tight bound makes sense: the n list headers each get touched once, and the lists together hold 2m edge entries, so the work really is proportional to m + n.
  • Trees
    • Tree = connected graph without a cycle/loop
    • I will call cycles loops because I find that it makes more sense to me
    • Theorem
      • Let G be a graph on n nodes
      • If any two of the following are true, the third is also true
        • G is connected
        • G does not contain a cycle
        • G has n-1 edges
      • We don't really need a proof for this. It's very intuitive. A node needs at least one edge to join a graph. Everything else follows from this.
    • Rooted Tree
      • Pick a node
      • Shove everything that's immediately connected to it into the next level down
      • Do the same to those nodes. You now have a hierarchy. This is a rooted tree.
    • There's mention of uses for trees. This interests me not at all. If I need a tree, I will know. Also, it's pretty obvious that if the book's teaching me about it, it has some use somewhere. The only case where this isn't a good assumption is in higher up classes where they teach you absurd facts just because some nuts think they're cool. Only sometimes do I find myself to be one of them.
  • Shortest path problem
    • Find the shortest path between two nodes in a graph.
    • Lots of obvious uses
    • Connectivity and Traversal
      • Breadth-first Search and Depth-first Search
        • Take a graph and make a bunch of layers
      • BFS
        • Pick a node any node
        • That's DAS ROOT. DAS ROOT DAS ROOT DAS ROOT DAS ROOT!
        • Everything that's immediately connected to DAS ROOT is in the next layer.
        • Everything that's immediately connected to any of the nodes in this layer is in the next layer
        • That's how you do it.(I could write a fancy algorithm for it, but it's harder for me to get the hang of things by reading the algorithm. I'd rather have an informal intuitive explanation. That's why I write the algorithms this way. This only becomes a problem when implementation time comes. BUT I R GOoD PROGRAMMER s0 I nO Haz problem)
        • Theorem: layer i contains exactly the nodes at distance i from DAS ROOT.
        • Theorem: if one node is on layer i and another is on layer j, then any path between them has length at least abs(i-j).
        • Theorem: if nodes i and j are connected by an edge, then the layers they are on differ by at most one. They could be on the same layer. This is the case with all BFS trees no matter what node you pick or what order you add nodes in. It makes sense. Pretty intuitive.
      • DFS
        • Pick DAS ROOT as always
        • grab a node it's connected to. Grab a node that that node is connected to. Keep doing this until you hit a dead end (no undiscovered neighbors). That's the first branch.
        • Go back (backtrack) a node. Find some other node you didn't hit yet. Find another node attached to that one you didn't hit yet. Keep it up. Dead end? Backtrack again; that's the next branch.
        • Do this until done.
        • That's what I call a DFS tree (long skinny branches rather than BFS's layers). Hopefully, I understand things correctly and I am right in calling that a DFS tree. Hopefully.
  • Queues and Stacks
    • Good ole CS112 stuff
    • Queues: FIFO
    • Stacks: LIFO
    • That's all there is to it.
  • Implementation
    • BFS with adjacency list
    • Pseudo Code:
      • BFS(s): # s is node index. s will be DAS ROOT.
        • Discovered[s]=true, Discovered[v]=false for all other v # Discovered marks nodes we already found
        • L[0] = {s}
        • i=0 # i is layer counter
        • T = empty # current BFS tree starts out empty
        • While L[i] is not empty # while this layer has stuff in it; L[0] starts out non-empty because it has s in it
          • Initialize an empty list L[i+1]
          • for all nodes u in L[i] # over the whole run, each node is processed at most once
            • consider each edge (u,v) incident to u # this finds all nodes attached to all nodes in the current layer and adds them to the tree. This is where all the real work happens
              • if discovered[v] = false # we haven't found this node yet
                • discovered[v]=true # now we have
                • add (u,v) to T # add the edge to the tree
                • add v to L[i+1] # add the new node to the next layer so we can analyze it later
          • i++
      • A crude bound is O(n^2): each node is processed once and has at most n-1 incident edges. (Python sketch at the end of this implementation section.)
      • If we look for a tighter bound, we can find it.
      • “consider each edge (u,v) incident to u”
        • happens at most n-1 times
        • But in reality this rarely happens
        • if we sum up the total of how many times this runs throughout the entire algorithm, we find that it is 2m (the sum of all the node degrees), so the whole thing is O(m + n).
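    • A rough Python version of BFS with layers (my own sketch; the graph is a dict-based adjacency list, which is an assumption, not the book's setup):

      def bfs_layers(adj, s):
          # adj: dict mapping each node to a list of its neighbors (adjacency list)
          discovered = {s}
          tree = []                     # edges of the BFS tree
          layers = [[s]]                # layers[i] = L[i]
          while layers[-1]:
              next_layer = []
              for u in layers[-1]:
                  for v in adj[u]:      # over the whole run this is 2m edge looks
                      if v not in discovered:
                          discovered.add(v)
                          tree.append((u, v))
                          next_layer.append(v)
              layers.append(next_layer)
          return layers[:-1], tree      # drop the final empty layer

      adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
      print(bfs_layers(adj, 1))         # ([[1], [2, 3], [4]], [(1, 2), (1, 3), (3, 4)])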
  • Bipartiteness
    • My definition: split up the nodes. Every edge connects one node from each side. doesn't connect nodes on the same side. NO INBREED EDGES.
    • Theorems and such
      • Bipartite means the graph cannot have an odd length cycle (because an odd cycle would force one edge to connect two nodes from the same side)
      • How to tell if Bipartite? BFS TREE
        • Color the layers alternately; the graph is bipartite iff no edge connects two nodes in the same layer
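    • Quick Python sketch of that bipartiteness test via BFS layer coloring (my own code; assumes a connected graph given as an adjacency-list dict):

      def is_bipartite(adj, s):
          color = {s: 0}                       # color = layer parity
          layer = [s]
          while layer:
              next_layer = []
              for u in layer:
                  for v in adj[u]:
                      if v not in color:
                          color[v] = 1 - color[u]
                          next_layer.append(v)
                      elif color[v] == color[u]:
                          return False         # edge inside a layer -> odd cycle
              layer = next_layer
          return True

      print(is_bipartite({1: [2], 2: [1, 3], 3: [2]}, 1))        # True (a path)
      print(is_bipartite({1: [2, 3], 2: [1, 3], 3: [1, 2]}, 1))  # False (a triangle)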

Chapter 4

  • Section 4.1 - Interval Scheduling: The greedy algorithm stays ahead
    • Informal explanation: At each step, grab whatever looks best right now
      • Local Optimizations
    • Needs proof to show optimal
    • Needs counterexample to show that it is not optimal
    • Example problem: Giving out change in fewest coins possible
      • Informal explanation: At each iteration, add biggest coin possible that doesn't take us over our goal amount
      • Only works with US coin denominations (the reason is easy to see: pick arbitrary numbers for coin sizes and you will eventually find a system of coins that breaks it, e.g. with coins {1, 3, 4} and amount 6, greedy gives 4+1+1 but 3+3 is better)
    • How to prove optimality?
      • Greedy stays ahead: Show that the greedy algorithm performs at least as well as any other arbitrary algorithm
      • Exchange argument: Transform any solution into a greedy solution
      • Structural argument: Figure out some structural bound that all solutions must meet
      • These last two were not talked about yet, but they will be later
    • Another example problem: Interval scheduling
      • We want to find a way to schedule the maximum number of jobs where job j starts at s_j and ends at f_j
      • How do we order the jobs?
      • Algorithm is on page 118
      • Ways we could order the jobs:
        • Start time
        • Finish time
        • Total time for the job to run
        • Number of conflicts
      • Counterexamples:
        • Start time
          • Ascending start time: What if we just have one super long job that starts the earliest and conflicts with all the others? We only get one job done.
        • Total time for the job to run
          • Shortest running time: What if we have one that is the shortest, but it conflicts with two super long jobs while those two super long jobs don't conflict at all? We only get one job done rather than two.
        • Number of conflicts
          • Fewest number of conflicts: What if we have 5 that are compatible with each other, but four of them conflict with 30294 other jobs, and there is one job that only conflicts with two of the compatible five? We would get one less than the optimal solution.
      • The best way to do it is by the finishing time.
      • Algorithm should be pretty obvious and intuitive.
      • Proof of optimality: Greedy stays ahead/by contradiction
        • Page 121
        • It's an induction proof that leads to a contradiction.
        • Read for more detail in the book. It's kind of confusing, especially when I get to the part of the inductive proof where I show that if blank holds for k, it holds for k+1. That part threw me off.
    • Problem: Scheduling All Intervals
      • We have a bunch of overlapping intervals, and we want to schedule all of them using as few labels (resources/classrooms) as possible, so that no two overlapping intervals get the same label
      • Algorithm on page 124
      • Informal Algorithm
        • Put the intervals in increasing order of start time
        • Schedule the intervals in this order, giving each one any label not already taken by an overlapping interval. This magically ends up using the minimum possible number of labels (the depth).
      • Note that the greedy algorithm never assigns two incompatible intervals to the same label
      • Proof on page 124-125
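    • A small Python sketch of the earliest-finish-time greedy for interval scheduling (my own code; jobs given as (start, finish) pairs, which is an assumption about the input format):

      def interval_schedule(jobs):
          # jobs: list of (start, finish) pairs
          chosen = []
          last_finish = float("-inf")
          for start, finish in sorted(jobs, key=lambda j: j[1]):  # sort by finish time
              if start >= last_finish:        # compatible with everything chosen so far
                  chosen.append((start, finish))
                  last_finish = finish
          return chosen

      jobs = [(0, 6), (1, 4), (3, 5), (3, 8), (4, 7), (5, 9), (6, 10), (8, 11)]
      print(interval_schedule(jobs))          # [(1, 4), (4, 7), (8, 11)]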
  • Section 4.2 - Scheduling to Minimize Lateness: An Exchange Argument
    • We have a bunch of jobs; job j takes t_j time and has a deadline d_j. We run them back to back on one resource, and if job j finishes at time f_j, its lateness is max(f_j - d_j, 0). We want to minimize the maximum lateness; essentially we want everything to finish as close to on time as possible.
    • How do we order the jobs?
    • Earliest Deadline First
    • NOTE: Greedy algorithm has no idle time (Simple enough idea)
    • NOTE: There exists an optimal schedule with no idle time (intuitive, no proof necessary)
    • Proof of optimality: By Exchange argument
      • Start with an optimal schedule
      • Slowly change it over time (while maintaining optimality) until it is identical to the greedy schedule
      • (Pretty similar to greedy stays ahead, same basic idea)
    • Definition: Inversion
      • Pair of jobs, i and j, such that the deadline of j is before the deadline of i, but i is scheduled first
    • Intuitively, our greedy solution has no inversions (BECAUSE IT IS GREEDY! Think about it. Makes sense right?)
      • Proof of this is on page 129
    • Swapping two adjacent jobs with the same deadline does not increase the max lateness
      • Pretty obvious
      • Proof is in the slides
    • Swapping two adjacent, inverted jobs reduces the number of inversions by one and does not increase the max lateness
      • This follows from the previous claim
    • Proof of optimality
      • Consider an optimal schedule that has the minimum number of inversions
      • (We can safely assume we have no idle time in this schedule)
      • If there are no inversions, this is the same as the schedule produced by our algorithm (same proof technique as last time)
      • If there are inversions…
        • Fix the inversion
        • Does not increase maximum lateness and decreases the number of inversions (this means that before it did not have the minimum number of inversions, which contradicts our definition)
      • DONE!
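    • Python sketch of the earliest-deadline-first schedule and its max lateness (my own code; jobs given as (processing_time, deadline) pairs, an assumption on my part):

      def min_lateness_schedule(jobs):
          # jobs: list of (processing_time, deadline) pairs
          schedule, t, max_late = [], 0, 0
          for length, deadline in sorted(jobs, key=lambda j: j[1]):  # earliest deadline first
              start, finish = t, t + length
              schedule.append((start, finish, deadline))
              max_late = max(max_late, finish - deadline)   # lateness = max(f - d, 0)
              t = finish                                    # no idle time
          return schedule, max_late

      jobs = [(3, 6), (2, 8), (1, 9), (4, 9), (3, 14), (2, 15)]
      schedule, max_late = min_lateness_schedule(jobs)
      print(max_late)   # maximum lateness of the EDF schedule (1 for this example)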
  • Section 4.3 - Who cares
  • Section 4.4 - Shortest Paths in a Graph
    • Given a graph, find shortest path.
    • What comes to mind? DIJKSTRA
    • Dijkstra's Algorithm
      • Keep track of explored nodes S
      • Keep track of distances to each node (initially infinity)
      • Among unexplored nodes v that have an edge (u,v) from some explored node u, choose the one that minimizes (distance to u) + (length of (u,v))
        • set its distance to that minimum and add v to the explored set
      • On page 138
    • Proof that it works: Induction
      • Page 139
      • Let S be the set of all explored nodes
      • Consider the case where S only has one element, i.e. |S|=1
        • We obviously have the shortest distance
        • BASE CASE
      • Assume this holds for |S| = k where k ≥ 1
        • That for k nodes, we have the shortest path to all of the k nodes
      • Add another node v (we are now dealing with k+1 nodes) that is connected by edge (u,v)
      • We have the shortest distance to u
      • Any other path from root to v will be no shorter than that from root to u to v (it has to leave the explored set somewhere, and the greedy rule already picked the cheapest way out)
      • So, this works by induction
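    • Python sketch of Dijkstra's algorithm using a heap as the priority queue (my own code; the graph is assumed to be a dict of {node: [(neighbor, length), ...]}):

      import heapq

      def dijkstra(adj, s):
          dist = {s: 0}
          pq = [(0, s)]                      # (distance so far, node)
          while pq:
              d, u = heapq.heappop(pq)
              if d > dist.get(u, float("inf")):
                  continue                   # stale entry, u was already finalized
              for v, length in adj[u]:
                  nd = d + length
                  if nd < dist.get(v, float("inf")):
                      dist[v] = nd
                      heapq.heappush(pq, (nd, v))
          return dist

      adj = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)],
             "b": [("t", 3)], "t": []}
      print(dijkstra(adj, "s"))              # {'s': 0, 'a': 1, 'b': 3, 't': 6}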
  • Section 4.5 - The Minimum Spanning Tree Problem
    • Minimum Spanning Tree
      • Tree with n-1 edges that connects all the nodes
      • This means no cycles
    • Algorithms to find MSTs
      • Prim's algorithm
        • Pick a root node (doesn't matter which one because it will be connected in the end anyway)
        • Repeatedly add the cheapest edge that has exactly one endpoint already included
      • Kruskal's algorithm
        • Add edges in order of cheapness unless it creates a cycle
        • It sounds similar to Prim's, but it isn't the same: Prim's grows one tree outward from the root, while Kruskal's adds the globally cheapest edges and can grow several pieces that eventually merge
      • Reverse-Delete algorithm
        • Take original graph
        • delete most expensive edge unless it creates two separate unconnected graphs
    • Cut Property
      • Let S be any subset of nodes (not empty and not all the nodes). Let e be the minimum-cost edge with exactly one endpoint in S. Then the MST contains e.
      • Proof: Contradiction
        • Assume WLOG all costs for the edges are distinct
        • Suppose there exists a minimum spanning tree that does not contain e, the minimum-cost edge with exactly one endpoint in S.
        • This means that adding e to that tree would create a cycle, and the cycle has to cross between S and the rest of the graph somewhere else.
        • So the cycle contains another edge f with exactly one endpoint in S, and since e is the cheapest such edge, f costs more than e.
        • Removing f and adding e would also create a spanning tree.
        • Since e costs less than f, this new tree has a smaller total cost than the original tree. Thus, the original tree was not a minimum spanning tree. This is a contradiction.
    • Cycle Property
      • Consider a cycle C, and let f be the most expensive edge in C. The MST does not contain f.
      • Proof: Contradiction
        • Suppose the MST contained f
        • Removing f from the MST splits it into two pieces; let S be the nodes of one piece (this defines a cut)
        • Since the cycle C crosses this cut (it contains f), there exists another edge e of C with exactly one endpoint in S
        • Since f is the most expensive edge of C, e costs less than f
        • Replacing f with e in the original tree would create a spanning tree with a smaller total cost than the original MST, which contradicts the fact that the original tree was a MST
    • A cut is a subset of nodes S. The corresponding cutset D is the subset of edges with exactly one endpoint in S.
    • Claim: A cycle and a cutset intersect in an even number of edges
    • Prim's Algorithm
      • Proof of Correctness:
        • Let s be any node
        • At every step, the tree built so far spans a set S; Prim's adds the cheapest edge leaving S, which must be in the MST by the cut property
      • Implementation explained on 149-150
  • Section 4.6 - Implementing Kruskal's Algorithm: The Union-Find Data Structure
    • Kruskal's Algorithm
      • Proof of Correctness:
        • Edges in increasing order of weight
        • Add edges in that order
          • It works because of cut property
          • If it creates a cycle, by the cycle property, it must be the largest edge in the cycle, so we must not add it
    • Union-Find Data Structure
      • Fancy data structure
      • Keeps track of edges added
      • Maintains disjoint sets
      • Operations
        • Find(u): returns name of set containing u
        • Union(A, B): merges A and B
    • Implement Kruskal's using the Union-Find data structure
      • Details are on page 157
      • This Union-Find data structure is pretty magical.
      • It essentially makes the whole algorithm easy.
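    • My Python sketch of Kruskal's with a simple union-find (union by size plus path compression; this is my own code, not the book's implementation):

      def kruskal(n, edges):
          # edges: list of (cost, u, v) with nodes numbered 0..n-1
          parent = list(range(n))
          size = [1] * n

          def find(u):                         # follow pointers to the set's name
              while parent[u] != u:
                  parent[u] = parent[parent[u]]   # path compression
                  u = parent[u]
              return u

          def union(a, b):                     # merge the smaller set into the larger
              ra, rb = find(a), find(b)
              if ra == rb:
                  return False                 # already connected -> adding would make a cycle
              if size[ra] < size[rb]:
                  ra, rb = rb, ra
              parent[rb] = ra
              size[ra] += size[rb]
              return True

          mst = []
          for cost, u, v in sorted(edges):     # increasing order of cost
              if union(u, v):
                  mst.append((cost, u, v))
          return mst

      edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
      print(kruskal(4, edges))                 # [(1, 0, 1), (2, 1, 3), (3, 1, 2)]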
  • Section 4.7 - Clustering
    • Clustering
      • We got a bunch of points
      • There is a numeric distance between each pair of points that says how similar (close) they are
      • How do we divide them into k non-empty groups where the ones in the same group/cluster are more “similar” to each other than to any member of another group/cluster?
      • We want to cluster them in such a way that there is maximum spacing between the clusters.
      • How do we do that? Read that last line again. MAX DIST.
      • Just run Kruskal's (i.e., build the MST) and get rid of the k-1 most expensive edges. BOOM.
      • Deleting k-1 edges from a tree leaves exactly k groups
      • Didn't talk about efficiency of the algorithm much in the book, but I guess that's because it's not particularly difficult to figure it out.
      • How do we know that this algorithm works, i.e. produces a k-clustering of maximum spacing?
      • PROOF INFORMAL OUTLINE:
        • Let C be the clustering our algorithm produces and let d be its spacing (the smallest distance between two points in different clusters, i.e. the next edge Kruskal's would have added). Suppose there is some other clustering C'. C' must separate two points that C put in the same cluster; the path between those points uses only edges of length at most d (because of how Kruskal's algorithm works), and some edge on that path crosses between two clusters of C', so the spacing of C' is at most d. More detail in the book, but there is enough to understand just from this.
  • Section 4.8 - Huffman Codes and Data Compression
    • HUFFMAN TREES!
      • Just like CS112
      • We need 5 bits per letter to encode the whole alphabet with a fixed-length code (2^5 = 32 ≥ 26).
      • We can do it in a better way.
      • We can have shorter encodings for more frequent letters.
      • This works as long as no codeword is a prefix of another codeword (a prefix code).
      • Goal is to minimize ABL (Average # of bits per letter)
      • Calculate ABL with ABL = sum ( frequency * #bits)
      • It would actually be pretty interesting if they explained how they got this formula. It wouldn't help me learn at all, but it would have been cool
      • How to minimize it? Brute force? If that was the answer we wouldn't be studying this.
      • Use binary tree to represent encoding. encoding is the path you take to get to the letter (all letters are on leaves)
      • 0 left 1 right
      • There's a proof in the book about how the tree has to be full. I thought this was pretty intuitive. I don't know if this is because I've seen this before or if it actually is just pretty intuitive.
      • Proof is on page 168
      • We know it makes a lot of sense to have the most frequent letters near the top of the tree and to have letters with similar frequencies get similar bit lengths.
      • This is how we are going to put them in the tree.
      • More frequent → higher up in the tree → fewer bits
      • Similar frequency → siblings or similar # bits
      • Algorithm on Page 172.
      • Informal and intuitive explanation below
        • Put em all in a priority Q
        • Grab the two with lowest frequencies (so that they can be toward the bottom of the tree) and give them a common parent. Make the parent's frequency the sum of those of its children. Each priority-queue operation is O(log n).
        • Do this over and over until we only have one thingy in the PQ.
        • This is our tree.
        • This leads us to have similar frequency letters on similar levels.
        • We only do this n-1 times because we remove two items from the PQ and add one, with a net loss of 1 item, and we do this until there is only one item left in the PQ
        • So, the whole algorithm is O(n log n)
      • Proof on page 174-175 for optimality
      • Informal explanation of proof
        • Contradiction proof
        • Suppose this algorithm is not optimal.
        • This means there is some other prefix tree Z such that the ABL of Z is less than the ABL of T, the tree our algorithm built
        • We know that in Z (because an optimal tree is full) there is some deepest internal node whose two children are leaves, and we can swap letters around so those two leaves are x and y, the two lowest-frequency letters, without increasing the ABL
        • If we delete y and x and label their former parent w (a new "letter" with frequency f_x + f_y), we get a smaller instance and the argument repeats by induction
        • I don't really understand the rest of the proof. It's very confusing. I'll have to keep looking at it.
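    • A Python sketch of the Huffman algorithm with a heapq priority queue (my own code; returns the codeword for each letter given its frequency, and the frequency table below is just made up):

      import heapq
      from itertools import count

      def huffman(freq):
          # freq: dict mapping letter -> frequency
          tick = count()                       # tie-breaker so tuples never compare dicts
          pq = [(f, next(tick), {letter: ""}) for letter, f in freq.items()]
          heapq.heapify(pq)
          while len(pq) > 1:
              f1, _, codes1 = heapq.heappop(pq)    # two lowest-frequency "letters"
              f2, _, codes2 = heapq.heappop(pq)
              merged = {letter: "0" + code for letter, code in codes1.items()}
              merged.update({letter: "1" + code for letter, code in codes2.items()})
              heapq.heappush(pq, (f1 + f2, next(tick), merged))  # parent freq = sum of children
          return pq[0][2]

      freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
      codes = huffman(freq)
      abl = sum(freq[x] * len(codes[x]) for x in freq)   # ABL = sum(frequency * #bits)
      print(codes, abl)   # ABL beats the 3 bits/letter a fixed-length code would need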

Chapter 5

  • Section 5.1 - A First Recurrence: The Mergesort Algorithm
    • Break something up into little parts and build them back up recursively.
    • Example? Merge sort.
    • We break the list in half,
    • Keep doing this until we only have lists of length one
    • We sort all of them together again
    • Let T(n) be the number of comparisons needed to merge sort a list of length n
    • T(n) can be upper bounded by T(n) ≤ 2 T(n/2) + cn when n > 2
    • 2 is the number of problems that the original is divided into
    • n/2 is the new problem size
  • Section 5.2 - Further Recurrence Relations
    • Let's find a general solution.
    • Things to figure out:
      • # of subproblems
      • size of subproblems
      • number of levels
      • cost of combining
    • (5.4) Any function T(·) satisfying T(n) ≤ q·T(n/2) + cn for n > 2, with q > 2 (i.e. the problem is split into q different n/2-size problems, with linear combine cost) and T(2) ≤ c, is bounded by O(n^(log_2 q)).
    • The process for doing so is demonstrated in Problem 2 of Problem set 6.
      • Find the four things above
      • Calculate the number of subproblems per level (q^j at level j)
      • Calculate the size of each subproblem at each level (n/2^j), so the work at level j is q^j · c · (n/2^j)
        • The subproblem sizes matter at every level, but for q > 2 the bottom-most level ends up dominating
        • At least I think… I might be terribly wrong.
      • Total work: sum of the per-level work over all log_2 n levels; for q > 2 that sum is dominated by the bottom-most level, which gives the O(n^(log_2 q)) bound
        • Also, I think I might be terribly wrong
      • My way of calculating this may be very wrong.
      • But, I think that winging it works just as well.
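    • A tiny Python check of (5.4), since I wasn't sure about my method: it unrolls T(n) = q·T(n/2) + c·n and compares against n^(log_2 q). This is just my own sanity check; q = 4 and c = 1 are arbitrary choices.

      from math import log2

      def T(n, q=4, c=1):
          if n <= 2:
              return c
          return q * T(n // 2, q, c) + c * n     # q subproblems of size n/2, cn to combine

      for n in [2**k for k in range(3, 12)]:
          bound = n ** log2(4)                   # n^(log_2 q) = n^2 here
          print(n, T(n), bound, T(n) / bound)    # the ratio stays bounded by a constant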
  • Section 5.3 - Counting Inversions
    • Inversion: An array has an inversion if i < j, but array[i]>array[j].
      • It really depends on how you define array[i]>array[j].
    • Counting number of inversions can be a way to determine similarity
    • Divide and Conquer Method to count inversions
      • Merge And Count and Sort And Count algorithms on page 224-225
      • Will put a more informal explanation of Merge and Count below because it's the real important part
      • This is more of how a person would do it, not at all how the algorithm works because the algorithm is all recursive and such.
      • Separate list into two parts (Divide is O(1))
      • Assume both sides are sorted (because they are sorted by Sort-and-Count)
      • Keep a counter on both sides
      • Is the current left element bigger than the current right element?
        • If so, inversions exist. The number of new inversions is the number of elements remaining on the left side, from the left counter onward (including the one it is on)
        • Copy the right element to the output and move the right side counter over
      • If not, copy the left element to the output and move the left side counter over
      • This whole time, we are also merge sorting the list.
      • This way, we end up sorting the entire list as well as finding inversions.
      • We can see how we are able to recursively count inversions this way.
      • The recurrence relation is T(n) ≤ T(n/2) + T(n/2) + O(n)
      • We can see that the running time is O(n log n)
      • Pretty obvious seeing how similar it is to merge sort
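    • My Python sketch of Sort-and-Count / Merge-and-Count (not the book's exact code, but the same idea):

      def sort_and_count(a):
          if len(a) <= 1:
              return a, 0
          mid = len(a) // 2
          left, inv_l = sort_and_count(a[:mid])
          right, inv_r = sort_and_count(a[mid:])
          merged, i, j, inv = [], 0, 0, 0
          while i < len(left) and j < len(right):
              if left[i] <= right[j]:
                  merged.append(left[i]); i += 1
              else:                               # left[i] > right[j]: inversions!
                  inv += len(left) - i            # everything from the left counter onward
                  merged.append(right[j]); j += 1
          merged.extend(left[i:]); merged.extend(right[j:])
          return merged, inv_l + inv_r + inv

      print(sort_and_count([2, 4, 1, 3, 5]))      # ([1, 2, 3, 4, 5], 3)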
  • Section 5.4 - Finding Closest Pair of Points
    • Summary of Algorithm is on Page 230
    • Informal Summary of algorithm
      • Cut initial area that is encapsulated in a box into two boxes
      • Find closest on each side brute force style. The min of the shortest distances on each side is called delta.
      • What if the closest points in the whole box happens to be right on the line we cut?
      • Check within delta of that line.
      • We are done.
      • We can see clearly how this can be done recursively
      • We can get an awesome efficiency of O(n log n)
        • This part is really confusing
      • The recurrence relation is T(n) ≤ 2T(n/2) + O(n log n) for the algorithm on page 230
      • We divide each sub box into two new sub boxes each time
      • This explains the 2T(n/2)
      • The O(n log n) combination cost comes from the fact that we must sort the points along the delta strip on the dividing lines
      • This gives us T(n) = O(n log^2 n)
      • We can do better and get T(n) = O(n log n)
      • When we return from each recursive call, we also hand back the points sorted by y-coordinate; the parent then just merges the two already-sorted sublists (like mergesort) instead of sorting from scratch, so the combine step drops to O(n)
      • This gives us T(n) = O(n log n)
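    • A Python sketch of divide-and-conquer closest pair (my own code; this is the simpler version that re-sorts the strip by y inside every call, so it's the O(n log^2 n) variant, not the fancier merge-based O(n log n) one):

      from math import dist, inf

      def closest_pair(points):
          pts = sorted(points)                         # sort by x once
          return _closest(pts)

      def _closest(pts):
          n = len(pts)
          if n <= 3:                                   # brute force the tiny cases
              return min((dist(p, q) for i, p in enumerate(pts) for q in pts[i+1:]),
                         default=inf)
          mid = n // 2
          x_line = pts[mid][0]
          delta = min(_closest(pts[:mid]), _closest(pts[mid:]))
          # check the strip within delta of the dividing line, sorted by y
          strip = sorted((p for p in pts if abs(p[0] - x_line) < delta),
                         key=lambda p: p[1])
          for i, p in enumerate(strip):
              for q in strip[i+1 : i+8]:               # only ~7 later strip points can matter
                  delta = min(delta, dist(p, q))
          return delta

      pts = [(0, 0), (3, 4), (1, 1), (7, 7), (2, 0), (6, 7)]
      print(closest_pair(pts))                         # 1.0, from the pair (6,7), (7,7)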
  • Section 5.5 - Integer Multiplication
    • Assume we have two n digit numbers
    • Addition is O(n)
    • Multiplication is O(n^2)
    • Must find a better way to multiply
    • Let x and y be the numbers in questions
    • Cut each into two n/2 pieces, x_1, x_0, y_1, y_0
    • Subscript 1 refers to higher order digits
    • We have
      • xy=(x_1(10^(n/2))+x_0)(y_1(10^(n/2))+y_0)
      • xy=(x_1*y_1(10^(n/2))(10^(n/2)))+(x_1*y_0+x_0*y_1)*10^(n/2)+(x_0*y_0)
      • xy=(x_1*y_1*10^n)+(x_1*y_0+x_0*y_1)*10^(n/2)+(x_0*y_0)
    • This leads us to a method called Karatsuba Multiplication
    • Algorithm is on page 233
      • Recursively multiply on x and y
      • Divide into two parts like above
      • compute x_1+x_0 and y_1+y_0
      • Recursive step on these two sums
      • set p to this value
      • set x_1*y_1=Recur on x_1 and y_1
      • set x_0*y_0=Recur on x_0 and y_0
      • return (x_1*y_1*10^n)+(p-x_1*y_1-x_0*y_0)*10^(n/2)+(x_0*y_0)
      • (I know this notation is very terrible, but the formal proof is in the book. I wrote it this way so I could understand it better)
    • This has a recurrence relation of T(n) ≤ 3T(n/2) + cn
    • This is because we divide the problem into three sub problems, p, x_1*y_1, x_0*y_0
    • They are all of size n/2
    • the cn combination cost is obvious (just additions, subtractions, and digit shifts of n-digit numbers)
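    • Python sketch of Karatsuba multiplication on plain ints, using base 10 like the formulas above (my own code; Python can obviously already multiply big ints, this is just to watch the recursion):

      def karatsuba(x, y):
          if x < 10 or y < 10:                     # one-digit base case
              return x * y
          half = max(len(str(x)), len(str(y))) // 2
          x1, x0 = divmod(x, 10 ** half)           # split into high/low digit blocks
          y1, y0 = divmod(y, 10 ** half)
          p = karatsuba(x1 + x0, y1 + y0)          # one recursive multiplication...
          hi = karatsuba(x1, y1)                   # ...plus two more
          lo = karatsuba(x0, y0)
          mid = p - hi - lo                        # = x1*y0 + x0*y1
          return hi * 10 ** (2 * half) + mid * 10 ** half + lo

      print(karatsuba(1234, 5678), 1234 * 5678)    # both 7006652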

Chapter 6

  • Section 6.1 - Weighted Interval Scheduling: A Recursive Procedure
    • Key thing: MEMOIZATION. It's pretty awesome.
      • Something I find very interesting is that memoizing seems very intensive on I/O actions. It makes the program a lot more efficient in terms of run time, but it really hits the I/O hard because of how much reading and writing we are doing. This really makes me wonder if memoizing is that great after the Andy Danner talk.
    • Example Problem: Fib Seq
      • Can be easily solved with tail recursion.
      • The idea is to write things down that you have already computed.
    • Example Problem: Weighted Interval Scheduling
      • Job j starts at s_j, finishes at f_j, and has weight or value v_j
      • Two jobs are compatible if they don't overlap
      • We want to find the group of jobs that are compatible with each other that also return the largest possible total weight/value.
        • Greedy algorithm doesn't work because it does not take into account weight/value.
      • Informal Algorithm Explanation:
        • Label jobs by their end times, e.g. if a job finishes before any other job ends, then it is job 1.
        • Define p(j) to be the largest index i < j such that job i is compatible with j (the latest-finishing job that ends before j starts)
        • Consider some optimal solution
        • Let OPT(j) be the group of jobs selected by the optimal solution consisting of jobs 1 through j
        • We can assume that for job j, either j is in OPT(j) or not
        • Case 1: j is in OPT(j)
          • Then the jobs incompatible with j cannot be in OPT(j). They are p(j)+1, p(j)+2, … , j-1
          • Must include OPT(p(j))
        • Case 2: j is not in OPT(j)
          • Must include OPT(j-1)
        • This is pretty much what we do. It's a divide and conquer type of thing. It's recursive. The dynamic programming part comes in where we memoize the values for OPT(whatever)
        • The concept is very similar to the idea of tail recursion on the Fib Seq. Pretty much the same thing.
      • Non-Dynamic Algorithm: Exponential and Very Slow (Page 254)
        • Sort jobs by finish times so that f_1 ≤ f_2 ≤ … ≤ f_n #this is the labeling step from above
        • Compute p(1), p(2), …, p(n) #compute the compatible jobs. This has to be done. No way around it.
        • Compute(n)
        • Compute-Opt(j):
          • if j = 0
            • return 0
          • else
            • # max determines which option the optimal solution would pick, since the optimal solution is trying to get the max weight
            • # first option picks j, second option does not pick j
            • return max(v_j + Compute-Opt(p(j)), Compute-Opt(j-1))
      • Having to constantly recompute Compute-Opt(p(j)) is stupid. Let's not do that and just compute it once.
      • Dynamic Algorithm: Fast (Page 256)
        • Sort jobs by finish times so that f_1 ≤ f_2 ≤ … ≤ f_n
        • Compute p(1), p(2), …, p(n)
        • for j=1 to n
          • M[j]=Empty #This is a global array we use to store all the memoized values for M-compute so we only compute if we have to
        • M[0]=0
        • M-Compute(n)
        • M-Compute-Opt(j):
          • if j = 0
            • return 0
          • else # v_j is the value/weight of j
            • return max(v_j + M-Compute-Opt(p(j)), M-Compute-Opt(j-1))
      • Efficiency:
        • Sorting jobs by finish time is O(n log n)
        • Computing p(whatever) is O(n log n)
          • I didn't see at first how this is not O(n^2): p(_) looks linear, since you'd scan through all n jobs to find the latest one compatible with it.
          • The trick (I think) is that once the jobs are sorted by finish time, p(j) can be found by binary searching the finish times with job j's start time, so it's O(log n) per job and O(n log n) total.
        • M-Compute-Opt(j) is O(n)
          • This is awesome.
          • Everything we do is pretty much just constant accessing actions.
          • The thing that is not is recursively calling M-Compute-Opt.
          • The thing is, we only do it once for each value since we memoized everything.
          • So, we only do it n times.
          • When we actually compute something, all we really do is look up a weight/value and look up other stuff.
        • So, this takes O(n log n)
      • This only gives us the total value/weight that is the max achievable. It does not tell us what intervals were chosen.
      • We can figure this out after the fact.
      • We still have our M[] array, which is nice.
      • Informal Algorithm Explanation:
        • We essentially go through M and find the intervals that would have been chosen using this inequality
          • v_j + M[p(j)] > M[j-1]
          • This means: if the value of the current interval plus the optimal value over the jobs compatible with it (jobs up through p(j)) is greater than the optimal value without it (jobs up through j-1), then this one would have been chosen.
      • Algorithm:
        • M-Compute-Opt(n) # compute M[]
        • Find-Solution(n)
        • Find-Solution(j):
          • if j = 0:
            • output nothing
          • else if v_j + M[p(j)] > M[j-1]:
            • print j
            • Find-Solution(p(j))
          • else:
            • Find-Solution(j-1)
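      • My Python sketch of the whole pipeline: sort by finish time, compute p(j) with binary search, fill the M[] table, then trace back the chosen jobs (my own code; jobs are assumed to be (start, finish, value) triples):

        from bisect import bisect_right

        def weighted_interval_scheduling(jobs):
            # jobs: list of (start, finish, value)
            jobs = sorted(jobs, key=lambda j: j[1])        # label jobs by finish time
            finishes = [f for _, f, _ in jobs]
            # p[j] = number of jobs (1-indexed) that finish no later than job j starts
            p = [bisect_right(finishes, s) for s, _, _ in jobs]

            M = [0] * (len(jobs) + 1)                      # M[j] = OPT(j), filled bottom-up
            for j in range(1, len(jobs) + 1):
                s, f, v = jobs[j - 1]
                M[j] = max(v + M[p[j - 1]], M[j - 1])      # take job j, or don't

            chosen, j = [], len(jobs)                      # Find-Solution
            while j > 0:
                s, f, v = jobs[j - 1]
                if v + M[p[j - 1]] > M[j - 1]:
                    chosen.append(jobs[j - 1])
                    j = p[j - 1]
                else:
                    j -= 1
            return M[-1], chosen[::-1]

        jobs = [(0, 3, 2), (1, 5, 4), (4, 6, 4), (5, 8, 7), (3, 9, 2)]
        print(weighted_interval_scheduling(jobs))          # (11, [(1, 5, 4), (5, 8, 7)])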
  • Section 6.2 - Principles of Dynamic Programming: Memoization or Iteration over Subproblems
    • The book first goes over what makes the previous algorithm dynamic and efficient
      • If I was the author, I would have put that in the last chapter. It has no business being here.
    • It says nothing else useful.
  • Section 6.3 - Segmented Least Squares: Multi-way Choices
    • The problem we focus on in this section: we have a bunch of dots/points and want to find the line (if a single line even fits) that best matches the data (you know what I mean)
    • If it is not a line, then we do not try and form a curve, instead we form a sequence of lines.
    • How do we do this?
      • Consider the first and last points.
      • For every point p_n, it can only belong to one line segment
      • We define the cost of a solution as: for each segment used, pay c (a fixed per-segment cost) plus that segment's squared error
      • Let OPT(j) = minimum cost for points p_1, p_2, … , p_j (the min cost of the first j points)
      • Let e(i, j) = minimum sum of squares for points p_i, p_i+1 , … , p_j
      • How to compute OPT(j)?
        • If the last segment is p_i, …, p_j, the cost is e(i, j) + c + OPT(i-1).
        • OPT(J):
          • if j=0
            • return 0
          • else
            • return min { e(i,j) + c + OPT(i-1) : 1 ≤ i ≤ j }
      • Algorithm: page 265
        • INPUT: n, p_1,…,p_N , c
        • Segmented-Least-Squares()
          • M[0] = 0
          • e[0][0] = 0
          • for j = 1 to n
            • for i = 1 to j
              • e[i][j] = least square error for the segment p_i, …, p_j
          • for j = 1 to n
            • M[j] = min over 1 ≤ i ≤ j of (e[i][j] + c + M[i-1])
          • return M[n]
      • The magic really lies in the math behind the least square error. That's how we determine which segment something belongs to.
      • Efficiency:
        • The first group of nested for loops is O(n^3).
        • This can be reduced to quadratic if we pre-compute and write down running sums (of x, y, x·y, x^2, y^2), so each e[i][j] takes O(1) to evaluate instead of O(n)
          • YAY MEMOIZING
        • the last for loop is actually quadratic even though it initially looks linear because it's just a single for loop; remember that the min inside it ranges over up to j candidates, which is linear work per iteration
      • For some reason the book likes to do things in this very not straightforward way of finding all the data we need using one algorithm and then later using that data to find the information we really need to get our results.
      • So, since we really need to know how to split up the points into a sequence of segments, we will go through all the segments we just assigned
      • Algorithm: page 266
        • FindSegments(j):
          • if j = 0:
            • output nothing
          • else:
            • Find an i that minimizes e(i,j) + c + M[i-1]
            • Output the segment {p_i, …, p_j}
            • FindSegments(i-1)
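      • My Python sketch of the whole pipeline with running sums so each e(i,j) is O(1) (my own code and my own tiny example; points are assumed to be (x, y) pairs and c is the per-segment cost):

        def segmented_least_squares(points, c):
            pts = sorted(points)                       # p_1..p_n in order of x
            n = len(pts)
            # running sums so the least-squares error e(i, j) is O(1) to evaluate
            sx, sy, sxx, syy, sxy = ([0.0] * (n + 1) for _ in range(5))
            for k, (x, y) in enumerate(pts, 1):
                sx[k] = sx[k-1] + x;     sy[k] = sy[k-1] + y
                sxx[k] = sxx[k-1] + x*x; syy[k] = syy[k-1] + y*y; sxy[k] = sxy[k-1] + x*y

            def err(i, j):                             # e(i, j) for p_i..p_j (1-indexed)
                m = j - i + 1
                Sx, Sy = sx[j] - sx[i-1], sy[j] - sy[i-1]
                Sxx, Syy, Sxy = sxx[j] - sxx[i-1], syy[j] - syy[i-1], sxy[j] - sxy[i-1]
                denom = m * Sxx - Sx * Sx
                a = (m * Sxy - Sx * Sy) / denom if denom else 0.0   # best-fit slope
                b = (Sy - a * Sx) / m                               # best-fit intercept
                return max(Syy - a * Sxy - b * Sy, 0.0)             # sum of squared residuals

            M = [0.0] * (n + 1)                        # M[j] = OPT(j)
            back = [0] * (n + 1)
            for j in range(1, n + 1):
                M[j], back[j] = min((err(i, j) + c + M[i-1], i) for i in range(1, j + 1))

            segments, j = [], n                        # FindSegments by walking backwards
            while j > 0:
                i = back[j]
                segments.append(pts[i-1:j])
                j = i - 1
            return M[n], segments[::-1]

        pts = [(0, 0), (1, 1), (2, 2), (3, 2), (4, 2), (5, 2)]
        print(segmented_least_squares(pts, 0.5))       # minimum total cost is 1.0 (two segments)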
  • Section 6.4 - Subset Sums and Knapsacks: Adding a variable
    • We're focusing on the knapsack problem
    • We have n things and a knapsack
    • item i has weight w_i (always positive) and also has a value v_i
    • We can only carry so much weight
    • We want to maximize the value (more bang for the buck type of deal)
    • So, for each item, we either pick it or we don't pick it
    • OPT(i) = max profit subset of items 1,…, i
    • How to compute OPT(i)?
      • if it does not select object i
        • OPT selects best of { 1, 2, …, i-1 }
      • if it does
        • That does not mean we just take the best of items 1, …, i-1 as before; picking i eats up some of the capacity
        • So, we don't know what to do.
        • MORE SUBPROBLEMS!!! w0000ooooooo…
        • So, we set a new weight limit, our current available weight minus what we just added, i.e. w – w_i
        • We let OPT select the best of these
    • That's it! That's the algorithm
    • Algoriddim be on Page 269 mahn
      • Input: n, W, w_1,…,w_n, v_1,…,v_n
      • for w = 0 to W
        • M[0, w] = 0
      • for i = 1 to N
        • for w = 1 to W
          • if w_i > w :
            • M[i, w] = M[i-1, w]
          • else
            • M[i, w] = max{ M[i-1, w], v_i + M[i-1, w-w_i] }
      • return M[n, W]
    • The reason why this works is surprisingly simple once we realize the importance of just changing the amount of weight left over and over again
    • Efficiency:
      • This is a strange one.
      • It's Θ(n W)
      • So it depends on both the knapsack capacity and the number of objects
      • they call it Pseudo-polynomial.
      • Weird stuff. Never thought algorithms like this existed.
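    • Python sketch of the knapsack DP table (my own code; items are assumed to be (weight, value) pairs and W is the capacity):

      def knapsack(items, W):
          # items: list of (weight, value); M[i][w] = best value using items 1..i with limit w
          n = len(items)
          M = [[0] * (W + 1) for _ in range(n + 1)]
          for i in range(1, n + 1):
              wi, vi = items[i - 1]
              for w in range(W + 1):
                  if wi > w:                            # item i doesn't fit
                      M[i][w] = M[i - 1][w]
                  else:                                 # skip it, or take it and lose wi capacity
                      M[i][w] = max(M[i - 1][w], vi + M[i - 1][w - wi])
          return M[n][W]

      items = [(1, 1), (2, 6), (5, 18), (6, 22), (7, 28)]
      print(knapsack(items, 11))                        # 40 = 18 + 22 (weights 5 + 6)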

Chapter 7

  • Section 7.1 - The Maximum-Flow Problem and the Ford-Fulkerson Algorithm
    • Problem
      • You have a directed graph
      • Each edge has a capacity
      • Have a start node s and end node t
      • Greedy
        • Find an s-t path where every edge still has spare capacity
        • push as much flow as you can along it (the bottleneck)
        • keep doing this until you can't do it any more
        • Locally optimal, but not globally optimal (it's very easy to think of a counter example, counter example is on page 339 in book)
      • Awesome Solution: Residual Graphs
        • Edges have a capacity, but also a flow
        • We don't have to use all of it at once
        • The residual graph has, for each edge, a forward edge with the leftover capacity and a backward edge with the flow already sent (so we can undo flow we regret)
        • The Ford-Fulkerson Algorithm uses residual edges and works awesomely (optimal)
          • page 342 - 344
          • Informal explanation
            • work in the residual graph
            • keep finding an s-t path in the residual graph, compute the bottleneck (the smallest residual capacity on the path), and augment:
              • for each forward edge on the path, increase its flow by the bottleneck
              • for each backward edge on the path, decrease the flow on the corresponding original edge by the bottleneck
            • stop when there is no s-t path left in the residual graph
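    • My Python sketch of Ford-Fulkerson using BFS to find augmenting paths in the residual graph (so technically the Edmonds-Karp variant; my own code, with capacities assumed to be given as a nested dict):

      from collections import deque

      def max_flow(cap, s, t):
          # cap: dict of dicts, cap[u][v] = capacity of edge (u, v)
          # residual[u][v] starts as the capacity forward and 0 backward
          residual = {u: {} for u in cap}
          for u in cap:
              for v, c in cap[u].items():
                  residual[u][v] = residual[u].get(v, 0) + c
                  residual.setdefault(v, {}).setdefault(u, 0)
          flow = 0
          while True:
              # BFS for an s-t path with leftover residual capacity
              parent = {s: None}
              queue = deque([s])
              while queue and t not in parent:
                  u = queue.popleft()
                  for v, c in residual[u].items():
                      if c > 0 and v not in parent:
                          parent[v] = u
                          queue.append(v)
              if t not in parent:
                  return flow                       # no augmenting path left: done
              # find the bottleneck along the path, then augment
              path, v = [], t
              while parent[v] is not None:
                  path.append((parent[v], v))
                  v = parent[v]
              bottleneck = min(residual[u][v] for u, v in path)
              for u, v in path:
                  residual[u][v] -= bottleneck      # less room forward
                  residual[v][u] += bottleneck      # more room to undo it
              flow += bottleneck

      cap = {"s": {"a": 10, "b": 10}, "a": {"b": 2, "t": 8},
             "b": {"t": 10}, "t": {}}
      print(max_flow(cap, "s", "t"))                # 18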
  • Section 7.2 - Maximum Flows and Minimum Cuts in a Network
    • Cut: s-t cut is a partition (A,B) where s is in A and t is in B
    • Problem: Find an s-t cut of minimum capacity
    • The capacity of a cut (A, B) is the sum of the capacities of the edges going from A to B
    • Flow Value Lemma: Let f be any flow, and let (A, B) be any s-t cut. Then the value of the flow equals f_out(A) – f_in(A).
    • Weak Duality: let f be any flow and let (A, B) be any s-t cut. Then the value of the flow is at most the cut’s capacity.
    • Corollary: Let f be any flow, and let (A, B) be any cut. If v(f) = cap(A, B), then f is a max flow and (A, B) is a min cut
    • Augmenting path theorem: Flow f is a max flow iff there are no augmenting paths
    • Max-flow min-cut theorem: The value of the max flow is equal to the value of the min cut.
      • More explanation on page 350
      • Boom and we are done. Just use the same stuff as in the past section and we have all we need.
  • Section 7.5 - A First Application: Bipartite Matching Problem
    • Given an undirected bipartite graph
    • A matching is a set of edges such that each node appears in at most one edge of the set
    • How do we find a matching with the max number of edges? This is our problem.
    • Here's how we do it. We got a bunch of red and blue nodes. Add nodes s and t, which will be the source and sink. Connect s to all the red nodes with unit edges (capacity 1). Connect all the blue nodes to t with unit edges. Keep the same edges in the middle, directed from red to blue.
    • Now, use same algorithm as in the first section. BOOOM!!! Right? Yep. It's pretty intuitive, these later sections are more like practical uses rather than actual algorithms. I guess this is kind of good since pretty much all computer science problems go back to graph theory.
    • Extensions: The structure of bipartite graphs with no perfect matching
      • Discusses how to tell if it is impossible to find a perfect matching in a bipartite graph.
      • Can easily find an algorithm based on the following theorem:
        • Assume that the bipartite graph G=(V,E) has two sides X and Y such that |X| = |Y|. Then G either has a perfect matching or there is a subset A ⊆ X such that |Γ(A)| < |A| (where Γ(A) is the set of all neighbors of A). A perfect matching or an appropriate subset A can be found in O(mn) time.
  • Section 7.7 - Extensions to Max-Flow Problem
    • Circulations with Demands
      • We have a directed graph
      • edges have capacities
      • nodes have supply and demands
      • A circulation is a function that satisfies the following
        • the flow on a particular edge is always at most its capacity (makes sense and is very intuitive)
        • the stuff coming into a node minus the stuff going out equals d(v) (so that's the demand that this node has? I guess that makes enough sense…)
      • To keep the world from exploding, the sum of supplies == sum of demands
      • Algorithm: more details on pg 381 and 382
        • Add a new source s and sink t
        • For each v with d(v) < 0, add edge (s, v) with capacity -d(v)
        • For each v with d(v) > 0, add edge (v, t) with capacity d(v)
      • MAGIC: G has circulation iff G' has max flow of value D
      • MORE MAGIC: Given (V, E, c, d), there does not exist a circulation iff there exists a node partition (A, B) such that the sum of the demands of the nodes in B is greater than cap(A, B)
      • These two pretty much solve the problem for us. more details are on page 382 if you get confused rereading this sometime in the future
    • Circulation with Demands and Lower Bounds
      • Same givens as last problem
      • except!!!! we have lower bounds on each edge
      • algorithm on page 384
      • Here's how we do
      • we set the flow along an edge to be its lower bound and then we update the demand on both nodes attached to it
      • the rest is self explanatory