======= The Front Matter =======
==Summary==
This section is an introduction to greedy algorithms. Greedy algorithms build a solution by making step-by-step, local decisions to optimize some underlying criterion. There are two types of proofs: Greedy Stays Ahead and the Exchange Argument. Greedy Stays Ahead measures the progress of the algorithm and proves that at each step it does at least as well as any other solution. The exchange argument is when you consider an optimal solution and gradually transform it into the solution yielded by the greedy algorithm without jeopardizing its optimality at any point.
==Notes==
* “Greed… is good. Greed is right. Greed works.”
* An algorithm is greedy if it builds up a solution in small steps, choosing a decision at each step myopically to optimize some underlying criterion.
* When a greedy algorithm succeeds in solving a nontrivial problem optimally, it typically implies something interesting and useful about the structure of the problem itself: there is a local decision rule that one can use to construct optimal solutions.
* Types of proofs:
* Greedy Stays Ahead: if you measure the greedy algorithm’s progress in a step-by-step fashion, one sees that it does better than any other algorithm at each step, and thus produces an optimal solution.
* Exchange Argument: consider any possible solution to the problem and gradually transform it into the solution found by the greedy algorithm without hurting its quality.
==Additional Information==
There wasn't anything that I was unclear about or had to reread to understand after class or after my first run-through of this section. In terms of readability, this section was a 10 because it was just an introduction section. It was basically definitions, and they were super easy to understand because they were clearly explained. Also, this section was only a page, so it was a super quick read.

======= 4.1 Interval Scheduling =======
==Summary==
In this section, there are two problems to solve: Interval Scheduling and Interval Partitioning. In Interval Scheduling, the greedy aspect of the problem is to select intervals based on finish times (the smallest one first); this allows us to maximize the time left to complete other tasks. We prove the algorithm's optimality based on this greedy criterion by using the greedy stays ahead proof; a short sketch of the rule in code appears below. In Interval Partitioning, the greedy aspect is that we select intervals based on the earliest start time. We prove this greedy algorithm optimal by using a structural argument: the number of resources for the intervals cannot be less than the depth of the intervals.
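A minimal Python sketch of this earliest-finish-time rule (representing each request as a (start, finish) pair and the function name are my own choices, not the book's):

def interval_scheduling(requests):
    """Greedy interval scheduling: repeatedly accept the request with the
    smallest finish time among those compatible with what was accepted.
    requests: list of (start, finish) pairs; returns the accepted subset."""
    accepted = []
    last_finish = float("-inf")
    # Sorting by finish time lets a single pass play the role of the While loop.
    for start, finish in sorted(requests, key=lambda r: r[1]):
        if start >= last_finish:   # compatible with every accepted request
            accepted.append((start, finish))
            last_finish = finish
    return accepted

# The greedy choice (1, 3) rules out (0, 4) but leaves room for (4, 6).
print(interval_scheduling([(0, 4), (1, 3), (4, 6)]))   # [(1, 3), (4, 6)]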
==The Problem==
* A subset of requests is compatible if no two of them overlap in time; our goal is to accept as large a compatible subset as possible
* Compatible sets of maximum size are optimal
==Designing a Greedy Algorithm==
* The basic idea in a greedy algorithm for interval scheduling is to use a simple rule to select a first request i1, then reject all requests not compatible with i1, then select a request i2, then reject all requests incompatible with i2, and so on until we run out of requests
* The challenge in designing a good greedy algorithm is deciding which simple rule to use for the selection
* There are many natural rules for this problem that do not give optimal solutions
* Pick based on earliest start time: not optimal
* Pick based on smallest interval of time: not optimal
* For each request, count the number of requests that are not compatible with it and accept the request that has the fewest incompatible requests (pick based on fewest conflicts): surprisingly, not optimal
* The optimal rule: accept the request that finishes first, the request for which the finish time is smallest
* Allows us to maximize the time left to satisfy other requests

Algorithm:
  Initially let R be the set of all requests, and let A be empty
  While R is not yet empty
    Choose a request i in R that has the smallest finishing time
    Add request i to A
    Delete all requests from R that are not compatible with request i
  Endwhile
  Return the set A as the set of accepted requests

==Analyzing the Algorithm==
* A is a compatible set of requests
* We need to show that the solution yielded by the algorithm is optimal
* We will show that the size of A (the set of compatible requests yielded by the algorithm) equals the size of O (an optimal solution); A has the same number of intervals as O and hence is also optimal
* We will use a greedy stays ahead proof
* We want to show that each of A’s intervals finishes at least as soon as the corresponding interval in the set O
* For each r >= 1, the rth accepted request in the algorithm’s schedule finishes no later than the rth request in the optimal schedule
* For all indices r <= k, we have f(ir) <= f(jr), where ir is the rth request accepted by the greedy algorithm and jr is the rth request in O
* Now let r > 1. Assume the statement is true for r-1, and we will prove it for r. For the greedy algorithm to fail to stay ahead, it would have to fall behind at some step. However, it never does, because at worst it always has the option of choosing jr.
* The greedy algorithm returns an optimal set A
* If A is not optimal, an optimal set O must have more requests: m > k. Then there must be a request j(k+1) in O, which starts after jk ends, and thus after ik ends; therefore j(k+1) would be compatible with ik and with all of the set A. But the greedy algorithm stops with request ik, and it is only supposed to stop when R is empty – a contradiction.
* Runtime: O(n log n); the most costly portion of the algorithm is sorting the requests in order of finishing time
* Extensions: the scheduler may need to make decisions about accepting and rejecting before knowing the full set of requests, and requests may time out if the scheduler waits too long to schedule them; each request may have a different value, and we may want to maximize the total value
==Scheduling All Intervals==
* Goal is to partition all intervals across multiple resources: the Interval Partitioning Problem
* We have many identical resources available, and we want to schedule all requests using as few resources as possible
* Define the depth of a set of intervals as the maximum number that pass over any single point on the timeline
* In any instance of Interval Partitioning, the number of resources needed is at least the depth of the set of intervals
* Suppose a set of intervals has depth d, and let requests I1 through Id all pass over a common point on the timeline. Each of these intervals must be scheduled on a different resource, so we need at least d resources
* Can we design an algorithm that schedules all intervals using the minimum possible number of resources? Is there always a schedule using a number of resources equal to the depth? Yes and yes!
* Design a greedy algorithm that schedules all intervals using a number of resources equal to the depth; no solution can use a number of resources smaller than the depth
* Structural argument: there is a structural bound asserting that every possible solution must have at least a certain value

Algorithm:
  Sort the intervals by their start times, breaking ties arbitrarily
  Let I1, I2, …, In denote the intervals in this order
  For j = 1, 2, 3, …, n:
    For each interval Ii that precedes Ij in sorted order and overlaps it:
      Exclude the label of Ii from consideration for Ij
    Endfor
    If there is any label from {1, 2, …, d} that has not been excluded then:
      Assign a nonexcluded label to Ij
    Else
      Leave Ij unlabeled
    Endif
  Endfor

* If we use the greedy algorithm above, every interval will be assigned a label, and no two overlapping intervals will receive the same label.
* The greedy algorithm above schedules every interval on a resource, using a number of resources equal to the depth of the set of intervals. This is the optimal number of resources needed.
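A short Python sketch of this labeling procedure, a direct transcription of the pseudocode above (the (start, finish) pair representation and the simple O(n^2) scan are my own simplifications):

def interval_partitioning(intervals):
    """Label each interval with a resource number 1, 2, ... so that
    overlapping intervals get different labels; the number of labels
    used equals the depth. intervals: list of (start, finish) pairs."""
    order = sorted(range(len(intervals)), key=lambda j: intervals[j][0])
    labels = [None] * len(intervals)
    for pos, j in enumerate(order):
        start_j = intervals[j][0]
        # Exclude labels of earlier-starting intervals that overlap I_j.
        excluded = {labels[i] for i in order[:pos] if intervals[i][1] > start_j}
        label = 1
        while label in excluded:
            label += 1
        labels[j] = label
    return labels

# Three intervals pass over the point 1.5, so the depth (and label count) is 3.
print(interval_partitioning([(0, 3), (1, 4), (1, 2), (3, 5)]))   # [1, 2, 3, 1]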
==Additional Information==
In class, I understood what the structural argument was in the specific Interval Partitioning problem: the optimal solution cannot use fewer resources than the depth of the intervals. However, I was kind of confused by what it was in a broader sense. Luckily, after reading this section, it became much clearer. You just have to prove that there is a structural bound asserting that every possible solution must have at least a certain value. Then, you have to show that the greedy algorithm yields a result that meets that minimum value. In terms of readability, this section is a solid 9. Because I read this section after hearing the lecture in class, it was super easy to comprehend, and the proofs were written very clearly. The only proof that was a little difficult to understand, in terms of the way the book explained it, was the greedy stays ahead proof. Luckily, it wasn't too difficult to get because we learned about it in class.

======= 4.2 Scheduling to Minimize Lateness =======
==Summary==
We want to schedule all of the tasks, without overlapping, so as to minimize the maximum lateness. Scheduling jobs in order of increasing length or by smallest slack time does not always yield an optimal solution. The greedy criterion for this problem is to sort the jobs in increasing order of deadline and then schedule them in that order; of course, there should be no idle time. To prove optimality for this greedy algorithm, we will use an exchange argument, in which we slowly modify an optimal solution, ensuring it remains optimal at each step, until we transform the old optimal solution into the solution that our greedy algorithm would produce. In this case, we modify an optimal solution with inversions until it has no inversions, proving the greedy algorithm's optimality. A short sketch of the rule in code appears at the end of this section.
==The Problem==
* Request i has a deadline di and requires a contiguous time interval of length ti, but it is willing to be scheduled at any time before the deadline. Each accepted request must be assigned an interval of time of length ti, and different requests must be assigned non-overlapping intervals.
* Request i is late if it misses the deadline
* Our goal is to schedule all requests, using non-overlapping intervals, so as to minimize the maximum lateness
==Designing the Algorithm==
* Schedule jobs in order of increasing length: not optimal
* Schedule jobs by smallest slack time: not optimal
* The optimal rule: sort the jobs in increasing order of their deadlines di, and schedule them in that order (the Earliest Deadline First rule).
* s is the common start time, f is the finish time, d is the deadline, t is the length of the interval

Algorithm:
  Order the jobs in order of their deadlines
  Assume that d1 <= … <= dn (sorted deadlines)
  Initially, f = s
  Consider the jobs i = 1, …, n in this order:
    Assign job i to the time interval from s(i) = f to f(i) = f + ti
    Let f = f + ti
  End
  Return the set of scheduled intervals [s(i), f(i)] for i = 1, …, n

==Analyzing the Algorithm==
* The algorithm above produces no gaps: no idle time.
* There is an optimal solution with no idle time.
* Exchange argument: gradually modify O, preserving its optimality at each step, but eventually transforming it into a schedule that is identical to the schedule A found by the greedy algorithm.
* All schedules with no inversions and no idle time have the same maximum lateness.
* They can only differ in the order in which jobs with identical deadlines are scheduled, which cannot affect the maximum lateness.
* There is an optimal schedule that has no inversions and no idle time.
* If O has an inversion, then there is a pair of jobs i and j such that j is scheduled immediately after i and has dj < di. Swapping i and j removes this inversion without increasing the maximum lateness, so repeated swaps transform O into the no-inversion schedule A.
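A minimal Python sketch of the Earliest Deadline First rule, following the pseudocode above (representing each job as a (ti, di) pair is my own choice, not the book's):

def minimize_lateness(jobs):
    """Earliest Deadline First: schedule jobs back to back in order of
    increasing deadline. jobs: list of (t, d) = (length, deadline) pairs.
    Returns the schedule [(s_i, f_i)] and the maximum lateness."""
    f = 0                                               # common start time s = 0
    schedule, max_lateness = [], 0
    for t, d in sorted(jobs, key=lambda job: job[1]):   # sort by deadline d_i
        s, f = f, f + t                                 # no idle time between jobs
        schedule.append((s, f))
        max_lateness = max(max_lateness, f - d)         # job is late iff f_i > d_i
    return schedule, max_lateness

# Jobs (length, deadline): EDF schedules the d=2 job first, then d=4, then d=6.
print(minimize_lateness([(3, 6), (2, 2), (1, 4)]))   # ([(0, 2), (2, 3), (3, 6)], 0)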
======= 4.7 Clustering =======
==Summary==
In this section, we want to divide a set of objects into k groups so that objects in the same group are close together and objects in different groups are far apart; that is, we want a k-clustering of maximum spacing. The distance function satisfies d(pi, pi) = 0, d(pi, pj) > 0 when pi and pj are distinct, and d(pi, pj) = d(pj, pi). The algorithm to find this clustering is basically running Kruskal’s Algorithm but stopping it right before it adds its last k-1 edges. The connected components formed by deleting the k-1 most expensive edges of the minimum spanning tree T create a k-clustering of maximum spacing.
==The Problem==
* A collection of objects that one is trying to classify or organize into coherent groups.
* Define a distance function on the objects, with the interpretation that objects at a larger distance from one another are less similar to each other.
* Given a distance function on the objects, the clustering problem seeks to divide them into groups so that objects within the same group are “close” and objects in different groups are “far apart”.
* n objects labeled p1, p2, …, pn form the set U
* For each pair pi and pj, we have a distance d(pi, pj)
* d(pi, pi) = 0, d(pi, pj) > 0 for distinct pi and pj, and d(pi, pj) = d(pj, pi)
* We want to divide the objects in U into k groups, for a given parameter k
* A k-clustering of U is a partition of U into k nonempty sets C1, C2, …, Ck
* The spacing of a k-clustering is the minimum distance between any pair of points lying in different clusters
* We are looking for the k-clustering with the maximum possible spacing
==Designing the Algorithm==
* Consider growing a graph on the vertex set U
* Connected components will be the clusters
* Draw an edge between the closest pair of points, then an edge between the next closest pair of points, and so on; continue adding edges between pairs in order of increasing distance d(pi, pj)
* If we are about to add edge (pi, pj) and find that pi and pj are already in the same cluster, we do not add the edge (we never create a cycle, so the components form a union of trees)
* This iterative merging of clusters is termed single-link clustering, a special case of hierarchical agglomerative clustering
* Our procedure above is Kruskal’s Minimum Spanning Tree Algorithm; the only difference is that we stop the procedure once we obtain k connected components
* Run Kruskal’s Algorithm but stop it just before it adds its last k-1 edges (a runnable sketch follows the proof below)
* The components C1, C2, …, Ck formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.
* The spacing of this k-clustering is precisely the length d* of the (k-1)st most expensive edge in the minimum spanning tree: no two points at distance less than d* can lie in different clusters, since Kruskal’s Algorithm would already have connected them before stopping. Consider some other clustering, which partitions U into nonempty sets C1', C2', …, Ck'. We need to show that the spacing of this other clustering is at most d*. Because the two clusterings are not the same, one of our clusters Cr is not a subset of any of the k sets Cs'. Then there are points pi and pj in Cr that belong to different sets in the other clustering, say pi in Cs' and pj not in Cs'. Because pi and pj belong to the same component Cr, Kruskal's Algorithm must have added all of the edges of a pi-pj path P before it stopped, and each edge on P has length at most d*. Let p' be the first node on P that does not belong to Cs', and let p be the node on P that comes just before p'. Then d(p, p') <= d*, since the edge (p, p') was added by Kruskal's Algorithm. But p and p’ belong to different sets in the other clustering, hence the spacing of the other clustering is at most d(p, p’) <= d*.
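A runnable sketch of this single-link procedure with a simple union-find, stopping Kruskal's Algorithm at k components (the function names and the one-dimensional example are mine):

def k_clustering(points, k, d):
    """Partition points into the k-clustering of maximum spacing by
    running Kruskal's Algorithm on all pairwise edges and stopping
    once k connected components remain. Returns a cluster label per point."""
    n = len(points)
    parent = list(range(n))

    def find(x):                          # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((d(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    components = n
    for dist, i, j in edges:              # edges in order of increasing distance
        if components == k:               # stop before the last k-1 merges
            break
        ri, rj = find(i), find(j)
        if ri != rj:                      # skip edges inside a cluster: no cycles
            parent[ri] = rj
            components -= 1
    return [find(i) for i in range(n)]

# {1, 2} and {10, 11} form the 2-clustering of maximum spacing (spacing 8).
print(k_clustering([1, 2, 10, 11], 2, lambda a, b: abs(a - b)))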
==Additional Information==
The thing that confused me in this section was the proof of the algorithm's correctness; the notation in the proof was super confusing to me. The proof rests on the fact that Kruskal's Algorithm adds the smallest-distance edges first and leaves the largest for last, and those largest edges it does not add. Thus, whenever another clustering deviates from the one Kruskal's Algorithm produces, some pair of points separated by the deviant clustering is at distance at most d*, the largest distance Kruskal's Algorithm added. Therefore, Kruskal's produces the optimal solution. On a scale of 1 to 10, the readability of this section was a 9. The book did a really great job explaining what a clustering is, and the description of the algorithm was super clear. The only thing that confused me a little was the proof of the algorithm we used. But, still, I feel like I at least got the gist of the proof.

======= 4.8 Huffman Codes =======
==Summary==
We want to map letters to encodings so that no encoding is the prefix of another, in order to avoid ambiguity. A prefix code that minimizes the average bits per letter is optimal; in other words, more frequently used letters should have shorter encodings in order to compress the data. A brute-force algorithm is infeasible because the search space is so large and so variable. We will use a full binary tree to represent prefix codes; all encoded letters will be leaf nodes, so that no encoding is the prefix of another. Meta-letters will be interior nodes (parents of letters). The greedy aspect of Huffman's Algorithm is that you merge the two lowest-frequency letters at each step. This bottom-up approach is the optimal approach. The runtime of the algorithm is O(k log k).
==The Problem==
* A greedy rule is used to shrink the size of the problem instance, so that an equivalent smaller problem can then be solved by recursion
* The global consequences of the initial greedy decision do not become fully apparent until the full recursion is complete
* Data compression is an area that forms part of the foundation for digital communication
* The mapping of bit strings to symbols is arbitrary
* The letters in most human alphabets do not get used equally frequently
* The issue of reducing the average number of bits per letter is a fundamental problem in the area of data compression
* How might we construct the optimal way to take advantage of the nonuniform frequencies of the letters?
* Appealing goal for compressing data: squeeze all the available gains out of nonuniformities in the frequencies
* Variable-Length Encoding Schemes
* Morse code uses such short strings for the letters that the encoding of words becomes ambiguous
* To deal with this ambiguity, Morse code transmissions involve short pauses between letters
* Prefix Codes
* The ambiguity in Morse code arises because there exist pairs of letters where the bit string that encodes one letter is a prefix of the bit string that encodes another
* It is enough to map the letters to bit strings in such a way that no encoding is a prefix of any other
* A prefix code for a set S of letters is a function γ that maps each letter x in S to some sequence of zeroes and ones, in such a way that for distinct x, y in S, the sequence γ(x) is not a prefix of the sequence γ(y).
* If we hand this message to a recipient who knows the function γ, they will be able to reconstruct the text according to the following rule:
* Scan the bit sequence from left to right
* As soon as you’ve seen enough bits to match the encoding of some letter, output this as the first letter of the text (it must be the correct first letter, because no shorter or longer prefix of the bit sequence could encode any other letter)
* Delete the corresponding bits from the front of the message and iterate
* Optimal Prefix Codes
* More frequent letters should have shorter encodings
* Notation to express the frequencies of the letters; below is the average number of bits required per letter:
* ALB(γ) = Σx∈S fx·|γ(x)|
* A prefix code that minimizes the average number of bits per letter, ALB(γ), is optimal
==Designing the Algorithm==
* The search space includes all possible ways of mapping letters to bit strings, subject to the defining property of prefix codes
* Brute force is infeasible
* To construct an optimal prefix code: develop a tree-based means of representing prefix codes
* Using a binary tree, follow the path from the root to the leaf labeled x: going from a node to its left child appends a 0, going from a node to its right child appends a 1, and the resulting string of bits is the encoding of x
* The encoding of S constructed from T is a prefix code
* Start with a root; all letters x in S whose encodings begin with 0 will be leaves of the left subtree of the root, and all letters y in S whose encodings begin with a 1 will be leaves of the right subtree; build the two subtrees recursively using this rule
* The search for an optimal prefix code can be viewed as the search for a binary tree T, together with a labeling of the leaves of T, that minimizes the average number of bits per letter
* The length of the encoding of a letter x in S is simply the length of the path from the root to the leaf labeled x
* We are seeking the labeled tree that minimizes the weighted average of the depths of all leaves, where the average is weighted by the frequencies of the letters that label the leaves: Σx∈S fx·depthT(x)
* A binary tree is full if each node that is not a leaf has two children
* The binary tree corresponding to the optimal prefix code is full
* A natural approach would be to try building a tree from the top down, packing the leaves as tightly as possible: this is suboptimal
* Suppose that u and v are leaves of T* such that depth(u) < depth(v). Suppose that in a labeling of T* corresponding to an optimal prefix code, leaf u is labeled with y in S and leaf v is labeled with z in S. Then fy >= fz.
* Go through the leaves in order of increasing depth, assigning letters in order of decreasing frequency
* Among the labels we assign to a block of leaves all at the same depth, it doesn’t matter which label we assign to which leaf
* Consider a leaf v in T* whose depth is as large as possible. Leaf v has a parent u, and since T* is a full binary tree, u has another child w. We refer to v and w as siblings, since they have a common parent.
* w is a leaf of T*
* There is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*.
* y* and z* are the two lowest-frequency letters in S
* Their common parent acts like a “meta-letter” whose frequency is the sum of the frequencies of y* and z*.
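Before the algorithm itself, a small Python sketch of the left-to-right decoding rule from the Prefix Codes discussion above (the example code table is hypothetical):

def decode(bits, code):
    """Decode a bit string under a prefix code: as soon as the bits seen
    so far match the encoding of some letter, output that letter and
    start over on the rest. code: dict mapping letter -> bit string."""
    inverse = {word: letter for letter, word in code.items()}
    text, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:   # unambiguous: no codeword is a prefix of another
            text.append(inverse[current])
            current = ""
    return "".join(text)

# A hypothetical prefix code for {a, b, c}: no encoding is a prefix of another.
print(decode("010110", {"a": "0", "b": "10", "c": "11"}))   # "abca"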
==Huffman’s Algorithm==

To construct a prefix code for an alphabet S, with given frequencies:
  If S has two letters:
    Encode one letter using 0 and the other using 1
  Else:
    Let y* and z* be the two lowest-frequency letters
    Form a new alphabet S’ by deleting y* and z* and replacing them with a new letter w of frequency fy* + fz*
    Recursively construct a prefix code γ’ for S’, with tree T’
    Define a prefix code for S as follows:
      Start with T’
      Take the leaf labeled w and add two children below it labeled y* and z*
  Endif

* The greedy rule underlying Huffman’s Algorithm – the merging of the two lowest-frequency letters – fits into the structure of the algorithm as a whole
* Simply commit to having them be children of the same parent; this is enough to produce a new, equivalent problem with one less letter
* Bottom-up approach: it focuses on the leaves representing the two lowest-frequency letters and then continues by recursion
==Analyzing the Algorithm==
* The Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code.
* Implementation and Runtime
* There are k-1 iterations of recursive calls
* Using an implementation of priority queues via heaps, we can perform insertion and extraction of the minimum in O(log k) time
* Each iteration, which performs three of these operations, takes O(log k) time
* Summing over all k iterations, we get a total runtime of O(k log k); see the sketch at the end of this section
==Extensions==
* The challenge here is to define an encoding scheme where the notion of using fractions of bits is well-defined
* Another drawback of prefix codes is that they cannot adapt to changes in the text
==Additional Information==
A concept that was troubling me in the beginning stages of understanding Huffman codes was the concept of the meta-letter that is the parent of leaf nodes. As I read the section after class, it made more sense. The meta-letter is there because a letter node cannot be there: a letter's encoding cannot be an interior node, because that would mean it is the prefix of another encoding, which defeats the point of Huffman codes. There can be no encoding that is the prefix of another. Moreover, meta-letters are helpful and make more sense in Huffman's optimal bottom-up approach anyway. The first meta-letter joins the two letters of lowest frequency; that way it's like a super-letter, and now the alphabet has one less letter. This explanation of the purpose of the meta-letter also makes it clear why we would implement the algorithm with a priority queue. On a scale of 1 to 10, the readability of this section is a 9. The book explained the notation and the concepts behind the Huffman code really well. Also, the main part of the code involved binary trees, which we've already seen multiple times in this class, so it was really building off the knowledge we already had.
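To tie the section together, a runnable Python sketch of Huffman's Algorithm using a heap-based priority queue, matching the O(k log k) analysis above (the tuple-based tree representation and all names are mine, not the book's):

import heapq
from itertools import count

def huffman(freq):
    """Huffman's Algorithm: repeatedly merge the two lowest-frequency
    letters under a new parent (the "meta-letter"). freq: dict mapping
    letter -> frequency. Returns a dict mapping letter -> bit string."""
    tiebreak = count()   # breaks frequency ties so trees are never compared
    # A leaf is a 1-tuple (letter,); an internal node is a pair (left, right).
    heap = [(f, next(tiebreak), (letter,)) for letter, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # y*: lowest-frequency letter
        f2, _, t2 = heapq.heappop(heap)   # z*: second-lowest-frequency letter
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    tree = heap[0][2]

    codes = {}
    def walk(node, path):
        if len(node) == 1:                # leaf: record the letter's encoding
            codes[node[0]] = path or "0"
        else:                             # left child = 0, right child = 1
            walk(node[0], path + "0")
            walk(node[1], path + "1")
    walk(tree, "")
    return codes

# More frequent letters get shorter encodings: a -> 1 bit, c and d -> 3 bits.
print(huffman({"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.15}))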