====== Chapter 4.6 Summary ======
  * In this section, we explore the Union-Find data structure, which supports rapid searching and updating of the connected components of a graph while Kruskal's Algorithm is being implemented.
  * Since Kruskal's Algorithm adds edges before the graph is necessarily connected, the set of edges in the MST begins as a collection of separate connected components until the algorithm terminates. This means it would be really helpful to have a structure that tells us which connected components the endpoints of each edge (v, w) belong to throughout our algorithm implementation. The Union-Find data structure allows us to check whether a node //u// is connected to a node //v// by testing whether Find(u) = Find(v), since Find(u) returns the name of the set containing //u//. Note that the Union-Find data structure is only equipped to handle edge addition, not deletion, throughout the Kruskal's Algorithm implementation.
  * The Union-Find data structure can be implemented using an array called //Component// with indices 1 through n corresponding to the nodes 1 through n, where the value stored at //Component[v]// is the name of the connected component that contains node //v//. Even with some optimizations that handle the union of a large set into another component carefully, the worst-case run-time of a single Union using this implementation is O(n), which is not ideal when unions will be performed repeatedly throughout Kruskal's Algorithm.
  * A different approach uses pointers from nodes to connected components, where each component is named after one of its member nodes (the first node to be added to the component, i.e. to exist without being connected to any other nodes). Using this data structure (sketched below), it is possible to implement Kruskal's Algorithm in O(mlogn) time, because we use the Find function 2m times (and Find now has a runtime of O(logn)), and the Union operation n-1 times (Union has a runtime of O(1) and MakeUnionFind has an O(n) runtime).
  * Question: Would we just use //n// 2-element arrays to create the node object in the pointer implementation? Would the first element in the array be the actual node and then the second element be the node it points to?
  * 6/10 very intricate stuff
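  * Below is a minimal Python sketch of how I picture the pointer-based implementation; the class and method names and the union-by-size rule are my own choices, not the book's exact presentation.
<code python>
# Sketch only: parent/size arrays and union-by-size are assumptions of this example.
class UnionFind:
    def __init__(self, n):
        # Each node starts as its own one-element component.
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, v):
        # Follow pointers up to the name (root) of v's component: O(log n) with union-by-size.
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def union(self, u, v):
        # Merge the two components, pointing the smaller one at the larger.
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]

# Kruskal-style use: add edge (u, v) to the MST only if find(u) != find(v).
</code>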

====== Chapter 4.7 Summary ======
  * In this section we talk about clustering. The algorithm for constructing a k-clustering of maximum spacing is really Kruskal's Algorithm stopped early: we start with an empty graph H and add edges between the closest pairs of points, one at a time, until we have k connected components (see the sketch below). The only time we don't add an edge (//p<sub>i</sub>//, //p<sub>j</sub>//) is when //p<sub>i</sub>// and //p<sub>j</sub>// are already in the same connected component, ensuring that we do not create a cycle. We know the spacing of the resulting k-clustering is optimal by (4.26). The idea behind the proof is to assume there is some other k-clustering //C'// of the point set U and show that the spacing of //C'// is at most the spacing //d*// of our solution //C//. If //C'// is not equal to //C//, there exist two points //p<sub>i</sub>// and //p<sub>j</sub>// that are in the same cluster //C<sub>r</sub>// of //C// but in different clusters of //C'//, say //p<sub>i</sub>// in //C'<sub>s</sub>// and //p<sub>j</sub>// in //C'<sub>t</sub>//, which does not equal //C'<sub>s</sub>//. Since both points are in the same cluster of //C//, the path //P// between them was added entirely by Kruskal's Algorithm, so every edge on //P// has length at most //d*//. But because //p<sub>i</sub>// and //p<sub>j</sub>// lie in different clusters of //C'//, some edge of //P// must cross between two clusters of //C'//, and that edge has length at most //d*//. Therefore the spacing of //C'// is at most //d*//, which concludes our proof.
  * The proof of this algorithm was a bit hard to follow at first, which is why I typed it out as well as I could in my own words instead of summarizing it more compactly. It makes sense now, but I am just wondering: is there some way that a greedy-stays-ahead proof could work to prove the optimality of our solution? It seems like since we are constantly optimizing the edges added to our clusters, the greedy algorithm stays ahead intuitively, and often when this is the case we can prove optimality using a greedy-stays-ahead proof.
  * Rating: 4/10
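  * Here is a rough sketch of the "Kruskal stopped early" idea from this section; the input format (a list of (distance, i, j) triples) and the tiny inlined union-find helper are my own assumptions for the example.
<code python>
# Sketch: sort the pairwise distances, union components until only k remain;
# the next cross-component distance is the spacing d*.
def k_clustering(n, edges, k):
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    components = n
    for d, i, j in sorted(edges):          # consider distances in increasing order
        ri, rj = find(i), find(j)
        if ri != rj:
            if components == k:            # the next merge would drop below k clusters,
                return d                   # so this distance is the spacing d*
            parent[ri] = rj
            components -= 1
    return None

# Example: 4 points on a line at positions 0, 1, 10, 11, asked for k = 2 clusters.
pts = [0, 1, 10, 11]
edges = [(abs(pts[a] - pts[b]), a, b) for a in range(4) for b in range(a + 1, 4)]
print(k_clustering(4, edges, 2))  # spacing is 9
</code>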

====== Chapter 4.8 Summary ======
  * Data compression is an area that studies compression algorithms-- algorithms that take files as input and reduce their space through efficient encoding. For example, when encoding 32 characters of English text using 0's and 1's, the simplest solution would be to represent each character using 5 bits, since 32 = 2<sup>5</sup>. But anyone who speaks English, or any language for that matter, knows that certain characters are used much more frequently than others, so we could encode the language more efficiently by assigning short bit strings to frequent letters and longer bit strings to the less frequent letters. Algorithms to implement this type of optimization are what we are looking for.
  * A prefix code for a set //S// of letters is a function that maps each letter in //S// to some sequence of zeros and ones, in such a way that for distinct letters //x// and //y// in //S//, the sequence encoding //x// is not a prefix of the sequence encoding //y//. Our ideal optimal prefix code, given an alphabet and a set of frequencies for the letters in the alphabet, will minimize the average number of bits per letter, ABL(γ). We will use a binary tree to help us come up with such an optimal prefix code. Each leaf of the tree will be a letter in the alphabet //S//. We start at the root of the tree //T// and follow the path to our desired letter //x//. Each time we move from a node to its left child, we write down a 0, and each time we move from a node to its right child we write down a 1. The bit string we are left with when we finally arrive at //x// is the encoding for the letter //x//. Using this system, we ensure that the code is a prefix code, because no letter's leaf lies on the path to another leaf (since each leaf is the end of a downward path through the tree). Next we observe that the binary tree corresponding to an optimal prefix code is full. This is true because if a certain parent node had only one child, the child could be moved up to take the parent's place, creating a shorter encoding for that child and a more optimal overall encoding function.
  * A top-down approach that splits the alphabet into nearly-equal halves recursively does not guarantee an optimal prefix code. However, there exists a greedy approach that yields an optimal prefix code using a binary tree. By (4.29), if a leaf //i// has greater depth than a leaf //j//, then the letter represented by leaf //i// has frequency at most that of the letter represented by leaf //j//. It follows that we can build an optimal prefix code by first taking all leaves of depth 1 and labeling them with the highest-frequency letters, then labeling the leaves of depth 2 with the next-highest-frequency letters, and so on. This only partially solves our overall problem: it lets us create an optimal prefix encoding given the structure of a tree //T// and the alphabet with its letter frequencies, but we aren't really building the optimal encoding "from scratch".
  * In order to do so, we must build our tree from the bottom up (see the sketch below). We consider the set of all letters in our alphabet, choosing the two with the smallest frequencies-- call them //e// and //f//. These are merged into one single letter (//ef//) of frequency //f<sub>e</sub>// + //f<sub>f</sub>//. Then this merged letter is added back to the set of letters and //e// and //f// are deleted from the set. We continue to merge the two lowest-frequency letters in the set until all the letters have been merged into one, which becomes the root of our tree with frequency 1.
  * Question: Implementing this algorithm using priority queues via heaps is a clear choice, but the idea of using the tree to represent our overall encoding does not seem as obvious. I assume we would use an array for our underlying implementation of the binary tree since it allows constant access?
  * Rating: 6/10 longest section ever! But reading it again after discussing it in class helped clarify a lot.
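  * Here is a sketch of the bottom-up merging procedure (Huffman's algorithm) using Python's heapq as the priority queue; representing the tree as nested tuples and breaking frequency ties with a counter are my own choices.
<code python>
# Sketch: each heap entry is (frequency, tiebreaker, tree), where a tree is either
# a letter or a (left, right) pair of subtrees. These representations are assumptions.
import heapq
from itertools import count

def huffman_codes(freqs):
    """freqs: dict mapping letter -> frequency. Returns dict letter -> bit string."""
    tiebreak = count()
    heap = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # the two lowest-frequency "letters"
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))  # merged letter
    _, _, root = heap[0]

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: 0 = left child, 1 = right child
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # single-letter alphabet edge case
        return codes
    return walk(root, "")

print(huffman_codes({"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}))
</code>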

====== Chapter 5.1 Summary ======
  * In this section, we look at the mergesort algorithm, a very common example of a divide and conquer algorithm. The general structure of mergesort is that we divide the input into two pieces of equal size, sort the two halves recursively, then combine the two results into an overall solution, spending linear time for the initial division and final recombining (a short sketch appears at the end of this section). If we let //T(n)// be the worst-case running time of this algorithm on an input of size //n//, we can write the recurrence relation shown in (5.1). It states that the worst-case runtime T(n) is at most 2T(n/2) + O(n), where the O(n) term can be written as cn for some constant c. Unfortunately this representation of T(n) is not so helpful on its own, because we would prefer to have T(n) only on the left side of the inequality, not both sides. We will need to solve the recurrence relation.
  * There are a few approaches to solving the recurrence relation. One is to "unroll" the recursion, finding a pattern for the runtime that depends on the level of recursion. This is the intuitive way to solve the recurrence relation for mergesort. The other way is to start with a guess for the solution, then substitute it into the recurrence relation and check its accuracy.
  * Here is how "unrolling" the recursion works for mergesort: At the first level, we have a single problem of size //n//, which by definition takes at most //cn// plus the time spent in all subsequent recursive calls. The next level consists of two problems each of size //n/2//, which each take at most //cn/2// plus the time spent in all subsequent recursive calls, so this level also takes //cn// time overall. This pattern continues, so each level takes at most //cn// time. Since there are at most log<sub>2</sub>//n// levels of recursion for an input of size //n//, and each level takes //cn// time at most, we get a total running time of O(//n//log//n//).
  * Here is how guessing a solution and substituting it into the recurrence relation works for mergesort: We can prove by induction that //T(n)// ≤ //cn//log<sub>2</sub>//n// for all //n// ≥ 2. We assume that //T(m)// ≤ //cm//log<sub>2</sub>//m// for all //m// less than //n// and then prove inductively that the same holds for //n//. If we do not know for which constant //k// we have //T(n)// ≤ //kn//log<sub>2</sub>//n//, we can also use partial substitution to find this constant //k//.
  * Questions: When would we use the guessing-a-solution approach to solve a recurrence relation? Both types of guessing approaches seem to start with some specific idea of what the runtime will be. Would it ever be easier to use this than the "unrolling" approach?
  * Rating: 4/10
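  * A short mergesort sketch, just to pin down where the cn work per level comes from: the divide step is trivial and merge() is the linear-time combining step.
<code python>
# Sketch: each call makes two half-size recursive calls plus O(n) merging work,
# which is exactly the T(n) <= 2T(n/2) + cn recurrence.
def mergesort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = mergesort(a[:mid])     # T(n/2)
    right = mergesort(a[mid:])    # T(n/2)
    return merge(left, right)     # the O(n) combine step

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

print(mergesort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
</code>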

====== Chapter 5.2 Summary ======
  * This chapter is all about unrolling different recurrence relations to come up with asymptotic runtimes. We look at the recurrence relation T(n) ≤ qT(n/2) + cn in separate cases: q > 2, q = 1, and q = 2. In the case that q > 2, unrolling the recurrence the same way we demonstrated extensively in class and on our problem set gives an asymptotic runtime of O(n<sup>log<sub>2</sub>q</sup>). In the case that q = 1, the runtime is O(n) because each level //j// of the recursion tree has a single problem of size n/2<sup>j</sup>, contributing cn/2<sup>j</sup> work, and these terms form a convergent geometric series. These recurrence relations could also have been solved by partial substitution, in which a bound is guessed and then checked through induction. When q = 2, the asymptotic runtime of the algorithm is O(nlogn). This range of running times can be attributed to which part of the recursion accounts for most of the work: when q = 1, the first level dominates the running time; when q > 2, the bottom level dominates; and when q = 2, each level does an equal share of the work, which makes O(nlogn) sensible since the work is spread evenly across the logn levels. (A quick numeric check of these three cases appears below.) Last, we find that the recurrence relation T(n) ≤ 2T(n/2) + cn<sup>2</sup> yields a running time of O(n<sup>2</sup>). This makes sense because at each level //j// there are 2<sup>j</sup> problems, each of size n/2<sup>j</sup> and hence costing c(n/2<sup>j</sup>)<sup>2</sup>. Summed as a geometric series over the levels, this leaves us with T(n) = O(n<sup>2</sup>).
  * Question: It would be helpful to see an example of figuring out asymptotic run time using partial substitution. I think the ideas make sense in the book but it would be difficult for me to think of them on my own.
  * Rating: 4/10
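  * As a sanity check on the three cases (this is my own experiment, not from the book), the snippet below computes T(n) = qT(n/2) + n exactly with c = 1 and compares its growth against the claimed bounds; each ratio should level off as n grows.
<code python>
# Sketch: the recursion below is just the definition of T(n), evaluated exactly
# for powers of two, not an efficient algorithm.
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n, q):
    if n <= 1:
        return 1
    return q * T(n // 2, q) + n

for n in (2**10, 2**16):
    print(n,
          round(T(n, 1) / n, 2),                # q = 1: ratio levels off, so T(n) = O(n)
          round(T(n, 2) / (n * log2(n)), 2),    # q = 2: ratio levels off, so T(n) = O(n log n)
          round(T(n, 3) / n**log2(3), 2))       # q = 3 > 2: ratio levels off, so T(n) = O(n^{log2 q})
</code>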

====== Chapter 5.3 Summary ======
  * This chapter deals with the "movie ranking" problem we worked on in class, where two preference lists are compared to see how similar they are. The strategy is to count the number of inversions between them. To compare two separate lists, it is only necessary to use the ordering of one list as the "regular" ordering and then do all actual operations on the other; that second list is the one involved in the algorithm. To find the number of inversions, we split the list into two halves, count how many inversions exist in each half alone, then count how many inversions exist between the two halves, and repeat this process recursively. The recursive step of this algorithm is also responsible for sorting the lists, so that the combining step has less work to do and the running time stays low. This combining step is called Merge-and-Count (see the sketch below). Merge-and-Count walks along the sorted lists A and B, appending the smaller current element to the sorted output list C. We keep a pointer to our current position in each list to track our place while adding to C. If the pointers point to elements a and b, the smaller of the two is removed, appended to C, and that list's pointer is advanced. In this same step we can also count the inversions: if an element of A is appended to C, no new inversions are counted (since that element is smaller than everything remaining in B), but if an element of B is appended to C, the number of inversions added is the number of elements remaining in A. Merge-and-Count runs in O(n) time and the entire algorithm requires O(nlogn) time.
  * Questions: Would this algorithm fall apart if the list were to be recursively divided into three sub-lists? Would it be possible to use more sub-problems or would this break it? The step where we don't have to add inversions to C if the new element comes from A and vice versa for B would be a bit different but I feel like it would still work somehow.
  * Rating: 4/10
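  * A sketch of Sort-and-Count / Merge-and-Count as I understand it; counting len(A) - i inversions whenever an element of B is appended is the key step.
<code python>
# Sketch: returns (sorted list, number of inversions). Function names are my own.
def sort_and_count(a):
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = sort_and_count(a[:mid])
    right, inv_right = sort_and_count(a[mid:])
    merged, inv_split = merge_and_count(left, right)
    return merged, inv_left + inv_right + inv_split

def merge_and_count(A, B):
    C, inversions, i, j = [], 0, 0, 0
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            C.append(A[i]); i += 1          # element from A: no new inversions
        else:
            C.append(B[j]); j += 1
            inversions += len(A) - i        # B[j] is inverted with everything left in A
    C.extend(A[i:]); C.extend(B[j:])
    return C, inversions

print(sort_and_count([2, 4, 1, 3, 5]))  # ([1, 2, 3, 4, 5], 3)
</code>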

====== Chapter 5.4 Summary ======
  * In this section, we are going to find the closest pair of points! The first step of our algorithm is to sort all n points into a list P<sub>x</sub> of ascending x-coordinates and another list P<sub>y</sub> of ascending y-coordinates. Then we divide the plane of points into two halves, Q and R, each containing n/2 points, by splitting the list P<sub>x</sub> in the middle. Next, by looking through P<sub>x</sub> and P<sub>y</sub>, in O(n) time we can create a list of the points in Q sorted by increasing x-coordinate, a list of the points in Q sorted by increasing y-coordinate, and the same two lists for the R half. Then we recursively find the closest pair within Q and the closest pair within R, and assign the smaller of these two distances to the variable δ. Then we search in between the two halves, within a strip of width δ on either side of the dividing line. If there are no points within δ of the dividing line, no pair split between the two halves can beat δ, so the closest pair found within Q or R is closest overall. Otherwise, we search along the dividing line using boxes of size δ/2 x δ/2: if a point lies in one such box, it only needs to be compared to points within the next 15 boxes upward and to the sides of its own box (as long as they are across the dividing line); it need not be compared to any other points along the dividing line. A sketch of this combining step appears below. This algorithm completes in O(nlogn) time.
  * Questions: The proof of this algorithm is a bit fuzzy to me. I would like to discuss it in class.
  * Rating: 7/10 this is a really cool algorithm.
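  * A sketch of just the combining step: given δ from the two halves and the strip of points within δ of the dividing line sorted by y, compare each point to the next 15 points in that order (the same bound the 15-box argument gives). The function and variable names here are my own.
<code python>
# Sketch: points_by_y is assumed sorted by y-coordinate; x_mid is the x-coordinate
# of the dividing line; delta is the best distance found within either half.
from math import dist  # Python 3.8+

def combine(points_by_y, x_mid, delta):
    strip = [p for p in points_by_y if abs(p[0] - x_mid) < delta]
    best = delta
    for i, p in enumerate(strip):
        # Points more than 15 positions ahead in y-order are more than delta away.
        for q in strip[i + 1:i + 16]:
            best = min(best, dist(p, q))
    return best

pts = sorted([(0.0, 0.0), (1.9, 0.1), (2.1, 0.15), (4.0, 3.0)], key=lambda p: p[1])
delta = 1.903  # roughly the closest distance found within either half
print(combine(pts, x_mid=2.0, delta=delta))  # ~0.206: the closest pair straddles the line
</code>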

====== Chapter 6.1 Summary ======
  * This section deals with weighted interval scheduling. The input is a set of jobs with start times, end times, and values (or weights), and the goal is to accept the set of non-overlapping jobs with the greatest total value possible. No natural greedy algorithm works here. So the idea is that we will design an algorithm based on the observation that for each job, we will either select it or not. If we do not select job j, we go on to consider job j-1. If we do select it, we add the value of j plus the optimal value over the jobs compatible with j (those that end before j starts). To decide whether or not to take job j, we simply choose, recursively, the maximum of these two options. The efficiency of this algorithm relies on memoizing the recursion: after the first time we compute the optimal value for a certain job, we store it in a globally accessible array, so we never have to recompute it when it is needed again. Using memoization, the Compute-Opt procedure becomes O(n). A sketch of the memoized recurrence follows below.
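  * Sketch of the memoized recurrence OPT(j) = max(OPT(j-1), v<sub>j</sub> + OPT(p(j))); the (start, finish, value) job format and the bisect trick for computing p(j) are my own shortcuts for the example.
<code python>
# Sketch: p(j) is the last job (in finish-time order) that ends no later than job j starts.
from bisect import bisect_right
from functools import lru_cache

def weighted_interval_scheduling(jobs):
    jobs = sorted(jobs, key=lambda j: j[1])          # sort by finish time
    finishes = [f for _, f, _ in jobs]
    # p[j]: 1-based index of the rightmost job finishing <= start of job j (0 if none)
    p = [bisect_right(finishes, s) for s, _, _ in jobs]

    @lru_cache(maxsize=None)
    def opt(j):                                      # optimum over the first j jobs
        if j == 0:
            return 0
        s, f, v = jobs[j - 1]
        return max(opt(j - 1), v + opt(p[j - 1]))    # skip job j, or take it

    return opt(len(jobs))

print(weighted_interval_scheduling([(0, 3, 2), (1, 5, 4), (4, 6, 4), (5, 8, 7)]))  # 11
</code>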

====== Chapter 6.2 Summary ======
  * Instead of defining our solution to the weighted interval problem recursively, we can also create an equally efficient iterative solution. In this solution, we begin by setting M[0] = 0 (the optimal value over an empty set of jobs), then iterate through the jobs, filling in M[j] for j = 1 up to n (see the sketch after this list). From here on out we will use dynamic programming algorithms that take this iterative approach rather than the recursive one. In order to build an algorithm based on dynamic programming, we need a set of subproblems that satisfies these properties:
    * there are a polynomial number of subproblems
    * solutions to subproblems can be easily combined to solve the overall problem
    * subproblems can be naturally ordered from "smallest" to "largest"
  * Question: Why is it better to use the iterative approach than the recursive approach? Is it just for the sake of writing the algorithm or is there some type of running time tradeoff I am missing?
  * Rating: 5/10 because it didn't give much detail on the reason for our new interest in iterative algorithms instead of recursive ones.
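  * Here is the iterative version, filling M[0..n] in one pass; it assumes the same (start, finish, value) job format and p[] computation as the sketch in the 6.1 section.
<code python>
# Sketch: same recurrence as the memoized version, but the table is filled in order.
from bisect import bisect_right

def iterative_opt(jobs):
    jobs = sorted(jobs, key=lambda j: j[1])
    finishes = [f for _, f, _ in jobs]
    p = [bisect_right(finishes, s) for s, _, _ in jobs]

    M = [0] * (len(jobs) + 1)                     # M[j] = optimum over the first j jobs
    for j in range(1, len(jobs) + 1):
        v = jobs[j - 1][2]
        M[j] = max(M[j - 1], v + M[p[j - 1]])     # skip job j, or take it
    return M[len(jobs)]

print(iterative_opt([(0, 3, 2), (1, 5, 4), (4, 6, 4), (5, 8, 7)]))  # 11
</code>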

====== Chapter 6.3 Summary ======
  * The next section focuses on problems where we do not just pick between choosing something or not, but instead have to choose between some polynomial number of possibilities. These are called "multi-way choices".
  * We will try to fit lines to a set of n points in the (x,y) plane. This problem is unique because instead of creating just one line to fit the entire set of points, often the points would be better fit by two or more lines. Our algorithm accommodates this: we partition the points into contiguous segments, and each segment is fit by its own line. For each partition P, we calculate its penalty, defined as the number of segments in P times a fixed cost C, plus the sum over segments of the error of the optimal line through that segment. Minimizing this penalty with dynamic programming gives us the optimal set of lines to fit our points (a sketch of the recurrence is below).
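  * A sketch of the multi-way recurrence OPT(j) = min over i of (err(i, j) + C + OPT(i-1)); the closed-form least-squares error and the handling of the degenerate all-same-x case are my own simplifications.
<code python>
# Sketch: returns the minimum penalty; recovering the actual segments is omitted.
def segmented_least_squares(points, C):
    points = sorted(points)
    n = len(points)

    def err(i, j):
        # squared error of the best single line through points[i-1 .. j-1] (1-based)
        pts = points[i - 1:j]
        m = len(pts)
        sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
        sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
        denom = m * sxx - sx * sx
        if denom == 0:
            return 0.0                     # single point or vertical run: treat as zero error
        a = (m * sxy - sx * sy) / denom
        b = (sy - a * sx) / m
        return sum((y - a * x - b) ** 2 for x, y in pts)

    OPT = [0.0] * (n + 1)
    for j in range(1, n + 1):
        # multi-way choice: the last segment starts at some i <= j
        OPT[j] = min(err(i, j) + C + OPT[i - 1] for i in range(1, j + 1))
    return OPT[n]
</code>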

====== Chapter 6.4 Summary ======
  * The knapsack problem!
  * Given a set of items, each with a weight and a value, and a knapsack with a maximum weight W, how can we maximize the value of the items stored in the knapsack without exceeding its weight limit?
  * In order to solve for OPT(n, W), we need OPT(n-1, W) (the best we can do if we skip item n) and OPT(n-1, W - w<sub>n</sub>) (the best we can do with the remaining capacity if we take it), so the subproblems are indexed by both the number of items considered and the remaining weight. Thus, we will use a 2D array to store the optimal value for each (number of items, weight limit) pair, as sketched below. This is a version of the memoization we used in previous algorithms: it saves us from recalculating and makes comparisons easy throughout the algorithm.
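  * A sketch of the 2D table, where OPT[i][w] is the best value using only the first i items with capacity w; this version returns the optimal value only, not the chosen items. Items are (weight, value) pairs.
<code python>
# Sketch: classic O(nW) dynamic program over (items considered, remaining capacity).
def knapsack(items, W):
    n = len(items)
    OPT = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi, vi = items[i - 1]
        for w in range(W + 1):
            OPT[i][w] = OPT[i - 1][w]                        # skip item i
            if wi <= w:                                      # or take it, if it fits
                OPT[i][w] = max(OPT[i][w], vi + OPT[i - 1][w - wi])
    return OPT[n][W]

print(knapsack([(1, 1), (2, 6), (5, 18), (6, 22), (7, 28)], 11))  # 40
</code>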

====== Chapter 7.1 Summary ======
  * This section deals with flow networks. These are networks that model traffic passing through "switches" from edge to edge. They are modeled by directed graphs G = (V, E) with a few components: capacities on the edges, a source node that generates traffic, a sink node that absorbs traffic, and the traffic itself, which is transmitted across the edges. The graph has a single source node //s// and a single sink node //t//; the other nodes are called internal nodes. An //s-t// flow is a function mapping each edge //e// in E to a nonnegative real number, where f(//e//) represents the amount of flow carried by edge //e//. The flow must satisfy two conditions: 0 ≤ f(//e//) ≤ c<sub>e</sub> for every edge //e//, and for every internal node //v//, the total flow into //v// equals the total flow out of //v//. The problem we will try to solve with this network flow model is to find the arrangement of traffic that makes the most efficient use possible of the given capacities, i.e. the maximum flow.
  * The residual graph is a tool we will use to implement an algorithm to solve the maximum-flow problem. The residual graph contains the same nodes as our original graph but its edges are different. Each forward edge (u,v) has a residual capacity of c<sub>e</sub> - f(//e//), and a backward edge (v,u) is added so that we can "undo" this forward flow; the capacity of this backward edge is f(//e//).
  * The augment function defined on page 342 takes a flow f and an s-t path P in the residual graph and pushes bottleneck(P, f) units of flow along P: it increases f(//e//) by the bottleneck on each forward edge of P and decreases f(//e//) by the bottleneck on each backward edge. Proof 7.1 shows that the result of this augment function is still a flow in G.
  * The Ford-Fulkerson Algorithm uses this augment function together with the residual graph to compute a maximum flow: it repeatedly finds an s-t path in the residual graph, augments the flow along that path, and updates the residual graph, stopping when no s-t path remains (a compact sketch is below).
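  * A compact Ford-Fulkerson sketch over an adjacency-dict residual graph; using DFS to find augmenting paths and this particular graph representation are my own choices for brevity, not the book's pseudocode.
<code python>
# Sketch: capacity is a dict of dicts, capacity[u][v] = capacity of edge (u, v).
def ford_fulkerson(capacity, s, t):
    # residual[u][v] starts at the capacity of (u, v); backward edges start at 0
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def find_path(u, bottleneck, visited):
        if u == t:
            return bottleneck
        visited.add(u)
        for v, cap in residual[u].items():
            if cap > 0 and v not in visited:
                pushed = find_path(v, min(bottleneck, cap), visited)
                if pushed > 0:
                    residual[u][v] -= pushed       # use up forward residual capacity
                    residual[v][u] += pushed       # add backward (undo) capacity
                    return pushed
        return 0

    flow = 0
    while True:
        pushed = find_path(s, float("inf"), set())
        if pushed == 0:
            return flow
        flow += pushed

# Example: the max flow from 's' to 't' in this small network is 30.
G = {"s": {"a": 20, "b": 10}, "a": {"b": 30, "t": 10}, "b": {"t": 20}}
print(ford_fulkerson(G, "s", "t"))
</code>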

====== Chapter 7.2 Summary ======
  * We will now use the fact that cuts provide very natural upper bounds on the values of flows, since the capacity of a cut (A,B) is the sum of the capacities on all edges out of A. First we prove that v(f) = f<sup>out</sup>(A) - f<sup>in</sup>(A) for any cut (A,B). The intuition is that v(f) = f<sup>out</sup>(s) and f<sup>in</sup>(s) = 0, so v(f) = f<sup>out</sup>(s) - f<sup>in</sup>(s), and the conservation condition makes every other node of A contribute zero to the sum. Then for each edge //e//, either both ends of //e// are in A, exactly one end is in A, or neither end is in A. So in the sum over the nodes of A, f(//e//) either appears once with a + and once with a - (cancelling out), appears just once with a + (if only its tail is in A), appears just once with a - (if only its head is in A), or does not appear at all. What remains of the sum is exactly f<sup>out</sup>(A) - f<sup>in</sup>(A).
  * This helps us come to the conclusion that the value of the maximum flow is equal to the capacity of the minimum cut.