=====5.1 A First Recurrence: The Mergesort Algorithm =====
We look at divide and conquer algorithms starting with Mergesort, since it divides a list of numbers into two halves, sorts each half recursively, and combines the results.  I found the beginning of the recurrence explanation confusing.  A recurrence can be solved either by unrolling a few levels to see a pattern and then summing the running times over all levels, or by guess and check with an induction on n.  The first method, when used on Mergesort, shows that level j has 2^j subproblems, each of size n/2^j, so each level takes at most cn time in total.  The recursion is at most log2(n) levels deep, so we have a total running time of O(nlogn).  Guess and check (the substitution method) works on the recurrence T(n) <= 2T(n/2) + cn by plugging the guess cn*log2(n) in for T, so that T(n/2) becomes c(n/2)log2(n/2); the inequality is preserved, so the guessed bound holds.  If, after plugging in the guess, we reach a contradiction, the guessed bound does not hold and we must try a different one.  We must finish the induction without contradiction to have a proof in this way.  (6/10)
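
To check my understanding of where the cn term comes from, here is a minimal Python sketch of Mergesort (my own code, not the book's pseudocode): the two recursive calls give the 2T(n/2) term and the merge loop is the O(n) combine step.

<code python>
def merge_sort(a):
    """Sort a list following the recurrence T(n) <= 2T(n/2) + cn."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])     # T(n/2)
    right = merge_sort(a[mid:])    # T(n/2)
    # Combine step: merge two sorted halves in O(n) time (the +cn term).
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 8, 1, 9, 3]))   # [1, 2, 3, 5, 8, 9]
</code>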

=====5.2 Further Recurrence Relations =====
We want to consider algorithms that recursively call on q subproblems of size n/2 and then combine the results in O(n) time.  Mergesort is an example of such behavior for q=2.  T(n) follows the recurrence relation T(n) <= qT(n/2) + cn for n>2, and T(2) <= c, for some constant c.  We treat each value of q separately.  For q>2, we start by analyzing the first levels: the top level has a single problem of size n, for a total of cn, and the second level has q problems of size n/2.  In general, level j has q^j problems of size n/2^j, so the total work on level j is q^j * c(n/2^j) = cn(q/2)^j, and summing over the levels gives a total workload of O(n^(log2 q)).  When q=1, almost all the work is in the single combination at the top, so the workload is O(n).  For q=2, the workload is O(nlogn).  So for q=1 most of the work is in the top level, for q=2 every level has the same amount of work, and for q>2 the majority of the work is done at the bottom levels.  When recombination instead requires O(n^2) time, the recurrence T(n) <= 2T(n/2) + cn^2 solves to O(n^2), because (n/2)^2 shrinks so quickly from one level to the next.  I find the T(n) notation confusing, but understand how we arrived at each of the upper bounds in some way or another.  (7/10)
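
To convince myself of the three regimes (q=1, q=2, q>2), here is a small sketch that just unrolls T(n) = qT(n/2) + cn numerically and compares it with the claimed bound; the function names are mine.

<code python>
import math

def T(n, q, c=1):
    """Unroll T(n) = q*T(n/2) + c*n with T(1) = c, for n a power of two."""
    if n <= 1:
        return c
    return q * T(n // 2, q, c) + c * n

n = 1 << 10   # 1024
for q in (1, 2, 3, 4):
    if q == 1:
        bound = n                      # O(n)
    elif q == 2:
        bound = n * math.log2(n)       # O(n log n)
    else:
        bound = n ** math.log2(q)      # O(n^(log2 q))
    print(q, T(n, q), round(bound))
</code>

The exact numbers differ by constant factors, but the growth matches the bounds from the level-by-level analysis.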

=====5.3 Counting Inversions =====
We start with a problem that analyzes rankings to match preferences with a target audience, a process called collaborative filtering.  The same idea can be used for meta-search, which combines the results of searching multiple search engines on the web.  Given a set of n items in a specific rank order, collaborative filtering looks for other users who have ranked the items similarly to your order.  To measure similarity, the process counts inversions: the number of pairs of items that are "out of order" between the two rankings.  When two rankings disagree completely, there are n choose 2 inversions, the maximum possible.  The simplest algorithm would look at each pair in O(n^2) time, but we can find an O(nlogn) algorithm instead.  We divide the list into two halves, count the inversions in each half recursively, and then combine the solutions in O(n): we walk through the two sorted half-lists A and B and merge them into a final sorted list C while counting inversions.  Since A and B are sorted, whenever an element of B is appended to C it forms an inversion with every element still remaining in A, so we add that count.  The combine step therefore takes O(n) time in total.  I'm completely confused about how we sort lists A and B when they are rankings assigned by the user.  (9/10)
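
The merge-and-count step made more sense to me once I wrote it out; here is a sketch in Python (my own names), where the key line counts how many elements of A an element of B jumps over.

<code python>
def sort_and_count(a):
    """Return (sorted copy of a, number of inversions in a)."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = sort_and_count(a[:mid])
    right, inv_right = sort_and_count(a[mid:])
    merged, i, j, inv = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            # right[j] comes before everything still waiting in left:
            # each such pair is one inversion.
            merged.append(right[j]); j += 1
            inv += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_left + inv_right + inv

print(sort_and_count([2, 4, 1, 3, 5]))   # ([1, 2, 3, 4, 5], 3)
</code>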

=====5.4 Finding the Closest Pair of Points =====
When we have n points in a plane, we want to find the pair of points that is closest together; we want a solution more efficient than the obvious O(n^2) of looking at every pair of points.  We can find an algorithm that solves the problem in O(nlogn) now, and eventually in O(n) expected time with randomization.  We want to find the points p_i, p_j which minimize d(p_i, p_j).  In one dimension this can be done by sorting the points in O(nlogn) time and then walking through the sorted list, looking at the distance from each point to the next.  The first step of our algorithm is the same: we sort the points by x-coordinate.  We then divide the set in half so that we have two sets of size n/2 (the left half taking the ceiling).  While ordering the points by x-coordinate we also record an ordering by y-coordinate, all in O(n) time per level.  Recursing on each half, we find the closest pair on the left and on the right, and set delta to the smaller of the two distances.  We then look only at points within delta of the dividing line; if any pair of them is closer than delta to one another, it is the new closest pair.  This is an algorithm based heavily on recursion and on knowing how to efficiently combine our data, producing a total running time of O(nlogn).  About halfway through reading this section I finally understood exactly what we had talked about in class and was able to sketch the solution before reviewing the class notes and the remainder of the text.  This was an extremely helpful section.  (9/10)
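
Here is a rough Python sketch of the whole recursion as I understand it (my own code, and it assumes the points are distinct); the strip check only compares each point with a constant number of neighbors in y-order, which is what keeps the combine step O(n).

<code python>
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def closest_pair(points):
    """Distance between the closest pair, assuming distinct (x, y) tuples."""
    px = sorted(points)                        # sorted by x
    py = sorted(points, key=lambda p: p[1])    # sorted by y
    return rec(px, py)

def rec(px, py):
    n = len(px)
    if n <= 3:                                 # brute force small cases
        return min((dist(a, b) for i, a in enumerate(px) for b in px[i + 1:]),
                   default=float("inf"))
    mid = n // 2
    mid_x = px[mid][0]
    qx, rx = px[:mid], px[mid:]
    in_q = set(qx)
    qy = [p for p in py if p in in_q]          # keep y-order on each side
    ry = [p for p in py if p not in in_q]
    delta = min(rec(qx, qy), rec(rx, ry))
    # Only points within delta of the dividing line can beat delta.
    strip = [p for p in py if abs(p[0] - mid_x) < delta]
    best = delta
    for i in range(len(strip)):
        for j in range(i + 1, min(i + 8, len(strip))):
            best = min(best, dist(strip[i], strip[j]))
    return best

print(closest_pair([(0, 0), (3, 4), (1, 1), (7, 7), (2, 2.5)]))   # ~1.414
</code>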

=====5.5 Integer Multiplication =====
For integer multiplication the natural recursion makes more than two recursive calls on each level.  The elementary school method of finding a product is O(n^2), but we can do better.  Writing the n-bit numbers in base 2, x_1 corresponds to the "high-order" n/2 bits and x_0 to the "low-order" n/2 bits (and similarly for y).  Then xy = x_1*y_1*2^n + (x_1*y_0 + x_0*y_1)*2^(n/2) + x_0*y_0.  We can recursively solve the four products of size n/2, and combining the solutions takes O(n), so the recurrence is T(n) <= 4T(n/2) + cn with q = 4, which gives T(n) = O(n^(log2 q)) = O(n^2), no better than before.  To bring q down to 3 and the whole algorithm to O(n^1.59), we recursively compute x_1*y_1, x_0*y_0, and (x_1+x_0)(y_1+y_0); the middle term x_1*y_0 + x_0*y_1 is then obtained by subtracting the first two products from the third, so no fourth multiplication is needed.  Each recursive call is on numbers of about n/2 bits, so we have T(n) <= 3T(n/2) + cn.  From the earlier parts of the chapter, we know O(n^(log2 3)) = O(n^1.59) is the runtime.  This problem was easy to understand, but it glossed over how we achieve the q = 3 case a bit too much.  (5/10)
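
The q = 3 trick is easier to see in code: compute x_1*y_1, x_0*y_0, and (x_1+x_0)(y_1+y_0), then subtract.  A minimal sketch (my own, operating on Python integers rather than bit arrays):

<code python>
def karatsuba(x, y):
    """Multiply nonnegative integers with three half-size recursive products."""
    if x < 10 or y < 10:
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> half, x & ((1 << half) - 1)   # high- and low-order halves
    y1, y0 = y >> half, y & ((1 << half) - 1)
    a = karatsuba(x1, y1)                       # x1*y1
    b = karatsuba(x0, y0)                       # x0*y0
    c = karatsuba(x1 + x0, y1 + y0)             # (x1+x0)*(y1+y0)
    middle = c - a - b                          # = x1*y0 + x0*y1, no 4th multiply
    return (a << (2 * half)) + (middle << half) + b

print(karatsuba(1234, 5678), 1234 * 5678)       # both 7006652
</code>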

=====6.1 Weighted Interval Scheduling: A Recursive Procedure =====
Chapter 6 is about dynamic programming, which is the opposite of greedy in that it implicitly considers every possible solution by defining a series of subproblems and building up correct solutions to larger and larger subproblems; it is very close to brute force, but without examining every solution directly.  We first use this method, in a recursive form, for weighted interval scheduling.  We want to accept a set of non-overlapping intervals of maximum total value, and we don't know any greedy algorithm that optimally solves such a general form of the problem.  Each request i has start time s_i, finish time f_i, and value v_i.  We want to choose a subset S of {1, ..., n} with no overlap among the elements of S that maximizes the sum of their values.  To solve this problem, we break up the data so that OPT(j) is the value of an optimal solution of the smaller problem on {1, 2, ..., j}; we are looking for OPT(n).  With jobs ordered by increasing finish time, p(j) is the largest index i < j such that job i does not overlap with job j.  Then OPT(j) = max(v_j + OPT(p(j)), OPT(j-1)), and j belongs to the optimal solution if and only if v_j + OPT(p(j)) >= OPT(j-1).  Assuming that we have already sorted by finish time and computed p(i) for each i, this gives the recursion Comp-Opt.  By induction on j, we can show the solution is optimal [Comp-Opt(0) = 0 trivially; for j > 0, assume Comp-Opt(i) is correct for 0 <= i < j; then Comp-Opt(p(j)) = OPT(p(j)) and Comp-Opt(j-1) = OPT(j-1), so Comp-Opt(j) = max(v_j + Comp-Opt(p(j)), Comp-Opt(j-1)) = OPT(j)].  But the plain recursion takes exponential time, so we must memoize it (save previously calculated results) to bring the running time down to polynomial, our ultimate goal.  We only ever need Comp-Opt(0), ..., Comp-Opt(n), which is n+1 values, so we store each value in an array M when it is first calculated so that it is accessible later.  Each call does O(1) work beyond its recursive calls, and only O(n) values can be filled in, so M-Comp-Opt with memoization runs in the desired O(n) after the initial sort.  In addition to finding the optimal value, we should keep track of the set of intervals, so after the optimal value is computed we recover the solution from the array M.  Find-Solution(j) checks whether v_j + M[p(j)] >= M[j-1]; if so, it outputs j along with the result of Find-Solution(p(j)), and otherwise it outputs the result of Find-Solution(j-1).  This takes O(n) calls with O(1) per call, so it returns the solution in O(n) time once it has the array M of optimal values of the subproblems.  I'm not entirely clear on where the array M comes from, but the remainder of the solution is very clear.  This was a very helpful section to read, though it took a few rounds to understand.  (8/10)
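
Writing M-Comp-Opt and Find-Solution out helped me see where M comes from: it is just a table filled in lazily by the recursion.  A sketch in Python (my own names; I use a dictionary for M and binary search to get p(j)):

<code python>
from bisect import bisect_right

def weighted_interval(jobs):
    """jobs: list of (start, finish, value). Returns (OPT value, chosen jobs)."""
    jobs = sorted(jobs, key=lambda j: j[1])             # order by finish time
    finishes = [f for _, f, _ in jobs]
    # p[j] = 1-based index of the last job compatible with jobs[j]; 0 if none
    p = [bisect_right(finishes, jobs[j][0]) for j in range(len(jobs))]

    M = {0: 0}                                          # memo table M
    def opt(j):                                         # memoized Comp-Opt
        if j not in M:
            _, _, v = jobs[j - 1]
            M[j] = max(v + opt(p[j - 1]), opt(j - 1))
        return M[j]
    best = opt(len(jobs))

    chosen, j = [], len(jobs)                           # Find-Solution on M
    while j > 0:
        _, _, v = jobs[j - 1]
        if v + M[p[j - 1]] >= M[j - 1]:
            chosen.append(jobs[j - 1])
            j = p[j - 1]
        else:
            j -= 1
    return best, chosen[::-1]

print(weighted_interval([(0, 3, 2), (1, 5, 4), (4, 6, 4), (5, 8, 7)]))
# (11, [(1, 5, 4), (5, 8, 7)])
</code>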

=====6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems =====
Here we convert the last section's weighted interval scheduling method, with M-Comp-Opt, the array M, and Find-Solution, to study dynamic programming by iterating through subproblems instead of using explicit recursion.  The array M is important because it stores the optimal value of every subproblem.  Once the entire array is built, we have the optimal value, and tracing back with Find-Solution just converts the data in M into the indices of an optimal solution set S.  Instead of memoized recursion, we can increment j from 1 to n and assign to M[j] the maximum from the same formula as before.  By induction, as before, we can show that this computes OPT(j) as M[j].  It is clearly an O(n) algorithm, since it simply goes through n subproblems, choosing and assigning a maximum in constant time at each step.  Either manner works, but we will focus on the iterative build-up of subproblems.  We need to choose subproblems such that (informally) they are polynomial in number, the original problem's solution can be computed easily from them, and they have a natural ordering and an easy recurrence so that each can be calculated from the "smaller" subproblems.  Though we may want to start by ordering the subproblems, it is easier to first define their recurrence, to be sure that they will be useful subproblems.  This section was particularly readable, and easy.  It was frustrating to read the simple iterative algorithm: the last section was hard to read, and this was so stinking simple.  What does the book mean by these being informal guidelines for the subproblems?  I liked this section a lot, but it was really vague.  (8/10)
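
The iterative version is just a for-loop that fills M from left to right; a quick sketch (my own code) using the p values and job values from the 6.1 example above:

<code python>
def iterative_opt(values, p):
    """Fill M[0..n] iteratively; values[j-1] is v_j and p[j-1] is p(j),
    assuming jobs are already sorted by finish time."""
    n = len(values)
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = max(values[j - 1] + M[p[j - 1]], M[j - 1])
    return M

print(iterative_opt([2, 4, 4, 7], [0, 0, 1, 2]))   # [0, 2, 4, 6, 11]
</code>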

=====6.3 Segmented Least Squares: Multi-Way Choices =====
We now look at a dynamic programming problem where we do not have the binary choice of an interval being in or out of the optimal solution; instead, there are multiple possibilities for the solution's structure.  Here we are fitting lines through a scattered plot of data points.  The line of best fit through a set of points is the one minimizing the cumulative error: the sum of the squared vertical distances from the points to the line.  Since the points may follow more than one line, such as data that tracks three different segments, we want to fit the data well with as few line segments as possible, keeping track of the points where the curve changes from one line to another.  To discourage using too many lines, we use a penalty system: the cost of a partition is the number of segments times a positive multiplier C, plus the error of the optimal line through each segment.  We want the partition of minimum penalty; since the multiplier is not fixed, we can set it higher to penalize using more lines.  This problem is a partition of n objects.  If the last segment of the optimal partition is i through n, then the optimal value is OPT(n) = e_(i,n) + C + OPT(i-1), where e_(i,j) is the minimum error of a single line through points i through j, and in general OPT(j) = min over 1<=i<=j of (e_(i,j) + C + OPT(i-1)).  We trace back through the array M to recover the partition.  Computing all the errors e_(i,j) takes O(n^3) time (n^2 pairs, O(n) work per pair), and filling in the array and finding the segments takes O(n^2) after the errors have been calculated.  I am not entirely clear on how we find the formulas for the different lines, and this section was not particularly readable.  (5/10)
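
To see where the line formulas come from, here is a sketch (my own) that computes each e_(i,j) with the usual least-squares slope and intercept formulas and then fills the array M exactly as in the recurrence; the test points fall on two lines, so with a small penalty C the answer is two segments.

<code python>
def segmented_least_squares(points, C):
    """points sorted by x-coordinate; C = per-segment penalty.
    Returns (minimum total penalty, segments as 1-based index ranges)."""
    n = len(points)

    def sq_error(i, j):
        """Squared error of the best-fit line through points[i..j] (0-based, inclusive)."""
        pts = points[i:j + 1]
        m = len(pts)
        sx = sum(p[0] for p in pts); sy = sum(p[1] for p in pts)
        sxx = sum(p[0] ** 2 for p in pts); sxy = sum(p[0] * p[1] for p in pts)
        denom = m * sxx - sx * sx
        a = (m * sxy - sx * sy) / denom if denom else 0.0    # slope
        b = (sy - a * sx) / m                                # intercept
        return sum((p[1] - a * p[0] - b) ** 2 for p in pts)

    e = [[sq_error(i, j) if i < j else 0.0 for j in range(n)] for i in range(n)]
    M = [0.0] * (n + 1)                    # M[j] = OPT(j)
    start = [0] * (n + 1)                  # start[j] = best i for the last segment
    for j in range(1, n + 1):
        M[j], start[j] = min((e[i - 1][j - 1] + C + M[i - 1], i) for i in range(1, j + 1))
    segments, j = [], n                    # trace back through M
    while j > 0:
        segments.append((start[j], j))
        j = start[j] - 1
    return M[n], segments[::-1]

pts = [(1, 1), (2, 2), (3, 3), (4, 10), (5, 20), (6, 30)]
print(segmented_least_squares(pts, C=1.0))   # (2.0, [(1, 3), (4, 6)])
</code>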

=====6.4 Subset Sums and Knapsacks: Adding a Variable =====
We extend our scheduling view by looking at n requests that use a resource between time 0 and W, where request i has weight w_i; we want the sum of the weights of the accepted requests to be as large as possible without exceeding W.  We call this a knapsack problem because we fill a knapsack with items of value v_i and weight w_i so that the total weight does not exceed W and the total value is as large as possible.  We need to work out enough subproblems that we know both the best solution on the first i-1 items with weight limit w and the best solution on the first i-1 items with weight limit w - w_i.  Then OPT(i,w) = max(OPT(i-1,w), v_i + OPT(i-1,w-w_i)); the subset-sum version is the special case v_i = w_i.  To compute this we build a table of the values OPT(i,w), computing each value only once.  Finding M[i,w] requires looking at only two other values in the table, so each entry is computed in constant time; the running time is therefore proportional to the number of entries, and Subset-Sum(n,W) runs in O(nW) time.  This is pseudo-polynomial: it is polynomial in the magnitude of W rather than in the size of its representation, so it grows very quickly for large inputs.  Given the filled table M, we can find the optimal set in O(n) time by tracing back, so the total running time is O(nW).  I found this section very readable and helpful, but my understanding was dependent on having already studied this in class.  (6/10)
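
A sketch of the table-filling and the O(n) trace-back (my own code; subset sum is just the special case where each value equals its weight):

<code python>
def knapsack(items, W):
    """items: list of (weight, value); W: capacity. O(nW) table M[i][w] = OPT(i, w)."""
    n = len(items)
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi, vi = items[i - 1]
        for w in range(W + 1):
            M[i][w] = M[i - 1][w]                       # leave item i out
            if wi <= w:                                 # or take it, if it fits
                M[i][w] = max(M[i][w], vi + M[i - 1][w - wi])
    chosen, w = [], W                                   # trace back in O(n)
    for i in range(n, 0, -1):
        wi, vi = items[i - 1]
        if wi <= w and M[i][w] == vi + M[i - 1][w - wi]:
            chosen.append(items[i - 1])
            w -= wi
    return M[n][W], chosen[::-1]

print(knapsack([(2, 2), (3, 3), (4, 4)], 6))   # (6, [(2, 2), (4, 4)])
</code>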

=====6.6 Sequence Alignment =====
Sequence alignment compares strings, and it can be viewed as a shortest-path problem in a graph with costs on its edges.  When we compare strings we can insert gaps to minimize mismatches.  To define the similarity of two strings, we look at alignments between their positions: an alignment records which positions are matched up, in order, with unmatched positions treated as gaps.  For example, if the second symbol of the first word is matched to the first symbol of the second word, the pair (2,1) is in the alignment.  Every unmatched position is a gap and costs delta > 0; matching two different letters costs a mismatch penalty that depends on the letters; matching the same letter costs nothing.  The total cost is the sum of the gap and mismatch costs.  So an alignment that lines letters up and accepts mismatches beats one that inserts gaps everywhere only if its total mismatch cost is less than the extra number of gaps times delta.  Similarity is therefore the minimum cost of an alignment between X and Y.  If M is an optimal alignment, then either (m,n) is in M, the mth position of X is not matched, or the nth position of Y is not matched.  The recurrence for i and j is OPT(i,j) = min(alpha_(x_i,y_j) + OPT(i-1,j-1), delta + OPT(i-1,j), delta + OPT(i,j-1)), and if (i,j) is in an optimal alignment then the first of these terms achieves the minimum.  This runs in O(mn) time, since the array A has O(mn) entries and we spend at most constant time on each.  In the graph view, the minimum cost of a path from (0,0) to (i,j) satisfies f(i,j) = OPT(i,j).  I thought this section was a bit confusing because it failed to give an algorithm for how exactly to choose where to put gaps.  (8/10)
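
A short sketch of filling the alignment array A with that recurrence (my own code; the gap and mismatch costs here are made-up numbers just to run it):

<code python>
def alignment_cost(x, y, delta, alpha):
    """Minimum alignment cost of strings x and y.
    delta = gap penalty; alpha(a, b) = mismatch cost, 0 when a == b."""
    m, n = len(x), len(y)
    A = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        A[i][0] = i * delta            # x aligned against nothing: all gaps
    for j in range(1, n + 1):
        A[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = min(alpha(x[i - 1], y[j - 1]) + A[i - 1][j - 1],  # match (i, j)
                          delta + A[i - 1][j],                          # x_i unmatched
                          delta + A[i][j - 1])                          # y_j unmatched
    return A[m][n]

# gaps cost 2, any mismatch costs 3
print(alignment_cost("name", "mean", 2, lambda a, b: 0 if a == b else 3))
</code>

Tracing back through A, always stepping to whichever of the three terms achieved the minimum, is what actually decides where the gaps go, which is the part I found missing from the section.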

=====7.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm =====
Transportation uses a network model in which edges carry traffic and nodes let you switch between different highways; such networks have capacities on the edges, source nodes that generate traffic, and sink nodes that absorb traffic.  Since the traffic flows in a direction, we model this as a directed graph with a capacity c_e on each edge e, a single source node s, and a single sink node t.  We want to route the traffic so that we use the available capacity as efficiently as possible.  The maximum flow value equals the minimum capacity of a cut separating s from t, called the minimum cut.  To find a maximum flow, we repeatedly push as much flow as possible along paths from s to t; by keeping a residual graph, we can also push flow backwards along an edge, which undoes flow sent earlier.  I do not understand at all how the pushing backwards of flow works.  We can show that the Ford-Fulkerson algorithm, computing an s-t flow on G, terminates after at most C iterations of its while loop, where C is the sum of the capacities of the edges leaving s; each iteration can be implemented in O(m) time, giving O(mC) overall.  This section confused me instead of clearing things up.  (4/10)
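
This is my attempt at a sketch of the algorithm in Python (it uses breadth-first search to find augmenting paths, which is one common choice, not necessarily the book's); the two lines marked "push back" are the residual-graph bookkeeping that I still find confusing.

<code python>
from collections import defaultdict, deque

def max_flow(edges, s, t):
    """Ford-Fulkerson with BFS augmenting paths.
    edges: list of (u, v, capacity). Returns (flow value, residual capacities)."""
    res = defaultdict(int)                     # residual capacity of each (u, v)
    adj = defaultdict(set)
    for u, v, c in edges:
        res[(u, v)] += c
        adj[u].add(v); adj[v].add(u)           # backward edge starts at capacity 0
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])  # BFS for a path with residual room
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, res                   # no augmenting path left: maximum
        path, v = [], t                        # recover the path and its bottleneck
        while parent[v] is not None:
            path.append((parent[v], v)); v = parent[v]
        bottleneck = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= bottleneck          # use up forward capacity
            res[(v, u)] += bottleneck          # push back: later paths may undo this
        flow += bottleneck

edges = [('s', 'a', 10), ('s', 'b', 10), ('a', 'b', 2),
         ('a', 't', 8), ('b', 't', 10)]
print(max_flow(edges, 's', 't')[0])            # 18
</code>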

=====7.2 Maximum Flows and Minimum Cuts in a Network =====
This section continues analyzing the Ford-Fulkerson algorithm from page 344.  We want to show that the algorithm returns the maximum possible value of any flow in G; to do so we use cuts to place upper bounds tighter than just C.  A cut is a partition of the vertex set into sets A and B, where s is in A and t is in B, and it bounds the value of any flow.  The capacity of the cut is c(A,B) = the sum of c_e over all edges e out of A, and v(f) <= c(A,B) for every flow f and every cut (A,B).  Since every flow value is bounded above by the capacity of every cut, and Ford-Fulkerson produces a flow and a cut of equal value, the maximum flow equals the minimum cut.  We can compute an s-t cut of minimum capacity in O(m) additional time, since the algorithm extends easily: take A to be the set of nodes reachable from s in the final residual graph.  If all capacities are integers, there is a maximum flow in which the flow on every edge is an integer, although not every maximum flow is integer-valued.  The Max-Flow Min-Cut Theorem still holds with real-valued capacities, but we work with integers in this problem.  This section was easier to read and helped a bit with the Ford-Fulkerson algorithm, but I still don't understand residual graphs.  (7/10)
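
Extending the sketch from 7.1: once there is no augmenting path left, the nodes still reachable from s in the residual graph form the side A of a minimum cut.  A small sketch of that extraction (my own code, meant to be used with the max_flow function above):

<code python>
from collections import deque

def min_cut(edges, s, res):
    """edges: the original (u, v, capacity) list; res: residual capacities
    left over by max_flow. Returns (A, edges crossing from A to B)."""
    adj = {}
    for u, v, _ in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    A, queue = {s}, deque([s])                 # BFS in the residual graph
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in A and res.get((u, v), 0) > 0:
                A.add(v); queue.append(v)
    cut = [(u, v, c) for u, v, c in edges if u in A and v not in A]
    return A, cut

# value, res = max_flow(edges, 's', 't')       # from the 7.1 sketch
# A, cut = min_cut(edges, 's', res)            # capacities in `cut` sum to `value`
</code>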