Here we convert the last section's Weighted Interval Scheduling method, with M-Compute-Opt, the array M, and Find-Solution, to explain and study dynamic programming by iterating through subproblems instead of using explicit recursion.  The array M is important because it stores the optimal value of each subproblem.  Once we fill the entire array, we have the value of the solution, and tracing back with Find-Solution converts the data in M into the indices of an optimal solution set S.  Instead of memoized recursion, we can increment j and set M[j] to the maximum given by the recurrence we have already stated.  By induction, as before, we can show that M[j] = OPT(j).  The algorithm clearly runs in O(n) time, since it goes through the n subproblems in order and computes and assigns a maximum in constant time at each step.  Either approach works, but we will focus on the iterative build-up of subproblems.  We need to choose subproblems so that (informally) there are only polynomially many of them, the original problem's solution can be computed from them, and they have a natural ordering and an easy recurrence so that each can be calculated from the "smaller" subproblems.  Though we may want to start by ordering the subproblems, it is easier to first define their recurrence, to be sure that they will be useful subproblems.  This section was particularly readable and easy.  It was frustrating that the iterative algorithm was so simple when the last section, which covered the same idea, was so hard to read.  What does the book mean by these being informal guidelines for the subproblems?  I liked this section a lot, but it was really vague.  (8/10)
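To convince myself the iterative version really is this simple, here is a short Python sketch of how I understand it.  It assumes the intervals are presorted by finish time and that the p(j) values are precomputed; the function names are my paraphrase of M-Compute-Opt and Find-Solution, not the book's exact pseudocode.

<code python>
# A minimal sketch of the iterative version, assuming intervals are
# already sorted by finish time and that p[j] (the largest index i < j
# whose interval is compatible with j) has been precomputed.
# Names and the 1-indexing convention are mine, not the book's code.

def iterative_compute_opt(v, p):
    """v[1..n] are weights; p[1..n] are the compatibility indices."""
    n = len(v) - 1          # v[0] and p[0] are unused padding
    M = [0] * (n + 1)       # M[j] will hold OPT(j)
    for j in range(1, n + 1):
        # either skip interval j, or take it plus the best solution on p[j]
        M[j] = max(M[j - 1], v[j] + M[p[j]])
    return M

def find_solution(M, v, p, j):
    """Trace back through M to recover an optimal set of indices."""
    S = []
    while j > 0:
        if v[j] + M[p[j]] >= M[j - 1]:   # interval j is in some optimal set
            S.append(j)
            j = p[j]
        else:
            j -= 1
    return list(reversed(S))             # each step decreases j: O(n) total

# example: 3 intervals with weights 2, 4, 4 and p = [_, 0, 0, 1]
M = iterative_compute_opt([0, 2, 4, 4], [0, 0, 0, 1])
print(M[-1], find_solution(M, [0, 2, 4, 4], [0, 0, 0, 1], 3))  # 6 [1, 3]
</code>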
  
=====6.3 Segmented Least Squares: Multi-Way Choices =====
We want to look at a dynamic programming problem where we do not have the binary choice of an interval being inside or outside of the optimal solution; instead, there are multiple possibilities for the solution's structure.  Here we are fitting lines through a scattered plot of data points.  We decide which line fits best by minimizing the cumulative error: the sum of the squared distances of the points from the line.  Since the points may follow more than one line, such as data that tracks three different segments, we want to fit the data well with as few lines as possible, keeping track of the points where the curve changes from one line to another.  To penalize using too many lines, we add a fixed penalty C > 0 for each segment to the error of the optimal line through that segment, and we look for the partition of minimum total penalty; since the multiplier C is not fixed by the problem, we can set it higher to penalize using more segments.  This problem is a partition of n objects.  If the last segment of an optimal partition runs from i through n, then OPT(n) = e_(i,n) + C + OPT(i-1), where e_(i,n) is the minimum error of a single line through points i through n; in general, OPT(j) = min over 1 <= i <= j of (e_(i,j) + C + OPT(i-1)).  We trace back through the array M to recover the partition.  Computing all the errors e_(i,j) takes O(n^3) time, since there are O(n^2) pairs (i,j) and each error takes O(n) time; once the errors have been calculated, filling the array and finding the segments takes O(n^2).  I am not entirely clear on how we find the formulas for the different lines, and this section was not particularly readable.  (5/10)
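Here is my sketch of the recurrence in Python, assuming the single-segment errors are already available as a padded (n+1) x (n+1) table e[i][j]; the least-squares formulas that produce those errors are exactly the part I am unclear on, so they are left out.

<code python>
# A minimal sketch of the segmented least squares recurrence, assuming
# the single-segment errors e[i][j] have already been computed.
# All names are mine; C is the per-segment penalty.

def segmented_least_squares(e, n, C):
    M = [0.0] * (n + 1)                    # M[j] = OPT(j), M[0] = 0
    best_i = [0] * (n + 1)                 # remembers the argmin for traceback
    for j in range(1, n + 1):
        M[j], best_i[j] = min(
            (e[i][j] + C + M[i - 1], i) for i in range(1, j + 1)
        )
    # trace back: each chosen i starts the last segment ending at j
    segments, j = [], n
    while j > 0:
        i = best_i[j]
        segments.append((i, j))
        j = i - 1
    return M[n], list(reversed(segments))
</code>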
=====6.4 Subset Sums and Knapsacks: Adding a Variable =====
We add to our interval scheduling problem by looking at n requests that use a resource between time 0 and W, where request i has weight w_i; we want the sum of the chosen weights to be as large as possible without exceeding W.  We call this a knapsack problem because we fill a knapsack with items of value v_i and weight w_i so that the total weight does not exceed W and the total value is maximized.  We need enough subproblems that we know both OPT(i-1, w), the best solution on the first i-1 items with limit w, and OPT(i-1, w - w_i), the best solution on the first i-1 items when item i is included.  For the subset sum problem this gives OPT(i,w) = max(OPT(i-1,w), w_i + OPT(i-1, w-w_i)) whenever w_i <= w.  To compute this we build a table of the values OPT(i,w), computing each value only once.  Each entry M[i,w] comes from two other entries in the table, so we compute each in constant time; the running time is proportional to the number of entries, so Subset-Sum(n,W) runs in O(nW) time.  This is pseudo-polynomial: it is polynomial in the magnitude of W rather than in the size of its representation, so it grows quickly when W is large.  Given M, we can trace back to find the optimal set in O(n) time, so the total running time is O(nW).  I found this section very readable and helpful, but my understanding was dependent on having already studied this in class.  (6/10)
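A sketch of the Subset-Sum table as I understand it; the names and the traceback are mine, and it assumes integer weights.

<code python>
# A minimal sketch of the Subset-Sum table, assuming integer weights.
# w is 1-indexed via a padding zero; names are mine.

def subset_sum(w, W):
    n = len(w) - 1
    # M[i][cap] = best achievable weight using items 1..i with limit cap
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for cap in range(W + 1):
            if w[i] > cap:                      # item i cannot fit
                M[i][cap] = M[i - 1][cap]
            else:                               # skip item i, or take it
                M[i][cap] = max(M[i - 1][cap],
                                w[i] + M[i - 1][cap - w[i]])
    # trace back through the table to recover one optimal subset
    items, cap = [], W
    for i in range(n, 0, -1):
        if M[i][cap] != M[i - 1][cap]:          # item i must have been taken
            items.append(i)
            cap -= w[i]
    return M[n][W], sorted(items)

print(subset_sum([0, 2, 3, 4], 6))  # (6, [1, 3]) -- weights 2 + 4
</code>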
=====6.6 Sequence Alignment =====
We use sequence alignment to compare strings; the problem can also be viewed as a shortest-path problem in a graph with costs on the edges.  When we compare strings we can insert gaps to minimize mismatches.  To define the similarity of two strings, we look at alignments: an alignment is a set of pairs of positions, in order and without crossings, that says which symbols are matched up, where the gap spaces are not indexed.  For example, if the second symbol of the first word lines up with the first symbol of the second word, we add the pair (2,1) to the alignment.  Every position not in any pair is a gap, and each gap costs delta > 0.  Mismatching two particular letters costs a specific amount alpha for that pair, and matching the same letter costs nothing; the total cost is the sum of the gap and mismatch costs.  An alignment that uses mismatched letters is only better than one that inserts gaps everywhere to avoid mismatches if its mismatch costs plus gap costs sum to less than the all-gaps alignment's gap costs.  Similarity is therefore the minimum cost of an alignment between X and Y.  If M is an optimal alignment, then either (m,n) is in M, the mth position of X is not matched, or the nth position of Y is not matched.  The recurrence for i and j is OPT(i,j) = min(alpha_(x_i,y_j) + OPT(i-1,j-1), delta + OPT(i-1,j), delta + OPT(i,j-1)), and if (i,j) is in an optimal M then the first of these WILL achieve the minimum value.  This runs in time O(mn), since the array A has O(mn) values and we spend at most constant time on each.  In the graph formulation, the minimum cost f(i,j) of a path from (0,0) to (i,j) equals OPT(i,j).  I thought this section was a bit confusing because it failed to give an algorithm for how exactly to choose where to put gaps.  (8/10)
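Here is how I would write the recurrence in Python.  The gap cost delta and the mismatch function alpha are parameters of this sketch, and the base cases OPT(i,0) = i*delta and OPT(0,j) = j*delta come from aligning a prefix against nothing but gaps.

<code python>
# A minimal sketch of the alignment recurrence. delta is the gap cost
# and alpha(p, q) the mismatch cost; both are parameters of this
# sketch, with alpha(p, p) = 0 as in the text.

def alignment_cost(X, Y, delta, alpha):
    m, n = len(X), len(Y)
    # OPT[i][j] = minimum alignment cost of X[:i] and Y[:j]
    OPT = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        OPT[i][0] = i * delta          # align X[:i] against all gaps
    for j in range(1, n + 1):
        OPT[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            OPT[i][j] = min(
                alpha(X[i - 1], Y[j - 1]) + OPT[i - 1][j - 1],  # match x_i, y_j
                delta + OPT[i - 1][j],                          # x_i gets a gap
                delta + OPT[i][j - 1],                          # y_j gets a gap
            )
    return OPT[m][n]

# example: cost 0 for equal letters, 3 otherwise, gap cost 2
print(alignment_cost("name", "mean", 2,
                     lambda p, q: 0 if p == q else 3))
</code>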
=====7.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm =====
Transportation uses a network model in which edges carry traffic and nodes let you switch between different highways; such networks have capacities on the edges, source nodes that generate traffic, and destination nodes that take in and eliminate traffic.  Since the traffic flows in a direction along each highway, we use a directed graph with a capacity c_e on each edge e, one source node s, and one sink node t.  We want to move the traffic around so that we efficiently use the capacities available.  The maximum flow equals the minimum capacity over all cuts, called the minimum cut.  To find the maximum flow, we push as much flow as possible along paths from s to t; we can also push flow backwards along edges that already carry flow, which we record in a residual graph.  I do not understand at all how the pushing backwards of flow works.  We can show that the Ford-Fulkerson Algorithm terminates while computing an s-t flow on G after at most C iterations of the while loop, where C is the sum of the capacities of the edges out of s.  Each iteration can be bounded by O(m) time, giving O(mC) in total.  This section confused me instead of clearing things up.  (4/10)
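To try to see how pushing flow backwards works, I sketched the algorithm; the dictionary of residual capacities is my own representation, not the book's.  Pushing flow on (u,v) adds residual capacity on (v,u), which is what lets a later augmenting path undo an earlier decision.

<code python>
# A minimal sketch of Ford-Fulkerson on integer capacities, using DFS
# to find augmenting paths in the residual graph.

def max_flow(n, edges, s, t):
    """edges is a list of (u, v, capacity) with nodes 0..n-1."""
    residual = {}                       # residual[(u, v)] = remaining capacity
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)  # backward edge starts at 0
        adj[u].append(v)
        adj[v].append(u)

    def find_path():
        """DFS for an s-t path with positive residual capacity."""
        stack, parent = [s], {s: None}
        while stack:
            u = stack.pop()
            if u == t:
                path = []
                while parent[u] is not None:
                    path.append((parent[u], u))
                    u = parent[u]
                return path[::-1]
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    stack.append(v)
        return None

    flow = 0
    while (path := find_path()) is not None:
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck  # use up forward capacity
            residual[(v, u)] += bottleneck  # allow undoing this flow later
        flow += bottleneck
    return flow

print(max_flow(4, [(0, 1, 2), (0, 2, 2), (1, 3, 2),
                   (2, 3, 2), (1, 2, 1)], 0, 3))       # prints 4
</code>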
=====7.2 Maximum Flows and Minimum Cuts in a Network =====
This section continues analyzing the Ford-Fulkerson algorithm from page 344.
We want to show that the algorithm returns the maximum possible value of any flow in G, so we use cuts to get tighter upper bounds than just C.  An s-t cut is a partition of the vertex set into sets A and B where s is in A and t is in B; its capacity is c(A,B) = sum over edges e out of A of c_e, and v(f) <= c(A,B) for every flow f and every cut (A,B).  Every flow is upper-bounded by the capacity of every cut, and the flow the algorithm finds matches the capacity of some cut, so the maximum flow is equal to the minimum cut.  We can compute an s-t cut of minimum capacity in O(m) additional time, since we can extend the algorithm easily.  If all capacities are integers, there is a maximum flow in which the flow on every edge is an integer, though not every maximum flow is integral.  The Max-Flow Min-Cut Theorem can be proved even for real-valued capacities, but we work with integers on this problem.  This section was easier to read and helped a bit with the Ford-Fulkerson algorithm, but I still don't understand residual graphs.  (7/10)
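The extension that finds the cut is short enough to sketch: after the algorithm terminates, search the residual graph from s, reusing the residual dict and adjacency lists from my 7.1 sketch above.  The reachable nodes form A, every original edge out of A is saturated, and those edges form a minimum cut.

<code python>
# A minimal sketch of extracting a minimum cut once Ford-Fulkerson has
# finished, reusing the residual-capacity dict from the 7.1 sketch.
# BFS from s over edges with positive residual capacity marks the set A.

from collections import deque

def min_cut(n, adj, residual, s):
    reachable = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in reachable and residual[(u, v)] > 0:
                reachable.add(v)
                queue.append(v)
    A = reachable
    B = set(range(n)) - A     # everything s can no longer reach, including t
    return A, B
</code>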