Section 6.4 - Subset Sums and Knapsacks: Adding a Variable In this section we look at the Subset Sum and Knapsack problems, where there are n items and each item i has a weight wi (and, in the Knapsack version, also a value vi). The problem is to make the knapsack as full as possible without exceeding its capacity W. This problem is similar to the Weighted Interval Scheduling problem because, as there, the key observation is that each item is either included in the sack or not. If item n is too big (W < wn), we cannot include it, so OPT(n, W) = OPT(n-1, W). But if we do include n, then OPT(n, W) = wn + OPT(n-1, W - wn), since W - wn is the capacity that remains. So, as before, the algorithm is designed to build up a table of all OPT(i, w) values while computing each of them only once. Since each value M[i, w] is computed in O(1) time from previously computed entries, after building up the table of solutions M the algorithm Subset-Sum(n, W) runs in O(nW) time. Because this bound depends on the magnitude of W rather than just on the size of the input, the runtime is known as pseudo-polynomial. To recover an optimal set S of items, an algorithm can trace back through the array M by a procedure similar to those we developed in the previous sections: given a table M of the optimal values of the subproblems, the optimal set S can be found in O(n) time.
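The recurrence and the O(n) traceback above can be sketched in Python. This is a minimal illustration under my own conventions (0-indexed items, a list of weights as input), not the book's code:

```python
def subset_sum(weights, W):
    """Fill M[i][w] = max total weight <= w using the first i items,
    then trace back through M to recover an optimal set S."""
    n = len(weights)
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi = weights[i - 1]
        for w in range(W + 1):
            if wi > w:
                M[i][w] = M[i - 1][w]                       # item i doesn't fit
            else:
                M[i][w] = max(M[i - 1][w],                  # skip item i
                              wi + M[i - 1][w - wi])        # take item i
    # Traceback: at each i, check whether item i was used to reach M[i][w]
    S, w = [], W
    for i in range(n, 0, -1):
        wi = weights[i - 1]
        if wi <= w and M[i][w] == wi + M[i - 1][w - wi]:
            S.append(i - 1)
            w -= wi
    return M[n][W], S
```

For the Knapsack version one would store values and weights separately and replace `wi +` with `vi +` in the max, exactly as the recurrence changes in the text.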

Section 6.5 - RNA Secondary Structure: Dynamic Programming over Intervals In this section the book looks at the problem of RNA secondary structure prediction. Here a single-stranded RNA molecule can be viewed as a sequence of n symbols (bases) drawn from the alphabet {A, C, G, U}. Let B = b1b2…bn be a single-stranded RNA molecule, where each bi is an element of the alphabet. We say that a secondary structure on B is a set of pairs S = {(i, j)}, where i, j are elements of {1, 2, …, n}, that satisfies the following conditions: 1. the ends of each pair in S are separated by at least four intervening bases; that is, if (i, j) ∈ S, then i < j - 4; 2. the elements of any pair in S consist of either {A, U} or {C, G}; 3. S is a matching: no base appears in more than one pair; and 4. pairs cannot cross: if (i, j) and (k, l) are two pairs in S, then we cannot have i < k < j < l. For the design of the algorithm we let OPT(i, j) denote the maximum number of base pairs in a secondary structure on bibi+1…bj, and we know that OPT(i, j) = 0 whenever i ≥ j - 4. Now we know that in the optimal secondary structure on bibi+1…bj, either j is not involved in a pair, or j pairs with t for some t < j - 4. Rearranging, we can say that OPT(i, j) = max(OPT(i, j-1), max(1 + OPT(i, t-1) + OPT(t+1, j-1))), where the inner max is taken over t such that bt and bj are an allowable base pair. Once we know this, we can describe the rest of the algorithm: initialize OPT(i, j) = 0 wherever i ≥ j - 4. Next, go into a for loop over k = 5, …, n-1, and then into a nested for loop over i = 1, …, n-k. Within the two for loops the algorithm sets j = i + k and computes OPT(i, j) from the recurrence. Once it has done this for all subproblems, it returns OPT(1, n). It is easy to bound the running time of this algorithm, since there are O(n²) subproblems to solve, and evaluating the recurrence for OPT(i, j) takes O(n) time for each. Thus the running time is O(n³).
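The interval loop described above can be sketched in Python. This is my own 0-indexed rendering of the recurrence (the function name and string input are assumptions), not the book's code:

```python
def rna_secondary_structure(B):
    """OPT[i][j] = max number of base pairs in a secondary
    structure on B[i..j] (0-indexed)."""
    n = len(B)
    allowed = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    OPT = [[0] * n for _ in range(n)]     # OPT[i][j] = 0 when j - i <= 4
    for k in range(5, n):                 # interval length j - i = k
        for i in range(n - k):
            j = i + k
            best = OPT[i][j - 1]          # case: j is not in a pair
            for t in range(i, j - 4):     # case: j pairs with t, t < j - 4
                if (B[t], B[j]) in allowed:
                    left = OPT[i][t - 1] if t > i else 0
                    best = max(best, 1 + left + OPT[t + 1][j - 1])
            OPT[i][j] = best
    return OPT[0][n - 1] if n else 0
```

The three nested loops make the O(n³) bound visible directly: O(n²) choices of (i, j) and an O(n) scan over t for each.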

Section 6.6 - Sequence Alignment In this section we look at the problem of how one should define similarity between two strings. To answer this we first need a way to "line up" the two strings. We define similarity in terms of finding an optimal alignment between X and Y, so suppose M is a given alignment between X and Y. First, there is a parameter g > 0 that defines a gap penalty: for each position of X or Y that is not matched in M, we incur a cost of g. Second, for each pair of letters p, q in the alphabet, there is a mismatch cost α_pq for lining up p with q; we assume α_pp = 0 for each letter p, so there is no cost for matching a letter with itself. The cost of M is the sum of its gap and mismatch costs, and we seek an alignment of minimum cost. In the optimal alignment M, either (m, n) is an element of M or it is not. Thus at least one of the following is true: i. (m, n) is an element of M; or ii. the mth position of X is not matched; or iii. the nth position of Y is not matched. Finally, the section defines the minimum alignment costs by the recurrence OPT(i, j) = min(α_{x_i y_j} + OPT(i-1, j-1), g + OPT(i-1, j), g + OPT(i, j-1)) for i ≥ 1 and j ≥ 1. For this problem, the running time is O(mn), since the array A has O(mn) entries, and at worst we spend constant time on each.
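The recurrence, with the three cases above as the three arguments of the min, can be sketched in Python. Passing the mismatch cost as a function is my own choice; the book treats α as a fixed cost table:

```python
def alignment_cost(X, Y, gap, mismatch):
    """A[i][j] = min cost of aligning X[:i] with Y[:j];
    `gap` is g, `mismatch(p, q)` plays the role of alpha_pq."""
    m, n = len(X), len(Y)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        A[i][0] = i * gap                  # X[:i] aligned against nothing
    for j in range(1, n + 1):
        A[0][j] = j * gap                  # Y[:j] aligned against nothing
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = min(
                mismatch(X[i - 1], Y[j - 1]) + A[i - 1][j - 1],  # (i, j) in M
                gap + A[i - 1][j],         # position i of X not matched
                gap + A[i][j - 1],         # position j of Y not matched
            )
    return A[m][n]
```

For example, with `gap = 2` and a mismatch function returning 0 for equal letters and 3 otherwise, aligning "ab" with "b" costs 2: one gap for the unmatched "a".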

Section 6.7 - Sequence Alignment in Linear Space via Divide and Conquer In this section we look at the sequence alignment problem again, but construct a new algorithm that reduces the space required from O(mn) to O(m + n). First we show that if we only care about the value of the optimal alignment, and not the alignment itself, it is easy to get away with linear space. To do this, we "collapse" the array A to an m x 2 array B: as the algorithm iterates through the values of j, entries of the form B[i, 0] hold the "previous" column's value A[i, j-1], while entries of the form B[i, 1] hold the "current" column's value A[i, j]. This version keeps the same running time of O(mn) but uses only O(m) space; the drawback is that it no longer retains enough information to trace back an actual alignment. To recover the alignment itself, the section combines this idea with a backward formulation of sequence alignment and a divide-and-conquer strategy. The resulting algorithm again has a running time of O(mn) and uses O(m + n) space. Therefore, through these two new algorithms we can optimize the space needed to solve sequence alignment, but there still isn't an algorithm that produces a result in faster than O(mn) running time.
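The "collapse" to an m x 2 array can be sketched in Python; this covers only the value-computation half of the section (the divide-and-conquer traceback is not shown), and the function name and mismatch-function parameter are my own:

```python
def alignment_value_linear_space(X, Y, gap, mismatch):
    """Compute the optimal alignment value using an (m+1) x 2 array B:
    B[i][0] holds the previous column A[i, j-1], B[i][1] the current A[i, j]."""
    m, n = len(X), len(Y)
    B = [[i * gap, 0] for i in range(m + 1)]   # column j = 0: all gaps
    for j in range(1, n + 1):
        B[0][1] = j * gap                      # top of the current column
        for i in range(1, m + 1):
            B[i][1] = min(
                mismatch(X[i - 1], Y[j - 1]) + B[i - 1][0],  # diagonal
                gap + B[i - 1][1],             # gap in Y
                gap + B[i][0],                 # gap in X
            )
        for i in range(m + 1):                 # current column becomes previous
            B[i][0] = B[i][1]
    return B[m][0]
```

Only two columns ever exist at once, which is exactly why the alignment itself can no longer be traced back from what remains.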

Section 6.8 - Shortest Paths in a Graph In this section we look at the shortest path problem, where we want to find a minimum-cost path between two nodes of a graph. First we recall Dijkstra's Algorithm for the Shortest Path Problem, which works when there are no negative costs. The basic idea is to maintain a set S with the property that the shortest path from s to each node in S is known. We start with S = {s} - since the shortest path from s to s has cost 0 when there are no negative costs - and then we greedily add elements to this set S. When edge costs may be negative, this greedy approach no longer works; instead we use the Bellman-Ford Algorithm, which is based on the observation that if G has no negative cycles, then there is a shortest path from s to t that is simple, and hence has at most n-1 edges. We then define OPT(i, v) to denote the minimum cost of a v-t path using at most i edges. This idea leads to the recurrence: if i > 0, then OPT(i, v) = min(OPT(i-1, v), min(OPT(i-1, w) + c_vw)), where the inner min is taken over all edges (v, w) leaving v. Using this recurrence, we get the following dynamic programming algorithm to compute the value OPT(n-1, s): Shortest-Path(G, s, t):

   n = number of nodes in G
   Array M[0...n-1, V]
   Define M[0, t] = 0 and M[0, v] = infinity for all other v element of V
   For i = 1, ..., n-1:
        For each v element of V in any order:
              Compute M[i, v] using the recurrence formula
        Endfor
   Endfor
   Return M[n-1, s]

This Shortest-Path method correctly computes the minimum cost of an s-t path in any graph that has no negative cycles, and runs in O(n³) time: the two nested for loops fill O(n²) table entries, and evaluating the recurrence for each entry takes O(n) time in the worst case.
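The pseudocode above can be sketched in Python. Representing the graph as an edge list of (v, w, cost) triples with nodes numbered 0..n-1 is my own choice for the sketch:

```python
import math

def shortest_path(n, edges, s, t):
    """Bellman-Ford: M[i][v] = min cost of a v-t path using at most i edges.
    `edges` is a list of directed (v, w, cost) triples over nodes 0..n-1."""
    M = [[math.inf] * n for _ in range(n)]
    M[0][t] = 0                              # the empty path from t to t
    for i in range(1, n):
        for v in range(n):
            M[i][v] = M[i - 1][v]            # reuse a path with fewer edges
            for (u, w, c) in edges:
                if u == v:                   # edge (v, w) leaving v
                    M[i][v] = min(M[i][v], c + M[i - 1][w])
    return M[n - 1][s]
```

Scanning the whole edge list for each v is wasteful but mirrors the O(n³) analysis; grouping edges by source node is the standard refinement that brings the bound down to O(mn).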

courses/cs211/winter2011/journals/ashley/chapter6.txt · Last modified: 2011/03/30 16:11 by leinwebera
CC Attribution-Noncommercial-Share Alike 4.0 International