Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
courses:cs211:winter2011:journals:wendy:chapter6 [2011/03/29 15:48] shangwcourses:cs211:winter2011:journals:wendy:chapter6 [2011/04/03 05:29] (current) shangw
Line 50: Line 50:
 Readability is 6. Readability is 6.
  
-==== Section 6: RNA Secondary Structure Dynamic Programming over Intervals =====+==== Section 6: Sequence Alignment===== 
 +This section introduces a fairly commonly seen problem: sequence alignment, which appear in google-search and molecular computational biology. We want to see if two things are "similar"; thanks to some biologist, there is a numerical definition of similarity. What we want to do is to find the parameter to compare.  
 +The algorithm given by the book is to use the recurrence: 
 +OPT(i,j)=min(alphaij+OPT(i-1,j-1), delta+OPT(i-1,j), delta+OPT(i,j-1)). 
 +First possibility is to have i,j matching each other, second is to skip i, third is to skip j. 
 +To prove the correctness of the dynamic programming, we visualize a graph with each node pointing to the top, right, and the upper right diag. It is not hard to show that the OPT algorithm gives the best solution to go from (00) to any (ij), which essentially means the best alignment. 
  
 +However, in real world scenario, we are not going to be given the target to compare with what we type in. In other word, how can the computer know to compare "occurence" with "occurrence"? Guess this is another interesting problem in searching.
 +
 +Readability is 7. 
 +
 +==== Section 7: Sequence Alignment in Linear Space via Divide and Conquer=====
 +This section gives an improvement of the algorithm from section 5.
 +In real world, especially molecule biology, there are many elements to be aligned and it takes over too much space (O(mn) for the graph) if we use the algorithm given in the last section. Hence, we hope to find a more spacial-efficient algorithm. 
 +The first idea comes to mind (in the book) is to only record the current column and the one right before it because those are the only relevant information in order to calculate the optimal value. 
 +However, a problem appears: we cannot record the path leading to the optimal value in that way. Then the book further elaborates the idea using divide-conquer. The algorithm is based on this theorem:
 +m and n to be aligned, k be in {0..n} and q be an index that minimizes the quantity f(q,k)+g(q,k). Then there is a corner to corner path of minimum length that passes through (q,k).
 +This, together with working the problem backward from (m,n) to (0,0) and forward as the original way, allows us to use divide and conquer. For one of the sequence to be aligned, wlog, say n, we always divide it by half; and the other sequence, m, we divide it based on the value of q. On the way, we record each q. 
 +It indeed takes O(m+n) space. The recurrence relationship is: T(m,n)<=cmn+T(q,n/2)+T(m-q,n/2)=O(mn), so the running time is roughly the same as the algorithm before. Hence we achieved our goal: shrink the space and maintain the running time. 
 +I really like the idea and the readability is 9.
 +
 +==== Section 8: Shortest Paths in a Graph=====
 +This is a variation of the shortest paths in a graph problem with only positive edges. In this extended problem, we allow negative-weighted edges; however we still do not allow a negative cycle-because it can just go on forever around the cycle and go to negative infinitive, and become meaningless. 
 +When there is negative edges, it is easy to come up with a counter example that our old greedy algorithm no longer works. Then the book tries to modify the greedy algorithm by adding a constant M big enough to each edge so as to make them all positive, which is easy to be seen wrong, too. Hence, we need to turn to dynamic programming. 
 +The recurrence is:
 +OPT(i,v)=min(OPT(i-1,v),min(OPT(i-1,w)+cvw)): that is, to calculate out all the optimal solutions using different number of edges. The calculation, as usual, is built up on smaller sub-problems. The running time, if calculated roughly, is O(n^3), though a closer look refines it to be O(mn). 
 +This algorithm, like many other dynamic programming, is space-consuming. To save space, when we go through each node, we will record the next node that leading to the optimal solution. In this way, it only takes O(n) space. 
 +In addition, to save both time and space, we set a stopping signal: when we run into 2 same columns consecutively, it is time to stop because any further update will not change anything. 
 +The readability is 8.
 +
 +==== Section 9: Shortest Paths and Distance Vector Protocols=====
 +This is section introduces an algorithm that improves the shortest paths algorithm from section 8.
 +First we observe the shortage of the original algorithm: it is a pull-based algorithm, and in  each iteration, each node will contact all its neighbors and "pull" the new value from it; however, if a neighbor remains the same value, then there is no need to pull out the new value from it. The new algorithm is a push-based algorithm. That is, updates will not be provided until the value of a node changes; then the updates will be pushed to its surrounding neighbors, and the method is called the distance vector protocol. The book writes down the detailed algorithm, as usual. 
 +Then it continues to discuss some associated concerns with the distance vector protocol. In real life scenario, if one of the edges is deleted, which totally cut-off all the connections between a node a and the ending node b, the distance vector protocol will not cease updating M[a] and never realize there is no path available. And this, of course, can be problematic. 
 +The way to solve it is to keep track of the big picture: each node not only storing knowledge of its neighbor but the entire network to ensure there actually exists a path. The extra storage will take over more space but is indeed necessary.
 +Readability is 7. 
  
courses/cs211/winter2011/journals/wendy/chapter6.1301413729.txt.gz · Last modified: 2011/03/29 15:48 by shangw
CC Attribution-Noncommercial-Share Alike 4.0 International
Driven by DokuWiki Recent changes RSS feed Valid CSS Valid XHTML 1.0