Differences

This shows you the differences between two versions of the page.

--- courses:cs211:winter2011:journals:david:chapter6 [2011/03/30 00:34] – [6.6 - Sequence Alignment] margoliesd
+++ courses:cs211:winter2011:journals:david:chapter6 [2011/04/06 03:26] (current) – [6.10 - Negative Cycles in a Graph] margoliesd
@@ Line 39: / Line 39: @@
 ====6.6 - Sequence Alignment====
-The problem we are trying to solve is how to determine how alike two words or sequences are. We will use gaps and mismatches to determine the optimal value, where gaps all have constant cost and mismatches have variable cost depending on the symbols mismatched. Thus, the cost of M, our optimal solution, is the total of all gaps and mismatch costs, which is the lowest in the optimal solution. We define our sets as {1,2,..,n} and {1,2,..,m}, and note that for the optimal M, either (m,n) are in M (they match), or they do not. This leads us to the following recurrence, where either 1,2,or 3 is true. 1 - (m,n) is in M, 2 - the m<sup>th</sup> position of X is not matched, 3 - the n<sup>th</sup> position of Y is not matched.
+The problem we are trying to solve is determining how alike two words or sequences are. We measure similarity with gaps and mismatches, where every gap has the same constant cost and a mismatch has a cost that depends on the two symbols mismatched. The cost of an alignment M is the sum of its gap and mismatch costs, and the optimal alignment is the one with the lowest total cost. We define the position sets X={1,2,..,m} and Y={1,2,..,n}, and note that in an optimal M, either (m,n) is in M (the last positions are paired with each other) or it is not. This leads us to the following recurrence, where at least one of three cases holds: 1 - (m,n) is in M, 2 - the m<sup>th</sup> position of X is not matched, 3 - the n<sup>th</sup> position of Y is not matched. This gives us opt(i,j) = min[mismatch cost of x<sub>i</sub>,y<sub>j</sub> + opt(i-1,j-1), gap cost + opt(i-1,j), gap cost + opt(i,j-1)], with base cases opt(i,0) = i * gap cost and opt(0,j) = j * gap cost. There is also a pictorial approach in which the optimal alignment corresponds to a minimum-cost path through a grid graph. We get a running time of O(mn).
+Readability: 5/10. This was another difficult section to understand.
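The recurrence above can be sketched in Python as follows. This is a minimal illustration, not the textbook's code: the function names `alignment_cost` and `unit_mismatch`, the default gap cost of 2, and the 0/1 mismatch cost are all assumptions chosen for the example.

```python
def unit_mismatch(a, b):
    """Hypothetical mismatch cost: 0 for equal symbols, 1 otherwise."""
    return 0 if a == b else 1


def alignment_cost(x, y, gap=2, mismatch=unit_mismatch):
    """Return the minimum alignment cost of sequences x and y."""
    m, n = len(x), len(y)
    # opt[i][j] = cost of aligning the first i symbols of x with the first j of y
    opt = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        opt[i][0] = i * gap  # align a prefix of x against nothing
    for j in range(n + 1):
        opt[0][j] = j * gap  # align a prefix of y against nothing
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            opt[i][j] = min(
                mismatch(x[i - 1], y[j - 1]) + opt[i - 1][j - 1],  # pair x_i with y_j
                gap + opt[i - 1][j],                               # x_i is unmatched
                gap + opt[i][j - 1],                               # y_j is unmatched
            )
    return opt[m][n]
```

The table has (m+1)(n+1) entries and each is filled in constant time, which is where the O(mn) running time and O(mn) space come from.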
+====6.7 - Sequence Alignment in Linear Space via Divide and Conquer====
+The space requirement in the previous section is O(mn), which means a table of 10 billion entries (about 10 GB) if both strings are 100,000 symbols each. We find that we can use a divide and conquer approach to split the problem into many recursive calls, which allows the same space to be reused by each subsequent call. First, we note that the previous algorithm can run with an array of only two columns, because computing a column only requires the current and previous columns. However, this does not keep enough information to recover the alignment itself once we find its value. We use the graph from the previous section and define g(i,j) as the length of the shortest path from (i,j) to (m,n). We start from g(m,n), which is 0, and work backward to g(0,0), which gives the value of the optimal alignment. We call this Backward-Space-Efficient-Alignment. Combining the forward and backward versions in the divide and conquer algorithm gives a space requirement of O(m+n) and a running time of O(mn).
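The two-column idea described above can be sketched like this. It is only the forward, value-only half of the story: it returns the optimal cost but not the alignment, and it is not the full divide and conquer algorithm. The names `alignment_cost_linear_space` and `unit_mismatch` and the default costs are assumptions made for the example.

```python
def unit_mismatch(a, b):
    """Hypothetical mismatch cost: 0 for equal symbols, 1 otherwise."""
    return 0 if a == b else 1


def alignment_cost_linear_space(x, y, gap=2, mismatch=unit_mismatch):
    """Return the optimal alignment cost while storing only two columns."""
    m, n = len(x), len(y)
    # prev[i] holds opt(i, j-1); it starts as column j = 0
    prev = [i * gap for i in range(m + 1)]
    for j in range(1, n + 1):
        cur = [j * gap] + [0] * m  # cur[0] = opt(0, j)
        for i in range(1, m + 1):
            cur[i] = min(
                mismatch(x[i - 1], y[j - 1]) + prev[i - 1],  # opt(i-1, j-1)
                gap + cur[i - 1],                            # opt(i-1, j)
                gap + prev[i],                               # opt(i, j-1)
            )
        prev = cur  # the finished column becomes the "previous" column
    return prev[m]
```

Only two lists of length m+1 are alive at any time, so the space is O(m) while the running time stays O(mn).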
+Readability: 5/10. Another difficult section.
+====6.8 - Shortest Paths in Graphs====
+We will denote our directed graph as G=(V,E) and give each edge a specific weight. While Dijkstra's Algorithm can find us the path of least cost, it does not work for negative costs. Our problem involves finding the path of least cost in a graph that can have negative edge weights, but does not have any negative cycles. If we begin with a greedy approach, we consider the minimum cost edge leaving our node. But this could cause us to miss an edge of greater cost that leads to a path whose later negative costs more than make up for it. The Bellman-Ford Algorithm gives an efficient solution to our problem. We note that, with no negative cycles, a shortest path has at most n-1 edges. If we define opt(i,v) to be the minimum cost of a path from v to t with at most i edges, we can define it as opt(i,v) = min[opt(i-1,v), min over edges (v,w) of (opt(i-1,w) + the cost of the edge from v to w)]. This gives a 2-dimensional array M with the optimal values for each subproblem. We can get a running time of O(mn), where m is the number of edges and n is the number of nodes. While our array will be of size n<sup>2</sup>, we can actually get a smaller memory requirement. We use a 1-dimensional array and only update a node's cost if the new cost is lower than the previous one. We keep a "first" node for each entry to keep track of the first edge we need to take. This will allow us to find the optimal path after the algorithm has completed.
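The 1-dimensional version with "first" pointers described above might look like the sketch below. The graph representation (nodes numbered 0..n-1, edges as `(v, w, cost)` tuples) and the function names `bellman_ford_to_sink` and `path_to_sink` are assumptions made for illustration, and the code assumes the graph has no negative cycles.

```python
import math


def bellman_ford_to_sink(n, edges, t):
    """Shortest-path costs from every node to sink t.

    n     -- number of nodes, labeled 0..n-1 (assumed labeling)
    edges -- list of (v, w, cost) tuples for directed edges v -> w
    Returns (dist, first): dist[v] is the cost of the cheapest v-to-t path,
    first[v] is the next node on that path (None if there is none).
    """
    dist = [math.inf] * n
    first = [None] * n
    dist[t] = 0
    for _ in range(n - 1):  # a shortest path uses at most n-1 edges
        changed = False
        for v, w, cost in edges:
            if dist[w] + cost < dist[v]:  # only update when strictly lower
                dist[v] = dist[w] + cost
                first[v] = w
                changed = True
        if not changed:  # nothing improved, so the values are final
            break
    return dist, first


def path_to_sink(first, v, t):
    """Follow the "first" pointers from v to t; None if t is unreachable."""
    path = [v]
    while v != t:
        v = first[v]
        if v is None:
            return None
        path.append(v)
    return path
```

For example, with edges `[(0, 1, 4), (0, 2, 2), (2, 1, -3), (1, 3, 1)]` and sink 3, the greedy choice at node 0 is not the issue here, but the negative edge from 2 to 1 makes the route 0, 2, 1, 3 cheaper than going directly from 0 to 1.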
+Readability: 8/10. This was easier to understand than the other sections.
+====6.9 - Shortest Paths and Distance Vector Protocols====
+One application of the shortest-paths algorithm from the previous section is a network of routers (nodes) connected by direct links (edges). The cost of an edge is the delay of the link, and we look for a path with the shortest delay. While Dijkstra's Algorithm could work in this situation because delays cannot be negative, it requires global knowledge of the network. We can use Bellman-Ford to avoid this problem, but we need to re-implement it as a "push-based" algorithm where a node only sends its cost to its neighbors when that cost changes. In the "asynchronous" version, a node becomes active whenever its cost changes, and an active node updates its neighbors. Because each node maintains a vector of its distances to all other nodes, this is called a "distance vector protocol". One problem with this algorithm occurs if an edge gets deleted. Then the affected nodes keep referring back to each other's outdated distances until they find a new path. In practice, nodes store more of the entire path so we do not have this problem.
+Readability: 7/10.
+====6.10 - Negative Cycles in a Graph====
+If we augment a graph by adding a sink node that has an edge from every other node leading to it, and the augmented graph has a negative cycle, then the original graph must have a negative cycle too. If we have a negative cycle and we are looking for a path from a node to the sink that passes through the cycle, then as we increase the number of allowable edges, the cost of the path tends toward negative infinity. However, if there are no negative cycles, then opt(i,v)=opt(n-1,v) for every i greater than or equal to n. So if opt(n,v)=opt(n-1,v) holds for all nodes v, there are no negative cycles in the graph. We can also use a pointer graph that starts off having no cycles and add edges to it until we find a cycle.
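The test described above (augment with a sink, then check whether costs still improve after n-1 rounds) can be sketched as follows. This uses the simpler "one extra round" check rather than the pointer-graph method, and the name `has_negative_cycle`, the node labeling 0..n-1, and the `(v, w, cost)` edge tuples are assumptions for the example.

```python
import math


def has_negative_cycle(n, edges):
    """Return True if the directed graph contains a negative-cost cycle.

    n     -- number of nodes, labeled 0..n-1 (assumed labeling)
    edges -- list of (v, w, cost) tuples for directed edges v -> w
    """
    sink = n  # new sink node with a zero-cost edge from every original node
    augmented = list(edges) + [(v, sink, 0) for v in range(n)]
    total = n + 1  # number of nodes in the augmented graph
    dist = [math.inf] * total
    dist[sink] = 0
    # Without negative cycles, costs are final after total-1 rounds,
    # so an improvement in round number `total` reveals a negative cycle.
    for _ in range(total):
        changed = False
        for v, w, cost in augmented:
            if dist[w] + cost < dist[v]:
                dist[v] = dist[w] + cost
                changed = True
        if not changed:
            return False  # costs stabilized: no negative cycle
    return True
```

Adding the sink guarantees that every cycle in the original graph can reach it, so a negative cycle anywhere in the graph shows up in the check.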
+Readability: 5/10. The last part confused me.