Differences

This shows you the differences between two versions of the page.

--- courses:cs211:winter2011:journals:chen:chapter_6 [2011/03/16 06:14] – [6.3 Segmented Least Squares: Multi-way Choices] zhongc
+++ courses:cs211:winter2011:journals:chen:chapter_6 [2011/04/06 16:41] (current) – [6.9 Shortest Paths and Distance Vector Protocols] zhongc
@@ Line 104: / Line 104: @@
 interesting/readable: 7/7
+====== 6.4 Subset Sums and Knapsacks: Adding a Variabl ======
+Summary
+Problem:
+Given n objects and a "knapsack"
+Item i weighs wi > 0 kilograms and has value vi > 0
+d-dynamical programming won't work.
+Recurrence:
+Either select item i or not
+Case 1: OPT does not select item i
+• OPT selects best of { 1, 2, …, i-1 }
+Case 2: OPT selects item i
+• Accepting item i does not immediately imply that
+we will have to reject other items
+ No known conflicts
+• **Without knowing what other items were selected
+before i, we don't even know if we have enough
+room for i**
+Needs more memorization power!
+Add a variable.
+Def. OPT(i, **w**) = max profit subset of items 1,
+…, i with weight limit w
+ Case 1: OPT does not select item i
+• OPT selects best of { 1, 2, …, i-1 } using weight
+limit w
+ Case 2: OPT selects item i
+• new weight limit = w – wi
+• OPT selects best of { 1, 2, …, i–1 } using new
+weight limit, w – wi
+iterate through the n-w array.
+Runtime: O(nw)
+Interesting/readable:7/7
+====== 6.5 RNA SECONDARY STRUCTURE ======
+Summary
+Problem:
+RNA. String B = b1b2…bn over alphabet { A, C, G, U }
+• Secondary structure. RNA is single-stranded so it
+A set of pairs S = { (bi, bj) } that satisfy:
+Constraints:
+ [Watson-Crick] S is a matching and each pair in
+S is a Watson-Crick complement: A-U, U-A, CG,
+or G-C
+ [No sharp turns] The ends of each pair are
+separated by at least 4 intervening bases. If (bi,
+bj) ∈ S, then i < j - 4
+ [Non-crossing] If (bi, bj) and (bk, bl) are two
+pairs in S, then we cannot have i < k < j < l
+• Free energy. Usual hypothesis is that an RNA
+molecule will form the secondary structure with the
+optimum total free energy.
+• Goal. Given an RNA molecule B = b1b2…bn, find a
+secondary structure S that maximizes the number
+of base pairs
+OPT(i, j) = maximum number of base pairs in
+a secondary structure of the substring
+bibi+1…bj
+ Case 1. If i ≥ j - 4
+• OPT(i, j) = 0 by no-sharp turns condition
+ Case 2. Base bj is not involved in a pair
+• OPT(i, j) = OPT(i, j-1)
+ Case 3. Base bj pairs with bt for some i ≤ t < j - 4
+• non-crossing constraint decouples resulting subproblems
+• OPT(i, j) = 1 + maxt { OPT(i, t-1) + OPT(t+1, j-1) }
+Runtime: O(N3)    two for loops and a max().
+Interesting/readable:7/7
+====== 6.6-6.7 SEQUENCE ALIGNMENT ======
+Some general context of the problem:
+Google search and dictionaries have functions that recognizes what the user probably mean when typing in a word
+that does not exist in the dataset.
+In order to do
+**How should we define similarity between two words or strings?**
+Dissimilar by gaps and mismatches.
+Formalized:
+Sequence Alignment
+• Goal: Given two strings X = x1 x2 . . . xm and
+Y = y1 y2 . . . yn find alignment of minimum
+cost
+• An alignment M is a set of ordered pairs xi-yj
+such that each item occurs in at most one
+pair and no crossings
+• The pair xi-yj and xi'-yj' cross if i < i', but j > j’.
+different cases
+Case 1: xM and yN are aligned
+• Pay mismatch for xM-yN + min cost of aligning rest of
+strings
+• OPT(M, N) = αXmYn + OPT(M-1, N-1)
+ Case 2: xM is not matched
+• Pay gap for xM + min cost of aligning rest of strings
+• OPT(M, N) = δ + OPT(M-1, N)
+ Case 3: yN is not matched
+• Pay gap for yN + min cost of aligning rest of strings
+• OPT(M, N) = δ + OPT(M, N-1)
+Similar to KnapSack problem in the data structure.
+space O(mn)
+Can do this in linear space:
+Collapse into an m x 2 array
+ M[i,0] represents previous row; M[i,1] -- current
+T(m, n) <= cmn + T(q, n/2) + T(m - q, n/2)
+= cmn + kqn/2 + k(m - q)n/2
+= cmn + kqn/2 + kmn/2 - kqn/2
+= (c + k/2)mn.
+combination of divide-and-conquer and dynamic programming
+we can map the problem to shortest path problem on a weighted graph.
+Divide: find index q that minimizes f(q, n/2) + g(q, n/2) using DP
+Conquer: recursively compute optimal alignment in each piece.
+Interesting/readable:7/7
+====== 6.8 Shortest Paths in a Graph ======
+Negative weights would change everything that made the Greedy appropriate. Thus we cannot
+make decision only relying on local inforamtion at each step b/c a big negative edge that
+come later may drastically change the picture.
+Some constraints:
+No negative weight cycle
+If some path from s to t contains a negative
+cost cycle, there does not exist a shortest s-t
+path.
+we can loop forever and get ever smaller.
+Reccurence:
+OPT(i,v): minimum cost of a v-t path P using
+at most i edges
+ This formulation eases later discussion
+• Original problem is OPT(n-1, s)
+DP
+ Case 1: P uses at most i-1 edges
+• OPT(i, v) = OPT(i-1, v)
+ Case 2: P uses exactly i edges
+• if (v, w) is first edge, then OPT uses (v, w), and
+then selects best w-t path using at most i-1 edges
+• Cost: cost of chosen edge
+Implementation
+for every edge number,
+for possible node v
+for each edge incident to v
+find out foreach edge (v, w) ∈ E
+M[i, v] = min(M[i, v], M[i-1, w] + cvw )
+O(mn) space.
+General process of DP
+Review: Dynamic Programming Process
+. Determine the optimal substructure of the
+problem  define the recurrence relation
+. Define the algorithm to find the value of the
+optimal solution
+. Optionally, change the algorithm to an
+iterative rather than recursive solution
+. Define algorithm to find the optimal
+solution
+. Analyze running time of algorithms
+Interesting/readable: 8/8
+====== 6.9 Shortest Paths and Distance Vector Protocols ======
+Problem Motivation:
+One important application of the Shortest-Path Problem is for routers in a
+communication network to determine the most efficient path to a destination.
+attempt:
+Dijkstra's algorithm requires global information of network- unrealistic
+we need to work with only local information.
+Bellman-Ford uses only local knowledge of
+neighboring nodes
+ Distribute algorithm: each node v maintains its
+value M[v]
+ Updates its value after getting neighbor’s values
+Problems with the Distance Vector Protocol
+One of the major problems with the distributed implementation of Bellman-
+Ford on routers (the protocol we have been discussing above) is that it’s derived
+from an initial dynamic programming algorithm that assumes edge costs will
+remain constant during the execution of the algorithm.
+That is, we might get into a situation where there is infinite looping of mutual dependancy.
+could fail if the other node is deleted.
+Solution:
+ Each router stores entire path Not just the distance and the first hop
+ Based on Dijkstra's algorithm
+ Avoids "counting-to-infinity" problem and related
+difficulties
+ Tradeoff: requires significantly more storage
+Interesting/readable: 5/5