===== 5.2 Summary =====
A more general class of algorithms is obtained by considering divide-and-conquer algorithms that create recursive calls on q subproblems of size n/2 each and then combine the results in O(n) time. 1) For some constant c, T(n) ≤ qT(n/2) + cn when n > 2, and T(2) ≤ c. Any function T(n) satisfying 1) with q > 2 is bounded by O(n^(log_2 q)), so the running time is more than linear, since log_2 q > 1, but still polynomial in n. Any function T(n) satisfying 1) with q = 1 is bounded by O(n), and with q = 2 it is bounded by O(n log n). Finally, if for some constant c, T(n) ≤ 2T(n/2) + cn^2 when n > 2 and T(2) ≤ c, then T(n) is bounded by O(n^2). What are some algorithms that we may use regularly that use these recurrence relations? Interest: 4/10
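To see the bounds concretely, here is a small numeric check (my own, not from the text) that evaluates the recurrence directly for powers of two and prints the matching reference bound; the constant c = 1 is an arbitrary choice:

<code python>
# Evaluate T(n) = q*T(n/2) + c*n with T(2) = c, n a power of 2, c = 1.
from math import log2

def T(n, q, c=1):
    """Evaluate the recurrence directly."""
    if n <= 2:
        return c
    return q * T(n // 2, q, c) + c * n

for q in (1, 2, 3):
    for k in (6, 10, 14):
        n = 2 ** k
        if q == 1:
            ref = n              # bounded by O(n)
        elif q == 2:
            ref = n * log2(n)    # bounded by O(n log n)
        else:
            ref = n ** log2(q)   # bounded by O(n^(log_2 q))
        print(f"q={q}, n={n}: T(n)={T(n, q)}, reference={ref:.0f}")
</code>

Running it shows T(n) growing proportionally to each reference bound as n increases.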

===== 5.3 Summary =====
Another problem that can be solved with divide-and-conquer arises in the analysis of rankings. For example, many sites on the Web use a technique known as collaborative filtering, in which they try to match one person's preferences with those of other people on the Internet; the sites then use these similarities in interest to recommend new things that the other people have liked. Another application is in meta-search tools on the Web. To compare two rankings, we count the number of inversions between them: pairs of items that the two rankings order in opposite ways. This can be done in O(n log n) time. The most crucial routine in the process of counting inversions is the Merge-and-Count procedure. Because Merge-and-Count takes O(n) time, the Sort-and-Count algorithm correctly sorts the input list and counts the number of inversions, and it runs in O(n log n) time for a list with n elements. Question: Do Facebook, Twitter, and other modern sites still use a similar algorithm in their recommendations? Interest: 6/10
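A minimal sketch of the two routines as I understand them (variable names are my own, not the book's pseudocode):

<code python>
def sort_and_count(a):
    """Return (sorted list, inversion count) in O(n log n) time."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_l = sort_and_count(a[:mid])
    right, inv_r = sort_and_count(a[mid:])
    merged, inv_split = merge_and_count(left, right)
    return merged, inv_l + inv_r + inv_split

def merge_and_count(left, right):
    """Merge two sorted lists, counting pairs (l, r) with l > r."""
    merged, i, j, inversions = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            # every element remaining in left is inverted with right[j]
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions
</code>

For example, sort_and_count([2, 4, 1, 3, 5]) returns ([1, 2, 3, 4, 5], 3), since the pairs (2, 1), (4, 1), and (4, 3) are inverted.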

===== 5.4 Summary =====
In the closest-pair problem, given n points in the plane, we want to find the pair that is closest together. This problem comes from computational geometry and is useful in graphics, computer vision, geographic information systems, molecular modeling, etc. Given points P = {p_1, ..., p_n}, our goal is to find a pair of points p_i, p_j that minimizes the distance d(p_i, p_j). It is solved with another divide-and-conquer algorithm in the style of Mergesort: find the closest pair among the points in the "left half" of P and the closest pair among the points in the "right half" of P, then use this information to get the overall solution in linear time. Overall this gives an O(n log n) running time, and the algorithm correctly outputs a closest pair of points in P. Question: How exactly DO we recombine at the end in linear time? Interest: 3/10
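As a partial answer to my own question, a compact sketch (mine, not the book's pseudocode), assuming at least two points given as (x, y) tuples. The combine step only looks at points within the current best distance of the dividing line, sorted by y, and compares each with at most the next 7 points; re-sorting that strip on every call makes this variant O(n log^2 n), whereas the book reaches O(n log n) by keeping a second copy of the points pre-sorted by y:

<code python>
from math import dist  # Euclidean distance (Python 3.8+)

def closest_pair(points):
    """Return (distance, a, b) for the closest pair among >= 2 points."""
    pts = sorted(points)                      # sort once by x
    def solve(p):
        if len(p) <= 3:                       # brute force tiny cases
            return min((dist(a, b), a, b)
                       for i, a in enumerate(p) for b in p[i + 1:])
        mid = len(p) // 2
        x_mid = p[mid][0]
        best = min(solve(p[:mid]), solve(p[mid:]))
        # Combine: only points within best[0] of the dividing line can
        # beat best; sorted by y, each needs at most 7 comparisons.
        strip = sorted((q for q in p if abs(q[0] - x_mid) < best[0]),
                       key=lambda q: q[1])
        for i, a in enumerate(strip):
            for b in strip[i + 1:i + 8]:
                cand = (dist(a, b), a, b)
                if cand < best:
                    best = cand
        return best
    return solve(pts)
</code>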

===== 5.5 Summary =====
Integer multiplication can be done with a different divide-and-conquer algorithm, in which the "default" quadratic algorithm is improved by means of a different recurrence. Here we break the product of the two integers into partial sums. Writing the numbers in base 2 reduces a single n-bit instance to four n/2-bit instances; but four subproblems would only give another quadratic algorithm, so the trick is to rearrange the partial sums so that only three recursive n/2-bit multiplications are needed, and then combine their results. This gives the Recursive-Multiply algorithm. The running time of Recursive-Multiply on two n-bit factors is O(n^(log_2 3)) = O(n^1.59). Question: Is this algorithm used behind the scenes for each integer multiplication that we ask of our computers? Interest: 8/10
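A sketch of the idea in code (my own rendering, not the book's pseudocode, using Python's arbitrary-precision integers and bit shifts for the base-2 splitting):

<code python>
def recursive_multiply(x, y):
    """Karatsuba-style multiply: three recursive calls on half-size
    numbers, giving the O(n^(log_2 3)) = O(n^1.59) bound."""
    if x < 10 or y < 10:                  # small enough: multiply directly
        return x * y
    m = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> m, x & ((1 << m) - 1)   # x = x1*2^m + x0
    y1, y0 = y >> m, y & ((1 << m) - 1)   # y = y1*2^m + y0
    p = recursive_multiply(x1, y1)        # high parts
    q = recursive_multiply(x0, y0)        # low parts
    r = recursive_multiply(x1 + x0, y1 + y0)
    # xy = p*2^(2m) + (r - p - q)*2^m + q
    return (p << (2 * m)) + ((r - p - q) << m) + q
</code>

As a partial answer to the question above: hardware multiplies fixed-width integers directly, but arbitrary-precision libraries (including Python's own) do switch to Karatsuba-style multiplication once the operands get large enough.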

===== 6.1 Summary =====
The Interval Scheduling Problem can be solved optimally with a particular greedy algorithm, but the Weighted Interval Scheduling Problem requires a more general approach: each interval has a value or weight, and we want to accept a set of intervals of maximum total value. With weighted intervals, no natural greedy algorithm is known; instead, we switch to dynamic programming. In particular, a recursive algorithm is used, where request j belongs to an optimal solution on the given set if and only if v_j + OPT(p(j)) ≥ OPT(j - 1), where OPT(x) is the value of the optimal solution on requests 1 through x and p(j) is the last request compatible with request j. In the process of using this recursion to calculate the optimal solution, we save values that have already been computed, which is called memoization. The running time of the complete algorithm with memoization is O(n) once the intervals are sorted. Question: How does this algorithm avoid delving through all possibilities, and thus being as inefficient as brute force? Interest: 4/10
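A minimal memoized sketch under these assumptions (mine, not the book's pseudocode): intervals are (start, finish, value) triples already sorted by finish time, and p is a precomputed list of length n + 1 where p[j] is the largest index i < j whose interval is compatible with interval j (0 if none):

<code python>
from functools import lru_cache

def weighted_interval(intervals, p):
    """Value of an optimal schedule via the memoized recurrence."""
    @lru_cache(maxsize=None)
    def opt(j):
        if j == 0:
            return 0
        v_j = intervals[j - 1][2]
        return max(v_j + opt(p[j]), opt(j - 1))  # take j, or skip it
    return opt(len(intervals))
</code>

The memoization is exactly what keeps this from being brute force: each of the n + 1 subproblems is computed once, in constant time.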

===== 6.2 Summary =====
The key to the efficient algorithm for the Weighted Interval Scheduling Problem is the array M. The array is built by computing the value of the optimal solution on the subproblem over intervals 1 through j for each j, using the recurrence from the previous section to define each entry M[j] in terms of entries that come earlier in the array. Once we have the array M, the problem is solved: it contains the value of the optimal solution on the full instance, and the Find-Solution algorithm can backtrack through M to return the actual optimal set of intervals. Dynamic programming algorithms can thus be developed by iteratively building up subproblems; this approach is often a simpler way to express the algorithms than memoized recursion. To develop an algorithm based on dynamic programming, one needs a collection of subproblems derived from the original problem that satisfies the following properties: 1) there are only a polynomial number of subproblems; 2) the solution to the original problem can be computed from solutions to the subproblems; 3) there is a natural ordering on subproblems from "smallest" to "largest", together with an easy-to-compute recurrence that determines the solution to a subproblem from the solutions to some number of smaller subproblems. Question: Is dynamic programming only used for those problems that cannot be solved with more efficient algorithms, just as a better alternative to brute force? Interest: 4/10
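A sketch of the iterative build-up and the backtracking step, with the same assumptions as in the 6.1 entry (values[j - 1] is the value of interval j, p[j] its last compatible predecessor):

<code python>
def compute_opt(values, p):
    """Bottom-up table: M[j] = max(v_j + M[p[j]], M[j-1])."""
    n = len(values)
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = max(values[j - 1] + M[p[j]], M[j - 1])
    return M

def find_solution(M, values, p):
    """Backtrack through M to recover an optimal set of intervals."""
    chosen, j = [], len(values)
    while j > 0:
        if values[j - 1] + M[p[j]] >= M[j - 1]:
            chosen.append(j)      # interval j is in an optimal solution
            j = p[j]
        else:
            j -= 1
    return chosen[::-1]
</code>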

===== 6.3 Summary =====
In the Segmented Least Squares problem, the recurrence involves what could be called "multi-way choices": at each step, we have a polynomial number of possibilities to consider for the structure of the optimal solution. Essentially, we are trying to fit the points with as few line segments as possible while keeping the total fitting error low. For each segment, we compute the line minimizing the error with respect to the points in that segment. The penalty of a partition is defined as the sum of 1) the number of segments into which we partition P, times a fixed, given multiplier C > 0, and 2) for each segment, the error value of the optimal line through that segment. We want to find a partition of minimum penalty. For the subproblem on the points p_1 through p_j, OPT(j) = min over i of (e_i,j + C + OPT(i - 1)), and the segment p_i through p_j is used in an optimum solution for the subproblem if and only if the minimum is obtained using index i. In total, the algorithm has a running time of O(n^2) once all the e_i,j values have been determined. Question: What would we do if there were a graph with several very stray points; would it just be impossible to compute a good best-fit line? Interest: 6/10
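A sketch of just the recurrence (mine), assuming the e_i,j values have already been computed and stored in an (n+1) x (n+1) table as the summary notes:

<code python>
def segmented_least_squares(n, e, C):
    """OPT(j) = min over i of (e[i][j] + C + OPT(i-1)); e[i][j] is the
    precomputed least-squares error of one line through points i..j,
    1-indexed."""
    OPT = [0.0] * (n + 1)
    for j in range(1, n + 1):
        OPT[j] = min(e[i][j] + C + OPT[i - 1] for i in range(1, j + 1))
    return OPT[n]
</code>

Each of the n entries takes O(n) work to minimize over i, which is where the O(n^2) bound comes from.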

===== 6.4 Summary =====
In the Subset Sum and Knapsack problems, the "obvious" set of subproblems turns out not to be enough, so we create a richer collection of subproblems by adding a new variable to the recurrence underlying the dynamic program. Here we have a single machine that can process jobs between times 0 and W for some W, and our goal is to schedule jobs so as to keep the machine as busy as possible up to the W "cut-off." Equivalently, given n items that each have a nonnegative weight, and a bound W, we want to select a subset S of the items whose total weight is at most W and, subject to that, as large as possible. This is the Subset Sum Problem, a natural special case of the Knapsack Problem, in which each request i has both a value and a weight. Dynamic programming can be used to solve this problem: the Subset-Sum(n, W) algorithm correctly computes the optimal value and runs in O(nW) time. Given the table M of optimal values of the subproblems, the optimal set S can be found in O(n) time. The Knapsack Problem can also be solved in O(nW) time. Question: Is there another algorithm (not using dynamic programming) that can find an equally efficient solution/running time? Interest: 7/10
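A sketch of the O(nW) table-filling step (my rendering, not the book's pseudocode), where the added variable is the weight bound w:

<code python>
def subset_sum(weights, W):
    """M[i][w] = largest total weight <= w using the first i items;
    fills about n*W entries, each in O(1), hence O(nW) time."""
    n = len(weights)
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w_i = weights[i - 1]
        for w in range(W + 1):
            M[i][w] = M[i - 1][w]                    # skip item i
            if w_i <= w:                             # or take item i
                M[i][w] = max(M[i][w], w_i + M[i - 1][w - w_i])
    return M[n][W]
</code>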

===== 7.1 Summary =====
Graphs can often be used to model transportation networks, whose edges carry "traffic" and whose nodes act as "switches" passing traffic between different edges. Network models have capacities on the edges, source nodes that generate traffic, sink (destination) nodes, and the traffic itself (or flow), which is transmitted across the edges. To find a maximum flow in a network, we answer the Maximum-Flow Problem with an augmenting-path approach: greedily push flow forward on edges with leftover capacity, and push backward on edges that are already carrying flow in order to divert it, which lets us undo earlier decisions. A residual graph provides a systematic way to search for such forward-backward operations. The algorithm built on this idea is the Ford-Fulkerson Algorithm, which runs in O(mC) time, where C is the total capacity out of the source. Question: Which parts of this are dynamic programming, and which are greedy? Interest: 6/10
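A compact sketch of the augmenting-path idea (mine, not the book's pseudocode), with capacities given as a dict of dicts and an arbitrary DFS choosing each augmenting path:

<code python>
from collections import defaultdict

def ford_fulkerson(capacity, s, t):
    """Max flow via augmenting paths; capacity is a dict of dicts of
    nonnegative integers. Runs in O(mC) with integer capacities."""
    residual = defaultdict(lambda: defaultdict(int))
    for u in capacity:
        for v, c in capacity[u].items():
            residual[u][v] += c

    def find_path():
        """DFS for an s-t path with leftover residual capacity."""
        stack, parent = [s], {s: None}
        while stack:
            u = stack.pop()
            if u == t:
                path = []
                while parent[u] is not None:
                    path.append((parent[u], u))
                    u = parent[u]
                return path[::-1]
            for v in residual[u]:
                if v not in parent and residual[u][v] > 0:
                    parent[v] = u
                    stack.append(v)
        return None

    flow = 0
    while (path := find_path()) is not None:
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck   # push forward
            residual[v][u] += bottleneck   # backward edge lets us undo
        flow += bottleneck
    return flow
</code>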

===== 7.2 Summary =====
This section analyzes the Ford-Fulkerson Algorithm and shows how it provides insight into the Maximum-Flow Problem: the flow it returns has the maximum possible value of any flow in G. By watching the amount of flow that f (any s-t flow) sends across a cut, we can bound the flow value: if f is any s-t flow and (A, B) any s-t cut, then v(f) ≤ c(A, B). If f is an s-t flow such that there is no s-t path in the residual graph G_f, then there is an s-t cut (A*, B*) in G for which v(f) = c(A*, B*); consequently, f has the maximum value of any flow in G, and (A*, B*) has the minimum capacity of any s-t cut in G. Given a flow f of maximum value, we can compute an s-t cut of minimum capacity in O(m) time. In every flow network, the maximum value of an s-t flow is equal to the minimum capacity of an s-t cut. If all capacities in the flow network are integers, then there is a maximum flow f for which every flow value f(e) is an integer. Question: Because this algorithm could potentially run forever, is it really always efficient? Interest: 6/10
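The O(m) min-cut computation is just a graph search. A sketch, assuming the ford_fulkerson sketch from the 7.1 entry is modified to also return its residual dict: A* is everything reachable from s once no augmenting path remains.

<code python>
def min_cut_side(residual, s):
    """A* = nodes reachable from s in the final residual graph; the
    cut (A*, everything else) has capacity equal to the flow value."""
    reachable, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v, r in residual[u].items():
            if r > 0 and v not in reachable:
                reachable.add(v)
                stack.append(v)
    return reachable
</code>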

===== 7.5 Summary =====
One important application of the Maximum-Flow Problem is solving the Bipartite Matching Problem. A bipartite graph G = (V, E) is an undirected graph whose node set can be partitioned into sets X and Y so that every edge has one end in X and the other in Y. A matching M in G is a subset of the edges such that each node appears in at most one edge of M. The Bipartite Matching Problem is to find a matching in G of the largest possible size. We build a flow network G' by directing all edges from X to Y, adding a source s with an edge to every node in X and a sink t with an edge from every node in Y, and giving every edge capacity 1. Then the size of the maximum matching in G is equal to the value of the maximum flow in G', and the edges in such a matching are the edges that carry flow from X to Y in G'. The Ford-Fulkerson Algorithm can be used to find a maximum matching in a bipartite graph in O(mn) time. Assume the bipartite graph G = (V, E) has two sides X and Y such that |X| = |Y|. Then G either has a perfect matching or there is a subset A ⊆ X such that |Γ(A)| < |A|; a perfect matching or an appropriate subset A can be found in O(mn) time. Question: What real-world applications do bipartite graphs have? Interest: 4/10
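A sketch of the reduction (the node names _source and _sink are my own placeholders, assumed not to collide with names in X or Y), reusing the ford_fulkerson sketch from the 7.1 entry:

<code python>
def max_bipartite_matching(X, Y, edges):
    """Size of a maximum matching: unit-capacity flow network with a
    source feeding X and Y feeding a sink."""
    S, T = "_source", "_sink"          # placeholder node names
    capacity = {S: {x: 1 for x in X}}
    for x in X:
        capacity[x] = {}
    for y in Y:
        capacity[y] = {T: 1}
    for x, y in edges:
        capacity[x][y] = 1             # direct each edge from X to Y
    return ford_fulkerson(capacity, S, T)
</code>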

===== 7.7 Summary =====
In the Maximum-Flow Problem we had only a single source s and a single sink t. Now we allow a set S of sources and a set T of sinks; instead of maximizing the flow value, the sources have fixed supply values and the sinks have fixed demand values, and we want to ship flow from the nodes with supplies to those with demands. If we build G' by attaching a super-source s* to every supply node and a super-sink t* to every demand node, then there is a feasible circulation with demands {d_v} in G if and only if the maximum s*-t* flow in G' has value D, the total of all demands. If all capacities and demands in G are integers, and there is a feasible circulation, then there is a feasible circulation that is integer-valued. Lower bounds on the edges can be handled by a further transformation: there is a feasible circulation in G if and only if there is a feasible circulation in the transformed graph G', and if all demands, capacities, and lower bounds in G are integers and there is a feasible circulation, then there is one that is integer-valued. Question: Is this algorithm just a slightly tweaked version of the Ford-Fulkerson Algorithm? Interest: 6/10
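As a partial answer to the question: the reduction leaves Ford-Fulkerson itself untouched and only rewires the graph. A sketch (mine), again reusing the ford_fulkerson function from the 7.1 entry, with the convention that demand[v] = d_v is negative for supply nodes and positive for demand nodes:

<code python>
def has_feasible_circulation(capacity, demand):
    """Feasible iff the max s*-t* flow saturates all demands."""
    S, T = "_s*", "_t*"                # placeholder super-nodes
    cap = {u: dict(nbrs) for u, nbrs in capacity.items()}
    cap[S] = {}
    D = 0
    for v, d in demand.items():
        cap.setdefault(v, {})
        if d < 0:
            cap[S][v] = -d             # supply -d_v flows in from s*
        elif d > 0:
            cap[v][T] = d              # demand d_v drains out to t*
            D += d
    return ford_fulkerson(cap, S, T) == D
</code>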