====== Eric's Journal ======
===== Week 11 Reading =====
Interest: 7/10. This is back to being a bit too abstract for me. Maybe (in fact probably) this week's reading will help out if I take the big data class this fall. It seems network flow would be a huge part of big data, since it's also about partitioning a set that's too big to compute with x resources into smaller sets that fit x*y resources.
====7.1 The Max-Flow Prob and the Ford-Fulkerson Alg====
>the prob: graphs of nodes and switches act as transportation networks, with traffic of certain capacities traveling through them and with source and sink nodes for that data
-flow networks
  * flow: abstract entity generated by source nodes
  * flow network: directed graph with the following features:
    * a capacity ce on each edge e, a nonnegative number
    * a source node s in V
    * a sink node t in V
    * internal nodes in between
-defining flow: an s-t flow assigns each edge a value f(e) with 0 <= f(e) <= ce (capacity conditions) and, for each internal node v, flow in = flow out (conservation conditions); the value v(f) is the total flow leaving s
-the max-flow prob: arrange traffic in a flow network as efficiently as possible (maximize the flow value)
  * by cutting the graph into a set A containing s and a set B containing t, we can bound the max possible flow value
  * minimum cut: the max-flow value equals the minimum capacity of any such division
>des the alg
-the residual graph: the node set of Gf is the same as G; forward edges carry the ce - f(e) leftover units of capacity we can still push, and backward edges e' carry capacity f(e) so we can undo flow already sent
-aug paths in a res graph
  aug(f,P)
    let b = bottleneck(P,f), the minimum residual capacity on P
    for each edge e in P
      if e is a forward edge: increase f(e) in G by b
      else: decrease f(e) in G by b
    return f
  * 7.1: f' = aug(f,P) is a flow in G
  * proof: check the capacity conditions 0 <= f'(e) <= ce (forward edges only gain up to their leftover capacity, backward edges only lose flow that is actually there) and check the conservation conditions at the internal nodes of P
  Max-Flow
    initially set f(e) = 0 for all e in G
    while there's an s-t path in the residual graph Gf
      let P be that simple path
      f' = aug(f,P)
      update f to be f'
      update Gf to be Gf'
    return f
  * the above is the FF alg, which Ford and Fulkerson developed in 1956
>ana the alg: termination and RT
-7.2: at every intermediate stage of the FFA, the flow values {f(e)} and the residual capacities in Gf are integers. Proof: suppose it's true after j iterations of the while loop; then bottleneck(P,f) is an integer, so f' has integer values, and so does the new residual graph
-7.3: let f be a flow in G, and P a simple s-t path in Gf. Then v(f') = v(f) + bottleneck(P,f), and since bottleneck(P,f) > 0, v(f') > v(f)
-7.4: suppose, as above, that all capacities in G are integers. Then the FFA terminates in at most C iterations of the while loop (C = total capacity out of s). Proof: by 7.3 the flow value increases every iteration, and by 7.2 it increases by at least 1; it starts at 0 and can't exceed C, so that's at most C iterations
-7.5: suppose that all capacities in G are integers; then the FFA can be implemented to run in O(mC) time. Proof: most of the work is done inside one iteration. Keep two linked lists per node in Gf (forward and backward edges); to find an s-t path we can use breadth-first or depth-first search in O(m+n); then for each e we can construct the forward and backward edges in Gf' in O(m) time
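Not the book's code, just my own minimal Python sketch of the augmenting-path loop above: residual capacities live in a nested dict and paths are found by DFS, so this is the plain O(mC) version from 7.5.
<code python>
from collections import defaultdict

def ford_fulkerson(edges, s, t):
    """edges: list of (u, v, capacity) triples. Returns (max flow value, residual caps)."""
    res = defaultdict(lambda: defaultdict(int))   # residual capacities, res[u][v]
    for u, v, c in edges:
        res[u][v] += c
        res[v][u] += 0                            # make sure the backward entry exists

    def find_path(u, visited):
        # depth-first search for an s-t path with leftover capacity
        if u == t:
            return [u]
        visited.add(u)
        for v, cap in list(res[u].items()):
            if cap > 0 and v not in visited:
                rest = find_path(v, visited)
                if rest is not None:
                    return [u] + rest
        return None

    value = 0
    while True:
        path = find_path(s, set())
        if path is None:
            break
        # bottleneck(P, f): smallest residual capacity along the path
        b = min(res[u][v] for u, v in zip(path, path[1:]))
        for u, v in zip(path, path[1:]):
            res[u][v] -= b                        # push b units forward
            res[v][u] += b                        # gain b units of backward capacity
        value += b
    return value, res
</code>
For example, ford_fulkerson([("s","u",20), ("s","v",10), ("u","v",30), ("u","t",10), ("v","t",20)], "s", "t") should report a value of 30.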
====7.2 max flows and min cuts in a network====
>ana the alg: flows and cuts: cuts matter because the value of every flow is bounded by the capacity of any cut (A,B) separating s from t
-7.6: let f be any s-t flow and (A,B) any s-t cut. Then v(f) = fout(A) - fin(A). Proof: v(f) = sum over v in A of (fout(v) - fin(v)); edges with both ends inside A are counted once positively and once negatively and cancel, leaving fout(A) - fin(A)
-7.7: let f be any s-t flow and (A,B) any s-t cut; then v(f) = fin(B) - fout(B)
-7.8: let f be any s-t flow and (A,B) any s-t cut; then v(f) <= c(A,B)
>ana the alg: max-flow equals min-cut
-7.9: if f is an s-t flow such that there is no s-t path in Gf, then there is an s-t cut (A*,B*) in G for which v(f) = c(A*,B*); consequently f has the maximum value of any flow and (A*,B*) has the minimum capacity of any cut
-7.10: the flow f returned by the FFA is a max flow
-7.11: given a flow f of max value, we can compute an s-t cut of min capacity in O(m) time. Proof: as in 7.9, use BFS or DFS on Gf to find A* (the nodes reachable from s), set B* = V - A*, and return the cut (A*,B*)
-7.12: in every flow network there is a flow f and a cut (A,B) so that v(f) = c(A,B)
-7.13: in every flow network the max value of an s-t flow equals the min capacity of an s-t cut
-further analysis: integer-valued flows
--7.14: if all capacities in the flow network are integers, then there is a max flow f for which every flow value f(e) is an integer
--real #s as caps? all the termination results here assumed the caps are integers, but what if they're real numbers? max-flow = min-cut still holds, but with badly chosen augmenting paths the FFA may fail to terminate
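Following 7.11, once the sketch above has finished, the min cut falls out of one BFS over the final residual capacities. This little helper (my own naming, reusing the res dict returned by ford_fulkerson) returns the s-side A*.
<code python>
from collections import deque

def min_cut_side(res, s):
    """A* of 7.11: the set of nodes reachable from s in the residual graph."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v, cap in res[u].items():
            if cap > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen   # B* is every other node; the min-cut edges go from A* to B*
</code>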
+ | |||
+ | ====7.3 choosing good aug paths==== | ||
+ | >the upper bound can be very bad if each bottleneck is the same in graph as in res graph, and if that bottleneck were potentially one. then to get flow value of 200, for example, we'd have to do 200 augmentations | ||
+ | >Des a faster flow alg | ||
  Scaling Max-Flow
    initially set f(e) = 0 for all e in G
    set D to be the largest power of 2 no larger than the max capacity out of s
    while D >= 1
      while there's an augmenting path in Gf(D) (the residual graph restricted to edges with residual capacity at least D)
        let P be a simple s-t path in Gf(D)
        f' = aug(f,P)
        update f to be f' and update Gf(D)
      D = D/2
    return f
>ana the alg
-7.15: if the capacities are integer-valued, then throughout the SMFA the flow and the residual capacities remain integers; this implies that when D=1, Gf(D) is the same as Gf, and hence when the alg terminates the flow f is of maximum value
-7.16: the number of iterations of the outer while loop is at most 1 + log2(C)
-7.17: during the D-scaling phase, each augmentation increases the flow value by at least D
-7.18: let f be the flow at the end of the D-scaling phase. There is an s-t cut (A,B) in G for which c(A,B) <= v(f) + mD; consequently the max flow in the network is at most v(f) + mD
-7.19: the number of augmentations in a scaling phase is at most 2m
-7.20: the SMFA in a graph with m edges and integer capacities finds a max flow in at most 2m(1 + log2 C) augmentations and runs in at most O(m^2 log2 C) time
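Here's a hedged Python sketch of the capacity-scaling idea, with the same toy structure as the Ford-Fulkerson sketch in 7.1 (names are mine): the only change is that the path search ignores residual edges below the current scaling parameter delta.
<code python>
from collections import defaultdict

def scaling_max_flow(edges, s, t):
    """Same input convention as the ford_fulkerson sketch; returns the max flow value."""
    res = defaultdict(lambda: defaultdict(int))
    for u, v, c in edges:
        res[u][v] += c
        res[v][u] += 0

    def find_path(u, delta, visited):
        # DFS, but only over residual edges with capacity >= delta
        if u == t:
            return [u]
        visited.add(u)
        for v, cap in list(res[u].items()):
            if cap >= delta and v not in visited:
                rest = find_path(v, delta, visited)
                if rest is not None:
                    return [u] + rest
        return None

    max_out = max((c for u, _, c in edges if u == s), default=0)
    delta = 1
    while delta * 2 <= max_out:          # largest power of 2 <= max capacity out of s
        delta *= 2
    value = 0
    while delta >= 1:                    # the D-scaling phases
        while True:
            path = find_path(s, delta, set())
            if path is None:
                break
            b = min(res[u][v] for u, v in zip(path, path[1:]))
            for u, v in zip(path, path[1:]):
                res[u][v] -= b
                res[v][u] += b
            value += b
        delta //= 2
    return value
</code>
The last phase runs with delta = 1, which is plain Ford-Fulkerson, so the final flow is maximum (7.15).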
+ | |||
+ | ====7.5 a first app: the bipartite matching problem==== | ||
+ | >the prob: bipartite graph unidirected whose node set can be partitioned. a matching is a subset M such that each node apppears in at most one edge in M | ||
+ | >des the alg: direct all X to Y, then add node s and sx to each X then to and Y to t, then each edge a cap of 1 | ||
+ | >ana the alg | ||
+ | -7.34: M' contains k edges cuz in the cut set, first is cardinality of M' while second is 0 since nothing enters A, so M' contains k edges | ||
+ | -7.35: each X is the tail of at most one edge in M': if tail of 2, 2 units of flow leave from x, so 2 must have come into X but the cap is 1 so that can't be | ||
+ | -7.36: each Y is the head of at most one edge in M' | ||
+ | -7.37: the size of max match in G is equal to val of max flow in G' and the edges in such a matching G are the edges that carry flow from X to Y in G' | ||
+ | -bounding the running time | ||
+ | * 7.38: the FFA can be used to find a max matching in a bip graph in O(mn) time | ||
+ | * better bounds from 7.3 would not give better RT oddly enough because these bounds were for all instances of C where as before C had to be small enough | ||
+ | * aug paths are also called alternating paths in the context of finding max matching | ||
+ | -Extensions: | ||
+ | -what if the graph doesn' | ||
+ | -7.40: assume bip G with XY has |X|=|Y|. then G has either perfmat or there' | ||
+ | -proof 7.40: show val of maxflow in G' is n; by mfmct we know A' contains s from XY. assume by 7.11 A' | ||
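A hedged sketch of the reduction, reusing the ford_fulkerson toy from 7.1; "SRC" and "SNK" are made-up node names assumed not to collide with anything in X or Y, and the graph is assumed simple with X and Y disjoint.
<code python>
def max_bipartite_matching(X, Y, pairs):
    """pairs: iterable of (x, y) edges with x in X and y in Y. Returns (size, matching)."""
    edges = [("SRC", x, 1) for x in X]
    edges += [(x, y, 1) for x, y in pairs]
    edges += [(y, "SNK", 1) for y in Y]
    value, res = ford_fulkerson(edges, "SRC", "SNK")
    # an x->y edge ends up with residual capacity 0 exactly when it carries flow
    matching = [(x, y) for x, y in pairs if res[x][y] == 0]
    return value, matching
</code>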
+ | |||
+ | ====7.7 extensions to the max flow prob==== | ||
+ | >the prob: circulations with demands: consider fixed supply and fixed demand values and shift flow to meet those demands (like highway or railway of shipping routes). So now G has dv>0 and supply of -dv. set S to ann nodes with negative demand T to nodes with pos demands | ||
+ | 1) cap conditions: for each e in E we have 0< | ||
+ | 2) demand conditions: for each v we have v finv-foutv = dv | ||
+ | -feasibility prob: does circulation meeting 1 and 2 exist? | ||
+ | -7.49: if there exists feas circ with demands dv then sum of vs in dv = 0 | ||
+ | -proof of 7.49: dv must = finv - foutv so e is counted twice and cancel out. so dv = 0 | ||
+ | >des and ana an alg for circs: attach super source s* and super sink t* to each node in T, then add those to G. for each v in T add edge vt* with cap dv and each u in S add s*u with capacity -du then carry over rest of G to G' | ||
+ | -7.50: there' | ||
+ | -7.51: the graph G has fc with dv iff for all cuts AB, dv< | ||
+ | >the prob: circs with demands and lower bounds | ||
+ | -lower bounds are used to force/ | ||
+ | 1) cap conditions: each e in E, le< | ||
+ | 2) dem cond: each v in V finv-foutv = dv | ||
+ | >des and ana an alg with lower bounds | ||
+ | -consider f0inv - f0outv = leeinv - leeoutv then we have ce-le units to work with in remainder, so in G' the capacities of the edges will be ce-le and demand of node v dv-Lv | ||
+ | -7.52: there' | ||
+ | -proof of 7.52: finv-foutv = einv(le+f' | ||
+ | and conversely f'inv - f'outv = (fe-le)einv - (fe-le)eoutv = dv-Lv so G' stipulations are met | ||
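A hedged sketch of the 7.50 reduction, again leaning on the ford_fulkerson toy from 7.1; the node names "S*" and "T*" are mine and assumed not to collide with real nodes.
<code python>
def feasible_circulation(edges, demand):
    """edges: list of (u, v, capacity); demand: dict node -> d_v (positive = demand,
    negative = supply). Returns True iff a feasible circulation exists (7.50)."""
    if sum(demand.values()) != 0:            # 7.49: supplies must balance demands
        return False
    D = sum(d for d in demand.values() if d > 0)
    g_edges = list(edges)
    for v, d in demand.items():
        if d > 0:
            g_edges.append((v, "T*", d))     # super sink soaks up each demand
        elif d < 0:
            g_edges.append(("S*", v, -d))    # super source provides each supply
    value, _ = ford_fulkerson(g_edges, "S*", "T*")
    return value == D
</code>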
Interest: 8/10. Partly because now we're dealing with non-binary, multifaceted problems, and largely because it's week 11! Semester almost over. But not quite.
====CHAPTER 6: DYNAMIC PROGRAMMING====

====6.1 weighted interval scheduling: a rec procedure====
-des a rec alg
  * greedy algs no longer help once the values aren't all 1
  * n requests; request i has an interval with start and finish times and a value vi; p(j) is the largest index i < j whose interval is disjoint from request j's
  * we need to find the optimum by looking at smaller subproblems: request j is either in the optimal solution for requests 1..j or it isn't
  * 6.1: OPT(j) = max(vj + OPT(p(j)), OPT(j-1))
  * 6.2: req j belongs to an opt sol on the set 1..j iff vj + OPT(p(j)) >= OPT(j-1)
  ComputeOpt(j)
    if j=0 return 0
    else return max(vj + ComputeOpt(p(j)), ComputeOpt(j-1))
  * 6.3: ComputeOpt(j) correctly computes OPT(j) (duh)
  * Proof of 6.3, by induction using 6.1: ComputeOpt(j) = max(vj + ComputeOpt(p(j)), ComputeOpt(j-1)) = max(vj + OPT(p(j)), OPT(j-1)) = OPT(j)
-memoizing the recursion
  * we need to make all the redundant recursive calls go away because they make the plain recursion exponential and costly. so we memoize
  * we memoize by storing the value of each subproblem in an array M[0..n] the first time it's computed and just looking it up after that
  MCompOpt(j)
    if j=0 return 0
    else if M[j] is not empty return M[j]
    else set M[j] = max(vj + MCompOpt(p(j)), MCompOpt(j-1)) and return M[j]
-analyzing the memoized version
  * 6.4: the RT of MCompOpt(n) is O(n), assuming the input intervals are sorted by finish time
  * Proof: a single call is O(1) apart from its recursive calls, and each call that does real work fills in one new entry of M; M has only n+1 entries, so the total is O(n)
-computing a sol in addition to its value
  * we need a way to recover the actual solution; storing the full solution for every subproblem would be too costly, so instead we trace back through the array M of values
  findSol(j)
    if j=0 then output nothing
    else if vj + M[p(j)] >= M[j-1] then output j together with the result of findSol(p(j))
    else output the result of findSol(j-1)
  * 6.5: given the array M of opt values for the subproblems, findSol returns an opt sol in O(n) time
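An end-to-end hedged sketch of 6.1 in Python (my own helper names): jobs are (start, finish, value) tuples, p(j) is precomputed by binary search, OPT is memoized, and the traceback mirrors findSol.
<code python>
from bisect import bisect_right
from functools import lru_cache

def weighted_interval_scheduling(jobs):
    """jobs: list of (start, finish, value) with start < finish.
    Returns (optimal value, chosen jobs)."""
    jobs = sorted(jobs, key=lambda j: j[1])              # sort by finish time
    finishes = [f for _, f, _ in jobs]
    # p[j] (1-indexed): number of jobs finishing no later than job j starts
    p = [0] + [bisect_right(finishes, s) for s, _, _ in jobs]

    @lru_cache(maxsize=None)
    def opt(j):                                          # memoized OPT(j), recurrence 6.1
        if j == 0:
            return 0
        value_j = jobs[j - 1][2]
        return max(value_j + opt(p[j]), opt(j - 1))

    chosen, j = [], len(jobs)                            # traceback, as in findSol (6.2)
    while j > 0:
        value_j = jobs[j - 1][2]
        if value_j + opt(p[j]) >= opt(j - 1):
            chosen.append(jobs[j - 1])
            j = p[j]
        else:
            j -= 1
    return opt(len(jobs)), chosen[::-1]
</code>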
+ | |||
+ | ====6.2 principles of dynamic prog: memoization or iteration over subprobs==== | ||
+ | -des the alg | ||
+ | IterativeCompOpt | ||
+ | | ||
+ | for 1..n, M(j)=max(vj+M(pj), | ||
+ | -analyzing the alg: using 6.1 and 6.3 we know by induction that opt(j) is written in M(j), then we can use findSol and IterativeCompOpt to fill out M | ||
+ | -a basic outline of dynamic programming: | ||
+ | 1)there are only a polynomial number of subproblems | ||
+ | 2)the sol to the original problem can easily be computed from the solutions to the subprobs (the orig might even be 1 subprob) | ||
+ | 3) there is a natural ordering from smallest to largest, and easy to compute recurrence that allows one to determine the solution to a subprob from the sols to some number of smaller subprobs | ||
+ | There' | ||
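For contrast with the memoized sketch in 6.1, here's the same recurrence filled in bottom-up, in the spirit of IterativeCompOpt (my own function name; it assumes the same 1-indexed values and p arrays as before).
<code python>
def iterative_compute_opt(values, p):
    """values[1..n] and p[1..n] are 1-indexed lists with a dummy entry at index 0.
    Returns the table M with M[j] = OPT(j)."""
    n = len(values) - 1
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = max(values[j] + M[p[j]], M[j - 1])    # recurrence 6.1, bottom-up
    return M
</code>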
+ | |||
+ | ====6.3 segmented least squares: multiway choices==== | ||
+ | -we're no longer in binary mode (ooh wow) | ||
+ | -the problem | ||
+ | * we gotta find the best line to fit a data plot with the minimum margin of error | ||
+ | * line of min error formula on pg 262 | ||
+ | * we can use a set of lines because smaller lines can give smaller errors | ||
+ | * we don't want to hardcode lines in but rather use intuition to solve the segmented least squares problem | ||
+ | -formulating the problem. penalty of partition is | ||
+ | 1) the number of segments into which we partition P times a fixed given multiplier C>0 | ||
+ | 2) for each seg, error val of opt line through that seg | ||
+ | -designing the alg | ||
+ | * 6.6 if the last segment of the opt partition is pi..pn, then the val of opt sol is opt(n) = ei,n + C + opt(i-1) | ||
+ | * 6.7 for subprob on points pi..pj, opt(j) = min for 1< | ||
  segLeastSquares(n)
    array M[0..n]
    set M[0] = 0
    for all pairs i <= j
      compute the least squares error e(i,j) for the segment pi,...,pj
    for j = 1..n, use 6.7 to compute M[j]
    return M[n]

  findSegs(j)
    if j=0 output nothing
    else
      find the i that minimizes the 6.7 recurrence for j
      output the segment pi,...,pj and the result of findSegs(i-1)
-analyzing the alg: there are O(n^2) pairs (i,j) to fill in, and computing each e(i,j) from scratch is linear, so that's O(n^3); but we can work it down to O(n^2) by reusing the ingredients of the calculation of e(i,j-1) to find e(i,j) in constant time. So RT: O(n^2)
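A hedged Python sketch of the 6.7 recurrence (my own code). It recomputes every e(i,j) from scratch, so it's the O(n^3) version; the book's O(n^2) refinement would cache running sums instead. It assumes points with distinct x-coordinates.
<code python>
def segmented_least_squares(points, C):
    """points: list of (x, y); C: per-segment penalty. Returns the minimum total penalty."""
    pts = sorted(points)                      # sort by x coordinate
    n = len(pts)

    def seg_error(i, j):
        # squared error of the best-fit line through pts[i..j] (0-indexed, inclusive)
        seg = pts[i:j + 1]
        m = len(seg)
        sx = sum(x for x, _ in seg); sy = sum(y for _, y in seg)
        sxx = sum(x * x for x, _ in seg); sxy = sum(x * y for x, y in seg)
        denom = m * sxx - sx * sx
        a = (m * sxy - sx * sy) / denom if denom else 0.0
        b = (sy - a * sx) / m
        return sum((y - a * x - b) ** 2 for x, y in seg)

    # M[j] = cheapest penalty for the first j points (recurrence 6.7)
    M = [0.0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = min(seg_error(i - 1, j - 1) + C + M[i - 1] for i in range(1, j + 1))
    return M[n]
</code>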
+ | |||
+ | ====6.4: subset sums and knapsacks: adding a variable==== | ||
+ | -the problem | ||
+ | * we have a single machine with requests, and only a certain time period to them in. so now we have a cut off time and need to decide which jobs to do and which to ignore. | ||
+ | * knapsack prob: requests with value vi and weight wi, and we need to maximize the kn by doing the most valuable jobs. the weight is like the time we have to do the jobs in, or memory resources or whatever | ||
+ | * we can't do greedy. counterexample: | ||
+ | * we gotta do dynamic programming which is all about subproblems. | ||
+ | * Question: is the difference between greedy and dynamic that greedy does break things up, but it doesn' | ||
+ | -des the alg | ||
+ | * a false start: we can't just sort by i and take things in order because accepting req n does not mean we have to reject all other requests because now we have to deal with the concept of " | ||
+ | * a better sol: more subprobs, and also need to know how much work we can fit in for n-1 items with w-wi weight; if n not O, then opt(n, | ||
+ | * 6.8: if w<wi, then opt(i, | ||
  subsetSum(n, W)
    array M[0..n, 0..W]
    initialize M[0,w] = 0 for each w = 0..W
    for i = 1..n
      for w = 0..W
        use 6.8 to compute M[i,w]
    return M[n,W]
-analyzing the alg
  * we can picture a 2-D table of subproblem values being built up row by row: having M[i-1,w] and M[i-1,w-wi] already computed is what lets us fill in M[i,w]
  * 6.9: the subsetSum(n,W) alg correctly computes the opt value of the problem and runs in O(nW) time
  * 6.10: given the table M of opt values of the subprobs, the opt set S can be found in O(n) time
-extension: the kn prob: now each item has a nonnegative weight wi and a separate value vi, and we want to maximize the total value subject to the total weight staying at most W (so the bound is on weight, not on the number of items)
1) if n is not in O, OPT(n,W) = OPT(n-1,W)
2) if n is in O, OPT(n,W) = vn + OPT(n-1, W-wn)
  * 6.11: if w < wi, then OPT(i,w) = OPT(i-1,w); otherwise OPT(i,w) = max(OPT(i-1,w), vi + OPT(i-1, w-wi))
  * 6.12: the kn prob can be solved in O(nW) time
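A hedged sketch of the 6.11 recurrence (my own code; subset sum is the special case where each item's value equals its weight).
<code python>
def knapsack(items, W):
    """items: list of (weight, value) with nonnegative integer weights. Returns max value."""
    n = len(items)
    M = [[0] * (W + 1) for _ in range(n + 1)]        # M[i][w] = OPT(i, w)
    for i in range(1, n + 1):
        wi, vi = items[i - 1]
        for w in range(W + 1):
            if w < wi:
                M[i][w] = M[i - 1][w]                # item i can't fit in weight w
            else:
                M[i][w] = max(M[i - 1][w], vi + M[i - 1][w - wi])
    return M[n][W]
</code>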
+ | |||
+ | ===== Week 8 Reading ===== | ||
+ | Interest: 9/10. You're right, the finding closest pair of a sorted list in linear time is cool. | ||
+ | ====5.2 further rec rels==== | ||
+ | >DAC using recursion on q subprobs of n/2 size each | ||
+ | >5.3: for some c, Tn <= qTn/2 +cn when n>2 and T2 <= c | ||
+ | >The case of q>2 subprobs | ||
+ | * analyze first few levels for prob of size n; usually total work per level increases as recu does | ||
+ | * identify a pattern to gen total work/level | ||
+ | * sum over all levels of recursion | ||
+ | * 5.4: any function T satisfying 5.3 with q>2 is bounded by O(n^log2 of q) | ||
+ | * applying partial subst: how to choose the right k and d to lower RT. new formula: kn^d - (ql/2 - c)n | ||
>the case of one subprob
  * 5.5: any function T satisfying 5.3 with q=1 is bounded by O(n)
  * the effect of the parameter q: q=2 is a knife edge, because there the amount of work per level is exactly the same at every level... n log n
>a related recurrence: T(n) <= 2T(n/2) + n^2
  * DAC template: divide the input into 2 equal pieces, solve them separately by recursion, then combine the results into an overall sol, spending quadratic time for the initial division and final recombining
  * 5.6: for some constant c, T(n) <= 2T(n/2) + cn^2 when n > 2, and T(2) <= c
  * analyzing the first few levels: the work per level decreases, as opposed to the example in 5.1
  * identifying a pattern: the total work per level is cn^2/2^j, where j is the level
  * summing over all levels of recursion: sum over j = 0..log2(n)-1 of cn^2/2^j <= cn^2 * (sum of (1/2)^j) <= 2cn^2 = O(n^2)
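Not from the book, just my own quick numerical sanity check of that last sum: unroll T(n) = 2T(n/2) + cn^2 exactly (with c = 1) and compare against the 2cn^2 bound.
<code python>
def T(n, c=1):
    """Exact value of the recurrence T(n) = 2T(n/2) + c*n^2 with T(2) = c (n a power of 2)."""
    if n <= 2:
        return c
    return 2 * T(n // 2, c) + c * n * n

for n in [4, 16, 256, 4096]:
    print(n, T(n), 2 * n * n)   # T(n) stays below the 2*c*n^2 bound from 5.2
</code>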
+ | |||
+ | ====5.3 counting inversions==== | ||
+ | >the prob: rankings, collaborative filtering, having to count how many inversions your pref list has with one person/many other people' | ||
+ | >des and ana the alg | ||
+ | * we can do better than taking each number and comparing all the others (n^2). we can do nlogn | ||
+ | * sort, then split into 2 halves, rec sort each half, then combine | ||
+ | * Merge-and-Count(A, | ||
+ | maintain Current pointer init to first el | ||
+ | maintain Count for inv, init to 0 | ||
+ | while both A and B not empty | ||
+ | | ||
+ | if bj is smaller, increment Count by num of el in A left | ||
+ | | ||
+ | append remainder of other list to output | ||
+ | return count and merged list | ||
+ | |||
+ | * Sort-and-Count(L) RT n | ||
+ | if list has one el, no inv | ||
+ | else | ||
+ | div list into 2 halves w/ a first n/2 and b second n/2 els | ||
+ | (ra, A) = sort and count(a) | ||
+ | (rb, B) = sort and count(b) | ||
+ | (r,L) = merge and count(a,b) | ||
+ | return r=ra + rb + r and sorted list L | ||
+ | * 5.7: the sort and count alg correctly sorts the input list and counts the num of inv; it runs nlogn for a n-el list | ||
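A hedged Python sketch of Sort-and-Count / Merge-and-Count (my own variable names):
<code python>
def sort_and_count(lst):
    """Returns (number of inversions, sorted copy of lst)."""
    if len(lst) <= 1:
        return 0, list(lst)
    mid = len(lst) // 2
    ra, A = sort_and_count(lst[:mid])
    rb, B = sort_and_count(lst[mid:])
    # merge-and-count: whenever an element of B wins, it is inverted with
    # everything still left in A
    merged, count, i, j = [], 0, 0, 0
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            merged.append(A[i]); i += 1
        else:
            merged.append(B[j]); j += 1
            count += len(A) - i
    merged.extend(A[i:]); merged.extend(B[j:])
    return ra + rb + count, merged
</code>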
+ | |||
+ | ====5.4 finding the closest pair of points==== | ||
+ | >the prob: n points in plane, find pair that is closest; part of computational geometry (graphics, comp vision, geographic info sys, molecular modeling). we gotta do better than n^2. in this chap we'll do nlogn time, but in chap 13, there' | ||
+ | >des the alg: set of points P, pi has coords (xi,yi) and d(pi,pj) is the distance b/t the two pts | ||
+ | * DAC style is to find closes pair among left half then right half then get overall sol in linear; this can be nlogn with the combining part being an additional n | ||
+ | * setting up the recu: sort all pts by x and then by y coord; the pos of each pt is kept by the lists they' | ||
+ | * 5.8: if there exists q and r for which d(q, | ||
+ | * proof: x*-qx <= rx-qx <= d(q,r) < de. rx-x* <= rx-qx <= d(q,r) < d. so each q and r has an x-coord within de of x* and hence lies within de of L | ||
+ | * 5.9: if s and s' have prob that d(s, | ||
+ | * proof: the boxes thing showed in class on Mar10. there can only be one pt per box because that distance de is the min for each side. then there are 15 adjacent boxes on the other side of L that must be checked for s and s' to be compared | ||
  * Closest-Pair(P) alg
    construct Px and Py (n log n time)
    (p*0, p*1) = Closest-Pair-Rec(Px, Py)

  Closest-Pair-Rec(Px, Py)
    if |P| <= 3, then find the closest pair by measuring all pairwise distances
    construct Qx, Qy, Rx, Ry (O(n) time)
    (q*0, q*1) = Closest-Pair-Rec(Qx, Qy)
    (r*0, r*1) = Closest-Pair-Rec(Rx, Ry)

    de = min of the two distances above
    x* = max x coord of a point in Q
    L = {(x,y): x = x*}
    S = points in P within de of L
    construct Sy (O(n) time)

    for each s in Sy, compute the dist from s to each of the next 15 points in Sy; let s, s' be the pair achieving the min of these distances (O(n) time)

    if d(s,s') < de, return (s,s')
    else if d(q*0,q*1) < d(r*0,r*1) then return (q*0,q*1)
    else return (r*0,r*1)
  * 5.11: the alg correctly outputs a closest pair of points in P
  * proof: by induction, using 5.10 and 5.9: the recursion finds the closest pair within Q and within R; if the overall closest pair has both elements in Q or both in R, it's found by the recursion, and if it has one in each, the strip-scanning remainder of the alg finds it
  * 5.12: the RT is O(n log n)
  * proof: the initial sorting is n log n, and the remainder satisfies recurrence 5.1, which is n log n
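A hedged Python sketch of the whole thing (my own code, assuming at least two distinct points given as (x, y) tuples); it follows the structure above, including the 15-neighbor scan from 5.9.
<code python>
from math import dist

def closest_pair(points):
    """Returns (distance, (p, q)) for the closest pair among distinct (x, y) tuples."""
    Px = sorted(points)                              # sorted by x (then y)
    Py = sorted(points, key=lambda p: p[1])          # sorted by y

    def rec(Px, Py):
        if len(Px) <= 3:                             # base case: brute force
            return min((dist(a, b), (a, b))
                       for i, a in enumerate(Px) for b in Px[i + 1:])
        mid = len(Px) // 2
        Qx, Rx = Px[:mid], Px[mid:]                  # left and right halves
        x_star = Qx[-1][0]                           # max x coordinate in Q
        Qset = set(Qx)
        Qy = [p for p in Py if p in Qset]
        Ry = [p for p in Py if p not in Qset]
        best = min(rec(Qx, Qy), rec(Rx, Ry))
        delta = best[0]
        Sy = [p for p in Py if abs(p[0] - x_star) < delta]   # the strip around line L
        for i, s in enumerate(Sy):
            for s2 in Sy[i + 1:i + 16]:              # only the next 15 neighbors matter (5.9)
                best = min(best, (dist(s, s2), (s, s2)))
        return best

    return rec(Px, Py)
</code>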
===== Week 7's Reading Summary: =====
Interest: 8/10, because this is only the 2nd new type of alg we've come across in 7 weeks of the class
====4.7 Clustering====
>the problem
  * adaptation can be a problem if frequencies of letters change depending on their location in the total encoding; this can be solved by placing, halfway into the sequence, an instruction that says the encoding changed
  * investing a small amount of space can save big, as it reduces the average bits per letter in the long run (adaptive compression schemes)
====(CHAPTER 5: DIVIDE AND CONQUER) 5.1 a first recurrence: the mergesort alg====
  * template for mergesort and other common divide and conquer algs: divide the input into 2 equal pieces, solve the 2 subproblems by recursion, then combine, spending only linear time for the initial division and final recombining
  * 5.1 recurrence relation: for some constant c, T(n) <= 2T(n/2) + cn when n > 2, and T(2) <= c; this specifies T(n) in terms of its values on smaller inputs, and we now need to get T alone on the left-hand side
  * designing efficient DAC algs is heavily intertwined with understanding how a recurrence relation determines a RT
>unrolling the mergesort recurrence: at level j of the recursion there are 2^j subproblems of size n/2^j, each costing at most cn/2^j, so every level does at most cn total work
>summing over all log2(n) levels gives O(n log n)
>subbing a solution into the MR: guess T(n) <= cn log2 n; then T(n) <= 2c(n/2)log2(n/2) + cn = cn(log2 n - 1) + cn = cn log2 n, so the guess checks out
>an approach using partial substitution (weaker of the two): guess only the form T(n) <= kn log2 n without knowing k, substitute into the recurrence, and solve for the constant (any k >= c works)
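For reference, a minimal mergesort in Python (my own) matching the template: two half-size recursive calls plus a linear-time merge, i.e. exactly the T(n) <= 2T(n/2) + cn pattern.
<code python>
def mergesort(lst):
    if len(lst) <= 1:
        return list(lst)
    mid = len(lst) // 2
    left, right = mergesort(lst[:mid]), mergesort(lst[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):          # linear-time combine step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
</code>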
===== Week 6's Reading Summary: =====
Interest: 7/10
==== 4.2 Sch to Min Lateness: An Exchange Argument ====
>The problem: one resource to meet n requests for an interval of time. Each req has a deadline d. Lateness means the deadline was missed. We want to minimize the max lateness of a request