courses:cs211:winter2018:journals:patelk:chapter5 (revision 2018/03/10 20:28 by patelk)

===== 5.1 A First Recurrence: The Mergesort Algorithm =====
| - | ===== 5.2 Further | + | * Mergesort: sorts a given list of numbers by first dividing them into two equal halves, sorting each half separately by recursion, and then combining the results of these recursive calls using the linear time algorithm for merging sorted lists |
| + | * Base Case: when input has been reduced to size 2, T(n_ is equal to a constant when n is a constant. | ||
| + | * Stop the recursion and sort the two element | ||
| + | * For some constant c, T(n) <= 2T(n/2) + cn when n >2, and T(2) <= c | ||
| + | * To gain an explicit bound, we need to solve the recurrence relation so that T appears only on the left-hand side of the inequality, not the right-hand side as well. | ||
| + | |||
__Approaches to Solving Recurrences__
  - "Unroll" the recursion: analyze the first few levels, identify a pattern, and then sum the work over all levels
  - Start with a guess, substitute it into the recurrence relation, and check that it works; use an argument by induction on n to formally justify the guess

__Unrolling the Mergesort Recurrence__
  * Analyzing the first few levels:
    * level 0: a single problem of size n, which takes at most cn time plus the time spent in all subsequent recursive calls
    * level 1: two problems, each of size n/2; each takes at most cn/2 time, for a total of at most cn
    * level 2: four problems, each of size n/4; each takes at most cn/4 time, for a total of at most cn
  * Identifying a pattern: at level j, the number of subproblems has doubled j times, so there are 2^j of them; each has shrunk in size by a factor of two j times, so each has size n/2^j; level j therefore contributes at most 2^j(cn/2^j) = cn to the total running time
  * Summing over all levels of recursion: the recurrence in (5.1) has the property that the same upper bound of cn applies to the total amount of work performed at each level. There are log n levels, so the total running time is O(n log n)

__Substituting a Solution into the Mergesort Recurrence__
  * Guess: T(n) <= cn log2 n for all n >= 2; we want to check whether this is true
  * Plug it into the recurrence:
    * Base case n = 2: true, since cn log2 n = 2c >= c >= T(2)
    * By induction: assume T(m) <= cm log2 m for all values of m less than n, and establish the bound for T(n):
    * T(n) <= 2T(n/2) + cn <= 2c(n/2) log2(n/2) + cn = cn(log2 n - 1) + cn = cn log2 n

__An Approach Using Partial Substitution__
  * Guess the overall form of the solution without pinning down the exact values of all the constants and other parameters at the outset
  * Suppose we believe that T(n) = O(n log n)
  * First write T(n) <= kn log_b n for some constant k and base b to be determined
  * Try out one level of the induction:
    * T(n) <= 2T(n/2) + cn <= 2k(n/2) log_b(n/2) + cn
  * Choosing 2 as the base b simplifies this to:
    * T(n) <= kn log2 n - kn + cn
  * If k is at least as large as c, then:
    * T(n) <= kn log2 n - kn + cn <= kn log2 n

| + | ==== Personal Thoughts ==== | ||
| + | |||
| + | Using mergesort as the example was helpful, as I am pretty familiar with that algorithm at this point. We went over it in class which helped me follow along with the step-by-step process of coming up with an appropriate recurrence relation. Still though, this material is a little difficult for me and I know I'll need more practice with its application before I really understand it. | ||
| + | |||
| + | Readability: | ||
| + | Interesting: | ||
| + | |||
| + | |||
| + | |||
| + | ===== 5.2 Further | ||
| + | |||
| + | * Divide-and-conquer algorithms that create recursive calls on q subproblems of size n/2 each and then combine in O(n) time | ||
| + | * For some constant c, T(n) <= qT(n/2) + cn when n>2, and T(2)<= c. | ||
| + | |||
**The Case of q > 2 Subproblems**
  * Unrolling the recurrence in the case q > 2 (take q = 3 as an example):
    * Analyzing the first few levels: at the first level of recursion, a single problem of size n takes at most cn time plus the time spent in subsequent recursive calls; at the next level, there are q problems, each of size n/2, and each takes at most cn/2 time, for a total of at most (q/2)cn, again plus the time in subsequent recursive calls...
    * Identifying a pattern: at level j, there are q^j distinct instances, each of size n/2^j, so level j contributes at most q^j(cn/2^j) = (q/2)^j * cn
    * Summing over all levels of recursion: there are log2 n levels, giving a geometric sum. Any function T(.) satisfying "for some constant c, T(n) <= qT(n/2) + cn when n > 2, and T(2) <= c" with q > 2 is bounded by O(n^(log2 q))
  * Applying partial substitution:
    * Guess that the solution for q > 2 has the form T(n) <= kn^d for some constants k > 0 and d > 1
    * T(n) <= qT(n/2) + cn <= qk(n/2)^d + cn = (q/2^d)kn^d + cn
    * Choose d so that q/2^d = 1, i.e. d = log2 q
    * To get rid of the cn term, change the form of our guess for T(n) so as to explicitly subtract it off
    * Handle the base case and choose k: choose k large enough that the formula is a valid upper bound for the case n = 2

**The Case of One Subproblem**
  * Consider the case q = 1
  * Unrolling the recurrence:
    * Analyzing the first few levels: first, a single problem of size n that takes at most cn time; next, one problem of size n/2, which contributes cn/2; next, one problem of size n/4, which contributes cn/4
    * Identifying a pattern: at level j, there is one problem of size n/2^j, which contributes cn/2^j to the running time
    * Summing over all levels of recursion: there are log2 n levels, and the per-level work forms a converging geometric series, so the total amount of work performed is T(n) <= 2cn = O(n)
  * Any function T(.) satisfying "for some constant c, T(n) <= qT(n/2) + cn when n > 2, and T(2) <= c" with q = 1 is bounded by O(n)
  * Geometric series with a decaying exponent: fully half the work performed by the algorithm is done at the top level of the recursion
  * The effect of the parameter q: when q = 1, the resulting running time is linear; when q = 2, it's O(n log n); when q > 2, it's a polynomial bound with an exponent larger than 1 that grows with q
  * When q = 1, the total running time is dominated by the top level, whereas when q > 2, it's dominated by the work done on constant-size subproblems at the bottom of the recursion

**A Related Recurrence: T(n) <= 2T(n/2) + O(n^2)**
  * Divide the input into two pieces of equal size; solve the two subproblems on these pieces separately by recursion; then combine the two results into an overall solution, spending quadratic time on the initial division and final recombining
  * For some constant c, T(n) <= 2T(n/2) + cn^2 when n > 2, and T(2) <= c
  * A first reaction is to guess that the solution will be T(n) = O(n^2 log n)
    * true, but we can also show a stronger upper bound
  * Unrolling:
    * Analyzing the first few levels: first, a single problem of size n that takes at most cn^2 time plus the time spent in all subsequent recursive calls; next, two problems, each of size n/2, each taking at most c(n/2)^2 = cn^2/4 time, for a total of at most cn^2/2
    * Identifying a pattern: at level j, there are 2^j subproblems, each of size n/2^j, contributing a total of 2^j * c(n/2^j)^2 = cn^2/2^j
    * Summing over all levels of recursion: we arrive at almost the same converging geometric sum that we had for q = 1 in the previous recurrence, so T(n) = O(n^2)
  * The initial guess overestimated because of how quickly n^2 decreases as n is halved at each level

| + | ==== Personal Thoughts ==== | ||
| + | |||
| + | This section took a while to digest as there are mathematical equations at each step. I think I will have to refer back to this section and the examples until I get the hang of it, but for the most part the section laid out the recurrence relations in a very step-by-step manner that made it easier to follow. | ||
| + | |||
| + | Readability: | ||
| + | Interesting: | ||
| + | |||
| + | |||
| ===== 5.3 Counting Inversions ===== | ===== 5.3 Counting Inversions ===== | ||
| + | |||
**The Problem**
  * Rankings: becoming important to a number of current applications
    * Collaborative filtering: match your preferences with those of other people out on the Internet
      * can then recommend new things that those other people have liked
    * Meta-search tools: execute the same query on many different search engines and then try to synthesize the results by looking for similarities and differences among the various rankings that the search engines return
  * Comparing two rankings:
    * Natural method: label the movies from 1 to n according to your ranking, then order these labels according to the stranger's ranking and see how far this ordering is from ascending order
    * Sequence of n numbers a1, ..., an; assume the numbers are distinct; define a measure that tells us how far this list is from being in ascending order
    * Count the number of inversions: pairs of indices i < j with ai > aj. If every pair forms an inversion, there are (n choose 2) inversions

**Designing and Analyzing the Algorithm**
  * Brute force: look at every pair of numbers (ai, aj) and determine whether they constitute an inversion -> O(n^2)
  * Desired time: O(n log n)
  * Set m = [n/2] and divide the list into two pieces a1, ..., am and am+1, ..., an
  * Count the number of inversions in each of these two halves separately
  * Then count the number of inversions where the two numbers belong to different halves, i.e. the pairs (ai, aj) where ai is in the first half, aj is in the second half, and ai > aj; this must be done in O(n) time
  * Have the recursive calls also sort the numbers in the two halves
  * Merge-and-Count: given the two sorted halves A and B,
    * produce a single sorted list C from their union, while counting the number of pairs (a, b) with a in A, b in B, and a > b; each such pair is an inversion
    * walk through the sorted lists A and B, removing elements from the front and appending them to the sorted list C; at each step we have a Current pointer into each list, showing our current position
    * every time an element of A is appended to C, no new inversions are encountered, since it is smaller than everything remaining in B; but when an element of B is appended to C, it is smaller than all the remaining elements of A, so we increase the count of inversions by the number of elements remaining in A

| + | {{: | ||
| + | |||
| + | {{: | ||
| + | |||
| + | {{: | ||
| + | |||
  * The Sort-and-Count algorithm correctly sorts the input list and counts the number of inversions in O(n log n) time for a list with n elements

| + | ==== Personal Thoughts ==== | ||
| + | |||
| + | I thought this section provided a very interesting algorithm to count the number of inversions in O(nlogn) time. The combination of this section and the classroom discussion did a pretty good job of helping me understand this material. It also helped that we were introduced to inversions in another chapter/ | ||
| + | |||
| + | Readability: | ||
| + | Interesting: | ||
| + | |||
| + | |||
