Cory's Journal
Preface Summary
Algorithmic ideas are useful within and far beyond computer science. Not only do they have many applications; algorithms are also a great way to see the over-arching goals of the field of computer science in general. Dealing with algorithms involves two main components: getting to the mathematical core of the problem, and then identifying the appropriate algorithm design techniques. At their most effective, algorithmic ideas don't just provide solutions to well-posed problems; they allow us to express the underlying questions effectively. This section was a very simple introduction to the ideas we will be learning throughout the course. I do not have any questions on the introduction, but I do believe that I am beginning to get a feel for how important this course is to understanding major concepts of computer science (and also for being able to effectively tackle technical interviews). We are also supposed to rank each section that we summarize on a scale of 1 to 10, 10 being most interesting and readable. This section was roughly (3/10). In the future, I will just put this number at the end of each summary with the understanding that I am ranking the section on this type of scale.
1.1 Summary
The Stable Matching Problem was the result of two mathematical economists, Gale and Shapley, trying to find a college admissions or job recruiting process that was self-enforcing. In a situation where matches repeatedly change based on individuals' preferences, a lot of chaos may be generated, and both sides of the matches can end up unhappy with the process and the outcome. The Stable Matching Problem is a method to reduce this chaos. To understand the essence of the problem, we eliminate real-life complications and just look at a more bare-bones version: Consider a set M = {m1, …, mn} of n men and a set W = {w1, …, wn} of n women. M x W denotes the set of all possible ordered pairs of the form (m, w) where m is a man in the set M and w is a woman in the set W. A matching S is a set of ordered pairs from M x W where each member of M and each member of W appears in at most one pair in S. A perfect matching S' is a matching with the property that each member of M and each member of W appears in exactly one pair in S'. Adding preferences means that each m in M ranks all the women; m prefers w to w' if m ranks w higher than w' on his preference list. Likewise, each woman ranks all the men. An instability with respect to S is a pair (m, w) that is not in S but where m prefers w to his current partner and w prefers m to her current partner. Our goal is a perfect matching with no instabilities: S is stable if it is perfect and there is no instability with respect to S. It is possible for an instance to have more than one stable matching. The algorithm (sketched below) works as follows: initially, every m in M and every w in W is free. While there is a man m who is free and hasn't proposed to every woman, choose such a free man m and have him propose to the highest-ranked woman w on his preference list to whom he has not yet proposed. If w is free, (m, w) become engaged. Otherwise, w is currently engaged to some m'. If w prefers m' to m, then m remains free and keeps proposing; otherwise, w prefers m to m', so (m, w) become engaged and m' becomes free. This process repeats until no man is free, at which point the engaged pairs form a stable matching. (7/10) My question for section 1.1: How useful would this solution be in a real-life situation with multiple real-life complications?
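Below is a minimal Python sketch of the proposal algorithm described above, assuming complete preference lists of equal length on both sides, given as dictionaries mapping each person to an ordered list of names. The function and variable names (stable_match, men_prefs, women_prefs, and so on) are my own illustrative choices, not from the book.

<code python>
# A minimal sketch of the proposal algorithm, assuming complete preference
# lists of equal length for both sides. All names here (stable_match,
# men_prefs, women_prefs, ...) are illustrative, not from the book.

def stable_match(men_prefs, women_prefs):
    # Rank tables so a woman can compare two suitors in constant time.
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}

    free_men = list(men_prefs)               # initially, every man is free
    next_choice = {m: 0 for m in men_prefs}  # next woman each man will propose to
    engaged_to = {}                          # woman -> man she is engaged to

    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]     # highest-ranked woman not yet proposed to
        next_choice[m] += 1

        if w not in engaged_to:              # w is free: (m, w) become engaged
            engaged_to[w] = m
        elif rank[w][m] < rank[w][engaged_to[w]]:   # w prefers m to her fiance
            free_men.append(engaged_to[w])          # her old fiance becomes free
            engaged_to[w] = m
        else:                                # w prefers her fiance: m stays free
            free_men.append(m)

    return {m: w for w, m in engaged_to.items()}

men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women = {"w1": ["m2", "m1"], "w2": ["m2", "m1"]}
print(stable_match(men, women))   # {'m2': 'w1', 'm1': 'w2'}
</code>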
2.1 Summary
A major focus of the book is finding efficient algorithms for computational problems, which seems to encompass all of computer science. To approach this specifically, we first identify broad themes and design principles in the development of algorithms. A property shared by many of the problems we study is that they are discrete, meaning that they involve a search over many combinatorial possibilities with the goal of efficiently finding a solution that satisfies clearly delineated conditions. Algorithms must be efficient; in particular, the space or memory used by an algorithm is a big issue. The first proposed definition of efficiency is that an algorithm is efficient if, when implemented, it runs quickly on real input instances. This definition omits where and how well we implement an algorithm, and it does not account for how an algorithm may scale as the problem size grows. Analyzing the worst case of an algorithm has often been found to do a reasonable job of capturing its efficiency in practice, and if the worst case can be shown to be efficient, the algorithm is efficient. The search space of the Stable Matching Problem, for example, is very large, and the brute-force algorithm would plow through all n! perfect matchings, which makes brute force a very slow, inefficient solution. So another proposed definition of efficiency is that “an algorithm is efficient if it achieves qualitatively better worst-case performance, at an analytical level, than brute-force search.” An algorithm has a polynomial running time if there are constants c > 0 and d > 0 so that on every input instance of size N, its running time is bounded by cN^d primitive computational steps. The final proposed definition of efficiency is that an algorithm is efficient if it has a polynomial running time. This definition is also good because it is negatable: it lets us express the notion that there is no efficient algorithm for a particular problem. (6/10) This leads to my question: Just how efficient is the Gale-Shapley solution to the Stable Matching Problem?
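To make the contrast concrete: there are n! perfect matchings of n men with n women, and n! outgrows any polynomial. For n = 20, n! is already about 2.4 × 10^18, while an algorithm taking n^2 steps would take only 400.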
2.2 Summary
An algorithm's worst-case running time on size n inputs grows at most at a rate proportional to some function f(n). Algorithms can be expressed in pseudo-code that resembles a high-level programming language. We want to express the growth rate of running times and other functions so that constant factors and low-order terms are not counted; for example, with a running time of 1.62n^2 + 3.5n + 8, we'd say that it grows like n^2 up to constant factors.
Asymptotic Upper Bounds
Let T(n) be a function, say the worst-case running time of an algorithm on inputs of size n. Given another function f(n), we say T(n) is O(f(n)) if T(n) is bounded above by a constant multiple of f(n), written T(n) = O(f(n)). More precisely, T(n) = O(f(n)) if there exist constants c > 0 and n0 ≥ 0 so that for all n ≥ n0, we have T(n) ≤ c*f(n). In this case, T is asymptotically upper-bounded by f. A function can have many upper bounds.
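As a worked check, using the running time from the start of this section: for T(n) = 1.62n^2 + 3.5n + 8 and n ≥ 1, we have 3.5n ≤ 3.5n^2 and 8 ≤ 8n^2, so T(n) ≤ 13.12n^2. Taking c = 13.12 and n0 = 1 shows that T(n) = O(n^2).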
Asymptotic Lower Bounds
To show that the upper bound is the best possible running time, we show that for arbitrarily large input sizes n, T(n) is at least a constant multiple of some specific function f(n). Thus, T(n) = Ω(f(n)) if there exist constants ε > 0 and n0 ≥ 0 so that for all n ≥ n0, we have T(n) ≥ ε*f(n). Here, T is asymptotically lower-bounded by f. ε must be fixed, independent of n.
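Continuing the same example: T(n) = 1.62n^2 + 3.5n + 8 ≥ 1.62n^2 for all n ≥ 0, so taking ε = 1.62 and n0 = 0 shows that T(n) = Ω(n^2).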
Asymptotically Tight Bounds
If we can show that T(n) is both O(f(n)) and Ω(f(n)), then we've found the “right” bound: T(n) grows exactly like f(n) to within a constant factor. The notation to express this is as follows: if T(n) is O(f(n)) and Ω(f(n)), then T(n) is Θ(f(n)), and we say f(n) is an asymptotically tight bound for T(n). These are nice bounds to find for worst-case running times because they characterize the worst-case performance of algorithms precisely up to constant factors. One can obtain such bounds by closing the gap between the upper and lower bounds.
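Putting the two previous checks together: 1.62n^2 + 3.5n + 8 is both O(n^2) and Ω(n^2), so it is Θ(n^2), and n^2 is an asymptotically tight bound for it.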
Some properties that these asymptotic growth rates have are transitivity (if f = O(g) and g = O(h), then f = O(h)) and good behavior under sums (if f = O(h) and g = O(h), then f + g = O(h)). Some functions that come up repeatedly in the analysis of algorithms are polynomials, logarithms, and exponentials.
Though I do not have a direct question for this section, I am still having difficulty fully grasping the concept of these bounds: how they are found, and what exactly we use them for. I will continue to study them in hopes of fully understanding them as soon as possible. (8/10)
2.3 Summary
For each algorithm, the choice of data structure is up to the algorithm's designer. An array has these properties:
- We can find the ith element in the array in O(1) (constant) time.
- Looking for an element e in an unsorted array takes O(n) (linear) time.
- If the elements in the array are sorted, we can find an element e in O(log n) time with binary search (see the sketch below).
Arrays are less well suited for maintaining a dynamic list of elements that are frequently added to or removed from. Linked lists are an alternative, where each element points to the next in the list. Each pointer must be maintained, and the list can be traversed in time proportional to its length. A doubly linked list can also be used, with elements pointing to both the previous and the next element. Linked lists are not as useful for finding the ith element, because this takes O(i) time. Arrays and linked lists can be used to implement the Stable Matching algorithm; with these, the G-S algorithm can be implemented in O(n^2) time. Question: Is there an even faster way? Interest: 8/10
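Here is a small Python sketch of the O(log n) lookup mentioned above: binary search in a sorted array (a Python list standing in for the array). The function name binary_search is just an illustrative choice.

<code python>
# Binary search in a sorted Python list, standing in for the sorted array.
# Returns the index of e, or -1 if e is not present.

def binary_search(arr, e):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == e:
            return mid
        elif arr[mid] < e:
            lo = mid + 1          # e can only be in the right half
        else:
            hi = mid - 1          # e can only be in the left half
    return -1

print(binary_search([2, 3, 5, 7, 11, 13], 7))   # 3
print(binary_search([2, 3, 5, 7, 11, 13], 4))   # -1
</code>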
2.4 Summary
Linear Time: Algorithms that run in O(n) time have running times at most a constant factor times the size of the input; this is generally achieved by processing the input in a single pass. O(n log n) is another common running time: it is the running time of any algorithm that splits its input into two equal-sized pieces, solves each piece recursively, and then combines the solutions in linear time (see the sketch below). Another common running time is quadratic time, or O(n^2), which commonly arises from performing a search over all pairs of input items and spending constant time per pair, or from a pair of nested loops. Cubic, O(n^3), time appears with more elaborate sets of nested loops; it is unclear whether the algorithms that lead to it are practical on inputs of reasonable size. Finally, O(n^k) running time arises when, for a constant k, we search over all subsets of size k. Question: Are there other running times between these that just aren't as common? Interest: 9/10
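Here is a short Python sketch of the O(n log n) pattern described above, using mergesort as the standard example: split the input in half, sort each half recursively, and merge the two sorted halves in linear time. The name merge_sort and the list-based input are my own choices, not the book's.

<code python>
# Mergesort: split the input in half, sort each half recursively, and merge
# the two sorted halves in linear time, giving O(n log n) overall.

def merge_sort(items):
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])      # solve each half recursively
    right = merge_sort(items[mid:])

    merged, i, j = [], 0, 0             # merge the sorted halves in linear time
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]
</code>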
2.5 Summary
Priority queues are useful when describing how to implement graph algorithms. A priority queue is designed for applications where elements have a priority value, or key, and each time an element is selected from the set, we want to take the one with highest priority. It is a data structure that maintains a set of elements where each element has an associated key that denotes the element's priority. A heap is used to implement a priority queue; the heap data structure combines the benefits of a sorted array and a list. We think of a heap as a balanced binary tree: the tree has a root, and each node can have up to 2 children. The keys in the binary tree are in heap order if the key of any element is at least as large as the key of its parent. Heapify-up is used to “fix” the tree when an element's key is too small for its position (smaller than its parent's key). Heapify-down is used when an element's key is too big, so that it violates heap order with one of its children. Together, the Heapify-up and Heapify-down operations can efficiently implement a priority queue (a sketch follows below). Question: What are priority queues good for compared to arrays or linked lists? Interest: 8/10
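Here is a rough Python sketch of the heap operations described above, on a zero-indexed list kept in heap order (every element's key is at least as large as its parent's, so the smallest key, i.e. the highest priority, sits at the root). The names heapify_up, heapify_down, insert, and extract_min are descriptive labels of mine, not the book's exact pseudocode.

<code python>
# A priority queue as a heap stored in a zero-indexed list, kept in heap
# order: each element's key is at least as large as its parent's, so the
# smallest key (highest priority) is at position 0.

def heapify_up(heap, i):
    # Move element i up while it is smaller than its parent.
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] < heap[parent]:
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        else:
            break

def heapify_down(heap, i):
    # Move element i down while it is larger than one of its children.
    n = len(heap)
    while True:
        left, right, smallest = 2 * i + 1, 2 * i + 2, i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            break
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def insert(heap, key):
    heap.append(key)                   # add at the end, then restore heap order
    heapify_up(heap, len(heap) - 1)

def extract_min(heap):
    heap[0], heap[-1] = heap[-1], heap[0]    # move the minimum to the end
    smallest = heap.pop()
    if heap:
        heapify_down(heap, 0)                # restore heap order from the root
    return smallest

h = []
for key in [5, 3, 8, 1]:
    insert(h, key)
print(extract_min(h), extract_min(h))   # 1 3
</code>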