Sam's Journal

Summary of Preface, Ch. 1.1, Ch. 2.1, Ch. 2.2

Preface

“The study of algorithms is a powerful lens through which to view the field of computer science in general.” The preface explains how central algorithms are to our lives and how pretty much every problem we face on a daily basis can be expressed using algorithms. They say that the “algorithmic enterprise” consists of two main things: drilling down to the mathematical core of any given problem and then figuring out the best algorithmic strategy to solving that problem. These two pieces, of course, overlap. The more familiar one is with the many possible algorithmic solutions to problems, the better one generally is at boiling down complex problems into their base states that can then more readily be solved. This book aims to teach us to do exactly that. Also, the book was comprised by people who teach the class at Cornell.

Ch. 1.1

Chapter 1.1 is a review of the stable matching problem. The book first talks about the problem in terms of students applying to internships at particular companies, and later says that the very first version of the problem was probably the medical residents/hospitals version that we discussed in class. Finally, they boil it down to the “marriage” problem we also did in class. A matching, S, is stable if it is both a perfect matching AND there is no instability. It is possible for there to be more than one stable matching. The algorithm that is used to solve such a problem is fairly simple, as we discussed in class. A male proposes to the highest ranked female he has not yet proposed to; she will except if she is single or if she prefers the man proposing to her current fiance. If she choose the proposer over her current fiance, her fiance goes back into the pool of “free” men. The algorithm stops when no man is free. The longest possible run time is n^2; assuming all men end up proposing to all women. In addition, the people responsible for coming up with the stable matching algorithm were Gale and Shapley (1962).

Ch. 2.1

This section beings by talking about our goal with algorithms: to find the most efficient algorithm for any computation problem. But first, we must define what it means for an algorithm to truly be efficient. The book goes through a few possible definitions of efficiency. The first they discuss is “An algorithm is efficient if, when implemented, it runs quickly on real input instances.” This first definition is dismissed because of a few particular flaws: the lack of specificity about where and how the algorithm is implemented, the lack of clarity about what a “real” input is, and the fact that this definition does not consider how the performance of an algorithm may change drastically when performed on 100 vs. 100,000. This beginning definition, however, keys us to the idea that we need to begin looking at efficiency in a more mathematical, quantifiable way.

We then turn to analyzing worst-case run time scenarios to help us come up with a better definition of true efficiency. The next definition the book proposes is this: “An algorithm is efficient if it achieves qualitatively better performance, at an analytical level, than brute-force search.” Again, the book finds issue with the fact that “qualitatively better” performance is rather vague. This leads the authors to their third and final definition of efficiency: “An algorithm is efficient if it has a polynomial running time.” Now, the book understands that this final definitions may seem a little weak, but their justification for it is this: “It really works. Problems for which polynomial running-time algorithms exist almost invariably turn out to have algorithms with running times proportional to very moderately growing polynomials like n, n log n, n^2, or n^3.” A huge benefit of a definition like this is the fact that it is negatable. We can now say that an algorithm is NOT efficient if it has a worse than polynomial worst-case running time. It is also possible to express that sometimes there is no efficient algorithm for a particular problem.

Ch. 2.2

An algorithm's worst-case running time on inputs of size n grows at a rate that is at most proportional to some function f(n). We aim to express the growth rate of running times and other functions in a way that is insensitive to constant factors and low-order terms. Asymptotic Upper Bounds Let T(n) be a function, such as the worst-case running time of an algorithm with an input of size n. Given another function f(n), we say that T(n) is O(f(n)) if T(n) is bounded above by a constant multiple of f(n). Sometimes this is written as T(n) = O(f(n)). Asymptotic Lower Bounds For arbitrarily large input size n, the function T(n) is at least a constant multiple of some specific function f(n). Thus, we say that T(n) is Ω(f(n)), also written T(n) = Ω(f(n)). Asymptotically Tight Bounds If a function T(n) is both O(f(n)) and Ω(f(n)), we say that T(n) is Θ(f(n)). We would then say that f(n) is an asymptotically tight bound for T(n).

Summary of 2.3, 2.4, 2.5

2.3

Before we can implement any algorithm, the first step is generally deciding which data structures are going to be most beneficial to us and then getting our data into whatever structure we choose. In some cases we will have to do some preprocessing of the data to get it into our chosen structures. Arrays are one of the simplest ways to keep a list of n elements. Typically the array will be of length n and A[i] will be the ith element of the list. Getting the ith element is O(1), and checking to see if an item is in the array is linear O(n), unless the list happens to be sorted, in which case the search can be accomplished in O(log n) time. However, an array is not great for maintaining a dynamic list. Doubly linked lists can also be used, but then we lose our constant-time access of elements; this then results in O(i) time. In order to implement the Gale-Shapely algorithm as efficiently as possible, we will have to get several operations down to constant time. Page 46 details this.

2.4

When approaching a new problem, it can help to think about two different kinds of bounds: the worst-case scenario running time (brute force search) and the running time you hope to achieve. A linear algorithm, O(n) running time, has a natural property: its running time is at most a constant factor times the size of its input. The most basic way to get an algorithm to this running time is to process the input in a single pass. Other algorithms can be linear for more subtle reasons, like the merging of two sorted lists that we demonstrated in class. It is a linear operation.

Another common running time is O(n log n), most frequently exemplified by sorting algorithms. Quadratic running time is also fairly common, and can stem from many things, often performing a search over all pairs of input items while spending constant time per pair, or simply nested loops. More elaborate sets of nested loops often result in O(n^3) in running time. O(n^k) can be obtained by performing a brute-force search over a set on n items with k subsets.

Beyond Polynomial Time

Some common bigger-than-polynomial running times we see often are 2^n as well as n!. 2^n arises naturally as a running time for a search algorithm that must consider all subsets. It is important to note that often two algorithms can have very similar-looking search spaces and still have widely varying run-times: one run time might be a lot better than brute-force search while another won't be able to be improved upon at all.

n! is also a naturally occurring run time, but it is a little scary-looking because it grows even faster than 2^n. n! is the number of ways to match up n number of items with n other items, and this is one way n! occurs. Another example of n! naturally occurring is the Travelling Salesman Problem. Given n cities, with the distances between all pairs, what is the shortest route that visits all cities? Assuming the salesman starts and ends at the first city. This problem is not known to have an efficient solution. It belongs to the class of NP-complete problems.

Finally, some algorithms actually run in sublinear time. It takes linear time just to read input, so how does this occur? Often sublinear running times happen when input can be “queried” indirectly rather than read completely, and we then aim to minimize the amount of querying we do. An example of this is the binary search algorithm.

2.5

Priority queues are a more complex data structure that in some cases can help us overcome higher-level obstacles that prevent us from achieving a polynomial-time solution, depending on the type of problem we are trying to solve. For instance, they can be useful for the Stable Matching Problem. So, what exactly is a priority queue? It is a data structure that maintains a set of elements s, where each element v in S has an associated value key(v) that denotes the priority of element v; the smaller the key the higher the priority. They support the addition and deletion of elements from the set and also the selection of the element with the smallest key. A priority queue can be implemented so that the running time of most of its operations (adding, deleting, selecting element with minimum key) is at most O(log n). Sorting a priority will take O(n log n) time.

To implement a priority queue, we will use a data structure called a heap, like the one we discussed in class. It will be represented as a binary search tree. The tree will have a root and every node will have at most two children. The tree will be considered in “heap order” if the key of any element is at least as large as the key of the element at its parent node int the tree. Or, the child nodes of any parent node must be equal to or larger than their parent node. As we discussed in class, the process of adding nodes to the heap is fairly simple. We will add them to the bottom of the tree and then use the Heapify-up procedure to fix the ordering. Heapify-up is O(log n) in running time, as is Heapify-down.

Readability

9/10: Pretty easy to understand this section.

Summary of 3.1, 3.2, 3.3

3.1

Graph typically refers to an undirected graph. Directed graphs will be called such. Edges will be defined as a line between a set of nodes (u, v). There are many different common examples of graphs: 1. Transportation networks (Plane routes) 2. Communication Networks (computer networks) 3. Information Networks (WWW) 4. Social Networks (Facebook) 5. Dependency Networks (Courses and their Prerequisites). A path in any undirected graph is a sequence P of nodes v1, v2, v…, vk-1 such that each consecutive pair vi and vi+1 are joined by an edge. (v1 and v2, v2 and v3, etc.) A path is called simple if all of its vertices are distinct. A cycle starts and ends at the same node. A graph is connected if for every pair of nodes u and v, there exists a path from u to v. That is, no node is seperate from all the other nodes. A directed graph is considered strongly connected if for every two nodes u and v there exists a path from u to v as well as a path from v to u. The distance between any two nodes is the number of edges required in a path from u to v.

A graph is a tree if it is connected and does not contain a cycle. Trees have a root node, r. Rooted trees are a fundamental notion in CS because they encode the notion of hierarchy.

Thm. 3.1: Every n-node tree has exactly n-1 edges.

Thm. 3.2: Let G be a graph of n nodes. Any two of the following statements implies the third: 1. G is connected. 2. G does not contain a cycle. 3. G has n-1 edges.

3.2

Given two nodes s and t in graph G, is there a path from s to t? We call this determining s-t connectivity. It can also be called the Maze-solving Problem. There are two common approaches to this problem: breadth-first search and depth-first search.

Breadth-first search begins at a node s and explores outward from s in all possible directions one layer at a time. Note: There will be obvious holes in any graph that is not fully connected; any nodes that cannot be reached from the starting node will not be “seen” by either search type, BFS or DFS. In a BFS, we find that nodes in Layer Lj have a distance of j from the starting node, s.

Thm. 3.3: For each j >= 1, layer Lj produced by BFS consists of all nodes at distance exactly j from s. There is a path from s to t if and only if t appears in some layer.

BFS produces a tree rooted at S.

Thm. 3.4: Let T be a breadth-first search tree, let x and y be nodes in T belonging to layers Li and Lj, and let (x,y) be an edge of G. Then i and j differ by at most one. In other words, if there exists an edge between any two nodes on different layers of the tree, then the nodes must be on adjacent layers.

The connected component of any graph G is the set R of G containing the starting node s. There exists a simple algorithm to produce the set R of nodes that make up the connected component of any graph:

    R will consist of nodes to which s has a path
    Initially R = {s}
    While there is an edge (u,v) where u is in R and v is not in R:
        Add v to R
    Endwhile

Thm. 3.5: The set R produced by this algorithm is precisely the connected components of G containing s.

DFS is the intuitive way one might travel through a maze. You will select a path and follow that path until you reach a dead-end, then back track to the starting part and try again. It also has a simple algorithm. Initially, all nodes are marked unexplored.

  DFS(u):
  Mark u as "explored" and add u to R
  For each edge (u, v) incident to u
      If v is not marked "explored," then:
          recursively invoke DFS(v)
  Endif
  Endfor

DFS also yields a tree rooted at node s, but the tree will look much different than a BFS tree.

Thm. 3.6: For a given recursive call DFS(u), all nodes that are marked “explored” between the invocation and end of this recursive call are descendants of u in T.

Thm. 3.7: Let T be a depth-first search tree, let x and y be nodes in T, and let (x,y) be an edge of G that is not and edge of T. Then one of x or y is an ancestor of the other.

Thm. 3.8: For any two nodes s and t in a graph, their connected components are either identical or disjoint.

3.3

BFS and DFS differ obviously in that the first uses a queue and the second a stack. There are two basic ways to represent graphs: with an adjacency matrix or an adjacency list. This book uses the adjacency list representation.

For this section, n = |V| and m = |E|. Connected graphs must have at least m >= n - 1 edges. We aim to implement the basic graph search algorithms in O(m+n) time. Note that running time O(m+n) is the same as O(m) because m >= n - 1 and this running time is considered linear. We will keep our running times in terms of both variables for accuracy.

The simplest way to represent a graph is with an adjacency matrix, which is an n x n matrix, where A[u,v] = 1 if there is an edge between u and v and a 0 if there is not an edge. If the graph is undirected, then it is symmetric; that is, A[u,v] = A[v,u]. There are two main disadvantages to this representation: it takes Θ(n^2) space and examining all edges in the matrix will take Θ(n) time. So, throughout the book we use adjacency lists, which works best for sparse graphs: graphs with fewer than n^2 edges. This list works simply by having a list Adj, where Adj[v] contains a list of all the nodes who have and edge with node v.

Thm. 3.10: Representing a graph using an adjacency matrix takes O(n^2) space while using an adjacency list requires O(m+n) space.

Summary of Thm. 3.11-3.13: The implementations of BFS and DFS given in this book can acheive O(m+n) running times.

Readability

I give this section an 8/10 for readability. I find graphs a little easier to understand than the material we worked with in chapter 2.

Summary of 3.4, 3.5, 3.6, 4.1

3.4

Remember, a bipartite graph is one where the set of nodes V can be partitioned into two sets X and Y such that every edge has one end in X and one end in Y. We will imagine that the nodes in set X are colored red and that those in set Y are colored blue.

Thm. 3.14 If a graph G is bipartite, then it cannot contain an odd cycle.

Is there an easy way to figure out if a graph is bipartite? Yes! We simply begin by coloring one node red, then coloring all of its neighbors blue, then coloring the neighbors of the nodes we just colored blue red and so on. After every node is colored, we can easily tell if our graph is bipartite. If no edge has two same-colored nodes, then our graph is bipartite. This is exactly like breadth-first search. We color the starting node red, then we color L1 blue, L2 red, etc. All the odd-numbered layers are blue and the even ones are red. The total running time for our coloring algorithm is O(m+n) just like BFS.

3.5

Directed graphs! Now our edges has direction! Much direction! Very specific! Wow! Such complication!

Representing directed graphs is fairly simple: now each node will have two lists associated with it: one for nodes to which it has edges and for nodes from which it has edges.

BFS and DFS are almost the same in directed graphs. For BFS, we start at a node S and define our first layer to be to consist of all nodes to which S has an edge, and the next layer to be all the nodes to which the nodes in the first layer have an edge and so on. The run time is still O(m+n).

We may simply create a G^rev, an exact duplicate of G except that all the edges will be reversed. We can then easily see the nodes with a path to s, rather than the nodes to which s has a path.

For directed graphs, we say that any two nodes are mutually reachable if there exists a path from u to v and from v to u. So, a directed graph is strongly connected if EVERY pair of nodes is mutually reachable.

Thm. 3.16: If u and v are mutually reachable, and v and w are mutually reachable, then u and w are mutually reachable. (This sounds like the transitive property to me.)

Thm. 3.17: For any two nodes s and t in a directed graph, their strong components are either identical or disjoint.

3.6

If an undirected graph has no cycle, then it has a very simple structure: each of its connected components is a tree. But for directed graphs, it is possible for no cycles to exist while the graph maintains a unique structure. When a directed graph has no cycles, we call it a DAG, or directed acyclic graph.

These graphs can easily represent dependencies and often the order in which things much be done. A directed edge may exist between node i and node j, indicating that node i's task must be completed before node j's task. This can lead to a topological ordering of a graph G, so that for every edge (vi, vj), we have i<j. In other words, all edges point forward in the ordering. A topological ordering on tasks provides an order in which they can be safely performed. When we come to the task vj, all the tasks that are required to precede it are already done.

Thm. 3.18: If G has a topological ordering, then G is a DAG.

Thm. 3.19: If G is a DAG, there is a node v with no incoming edges.

Thm. 3.20: If G is a DAG, then G has a topological ordering.

Computing a topological ordering of a graph G will be O(m+n).

4 intro + 4.1

It's hard to define what exactly is meant by “greedy,” but generally we can say that greedy algorithms seek to find an over-all optimal solution by choosing the most optimal solution at each step. There are two ways to show that a “greedy” algorithm has found an optimal solution: we may use a “greedy stays ahead” proof or an exchange argument. Greedy stays ahead functions by showing that since a greedy algorithm will function better than any other algorithm at each step, it must produce and over-all optimal solution. The exchange argument takes a greedy solution and transforms it into a known optimal solution without changing its quality, thereby showing that the greedy algorithm has found an optimal solution. There are three very well-known applications for greedy algorithms: finding shortest paths in a graph, the Minimum Spanning Tree Problem, and the construction of Huffman codes for performing data compression.

The interval scheduling problem is one that often has a greedy solution applied. The successful greedy “rule” here is that we accept first the request that finishes first. This algorithm can be O(n logn) in best case.

Another type of interval scheduling problem is scheduling all intervals and determining the number of resources necessary to do so. This is often represented by thinking of scheduling a given set of classes at specific times in the fewest number of rooms possible. This time, our greedy solution will aim to use resources equal to the number of overlaps; that is, if there are 4 classes that begin at 1:00, then an optimal solution will use only 4 classrooms to handle the load.

Thm. 4.6: The greedy algorithm above schedules every interval on a resource, using a number of resources equal to the depth of the set of intervals. This is the optimal number of resources needed.

Readability

9/10 I really enjoyed reading these chapters and felt like they were a quick, easy-to-understand review of material already fully covered in class.

Summary of 4.2, 4.4, 4.5, 4.6

4.2

We will now consider a different problem: suppose we need to schedule a certain amount of jobs that each take a certain amount of time using one particular resource, such as a single computer. We need to find a way to organize all the jobs such that their lateness is minimized; that is, such that they are finished before their deadline as often as is possible. There are several poor greedy algorithms that do not produce optimal solutions, such as doing the shortest jobs first, or even doing the jobs with the shortest amount of slack time first. What does work, however, is simply scheduling the jobs by deadlines, earliest deadline first. This produces an optimal greedy solution.

We can prove that this is an optimal solution by first identifying an optimal solution, O, and then modifying it so that it looks exactly like our greedy solution without increasing the amount of lateness. This is called an exchange argument.

4.4

There are a couple ways to think about the “shortest path” problem in any given graph. When can try to find a shortest path from a node U to a node V, or we can try to find the shortest path from a starting node S to all other nodes in the graph. Dijkstra's Algorithm is often used to solve such a problem.

Dijkstra's algorithm works by first determining the length of the shortest path from s to each other node in the graph. The algorithm maintains a set S of vertices u for which we have determined the shortest path distance d(u) from s; this is the explored part of the graph. Initially S = {s} and d(s) = 0. Now, for each node v in V - S, we determine the shortest path that can be constructed by traveling along a path through the explored part S to some u in S, followed by the single edge (u,v). We choose the node v in V - S for which the distance is minimized, add v to S, and define d(v). The best overall running time for this algorithm, when implemented with a heap-based priority queue, is O(m log n) for a graph with n nodes and m edges.

4.5

The Minimum Spanning tree problem is another type of graph traversal question. Suppose we have a set of locations V = {v1, v2,…) and we want to build a communications network on top of them. A path should exist between every pair of nodes (the graph should be connected with no stragglers) but we want to do this as cheaply as possible. The solution to this problem will be a tree, which is why this is commonly called the minimum spanning tree problem. There are three greedy algorithms that accomplish this task and find a minimum spanning tree:

Kruskal's Algorithm: Starts with no edges at all and builds a spanning tree by adding edges from E in order of increasing cost. As we move through the edges in this order, we insert each edge E as long as it does not create a cycle when added to the edges we've already added.

Prim's Algorithm: Start with a root node S and try to greedily grow a tree outward. At each step, we simply add the node that can be attached as cheaply as possible to the partial tree we already have.

Reverse-Delete Algorithm: Start with a full graph and begin deleting edges in order of decreasing cost. As we get to each edge e, we delete it only if doing so would not disconnect the graph.

The run times for all these algorithms can be O(m log n).

4.6

When implementing Kruskal's algorithm, we need a quick and easy way to see which nodes in a graph are connected. When we add an edge to the graph, we don't want to have to recompute the connected components from scratch. Instead, we will develop a data structure that we call the Union-Find structure, which will store a representation of the components in a way that supports rapid searching and updating.

The Union-Find data structure works as follows. The operation Find(u) will return the name of the set containing u. This operation can used to test if two nodes u and v are in the same set, by simply checking if Find(u) == Find(v). The data structure will also have an operation Union(A,B), to take two sets A and B and merge them together.

Readability

8/10. A little dry, here and there, but not too bad. Again, pretty familiar from class.

Summary of 4.7, 4.8, 5.1, 5.2

4.7

Minimum spanning trees play an interesting role in clustering. Clustering arises whenever one has a collection of objects that one is trying to organize into coherent groups. In such a situation, we often seek to measure how similar or different each pair of objects is. Often a distance function is run on the objects, with objects a larger distance away being more dissimilar from the others. Sometimes distance is actual distance, but often it's an abstract representation of the similarity or dissimilarity of the objects. Given a distance function on the objects, the clustering problem seeks to divide them into groups based on how close/far apart they are.

Say we are trying to divide a set of objects, U, into k groups. We then say that a k-clustering of U is a partition of U into k non-empty sets. The spacing of a k-clustering is the minimum distance between any pair of points lying in *different* clusters. Given that we want points in different clusters to be far apart from one another, a natural goal is to seek the k-clustering with the maximum possible spacing.

Usually we stat by drawing an edge between the closest pair of points, and then between the next closest pair of points and so on. We stop when we obtain k connected components.

4.8

We want to use the fewest number of bits possible to encode things, such as the characters in the English language. There are 26 letters, 5 important punctuation marks, and a space, which is 32 characters. This means we can't use less than 2^5 bits. But perhaps we don't have to use 5 bits for every letter; we can assign really common letters to strings of less than 5 bits so that we can attempt to minimize the average number of bits used. The issue of reducing the average number of bits per letter is a fundamental problem in the area of data compression.

Prefix encodings do NOT allow their to be any characters encoded by a set of characters that is all the beginning of a different encoding for a different character. For instance, if “a” was encoded as 1 and “b” was encoded as 10, we could say that a was a prefix of b. Eliminating prefixes makes encoding unambiguous. Morse code is a prefix-filled encoding and because of this is highly ambiguous.

5.1

Divide and conquer refers to a class of algorithm techniques in which one breaks the input into several parts, solves the problem in each part recursively, and then combines the solutions into an overall solution. In many cases, it can be a simple and powerful method.

First we will talk about the Mergesort algorithm. Mergesort sorts a given list of numbers by first dividing them into two equal halves, sorting each half separately by recursion, and then combining the results of these recursive calls. In Mergesort, we assume that once the input has been reduced to size 2, we stop the recursion and sort the two elements by simply comparing them to each other.

There are two basic ways to go about solving a recurrence: the most natural way would be to “unroll” the recursion, accounting for the running time across the first few levels and identifying a pattern that can be continued as the recursion expands. A second way is to start with a guess for the solution, substitute it into the recurrence relation, and check that it works.

5.2

As a way to explore this issue further, we now consider a class of recurrence relations that generalizes, and show how to solve the recurrences in this class.

This more general class of algorithms is obtained by considering divide-and-conquer algorithms that create recursive calls on q subproblems of size n/2 each and then combine the results in O(n) time.

If T(n) denotes the running time of an algorithm design in this style, then T(n) obeys the following recurrence relation:

Thm 5.3: For some constant c, T(n) ⇐ qT(n/2) + cn when n>2, and T(2) ⇐ c.

Readability

6/10. This section was really dense and difficult to get through.

Summary of 5.3-5.5

5.3

Analyzing rankings has become a common problem recently. Many websites allow users to rank movies or other types of media and then try to suggest other things that user might like based on how other users who like the same thing that the first user likes rated things the user hasn't seen. This technique is often called collaborative filtering. Essentially, once the website has identified people with similar tastes to yours, it can then recommend new things those people with similar tastes liked that you may also like.

A core issue in this problem is the business of how we will compare various rankings. What constitutes “similar”? We seek to find a way to understand the middle-ground between users with identical rankings and users with completely opposite rankings. One neat solution to this problem is counting inversions. (In the list 2,1,3 we would say there is one inversion; the 2 is in front of the 1.)

What is the simplest algorithm to count inversions? We could look at every pair of numbers (ai, aj) and determine whether they constitute an inversion; this would take O(n^2) time. We can achieve O(n log n) time, though!

But how? We set m = [n/2] and divide the list into two pieces, a1–>am and am+1 –> an. We first count the number of inversions in each of these two halves. The trick is that we must do that in O(n) time. To help with counting the number of inversions, between the two halves, we will make the algorithm recursively sort the numbers in the two halves as well. Having the recursive step do a bit more work (sorting as well as counting) will make the “combining” portion of the algorithm easier. So, the crucial routine in this process is Merge-and-Count.

Our merge-and-count routine will walk through the sorted lists A and B, removing elements from the front and appending them to the sorted list C. In a given step, we have a current pointer into each list. We will then compare the elements ai and bj, remove the smaller one from its list, and append it to the end of list C. As A and B are sorted, we will count the number of inversions fairly easily. If we append an item from list a, then there are no inversions, but if the smallest item is in list B, then we know that it is inverted with everything in list A because it comes after all of them. We then increase our inversions count by the number of items currently in list A, then continue with our comparisons.

Thm. 5.7: The sort-and-count algorithm correctly sorts the input list and counts the number of inversions; it runs in O(n log n) time for a list with n elements.

5.4

Finding the right way to merge the solutions for this problem can be tricky. The problem: finding the closest pair of points. Given n points, in the plane, find the pair that is closest together. M. I. Shamos and D. Hoey considered this problem in the early 1970s.

We begin with some notation: P = {p1–>pn} where pi has co-ords (xi,yi). d(pi,pj) denotes the standard Euclidean distance between the points pi and pj. Our goal is to find a pair of points that minimizes d(pi,pj).

We will find the closest pair of points in one half and the closest pair of points in the other half and then use this information to get the overall solution in linear time.

The first half, Q, and the second half, R, will be considered one at a time. We will find the closest pair of points in Q and R by generating four lists, Qx, which has the points in Q sorted by increasing x co-ords, and Qy, which has the points in Q sorted by decreasing x co-ords, and analagous lists Rx and Ry. We now recursively determine a closest pair of points in both Q and R with access to these lists we have generated.

Now, we may have already found the closest points in our total set, but we must check to make sure. We will call D the minimum of d(q0,q1) and d(r0,r1). Now, we will imagine that there is a line dividing Q from R, and look for points within distance D of the line L, because only points within our current minimum D have a chance of being closer together than our current set of smallest distance points.

This algorithm runs in O(n log n) time.

5.5

We now turn to consider integer multiplication. The algorithm we were taught to do this in elementary school is quite efficient: you compute many partial products before adding them all together to get the final product. However, this algorithm runs in O(n^2), which we can improve upon by using a different, recursive way of performing the multiplication.

We will use only three recursive calls so that we get q=3 and our less-than-n^2 running time; the attempted version of the algorithm that is q=4 is n^2 and thus not an improvement!

The running time of Recursive-Multiply on two n-bit factors is O(n^1.59).

Readability

Liked this section! I had to wait to do the part about integer multiplication until we'd talked about it in class, but the other sections were very understandable. 9/10.

Summary of 6.1-6.4

6.1

Sometimes, there won't be a greedy solution to any given problem. Dynamic Programming can oftentimes look quite similar to brute-force search. It systematically works through the exponentially large set of possible solutions, BUT it does this without ever examining them all explicitly. We will build a dynamic-programming solution to the weighted interval scheduling problem.

Our first solution is, unfortunately, exponential:

Compute-Opt(j):

  if j=0
      Return 0
  
  else:
      Return max(Vj + Compute-Opt(p(j)), Compute-Opt(j-1)
  endif

It is exponential only because we are re-computing values multiple times; if we were to memoize the values (store them in an array as we compute them) then we can bring the running time down to O(n).

6.2

Now we want to turn our recursive solution into an iterative one. It is important to note that we can directly compute the entries in M by an iterative algorithm, rather than using memoized recursion. We simply start with M[0]=0 and keep incrementing j; each time we need to determine a value M[j], it is provided by our previous algorithm. It looks like this:

Iterative-Compute-Opt:

  M[0]=0
  For j = 1...n
      M[j] = max(Vj+M[p(j)], M[j-1])
  endfor

It is also an O(n) solution.

We can look at our solution to the weighted interval scheduling problem in a more general way to come up with a rough template for designing dynamic programming algorithms. To set about developing an algorithm based on dynamic programming, one need a collection of subproblems derived from the original problem that satisfies a few basic properties: i. There are only a polynomial number of subproblems ii. The solution to the original problem can be easily compute from the solutions to the subproblems. iii. There is a natural ordering on subproblems from smallest to largest together with an easy-to-compute recurrence that allows one to determine the solution to a subproblem from the solutions to some number of subproblems.

6.3

The weighted-interval problem depends on a binary choice: that is, a given interval would either be included in the final solution or it would not. There are many more complicated problems that we can apply dynamic programming to, where the recurrence will involve “multi-way choices.” At each step, we have a polynomial number of possibilities to consider.

Let's consider trying to find a line of best fit for a given set of data. If the points seem to lie cleanly along one line, we can easily compute the line of best fit. But data is not always so neat. What if we can see that the points would lie best against two or even three lines of best fit? We would then want to formulate a solution that requires us to fit the points well using as few lines as possible; we don't want to just connect each point with its own individual line. The is the “Segmented Least Squares” problem.

The best running time for the algorithm, as seen in the book, is O(n^2).

6.4

For problems like the Knapsack problem, we will need to add an additional variable to the recurrence underling the dynamic program. The goal in these kinds of more general problems is to choose a subset of maximum total value, subject to the restriction that its weight not exceed W. A greedy solution that always finds the correct solution to this type of problem is not known, so we must turn to dynamic programming. For this problem, determining a good set of subproblems will be tricky.

In order to find the value of Opt(n), we need to not only know the value of Opt(n-1), but also we need to know the best solution we can get using a subset of the first n-1 items and total allowed weight W - wn.

The running time for the algorithm this creates, as seen in the book, is O(nW). Given a table M of the optimal values of the subproblems, the optimal set S can be found in O(n) time.

Readability

Dynamic programming is pretty confusing! I had a hard time understanding these sections, especially the final one. The first two were pretty clear. 5/10 overall, though.

Summary of 7.1-7.2, 7.5, 7.7

7.1

Graphs are often used to model transportation networks, where edges carry some sort of traffic and nodes act as “switches” passing traffic between different edges. Network models like this have capacities on the edges and source nodes in the graph (that generate traffic) and sink nodes in the graph that are destinations that absorb traffic as it arrives. The traffics across these kinds of graphs is often referred to as “flow.”

A typical goal with this type of graph is to arrange the traffic so as to make as efficient use as possible of the available capacity. At the moment, there is no known algorithm for the Maximum-Flow Problem that could be considered a “dynamic programming” solution, so we'll return to greedy solutions.

The Ford-Fulkerson Algorithm solves this problem. (page 344)

It can be implemented in O(mC) time, so long as all the capacities in the flow network are integers.

7.2

We will continue to analyze the Ford-Fulkerson algorithm and verify that it does indeed have the maximum possible value for any graph G.

The Ford-Fulkerson algorithm terminates when the flow f has no s-t path in the residual graph of Gf. This turns out to be the only property needed for proving its maximality. (7.9, pg 348)

Remember when we pointed out that capacities in our graph must be integers for the F-F algorithm? Well, if that's NOT the case, then it is entirely possible for the algorithm to run forever, if real numbers are incorporated.

7.5

The Bipartite Matching Problem

A matching in a graph G is a subset of the edges M in E such that each node appears in at most one edge in M. The Bipartite Matching problem is that of finding a matching in G of largest possible size.

The Ford-Fulkerson algorithms can be used to find a maximum matching in a bipartite graph in O(mn) time.

7.7

Much of the power of the Maximum-Flow Problem lies in the fact that many problems with a nontrivial combinatorial search component can be solved in polynomial time because they can be reduced to the problem of finding a maximum flow or a minimum cut in a directed graph.

Before, we had just one sink and one source. Now, we will have multiple! Now, instead of maximizing flow value, we will have sources that have a fixed supply and sinks that have a fixed demand, and our goal is to ship flow from nodes with available supply to those with given demands.

Readability

4/10. This section is incredibly dense and hard to make sense of. I don't think the book does a great job of explaining overly technical aspects.

Table of Contents

Sam's Journal

Summary of Preface, Ch. 1.1, Ch. 2.1, Ch. 2.2

Preface

Ch. 1.1

Ch. 2.1

Ch. 2.2

Summary of 2.3, 2.4, 2.5

2.3

2.4

2.5

Readability

Summary of 3.1, 3.2, 3.3

3.1

3.2

3.3

Readability

Summary of 3.4, 3.5, 3.6, 4.1

3.4

3.5

3.6

4 intro + 4.1

Readability

Summary of 4.2, 4.4, 4.5, 4.6

4.2

4.4

4.5

4.6

Readability

Summary of 4.7, 4.8, 5.1, 5.2

4.7

4.8

5.1

5.2

Readability

Summary of 5.3-5.5

5.3

5.4

5.5

Readability

Summary of 6.1-6.4

6.1

6.2

6.3

6.4

Readability

Summary of 7.1-7.2, 7.5, 7.7

7.1

7.2

7.5

7.7

Readability