Anne Bailey's Journal

Preface:

Algorithms unravel the secrets of every science. They solve problems through a mathematical core and identify the nature of the problem being solved. The point of algorithms is both to provide solutions and to expose the framework underlying the problems they are used to evaluate. We see this framework while trying to develop the most efficient possible algorithms. The point of studying algorithms is to better understand how to efficiently answer questions as they are posed. I would consider this preface a 3/10 in terms of how interesting I find it. (3/10)

1.1: Stable Matching

A self-enforcing match is one in which agents acting in their own self-interest could not break a match for mutual gain on either side. The idea developed by Gale and Shapley behind this algorithm is that individual self-interest will reinforce the algorithm to assure that any matches made will not be broken after all assignments have been doled out. It is easiest to approach a one-to-one match so that applicants and acceptances are equal and mutual. Though a matching is just an assignment of one set to another, a perfect matching means that we follow this one-to-one rule. Preferences are the basis of the self-interest that influences stability; without preferences no one would care with whom they were matched, and matching could be a rolling process in pursuit of convenience and efficiency alone. To match perfectly, we start trying to match people and give every person the chance to break their initial agreement in favor of another person – who may either accept or reject them. If both are better off in their own self-interest by switching, this eliminates an unstable match from the process. The algorithm to match men and women exclusively and heterosexually in marriage specifically favors the men – they act in order of preference while the women do not. This process runs for at most n^2 iterations in the worst case, which is a fairly long running time. Since the algorithm will not terminate until all are matched and it checks at each point for stability, the algorithm accomplishes what it sets out to do by producing a stable set of couples. All orders of running the algorithm yield the same stable matching. (7/10)
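To make the propose-and-reject loop concrete, here is a minimal Python sketch of the procedure described above. It is my own illustration rather than the book's pseudocode, and names such as next_proposal and engaged_to are made up; it assumes complete preference lists on both sides.

  def gale_shapley(men_prefs, women_prefs):
      """men_prefs/women_prefs: dicts mapping each person to an ordered preference list."""
      # rank[w][m] = position of m on w's list, so a woman can compare two suitors in O(1)
      rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
      next_proposal = {m: 0 for m in men_prefs}    # index of the next woman m will propose to
      engaged_to = {}                              # current partner of each woman
      free_men = list(men_prefs)
      while free_men:
          m = free_men.pop()
          w = men_prefs[m][next_proposal[m]]       # highest-ranked woman m has not proposed to yet
          next_proposal[m] += 1
          current = engaged_to.get(w)
          if current is None:
              engaged_to[w] = m                    # w was free, so she accepts
          elif rank[w][m] < rank[w][current]:
              engaged_to[w] = m                    # w trades up; her old partner becomes free again
              free_men.append(current)
          else:
              free_men.append(m)                   # w rejects m; he stays free
      return {m: w for w, m in engaged_to.items()}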

2.1: Computational Tractability

Studying algorithms helps introduce students to the structure of computational theory; the problems are effectively discrete and at most countably infinite, since computers cannot truly evaluate infinite data sets – that would require an infinite amount of memory in which to store the data and analyze its implications. For this reason, efficient algorithms are a cornerstone of effective computation: without an efficient algorithm, a computer wastes time and memory that could be used for other problems. Efficiency is judged by how badly the worst case can go. Though average-case analysis may seem more representative of actual runtimes and memory use, worst-case bounds are helpful for ruling out brute-force methods of computation, and finding an algorithm with a lower worst-case running time is usually the better choice. “Proposed definition of efficiency (3): An algorithm is efficient if it has a polynomial running time.” This still does not assure that the most efficient algorithm will run better in every practical case. “There is no efficient algorithm for a particular problem.” My question here is how we can know that no algorithm for a problem is efficient – does a lack of proof suggest this conclusion, or is it purely subjective? (9/10)

2.2: Asymptotic Order of Growth

Algorithms are mainly expressed in terms of pseudo-code and best/worst possible running times. The growth rate of the running time as inputs are added is important because, over time, an algorithm that slows significantly with more inputs will not use computing resources effectively. With f(n) a function of the input size n, O(f(n)) is an asymptotic upper bound on the running time, Omega(f(n)) is an asymptotic lower bound, and if a running time is both O(f(n)) and Omega(f(n)), it is Theta(f(n)), the asymptotically tight bound that characterizes the actual growth of the running time. This is because the running time is eventually bounded above by a constant multiple of the upper bound and below by a constant multiple of the lower bound; if the two meet, the growth rate must be the point at which they meet. These tight bounds may come from limits as n goes toward infinity, or they may be calculated from ratios with other functions whose tight bounds are known. A polynomial-time algorithm has a constant d such that O(n^d) bounds its running time regardless of the inputs. The same logic applies to logarithmic bounds, where O(log n) is in turn bounded by O(n^x) for any x > 0, and to exponential bounds of the form O(r^n). (8/10)
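For my own reference, the formal definitions behind these bounds, written out in LaTeX (standard asymptotic notation, not a quote from the book):

  T(n) \text{ is } O(f(n)) \iff \exists\, c > 0,\ n_0 \ge 0 : T(n) \le c\, f(n) \text{ for all } n \ge n_0
  T(n) \text{ is } \Omega(f(n)) \iff \exists\, \epsilon > 0,\ n_0 \ge 0 : T(n) \ge \epsilon\, f(n) \text{ for all } n \ge n_0
  T(n) \text{ is } \Theta(f(n)) \iff T(n) \text{ is both } O(f(n)) \text{ and } \Omega(f(n))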

2.3: Implementing the Stable Matching Algorithm Using Lists and Arrays

We use algorithms to look at specific problems, and for these we use different data structures. Our prior Stable Matching algorithm ran in at most O(n^2) time, and we can achieve this bound using lists and arrays. Choosing a data structure is, however, a judgment call: we weigh the running-time trade-offs in order to select the best structure for the problem at hand. Arrays are not great for keeping lists that change constantly, since they cannot grow and elements are stuck in a specific position. Linked lists are better for lists that change frequently, since it is possible to insert and delete elements in sorted order with shorter running times. We must iterate through a linked list to reach its ith element, though, so this is one trade-off of using a linked list: access of the ith element of an array is O(1), whereas in the list implementation it is O(i). Translating between the two forms takes O(n) time. To implement Stable Matching with arrays and linked lists, we should use arrays for the preference lists and indices to refer to each man and woman. In this style, we assign a specific data structure to every pertinent piece of information. The structure requiring the highest runtime to access its data, combined with the number of run-throughs, determines the worst-case running time. These steps combine to make the G-S algorithm run in O(n^2) at worst with an implementation of arrays and doubly linked lists. (4/10 – We already knew this – No questions)
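One detail worth recording is the preprocessing step: for each woman, build an array that maps a man to his position on her list so that comparing two proposals is O(1). A small sketch with my own (hypothetical) names:

  # prefs[w] is woman w's preference list of the n men, stored as an array (Python list).
  # ranking[w][m] answers "where does m sit on w's list?" in O(1), so deciding between
  # a current partner and a new proposer is constant time.
  def build_ranking(prefs):
      ranking = {}
      for w, ordered_men in prefs.items():
          ranking[w] = {m: position for position, m in enumerate(ordered_men)}
      return ranking

  # The ranking arrays are built once in O(n^2) total, after which each of the O(n^2)
  # proposals in G-S is handled in O(1), keeping the whole run at O(n^2).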

2.4: A Survey of Common Running Times

To build an algorithm, we should know what running time the algorithm will require, both by the nature of the amount of information and by how long it takes to use that information to reach an answer. O(n) is normally the size of the input times constant-time work per item, so an algorithm of this type likely has to look through all of the data once. This occurs in a maximum-finding program that goes through the elements one by one and records the so-far maximum at every data point. This order also applies to merging two sorted lists, since that involves simply going through both lists and repeatedly appending the next smallest item until both lists are empty. O(n log n) applies to algorithms that split the input in half, solve each half recursively, and combine the two solutions in O(n) time. Sorting can be solved this way (Mergesort) by dividing the input into two pieces, solving each, and then merging the two sorted lists. This bound is also common where an algorithm's most expensive step is sorting. O(n^2) applies to most algorithms where we must run through n inputs n times; it tends to be the brute-force answer that comes from nested loops. Cubic time describes a triple nested loop of any type where there are three data sets to compare. O(n^k) follows the above pattern and describes, for example, searching over all subsets of size k of n items. It is not a quick running time, but it can be necessary in a brute-force approach to a problem. Beyond polynomial time we often see O(2^n) or O(n!), which come from comparing subsets or orderings of different sizes. n! grows more quickly than 2^n, but both grow so quickly that there is often a better running-time solution available. Finally, some cases are smaller than linear but larger than constant; for example, binary search runs in O(log n) time.
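Since merging two sorted lists comes up repeatedly, here is a quick linear-time sketch (my own Python, not the book's pseudocode):

  def merge_sorted(a, b):
      """Merge two already-sorted lists in O(len(a) + len(b)) time."""
      merged, i, j = [], 0, 0
      while i < len(a) and j < len(b):
          if a[i] <= b[j]:                  # always take the smaller front element
              merged.append(a[i]); i += 1
          else:
              merged.append(b[j]); j += 1
      merged.extend(a[i:])                  # one of these slices is empty; append the rest
      merged.extend(b[j:])
      return merged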

My question about this section is how we can know with certainty that a proposed solution to an algorithm is efficient. The section provided a wealth of examples without sketches, and it felt like a bit too much information in one place. (7/10)

2.5: A More Complex Data Structure: Priority Queues

Our approach to algorithms is to find a brute-force solution, then to improve upon it as much as possible; a good goal is polynomial time or better. The data structure to evaluate in this section is the priority queue. For this, each element has a priority value (key), and we always want to select the element of highest priority (smallest key) from the set. Elements won't arrive in priority order, so each process must be assigned a position in the data structure when it is added, independently of the priority it has been assigned. “A sequence of O(n) priority queue operations can be used to sort a set of n numbers,” since we can insert all n numbers and then extract the smallest one n times. Therefore, if items can be put in and taken out in O(log n) per operation, we can sort n numbers in O(n log n) running time. The data structure that yields this result is a heap, since it can be maintained as a balanced binary tree of height at most log n. “Heap order: For every element v, at a node i, the element w at i's parent satisfies key(w) ≤ key(v).” An array of n nodes can maintain this heap with the children of node i at positions 2i and 2i+1 (and the parent of i at position i/2, rounded down). When we add items or move items in the heap, we use Heapify-up and Heapify-down to move items around and make sure they sit at nodes where their priorities are correct in relation to their neighbors' keys. A table of running times for implementing priority queues with heaps is available on page 64. (7/10 – No questions since we already studied this in class)
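A small array-based sketch of the two heapify operations (my own Python, 0-indexed, so the children of index i sit at 2i+1 and 2i+2 instead of the book's 2i and 2i+1):

  def heapify_up(h, i):
      """Bubble the key at index i up until its parent is no larger (min-heap in a list)."""
      while i > 0 and h[i] < h[(i - 1) // 2]:
          parent = (i - 1) // 2
          h[i], h[parent] = h[parent], h[i]
          i = parent

  def heapify_down(h, i):
      """Push the key at index i down until both children are no smaller."""
      n = len(h)
      while True:
          left, right, smallest = 2 * i + 1, 2 * i + 2, i
          if left < n and h[left] < h[smallest]:
              smallest = left
          if right < n and h[right] < h[smallest]:
              smallest = right
          if smallest == i:
              return
          h[i], h[smallest] = h[smallest], h[i]
          i = smallest

  # Insert: append the key, then heapify_up on the last index.
  # ExtractMin: swap h[0] with h[-1], pop, then heapify_down(h, 0).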

3.1: Basic Definitions and Applications

A graph G is a way of showing pairwise relationships, called edges, among nodes. Edges can be undirected and display a mutual relationship, or they may be directed and convey an asymmetric relationship. An edge between two nodes u and v in an undirected graph may be written e = {u,v}, where u and v have no particular order. For a directed graph, however, an edge is written e = (u,v), an ordered pair in which u points toward v. Graphs model communication and transportation networks well in both directed and undirected forms, depending on the nature of the relationship being tracked. A path in an undirected graph is a sequence of nodes such that each consecutive pair is joined by an edge in the graph. A simple path visits any given vertex at most once, and a cycle is a path that returns to the node where it began, so at least one node appears in the path more than once. An undirected graph is connected if there is some path between every pair of nodes u and v, even if it passes through other nodes; strong connectivity, defined for directed graphs, means that every node has a directed path to every other node and back. We can use connectivity to see how objects in a graph are related, and through how many steps we must go to connect any two items. Trees are undirected graphs that do not contain a cycle. By picking a root in the tree below which to place all other nodes, we can view the tree as a hierarchical structure. With such hierarchy, the direction of relationships gains a new meaning, such as how the node directly above another is its parent, and the node below is its child. Trees of n nodes have n-1 edges. (7/10 – very helpful, very readable)

3.2: Graph Connectivity and Graph Traversal

Within graphs we want to be able to evaluate the connectivity of two elements without depending on a visual representation, so we study breadth-first and depth-first search, which allow us to answer such connectivity questions. Breadth-first search selects an initial node as a root, then works outward from it in layers according to how closely connected nodes are to the root (how many edges it takes to get from the initial node to the one whose connectivity we are evaluating). The strength of this search is that it finds the shortest path from one node to another: whatever layer the target node lands in, it is connected as closely as possible to the original node. In the worst case this algorithm runs in O(n+m), since for each node in the connected graph the algorithm tries to add a new element to a new layer along every single edge (n = number of nodes, m = number of edges). Depth-first search explores a single path at a time instead of multiple layers of connections at once. It will look at long and twisting paths before coming back closer to the root node in question. This search also runs at worst on a connected graph in O(n+m), since for every edge the algorithm looks at what is on the other end. When no more unexplored edges leave the current node, the algorithm backs up to a previous node and checks for an unexplored edge off of that node. Every single node will be checked at least once, and every edge will eventually be traversed. I'm not entirely certain where the O(n+m) comes from in DFS; I completely understand that each edge traversal adds to the time, but it seems that each node would be visited as such just by following the edges. (7/10)
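A layer-by-layer BFS sketch matching this description (my own Python; adj is assumed to be a dict from node to neighbor list):

  def bfs_layers(adj, s):
      """Return the list of BFS layers from s; layer j holds the nodes at distance j."""
      discovered = {s}
      layers = [[s]]
      while layers[-1]:
          next_layer = []
          for u in layers[-1]:
              for v in adj[u]:              # every edge is examined at most twice overall
                  if v not in discovered:
                      discovered.add(v)
                      next_layer.append(v)
          layers.append(next_layer)
      return layers[:-1]                    # drop the final empty layer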

3.3: Implementing Graph Traversal Using Queues and Stacks

Graphs can be represented by an adjacency matrix or by adjacency lists, which record between which nodes an edge exists. The number n of vertices and the number m of edges are the parameters in terms of which a graph algorithm is measured. O(m+n), as stated in my entry on 3.2, is a good runtime for searches through graphs; it functions as linear time, and in connected graphs O(m+n) = O(m) since m is greater than or equal to n-1. This answers my only lingering question from the previous section. An adjacency matrix takes up Theta(n^2) space, so if the graph is sparse – with far fewer than n^2 edges – there are better ways to condense this data. Adjacency lists, however, take only O(m+n) space: each edge contributes to two lists, so the total length of all the lists is 2m, which is O(m), on top of the O(n) for the n lists themselves. For both DFS and BFS, the order in which elements are taken up and considered is critical. Queues are first-in, first-out while stacks are last-in, first-out data structures. DFS works well with a stack, since we constantly return to the most recently discovered node to retrace our steps and take paths we could have taken earlier. BFS runs in O(m+n) with a queue or stack implementation (the order within a layer does not matter), since the time spent setting up the lists and maintaining an array recording which nodes have already been discovered is O(n) and the time spent on edges is O(m): this therefore runs in O(m+n). Finally, we can use these searches to find all of the connected components of a graph by running them repeatedly, each time starting from a node not yet discovered and recording the newly found nodes in an array or list. (9/10 – I found this section very helpful in clearing up my confusions on BFS and DFS and their implementations)
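Two small sketches of the pieces discussed here, an adjacency-list build and an explicitly stack-based DFS (my own Python, not the book's pseudocode):

  def build_adjacency(n, edges):
      """Adjacency lists for an undirected graph on nodes 0..n-1; O(m+n) space."""
      adj = [[] for _ in range(n)]
      for u, v in edges:
          adj[u].append(v)                  # each edge shows up in two lists,
          adj[v].append(u)                  # so the total list length is 2m
      return adj

  def dfs(adj, s):
      """Iterative DFS using an explicit stack (last in, first out)."""
      explored, stack, order = set(), [s], []
      while stack:
          u = stack.pop()                   # most recently discovered node first
          if u in explored:
              continue
          explored.add(u)
          order.append(u)
          for v in adj[u]:
              if v not in explored:
                  stack.append(v)
      return order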

3.4: Testing Bipartiteness: An Application of Breadth-First Search

Testing bipartiteness asks whether the nodes fit cleanly into two groups such that nodes from the same group are never connected – every edge has one node in each of the two groups. Pick a node s, color it red, color all of its neighbors blue, their undiscovered neighbors red, and so on, alternating colors layer by layer. If we can go through the whole graph this way without any edge getting both of its ends the same color, we have a bipartite graph. This approach is effectively a breadth-first search, since we look in layers at which nodes are connected; coloring each node based on which layer it is in is a simple constant-time action. Then we run through the edges and see if any edge has both endpoints in the same color class. This process has the same runtime as a basic breadth-first search, O(m+n) with n nodes and m edges in the graph. We can use this algorithm to split data into two distinct groups whenever such a split exists. I have no questions about this problem; it is a very straightforward algorithm. (5/10 – This was a very readable section, but we already learned this well.)
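A short sketch of the 2-coloring BFS just described (my own Python, for a connected graph given as an adjacency dict):

  from collections import deque

  def is_bipartite(adj, s):
      """2-color a connected graph by BFS; returns False if some edge joins two same-color nodes."""
      color = {s: 0}                        # 0 = "red", 1 = "blue"
      queue = deque([s])
      while queue:
          u = queue.popleft()
          for v in adj[u]:
              if v not in color:
                  color[v] = 1 - color[u]   # opposite color, i.e. the next BFS layer
                  queue.append(v)
              elif color[v] == color[u]:    # an edge within one layer: an odd cycle exists
                  return False
      return True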

3.5: Connectivity in Directed Graphs

An edge (u,v) in a directed graph goes from node u to node v; it is the same type of relationship as following hyperlinks on a Wikipedia page – once you follow a hyperlink in the middle of an article, there is no link waiting to take you back along the same path to the page you were just reading. We use adjacency lists (and matrices) to keep track of relationships in directed graphs: one list keeps track of the nodes that lead to a target node, and another keeps track of the nodes to which the target node leads along edges originating at the target. Breadth-first search is similar to how it worked in an undirected graph, except that we only explore edges that lead out from a node. The nodes in layer j are exactly those whose shortest path from s has j edges. In the worst case this breadth-first search runs the same as the original, O(m+n). Strong connectivity means that each node has a path to every other node and every other node has a path back to it. Testing this takes only linear time: we run a BFS in the normal graph G, then in the reverse graph G^rev, and if either search fails to reach every node there is no strong connectivity. Running the search twice keeps the original runtime of O(m+n). The strong component containing a node s is the set of all nodes that are mutually reachable with s. I found this section very helpful in understanding directed graphs and how to traverse them. (8/10)
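A sketch of the two-search test for strong connectivity (my own Python; adj holds out-edges and radj the same graph with every edge reversed):

  def strongly_connected(adj, radj, s):
      """True iff s reaches every node in G and in G^rev, i.e. G is strongly connected."""
      def reachable(graph, start):
          seen, stack = {start}, [start]
          while stack:
              u = stack.pop()
              for v in graph[u]:
                  if v not in seen:
                      seen.add(v)
                      stack.append(v)
          return seen
      n = len(adj)
      return len(reachable(adj, s)) == n and len(reachable(radj, s)) == n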

3.6: Directed Acyclic Graphs and Topological Ordering

An undirected graph with no cycles is a tree within each connected component. Directed graphs, on the other hand, can have complex structures while containing no directed cycles; such a graph is called a DAG. DAGs are very useful for representing prerequisites and other dependency relationships. A topological ordering of a DAG is any ordering of the nodes such that a node always comes before every node to which it points; if such an ordering exists, the graph is a DAG by implication. To create a topological ordering we use an algorithm that evaluates these relationships. First, we need a node with no incoming edges, which exists in every DAG; then, by induction, we can build a topological ordering with O(n^2) total running time. An O(m+n) algorithm exists that repeatedly deletes nodes that have no incoming edges left among the not-yet-ordered nodes. I found this piece well written and helpful. (7/10)
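A sketch of that O(m+n) deletion-based approach (my own Python; every node must appear as a key of adj, possibly with an empty successor list):

  from collections import deque

  def topological_order(adj):
      """adj: dict node -> list of successors in a DAG. Runs in O(m+n)."""
      indegree = {u: 0 for u in adj}
      for u in adj:
          for v in adj[u]:
              indegree[v] += 1
      ready = deque(u for u in adj if indegree[u] == 0)   # nodes with no incoming edges
      order = []
      while ready:
          u = ready.popleft()
          order.append(u)
          for v in adj[u]:
              indegree[v] -= 1              # "delete" u's outgoing edges
              if indegree[v] == 0:
                  ready.append(v)
      return order                          # shorter than n only if a directed cycle exists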

4 Preface: Greedy Algorithms

Greedy algorithms build up a solution piece by piece, at each step picking the piece that looks best under some optimization criterion. When this greedy strategy works, it produces an optimal solution – one on which we cannot improve. We use such an algorithm for finding shortest paths, and for a multitude of other applications that require choosing the best possible combination of outputs. (1/10)

4.1: Interval Scheduling: The Greedy Algorithm Stays Ahead

Interval scheduling, first addressed in the first chapter, looks to collect a group of compatible (non-overlapping) tasks that is as large as possible (in this version every request has equal value, so we simply maximize the number of accepted requests). Starting with the earliest start time may not be optimal, since accepting a long first task can force us to reject many tasks that overlap it. Choosing the smallest interval of time doesn't guarantee that conflicts are minimized, and choosing the request with the fewest conflicts can also fail, since the "best" choice may still conflict with many very short tasks that all happen at the same time. We eventually realize that the earliest finish time is the right way to build an optimal collection of tasks: we repeatedly take the request with the next earliest finish time that doesn't conflict with any prior selection. This rule "stays ahead", so no other set of choices could improve on the size of the solution it finds. The runtime can be O(n log n), since we sort the n requests by finish time and then find an optimal solution with our greedy algorithm in O(n). Similarly, for the related problem of scheduling every interval using as few resources as possible, a greedy algorithm that goes through the intervals in order of start time and assigns each a label not in use by any conflicting interval needs a number of labels equal to the maximum depth of the set of intervals, which is always the minimum possible number of labels. (9/10)
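A minimal sketch of the earliest-finish-time rule (my own Python, not the book's pseudocode):

  def interval_schedule(requests):
      """requests: list of (start, finish) pairs; returns a maximum-size compatible subset."""
      accepted = []
      last_finish = float('-inf')
      for start, finish in sorted(requests, key=lambda r: r[1]):   # earliest finish first
          if start >= last_finish:          # compatible with everything chosen so far
              accepted.append((start, finish))
              last_finish = finish
      return accepted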

4.2 Scheduling to Minimize Lateness: An Exchange Argument

We want to use a single resource to process n requests, each of which needs the resource for a certain interval of time. Each request i has length t_i and hopes to be done by a deadline d_i. If we start request i at time s(i), it finishes at f(i) = s(i) + t_i, and its lateness is the amount by which f(i) exceeds d_i (zero if it finishes on time). We choose an order in which to finish the requests and try to minimize the maximum lateness over all of them. Our greedy algorithm goes in order of earliest deadline first. This algorithm leaves no gaps of unused time, and all schedules it produces have the same maximum lateness L. An inversion is a pair of jobs in which the one with the later deadline is scheduled before the one with the earlier deadline. The schedules produced by our algorithm have no inversions, and swapping the two jobs of an inversion in any other schedule never increases the maximum lateness, which is how we show our schedule is optimal. I finally understand what the deal with inversions is now. (7/10)
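A short sketch of the earliest-deadline-first rule (my own Python, with made-up names):

  def minimize_max_lateness(jobs):
      """jobs: list of (length, deadline). Schedule in order of earliest deadline first."""
      t = 0                                 # current time; no idle gaps are left
      schedule, max_lateness = [], 0
      for length, deadline in sorted(jobs, key=lambda j: j[1]):
          start, finish = t, t + length
          schedule.append((start, finish, deadline))
          max_lateness = max(max_lateness, finish - deadline)   # lateness is 0 if on time
          t = finish
      return schedule, max_lateness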

4.4 Shortest Paths in a Graph

This is the application of greedy design to shortest paths in graphs. We have a start node s and want to find the shortest path from s to every other node. We go through the nodes incident to what has been explored so far and build up shortest paths one node at a time, then continue through the newly reached nodes and the edges not yet explored from them. If we find that a node can be reached through others by a shorter path than the one already recorded for it, we re-record the path length; a newly explored node gets the length of the shortest path to the node before it plus the length of the edge from that node to itself. This is Dijkstra's algorithm, and it works so long as edges do not have negative lengths. A straightforward implementation runs in O(mn) time, since each of the n iterations may scan the edges leaving the explored set to find the minimum candidate path length. With a heap-based priority queue this can run in O(m log n). I'm not very clear on the implementation run times given for this problem. (7/10 – didn't enjoy, but learned a lot from it)
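A heap-based sketch of Dijkstra's algorithm (my own Python using heapq; adj maps each node to a list of (neighbor, nonnegative length) pairs):

  import heapq

  def dijkstra(adj, s):
      """Shortest-path distances from s; O(m log n) with a heap-based priority queue."""
      dist = {s: 0}
      heap = [(0, s)]                       # (candidate distance, node)
      while heap:
          d, u = heapq.heappop(heap)
          if d > dist.get(u, float('inf')):
              continue                      # stale entry; u already has a shorter recorded path
          for v, length in adj[u]:
              nd = d + length
              if nd < dist.get(v, float('inf')):
                  dist[v] = nd              # found a shorter path to v through u: re-record it
                  heapq.heappush(heap, (nd, v))
      return dist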

4.5 The Minimum Spanning Tree Problem

Problem: we want to build a communication network over a connected graph of n nodes, where each edge between two nodes has a cost. The minimum-cost solution is a tree over all the nodes in V, our connected graph. A solution that is a tree is a spanning tree, and a lowest-cost spanning tree (the one with the lowest sum of edge weights) is a minimum spanning tree.

Kruskal’s: Start with no edges, build a spanning tree by inserting edges by increasing weights as long as they do not create a cycle. Produces a minimum spanning tree.

Prim’s: Start with root node s and build a tree out by adding the cheapest edge to our current tree that adds a new node. Produces a minimum spanning tree. We can implement both of the above in O(mlogn) time. Prim’s uses a heap-based priority queue to achieve this time.

Reverse-Delete: Start with the full graph and delete edges in order of decreasing cost, as long as the edge deleted would not disconnect the graph. Produces a minimum spanning tree.

Cut Property: Assume all edge costs are distinct. Let e be the minimum-cost edge with one end in some subset of nodes S and the other in V-S. Then every minimum spanning tree contains e.

Cycle Property: When C is a cycle, and e is the most expensive edge in the cycle, e will not be in any minimum spanning tree. (8/10 – not particularly readable)
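To make Prim's rule above concrete, here is a heap-based sketch (my own Python, assuming an adjacency-list dict of (neighbor, cost) pairs; not the book's pseudocode):

  import heapq

  def prim_mst(adj, root):
      """adj: dict node -> list of (neighbor, cost) for a connected graph. O(m log n)."""
      in_tree = {root}
      tree_edges, total = [], 0
      heap = [(cost, root, v) for v, cost in adj[root]]
      heapq.heapify(heap)
      while heap and len(in_tree) < len(adj):
          cost, u, v = heapq.heappop(heap)  # cheapest edge leaving the current tree
          if v in in_tree:
              continue
          in_tree.add(v)
          tree_edges.append((u, v, cost))
          total += cost
          for w, c in adj[v]:
              if w not in in_tree:
                  heapq.heappush(heap, (c, v, w))
      return tree_edges, total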

4.6 Implementing Kruskal’s Algorithm: The Union-Find Data Structure

The goal is to maintain the set of connected components of a graph as edges are added over time, so we store the components in a Union-Find structure for fast search and update. Find(u) returns the name of the set containing u. As we add an edge (u,v), if u and v are already in the same set we have nothing to do; otherwise, we use Union(Find(u),Find(v)) to merge the connected components of the nodes on each end of the edge before adding the edge itself. This structure has three operations: MakeUnionFind(S), Find(u), and Union(A,B), with O(n), O(log n), and O(log n) respective target running times. In the simple implementation we make an array Component to hold the name of the set containing each element; we can then look up an element's set in constant time, and we merge two sets in at most O(n) time, with any k Union operations taking at most O(k log k) in total. We want to improve the worst case of Union at the cost of increasing the Find time to O(log n), evening out the running times, and we use pointers to make this happen: Union takes O(1), MakeUnionFind O(n), and Find O(log n), with every node keeping a pointer toward the name of the set to which it belongs. I don't quite get why we have O(log n) here for Find. We use this data structure to implement Kruskal's by sorting edges by cost in O(m log m) time, and since m is less than or equal to n^2, this is also O(m log n). After sorting, we use the structure to keep track of the connected components; we go through at most 2m Find and n-1 Union operations, so we have a total of O(m log n) runtime. (9/10)
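A sketch of the pointer-based structure together with Kruskal's algorithm on top of it (my own Python; union by size keeps the pointer chains short, which is why Find stays O(log n)):

  def make_union_find(nodes):
      """parent pointers plus set sizes; each element starts as its own set."""
      return {u: u for u in nodes}, {u: 1 for u in nodes}

  def find(parent, u):
      while parent[u] != u:                 # follow pointers up to the set's name
          u = parent[u]
      return u

  def union(parent, size, a, b):
      """Merge two set names (roots); pointing the smaller at the larger keeps depth O(log n)."""
      if size[a] < size[b]:
          a, b = b, a
      parent[b] = a
      size[a] += size[b]

  def kruskal(nodes, edges):
      """edges: list of (cost, u, v). Sort once in O(m log n), then grow the forest."""
      parent, size = make_union_find(nodes)
      tree = []
      for cost, u, v in sorted(edges):
          ru, rv = find(parent, u), find(parent, v)
          if ru != rv:                      # adding (u, v) does not create a cycle
              union(parent, size, ru, rv)
              tree.append((u, v, cost))
      return tree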

4.7 Clustering

Clustering is the case in which we group objects based on their distance from other objects; these distances are represented as weighted edges on a graph. When we k-cluster U, we partition the objects of U into k nonempty sets, and the spacing of the clustering is the minimum distance between any pair of points in different clusters. Our goal with clustering is to maximize the spacing. Our algorithm to k-cluster U outputs a graph in which the connected components are the clusters. The method runs Kruskal's Algorithm but stops just before it would add its final k-1 edges; equivalently, we could delete the k-1 most expensive edges of a minimum spanning tree. Any two of the resulting components must be separated by a distance of at least the cost of the cheapest edge we declined to add – the (k-1)th most expensive MST edge – and this spacing is the maximum achievable. I found the proof written out to be particularly confusing; while I understand another method of proving that the weight of the (k-1)th most expensive edge equals the maximum spacing, I cannot figure out how exactly to follow the proof given in the book. I really enjoyed this section. (9/10)
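A sketch of the single-linkage clustering procedure described above (my own Python with a tiny inlined Union-Find; names are made up):

  def k_clustering_spacing(nodes, edges, k):
      """edges: list of (distance, u, v). Apply Kruskal's rule, stopping at k clusters."""
      parent = {u: u for u in nodes}
      def find(u):                          # pointer-chasing Find with path compression
          while parent[u] != u:
              parent[u] = parent[parent[u]]
              u = parent[u]
          return u
      components = len(nodes)
      for dist, u, v in sorted(edges):
          ru, rv = find(u), find(v)
          if ru == rv:
              continue                      # u and v already share a cluster
          if components == k:
              return dist                   # first refused merge distance = spacing of the k-clustering
          parent[ru] = rv
          components -= 1
      return None                           # the graph could not be reduced to k components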

4.8 Huffman Codes and Data Compression

We now want to use a greedy rule to solve a problem in smaller components with recursion; being greedy in the smaller cases pays off in the original problem's long-run execution. We use such processes for encoding, in which we let sequences of bits stand for characters in larger alphabets. The simplest way to do this is to use a fixed number of bits with different combinations of 0s and 1s (b bits can represent 2^b distinct combinations). When characters have different frequencies, though, it makes sense to use fewer bits for the most frequently used characters – reducing the average number of bits per letter is data compression. Morse code is one encoding of the English alphabet. When one letter's encoding is a prefix (the first part) of another letter's encoding, we have ambiguity. A prefix code does not have any of this ambiguity: scanning left to right, as soon as the bits seen so far form the valid encoding of some letter, that letter must be the one intended. Once translated, we delete the scanned bits from the front of the encoded message before continuing to decode the remainder. Optimally, the most frequently used characters get the shortest available encodings, while the least frequently used take the longest encodings; when the average number of bits per letter is minimized, our code is optimal. A binary tree – in which going down-left appends a 0 to an encoding and going down-right appends a 1 – represents the problem. An encoding that ends at a leaf of the tree gives a prefix code, and non-leaf nodes are not characters but meta-characters that occur with a frequency equal to the sum of the frequencies of their children. The length of an encoding is the depth of its leaf, and an optimal prefix code corresponds to a full binary tree, since it has no encodings longer than necessary. We build such a tree from the bottom up to the root. Huffman's Algorithm builds an optimal tree this way (pg 172-173) in time O(k log k), where k is the size of the alphabet to encode, since we run through all k elements and perform each insertion and extraction in log k time. (8/10)
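A compact sketch of the bottom-up merging with a priority queue (my own Python; instead of building explicit tree nodes it carries each subtree's partial codes, which amounts to the same merging of the two lowest-frequency meta-characters):

  import heapq
  from itertools import count

  def huffman_code(freq):
      """freq: dict letter -> frequency. Returns dict letter -> bit string. O(k log k)."""
      tiebreak = count()                    # avoids comparing dicts when frequencies tie
      heap = [(f, next(tiebreak), {letter: ''}) for letter, f in freq.items()]
      heapq.heapify(heap)
      while len(heap) > 1:
          f1, _, code1 = heapq.heappop(heap)    # two lowest-frequency "meta-characters"
          f2, _, code2 = heapq.heappop(heap)
          merged = {ltr: '0' + bits for ltr, bits in code1.items()}     # left child: prepend 0
          merged.update({ltr: '1' + bits for ltr, bits in code2.items()})  # right child: prepend 1
          heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
      return heap[0][2]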

5.1 A First Recurrence: The Mergesort Algorithm

We look at Divide and Conquer algorithms starting with Mergesort, since it divides a list of numbers into two halves, sorts each half recursively, and combines the results. I found the beginning of the recurrence explanation confusing. A recurrence can be solved either by unrolling the first few levels to see a pattern and then summing the running times over all of the levels, or by guess-and-check with an induction on n. The first method, when used on Mergesort, shows that level j of the recursion has 2^j subproblems, each of size n/2^j, so each level costs at most cn time; there are at most log_2 n levels, so we have a total running time of O(n log n). The guess-and-check (substitution) method has us assume the bound holds for smaller inputs and plug the guess (n/2)log_2(n/2) into T(n) ≤ 2T(n/2) + cn; the inequality is preserved, so the induction goes through and our guess is correct. If plugging in a guess instead leads to a contradiction – the bound claimed for n cannot be derived from the bound assumed for n/2 – then the guess fails, and we must finish the induction without contradiction to have a proof in this way. (6/10)
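The algorithm behind the recurrence, written out as a short sketch (my own Python, not the book's pseudocode):

  def mergesort(items):
      """Divide, recursively sort each half, and merge: T(n) <= 2T(n/2) + cn, so O(n log n)."""
      if len(items) <= 1:
          return items
      mid = len(items) // 2
      left, right = mergesort(items[:mid]), mergesort(items[mid:])
      merged, i, j = [], 0, 0
      while i < len(left) and j < len(right):    # the O(n) combine step
          if left[i] <= right[j]:
              merged.append(left[i]); i += 1
          else:
              merged.append(right[j]); j += 1
      return merged + left[i:] + right[j:]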

5.2 Further Recurrence Relations

We want to consider algorithms that recursively call on q subproblems of size n/2 and then combine the results in O(n) time; Mergesort is an example of such behavior for q=2. T(n) follows the recurrence relation (for some constant c) T(n) ≤ qT(n/2) + cn for n>2, and T(2) ≤ c. We treat each value of q separately. For q>2, we start by analyzing the first levels: the first level has a single problem of size n, for a total of cn, and the second level has q problems of size n/2. In general, level j of the recursion has q^j problems of size n/2^j, so the total work on level j is q^j (cn/2^j) = cn(q/2)^j, and summing this geometric series gives a total workload of O(n^(log_2 q)). When q=1, most of the work is in the top level and the total is O(n). For q=2, each level has the same amount of work and the workload is O(n log n); for q>2 the majority of the work is done at the bottom levels. When recombination requires O(n^2) time instead of O(n), the recurrence T(n) ≤ 2T(n/2) + cn^2 is dominated by the top level and solves to O(n^2), considering how quickly (n/2^j)^2 decreases as j increases. I find the T(n) notation confusing, but understand how we arrived at each of the upper bounds in some way or another. (7/10)
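Summarizing the three regimes in one place (a standard statement of these bounds, written out in LaTeX for my own reference):

  T(n) \le q\,T(n/2) + cn \quad\Longrightarrow\quad
  T(n) = \begin{cases}
  O(n) & q = 1 \quad\text{(work dominated by the top level)}\\
  O(n \log n) & q = 2 \quad\text{(equal work on every level)}\\
  O(n^{\log_2 q}) & q > 2 \quad\text{(work dominated by the bottom level)}
  \end{cases}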

5.3 Counting Inversions

We start with a problem that compares rankings in order to match preferences with a target audience by the process of collaborative filtering; this method could also be used for meta-search, which combines the results of searching multiple search engines on the web. Given a set of n items in a specific rank order, collaborative filtering looks for other users who have ranked the items similarly to your order. To carry this out, the process counts inversions, the number of pairs of items that are “out of order” between the two rankings. When a ranking fully disagrees with yours, there are n choose 2 inversions, the maximum possible. The simplest algorithm would look through each pair in O(n^2) time, but we can find an algorithm with O(n log n) instead. We divide the list into two halves, count the inversions within each half recursively, and then combine the solutions in O(n). In the combine step we walk through the two sorted half-lists and repeatedly append the next smallest item to a final sorted list while counting inversions, which takes O(n) in total. Since A and B are sorted, every time an item from B goes onto the combined list C it creates inversions equal to the number of items remaining in A that have not yet been appended to C. I'm completely confused about how we sort lists A and B when they are rankings assigned by the user. (9/10)
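A sketch of merge-and-count (my own Python; the inversion count is accumulated exactly at the moment an element of the right half jumps ahead of what remains of the left half):

  def sort_and_count(a):
      """Return (number of inversions in a, sorted copy of a). O(n log n)."""
      if len(a) <= 1:
          return 0, a
      mid = len(a) // 2
      left_inv, left = sort_and_count(a[:mid])
      right_inv, right = sort_and_count(a[mid:])
      merged, i, j, split_inv = [], 0, 0, 0
      while i < len(left) and j < len(right):
          if left[i] <= right[j]:
              merged.append(left[i]); i += 1
          else:
              merged.append(right[j]); j += 1
              split_inv += len(left) - i    # every element still waiting in left is inverted with right[j]
      merged += left[i:] + right[j:]
      return left_inv + right_inv + split_inv, merged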

5.4 Finding the Closest Pair of Points

When we have n points in a plane, we want to find the pair of points that is closest together, with a solution more efficient than the obvious O(n^2) of looking at every pair of points. We can find an algorithm that completes the problem in O(n log n) at this point, and eventually in O(n) expected time with randomization. We want to find the points p_i, p_j that minimize d(p_i,p_j). In one dimension this can be done by sorting the points in O(n log n) time and then walking through the sorted list looking at the distance from each point to the next. The first step of our algorithm is the same – we sort the points by their x-coordinate. We then divide the set in half so that we have two sets of size n/2 (the left half taking the ceiling). We also keep the points ordered by y-coordinate, and the bookkeeping to maintain both orders within a recursive call is O(n). Recursively, we find the closest pair in each of the left and right halves and set delta equal to the smaller of the two distances. We then look at all points within delta of the line we used to separate the sets, and if any two of them are closer than delta to one another, they are the new closest pair. This algorithm is based heavily on recursion and on knowing how to combine our data efficiently, producing a total running time of O(n log n). About halfway through reading this section I finally understood exactly what we had talked about in class and was able to sketch the solution before reviewing the class notes and remainder of the text. This was an extremely helpful section. (9/10)
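A compact sketch of the divide-and-conquer just described (my own Python, assuming distinct points; inside the delta-strip it only checks a constant number of y-neighbors, which is the standard packing argument):

  import math

  def closest_pair(points):
      """points: list of distinct (x, y) tuples, len >= 2. Returns (distance, pair). O(n log n)."""
      px = sorted(points)                           # sorted by x
      py = sorted(points, key=lambda p: (p[1], p[0]))   # the same points sorted by y
      return _closest(px, py)

  def _closest(px, py):
      n = len(px)
      if n <= 3:                                    # brute-force the tiny cases
          return min((math.dist(px[i], px[j]), (px[i], px[j]))
                     for i in range(n) for j in range(i + 1, n))
      mid = n // 2
      left_set = set(px[:mid])
      qy = [p for p in py if p in left_set]         # left half, still in y order
      ry = [p for p in py if p not in left_set]     # right half, still in y order
      delta, best = min(_closest(px[:mid], qy), _closest(px[mid:], ry))
      mid_x = px[mid][0]                            # x-coordinate of the dividing line
      strip = [p for p in py if abs(p[0] - mid_x) < delta]
      for i, p in enumerate(strip):
          for q in strip[i + 1:i + 8]:              # only nearby y-neighbors can beat delta
              d = math.dist(p, q)
              if d < delta:
                  delta, best = d, (p, q)
      return delta, best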

5.5 Integer Multiplication

For integer multiplication we use more than two recursive calls on each level. The elementary-school method of finding a product is O(n^2), but we can do better. Writing an n-bit number in base 2, x_1 corresponds to the “high-order” n/2 bits and x_0 to the “low-order” n/2 bits. Then xy = x_1y_1*2^n + (x_1y_0 + x_0y_1)*2^(n/2) + x_0y_0. If we recursively solve all four subproblems of size n/2, combining the solutions takes O(n), so the recurrence is T(n) ≤ 4T(n/2) + cn with q = 4, giving T(n) = O(n^(log_2 q)) = O(n^2) – no better than before. To bring q down to 3 and the whole algorithm to O(n^1.59), we recursively compute x_1y_1, x_0y_0, and the single extra product (x_1 + x_0)(y_1 + y_0); then x_1y_0 + x_0y_1 = (x_1 + x_0)(y_1 + y_0) - x_1y_1 - x_0y_0, so three half-size multiplications suffice. Each recursion is on a size of n/2 asymptotically, and in this better instance we have T(n) ≤ 3T(n/2) + cn. From the earlier parts of the chapter, we know O(n^(log_2 3)) = O(n^1.59) is our runtime. This problem was easy to understand, but the book glossed over how we achieve the q = 3 case a bit too much. (5/10)
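A sketch of the three-product trick on Python integers (my own code; the split size and base case are arbitrary choices):

  def karatsuba(x, y):
      """Multiply nonnegative integers with three recursive half-size products: O(n^1.59)."""
      if x < 16 or y < 16:                  # small base case: fall back to built-in multiply
          return x * y
      half = max(x.bit_length(), y.bit_length()) // 2
      x1, x0 = x >> half, x & ((1 << half) - 1)   # high-order and low-order halves
      y1, y0 = y >> half, y & ((1 << half) - 1)
      a = karatsuba(x1, y1)
      b = karatsuba(x0, y0)
      c = karatsuba(x1 + x0, y1 + y0)       # the third product replaces the two cross terms
      return (a << (2 * half)) + ((c - a - b) << half) + b   # since c - a - b = x1*y0 + x0*y1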

6.1 Weighted Interval Scheduling: A Recursive Procedure

Chapter 6 is about Dynamic Programming, which is nearly the opposite of greedy: it implicitly considers every possible solution by defining a series of subproblems and then building up correct solutions to larger and larger subproblems, in a method that is close to brute force but without examining every solution directly. We first use this method in a recursive form for weighted interval scheduling, where we want to accept a set of non-overlapping intervals of maximum total value; we don't know any greedy algorithm that optimally solves this general form of the problem. Each request i has start time s_i, finish time f_i, and value v_i, and we want to choose a subset S of {1, ..., n} with no overlap among the elements of S such that the sum of the values of the elements of S is maximized. To solve this, we define OPT(j) as the value of an optimal solution to the smaller problem on requests {1, 2, ..., j} (in order of increasing finish time); we are looking for OPT(n). Letting p(j) be the largest index of a job that does not overlap with j, we have OPT(j) = max(v_j + OPT(p(j)), OPT(j-1)), and j belongs to an optimal solution if and only if v_j + OPT(p(j)) >= OPT(j-1). Assuming we have already sorted by finish time and figured out p(i) for each i, we have our recurrence. By induction on j, the recursive procedure is correct [Comp-Opt(0) = 0 trivially; for j>0, assume Comp-Opt(i) is correct for 0 ≤ i < j; then Comp-Opt(p(j)) = OPT(p(j)) and Comp-Opt(j-1) = OPT(j-1), so OPT(j) = max(v_j + Comp-Opt(p(j)), Comp-Opt(j-1)) = Comp-Opt(j)]. But the plain recursion would take exponential time, so we must memoize it (save previously calculated results) to bring the time down to polynomial, our ultimate goal. We only ever need Comp-Opt(0), ..., Comp-Opt(n), which is n+1 values, so we can store each Comp-Opt value in an array M when it is first calculated so that it is accessible later. Each single call is O(1) beyond its lookups, and only O(n) of these values are ever computed, so M-Comp-Opt with memoization runs in our desired O(n) after sorting. In addition to the optimal value, we should keep track of the set of intervals itself, so we add this functionality by getting the optimal solution back out of the array M after the optimal value is achieved. We create a Find-Solution(j) recursion that checks whether v_j + M[p(j)] >= M[j-1]; if so, it outputs j in addition to the result of Find-Sol'n(p(j)), and if not, it outputs the result of Find-Sol'n(j-1). This takes O(n) calls with O(1) per call, so it returns the solution in O(n) time once it has the array M of the optimal values of the subproblems. I'm not entirely clear on where the array M comes from, but the remainder of the solution is very clear. This was a very helpful section to read though it took a few rounds to understand. (8/10)
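To tie the pieces together, a memoized sketch of Comp-Opt plus Find-Solution (my own Python; the "array M" here is just a dictionary filled in as values are first computed, which is where M comes from; jobs are assumed to have start < finish):

  import bisect

  def weighted_interval_schedule(jobs):
      """jobs: list of (start, finish, value). Memoized recursion over OPT(j)."""
      jobs = sorted(jobs, key=lambda j: j[1])            # order by finish time
      finishes = [f for _, f, _ in jobs]
      # p[j-1] is the book's p(j): how many jobs finish by the time job j starts.
      p = [bisect.bisect_right(finishes, s) for s, _, _ in jobs]
      M = {0: 0}                                         # memo: M[j] = OPT(j)
      def opt(j):
          if j not in M:
              s, f, v = jobs[j - 1]
              M[j] = max(v + opt(p[j - 1]), opt(j - 1))  # include job j, or skip it
          return M[j]
      def find_solution(j):                              # recover the chosen set of jobs
          if j == 0:
              return []
          s, f, v = jobs[j - 1]
          if v + opt(p[j - 1]) >= opt(j - 1):
              return find_solution(p[j - 1]) + [(s, f, v)]
          return find_solution(j - 1)
      return opt(len(jobs)), find_solution(len(jobs))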

6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems

We here convert the last section's weighted interval scheduling method, with M-Comp-Opt, the array M, and Find-Sol'n, to explain and study dynamic programming by iterating through subproblems instead of using explicit recursion. The array M is important because it stores the optimal value of each subproblem; once we have filled the entire array, we have the optimal value, and tracing back with Find-Sol'n just reads the data in M and converts it into the indices of an optimal solution set S. Instead of memoized recursion, we can simply increment j from 1 to n and store in M[j] the maximum given by the recurrence we have already stated. By induction, as before, we can show that this computes OPT(j) as M[j]. It is clearly an O(n) algorithm, as it goes through n subproblems and chooses and assigns a maximum in constant time at each step. Either manner works, but we will focus on the iterative build-up of subproblems. We need to choose subproblems such that they are (informally) polynomial in number, the original problem's solution can be computed from them, and they have a natural ordering and an easy recurrence so that each can be calculated from “smaller” subproblems. Though we may want to begin by ordering the subproblems, it is easier to first define their recurrence to be sure that they will be useful subproblems. This section was particularly readable, and easy. It was frustrating that the iterative algorithm is so stinking simple after the last section was so hard to read. What does the book mean by these being informal guidelines for the subproblems? I liked this section a lot, but it was really vague. (8/10)
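The iterative version is short enough to write out; a sketch in the same style as my 6.1 code (jobs sorted by finish time and p passed in; names are my own):

  def iterative_opt(jobs, p):
      """Iterative build-up of the array M for weighted interval scheduling.
      jobs: list of (start, finish, value) sorted by finish; p[j-1] is the book's p(j)."""
      n = len(jobs)
      M = [0] * (n + 1)
      for j in range(1, n + 1):             # each entry is a constant-time max
          value = jobs[j - 1][2]
          M[j] = max(value + M[p[j - 1]], M[j - 1])
      return M                              # M[n] is the optimal value; trace back as in 6.1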

6.3 Segmented Least Squares: Multi-Way Choices

We now look at a dynamic programming problem where we do not have the binary choice of an interval being inside or outside the optimal solution; instead, there are multiple possibilities for the solution's structure. Here we are fitting lines through a scattered plot of data points. A single line of best fit is the one minimizing the cumulative squared error – the sum of the squared vertical distances of the points from the line. Since the points may follow more than one line, such as data that tracks three line segments, we want to fit the data with as few lines as possible while keeping track of the points where the curve changes from one line to another. To keep the number of lines small, we use a penalty system: the penalty of a partition is the number of segments times a positive multiplier C, plus the sum over segments of the error of the optimal line through that segment. We want the partition of minimum penalty; since the multiplier is not fixed, we can set it higher to penalize using more lines. This problem is a partition of n objects. If the last segment of the optimal partition is i through n, then the optimal value is OPT(n) = e_(i,n) + C + OPT(i-1), where e_(i,n) is the minimum error of a line through points i through n, and in general OPT(j) = min over 1 ≤ i ≤ j of (e_(i,j) + C + OPT(i-1)). We trace back through the array M to produce the partition itself. Computing all of the errors e_(i,j) takes O(n^3) over the n^2 pairs, and filling in the array and finding the segments takes O(n^2) after the errors have been calculated. I am not entirely clear on how we find the formulas for the different lines, and this section was not particularly readable. (5/10)
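Since my question was where the line formulas come from, here is a sketch that spells them out: for points (x_1,y_1),...,(x_m,y_m), the least-squares line y = ax + b has a = (m*Sum(xy) - Sum(x)*Sum(y)) / (m*Sum(x^2) - (Sum(x))^2) and b = (Sum(y) - a*Sum(x)) / m. These are the standard least-squares formulas, not a quote from the book; the code is my own and deliberately uses the naive O(n^3) error computation.

  def segmented_least_squares(points, C):
      """points: list of (x, y) sorted by x; C: penalty per segment. Returns (cost, segments)."""
      n = len(points)
      err = [[0.0] * (n + 1) for _ in range(n + 1)]      # err[i][j]: best-line error on points i..j
      for i in range(1, n + 1):
          sx = sy = sxx = sxy = 0.0
          for j in range(i, n + 1):
              x, y = points[j - 1]
              sx += x; sy += y; sxx += x * x; sxy += x * y
              m = j - i + 1
              denom = m * sxx - sx * sx
              a = (m * sxy - sx * sy) / denom if denom else 0.0   # slope (0 if all x's coincide)
              b = (sy - a * sx) / m
              err[i][j] = sum((yy - a * xx - b) ** 2 for xx, yy in points[i - 1:j])
      OPT = [0.0] * (n + 1)
      back = [0] * (n + 1)
      for j in range(1, n + 1):                          # OPT(j) = min_i e(i,j) + C + OPT(i-1)
          OPT[j], back[j] = min((err[i][j] + C + OPT[i - 1], i) for i in range(1, j + 1))
      segments, j = [], n
      while j > 0:                                       # trace back through the chosen last segments
          segments.append((back[j], j))
          j = back[j] - 1
      return OPT[n], segments[::-1]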

6.4 Subset Sums and Knapsacks: Adding a Variable

We add to our scheduling problems by looking at n requests to use a resource between time 0 and W, where request i requires time w_i (its weight); we want the sum of the weights of the accepted requests to be as large as possible without exceeding W. We call this a knapsack problem because we fill a knapsack with items of value v_i and weight w_i so that the total weight does not exceed W and the total value is maximized. We need to work out enough subproblems that we know both OPT(n-1)'s value and the best possible solution on the first n-1 items with remaining weight W-w_n. Then OPT(i,w) = max(OPT(i-1,w), w_i + OPT(i-1,w-w_i)) for the subset sum version, where the second option is allowed only when w_i ≤ w (in the version with values, the second option becomes v_i + OPT(i-1,w-w_i)). To compute this we build a table of the values OPT(i,w), computing each value only once. Finding M[i,w] uses only two other values of the table, so each entry takes constant time, and the running time grows in proportion to the number of entries: Subset-Sum(n,W) runs in O(nW) time, which is pseudo-polynomial and grows very quickly with large inputs. Given the filled table M, we can find the optimal set of items in O(n) time, so our total running time is O(nW). I found this section very readable and helpful, but my understanding was dependent on having already studied this in class. (6/10)
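A sketch of the table build and the O(n) trace-back for the subset sum version (my own Python, not the book's pseudocode):

  def subset_sum(weights, W):
      """M[i][w] = largest total weight achievable from the first i items within capacity w. O(nW)."""
      n = len(weights)
      M = [[0] * (W + 1) for _ in range(n + 1)]
      for i in range(1, n + 1):
          wi = weights[i - 1]
          for w in range(W + 1):
              M[i][w] = M[i - 1][w]                               # skip item i
              if wi <= w:
                  M[i][w] = max(M[i][w], wi + M[i - 1][w - wi])   # or take it
      chosen, w = [], W
      for i in range(n, 0, -1):                                   # trace back the chosen items in O(n)
          if M[i][w] != M[i - 1][w]:
              chosen.append(i - 1)                                # item i (0-based index) was taken
              w -= weights[i - 1]
      return M[n][W], chosen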

6.6 Sequence Alignment

We use sequence alignment to compare strings, and the problem can also be viewed as a shortest-path problem in a graph whose edges may carry any integer weights. When we compare strings we can insert gaps to minimize mismatches. To define the similarity of two strings, we look at an alignment: a set of matched pairs of positions, one from each string, in order and without crossings, with gap positions left unmatched. For example, if the second position of the first word matches the first position of the second word, we add the pair (2,1) to the alignment. When a position is matched to nothing, it sits opposite a gap, which costs delta > 0; a mismatch between two different letters costs a specified amount, and aligning the same letter costs nothing. The total cost is the sum of the gap and mismatch costs. An alignment that uses mismatched letters beats one that inserts gaps everywhere only if the sum of its mismatch costs is less than the extra gap penalties it avoids paying. Similarity is therefore the minimum cost of an alignment between X and Y. If we have an optimal alignment M, then either (m,n) is in M, the mth position of X is not matched, or the nth position of Y is not matched. The recurrence for i and j is OPT(i,j) = min(alpha_(x_i,y_j) + OPT(i-1,j-1), delta + OPT(i-1,j), delta + OPT(i,j-1)), and if (i,j) is in our optimal M, then the first of these WILL achieve the minimum value. This runs in O(mn) time, since the array A has O(mn) values and we spend at most constant time on each. In the graph view, the minimum-cost path to (i,j) satisfies f(i,j) = OPT(i,j). I thought this section was a bit confusing because it failed to give an algorithm for how exactly to choose where to put gaps. (8/10)
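The gap placement falls out of the recurrence itself: wherever the second or third option wins the min, a gap is charged. A sketch of the cost table (my own Python; alpha is passed in as a function, which is an assumption of this sketch, not the book's interface):

  def alignment_cost(X, Y, delta, alpha):
      """Minimum alignment cost: gap penalty delta, mismatch cost alpha(a, b). O(mn)."""
      m, n = len(X), len(Y)
      A = [[0.0] * (n + 1) for _ in range(m + 1)]
      for i in range(1, m + 1):
          A[i][0] = i * delta               # aligning X[:i] against nothing costs i gaps
      for j in range(1, n + 1):
          A[0][j] = j * delta
      for i in range(1, m + 1):
          for j in range(1, n + 1):
              A[i][j] = min(alpha(X[i - 1], Y[j - 1]) + A[i - 1][j - 1],   # match or mismatch
                            delta + A[i - 1][j],                           # x_i opposite a gap
                            delta + A[i][j - 1])                           # y_j opposite a gap
      return A[m][n]

  # Example use: alignment_cost("name", "mean", delta=2, alpha=lambda a, b: 0 if a == b else 1)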

7.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm

Transportation uses a network model in which edges carry traffic and nodes let you switch between different highways; such networks have capacities on the edges, source nodes that generate traffic, and sink nodes that take in and eliminate traffic. The highways have directions along which the traffic flows, so they are directed graphs with a capacity c_e on each edge e, one source node s, and one sink node t. We want to move the traffic around so that we efficiently use the capacities available. The maximum flow value equals the minimum capacity over all cuts, called the minimum cut. To find the maximum flow, we push as much flow as possible along paths from s to t; we are also allowed to push flow backwards along edges that already carry flow, and the residual graph is what records this leftover forward and backward capacity. I do not understand at all how the pushing backwards of flow works. We can show that the Ford-Fulkerson Algorithm terminates while computing an s-t flow on G after at most C iterations of the while loop, where C is the sum of the capacities of the edges leaving s. Each iteration can be bounded by O(m), so the total is O(mC). This section confused me instead of clearing things up. (4/10)
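Because the backwards pushing confused me, here is a sketch where the residual graph is explicit: sending f units along (u,v) leaves c_e - f units of forward capacity and creates f units of "undo" capacity on (v,u). This is my own Python (it finds augmenting paths with BFS, the Edmonds-Karp flavor), not the book's pseudocode.

  from collections import deque

  def max_flow(capacity, s, t):
      """capacity: dict {(u, v): c_e} for a directed graph. Returns the max s-t flow value."""
      residual = dict(capacity)             # residual capacities; reverse edges start at 0
      for (u, v) in capacity:
          residual.setdefault((v, u), 0)
      adj = {}
      for (u, v) in residual:
          adj.setdefault(u, []).append(v)
      flow = 0
      while True:
          parent = {s: None}                # BFS for a path with leftover residual capacity
          queue = deque([s])
          while queue and t not in parent:
              u = queue.popleft()
              for v in adj.get(u, []):
                  if v not in parent and residual[(u, v)] > 0:
                      parent[v] = u
                      queue.append(v)
          if t not in parent:
              return flow                   # no augmenting path left: the flow is maximum
          path, v = [], t
          while parent[v] is not None:      # walk back from t to s to recover the path
              path.append((parent[v], v))
              v = parent[v]
          bottleneck = min(residual[e] for e in path)
          for (u, v) in path:
              residual[(u, v)] -= bottleneck
              residual[(v, u)] += bottleneck    # "pushing backwards": later paths may undo this flow
          flow += bottleneck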

7.2 Maximum Flows and Minimum Cuts in a Network

This section continues analyzing the Ford-Fulkerson algorithm from page 344. We want to show that the algorithm returns the maximum possible value of any flow in G, so we use cuts to place tighter upper bounds than just C. Dividing the nodes into two sets A and B, with s in A and t in B, gives an s-t cut. The capacity of this cut is c(A,B), the sum of c_e over all edges e leaving A, and v(f) ≤ c(A,B) for every flow f. Since every flow value is bounded above by the capacity of every cut, and the algorithm produces a flow and a cut of equal value, the maximum flow equals the minimum cut. We can compute an s-t cut of minimum capacity in O(m) additional time, since we can extend the algorithm easily. If all capacities are integers, there is a maximum flow in which the flow on every edge is an integer, though not every maximum flow has to be integer-valued. The Max-Flow Min-Cut Theorem can be proved even for real-valued capacities, but we work with integers in this problem. This section was easier to read and helped a bit with the Ford-Fulkerson algorithm, but I still don't understand residual graphs. (7/10)
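The "extend the algorithm easily" step is just one more graph search: after the last iteration, the nodes still reachable from s in the residual graph form the side A of a minimum cut. A sketch assuming the residual dict and adjacency structure from my 7.1 code are kept around (my own naming):

  def min_cut_side(residual, adj, s):
      """A = every node reachable from s along edges with leftover residual capacity."""
      A, stack = {s}, [s]
      while stack:
          u = stack.pop()
          for v in adj.get(u, []):
              if v not in A and residual[(u, v)] > 0:
                  A.add(v)
                  stack.append(v)
      return A                              # the edges leaving A are saturated; their capacities sum to v(f)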
