Alicia's Wiki

PREFACE

SUMMARY: Algorithms can give us a whole new way of looking at computer science problems. They can help us make sense of and find solutions to complicated problems, and even allow us to reframe the question, stripping away application-specific details to look only at the clean mathematical problem at the core.

REFLECTION: Do we always need to prove that an algorithm is correct with a mathematical proof? Are there some things that can't be proven? The preface is very readable (8/10) and gives good motivation for learning algorithm design.

CHAPTER 1

Section 1.1: The Stable Matching Problem

SUMMARY: The stable matching problem focuses on a situation where n entities have to match with n other entities in a one-to-one pairing. The “problem” arises when one party (x) would rather be matched with someone other than their current partner and that other party (y) would also prefer x to their current partner. This causes instability because one half of a matching could reject its partner for a more preferable match, and the abandoned partner would then have to seek out a new match. The instability could ripple outward and cause many pairs to split and re-form. A matching is stable when no such pair exists, i.e., both parties of ALL pairs are as happy as they can possibly be. A stable matching can be found using the Gale-Shapley (G-S) algorithm, which terminates after at most n^2 iterations. In this algorithm, one side (e.g., men or companies) makes proposals to the other side (e.g., women or interns). If the member of the second side has not been matched, they must accept; if they are already matched, they keep or leave their current match depending on whether the new proposer is higher or lower in their ranking. This continues as long as there is an unmatched entity on the first side. The side that does the proposing always gets its best possible stable matching, while the side that gets proposed to always gets its worst possible stable matching.

REFLECTION: I appreciate that they gave real-world motivation for solving this problem by discussing matching interns with companies. However, the algorithm as described is very simple – I wonder how it can be broadened to provide matchings when there are ties among rankings or when one company has multiple internships available. Reading this chapter helped me understand the patterns that we discussed in class (a woman's match only gets better, a man's only gets worse, etc.). It was very interesting to see that every execution yields the same matching. This chapter was an 8/10 on the readability scale.

CHAPTER 2

Section 2.1: Computational Tractability

SUMMARY: When implementing an algorithm, we need to consider a few factors to determine whether our implementation is efficient or not. One way to evaluate this is to look at the running time on an input of size n. In calculating running time we tend to think in terms of the worst-case scenario, since it is difficult to say what an “average” input would be and therefore difficult to calculate average running time. A better definition of an efficient algorithm is one that has a better running time than the brute-force implementation. Probably the best definition is an algorithm that has a polynomial running time, which encompasses O(n), O(n log n), O(n^k), etc. As the input grows, the running time of polynomial-time algorithms is leagues better than that of exponential-time algorithms.

REFLECTION: This chapter is a 5/10 on the readability scale. It seems to me as though the authors are using unnecessarily complex words: “One further reason why the mathematical formalism and the empirical evidence seem to line up well in the case of polynomial-time solvability…” The way that this same material was presented in class was much more straightforward, which I greatly appreciated. I would like to see an example of a problem that has no polynomial time algorithm. Would that be the result of a very complex input or would the input just require a lot of manipulation?

Section 2.2: Asymptotic Order of Growth

SUMMARY: When evaluating running time, we need to focus on general terms rather than exerting excess time and energy on finding very specific and detailed calculations. We achieve this by finding asymptotic upper and lower bounds. To get the upper bound you can inflate all of the terms in the expression to match the biggest term; to find the lower bound you do the opposite and deflate them. A few properties of asymptotic growth rates are transitivity [if f = O(g) and g = O(h), then f = O(h)] and sums of functions [if f = O(h) and g = O(h), then f + g = O(h)]. When you have a polynomial, the growth rate is determined by the “biggest” term. Logarithmic running times such as O(log n) are also considered polynomial-bounded, and it is not considered sloppy to write O(log n) without the base, since logarithms of different bases differ only by a constant factor – all log n functions grow slowly regardless of the base. However, saying that an algorithm is “exponential” IS sloppy, because unlike the base of a logarithm, the base of the exponent DOES make a big difference in how quickly the running time grows with input size (e.g., O(2^n) grows much more slowly than O(10^n)).
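
A quick worked instance of these bounds (my own example, not the book's): if f(n) = 5n^2 + 3n, then f(n) = O(n^2) because 5n^2 + 3n <= 8n^2 for all n >= 1 (inflating the smaller term up to n^2), and f(n) = Omega(n^2) because 5n^2 + 3n >= 5n^2 for all n >= 0 (deflating it). Together these give the tight bound f(n) = Theta(n^2).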

REFLECTION: I found this chapter a little difficult to read (6/10) due to the mathematical notation. After discussing how the terms were pronounced in class, the proofs became easier to understand and the properties themselves made more sense, especially after seeing the diagrams of the different functions in the slides. I wonder if there are other properties of bounds that aren't covered here?

Section 2.3: Implementing the G-S Algorithm

SUMMARY: The motivation for this chapter is to demonstrate how we can use theoretical mathematical concepts to design a correct and efficient algorithm. The book's implementation uses arrays and lists to achieve this goal. Arrays are handy because accessing an item at a given index takes O(1) time. When an array is sorted, searching through it only takes O(log n) time, though if it isn't sorted the search time is O(n). However, an array is cumbersome to update; if we will be adding and removing elements frequently, it is better to use a list than an array. By representing single men in a linked list, we can identify and select a free man in constant time. We can also keep arrays that show whether a man or a woman is engaged and to whom, which allows us to identify pairs in constant time. Another useful array to have is a Next[m] array, which will allow us to keep track of which woman a man should propose to next. Lastly, we must have one final array to efficiently judge whether a woman prefers m or m'. This array is just the inverse of her preference list, where the indices represent the men and the contents represent rankings; comparing wpref[m] and wpref[m'] then takes constant time. By cleverly exploiting the benefits of lists and arrays, we can implement the G-S algorithm in O(n^2) time. The pseudocode for this algorithm is as follows:

initialize each person to be free
while there is a free man:
- - - - select single man m
- - - - w = top woman on m's list who he hasn't proposed to
- - - - if (w is single):
- - - - - - - - pair w with m
- - - - elif (w prefers m to current partner m'):
- - - - - - - - pair w with m
- - - - - - - - m' becomes single
- - - - else:
- - - - - - - - w rejects m
return list of pairs
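
A minimal Python sketch of this implementation (names like next_proposal and ranking are my own, not the book's; preference lists are assumed to be lists of indices):

import collections

def gale_shapley(men_prefs, women_prefs):
    """men_prefs[m] / women_prefs[w] are preference lists over the other side."""
    n = len(men_prefs)
    free_men = collections.deque(range(n))  # linked-list stand-in for free men
    next_proposal = [0] * n                 # Next[m]: next woman m proposes to
    wife = [None] * n                       # current partner of each man
    husband = [None] * n                    # current partner of each woman
    # ranking[w][m] = position of m in w's list (the inverse preference array)
    ranking = [[0] * n for _ in range(n)]
    for w in range(n):
        for rank, m in enumerate(women_prefs[w]):
            ranking[w][m] = rank
    while free_men:
        m = free_men[0]
        w = men_prefs[m][next_proposal[m]]
        next_proposal[m] += 1
        if husband[w] is None:                        # w is single: must accept
            free_men.popleft()
            husband[w], wife[m] = m, w
        elif ranking[w][m] < ranking[w][husband[w]]:  # w prefers m to current m'
            free_men.popleft()
            free_men.append(husband[w])               # m' becomes single again
            wife[husband[w]] = None
            husband[w], wife[m] = m, w
        # else: w rejects m, who stays at the front of the free list
    return list(enumerate(wife))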

REFLECTION: I was already familiar with lists and arrays, and the refresher in class helped me straighten out the running times of different operations for these data structures (e.g., add, get item at index i), so this section was very readable for me (8/10). Analyzing this kind of algorithm is pretty straightforward, but what happens when the data we must work with cannot be represented by a simple array? I suppose a complex Person class that would contain all of this information (Name, PreferenceList, etc.) could be broken down into just the parts that we need for the algorithm to work… but will we always be able to use a simple data structure to represent all of the data that we need to work with?

Section 2.5: Common Running Times

SUMMARY: Algorithms with linear running times, aka O(n), typically involve processing the input in one pass with constant work per item, such as a for loop over each item in an input of size n. O(n log n) algorithms usually involve splitting the input into 2 pieces, solving each piece recursively, and then combining the answers (e.g., mergesort). When you have to search over all pairs of input items (such as in a nested for loop), the typical running time will be O(n^2) if you spend constant time per pair; if you spend more than constant time per pair, the running time can grow beyond O(n^2). An O(n^k) running time occurs when your algorithm must search over all subsets of size k (rather than just pairs). Although it might seem like you can't get higher than O(n^k), the function n! actually grows faster — a brute-force solution to the traveling salesman problem, for example, has this running time. Lastly, there are certain algorithms that perform better than linear time; for example, binary search on a sorted list takes O(log n) time.
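
A quick sketch of the sub-linear case — binary search on a sorted Python list (standard technique, my own code):

def binary_search(sorted_list, target):
    """Return an index of target in sorted_list, or -1. O(log n) comparisons."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # halve the search range each iteration
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1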

REFLECTION: The explanation of why merging 2 sorted lists would only be O(n) time made much more sense in class than in the book. One question I had for this chapter was, if the salesman has to make a loop between given cities, although there are multiple paths or combinations, is one combination really better than another path that is the same but in a different order? Do we really have to iterate over all possible combinations? I give this chapter an 8/10 on readability because there weren’t any seriously complex mathematical ideas, but rather a description of a running time and an example where said running time would apply.

Section 2.6: Priority Queues

SUMMARY: A priority queue is a data structure where each element has an associated key. The lowest keys indicate the highest priority, and the queue supports finding the highest-priority element, adding a new item, and deleting an item. We want a maximum of O(log n) running time for each of the priority queue's operations. When adding an item to the underlying heap, we insert it at the end (position n+1, where n is the current length) and then perform the heapify-up procedure, swapping the item upward until its parent's key is smaller than its own. This takes at most O(log n) time, since at worst the item travels from the bottom level to the root. When deleting an element, we move the element in the n-th position into the hole left behind by the deleted item, then either heapify-up or heapify-down to make sure that the heap order is maintained. Heapify-down is also O(log n), finding the minimum is constant, and extracting the minimum is O(log n) since it is a combination of finding and deleting.
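
A minimal array-based sketch of the two heap repairs (0-indexed Python, my own code rather than the book's 1-indexed version):

def heapify_up(heap, i):
    """Swap element i upward while its key beats its parent's. O(log n)."""
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] < heap[parent]:
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        else:
            break

def heapify_down(heap, i):
    """Swap element i downward while a child's key beats it. O(log n)."""
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:              # heap order restored
            break
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest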

REFLECTION: I think it's really confusing to use “key” when describing the value of an item, since I'm used to thinking of keys as what you use to access a value in a dictionary — aka a key-value pair where the key is typically different from its value. If we are just setting the key equal to the value of our node in our heap/priority queue, why can't we just refer to it as the value? Also, if we combined heapify-up and heapify-down into one function called heapify, would it still have O(log n) running time? The book was much more specific about the priority queue's setup and functions, which made it much easier to evaluate running times. Overall I give this an 8/10 on the readability scale.

Chapter 3

Section 3.1: Graph Fundamentals

SUMMARY: Graphs are composed of nodes and edges, where each edge connects 2 nodes together. An edge in an undirected graph is a set of nodes {u, v}, while an edge in a directed graph is an ordered pair of nodes (u, v). Graphs are useful for representing transportation, social, dependency, and other networks. A graph can be weakly or strongly connected depending on whether it is directed or not. If a graph is undirected, connected, and does not contain a cycle, it is considered a tree, which can be rooted to form a hierarchical structure.

REFLECTION: In class we defined a cycle for k > 3 nodes that start and end in the same place, but the book defines it as k > 2 nodes… why is that? The traveling salesman problem (which has a brute-force running time of O(n!)) is most typically represented by a graph, isn't it? Do problems represented by graphs typically have solutions with very high running times? I would give this a 9/10 on the readability scale, since the chapter was short and to the point. The slides in class relating to cycles helped me get a better idea of what one actually looks like.

Section 3.2: Graph Connectivity & Traversal

SUMMARY: Two basic ways of traversing a graph are depth-first search (DFS) and breadth-first search (BFS). BFS explores all nodes connected to a root node s and adds them in layers corresponding to how far away they are from the root. If there is no path to a node, it will not be represented in the resulting tree; the set of nodes the search does reach is known as the connected component of s. DFS, on the other hand, explores one path entirely before backtracking and going down a different path, which produces a spindly tree representation of the graph. This kind of search is useful for solving mazes, where different rooms are nodes and hallways between rooms are edges. Similarly to BFS, the resulting tree will only contain nodes in the connected component. Also, any edges in the graph that don't appear in the DFS tree will be connections between ancestors and descendants.

REFLECTION: This section was very readable: 8/10. I didn't really understand the importance of some of the proofs in the book or that we covered in class. However, our class discussion and the images in the slides did make the logic of DFS and BFS more understandable. Are there additional searches besides DFS and BFS? What other information do BFS and DFS represent?

Section 3.3: Implementing Graph Traversals

SUMMARY: The book seeks to implement BFS and DFS in O(m+n) running time, where m represents the number of edges and n the number of nodes. A graph with n nodes can be represented by an n x n adjacency matrix, which allows us to check in constant time whether there is an edge between 2 nodes in the graph. Adjacency lists are another way to represent a graph and are preferable to a matrix when edges are sparse; a list also only uses O(m+n) space, which is preferable to O(n^2). Two data structures that support the implementation of BFS and DFS are queues (first in, first out) and stacks (last in, first out).
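
A small sketch of the two representations (my own toy example):

# Graph with nodes 0..3 and edges (0,1), (0,2), (1,2), (2,3)
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: O(n^2) space, O(1) edge lookup
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1

# Adjacency list: O(n + m) space; iterating u's neighbors takes O(deg(u))
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)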

BFS(s):
set Discovered[s] to True and Discovered[v] to False for all other v's
initialize L[0] to consist of element s
set layer counter i=0
set current BFS tree T to be empty
while L[i] is not empty:
- - - - initialize an empty list L[i+1]
- - - - for each node u in L[i]
- - - - - - - - consider each edge (u,v) incident to u
- - - - - - - - if Discovered[v] = false then
- - - - - - - - - - - - set Discovered[v] = True
- - - - - - - - - - - - add edge (u,v) to the tree T
- - - - - - - - - - - - add v to the list L[i+1]
- - - - - - - - endif
- - - - endfor
- - - - increment layer counter i by one
endwhile

Because the outer for-loop visits each node at most once and the inner loop runs once for each edge incident to u — and the degrees summed over all nodes come to 2m — the entire algorithm has O(n+m) running time.
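
A runnable Python version of the layered BFS above (a sketch; adj is an adjacency list as described earlier in this section):

def bfs(adj, s):
    """Return (layers, tree_edges) of the BFS tree rooted at s."""
    discovered = [False] * len(adj)
    discovered[s] = True
    layers = [[s]]                 # L[0] consists of s
    tree = []                      # edges of the BFS tree T
    while layers[-1]:              # while L[i] is not empty
        next_layer = []            # initialize an empty list L[i+1]
        for u in layers[-1]:
            for v in adj[u]:       # each edge (u,v) incident to u
                if not discovered[v]:
                    discovered[v] = True
                    tree.append((u, v))
                    next_layer.append(v)
        layers.append(next_layer)
    layers.pop()                   # drop the final empty layer
    return layers, tree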

DFS(s):
initialize S to be a stack with one element s
while S is not empty:
- - - - take node u from S
- - - - if Explored[u] = False then
- - - - - - - - set Explored[u] = True
- - - - - - - - for each edge (u,v) incident to u
- - - - - - - - - - - - add v to the stack S
- - - - - - - - endfor
- - - - endif
endwhile

At most n distinct nodes get added to stack S, but each node u can be pushed once per incident edge, i.e., deg(u) times; summing the degrees over all nodes gives 2m. Therefore, the running time of this algorithm is also O(n+m). Even when the graph has several distinct components, the total running time for DFS and BFS remains O(n+m).
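
And the stack-based DFS in Python (again a sketch using an adjacency list):

def dfs(adj, s):
    """Return the nodes reachable from s, in the order they are explored."""
    explored = [False] * len(adj)
    stack = [s]
    order = []
    while stack:
        u = stack.pop()            # last in, first out
        if not explored[u]:
            explored[u] = True
            order.append(u)
            for v in adj[u]:       # push neighbors; duplicates on the stack are fine
                stack.append(v)
    return order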

REFLECTION: Since we discussed these algorithms and their implementations in class before I had a chance to read the chapter, I found it pretty easy to follow (8/10). I might have to actually implement these algorithms to truly understand how they work, but thanks to the lecture and the reading I have a pretty good grasp on the basic logic behind using stacks vs. queues, etc. One question I have is, in the pseudocode we looked at in class we passed (G, s) into the function but in the book's pseudocode we only pass s. Why the discrepancy?

Section 3.4: Testing Bipartiteness

SUMMARY: A bipartite graph is a graph whose nodes can be divided into two groups X and Y (or red and blue) so that each edge in the graph connects one node in X (red) to one node in Y (blue). A graph containing an odd cycle (e.g., a triangle or a pentagon) cannot be bipartite. The algorithm for determining if a graph is bipartite is a variation of BFS, where alternating layers are colored red and blue. If any edge connects a node to another node in the same layer, then the graph is not bipartite, because an edge connecting two nodes in the same layer indicates that there is an odd-length cycle present. The running time of this algorithm is still O(n+m).
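
A sketch of the BFS-based 2-coloring (my own code; returns None when an edge joins two same-colored nodes, i.e., an odd cycle exists):

from collections import deque

def two_color(adj, s=0):
    """Try to 2-color the component of s. Returns colors, or None on odd cycle."""
    color = [None] * len(adj)
    color[s] = 0                        # layer parity plays the role of red/blue
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if color[v] is None:
                color[v] = 1 - color[u]
                q.append(v)
            elif color[v] == color[u]:  # edge inside one layer: not bipartite
                return None
    return color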

REFLECTION: This section was an 8/10 on the readability scale. The book did a really good job of describing the properties of a bipartite graph. This message was reinforced in class with visuals showing how an algorithm would work. One question I have is why the “iterate through edges to make sure that none of them connect nodes in the same layer” would not add to the running time. Is it because this step with a running time of O(m) happens totally after the BFS algorithm has executed? O(m) + O(n+m) = O(n+m) still?

Section 3.5: Connectivity in Directed Graphs

SUMMARY: In a directed graph, an edge (u, v) goes from node u to node v but not vice versa; this alters the structure and connectedness of the graph. To represent a directed graph, we use two adjacency lists: one for incoming edges and one for outgoing edges. The DFS and BFS algorithms are very similar and still have an O(n+m) running time, but naturally the results will be different from those of an undirected graph. When a directed graph has a path from u to v and from v to u for every pair of nodes u and v, it is a strongly connected graph (essentially the directed equivalent of a connected undirected graph?). There can also be strong components within a graph that is not itself strongly connected.

REFLECTION: This section was relatively short and an 8/10 on the readability scale. Again, there were a few proofs that seemed kind of just like common sense and I didn't fully understand the importance of them, but they were interesting nonetheless. I am also wondering what a real world application of finding all strong components in a directed graph would be.

Section 3.6: Directed Acyclic Graphs

SUMMARY: A directed graph with no cycles is known as a Directed Acyclic Graph (DAG), which can be used to represent dependencies or processes that need to happen in a certain order. If a graph is a DAG, then it has a topological ordering and vice versa, which we prove with an algorithm that derives a topological ordering from any DAG. The algorithm is founded on the fact that every DAG has a node with no incoming edges, meaning that there is no path to it from another node. This must be true, since if every node had at least one incoming edge, we could pick a node and “walk backward” forever, eventually revisiting a node and thereby completing a cycle.

topologicalOrder(G):
find a node v with no incoming edges and order it first
delete v from G
call function recursively and add results after v

This version of the algorithm is O(n^2), but with a few modifications to keep track of the remaining nodes as well as the nodes with 0 incoming edges, we can get the running time down to O(n+m).
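
A Python sketch of the O(n+m) version (the incoming-edge counts are the "modifications" mentioned above; my own code):

from collections import deque

def topological_order(n, edges):
    """Return a topological order of a DAG with nodes 0..n-1, or None."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    q = deque(u for u in range(n) if indeg[u] == 0)  # nodes with no incoming edges
    order = []
    while q:
        u = q.popleft()
        order.append(u)                # order u next
        for v in adj[u]:
            indeg[v] -= 1              # "delete" u from the graph
            if indeg[v] == 0:
                q.append(v)
    return order if len(order) == n else None        # None means a cycle exists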

REFLECTION: I first learned about this algorithm in class and it made a lot of sense, so the section was very easy to follow (9/10). I'm wondering: would this be the best way to find out if a large and complex graph is a DAG, by running the algorithm until it fails or returns a topological ordering?

Chapter 4

Section 4.1: Interval Scheduling

SUMMARY: The trick to designing a greedy algorithm is to find a simple rule that will allow you to make the best possible choice at each step of the way so that you end up with an optimal solution. However, finding the right simple rule is not always so simple. For example, with the interval scheduling problem it may seem natural to pick the jobs that start first or require the shortest time; while that may work some of the time, there are possible arrangements of jobs where using these rules would not yield the best solution. However, if you always choose the compatible job with the smallest finish time, you will always be at or ahead of the optimal solution. This algorithm runs in O(n log n) time, since the most costly part is sorting the jobs by finish time.
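
A sketch of the earliest-finish-time rule in Python (jobs as (start, finish) tuples; my own code):

def schedule(jobs):
    """Greedy interval scheduling: pick compatible jobs by earliest finish time."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(jobs, key=lambda j: j[1]):  # sort by finish time
        if start >= last_finish:       # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen

# e.g., schedule([(1, 4), (3, 5), (0, 6), (4, 7)]) returns [(1, 4), (4, 7)]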

An alternative problem is one where we must develop an algorithm for scheduling all jobs using the fewest number of resources. One way to do this is to create a number of labels equal to the depth (i.e., the maximum number of jobs that overlap at any single point in time) and then iterate through the jobs, assigning each one a label not already used by an overlapping job. In class we did it slightly differently, ordering the jobs by start time and putting each one into an existing category if it didn't overlap with the previous jobs in that category, or creating a new category if it did, until all jobs were sorted.

REFLECTION: I would give this section a 7/10 on readability – the part where the book diverged from our algorithm in class threw me off a bit. How would you know the depth without going through all the jobs and counting all of the overlaps?

Section 4.2: Scheduling to Minimize Lateness

SUMMARY: This problem is similar to the job scheduling problem, but this time the jobs have deadlines and we are seeking to complete them all while minimizing the maximum lateness. Again, there are several orderings that don't work very well and one that does: completing the jobs in the order that they are due. An optimal solution schedules jobs without any idle time between them. If another optimal solution exists with inversions (meaning that job A's deadline is before job B's, but job B is scheduled first), then some inverted pair must be scheduled adjacently, and we can swap those two jobs to remove an inversion without increasing the maximum lateness. By this process we can transform an optimal solution with inversions into one without them, which is exactly the solution output by our Earliest Deadline First algorithm.

REFLECTION: The textbook didn't specifically say it and I can't remember if we discussed it in class, but I think that the running time for this would again be O(n log n) since we have to sort the jobs by deadline. A question I have is how much more complicated the algorithm for the extension problem is (when the jobs also have an earliest release time). I give this section an 8/10 on readability.

Section 4.4: Shortest Paths in a Graph

SUMMARY: A BFS is handy for determining the shortest path from node s to node u. However, what can we do when we want to know the shortest path from node s to all other nodes in the graph? In 1959, Edsger Dijkstra developed an algorithm to find just that.

Dijkstra’s Algorithm (G, l):
- - - - Let S be the set of explored nodes
- - - - For each u in S, we store a distance d(u)
- - - - - - - - Initially S = {s} and d(s) = 0
- - - - - - - - While S is not V
- - - - - - - - - - - - Select a node v not in S with at least one edge from S for which
- - - - - - - - - - - - - - d'(v) = min of d(u) + l(u,v) over edges (u,v) with u in S is as small as possible
- - - - - - - - - - - - Add v to S and define d(v) = d'(v)
- - - - - - - - Endwhile

This algorithm begins at node s and explores all adjacent edges, choosing the shortest path and adding the node on the other end to the set of explored nodes S. It records tentative distances to the other nodes and updates them as it runs whenever a shorter path is encountered. A node that has been chosen and explored never gets updated again, since the algorithm always chooses the unexplored node with the shortest path from s. This algorithm does not work if the weights of edges can be negative. When implemented using a priority queue, the running time of this algorithm is O(m log n).
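
A compact Python sketch using the heapq module as the priority queue (my own code; it uses lazy deletion of stale entries instead of the book's ChangeKey operation):

import heapq

def dijkstra(adj, s):
    """adj[u] = list of (v, length) pairs with nonnegative lengths."""
    dist = {s: 0}
    pq = [(0, s)]                      # (candidate distance d'(v), node)
    explored = set()                   # the set S
    while pq:
        d, u = heapq.heappop(pq)
        if u in explored:              # stale entry: skip it
            continue
        explored.add(u)                # d(u) is now final
        for v, length in adj[u]:
            if v not in explored and d + length < dist.get(v, float("inf")):
                dist[v] = d + length
                heapq.heappush(pq, (dist[v], v))
    return dist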

REFLECTION: I think this algorithm is the first one I've encountered where the images actually made it more difficult to understand what was happening. It was just confusing trying to walk through each step because there are so many things to keep track of! However, the chapter made the algorithm implementation and analysis seem really straight-forward, which is why I'm giving it a 9/10. One question I have is if other algorithms exist for this problem and if any are more efficient.

Section 4.5: The Minimum Spanning Tree Problem

SUMMARY: The minimum cost of connecting all nodes in a graph can be represented by a Minimum Spanning Tree (MST). One way of obtaining an MST from a graph is by using Kruskal's Algorithm, where we consider the edges in ascending order of cost and add each one to the tree if it does not create a cycle. Another algorithm, Prim's, is similar to Dijkstra's Algorithm: starting from a root node, it repeatedly adds the cheapest edge connecting the explored part of the graph to an unexplored node. Lastly, there is the Reverse-Delete Algorithm, which is essentially a backward version of Kruskal's algorithm: we order the edges in descending order of cost and delete them one by one as long as doing so doesn't disconnect the graph. The correctness of these algorithms can be proven using the Cut Property and the Cycle Property. The Cut Property states that if the nodes are divided into a set of explored nodes and a set of unexplored nodes, and edge e is the edge with the smallest weight between the two groups, then e must be included in the MST of that graph. The Cycle Property states that if there is a cycle in graph G and edge e is the edge with the biggest weight on that cycle, then e will not be in the MST.

These algorithms all work under the assumption that edge weights are distinct – when they are not, we can make slight changes to the weights to act as tie-breakers. This makes the graph compatible with the algorithms and still produces a valid MST. Both Prim's and Kruskal's algorithms can be implemented in O(m log n) time.

REFLECTION: The proof for the Cut and Cycle Properties honestly just confused me even more, both in class and in the reading. I would give this sections a 7/10 for readability.

Section 4.6: Implementing Kruskal's Algorithm

SUMMARY: In order to implement Kruskal's algorithm, we need to be able to check whether two nodes are already connected; if they are, we cannot add another edge between them because that would create a cycle. A Union-Find structure supports these checks and lets us join 2 nodes (or sets of nodes) together. In the pointer-based implementation, Union takes constant time and Find(u) takes O(log n) time, so a sequence of n operations runs in O(n log n) time. With further optimization (path compression), the total time for a sequence of n Find operations gets nearly linear – close to O(n).

Kruskal’s Algorithm (G):
- - - - Sort the edges by cost
- - - - For each edge e connecting nodes u,v
- - - - - - - - If Find(u) != Find(v)
- - - - - - - - - - - - Union(u,v)
- - - - - - - - - - - - Add e to tree T

Overall the algorithm has O(m log n) running time, which we can't really reduce because of the time it takes to sort all of the edges.
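
A Python sketch with a simple union-find (union by size plus path compression; my own code, not the book's pointer-based version):

def kruskal(n, edges):
    """edges = list of (cost, u, v); returns MST edges for nodes 0..n-1."""
    parent = list(range(n))
    size = [1] * n

    def find(u):                            # follow pointers to the set's root
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path compression (halving)
            u = parent[u]
        return u

    tree = []
    for cost, u, v in sorted(edges):        # ascending cost: the O(m log n) part
        ru, rv = find(u), find(v)
        if ru != rv:                        # different components: no cycle made
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru                 # union by size
            size[ru] += size[rv]
            tree.append((u, v, cost))
    return tree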

REFLECTION: I give this section a 6/10 on the readability scale. I think it delved too deeply into the implementation, setup, and proofs related to the Union-Find structure, and it still didn't make me comprehend it that much better. I do find this data structure very fascinating though, don't get me wrong!

Section 4.7: Clustering

SUMMARY: Clustering is a way of grouping objects so that similar items are in the same group while dissimilar items are in different groups. One way to accomplish this is to define a distance function so that items that are less than a given distance apart land in the same cluster. You can also run Kruskal's algorithm, which connects the 2 closest points first, then the next 2 closest points, etc., stopping when there are k distinct clusters. This way you always group together points that are close together, and the resulting groups have the maximum possible spacing from one another.

REFLECTION: I really liked that this section was short and straightforward so I give it an 8/10 for readability. I also liked that it gave us another application of Kruskal's Algorithm – this shows how versatile some algorithms can be in solving varied problems. At first the clustering seemed really complicated, but by tweaking the algorithm just a little we were able to find an easy and correct solution.

Section 4.8: Huffman Codes

SUMMARY: The problem explored in this chapter is that of data compression, namely saving space by doing away with fixed-length character encodings. Samuel Morse, inventor of Morse code, designed that form of communication with frequency disparities in mind: he made letters that appear more frequently easier to transmit by making their encodings shorter and simpler than those of letters that appear less often. One problem with Morse code is that some encodings are prefixes of others, so transmitters have to differentiate between them by allowing a small pause between distinct characters. In a prefix code, no encoding is a prefix of any other encoding. A prefix code that minimizes the average bits per letter (ABL) of a given string is called the optimal prefix code; this means that the most frequent letters in the string must have the shortest encodings and vice versa. A prefix code can be represented by a binary tree where each edge down to the left is labeled 0, each edge down to the right is labeled 1, and each letter sits at a leaf at the end of its path; an optimal prefix code is always represented by a full tree. The best way to obtain such a tree is a bottom-up approach using the following algorithm:

Huffman Algorithm (S):
- - - - If S has two letters then:
- - - - - - - - Encode one letter using 0 and the other letter using 1
- - - - Else:
- - - - - - - - Let y* and z* be the two lowest-frequency letters
- - - - - - - - Form a new alphabet S’ by deleting y* and z* and
- - - - - - - - - - replacing them with a new letter w of frequency f(y*) + f(z*)
- - - - - - - - Recursively construct a prefix code Y’ for S’ with tree T’
- - - - - - - - Define a prefix code for S as follows:
- - - - - - - - - - - - Start with T'
- - - - - - - - - - - - Take the leaf labeled w and add two children below it
- - - - - - - - - - - - - - labeled y* and z*
- - - - Endif

Huffman's Algorithm is greedy because it always takes the same local step (merging the two lowest-frequency letters) without worrying about the big picture. By using a priority queue to represent the alphabet (where the frequency of a letter is its key), we can implement Huffman's Algorithm in O(n log n) time (where n is the number of letters in the alphabet).
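
A Python sketch using heapq as the priority queue (my own code; the counter is just a tiebreaker so heap entries stay comparable):

import heapq

def huffman(freqs):
    """freqs: dict letter -> frequency. Returns dict letter -> bit string."""
    heap = [(f, i, letter) for i, (letter, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two lowest-frequency "letters"
        f2, _, right = heapq.heappop(heap)
        # merge them into a new letter w of frequency f(y*) + f(z*)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: 0 left, 1 right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"       # single-letter alphabet edge case
    walk(heap[0][2], "")
    return code

# e.g., huffman({"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.15})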

REFLECTION: I give this chapter an 8/10 since it was a pretty straightforward read (but long!). After reading about optimal prefix codes, I wonder how long the encoding for the most frequent letters is when we have 100+ characters in our alphabet. Also, are emojis just bits as well? The alternative description of Huffman's Algorithm given in class (shown above) helped me understand the algorithm more than the information in the book.

Chapter 5

Section 5.1: The Mergesort Algorithm

SUMMARY: In Mergesort, you divide the input in half, solve each half recursively, and combine the results at the end. The recurrence for this kind of solution is T(n) ≤ 2T(n/2) + O(n). You can solve a recurrence relation by “unrolling” it, meaning that you expand the recursion until you find a pattern and then sum up the running times of the levels. For example, the first level is one problem with a total cost of cn. The next level is 2 problems of size n/2, the third level is 4 problems of size n/4, etc. Each level has a total running time of cn (i.e., O(n)) and there are at most log2 n levels, hence a running time of O(n log n). If you have a guess for what the running time is, you can also plug it into the recurrence and verify it with an induction proof.
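
A sketch of mergesort in Python (my own code; the merge loop is the O(n) "combine" step):

def mergesort(a):
    """Sort list a in O(n log n) by splitting, recursing, and merging."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # one pass over both halves: O(n)
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]      # append whichever half remains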

REFLECTION: The unrolling of this recurrence makes a lot of sense, and the diagrams in class really helped. I don't think I could derive the inductive proof solution on my own despite having gone over it in class and reading it in the book. I think I just need more practice doing these kinds of problems. Part of this section read like a math textbook so I'm giving it a 6/10. One question I have is how does this compare to quicksort?

Section 5.2: Further Recurrence Relations

SUMMARY: There are other recurrence relations where q (the number of subproblems the original is divided into) is greater than 2. Unrolling these recurrences gives us a general per-level cost of (q/2)^j * cn, where j is the level number. Summing over all levels gives a geometric series, which works out to a running time of O(n^(log2 q)). When q = 1, meaning you do away with half of the input each time, the total work is still bounded by O(n). For the recurrence T(n) = 2T(n/2) + O(n^2), the running time becomes just O(n^2).
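
A worked instance (my own numbers): with q = 3, the recurrence T(n) ≤ 3T(n/2) + cn unrolls to level costs cn, (3/2)cn, (3/2)^2 cn, …, a growing geometric series dominated by its last term, which gives O(n^(log2 3)) ≈ O(n^1.59) — worse than O(n log n) but still polynomial.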

REFLECTION: I give this section a 7/10 because it was difficult, but also laid out in the simplest way this kind of material can be. I think we covered geometric series briefly in discrete math, but that was a few years ago so I'm a bit rusty. I'm wondering what sort of algorithm would have you split the problem into 3 or more parts and solve those recursively…

Section 5.3: Counting Inversions

SUMMARY: Ranking comparison is one problem that can be tackled by a divide-and-conquer approach. We can compare 2 different rankings by counting the number of inversions between them. The brute-force approach (comparing every pair of numbers) would take O(n^2) running time, so we want a more efficient approach, which we can achieve by breaking the input into 2 equal halves and solving recursively (as we did with Mergesort). In order to solve this in O(n log n) running time, we need a routine that merges and counts across the two sorted halves at the same time. Every time an element of the second half B is appended to the merged list C, we add to the count the number of elements still remaining in the first half A, since B's element being smaller than all of them means each such pair is an inversion. Essentially, this is just Mergesort with an added counting step during the merge, and it runs in O(n log n) time.
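
A merge-and-count sketch in Python (my own code):

def sort_and_count(a):
    """Return (sorted a, number of inversions) in O(n log n)."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, left_inv = sort_and_count(a[:mid])
    right, right_inv = sort_and_count(a[mid:])
    merged, i, j, cross = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
            cross += len(left) - i     # B's element beats everything left in A
    merged += left[i:] + right[j:]
    return merged, left_inv + right_inv + cross

# e.g., sort_and_count([2, 4, 1, 3, 5]) returns ([1, 2, 3, 4, 5], 3)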

REFLECTION: I give this section an 8/10 since it was so short and simple. It might be hard to actually implement the algorithm since there is so much to keep track of, but the theory is straightforward.

Chapter 6

Section 6.1: Weighted Interval Scheduling

SUMMARY: Greedy algorithms are only applicable to a small number of problems; in other cases, such as the weighted interval scheduling problem, we must use dynamic programming to find an optimal solution. In the weighted interval scheduling problem, we are trying to pick a subset of the n jobs such that the total value of the chosen, non-overlapping jobs is maximized. We can get a solution by starting with the final job, evaluating each job in turn, and deciding whether or not it should be included in the final solution. If we choose to include job n, the optimal value is vn + Opt(p(n)), where p(n) is the latest job that doesn't overlap with job n – in other words, the value of n plus the optimal solution over the jobs compatible with n. If job n is not in the final solution, then the optimal solution is Opt(n) = Opt(n-1). Implementing this recurrence naively with recursion would lead to an exponential running time, but luckily we can cut back on this by using memoization: we store the value of Opt(j) the first time it is computed and then look it up wherever it reappears in the recursion. This first part of the algorithm only finds the value of the optimal solution – a second pass must be made through the stored values to see which jobs actually belong to the optimal solution.

REFLECTION: I really like the premise of dynamic programming – either you take a job or you don't – but it seems hard to actually implement it. I would give this section a 7/10 on readability.

Section 6.2: Memoization vs Iteration

SUMMARY: The algorithm we looked at in section 6.1 can be reformulated to use iteration rather than recursion. In this case, you begin with M[0] (when there are 0 jobs) and work up from there, with M[j] = max(vj + M[p(j)], M[j-1]). The array M holds the memoized optimal values, which we build up as we go (see the sketch after this list). In order for a dynamic programming algorithm to be developed, the problem:

1) Must have a polynomial number of subproblems
2) Must be solvable by combining the solutions of these smaller subproblems
3) Must have a natural ordering for the subproblems (eg. smallest to largest)
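
A sketch of the iterative version in Python (jobs sorted by finish time; p(j) is computed by a simple backward scan here, my own code):

def weighted_interval(jobs):
    """jobs: list of (start, finish, value). Returns the optimal total value."""
    jobs = sorted(jobs, key=lambda j: j[1])        # order subproblems by finish
    n = len(jobs)
    # p[j] = index of the latest job (1-indexed) compatible with job j, else 0
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        start = jobs[j - 1][0]
        p[j] = next(i for i in range(j - 1, -1, -1)
                    if i == 0 or jobs[i - 1][1] <= start)
    M = [0] * (n + 1)                              # M[0] = 0: no jobs
    for j in range(1, n + 1):
        value = jobs[j - 1][2]
        M[j] = max(value + M[p[j]], M[j - 1])      # take job j, or skip it
    return M[n]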

REFLECTION: I actually found the recursive version of the algorithm to be easier to understand than the iterative one. I think it's because I've been working with recursion so much in my programming languages class (functional programming is very recursion-focused). I would give this section an 8/10 though, because it laid out the new form of the algorithm very clearly.

Section 6.3: Segmented Least Squares

SUMMARY: In the problem of the best-fit line(s), we must determine the optimal starting and stopping points for an unknown number of line segments so as to minimize the total error. On one end of the spectrum, we could have n-1 lines connecting each consecutive pair of points so that the error is 0, but if there is a cost associated with each line this is not desirable.

Find-Segments(j)
- - - - If j = 0 then
- - - - - - - - Output nothing
- - - - Else
- - - - - - - - Find an i that minimizes e(i,j) + C + M[i-1]
- - - - - - - - Output the segment {p_i, ..., p_j} and the result of
- - - - - - - - - - Find-Segments(i-1)
- - - - Endif

The running time of the underlying dynamic program is O(n^2), because at each of the n stages you compare the current point against each of the points before it.
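
A sketch of the forward dynamic program in Python (it assumes a precomputed err(i, j) giving the least-squares error of fitting one line to points i..j; that helper is hypothetical here):

def segmented_least_squares(n, err, C):
    """M[j] = cost of the best segmentation of points 1..j (err is 1-indexed)."""
    M = [0.0] * (n + 1)
    for j in range(1, n + 1):
        # the last segment covers points i..j, paying err(i, j) plus line cost C
        M[j] = min(err(i, j) + C + M[i - 1] for i in range(1, j + 1))
    return M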

REFLECTION: I give this section a 7.5/10. I think the book explained it well, but I have a hard time when things stay super conceptual.

Chapter 7

Section 7.1: Maximum Flow

SUMMARY: All transportation networks have three things in common: they have a source (where stuff originates), a sink (where everything eventually ends up), and a capacity on each edge (how much it can transport at one time). Two conditions apply to a flow: 1) the capacity condition, which states that the flow through an edge cannot exceed its capacity, and 2) the conservation condition, which states that the total flow coming into an internal node must equal the total flow going out. A common problem in a flow network is finding the maximum flow that can be pushed at once – in order to do this we need residual graphs, a new structure built from the current flow.

Max-Flow
- - - - Initially f(e)=0 for all e in G
- - - - While there is an s-t path in the residual graph Gf
- - - - - - - - Let P be a simple s-t path in Gf
- - - - - - - - f’ = augment(f, P)
- - - - - - - - Update f to be f’
- - - - - - - - Update the residual graph Gf to be Gf'
- - - - Endwhile
Return f

The algorithm above is known as the Ford-Fulkerson Algorithm, and it terminates after at most C iterations of the while loop, where C is the sum of the capacities of all edges coming out of the source. The overall running time of the algorithm is O(mC).
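
A Python sketch of Ford-Fulkerson using an adjacency-matrix residual graph and a DFS to find augmenting paths (my own code; fine for small integer capacities):

def max_flow(capacity, s, t):
    """capacity: n x n matrix of integer capacities. Returns max s-t flow value."""
    n = len(capacity)
    residual = [row[:] for row in capacity]    # the residual graph G_f

    def find_path():                           # DFS for a simple s-t path in G_f
        stack, parent = [s], {s: None}
        while stack:
            u = stack.pop()
            if u == t:                         # walk parents back to recover path
                path = []
                while parent[u] is not None:
                    path.append((parent[u], u))
                    u = parent[u]
                return path
            for v in range(n):
                if residual[u][v] > 0 and v not in parent:
                    parent[v] = u
                    stack.append(v)
        return None

    flow = 0
    while (path := find_path()) is not None:
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck       # use up forward capacity
            residual[v][u] += bottleneck       # add backward (undo) capacity
        flow += bottleneck                     # augment: at least 1 per iteration
    return flow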

REFLECTION: I give this section a 9/10. The concept of a residual graph just blew me away – what other applications are these graphs good for? It's really interesting how the type of data structure you have makes a world of difference in the running time of your algorithm. The class lecture was good for introducing the material and then the chapter in the book went into a LOT of detail. I still don't really understand how it can be O(mC) running time, because doesn't finding a path take O(n+m) just on its own?

Section 7.2: Minimum Cuts

SUMMARY: We can prove that the Ford-Fulkerson Algorithm gives us the optimal solution by looking at cuts. A cut divides the graph into 2 sets A and B so that the source is in A and the sink is in B – the capacity of a cut is just the total of the capacities of edges that go from A to B. For any cut, the value of the flow equals the flow out of A minus the flow into A. For each graph, the minimum-capacity cut limits the overall maximum flow. The termination argument for the Ford-Fulkerson Algorithm depends heavily on the capacities of edges being integers – if they are not, the algorithm might take far longer than C iterations, and with irrational capacities it may not terminate at all. However, the Max-Flow Min-Cut theorem holds even if the capacities are real numbers.

REFLECTION: I'm giving this section a 5/10 for readability because it seems like they took some very common-sense things and turned them into really complicated math proofs. It just makes sense that at some point in the graph you're going to have a bottleneck that limits the overall flow. It could be at the very beginning, if the edges coming out of the source have the smallest total capacity of any cut in the graph, or at the very end, if the edges going into the sink do, or at any point in between. I would like more clarification on why real-number capacities could make it potentially a bad algorithm, since it seems highly likely that we would not only be working with integers in a real-world implementation of this algorithm.

Section 7.5: Bipartite Matching

SUMMARY: In the Bipartite Matching Problem, we are trying to find the largest possible matching in a bipartite graph G, where every edge has one end in set X and one end in set Y and each node appears in at most one chosen edge. We can compute this by turning the undirected graph into a directed flow network where all edges have a capacity of 1, with a source attached to the X side and a sink attached to the Y side. Then the value of the maximum flow over this new graph equals the size of the largest matching in G. The overall running time (converting the graph and running the Ford-Fulkerson Algorithm) is O(mn), since the maximum flow value C is at most n. Hall's Theorem characterizes when a perfect matching exists: the maximum matching has size less than n exactly when some subset A of X has fewer than |A| neighbors in Y.

REFLECTION: I found this section very difficult to read (4/10), because they used symbols that I have never seen before, such as the upside-down L. I really wish I knew the math terminology because I feel like that would make reading this textbook much easier at times. One question I have is: when is Bipartite Matching useful? What real-world applications does it relate to? We haven't talked about this problem in class yet; hopefully that will make it clearer.

Section 7.7: Extensions to Maximum Flow

SUMMARY: Many problems can be solved in polynomial time because they can be reduced to maximum flow problems. One such extension is meeting the demands of multiple sinks using the flow produced by multiple sources (e.g., multiple factories shipping products to multiple stores). By manipulating the graph and adding a “super” source and “super” sink, we can reduce the problem to essentially a standard maximum flow problem. If the maximum flow of this new graph has value D (the total demand), there is a feasible circulation – otherwise, there is none. Another extension is forcing the solution to include certain edges by placing lower bounds on them. We want to know if there is a circulation that satisfies the new conditions created by these limitations. We check this by creating a graph G' that is the same as G but with the lower bounds transformed away. If G' has a feasible circulation, then G does as well.

REFLECTION: I give this section a 7/10. It wasn't super-interesting but it also wasn't terribly complicated like the last couple of sections I've read. I'm interested to see these ideas explained in class – I feel like some of these concepts would be more clear with step-by-step illustrations. The algorithms we looked at here are for seeing if there exists a way to solve the problem at hand, but can they actually say what the most efficient way is? Do companies actually use algorithms like this to ensure top efficiency in product delivery?
