Johanna's Journal

Preface Summary



Because algorithmic problems often arrive muddled in detail, forming an algorithm has two components. First, one must find a mathematical formulation of the problem that makes it solvable through algorithms. Then one must identify the proper algorithm design techniques that befit the problem. I don't really have any questions about the preface, but I am interested to see throughout the course and the book what level of detail we go into when solving problems regarding real CS implementation (i.e., specificity of control structures and collection types, etc.).

Chapter 1.1 Summary



This chapter begins by introducing the problem which inspired the Gale-Shapley Stable Matching Algorithm. Initially the problem is a question of matching applicants with companies in a way that stops applicants from leaving their new employer for a preferred employer that will take them. In a word, we want the pairing of applicants and employers to be “self-enforcing”. This question can also be applied to matching residents with hospitals.
The first step in coming up with an algorithm to solve this problem is formulating the problem. We can simplify it by not worrying about companies looking for multiple applicants. Instead we will just consider n companies each searching for one of n applicants. This representation of the problem is applicable to a broader, extended case as well. Now this problem is the exact problem we solved in class with matching n men and n women.
In order to design the algorithm, we must explore certain fundamental ideas motivating it. The components of our problem are a set of single men, a set of single women, a list of each woman's ranked preferences in men, and a list of each man's ranked preferences in women. Note that initially everyone is unmarried and the goal is to marry everyone stably. The initial step will involve an unmarried man choosing his highest-ranked woman and proposing to her. We will use engagement as an intermediate state between singleness and marriage: it is dangerous for this woman to reject the man outright in case no other man ever proposes to her, but she should not be fully committed either, in case a man she prefers proposes later. Now consider a state where some men and women are engaged and some are not. An arbitrary single man chooses the highest-ranked woman on his list to whom he has not yet proposed and proposes to her. If she is engaged and prefers her current partner, she rejects him. If she is engaged but prefers this new man, she breaks her current engagement and accepts the proposal, and her ex becomes single again. If she is single, she automatically accepts his proposal. The algorithm terminates when no man is free anymore.
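The propose-and-reject loop above can be sketched in Python. This is just my own illustrative sketch, not the book's code: I'm assuming the inputs are dicts (`men_prefs`, `women_prefs`) mapping each person to their ranked preference list.

```python
def gale_shapley(men_prefs, women_prefs):
    """Propose-and-reject: free men propose in preference order."""
    # rank[w][m] = position of m on w's list, so "does w prefer m?" is O(1)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}   # index of next woman to try
    engaged_to = {}                           # woman -> current fiance
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged_to:                    # w is single: accept
            engaged_to[w] = m
        elif rank[w][m] < rank[w][engaged_to[w]]:  # w prefers m: trade up
            free_men.append(engaged_to[w])
            engaged_to[w] = m
        else:                                      # w rejects m
            free_men.append(m)
    return {m: w for w, m in engaged_to.items()}
```

Each man works down his list and never revisits a woman, which is where the bound of at most n² proposals comes from.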
The next step is analyzing the algorithm to see if it returns a stable matching, and even a perfect matching. As discussed in class, we know a woman w becomes engaged and does not go back to being single. Her current partner can only improve in terms of her preference list. A man m starts single and can get engaged but may go back to being single. Each time he gets engaged, his partner gets worse in terms of his preference list. As proven in class, the algorithm terminates after at most n² iterations. Since we have already discussed proofs (1.3), (1.4), (1.5), and (1.6) in class, I will not go into the details here.
Upon evaluation of the algorithm, it is clear that the G-S algorithm is not fair to one of the two groups represented depending on who does the “proposing”. In the marriage example, men are favored because they propose. If they all list different women as their highest ranked woman, they each get married to their favorite woman regardless of the women’s preference lists. Therefore, the algorithm favors whoever does the proposing.
All executions of the G-S algorithm yield the same matching. A woman w is a valid partner for man m if there exists a stable matching that contains the pair (m, w); w is the best valid partner for m if w is m's highest-ranked woman who is a valid partner. According to proof (1.7), every matching created by the G-S algorithm is the set of pairs in which each man is matched with his best valid partner. This can be proven by way of contradiction. In the same way, we find that women are always paired with their worst valid partner.
In the future, will we discuss the idea of algorithm “fairness” in more detail? It seems like fairness is a very important factor in all the possible uses of the G-S algorithm discussed here, so I can't imagine we will continue to settle for “unfair” solutions.
I found this chapter to be a 3 because we already discussed most of the information in very great detail in class.

Chapter 2.1 Summary



In studying the creation of efficient algorithms, we look to study discrete problems without much irrelevant detail. The algorithms we work with will encompass vast combinatorial possibilities, like the Stable Matching Problem. Computational efficiency is usually measured as efficiency in running time, which is the most measurable resource. We must also keep memory/space in mind, another important resource when designing efficient algorithms.
It is hard to define efficiency without using arbitrary words like “quickly”. The first proposed definition of efficiency leaves out how well an algorithm will scale as problem sizes grow, as well as where the algorithm will be implemented. A good way to tell how impressive or efficient an algorithm is is to check it against the search space it defines. In the Stable Matching Problem, the search space has size n! because that is how many perfect matchings exist between n men and n women, yet the algorithm completes in O(n²) time, so that is pretty impressive!
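Out of curiosity, a tiny Python comparison (my own, just illustrative) of the brute-force search space n! against the n² proposal bound:

```python
import math

# Brute force would examine up to n! perfect matchings,
# while Gale-Shapley makes at most n^2 proposals.
for n in (5, 10, 20):
    print(f"n = {n:2d}   n! = {math.factorial(n):,}   n^2 = {n ** 2}")
```

Already at n = 20 the factorial has 19 digits while n² is only 400.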
So our new definition of efficiency, based on this criterion, is that an algorithm “achieves qualitatively better worst-case performance, at an analytic level, than brute-force search”.
But now we have this vague term, “qualitatively better worst-case performance”. We can remedy this by considering polynomial running time. If, whenever the input size doubles, a solution's running time grows by at most a constant factor, the solution scales well; this is exactly the behavior of a polynomial-time algorithm (or better). So the new definition of efficiency becomes “it has a polynomial running time.”
The greatest benefit of having such a specific definition of efficiency is that it becomes negatable: we can now definitively express whether or not an efficient solution exists for a given problem. With our original ambiguous definition, this was not possible at all.
Is this final definition of efficiency the commonly observed rule among computer scientists? I feel like the rule of thumb for efficiency should really depend more on the problem at hand than anything. Polynomial-time is definitely good for large problems but it seems like a big generalization to say that this is always an efficient running time for an algorithm.
I give this chapter a 6 out of 10 because it presents some new information and it cracks a few solid jokes.

Chapter 2.2 Summary



So far we have determined that an algorithm's worst-case running time on inputs of size n is bounded above by some function f(n). It is easier and more meaningful to speak of f(n) in terms of its highest-order term instead of as a complete function (i.e., it is better to say a function is O(n²) than to say that it took 16n² + 3.5n + 8 steps). Our regular O notation is used to represent asymptotic upper bounds, which we have already discussed plenty. Asymptotic lower bounds, written Ω(f(n)), bound our algorithm's step count T(n) from below, so they are sort of underestimations of our algorithm's run-time. They are meant to be simplified in the same way as regular O notation. An asymptotically tight bound, written Θ(f(n)), grows exactly the way our algorithm function does to within a constant factor; it represents a case in which our function's asymptotic upper bound matches its asymptotic lower bound. The tight bound can be found by closing the gap between the upper bound and the lower bound, and occasionally by computing the limit of T(n)/f(n) as n goes to infinity.
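As a quick numerical sanity check of the tight-bound idea (my own sketch, using the 16n² + 3.5n + 8 example from above):

```python
def f(n):
    return 16 * n ** 2 + 3.5 * n + 8

# f(n) / n^2 approaches the leading constant 16: this is the sense in
# which f is Theta(n^2), i.e. squeezed between constant multiples of n^2.
for n in (10, 1000, 10 ** 6):
    print(n, f(n) / n ** 2)
```

The lower-order terms become negligible, which is exactly why we drop them.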
All of the aforementioned bounds share the property of transitivity. We discussed this in class too. To further expand upon the idea of polynomial-time algorithms, it is important to realize that polynomial bounds do not always take the form of n raised to an integer degree d. An algorithm with T(n) of O(n log n) is also polynomial-time, as is one with O(√n). Logarithmic-time algorithms are even slower-growing than polynomial-time algorithms, and exponential-time algorithms grow much faster than all polynomial- and logarithmic-time algorithms over large input sizes. These comparisons are hard to see with small input sizes, but as input size increases they hold true.
I know the text said we want to avoid coming up with O(f(n)) with super specific f(n) functions, but I am curious as to how one would even compute a long polynomial function for run-time to begin with.
5/10 for interesting-ness.

Chapter 2.3 Summary


  • In order to implement the Gale-Shapley Algorithm in O(n²) time as intended, much thought must be given to which data structures we will use to represent the different pieces of the algorithm. Initially, we are given a list of men and a list of women, each with their own list of preferences. The initial step of implementing any algorithm is often preprocessing, or converting certain input from its given representation into a data structure that is more appropriate for the problem at hand. In the case of the G-S algorithm, we will begin by converting each man's list of preferences in women from a list to an array. This is beneficial because it allows us direct access, and the initial step of creating the array will only take O(n) time. This is a worthwhile trade-off. Next, we will consider the implementation of the list of single men. This will be represented by a doubly linked list because our list of single men will be very dynamic, so O(1) deletion and insertion are key. The fact that a linked list has O(n) search time is not relevant because we will never need to search our list of single men at any point in the algorithm. Once again, this trade-off is in our favor. We will represent the list of women each man has yet to propose to by an array for each man; each man will use a pointer to indicate which woman he should propose to next. Then we will represent current engagements using an array, and we will implement each woman's preference list as a 2-dimensional ranking array that requires O(n²) time to build. Although this is a big initial cost, it will allow us to look up each man's ranking within a woman's preference list in constant time. This is a very beneficial trade-off since many comparisons will be made using this array and it only has to be constructed once. Once all these pieces of implementation are added up, the G-S Algorithm is implemented in O(n²) time.
  • Question: Although a doubly linked list is fancier, I don't understand why a regular linked list wouldn't do the trick. Seeing as we will never go backwards through our list of single men, as long as the last node connects back to the head node of our linked list, I feel like a singly linked list should be all we need to implement this piece of the algorithm (sorry this is sort of nit-picky, I really couldn't think of any deeper questions on this chapter after all the discussions we had in class).
  • Rating: 5/10
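The O(n²) preprocessing step that makes constant-time preference comparisons possible can be sketched like this (my own sketch; I'm assuming people are numbered 0..n−1 and `pref` is the given list-of-lists input):

```python
def build_ranking(pref):
    """pref[w][i] = the i-th favorite man of woman w.
    Returns rank with rank[w][m] = position of man m on w's list, so
    "does w prefer m to m'?" becomes a single constant-time lookup."""
    n = len(pref)
    rank = [[0] * n for _ in range(n)]
    for w in range(n):                       # n women ...
        for pos, m in enumerate(pref[w]):    # ... times n entries = O(n^2)
            rank[w][m] = pos
    return rank
```

Paying O(n²) once is fine here because the G-S loop itself makes up to n² comparisons against this array.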


Chapter 2.4 Summary

  • This chapter reviews a few of the most commonly encountered running times for algorithms and surveys examples of each. First, we look at linear running time, or O(n). Two examples of this are computing the maximum of n numbers and merging two sorted lists. Next we look at O(n log n) time, which is commonly associated with sorting. The Mergesort algorithm has an O(n log n) time cost because the merging step (as discussed above) requires O(n). Another example of an O(n log n) algorithm is the comparison of gaps between time-stamps, which we talked about in detail in class. Next we look at quadratic-time algorithms. A common cause of quadratic running time is a pair of nested O(n) loops. An example of this is the brute-force algorithm presented for finding the closest pair of points within a given set of points in space. Next we look at cubic time. An example of this is an algorithm solving the following problem: given a list of sets S1, S2, …, Sn, where each set is a subset of {1, 2, …, n}, determine whether some pair of these sets is disjoint. The algorithm to solve this problem (on pages 52-53) is O(n³) because each set has maximum size O(n), so the loop comparing the elements of a set Si against a set Sj takes O(n). This O(n) inner loop must be repeated for O(n) different sets Sj, and then this nested loop must be repeated over all n sets Si, giving us O(n) × O(n) × O(n) time. Next we look at O(nᵏ) running time. An example of this is the algorithm on page 53 which determines whether a graph contains an independent set of size k. The algorithm described is brute-force search, but unlike most instances of brute-force search, there doesn't seem to be a more efficient algorithm for solving this problem. Beyond polynomial running times like O(nᵏ), running times grow extremely rapidly.
In most cases, a run-time beyond polynomial indicates that a more efficient algorithm exists, as in the case of the Interval Scheduling Problem (brute-force search there would yield an O(2ⁿ) time algorithm). But in certain cases, this is the best algorithm possible, like in the problem described on page 54. Finally, we examine sublinear-time algorithms. Since just reading the given input is a linear-time task, sublinear algorithms rely on our ability to “query” the input indirectly instead of reading all of it. A common example of a sublinear algorithm is binary search, which can find an element in a sorted array in O(log n) time, which is asymptotically smaller than linear time.
  • Question: The idea of not being able to bypass the brute force search algorithm (as in the example on page 54) is interesting. Is there any known way to determine when a faster algorithm exists, the same way in math it is often possible to prove when no solution exists to a problem? In other words, is it only possible to prove that the brute-force search algorithm works? Or is there a way to prove that no faster algorithm exists?
  • Rating: 6/10
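The cubic-time disjointness check described above can be sketched as follows (my own sketch; the function name and signature are mine, and I use a membership table so each pairwise check costs O(n), matching the book's accounting):

```python
def any_pair_disjoint(sets, n):
    """sets: a list of subsets of {1, ..., n}.
    Brute force in O(n^3): for each of n sets build an O(n) membership
    table, then scan each of the other n sets in O(n) time."""
    k = len(sets)
    for j in range(k):
        in_j = [False] * (n + 1)          # membership table for S_j
        for x in sets[j]:
            in_j[x] = True
        for i in range(k):
            # S_i and S_j are disjoint if no element of S_i lies in S_j
            if i != j and not any(in_j[x] for x in sets[i]):
                return True
    return False
```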

Chapter 2.5 Summary

  • Often, the efficiency of an algorithm comes down to implementation details such as data structures. In this chapter, we look at the priority queue and how best to implement it. The goal time per priority queue operation (deletion, insertion) is O(log n) because this will allow us to sort n numbers in O(n log n) time. We will suit this need using a heap as the underlying implementation for the priority queue. By maintaining heap order (each parent node has a value smaller than or equal to both of its child nodes), we can keep the heap organized cheaply: since the height of the heap is log n, deleting or inserting an element only touches up to log n elements (one per level) rather than all n. The Heapify-up algorithm is what we will use every time an element is added. We automatically add a new element at the bottom of the heap, and then run Heapify-up on it until it is in its proper position (under a parent that is smaller than itself and above children bigger than itself). On page 62, we see the proof that the Heapify-up operation fixes an almost-heap array in O(log n) time: each swap within the heap takes O(1) time and the maximum number of swaps is log n, so the reordering of the tree only takes O(log n). Next, we discuss the slightly more complex algorithm Heapify-down, used when an element is too large for its position and needs to be moved down in the heap. Heapify-down helps us when we delete an element. If an element is deleted from somewhere other than the last position of the heap, it leaves a hole. This hole is filled with the current last element in the heap, and then Heapify-down is run on this element until it finds its proper place in the heap. Like Heapify-up, the algorithm takes advantage of the fact that each swap is O(1), and it can restore heap order in O(log n) time.
Now we can perform all priority queue (heap) operations in O(logn) time, and so we can sort a list of elements using a priority queue in O(nlogn) time. This is a useful application of the priority queue implemented as a heap.
  • Question: The chapter did not provide any examples of heap uses other than the priority queue. What else is a heap used to implement? It seems very efficient for sorting an array, so why is it not a better-known data structure? Or is it? It seems like it should render the regular sorted array/sorted list obsolete.
  • Rating: 4/10
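The heap operations above can be sketched in Python (my own sketch of the standard min-heap-in-an-array technique; note I use 0-indexing, so the parent of position i is (i − 1) // 2, whereas the book indexes from 1):

```python
def heapify_up(h, i):
    """Move h[i] up until its parent is no larger (min-heap)."""
    while i > 0 and h[i] < h[(i - 1) // 2]:
        parent = (i - 1) // 2
        h[i], h[parent] = h[parent], h[i]
        i = parent

def heapify_down(h, i):
    """Move h[i] down below any smaller child."""
    n = len(h)
    while True:
        left, right, smallest = 2 * i + 1, 2 * i + 2, i
        if left < n and h[left] < h[smallest]:
            smallest = left
        if right < n and h[right] < h[smallest]:
            smallest = right
        if smallest == i:
            return
        h[i], h[smallest] = h[smallest], h[i]
        i = smallest

def insert(h, x):
    h.append(x)                     # add at the bottom of the heap ...
    heapify_up(h, len(h) - 1)       # ... then bubble up: O(log n)

def extract_min(h):
    h[0], h[-1] = h[-1], h[0]       # fill the hole with the last element
    smallest = h.pop()
    if h:
        heapify_down(h, 0)          # restore heap order: O(log n)
    return smallest
```

Calling `insert` n times and `extract_min` n times is exactly the O(n log n) priority-queue sort mentioned above.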

Chapter 3.1 Summary

  • This chapter describes terminology associated with graphs, which we have reviewed pretty heavily in class. It also provides some examples of real-life applications that can be represented using graphs. For example, a transportation network can easily be modeled using a directed graph where each node represents an airport or train station and an edge from one node to another is a flight or train. The directedness is not entirely necessary here since, when there is a trip from destination u to v, there is often a corresponding return trip. Another example in which a directed graph models a real-life system well is an information network. When one webpage contains a hyperlink to another page, it is not always the case that that page has a hyperlink pointing back. Thus, for creating algorithms to infer the most important pages on the Web (something search engines must do), directed graphs are a good model. An undirected graph is a tree if it is connected but does not contain any cycles. Trees will be important in discussing graph traversal algorithms. Having a root is helpful because it can serve as a starting location for traversal and add direction to our traversal algorithms. “Rooting” the tree helps us see that each n-node tree has exactly n−1 edges.
  • Question: In the list of graph applications, the book mentions search engines using hierarchy of hyperlinks to determine most-searched webpages. In an algorithm containing as many webpages as a search engine algorithm would, what structure would one use to implement a graph? Is an adjacency list or matrix often used like in class? Or an array? I am still a little confused about the underlying implementation of graphs.
  • Rating: 7/10

Chapter 3.2 Summary

  • Now we will discuss the concepts of graph connectivity and graph traversal. This section is dedicated to answering the question: is there a path from node s to node t in graph G? This is called the s-t Connectivity Problem, or equivalently the Maze-Solving Problem. First we examine the Breadth-First Search algorithm. This algorithm explores the graph, starting at s, one “layer” at a time. Each layer is the set of new nodes joined by an edge to the previous layer, starting with all nodes joined by an edge to s. The algorithm terminates when the last possible layer is reached and added to our BFS tree. Next we explore the Depth-First Search algorithm, which achieves qualitatively similar efficiency to Breadth-First Search. It travels downward through the graph from s, marking each node it reaches as “Explored”, and continues along a single path until it can go no further. It then backtracks and repeats the same procedure on the next unexplored branch, and another, until nothing reachable remains unexplored.
  • Question: I am a bit confused by theorem 3.7 and its proof though. It states that if x and y are nodes in depth-first search tree T, and (x,y) is an edge of G that is not an edge of T, then one of x or y is an ancestor of the other. When the proof says “y is a node that was discovered between the invocation and end of the recursive call DFS(x)”, I don't quite understand what it is saying. Does this mean that the y node has already been explored because it is the child of another node besides x that has also already been explored? I believe this must be what it means, but it is a little vague and I'm not sure how this proves that one of x and y is an ancestor of the other.
  • Rating: 5/10
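The layer-by-layer BFS described above can be sketched like this (my own sketch; I'm assuming the graph is given as a dict `adj` mapping each node to a list of neighbors):

```python
def bfs_layers(adj, s):
    """Return the BFS layers L0 = {s}, L1 = neighbors of s, and so on.
    Stops when a layer comes out empty, i.e. nothing new is reachable."""
    layers = [[s]]
    discovered = {s}
    while layers[-1]:
        nxt = []
        for u in layers[-1]:
            for v in adj[u]:
                if v not in discovered:   # first time we see v
                    discovered.add(v)
                    nxt.append(v)
        layers.append(nxt)
    return layers[:-1]                    # drop the final empty layer
```

The index of the layer containing t is then the shortest-path distance from s to t, and t is reachable at all iff it appears in some layer.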

Chapter 3.3 Summary

  • This chapter answers some of my previous questions about the underlying implementation of graphs. I was all distressed before reading this chapter but now everything's okay again.
  • Previously, our algorithms had running times that were functions only of n. Now, when we implement our BFS and DFS algorithms, we will encounter algorithms that complete in O(m+n) time, where m is the number of edges in the given graph. One way to implement a graph is an adjacency matrix: an n×n matrix A where A[u, v] equals 1 if the graph contains edge (u, v) and 0 if not. The adjacency matrix is efficient in that it allows us constant-time access when checking if an arbitrary edge (u, v) is present in the graph, but the matrix takes up O(n²) space. Finding all nodes incident to a given node takes O(n) time in the worst case using either implementation (adjacency matrix or list), though we are looking for a representation that finds all incident edges quickly in general. The adjacency list representation of a graph is made up of an array Adj, where Adj[v] is a linked list containing all nodes adjacent to node v. In an adjacency list representation, each edge is represented twice (once when w appears on the list for node v and again when v appears on the list for node w). The matrix requires O(n²) space while the list representation requires only O(m+n) space. Because m ≤ n², the bound O(m+n) is never worse than O(n²); thus, the adjacency list is more space-efficient. However, the matrix is more time-efficient when trying to access an arbitrary edge, since an array has constant-time lookup. But if a search algorithm is already at one endpoint of an edge, it can read off that node's neighbors in constant time per neighbor. Thus, the adjacency list representation makes the most sense for search algorithms, because when the algorithm “explores” a graph, it is already at a node when it traverses that node's neighbors. So this is equal in time efficiency and better in space than using the adjacency matrix representation.
  • The breadth-first search algorithm can make use of either a queue (FIFO) or a stack (LIFO) to represent a layer Li because the algorithm is allowed to consider the nodes in a layer in any order. Using an adjacency list to represent graph G, an array Discovered where Discovered[u] = true once our search sees u, and a queue or stack for each layer Li, our BFS algorithm runs in O(m+n) time. It makes more sense to implement each layer as a queue, but it is not that consequential. The depth-first search algorithm uses a stack instead of a queue to contain the nodes to be processed. This is because each time a new node is found, the algorithm immediately explores it and stops exploring the node it was previously on, even if other nodes are left temporarily unexplored. These nodes are stored on the stack, to be popped off and explored as soon as the DFS algorithm stops finding unexplored nodes in deeper “layers”. Using a stack for this is efficient because insertions and deletions (the only operations we will need) take O(1) time. Because the total number of nodes added to the stack is at most 2m (where m is the number of edges), the overall runtime of the DFS algorithm is O(m+n) as desired.
  • Question: Would it be better to use the BFS algorithm or the DFS algorithm when finding the set of all connected components? They have equal bounds on their run-time, but BFS maintains order to some extent because it keeps layers together. If we were to look for all connected components of a graph in order to use them in some other structure, would it make more sense to use one search algorithm over another or would it be inconsequential?
  • Rating: 5/10
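The stack-based DFS described above can be sketched like this (my own sketch; same `adj` dict-of-neighbor-lists assumption as before):

```python
def dfs(adj, s):
    """Iterative DFS with an explicit stack of nodes to process.
    A node may be pushed once per incident edge (up to 2m pushes total),
    but it is explored only the first time it is popped."""
    explored = set()
    stack = [s]
    order = []
    while stack:
        u = stack.pop()                 # O(1) deletion from the stack
        if u not in explored:
            explored.add(u)
            order.append(u)
            for v in adj[u]:
                if v not in explored:
                    stack.append(v)     # O(1) insertion
    return order
```

Swapping the stack for a FIFO queue (popping from the front instead of the back) turns this same loop into BFS, which is a nice illustration of the queue-vs-stack point above.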

Chapter 3.4-3.6 Summary

  • A graph is bipartite if its nodes can be partitioned into two subsets (one where each node is red, the other where each node is blue) such that each edge has one blue endpoint and one red endpoint. If a graph is bipartite, then it cannot contain an odd cycle. This makes sense when looked at inductively, starting with the example of a triangle (a graph with three nodes connected in a cycle by three edges). In determining bipartiteness, it becomes apparent that the existence of an odd cycle is the only characteristic that prohibits a graph from being bipartite. To see if a graph is bipartite, we use a strategy very similar to Breadth-First Search, where each node in layer Li is colored blue and every node in layer Li+1 is colored red, and so on. This continues until the whole graph is colored; if we end up with an edge between two nodes within the same layer, this reveals an odd cycle and shows that the graph is not bipartite.
  • Now we will consider directed graphs instead of undirected graphs. The difference in implementation is that each node will have two lists associated with it instead of just one: a list of the neighbors pointing to it as well as a list of the neighbors it points to. The search algorithms for a directed graph are a little different because it is possible for a node s to have a path to t while t has no path back to s, which is impossible in an undirected graph. If a path exists in both directions, then the two nodes s and t are mutually reachable. To determine whether a directed graph is strongly connected, we perform a BFS in G starting at any node s and then also run BFS on G-reverse (G with all of its edges reversed) starting at the same s. If either of these fails to reach every node, then the graph is not strongly connected, because if it were strongly connected, (3.16) explains that all nodes would have to be mutually reachable from s.
  • If an undirected graph has no cycles, each of its connected components forms a tree. If a directed graph has no cycles, it is called a directed acyclic graph. DAGs are useful for modeling precedence relations or dependencies. For example, given a set of courses, a DAG can model the fact that certain courses are prerequisites for others and thus other courses depend on them. For a graph to be topologically ordered, every edge (vi, vj) must satisfy i < j. If this is the case for some graph G, then G is a directed acyclic graph! Important question: if each G that can be topologically ordered is a DAG, then can each DAG be topologically ordered? This ends up being true because every DAG G contains a node v with no incoming edges, which can serve as the starting node for our topological ordering. The proof that such a node exists in a DAG is fairly simple: pick a node at random and follow edges backwards until you either find a node with no incoming edges or hit the same node twice; if the latter occurs, then G is clearly not acyclic. The algorithm for finding the topological ordering of the nodes in a DAG simply requires us to find a node without any incoming edges, delete it from the graph, and add it to the topological order. Do this recursively until all the nodes in G are deleted and the topological order is complete. This algorithm runs in O(m+n) time because we maintain, for each node, the number of incoming edges it has from active nodes (nodes not yet removed from the graph by our algorithm) and the set of all active nodes in G that have no incoming edges from other active nodes.
  • Question: If a node is deleted from a DAG in the algorithm described on page 102, is it possible that this deletion will create more than one new node without any other nodes pointing to them? Can this possibly affect the algorithm and cause it to fail?
  • Rating: 5/10
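The delete-a-source-repeatedly algorithm for topological ordering can be sketched like this (my own sketch; `adj` maps each node to its list of successors):

```python
from collections import deque

def topological_order(adj):
    """Repeatedly remove a node with no incoming edges from active nodes.
    Returns a topological order, or None if the graph has a cycle."""
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    # active nodes with no incoming edges from other active nodes
    ready = deque(u for u in adj if indegree[u] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj[u]:           # "delete" u: its successors lose an edge
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order if len(order) == len(adj) else None
```

Maintaining the in-degree counts is what makes the whole thing O(m+n): each edge is decremented exactly once. This also answers my own question below in part: a single deletion can indeed expose several new edge-free nodes at once, and the `ready` queue simply collects them all.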

Chapter 4 Intro and 4.1 Summary

  • The idea of a greedy algorithm is to optimize individual steps of an algorithm in order to create an overall optimal outcome. If each local decision is optimal, the greedy algorithm will “stay ahead” because there is no point at which the algorithm falls behind any other possible solution to the same problem.
  • The Interval Scheduling Problem is one that can be easily solved using a greedy algorithm. Given a set of requests {1, 2, …, n}, the i-th request corresponds to the time interval from s(i) to f(i). A subset of requests is compatible if no two of the intervals overlap in time. The idea is to get as many intervals into a compatible set as possible; this is the optimal solution. The most important step of the algorithm is determining what rule we should use to select our first request, and then all subsequent non-conflicting requests. As we discussed in class, we could select the request that is shortest, or the request that ends soonest, or the request that starts earliest, etc. We end up choosing the request that finishes first, i.e., the request i for which f(i) is smallest. This yields an algorithm in which we pick the request i with the smallest f(i) and add it to a set. Then we find the request with the next smallest f(i) and check it against the finish time of the last request in our set of accepted requests; if they do not conflict, we add this job to the set. After each request is considered, any request that conflicts with the end time of our current set of accepted requests is deleted. This continues until the set of remaining requests is empty. It is possible to prove inductively that no solution is better than the one we have come up with. This is done by a “greedy-stays-ahead” argument, which assumes there is a more optimal solution and proves by contradiction that there is not. The Interval Partitioning Problem is a similar problem in which we do not have one resource, as in the Interval Scheduling Problem, but instead have many resources and wish to schedule all the requests using as few of these resources as possible. Let the depth of a set of intervals be the maximum number of intervals that pass over any single point on the timeline at once.
Then, intuitively, the number of resources needed to satisfy all the requests is at least the depth of the set of intervals. Once we realize that there are no long-term constraints beyond the depth of the set of intervals, it becomes clear that an algorithm exists such that the number of resources needed is exactly the depth. Using the algorithm described on page 124, we can achieve this. The algorithm orders all the intervals by start time and assigns each interval a “label” (a number between 1 and the depth) that has not been assigned to any earlier interval it overlaps; such a label always exists because at most depth-many intervals can overlap at any point. The algorithm leaves no interval unlabeled and never assigns two overlapping intervals to the same resource label.
  • Question: What is the run-time of this algorithm and how does the cost of finding the depth factor in? I imagine that in a very large set of intervals over a long span of time, searching for the exact point at which the most intervals overlap is probably a large time cost. Would this be a part of the setup of the algorithm or does this calculation stand alone?
  • Rating 7/10 because I appreciated the effort in the introduction with that movie quote.
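The earliest-finish-time-first rule for Interval Scheduling can be sketched like this (my own sketch; I represent each request as a `(start, finish)` pair and treat intervals that merely touch at an endpoint as compatible):

```python
def interval_scheduling(requests):
    """Greedy earliest-finish-time-first.
    requests: list of (start, finish) pairs.
    Returns a maximum-size compatible subset."""
    accepted = []
    last_finish = float('-inf')
    for s, f in sorted(requests, key=lambda r: r[1]):  # sort by finish time
        if s >= last_finish:        # compatible with everything accepted
            accepted.append((s, f))
            last_finish = f
    return accepted
```

The sort dominates the cost, so the whole thing runs in O(n log n), and the greedy-stays-ahead argument shows the returned set has optimal size.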

Chapter 4.2 Summary

  • New lateness-minimizing problem: this time we have a task i that has a deadline di and takes a time interval of length ti, but it can be scheduled anytime, ideally finishing before the deadline. When given a set of jobs, we are allowed to schedule certain jobs such that they finish late. The algorithm we design must determine a start time for each task (interval), so each task will have finish time f(i) = s(i) + ti, where s(i) is the start time and ti is the interval length. A request is late if it misses its deadline, i.e., if f(i) > di; the lateness of this request is li = f(i) − di. The goal of our next algorithm is to schedule all the given intervals in a way that minimizes the maximum lateness. Natural first question: in what order should we complete the jobs? One idea is to schedule jobs in order of increasing length, and another is to sort the jobs in order of increasing slack between deadline and length. These both fail. Instead we use the Earliest Deadline First rule, where we sort the jobs in increasing order of their deadlines and schedule them in that order. The algorithm simply schedules the first job in this sorted order to begin at time s, so s(1) = s, and from that point on f(1) = s(2), f(2) = s(3), etc. Thus, there is no idle time, i.e., there is always some job being completed. Using a series of exchange arguments, we can prove that this algorithm creates an optimal scheduling of intervals. First we show that there is an optimal schedule with no idle time, and then that all schedules with no inversions and no idle time have the same maximum lateness (this is part of the exchange argument). Since our algorithm creates a schedule with no inversions by definition, as well as no idle time, we see that any optimal solution O with these properties has the same maximum lateness as our schedule. Then we show that there is an optimal schedule that has no inversions and no idle time.
This is done by showing that if an inversion exists in O, swapping the inverted intervals i and j will create a swapped schedule with a maximum lateness no larger than that of O. Finally, after these sub-proofs that add up to an entire exchange argument, we get that the schedule produced by the greedy algorithm had optimal maximum lateness.
  • Question: At the end of the chapter, the book mentions a harder version of this question in which a release time is included for each interval as well as its deadline and requested time. In this case, there will most likely be idle time because we will not be able to simply begin the jobs one right after another. In this type of problem, how would we figure out what to change in how we organize our intervals? I feel like coming up with the organizations of these intervals is easy enough once I see how the book does it but I cannot intuitively come up with a good idea/plan of attack for algorithms like these when I'm on my own/ doing the problem sets. Are there strategies for approaching certain types of problems like the design patterns we learned last semester in Software Development for approaching different types of code smells?
  • Rating: 5/10
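The EDF rule above can be sketched in a few lines. This is a minimal sketch (the function name and the (length, deadline) job format are my own choices, and the common start time is assumed to be 0):

```python
# Earliest Deadline First: sort jobs by deadline, run them back-to-back
# with no idle time, and track the maximum lateness max(0, f(i) - d(i)).
def edf_schedule(jobs, start=0):
    """jobs: list of (length t_i, deadline d_i).
    Returns (list of (start, finish) intervals, max lateness)."""
    order = sorted(jobs, key=lambda job: job[1])   # increasing deadline
    t = start
    schedule = []
    max_lateness = 0
    for length, deadline in order:
        s, f = t, t + length                       # no idle time: s(i+1) = f(i)
        schedule.append((s, f))
        max_lateness = max(max_lateness, f - deadline)
        t = f
    return schedule, max_lateness
```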

Chapter 4.4 Summary

  • Now we are dealing with the single-source shortest path problem: given a starting node s, what is the shortest path from s to each other node? In Dijkstra's Algorithm, we maintain a set S of explored nodes, initialized to contain just the starting node with d(s) = 0. At each step we consider the edges (u, v) with u in S and v unexplored, choose the one minimizing d(u) plus the length of (u, v), add v to S, and set d(v) = d(u) + the length of (u, v). Statement (4.14) shows by induction that for each u in S, the path Pu is the shortest s-u path: assuming the paths to the nodes already in S are shortest, the newly added node's path consists of a shortest path plus the single cheapest crossing edge, so it is shortest as well. In order to implement this entire algorithm in a running time of O(mlogn), where m is the number of edges in the graph and n the number of nodes, we store the nodes in a priority queue with d'(v) as the key for each v in V (V being the set of nodes in the graph).
  • Question: I understand the implementation of the algorithm and how the priority queue and ExtractMin operation help the running time but wouldn't the creation of the priority queue using distances as keys have a run-time of O(n) since we would need to figure out the distance to all the keys? Is the set-up not considered to be part of the overall runtime?
  • Rating: 5/10
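A sketch of Dijkstra's Algorithm with a heap-based priority queue (the adjacency-list dict format and function name are my own; rather than a ChangeKey operation, stale heap entries are simply skipped, a common heapq idiom):

```python
import heapq

def dijkstra(graph, s):
    """graph: dict mapping node -> list of (neighbor, edge length).
    Returns dict of shortest distances d(v) from s."""
    dist = {s: 0}
    pq = [(0, s)]                                  # keyed on tentative d'(v)
    while pq:
        d, u = heapq.heappop(pq)                   # ExtractMin
        if d > dist.get(u, float('inf')):
            continue                               # stale entry, skip it
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))        # lazy "ChangeKey"
    return dist
```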

Chapter 4.5 Summary

  • Next we will consider the minimum spanning tree– a minimum-cost solution that connects all the vertices of a graph. The solution that most cheaply connects all the nodes of a graph will be a tree because having two different paths to the same node will not be minimum-cost by definition (one path will have to be cheaper than another, thus eliminating some edge in the solution and creating a tree).
  • In this case there are multiple greedy algorithms that optimally solve the minimum spanning tree problem:
  • Kruskal's Algorithm starts with an empty tree and organizes all the edges in the graph in order of increasing cost. It iterates over these edges from least cost to greatest cost and adds each edge that does not create a cycle in the tree. If the edge creates a cycle, it is discarded and the algorithm proceeds until all edges have been considered.
  • Prim's Algorithm starts with a tree S containing just the starting node. In each iteration, it selects the cheapest edge with exactly one endpoint in S, adds that edge to the final MST, and adds the edge's new endpoint to S.
  • The Reverse-Delete Algorithm begins with the full graph and deletes edges in order of decreasing cost until it can no longer delete an edge without disconnecting the graph.
  • The Cut Property: if e is the minimum-cost edge with one endpoint in S (the explored set of nodes) and the other in V - S (the unexplored set of nodes), then e is part of the MST. This is proven by the fact that no other edge can possibly connect S to V - S more cheaply than e can.
  • Using the Cut Property, it is possible to prove the optimality of both Prim's and Kruskal's Algorithms.
  • The Cycle Property: if C is a cycle in graph G, and e is the most expensive edge belonging to C, then e is not a part of the minimum spanning tree of G. This is also proven by an exchange argument: since e lies on the cycle C, its endpoints remain connected by the other edges of C, each of which is cheaper than e. Swapping e for one of those edges yields a cheaper spanning tree, so no minimum spanning tree contains e.
  • Now we can prove the optimality of the Reverse-Delete Algorithm using the Cycle Property defined above. The idea behind this proof is that every edge removed by the Reverse-Delete Algorithm lies on a cycle, and since edges are considered in descending order of cost, the removed edge is the most expensive edge on that cycle. Thus, by the Cycle Property, it cannot belong to the minimum spanning tree, so deleting it is safe.
  • Question: Is there any situation in which one algorithm for creating a minimum spanning tree would be more optimal than another? Are there certain applications of the minimum spanning tree in which it would make more sense to use the Reverse-Delete Algorithm than Prim's or Kruskal's? Since they are opposite approaches it seems like they would have different implications when applied to different networking problems?
  • Rating: 6/10
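Of the three algorithms above, Prim's is perhaps the easiest to sketch with a priority queue (the adjacency-list graph format and names are my own assumptions):

```python
import heapq

def prim_mst(graph, start):
    """Prim's Algorithm sketch. graph: dict node -> list of (neighbor, cost).
    Repeatedly adds the cheapest edge with exactly one endpoint in S."""
    in_tree = {start}                              # the explored set S
    mst_edges, total = [], 0
    pq = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(pq)
    while pq:
        w, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                               # both endpoints in S: skip
        in_tree.add(v)
        mst_edges.append((u, v, w))
        total += w
        for x, wx in graph[v]:                     # new crossing edges
            if x not in in_tree:
                heapq.heappush(pq, (wx, v, x))
    return mst_edges, total
```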

Chapter 4.6 Summary

  • In this section, we explore the Union-Find data structure, which supports rapid searching and updating of the connected components of a graph while Kruskal's Algorithm is being implemented.
  • Since Kruskal's Algorithm adds edges before the graph is necessarily connected, the set of edges chosen so far forms a collection of disconnected components until the algorithm terminates. This means it would be really helpful to have a structure that tells us, for each edge (v, w) we consider, which connected components v and w belong to. The Union-Find data structure allows us to find whether a node u is connected to a node v by checking if Find(u) = Find(v), since Find(u) returns the name of the set containing u. Note that the Union-Find data structure is only equipped to handle edge addition, not deletion, throughout the Kruskal's Algorithm implementation.
  • The Union-Find data structure will be implemented using an array called Component which has indices 1-n pertaining to the nodes 1-n, where the value stored at Component[n] is the name of the connected component that contains node n. Even with some optimizations that consider the union of large sets into one connected component, the worst-case run-time of adding nodes to a connected component using this implementation is O(n), which is not ideal when adding nodes will be done repeatedly throughout Kruskal's Algorithm.
  • A different approach is using pointers between nodes and connected components represented by a member node's name (the first node to be added to a component, i.e. to exist without being connected to any other nodes). It is possible using this data structure to implement Kruskal's Algorithm in O(mlogn) time because we are using the Find function 2m times (and Find has a runtime of O(logn) time now), and the Union operations n-1 times (the Union operation has a runtime of O(1) and MakeUnionFind has O(n) runtime).
  • Question: Would we just use n 2-element arrays to create the node object in the pointer implementation? Would the first element in the array be the actual node and then the second element be the node it points to?
  • 6/10 very intricate stuff
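A sketch of the pointer-based implementation discussed above, with union-by-size so that Find stays O(logn). The class and method names are my own; as in the text, each set is named by a member node (here, the root of its pointer tree):

```python
class UnionFind:
    """Pointer-based Union-Find: each node stores a parent pointer,
    and a set is named by the root node of its tree."""
    def __init__(self, nodes):                 # MakeUnionFind: O(n)
        self.parent = {v: v for v in nodes}
        self.size = {v: 1 for v in nodes}

    def find(self, v):                         # follow pointers to the set's name
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def union(self, a, b):                     # O(1) once the roots are found
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra                   # smaller tree points into larger
        self.size[ra] += self.size[rb]
```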

Chapter 4.7 Summary

  • In this section we talk about clustering. The algorithm for constructing a k-clustering of maximum spacing is actually the same as Kruskal's Algorithm, except that we start with an empty graph H on the point set U and add the cheapest remaining edge one at a time, stopping once we have k connected components. The only time we don't add an edge (pi, pj) is when pi and pj are already in the same connected component, ensuring that we do not create a cycle. We know the spacing of the resulting k-clustering is optimal because of (4.26). The idea behind the proof is to assume there is some other k-clustering C' of the vertex set U and show that the spacing of C' is at most the spacing d* of our solution C. If C' is not equal to C, there exist two points pi and pj that are in the same cluster Cr of C but in different clusters of C'. Since both points are in the same cluster of our solution, every edge of the path P between them was added by Kruskal's Algorithm, meaning each edge on P has length at most d*. But pi and pj belong to different clusters of C', so somewhere along P there is an edge whose two endpoints fall in different clusters of C'. That edge has length at most d*, so the spacing of C' is at most d*, which concludes our proof.
  • The proof of this algorithm was a bit hard to follow at first, which is why I typed it out as well as I could in my own words instead of summarizing it more compactly. It makes sense now but I am just wondering, is there some way that a greedy-stays-ahead proof could work to prove the optimality of our solution? It seems like since we are constantly optimizing the edges added to our clusters, the greedy algorithm stays ahead intuitively and often when this is the case we can prove optimality using a greedy-stays-ahead proof.
  • Rating: 4/10
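The procedure above (Kruskal's Algorithm stopped at k components) can be sketched as follows. The names and the (distance, pi, pj) edge format are my own; the returned value is the spacing, i.e. the cheapest distance that crosses between two clusters:

```python
def k_clustering(points, dists, k):
    """points: hashable labels; dists: list of (d, p_i, p_j).
    Returns the spacing of the k-clustering of maximum spacing."""
    parent = {p: p for p in points}            # simple union-find
    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p
    components = len(points)
    for d, pi, pj in sorted(dists):            # cheapest edges first
        ri, rj = find(pi), find(pj)
        if ri == rj:
            continue                           # same cluster: would make a cycle
        if components == k:
            return d                           # first cross-cluster edge = spacing
        parent[ri] = rj                        # merge the two clusters
        components -= 1
    return None                                # fewer than k+1 clusters possible
```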

Chapter 4.8 Summary

  • Data compression is an area that studies compression algorithms– algorithms that take files as input and reduce their space through efficient encoding. For example, when encoding 32 characters in the English language using 0's and 1's, the simplest solution would be to represent each character using 5 bits since 32 = 2^5. But anyone who speaks English, or any language for that matter, knows that certain characters are used much more frequently than others and therefore we could more efficiently encode the English language by assigning small numbers of bits to frequent letters and larger numbers of bits to the less frequent letters. Algorithms to implement this type of optimization are what we are looking for.
  • A prefix code for a set S of letters is a function that maps each letter in S to some sequence of zeros and ones, in such a way that for distinct letters x and y in S, the sequence encoding x is not a prefix of the sequence encoding y. Our ideal optimal prefix code, given an alphabet and a set of frequencies for the letters in the alphabet, will minimize the average number of bits per letter, ABL(γ). We will use a binary tree to help us come up with such an optimal prefix code. Each leaf node of the tree will be a letter in the alphabet S. We start at the root of the tree T and follow along the path to our desired letter x. Each time we move from a node to its left child, we write down a 0, and each time we move from a node to its right child we write down a 1. The bit code we are left with when we finally arrive at x is the encoding for the letter x. Using this system, we ensure that the encoding is a prefix code because no letter's leaf node lies on the path to another leaf node (since each leaf node is the end of a downward path through the tree). Next we observe that the binary tree corresponding to the optimal prefix code is full. This is true because if a certain parent node has only one child, the child could be moved up and become the parent node, thus creating a shorter encoding for this child node, and a more optimal overall encoding function.
  • A top-down approach that splits the alphabet into nearly-equal halves recursively does not guarantee an optimal prefix code. However, there exists a greedy approach that yields an optimal prefix code using a binary tree. By (4.29), if a leaf node i has greater depth than a leaf node j, then the letter represented by node i cannot have greater frequency than the letter represented by node j. It follows that we can build an optimal prefix code by first taking all leaves of depth 1 and labeling them with the highest frequency letters, then labeling the leaves of depth 2 with the next-highest frequency letters, etc. This partially solves our overall problem: it allows us to create an optimal prefix encoding given the structure of the tree T and the alphabet with a set of frequencies for its letters. But we aren't really building this optimal encoding “from scratch”.
  • In order to do so, we must begin building our tree from the bottom up. We will consider the set of all letters in our alphabet initially, choosing the two with the smallest frequencies– call them e and f. These will be merged into one single letter (ef) whose frequency is the sum of the frequencies of e and f. Then this merged letter will be added back to the set of letters in the alphabet and e and f will be deleted from the set. We will continue to consider and merge the two lowest-frequency letters in the set until all the letters have been merged into one letter, which will become the root node of our tree with frequency 1.
  • Question: Implementing this algorithm using priority queues via heaps is a clear choice but the idea of using the tree to represent our overall encoding does not seem as obvious. I assume we would use an array for our underlying implementation of the binary tree since it allows constant access?
  • Rating: 6/10 longest section ever! But reading it again after discussing it in class helped clarify a lot.
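The bottom-up merging procedure above is Huffman's algorithm; here is a sketch using a heap-backed priority queue (the dict-of-codes representation and the tie-breaking counter are my own choices, the latter so equal frequencies never force Python to compare the dicts):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """freqs: dict letter -> frequency. Returns dict letter -> bit string."""
    tiebreak = count()
    heap = [(f, next(tiebreak), {letter: ''}) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)        # two lowest-frequency "letters"
        f2, _, c2 = heapq.heappop(heap)
        merged = {x: '0' + b for x, b in c1.items()}   # left subtree gets 0
        merged.update({x: '1' + b for x, b in c2.items()})  # right gets 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]                          # root holds the whole code
```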

Chapter 5.1 Summary

  • In this section, we look at the mergesort algorithm, a very common example of a divide and conquer algorithm. The general structure of the mergesort algorithm is that we divide the input into two pieces of equal size, sort the two halves recursively, then combine the two results into an overall solution, spending linear time on the initial division and final recombining. If we let T(n) be the worst-case running time of this algorithm on an input of size n, we can write the recurrence relation shown in (5.1). It basically states that the worst-case runtime T(n) is always at most 2T(n/2) + O(n), where the O(n) term can be written as cn for some constant c, by the definition of O(n). Unfortunately this representation of T(n) is not so helpful because we would prefer to have T(n) only on the left side of the inequality, not both sides. We will need to solve the recurrence relation.
  • There are a few approaches to solving the recurrence relation. One is to “unroll” the recursion, finding a pattern for runtime that depends on the level of recursion. This is the intuitive way to solve the recurrence relation for mergesort. The other way is to start with a guess for the solution, then substitute it into the recurrence relation and check its accuracy.
  • Here is how “unrolling” the recursion works for mergesort: At the first level, we have a single problem of size n, which by definition takes at most cn plus the time spent in all subsequent recursive calls. The next level consists of two problems each of size n/2, each taking time cn/2 plus the time spent in all subsequent recursive calls, so this level also takes cn time overall. This pattern continues, so every level takes cn time. Since there are always logn levels of recursion for a total input of size n, and each level takes at most cn time, we get a total running time of O(nlogn).
  • Here is how guessing a solution and substituting it into the recurrence relation works for mergesort: We can prove by induction that T(n) ≤ cnlog2n for all n ≥ 2. We assume that T(m) ≤ cmlog2m for all m less than n and then go on to prove inductively that the same holds for n. If we do not know the constant k for which T(n) ≤ knlog2n holds, we can use partial substitution to find it.
  • Questions: When would we use the guessing a solution approach to solve a recurrence relation? Both types of guessing approaches seem to start with some type of specific idea as to what the runtime will be. Would it ever be easier to use this than the “unrolling” approach?
  • Rating: 4/10
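The divide-and-conquer pattern whose recurrence is analyzed above, sketched as code (names are my own):

```python
def mergesort(a):
    """Divide into halves, recurse, combine in linear time:
    T(n) <= 2T(n/2) + cn, hence O(nlogn)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):    # the linear-time combine step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]       # one side may have leftovers
```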

Chapter 5.2 Summary

  • This chapter is all about unrolling different recurrence relations to come up with asymptotic runtimes. We look at the recurrence relation T(n) ≤ qT(n/2) + cn in separate cases: q = 1, q = 2, and q > 2. In the case that q > 2, unrolling the recurrence the same way we demonstrated extensively in class and on our problem set gives the result O(n^(log2 q)) as the asymptotic runtime. In the case that q = 1, the runtime is O(n) because each level j of the recurrence tree yields one problem costing cn/2^j, and these costs form a decreasing geometric series. These recurrence relations could also have been solved by partial substitution, in which a bound is guessed and then checked through induction. When q = 2, the asymptotic runtime of the algorithm is O(nlogn). This range of running times can be attributed to the different parts of the recursion that account for most of the work: when q = 1, the first level dominates the running time, whereas when q = 2, each level does equal work in contributing to the running time, making it sensible that O(nlogn) would be the result since the work divides evenly across the logn levels. Last, we find that a recurrence relation of T(n) ≤ 2T(n/2) + cn^2 yields a running time of O(n^2). This makes sense because at each level j there are 2^j problems, each of size n/2^j and hence cost c(n/2^j)^2. When summed as a geometric series, this leaves us with T(n) less than or equal to O(n^2).
  • Question: It would be helpful to see an example of figuring out asymptotic run time using partial substitution. I think the ideas make sense in the book but it would be difficult for me to think of them on my own.
  • Rating: 4/10

Chapter 5.3 Summary

  • This chapter deals with the “movie ranking” problem we worked on in class where two preference lists are considered and we try to compare them to see how similar they are. The strategy is to count the number of inversions in order to measure this. In order to compare two separate lists, it is only necessary to use the ordering of one list as the “regular” ordering and then do all actual operations on the other; this second list will be the one involved in the algorithm. To find the number of inversions, we split the list into two halves, count how many inversions exist in each half alone, then count how many inversions exist between the two halves. This process is repeated recursively. The recursive step of this algorithm is also responsible for sorting the lists, so that the combining step has less work to do, decreasing the running time. This important combining process is called Merge-and-Count. Merge-and-Count walks along the sorted lists A and B, appending the smaller current element to the sorted output list C. We use pointers to our current position in each list in order to keep track of our place when adding to C. If the pointers are pointing to two elements a and b, then the smaller of the two simply gets removed, appended to C, and that list's pointer is advanced. In this same step we can also count the inversions: if an element of list A is appended to C, no inversions are counted (it is smaller than every element remaining in B), but if an element of B is appended to C, then the inversions added by this element equal the number of elements remaining in A. Merge-and-Count runs in O(n) time and the entire algorithm requires O(nlogn) time.
  • Questions: Would this algorithm fall apart if the list were to be recursively divided into three sub-lists? Would it be possible to use more sub-problems or would this break it? The step where we don't have to add inversions to C if the new element comes from A and vice versa for B would be a bit different but I feel like it would still work somehow.
  • Rating: 4/10
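The Sort-and-Count recursion with the Merge-and-Count combining step can be sketched as follows (the function name is my own):

```python
def sort_and_count(a):
    """Returns (sorted copy of a, number of inversions in a)."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_l = sort_and_count(a[:mid])
    right, inv_r = sort_and_count(a[mid:])
    merged, i, j, cross = [], 0, 0, 0
    while i < len(left) and j < len(right):    # Merge-and-Count
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1     # no inversions counted
        else:
            merged.append(right[j]); j += 1
            cross += len(left) - i             # inverted with all remaining in A
    merged += left[i:] + right[j:]
    return merged, inv_l + inv_r + cross
```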

Chapter 5.4 Summary

  • In this section, we are going to find the closest pair of points! The first step of our algorithm is to sort all n points into a list Px of ascending x-coordinates, and to create another list Py where they are sorted by ascending y-coordinates. Then we divide the plane of points into two halves with n/2 points each by splitting the list Px into two halves, Q and R. Next, using Px and Py, in O(n) time we can create a list of the points in Q sorted by increasing x-coord, a list of the points in Q sorted by increasing y-coord, and the same two lists for the R half. Then we recursively find the closest pair within Q and within R, and we assign the smaller of these two distances to the variable δ. Then we search in between the two halves, within a strip of width δ on each side of the dividing line: if there are no points within δ of the line, we know no pair is closer than the closest pair already found in R or Q. Otherwise, we search along the dividing line using boxes of size δ/2 x δ/2 to contain the strip's points. If a point lies in one such box, it is only necessary to compare its distance to points within the next 15 boxes upwards and to the sides of this point's box (including across the dividing line); it need not be compared to any other points in the strip. This algorithm completes in O(nlogn) time.
  • Questions: The proof of this algorithm is a bit fuzzy to me. I would like to discuss it in class.
  • Rating: 7/10 this is a really cool algorithm.
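A sketch of the divide-and-conquer, assuming distinct points and at least two of them. In the strip scan I compare each point to its next 7 successors in y-order, a common constant-size variant of the book's 15-box bound; all names are my own:

```python
import math

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def _closest(px, py):
    n = len(px)
    if n <= 3:                                 # brute-force base case
        return min(_dist(p, q) for i, p in enumerate(px) for q in px[i + 1:])
    mid = n // 2
    x_mid = px[mid - 1][0]                     # x-coordinate of the divide
    qx, rx = px[:mid], px[mid:]
    q_set = set(qx)
    qy = [p for p in py if p in q_set]         # keep y-order in O(n)
    ry = [p for p in py if p not in q_set]
    delta = min(_closest(qx, qy), _closest(rx, ry))
    strip = [p for p in py if abs(p[0] - x_mid) < delta]
    for i, p in enumerate(strip):              # only a few y-successors matter
        for q in strip[i + 1:i + 8]:
            delta = min(delta, _dist(p, q))
    return delta

def closest_pair(points):
    """Returns the distance between the closest pair of distinct points."""
    px = sorted(points)                        # sorted by x
    py = sorted(points, key=lambda p: (p[1], p[0]))  # sorted by y
    return _closest(px, py)
```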

Chapter 6.1 Summary

  • This section deals with weighted interval scheduling. The input is a set of jobs with start and end times and values (or weights), and the goal is to accept the set of non-overlapping jobs with the greatest total value possible. No natural greedy algorithm works here. So the idea is that we will design an algorithm based on the observation that each job is either selected or not. If we do not select job j, we go on to consider job j-1. If we do select it, we add the value of j plus the optimal value over the jobs that don't overlap j to our final solution. In order to decide whether or not to add job j, we simply recursively choose the maximum of these two options. The efficiency of this algorithm relies on memoizing our recursion. This means after the first time we compute the optimal value for a certain job, we store it in a globally accessible array, and thus we will not have to recalculate it each time we are computing the optimal value for other jobs. Using memoization the Compute-Opt procedure becomes O(n).
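A sketch of the memoized recursion (the (start, finish, value) job format and names are my own; p(j), the rightmost job ending no later than job j starts, is found by binary search over finish times):

```python
from bisect import bisect_right

def weighted_interval(jobs):
    """jobs: list of (start, finish, value). Returns the optimal total value."""
    jobs = sorted(jobs, key=lambda j: j[1])        # sort by finish time
    finishes = [f for _, f, _ in jobs]
    # p[j] = index of rightmost job whose finish <= start of job j, or -1
    p = [bisect_right(finishes, s) - 1 for s, _, _ in jobs]
    memo = {}
    def opt(j):                                    # best value using jobs[0..j]
        if j < 0:
            return 0
        if j not in memo:
            _, _, v = jobs[j]
            memo[j] = max(opt(j - 1),              # skip job j
                          v + opt(p[j]))           # take job j
        return memo[j]
    return opt(len(jobs) - 1)
```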

Chapter 6.2 Summary

  • Instead of defining our solution to the weighted intervals problem recursively, we can also create an equally efficient iterative solution. In this solution, we begin by computing M[0], then iterate through the jobs, filling in M[j] for each job j until we reach M[n]. From here on out we will use dynamic programming algorithms that take this iterative approach instead of the recursive one. In order to build an algorithm based on dynamic programming, we need a set of subproblems that satisfies these properties:
  • polynomial number of subproblems
  • solutions to subproblems can be easily combined to solve overall problem
  • subproblems can be naturally ordered from “smallest” to “largest”
  • Question: Why is it better to use the iterative approach than the recursive approach? Is it just for the sake of writing the algorithm or is there some type of running time tradeoff I am missing?
  • Rating: 5/10 because it didn't give much detail on the reason for our new interest in iterative algorithms instead of recursive ones.
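A sketch of the iterative approach, filling the array M from the bottom up (the (start, finish, value) job format and names are my own assumptions):

```python
from bisect import bisect_right

def iterative_opt(jobs):
    """jobs: list of (start, finish, value). Iterative DP, no recursion."""
    jobs = sorted(jobs, key=lambda j: j[1])        # sort by finish time
    finishes = [f for _, f, _ in jobs]
    M = [0] * (len(jobs) + 1)                      # M[j] = best over first j jobs
    for j in range(1, len(jobs) + 1):
        s, _, v = jobs[j - 1]
        p = bisect_right(finishes, s)              # # of jobs compatible with job j
        M[j] = max(M[j - 1],                       # skip job j
                   v + M[p])                       # take job j
    return M[len(jobs)]
```

The two versions compute the same table; the iterative one just makes the "smallest to largest subproblem" ordering explicit.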

Chapter 6.3 Summary

  • The next section focuses on problems where we do not just pick between choosing something or not but instead have to choose between some polynomial number of possibilities. These are called “multi-way choices”.
  • We will try to fit lines to a given set of n points in the (x,y) plane. This problem is unique because instead of creating just one line to fit the entire set of points, often the points would be better fit by two or more line segments. This will be accommodated by our algorithm. We will divide the points into partitions, where each partition P corresponds to a line segment, and for each partitioning we will calculate a penalty, defined as the sum of the number of segments into which the points are partitioned (weighted by a fixed cost per segment) and, for each segment, the error value of the optimal line through it. The penalty will help us use dynamic programming to calculate the optimal set of lines to fit our points.

Chapter 6.4 Summary

  • The knapsack problem!
  • Given a set of objects with weight and value, and a knapsack with a maximum weight W, how can we maximize the value of the items stored in the knapsack without putting too much weight in it?
  • In order to solve for Opt(n), we will need to find out the value for Opt(n-1) and the Opt() of all subsets of the first n-1 items. Thus, we will use a 2D array to store the values for the optimal value in the bag per weight per num items added for each subset. This is a version of the memoization we used in previous algorithms to allow us not to have to recalculate and help us easily make comparisons throughout the algorithm.
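The 2D memoization table described above can be sketched as follows (names are my own; items are given as (weight, value) pairs):

```python
def knapsack(items, W):
    """items: list of (weight, value); W: knapsack capacity.
    opt[i][w] = best value using the first i items with capacity w."""
    n = len(items)
    opt = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi, vi = items[i - 1]
        for w in range(W + 1):
            opt[i][w] = opt[i - 1][w]              # option 1: skip item i
            if wi <= w:                            # option 2: take item i
                opt[i][w] = max(opt[i][w], vi + opt[i - 1][w - wi])
    return opt[n][W]
```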

Chapter 7.1 Summary

  • This section deals with flow networks. These are networks that model traffic through “switches” passing from edge to edge. They are modeled by graphs with a few components: capacities on the edges, source nodes that generate traffic, sink nodes that absorb traffic, and the traffic itself, which is transmitted across the edges. The graphs of this form are directed graphs represented by G = (V, E) as any other graph. However, this graph also has a single source node s and a single sink node t; the other nodes are called internal nodes. An s-t flow is a function mapping each edge e in E to a nonnegative real number; the value f(e) represents the amount of flow carried by edge e. A flow f must satisfy two conditions: 0 ≤ f(e) ≤ ce for every edge e, and for each internal node v, the sum of flow into v equals the sum of flow out of v. The problem we will try to solve with this network flow model is to find the arrangement of traffic that makes the most efficient use possible of the given capacities.
  • The residual graph is a tool we will use to implement an algorithm to solve the maximum-flow problem. The residual graph contains the same nodes as our original graph but its edges are different. Each forward edge (u,v) has a capacity of ce - f(e) and then a backwards edge (v,u) is added so that we can “undo” this forward flow. The capacity of this backwards edge is f(e).
  • The augment function defined on page 342 takes an s-t path P in the residual graph and pushes b = bottleneck(P, f) units of flow along it: the flow on each forward edge of P increases by b, and the flow on each backward edge decreases by b. Proof (7.1) shows that the result of this augment function is still a flow in G.
  • Then the Ford-Fulkerson Algorithm uses this augment function to compute the maximum flow. It repeatedly finds an s-t path in the current residual graph, augments along it, and updates the residual graph, stopping when no s-t path remains.
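A sketch of Ford-Fulkerson; using BFS to find augmenting paths is my own choice (the Edmonds-Karp variant), since the book allows any s-t path in the residual graph. The dict-of-dicts capacity format and names are also assumptions:

```python
from collections import defaultdict, deque

def max_flow(capacity, s, t):
    """capacity: dict-of-dicts, capacity[u][v] = c(u, v). Returns max flow value."""
    residual = defaultdict(lambda: defaultdict(int))
    for u in capacity:                             # build the residual graph
        for v, c in capacity[u].items():
            residual[u][v] += c
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:           # BFS for an augmenting path
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                            # no s-t path left: done
        path, v = [], t                            # recover the path s -> t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(residual[u][v] for u, v in path)   # bottleneck(P, f)
        for u, v in path:                          # augment: forward down,
            residual[u][v] -= b                    # backward edge up by b
            residual[v][u] += b
        flow += b
```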

Chapter 7.2 Summary

  • We will now use the fact that cuts provide very natural upper bounds on the values of flows. This is since the capacity of a cut (A,B) is the sum of capacities on all edges out of A. Then we will prove that v(f) = fout(A) - fin(A). The intuition is that fin(s) = 0, so v(f) = fout(s) - fin(s). Then each edge e either has both ends in A, has one end in and one end out of A, or has neither end in A. So f(e) either appears in the sum once with a + and once with a - (canceling out), appears just once with a + or just once with a -, or does not appear at all. Thus, the sum over the nodes of A reduces to fout(A) - fin(A).
  • This helps us come to the conclusion that the value of the maximum flow in the maximum-flow problem is equal to the capacity of the minimum cut.
courses/cs211/winter2014/journals/johanna/home.txt · Last modified: 2014/04/02 21:34 by goergenj