======= The Front Matter =======
==Summary==
This section is an introduction to greedy algorithms. Greedy algorithms build a solution by making step-by-step, local decisions to optimize some underlying criterion. There are two types of proofs: Greedy Stays Ahead and the Exchange Argument. Greedy Stays Ahead measures the progress of the algorithm and proves that at each step it does at least as well as any other solution. The exchange argument is when you consider an optimal solution and gradually transform it into the solution yielded by the greedy algorithm without jeopardizing its optimality at any point.
==Notes==
* “Greed… is good. Greed is right. Greed works.”
* An algorithm is greedy if it builds up a solution in small steps, choosing a decision at each step myopically to optimize some underlying criterion.
* When a greedy algorithm succeeds in solving a nontrivial problem optimally, it typically implies something interesting and useful about the structure of the problem itself: there is a local decision rule that one can use to construct optimal solutions.
* Types of proofs:
* Greedy Stays Ahead: if you measure the greedy algorithm’s progress in a step-by-step fashion, one sees that it does better than any other algorithm at each step, and thus produces an optimal solution.
* Exchange Argument: consider any possible solution to the problem and gradually transform it into the solution found by the greedy algorithm without hurting its quality.
==Additional Information==
There wasn't anything that I was unclear about or had to reread to understand after class or after my first run-through of this section. In terms of readability, this section was a 10 because it was just an introduction section. It was basically definitions, and they were super easy to understand because they were clearly explained. Also, this section was only a page, so it was a super quick read.

======= 4.1 Interval Scheduling =======
==Summary==
In this section, there are two problems to solve: Interval Scheduling and Interval Partitioning. In Interval Scheduling, the greedy aspect of the problem is to select intervals based on finish times (the smallest one first); this allows us to maximize the time left to complete other tasks. We prove the algorithm's optimality based on this greedy criterion by using the greedy stays ahead proof; a short sketch of the rule in code appears below. In Interval Partitioning, the greedy aspect is that we select intervals based on the earliest start time. We prove this greedy algorithm optimal by using a structural argument: the number of resources for the intervals cannot be less than the depth of the intervals.
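A minimal Python sketch of this earliest-finish-time rule (representing each request as a (start, finish) pair and the function name are my own choices, not the book's):

def interval_scheduling(requests):
    """Greedy interval scheduling: repeatedly accept the request with the
    smallest finish time among those compatible with what was accepted.
    requests: list of (start, finish) pairs; returns the accepted subset."""
    accepted = []
    last_finish = float("-inf")
    # Sorting by finish time lets a single pass play the role of the While loop.
    for start, finish in sorted(requests, key=lambda r: r[1]):
        if start >= last_finish:   # compatible with every accepted request
            accepted.append((start, finish))
            last_finish = finish
    return accepted

# The greedy choice (1, 3) rules out (0, 4) but leaves room for (4, 6).
print(interval_scheduling([(0, 4), (1, 3), (4, 6)]))   # [(1, 3), (4, 6)]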
==The Problem==
* A subset of requests is compatible if no two of them overlap in time; our goal is to accept as large a compatible subset as possible
* Compatible sets of maximum size are optimal
==Designing a Greedy Algorithm==
* The basic idea in a greedy algorithm for interval scheduling is to use a simple rule to select a first request i1, then reject all requests not compatible with i1, then select a request i2, then reject all requests incompatible with i2, and so on until we run out of requests
* The challenge in designing a good greedy algorithm is deciding which simple rule to use for the selection
* There are many natural rules for this problem that do not give optimal solutions
* Pick based on earliest start time: not optimal
* Pick based on smallest interval of time: not optimal
* For each request, count the number of requests that are not compatible with it and accept the request that has the fewest incompatible requests (pick based on fewest conflicts): surprisingly, not optimal
* The optimal rule: accept the request that finishes first, the request for which the finish time is smallest
* Allows us to maximize the time left to satisfy other requests

Algorithm:
  Initially let R be the set of all requests, and let A be empty
  While R is not yet empty
    Choose a request i in R that has the smallest finishing time
    Add request i to A
    Delete all requests from R that are not compatible with request i
  Endwhile
  Return the set A as the set of accepted requests

==Analyzing the Algorithm==
* A is a compatible set of requests
* We need to show that the solution yielded by the algorithm is optimal
* We will show that the size of A (the set of compatible requests yielded by the algorithm) equals the size of O (an optimal solution); A has the same number of intervals as O and hence is also optimal
* We will use a greedy stays ahead proof
* We want to show that each of A’s intervals finishes at least as soon as the corresponding interval in the set O
* For each r >= 1, the rth accepted request in the algorithm’s schedule finishes no later than the rth request in the optimal schedule
* For all indices r <= k, we have f(ir) <= f(jr), where ir is the rth request accepted by the greedy algorithm and jr is the rth request in O
* Now let r > 1. Assume the statement is true for r-1, and we will prove it for r. For the greedy algorithm to fail to stay ahead, it would have to fall behind at some step. However, it never does, because at worst it always has the option of choosing jr.
* The greedy algorithm returns an optimal set A
* If A is not optimal, an optimal set O must have more requests: m > k. Then there must be a request j(k+1) in O, which starts after jk ends, and thus after ik ends; therefore j(k+1) would be compatible with ik and with all of the set A. But the greedy algorithm stops with request ik, and it is only supposed to stop when R is empty – a contradiction.
* Runtime: O(n log n); the most costly portion of the algorithm is sorting the requests in order of finishing time
* Extensions: the scheduler may need to make decisions about accepting and rejecting before knowing the full set of requests, and requests may time out if the scheduler waits too long to schedule them; each request may have a different value, and we may want to maximize the total value
==Scheduling All Intervals==
* Goal is to partition all intervals across multiple resources: the Interval Partitioning Problem
* We have many identical resources available, and we want to schedule all requests using as few resources as possible
* Define the depth of a set of intervals as the maximum number that pass over any single point on the timeline
* In any instance of Interval Partitioning, the number of resources needed is at least the depth of the set of intervals
* Suppose a set of intervals has depth d, and let requests I1 through Id all pass over a common point on the timeline. Each of these intervals must be scheduled on a different resource, so we need at least d resources
* Can we design an algorithm that schedules all intervals using the minimum possible number of resources? Is there always a schedule using a number of resources equal to the depth? Yes and yes!
* Design a greedy algorithm that schedules all intervals using a number of resources equal to the depth; no solution can use a number of resources smaller than the depth
* Structural argument: there is a structural bound asserting that every possible solution must have at least a certain value

Algorithm:
  Sort the intervals by their start times, breaking ties arbitrarily
  Let I1, I2, …, In denote the intervals in this order
  For j = 1, 2, 3, …, n:
    For each interval Ii that precedes Ij in sorted order and overlaps it:
      Exclude the label of Ii from consideration for Ij
    Endfor
    If there is any label from {1, 2, …, d} that has not been excluded then:
      Assign a nonexcluded label to Ij
    Else
      Leave Ij unlabeled
    Endif
  Endfor

* If we use the greedy algorithm above, every interval will be assigned a label, and no two overlapping intervals will receive the same label.
* The greedy algorithm above schedules every interval on a resource, using a number of resources equal to the depth of the set of intervals. This is the optimal number of resources needed.
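A short Python sketch of this labeling procedure, a direct transcription of the pseudocode above (the (start, finish) pair representation and the simple O(n^2) scan are my own simplifications):

def interval_partitioning(intervals):
    """Label each interval with a resource number 1, 2, ... so that
    overlapping intervals get different labels; the number of labels
    used equals the depth. intervals: list of (start, finish) pairs."""
    order = sorted(range(len(intervals)), key=lambda j: intervals[j][0])
    labels = [None] * len(intervals)
    for pos, j in enumerate(order):
        start_j = intervals[j][0]
        # Exclude labels of earlier-starting intervals that overlap I_j.
        excluded = {labels[i] for i in order[:pos] if intervals[i][1] > start_j}
        label = 1
        while label in excluded:
            label += 1
        labels[j] = label
    return labels

# Three intervals pass over the point 1.5, so the depth (and label count) is 3.
print(interval_partitioning([(0, 3), (1, 4), (1, 2), (3, 5)]))   # [1, 2, 3, 1]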
==Additional Information==
In class, I understood what the structural argument was in the specific Interval Partitioning problem: the optimal solution cannot use fewer resources than the depth of the intervals. However, I was kind of confused by what it was in a broader sense. Luckily, after reading this section, it became much clearer. You just have to prove that there is a structural bound asserting that every possible solution must have at least a certain value. Then, you have to show that the greedy algorithm yields a result that meets that minimum value. In terms of readability, this section is a solid 9. Because I read this section after hearing the lecture in class, it was super easy to comprehend, and the proofs were written very clearly. The only proof that was a little difficult to understand, in terms of the way the book explained it, was the greedy stays ahead proof. Luckily, it wasn't too difficult to get because we learned about it in class.

======= 4.2 Scheduling to Minimize Lateness =======
==Summary==
We want to schedule all of the tasks, without overlapping, so as to minimize the maximum lateness. Scheduling jobs in order of increasing length or by smallest slack time does not always yield an optimal solution. The greedy criterion for this problem is to sort the jobs in increasing order of deadline and then schedule them in that order; of course, there should be no idle time. To prove optimality for this greedy algorithm, we will use an exchange argument, in which we slowly modify an optimal solution, ensuring it remains optimal at each step, until we transform the old optimal solution into the solution that our greedy algorithm would produce. In this case, we modify an optimal solution with inversions until it has no inversions, proving the greedy algorithm's optimality. A short sketch of the rule in code appears at the end of this section.
==The Problem==
* Request i has a deadline di and requires a contiguous time interval of length ti, but it is willing to be scheduled at any time before the deadline. Each accepted request must be assigned an interval of time of length ti, and different requests must be assigned non-overlapping intervals.
* Request i is late if it misses the deadline
* Our goal is to schedule all requests, using non-overlapping intervals, so as to minimize the maximum lateness
==Designing the Algorithm==
* Schedule jobs in order of increasing length: not optimal
* Schedule jobs by smallest slack time: not optimal
* The optimal rule: sort the jobs in increasing order of their deadlines di, and schedule them in that order (the Earliest Deadline First rule).
* s is the common start time, f is the finish time, d is the deadline, t is the length of the interval

Algorithm:
  Order the jobs in order of their deadlines
  Assume that d1 <= … <= dn (sorted deadlines)
  Initially, f = s
  Consider the jobs i = 1, …, n in this order:
    Assign job i to the time interval from s(i) = f to f(i) = f + ti
    Let f = f + ti
  End
  Return the set of scheduled intervals [s(i), f(i)] for i = 1, …, n

==Analyzing the Algorithm==
* The algorithm above produces no gaps: no idle time.
* There is an optimal solution with no idle time.
* Exchange argument: gradually modify O, preserving its optimality at each step, but eventually transforming it into a schedule that is identical to the schedule A found by the greedy algorithm.
* All schedules with no inversions and no idle time have the same maximum lateness.
* They can only differ in the order in which jobs with identical deadlines are scheduled, which cannot affect the maximum lateness.
* There is an optimal schedule that has no inversions and no idle time.
* If O has an inversion, then there is a pair of jobs i and j such that j is scheduled immediately after i and has dj < di. Swapping i and j removes this inversion without increasing the maximum lateness, so repeated swaps transform O into the no-inversion schedule A.
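A minimal Python sketch of the Earliest Deadline First rule, following the pseudocode above (representing each job as a (ti, di) pair is my own choice, not the book's):

def minimize_lateness(jobs):
    """Earliest Deadline First: schedule jobs back to back in order of
    increasing deadline. jobs: list of (t, d) = (length, deadline) pairs.
    Returns the schedule [(s_i, f_i)] and the maximum lateness."""
    f = 0                                               # common start time s = 0
    schedule, max_lateness = [], 0
    for t, d in sorted(jobs, key=lambda job: job[1]):   # sort by deadline d_i
        s, f = f, f + t                                 # no idle time between jobs
        schedule.append((s, f))
        max_lateness = max(max_lateness, f - d)         # job is late iff f_i > d_i
    return schedule, max_lateness

# Jobs (length, deadline): EDF schedules the d=2 job first, then d=4, then d=6.
print(minimize_lateness([(3, 6), (2, 2), (1, 4)]))   # ([(0, 2), (2, 3), (3, 6)], 0)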
======= 4.7 Clustering =======
==Summary==
In this section, we want to divide a set of objects into k groups so that objects in the same group are close together and objects in different groups are far apart; that is, we want a k-clustering of maximum spacing. The distance function satisfies d(pi, pi) = 0, d(pi, pj) > 0 when pi and pj are distinct, and d(pi, pj) = d(pj, pi). The algorithm to find this clustering is basically running Kruskal’s Algorithm but stopping it right before it adds its last k-1 edges. The connected components formed by deleting the k-1 most expensive edges of the minimum spanning tree T create a k-clustering of maximum spacing.
==The Problem==
* A collection of objects that one is trying to classify or organize into coherent groups.
* Define a distance function on the objects, with the interpretation that objects at a larger distance from one another are less similar to each other.
* Given a distance function on the objects, the clustering problem seeks to divide them into groups so that objects within the same group are “close” and objects in different groups are “far apart”.
* n objects labeled p1, p2, …, pn form the set U
* For each pair pi and pj, we have a distance d(pi, pj)
* d(pi, pi) = 0, d(pi, pj) > 0 for distinct pi and pj, and d(pi, pj) = d(pj, pi)
* We want to divide the objects in U into k groups, for a given parameter k
* A k-clustering of U is a partition of U into k nonempty sets C1, C2, …, Ck
* The spacing of a k-clustering is the minimum distance between any pair of points lying in different clusters
* We are looking for the k-clustering with the maximum possible spacing
==Designing the Algorithm==
* Consider growing a graph on the vertex set U
* Connected components will be the clusters
* Draw an edge between the closest pair of points, then an edge between the next closest pair of points, and so on; continue adding edges between pairs in order of increasing distance d(pi, pj)
* If we are about to add edge (pi, pj) and find that pi and pj are already in the same cluster, we do not add the edge (we never create a cycle, so the components form a union of trees)
* This iterative merging of clusters is termed single-link clustering, a special case of hierarchical agglomerative clustering
* Our procedure above is Kruskal’s Minimum Spanning Tree Algorithm; the only difference is that we stop the procedure once we obtain k connected components
* Run Kruskal’s Algorithm but stop it just before it adds its last k-1 edges (a runnable sketch follows the proof below)
* The components C1, C2, …, Ck formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.
* The spacing of this k-clustering is precisely the length d* of the (k-1)st most expensive edge in the minimum spanning tree: no two points at distance less than d* can lie in different clusters, since Kruskal’s Algorithm would already have connected them before stopping. Consider some other clustering, which partitions U into nonempty sets C1', C2', …, Ck'. We need to show that the spacing of this other clustering is at most d*. Because the two clusterings are not the same, one of our clusters Cr is not a subset of any of the k sets Cs'. Then there are points pi and pj in Cr that belong to different sets in the other clustering, say pi in Cs' and pj not in Cs'. Because pi and pj belong to the same component Cr, Kruskal's Algorithm must have added all of the edges of a pi-pj path P before it stopped, and each edge on P has length at most d*. Let p' be the first node on P that does not belong to Cs', and let p be the node on P that comes just before p'. Then d(p, p') <= d*, since the edge (p, p') was added by Kruskal's Algorithm. But p and p’ belong to different sets in the other clustering, hence the spacing of the other clustering is at most d(p, p’) <= d*.
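A runnable sketch of this single-link procedure with a simple union-find, stopping Kruskal's Algorithm at k components (the function names and the one-dimensional example are mine):

def k_clustering(points, k, d):
    """Partition points into the k-clustering of maximum spacing by
    running Kruskal's Algorithm on all pairwise edges and stopping
    once k connected components remain. Returns a cluster label per point."""
    n = len(points)
    parent = list(range(n))

    def find(x):                          # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((d(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    components = n
    for dist, i, j in edges:              # edges in order of increasing distance
        if components == k:               # stop before the last k-1 merges
            break
        ri, rj = find(i), find(j)
        if ri != rj:                      # skip edges inside a cluster: no cycles
            parent[ri] = rj
            components -= 1
    return [find(i) for i in range(n)]

# {1, 2} and {10, 11} form the 2-clustering of maximum spacing (spacing 8).
print(k_clustering([1, 2, 10, 11], 2, lambda a, b: abs(a - b)))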
==Additional Information==
The thing that confused me in this section was the proof of the algorithm's correctness; the notation in the proof was super confusing to me. The proof rests on the fact that Kruskal's Algorithm adds the smallest-distance edges first and leaves the largest for last, and those largest edges it does not add. Thus, whenever another clustering deviates from the one Kruskal's Algorithm produces, some pair of points separated by the deviant clustering is at distance at most d*, the largest distance Kruskal's Algorithm added. Therefore, Kruskal's produces the optimal solution. On a scale of 1 to 10, the readability of this section was a 9. The book did a really great job explaining what a clustering is, and the description of the algorithm was super clear. The only thing that confused me a little was the proof of the algorithm we used. But, still, I feel like I at least got the gist of the proof.

======= 4.8 Huffman Codes =======
==Summary==
We want to map letters to encodings so that no encoding is the prefix of another, in order to avoid ambiguity. A prefix code that minimizes the average bits per letter is optimal; in other words, more frequently used letters should have shorter encodings in order to compress the data. A brute-force algorithm is infeasible because the search space is so large and so variable. We will use a full binary tree to represent prefix codes; all encoded letters will be leaf nodes, so that no encoding is the prefix of another. Meta-letters will be interior nodes (parents of letters). The greedy aspect of Huffman's Algorithm is that you merge the two lowest-frequency letters at each step. This bottom-up approach is the optimal approach. The runtime of the algorithm is O(k log k).
==The Problem==
* A greedy rule is used to shrink the size of the problem instance, so that an equivalent smaller problem can then be solved by recursion
* The global consequences of the initial greedy decision do not become fully apparent until the full recursion is complete
* Data compression is an area that forms part of the foundation for digital communication
* The mapping of bit strings to symbols is arbitrary
* The letters in most human alphabets do not get used equally frequently
* The issue of reducing the average number of bits per letter is a fundamental problem in the area of data compression
* How might we construct the optimal way to take advantage of the nonuniform frequencies of the letters?
* Appealing goal for compressing data: squeeze all the available gains out of nonuniformities in the frequencies
* Variable-Length Encoding Schemes
* Morse code uses such short strings for the letters that the encoding of words becomes ambiguous
* To deal with this ambiguity, Morse code transmissions involve short pauses between letters
* Prefix Codes
* The ambiguity in Morse code arises because there exist pairs of letters where the bit string that encodes one letter is a prefix of the bit string that encodes another
* It is enough to map the letters to bit strings in such a way that no encoding is a prefix of any other
* A prefix code for a set S of letters is a function γ that maps each letter x in S to some sequence of zeroes and ones, in such a way that for distinct x, y in S, the sequence γ(x) is not a prefix of the sequence γ(y).
* If we hand this message to a recipient who knows the function γ, they will be able to reconstruct the text according to the following rule:
* Scan the bit sequence from left to right
* As soon as you’ve seen enough bits to match the encoding of some letter, output this as the first letter of the text (it must be the correct first letter, because no shorter or longer prefix of the bit sequence could encode any other letter)
* Delete the corresponding bits from the front of the message and iterate
* Optimal Prefix Codes
* More frequent letters should have shorter encodings
* Notation to express the frequencies of the letters; below is the average number of bits required per letter:
* ALB(γ) = Σx∈S fx·|γ(x)|
* A prefix code that minimizes the average number of bits per letter, ALB(γ), is optimal
==Designing the Algorithm==
* The search space includes all possible ways of mapping letters to bit strings, subject to the defining property of prefix codes
* Brute force is infeasible
* To construct an optimal prefix code: develop a tree-based means of representing prefix codes
* Using a binary tree, follow the path from the root to the leaf labeled x: going from a node to its left child appends a 0, going from a node to its right child appends a 1, and the resulting string of bits is the encoding of x
* The encoding of S constructed from T is a prefix code
* Start with a root; all letters x in S whose encodings begin with 0 will be leaves of the left subtree of the root, and all letters y in S whose encodings begin with a 1 will be leaves of the right subtree; build the two subtrees recursively using this rule
* The search for an optimal prefix code can be viewed as the search for a binary tree T, together with a labeling of the leaves of T, that minimizes the average number of bits per letter
* The length of the encoding of a letter x in S is simply the length of the path from the root to the leaf labeled x
* We are seeking the labeled tree that minimizes the weighted average of the depths of all leaves, where the average is weighted by the frequencies of the letters that label the leaves: Σx∈S fx·depthT(x)
* A binary tree is full if each node that is not a leaf has two children
* The binary tree corresponding to the optimal prefix code is full
* A natural approach would be to try building a tree from the top down, packing the leaves as tightly as possible: this is suboptimal
* Suppose that u and v are leaves of T* such that depth(u) < depth(v). Suppose that in a labeling of T* corresponding to an optimal prefix code, leaf u is labeled with y in S and leaf v is labeled with z in S. Then fy >= fz.
* Go through the leaves in order of increasing depth, assigning letters in order of decreasing frequency
* Among the labels we assign to a block of leaves all at the same depth, it doesn’t matter which label we assign to which leaf
* Consider a leaf v in T* whose depth is as large as possible. Leaf v has a parent u, and since T* is a full binary tree, u has another child w. We refer to v and w as siblings, since they have a common parent.
* w is a leaf of T*
* There is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*.
* y* and z* are the two lowest-frequency letters in S
* Their common parent acts like a “meta-letter” whose frequency is the sum of the frequencies of y* and z*.
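Before the algorithm itself, a small Python sketch of the left-to-right decoding rule from the Prefix Codes discussion above (the example code table is hypothetical):

def decode(bits, code):
    """Decode a bit string under a prefix code: as soon as the bits seen
    so far match the encoding of some letter, output that letter and
    start over on the rest. code: dict mapping letter -> bit string."""
    inverse = {word: letter for letter, word in code.items()}
    text, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:   # unambiguous: no codeword is a prefix of another
            text.append(inverse[current])
            current = ""
    return "".join(text)

# A hypothetical prefix code for {a, b, c}: no encoding is a prefix of another.
print(decode("010110", {"a": "0", "b": "10", "c": "11"}))   # "abca"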
==Huffman’s Algorithm==

To construct a prefix code for an alphabet S, with given frequencies:
  If S has two letters:
    Encode one letter using 0 and the other using 1
  Else:
    Let y* and z* be the two lowest-frequency letters
    Form a new alphabet S’ by deleting y* and z* and replacing them with a new letter w of frequency fy* + fz*
    Recursively construct a prefix code γ’ for S’, with tree T’
    Define a prefix code for S as follows:
      Start with T’
      Take the leaf labeled w and add two children below it labeled y* and z*
  Endif

* The greedy rule underlying Huffman’s Algorithm – the merging of the two lowest-frequency letters – fits into the structure of the algorithm as a whole
* Simply commit to having them be children of the same parent; this is enough to produce a new, equivalent problem with one less letter
* Bottom-up approach: it focuses on the leaves representing the two lowest-frequency letters and then continues by recursion
==Analyzing the Algorithm==
* The Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code.
* Implementation and Runtime
* There are k-1 iterations of recursive calls
* Using an implementation of priority queues via heaps, we can perform insertion and extraction of the minimum in O(log k) time
* Each iteration, which performs three of these operations, takes O(log k) time
* Summing over all k iterations, we get a total runtime of O(k log k); see the sketch at the end of this section
==Extensions==
* The challenge here is to define an encoding scheme where the notion of using fractions of bits is well-defined
* Another drawback of prefix codes is that they cannot adapt to changes in the text
==Additional Information==
A concept that was troubling me in the beginning stages of understanding Huffman codes was the concept of the meta-letter that is the parent of leaf nodes. As I read the section after class, it made more sense. The meta-letter is there because a letter node cannot be there: a letter's encoding cannot be an interior node, because that would mean it is the prefix of another encoding, which defeats the point of Huffman codes. There can be no encoding that is the prefix of another. Moreover, meta-letters are helpful and make more sense in Huffman's optimal bottom-up approach anyway. The first meta-letter joins the two letters of lowest frequency; that way it's like a super-letter, and now the alphabet has one less letter. This explanation of the purpose of the meta-letter also makes it clear why we would implement the algorithm with a priority queue. On a scale of 1 to 10, the readability of this section is a 9. The book explained the notation and the concepts behind the Huffman code really well. Also, the main part of the code involved binary trees, which we've already seen multiple times in this class, so it was really building off the knowledge we already had.
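To tie the section together, a runnable Python sketch of Huffman's Algorithm using a heap-based priority queue, matching the O(k log k) analysis above (the tuple-based tree representation and all names are mine, not the book's):

import heapq
from itertools import count

def huffman(freq):
    """Huffman's Algorithm: repeatedly merge the two lowest-frequency
    letters under a new parent (the "meta-letter"). freq: dict mapping
    letter -> frequency. Returns a dict mapping letter -> bit string."""
    tiebreak = count()   # breaks frequency ties so trees are never compared
    # A leaf is a 1-tuple (letter,); an internal node is a pair (left, right).
    heap = [(f, next(tiebreak), (letter,)) for letter, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # y*: lowest-frequency letter
        f2, _, t2 = heapq.heappop(heap)   # z*: second-lowest-frequency letter
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    tree = heap[0][2]

    codes = {}
    def walk(node, path):
        if len(node) == 1:                # leaf: record the letter's encoding
            codes[node[0]] = path or "0"
        else:                             # left child = 0, right child = 1
            walk(node[0], path + "0")
            walk(node[1], path + "1")
    walk(tree, "")
    return codes

# More frequent letters get shorter encodings: a -> 1 bit, c and d -> 3 bits.
print(huffman({"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.15}))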