===== 4.2: Scheduling to Minimize Lateness: An Exchange Argument =====

We now look at greedy algorithms through the exchange argument. Here, we go back to a single resource and a set of n tasks that each want to use the resource for a certain interval of time. Now these requests have deadlines, but each still takes a period of time t to complete. We want to schedule the tasks to minimize lateness. It is much easier to think of a possible greedy solution for this problem because it reminds me of scheduling things to be completed in general, like homework. Just as in the solution, when I do my homework it is most logical to complete the items that are due first, rather than doing whatever takes the shortest amount of time or using some other method.

A request is considered late if it misses its deadline, and here we want to minimize the maximum lateness. As already mentioned, the optimal solution is pretty intuitive: schedule by earliest deadline. However, the proof isn't so simple. First, we order the jobs by deadline. Then we start each job as soon as the previous one has completed, so the schedule has no gaps in time; this shows there is an optimal schedule with no idle time. Considering an optimal schedule O, we modify O one step at a time, eventually transforming it into a schedule identical to the one produced by our greedy algorithm. This is an exchange argument. An inversion occurs when a job is scheduled before another job with an earlier deadline. By definition, our greedy algorithm's schedule has no inversions. We can then show that all schedules with no inversions and no idle time have the same maximum lateness, and that an optimal schedule can be transformed into one with no inversions and no idle time without increasing its maximum lateness. It follows that the schedule obtained by the greedy algorithm is optimal.
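To make the greedy rule concrete, here is a minimal sketch in Python; the job representation and function name are my own, not from the text. It orders jobs by deadline and runs them back to back, tracking the maximum lateness.

<code python>
# Earliest-deadline-first scheduling on a single resource.
# Each job is a (length, deadline) pair; names are illustrative.

def schedule_min_lateness(jobs):
    """jobs: list of (t, d) pairs. Returns (schedule, max lateness)."""
    ordered = sorted(jobs, key=lambda job: job[1])  # sort by deadline
    time = 0            # finish time of the previously scheduled job
    max_lateness = 0
    schedule = []
    for t, d in ordered:
        start, finish = time, time + t   # no idle time between jobs
        schedule.append((start, finish))
        max_lateness = max(max_lateness, finish - d)
        time = finish
    return schedule, max_lateness

# Three jobs of lengths 1, 2, 3 with deadlines 2, 4, 4:
print(schedule_min_lateness([(1, 2), (2, 4), (3, 4)]))
# ([(0, 1), (1, 3), (3, 6)], 2)
</code>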
===== 4.4: Shortest Paths in a Graph =====

For Dijkstra's algorithm to work, the edges must have nonnegative lengths. It can also be observed that the algorithm is a "continuous version" of breadth-first search. Now, we look at its runtime. We know that it performs n-1 iterations for a graph with n nodes. To find the next node to add, a naive implementation checks each edge, which takes O(m) time per iteration and therefore O(mn) time overall. The use of additional data structures, however, can improve this runtime. If we keep the unexplored nodes in a priority queue keyed by their current distance estimates, then over the whole run we perform n ExtractMin operations and at most m key updates. With a heap-based priority queue, each of these operations takes O(log n) time, so the overall algorithm runs in O(m log n) time.
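Here is a hedged sketch of this heap-based version in Python. Since the standard heapq module has no ChangeKey operation, the sketch pushes a new entry instead and skips stale ones, which gives the same O(m log n) bound; the adjacency-dict graph format is my own choice.

<code python>
import heapq

def dijkstra(graph, source):
    """graph: dict node -> list of (neighbor, length). Lengths must be >= 0."""
    dist = {source: 0}
    heap = [(0, source)]                 # (distance estimate, node)
    while heap:
        d, u = heapq.heappop(heap)       # ExtractMin: O(log n)
        if d > dist.get(u, float("inf")):
            continue                     # stale entry; u already finalized
        for v, length in graph[u]:
            alt = d + length
            if alt < dist.get(v, float("inf")):
                dist[v] = alt
                heapq.heappush(heap, (alt, v))  # stands in for ChangeKey
    return dist

graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": []}
print(dijkstra(graph, "s"))  # {'s': 0, 'a': 1, 'b': 3}
</code>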
| + | |||
| + | |||
| + | ===== 4.5: The Minimum Spanning Problem ===== | ||
| + | |||
| + | This section applies an exchange argument to solve the Minimum Spanning Problem. The minimum spanning problem occurs when there is a set of locations that can be represented through nodes and we want to create a network on top of those locations. The network should be connected, having a path between each location, with the goal of creating the " | ||
| + | |||
| + | We know that the minimum-cost solution to this problem can be turned into a tree, (V, T), where V are the locations and T is the solution to the problem. (V, T) must be connected by definition, and we need to show that it does not contain a cycle. To prove that there are no cycles, we use a proof by contradiction. Assuming there is some edge in a proposed cycle, the tree without that edge is still connected as any path with that edge can now find a longer path around the rest of the cycle instead. This also in turn solves the problem and is less expensive, which gives us a contradiction. A subset of the tree can be called a spanning tree of the graph. | ||
| + | |||
| + | There are three greedy algorithms which creates a minimum spanning tree. The first of with is Kruskal' | ||
| + | |||
| + | |||
| + | It is considered " | ||
| + | |||
| + | We can implement both Kruskal' | ||
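As an illustration, here is a sketch of Prim's algorithm using a binary heap, analogous to the Dijkstra sketch above; the graph format and function name are again my own choices, and stale heap entries stand in for ChangeKey.

<code python>
import heapq

def prim_mst_cost(graph, root):
    """graph: dict node -> list of (neighbor, cost). Returns total MST cost."""
    in_tree = set()
    heap = [(0, root)]          # (cost of the attaching edge, node)
    total = 0
    while heap and len(in_tree) < len(graph):
        cost, u = heapq.heappop(heap)
        if u in in_tree:
            continue            # stale entry; u was attached more cheaply
        in_tree.add(u)
        total += cost
        for v, c in graph[u]:
            if v not in in_tree:
                heapq.heappush(heap, (c, v))
    return total

g = {"a": [("b", 1), ("c", 4)],
     "b": [("a", 1), ("c", 2)],
     "c": [("a", 4), ("b", 2)]}
print(prim_mst_cost(g, "a"))  # 3, using edges a-b and b-c
</code>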
| + | |||
| + | Overall this section was difficult to follow for me. I had a difficult time understanding some of this during class, and the textbook readings did not help very much. Overall readability was about a 4 out of 10. | ||
| + | ===== 4.6: Implementing Kruskal' | ||
| + | |||
| + | The Union-Find data-structure will store a representation of connected components that allows us to quickly and efficiently search and update these components. This turns out to be the same data structure in Kruskal' | ||
| + | |||
| + | To create this structure, we create an array Component, and a set with n elements. To create the union-find, we initialize the Component array to the set. We can optimize this set by choosing the name for the union to be the name of one of the sets. Furthermore, | ||
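Here is a minimal sketch of this array-based structure with the larger-set naming optimization; the class and method names are mine, not from the text.

<code python>
class UnionFind:
    def __init__(self, elements):
        # MakeUnionFind: every element starts as its own component.
        self.component = {s: s for s in elements}
        self.members = {s: [s] for s in elements}  # contents of each component

    def find(self, s):
        return self.component[s]           # O(1) lookup

    def union(self, a, b):
        # Keep the name of the larger component; relabel only the smaller.
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a
        for s in self.members[b]:
            self.component[s] = a
        self.members[a] += self.members.pop(b)

uf = UnionFind(["u", "v", "w"])
uf.union(uf.find("u"), uf.find("v"))
print(uf.find("u") == uf.find("v"))  # True
print(uf.find("u") == uf.find("w"))  # False
</code>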
| + | |||
| + | |||
| + | We can implement Kruskal' | ||
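As a sketch, here is Kruskal's algorithm written on top of the UnionFind class above; the (cost, u, v) edge representation is my own choice.

<code python>
def kruskal(nodes, edges):
    """nodes: list of node names; edges: list of (cost, u, v) triples."""
    uf = UnionFind(nodes)
    tree = []
    for cost, u, v in sorted(edges):       # edges in increasing cost order
        if uf.find(u) != uf.find(v):       # different components: no cycle
            tree.append((u, v, cost))
            uf.union(uf.find(u), uf.find(v))
    return tree

edges = [(1, "a", "b"), (2, "b", "c"), (4, "a", "c")]
print(kruskal(["a", "b", "c"], edges))  # [('a', 'b', 1), ('b', 'c', 2)]
</code>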
| + | |||
| + | ===== 4.7: Clustering ===== | ||
| + | |||
| + | This section discusses clustering. Clustering can be used whenever one is attempting to organize a collection into various groups. There are various ways to group a collection. One way is by distance, such as with points on a plane. We can use minimum spanning trees to play a part in clusterings of maximum spacing. If we want to divide objects into k groups, we will have a k-clustering as a partition of a union into k nonempty sets. The spacing of a k-clustering is the minimum distance between any pair of points in different clusters. | ||
| + | |||
| + | |||
| + | To find a clustering of maximum spacing, we want to cluster points as rapidly as possible. We grow a graph edge by edge, with connected components corresponding to clusters. This process will never create a cycle, but rather a union of trees. This procedure is extremely similar to Kruskal' | ||
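Here is a hedged sketch of this procedure, reusing the UnionFind class from Section 4.6; the function name and edge format are mine. It returns the spacing, which is the length of the first edge that would have merged two of the final k clusters.

<code python>
def k_clustering_spacing(points, edges, k):
    """points: list of names; edges: (distance, p, q) triples over all pairs."""
    uf = UnionFind(points)
    components = len(points)
    for dist, p, q in sorted(edges):       # closest pairs first
        if uf.find(p) != uf.find(q):
            if components == k:
                return dist                # next merge would cross clusters
            uf.union(uf.find(p), uf.find(q))
            components -= 1
    return None                            # fewer than k points, or k == 1
</code>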
| + | |||
| + | |||
| + | ===== 4.8: Huffman Codes and Data Compression ===== | ||
| + | |||
| + | This chapter discusses solving problems using data compression. Here, the problem is encoding symbols using bits. To convert various " | ||
| + | |||
| + | One example of how data has been compressed is Morse code. Morse code translates each symbol into dots or dashes. However, Morse code removes ambiguity between letters that are represented similarly by adding a space in between each letter. If we were to turn this into bits, the ambiguity would remain as certain encodings are prefixes to others. A method must be built to eliminate the prefix problem. The method could work as follows: | ||
| + | Start reading bits from left to right, once you come across enough bits to match an encoding this becomes a letter, continue with the next bit and repeat. | ||
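A small sketch of this decoding loop, with an illustrative prefix code that is not from the text:

<code python>
def decode(bits, code):
    """bits: a string of '0'/'1'; code: dict letter -> encoding."""
    inverse = {enc: letter for letter, enc in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:   # prefix property makes this unambiguous
            out.append(inverse[current])
            current = ""
    return "".join(out)

code = {"a": "0", "b": "10", "c": "11"}   # a valid prefix code
print(decode("010110", code))             # "abca"
</code>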
| + | |||
| + | Now, we must account for letter frequencies to make the most optimal encoding. We can use a greedy algorithm to find the optimal prefix code. First, we can build a tree to help us build each encoding. We will create a binary tree where each time we go from a node to its left child we add a 0 and every time we go from a node to its right child we add a 1. By using a tree, we can see that the length of each symbol in bits is its corresponding layer in the tree. We want to fill the binary tree to make it optimal. | ||
| + | |||
| + | The first thought of building is tree is to attempt to build it top down. However, if we knew the optimal prefix code, this would help a great amount. We could then simply fill in the nodes of the tree. We would start with the highest frequency letters and move down through the tree, labeling nodes in order of decreasing frequency. This is essentially the basis of Huffman' | ||
| + | |||
| + | Huffman' | ||
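Here is a hedged sketch of this bottom-up merging with Python's heapq as the priority queue; the tuple-based tree representation and tie-breaking counter are my own additions.

<code python>
import heapq
from itertools import count

def huffman(freqs):
    """freqs: dict letter -> frequency. Returns dict letter -> bit string."""
    tie = count()   # breaks frequency ties so tuples never compare trees
    heap = [(f, next(tie), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lowest-frequency trees into one meta-letter.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):        # internal node: (left, right)
            walk(tree[0], prefix + "0")    # left edge appends a 0
            walk(tree[1], prefix + "1")    # right edge appends a 1
        else:
            code[tree] = prefix or "0"     # one-letter alphabet edge case
    walk(heap[0][2], "")
    return code

print(huffman({"a": 0.5, "b": 0.3, "c": 0.2}))
# {'a': '0', 'c': '10', 'b': '11'}  (an optimal prefix code)
</code>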
| + | |||
