Differences
This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| courses:cs211:winter2018:journals:beckg:ch4 [2018/02/27 04:12] – beckg | courses:cs211:winter2018:journals:beckg:ch4 [2018/03/12 20:48] (current) – beckg | ||
|---|---|---|---|
| Line 55: | Line 55: | ||
| We know that this runs //O(n logn)//. | We know that this runs //O(n logn)//. | ||
| + | Personally, I think they got hung up between the generality of the problem and the examples of classes/ | ||
| ===== 4.2: Scheduling to Minimize Lateness: Exchange Argument ===== | ===== 4.2: Scheduling to Minimize Lateness: Exchange Argument ===== | ||
| Line 77: | Line 78: | ||
| Now, a more nuanced proof based on inequalities illustrates that swapping any given inverted pair will //not// increase maximum lateness. | Now, a more nuanced proof based on inequalities illustrates that swapping any given inverted pair will //not// increase maximum lateness. | ||
| - | Therefore, we can continue the process of doing so from any arbitrary O, and end up at the schedule (or an equivalent, if there are equal end times) produced by our greedy algorithm | + | Therefore, we can continue the process of doing so from any arbitrary O, and end up at the schedule (or an equivalent, if there are equal end times) produced by our greedy algorithm--all the while never increasing the maximum lateness. Hence, our output is indeed optimal. |
| + | This was an enjoyable section, I think it was very well explained and complemented the class material perfectly. 9/10. | ||
| + | ===== 4.4: Shortest Paths in a Graph ===== | ||
| + | Consider a weighted directed graph G=(V,E). We proceed to give an algorithm which will determine the shortest (lowest cost) path from any source node s to each other node. We assume that a path does indeed exist from s to all others. We will call // | ||
| + | |||
| + | We maintain a set S of vertices u for which we have determined a shortest path distance d(u) from s--the " | ||
| + | |||
| + | === Dijkstra' | ||
| + | * Let S be the set of explored nodes | ||
| + | * For each u in S, store distance d(u) | ||
| + | * initialize S={s} and d(s)=0 | ||
| + | * While S != V (not all nodes) | ||
| + | * Select a node v not in S with at least one edge from S for which d' | ||
| + | * Add v to S and define d(v)=d' | ||
| + | |||
| + | === Analyzing Algorithm === | ||
| + | We note that at any point in the algorithm' | ||
| + | |||
| + | Note that this only works under the assumption of no negative lengths, and additionally that it's in essence a continuous breadth-first search. | ||
| + | |||
| + | == Implementation with PQ and runtime == | ||
| + | Using a priority queue to store unexplored nodes, with their priority being their current value d(j), we can get the runtime of this algorithm down to //O(m logn)//. Essentially, | ||
| + | |||
| + | |||
| + | This was yet another good section and went well with the classwork. However, I appreciated the method we went through it in class a bit more, combining the PQ implementation into our main workthrough of the algorithm. 8/10. | ||
| + | |||
| + | |||
| + | ===== 4.5: The Minimum Spanning Tree Problem ===== | ||
| + | Given an undirected weighted graph //G//, we aim to create a network of edges within //G// such that it connects all the nodes of //G// but the edges sum to the minimum cost possible. First, note that the minimum cost solution to this problem is a tree. So, we proceed to define three algorithms which solve this problem using different routes: | ||
| + | * Prim's Algorithm: we essentially run Dijkstra' | ||
| + | * Kruskal' | ||
| + | * Reverse-Delete Algorithm: essentially Kruskal' | ||
| + | |||
| + | === Analyzing Algorithms === | ||
| + | We can first analyze these algorithms by laying out a few properties of when we know that an edge //will// be in a minimum spanning tree (MST) and when we know it will not be. We assume that all edge costs are distinct. | ||
| + | * //Cut Property// | ||
| + | * This property very easily extends to proving that both Kruskal' | ||
| + | * //Cycle Property// | ||
| + | * This property extends to prove that the Reverse-Delete Algorithm produces an MST, because we essentially keep deleting the edges that are part of cycles that are most expensive until we are left with an MST (no cycles, tree, touches all nodes). | ||
| + | It is a short extension to eliminate the assumption of all distinct weights. The proof utilizes small perturbations to make any equal edges actually unequal, but qualitatively we can understand these as simple tiebreakers, | ||
| + | |||
| + | === Implementing Prim's Algorithm === | ||
| + | Prim's Algorithm can be implemented in essentially the exact same manner as Dijkstra' | ||
| + | |||
| + | This section was very well explained, however I personally found it more clear the way we discussed in class (and the way I presented it here): with the Cut and Cycle properties first presented together and then used in analyzing the algorithms together. 8/10. | ||
| + | |||
| + | ===== 4.6: Implementing Kruskal' | ||
| + | As Kruskal' | ||
| + | |||
| + | This structure will support the following operations: | ||
| + | * Find(//u//) will return the name of the set containing node //u// | ||
| + | * Union(// | ||
| + | So, when considering an edge //(u,v)//, the Find operation can be used to test if the nodes are already in the same set (connected component), and if they are not, then we add the edge to our MST-in-progress, | ||
| + | |||
| + | We aim to implement the above operations in //O(logn)// time. Additionally, | ||
| + | |||
| + | Using a pointer-based system, we initialize a record for each node with a null pointer. This does MakeUnionFind in //O(n)// time. The sets will all have names corresponding to a single node in them, and we can easily maintain in constant time a value telling us the size of the set. Then, when performing Union, we simply update the pointer from the namesake node of the smaller set to the namesake of the larger (or if equal, it doesn' | ||
| + | |||
| + | This can actually be further refined by collapsing the pointer system as we trace upwards from a node when calling Find: adjust the pointers of every node we pass to point directly to the namesake. This therefore reduces the time of a sequence of //n// Find calls to //O(n alpha(n))// where // | ||
| + | |||
| + | Regardless, we hit the bottleneck of sorting the //m// edges: //O(m log m)// time, but more succinctly put, because //m =< n< | ||
| + | |||
| + | I thoroughly enjoyed this section and the nuances between the different implementations (array, pointer, and collapse optimized pointer) of the Union-Find structure were very well explained. 9/10. | ||
| + | |||
| + | ===== 4.7: Clustering ===== | ||
| + | Another application of MSTs is // | ||
| + | |||
| + | With regard to formalism, we consider a set //U// of //n// objects labeled // | ||
| + | |||
| + | The solution arises by seeing the clusters as connected components. Sorting the edges in ascending order of distance (cost), we can then iteratively merge these connected components by adding edges (skipping them if they would simply connect back to the same component). This is technically // | ||
| + | |||
| + | Or, we could think of it as taking the MST produced by the algorithm, and deleting the //k-1// most expensive edges. Therefore, within each component we actually have a tree, and there are //k// components. __These components constitute a k-clustering of maximum spacing.__ This hinges on the fact that those most expensive edges are the ones that would have connected any of the //k// components. Therefore, the next edge that would have been added (but wasn' | ||
| + | |||
| + | This was a succinct and yet informative section. They explained the extension (abridgement? | ||
| + | |||
| + | |||
| + | |||
| + | ===== 4.8: Huffman Codes and Data Compression ===== | ||
| + | Encoding complex information into binary sequences is a crucial part of data compression. A particular component is focused on storing information in as little physical memory as possible, and that's where taking advantage of the nonuniform frequencies of letter appearances comes into play. " | ||
| + | |||
| + | A //prefix code// for a set S of letters is a function g that maps each letter to some sequence of zeros and ones, in such a way that for distinct letters x and y in S, the sequence g(x) is not a prefix of the sequence g(y). This then allows the encoded message be perfectly decoded by scanning the bit sequence left to right. //Optimal// prefix codes are those that minimize the average number of bits required per letter (ABL(g)). This coincides with a given text to be translated as it depends on the relative frequencies of the letters' | ||
| + | |||
| + | Note that we can represent prefix codes as binary trees, by simply defining from the root node, taking the path to the left child is a 0 and the right a 1. Additionally, | ||
| + | |||
| + | Proven in the book, we have that there is an optimal prefix code and corresponding tree T in which the two lowest frequency letters are assigned to leaves that are siblings in T. So, given the relative frequencies of each letter, // | ||
| + | * If S has two letters: encode one using 0 and one using 1. | ||
| + | * Else: | ||
| + | * Let y and z be the two lowest freq. letters | ||
| + | * form a new alphabet S' by deleting y and z and replacing them with a new letter w of frequency // | ||
| + | * Recursively construct a prefix code for S' with tree T' | ||
| + | * Define a prefix code for S as follows: | ||
| + | * Start with T' | ||
| + | * Take leaf labeled w and add two children below it labeled y and z | ||
| + | |||
| + | The codes produced by this are called //Huffman codes//, and they achieve the minimum ABL of any prefix code. Implementing the algorithm using heap based PQs, we get a running time of //O(n logn)// for an alphabet of n letters. | ||
| + | |||
| + | I very much liked this section. It was very good explaining everything, as well as the nuances involved in data compression. It certainly " | ||
