Differences

This shows you the differences between two versions of the page.

--- courses:cs211:winter2018:journals:beckg:ch4 [2018/02/27 00:27] – created beckg
+++ courses:cs211:winter2018:journals:beckg:ch4 [2018/03/12 20:48] (current) – beckg
@@ Line 1: / Line 1: @@
 ===== Greedy Algorithms =====
 Greedy algorithms essentially use a localized decision making process to yield a globally optimal solution. Fortunately, the ability of a greedy algorithm to solve a nontrivial usually reveals a great deal about the structure and nature of the problem itself. In the first two sections of the chapter we will work through two example problems to illustrate two typical methods of proving the greedy algorithm indeed yields an optimal solution:
-  - greedy stays ahead
+  - greedy stays ahead (and structural)
-  - exchange (and structural) argument.
+  - exchange argument.
 ===== 4.1: Interval Scheduling: Greedy Stays Ahead =====
+==== Interval Scheduling Problem ====
 The //Interval Scheduling Problem// consists of a set of //n// requests for use of a single resource, each //i<sup>th</sup>// request corresponding to an interval of time starting at //s(i)// and ending at //t(i)//. Essentially, we want to get the maximum number of "jobs" scheduled where none of them overlap. The efficient method of this is a greedy algorithm.
@@ Line 31: / Line 32: @@
+==== Interval Partitioning Problem ====
+A variation on the problem we just did is the //interval partitioning problem//. The easiest way to think of this is a set of class lectures that each occur at specific time intervals, and we want an algorithm to return a schedule that minimizes the number of classrooms required.
+We define the //depth// of the set of intervals to be the maximum number that overlap at a given time. Clearly, the number of classrooms needed is at least the depth of the set. We proceed to give an algorithm which schedules all lectures in a number of classrooms equal to the depth.
+Suppose we are given a set of intervals. Let //d// be the depth of the set. We will work to assign each interval (class lecture) a label from 1 to //d//, which we can interpret as the classroom designated for that lecture. The following algorithm begins by sorting the intervals according to start times, and then we assign the first a label (1), and then iterate through the sorted intervals, adding them to the same classroom if they fit, and if not, adding them to the next highest classroom label. The algorithm is as follows:
+  * Sort intervals by __start times__ so that //s(1)=<s(2)=<...=<s(n)//
+  * d = 0 (number of allocated classrooms)
+  * for j = 1 to n
+    * if (lecture j is compatible with some classroom k)
+      * schedule lecture j in classroom k
+    * else
+      * allocate a new classroom d+1
+      * schedule lecture j in classroom d+1
+      * d = d+1
+=== Analyzing Algorithm ===
+We see that every interval will be assigned a label and no two overlapping intervals will receive same label. We know we will never reach a point where all labels are currently in use due to the structural argument that the depth is indeed maximum. So, we will never be triggered to use a //depth+1// level classroom. Hence, our output schedule is indeed optimal.
+== Runtime ==
+We know that this runs //O(n logn)//.
+Personally, I think they got hung up between the generality of the problem and the examples of classes/jobs. The generality they discussed (classrooms as "resources", etc.) did not help in the ways that mathematical formalism does, and then they also got caught between the two quite often. I'm leaning toward a 7/10.
+===== 4.2: Scheduling to Minimize Lateness: Exchange Argument =====
+In this problem, we are given a set of tasks, each which takes a set time to complete (t(j)), and has a set deadline (d(j)). We proceed to give an algorithm which will minimize the maximum lateness among all the tasks (again assuming none can overlap):
+  * Sort n jobs by __deadline__ so that //d(1)=<d(2)=<...=<d(n)//
+  * t = 0
+  * for j=1 to n
+    * assign job j to interval [t, t+t(j)]
+    * s(j)=t
+    * f(j)=t+t(j)
+    * t=t+t(j)
+  * output intervals [s(j),f(j)]
+=== Analyzing Algorithm ===
+We can make a number of observations. First, there is of course an optimal schedule with no idle time (time in which no job is being done). Defining an inversion to be jobs i and j such that i is scheduled before j but d(j)<d(i), we note that our algorithm automatically produces a schedule without inversions. Additionally, we know that all schedules with no inversions and no idle time have the same maximum lateness--because it then would only depend on swapping of adjacent jobs that have the same deadline.
+Now, consider an optimal schedule O, with no idle time. We know that if O has an inversion, then there is a pair of jobs i and j such that j is immediately after i and has d(j)<d(i). This means that if there is an inversion somewhere, there must be an adjacent inversion somewhere.
+Suppose O has at least one inversion, and suppose i and j are an adjacent inversion as mentioned above. Then we know that upon swapping them, no new inversion is created, but we eliminate one. So we swap them and get a schedule with one less inversion.
+Now, a more nuanced proof based on inequalities illustrates that swapping any given inverted pair will //not// increase maximum lateness.
+Therefore, we can continue the process of doing so from any arbitrary O, and end up at the schedule (or an equivalent, if there are equal end times) produced by our greedy algorithm--all the while never increasing the maximum lateness. Hence, our output is indeed optimal.
+This was an enjoyable section, I think it was very well explained and complemented the class material perfectly. 9/10.
+===== 4.4: Shortest Paths in a Graph =====
+Consider a weighted directed graph G=(V,E). We proceed to give an algorithm which will determine the shortest (lowest cost) path from any source node s to each other node. We assume that a path does indeed exist from s to all others. We will call //l<sub>e</sub>// the length of edge e. Then the length of any path is the sum of those edge lengths in it.
+We maintain a set S of vertices u for which we have determined a shortest path distance d(u) from s--the "explored" component.
+=== Dijkstra's Algorithm (G, //l//) ===
+  * Let S be the set of explored nodes
+    * For each u in S, store distance d(u)
+  * initialize S={s} and d(s)=0
+  * While S != V (not all nodes)
+    * Select a node v not in S with at least one edge from S for which d'(v)=min<sub>//e=(u,v):u in S//</sub>d(u) + //l<sub>e</sub>// is as small as possible
+    * Add v to S and define d(v)=d'(v)
+=== Analyzing Algorithm ===
+We note that at any point in the algorithm's execution, for any node u in S, the length of the path from s to u given is indeed a shortest path possible. This immediately gives us algorithm correctness because when it finishes, everything will be in S.
+Note that this only works under the assumption of no negative lengths, and additionally that it's in essence a continuous breadth-first search.
+== Implementation with PQ and runtime ==
+Using a priority queue to store unexplored nodes, with their priority being their current value d(j), we can get the runtime of this algorithm down to //O(m logn)//. Essentially, we use the //O(logn)// time process of updating the priorities in the PQ as we update the values of d(j) (which we do with each edge, so m times), and then the remove function is also //O(logn)// which we do n times. But we know that m is greater than n for anything but a straight line graph, so that term dominates. So, with a heap-based PQ implementation, we yield the runtime of //O(m logn)//.
+This was yet another good section and went well with the classwork. However, I appreciated the method we went through it in class a bit more, combining the PQ implementation into our main workthrough of the algorithm. 8/10.
+===== 4.5: The Minimum Spanning Tree Problem =====
+Given an undirected weighted graph //G//, we aim to create a network of edges within //G// such that it connects all the nodes of //G// but the edges sum to the minimum cost possible. First, note that the minimum cost solution to this problem is a tree. So, we proceed to define three algorithms which solve this problem using different routes:
+  * Prim's Algorithm: we essentially run Dijkstra's algorithm, maintaining a partial tree (our "explored" set from Dijkstra's) as we build greedily outward by always adding the node that costs the least.
+  * Kruskal's Algorithm: builds the spanning tree by successively inserting edges. We consider the edges in order of ascending cost, and if adding it would create a cycle in our tree (thus making it not a tree), then we discard/skip it. If it would not create a cycle, then we add it and go to the next.
+  * Reverse-Delete Algorithm: essentially Kruskal's, but backward. We consider the edges in descending order of cost, and instead of starting with an empty tree and building, we start with the whole graph and successively delete. So, when considering an edge, if its deletion would lead to some node or nodes becoming disconnected, then we do not delete it and move on to the next.
+=== Analyzing Algorithms ===
+We can first analyze these algorithms by laying out a few properties of when we know that an edge //will// be in a minimum spanning tree (MST) and when we know it will not be. We assume that all edge costs are distinct.
+  * //Cut Property//--when we know that every MST contains edge //e//. Let //S// be any subset of nodes that is neither empty nor equal to all of //V// (the set of all our nodes in //G//), and let edge //e=(v,w)// be the minimum cost edge with one end in //S// and the other in //V-S//. Then every MST contains //e//.
+    * This property very easily extends to proving that both Kruskal's and Prim's Algorithms produce a minimum spanning tree. This is because each essentially build outward in expanding connected sets until they span all the nodes, although they differ in the directions they take.
+  * //Cycle Property//--when we know that no MST contains edge //e//. Let //C// be any cycle in //G//, and let edge //e=(v,w)// be the most expensive edge belonging to //C//. Then //e// does not belong to any minimum spanning tree.
+    * This property extends to prove that the Reverse-Delete Algorithm produces an MST, because we essentially keep deleting the edges that are part of cycles that are most expensive until we are left with an MST (no cycles, tree, touches all nodes).
+It is a short extension to eliminate the assumption of all distinct weights. The proof utilizes small perturbations to make any equal edges actually unequal, but qualitatively we can understand these as simple tiebreakers, and realize that the net cost of the spanning tree will not change based on the choice between tied cost edges in the algorithms.
+=== Implementing Prim's Algorithm ===
+Prim's Algorithm can be implemented in essentially the exact same manner as Dijkstra's (above). Therefore, with a priority queue, the algorithm runs in //O(m logn)// time.
+This section was very well explained, however I personally found it more clear the way we discussed in class (and the way I presented it here): with the Cut and Cycle properties first presented together and then used in analyzing the algorithms together. 8/10.
+===== 4.6: Implementing Kruskal's Algorithm: the Union-Find Data Structure =====
+As Kruskal's Algorithm works forward, it adds edges to our solution that do not necessarily reach out of a single connected set. In fact, it slowly combines connected subsets until they become one large connected set: our MST. So, given this problem, we create a boutique data structure--the Union-Find data structure--to suit our exact needs.
+This structure will support the following operations:
+  * Find(//u//) will return the name of the set containing node //u//
+  * Union(//A,B//) will merge two sets //A// and //B// into a single, new set.
+So, when considering an edge //(u,v)//, the Find operation can be used to test if the nodes are already in the same set (connected component), and if they are not, then we add the edge to our MST-in-progress, and perform Union(Find(u),Find(v)) to merge their connected components into a single new one.
+We aim to implement the above operations in //O(logn)// time. Additionally, we must include the MakeUnionFind(V) operation which takes our set of set of vertices and initializes a Union-Find structure where each is in a separate connected component set on its own. We aim to make this operation //O(n)//.
+Using a pointer-based system, we initialize a record for each node with a null pointer. This does MakeUnionFind in //O(n)// time. The sets will all have names corresponding to a single node in them, and we can easily maintain in constant time a value telling us the size of the set. Then, when performing Union, we simply update the pointer from the namesake node of the smaller set to the namesake of the larger (or if equal, it doesn't matter). This takes //O(1)// time. Then, when calling Find on a given node, we simply follow the pointer trail upward until we reach the namesake node. Because of the name convention keeping the larger set's namesake, the sets size can at most double //logn// times, and therefore there can be at most //logn// name changes (pointers to follow before reaching the namesake). Therefore, Find is run in //O(log n)// time.
+This can actually be further refined by collapsing the pointer system as we trace upwards from a node when calling Find: adjust the pointers of every node we pass to point directly to the namesake. This therefore reduces the time of a sequence of //n// Find calls to //O(n alpha(n))// where //alpha(n)// is the //inverse Ackermann function//--a function that is extremely slow growing and less than or equal to 4 for any value of //n// we could encounter in practice.
+Regardless, we hit the bottleneck of sorting the //m// edges: //O(m log m)// time, but more succinctly put, because //m =< n<sup>2</sup>//: //O(m logn)//. We do a total of at most //2m// Find operations, and //n-1// Union ops while running Kruskals, so the sorting time dominates and the final time is //O(m logn)//.
+I thoroughly enjoyed this section and the nuances between the different implementations (array, pointer, and collapse optimized pointer) of the Union-Find structure were very well explained. 9/10.
+===== 4.7: Clustering =====
+Another application of MSTs is //clustering//: when we have a collection of objects and want to partition them into coherent. A common approach is defining a //distance function//, where objects more dissimilar can be considered more distant.
+With regard to formalism, we consider a set //U// of //n// objects labeled //p<sub>i</sub>// for //1=<i=<n//. We define the numerical distance between two objects //d(p<sub>i</sub>, p<sub>j</sub>)// to be always positive and symmetric between objects. Additionally, a //k-clustering// is a partition of //U// into //k// nonempty sets //C<sub>1</sub>, C<sub>2</sub>, ..., C<sub>k</sub>//, and the //spacing// of a k-clustering is the minimum distance between any pair of points lying in different clusters. We then are faced with a natural problem: finding a k-clustering (among the exponentially many possibilities) that has maximum spacing.
+The solution arises by seeing the clusters as connected components. Sorting the edges in ascending order of distance (cost), we can then iteratively merge these connected components by adding edges (skipping them if they would simply connect back to the same component). This is technically //single-link clustering//, a special case of //hierarchical agglomerative clustering// (agglomerative refers to combining clusters, single-link means we do so after a single link joins them). //But//, this is simply Kruskal's Algorithm, stopping after we've reached //k// connected components--just before it adds its last //k-1// edges.
+Or, we could think of it as taking the MST produced by the algorithm, and deleting the //k-1// most expensive edges. Therefore, within each component we actually have a tree, and there are //k// components. __These components constitute a k-clustering of maximum spacing.__ This hinges on the fact that those most expensive edges are the ones that would have connected any of the //k// components. Therefore, the next edge that would have been added (but wasn't, because we stopped the algorithm) can be considered to be the //spacing// of the k-clustering, which we know to be maximal due to the edge ordering. And if we had reached a higher edge, then we would have too many edges to have //k// disjoint clusters.
+This was a succinct and yet informative section. They explained the extension (abridgement?) of Kruskal's very well, and the formalism was illustrative and not overbearing. 9/10.
+===== 4.8: Huffman Codes and Data Compression =====
+Encoding complex information into binary sequences is a crucial part of data compression. A particular component is focused on storing information in as little physical memory as possible, and that's where taking advantage of the nonuniform frequencies of letter appearances comes into play. "Letters" in this case refers to fundamental pieces of whatever "alphabet" may be being used--this does not necessarily mean ASCII based information, it could be some form of encoded info.
+A //prefix code// for a set S of letters is a function g that maps each letter to some sequence of zeros and ones, in such a way that for distinct letters x and y in S, the sequence g(x) is not a prefix of the sequence g(y). This then allows the encoded message be perfectly decoded by scanning the bit sequence left to right. //Optimal// prefix codes are those that minimize the average number of bits required per letter (ABL(g)). This coincides with a given text to be translated as it depends on the relative frequencies of the letters' appearances which sum to 1.
+Note that we can represent prefix codes as binary trees, by simply defining from the root node, taking the path to the left child is a 0 and the right a 1. Additionally, the tree that corresponds to the optimal prefix code is //full//--each non-leaf node has two children. Also, note that the ABL therefore can be thought of as the weighted average of the depths of leaves (the weight is the frequency of each letter's appearance).
+Proven in the book, we have that there is an optimal prefix code and corresponding tree T in which the two lowest frequency letters are assigned to leaves that are siblings in T. So, given the relative frequencies of each letter, //Huffman's Algorithm// creates such a tree recursively from the bottom up by:
+  * If S has two letters: encode one using 0 and one using 1.
+  * Else:
+    * Let y and z be the two lowest freq. letters
+    * form a new alphabet S' by deleting y and z and replacing them with a new letter w of frequency //f<sub>y</sub>+f<sub>z</sub>//.
+    * Recursively construct a prefix code for S' with tree T'
+    * Define a prefix code for S as follows:
+      * Start with T'
+      * Take leaf labeled w and add two children below it labeled y and z
+The codes produced by this are called //Huffman codes//, and they achieve the minimum ABL of any prefix code. Implementing the algorithm using heap based PQs, we get a running time of //O(n logn)// for an alphabet of n letters.
+I very much liked this section. It was very good explaining everything, as well as the nuances involved in data compression. It certainly "fits" in this chapter of the book due to the greedy nature of the algorithm, but the introduction to information theory made the field seem fascinating and all I want is more! 9/10.