Differences

This shows you the differences between two versions of the page.

--- courses:cs211:winter2018:journals:boyese:chapter4 [2018/02/27 01:10] – [Section 4.2: Scheduling to Minimize Lateness: An Exchange Argument] boyese
+++ courses:cs211:winter2018:journals:boyese:chapter4 [2018/03/12 18:25] (current) – [Section 4.8: Huffman Codes and Data Compression] boyese
@@ Line 157: / Line 157: @@
 Using the heap-based priority queue implementation (discussed in Chapter 2), each priority queue operation can be made to run in O(log n) time. Thus the overall time for the implementation is O(m log n).
+The biggest question I have after reading this section is how is Dijkstra's Algorithm better than a BFS, and why would someone choose this algorithm as opposed to a BFS? I had a hard time digesting this section too so I would give it a 6/10 for readability.
 ====Section 4.5: The Minimum Spanning Tree Problem====
+===Minimum Spanning Tree (MST)===
+  * Spanning Tree: spans all nodes in a graph
+  * Given a connected graph G = (V, E) with positive edge weights ce, an MST is a subset of the edges T ⊆ E such that T is a spanning tree whose sum of edge weights is minimized
+**MST Applications**
+  * Network design
+      * Telephone, electrical, hydraulic, TV cable, computer, road
+  * Approximation algorithms for NP-hard problems
+      * Traveling salesperson problem, Striner tree
+  * Indirect applications
+      * Max bottleneck paths
+      * Reducing data storage in sequencing amino acids in a protein
+  * Cluster analysis
+//Cayley’s Theorem:// There are n<sup>n-2</sup> spanning trees
+**Possible algorithms to find MST:**
+  * //Kruskal’s Algorithm:// Start without any edges and builds a spanning tree by successively inserting edges from E in order of increasing cost. Insert each edge e as long as it does not create a cycle when added to the edges already inserted. If inserting e would result in a cycle, discard e and continue.
+  * //Prim’s Algorithm:// Start with a root node s and try to greedily grow a tree from s outward. At each step, we simply add the node that can be attached as cheaply as possibly to the partial tree we already have.
+  * //Reverse Delete Algorithm:// Start with the full graph (V, E) and begin deleting edges in order of decreasing cost. As we get to each edge e (starting from the most expensive), we delete it as long as doing so would not actually disconnect the graph we currently have.
+**Properties of the MST**
+  * //Cut property:// Let S be any subset of nodes, and let e be the min cost edge with exactly one endpoint in S. Then MST contains e.
+      * //Cutset:// A cut is a subset of nodes S. The corresponding cutset D is the subset of edges with exactly one endpoint in S.
+  * //Cycle property:// Let C be a cycle, and let f be the max cost edge belonging to C. Then MST does not contain f. (All edge costs ce are distinct)
+      * //Cycle:// Set of edges in the form a-b, b-c, c-d, …, y-z, z-a
+  * //Cycle-Cut Intersection:// A cycle and a cutset intersect in an even number of edges
 ====Section 4.6: Implementing Kruskal's Algorithm: The Union-Find Data Structure====
+===Union-Find Data Structure:===
+  * Keeps track of a graph as edges are added
+      * Cannot handle when edges are deleted
+  * Maintains disjoint sets
+      * E.g., graph’s connected components
+**Operations:**
+  * Find(u): returns name of set containing u
+      * How is this utilized to see if two nodes are in the same set?
+      * Goal implementation: O(log n)
+  * Union(A, B): merge sets A and B into one set
+      * Goal implementation: O(log n)
+===Implementing Kruskal’s Algorithm===
+  * Using the union-find data structure
+      * Build set of T edges in the MST
+      * Maintain set for each connected component
+  * Using best implementation of union-find
+      * Sorting: O(m log n)
+      * Union-find: O(m α (m, n))
+      * O(m log n)
+**The Algorithm:**
+<code>
+Sort edge weights so that c<sub>1</sub> ≤ c<sub>2</sub> ≤ ... ≤ c<sub>m</sub>
+T = {}
+foreach (u ∈ V) make a set containing singleton u
+for i = 1 to m
+    (u,v) = e<sub>i</sub>
+    if (u and v are in different sets)
+        T = T ∪ {e<sub>i</sub>}
+        merge the sets containing u and v
+return T
+</code>
 ====Section 4.7: Clustering====
+===Clustering===
+Given a set U of n objects (or points) labeled p<sub>1</sub>, …, p<sub>n</sub>, classify into coherent groups
+  * **Problem:** divide objects into clusters so that points in different clusters are far apart
+      * Requires quantification of distance
+  * Applications
+      * Routing in mobile ad-hoc networks
+      * Identify patterns in gene expression
+      * Identifying patterns in web application use cases
+          * Sets of URLs
+      * Similarity searching in medical image databases
+**Distance Function**
+The numeric value specifying “closeness” of two objects
+Assume distance function satisfies several natural properties:
+  * d(p<sub>i</sub>, p<sub>j</sub>) = 0 iff p<sub>i</sub> = p<sub>j</sub> (identity of indiscernibles)
+  * d(p<sub>i</sub>, p<sub>j</sub>) ≥ 0 (non-negativity)
+  * d(p<sub>i</sub>, p<sub>j</sub> ) = d(p<sub>j</sub>, p<sub>i</sub> ) (symmetry)
+**Problem: k-Clustering of Maximum Spacing**
+  * //k-clustering:// Divide objects into k non-empty groups
+  * //Spacing:// Min distance between any pair of points in different clusters
+  * //k-clustering of maximum spacing:// Given an integer k, find a k-clustering of maximum spacing
+**Greedy Clustering Algorithm**
+  * Single-link k-clustering algorithm
+      * Form a graph on the vertex set U, corresponding to n clusters
+      * Find the closest pair of objects such that each object is in a different cluster and add an edge between them
+      * Repeat n-k times until there are exactly k clusters
+  * This is the same as Kruskal’s algorithm, except we stop when there are k connected components
+  * Equivalent to finding MST and deleting the k-1 most expensive edges
+**//Theorem://** Let C denote the clustering C1, …, Ck formed by deleting the k-1 most expensive edges of a MST. C is a k-clustering of maximum spacing.
 ====Section 4.8: Huffman Codes and Data Compression====
+===Huffman Codes===
+**Problem: Encoding**
+  * Computers use bits: 0s and 1s
+  * Need to represent what humans know to what computers know
+  * Map symbol → unique sequence of 0s and 1s
+  * Process is called encoding
+  * The least number of bits needed to encode 26 letters, space, and 5 punctuation marks (, . ? ! ‘) (32 characters = 25), so 5 bits
+  * For a longer string of characters, we can use shorter encodings for frequently used characters like ETAOINSHRDLU
+      * Goal: optimal encoding that takes advantage of non-uniformity of letter frequencies
+      * Looking to represent data as compactly as possible
+      * Example: Morse code, which is an example of variable-length encoding
+      * We run into a problem when doing this: ambiguity caused by the encoding of one character being a prefix of encoding another
+**Optimal Prefix Codes**
+  * Solution to ambiguity: Prefix codes, which are a map of letters to bit strings such that no encoding is a prefix of any other
+      * Goal: minimize the average number of bits per letter (ABL):
+      * Σ<sub>x∈S</sub> : frequency of x * length of encoding x
+      * fx : frequency that the letter x occurs
+      * |ϒ(x)|: length of encoding of x
+      * Minimize ABL =  Σ<sub>x∈S</sub> f<sub>x</sub>|ϒ(x)|
+  * Binary tree to represent prefix codes
+      * Exposes structure netter than list of mappings
+          * Each leaf node is a letter
+          * Follow path to the letter
+              * Going left: 0, Going right: 1
+  * Recursively generate tree
+      * All letters are in root node
+      * For all letters in node:
+          * If encoding begins with 0, letter belongs in left subtree
+          * Otherwise (encoding begins with 1), letter belongs in right subtree
+          * If last bit of encoding, make the letter a leaf node of that subtree
+          * Shift encoding one bit
+          * Process left and right children
+  * Tree Properties
+      * The length of the path from root to leaf is the depth
+      * The binary tree T corresponding to the optimal prefix code is full, i.e., each internal node has two children
+  * Conclusions
+      * The binary tree corresponding to the optimal prefix code is full, i.e., each internal node has two children
+      * We want to label the leaf nodes of the binary tree corresponding to the optimal prefix code such that nodes with greatest depth have least frequency
+      * The two letters with least frequency are definitely going to be siblings
+**Huffman’s Algorithm**
+  * Data structures needed:
+      * Binary tree for the prefix codes
+      * Priority queue for choosing the node with lowest frequency
+  * Costs
+      * Inserting and extracting node into PQ: O(log n)
+      * Number of insertions and extractions: O(n)
+      * O(n log n)
+<code>
+To construct a prefix code for an alphabet S, with given frequencies:
+    If S has two letters then
+        Encode one letter using 0 and the other letter using 1
+    Else
+        Let y* and z* be the two lowest-frequency letters
+        Form a new alphabet S’ by deleting y* and z* and replacing them with a new letter ω of frequency f<sub>y*</sub> + f<sub>z*</sub>
+        Recursively construct a prefix code γ’ for S’, with tree T’
+        Define a prefix code for S as follows:
+            Start with T’
+        Take the leaf labeled ω and add two children below it labeled y* and z*
+    Endif
+</code>
 ====Section 4.9: Minimum-Cost Arborescences: A Multi-Phase Greedy Algorithm====