Garrett's Journal

My journal entries for the “Algorithm Design” textbook by Jon Kleinberg and Éva Tardos.

"Algorithm Design" Journal Entries

Week 7

Chapter Scores

  • Readability Score: 7/10
  • Interest Score: 4/10 1)

Readings

Chapter 5: Divide and Conquer
5.1: A First Recurrence - The Mergesort Algorithm

This section of the reading describes mergesort as the canonical divide-and-conquer algorithm. A divide-and-conquer algorithm divides the input into multiple pieces (usually two), solves each subproblem recursively, and then combines the results of those recursive calls into an overall solution. As with all recursion, the recursive calls in a divide-and-conquer algorithm need a base case so that the algorithm has a point at which it stops dividing and starts conquering. Usually this “bottom out” point is when the recursive method is called on an input of some predefined constant size, such as 1 or 2.

Computing the time complexity of a divide-and-conquer algorithm is a little less straightforward, but its efficiency can be expressed as a recurrence relation. Mergesort's running time can be measured by the number of comparisons it performs for a given input size. This measurement can be expressed as a function T(n) that bounds the number of comparisons (or “time”) in terms of the input size n:

T(n) ≤ 2T(n/2) + cn when n > 2
T(2) ≤ c
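
To see where this recurrence comes from, here is a minimal Python sketch of mergesort instrumented to count comparisons (my own illustration, not code from the book). The two recursive calls correspond to the 2T(n/2) term, and the merge loop corresponds to the cn term:

<code python>
def merge_count(left, right):
    """Merge two sorted lists, returning the merged list and
    the number of element comparisons performed."""
    merged, comparisons = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1                 # one comparison per loop iteration
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])              # one side is exhausted; no comparisons
    merged.extend(right[j:])
    return merged, comparisons

def mergesort(items):
    """Sort items, returning (sorted_list, total_comparisons).
    The structure mirrors T(n) <= 2T(n/2) + cn."""
    if len(items) <= 1:                  # base case: nothing to compare
        return list(items), 0
    mid = len(items) // 2
    left, c1 = mergesort(items[:mid])    # first T(n/2)
    right, c2 = mergesort(items[mid:])   # second T(n/2)
    merged, c3 = merge_count(left, right)  # the +cn merge step
    return merged, c1 + c2 + c3

print(mergesort([2, 4, 1, 3, 5]))
</code>
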
5.2: Further Recurrence Relations

A generalized form of the recurrence relation equation mentioned above is:

T(n) ≤ qT(n/2) + cn when n > 2
T(2) ≤ c

where q is the number of subproblems per step in the recursive algorithm. This gives us a few cases depending on what q is…

Case q > 2

With more than two subproblems, the total work increases geometrically with each level of the recursion, so the lowest level dominates and any function T satisfying this recurrence is bounded by O(n^(log₂ q)).

Case q = 1

With only one subproblem, the work decreases geometrically with each level, so the top level dominates and T(n) is bounded by O(n). (The q = 2 case in between is the mergesort bound of O(n log n), where every level does about the same amount of work.)

Quadratic Split and Reduce

A related recurrence arises when the splitting and recombining work is quadratic rather than linear: T(n) ≤ 2T(n/2) + cn² when n > 2, with T(2) ≤ c. Here the top level also dominates, and the solution is O(n²).
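
As a quick sanity check on these cases, the recurrence itself can be evaluated numerically. This is a sketch of my own, assuming n is a power of 2 and c = 1:

<code python>
def T(n, q, c=1):
    """Evaluate T(n) = q*T(n/2) + c*n exactly, with T(2) = c.
    Assumes n is a power of 2."""
    if n <= 2:
        return c
    return q * T(n // 2, q, c) + c * n

# q = 1 grows linearly, q = 2 like n log n,
# and q = 3 like n^(log2 3), roughly n^1.585.
for q in (1, 2, 3):
    print(q, [T(2 ** k, q) for k in (4, 8, 12)])
</code>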

5.3: Counting Inversions

A good way to quantify the difference between two comparable sets of data is to measure their degree of disagreement. A generalized way to do this is to count the number of inversions between the two. Assuming the data in question is organized as two lists of the same set of elements, an inversion is a pair of elements that appears in one order in the first list and in the opposite order in the second; equivalently, the number of inversions is the minimum number of swaps of adjacent elements needed to turn the first list into the second. For example, if you have the following list:

24135

and you are comparing it to:

12345

you could quantify the difference between them by saying that the first list has 3 inversions relative to the second (the out-of-order pairs (2,1), (4,1), and (4,3)), so 3 adjacent swaps get from the first list to the second:

24135
21435
12435
12345

We can use the algorithm on Pages 224-225 to count inversions. This algorithm has a time complexity of O(n log n), meaning that is the time it takes to count the number of inversions between the two lists. (Actually performing the swaps could take longer, since there can be up to n(n-1)/2 inversions.)
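
Here is a Python sketch in the spirit of the book's Sort-and-Count / Merge-and-Count procedure (the code itself is my own). The key observation happens in the merge: when an element from the right half is taken before remaining elements of the left half, each of those remaining left elements forms one inversion with it:

<code python>
def sort_and_count(a):
    """Return (sorted_a, inversions), where an inversion is a pair
    (i, j) with i < j but a[i] > a[j]. Runs in O(n log n)."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_left = sort_and_count(a[:mid])
    right, inv_right = sort_and_count(a[mid:])
    merged, inv = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            # right[j] jumps ahead of every remaining left element:
            # each such pair is one inversion.
            inv += len(left) - i
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_left + inv_right + inv

# The example above: 24135 relative to 12345 has 3 inversions.
print(sort_and_count([2, 4, 1, 3, 5])[1])   # -> 3
</code>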

2012/03/07 01:12 · garrettheath4

Week 5-6

Chapter Scores

  • Readability Score: 7/10
  • Interest Score: 9/10

Readings

Chapter 4: Greedy Algorithms
4.1: Interval Scheduling

The interval scheduling problem involves a shared resource and a set of requests. Each request wants to use the resource from a given start time to a given end time. While a request is using the resource, no other request may use it during any part of that time slot; in other words, no more than one request can be scheduled on the resource at any given time.

The interval scheduling problem can be solved with a greedy algorithm. A greedy algorithm is one that, like an inductive proof, starts from a base case and grows its solution one step at a time, making the locally best choice at each step so that every step makes progress toward the solution. The greedy algorithm to solve the interval scheduling problem is as follows (based on the algorithm on Page 118; a runnable sketch appears after the pseudocode):

Overall Runtime: O(n*log n)

  • Initially let R be the set of all requests, and let A be empty
    O(n)
  • While R is not yet empty
    O(n*log n)
    • Choose a request i in R that has the smallest finishing time O(log n)
    • Add request i to A O(1)
    • Delete all requests from R that are not compatible with request i
      O(n)
  • EndWhile
  • Report the set A as the set of accepted requests
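
A minimal Python sketch of this algorithm (my own). Instead of repeatedly deleting incompatible requests from R as the pseudocode does, it sorts once by finishing time and skips requests that overlap the last accepted one, which achieves the same O(n log n) bound:

<code python>
def interval_schedule(requests):
    """Greedy earliest-finish-time interval scheduling.
    requests: list of (start, finish) pairs. Returns the accepted set A."""
    accepted = []
    last_finish = float("-inf")
    for start, finish in sorted(requests, key=lambda r: r[1]):
        if start >= last_finish:      # compatible with everything accepted
            accepted.append((start, finish))
            last_finish = finish
    return accepted

print(interval_schedule([(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (8, 9)]))
</code>
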
4.2: Scheduling to Minimize Lateness

Another problem similar to the interval scheduling problem is the task scheduling problem. The task scheduling problem involves a set of tasks that each take a set length of time but are flexible in when they are done. Each task does, however, have a deadline indicating when it should be finished. Every task must be completed, and no two tasks can be scheduled at the same time. The goal of the algorithm is to schedule the tasks such that the maximum amount of time by which any task misses its deadline (the “lateness” of the schedule) is minimized.

The task scheduling problem can be solved with an intuitive greedy algorithm. The central concept of the algorithm is to minimize lateness by “staying ahead,” scheduling tasks in order from earliest deadline to latest deadline. The algorithm is as follows (based on the algorithm on Page 124; a runnable sketch appears after the pseudocode):

Overall Runtime: O(n*log n)

  • Order the jobs in order of their deadlines O(n*log n)
  • Assign labels to the ordered jobs such that d_1 ≤ d_2 ≤ … ≤ d_n O(n)
  • Set the finishing time, f, to be the start time for all jobs O(1)
  • For each job j_i where 1≤i≤n: O(n)
    • Assign the job i to the time interval from s(i)=f to f(i)=f+t_i
      O(1)
    • Let f=f+t_i O(1)
  • End
  • Report the set of scheduled intervals [s(i), f(i)] for i=1 to i=n
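
A Python sketch of the earliest-deadline-first algorithm above (my own; the (length, deadline) tuple format is an assumption of this sketch). It also reports the maximum lateness, taking lateness of a job as max(0, f(i) − d_i):

<code python>
def schedule_min_lateness(jobs, start=0):
    """Earliest-deadline-first scheduling.
    jobs: list of (length t_i, deadline d_i) pairs.
    Returns (intervals, max_lateness)."""
    f = start
    intervals, max_late = [], 0
    for t, d in sorted(jobs, key=lambda j: j[1]):   # order by deadline
        s_i, f = f, f + t              # job runs from s(i)=f to f(i)=f+t_i
        intervals.append((s_i, f))
        max_late = max(max_late, f - d)  # lateness is max(0, f(i) - d_i)
    return intervals, max_late

print(schedule_min_lateness([(1, 2), (2, 4), (3, 6)]))
</code>
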
4.4: Shortest Paths in a Graph

The shortest path problem involves a weighted graph in which, given a start node s, we want to know the shortest possible path from s to every other node v. As can be expected, this problem can be solved with a greedy algorithm. The greedy algorithm grows a set S of nodes whose shortest-path distances are all known, one node at a time. The set S starts with just the start node s, whose distance to itself is 0, while the tentative distance to every other node starts out as ∞. In each growing step, the algorithm adds the node outside S that is directly connected to a node inside S and has the smallest tentative distance; when a node is added, the tentative distances of its neighbors outside S are updated. This well-known algorithm is called Dijkstra’s Algorithm.
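
A Python sketch of Dijkstra’s Algorithm using a binary heap (my own; the adjacency-dict graph format is an assumption of this sketch). The `done` set plays the role of S:

<code python>
import heapq

def dijkstra(graph, s):
    """Dijkstra's Algorithm with a binary heap.
    graph: dict mapping node -> list of (neighbor, edge_length).
    Returns a dict of shortest-path distances from s."""
    dist = {s: 0}
    done = set()                       # the set S of finished nodes
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:                  # stale heap entry; skip it
            continue
        done.add(u)
        for v, length in graph.get(u, []):
            nd = d + length
            if v not in dist or nd < dist[v]:   # shorter tentative path found
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
</code>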

4.5: The Minimum Spanning Tree Problem

In the minimum spanning tree problem, we are given a set of nodes V and a non-negative cost for connecting each pair of connectable nodes, and we want to build a spanning tree: a tree that connects all of the nodes together at the least possible total cost. A key concept in finding the minimum spanning tree of a graph is that it will contain no redundant edges and no cyclic paths. This problem can be solved with any of three algorithms: Kruskal’s Algorithm, Prim’s Algorithm, or the Reverse-Delete Algorithm.
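
Prim’s Algorithm grows a tree from a root one cheapest crossing edge at a time, much like Dijkstra’s Algorithm grows S. A Python sketch of my own, assuming a connected graph given as an adjacency dict:

<code python>
import heapq

def prim_mst(graph, root):
    """Prim's Algorithm: repeatedly add the cheapest edge that connects
    a node inside the tree to a node outside it.
    graph: dict mapping node -> list of (neighbor, cost).
    Returns the list of tree edges (u, v, cost)."""
    in_tree = {root}
    tree_edges = []
    heap = [(cost, root, v) for v, cost in graph[root]]
    heapq.heapify(heap)
    while heap:
        cost, u, v = heapq.heappop(heap)
        if v in in_tree:               # edge would create a cycle; skip it
            continue
        in_tree.add(v)
        tree_edges.append((u, v, cost))
        for w, c in graph[v]:
            if w not in in_tree:
                heapq.heappush(heap, (c, v, w))
    return tree_edges
</code>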

4.6: The Union-Find Data Structure

Kruskal’s Algorithm from the previous section is special in that, in order to run efficiently, it needs a special data structure. Unlike general-purpose abstract data types like stacks, queues, and lists, the union-find data structure is a specialist structure designed to be optimal for this particular algorithm. It supports two key operations: Find, which returns the name of the set containing a given element, and Union, which merges two named sets into one. These operations keep track of the connected components that form as edges are added, which is exactly the bookkeeping Kruskal’s Algorithm needs.
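
A Python sketch of a union-find structure (with the standard path-compression and union-by-size optimizations) and of Kruskal’s Algorithm on top of it. The code is my own illustration, not the book’s implementation:

<code python>
class UnionFind:
    """Union-Find with path compression and union by size."""
    def __init__(self, elements):
        self.parent = {e: e for e in elements}
        self.size = {e: 1 for e in elements}

    def find(self, x):
        # Walk to the root, compressing the path along the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False               # already in the same set
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra           # attach smaller tree under larger
        self.size[ra] += self.size[rb]
        return True

def kruskal_mst(nodes, edges):
    """Kruskal's Algorithm: scan edges in order of increasing cost,
    keeping an edge only if it joins two different components.
    edges: list of (u, v, cost) triples."""
    uf = UnionFind(nodes)
    tree = []
    for u, v, cost in sorted(edges, key=lambda e: e[2]):
        if uf.union(u, v):             # False means u, v already connected
            tree.append((u, v, cost))
    return tree
</code>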

4.7: Clustering

Instead of the previously discussed graph algorithms that provide more detail about a graph, clustering is a way to generalize a graph to provide a higher-level comprehension of the data it represents. A k-clustering of a set of objects with given “distances” between them is a partition of those objects into exactly k non-empty groups or “clusters”. A k-clustering’s spacing is the minimum distance between any pair of points lying in different clusters, and a clustering of maximum spacing is the k-clustering whose spacing is as large as possible.
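
The clustering of maximum spacing turns out to be what Kruskal’s Algorithm produces if it is stopped when exactly k components remain. A Python sketch of my own (the function name and the (p, q, dist) distance format are assumptions of this sketch); it returns the spacing, which is the first inter-cluster distance that would have been merged next:

<code python>
def k_clustering_spacing(points, distances, k):
    """Return the spacing of the k-clustering of maximum spacing.
    distances: iterable of (p, q, dist) triples, one per pair."""
    parent = {p: p for p in points}

    def find(x):                       # tiny union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = len(points)
    for p, q, d in sorted(distances, key=lambda e: e[2]):
        rp, rq = find(p), find(q)
        if rp == rq:
            continue                   # p and q are already in one cluster
        if components == k:
            return d                   # closest pair in different clusters
        parent[rp] = rq                # merge the two clusters
        components -= 1
</code>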

4.8: Huffman Codes and Data Compression

At the lowest level, all data on a computer is represented as a sequence of the binary digits 0 and 1. When data is converted from a human-readable format to a format that can be stored and recognized by a computer, each symbol in the data is converted into a string of 1’s and 0’s. Instead of representing every symbol as a string of exactly n 1’s and 0’s, variable-length encoding schemes allow us to use shorter strings for some symbols and thereby reduce the amount of data needed to represent the original list of symbols. In order to take advantage of a variable-length encoding scheme without a loss of data, we use prefix codes. A prefix code is a set of encodings in which no code is a prefix of another code. Huffman codes take each symbol’s frequency of appearance in the medium into account, making the shorter codes correspond to the more frequent symbols in order to effectively compress the data.
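
A Python sketch of Huffman coding (my own illustration): repeatedly merge the two least frequent trees, then read each symbol’s code off the path from the root. The integer tiebreaker only exists so the heap never has to compare two trees directly:

<code python>
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman prefix code for the symbols in text.
    Returns a dict mapping each symbol to its bit string."""
    freq = Counter(text)
    # Heap entries: (frequency, tiebreak, tree); a tree is a symbol or a pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)    # two least frequent trees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (t1, t2)))
        tiebreak += 1
    if not heap:
        return {}
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):        # internal node: recurse on children
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"    # single-symbol edge case

    walk(heap[0][2], "")
    return codes

print(huffman_codes("aaaabbbcc"))          # more frequent symbols get shorter codes
</code>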

2012/03/01 04:49 · garrettheath4

1)
Everyone knows recursion and mergesort, so why are we beating this dead horse yet again?