Garrett's Journal

My journal entries for the “Algorithm Design” textbook by Jon Kleinberg and Éva Tardos.

"Algorithm Design" Journal Entries

Chapter Scores

  • Readability Score: 8 / 10
  • Interest Score: 9 / 10

Readings

Chapter 7: Network Flow

7.1: The Maximum-Flow Problem and the Ford-Fulkerson Algorithm

A flow network is a specific kind of graph that has a node that is the source of some sort of traffic, directed edges whose capacities indicate the maximum amount of traffic allowed between two nodes, and a node that is the destination of all of the traffic. The amount of flow leaving the source node must equal the amount of flow arriving at the destination node. Additionally, flow must be conserved at every other node: the amount of flow into a node is equal to the amount of flow out of it.

Given a problem that can be modeled with a flow network, we typically want to find the maximum flow of the graph. We use the Ford-Fulkerson algorithm to do this: it repeatedly finds an augmenting path from the source to the destination in a residual graph and pushes as much flow as that path allows, stopping when no augmenting path remains.
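
A minimal Python sketch of Ford-Fulkerson, assuming capacities come as a dict keyed by (u, v) pairs; the representation and names are mine, not the book's:

from collections import defaultdict

def max_flow(capacity, s, t):
    # capacity: dict mapping directed edge (u, v) -> capacity.
    residual = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # backward residual edges

    def augment(u, visited, bottleneck):
        # DFS for an augmenting path; returns the amount of flow pushed.
        if u == t:
            return bottleneck
        visited.add(u)
        for v in adj[u]:
            if v not in visited and residual[(u, v)] > 0:
                pushed = augment(v, visited, min(bottleneck, residual[(u, v)]))
                if pushed > 0:
                    residual[(u, v)] -= pushed  # use up forward capacity
                    residual[(v, u)] += pushed  # allow undoing this later
                    return pushed
        return 0

    flow = 0
    while True:
        pushed = augment(s, set(), float("inf"))
        if pushed == 0:
            return flow  # no augmenting path left: the flow is maximum
        flow += pushed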

7.2: Maximum Flows and Minimum Cuts in a Network

When finding the maximum flow of a network flow graph, it is useful to find the bottleneck of the graph, since it bounds the best case for the flow. This is done by considering the nodes in the graph to be separated into two major groups, called a cut, and summing up all of the edge capacities crossing from one group to the other. Since the flow has to go through these edges (having no other path to the destination node), the value of any flow can be no greater than the capacity of any cut, so the best-case scenario is a flow equal to the minimum possible capacity over all cuts.

7.5: The Bipartite Matching Problem

A bipartite graph is a graph in which the nodes form two sets, or “sides” of the graph, such that every edge connects a node in one set to a node in the other; no edge connects two nodes in the same set. A node on one side of the graph can be connected to more than one node on the other side.

The bipartite matching problem is the problem of finding the largest set of edges with distinct endpoints in a bipartite graph. This problem can be solved by adding a source node linked to all of the nodes on the left side of the graph, adding a destination node linked to all of the nodes on the right side, directing all of the edges to the right (toward the destination node), and giving every edge a capacity of 1. Once you have this, you can run the Ford-Fulkerson algorithm on the network graph to find the largest set of edges with distinct endpoints.
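
A sketch of that construction, reusing the max_flow function from the 7.1 notes above (the node names "s" and "t" are assumed not to collide with graph nodes):

def max_bipartite_matching(left, right, edges):
    capacity = {("s", l): 1 for l in left}            # source -> left side
    capacity.update({(r, "t"): 1 for r in right})     # right side -> sink
    capacity.update({(l, r): 1 for (l, r) in edges})  # all pointing right
    return max_flow(capacity, "s", "t")

print(max_bipartite_matching(["a", "b"], ["x", "y"],
                             [("a", "x"), ("a", "y"), ("b", "y")]))  # 2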

7.7: Extensions to the Maximum-Flow Problem

There are a lot of interesting problems that are based on the maximum-flow problem. One such problem is a network flow in which each node also has its own supply and demand. A node with a supply brings a certain amount of flow traffic into the graph. A node with a demand acts as a destination for a certain amount of flow traffic out of the graph.

Another problem is a network flow in which each edge has a lower bound in addition to its usual upper bound (capacity). This kind of problem can be reduced to an ordinary demand problem: for an edge with lower bound ℓ, subtract ℓ from the edge's capacity, add ℓ to the demand of the node the edge leaves, and subtract ℓ from the demand of the node the edge enters.

2012/04/04 03:24 · garrettheath4 · 0 Comments

Chapter Scores

  • Readability Score: 6 / 10
  • Interest Score: 8 / 10

Readings

Chapter 6: Dynamic Programming

6.1: Weighted Interval Scheduling

The weighted interval scheduling problem is a lot like the unweighted interval scheduling problem except that each task to be scheduled has a priority or weight, and we want to schedule jobs such that we have the largest possible total weight. With so many combinations to consider, a greedy algorithm isn't enough. We need something that is more powerful than greedy algorithms but more efficient than brute-force algorithms.

6.2: Memoization or Iteration over Subproblems

We use recursion all the time without even thinking about it. Sometimes, however, we do more work than we have to when we come across the same call in different recursive call branches of execution. Instead of recomputing those results, we can use memoization to remember the value of previously calculated results. Every time the recursive function is called, we check the call against a memoization table. If there is no stored result for that call, we run the recursive call and store its result in the memoization table. If there is a stored result, simply return that instead of running the call again.
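
A minimal sketch of that table-checking pattern, with Fibonacci numbers standing in for any recursive function:

memo = {}

def fib(n):
    # Check the call against the memoization table first.
    if n in memo:
        return memo[n]
    # No stored result: do the recursive work, then remember it.
    result = n if n < 2 else fib(n - 1) + fib(n - 2)
    memo[n] = result
    return result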

Instead of memoization, we can also use an algorithm that iterates over subproblems. In this case, we solve the anticipated subproblems smallest-first and then build up a solution from those. Typically, we have to do some post-processing, tracing back through the table of results, to recover the solution steps that led to the value we got after the iteration.
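
The same computation as an iteration over subproblems, building the table from the bottom up:

def fib_iterative(n):
    # Solve the subproblems smallest-first and build up the answer.
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]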

6.3: Segmented Least Squares

Given a plot of points on an xy-plane, find the set of line segments of best fit that minimizes a cost function. We want a solution that has a minimum amount of error while also having the fewest line segments (otherwise n-1 line segments would always be the optimal solution). We're going to use the tradeoff function E + cL, for some constant c > 0, where E is the sum of the sums of squared errors in each segment and L is the number of line segments, as our metric for an optimal solution. To find the solution, our insight is that the last point [1] belongs to exactly one segment, which begins at some earlier point; so we iterate through the points and make a multiway decision about where that final segment begins, recursing on the points that come before it.
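
In recurrence form, with e(i, j) denoting the minimum squared error of a single line segment fit through points p_i through p_j:

OPT(j) = min over 1 ≤ i ≤ j of ( e(i, j) + c + OPT(i - 1) ),  with OPT(0) = 0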

6.4: Subset Sums and Knapsacks

The knapsack problem is a problem in which we have n items, each of which has a weight (w>0) and a value (v>0). If we have a knapsack with a capacity of W kilograms, how can we fill the knapsack so we can maximize the total value?

We go about solving this problem much like we did with the weighted interval scheduling problem in that we have to make a binary decision to include an item in the solution or not. Unlike weighted interval scheduling, however, items do not directly conflict with each other; they only compete for the remaining capacity. So we can't stop our algorithm once we choose an item; we have to keep going to make sure we consider all reasonable combinations. We maintain a weight limit that designates the amount of weight we can still add to the knapsack after the items added so far, and we use that ongoing weight limit to determine whether or not to include each item.
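
A bottom-up sketch of this decision, assuming integer weights (the variable names are illustrative):

def knapsack(weights, values, W):
    n = len(weights)
    # opt[i][w] = best value using the first i items with capacity w.
    opt = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            opt[i][w] = opt[i - 1][w]                  # skip item i
            if weights[i - 1] <= w:                    # or include it
                opt[i][w] = max(opt[i][w],
                                values[i - 1] + opt[i - 1][w - weights[i - 1]])
    return opt[n][W]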

2012/03/28 03:08 · garrettheath4 · 0 Comments

Chapter Scores

  • Readability Score: 4 / 10
  • Interest Score: 7 / 10

Readings

Chapter 5: Divide and Conquer

5.3: Counting Inversions
5.4: Finding the Closest Pair of Points

Problem: Given n points on an X,Y coordinate plane, find the two points that are closer to each other than any other pair of points.

The initial solution to this problem is obvious:
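
A brute-force sketch in Python, checking every pair:

from itertools import combinations
from math import dist

def closest_pair_brute_force(points):
    # Compare every pair of points; correct, but quadratic in n.
    return min(combinations(points, 2), key=lambda pair: dist(*pair))

print(closest_pair_brute_force([(0, 0), (5, 5), (1, 1)]))  # ((0, 0), (1, 1))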

This algorithm, however, is inefficient because it is O(n²). An optimization can be found with a divide and conquer algorithm in which we divide the plane in half and recursively find the closest pair of points in each half, giving an O(n log n) running time. In this algorithm, a problem with boundary cases of points near the dividing line quickly arises, but it is squelched when we realize that points farther from the dividing line than the smallest distance found so far don't have to be examined.

5.5: Integer Multiplication

Problem: Given an n-digit number A and an n-digit number B, find the product A x B.

The initial solution to this problem is obvious:
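
A sketch of that grade-school method, with numbers as digit lists (least significant digit first; my representation):

def multiply(a_digits, b_digits):
    # Grade-school multiplication; the nested loop is what makes it slow.
    result = [0] * (len(a_digits) + len(b_digits))
    for i, a in enumerate(a_digits):
        carry = 0
        for j, b in enumerate(b_digits):
            total = result[i + j] + a * b + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(b_digits)] += carry
    return result

print(multiply([3, 2], [4, 1]))  # 23 * 14 = 322 -> [2, 2, 3, 0]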

Naturally, this algorithm is inefficient because its nested for loop makes it O(n²). By using a divide and conquer algorithm, we can divide up the problem and remove unnecessary operations to discover an algorithm that is O(n^log2(3)) ≈ O(n^1.59).
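
The trick is to replace the four half-size recursive multiplications with three (Karatsuba's method); a sketch on Python integers:

def karatsuba(x, y):
    # Base case: single-digit factors.
    if x < 10 or y < 10:
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    high_x, low_x = divmod(x, 10 ** m)
    high_y, low_y = divmod(y, 10 ** m)
    z0 = karatsuba(low_x, low_y)                         # low parts
    z2 = karatsuba(high_x, high_y)                       # high parts
    z1 = karatsuba(low_x + high_x, low_y + high_y) - z0 - z2  # cross terms
    return z2 * 10 ** (2 * m) + z1 * 10 ** m + z0

print(karatsuba(1234, 5678))  # 7006652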

2012/03/13 23:44 · garrettheath4 · 0 Comments

Chapter Scores

  • Readability Score: 7/10
  • Interest Score: 4/10 [2]

Readings

Chapter 5: Divide and Conquer

5.1: A First Recurrence - The Mergesort Algorithm

This section of the reading describes mergesort as the most canonical divide-and-conquer algorithm. A divide-and-conquer algorithm is an algorithm that divides the input into multiple pieces (usually 2), solves each subproblem using recursion, and then combines the results of those recursive calls into a solution. As with all recursion, the recursive calls in a divide-and-conquer algorithm need to have a base case so that the algorithm has a point at which it stops dividing and starts conquering. Usually this “bottom out” point is when the recursive method is called on an input of some predefined constant size, such as 1.

Divide-and-conquer algorithms are a little less straightforward when it comes to computing time complexity, but their efficiency can be expressed as a recurrence relation. Mergesort can be measured by the number of comparisons it makes as a function of the input size. This measurement can be expressed as a function T that computes the number of comparisons (or “time”) based on the input size n:

T(n) ≤ 2T(n/2) + cn if n > 2
T(2) ≤ c if n = 2
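
A Python sketch of mergesort showing where the recurrence comes from: two recursive calls on halves (the 2T(n/2)) plus a linear-time merge (the cn):

def merge_sort(a):
    # Base case: size 1 (or 0) is already sorted.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])    # T(n/2)
    right = merge_sort(a[mid:])   # T(n/2)
    # Merge the two sorted halves: the cn term.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
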
5.2: Further Recurrence Relations

A generalized form of the recurrence relation equation mentioned above is:

T(n) ≤ qT(n/2) + cn if n > 2
T(2) ≤ c if n = 2

where q is the number of subproblems per step in the recursive algorithm. This gives us a few cases depending on what q is…

Case q > 2

The work grows geometrically from level to level of the recursion tree, so the leaves dominate and the recurrence solves to O(n^log2(q)).

Case q = 1

The work shrinks geometrically from level to level, so the first level dominates and the recurrence solves to O(n).

Quadratic Split and Reduce

For the related recurrence T(n) ≤ 2T(n/2) + cn², the quadratic work at the first level dominates and the solution is O(n²).

5.3: Counting Inversions

A good way to quantify the difference between two comparable sets of data is to determine the degree of difference between the two. A generalized way to do this is to calculate the number of inversions between them. Assuming the data in question is organized like two lists of the same set of elements, an inversion is a pair of elements that appear in one relative order in the first list and the opposite order in the second; equivalently, the number of inversions is the minimum number of times that two adjacent elements need to trade places in the first list for it to become the second list. For example, if you have the following list:

24135

and you are comparing it to:

12345

you could quantify the differences between them by saying that it would take 3 inversions to get from the first list of data to the second list of data:

24135
21435
12435
12345

We can use the algorithm on Pages 224-225 to count inversions. This algorithm has a time complexity of O(n log n), meaning that is the time it would take to count the inversions (sorting the list in the process).
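
A sketch of the merge-and-count idea: whenever an element of the right half is merged first, it forms an inversion with every element still waiting in the left half:

def sort_and_count(a):
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = sort_and_count(a[:mid])
    right, inv_right = sort_and_count(a[mid:])
    merged, i, j = [], 0, 0
    inversions = inv_left + inv_right
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            inversions += len(left) - i  # right[j] jumps the rest of left
    return merged + left[i:] + right[j:], inversions

print(sort_and_count([2, 4, 1, 3, 5])[1])  # 3, matching the example above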

2012/03/07 01:12 · garrettheath4 · 0 Comments

Chapter Scores

  • Readability Score: 7/10
  • Interest Score: 9/10

Readings

Chapter 4: Greedy Algorithms

4.1: Interval Scheduling

The interval scheduling problem involves a shared resource and a set of requests. Each request wants to use that resource from a certain point in time to a certain ending point in time. When a request is using a resource, no other request can be using the resource during any part of its time slot, meaning that there can be no more than one request scheduled on the resource at any given time.

The interval scheduling problem can be solved with a greedy algorithm. A greedy algorithm is an algorithm that, like an inductive proof, starts with a base case and grows the solution step by step, making the locally best choice at each step while always making progress toward the final solution. The greedy algorithm to solve the interval scheduling problem is as follows (based on the algorithm on Page 118; a Python sketch follows the outline):

Overall Runtime: O(n*log n)

  • Initially let R be the set of all requests, and let A be empty O(n)
  • While R is not yet empty O(n*log n)
    • Choose a request i in R that has the smallest finishing time O(log n)
    • Add request i to A O(1)
    • Delete all requests from R that are not compatible with request i O(n)
  • EndWhile
  • Report the set A as the set of accepted requests
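
A compact Python sketch of the same rule; instead of repeatedly deleting from R, it sorts once by finish time and skips incompatible requests:

def interval_schedule(requests):
    # requests: list of (start, finish) pairs.
    accepted, last_finish = [], float("-inf")
    for start, finish in sorted(requests, key=lambda r: r[1]):  # O(n log n)
        if start >= last_finish:  # compatible with everything accepted so far
            accepted.append((start, finish))
            last_finish = finish
    return accepted
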
4.2: Scheduling to Minimize Lateness

Another problem similar to the interval scheduling problem is the task scheduling problem. The task scheduling problem involves a set of tasks that each have a set length of time but are flexible in when they are done. Each task has a deadline, however, to indicate when it should be done by. Every task must be completed, and no two tasks can be scheduled at the same time. The goal of the algorithm is to schedule the tasks such that the maximum amount of time by which any task is late (the “lateness” of the schedule) is minimized.

The task scheduling problem can be solved with an intuitive greedy algorithm. The central concept of the algorithm is to minimize lateness by “being ahead” and scheduling tasks in order from earliest deadline to latest deadline. The algorithm is as follows (based on the algorithm on Page 124; a Python sketch follows the outline):

Overall Runtime: O(n*log n)

  • Order the jobs in order of their deadlines O(n*log n)
  • Assign labels to the ordered jobs such that d_1 ≤ … ≤ d_n O(n)
  • Set the finishing time, f, to be the start time for all jobs O(1)
  • For each job j_i where 1≤i≤n: O(n*1)
    • Assign the job i to the time interval from s(i)=f to f(i)=f+t_i O(1)
    • Let f=f+t_i O(1)
  • End
  • Report the set of scheduled intervals [s(i), f(i)] for i=1 to i=n
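
A Python sketch of this earliest-deadline-first rule, with jobs as (length, deadline) pairs (my representation, not the book's):

def schedule_min_lateness(jobs, start=0):
    finish, schedule, max_lateness = start, [], 0
    for length, deadline in sorted(jobs, key=lambda j: j[1]):  # by deadline
        s, finish = finish, finish + length  # job runs from s to finish
        schedule.append((s, finish))
        max_lateness = max(max_lateness, finish - deadline)
    return schedule, max_lateness
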
4.4: Shortest Paths in a Graph

The shortest path problem involves a weighted graph in which, given a start node s, we want to find the shortest possible path to every other node v in the graph. As can be expected, this problem can be solved with a greedy algorithm. The greedy algorithm grows a set S of nodes whose shortest path distances are all known, adding one node at a time. The set S starts with the start node s; the distance from s to itself is 0, and the distance to every other node starts out as ∞. In each growing step of the algorithm, the node outside of S that can be reached most cheaply through a node inside S is added to the set, and the candidate distances of the new node’s neighbors are updated based on the distances calculated so far. This well-known algorithm is called Dijkstra’s Algorithm.
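
A sketch of Dijkstra’s Algorithm, with a heap of candidate distances standing in for the growing set S:

import heapq

def dijkstra(graph, s):
    # graph: dict mapping u -> list of (v, weight) edges.
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; u was already finalized
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w  # shorter path to v found through u
                heapq.heappush(heap, (dist[v], v))
    return dist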

4.5: The Minimum Spanning Tree Problem

A spanning tree of a graph is a tree that connects all of the graph’s nodes. The minimum spanning tree problem asks: given a set of nodes V and the non-negative cost of connecting each pair of connectable nodes, find the spanning tree that connects all the nodes together with the least cost possible. A key concept in finding the minimum spanning tree of a graph is that it will contain no redundant edges and no cyclic paths. This problem can be solved with one of three algorithms: Kruskal’s Algorithm, Prim’s Algorithm, and the Reverse-Delete algorithm.

4.6: The Union-Find Data Structure

Kruskal’s algorithm from the previous section is special in that, in order to run efficiently, it needs a special data structure. Unlike general-purpose abstract data types like stacks, queues, and lists, the union-find data structure is a specialist structure that is designed to be optimal for this particular algorithm. The union-find data structure has specific functionality: finding the name of the set containing a given element and performing a union operation on two named sets. Its operations keep track of the connected components of the graph as edges are added, which is exactly what Kruskal’s Algorithm needs.
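
A minimal sketch of the structure, using each set’s root element as its name (the book’s version also tracks set sizes; this one only does path compression):

class UnionFind:
    def __init__(self, elements):
        # Every element starts in its own singleton set.
        self.parent = {e: e for e in elements}

    def find(self, e):
        # Follow parent pointers to the set's root (its "name"),
        # compressing the path along the way.
        while self.parent[e] != e:
            self.parent[e] = self.parent[self.parent[e]]
            e = self.parent[e]
        return e

    def union(self, a, b):
        # Merge the two named sets containing a and b.
        self.parent[self.find(a)] = self.find(b)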

4.7: Clustering

Instead of the previously discussed graph algorithms that provide more details about a graph, clustering is a way to generalize a graph to provide higher-level comprehension of the data represented. A k-clustering of a set of objects with given “distances” between them, for example, is a partition of those objects into exactly k connected components or “clusters”. A k-clustering’s spacing is the minimum distance between any pair of points lying in different clusters. A clustering of maximum spacing is a clustering configuration in which the spacing of the graph is maximized.

4.8: Huffman Codes and Data Compression

At the lowest level, all data on a computer is simply represented as a list of the binary numbers 0 and 1. When data is converted from a human-readable format to a format that can be stored and recognized by a computer, each symbol in the data is converted into a string of 1’s and 0’s. Instead of representing each symbol as a string of exactly n 1’s and 0’s, variable-length encoding schemes allow us to use shorter strings to optimize the amount of data needed to represent the original list of symbols. In order to take advantage of a variable-length encoding scheme without a loss of data, we use prefix codes. A prefix code is a set of encodings in which no code is a prefix of another code. Huffman codes allow us to take a symbol’s frequency of appearance in the medium into account and make shorter-length codes correspond to more frequent symbols in order to effectively compress the data.
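
A small sketch of building Huffman codes by repeatedly merging the two least-frequent subtrees; representing each subtree as a dict of partial codes is my shortcut, not the book’s tree structure:

import heapq

def huffman_codes(freq):
    # freq: dict mapping symbol -> frequency. Returns symbol -> bit string.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)  # keeps tuple comparisons away from the dicts
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, codes2 = heapq.heappop(heap)
        # Prefix one subtree's codes with 0 and the other's with 1.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

print(huffman_codes({"a": 0.5, "b": 0.3, "c": 0.2}))  # e.g. a:0, c:10, b:11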

2012/03/01 04:49 · garrettheath4 · 0 Comments
[1] The “last” point means the point with the largest x value.
[2] Everyone knows recursion and mergesort, so why are we beating this dead horse yet again?