Mike's Journal

Preface:

Algorithms are used everywhere, in all kinds of applications, but most of the time there is no direct proof that one is better than another, and comparisons get muddled by the specifics of the individual application. Designing a great algorithm is an elegant dance of logic, creativity, and language. Learning how to create good ones is an art.

Chapter 1

The Matching Problem

The goal is to design an efficient algorithm for matching two groups together. The simplest form of this problem, and thus the easiest example to explain, is where each item matches with exactly one item and there are the same number of items on each side. For this example the book uses men and women getting engaged. The book presents Gale and Shapley's definitions of stable and unstable matchings: a matching is unstable if there are a man and a woman, not matched to each other, who would each rather be with the other than with their current partners. An algorithm for matching men with women is then presented, followed by several fairly simple proofs about it.
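A minimal sketch, in Python, of the Gale-Shapley proposal process described above, assuming each person ranks everyone on the other side. The dictionary layout and names are illustrative, not the book's pseudocode.

```python
def stable_matching(men_prefs, women_prefs):
    """men_prefs[m]: women in order of m's preference;
    women_prefs[w]: men in order of w's preference.
    Returns a dict mapping each woman to the man she ends up engaged to."""
    free_men = list(men_prefs)                 # every man starts free
    next_choice = {m: 0 for m in men_prefs}    # index of the next woman each man will ask
    engaged = {}                               # woman -> man
    # rank[w][m]: position of m on w's list (lower means w prefers him more)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}

    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]       # best woman m hasn't proposed to yet
        next_choice[m] += 1
        if w not in engaged:                   # w is free, so they get engaged
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]: # w prefers m to her current partner
            free_men.append(engaged[w])        # her old partner becomes free again
            engaged[w] = m
        else:                                  # w turns m down; he stays free
            free_men.append(m)
    return engaged

men = {"x": ["a", "b"], "y": ["b", "a"]}
women = {"a": ["y", "x"], "b": ["x", "y"]}
print(stable_matching(men, women))  # -> {'b': 'y', 'a': 'x'}, which is stable
```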

Five Representative Problems

Graphs are good for representing pairwise relationships. The book presents five separate problems to give some experience with algorithms.

Interval Scheduling

Problem: How do you schedule individual uses of a popular resource so that as many requests as possible are served? This problem is pretty simple to solve.
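The usual greedy rule for this problem is to repeatedly take the compatible request that finishes earliest. Here is a minimal sketch of that rule, with requests given as (start, finish) pairs; the names are illustrative.

```python
def max_compatible(requests):
    """Return a largest set of pairwise non-overlapping (start, finish) intervals."""
    chosen, last_finish = [], float("-inf")
    for start, finish in sorted(requests, key=lambda r: r[1]):  # by finish time
        if start >= last_finish:        # compatible with everything already chosen
            chosen.append((start, finish))
            last_finish = finish
    return chosen

print(max_compatible([(1, 4), (3, 5), (0, 6), (5, 7), (8, 9)]))
# -> [(1, 4), (5, 7), (8, 9)]
```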

Weighted Interval Scheduling

Problem: How do you schedule a resource that is in demand, given that each request has a value associated with it, so as to maximize the total value obtained? This is not as easy to solve as the unweighted problem, but it can be done with dynamic programming.

Bipartite Matching

Problem: When the matching problem is posed on general bipartite graphs, it gets more difficult. You build up a matching and then backtrack, reassigning earlier choices when a new one requires it. This process is called augmentation, and it also applies to network flow problems.
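A minimal sketch of the augmentation idea, assuming the graph is given as a dict from left-side nodes to the right-side nodes they can be matched with. This is one standard augmenting-path formulation, not the book's exact presentation.

```python
def max_bipartite_matching(graph):
    """graph: dict mapping each left node to the right nodes it may match with."""
    match = {}                                  # right node -> left node

    def try_augment(u, visited):
        # Try to find a partner for u, displacing earlier matches when possible.
        for v in graph[u]:
            if v in visited:
                continue
            visited.add(v)
            if v not in match or try_augment(match[v], visited):
                match[v] = u
                return True
        return False

    for u in graph:
        try_augment(u, set())
    return match

print(max_bipartite_matching({"u1": ["a"], "u2": ["a", "b"], "u3": ["b"]}))
# -> a matching of size 2, here {'a': 'u1', 'b': 'u2'}
```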

Independent Set

Problem: given a graph, find the largest set of nodes such that no two of them are joined by an edge. Finding large independent sets and testing whether a given set is independent are very different tasks: the first is very difficult, the second rather simple.
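To illustrate how easy the testing half is, here is a minimal checker, with the graph given as a dict from each node to its set of neighbours (the layout and names are illustrative); finding a largest independent set is the hard part.

```python
def is_independent(adjacency, nodes):
    """True if no two nodes in `nodes` are joined by an edge."""
    nodes = set(nodes)
    return all(adjacency[u].isdisjoint(nodes - {u}) for u in nodes)

# Example: the path a - b - c
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(is_independent(graph, {"a", "c"}))  # True
print(is_independent(graph, {"a", "b"}))  # False
```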

Competitive Facility Location

Problem: two players take turns claiming weighted nodes in a graph, subject to rules about which nodes may be adjacent to ones already claimed. What is the best possible way to play the game so as to end up with the largest total weight?

Chapter 2

Basics of Algorithm Analysis

Computational Tractability

Defining efficiency is difficult. You can talk about speed, the finite nature of the operations, and the memory used by the data structures. The book first simply says something is efficient if it works quickly on real-world datasets. Every algorithm has a worst case: an input on which the maximum number of instructions is needed to get the job done. Since you don't know what your data is going to be, you want the worst case of your algorithm to be better than the worst case of a brute-force approach to the same problem. Exponential growth grows exponentially, so try to avoid it as best you can.

Asymptotic Order of Growth

When discussing algorithm complexity, the most important thing is how the number of instructions grows relative to the amount of data the algorithm handles. This is represented by functions. Big-O notation gives an asymptotic upper bound: f(n) is O(g(n)) if there exist constants c > 0 and n_0 >= 0 such that f(n) <= c*g(n) for all n >= n_0. Omega (Ω) notation gives an asymptotic lower bound in the same way. If f(n) is both O(g(n)) and Ω(g(n)), then g(n) is an asymptotically tight bound for f(n), written f(n) = Θ(g(n)); the growth of f is pinned down up to constant factors. All one needs in order to understand and compare asymptotic orders of growth is a basic understanding of how different functions grow, and at what rates.
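A small worked example (mine, not the book's) to make the definition concrete:

```latex
% f(n) = 3n^2 + 5n is O(n^2): for all n >= 1,
\[
  3n^2 + 5n \;\le\; 3n^2 + 5n^2 \;=\; 8n^2 ,
\]
% so the definition is satisfied with c = 8 and n_0 = 1.
```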

Chapter 4

Section 7: Clustering

The Problem

Maximum Spacing

Design of the Algorithm

Analyzing the Algorithm

Section 8: Huffman Codes and Data Compression

The Problem

Variable-Length Encoding Schemes
Prefix Codes
Optimal Prefix Codes

Design

Prefix Codes as Binary Trees
Top-Down Approach

Analyzing

Chapter 5

Divide and Conquer

Section 1: Mergesort

Idea: Divide a list in half repeatedly until you are down to trivially sorted lists (single entries), then merge them back together pairwise into sorted lists, so that the final list is a sorted version of the initial list. There are about log(n) levels of splitting, and the merging at each level takes about n steps, so the whole thing takes O(n*log(n)).
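A minimal sketch of mergesort following the idea above; the variable names are mine, not the book's.

```python
def mergesort(items):
    if len(items) <= 1:                 # a single entry is already sorted
        return items
    mid = len(items) // 2
    return merge(mergesort(items[:mid]), mergesort(items[mid:]))

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:])             # one of these is empty, the other still sorted
    result.extend(right[j:])
    return result

print(mergesort([5, 2, 9, 1, 5, 6]))    # -> [1, 2, 5, 5, 6, 9]
```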

Approaches

Two basic ways:

Unrolling the Mergesort Recurrence

Analyze the first few levels of the recursion, spot the pattern, and generalize from there to account for all the work the recursion does.
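Sketched for the standard mergesort recurrence T(n) <= 2T(n/2) + cn, with the usual simplifying assumption that n is a power of 2:

```latex
% Level j of the recursion has 2^j subproblems of size n/2^j, each costing at
% most c * n / 2^j, so every level contributes at most cn, and there are
% log2(n) levels in total:
\[
  T(n) \;\le\; \sum_{j=0}^{\log_2 n - 1} 2^j \cdot c \, \frac{n}{2^j}
       \;=\; c\, n \log_2 n \;=\; O(n \log n).
\]
```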

Substituting a Solution into the Mergesort Recurrence

Guess what the solution is and then check it by substituting the guess into the recurrence (this works well for mergesort, since we already know the answer is n*log(n)).
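For example, guessing T(n) <= c*n*log2(n) and substituting it into the same recurrence (again taking n to be a power of 2):

```latex
\[
  T(n) \;\le\; 2\Bigl(c \cdot \tfrac{n}{2} \log_2 \tfrac{n}{2}\Bigr) + cn
       \;=\; c\,n(\log_2 n - 1) + c\,n
       \;=\; c\,n \log_2 n ,
\]
% so the guess is consistent with the recurrence, and the bound follows by induction.
```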

Partial Substitution

If you don't know the constant, you can assume one exists, carry it through the substitution, and check that everything works out in the end; working back through the argument then tells you what the actual constant has to be.

Chapter 6

Dynamic Programming - break the problem down into subproblems, then build the subproblems back up to solve the large problem.

Weighted interval scheduling

Recursive algorithm: this is the first step in creating a dynamic programming algorithm.

How to do it

  1. Label the requests and sort them in order of finish time.
  2. Think about the optimal solution: either the last item is in it or it isn't.
  3. Find the optimal solution that includes the last job (ignoring all jobs that don't finish before it starts) and the optimal solution that excludes it, and pick whichever has the larger total value. Apply this recursively; the answers work their way back up through the call stack (see the sketch after this list).
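A minimal recursive sketch of the steps above, with jobs given as (start, finish, value) tuples; the helpers p and opt mirror the text's notation, but the rest of the layout is illustrative. As written it recomputes the same subproblems many times, which is exactly the issue discussed next.

```python
import bisect

def schedule_value(jobs):
    jobs = sorted(jobs, key=lambda j: j[1])        # step 1: sort by finish time
    finishes = [f for _, f, _ in jobs]

    def p(i):
        # Latest job that finishes by the time jobs[i] starts, or -1 if none.
        return bisect.bisect_right(finishes, jobs[i][0]) - 1

    def opt(i):
        if i < 0:
            return 0
        take = jobs[i][2] + opt(p(i))              # the last job is in the optimum
        skip = opt(i - 1)                          # the last job is not in the optimum
        return max(take, skip)

    return opt(len(jobs) - 1)

print(schedule_value([(0, 3, 2), (1, 5, 4), (4, 6, 4), (6, 8, 1)]))  # -> 7
```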

This is a problem because the plain recursion recomputes the same subproblems over and over, so the amount of work grows at a very fast (exponential) rate. We want to avoid this, which is where the dynamic programming part comes in.

By memoizing the result of each calculation, we reduce the number of calculations to O(n), so it goes a lot faster.

You can use the array of computed values to work your way back and recover the optimal solution itself. This doesn't add to the time complexity, but it does add to the memory needed.

Memoization

We want to make it clearer how dynamic programming is done properly, rather than hodgepodged on top of a recursive algorithm. Set M[0] = 0. If v_i is the value of the i'th job and p(i) is the latest job that can still be done if job i is chosen (i.e., the latest job finishing before job i starts), then for all i compute M[i] = max(v_i + M[p(i)], M[i-1]); the optimal value is the largest entry of M, namely M[n].

This is a simple, straightforward way to think of the previous solution.
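A minimal sketch of that recurrence as an iterative table fill, together with the traceback mentioned earlier for recovering the chosen jobs. Here v and p are assumed to be given as 1-indexed lists with a dummy entry at index 0, matching the notation above.

```python
def weighted_interval(v, p):
    """v[1..n]: job values; p[i]: latest job compatible with job i (0 if none).
    Returns the optimal total value and the list of chosen job indices."""
    n = len(v) - 1
    M = [0] * (n + 1)
    for i in range(1, n + 1):
        M[i] = max(v[i] + M[p[i]], M[i - 1])

    # Traceback: walk the table backwards to see which jobs were taken.
    chosen, i = [], n
    while i > 0:
        if v[i] + M[p[i]] >= M[i - 1]:   # job i was part of the optimum
            chosen.append(i)
            i = p[i]
        else:
            i -= 1
    return M[n], list(reversed(chosen))

# Same instance as the recursive sketch: values 2, 4, 4, 1 with p = 0, 0, 1, 3
print(weighted_interval([0, 2, 4, 4, 1], [0, 0, 0, 1, 3]))  # -> (7, [1, 3, 4])
```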

When Dynamic programming should be used

  1. There are only a polynomial number of subproblems.
  2. The solution to the original problem is easy to compute from the solutions to the subproblems.
  3. There is a natural ordering to the subproblems.

If working backwards from the end gets you the solution, dynamic programming is probably going to be the way to go.

Least Squares

Finding the best-fit line: we want to minimize the sum of the squared deviations of the data points from a line. The data is given; find the line. One line isn't always enough; sometimes several line segments fit better. But we want creating a new segment to carry a penalty, because otherwise there would be a separate line between each pair of adjacent data points and the SSE (sum of squared errors) would be 0.

This can be made into a dynamic program by working backwards: you start with the SSE for the last few points (with only one or two points it's 0, so that part is easy) and then compare whether making a new partition is more effective at minimizing SSE + penalty, or whether it's better to just extend the original line.

The total running time adds up to O(n^2).
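A minimal sketch of this dynamic program, assuming the per-segment penalty C is given; the helper segment_error is my own illustrative least-squares fit, not the book's notation.

```python
def segment_error(pts):
    """Sum of squared errors of the best single line y = a*x + b through pts."""
    m = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    denom = m * sxx - sx * sx
    if denom == 0:                  # a single point, or all x equal: treat error as 0
        return 0.0
    a = (m * sxy - sx * sy) / denom
    b = (sy - a * sx) / m
    return sum((y - a * x - b) ** 2 for x, y in pts)

def segmented_least_squares(points, C):
    """points: (x, y) pairs sorted by x; C: penalty paid for each segment used.
    Returns the minimum total cost (segment SSEs plus penalties)."""
    n = len(points)
    # e[i][j]: error of fitting one line to points i..j (inclusive)
    e = [[segment_error(points[i:j + 1]) for j in range(n)] for i in range(n)]
    OPT = [0.0] * (n + 1)           # OPT[j]: best cost for the first j points
    for j in range(1, n + 1):
        OPT[j] = min(e[i][j - 1] + C + OPT[i] for i in range(j))
    return OPT[n]
```

As written, recomputing every segment error from scratch makes the sketch slower than O(n^2); the O(n^2) figure above assumes the errors are built up more cleverly, e.g. with running sums.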

Adding a variable

The subset sum problem is one where each request has a weight, and there is a limit W on the total weight you can take on; you want to make the total weight you select as large as possible without going over W. No greedy algorithm is known to find the optimum, so we look to dynamic programming. We build a 2D table (matrix) of optimal solutions, one entry for each prefix of the items and each weight bound w <= W, and read the overall optimal solution out of it. This way it can be computed quickly, though large amounts of memory may be required.

The knapsack problem is one where each element has a value and a weight. You want to maximize the sum of the values while keeping the total weight below a certain limit. It is solved the same way, by filling in the same kind of two-dimensional table.
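A minimal sketch of that two-dimensional table for the knapsack version, assuming integer weights; M[i][w] is the best total value using the first i items with weight budget w, and the names are illustrative.

```python
def knapsack(values, weights, W):
    """Maximize total value of a subset of items whose total weight is at most W."""
    n = len(values)
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, wt = values[i - 1], weights[i - 1]
        for w in range(W + 1):
            M[i][w] = M[i - 1][w]                      # option 1: skip item i
            if wt <= w:                                # option 2: take item i if it fits
                M[i][w] = max(M[i][w], v + M[i - 1][w - wt])
    return M[n][W]

# Items with (value, weight) = (60, 1), (100, 2), (120, 3) and capacity 5:
print(knapsack([60, 100, 120], [1, 2, 3], 5))          # -> 220 (take the last two items)
```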