Algorithms show up in all kinds of applications, but most of the time there is no direct proof that one is better than another, and the comparison gets muddled by the specifics of the individual application. Designing a great algorithm is an elegant dance of logic, creativity, and language. Learning how to create good ones is an art.
The goal is to make an efficient algorithm for matching two groups together. The simplest form of this problem, and thus the easiest example to explain, is where every item matches with exactly one item and there are an identical number of items on each side. For this example the book uses men and women getting engaged. The book presents the Gale-Shapley notions of stable and unstable matchings: an instability is a pair, not matched to each other, who would each rather be with the other than with their current partner, and a matching is stable if it has no instabilities. There is then an algorithm (the proposal algorithm) to match men with women, followed by several fairly trivial proofs about it.
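A minimal Python sketch of the proposal algorithm, assuming preference lists come in as dictionaries; the function name and data layout are my own, not the book's:

```python
def gale_shapley(men_prefs, women_prefs):
    """Stable matching via the proposal algorithm.

    men_prefs:   {man: [women in order of preference]}
    women_prefs: {woman: [men in order of preference]}
    Returns a dict mapping each woman to the man she ends up engaged to.
    """
    # rank[w][m] = position of m on w's list, so comparisons are O(1)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_proposal = {m: 0 for m in men_prefs}   # index of the next woman m will propose to
    engaged_to = {}                             # woman -> man
    free_men = list(men_prefs)

    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_proposal[m]]
        next_proposal[m] += 1
        if w not in engaged_to:
            engaged_to[w] = m                   # w was free: she accepts
        elif rank[w][m] < rank[w][engaged_to[w]]:
            free_men.append(engaged_to[w])      # w trades up; her old partner is free again
            engaged_to[w] = m
        else:
            free_men.append(m)                  # w rejects m; he stays free
    return engaged_to
```

Each man proposes to each woman at most once, which is what bounds the running time and guarantees termination.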
Graphs are good for representing pairwise relationships. The book then sets up five separate problems to get some experience with algorithms.
Problem: how do you schedule individual requests for use of a single popular resource (interval scheduling)? This one is pretty simple to solve with a greedy rule; a sketch follows.
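As a hedged illustration, here is the classic earliest-finish-time greedy rule for this problem (the function name and input format are my own assumptions):

```python
def max_compatible_requests(intervals):
    """Interval scheduling: pick the largest set of non-overlapping requests.

    intervals: list of (start, finish) pairs.
    Greedy rule: always take the request that finishes earliest
    among those compatible with what has been chosen so far.
    """
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:          # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen
```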
Problem: how do you schedule the same kind of resource when each request carries a value (priority), so as to maximize the total value accepted? This is not as easy to solve as the unweighted problem, but dynamic programming handles it (this is the weighted interval scheduling problem, revisited later in these notes).
Problem: bipartite matching, the matching problem phrased on a bipartite graph, is more difficult. You build up a matching and then backtrack along it, undoing some earlier choices to make room for new ones. This process is called augmentation, and it is the central idea in network flow problems; a small sketch follows.
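A minimal sketch of augmentation for bipartite matching (a simple DFS-based augmenting-path search; the graph format and names are my own assumptions, not the book's presentation):

```python
def max_bipartite_matching(adj):
    """adj: {left_node: [right nodes it can be matched to]}.
    Returns {right_node: left_node} describing a maximum matching."""
    match = {}                      # right node -> left node currently matched to it

    def try_augment(u, visited):
        # Look for an augmenting path starting at the free left node u.
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-matched somewhere else:
            if v not in match or try_augment(match[v], visited):
                match[v] = u        # flip the edges along the augmenting path
                return True
        return False

    for u in adj:
        try_augment(u, set())
    return match
```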
Problem: given a graph, find the largest independent set, i.e. a set of nodes no two of which are joined by an edge. Finding a large independent set and testing whether a given set is independent are very different tasks: the first appears to be very difficult, while the second is rather simple (see the checker sketched below).
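A small hedged sketch of the easy direction, checking a proposed set (the input format is assumed):

```python
def is_independent_set(edges, nodes):
    """edges: iterable of (u, v) pairs; nodes: the proposed set of nodes.
    Independent means no edge has both endpoints inside the set."""
    node_set = set(nodes)
    return not any(u in node_set and v in node_set for u, v in edges)

# Checking a candidate is cheap; *finding* the largest independent set seems to
# require examining exponentially many candidate subsets (e.g. looping over
# itertools.combinations of the nodes, largest size first).
```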
Problem (competitive facility location): two players take turns claiming weighted nodes, with rules about which claimed nodes may be adjacent to each other. What is the best possible way to play so as to end up owning the largest total weight?
Basics of Algorithm Analysis
Defining efficiency is difficult. You can talk about speed, about the number of operations performed, and about the memory used by the data structures. The book first offers the informal definition that something is efficient if it runs quickly on real-world datasets. Every algorithm has a worst-case scenario in which a maximum number of instructions is needed to finish the job. Since you don't know what your data is going to be, you want the worst case of your algorithm to be better than the worst case of a brute-force approach to the same problem. Exponential running times blow up very quickly, so avoid them as best you can.
When discussing algorithm complexity, the most important thing is how the number of instructions grows relative to the amount of data the algorithm handles. This is expressed with functions. Big O notation gives an asymptotic upper bound: f(n) is O(g(n)) if there exist constants c > 0 and n0 >= 0 such that f(n) <= c*g(n) for all n >= n0. Omega (Ω) notation gives the corresponding asymptotic lower bound. If f(n) is both O(g(n)) and Ω(g(n)), then we say that g(n) is an asymptotically tight bound for f(n), written f(n) = Θ(g(n)); the function's growth rate is then pinned down to within constant factors. All one needs in order to understand asymptotic orders and compare them is a basic understanding of how different functions grow and at which rates.
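Written out as formulas (these are the standard textbook definitions, stated in my own notation rather than quoted):

```latex
% Standard definitions of the asymptotic bounds:
f(n) = O(g(n))      \iff \exists\, c > 0,\ n_0 \ge 0 \ \text{s.t.}\ f(n) \le c\, g(n) \ \ \forall\, n \ge n_0
f(n) = \Omega(g(n)) \iff \exists\, \varepsilon > 0,\ n_0 \ge 0 \ \text{s.t.}\ f(n) \ge \varepsilon\, g(n) \ \ \forall\, n \ge n_0
f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \ \text{and} \ f(n) = \Omega(g(n))
```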
Divide and Conquer
Idea (merge sort): divide a list in half repeatedly until every piece is trivially sorted (a single entry), then merge the pieces back together in sorted order so that the final list is a sorted version of the initial list. It takes about n steps to break the list down and n*log(n) steps to merge the pieces back together.
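A minimal merge sort sketch in Python, as my own illustration of the idea rather than code from the book:

```python
def merge_sort(items):
    """Sort a list by recursively splitting it and merging the sorted halves."""
    if len(items) <= 1:
        return items                      # a single entry is already sorted
    mid = len(items) // 2
    left = merge_sort(items[:mid])        # divide
    right = merge_sort(items[mid:])
    return merge(left, right)             # combine

def merge(left, right):
    """Merge two sorted lists into one sorted list in O(len(left) + len(right))."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```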
Two basic ways to analyze the resulting recurrence:
Unroll the recursion: analyze the first few levels and generalize from there to see how the whole recursion plays out.
Guess what the solution is and then verify it by substitution (this is good for merge sort, since we expect n*log(n)).
So if you don't know the constant, you can assume one exists, check that everything works out in the substitution, and then work backwards to find the actual constant involved (a worked example follows).
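For example, here is the standard guess-and-check (substitution) argument for the merge sort recurrence, written in the usual notation rather than copied from the book:

```latex
% Recurrence: T(n) \le 2\,T(n/2) + c\,n, with T(2) \le c.
% Guess T(n) \le c\, n \log_2 n and check it by substitution:
T(n) \le 2\,T(n/2) + c\,n
     \le 2\,\bigl( c\,(n/2)\log_2(n/2) \bigr) + c\,n
     =   c\,n\,(\log_2 n - 1) + c\,n
     =   c\,n \log_2 n .
```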
Dynamic Programming - break the problem down into subproblems, then build the answers back up to solve the large problem.
Recursive algorithm: writing a plain recursive solution is the first step in creating a dynamic programming algorithm.
On its own this is a problem, because the number of recursive calls grows at a very fast (exponential) rate. We want to avoid that, so we introduce the dynamic programming part.
By memoizing the result of each calculation the first time it is done, we reduce the number of calculations to O(n), so it goes a lot faster (see the sketch below).
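As a hedged stand-in example (the book's running example here is the scheduling problem, but the effect of memoization is easiest to see on a tiny recursion like Fibonacci):

```python
def fib_memo(n, memo=None):
    """Naive Fibonacci recursion makes an exponential number of calls;
    remembering each result the first time it is computed brings it down to O(n)."""
    if memo is None:
        memo = {}
    if n <= 1:
        return n
    if n not in memo:
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]
```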
You can use the array of memoized values to trace your way back through the choices and recover the optimal solution itself. This doesn't add to the time complexity, but it does add to the memory needed.
We want to make it clearer how dynamic programming is done properly, as an iterative build-up, rather than hodgepodged on top of a recursive algorithm. For weighted interval scheduling: set M[0] = 0; then, if v_i is the value of the i-th job (sorted by finish time) and p(i) is the latest job that ends before job i starts, compute M[i] = max(v_i + M[p(i)], M[i-1]) for i = 1..n. The optimal total value is then M[n], the last (and largest) entry of M.
This is a simple, straightforward way to think about the previous recursive solution.
If working backwards from the end gets you the solution, it's probably going to be the way to go. A sketch of the whole iterative computation, including the traceback, follows.
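A hedged Python sketch of that iterative computation for weighted interval scheduling, including the traceback that recovers which jobs were chosen (the input format and helper names are my own assumptions):

```python
import bisect

def weighted_interval_scheduling(jobs):
    """jobs: list of (start, finish, value). Returns (best total value, chosen jobs)."""
    jobs = sorted(jobs, key=lambda j: j[1])          # sort by finish time
    finishes = [f for _, f, _ in jobs]
    n = len(jobs)

    # p[i] = index (1-based) of the latest job that finishes no later than
    # job i starts, or 0 if there is no such job.
    p = [0] * (n + 1)
    for i in range(1, n + 1):
        start = jobs[i - 1][0]
        p[i] = bisect.bisect_right(finishes, start, 0, i - 1)

    M = [0] * (n + 1)
    for i in range(1, n + 1):
        v_i = jobs[i - 1][2]
        M[i] = max(v_i + M[p[i]], M[i - 1])          # take job i, or skip it

    # Trace back through M to recover which jobs were actually chosen.
    chosen, i = [], n
    while i > 0:
        if jobs[i - 1][2] + M[p[i]] >= M[i - 1]:
            chosen.append(jobs[i - 1])
            i = p[i]
        else:
            i -= 1
    return M[n], list(reversed(chosen))
```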
Segmented least squares: finding best-fit lines. We want to minimize the squared deviations of the data points from a line. The data is given; find the line. Sometimes one line isn't enough and the data is better described by several line segments, but we attach a penalty to creating each new segment; otherwise the optimum would be a separate line through every consecutive pair of points, making the SSE (sum of squared errors) equal to 0.
This can be made dynamic by working backwards: the last point must end some final segment, so for each place that segment could begin, you compare the SSE of that segment plus the penalty for creating it, plus the optimal cost of everything before it, and keep the cheapest choice. (For one or two points a line fits exactly with SSE 0, so the error only becomes interesting from three points on.) The recurrence is written out below.
With the segment errors precomputed, this total adds up to O(n^2).
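In symbols (my own statement of the standard segmented least squares recurrence), with e(i, j) the minimum SSE of one line through points p_i, ..., p_j and C the per-segment penalty:

```latex
\mathrm{OPT}(0) = 0, \qquad
\mathrm{OPT}(j) = \min_{1 \le i \le j} \bigl( e(i, j) + C + \mathrm{OPT}(i - 1) \bigr)
```

The answer is OPT(n); filling in each of the n entries takes a minimum over at most n choices, which is where the O(n^2) comes from.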
The subset sums problem is where each item has a weight, there is a limit W on the total weight you can take on, and you want to get the total weight as large as possible without exceeding W. No greedy algorithm is known to find this solution, so we look to dynamic programming. We build a 2D table (matrix) of optimal solutions, one entry for every prefix of the items and every weight limit w less than or equal to W, and read off the overall optimum. This way it can be computed quickly, though large amounts of memory may be required.
The knapsack problem is one where each element has both a value and a weight. You want to maximize the sum of the values while keeping the total weight below a certain limit. It is solved with the same style of two-variable table as the subset sum problem; a sketch follows.
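A hedged sketch of that table-filling approach for 0/1 knapsack (subset sum is the special case where each item's value equals its weight; the names here are mine):

```python
def knapsack(items, W):
    """items: list of (value, weight) pairs with integer weights; W: weight capacity.
    OPT[i][w] = best achievable value using the first i items with capacity w."""
    n = len(items)
    OPT = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        value, weight = items[i - 1]
        for w in range(W + 1):
            OPT[i][w] = OPT[i - 1][w]                            # skip item i
            if weight <= w:
                OPT[i][w] = max(OPT[i][w],
                                value + OPT[i - 1][w - weight])  # take item i
    return OPT[n][W]
```

The table has (n+1)*(W+1) entries, which is where both the running time and the large memory use come from.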