Table of Contents

Maggie's Journal

Preface

Algorithms have an enormous reach, not just in computer programming and the internet but also, for example, in expressing the similarities among genes and genomes in biology. Algorithms are a powerful lens through which to view the field of computer science in general. Algorithmic work involves two tasks: getting to the mathematically clean core of a problem, and then identifying the appropriate algorithm design techniques based on the structure of the problem. The goal of this book is to offer advice on how to identify clean algorithmic problems within complex issues from different areas of computing and how to design efficient algorithms for them.

1.1 Stable Matching:

The Stable Matching Problem – Gale and Shapley wanted to come up with a self-enforcing process. Take two groups, say applicants and employers, and match each applicant to a company based on preference lists from both sides. Companies make offers, an applicant can either accept or reject each offer, and the process continues until everyone is in a stable matching – i.e., there isn't an applicant and a company that would be willing to leave their matched partners for each other. So a company gives an offer to its favorite applicant. If the applicant is not matched with anyone, they accept the offer; if the applicant is already matched to a company, then the applicant accepts the offer if it is better and rejects it if it is not. If rejected, the company then makes an offer to the next applicant on its preference list, and this continues until all the companies and applicants are matched. One wrinkle with matching companies to applicants is that a company is searching for multiple applicants while each applicant is searching for one employer, so to generalize it is easier to look at n men and n women who are to be matched with one another into pairs. Our goal is then to make a set of marriages with no instabilities. This algorithm for making stable pairs terminates after at most n^2 iterations of the while loop, because in the worst case each of the n men must propose to each of the n women before he finds one who accepts. (A rough Python sketch of the proposal process appears after the list of properties below.) Here are some properties that arise during the running of the algorithm:

 - A woman remains engaged from the point at which she receives her first proposal
 - The sequence of women to whom a man proposes gets worse and worse in terms of his preference list
 - If a man is free at some point during the algorithm, then there is a woman to whom he has not yet proposed
 - The set of pairs returned at termination is a perfect matching
 - The set of pairs returned is a stable matching
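
Here is a rough Python sketch of the proposal process as I understand it (my own code and naming, not the book's; men and women are just indices 0..n-1, and each side's preferences are ranked lists, best first):

    # A rough sketch of the Gale-Shapley proposal process (my own Python, not the book's code).
    # men_prefs[m] is man m's ranked list of women (best first); similarly women_prefs[w].
    def stable_matching(men_prefs, women_prefs):
        n = len(men_prefs)
        # rank[w][m] = position of man m on woman w's list (lower is better)
        rank = [[0] * n for _ in range(n)]
        for w in range(n):
            for pos, m in enumerate(women_prefs[w]):
                rank[w][m] = pos
        next_proposal = [0] * n      # index of the next woman each man will propose to
        current = [None] * n         # current[w] = man that woman w is engaged to, or None
        free_men = list(range(n))
        while free_men:
            m = free_men.pop()
            w = men_prefs[m][next_proposal[m]]
            next_proposal[m] += 1
            if current[w] is None:                       # w is free: she accepts
                current[w] = m
            elif rank[w][m] < rank[w][current[w]]:       # w prefers m: she trades up
                free_men.append(current[w])
                current[w] = m
            else:                                        # w rejects m; he stays free
                free_men.append(m)
        return {current[w]: w for w in range(n)}         # map each man to his partner
        # e.g. stable_matching([[0, 1], [1, 0]], [[0, 1], [1, 0]]) -> {0: 0, 1: 1}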

2.1 Computational Tractability

Efficiency: An algorithm is efficient if, when implemented, it runs quickly on real input instances. But what does it mean to run quickly, and what are real input instances? This varies from algorithm to algorithm and with how the algorithms are implemented, so we often look at the worst-case scenario as a guide for how efficient an algorithm is. A second definition: an algorithm is efficient if it achieves qualitatively better worst-case performance, at an analytical level, than brute-force search. Back in the stable matching problem, a brute-force search would have to look through all n! possible perfect matchings to find a stable one, as opposed to the algorithm, which has a worst-case running time of n^2 – and n^2 is definitely better than n!. Common running times are constant (1), logarithmic (log n), linear (n), n log n, polynomial (n^2, n^3, and so on), exponential (2^n), and factorial (n!). A third definition: an algorithm is efficient if it has a polynomial running time. (See Table 2.1 on page 34.)

2.2 Asymptotic Order of Growth:

Asymptotic Upper Bounds – Let T(n) be a function, say the worst-case running time of a certain algorithm on an input of size n. Given another function f(n), we say that T(n) is order f(n) if, for sufficiently large n, the function T(n) is bounded above by a constant multiple of f(n). This is sometimes written as T(n) = O(f(n)), and it holds if there exist constants c > 0 and n_0 >= 0 such that for all n >= n_0 we have T(n) <= c * f(n). Here we say that T is asymptotically upper-bounded by f. A single constant c has to work for all n >= n_0; c cannot depend on n. O notation expresses only an upper bound, not the exact growth rate of the function, so if T(n) = pn^2 + qn + r then T(n) = O(n^2).
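
As a quick check of that last claim (assuming p, q, r > 0): for all n >= 1 we have qn <= qn^2 and r <= rn^2, so T(n) = pn^2 + qn + r <= (p + q + r)n^2. Taking c = p + q + r and n_0 = 1 satisfies the definition, so T(n) = O(n^2).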

Asymptotic Lower Bounds – we want to express the notion that for arbitrarily large input sizes n, the function T(n) is at least a constant multiple of some specific function f(n). We write T(n) = Omega(f(n)) if there exist constants epsilon > 0 and n_0 >= 0 such that for all n >= n_0 we have T(n) >= epsilon * f(n); in this case T is asymptotically lower-bounded by f.

Asymptotic Tight Bounds – if we can show that a running time T(n) is both O(f(n)) and Omega(f(n)), then in a natural sense we've found the right bound: T(n) grows exactly like f(n) to within a constant factor. If T(n) is both O(f(n)) and Omega(f(n)) then we say that T(n) is Theta(f(n)); in this case, we say that f(n) is an asymptotically tight bound for T(n).

Properties of Growth Rates

-Let f and g be two functions such that the limit as n approaches infinity of f(n)/g(n) exists and is equal to some number c > 0. Then f(n) = Theta(g(n))

-A first property is transitivity: if a function f is asymptotically upper-bounded by a function g, and if g in turn is asymptotically upper-bounded by a function h, then f is asymptotically upper-bounded by h; a similar property holds for lower bounds. If f = O(g) and g = O(h), then f = O(h)

-If f = Theta(g) and g = Theta(h), then f = Theta(h)

-Suppose that f and g are two functions such that for some other function h, we have f = O(h) and g = O(h), then f + g = O(h)

-Let k be a fixed constant, and let f_1, f_2, …, f_k and h be functions such that f_i = O(h) for all i. Then f_1 + f_2 + … + f_k = O(h)

-Suppose that f and g are two functions, such that g = O(f). Then f + g = Theta(f)

-Let f be a polynomial of degree d, in which the coefficient a_d is positive, then f = O(n^d)

-For every b > 1 and every x > 0, we have log_b(n) = O(n^x)

-For every r > 1 and every d > 0, we have n^d = O(r^n)

2.3 Implementing the Stable Matching Algorithm Using Lists and Arrays:

Implementation for the Stable Matching

*All of the steps inside each iteration of the G-S algorithm can be implemented to run in constant time using arrays and lists, so the whole algorithm runs in O(n^2) time*

2.4 Survey of Common Running Times:

2.5 A More Complex Data Structure: Priority Queues:

2.3, 2.4, 2.5

3.1 Basic Definitions and Applications:

3.2 Graph Connectivity and Graph Traversal:

Breadth-First Search - explore outward from node s in all possible directions, adding nodes one “layer” at a time: the first layer is all the nodes connected to s, the second layer is all the nodes connected to nodes in the first layer that haven't already been placed in a layer, and so on. (3.3) pg 80: For each j >= 1, layer Lj produced by BFS consists of all nodes at distance exactly j from s; there is a path from s to t if and only if t appears in some layer. A further property of breadth-first search is that it produces, in a very natural way, a tree T rooted at s on the set of nodes reachable from s; we call the tree T that is produced in this way a breadth-first search tree. (3.4) pg 81: Let T be a breadth-first search tree, let x and y be nodes in T belonging to layers Li and Lj respectively, and let (x,y) be an edge of G. Then i and j differ by at most 1.

  1. The set R of nodes discovered by the BFS algorithm is precisely the set of nodes reachable from the starting node s; this set R is the connected component of G containing s (3.5) pg 82
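
A quick Python sketch of BFS layer by layer (my own illustration, not the book's implementation; the graph is a dict mapping each node to a list of its neighbors):

    # BFS from s, building layers L0, L1, L2, ... (my own sketch).
    def bfs_layers(graph, s):
        discovered = {s}
        layers = [[s]]                      # L0 = {s}
        while layers[-1]:
            next_layer = []
            for u in layers[-1]:
                for v in graph[u]:
                    if v not in discovered:
                        discovered.add(v)
                        next_layer.append(v)
            layers.append(next_layer)
        return layers[:-1]                  # drop the final empty layer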

Depth-First Search - it explores a graph G by going as deeply as possible and only retreating when necessary; like BFS, DFS builds the connected component containing s

  1. (3.8) pg 86: For any two nodes s and t in a graph, their connected components are either identical or disjoint; this is clear in Figure 3.2, pg 79
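
And a sketch of the non-recursive, stack-based DFS we did in class (again my own Python, same dict-of-neighbors representation as above):

    # Iterative DFS from s using an explicit stack (my own sketch).
    def dfs(graph, s):
        explored = set()
        stack = [s]
        while stack:
            u = stack.pop()
            if u not in explored:
                explored.add(u)
                for v in graph[u]:      # push neighbors; the last one pushed is explored next
                    stack.append(v)
        return explored                 # the connected component containing s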

3.3 Implementing Graph Traversal Using Queues and Stacks:

Queues and Stacks

Implementing BFS

Implementing DFS

3.1, 3.2, 3.3

Are we going to use the recursive DFS that is given on pg 84? The book references it, but the only one we went over in class was the non-recursive one. I just don't really get how that one works, so in case we need to know it, maybe we can go over it in class.

So if you're running DFS on a graph and you run it twice starting at the same node, will you always get the same result? What if you choose one node after the start node in the first run and a different one in the second run? You won't get the same tree, but they are equivalent results, right?

3.4 Testing Bipartiteness

3.5 Connectivity in Directed Graphs

3.6 Directed Acyclic Graphs and Topological Ordering

To compute a topological ordering of G:
  Find a node v with no incoming edges and order it first
  Delete v from G
  Recursively compute a topological ordering of G-{v} and append this order after v
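
A small Python sketch of the same idea, keeping in-degree counts instead of literally deleting nodes (my own version; it assumes every node of the DAG appears as a key in the adjacency dict):

    # Topological ordering of a DAG (my own sketch). graph maps each node to its
    # list of successors; repeatedly pick a node with no remaining incoming edges.
    def topological_order(graph):
        indegree = {v: 0 for v in graph}
        for u in graph:
            for v in graph[u]:
                indegree[v] += 1
        ready = [v for v in graph if indegree[v] == 0]   # nodes with no incoming edges
        order = []
        while ready:
            v = ready.pop()
            order.append(v)
            for w in graph[v]:       # "delete" v by lowering its successors' in-degrees
                indegree[w] -= 1
                if indegree[w] == 0:
                    ready.append(w)
        return order                 # if the graph has a cycle, some nodes will be missing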

4 Preface

4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead

Initially let R be the set of all requests, and let A be empty
While R is not yet empty

  Choose a request i in R that has the smallest finishing time
  Add request i to A
  Delete all requests from R that are not compatible with request i

End while loop
Return the set A as the set of accepted requests
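
A short Python sketch of this greedy rule (my own code; requests are (start, finish) pairs, and I treat a request that starts exactly when the last accepted one finishes as compatible):

    # Greedy interval scheduling: repeatedly take the compatible request that
    # finishes earliest (my own sketch).
    def interval_schedule(requests):
        accepted = []
        last_finish = float("-inf")
        for s, f in sorted(requests, key=lambda r: r[1]):   # sort by finishing time
            if s >= last_finish:          # compatible with everything accepted so far
                accepted.append((s, f))
                last_finish = f
        return accepted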

Scheduling All Intervals - now there are many identical resources and many requests in the form of time intervals; instead of accepting some requests and rejecting others, we must schedule all of them, using as few resources as possible (no two overlapping intervals may share a resource)

Sort the intervals by their start times, breaking ties arbitrarily
Let I1, I2, ..., In denote the intervals in this order
For j = 1, 2, 3, ..., n

 For each interval Ii that precedes Ij in sorted order and overlaps it
    Exclude the label of Ii from consideration for Ij
 End for
 If there is any label from {1, 2, ..., d} that has not been excluded then
    Assign a nonexcluded label to Ij
 Else
    Leave Ij unlabeled
 End if

Endfor
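
And a direct (if slow, O(n^2)) Python translation of the labeling pseudocode above, just my own sketch, where d is the number of available labels/resources:

    # Assign each interval one of the labels 1..d so that overlapping intervals
    # get different labels (my own translation of the pseudocode).
    def schedule_all_intervals(intervals, d):
        order = sorted(range(len(intervals)), key=lambda k: intervals[k][0])  # by start time
        label = {}
        for pos, j in enumerate(order):
            sj, fj = intervals[j]
            excluded = set()
            for i in order[:pos]:                        # intervals preceding Ij in sorted order
                si, fi = intervals[i]
                if si < fj and sj < fi and i in label:   # Ii overlaps Ij and is labeled
                    excluded.add(label[i])
            for lab in range(1, d + 1):
                if lab not in excluded:
                    label[j] = lab
                    break
        return label      # maps interval index -> label; unlabeled intervals are missing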

3.4 to 4.1 readings

4.2 Scheduling to Minimize Lateness: An Exchange Argument

4.4 Shortest Paths in a Graph

Dijkstra's Algorithm

4.5 The Minimum Spanning Tree Problem

4.6 Implementing Kruskal's Algorithm

I am rather confused by the Union-Find data structure section. I don't really understand why it is used to implement Kruskal's algorithm, nor do I understand how it is implemented; I am thrown by the pointers. Can we go over this in class again?
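
In the meantime, here is a minimal sketch of how I think the pieces fit together, using a dictionary-based Union-Find with path compression rather than the book's pointer-based structure, so the details may not match the book exactly:

    # Minimal Union-Find plus Kruskal's algorithm (my own sketch).
    # nodes is an iterable of node names; edges is a list of (u, v, cost) triples.
    def kruskal(nodes, edges):
        parent = {v: v for v in nodes}          # each node starts in its own set

        def find(v):                            # follow parents to the set's representative
            while parent[v] != v:
                parent[v] = parent[parent[v]]   # path compression (shorten the chain)
                v = parent[v]
            return v

        tree = []
        for u, v, cost in sorted(edges, key=lambda e: e[2]):   # cheapest edges first
            ru, rv = find(u), find(v)
            if ru != rv:                        # u and v are in different components,
                tree.append((u, v, cost))       # so adding this edge creates no cycle
                parent[ru] = rv                 # Union: merge the two components
        return tree

The reason Union-Find shows up at all is that Kruskal's algorithm repeatedly has to ask "are these two endpoints already in the same component?", and Find answers that while Union merges components when an edge is added.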

4.7 Clustering

4.8 Huffman Codes and Data Compression

Huffman's Algorithm, which produces a Huffman code for a given alphabet

To construct a prefix code for an alphabet S, with given frequencies:

   If S has two letters then
        Encode one letter using 0 and the other letter using 1
   Else
        Let y* and z* be the two lowest-frequency letters
        Form a new alphabet S' by deleting y* and z* and replacing them with a new letter w of frequency f_y* + f_z*
        Recursively construct a prefix code for S', with tree T'
        Define a prefix code for S as follows:
            Start with T'
            Take the leaf labeled w and add two children below it labeled y* and z*
   Endif
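
A Python sketch of the same merging process using a min-heap (my own code, not the book's; freqs maps each letter to its frequency):

    # Huffman's algorithm with a min-heap (my own sketch); returns letter -> codeword.
    import heapq

    def huffman_code(freqs):
        # Heap entries are (frequency, tie_breaker, tree); a tree is either a
        # single letter or a pair (left_subtree, right_subtree).
        heap = [(f, i, letter) for i, (letter, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)      # the two lowest-frequency trees
            f2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, tie, (t1, t2)))   # merge them into one tree
            tie += 1
        code = {}
        def assign(tree, prefix):
            if isinstance(tree, tuple):          # internal node: append 0 / 1 and recurse
                assign(tree[0], prefix + "0")
                assign(tree[1], prefix + "1")
            else:                                # leaf: record this letter's codeword
                code[tree] = prefix or "0"
        assign(heap[0][2], "")
        return code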

5.1 A First Recurrence: The Merge-sort Algorithm

5.2 Further Recurrence Relations

For some constant c, T(n) <= qT(n/2) + cn when n > 2, and T(2) <= c

For some constant c, T(n) <= 2T(n/2) + cn^2 when n > 2, and T(2) <= c
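
If I'm reading Section 5.2 right, unrolling these recurrences gives: for the first one, q = 2 yields O(n log n), q > 2 yields O(n^(log_2 q)), and q = 1 yields O(n); the second recurrence solves to O(n^2).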

5.3 Counting Inversions

5.4 Finding the Closest Pair of Points

6.1 Weighted Interval Scheduling: A Recursive Procedure

Compute-Opt(j):

  If j = 0
      Return 0
  Else
      Return max(vj + Compute-Opt(p(j)), Compute-Opt(j-1))

Memoizing the Recursion

M-Compute-Opt(j)

  If j = 0
      Return 0
  If M[j] is not empty
      Return M[j]
  Else
      Define M[j] = max(vj + M-Compute-Opt(p(j)), M-Compute-Opt(j-1))
      Return M[j]

Find-Solution(j)

  If j = 0
      Output Nothing
  Else
      If vj + M[p(j)] >= M[j-1]
          Output j together with the result of Find-Solution(p(j))
      Else
          Output the result of Find-Solution(j-1)
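
My Python version of the memoized recursion and the traceback (assuming, as in the book, that jobs are sorted by finish time and that v[j] and p(j) have already been set up, here as 1-indexed lists with dummy entries at index 0):

    # Memoized weighted interval scheduling (my own sketch).
    def m_compute_opt(j, v, p, M):
        if j == 0:
            return 0
        if M[j] is None:
            M[j] = max(v[j] + m_compute_opt(p[j], v, p, M),
                       m_compute_opt(j - 1, v, p, M))
        return M[j]

    def find_solution(j, v, p, M):
        if j == 0:
            return []
        if v[j] + M[p[j]] >= M[j - 1]:           # job j belongs to an optimal solution
            return find_solution(p[j], v, p, M) + [j]
        return find_solution(j - 1, v, p, M)

    # usage: M = [0] + [None] * n; m_compute_opt(n, v, p, M); find_solution(n, v, p, M)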

6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems

Iterative-Compute-Opt

  M[0] = 0
  For j from 1 to n
      M[j] = max(vj + M[p(j)], M[j-1])
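
The bottom-up version in Python is even shorter (same 1-indexed v and p arrays as the sketch above):

    # Bottom-up version of the weighted interval scheduling DP (my own sketch).
    def iterative_compute_opt(n, v, p):
        M = [0] * (n + 1)
        for j in range(1, n + 1):
            M[j] = max(v[j] + M[p[j]], M[j - 1])
        return M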

A Basic Outline of Dynamic Programming

  1. There are only a polynomial number of subproblems
  2. The solution to the original problem can be easily computed from the solutions to the subproblems
  3. There is a natural ordering on subproblems from smallest to largest, together with an easy-to-compute recurrence that allows one to determine the solution to a subproblem from the solutions to some number of smaller subproblems

6.3 Segmented Least Squares: Multi-way Choices

Segmented-Least-Squares(n)

   Array M[0...n]
   Set M[0] = 0
   For all pairs i <= j
        Compute the least squares error eij for the segment pi,...pj
   For j = 1,...n
        Use the recurrence OPT(j) = min from i = 1 to j (eij + C + OPT(i-1)) to compute M[j]
   Return M[n]

Find-Segments(j)

   If j = 0
        output nothing
   Else
        Find an i that minimizes eij + C + M[i-1]
        Output the segment {pi,...,pj} and the result of Find-Segments(i-1)
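
A rough Python sketch of the whole thing; the closed-form least-squares formulas for the line's slope and intercept are the standard ones this section uses, but the code itself is my own and ignores numerical issues:

    # Segmented least squares (my own sketch). points is a list of (x, y) pairs
    # with increasing x values; C is the per-segment penalty.
    def segment_error(pts):
        # squared error of the best-fit line through pts
        n = len(pts)
        sx = sum(x for x, _ in pts)
        sy = sum(y for _, y in pts)
        sxx = sum(x * x for x, _ in pts)
        sxy = sum(x * y for x, y in pts)
        denom = n * sxx - sx * sx
        if denom == 0:                      # a single point: error 0
            return 0.0
        a = (n * sxy - sx * sy) / denom
        b = (sy - a * sx) / n
        return sum((y - a * x - b) ** 2 for x, y in pts)

    def segmented_least_squares(points, C):
        n = len(points)
        # e[i][j] = error of the single segment p_i..p_j (1-indexed)
        e = [[0.0] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                e[i][j] = segment_error(points[i - 1:j])
        M = [0.0] * (n + 1)
        for j in range(1, n + 1):
            M[j] = min(e[i][j] + C + M[i - 1] for i in range(1, j + 1))
        return M[n]       # tracing back the chosen i's mirrors Find-Segments above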

6.4 Subset Sums and Knapsacks: Adding a Variable

Subset-Sum(n, W)

   Array M[0...n, 0...W]
   Initialize M[0,w] = 0 for each w = 0,1,...,W
   For i = 1,...,n
        For w = 0,...,W
             Use the recurrence If w < wi then OPT(i, w) = OPT(i-1, w), otherwise 
             OPT(i, w) = max(OPT(i-1, w), wi + OPT(i-1, w-wi)) to compute M[i,w]
   Return M[n, W]
   
* For the algorithm we've just designed, we need a two-dimensional table, reflecting the two-dimensional array of subproblems that is being built up
* The Subset-Sum(n, W) algorithm correctly computes the optimal value of the problem, and runs in O(nW) time
* Given a table M of the optimal values of the subproblems, the optimal set S can be found in O(n) time
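
A quick Python fill of the table as I understand it (my own sketch; weights is 1-indexed via a dummy entry at index 0, and W is the capacity):

    # Subset-Sum dynamic program (my own sketch). weights[1..n] are the item
    # weights (weights[0] is a dummy); returns the optimal value M[n][W].
    def subset_sum(weights, W):
        n = len(weights) - 1
        M = [[0] * (W + 1) for _ in range(n + 1)]   # M[0][w] = 0 for all w
        for i in range(1, n + 1):
            for w in range(W + 1):
                if weights[i] > w:                  # item i doesn't fit
                    M[i][w] = M[i - 1][w]
                else:
                    M[i][w] = max(M[i - 1][w],
                                  weights[i] + M[i - 1][w - weights[i]])
        return M[n][W]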

7.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm

7.2 Maximum Flows and Minimum Cuts in a Network

7.5 A First Application: The Bipartite Matching Problem

7.6 Disjoint Paths in Directed and Undirected Graphs