====== 6.3. Segmented Least Squares: Multi-way Choices ====== With this problem, at each step we have a polynomial number of possibilities to consider the structure of the optimal solution.\\ \\ ===== The Problem ===== >>>>>>>>>>>>>>>>>> Suppose our data consists of a set P of n points in the plane,:\\ >>>>>>>>>>>>>>>>>> (x1y1),(x2y2),...,(xnyn), where x1 < x2,...,< xn\\ >>>>>>>>>>>>>>>>>> Given a line L with equation y = ax + b, we say that an "error" of L with respect to P is the sum of all of its squared distances to the points in P:\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Error(L,P) = ∑ from i =1 to n (yi - axi- b) 2\\ >>>>>>>>>>>>>>>>>> Thus naturally, we are bound to finding the line with minimum error.\\ >>>>>>>>>>>>>>>>>> The solution turns out to be a line y = ax + b, where:\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a = [(n∑i xiyi) - (∑ixi)(∑i yi)]/[(n∑i xi2) - (∑i xi)2]\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And b = (∑i yi- a∑i xi)/n\\ \\ >>>>>>>>>>>>>>>>>> However, our problem is different from the above mentioned which these (above) formulas solve.\\ ** Formulating our problem** \\ >>>>>>>>>>>>>>>>>> Given a sequence of data points, we need to identify a few points in the sequence at which a discrete change occurs.--> In our specific case, a change from one linear approximation to another.\\ >>>>>>>>>>>>>>>>>> So, we have a set of points P = {(x1,y1),(x2,y2),...,(xn,yn)} with x1 < x2,..., xn.\\ >>>>>>>>>>>>>>>>>> pi denotes the point (xi,yi)\\ >>>>>>>>>>>>>>>>>> We must first partition P into some number of segments.\\ >>>>>>>>>>>>>>>>>> Each segment is a subset of P that represents a contiguous set of x-coordinates of the form {pi,pi+1,...,pj-1,pj} for some indices i≤j.\\ >>>>>>>>>>>>>>>>>> For each segment S in our partition P, we compute the line minimizing the error with respect to the points in S using the formula we found above(where we wrote the value of a and b in the line y = ax + b).\\ >>>>>>>>>>>>>>>>>> The penalty of a partition is defined to be the sum of:\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The number of segments into which we partition P, times a fixed, given multiplier C>0.\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For each segment, the error value of the optimal line through that segment.\\ >>>>>>>>>>>>>>>>>> Our goal is to find the a partition of minimum penalty.\\ ===== Designing the Algorithm ===== >>>>>>>>>>>>>>>>>> INPUT: n, p1,...,pN, C\\ >>>>>>>>>>>>>>>>>> Segmented-Least-Squares()\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> M[0] = 0\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> e[0][0] = 0\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for j = 1 to n\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for i = 1 to j\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> e[i][j] = least square error for the segment pi,...,pj\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for j = 1 to n\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> M[j] = min 1 ≤ i ≤ j (e[i][j] + c + M[i-1])\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return M[n]\\ \\ Just like the Weighted Interval Scheduling Problem, we need to find an algorithm that gives us the solution itself.\\ ** Algorithm to give for the solution itself** \\ >>>>>>>>>>>>>>>>>> FindSegments(j):\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if j = 0:\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> output nothing\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> else:\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Find an i that minimizes ei,j + C + M[i-1]\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Output the segment {pi, …, pj} and the result of \\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> FindSegments(i-1) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> End if ===== Algorithm Analysis ===== \\ The running time of the algorithm is O(n2) since filling in the array M[j] takes us O(n) time for each j.\\ I give this section 7/10.