This is an old revision of the document!
6.3. Segmented Least Squares: Multi-way Choices
With this problem, at each step we have a polynomial number of possibilities to consider the structure of the optimal solution.
The Problem
Suppose our data consists of a set P of n points in the plane,:
(x1y1),(x2y2),…,(xnyn), where x1 < x2,…,< xn
Given a line L with equation y = ax + b, we say that an “error” of L with respect to P is the sum of all of its squared distances to the points in P:
Error(L,P) = ∑ from i =1 to n (yi - axi- b) 2
Thus naturally, we are bound to finding the line with minimum error.
The solution turns out to be a line y = ax + b, where:
a = [(n∑i xiyi) - (∑ixi)(∑i yi)]/[(n∑i xi2) - (∑i xi)2]
And b = (∑i yi- a∑i xi)/n
However, our problem is different from the above mentioned which these (above) formulas solve.
Formulating our problem
Given a sequence of data points, we need to identify a few points in the sequence at which a discrete change occurs.–> In our specific case, a change from one linear approximation to another.
So, we have a set of points P = {(x1,y1),(x2,y2),…,(xn,yn)} with x1 < x2,…, xn.
pi denotes the point (xi,yi)
We must first partition P into some number of segments.
Each segment is a subset of P that represents a contiguous set of x-coordinates of the form {pi,pi+1,…,pj-1,pj} for some indices i≤j.
For each segment S in our partition P, we compute the line minimizing the error with respect to the points in S using the formula we found above(where we wrote the value of a and b in the line y = ax + b).
The penalty of a partition is defined to be the sum of:
The number of segments into which we partition P, times a fixed, given multiplier C>0.
For each segment, the error value of the optimal line through that segment.
Our goal is to find the a partition of minimum penalty.