Table of Contents

6.3. Segmented Least Squares: Multi-way Choices

With this problem, at each step we have a polynomial number of possibilities to consider the structure of the optimal solution.

The Problem

Suppose our data consists of a set P of n points in the plane,:

(x1y1),(x2y2),…,(xnyn), where x1 < x2,…,< xn

Given a line L with equation y = ax + b, we say that an “error” of L with respect to P is the sum of all of its squared distances to the points in P:
Error(L,P) = ∑ from i =1 to n (yi - axi- b) 2

Thus naturally, we are bound to finding the line with minimum error.

The solution turns out to be a line y = ax + b, where:

a = [(n∑i xiyi) - (∑ixi)(∑i yi)]/[(n∑i xi2) - (∑i xi)2]

And b = (∑i yi- a∑i xi)/n


However, our problem is different from the above mentioned which these (above) formulas solve.

Formulating our problem

Given a sequence of data points, we need to identify a few points in the sequence at which a discrete change occurs.–> In our specific case, a change from one linear approximation to another.

So, we have a set of points P = {(x1,y1),(x2,y2),…,(xn,yn)} with x1 < x2,…, xn.

pi denotes the point (xi,yi)

We must first partition P into some number of segments.

Each segment is a subset of P that represents a contiguous set of x-coordinates of the form {pi,pi+1,…,pj-1,pj} for some indices i≤j.

For each segment S in our partition P, we compute the line minimizing the error with respect to the points in S using the formula we found above(where we wrote the value of a and b in the line y = ax + b).

The penalty of a partition is defined to be the sum of:
The number of segments into which we partition P, times a fixed, given multiplier C>0.

For each segment, the error value of the optimal line through that segment.

Our goal is to find the a partition of minimum penalty.

Designing the Algorithm

INPUT: n, p1,…,pN, C

Segmented-Least-Squares()
M[0] = 0

e[0][0] = 0

for j = 1 to n
for i = 1 to j
e[i][j] = least square error for the segment pi,…,pj

for j = 1 to n

M[j] = min 1 ≤ i ≤ j (e[i][j] + c + M[i-1])

return M[n]


Just like the Weighted Interval Scheduling Problem, we need to find an algorithm that gives us the solution itself.
Algorithm to give for the solution itself

FindSegments(j):
if j = 0:
output nothing

else:

Find an i that minimizes ei,j + C + M[i-1]

Output the segment {pi, …, pj} and the result of

FindSegments(i-1)

End if

Algorithm Analysis


The running time of the algorithm is O(n2) since filling in the array M[j] takes us O(n) time for each j.

I give this section 7/10.