Differences

This shows you the differences between two versions of the page.

--- courses:cs211:winter2012:journals:jeanpaul:chaptersixsectioniii [2012/03/28 01:48] – created mugabej
+++ courses:cs211:winter2012:journals:jeanpaul:chaptersixsectioniii [2012/03/28 02:24] – [The Problem] mugabej
@@ Line 6: / Line 6: @@
 >>>>>>>>>>>>>>>>>> Suppose our data consists of a set P of n points in the plane:\\
-(x<sub>1</sub>y<sub>1</sub>),(x<sub>2</sub>y<sub>2</sub>),...,(x<sub>n</sub>y<sub>n</sub>), where x<sub>1</sub> < x<sub>2</sub>,...,< x<sub>n</sub>\\
+>>>>>>>>>>>>>>>>>> (x<sub>1</sub>,y<sub>1</sub>),(x<sub>2</sub>,y<sub>2</sub>),...,(x<sub>n</sub>,y<sub>n</sub>), where x<sub>1</sub> < x<sub>2</sub> < ... < x<sub>n</sub>\\
 >>>>>>>>>>>>>>>>>> Given a line L with equation y = ax + b, we say that the "error" of L with respect to P is the sum of its squared vertical distances to the points in P:\\
->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Error(L,P) = ∑<sub>i =1</sub><sup>n</sup> (y<sub>i</sub> - ax<sub>i</sub>- b) <sup>2</sup>
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Error(L,P) = ∑<sub> from i =1 to n</sub> (y<sub>i</sub> - ax<sub>i</sub>- b) <sup>2</sup>\\
+>>>>>>>>>>>>>>>>>> Naturally, our goal is to find the line with minimum error.\\
+>>>>>>>>>>>>>>>>>> The solution turns out to be a line y = ax + b, where:\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a = [(n∑<sub>i</sub> x<sub>i</sub>y<sub>i</sub>) - (∑<sub>i</sub>x<sub>i</sub>)(∑<sub>i</sub> y<sub>i</sub>)]/[(n∑<sub>i</sub> x<sub>i</sub><sup>2</sup>) - (∑<sub>i</sub> x<sub>i</sub>)<sup>2</sup>]\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And b = (∑<sub>i</sub> y<sub>i</sub>- a∑<sub>i</sub> x<sub>i</sub>)/n\\
+\\
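The closed-form solution for a and b can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the names `fit_line` and `line_error` are hypothetical, and the handling of a zero denominator (for example a single point) is an assumption made here, since the formula for a is undefined in that case.

```python
def fit_line(points):
    """Return (a, b) of the least-squares line y = a*x + b for a list of (x, y) pairs."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)

    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        # Assumption: with a single point (or identical x values) the slope
        # formula is undefined, so fall back to a horizontal line through the mean.
        return 0.0, sum_y / n

    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - a * sum_x) / n
    return a, b


def line_error(a, b, points):
    """Error(L, P): sum of squared vertical distances from the line to the points."""
    return sum((y - a * x - b) ** 2 for x, y in points)


if __name__ == "__main__":
    pts = [(1, 2), (2, 4), (3, 6)]  # points lying exactly on y = 2x
    a, b = fit_line(pts)
    print(a, b, line_error(a, b, pts))
```

For points that lie exactly on a line, the fitted line passes through all of them and the error is zero.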
+>>>>>>>>>>>>>>>>>> However, our problem differs from the one these formulas solve.\\
+**Formulating our problem**
+\\
+>>>>>>>>>>>>>>>>>> Given a sequence of data points, we need to identify a few points in the sequence at which a discrete change occurs. In our specific case, this is a change from one linear approximation to another.\\
+>>>>>>>>>>>>>>>>>> So, we have a set of points P = {(x<sub>1</sub>,y<sub>1</sub>),(x<sub>2</sub>,y<sub>2</sub>),...,(x<sub>n</sub>,y<sub>n</sub>)} with x<sub>1</sub> < x<sub>2</sub> < ... < x<sub>n</sub>.\\
+>>>>>>>>>>>>>>>>>> p<sub>i</sub> denotes the point (x<sub>i</sub>,y<sub>i</sub>)\\
+>>>>>>>>>>>>>>>>>> We must first partition P into some number of segments.\\
+>>>>>>>>>>>>>>>>>> Each segment is a subset of P that represents a contiguous set of x-coordinates of the form {p<sub>i</sub>,p<sub>i+1</sub>,...,p<sub>j-1</sub>,p<sub>j</sub>} for some indices i≤j.\\
+>>>>>>>>>>>>>>>>>> For each segment S in our partition of P, we compute the line minimizing the error with respect to the points in S, using the formulas above for a and b in the line y = ax + b.\\
+>>>>>>>>>>>>>>>>>> The penalty of a partition is defined to be the sum of:\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The number of segments into which we partition P, times a fixed, given multiplier C>0.\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For each segment, the error value of the optimal line through that segment.\\
+>>>>>>>>>>>>>>>>>> Our goal is to find a partition of minimum penalty.\\
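To make the penalty definition concrete, here is a small Python sketch that evaluates the penalty of one given partition. It does not search for the minimum-penalty partition. The names `segment_error` and `partition_penalty`, the representation of segments as inclusive 0-based index pairs, and the treatment of a zero denominator as a horizontal line are all assumptions made for illustration.

```python
def segment_error(points):
    """Error of the least-squares line through one segment of (x, y) pairs."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)

    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        # Assumption: a one-point segment is fit by a horizontal line.
        a, b = 0.0, sum_y / n
    else:
        a = (n * sum_xy - sum_x * sum_y) / denom
        b = (sum_y - a * sum_x) / n
    return sum((y - a * x - b) ** 2 for x, y in points)


def partition_penalty(points, segments, C):
    """Penalty = C * (number of segments) + sum of per-segment optimal-line errors.

    `points` is sorted by x; `segments` is a list of inclusive (i, j) index
    pairs that must cover the indices 0..n-1 contiguously, in order.
    """
    expected_start = 0
    for i, j in segments:
        if i != expected_start or j < i:
            raise ValueError("segments must be contiguous and in order")
        expected_start = j + 1
    if expected_start != len(points):
        raise ValueError("segments must cover all points")

    total_error = sum(segment_error(points[i:j + 1]) for i, j in segments)
    return C * len(segments) + total_error


if __name__ == "__main__":
    # Two straight runs with a sharp change of slope between them.
    pts = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 0), (6, -1)]
    print(partition_penalty(pts, [(0, 2), (3, 5)], C=1.0))  # two segments
    print(partition_penalty(pts, [(0, 5)], C=1.0))          # one segment
```

On this data the two-segment partition pays 2C with zero fitting error, while the single segment pays only C but incurs a larger error, which illustrates the trade-off that the multiplier C controls.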