Least Squares Line
The Simple Linear Regression model is represented by a straight line, called the "Least Squares Line". This line is a condensed graphical summary of how the sample is distributed in the (x, y) plane. It is also used to predict "y" for new values of "x".
The name of this line describes how it is determined. For any straight line D in the plane:
* Measure the vertical distance from each point to the line D,
* Square this value,
* Add the results for all points in the sample.
It can be shown that, provided the x values are not all equal, there is one and only one line for which this sum is minimal. This is the Least Squares Line.
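As a minimal sketch (plain Python, hypothetical data), the three steps above can be computed for any candidate line y = intercept + slope * x. The minimizing line also has a well-known closed form: slope = Sxy / Sxx and intercept = y_mean - slope * x_mean, where Sxx and Sxy are the centered sums of squares and cross-products. The function names `sum_of_squares` and `least_squares_line` are illustrative and do not come from any library.

```python
def sum_of_squares(xs, ys, intercept, slope):
    """Sum of the squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing sum_of_squares.

    Requires at least two distinct x values; otherwise sxx is zero and
    the minimizing line is not unique.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal: no unique least squares line")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_of_squares(xs, ys, a, b)

# Moving the line away from the least squares position increases the sum.
for da, db in [(0.1, 0.0), (0.0, 0.1), (-0.2, 0.05)]:
    assert sum_of_squares(xs, ys, a + da, b + db) > best

print(f"intercept={a:.3f}, slope={b:.3f}, sum of squares={best:.4f}")
```

For this sample the fitted line is y = 0.05 + 1.99 x, with a sum of squares of about 0.107.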
The following animation illustrates the concept of the Least Squares Line.
The number of points can be changed in the "Reset" mode only. "Noise" is in arbitrary units.
Drag the green cursors to move the "candidate" line until you get the lowest possible value in the mobile display.
This value is a modified version of the sum of the squares of the vertical distances between the points and the line:
* First, this sum is divided by the number of points, which gives the average of the squared distances from the points to the line.
* Then, the square root of this average is taken, so that the result is no longer the square of a distance but something akin to a distance, which is easier to visualize (this is essentially what we do when going from the variance to the standard deviation). This last quantity, the root mean square (RMS) of the vertical distances, is the one displayed. It looks much like the average distance from the points to the line, but it is not: it is always greater than or equal to that average, with equality only when all the distances are the same. Since dividing by the number of points and taking a square root both preserve order, the line that minimizes the displayed value is also the line that minimizes the sum of squares.
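The difference between the displayed value and a plain average distance can be checked numerically. The sketch below computes both for a hypothetical sample and a hypothetical candidate line y = x; the function names are illustrative only.

```python
import math


def rms_distance(xs, ys, intercept, slope):
    """Square root of the mean squared vertical distance (the displayed value)."""
    n = len(xs)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / n)


def mean_abs_distance(xs, ys, intercept, slope):
    """Plain average of the vertical distances, for comparison."""
    n = len(xs)
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)) / n


# Hypothetical sample and candidate line y = 0 + 1 * x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 1.0, 2.0, 6.0]

rms = rms_distance(xs, ys, 0.0, 1.0)       # distances 1, 0, 0, 3
mad = mean_abs_distance(xs, ys, 0.0, 1.0)
print(f"RMS distance = {rms:.4f}, mean distance = {mad:.4f}")
```

Here the vertical distances are 1, 0, 0 and 3, so the mean distance is 1.0 while the RMS value is sqrt(10 / 4), about 1.58: the single large distance weighs more heavily once squared.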
For a given sample, try several starting positions for the line. You will easily convince yourself that you always end up with the same final line: there is only one line such that any small change in its position always increases the sum of squares. This is a very important property. It comes from the fact that we are trying to account for the sample with a straight line: the sum of squares is then a quadratic function of the slope and intercept of the line, shaped like a bowl, with a single minimum and no other "dips" in which the search could get stuck.
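The experiment of starting from several positions can be imitated with gradient descent, used here as a stand-in for dragging the cursors (this is an assumption for illustration, not how the animation works). Because the mean of squared distances is a convex quadratic function of intercept and slope, descent from any starting line reaches the same minimum. The step size `rate` and the number of `steps` are hypothetical values chosen to be small and large enough for this sample.

```python
def gradient_step(xs, ys, a, b, rate):
    """One gradient-descent step on the mean of squared vertical distances."""
    n = len(xs)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Partial derivatives of (1/n) * sum(residual**2) with respect to a and b.
    grad_a = -2.0 * sum(residuals) / n
    grad_b = -2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    return a - rate * grad_a, b - rate * grad_b


def descend(xs, ys, a, b, rate=0.05, steps=5000):
    """Repeatedly move the candidate line downhill from (a, b)."""
    for _ in range(steps):
        a, b = gradient_step(xs, ys, a, b, rate)
    return a, b


# Hypothetical sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Very different starting lines (intercept, slope).
starts = [(0.0, 0.0), (10.0, -3.0), (-5.0, 4.0)]
finals = [descend(xs, ys, a0, b0) for a0, b0 in starts]

for (a0, b0), (a, b) in zip(starts, finals):
    print(f"start ({a0:+.1f}, {b0:+.1f}) -> intercept={a:.4f}, slope={b:.4f}")
```

All three runs end at intercept 0.05 and slope 1.99, the same line given by the closed-form least squares formula for this sample.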
The Least Squares Line is determined by the mathematical method known as "Ordinary Least Squares" (OLS). Under the standard assumptions of Simple Linear Regression (SLR), the slope and intercept obtained by OLS have optimal properties (by the Gauss-Markov theorem, they are the linear unbiased estimates with the smallest variance), which makes SLR the most popular data modeling technique.
When some of the assumptions of SLR must be abandoned in favor of less stringent ones, the Least Squares Line can, under certain conditions, be replaced by the "Weighted Least Squares Line".
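As a rough sketch of the weighted variant, assuming positive weights are already known: the Weighted Least Squares Line minimizes the sum of w_i * (y_i - a - b * x_i)^2 instead of the plain sum of squares. A common choice is weights inversely proportional to the variance of each observation, but which weights are appropriate depends on the conditions mentioned above and is not settled by this example. The function name is hypothetical.

```python
def weighted_least_squares_line(xs, ys, weights):
    """Return (intercept, slope) minimizing sum(w * (y - a - b*x)**2).

    Assumes all weights are positive and the x values are not all equal.
    """
    w_sum = sum(weights)
    # Weighted means of x and y.
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        raise ValueError("all x values are equal: no unique line")
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical example: points lying exactly on y = 1 + 2x are fitted
# exactly, whatever the positive weights.
a_w, b_w = weighted_least_squares_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [1.0, 2.0, 3.0])

# With equal weights, the weighted line coincides with the ordinary one.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a_eq, b_eq = weighted_least_squares_line(xs, ys, [1.0] * len(xs))
print(f"exact data: a={a_w:.3f}, b={b_w:.3f}; equal weights: a={a_eq:.3f}, b={b_eq:.3f}")
```

With equal weights the weighted means become ordinary means and the formula reduces to the Ordinary Least Squares line.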