Least Squares Line

The Simple Linear Regression model is represented by a straight line, called the "Least Squares Line". This line is a condensed graphical summary of how the sample is distributed in the (x, y) plane. It is also used to predict "y" for new values of "x".

 

The very name of this line tells how it is determined. For any straight line D in the plane:

    * Measure the vertical distance from each point of the sample to the line D,

    * Square this value,

    * Add the results for all points in the sample.


It can be shown that there is one, and only one, line for which this quantity is minimal. This is the Least Squares Line.
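The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name `sum_of_squares`, the parametrization of the line D as y = intercept + slope * x, and the four-point sample are assumptions made for the example, not part of the original text.

```python
def sum_of_squares(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line
    y = intercept + slope * x (hypothetical parametrization of D)."""
    total = 0.0
    for x, y in zip(xs, ys):
        vertical_distance = y - (intercept + slope * x)  # step 1: vertical distance
        total += vertical_distance ** 2                  # steps 2 and 3: square, then add
    return total


# Hypothetical sample, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

candidate_a = sum_of_squares(xs, ys, 0.0, 2.0)  # line y = 2x       -> 1.0
candidate_b = sum_of_squares(xs, ys, 1.0, 1.5)  # line y = 1 + 1.5x -> 1.5
```

Of these two candidate lines, the first has the smaller sum of squares, so it fits this sample better in the least squares sense. Neither is claimed to be the Least Squares Line itself.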

The following animation illustrates the concept of Least Squares Line.

 

 


 

 

 

The number of points can be changed in the "Reset" mode only. "Noise" is in arbitrary units.

 

Drag the green cursors to move the "candidate" line until you get the lowest possible value in the mobile display.

This value is a modified version of the sum of the squares of the vertical distances between the points and the line:

    * First, this sum is divided by the number of points, in order to obtain the average value of the squares of the distances of the points to the line.

    * Then, the square root of this average is taken, which yields not the square of a distance but something akin to a distance, which is easier to visualize (much as when switching from variance to Standard Deviation). This last quantity is the one displayed. It resembles the average distance from the points to the line, but it is not that average: it is the root mean square of the distances, which is never smaller than their average.
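The difference between the displayed quantity and the plain average distance can be checked numerically. The sketch below is an illustration under assumptions: the function names, the sample, and the candidate line y = 1 + 1.5x are hypothetical and are not taken from the animation.

```python
import math


def rms_distance(xs, ys, intercept, slope):
    """Square root of the average squared vertical distance to the line."""
    squares = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squares) / len(squares))


def mean_abs_distance(xs, ys, intercept, slope):
    """Plain average of the (unsigned) vertical distances to the line."""
    distances = [abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)]
    return sum(distances) / len(distances)


# Hypothetical sample and candidate line, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

rms = rms_distance(xs, ys, 1.0, 1.5)            # sqrt(1.5 / 4), about 0.612
mean_abs = mean_abs_distance(xs, ys, 1.0, 1.5)  # 2.0 / 4 = 0.5
```

Here the two values are close but not equal, and the root mean square is the larger one.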

____________________

 

For a given sample, try several starting positions for the line. You will find that you always end up with the same final line: there is only one line for which any small change of position causes an increase of the sum of squares. This is a very important property. It stems from the fact that we are trying to account for the sample with a straight line.
In more complex situations, a more complex shape may be appropriate. It may then happen that several different curves each have the property that any small change of position or shape causes an increase of the sum of squares (several local minima). This is what happens, for instance, with Neural Networks.
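The uniqueness of the final line for a straight-line fit can be illustrated by replacing the manual dragging with an automatic procedure. The sketch below uses plain gradient descent on the average of the squares, which is a stand-in chosen for this example and not the method used by the animation. The step size, the iteration count, the sample, and the starting positions are all assumptions.

```python
def descend(xs, ys, intercept, slope, step=0.05, iterations=5000):
    """Repeatedly nudge the line y = intercept + slope * x in the direction
    that decreases the average squared vertical distance."""
    n = len(xs)
    for _ in range(iterations):
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Partial derivatives of the average squared residual.
        grad_intercept = -2.0 * sum(residuals) / n
        grad_slope = -2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        intercept -= step * grad_intercept
        slope -= step * grad_slope
    return intercept, slope


# Hypothetical sample and starting positions, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
starts = [(0.0, 0.0), (10.0, -3.0), (-5.0, 4.0)]

finals = [descend(xs, ys, a, b) for a, b in starts]
```

For this sample, all three starting positions end at essentially the same line, with an intercept close to 0 and a slope close to 1.9. The step size was picked small enough for this particular sample; other data may need a different value.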

 

 

 

The Least Squares Line is identified by the mathematical method known as "Ordinary Least Squares" (OLS). It possesses optimal properties that make Simple Linear Regression (SLR) the most popular data modeling technique.
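For a single predictor, OLS has a well-known closed-form solution: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the line passes through the point of means. A minimal sketch follows, in which the function name `least_squares_line` and the sample are assumptions made for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the Least Squares Line for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All x values are identical: no unique line exists.
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return intercept, slope


# Hypothetical sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
intercept, slope = least_squares_line(xs, ys)  # about (0.0, 1.9)
```

No search is needed: the line is obtained directly from the sample means and deviations.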

When some of the assumptions of SLR have to be abandoned in favor of less stringent ones, the Least Squares Line can, under certain conditions, be replaced by the "Weighted Least Squares Line" (see Weighted Least Squares).
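As a rough illustration of the weighted variant, the sketch below minimizes the weighted sum of squared vertical distances, where each point carries a non-negative weight. The function name, the sample, and the weights are assumptions. A common choice, also an assumption here, is to weight each point by the inverse of the variance of its error.

```python
def weighted_least_squares_line(xs, ys, weights):
    """Return (intercept, slope) minimizing sum of w * (y - intercept - slope * x)**2."""
    total_w = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total_w  # weighted mean of x
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total_w  # weighted mean of y
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        raise ValueError("weighted x values must not all be equal")
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical sample and weights, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

equal = weighted_least_squares_line(xs, ys, [1.0, 1.0, 1.0, 1.0])
downweighted = weighted_least_squares_line(xs, ys, [1.0, 1.0, 1.0, 0.1])
```

With equal weights the result coincides with the ordinary Least Squares Line. Reducing the weight of the last point lets the other three points dominate the fit.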

 

 ____________________________________________

 

Related readings:

Least Squares estimation

Simple Linear Regression

Weighted Least Squares
