
Intercept

The general meaning of the word is: "the y-coordinate of the point where a given line intersects the y-axis".

 

 

 
Two parameters are needed to define a (non-vertical) straight line unambiguously: the intercept is one of them, and the other is usually the slope.

 
The term most often refers to the intersection of a Least Squares Line (LSL) with the y-axis, within the context of Simple Linear Regression (SLR).

 
Because the LSL embodies the predictions of the model, the intercept is the SLR prediction for the value 0 of the predictor x. SLR is often used to describe how the response variable y changes when a "control" variable x varies across a range. The intercept of the LSL then answers the question: "What would the value of y be if x were set to 0?". For example: "What would the residual volume of this gas be if the temperature could be lowered to absolute zero?".

 
The LSL depends on the particular sample at hand, and so does the intercept, which must therefore be regarded as a random variable. Under the standard assumptions of SLR, the distribution of the intercept is well understood and can be calculated exactly.

 

This animation illustrates the distribution of the intercept under the standard assumptions of Simple Linear Regression.

 

 


 

 

 

The illustration initially displays:

    * a regression line (in red), 

    * a sample,

    * the corresponding LSL (in blue), together with the current intercept (also in blue).

To choose another regression line, click on "New".

 

The observations in the illustration are equally spaced along the x-axis. This may look like a severe limitation, but it is not:

    * First, it is not an unusual situation in real life.

    * More importantly, SLR does not treat x as a random variable (only y is random). The distribution of the intercept depends only on the sample size, the x-coordinate of the sample mean, the standard deviation of the x-values of the sample, and the noise level, and these four quantities remain constant from one sample to the next. The detailed positions of the observations are not needed, so keeping the observations equally spaced, although a limitation, is not a severe one.

 

The lower frame shows a gaussian curve, the theoretical distribution of the intercept.

    * The mean of the gaussian is positioned at the value of the intercept of the true red regression line (which is unknown in real life). This is a consequence of the fact that the intercept of the LSL is an unbiased estimator of the intercept of the regression line.

    * The standard deviation of the gaussian curve is the theoretical standard deviation of the distribution of the intercept of the LSL.

 

Click on "Go" and observe the distribution of the intercept progressively build up.
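What the "Go" button does can be imitated with a short simulation. The sketch below is not the code of the animation: the function names, the true line (A = 2, B = 0.5), the noise level and the x-values are all assumptions chosen for illustration. It draws many samples around a fixed regression line, computes the intercept of each LSL, and compares the empirical mean and standard deviation with the theoretical ones.

```python
import numpy as np

def simulate_intercepts(x, A, B, sigma, n_samples=20000, seed=0):
    """Return the LSL intercepts of n_samples samples y = A + B*x + noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # One row per simulated sample; x is held fixed, only y is random.
    y = A + B * x + rng.normal(0.0, sigma, size=(n_samples, x.size))
    x_mean = x.mean()
    sxx = np.sum((x - x_mean) ** 2)
    # Least-squares slope and intercept of every row at once.
    b = (y - y.mean(axis=1, keepdims=True)) @ (x - x_mean) / sxx
    return y.mean(axis=1) - b * x_mean

def theoretical_sd(x, sigma):
    """Theoretical standard deviation of the intercept (Var(x) uses divisor n)."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(sigma**2 / x.size * (1.0 + x.mean() ** 2 / x.var()))

# Hypothetical setting: 10 equally spaced observations, true line y = 2 + 0.5x.
x = np.linspace(1.0, 10.0, 10)
a = simulate_intercepts(x, A=2.0, B=0.5, sigma=1.0)
print("empirical mean of the intercept:", a.mean())
print("empirical sd:", a.std(), " theoretical sd:", theoretical_sd(x, 1.0))
```

With enough samples, the empirical mean should settle near the true intercept and the empirical standard deviation near the theoretical value, which is what the histogram in the lower frame shows as it builds up.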
_____________________________

 

The intercept is not an intrinsic parameter of the LSL. It depends on where the vertical y-axis is positioned, which is pretty much arbitrary: changing the position of the y-axis amounts to adding (or subtracting) the same quantity to all the x-values of the observations. As a consequence, the distribution of the intercept depends on the position of the y-axis.

Move the y-axis around (green slider), and observe how the predicted distribution of the intercept changes. Its standard deviation goes through a minimum for a particular position of the y-axis. Can you guess the position of this minimum?


If you keep the LSL visible, you will notice that it seems to "pivot" randomly around some sort of fulcrum within the cloud of observations. Consequently, you would expect the standard deviation of the intercept to be large if all the observations are far away on one side of the y-axis. Use the "Left" and "Right" controls to position the observations at one end of the scene, while positioning the y-axis at the other end. Notice the increase in the standard deviation of the gaussian curve (and therefore of the distribution of the intercept).
You may do this at any time (and also change the sample size or the noise level) while retaining the current regression line: just click on the small "Reset" button at the lower right corner of the illustration.
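The effect of the green slider can also be explored numerically. The sketch below is only an illustration under assumed values (seven equally spaced observations, unit noise level); the helper name intercept_sd is hypothetical. Moving the y-axis to position s is modelled as subtracting s from every x-value, and the theoretical standard deviation of the intercept is evaluated for each position using the variance formula of the "Basic results" section.

```python
import numpy as np

def intercept_sd(x, sigma):
    """Theoretical sd of the LSL intercept; Var(x) is computed with divisor n."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(sigma**2 / x.size * (1.0 + x.mean() ** 2 / x.var()))

# Hypothetical sample: 7 equally spaced observations, noise level 1.
x = np.linspace(2.0, 8.0, 7)
sigma = 1.0

# Candidate positions of the y-axis, expressed on the original x scale.
shifts = np.linspace(-10.0, 10.0, 201)
# Placing the y-axis at s is the same as subtracting s from all x-values.
sds = np.array([intercept_sd(x - s, sigma) for s in shifts])

best_shift = shifts[np.argmin(sds)]
print("position of the y-axis giving the smallest sd:", best_shift)
print("x-average of the sample:", x.mean())
print("smallest sd:", sds.min(), " sigma/sqrt(n):", sigma / np.sqrt(x.size))
```

Printing the standard deviations at the two ends of the range of positions shows how quickly the uncertainty grows once the y-axis is far from the observations.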
____________________________

 
Change the sample size (all other parameters being held fixed), and observe that the standard deviation of the intercept's distribution always decreases as the sample size increases. As the sample size goes up, the LSL is more and more constrained to stick to the true regression line.
____________________________

 
Notice also that, everything else being equal, the standard deviation of the intercept's distribution decreases as you increase the range of the sample (the spread of its x-values). This situation is similar to that of a direction in space being defined by a pipe: the direction (regression line) is more accurately defined for longer pipes.
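Both observations, about the sample size and about the range, can be checked against the theoretical standard deviation. The following is a minimal sketch with assumed settings (observations equally spaced and centred on x = 5, unit noise level); the names intercept_sd, sd_by_n and sd_by_half_width are hypothetical.

```python
import numpy as np

def intercept_sd(x, sigma):
    """Theoretical sd of the LSL intercept; Var(x) is computed with divisor n."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(sigma**2 / x.size * (1.0 + x.mean() ** 2 / x.var()))

sigma = 1.0

# Sample size grows while the range stays fixed at [1, 9].
sd_by_n = {n: intercept_sd(np.linspace(1.0, 9.0, n), sigma)
           for n in (5, 10, 20, 40)}

# Range grows around the same centre (x = 5) while the sample size stays at 10.
sd_by_half_width = {h: intercept_sd(np.linspace(5.0 - h, 5.0 + h, 10), sigma)
                    for h in (1.0, 2.0, 4.0)}

for n, sd in sd_by_n.items():
    print(f"n = {n:2d}            sd = {sd:.3f}")
for h, sd in sd_by_half_width.items():
    print(f"half-width = {h:.0f}    sd = {sd:.3f}")
```

In both loops the printed standard deviation goes down from one line to the next: more observations, or observations spread over a longer interval, pin the line down more firmly.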
____________________________

 
Notice that the standard deviation of the intercept's distribution does not depend at all on the regression line (for a given set of values of the parameters): click repeatedly on "New", and observe that although the position of the gaussian curve varies to reflect the value of the intercept of the regression line, its standard deviation remains constant.

 

______________________________________________________________________


Basic results about the intercept in Simple Linear Regression

 

    1) The equation of the LSL is denoted :

y = a + bx

The intercept is therefore "a"  (and "b" is the slope).

 

    2) Value of "a"

a = ȳ - b.x̄

 

where :

    * "b" is the value of the slope :

b = Cov(x, y)/Var(x)

    * x̄ and ȳ are the x-average and y-average of the sample.
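The two formulas above translate directly into code. The sketch below uses a small made-up data set (the five points are illustrative, not taken from any real experiment) and a hypothetical helper named least_squares_line; the result is cross-checked against NumPy's polyfit, which fits the same line by least squares.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (a, b), the intercept and slope of the LSL y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Covariance and variance with the same divisor (n); only their ratio matters.
    cov_xy = np.mean((x - x_mean) * (y - y_mean))
    var_x = np.mean((x - x_mean) ** 2)
    b = cov_xy / var_x
    a = y_mean - b * x_mean
    return a, b

# Illustrative, made-up sample.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

a, b = least_squares_line(x, y)
# np.polyfit returns the coefficients from the highest degree down: (slope, intercept).
b_ref, a_ref = np.polyfit(x, y, 1)
print("intercept a:", a, " slope b:", b)
print("polyfit intercept:", a_ref, " polyfit slope:", b_ref)
```

Because the slope is a ratio of a covariance to a variance, it does not matter whether both are computed with divisor n or n - 1, as long as the same divisor is used for both.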

 

    3) Properties of the intercept "a" as an estimator

            In this paragraph, no assumption is made about the noise distribution, other than :

                * the lack of correlation of the noise between any two observations.

                * the variance σ² of the noise being the same for all observations (homoscedasticity).
In particular, it is not assumed that the noise is gaussian.
 

            3-a) "a" is an unbiased estimator of the intercept A of the true regression line.

 

E[a] = A


where E denotes the expectation.

 

 

        3-b) The variance of "a" is :

Var(a) = (σ²/n).(1 + x̄²/Var(x))


where n is the number of observations in the sample, and Var(x) is the variance of the x-values of the sample computed with the divisor n.

 
This expression clearly shows the influence of the position of the vertical y-axis on the variance of the intercept. In particular, Var(a) reaches its lowest value when x̄ = 0, that is, when the y-axis is positioned on the x-average of the sample. The variance of a is then just σ²/n.
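The variance formula can be recovered in a few lines. This is a sketch of the standard argument rather than a derivation taken from this page. It relies on two known results that hold under the two noise assumptions of this paragraph: the variance of the slope is σ²/(n.Var(x)), and the sample average ȳ is uncorrelated with the slope b. Var(x) is taken with the divisor n.

```latex
\begin{align*}
a &= \bar{y} - b\,\bar{x}, \\
\operatorname{Var}(\bar{y}) &= \frac{\sigma^2}{n}, \qquad
\operatorname{Var}(b) = \frac{\sigma^2}{n\,\operatorname{Var}(x)}, \qquad
\operatorname{Cov}(\bar{y}, b) = 0, \\
\operatorname{Var}(a) &= \operatorname{Var}(\bar{y})
  + \bar{x}^2\,\operatorname{Var}(b)
  - 2\,\bar{x}\,\operatorname{Cov}(\bar{y}, b) \\
 &= \frac{\sigma^2}{n} + \frac{\bar{x}^2\,\sigma^2}{n\,\operatorname{Var}(x)}
  = \frac{\sigma^2}{n}\left(1 + \frac{\bar{x}^2}{\operatorname{Var}(x)}\right).
\end{align*}
```

The first term is the uncertainty on the height of the line at the x-average of the sample; the second is the uncertainty on the slope, amplified by the distance between the x-average and the y-axis.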

 

 

        3-c) "a" and the slope "b" are usually correlated.

 

Cov(a, b) = - σ².x̄/(n.Var(x))

 


Notice that when x̄ > 0, the covariance is negative: a lower slope most often (but not always) corresponds to a larger intercept, which is quite intuitive.
Only when x̄ = 0 (that is, when the y-axis is positioned on the x-average of the sample) are the intercept and the slope uncorrelated. Recall that this is also the situation that makes the variance of the intercept smallest.
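The sign and size of this covariance can be checked by simulation. The sketch below is an illustration under assumed values (true line y = 2 + 0.5x, noise level 1, ten equally spaced observations); the function name simulate_a_b is hypothetical. It estimates the covariance of the intercept and the slope once with the y-axis to the left of the sample, and once with the y-axis placed on the x-average.

```python
import numpy as np

def simulate_a_b(x, A, B, sigma, n_samples=20000, seed=1):
    """Return the intercepts and slopes of the LSLs of n_samples simulated samples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # x is fixed across samples; only the noise on y changes.
    y = A + B * x + rng.normal(0.0, sigma, size=(n_samples, x.size))
    x_mean = x.mean()
    b = (y - y.mean(axis=1, keepdims=True)) @ (x - x_mean) / np.sum((x - x_mean) ** 2)
    a = y.mean(axis=1) - b * x_mean
    return a, b

def theoretical_cov(x, sigma):
    """Cov(a, b) = -sigma^2 * x_mean / (n * Var(x)), with Var(x) using divisor n."""
    x = np.asarray(x, dtype=float)
    return -sigma**2 * x.mean() / (x.size * x.var())

sigma = 1.0
x = np.linspace(1.0, 10.0, 10)      # x-average is positive: 5.5
x_centred = x - x.mean()            # y-axis moved onto the x-average

for label, xs in (("x-average > 0", x), ("x-average = 0", x_centred)):
    a, b = simulate_a_b(xs, A=2.0, B=0.5, sigma=sigma)
    empirical = np.cov(a, b, ddof=0)[0, 1]
    print(label, "| empirical:", empirical, "| theoretical:", theoretical_cov(xs, sigma))
```

In the first case the estimated covariance should come out clearly negative, and in the second it should be close to zero, in agreement with the formula.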

 

 

    4) Distribution of "a" under the assumption of a gaussian noise

        The noise is now assumed to be distributed as N(0, σ²).

 

        4-1) "a" is normally distributed. So :

a ~ N(mean, variance)

 

with "mean" and "variance" as in the previous paragraph (recall that these values do not depend on the nature of the noise).

 

        4-2) "a" is an efficient estimator of A

            No other unbiased estimator of A has a lower variance than "a".

 

        4-3) "a" and any residual u_i are independent variables.

 _________________________________________________________________

Related readings :

Simple Linear Regression

Slope

Correlation coefficient
