The general meaning of the word "intercept" is: "the y-coordinate of the point where a given line intersects the y-axis".

Two parameters are needed to define a straight line unambiguously; the
other parameter is usually the slope.
The term most often refers to the intersection
of a Least Squares Line (LSL) with the y-axis in the context of Simple Linear Regression (SLR).
Because the LSL embodies the predictions of the model,
the intercept is the value that the SLR predicts when the predictor x equals 0. SLR is often used to
describe the evolution of the response variable y when a "control"
variable x varies across a range. The intercept of the LSL then
answers the question: "What would the value of y be if x
were tuned to 0?". For example: "What would the residual volume
of this gas be if the temperature could be tuned down to absolute zero?".
The LSL depends on the particular
sample at hand. So does the intercept, which must therefore be regarded as a random
variable. Under the standard assumptions of SLR, the distribution of the intercept
is well understood, and can be calculated exactly.
This animation illustrates the distribution of the intercept under the standard assumptions of Simple Linear Regression.
The illustration first displays:
* a regression line (in red),
* a sample,
* the corresponding LSL (in blue), together with the current intercept (also in blue).
To choose another regression line, click on "New".
The observations in the illustration are equally spaced along the x-axis. This may look like a severe limitation, but it is not:
* First, it is not an unusual situation in real life.
* More importantly, SLR does not consider x as a random variable (only y is random). The distribution of the intercept depends only on the sample size, the x-coordinate of the sample mean, the x standard deviation of the sample, and the noise level; these four quantities remain constant from one sample to the next. The detailed positions of the observations are not needed.
So keeping the observations equally spaced, although a limitation, is not a severe one.
The lower frame shows a gaussian curve, the theoretical distribution of the intercept.
* The mean of the gaussian is positioned at the value of the intercept of the true red regression line (which is unknown in real life). This is a consequence of the fact that the intercept of the LSL is an unbiased estimator of the intercept of the regression line.
* The standard deviation of the gaussian curve is the theoretical standard deviation of the distribution of the intercept of the LSL.
Click on "Go" and observe the distribution
of the intercept progressively build up.
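The build-up shown by the animation can be imitated offline. The following is only a minimal sketch, not the code behind the animation: the true line (intercept 2.0, slope 0.5), the noise level, the ten equally spaced x-values and the function names are all assumptions chosen for illustration.

```python
import random
import statistics


def least_squares_intercept(xs, ys):
    """Intercept a of the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar


def simulate_intercepts(true_a, true_b, xs, sigma, n_samples, seed=0):
    """Draw n_samples samples around the true line; return their LSL intercepts."""
    rng = random.Random(seed)
    intercepts = []
    for _ in range(n_samples):
        # x is fixed from one sample to the next; only y is random (gaussian noise).
        ys = [true_a + true_b * x + rng.gauss(0.0, sigma) for x in xs]
        intercepts.append(least_squares_intercept(xs, ys))
    return intercepts


# Hypothetical setting: equally spaced x-values, as in the illustration.
xs = [float(i) for i in range(1, 11)]
intercepts = simulate_intercepts(2.0, 0.5, xs, sigma=1.0, n_samples=5000)

mean_a = statistics.fmean(intercepts)
sd_a = statistics.pstdev(intercepts)
print(f"mean of the simulated intercepts:    {mean_a:.3f}")
print(f"std dev of the simulated intercepts: {sd_a:.3f}")
```

With enough samples, the mean of the simulated intercepts should settle close to the intercept of the true line, and a histogram of `intercepts` should approach the gaussian curve of the lower frame.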
The intercept is not an intrinsic parameter of the
LSL. It depends on where the vertical y-axis is positioned, which is
pretty much arbitrary: changing the position of the y-axis amounts to
adding the same quantity to (or subtracting it from) all the x-values of the observations. As a consequence, the distribution of the intercept
depends on the position of the y-axis.
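A small numerical check may make this concrete. The sketch below uses made-up data and a hypothetical helper `fit_line`; it shifts every x-value by the same amount and refits. Under these assumptions the slope is unchanged, while the intercept moves by the slope times the shift.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up sample, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.2, 2.9, 5.1, 7.2, 8.8]
a, b = fit_line(xs, ys)

# Moving the y-axis 10 units to the left adds 10 to every x-value.
shift = 10.0
xs_shifted = [x + shift for x in xs]
a_shifted, b_shifted = fit_line(xs_shifted, ys)

print(f"original axis: a = {a:.3f}, b = {b:.3f}")
print(f"shifted axis:  a = {a_shifted:.3f}, b = {b_shifted:.3f}")
```

The line itself has not moved; only the place where it is read off (x = 0) has changed.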
______________________________________________________________________
Basic results about the intercept in Simple Linear Regression
1) The equation of the LSL is written:
y = a + bx
The intercept is therefore "a" (and "b" is the slope).
2) Value of "a"
$a = \bar{y} - b\,\bar{x}$
where:
* "b" is the value of the slope:
b = Cov(x, y)/Var(x)
* $\bar{x}$ and $\bar{y}$ are the x-average and y-average of the sample.
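As a hedged illustration of these two formulas (slope first, then intercept), here is a minimal sketch on made-up data. The function name `least_squares_line` and the sample values are assumptions, not part of the original page.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Cov and Var use the same divisor (n here), so it cancels in the ratio.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    b = cov_xy / var_x          # slope: Cov(x, y) / Var(x)
    a = y_bar - b * x_bar       # intercept: y-average minus slope times x-average
    return a, b


# Made-up sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(xs, ys)
print(f"intercept a = {a:.4f}, slope b = {b:.4f}")
```

The formula for "a" also says that the LSL always passes through the point made of the x-average and the y-average of the sample.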
3) Properties of the intercept "a" as an estimator
In this paragraph, no assumption is made about the noise distribution, other than:
* the lack of correlation of the noise between any two observations,
* the variance $\sigma^2$ of the noise being the same for all observations (homoscedasticity).
In particular, it is not assumed that the noise
is gaussian.
3-a) "a" is an unbiased estimator of the intercept A of the true regression line.
E[a] = A
where E denotes the expectation.
3-b) The variance of "a" is:
$\mathrm{Var}(a) = \sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} \right)$
where n is the number of observations in
the sample.
This expression clearly shows the influence of the
position of the vertical y-axis on the variance of the intercept. In
particular, Var(a) reaches its lowest value when
$\bar{x}$ = 0, that is, when the y-axis is positioned on the x-average of
the sample. The variance of "a" is then just
$\sigma^2/n$.
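The effect of the y-axis position on the variance can be checked numerically. The sketch below assumes the standard result Var(a) = σ² (1/n + x̄² / Σ(x_i - x̄)²); the function name and the x-values are hypothetical.

```python
def intercept_variance(xs, sigma):
    """Theoretical variance of the LSL intercept for fixed x-values xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma ** 2 * (1.0 / n + x_bar ** 2 / sxx)


# Hypothetical setting: ten equally spaced x-values, noise level sigma = 1.
xs = [float(i) for i in range(1, 11)]
sigma = 1.0
var_raw = intercept_variance(xs, sigma)

# Put the y-axis on the x-average of the sample (centered x-values).
x_bar = sum(xs) / len(xs)
xs_centered = [x - x_bar for x in xs]
var_centered = intercept_variance(xs_centered, sigma)

print(f"Var(a), original axis:          {var_raw:.4f}")
print(f"Var(a), axis on the x-average:  {var_centered:.4f}")
print(f"sigma^2 / n:                    {sigma ** 2 / len(xs):.4f}")
```

In this setting, centering the x-values brings the variance of the intercept down to σ²/n, the same value as the variance of a plain average of n observations.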
3-c) "a" and the slope "b" are usually correlated.
$\mathrm{Cov}(a, b) = -\dfrac{\sigma^2\,\bar{x}}{\sum_i (x_i - \bar{x})^2}$
Notice that when $\bar{x}$ > 0, the covariance is negative: a lower slope most often (but not always) corresponds
to a larger intercept, which is quite intuitive.
Only when $\bar{x}$ = 0 (that is, when the y-axis is positioned on the x-average of
the sample) are the intercept and the slope uncorrelated. Recall that this is
also the situation that makes the variance of the intercept smallest.
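This correlation can be observed by simulation. The following sketch is illustrative only: it assumes the standard result Cov(a, b) = -σ² x̄ / Σ(x_i - x̄)², a hypothetical true line (intercept 2.0, slope 0.5) and gaussian noise, although gaussianity is not required for this property.

```python
import random


def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def simulated_cov_ab(xs, sigma, n_samples=5000, seed=1):
    """Empirical covariance of (a, b) over many simulated samples."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_samples):
        # Hypothetical true line: intercept 2.0, slope 0.5.
        ys = [2.0 + 0.5 * x + rng.gauss(0.0, sigma) for x in xs]
        fits.append(fit_line(xs, ys))
    mean_a = sum(a for a, _ in fits) / n_samples
    mean_b = sum(b for _, b in fits) / n_samples
    return sum((a - mean_a) * (b - mean_b) for a, b in fits) / n_samples


sigma = 1.0
xs = [float(i) for i in range(1, 11)]
x_bar = sum(xs) / len(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)

cov_theory = -(sigma ** 2) * x_bar / sxx
cov_raw = simulated_cov_ab(xs, sigma)
cov_centered = simulated_cov_ab([x - x_bar for x in xs], sigma)

print(f"theoretical Cov(a, b):              {cov_theory:.4f}")
print(f"simulated Cov(a, b), original axis: {cov_raw:.4f}")
print(f"simulated Cov(a, b), centered x:    {cov_centered:.4f}")
```

With the x-average positive, the simulated covariance should come out negative and close to the theoretical value; with centered x-values it should be close to 0.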
4) Distribution of "a" under the assumption of a gaussian noise
The noise is now assumed to be distributed as N(0, $\sigma^2$).
4-1) "a" is normally distributed. So :
a ~ N(mean, variance)
with "mean" and "variance" as in the previous paragraph (recall that these values do not depend on the nature of the noise).
4-2) "a" is an efficient estimator of A
No other unbiased estimator of A has a lower variance than "a".
4-3) "a" and any residual $u_i$ are independent variables.