The general meaning of the word "intercept" is: "the y-coordinate of the point where a given line intersects the y-axis".

Two parameters are needed to define a straight line unambiguously: the intercept is one of them, and the other is usually the slope.
The word most often refers to the intersection of a Least Squares Line (LSL) with the y-axis in the context of Simple Linear Regression (SLR).
Because the LSL embodies the predictions of the model, the intercept is the SLR prediction for the value 0 of the independent variable x. SLR is often used to describe how the response variable y evolves when a "control" variable x varies across a range. The intercept of the LSL then answers the question: "What would the value of y be if x were tuned to 0?". For example: "What would the residual volume of this gas be if the temperature could be tuned down to absolute zero?".
The LSL depends on the particular sample at hand. So does the intercept, which must therefore be considered a random variable. Under the standard assumptions of SLR, the distribution of the intercept is well understood and can be calculated exactly.
The following figure illustrates the distribution of the intercept under various "experimental" conditions.
The illustration first displays:
* a regression line (in red),
* a sample,
* the corresponding LSL (in blue), together with the current intercept (also in blue).
To choose another regression line, click on "New".
The points in the illustration are equally spaced along the x-axis. This may look like a severe limitation, but it is not:
* First, it is not an unusual situation in real life.
* More importantly, SLR does not consider x as a random variable (only y is random). The distribution of the intercept depends only on the number of points, the x-average of the sample, the standard deviation of the x-values of the sample, and the noise level, and these four quantities remain constant from one sample to the next. The detailed positions of the points are not needed, so keeping the points equally spaced, although a restriction, is not a severe one.
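A quick numerical check of this claim, as a sketch under the standard SLR assumptions (the helper names `intercept_weights`, `sd_intercept` and `rescale` are hypothetical, chosen for illustration). The LSL intercept is a weighted sum $a = \sum_i w_i y_i$ with $w_i = 1/n - \bar{x}(x_i - \bar{x})/\sum_j (x_j - \bar{x})^2$, so with uncorrelated noise of variance $\sigma^2$ its standard deviation is $\sigma\sqrt{\sum_i w_i^2}$. Computing this directly from the weights, for an equally spaced sample and for an irregular sample forced to the same n, x-average and x standard deviation, gives the same value.

```python
import math


def intercept_weights(xs):
    """Weights w_i such that the LSL intercept equals sum(w_i * y_i)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n - x_bar * (x - x_bar) / sxx for x in xs]


def sd_intercept(xs, sigma):
    """sd of the intercept when the y_i carry uncorrelated noise of sd sigma."""
    return sigma * math.sqrt(sum(w * w for w in intercept_weights(xs)))


def rescale(raw, mean, sd):
    """Shift and scale `raw` so its mean and population sd equal `mean` and `sd`."""
    n = len(raw)
    m = sum(raw) / n
    s = math.sqrt(sum((r - m) ** 2 for r in raw) / n)
    return [mean + sd * (r - m) / s for r in raw]


equal = [1.0, 2.0, 3.0, 4.0, 5.0]  # equally spaced: mean 3, population sd sqrt(2)
# Arbitrary, irregular positions forced to the same n, mean and sd.
irregular = rescale([0.0, 0.1, 0.5, 2.0, 7.0], 3.0, math.sqrt(2.0))

print(sd_intercept(equal, 1.0))      # sqrt(1/5 + 9/10) = sqrt(1.1)
print(sd_intercept(irregular, 1.0))  # same value, up to rounding
```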
The bottom frame shows a gaussian curve, the theoretical distribution of the intercept.
* The mean of the gaussian is positioned at the value of the intercept of the true red regression line (which is unknown in real life). This is a consequence of the fact that the intercept of the LSL is an unbiased estimator of the intercept of the regression line.
* The standard deviation of the gaussian curve is the theoretical standard deviation of the distribution of the intercept of the LSL.
Click on "Go" and watch the distribution of the intercept build up progressively.
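The "Go" button can be imitated with a short Monte Carlo simulation. This is a minimal sketch using only Python's standard library; the true line (A = 2, B = 0.5), the noise level and the ten x-values 1, ..., 10 are hypothetical choices, not the animation's actual settings, and the noise is assumed gaussian. It repeatedly draws a sample around the true line, fits the LSL, and compares the empirical mean and standard deviation of the fitted intercepts with A and with the theoretical value $\sigma\sqrt{1/n + \bar{x}^2/\sum_i (x_i - \bar{x})^2}$.

```python
import math
import random


def fit_lsl(xs, ys):
    """Return (a, b), the intercept and slope of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def theoretical_sd_intercept(xs, sigma):
    """Standard deviation of the LSL intercept under the standard SLR assumptions."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma * math.sqrt(1 / n + x_bar ** 2 / sxx)


def simulate_intercepts(A, B, xs, sigma, trials, seed=0):
    """Draw `trials` samples around y = A + Bx and return the fitted intercepts."""
    rng = random.Random(seed)
    intercepts = []
    for _ in range(trials):
        ys = [A + B * x + rng.gauss(0.0, sigma) for x in xs]  # gaussian noise
        a, _ = fit_lsl(xs, ys)
        intercepts.append(a)
    return intercepts


xs = [float(i) for i in range(1, 11)]  # hypothetical: 10 equally spaced points
A, B, sigma = 2.0, 0.5, 1.0            # hypothetical "red" line and noise level
intercepts = simulate_intercepts(A, B, xs, sigma, trials=20000)

mean_a = sum(intercepts) / len(intercepts)
sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in intercepts) / (len(intercepts) - 1))
print(f"empirical mean {mean_a:.3f} (true intercept A = {A})")
print(f"empirical sd {sd_a:.3f} (theory {theoretical_sd_intercept(xs, sigma):.3f})")
```

The empirical mean sits close to A (unbiasedness) and the empirical standard deviation close to the theoretical one, just as the histogram in the animation settles onto the gaussian curve.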
_____________________________
The intercept is not an intrinsic parameter of the LSL. It depends on where the vertical y-axis is positioned, which is largely arbitrary: changing the position of the y-axis amounts to adding the same quantity to (or subtracting it from) all the x-values of the sample points. As a consequence, the distribution of the intercept depends on the position of the y-axis.
Move the y-axis around (green slider) and observe how the predicted distribution of the intercept changes. Its standard deviation goes through a minimum for a particular position of the y-axis. Can you guess the position of this minimum?
If you keep the LSL visible, you'll notice that it seems to "pivot" randomly around some sort of fulcrum within the cloud of points. Consequently, you would expect the standard deviation of the intercept to be large if all the points are far away on one side of the y-axis.
Use the "Left" and "Right" controls to position the points at one end of the scene, while positioning the y-axis at the other end. Notice the increase in the standard deviation of the gaussian curve (and therefore of the distribution of the intercept).
You may do this at any time (as well as change the number of points or the noise level) while retaining the current regression line: just click on the small "Reset" button in the lower right corner of the illustration.
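The effect of the green slider can be reproduced from the standard expression $\mathrm{Var}(a) = \left(1/n + \bar{x}^2/\sum_i (x_i - \bar{x})^2\right)\sigma^2$: placing the y-axis at x = c is the same as subtracting c from every x-value. The sketch below assumes a hypothetical sample of ten equally spaced points with x-average 5.5 and unit noise, scans axis positions from -20 to 20 in steps of 0.1, and reports where the standard deviation of the intercept is smallest and how large it becomes when the axis is far from the points.

```python
import math


def sd_intercept(xs, sigma):
    """Theoretical sd of the LSL intercept: sigma * sqrt(1/n + x_bar^2 / Sxx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma * math.sqrt(1 / n + x_bar ** 2 / sxx)


xs = [float(i) for i in range(1, 11)]  # hypothetical sample, x-average 5.5
sigma = 1.0

# Putting the y-axis at x = c amounts to subtracting c from every x-value.
positions = [k / 10 for k in range(-200, 201)]  # axis positions from -20.0 to 20.0
sds = {c: sd_intercept([x - c for x in xs], sigma) for c in positions}

best = min(sds, key=sds.get)
print(f"smallest sd {sds[best]:.4f} with the axis at x = {best}")
print(f"sd with the axis at x = 20.0: {sds[20.0]:.4f}")
```

The minimum falls at the x-average of the sample, where the standard deviation reduces to $\sigma/\sqrt{n}$; moving the axis away from the cloud of points makes it grow steadily.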
____________________________
Change the number of points (all other parameters being held fixed), and observe that the standard deviation of the intercept's distribution always decreases as the number of points increases. As this number goes up, the LSL is more and more constrained to stick to the true regression line.
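To isolate the effect of the number of points, the sketch below keeps the points equally spaced over a fixed, hypothetical interval [1, 10] with unit noise, and evaluates the theoretical standard deviation of the intercept, $\sigma\sqrt{1/n + \bar{x}^2/\sum_i (x_i - \bar{x})^2}$, for growing n. The helper `equally_spaced` is illustrative.

```python
import math


def sd_intercept(xs, sigma):
    """Theoretical sd of the LSL intercept: sigma * sqrt(1/n + x_bar^2 / Sxx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma * math.sqrt(1 / n + x_bar ** 2 / sxx)


def equally_spaced(lo, hi, n):
    """n equally spaced x-values from lo to hi (requires n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


counts = [2, 3, 5, 10, 20, 50, 100]
sds = [sd_intercept(equally_spaced(1.0, 10.0, n), 1.0) for n in counts]
for n, s in zip(counts, sds):
    print(f"n = {n:3d}: sd of intercept = {s:.4f}")
```

With a fixed range, the decrease slows down as n grows: for large n the standard deviation shrinks roughly in proportion to $1/\sqrt{n}$.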
____________________________
Notice also that, everything else being equal, the standard deviation of the intercept's distribution decreases as you increase the range of the sample. The situation is similar to that of a direction in space defined by a pipe: the direction (regression line) is more accurately defined by a longer pipe.
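A sketch of the range effect, under the same assumptions as before (hypothetical values: ten equally spaced points centered on x = 5.5, unit noise; `equally_spaced` is an illustrative helper). Only the width of the sample changes, so only the $\bar{x}^2/\sum_i (x_i - \bar{x})^2$ term of the variance shrinks.

```python
import math


def sd_intercept(xs, sigma):
    """Theoretical sd of the LSL intercept: sigma * sqrt(1/n + x_bar^2 / Sxx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma * math.sqrt(1 / n + x_bar ** 2 / sxx)


def equally_spaced(lo, hi, n):
    """n equally spaced x-values from lo to hi (requires n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


center, n = 5.5, 10  # hypothetical: sample center and number of points held fixed
widths = [1.0, 2.0, 5.0, 9.0, 20.0, 100.0]
sds = [sd_intercept(equally_spaced(center - w / 2, center + w / 2, n), 1.0) for w in widths]
for w, s in zip(widths, sds):
    print(f"range {w:6.1f}: sd of intercept = {s:.4f}")
print(f"lower limit sigma / sqrt(n) = {1 / math.sqrt(n):.4f}")
```

As the range grows, the standard deviation decreases toward $\sigma/\sqrt{n}$, the value it would have with the y-axis at the x-average.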
____________________________
Notice that the standard deviation of the intercept's distribution does not depend at all on the regression line (for a given set of values of the parameters): click repeatedly on "New" and observe that, although the position of the gaussian curve changes to reflect the value of the intercept of the regression line, its standard deviation remains constant.
______________________________________________________________________
Basic results about the intercept in Simple Linear Regression
1) The equation of the LSL is written:
y = a + bx
The intercept is therefore "a" (and "b" is the slope).
2) Value of "a"
$$a = \bar{y} - b\,\bar{x}$$
where:
* "b" is the value of the slope: $b = \mathrm{Cov}(x, y)/\mathrm{Var}(x)$
* $\bar{x}$ and $\bar{y}$ are the x-average and y-average of the sample.
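A direct transcription of these two formulas, as a minimal sketch (the function name `lsl` is illustrative). Cov and Var are computed with the same 1/n factor, which cancels in the ratio, so the choice between 1/n and 1/(n - 1) does not affect the slope.

```python
def lsl(xs, ys):
    """Return (a, b) for the least squares line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Cov(x, y) and Var(x) share the same 1/n factor, which cancels in b.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    b = cov_xy / var_x
    a = y_bar - b * x_bar  # the LSL passes through (x_bar, y_bar)
    return a, b


a, b = lsl([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])  # points exactly on y = 1 + 2x
print(a, b)  # prints 1.0 2.0
```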
3) Properties of the intercept "a" as an estimator
In this paragraph, no assumption is made about the noise distribution, other than:
* the lack of correlation of the noise between any two observations.
* the variance $\sigma^2$ of the noise being the same for all observations (homoscedasticity).
In particular, it is not assumed that the noise is gaussian.
3-a) "a" is an unbiased estimator of the intercept A of the true regression line:
$$E[a] = A$$
where E denotes the expectation.
3-b) The variance of "a" is:
$$\mathrm{Var}(a) = \left(\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right)\sigma^2$$
where n is the number of observations in the sample.
This expression clearly shows the influence of the position of the vertical y-axis on the variance of the intercept. In particular, Var(a) reaches its lowest value when $\bar{x} = 0$, that is, when the y-axis is positioned at the x-average of the sample. The variance of "a" is then just $\sigma^2/n$.
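The expression for Var(a) follows in a few lines from $a = \bar{y} - b\,\bar{x}$. Writing $S_{xx} = \sum_i (x_i - \bar{x})^2$, the standard results $\mathrm{Var}(\bar{y}) = \sigma^2/n$, $\mathrm{Var}(b) = \sigma^2/S_{xx}$ and $\mathrm{Cov}(\bar{y}, b) = 0$ hold under the two assumptions above (the last one because $b = \sum_i (x_i - \bar{x})\,y_i / S_{xx}$ and $\sum_i (x_i - \bar{x}) = 0$), so that:

```latex
\begin{aligned}
\mathrm{Var}(a) &= \mathrm{Var}(\bar{y}) + \bar{x}^2\,\mathrm{Var}(b) - 2\,\bar{x}\,\mathrm{Cov}(\bar{y}, b) \\
                &= \frac{\sigma^2}{n} + \bar{x}^2\,\frac{\sigma^2}{S_{xx}} - 0 \\
                &= \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\sigma^2 .
\end{aligned}
```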
3-c) "a" and the slope "b" are usually correlated:
$$\mathrm{Cov}(a, b) = -\frac{\bar{x}\,\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
Notice that when $\bar{x} > 0$, the covariance is negative: a lower slope most often (but not always) corresponds to a larger intercept, which is quite intuitive.
Only when $\bar{x} = 0$ (that is, when the y-axis is positioned at the x-average of the sample) are the intercept and the slope uncorrelated. Recall that this is also the situation that makes the variance of the intercept smallest.
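Since a and b are both linear in the $y_i$, their covariance under uncorrelated noise of variance $\sigma^2$ is $\sigma^2 \sum_i w_i v_i$, where $w_i = 1/n - \bar{x}(x_i - \bar{x})/S_{xx}$ are the intercept weights and $v_i = (x_i - \bar{x})/S_{xx}$ the slope weights. The sketch below (hypothetical x-values with x-average 5, $\sigma = 1$) evaluates this sum directly, compares it with $-\bar{x}\sigma^2/S_{xx}$, and repeats the computation after moving the y-axis to the x-average.

```python
import math


def cov_intercept_slope(xs, sigma):
    """Cov(a, b) = sigma^2 * sum(w_i * v_i), from the linear weights of a and b."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w = [1 / n - x_bar * (x - x_bar) / sxx for x in xs]  # intercept weights
    v = [(x - x_bar) / sxx for x in xs]                  # slope weights
    return sigma ** 2 * sum(wi * vi for wi, vi in zip(w, v))


def cov_formula(xs, sigma):
    """Closed form -x_bar * sigma^2 / Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return -x_bar * sigma ** 2 / sxx


xs = [1.0, 2.0, 4.0, 7.0, 11.0]                 # hypothetical x-values, x-average 5
centered = [x - sum(xs) / len(xs) for x in xs]  # y-axis moved to the x-average
print(cov_intercept_slope(xs, 1.0), cov_formula(xs, 1.0))  # both equal -5/66
print(cov_intercept_slope(centered, 1.0))                  # zero, up to rounding
```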
4) Distribution of "a" under the assumption of gaussian noise
The noise is now assumed to be distributed as $N(0, \sigma^2)$.
4-1) "a" is normally distributed:
$$a \sim N\left(A,\ \mathrm{Var}(a)\right)$$
with the mean A and the variance Var(a) given in paragraph 3 (recall that these values do not depend on the nature of the noise).
4-2) "a" is an efficient estimator of A
No other unbiased estimator of A has a lower variance than "a".
4-3) "a" and any residual $u_i = y_i - (a + b\,x_i)$ are independent variables.
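Under the assumptions of paragraph 3, "a" and each residual $u_i$ are uncorrelated whatever the noise distribution; with gaussian noise they are jointly gaussian, so zero covariance implies independence. Both are linear in the $y_j$, so $\mathrm{Cov}(a, u_i) = \sigma^2 \sum_j w_j c_{ij}$, where $w_j$ are the intercept weights and $c_{ij} = \delta_{ij} - w_j - x_i v_j$ is the coefficient of $y_j$ in $u_i$ (with $v_j$ the slope weights). The sketch below, with hypothetical irregular x-values and $\sigma = 2$, computes these covariances directly.

```python
def residual_covariances(xs, sigma):
    """Cov(a, u_i) for every residual u_i, from the linear coefficients in the y_j."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w = [1 / n - x_bar * (x - x_bar) / sxx for x in xs]  # a = sum(w_j * y_j)
    v = [(x - x_bar) / sxx for x in xs]                  # b = sum(v_j * y_j)
    covs = []
    for i in range(n):
        # u_i = y_i - a - b * x_i, so the coefficient of y_j is delta_ij - w_j - x_i * v_j.
        c = [(1.0 if j == i else 0.0) - w[j] - xs[i] * v[j] for j in range(n)]
        covs.append(sigma ** 2 * sum(wj * cj for wj, cj in zip(w, c)))
    return covs


covs = residual_covariances([0.5, 1.0, 3.0, 3.5, 8.0, 12.0], sigma=2.0)
print(covs)  # all zero, up to rounding
```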