
Slope

In mathematics, the "slope" is the tangent of the angle between a straight line and the x axis.

In Data Modeling, the term "slope" is found within the context of Simple Linear Regression (SLR). It then refers to the slope of the Least Squares Line (LSL).

Two parameters are needed to define a straight line unambiguously : one is the slope, and the other is usually the intercept. The slope has a simple interpretation : suppose you move a distance dx to the right along the x axis. The corresponding point on the straight line then goes up (or down) by a quantity dy, with :

dy = dx.Slope

So the slope tells how fast y changes when x is changed (although its numerical value depends on the units for x and y).
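In code, this interpretation is a single multiplication (a minimal sketch; the slope and dx values are arbitrary illustrative choices) :

```python
# Moving dx units to the right along a line of given slope
# changes y by dy = slope * dx.
slope = 0.5   # arbitrary illustrative value
dx = 2.0      # step to the right along the x axis
dy = slope * dx
print(dy)     # 1.0
```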

The LSL depends on the particular sample at hand, and so does its slope, which is therefore to be considered a random variable. Under the standard assumptions of SLR, the distribution of the slope is well understood and can be calculated exactly.

The following animation illustrates the distribution of the slope under various "experimental" conditions.


The animation initially displays :

* a regression line (in red),
* a sample,
* the corresponding LSL (in blue), together with the current slope (also in blue).

To choose another regression line, click on "New".

The points in the illustration are equally spaced along the x axis. This may look like a severe limitation, but it is not :

* First, it is not an unusual situation in real life.
* More importantly, SLR does not consider x as a random variable (only y is random). The distribution of the slope depends only on the number of points, the x standard deviation of the sample, and the noise level; these three quantities remain constant when jumping from one sample to the next. The detailed positions of the points are not needed, so keeping the points equally spaced, although a limitation, is not a severe one.

The bottom frame shows a gaussian curve, the theoretical distribution of the slope.

* The mean of the gaussian is positioned at the value of the slope of the true (red) regression line, which is unknown in real life. This is a consequence of the fact that the slope of the LSL is an unbiased estimator of the slope of the regression line.
* The standard deviation of the gaussian is the theoretical standard deviation of the distribution of the slope of the LSL.

Click on "Go" and observe the distribution of the slope progressively build up.

________________________________________

The variance (or standard deviation) of the slope's distribution is a fundamental quantity in SLR. It is a measure of the uncertainty about the slope of the regression line, that is, about the strength of the link between the independent variable x and the response variable y. It is the basis of the test that will decide whether the existence of a functional link between x and y is a credible assumption.

If you are already somewhat knowledgeable about SLR, you may be surprised that horizontal regression lines are not banned from the above illustration, as they depict situations where y does not depend on x. The reason is that we are not addressing here the issue of the link between x and y, but only that of the slope of the LSL, which is unambiguously defined whether or not there is a link between x and y.

________________________________________

Contrary to the intercept, the slope's distribution does not depend on the positions of the x and y axes (this is the reason why neither of these axes is adjustable in the illustration). In other words :

* adding the same quantity to all the x values,
* and/or adding the same quantity to all the y values,

leaves the slope unchanged.

You may simulate a translation of the y axis by translating the sample range (use the "Left" and "Right" controls, keeping the difference "Right - Left" constant), leaving all other parameters constant. You may do that while retaining the same regression line by first clicking on the small "Reset" button at the bottom right corner of the illustration. Notice that the slope's distribution is unchanged when you translate the sample's range.

________________________________________

Change the number of points (all other parameters being held fixed), and observe that the standard deviation of the slope's distribution always decreases when the number of points increases. Increasing the number of points reduces the uncertainty about the true position of the regression line.

________________________________________

Change the range of the sample (all other parameters being held fixed). Observe that the standard deviation of the slope's distribution always decreases when the range increases. This situation is similar to that of a direction in space being defined by a pipe : the direction (regression line) is more accurately defined for longer pipes.

________________________________________

Notice that the variance of the slope's distribution does not depend at all on the regression line (for a given set of parameter values) : click repeatedly on "New", and observe that although the position of the gaussian curve moves to reflect the value of the slope of the regression line, its variance remains constant.
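The behaviour described above can be checked numerically. The sketch below (assuming NumPy; the true slope, intercept, noise level and sample sizes are arbitrary illustrative values) simulates many samples with equally spaced x values, computes the LSL slope of each, and compares the empirical standard deviation with the theoretical value σ/√(n.Var(x)) :

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_distribution(n, x_max, sigma, B=1.5, A=0.3, n_samples=20000):
    """Simulate the LSL slope over many samples of n equally spaced points."""
    x = np.linspace(0.0, x_max, n)      # equally spaced, as in the animation
    xc = x - x.mean()
    y = A + B * x + rng.normal(0.0, sigma, (n_samples, n))
    # LSL slope of each sample : b = Cov(x, y) / Var(x)
    return (y * xc).sum(axis=1) / (xc ** 2).sum()

n, sigma = 10, 0.5
b = slope_distribution(n, x_max=1.0, sigma=sigma)
theory = sigma / np.sqrt(n * np.var(np.linspace(0.0, 1.0, n)))
print(b.mean())           # close to the true slope B = 1.5 (unbiasedness)
print(b.std(), theory)    # empirical vs theoretical standard deviation

# More points, or a wider range, narrow the slope's distribution:
print(slope_distribution(40, 1.0, sigma).std())  # smaller than b.std()
print(slope_distribution(10, 4.0, sigma).std())  # smaller than b.std()
```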

Basic results about the Slope in Simple Linear Regression

1) The equation of the LSL is written :

y = a + bx

The slope is therefore "b" (and "a" is the intercept).

2) Value of "b"

 b = Cov(x, y)/Var(x)

The slope can also be expressed as :

 b = ρ.sy/sx

where :

* "ρ" is the correlation coefficient of x and y.

* sx and sy are the standard deviations of x and of y.
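The two expressions for b can be checked to agree on a small made-up dataset (a minimal sketch assuming NumPy; note that np.var and np.std use the population convention, matching Var(x), sx and sy above) :

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # made-up illustrative data

# b = Cov(x, y) / Var(x)
b_cov = np.cov(x, y, bias=True)[0, 1] / np.var(x)

# b = rho . sy / sx
rho = np.corrcoef(x, y)[0, 1]
b_rho = rho * np.std(y) / np.std(x)

print(b_cov, b_rho)              # identical values
print(np.polyfit(x, y, 1)[0])    # same slope from a direct least squares fit
```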

3) Properties of the slope "b" as an estimator

In this section, no assumption is made about the noise distribution, other than :

* the lack of correlation of the noise between any two observations.

* the variance σ² of the noise being the same for all observations (homoscedasticity).

In particular, it is not assumed that the noise is gaussian.

3-a) "b" is an unbiased estimator of the slope B of the true regression line.

 E[b] = B

where E denotes the expectation.

3-b) The variance of "b" is :

 Var(b) = σ²/(n.Var(x))

where n is the number of observations in the sample.

3-c) The slope "b" and the intercept "a" are generally correlated.

 Cov(a, b) = - σ².x̄/(n.Var(x))

Note that when x̄ > 0, the covariance is negative : a lower slope usually (but not always) corresponds to a larger intercept, which is quite intuitive.

Only when x̄ = 0 (that is, when the y axis is positioned at the x-average of the sample) are the slope and the intercept uncorrelated. Recall that this is also the situation that makes the variance of the intercept smallest.
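This covariance is easy to check by simulation (a sketch assuming NumPy; the true line, noise level and sample layout are arbitrary illustrative choices) :

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_many(x, A=1.0, B=2.0, sigma=1.0, n_samples=40000):
    """Fit the LSL on many simulated samples; return intercepts and slopes."""
    y = A + B * x + rng.normal(0.0, sigma, (n_samples, len(x)))
    xc = x - x.mean()
    b = (y * xc).sum(axis=1) / (xc ** 2).sum()
    a = y.mean(axis=1) - b * x.mean()
    return a, b

x = np.linspace(1.0, 5.0, 10)                # mean of x is positive
a, b = fit_many(x)
print(np.cov(a, b)[0, 1])                    # negative, as predicted
print(-1.0 * x.mean() / (10 * np.var(x)))    # theoretical value (sigma = 1)

a0, b0 = fit_many(x - x.mean())              # y axis moved to the x-average
print(np.cov(a0, b0)[0, 1])                  # close to 0
```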

4) Distribution of "b" under the assumption of a gaussian noise

The noise is assumed to be normally distributed as N(0, σ²).

4-1) "b" is normally distributed. So :

 b ~ N(B, σ²/(n.Var(x)))

with the mean and variance given in the previous paragraph (recall that these values do not depend on the nature of the noise).

4-2) "b" is an efficient estimator of B

No other unbiased estimator of B has a lower variance than "b".

4-3) "b" and any residual ui are independent variables.

4-4) "b" and ȳ are independent gaussian variables (there is no equivalent statement for the intercept "a").
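The lack of correlation between b and ȳ can be checked by simulation (a sketch assuming NumPy, with arbitrary illustrative parameters; independence itself is a stronger property than what this test shows) :

```python
import numpy as np

rng = np.random.default_rng(2)

n, n_samples = 10, 40000
x = np.linspace(0.0, 1.0, n)
xc = x - x.mean()
y = 0.3 + 1.5 * x + rng.normal(0.0, 0.5, (n_samples, n))

b = (y * xc).sum(axis=1) / (xc ** 2).sum()   # LSL slope of each sample
ybar = y.mean(axis=1)                        # sample mean of y

print(np.corrcoef(b, ybar)[0, 1])            # close to 0
```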

____________________________________________