Interactive animation
t distribution
Recall that if X~N(µ, σ²) is a normal random variable, the sample mean m of n-observation samples is also normally distributed:
* with mean µ,
* and variance σ²/n.
m~N(µ, σ²/n)
The variable
$$m' = \frac{m - \mu}{\sigma/\sqrt{n}}$$
is the departure of the sample mean from the true distribution mean, standardized by using the standard deviation of the sample mean's own distribution as "unit length".
m' is normal with mean 0 and unit variance:
m'~N(0, 1)
Therefore, if the variance σ² of a normal distribution is known, the sample mean is transformed into a standard normal variable in a very simple way.
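The standardization above can be checked numerically. The following is a minimal sketch using only the Python standard library; the values of `mu`, `sigma` and `n`, the seed and the helper name `standardized_mean` are arbitrary choices made for illustration, not part of the original page.

```python
import math
import random
import statistics

def standardized_mean(sample, mu, sigma):
    """Return m' = (m - mu) / (sigma / sqrt(n)) for one sample (sigma known)."""
    n = len(sample)
    m = sum(sample) / n
    return (m - mu) / (sigma / math.sqrt(n))

rng = random.Random(0)        # fixed seed so the run is repeatable
mu, sigma, n = 10.0, 3.0, 8   # illustrative values (assumptions)

values = [
    standardized_mean([rng.gauss(mu, sigma) for _ in range(n)], mu, sigma)
    for _ in range(20000)
]

mean_of_values = statistics.fmean(values)
variance_of_values = statistics.pvariance(values)
print(mean_of_values, variance_of_values)
```

If m' is indeed N(0, 1), the printed mean should be close to 0 and the printed variance close to 1, whatever values are chosen for `mu`, `sigma` and `n`.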
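Now what happens if the mean µ is known, but not the variance σ²? In the expression of m', one replaces σ² by its unbiased estimate, computed from the observations x_1, ..., x_n:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - m)^2$$
to obtain the so-called T statistic. If we denote
$$S = \sqrt{S^2}$$
we get
$$T = \frac{m - \mu}{S/\sqrt{n}}$$
very much like before.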
Note that T can be calculated from the observations only, as its expression does not contain σ anymore.
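As a small worked example, the T statistic can be computed from a sample and the known mean µ alone. This is a sketch using the Python standard library; the function name `t_statistic` and the five-point sample are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math
import statistics

def t_statistic(sample, mu):
    """T = (m - mu) / (S / sqrt(n)), with S based on the n - 1 divisor."""
    n = len(sample)
    if n < 2:
        raise ValueError("the T statistic needs at least two observations")
    m = statistics.mean(sample)
    s = statistics.stdev(sample)   # square root of the unbiased variance estimate
    return (m - mu) / (s / math.sqrt(n))

# Illustrative data: m = 3, S^2 = 10 / 4 = 2.5, S / sqrt(n) = sqrt(0.5)
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
t_value = t_statistic(sample, mu=2.0)
print(t_value)
```

For this sample, T = 1 / sqrt(0.5), that is, the square root of 2. Note that σ appears nowhere in the function.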
Unfortunately, S is now a random variable that prevents the distribution of T from being standard normal. What is the distribution of T?
This distribution is known as "Student's t distribution", or simply "t distribution". Its exact shape depends on n, which is therefore a parameter of the distribution.
The distribution of T (for n-observation samples) is called the "t distribution with n - 1 degrees of freedom", and is denoted t_{n-1}:
T~t_{n-1}
(The reason why the number of degrees of freedom is n - 1 and not n is given here).
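That T is not standard normal can be seen in a quick simulation. The sketch below draws many 5-observation samples from N(0, 1), computes T for each, and compares the observed frequency of |T| > 2 with the corresponding standard normal probability. The sample size, number of trials, threshold and seed are arbitrary assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_t(n, trials, rng):
    """Return the T values of `trials` samples of size n drawn from N(0, 1)."""
    values = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = statistics.fmean(sample)
        s = statistics.stdev(sample)            # n - 1 divisor
        values.append(m / (s / math.sqrt(n)))   # the true mean is 0 here
    return values

rng = random.Random(1)
t_values = simulate_t(n=5, trials=20000, rng=rng)

# Observed frequency of |T| > 2 versus the standard normal probability
tail_t = sum(abs(t) > 2.0 for t in t_values) / len(t_values)
tail_normal = 2.0 * (1.0 - statistics.NormalDist().cdf(2.0))
print(tail_t, tail_normal)
```

The normal probability is about 0.0455; the simulated frequency for T should come out clearly larger, which is what one expects if T follows a distribution with heavier tails than the standard normal.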
The following interactive animation illustrates the t distribution.
1) Upper frame
Displays a normal distribution, and a sample drawn from this distribution. The distribution is standard (mean 0 and unit variance), but this is not fundamental.
2) Middle display
Below the upper frame are displayed:
* To the left, the difference between the sample mean and the distribution mean (here, 0). This is T's numerator.
* To the right is T's denominator
$$\frac{S}{\sqrt{n}}$$
with
$$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - m)^2}$$
3) Lower frame
* The tallest curve (blue) is the standard normal distribution. It is the distribution of m' (that is, when the variance σ² is known).
* The middle curve (red) is the t distribution corresponding to the number of observations in the sample. This curve is symmetric with respect to the vertical axis T = 0.
Note that the number of degrees of freedom (df) is n - 1.
* The shortest curve (black) is the t distribution for n = 2 (df = 1). We show in the Tutorial that it is the Cauchy distribution.
Below the frame is the value of the T statistic for the current sample.
_______________________________
* Change the number of points and observe how the shape of the t distribution changes.
You may click on "Mask sample" to avoid being disturbed
by the sample.
Although it always looks like the standard Gaussian, it is never as tall as the Gaussian. But the area under the curve is always 1, so the missing area in the central zone is to be found in the "tails" of the t distribution, which are always fatter than those of the Gaussian.
This is a consequence of the fact that the sum of the squares of the distances of the observations to the sample mean is never greater than the sum of the squares of the distances of the observations to the true mean. Because we are forced to estimate the distribution variance, we introduce an additional uncertainty into the ratio that defines T.
Large values of T are therefore more probable than the same values of m' (that is, when the variance σ² is known).
This remark is the very basis of t-tests.
* Observe that the t distribution converges towards the standard normal distribution for large values of n. This reflects the fact that the estimated variance converges in probability towards the true variance when n grows without limit.
* Observe that the tails of the t distribution become more pronounced for smaller values of n.
* The T statistic needs at least two observations. When n = 2 (df = 1), the t distribution is identical to another classical distribution, the Cauchy distribution. Its tails are so fat that they prevent it from having a mean (as well as any higher order moment).
* Select a value for n, then click on "Go" and observe the progressive build-up of the histogram of the corresponding t distribution.
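The observations listed above can also be checked without the animation. The sketch below evaluates the standard closed-form density of the t distribution, which is quoted here as a known result and not derived, and compares it with the standard normal and Cauchy densities. The function names and the particular values of df and t are illustrative assumptions.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom (df > 0)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def normal_pdf(t):
    """Density of the standard normal distribution."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def cauchy_pdf(t):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (math.pi * (1.0 + t * t))

# Peak height grows with df but stays below the Gaussian peak
for df in (1, 2, 5, 30, 1000):
    print(df, t_pdf(0.0, df), normal_pdf(0.0))

# Tails: at t = 4 the t density (df = 5) exceeds the Gaussian density
print(t_pdf(4.0, 5), normal_pdf(4.0))

# df = 1 coincides with the Cauchy density
print(t_pdf(0.7, 1), cauchy_pdf(0.7))
```

Working with `math.lgamma` instead of `math.gamma` keeps the normalizing constant computable for large df, where the Gamma function itself would overflow.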
___________________________________________
The key point about the t distribution is that it does not depend on the variance σ² of the original normal distribution. This point is made clear below.
The T statistic is therefore a pivotal quantity, from which it is possible to devise confidence intervals and tests for the mean µ.
We show that under the standard assumptions of Simple Linear Regression, the coefficients (slope and intercept) of the Least Squares Line are both normally distributed. But, contrary to what we assumed when we defined the T statistic, not only are the variances of these normal distributions unknown, but their means are also unknown and have to be estimated. So, for either the slope or the intercept, the distribution of the standardized coefficient now involves the estimation of two parameters instead of just one.
As a consequence, it can be shown that the standardized coefficients are distributed as t_{n-2}, and this is the distribution that has to be taken into account when elaborating confidence intervals and tests for the regression coefficients.
This result is difficult, and is not demonstrated, but it should come as no surprise that estimating two parameters instead of one leads to losing two degrees of freedom instead of one.
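Although the result is not demonstrated, it can be made plausible by simulation. The sketch below repeatedly fits a least squares line to n = 8 simulated points, standardizes the slope with the residual variance estimated using the n - 2 divisor, and compares the empirical variance of the result with df/(df - 2), the variance of a t distribution with df = n - 2 degrees of freedom. The design points, true coefficients, noise level and seed are all illustrative assumptions, and a variance check is only a rough consistency test, not a proof.

```python
import math
import random

def slope_t(xs, ys, true_slope):
    """Standardized least squares slope: (b - beta) / sqrt(s^2 / Sxx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # estimated slope
    a = y_bar - b * x_bar              # estimated intercept
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)                 # two estimated parameters: divisor n - 2
    return (b - true_slope) / math.sqrt(s2 / sxx)

rng = random.Random(2)
xs = [float(i) for i in range(8)]      # fixed design, n = 8
alpha, beta, sigma = 1.0, 0.5, 2.0     # illustrative true values

values = []
for _ in range(20000):
    ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
    values.append(slope_t(xs, ys, beta))

mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
df = len(xs) - 2
print(variance, df / (df - 2))
```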
For just a little bit more on "losing degrees of freedom",
please see here.
This Tutorial bears on the Chi-square distribution, but we show below that the
t distribution is intimately linked to the Chi-square distribution.
We can now elaborate a more general formal definition of the t distribution. Take the expression for T, and divide both the numerator and the denominator by σ, the true standard deviation:
$$T = \frac{m - \mu}{S/\sqrt{n}} = \frac{\sqrt{n}\,(m - \mu)/\sigma}{S/\sigma}$$
1) The new numerator of T is
$$\frac{\sqrt{n}\,(m - \mu)}{\sigma} = \frac{m - \mu}{\sigma/\sqrt{n}}$$
which is N(0, 1).
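2) The new denominator of T can be written:
$$\frac{S}{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\frac{(x_i - m)^2}{\sigma^2}}$$
But the term
$$\sum_{i=1}^{n}\frac{(x_i - m)^2}{\sigma^2}$$
is χ²_{n-1} distributed (see here).
The denominator under the radical is n - 1, which is just the number of degrees of freedom of the Chi-square variable in the numerator.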
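3) The numerator and denominator of T are independent random variables. If we write T under its original form:
$$T = \frac{m - \mu}{S/\sqrt{n}}$$
* The numerator depends on the sample only through the sample mean m, which is normally distributed,
* and the denominator depends on the sample only through the estimated standard deviation S,
and, for samples drawn from a normal distribution, these two quantities are known to be independent.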
4) Note that T is now identified as the ratio of two independent variables, the distributions of which do not depend on the variance σ² of the original normal distribution. Therefore, we do not even need to calculate the distribution of T to assert that this distribution does not depend on σ².
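The independence from σ can be illustrated directly. In the sketch below, the same set of standard normal draws is turned into samples from N(µ, σ²) for three very different values of σ, and T is computed for each. The function name `t_statistic`, the value of µ, the list of σ values and the seed are assumptions made for illustration.

```python
import math
import random
import statistics

def t_statistic(sample, mu):
    """T = (m - mu) / (S / sqrt(n)), with S based on the n - 1 divisor."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)
    return (m - mu) / (s / math.sqrt(n))

rng = random.Random(3)
z = [rng.gauss(0.0, 1.0) for _ in range(6)]   # underlying N(0, 1) draws
mu = 4.0

t_by_sigma = {}
for sigma in (0.1, 1.0, 25.0):
    sample = [mu + sigma * zi for zi in z]    # the same draws, rescaled
    t_by_sigma[sigma] = t_statistic(sample, mu)

print(t_by_sigma)
```

Up to rounding error, the three values of T are identical: σ multiplies both the numerator and the denominator of T and cancels out, so the distribution of T cannot depend on it.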
-----
The formal definition of the t distribution is therefore:
By definition, the T random variable has a t_n distribution with n degrees of freedom if:
$$T = \frac{U}{\sqrt{X^2/n}}$$
where:
* U~N(0, 1),
* X²~χ²_n,
* U and X² are independent.
This definition makes no reference to the original problem that led to the identification of the t distribution. It can therefore be used in a more general context.
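The formal definition translates directly into a way of generating t-distributed values. The sketch below builds X² as a sum of n squared independent standard normal draws, which is one standard way of obtaining a Chi-square variable with n degrees of freedom, and draws U separately so that the two are independent. As a rough check, it compares the empirical variance with n/(n - 2), the variance of the t distribution for n > 2. The function name, the choice n = 10, the number of draws and the seed are illustrative assumptions.

```python
import math
import random

def t_from_definition(df, rng):
    """Return U / sqrt(X2 / df) with U ~ N(0, 1) and X2 ~ Chi-square(df)."""
    u = rng.gauss(0.0, 1.0)
    # Sum of df squared independent N(0, 1) draws: Chi-square with df degrees of freedom
    x2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return u / math.sqrt(x2 / df)

rng = random.Random(4)
df = 10
values = [t_from_definition(df, rng) for _ in range(50000)]

mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
print(mean, variance, df / (df - 2))
```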
__________________________________________________________
Tutorial
In this Tutorial, we establish the analytical form of the t distribution (a bit technical). Note that we also calculate it here by considering a T variable as the ratio of two independent random variables.
We then emphasize that the t distribution paves a way from the normal distribution (very large samples) to the Cauchy distribution (2-observation samples) as the number of degrees of freedom is adjusted.
We conclude by recalling that even though the number of degrees of freedom of the t distribution is by nature an integer, nothing prevents its mathematical form from accommodating non-integer degrees of freedom. This turns out to be useful for calculating approximate confidence intervals for the difference of two sample means when the variances of the normal distributions are not assumed to be equal (Welch's approximation).
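To see how non-integer degrees of freedom arise in practice, the sketch below computes the approximate number of degrees of freedom with the usual Welch-Satterthwaite formula. This formula is not stated on this page; it is assumed here to be the approximation the Tutorial refers to. The function name `welch_df` and the numerical inputs are hypothetical.

```python
def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite approximate degrees of freedom.

    var1, var2: unbiased sample variances of the two samples
    n1, n2:     sample sizes (each at least 2)
    """
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Equal variances and equal sizes: the formula gives n1 + n2 - 2
df_equal = welch_df(4.0, 10, 4.0, 10)

# Unequal variances and sizes: the result is generally not an integer
df_unequal = welch_df(1.0, 5, 4.0, 10)

print(df_equal, df_unequal)
```

With these inputs the first value is 18 and the second is about 12.96, a value that only makes sense because the mathematical form of the t distribution accepts non-integer degrees of freedom.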
THE ANALYTICAL FORM OF THE t DISTRIBUTION
* General outline of the demonstration
* Definition of the t distribution
* The joint pdf of U and X²
* The distribution function of the t distribution
* The pdf of T
* The structure of F(t)
* Differentiating F(t)
* Special cases
* n = 1: Cauchy distribution
* n infinite: normal distribution
* Non-integer number of "degrees of freedom": Welch's approximation
______________________________________________________
Related readings:
* Interval estimation
* t-test
* T as the ratio of two random variables
* The square of a Student's T_n is a Fisher's F_{1, n}
* Normal distribution
* Cauchy distribution