
t distribution

Recall that if X ~ N(µ, σ²) is a normal random variable, the sample mean m of n-observation samples is also normally distributed:

    * with mean µ,

    * and variance σ²/n.

m ~ N(µ, σ²/n)

The variable

$$m' = \frac{m - \mu}{\sigma / \sqrt{n}}$$

is the departure of the sample mean from the true distribution mean, standardized by using the standard deviation of its own distribution as the "unit length".

m' is normal with mean 0 and unit variance:

m'~N(0, 1)

 

Therefore, if the variance σ² of a normal distribution is known, the sample mean is transformed into a standard normal variable in a very simple way.
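The standardization above can be checked by simulation. The following is only a minimal sketch: the values of µ, σ, n and the number of replications are arbitrary assumptions, and `standardized_mean` is a hypothetical helper name, not something defined on this page.

```python
import math
import random
import statistics

def standardized_mean(sample, mu, sigma):
    """Return m' = (m - mu) / (sigma / sqrt(n)) for one sample."""
    n = len(sample)
    m = sum(sample) / n
    return (m - mu) / (sigma / math.sqrt(n))

# Illustrative values (assumptions, not taken from the text)
MU, SIGMA, N, REPS = 10.0, 3.0, 5, 20000

rng = random.Random(0)  # fixed seed so that the run is repeatable
values = [
    standardized_mean([rng.gauss(MU, SIGMA) for _ in range(N)], MU, SIGMA)
    for _ in range(REPS)
]

# For a standard normal variable these should be close to 0 and 1
print("mean of m':    ", round(statistics.fmean(values), 3))
print("variance of m':", round(statistics.pvariance(values), 3))
```

The two printed values should come out close to 0 and 1 respectively, whatever µ and σ are chosen.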

The T statistic

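Now what happens if the mean µ is known, but not the variance σ²? In the expression of m', one replaces σ² by its unbiased estimate (the x_i denote the n observations):

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - m)^2$$

to obtain the so-called T statistic. If we denote

$$S = \sqrt{S^2}$$

we get

$$T = \frac{m - \mu}{S / \sqrt{n}}$$

very much like before.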


Note that T can be calculated from the observations only, as its expression no longer contains σ.
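As an illustration, T can be computed from a sample and the known mean µ alone. This is a minimal sketch; the function name `t_statistic` and the data are made up for the example.

```python
import math

def t_statistic(sample, mu):
    """T = (m - mu) / (S / sqrt(n)), with S^2 the unbiased variance estimate."""
    n = len(sample)
    if n < 2:
        raise ValueError("the T statistic needs at least two observations")
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / (n - 1)  # unbiased estimate of the variance
    return (m - mu) / math.sqrt(s2 / n)

observations = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data
print(t_statistic(observations, mu=2.0))
```

Here m = 3 and S² = 2.5, so T = 1 / sqrt(0.5), about 1.414. Note that σ appears nowhere in the calculation.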

Unfortunately, S is now a random variable that prevents the distribution of T from being standard normal. What is the distribution of T ?

Student's t distribution

This distribution is known as "Student's t distribution", or simply "t distribution". Its exact shape depends on n, which is therefore a parameter of the distribution.

 

The distribution of T (for n-observation samples) is called the "t distribution with n - 1 degrees of freedom", and is denoted t_{n-1}:

T ~ t_{n-1}

 

(The reason why the number of degrees of freedom is n - 1 and not n is given here).
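One way to see that T is not standard normal is to simulate it. The sketch below assumes n = 5, a standard normal parent distribution and a cutoff of 2, all chosen arbitrarily; it compares how often |T| exceeds the cutoff with the corresponding probability for a standard normal variable.

```python
import math
import random

def t_statistic(sample, mu):
    """T = (m - mu) / (S / sqrt(n)), with S^2 the unbiased variance estimate."""
    n = len(sample)
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / (n - 1)
    return (m - mu) / math.sqrt(s2 / n)

# Illustrative settings (assumptions, not taken from the text)
N, REPS, CUTOFF = 5, 20000, 2.0

rng = random.Random(1)  # fixed seed so that the run is repeatable
t_values = [
    t_statistic([rng.gauss(0.0, 1.0) for _ in range(N)], 0.0)
    for _ in range(REPS)
]

# Observed frequency of |T| > CUTOFF over all simulated samples
tail_freq = sum(abs(t) > CUTOFF for t in t_values) / REPS

# P(|Z| > CUTOFF) for a standard normal Z, via the complementary error function
normal_tail = math.erfc(CUTOFF / math.sqrt(2.0))

print("simulated P(|T| > 2):", round(tail_freq, 4))
print("normal    P(|Z| > 2):", round(normal_tail, 4))
```

With only 5 observations per sample, the simulated frequency should be clearly larger than the normal probability.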

Animation

The following interactive animation illustrates the t distribution.

 

 


 

 

 

1) Upper frame

    Displays a normal distribution, and a sample drawn from this distribution. The distribution is standard (mean 0 and unit variance), but this is not fundamental.

 

2) Middle display

    Below the upper frame are displayed:

        * To the left, the difference between the sample mean and the distribution mean (here, 0). This is T's numerator.

        * To the right, T's denominator

$$\frac{S}{\sqrt{n}}$$

with

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - m)^2$$

3) Lower frame

    * The tallest curve (blue) is the standard normal distribution. It is the distribution of m' (that is, when the variance σ² is known).

    * The middle curve (red) is the t distribution corresponding to the number of observations in the sample. This curve is symmetric with respect to the vertical axis T = 0.

Note that the number of degrees of freedom (df) is n - 1.

    * The shortest curve (black) is the t distribution for n = 2 (df = 1). We show in the Tutorial that it is the Cauchy distribution.

 

Below the frame is the value of the T statistic for the current sample.

_______________________________

 

    * Change the number of points and observe how the shape of the t distribution changes.


You may click on "Mask sample" to avoid being disturbed by the sample.

Although the t distribution always looks like the standard Gaussian, it is never as tall as the Gaussian. But the area under the curve is always 1, so the area missing from the central zone is to be found in the "tails" of the t distribution, which are always fatter than those of the Gaussian.

This is a consequence of the fact that the sum of the squares of the distances of the observations to the sample mean is never greater than the sum of the squares of the distances of the observations to the true mean. Because we are forced to estimate the distribution variance, we introduce an additional uncertainty about the value of the ratio

$$T = \frac{m - \mu}{S / \sqrt{n}}$$

Large values of T are therefore more probable than the same values of m' (that is, when the variance σ² is known).

This remark is the very basis of t-tests.

 

    * Observe that the t distribution converges towards the standard normal distribution for large values of n. This reflects the fact that the estimated variance converges in probability towards the true variance when n grows without limit.

 

    * Observe that the tails of the t distribution become more pronounced for smaller values of n.

 

    * The T statistic needs at least two observations. When n = 2 (df = 1), the t distribution is identical to another classical distribution, the Cauchy distribution. Its tails are so fat that they prevent it from having a mean (or any higher-order moment).

 

    * Select a value for n, then click on "Go" and observe the progressive build-up of the histogram of the corresponding t distribution.
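The observations listed above can also be checked numerically without the animation. The sketch below uses the standard closed form of the t density, which is not derived in this section (it is the subject of the Tutorial); the function names are hypothetical and the values of n are arbitrary.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom (standard closed form)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def normal_pdf(t):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)

def cauchy_pdf(t):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (math.pi * (1.0 + t * t))

# Height at the centre (t = 0) and in the tail (t = 3) for several sample sizes
for n in (2, 5, 30, 1000):
    df = n - 1
    print("n =", n, "df =", df,
          "peak:", round(t_pdf(0.0, df), 4),
          "tail:", round(t_pdf(3.0, df), 5))
print("normal",
      "peak:", round(normal_pdf(0.0), 4),
      "tail:", round(normal_pdf(3.0), 5))

# For df = 1 the t density and the Cauchy density should agree
print(t_pdf(1.7, 1), cauchy_pdf(1.7))
```

The peak of each t curve stays below that of the normal curve, the tail value stays above it, and both gaps shrink as n grows.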

___________________________________________

Applications of the t distribution

General

The key point about the t distribution is that it does not depend on the variance σ² of the original normal distribution. This point is made clear below.

The T statistic is therefore a pivotal quantity, from which it is possible to devise confidence intervals and tests (t-tests).

Linear regression

We show that under the standard assumptions of Simple Linear Regression, the coefficients (slope and intercept) of the Least Squares Line are both normally distributed. But, contrary to what we assumed when we defined the T statistic, not only are the variances of these normal distributions unknown, but their means are also unknown and have to be estimated. So, for either the slope or the intercept, the distribution of the standardized coefficient now involves the estimation of two parameters instead of just one.

As a consequence, it can be shown that the standardized coefficients are distributed as t_{n-2}, and this is the distribution that has to be taken into account when constructing confidence intervals and tests for the regression coefficients.

 

This result is difficult to prove, and the proof is not given here, but it should come as no surprise that estimating two parameters instead of one leads to losing two degrees of freedom instead of one.
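As an illustration of where n - 2 enters, here is a minimal sketch of the standardized slope of a least squares line. It uses the usual textbook formulas for simple linear regression; the function name, the reference slope `beta0` and the data are assumptions made for the example.

```python
import math

def slope_t_statistic(xs, ys, beta0=0.0):
    """Return (slope, standard error, standardized slope, degrees of freedom).

    beta0 is the reference value of the slope used for standardization
    (hypothetical parameter, 0 by default).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("at least three points are needed for n - 2 > 0")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares about the fitted line
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    df = n - 2                      # two estimated parameters: slope and intercept
    s2 = sse / df                   # estimate of the error variance
    se_slope = math.sqrt(s2 / sxx)  # estimated standard deviation of the slope
    return slope, se_slope, (slope - beta0) / se_slope, df

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, se, t_value, df = slope_t_statistic(xs, ys)
print("slope:", slope, "standard error:", se)
print("standardized slope:", t_value, "to be compared with t with", df, "df")
```

The residual variance is divided by n - 2 rather than n - 1, which is where the second lost degree of freedom shows up in the calculation.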


For just a little bit more on "losing degrees of freedom", please see here. This Tutorial bears on the Chi-square distribution, but we show below that the t distribution is intimately linked to the Chi-square distribution.

Formal definition of the t distribution

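We can now elaborate a more general formal definition of the t distribution. Take the expression for T, and divide both the numerator and the denominator by σ, the true standard deviation:

$$T = \frac{m - \mu}{S / \sqrt{n}} = \frac{(m - \mu) / (\sigma / \sqrt{n})}{S / \sigma}$$

    1) The new numerator of T is

$$\frac{m - \mu}{\sigma / \sqrt{n}}$$

which is N(0, 1).

    2) The new denominator of T can be written:

$$\frac{S}{\sigma} = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} \frac{(x_i - m)^2}{\sigma^2}}$$

But the term

$$\sum_{i=1}^{n} \frac{(x_i - m)^2}{\sigma^2}$$

is distributed as χ²_{n-1} (see here).

The denominator under the radical is n - 1, which is just the number of degrees of freedom of the numerator under the radical.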

 

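3) The numerator and denominator of T are independent random variables. If we write T in its original form

$$T = \frac{m - \mu}{S / \sqrt{n}}$$

    * the numerator involves only the sample mean m, which is normally distributed,

    * the denominator involves only the estimated standard deviation S,

and, for samples drawn from a normal distribution, these two quantities are known to be independent.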

 

4) Note that T is now identified as the ratio of two independent variables, the distributions of which do not depend on the variance σ² of the original normal distribution. Therefore, we do not even need to calculate the distribution of T to assert that this distribution does not depend on σ².
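The decomposition in steps 1) to 4) can be checked on a single sample: whatever value of σ is used to build the N(0, 1) numerator and the chi-square term, the ratio is the same as T computed directly. This is only a sketch with made-up data and hypothetical function names; it illustrates the algebra, not the distributional result.

```python
import math

def t_statistic(sample, mu):
    """T in its original form: (m - mu) / (S / sqrt(n))."""
    n = len(sample)
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / (n - 1)
    return (m - mu) / math.sqrt(s2 / n)

def t_as_ratio(sample, mu, sigma):
    """T rebuilt as U / sqrt(chi2 / (n - 1)) after dividing by sigma."""
    n = len(sample)
    m = sum(sample) / n
    u = (m - mu) / (sigma / math.sqrt(n))                 # new numerator
    chi2 = sum((x - m) ** 2 for x in sample) / sigma**2   # chi-square term, n - 1 df
    return u / math.sqrt(chi2 / (n - 1))

sample = [4.1, 5.3, 3.8, 6.0, 5.2]  # made-up data
mu = 5.0

print("direct form:", t_statistic(sample, mu))
for sigma in (0.5, 1.0, 7.0):
    # sigma cancels between numerator and denominator
    print("sigma =", sigma, "ratio form:", t_as_ratio(sample, mu, sigma))
```

All printed values agree up to rounding error.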

-----

The formal definition of the t distribution is therefore:

 


By definition, the random variable T has a t_n distribution with n degrees of freedom if:

$$T = \frac{U}{\sqrt{X^2 / n}}$$

where:

    * U ~ N(0, 1),

    * X² ~ χ²_n,

    * U and X² are independent.

 

 

This definition makes no reference to the original problem that led to the identification of the t distribution. It can therefore be used in a more general context.
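The formal definition translates directly into a way of drawing values of T. The sketch below builds X² as a sum of n squared independent standard normal variables, which is one standard construction of a chi-square variable with n degrees of freedom; the function name, seed and number of draws are assumptions.

```python
import math
import random

def draw_t(df, rng):
    """Draw one value of T = U / sqrt(X2 / df) from the formal definition."""
    u = rng.gauss(0.0, 1.0)                                     # U ~ N(0, 1)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))     # X2 ~ chi-square, df degrees of freedom
    return u / math.sqrt(chi2 / df)

rng = random.Random(2)  # fixed seed so that the run is repeatable
REPS = 20000

# With 1 degree of freedom, T should behave like a Cauchy variable,
# for which P(|T| < 1) = 1/2.
draws = [draw_t(1, rng) for _ in range(REPS)]
inside = sum(abs(t) < 1.0 for t in draws) / REPS
print("fraction of draws with |T| < 1:", round(inside, 3))
```

The printed fraction should be close to 0.5, in line with the Cauchy distribution obtained for one degree of freedom.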

__________________________________________________________

 

 

Tutorial

 

In this Tutorial, we establish the analytical form of the t distribution (a bit technical). Note that we also calculate it here by considering a T variable as the ratio of two independent random variables.

We then emphasize that the t distribution bridges the gap between the normal distribution (very large samples) and the Cauchy distribution (2-observation samples) as the number of degrees of freedom varies.

We conclude by recalling that even though the number of degrees of freedom of the t distribution is by nature an integer, nothing prevents its mathematical form from accommodating non-integer degrees of freedom. This turns out to be useful for calculating approximate confidence intervals for the difference of two sample means when the variances of the normal distributions are not assumed to be equal (Welch's approximation).
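To show how non-integer degrees of freedom arise, here is a minimal sketch of the usual Welch-Satterthwaite formula for the approximate number of degrees of freedom. The formula is the standard one and is not stated on this page; the function name and the input values are assumptions.

```python
def welch_df(var1, n1, var2, n2):
    """Approximate degrees of freedom for the difference of two sample means.

    var1, var2 are the unbiased sample variances; n1, n2 the sample sizes.
    """
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a * a / (n1 - 1) + b * b / (n2 - 1))

# Equal variances and equal sizes: the result is close to n1 + n2 - 2 = 18
print(welch_df(1.0, 10, 1.0, 10))

# Unequal variances and sizes: the result is generally not an integer (about 11.3 here)
print(welch_df(4.0, 10, 1.0, 20))
```

The resulting value is then used as the df parameter of a t distribution, which is why a mathematical form that accepts non-integer values is needed.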

 

 

 

THE ANALYTICAL FORM OF THE t DISTRIBUTION

General outline of the demonstration

Definition of the t distribution

The joint pdf of U and X²

The distribution function of the t distribution

The pdf of T

The structure of F(t)

Differentiating F(t)

Special cases

n = 1 : Cauchy distribution

n infinite : normal distribution

Non-integer number of "degrees of freedom" : Welch's approximation

TUTORIAL

 

______________________________________________________

 

Related readings:

Interval estimation

t-test

T as the ratio of two random variables

The square of a Student's T_n is a Fisher's F_{1, n}

Normal distribution

Cauchy distribution
