t tests

t tests are a group of tests that all address the issue of the credibility :

    * Of the mean of a sample drawn from a normal population as an estimate of the (unknown) mean of this population (one-sample t test).

    * Of the difference between the means of two samples drawn from two normal populations as an estimate of the difference of the means of these populations (two-sample t tests).

One sample t test with reference value

The following sample was drawn from a normal population whose mean value is believed to be µ0. For example, it may be the measured values of the diameters of some mechanical parts made by a machine tuned for making parts with a diameter µ0.

 

 

 

 

The mean value of the measurements is x̄, which is slightly different from µ0. Should this difference be considered significant ? In other words, is the difference between x̄ and µ0 so large that it clearly belies the statement that the mean of the population from which the sample was drawn is indeed µ0 (well tuned machine) ?

In the vocabulary of tests, we therefore want to test :

    * The null hypothesis H0 : µ = µ0, where µ is the true mean of the normal population, and where µ0 is called the "reference value",

    * Against the alternative hypothesis H1 : µ ≠ µ0.

-----

The larger the absolute value of (x̄ - µ0), the more we'll tend to reject the null hypothesis to the benefit of the alternative hypothesis. So it appears that (x̄ - µ0) could be used as a test statistic : it is a normally distributed r.v. with mean 0 (when H0 is true) and with variance σ²/n. We cannot use it as such, though, because its distribution (when H0 is true) depends on the variance σ² of the mother population.

Two things may then happen :

Variance is known

If the variance σ² is known, all we have to do is standardize (x̄ - µ0) by dividing it by its own standard deviation σ/√n, used as a "unit length". We thus obtain a standard normal variable z, distributed as N(0, 1), whose distribution does not depend on σ² anymore :

    z = (x̄ - µ0) / (σ / √n)

where n is the sample size.

 

        * Two-sided test

The null hypothesis is then rejected at the α risk level if the value of z is larger than zα/2 or less than -zα/2, where zα/2 is the value such that the area under the standard gaussian curve to the right of zα/2 is equal to α/2.

 

 

 

        * One-sided test

The alternative hypothesis may be more restrictive than just H1 : µ ≠ µ0. For example, the hypothesis H1 : µ > µ0 states that µ is not just different from µ0, but more precisely that it is larger than µ0. It is then natural to reject the null hypothesis in favor of the alternative hypothesis if z is much larger than 0, an argument in favor of µ being larger than µ0.

The critical region is then defined by

z > zα 
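
As an illustration of the known-variance case, here is a minimal Python sketch of both the two-sided and the one-sided test. The function name z_test, the sample values and the value of σ are made up for the example, and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma):
    """One-sample z test, the population standard deviation sigma being known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()                        # N(0, 1)
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))   # H1 : mu != mu0
    p_greater = 1 - std_normal.cdf(z)                # H1 : mu > mu0
    return z, p_two_sided, p_greater

# Hypothetical diameters (mm) of parts made by a machine tuned for mu0 = 10.0
diameters = [10.02, 9.98, 10.05, 10.01, 9.97, 10.04, 10.03, 9.99]
z, p_two, p_gt = z_test(diameters, mu0=10.0, sigma=0.03)

alpha = 0.05
z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # area alpha/2 to its right
z_alpha = NormalDist().inv_cdf(1 - alpha)            # area alpha to its right

print("z =", z)
print("two-sided test rejects H0 :", abs(z) > z_half_alpha)
print("one-sided test rejects H0 :", z > z_alpha)
```

Comparing z with the critical value and comparing the p-value with α are two equivalent ways of reaching the same decision.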

 

 

Variance is unknown

The variance of the population is unfortunately usually unknown, and the above expression for z cannot be used as such because it contains σ.

We are then exactly in the situation that led us to define Student's t distribution. We can therefore assert that, when H0 is true, the statistic Tn obtained by replacing in the expression of z the true standard deviation σ by the sample standard deviation

    s = √( Σ (xi - x̄)² / (n - 1) )

follows the tn - 1 distribution with (n - 1) degrees of freedom.

    Tn = (x̄ - µ0) / (s / √n) ~ tn - 1

 

 

The test is then exactly the same as when the variance is known, except for the fact that the standard normal distribution of the statistic z has to be replaced by the tn - 1 distribution of the statistic Tn.
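
The unknown-variance case can be sketched as follows. The helper one_sample_t and the data are hypothetical. The sketch assumes NumPy and SciPy are available, and checks the hand-computed statistic against scipy.stats.ttest_1samp.

```python
import numpy as np
from scipy import stats

def one_sample_t(sample, mu0):
    """Two-sided one-sample t test of H0 : mu = mu0 (variance unknown)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    s = x.std(ddof=1)                          # sample standard deviation (n - 1 divisor)
    t_stat = (x.mean() - mu0) / (s / np.sqrt(n))
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value

# Hypothetical diameters (mm), machine tuned for mu0 = 10.0
diameters = [10.02, 9.98, 10.05, 10.01, 9.97, 10.04, 10.03, 9.99]
t_manual, p_manual = one_sample_t(diameters, mu0=10.0)

# Same test as computed by SciPy
reference = stats.ttest_1samp(diameters, popmean=10.0)
print(t_manual, reference.statistic)
print(p_manual, reference.pvalue)
```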

Two-sample t test

We just designed a test bearing on the equality of a mean and a fixed reference value.

We now generalize this question to two normal populations. It is believed that these two populations have the same mean (H0 : µ1 = µ2), and the issue is to assess whether the observed difference in the values of the two sample means is large enough to contradict this assumption (H1 : µ1 ≠ µ2).

At this point, we have to distinguish between two different situations : the two samples are either independent, or paired.

t test with two independent samples

Suppose we have two batches of mechanical parts with identical designs.

    * The first batch contains n1 parts made by machine M1 which is tuned for making parts with a diameter equal to µ1.

    * The second batch contains n2 parts made by a second machine M2 also tuned for making parts with a diameter equal to µ1. Yet, the possibility that M1 and M2 do not have exactly the same tuning cannot be discarded, so we will call µ2 the mean diameter of the parts machined by M2, and we just assume that µ1 = µ2.

The diameters of the parts in both batches are measured, and the average values of these two series of measurements are denoted x̄1 and x̄2.

 

 

 

The question is whether the difference (x̄2 - x̄1) is so large that it contradicts the assumption of identical tuning of the two machines, and therefore contradicts the null hypothesis µ1 = µ2.

Again, we'll use (x̄2 - x̄1) as a starting point for devising the test statistic. As both populations are assumed normal and independent, (x̄2 - x̄1) follows a normal distribution, and this distribution is centered if µ1 = µ2 (H0 true).

    * Variances are known

If the variances σ1² and σ2² of the two populations are known, the variance of (x̄2 - x̄1) is also known, for :

        - The variance of x̄1 is σ1² / n1.

        - The variance of x̄2 is σ2² / n2.

        - The variance of (x̄2 - x̄1) is therefore σ1² / n1 + σ2² / n2.

and we are back to the case of a single sample with a reference value (here, 0), the standard normal statistic being

    z = (x̄2 - x̄1) / √(σ1² / n1 + σ2² / n2)
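
A minimal sketch of the two-sample test with known variances, using only the Python standard library. The function name two_sample_z, the two batches and the values of σ1 and σ2 are invented for the illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sample_z(x1, x2, sigma1, sigma2):
    """Two-sided test of H0 : mu1 = mu2, both population standard deviations known."""
    n1, n2 = len(x1), len(x2)
    # Standard deviation of the difference of the two sample means
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (mean(x2) - mean(x1)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical diameters (mm) of parts made by machines M1 and M2
batch1 = [10.02, 9.98, 10.05, 10.01, 9.97, 10.04, 10.03, 9.99]
batch2 = [10.06, 10.01, 10.08, 10.03, 10.07, 10.02, 10.05]
z, p_value = two_sample_z(batch1, batch2, sigma1=0.03, sigma2=0.03)
print("z =", z, " p-value =", p_value)
```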

 

    * Variances are unknown

        - Variances are equal

The variances of the two populations are assumed to be equal but unknown. We therefore need to identify an estimator of this common variance σ².

We'll show that the quantity

    Sp² = [ (n1 - 1) S1² + (n2 - 1) S2² ] / (n1 + n2 - 2)

where S1² and S2² are the two sample variances, is an unbiased estimator of σ². It is called the pooled variance estimator.

We'll deduce that, when H0 is true, the statistic

    T = (x̄2 - x̄1) / ( Sp √(1/n1 + 1/n2) )

follows the tn1 + n2 - 2 distribution, which does not contain σ and can therefore be used as a test statistic.

Then everything happens as with the one-sample test : the null hypothesis is rejected in favor of the alternative hypothesis if the value of T is in the critical region defined by the chosen value for the risk level α.
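
The equal-variance test can be sketched in Python as follows. The helper pooled_t and the two batches are hypothetical. The sketch assumes NumPy and SciPy are available, and compares the hand-computed values with scipy.stats.ttest_ind called with equal_var=True.

```python
import numpy as np
from scipy import stats

def pooled_t(x1, x2):
    """Two-sided two-sample t test assuming equal (unknown) variances."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.size, x2.size
    df = n1 + n2 - 2
    # Pooled variance : sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / df
    t_stat = (x2.mean() - x1.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p_value

# Hypothetical diameters (mm) of parts made by machines M1 and M2
batch1 = [10.02, 9.98, 10.05, 10.01, 9.97, 10.04, 10.03, 9.99]
batch2 = [10.06, 10.01, 10.08, 10.03, 10.07, 10.02, 10.05]
t_manual, df, p_manual = pooled_t(batch1, batch2)

# SciPy computes mean(a) - mean(b), hence the order of the arguments
ref = stats.ttest_ind(batch2, batch1, equal_var=True)
print(t_manual, ref.statistic)
print(p_manual, ref.pvalue)
```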

 

        - Variances are not equal

The "pooled variance" calculation explicitly assumes that the variances of the two populations are equal. It cannot be generalized to the case of two populations with unknown and unequal variances. If each of the two variances is estimated separately, one ends up with two independent t variables, and the distribution of their difference has no known functional form.

In fact, no exact test is known for testing the null hypothesis µ1 = µ2, but there exist two approximate tests.

 

                * Asymptotic distribution

(x̄2 - x̄1) is still normally distributed, and its variance is equal to the sum of the variances of x̄1 and of x̄2, that is σ1²/n1 + σ2²/n2, but the problem is again to estimate this variance.

For any distribution with a finite variance, the sample variance S² is a consistent estimator of the distribution variance. It can therefore be anticipated that for large values of n1 and of n2, the quantity S1²/n1 + S2²/n2 is close to the variance of (x̄2 - x̄1), and that the distribution of the r.v. T defined as

    T = (x̄2 - x̄1) / √(S1² / n1 + S2² / n2)

is close to the standard normal distribution.

This intuition is correct, and it can be shown that as n1 and n2 grow without limit, T converges in distribution to a standard normal variable. This result, although quite intuitive, is difficult to demonstrate, and we state it without proof.

-----

So T is the statistic of an asymptotic test that can be used as an approximate test for samples of moderate size. It should be kept in mind, though, that using asymptotic results on finite size samples is a heuristic whose quality can never be guaranteed, or even estimated.
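
The quality of the asymptotic approximation can be explored by simulation. The sketch below is an illustration and not part of the original text. It draws many pairs of samples under H0 from two normal populations with unequal variances, and records how often |T| exceeds 1.96, the 5% two-sided critical value of the standard normal distribution. The function name rejection_rate, the standard deviations and the number of repetitions are arbitrary choices. NumPy is assumed to be available.

```python
import numpy as np

def rejection_rate(n1, n2, sigma1=1.0, sigma2=3.0, reps=10_000, seed=0):
    """Fraction of samples simulated under H0 for which |T| > 1.96."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(0.0, sigma1, size=(reps, n1))   # one simulated sample per row
    x2 = rng.normal(0.0, sigma2, size=(reps, n2))
    # Estimated standard deviation of the difference of the sample means
    se = np.sqrt(x1.var(axis=1, ddof=1) / n1 + x2.var(axis=1, ddof=1) / n2)
    t = (x2.mean(axis=1) - x1.mean(axis=1)) / se
    return float(np.mean(np.abs(t) > 1.96))

for n in (5, 20, 200):
    print(n, rejection_rate(n, n))
```

With small samples the observed rate is noticeably above the nominal 5%, and it should drift toward 5% as the sample sizes grow.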

                * Welch's approximation

The above T variable is not a t variable : we'll show that the term under the radical of the denominator is not a χ² variable divided by its own number of degrees of freedom. Yet, it is a weighted sum of independent χ² variables, a fact from which we'll deduce that T is approximately distributed as a t variable with n* degrees of freedom, where n* is in general not an integer.

This approximation is known as Welch's approximation.
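
The text above does not spell out n*. The sketch below assumes the usual Welch-Satterthwaite expression, n* = (v1 + v2)² / ( v1²/(n1 - 1) + v2²/(n2 - 1) ) with v1 = S1²/n1 and v2 = S2²/n2, and checks the result against scipy.stats.ttest_ind called with equal_var=False. The helper welch_t and the data are hypothetical.

```python
import numpy as np
from scipy import stats

def welch_t(x1, x2):
    """Two-sided two-sample t test with Welch's approximation (unequal variances)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.size, x2.size
    v1 = x1.var(ddof=1) / n1                 # estimated variance of the first mean
    v2 = x2.var(ddof=1) / n2                 # estimated variance of the second mean
    t_stat = (x2.mean() - x1.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom (generally not an integer)
    n_star = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_value = 2 * stats.t.sf(abs(t_stat), n_star)
    return t_stat, n_star, p_value

# Hypothetical batches with visibly different spreads
batch1 = [10.02, 9.98, 10.05, 10.01, 9.97, 10.04, 10.03, 9.99]
batch2 = [10.16, 9.91, 10.18, 9.93, 10.17, 10.02, 10.15]
t_manual, n_star, p_manual = welch_t(batch1, batch2)

ref = stats.ttest_ind(batch2, batch1, equal_var=False)
print(t_manual, ref.statistic)
print(n_star)
print(p_manual, ref.pvalue)
```

n* always lies between min(n1, n2) - 1 and n1 + n2 - 2, the latter being the number of degrees of freedom of the equal-variance test.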

t test with two paired samples

Imagine a group of patients that receive an experimental treatment against high blood pressure. Before the treatment, the blood pressure of each of the n patients is measured, and this set of measurements is the first sample. The blood pressure of the patients is measured again after treatment, and this new set of measurements is the second sample.

 

 

 

Note that the two samples now necessarily have the same size.

It is assumed that the blood pressure in the general population is normally distributed with mean µ1 and variance σ². It is further assumed that if the general population received the treatment, the new distribution of the blood pressure would still be normal with variance σ² (variance does not change) with mean µ2 possibly different from µ1.

In the "two independent samples" case, the observations were considered as realizations of two independent normal variables. Now, this assumption certainly does not hold anymore. In particular, the variance of (x̄2 - x̄1) cannot be estimated by the sum of the variances of these two variables anymore, because they are correlated (the value of their correlation coefficient is unknown, but we won't need it). Actually, the assumption is now that the two samples were drawn from a bivariate normal (or "binormal") distribution.

The difference d = x2 - x1 is normally distributed, and we therefore have an n-sample di = x2i - x1i drawn from a (univariate) normal population, and we want to assess the plausibility of the hypothesis "This normal distribution is centered (mean 0)".

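So we are back again to the one-sample case with reference value (here, 0). Hence, we can assert that, when H0 is true, the following statistic

    T = d̄ / (sd / √n)

where d̄ and sd are the mean and the sample standard deviation of the differences di, follows the tn - 1 distribution (n - 1 degrees of freedom).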
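
A minimal sketch of the paired test : it is just a one-sample t test on the differences, with reference value 0. The blood pressure values and the helper paired_t are invented for the illustration. The sketch assumes NumPy and SciPy are available, and compares the result with scipy.stats.ttest_rel.

```python
import numpy as np
from scipy import stats

def paired_t(first, second):
    """Two-sided paired t test of H0 : mean of (second - first) is 0."""
    d = np.asarray(second, dtype=float) - np.asarray(first, dtype=float)
    n = d.size
    t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value

# Hypothetical systolic blood pressures (mmHg) of the same 8 patients
before = [152, 148, 160, 155, 149, 158, 151, 163]
after = [145, 146, 152, 150, 147, 151, 149, 155]
t_manual, p_manual = paired_t(before, after)

# SciPy computes the differences a - b, hence the order of the arguments
ref = stats.ttest_rel(after, before)
print(t_manual, ref.statistic)
print(p_manual, ref.pvalue)
```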

More than two groups of observations : ANOVA

The t test compares the means of two samples, but it cannot be generalized to the comparison of the means of three samples or more.

When more than two means have to be compared, the first idea is to conduct a series of t tests on all pairs of means at the α risk level, and to reject the global null hypothesis (all the means are identical) if at least one test rejects its own null hypothesis µi = µj for some pair (i, j), hoping that the global rejection is then also done at the α risk level.

This approach is defective because it leads to wrongly rejecting the global null hypothesis more often than 100·α % of the time : each additional test brings its own chance of a false rejection, and these chances accumulate.
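
The inflation of the risk level can be illustrated with a rough calculation. The sketch below makes the simplifying assumption that the pairwise tests are independent, which is not strictly true since each sample takes part in several comparisons, so the figures are only indicative. The function name familywise_error is hypothetical.

```python
from math import comb

def familywise_error(k, alpha=0.05):
    """Number of pairwise tests among k groups, and the probability of at least
    one false rejection if the tests were independent (a simplifying assumption)."""
    m = comb(k, 2)                        # number of pairs of means
    return m, 1 - (1 - alpha) ** m

for k in (3, 5, 10):
    m, risk = familywise_error(k)
    print(f"{k} groups : {m} tests, global risk about {risk:.3f}")
```

With 3 groups there are already 3 tests, and the global risk computed this way is 1 - 0.95³, about 0.14 instead of the intended 0.05.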

The correct solution is then to call on a test called ANOVA (univariate), more complex than the t test, but that allows testing the equality of the means of any number of groups of observations at a specified risk level.

t tests in other contexts

We introduced the t test for comparing the means of groups of observations. But in fact, the test proper relies only on the fact that we identified a statistic whose distribution was Student's t distribution. The conclusions we reached concerning whether the null hypothesis should be rejected or not would have been the same for any statistic following the t distribution.

This is in particular the case for the estimated parameters {βi} of a Linear Regression (whether Simple or Multiple) : each estimate, once centered on the true value of the parameter and divided by its estimated standard deviation, is t distributed. One can then test the null hypothesis βi = 0 separately for each parameter in a context completely different from that described in this page.
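
For a Simple Linear Regression, the test of a zero slope can be sketched as follows. The data are simulated for the example. The sketch assumes NumPy and SciPy are available, and uses scipy.stats.linregress, whose reported p-value is that of a two-sided t test of a zero slope with n - 2 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Simulated data : a straight line plus normal noise (illustration only)
rng = np.random.default_rng(42)
x = np.linspace(0.0, 9.0, 10)
y = 1.5 + 0.8 * x + rng.normal(0.0, 1.0, size=x.size)

fit = stats.linregress(x, y)

# t statistic of the slope : estimate divided by its standard error
t_slope = fit.slope / fit.stderr
p_manual = 2 * stats.t.sf(abs(t_slope), df=x.size - 2)

print("slope =", fit.slope, " t =", t_slope)
print(p_manual, fit.pvalue)
```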

Non parametric test of equality of means : the Mann-Whitney test

The "two independent samples" t test explicitly assumes that the samples are drawn from normal distributions : it is a parametric test. Unfortunately, it is quite sensitive to the departure of these populations from strict normality, especially for samples with quite different sizes.

-----

Note that if we assume that :

    * The two populations are normal,

    * That their variances are equal,

    * And that their means are also equal (H0 : µ1 = µ2),

then the t test may be regarded as testing the identity of the two populations from which the samples were drawn.

One can then use a non parametric identity test such as the Mann-Whitney test. This test is only slightly less powerful than the t test when the populations are actually normal, and is substantially better than the t test when the populations depart significantly from normality.
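
As an illustration, the sketch below runs both tests on two skewed samples of quite different sizes. The samples are simulated from lognormal distributions, an arbitrary choice made for the example, and the calls are scipy.stats.mannwhitneyu and scipy.stats.ttest_ind. NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy import stats

# Simulated, clearly non-normal samples of unequal sizes (illustration only)
rng = np.random.default_rng(0)
x1 = rng.lognormal(mean=0.0, sigma=1.0, size=15)
x2 = rng.lognormal(mean=0.5, sigma=1.0, size=40)

# Non parametric test of identity of the two populations
mw = stats.mannwhitneyu(x1, x2, alternative="two-sided")

# Parametric test, whose normality assumption is violated here
tt = stats.ttest_ind(x1, x2, equal_var=True)

print("Mann-Whitney : U =", mw.statistic, " p-value =", mw.pvalue)
print("t test       : T =", tt.statistic, " p-value =", tt.pvalue)
```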

Multivariate generalization, Hotelling's T² test

The multivariate generalization of the normal distribution is the multivariate normal (or multinormal) distribution. Just as the t test was addressing the issue of the equality of the means of two normal distributions with equal variances, one may wonder whether two multinormal distributions with identical covariance matrices, and represented by two multivariate samples, have equal means or not.

This question is important for techniques such as Discriminant Analysis, which attempts to assign observations to classes assumed to be multinormal, and usually with identical covariance matrices : if the means of the distributions are equal, then the classes are in fact identical and any effort at building a classifier is pointless.

The question is solved by calculating the distribution of the squared Mahalanobis distance between the two sample means, the classes being assumed identical (means are equal, covariance matrices are equal). This difficult calculation is not given in this Glossary, but the result is simple, and is given here. The resulting test is called "Hotelling's T² test".
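
The page does not reproduce the result. The sketch below assumes the standard two-sample form of the test : T² = n1 n2 / (n1 + n2) times the squared Mahalanobis distance between the two sample means (computed with the pooled covariance matrix), and (n1 + n2 - p - 1) / (p (n1 + n2 - 2)) T² follows a Fisher F distribution with p and n1 + n2 - p - 1 degrees of freedom when the means are equal, p being the number of variables. The function name hotelling_t2 and the simulated data are hypothetical. NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy import stats

def hotelling_t2(X1, X2):
    """Two-sample Hotelling T² test (equal covariance matrices assumed).
    X1 and X2 are arrays with one observation per row."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    # Pooled covariance matrix (atleast_2d handles the case p = 1)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    # Squared Mahalanobis distance between the two sample means
    d2 = diff @ np.linalg.solve(S_pooled, diff)
    t2 = n1 * n2 / (n1 + n2) * d2
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    p_value = stats.f.sf(f_stat, p, n1 + n2 - p - 1)
    return t2, f_stat, p_value

# Simulated bivariate samples with the same covariance matrix (illustration only)
rng = np.random.default_rng(1)
cov = [[1.0, 0.3], [0.3, 1.0]]
A = rng.multivariate_normal([0.0, 0.0], cov, size=30)
B = rng.multivariate_normal([0.5, 0.2], cov, size=25)
print(hotelling_t2(A, B))
```

With a single variable (p = 1), T² reduces to the square of the equal-variance t statistic, which gives a simple way of checking the sketch.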

 

____________________________________________________________________________

 

 

 

Tutorial 1

 

Because the t test may be perceived as the archetypal test, we devote this first Tutorial to a detailed overview of the rationale behind the test.

 

OVERVIEW OF THE t TEST

What does confidence depend on ?

Sample spread

Sample size

The T statistic

The assumptions

Variance is known

Variance is unknown

Student's t distribution

Degrees of freedom

TUTORIAL

______________________________________

 

 

 

Tutorial 2

 

 We now go over the mechanism of the t test for the three settings we mentioned :

    * Reference value.

    * Paired samples.

    * Independent samples.

 

MECHANISM OF THE t TEST

 The "Reference value" t test

The "Paired samples" t test

The "Independent samples" t test

TUTORIAL

______________________________________

 

Tutorial 3

 

Because the t test is ubiquitous, we describe how the results of the test are most frequently displayed by software, and how to interpret them.

 

READING THE RESULTS OF A t TEST

Standard error

Degrees of freedom

Significance and p-value

TUTORIAL

 

 ____________________________________________

 

Related readings

Confidence intervals

Normal distribution

t distribution

ANOVA

Mann-Whitney test
