t tests
t tests are a group of tests that all address the issue of the credibility :
* Of the mean of a sample drawn from a normal population as an estimate of the (unknown) mean of this population (one-sample t test).
* Of the difference between the means of two samples drawn from two normal populations as an estimate of the difference of the means of these populations (Two-sample t tests).
Consider a sample x1, ..., xn drawn from a normal population whose mean value is believed to be µ0. For example, it may be the measured values of the diameters of some mechanical parts made by a machine tuned for making parts with a diameter µ0.

The mean value of the measurements is x̄, which is slightly different from µ0.
Should this difference be considered significant ? In other words, is the difference between x̄ and µ0 so large that it clearly belies the statement that the mean of the population from which the sample was drawn is indeed µ0 (well-tuned machine) ?
In the vocabulary of tests, we therefore want to test :
* The null hypothesis H0 : µ = µ0, where µ is the true mean of the normal population, and where µ0 is called the "reference value",
* Against the alternative hypothesis H1 : µ ≠ µ0.
-----
The larger the absolute value of (x̄ - µ0), the more we'll tend to reject the null hypothesis in favor of the alternative hypothesis. So it appears that (x̄ - µ0) could be used as a test statistic : it is a normally distributed r.v. with mean 0 (when H0 is true) and with variance σ²/n.
We cannot use it as such, though, because its distribution (when H0 is true) depends on the variance σ² of the parent population.
Two things may then happen :
If the variance σ² is known, all we have to do is standardize (x̄ - µ0) by dividing it by its own standard deviation σ/√n, used as a "unit length", and thus obtain a standard normal variable z distributed as N(0, 1) whose distribution does not depend on σ² anymore :

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where n is the sample size.
* Two-sided test
The null hypothesis is then rejected at the α risk level if the value of z is larger than zα/2 or less than -zα/2, where zα/2 is the value such that the area under the standard gaussian curve to the right of zα/2 is equal to α/2.

* One-sided test
The alternative hypothesis may be more restrictive than just H1 : µ ≠ µ0. For example, the hypothesis H1 : µ > µ0 states that µ is not just different from µ0, but more precisely that it is larger than µ0. It is then natural to reject the null hypothesis in favor of the alternative hypothesis if z is much larger than 0, an argument in favor of µ being larger than µ0.
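The critical region is then defined by
z > zα
where zα is the value such that the area under the standard gaussian curve to the right of zα is equal to α.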
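The two decision rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `z_test`, the diameters, the reference value 10.0 and the known standard deviation 0.2 are all hypothetical, and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test(sample, mu0, sigma, alpha=0.05, alternative="two-sided"):
    """One-sample z test when the population standard deviation is known.

    Returns the z statistic, the critical value and the decision.
    """
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # N(0, 1)
    if alternative == "two-sided":
        # Area alpha/2 to the right of the critical value.
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif alternative == "greater":
        # H1 : mu > mu0, area alpha to the right of the critical value.
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    return z, critical, reject


# Hypothetical diameters of nine parts; machine tuned for mu0 = 10.0.
diameters = [10.1, 9.9, 10.3, 10.2, 10.0, 10.4, 10.1, 10.2, 10.3]
z, z_crit, reject_two_sided = z_test(diameters, mu0=10.0, sigma=0.2)
_, z_crit_one, reject_one_sided = z_test(
    diameters, mu0=10.0, sigma=0.2, alternative="greater"
)
print(f"z = {z:.3f}, two-sided critical value = {z_crit:.3f}")
```

For the same risk level, the one-sided critical value is smaller than the two-sided one, because the whole area α sits in a single tail.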

The variance of the population is unfortunately usually unknown, and the above expression for z cannot be used as such because it contains σ.
We are then exactly in the situation that led us to define Student's t distribution. We can therefore assert that the statistic Tn obtained by replacing in the expression of z the true standard deviation σ by its sample estimate

$$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

follows the t distribution with (n - 1) degrees of freedom :

$$T_n = \frac{\bar{x} - \mu_0}{S / \sqrt{n}} \sim t_{n-1}$$

The test is then exactly the same as when the variance is known, except for the fact that the standard normal distribution of the statistic z has to be replaced by the t distribution with (n - 1) degrees of freedom of the statistic Tn.
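As a rough sketch of the unknown-variance case, the statistic can be computed with the standard library alone. The data and the name `one_sample_t` are hypothetical. Since the standard library has no Student quantile function, the critical value 2.306 is simply the usual tabulated 0.975 quantile of the t distribution with 8 degrees of freedom.

```python
from math import sqrt
from statistics import fmean, stdev


def one_sample_t(sample, mu0):
    """Return the statistic T_n and its number of degrees of freedom."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation, (n - 1) denominator
    t = (fmean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1


# Hypothetical diameters of nine parts; machine tuned for mu0 = 10.0.
diameters = [10.1, 9.9, 10.3, 10.2, 10.0, 10.4, 10.1, 10.2, 10.3]
t_value, dof = one_sample_t(diameters, mu0=10.0)

# Tabulated two-sided critical value for alpha = 0.05 and 8 degrees of freedom.
T_CRITICAL_8_DOF = 2.306
reject_h0 = abs(t_value) > T_CRITICAL_8_DOF
print(f"T = {t_value:.3f} with {dof} degrees of freedom, reject H0: {reject_h0}")
```

The only differences from the known-variance case are the estimated standard deviation in the denominator and the table used for the critical value.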
We just designed a test bearing on the equality of a mean and a fixed reference value.
We now generalize this question to two normal populations. It is believed that these two populations have the same mean (H0 : µ1 = µ2 ), and the issue is to assess whether the observed difference in the values of the two sample means is large enough to contradict this assumption (H1 : µ1 ≠ µ2 ).
At this point, we have to distinguish between two different situations : the two samples are either independent, or paired.
Suppose we have two batches of mechanical parts with identical designs.
* The first batch contains n1 parts made by machine M1 which is tuned for making parts with a diameter equal to µ1.
* The second batch contains n2 parts made by a second machine M2 also tuned for making parts with a diameter equal to µ1. Yet, the possibility that M1 and M2 do not have exactly the same tuning cannot be ruled out, so we will call µ2 the mean diameter of the parts machined by M2, and we just assume that µ1 = µ2.
The diameters of the parts in both batches are measured, and the average values of these two series of measurements are denoted x̄1 and x̄2.

The question is whether the difference (x̄2 - x̄1) is so large that it contradicts the assumption of identical tuning of the two machines, and therefore contradicts the null hypothesis µ1 = µ2.
Again, we'll use (x̄2 - x̄1) as a starting point for devising the test statistic. As both populations are assumed normal and independent, (x̄2 - x̄1) follows a normal distribution, and this distribution is centered on 0 if µ1 = µ2 (H0 true).
* Variances are known
If the variances σ1² and σ2² of the two populations are known, the variance of (x̄2 - x̄1) is also known, for :
- The variance of x̄1 is σ1² / n1.
- The variance of x̄2 is σ2² / n2.
- The variance of (x̄2 - x̄1) is therefore σ1² / n1 + σ2² / n2.
Dividing (x̄2 - x̄1) by the square root of this variance yields a standard normal variable when H0 is true, and we are back to the case of a single sample with a reference value (here, 0).
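The reduction to the one-sample case can be illustrated as follows. This is a minimal sketch: the two batches, the known standard deviations (0.2 for both machines) and the function name `two_sample_z` are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist, fmean


def two_sample_z(sample1, sample2, sigma1, sigma2):
    """z statistic for H0 : mu1 = mu2 when both standard deviations are known."""
    n1, n2 = len(sample1), len(sample2)
    # Variance of the difference of the two independent sample means.
    variance_of_difference = sigma1 ** 2 / n1 + sigma2 ** 2 / n2
    return (fmean(sample2) - fmean(sample1)) / sqrt(variance_of_difference)


# Hypothetical diameters measured on the two batches.
batch_m1 = [10.0, 10.2, 9.9, 10.1, 9.8]
batch_m2 = [10.3, 10.1, 10.4, 10.2, 10.5]

z_two = two_sample_z(batch_m1, batch_m2, sigma1=0.2, sigma2=0.2)
z_critical = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided, alpha = 0.05
reject_equal_means = abs(z_two) > z_critical
print(f"z = {z_two:.3f}, critical value = {z_critical:.3f}")
```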
* Variances are unknown
- Variances are equal
The variances of the two populations are assumed to be equal but unknown. We therefore need to identify an estimator of this common variance σ².
We'll show that the quantity

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

where S1² and S2² are the (unbiased) sample variances of the two samples, is an unbiased estimator of σ². It is called the pooled variance estimator.
We'll deduce that the statistic

$$T = \frac{\bar{x}_2 - \bar{x}_1}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

follows the t distribution with (n1 + n2 - 2) degrees of freedom, which does not contain σ and can therefore be used as a test statistic.
Then everything happens as with the one-sample test : the null hypothesis is rejected in favor of the alternative hypothesis if the value of T is in the critical region defined by the chosen value for the risk level α.
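The pooled computation can be sketched as below. The batches and the function name `pooled_t` are hypothetical. The critical value 2.306 is the usual tabulated 0.975 quantile of the t distribution with 8 degrees of freedom, hard-coded because the standard library offers no Student quantile function.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(sample1, sample2):
    """Equal-variance two-sample statistic.

    Returns T, its degrees of freedom and the pooled variance estimate.
    """
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # (n - 1) denominators
    dof = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / dof
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (fmean(sample2) - fmean(sample1)) / standard_error
    return t, dof, pooled_variance


# Hypothetical diameters measured on the two batches.
batch_m1 = [10.0, 10.2, 9.9, 10.1, 9.8]
batch_m2 = [10.3, 10.1, 10.4, 10.2, 10.5]

t_pooled, dof_pooled, s_p_sq = pooled_t(batch_m1, batch_m2)

# Tabulated two-sided critical value for alpha = 0.05 and 8 degrees of freedom.
T_CRITICAL_8_DOF = 2.306
reject_equal_means = abs(t_pooled) > T_CRITICAL_8_DOF
print(f"T = {t_pooled:.3f} with {dof_pooled} degrees of freedom")
```

Each sample variance is weighted by its own number of degrees of freedom, so the larger sample contributes more to the estimate of the common variance.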
- Variances are not equal
The "pooled variance" calculation explicitly assumes that the variances of the two populations are equal. It cannot be generalized to the case of two populations with unknown and unequal variances. If each of the two variances is estimated separately, one ends up with two independent t variables, and the distribution of their difference has no known functional form.
In fact, no exact test is known for testing the null hypothesis µ1 = µ2, but there exist two approximate tests.
* Asymptotic distribution
(x̄2 - x̄1) is still normally distributed, and its variance is equal to the sum of the variances of x̄1 and of x̄2, that is σ1²/n1 + σ2²/n2, but the problem is again to estimate this variance.
For any distribution, the sample variance S² is a convergent estimator of the distribution variance. It can therefore be anticipated that for large values of n1 and of n2, the quantity S1²/n1 + S2²/n2 is close to the variance of (x̄2 - x̄1), and that the distribution of the r.v. T defined as

$$T = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$

is close to the standard normal distribution.
This intuition is correct, and it can be shown that as n1 and n2 grow without limit, T converges in distribution to a standard normal variable. This result, although quite intuitive, is difficult to demonstrate, and we state it without proof.
-----
So T is the statistic of an asymptotic test, that can be used as an approximate test for samples of moderate size. It should be kept in mind, though, that using asymptotic results on finite size samples is a heuristic whose quality can never be guaranteed, or even estimated.
* Welch's approximation
The above T variable is not a t variable : we'll show that the term under the radical of the denominator is not a χ² variable divided by its own number of degrees of freedom. Yet, it is a weighted sum of independent χ² variables, a fact from which we'll deduce that T is approximately distributed as a t variable with n* degrees of freedom, where n* is generally not an integer.
This approximation is known as Welch's approximation.
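The text above does not give the expression of n*. The sketch below assumes the standard Welch-Satterthwaite formula, n* = (v1 + v2)² / (v1²/(n1 - 1) + v2²/(n2 - 1)) with vi = Si²/ni. The data and the function name `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(sample1, sample2):
    """Unequal-variance statistic T and its approximate degrees of freedom n*."""
    n1, n2 = len(sample1), len(sample2)
    # Estimated variances of the two sample means.
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t = (fmean(sample2) - fmean(sample1)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (assumed here, not stated in the text).
    n_star = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, n_star


# Hypothetical batches with different sizes and clearly different spreads.
batch_m1 = [10.0, 10.2, 9.9, 10.1, 9.8]
batch_m2 = [10.6, 9.8, 10.9, 10.1, 10.4, 9.6, 10.7]

t_welch, n_star = welch_t(batch_m1, batch_m2)
print(f"T = {t_welch:.3f}, approximate degrees of freedom = {n_star:.2f}")
```

The value of n* always lies between min(n1, n2) - 1 and n1 + n2 - 2, and the critical value is then read from the t distribution with n* degrees of freedom.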
Imagine a group of patients that receive an experimental treatment against high blood pressure. Before the treatment, the blood pressure of each of the n patients is measured, and this set of measurements is the first sample. The blood pressure of the patients is measured again after treatment, and this new set of measurements is the second sample.

Note that the two samples now necessarily have the same size.
It is assumed that the blood pressure in the general population is normally distributed with mean µ1 and variance σ². It is further assumed that if the general population received the treatment, the new distribution of the blood pressure would still be normal with variance σ² (variance does not change) with mean µ2 possibly different from µ1.
In the "two independent samples" case, the observations were considered as realizations of two independent normal variables. Now, this assumption certainly does not hold anymore. In particular, the variance of (x̄2 - x̄1) cannot be estimated by the sum of the variances of these two variables anymore, because they are correlated (the value of their correlation coefficient is unknown, but we won't need it). Actually, the assumption is now that the two samples were drawn from a bivariate normal (or "binormal") distribution.
The difference d = x2 - x1 is normally distributed, and we therefore have an n-sample di = x2i - x1i drawn from a (univariate) normal population, and we want to assess the plausibility of the hypothesis "This normal distribution is centered (mean 0)".
So we are back again to the one-sample case with reference value (here, 0). Hence, we can assert that the following statistic

$$T = \frac{\bar{d}}{S_d / \sqrt{n}}$$

where d̄ and S_d are the mean and the sample standard deviation of the differences di, follows the t distribution with (n - 1) degrees of freedom.
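A minimal sketch of the paired computation follows. The blood pressure readings and the function name `paired_t` are hypothetical, and 2.776 is the usual tabulated 0.975 quantile of the t distribution with 4 degrees of freedom.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(before, after):
    """Paired-samples statistic computed on the differences d_i = x_2i - x_1i."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same size")
    differences = [x2 - x1 for x1, x2 in zip(before, after)]
    n = len(differences)
    # One-sample t statistic on the differences, with reference value 0.
    t = fmean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Hypothetical systolic blood pressure of five patients (mmHg).
pressure_before = [150, 160, 145, 155, 165]
pressure_after = [142, 155, 141, 150, 154]

t_paired, dof_paired = paired_t(pressure_before, pressure_after)

# Tabulated two-sided critical value for alpha = 0.05 and 4 degrees of freedom.
T_CRITICAL_4_DOF = 2.776
reject_no_effect = abs(t_paired) > T_CRITICAL_4_DOF
print(f"T = {t_paired:.3f} with {dof_paired} degrees of freedom")
```

Working on the differences removes the need to know the correlation between the two series of measurements, which is the point made in the text.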
The t test compares the means of two samples, but it cannot be generalized to the comparison of the means of three samples or more.
When more than two means have to be compared, the first idea is to conduct a series of t tests on all pairs of means at the α risk level, and reject the null hypothesis (all the means are identical) if at least one test rejects its own null hypothesis (that is, concludes that µi ≠ µj for some pair (i, j)), hoping that the global rejection is then also done at the α risk level.
This approach is defective : every additional test is one more opportunity for a false rejection, so it leads to wrongly rejecting the null hypothesis more often than 100·α % of the time.
The correct solution is then to call on a test called ANOVA (univariate), more complex than the t test, but that allows testing the equality of the means of any number of groups of observations at a specified risk level.
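The inflation of the global risk can be illustrated with a deliberately simplified calculation. It assumes that the pairwise tests are independent, which is not strictly true since each group appears in several pairs, so the figures are only indicative. The function name `familywise_error_rate` is hypothetical.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Number of pairwise tests and probability of at least one false rejection.

    Simplifying assumption: the pairwise tests are treated as independent.
    """
    n_tests = comb(n_groups, 2)  # number of distinct pairs of groups
    return n_tests, 1 - (1 - alpha) ** n_tests


tests_3, rate_3 = familywise_error_rate(3)
tests_5, rate_5 = familywise_error_rate(5)

for n_groups in (3, 5, 10):
    n_tests, rate = familywise_error_rate(n_groups)
    print(f"{n_groups} groups: {n_tests} tests, global risk about {rate:.3f}")
```

Under this assumption, the global risk already exceeds the nominal 5 % with three groups, and it grows quickly with the number of groups.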
We introduced the t test for comparing the means of groups of observations. But in fact, the test proper relies only on the fact that we identified a statistic whose distribution was Student's t distribution. The conclusions we reached concerning whether the null hypothesis should be rejected or not would have been the same for any statistic following the t distribution.
This is in particular the case for the estimated parameters {βi} of a Linear Regression (whether Simple or Multiple), which, once standardized by their estimated standard errors, are t distributed. One can then test the null hypothesis βi = 0 separately for each parameter in a context completely different from that described in this page.
The "two independent samples" t test explicitly assumes that the samples are drawn from normal distributions : it is a parametric test. Unfortunately, it is quite sensitive to the departure of these populations from strict normality, especially for samples with quite different sizes.
-----
Note that if we assume that :
* The two populations are normal,
* That their variances are equal,
* And that their means are also equal (H0 : µ1 = µ2),
then the t test may be regarded as testing the identity of the two populations from which the samples were drawn.
One can then use a nonparametric identity test such as the Mann-Whitney test. This test is only slightly less powerful than the t test when the populations are actually normal, and is substantially better than the t test when the populations depart significantly from normality.
The multivariate generalization of the normal distribution is the multivariate normal (or multinormal) distribution. Just as the t test was addressing the issue of the equality of the means of two normal distributions with equal variances, one may wonder whether two multinormal distributions with identical covariance matrices, and represented by two multivariate samples, have equal means or not.
This question is important for techniques such as Discriminant Analysis, that attempts to assign observations to classes assumed to be multinormal, and usually with identical covariance matrices : if the means of the distributions are equal, then the classes are in fact identical and any effort at building a classifier is pointless.
The question is solved by calculating the distribution of the squared Mahalanobis distance between the two sample means, the classes being assumed identical (means are equal, covariance matrices are equal). This difficult calculation is not given in this Glossary, but the result is simple, and is given here. The resulting test is called "Hotelling's T² test".
-----
Tutorial 1
Because the t test may be perceived as the archetypal test, we devote this first Tutorial to a detailed overview of the rationale behind the test.
OVERVIEW OF THE t TEST
* What does confidence depend on ?
* Sample spread
* Sample size
* The T statistic
* The assumptions
* Variance is known
* Variance is unknown
* Student's t distribution
* Degrees of freedom
-----
Tutorial 2
We now go over the mechanism of the t test for the three settings we mentioned :
* Reference value.
* Paired samples.
* Independent samples.
MECHANISM OF THE t TEST
* The "Reference value" t test
* The "Paired samples" t test
* The "Independent samples" t test
-----
Tutorial 3
Because the t test is ubiquitous, we describe how the results of the test are most frequently displayed by software, and how to interpret them.
READING THE RESULTS OF A t TEST
* Standard error
* Degrees of freedom
* Significance and p-value