t tests
t tests are a group of three tests that all bear on the same question :
"How much trust can be placed in the sample mean as a guess of the mean of the normal distribution from which the sample was drawn ?"
The following image displays :
* A sample (red dots),
* And a reference value µ0.

The question is : "Is the sample mean m significantly different from the reference value µ0 ?".
-----
If the sample is known to have been generated by a normal distribution whose mean µ is unknown, the question can be rephrased as : "How likely is it that the mean of this distribution was indeed µ0 ?". In other words : "How likely is it that the sample was generated by the normal distribution shown on the above illustration ?".
If the mean of the distribution is indeed µ0, one should expect the sample mean m to be close to the reference value µ0. A large difference between µ0 and m would therefore lead us to the conclusion that it is unlikely that the sample was generated by this distribution, and hence to reject the null hypothesis
H0 : µ = µ0.
There are two versions of this basic test.
* The simplest version assumes that the variance σ² of the normal distribution that generated the sample is known. The test then relies on the fact that, under H0, the standardized sample mean (m - µ0) / (σ / √n), where n is the sample size, follows the standard normal distribution N(0, 1).
* Unfortunately, more often than not, the variance of the generating distribution is unknown. This variance then has to be estimated from the sample, and this increases the level of uncertainty about the true position of the distribution mean. Yet, the distribution of the standardized sample mean is still known : it is a (Student's) t distribution with n - 1 degrees of freedom, which is similar to, but broader than, the standard normal distribution.
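The two versions can be compared with a minimal sketch in Python, using only the standard library. The sample, the reference value and the function names are made up for illustration, and the t distribution p-value is obtained by a simple numerical integration of the density rather than by a statistical library. To isolate the effect of the distribution, the "known" standard deviation is deliberately set equal to the sample estimate, so both tests get the same statistic and differ only in the distribution used to judge it.

```python
import math
from statistics import NormalDist, mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=20000):
    """P(|T| >= |t|), from a Simpson-rule integral of the density over [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # P(-|t| <= T <= |t|), by symmetry
    return max(0.0, 1.0 - central)

def z_test(sample, mu0, sigma):
    """Known variance: the standardized mean is N(0, 1) under H0."""
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

def t_test(sample, mu0):
    """Unknown variance: sigma is replaced by the sample estimate s."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, t_two_sided_p(t, n - 1)

sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.7]  # made-up observations
mu0 = 5.0                                          # made-up reference value

# Pretend sigma is known and happens to equal the sample estimate.
z, p_z = z_test(sample, mu0, sigma=stdev(sample))
t, p_t = t_test(sample, mu0)
print(f"z = {z:.3f}, two-sided p = {p_z:.4f}")
print(f"t = {t:.3f}, two-sided p = {p_t:.4f} (df = {len(sample) - 1})")
```

With identical statistics, the p-value from the t distribution comes out larger than the one from N(0, 1), which is the practical meaning of "broader" : the same observed difference is weaker evidence against H0 when the variance had to be estimated.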
A further distinction can be made between two different questions.
* The first question asks only if there is a significant difference between m and µ0, but is not concerned about whether this difference is positive or negative. The corresponding test is then called "two-sided", and the alternative hypothesis is then simply H1 : µ ≠ µ0.
* But suppose that we are concerned about the mean of the population being not just different, but more specifically larger (resp. smaller) than µ0. Then a variant of the test, called the "one-sided t-test", will test H0 against the alternative hypothesis H1 : µ > µ0 (resp. µ < µ0).
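The difference between the two questions shows up in how the p-value is read off the distribution of the test statistic. The sketch below is only an illustration : it uses the known-variance version of the test (statistic distributed as N(0, 1) under H0) because the Python standard library provides that distribution, and the function name `p_value` and the value of the statistic are hypothetical. For the unknown-variance version, the same logic applies with the t distribution in place of the normal one.

```python
from statistics import NormalDist

def p_value(z, alternative="two-sided"):
    """p-value of a standardized statistic z under N(0, 1).

    alternative: "two-sided" (H1: mu != mu0),
                 "greater"   (H1: mu > mu0),
                 "less"      (H1: mu < mu0).
    """
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z)))  # both tails count as evidence
    if alternative == "greater":
        return 1 - cdf(z)             # only the upper tail counts
    if alternative == "less":
        return cdf(z)                 # only the lower tail counts
    raise ValueError(f"unknown alternative: {alternative!r}")

z = 1.8  # made-up value of the standardized sample mean
p_two = p_value(z, "two-sided")
p_greater = p_value(z, "greater")
p_less = p_value(z, "less")
print(f"two-sided: {p_two:.4f}, greater: {p_greater:.4f}, less: {p_less:.4f}")
```

For a positive statistic, the one-sided p-value in the "greater" direction is half the two-sided one, while the p-value in the "less" direction is close to 1.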
The following image displays two samples. Not only do the samples have the same size, but the observations are numbered so that to each observation of the first sample corresponds an observation of the second sample. The samples are then said to be paired, or matched.

Note that if the observations of the first sample are numbered in order of increasing value, then it will not necessarily be so for the second sample.
This setting may occur in situations like this one. The blood pressure of a group of patients has been measured :
* Before treatment (red sample),
* And after treatment (blue sample).
so that each patient is represented by two observations.
The question is : "Did the treatment have any significant effect ?".
Note that the question is not : "Are the means of the two samples significantly different", but rather : "Is the average shift of the observations due to the treatment significantly different from 0 ?".
-----
Assuming that the two samples were drawn from normal populations with the same variance, we'll show that this problem can easily be transformed into a one-sample t-test as described above, applied to the differences between paired observations with 0 as the reference value.
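The reduction to a one-sample test can be sketched as follows : compute one difference per patient, then test whether the mean of these differences is significantly different from 0. The blood pressure values and the function name `paired_t_statistic` are invented for illustration; the sketch returns the t statistic and its degrees of freedom, to be compared against a t distribution.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """t statistic and degrees of freedom for paired samples.

    The test is a one-sample t test on the per-pair differences,
    with reference value 0.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same size")
    diffs = [b - a for a, b in zip(first, second)]  # one shift per pair
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Made-up blood pressures; index i is the same patient in both lists.
before = [142, 150, 138, 160, 155, 147]
after = [135, 148, 130, 151, 150, 146]

t, df = paired_t_statistic(before, after)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Note that the pairing matters : shuffling one of the two lists would change the differences, and therefore the statistic, even though both sample means would stay the same.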
The following illustration displays two samples (that do not necessarily have the same size).

The samples have different origins : for example, they could represent the cholesterol level of two groups of people with different eating habits.
The question is : "Are the means m1 and m2 of these two groups significantly different ?".
-----
Assuming that the samples were drawn from two normal populations with the same variance but unknown means µ1 and µ2, the question can be reformulated as : "How likely is it that the means of the two populations are equal ?".
Again, a large difference between the two sample means should lead us to reject the null hypothesis H0 : "µ1 = µ2".
-----
Again, the test has two versions depending on whether the common variance of the two populations is known or not. If it is known, the standardized difference between the two sample means will be N(0, 1) distributed under H0, else it will be t distributed.
The test also has a one-sided and a two-sided version.
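For the unknown-variance version, a common way to estimate the shared variance is to pool the two sample variances, weighting each by its degrees of freedom. The text above does not spell out this formula, so the sketch below is an assumption about the usual equal-variance construction, with invented cholesterol values and a hypothetical function name.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(x1, x2):
    """t statistic and degrees of freedom for two independent samples,
    assuming both populations share the same (unknown) variance."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance.
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
    # Standard error of the difference between the two sample means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2)) / se, df

# Made-up cholesterol levels for two groups of different sizes.
group1 = [210, 225, 198, 240, 232]
group2 = [190, 205, 185, 215, 200, 195]

t, df = pooled_t_statistic(group1, group2)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The statistic is then compared against a t distribution with n1 + n2 - 2 degrees of freedom, using one tail or both depending on the alternative hypothesis.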
The t test on independent samples compares the means of two groups of observations. What if we have three groups of observations, or more ?
The natural idea is to run a series of t tests on every pair of groups, and declare that the means of these groups are different if we find at least one pair of groups such that the hypothesis of equality of the underlying distributions is rejected.
This procedure is defective : each individual test carries its own risk of wrongly rejecting equality, and these risks accumulate over the series of tests, so the procedure will reject the hypothesis of equality of the means of the populations more often than it should.
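A rough calculation gives an idea of the size of the problem. The sketch below counts the number of pairwise tests for k groups and computes the chance of at least one false rejection, under the simplifying assumption that the tests are independent and each run at level alpha. Pairwise tests on shared groups are not actually independent, so the figures are only indicative; the function names are hypothetical.

```python
import math

def n_pairs(k):
    """Number of pairwise t tests needed to compare k groups."""
    return math.comb(k, 2)

def familywise_error(k, alpha=0.05):
    """Chance of at least one false rejection among all pairwise tests,
    ASSUMING the tests are independent (a simplification)."""
    return 1 - (1 - alpha) ** n_pairs(k)

for k in (2, 3, 5, 10):
    print(f"{k} groups: {n_pairs(k)} tests, "
          f"risk of a false rejection = {familywise_error(k):.3f}")
```

With two groups there is a single test and the risk is alpha itself; the risk then grows quickly with the number of groups.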
The proper solution is then to call on the ANOVA test, which is the correct generalization of the simple t test to more than two groups of observations.
The utility of the t test extends beyond these basic examples. In particular, it will be used in Multiple Linear Regression to assess whether a given parameter of the model is significantly different from 0 or not. If it is not, then the corresponding predictor will be considered as not carrying any relevant information for the model, and will be discarded.
t tests rely on strong assumptions :
* The samples are drawn from normal populations.
* When there are two samples, the variances (either known or unknown) are identical.
If there are serious doubts about the validity of these assumptions, one should resort to a non-parametric test : the Mann-Whitney test. This test will assess whether it is likely that two samples were drawn from the same (or identical) population(s) without making any stringent assumption about the nature of this population.
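To give an idea of how the Mann-Whitney test avoids distributional assumptions, here is a minimal sketch of its statistic : it only counts, over all pairs made of one observation from each sample, how often the first sample wins. The data and function names are invented, and the p-value uses the large-sample normal approximation without tie or continuity correction, which is crude for samples this small.

```python
import math
from statistics import NormalDist

def mann_whitney_u(x, y):
    """U statistics of both samples, by direct pair counting.

    u_x counts the pairs (a, b) with a from x, b from y and a > b;
    ties count for one half.
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

def approx_two_sided_p(u, n1, n2):
    """Normal approximation of the two-sided p-value (no corrections)."""
    mu = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))

x = [1.1, 2.3, 2.9, 3.8]       # made-up first sample
y = [3.1, 4.5, 5.2, 6.0, 6.4]  # made-up second sample

u_x, u_y = mann_whitney_u(x, y)
p = approx_two_sided_p(u_x, len(x), len(y))
print(f"U_x = {u_x}, U_y = {u_y}, approximate two-sided p = {p:.4f}")
```

Only the ordering of the observations enters the statistic, not their actual values, which is why no normality assumption is needed.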
____________________________________________________________________
Tutorial 1
Because the t test may be perceived as the archetypal test, we devote this first Tutorial to a detailed overview of the rationale behind the test.
OVERVIEW OF THE t TEST
* What does confidence depend on ?
* Sample spread
* Sample size
* The T statistic
* The assumptions
* Variance is known
* Variance is unknown
* Student's t distribution
* Degrees of freedom
______________________________________
Tutorial 2
We now go over the mechanism of the t test for the three settings we mentioned :
* Reference value.
* Paired samples.
* Independent samples.
MECHANISM OF THE t TEST
* The "Reference value" t test
* The "Paired samples" t test
* The "Independent samples" t test
______________________________________
Tutorial 3
Because the t test is ubiquitous, we describe how the results of the test are most frequently displayed by software, and how to interpret them.
READING THE RESULTS OF A t TEST
* Standard error
* Degrees of freedom
* Significance and p-value