THE t-TEST
Can we trust a sample average? | What does confidence depend on?
Can we trust a sample average?
t-tests deal with the general question of how much trust can be placed in a sample average. Here is an example. A bank is about to introduce a new savings product. Before actually launching it, the bank polls a fraction of its customers in its own town to find out how much money interested customers are considering investing in the product. The bank collects the following numbers:
Customer | Poll
Smith | 950$
Jones | 1,100$
... | ...
Doe | 800$
Brown | 1,200$
Average | 975$
Decisions will be made on the basis of this observed "975$" average. How trustworthy is this number? More specifically, t-tests answer three kinds of questions, each dealing with a slightly different problem.
1) The bank had anticipated an average answer of 1,000$. Is the 25$ difference from the observed 975$ significant? This question is answered by the so-called "One sample t-test".
2) The project has been delayed, and the customers who were polled six months earlier are now polled again. The new average answer is 980$. Is the difference between this new average and the old one significant? This question is answered by the so-called "Two dependent samples t-test".
3) A similar poll is conducted in a neighboring town, with an average answer of 995$. Is the difference between the two averages significant? This question is answered by the so-called "Two independent samples t-test".
Each of these questions leads to a slightly different form of the test.
What does confidence depend on?
First, note that the test cannot work on average values alone. For example, the question "Is 975 significantly different from 1,000?" is meaningless as it stands. It becomes meaningful only once all the individual answers are taken into account. Here are the two quantities your trust in an average value relies on. We describe them in the context of comparing an average value to a Reference Value (the "One sample" test), but they are the same in the other two situations.
For a given difference between the sample average and the Reference Value, you will intuitively trust samples with very low dispersion and distrust samples with broad dispersion, and your intuition is right. In the extreme case where all answers were very nearly "975$", the bank would be almost sure that its "1,000$" anticipation was indeed too optimistic.
If the bank collected 10,000 answers, it would certainly have more faith in the observed "975$" than if it had collected only 20 answers.
So, how much faith we have in an observed average clearly depends on:
1) The size of the sample: the larger the sample, the more confidence in the observed average.
2) The dispersion of the values around the average: the smaller the dispersion, the more confidence in the observed average.
The T statistic
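The test assumes that the population at large follows a normal distribution with mean m and variance $\sigma^2$.
This may sound restrictive, but in practice it is often a reasonable assumption, and the test is fairly robust to moderate departures from normality, especially for large samples.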
The test starts from the hypothesis that m is equal to m0 (the so-called null hypothesis). The variance $\sigma^2$ will first be considered known, then unknown.
In our example, m0 is the 1,000$ mark, which we call the Reference Value.
Before we go any further, we have to make it clear that the sample average M must be considered a random variable. If we conducted the poll over and over again, on different people each time, we would observe a different value of M each time. This is why we cannot trust M entirely.
It is a classical result that M has a normal distribution with mean m and variance $\sigma^2/n$, where n is the number of observations (polled people). This confirms our earlier intuition that the randomness of M decreases with sample size and increases with the population's variance (assuming that the sample's variance is a fair estimate of the population's variance; we will come back to this point later).
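The classical result can be checked with a small simulation. The sketch below is an illustration only, in Python with the standard library: the population mean of 1,000$ and standard deviation of 200$ are assumed values (not taken from the bank's poll), and simulate_sample_means is a hypothetical helper. It repeats an imaginary poll of n customers many times and compares the observed variance of M with $\sigma^2/n$.

```python
import random
import statistics

def simulate_sample_means(m, sigma, n, repeats, seed=0):
    """Return the averages M of `repeats` samples of size n drawn from N(m, sigma^2)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [statistics.fmean(rng.gauss(m, sigma) for _ in range(n))
            for _ in range(repeats)]

# Assumed population parameters (illustrative, not from the bank's poll).
m, sigma = 1000.0, 200.0
for n in (10, 40):
    means = simulate_sample_means(m, sigma, n, repeats=5000)
    print(f"n={n:2d}  observed Var(M)={statistics.pvariance(means):7.1f}  "
          f"sigma^2/n={sigma ** 2 / n:7.1f}")
```

Going from n = 10 to n = 40 should divide the observed variance of M by roughly 4; the match with $\sigma^2/n$ is only approximate, since the simulation is itself random.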
Note the scale on the illustrations. In the bottom picture there are 4 times as many observations as in the top picture, but the gaussian curve is only half as wide and only twice as high as in the top illustration. This is because the standard deviation of M is $\sigma/\sqrt{n}$: multiplying n by 4 divides it by 2.
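The same arithmetic can be reproduced numerically. As a sketch under assumed values (σ = 200$ and n = 25 for the top picture; neither number comes from the original illustrations), the standard library's statistics.NormalDist gives the width (standard deviation) and peak height of the two gaussian curves for M:

```python
import math
from statistics import NormalDist

sigma = 200.0  # assumed population standard deviation (illustrative)
n = 25         # assumed sample size for the top picture

# Distribution of M for n and for 4n observations.
top = NormalDist(mu=1000.0, sigma=sigma / math.sqrt(n))
bottom = NormalDist(mu=1000.0, sigma=sigma / math.sqrt(4 * n))

width_ratio = bottom.stdev / top.stdev               # about 0.5: half as wide
height_ratio = bottom.pdf(1000.0) / top.pdf(1000.0)  # about 2: twice as high
print(width_ratio, height_ratio)
```

Since the peak height of a gaussian is inversely proportional to its standard deviation, halving the width necessarily doubles the height; the area under each curve stays equal to 1.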
We will illustrate how the test works with the "One sample test" case. The two other tests ("Two dependent samples" and "Two independent samples") are very similar.
The one sample t-test
So we have one sample of a numerical variable x, containing n observations. We also have a reference value m0, that is, the anticipated value of the mean of x over the whole population (not just the sample). The sample average is M. It is different from m0, and we wonder whether this difference is significant.
Note that the whole population is assumed to be normal, an important assumption.
How can we quantify the discrepancy between the observed average M and the reference value m0? We are looking for a quantity T that will be a fair indicator of how large the departure of M from m0 is.
Our first idea is simply to use T = (M - m0): the larger this difference, the more we believe that it reflects a genuine error in our initial idea that m = m0.
But that won't work, because a given value of (M - m0) is:
* more significant for samples with many observations than for samples with few observations,
* and more significant for "narrowly distributed" samples than for "broadly distributed" samples,
and we are looking for a quantity T whose significance does not depend on sample size or variance. So we "normalize" (M - m0) by dividing it by its own standard deviation, which is the standard deviation of M:
$$T = \frac{M - m_0}{\sigma / \sqrt{n}}$$
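A minimal sketch of this statistic in Python follows. The function name z_statistic is hypothetical; the inputs are only the four answers visible in the table, with an assumed σ of 200$, so this is not the bank's actual 975$ computation.

```python
import math
import statistics

def z_statistic(sample, m0, sigma):
    """One sample statistic when the population standard deviation sigma is known:
    T = (M - m0) / (sigma / sqrt(n))."""
    n = len(sample)
    M = statistics.fmean(sample)  # sample average
    return (M - m0) / (sigma / math.sqrt(n))

# Hypothetical inputs: four answers from the table, assumed sigma = 200$.
print(z_statistic([950, 1100, 800, 1200], m0=1000, sigma=200))
```

For these four values, M = 1,012.5$ and $\sigma/\sqrt{n}$ = 100$, so T = 0.125: M lies an eighth of a "Standard Deviation of M" above m0.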
The value of T measures how far away M is from the Reference Value m0 in "Standard Deviation of M" units.
When m = m0, T follows a standard normal distribution (mean 0, standard deviation 1), irrespective of sample size and variance. How the test is actually conducted from there is explained in the next paragraph.
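The claim that T is standard normal irrespective of n and σ can be checked by simulation. This sketch assumes the null hypothesis holds (m = m0 = 1,000$) and uses two arbitrary, assumed (n, σ) pairs; simulated_T is a hypothetical helper.

```python
import math
import random
import statistics

def simulated_T(m0, sigma, n, repeats, seed=0):
    """Simulate T = (M - m0) / (sigma / sqrt(n)) with the true mean equal to m0."""
    rng = random.Random(seed)
    values = []
    for _ in range(repeats):
        # One imaginary poll of n people drawn from N(m0, sigma^2).
        M = statistics.fmean(rng.gauss(m0, sigma) for _ in range(n))
        values.append((M - m0) / (sigma / math.sqrt(n)))
    return values

# Two assumed (n, sigma) configurations, chosen to be very different.
for n, sigma in [(5, 50.0), (100, 400.0)]:
    ts = simulated_T(1000.0, sigma, n, repeats=4000)
    print(f"n={n:3d} sigma={sigma:5.0f}  mean(T)={statistics.fmean(ts):+.3f}  "
          f"sd(T)={statistics.stdev(ts):.3f}")
```

Both configurations should report a mean close to 0 and a standard deviation close to 1, even though the sample sizes and variances differ widely.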
Any quantity built from the sample (here, T) is called a "statistic".
The variance $\sigma^2$ is unknown: Student's t distribution
We have been very optimistic in assuming that the variance of the population was known. More often than not, it is not. Yet we want to salvage our idea of "normalizing" the difference (M - m0) by its own standard deviation. So we replace the true standard deviation $\sigma$ by its estimate computed from the sample.
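We now have:
$$T = \frac{M - m_0}{s / \sqrt{n}}, \qquad s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - M)^2$$
where s, the estimated standard deviation, is computed from the n observations $x_1, \ldots, x_n$ of the sample.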
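Here is the corresponding sketch when σ is unknown. It relies on statistics.stdev from Python's standard library, which divides by n - 1; t_statistic is a hypothetical name, and the data are again only the four answers shown in the table.

```python
import math
import statistics

def t_statistic(sample, m0):
    """One sample t statistic: T = (M - m0) / (s / sqrt(n)),
    where s is the sample standard deviation (n - 1 denominator)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    M = statistics.fmean(sample)
    s = statistics.stdev(sample)  # estimated standard deviation
    return (M - m0) / (s / math.sqrt(n))

print(t_statistic([950, 1100, 800, 1200], m0=1000))
```

Here M = 1,012.5$, s = 175$ and $s/\sqrt{n}$ = 87.5$, so T = 12.5/87.5, about 0.143.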
The problem is that when the sample changes, so does the estimated standard deviation, which must then be considered a random variable too. Fortunately, its probability distribution is also known, although it is a bit more complex than a gaussian (after proper normalization, $(n-1)s^2/\sigma^2$ follows a Chi-square distribution with n - 1 degrees of freedom; see the tutorial on Chi-square). Calculations then yield the exact distribution of T, which bears the name of Student's t distribution.
Graphically, Student's t distribution looks much like a gaussian, the main difference being that its tails are "fatter" than those of a true gaussian. This can be interpreted as a consequence of the fact that estimating the standard deviation adds uncertainty compared with the known standard deviation case, which in turn makes extreme values of T less improbable.
Because the area under a probability density curve is always 1, the peak of a t curve is also lower than that of a gaussian. This can again be interpreted as a consequence of the true variance being unknown: values of T close to 0, that is, M very nearly equal to m0, are less likely than in the known variance case.
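The fatter tails can be observed by simulation. The sketch below (standard library only; tail_fractions is a hypothetical helper, and n = 5 is an assumed, deliberately small sample size) counts how often |T| exceeds 1.96 when the null hypothesis holds, once with the true σ and once with σ estimated from the sample.

```python
import math
import random
import statistics

def tail_fractions(n, repeats, threshold=1.96, seed=1):
    """Under the null hypothesis, estimate how often |T| > threshold when
    sigma is known and when sigma is estimated by the sample's s."""
    rng = random.Random(seed)
    m0, sigma = 0.0, 1.0  # T's distribution under m = m0 does not depend on these
    known = estimated = 0
    for _ in range(repeats):
        x = [rng.gauss(m0, sigma) for _ in range(n)]
        M = statistics.fmean(x)
        if abs(M - m0) / (sigma / math.sqrt(n)) > threshold:
            known += 1
        if abs(M - m0) / (statistics.stdev(x) / math.sqrt(n)) > threshold:
            estimated += 1
    return known / repeats, estimated / repeats

known, estimated = tail_fractions(n=5, repeats=20000)
print(f"|T| > 1.96 with known sigma: {known:.3f}, with estimated sigma: {estimated:.3f}")
```

The known-σ fraction should come out near 5%, the rate for a standard gaussian; the estimated-σ fraction should be noticeably larger, reflecting the fatter tails of Student's t distribution.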
In fact, the shape of the t distribution depends somewhat on n: we have to specify n to know the exact shape of the distribution. So there is not just one t distribution but a family of t distributions, indexed by a number called the "number of degrees of freedom" of the distribution, usually noted "df".
It can be shown that in the "One sample" case:
$$\mathrm{df} = n - 1$$
We can now state that T follows a t distribution with n - 1 degrees of freedom, a fact often written:
$$T \sim t(n - 1)$$
As n (and therefore df) grows without limit, the t curve looks more and more like a standard gaussian. This is not surprising: when n becomes very large, the estimated standard deviation becomes more and more accurate, and we are almost back to the case where the population variance $\sigma^2$ is known.
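The convergence can be seen numerically with the standard formula for the t density, $f(x) = \frac{\Gamma((\mathrm{df}+1)/2)}{\sqrt{\mathrm{df}\,\pi}\;\Gamma(\mathrm{df}/2)} \left(1 + \frac{x^2}{\mathrm{df}}\right)^{-(\mathrm{df}+1)/2}$. The sketch below (t_pdf is a hypothetical helper; log-gamma is used to avoid overflow for large df) compares the peak and the density at x = 3 with those of the standard gaussian for a few df values.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom.
    The constant is computed with log-gamma so large df does not overflow."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

normal = NormalDist()  # standard gaussian: mean 0, standard deviation 1
for df in (1, 4, 30, 1000):
    print(f"df={df:4d}  peak={t_pdf(0, df):.4f}  density at 3={t_pdf(3, df):.4f}")
print(f"gaussian  peak={normal.pdf(0):.4f}  density at 3={normal.pdf(3):.4f}")
```

The peak rises and the tail density falls as df grows, both approaching the gaussian values; for df = 1 the peak equals 1/π.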
We now turn to building the test proper.