
 

 


THE t-TEST

What is a t-test ?

Can we trust a sample average ?

What does confidence depend on ?

Sample spread

Sample size

The T statistic

The assumptions

The variance is known

The variance is unknown. Student's t-distribution

Degrees of freedom

 

 

 

What is a t-test ?

Can we trust a sample average ?

    t-tests deal with the general question of how much trust can be placed in a sample average. Here is an example. A bank plans a new savings product. Before actually introducing the product, it polls a fraction of its customer base in its own town in order to figure out how much money interested customers are considering investing in this new product. The bank collects the following numbers :

 

 

Customer    Poll
Smith       950$
Jones       1,100$
.....       ......
Doe         800$
Brown       1,200$
Average     975$

 

 

Decisions will be made on the basis of this observed "975$" average value. How trustworthy is this number ? More specifically, t-tests will answer 3 kinds of questions, each one dealing with a slightly different problem.

    1) The bank had anticipated a 1,000$ average answer to the poll. Is the 25$ difference with the observed 975$ significant ? The answer to this question is in the so-called "One sample t-test".

    2) The project has been delayed, and the same customers that had been polled six months earlier are now polled again. The new average answer is 980$. Is the difference between this new average and the old average significant ? The answer to this question lies in the so-called "Two dependent samples t-test".

    3) A similar poll is conducted in a neighboring town, and leads to an average answer of 995$. Is the difference between the two averages significant ? The answer to this question lies in the so-called "Two independent samples t-test".

 

Each of these questions will lead to a slightly different form of the test.

 

What does confidence depend on ?

First, note that the test cannot work on average values alone. For example, the question "Is 975 significantly different from 1,000 ?" is meaningless. It becomes more meaningful, though, if you take all the numbers into consideration. Here are the two quantities that your trust in an average value will rely on. We describe them within the context of comparing an average value to a Reference Value ("One sample" test), but they are the same in the other two situations.

Sample spread

        For a given difference between sample average and Reference Value, you will intuitively trust samples with very low dispersion, and distrust samples with broad dispersion, and your intuition is right. In the extreme case, were all answers very nearly "975$", the bank would be almost sure that the "1,000$" anticipation was indeed optimistic.

Sample size

        If the bank collected 10,000 answers, it will certainly have more faith in the observed "975$" than if it collected only 20 answers.

 

 

So, how much faith we have in an observed average clearly depends on :

    1) The size of the sample : the larger the sample, the more confidence in the observed average.

    2) The dispersion of the values around the average : the smaller the dispersion, the more confidence in the observed average.
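The two effects listed above can be checked with a small simulation. This is only a sketch: it draws answers from a normal population with an assumed mean of 1,000$ and assumed standard deviations of 200$ and 50$ (none of these figures come from the bank's poll), repeats the poll many times, and measures how much the observed average moves from one poll to the next. The function name `spread_of_averages` is made up for the illustration.

```python
import random
import statistics

def spread_of_averages(n, sd, mean=1000.0, polls=500, seed=0):
    """Standard deviation of the sample average over many simulated polls.

    n    : number of customers in each poll
    sd   : assumed standard deviation of individual answers
    mean : assumed population mean (hypothetical value)
    """
    rng = random.Random(seed)
    averages = [
        statistics.fmean(rng.gauss(mean, sd) for _ in range(n))
        for _ in range(polls)
    ]
    return statistics.stdev(averages)

small_sample = spread_of_averages(n=20, sd=200.0)     # few, widely spread answers
large_sample = spread_of_averages(n=500, sd=200.0)    # many answers, same spread
narrow_answers = spread_of_averages(n=20, sd=50.0)    # few, tightly grouped answers

print("n=20,  sd=200:", small_sample)
print("n=500, sd=200:", large_sample)
print("n=20,  sd=50: ", narrow_answers)
```

The average fluctuates less both when the poll is larger and when the individual answers are less dispersed, which is exactly the intuition stated above.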

The T statistic

The assumptions

The test will assume that the population at large follows a normal distribution (with mean m and variance σ²). This may sound rather restrictive, but it is in fact a very reasonable assumption. The value of m is assumed to be equal to m0, and the value of σ² will first be considered as known, then will be considered as unknown.

We will therefore denote by m0 the 1,000$ mark, and call it the Reference Value.


Before we go any further,  we have to make it clear that the sample average M has to be considered a random variable. What that means is that if we had conducted the poll over and over again on different people each time, we would have each time observed a different value for M. This is the reason why we cannot trust M entirely.

 

It is a classical result that M has a normal distribution with mean m and variance σ²/n, where n is the number of observations (polled people). This confirms our earlier intuition that M's randomness decreases with the sample size, and increases with the population's variance (assuming that the sample's variance is a fair estimate of the population's variance. We'll come back to this point later).


Note the scale on the illustrations. In the bottom picture, there are 4 times as many observations as in the top picture, but the gaussian curve is only half as wide, and only twice as high as that of the top illustration.
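The "4 times as many observations, half the width, twice the height" rule follows directly from the variance σ²/n of M. The sketch below computes the standard deviation of M and the height of its gaussian curve at the peak for two sample sizes. The values σ = 200$, n = 25 and n = 100 are assumptions chosen for the illustration, and `mean_distribution` is a hypothetical helper name.

```python
import math

def mean_distribution(sigma, n):
    """Return (standard deviation of M, peak height of M's gaussian density)."""
    sd_of_mean = sigma / math.sqrt(n)                       # sqrt(sigma^2 / n)
    peak_height = 1.0 / (sd_of_mean * math.sqrt(2.0 * math.pi))
    return sd_of_mean, peak_height

sigma = 200.0                                               # assumed population standard deviation
sd_top, peak_top = mean_distribution(sigma, n=25)           # "top picture"
sd_bottom, peak_bottom = mean_distribution(sigma, n=100)    # 4 times as many observations

print("width ratio :", sd_bottom / sd_top)      # 0.5
print("height ratio:", peak_bottom / peak_top)  # 2.0
```

Because the width only shrinks like the square root of n, quadrupling the sample size is needed to halve the uncertainty on M.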

We are going to illustrate how the test works by considering the "One sample test" case. The two other tests ("Two dependent samples" and "Two independent samples") are very similar.

The one sample t-test

So we have one sample with a numerical variable x. This sample contains n observations. We also have a reference value m0, that is the anticipated value of the mean of x over the whole population (not just the sample). The sample average is M. It is different from m0, and we wonder whether this difference is significant.

Note that the whole population is assumed to be normal, an important assumption.

The variance σ² is known

How can we quantify the discrepancy between observed average M and the reference value m0 ? We are looking for a quantity T that will be a fair indicator of how bad the departure of M from m0 is.

Our first idea is simply to use T = (M - m0) : the larger this difference, the more we believe that it reflects a genuine error in our initial idea that m =  m0.

But that won't work because for a given value of T, this difference is :

    * more significant for samples with many observations than for samples with few observations,

    * and more significant for "narrowly distributed" samples than for "broadly distributed" samples,

 

and we are looking for a quantity T whose significance does not depend on sample size or variance. So we "normalize" (M - m0) by dividing it by its own standard deviation, which is the standard deviation of M, that is σ/√n :

T = (M - m0) / (σ / √n)

The value of T measures how far away M is from the Reference Value m0 in "Standard Deviation of M" units.

As long as the assumption m = m0 holds, T now follows a standard normal distribution (mean 0, standard deviation 1), irrespective of sample size and variance. How the test is actually conducted from there is explained in the next paragraph.
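As a worked illustration of the normalized statistic, the sketch below plugs in the bank's figures (M = 975$, m0 = 1,000$) together with an assumed known standard deviation σ = 200$ and two assumed sample sizes. Neither σ nor n is given in the example above, and the function name is made up for the illustration.

```python
import math

def known_variance_statistic(sample_mean, reference_value, sigma, n):
    """T = (M - m0) / (sigma / sqrt(n)), valid when sigma is known."""
    standard_deviation_of_mean = sigma / math.sqrt(n)
    return (sample_mean - reference_value) / standard_deviation_of_mean

# sigma = 200$ and the two sample sizes are assumed values, not data from the poll.
t_100 = known_variance_statistic(975.0, 1000.0, sigma=200.0, n=100)
t_400 = known_variance_statistic(975.0, 1000.0, sigma=200.0, n=400)

print(t_100)  # -1.25
print(t_400)  # -2.5
```

The same 25$ difference sits 1.25 "Standard Deviations of M" away from m0 with 100 answers, but 2.5 away with 400 answers: the statistic already accounts for sample size.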


Any quantity built from the sample (here, T) is called a "statistic".

The variance σ² is unknown. Student's t distribution

We have been very optimistic in assuming that the variance of the population was known. More often than not, it is not. Yet, we want to salvage our idea of "normalizing" the difference (M - m0) by its own Standard Deviation. So we think of replacing the true Standard Deviation σ by its estimated value from the sample. We have now :
 

T = (M - m0) / (Estimated Stand. Dev. / √n)

 

 

The problem is that when the sample changes, so does the Estimated Standard Deviation, which then has to be considered as a random variable too. Fortunately, its probability distribution is also known, although a bit more complex than a gaussian (Chi-square after proper normalization, see tutorial on Chi-square). Calculations then allow us to work out the exact distribution of T, which bears the name of Student's t distribution.
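The statistic with an estimated Standard Deviation can be computed as in the following sketch. The eight answers are invented for the illustration (the real poll data is not listed in full), and the Standard Deviation is estimated with the usual n - 1 denominator, which is what `statistics.stdev` computes. The text above does not spell out the estimator, so treat that choice as an assumption.

```python
import math
import statistics

def one_sample_t_statistic(sample, reference_value):
    """T = (M - m0) / (estimated standard deviation / sqrt(n))."""
    n = len(sample)
    sample_mean = statistics.fmean(sample)
    estimated_sd = statistics.stdev(sample)   # sample estimate, n - 1 denominator
    return (sample_mean - reference_value) / (estimated_sd / math.sqrt(n))

# Hypothetical poll answers in dollars, made up for this example.
answers = [950.0, 1100.0, 800.0, 1200.0, 1000.0, 900.0, 1050.0, 850.0]

t_value = one_sample_t_statistic(answers, reference_value=1000.0)
print("T =", t_value)
```

A different set of eight customers would give a different average and a different estimated Standard Deviation, so both the numerator and the denominator of T are random.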

 

Graphically, Student's t distribution looks pretty much like a gaussian, the main difference being that its tails are "fatter" than those of a true gaussian. This can be interpreted as a consequence of the fact that estimating the Standard Deviation adds uncertainty compared with the case of a known Standard Deviation, which, in turn, makes extreme values of T less improbable.

Because the area under a distribution curve is always 1, the peak of a t curve is also lower than that of a gaussian. This can again be interpreted as a consequence of the true variance being unknown: values of T close to 0, that is, M being very nearly m0, are less likely than in the "known variance" case.
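Both remarks (fatter tails, lower peak) can be checked numerically from the standard formula for the density of Student's t distribution. The sketch below uses 4 degrees of freedom as an arbitrary example; the helper name `t_density` is made up, and the gaussian density comes from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

gaussian = NormalDist()   # standard gaussian: mean 0, standard deviation 1
df = 4                    # arbitrary choice for the illustration

for x in (0.0, 1.0, 3.0):
    print(f"x = {x}: t density = {t_density(x, df):.4f}, "
          f"gaussian density = {gaussian.pdf(x):.4f}")
```

At x = 0 the t curve lies below the gaussian, while at x = 3 it lies well above it.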

Degrees of freedom


In fact, the shape of the t distribution depends somewhat on n : we have to specify n to know the exact shape of the distribution.

So there is not just one t distribution, there is a family of t distributions, indexed by a number that is called the "number of degrees of freedom" of the distribution, and that is usually denoted "df". It can be shown that in the "One sample" case, one has :
 

df = n - 1


We can now state that T follows a t distribution with n - 1 degrees of freedom, a fact that is often written :

T ~ t(n - 1)

As n (and therefore df) grows without limit, the t curve looks more and more like a standard gaussian. This is not surprising: when n becomes very large, the estimated standard deviation becomes more and more reliable, and we are then almost back to the case where the population variance σ² is known.
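The convergence towards the gaussian can be made concrete by measuring the largest gap between the two density curves for increasing df. This is a rough numerical sketch: the gap is evaluated on a grid from -6 to 6 in steps of 0.1, the df values are arbitrary, and the helper names are invented for the illustration.

```python
import math
from statistics import NormalDist

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def max_gap(df):
    """Largest absolute difference between the t and gaussian densities on a grid."""
    gaussian = NormalDist()
    grid = [i / 10 for i in range(-60, 61)]
    return max(abs(t_density(x, df) - gaussian.pdf(x)) for x in grid)

gaps = {df: max_gap(df) for df in (2, 10, 100, 1000)}
for df, gap in gaps.items():
    print(f"df = {df:4d}: largest gap = {gap:.5f}")
```

The gap shrinks steadily as df increases, which is why the gaussian is a reasonable stand-in for the t distribution when the sample is large.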

 

We now turn to building the test proper.