
Chi-square (distribution)

Definition of the Chi-square distribution

The distribution of the sample mean of an n-sample drawn from the N(0,1) distribution is N(0,1/n), where n is the sample size. What is the distribution of the sample variance?

The concept of variance was defined so as to adequately describe the dispersion of the observations in a sample around the mean of the distribution. It turns out that the average of the squared distances of the observations in the sample to the distribution mean has "good" mathematical properties that justify a posteriori the definition of the variance.

So one is quite naturally led, for the special case of the N(0,1) distribution, to study the distribution of the sum of the squared differences between the observations in the sample and the distribution mean 0, which is simply the sum of the squared values of these observations.


The reason why we focus on the sum of these squares rather than on their average value is explained in the Tutorial below, see "Additivity".

So, by definition, the Chi-square distribution is that of the sum of the squared values of independent observations drawn from the N(0,1) distribution. It is denoted by χ².

More precisely, and more formally: if X1, X2, ..., Xn are independent r.v., all ~ N(0,1), then

X1² + X2² + ... + Xn² ~ χ²(n)
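The definition can be checked by simulation. The sketch below is only an illustration: the function name `chi_square_draw`, the seed and the sample sizes are arbitrary choices, not taken from the text. It uses the standard facts that a Chi-square distribution with n degrees of freedom has mean n and variance 2n.

```python
import random
import statistics

def chi_square_draw(n, rng):
    """Return one draw of X1² + ... + Xn², the Xi being independent N(0,1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(0)   # fixed seed (arbitrary), so that the run is reproducible
n = 5                    # number of squared observations in each sum
draws = [chi_square_draw(n, rng) for _ in range(20000)]

# Theory: mean n and variance 2n for n degrees of freedom.
print("empirical mean     :", statistics.fmean(draws), " theory:", n)
print("empirical variance :", statistics.variance(draws), " theory:", 2 * n)
```

The empirical values should be close to 5 and 10, up to sampling fluctuations.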

Degrees of freedom

So there is not one χ² distribution, but a family of distributions, indexed by n. This parameter is called the "number of degrees of freedom" of the distribution (this same expression is found in several other distribution families, like Student's t or Fisher's F).

The "Chi-square distribution with n degrees of freedom" is therefore that of the sum of the squares of n independent r.v., all ~ N(0,1).

The general N(µ,σ²) distribution

Sum of the squared differences

Let X be any normally distributed r.v.:

X ~ N(µ,σ²)

Recall that the change of variable:

X' = (X - µ)/σ

turns any normal r.v. X into a standard normal variable X' ~ N(0,1).

So if X ~ N(µ,σ²), the sum of the squared standardized observations of an n-sample is distributed as χ²(n).

Variance

In practice, one is more interested in the distribution of the average value of the squared differences than in that of their sum. Let:

S² = (1/n) Σi (xi - µ)²

The simple change of variable:

Mean = Sum / n

shows that:

nS²/σ² = Σi ((xi - µ)/σ)² ~ χ²(n)

Estimating the distribution mean

So far, we assumed that the distribution mean µ was known. In practice, this is rarely the case. So one is led to replace the true mean µ by its estimated value

x̄ = (1/n) Σi xi

in the above expression.

But x̄ is a random variable, and there is now no reason to believe that the modified nS²/σ² is still χ² distributed.

Distribution of the sample variance

We finally get to the question of interest to the analyst : "What is the distribution of the sample variance of the normal distribution ?".

Fundamental result

Let:

s² = 1/(n - 1) Σi (xi - x̄)²

which is the sample variance, an unbiased estimator of the distribution variance σ².

We will show that:

(n - 1)s²/σ² ~ χ²(n - 1)

 

 

So it appears that replacing the distribution mean by the sample mean does not change the nature of the distribution of the sample variance (it remains χ²): it simply reduces by one unit the number of degrees of freedom of the distribution.
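A quick simulation makes the shift from n to n - 1 visible. This is a sketch under arbitrary assumptions (the values of n, µ and σ, the seed and the helper name `scaled_sums` are illustrative only). Since a Chi-square distribution has a mean equal to its number of degrees of freedom, the scaled sum of squares around the true mean should average about n, and the one around the sample mean about n - 1.

```python
import random
import statistics

def scaled_sums(n, mu, sigma, rng):
    """Draw one n-sample from N(mu, sigma²) and return the two scaled sums of squares."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    known = sum((x - mu) ** 2 for x in sample) / sigma ** 2        # mean known
    estimated = sum((x - xbar) ** 2 for x in sample) / sigma ** 2  # mean estimated
    return known, estimated

rng = random.Random(1)            # arbitrary fixed seed
n, mu, sigma = 4, 10.0, 3.0       # illustrative values
pairs = [scaled_sums(n, mu, sigma, rng) for _ in range(20000)]

mean_known = statistics.fmean([k for k, _ in pairs])
mean_estimated = statistics.fmean([e for _, e in pairs])
print("true mean used   :", mean_known, " expected about", n)
print("sample mean used :", mean_estimated, " expected about", n - 1)
```

Note that in every sample the sum of squares around x̄ is at most the sum of squares around µ, because x̄ is the value that minimizes the sum of squared differences.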

 

This result is fundamental.


Note that replacing the distribution variance by the sample variance has a deeper effect on the distribution of the standardized sample mean: this distribution is then no longer normal, but is a Student's t distribution instead.

Losing a degree of freedom

The transition from "n" to "n - 1" is called "losing a degree of freedom". This phenomenon is quite general, and will be encountered in other circumstances involving χ² distributions, t distributions or F distributions. It is a consequence of having to replace an unknown parameter by its estimate.
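The following identity gives a heuristic accounting of where the lost degree of freedom goes. It is offered as a sketch, not as the demonstration given in the Tutorial. It is obtained by writing xi - µ = (xi - x̄) + (x̄ - µ), with S² the variance computed around the known mean µ and s² the sample variance:

```latex
\sum_{i=1}^{n} (x_i - \mu)^2
  = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n\,(\bar{x} - \mu)^2
% the cross term 2(\bar{x} - \mu)\sum_i (x_i - \bar{x}) vanishes

\frac{n S^2}{\sigma^2}
  = \frac{(n-1)\, s^2}{\sigma^2}
  + \left( \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \right)^2
```

The left-hand side is χ² distributed with n degrees of freedom. The last term is the square of an N(0,1) variable, since x̄ ~ N(µ,σ²/n), and so has one degree of freedom. Taking for granted that the two terms on the right are independent, the remaining n - 1 degrees of freedom belong to (n - 1)s²/σ².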

Animation

You'll find here an interactive animation illustrating the Chi-square distribution. It compares the distributions of the sample variance depending on whether the distribution mean is assumed to be known, or else estimated.

Independence of the sample mean and the sample variance

In the course of demonstrating the above result, we'll incidentally demonstrate another important result pertaining to the normal distribution :

 

The sample mean x̄ and the sample variance s² are independent random variables.

 

 

This is a characteristic property of the normal distribution: a distribution such that the sample mean and the sample variance are independent r.v. is necessarily normal (the proof of this converse is difficult).
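The property can be illustrated, though not proved, by simulation. The sketch below estimates the correlation between the sample mean and the sample variance over many samples. A correlation near 0 is only a necessary condition for independence, and the exponential distribution used for contrast is an arbitrary choice of a non-normal distribution; the helper names are hypothetical.

```python
import math
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mean_variance_correlation(draw, n, repetitions):
    """Correlation between sample mean and sample variance over many n-samples."""
    means, variances = [], []
    for _ in range(repetitions):
        sample = [draw() for _ in range(n)]
        means.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))
    return pearson(means, variances)

rng = random.Random(3)   # arbitrary fixed seed
corr_normal = mean_variance_correlation(lambda: rng.gauss(0.0, 1.0), 5, 5000)
corr_exponential = mean_variance_correlation(lambda: rng.expovariate(1.0), 5, 5000)
print("normal samples      :", corr_normal)
print("exponential samples :", corr_exponential)
```

For normal samples the correlation should be close to 0, whereas for the skewed exponential distribution it is clearly positive: samples with a large mean also tend to have a large variance.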

Importance of the distribution

The basic Chi-square test

Now that the distribution of the sample variance is known, it is possible to test the hypothesis H0: σ² = σ0² about the true value of the variance of a normal distribution.
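A minimal sketch of such a test, using only the Python standard library. Under H0, the statistic (n - 1)s²/σ0² is χ² distributed with n - 1 degrees of freedom. The helper names `chi2_cdf` and `variance_test`, the simulated data, and the two-sided p-value obtained by doubling the smaller tail are assumptions made for illustration, not prescriptions of the text.

```python
import math
import random
import statistics

def chi2_cdf(x, df):
    """P(Chi-square with df degrees of freedom <= x), by the power series
    of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > total * 1e-15:      # stop once the terms become negligible
        term *= z / (a + k)
        total += term
        k += 1
    return min(1.0, total * math.exp(-z + a * math.log(z) - math.lgamma(a)))

def variance_test(sample, sigma0_sq):
    """Return the statistic (n - 1)s²/sigma0_sq and a two-sided p-value."""
    n = len(sample)
    statistic = (n - 1) * statistics.variance(sample) / sigma0_sq
    lower_tail = chi2_cdf(statistic, n - 1)
    p_value = min(1.0, 2.0 * min(lower_tail, 1.0 - lower_tail))
    return statistic, p_value

rng = random.Random(2)                                # arbitrary fixed seed
sample = [rng.gauss(5.0, 3.0) for _ in range(30)]     # true variance is 9

stat_true, p_true = variance_test(sample, 9.0)        # H0 is true here
stat_false, p_false = variance_test(sample, 1.0)      # H0 is badly wrong here
print("H0: variance = 9 ->", stat_true, p_true)
print("H0: variance = 1 ->", stat_false, p_false)
```

The second hypothesis should be rejected at any usual level, since the statistic is then about nine times larger than a χ² variable with 29 degrees of freedom would typically be.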

-----

But the importance of the χ² distribution extends beyond the issue of the variance of a normal distribution. It happens that several important statistics approximately follow χ² distributions for large samples, and that it is therefore possible to design tests about the corresponding quantities.

Goodness-of-fit Chi-square test

A common problem in Statistics is to assess the plausibility of the assertion : "This sample was generated by this candidate distribution". It is possible to test this hypothesis through a statistic that follows approximately a Chi-square distribution.
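The statistic usually meant here is Pearson's, which is not written out in the text above: the sum over the categories of (observed - expected)²/expected. For a fully specified candidate distribution over k categories it is approximately χ² distributed with k - 1 degrees of freedom. A minimal sketch, with hypothetical die-roll counts:

```python
def goodness_of_fit_statistic(observed, probabilities):
    """Pearson's statistic and its degrees of freedom for a fully
    specified candidate distribution over the categories."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df

# Hypothetical data: 60 rolls of a die, tested against a fair die.
observed_counts = [8, 12, 9, 11, 10, 10]
fair_die = [1 / 6] * 6

gof_statistic, gof_df = goodness_of_fit_statistic(observed_counts, fair_die)
print("statistic:", gof_statistic, " degrees of freedom:", gof_df)
```

A small value of the statistic relative to the χ² distribution with 5 degrees of freedom means the counts are compatible with the candidate distribution.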

Chi-square of identity

In the same spirit, it is possible to test the assertion : "These two samples were drawn from identical distributions" with a statistic that follows approximately a Chi-square distribution.

Chi-square test of independence

Given two discrete variables X and Y over finite ranges, it is possible to test the hypothesis "X and Y are independent" with a statistic that follows approximately a Chi-square distribution.
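As a sketch of how this statistic is usually computed (Pearson's form, an assumption since the text does not spell it out): the expected count of each cell under independence is (row total × column total)/grand total, and the statistic is approximately χ² distributed with (rows - 1)(columns - 1) degrees of freedom. The contingency table below is hypothetical.

```python
def independence_statistic(table):
    """Pearson's statistic and degrees of freedom for a contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count of cell (i, j) if X and Y were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical 2 x 2 table of counts for X (rows) and Y (columns).
counts = [[20, 30],
          [30, 20]]
ind_statistic, ind_df = independence_statistic(counts)
print("statistic:", ind_statistic, " degrees of freedom:", ind_df)
```

Here every expected count is 25 and every observed count differs from it by 5, so the statistic is 4 × 25/25 = 4 with one degree of freedom.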

___________________________________

 

 

Tutorial 1

 

In this Tutorial, we establish the basic properties of the Chi-square distribution. We have here an excellent example of the efficacy of the moment generating function, without which calculating the probability density function of the Chi-square distribution would be difficult.

We will also use the fact that the Chi-square distribution is a special case of the Gamma distribution.
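For reference, the standard results that the Tutorial arrives at can be summarized as follows (stated here without proof; f_n and M_n are notations introduced for this summary):

```latex
% Probability density function of the Chi-square distribution, n degrees of freedom
f_n(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\; x^{n/2 - 1}\, e^{-x/2}, \qquad x > 0

% Moment generating function
M_n(t) = (1 - 2t)^{-n/2}, \qquad t < 1/2

% Additivity: sum of independent Chi-square variables with m and n degrees of freedom
M_m(t)\, M_n(t) = (1 - 2t)^{-(m+n)/2} = M_{m+n}(t)
```

The density is that of a Gamma distribution with shape parameter n/2 and scale parameter 2. The last line shows why the moment generating function is so convenient: the sum of two independent Chi-square variables is again Chi-square distributed, and the degrees of freedom add up.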

-----

Knowing the explicit (and complicated) analytical form of the Chi-square distribution is not as useless as it may first seem. For example, it will come in handy for identifying a sufficient statistic for the variance of the normal distribution.

 

 

 

BASIC PROPERTIES OF THE CHI-SQUARE DISTRIBUTION

The χ²(1) distribution

Cumulative distribution function of χ²(1)

Probability density function of χ²(1)

Moment generating function of χ²(1)

Moment generating function of χ²(n)

Probability density function of χ²(n)

Moments, mode

Mean

Variance

Mode

Special cases

n = 2 : exponential distribution

n = 1 : vertical asymptote

Additivity

 

 

TUTORIAL

 ___________________________________________________________________

 

 

Tutorial 2

 

We now demonstrate the fundamental result :

(n - 1)s²/σ² ~ χ²(n - 1)

that expresses the fact that replacing the true distribution mean by the sample mean:

    * Preserves the "χ²" nature of the distribution of the sample variance,

    * But causes a loss of one degree of freedom of this distribution.

 

We first go over the case of a 2-observation sample, for which the demonstration and the final result can be represented graphically.

We then move on to the demonstration for samples of any size. We'll use an elementary demonstration that does not call on Linear Algebra.

-----

This demonstration will incidentally establish another very important result :

The sample mean x̄ and the sample variance s² of the normal distribution are independent random variables.

 

 

DISTRIBUTION OF THE SAMPLE VARIANCE OF THE NORMAL DISTRIBUTION

Case n = 2

The general case

Another expression for the sample variance

Changing the reference frame

Distribution of the sample variance

Independence of the sample mean and the sample variance

 

TUTORIAL

 

 

 

* Shapes of the distribution (number of degrees of freedom is adjustable).
* Progressive histogram of the true variance or of the estimated variance.

 

 

 

 

 

 

 

 

________________________________________________________________

 

Related readings:

Normal distribution

Distribution of the empirical variance of a normal distribution

Distribution of the empirical standard deviation of a normal distribution

χ²(1) as the square of a r.v.

Chi-square tests

Gamma distribution

 
