
Central Limit Theorem

One of the cornerstones of the theory of random variables.

# General idea of the Central Limit Theorem

Let X be a continuous random variable described by a probability density with mean µ and variance σ². In a nutshell, the Central Limit Theorem (CLT) states that for large samples, the sample mean X̄ₙ of this distribution is approximately normally distributed. Moreover, it adds that the distribution of the sample mean can be made as close to normal as desired: all it takes is to consider larger and larger samples.

In addition, for any distribution (with a mean and a variance) and any sample size n, although the distribution of X̄ₙ is usually unknown (with some notable exceptions such as the normal, Chi-square, binomial or Poisson distributions), the following is always true:

* The mean of X̄ₙ is µ, the mean of the distribution.

* The variance of X̄ₙ is σ²/n.

So one can build the standardized variable

    X̄ₙ' = (X̄ₙ − µ) / (σ / √n)

whose mean is 0 and whose variance is 1.

The CLT then states that for large samples, the distribution of X̄ₙ' is approximately N(0, 1), and that the approximation gets better and better as n grows without limit.
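This statement can be checked numerically. The sketch below (with an arbitrarily chosen Uniform(0, 1) parent distribution and sample size, not taken from the animation itself) simulates many standardized sample means and confirms that they behave like draws from N(0, 1):

```python
import random
import statistics

# Parent distribution: Uniform(0, 1), with mean mu = 1/2 and
# variance sigma^2 = 1/12.  Sample size n is chosen arbitrarily.
random.seed(0)
mu, sigma2, n = 0.5, 1.0 / 12.0, 30

def standardized_sample_mean():
    sample = [random.random() for _ in range(n)]
    xbar = sum(sample) / n
    return (xbar - mu) / (sigma2 / n) ** 0.5

values = [standardized_sample_mean() for _ in range(20000)]

print(round(statistics.mean(values), 2))    # close to 0
print(round(statistics.pstdev(values), 2))  # close to 1
# For N(0, 1), about 68.3% of the values fall in [-1, 1]:
print(round(sum(-1 <= v <= 1 for v in values) / len(values), 2))
```

Even for a modest sample size, the standardized sample mean is already very close to standard normal, which is the point of the experiments suggested below.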

-----

This is the most commonly perceived idea of what the CLT is all about. But we'll see that, although correct, this idea does not tell the whole story.

# Animation

The following animation illustrates the CLT.


Upper frame

a) The green rectangle is a uniform distribution. A sample drawn from this distribution is also displayed, and the tall vertical red line is the sample mean. You may change the sample size with the "Sample size" buttons; a new sample is drawn each time you click on one of these buttons.

b) You can carve out any (limited support) density function you like by repeatedly clicking inside the frame (even within the green area). The red gaussian curve has the same mean and variance as the green density function.

Lower frame

The red gaussian curve is the standard normal distribution N(0, 1), which will be used as a reference. It remains unchanged as the green density in the upper frame, or the sample size, is changed. Its horizontal and vertical scales are arbitrary, and not related to those of the upper frame.

According to the CLT, the distribution of the standardized sample mean X̄ₙ' = (X̄ₙ − µ)/(σ/√n) should get closer and closer to this gaussian as n grows without limit. This is what the animation illustrates.

Animation

Click on "Go", and watch the progressive build-up of the histogram of the sample mean of your distribution. For some densities, the histogram may be too tall for the frame; if so, reduce the vertical scale "on the fly" with the "Vert. scale" button.

We suggest the following experiments (among many others):

* Keep the initial uniform distribution, and observe the distributions of the sample mean for increasing values of n. For n = 2, observe that the distribution is triangular. This is demonstrated here. Even for low values of n, the distribution is very close to the reference gaussian curve.

* For any distribution you build, set the number of points to 1. The distribution of the sample "mean" will then just be a duplicate of the original distribution.

* Build a distribution that is "as different as possible" from a normal distribution. For example, you may build a concave, parabola-shaped distribution whose p.d.f. is 0 at the mean. No point will ever be created there. Yet, for large values of n, this is where the p.d.f. of the sample mean will reach its largest value!

* Keep the same "concave" distribution, and set the number of points to 2. (To change the value of n while retaining the current distribution, first click on "Pause", then change n before clicking on "Go" again.) The distribution of the sample mean is now strongly modulated, with 3 "humps" and 2 "troughs". Can you interpret this structure, either in terms of self-convolution of the current density, or with a more heuristic approach, asking "For a given abscissa x, what kind of likely sample will have x for the value of its sample mean?"

* Now retain this same distribution, and launch the animation again for larger and larger values of n. The number of "humps" gets larger as n gets larger, but the "troughs" become shallower and shallower. Can you explain why? Observe that the histogram gets closer to the reference gaussian as n gets larger.
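The first of these experiments can be reproduced outside the animation. The sketch below (assuming a Uniform(0, 1) parent, as in the animation's initial state) estimates the density of the sample mean for n = 2 and recovers the triangular shape, which peaks at the center with density 2 and vanishes at the endpoints:

```python
import random

# For a Uniform(0, 1) parent and sample size n = 2, the sample mean
# (X1 + X2)/2 has the triangular density f(x) = 4x on [0, 1/2] and
# f(x) = 4(1 - x) on [1/2, 1].
random.seed(1)
means = [(random.random() + random.random()) / 2 for _ in range(100000)]

def density_near(x, h=0.05):
    """Estimate the p.d.f. at x from the fraction of means in [x - h, x + h]."""
    return sum(x - h <= m <= x + h for m in means) / (len(means) * 2 * h)

print(round(density_near(0.5), 1))   # near the triangular peak value 2
print(round(density_near(0.05), 1))  # small, near the edge of the support
```

The same empirical-density trick can be used to explore the "humps and troughs" of the concave distribution described in the last two experiments.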

# The Central Limit Theorem

Let's now formulate the Central Limit Theorem in terms that are both more general and more correct.

1) Since N(0, 1) is a continuous distribution, it would seem that the CLT applies only to continuous distributions. In fact it applies to any distribution (continuous, discrete or other) with a mean and a variance.

2) The CLT does not address the convergence of probability density functions (or of probability mass functions), but the convergence of distribution functions.

It states that, for any distribution with a mean and a variance, the distribution function of the standardized sample mean X̄ₙ' converges to the distribution function Φ(.) of the standard normal distribution when the sample size n grows without limit. This fundamental result is expressed as:

    P(X̄ₙ' ≤ x) → Φ(x) for every real x, as n → ∞

We are in a situation where the distribution function of a r.v. (the standardized sample mean) converges to a limit function (the distribution function of the standard normal distribution) : this kind of convergence is called "convergence in distribution".
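Convergence in distribution can be made concrete with a small simulation (a sketch with an assumed Uniform(0, 1) parent, chosen only for illustration): compare the empirical distribution function of the standardized sample mean with Φ at a few points.

```python
import math
import random

def phi(x):
    """Distribution function of N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ecdf_of_standardized_mean(n, draws=20000):
    """Empirical distribution function of the standardized sample mean."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)   # Uniform(0, 1) parent
    vals = [
        (sum(random.random() for _ in range(n)) / n - mu) / (sigma / math.sqrt(n))
        for _ in range(draws)
    ]
    return lambda x: sum(v <= x for v in vals) / draws

random.seed(2)
F = ecdf_of_standardized_mean(n=20)
for x in (-1.0, 0.0, 1.0):
    print(x, round(F(x), 2), round(phi(x), 2))  # the two columns nearly agree
```

The pointwise distances |F(x) − Φ(x)| shrink as n and the number of draws increase, which is exactly what "convergence in distribution" asserts.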

# Central Limit Theorem and probability distributions

## Discrete random variables

The distribution function of a discrete rv is a step function (and is therefore not continuous), but an infinite sequence {Fn(x)} of discontinuous functions may very well converge to a continuous limit F(x) (as shown in this illustration), and such is indeed the case when the CLT is applied to a discrete rv.

Consider for example the binomial distribution. Its probability mass function assigns probabilities to integer values of the variable. The sample mean also has a probability mass function made of "spikes". The distance between these spikes tends to 0 as the sample size grows without limit, but such a sequence of functions does not converge to anything in the sense of Calculus (see animation).

But the distribution function of the standardized sample mean, although it is not continuous (it is a step function), does converge to the distribution function of the standard normal distribution.
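This can be verified exactly for a simple discrete parent. The sketch below (assuming a Bernoulli(0.3) parent, chosen only for illustration) computes the exact value of the distribution function of the standardized sample mean at x = 0 and watches its distance to Φ(0) shrink as n grows:

```python
import math

def phi(x):
    """Distribution function of N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_standardized_mean(x, n, p=0.3):
    # X-bar_n = S/n with S ~ Binomial(n, p).  Standardizing with mu = p and
    # sigma^2 = p(1 - p) gives P(X-bar'_n <= x) = P(S <= n p + x sqrt(n p (1 - p))).
    cutoff = math.floor(n * p + x * math.sqrt(n * p * (1 - p)))
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(cutoff + 1))

dists = [abs(cdf_standardized_mean(0.0, n) - phi(0.0)) for n in (10, 100, 1000)]
for n, d in zip((10, 100, 1000), dists):
    print(n, round(d, 3))   # the distance to Phi(0) = 0.5 decreases with n
```

The distribution function remains a step function for every finite n, yet its values approach those of the continuous limit Φ.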

## Convergence of probability density functions

Recall that just because a sequence {Fₙ} of differentiable functions converges to a differentiable limit F, the sequence {Fₙ'} of derivatives need not converge to the derivative F' of F, as shown in this illustration.

Here, although {Fn} converges to F(x) = 0 (whose derivative is 0), the sequence of derivatives clearly does not converge to 0 (or to anything for that matter).
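A classical counterexample of this kind (given here for illustration; it may differ from the one in the illustration referred to above) is:

```latex
F_n(x) = \frac{\sin(n^2 x)}{n},
\qquad
|F_n(x)| \le \frac{1}{n} \xrightarrow[n \to \infty]{} 0
\quad \text{(uniform convergence to } F(x) = 0\text{)},
\qquad
F_n'(x) = n \cos(n^2 x).
```

The derivatives oscillate with amplitude n and converge nowhere, even though the limit F has derivative 0 everywhere.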

Recall also that a probability density is the derivative of the corresponding distribution function.

Now if the random variable X has a probability density, then so does the standardized sample mean. But the above example tells us that just because the distribution function of the sample mean converges to that of a standard normal variable (CLT), we should not blindly conclude that the probability density function of the standardized sample mean converges to the density of the standard normal distribution.

-----

In fact, it is true that under rather weak conditions, if X has a probability density, then the probability density of the standardized sample mean converges to the density of the standard normal distribution N(0, 1). But this result is not the CLT, and requires a (difficult) demonstration not given in this Glossary. It can be explained in loose terms by the fact that the convolution of a function with itself has the side effect of "smoothing out" the oscillations of the function. Iterate the convolution indefinitely, and you end up with a perfectly smooth function: the gaussian function.
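This smoothing effect is easy to observe numerically. The sketch below (a loose illustration, not the proof) starts from a "concave" bimodal density sampled on a grid, zero at its mean, and convolves it with itself repeatedly:

```python
# Discrete convolution: density of the sum of two independent variables.
def convolve(p, q):
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

weights = [4, 3, 1, 0, 1, 3, 4]          # high at the edges, zero at the center
p = [w / sum(weights) for w in weights]

d = p
for _ in range(4):
    d = convolve(d, d)                   # sums of 2, 4, 8, then 16 i.i.d. copies

# After a few iterations the "troughs" have filled in: the density is now
# largest near its center, where the original density was zero.
print(round(d[len(d) // 2], 4), round(max(d), 4))
```

This is precisely the phenomenon observed in the animation: the humps multiply, the troughs grow shallow, and the histogram settles toward a smooth bell shape.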

# Limitations of the Central Limit Theorem

Although the range of applicability of the CLT is immense, the CLT is not universal. In particular, it requires that the distribution under consideration have both a mean and a variance. If that is not the case, the CLT breaks down.

The most obvious failure of the CLT is to be found with the Cauchy distribution, which does not have any moment. In this particular case, the sample mean always has the same distribution (Cauchy), irrespective of the sample size.
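A quick simulation makes this failure visible (a sketch using the standard inverse-transform construction of Cauchy variables): the spread of the sample mean does not shrink at all as n grows, whereas the CLT would predict a reduction by a factor √n.

```python
import math
import random

# A standard Cauchy variable can be generated as tan(pi * (U - 1/2))
# with U uniform on (0, 1).  Its quartiles are -1 and +1, so its
# interquartile range is 2.
random.seed(3)

def cauchy():
    return math.tan(math.pi * (random.random() - 0.5))

def iqr_of_sample_mean(n, draws=20000):
    means = sorted(sum(cauchy() for _ in range(n)) / n for _ in range(draws))
    return means[3 * draws // 4] - means[draws // 4]

print(round(iqr_of_sample_mean(1), 1))    # about 2
print(round(iqr_of_sample_mean(100), 1))  # still about 2, not 2/sqrt(100)
```

The interquartile range is used here instead of the standard deviation precisely because the Cauchy distribution has no moments at all.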

Two other classic distributions with no mean are Fisher's F(n, 1) and F(n, 2).

# Generalizations of the Central Limit Theorem

There are many versions of the CLT that differ by the assumptions that are formulated about the considered probability distribution. For example :

* A version weaker than the one we stated above assumes that the distribution has a moment generating function. This is the version we demonstrate in the Tutorial below.

* Conversely, a stronger version does not refer to the sample mean of a distribution, but rather to the mean of a series of independent rvs {Xₙ} that are only assumed to have the same mean µ and the same variance σ².

There exist even stronger versions of the CLT, that is, versions that make even weaker assumptions about the random variables whose standardized sum's limit distribution is sought.

# Importance of the Central Limit Theorem

Beyond its major theoretical significance, the Central Limit Theorem has an important practical consequence. It is quite common for a quantity to result from the addition of a large number of small, independent contributions with identical distributions. The CLT then explains why such quantities are so frequently observed to be normally distributed, without one having to worry about the nature of the common elementary probability distribution.
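A hypothetical example of this mechanism (with an arbitrary, deliberately non-normal elementary distribution, chosen only for illustration): build a quantity as the sum of 100 small, independent, identically distributed contributions, and check that the result looks normal.

```python
import random
import statistics

random.seed(4)

def contribution():
    # An arbitrary, skewed elementary distribution: the square of a uniform.
    u = random.random()
    return u * u

# Each "measured quantity" is the sum of 100 small contributions.
quantities = [sum(contribution() for _ in range(100)) for _ in range(20000)]
m = statistics.mean(quantities)
s = statistics.pstdev(quantities)

# For a normal distribution, about 68.3% of the values fall within one
# standard deviation of the mean:
print(round(sum(abs(q - m) <= s for q in quantities) / len(quantities), 3))
```

The nature of the elementary distribution is irrelevant here, which is exactly the practical point: only the existence of its mean and variance matters.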

______________________________________________________________________________

 Tutorial

In this Tutorial, we demonstrate the Central Limit Theorem (CLT). More precisely, we demonstrate one version of the CLT, which assumes that the distribution under consideration has a moment generating function. This assumption is unnecessary, but it makes the demonstration much simpler. It is also a reasonable assumption, as most of the commonly encountered distributions have an mgf.

We first establish a preliminary result about a certain class of indeterminate limits of the form 0/0.
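The preliminary result itself is not reproduced here; a typical lemma of this kind, given for orientation rather than as the Glossary's own statement, is:

```latex
\lim_{u \to 0} \frac{\ln(1 + u)}{u} = 1
\qquad \Longrightarrow \qquad
\lim_{n \to \infty} n \, \ln\!\left(1 + \frac{a}{n} + o\!\left(\frac{1}{n}\right)\right) = a .
```

This is the form 0/0 that appears when taking the limit of the logarithm of an mgf raised to the n-th power.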

We then address the demonstration proper. Although it is not extremely complicated, it is probably a good idea to first give an outline that will help the reader to follow the various steps.

The final step of the demonstration calls on the convergence property of the moment generating function, which we mentioned earlier but whose demonstration is unfortunately beyond the bounds of this Glossary.
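For orientation, the key computation of the demonstration can be sketched as follows (a condensed sketch under the mgf assumption, not the Tutorial's full argument). Writing Zᵢ = (Xᵢ − µ)/σ for the standardized terms:

```latex
\bar{X}'_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Z_i ,
\qquad
M_{\bar{X}'_n}(t)
= \left[ M_Z\!\left(\tfrac{t}{\sqrt{n}}\right) \right]^{n}
= \left[ 1 + \frac{t^2}{2n} + o\!\left(\tfrac{1}{n}\right) \right]^{n}
\;\xrightarrow[n \to \infty]{}\; e^{t^2/2} .
```

Since e^(t²/2) is the mgf of N(0, 1), the convergence property of the mgf then yields convergence in distribution.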

THE CENTRAL LIMIT THEOREM TUTORIAL

* Preliminary result: a classical indeterminate form
* The Central Limit Theorem
* Outline of the demonstration
* Demonstration of the Central Limit Theorem:
  * The standardized sample mean
  * Mgf of the standardized sample mean
  * Taylor expansion of the mgf
  * Taking the limit for large samples
  * Convergence property of the mgf
  * The Central Limit Theorem

_______________________________________________________

See also: Binomial distribution, Normal distribution, Moment generating function