Normal (distribution)
Also known as the "Gaussian distribution".
By far the best-known probability distribution.
A random variable X is said to be normally distributed if its probability density function (pdf) is :
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
for a certain pair of values of the two parameters µ and σ (with σ > 0).
The term :
$$\frac{1}{\sigma\sqrt{2\pi}}$$
before the exponential is a normalization coefficient that makes the integral of f(x) equal to 1, as is necessary for a probability density.
The value of this normalization coefficient is justified in the Tutorial below.
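As a quick numerical sanity check (not a proof), the sketch below evaluates the density and integrates it with a simple trapezoidal rule. The function names `normal_pdf` and `integrate_pdf`, and the choice of integrating over µ ± 10σ, are illustrative assumptions and not part of this Glossary.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # normalization coefficient
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate_pdf(mu, sigma, half_width=10.0, steps=20000):
    """Trapezoidal estimate of the integral of the pdf over mu +/- half_width * sigma."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

# The area under the curve should be very close to 1 for any mu and sigma > 0.
area = integrate_pdf(mu=3.0, sigma=2.0)
```

The tails beyond 10 standard deviations contribute a negligible amount, which is why truncating the integration range is harmless here.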
-----
* µ is clearly a location parameter,
* and σ is just as clearly a spread parameter.
Although the "true" parameter is σ, applications are usually concerned with σ², and consequently, the normal distribution with parameters µ and σ will be denoted N(µ, σ²).
The symmetrical bell-shaped curve of the normal distribution is ubiquitous, and you'll see it many times throughout this site and just about everywhere else. We give here an illustrated example that displays both :
* A normal distribution (upper green curve) together with a sample drawn from this distribution. The mean of this sample is marked by a red point, from which a vertical red line extends downwards.
* Another normal distribution (lower red curve) which is the theoretical distribution of the sample mean of the upper curve (we demonstrate this result in the Tutorial below).
Change the value of the standard deviation of the upper normal curve with the vertical cursor in the upper-right corner of the animation, and observe the changes in the spreads of the two Gaussian curves.
Change the sample size, and observe that the lower Gaussian becomes narrower as the sample gets larger.
The other controls are self-explanatory.
We'll see in many instances that the special normal distribution N(0, 1) plays a central role in Statistics. It is called the standard normal distribution.
If X ~ N(µ, σ²), then the variable transformation
$$X' = \frac{X - \mu}{\sigma}$$
shows that X ' ~ N(0, 1), a distribution that does not depend on the values of µ and σ.
Then, for any number a :
$$P(X \le a) = P\left(X' \le \frac{a - \mu}{\sigma}\right)$$
and all inferences about X can be carried over to similar inferences about X ', whose unique distribution has been tabulated once and for all.
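The standardization can be illustrated with Python's standard library class `statistics.NormalDist` (Python 3.8 or later). The values of `mu`, `sigma` and `a` below are arbitrary choices made for the illustration.

```python
from statistics import NormalDist

mu, sigma, a = 10.0, 2.0, 13.0  # arbitrary illustrative values

# Direct computation with the distribution of X ~ N(mu, sigma^2).
p_direct = NormalDist(mu, sigma).cdf(a)

# Same probability read off the standard normal N(0, 1) after standardizing a.
z = (a - mu) / sigma
p_standard = NormalDist().cdf(z)
```

Both computations give the same probability, which is what makes a single table of N(0, 1) sufficient.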
Let X be a r.v. distributed as N(µ, σ²). We'll show directly that
E[X] = µ
Var(X) = σ²
A normal distribution is therefore entirely determined by the values of its first two moments.
A more general calculation will then allow us to identify an expression giving the moments of all even orders E[X^{2k}] of the standard normal distribution :
$$E[X^{2k}] = \frac{(2k)!}{2^k\,k!}$$
The centered moments of a normal distribution with an arbitrary variance are immediately derived from this result.
-----
Since the pdf of a centered normal distribution is even, all the odd-order moments are 0.
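The standard closed form for the even-order moments of N(0, 1) is (2k)! / (2^k k!), which gives 1, 3, 15, 105 for k = 1 to 4. The sketch below compares this formula with a crude numerical integration. The helper names and the integration range of ±12 are assumptions made for the illustration.

```python
import math

def even_moment_formula(k):
    """E[X^(2k)] for X ~ N(0, 1), using (2k)! / (2^k k!)."""
    return math.factorial(2 * k) // (2 ** k * math.factorial(k))

def moment_numeric(order, half_width=12.0, steps=48000):
    """Trapezoidal estimate of E[X^order] for X ~ N(0, 1)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * x ** order * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)

formula_values = [even_moment_formula(k) for k in range(1, 5)]
numeric_values = [moment_numeric(2 * k) for k in range(1, 5)]
odd_moment = moment_numeric(3)  # odd orders vanish by symmetry
```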
We'll show that the moment generating function of the normal distribution is :
$$M_X(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$$
We were able to calculate the moments of all orders without resorting to the moment generating function. Yet, this mgf will be useful for establishing :
* The properties of a sum of independent normal variables (see here).
* The properties of the multivariate normal distribution.
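As a short worked check, the mean and the variance can be read off the mgf of N(µ, σ²) by differentiating at t = 0. The notation M_X is ours; the Tutorial may organize the calculation differently.

```latex
\begin{aligned}
M_X(t)   &= \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right) \\
M_X'(t)  &= (\mu + \sigma^2 t)\, M_X(t)
          &&\Rightarrow\quad E[X] = M_X'(0) = \mu \\
M_X''(t) &= \left[\sigma^2 + (\mu + \sigma^2 t)^2\right] M_X(t)
          &&\Rightarrow\quad E[X^2] = M_X''(0) = \sigma^2 + \mu^2 \\
\mathrm{Var}(X) &= E[X^2] - E[X]^2 = \sigma^2
\end{aligned}
```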
Let X ~ N(µ, σ²), and let X ' = aX + b be a linear transform of X (with a ≠ 0). We'll show that :
X ' ~ N(aµ + b, a²σ²)
We remark that, as is true for any rv :
* The mean undergoes the same linear transformation as the original variable,
* The variance is multiplied by a² and is not affected by a translation.
So the truly original part of this result is that the linear transform of a normal r.v. is also normal.
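One way to see why normality is preserved is to go through the moment generating function. The lines below are a sketch of that argument for X ' = aX + b with X ~ N(µ, σ²); the notation M is ours.

```latex
\begin{aligned}
M_{X'}(t) &= E\left[e^{t(aX + b)}\right] = e^{bt}\, M_X(at) \\
          &= e^{bt} \exp\left(\mu a t + \tfrac{1}{2}\sigma^2 a^2 t^2\right) \\
          &= \exp\left((a\mu + b)\,t + \tfrac{1}{2}\, a^2\sigma^2 t^2\right)
\end{aligned}
```

The last expression is the mgf of N(aµ + b, a²σ²), and since the mgf characterizes the distribution, X ' is normal with these parameters.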
Let {Xi ~ N(µi, σi²)} be a set of n independent normal variables. Let also {a1, a2, ..., an} be a set of n real numbers. Finally, let
Y = Σi aiXi
be a linear combination of the Xi.
By calling on the properties of the moment generating function, we'll show that :
Y ~ N(Σi aiµi, Σi ai²σi²)
We remark that, as is true for any set of independent rvs :
* The mean of the linear combination is equal to the same linear combination of the means of the variables.
* The variance of the linear combination is equal to the same linear combination of the variances of the variables, but with the coefficients squared.
So the truly original part of this result is that a linear combination of independent normal rvs is also normal.
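The formulas for the mean and the variance of a linear combination can be checked by simulation. The sketch below uses arbitrary coefficients and parameters; the helper `combo_parameters` is a hypothetical name. It checks only the first two moments, not normality itself.

```python
import random
import statistics

def combo_parameters(coeffs, means, sigmas):
    """Theoretical mean and variance of Y = sum(a_i * X_i) for independent X_i."""
    mean = sum(a * m for a, m in zip(coeffs, means))
    var = sum(a ** 2 * s ** 2 for a, s in zip(coeffs, sigmas))
    return mean, var

# Arbitrary illustrative values.
coeffs = [2.0, -1.0, 0.5]
means = [1.0, 4.0, -2.0]
sigmas = [1.0, 2.0, 3.0]

theo_mean, theo_var = combo_parameters(coeffs, means, sigmas)

rng = random.Random(0)  # fixed seed for reproducibility
draws = [
    sum(a * rng.gauss(m, s) for a, m, s in zip(coeffs, means, sigmas))
    for _ in range(100_000)
]
emp_mean = statistics.fmean(draws)
emp_var = statistics.variance(draws)
```

With these values the theoretical mean is -3 and the theoretical variance is 10.25; the empirical values should land close to them.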
-----
Note that if all ai = 1, this result reads :
* The sum of independent normal rvs is also normal. The mean of the sum is then the sum of the means, and the variance is the sum of the variances.
We also establish this result here
by calculating the convolution of two normal distributions.
-----
Two normal variables that are not independent may very well have normally distributed linear combinations (think of X + X with X normal). But we give here examples of linear combinations of normal variables that are not normally distributed because the variables in the combination are not independent.
The above result has some sort of converse. The Cramér-Levy theorem states that if X and Y are two independent random variables such that (X + Y) is normally distributed, then both X and Y are normal random variables. This difficult result is not addressed in this Glossary.
Let N(µ, σ²) be the normal distribution with mean µ and variance σ² from which we draw n-samples {Xi}. Denote :
* X̄ the sample mean :
$$\bar{X} = \frac{1}{n}\sum_i X_i$$
* S² the sample variance :
$$S^2 = \frac{1}{n-1}\sum_i (X_i - \bar{X})^2$$
Recall that the (n - 1) factor is there to make S² an unbiased estimator of the variance σ² of the distribution.
Devising tests and confidence intervals for µ and σ² demands that the distributions of X̄ and of S² be known.
The above animation showed that the sample mean :
* Is normally distributed,
* With mean identical to the mean of the mother distribution (which is true for any distribution),
* And with variance σ²/n (a result also true for any distribution).
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
This result is demonstrated in the Tutorial below.
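For readers without access to the animation, the same experiment can be sketched in a few lines: draw many samples of size n, compute each sample mean, and look at the mean and the variance of these sample means. The parameter values and the number of repetitions are arbitrary.

```python
import random
import statistics

mu, sigma, n = 5.0, 2.0, 25  # arbitrary illustrative values
rng = random.Random(1)       # fixed seed for reproducibility

# 20,000 independent samples of size n, reduced to their sample means.
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(20_000)
]

mean_of_means = statistics.fmean(sample_means)    # expected near mu
var_of_means = statistics.variance(sample_means)  # expected near sigma^2 / n
```

Here σ²/n = 4/25 = 0.16, so the empirical variance of the sample means should be close to that value.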
The distribution of the sample variance S² is such that :
$$\frac{(n-1)\,S^2}{\sigma^2} \sim \chi^2_{n-1}$$
where "χ²_{n-1}" stands for the Chi-square distribution with (n - 1) degrees of freedom (see here). This fundamental result is demonstrated here.
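A Chi-square distribution with (n - 1) degrees of freedom has mean (n - 1) and variance 2(n - 1). The simulation sketch below checks these two moments for the scaled sample variance; it is only a plausibility check, and the values of `sigma` and `n` are arbitrary.

```python
import random

sigma, n = 3.0, 8       # arbitrary illustrative values
rng = random.Random(2)  # fixed seed for reproducibility

def scaled_sample_variance():
    """One realization of (n - 1) S^2 / sigma^2 for a sample of size n."""
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    sum_sq = sum((x - x_bar) ** 2 for x in sample)  # equals (n - 1) S^2
    return sum_sq / sigma ** 2

values = [scaled_sample_variance() for _ in range(20_000)]
emp_mean = sum(values) / len(values)
emp_var = sum((v - emp_mean) ** 2 for v in values) / (len(values) - 1)
```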
We show here that :
* The sample mean X̄,
* And the sample variance S²
are independent random variables.
-----
This property is in fact a characteristic property of the normal distribution but the converse is difficult and not addressed in this Glossary.
The sample mean X̄ is distributed as N(µ, σ²/n) (see Tutorial below), and the standardized sample mean
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is therefore distributed as N(0, 1). This result leads to confidence intervals and the most basic tests (the so-called "z tests") bearing on the value of the mean of a normal distribution with known variance.
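A minimal sketch of a confidence interval for the mean when σ is known follows. The function name `z_confidence_interval` and the data are made up for the illustration; the quantile of N(0, 1) comes from `statistics.NormalDist.inv_cdf` (Python 3.8 or later).

```python
import math
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, level=0.95):
    """Two-sided confidence interval for the mean, sigma being known."""
    n = len(sample)
    x_bar = fmean(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Made-up data, for illustration only.
sample = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.5, 10.2]
low, high = z_confidence_interval(sample, sigma=0.3)
```

The interval is centered on the sample mean, and its half-width shrinks as 1/√n.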
But more often than not, the variance σ² is unknown, which makes standardizing the sample mean impossible. One can nevertheless replace σ² by the sample variance S², thus creating a quantity T = (X̄ - µ)/(S/√n) that does not depend on σ², and whose distribution is known : it is Student's t distribution with (n - 1) degrees of freedom, where n is the sample size.
It is then possible to construct confidence intervals and tests bearing on :
* The value of the mean of a normal distribution whose variance is unknown,
* The difference between the means of two normal distributions of equal but unknown variances.
-----
Other quantities also follow t distributions, in particular during the elaboration of confidence intervals on the values of the parameters of a Multiple Linear Regression.
The ratio of the variances of two samples drawn from two independent normal distributions with equal variances follows a distribution known as Fisher's F distribution. The properties of the F distribution are detailed here.
-----
Other quantities also follow the F distribution, which plays a central role :
* In Analysis of Variance (ANOVA),
* In Linear Regression (Simple or Multiple) for tests bearing on the overall validity of adjusted models.
The normal distribution was first introduced as the limit of the binomial distribution B(n, p) for very large values of n. The binomial distribution was then impractical to calculate exactly because of the factorials that lead to very large numbers, and the need was felt for an approximate formula that was easier to calculate. As early as the first half of the 18th century, de Moivre (with the help of his friend Stirling) was the first to identify the above analytical form. More precisely, de Moivre showed that if X is distributed as B(n, p), then
$$P(X = k) \approx \frac{1}{\sqrt{2\pi npq}} \exp\left(-\frac{(k - np)^2}{2npq}\right)$$
(with q = 1 - p), where one recognizes np as the mean of the binomial distribution, and npq as its variance.
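The quality of the normal approximation to B(n, p) is easy to inspect numerically today. The sketch below compares the exact binomial probability with the Gaussian expression of mean np and variance npq at the center of the distribution. The function names and the values n = 1000, p = 0.3 are illustrative assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """Exact probability P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def de_moivre_approx(k, n, p):
    """Normal density with mean np and variance npq, evaluated at k."""
    q = 1.0 - p
    variance = n * p * q
    return math.exp(-((k - n * p) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

n, p = 1000, 0.3
exact = binomial_pmf(300, n, p)
approx = de_moivre_approx(300, n, p)
relative_error = abs(exact - approx) / exact
```

Near the mean the two values agree closely; the agreement degrades in the far tails, which is where the approximation should be used with care.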
The normal distribution was born.
The Poisson distribution provides another approximation to the binomial distribution B(n, p) for large values of n which is better than the normal distribution for very small values of p (or (1 - p)).
But it later turned out that the normal distribution has a much deeper origin. What de Moivre observed with the binomial distribution is just a special case of a much more general setting.
If {Xi} is an infinite sequence of independent r.v., we can construct another infinite sequence of r.v. defined by :
$$Y_n = \sum_{i=1}^{n} X_i$$
So Yn is defined as the sum of the first n Xi.
Then it can be shown that, under some not very restrictive conditions, the variable :
$$Z_n = \frac{Y_n - E[Y_n]}{\sqrt{\mathrm{Var}(Y_n)}}$$
converges in distribution to N(0, 1) as n grows without limit.
This fundamental result is known as the Central Limit Theorem.
You'll easily convince yourself that the result obtained by de Moivre is a special case of the much more general Central Limit Theorem, with the Xi being independent Bernoulli variables, all with the same parameter p.
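The Central Limit Theorem can also be observed with variables that look nothing like a Gaussian. The sketch below sums n i.i.d. exponential variables of rate 1 (mean 1, variance 1, strongly skewed), standardizes the sum, and compares an empirical probability with the corresponding value for N(0, 1). The choice of the exponential distribution, of n = 100 and of the threshold 1 are assumptions made for the illustration.

```python
import math
import random
from statistics import NormalDist

def standardized_sum(rng, n):
    """Z_n for n i.i.d. exponential(1) variables, whose mean and variance are both 1."""
    y_n = sum(rng.expovariate(1.0) for _ in range(n))
    return (y_n - n) / math.sqrt(n)  # (Y_n - E[Y_n]) / sqrt(Var(Y_n))

rng = random.Random(5)  # fixed seed for reproducibility
zs = [standardized_sum(rng, 100) for _ in range(10_000)]

frac_below_1 = sum(z <= 1.0 for z in zs) / len(zs)
target = NormalDist().cdf(1.0)  # about 0.8413
```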
-----
The Central Limit Theorem "explains" why so many distributions encountered in applications closely resemble normal distributions. Many quantities can be interpreted as resulting from the cumulative effect of a large number of independent random causes with the same probability distribution. Then, whatever the nature of this distribution, the effect will be nearly normally distributed.
The normal distribution therefore appears as a universal distribution.
We show here that :
* The sample mean X̄ is a sufficient statistic for the distribution mean µ when the variance σ² is known,
* Σi (xi - µ)² is a sufficient statistic for the variance σ² when the mean µ is known,
* And that the pair (X̄, Σi xi²) is sufficient for the pair (µ, σ²) when the values of both these parameters are unknown.
-----
We show here that these statistics are in fact not just sufficient, but also minimal sufficient.
We show here that the normal distribution belongs to the exponential family. From this result, we'll deduce that the sample mean X̄ is an efficient estimator of the mean µ.
The concept of "normality" extends to the joint distribution of several variables, and the multivariate normal distribution may very well be the most important multivariate distribution. Its properties are detailed here.
Simulating a normally distributed random variable is of course an important topic owing to the ubiquity of the normal distribution. The most popular technique for simulating the normal distribution is the Box-Muller transform.
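A minimal sketch of the basic (trigonometric) form of the Box-Muller transform follows: two independent uniform draws are turned into two independent N(0, 1) draws. The function name `box_muller` and the sample size are illustrative; production code would normally rely on a library generator such as `random.gauss`.

```python
import math
import random
import statistics

def box_muller(rng):
    """Return two independent N(0, 1) draws built from two uniform draws."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

rng = random.Random(3)  # fixed seed for reproducibility
draws = []
for _ in range(50_000):
    draws.extend(box_muller(rng))

emp_mean = statistics.fmean(draws)    # expected near 0
emp_var = statistics.variance(draws)  # expected near 1
```

A draw from N(µ, σ²) is then obtained as µ + σz, where z is one of the standard normal draws.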
____________________________________________________________
Tutorial 1
In this first Tutorial, we establish the basic properties of the normal distribution.
* We first calculate the value of the normalization coefficient of the normal distribution.
* We then calculate the mean and variance from their definitions. Calculating the mean is easy, calculating the variance is a bit more difficult. We won't be surprised to discover that µ is the mean of the normal distribution, and σ² its variance.
* We then establish the general formula giving all the even-order moments.
* We finally calculate the moment generating function of the normal distribution, and then derive the mean and the variance again from this mgf.
-----
We insist on the importance of the standard normal distribution N(0, 1), which is the cornerstone of the simplest confidence intervals and of the z test.
BASIC PROPERTIES OF THE NORMAL DISTRIBUTION
* Coefficient of normalization
* Mean and variance (direct calculation)
  * Mean
  * Variance
* Moments of all orders
* Moment generating function
  * Moments
  * Mean
  * Variance
* Standard normal distribution
  * Definition
  * Cumulative distribution function
  * Importance of the standard normal distribution
  * Example 1
  * Example 2
______________________________________________
Tutorial 2
We now proceed with the estimation of the parameters of the normal distribution.
We first use the "natural" method of moments, then the method of Maximum Likelihood. We'll see that the ML solution is unique (which is not always the case). We'll also see that the method of moments provides a slightly better estimator of the distribution variance than the method of ML does, which is the exception rather than the rule.
We illustrate these results with an interactive animation.
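The bias of the Maximum Likelihood estimator of the variance can also be seen in a small simulation. The sketch below compares the ML estimator (divisor n) with the unbiased sample variance S² (divisor n - 1) on many small samples. The sample size, the parameters and the helper name `variance_estimates` are arbitrary choices for the illustration.

```python
import random

def variance_estimates(sample):
    """Return (ML estimate with divisor n, unbiased estimate with divisor n - 1)."""
    n = len(sample)
    x_bar = sum(sample) / n
    sum_sq = sum((x - x_bar) ** 2 for x in sample)
    return sum_sq / n, sum_sq / (n - 1)

mu, sigma, n = 0.0, 2.0, 5  # arbitrary illustrative values
rng = random.Random(6)      # fixed seed for reproducibility

ml_total = 0.0
unbiased_total = 0.0
trials = 40_000
for _ in range(trials):
    ml, unbiased = variance_estimates([rng.gauss(mu, sigma) for _ in range(n)])
    ml_total += ml
    unbiased_total += unbiased

avg_ml = ml_total / trials              # expected near (n - 1) / n * sigma^2 = 3.2
avg_unbiased = unbiased_total / trials  # expected near sigma^2 = 4.0
```

On average the ML estimator underestimates σ² by the factor (n - 1)/n, a bias that vanishes as n grows.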
ESTIMATION OF THE PARAMETERS OF THE NORMAL DISTRIBUTION
* Estimation by the method of moments
* Estimation by the method of Maximum Likelihood
  * The log-likelihood
  * Maximum Likelihood estimate of the mean
  * Maximum Likelihood estimate of the variance
  * Estimator
  * Bias of the ML estimator of the variance
________________________________________________________
Tutorial 3
We establish here two important properties of normally distributed random variables :
* The linear transform of a normal r.v. is also normal, and we calculate its parameters.
* A linear combination of independent normal r.v.s is also normal, and we calculate its parameters. The condition "independent" is important, and we give here two examples of pairs of normally distributed r.v. {X, Y}, that are not independent and whose sums Z = X + Y are not normally distributed.
We then calculate the distribution of the sample mean of a normal distribution, a result widely used in applications.
BASIC PROPERTIES OF NORMALLY DISTRIBUTED R.V.
* Linear transformations of normal variables
  * First solution : general results on variable transformations
  * Second solution : moment generating function
* Linear combination of independent normal r.v.
  * Preliminary remark
  * Sum of independent normal r.v.
  * Linear combination of independent normal r.v.
* Distribution of the sample mean
________________________________________________________
Tutorial 4
This exercise and animation illustrate the concept of "random parameter".
We define the r.v. X by describing the procedure to obtain one realization of X.
* Y is a r.v. with the normal distribution N(0, s²). Let m be a realization of Y.
* Consider now the normal distribution N(m, σ²). We draw an observation from this distribution, and consider this observation as a realization of X.
In other words, the distribution of X may be interpreted as :
* A normal distribution N(µ, σ²),
* Whose parameter µ is itself a normally distributed r.v. : µ ~ N(0, s²).
-----
What is the distribution of X ?
We give two solutions in the Tutorial below.
* The first solution is short, and relies on results obtained in the previous Tutorial.
* The second solution is longer, but is a good example of calculation of a probability density knowing the density conditionally to an auxiliary variable, as well as the distribution of this conditioning variable.
* The distribution of µ is the green gaussian curve in the upper part of the animation. Its standard deviation (s) can be changed with the vertical cursor at the right of this curve.
* The distribution N(0, σ²) is the blue Gaussian curve in the middle section of the animation. Its standard deviation (σ) can be changed with the vertical cursor at the right of this curve.
* The theoretical distribution of X is the red curve in the lower section of the animation. Observe that it changes when either s or σ is changed.
-----
* Click repeatedly on "Next". Every time you click, a green point is drawn from the N(0, s²) distribution. This point defines the position of the mean of the blue distribution, from which a blue point is then drawn.
The goal of the exercise is to calculate the distribution of the blue points.
-----
* Click on "Go", and observe the progressive build-up of the histogram of the distribution of X.
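The two-stage procedure can be reproduced in a few lines. Writing X = Y + E, where E ~ N(0, σ²) is independent of Y, the result on sums of independent normal variables suggests that X ~ N(0, s² + σ²). The sketch below checks the mean and the variance of simulated values of X; the values of `s` and `sigma` and the helper name `draw_x` are arbitrary.

```python
import random
import statistics

s, sigma = 1.5, 2.0     # arbitrary illustrative values
rng = random.Random(4)  # fixed seed for reproducibility

def draw_x():
    """One realization of X: draw the mean first, then the observation."""
    m = rng.gauss(0.0, s)       # green point: realization of the random mean
    return rng.gauss(m, sigma)  # blue point: observation from N(m, sigma^2)

xs = [draw_x() for _ in range(100_000)]
emp_mean = statistics.fmean(xs)    # expected near 0
emp_var = statistics.variance(xs)  # expected near s^2 + sigma^2 = 6.25
```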
NORMAL DISTRIBUTION WHOSE MEAN IS ITSELF A NORMAL R.V.
* First solution
* Second solution : calculating a distribution when a conditional distribution
* Interactive animation
___________________________________________________________