Multivariate normal distribution

We suggest that you first refer to the entry about the bivariate normal distribution.

-----

The multivariate normal distribution (or "multinormal distribution") is the most important multidimensional distribution. It is common for real-world data to be at least approximately multinormally distributed, and some techniques, like Discriminant Analysis, explicitly make this assumption.

# Definition of the multivariate normal distribution

Our goal is to generalize the ordinary (univariate) normal distribution to random vectors. This can be done in several different ways.

## The wrong way

We first insist on what the multivariate normal distribution is not.

A natural idea would be to "define" the multivariate normal distribution as a distribution whose marginals are all normal. But we know that there exist bivariate distributions whose marginals are normal, and yet that are not binormal.

We therefore have to forgo this erroneous approach.

## The bivariate normal distribution

We defined the bivariate normal distribution by introducing an adjustable correlation coefficient between its two normal marginals.

Generalizing this approach to more than two variates is cumbersome, and will not be pursued.

## Linear combinations of the marginals

It is possible to define the multivariate normal distribution as a distribution such that any linear combination of its marginals is (univariate) normal.

We will not use this definition, but we'll show that this result is indeed a characteristic property of the multivariate normal distribution.

## Transformation of the standard spherical multinormal distribution

The standard spherical multivariate normal distribution is defined as the joint distribution of p independent univariate standard normal variables.

The (general) multivariate normal distribution may be defined as the transform of this standard spherical distribution under any regular linear transformation.

## Formal generalization of the univariate normal distribution

In this Glossary, we define the multivariate normal distribution by the analytical form of its probability density, which we choose to be a formal generalization of the univariate case.

Recall that the density of the (univariate) normal distribution is :

f(x) = k.exp[-1/2.a(x - b)²]

where k and a are appropriate coefficients (see here).

This expression is generalized to the multidimensional case as follows :

* The variable x is replaced by the vector  x = {x1, x2, ..., xp} with p components.

* The normalization coefficient k is replaced by the coefficient K whose role is also to make the integral of the density equal to 1.

* The term a(x - b)² is replaced by a quadratic form in (x - b) :

(x - b)A(x - b)'

where b is a vector.

In the univariate case, we have a = 1/σ², which is always positive. By analogy, we require the matrix A to be symmetric positive definite, a property that generalizes the notion of "positiveness" to matrices.

-----

So, by definition, the distribution of the random vector X = {X1, X2, ..., Xp} is said to be a multivariate normal distribution if its probability density is :

 f(x) = K.exp[-1/2.(x - b)A(x - b)']

with A a symmetric positive definite matrix.
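This density can be sketched numerically. The snippet below (a minimal sketch; the values of b, A and x are illustrative assumptions, not taken from the text) evaluates f(x) with the normalization coefficient K = (2π)^(-p/2).[det(A)]^(1/2) derived in the Tutorial below, and cross-checks it against SciPy, where the covariance matrix is A⁻¹ (a relation established later in this entry):

```python
import numpy as np
from scipy.stats import multivariate_normal

def multinormal_pdf(x, b, A):
    """Density f(x) = K.exp[-1/2 (x-b) A (x-b)'] with K = (2*pi)^(-p/2) det(A)^(1/2)."""
    p = len(b)
    d = x - b
    K = (2 * np.pi) ** (-p / 2) * np.sqrt(np.linalg.det(A))
    return K * np.exp(-0.5 * d @ A @ d)

# Illustrative parameters (p = 2); A is symmetric positive definite.
b = np.array([1.0, -2.0])
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x = np.array([0.5, -1.5])

# SciPy parametrizes the distribution by its covariance matrix, i.e. A^-1.
print(multinormal_pdf(x, b, A))
print(multivariate_normal(mean=b, cov=np.linalg.inv(A)).pdf(x))
```

The two printed values agree, which confirms that the quadratic-form parametrization by A and the covariance parametrization are two views of the same density.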

# Basic properties of the multivariate normal distribution

In the Tutorial below, we'll establish the following results :

## Normalization coefficient

Recall that the normalization coefficient of the (univariate) normal distribution is :

k = (2π)^(-1/2).a^(1/2) = 1/(σ√(2π))

We'll show that the normalization coefficient of the multivariate normal distribution is :

K = (2π)^(-p/2).[det(A)]^(1/2)

where "det" stands for "determinant".

## Mean

Recall that the mean µ of the (univariate) normal distribution is equal to b.

We'll establish that the mean vector of the multivariate normal distribution is equal to the vector b.

 E[X] = b

Therefore, we'll later on replace b by µ.

## Covariance matrix

In the univariate case, we have :

a = 1/σ²

and we'll show that in the multivariate case, we have :

A = Σ⁻¹

where Σ is the covariance matrix of X.

## Final form

Bringing these results together, we'll obtain the following final result :

The density of a multivariate normal vector is :

f(x) = (2π)^(-p/2).[det(Σ)]^(-1/2).exp[-1/2.(x - µ)Σ⁻¹(x - µ)']

that will be denoted N(µ, Σ) by analogy with the notation N(µ, σ²) used for the univariate normal distribution.

-----

So there's a perfect analogy with the univariate case, with the covariance matrix Σ now playing the role that σ² played in the univariate case.

Of course, this expression reduces to the ordinary normal distribution when the "vector" X is reduced to a single component.

# Marginal distributions of the multivariate normal distribution

Let X = {X1, X2, ..., Xp} be a random vector.

A marginal distribution of X is the joint distribution of any subset of the variables (X1, X2, ..., Xp). So there are as many marginal distributions as there are such subsets, that is 2^p - 2 (ignoring the empty and the complete subsets of variables).

The following illustration shows the two marginal distributions of a bivariate normal distribution :

We'll show that the marginal distributions of the multivariate normal distribution are also multinormal, a fundamental result.

Let X = {X1, X2, ..., Xp} be a p-dimensional multinormal vector, and consider the vector X1 made up of the first k components of X :

X1 = {X1, X2, ..., Xk}       k < p

The p components of X can always be indexed in such a way that any subset of k components is made to be the subset of the first k components.

Then :

* The distribution of X1 is a multivariate normal distribution.

* The k components of the mean vector of this distribution are the means of the k variables Xi.

* Its covariance matrix (of order k) is made up of the pairwise covariances of the k variables Xi.

This illustration represents the covariance matrix of X :

The covariance matrix ΣX1 of X1 is just the upper left corner square submatrix of order k of Σ (lower image of the illustration).

This submatrix is traditionally denoted Σ11.

-----

When k = 1, this result shows that the individual components Xi of X = {X1, X2, ..., Xp} are (univariate) normal variables.
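This result can be sketched numerically: the marginal of the first k components of X ~ N(µ, Σ) is obtained by simply slicing the mean vector and the covariance matrix. The parameters below are illustrative assumptions; the seeded Monte Carlo check compares empirical moments of the first k coordinates against µ1 and Σ11.

```python
import numpy as np

# Illustrative parameters for X ~ N(mu, Sigma) with p = 3.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

k = 2
mu_1 = mu[:k]              # mean vector of the marginal X1
Sigma_11 = Sigma[:k, :k]   # covariance matrix Sigma_11 (upper-left k x k block)

# Seeded Monte Carlo check: empirical moments of the first k coordinates.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
print(np.allclose(samples[:, :k].mean(axis=0), mu_1, atol=0.1))
print(np.allclose(np.cov(samples[:, :k].T), Sigma_11, atol=0.1))
```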

# Conditional distributions of the multivariate normal distribution

Let X = {X1, X2, ..., Xp} be a random vector.

A "conditional distribution" of X is the joint distribution of any subset of of the variables (X1, X2, ..., Xp) when the other variables are held fixed. In other words, it is the normalized profile of a "cut" through the distribution of X made by a hyperplane defined by the (fixed) values assigned to the other variables. So there are as many marginal distributions as there are such subsets, that is 2 p - 2 (ignoring the empty and the complete subsets of variables).

This illustration shows one of the two conditional distributions of a bivariate normal distribution.

Let X = {X1, X2, ..., Xp} be a multinormal vector, that we partition into two sub-vectors :

X = (X1, X2 )

We'll show that the distribution of X1 conditional on X2 is multinormal, a fundamental property.

In addition, we'll show the two following important properties :

* The mean vector of this conditional distribution depends linearly on X2.

* The covariance matrix of this conditional distribution does not depend on X2.

This last point means that if a cutting hyperplane is translated parallel to itself, all the cuts it generates have the same covariance matrix.
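These two properties can be sketched with the standard explicit formulas for the conditional parameters (derived in Tutorial 3): the conditional mean is µ1 + Σ12 Σ22⁻¹ (x2 − µ2), and the conditional covariance is the Schur complement Σ11 − Σ12 Σ22⁻¹ Σ21. The parameters below are illustrative assumptions; the bivariate sanity check uses the known identity Var(X1 | X2) = σ1²(1 − ρ²).

```python
import numpy as np

def conditional(mu, Sigma, k, x2):
    """Mean and covariance of X1 | X2 = x2, X1 = first k components of X ~ N(mu, Sigma)."""
    mu1, mu2 = mu[:k], mu[k:]
    S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
    S21, S22 = Sigma[k:, :k], Sigma[k:, k:]
    W = S12 @ np.linalg.inv(S22)
    # Mean is linear in x2; covariance does not involve x2 at all.
    return mu1 + W @ (x2 - mu2), S11 - W @ S21

# Bivariate sanity check: Var(X1 | X2) = s1^2 (1 - rho^2), whatever x2.
s1, s2, rho = 2.0, 1.5, 0.6
mu = np.array([0.0, 1.0])
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])
m_a, C_a = conditional(mu, Sigma, 1, np.array([0.0]))
m_b, C_b = conditional(mu, Sigma, 1, np.array([5.0]))
print(C_a, C_b)  # identical: translating the cutting hyperplane leaves the covariance unchanged
```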

# Multivariate normal distribution and Regression

We now consider the multinormal distribution from the point of view of using X2 for predicting X1.

Identifying the conditional distributions of the multivariate normal distribution, and in particular the expectation of X1 conditional on the value of X2, allows us to regard the multivariate normal distribution as a linear regression model.

For example, this figure illustrates the prediction of the vector X1 = (u, v) by the single variable X2 = (w).

This illustration is reproduced and commented more thoroughly in the Tutorial below.

This model exhibits strong similarities with the standard model of Multiple Linear Regression (MLR) :

* The expectation of the response variable is a linear function of the predictors.

* The residuals are normal and uncorrelated,

but with also two important differences :

* In MLR, the predictors are considered as fixed, and therefore are not random variables, whereas X2 is here a random vector.

* MLR tries to predict the value of a single response variable y, whereas the response variable is here a vector (a group of variables).

We'll show that the model based on the conditional distributions of the multivariate normal distribution is better than any other linear model X1 = f(X2 ) in two respects :

* It minimizes the Mean Square Error (MSE) between predictions and observations.

* It maximizes the correlation coefficient between each of the variables and any linear combination of the other variables used for predicting the value of this variable. This coefficient is called the Multiple Correlation Coefficient attached to the variable, and we'll calculate its value.
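For a single response X1 predicted from the remaining variables X2, the squared multiple correlation coefficient has the standard closed form R² = Σ12 Σ22⁻¹ Σ21 / Σ11 (with the block notation of the conditional-distribution section). The sketch below computes it; in the bivariate case it reduces to |ρ|, which serves as a sanity check. The covariance matrix used is an illustrative assumption.

```python
import numpy as np

def multiple_correlation(Sigma):
    """Multiple correlation coefficient of variable 0 with the remaining variables."""
    S11 = Sigma[0, 0]          # variance of the response
    S12 = Sigma[0, 1:]         # covariances response / predictors
    S22 = Sigma[1:, 1:]        # covariance matrix of the predictors
    return np.sqrt(S12 @ np.linalg.inv(S22) @ S12 / S11)

# Bivariate case: the multiple correlation reduces to |rho|.
rho = 0.6
Sigma2 = np.array([[1.0, rho],
                   [rho, 1.0]])
print(multiple_correlation(Sigma2))
```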

# Moment generating function of the multivariate normal distribution

Let X ~ N(µ, Σ).

We'll show that its m.g.f. MX(t)  is :

 MX(t) = exp{t'µ + 1/2.t'Σt}

where t is a vector parameter.
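Two immediate consequences of this formula can be checked numerically: MX(0) = 1 (as for any m.g.f.), and the gradient of MX at t = 0 is the mean vector µ. The sketch below verifies both with illustrative parameters, using a central finite difference for the gradient.

```python
import numpy as np

# Illustrative parameters of X ~ N(mu, Sigma).
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])

def mgf(t):
    """M_X(t) = exp(t'mu + 1/2 t' Sigma t)."""
    return np.exp(t @ mu + 0.5 * t @ Sigma @ t)

# M(0) = 1, and grad M(0) = E[X] = mu (first-moment property of the m.g.f.).
h = 1e-6
grad = np.array([(mgf(h * e) - mgf(-h * e)) / (2 * h) for e in np.eye(2)])
print(mgf(np.zeros(2)))  # 1.0
print(grad)              # close to mu
```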

We'll then use this result to prove again, and generalize, some of the previous results, and also to establish a useful characteristic property of the multivariate normal distribution.

# Quadratic forms in multivariate normal variables

Statistics often involves quadratic forms in multivariate normal variables, in particular :

* In Analysis of Variance (ANOVA).

Under certain conditions that are detailed here, these quadratic forms follow (exactly) a Chi-square distribution.

-----

The squared Mahalanobis distance is a quadratic form which follows a Chi-square distribution when the variable is a multinormal vector.
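This can be sketched with a seeded Monte Carlo check (parameters illustrative): for X ~ N(µ, Σ) in dimension p, the squared Mahalanobis distance D² = (x − µ)Σ⁻¹(x − µ)' should have mean p and fall below the chi-square 95% quantile about 95% of the time.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)
p = 3
mu = np.zeros(p)
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])   # symmetric positive definite

x = rng.multivariate_normal(mu, Sigma, size=100_000)
d = x - mu
# Squared Mahalanobis distance of each sample, D^2 = d Sigma^-1 d'.
D2 = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma), d)

# D^2 ~ chi-square with p degrees of freedom: check the 95% coverage.
coverage = np.mean(D2 < chi2.ppf(0.95, df=p))
print(coverage)
```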

# Simulation of a multinormal random vector

We explain here how the association of the Box-Muller transform and the Mahalanobis transformation can be used for simulating an arbitrary multivariate normal distribution.
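The scheme can be sketched as follows (parameters illustrative). The Box-Muller transform turns pairs of uniforms into standard normal deviates; a linear map built from a factorization Σ = L L' (here a Cholesky factor, one convenient choice) then carries the standard spherical vector onto N(µ, Σ).

```python
import numpy as np

def box_muller(rng, n):
    """n standard normal deviates from pairs of uniforms (Box-Muller transform)."""
    u1 = 1.0 - rng.random((n + 1) // 2)   # in (0, 1], so log is safe
    u2 = rng.random((n + 1) // 2)
    r = np.sqrt(-2.0 * np.log(u1))
    z = np.concatenate([r * np.cos(2 * np.pi * u2), r * np.sin(2 * np.pi * u2)])
    return z[:n]

def simulate_multinormal(rng, mu, Sigma, n):
    """n draws from N(mu, Sigma): x = mu + L z with Sigma = L L' (Cholesky)."""
    p = len(mu)
    L = np.linalg.cholesky(Sigma)
    Z = box_muller(rng, n * p).reshape(n, p)   # standard spherical vectors
    return mu + Z @ L.T

# Illustrative parameters; seeded check of the empirical moments.
rng = np.random.default_rng(7)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
X = simulate_multinormal(rng, mu, Sigma, 200_000)
print(X.mean(axis=0))   # close to mu
print(np.cov(X.T))      # close to Sigma
```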

___________________________________________________________________________

 Tutorial 1

We first show that an appropriate linear transformation carries the general multivariate normal distribution into the simplest possible multivariate normal distribution : the standard spherical normal distribution, which is defined as the joint distribution of p independent standard normal variables (lower image of this illustration).

This transformation is useful in many circumstances, as it allows us to carry problems about the general multivariate normal distribution over to this particularly simple distribution.

-----

From this result, we'll derive :

* The value of the normalization coefficient K,

* And the mean µ,

of the multivariate normal distribution.

We'll then calculate the covariance matrix of the multivariate normal distribution, and show that it is equal to A⁻¹, a fundamental result.
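The spherization step can be sketched numerically (parameters illustrative): with Σ = L L' (a Cholesky factorization, one convenient choice of matrix "square root"), z = L⁻¹(x − µ) follows the standard spherical normal distribution.

```python
import numpy as np

# Illustrative parameters of X ~ N(mu, Sigma).
rng = np.random.default_rng(1)
mu = np.array([3.0, -1.0])
Sigma = np.array([[4.0, 1.2],
                  [1.2, 2.0]])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
L = np.linalg.cholesky(Sigma)          # Sigma = L L'
Z = np.linalg.solve(L, (X - mu).T).T   # spherized samples z = L^-1 (x - mu)

print(Z.mean(axis=0))  # close to (0, 0)
print(np.cov(Z.T))     # close to the identity matrix
```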

THE MULTIVARIATE NORMAL DISTRIBUTION

* Spherization of the multivariate normal distribution
  - Reduction of a positive definite matrix to the identity matrix
  - Spherization
  - The transformation
  - The Jacobian
  - The standard spherical normal distribution
* Normalization coefficient of the multivariate normal distribution
* Mean of the multivariate normal distribution
* Covariance matrix of the multivariate normal distribution
* Complete form of the multivariate normal distribution

__________________________________________________________________

 Tutorial 2

In this Tutorial, we show that the marginal distributions of the normal multivariate distribution are also multinormal, and we calculate their parameters.

There exist several ways to establish this important result, of which we give three.

1) We first use a very orthodox method, a bit lengthy, but one that stays close to intuition and delivers the additional important result that uncorrelated marginals are necessarily independent.

- We first show that a regular linear transformation changes a multinormal distribution into another multinormal distribution, whose parameters we'll calculate. This lemma is useful in just about any circumstance involving multivariate normal distributions.

- We then address the special case where every component Xi (i ≤ k) of X1 = {X1, X2, ..., Xk} is uncorrelated with any component Xj (j > k) of X2 = {Xk+1, Xk+2, ..., Xp}. We then show that the distribution of the marginal X1 is multinormal (with of course a similar result for X2).

In passing, we'll also show that this lack of correlation between Xi and Xj is in fact genuine independence.

- We finally address the general problem where no assumption whatever is made about the correlation between the components of X. We'll show that it is still true that the marginals X1 and X2  are multinormally distributed by identifying an appropriate transformation that will transform the original distribution into a new distribution for which the two marginals are uncorrelated. We'll then be able to deduce the multinormality of the marginals of the original distribution.

2) We then give a second demonstration that uses the above mentioned lemma to short-circuit all algebraic developments. It delivers the result in just a few lines, and is an illustration of the power of Linear Algebra at the expense of intuition.

3) Finally, we'll use the moment generating function of the multivariate normal distribution to establish this result again in a simple and elegant manner (see below).

MARGINALS ARE MULTINORMAL

UNCORRELATED MARGINALS ARE INDEPENDENT

* Linear transform of a multinormal vector
  - Transform of the quadratic form
  - The Jacobian
  - The distribution of the transform is multinormal
* Special case : the two groups of variables are uncorrelated
  - Partitioning the covariance matrix
  - Partitioning the quadratic form
  - The marginals are multinormal
  - Lack of correlation implies independence
* General case
  - Transformation of the original distribution
  - The marginals of the transformed distribution are multinormal
  - The marginals of the original distribution are multinormal
* Second demonstration

_______________________________________________________________

 Tutorial 3

We now calculate the conditional distributions of the multivariate normal distribution.

The most straightforward method would be to call on the fundamental property of conditional distributions, which states that a conditional distribution is equal to :

* The joint distribution of the complete set of variables,

* Divided by the joint distribution of the conditioning variables.

In this particular case, this approach leads to fairly cumbersome calculations, and we'll find it more convenient to first transform the original distribution into a new distribution that can be conveniently factored. The inverse transformation will then allow us to write the original distribution in a form that will make calculating the conditional distributions very simple.

-----

We'll then remark that :

1) A conditional distribution of a multivariate normal distribution is multinormal.

2) Its mean vector depends linearly on the conditioning vector.

3) Its covariance matrix does not depend on the value assigned to the conditioning vector.

CONDITIONAL DISTRIBUTIONS

OF THE MULTIVARIATE NORMAL DISTRIBUTION

* Factorization of the joint distribution
  - Transformation of the joint distribution
  - Definition of the transformation
  - Mean vector
  - Covariance matrix
  - Factorization of the transformed distribution
  - Factorization of the original distribution
* Conditional distributions of the multivariate normal distribution
  - The conditional distributions are multinormal
  - The mean vector depends linearly on the conditioning vector
  - The covariance matrix does not depend on the conditioning vector

_______________________________________________________________

 Tutorial 4

In this Tutorial, we look at the multinormal vector X = (X1, X2 ) from the point of view of using X2 for predicting X1.

Identifying the conditional distributions of the multivariate normal distribution, and in particular the expectation of X1 conditional on the value of X2, allows us to regard the multivariate normal distribution as a linear regression model.

-----

We will show that this model is better than any other linear model in at least two respects :

* It minimizes the Mean Square Error (MSE) between predictions and observations.

* It maximizes the correlation coefficient between each of the variables and any linear combination of the other variables used for predicting the value of this variable. This coefficient is called the Multiple Correlation Coefficient attached to the variable, and we'll calculate its value.

MINIMIZATION OF THE PREDICTION MEAN SQUARED ERRORS

MAXIMIZATION OF THE MULTIPLE CORRELATION COEFFICIENT

* Residuals
  - Residual vector
  - Predictor and residual are uncorrelated
  - Minimization of the prediction Mean Squared Errors
* Multiple correlation coefficient
  - Correlation between observations and predictions, multiple correlation
  - The conditional expectation model maximizes the multiple correlation coefficient
  - Value of the multiple correlation coefficient

________________________________________________________________________

 Tutorial 5

We now calculate the moment generating function M(t) of the multivariate normal distribution. Because this distribution is multivariate, the parameter t is a vector, but the function itself is scalar.

As is often the case, the m.g.f. turns out to be a very convenient and powerful tool for establishing all sorts of results about a distribution, sometimes in a concise and elegant manner. We'll go over some of the results that we already obtained somewhat laboriously using Linear Algebra, and show how they can also be obtained by using the m.g.f. In particular, we'll generalize the result about linear transforms of a multinormal vector to the case where :

* The matrix of the transformation is square, but singular.

* The matrix of the transformation is rectangular.

We'll also show that a characteristic property of the multinormal distribution is that any linear combination of its components is (univariate) normal. Recall that this property is sometimes used as the definition of the multivariate normal distribution.

MOMENT GENERATING FUNCTION

OF THE MULTIVARIATE NORMAL DISTRIBUTION

* Moment generating function of the multivariate normal distribution
  - Special case : the spherical standard multinormal distribution
  - The general case
* Some immediate consequences of the Moment Generating Function
  - General linear transform of a multinormal vector is multinormal
  - Marginals are multinormal
  - Uncorrelation implies independence
  - A vector is multinormal iff all linear combinations of its components are normal

__________________________________________________

* Univariate normal distribution
* Bivariate normal distribution
* Positive definite matrix
* Covariance matrix
* Multiple Linear Regression
* Quadratic forms