
Covariance matrix

The variance of a variable is a measure of the dispersion of the values taken by the variable around its mean value.

The covariance matrix generalizes the concept of variance to random vectors, or sets of random variables.

Definition of the covariance matrix

Informal definition

Let x = {X1, X2, ..., Xp} be a random vector with mean vector µ = {µ1, µ2, ..., µp}.

    * The dispersion of each Xi around its mean is measured by its variance (which is the covariance of Xi with itself).

    * The covariance Cov(Xi, Xj) of the pair {Xi, Xj} is a measure of the linear coupling between these two variables.

 

This set of numbers (together with the set of means of the Xis) completely defines the structure of the joint probability distribution of {X1, X2, ..., Xp} up to order 2, just as the mean and variance of a single random variable completely define its distribution up to order 2.

 

It is common to group all these numbers into a square table called the Covariance Matrix of the distribution, whose entry in row i and column j is Cov(Xi, Xj) :

    Var(X1)       Cov(X1, X2)   ...   Cov(X1, Xp)
    Cov(X2, X1)   Var(X2)       ...   Cov(X2, Xp)
    ...           ...           ...   ...
    Cov(Xp, X1)   Cov(Xp, X2)   ...   Var(Xp)

The covariance matrix is often denoted Σ.

    * σij is the covariance of Xi and Xj.

    * σii is the covariance of Xi with itself, that is its variance σi². So the diagonal elements of the covariance matrix are the variances of the Xis.

Formal definition

Just as the variance of a single r.v. X is defined by :

Var(X) = E[(X - µ)²]

the covariance matrix of a random vector is formally defined by :

Σ = E[(x - µ)(x - µ)']

 

 

which is easily verified to be equivalent to the informal definition given above.

-----

Just as :

Var(X) = E[X²] - (E[X])²

for a single r.v., it is easily verified that for a random vector x :

Σ = E[xx'] - µµ'
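
As a quick numerical check, the two expressions of the covariance matrix (the expectation of the centered outer product, and E[xx'] - µµ') can be compared on a small example. The sketch below is only illustrative: the four-point discrete distribution and the names `dist`, `sigma_centered` and `sigma_moments` are made up for this purpose and do not come from the text.

```python
# Hypothetical discrete distribution of a 2-dimensional random vector:
# each entry is (probability, (x1, x2)). The values are illustrative only.
dist = [
    (0.25, (0.0, 0.0)),
    (0.25, (1.0, 0.0)),
    (0.25, (1.0, 2.0)),
    (0.25, (2.0, 2.0)),
]
p = 2  # dimension of the random vector

# Mean vector: mu[i] = E[Xi]
mu = [sum(prob * x[i] for prob, x in dist) for i in range(p)]

# First form: entry (i, j) of E[(x - mu)(x - mu)']
sigma_centered = [
    [sum(prob * (x[i] - mu[i]) * (x[j] - mu[j]) for prob, x in dist)
     for j in range(p)]
    for i in range(p)
]

# Second form: entry (i, j) of E[xx'] - mu mu'
sigma_moments = [
    [sum(prob * x[i] * x[j] for prob, x in dist) - mu[i] * mu[j]
     for j in range(p)]
    for i in range(p)
]

print(sigma_centered)
print(sigma_moments)
```

Both matrices should agree entry by entry, with the variances of X1 and X2 on the diagonal and their covariance off the diagonal.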

 

______________

 

If all variables are standardized, the covariance matrix is identical to the Correlation Matrix.
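
The statement about standardized variables can be checked on a sample. The following is a minimal sketch, assuming the 1/n convention for variances and covariances; the five-point sample and the helper `covariance_matrix` are hypothetical and only serve the illustration.

```python
import math

def covariance_matrix(rows):
    """Covariance matrix (1/n convention) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(p)]
    return [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
         for j in range(p)]
        for i in range(p)
    ]

# Hypothetical sample: n = 5 observations of p = 2 variables.
sample = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
n, p = len(sample), len(sample[0])

cov = covariance_matrix(sample)
std = [math.sqrt(cov[i][i]) for i in range(p)]

# Correlation matrix obtained from the covariances of the original variables.
corr = [[cov[i][j] / (std[i] * std[j]) for j in range(p)] for i in range(p)]

# Standardize each variable (zero mean, unit variance), then recompute
# the covariance matrix of the standardized sample.
mean = [sum(r[i] for r in sample) / n for i in range(p)]
standardized = [tuple((r[i] - mean[i]) / std[i] for i in range(p))
                for r in sample]
cov_standardized = covariance_matrix(standardized)

print(corr)
print(cov_standardized)
```

Up to rounding, `cov_standardized` and `corr` should be the same matrix, with ones on the diagonal.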

Change of basis

The covariance matrix of a random vector is not an intrinsic quantity attached to its distribution : it depends on the basis in which it is calculated. We'll see below that some bases allow the covariance matrix to take a particularly simple and useful form.
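
To make the dependence on the basis concrete: if y = Ax for a fixed matrix A, the formal definition gives E[(Ax - Aµ)(Ax - Aµ)'] = A E[(x - µ)(x - µ)'] A', so the covariance matrix of y is A Σ A'. The sketch below applies this to a rotation of the axes in dimension 2. The matrix `sigma_x`, the two angles and the helper names are assumptions chosen for illustration.

```python
import math

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def rotate_basis(sigma, theta):
    """Covariance matrix of y = Ax, where A rotates the axes by theta."""
    c, s = math.cos(theta), math.sin(theta)
    A = [[c, s], [-s, c]]
    return matmul(matmul(A, sigma), transpose(A))

# Hypothetical covariance matrix in the original basis.
sigma_x = [[2.0, 1.0], [1.0, 2.0]]

# The same distribution described in two rotated bases.
sigma_30 = rotate_basis(sigma_x, math.radians(30))
sigma_45 = rotate_basis(sigma_x, math.radians(45))

print(sigma_30)  # still has non-zero off-diagonal terms
print(sigma_45)  # diagonal, up to rounding
```

For this particular matrix, the basis rotated by 45 degrees is one in which the covariance matrix is diagonal; in the basis rotated by 30 degrees it is not.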

Covariance matrix and multivariate normal distribution

The multivariate normal distribution (or "multinormal distribution") plays a central role in data modeling as real-life multivariate data is often at least approximately multinormally distributed.

Recall that the multinormal distribution is entirely determined by its mean vector and its covariance matrix. So the following two statements are equivalent :

    * We develop a theory where data distribution is assumed to be multinormal.

    * We develop a theory where we make no assumption about data distribution, but the theory is developed only up to the second order.

 

This is in particular the approach chosen by Discriminant Analysis.

Sample covariance matrix

We defined the covariance matrix of a multivariate distribution. But the same definition applies to a sample drawn from this distribution (just as in the univariate case). The terms "variance" and "covariance" just have to be replaced by "sample variance" and "sample covariance". The matrix thus obtained is then called the "sample covariance matrix" (or "empirical covariance matrix").

Let X be the data matrix of a centered sample of size n :

 

 

    * The first draw from the multivariate distribution delivers the first realization of the random vector, whose coordinates make up the first row of X.

    * The second draw from the multivariate distribution delivers the second realization of the random vector, whose coordinates make up the second row of X.

    * ...

and there are n draws so we have a sample of size n from the multivariate distribution.

Then it is easily seen that the sample covariance matrix, denoted S here, is 1/n times the product of the transpose of X by X :

X'X = nS
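
The relation between the centered data matrix and the sample covariance matrix can be sketched in a few lines. This is an illustration under assumptions: the raw sample is invented, the name `S` is used for the sample covariance matrix, and the 1/n convention of the text is followed (many statistical packages default to 1/(n - 1) instead).

```python
# Hypothetical raw sample: n = 4 observations (rows) of p = 2 variables (columns).
raw = [[2.0, 1.0], [4.0, 3.0], [6.0, 2.0], [8.0, 6.0]]
n, p = len(raw), len(raw[0])

# Center the sample: subtract the mean of each column.
col_means = [sum(row[j] for row in raw) / n for j in range(p)]
X = [[row[j] - col_means[j] for j in range(p)] for row in raw]

# X'X is a p x p matrix: entry (i, j) sums the products of columns i and j.
XtX = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
       for i in range(p)]

# Sample covariance matrix, 1/n convention as in the text.
S = [[XtX[i][j] / n for j in range(p)] for i in range(p)]

print(S)
```

Each row of `X` is one draw, so `X` has n rows and p columns, and `X'X` is always p x p whatever the sample size.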

 

 

The above illustration represents the most commonly encountered case where n > p (the number of observations is larger than the number of variables).

Properties of a covariance matrix

The covariance matrix is not just a convenient way of displaying numbers. As a matrix, it has several important properties which derive from the fact that a covariance matrix is always positive semidefinite. The converse is also true : any positive semidefinite matrix is the covariance matrix of a random vector (in fact, of many).
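
Positive semidefiniteness has a direct interpretation: for any vector a, the quadratic form a'Σa is the variance of the linear combination a'x, and a variance cannot be negative. The sketch below checks this on a sample covariance matrix. The sample, the list of directions and the helper `covariance_matrix` are hypothetical, and the 1/n convention is assumed.

```python
def covariance_matrix(rows):
    """Covariance matrix (1/n convention) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(p)]
    return [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
         for j in range(p)]
        for i in range(p)
    ]

# Hypothetical sample of 5 observations of 2 variables.
sample = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
S = covariance_matrix(sample)

# A few arbitrary directions a (not necessarily of unit length).
directions = [(1.0, 0.0), (0.0, 1.0), (1.0, -1.0), (2.0, -3.0), (-0.5, 4.0)]

checks = []
for a in directions:
    # Quadratic form a' S a
    quad_form = sum(a[i] * S[i][j] * a[j] for i in range(2) for j in range(2))
    # Variance of the projected values a'x over the sample
    projections = [a[0] * r[0] + a[1] * r[1] for r in sample]
    m = sum(projections) / len(projections)
    var_projection = sum((v - m) ** 2 for v in projections) / len(projections)
    checks.append((quad_form, var_projection))

for quad_form, var_projection in checks:
    print(quad_form, var_projection)
```

In each pair the two numbers should coincide up to rounding, and none should be negative.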

 

In particular, the spectral decomposition of the covariance matrix of a random vector x shows that :

    * There exists an orthonormal basis such that the covariance matrix Σ of x expressed in this basis is diagonal. The axes of this new basis are called the Principal Components of Σ (or of the distribution of x).

    * As the off-diagonal elements of this new matrix are 0, the new variables defined by this new basis (the projections of x on the Principal Components) are uncorrelated.

    * The diagonal elements of this new, diagonal covariance matrix are the eigenvalues of Σ. So the variances of the projections of x on the Principal Components are equal to the corresponding eigenvalues of Σ.

    * If units are changed so that all Principal Components now carry the same variance, the distribution becomes spherically symmetrical. The distribution is then said to be "standardized".


Note that this is not true if a change of units is made so that the original axes all carry the same variance (for example, if the original variables are standardized). The resulting cloud, although it has the same variance on all original axes, is not standardized (the marginal distributions are correlated). You may experiment with this idea in the interactive animation below.
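
The difference between the two rescalings can be seen on a small bivariate sample. The sketch below is an illustration under assumptions: the sample is invented, variances use the 1/n convention, and the angle of the principal axes is obtained from the closed-form expression tan(2θ) = 2 Cov(x, y) / (Var(x) - Var(y)), which is valid in dimension 2 only.

```python
import math

def covariance_matrix(rows):
    """Covariance matrix (1/n convention) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(p)]
    return [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
         for j in range(p)]
        for i in range(p)
    ]

# Hypothetical sample, centered before any rescaling.
sample = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
n = len(sample)
mx = sum(x for x, _ in sample) / n
my = sum(y for _, y in sample) / n
centered = [(x - mx, y - my) for x, y in sample]

S = covariance_matrix(centered)
a, b, c = S[0][0], S[1][1], S[0][1]

# (1) Equalize the variances on the ORIGINAL axes.
standardized = [(x / math.sqrt(a), y / math.sqrt(b)) for x, y in centered]
cov_standardized = covariance_matrix(standardized)

# (2) Rotate onto the principal axes, then equalize the variances there.
theta = 0.5 * math.atan2(2 * c, a - b)
cos_t, sin_t = math.cos(theta), math.sin(theta)
rotated = [(cos_t * x + sin_t * y, -sin_t * x + cos_t * y)
           for x, y in centered]
cov_rotated = covariance_matrix(rotated)
lam1, lam2 = cov_rotated[0][0], cov_rotated[1][1]  # assumed non-zero here
whitened = [(u / math.sqrt(lam1), v / math.sqrt(lam2)) for u, v in rotated]
cov_whitened = covariance_matrix(whitened)

print(cov_standardized)  # unit variances, but a non-zero covariance remains
print(cov_whitened)      # identity matrix, up to rounding
```

Only the second rescaling removes the correlation: the first one leaves the off-diagonal term equal to the correlation coefficient of the original variables.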

__________________________________________________________

These remarks are the starting point of Principal Components Analysis (PCA).

They are illustrated by the following interactive animation, and demonstrated in the Tutorial below.

Animation

 This animation illustrates the concept of Covariance Matrix.

 

 


 

 

Upper frame

Lower frame

            The green (x', y') axes of the upper frame are rotated so that the x' axis is now in the familiar horizontal direction. The sample and the ellipse have been rotated along with the (x', y') reference frame. The axes of the ellipse are now horizontal and vertical, but it has exactly the same size and shape as the ellipse in the upper frame.

Recall that x' is the direction of the maximum elongation of the sample. So now the sample looks stretched out horizontally (but in fact, its shape is exactly the same as in the upper frame).

In a similar way, y' is the direction of minimum elongation. The sample looks "squashed" in the y' direction.

Covariance Matrix

            To the right of the upper frame is the sample's Covariance Matrix.

Diagonal elements

                    They are the variances of the projections of the sample respectively on the (horizontal and vertical) x and y axes.

Off-diagonal elements

                    They are equal (the matrix is said to be "symmetrical"), and their common value is the covariance Cov(x, y) = Cov(y, x).

 

Diagonalized Covariance Matrix

            To the right of the lower frame is the so-called "Diagonalized Covariance Matrix". It is the Covariance Matrix of the sample as shown in the lower frame.

Diagonal elements

                    They are the variances of the projections of the sample respectively on the (horizontal and vertical) x' and y' axes.

                        * The first value is the largest possible variance of the projection of a sample on any axis. Notice that it is larger than either variance as read in the upper Covariance Matrix.  In the vocabulary of Linear Algebra (and of PCA), this value is the First (or Largest) Eigenvalue of the original Covariance Matrix.

The half-length of the long axis of the ellipse is the square root of the first eigenvalue. It is denoted by a horizontal orange segment.
 

                        * The second value is the smallest possible variance of the projection of a sample on any axis. Notice that it is smaller than either variance as read in the upper Covariance Matrix. It is the second eigenvalue of the original Covariance Matrix.

The half-length of the short axis of the ellipse is the square root of the second eigenvalue. It is denoted by a vertical orange segment.
 

                        * The sum of the two variances in the Covariance Matrix is equal (up to round-off errors) to the sum of the variances in the Diagonalized Covariance Matrix. First, this is a theorem of Linear Algebra (the so-called "trace" of a square matrix is invariant under a change of orthonormal reference frame). Second, in the framework of PCA, this sum receives an interpretation which is independent of any reference frame.

Off-diagonal elements

                    Both off-diagonal elements are zero (so the matrix is still symmetrical). This reads "x' and y' have zero covariance, they are uncorrelated". This can be demonstrated, but it is quite intuitive : in the (x', y') reference frame, as you move along the x' axis to the right, there is no systematic tendency for y' to go up or down. Discovering such a tendency was the very reason for inventing the concept of Covariance in the first place, so "no tendency" certainly leads one to expect "zero covariance".
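
The values shown in the Diagonalized Covariance Matrix can be reproduced by hand in dimension 2, where the eigenvalues of a symmetric matrix have a closed form. The sketch below is a hedged illustration: the three input numbers are invented, and `eigenvalues_2x2` is a hypothetical helper, not part of the animation.

```python
import math

def eigenvalues_2x2(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, c], [c, b]]."""
    half_trace = (a + b) / 2
    radius = math.hypot((a - b) / 2, c)
    return half_trace + radius, half_trace - radius

# Hypothetical entries of a 2 x 2 covariance matrix.
var_x, var_y, cov_xy = 2.0, 1.0, 0.8

lam1, lam2 = eigenvalues_2x2(var_x, var_y, cov_xy)

# Half-lengths of the long and short axes of the ellipse, as described above.
half_long = math.sqrt(lam1)
half_short = math.sqrt(lam2)

print("eigenvalues:", lam1, lam2)
print("sum of variances:", var_x + var_y, "sum of eigenvalues:", lam1 + lam2)
print("half-axes:", half_long, half_short)
```

The first eigenvalue is at least as large as either original variance, the second is at most as small as either, and the two sums agree, which is the trace invariance mentioned above.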

Animation

            In the upper frame, move red points about with the tip of your mouse, and observe changes of :

 

In the "general case", the sample has a somewhat elongated shape at an angle with x.

 


A word about the green axes. Whereas their directions are defined unambiguously, their + and - orientations are arbitrary. This animation chooses orientations such that :
   * Increasing values of x' always go to the right.
   * Increasing values of y' always go upward.
This causes abrupt changes in axis orientations when an axis passes through a vertical (x') or horizontal (y') position, with a corresponding discontinuous change in the displayed sample in the lower frame.

____________________________________________

 

 Related animations :

Inertia

Bivariate normal distribution

Mahalanobis distance

_________________________________________________________________________

 

 

Tutorial

 

This Tutorial addresses some of the basic properties of covariance matrices.

 

    * We first show that a covariance matrix is positive semidefinite and that, conversely, any positive semidefinite matrix is the covariance matrix of a random vector (in fact, of infinitely many).

 

    * When a covariance matrix is only positive semidefinite instead of being positive definite, we'll show that the distribution of the random vector is degenerate : it occupies only a subspace (whose dimension we'll calculate) of the complete space.

 

    * The spectral decomposition of a covariance matrix is in fact its diagonalization. This will lead us to demonstrate the following well-known facts :

        - The eigenvalues of a covariance matrix are equal to the variances of the projections of the random vector on the eigenvectors of this covariance matrix.

        - The direction of the largest variance of a projection of x is that defined by the eigenvector associated with the largest eigenvalue of the covariance matrix.

        - More generally, we'll show that if the eigenvectors u1, u2, ..., up are sorted by decreasing values of the eigenvalues, then for k < p the direction orthogonal to {u1, u2, ..., uk} which maximizes the projected variance of x is uk+1.

        - The projections of x on the eigenvectors of the covariance matrix are uncorrelated random variables.

 

    * We finally introduce the Mahalanobis transformation that leads to the notion of Mahalanobis distance, a random variable whose properties we'll describe, with a special emphasis on the case where x is a multivariate normal vector.
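
As a preview of the last item, here is a minimal sketch of the Mahalanobis distance in dimension 2. It assumes the usual definition d(x) = sqrt((x - µ)' Σ⁻¹ (x - µ)) with a positive definite Σ; the function name `mahalanobis_2d`, the matrix and the two points are hypothetical and are not taken from the Tutorial.

```python
import math

def mahalanobis_2d(point, mu, sigma):
    """Mahalanobis distance of a 2-D point to mu, for a symmetric
    positive definite 2 x 2 covariance matrix sigma."""
    a, c = sigma[0][0], sigma[0][1]
    b = sigma[1][1]
    det = a * b - c * c
    if det <= 0:
        raise ValueError("sigma must be positive definite")
    dx, dy = point[0] - mu[0], point[1] - mu[1]
    # Inverse of [[a, c], [c, b]] is (1/det) * [[b, -c], [-c, a]]
    d2 = (b * dx * dx - 2 * c * dx * dy + a * dy * dy) / det
    return math.sqrt(d2)

# Hypothetical distribution: positively correlated variables.
mu = (0.0, 0.0)
sigma = [[2.0, 1.0], [1.0, 2.0]]

# Two points at the same Euclidean distance from the mean.
d_along = mahalanobis_2d((1.0, 1.0), mu, sigma)    # along the long axis
d_across = mahalanobis_2d((1.0, -1.0), mu, sigma)  # along the short axis

print(d_along, d_across)
```

The point lying in the direction of largest variance is closer to the mean in the Mahalanobis sense than the point lying in the direction of smallest variance, although both are at the same Euclidean distance.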

 

 

 

 

 

COVARIANCE MATRIX

A covariance matrix is positive semidefinite, and conversely

A covariance matrix is positive semidefinite

A positive semidefinite matrix is a covariance matrix

Singular covariance matrices

Degenerate distribution

Dimension of the subspace of the distribution

Diagonalization of a covariance matrix

Eigenvalues are projected variances

Eigenvectors are directions of largest projected variance

The First Principal Component

Other Principal Components

Projections on eigenvectors are uncorrelated random variables

Direct calculation

By diagonalization of the covariance matrix

Mahalanobis distance

Standardization of a random vector

Mahalanobis transformation and Mahalanobis distance

The general case

The multivariate normal case, Chi-square distribution

TUTORIAL

 

___________________________________________________

 

Related readings :

Positive semidefinite matrix

Principal Components Analysis

Discriminant Analysis

Correlation matrix

Covariance

Variance

Inertia

Bivariate normal distribution

Multivariate normal distribution

Mahalanobis distance
