Covariance matrix
The variance of a variable is a measure of the dispersion of the values taken by the variable around its mean value. How can this notion be generalized to the case where n variables are considered simultaneously, instead of just one? How can the spread of the n-dimensional joint probability density be expressed in numbers?
There is no general, practical way of doing that, except when the distribution is multinormal. Then, the distribution is entirely characterized by:
* Its mean vector µ,
* And the set of pairwise covariances Cov(Xi, Xj), including the variances Var(Xi) (recall that Cov(Xi, Xi) = Var(Xi)).
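It is common to group all these numbers in a square table called the Covariance Matrix of the distribution according to the following layout:
$$
\begin{pmatrix}
V(X_1) & C(X_1, X_2) & \cdots & C(X_1, X_n) \\
C(X_2, X_1) & V(X_2) & \cdots & C(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
C(X_n, X_1) & C(X_n, X_2) & \cdots & V(X_n)
\end{pmatrix}
$$
For convenience, "Cov" is denoted by "C", and "Var" by "V" in the above layout.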
The covariance matrix is often denoted S. So, quite generally:
S[i, j] = Cov(Xi, Xj)
The matrix is symmetric with respect to the main diagonal, because the covariance is itself symmetric: Cov(Xi, Xj) = Cov(Xj, Xi).
If all variables are of unit variance, then the covariance matrix is the same as the Correlation Matrix.
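The properties above can be checked numerically. The following is a minimal sketch using NumPy; the sample data and the `mixing` matrix are made up for illustration, and any correlated sample would do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: 1000 observations of 3 correlated variables,
# built by mixing independent standard normal draws.
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.3, 0.2, 0.5]])
X = rng.standard_normal((1000, 3)) @ mixing.T

# Sample covariance matrix: S[i, j] estimates Cov(Xi, Xj).
# rowvar=False tells NumPy that each column is a variable.
S = np.cov(X, rowvar=False)

# Rescale every variable to unit variance, then recompute the covariance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
S_z = np.cov(Z, rowvar=False)

# Correlation matrix of the original variables, for comparison.
R = np.corrcoef(X, rowvar=False)

print(np.allclose(S, S.T))    # symmetry about the main diagonal
print(np.allclose(S_z, R))    # unit variances: covariance equals correlation
```

The diagonal of `S` holds the sample variances of the three columns, and both printed checks should hold for any sample, not only this one.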
What follows requires a basic knowledge of Linear Algebra, and can be skipped by the casual reader.
The covariance matrix is more than just a convenient way of displaying numbers. It is actually a matrix in the full mathematical sense of the word. More specifically:
1) The covariance matrix can usually be inverted into the inverse covariance matrix S^-1, a matrix such that:
S . S^-1 = S^-1 . S = I
where:
* the product "." is to be understood in the "product of matrices" sense,
* and I is the n-identity matrix, a square matrix with "1" in the diagonal positions, and "0" everywhere else.
The covariance matrix cannot be inverted when it is singular (not regular), that is, when the multinormal density it represents is not truly n-dimensional. For example, in ordinary 3D space, the covariance matrix of a flat 2D "pancake" binormal distribution cannot be inverted.
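Both situations can be illustrated with a short NumPy sketch. The matrix `S` and the "pancake" sample below are invented for the example; the pancake is obtained by making the third coordinate an exact linear combination of the first two.

```python
import numpy as np

# A hypothetical regular (invertible) 3 x 3 covariance matrix.
S = np.array([[4.0, 1.0, 0.5],
              [1.0, 2.0, 0.3],
              [0.5, 0.3, 1.0]])
S_inv = np.linalg.inv(S)
I = np.eye(3)

# Both products give the identity matrix, up to rounding error.
product_ok = np.allclose(S @ S_inv, I) and np.allclose(S_inv @ S, I)
print(product_ok)

# A flat "pancake": a 2D binormal sample embedded in 3D space.
rng = np.random.default_rng(1)
flat = rng.standard_normal((500, 2))
pancake = np.column_stack([flat[:, 0], flat[:, 1], flat[:, 0] + flat[:, 1]])
S_flat = np.cov(pancake, rowvar=False)

# The rank is lower than the dimension of the space: S_flat is singular.
rank = np.linalg.matrix_rank(S_flat)
print(rank)
```

Because the rank of `S_flat` is smaller than 3, its determinant is zero up to rounding error, and attempting to invert it is numerically meaningless.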
The inverse covariance matrix plays a central role for multinormal distributions. Recall that the 1-dimensional normal density function is:
$$
f(x) = \frac{1}{s \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2 s^2} \right)
$$
Note that the variance s² appears in two positions:
* In the normalization factor, as the standard deviation s.
* In the denominator of the term inside the exponential.
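A more complex, but similar, expression holds for multinormal distributions. For a vector x of n variables with mean vector µ and covariance matrix S, the expression for the probability density function then becomes:
$$
f(x) = \frac{1}{(2\pi)^{n/2} \, |S|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T S^{-1} (x - \mu) \right)
$$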
where "| |" stands for "determinant". Clearly, the ordinary 1-dimensional case is a special case of this more general expression.
Techniques like Discriminant Analysis rely heavily on this kind of expression for describing the density within classes in a classification problem.
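As an illustration, the multinormal density can be written in a few lines of NumPy and compared with the 1-dimensional formula. This is only a sketch: the function names `multinormal_pdf` and `normal_pdf` are chosen for this example, and the code assumes that S is regular.

```python
import numpy as np

def multinormal_pdf(x, mu, S):
    """Multinormal density at x, for mean vector mu and covariance matrix S."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    S = np.atleast_2d(np.asarray(S, dtype=float))
    n = mu.size
    diff = x - mu
    # Quadratic form (x - mu)^T S^-1 (x - mu), computed by solving a
    # linear system rather than forming the inverse explicitly.
    quad = diff @ np.linalg.solve(S, diff)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(S))
    return np.exp(-0.5 * quad) / norm

def normal_pdf(x, mu, s):
    """Ordinary 1-dimensional normal density with standard deviation s."""
    return np.exp(-(x - mu) ** 2 / (2.0 * s ** 2)) / (s * np.sqrt(2.0 * np.pi))

# With n = 1 the covariance matrix is the 1 x 1 matrix [s^2],
# and the two expressions give the same value.
print(np.isclose(multinormal_pdf(1.3, 0.5, 2.0 ** 2), normal_pdf(1.3, 0.5, 2.0)))
```

With n = 1 the determinant |S| is just s² and S^-1 is 1/s², which is how the general expression reduces to the ordinary one.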
2) The covariance matrix can be diagonalized. Because it is real and symmetric, this is always possible; when the matrix is singular, one or more of its eigenvalues are simply zero. See the interactive animation.
* The eigenvectors of the covariance matrix are the Principal Components of the distribution. The first Principal Component is the direction of maximum elongation of the "football-shaped" multinormal distribution. The second eigenvector is, of all the directions orthogonal to the first eigenvector, the direction of maximum elongation, and so on.
* The eigenvalues are the variances of the projections of the distribution on the eigenvectors.
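The two statements above can be verified on a simulated sample. The sketch below assumes a hypothetical 2-dimensional covariance matrix `S_true` and uses `numpy.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical binormal sample with correlated variables.
S_true = np.array([[3.0, 1.2],
                   [1.2, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], S_true, size=5000)
S = np.cov(X, rowvar=False)

# Eigen-decomposition of the symmetric matrix S.
eigvals, eigvecs = np.linalg.eigh(S)

# Reorder so that the first Principal Component comes first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]   # columns are the Principal Components

# Project the centered sample on each eigenvector and measure the variance.
projections = (X - X.mean(axis=0)) @ eigvecs
proj_var = projections.var(axis=0, ddof=1)

print(np.allclose(proj_var, eigvals))   # variances of projections = eigenvalues
```

The first column of `eigvecs` is the direction of maximum elongation of the cloud, and the columns are mutually orthogonal.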
If a change of reference frame is made so that the Principal Components become the new axes, then the directions of maximum elongation are the axes themselves, the new variables are uncorrelated, and the new covariance matrix is diagonal.
If units are then changed so that all Principal Components now carry the same variance, then the distribution becomes spherically symmetrical, like a globular cluster (for astronomers only!). The distribution is then said to be "sphericized".
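These two steps, the change of reference frame followed by the change of units, can be sketched as follows. The sample and the matrix `S_true` are again hypothetical, and the code assumes that no eigenvalue is zero, since each Principal Component is divided by its standard deviation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical binormal sample with correlated variables.
S_true = np.array([[3.0, 1.2],
                   [1.2, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], S_true, size=5000)
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Step 1: change of reference frame. The Principal Components become
# the new axes, and the new variables are uncorrelated.
Y = Xc @ eigvecs
S_Y = np.cov(Y, rowvar=False)    # diagonal, with the eigenvalues on the diagonal

# Step 2: change of units. Each new variable is divided by its
# standard deviation, so that all axes carry the same (unit) variance.
W = Y / np.sqrt(eigvals)
S_W = np.cov(W, rowvar=False)    # identity matrix: the cloud is sphericized

print(np.allclose(S_Y, np.diag(eigvals)))
print(np.allclose(S_W, np.eye(2)))
```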
Note that this is not true if a change of units is made so that the original axes all carry the same variance (for example, if the original variables are standardized). The resulting cloud, although it has the same variance on all original axes, is not spherically symmetrical. You may experiment with this idea in the interactive animation.
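The same point can be checked numerically. In this sketch (hypothetical sample, NumPy assumed), the original variables are standardized without any rotation to the Principal Components first.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical binormal sample with correlated variables.
S_true = np.array([[3.0, 1.2],
                   [1.2, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], S_true, size=5000)

# Standardize the ORIGINAL variables: unit variance on each original axis.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
S_Z = np.cov(Z, rowvar=False)

# The diagonal is all ones, but the off-diagonal correlation remains.
print(S_Z)

# Variances along the Principal Components of the standardized cloud.
# For two variables with correlation r they equal 1 - r and 1 + r,
# so they differ unless r = 0: the cloud is still elongated.
spread = np.linalg.eigvalsh(S_Z)
print(spread)
```

Standardization only rescales the original axes. It leaves the correlation between the variables untouched, which is why the cloud keeps its elongated shape.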
These topics are further developed in the Tutorial on Principal Components Analysis, or PCA.