
Mahalanobis distance

The "Mahalanobis distance" is a metric (a rule for calculating the distance between two points) that is better adapted than the usual "Euclidean distance" to settings involving distributions that are not spherically symmetric. It is particularly useful when multinormal distributions are involved, although its definition does not require the distributions to be normal.

This page describes two common cases where the Mahalanobis distance plays an important role :

1) Distance of a point to the mean of a distribution,

2) And, further down, distance between the means of two distributions.

# Distance of an observation to the mean of a distribution

## Definition of the Mahalanobis distance

In this image, the two points A and B are equally distant from the center µ of the distribution.

Yet, it seems inappropriate to say that they occupy "equivalent" positions with respect to µ, as :

* A is in a low density region,

* While B is in a high density region.

So, in a situation like this one, the usual Euclidean distance :

$$d^2(A, \mu) = \sum_i (a_i - \mu_i)^2$$

where the a_i and µ_i are the coordinates of A and µ, does not seem to be the right tool for measuring the "distance" of a point to the center of the distribution.

We could instead consider two points as "equally distant from the mean" when they have the same probability density : observations drawn from the distribution are then equally likely to fall near either point.

Suppose now that the distribution is multinormal. Because the multivariate normal density depends on x only through the quadratic form below (it is proportional to exp(-D²/2)), two points with the same density also have the same value of the quantity :

$$D^2 = (x - \mu)'\,\Sigma^{-1}(x - \mu)$$

where Σ is the covariance matrix of the distribution.

D is called the Mahalanobis distance of the point x to the mean µ of the distribution.
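A minimal NumPy sketch of the definition, using a hypothetical diagonal covariance matrix and two hypothetical points A and B chosen to mimic the picture : both are at Euclidean distance 2 from µ, but their Mahalanobis distances differ. The helper name `mahalanobis_sq` is ours, not a library function.

```python
import numpy as np

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance D^2 = (x - mu)' cov^{-1} (x - mu)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve cov @ z = diff rather than forming cov^{-1} explicitly (more stable).
    z = np.linalg.solve(cov, diff)
    return float(diff @ z)

mu = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],    # variance 4 along the first axis: elongated cloud
                [0.0, 1.0]])   # variance 1 along the second axis
A = np.array([0.0, 2.0])       # across the narrow direction: low density
B = np.array([2.0, 0.0])       # along the long axis: high density

print(np.linalg.norm(A - mu), np.linalg.norm(B - mu))          # 2.0 2.0
print(mahalanobis_sq(A, mu, cov), mahalanobis_sq(B, mu, cov))  # 4.0 1.0
```

A, the point in the low-density region, ends up farther from µ than B in the Mahalanobis sense, as expected.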

## Mahalanobis distance and Discriminant Analysis

Suppose you want to discriminate between two spherical classes with the same spread and equal a priori probabilities. Then the best classification rule is simply to assign an observation x to the class whose center (mean) is closer to x in the sense of the ordinary Euclidean distance.

This is no longer true if the classes are not spherical (although still multinormal with identical covariance matrices). We should then assign x to the class it has the highest probability of belonging to, that is, the class with the largest probability density at x (because of the equal a priori probabilities). Since the two densities share the same covariance matrix, and hence the same normalizing constant, this is the class with the lowest Mahalanobis distance from x to the class mean. For example, in the lower image of the above illustration, x should be assigned to class C1 although it is in "C2 territory" from a Euclidean point of view.

So the "assign to the class whose mean is closer" paradigm is still valid, provided that "Euclidean distance" is replaced by "Mahalanobis distance".

This point of view is adopted and further developed by Discriminant Analysis.
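The "closest mean in Mahalanobis distance" rule can be sketched in a few lines, assuming equal a priori probabilities and a covariance matrix shared by the classes. The class means, covariance matrix and test point below are hypothetical, chosen so that the Euclidean and Mahalanobis rules disagree, as in the lower image.

```python
import numpy as np

def classify_mahalanobis(x, means, cov):
    """Return (index of nearest class mean in Mahalanobis distance, list of D^2).
    Assumes equal a priori probabilities and one covariance matrix shared by all classes."""
    x = np.asarray(x, dtype=float)
    d2 = [float((x - m) @ np.linalg.solve(cov, x - m)) for m in means]
    return int(np.argmin(d2)), d2

means = [np.array([0.0, 0.0]),   # hypothetical mean of C1
         np.array([2.0, 0.0])]   # hypothetical mean of C2
cov = np.array([[1.0, 0.9],      # shared covariance: strongly correlated,
                [0.9, 1.0]])     # so both classes are elongated ellipses
x = np.array([1.2, 1.2])

euclid = [float(np.linalg.norm(x - m)) for m in means]
label, d2 = classify_mahalanobis(x, means, cov)
print(int(np.argmin(euclid)))  # 1: C2 is closer in Euclidean distance
print(label)                   # 0: C1 is closer in Mahalanobis distance
```

The point lies along the long axis of the C1 ellipse, so it is "C2 territory" only from a Euclidean point of view.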

# Mahalanobis transformation, standardization of a distribution

The Mahalanobis distance is often associated with the multinormal distribution, but it is in fact more general than that.

Consider any non-degenerate distribution with covariance matrix Σ. We show here that it is possible to identify a transformation that makes the transformed distribution "standard", that is, with uncorrelated, standardized marginals. The covariance matrix of this new distribution is then I, the identity matrix.

This transformation is called the Mahalanobis transformation.
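One common form of this transformation, assumed in the sketch below, is z = Σ^(-1/2)(x - µ), using the symmetric inverse square root of Σ (multiplying by the inverse of a Cholesky factor of Σ would standardize the data just as well). The data are a hypothetical, deliberately non-normal example, a linear map of uniform noise, since the transformation does not require normality. With the sample mean and covariance plugged in, the transformed sample has covariance I up to rounding, and the squared norm of each z equals the squared Mahalanobis distance of the corresponding x.

```python
import numpy as np

def inv_sqrt(cov):
    """Symmetric inverse square root cov^(-1/2), via an eigendecomposition."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def mahalanobis_transform(X, mu, cov):
    """Map each row x of X to z = cov^(-1/2) (x - mu)."""
    W = inv_sqrt(cov)
    # W is symmetric, so (x - mu) @ W is the row form of W (x - mu).
    return (np.asarray(X, dtype=float) - mu) @ W

# Deliberately non-normal, correlated data: a linear map of uniform noise.
rng = np.random.default_rng(0)
U = rng.uniform(-1.0, 1.0, size=(5000, 2))
X = U @ np.array([[2.0, 0.0], [1.5, 0.5]]) + np.array([1.0, -3.0])

mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
Z = mahalanobis_transform(X, mu, cov)

print(np.round(np.cov(Z, rowvar=False), 6))   # close to the 2x2 identity matrix
# The squared norm of each z is the squared Mahalanobis distance of its x.
d2 = np.sum(Z**2, axis=1)
```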

-----

The Mahalanobis transformation is a powerful tool for studying the multivariate normal distribution, as it often allows the analyst :

* To transform a given multinormal distribution into the simple standard (spherical) multinormal distribution.

* To solve the problem at hand on this very simple distribution.

* Then to carry this solution back to the original distribution using the "reverse" Mahalanobis transformation.

For example, it can be used :

* For calculating the marginal distributions of the multinormal distribution (see here).

* For studying the distributions and conditions of independence of quadratic forms in multivariate normal variables (see here).

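* For complementing the Box-Muller transform when simulating a multinormal random vector : Box-Muller produces independent standard normal components, which the reverse Mahalanobis transformation then turns into a vector with the desired mean and covariance matrix.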
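A sketch of this last use, under stated assumptions : Box-Muller turns pairs of uniform numbers into independent N(0, 1) values, and the reverse transformation x = µ + Lz then gives the target mean and covariance. Here L is a Cholesky factor of Σ (Σ = LL') rather than the symmetric square root Σ^(1/2); either choice yields covariance Σ. The target µ and Σ are illustrative.

```python
import math
import random
import numpy as np

def box_muller(n, rng):
    """Return n independent N(0, 1) draws built from uniforms by Box-Muller."""
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()          # in (0, 1], so log(u1) is finite
        u2 = rng.random()
        r = math.sqrt(-2.0 * math.log(u1))
        out.append(r * math.cos(2.0 * math.pi * u2))
        out.append(r * math.sin(2.0 * math.pi * u2))
    return np.array(out[:n])

def sample_multinormal(n, mu, cov, seed=0):
    """Draw n vectors from N_p(mu, cov): standard normals, then x = mu + L z."""
    rng = random.Random(seed)
    mu = np.asarray(mu, dtype=float)
    p = mu.shape[0]
    Z = box_muller(n * p, rng).reshape(n, p)              # rows ~ N_p(0, I)
    L = np.linalg.cholesky(np.asarray(cov, dtype=float))  # cov = L L'
    return mu + Z @ L.T                                   # row form of mu + L z

X = sample_multinormal(20_000, [1.0, -2.0], [[2.0, 0.8], [0.8, 1.0]])
print(X.mean(axis=0))            # close to [1, -2]
print(np.cov(X, rowvar=False))   # close to [[2, 0.8], [0.8, 1]]
```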

# Mahalanobis distance and Linear Regression

Outliers are a major concern in Linear Regression : these observations, sitting at the periphery of the data cloud, have a stronger than average influence on the fitted model.

Among the many ways of detecting outliers is the Mahalanobis distance D : using the above expression for D, software packages calculate the Mahalanobis distance of each observation. With some care, this distance may be used to detect observations that may then be suspected of being quite different from the "average" observations in the data set. In this respect, the Mahalanobis distance complements other diagnostic tools such as "Cook's distance", another way of measuring how atypical an observation is.
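A minimal outlier screen along these lines, assuming the sample mean and sample covariance stand in for µ and Σ. The cut-off is an assumption : the 97.5% quantile of the Chi-square distribution with 2 degrees of freedom, -2 ln(0.025) ≈ 7.38, which is only a rough guide since it presumes normal data with known parameters. The data and the planted point (2.5, -2.5) are hypothetical : each of its coordinates is only moderately large on its own, but together they run against the strong positive correlation.

```python
import numpy as np

def mahalanobis_sq_rows(X):
    """D^2 of every row of X to the sample mean, using the sample covariance."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Row-wise (x - mean)' cov^{-1} (x - mean), without forming the inverse.
    return np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=200)
X = np.vstack([X, [2.5, -2.5]])      # planted point, index 200

d2 = mahalanobis_sq_rows(X)
threshold = -2.0 * np.log(0.025)     # 97.5% Chi-square quantile for p = 2
flagged = np.flatnonzero(d2 > threshold)
print(int(np.argmax(d2)))            # 200
```

A few ordinary points may also exceed the cut-off by chance (about 2.5% of them under normality), which is why the flagged observations are only "suspected" and deserve a closer look.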

# Distribution of the squared Mahalanobis distance

As the squared Mahalanobis distance D² is used in data modeling, it is important to know its probability distribution.

* The general case

Not much can be said about this distribution in the general case. Yet, we show here that whatever the (non-degenerate) distribution, the mean of D² is equal to p, the dimension of the space of the distribution.

$$E[D^2] = p$$
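The result follows from a short trace argument that uses only the definition Σ = E[(x - µ)(x - µ)'] and the linearity of expectation; the sketch below is one standard route, not necessarily the derivation given on the linked page.

$$
\begin{aligned}
E[D^2] &= E\left[(x-\mu)'\,\Sigma^{-1}(x-\mu)\right]
        = E\left[\operatorname{tr}\left(\Sigma^{-1}(x-\mu)(x-\mu)'\right)\right] \\
       &= \operatorname{tr}\left(\Sigma^{-1}\,E\left[(x-\mu)(x-\mu)'\right]\right)
        = \operatorname{tr}\left(\Sigma^{-1}\Sigma\right)
        = \operatorname{tr}(I_p) = p
\end{aligned}
$$

The first step uses the facts that a scalar equals its own trace and that tr(AB) = tr(BA). No normality is needed, only a finite, invertible covariance matrix.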

* Special case : multivariate normal distribution

If the distribution is multinormal, the distribution of D² is completely known : we show here that it is a Chi-square distribution with p degrees of freedom.

If x is multinormal, then $D^2 \sim \chi^2_p$.

Of course, owing to the properties of the Chi-square distribution (a Chi-square variable with p degrees of freedom has mean p), we find again that the expectation of D² is p.

Note that although the definition of the Mahalanobis distance makes explicit use of the covariance matrix Σ, the distribution of D² does not depend at all on this covariance matrix : it is the same for all (non-degenerate) multivariate normal distributions, and depends only on the space dimension.

This point is illustrated in the following animation.

## Animation

This animation illustrates the distribution of the squared Mahalanobis distance in the case of a bivariate normal distribution.


Click on "Go". Observations are drawn repeatedly from the standard bivariate normal distribution. The lower frame displays as a red curve the theoretical distribution of the squared Mahalanobis distance D², which is the Chi-square distribution with 2 degrees of freedom. Observe the build-up of the corresponding histogram.

While the animation is running, change the characteristics of the distribution by :

* Changing the standard deviations of the two marginals ("SD1" and "SD2" cursors).

* Changing the correlation coefficient of these two marginals ("ρ" cursor).

Despite these modifications, nothing changes in the lower frame : the theoretical distribution of D² remains the same, and the build-up of the histogram is unaffected. So we see that :

* The multinormal nature of the distribution determines the nature of the distribution of D².

* The number of degrees of freedom is determined by the dimension of the space,

but all other parameters (summarized in the covariance matrix) are irrelevant.

The animation does not allow the correlation coefficient to go all the way to -1 or +1. This is because calculating the inverse of the covariance matrix then involves handling very large numbers that lead to numerical instabilities. If the correlation coefficient were equal to -1 or +1, the distribution would become degenerate : the covariance matrix would be singular, and therefore have no inverse, and the Mahalanobis distance would no longer be defined.
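The experiment in the animation can be mimicked numerically. The sketch below draws bivariate normal samples for a few hypothetical (SD1, SD2, ρ) settings, computes D² with the true covariance matrix, and compares the sample mean and the proportion of D² ≤ 2 with the Chi-square values for 2 degrees of freedom : mean 2 and CDF 1 - exp(-t/2), that is about 0.632 at t = 2. Up to sampling noise, all settings should agree.

```python
import numpy as np

def d2_samples(cov, n, rng):
    """Draw n points from N_2(0, cov) and return their squared Mahalanobis distances."""
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return np.einsum("ij,ij->i", X, np.linalg.solve(cov, X.T).T)

def chi2_2_cdf(t):
    """CDF of the Chi-square distribution with 2 degrees of freedom."""
    return 1.0 - np.exp(-t / 2.0)

rng = np.random.default_rng(0)
settings = {  # hypothetical (SD1, SD2, rho) values, echoing the animation's cursors
    "standard": (1.0, 1.0, 0.0),
    "stretched": (3.0, 0.5, 0.0),
    "correlated": (2.0, 1.0, 0.95),
}
for name, (sd1, sd2, rho) in settings.items():
    cov = np.array([[sd1**2, rho * sd1 * sd2],
                    [rho * sd1 * sd2, sd2**2]])
    d2 = d2_samples(cov, 50_000, rng)
    # Sample mean vs 2, and empirical P(D^2 <= 2) vs the Chi-square value.
    print(name, round(float(d2.mean()), 2), round(float(np.mean(d2 <= 2.0)), 3))
print("theory", 2.0, round(float(chi2_2_cdf(2.0)), 3))
```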

# Mahalanobis distance between the means of two classes, Hotelling's T² test

In the preceding section, we defined the Mahalanobis distance between an observation and the mean of a distribution.

The same definition is used for the Mahalanobis distance between the means of two distributions, which we assume to be multinormal with identical covariance matrices Σ.

We then have

$$D^2 = (\mu_2 - \mu_1)'\,\Sigma^{-1}(\mu_2 - \mu_1)$$

By a common abuse of language, we'll say that D is the "distance between classes C1 and C2".

An important issue considered by the analyst before building a classification model is whether the classes are indeed different, or whether they overlap to such a large extent that any model will fail to tell them apart. When considering multinormal classes with identical covariance matrices, the question boils down to determining whether the two class centers µ1 and µ2 coincide or not. When µ1 = µ2, the distribution of the empirical Mahalanobis distance is known, and it is therefore possible to test the null hypothesis H0 : µ1 = µ2.

More precisely, under the following conditions :

* n1 observations are drawn from C1,

* n2 observations are drawn from C2,

* The two classes have p-dimensional multinormal distributions with identical covariance matrices Σ.

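* This common covariance matrix is estimated by Σ*, the weighted mean of the empirical covariance matrices of C1 and of C2, denoted Σ*1 and Σ*2 (computed with the usual n1 - 1 and n2 - 1 denominators) :

$$\Sigma^* = \frac{(n_1 - 1)\,\Sigma^*_1 + (n_2 - 1)\,\Sigma^*_2}{n - 2}$$

where n = n1 + n2,

* The sample means of the two classes are denoted respectively $\bar{x}_1$ and $\bar{x}_2$.

The empirical squared Mahalanobis distance between the two classes is then defined as

$$D_p^{2*} = (\bar{x}_2 - \bar{x}_1)'\,\Sigma^{*-1}(\bar{x}_2 - \bar{x}_1)$$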

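It can be shown that if µ1 = µ2 :

$$\frac{n_1 n_2\,(n - p - 1)}{n\,(n - 2)\,p}\; D_p^{2*} \sim F(p,\; n - p - 1)$$

where F is Fisher's F distribution. The statistic $T^2 = \frac{n_1 n_2}{n} D_p^{2*}$ is Hotelling's T², and the corresponding test is called "Hotelling's T² test".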
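A sketch of the test statistic, assuming the pooled estimator Σ* = [(n1 - 1)Σ*1 + (n2 - 1)Σ*2] / (n - 2) with the usual unbiased class covariance matrices, T² = (n1 n2 / n) D_p²*, and the F conversion (n - p - 1) T² / ((n - 2) p) with (p, n - p - 1) degrees of freedom. The function name and the simulated samples are illustrative. A p-value could then be read from an F survival function, for instance `scipy.stats.f.sf(F, p, n - p - 1)`, which is left out here to keep the example NumPy-only.

```python
import numpy as np

def hotelling_two_sample(X1, X2):
    """Two-sample Hotelling T^2 assuming identical covariance matrices.
    X1, X2: arrays of shape (n1, p) and (n2, p). Returns (D2, T2, F, (df1, df2))."""
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    n = n1 + n2
    # Pooled covariance: weighted mean of the unbiased class covariance matrices.
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n - 2)
    S = np.atleast_2d(S)                       # np.cov returns a scalar when p = 1
    diff = X2.mean(axis=0) - X1.mean(axis=0)
    D2 = float(diff @ np.linalg.solve(S, diff))  # empirical squared distance
    T2 = n1 * n2 / n * D2
    F = (n - p - 1) / ((n - 2) * p) * T2
    return D2, T2, F, (p, n - p - 1)

rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.5], [0.5, 2.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=40)   # illustrative C1 sample
X2 = rng.multivariate_normal([0.8, 0.0], cov, size=35)   # illustrative C2 sample
D2, T2, F, dfs = hotelling_two_sample(X1, X2)
print(D2, T2, F, dfs)
```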

-----

In particular, in the unidimensional case (p = 1), the squared Mahalanobis distance between the sample means of two normal distributions is :

$$D_1^{2*} = \frac{(\bar{x}_2 - \bar{x}_1)^2}{\sigma^{2*}}$$

where σ²* is the estimated common variance of the two normal distributions.

Up to the factor n1n2/n, we recognize D1²* as the squared t statistic used by the two independent (univariate) samples t test. Its distribution is obtained by setting p = 1 in the above general expression :

$$\frac{n_1 n_2}{n}\, D_1^{2*} \sim F(1,\; n - 2)$$

This result was to be expected, as the square of a Student's t variable with n - 2 degrees of freedom is F(1, n - 2) distributed.
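A quick numerical check of the univariate case, on hypothetical simulated samples : with the pooled variance σ²*, the squared two-sample t statistic equals (n1 n2 / n) D1²*.

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(0.0, 1.0, size=12)   # illustrative sample from C1
x2 = rng.normal(0.5, 1.0, size=15)   # illustrative sample from C2
n1, n2 = len(x1), len(x2)
n = n1 + n2

# Pooled variance sigma^2*: unbiased class variances weighted by n_i - 1.
s2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n - 2)

D1_sq = (x2.mean() - x1.mean()) ** 2 / s2                           # D_1^{2*}
t = (x2.mean() - x1.mean()) / np.sqrt(s2 * (1.0 / n1 + 1.0 / n2))  # pooled t statistic
print(np.isclose(t ** 2, n1 * n2 / n * D1_sq))   # True
```

The identity is exact : 1/n1 + 1/n2 = n / (n1 n2), so t² = (n1 n2 / n) D1²* for any data.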

______________________________________________________

Related readings :

* Covariance matrix

* Discriminant Analysis

* Outlier

* Multiple Linear Regression

* Cook's distance

* Multivariate normal distribution

* Quadratic form