Mahalanobis distance
Given two "points" x1 and x2 described by numerical attributes (e.g., two observations), the usual first answer to the question "what is the distance between these two points?" is the traditional Euclidean distance:
$$d^2(x_1, x_2) = \sum_i (x_{1i} - x_{2i})^2$$
As we briefly explain here, the Euclidean distance is not always appropriate. Among the other classical "distances", the Mahalanobis distance plays an important role.
Given a multivariate normal distribution, the (squared) Mahalanobis distance of an observation x to the barycenter g of the distribution is defined as follows:
$$d_M^2 = (x - g)' \, \Sigma^{-1} (x - g)$$
with $\Sigma$ the covariance matrix of the distribution.
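The formula above can be sketched in a few lines of plain Python. This is only an illustration, not a reference implementation: the names `solve` and `mahalanobis_sq` are hypothetical, and the covariance matrix is assumed to be known and non-singular. Rather than inverting the covariance matrix explicitly, the sketch solves the linear system Σ y = (x - g) and then takes the dot product of (x - g) with y, which gives the same value.

```python
def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the caller's inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding errors.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - s) / m[r][r]
    return y


def mahalanobis_sq(x, g, sigma):
    """Squared Mahalanobis distance (x - g)' sigma^-1 (x - g)."""
    diff = [xi - gi for xi, gi in zip(x, g)]
    y = solve(sigma, diff)  # y = sigma^-1 (x - g)
    return sum(d * yi for d, yi in zip(diff, y))


# Illustrative values only: a 2-D distribution with correlated attributes.
sigma = [[2.0, 1.0], [1.0, 2.0]]
d2 = mahalanobis_sq([1.0, 1.0], [0.0, 0.0], sigma)
```

When the covariance matrix is the identity, the result reduces to the squared Euclidean distance.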
Two observations sitting in regions with the same density are at the same Mahalanobis distance from the barycenter, although their Euclidean distances from the barycenter may be quite different. Points that are at a given Mahalanobis distance from the barycenter lie on an ellipsoid centered on the barycenter.
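A small numerical sketch may make this concrete. It assumes, purely for illustration, a two-dimensional distribution with barycenter at the origin and a diagonal covariance matrix (variances 4 and 1), in which case the Mahalanobis distance reduces to a Euclidean distance on attributes divided by their standard deviations. The function names and the two points are hypothetical.

```python
import math


def euclidean(x, g):
    """Ordinary Euclidean distance between x and g."""
    return math.sqrt(sum((xi - gi) ** 2 for xi, gi in zip(x, g)))


def mahalanobis_diag(x, g, variances):
    """Mahalanobis distance for the special case of a diagonal covariance matrix."""
    return math.sqrt(sum((xi - gi) ** 2 / v for xi, gi, v in zip(x, g, variances)))


g = (0.0, 0.0)          # barycenter (assumed)
variances = (4.0, 1.0)  # diagonal of the covariance matrix (assumed)

a = (2.0, 0.0)  # two standard deviations... no: one standard deviation along axis 1
b = (0.0, 1.0)  # one standard deviation along axis 2

eucl_a, eucl_b = euclidean(a, g), euclidean(b, g)
maha_a = mahalanobis_diag(a, g, variances)
maha_b = mahalanobis_diag(b, g, variances)
print(eucl_a, eucl_b)  # the Euclidean distances differ
print(maha_a, maha_b)  # the Mahalanobis distances are equal
```

Both points lie on the ellipse x²/4 + y² = 1, so they share the same Mahalanobis distance even though one is twice as far from the barycenter as the other in the Euclidean sense.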
You can read a more detailed account of the Mahalanobis distance in the Tutorial on Discriminant Analysis.
An observation with a large Mahalanobis distance sits at the periphery of the data set, and is therefore a leverage point. Multiple Linear Regression software often reports the Mahalanobis distances of the observations (as well as their Cook's distances).
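The link with leverage can be written explicitly under some assumptions that are not stated in the text above: a regression model that includes an intercept, n observations, and a Mahalanobis distance computed on the predictors from the sample mean $\bar{x}$ and the sample covariance matrix $S$ (with the n - 1 denominator). Under these assumptions, the leverage $h_{ii}$ of observation i (the i-th diagonal element of the hat matrix) is an increasing function of its squared Mahalanobis distance:

```latex
h_{ii} = \frac{1}{n} + \frac{d_M^2(x_i)}{n - 1},
\qquad
d_M^2(x_i) = (x_i - \bar{x})' \, S^{-1} (x_i - \bar{x})
```

An observation located at the barycenter therefore has the minimum leverage 1/n, and leverage grows linearly with the squared Mahalanobis distance.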