As the name implies, covariance is a measure of the strength of the link between two (numerical) random variables.
Given two such r.v. X1 and X2, two extreme circumstances can be encountered :
* X1 and X2 are independent : knowing the value of X1 tells us nothing about the value that X2 will take.
* X2 = f(X1) : knowing the value of X1 then determines the value of X2 without any uncertainty.
Most often, the link between two r.v. is "somewhere in between" : knowing the value taken by X1 reduces, to a certain extent, the uncertainty about the value that X2 will take.
There is no universal way to define and measure the strength of the link in this intermediate situation. Covariance is one way to do it, and despite its limitations it is very useful in many practical situations.
If X1 and X2 are strongly (positively) linked, then we could think of defining covariance in a way that would embody the following idea :
* Whenever X1 is positive, then X2 is likely to be positive too.
* Whenever X1 is negative, then X2 is likely to be negative too.
This will not do because we want the covariance to be unchanged when both probability distributions are translated by arbitrary quantities. So instead of measuring the values of X1 and X2 from "0", we will measure them from reference points that translate along with the probability distributions, for example their respective means µ1 and µ2. Our original idea now reads :
* Whenever (X1 - µ1) is positive, then (X2 - µ2 ) is likely to be positive too.
* Whenever (X1 - µ1) is negative, then (X2 - µ2 ) is likely to be negative too.
So if X1 and X2 are strongly (positively) linked, more often than not, X1 - µ1 and X2 - µ2 are :
* simultaneously positive,
* or simultaneously negative.
The product (X1 - µ1).(X2 - µ2 ) is then likely to be very often positive :
* Either because both quantities are positive,
* Or because both quantities are negative.
Yet, the product (X1 - µ1).(X2 - µ2 ) is a random variable, and we want a fixed number. But a random variable that spends most of its time taking positive values is likely to have a positive expectation. So we will consider the expectation of (X1 - µ1).(X2 - µ2 ), and call it the covariance of X1 and X2 :
Cov(X1, X2) = E[(X1 - µ1).(X2 - µ2 )]
We'll show that this expression is equivalent to the following one, which is more convenient in practice :
Cov(X1, X2) = E[X1.X2] - E[X1].E[X2]
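The equivalence of the two expressions can be checked numerically. A minimal sketch (with made-up data standing in for observed values of the two variables) :

```python
# Numerical check that the two covariance formulas agree:
# E[(X - mu_x)(Y - mu_y)]  versus  E[XY] - E[X]E[Y].
xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical sample of X1
ys = [2.0, 1.0, 4.0, 3.0, 5.0]  # hypothetical sample of X2

n = len(xs)
mu_x = sum(xs) / n
mu_y = sum(ys) / n

# First form: mean of the products of deviations from the means.
cov_centered = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

# Second form: mean of the products minus product of the means.
cov_shortcut = sum(x * y for x, y in zip(xs, ys)) / n - mu_x * mu_y

print(cov_centered, cov_shortcut)  # both equal 1.6 for this sample
```

The second form is cheaper in hand computations because it avoids centering every observation first.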
Staying with purely qualitative arguments, we notice that, for given and fixed probability distributions of X1 and of X2 :
* A large positive value of the Covariance is an indication that (X1 - µ1) and (X2 - µ2 ) often take large positive or large negative values simultaneously, a circumstance that strengthens our belief that the variables are indeed tightly linked.
* Whereas a smaller positive value of the covariance is an indication that one of the variables has a fair chance to be close to its mean when the other takes large (positive or negative) values.
So a large positive value of the covariance is a good detector of a strong link between two r.v. It can be shown that this link is then necessarily linear.
What conclusions may be drawn from a low value (close to 0) of the covariance ? In general, none.
* The Covariance may be low because, indeed, the link between the two variables is weak.
* But there may exist a strong, nonlinear link between the two variables, the nature of this link making the covariance low (see the example below).
We developed the argument leading to the definition of the covariance on the basis of a positive link between X1 and X2. But it applies just as well in the case of a negative link. We can use the same line of reasoning if X1 - µ1 taking large positive values makes it likely that X2 - µ2 will take large negative values. In this case, the covariance is a large negative number.
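A minimal sketch of the negative case, with made-up data in which one variable decreases as the other increases :

```python
# Two hypothetical series moving in opposite directions: the deviation
# products (x - mu_x)(y - mu_y) are mostly negative, so the covariance
# comes out as a large negative number.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0, 4.0, 3.0, 2.0, 1.0]  # decreases as xs increases

n = len(xs)
mu_x, mu_y = sum(xs) / n, sum(ys) / n
cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
print(cov)  # -2.0
```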
A drawback of covariance is that its value depends on the units used to express the values of X1 and X2, whereas a practical measure of the strength of the link between two variables certainly shouldn't. Therefore, attractive as it is for the theoretician, the application-oriented analyst will usually prefer its standardized version, the correlation coefficient, whose value does not depend on measurement units, and that we describe here.
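The unit-dependence problem is easy to exhibit numerically. A sketch with hypothetical height/weight data : rescaling the heights from metres to centimetres multiplies the covariance by 100, while the correlation coefficient is unchanged.

```python
import math

def cov(xs, ys):
    """Covariance of two equal-length samples (population version)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def corr(xs, ys):
    """Correlation coefficient: covariance standardized by the std devs."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]    # hypothetical heights in metres
weights   = [55.0, 60.0, 66.0, 72.0, 80.0]  # hypothetical weights in kg

heights_cm = [100 * h for h in heights_m]   # same data, new unit

print(cov(heights_m, weights), cov(heights_cm, weights))   # differ by a factor 100
print(corr(heights_m, weights), corr(heights_cm, weights)) # identical
```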
If the two random variables X and Y are independent, then their covariance is 0.
But the converse is not true : two random variables may have 0 covariance, and yet not be independent. For example, let :
* X be uniformly distributed in [-1, +1].
* Y = X ².
We leave it as an exercise to show that
Cov(X, Y ) = 0
while X and Y are clearly not independent.
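A numerical sketch of this exercise, using a small sample symmetric around 0 to mimic the uniform distribution on [-1, +1] : by symmetry E[X] = 0 and E[XY] = E[X³] = 0, so the covariance vanishes even though Y is a deterministic function of X.

```python
# Zero covariance without independence: Y = X**2 on a symmetric sample.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]   # symmetric around 0, mimicking U[-1, +1]
ys = [x ** 2 for x in xs]          # Y is completely determined by X

n = len(xs)
mu_x, mu_y = sum(xs) / n, sum(ys) / n
cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
print(cov)  # 0.0
```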
The basic properties of covariance are listed and demonstrated in the following Tutorial :
BASIC PROPERTIES OF THE COVARIANCE
* Variance is self-covariance
* Another expression for the Covariance
* An interpretation of the covariance
* Linearity with respect to constants
* Linearity with respect to variables
* Variance of a sum of random variables
* Covariance and independence