Independent random variables

# Definition of independence

Let X and Y be two random variables. The behavior of the pair {X, Y} is entirely described by their joint probability density function (or probability mass function) $f_{XY}(x, y)$. In general, after a realization of {X, Y} has been drawn, the observed value x of X gives us some information about what the value of Y might be. This information is embodied in the conditional distribution of Y given X = x. For example, if X and Y are both discrete, this conditional distribution is:

P{Y = y | X = x}

which is usually different for each and every value of x.

But it may happen that this conditional distribution does not, in fact, depend on x. In other words, the conditional distribution of Y given X = x is the same whatever the value of x. As a consequence, knowing x gives us no information at all about the value of Y. The r.v.s X and Y are then said to be independent.

This leads us to the formal definition of the independence of two random variables :

 X and Y are said to be independent if the conditional distribution of Y given X = x does not depend on x.
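
As an illustration of this definition in the discrete case, here is a minimal sketch that computes P{Y = y | X = x} from a joint pmf and compares the result across values of x. The helper `conditional_pmf` and the two joint pmfs are hypothetical, chosen only for the example.

```python
from fractions import Fraction

def conditional_pmf(joint, x):
    """Return {y: P{Y = y | X = x}} for a joint pmf given as {(x, y): probability}."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal P{X = x}
    return {yi: p / p_x for (xi, yi), p in joint.items() if xi == x}

# Hypothetical joint pmf of two independent variables: X is a fair coin, Y is biased.
independent_joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

# Hypothetical joint pmf of two dependent variables (both marginals are uniform).
dependent_joint = {
    (0, 0): Fraction(3, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

for name, joint in [("independent", independent_joint), ("dependent", dependent_joint)]:
    for x in (0, 1):
        print(name, "x =", x, conditional_pmf(joint, x))
```

In the first case the conditional distribution of Y is the same for x = 0 and x = 1, so observing x tells us nothing about Y. In the second case it changes with x.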

We show below that this definition is equivalent to the one where the roles of X and Y are interchanged.

Independence has deep and usually favorable consequences.

# Independence and joint probability distribution

The first consequence is that independence places a strong constraint on the mathematical form of the joint probability distribution of X and Y.

Suppose that X and Y are both continuous, with densities $f_X(x)$ and $f_Y(y)$. Then their joint probability density $f_{XY}(x, y)$ is:

$$f_{XY}(x, y) = f_{Y|X}(y \mid x)\, f_X(x)$$

by the property of conditional distributions.

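The marginal density of Y is obtained by integrating this joint density over x:

$$f_Y(y) = \int f_{XY}(x, y)\,dx = \int f_{Y|X}(y \mid x)\, f_X(x)\,dx$$

But if the conditional density of Y does not depend on x, it can be taken out of the integral and we then have

$$f_Y(y) = f_{Y|X}(y \mid x) \int f_X(x)\,dx = f_{Y|X}(y \mid x)$$

since $f_X$ integrates to 1. So

$$f_{Y|X}(y \mid x) = f_Y(y)$$

and consequently

$$f_{XY}(x, y) = f_X(x)\, f_Y(y)$$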

or in words

If X and Y are independent, then their joint pdf is the product of their individual (marginal) pdfs.

-----

Conversely, suppose that the joint pdf of X and Y factors into the product of their (marginal) pdfs. Then

$$f_{Y|X}(y \mid x)\, f_X(x) = f_{XY}(x, y) = f_X(x)\, f_Y(y)$$

Hence, wherever $f_X(x) > 0$,

$$f_{Y|X}(y \mid x) = f_Y(y)$$

which clearly does not depend on x, and the two variables are therefore independent.

-----

Consequently

 X and Y are independent if and only if their joint distribution factors into the product of their marginal distributions.

Note that in our first definition of independence, X and Y played different roles. But now we see that the definition is in fact symmetrical in X and Y.

Note also that if X and Y are independent, then not only is the conditional distribution of Y independent of x, but it is also equal to the marginal distribution of Y (with a similar result for X).

______________________

The same calculation and result apply to discrete variables : just replace integration by summation.
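
For discrete variables, the factorization criterion can be checked mechanically. The following is a minimal sketch, with hypothetical helper names (`marginals`, `factors_into_marginals`) and two made-up joint pmfs, that tests whether P{X = x, Y = y} = P{X = x} P{Y = y} for every pair (x, y).

```python
from fractions import Fraction
from itertools import product

def marginals(joint):
    """Marginal pmfs of X and Y from a joint pmf {(x, y): probability}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y

def factors_into_marginals(joint):
    """True if P{X = x, Y = y} = P{X = x} P{Y = y} for every pair (x, y)."""
    p_x, p_y = marginals(joint)
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y] for x, y in product(p_x, p_y))

# Hypothetical example: two fair dice rolled separately.
two_dice = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

# Hypothetical example: X is a fair die and Y is a copy of X.
copied_die = {(i, i): Fraction(1, 6) for i in range(1, 7)}

print(factors_into_marginals(two_dice))    # True
print(factors_into_marginals(copied_die))  # False
```

Exact fractions are used instead of floats so that the equality test is not affected by rounding.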

# Factorization of the probability distribution

It is often the case that computing $f_{XY}(x, y)$, the joint pdf of two r.v.s X and Y, leads to an expression like:

$$f_{XY}(x, y) = g(x)\, h(y)$$

that is, the product of a nonnegative function of x and a nonnegative function of y.

See for example :
* Calculating the marginals of the bivariate normal distribution after rotation of the axes.
* Transforming two Gamma distributed variables into a Beta distributed variable.

Note that g(x) and h(y) are not uniquely defined. For example, for any positive constant C, we also have $f_{XY}(x, y) = [C\,g(x)]\,[h(y)/C] = g'(x)\,h'(y)$.

The formal similarity with the above result makes it tempting to believe that X and Y are then independent, and that g(x) and h(y) are just the marginals of X and Y up to constant factors.

We'll show that this is true, with an important caveat made explicit in the following proposition.

* Let $S_X$ be the support of the r.v. X (i.e. the domain over which the pdf of X is strictly positive).
* Let $S_Y$ be the support of the r.v. Y.

Then X and Y are independent if and only if, over $S = S_X \times S_Y$, the joint pdf of {X, Y} is the product of two positive functions g(x) and h(y).

It is then true, as we'll show, that the marginal of X is proportional to g(x), with a similar result for Y.
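
The proportionality claim can be sketched in one line, assuming that the joint pdf equals g(x)h(y) over the product of the supports and vanishes elsewhere. The constants $K_g$ and $K_h$ are names introduced here for the example only.

```latex
\begin{align*}
f_X(x) &= \int_{S_Y} g(x)\, h(y)\, dy = g(x) \int_{S_Y} h(y)\, dy = K_h\, g(x),
  \qquad x \in S_X \\
f_Y(y) &= \int_{S_X} g(x)\, h(y)\, dx = K_g\, h(y),
  \qquad y \in S_Y \\
1 &= \int_{S_X}\!\int_{S_Y} g(x)\, h(y)\, dy\, dx = K_g K_h
  \quad\Longrightarrow\quad
  f_X(x)\, f_Y(y) = K_g K_h\, g(x)\, h(y) = g(x)\, h(y)
\end{align*}
```

The last line is what turns the factorization into g and h into the factorization into marginals. It relies on the integrals running over the whole of each support, which is where the product-space condition enters.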

The caveat is that the support of {X, Y} has to be the product space of the supports of X and Y. If this condition is not respected, then the theorem fails as we'll show with a counterexample.

Another counterexample is the joint pdf of the order statistics of the uniform distribution: this joint pdf is constant over its support and is therefore trivially the product of a function of each variable, but its support is not the product space of the supports of the marginals (it is not $[0, 1]^n$). And as a matter of fact, order statistics are, of course, never independent random variables.
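
A discrete analogue makes the role of the support concrete. In the sketch below (a hypothetical example, not necessarily the counterexample used in the Tutorial), the pair (X, Y) is uniform over the triangle x < y of a small grid. The joint pmf is constant on its support, hence of the form g(x)h(y) there, yet it is not the product of its marginals.

```python
from fractions import Fraction

n = 4  # hypothetical grid size
support = [(x, y) for x in range(n) for y in range(n) if x < y]  # triangular support
joint = {pair: Fraction(1, len(support)) for pair in support}    # constant on its support

# Marginal pmfs of X and Y.
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in range(n)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in range(n)}

# Inside the support, the joint pmf differs from the product of the marginals.
print(joint[(0, 1)], p_x[0] * p_y[1])          # 1/6 versus 1/12

# (2, 1) lies in the product of the marginal supports but has joint probability 0.
print(joint.get((2, 1), 0), p_x[2] * p_y[1])   # 0 versus 1/36
```

The point (2, 1) shows the failure most directly: X = 2 and Y = 1 are each possible, but not together, so knowing x does restrict the values y can take.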

-----

The theorem is also true if X and Y are discrete random variables.

# Independence and cumulative distribution function

We expressed the condition of independence in terms of probability density functions (or probability mass functions), but we might as well have expressed it in terms of cumulative distribution functions (cdf).

Recall that the cdf of the pair {X, Y} is the function $F_{XY}(x, y)$ defined by:

$$F_{XY}(x, y) = P\{X \le x,\ Y \le y\}$$

We'll show that :

* Let $F_X(x)$ be the cdf of the r.v. X.
* Let $F_Y(y)$ be the cdf of the r.v. Y.

Then X and Y are independent if and only if $F_{XY}(x, y) = F_X(x)\, F_Y(y)$.
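
For continuous variables, the direct part can be sketched as follows, assuming the joint pdf factors into the product of the marginals (u and v are dummy integration variables introduced here):

```latex
\begin{align*}
F_{XY}(x, y)
  &= \int_{-\infty}^{x}\!\int_{-\infty}^{y} f_{XY}(u, v)\, dv\, du
   = \int_{-\infty}^{x}\!\int_{-\infty}^{y} f_X(u)\, f_Y(v)\, dv\, du \\
  &= \left( \int_{-\infty}^{x} f_X(u)\, du \right)
     \left( \int_{-\infty}^{y} f_Y(v)\, dv \right)
   = F_X(x)\, F_Y(y)
\end{align*}
```

The discrete case follows the same pattern with sums in place of integrals.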

# Independence and probabilities

We'll show that a consequence of this result is that the "factorization paradigm" applies also to the probabilities for the pair {X, Y} to be inside a given rectangle of the (x, y) plane.

More specifically, we'll show that :

 X and Y are independent if and only if, for any numbers a, b, c, and d: $P\{a < X \le b,\ c < Y \le d\} = P\{a < X \le b\}\, P\{c < Y \le d\}$

In other words, X and Y are independent if and only if, for any a, b, c and d, the two events:

* A = {X in (a, b]}

* B = {Y in (c, d]}

are independent.
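
The link between the rectangle probabilities and the cdf can be sketched as follows for the direct part. The first equality is the usual inclusion-exclusion formula for a rectangle with a < b and c < d. The second assumes that the joint cdf factors into the product of the marginal cdfs.

```latex
\begin{align*}
P\{a < X \le b,\ c < Y \le d\}
  &= F_{XY}(b, d) - F_{XY}(a, d) - F_{XY}(b, c) + F_{XY}(a, c) \\
  &= F_X(b) F_Y(d) - F_X(a) F_Y(d) - F_X(b) F_Y(c) + F_X(a) F_Y(c) \\
  &= \left[ F_X(b) - F_X(a) \right] \left[ F_Y(d) - F_Y(c) \right] \\
  &= P\{a < X \le b\}\; P\{c < Y \le d\}
\end{align*}
```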

# Independence and expectation

We show here that :

 Two r.v.s X and Y are independent if and only if, for any pair of functions f(x) and g(y), $E[f(X)\,g(Y)] = E[f(X)]\,E[g(Y)]$

provided that the expectations exist.

The direct part is easy, the converse a bit more difficult.

-----

A special case is of course f(x) = x and g(y) = y. Then the direct part of the theorem reads :

 If X and Y are independent, then $E[XY] = E[X]\,E[Y]$

As a consequence, two independent r.v.s are uncorrelated (show that their covariance is 0).
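
The covariance computation is a one-liner, using the standard identity for the covariance and the product rule for expectations of independent variables:

```latex
\begin{align*}
\operatorname{Cov}(X, Y)
  &= E\big[(X - E[X])(Y - E[Y])\big]
   = E[XY] - E[X]\,E[Y] \\
  &= E[X]\,E[Y] - E[X]\,E[Y] = 0
  \qquad \text{(X and Y independent)}
\end{align*}
```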

The converse is not true, though, and there are many examples of uncorrelated, yet not independent random variables. The best-known case where lack of correlation does imply independence is that of the marginals of the multivariate normal distribution.
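
A classic example of uncorrelated but dependent variables is X uniform on {-1, 0, 1} with Y = X². The sketch below is an illustrative example chosen here, and the helper `expectation` is hypothetical. It checks both facts with exact arithmetic.

```python
from fractions import Fraction

# Hypothetical example: X uniform on {-1, 0, 1} and Y = X**2, so Y is a function of X.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expectation(joint, func):
    """E[func(X, Y)] under a joint pmf {(x, y): probability}."""
    return sum(p * func(x, y) for (x, y), p in joint.items())

e_x = expectation(joint, lambda x, y: x)
e_y = expectation(joint, lambda x, y: y)
e_xy = expectation(joint, lambda x, y: x * y)
covariance = e_xy - e_x * e_y

p_x0 = sum(p for (x, _), p in joint.items() if x == 0)  # P{X = 0}
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)  # P{Y = 0}
p_00 = joint.get((0, 0), 0)                             # P{X = 0, Y = 0}

print(covariance)          # 0: X and Y are uncorrelated
print(p_00, p_x0 * p_y0)   # 1/3 versus 1/9: the joint pmf does not factor
```

The covariance vanishes because E[XY] = E[X³] = 0 by symmetry, even though Y is completely determined by X.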

# Independence and Moment Generating Function

Let $M_{XY}(t, s)$ be the moment generating function (mgf) of the pair {X, Y}. We'll show that:

 X and Y are independent if and only if their joint moment generating function can be factored into $M_{XY}(t, s) = M_{XY}(t, 0)\, M_{XY}(0, s)$

In addition, we'll see that $M_{XY}(t, 0) = M_X(t)$, the mgf of X (with a similar result for Y).

In other words :

X and Y are independent if and only if their joint mgf factors into the product of their individual mgfs.
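
The direct part can be sketched from the usual definition of the joint mgf, $M_{XY}(t, s) = E[e^{tX + sY}]$, by applying the product rule for expectations to the functions $e^{tx}$ and $e^{sy}$ (assuming the mgfs exist for the values of t and s considered):

```latex
\begin{align*}
M_{XY}(t, 0) &= E\left[e^{tX}\right] = M_X(t),
  \qquad M_{XY}(0, s) = E\left[e^{sY}\right] = M_Y(s) \\
M_{XY}(t, s) &= E\left[e^{tX}\, e^{sY}\right]
   = E\left[e^{tX}\right] E\left[e^{sY}\right]
   = M_{XY}(t, 0)\, M_{XY}(0, s)
  \qquad \text{(X and Y independent)}
\end{align*}
```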

# Multivariate independence

All of the above generalizes straightforwardly to the case of n independent random variables.

Yet it is important to realize that "A set of n independent random variables" is a much stronger concept than  "A set of pairwise independent random variables".

In other words, given a set $\{X_i\}$ of n r.v.s, we may have:

$X_j$ and $X_k$ independent for all pairs (j, k) with j ≠ k

and yet the n variables may fail to be independent as a set.

As a counterexample, we'll describe a set of three r.v.s that are pairwise independent and yet that are not independent.
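
One classic construction, which is not necessarily the one used in the Tutorial, takes two independent fair bits X and Y and sets Z = X XOR Y. The sketch below uses hypothetical helpers (`prob`, `pair_is_independent`) to check that every pair is independent while the triple is not.

```python
from fractions import Fraction
from itertools import product

# Hypothetical construction: X and Y are independent fair bits and Z = X XOR Y.
joint = {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

def prob(event):
    """Probability that the triple (X, Y, Z) satisfies the predicate `event`."""
    return sum(p for outcome, p in joint.items() if event(outcome))

def pair_is_independent(i, j):
    """Check that the joint pmf of components i and j factors for all bit values."""
    return all(
        prob(lambda o: o[i] == a and o[j] == b)
        == prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b)
        for a, b in product((0, 1), repeat=2)
    )

pairwise = all(pair_is_independent(i, j) for i, j in [(0, 1), (0, 2), (1, 2)])

# Independence of the three variables would require P{X=1, Y=1, Z=1} = 1/8,
# but Z = 0 whenever X = Y = 1.
p_111 = prob(lambda o: o == (1, 1, 1))
product_111 = (
    prob(lambda o: o[0] == 1) * prob(lambda o: o[1] == 1) * prob(lambda o: o[2] == 1)
)

print(pairwise, p_111, product_111)  # True 0 1/8
```

Any two of the three variables determine the third, which is why the joint distribution of the triple cannot factor even though each pair looks like two independent coin flips.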

# "Independent variables" in Data Modeling

The term "independent" is sometimes used in a very different context.

A predictive model has "input" and "output" variables.

The input variables are sometimes called the "independent variables", "independent" then meaning that their values can, at least in principle, be freely adjusted by the analyst, and are not determined by the values of other variables.

The use of this term is somewhat unfortunate because of the risk of confusion with independence as defined in this page. It is therefore preferable to use another denomination like "predictors", or "regressors" when the model is a regression model.

___________________________________________________________

 Tutorial

In this Tutorial, we show that the independence of two r.v.s X and Y has several useful equivalent formulations (in addition to the definition and to their joint distribution being equal to the product of their marginal distributions):

* Their joint pdf (pmf) is the product of a function of x and a function of y that are both positive over the product space of the supports of the marginals.

* Their joint cdf is the product of their individual cdfs.

* The probability for the pair {X, Y} to be inside a given rectangle is equal to the product of the probabilities for the individual r.v.s to be inside the respective sides of the rectangle.

* The expectation of the product of any two functions of X and Y is equal to the product of the individual expectations of these functions (provided these expectations exist).

* The mgf of {X, Y} is equal to the product of their respective mgfs.

-----

We conclude by describing a set of three r.v.s that are not independent although they are pairwise independent.

EQUIVALENT FORMULATIONS OF INDEPENDENCE

* Factored joint distribution function: direct, converse, counterexample
* Independence and cumulative distribution function: direct, converse
* Independence and probabilities: direct, converse
* Independence and expectations
* Independence and moment generating functions: direct, converse
* Pairwise independence is not independence
