Expectation
The most common measure of central tendency of a random variable.
Let X be a discrete random variable. The expectation (or "expected value") of X, denoted E[X], is the weighted sum of the values in the domain of X, the weights being the respective probabilities of these values.
Denote by {x_i} the domain of X. {x_i} may contain a finite or infinite (e.g. Poisson r.v.) number of values.
We have :
$$E[X] = \sum_i x_i \, P\{X = x_i\}$$
where P{X = x_i} is the probability for X to take the value x_i.
So the value of the expectation is more influenced by values with large probabilities than by values with low probabilities.
If all the probabilities P{X = x_i} are equal, the expectation of X is then just the average of the set of values {x_i}, which is then necessarily finite.
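The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the function name `expectation` and the "loaded die" probabilities are assumptions made for the example, not part of the Glossary.

```python
from fractions import Fraction

def expectation(values, probs):
    """Weighted sum of the values, the weights being their probabilities."""
    assert sum(probs) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(values, probs))

faces = [1, 2, 3, 4, 5, 6]

# Fair die: all probabilities are equal, so E[X] is the plain average of the faces.
fair = [Fraction(1, 6)] * 6
e_fair = expectation(faces, fair)

# Hypothetical loaded die: face 6 has probability 1/2, the other faces 1/10 each.
loaded = [Fraction(1, 10)] * 5 + [Fraction(1, 2)]
e_loaded = expectation(faces, loaded)

print(e_fair, e_loaded)  # 7/2 9/2
```

The loaded die has a larger expectation because the value 6 carries a larger weight.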
If the r.v. is continuous with a probability density function p(x), the expectation is defined as above, with the sum being replaced by an integral :
$$E[X] = \int_{-\infty}^{+\infty} x \, p(x) \, dx$$
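As a rough numerical sketch of the continuous case, the integral of x p(x) can be approximated by a midpoint rule. The exponential density, its rate, the truncation of the integration range at 40 and the helper name `expectation_continuous` are all illustrative assumptions.

```python
import math

def expectation_continuous(pdf, lower, upper, steps=200_000):
    """Approximate the integral of x * pdf(x) over [lower, upper] (midpoint rule)."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        total += x * pdf(x)
    return total * h

rate = 2.0  # illustrative choice of parameter

def exp_pdf(x):
    """Density of an exponential r.v. with the rate chosen above."""
    return rate * math.exp(-rate * x)

# The density is negligible beyond x = 40, so the range is truncated there.
approx = expectation_continuous(exp_pdf, 0.0, 40.0)
print(approx)  # close to the known mean 1 / rate = 0.5
```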
The foregoing expressions show that the definition of the expectation of a r.v. is the same as that of the mean µ of the probability distribution describing this variable. In this Glossary, we'll use :
* "Expectation" for random variables, and
* "Mean" for probability distributions.
For continuous as well as for discrete r.v., the expectation may not be defined.
For example, Fisher's F_{m,n} distributions with m = 1 or 2 have no mean. The reason is that, although the density of F_{m,n} decreases fast enough at infinity for its integral to be equal to 1, this is not the case for the integrand that defines the mean : it does tend to 0 at infinity, but not fast enough to keep its integral finite.
So a r.v. with an F_{m,n} distribution has no expectation for m = 1 or 2.
-----
Another example of a r.v. with no expectation is a Cauchy variable. The Cauchy distribution is symmetric with respect to the origin, and the mean could therefore be expected to be equal to 0. But it is not so, because the quantity :
$$\int_{0}^{a} x \, p(x) \, dx$$
tends to infinity as a grows without limit. To deal with this kind of difficulty, the expectation is said to be defined if and only if the two quantities :
$$\int_{-\infty}^{0} x \, p(x) \, dx$$
and
$$\int_{0}^{+\infty} x \, p(x) \, dx$$
are both finite. The expectation is then defined as the sum of these two integrals.
In the case of the Cauchy distribution, both integrals are infinite, as we'll show. The expression for the expectation is then :
-∞ + (+∞)
which is not defined. A Cauchy variable has therefore no expectation.
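A quick numerical illustration of this divergence (not a substitute for the proof given in the Tutorial) is sketched here. It assumes the standard Cauchy density 1 / (π(1 + x²)), for which the integral of x p(x) over [0, a] has the closed form ln(1 + a²) / (2π). The helper names are hypothetical.

```python
import math

def cauchy_pdf(x):
    """Density of the standard Cauchy distribution (assumed here)."""
    return 1.0 / (math.pi * (1.0 + x * x))

def partial_mean_integral(a):
    """Closed form of the integral of x * cauchy_pdf(x) over [0, a]."""
    return math.log(1.0 + a * a) / (2.0 * math.pi)

def midpoint(f, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (k + 0.5) * h) for k in range(steps))

# Numerical check of the closed form on [0, 10].
numeric = midpoint(lambda x: x * cauchy_pdf(x), 0.0, 10.0)
closed = partial_mean_integral(10.0)

# The partial integral keeps growing with a instead of levelling off.
for a in (1e1, 1e3, 1e6, 1e12):
    print(a, partial_mean_integral(a))
```

The growth is only logarithmic, so it is slow, but it never stops.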
The expectation can be regarded as the central tendency of a random variable (just as the mean is for its distribution). Both Markov's inequality and Chebyshev's inequality show that the probability for a r.v. to take a value far from its expectation shrinks as this value drifts away without limit.
Nevertheless, it should be kept in mind that the value of the expectation of a r.v. is not necessarily the most frequently observed value of the realizations of this r.v. Consider the following :
* The value of the expectation of a discrete r.v. may be outside the range of the r.v. For example, the expectation of the r.v. representing the roll of a die is equal to :
(1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5
which is a non-integer value, and can therefore never be the result of a roll.
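* The expected value of a continuous r.v. may lie in a region where the probability density is near 0 or even 0 (think of a distribution with two well-separated humps, whose expectation falls in the gap between them). The r.v. will then rarely, if ever, take a value close to its expectation.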

The reason why the expectation is such a useful concept is to be found in the Law of Large Numbers. In short, this Law states that, unless you're terribly unlucky, the average of a series of realizations of a random variable converges to the value of the expectation of this r.v. as the number of realizations grows without limit. This result bestows some sort of experimental status on the otherwise purely mathematical concept of expectation.
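The Law of Large Numbers can be observed in a small simulation. The sketch below, which assumes a fair six-sided die and uses a hypothetical helper `sample_average`, prints the average of more and more simulated rolls. The averages should settle near 3.5, although any single run remains random.

```python
import random

random.seed(0)  # fixed seed so that the run is reproducible

def sample_average(n_rolls):
    """Average of n_rolls simulated rolls of a fair six-sided die."""
    total = 0
    for _ in range(n_rolls):
        total += random.randint(1, 6)
    return total / n_rolls

for n in (10, 1_000, 100_000):
    print(n, sample_average(n))
```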
The concept of expectation plays a central role in the definition of higher order moments.
* Moments are expectations. The nth order moment of a r.v. X is, by definition, equal to E[X^n].
* It is common to consider centered moments, the r.v. being then measured from its expectation. By definition, the nth order centered moment of a r.v. X is :
E[(X - E[X])^n]
In particular, the variance Var(X) is the second order centered moment of X. Note that the above expression can be transformed into the very useful expression :
Var(X) = E[X²] - E[X]²
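The two expressions of the variance can be compared on a simple case. The sketch below assumes a fair die and uses exact fractions; the helper names `moment` and `centered_moment` are chosen for the example only.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)  # fair die: every face has probability 1/6

def moment(n):
    """nth order moment E[X^n]."""
    return sum(Fraction(x) ** n * p for x in faces)

def centered_moment(n):
    """nth order centered moment E[(X - E[X])^n]."""
    mean = moment(1)
    return sum((x - mean) ** n * p for x in faces)

var_by_definition = centered_moment(2)
var_by_shortcut = moment(2) - moment(1) ** 2

print(var_by_definition, var_by_shortcut)  # 35/12 35/12
```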
Let X be a r.v., and let f(.) be a function. What is the expectation of the random variable Y = f(X) ?
Consider, for example, the case of a continuous variable X with probability density p(x), and f(.) regular enough for Y to have a probability density function g(y). Then, by definition, the expectation of Y is :
$$E[Y] = \int_{-\infty}^{+\infty} y \, g(y) \, dy$$
Unfortunately, except on rare occasions, calculating g(y) from p(x) and f(.) is intractable. Fortunately, calculating g(y) is not necessary for our purpose, for it can be shown that the expectation of Y is just :
$$E[Y] = \int_{-\infty}^{+\infty} f(x) \, p(x) \, dx$$
A similar result applies when X is discrete. Denote by p(.) the function :
p(x_i) = P{X = x_i}
Then it can be shown that :
$$E[Y] = \sum_i f(x_i) \, p(x_i)$$
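The discrete result can be checked on a small case where f is not one-to-one. In the sketch below, the uniform variable on {-2, ..., 2} and the function f(x) = x² are illustrative assumptions. The expectation of Y is computed first from the distribution of Y, then directly from the distribution of X.

```python
from collections import defaultdict
from fractions import Fraction

xs = [-2, -1, 0, 1, 2]
p = {x: Fraction(1, 5) for x in xs}  # X uniform on xs (assumption)

def f(x):
    return x * x

# Route 1 (definition): build the distribution of Y = f(X), then sum y * P{Y = y}.
g = defaultdict(Fraction)
for x in xs:
    g[f(x)] += p[x]
e_y_definition = sum(y * q for y, q in g.items())

# Route 2: sum f(x) * p(x) over the values of X, without ever computing g.
e_y_lotus = sum(f(x) * p[x] for x in xs)

print(e_y_definition, e_y_lotus)  # 2 2
```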
-----
These results seem so natural that many newcomers to Statistics mistake them for definitions of E[Y], whereas they are in fact not definitions but theorems. This explains why they are often dubbed "The Law of the Unconscious Statistician", and referred to by the smart acronym "LOTUS".
Both results are demonstrated in the Tutorial below.
A similar but more complex result will be obtained for functions of two random variables. We'll need this result :
* For calculating the expectation of a linear combination of random variables (see below).
* For calculating the expectation of the product of two random variables (see below).
* For calculating the moments of Fisher's F distribution (see here).
The expectation has many useful properties. Here are some of the most important.
Let X be a r.v., and a and b be two real numbers. Then :
E[aX + b] = aE[X] + b
Let {X_1, ..., X_n} be n r.v., and {a_1, ..., a_n} be n real numbers. Then :
$$E\left[\sum_{i=1}^{n} a_i X_i\right] = \sum_{i=1}^{n} a_i \, E[X_i]$$
-----
The two foregoing results are demonstrated in the Tutorial below.
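Linearity does not require the variables to be independent. As an illustration, the sketch below takes two fair dice (an assumption of the example), lets X be the first die and M the larger of the two, and compares both sides of the identity by exact enumeration. The coefficients are arbitrary.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # the 36 outcomes of two dice
prob = Fraction(1, len(outcomes))

def expect(h):
    """E[h(d1, d2)] by exact enumeration of the 36 equally likely outcomes."""
    return sum(h(d1, d2) * prob for d1, d2 in outcomes)

def x(d1, d2):
    return d1           # value of the first die

def m(d1, d2):
    return max(d1, d2)  # larger of the two dice (clearly dependent on x)

a, b, c = 2, 3, -1      # arbitrary coefficients

lhs = expect(lambda d1, d2: a * x(d1, d2) + b * m(d1, d2) + c)
rhs = a * expect(x) + b * expect(m) + c

print(lhs, rhs)  # 233/12 233/12
```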
Let X and Y be two random variables. What is the expectation of their product XY ? The result calls on the concept of conditional expectation :
E[XY] = E[X.E[Y | X]]
and is demonstrated here.
The Cauchy-Schwarz inequality places an upper bound on the absolute value of the expectation of the product of two r.v. :
$$|E[XY]| \le \sqrt{E[X^2] \, E[Y^2]}$$
where | ... | denotes the absolute value.
One of the consequences of the Cauchy-Schwarz inequality is that the correlation coefficient of two r.v. is a number between -1 and +1.
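Both statements can be checked on an example. The sketch below assumes two fair dice, with X the first die and M the larger of the two. It tests the inequality in its squared form, which avoids floating-point square roots, and then computes the correlation coefficient from its usual definition (covariance divided by the product of the standard deviations).

```python
from fractions import Fraction
from itertools import product
from math import sqrt

outcomes = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(outcomes))

def expect(h):
    """E[h(d1, d2)] by exact enumeration of the 36 equally likely outcomes."""
    return sum(h(d1, d2) * prob for d1, d2 in outcomes)

def x(d1, d2):
    return d1

def m(d1, d2):
    return max(d1, d2)

e_xm = expect(lambda d1, d2: x(d1, d2) * m(d1, d2))
e_x2 = expect(lambda d1, d2: x(d1, d2) ** 2)
e_m2 = expect(lambda d1, d2: m(d1, d2) ** 2)

# Squared form of the inequality: E[XM]^2 <= E[X^2] E[M^2].
holds = e_xm ** 2 <= e_x2 * e_m2

# Correlation coefficient of X and M.
cov = e_xm - expect(x) * expect(m)
var_x = e_x2 - expect(x) ** 2
var_m = e_m2 - expect(m) ** 2
rho = float(cov) / sqrt(float(var_x * var_m))

print(holds, rho)
```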
From the expression for the expectation of a product, the following result can be derived.
Two r.v. are uncorrelated if and only if :
E[XY] = E[X].E[Y]
-----
Shifting from mere lack of correlation to genuine independence requires the following stronger condition.
Two r.v. X and Y are independent if and only if for any pair of functions f(.) and g(.) we have :
E[f(X)g(Y)] = E[f(X)].E[g(Y)]
provided that the expectations are defined. We demonstrate this result here.
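A classical illustration of the gap between the two notions is sketched here. It assumes X uniform on {-1, 0, 1} and Y = X², a choice made for the example only. The two variables are uncorrelated, yet the product rule fails for the pair f(x) = x², g(y) = y, so they are not independent.

```python
from fractions import Fraction

xs = [-1, 0, 1]
p = Fraction(1, 3)  # X uniform on xs (assumption)

def expect(h):
    """E[h(X)] by exact enumeration."""
    return sum(h(x) * p for x in xs)

def y_of(x):
    return x * x  # Y = X^2 is completely determined by X

# Uncorrelated: E[XY] equals E[X].E[Y].
e_xy = expect(lambda x: x * y_of(x))
e_x_e_y = expect(lambda x: x) * expect(y_of)

# Not independent: the product rule fails for f(x) = x^2 and g(y) = y.
def f(x):
    return x * x

def g(y):
    return y

lhs = expect(lambda x: f(x) * g(y_of(x)))
rhs = expect(f) * expect(lambda x: g(y_of(x)))

print(e_xy, e_x_e_y)  # 0 0
print(lhs, rhs)       # 2/3 4/9
```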
Given two r.v. X and Y, the (unconditional) expectation of X can be calculated from the expectation of X conditional on Y by the so-called "iterated expectation theorem" :
E[X] = E[E[X | Y]]
We establish this result here, and give examples of its remarkable usefulness for calculating not only expectations, but also probabilities and moment generating functions.
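As a small worked case, consider a two-stage experiment (an assumption made for the illustration): Y is the roll of a fair die, and X is the number of heads in Y tosses of a fair coin. The sketch below computes E[X] directly from the joint distribution, then by averaging the conditional expectations E[X | Y = y].

```python
from fractions import Fraction
from math import comb

P_Y = Fraction(1, 6)  # fair die: P{Y = y} = 1/6 for y = 1, ..., 6

def joint(k, y):
    """P{X = k, Y = y}: k heads out of y fair coin tosses, after rolling y."""
    return P_Y * Fraction(comb(y, k), 2 ** y)

# Direct calculation from the joint distribution.
e_x_direct = sum(k * joint(k, y) for y in range(1, 7) for k in range(y + 1))

def cond_expectation(y):
    """E[X | Y = y], computed from the joint distribution."""
    return sum(k * joint(k, y) for k in range(y + 1)) / P_Y

# Iterated expectation: average E[X | Y = y] over the distribution of Y.
e_x_iterated = sum(cond_expectation(y) * P_Y for y in range(1, 7))

print(e_x_direct, e_x_iterated)  # 7/4 7/4
```

Here E[X | Y = y] is simply y/2, so the second route reduces to E[Y]/2.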
________________________________________________
Tutorial
We first get the problem of the existence of the expectation out of the way by showing that a Cauchy distributed r.v. has indeed no expectation.
In the remainder of this Tutorial, all r.v. will be assumed to have expectations.
-----
We then establish the "single variable" version of the Law of The Unconscious Statistician (LOTUS). The demonstration for discrete variables does not extend to continuous variables, and the two cases therefore demand separate treatments.
To provide for an easy start, we here assume that the function of the r.v. is one-to-one, which makes the demonstrations quite a bit simpler. We'll remove this restriction in the second part of the Tutorial.
We then use LOTUS to calculate the expectation of a linear transform of a r.v.
-----
We then establish the "two variable" version of LOTUS. Again, the discrete and continuous cases have to be treated separately. Now that we're more familiar with LOTUS, we do not place any restriction on the function (except that it be piecewise continuous). This makes the demonstrations more involved, but also more general.
We then use LOTUS to calculate the expectation of a linear combination of r.v.
THE LAW OF THE UNCONSCIOUS STATISTICIAN (LOTUS)
AND FIRST CONSEQUENCES
* A Cauchy r.v. has no expectation
* Expectation of a function of a random variable
    * Discrete case
    * Continuous case
* Expectation of a linear transform
* Expectation of a function of two random variables
    * Discrete case
    * Continuous case
* Expectation of a linear combination of random variables
    * Two variables
    * Any number of variables