Law of Large Numbers (Weak)
If you toss a fair coin only twice, although "Heads" and "Tails" both have a probability of 0.5, you certainly wouldn't be too surprised if the two tosses happened to both produce "Heads", or both produce "Tails", instead of producing exactly one "Head" and one "Tail".
But if you now toss the same fair coin 1000 times, you certainly expect the number of "Heads" to be very close to 500.
This intuition is backed by the Weak Law of Large Numbers. In words, this Law states that if a trial is repeated a large number of times n, then the probability that the average of the outcomes of these n trials differs significantly from the expected value of one outcome tends to 0 as n grows without limit.
In more technical terms, the Weak Law of Large Numbers states that :
* If {Xi} is an infinite sequence of i.i.d. random variables with common mean µ,
* If we define Yn as the r.v. equal to the mean of the first n variables Xi,
* Then, for any ε > 0, the probability for a realization of Yn to fall more than ε away from µ tends to 0 as n grows without limit.
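This can be expressed as follows :
$$\lim_{n \to \infty} P\left(|Y_n - \mu| > \varepsilon\right) = 0 \quad \text{for every } \varepsilon > 0$$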
or in words :
* No matter how small ε is, you can make the probability that the mean of the first n terms differs from the mean µ by more than ε as small as you wish, simply by making n large enough.
In the vocabulary of Estimation, the WLLN states that the sample mean is a consistent estimator of the population mean.
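The statement above can be checked numerically on the coin example. The following is a minimal sketch, not a proof : the function name `deviation_probability`, the number of trials and the seed are illustrative assumptions. It estimates by simulation the probability that the mean Yn of n fair-coin tosses (Heads = 1, Tails = 0) falls more than ε away from µ = 0.5.

```python
import random

def deviation_probability(n, eps, trials=2000, seed=0):
    """Monte Carlo estimate of P(|Y_n - 0.5| > eps) for n fair-coin tosses.

    `trials` and `seed` are illustrative choices, not part of the Law.
    """
    rng = random.Random(seed)
    deviant = 0
    for _ in range(trials):
        # Number of Heads in one sequence of n independent tosses
        heads = sum(rng.random() < 0.5 for _ in range(n))
        # Y_n is the proportion of Heads; compare it with mu = 0.5
        if abs(heads / n - 0.5) > eps:
            deviant += 1
    return deviant / trials

for n in (10, 100, 1000):
    print(n, deviation_probability(n, eps=0.05))
```

For a fixed ε, the printed estimates should decrease towards 0 as n increases, which is what the WLLN predicts.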
-------------------------
The standard terminology of Calculus would reformulate the Law as follows :
* No matter how small ε > 0,
* No matter how small δ > 0,
* There is a number N(ε, δ) such that if n > N(ε, δ) then :
$$P\left(|Y_n - \mu| > \varepsilon\right) < \delta$$
The WLLN receives a simple graphic interpretation. For each value of n, the r.v. Yn has a probability distribution (that we here assume to be continuous) which is represented by the green curve in the illustration below. The area under the curve is always 1.
Now position a segment s of length 2ε centered on µ. Denote by An the area under the curve outside s.
An is the probability that Yn differs from µ by more than ε (in absolute value).
The WLLN states that, for a given ε, An tends to 0 as n grows without limit. In other words, outside of s, the "tails" of the distribution of Yn become negligible when n tends to infinity (lower image of the above illustration).
The coin tossing paradigm is convenient for illustrating the WLLN.
Consider :
* A fair coin,
* All the possible sequences of n tosses of this coin. There are 2^n such sequences, and they are all equally likely.
Suppose these sequences are numbered 1, 2, ..., 2^n in an arbitrary way. For sequence #i, denote by mi the average number of Heads per toss in the sequence :
mi = (1/n) × (number of Heads in sequence #i)
Then take an arbitrarily small positive real number ε, and count the number N(n, ε) of sequences such that mi departs from 1/2 by more than ε. We show here that the proportion of these "deviant" sequences :
N(n, ε) / 2^n
tends to 0 as n grows without limit.
So the WLLN can easily be derived directly in the case of fair coin tossing.
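The proportion of "deviant" sequences described above can also be computed exactly for moderate values of n, since the number of sequences of n tosses containing exactly k Heads is the binomial coefficient C(n, k), out of 2**n sequences in total. The sketch below is only a numerical illustration under that counting argument; the function name `deviant_proportion` is an assumption made for this example.

```python
from math import comb

def deviant_proportion(n, eps):
    """Exact proportion of n-toss sequences whose average number of
    Heads per toss differs from 1/2 by more than eps."""
    # comb(n, k) sequences contain exactly k Heads.
    # |k/n - 1/2| > eps is rewritten as |2k - n| > 2*eps*n so that the
    # left-hand side is an exact integer.
    deviant = sum(
        comb(n, k) for k in range(n + 1) if abs(2 * k - n) > 2 * eps * n
    )
    return deviant / 2 ** n

for n in (10, 100, 1000, 10000):
    print(n, deviant_proportion(n, eps=0.05))
```

With ε = 0.05, the proportion is about 0.75 for n = 10 and shrinks rapidly as n grows.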
The interesting things about this derivation are that :
-----
Yet you may find it amusing to demonstrate that if you consider only sequences with an even number 2n of tosses, then the proportion of sequences that have exactly n Heads and n Tails tends to 0 as n grows without limit.
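This claim is easy to check numerically. Among the 4**n sequences of 2n tosses, exactly C(2n, n) have n Heads and n Tails. The sketch below (the function name `balanced_proportion` is an illustrative assumption) prints this proportion next to the classical Stirling approximation 1/sqrt(πn), which also tends to 0.

```python
from math import comb, pi, sqrt

def balanced_proportion(n):
    """Proportion of sequences of 2n tosses with exactly n Heads and n Tails."""
    return comb(2 * n, n) / 4 ** n

for n in (1, 10, 100, 1000):
    # Second column: exact proportion; third column: Stirling approximation
    print(n, balanced_proportion(n), 1 / sqrt(pi * n))
```

There is no contradiction with the WLLN : the Law bears on the proportion of Heads being close to 1/2, not on the count of Heads being exactly n.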
-----
Suppose you toss a fair coin 20 times, and obtain 20 Heads in a row. Even the most seasoned statistician is tempted to think :
"The WLLN tells us that ultimately, the numbers of Heads and Tails must be about equal. After this incredibly unlucky opening sequence, the following tosses must therefore produce mostly Tails for the game to return to a roughly balanced count of Heads and Tails. Hence, I'm expecting the following tosses to generate mostly Tails. In other words, an excess of Heads in an opening sequence must cause an increase of the probability of Tails for the ensuing tosses.
How could the balance be attained otherwise ?"
Of course, because the tosses are independent, there is no such thing as "probability adjustment". The WLLN does not care whether an opening sequence, however long, is balanced or not. All it says is that, if you toss the coin long enough, the probability that the proportion of Heads is close to 1/2 becomes as high as you wish : the opening excess is not compensated, it is merely diluted by the growing number of tosses. The Law says nothing about how long you have to wait for this to happen. Many gamblers went to their ruin for misinterpreting the WLLN.
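The following sketch illustrates this point by simulation. The function name `heads_proportion_after`, the seed and the number of extra tosses are illustrative assumptions. It starts from an opening run of Heads, then tosses a fair coin many more times, and reports both the proportion of Heads among the later tosses alone and the proportion over the whole sequence.

```python
import random

def heads_proportion_after(opening_heads, extra_tosses, seed=0):
    """Return (proportion of Heads among the later tosses,
               proportion of Heads over the whole sequence)."""
    rng = random.Random(seed)
    # The later tosses are independent of the opening run
    later_heads = sum(rng.random() < 0.5 for _ in range(extra_tosses))
    total_tosses = opening_heads + extra_tosses
    return (later_heads / extra_tosses,
            (opening_heads + later_heads) / total_tosses)

later, overall = heads_proportion_after(opening_heads=20, extra_tosses=100_000)
print(later, overall)
```

The later tosses remain close to half Heads, with no tendency to favor Tails, and the overall proportion nevertheless approaches 1/2 because 20 extra Heads weigh very little against 100,000 tosses.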
The above expressions make an explicit reference to the common mean µ of the variables.
In addition, the WLLN will appear to be a consequence of Chebyshev's inequality, which requires the variables to also have a variance.
So the WLLN seems to apply only to variables that have at least a mean and a variance. In fact, the existence of the variance is not required, and more sophisticated proofs than the one we give below make no reference to the existence of the variance.
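When the variance σ² does exist, Chebyshev's inequality applied to the sample mean Yn (whose variance is σ²/n for i.i.d. variables) gives the bound P(|Yn − µ| > ε) ≤ σ² / (n ε²). The sketch below, in which the function names `chebyshev_bound` and `sufficient_n` are illustrative assumptions, turns this bound into a sample size that is sufficient, though generally far from necessary, for the probability to drop to a chosen level δ.

```python
from math import ceil

def chebyshev_bound(n, eps, variance):
    """Chebyshev upper bound on P(|Y_n - mu| > eps) for i.i.d. variables
    with the given variance (capped at 1, since it bounds a probability)."""
    return min(1.0, variance / (n * eps ** 2))

def sufficient_n(eps, delta, variance):
    """A sample size n for which the Chebyshev bound is at most delta
    (up to floating-point rounding)."""
    return ceil(variance / (delta * eps ** 2))

# Fair coin with Heads = 1 and Tails = 0: the variance is 0.25
n = sufficient_n(eps=0.05, delta=0.01, variance=0.25)
print(n, chebyshev_bound(n, eps=0.05, variance=0.25))
```

For the fair coin with ε = 0.05 and δ = 0.01, this gives a sample size of about 10,000. The bound is conservative : the actual probability becomes smaller than δ for much smaller values of n.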
-----
The existence of the mean is of course always required. So, for example, the WLLN does not apply to samples drawn from the Cauchy distribution, as this distribution has no mean. We already noticed that the distribution of the mean of a sample drawn from a Cauchy distribution does not depend on the sample size : therefore, there is no "shrinking" of this distribution as the sample size increases.
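The absence of "shrinking" can be observed by simulation. In the sketch below, the function name `cauchy_mean_tail`, the threshold 1, the number of trials and the seed are illustrative assumptions. Standard Cauchy variates are generated by the inverse-transform formula tan(π(U − 1/2)), with U uniform on [0, 1), and the function estimates the probability that the mean of n such variates exceeds 1 in absolute value. For a single standard Cauchy variable, this probability is exactly 1/2.

```python
import math
import random

def cauchy_mean_tail(n, trials=2000, seed=0):
    """Monte Carlo estimate of P(|mean of n standard Cauchy variates| > 1)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        # Inverse-transform sampling of the standard Cauchy distribution
        total = sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n))
        if abs(total / n) > 1:
            exceed += 1
    return exceed / trials

for n in (1, 10, 200):
    print(n, cauchy_mean_tail(n))
```

The estimates stay near 1/2 whatever the value of n, whereas for a distribution with a mean they would tend to 0.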
We stated that the WLLN applies to independent and identically distributed r.v.s. Assuming the independence of variables is so common in Statistics that we sometimes forget how strong a restriction this is.
In the Tutorial below, we give a counter-example that deals with variables that are indeed identically distributed but that are not independent. A consequence of the breakdown of the independence hypothesis will be that the WLLN does not apply to this sequence of variables.
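To see how badly things can go without independence, here is a deliberately extreme and hypothetical example, which is not necessarily the counter-example used in the Tutorial : every Xi is a copy of the first toss X1. Each Xi is then a fair coin toss (identically distributed), but the sample mean Yn is always equal to X1, that is, either 0 or 1. The function name `dependent_deviation_probability` is an assumption made for this sketch.

```python
import random

def dependent_deviation_probability(n, eps, trials=2000, seed=0):
    """Estimate P(|Y_n - 0.5| > eps) when X_1 is a fair coin toss and
    X_2, ..., X_n are all copies of X_1 (hypothetical dependent sequence)."""
    rng = random.Random(seed)
    deviant = 0
    for _ in range(trials):
        first = 1 if rng.random() < 0.5 else 0
        sample = [first] * n  # total dependence: every X_i equals X_1
        if abs(sum(sample) / n - 0.5) > eps:
            deviant += 1
    return deviant / trials

for n in (1, 100, 10000):
    print(n, dependent_deviation_probability(n, eps=0.05))
```

For any ε smaller than 1/2, the probability is 1 for every n : it does not tend to 0, so the WLLN fails for this sequence.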
The Weak Law of Large Numbers bears on the convergence in probability of the empirical mean. All the r.v.s in its statement are not only independent, but also identically distributed.
This last restriction can be lifted. One generalization of the WLLN considers all the r.v.s involved as independent, but each one with its own probability distribution, and therefore with its own mean and variance, with no restriction.
The Law about the empirical mean benefits from the fact that the variance of this mean tends to 0 as the sample size tends to infinity. This advantage disappears if the variances of the r.v.s are arbitrary.
The statement of the Law must then be made somewhat more elaborate, so that it becomes possible to build an infinite sequence of r.v.s that is guaranteed to converge in probability.
We formulate and demonstrate this Generalized Law of Large Numbers in the Tutorial below. We then use it to demonstrate the Fundamental Theorem of Statistics.
Why is this Law considered "weak" ?
The term "Weak" refers to the way the sample mean converges to the distribution mean. At first sight, it may seem that there is no better way to converge than what we described here, and which is known as "convergence in probability".
But it turns out that the sample mean converges to the distribution mean in a much "stronger" way than just "in probability" : this convergence is almost sure, and the Weak Law of Large Numbers is in fact superseded by a "Strong" Law of Large Numbers.
___________________________________________________
Tutorial
We demonstrate here the Weak Law of Large Numbers (WLLN).
We first demonstrate Markov's inequality, and then its generalization, Chebyshev's inequality. The WLLN will turn out to be a direct application of Chebyshev's inequality.
We insist on the necessity for the variables to be independent by exhibiting a counter-example where the variables in the sequence are indeed identically distributed, but are not independent. A consequence of this lack of independence will be that the WLLN will not apply to this sequence.
We then demonstrate a generalization of the WLLN that does not bear on the convergence of the empirical mean, but on that of a largely arbitrary sequence of random variables. We'll then use this result to :
* Demonstrate the Fundamental Theorem of Statistics.
* Outline a proof that the sample moments are consistent estimators of the moments of a distribution.
THE WEAK LAW OF LARGE NUMBERS
* Markov inequality
* Chebyshev inequality
* The Weak Law of Large Numbers
* Counter-example : a breakdown of the WLLN
* Generalized Weak Law of Large Numbers
  * Generalizing the Law
  * Demonstration of the generalized WLLN
    * Lemma
    * Demonstration
* Fundamental Theorem of Statistics
  * Convergence of the sample distribution function
  * Glivenko-Cantelli theorem (no demonstration)
* Estimation of the moments of a distribution (outline)