Convergence (Almost sure)
"Almost sure" convergence of an infinite sequence {Xn} of random variables to another r.v. X is the ultimate type of convergence. No other type of convergence of a sequence of r.v.s comes closer to the convergence of a sequence of numbers in the usual sense of Calculus.
It took a long time for mathematicians to clarify the intuitive concept of limit. The standard definition is now as follows :
|
The sequence of real numbers {x1, x2, ... } is said to converge to the limit "l" if : * For any ε, however small, * There exists a rank N such that, for all n > N, | xn - l | < ε. |
In other words, l is the (unique) limit of the sequence if, given any small interval covering l, all the terms of the sequence beyond a certain rank are inside this interval.
Note that a consequence of this definition is that for any interval covering l, there is only a finite number of terms of the sequence outside this interval.
-----
We now would like to extend this definition to random variables, and define unambiguously what meaning could be attributed to the more or less intuitive idea of "a sequence {Xn} of r.v.s converging to a limit l ".
The most natural idea is to transpose the definition of a converging sequence of numbers to the sequence of realizations of the r.v.s Xn. So, a tentative definition could be something like this :
The sequence {Xn} of r.v.s is said to converge to a limit l if any sequence of realizations of the Xns is a sequence of numbers that converges to l.
Nice try, but it won't work. Except in some almost trivial cases (try to imagine one), there is no way you can guarantee that for any ε, there will only be a finite number of realizations outside the interval ]l - ε, l + ε[ because of the random nature of the Xns.
So we have to relax our definition somewhat in order to take the randomness of the Xns more thoroughly into account. We will accept the possibility for the sequence of realizations of the Xns not to converge to l, but only as rarely as possible. What this means is that, however small the probability α, the event "The sequence of realizations did not converge to l" should occur less than (α.100)% of the time.
Such an event is said to happen with probability 0. This does not mean that the event is impossible, just that it happens so rarely that it is not possible to assign any non-zero probability to its occurence.
Events happening with probability 0 are not as strange as it may seem. Consider any infinite sequence of 0s and 1s :
0100110001011010011..........
Now toss a coin until the end of times, and consider how likely it is that this sequence of tosses will produce exactly the above sequence. Clearly :
* It is not impossible to reproduce the sequence, for any toss will produce a 0 or a 1, and will produce the appropriate bit with probability .5.
* But it is not possible to assign any non-zero probability to the event : "All the bits of the infinite sequence have been correctly generated by the infinite sequence of tosses".
So any infinite sequence of 0s and 1s will be attained with probability 0.
This setting should remind us of the geometric distribution. We then wondered how many times a coin has to be tossed before the first Head turns up, but we omitted to note that it is possible that Head never shows up, although this event occurs with probability 0.
The complementary event of an event occuring with probability 0 is said to be almost sure, or to occur "almost surely". We illustrate this concept with an example which is just a little bit less simple than the previous one, and which also relates more directly to axiomatization of the Theory of Probability (an issue that is not addressed in this Glossary).
Let U[0, 1] be the uniform distribution over [0, 1]. We are going to show that if x is a number drawn from this distribution, then x is almost surely an irrational number.
-----
Consider the segment S = [0, 1], and s = ] a, b [ an interval included in S. The length of s is l = b - a.
Now let's draw an observation from the uniform distribution U[0, 1]. The probability for x to be in s is equal to l.
For any set E included in s, we clearly have :
P{x in E}
l
-----
For any finite or countably infinite (or "countable") set of events Ai with respective probabilities Pi such that ΣiPi < + ∞ , we have :
P{At least one of the Ai occurs}
Σi Pi
with equality iff the Ai are independent.
-----
Now consider the set Q of rational numbers, restricted to [0, 1]. This set in countable and all rational numbers in [0, 1] can therefore be labeled by integers without omission or repetition :
Q = {q1, q2, .....}
Let ε be a positive real number. Cover the rational qi by an interval si of length li :

The sum of the lengths of these intervals is :

The probability for an observation x drawn from U[0, 1] to be in at least one of the si is therefore smaller than ε, and this is a fortiori true of the probability for x to be rational, for every rational is inside at least one of the si.
As ε is arbitrarily small, we see that the probability for x to be rational is smaller than any non zero ε. Therefore, the event "x is rational" occurs with probability 0.
Therefore, x is almost surely irrational.
This simple result is not always perceived as obvious by the newcomer : after all, between any two points of [0, 1], however close, there is an infinity of rational numbers (Q is said to be "dense"). Yet, hitting a rational number by shooting at random is the next best thing to a miracle (probability 0). This is because the points of Q can be covered by intervals whose total length is arbitrarily small : Q is said to be of "measure 0".
Any countable set in [0, 1] is of measure 0. Yet, there exist sets of measure 0 that are uncountable, and
which are therefore "as populated" as the set of real numbers.
The set of non normal numbers
is a typical example of an uncountable set of measure 0.
We can now return to the issue of defining the convergence of a sequence of random variables to a constant.
A sequence {Xn} of r.v.s is said to converge almost surely to the limit l
if the convergence of the sequence of realizations of the Xn to l is an event of probability 1.
In other words, the convergence of the realizations to l is an almost sure event.
This idea is formalized in the following definition of almost sure convergence of a sequence {Xn} of r.v.s to a number l :
|
P{lim {Xn} = l} = 1 |
We established the Weak Law of Large Numbers, which, in informal terms, states that as we consider larger and larger samples, the sample mean (which is a r.v.) converges to the distribution mean. The term "Weak" refers to the type of convergence that we demonstrated (convergence "in probability"), and which is "weaker" than the almost sure convergence described in this section (see below).
There exists a Strong Law of Large Numbers which states that our little demonstration missed an important point : the convergence of the sample mean to the distribution mean is in fact almost sure. So the Strong Law of Large Numbers is as follows :
* Let {Xn} be a sequence of independent and identically distributed random variables with a mean (the existence of the variance is not requested).
* Then the sequence {Yn} with :

converges almost surely to the common mean µ of the Xns.
|
|
Although almost sure convergence is a common phenomenon, it is usually difficult to prove, and the Strong Law of Large Numbers, simple as it looks, is no exception. We therefore state it without proof, except here in the case of the fair Heads and Tails game.
Coin tossing was a convenient paradigm for illustrating the Weak Law of Large Numbers, and it is also the case for the SLLN.
-----
Denote Ω the set of all semi-infinite sequences of 0s and 1s. Each one of these sequences may represent the outcome of an infinite sequence of tosses of a fair coin. Consider ω, one of these sequences, and denote mn the average number of Heads per toss in the first n tosses :
mn = 1/n.Number of Heads is the first n tosses.
One of two things happens :
* Either mn tends to 1/2 as n grows without limit.
* Or else mn does not tend to any limit at all (can you imagine such a sequence ?), or to a limit different from 1/2.
Denote A the set of sequences for which mn does indeed tend to 1/2, and Ac the complementary set (the set of "irregular" sequences).
Then what the SLLN says is that if you proceed with an infinite sequence of tosses of a fair coin, then the sequence of Heads and Tails you'll obtain will be in A with probability 1, or else in Ac with probability 0.
We mentioned earlier that any infinite sequence of Heads and Tails, although not impossible, occurs with probability 0. But the above statement is even stronger : there is an infinite number of irregular sequences, and what the SLLN says is that obtaining any of these irregular sequences is, globally, an event with probability 0.
-----
We noted that illustrating the Weak Law of Large Numbers with the coin tossing paradigm did not involve considering infinite sequences of tosses. Now, we see that the very nature of the Strong Law of Large Numbers is about such infinite sequences.
________________
The Strong Law of Large Numbers is demonstrated here in the special case of the fair Heads and Tails game.
So far, we considered the almost sure convergence of a sequence of r.v. to a limit l, with l a number. The definition can easily be generalized to the convergence of sequence {Xn}of r.v. to not a number, but to a random variable X.
{Xn} is said to converge almost surely to X if the sequence {| Xn - X |} converges almost surely to 0.
which is usually transcribed as :
|
P{lim Xn = X} = 1 |
or equivalently :
P{lim(| Xn - X |) = 0 } = 1
Exploring the world of almost sure convergence represents a quantum leap in difficulty over the spirit of this Glossary. So we'll just mention a few important results, and won't even attempt to outline the proofs.
Nothing beats almost sure convergence. In particular, almost sure convergence implies :
* Convergence in probability (as in the Weak Law of Large Numbers)
* Convergence in distribution (as in the Central Limit Theorem), which, anyway, is implied by convergence in probability.
The following result sheds some light on the relationship between almost sure convergence and convergence in probability.
* If the sequence {Xn} of r.v.s converges in probability to X,
* Then from this sequence can be extracted a sub-sequence that converges almost surely to X.
As we stated, demonstrating almost sure convergence is always difficult. The definition is usually of little help, but various criteria have been identified that guarantee almost sure convergence. Here are the most common ones.
The sequence {Xi} of r.v. converges almost surely to the r.v. X if and only if :
* However small ε,
* There is a rank N such that, for all n > N :
P{ |Xn - X |} < ε] > 1 - ε
If, however small ε, we have :

then {Xn} converges almost surely to X.
{Xn} converges almost surely to X if and only if, however small ε, we have :
P{lim sup( | Xn - X | > ε)} = 0
_____________________________________________________
Related readings :