Law of Large Numbers  (Strong)

We suggest that you first read the entry about the Weak Law of Large Numbers (WLLN).

"Weak" and "Strong" Laws of Large Numbers

Let {Xn} be a sequence of iid rvs with common mean µ, and let ε be an arbitrarily small (positive) number.

The mathematical expression of the WLLN is

 

which expresses the fact that the probability for the sample mean to depart significantly from the distribution mean tends to 0 as the sample size grows without limit (convergence "in probability").

The Strong Law of Large Numbers addresses the same problem, but states that, however small ε,

 

 

 

which expresses the fact that the sample mean converges almost surely to the distribution mean.

 

The framed expression is not always perceived as self-explanatory, and we now spend some time explaining what it actually means.

Shortcomings of elementary probability theory

An impressive number of results can be obtained by an elementary, intuitive probability theory with no heavy mathematical formalization.

Yet, this theory breaks down when it comes to describing two types of events (that are intimately connected) :

    * Events that are not impossible but are so rare that no finite probability can be assigned to them. A simple example of such difficulty can be found with the failed attempt to assign a probability to the event "x = x0" where x0 is a given real number. Although not impossible, the event cannot be assigned a finite probability, however small.

    * Infinite sequences of events. Consider for example the random walk of a point on a linear lattice. Will the point ever return to its starting position ? Under rather general conditions, the answer is "Yes, it will almost surely.", meaning that it is not possible to assign a probability, however small, to the event "The point will never returns to its original position", although this event is definitely not impossible.

-----

It is a major achievement of Probability Theory to have succesfully developed a powerful mathematical system that allows a rigorous treatment of such very real problems. The complete (Axiomatic) Probability Theory lies beyond the bounds of this Glossary, and an appropriate treatment of the Strong Law of Large Numbers requires the fully developed form of Probability Theory.

Yet, there is a special case where the deep concepts of "probability 0 event" or "probability 1 event" can be addressed rigorously without resorting to the arsenal of Axiomatic Probability Theory. We now describe this special case, that will be extensively studied in the Tutorials below.

The fair "Heads and Tails" game

Correspondence with real numbers

Consider an infinite sequence of tosses of a fair coin. Write down the sequence of outcomes, with the convention that "Heads" is represented by "0" and "Tails" by "1". The sequence will look something like

011010010110.........

The mathematician interprets this sequence as the binary (or "dyadic") expansion of a real number in ]0, 1] and hence establishes the correspondence

One real number in ]0, 1]                  One infinite sequence of tosses  

 As we'll show, the correspondence is in fact not perfect, but the departure from exact correspondence has no adverse consequences.

Elementary probability theory

This approach is very productive. In particular, all of elementary probability theory can be mirrored into the properties of finite collections of intervals in ]0, 1] once it is realized that the probability for a randomly chosen point x to be inside the interval ]a, b[ is just (b - a), the length of the interval.

P{x in ]a, b[} = b - a

 

 

 

So it appears that :

    * The probability for a randomly drawn point to be inside a given interval is equal to the length of this interval.

    * The probability for a randomly chosen point to be inside a given finite union of disjoint intervals is equal to the sum of the lengths of these intervals.

"Measure" of a set of points

At this point, the correspondence between probability theory and the properties of intervals in ]0, 1] may be perceived as nothing more than amusing.

But as stated above, and as we'll show below, there are very real probabilistic problems that cannot be interpreted in terms of finite unions of intervals because the events to which one tries to assign probabilities can indeed be represented by sets of points in ]0, 1], but these sets are not unions of disjoint intervals, and therefore do not have "lengths" in the ordinary sense of the word.

It is a major achievement of mathematics to have succesfully generalized the concept of "length of an interval" to that of measure of just about any set of points (including, of course, intervals as a special case). It then becomes possible to pursue the analogy between "probability" and "measure of a set of points" in situations where elementary probability theory breaks down.

Sets of "measure 0"

"Measure theory" is a formidable body of concepts and results that is not addressed in this Glossary, except for the very special but important case of "sets of measure 0", that we already touched upon here. In particular, we showed that the set Q of rational numbers is of measure 0, with the practical consequence that a randomly drawn number is "almost surely" irrational ( a result that can't even be expressed in elementary probability theory).

This result is almost trivial, because the set of rational numbers is countable : it is easy to exhibit a one-to-one correspondence between Q and N, the set of integers. It is then just as easy to show that any countable set of points is of measure 0.

-----

It may happen that a set is so big that it is not possible to establish a one-to-one correspondence between this set and N, the set of integers. For example, it can be shown that there can be no one-to-one correspondence between R, the set of real numbers, and N because there are just "too many" real numbers. A set is said to be uncountable if it can be put in one-to-one correspondence with R. The set of points in ]0, 1] is uncountable.

The special case of the Strong Law of Large Numbers that we are going to study will be facing a pretty wild kind of beast : an uncountable set of measure 0, that is, a set that can be put in one-to-one correspondence with ]0, 1]  (or R), and yet that can be confined inside a "box" as small as desired.

The Strong Law of Large Numbers for the fair "Heads and Tails" game

We will not demonstrate the full version of the Strong Law of Large Numbers, but only the special case pertaining to the fair "Heads and Tails" game. This restricted version nevertheless captures the spirit of the complete SLLN.

Normal numbers

Consider an infinite sequence of tosses of a fair coin as described above. It is anticipated that the sequence will produce "as many" 1s as 0s. More precisely, it is anticipated that the proportion of 1s in a n digit long opening sequence of the binary expansion of the real number associated to the sequence will tend to 0.5 as n grows without limit. This intuition is confirmed by the Weak Law of Large Numbers.

An asymptotically balanced sequence of 0s and 1s is said to be the binary expansion of a normal number.

Non normal numbers

Yet, it is easy to build sequences of 0s and 1s such that this perfect asymptotic balance is never reached. Such a sequence is said to be the binary expansion of a non normal number. A non normal number represents an infinite sequence of tosses for which the proportion of Heads does not tend to .5 as longer and longer opening sequences of tosses are considered.

We intuitively feel that this kind of event, although possible, is extremely rare, but there is no way a finite probability can be assigned to it, and describing the properties of the set of "imbalanced" sequences of tosses is beyond the capabilities of elementary Probability Theory.

Borel's normal numbers theorem

Let's put probabilities aside for a second. The set of non normal numbers in ]0, 1] is unambiguously defined. Borel's normal numbers theorem, that we demonstrate below, states that

 

The set of non normal numbers is of measure 0

 

This result is far from obvious as we will show that the set of non normal numbers is uncountable.

-----

Note that Borel's theorem is a set theoretic result, and more precisely, a Measure theory result.

Borel's theorem  and  the Strong Law of Large Numbers

We now return to probabilities. Tossing a fair coin an infinite number of times is equivalent to drawing a number at random in ]0, 1]. Keeping in mind the correspondence between "probability of an event" and "measure of the set of points representing the event", Borel's theorem tells us that

The probability of an infinite and imbalanced sequence of tosses of a fair coin is 0.

This is, in informal terms, the Strong Law of Large Numbers in the special case of independent Bernoulli random variables with p = .5.

-----

If we now return to the mathematical formulation of the SLLN :

    * The term within curly brackets expresses the fact that the infinite sequence of tosses is slightly off balance,

    * And P{} = 0 expresses the fact that however small the imbalance, the probability for this imbalance to occur is 0 in the sense that the set of points representing the event "the sequence is not perfectly balanced" is of measure 0.

The general Strong Law of Large Numbers

The "Heads and Tails" version of the Strong Law of Large Numbers is in fact a result in Measure theory. Its probabilistic interpretation is based on the interpretation of "probability 0" meaning "the probability for a randomly chosen point to belong to a set of measure 0".

In its general form, the Strong Law cannot rely on the geometric interpretation of the dyadic development of real numbers anymore. Demonstrating the Strong Law then requires to have first defined the concept of probability, and this can be done only within the framework of a fully axiomatized probability theory, a topic which is not addressed in this Glossary.

Nevertheless, the demonstration of the Strong Law follows, by and large, the same lines of reasoning as the one we give for the Heads and Tails game. Understanding the full demontration is made much easier by an earlier understanding of the demonstration of the special case.

"Strong" is stronger than "Weak"

It can be shown that the Strong Law implies the Weak Law, which can therefore be regarded as a consequence of the Strong Law.

The converse is, however, wrong : it is possible to exhibit sequences of r.v.s following the Weak Law, but not the Strong one (although these examples are somewhat artificial). So the terms "Weak" and "Strong" are indeed justified.

Unfortunately, demonstrating this result requires the development of several imporant concepts of the general theory of probability (even for the simple case of the Heads and Tails game), and we therefore state them without proof.

"Strong" and "Weak" Laws are of different natures

The mathematical formulations of the "Strong" and "Weak" Laws of Large Numbers look somewhat similar, and the newcomer to Probability theory usually finds it difficult to tell them apart. Yet, the two Laws are quite different in nature :

    * The Weak Law never considers infinite sequences of realizations of a random variable. It only states that imbalanced opening sequences are less and less likely to occur as one considers longer and longer sequences.

In the special case of the Heads and Tails game, the Weak Law of Large Numbers is in fact a result in combinatorics. As we'll show, it states that the proportion of n bit long binary words exhibiting a given level of imbalance between the number of 0s and the number of 1s goes to 0 as n grows without limit.

    * On the other hand, the Strong Law considers only infinite sequences of realizations of a random variable, and more precisely, the set of these infinite sequences. It states that the set of imbalanced sequences has probability 0 in a sense that generalizes the concept of "set of measure 0" as we introduced it about the Heads and Tails game (Borel's normal numbers theorem).

-----

At any rate, the connection between the concept of "probability" and the real world lies in the Weak Law of Large Number. The Strong Law belongs to the domain of mathematics only. By nature, random variables are known only through finite numbers of realizations, and no real world experiment can therefore distinguish between "convergence in probability" and "almost sure convergence" (inasmuch as a real world experiment can capture the concept of "probability", an issue that does not seem to be in a position to receive a universally accepted answer within a predictable delay).

________________________________________________________________________

 

These three Tutorials may be regarded as a single text culminating with the demonstration of Borel's normal numbers theorem.

    * The first one describes the properties of dyadic expansions of real numbers in some detail.

    * The second Tutorial gives a new demonstration of the Weak Law of Large Numbers within the framework of dyadic expansions of real numbers.

    * Only then shall we be able to demonstrate Borel's theorem in the third Tutorial.

 

 

 

 

Tutorial 1

 

In this Tutorial, we analyze the properties of the so-called "dyadic expansion" of real numbers, that is, of the representation of a real number in ]0, 1] as an infinite string of 0s and 1s.

These properties are the key to the correspondence between the theory of probabilities and its representation in the world of sets of points ]0, 1]. In particular, in the next Tutorial, we give a new demonstration of the Weak Law of Large Numbers within the framework of dyadic expansions of real numbers.

 

 

 

DYADIC EXPANSION OF REAL NUMBERS

Binary (or "dyadic") expansion of a real number

Geometric interpretation

Terminating and nonterminating expansions

Dyadic intervals

Left end

Right end

Properties of dyadic intervals

Dyadic functions

Dyadic expansion and probabilities

TUTORIAL

__________________________________________________

 

 

 

Tutorial 2

 

In this Tutorial, we give a new demonstration of the Weak Law of Large Numbers in the special case of the fair Heads and Tails game.

We'll take advantage of the representation of a probability as the length of an interval for first reformulating the WLLN in terms of dyadic functions. We'll then identify an upper bound on the proportion of opening sequences of a dyadic expansions that exhibit a certain level of imbalance, the same way Chebyshev inequality placed an upper bound on the probability for the sample mean to depart from the distribution mean by a given amount. We'll then make the length of opening sequences tend to infinity and reach the conclusion that the proportion of opening sequences exhibiting a given level of imbalance then tends to 0. Reversing to the probabilitic interpretation of dyadic expansions, we'll state that we have demonstrated again the WLLN.

Why demonstrate again the WLLN ? The reason is that Chebyshev's upper bound is fairly weak. It is adequate for demonstrating the WLLN, but is not very stringent. The new upper bound we'll identify, although just as weak as Chebyshev's, can be made stronger pretty much at will, and we'll need a "reinforced" version in the next Tutorial when we demonstrate Borel's theorem.

 

 

 

 

THE WEAK LAW OF LARGE NUMBERS REVISITED

The Weak Law of Large Numbers

WLLN and coin tossing

Chebyshev inequality is loose

Another proof of the WLLN

The "arithmetic" formulation of the WLLN

Rademacher functions

Definition

WLLN and Rademacher functions

Properties of Rademacher functions

A property of step functions

Proof of the WLLN

TUTORIAL

_____________________________________________

 

 

 

Tutorial 3

 

We finally come to demonstrating Borel's theorem : the set of non normal numbers is of measure 0.

The key will be to strengthen the bound placed by Chebyshev inequality on the probability to be found in the WLLN. Once this is done, we'll be able to identify an infinite sequence of sets whose intersection is included in the set of normal numbers. This will, by complementation (de Morgan rules), lead to the identification of an infinite sequence of sets whose union covers the set of non normal numbers. As we'll be able to show that the the sum of the measures of thee sets in the union can be made arbitrarily small, we'll deduce that the set of non normal numbers, although uncountable, can be confined inside an arbitrarily small "box", and is therefore of measure 0. This is the Strong Law of Large Numbers in the special case of Bernoulli rvs with p = .5.

The proof is a bit difficult, but understanding it is well worth the effort as it will reveal a fundamental property (measure 0) of an object (the set of non normal numbers) whose structure is hopelessly beyond the reach of human intuition (well, of that of the author anyway).

 

 

 

 

BOREL'S NORMAL NUMBERS THEOREM

Normal and non normal numbers

Countable and uncountable sets

Normal numbers

Non normal numbers are uncountable

"Measure 0" sets of points

Borel's normal numbers theorem

Borel's normal numbers theorem

Tighter than Chebyshev

The proof, at last

Fixed ε

Sequence of εn

A sufficient condition for normality

The covering

Probabilitic interpretation of Borel's theorem

TUTORIAL

 

 _______________________________________________________

 

Related readings :

Weak Law of Large Numbers

Almost sure convergence

Download this Glossary