AN INFORMAL JUSTIFICATION OF THE BOOTSTRAP PRINCIPLE
In this section, the variable is numerical with a probability density. The reasoning would follow the same lines for other types of probability distributions.
The concept of the bootstrap relies on a chain of simple ideas:
1) The probability density p(x) that generated the sample is unknown. Yet it can reasonably be expected that the sample is representative of this density, that is, that there are "many" observations in regions where the density is high, and "few" observations where the density is low. This expectation follows directly from the definition of a probability density: the probability of an observation falling in a small interval around x is approximately p(x) times the width of the interval.
The sample may then be perceived as a degraded, "dotted" version of the probability density that nevertheless preserves its main features.
2) The sample is discrete, but the density is continuous, so the two cannot be compared directly. The comparison becomes possible, however, if one considers cumulative distribution functions: it can reasonably be expected that the empirical distribution function F*(x) is a fair approximation of the true distribution function F(x), in a sense that can be quantified (the largest vertical gap between F* and F tends to 0 as the sample size n grows, by the Glivenko-Cantelli theorem).
The empirical distribution function is very easy to determine: it is a staircase with one step of height 1/n located at each observation (tied observations stack their steps), rising from 0 to the left of the smallest observation to 1 at the largest one.
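A minimal Python sketch of F* (the function name `empirical_cdf` is our own, and the four-value sample is made up for illustration): F*(x) is the fraction of observations less than or equal to x, which a binary search over the sorted sample computes directly.

```python
import bisect

def empirical_cdf(sample):
    """Return F*, the empirical distribution function of `sample`.

    F*(x) is the fraction of observations <= x: a staircase with a
    step of height 1/n at each observation (tied values stack up).
    """
    data = sorted(sample)
    n = len(data)

    def F_star(x):
        # bisect_right counts how many sorted observations are <= x
        return bisect.bisect_right(data, x) / n

    return F_star

F_star = empirical_cdf([2.0, 0.5, 3.1, 2.0])
print(F_star(0.0))   # 0.0  (left of every observation)
print(F_star(0.5))   # 0.25 (one step of height 1/4)
print(F_star(2.0))   # 0.75 (two tied observations: a step of 2/4)
print(F_star(10.0))  # 1.0
```

The two tied values at 2.0 produce a single jump of height 2/n, which is how the staircase behaves when observations coincide.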
3) The bootstrap then says: "The distribution function F is unknown (just as the probability density is). But we can easily build what we hope to be a good approximation of it, namely F*. Therefore, anything we could do with F, we can approximate by doing it with F* instead."
4) Let's forget about the bootstrap for a moment. Suppose we want to estimate a quantity θ defined on a known probability density p(x) (for example, one of its moments). Suppose further that we have identified an estimator Θ for θ. Unfortunately, calculations on Θ turn out to be intractable, and we cannot determine the theoretical distribution of our estimator.
Is there another way to estimate this distribution?
Yes, by resorting to simulation. The distribution of Θ is the limit of the histogram of the values Θ(X1, X2, ..., Xn) computed on infinitely many samples of size n drawn from p(x). Now replace "infinitely many" by "a large number of", and we get a practical method for estimating the distribution of Θ:
1) Draw a large number of samples of size n from p(x).
2) For each new sample {X1, X2, ..., Xn}, compute the estimated value of θ, namely θ* = Θ(X1, X2, ..., Xn).
3) Draw the histogram of the θ* values. It is an estimate of the distribution of Θ.
4) From this histogram, extract:
* The mean, which will be our simulated estimate of the expectation of Θ.
* The variance, which will be our estimate of the variance of Θ.
The procedure we just described belongs to a family of probabilistic simulation techniques known as "Monte Carlo methods". Many of our interactive animations are based on Monte Carlo simulations (see for example Central Limit Theorem).
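As a sketch of these four steps, the Python code below simulates the distribution of an estimator when p(x) is known. The choices are arbitrary and purely illustrative: p(x) is taken to be the standard normal density, Θ is the sample median, and n = 25. The helper name `monte_carlo_distribution` is hypothetical. Step 3 (drawing the histogram) is left implicit: the list `thetas` holds the values that would go into it.

```python
import random
import statistics

def monte_carlo_distribution(draw, estimator, n, n_samples, seed=None):
    """Simulate the distribution of an estimator Θ when p(x) is known.

    draw      -- function(rng) returning one observation from p(x)
    estimator -- function(list) computing Θ(X1, ..., Xn)
    Returns the simulated θ* values (the raw material of the histogram).
    """
    rng = random.Random(seed)
    thetas = []
    for _ in range(n_samples):
        sample = [draw(rng) for _ in range(n)]  # step 1: a new sample from p(x)
        thetas.append(estimator(sample))         # step 2: θ* = Θ(X1, ..., Xn)
    return thetas

# Assumed example: standard normal p(x), Θ = sample median, n = 25.
thetas = monte_carlo_distribution(
    draw=lambda rng: rng.gauss(0.0, 1.0),
    estimator=statistics.median,
    n=25,
    n_samples=5000,
    seed=1,
)

# Step 4: summarize the simulated values.
print("estimated E[Θ]  :", statistics.fmean(thetas))
print("estimated Var[Θ]:", statistics.variance(thetas))
```

Increasing `n_samples` brings these two summaries closer to the exact expectation and variance of Θ, at the cost of more computation.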
5) Several methods can be used for drawing a sample from a given probability density p(x). One of them is as follows:
* The distribution function F(x) is calculated (or tabulated). Remember that it grows monotonically from 0 to 1.
* A number "y" is drawn at random in the [0,1] interval.
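* The number "x" such that F(x) = y is determined (it is unique when F is continuous and strictly increasing; in general, one takes the smallest x such that F(x) ≥ y). This is the first observation of our sample.
* The procedure is iterated n-1 more times to finally obtain a sample of n observations.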
This is a direct application of the Probability Integral Transformation: if Y is uniform on [0,1], then the x solving F(x) = Y is distributed according to F.
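The inversion method can be sketched in Python as follows. Since F is assumed to be only computable (or tabulated), the equation F(x) = y is solved numerically by bisection. The exponential distribution with rate λ = 2, whose distribution function is F(x) = 1 - exp(-λx), is an arbitrary test case, and the function names are hypothetical.

```python
import math
import random

def inverse_by_bisection(F, y, lo, hi, tol=1e-10):
    """Find x in [lo, hi] with F(x) = y, assuming F is continuous and increasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < y:
            lo = mid   # solution lies to the right of mid
        else:
            hi = mid   # solution lies at or to the left of mid
    return (lo + hi) / 2

def sample_by_inversion(F, n, lo, hi, rng):
    """Draw n observations from the distribution whose CDF is F."""
    # Each observation: draw y uniformly in [0, 1), then solve F(x) = y.
    return [inverse_by_bisection(F, rng.random(), lo, hi) for _ in range(n)]

lam = 2.0
F_exp = lambda x: 1.0 - math.exp(-lam * x)   # exponential CDF, mean 1/lam = 0.5
rng = random.Random(0)
sample = sample_by_inversion(F_exp, 2000, 0.0, 50.0, rng)
print(sum(sample) / len(sample))   # should be close to 0.5
```

For the exponential distribution the inversion also has a closed form, x = -ln(1 - y)/λ; bisection is shown because it only requires being able to evaluate F, which matches the "calculated (or tabulated)" setting.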

6) We can now get back to the bootstrap. It estimates the distribution of Θ by a Monte Carlo simulation. For that purpose, it uses the method described above with F* as the distribution function, since F* is our best estimate of the true but unknown distribution function F.
* Clearly, any observation obtained by this method will belong to the original sample {X1, X2, ..., Xn}.
* What is the probability that the next draw yields observation #i? Because all the steps of F* have the same height 1/n, this probability is 1/n for every i.
Therefore, in this particular case, we can choose between two equivalent methods:
* Use the "distribution function" technique,
* Or simply assign the probability 1/n to every observation and pick an observation at random n times, independently; in other words, draw n observations with replacement from the original sample.
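To make the equivalence concrete, the Python sketch below produces bootstrap samples both ways (the function names are hypothetical and the five-value sample is made up for illustration). Because F*(x) = y generally has no exact solution, the inversion takes the smallest observation x with F*(x) ≥ y, which is the k-th smallest observation with k = ceil(n·y). Tallying many draws shows each observation coming up with a frequency close to 1/n under either method.

```python
import math
import random
from collections import Counter

def bootstrap_sample_via_F_star(sample, rng):
    """n draws obtained by inverting the empirical distribution function F*."""
    data = sorted(sample)
    n = len(data)
    out = []
    for _ in range(n):
        y = rng.random()                       # uniform in [0, 1)
        # Smallest k with k/n >= y; y == 0.0 is mapped to k = 1.
        k = min(n, max(1, math.ceil(n * y)))
        out.append(data[k - 1])                # k-th smallest observation
    return out

def bootstrap_sample_with_replacement(sample, rng):
    """n draws with replacement, each observation having probability 1/n."""
    return rng.choices(sample, k=len(sample))

rng = random.Random(42)
sample = [3.2, 1.7, 4.4, 2.9, 5.0]
counts_inv, counts_choice = Counter(), Counter()
for _ in range(20000):
    counts_inv.update(bootstrap_sample_via_F_star(sample, rng))
    counts_choice.update(bootstrap_sample_with_replacement(sample, rng))

total = 20000 * len(sample)
for x in sorted(sample):
    # Both relative frequencies should be close to 1/n = 0.2.
    print(x, counts_inv[x] / total, counts_choice[x] / total)
```

In practice the second function is the one used, since it skips building and inverting F* altogether.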
________________________________________________
So the basic building blocks of the bootstrap are now justified:
* The "bootstrap sample",
* Computing a classical statistic Θ on each bootstrap sample in order to build the bootstrap histogram,
* Using the mean and variance of this histogram as Monte-Carlo estimates of the expectation and variance of Θ.
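Putting these building blocks together, the Python sketch below estimates the expectation and variance of a statistic Θ by the bootstrap. The data are simulated (40 draws from a normal density) purely for illustration, the sample median is an arbitrary choice of Θ, and the `bootstrap` function name and the 2000 bootstrap samples are assumptions, not prescriptions.

```python
import random
import statistics

def bootstrap(sample, statistic, n_boot=2000, seed=None):
    """Bootstrap estimates of the expectation and variance of a statistic.

    sample    -- the original observations X1, ..., Xn
    statistic -- function(list) computing Θ on one bootstrap sample
    n_boot    -- number of bootstrap samples (the "large number")
    """
    rng = random.Random(seed)
    n = len(sample)
    # Each bootstrap sample: n draws with replacement from the original sample.
    thetas = [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]
    # Mean and variance of the bootstrap histogram.
    return statistics.fmean(thetas), statistics.variance(thetas)

# Simulated data, for illustration only: 40 draws from a normal density.
gen = random.Random(123)
data = [gen.gauss(10.0, 2.0) for _ in range(40)]

med_mean, med_var = bootstrap(data, statistics.median, n_boot=2000, seed=7)
print("bootstrap E[median]  :", med_mean)
print("bootstrap Var[median]:", med_var)

# Sanity check with Θ = sample mean, whose bootstrap variance is known exactly.
avg_mean, avg_var = bootstrap(data, statistics.fmean, n_boot=2000, seed=7)
print("simulated:", avg_var, " exact:", statistics.pvariance(data) / len(data))
```

The check with Θ = sample mean works because a bootstrap sample is n independent draws from F*, whose variance is the plug-in variance of the data (divisor n); the variance of their average is therefore that quantity divided by n.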
____________________
So we are now convinced that the bootstrap can produce a "plausible" estimate of any quantity defined on the probability density that generated the sample. The bootstrap is therefore an estimation technique. But at this point, we know nothing of the general properties of bootstrap estimators, for instance their bias, if any. This question, like many others, has no single answer: it all depends on the quantity being estimated and on p(x).
At any rate, these questions are difficult and sometimes controversial. We won't address them in this Glossary.
Note that the same kinds of questions arise with any estimation technique, such as the Maximum Likelihood method.
Yet, because of its simplicity and the availability of cheap computing power, the bootstrap is widely implemented in statistical software. There are many accounts of thorough investigations of the behavior of the bootstrap under various conditions: the newcomer is strongly advised to refer to these studies instead of hastily accepting the conclusions delivered automatically by software.