Interactive animation

Bootstrap

The so-called "bootstrap" method generates, for any quantity defined on an (unknown) probability distribution :

 

It does so from a single sample {X1, X2, ..., Xn}. The distribution may be continuous or discrete or a combination thereof.

 

In other words, the bootstrap is a method of estimation in its own right. In fact, it is most often used to estimate classical quantities like :

 

More advanced applications of the bootstrap involve estimating various measures of "error" (bias of an estimator, prediction error of a predictive model, confidence intervals, tests).

 

The bootstrap has three special characteristics, though :

  1. It is completely nonparametric (it makes no assumption whatsoever about the underlying distribution).
  2. It may be used as a substitute for parametric estimation methods when they lead to intractable calculations.
  3. It is very computer intensive (see animation below).

 ______________________________

 

The bootstrap method relies on two basic concepts :

  1. The selection of an appropriate statistic Z(X1, X2, ..., Xn) on the sample at hand. For example, if one wants to conduct a bootstrap estimation of the median of a distribution, one will retain the sample median as the selected statistic. More generally, Z will be a classical estimator of the quantity that is to be estimated.
     
  2. The "bootstrap sample". The sample {X1, X2, ..., Xn} is considered as an "urn" from which n elements are drawn with replacement to make up one bootstrap sample. A bootstrap sample is then just the original sample, with each observation being assigned a multiplicity from 0 to n (because the draws are made with replacement), the sum of the multiplicities being of course n.
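
The notion of multiplicity can be made concrete with a short Python sketch. The observations are made up for illustration, and the helper names `draw_bootstrap_indices` and `multiplicities` are hypothetical, not part of any library. Indices are drawn rather than values so that tied observations remain distinct.

```python
import random
from collections import Counter

def draw_bootstrap_indices(n, rng):
    """Draw n indices from 0..n-1 with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]

def multiplicities(indices, n):
    """Number of times each original observation appears in the bootstrap sample."""
    counts = Counter(indices)
    return [counts.get(i, 0) for i in range(n)]

sample = [2.1, 3.4, 3.9, 5.0, 7.2]      # made-up observations
rng = random.Random(0)                  # fixed seed, only for reproducibility

idx = draw_bootstrap_indices(len(sample), rng)
mult = multiplicities(idx, len(sample))
resample = [sample[i] for i in idx]     # the bootstrap sample itself

print("multiplicities:", mult)
print("sum of multiplicities:", sum(mult))   # always equal to n
```

Whatever the seed, each multiplicity lies between 0 and n and the multiplicities add up to n.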

 

Bootstrapping a sample then goes along the following lines :

  1. Many bootstrap samples are created from the unique original sample.
  2. The statistic Z(X1, X2, ..., Xn)  is calculated for each bootstrap sample.
  3. Two pieces of information are extracted from the histogram of the values of Z :
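
The three steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the sample is made up, `bootstrap_values` is a hypothetical helper, and the summaries computed at the end are this sketch's choices. The mean of the values of Z is the bootstrap estimate used by the animation on this page; reporting the standard deviation as a measure of spread is an assumption of the sketch.

```python
import random
import statistics

def bootstrap_values(sample, statistic, n_boot=1000, seed=None):
    """Compute `statistic` on n_boot bootstrap samples drawn from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    values = []
    for _ in range(n_boot):
        # Step 1: one bootstrap sample = n draws with replacement.
        resample = rng.choices(sample, k=n)
        # Step 2: the statistic Z evaluated on that bootstrap sample.
        values.append(statistic(resample))
    return values

sample = [2.1, 3.4, 3.9, 5.0, 7.2, 8.8, 9.5]   # made-up observations
z_values = bootstrap_values(sample, statistics.median, n_boot=1000, seed=1)

# Step 3: summaries of the histogram of Z.
boot_estimate = statistics.mean(z_values)   # bootstrap estimate (mean of the histogram)
boot_spread = statistics.stdev(z_values)    # spread of the histogram (sketch's assumption)

print("classical estimate:", statistics.median(sample))
print("bootstrap estimate:", boot_estimate)
print("spread of Z values:", boot_spread)
```

Any other statistic that takes a list of numbers, such as `statistics.mean`, can be passed in place of `statistics.median`.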

____________________________


You'll find here an informal justification of the bootstrap technique.

____________________________

 

 

 

The following animation illustrates the concept of bootstrap.

 

 

 

 


 

 


1) Click anywhere in the upper frame (including the green area) to carve out a probability density.

 

2) Select the quantity you want to estimate. In this small demonstration the choice is limited to the mean and the median of the distribution.

The long vertical red line hanging from the top edge of the frame marks the true value of the selected quantity.

 

3) Choose the number of observations that will constitute the sample you're going to create.

 

4) Click on "New" to create a sample from your probability density. Click again until you get a sample with characteristics that you deem "appropriate".

The short vertical red line at the heart of the sample marks the value of the statistic Z (mean or median) for that sample. It is the "classical" estimate of the quantity being studied.

 

5) Click on "Go" to start building up the bootstrap histogram in the lower frame. The number of bootstrap samples is limited to 1000, a larger number than what is usually required in practical situations.

The vertical red line marks the mean of the histogram, that is, the current bootstrap estimate of the quantity being estimated.

Note that spread-out and/or small samples are conducive to irregular histograms, which may even be made up of several clusters separated by gaps. This is particularly marked for the "Median" option. Can you explain this phenomenon ?
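
One way to explore this question numerically is to count how many distinct values the statistic Z actually takes over the bootstrap samples. The sketch below uses a made-up sample of 5 observations with a wide gap in the middle. For an odd sample size, the median of a bootstrap sample is always one of the original observations, so the bootstrap histogram of the median can only have bars at the observed values, whereas the mean can take many more values.

```python
import random
import statistics

rng = random.Random(2)                  # fixed seed, only for reproducibility
sample = [0.5, 0.9, 1.0, 6.0, 6.3]      # small, spread-out, made-up sample

medians, means = [], []
for _ in range(1000):
    resample = rng.choices(sample, k=len(sample))   # draws with replacement
    medians.append(statistics.median(resample))
    means.append(statistics.mean(resample))

distinct_medians = sorted(set(medians))
distinct_means = sorted(set(means))

print("distinct bootstrap medians:", distinct_medians)
print("number of distinct bootstrap means:", len(distinct_means))
```

The gap between 1.0 and 6.0 in the sample reappears as a gap in the histogram of the bootstrap medians.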

 

6) Click on "Pause". The multiplicities of the observations in the current bootstrap sample are then displayed. Click on "Next" to get the next bootstrap sample, and so on.

The vertical blue line at the heart of the sample marks the value of the statistic Z for the current bootstrap sample.

Click on "Resume" to get back to the automatic mode.

___________________________________

 

Observe that, in the "Mean" mode, the bootstrap estimate converges towards the "ordinary" estimate (within the pixel quantization error of the display). This is not so for the median : the empirical median and the bootstrap estimate of the median are different estimates of the median of the distribution.
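
For a very small sample, the value that the running bootstrap estimate approaches as the number of bootstrap samples grows can be computed exactly, by enumerating all n^n equally likely ordered bootstrap samples. The sketch below does this for a made-up sample of 5 observations (3125 bootstrap samples); the variable names are illustrative only.

```python
import itertools
import statistics

sample = [1.0, 2.0, 4.0, 8.0, 16.0]     # made-up observations
n = len(sample)

# Every ordered draw of n elements with replacement is equally likely.
all_resamples = list(itertools.product(sample, repeat=n))   # n**n = 3125 tuples

# Exact expectation of Z over the bootstrap samples, for Z = mean and Z = median.
exact_boot_mean = statistics.mean(statistics.mean(r) for r in all_resamples)
exact_boot_median = statistics.mean(statistics.median(r) for r in all_resamples)

print("sample mean:  ", statistics.mean(sample), " bootstrap limit:", exact_boot_mean)
print("sample median:", statistics.median(sample), " bootstrap limit:", exact_boot_median)
```

In the "Mean" case the two numbers agree up to floating-point rounding, while in the "Median" case the bootstrap limit differs from the sample median. The enumeration is only practical for tiny n, since the number of bootstrap samples grows as n^n.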

 

Notice also that the bootstrap estimate is better for samples that are fairly representative of the probability density. But this is true, of course, for any estimator.

 

Note that the probability density plays no role in illustrating the concept of bootstrap. It is there just for the purpose of reminding you that the quantity being estimated is defined on this distribution.

 ________________________________________

Related readings :

Estimation

Distribution function
