Interactive animation
The so-called "bootstrap" method generates, for any quantity defined on an (unknown) probability distribution:
It does so from a single sample {X1, X2, ..., Xn}. The distribution may be continuous or discrete or a combination thereof.
In other words, the bootstrap is a method of estimation in its own right. In fact, it is most often used to estimate classical quantities such as:
More advanced applications of the bootstrap involve estimating various measures of "error" (bias of an estimator, prediction error of a predictive model, confidence intervals, tests).
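As a rough illustration of two of these "error" applications, the sketch below estimates the bias of an estimator and a percentile confidence interval from a single sample. It assumes Python 3.8+ and uses only the standard library; the function name `bootstrap_bias_and_ci`, the example data, the 1000 resamples and the choice of the simple percentile interval are illustrative assumptions, not the method used by this page.

```python
import random
import statistics

def bootstrap_bias_and_ci(sample, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Hypothetical helper: bootstrap estimate of the bias of `statistic`
    and a (1 - alpha) percentile confidence interval."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = statistic(sample)  # ordinary estimate computed on the original sample
    # Each replicate: resample n observations with replacement, recompute the statistic
    replicates = sorted(statistic(rng.choices(sample, k=n)) for _ in range(n_boot))
    # Bias estimate: mean of the bootstrap replicates minus the ordinary estimate
    bias = statistics.fmean(replicates) - theta_hat
    # Percentile interval: symmetric order statistics of the sorted replicates
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = n_boot - 1 - lo_idx
    return bias, (replicates[lo_idx], replicates[hi_idx])

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]  # made-up example sample
bias, (lo, hi) = bootstrap_bias_and_ci(data, statistics.fmean)
print(f"bias ~ {bias:.3f}, 95% percentile interval ~ [{lo:.2f}, {hi:.2f}]")
```

Passing `statistics.median` instead of `statistics.fmean` gives the same two quantities for the median. The percentile interval is only one of several bootstrap interval constructions.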
The bootstrap has three special characteristics, though:
______________________________
The bootstrap method relies on two basic concepts:
Bootstrapping a sample then goes along the following lines:
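The resampling procedure is commonly sketched as the loop below: draw n observations with replacement from the original sample, compute the statistic Z on this bootstrap sample, repeat many times, and summarize the resulting histogram by its mean. This is a minimal sketch in standard-library Python; `bootstrap_replicates` and the example data are hypothetical, and the default of 1000 resamples simply mirrors the cap used in the animation.

```python
import random
import statistics

def bootstrap_replicates(sample, statistic, n_boot=1000, seed=None):
    """Hypothetical helper: return n_boot bootstrap replicates of `statistic`."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_boot):
        # A bootstrap sample has the same size n as the original sample and is
        # drawn with replacement: some observations repeat, others are left out.
        resample = [rng.choice(sample) for _ in range(n)]
        replicates.append(statistic(resample))
    return replicates

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]  # made-up example sample
reps = bootstrap_replicates(data, statistics.median, seed=1)
bootstrap_estimate = statistics.fmean(reps)  # mean of the bootstrap histogram
print(round(bootstrap_estimate, 3))
```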
____________________________
An informal justification of the bootstrap technique is available on a separate page.
____________________________
The following animation illustrates the concept of bootstrap.
1) Click anywhere in the upper frame (including the green area) to carve out a probability density.
2) Select the quantity you want to estimate. In this small demonstration, the choice is limited to the mean and the median.
The long, vertical red line hanging from the top edge of the frame marks the true value of the selected quantity.
3) Choose the number of observations that will constitute the sample you are going to create.
4) Click on "New" to create a sample from your probability density. Click again until you get a sample with characteristics that you deem "appropriate".
The short vertical red line within the sample marks the value of the statistic Z (mean or median) for that sample. It is the "classical" estimate of the quantity being studied.
5) Click on "Go" to start building the bootstrap histogram in the lower frame. The number of bootstrap samples is limited to 1000, a larger number than is usually required in practical situations.
The vertical red line marks the mean of the histogram, that is, the current bootstrap estimate of the quantity being estimated.
Note that spread-out and/or small samples tend to produce irregular histograms, which may even be made up of several clusters separated by gaps. This is particularly marked for the "Median" option. Can you explain this phenomenon?
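One way to investigate this question outside the animation is to count how many distinct values the statistic actually takes over the bootstrap replicates, for the median and for the mean, on a small and spread-out sample. The sketch below does this with the Python standard library; the helper `distinct_bootstrap_values` and the five-point sample are hypothetical.

```python
import random
import statistics
from collections import Counter

def distinct_bootstrap_values(sample, statistic, n_boot=1000, seed=0):
    """Hypothetical helper: count how often each distinct value of `statistic`
    occurs among n_boot bootstrap replicates."""
    rng = random.Random(seed)
    n = len(sample)
    return Counter(statistic(rng.choices(sample, k=n)) for _ in range(n_boot))

small_sample = [1.0, 1.5, 2.0, 7.0, 9.5]  # small (n = 5) and spread out
median_counts = distinct_bootstrap_values(small_sample, statistics.median)
mean_counts = distinct_bootstrap_values(small_sample, statistics.fmean)

print(len(median_counts), "distinct bootstrap medians:", sorted(median_counts))
print(len(mean_counts), "distinct bootstrap means")
```

Comparing the two counts, and the values listed for the median with the original observations, points towards the answer.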
6) Click on "Pause". The multiplicities of the observations in the current bootstrap sample are then displayed. Click on "Next" to get the next bootstrap sample, and so on.
The vertical blue line within the sample marks the value of the statistic Z for the current bootstrap sample.
Click on "Resume" to get back to the automatic mode.
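The multiplicities shown in "Pause" mode are simply the number of times each original observation was drawn into the current bootstrap sample. A minimal sketch in standard-library Python, with a hypothetical helper `bootstrap_multiplicities` and made-up data:

```python
import random
from collections import Counter

def bootstrap_multiplicities(n, rng):
    """Hypothetical helper: draw one bootstrap sample of indices 0..n-1 and
    return how many times each original observation was selected."""
    drawn = rng.choices(range(n), k=n)    # n draws with replacement
    counts = Counter(drawn)
    return [counts[i] for i in range(n)]  # 0 for observations left out

rng = random.Random(42)
sample = [3.1, 4.7, 2.2, 5.9, 4.0, 3.6]   # made-up example sample
mult = bootstrap_multiplicities(len(sample), rng)
# Rebuild the bootstrap sample from the multiplicities
bootstrap_sample = [x for x, m in zip(sample, mult) for _ in range(m)]
print(mult, bootstrap_sample)
```

The multiplicities always add up to n. Since each observation is missed by every draw with probability (1 - 1/n)^n, which tends to 1/e ≈ 0.37, roughly a third of the observations get multiplicity 0 in a typical bootstrap sample when n is large.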
___________________________________
Observe that, in the "Mean" mode, the bootstrap estimate converges towards the "ordinary" estimate (within the pixel quantization error of the display). This is not the case for the median: the empirical median and the bootstrap median are different estimates of the median of the distribution.
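The same contrast can be reproduced numerically. The sketch below (standard-library Python; the helper `bootstrap_estimate` and the seven-point sample are hypothetical) averages many bootstrap replicates of the mean and of the median. It uses 20000 resamples rather than the animation's 1000 only to reduce Monte Carlo noise.

```python
import random
import statistics

def bootstrap_estimate(sample, statistic, n_boot=20000, seed=0):
    """Hypothetical helper: mean of n_boot bootstrap replicates of `statistic`."""
    rng = random.Random(seed)
    n = len(sample)
    return statistics.fmean(statistic(rng.choices(sample, k=n)) for _ in range(n_boot))

# Made-up small sample (n = 7) with a gap in the middle
sample = [1.0, 2.0, 3.0, 4.0, 8.0, 9.0, 10.0]

boot_mean = bootstrap_estimate(sample, statistics.fmean)
boot_median = bootstrap_estimate(sample, statistics.median)

print("sample mean:", round(statistics.fmean(sample), 3), " bootstrap:", round(boot_mean, 3))
print("sample median:", statistics.median(sample), " bootstrap:", round(boot_median, 3))
```

Because each bootstrap draw has expectation equal to the sample mean, the bootstrap mean settles on the sample mean (37/7 ≈ 5.29 here). The bootstrap median, on the other hand, averages over several order statistics of the sample and, for this sample, comes out noticeably above the sample median of 4.0.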
Notice also that the bootstrap estimate is better for samples that are fairly representative of the probability density. But this is true, of course, for any estimator.
Note that the probability density plays no role in illustrating the concept of the bootstrap. It is there only to remind you that the quantity being estimated is defined on this distribution.
________________________________________
Related readings: