INTERACTIVE ANIMATION: HISTOGRAM
This animation illustrates the histogram, and more particularly the bias-variance dilemma.
In the frame are:
1) A horizontal green line that represents a uniform distribution.
2) A sample drawn from this distribution. The sample size is shown in the "Nb Points" display and may be adjusted with the "Nb Points" buttons. A new sample is drawn for each new sample size.
3) The histogram (yellow) of the sample. The number of bins is shown in the "Nb Bins" display. You may change the number of bins with the "Nb Bins" buttons.
You may change the shape of the density function: click several times inside the frame, either above or below the green line. With every click, a new sample is drawn and its histogram is built.
Build a density function, select a sample size and a number of bins, and click on "Go". A series of samples is drawn from the same density, and for each sample the corresponding histogram is displayed.
In the "Pause" mode, click on "Step" to draw new samples.
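The repeated drawing triggered by "Go" can be imitated outside the animation. The following is a minimal sketch, not the animation's own code: it assumes the density is piecewise constant on [0, 1), and the names sample_piecewise, histogram_density and heights are hypothetical, chosen for illustration only.

```python
import random

def sample_piecewise(heights, n, rng):
    """Draw n points on [0, 1) from a piecewise-constant density.

    heights[i] is the (unnormalized) level of the density on the i-th of
    len(heights) equal-width pieces. This stands in for the density that
    the user shapes by clicking; it is an assumption of this sketch.
    """
    k = len(heights)
    # Pick a piece with probability proportional to its height,
    # then pick a point uniformly inside that piece.
    pieces = rng.choices(range(k), weights=heights, k=n)
    return [(i + rng.random()) / k for i in pieces]

def histogram_density(points, n_bins):
    """Histogram on [0, 1) scaled so that the bar areas sum to 1."""
    counts = [0] * n_bins
    for x in points:
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    n = len(points)
    return [c * n_bins / n for c in counts]

rng = random.Random(0)
heights = [1.0, 2.0, 0.5, 1.5]   # hypothetical density shape

# Each pass plays the role of one "Step": new sample, new histogram.
for step in range(3):
    sample = sample_piecewise(heights, n=200, rng=rng)
    bars = histogram_density(sample, n_bins=8)
    print(step, [round(b, 2) for b in bars])
```

The bar heights differ from one pass to the next even though the density is unchanged, which is the fluctuation the animation makes visible.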
The purpose of this animation is to illustrate the bias-variance dilemma.
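1) Keep the sample size fixed, and change the number of bins (this can be done while the animation is running). With a few wide bins, the histogram changes little from one sample to the next, but it is too coarse to follow the shape of the density. With many narrow bins, it can follow the shape, but it fluctuates strongly from one sample to the next.
So there has to be an "optimal" bin width that reaches a reasonable trade-off between bias (wide bins smooth out the details of the density) and variance (narrow bins make the histogram unstable).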
Unfortunately, there is no clear-cut definition of what "optimal" means. Several definitions may be imagined, for example based on the Kullback-Leibler divergence, but none is absolutely compelling. At any rate, practitioners pay little heed to such criteria, because how well they perform depends on the actual shape of the density, which is of course unknown in practice.
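When the density is known, as in a simulation, the two sides of the trade-off can be measured directly. The sketch below is an illustration under stated assumptions, not the animation's code: it takes the hypothetical density f(x) = 2x on [0, 1], draws many samples of fixed size, and estimates the integrated squared bias and the integrated variance of the histogram for several bin counts. Their sum, the mean integrated squared error, is only one of the possible definitions of "optimal".

```python
import random

def density(x):
    """Hypothetical true density on [0, 1], chosen for this sketch."""
    return 2.0 * x

def draw_sample(n, rng):
    # Inverse-CDF sampling: the CDF of f(x) = 2x is x**2.
    return [rng.random() ** 0.5 for _ in range(n)]

def histogram_density(points, n_bins):
    """Histogram on [0, 1) scaled so that the bar areas sum to 1."""
    counts = [0] * n_bins
    for x in points:
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    n = len(points)
    return [c * n_bins / n for c in counts]

def bias_variance(n_points, n_bins, n_repeats, rng, grid=640):
    """Monte Carlo estimate of (integrated squared bias, integrated variance)."""
    hists = [histogram_density(draw_sample(n_points, rng), n_bins)
             for _ in range(n_repeats)]
    # Mean and variance of each bar height across the repeated samples.
    mean = [sum(h[j] for h in hists) / n_repeats for j in range(n_bins)]
    var = [sum((h[j] - mean[j]) ** 2 for h in hists) / n_repeats
           for j in range(n_bins)]
    # Midpoint rule for the integral of (mean histogram - density)**2.
    bias2 = 0.0
    for g in range(grid):
        x = (g + 0.5) / grid
        j = min(int(x * n_bins), n_bins - 1)
        bias2 += (mean[j] - density(x)) ** 2 / grid
    # Each bar has width 1 / n_bins, so the integral is the average variance.
    variance = sum(var) / n_bins
    return bias2, variance

rng = random.Random(1)
results = {}
for n_bins in (2, 8, 32):
    results[n_bins] = bias_variance(200, n_bins, 300, rng)
    bias2, variance = results[n_bins]
    print(f"{n_bins:3d} bins: bias^2 = {bias2:.4f}, variance = {variance:.4f}")
```

With these settings the squared bias shrinks as the number of bins grows while the variance rises, so neither very few nor very many bins minimizes their sum.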
2) For a given bin width, change the number of points (this can be done while the animation is running). Note that the histogram becomes more stable as the number of points increases. For a given level of stability (and therefore of credibility), it then becomes possible to reduce the bin width, and so to improve the "sharpness" of the image of the density. The "optimal" bin width gets smaller as samples become larger.
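The shrinking of the "optimal" bin width can be checked on a case where everything is computable. The sketch below again assumes the hypothetical density f(x) = 2x on [0, 1] and takes the mean integrated squared error as the criterion, which is one choice among several. For this particular density and m equal-width bins, the integrated squared bias works out to 1/(3m^2), and the integrated variance to (m/n) times the sum of p(1 - p) over the bins, where p is the probability of a bin. These closed forms were derived for this example and are not part of the animation.

```python
def mise(n_points, n_bins):
    """Mean integrated squared error of the histogram for f(x) = 2x on [0, 1].

    Closed form valid only for this hypothetical density and for
    n_bins equal-width bins.
    """
    # Integrated squared bias: the mean bar height equals the density
    # at the bin center, so each bin contributes width**3 / 3.
    bias2 = 1.0 / (3.0 * n_bins ** 2)
    # Probability of bin j: ((j + 1)**2 - j**2) / n_bins**2.
    probs = [(2 * j + 1) / n_bins ** 2 for j in range(n_bins)]
    # Bin counts are binomial, so each bar height has variance
    # n_bins**2 * p * (1 - p) / n_points; integrate over bins of width 1 / n_bins.
    variance = n_bins / n_points * sum(p * (1.0 - p) for p in probs)
    return bias2 + variance

def best_bins(n_points, max_bins=200):
    """Number of bins (1..max_bins) with the smallest MISE."""
    return min(range(1, max_bins + 1), key=lambda m: mise(n_points, m))

for n_points in (50, 500, 5000):
    m = best_bins(n_points)
    print(f"n = {n_points:5d}: best number of bins = {m}, bin width = {1 / m:.3f}")
```

Under these assumptions the best number of bins grows with the sample size, so the corresponding bin width decreases. A different density or a different criterion would give different numbers.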