Histogram

Given a sample from an unknown probability distribution p(x), a histogram is a simple model intended to give a reasonably faithful graphical representation of that distribution.

It is built as follows:

 

So, a histogram is a series of bins placed side by side along the x axis, such that the height of each bin is the number of observations falling in the corresponding bracket (interval).
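The counting step can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the original text: the function name `histogram_counts`, the equal-width brackets, the range [0, 4] and the ten sample values are all made up for the example.

```python
def histogram_counts(sample, lo, hi, n_bins):
    """Count the observations falling in n_bins equal-width brackets on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in sample:
        if lo <= x <= hi:
            # Index of the bracket containing x; the right edge hi
            # is assigned to the last bracket.
            k = min(int((x - lo) / width), n_bins - 1)
            counts[k] += 1
    return counts

# Hypothetical sample, chosen only for illustration.
sample = [0.1, 0.4, 0.45, 1.2, 1.9, 2.0, 2.5, 2.6, 2.7, 3.9]
counts = histogram_counts(sample, 0.0, 4.0, 4)
print(counts)  # bin heights, one per bracket
```

Each entry of `counts` is the height of one bin. Observations outside [lo, hi] are simply ignored in this sketch.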

The histogram as a representation of p(x)

The sample is, as usual, hoped to be a faithful representation of p(x): many observations are expected in regions where p(x) is large, and few in regions where p(x) is small. Bins are therefore expected to be tall where p(x) is large and short where p(x) is small. So, the "skyline" of the (appropriately normalized) histogram is expected to be a faithful, discretized, staircase-like representation of p(x) itself (lower image of the illustration below).
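One common reading of "appropriately normalized" is to divide each count by (number of observations × bin width), so that the total area of the bins equals 1, as for a probability density. The sketch below assumes this normalization and, purely for illustration, draws the sample from a standard normal distribution so that the skyline can be compared with a known p(x); the function names, sample size and number of bins are assumptions, not part of the original text.

```python
import math
import random

def density_histogram(sample, n_bins):
    """Return (edges, heights), with heights scaled so that the total area is 1."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in sample:
        # The largest observation falls on the right edge: put it in the last bin.
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    n = len(sample)
    heights = [c / (n * width) for c in counts]
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, heights

def normal_pdf(x):
    """Density of the standard normal distribution (the p(x) assumed here)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

rng = random.Random(0)  # fixed seed so the run is reproducible
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
edges, heights = density_histogram(sample, 20)

width = edges[1] - edges[0]
area = sum(h * width for h in heights)
centers = [0.5 * (edges[k] + edges[k + 1]) for k in range(20)]
max_gap = max(abs(h - normal_pdf(c)) for h, c in zip(heights, centers))
print(f"area = {area:.6f}, largest gap to p(x) at bin centers = {max_gap:.4f}")
```

With a large sample and a moderate number of bins, the normalized heights should stay close to p(x) at the bin centers, which is the staircase approximation described above.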
 

 

The histogram as a model

A histogram is:

The histogram and the bias-variance tradeoff

Simple as it is, the histogram provides a very good illustration of one of the most fundamental aspects of practical, down-to-earth statistical data modeling: the so-called bias-variance tradeoff. For histograms, the bias-variance tradeoff reads as follows:

 


So a good histogram must have just the right number of bins. What is this number? The answer is disappointing: it cannot be calculated. It can, however, be roughly estimated by validation techniques.
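As a hedged illustration of what such a validation technique can look like, the sketch below uses the leave-one-out cross-validation score for histograms, J(h) = 2 / ((n − 1) h) − (n + 1) / ((n − 1) h) · Σ p̂_k², where h is the bin width, n the sample size and p̂_k the fraction of observations in bin k. This is one standard choice, not necessarily the method the author has in mind. The score estimates the integrated squared error up to a constant, so smaller is better. The sample (standard normal), the candidate bin counts and the function name are assumptions made for the example.

```python
import random

def loo_cv_score(sample, n_bins):
    """Leave-one-out cross-validation score of an equal-width histogram (lower is better)."""
    lo, hi = min(sample), max(sample)
    n = len(sample)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in sample:
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    # Sum of squared bin proportions p_hat_k = count_k / n.
    sum_sq = sum((c / n) ** 2 for c in counts)
    return 2.0 / ((n - 1) * width) - (n + 1) / ((n - 1) * width) * sum_sq

rng = random.Random(1)  # fixed seed so the run is reproducible
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Hypothetical candidates, from far too few bins to far too many.
candidates = [2, 5, 10, 20, 40, 80, 160, 320]
scores = {m: loo_cv_score(sample, m) for m in candidates}
best = min(scores, key=scores.get)

for m in candidates:
    print(f"{m:4d} bins: score = {scores[m]:+.4f}")
print("selected number of bins:", best)
```

The score is typically poor at both ends of the candidate list: with very few bins the staircase is too coarse to follow p(x), and with very many bins the heights are dominated by sampling noise. The selected value is only a rough estimate and changes from sample to sample.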

 

This question, of great practical importance, is developed in more detail on the next page. It is also illustrated in an interactive animation that you'll find here.

 

For more on the bias-variance tradeoff, please see here.

____________________________________________________________

 

Related readings

Estimation

Bias-variance tradeoff
