Binomial distribution

You're playing Heads and Tails with a coin whose probability to produce a "Head" is p (and therefore the probability to produce a "Tail" is q = 1 - p).

You toss the coin n times, thus producing k "Heads" (and therefore n - k "Tails"). Of course, should you decide to proceed with a second series of n tosses, you would expect to generate a different number of "Heads". Therefore, k is the realization of a random variable that we denote X.

 

By definition, X has a binomial distribution with parameters n and p, which is denoted B(n, p). This distribution is completely determined by the list of the probabilities to produce k "Heads" in n tosses, for all values of k from 0 to n. We will denote these probabilities P_{n,p}{X = k}.

-----

Technically, the binomial distribution is therefore the distribution of the sum of n independent and identically distributed Bernoulli random variables with common parameter p.
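The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original animation: the function names `bernoulli` and `binomial_draw` and the fixed seed are assumptions made for the example.

```python
import random

def bernoulli(p, rng):
    """Return 1 (a "Head") with probability p, else 0 (a "Tail")."""
    return 1 if rng.random() < p else 0

def binomial_draw(n, p, rng):
    """One realization of X ~ B(n, p): the sum of n independent Bernoulli(p) outcomes."""
    return sum(bernoulli(p, rng) for _ in range(n))

rng = random.Random(0)  # arbitrary seed, only for reproducibility
series = [binomial_draw(6, 0.4, rng) for _ in range(5)]
print(series)  # five values of k, each between 0 and 6
```

Each call to `binomial_draw` plays the role of one series of n tosses, so successive calls generally return different values of k.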

Animation

The following animation illustrates the Binomial distribution.

 

 


 

 

 

Initial set-up

The animation opens on an initial set-up with:

    1) A sample of n = 6 outcomes of a Bernoulli variable with parameter p = .4. This probability is read as the ratio of the length of the elongated white area of the upper frame to the overall length of the frame.

The red and black dots were drawn from a uniform distribution over the length of the frame. Thus, a dot has:
       * Probability p to be in the white area (red dots). It is then considered a "Head", or a "success".
       * Probability (1 - p) to be in the gray area (black dots). It is then considered a "Tail", or a "failure".

   2) The theoretical histogram of the B(n, p) distribution displayed in the middle frame. The thin blue vertical line marks the mean of the distribution.
----------

Each time you click on "Next", a new n-sample is drawn from the Bernoulli distribution. Observe that the number of red dots varies randomly.
The displayed binomial distribution B(n, p) is that of the number of red dots.

Controls
   1) Probability p

        Adjust p by sliding the boundary between the white and gray areas of the upper frame with your mouse. Observe the changes in the binomial distribution B(n, p) as the value of p varies. In particular, note that the distribution is nearly symmetric when p is close to .5, but becomes skewed when p becomes substantially different from .5.

   2) Number of trials

        Use the buttons of the "Nb. Draws" display to adjust n, the number of "tosses" of the coin. When n = 1, the binomial distribution becomes the Bernoulli distribution with parameter p.
-----

Observe that for a given p, the binomial distribution looks more and more like a normal distribution as the number of draws is made larger and larger.

Central Limit Theorem
This feeling is confirmed visually by the display of the lower frame (gray background). The red curve is the standard normal distribution, and the yellow vertical bars represent the standardized binomial distribution B(n, p).

   * If p is close to .5, it can be seen that the normal distribution provides a good approximation of this distribution, even for moderate numbers of draws.
   * But if p is substantially different from .5, the quality of the approximation degrades because the binomial distribution is then skewed. The approximation gets somewhat better, though, as the number of draws is made larger.
-----
So it appears that:
   * As n is made larger and larger, the binomial distribution tends to a "discretized" normal distribution. This result is a direct consequence of the Central Limit Theorem (CLT).
   * But the convergence is slower if the value of p is substantially different from .5, because the binomial distribution is then skewed. The need is then felt for another kind of approximation of the binomial distribution: this new approximation is provided by the Poisson distribution.
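The two observations above can be checked numerically without the animation. The sketch below is only an illustration: it compares each P{X = k} with the normal density of mean np and standard deviation sqrt(np(1 - p)) evaluated at k, and reports the largest gap. The function names and the choice of "largest pointwise gap" as a measure of quality are assumptions of this example.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_density(x, mean, sd):
    """Density of the normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))

def max_normal_error(n, p):
    """Largest gap, over k = 0..n, between P{X = k} and the matching normal density."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(k, n, p) - normal_density(k, mean, sd))
               for k in range(n + 1))

# p near .5 versus p far from .5, then a larger n for the skewed case
for n, p in [(30, 0.5), (30, 0.05), (300, 0.05)]:
    print(n, p, max_normal_error(n, p))
```

With these settings the gap should be smallest for p = .5, clearly larger for p = .05 at the same n, and should shrink again when n is increased.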
 
Animation
Click on "Go" and observe the progressive build-up of the histogram of the binomial distribution B(n, p).

 

 

 

Basic properties of the binomial distribution

We'll establish the following results :

Probability mass function

The probability to obtain k Heads in a series of n tosses is:

$$P\{X = k\} = \binom{n}{k}\, p^{k} q^{n-k}, \qquad k = 0, 1, \dots, n$$

where

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

is the number of ways k objects can be chosen among n objects. It is called the binomial coefficient. The reason for this name (and that of the distribution) is that it is the coefficient of $x^{k} y^{n-k}$ in the expansion of the binomial:

$$(x + y)^{n}$$
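As a quick illustration, the probability of k Heads in n tosses, namely the binomial coefficient times p^k times q^(n - k), can be evaluated with the standard library. The function name `binomial_pmf` is an assumption of this sketch; n = 6 and p = .4 are the values of the animation's initial set-up.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ B(n, p): C(n, k) * p^k * q^(n - k), with q = 1 - p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 6, 0.4
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, pk in enumerate(probs):
    print(k, round(pk, 4))
print("total:", sum(probs))  # the n + 1 probabilities add up to 1 (up to rounding)
```

That the probabilities sum to 1 is the binomial expansion of (p + q)^n with p + q = 1.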

Mean

This is the number of Heads you'll get, on the average, in a series of n tosses.

µ = np

 

Variance

σ²= np(1 - p)

 

 Note that the variance is almost 0 if p is close to 0 or else close to 1. In the first case, you'll almost always get a very small number of Heads, and a number of Heads close to n in the other case. In both cases, this number is very stable and the variance is low.
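The formulas for the mean and the variance can be checked against a direct summation over the probabilities P{X = k}. This is a numerical sanity check under the usual definitions of mean and variance, not a proof; the function names are assumptions of the sketch.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mean_and_variance(n, p):
    """Mean and variance of B(n, p), computed by summing over k = 0..n."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, variance

for p in (0.01, 0.4, 0.99):
    mean, variance = mean_and_variance(6, p)
    # compare with the closed forms np and np(1 - p)
    print(p, mean, 6 * p, variance, 6 * p * (1 - p))
```

The runs with p = .01 and p = .99 illustrate the remark above: the variance is then close to 0.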


Show that the variance is largest for p = .5.
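One possible route, sketched here by completing the square (the Tutorial may proceed differently):

```latex
\sigma^2 = np(1-p)
         = n\left[\tfrac{1}{4} - \left(p - \tfrac{1}{2}\right)^2\right]
         \le \frac{n}{4},
\qquad \text{with equality if and only if } p = \tfrac{1}{2}.
```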

-----

This result can also be written as :

σ² = µ(1 - µ/n)

The significance of this expression will appear when we consider the binomial distribution as belonging to the natural exponential family.

Mode

The animation suggests that the value of k for which P{X = k} is largest (the mode) is always less than one unit away from the mean, a result that we'll demonstrate.

A common way to express this result is to say that the most probable frequency of Heads is approximately equal to p, the probability to obtain a Head on any toss of the coin.
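The claim about the mode can be explored numerically before it is proved. The sketch below simply locates the largest P{X = k} by exhaustive search; the function names are assumptions, and when two values of k tie for the largest probability the search returns the smaller one.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_mode(n, p):
    """Value of k in 0..n with the largest probability (smallest such k on ties)."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    return max(range(n + 1), key=probs.__getitem__)

for n, p in [(6, 0.4), (20, 0.13), (50, 0.9)]:
    mode = binomial_mode(n, p)
    print(n, p, "mode:", mode, "mean:", n * p, "distance:", abs(mode - n * p))
```

In each case the printed distance between the mode and the mean np stays below 1.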

Moment generating function

 

$$M(t) = (p e^{t} + q)^{n}$$

 

 

We'll use this mgf to show that the binomial distribution tends to a normal distribution for large samples.
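For reference, here is a short sketch of how this expression follows from the definition M(t) = E[e^{tX}] and the binomial expansion. It is one standard derivation and may differ from the approaches taken in the Tutorial.

```latex
M(t) = E\left[e^{tX}\right]
     = \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^{k} q^{n-k}
     = \sum_{k=0}^{n} \binom{n}{k} \left(p e^{t}\right)^{k} q^{n-k}
     = \left(p e^{t} + q\right)^{n}

M'(t) = n p e^{t} \left(p e^{t} + q\right)^{n-1}
\quad\Longrightarrow\quad
M'(0) = np = \mu
```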

Generating function

We show here that the generating function of the binomial distribution B(n, p) is

 

$$G(s) = (q + ps)^{n}$$
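As a sketch of how the generating function G(s) = E[s^X] yields the mean and the variance, using the standard relations between its derivatives at s = 1 and the factorial moments (recall p + q = 1):

```latex
G'(s)  = np\,(q + ps)^{n-1}
\quad\Longrightarrow\quad G'(1) = E[X] = np

G''(s) = n(n-1)p^{2}\,(q + ps)^{n-2}
\quad\Longrightarrow\quad G''(1) = E[X(X-1)] = n(n-1)p^{2}

\sigma^{2} = G''(1) + G'(1) - \left[G'(1)\right]^{2}
           = n(n-1)p^{2} + np - n^{2}p^{2}
           = np(1 - p)
```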

Additivity

Let X1 and X2 be two independent binomial variables, with distributions B(n1, p) and B(n2, p) respectively. Then the r.v. X = X1 + X2 is also binomial, with X ~ B(n1 + n2, p).
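Additivity can be verified numerically: the distribution of the sum of two independent variables is the convolution of their distributions, so convolving B(n1, p) with B(n2, p) should reproduce B(n1 + n2, p). The helper names and the values n1 = 3, n2 = 5, p = .3 are assumptions chosen for the example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_distribution(n, p):
    """List of P{X = k} for k = 0..n."""
    return [binomial_pmf(k, n, p) for k in range(n + 1)]

def convolve(a, b):
    """Distribution of the sum of two independent variables with distributions a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

n1, n2, p = 3, 5, 0.3
summed = convolve(binomial_distribution(n1, p), binomial_distribution(n2, p))
direct = binomial_distribution(n1 + n2, p)
print(max(abs(s - d) for s, d in zip(summed, direct)))  # rounding error only
```

Note that the common value of p matters: with two different values of p the sum is in general no longer binomial.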

Theoretical importance of the binomial distribution

The practical importance of the binomial distribution cannot be overstated, as many events can be interpreted as resulting from a sequence of "trials", each one leading to a "success" or a "failure". The binomial distribution is then that of the number of successes in such a sequence.

But the importance of the binomial distribution extends beyond its practical usefulness. Two of the most fundamental results in the theory of random variables are conveniently illustrated by the binomial distribution.

Weak Law of Large Numbers

If you toss a fair coin a large number of times, you intuitively expect to obtain about as many Heads as Tails. More generally, you expect the number of Heads to be approximately equal to the overall number of tosses multiplied by the probability to obtain Head at each toss. This intuition is justified, and is confirmed by the Weak Law of Large Numbers (WLLN).

The scope of the WLLN is much broader than just the Heads and Tails game, but it is quite illuminating to directly demonstrate the WLLN in the special case of the Heads and Tails game, which we do here. It then appears that the WLLN is an asymptotic property of the binomial coefficient (see above), and is therefore a result in combinatorics.
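The statement of the WLLN for the Heads and Tails game can be illustrated by computing exactly, from the binomial probabilities, the chance that the frequency of Heads X/n misses p by more than a tolerance eps. The sketch below is only an illustration: the names, the tolerance eps = .1, and the use of log-gamma to avoid overflow for large n are choices made for this example.

```python
from math import exp, lgamma, log

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ B(n, p), 0 < p < 1, computed in log space for large n."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))

def prob_far_from_p(n, p, eps):
    """P{|X/n - p| > eps}: the frequency of Heads misses p by more than eps."""
    return sum(binomial_pmf(k, n, p)
               for k in range(n + 1) if abs(k / n - p) > eps)

# fair coin, tolerance 0.1, increasing numbers of tosses
tails = [prob_far_from_p(n, 0.5, 0.1) for n in (10, 100, 1000)]
print(tails)
```

The printed probabilities decrease toward 0 as n grows, which is what the WLLN asserts for any fixed eps > 0.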

Central Limit Theorem

The above animation shows that as n is made larger and larger, the binomial distribution clings more and more closely to a gaussian curve, all the more so when p is close to .5.

Today, this appears as a straightforward consequence of the Central Limit Theorem (CLT). In a nutshell, the CLT states that this kind of convergence to a normal distribution is quite general, and is definitely not specific to the binomial distribution.

Yet, in the early days of probability theory, the CLT was not known, and the above result was derived directly in what is known today as the "de Moivre theorem", which we'll demonstrate.

Again, it then appears that in the special case of the binomial distribution, the "gaussian evolution" of the binomial distribution is an asymptotic property of the binomial coefficient, and is therefore again a result in combinatorics.


We use below a third method for establishing this result.

Binomial distribution and Poisson distribution

Calculating P{X = k} involves handling factorials, which is a heavy job (even for computers), and the need for an approximate but quick way to calculate a binomial probability was immediately felt. Actually, the work by de Moivre was motivated by this desire to calculate binomial probabilities by hand for other than very small values of n.

    * When p is reasonably close to .5, the gaussian approximation comes in handy, and provides adequately accurate results even for low values of n (see animation).

    * But when p is close to 0 or close to 1, the binomial distribution is strongly skewed and the gaussian approximation is not so good anymore. The need is then felt for yet another approximation of the binomial distribution "specialized" in extreme values of p. This improved approximation is known as the Poisson distribution.

-----

Today, the importance of the Poisson distribution extends far beyond its role as an approximation of the binomial distribution for extreme values of p.
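To give a feel for the Poisson approximation when p is small, the sketch below compares B(n, p) with a Poisson distribution whose parameter is taken equal to the binomial mean, λ = np (the usual choice). The function names and the numerical settings are assumptions of this example.

```python
from math import comb, exp, lgamma, log

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P{Y = k} for Y ~ Poisson(lam), lam > 0, computed in log space."""
    return exp(k * log(lam) - lam - lgamma(k + 1))

def max_poisson_error(n, p):
    """Largest gap, over k = 0..n, between B(n, p) and Poisson(np)."""
    lam = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(n + 1))

# same mean np = 2, with p made smaller and n larger
for n, p in [(100, 0.02), (1000, 0.002)]:
    print(n, p, max_poisson_error(n, p))
```

With the mean np held fixed, the gap shrinks as p decreases. For p close to 1, the same idea applies to the number of Tails, whose probability q = 1 - p is then small.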

Multinomial distribution

The draws from which the binomial distribution is derived have only two possible outcomes ("Heads" or "Tails"). If these draws have more than just two possible outcomes (e.g. if you roll a die instead of tossing a coin), the distribution of the numbers of the various outcomes in a series of n draws is called the multinomial distribution, which is therefore a generalisation of the binomial distribution.

Estimation of the parameters

Estimation by the Method of Moments

We use the binomial distribution to illustrate the fact that the Method of Moments can estimate several parameters of a distribution simultaneously, even when these parameters are not themselves moments (unlike, for instance, the parameters of the normal distribution, which are its mean and variance).

In the Tutorial below, we use the method of moments for estimating both the parameters n and p of the binomial distribution B(n, p).
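A minimal sketch of the idea, assuming the two moment equations are µ = np and σ² = np(1 - p): replacing µ and σ² by the sample mean and the sample variance and solving gives p = 1 - (variance / mean) and n = mean / p. The function name, the use of the variance with divisor m (the sample size), and the sample itself are assumptions of this example; the Tutorial's own derivation may differ in detail.

```python
def moment_estimates(data):
    """Method-of-moments estimates (n_hat, p_hat) for a sample assumed drawn from B(n, p)."""
    m = len(data)
    mean = sum(data) / m
    variance = sum((x - mean) ** 2 for x in data) / m
    if not 0 < variance < mean:
        # the moment equations have no solution with 0 < p < 1 in this case
        raise ValueError("sample variance must be positive and smaller than the sample mean")
    p_hat = 1 - variance / mean
    n_hat = mean / p_hat  # generally not an integer; round if an integer is needed
    return n_hat, p_hat

# hypothetical sample: numbers of Heads observed in 16 series of tosses
sample = [0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4]
print(moment_estimates(sample))
```

The guard reflects a practical limitation: when the sample variance exceeds the sample mean, the equations yield no admissible value of p.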

Estimation by Maximum Likelihood

The binomial distribution usually leads to cumbersome or even intractable calculations. Yet, we'll show how to estimate n (assuming that p is known) by the method of Maximum Likelihood. The result does not come in a closed form, but as an equation that can be solved by numerical methods.
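As a rough numerical counterpart, the likelihood can simply be evaluated for every candidate integer n and the best one retained. This brute-force search is a stand-in for solving the likelihood equation, not the method of the Tutorial; the function names, the search bound `n_max` and the sample are assumptions of this sketch.

```python
from math import comb, log

def log_likelihood(n, data, p):
    """Log-likelihood of n for independent observations from B(n, p), with 0 < p < 1 known."""
    return sum(log(comb(n, x)) + x * log(p) + (n - x) * log(1 - p) for x in data)

def mle_n(data, p, n_max):
    """Integer n in [max(data), n_max] that maximizes the log-likelihood."""
    candidates = range(max(data), n_max + 1)  # n cannot be below the largest observation
    return max(candidates, key=lambda n: log_likelihood(n, data, p))

# hypothetical sample: numbers of Heads observed in 16 series of tosses, p known to be .5
sample = [0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4]
print(mle_n(sample, 0.5, n_max=50))
```

The search must start at the largest observed value, since a series of n tosses cannot produce more than n Heads.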

Some additional results

        * We calculate here the distribution of the number of Heads in the first m tosses in a series of n tosses, conditional on the total number of Heads in the series of n tosses (we do it in the course of identifying a sufficient statistic for the parameter p of the binomial distribution).

        * In the same spirit, we calculate here the expectation of this same number of Heads.

-----

        * We calculate here the distributions of two independent binomial variables with the same parameter p but different sizes, conditional on the value of their sum. The result will be the key to the Fisher-Irwin test, which tests whether the parameters of two independent Bernoulli populations are equal.

-----

        *  Under certain conditions, the hypergeometric distribution converges to a binomial distribution when the (finite) population is made to grow without limit.
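The last item above can be illustrated numerically. The sketch assumes the usual setting: n draws without replacement from a population of size N containing a proportion p of "successes", with n and p held fixed while N grows. The function names and the numerical values are assumptions of this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeometric_pmf(k, pop_size, n_success, n_draws):
    """P{k successes} in n_draws draws without replacement from a population
    of pop_size items, n_success of which are successes."""
    return (comb(n_success, k) * comb(pop_size - n_success, n_draws - k)
            / comb(pop_size, n_draws))

def max_gap(pop_size, n, p):
    """Largest gap, over k = 0..n, between the hypergeometric and B(n, p) probabilities."""
    n_success = round(p * pop_size)
    return max(abs(hypergeometric_pmf(k, pop_size, n_success, n) - binomial_pmf(k, n, p))
               for k in range(n + 1))

for pop_size in (10, 100, 10000):
    print(pop_size, max_gap(pop_size, 6, 0.4))
```

As the population grows, drawing without replacement becomes nearly indistinguishable from drawing with replacement, and the gap goes to 0.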

Calculator

Please use our Binomial Calculator to calculate the individual and cumulative probabilities you might need.

 

____________________________________________________________________

 

 

Tutorial 1

 

We establish here the basic properties of the binomial distribution.


The mean and variance of the binomial distribution are also established here by calling on the properties of the generating function.
The additivity property is also established here, also by calling on the properties of the generating function.

We then show that the binomial distribution tends to a normal distribution for large samples. Of course, this is an immediate consequence of the Central Limit Theorem (CLT), but we'll pretend that we never heard of the CLT, and show directly that the moment generating function of the binomial distribution converges to that of a normal distribution as the sample size grows without limit.

-----

We conclude the Tutorial by estimating n by two different methods :

    * The method of moments.

    * The method of Maximum Likelihood.

 

 

 

BINOMIAL DISTRIBUTION : BASIC PROPERTIES

Probability mass function

Mean µ

Variance σ²

Mode

Moment generating function

Direct approach

Indirect approach

Moments

Generating function

Additivity

Direct calculation

By mgf

Convergence to normal (by limit of the mgf)

Mgf of the standardized binomial distribution

Taylor expansion of the mgf

Limit of the mgf

Estimation of n

Estimation of both n and p by the method of moments

Estimation of n (p known) by Maximum Likelihood


___________________________________________________

 

 

Tutorial 2

 

In this Tutorial, we demonstrate the famed "De Moivre theorem", which establishes that the binomial distribution tends to a normal distribution when n, the number of draws, tends to infinity.

Today, the De Moivre theorem is superseded by the considerably more general Central Limit Theorem. In addition, we already demonstrated this result in the previous Tutorial by calculating the limit of the moment generating function of the binomial distribution. Yet, it seems appropriate not to forget this historically significant achievement, if only to discover the remarkably high degree of mathematical sophistication that already existed as far back as the first half of the 18th century.

The demonstration reproduces rather faithfully that of De Moivre (completed by his friend Stirling), only expressed in modern notation.

 

 

 

DE MOIVRE THEOREM

Centering the binomial distribution

Consecutive probabilities

Any probability to that of the mode

Normalization coefficient

The normal limit


 

_______________________________________________________

 

Related readings:

Multinomial distribution

B(n, p) and the Poisson distribution

Central Limit Theorem

The role of B(n, p) in Order Statistics

Bin heights in histograms are B(n, p)

Negative binomial distribution

Binomial as limit of hypergeometric

Distribution of 2 independent binomial rvs conditional on their sum
