Binomial distribution
You're playing Heads and Tails with a coin whose probability of producing a "Head" is p (and therefore whose probability of producing a "Tail" is q = 1 - p).
You toss the coin n times, thus producing k "Heads" (and therefore n - k "Tails"). Of course, should you proceed with a second series of n tosses, you would expect a different number of "Heads". Therefore, k is the realization of a random variable that we denote X.
By definition, X has a binomial distribution with parameters n and p, which is denoted B(n, p). This distribution is completely determined by the list of the probabilities of producing k "Heads" in n tosses, for all values of k from 0 to n. We will denote these probabilities $P_{n,p}\{X = k\}$.
-----
Technically, the binomial distribution is therefore the distribution of the sum of n independent and identically distributed Bernoulli random variables with common parameter p.
The following animation illustrates the Binomial distribution.
Initial set-up. The animation opens with a sample of n = 6 outcomes of a Bernoulli variable with parameter p = .4. This probability is read as the ratio of the length of the elongated white area of the upper frame to the overall length of the frame. The red and white dots were drawn from a uniform distribution over the length of the frame. Thus, a dot falls in the white area with probability p.
Each time you click on "Next", a new n-sample is drawn from the Bernoulli distribution. Observe that the number of red dots varies randomly.
Adjust p by sliding the boundary between the white and gray areas of the upper frame with your mouse. Observe the changes in the binomial distribution B(n, p) as the value of p varies. In particular, note that the distribution is nearly symmetric when p is close to .5, but becomes skewed when p is substantially different from .5.
Use the buttons of the "Nb. Draws" display to adjust n, the number of "tosses" of the coin. When n = 1, the binomial distribution becomes the Bernoulli distribution with parameter p. Observe that for a given p, the binomial distribution looks more and more like a normal distribution as the number of draws is made larger and larger. If p is close to .5, the normal distribution provides a good approximation of this distribution, even for moderate numbers of draws.
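We'll establish the following results.
The probability of obtaining k Heads in a series of n tosses is:
$$P_{n,p}\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, \dots, n$$
where
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
is the number of ways k objects can be chosen among n objects. It is called the binomial coefficient. The reason for this name (and that of the distribution) is that it is the number of times that $x^k y^{n-k}$ appears in the expansion of the binomial:
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$$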
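The formula is easy to check numerically. Below is a minimal Python sketch (the helper name `binomial_pmf` is ours, not the page's) that evaluates $P_{n,p}\{X = k\}$ with the standard-library `math.comb` for the set-up of the animation, n = 6 and p = .4, and verifies that the probabilities sum to 1.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P{X = k} for X ~ B(n, p), i.e. C(n, k) * p**k * (1 - p)**(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # k Heads is impossible outside 0..n
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Set-up of the animation: n = 6 tosses, p = .4.
probs = [binomial_pmf(k, 6, 0.4) for k in range(7)]
for k, pk in enumerate(probs):
    print(k, round(pk, 4))
# The probabilities add up to (p + q)**n = 1, by the binomial expansion.
print(round(sum(probs), 12))
```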
The mean of B(n, p) is:
$$\mu = np$$
This is the number of Heads you'll get, on average, in a series of n tosses.
The variance of B(n, p) is:
$$\sigma^2 = np(1 - p)$$
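As a sanity check of these two formulas, the following Python sketch (hypothetical helper `binomial_moments`) computes the mean and variance directly from the probabilities, as $\sum_k k\,P\{X = k\}$ and $\sum_k (k - \mu)^2 P\{X = k\}$, and compares them with np and np(1 - p).

```python
from math import comb


def binomial_moments(n: int, p: float) -> tuple[float, float]:
    """Mean and variance of B(n, p), computed directly from the pmf."""
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, var


n, p = 20, 0.3
mean, var = binomial_moments(n, p)
print(mean, n * p)            # both approximately 6.0
print(var, n * p * (1 - p))   # both approximately 4.2
```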
Note that the variance is almost 0 if p is close to 0 or close to 1. In the first case, you'll almost always get a very small number of Heads, and in the second case a number of Heads close to n. In both cases, this number is very stable and the variance is low.
Show that the variance is largest for p = .5.
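One way to carry out this exercise is to complete the square:
$$\sigma^2 = np(1 - p) = n\left[\frac{1}{4} - \left(p - \frac{1}{2}\right)^2\right] \le \frac{n}{4}$$
with equality if and only if p = .5, where the variance reaches n/4.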
-----
This result can also be written as:
$$\sigma^2 = \mu\left(1 - \frac{\mu}{n}\right)$$
The significance of this expression will appear when we consider the binomial distribution as belonging to the natural exponential family.
The animation suggests that the value of k for which P{X = k} is largest (the mode) is always less than one unit away from the mean, a result that we'll demonstrate.
A common way to express this result is to say that the most probable frequency of Heads is approximately equal to p, the probability to obtain a Head on any toss of the coin.
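The claim about the mode can be checked by brute force. In this Python sketch (the helper `binomial_mode` is hypothetical), the mode is found by scanning all k, compared with the closed form ⌊(n + 1)p⌋, and checked to lie less than one unit away from np. The parameter values were picked so that (n + 1)p is never an integer, since in that case two adjacent values of k tie for the mode.

```python
from math import comb, floor


def binomial_mode(n: int, p: float) -> int:
    """Value of k maximizing P{X = k} for X ~ B(n, p), by scanning k = 0..n."""
    return max(range(n + 1), key=lambda k: comb(n, k) * p**k * (1 - p) ** (n - k))


# Comparing consecutive probabilities shows that, when (n + 1) p is not an
# integer, the mode is floor((n + 1) p); check that and the distance to np.
for n in (5, 17, 40):
    for p in (0.13, 0.4, 0.77):
        k_star = binomial_mode(n, p)
        assert k_star == floor((n + 1) * p)
        assert abs(k_star - n * p) < 1  # mode within one unit of the mean
print("all checks passed")
```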
The moment generating function (mgf) of B(n, p) is:
$$M(t) = (pe^t + q)^n$$
We'll use this mgf to show that the binomial distribution tends to a normal distribution for large samples.
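The mgf also gives back the mean and variance stated above. Differentiating M(t) and setting t = 0 (using p + q = 1):
$$M'(t) = npe^t\,(pe^t + q)^{n-1}, \qquad M'(0) = np = \mu$$
$$M''(t) = npe^t\,(pe^t + q)^{n-1} + n(n-1)p^2e^{2t}\,(pe^t + q)^{n-2}, \qquad M''(0) = np + n(n-1)p^2$$
$$\sigma^2 = M''(0) - M'(0)^2 = np + n(n-1)p^2 - n^2p^2 = np(1 - p)$$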
We show here that the generating function of the binomial distribution B(n, p) is:
$$G(s) = (q + ps)^n$$
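Since $G(s) = E[s^X] = \sum_k P\{X = k\}\, s^k$, the coefficient of $s^k$ in $(q + ps)^n$ must be P{X = k}. The following Python sketch (hypothetical helper `expand_generating_function`) expands the polynomial by multiplying by (q + ps) n times and compares its coefficients with the probabilities given by the binomial formula.

```python
from math import comb


def expand_generating_function(n: int, p: float) -> list[float]:
    """Coefficients of G(s) = (q + p s)**n, constant term first."""
    q = 1 - p
    coeffs = [1.0]  # the polynomial 1
    for _ in range(n):
        # Multiply the current polynomial by (q + p s).
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += q * c       # term coming from q
            nxt[k + 1] += p * c   # term coming from p s (degree goes up by 1)
        coeffs = nxt
    return coeffs


n, p = 8, 0.35
coeffs = expand_generating_function(n, p)
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
print(max(abs(a - b) for a, b in zip(coeffs, pmf)))  # only rounding error remains
```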
Let $X_1$ and $X_2$ be two independent binomial variables, with distributions $B(n_1, p)$ and $B(n_2, p)$ respectively. Then the r.v. $X = X_1 + X_2$ is also binomial, with $X \sim B(n_1 + n_2, p)$.
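A sketch of the argument with the generating function: by independence, the expectation of the product factors, so
$$G_X(s) = E\left[s^{X_1 + X_2}\right] = E\left[s^{X_1}\right] E\left[s^{X_2}\right] = (q + ps)^{n_1} (q + ps)^{n_2} = (q + ps)^{n_1 + n_2}$$
which is the generating function of $B(n_1 + n_2, p)$. Since a generating function determines the distribution, $X \sim B(n_1 + n_2, p)$.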
The practical importance of the binomial distribution cannot be overstated, as many events can be interpreted as resulting from a sequence of independent "trials", each one leading to a "success" (with the same probability p every time) or a "failure". The binomial distribution is then that of the number of successes in such a sequence.
But the importance of the binomial distribution extends beyond its practical usefulness. Two of the most fundamental results in the theory of random variables are conveniently illustrated by the binomial distribution.
If you toss a fair coin a large number of times, you intuitively expect to obtain about as many Heads as Tails. More generally, you expect the proportion of Heads to be approximately equal to the probability p of obtaining a Head on each toss (so that the number of Heads is roughly np). This intuition is justified, and is confirmed by the Weak Law of Large Numbers (WLLN).
The scope of the WLLN is much broader than just the Heads and Tails game, but it is quite illuminating to demonstrate the WLLN directly in the special case of the Heads and Tails game, which we do here. It then appears that, in this case, the WLLN is an asymptotic property of the binomial coefficient (see above), and is therefore a result in combinatorics.
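The WLLN can be watched at work without any simulation. This Python sketch (hypothetical helper `deviation_probability`) sums the binomial probabilities of the event |X/n - p| ≥ ε for a fair coin and ε = 0.1; the probability shrinks towards 0 as n grows, which is what the WLLN asserts.

```python
from fractions import Fraction
from math import comb


def deviation_probability(n: int, p: Fraction, eps: Fraction) -> float:
    """P{|X/n - p| >= eps} for X ~ B(n, p), summed exactly from the pmf.

    The event |k/n - p| >= eps is tested with exact fractions so that values
    of k lying exactly on the boundary are not misclassified by rounding.
    Plain float arithmetic keeps intermediate values in range up to n ~ 1000.
    """
    pf = float(p)
    return sum(
        comb(n, k) * pf**k * (1 - pf) ** (n - k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - p) >= eps
    )


# Probability that the frequency of Heads of a fair coin is off by at least 0.1.
for n in (10, 100, 1000):
    print(n, deviation_probability(n, Fraction(1, 2), Fraction(1, 10)))
```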
The above animation shows that as n is made larger and larger, the binomial distribution clings more and more closely to a Gaussian curve, all the more so as p is close to .5.
Today, this appears as a straightforward consequence of the Central Limit Theorem (CLT). In a nutshell, the CLT states that this kind of convergence to a normal distribution is quite general, and is definitely not specific to the binomial distribution.
Yet, in the early days of probability theory, the CLT was not known, and the above result was derived directly in what is known today as the "de Moivre theorem", which we'll demonstrate.
Again, it then appears that in the special case of the binomial distribution, the "gaussian evolution" of the binomial distribution is an asymptotic property of the binomial coefficient, and is therefore again a result in combinatorics.
We use below a third method for establishing this result.
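To see the convergence numerically, the following Python sketch compares each binomial probability P{X = k} with the normal density of mean np and variance np(1 - p) evaluated at k (the local form of the normal approximation) and reports the largest discrepancy. The helper name `max_normal_error` is ours. The error shrinks as n grows, and is smaller for p = .5 than for p = .1.

```python
from math import comb, exp, pi, sqrt


def max_normal_error(n: int, p: float) -> float:
    """Largest |P{X = k} - f(k)| over k = 0..n, where X ~ B(n, p) and f is the
    density of the normal distribution with mean n p and variance n p (1 - p)."""
    q = 1 - p
    mu, sigma = n * p, sqrt(n * p * q)

    def normal_density(x: float) -> float:
        return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

    return max(
        abs(comb(n, k) * p**k * q ** (n - k) - normal_density(k))
        for k in range(n + 1)
    )


for n in (10, 100, 1000):
    print(n, max_normal_error(n, 0.5), max_normal_error(n, 0.1))
```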
Calculating P{X = k} involves handling factorials, which is a heavy job (even for computers), and the need for an approximate but quick way to calculate a binomial probability was immediately felt. Actually, the work by de Moivre was motivated by this desire to calculate binomial probabilities by hand for other than very small values of n.
* When p is reasonably close to .5, the gaussian approximation comes in handy, and provides adequately accurate results even for low values of n (see animation).
* But when p is close to 0 or close to 1, the binomial distribution is strongly skewed and the gaussian approximation is not so good anymore. The need is then felt for yet another approximation of the binomial distribution "specialized" in extreme values of p. This improved approximation is known as the Poisson distribution.
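A quick numerical comparison illustrates the point. Assuming n = 100 and p = .02 (values chosen only for illustration), this Python sketch computes the largest absolute error of the normal approximation (the N(np, np(1 - p)) density evaluated at each k) and of the Poisson approximation with λ = np.

```python
from math import comb, exp, factorial, pi, sqrt

n, p = 100, 0.02
q = 1 - p
lam = n * p                         # Poisson parameter
mu, sigma = n * p, sqrt(n * p * q)  # parameters of the normal approximation

ks = range(n + 1)
binom = [comb(n, k) * p**k * q ** (n - k) for k in ks]
poisson = [exp(-lam) * lam**k / factorial(k) for k in ks]
normal = [exp(-((k - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi)) for k in ks]

poisson_err = max(abs(b - a) for b, a in zip(binom, poisson))
normal_err = max(abs(b - a) for b, a in zip(binom, normal))
print(f"largest error, Poisson: {poisson_err:.4f}, normal: {normal_err:.4f}")
```

With p this small the distribution is strongly skewed, and the Poisson error comes out roughly an order of magnitude below the normal one.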
-----
Today, the importance of the Poisson distribution extends far beyond its role as an approximation of the binomial distribution for extreme values of p.
The draws from which the binomial distribution is derived have only two possible outcomes ("Heads" or "Tails"). If these draws have more than just two possible outcomes (e.g. if you roll a die instead of tossing a coin), the distribution of the numbers of the various outcomes in a series of n draws is called the multinomial distribution, which is therefore a generalisation of the binomial distribution.
We use the binomial distribution to illustrate the fact that the Method of Moments can be used to estimate several parameters of a distribution simultaneously, even when these parameters are not themselves moments (unlike, for instance, the mean and variance that parameterize the normal distribution).
In the Tutorial below, we use the method of moments for estimating both the parameters n and p of the binomial distribution B(n, p).
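The idea can be sketched in a few lines. Equating the sample mean m and the sample variance v to np and np(1 - p) gives p̂ = 1 - v/m and n̂ = m/p̂ = m²/(m - v). The Python sketch below (hypothetical helper `moments_estimate`) uses the divisor-N sample variance, `statistics.pvariance`; it only makes sense when v < m, and n̂ would in practice be rounded to an integer.

```python
from statistics import mean, pvariance


def moments_estimate(sample: list[int]) -> tuple[float, float]:
    """Method-of-moments estimates (n_hat, p_hat) for B(n, p).

    Solves  m = n p  and  v = n p (1 - p)  for n and p, where m is the sample
    mean and v the sample variance with divisor N.
    """
    m = mean(sample)
    v = pvariance(sample)
    if v >= m:
        # 1 - v/m would be <= 0: no admissible estimate of p
        raise ValueError("sample variance is not smaller than the sample mean")
    p_hat = 1 - v / m
    n_hat = m / p_hat  # equivalently m**2 / (m - v)
    return n_hat, p_hat


print(moments_estimate([4, 6, 8, 6]))  # m = 6, v = 2: roughly (9.0, 0.667)
```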
The binomial distribution usually leads to cumbersome or even intractable calculations. Yet, we'll show how to estimate n (assuming that p is known) by the method of Maximum Likelihood. The result does not come in a closed form, but as an equation that can be solved by numerical methods.
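As an illustration only, the Python sketch below (hypothetical helpers `log_likelihood` and `ml_estimate_n`) maximizes the log-likelihood of n by brute-force search over the integers from max(x) up to an assumed bound `n_max`, using `math.lgamma` for the logs of the binomial coefficients. This swaps in an integer search for the numerical solution of the likelihood equation that the Tutorial derives.

```python
from math import lgamma, log


def log_likelihood(n: int, sample: list[int], p: float) -> float:
    """Log-likelihood of n for an i.i.d. sample from B(n, p), p known."""
    total = sum(sample)
    # log C(n, x) = log n! - log x! - log (n - x)!, via lgamma(m + 1) = log m!
    log_coeffs = sum(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1) for x in sample)
    return log_coeffs + total * log(p) + (len(sample) * n - total) * log(1 - p)


def ml_estimate_n(sample: list[int], p: float, n_max: int) -> int:
    """Integer n in [max(sample), n_max] with the largest log-likelihood."""
    candidates = range(max(sample), n_max + 1)  # n cannot be below the largest count
    return max(candidates, key=lambda n: log_likelihood(n, sample, p))


print(ml_estimate_n([3, 4, 5, 4, 4], p=0.5, n_max=100))  # prints 8
```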
* We calculate here the distribution of the number of Heads in the first m tosses of a series of n tosses, conditional on the total number of Heads in the series (we do so in the course of identifying a sufficient statistic for the parameter p of the binomial distribution).
* In the same spirit, we calculate here the conditional expectation of this same number of Heads.
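The conditional distribution obtained in this way is hypergeometric and no longer involves p, which is what makes the total number of Heads sufficient for p. The Python sketch below (hypothetical helper `conditional_first_m`) checks this numerically for several values of p, using the independence of the first m tosses and the n - m remaining ones, and prints the conditional mean next to mt/n, where t is the total number of Heads.

```python
from math import comb


def binomial_pmf(k: int, size: int, p: float) -> float:
    """P{X = k} for X ~ B(size, p); 0 outside 0..size."""
    if not 0 <= k <= size:
        return 0.0
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def conditional_first_m(n: int, m: int, t: int, p: float) -> list[float]:
    """P{Y = j | T = t} for j = 0..m.

    Y counts the Heads among the first m of n tosses, T counts all n tosses.
    Y ~ B(m, p) and T - Y ~ B(n - m, p) are independent, so
    P{Y = j, T = t} = P{Y = j} * P{T - Y = t - j}.
    """
    total = binomial_pmf(t, n, p)
    return [binomial_pmf(j, m, p) * binomial_pmf(t - j, n - m, p) / total
            for j in range(m + 1)]


n, m, t = 12, 5, 7
# Hypergeometric probabilities C(m, j) C(n - m, t - j) / C(n, t): p does not appear.
hyper = [comb(m, j) * comb(n - m, t - j) / comb(n, t) if t - j >= 0 else 0.0
         for j in range(m + 1)]
for p in (0.2, 0.5, 0.9):
    dist = conditional_first_m(n, m, t, p)
    assert all(abs(a - b) < 1e-12 for a, b in zip(dist, hyper))
    # Conditional mean of Y, equal to m * t / n whatever p is.
    print(p, sum(j * x for j, x in enumerate(dist)), m * t / n)
```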
-----
* We calculate here the conditional distribution of two independent binomial variables with the same parameter p but different sizes, given the value of their sum. The result will be the key to the Fisher-Irwin test, which tests whether two independent Bernoulli populations have the same parameter.
-----
* Under certain conditions, the hypergeometric distribution converges to a binomial distribution when the (finite) population is made to grow without limit.
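In the usual form of this result, the sample size n stays fixed while the population size N grows with the proportion K/N of "successes" held at p. The Python sketch below (hypothetical helpers `hypergeom_pmf` and `max_gap`, with n = 10 and p = .3 chosen for illustration) shows the largest gap between the two distributions shrinking as N grows.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P{k successes} when drawing n items without replacement from a
    population of N items of which K are successes."""
    if k < 0 or k > K or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def max_gap(N: int, p: float, n: int) -> float:
    """Largest |hypergeometric - B(n, K/N)| over k, with K = round(p N)."""
    K = round(p * N)
    pk = K / N  # exact success proportion of the finite population
    return max(
        abs(hypergeom_pmf(k, N, K, n) - comb(n, k) * pk**k * (1 - pk) ** (n - k))
        for k in range(n + 1)
    )


for N in (20, 200, 2000, 20000):
    print(N, max_gap(N, 0.3, 10))
```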
Please use our Binomial Calculator to calculate the individual and cumulative probabilities you might need.
____________________________________________________________________
Tutorial 1
We establish here the basic properties of the binomial distribution.
The mean and variance of the binomial distribution are also established here by calling on the properties of the generating function.
The additivity property is also established here, also by calling on the properties of the generating function.
We then show that the binomial distribution tends to a normal distribution for large samples. Of course, this is an immediate consequence of the Central Limit Theorem (CLT), but we'll pretend that we never heard of the CLT, and show directly that the moment generating function of the binomial distribution converges to that of a normal distribution as the sample size grows without limit.
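The key quantity can be evaluated numerically. For the standardized variable Z = (X - np)/σ with σ = √(np(1 - p)), the mgf is $M_Z(t) = e^{-npt/\sigma}\,(q + pe^{t/\sigma})^n$, which should approach $e^{t^2/2}$, the mgf of the standard normal distribution. This Python sketch (hypothetical helper `standardized_mgf`) evaluates it in log form with `math.log1p` and `math.expm1` to avoid cancellation when t/σ is small.

```python
from math import exp, expm1, log1p, sqrt


def standardized_mgf(t: float, n: int, p: float) -> float:
    """mgf of Z = (X - n p) / sigma for X ~ B(n, p), sigma = sqrt(n p q).

    log M_Z(t) = n log(q + p e^u) - n p u with u = t / sigma, and
    log(q + p e^u) = log1p(p * expm1(u)) stays accurate for small u.
    """
    sigma = sqrt(n * p * (1 - p))
    u = t / sigma
    return exp(n * log1p(p * expm1(u)) - n * p * u)


t, p = 1.0, 0.3
for n in (10, 1_000, 100_000):
    print(n, standardized_mgf(t, n, p), exp(t**2 / 2))
```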
-----
We conclude the Tutorial by estimating n by two different methods :
* The method of moments.
* The method of Maximum Likelihood.
BINOMIAL DISTRIBUTION: BASIC PROPERTIES
* Probability mass function
* Mean µ
* Variance σ²
* Mode
* Moment generating function: direct approach, indirect approach
* Moments
* Generating function
* Additivity: direct calculation, by mgf
* Convergence to normal (by limit of the mgf): mgf of the standardized binomial distribution, Taylor expansion of the mgf, limit of the mgf
* Estimation of n: estimation of both n and p by the method of moments, estimation of n (p known) by Maximum Likelihood
___________________________________________________
Tutorial 2
In this Tutorial, we demonstrate the famed "De Moivre theorem" which establishes that the binomial distribution tends to a normal distribution when n, the number of draws, tends to infinity.
Today, De Moivre's theorem is superseded by the considerably more general Central Limit Theorem. In addition, we already demonstrated this result in the previous Tutorial by calculating the limit of the moment generating function of the binomial distribution. Yet it seems appropriate not to forget this historically significant achievement, if only to discover the remarkably high degree of mathematical sophistication that already existed as far back as the first half of the 18th century.
The demonstration rather faithfully reproduces that of De Moivre (completed by his friend Stirling), only expressed in modern notation.
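The "Consecutive probabilities" step presumably rests on the ratio of two successive binomial probabilities, which follows directly from the probability mass function (a sketch in modern notation, not necessarily the form used in the Tutorial):
$$\frac{P\{X = k+1\}}{P\{X = k\}} = \frac{\binom{n}{k+1}\,p^{k+1}q^{n-k-1}}{\binom{n}{k}\,p^{k}q^{n-k}} = \frac{n-k}{k+1}\cdot\frac{p}{q}$$
This ratio exceeds 1 exactly when k + 1 < (n + 1)p, so the probabilities increase up to the mode and decrease beyond it.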
DE MOIVRE THEOREM
* Centering the binomial distribution
* Consecutive probabilities
* Any probability to that of the mode
* Normalization coefficient
* The normal limit
_______________________________________________________
Related readings:
* Multinomial distribution
* B(n, p) and the Poisson distribution
* Central Limit Theorem
* The role of B(n, p) in Order Statistics
* Bin heights in histograms are B(n, p)
* Negative binomial distribution
* Binomial as limit of hypergeometric
* Distribution of 2 independent binomial rvs