
Hypergeometric distribution

An urn contains N balls, of which:

* W are white,
* R are red,

with W + R = N.

n balls are drawn from the urn, without replacement. That is, n balls (the sample) are randomly selected and taken out of the urn. The number of white balls in the sample is a random variable Y, whose distribution is known as the hypergeometric distribution.
It depends on the three parameters N, W and n and will be denoted HG(N, W, n).
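To make the urn experiment concrete, the following minimal Python sketch draws n balls without replacement and counts the white ones, giving one realization of Y per trial. The helper names `draw_white_count` and `simulate` and the example values N = 20, W = 8, n = 5 are illustrative assumptions, not part of the original text.

```python
import random
from collections import Counter

def draw_white_count(N, W, n, rng=random):
    """Draw n balls without replacement from an urn holding W white and
    N - W red balls; return the number of white balls in the sample."""
    urn = ["white"] * W + ["red"] * (N - W)
    sample = rng.sample(urn, n)  # random.sample draws without replacement
    return sample.count("white")

def simulate(N, W, n, trials, seed=0):
    """Repeat the experiment `trials` times and tally the observed values of Y."""
    rng = random.Random(seed)
    return Counter(draw_white_count(N, W, n, rng) for _ in range(trials))

counts = simulate(N=20, W=8, n=5, trials=10_000)
for w in sorted(counts):
    # Relative frequency of each observed value w, an estimate of P{Y = w}
    print(w, counts[w] / 10_000)
```

The printed relative frequencies approximate the probabilities of HG(20, 8, 5); increasing `trials` tightens the approximation.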

 

Probability mass function of the hypergeometric distribution

We denote by P{Y = w} the probability that the sample contains exactly w white balls:

$$P\{Y = w\} = \frac{\binom{W}{w}\binom{R}{n-w}}{\binom{N}{n}}$$

with max(0, n - R) ≤ w ≤ min(W, n).
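The probability mass function translates directly into code. This sketch uses Python's `math.comb` for the binomial coefficients and returns 0 outside the support; the function name `hypergeom_pmf` and the parameter values are illustrative.

```python
from math import comb

def hypergeom_pmf(w, N, W, n):
    """P{Y = w} for Y ~ HG(N, W, n): C(W, w) C(R, n - w) / C(N, n), with R = N - W."""
    R = N - W
    if w < max(0, n - R) or w > min(W, n):
        return 0.0  # w lies outside the support of Y
    return comb(W, w) * comb(R, n - w) / comb(N, n)

N, W, n = 20, 8, 5
support = range(max(0, n - (N - W)), min(W, n) + 1)
probs = {w: hypergeom_pmf(w, N, W, n) for w in support}
print(probs)
print(sum(probs.values()))  # equals 1 up to floating-point rounding
```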

Mean of the hypergeometric distribution

$$\mu = E[Y] = n\frac{W}{N}$$

If p = W/N denotes the initial proportion of white balls in the urn, this can also be written:

$$\mu = np$$
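The mean formula can be checked exactly on small urns. This sketch computes the sum of w P{Y = w} with rational arithmetic (`fractions.Fraction`), so the comparison with nW/N involves no rounding; the parameter triples are arbitrary examples.

```python
from fractions import Fraction
from math import comb

def pmf(w, N, W, n):
    # Exact P{Y = w}; math.comb returns 0 when n - w exceeds N - W
    return Fraction(comb(W, w) * comb(N - W, n - w), comb(N, n))

def mean(N, W, n):
    # E[Y] = sum of w P{Y = w} over w = 0..min(W, n)
    return sum(w * pmf(w, N, W, n) for w in range(min(W, n) + 1))

for N, W, n in [(20, 8, 5), (10, 3, 7), (50, 25, 50)]:
    mu = mean(N, W, n)
    assert mu == Fraction(n * W, N)  # mu = nW/N = np
    print(N, W, n, mu)
```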

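Variance of the hypergeometric distribution

$$\sigma^2 = \operatorname{Var}(Y) = n\frac{W}{N}\frac{R}{N}\frac{N-n}{N-1}$$

If p denotes the initial proportion of white balls in the urn, this can also be written:

$$\sigma^2 = np(1 - p)\left(1 - \frac{n - 1}{N - 1}\right)$$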
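Both forms of the variance can be compared with the variance computed directly from the probability mass function. The sketch below again uses exact fractions; it assumes N ≥ 2 so that N - 1 is nonzero, and the test triples are arbitrary.

```python
from fractions import Fraction
from math import comb

def pmf(w, N, W, n):
    # Exact P{Y = w}; math.comb returns 0 when n - w exceeds N - W
    return Fraction(comb(W, w) * comb(N - W, n - w), comb(N, n))

def variance_from_pmf(N, W, n):
    # Var(Y) = sum of (w - mu)^2 P{Y = w} over w = 0..min(W, n)
    ws = range(min(W, n) + 1)
    mu = sum(w * pmf(w, N, W, n) for w in ws)
    return sum((w - mu) ** 2 * pmf(w, N, W, n) for w in ws)

def variance_formula(N, W, n):
    # sigma^2 = n p (1 - p) (1 - (n - 1)/(N - 1)), with p = W/N
    p = Fraction(W, N)
    return n * p * (1 - p) * (1 - Fraction(n - 1, N - 1))

for N, W, n in [(20, 8, 5), (10, 3, 7), (50, 25, 50)]:
    exact = variance_from_pmf(N, W, n)
    assert exact == variance_formula(N, W, n)
    # Same value in the form n (W/N) (R/N) (N - n)/(N - 1)
    assert exact == n * Fraction(W, N) * Fraction(N - W, N) * Fraction(N - n, N - 1)
    print(N, W, n, exact)
```

When n = N the whole urn is drawn, Y always equals W, and both expressions give a variance of 0.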

 

Note that, for a fixed value of p, σ² converges to np(1 - p), the variance of the binomial distribution with parameters (n, p), as N grows without limit. Can you explain why this should have been expected?
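A quick numerical look at this limit, assuming n = 10 and p = 0.3 (so that W = pN is an integer for every N used). It only tabulates the two variances side by side and leaves the question above open; the name `hg_variance` is illustrative.

```python
def hg_variance(N, n, p):
    # Variance of HG(N, W, n) written in terms of p = W/N
    return n * p * (1 - p) * (1 - (n - 1) / (N - 1))

n, p = 10, 0.3
binomial_var = n * p * (1 - p)  # variance of the binomial distribution (n, p)
for N in [20, 100, 1_000, 10_000, 100_000]:
    print(N, hg_variance(N, n, p), binomial_var)
```

The gap between the two columns is np(1 - p)(n - 1)/(N - 1), which shrinks like 1/N.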

________________________________________________________

 

Tutorial 1

 

These results are demonstrated in the following Tutorial. It relies on a technique that is often useful with discrete distributions: representing the number of observations in a bin (here, the number of white balls in the sample) as a sum of Bernoulli variables. Other examples of this approach can be found in the study of the binomial and multinomial distributions.
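The decomposition can be summarized as follows. This is a sketch of the standard argument, with the indicator notation X_i introduced here for illustration. By symmetry, the i-th ball drawn is equally likely to be any of the N balls, so P{X_i = 1} = W/N for every i, and any two distinct draws are equally likely to be any ordered pair of distinct balls.

```latex
\begin{aligned}
Y &= \sum_{i=1}^{n} X_i, \qquad X_i = 1 \text{ if the } i\text{-th ball drawn is white, else } 0 \\
P\{X_i = 1\} &= \frac{W}{N} = p, \qquad E[Y] = \sum_{i=1}^{n} E[X_i] = np \\
\operatorname{Var}(X_i) &= p(1-p) \\
\operatorname{Cov}(X_i, X_j) &= \frac{W(W-1)}{N(N-1)} - \frac{W^2}{N^2} = -\frac{p(1-p)}{N-1} \qquad (i \neq j) \\
\operatorname{Var}(Y) &= np(1-p) + n(n-1)\left(-\frac{p(1-p)}{N-1}\right) = np(1-p)\left(1 - \frac{n-1}{N-1}\right)
\end{aligned}
```

The negative covariance reflects sampling without replacement: drawing a white ball makes the next white draw slightly less likely.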

 

 

THE HYPERGEOMETRIC DISTRIBUTION : BASIC PROPERTIES

Probability mass function of the hypergeometric distribution

Mean of the hypergeometric distribution

Number of white balls as a sum of Bernoulli variables

Calculating the mean

Variance of the hypergeometric distribution

Variances of the auxiliary Bernoulli variables

Covariances of the auxiliary Bernoulli variables

Variance of the hypergeometric distribution

TUTORIAL

* All parameters are adjustable.
* Progressive build-up of the histogram.

________________________________________________________

 

 

Tutorial 2

 

We then address a setting in which the hypergeometric distribution turns up somewhat unexpectedly:

* Let X and Y be two independent binomial random variables with the same parameter p but possibly different sizes m and n (here n is the size of the second binomial, not the sample size used above). Choose an integer k and consider the distribution of X conditional on X + Y = k. This distribution is hypergeometric, namely HG(m + n, m, k), and it does not depend on p.
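The property can be checked numerically. This sketch computes P{X = x | X + Y = k} from the definition for several values of p and compares it with HG(m + n, m, k); the sizes m = 6, n = 9 and the sum k = 5 are arbitrary choices, and the function names are illustrative.

```python
from math import comb

def binom_pmf(x, size, p):
    # P{X = x} for X ~ B(size, p)
    return comb(size, x) * p**x * (1 - p)**(size - x)

def conditional_pmf(x, m, n, k, p):
    """P{X = x | X + Y = k} for independent X ~ B(m, p), Y ~ B(n, p)."""
    joint = binom_pmf(x, m, p) * binom_pmf(k - x, n, p)
    total = sum(binom_pmf(j, m, p) * binom_pmf(k - j, n, p)
                for j in range(max(0, k - n), min(m, k) + 1))
    return joint / total

def hypergeom_pmf(x, N, W, n):
    # P{Y = x} for Y ~ HG(N, W, n)
    return comb(W, x) * comb(N - W, n - x) / comb(N, n)

m, n, k = 6, 9, 5
for p in (0.1, 0.5, 0.8):
    cond = [conditional_pmf(x, m, n, k, p) for x in range(min(m, k) + 1)]
    hg = [hypergeom_pmf(x, m + n, m, k) for x in range(min(m, k) + 1)]
    # Largest discrepancy: only floating-point noise, whatever the value of p
    print(p, max(abs(a - b) for a, b in zip(cond, hg)))
```

The factor p^k(1 - p)^(m + n - k) is common to the numerator and the denominator and cancels, which is why p disappears.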

This important property of the binomial distribution is illustrated by an instructive interactive animation.

 

This property is the starting point of the Fisher-Irwin test, which tests the null hypothesis H0 that two Bernoulli populations have the same value of the parameter p.
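To show how the property is used, here is a minimal sketch of a one-sided p-value. It assumes x successes in a sample of size m from the first population, y successes in a sample of size n from the second, and the alternative that the first population has the larger p. Under H0, X given X + Y = k is HG(m + n, m, k), so the p-value is the upper tail of that distribution. The function name `fisher_irwin_upper_pvalue` is hypothetical, and the two-sided version and the conventions of the Fisher-Irwin tutorial may differ.

```python
from math import comb

def hypergeom_pmf(x, N, W, n):
    # P{Y = x} for Y ~ HG(N, W, n)
    return comb(W, x) * comb(N - W, n - x) / comb(N, n)

def fisher_irwin_upper_pvalue(x, m, y, n):
    """One-sided p-value P{X >= x | X + Y = k} under H0, with k = x + y."""
    k = x + y
    return sum(hypergeom_pmf(j, m + n, m, k) for j in range(x, min(m, k) + 1))

# Example: 7 successes out of 10 in the first sample, 2 out of 10 in the second
print(fisher_irwin_upper_pvalue(7, 10, 2, 10))
```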
 

DISTRIBUTION OF TWO INDEPENDENT BINOMIAL VARIABLES

CONDITIONAL ON THEIR SUM

_______________________________________
 

 Interactive animation

* Parameters of the two binomials are adjustable.
* Sum is adjustable.
* Progressive build-up of the histogram.

TUTORIAL

 

_______________________________________________________

 

Related readings:

The binomial distribution

The Fisher-Irwin test
