Hypergeometric distribution
An urn contains N balls, of which:
* W are white,
* R are not white,
with W + R = N.
n balls are drawn from the urn, without replacement.
That is, n balls (the sample) are randomly selected and taken out of the urn. The
number of white balls in the sample is a random variable Y, whose distribution
is known as the hypergeometric distribution.
It depends on the three
parameters N, W and n and will be denoted HG(N, W, n).
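We denote by P{Y = w} the probability that the sample contains exactly w white balls:
P{Y = w} = C(W, w) C(R, n - w) / C(N, n)
with max(0, n - R) ≤ w ≤ min(W, n), where C(a, b) denotes the binomial coefficient "a choose b".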
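The probability mass function can be evaluated directly from binomial coefficients. The following is a minimal sketch, not part of the original page: the function name `hypergeom_pmf` and the example values N = 20, W = 7, n = 5 are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pmf(w, N, W, n):
    """P{Y = w} for Y ~ HG(N, W, n): w white balls in a sample of size n."""
    R = N - W  # number of non-white balls in the urn
    # Outside the support max(0, n - R) <= w <= min(W, n) the probability is 0.
    if w < max(0, n - R) or w > min(W, n):
        return 0.0
    return comb(W, w) * comb(R, n - w) / comb(N, n)

# Hypothetical example: 20 balls, 7 of them white, sample of 5 balls.
N, W, n = 20, 7, 5
pmf = [hypergeom_pmf(w, N, W, n) for w in range(n + 1)]
for w, prob in enumerate(pmf):
    print(f"P{{Y = {w}}} = {prob:.4f}")
print("sum of probabilities:", sum(pmf))
```

Summing the probabilities over the support is a quick sanity check that the support bounds are handled correctly.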
The mean of the distribution is:
µ = nW/N
If p = W/N denotes the initial proportion of white balls in the urn, this can also be written:
µ = np
The variance of the distribution is:
s² = n(W/N)(R/N)(N - n)/(N - 1)
If p = W/N denotes the initial proportion of white balls in the urn, this can also be written:
s² = np(1 - p)(1 - (n - 1)/(N - 1))
Note that, for a given value of p, s² converges to np(1 - p), the variance of the binomial distribution with parameters (n, p), as N grows without limit. Can you explain why this should have been expected?
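The closed forms for the mean and variance, and the convergence toward the binomial variance, can be checked numerically. This is a sketch under stated assumptions: the helper names `hg_moments` and `closed_form`, the sample size n = 5 and the proportion p = 7/20 are illustrative choices, not values from the original page.

```python
from math import comb

def hg_moments(N, W, n):
    """Mean and variance of HG(N, W, n), computed by summing over the pmf."""
    R = N - W
    lo, hi = max(0, n - R), min(W, n)
    pmf = {w: comb(W, w) * comb(R, n - w) / comb(N, n) for w in range(lo, hi + 1)}
    mean = sum(w * q for w, q in pmf.items())
    var = sum((w - mean) ** 2 * q for w, q in pmf.items())
    return mean, var

def closed_form(N, W, n):
    """Mean np and variance np(1 - p)(1 - (n - 1)/(N - 1)), with p = W/N."""
    p = W / N
    return n * p, n * p * (1 - p) * (1 - (n - 1) / (N - 1))

n = 5                                # hypothetical sample size
binomial_var = n * 0.35 * 0.65       # np(1 - p) for p = 7/20
for N in (20, 200, 2000, 20000):
    W = 7 * N // 20                  # keeps the proportion p = 7/20 fixed
    mean, var = hg_moments(N, W, n)
    _, var_formula = closed_form(N, W, n)
    print(f"N={N:6d}  mean={mean:.4f}  var={var:.6f}  "
          f"formula={var_formula:.6f}  binomial={binomial_var:.6f}")
```

As N grows with p fixed, the printed variance approaches the binomial value np(1 - p), while the mean stays at np.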
________________________________________________________
Tutorial 1
These results are proved in the following Tutorial. We call on a technique that is often useful with discrete distributions, and that consists in representing the number of observations in a bin (here, the number of white balls in the sample) as a sum of Bernoulli variables. Other examples of this approach can be found when we study the binomial and multinomial distributions.
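The Bernoulli-sum technique can be outlined as follows. This is a sketch of one standard route, assuming the auxiliary variable X_i indicates whether the i-th ball drawn is white; the notation X_i is introduced here for illustration, and the Tutorial's own presentation may differ.

```latex
% X_i = 1 if the i-th ball drawn is white, 0 otherwise; p = W/N.
\begin{align*}
Y &= \sum_{i=1}^{n} X_i, \qquad P\{X_i = 1\} = \frac{W}{N} = p
  \quad \text{(by symmetry, for every } i\text{)} \\
E[Y] &= \sum_{i=1}^{n} E[X_i] = np \\
\operatorname{Var}(X_i) &= p(1 - p) \\
\operatorname{Cov}(X_i, X_j) &= \frac{W(W - 1)}{N(N - 1)} - \frac{W^2}{N^2}
  = -\frac{p(1 - p)}{N - 1} \qquad (i \neq j) \\
\operatorname{Var}(Y) &= \sum_{i=1}^{n} \operatorname{Var}(X_i)
  + \sum_{i \neq j} \operatorname{Cov}(X_i, X_j) \\
  &= np(1 - p) - n(n - 1)\,\frac{p(1 - p)}{N - 1}
  = np(1 - p)\left(1 - \frac{n - 1}{N - 1}\right)
\end{align*}
```

The negative covariance reflects sampling without replacement: drawing a white ball makes the next draw slightly less likely to be white.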
THE HYPERGEOMETRIC DISTRIBUTION: BASIC PROPERTIES
* Probability mass function of the hypergeometric distribution
* Mean of the hypergeometric distribution
* Sample size as a sum of Bernoulli variables
* Calculating the mean
* Variance of the hypergeometric distribution
* Variances of the auxiliary Bernoulli variables
* Covariances of the auxiliary Bernoulli variables
* Variance of the hypergeometric distribution
Interactive animation:
* All parameters are adjustable.
* Progressive build-up of the histogram.
________________________________________________________
Tutorial 2
We then address an issue where the hypergeometric distribution turns up a bit unexpectedly:
* Let X and Y be two independent binomial random variables, with the same p but different sizes m and n. Choose an integer k, and consider the distribution of X under the condition X + Y = k. This distribution is hypergeometric, namely HG(m + n, m, k) in the notation above, and does not depend on p.
This important property of the binomial distribution is illustrated by a quite instructive interactive animation.
It is the starting point of the Fisher-Irwin test, whose purpose is to test the hypothesis H0 that two Bernoulli populations have the same value of the parameter p.
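This property can be checked numerically by computing the conditional distribution of X given X + Y = k for several values of p and comparing it with HG(m + n, m, k). The sketch below is illustrative only: the function names and the values m = 8, n = 12, k = 6 are assumptions, not taken from the original page.

```python
from math import comb

def binom_pmf(x, size, p):
    """P{X = x} for X ~ B(size, p)."""
    return comb(size, x) * p**x * (1 - p) ** (size - x)

def conditional_pmf(x, m, n, k, p):
    """P{X = x | X + Y = k} for independent X ~ B(m, p) and Y ~ B(n, p)."""
    lo, hi = max(0, k - n), min(m, k)  # values of X compatible with X + Y = k
    joint = binom_pmf(x, m, p) * binom_pmf(k - x, n, p)
    total = sum(binom_pmf(i, m, p) * binom_pmf(k - i, n, p)
                for i in range(lo, hi + 1))
    return joint / total

def hypergeom_pmf(x, N, W, k):
    """P{Y = x} for Y ~ HG(N, W, k)."""
    return comb(W, x) * comb(N - W, k - x) / comb(N, k)

m, n, k = 8, 12, 6                     # hypothetical sizes and conditioning sum
support = range(max(0, k - n), min(m, k) + 1)
for p in (0.2, 0.5, 0.9):
    gap = max(abs(conditional_pmf(x, m, n, k, p) - hypergeom_pmf(x, m + n, m, k))
              for x in support)
    print(f"p = {p}: largest gap to HG(m + n, m, k) = {gap:.2e}")
```

The gap stays at rounding-error level for every p, because the factors p^k (1 - p)^(m + n - k) cancel between numerator and denominator.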
DISTRIBUTION OF TWO INDEPENDENT BINOMIAL VARIABLES
CONDITIONAL ON THEIR SUM
_______________________________________________________