Geometric distribution
You're playing "Heads and Tails". You decide to play until the first "Head" turns up, and then stop the game. How many times will you have to toss the coin?
After the first game, you decide to play a second game. You expect that the number of tosses needed to "win" this second game may well differ from the number of tosses you needed to win the first one. Therefore the number X of tosses needed to finally obtain a "Head" is a random variable (r.v.). By definition, the probability distribution of this r.v. is the geometric distribution.
Clearly, this probability distribution depends on the bias of the coin. Let p be the probability of a "Head" (and therefore q = 1 - p the probability of a "Tail"). For a fair coin, p = 0.5. But if p = 0.9, you would expect the games to be, on average, much shorter than with a fair coin.
The Geometric Distribution is sometimes defined as the distribution of the number of tosses before the first "Head" turns up (instead of the number of tosses needed to obtain the first "Head"). Let X' denote this new r.v. Clearly:
X' = X - 1
As an exercise, you may derive the basic properties of this r.v. (see below):
* Either by direct calculation,
* Or by using the results on functions of an r.v., as X' is just a translation of X.
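For readers who want to check their answer numerically, here is a small sketch (`shifted_geometric_moments` is a hypothetical helper written for this illustration). It sums the series for X' directly, using P{X' = k} = p q^k for k = 0, 1, 2, ..., and compares the results with what the translation argument predicts from the mean 1/p and variance q/p² of X: the mean drops by 1, the variance is unchanged.

```python
def shifted_geometric_moments(p, kmax=2000):
    """Mean and variance of X' = X - 1 by direct (truncated) summation.

    X' takes the values k = 0, 1, 2, ... with P{X' = k} = p * q**k.
    The tail beyond kmax is negligible for moderate values of p.
    """
    q = 1.0 - p
    pmf = [p * q**k for k in range(kmax)]
    mean = sum(k * w for k, w in enumerate(pmf))
    second_moment = sum(k * k * w for k, w in enumerate(pmf))
    return mean, second_moment - mean**2


p = 0.3
q = 1.0 - p
mean_xp, var_xp = shifted_geometric_moments(p)
print(mean_xp, q / p)      # mean of X' = 1/p - 1 = q/p
print(var_xp, q / p**2)    # variance of X' = variance of X = q/p**2
```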
This animation illustrates the concept of the Geometric Distribution.
The probability p: slide the limit between the gray and white areas of the upper rectangle with your mouse. This changes the value of the probability p, which is equal to the ratio of the length of the white area to the total length of the rectangle.
Lower frame:
* The lower frame displays the Geometric Distribution for the selected value of p. This distribution changes with p: it becomes "flatter" and stretches out to the right for small values of p, but always keeps its exponential character, as illustrated by the red exponential curve of parameter q. For large values of p, the probability of having to wait for more than a few tosses before "Heads" shows up becomes negligible.
* It is sometimes perceived as surprising that "Heads" occurs more frequently in the first position than in any other position, irrespective of the value of p. For very small values of p, shouldn't we expect it to be very unlikely that "Heads" appears right away? Shouldn't we rather expect to wait a certain number of tosses before the probability of getting "Heads" becomes substantial? Consequently, shouldn't we expect the value of k with the largest probability to lie somewhat "to the right", that is, to be larger than 1? This remark illustrates the difference between the "mode" (the value of the r.v. with the largest probability) and the "mean". True enough, the average number of tosses before getting "Heads" for the first time becomes larger for smaller values of p. Yet "Heads" will always show up more frequently in the first position than in any other position: since P{X = k + 1} = q P{X = k}, each probability is smaller than the previous one.
* The vertical blue line marks the mean of the distribution.
* Note that as p runs across the available range, the height of each pink bin goes through a maximum. For a given k, it seems that this maximum occurs when the mean of the distribution is precisely equal to k. This is true. Can you demonstrate it?
Animation:
* Click on "Go", and observe the progressive build-up of the histogram of the geometric distribution.
* Click on "Pause", then on "Next". A new sample is built point after point. The build-up goes on as long as the new points fall in the gray area (which happens, for each new toss, with probability q = 1 - p), and stops at the first (red) point falling in the white area (which happens, for each new toss, with probability p). Then click again on "Next" to start building a new sample, and so on.
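The animation's sampling procedure can be mimicked in a few lines. This is a sketch only: `play_game`, `build_histogram` and `bin_height` are hypothetical names, and Python's pseudo-random generator stands in for the animation's random points. It plays many games, tallies the number of tosses per game, and shows that the most frequent value is 1 even when p = 0.1, while the average is close to 1/p. It then scans a grid of values of p to check numerically that, for a given k, the height of bin k is largest at p = 1/k.

```python
import random


def play_game(p, rng):
    """Toss a coin with P(Head) = p until the first Head; return the number of tosses X."""
    tosses = 1
    while rng.random() >= p:   # a Tail, which happens with probability q = 1 - p
        tosses += 1
    return tosses


def build_histogram(p, n_games, seed=0):
    """Play n_games games and count, for each k, how many needed exactly k tosses."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_games):
        k = play_game(p, rng)
        counts[k] = counts.get(k, 0) + 1
    return counts


# Mode versus mean: even for a small p, k = 1 is the most frequent value.
p = 0.1
counts = build_histogram(p, 100_000)
mode = max(counts, key=counts.get)
mean = sum(k * c for k, c in counts.items()) / 100_000
print("mode:", mode, "empirical mean:", mean, "theoretical mean:", 1 / p)


def bin_height(k, p):
    """Height of bin k: P{X = k} = p * q**(k - 1)."""
    return p * (1.0 - p) ** (k - 1)


# For a fixed k, scan p over (0, 1) and keep the p giving the tallest bin.
grid = [i / 10_000 for i in range(1, 10_000)]
best_p = {k: max(grid, key=lambda p: bin_height(k, p)) for k in (1, 2, 5, 10)}
print(best_p)   # each value is close to 1/k, i.e. the mean 1/p is close to k
```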
We'll establish the following properties of the geometric distribution.
The probability for the first "Head" to turn up at toss #k is:
$P\{X = k\} = p\,q^{k-1}, \quad k = 1, 2, 3, \dots$
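The formula reads directly off the sequence of tosses: the first k - 1 tosses must all be Tails (probability q^(k-1)) and the k-th toss a Head (probability p), the tosses being independent. A minimal sketch evaluating P{X = k} = p q^(k-1) and checking that the probabilities add up to 1 (`geometric_pmf` is a hypothetical helper name):

```python
def geometric_pmf(k, p):
    """P{X = k} = p * q**(k - 1) for k = 1, 2, 3, ...; zero otherwise."""
    if k < 1:
        return 0.0
    return p * (1.0 - p) ** (k - 1)


p = 0.25
first_three = [geometric_pmf(k, p) for k in (1, 2, 3)]
print(first_three)   # [0.25, 0.1875, 0.140625]

# The probabilities form a geometric series with ratio q, so they sum to 1.
total = sum(geometric_pmf(k, p) for k in range(1, 500))
print(total)         # very close to 1
```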
By definition, F(n) is the probability that it takes at most n tosses for the first "Head" to turn up:
$F(n) = P\{X \le n\} = 1 - q^{n}$
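The closed form F(n) = P{X ≤ n} = 1 - q^n follows from the complement: X > n exactly when the first n tosses are all Tails, which has probability q^n. A short sketch comparing it with the partial sums of the probabilities p q^(k-1) (`geometric_cdf` is a hypothetical helper name):

```python
def geometric_cdf(n, p):
    """F(n) = P{X <= n} = 1 - q**n, since X > n means the first n tosses are all Tails."""
    return 1.0 - (1.0 - p) ** n


p = 0.3
for n in (1, 2, 5, 10):
    partial_sum = sum(p * (1.0 - p) ** (k - 1) for k in range(1, n + 1))
    print(n, geometric_cdf(n, p), partial_sum)   # the two columns agree up to rounding
```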
The mean of the geometric distribution is:
µ = 1/p
Its variance is:
σ² = q/p²
-----
This result can also be written as:
σ² = µ(µ - 1)
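A direct numerical check of µ = 1/p, σ² = q/p² and σ² = µ(µ - 1), as a sketch: `geometric_moments` is a hypothetical helper, and the infinite series are truncated at a point where the remaining terms are negligible for the values of p used.

```python
def geometric_moments(p, kmax=5000):
    """Mean and variance of X from the truncated series sum_k k^j * p * q**(k - 1)."""
    q = 1.0 - p
    mean = sum(k * p * q ** (k - 1) for k in range(1, kmax))
    second_moment = sum(k * k * p * q ** (k - 1) for k in range(1, kmax))
    return mean, second_moment - mean ** 2


for p in (0.5, 0.2, 0.05):
    q = 1.0 - p
    mu, var = geometric_moments(p)
    # Compare with 1/p, q/p**2 and mu * (mu - 1).
    print(p, mu, 1 / p, var, q / p ** 2, mu * (mu - 1))
```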
The significance of this expression will appear when we consider the geometric distribution as belonging to the natural exponential family.
We show that the moment generating function of the geometric distribution is:
$M_X(t) = E[e^{tX}] = \dfrac{p\,e^{t}}{1 - q\,e^{t}}, \quad t < -\ln q$
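The m.g.f. of the geometric distribution is M(t) = p e^t / (1 - q e^t). This closed form comes from summing a geometric series with ratio q e^t, which converges only when q e^t < 1, that is, t < -ln q. The sketch below (`geometric_mgf` and `mgf_by_series` are hypothetical helper names) compares the closed form with a truncated version of the defining series E[e^(tX)] = Σ e^(tk) p q^(k-1).

```python
import math


def geometric_mgf(t, p):
    """Closed form M(t) = p e^t / (1 - q e^t), assuming 0 < p < 1 and t < -ln(q)."""
    q = 1.0 - p
    if t >= -math.log(q):
        raise ValueError("E[exp(tX)] is infinite for t >= -ln(q)")
    return p * math.exp(t) / (1.0 - q * math.exp(t))


def mgf_by_series(t, p, kmax=3000):
    """Truncated series sum_k exp(t k) * p * q**(k - 1), i.e. E[exp(tX)] computed directly.

    Each term is regrouped as p * e^t * (q e^t)**(k - 1) to avoid overflowing exp(t k).
    """
    q = 1.0 - p
    ratio = q * math.exp(t)
    return sum(p * math.exp(t) * ratio ** (k - 1) for k in range(1, kmax))


p = 0.4
for t in (-1.0, 0.0, 0.2, 0.4):   # -ln(0.6) is about 0.51, so all four values are admissible
    print(t, geometric_mgf(t, p), mgf_by_series(t, p))
```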
The concept of a "memoryless process" is explained in the context of the exponential distribution. This property is expressed by:
P{X > s + t | X > s} = P{X > t}
For the Geometric Distribution, this translates into:
$P\{X > m + n \mid X > m\} = P\{X > n\}, \quad m, n = 0, 1, 2, \dots$
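Since P{X > n} = q^n (the first n tosses are all Tails), the conditional probability P{X > m + n | X > m} equals q^(m+n) / q^m = q^n = P{X > n}, which is the whole proof. A small numerical sketch of the same computation (`survival` is a hypothetical helper name):

```python
def survival(n, p):
    """P{X > n} = q**n: the first n tosses are all Tails."""
    return (1.0 - p) ** n


p = 0.35
for m, n in [(0, 3), (2, 3), (10, 1), (5, 7)]:
    conditional = survival(m + n, p) / survival(m, p)   # P{X > m + n | X > m}
    print(m, n, conditional, survival(n, p))            # equal up to rounding
```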
This characteristic property illustrates the close relationship between the Geometric Distribution and the Exponential Distribution. This relationship is made explicit in the above interactive animation.
In fact, the Geometric Distribution may be perceived as the discrete approximation of the Exponential Distribution. This link can be formalized as follows. Suppose that the delay between two consecutive tosses is made to tend to 0, while the probability p of a "Head" on each toss shrinks in proportion to that delay. Then we'll show that the distribution function of the waiting time before the first "Head" converges to that of an exponential distribution.
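A sketch of this limit, under the assumption that tosses take place every dt time units and that p = λ·dt for a fixed rate λ (the names `lam`, `dt` and `waiting_time_survival` are hypothetical). The waiting time T = X·dt then satisfies P{T > t} = (1 - λ·dt)^⌊t/dt⌋, which tends to e^(-λt), the survival function of an exponential distribution with parameter λ.

```python
import math


def waiting_time_survival(t, lam, dt):
    """P{T > t} for T = X * dt, X geometric with p = lam * dt (requires lam * dt < 1).

    T > t exactly when X > floor(t / dt), i.e. the first floor(t / dt) tosses are Tails.
    """
    p = lam * dt
    return (1.0 - p) ** math.floor(t / dt)


lam, t = 2.0, 0.75
exact = math.exp(-lam * t)   # exponential survival function P{T > t} = exp(-lam t)
for dt in (0.1, 0.01, 0.001, 1e-5):
    approx = waiting_time_survival(t, lam, dt)
    print(dt, approx, exact, abs(approx - exact))   # the gap shrinks as dt -> 0
```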
This connection between the Geometric and Exponential distributions is the key to an intuitive interpretation of Poisson processes.
Note that we also calculate the mean and the moment generating function of the geometric distribution by methods that involve only the memoryless property and conditional expectations.
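For the mean, such an argument can run as follows. Condition on the first toss: with probability p it is a "Head" and X = 1; with probability q it is a "Tail", and by the memoryless property the number of tosses still needed has the same distribution as X, so that E[X | first toss is a Tail] = 1 + E[X]. The same conditioning applied to E[e^(tX)] gives the m.g.f.:

```latex
\begin{aligned}
E[X] &= p \cdot 1 + q\,\bigl(1 + E[X]\bigr)
  \;\Longrightarrow\; (1 - q)\,E[X] = 1
  \;\Longrightarrow\; E[X] = \frac{1}{p},\\[4pt]
M(t) &= p\,e^{t} + q\,e^{t} M(t)
  \;\Longrightarrow\; M(t) = \frac{p\,e^{t}}{1 - q\,e^{t}},
  \qquad q\,e^{t} < 1.
\end{aligned}
```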
The geometric distribution is a special case of a more general distribution called the "negative binomial distribution". Because it is quite a bit simpler than the negative binomial distribution, we give the geometric distribution a separate treatment.
____________________________________________________________
Tutorial
These results are demonstrated in this Tutorial.
We also clarify the relationship between the geometric and the exponential distributions by showing that a geometric r.v. may be considered as resulting from the discretization of an exponential r.v. (The reverse relation is addressed here).
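In fact this discretization is exact, not only a limit: if Y is exponential with parameter λ, then ⌈Y⌉ (Y rounded up to the next integer) equals k exactly when k - 1 < Y ≤ k, which has probability e^(-λ(k-1)) - e^(-λk) = p q^(k-1) with q = e^(-λ) and p = 1 - e^(-λ). The simulation sketch below checks this; the value λ = 0.4 and the sample size are arbitrary choices.

```python
import math
import random
from collections import Counter

lam = 0.4                    # rate of the exponential r.v. Y (arbitrary choice)
q = math.exp(-lam)           # P{Y > 1}: "Tail" during one unit interval
p = 1.0 - q                  # P{Y <= 1}: "Head" during one unit interval

rng = random.Random(1)
n = 200_000
# Discretize: round each exponential draw up to the next integer.
# max(1, ...) guards against a draw of exactly 0.0, which has probability zero in theory.
samples = [max(1, math.ceil(rng.expovariate(lam))) for _ in range(n)]
counts = Counter(samples)

for k in range(1, 6):
    empirical = counts[k] / n
    theoretical = p * q ** (k - 1)
    print(k, round(empirical, 4), round(theoretical, 4))

print(sum(samples) / n, 1 / p)   # empirical mean versus the geometric mean 1/p
```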
-----
The mean and variance of the geometric distribution are also calculated here by calling on the properties of the generating function.
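Differentiating the m.g.f. at t = 0 yields the moments: M'(0) = E[X] and M''(0) = E[X²], so the variance is M''(0) - M'(0)². The sketch below checks this numerically from the closed form M(t) = p e^t / (1 - q e^t), using central finite differences; the finite differences are a numerical shortcut chosen for this illustration, not the symbolic differentiation one would do by hand, and `geometric_mgf` is a hypothetical helper name.

```python
import math


def geometric_mgf(t, p):
    """M(t) = p e^t / (1 - q e^t), valid for 0 < p < 1 and t < -ln(q)."""
    q = 1.0 - p
    return p * math.exp(t) / (1.0 - q * math.exp(t))


p = 0.25
q = 1.0 - p
h = 1e-4   # finite-difference step, small compared with -ln(q), about 0.29

# Central differences approximate M'(0) = E[X] and M''(0) = E[X^2].
first_moment = (geometric_mgf(h, p) - geometric_mgf(-h, p)) / (2 * h)
second_moment = (geometric_mgf(h, p) - 2 * geometric_mgf(0.0, p) + geometric_mgf(-h, p)) / h ** 2

print(first_moment, 1 / p)                             # mean
print(second_moment - first_moment ** 2, q / p ** 2)   # variance
```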
THE GEOMETRIC DISTRIBUTION
* Probability mass function of the geometric distribution
* The cumulative distribution function
* Mean µ (direct calculation)
* Variance (direct calculation)
* Moment generating function
* M.g.f.: mean
* M.g.f.: variance
* Memoryless property
* Geometric r.v. as a discretized exponential r.v.
____________________________________________________