Multinomial distribution
The binomial distribution B(n, p) describes the number of Heads obtained in n independent tosses of a coin (Heads and Tails), with p being the probability that the coin lands on Heads.
The multinomial distribution is a generalization of the binomial distribution: each "toss" can now produce more than two outcomes. For example, one may imagine rolling a "die" with k faces r_1, r_2, ..., r_k, with p_i being the probability that the die lands on face r_i.
A multinomial distribution is entirely characterized by:
* n, the number of times the die is rolled,
* the set of probabilities {p_1, p_2, ..., p_k}, with p_1 + p_2 + ... + p_k = 1.
-----
After rolling the die n times, we denote by n_i the number of times the die landed face r_i up. We therefore have n_1 + n_2 + ... + n_k = n. Because rolling a die is a random experiment, the n_i are the realizations of k random variables that we denote N_i (i = 1, 2, ..., k). These variables are not independent, as they are linked by the relation N_1 + N_2 + ... + N_k = n.
The multinomial distribution Mult(n, p_1, p_2, ..., p_k) is the joint distribution of the k random variables N_i. It is therefore a multivariate, discrete distribution. Its support is the set of k-tuples of non-negative integers (n_1, n_2, ..., n_k) such that n_1 + n_2 + ... + n_k = n.
The distribution Mult(n, p_1, p_2, ..., p_k) is determined by the probabilities of each of the possible k-tuples. We denote these probabilities by P{N_1 = n_1, ..., N_k = n_k}.
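We'll show that:
$$P\{N_1 = n_1, \dots, N_k = n_k\} = \frac{n!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1}\, p_2^{n_2} \cdots p_k^{n_k}$$
for all the k-tuples within the support of the distribution (and 0 otherwise).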
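As a quick numerical check of this formula (a check, not a proof), here is a minimal Python sketch. It uses exact rational arithmetic from the standard `fractions` module; the helper name `multinomial_pmf` and its argument order are our own choices, not a library API.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

def multinomial_pmf(n, probs, counts):
    """Hypothetical helper: P{N_1 = n_1, ..., N_k = n_k} for Mult(n, p_1, ..., p_k), 0 off the support."""
    if len(counts) != len(probs) or any(c < 0 for c in counts) or sum(counts) != n:
        return Fraction(0)
    coef = factorial(n)              # multinomial coefficient n! / (n_1! ... n_k!)
    for c in counts:
        coef //= factorial(c)        # every partial quotient is an integer
    return coef * prod(Fraction(p) ** c for p, c in zip(probs, counts))

probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(multinomial_pmf(2, probs, [1, 1, 0]))   # 1/3

# The probabilities of all the k-tuples in the support add up to 1
n = 4
support = [c for c in product(range(n + 1), repeat=len(probs)) if sum(c) == n]
print(sum(multinomial_pmf(n, probs, c) for c in support))   # 1
```

With k = 2 and probabilities (p, 1 - p), the same function reproduces the binomial probabilities.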
-----
The binomial distribution is a special case with k = 2.
The term n!/(n_1! n_2! ... n_k!) is called the multinomial coefficient. It is the number of different "words" that can be written with an alphabet of k letters by using the first letter n_1 times, the second letter n_2 times, and so on.
We'll justify this result in two different ways.
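The "words" interpretation can be checked by brute force for small n: list every ordering of the letters, discard duplicates, and count. The sketch below assumes the letters are "a", "b", "c", ... and that n is small enough for `itertools.permutations` to enumerate all n! orderings; both function names are hypothetical.

```python
from itertools import permutations
from math import factorial

def multinomial_coefficient(counts):
    """n! / (n_1! n_2! ... n_k!) with n = n_1 + ... + n_k."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

def count_distinct_words(counts):
    """Brute force: number of distinct words using letter j exactly counts[j] times."""
    letters = "".join(chr(ord("a") + j) * c for j, c in enumerate(counts))
    return len(set(permutations(letters)))   # n! orderings, duplicates removed

print(multinomial_coefficient([2, 2, 1]), count_distinct_words([2, 2, 1]))   # 30 30
```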
-----
Note that the multinomial coefficient is also the coefficient of the monomial x_1^{n_1} x_2^{n_2} ... x_k^{n_k} in the expansion of (x_1 + x_2 + ... + x_k)^n, hence the name of the distribution.
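This can be seen by expanding the product of the n factors (x_1 + ... + x_k) one factor at a time: each factor contributes one x_j, so each monomial's coefficient counts the "words" described above. The sketch below represents a polynomial as a `Counter` keyed by exponent tuples; this representation and the name `expand_power` are our own assumptions.

```python
from collections import Counter
from math import factorial, prod

def expand_power(k, n):
    """Coefficients of (x_1 + ... + x_k)^n, keyed by exponent tuples (n_1, ..., n_k)."""
    poly = Counter({(0,) * k: 1})                 # start from the constant polynomial 1
    for _ in range(n):                            # multiply n times by (x_1 + ... + x_k)
        new_poly = Counter()
        for exps, coef in poly.items():
            for j in range(k):                    # choosing x_j in this factor raises n_j by 1
                new_poly[exps[:j] + (exps[j] + 1,) + exps[j + 1:]] += coef
        poly = new_poly
    return poly

coeffs = expand_power(3, 4)
print(coeffs[(2, 1, 1)])   # 12 = 4! / (2! 1! 1!)
print(all(c == factorial(4) // prod(factorial(e) for e in exps) for exps, c in coeffs.items()))   # True
```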
Marginal distributions of the multinomial distribution
Recall that N_i is the number of observations in category r_i.
Consider N_i. On each draw:
* the result is r_i with probability p_i,
* and therefore the result is "not r_i" (that is, any other category) with probability (1 - p_i).
Since the n draws are independent, N_i is binomial B(n, p_i).
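This marginal result can be confirmed exactly for a small example by summing the joint probabilities over all the other coordinates and comparing with the binomial probabilities. A minimal sketch, assuming exact `Fraction` arithmetic and brute-force enumeration of the support (the helper names are hypothetical):

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, prod

def multinomial_pmf(n, probs, counts):
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef * prod(Fraction(p) ** c for p, c in zip(probs, counts))

def marginal_pmf(n, probs, i):
    """Distribution of N_i, obtained by summing the joint pmf over the other coordinates."""
    marg = [Fraction(0)] * (n + 1)
    for counts in product(range(n + 1), repeat=len(probs)):
        if sum(counts) == n:                      # keep only the support
            marg[counts[i]] += multinomial_pmf(n, probs, counts)
    return marg

def binomial_pmf(n, p):
    return [comb(n, m) * Fraction(p) ** m * (1 - Fraction(p)) ** (n - m) for m in range(n + 1)]

n, probs = 5, [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(all(marginal_pmf(n, probs, i) == binomial_pmf(n, probs[i]) for i in range(3)))   # True
```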
Since N_i is B(n, p_i), we have:
$$E[N_i] = n\, p_i$$
and
$$\mathrm{Var}(N_i) = n\, p_i (1 - p_i)$$
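The mean and variance can also be recovered directly from the joint distribution, without going through the binomial marginal. The sketch below computes E[N_i] and E[N_i^2] exactly by summing over the whole support (brute force, so only for small n and k); `mean_and_variance` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

def multinomial_pmf(n, probs, counts):
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef * prod(Fraction(p) ** c for p, c in zip(probs, counts))

def mean_and_variance(n, probs, i):
    """Exact E[N_i] and Var(N_i), computed by summing over the whole support."""
    support = [c for c in product(range(n + 1), repeat=len(probs)) if sum(c) == n]
    mean = sum(c[i] * multinomial_pmf(n, probs, c) for c in support)
    second_moment = sum(c[i] ** 2 * multinomial_pmf(n, probs, c) for c in support)
    return mean, second_moment - mean ** 2

n, probs = 6, [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
mean, var = mean_and_variance(n, probs, 2)
print(mean, var)   # 1 5/6, i.e. n p_3 and n p_3 (1 - p_3)
```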
We'll give two demonstrations of the following result (for i ≠ j):
$$\mathrm{Cov}(N_i, N_j) = -n\, p_i\, p_j$$
All these covariances (i ≠ j) are negative: for a given number of draws n, an increase of N_i goes, on average, with a reduction of the number of observations in any other category.
Recall that a third demonstration of this result, based on the theorem of iterated expectation, was given elsewhere on this site.
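A simulation makes the sign and size of these covariances tangible. The sketch below assumes NumPy is available and uses its `Generator.multinomial` sampler; the sample size, seed and probabilities are arbitrary illustrative choices. It compares the sample covariance matrix with the theoretical one, whose diagonal holds n p_i (1 - p_i) and whose off-diagonal entries are -n p_i p_j.

```python
import numpy as np

n, p = 10, np.array([0.5, 0.3, 0.2])
rng = np.random.default_rng(seed=0)
samples = rng.multinomial(n, p, size=200_000)      # each row is one realization (N_1, N_2, N_3)

empirical = np.cov(samples, rowvar=False)          # sample covariance matrix
theoretical = n * (np.diag(p) - np.outer(p, p))    # n p_i (1 - p_i) on the diagonal, -n p_i p_j off it

print(np.round(empirical, 2))
print(theoretical)
```

The agreement is only approximate, since the empirical matrix is built from a finite sample.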
-----
Because of the constraint
$$\sum_{i=1}^{k} N_i = n$$
the covariance matrix of the multinomial distribution is not full rank: when all the p_i are positive, its rank is k - 1.
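The rank deficiency is easy to see numerically: since every row of the covariance matrix sums to n(p_i - p_i(p_1 + ... + p_k)) = 0, the vector (1, 1, ..., 1) lies in its null space. A minimal sketch, assuming NumPy is available; `multinomial_cov` is a hypothetical helper, and the probabilities are chosen as exact binary fractions so that the row sums are exactly zero in floating point.

```python
import numpy as np

def multinomial_cov(n, p):
    """Covariance matrix n (diag(p) - p p^T) of Mult(n, p_1, ..., p_k)."""
    p = np.asarray(p, dtype=float)
    return n * (np.diag(p) - np.outer(p, p))

sigma = multinomial_cov(10, [0.5, 0.25, 0.25])
print(np.linalg.matrix_rank(sigma))   # 2, that is k - 1
print(sigma @ np.ones(3))             # zero vector: (1, ..., 1) spans the null space
```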
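Using the definition of the correlation coefficient between two variables, we have (for i ≠ j):
$$\rho(N_i, N_j) = \frac{\mathrm{Cov}(N_i, N_j)}{\sqrt{\mathrm{Var}(N_i)\,\mathrm{Var}(N_j)}} = -\sqrt{\frac{p_i\, p_j}{(1 - p_i)(1 - p_j)}}$$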
Note that this expression does not contain n.
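The cancellation of n can be checked by computing the correlation from the covariance and the variances for several values of n and comparing with the closed form. A minimal sketch with plain floats; both function names are hypothetical.

```python
import math

def multinomial_corr(p_i, p_j):
    """Closed-form correlation between N_i and N_j (i != j); n does not appear."""
    return -math.sqrt(p_i * p_j / ((1 - p_i) * (1 - p_j)))

def corr_from_moments(n, p_i, p_j):
    """Cov(N_i, N_j) / sqrt(Var(N_i) Var(N_j)) using the formulas for the moments."""
    cov = -n * p_i * p_j
    return cov / math.sqrt(n * p_i * (1 - p_i) * n * p_j * (1 - p_j))

for n in (1, 10, 1000):
    print(n, corr_from_moments(n, 0.5, 0.25), multinomial_corr(0.5, 0.25))
```

In the binomial case k = 2, p_j = 1 - p_i and the formula gives -1, as expected since N_2 = n - N_1.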
Let us partition the set of variables {N_i} into two groups. For ease of exposition, let k' be an integer smaller than k, and consider:
* a first group containing the first k' variables {N_1, N_2, ..., N_{k'}},
* a second group containing the last k - k' variables {N_{k'+1}, N_{k'+2}, ..., N_k}.
We now impose the condition N_1 + N_2 + ... + N_{k'} = z (and therefore N_{k'+1} + N_{k'+2} + ... + N_k = n - z). In other words, we consider only those series of n draws that lead to N_1 + N_2 + ... + N_{k'} = z, and ignore the others.
Then it can be shown that the joint distribution of {N_1, N_2, ..., N_{k'}} conditional on N_1 + N_2 + ... + N_{k'} = z is the multinomial distribution Mult(z, p'_1, p'_2, ..., p'_{k'}) with:
$$p'_i = \frac{p_i}{p_1 + p_2 + \cdots + p_{k'}}$$
A similar result holds for the second group: conditionally, {N_{k'+1}, ..., N_k} is multinomial Mult(n - z, ...), with probabilities p_i / (p_{k'+1} + ... + p_k).
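The conditional result can be verified exactly on a small example by brute-force conditioning: keep only the outcomes in which the first group sums to z, renormalize, and compare with Mult(z, p'_1, ..., p'_{k'}). A minimal sketch with exact `Fraction` arithmetic; `conditional_first_group` is a hypothetical name and the parameter values are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

def multinomial_pmf(n, probs, counts):
    if sum(counts) != n:
        return Fraction(0)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef * prod(Fraction(p) ** c for p, c in zip(probs, counts))

def conditional_first_group(n, probs, k1, z):
    """Distribution of (N_1, ..., N_k1) given N_1 + ... + N_k1 = z, by brute-force conditioning."""
    joint = {}
    for counts in product(range(n + 1), repeat=len(probs)):
        if sum(counts) == n and sum(counts[:k1]) == z:
            head = counts[:k1]
            joint[head] = joint.get(head, 0) + multinomial_pmf(n, probs, counts)
    total = sum(joint.values())                   # P{N_1 + ... + N_k1 = z}
    return {head: pr / total for head, pr in joint.items()}

n, k1, z = 5, 2, 3
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
cond = conditional_first_group(n, probs, k1, z)
s = sum(probs[:k1])
p_prime = [p / s for p in probs[:k1]]             # p'_i = p_i / (p_1 + ... + p_k')
print(all(cond[h] == multinomial_pmf(z, p_prime, h) for h in cond))   # True
```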
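The moment generating function of the multinomial distribution Mult(n, p_1, p_2, ..., p_k) is:
$$M(t_1, \dots, t_k) = E\left[e^{t_1 N_1 + \cdots + t_k N_k}\right] = \left(p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_k e^{t_k}\right)^n$$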
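The closed form can be checked numerically against the definition E[exp(t_1 N_1 + ... + t_k N_k)], computed by summing over the support. A minimal floating-point sketch; the parameter values are arbitrary and the function names are hypothetical.

```python
import math
from itertools import product

def mgf_by_summation(n, probs, t):
    """E[exp(t_1 N_1 + ... + t_k N_k)] computed term by term over the support."""
    total = 0.0
    for counts in product(range(n + 1), repeat=len(probs)):
        if sum(counts) != n:
            continue
        coef = math.factorial(n)
        for c in counts:
            coef //= math.factorial(c)
        pmf = coef * math.prod(p ** c for p, c in zip(probs, counts))
        total += math.exp(sum(ti * c for ti, c in zip(t, counts))) * pmf
    return total

def mgf_closed_form(n, probs, t):
    return sum(p * math.exp(ti) for p, ti in zip(probs, t)) ** n

n, probs, t = 4, [0.5, 0.3, 0.2], [0.1, -0.4, 0.7]
print(mgf_by_summation(n, probs, t), mgf_closed_form(n, probs, t))
```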
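Let X = {X_1, X_2, ..., X_k} be a set of k random variables jointly distributed as Mult(n, p_1, p_2, ..., p_k).
Then X* = {X_1 + X_2, X_3, ..., X_k} is a set of (k - 1) random variables distributed as Mult(n, p_1 + p_2, p_3, ..., p_k).
Why? Merging faces r_1 and r_2 into a single face amounts to rolling a die with k - 1 faces, the merged face coming up with probability p_1 + p_2 on each roll; X_1 + X_2 is the number of times it comes up in the n rolls.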
This result generalizes easily to any partitioning of the set of variables into groups of variables.
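For a small example, the merging result can be confirmed exactly: accumulate the joint probabilities of X according to the value of (X_1 + X_2, X_3, ..., X_k) and compare with Mult(n, p_1 + p_2, p_3, ..., p_k). A minimal sketch with exact `Fraction` arithmetic; `merged_pmf` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

def multinomial_pmf(n, probs, counts):
    if sum(counts) != n:
        return Fraction(0)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef * prod(Fraction(p) ** c for p, c in zip(probs, counts))

def merged_pmf(n, probs):
    """Distribution of (X_1 + X_2, X_3, ..., X_k), obtained from the joint pmf of X."""
    out = {}
    for counts in product(range(n + 1), repeat=len(probs)):
        if sum(counts) == n:
            key = (counts[0] + counts[1],) + counts[2:]
            out[key] = out.get(key, 0) + multinomial_pmf(n, probs, counts)
    return out

n, probs = 4, [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
merged = merged_pmf(n, probs)
target = [probs[0] + probs[1]] + probs[2:]        # (p_1 + p_2, p_3, ..., p_k)
print(all(merged[key] == multinomial_pmf(n, target, key) for key in merged))   # True
```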
There is an intimate link between the multinomial distribution and the Poisson distribution. We showed elsewhere on this site that if {N_1, N_2, ..., N_k} are k independent (but not necessarily identically distributed) Poisson variables with parameters λ_1, λ_2, ..., λ_k, then the joint distribution of {N_1, N_2, ..., N_k} conditional on their sum being equal to s is the multinomial distribution Mult(s, λ_1/λ, λ_2/λ, ..., λ_k/λ), with λ = λ_1 + λ_2 + ... + λ_k.
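This link can be checked numerically at a single point: divide the product of the independent Poisson probabilities by the probability of the sum (which is Poisson with parameter λ_1 + ... + λ_k), and compare with the multinomial probability. A minimal floating-point sketch; the parameter values and helper names are illustrative assumptions.

```python
import math

def poisson_pmf(m, lam):
    return math.exp(-lam) * lam ** m / math.factorial(m)

def conditional_given_sum(counts, lams):
    """P{N_1 = n_1, ..., N_k = n_k | N_1 + ... + N_k = s} for independent Poisson N_i, s = sum(counts)."""
    s = sum(counts)
    joint = math.prod(poisson_pmf(c, lam) for c, lam in zip(counts, lams))
    p_sum = poisson_pmf(s, sum(lams))   # a sum of independent Poissons is Poisson(lambda_1 + ... + lambda_k)
    return joint / p_sum

def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)
    return coef * math.prod(p ** c for p, c in zip(probs, counts))

lams = [2.0, 0.5, 1.5]
counts = [3, 1, 2]
probs = [lam / sum(lams) for lam in lams]         # lambda_i / lambda
print(conditional_given_sum(counts, lams), multinomial_pmf(counts, probs))
```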
___________________________________________________
Tutorial 1
We calculate the probability mass function of the multinomial distribution. The key part is establishing the expression of the multinomial coefficient, which we do by two different methods.
PROBABILITY MASS FUNCTION OF THE MULTINOMIAL DISTRIBUTION
* Probability mass function of the multinomial distribution
* The multinomial coefficient
* First demonstration
* Second demonstration
________________________________________________________
Tutorial 2
We now use two different methods to calculate Cov(N_i, N_j), the covariance of the numbers of observations in two categories of the multinomial distribution.
The second method represents these (random) numbers as sums of auxiliary Bernoulli variables. This approach is often quite effective for problems involving discrete probability distributions (see for example the calculation of the mean of the hypergeometric distribution).
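The indicator idea can be sketched in a few lines. For a single draw, let I_i equal 1 if the draw falls in category r_i and 0 otherwise; since one draw cannot fall in two categories, E[I_i I_j] = 0 for i ≠ j. Writing N_i as the sum of the n per-draw indicators, and using the independence of different draws, only the n same-draw terms contribute to Cov(N_i, N_j). The function names below are hypothetical, and the single-draw moments are computed exactly by enumerating the k possible outcomes.

```python
from fractions import Fraction

def single_draw_cov(probs, i, j):
    """Cov(I_i, I_j) for one draw, enumerating the k possible outcomes r exactly."""
    e_i = sum(p for r, p in enumerate(probs) if r == i)                # E[I_i] = p_i
    e_j = sum(p for r, p in enumerate(probs) if r == j)                # E[I_j] = p_j
    e_ij = sum(p for r, p in enumerate(probs) if r == i and r == j)    # 0 when i != j
    return e_ij - e_i * e_j

def multinomial_cov_via_indicators(n, probs, i, j):
    # N_i = I_{1,i} + ... + I_{n,i}; indicators from different draws are independent,
    # so Cov(N_i, N_j) reduces to n times the single-draw covariance
    return n * single_draw_cov(probs, i, j)

probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(multinomial_cov_via_indicators(6, probs, 0, 1))   # -1, that is -n p_1 p_2
print(multinomial_cov_via_indicators(6, probs, 2, 2))   # 5/6 (i = j gives the variance)
```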
COVARIANCES OF THE MULTINOMIAL DISTRIBUTION
* Direct calculation of the covariance
* Second demonstration (indicator variables)
* Number of observations in a category as a sum of Bernoulli r.v.
* Calculation of the covariance