Multinomial distribution

Definition of the multinomial distribution

The binomial distribution B(n, p) is obtained by considering n tosses of a coin with two outcomes (Heads and Tails), with p being the probability that the coin lands on Heads.

The multinomial distribution is a generalization of the binomial distribution : each "toss" can now produce more than just two outcomes. For example, one may imagine rolling a "die" with k faces, with pi being the probability for the die to land on face ri.

A multinomial distribution is entirely characterized by :

    * n, the number of times the die is rolled.

    * The set of probabilities {p1, p2, ..., pk}, with p1 + p2 + ... + pk = 1.

-----

After having rolled the die n times, we denote by ni the number of times the die landed face ri up. We therefore have n1 + n2 + ... + nk = n. Because rolling a die is a random experiment, the ni are the realizations of k random variables that we denote Ni (i = 1, 2, ..., k). These variables are not independent, as they are linked by the relation Σi Ni = n.

The multinomial distribution Mult(n, p1, p2, ..., pk) is the joint distribution of the k random variables Ni. It is therefore a multivariate, discrete distribution. Its support is the set of k-tuples of non-negative integers (n1, n2, ..., nk) such that n1 + n2 + ... + nk = n.
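As an illustration, the support can be enumerated explicitly for small n and k. The following is a minimal Python sketch; the helper name `support` and the example values (k = 3, n = 2) are ours and are chosen only for illustration.

```python
from math import comb

def support(n, k):
    """Yield every k-tuple of non-negative integers whose entries sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in support(n - first, k - 1):
            yield (first,) + rest

# Hypothetical example: a "die" with k = 3 faces rolled n = 2 times.
tuples = list(support(2, 3))
for t in tuples:
    print(t)

# The support contains C(n + k - 1, k - 1) tuples (a "stars and bars" count).
print(len(tuples), comb(2 + 3 - 1, 3 - 1))
```

The size of the support grows quickly with n and k, so this kind of enumeration is only practical for small examples.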

The multinomial probability distribution

The distribution Mult(n, p1, p2, ..., pk) is determined by the values of the probabilities of each of the possible k-tuples. We denote these probabilities by P{N1 = n1, ..., Nk = nk}.

We'll show that :

P{N1 = n1, ..., Nk = nk} = [n! / (n1! n2! ... nk!)] p1^n1 p2^n2 ... pk^nk

for all the k-tuples within the support of the distribution (and 0 otherwise).

-----

The binomial distribution is a special case with k = 2.
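The probability mass function can be checked numerically on a small case. The sketch below takes the pmf to be the multinomial coefficient n! / (n1! n2! ... nk!) multiplied by p1^n1 p2^n2 ... pk^nk, and uses exact fractions. The function names and the example probabilities are ours, chosen for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def support(n, k):
    """Yield every k-tuple of non-negative integers whose entries sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in support(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(counts, probs):
    """P{N1 = n1, ..., Nk = nk} = n!/(n1!...nk!) * p1^n1 ... pk^nk."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)  # exact: every partial quotient is an integer
    value = coeff
    for p, c in zip(probs, counts):
        value = value * p ** c
    return value

probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
one_of_each = multinomial_pmf((1, 1, 1), probs)
print(one_of_each)  # 3! * (1/2)(1/3)(1/6) = 1/6

total = sum(multinomial_pmf(t, probs) for t in support(3, 3))
print(total)  # the probabilities over the whole support add up to 1

# Special case k = 2: the pmf reduces to that of the binomial B(n, p).
n, p = 5, Fraction(1, 4)
binomial_matches = all(
    multinomial_pmf((x, n - x), (p, 1 - p)) == comb(n, x) * p ** x * (1 - p) ** (n - x)
    for x in range(n + 1)
)
print(binomial_matches)
```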

Multinomial coefficient

The term n! / (n1! n2! ... nk!) is called the multinomial coefficient. It is the number of different "words" of length n that can be written with an alphabet containing k letters by using the first letter n1 times, the second letter n2 times, and so on.

We'll justify this result in two different ways.

-----

Note that the multinomial coefficient is equal to the coefficient of the monomial x1^n1 x2^n2 ... xk^nk in the expansion of (x1 + x2 + ... + xk)^n, hence the name of the distribution.
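The "words" interpretation of the multinomial coefficient can be verified by brute force on a small alphabet. In this sketch the alphabet {a, b, c} and the counts (2, 1, 1) are hypothetical values chosen for illustration.

```python
from itertools import permutations
from math import factorial

def multinomial_coefficient(counts):
    """n! / (n1! n2! ... nk!) with n = n1 + n2 + ... + nk."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    return coeff

# Use the letter 'a' twice, 'b' once and 'c' once.
counts = (2, 1, 1)
letters = "aabc"

# A set keeps each distinct word only once, even though the two 'a's
# can be swapped without changing the word.
words = {"".join(w) for w in permutations(letters)}

print(len(words), multinomial_coefficient(counts))  # both are 12
```

Expanding (x1 + x2 + ... + xk)^n amounts to picking one xi from each of the n factors, which is the same as writing a word of length n. This is why the same count appears as the coefficient of a monomial.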

Marginal distributions of the multinomial distribution

Recall that Ni is the number of observations in the modality ri.

Distribution

Consider Ni. After each draw :

    * The result is ri with probability pi.

    * And therefore the result is "Not  ri" (that is, any other modality) with probability (1 - pi).

Therefore Ni is binomial B(n, pi).
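The same conclusion can be reached numerically by summing the joint probabilities over all the other variables. The sketch below does this with exact fractions; the helper names and the example values of n and of the pi are ours.

```python
from fractions import Fraction
from math import comb, factorial

def support(n, k):
    """Yield every k-tuple of non-negative integers whose entries sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in support(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(counts, probs):
    """Multinomial coefficient times p1^n1 ... pk^nk."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    value = coeff
    for p, c in zip(probs, counts):
        value = value * p ** c
    return value

def marginal_pmf(x, i, n, probs):
    """P{Ni = x}, obtained by summing the joint pmf over all other counts."""
    return sum(multinomial_pmf(t, probs)
               for t in support(n, len(probs)) if t[i] == x)

n = 4
probs = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
i = 0  # index of the modality under study (0-based)

for x in range(n + 1):
    binomial = comb(n, x) * probs[i] ** x * (1 - probs[i]) ** (n - x)
    print(x, marginal_pmf(x, i, n, probs), binomial)
```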

Mean

Ni being B(n, pi), we have :

E[Ni] = n pi

Variance

Ni being B(n, pi), we have :

Var(Ni) = n pi (1 - pi)

 

Covariances

We'll give two demonstrations of the following result :

 

Cov(Ni, Nj) = -n pi pj     for i ≠ j

 

All the covariances are negative : for a given number of draws n, an increase of ni leads, on average, to a reduction in the number of observations in every other modality.
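The mean, variance and covariance formulas can be verified exactly on a small example by computing the expectations directly from the joint distribution. In this sketch the helper `expectation` and the example values are ours.

```python
from fractions import Fraction
from math import factorial

def support(n, k):
    """Yield every k-tuple of non-negative integers whose entries sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in support(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(counts, probs):
    """Multinomial coefficient times p1^n1 ... pk^nk."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    value = coeff
    for p, c in zip(probs, counts):
        value = value * p ** c
    return value

def expectation(f, n, probs):
    """E[f(N1, ..., Nk)] by summation over the whole support."""
    return sum(f(t) * multinomial_pmf(t, probs) for t in support(n, len(probs)))

n = 4
probs = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))

mean0 = expectation(lambda t: t[0], n, probs)
mean1 = expectation(lambda t: t[1], n, probs)
var0 = expectation(lambda t: t[0] ** 2, n, probs) - mean0 ** 2
cov01 = expectation(lambda t: t[0] * t[1], n, probs) - mean0 * mean1

print(mean0, n * probs[0])                   # E[N1] = n p1
print(var0, n * probs[0] * (1 - probs[0]))   # Var(N1) = n p1 (1 - p1)
print(cov01, -n * probs[0] * probs[1])       # Cov(N1, N2) = -n p1 p2
```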

 

Recall that a third demonstration of this result, given elsewhere, calls on the Theorem of iterated expectation.

-----

Because of the constraint :

Σi Ni = n

the covariance matrix of the multinomial distribution is not full rank : its rank is k - 1.
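The rank deficiency can be observed on a small example. The sketch below builds the covariance matrix from Var(Ni) = n pi (1 - pi) and Cov(Ni, Nj) = -n pi pj, then computes its rank by Gaussian elimination in exact arithmetic. The function names and example values are ours, and all the pi are assumed to be strictly positive.

```python
from fractions import Fraction

def covariance_matrix(n, probs):
    """Diagonal: n pi (1 - pi). Off-diagonal: -n pi pj."""
    k = len(probs)
    return [[n * probs[i] * (1 - probs[i]) if i == j else -n * probs[i] * probs[j]
             for j in range(k)]
            for i in range(k)]

def rank(matrix):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [list(r) for r in matrix]
    found = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found

probs = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
sigma = covariance_matrix(5, probs)

row_sums = [sum(row) for row in sigma]
print(row_sums)      # every row sums to 0, so the matrix is singular
print(rank(sigma))   # k - 1 = 2
```

Each row sums to zero because Σj Cov(Ni, Nj) = Cov(Ni, n) = 0, which is the linear constraint at work.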

Correlation coefficient

Referring to the definition of the correlation coefficient between two variables, we have, for i ≠ j :

Corr(Ni, Nj) = Cov(Ni, Nj) / sqrt[Var(Ni) Var(Nj)] = -sqrt[ pi pj / ((1 - pi)(1 - pj)) ]


Note that this expression does not contain n.
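A short numerical illustration of this independence from n : the sketch below computes the correlation from Cov(Ni, Nj) = -n pi pj and Var(Ni) = n pi (1 - pi) for several values of n, and compares it with the simplified form -sqrt[pi pj / ((1 - pi)(1 - pj))]. The function names and example values are ours.

```python
from math import isclose, sqrt

def correlation_from_moments(n, p_i, p_j):
    """Cov(Ni, Nj) / sqrt(Var(Ni) Var(Nj)), using the multinomial moments."""
    cov = -n * p_i * p_j
    var_i = n * p_i * (1 - p_i)
    var_j = n * p_j * (1 - p_j)
    return cov / sqrt(var_i * var_j)

def correlation(p_i, p_j):
    """Simplified form, in which n has cancelled out."""
    return -sqrt(p_i * p_j / ((1 - p_i) * (1 - p_j)))

p_i, p_j = 0.5, 0.3
for n in (1, 10, 1000):
    print(n, correlation_from_moments(n, p_i, p_j), correlation(p_i, p_j))

# With only two modalities (pj = 1 - pi), N2 = n - N1 and the correlation is -1.
print(correlation(0.3, 0.7))
```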

Conditional distributions of the multinomial distribution

Let us partition the set of variables {Ni} into two groups. For ease of exposition, let k' be an integer smaller than k, and consider :

    * A first group containing the first k' variables {N1, N2 , ..., Nk'}.

    * A second group containing the last k - k' variables {Nk'+1, Nk'+2, ..., Nk}.

We now impose the condition N1 + N2 + ... + Nk' = z (and therefore Nk'+1 + Nk'+2 + ... + Nk = n - z). In other words, we consider only those series of n draws that lead to N1 + N2 + ... + Nk' = z, and ignore the others.

Then it can be shown that the joint distribution of {N1, N2, ..., Nk'} conditionally on N1 + N2 + ... + Nk' = z is the multinomial distribution Mult(z, p'1, p'2, ..., p'k') with :

p'i = pi /(p1+ p2 + ... + pk')

with a similar result for the other group.
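This conditional result can be checked by direct enumeration on a small case. In the sketch below, k = 4, k' = 2, n = 5 and z = 3 are hypothetical values, and the helper names are ours. The conditional probability is computed as a ratio of sums of joint probabilities and compared with the pmf of Mult(z, p'1, p'2).

```python
from fractions import Fraction
from math import factorial

def support(n, k):
    """Yield every k-tuple of non-negative integers whose entries sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in support(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(counts, probs):
    """Multinomial coefficient times p1^n1 ... pk^nk."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    value = coeff
    for p, c in zip(probs, counts):
        value = value * p ** c
    return value

def conditional_pmf(first, n, probs):
    """P{(N1, ..., Nk') = first | N1 + ... + Nk' = sum(first)} by enumeration."""
    k_prime, z = len(first), sum(first)
    joint = sum(multinomial_pmf(t, probs)
                for t in support(n, len(probs)) if t[:k_prime] == first)
    given = sum(multinomial_pmf(t, probs)
                for t in support(n, len(probs)) if sum(t[:k_prime]) == z)
    return joint / given

n, z = 5, 3
probs = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(2, 5))
head = probs[:2]
rescaled = tuple(p / sum(head) for p in head)  # p'i = pi / (p1 + ... + pk')

checks = [conditional_pmf(first, n, probs) == multinomial_pmf(first, rescaled)
          for first in support(z, 2)]
print(all(checks))
```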

Moment generating function of the multinomial distribution

The moment generating function of the multinomial distribution Mult(n, p1, p2, ..., pk) is :

M(t1, t2, ..., tk) = E[exp(t1 N1 + t2 N2 + ... + tk Nk)] = (p1 e^t1 + p2 e^t2 + ... + pk e^tk)^n
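As a numerical illustration, the sketch below takes the closed form of the moment generating function to be (p1 e^t1 + p2 e^t2 + ... + pk e^tk)^n and compares it with the expectation E[exp(t1 N1 + ... + tk Nk)] computed by summation over the support. The function names and the example values of t, n and the pi are ours.

```python
from math import exp, factorial, isclose

def support(n, k):
    """Yield every k-tuple of non-negative integers whose entries sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in support(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(counts, probs):
    """Multinomial coefficient times p1^n1 ... pk^nk."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    value = coeff
    for p, c in zip(probs, counts):
        value = value * p ** c
    return value

def mgf_closed_form(t, n, probs):
    """(p1 e^t1 + ... + pk e^tk)^n."""
    return sum(p * exp(ti) for p, ti in zip(probs, t)) ** n

def mgf_by_enumeration(t, n, probs):
    """E[exp(t1 N1 + ... + tk Nk)] summed over the whole support."""
    return sum(exp(sum(ti * c for ti, c in zip(t, counts))) * multinomial_pmf(counts, probs)
               for counts in support(n, len(probs)))

n = 4
probs = (0.5, 0.3, 0.2)
t = (0.1, -0.2, 0.3)

print(mgf_closed_form(t, n, probs), mgf_by_enumeration(t, n, probs))
```

The two values agree up to floating-point rounding, as expected from the multinomial expansion of the n-th power.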

 

 

Fusing modalities

Let X = {X1, X2, ..., Xk} be a set of k random variables distributed as Mult(n, p1, p2 , ..., pk ).

Then X* = {X1 + X2, ..., Xk} is a set of (k - 1) r.v. distributed as Mult(n, p1+ p2 , ..., pk ).

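Why ? Each draw falls in the fused modality "r1 or r2" with probability p1 + p2, the other modalities are unchanged, and the draws remain independent.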

This result generalizes easily to any partitioning of the set of variables into groups of variables.
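The fusion of two modalities can be checked exactly on a small example : the probability that X1 + X2 = m is obtained by summing the original pmf over all the ways of splitting m between X1 and X2. In this sketch, the helper names and the example values (k = 4, n = 4) are ours.

```python
from fractions import Fraction
from math import factorial

def support(n, k):
    """Yield every k-tuple of non-negative integers whose entries sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in support(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(counts, probs):
    """Multinomial coefficient times p1^n1 ... pk^nk."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    value = coeff
    for p, c in zip(probs, counts):
        value = value * p ** c
    return value

def fused_pmf(fused_counts, probs):
    """P{X1 + X2 = m, X3 = n3, ...}, summing over every split of m."""
    m, rest = fused_counts[0], tuple(fused_counts[1:])
    return sum(multinomial_pmf((a, m - a) + rest, probs) for a in range(m + 1))

n = 4
probs = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(2, 5))
fused_probs = (probs[0] + probs[1],) + probs[2:]  # (p1 + p2, p3, p4)

same = all(fused_pmf(t, probs) == multinomial_pmf(t, fused_probs)
           for t in support(n, len(fused_probs)))
print(same)
```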

Link with the Poisson distribution

There is an intimate link between the multinomial distribution and the Poisson distribution. It can be shown that if {N1, N2, ..., Nk} are k independent (but not necessarily identically distributed) Poisson variables, then the joint distribution of {N1, N2, ..., Nk} conditionally on their sum is a multinomial distribution.
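This link can be illustrated numerically. The sketch below assumes Poisson rates that we call λ1, ..., λk (a notation not used in the text above) and uses the fact that the sum of independent Poisson variables is Poisson with rate λ1 + ... + λk. The conditional probability is then compared with the pmf of a multinomial distribution whose probabilities are taken to be pi = λi / (λ1 + ... + λk). Function names and example values are ours.

```python
from math import exp, factorial, isclose

def poisson_pmf(x, rate):
    """P{N = x} for a Poisson variable with the given rate."""
    return exp(-rate) * rate ** x / factorial(x)

def multinomial_pmf(counts, probs):
    """Multinomial coefficient times p1^n1 ... pk^nk."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    value = coeff
    for p, c in zip(probs, counts):
        value = value * p ** c
    return value

def conditional_given_sum(counts, rates):
    """P{N1 = n1, ..., Nk = nk | N1 + ... + Nk = n} for independent Poisson Ni."""
    joint = 1.0
    for c, r in zip(counts, rates):
        joint *= poisson_pmf(c, r)
    # The sum of independent Poisson variables is Poisson with the summed rate.
    return joint / poisson_pmf(sum(counts), sum(rates))

rates = (0.5, 1.5, 3.0)                      # hypothetical rates λ1, λ2, λ3
probs = tuple(r / sum(rates) for r in rates)  # assumed pi = λi / (λ1 + λ2 + λ3)
counts = (1, 2, 3)

print(conditional_given_sum(counts, rates), multinomial_pmf(counts, probs))
```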

___________________________________________________
 

 

Tutorial 1

 

We calculate the probability mass function of the multinomial distribution. The key part is establishing the expression of the multinomial coefficient, which we do by two different methods.

 

 

PROBABILITY MASS FUNCTION OF THE MULTINOMIAL DISTRIBUTION

Probability mass function of the multinomial distribution

The multinomial coefficient

First demonstration

Second demonstration


________________________________________________________

 

 

Tutorial 2

 

We now use two different methods for calculating Cov(Ni, Nj ), the covariance of the numbers of observations in two modalities of the multinomial distribution.

The second method represents these (random) numbers as sums of auxiliary Bernoulli variables. This approach is often quite effective for problems involving discrete probability distributions (see for example the calculation of the mean of the hypergeometric distribution).

 

 

COVARIANCES OF THE MULTINOMIAL DISTRIBUTION

Direct calculation of the covariance

Second demonstration (indicator variables)

Number of observations in a modality as a sum of Bernoulli r.v.

Calculation of the covariance


 

 ____________________________________________

 

Related readings

Binomial distribution

Poisson distribution

 
