Correspondence Analysis

Correspondence Analysis (CA) is a graphical display of the modalities of two nominal variables that supports visual perception and analysis of their interactions. This representation rests on much the same basis as Principal Component Analysis (PCA): it defines "factors" that give a planar representation of the distribution of the modalities that is as faithful as possible.

 

The modalities are represented by points in one or more factorial planes, in such a way that their mutual distances can be interpreted in terms of "attraction" or "repulsion" between modalities. For example, if the two variables are:

    * V1, with 3 modalities M1, M2 and M3, and

    * V2, with 2 modalities N1 and N2,

then the proximity of the points M1 and N2 in a factorial plane may indicate that more individuals chose the pair of modalities (M1, N2) than the hypothesis of independence between the two variables would suggest.
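The comparison with the independence hypothesis can be made concrete with a small sketch. The counts below are invented for illustration and are not taken from the Tutorials; the function name `expected_counts` is likewise hypothetical. Under independence, the expected count of a pair is (row total × column total) / grand total.

```python
# Hypothetical counts (assumption): rows are M1, M2, M3 (V1); columns are N1, N2 (V2).
observed = [
    [10, 30],  # M1
    [25, 15],  # M2
    [15, 5],   # M3
]


def expected_counts(table):
    """Return the counts expected if the two variables were independent."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)

# Pair (M1, N2): observed count versus count expected under independence.
print(observed[0][1], expected[0][1])
```

With these made-up counts, the pair (M1, N2) is observed more often than independence would predict, which is the kind of "attraction" that a proximity of M1 and N2 on a factorial plane may reflect.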

CA also makes it possible to interpret proximities between modalities of the same variable in terms of how similar their profiles are across the modalities of the other variable.

 

 The mathematical machinery of CA is somewhat heavy, and interpreting a CA is not as simple as the preceding example might suggest. Nevertheless, CA is an irreplaceable tool for a quick and fairly intuitive interpretation of a large contingency table.

_____________________


 CA generalizes to more than two nominal variables, and is then called "Multiple Correspondence Analysis" (MCA). In place of MCA, some English-language software uses an equivalent technique known as "Homogeneity Analysis" ("HOMogeneity AnaLysiS", or HOMALS).

_________________________________________________________

 


 

 

 

 

Tutorial 1

 

This first Tutorial is an overview of Correspondence Analysis. We show how contingency tables may be regarded as a numerical coding of the interaction between two categorical variables through frequencies of pairs of modalities.
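As a minimal sketch of this coding, the snippet below counts pairs of modalities to build a contingency table. The raw answers and the helper name `contingency_table` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical raw data (assumption): one (V1, V2) pair of modalities per individual.
answers = [
    ("M1", "N2"), ("M2", "N1"), ("M1", "N2"),
    ("M3", "N1"), ("M2", "N2"), ("M1", "N1"),
]


def contingency_table(pairs, row_modalities, col_modalities):
    """Count how often each pair of modalities occurs."""
    counts = Counter(pairs)
    # A Counter returns 0 for pairs that never occur.
    return [[counts[(r, c)] for c in col_modalities] for r in row_modalities]


table = contingency_table(answers, ["M1", "M2", "M3"], ["N1", "N2"])
for label, row in zip(["M1", "M2", "M3"], table):
    print(label, row)
```

Each cell holds the number of individuals who chose the corresponding pair of modalities.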

A PCA-like transformation then allows the modalities of the variables to be represented as points in factorial planes. Visual analysis of these plots, and in particular of the proximities between modalities, then gives a visual clue as to whether the frequency profiles of two modalities across the modalities of the other variable are similar or not.

 

 

OVERVIEW OF CORRESPONDENCE ANALYSIS

Interaction between categorical variables

Independent categorical variables

Interaction between categorical variables

The mechanism of CA

Contingency tables

PCA on rows and on columns

Simultaneous representation

What is expected from a graphical representation?

Axes

Distance to the origin

Two modalities belonging to the same variable

Two modalities belonging to different variables

TUTORIAL

______________________________________________________

 

 

Tutorial 2

 

Correspondence Analysis does not work on raw contingency tables. It first normalizes them so that cell counts are replaced by frequencies, and the modalities of one variable are described by normalized "frequency profiles" across the modalities of the other variable.
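The normalization can be sketched as follows, assuming the usual definitions: a frequency is a cell count divided by the grand total, the weight (mass) of a modality is its share of the grand total, and a row profile is a row divided by its own total. The counts are invented for illustration.

```python
# Hypothetical contingency table (assumption): rows M1..M3, columns N1, N2.
counts = [
    [10, 30],
    [25, 15],
    [15, 5],
]

n = sum(sum(row) for row in counts)

# Frequencies: each cell count divided by the grand total.
frequencies = [[c / n for c in row] for row in counts]

# Weights (masses) of the row and column modalities.
row_masses = [sum(row) / n for row in counts]
col_masses = [sum(col) / n for col in zip(*counts)]

# Row profiles: each row divided by its own total, so every profile sums to 1.
row_profiles = [[c / sum(row) for c in row] for row in counts]

print(row_profiles[0])  # profile of M1 across N1, N2
```

Column profiles are obtained the same way after transposing the table.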

We then explain why the traditional Euclidean distance is not appropriate in this setting for measuring the similarity between modalities, and why it has to be replaced by the so-called "Chi-square distance". The upcoming PCAs will be performed with this newly defined distance.
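A minimal sketch of the Chi-square distance between two row profiles is given below, using the standard definition in which each squared difference is divided by the mass of the corresponding column. The two profiles and the column masses are made-up values, assumed to come from a larger hypothetical table.

```python
import math


def chi_square_distance(profile_a, profile_b, col_masses):
    """Chi-square distance between two row profiles.

    Each squared difference is divided by the mass of its column,
    so a given gap counts for more on a rare column.
    """
    return math.sqrt(
        sum((a - b) ** 2 / m for a, b, m in zip(profile_a, profile_b, col_masses))
    )


def euclidean_distance(profile_a, profile_b):
    """Plain Euclidean distance, shown only for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Hypothetical values (assumption): two row profiles and the column masses.
profile_a = [0.5, 0.4, 0.1]
profile_b = [0.5, 0.3, 0.2]
col_masses = [0.5, 0.4, 0.1]

d_chi2 = chi_square_distance(profile_a, profile_b, col_masses)
d_euclid = euclidean_distance(profile_a, profile_b)
print(d_chi2, d_euclid)
```

In this example the two profiles differ by the same amount on the second and third columns, but the difference on the third column, which has a mass four times smaller, contributes four times as much to the squared Chi-square distance. The Euclidean distance treats both columns alike.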

 

 

THE MECHANISM OF CORRESPONDENCE ANALYSIS

Reformatting data

Contingency tables

Frequencies

Profiles

Weighting

The Chi-square distance

Definition of the Chi-square distance

Why the Chi-square distance?

The 2 PCAs

How many dimensions?

The barycenters

Chi-square and total inertia

TUTORIAL

________________________________________

 

 

Tutorial 3

 

At this stage, we have performed two PCAs:

    1) One on row profiles,

    2) One on column profiles.

 

We are ready to proceed with the interpretation of the results. This interpretation follows the procedure used for regular PCA, with some changes due to the specifics of Correspondence Analysis: the weighting of the modalities, the Chi-square distance, and the ensuing changes in interpreting inertias.

We review here the elements that will be needed for interpreting a CA. Later on, we will interpret a simple, but realistic example of CA, and we will need to keep the elements below in mind.
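As an illustration of how weights, the Chi-square distance and inertia fit together, the sketch below checks a standard identity on invented counts: the total inertia equals the Chi-square statistic of the table divided by the grand total, and it also equals the weighted sum of the squared Chi-square distances from each row profile to the barycenter (the average profile). The data and variable names are assumptions, not the Tutorials' example.

```python
# Hypothetical contingency table (assumption): rows M1..M3, columns N1, N2.
counts = [
    [10, 30],
    [25, 15],
    [15, 5],
]

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Chi-square statistic: squared gaps between observed and expected counts.
chi_square = 0.0
for i, row in enumerate(counts):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed - expected) ** 2 / expected

inertia_from_chi2 = chi_square / n

# Same quantity from the row profiles: the barycenter of the row profiles
# is the vector of column masses, and each row is weighted by its mass.
col_masses = [c / n for c in col_totals]
inertia_from_profiles = 0.0
for i, row in enumerate(counts):
    row_mass = row_totals[i] / n
    squared_distance = sum(
        (c / row_totals[i] - m) ** 2 / m for c, m in zip(row, col_masses)
    )
    inertia_from_profiles += row_mass * squared_distance

print(chi_square, inertia_from_chi2, inertia_from_profiles)
```

Each term of the second sum is the inertia of one modality, so a modality contributes a lot either because it is heavy or because its profile lies far from the barycenter.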

 

 

INTERPRETATION OF CORRESPONDENCE ANALYSIS

Plots

Interpretation of the total inertia

Eigenvalues

Inertia of the modalities

Weights of the modalities

Coordinates, weight and inertia

Barycenters and origin

Contribution of modality to a factor

Quality of representation of the modalities

Inertia of the factors

TUTORIAL

_____________________________________________

 

 

Tutorial 4

 

We now treat a simple but realistic example. Although real life problems are usually quite a bit more complex, the step-by-step interpretation procedure that we demonstrate here would be very much the same. The treatment of this example covers the next three sections.

-----

The first section covers the interpretation of the factors.

 

 

EXAMPLE (Part 1): INTERPRETATION OF THE FACTORS

 The data

The contingency table

The Chi-square

The inertia

Total inertia

How many factors?

Interpretation of the factors

The basic principle

Which modalities determine the first factor?

Interpretation of the first factor

The second factor

Other factors

Summary of the interpretation of the factors

TUTORIAL

________________________________________________ 

 

 

Tutorial 5

 

The role of the plots of modalities is to suggest associations between pairs of modalities belonging:

    * either to the same variable,

    * or to different variables.

 

In this section, we address the issue of interpreting each variable individually. Each of the two variables is described by a plot, and we examine whether it is justified to overlay the two plots into a single combined plot.

 

 

EXAMPLE (Part 2): INTERPRETING THE MODALITIES

"Quality" or "Square Cosines"

Distance to the origin

"Near center" modalities

"Remote" modalities

Heavy modalities

Neighboring modalities

TUTORIAL

____________________________________

 

 

Tutorial 6

 

In the previous section, we interpreted each variable individually. We could thus discover some properties of the modalities that could certainly have been dug out of the contingency table, but that the plot of modalities made a lot easier to identify.

We now come to interpreting the combined plot of modalities in order to analyze the interactions between the two variables. For this purpose, we display again the same "combined" plot as we did in the previous section, but this time we will consider both variables simultaneously.

 

 

EXAMPLE (Part 3): THE COMBINED PLOT

The basic idea

Neighboring modalities

Confirming with the contingency table

Expected populations

An association is not symmetrical

Summary of the analysis

Analysis of the cloud

Interpretation of the factors

Interpretation of individual variables

Interpretation of the combined plot

TUTORIAL

_________________________________________

 

 

Tutorial 7

 

We finally address some additional questions pertaining to the interpretation of the plots:

    * Supplementary variables, which are variables that were not taken into account for building the model, but that are displayed on the plots and may facilitate their interpretation.

    * Ordinal variables, which are categorical variables whose modalities are naturally ordered. In particular, we show how nonlinear interactions between variables may then be detected by a fundamentally linear technique.

 

 

CORRESPONDENCE ANALYSIS: COMPLEMENTS

Supplementary variables

Ordinal variables

Interpreting the factors

The Guttman effect

TUTORIAL

 

____________________________________________

 

See also:

Principal Component Analysis

Contingency table
