Correspondence Analysis
Correspondence Analysis (CA; in French, Analyse Factorielle des Correspondances, AFC) is a graphical display of the modalities of two nominal variables that allows their interactions to be perceived and analyzed visually. The basis of this representation is quite similar to Principal Component Analysis (PCA): it relies on the definition of "factors" that give a planar representation, as faithful as possible, of the distribution of the modalities.
The modalities are represented by points in one or several factorial planes, in such a way that their mutual distances can be interpreted in terms of "attraction" or "repulsion" between modalities. For example, if the two variables are:
* V1 with 3 modalities M1, M2 and M3, and
* V2 with 2 modalities N1 and N2,
then the proximity of points M1 and N2 in a factorial plane may indicate that more individuals chose the pair of modalities (M1, N2) than the hypothesis of independence between the two variables would suggest.
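The comparison with the independence hypothesis can be made concrete with a small numerical sketch. The counts below are hypothetical and chosen only for illustration; the names `observed`, `expected` and `ratio` are not part of the original text.

```python
import numpy as np

# Hypothetical counts (illustrative only): rows are M1, M2, M3; columns are N1, N2.
observed = np.array([
    [10, 30],
    [25, 15],
    [15,  5],
], dtype=float)

n = observed.sum()
row_totals = observed.sum(axis=1, keepdims=True)   # totals of M1, M2, M3
col_totals = observed.sum(axis=0, keepdims=True)   # totals of N1, N2

# Counts that would be expected if V1 and V2 were independent.
expected = row_totals @ col_totals / n

# Ratio > 1: the pair occurs more often than independence predicts ("attraction").
# Ratio < 1: the pair occurs less often than independence predicts ("repulsion").
ratio = observed / expected
```

With these illustrative counts, the pair (M1, N2) is observed 30 times against 20 expected under independence, a ratio of 1.5, which is the kind of excess that a proximity between M1 and N2 on the plot may reflect.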
CA also allows proximities between modalities of the same variable to be interpreted in terms of how similar the profiles of these modalities are across the modalities of the other variable.
The mathematical machinery of CA is somewhat heavy, and interpreting a CA is not as simple as the previous example might suggest. Nevertheless, CA is an irreplaceable tool for a quick and fairly intuitive interpretation of a large contingency table.
_____________________
CA generalizes to more than two nominal variables, and is then called "Multiple Correspondence Analysis" (MCA). Some English-language software packages use, in place of MCA, an equivalent technique known as "Homogeneity Analysis" (HOMogeneity AnaLysiS, or HOMALS).
_________________________________________________________
Tutorial 1
This first tutorial is an overview of Correspondence Analysis (CA). We show how a contingency table may be regarded as a numerical coding of the interaction between two categorical variables, through the frequencies of pairs of modalities.
A PCA-like transformation then allows the modalities of the variables to be represented as points in factorial planes. Visual analysis of these plots, and in particular of the proximities between modalities, then indicates whether the frequency profiles of two modalities across the modalities of the other variable are similar or not.
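As a minimal sketch of how a contingency table codes the pairs of modalities, the snippet below counts pairs in a small list of observations. The data and the helper name `contingency_table` are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical raw data: one (V1 modality, V2 modality) pair per individual.
observations = [
    ("M1", "N2"), ("M1", "N2"), ("M2", "N1"),
    ("M3", "N1"), ("M1", "N1"), ("M2", "N1"),
]

def contingency_table(pairs):
    """Count how many individuals chose each pair of modalities."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})   # modalities of the first variable
    cols = sorted({c for _, c in pairs})   # modalities of the second variable
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = contingency_table(observations)
```

Each cell of `table` holds the number of individuals who chose the corresponding pair, with rows ordered as in `rows` and columns as in `cols`.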
OVERVIEW OF CORRESPONDENCE ANALYSIS
* Interaction between categorical variables
* Independent categorical variables
* Interaction between categorical variables
* The mechanism of CA
* Contingency tables
* PCA on rows and on columns
* Simultaneous representation
* What is expected from a graphical representation?
* Axes
* Distance to the origin
* Two modalities belonging to the same variable
* Two modalities belonging to different variables
______________________________________________________
Tutorial 2
Correspondence Analysis does not work on raw contingency tables. It first normalizes them so that cell counts are replaced by frequencies, and the modalities of one variable are described by normalized "frequency profiles" across the modalities of the other variable.
We then explain why the traditional Euclidean distance is not appropriate in this setting for measuring the similarity between modalities, and has to be replaced by the so-called "Chi-square distance". The subsequent PCAs will be performed with this newly defined distance.
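The normalization into profiles and the Chi-square distance can be sketched as follows. This is only an illustration under the usual definition, in which each squared difference between two profiles is divided by the marginal frequency of the corresponding column; the function names and the counts are hypothetical.

```python
import numpy as np

def row_profiles(table):
    """Divide each row by its total so that every row sums to 1."""
    table = np.asarray(table, dtype=float)
    return table / table.sum(axis=1, keepdims=True)

def chi2_distance(table, i, k):
    """Chi-square distance between the profiles of rows i and k."""
    table = np.asarray(table, dtype=float)
    profiles = row_profiles(table)
    col_mass = table.sum(axis=0) / table.sum()   # marginal frequency of each column
    diff = profiles[i] - profiles[k]
    # Each squared difference is divided by the column's marginal frequency.
    return float(np.sqrt(np.sum(diff ** 2 / col_mass)))

# Hypothetical 3 x 3 table of counts (illustrative only).
counts = [[20, 10, 10],
          [40, 20, 20],
          [ 5, 30,  5]]

d_same = chi2_distance(counts, 0, 1)   # rows 0 and 1 are proportional
d_diff = chi2_distance(counts, 0, 2)   # rows 0 and 2 have different profiles
```

Rows 0 and 1 have the same profile although their totals differ, so their distance is zero. The division by the column frequency gives a difference observed on a rare column more weight than the plain Euclidean distance would.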
THE MECHANISM OF CORRESPONDENCE ANALYSIS
* Reformatting data
* Contingency tables
* Frequencies
* Profiles
* Weighting
* The Chi-square distance
* Definition of the Chi-square distance
* Why the Chi-square distance?
* The 2 PCAs
* How many dimensions?
* The barycenters
* Chi-square and total inertia
________________________________________
Tutorial 3
At this stage, we have performed two PCAs:
1) One on row profiles,
2) One on column profiles.
We are ready to proceed with the interpretation of the results. This interpretation is inspired by the interpretation procedure of regular PCA, with some changes due to the specifics of Correspondence Analysis: the weighting of the modalities, the Chi-square distance, and the ensuing changes in the interpretation of inertias.
We review here the elements needed for interpreting a CA. Later on, we will interpret a simple but realistic example of CA, keeping these elements in mind.
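The quantities reviewed in this tutorial can be computed in a few lines. The sketch below does not run the two PCAs separately: it uses a singular value decomposition of the standardized residuals, a common formulation that yields the row and column coordinates at once. The function name, the dictionary keys and the counts are assumptions made for this illustration, and the code assumes that every factor has a non-zero eigenvalue and that no row lies exactly at the origin.

```python
import numpy as np

def correspondence_analysis(table):
    """Minimal CA of a two-way table of counts via an SVD of standardized residuals."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                        # frequencies
    r = P.sum(axis=1)                      # weights of the row modalities
    c = P.sum(axis=0)                      # weights of the column modalities
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1                   # at most min(rows, columns) - 1 factors
    U, sv, Vt = U[:, :k], sv[:k], Vt[:k]
    eig = sv ** 2                          # inertia carried by each factor
    F = U * sv / np.sqrt(r)[:, None]       # coordinates of the row modalities
    G = Vt.T * sv / np.sqrt(c)[:, None]    # coordinates of the column modalities
    return {
        "eigenvalues": eig,
        "row_weights": r,
        "row_coords": F,
        "col_coords": G,
        # Share of each factor's inertia that is due to each row modality.
        "row_contrib": r[:, None] * F ** 2 / eig,
        # Squared cosines: quality of representation of each row on each factor.
        "row_cos2": F ** 2 / (F ** 2).sum(axis=1, keepdims=True),
    }

# Hypothetical 4 x 3 table of counts (illustrative only).
counts = [[20, 10,  5],
          [10, 20, 10],
          [ 5, 10, 20],
          [15, 15, 15]]

result = correspondence_analysis(counts)
total_inertia = result["eigenvalues"].sum()
```

Here `total_inertia` equals the Chi-square statistic of the table divided by the total count, the contributions of the rows to a given factor sum to 1, and the weighted barycenter of the row coordinates sits at the origin.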
INTERPRETATION OF CORRESPONDENCE ANALYSIS
* Plots
* Interpretation of the total inertia
* Eigenvalues
* Inertia of the modalities
* Weights of the modalities
* Coordinates, weight and inertia
* Barycenters and origin
* Contribution of a modality to a factor
* Quality of representation of the modalities
* Inertia of the factors
_____________________________________________
Tutorial 4
We now treat a simple but realistic example. Although real-life problems are usually considerably more complex, the step-by-step interpretation procedure demonstrated here remains very much the same. The treatment of this example covers the next three sections.
-----
The first section covers the interpretation of the factors.
EXAMPLE (Part 1) : INTERPRETATION OF THE FACTORS
* The data
* The contingency table
* The Chi-square
* The inertia
* Total inertia
* How many factors?
* Interpretation of the factors
* The basic principle
* Which modalities determine the first factor?
* Interpretation of the first factor
* The second factor
* Other factors
* Summary of the interpretation of the factors
________________________________________________
Tutorial 5
The role of the plots of modalities is to suggest associations between pairs of modalities that belong:
* either to the same variable,
* or to different variables.
In this section, we address the interpretation of each variable individually. Each of the two variables is described by its own plot, and we ask whether it is justified to overlay the two plots into a single combined plot.
EXAMPLE (Part 2) : INTERPRETING THE MODALITIES
* "Quality" or "Square Cosines"
* Distance to the origin
* "Near center" modalities
* "Remote" modalities
* Heavy modalities
* Neighboring modalities
____________________________________
Tutorial 6
In the previous section, we interpreted each variable individually. We could thus discover some properties of the modalities that could certainly have been dug out of the contingency table, but that the plot of modalities made much easier to identify.
We now come to interpreting the combined plot of modalities in order to analyze the interactions between the two variables. For this purpose, we display again the same "combined" plot as we did in the previous section, but this time we will consider both variables simultaneously.
EXAMPLE (Part 3) : THE COMBINED PLOT
* The basic idea
* Neighboring modalities
* Confirming with the contingency table
* Expected populations
* An association is not symmetrical
* Summary of the analysis
* Analysis of the cloud
* Interpretation of the factors
* Interpretation of individual variables
* Interpretation of the combined plot
_________________________________________
Tutorial 7
We finally address some additional questions pertaining to the interpretation of the plots:
* Supplementary variables, which are variables that were not taken into account for building the model, but that are displayed on the plots and may facilitate their interpretation.
* Ordinal variables, which are categorical variables whose modalities are naturally ordered. In particular, we show how nonlinear interactions between variables may then be detected by a fundamentally linear technique.
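To illustrate the idea of a supplementary element, the sketch below places on the existing factors a row that took no part in building them. It relies on the standard transition formula, in which the coordinates of a row are the average of the standard coordinates of the columns weighted by the row's profile. The function names and the counts are hypothetical, and the tutorial itself may proceed differently.

```python
import numpy as np

def column_standard_coordinates(table):
    """Standard coordinates of the columns of the active table (one column per factor)."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1                   # number of factors kept
    return Vt[:k].T / np.sqrt(c)[:, None]

def project_supplementary_row(table, new_row):
    """Coordinates of a row that did not take part in building the factors."""
    profile = np.asarray(new_row, dtype=float)
    profile = profile / profile.sum()      # only the profile of the new row matters
    return profile @ column_standard_coordinates(table)

# Hypothetical active table and supplementary row (illustrative only).
active = [[20, 10,  5],
          [10, 20, 10],
          [ 5, 10, 20],
          [15, 15, 15]]
supplementary = [2, 3, 9]

coords = project_supplementary_row(active, supplementary)
```

The supplementary row has no influence on the factors: only its profile is used, and it is simply positioned on the axes defined by the active table. A row whose profile equals the average profile of the table would land at the origin.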
CORRESPONDENCE ANALYSIS : COMPLEMENTS
* Supplementary variables
* Ordinal variables
* Interpreting the factors
* The Guttman effect
____________________________________________
See also: