Independent (variables)

This word has two different meanings :

 

  1. In predictive modeling, input variables are often called the "independent" variables (and the variable to be predicted is called the "dependent" variable). Remember that in descriptive modeling, no such distinction is made between variables.

  2. In statistics, two variables X and Y are said to be independent (from each other) if knowing the value that X takes for one particular observation provides absolutely no information as to the value that Y will take for the same observation (and conversely). In other words, whatever value X0 is imposed on X, the distribution of the values of Y among the observations with X = X0 is always the same (and conversely).
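The second definition can be checked empirically on a small table. The sketch below is only an illustration on made-up toy data: the function name `conditional_distributions` and both data sets are assumptions introduced here. It computes the distribution of Y for each value of X and compares the results.

```python
from collections import Counter

def conditional_distributions(pairs):
    """Return {x0: {y: relative frequency of y among observations with X == x0}}."""
    by_x = {}
    for x, y in pairs:
        by_x.setdefault(x, Counter())[y] += 1
    return {
        x: {y: n / sum(counts.values()) for y, n in counts.items()}
        for x, counts in by_x.items()
    }

# Toy data (made up): Y follows the same pattern (0, 0, 1, 2) for each value of X,
# so knowing X says nothing about Y.
independent = [(x, y) for x in ("a", "b") for y in (0, 0, 1, 2)]

# Toy data (made up): Y is mostly 0 when X is "a" and mostly 1 when X is "b".
dependent = [("a", 0), ("a", 0), ("a", 0), ("a", 1),
             ("b", 1), ("b", 1), ("b", 1), ("b", 0)]

ind = conditional_distributions(independent)
dep = conditional_distributions(dependent)

print("independent sample, same distribution of Y for every X:", ind["a"] == ind["b"])
print("dependent sample, same distribution of Y for every X:", dep["a"] == dep["b"])
```

On real data the conditional distributions are never exactly identical, so a statistical test would be needed in place of the strict equality used here.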

 

A very common case of departure from the condition of independence is collinearity. A set of variables is said to exhibit collinearity if any of the variables of the set is a (near) linear combination of the other variables in the set.
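One possible way to detect exact collinearity among three variables is to look at the determinant of the matrix of inner products between the centered columns: it vanishes when one column is a linear combination of the others. The sketch below assumes made-up toy columns, and the helper names `centered` and `gram_determinant` are introduced here for illustration only.

```python
def centered(col):
    """Subtract the mean from every value of a column."""
    m = sum(col) / len(col)
    return [v - m for v in col]

def gram_determinant(columns):
    """Determinant of the 3x3 matrix of inner products between three centered columns."""
    c = [centered(col) for col in columns]
    g = [[sum(a * b for a, b in zip(ci, cj)) for cj in c] for ci in c]
    return (
        g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
        - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
        + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0])
    )

# Toy columns (made up for this example).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
x3 = [2 * a - b for a, b in zip(x1, x2)]  # exact linear combination of x1 and x2
x4 = [1, 0, 0, 2, 7]                      # not a linear combination of x1 and x2

det_collinear = gram_determinant([x1, x2, x3])
det_other = gram_determinant([x1, x2, x4])

print("collinear set:", det_collinear)   # zero up to rounding
print("other set:", det_other)           # clearly non-zero
```

For near collinearity the determinant is small but not zero, and the threshold to use depends on the scale of the variables.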

 

Models built with (statistically) independent input variables tend to be stable and trustworthy. It is therefore good practice to feed a model with input variables that are "as independent from each other" as possible. That may mean resorting to dimensionality reduction techniques as pre-processing, as these techniques produce a smaller number of new variables that are "more independent from each other" than the original variables.
This is, for instance, just what PCA does. The new variables it creates (the so-called "Principal Components") are uncorrelated, which means that their pairwise correlation coefficients are exactly 0. But be careful, as uncorrelatedness is only a weak form of independence:

    * Two variables which are independent from each other (in the statistical sense) are uncorrelated,

    * But two variables with a "0" correlation coefficient may very well have a strong, nonlinear relationship (for more on that, see here).
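A classic illustration of the second point is a variable and its square over a range that is symmetric around zero. The sketch below uses made-up data and a hand-written `pearson` helper (an assumption of this example, not a library function): Y is entirely determined by X, yet the correlation coefficient is zero.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [-2, -1, 0, 1, 2]
ys_quadratic = [x ** 2 for x in xs]    # strong nonlinear dependence on x
ys_linear = [3 * x + 1 for x in xs]    # linear dependence on x

r_quadratic = pearson(xs, ys_quadratic)
r_linear = pearson(xs, ys_linear)

print("correlation with x squared:", r_quadratic)
print("correlation with 3x + 1:", r_linear)
```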
 

Techniques now exist that are more powerful than PCA. They generate new variables that are not only decorrelated, but also genuinely independent. We'll tell you more about them as soon as we know of commercially available versions of these techniques.

 

Indicators (Class)

In a database, categorical variables are usually represented by their modalities, which are non-numerical quantities. For example, the variable "Region" might have four modalities: "America", "Europe", "Asia" and "Africa" (top image). But certain algorithms can handle only numerical variables. It is then possible to code categorical variables into a numerical form as follows:

    1) If the variable has M modalities (here, 4), then M new columns are created.

    2) Each one of these new columns is assigned to one of the modalities of the variable.

    3) For each record in the table, all positions of these new columns are set to "0", except the one column that corresponds to the modality that the variable takes for this record, which is set to "1".

    4) The original column (with the modalities) is erased, or masked (bottom image).
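The four steps above can be sketched in a few lines. This is a minimal illustration on a made-up table: the function name `to_indicators`, the column naming scheme "Region=Europe" and the records themselves are assumptions of this example.

```python
def to_indicators(records, column):
    """Replace a categorical column by one 0/1 indicator column per modality."""
    # Steps 1 and 2: one new column per modality of the variable.
    modalities = sorted({record[column] for record in records})
    encoded = []
    for record in records:
        # Step 4: the original column is left out of the new record.
        new_record = {k: v for k, v in record.items() if k != column}
        # Step 3: "1" for the modality taken by this record, "0" elsewhere.
        for modality in modalities:
            new_record[f"{column}={modality}"] = 1 if record[column] == modality else 0
        encoded.append(new_record)
    return encoded

# Toy table (made up) using the four modalities of the "Region" example.
table = [
    {"Id": 1, "Region": "Europe"},
    {"Id": 2, "Region": "Asia"},
    {"Id": 3, "Region": "America"},
    {"Id": 4, "Region": "Africa"},
    {"Id": 5, "Region": "Europe"},
]

encoded_table = to_indicators(table, "Region")
for row in encoded_table:
    print(row)
```

Every encoded record contains exactly one "1" among its four indicator columns.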

 

 

 

Each of the new columns is now considered as a new numerical variable, whose values can only be "0" or "1". These new variables are called (class) "indicators". Therefore, the original categorical variable is replaced by as many numerical indicators as it has modalities.

 

Class indicators are particularly important in classification, where the dependent variable is, by its very nature, categorical. It can easily be shown that for each new record, the class posterior probabilities are just the values of the regression functions of the indicators. Therefore, the classes are first coded as indicators, then the regression functions of these indicators are estimated. Such techniques as Logistic Regression and Neural Networks operate just this way.
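The reason is that an indicator only takes the values 0 and 1, so its expected value given the inputs is the probability that it equals 1, that is, the posterior probability of the class. The sketch below illustrates this on made-up data with a single categorical input, for which the regression function of the indicator reduces to its mean within each input value. The function name `regression_of_indicator` and the data are assumptions of this example.

```python
from collections import defaultdict

def regression_of_indicator(xs, classes, target):
    """Mean of the 0/1 indicator of class `target` for each value of the input x."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, c in zip(xs, classes):
        sums[x] += 1 if c == target else 0   # value of the class indicator
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in counts}

# Toy data (made up): one input with two values, two classes "A" and "B".
xs = ["low"] * 4 + ["high"] * 4
classes = ["A", "A", "A", "B", "A", "B", "B", "B"]

posterior_a = regression_of_indicator(xs, classes, "A")
posterior_b = regression_of_indicator(xs, classes, "B")

print("estimated P(A | x):", posterior_a)
print("estimated P(B | x):", posterior_b)
```

For each input value the estimates for the two classes add up to 1, as posterior probabilities should.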

 

 When there are only two classes, only one class indicator is created. It takes the value "0" or "1" depending on which class the record belongs to.

 

Inertia  (of a cloud of points)

Please see here.


Intercept

Please see here.

 

Interpretability

A Decision Tree generates rules that state facts in business terms: it is interpretable. Under certain conditions, the coefficients of a Linear Regression can be interpreted in terms of the influence that each input variable has on the model's prediction.
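To make these two forms of interpretability concrete, here is a purely hypothetical sketch. The rule, the variable names ("age", "income_k") and the coefficients are all invented for illustration and do not come from any fitted model.

```python
def tree_rule(customer):
    """Hypothetical rule of the kind a decision tree might produce."""
    if customer["age"] < 30 and customer["income_k"] >= 40:
        return "likely buyer"
    return "unlikely buyer"

# Hypothetical linear model: prediction = intercept + sum of coefficient * input.
INTERCEPT = 10.0
COEFFICIENTS = {"age": -0.5, "income_k": 0.25}

def linear_prediction(customer):
    """Prediction of the hypothetical linear model for one record."""
    return INTERCEPT + sum(coef * customer[name] for name, coef in COEFFICIENTS.items())

customer = {"age": 25, "income_k": 50}
richer = {"age": 25, "income_k": 51}   # same record, income raised by one unit

label = tree_rule(customer)
effect_of_income = linear_prediction(richer) - linear_prediction(customer)

print("rule outcome:", label)
print("change in prediction for one more unit of income:", effect_of_income)
```

The change in the prediction equals the coefficient of the modified variable. Reading a coefficient as "the influence" of a variable in this way assumes that the other inputs can be held fixed, which is one of the conditions hinted at above.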

 

On the other hand, a Neural Network (supervised or unsupervised), even when it produces excellent results, produces nothing like "rules": it is not interpretable, and is said to function like a "black box" (although many tools are now available that help the user go beyond a simple prediction).

 

Interpretability is obviously a valuable property for a model. It should nevertheless be kept in mind that "there is no such thing as a free lunch", and that interpretability has a price :

 

        1) Interpretability comes as a by-product of the very simple architecture of certain models. Because of this very simplicity, interpretable models usually cannot provide high quality results (high bias).

 

        2) At the other end of the spectrum, Neural Networks, because of their richer architecture, are both powerful and non-interpretable.

 

In other words, there is a balance to be found by the user between performance and interpretability. Only the nature of the problem can decide on which side the scale should tip.

 
