Decision Trees

Decision Trees are one of the most popular Data Mining classification technique. Once built, a Decision Tree looks indeed like an upside-down arborescence with each  terminal node ("or "leaf") containing a fraction of the original sample consisting mostly of individuals belonging to one particular class.

 

 

A new observation "trickles down" the Tree from the root node to a "leaf" (see bottom illustration). Its path down the Tree is uniquely determined by the values of its attributes (predictors). The observation is then assigned to the dominating class in the leaf. More specifically, for every k, the posterior probability of  class k in the leaf may be estimated by the proportion of sample observations in the leaf that belong to class k.

 

Each leaf is a "segment" of the sample that is as homogenous as possible with respect to the dependent variable (class).

 

 Decision Trees can accommodate both numerical and categorical predictors. They are fast, can accommodate large volumes of data, and their decisions can be interpreted to a certain extent. But these predictions are usually less accurate than those made by more sophisticated techniques like Logistic Regression or Neural Networks.

 

There are several types of Decision Trees available to the practioner. What makes them different is how they decide on a split, and how they handle categorical predictors. The main types are known as :

    * CART

    * CHAID

    * C5

    * QUEST

 ___________________________________________________

 

 

Tutorial 1

 

In this first Tutorial, we go over the basic principles of Decision Trees. Although there are many different types of Trees, they differ mostly by the way nodes are split, but they share a large body of common strategy that is dictated by the desire to consider variables one at a time.

 

 

OVERVIEW OF DECISION TREES

What are Decision Trees ?

Classifiers (and regressors too)

Rectangular regions

Sub-optimal regions

Growing the Tree

Discriminating power of a predictor

Using both predictors

Growing the Tree

Choosing a predictor for a split

Choosing a split

Split criterion

"Greedy" algorithm

How big a Tree ?

Using the Tree

Classification of new observations

Identifying important predictors

TUTORIAL

___________________________________________

 

 

Tutorial 2

 

So far, we paid no attention to whether the predictors were categorical, ordinal or numerical. In fact, Decision Trees always consider predictors as categorical. What we mean is that if a predictor is numerical, it is first quantized into contiguous blocks that are then considered as modalities. Let's go over the details.

 

THE VARIOUS TYPES OF PREDICTORS

Categorical predictors

The standard case

Merging the modalities

Too many modalities

Chi-square test for merging

Binary Trees

A special case : binary dependent variable

Ordinal predictors

Merging

Splitting

Numerical predictors

TUTORIAL

_________________________________________

 

Tutorial 3

 

We now go over the various criteria used for deciding which node to split next, and how to define the split.

 

SPLITTING NODES

Impurity

Impurity of a node

Misclassification rate

Gini index

Entropy

Variance

Concavity of the impurity criterion

Impurity of the set of child nodes

Splitting the node

"Chi-square"-based splitting : CHAID

The "Twoing" split

Priors and weights

Costs

TUTORIAL

_________________________________________

 

 

Tutorial 4

 

Like all models, whether "parametric" or "non parametric", Decision Trees are prone to overfitting. When the process of tree growing is pushed beyond a certain limit, the predictive performance of the tree starts deteriorating : the leaves are then determined by the finest details of the data, that is, by noise.

Two approaches are used for avoiding overfitting :

    * Early stopping of the tree growth,

    * And pruning of a certainly overgrown tree.

 

 

THE "RIGHT SIZE" TREE

Tree "overgrowing"

Stopping the Tree growth

Stopping conditions

Node size

Parent node size

Child node size

Minimum impurity

Parent node impurity

Child node impurity

Set of child nodes impurity

Minimum decrease in impurity

Identifying the thresholds

Weaknesses of stopping conditions

"Cost-Complexity" Pruning

Penalizing large Trees

Best sub-tree

Pruning

Adjusting the penalty

Nested optimal subtrees

Choosing ai

 

TUTORIAL

_________________________________________

Download this Glossary