Decision Trees
Decision Trees are one of the most popular Data Mining classification technique. Once built, a Decision Tree looks indeed like an upside-down arborescence with each terminal node ("or "leaf") containing a fraction of the original sample consisting mostly of individuals belonging to one particular class.
A new observation "trickles down" the Tree from the root node to a "leaf" (see bottom illustration). Its path down the Tree is uniquely determined by the values of its attributes (predictors). The observation is then assigned to the dominating class in the leaf. More specifically, for every k, the posterior probability of class k in the leaf may be estimated by the proportion of sample observations in the leaf that belong to class k.
Each leaf is a "segment" of the sample that is as homogenous as possible with respect to the dependent variable (class).
Decision Trees can accommodate both numerical and categorical predictors. They are fast, can accommodate large volumes of data, and their decisions can be interpreted to a certain extent. But these predictions are usually less accurate than those made by more sophisticated techniques like Logistic Regression or Neural Networks.
There are several types of Decision Trees available to the practioner. What makes them different is how they decide on a split, and how they handle categorical predictors. The main types are known as :
* CART
* CHAID
* C5
* QUEST
___________________________________________________
|
Tutorial 1 |
In this first Tutorial, we go over the basic principles of Decision Trees. Although there are many different types of Trees, they differ mostly by the way nodes are split, but they share a large body of common strategy that is dictated by the desire to consider variables one at a time.
OVERVIEW OF DECISION TREES
|
What are Decision Trees ? Classifiers (and regressors too) Rectangular regions Sub-optimal regions Growing the Tree Discriminating power of a predictor Using both predictors Growing the Tree Choosing a predictor for a split Choosing a split Split criterion "Greedy" algorithm How big a Tree ? Using the Tree Classification of new observations Identifying important predictors |
||
|
TUTORIAL |
||
___________________________________________
|
Tutorial 2 |
So far, we paid no attention to whether the predictors were categorical, ordinal or numerical. In fact, Decision Trees always consider predictors as categorical. What we mean is that if a predictor is numerical, it is first quantized into contiguous blocks that are then considered as modalities. Let's go over the details.
THE VARIOUS TYPES OF PREDICTORS
|
Categorical predictors The standard case Merging the modalities Too many modalities Chi-square test for merging Binary Trees A special case : binary dependent variable Ordinal predictors Merging Splitting Numerical predictors |
||
|
TUTORIAL |
||
_________________________________________
|
Tutorial 3 |
We now go over the various criteria used for deciding which node to split next, and how to define the split.
SPLITTING NODES
|
Impurity Impurity of a node Misclassification rate Gini index Entropy Variance Concavity of the impurity criterion Impurity of the set of child nodes Splitting the node "Chi-square"-based splitting : CHAID The "Twoing" split Priors and weights Costs |
||
|
TUTORIAL |
||
_________________________________________
|
Tutorial 4 |
Like all models, whether "parametric" or "non parametric", Decision Trees are prone to overfitting. When the process of tree growing is pushed beyond a certain limit, the predictive performance of the tree starts deteriorating : the leaves are then determined by the finest details of the data, that is, by noise.
Two approaches are used for avoiding overfitting :
* Early stopping of the tree growth,
* And pruning of a certainly overgrown tree.
THE "RIGHT SIZE" TREE
|
Tree "overgrowing" Stopping the Tree growth Stopping conditions Node size Parent node size Child node size Minimum impurity Parent node impurity Child node impurity Set of child nodes impurity Minimum decrease in impurity Identifying the thresholds Weaknesses of stopping conditions "Cost-Complexity" Pruning Penalizing large Trees Best sub-tree Pruning Adjusting the penalty Nested optimal subtrees Choosing ai
|
||
|
TUTORIAL |
||
_________________________________________