Descriptive modeling
Descriptive Modeling is one of the two main branches of Data Modeling (the other being Predictive Modeling). It is also called "Exploratory Analysis".
Its purpose is to extract compact and easily understood information from large, sometimes gigantic, data tables. Unlike Predictive Modeling, it makes no distinction between variables: no variable is singled out as a target, and all are placed on the same footing.
Descriptive Modeling is not as neatly structured as Predictive Modeling. It encompasses a large and disparate set of goals and techniques. Here are a few examples:
* The study may focus on a single variable, and then requires only ordinary Statistics. Calculating the mean and variance of a variable, or drawing its histogram with various bin widths, are simple Descriptive Models.
* Studying pairs of variables, typically through their correlation coefficient, is a bit more complex. Stating that the correlation coefficient of "Height" and "Weight" is 0.65 is useful, compact, (not so) easily understood information that may have been extracted from a table with millions of rows.
* Some other Descriptive Models are definitely more complex. For example:
____________________________________________________________
These last two examples show that the distinction between Descriptive and Predictive Modeling is somewhat artificial.
Consider Principal Component Analysis (PCA), the most popular Dimensionality Reduction technique (although it is rarely used for that purpose). PCA works by detecting linear redundancies between variables, and then uses these redundancies to define a small number of synthetic variables that describe the data set with little loss of information.
From a theoretical standpoint, these redundancies could also be used to predict some of the original variables from the values of the others. PCA could therefore also be seen as a Predictive technique, but practitioners never consider it that way.
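The double reading of PCA can be made concrete with a small sketch. It assumes Python with NumPy (the text names no tool) and a synthetic, hypothetical two-column table in which `x2` is almost a linear function of `x1`. PCA is computed here through the singular value decomposition of the centered data; the first principal direction summarizes the table (descriptive use), and the same direction is then reused to estimate `x2` from `x1` alone (predictive use). The function name `pca` is illustrative only.

```python
import numpy as np


def pca(X, n_components):
    """PCA via the SVD of the column-centered data matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)      # share of total variance per direction
    components = Vt[:n_components]
    scores = Xc @ components.T           # the synthetic variables
    return mean, components, scores, explained


# Hypothetical table: x2 is almost a linear function of x1 (redundant columns).
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 2.0 * x1 + 0.1 * rng.normal(size=500)
X = np.column_stack([x1, x2])

# Descriptive use: one synthetic variable summarizes both columns.
mean, components, scores, explained = pca(X, n_components=1)
print("variance share of 1st component:", explained[0])

# Predictive use of the same redundancy: knowing only x1, recover the score t
# from x1 = mean[0] + t * v[0], then estimate x2 = mean[1] + t * v[1].
v = components[0]
t = (x1 - mean[0]) / v[0]
x2_hat = mean[1] + t * v[1]
mse = np.mean((x2 - x2_hat) ** 2)
print("mean squared error of x2 estimate:", mse)
```

Reconstructing along the first principal direction amounts to a total-least-squares fit, so this estimate is close to, but not identical with, an ordinary regression of `x2` on `x1`.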
Similarly, a high correlation coefficient between two numerical variables is typically regarded as useful descriptive information. Yet an immediate consequence is that Simple Linear Regression can predict the values of either variable from those of the other with reasonably good accuracy. This information can therefore be considered predictive as well as descriptive.
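The link between the correlation coefficient and Simple Linear Regression can be checked numerically. This sketch assumes Python with NumPy and synthetic, hypothetical Height/Weight data; the simulated correlation is only meant to be of the same order as the 0.65 figure used earlier. It relies on two standard identities: the least-squares slope equals r multiplied by the ratio of the standard deviations, and the fraction of variance explained by the regression equals r².

```python
import numpy as np

# Hypothetical data: heights in cm, weights in kg, linearly related plus noise.
rng = np.random.default_rng(1)
height = rng.normal(170.0, 10.0, size=10_000)
weight = 0.5 * height - 15.0 + rng.normal(0.0, 6.0, size=10_000)

# Descriptive summary: one number for the pair.
r = np.corrcoef(height, weight)[0, 1]

# Simple Linear Regression of weight on height, expressed through r:
# the slope is r * sd(weight) / sd(height), and the line passes through the means.
slope = r * weight.std() / height.std()
intercept = weight.mean() - slope * height.mean()
weight_hat = intercept + slope * height

# Share of the variance of weight explained by the regression; equals r**2.
r_squared = 1.0 - np.var(weight - weight_hat) / np.var(weight)
print(f"r = {r:.3f}, r^2 = {r**2:.3f}, explained variance = {r_squared:.3f}")
```

The same single number r thus serves both as a compact description of the pair and as a direct measure of how well one variable predicts the other.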
The conclusion is that Descriptive Modeling is not defined by a set of specific techniques, but rather by the need to account for some of the structures in the data in a compact and comprehensible way.
We already mentioned that Clustering detects redundancies between the rows of a data table. Recall that Predictive Modeling detects redundancies between its columns. It would therefore be conceivable to present Data Modeling along these two well-defined directions, but this is not customary.
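The row/column duality can be illustrated on a single table. The sketch below is a hypothetical illustration assuming Python with NumPy: a correlation matrix stands in for redundancy between columns, and a plain k-means (Lloyd's algorithm with deterministic farthest-point seeding), chosen here only as one common clustering method, stands in for redundancy between rows. The data are synthetic: two well-separated groups of rows, and a second column that is nearly twice the first. The function name `kmeans` is illustrative.

```python
import numpy as np


def kmeans(X, k, n_iter=20):
    """Lloyd's algorithm: partitions the ROWS of X into k groups."""
    # Deterministic farthest-point seeding: start from the first row, then
    # repeatedly add the row farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assign each row to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its rows; keep it if its group is empty.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers


# Hypothetical table: 100 rows, 3 columns.
rng = np.random.default_rng(2)
offset = np.repeat([0.0, 10.0], 50)         # two well-separated groups of rows
x1 = offset + rng.normal(size=100)
x2 = 2.0 * x1 + 0.1 * rng.normal(size=100)  # column nearly redundant with x1
x3 = rng.normal(size=100)                   # column unrelated to the others
X = np.column_stack([x1, x2, x3])

# Redundancy between COLUMNS: a 3 x 3 correlation matrix.
col_corr = np.corrcoef(X, rowvar=False)

# Redundancy between ROWS: one cluster label per row.
labels, centers = kmeans(X, k=2)
print(np.round(col_corr, 2))
print(np.bincount(labels))
```

The correlation matrix has one entry per pair of columns, while the clustering returns one label per row: the two outputs compress the same table along its two different axes.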
____________________________________________
Related readings