Classification and regression trees

Classification and regression trees are machine‐learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between the observed and predicted values. This article gives an introduction to the subject by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1, 14–23. DOI: 10.1002/widm.8
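
As a rough illustration of the recursive-partitioning idea described above, the sketch below grows one classification tree and one regression tree with scikit-learn's CART-style implementation. The library choice, the synthetic data, and all parameter settings are illustrative assumptions; the algorithms actually reviewed in the article (CHAID, CRUISE, QUEST, RPART, C4.5, GUIDE, M5') each select splits in their own way.

```python
# Minimal sketch of classification and regression trees, assuming
# scikit-learn as a stand-in for the algorithms reviewed in the article;
# the data and parameter settings are purely illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)

# Classification: a dependent variable with three unordered values.
X_cls = rng.normal(size=(300, 2))
y_cls = (X_cls[:, 0] > 0).astype(int) + (X_cls[:, 1] > 1).astype(int)  # classes 0, 1, 2
clf = DecisionTreeClassifier(max_depth=3).fit(X_cls, y_cls)
print(export_text(clf, feature_names=["x1", "x2"]))  # the recursive partition as a tree

# Regression: a continuous dependent variable, squared-error loss,
# with a constant (the leaf mean) fitted within each partition.
X_reg = rng.uniform(0, 10, size=(300, 1))
y_reg = np.sin(X_reg[:, 0]) + rng.normal(scale=0.2, size=300)
reg = DecisionTreeRegressor(max_depth=3).fit(X_reg, y_reg)
print(reg.predict([[2.5]]))  # prediction = mean response in the matched leaf
```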

Figure 1.

Partitions (left) and decision tree structure (right) for a classification tree model with three classes labeled 1, 2, and 3. At each intermediate node, a case goes to the left child node if and only if the condition is satisfied. The predicted class is given beneath each leaf node.
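
The routing rule in this caption (left child if and only if the split condition holds) can be made concrete with a small hand-coded tree; the split variables and thresholds below are hypothetical stand-ins, not the splits shown in the figure.

```python
# Hand-coded routing through a classification tree with three classes;
# all split conditions here are hypothetical illustrations.
def predict_class(x1: float, x2: float) -> int:
    if x1 <= 0.5:          # condition satisfied -> go to left child
        if x2 <= 1.0:      # second-level split
            return 1       # leaf: predicted class
        return 2
    return 3               # condition not satisfied -> right child (a leaf)

print(predict_class(0.2, 0.7))  # -> 1
```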

Figure 2.

CRUISE, QUEST, RPART, C4.5, and GUIDE trees for car data without manuf. The CHAID tree is trivial with no splits. At each intermediate node, a case goes to the left child node if and only if the condition is satisfied. The predicted class is given beneath each leaf node.

Figure 3.

Data and split at the node marked with an asterisk (*) in the GUIDE tree in Figure 2.

Figure 4.

CRUISE, C4.5, and GUIDE trees for car data with manuf included. The CHAID, RPART, and QUEST trees are trivial with no splits. Sets S1 and S2 are {Plymouth, Subaru} and {Lexus, Lincoln, Mercedes, Mercury, Volvo}, respectively, and S3 is the complement of S1 ∪ S2.

Figure 5.

GUIDE piecewise constant, simple linear, simple quadratic, and stepwise linear regression trees, and the M5' piecewise constant regression tree, for predicting FEV. The RPART tree is a subtree of (a), with its leaf nodes marked by asterisks (*). The mean FEV and the linear predictors (with the signs of their coefficients) are printed beneath each leaf node. Variable ht2 is the square of ht.
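
A piecewise simple linear fit like the one in panel (b) can be roughly emulated in two stages: partition the cases with a regression tree, then fit a one-predictor least-squares line within each leaf. GUIDE selects its splits quite differently (via significance tests rather than exhaustive search), so the scikit-learn sketch below, with synthetic stand-ins for the FEV data, is only an assumption-laden approximation.

```python
# Rough emulation of a piecewise simple linear regression tree; GUIDE's
# split selection differs, and ht/fev here are synthetic stand-ins.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
ht = rng.uniform(45, 75, size=(400, 1))       # hypothetical height values
fev = 0.002 * ht[:, 0] ** 2 + rng.normal(scale=0.3, size=400)

# Stage 1: partition the data space (five leaves, as in Figure 6).
tree = DecisionTreeRegressor(max_leaf_nodes=5).fit(ht, fev)
leaves = tree.apply(ht)                        # leaf index of each case

# Stage 2: fit a simple linear model within each partition.
models = {node: LinearRegression().fit(ht[leaves == node], fev[leaves == node])
          for node in np.unique(leaves)}

def predict(x):
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return np.array([models[n].predict(x[i:i + 1])[0]
                     for i, n in enumerate(tree.apply(x))])

print(predict([50, 60, 70]))
```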

Figure 6.

Data and fitted regression lines in the five leaf nodes of the GUIDE piecewise simple linear model in Figure 5(b).

Figure 7.

Data and fitted regression functions in the three leaf nodes of the GUIDE piecewise simple quadratic model in Figure 5(c).

Figure 8.

Observed versus predicted values for the tree models in Figure 5 and two ordinary least squares models.


