How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 1.939

Hyperparameters and tuning strategies for random forest


The random forest (RF) algorithm has several hyperparameters that have to be set by the user, for example, the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain, and the number of trees. In this paper, we first provide a literature review on the parameters' influence on the prediction performance and on variable importance measures. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after presenting a brief overview of tuning strategies, we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with those of other tuning implementations in R and of RF with default hyperparameters.

This article is categorized under:
Algorithmic Development > Biological Data Mining
Algorithmic Development > Statistics
Algorithmic Development > Hierarchies and Trees
Technologies > Machine Learning
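
The hyperparameters listed in the abstract can be written down as a search space. The following is a minimal Python sketch under stated assumptions, not the tuneRanger implementation: the field names, the value ranges, and the `objective` function are hypothetical, and the tuning strategy shown is plain random search, a simpler alternative to the model-based optimization that the paper demonstrates.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RFConfig:
    """One random forest configuration; field names are illustrative."""
    sample_fraction: float  # share of observations drawn for each tree
    replace: bool           # draw observations with or without replacement
    mtry: int               # number of variables drawn for each split
    split_rule: str         # splitting rule
    min_node_size: int      # minimum number of samples in a node
    num_trees: int          # number of trees


def sample_config(n_features, rng):
    """Draw one configuration from a hypothetical search space."""
    return RFConfig(
        sample_fraction=rng.uniform(0.2, 1.0),
        replace=rng.choice([True, False]),
        mtry=rng.randint(1, n_features),  # inclusive on both ends
        split_rule=rng.choice(["gini", "extratrees"]),
        min_node_size=rng.randint(1, 20),
        num_trees=rng.choice([250, 500, 1000]),
    )


def random_search(objective, n_features, n_iter=20, seed=0):
    """Return the sampled configuration with the lowest objective value.

    `objective` is supplied by the caller and maps an RFConfig to a loss,
    for example a cross-validated error rate.
    """
    rng = random.Random(seed)
    candidates = [sample_config(n_features, rng) for _ in range(n_iter)]
    return min(candidates, key=objective)
```

In practice `objective` would fit a forest with the given configuration and return an estimated error; here it is left to the caller so that the sketch stays independent of any particular RF library.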
Boxplots of performance differences relative to the ranger default. The left side shows the boxplots with outliers, and the right side shows the same plots without outliers. For the error rate, the Brier score, and the logarithmic loss, lower values are better, whereas for the area under the curve (AUC), higher values are preferable. If the tuned measure equals the evaluation measure, the boxplot is displayed in gray.
Average runtime of the different algorithms on different datasets (upper plot: unscaled; lower plot: logarithmic scale). The datasets are ordered by the average runtime of the tuneRangerMMCE algorithm.
