How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 7.250

Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics


Abstract

The random forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables, and returns measures of variable importance. This paper synthesizes 10 years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is paid to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics, some representative examples of RF applications in this context, and possible directions for future research. © 2012 Wiley Periodicals, Inc.

This article is categorized under:
Algorithmic Development > Hierarchies and Trees
Algorithmic Development > Statistics
Application Areas > Health Care

Random forest algorithm.


Two types of variable importance for the example data set: (a) permutation VIM; (b) Gini VIM. The two VIMs rank the variables differently.
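To illustrate the two importance measures named in this caption, here is a minimal pure-Python sketch of a small random forest that computes both a Gini VIM (impurity decrease summed per split variable) and a permutation VIM (drop in out-of-bag accuracy after shuffling one variable). It is not the paper's code or Breiman's reference implementation. The function names (`gini`, `best_split`, `grow`, `predict`, `random_forest_vims`), the depth cap, the parameter values, and the synthetic data are all assumptions made for illustration; the data are not the example data set shown in the figure.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, idx, features):
    """Best threshold split among the candidate features, or None if no split helps."""
    parent = gini([y[i] for i in idx])
    best = None
    for f in features:
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            child = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(idx)
            gain = len(idx) * (parent - child)  # impurity decrease, weighted by node size
            if gain > 1e-12 and (best is None or gain > best[0]):
                best = (gain, f, t, left, right)
    return best


def grow(X, y, idx, mtry, rng, gini_vim, depth=0, max_depth=5):
    """Grow one tree; a leaf is a class label, an inner node is (feature, threshold, left, right)."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    candidates = rng.sample(range(len(X[0])), mtry)  # random feature subset at each node
    split = best_split(X, y, idx, candidates)
    if split is None:
        return majority
    gain, f, t, left, right = split
    gini_vim[f] += gain  # Gini VIM: credit the impurity decrease to the split variable
    return (f, t,
            grow(X, y, left, mtry, rng, gini_vim, depth + 1, max_depth),
            grow(X, y, right, mtry, rng, gini_vim, depth + 1, max_depth))


def predict(tree, row):
    """Follow the splits until a leaf (a bare class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def random_forest_vims(X, y, n_trees=30, mtry=2, seed=0):
    """Return (Gini VIM, permutation VIM), each averaged over the trees."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    gini_vim, perm_vim = [0.0] * p, [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag observations
        tree = grow(X, y, boot, mtry, rng, gini_vim)
        if not oob:
            continue
        base = sum(predict(tree, X[i]) == y[i] for i in oob) / len(oob)
        for f in range(p):
            shuffled = [X[i][f] for i in oob]
            rng.shuffle(shuffled)  # break the link between variable f and the response
            hits = 0
            for i, v in zip(oob, shuffled):
                row = list(X[i])
                row[f] = v
                hits += predict(tree, row) == y[i]
            perm_vim[f] += base - hits / len(oob)  # permutation VIM: loss of OOB accuracy
    return [g / n_trees for g in gini_vim], [q / n_trees for q in perm_vim]


# Synthetic toy data (an assumption): only variable 0 determines the class.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(60)]
y = [int(row[0] > 0.5) for row in X]
gini_vim, perm_vim = random_forest_vims(X, y, n_trees=30, mtry=2, seed=0)
print("Gini VIM:       ", gini_vim)
print("Permutation VIM:", perm_vim)
```

In this toy setting both measures should place variable 0 first, but the noise variables can still collect some Gini importance from splits made at nodes where variable 0 was not among the candidates. The two measures are on different scales, so only the rankings are comparable.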


Related Articles

Classification and regression trees

