How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 4.476

Predicting disease‐associated genes: Computational methods, databases, and evaluations



Abstract

Complex diseases are associated with sets of genes (called disease genes), and identifying these genes can help scientists uncover disease mechanisms and develop new drugs and treatment strategies. Because experimental identification techniques are costly and time-consuming, many computational algorithms have been proposed to predict disease genes. Several recent reviews have discussed computational methods, but some focus on cancer driver genes and others on biomolecular networks, so each covers only a specific aspect of the existing methods. In this review, we summarize existing methods and classify them into three categories based on their rationales. We then discuss the algorithms, biological data, and evaluation methods used in computational prediction. Finally, we highlight the limitations of existing methods and point out future directions for improving these algorithms. This review could help investigators understand the principles of existing methods and thus develop new methods to advance the computational prediction of disease genes.

This article is categorized under: Technologies > Machine Learning; Technologies > Prediction; Algorithmic Development > Biological Data Mining
Classification of existing computational methods for disease gene prediction
Eight types of evidence valuable for disease gene prediction. The five types of evidence in the left blue circle characterize the functional similarity of genes, and the two types of evidence in the right yellow circle contain disease‐associated information. Gene expression, in the middle, contains both types of information
Classic pipeline of supervised machine learning‐based methods
Schematic example of a random walk. The network contains 50 nodes (genes), 15 of which are disease‐associated. Their corresponding entries in P0 are equal to 1. The random walk is performed with a restart probability of r = 0.5, and it reaches a steady state when t = 17
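
The random walk described in this caption can be illustrated with a minimal sketch. It assumes the commonly used random walk with restart update, P(t+1) = (1 - r) W P(t) + r P0, where W is the column-normalized adjacency matrix; the caption itself does not state the update rule, so this form is an assumption. The function name, the L1 convergence tolerance, and the four-node toy network are hypothetical and chosen only for illustration. Following the caption, the entries of P0 for the seed (disease-associated) genes are set to 1 and are not rescaled.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, r=0.5, tol=1e-6, max_iter=1000):
    """Iterate p <- (1 - r) * W @ p + r * p0 until the L1 change drops below tol.

    adj   : square adjacency matrix (list of lists or array), assumed symmetric
    seeds : indices of known disease-associated genes (entries of p0 set to 1)
    r     : restart probability
    Returns the final score vector and the number of iterations performed.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # isolated nodes: avoid division by zero
    w = adj / col_sums                 # column-normalized transition matrix

    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0              # seed entries equal to 1, as in the caption
    p = p0.copy()

    for t in range(1, max_iter + 1):
        p_next = (1 - r) * (w @ p) + r * p0
        if np.abs(p_next - p).sum() < tol:   # assumed steady-state criterion
            return p_next, t
        p = p_next
    return p, max_iter

# Hypothetical toy network: a path of four genes 0 - 1 - 2 - 3, with gene 0 as the seed.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores, steps = random_walk_with_restart(adj, seeds=[0], r=0.5)
ranking = np.argsort(-scores)          # genes ordered from highest to lowest score
```

In this toy case the steady-state scores decrease with distance from the seed gene, which is the intuition behind ranking candidate genes by their proximity to known disease genes in the network. The number of iterations needed depends on the tolerance and the network, so it will not generally match the t = 17 reported for the 50-node example.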

Browse by Topic

Algorithmic Development > Biological Data Mining
Technologies > Prediction
Technologies > Machine Learning
