
Machine‐learning scoring functions to improve structure‐based binding affinity prediction and virtual screening


Docking tools, which predict whether and how a small molecule binds to a target, can be applied when a structural model of that target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure‐based binding affinity prediction or virtual screening has proven challenging for every class of method. New SFs based on modern machine‐learning regression models, which do not impose a predetermined functional form and can therefore exploit much larger amounts of experimental data, have recently been introduced. These machine‐learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The picture emerging from these studies is that the classical approach of linear regression on a small number of expert‐selected structural features can be strongly improved by a machine‐learning approach based on nonlinear regression allied with comprehensive data‐driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets, so this performance gap is expected to widen as more training data become available. Other topics covered in this review include predicting the reliability of an SF on a particular target class, generating synthetic data to improve predictive performance, and modeling guidelines for SF development.

WIREs Comput Mol Sci 2015, 5:405–424. doi: 10.1002/wcms.1225

This article is categorized under: Computer and Information Science > Chemoinformatics

Examples of force‐field, knowledge‐based, empirical, and machine‐learning scoring functions (SFs). The first three types, collectively termed classical SFs, are distinguished by the type of structural descriptors employed. However, from a mathematical perspective, all classical SFs assume an additive functional form. By contrast, nonparametric machine‐learning SFs do not make assumptions about the form of the functional. Instead, the functional form is inferred from training data in an unbiased manner. As a result, classical and machine‐learning SFs behave very differently in practice.
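
The contrast drawn in this caption can be written out as two schematic equations. This is only an illustrative sketch: the symbols (the predicted binding free energy, the descriptors x_i, the weights w_i, and the learned function f) are generic notation assumed here and are not taken from the article.

```latex
% Classical (additive) form: a weighted sum of expert-selected terms x_i,
% with the weights w_i fitted by linear regression.
\Delta G_{\mathrm{bind}} \approx w_0 + \sum_{i=1}^{n} w_i \, x_i

% Nonparametric machine-learning form: no functional form is fixed in advance;
% f is inferred from the training data and may include interactions between descriptors.
\Delta G_{\mathrm{bind}} \approx f(x_1, \ldots, x_n)
```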
Blind test showing how test set performance (Rp) grows with more training data when using random forest (models 3 and 4), but stagnates with multiple linear regression (model 2). Model 1 is AutoDock Vina acting as a baseline for performance.
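
The qualitative behavior described in this caption can be reproduced on toy data. The sketch below is not the article's experiment: the data are synthetic, the two "descriptors" and the target function are invented for illustration, and a k-nearest-neighbour regressor stands in for random forest so that the example needs only the Python standard library. It measures test-set Pearson correlation (Rp) for a linear model and for the nonparametric model as the training set grows.

```python
import heapq
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient (Rp) between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def make_data(n, rng):
    """Synthetic data (an assumption): two descriptors and a target with an
    interaction term x1*x2 that an additive model cannot represent."""
    X = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(n)]
    y = [0.5 * x1 + x1 * x2 + rng.gauss(0, 0.3) for x1, x2 in X]
    return X, y


def fit_linear(X, y):
    """Ordinary least squares with two descriptors and an intercept."""
    n = len(y)
    m1 = sum(r[0] for r in X) / n
    m2 = sum(r[1] for r in X) / n
    my = sum(y) / n
    s11 = sum((r[0] - m1) ** 2 for r in X)
    s22 = sum((r[1] - m2) ** 2 for r in X)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in X)
    s1y = sum((r[0] - m1) * (t - my) for r, t in zip(X, y))
    s2y = sum((r[1] - m2) * (t - my) for r, t in zip(X, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return lambda r: b0 + b1 * r[0] + b2 * r[1]


def fit_knn(X, y, k=5):
    """Nonparametric stand-in for random forest: average of the k nearest targets."""
    data = list(zip(X, y))

    def predict(r):
        nearest = heapq.nsmallest(
            k, data, key=lambda d: (d[0][0] - r[0]) ** 2 + (d[0][1] - r[1]) ** 2
        )
        return sum(t for _, t in nearest) / k

    return predict


def learning_curve(sizes=(25, 100, 400, 1600), n_test=300, seed=0):
    """Return (training size, linear Rp, kNN Rp) on a fixed held-out test set."""
    rng = random.Random(seed)
    X_pool, y_pool = make_data(max(sizes), rng)
    X_test, y_test = make_data(n_test, rng)
    rows = []
    for n in sizes:
        lin = fit_linear(X_pool[:n], y_pool[:n])
        knn = fit_knn(X_pool[:n], y_pool[:n])
        rows.append((
            n,
            pearson([lin(r) for r in X_test], y_test),
            pearson([knn(r) for r in X_test], y_test),
        ))
    return rows


if __name__ == "__main__":
    for n, rp_linear, rp_knn in learning_curve():
        print(f"n_train={n:5d}  linear Rp={rp_linear:.3f}  kNN Rp={rp_knn:.3f}")
```

On this toy problem the linear model is limited by its additive form rather than by the amount of data, which is the mechanism the caption attributes to multiple linear regression. The numbers it prints say nothing about real protein‐ligand benchmarks.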
Workflow to train and validate a scoring function (SF). Feature Selection (FS) can be data‐driven or expert‐based (for simplicity, we are not representing embedded FS that would take place at the model training stage). A range of machine‐learning regression or classification models can be used for training, whereas linear regression has been used with classical SFs. Model selection has ranged from taking the best model on the training set to selecting that with the best cross‐validated performance. Metrics for model selection and performance evaluation depend on the application.
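
The two model-selection strategies mentioned in this caption can be compared with a minimal sketch. Everything here is an assumption made for illustration: the one-dimensional synthetic data, the k-nearest-neighbour regressor used as the trainable model, the candidate values of k, and the choice of Rp as the selection metric. Feature selection is omitted. The point is only that selecting on training-set fit favors the most flexible model, whereas cross-validation penalizes overfitting.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient (Rp) between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def score(train, evaluate, k):
    """Rp of a k-NN model built on `train` and evaluated on `evaluate`."""
    preds = [knn_predict(train, x, k) for x, _ in evaluate]
    return pearson(preds, [y for _, y in evaluate])


def cross_validated_score(data, k, folds=5):
    """Mean Rp over `folds` held-out partitions of the training data."""
    scores = []
    for f in range(folds):
        held_out = data[f::folds]
        kept = [p for i, p in enumerate(data) if i % folds != f]
        scores.append(score(kept, held_out, k))
    return sum(scores) / folds


def run_workflow(seed=1, n=300, candidates=(1, 5, 25)):
    """Train, select, and evaluate on synthetic data (hypothetical setup)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.5)))
    # The last 100 points form the test set and are never used for selection.
    train, test = data[:200], data[200:]
    by_training_fit = max(candidates, key=lambda k: score(train, train, k))
    by_cross_validation = max(
        candidates, key=lambda k: cross_validated_score(train, k)
    )
    return {
        "selected_on_training_set": by_training_fit,
        "selected_by_cv": by_cross_validation,
        "test_rp": {k: score(train, test, k) for k in candidates},
    }


if __name__ == "__main__":
    result = run_workflow()
    print("selected on training set:", result["selected_on_training_set"])
    print("selected by cross-validation:", result["selected_by_cv"])
    for k, rp in result["test_rp"].items():
        print(f"k={k:2d}  test Rp={rp:.3f}")
```

With k=1 each training point is its own nearest neighbour, so the training-set fit is perfect and that model is always chosen by the first strategy, regardless of how it generalizes.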
Criteria to select data to build and validate scoring functions (SFs). Protein‐ligand complexes can be selected by their quality, by protein‐family membership, and by type of structural and binding data, depending on the intended docking application and modeling strategy. Classical SFs typically employ a few hundred X‐ray crystal structures of the highest quality, along with their binding constants, to score complexes with proteins from any family. In contrast, data selection for machine‐learning SFs is much more varied, with the largest training data volumes leading to the best performances.
