How to cite this WIREs title: WIREs Comput Mol Sci
Impact Factor: 8.127

Machine‐learning scoring functions for structure‐based drug lead optimization

Abstract

Molecular docking can be used to predict how strongly small‐molecule binders and their chemical derivatives bind to a macromolecular target using its available three‐dimensional structures. Scoring functions (SFs) are employed to rank these molecules by their predicted binding affinity (potency). A classical SF assumes a predetermined, theory‐inspired functional form for the relationship between the features characterizing the structure of the protein–ligand complex and its predicted binding affinity (this relationship is almost always assumed to be linear). Recent years have seen the rise of machine‐learning SFs, which are fast regression models built instead with contemporary supervised learning algorithms. In this review, we analyze machine‐learning SFs for drug lead optimization published in the 2015–2019 period. The performance gap between classical and machine‐learning SFs was already large and has now widened, owing to methodological improvements and the availability of more training data. Against the expectations of many experts, SFs employing deep learning techniques were not always more predictive than those based on more established machine learning techniques and, when they were, the performance gain was small. More codes and webservers are now available and ready to be applied to prospective structure‐based drug lead optimization studies. These have exhibited excellent predictive accuracy in compelling retrospective tests, in some cases outperforming much more computationally demanding molecular simulation‐based methods. A discussion of future work completes this review.

This article is categorized under: Computer and Information Science > Chemoinformatics
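
The linear functional form that the abstract attributes to classical SFs can be illustrated with a minimal sketch. The function name `linear_sf`, the three descriptors, and all weights below are hypothetical values chosen for illustration only; they are not taken from any published SF.

```python
from typing import Sequence


def linear_sf(features: Sequence[float],
              weights: Sequence[float],
              intercept: float = 0.0) -> float:
    """Classical-style SF sketch: predicted affinity as a weighted sum of features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return intercept + sum(w * x for w, x in zip(weights, features))


# Hypothetical descriptors of one protein-ligand complex (illustrative only):
# [hydrogen-bond count, hydrophobic contact count, ligand rotatable bonds]
features = [3.0, 12.0, 5.0]
weights = [0.5, 0.1, -0.2]  # assumed weights, not fitted to any data

score = linear_sf(features, weights, intercept=1.0)
print(f"predicted affinity (arbitrary units): {score:.2f}")
```

A machine‐learning SF would keep the same inputs and output but replace the fixed weighted sum with a regression model whose functional form is learned from training data.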
Performance of the classical scoring functions (SFs) tested on CASF‐2007 (“+” signs), along with every SF that has surpassed the previous best Pearson correlation (Rp) performance (“x” signs, all machine learning [ML]‐based SFs). This shows that: (a) no competitive classical SF has been introduced since the advent of ML‐based SFs, (b) deep learning‐based SFs are not necessarily the most predictive, and (c) as we can now predict the affinities of protein–ligand complexes with high accuracy on average across diverse targets, future efforts should focus instead on which SF is most predictive for each target (accuracy on some targets is still poor).
PDBbind v2018 blind test showing how test set performance (in terms of Rp) grows with more training data (from the PDBbind refined sets v2007 to v2017) when using random forest, but stagnates with multiple linear regression. AutoDock Vina acts as a baseline without retraining.
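
Rp in the captions above is the Pearson correlation coefficient between measured and predicted binding affinities. A minimal sketch of how it could be computed is given below; the helper name `pearson_rp` and the affinity values are illustrative assumptions, not data from the review.

```python
import math
from typing import Sequence


def pearson_rp(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between measured and predicted affinities."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    # Sums of products of deviations from the means
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_m * var_p)


# Made-up affinities (e.g. pKd values) for five test complexes, illustration only
measured = [4.2, 5.1, 6.3, 7.8, 9.0]
predicted = [4.8, 5.0, 6.9, 7.1, 8.4]

rp = pearson_rp(measured, predicted)
print(f"Rp = {rp:.3f}")
```

In a comparison like the one described in the caption, the same test set would be scored by each model after retraining on successively larger training sets, with Rp recomputed each time.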