How to cite this WIREs title:
WIREs Comput Mol Sci
Impact Factor: 25.113

Featurization strategies for protein–ligand interactions and their applications in scoring function development

Abstract

The predictive performance of classical scoring functions (SFs) seems to have reached a plateau. Meanwhile, SFs that rely on sophisticated machine learning techniques have shown great potential in binding affinity prediction and virtual screening. Featurization, or representation, is one of the most indispensable steps in training a machine learning scoring function (MLSF): it captures the physical processes that are important for protein–ligand interactions and turns them into machine‐readable descriptors. According to how they are derived, the descriptors used in MLSFs for both continuous and binary binding affinity estimates can be grouped into two broad categories: handcrafted features and automated‐extraction features. Automated‐extraction features have emerged as a new featurization trend alongside the application of deep learning algorithms. Here, we summarize the advances in featurization strategies for protein–ligand interactions in the context of MLSFs, with emphasis on the recently emerging automated‐extraction features. We also discuss the similarity between protein–ligand interaction representations and small‐molecule representations, and the challenges the scientific community faces in characterizing protein–ligand interactions. We hope that this review will inspire the development of novel featurization approaches and improved MLSFs.

This article is categorized under:
Data Science > Artificial Intelligence/Machine Learning
Software > Molecular Modeling
Molecular and Statistical Mechanics > Molecular Interactions

The workflow to train a machine learning scoring function
The concordance between protein–ligand complex representation and small‐molecule representation
Conceptual illustration of automated‐extraction featurization strategies for protein–ligand complexes. (a) The local context of each compound atom, including atom types, distances, and amino acid types, can be learned to translate protein–ligand complexes into fixed‐sized vectors. (b) 3D grids of atomic densities that store the ligand and receptor information are currently the most used automated‐extraction featurization strategy. (c) The concept of adjacency derived from the atomic distance matrix is adopted in graph‐based features.
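
Panels (b) and (c) can be made concrete with a minimal sketch. It assumes toy coordinates, a 4 Å adjacency cutoff, and a 2 Å voxel size, none of which come from the article; the function names are hypothetical, and plain per-voxel atom counts stand in for the smoothed atomic densities that published grid‐based models typically use.

```python
from collections import Counter
from math import dist, floor

# Hypothetical toy coordinates in angstroms (not taken from any real complex).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 3.0, 0.0), (9.0, 9.0, 9.0)]


def distance_matrix(points):
    """Pairwise Euclidean distances between all atoms."""
    return [[dist(a, b) for b in points] for a in points]


def adjacency(dmat, cutoff=4.0):
    """Panel (c): atoms i and j are neighbours if they lie within `cutoff`.

    The cutoff value is an illustrative assumption.
    """
    n = len(dmat)
    return [
        [1 if i != j and dmat[i][j] <= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


def voxel_counts(points, origin=(0.0, 0.0, 0.0), voxel_size=2.0, n_voxels=5):
    """Panel (b), simplified: count atoms per voxel of a cubic grid.

    Atoms that fall outside the grid are ignored.
    """
    grid = Counter()
    for point in points:
        idx = tuple(floor((p - o) / voxel_size) for p, o in zip(point, origin))
        if all(0 <= i < n_voxels for i in idx):
            grid[idx] += 1
    return dict(grid)


adj = adjacency(distance_matrix(coords))
grid = voxel_counts(coords)
```

In a real pipeline each grid would usually carry one channel per atom type, and the adjacency matrix would be paired with per‐atom feature vectors before being passed to a convolutional or graph neural network.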
A schematic representation of the handcrafted featurization strategies for protein–ligand complexes. (a) Interaction energy terms involved in protein–ligand recognition can characterize protein–ligand complexes. (b) Occurrences of protein–ligand atom pairs at different distance thresholds are frequently used in machine learning scoring functions. (c) Residue‐based terms generated in molecular mechanics/generalized Born surface area (MM/GBSA) energy decomposition can be regarded as a kind of protein–ligand interaction fingerprint. (d) The multiscale weighted colored subgraph is one example of a mathematical feature.
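
The atom‐pair occurrence counts in panel (b) can be sketched as follows. This is a minimal illustration, not the scheme of any specific published scoring function: the toy atoms, the element set, and the cumulative cutoffs of 2, 4, and 6 Å are all assumptions, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import dist

# Hypothetical toy atoms as (element, (x, y, z)) with coordinates in angstroms.
protein_atoms = [("C", (0.0, 0.0, 0.0)), ("N", (3.5, 0.0, 0.0)), ("O", (0.0, 8.0, 0.0))]
ligand_atoms = [("C", (1.0, 0.0, 0.0)), ("O", (0.0, 5.0, 0.0))]

ELEMENTS = ("C", "N", "O")   # assumed element vocabulary
CUTOFFS = (2.0, 4.0, 6.0)    # assumed cumulative distance thresholds


def pair_counts(protein, ligand, cutoffs=CUTOFFS):
    """Count protein-ligand element pairs whose distance is within each cutoff."""
    counts = Counter()
    for (p_el, p_xyz), (l_el, l_xyz) in product(protein, ligand):
        d = dist(p_xyz, l_xyz)
        for cutoff in cutoffs:
            if d <= cutoff:
                counts[(p_el, l_el, cutoff)] += 1
    return counts


def to_vector(counts, elements=ELEMENTS, cutoffs=CUTOFFS):
    """Flatten the counts into a fixed-length descriptor with a stable key order."""
    keys = [(p, l, c) for p in elements for l in elements for c in cutoffs]
    return [counts.get(key, 0) for key in keys]


counts = pair_counts(protein_atoms, ligand_atoms)
vector = to_vector(counts)
```

Because the key order is fixed, every complex maps to a vector of the same length (here 3 × 3 × 3 = 27), which is what lets such counts feed directly into standard learners such as random forests or gradient boosting.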
