How to cite this WIREs title:
WIREs Comput Mol Sci
Impact Factor: 16.778

The enumeration of chemical space

Abstract: In the field of medicinal chemistry, the chemical space describes the ensemble of all organic molecules to be considered when searching for new drugs (estimated at >10^60 molecules), as well as the property spaces in which these molecules are placed for the sake of describing them. Molecules can be enumerated computationally by the millions, which was first undertaken in the field of computer‐aided structure elucidation. Scoring the enumerated virtual libraries by virtual screening has recently become an attractive strategy to prioritize compounds for synthesis and testing. Enumeration methods include combinatorial linking of fragments, genetic algorithms based on cycles of enumeration and selection by ligand‐based or target‐based scoring functions, and exhaustive enumeration from first principles. The chemical space of molecules following simple rules of chemical stability and synthetic feasibility has been enumerated up to 13 atoms of C, N, O, Cl, S, forming the GDB‐13 database with 977 million structures. The database has been organized in a 42‐dimensional chemical space using molecular quantum numbers (MQN) as descriptors, which can be visualized by projection in two dimensions by principal component analysis, and searched within seconds using a Web browser available at www.gdb.unibe.ch. © 2012 John Wiley & Sons, Ltd. This article is categorized under: Computer and Information Science > Chemoinformatics

(a) Principle of the spaceship algorithm, which travels in the chemical space of molecules of up to 50 heavy atoms. Chemical space is represented as a plane in which the horizontal axis indicates molecular size and the vertical axis represents molecular polarity and rigidity. The exhaustive enumeration database GDB (see below) densely occupies the small‐molecule chemical space. (b) Chemical space travel between AMPA and CNQX represented in the two‐dimensional Tanimoto similarity space. The x‐ and y‐axes indicate similarity to start and target as the geometric mean of the Tanimoto similarity coefficients of the substructure fingerprints (TSF) and pharmacophore fingerprints (TPF). The trajectory library is colored according to the distance from CNQX to AMPA in number of mutation steps. Binding energies estimated by docking with Autodock 3.0.5 to the AMPA receptor (1FTK.pdb) are indicated for the start, the target, and a strongly docking intermediate.
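The axis values in panel (b) combine two Tanimoto coefficients into a single number. The following is a minimal sketch of that combination, assuming the geometric mean is simply the square root of the product TSF × TPF. The function name `combined_similarity` and the numeric inputs are hypothetical and chosen only for illustration; they are not taken from the article.

```python
from math import sqrt


def combined_similarity(tsf: float, tpf: float) -> float:
    """Geometric mean of two Tanimoto coefficients, each expected in [0, 1]."""
    for name, value in (("tsf", tsf), ("tpf", tpf)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return sqrt(tsf * tpf)


# Illustrative values only (assumption), not data from the article.
to_start = combined_similarity(tsf=0.64, tpf=0.25)
to_target = combined_similarity(tsf=0.81, tpf=0.49)
print(f"similarity to start:  {to_start:.2f}")
print(f"similarity to target: {to_target:.2f}")
```

A geometric mean drops to zero as soon as either fingerprint reports no similarity, so a molecule must resemble the reference in both substructure and pharmacophore terms to score well.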

Web browser windows accessible at www.gdb.unibe.ch for searching nearest neighbors by CBD_MQN (city‐block distance in MQN space). (a) MQN‐browser for GDB‐13, allowing search of the 977 million structures in GDB‐13, shown with nicotine as example query. (b) MQN‐browser for PubChem, allowing search of 20 million organic molecules in PubChem, shown with sucrose as example query. Note that MQNs do not encode stereochemistry.
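A nearest-neighbor search of this kind can be sketched as follows, assuming that CBD denotes the city-block (Manhattan) distance between descriptor vectors. The toy library uses made-up four-component vectors with placeholder names, whereas real MQN vectors have 42 components. The brute-force scan shown here is only an illustration and is not the indexing scheme used by the actual browser.

```python
import heapq


def city_block_distance(u, v):
    """Sum of absolute component-wise differences between two vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_neighbors(query, library, k=3):
    """Return the k library entries closest to the query as (distance, name)."""
    scored = ((city_block_distance(query, vec), name)
              for name, vec in library.items())
    return heapq.nsmallest(k, scored)


# Hypothetical descriptor vectors (assumption): real MQNs are 42-dimensional.
library = {
    "A": (6, 1, 0, 1),
    "B": (5, 1, 1, 1),
    "C": (10, 0, 0, 2),
    "D": (6, 2, 0, 0),
}
query = (6, 1, 0, 1)
for distance, name in nearest_neighbors(query, library, k=2):
    print(name, distance)
```

Because MQNs are integer counts, the city-block distance is itself an integer, and ties between equidistant molecules are broken here by name.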

MQN maps of the (PC1, PC2) plane for database GDB‐13 (977 million molecules). The plane is divided into 1000 × 700 pixels. Each pixel is colored according to its occupancy or to the average value of a property in that pixel, following the color scale indicated on the corresponding map. Saturation to gray indicates the standard deviation for that value in the pixel, up to ±2.1 (ring atoms and H‐bond acceptors). The lightness scale (fading to white) encodes the occupancy on a logarithmic scale between 0 (white) and 200 (full color). For the category map, molecules were assigned to categories in the priority order heteroaromatic (red) > aromatic (purple, not visible) > fused heterocycles (blue) > fused carbocycles (cyan) > heterocycles (green) > carbocycles (green‐yellow) > heteroacyclic compounds (acyclic molecules with interrupted carbon chain, yellow) > carboacyclic compounds (acyclic molecules with continuous carbon chain, orange), and pixels were colored following the most frequent category in that pixel, with fading to gray indicating category purity in the pixel.
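The pixel occupancy described in this caption can be sketched as a simple two-dimensional binning step followed by a logarithmic intensity mapping. The exact formula used for the published maps is not given here, so `log(1 + n) / log(1 + 200)` is an assumption, as are the function names and the sample points.

```python
from collections import Counter
from math import log


def bin_points(points, x_range, y_range, width=1000, height=700):
    """Count how many (pc1, pc2) points fall into each pixel of the grid."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the mapped area
        # Clamp so that points on the upper edge land in the last pixel.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        counts[(col, row)] += 1
    return counts


def color_intensity(occupancy, saturation_at=200):
    """Map occupancy to [0, 1] on a log scale: 0 is white, 1 is full color."""
    if occupancy <= 0:
        return 0.0
    return min(1.0, log(1 + occupancy) / log(1 + saturation_at))


# Made-up projected coordinates (assumption), binned on a small 10 x 10 grid.
points = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0)]
counts = bin_points(points, (0.0, 1.0), (0.0, 1.0), width=10, height=10)
for pixel, n in sorted(counts.items()):
    print(pixel, n, round(color_intensity(n), 3))
```

The logarithmic scale keeps sparsely populated pixels visible next to pixels that hold hundreds of molecules.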

The drug phenmetrazine (cpd no. 1) and 24 of its most structurally similar isomers found in GDB‐13. Structural similarity is expressed as the Tanimoto similarity coefficient TSF of a 1024‐bit Daylight‐type substructure fingerprint.
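The Tanimoto coefficient used for this ranking is the number of bits set in both fingerprints divided by the number of bits set in either. Below is a minimal sketch that represents each fingerprint as a set of on-bit indices. The bit positions and molecule names are invented for illustration and do not come from a real Daylight-type fingerprint.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def rank_by_similarity(query: set, candidates: dict) -> list:
    """Sort candidate names by decreasing Tanimoto similarity to the query."""
    scored = [(tanimoto(query, fp), name) for name, fp in candidates.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical on-bit indices (assumption) out of a 1024-bit fingerprint.
query_fp = {3, 17, 256, 900}
candidates = {
    "isomer_1": {3, 17, 256, 901},
    "isomer_2": {3, 17, 256, 900},
    "isomer_3": {5, 600},
}
for score, name in rank_by_similarity(query_fp, candidates):
    print(f"{name}: {score:.2f}")
```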

Properties and topological features of molecules in GDB‐13. MW, molecular weight; TPSA, topological polar surface area; clogP, logarithm of the calculated water/octanol partition coefficient; RBC, rotatable bond count; HBD, number of H‐bond donor atoms; HBA, number of H‐bond acceptor atoms.
