Knowledge discovery in databases has become an integral part of practically every aspect of bioinformatics research, which usually produces, and has to process, very large amounts of data. Rational drug design is one of the current scientific areas that has benefited greatly from bioinformatics, particularly the step that analyzes receptor–ligand interactions via molecular docking simulations. An important challenge is the inclusion of receptor flexibility, because docking simulations that account for it can become computationally very demanding. We have represented this explicit flexibility as a series of different conformations derived from a molecular dynamics simulation trajectory of the receptor. This model has been termed the fully flexible receptor (FFR) model. In our studies, the receptor is the enzyme InhA from Mycobacterium tuberculosis, the major drug target for the treatment of tuberculosis. The FFR model of InhA (named FFR_InhA) was docked to four ligands, namely nicotinamide adenine dinucleotide, pentacyano(isoniazid)ferrate II, triclosan, and ethionamide, thus generating very large amounts of data that need to be mined to produce useful knowledge to help accelerate drug discovery and development. Very little work has been done in this area. In this article, we review our work on the application of classification decision trees, regression model trees, and association rules to properly preprocessed data from the FFR molecular docking results, and show how they can provide an improved understanding of FFR_InhA–ligand behavior. Furthermore, we explain how data mining techniques can support the acceleration of molecular docking simulations of FFR models.
© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 532–541 DOI: 10.1002/widm.46
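Association rules relate discretized attribute values to outcomes such as the FEB class, and each rule is usually scored by its support and confidence. The snippet below is a minimal sketch of that scoring, assuming each docking result (one receptor snapshot docked to one ligand) has already been reduced to attribute=value items. The attribute name `dist_res_a` and the toy data are hypothetical illustrations, not attributes or results from the study.

```python
from typing import Dict, List

# One transaction per docking result (receptor snapshot x ligand), described by
# discretized attribute=value items. The attribute names are hypothetical.
Transaction = Dict[str, str]


def support(transactions: List[Transaction], items: Dict[str, str]) -> float:
    """Fraction of transactions that contain every attribute=value pair in items."""
    hits = sum(
        all(t.get(attr) == value for attr, value in items.items())
        for t in transactions
    )
    return hits / len(transactions)


def confidence(
    transactions: List[Transaction],
    antecedent: Dict[str, str],
    consequent: Dict[str, str],
) -> float:
    """Confidence of antecedent -> consequent: support(A and C) / support(A)."""
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return 0.0
    return support(transactions, {**antecedent, **consequent}) / sup_a


if __name__ == "__main__":
    # Toy data, not results from the study.
    data = [
        {"dist_res_a": "short", "FEB": "E"},
        {"dist_res_a": "short", "FEB": "E"},
        {"dist_res_a": "short", "FEB": "R"},
        {"dist_res_a": "long", "FEB": "B"},
    ]
    lhs, rhs = {"dist_res_a": "short"}, {"FEB": "E"}
    print(support(data, {**lhs, **rhs}))  # 0.5
    print(confidence(data, lhs, rhs))     # about 0.667
```

This only scores a single, given rule. A full miner such as Apriori would also enumerate candidate itemsets and keep only those above chosen minimum support and confidence thresholds.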
WIREs Data Mining Knowl Discov
Mining flexible‐receptor molecular docking data
The preprocessing steps needed to generate appropriate inputs for data mining. (a) The definition of each attribute in the input files. (b) An example of a data mining input file for the ethionamide ligand. (c) Intermediate data preparation steps required by some data mining techniques. (d) The final data mining inputs for each mining technique. See text for details.
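The final step of this preprocessing turns the same per-snapshot attributes into a different input for each technique: a numeric FEB target for a regression model tree, or a nominal FEB class for a classification tree. The sketch below assumes the inputs are Weka ARFF files (M5P is a Weka learner), although the caption does not state the file format; the function `to_arff`, the relation name, and the attribute names `dist_a` and `dist_b` are hypothetical.

```python
from typing import List, Optional, Sequence, Union


def to_arff(
    relation: str,
    feature_names: Sequence[str],
    features: Sequence[Sequence[float]],
    target_name: str,
    targets: Sequence[Union[float, str]],
    nominal_classes: Optional[Sequence[str]] = None,
) -> str:
    """Render per-snapshot attributes plus a target column as ARFF text.

    nominal_classes=None keeps the target numeric (regression input, e.g. M5P);
    a list of class labels declares a nominal target (classification input).
    """
    lines: List[str] = [f"@RELATION {relation}", ""]
    for name in feature_names:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    if nominal_classes is None:
        lines.append(f"@ATTRIBUTE {target_name} NUMERIC")
    else:
        lines.append(f"@ATTRIBUTE {target_name} {{{','.join(nominal_classes)}}}")
    lines += ["", "@DATA"]
    for row, target in zip(features, targets):
        if len(row) != len(feature_names):
            raise ValueError("row length does not match feature_names")
        # One comma-separated instance per receptor snapshot.
        lines.append(",".join([*map(str, row), str(target)]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up values for two snapshots docked to one ligand.
    rows = [[3.2, 5.1], [4.0, 6.3]]
    print(to_arff("ethionamide_ffr", ["dist_a", "dist_b"], rows, "FEB", [-9.1, -7.4]))
    print(to_arff("ethionamide_ffr", ["dist_a", "dist_b"], rows, "FEB", ["E", "B"],
                  nominal_classes=["E", "G", "R", "B", "VB"]))
```

Keeping one writer with a switch for the target type mirrors the idea that only the target column changes between the regression and classification inputs.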
Example of M5P algorithm output. (a) The final model tree for the nicotinamide adenine dinucleotide ligand. This tree has 11 linear models and 10 nodes. (b) Description of linear model 1 (LM1) of this model tree.
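A model tree such as the one produced by M5P predicts FEB by routing a snapshot's attributes through the internal split nodes to a leaf and then evaluating that leaf's linear model (LM1, LM2, and so on). In a binary tree, the number of leaves is one more than the number of internal splits, which is consistent with 11 linear models and 10 nodes if those nodes are the splits. The sketch below shows only this prediction step; the split attributes, thresholds, and coefficients are invented for illustration, and it ignores the smoothing that M5P can apply to its predictions.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf holding a linear model: intercept + sum(coef * attribute value)."""
    intercept: float
    coefs: Dict[str, float]

    def predict(self, x: Dict[str, float]) -> float:
        return self.intercept + sum(c * x[a] for a, c in self.coefs.items())


@dataclass
class Split:
    """Internal node: go left if x[attr] <= threshold, otherwise right."""
    attr: str
    threshold: float
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def predict(node: Node, x: Dict[str, float]) -> float:
    # Walk down the splits to a leaf, then evaluate that leaf's linear model.
    while isinstance(node, Split):
        node = node.left if x[node.attr] <= node.threshold else node.right
    return node.predict(x)


if __name__ == "__main__":
    # Hypothetical two-leaf tree; not the tree reported for NADH.
    tree = Split("dist_a", 4.0,
                 Leaf(-8.0, {"dist_b": -0.5}),   # plays the role of LM1
                 Leaf(-6.0, {"dist_a": 0.25}))   # plays the role of LM2
    print(predict(tree, {"dist_a": 3.0, "dist_b": 2.0}))  # -9.0
    print(predict(tree, {"dist_a": 5.0, "dist_b": 2.0}))  # -4.75
```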
Induced decision tree for the nicotinamide adenine dinucleotide ligand. The leaf nodes are colored according to the free energy of binding (FEB) classes obtained after discretization. Good and Excellent (G and E) FEB classes are in green. Bad and Very Bad (B and VB) FEB classes are in red. The Regular (R) FEB class is in white.
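Before a classification tree can be induced, each numeric FEB value has to be mapped to one of the five classes (E, G, R, B, VB), where more negative FEB values indicate better binding. The caption does not give the cut points, so the sketch below uses a hypothetical scheme based on the mean and sample standard deviation of the FEB values; the function name `discretize_feb` and the `offsets` parameter are illustrative assumptions, not the study's discretization.

```python
from bisect import bisect_left
from statistics import mean, stdev
from typing import List, Sequence, Tuple

# Ordered from best (most negative FEB) to worst binding.
LABELS = ("E", "G", "R", "B", "VB")


def discretize_feb(
    feb: Sequence[float],
    offsets: Tuple[float, ...] = (-1.0, -0.5, 0.5, 1.0),
) -> List[str]:
    """Assign a FEB class to each value using cut points at mean + offset * sd.

    The offsets are a hypothetical choice; needs at least two values.
    """
    m, s = mean(feb), stdev(feb)
    if s == 0:
        # No spread at all: treat every value as Regular.
        return ["R"] * len(feb)
    cuts = [m + o * s for o in offsets]  # ascending, since offsets ascend and s > 0
    # bisect_left counts how many cut points lie strictly below each value.
    return [LABELS[bisect_left(cuts, v)] for v in feb]


if __name__ == "__main__":
    print(discretize_feb([-12.0, -10.5, -9.0, -7.5, -6.0]))
    # ['E', 'G', 'R', 'B', 'VB']
```

Fixing the class boundaries from the whole FEB distribution keeps the labels comparable across snapshots of the same ligand; other schemes, such as equal-frequency bins, would plug into the same `bisect_left` lookup.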


