Home
This Title All WIREs
WIREs RSS Feed
How to cite this WIREs title:
WIREs Comput Mol Sci
Impact Factor: 8.127

Algorithms for chemoinformatics

Full article on Wiley Online Library:   HTML PDF

Can't access this content? Tell your librarian.

Abstract The field of chemoinformatics makes use of a great number of different computational algorithms from mathematics and computer science as well as algorithms that are defined explicitly for challenges specific to chemoinformatics. This article provides an overview of algorithms and algorithmic complexity together with a review of common algorithmic design paradigms. These approaches are then given context with a set of relevant examples from chemoinformatics with definitions of the algorithms and, where relevant, graphical explanations. © 2011 John Wiley & Sons, Ltd. WIREs Comput Mol Sci 2011 1 716‐726 DOI: 10.1002/wcms.42 This article is categorized under: Computer and Information Science > Chemoinformatics

The pseudocode representation for the calculation of the Tanimoto similarity coefficient for continuous vectors. From the pseudocode, one can see that the dot product of the two vectors is only calculated once, but applied twice in calculating the Tanimoto coefficient.

[ Normal View | Magnified View ]

Pseudocode of the Morgan algorithm.

[ Normal View | Magnified View ]

Graphic description of the Morgan algorithm. The algorithm could terminate after i = 2, but implementations typically run for a few more iterations to ensure that the solution is found.

[ Normal View | Magnified View ]

Pseudocode of the Dijkstra single‐source shortest path algorithm.

[ Normal View | Magnified View ]

Graphic description of the Dijkstra single‐source shortest path algorithm. The current node in each iteration is provided in red and the previous node in yellow. Nodes are white before they are visited and gray once they are visited. The edge weights (or distances) are given as the smaller numbers on the edges, with the shortest path to each node at that iteration given as the node labels. At i = 1, the shortest path to the node at the bottom is 8; however, as more paths are explored with i = 2 the shortest distance becomes 7 because an alternative route via the node at the top has been discovered.

[ Normal View | Magnified View ]

Pseudocode of the Fingal algorithm.

[ Normal View | Magnified View ]

Graphical schematic of the generation of a molecular fingerprint using the Fingal algorithm. (a) The original molecule, caffeine. (b) Illustration of a tree of paths through the molecule for one nitrogen atom up to a path length of three edges. (c) Three paths being encoded into the fingerprint using the hashed indices. Note that the element highlighted in red in (c) is an example of a hashing collision where different paths hash into the same value. Multiple indices are often used to alleviate this problem as the chance of different paths hashing into identical sets of values is greatly reduced.

[ Normal View | Magnified View ]

A flowchart representation of a simple genetic algorithm (GA) is provided in (a), with the key aspects clarified further in parts (b), (c), (d), and (e). The definition of a binary chromosome definition is provided in (a), with an allele set (the set of permitted values that each gene may take) of 0 and 1, and a chromosome of N genes. The fitness function is defined in (c) for a population of M chromosomes with each cn being the fitness of chromosome n in the population. Roulette wheel selection, a stochastic selection process, is provided in (d) where chromosomes are assigned a proportion of the roulette wheel in direct proportion to their fitness score. The roulette wheel is spun randomly and whichever chromosome position on which it falls is selected. The crossover and mutation operators are shown in (d).

[ Normal View | Magnified View ]

Pseudocode of the binary search algorithm.

[ Normal View | Magnified View ]

Graphic description of the binary search algorithm searching an ordered list for the value 5: (a) Searching through the vector, beginning at the mid‐point of the vector; (b) a tree representation of the search path. The search requires four comparisons for a vector of length 14.

[ Normal View | Magnified View ]

Related Articles

Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review
Diversity selection algorithms
Similarity searching
Outstanding challenges in protein–ligand docking and structure‐based virtual screening
Automated systematic nomenclature generation for organic compounds
Algorithms for Chemoinformatics: an Interdisciplinary View
Drug Discovery: An Interdisciplinary View

Browse by Topic

Computer and Information Science > Chemoinformatics

Access to this WIREs title is by subscription only.

Recommend to Your
Librarian Now!

The latest WIREs articles in your inbox

Sign Up for Article Alerts