How to cite this WIREs title:
WIREs Comput Mol Sci
Impact Factor: 16.778

ChemML: A machine learning and informatics program package for the analysis, mining, and modeling of chemical and materials data


Abstract

ChemML is an open machine learning (ML) and informatics program suite that is designed to support and advance the data‐driven research paradigm that is currently emerging in the chemical and materials domain. ChemML allows its users to perform various data science tasks and execute ML workflows that are adapted specifically for the chemical and materials context. Key features are automation, general‐purpose utility, versatility, and user‐friendliness in order to make the application of modern data science a viable and widely accessible proposition in the broader chemistry and materials community. ChemML is also designed to facilitate methodological innovation, and it is one of the cornerstones of the software ecosystem for data‐driven in silico research.

This article is categorized under:
Software > Simulation Methods
Computer and Information Science > Chemoinformatics
Structure and Mechanism > Computational Materials Science
Software > Molecular Modeling

ChemML provides a multitude of methods as part of seven core task classes to conduct data mining projects (such as the creation of data‐derived machine learning surrogate models)
The process of applying transfer learning from a deep learning model 1 for target property α based on a large set of lower‐quality data to a more accurate model 2 based on a small set of high‐quality data
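
The transfer step in this caption can be illustrated with a deliberately tiny stand-in. The sketch below is not ChemML code and uses no deep learning: it swaps the neural networks for a one-dimensional linear model so that it runs with the standard library alone. The data sets, the constant offset between them, and the helper names (`fit_line`, `refit_intercept`, `mse`) are all assumptions made for illustration. Model 1 is fitted on many noisy, biased points; model 2 keeps the slope learned by model 1 and re-fits only the intercept on a handful of accurate points.

```python
import random


def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def refit_intercept(slope, points):
    """Keep the slope fixed and choose the intercept that minimizes the MSE."""
    return sum(y - slope * x for x, y in points) / len(points)


def mse(model, points):
    """Mean squared error of a (slope, intercept) model on (x, y) pairs."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)


rng = random.Random(0)

# Hypothetical large, lower-quality set: noisy and shifted by a constant offset.
low_quality = [
    (x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.3))
    for x in (rng.uniform(0.0, 5.0) for _ in range(200))
]

# Hypothetical small, high-quality set for the same target property.
high_quality = [(x, 2.0 * x + 0.5) for x in (0.5, 1.5, 2.5, 3.5, 4.5)]

# Model 1: trained on the large, lower-quality data only.
model_1 = fit_line(low_quality)

# Model 2: reuse ("freeze") the slope from model 1, re-fit the intercept
# on the small, high-quality data.
slope, _ = model_1
model_2 = (slope, refit_intercept(slope, high_quality))
```

In a deep learning setting the frozen slope would correspond to reused early layers and the re-fitted intercept to the fine-tuned final layers; that mapping is an analogy, not a description of the published implementation.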
The process of applying a model‐based active learning strategy to a pool of unlabeled candidates as implemented in ChemML. After the training and test sets are initialized randomly, an ensemble of deep learning models scores the unlabeled candidates in a sequential or batch selection format to query the one or several new data points that promise to improve the model the most.
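
The loop described in this caption can be sketched in a few lines. The following is a minimal illustration, not ChemML's implementation or API: the deep learning ensemble is replaced by bootstrapped straight-line fits so that the example needs only the standard library, the score is assumed to be the variance of the ensemble predictions, and `oracle` is a hypothetical stand-in for the expensive labeling step. The held-out test set mentioned in the caption is omitted for brevity.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:  # degenerate bootstrap sample: fall back to a constant model
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def train_ensemble(labeled, n_models, rng):
    """Fit each ensemble member on a bootstrap resample of the labeled data."""
    return [
        fit_line([rng.choice(labeled) for _ in labeled])
        for _ in range(n_models)
    ]


def select_batch(models, pool, batch_size):
    """Return the candidates on which the ensemble members disagree the most."""
    def disagreement(x):
        return statistics.pvariance([a * x + b for a, b in models])
    return sorted(pool, key=disagreement, reverse=True)[:batch_size]


def oracle(x):
    """Hypothetical labeling step (e.g. a costly simulation or experiment)."""
    return x ** 2


def active_learning(candidates, n_initial=5, n_rounds=3, batch_size=2,
                    n_models=5, seed=0):
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    # Random initialization of the training set.
    labeled = [(x, oracle(x)) for x in pool[:n_initial]]
    pool = pool[n_initial:]
    for _ in range(n_rounds):
        models = train_ensemble(labeled, n_models, rng)
        batch = select_batch(models, pool, batch_size)
        labeled.extend((x, oracle(x)) for x in batch)  # query the oracle
        pool = [x for x in pool if x not in batch]
    return labeled, pool


labeled, remaining = active_learning([i / 10 for i in range(30)])
```

Setting `batch_size=1` gives the sequential selection format; larger values give the batch format.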
Toy workflow/computation graph used in ChemML. (a) Plot of the computation graph with corresponding Python code (b) and input file (c); (d) shows the structure of a computation unit and (e) an example of its implementation in the Jupyter notebook graphical user interface. We stress that actual machine learning workflows are considerably more involved than this simplistic toy example
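
A computation graph of the kind shown in this figure can be mimicked with a small dependency-ordered executor. The sketch below is an assumption-laden illustration and does not reproduce the ChemML Wrapper input format or its Python interface: the `units` dictionary layout, the unit names, and `run_workflow` are hypothetical. Each computation unit is a function plus the names of the units whose outputs it consumes, and the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) determines the execution order.

```python
from graphlib import TopologicalSorter


def run_workflow(units):
    """Run every unit after its inputs; units maps name -> (function, input names)."""
    dependencies = {name: deps for name, (_, deps) in units.items()}
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        func, deps = units[name]
        # Feed the outputs of the upstream units to this unit, in order.
        results[name] = func(*(results[d] for d in deps))
    return results


# Hypothetical three-unit toy workflow: load data, scale it, summarize it.
units = {
    "load": (lambda: [1.0, 2.0, 3.0, 4.0], []),
    "scale": (lambda xs: [x / max(xs) for x in xs], ["load"]),
    "mean": (lambda xs: sum(xs) / len(xs), ["scale"]),
}

results = run_workflow(units)
```

Declaring edges by name keeps the graph description separate from the code that runs it, which is the same separation the figure draws between the input file and the Python code.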
Design scheme and architecture of the ChemML program package, consisting of ChemML Library and ChemML Wrapper. The overview shows the seven task classes and the methods available within them, as well as the six categories we distinguish with regard to the method sources.

