How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.111

Data mining and life sciences applications on the grid

Data mining (DM) is increasingly used to analyze data generated in the life sciences, including biological data from disciplines such as genomics and proteomics, medical data from clinical practice, and administrative data from health care. Mining such data is difficult for two reasons. First, life sciences data are inherently heterogeneous, ranging from molecular-level data to clinical and administrative data. Second, they are produced at an increasing rate, and data repositories are becoming very large. As a result, the management and analysis of such data are becoming a major bottleneck in biomedical research. This paper reviews the main methodologies for mining life sciences data and the ways they are coupled with high-performance infrastructures and systems to make the analysis efficient. It recalls basic concepts of DM, grids, and distributed DM on grids, and reviews the main approaches to mining biomedical data on high-performance infrastructures, with special focus on the analysis of genomics, proteomics, and interactomics data and on the exploration of magnetic resonance images in the neurosciences. The paper may interest both bioinformaticians, who can learn how to exploit high-performance infrastructures to mine life sciences data, and computer scientists, who can address the heterogeneity and high volume of life sciences data at the data management, algorithm, and user interface layers. © 2013 Wiley Periodicals, Inc.

Figure 1. The Weka explorer interface.

Figure 2. The RapidMiner interface.

Figure 3. Workflow diagram for knowledge discovery in neuroscience.

Figure 4. Confusion matrix generated by Weka by applying an SVM (the SMO Weka classifier) to the breast cancer dataset available on the Weka website.

Figure 5. ROC graph for the classification of the breast cancer dataset, generated by Weka.

Browse by Topic

Technologies > Computer Architectures for Data Mining
Algorithmic Development > Biological Data Mining
Application Areas > Data Mining Software Tools
Application Areas > Health Care
