How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.111

Machine learning for bioinformatics and neuroimaging

Machine Learning (ML) is a well‐known paradigm that refers to the ability of systems to learn a specific task from data, and it aims to develop computer algorithms that improve with experience. It involves computational methodologies for addressing complex real‐world problems and promises to enable computers to assist humans in the analysis of large, complex data sets. ML approaches have been widely applied in biomedical fields, and a large body of research is devoted to this topic. The purpose of this article is to present the state of the art in ML applications to bioinformatics and neuroimaging and to motivate research in new trend‐setting directions. We show how ML techniques such as clustering, classification, embedding techniques, and network‐based approaches can be successfully employed to tackle problems such as gene expression clustering, patient classification, brain network analysis, and identification of biomarkers. We also present a short description of deep learning and multiview learning methodologies applied in these contexts. We discuss representative methods that illustrate how ML can be used to address these problems and how biomedical data can be characterized through ML. Challenges to be addressed and directions for future research are presented, and an extensive bibliography is included.

This article is categorized under:

  • Application Areas > Health Care
  • Technologies > Computational Intelligence
  • Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining
  • Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining
MVDA integration methodology. The approach is composed of four steps. As a first step, the features are clustered and replaced by their prototypes, in order to reduce the input dimension (part a of the figure). Second, the prototypes are ranked by the patient class separability and the most significant ones are selected (part b of the figure). Third, the patients are clustered and the membership matrices are obtained (part c of the figure). Fourth, a late integration approach is used to integrate clustering results (part d of the figure)
Data integration stages as proposed by Pavlidis, Weston, Cai, and Grundy (). They proposed a Support Vector Machine (SVM) kernel function in order to integrate microarray data. In early integration, a single SVM is trained with a kernel obtained from the concatenation of all the views in the data set (a). In intermediate integration, a kernel is first obtained for each view, and the combined kernel is then used to train the SVM (b). In late integration, a separate SVM is trained on the kernel of each view, and the final results are then combined (c)
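The three integration stages can be sketched on toy data as follows. This is a minimal illustration and not the kernel function proposed by Pavlidis et al.: the two views, their values, the linear kernel, the summation used to combine kernels, and the per-view decision values are all assumptions made for the example, and no SVM is actually trained.

```python
def linear_kernel(view):
    """Gram matrix of one view: K[i][j] is the dot product of samples i and j."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in view] for xi in view]


def add_kernels(k1, k2):
    """Combine two kernels by element-wise summation (one common choice)."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(k1, k2)]


def late_integration(scores_per_view):
    """Average per-view decision values and take the sign as the final label."""
    return [1 if sum(s) / len(s) >= 0 else -1 for s in zip(*scores_per_view)]


# Two hypothetical views describing the same three samples (toy values).
view_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
view_b = [[2.0], [1.0], [0.0]]

# (a) Early integration: concatenate the views, then compute a single kernel.
concatenated = [a + b for a, b in zip(view_a, view_b)]
k_early = linear_kernel(concatenated)

# (b) Intermediate integration: one kernel per view, then combine the kernels.
k_intermediate = add_kernels(linear_kernel(view_a), linear_kernel(view_b))

# (c) Late integration: combine the outputs of one classifier per view.
# These decision values are made up; in practice each list would come from
# an SVM trained on the kernel of the corresponding view.
labels_late = late_integration([[0.8, -0.3, 0.1], [0.4, -0.9, -0.5]])
```

With a linear kernel, the early and intermediate kernels coincide, because the dot product of concatenated vectors is the sum of the per-view dot products. The two strategies differ once a nonlinear kernel is applied to each view.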
Independent component analysis on functional magnetic resonance imaging data: the input matrix consists of time series associated with brain voxels; the mixing matrix contains, for each time point, the relative contribution of each independent component to the global signal; the components matrix indicates the contribution of the single voxels to the components
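The decomposition described in this caption can be written compactly as a matrix factorization. The symbols are assumptions introduced here for illustration: T time points, V voxels, K independent components, X for the input matrix, A for the mixing matrix, and S for the components matrix.

```latex
\[
\underbrace{X}_{T \times V} \;=\; \underbrace{A}_{T \times K}\,\underbrace{S}_{K \times V},
\qquad
x_{t,v} \;=\; \sum_{k=1}^{K} a_{t,k}\, s_{k,v}
\]
```

Under this notation, column k of A is the time course of component k, and row k of S is its spatial map over the voxels.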
Example of autoencoder architecture with five inputs (x1, x2, x3, x4, and x5), five outputs (the reconstructions of x1, x2, x3, x4, and x5), and one hidden layer with three neurons (z1, z2, and z3)
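A forward pass through an autoencoder of this shape (five inputs, three hidden neurons, five outputs) can be sketched as below. The weights are random and untrained, and the tanh activation, the zero biases, and the input values are assumptions chosen only to show the data flow; a real autoencoder would learn the weights by minimizing the reconstruction error.

```python
import math
import random


def dense(weights, bias, inputs, activation=None):
    """One fully connected layer: activation(W @ inputs + bias)."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out


def random_matrix(rows, cols, rng):
    """Small random weights; placeholders for weights that would be learned."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
w_enc, b_enc = random_matrix(3, 5, rng), [0.0] * 3   # encoder: 5 -> 3
w_dec, b_dec = random_matrix(5, 3, rng), [0.0] * 5   # decoder: 3 -> 5

x = [0.1, 0.4, -0.2, 0.7, 0.0]           # hypothetical input sample
z = dense(w_enc, b_enc, x, math.tanh)    # hidden code (z1, z2, z3)
x_hat = dense(w_dec, b_dec, z)           # reconstruction of the input

# Mean squared reconstruction error, the quantity training would minimize.
reconstruction_error = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
```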
Typical CNN architecture. Image by Aphex34 (Own work) [CC BY‐SA 4.0 (https://creativecommons.org/licenses/by‐sa/4.0)], via Wikimedia Commons
Network modeling applied to the investigation of brain connectivity: nodes represent brain regions and an edge between two regions indicates that there is a functional connection between the measured brain activity in the two regions
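One simple way to build such a network is to connect two regions when their activity time series are strongly correlated. The sketch below assumes Pearson correlation as the connectivity measure and an arbitrary threshold of 0.8; the region names and signal values are hypothetical, and other connectivity measures are equally possible.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def functional_network(series, threshold=0.8):
    """Return the undirected edges between regions whose absolute
    correlation reaches the threshold."""
    regions = sorted(series)
    edges = set()
    for i, r in enumerate(regions):
        for s in regions[i + 1:]:
            if abs(pearson(series[r], series[s])) >= threshold:
                edges.add((r, s))
    return edges


# Hypothetical activity measured in three brain regions over five time points.
activity = {
    "region_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "region_2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "region_3": [3.0, 1.0, 4.0, 1.0, 5.0],
}
edges = functional_network(activity)
```

In this toy example only region_1 and region_2 are connected, since the third signal correlates weakly (about 0.35) with the other two.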
An example of gene co‐expression network (orange undirected graph) and gene regulation network (violet directed graph)
(a) A nonlinearly separable data set. The flexibility of the decision tree improves classification performance. (b) The binary tree derived from the trained classifier. Predictions are computed by visiting the tree and performing the tests included inside the diamond‐shaped nodes. When a rectangular node is reached, the predicted label corresponds to the reported class
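The prediction procedure described in part (b) can be sketched as a walk from the root to a leaf. The tree below is hypothetical: the class names `Split` and `Leaf`, the features, the thresholds, and the labels are assumptions for illustration and are not taken from the figure.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Rectangular node: holds the predicted class label."""
    label: str


@dataclass
class Split:
    """Diamond-shaped node: tests whether sample[feature] <= threshold."""
    feature: int
    threshold: float
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]


def predict(node, sample):
    """Visit the tree, applying one test per internal node, until a leaf."""
    while isinstance(node, Split):
        passed = sample[node.feature] <= node.threshold
        node = node.if_true if passed else node.if_false
    return node.label


# Hypothetical tree over two features; the resulting boundary is not linear.
tree = Split(0, 0.5,
             Leaf("class A"),
             Split(1, 0.3, Leaf("class B"), Leaf("class A")))

label = predict(tree, [0.8, 0.1])
```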
Linearly separable problem solved by a linear support vector machine. The area enclosed by the dashed lines corresponds to the margin. The circled samples of both classes represent the support vectors and define the decision boundary shown by the solid line
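The quantities named in this caption can be computed directly once a linear SVM has been trained. The sketch below assumes a hard-margin SVM with decision function f(x) = w·x + b, and it uses hypothetical values for w, b, and the samples rather than running a solver. Under that assumption the margin width is 2/||w||, and the support vectors are the samples with y·f(x) = 1.

```python
import math


def decision_value(w, b, x):
    """Signed score f(x) = w . x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical parameters of an already trained linear SVM.
w, b = (1.0, 1.0), -3.0

# Toy linearly separable samples as (point, label) pairs.
samples = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((2.0, 2.0), 1), ((3.0, 3.0), 1)]

# Width of the area enclosed by the two dashed margin lines.
margin_width = 2.0 / math.hypot(*w)

# Support vectors lie exactly on the margin lines: y * f(x) == 1.
support_vectors = [x for x, y in samples
                   if math.isclose(y * decision_value(w, b, x), 1.0)]

# The solid decision boundary is the set of points where f(x) == 0.
predictions = [1 if decision_value(w, b, x) >= 0 else -1 for x, _ in samples]
```

Here the two support vectors are (1, 1) and (2, 2), and the margin width equals the distance between them, which is the square root of 2.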
One of the advantages of clustering‐based feature selection techniques is that the anatomical information associated with the features is preserved; therefore, at the end of the analysis, the regions of interest can be easily visualized
