How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 7.250

Clustering genes with expression and beyond


Abstract

Clustering of gene expression data is now a popular computational analysis in biology. High‐throughput techniques can measure expression levels for thousands of genes simultaneously. The resulting dataset is a large table (or matrix) of numerical values, each indexed by one gene and one sample, and it requires computational methods for its analysis. This review begins by surveying techniques for clustering genes by expression, classifying them into three types: hierarchical, partitional, and subspace clustering. Major methods of hierarchical and partitional clustering, as well as a variety of algorithms for subspace clustering, are reviewed extensively. Techniques for clustering expression data, however, are now mature, and their performance is limited by the noise inherent in high‐throughput expression measurements. We then extend the scope of the review to clustering genes with a more recently emerging type of data, gene networks, and present graph partitioning approaches, such as spectral methods, for clustering genes by a network. Furthermore, advanced approaches to gene clustering now combine gene networks with expression. This setting corresponds to so‐called semi‐supervised clustering in machine learning, and approaches under this problem setting are reviewed broadly and classified into three types.

© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 496–511 DOI: 10.1002/widm.41

This article is categorized under:
Algorithmic Development > Biological Data Mining
Application Areas > Science and Technology
Technologies > Machine Learning

Schematic pictures of three types of clustering approaches for expression: (a) hierarchical clustering, (b) partitional clustering: [(b‐1) deterministic partitional clustering and (b‐2) probabilistic partitional clustering], and (c) subspace clustering.

Pseudocode of the algorithm in Ref 54.

Pseudocode of the algorithm in Ref 61.

Schematic picture of combining expression with a gene network.

Pseudocode of Cluster Identification via Connectivity Kernel.

Pseudocode of spectral clustering.
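
The figure itself is not reproduced on this page, so the following is only a rough sketch of the general idea behind spectral graph partitioning, not the pseudocode from the article. It assumes a small, connected, undirected gene network given as a symmetric adjacency matrix, and it covers only the two‐cluster case: genes are split by the sign of the Fiedler vector (the eigenvector for the second‐smallest eigenvalue of the graph Laplacian L = D − A). To keep the example dependency‐free, a shifted power iteration stands in for a library eigensolver; the function name `spectral_bipartition` and its parameters are hypothetical.

```python
import math


def spectral_bipartition(adjacency, iterations=1000):
    """Split the nodes of a connected undirected graph into two clusters.

    adjacency: symmetric list-of-lists matrix of non-negative edge weights.
    Returns one label (0 or 1) per node, taken from the sign of an
    approximate Fiedler vector of the Laplacian L = D - A.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    # All eigenvalues of L are at most 2 * max degree, so shift*I - L is
    # positive definite and its dominant non-constant eigenvector is the
    # Fiedler vector of L.
    shift = 2.0 * max(degrees) + 1.0
    # Deterministic, non-constant starting vector.
    v = [float(i) for i in range(n)]
    for _ in range(iterations):
        # Project out the constant vector (eigenvalue 0 of L).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # w = (shift*I - L) v = (shift - d_i) v_i + sum_j A_ij v_j
        w = [
            (shift - degrees[i]) * v[i]
            + sum(adjacency[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]


# Toy network: two triangles of genes joined by a single edge (2-3).
toy_network = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
toy_labels = spectral_bipartition(toy_network)
```

For more than two clusters, the usual extension is to embed each gene using several of the smallest Laplacian eigenvectors and then cluster the embedded points, for example with k‐means.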

Pseudocode of Clustering via Iterative Feature Filtering.

Pseudocode of k‐means.
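
Since the figure is not reproduced on this page, here is a minimal sketch of the standard k‐means procedure (Lloyd's algorithm) as it might be applied to gene expression profiles; it is not the article's own pseudocode. It assumes each gene is a list of expression values across the same samples, uses squared Euclidean distance, and picks initial centroids at random from the data. The names `kmeans`, `profiles`, and `seed` are illustrative choices.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100, seed=0):
    """Cluster expression profiles (one vector per gene) into k groups.

    Returns (labels, centroids), where labels[i] is the cluster index
    assigned to profiles[i].
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct profiles chosen at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: six genes measured in two samples, forming two obvious groups.
toy_profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_labels, toy_centroids = kmeans(toy_profiles, k=2)
```

The result depends on the random initialisation in general, so in practice the algorithm is often run several times with different seeds and the partition with the lowest within‐cluster distance is kept.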
