
Identification of significant features in DNA microarray data


DNA microarrays are a relatively new technology that can simultaneously measure the expression level of thousands of genes. They have become an important tool for a wide variety of biological experiments. One of the most common goals of DNA microarray experiments is to identify genes associated with biological processes of interest. Conventional statistical tests often produce poor results when applied to microarray data owing to small sample sizes, noisy data, and correlation among the expression levels of the genes. Thus, novel statistical methods are needed to identify significant genes in DNA microarray experiments. This article discusses the challenges inherent in DNA microarray analysis and describes a series of statistical techniques that can be used to overcome these challenges. The problem of multiple hypothesis testing and its relation to microarray studies are also considered, along with several possible solutions.

WIREs Comput Stat 2013, 5:309–325. doi: 10.1002/wics.1260

This article is categorized under:
Statistical and Graphical Methods of Data Analysis > Data Reduction, Smoothing, and Filtering
Applications of Computational Statistics > Genomics/Proteomics/Genetics
Data: Types and Structure > Microarrays

Illustration of a typical microarray experiment (using cDNA technology). First, mRNA is extracted from two groups of cells, namely an experimental sample of interest and a control sample. Each sample is labeled with a different color of fluorescent dye. The samples are then combined and hybridized onto an array. The relative abundance of the mRNA corresponding to a particular gene can be measured by calculating the ratio of red dye to green dye at the appropriate spot on the array.
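
The red-to-green ratio described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the article's method: the log2 transform is a common convention assumed here, and the gene names and intensity values are hypothetical.

```python
import math

def log_ratio(red, green):
    """Return log2(red / green) for one spot; both intensities must be positive."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)

# Hypothetical dye intensities (red, green) for three spots on the array.
spots = {
    "gene_a": (800.0, 200.0),  # four times more red than green
    "gene_b": (150.0, 600.0),  # four times more green than red
    "gene_c": (300.0, 300.0),  # equal amounts of both dyes
}

# A positive value means more red dye at the spot, a negative value more green.
log_ratios = {name: log_ratio(red, green) for name, (red, green) in spots.items()}
```

On the log scale, a fourfold excess of red and a fourfold excess of green are symmetric about zero, which is one reason the transform is often preferred over the raw ratio.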
Illustration of the optimal discovery procedure (ODP). Suppose that the test statistic for the null hypothesis of no differential expression is t = −2 for one gene and t = 2 for a second gene. Suppose further that there are several other genes with expression patterns similar to that of the second gene, for which t ≈ 2. Using traditional hypothesis testing procedures, one would be equally likely to reject the null hypothesis of no differential expression for either gene. Using ODP, one would be more likely to reject the null hypothesis for the gene where t = 2, since the existence of several genes with similar expression patterns increases one's confidence that the result is not due to chance.
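
The intuition in this caption can be sketched with a toy score that borrows strength across genes. The code below is a simplified illustration in the spirit of ODP, not the full procedure: it assumes each test statistic is normal with unit variance, uses each gene's observed statistic as a plug-in estimate of its alternative mean, and treats the null as a standard normal. The function names and the list of statistics are hypothetical.

```python
import math

def normal_pdf(x, mean):
    """Density of a unit-variance normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def toy_odp_score(x, estimated_means, n_null_terms):
    """Sum of estimated alternative densities over summed null densities.

    Assumption: every null density is standard normal, so the denominator
    is n_null_terms copies of the same density evaluated at x.
    """
    numerator = sum(normal_pdf(x, m) for m in estimated_means)
    denominator = n_null_terms * normal_pdf(x, 0.0)
    return numerator / denominator

# Hypothetical statistics: one gene at t = -2, one at t = 2,
# and several genes with similar patterns for which t is close to 2.
t_stats = [-2.0, 2.0, 1.9, 2.1, 2.05, 1.95]

scores = {t: toy_odp_score(t, t_stats, len(t_stats)) for t in (-2.0, 2.0)}
# A traditional test ranks both genes identically, since |t| = 2 for each.
# The toy score is larger at t = 2 because neighbouring genes add to its numerator.
```

The denominator is the same at t = −2 and t = 2, so the difference between the two scores comes entirely from the cluster of genes near t = 2.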
Illustration of the association between the complexity of a model and the bias/variance of the model. In general, as the complexity of a model increases, the variance of the model increases and the bias of the model decreases.
Illustration of the bias‐variance trade‐off. The above figure shows a regression problem where the objective is to predict y given a value of x. The dotted line shows the true relationship between x and y. The linear regression estimator (shown in blue) has high bias and low variance, and the interpolation estimator (shown in orange) has low bias and high variance.
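
The trade-off in this caption can be checked with a small simulation. The sketch below is an assumption-laden illustration, not the figure's actual data: the true curve (a sine wave), the noise level, the design points, and the evaluation point `x0` are all hypothetical choices. It refits a least-squares line and a piecewise-linear interpolator on many noisy data sets and estimates the squared bias and variance of each prediction at `x0`.

```python
import math
import random
import statistics

def true_f(x):
    """Assumed true relationship between x and y."""
    return math.sin(2.0 * math.pi * x)

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def interpolate(xs, ys, x0):
    """Piecewise-linear curve passing through every training point."""
    for i in range(len(xs) - 1):
        if xs[i] <= x0 <= xs[i + 1]:
            w = (x0 - xs[i]) / (xs[i + 1] - xs[i])
            return (1.0 - w) * ys[i] + w * ys[i + 1]
    raise ValueError("x0 lies outside the training range")

def bias_variance(predictions, target):
    """Squared bias and variance of repeated predictions of one target."""
    mean_pred = statistics.fmean(predictions)
    return (mean_pred - target) ** 2, statistics.pvariance(predictions)

rng = random.Random(0)  # fixed seed so the run is reproducible
xs = [i / 10 for i in range(11)]
x0, noise_sd, n_repeats = 0.25, 0.3, 2000

linear_preds, interp_preds = [], []
for _ in range(n_repeats):
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    intercept, slope = fit_line(xs, ys)
    linear_preds.append(intercept + slope * x0)
    interp_preds.append(interpolate(xs, ys, x0))

linear_bias2, linear_var = bias_variance(linear_preds, true_f(x0))
interp_bias2, interp_var = bias_variance(interp_preds, true_f(x0))
```

Under these assumptions the straight line cannot follow the curve, so its squared bias is the larger of the two, while the interpolator tracks the noise in individual points and has the larger variance.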
Heat map of the leukemia microarray data of Bullinger et al. Each colored square on the map corresponds to the expression level of a given gene for a given patient. In the above figure, each row represents a gene and each column represents a patient. The brighter the color of a given square, the higher (or lower) the expression level of the corresponding gene. Usually hierarchical clustering is performed on the rows and columns of the data set prior to drawing the heat map.
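
The clustering step mentioned in this caption can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: it uses average linkage with Euclidean distance (the caption does not say which linkage or distance was used), the expression matrix is hypothetical rather than the Bullinger et al. data, and the leaf order it returns is simply the merge order, not an optimized dendrogram ordering.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_order(vectors):
    """Average-linkage agglomerative clustering; returns the leaf order."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest average distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [euclidean(vectors[i], vectors[j])
                         for i in clusters[a] for j in clusters[b]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]

# Hypothetical expression matrix: each row is a gene, each column a patient.
expression = [
    [5.0, 1.0, 5.2, 0.9],
    [1.1, 4.8, 0.8, 5.1],
    [4.9, 1.2, 5.1, 1.0],
    [0.9, 5.0, 1.0, 4.9],
]

row_order = cluster_order(expression)
columns = [list(col) for col in zip(*expression)]
col_order = cluster_order(columns)

# Reordered matrix, ready to be drawn as a heat map.
reordered = [[expression[r][c] for c in col_order] for r in row_order]
```

After reordering, genes and patients with similar profiles sit next to each other, which is what produces the block structure typically seen in such heat maps.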
Image of a DNA microarray slide. One may measure the relative expression of each gene by calculating the ratio of the amount of red dye to the amount of green dye at each probe on the array.