WIREs Comp Stat

Kernel‐based measures of association


Measures of association have been widely used to describe statistical relationships between two sets of variables. Traditionally, such measures have focused on specialized settings. Building on an in‐depth review of common existing measures, we present a general framework for association measures that unifies existing methods with novel kernel‐based extensions, and we include practical solutions to the associated computational challenges. Specifically, we introduce association screening and variable selection by maximizing kernel‐based association measures. We also develop a backward dropping procedure for feature selection when the number of candidate variables is large.

The proposed framework was evaluated through independence tests and feature selection using kernel association measures on a diverse set of simulated association patterns with different dimensions and variable types. The results show that the generalized association measures outperform existing ones. We also apply the framework to a real‐world problem: predicting gender from handwritten text. Through this application, we demonstrate the data‐driven adaptation of kernels and show how kernel‐based association measures apply naturally to data structures including functional input spaces. This suggests that the proposed framework can guide the derivation of appropriate association measures for a wide range of real‐world problems and work well in practice.

WIREs Comput Stat 2018, 10:e1422. doi: 10.1002/wics.1422

This article is categorized under:
Statistical Learning and Exploratory Methods of the Data Sciences > Pattern Recognition
Statistical Learning and Exploratory Methods of the Data Sciences > Knowledge Discovery
Statistical and Graphical Methods of Data Analysis > Multivariate Analysis
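As a concrete illustration of a kernel‐based association measure and a permutation independence test, the Python sketch below computes the Hilbert–Schmidt independence criterion (HSIC), a widely used kernel association measure, together with a permutation p value. It is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian kernel, the median‐heuristic bandwidth, the biased HSIC estimator, the 200 permutations, and the function names are all illustrative choices.

```python
import numpy as np


def gaussian_gram(x):
    """Gaussian (RBF) Gram matrix; rows of x are samples.

    Bandwidth: median of the nonzero pairwise distances (an assumed heuristic).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    dists = np.sqrt(d2[np.triu_indices_from(d2, k=1)])
    dists = dists[dists > 0]
    bw = np.median(dists) if dists.size else 1.0
    return np.exp(-d2 / (2.0 * bw ** 2))


def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / n^2, H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - 1.0 / n
    return np.trace(K @ H @ L @ H) / n ** 2


def hsic_perm_test(x, y, n_perm=200, seed=0):
    """Return (HSIC statistic, permutation p value) for H0: x independent of y."""
    rng = np.random.default_rng(seed)
    K, L = gaussian_gram(x), gaussian_gram(y)
    stat = hsic(K, L)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(K.shape[0])
        # Permuting the y samples is the same as permuting rows and columns of L.
        exceed += hsic(K, L[np.ix_(p, p)]) >= stat
    return stat, (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, size=50)
    print(hsic_perm_test(x, x ** 2 + 0.1 * rng.normal(size=50)))  # non-monotone dependence
    print(hsic_perm_test(x, rng.normal(size=50)))                  # independent
```

Because the Gaussian kernel is characteristic, population HSIC with this kernel is zero only under independence, so the test can detect non‐monotone patterns such as the quadratic one above, which a rank correlation would largely miss.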
A map of association measures. Measures in red text can detect only monotone association, while the measure in blue text (MIC) is powerful at capturing local patterns. Boxes in four different colors indicate the variable types. Broken lines indicate heuristic relations. Measures discussed in Appendix S1 are marked with an asterisk (*)
The most informative (left) and the least informative (right) feature groups in the handwriting data set according to KL distance correlation. Males are plotted in blue and females in red. For each feature and each class, 21 quantiles are plotted as dots. The connected lines show the medians of the features for males and females, respectively
Classification errors on the handwriting data set using different numbers of features selected by different feature selection methods (shown in different colors)
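The backward dropping procedure mentioned in the abstract is not spelled out on this page. The sketch below shows one plausible greedy form, under assumptions: starting from all candidate features, it repeatedly removes the feature whose removal leaves the highest HSIC between the remaining features and the class labels. The delta kernel on labels, the standardization, the sqrt(number of features) bandwidth, and the function names are hypothetical choices for illustration.

```python
import numpy as np


def rbf_gram(X, bandwidth):
    """Gaussian Gram matrix for the rows of X with a fixed bandwidth."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def delta_gram(labels):
    """Delta kernel for class labels: 1 if two samples share a class, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / n^2."""
    n = K.shape[0]
    H = np.eye(n) - 1.0 / n
    return np.trace(K @ H @ L @ H) / n ** 2


def backward_dropping(X, labels, n_keep=1):
    """Greedily drop the feature whose removal keeps HSIC with the labels highest.

    Returns (surviving feature indices, drop order); the first index dropped
    is the one judged least informative.
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # assumed standardization
    L = delta_gram(labels)
    active = list(range(X.shape[1]))
    dropped = []
    while len(active) > n_keep:
        bw = np.sqrt(len(active) - 1)  # same bandwidth for every candidate subset
        scores = [
            hsic(rbf_gram(X[:, [k for k in active if k != j]], bw), L)
            for j in active
        ]
        j_drop = active[int(np.argmax(scores))]
        active.remove(j_drop)
        dropped.append(j_drop)
    return active, dropped


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cls = np.arange(100) % 2
    X = np.column_stack([2.0 * cls + 0.3 * rng.normal(size=100),
                         rng.normal(size=100), rng.normal(size=100)])
    print(backward_dropping(X, cls, n_keep=1))
```

Each step evaluates HSIC once per active feature, so a full backward pass over p features needs on the order of p² evaluations; that quadratic cost is the price of reconsidering every remaining feature at each step.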
The evolution of the scaling factors (weights) through the optimization procedure. For this particular realization, the optimization process takes 60 iterations (upper panel), five of which are plotted (lower panels; indicated by dots in the upper panel)
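The scaling factors in this figure act as per‐feature weights inside the kernel. A minimal sketch of how such weights could be tuned is shown below, assuming a Gaussian kernel of the form exp(-0.5 Σ_j w_j² (x_j - x'_j)²), a delta kernel on the class labels, and projected gradient ascent on HSIC with the weights kept nonnegative and on a sphere of radius sqrt(d). The optimizer, the step size, the function names, and the 60‐iteration default (chosen only to mirror the caption) are assumptions, not the article's procedure.

```python
import numpy as np


def weighted_rbf_gram(X, w):
    """k(x, x') = exp(-0.5 * sum_j w_j^2 (x_j - x'_j)^2)."""
    Z = X * w
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-0.5 * d2)


def optimize_scaling(X, labels, n_iter=60, step=0.2):
    """Projected gradient ascent of HSIC(weighted RBF, delta kernel) over w.

    Returns the final weights and the weight history (n_iter + 1 rows).
    """
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    n, d = X.shape
    labels = np.asarray(labels)
    L = (labels[:, None] == labels[None, :]).astype(float)
    H = np.eye(n) - 1.0 / n
    M = H @ L @ H                                 # centered label kernel (fixed)
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2  # n x n x d squared differences
    radius = np.sqrt(d)
    w = np.ones(d)
    history = [w.copy()]
    for _ in range(n_iter):
        K = weighted_rbf_gram(X, w)
        # HSIC = sum(K * M) / n^2 and dK_ik/dw_j = -w_j * diff2_ikj * K_ik
        grad = -w * np.einsum("ik,ikj->j", K * M, diff2) / n ** 2
        w = w + step * grad / (np.linalg.norm(grad) + 1e-12)  # normalized step
        w = np.maximum(w, 0.0)                                 # keep weights >= 0
        w *= radius / (np.linalg.norm(w) + 1e-12)              # back onto the sphere
        history.append(w.copy())
    return w, np.array(history)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cls = np.arange(100) % 2
    X = np.column_stack([2.0 * cls + 0.3 * rng.normal(size=100),
                         rng.normal(size=100), rng.normal(size=100)])
    w, hist = optimize_scaling(X, cls)
    print(np.round(w, 3))
```

Fixing the norm of w removes the overall bandwidth as a free parameter, so the optimization can only trade weight between features; in this setup, features that carry no information about the labels tend to be pushed toward zero weight.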
Simulated data with a hypothetical dependence structure in a three‐dimensional space. (a) Double helix with class labels indicated by colors. (b) Pairwise scatter plots for the three informative features in the simulated double‐helix data, with class labels indicated by colors
Slice plots of the sixth association pattern in Figure . The scatter plots are based on one random simulation
Box plots of p values from two competing tests for each of the three‐dimensional association patterns, based on 50 samples. The box plots are based on 200 simulations
Association patterns and the corresponding box plots of p values from four different tests, based on 50 samples. The scatter plots show one random simulation, and the p value box plots are based on 200 simulations
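Distance correlation, referenced elsewhere in this article, is a widely used association measure with a simple permutation test of the kind summarized by these p value box plots. A minimal sketch is given below, assuming the standard double‐centered sample statistic and 200 permutations (an illustrative choice; the caption's 200 refers to simulation repetitions, not permutations). The function names are hypothetical, and which four tests the figure compares is not stated on this page.

```python
import numpy as np


def _centered_dist(x):
    """Double-centered Euclidean distance matrix; rows of x are samples."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    D = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    # Subtract column means, row means, and add back the grand mean.
    return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()


def dcor(x, y):
    """Sample distance correlation (V-statistic form), in [0, 1]."""
    A, B = _centered_dist(x), _centered_dist(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0


def dcor_perm_test(x, y, n_perm=200, seed=0):
    """Return (distance correlation, permutation p value) for H0: independence."""
    rng = np.random.default_rng(seed)
    A, B = _centered_dist(x), _centered_dist(y)
    stat = (A * B).mean()  # squared distance covariance
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(B.shape[0])
        # Double centering commutes with permuting rows and columns together.
        count += (A * B[np.ix_(p, p)]).mean() >= stat
    return dcor(x, y), (1 + count) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x = rng.normal(size=50)
    print(dcor_perm_test(x, x + 0.1 * rng.normal(size=50)))
    print(dcor_perm_test(x, rng.normal(size=50)))
```

The p value is computed from the squared distance covariance; because the distance variances do not change under permutation, this gives the same test as permuting with the distance correlation itself.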
Patterns for the two angles used in the simulations (adapted from Wikipedia)
