How to cite this WIREs title:
WIREs Comp Stat

The top‐K tau‐path screen for monotone association in subpopulations

A pair of variables that tend to rise and fall either together or in opposition is said to be monotonically associated. For certain phenomena, this tendency is causally restricted to a subpopulation; an example is the severity of an allergic reaction trending with the concentration of an air pollutant. Previously, Yu et al. (Stat Methodol 2011, 8:97–111) devised a method that rearranges the observations in paired data to test whether such an association might be present in a subpopulation. However, the computational intensity of the method limited its application to relatively small samples, and the test itself only judges whether association is present in some subpopulation; it does not clearly identify the subsample that came from this subpopulation, especially when the whole sample tests positive. The present study adds a ‘top‐K’ feature (Sampath S, Verducci JS. Stat Anal Data Min 2013, 6:458–471), based on a multistage ranking model, that identifies a concise subsample likely to contain a high proportion of observations from the subpopulation in which the association is supported. Computational improvements incorporated into this top‐K tau‐path algorithm now allow the method to be extended to thousands of pairs of variables measured on sample sizes in the thousands. A description of the new algorithm, along with measures of computational complexity and practical efficiency, helps to gauge its potential use in different settings. Simulation studies catalog its accuracy in various settings, and an example from finance illustrates its step‐by‐step use.

WIREs Comput Stat 2016, 8:206–218. doi: 10.1002/wics.1382

This article is categorized under:
Statistical Learning and Exploratory Methods of the Data Sciences > Exploratory Data Analysis
Statistical and Graphical Methods of Data Analysis > Nonparametric Methods

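
To make the notion of monotone association confined to a subpopulation concrete, the following Python sketch computes Kendall's τ on a simulated sample in which only some of the observations are associated. It is an illustration only, not the top‐K tau‐path algorithm: the function name `kendall_tau`, the sample sizes, the noise level, and the seed are all assumptions chosen for this example, and the quadratic pair count is the textbook definition rather than the computational improvements mentioned in the abstract.

```python
import random


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                score += 1
            elif prod < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


# Hypothetical mixture: 40 associated observations, 60 unassociated ones.
rng = random.Random(0)
n_assoc, n_background = 40, 60

# Associated subpopulation: y follows x up to small noise (assumed sd 0.05).
x_assoc = [rng.random() for _ in range(n_assoc)]
y_assoc = [xi + rng.gauss(0.0, 0.05) for xi in x_assoc]

# Background: x and y drawn independently.
x_bg = [rng.random() for _ in range(n_background)]
y_bg = [rng.random() for _ in range(n_background)]

tau_sub = kendall_tau(x_assoc, y_assoc)
tau_full = kendall_tau(x_assoc + x_bg, y_assoc + y_bg)

print(f"tau within associated subsample: {tau_sub:.3f}")
print(f"tau over the whole sample:       {tau_full:.3f}")
```

In a mixture like this, the whole-sample τ is diluted by the unassociated observations, which is why a test on the full sample alone does not indicate which observations carry the association.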
Quadratic fits to average runtimes of FastBCS2.
The graph represents the price of oil over 497 weeks starting from the end of 2004. The rug plot refers to the cluster of 24 similarly profiled stocks from Table . The colors (online version) indicate how many of the 24 stocks are included by TKTP in each time interval as being part of the general pattern of stock price increasing with 6‐month prior increases in the price of oil.
Five stock prices associated with 6‐month lagged oil: 2005–2014.
Cumulative number of stocks associated with 6‐month lagged oil.
Summary of TKTP (α = 0.05, w = 3) simulations under Frank mixtures of copulae.
Density contours of Frank (a) and Gaussian (b) copulae, both with τ = 0.5 and ρ = 0.7. Contours depict density levels of 0.5 to 5 in steps of 0.05.
Corresponding population values of Kendall's τ and Spearman's ρ from Frank and Gaussian copulae.
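
For the Gaussian copula, the correspondence between Kendall's τ and Spearman's ρ has a well-known closed form in terms of the Pearson correlation parameter r: τ = (2/π) arcsin(r) and ρ = (6/π) arcsin(r/2). The sketch below evaluates these formulas at τ = 0.5; the function names are assumptions made for this illustration. The Frank copula is omitted because its corresponding expressions involve Debye functions rather than elementary ones.

```python
import math


def gaussian_r_from_tau(tau):
    """Pearson parameter r of a Gaussian copula with Kendall's tau = tau."""
    return math.sin(math.pi * tau / 2.0)


def gaussian_tau(r):
    """Kendall's tau of a Gaussian copula with Pearson parameter r."""
    return (2.0 / math.pi) * math.asin(r)


def gaussian_spearman_rho(r):
    """Spearman's rho of a Gaussian copula with Pearson parameter r."""
    return (6.0 / math.pi) * math.asin(r / 2.0)


r = gaussian_r_from_tau(0.5)
rho_s = gaussian_spearman_rho(r)

print(f"Pearson r for tau = 0.5: {r:.4f}")
print(f"Spearman's rho:          {rho_s:.4f}")
```

At τ = 0.5 this gives a Spearman's ρ of about 0.69, consistent with the rounded ρ = 0.7 quoted in the contour-plot caption.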
