WIREs Comp Stat

A survey on Neyman‐Pearson classification and suggestions for future research

In statistics and machine learning, classification studies how to automatically learn to make good qualitative predictions (i.e., assign class labels) based on past observations. Examples of classification problems include email spam filtering, fraud detection, and market segmentation. Binary classification, in which the class label takes one of two values, arguably has the widest range of machine learning applications. Most existing binary classification methods aim to minimize the overall classification risk, and they may fail to serve some real‐world applications, such as cancer diagnosis, where users are more concerned with the risk of misclassifying one specific class than the other. The Neyman‐Pearson (NP) paradigm was introduced in this context as a novel statistical framework for handling asymmetric type I/II error priorities. It seeks a classifier with minimal type II error subject to the constraint that its type I error does not exceed a user‐specified level. Although NP classification has the potential to become an important subfield of the classification literature, it has not received much attention in the statistics and machine learning communities. This article surveys the current status of the NP classification literature. To stimulate readers' research interest, the authors also envision a few possible directions for future research on the NP paradigm and its applications.

WIREs Comput Stat 2016, 8:64–81. doi: 10.1002/wics.1376

This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification
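For reference, the two paradigms contrasted in the abstract can be written as optimization problems. The notation is ours and is an assumption for illustration: a classifier $\phi$ maps features $X$ to a label in $\{0, 1\}$, class 0 is the class whose misclassification (the type I error) the user wants to control, $\pi_0, \pi_1$ are the class priors, $p_0, p_1$ the class-conditional densities, and $\alpha$ the user‐specified level. The last line is the classical Neyman‐Pearson lemma, stated under the usual assumption that a threshold $C_\alpha$ attaining type I error exactly $\alpha$ exists (e.g., continuous densities).

```latex
\[
R_0(\phi) = \mathbb{P}\bigl(\phi(X) \neq Y \mid Y = 0\bigr)\ \text{(type I error)}, \qquad
R_1(\phi) = \mathbb{P}\bigl(\phi(X) \neq Y \mid Y = 1\bigr)\ \text{(type II error)}
\]
\[
\text{classical:}\ \min_{\phi}\ R(\phi) = \pi_0 R_0(\phi) + \pi_1 R_1(\phi),
\qquad
\text{NP:}\ \min_{\phi:\ R_0(\phi) \le \alpha}\ R_1(\phi)
\]
\[
\phi^*_\alpha(x) = \mathbb{1}\!\left\{ \frac{p_1(x)}{p_0(x)} > C_\alpha \right\},
\qquad C_\alpha\ \text{chosen so that}\ R_0(\phi^*_\alpha) = \alpha
\]
```

The classical risk weights the two errors by the priors, so a rare or costly error can be large at the classical optimum; the NP formulation instead caps $R_0$ directly and spends the remaining freedom on minimizing $R_1$.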
Classical versus NP classifiers in a binary classification example. The true distributions of data x under the two balanced classes are $\mathcal{N}(0, 1)$ (class 0) and $\mathcal{N}(2, 1)$ (class 1), respectively. Suppose that a user wants a type I error ≤ 0.05. The classical classifier (x ≥ 1), which minimizes the overall risk, results in a type I error of 0.16 > 0.05. In contrast, the NP classifier (x ≥ 1.65), which minimizes the type II error subject to the type I error constraint (≤ 0.05), achieves the desired type I error.
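The numbers in this example can be checked directly. The sketch below assumes class 0 ~ N(0, 1) and class 1 ~ N(2, 1) with equal priors (the pair consistent with the thresholds and error rates quoted in the caption) and uses only Python's standard-library `statistics.NormalDist`. The function names are illustrative, not taken from the survey.

```python
from statistics import NormalDist

# Assumed class-conditional distributions for the example:
# class 0 ~ N(0, 1), class 1 ~ N(2, 1), balanced classes (pi0 = pi1 = 0.5).
CLASS0 = NormalDist(mu=0.0, sigma=1.0)
CLASS1 = NormalDist(mu=2.0, sigma=1.0)


def type_i_error(t: float) -> float:
    """P(predict class 1 | class 0) for the rule 'predict class 1 iff x >= t'."""
    return 1.0 - CLASS0.cdf(t)


def type_ii_error(t: float) -> float:
    """P(predict class 0 | class 1) for the same rule."""
    return CLASS1.cdf(t)


def overall_risk(t: float, pi0: float = 0.5) -> float:
    """Classical risk: prior-weighted sum of type I and type II errors."""
    return pi0 * type_i_error(t) + (1.0 - pi0) * type_ii_error(t)


def np_threshold(alpha: float) -> float:
    """Smallest t whose type I error is at most alpha.

    Type II error grows with t, so this t gives the smallest type II error among
    rules 'x >= t' that meet the constraint. With equal variances the likelihood
    ratio is increasing in x, so by the Neyman-Pearson lemma no other rule does better.
    """
    return CLASS0.inv_cdf(1.0 - alpha)


classical_t = 1.0          # midpoint of the means: minimizes overall_risk for balanced classes
np_t = np_threshold(0.05)  # upper 5% quantile of class 0 (about 1.645)

for name, t in [("classical", classical_t), ("NP", np_t)]:
    print(f"{name:>9}: t = {t:.3f}, type I = {type_i_error(t):.3f}, "
          f"type II = {type_ii_error(t):.3f}, risk = {overall_risk(t):.3f}")
```

Rounded to two decimals, the type I errors (about 0.16 and 0.05) and the NP threshold (about 1.65) agree with the caption. The price of meeting the constraint is a larger type II error and a larger overall risk for the NP rule.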
Marker detection via multi‐class NP strategy 1.
Automatic disease diagnosis via NP classification and network‐assisted correction.
Average error rates over 1000 independent simulations for each combination of (d, m, n). In each simulation, error rates are computed on 2000 independent test data points and then averaged over the 1000 simulations.
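The evaluation protocol in this caption (empirical error rates on held-out test points, averaged over repeated simulations) can be sketched as follows. Everything here is a hypothetical stand-in, not the setting behind the figure: the data are one-dimensional with class 0 ~ N(0, 1) and class 1 ~ N(2, 1), the score is x itself, and the NP threshold is an order statistic of the class-0 training scores chosen so that P(type I error > α) ≤ δ, an order-statistic thresholding rule of the kind studied in the NP classification literature. The roles of (d, m, n) are not reproduced, and the names and defaults (`n0`, `delta`, `n_sims=200` for speed) are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def violation_index(n: int, alpha: float, delta: float):
    """Smallest k such that thresholding at the k-th smallest of n class-0 scores
    gives P(type I error > alpha) <= delta, or None if no k works.

    For continuous scores that probability equals P(Binomial(n, 1 - alpha) >= k);
    the tail sum is accumulated in log space to avoid overflow for large n.
    """
    log_p, log_q = math.log(1.0 - alpha), math.log(alpha)
    tail, best = 0.0, None
    for k in range(n, 0, -1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        tail += math.exp(log_term)
        if tail > delta:  # tail only grows as k decreases, so stop here
            break
        best = k
    return best


def np_threshold(class0_scores, alpha: float, delta: float) -> float:
    """Hypothetical NP rule: predict class 1 iff score > returned threshold."""
    scores = sorted(class0_scores)
    k = violation_index(len(scores), alpha, delta)
    if k is None:
        raise ValueError("too few class-0 points for this (alpha, delta)")
    return scores[k - 1]


def simulate(n0=200, alpha=0.05, delta=0.05, n_test=2000, n_sims=200, seed=0):
    """Average empirical type I/II errors over repeated simulations (hypothetical 1-D setting)."""
    rng = random.Random(seed)
    class0 = NormalDist(0.0, 1.0)
    sum_i = sum_ii = 0.0
    violations = 0
    for _ in range(n_sims):
        t = np_threshold([rng.gauss(0.0, 1.0) for _ in range(n0)], alpha, delta)
        test0 = [rng.gauss(0.0, 1.0) for _ in range(n_test)]
        test1 = [rng.gauss(2.0, 1.0) for _ in range(n_test)]
        sum_i += sum(x > t for x in test0) / n_test    # empirical type I error
        sum_ii += sum(x <= t for x in test1) / n_test  # empirical type II error
        violations += (1.0 - class0.cdf(t)) > alpha    # true type I error above alpha?
    return sum_i / n_sims, sum_ii / n_sims, violations / n_sims


if __name__ == "__main__":
    avg_i, avg_ii, viol = simulate()
    print(f"avg type I = {avg_i:.3f}, avg type II = {avg_ii:.3f}, violation rate = {viol:.3f}")
```

Because the threshold is chosen to bound the probability of exceeding α rather than to hit α on average, the averaged type I error typically sits noticeably below α, which is the usual trade-off of high-probability type I error control.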
Illustration of violations of the margin assumption and the detection condition. The solid lines represent the oracle decision boundaries. Subfigure (a) illustrates a violation of the margin assumption, and subfigure (b) illustrates a violation of the detection condition.