How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.541

No Free Lunch Theorem for concept drift detection in streaming data classification: A review


Abstract

Many real‐world data mining applications have to deal with unlabeled streaming data. The data are unlabeled because the sheer volume of the stream makes it impractical to label a significant portion of it. Data streams can evolve over time, and these changes are called concept drifts. Concept drifts have different characteristics, which can be used to categorize them into different types. Many concept drift detection approaches involve a trade‐off between performance and cost. On the one hand, high‐accuracy detection approaches usually require labeled data, and labeling can be expensive. On the other hand, a variety of methods detect concept drift with unlabeled data, but these approaches are often best suited to only a subset of the concept drift types. The objective of this survey is to present these methods, categorize them, and give usage recommendations based on their behavior under different types of concept drift.

This article is categorized under:
Fundamental Concepts of Data and Knowledge > Data Concepts
Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining
Explainable AI > Classification
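As a rough illustration of the unlabeled side of the trade‐off described in the abstract, the sketch below compares a reference window of one feature against a recent window using a two‐sample Kolmogorov-Smirnov test. It is not taken from the article: the function names `ks_statistic` and `drift_detected`, the single‐feature setup, and the significance level `alpha` are all assumptions made for illustration, and the critical value uses the standard large‐sample approximation.

```python
import math


def ks_statistic(reference, window):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref, win = sorted(reference), sorted(window)
    n, m = len(ref), len(win)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        # Step to the next smallest unseen value and advance both samples past it.
        x = min(ref[i], win[j])
        while i < n and ref[i] <= x:
            i += 1
        while j < m and win[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def drift_detected(reference, window, alpha=0.05):
    """Flag drift when the KS statistic exceeds the asymptotic critical value.

    Hypothetical helper: alpha and the one-feature setup are illustrative choices.
    """
    n, m = len(reference), len(window)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, window) > critical


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    reference = [rng.gauss(0.0, 1.0) for _ in range(500)]
    shifted = [rng.gauss(1.5, 1.0) for _ in range(500)]  # feature mean has moved
    print(drift_detected(reference, shifted))
```

A test of this kind needs no labels, but it only reacts to changes in the feature distribution. If the class boundary moves while the features keep the same distribution, it stays silent, which is one concrete instance of an unlabeled method suiting only a subset of drift types.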
Demonstrating concept drift: (a) a linear classification boundary trained on the initial training data separates two classes and (b) the classification boundary becomes vertical after concept drift changes the class distribution
Summary of experimental results for ensemble drift detection: (a) percentage of correct detections for each algorithm compared to the Drift Detection Method and (b) percentage of false positives for each algorithm
Margin density failed to detect concept drift
Illustration of margin density drift detection
When data distribution‐based density monitoring fails. Density‐based concept drift detection using cluster with labeled data (a) and without labeled data (b)
Basic principle of data distribution‐based density monitoring
When distribution‐based statistical testing fails. Characteristics of the data and probabilistic distributions of X and Y feature values (a) before concept drift and (b) after concept drift
Basic principle of data distribution‐based statistical testing. Characteristics of the data and probabilistic distributions of X and Y feature values (a) before concept drift, and (b) after concept drift
Basic principle of performance‐based drift detection using ensemble learning: (a) ensemble before drift and (b) ensemble after drift
Basic principle of performance‐based statistical testing
Illustration of combinations of recurrent and periodic drift: (a) periodic recurrent drift, (b) periodic non‐recurrent drift, (c) non‐periodic recurrent drift, and (d) non‐periodic non‐recurrent drift
Illustration of periodic drift
Illustration of recurrent drift
Illustration of combinations of speed and distribution of change: (a) fixed space sudden drift, (b) fixed space gradual drift, (c) fixed space incremental drift, (d) non‐fixed space sudden drift, (e) non‐fixed space gradual drift, and (f) non‐fixed space incremental drift
Novel class in streaming data: (a) initially two classes occupy a data stream, (b) a novel class appears in a new data region, and (c) a novel class appears in an existing data region
Illustration of (a) fixed space and (b) non‐fixed space drifts
(a) Sudden, (b) incremental, and (c) gradual drifts classified by speed of change
Novel features in streaming data: (a) initial data in stream showing only two variables and (b) a new feature appears in data stream
Detecting various types of concept drift is difficult without labels: (a) data without labels showing no change before and after, (b) data with labels showing a change in class distribution, (c) data without labels having a new region of data, signaling concept drift, and (d) data with labels showing that the new region does not affect the existing classification
Flow chart for a common data stream classification framework

Browse by Topic

Technologies > Classification
Fundamental Concepts of Data and Knowledge > Data Concepts
Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining
