How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 7.250

Density‐based clustering


Abstract

Clustering refers to the task of identifying groups or clusters in a data set. In density‐based clustering, a cluster is a set of data objects spread in the data space over a contiguous region of high density of objects. Density‐based clusters are separated from each other by contiguous regions of low density of objects. Data objects located in low‐density regions are typically considered noise or outliers. In this review article we discuss the statistical notion of density‐based clusters, classic algorithms for deriving a flat partitioning of density‐based clusters, methods for hierarchical density‐based clustering, and methods for semi‐supervised clustering. We conclude with some open challenges related to density‐based clustering.

This article is categorized under:
Technologies > Data Preprocessing
Ensemble Methods > Structure Discovery
Algorithmic Development > Hierarchies and Trees
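
As a rough illustration of the definition above (dense contiguous regions form clusters, objects in sparse regions are noise), the following is a minimal DBSCAN-style sketch in plain Python. The abstract does not name a specific algorithm, so the choice of DBSCAN, the parameter names `eps` and `min_pts`, and the toy points are all assumptions made for illustration.

```python
from math import dist

NOISE = -1  # label for points that end up in no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


# Toy data (hypothetical): two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

With these toy settings the two squares receive cluster ids 0 and 1, and the isolated point at (5, 5) is labelled as noise.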
Density‐distributions of data points and density‐based clusters for different density levels. Different colors indicate different clusters or noise. (a) Data set, (b) probability density function of the data, (c) two clusters merged, (d) low density level, (e) three clusters separated, (f) good density level, (g) one cluster becomes noise, and (h) high density level
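
The effect of the density level described in this caption can be sketched on one-dimensional toy data. The example below is only an illustration under stated assumptions: it uses a uniform (box) kernel, so the density at a point is proportional to the number of points within a window of half-width `h`, and it expresses the density level as a minimum count `min_count`. The data, `h`, and the function names are hypothetical and not taken from the article.

```python
def window_counts(xs, h):
    """Number of points within distance h of each point (uniform kernel).

    Dividing a count by n * 2h would give the uniform-kernel density estimate.
    """
    return [sum(1 for y in xs if abs(x - y) <= h) for x in xs]


def level_set_clusters(xs, h, min_count):
    """Cluster sorted 1D data at a single density level.

    Points with fewer than min_count points in their window are noise.
    The remaining points are chained into one cluster as long as
    consecutive retained points lie within h of each other.
    """
    counts = window_counts(xs, h)
    dense = [x for x, c in zip(xs, counts) if c >= min_count]
    noise = [x for x, c in zip(xs, counts) if c < min_count]
    clusters = []
    for x in dense:
        if clusters and x - clusters[-1][-1] <= h:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    return clusters, noise


# Toy data (hypothetical): two dense groups joined by a single bridge point,
# one smaller group, and one isolated outlier.
group_a = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
bridge = [5.0]
group_b = [7.5, 8.0, 8.5, 9.0, 9.5, 10.0]
group_c = [20.0, 20.5, 21.0, 21.5]
outlier = [30.0]
xs = sorted(group_a + bridge + group_b + group_c + outlier)

results = {m: level_set_clusters(xs, h=2.75, min_count=m) for m in (3, 4, 6)}
for m, (clusters, noise) in results.items():
    print(f"min_count={m}: {len(clusters)} clusters, noise={noise}")
```

At the lowest level the bridge point keeps the two dense groups merged, at the intermediate level three clusters are separated, and at the highest level the smaller group falls below the threshold and is treated as noise.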
Visualization of the HDBSCAN* hierarchical clustering as a simplified cluster tree, where “eps” stands for the radius ε, which is inversely proportional to the density level. From top to bottom, we can see that there are two major clusters. The left one stays stable and shrinks until it disappears as the density level increases. The right one splits into two noticeable clusters, one of which shrinks until it disappears, whereas the other one shrinks along a certain lifetime interval until it further splits into two subclusters, each of which quickly splits further into two very small subclusters, which then die out. The red boxes correspond to the optimal selection of clusters performed by the algorithm FOSC, which is an optional postprocessing routine used by HDBSCAN*, with its default stability criterion. This figure was reproduced using the R package “dbscan,” with the toy data set “moons,” following the code in the package vignettes
Reachability plot for a sample 2D data set (visualization with ELKI [Schubert & Zimek, ])
The impact of different kernels on the density estimation
Density connectivity
Maunga Whau Volcano (Mt. Eden). (a) Topographic information and (b) area of selected minimum altitude

Browse by Topic

Algorithmic Development > Hierarchies and Trees
Algorithmic Development > Structure Discovery
Technologies > Data Preprocessing
