How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.541

Open issues for partitioning clustering methods: an overview


Over the last decades, a great variety of data mining techniques have been developed to support Knowledge Discovery in Databases. Among them, cluster detection techniques are of major importance. Although these techniques have been explored extensively in the scientific literature, at least two important issues remain open: existing algorithms do not scale to large, high‐dimensional datasets, and the unsupervised nature of traditional data clustering makes it very difficult to generate meaningful clusters. This article presents an overview of the strategies being explored to address these issues. It also describes a new semi‐supervised clustering strategy that exemplifies the integration of several approaches and can be employed with partitioning algorithms such as PAM and Clarans. The technique improves these algorithms by using must‐link feedback provided by users in an interactive, visual environment.

WIREs Data Mining Knowl Discov 2014, 4:161–177. doi: 10.1002/widm.1127

This article is categorized under: Technologies > Structure Discovery and Clustering
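To make the idea of must‐link feedback in a partitioning algorithm more concrete, the following is a minimal sketch and not the method of the article, whose cost function is not given on this page. It assumes one simple way to honor must‐link pairs in a PAM‐style medoid search: linked elements are merged into groups, and each group is assigned as a unit to the medoid with the lowest summed distance. The function names (`must_link_groups`, `assign`, `constrained_pam`), the naive initialization, and the toy data are all hypothetical.

```python
def must_link_groups(n, must_links):
    """Merge elements connected by must-link pairs into groups (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a, b in must_links:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def assign(dist, medoids, groups):
    """Assign every group, as a unit, to the medoid with the lowest summed distance."""
    labels = [None] * len(dist)
    cost = 0.0
    for group in groups:
        best = min(medoids, key=lambda m: sum(dist[i][m] for i in group))
        cost += sum(dist[i][best] for i in group)
        for i in group:
            labels[i] = best
    return labels, cost


def constrained_pam(dist, k, must_links=()):
    """Greedy PAM-style swap search over a precomputed distance matrix.

    Assumption: must-linked elements always share a medoid; this is an
    illustrative simplification, not the article's four-case cost function.
    """
    n = len(dist)
    groups = must_link_groups(n, must_links)
    medoids = list(range(k))  # naive initialization: first k elements
    labels, cost = assign(dist, medoids, groups)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                # Try replacing one medoid with a non-medoid element.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                t_labels, t_cost = assign(dist, trial, groups)
                if t_cost < cost - 1e-12:  # accept only strict improvements
                    medoids, labels, cost = trial, t_labels, t_cost
                    improved = True
    return medoids, labels, cost


# Toy usage: six points on a line, with a must-link between elements 2 and 3.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels, cost = constrained_pam(dist, k=2, must_links=[(2, 3)])
```

Treating each must‐link group as a single unit guarantees that the constraint is never violated, at the price of a total cost that can be higher than in the unconstrained search. A cost function that handles linked and unlinked elements case by case during each swap, as the article's figure caption suggests, would be a more refined alternative.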
The Iris dataset. Multidimensional data and a two‐dimensional scatterplot.
The four cases of the cost function of semi‐supervised PAM. Cases (a) and (b): the non‐medoid element sj belongs to the cluster represented by the medoid being considered in the swap (rm). Cases (c) and (d): the non‐medoid element sj does not belong to the cluster represented by the medoid being considered in the swap (rm). Filled circle (•) indicates must‐linked elements, plus symbol (+) indicates cluster center, symbol ‘—’ indicates before swap, and dashed line (‐‐‐) indicates after swap.
Semi‐supervised clustering tool. (a) 3D visualization window with interaction controls. (b) Zooming in shows the visualization in more detail. The user created a must‐link constraint between two images in different clusters. (c) Visualization of the images selected for the must‐link constraint.
Semi‐supervised clustering tool. (a) Main window. (b) Data visualization window.
Neighborhood representation of a borderline element.
Semi‐supervised clustering example. (a) Must‐link and cannot‐link constraints are indicated in the original dataset. (b) Clustering result considering two must‐link constraints and one cannot‐link constraint. (c) Clustering result in the modified feature space implied by the distance function learned from the three constraints. The symbol ‘×’ indicates cannot‐link and ‘—’ indicates must‐link.
Image similarity in the feature space regarding color. Copyright 1999–2014, SIGNA. Photos by C. Hensler and D. Kramb.
