How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.541

Method evaluation, parameterization, and result validation in unsupervised data mining: A critical survey


Abstract

Machine Learning (ML) and Data Mining (DM) build tools intended to help users solve data‐related problems that are infeasible for “unaugmented” humans. Tools need manuals, however, and in the case of ML/DM methods, this means guidance with respect to which technique to choose, how to parameterize it, and how to interpret derived results to arrive at knowledge about the phenomena underlying the data. While such information is available in the literature, it has not yet been collected in one place. We survey three types of work for clustering and pattern mining: (1) comparisons of existing techniques, (2) evaluations of different parameterization options and studies providing guidance for setting parameter values, and (3) work comparing mining results with the ground truth. We find that although interesting results exist, as a whole the body of work on these questions is too limited. In addition, we survey recent studies in the field of community detection, as a contrasting example. We argue that an objective obstacle for performing needed studies is a lack of data and survey the state of available data, pointing out certain limitations. As a solution, we propose to augment existing data by artificially generated data, review the state‐of‐the‐art in data generation in unsupervised mining, and identify shortcomings. In more general terms, we call for the development of a true “Data Science” that (based on work in other domains, results in ML, and existing tools) develops needed data generators and builds up the knowledge needed to effectively employ unsupervised mining techniques.

This article is categorized under:
Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining
Ensemble Methods > Structure Discovery
Internet > Society and Culture
Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining

Different paths through the survey
Item support distribution for the retail data set (left), Quest N1000L2000T10I4 (center), and Quest with L80 (right)
Data mining results can either act as inputs in further data analysis, or can be interpreted by humans to arrive at knowledge about the world
Relative performance graphs for the comparisons by Liu et al. (left) and Guerra et al. (right). In the left‐hand graph, criteria at the same level have similar capabilities
Relative performance graphs of several evaluations of internal validation criteria
Relative performance graph of different clustering evaluations
Relative performance graph of different FGM publications. A directed edge from one algorithm to another indicates that the former ran faster than the latter; an undirected edge indicates similar behavior
Relative performance graph of different FSM publications. A directed edge from one algorithm to another indicates that the former ran faster than the latter; an undirected edge indicates similar behavior
Relative performance graphs for several FIM publications. A directed edge from one algorithm to another means that the first algorithm ran faster; an undirected edge means that they performed similarly

Browse by Topic

Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining
Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining
Algorithmic Development > Structure Discovery
