How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 7.250

Similarity measures for sequential data



Abstract

Expressive comparison of strings is a prerequisite for the analysis of sequential data in many areas of computer science. However, comparing strings and assessing their similarity is not a trivial task, and there exist several contrasting approaches for defining similarity measures over sequential data. In this paper, we review three major classes of such similarity measures: edit distances, bag-of-words models, and string kernels. Each of these classes originates from a particular application domain and models similarity of strings differently. We present these classes and the underlying comparisons in detail, highlight their advantages and differences, and provide basic algorithms supporting practical applications.

© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1: 296–304. DOI: 10.1002/widm.36

This article is categorized under:
Algorithmic Development > Biological Data Mining
Algorithmic Development > Text Mining
Fundamental Concepts of Data and Knowledge > Data Concepts
Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining
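To make the three classes named in the abstract concrete, here is a minimal Python sketch with one textbook representative of each: the Levenshtein distance as an edit distance, a Euclidean distance over n-gram counts as a bag-of-words model, and the spectrum kernel as a string kernel. These are illustrative assumptions, not the algorithms from the article itself. The function names, the use of character n-grams as the "words", and the default n = 2 are all choices made for this example.

```python
from collections import Counter
from math import sqrt


def levenshtein(x: str, y: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn x into y."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            curr.append(min(prev[j] + 1,          # delete cx
                            curr[j - 1] + 1,      # insert cy
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def ngrams(x: str, n: int) -> Counter:
    """Count all contiguous substrings of length n (assumed 'words')."""
    return Counter(x[i:i + n] for i in range(len(x) - n + 1))


def bag_distance(x: str, y: str, n: int = 2) -> float:
    """Bag-of-words style measure: Euclidean distance between the
    n-gram count vectors of x and y."""
    cx, cy = ngrams(x, n), ngrams(y, n)
    return sqrt(sum((cx[w] - cy[w]) ** 2 for w in cx.keys() | cy.keys()))


def spectrum_kernel(x: str, y: str, n: int = 2) -> int:
    """String kernel: inner product of the n-gram count vectors."""
    cx, cy = ngrams(x, n), ngrams(y, n)
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
```

Note that the first two functions are dissimilarities (smaller means more alike), whereas the kernel is a similarity (larger means more alike). The edit distance is sensitive to the order of symbols across the whole string, while the two n-gram based measures only see local order within each n-gram.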

Suffix tree (a) and matching statistic (b) for exemplary strings x and y. The additional symbol | indicates the different suffixes of x. For simplicity, only two suffix links (dotted lines) are shown.


