How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 1.939

Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data

The k‐nearest neighbors algorithm is characterized as a simple yet effective data mining technique. Its main drawback appears when massive amounts of data, likely to contain noise and imperfections, are involved, turning the algorithm into an imprecise and especially inefficient technique. These disadvantages have been the subject of research for many years and, among other approaches, data preprocessing techniques such as instance reduction or missing values imputation have targeted these weaknesses. As a result, these weaknesses have turned into strengths, and the k‐nearest neighbors rule has become a core algorithm to identify and correct imperfect data, removing noisy and redundant samples or imputing missing values, and thereby transforming Big Data into Smart Data, that is, data of sufficient quality to expect a good outcome from any data mining algorithm. The role of this smart data gleaning algorithm in a supervised learning context is investigated. This includes a brief overview of Smart Data, current and future trends for the k‐nearest neighbor algorithm in the Big Data context, and the existing data preprocessing techniques based on this algorithm. We present the emerging big data‐ready versions of these algorithms and develop some new methods to cope with Big Data. We carry out a thorough experimental analysis on a series of big datasets, which provides guidelines as to how to use the k‐nearest neighbor algorithm to obtain Smart/Quality Data for a high‐quality data mining process. Moreover, multiple Spark Packages have been developed that include all the Smart Data algorithms analyzed.

This article is categorized under:

  • Technologies > Data Preprocessing
  • Fundamental Concepts of Data and Knowledge > Big Data Mining
  • Technologies > Classification
The k‐nearest neighbors algorithm plays a key role in coping with Big Data by transforming it into Smart Data that is free of redundant information, noise, and/or missing values. Gleaning quality data is essential for a correct data mining process that will uncover valuable insights.
Runtime chart to perform kNN‐LI
Runtime chart in logarithmic scale to perform smart filtering
Runtime chart in logarithmic scale to perform smart reduction
Storage requirements reduction on ECBDL’14 dataset
Flowchart of the kNN‐LI algorithm. The dataset is split into M chunks (Map function), each of which is processed locally by a standard kNN‐I algorithm. The resulting amended partitions are then gathered together.
Big data preprocessing is the key to transforming raw big data into quality, smart data
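
The kNN‐LI flowchart caption above describes a split, local imputation, and gather pattern. The following is a minimal single-machine sketch of that pattern, not the authors' Spark implementation. The function names (`knn_impute`, `knn_li`), the use of `None` to mark missing values, the contiguous chunking, and the choice to impute with the mean of the k nearest donors are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=3):
    """Hypothetical local kNN imputation: fill None entries in each row
    with the mean of that feature over the k nearest rows of the same chunk."""
    imputed = []
    for i, row in enumerate(rows):
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # Rank the other rows of the chunk by distance on shared observed features.
        scored = []
        for m, other in enumerate(rows):
            if m == i:
                continue
            shared = [j for j in observed if other[j] is not None]
            if not shared:
                continue
            dist = math.sqrt(
                sum((row[j] - other[j]) ** 2 for j in shared) / len(shared)
            )
            scored.append((dist, m))
        scored.sort()
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                # The k closest rows that actually have a value for feature j.
                donors = [rows[m][j] for _, m in scored if rows[m][j] is not None][:k]
                if donors:  # leave the gap if no donor exists in this chunk
                    new_row[j] = sum(donors) / len(donors)
        imputed.append(new_row)
    return imputed


def knn_li(rows, num_chunks, k=3):
    """Split rows into contiguous chunks, impute each chunk locally (the
    'Map' step), then gather the amended chunks back in their original order."""
    size = max(1, math.ceil(len(rows) / num_chunks))
    chunks = [rows[start:start + size] for start in range(0, len(rows), size)]
    amended = map(lambda chunk: knn_impute(chunk, k), chunks)
    return [row for chunk in amended for row in chunk]


data = [[1.0, 2.0], [1.1, None], [5.0, 6.0], [5.2, None]]
print(knn_li(data, num_chunks=2, k=1))
```

Because each chunk only sees its own rows, the neighbors found are local rather than global. That is the trade-off the flowchart implies: less communication between partitions in exchange for a possibly less accurate neighbor search.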