How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.111

Anomaly detection by robust statistics


Real data often contain anomalous cases, also known as outliers. These may spoil the resulting analysis, but they may also contain valuable information. In either case, the ability to detect such anomalies is essential. A useful tool for this purpose is robust statistics, which aims to detect the outliers by first fitting the majority of the data and then flagging data points that deviate from that fit. We present an overview of several robust methods and the resulting graphical outlier detection tools. We discuss robust procedures for univariate, low‐dimensional, and high‐dimensional data, such as estimating location and scatter, linear regression, principal component analysis, classification, clustering, and functional data analysis. We also introduce the challenging new topic of cellwise outliers. WIREs Data Mining Knowl Discov 2018, 8:e1236. doi: 10.1002/widm.1236
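
The idea of fitting the majority first and then flagging deviating points can be illustrated in the univariate case. The sketch below is not taken from the article: it assumes the median as the robust location, the median absolute deviation (MAD, scaled by 1.4826 for consistency at the normal distribution) as the robust scale, and a cutoff of 2.5. The function names and the data values are hypothetical and chosen only for illustration.

```python
import statistics

def robust_z_scores(values):
    """Standardize values with the median and the scaled MAD (assumed estimators)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # consistency factor for normally distributed data
    if scale == 0:
        raise ValueError("MAD is zero; more than half of the values coincide")
    return [(v - med) / scale for v in values]

def classical_z_scores(values):
    """Standardize values with the mean and the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def flag(z_scores, cutoff=2.5):
    """Return the indices whose absolute z-score exceeds the cutoff."""
    return [i for i, z in enumerate(z_scores) if abs(z) > cutoff]

# Hypothetical measurements; the last value mimics a misplaced decimal point.
data = [6.27, 6.34, 6.25, 6.31, 6.28, 63.1]

robust_flags = flag(robust_z_scores(data))
classical_flags = flag(classical_z_scores(data))
print("robust:", robust_flags)
print("classical:", classical_flags)
```

In this toy sample the robust rule flags the last observation, while the classical rule flags nothing: the outlier inflates the mean and standard deviation enough to hide itself, an effect usually called masking.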

This article is categorized under:

  • Algorithmic Development > Spatial and Temporal Data Mining
  • Technologies > Classification
  • Technologies > Structure Discovery and Clustering
  • Technologies > Visualization
Male mortality in France in 1816–2010: (left) detecting outlying rows by a robust principal component analysis method; (right) detecting outlying cells by DetectDeviatingCells. After the analysis, the cells were grouped in blocks of 5 × 5 for visibility.
Glass data: (left) spectra; (right) outlier map.
Illustration of PCA: (left) types of outliers; (right) outlier map: plot of orthogonal distances versus score distances.
Stackloss data: (left) standardized nonrobust least squares (LS) residuals of y versus nonrobust distances of x; (right) same with robust residuals and robust distances.
Stars data: standardized robust residuals of y versus robust distances of x.
Stars data: classical least squares line (red) and robust line (blue).
Animal data: robust distance versus classical Mahalanobis distance.
Animal data: tolerance ellipse of the classical mean and covariance matrix (red), and that of the robust location and scatter matrix (blue).
Cell map of the glass data. The positions of the deviating cells reveal the chemical contaminants.

