How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 1.939

On the accuracy of linear regression routines in some data mining packages

While articles assessing the accuracy of traditional statistical packages are fairly commonplace, data mining software has escaped this important scrutiny. We apply the National Institute of Standards and Technology Statistical Reference Datasets tests for the numerical accuracy of statistical packages to seven data mining packages: IBM Modeler, KNIME, Orange, Python, RapidMiner, Weka, and XLMiner. We find that one package has an unstable algorithm for the calculation of the sample variance and only two have reliable linear regression routines. Of these two packages that offer analysis of variance, one has a bad algorithm. The accuracy of statistical calculations in data mining packages cannot be taken for granted.
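
To make the notion of an unstable variance algorithm concrete, the following is a minimal sketch, not the code of any package named above. It assumes the instability in question resembles the textbook one-pass formula (sum of squares minus squared sum), which the abstract does not state. It scores each result with the log relative error (LRE), taken here as -log10(|q - c| / |c|) for an estimate q and a certified value c. The function names and the three data points are hypothetical and chosen only for illustration.

```python
import math


def lre(estimate, certified):
    """Log relative error: roughly the number of correct significant digits."""
    if estimate == certified:
        return float("inf")  # exact agreement
    if certified == 0:
        # Relative error is undefined for a zero target; use absolute error.
        return -math.log10(abs(estimate))
    return -math.log10(abs(estimate - certified) / abs(certified))


def variance_one_pass(xs):
    """Textbook one-pass formula; prone to catastrophic cancellation."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(xs):
    """Two-pass formula: compute the mean first, then squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical data: large mean, small spread. The exact sample variance is 1.
data = [100000001.0, 100000002.0, 100000003.0]
certified = 1.0

print("one-pass LRE:", lre(variance_one_pass(data), certified))
print("two-pass LRE:", lre(variance_two_pass(data), certified))
```

With a large mean and a small spread, the squared values exceed the range in which double precision represents integers exactly, so the one-pass subtraction loses the digits that carry the variance, while the two-pass version works with small deviations and keeps them.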

This article is categorized under:

  • Technologies > Statistical Fundamentals
  • Algorithmic Development > Statistics
  • Application Areas > Data Mining Software Tools
Minimum log relative errors (LREs) for Big Longley Regressions by package
