Applying statistical thinking to ‘Big Data’ problems

Much has been written recently about ‘Big Data’ and the new possibilities that mining this vast amount of data brings. It promises to help us understand or predict everything from the Higgs boson to what a customer might purchase next from Amazon. As with most new phenomena, it is hard to sift through the hype and promotion to understand what is actually true and what is actually useful. One implicit, or even explicitly stated, assumption in much of the Big Data literature is that the fundamentals of statistical thinking are no longer relevant in the petabyte age. We believe just the opposite: the fundamentals of good modeling and statistical thinking are crucial for the success of Big Data projects. Sound statistical practices, such as ensuring high‐quality data, incorporating sound domain (subject matter) knowledge, and developing an overall strategy or plan of attack for large modeling problems, are even more important for Big Data problems than for small data problems.

WIREs Comput Stat 2014, 6:222–232. doi: 10.1002/wics.1306

This article is categorized under:
Statistical and Graphical Methods of Data Analysis > Bootstrap and Resampling
Data: Types and Structure > Massive Data
Data: Types and Structure > Traditional Statistical Data
Applications of Computational Statistics > Education in Computational Statistics
The building blocks of statistical thinking.
The phases of Big Data projects.

