How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 7.250

Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks

The term ‘Big Data’ has spread rapidly in the fields of Data Mining and Business Intelligence. It refers to those problems that cannot be addressed effectively or efficiently with the standard computing resources currently available. We must emphasize that Big Data implies not just large volumes of data but also the need for scalability, i.e., ensuring a response within an acceptable elapsed time. When scalability is considered, traditional parallel‐type solutions are usually contemplated, such as the Message Passing Interface or high‐performance and distributed Database Management Systems. A new paradigm has now gained popularity over these solutions because of the benefits it offers. This model is Cloud Computing, and among its main features we must stress its elasticity in the use of computing resources and storage space, its reduced management effort, and its flexible costs. In this article, we provide an overview of Big Data and of how the problem can be addressed from the perspective of Cloud Computing and its programming frameworks. In particular, we focus on systems for large‐scale analytics based on the MapReduce scheme and on Hadoop, its open‐source implementation. We identify several libraries and software projects that have been developed to help practitioners work with this new programming model. We also analyze the advantages and disadvantages of MapReduce in contrast to the classical solutions in this field. Finally, we present a number of programming frameworks that have been proposed as alternatives to MapReduce, developed to overcome the shortcomings of this model in certain scenarios and platforms.

WIREs Data Mining Knowl Discov 2014, 4:380–409. doi: 10.1002/widm.1134

This article is categorized under:
Technologies > Classification
Technologies > Computer Architectures for Data Mining

Business Intelligence structure.
BSP model workflow.
Alternative frameworks for the standard MapReduce model.
Machine Learning software suites: a three‐generational view.
MapReduce simplified flowchart.
The architecture of the Hadoop Distributed File System (HDFS). The namenode (master) maintains the file namespace and directs clients to the datanodes (slaves) that actually hold the blocks of user data.
Big Data framework.
Illustration of the layers for the Service‐Oriented Architecture.
