How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 7.250

Concurrent software architectures for exploratory data analysis


Decades ago, the increased volume of data made manual analysis obsolete and prompted the use of computational tools with interactive user interfaces and a rich palette of data visualizations. Yet their classic, desktop‐based architectures can no longer cope with the ever‐growing size and complexity of data. Next‐generation systems for exploratory data analysis will be developed on client–server architectures, which already run concurrent software for data analytics but are not tailored for an engaged, interactive analysis of data and models. In exploratory data analysis, the key is the responsiveness of the system and the prompt construction of interactive visualizations that can guide users to uncover interesting patterns in the data. In this study, we review current software architectures for distributed data analysis and propose a list of features to be included in the next generation of frameworks for exploratory data analysis. The new generation of tools for exploratory data analysis will need to address integrated data storage and processing, fast prototyping of data analysis pipelines supported by machine‐proposed analysis workflows, pre‐emptive analysis of data, interactivity, and user interfaces for intelligent data visualizations. The systems will rely on a mixture of concurrent software architectures to meet the challenge of seamlessly integrating exploratory data interfaces on the client side with the management of concurrent data mining procedures on the servers.
WIREs Data Mining Knowl Discov 2015, 5:165–180. doi: 10.1002/widm.1155
This article is categorized under:
Application Areas > Data Mining Software Tools
Technologies > Computer Architectures for Data Mining
RStudio has a command-line interface for invoking analysis methods and plotting data graphs. Visualizations are displayed in a separate window (right), and the next scripting steps often rely strongly on the displayed results.
Intelligent data visualization by VizRank. The window on the right shows a Radviz visualization of gene expression profiles from Ref. . VizRank can score Radviz data projections by the degree of separation between data items of different classes and offer them in a browsable ranked list (window on the left).
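To make "scoring projections by class separation" concrete, here is a minimal sketch under stated assumptions. It projects data with the standard Radviz mapping (each feature is an anchor on the unit circle; a point lands at the normalized-feature-weighted average of the anchors) and uses leave-one-out k-nearest-neighbor accuracy in the 2D projection as the separation score. This score is one plausible choice, not necessarily VizRank's exact procedure; the function names `radviz`, `knn_loo_accuracy` and `rank_projections` are hypothetical, and the sketch ignores anchor ordering, which matters once more than three features are shown.

```python
import math
from itertools import combinations


def radviz(rows, features):
    """Project rows to 2D, placing one Radviz anchor per selected feature on the unit circle."""
    columns = list(zip(*rows))
    lo = [min(col) for col in columns]
    hi = [max(col) for col in columns]
    m = len(features)
    anchors = [(math.cos(2 * math.pi * i / m), math.sin(2 * math.pi * i / m))
               for i in range(m)]
    points = []
    for row in rows:
        # Min-max normalize each selected feature to [0, 1]; constant features get weight 0.
        weights = [(row[f] - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.0
                   for f in features]
        total = sum(weights)
        if total == 0:
            points.append((0.0, 0.0))  # no anchor pulls this point: place it at the center
            continue
        x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        points.append((x, y))
    return points


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out k-NN accuracy in the projected space, used as a class-separation score."""
    correct = 0
    for i, p in enumerate(points):
        neighbors = sorted((math.dist(p, q), labels[j])
                           for j, q in enumerate(points) if j != i)
        votes = {}
        for _, label in neighbors[:k]:
            votes[label] = votes.get(label, 0) + 1
        correct += max(votes, key=votes.get) == labels[i]
    return correct / len(points)


def rank_projections(rows, labels, size=2, k=3):
    """Score every feature subset of the given size and return (score, subset) pairs, best first."""
    ranked = [(knn_loo_accuracy(radviz(rows, subset), labels, k), subset)
              for subset in combinations(range(len(rows[0])), size)]
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

On a toy data set where features 0 and 1 distinguish the classes and features 2 and 3 are constant, `rank_projections` puts `(0, 1)` at the top with a score of 1.0 and the uninformative pair `(2, 3)` at the bottom. Exhaustive enumeration is only feasible for small feature counts; a real tool would need a heuristic search over candidate projections.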
Machine prediction of workflow components. The user constructs a workflow with two modeling algorithms (Classification Tree and a rule‐based classifier CN2), each followed by components for model visualization (Classification Tree Graph and CN2 Rules Viewer). After the user adds another modeling algorithm (Logistic Regression), the framework can anticipate that it will also be followed by a visualization component (Nomogram).
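The caption does not say how the framework makes its prediction. One simple way to anticipate the next component is a frequency model over links observed in earlier workflows, sketched below. The technique is swapped in for illustration, not taken from the article; the functions `learn_successors` and `suggest_next` and the workflow history (including the "File" and "Test Learners" components) are hypothetical.

```python
from collections import Counter, defaultdict


def learn_successors(workflows):
    """Count how often component `target` is linked directly after `source`.

    Each past workflow is given as a list of (source, target) links.
    """
    successors = defaultdict(Counter)
    for links in workflows:
        for source, target in links:
            successors[source][target] += 1
    return successors


def suggest_next(successors, component, top_n=1):
    """Return up to `top_n` components that most often followed `component`."""
    return [name for name, _ in successors.get(component, Counter()).most_common(top_n)]


# Hypothetical log of earlier workflows.
history = [
    [("File", "Classification Tree"), ("Classification Tree", "Classification Tree Graph")],
    [("File", "CN2"), ("CN2", "CN2 Rules Viewer")],
    [("File", "Logistic Regression"), ("Logistic Regression", "Nomogram")],
    [("File", "Logistic Regression"), ("Logistic Regression", "Nomogram"),
     ("Logistic Regression", "Test Learners")],
]
model = learn_successors(history)
print(suggest_next(model, "Logistic Regression"))  # ['Nomogram']
```

Unlike the caption's scenario, this sketch learns only from recorded history; it cannot generalize "modeler followed by its viewer" to a component it has never seen. Doing that would require grouping components into types, for example learners and model viewers.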
General architecture for data analysis. Clients with user interface send data analysis requests to application servers that engage workers. These communicate with a data repository to load data and store the results of the analysis. Application servers render the results and communicate them back to the clients.
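The request flow in this architecture (client, application server, workers, data repository) can be sketched in a single process. In the sketch, a thread pool stands in for the workers and an in-memory dictionary guarded by a lock stands in for the repository. All class and function names are hypothetical, and "rendering" is reduced to a text summary. The key point is that `submit` returns a future immediately, so the client interface is not blocked while a worker computes.

```python
import statistics
import threading
from concurrent.futures import ThreadPoolExecutor


class DataRepository:
    """In-memory stand-in for the shared data repository."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)
        self._results = {}
        self._lock = threading.Lock()  # workers access the repository concurrently

    def load(self, name):
        with self._lock:
            return list(self._datasets[name])

    def store(self, key, value):
        with self._lock:
            self._results[key] = value

    def result(self, key):
        with self._lock:
            return self._results.get(key)


def worker_task(repo, dataset, method):
    """A worker loads the data, runs the analysis, and stores the result."""
    value = method(repo.load(dataset))
    repo.store((dataset, method.__name__), value)
    return value


class ApplicationServer:
    """Accepts client requests and hands them to a pool of workers."""

    def __init__(self, repo, n_workers=4):
        self.repo = repo
        self.pool = ThreadPoolExecutor(max_workers=n_workers)

    def submit(self, dataset, method):
        # Returns a Future at once, so the client is not blocked while the worker runs.
        return self.pool.submit(worker_task, self.repo, dataset, method)

    def render(self, dataset, method, future):
        # Stands in for rendering a visualization: here just a text summary.
        return f"{method.__name__}({dataset}) = {future.result():.2f}"

    def shutdown(self):
        self.pool.shutdown(wait=True)


repo = DataRepository({"sales": [1, 2, 3, 4], "temps": [10.0, 12.0, 14.0]})
server = ApplicationServer(repo, n_workers=2)
requests = [("sales", statistics.mean), ("temps", statistics.median)]
pending = [(d, m, server.submit(d, m)) for d, m in requests]
for dataset, method, future in pending:
    print(server.render(dataset, method, future))
server.shutdown()
```

This prints `mean(sales) = 2.50` and `median(temps) = 12.00`. In a deployed system, the clients, servers and workers would be separate processes or machines communicating over a network. Python threads suit I/O-bound work, while CPU-heavy analyses would rather use a process pool.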
The web application BigML allows users to upload their data sets, build and visualize decision tree models, and use them on new data. Each of the application's tabs (top row) is associated with one of the available tasks carried out within a simple interface also suitable for users with little or no data mining experience.
Taverna workbench can be used to design complex workflows (on the right) from the list of available services (on the left).
Interface for meta-learning in the JAM architecture, showing the progress of the analysis execution (on the right).
A workflow editor from the Orange data mining toolbox, with an example workflow for cross‐validation of supervised data mining algorithms and analysis of results. Computational units are represented with icons and communicate through typed communication channels (gray lines).
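Typed channels let the editor reject a link before any computation runs. The sketch below illustrates the idea with a check that a produced signal type is a subclass of the type the receiving channel accepts. It is not Orange's actual widget API: the classes `Widget`, `Link` and `Workflow`, the signal types, and the widget names in the cross-validation example are all hypothetical.

```python
from dataclasses import dataclass, field


# Hypothetical signal types carried by the channels.
class Data: ...
class Learner: ...
class TreeLearner(Learner): ...
class Results: ...


@dataclass
class Widget:
    name: str
    inputs: dict = field(default_factory=dict)   # input channel name -> accepted type
    outputs: dict = field(default_factory=dict)  # output channel name -> produced type


@dataclass
class Link:
    source: Widget
    output: str
    sink: Widget
    sink_input: str


class Workflow:
    def __init__(self):
        self.links = []

    def connect(self, source, output, sink, sink_input):
        """Add a link only if the produced type is accepted by the receiving channel."""
        produced = source.outputs[output]
        accepted = sink.inputs[sink_input]
        if not issubclass(produced, accepted):  # subtypes may flow into supertype channels
            raise TypeError(f"{source.name}.{output} ({produced.__name__}) cannot feed "
                            f"{sink.name}.{sink_input} ({accepted.__name__})")
        link = Link(source, output, sink, sink_input)
        self.links.append(link)
        return link


file_widget = Widget("File", outputs={"Data": Data})
tree = Widget("Classification Tree", outputs={"Learner": TreeLearner})
test_learners = Widget("Test Learners", inputs={"Data": Data, "Learner": Learner},
                       outputs={"Results": Results})
matrix = Widget("Confusion Matrix", inputs={"Results": Results})

wf = Workflow()
wf.connect(file_widget, "Data", test_learners, "Data")
wf.connect(tree, "Learner", test_learners, "Learner")  # TreeLearner is a Learner
wf.connect(test_learners, "Results", matrix, "Results")
try:
    wf.connect(file_widget, "Data", matrix, "Results")
except TypeError as err:
    print(err)  # File.Data (Data) cannot feed Confusion Matrix.Results (Results)
```

Allowing subclasses through a supertype channel is what lets any specific learner plug into a generic evaluation component.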

