How to cite this WIREs title:
WIREs Comp Stat

DataDesk: an interactive package for data exploration, display, model building, and data analysis

Abstract

DataDesk is an interactive package for exploration, display, and analysis of data that runs on a variety of desktop computers. Because it has been designed from the ground up for the purpose of data exploration, it incorporates several global design principles not found together in other data analysis software. This article discusses those design principles and illustrates how they lead to innovative approaches to regression model building.

WIREs Comput Stat 2012. doi: 10.1002/wics.1208

This article is categorized under:
Software for Computational Statistics > Software/Statistical Software
Statistical and Graphical Methods of Data Analysis > Statistical Graphics and Visualization
Statistical and Graphical Methods of Data Analysis > Analysis of High Dimensional Data

GDP/Capita is not distributed symmetrically and is not linearly related to the response variable; both conditions are likely to be improved by re-expression.

A rotating plot of the five predictors can be rotated into an orientation that separates the African countries of concern from the others. DataDesk can display the equations of the rotated axes. Here the x‐axis equation is a good discriminant function for the two groups of countries.

Displaying the names of the unusual cases reveals them to be countries in southern Africa.

The multiple regression model with four predictors and the indicator for Zimbabwe, along with the updated scatterplot of externally studentized residuals vs predicted values. A cluster of countries with low residual and low predicted value has been colored red and plotted with “x” symbols.

The scatterplots of Figure 5 after Zimbabwe has been removed from the model by introducing an indicator variable. Each of the plots was updated in place in response to the change in residuals caused by adding the indicator to the regression model.
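
The effect of an indicator variable described in this caption can be illustrated with a minimal sketch. It is not DataDesk's implementation; the data are made up, there is a single predictor rather than the article's model, and the helper names `solve` and `ols` are hypothetical. It shows that adding an indicator for one case gives the same intercept and slope as refitting without that case, and that the flagged case's residual becomes zero.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Made-up data: the last case sits far below the trend of the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 3.0]
flag = len(x) - 1

# Model A: intercept, slope, and an indicator that is 1 only for the flagged case.
with_indicator = ols(
    [[1.0, xi, 1.0 if i == flag else 0.0] for i, xi in enumerate(x)], y
)

# Model B: intercept and slope fitted with the flagged case left out.
without_case = ols(
    [[1.0, xi] for i, xi in enumerate(x) if i != flag],
    [yi for i, yi in enumerate(y) if i != flag],
)

b0, b1, d = with_indicator
flag_residual = y[flag] - (b0 + b1 * x[flag] + d)
print(with_indicator[:2], without_case, flag_residual)
```

Because the indicator absorbs the flagged case exactly, the remaining coefficients are determined by the other cases alone, which is why the residual plots change when the indicator is added.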

Scatterplots of three potential predictors that have similar correlations with the residuals reveal one country with a particularly low residual.

An initial regression model and associated plot of externally studentized residuals vs predicted values.
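
For readers unfamiliar with externally studentized residuals, the following is a minimal sketch of the standard formula for a single-predictor regression. The data are made up and the function name `externally_studentized` is hypothetical; it is not taken from DataDesk. Each residual is scaled by an error estimate computed with that case deleted, so an unusual case does not inflate its own yardstick.

```python
import math


def externally_studentized(x, y):
    """Externally studentized residuals for the simple regression y = a + b*x."""
    n, p = len(x), 2  # p counts the fitted coefficients (intercept and slope)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    out = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage of this case
        # Error variance estimated with this case deleted.
        s2_deleted = (sse - e * e / (1.0 - h)) / (n - p - 1)
        out.append(e / math.sqrt(s2_deleted * (1.0 - h)))
    return out


# Made-up data: the case at index 4 falls well below the line of the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.9, 4.1, 6.0, 8.2, 4.0, 12.1, 13.9, 16.0]
t = externally_studentized(x, y)
most_extreme = max(range(len(t)), key=lambda i: abs(t[i]))
print(most_extreme, t[most_extreme])
```

In this example the low case stands out with a large negative value, the same kind of pattern a plot of these residuals against predicted values is meant to expose.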

A table of correlations has hypertext menus accessed by clicking on any of the correlations. Both appropriate scatterplots are offered.

Re-expressing GDP/Capita by logarithms both improves the univariate distribution and makes the relationship with the response variable more nearly linear.
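
The distributional half of this claim can be checked numerically with a minimal sketch. The values below are made up to resemble a right-skewed income-like variable; they are not the article's data, and the helper name `skewness` is hypothetical. Skewness closer to zero indicates a more symmetric distribution.

```python
import math


def skewness(values):
    """Moment-based sample skewness: third central moment over variance^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up, strongly right-skewed values standing in for GDP/Capita.
raw = [300.0, 450.0, 700.0, 1200.0, 2500.0, 6000.0, 15000.0, 40000.0]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(skew_raw, skew_log)
```

On these illustrative values the logarithm pulls in the long upper tail, so the skewness of the re-expressed variable is much nearer zero than that of the raw variable.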
