How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.541

Applications of tensor (multiway array) factorizations and decompositions in data mining


Abstract

Tensor (multiway array) factorization and decomposition has become an important tool for data mining. Fueled by the computational power of modern computers, researchers can now analyze large-scale tensorial structured data that only a few years ago would have been impossible to handle. Tensor factorizations have several advantages over two-way matrix factorizations, including uniqueness of the optimal solution and component identification even when most of the data is missing. Furthermore, multiway decomposition techniques explicitly exploit the multiway structure that is lost when some of the modes of the tensor are collapsed in order to analyze the data by regular matrix factorization approaches. Multiway decomposition is being applied to new fields every year, and there is no doubt that the future will bring many exciting new applications. The aim of this overview is to introduce the basic concepts of tensor decompositions and demonstrate some of the many benefits and challenges of modeling data multiway for a wide variety of data and problem domains. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1, 24-40. DOI: 10.1002/widm.1

This article is categorized under:
Algorithmic Development > Scalable Statistical Methods
Technologies > Classification
Technologies > Machine Learning
Technologies > Structure Discovery and Clustering

Illustration of the Tucker model of a third-order tensor. The model decomposes the tensor into loading matrices with a mode-specific number of components, as well as a core array accounting for all multilinear interactions between the components of each mode. The Tucker model is particularly useful for compressing tensors into a reduced representation given by the smaller core array.
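To make the caption concrete, the following sketch (not part of the article) fits a Tucker model with the truncated higher-order SVD, one simple fitting strategy among several; the function names and the example tensor are illustrative assumptions:

```python
import numpy as np

def unfold(X, mode):
    # Mode-n matricization: bring `mode` to the front, flatten the rest.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd(X, ranks):
    """Truncated higher-order SVD, one way to fit a Tucker model.
    Returns one loading matrix per mode and the core array."""
    Us = [np.linalg.svd(unfold(X, m), full_matrices=False)[0][:, :r]
          for m, r in enumerate(ranks)]
    G = X
    for mode, U in enumerate(Us):
        # Project the core onto the loading matrix along `mode`.
        G = np.moveaxis(np.tensordot(U.T, np.moveaxis(G, mode, 0), axes=1), 0, mode)
    return Us, G

def tucker_to_tensor(G, Us):
    """Reconstruct the tensor from the core and the loading matrices."""
    X = G
    for mode, U in enumerate(Us):
        X = np.moveaxis(np.tensordot(U, np.moveaxis(X, mode, 0), axes=1), 0, mode)
    return X

X = np.random.default_rng(0).standard_normal((4, 5, 3))
Us, G = hosvd(X, ranks=(4, 5, 3))  # full ranks: lossless reconstruction
assert np.allclose(tucker_to_tensor(G, Us), X)
```

Choosing ranks smaller than the tensor dimensions yields the compressed representation the caption describes: the small core array plus thin loading matrices.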


The matricizing operation on a third‐order tensor of size 4 × 4 × 4.
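The matricizing (unfolding) operation in the figure can be sketched in a few lines of NumPy; note that different conventions order the flattened modes differently, and this sketch uses one common choice (the tensor and function name are illustrative):

```python
import numpy as np

# Hypothetical 4 x 4 x 4 tensor, matching the dimensions in the figure.
X = np.arange(64).reshape(4, 4, 4)

def matricize(X, mode):
    """Mode-n matricization: move `mode` to the front and flatten the
    remaining modes into the columns of a matrix."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

X0 = matricize(X, 0)  # a 4 x 16 matrix; each row is one mode-0 slab
```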


Illustration of the Weizmann face database used in the analysis of TensorFaces.14


Illustration of the three‐way microarray data set used in the study of Ref 19.


Left panel: tutorial dataset two of ERPWAVELAB.50 Right panel: a three-component nonnegativity-constrained three-way cp decomposition of Channel × Time-Frequency × Subject-Condition and a three-component nonnegative matrix factorization of Channel × Time-Frequency-Subject-Condition. The two models account for 60% and 76% of the variation in the data, respectively. The matrix factorization assumes spatial consistency but allows individual time-frequency patterns of activation across the subjects and conditions, whereas the three-way cp analysis imposes consistency in the time-frequency patterns across the subjects and conditions. As such, the most consistent patterns of activation are identified by the cp model.


Example of cp analysis and shiftcp analysis of electroencephalography (eeg) data described in Ref 6. Because of a violation of trilinearity, the cp model extracts a degenerate solution, given by the first four highly correlated components that account for the dynamics of the data through a strong degree of between-component cancelation. However, when latency changes across the trials are accounted for in four out of the five components, degeneracy no longer occurs, and the most consistent spatial and temporal patterns of activation across the trials are successfully extracted (the amplitude and phase plots account for the trial-specific strength and delay of the various components). The corresponding two-way analysis, here given by ica (based on the fastICA algorithm, http://www.cis.hut.fi/projects/ica/fastica/, using the nonlinear function tanh(·), in order to resolve the rotational ambiguity of two-way decomposition), no longer assumes consistency across the trials. As a result, the matrix decomposition of channel × time-epoch is mainly driven by noisy artifacts, whereas the analysis of the trial-averaged data, also denoted the evoked potential (ep), at the bottom right mainly focuses on accounting for the dynamics of the P100-N200-P300 complex of the ep. Thus, multilinear modeling enables direct extraction of the most consistent, reproducible patterns across the trials.


Example of svd (bottom left) and cp analysis (bottom middle) of the claus.mat fluorescence data shown at the top, provided by the N-way toolbox.45 Both the three-component svd and the three-component cp model account for more than 99.9% of the variation in the data. However, the cp decomposition admits a unique account of the data, resulting in the identification of the true underlying chemical compounds and their relative concentrations (bottom right).


Example of a Tucker(2, 3, 2) analysis of the chopin data described in Ref 49. The overall mean of the data has been subtracted prior to analysis. Black and white boxes indicate negative and positive variables, respectively, and the size of each box indicates its absolute value. The model accounts for 40.42% of the variation in the data, whereas the same model applied to randomly permuted data accounts for 2.41 ± 0.09% of the variation. As such, the data are highly structured and compressible by the Tucker model.


Illustration of the candecomp/parafac (cp) model of a third-order tensor. The model decomposes the tensor into a sum of rank-one components and is very appealing due to its uniqueness properties.
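As a small illustration (not from the article), the cp model's "sum of rank-one components" can be written directly in NumPy; the function name and the toy factor matrices are assumptions for the example:

```python
import numpy as np

def cp_to_tensor(A, B, C):
    """Reconstruct a third-order tensor from cp loading matrices
    A (I x R), B (J x R), C (K x R): the sum of R rank-one terms
    a_r outer b_r outer c_r."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

# A rank-2 toy example: two components, each an outer product.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[2.0, 1.0], [0.0, 3.0]])
C = np.array([[1.0, 1.0], [1.0, 2.0]])
X = cp_to_tensor(A, B, C)  # a 2 x 2 x 2 tensor of rank at most 2
```

The einsum expression is exactly the model equation x_ijk = sum_r a_ir b_jr c_kr, which is what makes the decomposition a sum of rank-one terms.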


