How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.111

Frequent item set mining

Full article on Wiley Online Library.

Frequent item set mining is one of the best known and most popular data mining methods. Originally developed for market basket analysis, it is used nowadays for almost any task that requires discovering regularities between (nominal) variables. This paper provides an overview of the foundations of frequent item set mining, starting from a definition of the basic notions and the core task. It continues by discussing how the search space is structured to avoid redundant search, how it is pruned with the a priori property, and how the output is reduced by confining it to closed or maximal item sets or generators. In addition, it reviews some of the most important algorithmic techniques and data structures that were developed to make the search for frequent item sets as efficient as possible. © 2012 Wiley Periodicals, Inc.

Figure 1.

(a) A simple example database with 10 transactions (market baskets, shopping carts) over the item base B = {a, b, c, d, e} and (b) the frequent item sets that can be found in it if the minimum support is chosen to be smin = 3 (the numbers state the support of these item sets).
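The notions in this caption (item base, transactions, support, minimum support) can be made concrete with a brute-force sketch. The 10-transaction database below is an illustrative assumption of this sketch, not necessarily the exact database of the figure:

```python
from itertools import combinations

# Illustrative database: 10 transactions over B = {a, b, c, d, e}
# (assumed for this sketch; not necessarily the database of the figure).
transactions = [
    {"a","d","e"}, {"b","c","d"}, {"a","c","e"}, {"a","c","d","e"},
    {"a","e"},     {"a","c","d"}, {"b","c"},     {"a","c","d","e"},
    {"b","c","e"}, {"a","d","e"},
]

def support(itemset, transactions):
    """Support = number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, smin):
    """Brute-force enumeration over all subsets of the item base
    (exponential in |B|; only viable for tiny item bases)."""
    items = sorted(set().union(*transactions))
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(set(combo), transactions)
            if s >= smin:
                freq[frozenset(combo)] = s
    return freq

freq = frequent_itemsets(transactions, smin=3)
```

The algorithms sketched with the later figures all compute this same result; their point is to avoid the exponential subset enumeration.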

Figure 2.

(a) Hasse diagram for the partial order induced by ⊆ on 2{a, b, c, d, e} and (b) frequent item sets for the database shown in Figure 1 and smin = 3.

Figure 3.

Schematic illustration of (a) maximal and (b) closed item sets, demonstrating their relation.
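The relation can be stated precisely: a frequent item set is closed if no proper superset has the same support, and maximal if no proper superset is frequent at all; every maximal set is therefore closed. A minimal sketch, using an illustrative database assumed for this example (not taken from the figure):

```python
from itertools import combinations

# Illustrative database (assumed for this sketch, not taken from the figure).
transactions = [
    {"a","d","e"}, {"b","c","d"}, {"a","c","e"}, {"a","c","d","e"},
    {"a","e"},     {"a","c","d"}, {"b","c"},     {"a","c","d","e"},
    {"b","c","e"}, {"a","d","e"},
]
smin = 3

# all frequent item sets with their support (brute force; tiny item base)
items = sorted(set().union(*transactions))
freq = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = frozenset(combo)
        sup = sum(1 for t in transactions if s <= t)
        if sup >= smin:
            freq[s] = sup

# closed: no proper superset with equal support
closed = {s for s, sup in freq.items()
          if not any(s < t and freq[t] == sup for t in freq)}
# maximal: no frequent proper superset at all
maximal = {s for s in freq if not any(s < t for t in freq)}
```

Closed item sets preserve the full support information of all frequent item sets; maximal item sets only preserve the boundary between frequent and infrequent.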

Figure 4.

The number of frequent, closed, and maximal item sets on a common benchmark data set (BMS‐Webview‐1).

Figure 5.

(a) A subset tree that results from assigning a unique parent to each item set and (b) a prefix tree in which sibling nodes with the same prefix are merged.
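The unique-parent idea can be sketched as a depth-first traversal (a hypothetical minimal implementation): extend each item set only by items that come after its last item in a fixed global order, so every subset is reached on exactly one path.

```python
def enumerate_subsets(items):
    """Traverse the subset tree: each set is extended only by items that
    follow its current maximum in the fixed order, giving every subset a
    unique parent (the set without its last item) and hence a tree."""
    def rec(prefix, start):
        yield prefix
        for k in range(start, len(items)):
            yield from rec(prefix + (items[k],), k + 1)
    return list(rec((), 0))

subsets = enumerate_subsets(["a", "b", "c", "d", "e"])
```

All 2^5 = 32 subsets appear exactly once. A prefix tree additionally merges sibling nodes that share a prefix, which is what the compressed representations of the later figures build on.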

Figure 6.

Two steps of the Apriori algorithm: (a) adding the second level; (b) adding the third level (blue: a priori pruning, red: a posteriori pruning for smin = 3).
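The two pruning steps can be sketched compactly (the database below is an assumption of this sketch, not the figure's): candidates for level k+1 are joined from frequent k-sets; a priori pruning discards a candidate if any of its k-subsets is infrequent, and a posteriori pruning discards it if its counted support falls below smin.

```python
from itertools import combinations

# Illustrative database (assumed, not taken from the figure).
transactions = [
    {"a","d","e"}, {"b","c","d"}, {"a","c","e"}, {"a","c","d","e"},
    {"a","e"},     {"a","c","d"}, {"b","c"},     {"a","c","d","e"},
    {"b","c","e"}, {"a","d","e"},
]

def apriori(transactions, smin):
    # level 1: count single items, keep the frequent ones
    counts = {}
    for t in transactions:
        for i in t:
            s = frozenset([i])
            counts[s] = counts.get(s, 0) + 1
    level = {s: c for s, c in counts.items() if c >= smin}
    frequent = dict(level)
    k = 1
    while level:
        prev = list(level)
        candidates = set()
        for x in range(len(prev)):
            for y in range(x + 1, len(prev)):
                cand = prev[x] | prev[y]
                # join only sets differing in one item; a priori pruning:
                # every k-subset of a candidate must itself be frequent
                if (len(cand) == k + 1 and
                        all(frozenset(sub) in level
                            for sub in combinations(cand, k))):
                    candidates.add(cand)
        # a posteriori pruning: count support, drop infrequent candidates
        level = {}
        for cand in candidates:
            sup = sum(1 for t in transactions if cand <= t)
            if sup >= smin:
                level[cand] = sup
        frequent.update(level)
        k += 1
    return frequent

freq = apriori(transactions, smin=3)
```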

Figure 7.

Illustration of the divide‐and‐conquer approach to find frequent item sets: (a) first split, (b) second split; blue: split item/prefix, green: first subproblem (include split item), red: second subproblem (exclude split item).
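The scheme can be written as a short recursion (item order and database are assumptions of this sketch): pick a split item i, solve the subproblem of item sets containing i on the transactions that contain i, then the subproblem of item sets not containing i with i removed from consideration.

```python
# Illustrative database (assumed, not taken from the figure).
transactions = [
    {"a","d","e"}, {"b","c","d"}, {"a","c","e"}, {"a","c","d","e"},
    {"a","e"},     {"a","c","d"}, {"b","c"},     {"a","c","d","e"},
    {"b","c","e"}, {"a","d","e"},
]

def divide_and_conquer(transactions, items, smin, prefix=(), out=None):
    """Generic divide-and-conquer search; Eclat, SaM, FP-growth etc.
    differ mainly in how the conditional database is represented."""
    if out is None:
        out = {}
    if not items:
        return out
    i, rest = items[0], items[1:]
    # first subproblem: item sets that contain the split item i
    conditional = [t for t in transactions if i in t]
    if len(conditional) >= smin:          # = support of prefix + (i,)
        out[frozenset(prefix + (i,))] = len(conditional)
        divide_and_conquer(conditional, rest, smin, prefix + (i,), out)
    # second subproblem: item sets that do not contain i
    divide_and_conquer(transactions, rest, smin, prefix, out)
    return out

items = sorted(set().union(*transactions))
freq = divide_and_conquer(transactions, items, smin=3)
```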

Figure 8.

(Top levels of) the subproblem tree of the divide‐and‐conquer scheme (with a globally fixed item order).

Figure 9.

Preprocessing a transaction database: (a) original form; (b) item frequencies; (c) transactions with sorted items and infrequent items eliminated (smin = 3); (d) lexicographically sorted reduced transactions; (e) data structure used in the SaM (Split and Merge) algorithm.
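Steps (b)–(d) of this caption can be sketched as follows; the database and the ascending-frequency item order are assumptions of this sketch, and the real SaM implementation may differ in detail:

```python
def preprocess(transactions, smin):
    """(b) count item frequencies, (c) drop infrequent items and sort the
    items of each transaction (here: ascending frequency, ties broken
    alphabetically), (d) sort the reduced transactions lexicographically."""
    freq = {}
    for t in transactions:
        for i in t:
            freq[i] = freq.get(i, 0) + 1
    key = lambda i: (freq[i], i)
    reduced = (tuple(sorted((i for i in t if freq[i] >= smin), key=key))
               for t in transactions)
    return sorted(r for r in reduced if r)

# Illustrative database (assumed, not taken from the figure).
transactions = [
    {"a","d","e"}, {"b","c","d"}, {"a","c","e"}, {"a","c","d","e"},
    {"a","e"},     {"a","c","d"}, {"b","c"},     {"a","c","d","e"},
    {"b","c","e"}, {"a","d","e"},
]
db = preprocess(transactions, smin=3)
```

After this step equal transactions are adjacent, so the SaM structure of panel (e) only needs to attach a multiplicity counter to each distinct transaction.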

Figure 10.

The basic operations of the SaM (Split and Merge) algorithm: split (left; first subproblem, include split item) and merge (right; second subproblem, exclude split item).
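A simplified sketch of the two operations: the database is represented as a lexicographically sorted array of (weight, transaction) pairs; the example pairs below are assumptions of this sketch, and the real SaM implementation differs in detail (item order, in-place array handling).

```python
def split(db, item):
    """Split: transactions whose first item is `item` form the conditional
    database (split item removed); the rest stays.  Returns the conditional
    database, the remainder, and the support of the split item."""
    cond, rest, sup = [], [], 0
    for w, t in db:
        if t and t[0] == item:
            sup += w
            if t[1:]:
                cond.append((w, t[1:]))
        else:
            rest.append((w, t))
    return cond, rest, sup

def merge(a, b):
    """Merge two lexicographically sorted databases (as in merge sort),
    summing the weights of equal transactions."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][1] == b[j][1]:
            out.append((a[i][0] + b[j][0], a[i][1]))
            i += 1; j += 1
        elif a[i][1] < b[j][1]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

# tiny assumed example: (weight, transaction) pairs, sorted lexicographically
db = [(2, ("a", "c", "d")), (1, ("a", "e")), (3, ("b", "c"))]
cond, rest, sup = split(db, "a")    # first subproblem: include "a"
excl = merge(rest, cond)            # second subproblem: exclude "a"
```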

Figure 11.

Split into subproblems in the Eclat algorithm: (a) transaction database; (b) vertical representation with split item in blue; (c) conditional transaction database after intersection with split item list (first subproblem: include split item); (d) transaction index list of split item removed (second subproblem: exclude split item).
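The tid-list manipulation can be sketched as follows (the database is assumed for illustration): each item carries the set of indices of the transactions containing it, support is simply the size of that set, and the conditional database for the first subproblem is obtained by intersecting the remaining tid sets with the split item's tid set.

```python
# Illustrative database (assumed, not taken from the figure).
transactions = [
    {"a","d","e"}, {"b","c","d"}, {"a","c","e"}, {"a","c","d","e"},
    {"a","e"},     {"a","c","d"}, {"b","c"},     {"a","c","d","e"},
    {"b","c","e"}, {"a","d","e"},
]

# vertical representation: item -> set of transaction indices (tid set)
tidsets = {}
for tid, t in enumerate(transactions):
    for i in t:
        tidsets.setdefault(i, set()).add(tid)

def eclat(prefix, items, smin, out):
    """`items` is a list of (item, tidset) pairs whose tid sets are
    already restricted to the transactions containing `prefix`."""
    for k, (i, tids) in enumerate(items):
        if len(tids) >= smin:
            found = prefix | {i}
            out[frozenset(found)] = len(tids)
            # conditional database: intersect the remaining tid sets
            rest = [(j, tj & tids) for j, tj in items[k + 1:]]
            eclat(found, rest, smin, out)
    return out

freq = eclat(set(), sorted(tidsets.items()), smin=3, out={})
```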

Figure 12.

Occurrence deliver scheme used by LCM (Linear time Closed item set Miner) to find the conditional transaction database for the first subproblem (needs a horizontal representation in parallel).
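A sketch of the idea (the transaction database is assumed): instead of intersecting tid lists, traverse the transactions referenced by the current prefix's tid list once and "deliver" each tid to the bucket of every item it contains; the buckets are exactly the conditional tid lists of all items at once.

```python
# Illustrative database (assumed, not taken from the figure).
transactions = [
    {"a","d","e"}, {"b","c","d"}, {"a","c","e"}, {"a","c","d","e"},
    {"a","e"},     {"a","c","d"}, {"b","c"},     {"a","c","d","e"},
    {"b","c","e"}, {"a","d","e"},
]

def occurrence_deliver(transactions, tids):
    """One pass over the transactions indexed by `tids` (this is why a
    horizontal representation is needed in parallel): append each tid to
    the bucket of every item it contains.  The split item itself could
    simply be skipped in a full implementation."""
    buckets = {}
    for tid in tids:
        for item in transactions[tid]:
            buckets.setdefault(item, []).append(tid)
    return buckets

# conditional tid lists for the prefix {e} ("e" occurs in these transactions)
cond = occurrence_deliver(transactions, [0, 2, 3, 4, 7, 8, 9])
```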

Figure 13.

Building an FP‐tree (Frequent Pattern Tree): (a) original transaction database; (b) lexicographically sorted transactions (infrequent items eliminated, other items sorted according to descending frequency); (c) resulting FP‐tree structure.
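A minimal FP-tree construction sketch (the database and the alphabetical tie-breaking order are assumptions): items are ordered by descending frequency, each transaction is inserted as a path from the root, and a header table links all nodes of each item.

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fptree(transactions, smin):
    # (b) item frequencies; keep frequent items, descending frequency
    # (ties broken alphabetically -- an assumption of this sketch)
    freq = {}
    for t in transactions:
        for i in t:
            freq[i] = freq.get(i, 0) + 1
    rank = {i: r for r, i in enumerate(
        sorted((i for i in freq if freq[i] >= smin),
               key=lambda i: (-freq[i], i)))}
    # (c) insert each transaction as a path; header table links per item
    root, header = Node(None, None), {}
    for t in transactions:
        node = root
        for i in sorted((i for i in t if i in rank), key=rank.get):
            if i not in node.children:
                child = Node(i, node)
                node.children[i] = child
                header.setdefault(i, []).append(child)
            node = node.children[i]
            node.count += 1
    return root, header

# Illustrative database (assumed, not taken from the figure).
transactions = [
    {"a","d","e"}, {"b","c","d"}, {"a","c","e"}, {"a","c","d","e"},
    {"a","e"},     {"a","c","d"}, {"b","c"},     {"a","c","d","e"},
    {"b","c","e"}, {"a","d","e"},
]
root, header = build_fptree(transactions, smin=3)
```

Common prefixes share path nodes, so the tree is usually much smaller than the transaction list; the header lists are what FP-growth follows to project the tree (next figure).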

Figure 14.

Subproblem split in the FP‐growth (Frequent Pattern Growth) algorithm: (a) projecting an FP‐tree (Frequent Pattern Tree) (to item e); (b) detached projection (FP‐tree of conditional transaction database); (c) remaining FP‐tree after split item level is removed; (d) conditional transaction database in horizontal representation.

Figure 15.

Eclat with transaction ranges: (a) lexicographically sorted transactions; (b) transaction range lists.
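The key observation can be sketched quickly (with a hypothetical helper): once the transactions are sorted lexicographically, the transactions sharing a prefix occupy contiguous index ranges, so a tid list can be compressed into a few [start, end] ranges.

```python
def to_ranges(sorted_tids):
    """Collapse a sorted list of transaction indices into maximal
    contiguous [start, end] ranges."""
    ranges = []
    for t in sorted_tids:
        if ranges and ranges[-1][1] == t - 1:
            ranges[-1][1] = t          # extend the current range
        else:
            ranges.append([t, t])      # start a new range
    return ranges
```

Intersections of such range lists can then proceed range by range instead of tid by tid, which is the payoff of the lexicographic sorting in panel (a).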

Figure 16.

Illustration of a 4‐items machine: (a) table of highest set bits; (b) empty 4‐items machine (no transactions); (c) after inserting the transactions of Figure 1 (without e); (d) after propagating the transaction lists left‐/downward.
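A sketch loosely inspired by this idea (the cell layout and propagation of the actual k-items machine differ): encode each transaction over k = 4 items as a bit mask, count the exact masks, then propagate counts so that each cell ends up holding the support of the item set it encodes (a superset-sum over the 2^k cells).

```python
def items_machine(masks, k=4):
    """supp[m] ends up as the number of transactions t with m a subset of
    t -- a superset-sum (zeta transform) over the 2**k bit masks."""
    supp = [0] * (1 << k)
    for m in masks:
        supp[m] += 1                   # count exact transaction masks
    for bit in range(k):
        step = 1 << bit
        for m in range(1 << k):
            if not m & step:           # propagate towards smaller masks
                supp[m] += supp[m | step]
    return supp

# items a=bit 0, b=bit 1, c=bit 2, d=bit 3; assumed example transactions
masks = [0b1001, 0b1110, 0b0101, 0b1101, 0b0001,
         0b1101, 0b0110, 0b1101, 0b0110, 0b1001]
supp = items_machine(masks)
```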


Browse by Topic

Algorithmic Development > Association Rules
Application Areas > Data Mining Software Tools
Technologies > Association Rules
