A Unifying Framework for Mining Approximate Top-$k$ Binary Patterns

Author

Lucchese, Claudio ; Orlando, Salvatore ; Perego, Raffaele

Author_Institution

ISTI, Pisa, Italy

Volume

26

Issue

12

fYear

2014

fDate

Dec. 2014

Firstpage

2900

Lastpage

2913

Abstract

A major mining task for binary matrixes is the extraction of approximate top-k patterns that are able to concisely describe the input data. The top-k pattern discovery problem is commonly stated as an optimization problem, where the goal is to minimize a given cost function, e.g., the accuracy of the data description. In this work, we review several greedy algorithms and discuss PANDA⁺, an algorithmic framework able to optimize different cost functions generalized into a unifying formulation. We evaluated the algorithm by measuring the quality of the extracted patterns. We adapted standard quality measures to assess the capability of the algorithm to discover both the items and the transactions of the patterns embedded in the data. The evaluation was conducted on synthetic data, where patterns were artificially embedded, and on a real-world text collection, where each document is labeled with a topic. Finally, in order to qualitatively evaluate the usefulness of the discovered patterns, we exploited PANDA⁺ to detect overlapping communities in a bipartite network. The results show that PANDA⁺ is able to discover high-quality patterns in both synthetic and real-world datasets.

Keywords

data mining; greedy algorithms; minimisation; text analysis; PANDA⁺; approximate top-k binary pattern mining; approximate top-k pattern extraction; binary matrixes; bipartite network; cost function minimization; data description; real-world text collection; synthetic data; synthetic datasets; top-k pattern discovery problem; unifying formulation; Approximation algorithms; Cost function; Encoding; Matrix decomposition; Noise measurement; 0-1 data; Clustering; MDL; Mining methods and algorithms; association rules; approximate top-k patterns; classification; communities in bipartite networks

fLanguage

English

Journal_Title

Knowledge and Data Engineering, IEEE Transactions on

Publisher

IEEE

ISSN

1041-4347

Type

jour

DOI

10.1109/TKDE.2013.181

Filename

6682889
