Title :
A Hypergraph-based Method for Discovering Semantically Associated Itemsets
Author :
Liu, Haishan ; Le Pendu, P. ; Jin, Ruoming ; Dou, Dejing
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Oregon, Eugene, OR, USA
Abstract :
In this paper, we address an interesting data mining problem: finding semantically associated itemsets, i.e., items connected via indirect links. We propose a novel method for discovering semantically associated itemsets based on a hypergraph representation of the database. We describe two similarity measures for computing the strength of association between items. Specifically, we introduce the average commute time similarity, s_CT, based on the random walk model on the hypergraph, and the inner-product similarity, s_L+, based on the Moore-Penrose pseudoinverse of the hypergraph Laplacian matrix. Given the semantically associated 2-itemsets generated by these measures, we design a hypergraph expansion method with two search strategies, namely clique search and connected component search, to generate k-itemsets (k > 2). Experiments on three datasets ranging from low to high dimensionality show that the proposed method is indeed capable of capturing semantically associated itemsets. The semantically associated itemsets discovered in our experiments show promise for providing valuable insights into the interrelationships between medical concepts and other domain-specific concepts.
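The two measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hypergraph_similarities`, the unit hyperedge weights, and the unnormalized Laplacian L = D_v - H W D_e^{-1} H^T are assumptions made here for illustration, and the paper may define the Laplacian and the final similarity scores differently. Items are treated as vertices and transactions as hyperedges; the sketch returns the pseudoinverse L+ (whose entries act as inner products) and a matrix of commute times vol * (l+_ii + l+_jj - 2 l+_ij).

```python
import numpy as np

def hypergraph_similarities(transactions, n_items):
    """Illustrative sketch: pseudoinverse of a hypergraph Laplacian and commute times.

    Assumes every transaction is non-empty and that the hypergraph is connected;
    the commute-time formula is not meaningful across disconnected components.
    """
    # Incidence matrix H: rows are items (vertices), columns are transactions (hyperedges).
    H = np.zeros((n_items, len(transactions)))
    for e, items in enumerate(transactions):
        H[list(items), e] = 1.0

    w = np.ones(len(transactions))  # assumption: unit hyperedge weights
    d_v = H @ w                     # vertex degrees
    d_e = H.sum(axis=0)             # hyperedge sizes

    # Assumed unnormalized Laplacian: L = D_v - H W D_e^{-1} H^T.
    A = (H * (w / d_e)) @ H.T
    L = np.diag(d_v) - A

    # Moore-Penrose pseudoinverse; its entries serve as inner-product similarities.
    L_plus = np.linalg.pinv(L)

    # Commute time between i and j: vol * (l+_ii + l+_jj - 2 * l+_ij).
    diag = np.diag(L_plus)
    commute = d_v.sum() * (diag[:, None] + diag[None, :] - 2.0 * L_plus)
    return L_plus, commute
```

For example, with transactions {0, 1}, {1, 2}, {2, 3}, items 0 and 3 never co-occur but are linked indirectly through items 1 and 2; the commute time between 0 and 3 is finite and larger than that between 0 and 1, which is the kind of indirect association the abstract describes.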
Keywords :
data mining; database management systems; graph theory; search problems; Moore-Penrose pseudoinverse; average commute time similarity; clique search; connected component search; data mining problem; domain specific concepts; hypergraph Laplacian matrix; hypergraph based method; hypergraph database representation; hypergraph expansion method; inner product similarity; medical concepts; search strategies; semantically associated itemset discovery; Blood; Equations; Itemsets; Joining processes; Laplace equations; Mathematical model; Semantics; Semantically associated itemset; hypergraph; random walk;
Conference_Title :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4577-2075-8
DOI :
10.1109/ICDM.2011.12