DocumentCode :
3125580
Title :
Efficient Mining of a Concise and Lossless Representation of High Utility Itemsets
Author :
Wu, Cheng Wei ; Fournier-Viger, Philippe ; Yu, Philip S. ; Tseng, Vincent S.
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
fYear :
2011
fDate :
11-14 Dec. 2011
Firstpage :
824
Lastpage :
833
Abstract :
Mining high utility itemsets from transactional databases is an important data mining task, which refers to the discovery of itemsets with high utilities (e.g., high profits). Although several studies have been carried out, current methods may present too many high utility itemsets to users, which also degrades the performance of the mining task in terms of execution time and memory usage. To achieve high efficiency for the mining task and provide a concise mining result to users, we propose a novel framework for mining closed+ high utility itemsets, which serve as a compact and lossless representation of high utility itemsets. We present an efficient algorithm called CHUD (Closed+ High Utility itemset Discovery) for mining closed+ high utility itemsets. Further, a method called DAHU (Derive All High Utility itemsets) is proposed to recover all high utility itemsets from the set of closed+ high utility itemsets without accessing the original database. Results of experiments on real and synthetic datasets show that CHUD and DAHU are very efficient, with a massive reduction (up to 800 times in our experiments) in the number of high utility itemsets. In addition, when all high utility itemsets are recovered by DAHU, the approach combining CHUD and DAHU also outperforms the state-of-the-art algorithms for mining high utility itemsets.
Keywords :
data mining; database management systems; transaction processing; CHUD; DAHU; closed+ high utility item set discovery; closed+ high utility item set mining; data mining task; derive all high utility item sets; high utility itemsets concise representation mining; high utility itemsets lossless representation mining; transactional databases; Algorithm design and analysis; Arrays; Data mining; Educational institutions; Itemsets; Memory management; closed+ high utility itemset; frequent itemset; lossless and concise representation; utility mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1550-4786
Print_ISBN :
978-1-4577-2075-8
Type :
conf
DOI :
10.1109/ICDM.2011.60
Filename :
6137287