DocumentCode :
3165788
Title :
Maximum Entropy Based Significance of Itemsets
Author :
Tatti, Nikolaj
Author_Institution :
Helsinki University of Technology, Helsinki
fYear :
2007
fDate :
28-31 Oct. 2007
Firstpage :
312
Lastpage :
321
Abstract :
We consider the problem of defining the significance of an itemset. We say that the itemset is significant if we are surprised by its frequency when compared to the frequencies of its sub-itemsets. In other words, we estimate the frequency of the itemset from the frequencies of its sub-itemsets and compute the deviation between the real value and the estimate. For the estimation we use Maximum Entropy and for measuring the deviation we use Kullback-Leibler divergence. A major advantage compared to the previous methods is that we are able to use richer models whereas the previous approaches only measure the deviation from the independence model. We show that our measure of significance goes to zero for derivable itemsets and that we can use the rank as a statistical test. Our empirical results demonstrate that for our real datasets the independence assumption is too strong but applying more flexible models leads to good results.
Keywords :
data mining; maximum entropy methods; Kullback-Leibler divergence; flexible model; independence assumption; independence model; itemsets significance; maximum entropy; Computer science; Data mining; Electronic mail; Entropy; Frequency estimation; Frequency measurement; Itemsets; Predictive models; Proposals; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
Conference_Location :
Omaha, NE
ISSN :
1550-4786
Print_ISBN :
978-0-7695-3018-5
Type :
conf
DOI :
10.1109/ICDM.2007.43
Filename :
4470255