DocumentCode :
3140664
Title :
Mining association algorithm with threshold based on ROC analysis
Author :
Kawahara, Minoru ; Kawano, Hiroyuki
Author_Institution :
Data Process. Center, Kyoto Univ., Japan
fYear :
2001
fDate :
6-6 Jan. 2001
Abstract :
The mining association algorithm is one of the most important data mining algorithms for deriving association rules at high speed from huge databases. However, the algorithm tends to derive rules that contain noise, such as stopwords, so some systems remove this noise with noise filters. We have been improving the algorithm and using it to develop navigation systems for semi-structured data, and we also use a dictionary to remove noise from the derived association rules. To derive effective rules, it is very important to set system parameters appropriately, such as the minimum support and minimum confidence thresholds. We have applied ROC (receiver operating characteristic) analysis to the algorithm in our navigation systems and evaluated the performance of the derived rules. In this paper, we propose extended mining association algorithms that import the parameters obtained from ROC analysis into the algorithm. We then evaluate the performance of the proposed algorithms on an experimental database and show how they derive effective association rules. We also show that the proposed algorithms can remove stopwords from raw data automatically.
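The abstract does not spell out how the ROC-derived parameters enter the algorithm. As a rough illustration only, the Python sketch below shows the two thresholds it names, minimum support and minimum confidence, in an Apriori-style rule miner, and one possible way ROC analysis could compare candidate threshold values. The helper names, the toy transactions, the hand-labelled set of "relevant" rules, and the use of Youden's J statistic (TPR - FPR) to choose a threshold are all assumptions made for illustration; they are not the authors' method.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search: return {itemset: support} for every itemset
    whose support (fraction of transactions containing it) is >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    singletons = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in singletons if support(s) >= min_support}
    result = {s: support(s) for s in level}
    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in result for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
        result.update({c: support(c) for c in level})
        k += 1
    return result


def association_rules(freq, min_confidence):
    """Rules lhs -> rhs with confidence supp(lhs | rhs) / supp(lhs) >= min_confidence."""
    rules = []
    for itemset, supp in freq.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / freq[lhs]  # lhs is frequent by downward closure
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


def roc_point(found, relevant, universe):
    """(false-positive rate, true-positive rate) of `found` against labelled `relevant` rules."""
    negatives = universe - relevant
    tpr = len(found & relevant) / len(relevant) if relevant else 0.0
    fpr = len(found - relevant) / len(negatives) if negatives else 0.0
    return fpr, tpr


def choose_min_support(transactions, thresholds, min_confidence, relevant):
    """Hypothetical helper (not from the paper): pick the min-support value that
    maximises Youden's J = TPR - FPR over the given candidate thresholds."""
    def rule_set(min_support):
        freq = frequent_itemsets(transactions, min_support)
        return {(l, r) for l, r, _, _ in association_rules(freq, min_confidence)}

    universe = rule_set(min(thresholds))  # the loosest threshold yields the most rules
    best = None
    for t in thresholds:
        fpr, tpr = roc_point(rule_set(t), relevant, universe)
        if best is None or tpr - fpr > best[1]:
            best = (t, tpr - fpr)
    return best


# Toy data and labels, invented for illustration.
transactions = [
    {"data", "mining", "the"},
    {"data", "mining", "rules"},
    {"the", "rules"},
    {"data", "the"},
    {"mining", "rules", "the"},
]
relevant = {(frozenset({"data"}), frozenset({"mining"}))}

freq = frequent_itemsets(transactions, 0.4)
print(len(freq))  # 9
print(len(association_rules(freq, 0.6)))  # 7
print(choose_min_support(transactions, [0.2, 0.4], 0.6, relevant)[0])  # 0.4
```

Raising the support threshold here shrinks the rule set while keeping the one labelled rule, which lowers the false-positive rate; that is the kind of trade-off an ROC view makes explicit, though the paper's actual criterion may differ.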
Keywords :
data analysis; data mining; noise; online front-ends; software performance evaluation; very large databases; Apriori algorithm; ROC analysis; algorithm performance evaluation; association rules derivation; automatic stopword removal; data mining algorithm; dictionary; information navigation systems; large databases; minimum confidence; minimum support; mining association algorithm; noise filters; noise removal; receiver operating characteristic; rule performance; semi-structured data; system parameters; threshold values; Algorithm design and analysis; Association rules; Data mining; Data processing; Databases; Dictionaries; Electronic mail; Filters; Navigation; Performance analysis;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
System Sciences, 2001. Proceedings of the 34th Annual Hawaii International Conference on
Conference_Location :
Maui, HI, USA
Print_ISBN :
0-7695-0981-9
Type :
conf
DOI :
10.1109/HICSS.2001.926303
Filename :
926303