Title :
Mining association algorithm with improved threshold based on ROC analysis
Author :
Kawahara, Minoru ; Kawano, Hiroyuki
Author_Institution :
Data Process. Center, Kyoto Univ., Japan
fDate :
2001
Abstract :
The mining association algorithm is one of the most popular data mining algorithms for deriving association rules at high speed from huge databases. We have been developing navigation systems for semi-structured data such as Web data and bibliographic data. To guide beginners, our systems present the association rules derived by the algorithm. However, the algorithm tends to derive rules that contain noise such as stopwords, so many systems use noise filters to remove it. In order to remove the noise automatically and derive more effective rules, we proposed an algorithm that uses the true positive rate and the false positive rate of derived rules in a database, based on ROC analysis. In this paper, we correct the parameters to improve this extended mining association algorithm. Moreover, we evaluate the performance of the proposed algorithm using an experimental database and show how it can derive effective association rules. We also show that the proposed algorithm can remove stopwords automatically from raw data.
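As a rough illustration of the idea in the abstract, the following Python sketch computes a true positive rate and a false positive rate for a single rule "antecedent => consequent" over a toy transaction set. The definitions used here are an assumption, not the parameters of the paper: the antecedent is treated as a classifier that predicts the consequent, so TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The function name `roc_rates` and the sample transactions are hypothetical.

```python
def roc_rates(transactions, antecedent, consequent):
    """Return (TPR, FPR) for the rule antecedent => consequent.

    Assumption: the antecedent acts as a classifier predicting the
    consequent. This is a generic ROC reading of a rule, not
    necessarily the exact parameters used in the paper.
    """
    a, b = set(antecedent), set(consequent)
    tp = fp = fn = tn = 0
    for t in transactions:
        items = set(t)
        has_a = a <= items  # transaction contains the antecedent
        has_b = b <= items  # transaction contains the consequent
        if has_a and has_b:
            tp += 1
        elif has_a:
            fp += 1
        elif has_b:
            fn += 1
        else:
            tn += 1
    # Guard against empty denominators in degenerate data sets.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical toy data: "the" behaves like a stopword.
transactions = [
    {"the", "data", "mining"},
    {"the", "data", "mining"},
    {"the", "web"},
    {"the", "bibliography"},
]

useful = roc_rates(transactions, {"data"}, {"mining"})
stopword = roc_rates(transactions, {"the"}, {"mining"})
print(useful, stopword)
```

In this toy data both rules reach every transaction that contains "mining", but the rule with "the" as antecedent also fires on every transaction that does not, which gives it a high false positive rate. A threshold on these two rates could therefore separate stopword-like rules from useful ones without a hand-built noise filter.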
Keywords :
bibliographic systems; data mining; very large databases; Web data; association rules; bibliographic data; data mining algorithms; false positive rate; huge databases; mining association algorithm; noise filters; semi-structured data; stopwords; Algorithm design and analysis; Association rules; Bibliographies; Data mining; Data processing; Filters; Humans; Navigation; Text mining; Transaction databases
Conference_Title :
Communications, Computers and Signal Processing, 2001. PACRIM. 2001 IEEE Pacific Rim Conference on
Conference_Location :
Victoria, BC
Print_ISBN :
0-7803-7080-5
DOI :
10.1109/PACRIM.2001.953729