DocumentCode :
3194922
Title :
Mining association rules in text databases using multipass with inverted hashing and pruning
Author :
Holt, John D. ; Chung, Soon M.
Author_Institution :
Dept. of Comput. Sci. & Eng., Wright State Univ., Dayton, OH, USA
fYear :
2002
fDate :
2002
Firstpage :
49
Lastpage :
56
Abstract :
In this paper, we propose a new algorithm named multipass with inverted hashing and pruning (MIHP) for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., words) that need to be counted. Two well-known mining algorithms, the Apriori algorithm and the direct hashing and pruning (DHP) algorithm, are evaluated in the context of mining text databases, and are compared with the proposed MIHP algorithm. It has been shown that the MIHP algorithm performs better for large text databases.
Keywords :
data mining; file organisation; text analysis; apriori algorithm; association rule mining; direct hashing and pruning algorithm; itemsets; large text databases; multipass with inverted hashing and pruning algorithm; text databases; Association rules; Computer science; Data engineering; Data mining; Indexing; Itemsets; Marketing and sales; Performance analysis; Thesauri; Transaction databases;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2002)
ISSN :
1082-3409
Print_ISBN :
0-7695-1849-4
Type :
conf
DOI :
10.1109/TAI.2002.1180787
Filename :
1180787