DocumentCode :
2415621
Title :
Web document classification based on fuzzy association
Author :
Haruechaiyasak, Choochart ; Shyu, Mei-Ling ; Chen, Shu-Ching
Author_Institution :
Dept. of Electr. & Comput. Eng., Miami Univ., Coral Gables, FL, USA
fYear :
2002
fDate :
2002
Firstpage :
487
Lastpage :
492
Abstract :
This paper proposes a method for automatically classifying web documents into a set of categories using the fuzzy association concept. Using the same word or vocabulary to describe different entities creates ambiguity, especially in the web environment, where the user population is large. To address this problem, fuzzy association is used to capture the relationships among different index terms or keywords in the documents; that is, each pair of words has an associated value that distinguishes it from the others. The ambiguity in word usage is therefore avoided. Experiments are conducted on data sets collected from two web portals, Yahoo! and the Open Directory Project. The approach is compared with the vector space model using the cosine coefficient, and the results show that it yields higher accuracy than the vector space model.
Keywords :
Internet; data mining; fuzzy set theory; information retrieval; pattern classification; ambiguity; associated value; data mining; fuzzy association concept; index terms; information retrieval; text categorization; vector space model; web document classification; Data mining; Database systems; Fuzzy logic; Fuzzy sets; Information retrieval; Laboratories; Multimedia systems; Portals; Web mining; World Wide Web;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Software and Applications Conference, 2002. COMPSAC 2002. Proceedings. 26th Annual International
ISSN :
0730-3157
Print_ISBN :
0-7695-1727-7
Type :
conf
DOI :
10.1109/CMPSAC.2002.1045052
Filename :
1045052