DocumentCode
3065675
Title
Web Document Classification Based on Extended Rough Set
Author
Yi, Gaoxiang ; Hu, Heping ; Lu, Zhengding
Author_Institution
Huazhong University of Science and Technology, Wuhan, Hubei, China
fYear
2005
fDate
05-08 Dec. 2005
Firstpage
916
Lastpage
919
Abstract
A VSM algorithm for Web document classification based on an extended rough set, the Tolerance Rough Set, is proposed. First, Web documents are represented by their terms in the vector space model. Then, term co-occurrence values are used to define the tolerance class of each term, which extends the ability of terms to describe documents. Finally, a Web document classification algorithm is implemented in which the similarity between documents is described by term tolerance classes. Experiments are conducted on data sets collected from two Web portals, Yahoo and the Open Directory Project.
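The abstract gives no formulas, so the following is only a minimal sketch of how term tolerance classes could enrich document similarity. It assumes the commonly used tolerance rough set formulation: a term's tolerance class holds the terms that co-occur with it in at least `theta` documents. The function names, the `theta` threshold, and the Jaccard similarity are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents (assumed rule)."""
    cooc = defaultdict(int)
    for terms in docs:
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(set(terms)), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for terms in docs for t in terms}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(terms, classes):
    """Document terms plus every known term whose tolerance class
    overlaps the document's term set."""
    doc = set(terms)
    return doc | {t for t, cls in classes.items() if cls & doc}


def similarity(doc_a, doc_b, classes):
    """Jaccard overlap of the two enriched term sets (an assumed measure)."""
    ua = upper_approximation(doc_a, classes)
    ub = upper_approximation(doc_b, classes)
    union = ua | ub
    return len(ua & ub) / len(union) if union else 0.0
```

Under these assumptions, two documents that share no literal term can still score as similar when their terms fall in each other's tolerance classes, which is the sense in which the tolerance class extends a term's descriptive power.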
Keywords
Classification algorithms; Computer science; Data mining; Database systems; Internet; Portals; Set theory; Space technology; Web mining; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Computing, Applications and Technologies, 2005. PDCAT 2005. Sixth International Conference on
Print_ISBN
0-7695-2405-2
Type
conf
DOI
10.1109/PDCAT.2005.251
Filename
1579063