• DocumentCode
    3065675
  • Title

    Web Document Classification Based on Extended Rough set

  • Author

    Yi, Gaoxiang ; Hu, Heping ; Lu, Zhengding

  • Author_Institution
    Huazhong University of Science and Technology, Wuhan, Hubei, China
  • fYear
    2005
  • fDate
    05-08 Dec. 2005
  • Firstpage
    916
  • Lastpage
    919
  • Abstract
    A VSM algorithm for Web document classification based on an extended rough set, the Tolerance Rough Set, is proposed. First, Web documents are represented in the vector space model by their terms. Then term co-occurrence values are used to define the tolerance class of each term, which extends the ability of terms to describe a document. Finally, a Web document classification algorithm is implemented in which the similarity between documents is described by term tolerance classes. Experiments are conducted on data sets collected from two Web portals, Yahoo and the Open Directory Project.
  • Keywords
    Classification algorithms; Computer science; Data mining; Database systems; Internet; Portals; Set theory; Space technology; Web mining; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Computing, Applications and Technologies, 2005. PDCAT 2005. Sixth International Conference on
  • Print_ISBN
    0-7695-2405-2
  • Type

    conf

  • DOI
    10.1109/PDCAT.2005.251
  • Filename
    1579063
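
  • Illustration
    The abstract's three steps (term-based document vectors, tolerance classes from term co-occurrence, similarity via tolerance classes) can be sketched as below. This is a minimal sketch of the commonly used tolerance rough set formulation, not the authors' implementation: the co-occurrence threshold theta, the Jaccard measure, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class.

    docs  : list of sets of terms (one set per document)
    theta : assumed threshold; two terms are tolerant of each other
            when they co-occur in at least theta documents
    """
    cooc = defaultdict(int)
    for d in docs:
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(d), 2):
            cooc[(a, b)] += 1
    # Every term belongs to its own tolerance class (reflexivity).
    classes = {t: {t} for d in docs for t in d}
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)  # symmetric relation
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Terms whose tolerance class shares at least one term with doc."""
    return {t for t, cls in classes.items() if cls & doc}


def similarity(d1, d2, classes):
    """Jaccard overlap of the two upper approximations (an assumed measure)."""
    u1 = upper_approximation(d1, classes)
    u2 = upper_approximation(d2, classes)
    union = u1 | u2
    return len(u1 & u2) / len(union) if union else 0.0


docs = [
    {"rough", "set"},
    {"rough", "set", "web"},
    {"web", "page"},
    {"page", "portal"},
]
classes = tolerance_classes(docs, theta=2)
```

    In this toy corpus "rough" and "set" co-occur twice, so a document containing only "rough" is extended to cover "set" as well, and two documents with no literal term in common can still be judged similar.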