• DocumentCode
    1963344
  • Title

    Study on Web-Page Classification Algorithm Based on Rough Set Theory

  • Author

    Yin, Shiqun ; Wang, Fang ; Xie, Zhong ; Qiu, Yuhui

  • Author_Institution
    Fac. of Comput. & Inf. Sci., Southwest Univ., Chongqing
  • fYear
    2008
  • fDate
    23-25 May 2008
  • Firstpage
    202
  • Lastpage
    206
  • Abstract
    With the development of Internet technology, the large number of Web-page documents has come to comprise a huge, high-dimensional text database, yet only a very small portion of it is relevant to any given user. Web-page classification technology assigns Web pages to a category structure, which not only makes it convenient for users to browse Web pages but also makes searching easier by restricting the search scope. Mining high-dimensional data is extraordinarily difficult because of the curse of dimensionality, so feature selection must be adopted to solve these problems. This paper gives an algorithm that uses attribute reduction from rough set theory to reduce the Web-page feature terms and then extract classification rules. Experimental results show that this method greatly reduces the dimension of the feature vector space and yields easy-to-understand classification rules, and that it is more accurate and classifies faster than classification based on vector comparison.
  • Keywords
    Internet; classification; feature extraction; rough set theory; text analysis; Web-page document classification algorithm; classification rule extraction; text database; Classification algorithms; Databases; Decision making; Information processing; Information science; Set theory; Space technology; Web mining; Classification rule; Feature selection; Rough set; Web-page; vector space model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Processing (ISIP), 2008 International Symposiums on
  • Conference_Location
    Moscow
  • Print_ISBN
    978-0-7695-3151-9
  • Type

    conf

  • DOI
    10.1109/ISIP.2008.118
  • Filename
    4554085