  • DocumentCode
    460869
  • Title
    Feature Reduction for Web Document Classification
  • Author
    Song, MuHee ; Kang, DongJin ; Lee, SangJo
  • Author_Institution
    Dept. of Comput. Eng., Kyungpook Nat. Univ., Daegu
  • Volume
    1
  • fYear
    2006
  • fDate
    Nov. 2006
  • Firstpage
    785
  • Lastpage
    788
  • Abstract
    This paper proposes an evolutionary Web page classification method motivated by the need to improve current Web page classification performance. The words used in a Web page characterize that page. However, treating every word as a candidate feature does not guarantee better classification performance. To address this, the paper applies principal component analysis (PCA), a statistical analysis method, to reduce a large feature vector to a smaller one containing a few principal elements, and presents simulation experiments that verify both the reduction in feature vector size and the improvement in Web page classification ability. In the classification experiment, pages from the sports news section of Yahoo.com were classified with the Naive Bayesian algorithm. The results show that the proposed method provides satisfactory classification accuracy on the sports news database.
  • Keywords
    Web sites; data reduction; document handling; pattern classification; principal component analysis; Web document classification; evolutionary Web page classification; feature reduction; large-scaled feature vector; principal component analysis; statistical analysis; Analytical models; Bayesian methods; Classification algorithms; Information technology; Principal component analysis; Spatial databases; Statistical analysis; Terminology; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Security, 2006 International Conference on
  • Conference_Location
    Guangzhou
  • Print_ISBN
    1-4244-0605-6
  • Electronic_ISBN
    1-4244-0605-6
  • Type
    conf
  • DOI
    10.1109/ICCIAS.2006.294242
  • Filename
    4072195