DocumentCode
460869
Title
Feature Reduction for Web Document Classification
Author
Song, MuHee ; Kang, DongJin ; Lee, SangJo
Author_Institution
Dept. of Comput. Eng., Kyungpook Nat. Univ., Daegu
Volume
1
fYear
2006
fDate
Nov. 2006
Firstpage
785
Lastpage
788
Abstract
This paper proposes an evolutionary Web page classification method, motivated by the need to improve current Web page classification performance. The words that appear in a Web page are used to characterize that page. However, treating every word as a possible feature does not guarantee better classification performance. To address this, the paper applies principal component analysis (PCA), a statistical analysis method, to reduce a large feature vector to a smaller one containing only a few principal elements. It presents simulation experiments that verify both the reduction in feature vector size and the improvement in Web page classification ability. For the classification experiment, pages from the sports news section of Yahoo.com were classified with the Naive Bayesian classification algorithm. The results verified that the proposed news Web page classification method provides satisfactory accuracy on the sports news database.
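The pipeline the abstract describes (word-count features, PCA reduction, Naive Bayesian classification) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy documents, the vocabulary, the function names, the power-iteration PCA, and the choice of a Gaussian Naive Bayes variant (used here because PCA scores are real-valued) are all assumptions made for the example.

```python
import math
import random

def term_counts(docs, vocab):
    """Bag-of-words count vector for each document over a fixed vocabulary."""
    return [[doc.split().count(w) for w in vocab] for doc in docs]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def pca_fit(X, k, iters=200, seed=0):
    """Top-k principal directions via power iteration with deflation."""
    n, d = len(X), len(X[0])
    mean = [sum(col) / n for col in zip(*X)]
    C = [[x - m for x, m in zip(row, mean)] for row in X]
    cov = [[sum(r[i] * r[j] for r in C) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to extract
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(cov, v)))
        comps.append(v)
        # Deflate: remove the variance explained by the component just found.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mean, comps

def pca_transform(X, mean, comps):
    """Project centered rows onto the principal directions."""
    return [[sum((x - m) * c for x, m, c in zip(row, mean, comp))
             for comp in comps] for row in X]

def nb_fit(Z, y, eps=1e-6):
    """Gaussian Naive Bayes (assumed variant): per-class prior, mean, variance."""
    model = {}
    for label in sorted(set(y)):
        rows = [z for z, t in zip(Z, y) if t == label]
        mu = [sum(col) / len(rows) for col in zip(*rows)]
        var = [sum((x - m) ** 2 for x in col) / len(rows) + eps
               for col, m in zip(zip(*rows), mu)]
        model[label] = (math.log(len(rows) / len(y)), mu, var)
    return model

def nb_predict(model, z):
    """Return the label with the highest log posterior for feature vector z."""
    def score(params):
        prior, mu, var = params
        return prior + sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                           for x, m, v in zip(z, mu, var))
    return max(model, key=lambda label: score(model[label]))

# Hypothetical toy corpus standing in for sports news pages.
vocab = ["pitcher", "inning", "homer", "goal", "striker", "penalty"]
train_docs = ["pitcher inning homer pitcher", "inning homer inning",
              "pitcher homer inning homer", "goal striker penalty goal",
              "striker penalty striker", "penalty goal penalty striker"]
train_labels = ["baseball"] * 3 + ["soccer"] * 3
test_docs = ["homer pitcher inning", "penalty goal goal"]

X_train = term_counts(train_docs, vocab)
mean, comps = pca_fit(X_train, k=2)            # 6 word features -> 2 components
Z_train = pca_transform(X_train, mean, comps)
model = nb_fit(Z_train, train_labels)
Z_test = pca_transform(term_counts(test_docs, vocab), mean, comps)
predictions = [nb_predict(model, z) for z in Z_test]
print(predictions)
```

The classifier here only ever sees the two PCA scores rather than the six raw word counts, which is the feature reduction step the abstract refers to. The number of components to keep is a free parameter in this sketch.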
Keywords
Web sites; data reduction; document handling; pattern classification; principal component analysis; Web document classification; evolutionary Web page classification; feature reduction; large-scaled feature vector; statistical analysis; Analytical models; Bayesian methods; Classification algorithms; Information technology; Spatial databases; Terminology; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Security, 2006 International Conference on
Conference_Location
Guangzhou
Print_ISBN
1-4244-0605-6
Electronic_ISBN
1-4244-0605-6
Type
conf
DOI
10.1109/ICCIAS.2006.294242
Filename
4072195