Title :
Research on Algorithm of Web Classification Based on EP and FFSS
Author :
Wang, WeiPing ; Wang, Zufeng
Abstract :
In this paper, we present a new web classification algorithm that combines extended pages (EP) and fair feature-subset selection (FFSS). Because hyperlinks carry important information, we extend web pages with anchor text. In extended pages, the proportion of useful features increases, which can improve web classification. To exploit the structure of the web, we build extended pages by appending the sentence or paragraph that contains the anchor text to the original page. Fair feature-subset selection not only treats each category fairly but can also identify useful features, both positive and negative, so it addresses the high dimensionality of the vector space. Experiments show that the new algorithm improves on the precision and recall of the traditional method.
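The page-extension step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sentences_with_anchor` and `extend_page`, the `(source_text, anchor_text)` input format, and the naive regex sentence splitter are all assumptions made for illustration, and only the sentence-level variant is shown.

```python
import re


def sentences_with_anchor(source_text, anchor_text):
    """Return the sentences of source_text that contain anchor_text.

    Hypothetical helper: sentences are split naively on ., ! or ?
    followed by whitespace, and matching is case-insensitive.
    """
    sentences = re.split(r"(?<=[.!?])\s+", source_text.strip())
    return [s for s in sentences if anchor_text.lower() in s.lower()]


def extend_page(page_text, inlinks):
    """Build an extended page by appending anchor-text context.

    inlinks is an assumed input format: a list of
    (source_text, anchor_text) pairs, one per hyperlink that points
    at the page being extended.
    """
    context = []
    for source_text, anchor_text in inlinks:
        context.extend(sentences_with_anchor(source_text, anchor_text))
    return " ".join([page_text] + context)


# Toy example: one linking page whose anchor text points at our page.
page = "Our lab studies kernel methods."
inlinks = [
    (
        "Welcome to the department. "
        "See the machine learning lab for SVM research. "
        "Contact us by email.",
        "machine learning lab",
    ),
]
extended = extend_page(page, inlinks)
print(extended)
```

In this toy case only the sentence surrounding the anchor text is appended, so the extended page gains terms such as "machine learning" and "SVM" that the original page lacked. The extended text would then go through feature selection and a classifier, which this sketch does not cover.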
Keywords :
Classification algorithms; Computational intelligence; Feature extraction; Information management; Information security; Pattern recognition; Space technology; Support vector machine classification; Support vector machines; Web pages;
Conference_Title :
Computational Intelligence and Security, 2007 International Conference on
Conference_Location :
Harbin
Print_ISBN :
0-7695-3072-9
Electronic_ISBN :
978-0-7695-3072-7
DOI :
10.1109/CIS.2007.152