• DocumentCode
    1929381
  • Title
    Applying a Novel Combined Classifier for Hypertext Classification in Pornographic Web Filtering
  • Author
    Gao, Zhong ; Lu, Guanming ; Dong, Hao ; Wang, Shutong ; Wang, Haibo ; Wei, Xiaopei
  • Author_Institution
    Sch. of Telecommun. & Inf. Eng., Nanjing Univ. of Posts & Telecommun., Nanjing
  • fYear
    2008
  • fDate
    28-29 Jan. 2008
  • Firstpage
    270
  • Lastpage
    273
  • Abstract
    As the Web expands exponentially, a flood of pornographic Web sites has appeared on the Internet, so effective Web filtering systems are essential. Web filtering based on hypertext classification has become one of the important techniques for handling and filtering inappropriate information on the Web. Hypertext classification, that is, the automatic classification of Web documents into predefined classes, relieves humans of this task. However, improving the performance of hypertext classification when the training data are noisy remains a challenging problem. In this paper, we propose a new approach to hypertext classification for Web filtering that uses a novel combined K-nearest neighbor and support vector machine classifier (KNN-SVM) to remove noisy training examples. The experimental results show that both the generalization performance and the classification accuracy are significantly improved compared with the traditional SVM classifier, and that the approach is suitable for engineering applications.
  • Keywords
    Internet; classification; hypermedia; information filtering; support vector machines; K-nearest neighbor; hypertext classification; pornographic Web filtering; HTML; Information filters; Machine learning; Support vector machine classification; Text categorization; Vocabulary; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Internet Computing in Science and Engineering, 2008. ICICSE '08. International Conference on
  • Conference_Location
    Harbin
  • Print_ISBN
    978-0-7695-3112-0
  • Electronic_ISBN
    978-0-7695-3112-0
  • Type
    conf
  • DOI
    10.1109/ICICSE.2008.88
  • Filename
    4548272