• DocumentCode
    2183448
  • Title

    An editor labeling model for training set expansion in Web categorization

  • Author

    Liu, Tie-Yan ; Wan, Hao ; Ma, Wei-Ying

  • Author_Institution
    Microsoft Res. Asia, China
  • fYear
    2005
  • fDate
    19-22 Sept. 2005
  • Firstpage
    165
  • Lastpage
    171
  • Abstract
    Automatically classifying Web pages is an effective way to manage the massive amount of information on the Web. However, our experiments show that state-of-the-art text categorization technologies cannot achieve satisfactory classification performance on this task. The major reason is the large proportion of rare categories in Web taxonomies: classification fails in such categories simply because there is not enough information to train reliable classifiers. To tackle this problem, we propose expanding the training set of the rare categories by simulating the labeling behavior of the human editors of Web directories. Experimental results show that this approach achieves a significant (93% relative) improvement in classification accuracy, which is highly encouraging for high-performance Web classification.
  • Keywords
    Internet; classification; text analysis; Web categorization; Web directory; Web page classification; Web taxonomy; editor labeling; text categorization; training set expansion; Asia; Humans; Information management; Labeling; Support vector machine classification; Support vector machines; Taxonomy; Text categorization; Web pages; Wide area networks;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2415-X
  • Type

    conf

  • DOI
    10.1109/WI.2005.27
  • Filename
    1517838