• DocumentCode
    3249036
  • Title
    Discriminative category matching: efficient text classification for huge document collections
  • Author
    Fung, Gabriel Pui Cheong ; Yu, Jeffrey Xu ; Lu, Hongjun
  • Author_Institution
    Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, China
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    187
  • Lastpage
    194
  • Abstract
    With the rapid growth of textual information available on the Internet, a good model for classifying and managing documents automatically is important. As more documents are archived, new terms, new concepts and concept drift appear frequently, so updating the classification model frequently, rather than using an old model for a long period, is essential. The challenges are to: a) obtain a highly accurate classification model; b) require little computational time for both model training and operation; and c) occupy little storage space. However, none of the existing classification approaches meets all of these requirements. In this paper, we propose a novel text classification approach, called discriminative category matching, which achieves all of the stated characteristics. Extensive experiments are conducted on two benchmarks and a large real-life collection. The encouraging results indicate that our approach is highly feasible.
  • Keywords
    Internet; computational complexity; data mining; pattern matching; text analysis; computational time; concept-drift; discriminative category matching; document classification; document management; efficient text classification; huge document collections; Computational efficiency; Computer science; Costs; Document handling; Government; Support vector machine classification; Support vector machines; Technology management; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
  • Print_ISBN
    0-7695-1754-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2002.1183902
  • Filename
    1183902