  • DocumentCode
    2188387
  • Title
    A Novel Hybrid system for Large-Scale Chinese Text Classification Problem
  • Author
    Gao, Zhong; Lu, Guanming; Gu, Daquan
  • Author_Institution
    Coll. of Telecommun. & Inf. Eng., Nanjing Univ. of Posts & Telecommun., Nanjing, China
  • fYear
    2008
  • fDate
    27-28 Dec. 2008
  • Firstpage
    121
  • Lastpage
    124
  • Abstract
    Most Chinese text classification systems are based on the bag-of-words (BW) technique, a valid probabilistic tool for text representation that can provide a better semantic architecture. However, their weakness in classification accuracy has still not been overcome. The support vector machine (SVM) has become a popular classification tool and can be applied within this scheme, but the main disadvantages of SVM algorithms are their large memory requirements and long computation times on very large datasets. In this paper, we propose a hybrid system based on BW and a novel cascade SVM with feedback, which splits the problem into smaller subsets and trains a network to assign samples to the different subsets. On large-scale classification problems, the proposed parallel training algorithm, in which multiple SVM classifiers are applied, speeds up SVM training and increases classification accuracy.
  • Keywords
    classification; feedback; natural language processing; probability; support vector machines; text analysis; word processing; bag of words; feedback; large-scale Chinese text classification; probability tool; semantic architecture; support vector machine; text representation; Computer architecture; Educational institutions; Feedback; Large-scale systems; Machine learning; Natural language processing; Quadratic programming; Support vector machine classification; Support vector machines; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Japan-China Joint Workshop on Frontier of Computer Science and Technology (FCST '08), 2008
  • Conference_Location
    Nagasaki
  • Print_ISBN
    978-1-4244-3418-3
  • Type
    conf
  • DOI
    10.1109/FCST.2008.29
  • Filename
    4736518
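The abstract describes a cascade SVM with feedback that splits a large training set into smaller subsets and trains multiple SVMs on them. Below is a minimal sketch of the generic cascade-with-feedback idea, under stated assumptions: a linear SVM trained by a Pegasos-style stochastic subgradient solver stands in for whatever SVM solver the paper uses, each layer keeps only the support vectors of its subsets and merges them pairwise, and the final support vectors are fed back into every first-layer subset on the next pass. It omits the paper's bag-of-words front end and its subset-assignment network, and all names (`train_linear_svm`, `support_vectors`, `cascade_svm`, `tol`) are hypothetical, not the authors' code.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label in {-1, +1})


def train_linear_svm(data: List[Sample], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style stochastic subgradient training of a linear SVM.

    The bias is folded in as a constant 1.0 feature, so it is regularized too.
    """
    if not data:
        raise ValueError("cannot train on an empty subset")
    rng = random.Random(seed)
    w = [0.0] * (len(data[0][0]) + 1)
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = data[i][0] + [1.0], data[i][1]
            eta = 1.0 / (lam * t)
            violated = y * sum(wj * xj for wj, xj in zip(w, x)) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer (shrink) step
            if violated:  # hinge-loss subgradient step
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def decision(w: List[float], x: List[float]) -> float:
    """Signed score w . [x, 1]; its sign is the predicted label."""
    return sum(wj * xj for wj, xj in zip(w, x + [1.0]))


def support_vectors(w: List[float], data: List[Sample],
                    tol: float = 0.1) -> List[Sample]:
    """Samples on or inside the approximate margin: y * f(x) <= 1 + tol.

    tol is a heuristic because the subgradient solver only approximates the
    optimum; if nothing qualifies, the whole subset is kept so no set is empty.
    """
    svs = [(x, y) for x, y in data if y * decision(w, x) <= 1.0 + tol]
    return svs or list(data)


def cascade_svm(data: List[Sample], n_subsets: int = 4,
                passes: int = 2, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    feedback: List[Sample] = []
    w: List[float] = []
    for _ in range(passes):
        # First layer: disjoint subsets, each augmented with fed-back SVs.
        sets = [s for s in (shuffled[i::n_subsets] + feedback
                            for i in range(n_subsets)) if s]
        # Each layer keeps only the SVs of every set and merges them pairwise.
        while len(sets) > 1:
            svs = [support_vectors(train_linear_svm(s), s) for s in sets]
            sets = [svs[i] + svs[i + 1] for i in range(0, len(svs) - 1, 2)]
            if len(svs) % 2:
                sets.append(svs[-1])
        w = train_linear_svm(sets[0])
        # Feedback: the final SVs are injected into the first layer next pass.
        feedback = support_vectors(w, sets[0])
    return w


# Usage on two well-separated toy clusters.
pos = [([1.0 + i, 1.0 + j], 1) for i in range(4) for j in range(4)]
neg = [([-1.0 - i, -1.0 - j], -1) for i in range(4) for j in range(4)]
w = cascade_svm(pos + neg)
print(all((decision(w, x) > 0) == (y > 0) for x, y in pos + neg))  # True
```

The design point the sketch illustrates is that only support vectors survive each layer, so later layers train on far fewer samples than the full dataset, and the independent first-layer trainings could run in parallel.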