• DocumentCode
    506904
  • Title
    Automatic Genre Classification by Using Co-training
  • Author
    Liu, Rui ; Jiang, Minghu ; Tie, Zheng
  • Author_Institution
    Lab. of Comput. Linguistics, Tsinghua Univ., Beijing, China
  • Volume
    1
  • fYear
    2009
  • fDate
    14-16 Aug. 2009
  • Firstpage
    129
  • Lastpage
    132
  • Abstract
    Research on text classification has concentrated on topic, while the genre of a document is rarely considered. In this article, we discuss automatic genre classification and its application. We argue that word-level features and sentence-level features are two important kinds of measures whose counts vary across genres. Word-level features include word frequency and POS (part-of-speech) tag statistics. Sentence-level features include grammar rules, which are strongly related to genre. Treating these two feature sets as separate views, we explore a robust approach in which co-training is employed to achieve effective genre classification.
  • Keywords
    pattern classification; statistical analysis; text analysis; automatic genre classification; co-training; part of speech; tag statistics; topic-based text classification; word frequency; Computational linguistics; Frequency shift keying; Fuzzy systems; HTML; Robustness; Search engines; Speech; Text categorization; Uniform resource locators; Web pages; genre classification; grammar rules
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-0-7695-3735-1
  • Type
    conf
  • DOI
    10.1109/FSKD.2009.609
  • Filename
    5358649
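
The abstract describes co-training over two feature views (word-level and sentence-level) but does not give the classifier, the feature encoding, or the selection rule. The following is only a minimal sketch of the general co-training loop under stated assumptions: a nearest-centroid classifier stands in for whatever learner the paper uses, the function names `fit_centroids`, `predict`, and `co_train` are hypothetical, and the two-dimensional feature vectors and genre labels are invented toy values, not data from the paper.

```python
from collections import defaultdict

def fit_centroids(vectors, labels):
    """Nearest-centroid 'training': one mean vector per genre label."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, vec):
    """Return (label, margin); margin is the gap between the two nearest centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, centroid)), label)
        for label, centroid in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def co_train(view_a, view_b, labels, rounds=5, per_round=1):
    """labels[i] is a genre string, or None for an unlabeled document."""
    labels = list(labels)
    for _ in range(rounds):
        known = [i for i, l in enumerate(labels) if l is not None]
        unlabeled = [i for i, l in enumerate(labels) if l is None]
        if not unlabeled:
            break
        proposals = {}
        # Each view trains on the shared labeled pool, then nominates the
        # unlabeled documents it is most confident about.
        for view in (view_a, view_b):
            model = fit_centroids([view[i] for i in known], [labels[i] for i in known])
            ranked = sorted(unlabeled, key=lambda i: -predict(model, view[i])[1])
            for i in ranked[:per_round]:
                proposals.setdefault(i, predict(model, view[i])[0])
        # Nominated documents join the labeled pool used by both views.
        for i, label in proposals.items():
            labels[i] = label
    known = [i for i, l in enumerate(labels) if l is not None]
    known_labels = [labels[i] for i in known]
    model_a = fit_centroids([view_a[i] for i in known], known_labels)
    model_b = fit_centroids([view_b[i] for i in known], known_labels)
    return labels, model_a, model_b

# Toy data (assumed): view A stands in for word-level statistics,
# view B for sentence-level (grammar-rule) statistics.
word_view = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.7, 0.3], [0.3, 0.7]]
sentence_view = [[0.8, 0.2], [0.2, 0.8], [0.7, 0.3], [0.3, 0.7], [0.9, 0.1], [0.1, 0.9]]
seed_labels = ["news", "fiction", None, None, None, None]

final_labels, model_a, model_b = co_train(word_view, sentence_view, seed_labels)
print(final_labels)
```

The design point this illustrates is that each view labels documents for the other: a document that is ambiguous in word-level statistics can still enter the training pool because the sentence-level view is confident about it, and vice versa.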