• DocumentCode
    3303085
  • Title
    Text Classification by Literary Period Using PPM-C Data Compression
  • Author
    Barufaldi, Bruno ; Santana, Eduardo F. ; Filho, José Rogério B B ; van der Poel, J. ; Marques, Marco ; Batista, Leonardo Vidal
  • Author_Institution
    Dept. de Inf., Univ. Fed. da Paraiba, Joao Pessoa, Brazil
  • fYear
    2009
  • fDate
    8-11 Sept. 2009
  • Firstpage
    125
  • Lastpage
    133
  • Abstract
    Data compression methods and techniques have been used for pattern recognition, including automatic text classification. The performance of Prediction by Partial Matching (PPM) as a text classifier has already been demonstrated in many works, including authorship attribution for Portuguese texts. The classes involved in the classification process need not be restricted to a single author: by grouping two or more authors into one class, one can define a literary style. This work presents a literary style classifier for texts from Brazilian literature that uses the PPM-C statistical model.
  • Keywords
    Data compression; Humans; Internet; Pattern recognition; Predictive models; Text categorization; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Human Language Technology (STIL), 2009 Seventh Brazilian Symposium in
  • Conference_Location
    Sao Carlos, Brazil
  • Print_ISBN
    978-1-4244-6008-3
  • Type
    conf
  • DOI
    10.1109/STIL.2009.39
  • Filename
    5532446
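The abstract describes PPM-C used as a compression-based classifier. As a rough illustration only, and not the authors' implementation, the sketch below assumes a common setup: one order-k PPM-C model is trained per class on the texts of that class's authors, and an unknown text is assigned to the class whose frozen model would encode it in the fewest bits. The class name `PPMC`, the function `classify`, the default order of 2, byte-level (UTF-8) symbols, and summing ideal code lengths instead of running a real arithmetic coder are all assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


class PPMC:
    """Hypothetical PPM-C model: escape method C with exclusion, byte symbols."""

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        # contexts[k] maps a k-byte context to counts of the bytes that followed it
        self.contexts = [defaultdict(Counter) for _ in range(order + 1)]

    def train(self, text):
        data = text.encode("utf-8")
        for i, sym in enumerate(data):
            # Update every context order 0..order that fits before position i
            for k in range(min(self.order, i) + 1):
                self.contexts[k][data[i - k:i]][sym] += 1

    def code_length(self, text):
        """Ideal code length in bits of text under the frozen (non-updating) model."""
        data = text.encode("utf-8")
        bits = 0.0
        for i, sym in enumerate(data):
            excluded = set()
            # Start from the longest available context and escape downwards
            for k in range(min(self.order, i), -1, -1):
                counts = self.contexts[k].get(data[i - k:i])
                if not counts:
                    continue  # context never seen: try a shorter one
                # Exclusion: drop symbols already rejected at higher orders
                active = {s: c for s, c in counts.items() if s not in excluded}
                if not active:
                    continue
                total = sum(active.values())
                distinct = len(active)  # PPM-C escape count = number of distinct symbols
                if sym in active:
                    bits -= math.log2(active[sym] / (total + distinct))
                    break
                bits -= math.log2(distinct / (total + distinct))  # cost of escaping
                excluded.update(active)
            else:
                # Order -1: uniform over the symbols not yet excluded
                bits += math.log2(self.alphabet_size - len(excluded))
        return bits


def classify(text, models):
    """Return the label whose model compresses text into the fewest bits."""
    return min(models, key=lambda label: models[label].code_length(text))
```

In the literary-period setting of the abstract, each entry of the hypothetical `models` dictionary would be trained on the texts of all authors grouped into one period, so a class models a shared style rather than a single author. Freezing the models during classification keeps the scores comparable across classes; an adaptive variant that keeps updating while coding the unknown text is an equally plausible design.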