DocumentCode
3303085
Title
Text Classification by Literary Period Using PPM-C Data Compression
Author
Barufaldi, Bruno ; Santana, Eduardo F. ; Filho, José Rogério B B ; van der Poel, J. ; Marques, Marco ; Batista, Leonardo Vidal
Author_Institution
Dept. de Inf., Univ. Fed. da Paraiba, Joao Pessoa, Brazil
fYear
2009
fDate
8-11 Sept. 2009
Firstpage
125
Lastpage
133
Abstract
Data compression methods have been applied to pattern recognition tasks, including automatic text classification. Many works have already demonstrated the performance of Prediction by Partial Matching (PPM) as a text classifier, including for authorship attribution of Portuguese texts. The classes involved in the classification process need not be restricted to a single author: by grouping two or more authors into one class, one can define a literary style. This work presents a literary style classifier for texts from Brazilian literature, based on the PPM-C statistical model.
Keywords
Data compression; Humans; Internet; Pattern recognition; Predictive models; Text categorization; Web sites;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information and Human Language Technology (STIL), 2009 Seventh Brazilian Symposium in
Conference_Location
Sao Carlos, TBD, Brazil
Print_ISBN
978-1-4244-6008-3
Type
conf
DOI
10.1109/STIL.2009.39
Filename
5532446