DocumentCode :
3303085
Title :
Text Classification by Literary Period Using PPM-C Data Compression
Author :
Barufaldi, Bruno ; Santana, Eduardo F. ; Filho, José Rogério B B ; van der Poel, J. ; Marques, Marco ; Batista, Leonardo Vidal
Author_Institution :
Dept. de Inf., Univ. Fed. da Paraiba, Joao Pessoa, Brazil
fYear :
2009
fDate :
8-11 Sept. 2009
Firstpage :
125
Lastpage :
133
Abstract :
Methods and techniques for data compression have been used for pattern recognition, including automatic text classification. The performance of Prediction by Partial Matching (PPM) as a text classifier has already been demonstrated by many studies, including authorship attribution for Portuguese texts. The classes involved in the classification process need not be restricted to a single author: by grouping two or more authors into one class, one can define a literary style. This work presents a literary style classifier for texts from Brazilian literature using the PPM-C statistical model.
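The abstract relies on the general compression-based classification principle: build one statistical model per class, then assign a text to the class whose model encodes it in the fewest bits. The Python below is a minimal sketch of that principle, not the authors' implementation. It assumes a static order-3 PPM model with escape method C and full symbol exclusion, trained on each class corpus and then frozen while the test text is measured (the paper's model may instead keep adapting during compression). The names `PPMC`, `train`, `code_length` and `classify`, the byte-level alphabet and the toy corpora are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


class PPMC:
    """Hypothetical static PPM model: escape method C with full exclusion."""

    def __init__(self, order=3, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size  # byte alphabet (assumption)
        # counts[context][symbol] -> frequency, for context lengths 0..order
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, data: bytes) -> None:
        # Count every symbol after each of its preceding contexts.
        for i, sym in enumerate(data):
            for k in range(self.order + 1):
                if i - k < 0:
                    break
                self.counts[data[i - k:i]][sym] += 1

    def code_length(self, data: bytes) -> float:
        """Ideal code length of data in bits under the frozen model."""
        bits = 0.0
        for i, sym in enumerate(data):
            excluded = set()
            coded = False
            for k in range(min(self.order, i), -1, -1):
                table = self.counts.get(data[i - k:i])
                if not table:
                    continue  # unseen context: drop one order at no cost
                # Full exclusion: ignore symbols already ruled out above.
                cands = {s: c for s, c in table.items() if s not in excluded}
                if not cands:
                    continue
                total, distinct = sum(cands.values()), len(cands)
                if sym in cands:
                    # Method C: P(sym) = count / (total + distinct)
                    bits -= math.log2(cands[sym] / (total + distinct))
                    coded = True
                    break
                # Method C escape probability: distinct / (total + distinct)
                bits -= math.log2(distinct / (total + distinct))
                excluded.update(cands)
            if not coded:
                # Order -1: uniform over the symbols not yet excluded.
                bits += math.log2(self.alphabet_size - len(excluded))
        return bits


def classify(text: str, models: dict) -> str:
    """Return the label whose model gives the shortest code length."""
    data = text.encode("utf-8")
    return min(models, key=lambda label: models[label].code_length(data))


# Toy usage with two made-up "classes".
corpora = {
    "A": "the quick brown fox jumps over the lazy dog " * 20,
    "B": "lorem ipsum dolor sit amet consectetur adipiscing " * 20,
}
models = {}
for label, corpus in corpora.items():
    model = PPMC(order=3)
    model.train(corpus.encode("utf-8"))
    models[label] = model

print(classify("the lazy brown dog", models))
```

In this setting, each class would correspond to a literary period, with its model trained on the concatenated works of that period's authors. Choosing the minimum code length is equivalent to choosing the class whose model assigns the test text the highest probability.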
Keywords :
Data compression; Humans; Internet; Pattern recognition; Predictive models; Text categorization; Web sites;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information and Human Language Technology (STIL), 2009 Seventh Brazilian Symposium in
Conference_Location :
Sao Carlos, Brazil
Print_ISBN :
978-1-4244-6008-3
Type :
conf
DOI :
10.1109/STIL.2009.39
Filename :
5532446