DocumentCode :
2633967
Title :
Text categorization study case: Patents' application documents
Author :
de Oliveira Gomes, Neide ; Passos, Emmanuel Piceses Lopes
Author_Institution :
Electr. Eng. Dept., Pontifical Catholic Univ., Rio de Janeiro, Brazil
fYear :
2011
fDate :
21-23 June 2011
Firstpage :
446
Lastpage :
450
Abstract :
This paper presents computational methods for categorizing patent texts in Portuguese, drawing on techniques from machine learning and computational linguistics. The algorithm used was a modified k-Nearest Neighbor (k-NN) method, which showed good results, although it requires considerable computational time in the training stage. For the pre-processing step, the stemming method StemmerPortuguese, which removes suffixes, was implemented with modifications, along with stopword removal and treatment of compound terms.
Keywords :
natural language processing; text analysis; Portuguese language; StemmerPortuguese; computational linguistics; computational time; k-NN; k-Nearest Neighbor method; machine learning; patents application documents; stemming method; text categorization; Classification algorithms; Databases; Equations; Informatics; Patents; Text categorization; Training; Categorization of Patents' Applications; Classification of Patent's Applications; Knowledge Discovery in Texts; Text Categorization; Text Classification
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Industrial Electronics and Applications (ICIEA), 2011 6th IEEE Conference on
Conference_Location :
Beijing
ISSN :
pending
Print_ISBN :
978-1-4244-8754-7
Electronic_ISBN :
pending
Type :
conf
DOI :
10.1109/ICIEA.2011.5975625
Filename :
5975625