Title :
The Influence of Order on a Large Bag of Words
Author :
Prado, Charles B. ; Franca, Felipe M. G. ; Diacovo, Ramon ; Lima, Priscila M V
Author_Institution :
R&D Dept., Globo Telev. Network, Rio de Janeiro
Abstract :
Text classification has mostly been performed through implicit semantic correlation techniques, such as latent semantic analysis. This approach, however, has proved insufficient when short texts must be classified into one or more of many classes. Such is the case for the classification of statements of purpose of Brazilian companies according to the roughly one thousand eight hundred categories of CNAE-Subclasses, the government administration detailment of the National Classification of Economical Activities (CNAE). The impact of word order in a text is evaluated by comparing the performance of three classifiers based on the weightless artificial neural model WISARD. Results point to the need to combine semantic with syntactic information in order to improve the classifiers' performance.
Keywords :
artificial intelligence; government data processing; neural nets; pattern classification; semantic networks; text analysis; CNAE-Subclasses; National Classification of Economical Activities; WISARD; government administration detailment; latent semantic analysis; semantic correlation technique; text classification; weightless artificial neural model; Agriculture; Artificial neural networks; Business; Classification tree analysis; Computer networks; Cows; Design engineering; Environmental economics; Humans; Production; Classification of Economical Activities; Weightless Neural Network
Conference_Title :
Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-0-7695-3382-7
DOI :
10.1109/ISDA.2008.299