DocumentCode :
1654141
Title :
Classification Based on Specific Vocabulary
Author :
Savoy, Jacques ; Zubaryeva, Olena
Author_Institution :
Comput. Sci. Dept., Univ. of Neuchatel, Neuchatel, Switzerland
Volume :
1
fYear :
2011
Firstpage :
120
Lastpage :
123
Abstract :
Assuming a binomial distribution for word occurrence, we propose computing a standardized Z score to define the specific vocabulary of a subset compared to that of the entire corpus. This approach is applied to weight the terms characterizing a document (or a sample of texts). We then show how these Z score values can be used to derive an efficient categorization scheme. To evaluate this proposition, we categorize speeches given by B. Obama as either electoral or presidential. The results tend to show that, under 10-fold cross-validation, the suggested classification scheme performs better than both a Support Vector Machine scheme and a Naive Bayes classifier.
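As a rough illustration of the Z score mentioned in the abstract, the Python sketch below assumes the usual binomial standardization: a term's probability p is estimated from its relative frequency in the whole corpus, and its observed count in a subset of n0 tokens is compared with the expected count n0 * p, scaled by the standard deviation sqrt(n0 * p * (1 - p)). The function name `z_scores` and the toy token lists are hypothetical, and the paper's exact estimation, smoothing, and classification rule are not reproduced here.

```python
import math
from collections import Counter


def z_scores(subset_tokens, corpus_tokens):
    """Return a Z score per corpus term for the given subset.

    Assumption: corpus_tokens contains the subset's tokens as well,
    so probabilities are estimated on the entire corpus.
    """
    corpus_tf = Counter(corpus_tokens)
    subset_tf = Counter(subset_tokens)
    n = len(corpus_tokens)   # corpus size in tokens
    n0 = len(subset_tokens)  # subset size in tokens

    scores = {}
    for term, tf in corpus_tf.items():
        p = tf / n                            # estimated occurrence probability
        sd = math.sqrt(n0 * p * (1 - p))      # binomial standard deviation
        if sd == 0:
            continue                          # degenerate case: score undefined
        scores[term] = (subset_tf[term] - n0 * p) / sd
    return scores


# Toy data (hypothetical): one "electoral" subset inside a small corpus.
subset = ["hope", "change", "hope", "vote"]
rest = ["budget", "law", "budget", "change"]
scores = z_scores(subset, subset + rest)

for term, z in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:8s} {z:+.3f}")
```

Terms with a large positive score are over-used in the subset relative to the corpus, and terms with a large negative score are under-used; a threshold on the absolute score is one possible way to select a subset's specific vocabulary.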
Keywords :
binomial distribution; classification; text analysis; vocabulary; categorization scheme; classification scheme; document weight terms characterization; naive Bayes classifier; specific vocabulary; standardized Z score computation; support vector machine scheme; word occurrence; Frequency measurement; Machine learning; Smoothing methods; Speech; Support vector machines; Text categorization; Vocabulary; Lexical Analysis; Machine Learning; Natural Language Processing; Political Discourse; Text Categorization
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2011 IEEE/WIC/ACM International Conference on
Conference_Location :
Lyon
Print_ISBN :
978-1-4577-1373-6
Electronic_ISBN :
978-0-7695-4513-4
Type :
conf
DOI :
10.1109/WI-IAT.2011.19
Filename :
6040507