Title :
Classification Based on Specific Vocabulary
Author :
Savoy, Jacques ; Zubaryeva, Olena
Author_Institution :
Comput. Sci. Dept., Univ. of Neuchatel, Neuchatel, Switzerland
Abstract :
Assuming a binomial distribution for word occurrence, we propose computing a standardized Z score to define the specific vocabulary of a subset compared to that of the entire corpus. This approach is applied to weight the terms characterizing a document (or a sample of texts). We then show how these Z score values can be used to derive an efficient categorization scheme. To evaluate this proposition, we categorize speeches given by B. Obama as either electoral or presidential. The results tend to show that, under 10-fold cross-validation, the suggested classification scheme performs better than both a Support Vector Machine scheme and a Naive Bayes classifier.
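As an illustration of the Z score idea summarized in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the usual binomial standardization: a term's corpus-wide relative frequency is taken as its occurrence probability p, and its observed count in a subset of n' tokens is compared with the expected count n'p, scaled by the standard deviation sqrt(n'p(1 - p)). The function name `z_scores`, the token-list inputs, and the toy data are hypothetical.

```python
import math
from collections import Counter


def z_scores(subset_tokens, corpus_tokens):
    """Return a Z score per term for a subset relative to the whole corpus.

    Assumption: corpus_tokens already contains the subset's tokens, and
    each term's occurrence probability is estimated as its relative
    frequency in the whole corpus.
    """
    corpus_counts = Counter(corpus_tokens)
    subset_counts = Counter(subset_tokens)
    n = len(corpus_tokens)       # corpus size in tokens
    n_sub = len(subset_tokens)   # subset size in tokens

    scores = {}
    for term, tf_corpus in corpus_counts.items():
        p = tf_corpus / n                  # estimated occurrence probability
        expected = n_sub * p               # binomial mean
        variance = n_sub * p * (1.0 - p)   # binomial variance
        if variance == 0.0:
            continue                       # degenerate case: score undefined
        scores[term] = (subset_counts[term] - expected) / math.sqrt(variance)
    return scores


# Toy data (hypothetical): a 20-token corpus whose 4-token subset uses only "hope".
corpus = ["tax"] * 10 + ["hope"] * 10
subset = ["hope"] * 4
scores = z_scores(subset, corpus)
print(scores["hope"], scores["tax"])  # 2.0 -2.0
```

Under this sketch, terms with a large positive score are over-used in the subset and terms with a large negative score are under-used. The threshold that decides which terms belong to the specific vocabulary is left open here, since the abstract does not state one.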
Keywords :
binomial distribution; classification; text analysis; vocabulary; categorization scheme; classification scheme; document weight terms characterization; naive Bayes classifier; specific vocabulary; standardized Z score computation; support vector machine scheme; word occurrence; Frequency measurement; Machine learning; Smoothing methods; Speech; Support vector machines; Text categorization; Vocabulary; Lexical Analysis; Machine Learning; Natural Language Processing; Political Discourse; Text Categorization
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2011 IEEE/WIC/ACM International Conference on
Conference_Location :
Lyon
Print_ISBN :
978-1-4577-1373-6
Electronic_ISBN :
978-0-7695-4513-4
DOI :
10.1109/WI-IAT.2011.19