Title :
Application of the Character-Level Statistical Method in Text Categorization
Author :
Yang, Zhen ; Nie, Xiangfei ; Xu, Weiran ; Guo, Jun
Author_Institution :
Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
Abstract :
It is generally thought that semantic and grammatical information is essential to understanding and processing text. In simple text categorization tasks, however, the absence of this information does not always degrade classifier performance. In this paper, we discuss the application of a character-level statistical method to text categorization, which extracts character-level frequent patterns rather than considering semantic and grammatical information. Compared with the traditional n-gram model, the presented method is simple and convenient. By casting the character-level statistical method in a Bayesian framework, we apply it to spam detection. Finally, we discuss the multiclass problem in short message categorization based on combination strategies. The effectiveness of the models and the feasibility of the presented method are verified.
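The abstract does not specify how character-level patterns are extracted or scored, so the following is only a minimal sketch of the general idea of character-level statistics in a Bayesian framework, not the authors' method. It assumes single characters as features, a multinomial naive Bayes model with Laplace smoothing, and a tiny invented training set; the class name `CharNaiveBayes`, the parameter `alpha`, and the toy messages are all hypothetical.

```python
import math
from collections import Counter


class CharNaiveBayes:
    """Multinomial naive Bayes over single characters (illustrative assumption)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # Laplace smoothing constant (assumed value)
        self.char_counts = {}         # label -> Counter of character frequencies
        self.doc_counts = Counter()   # label -> number of training messages
        self.vocab = set()            # every character seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            self.char_counts.setdefault(label, Counter()).update(text)
            self.vocab.update(text)
        return self

    def _log_score(self, text, label):
        counts = self.char_counts[label]
        total = sum(counts.values())
        slots = len(self.vocab) + 1   # one extra slot for unseen characters
        # Log prior from class frequencies, then add per-character log likelihoods.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for ch in text:
            score += math.log((counts[ch] + self.alpha) / (total + self.alpha * slots))
        return score

    def predict(self, text):
        # Pick the label with the highest posterior log score.
        return max(self.doc_counts, key=lambda label: self._log_score(text, label))


# Toy data invented for illustration only; not from the paper.
train_texts = ["WIN $$$ NOW!!!", "FREE $$$ CLICK!!!",
               "see you at lunch", "meeting moved to noon"]
train_labels = ["spam", "spam", "ham", "ham"]

model = CharNaiveBayes().fit(train_texts, train_labels)
print(model.predict("$$$ FREE!!!"))
print(model.predict("lunch at noon"))
```

No tokenization, stemming, or grammatical analysis is involved here: the classifier only counts characters, which is the property the abstract emphasizes. A multiclass setting works the same way by supplying more than two labels to `fit`.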
Keywords :
Bayes methods; natural language processing; pattern recognition; statistical analysis; text analysis; Bayesian theory; character-level frequent pattern extraction; character-level statistical method; grammatical information; semantic information; short message categorization; spam detection; text categorization; Bayesian methods; Casting; Data mining; Degradation; Feature extraction; Information processing; Natural languages; Statistical analysis; Text categorization; Text processing;
Conference_Title :
Computational Intelligence and Security, 2006 International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
1-4244-0605-6
Electronic_ISBN :
1-4244-0605-6
DOI :
10.1109/ICCIAS.2006.295293