DocumentCode :
3317683
Title :
Application of the Character-Level Statistical Method in Text Categorization
Author :
Yang, Zhen ; Nie, Xiangfei ; Xu, Weiran ; Guo, Jun
Author_Institution :
Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
Volume :
2
fYear :
2006
fDate :
3-6 Nov. 2006
Firstpage :
1412
Lastpage :
1417
Abstract :
It is generally thought that semantic and grammatical information is essential to understanding and processing text. In simple text categorization tasks, however, the absence of this information does not always degrade classifier performance. In this paper, we discuss the application of a character-level statistical method to text categorization, which extracts character-level frequent patterns rather than considering semantic and grammatical information. Compared with the traditional n-gram model, the presented method is simple and convenient. By casting the character-level statistical method in a Bayesian framework, we then apply it to spam detection. Finally, we discuss the multiclass problem in short message categorization based on combination strategies. The effectiveness of the models and the feasibility of the presented method are verified.
Keywords :
Bayes methods; natural language processing; pattern recognition; statistical analysis; text analysis; Bayesian theory; character-level frequent pattern extraction; character-level statistical method; grammatical information; semantic information; short message categorization; spam detection; text categorization; Bayesian methods; Casting; Data mining; Degradation; Feature extraction; Information processing; Natural languages; Statistical analysis; Text categorization; Text processing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Security, 2006 International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
1-4244-0605-6
Electronic_ISBN :
1-4244-0605-6
Type :
conf
DOI :
10.1109/ICCIAS.2006.295293
Filename :
4076199