Title :
Automatic Classification of Tibetan Web Pages
Author :
Xu, Guixian ; Xiang, Chuncheng ; Gao, Xu ; Zhao, Xiaobing ; Yang, Guosheng
Author_Institution :
Coll. of Inf. Eng., Minzu Univ. of China, Beijing, China
Abstract :
This paper introduces a classification approach for Tibetan web pages. It uses a class feature dictionary together with the Rocchio classification algorithm to assign Tibetan web pages to predefined classes quickly and accurately. The experimental results show that the approach achieves better classification accuracy for Tibetan web pages. The approach is useful for building statistical and rule-based classification systems for Tibetan texts, as well as for constructing a high-quality Tibetan corpus.
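The abstract does not spell out how the class feature dictionary and the Rocchio algorithm fit together. The Python sketch below shows one common form of Rocchio text classification under stated assumptions, and is not the authors' implementation: pages are assumed to be already segmented into Tibetan words (placeholder English tokens stand in for them), the hypothetical `class_dict` maps each class to its feature words and is used only to restrict the vocabulary, and `beta`/`gamma` are illustrative weights, not values from the paper.

```python
import math
from collections import Counter, defaultdict


def normalise(vec):
    """Scale a sparse vector to unit L2 length (empty if all weights are zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tf_idf_vectors(docs, vocab):
    """Build unit-length TF-IDF vectors, keeping only terms in `vocab`."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update({t for t in doc if t in vocab})
    # Smoothed IDF so that terms absent from training still get a finite weight.
    idf = {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(t for t in doc if t in vocab)
        vectors.append(normalise({t: c * idf[t] for t, c in tf.items()}))
    return vectors, idf


def train_rocchio(docs, labels, class_dict, beta=16.0, gamma=4.0):
    """Compute one prototype vector per class (hypothetical beta/gamma weights)."""
    # Assumption: the class feature dictionary defines the feature vocabulary.
    vocab = set().union(*class_dict.values())
    vectors, idf = tf_idf_vectors(docs, vocab)
    prototypes = {}
    for c in sorted(set(labels)):
        pos = [v for v, y in zip(vectors, labels) if y == c]
        neg = [v for v, y in zip(vectors, labels) if y != c]
        proto = defaultdict(float)
        for v in pos:
            for t, w in v.items():
                proto[t] += beta * w / len(pos)
        for v in neg:
            for t, w in v.items():
                proto[t] -= gamma * w / len(neg)
        # Negative weights are clipped to zero before normalising.
        prototypes[c] = normalise({t: w for t, w in proto.items() if w > 0})
    return prototypes, idf


def classify(tokens, prototypes, idf):
    """Assign a segmented page to the class with the most similar prototype."""
    tf = Counter(t for t in tokens if t in idf)
    vec = normalise({t: c * idf[t] for t, c in tf.items()})
    return max(prototypes, key=lambda c: dot(vec, prototypes[c]))


if __name__ == "__main__":
    # Toy data: placeholder tokens standing in for segmented Tibetan words.
    docs = [["team", "match", "goal"], ["goal", "team", "score"],
            ["doctor", "hospital"], ["medicine", "doctor", "patient"]]
    labels = ["sports", "sports", "health", "health"]
    class_dict = {"sports": {"team", "match", "goal", "score"},
                  "health": {"doctor", "hospital", "medicine", "patient"}}
    protos, idf = train_rocchio(docs, labels, class_dict)
    print(classify(["team", "goal", "goal"], protos, idf))
```

Each class prototype is the weighted mean of that class's training vectors minus the weighted mean of the other classes' vectors, clipped at zero; a new page goes to the class whose prototype has the highest cosine similarity with it. Because prototypes are precomputed, classifying a page costs one sparse dot product per class, which is consistent with the abstract's emphasis on speed.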
Keywords :
Web sites; natural language processing; pattern classification; statistical analysis; text analysis; Rocchio classification algorithm; Tibetan texts; automatic Tibetan Web page classification; class feature dictionary; high-quality Tibetan corpus; rule-based classification; statistical classification; Classification algorithms; Dictionaries; Information processing; Kernel; Machine learning; Text categorization; Web pages; Classification of Web Pages; Text classification; Tibetan Information Processing;
Conference_Title :
Computer Science and Electronics Engineering (ICCSEE), 2012 International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4673-0689-8
DOI :
10.1109/ICCSEE.2012.177