Title :
Based on support vector and word features new word discovery research
Author :
Chengcheng, Li ; Yuanfang, Xu
Author_Institution :
Sch. of Comput. & Inf. Eng., Inner Mongolia Normal Univ., Hohhot, China
Abstract :
Chinese word segmentation struggles with ambiguity and with the recognition of unknown words. This paper proposes new-word pattern features, together with various word-internal patterns, which are extracted and quantified from the positive and negative samples of a training corpus; a support vector machine is then trained on these features to obtain a support vector classifier for new words. On the test corpus, new-word candidates are extracted and selected with the absolute discounting method, and the word patterns extracted from the training corpus are used to quantify the candidates as feature vectors for testing with the support vector machine. A set of filtering rules is then applied to obtain the final word recognition results.
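The abstract names absolute discounting but does not say how it is used to score new-word candidates. As a rough illustration only, the sketch below shows the standard interpolated form of absolute discounting over bigrams of tokens (for Chinese text these could be single characters). The function name `absolute_discount_bigram`, the `discount` value of 0.75, and the interpolation with unigram frequencies are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def absolute_discount_bigram(tokens, discount=0.75):
    """Return a function prob(w1, w2) estimating P(w2 | w1).

    Hypothetical helper: interpolated absolute discounting, where a fixed
    discount is subtracted from every seen bigram count and the freed
    probability mass is redistributed according to unigram frequency.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # How often each token occurs as the first element of a bigram.
    history = Counter(tokens[:-1])
    # Number of distinct tokens observed after each history token.
    followers = Counter(w1 for (w1, _w2) in bigrams)
    total = len(tokens)

    def prob(w1, w2):
        p_unigram = unigrams[w2] / total
        h = history[w1]
        if h == 0:
            # Unseen history: fall back to the unigram estimate.
            return p_unigram
        discounted = max(bigrams[(w1, w2)] - discount, 0.0) / h
        backoff_weight = discount * followers[w1] / h
        return discounted + backoff_weight * p_unigram

    return prob


# Toy usage on a short token sequence.
prob = absolute_discount_bigram(list("abacab"))
score = prob("a", "b")
```

A pair of adjacent tokens with a high conditional probability in both directions would be a plausible new-word candidate, which could then be passed to a classifier; that selection step is likewise an assumption about how the pieces might fit together.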
Keywords :
natural language processing; pattern classification; support vector machines; word processing; Chinese word segmentation; absolute discounting method; negative samples; positive samples; rule filter; support vector classification; support vector machine training; test corpus training; unknown word recognition; word discovery research; word internal patterns; word mode features; word pattern extraction; Classification algorithms; Computers; Educational institutions; Feature extraction; Statistical analysis; Support vector machines; Training; Natural language processing; support vector machine; word feature; word recognition;
Conference_Title :
Computer Science and Automation Engineering (CSAE), 2012 IEEE International Conference on
Conference_Location :
Zhangjiajie
Print_ISBN :
978-1-4673-0088-9
DOI :
10.1109/CSAE.2012.6272688