DocumentCode
507801
Title
Applying the Word Acquiring Algorithm to the Pinyin-to-Character Conversion
Author
Wei, Jiang ; Li, Pang Xiu
Author_Institution
Res. Center of Inf. Manage. & Inf. Syst., Harbin Inst. of Technol., Harbin, China
Volume
4
fYear
2009
fDate
14-16 Aug. 2009
Firstpage
17
Lastpage
21
Abstract
This paper applies an information-entropy-based word acquiring algorithm to the task of Pinyin-to-character (PTC) conversion, using an artificial immune network model. First, the artificial immune network is used to overcome the sparse data problem and the independent and identically distributed (i.i.d.) assumption. Second, a word acquiring algorithm based on information entropy is presented to collect Chinese words and some typical word combinations. The experiments show that the method achieves better performance than the n-gram language model, an improvement that is hard to obtain with classical supervised learning models. In addition, applying the word acquiring method further improves PTC performance.
Keywords
artificial immune systems; learning (artificial intelligence); word processing; Chinese word; Pinyin-to-character conversion; artificial immune network model; independent identical distribution; information entropy; sparse data problem; supervised learning models; word acquiring algorithm; word acquiring method; Dictionaries; Electronic mail; Error analysis; Feedback; Information entropy; Information management; Management information systems; Natural languages; Supervised learning; Support vector machines; Information Entropy; Pinyin-to-Character Conversion; Word Acquiring Algorithm;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Computation, 2009. ICNC '09. Fifth International Conference on
Conference_Location
Tianjin
Print_ISBN
978-0-7695-3736-8
Type
conf
DOI
10.1109/ICNC.2009.568
Filename
5363161