  • DocumentCode
    507801
  • Title
    Applying the Word Acquiring Algorithm to the Pinyin-to-Character Conversion
  • Author
    Wei, Jiang ; Li, Pang Xiu
  • Author_Institution
    Res. Center of Inf. Manage. & Inf. Syst., Harbin Inst. of Technol., Harbin, China
  • Volume
    4
  • fYear
    2009
  • fDate
    14-16 Aug. 2009
  • Firstpage
    17
  • Lastpage
    21
  • Abstract
    This paper applies an information-entropy-based word acquiring algorithm to Pinyin-to-character (PTC) conversion built on an artificial immune network model. First, the artificial immune network is used to overcome the sparse data problem and the independent and identically distributed (i.i.d.) assumption. Second, a word acquiring algorithm based on information entropy is presented to collect Chinese words and some typical word combinations. The experiments show that the method achieves better performance than the n-gram language model, an improvement that is hard to obtain with classical supervised learning models. In addition, applying the word acquiring method further improves PTC performance.
  • Keywords
    artificial immune systems; learning (artificial intelligence); word processing; Chinese word; Pinyin-to-character conversion; artificial immune network model; independent identical distribution; information entropy; sparse data problem; supervised learning models; word acquiring algorithm; word acquiring method; Dictionaries; Electronic mail; Error analysis; Feedback; Information entropy; Information management; Management information systems; Natural languages; Supervised learning; Support vector machines; Information Entropy; Pinyin-to-Character Conversion; Word Acquiring Algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Computation, 2009. ICNC '09. Fifth International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-0-7695-3736-8
  • Type
    conf
  • DOI
    10.1109/ICNC.2009.568
  • Filename
    5363161