• DocumentCode
    2729232
  • Title

    Tibetan word segmentation system based on conditional random fields

  • Author

    Jiang, Tao ; Yu, Hongzhi ; Jam, Yangkyi

  • Author_Institution
    Key Lab. of China's Nat. Linguistic Inf. Technol., Northwest Univ. for Nat., Lanzhou, China
  • fYear
    2011
  • fDate
    15-17 July 2011
  • Firstpage
    446
  • Lastpage
    448
  • Abstract
    Unlike English and other Western languages, neither Chinese nor Tibetan uses delimiters to mark word boundaries. Word segmentation is therefore the first step in Chinese and Tibetan natural language processing tasks such as machine translation and information retrieval. Chinese word segmentation has been studied for many years, and the technology is relatively mature. In contrast, Tibetan word segmentation has received less attention from researchers. In this paper, we draw on Chinese word segmentation approaches, analyze the characteristics of the Tibetan language, and design a Tibetan word segmentation system based on conditional random fields. Experiments show that the algorithm is effective and suitable for preliminary application.
  • Keywords
    image segmentation; natural language processing; random processes; word processing; Chinese word segmentation; Tibetan natural language processing; Tibetan word segmentation; conditional random fields; information retrieval; machine translation; Dictionaries; Feature extraction; Hidden Markov models; Laboratories; Markov processes; Natural language processing; Tagging
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering and Service Science (ICSESS), 2011 IEEE 2nd International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-9699-0
  • Type

    conf

  • DOI
    10.1109/ICSESS.2011.5982349
  • Filename
    5982349