• DocumentCode
    3304046
  • Title
    A Hybrid Approach for Part-of-Speech Tagging of Burmese Texts
  • Author
    Myint, Cynthia
  • Author_Institution
    Univ. of Comput. Studies, Mandalay, Myanmar
  • fYear
    2011
  • fDate
    19-21 May 2011
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    In a Myanmar-to-English translation system, producing a meaningful sentence in the target language is a non-trivial task. Part-of-speech (POS) tagging is used as an early stage of linguistic text analysis in many applications. It is the process of assigning the correct syntactic category to each word. Tagsets and word disambiguation rules are fundamental parts of any POS tagger. This paper presents a new approach to POS tagging of the Myanmar language. First, the user inputs a simple Myanmar sentence, which is segmented into words using segmentation rules. These words are then assigned to appropriate syntactic categories of the Myanmar language using a combination of rule-based and probabilistic approaches. The system applies the conditional random field (CRF) method to resolve POS ambiguities on words. CRF is a framework for building discriminative probabilistic models for segmenting and labeling sequential data. The tagsets for Myanmar POS, the segmentation rules, the tagging algorithm, and the CRF method are designed. The proposed approach uses the UCSM Lexicon. This hybrid approach to POS tagging can therefore improve the accuracy and robustness of a machine translation system.
  • Keywords
    knowledge based systems; language translation; probability; text analysis; Burmese texts; CRF method; English; Myanmar language; Myanmar sentence; UCSM Lexicon; language translation system; linguistic text analysis; part-of-speech tagging; probabilistic approach; rule based approach; segmentation rules; syntactic categories; Accuracy; Grammar; Probabilistic logic; Speech; Syntactics; Tagging; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Management (CAMAN), 2011 International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-9282-4
  • Type
    conf
  • DOI
    10.1109/CAMAN.2011.5778890
  • Filename
    5778890