• DocumentCode
    1954287
  • Title

    A Letter Tagging Approach to Uyghur Tokenization

  • Author

    Aisha, Batuer

  • fYear
    2010
  • fDate
    28-30 Dec. 2010
  • Firstpage
    11
  • Lastpage
    14
  • Abstract
    In this paper, we present a letter tagging approach (LTA) to Uyghur tokenization. Experiments show that the label bias problem (arising from rich and complex suffixes) can be resolved by combining LTA with conditional random fields (CRFs), making the approach more effective than previous work; word tokenization accuracy reaches 93.3%. In the future, our tokenization research should be useful for information processing in other Altaic languages.
  • Keywords
    identification technology; natural language processing; Altaic language information processing; label bias; letter tagging; Uyghur tokenization; word tokenization; Accuracy; Hidden Markov models; Information processing; Labeling; Natural language processing; Tagging; Training; Letter tagging approach; Morpheme analysis (MA); Tokenization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2010 International Conference on
  • Conference_Location
    Harbin
  • Print_ISBN
    978-1-4244-9063-9
  • Type

    conf

  • DOI
    10.1109/IALP.2010.72
  • Filename
    5681556