• DocumentCode
    3105041
  • Title

    Korean Spacing by Improving Viterbi Segmentation

  • Author

    Hong, Gumwon ; Rim, Hae-Chang

  • fYear
    2007
  • fDate
    22-24 Aug. 2007
  • Firstpage
    75
  • Lastpage
    80
  • Abstract
    This paper presents a Korean spacing approach which employs an improved Viterbi segmentation model. Traditional Viterbi segmentation using the word unigram language model is simple and fast, but has two problems: data sparseness and an improper preference for fewer segments. To overcome these limitations, the segmentation model is extended by employing a split probability based on character bigrams. Contextual information is selectively used for further resolution of spacing ambiguities without much increase in complexity. Experimental results show that the extended model performs better than the traditional segmentation model. Furthermore, compared to the state-of-the-art system, our approach achieves better efficiency in terms of processing time without losing significant accuracy.
  • Keywords
    Hidden Markov models; Humans; Information technology; Natural languages; Statistics; Tagging; Text processing; Viterbi algorithm; spacing; segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on
  • Conference_Location
    Luoyang, Henan, China
  • Print_ISBN
    978-0-7695-2930-1
  • Type

    conf

  • DOI
    10.1109/ALPIT.2007.84
  • Filename
    4460618