Korean Spacing by Improving Viterbi Segmentation

Author

Hong, Gumwon; Rim, Hae-Chang

fYear

2007

fDate

22-24 Aug. 2007

Firstpage

Lastpage

Abstract

This paper presents a Korean spacing approach that employs an improved Viterbi segmentation model. Traditional Viterbi segmentation using a word unigram language model is simple and fast, but it has two problems: data sparseness and an improper preference for fewer segments. To overcome these limitations, the segmentation model is extended with a split probability based on character bigrams. Contextual information is used selectively to further resolve spacing ambiguities without significantly increasing complexity. Experimental results show that the extended model performs better than the traditional segmentation model. Furthermore, compared to the state-of-the-art system, our approach achieves better efficiency in terms of processing time without a significant loss of accuracy.
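The following Python sketch is a rough illustration of the baseline the abstract refers to. It runs Viterbi segmentation with a word (eojeol) unigram model over an unspaced string. It is not the authors' implementation. The function name `viterbi_segment`, the unknown-character penalty `unk_logp`, the `max_word_len` limit, and every probability in `UNIGRAM` are hypothetical. The improved model's character-bigram split probability is not reproduced here, because the abstract does not give its formulation.

```python
import math


def viterbi_segment(text, unigram, max_word_len=10, unk_logp=-20.0):
    """Split an unspaced string into words, maximizing the sum of word log-probabilities.

    Sketch of a plain word-unigram Viterbi segmenter (hypothetical helper).
    Unknown single characters receive a flat penalty so a path always exists.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of segmenting text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word ending at i
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            word = text[j:i]
            if word in unigram:
                lp = math.log(unigram[word])
            elif i - j == 1:
                lp = unk_logp  # hypothetical penalty for an unseen single character
            else:
                continue
            # Each extra segment adds another log p <= 0 term, which is the
            # source of the unigram model's bias toward fewer segments.
            score = best[j] + lp
            if score > best[i]:
                best[i] = score
                back[i] = j
    # Follow back-pointers from the end of the string to recover the words.
    words = []
    i = n
    while i > 0:
        j = back[i]
        words.append(text[j:i])
        i = j
    return list(reversed(words))


# Hypothetical eojeol unigram probabilities, invented for illustration only.
UNIGRAM = {
    "아버지가": 0.002,
    "아버지": 0.004,
    "방에": 0.003,
    "가방에": 0.002,
    "들어가신다": 0.001,
}

if __name__ == "__main__":
    print(" ".join(viterbi_segment("아버지가방에들어가신다", UNIGRAM)))
```

With these invented probabilities the sketch prints `아버지 가방에 들어가신다` ("Father enters the bag") rather than the intended `아버지가 방에 들어가신다` ("Father enters the room"). The unigram score looks at each word in isolation, so it cannot use surrounding context to resolve this kind of ambiguity.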

Keywords

Hidden Markov models; Humans; Information technology; Natural languages; Statistics; Tagging; Text processing; Viterbi algorithm; spacing; segmentation; Viterbi algorithm

fLanguage

English

Publisher

IEEE

Conference_Title

Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on

Conference_Location

Luoyang, Henan, China

Print_ISBN

978-0-7695-2930-1

Type

conf

DOI

10.1109/ALPIT.2007.84

Filename

4460618

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3105041