DocumentCode
3105041
Title
Korean Spacing by Improving Viterbi Segmentation
Author
Hong, Gumwon ; Rim, Hae-Chang
fYear
2007
fDate
22-24 Aug. 2007
Firstpage
75
Lastpage
80
Abstract
This paper presents a Korean spacing approach that employs an improved Viterbi segmentation model. Traditional Viterbi segmentation using a word unigram language model is simple and fast, but has two problems: data sparseness and an improper preference for fewer segments. To overcome these limitations, the segmentation model is extended with a split probability based on character bigrams. Contextual information is used selectively to further resolve spacing ambiguities without much increase in complexity. Experimental results show that the extended model performs better than the traditional segmentation model. Furthermore, compared to the state-of-the-art system, our approach achieves better efficiency in terms of processing time without a significant loss of accuracy.
Keywords
Hidden Markov models; Humans; Information technology; Natural languages; Statistics; Tagging; Text processing; Viterbi algorithm; spacing; segmentation
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on
Conference_Location
Luoyang, Henan, China
Print_ISBN
978-0-7695-2930-1
Type
conf
DOI
10.1109/ALPIT.2007.84
Filename
4460618