DocumentCode :
3105041
Title :
Korean Spacing by Improving Viterbi Segmentation
Author :
Hong, Gumwon ; Rim, Hae-Chang
fYear :
2007
fDate :
22-24 Aug. 2007
Firstpage :
75
Lastpage :
80
Abstract :
This paper presents a Korean spacing approach which employs an improved Viterbi segmentation model. Traditional Viterbi segmentation using the word unigram language model is simple and fast, but has two problems: data sparseness and an improper preference for fewer segments. To overcome these limitations, the segmentation model is extended by employing a split probability based on character bigrams. Contextual information is selectively used for further resolution of spacing ambiguities without much increase in complexity. Experimental results show that the extended model performs better than the traditional segmentation model. Furthermore, compared to the state-of-the-art system, our approach achieves better efficiency in terms of processing time without losing significant accuracy.
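The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea it describes: Viterbi segmentation over word unigram log-probabilities, with each character boundary additionally scored by a split probability looked up by character bigram. The function name `viterbi_space`, the parameters `word_logp`, `split_p`, `unk_logp`, `max_len`, and `default_split`, the exact way the two scores are combined, and the toy data are all assumptions made for illustration, not the authors' formulation.

```python
import math

def viterbi_space(text, word_logp, split_p,
                  unk_logp=-20.0, max_len=6, default_split=0.5):
    """Segment unspaced `text` into words (hypothetical sketch).

    word_logp: dict word -> log unigram probability
    split_p:   dict two-character string -> P(space between the two characters)
    """
    n = len(text)
    best = [-math.inf] * (n + 1)   # best[i]: best log-score of text[:i]
    back = [0] * (n + 1)           # back[i]: start index of the last word
    best[0] = 0.0

    def boundary(i, split):
        # Log-probability of a split (or no split) between text[i-1] and text[i].
        p = split_p.get(text[i - 1:i + 1], default_split)
        p = min(max(p, 1e-6), 1.0 - 1e-6)   # keep log() finite
        return math.log(p if split else 1.0 - p)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_logp:
                score = word_logp[word]
            elif len(word) == 1:
                score = unk_logp            # fallback so every position is reachable
            else:
                continue
            # No split at the boundaries inside the candidate word.
            for i in range(start + 1, end):
                score += boundary(i, False)
            # A split at the word's right edge, unless it ends the text.
            if end < n:
                score += boundary(end, True)
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start

    # Follow back-pointers from the end to recover the word sequence.
    words, end = [], n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]

# Toy data (made up): the unigram scores alone favour the single long word.
toy_logp = {"ab": math.log(0.4), "cd": math.log(0.4), "abcd": math.log(0.3)}
print(viterbi_space("abcd", toy_logp, {}))           # ['abcd']
print(viterbi_space("abcd", toy_logp, {"bc": 0.9}))  # ['ab', 'cd']
```

In the toy run, the unigram model with uninformative boundaries keeps `abcd` as one segment, which illustrates the preference for fewer segments mentioned in the abstract; a high split probability for the bigram `bc` is enough to recover the two-word reading.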
Keywords :
Hidden Markov models; Humans; Information technology; Natural languages; Statistics; Tagging; Text processing; Viterbi algorithm; spacing; segmentation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on
Conference_Location :
Luoyang, Henan, China
Print_ISBN :
978-0-7695-2930-1
Type :
conf
DOI :
10.1109/ALPIT.2007.84
Filename :
4460618