DocumentCode :
1843588
Title :
A Stochastic Technique to Obtain Training Data for Word Segmentation
Author :
Fukuda, Takuya ; Miura, Takao
Volume :
3
fYear :
2009
fDate :
15-18 Sept. 2009
Firstpage :
283
Lastpage :
286
Abstract :
Unlike Western languages, Japanese has no explicit word boundaries, which often makes analyzing Japanese documents difficult. The difficulty is greater in specialized domains such as medical, mechanical, and computer science documents. In this work, we discuss how to obtain a pseudo test corpus using the Markov chain Monte Carlo (MCMC) method, given a small amount of test data. We show that our approach gives good results in this setting.
Keywords :
Stochastic processes; Training data; Markov Chain Monte Carlo (MCMC) method; Stochastic Techniques; Word Segmentation;
fLanguage :
English
Publisher :
iet
Conference_Titel :
Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Milan, Italy
Print_ISBN :
978-0-7695-3801-3
Electronic_ISBN :
978-1-4244-5331-3
Type :
conf
DOI :
10.1109/WI-IAT.2009.283
Filename :
5285030