DocumentCode
1843588
Title
A Stochastic Technique to Obtain Training Data for Word Segmentation
Author
Fukuda, Takuya ; Miura, Takao
Volume
3
fYear
2009
fDate
15-18 Sept. 2009
Firstpage
283
Lastpage
286
Abstract
Unlike Western languages, Japanese text has no explicit word boundaries, which is why analyzing Japanese documents is often difficult. The difficulty is even greater in specialized domains such as medical, mechanical, and computer science documents. In this work, we discuss how to obtain a pseudo test corpus using the Markov chain Monte Carlo (MCMC) method, given only a small amount of test data. In this setting, our approach yields good results.
Keywords
Stochastic processes; Training data; Markov Chain Monte Carlo (MCMC) method; Stochastic Techniques; Word Segmentation;
fLanguage
English
Publisher
iet
Conference_Titel
Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on
Conference_Location
Milan, Italy
Print_ISBN
978-0-7695-3801-3
Electronic_ISBN
978-1-4244-5331-3
Type
conf
DOI
10.1109/WI-IAT.2009.283
Filename
5285030