A Stochastic Technique to Obtain Training Data for Word Segmentation

Author

Fukuda, Takuya; Miura, Takao

Volume

fYear

2009

fDate

15-18 Sept. 2009

Firstpage

283

Lastpage

286

Abstract

Unlike Western languages, Japanese has no explicit word boundaries. This often makes Japanese documents hard to analyze. The difficulty is greater in specialized domains such as medical, mechanical, and computer science documents. In this work, we discuss how to obtain a pseudo test corpus using the Markov Chain Monte Carlo (MCMC) method, given a small amount of test data. In this setting, we show that our approach gives good results.

Keywords

Stochastic processes; Training data; Markov Chain Monte Carlo (MCMC) method; Stochastic Techniques; Word Segmentation

fLanguage

English

Publisher

iet

Conference_Title

Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on

Conference_Location

Milan, Italy

Print_ISBN

978-0-7695-3801-3

Electronic_ISBN

978-1-4244-5331-3

Type

conf

DOI

10.1109/WI-IAT.2009.283

Filename

5285030

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=1843588