• DocumentCode
    1843588
  • Title
    A Stochastic Technique to Obtain Training Data for Word Segmentation
  • Author
    Fukuda, Takuya ; Miura, Takao
  • Volume
    3
  • fYear
    2009
  • fDate
    15-18 Sept. 2009
  • Firstpage
    283
  • Lastpage
    286
  • Abstract
    Unlike Western languages, Japanese has no explicit word boundaries, which often makes Japanese documents hard to analyze. The difficulty is greater in specialized domains such as medical, mechanical, and computer science documents. In this work, we discuss how to obtain a pseudo test corpus using a Markov Chain Monte Carlo (MCMC) method, given a small amount of test data. In this setting, we show good results using our approach.
  • Keywords
    Stochastic processes; Training data; Markov Chain Monte Carlo (MCMC) method; Stochastic Techniques; Word Segmentation
  • fLanguage
    English
  • Publisher
    iet
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on
  • Conference_Location
    Milan, Italy
  • Print_ISBN
    978-0-7695-3801-3
  • Electronic_ISBN
    978-1-4244-5331-3
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2009.283
  • Filename
    5285030