DocumentCode :
2648742
Title :
Term Extraction from Japanese Ancient Writings Using Probability of Character N-grams
Author :
Kimura, Fuminori ; Yoshimura, Mamoru ; Maeda, Akira
Author_Institution :
Coll. of Inf. Sci. & Eng., Ritsumeikan Univ., Kusatsu, Japan
fYear :
2011
fDate :
20-22 Oct. 2011
Firstpage :
183
Lastpage :
184
Abstract :
Currently, no tools are available for segmenting ancient Japanese sentences into words, which makes it difficult to extract archaic Japanese terms from Japanese ancient writings. In this paper, we propose a term extraction method for ancient Japanese documents. We calculate the likelihood that each character n-gram is a word and extract the n-grams with higher likelihood as archaic Japanese terms. We conducted term separation experiments using the term likelihood computed by the proposed method.
Keywords :
document handling; natural language processing; Japanese ancient writings; ancient Japanese documents; ancient Japanese sentence separation; archaic Japanese term extraction; character n-grams probability; term likelihood; term separation; Adaptation models; Data mining; Educational institutions; Information science; Probability; Stochastic processes; Writing; character n-gram; term extraction
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Culture and Computing (Culture Computing), 2011 Second International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4577-1593-8
Type :
conf
DOI :
10.1109/Culture-Computing.2011.56
Filename :
6103252