Title :
Term Extraction from Japanese Ancient Writings Using Probability of Character N-grams
Author :
Kimura, Fuminori ; Yoshimura, Mamoru ; Maeda, Akira
Author_Institution :
Coll. of Inf. Sci. & Eng., Ritsumeikan Univ., Kusatsu, Japan
Abstract :
Currently, no tools are available for segmenting ancient Japanese sentences into words, which makes it difficult to extract archaic Japanese terms from ancient Japanese writings. In this paper, we propose a term extraction method for ancient Japanese documents. We calculate the likelihood that each character n-gram is a word, and extract the character n-grams with higher likelihood as archaic Japanese terms. We conducted term separation experiments using the term likelihood computed by the proposed method.
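The idea of scoring character n-grams and keeping the high-scoring ones can be illustrated with a minimal sketch. The abstract does not give the paper's likelihood formula, so the score below (relative frequency of an n-gram among all n-grams of the same length) is only an assumed stand-in, and the names ngram_likelihoods, extract_terms, max_n, min_len, and threshold are hypothetical.

```python
from collections import Counter


def ngram_likelihoods(text, max_n=3):
    """Score each character n-gram by its relative frequency among
    all n-grams of the same length (an assumed proxy, not the paper's formula)."""
    scores = {}
    for n in range(1, max_n + 1):
        counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
        total = sum(counts.values())
        for gram, count in counts.items():
            scores[gram] = count / total
    return scores


def extract_terms(text, max_n=3, min_len=2, threshold=0.1):
    """Return (n-gram, score) pairs whose score reaches the threshold,
    highest score first; single characters are skipped by default."""
    scores = ngram_likelihoods(text, max_n)
    terms = [(g, s) for g, s in scores.items()
             if len(g) >= min_len and s >= threshold]
    return sorted(terms, key=lambda t: (-t[1], t[0]))
```

In practice the threshold and the maximum n-gram length would have to be tuned on the target corpus; a plain frequency score like this one tends to favor short, common character sequences, which is presumably one reason a more careful likelihood estimate is needed.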
Keywords :
document handling; natural language processing; Japanese ancient writings; ancient Japanese documents; ancient Japanese sentence separation; archaic Japanese term extraction; character n-grams probability; term likelihood; term separation; Adaptation models; Data mining; Educational institutions; Information science; Probability; Stochastic processes; Writing; character n-gram; term extraction
Conference_Title :
Culture and Computing (Culture Computing), 2011 Second International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4577-1593-8
DOI :
10.1109/Culture-Computing.2011.56