DocumentCode :
2406550
Title :
Linguistics-oriented language resource development at the National Institute for Japanese Language and Linguistics
Author :
Maekawa, Kikuo
Author_Institution :
Dept. Corpus Studies, Nat. Inst. for Japanese Language & Linguistics, Japan
fYear :
2011
fDate :
26-28 Oct. 2011
Firstpage :
1
Lastpage :
6
Abstract :
This talk introduces the language-resource-related activities of the National Institute for Japanese Language and Linguistics (NINJAL). Since the second half of the 1990s, the former National Language Research Institute (NLRI) played a central role in the development of Japanese language resources by constructing corpora such as the Corpus of Spontaneous Japanese (CSJ) and the Taiyo Corpus. In 2006, the language resource group of NLRI started a Japanese corpus compilation initiative named KOTONOHA and set about constructing the 100-million-word Balanced Corpus of Contemporary Written Japanese (BCCWJ). The activities of NLRI were inherited by the NINJAL Center for Corpus Development, reestablished in 2009. With the construction of the BCCWJ successfully completed in August 2011, the NINJAL center has set about two new exploratory projects: a historical corpus project and a 10-billion-word ultra-large-scale Web-based corpus project. Before presenting the NLRI-NINJAL activities, the talk briefly introduces language resource development at Japanese institutions other than NINJAL. It closes by demonstrating an application of the CSJ to the study of phonetics.
Keywords :
Internet; linguistics; natural language processing; speech processing; 10-billion-word ultra-large-scale Web-based corpus project; BCCWJ; Balanced Corpus of Contemporary Written Japanese; CSJ; Corpus of Spontaneous Japanese; Japanese corpus compilation initiative; Japanese language resource development; KOTONOHA; NINJAL Center for Corpus Development; NLRI; National Institute for Japanese Language and Linguistics; National Language Research Institute; Taiyo Corpus; linguistics-oriented language resource development; Decision support systems; Helium; Corpus; NINJAL
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Speech Database and Assessments (Oriental COCOSDA), 2011 International Conference on
Conference_Location :
Hsinchu
Print_ISBN :
978-1-4577-0930-2
Type :
conf
DOI :
10.1109/ICSDA.2011.6085971
Filename :
6085971