Title :
Developing corpus management system for Bahasa Indonesia: the “Perisalah” project
Author :
Uliniansyah, Teduh ; Riza, Hammam ; Riandi, Oskar
Author_Institution :
Inf. & Comput. Syst., ICT Center (PTIK) Agency for the Assessment & Applic. of Technol., Jakarta, Indonesia
Abstract :
This paper reports on the research and development of an Indonesian corpus management system built as part of the Perisalah speech summarization system. Continuous improvement of speech recognition for the Indonesian language requires a better and larger monolingual corpus. We describe our method for building the speech recognition system. The system is equipped to handle variation in speech input, which provides a more natural mode of communication between the system and its users. We also discuss the data contained in our text corpus and the corpus management system, focusing on how it handles sentence segmentation and unknown words (typos).
Keywords :
audio databases; continuous improvement; natural language processing; research and development management; speech recognition; text analysis; Bahasa Indonesia; Indonesian corpus management system; Indonesian language; Perisalah project; monolingual corpus; research and development; sentence segmentation; speech summarization system; text corpus; Adaptation models; Buildings; Data models; Dictionaries; Speech; Speech processing; Corpus management system; natural language
Conference_Title :
Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
Conference_Location :
Gurgaon
DOI :
10.1109/ICSDA.2013.6709887