Phoneme alignment of Filipino speech corpus

Author

Sagum, Ramil G. ; Ensomo, Ryan A. ; Tan, Emerson M. ; Guevara, Rowena Cristina L.

Author_Institution

Dept. of Electr. & Electron. Eng., Philippines Univ., Quezon City, Philippines

Volume

3

fYear

2003

fDate

15-17 Oct. 2003

Firstpage

964

Abstract

Segmentation and transcription of a speech corpus is a prerequisite in the development of an automatic speech recognition (ASR) system. In this paper, we develop a method for automatically segmenting and transcribing the Filipino speech corpus that is being developed at the DSP laboratory. The system is based on a multi-layer perceptron (MLP) and a start synchronous decoder. The MLP takes speech feature inputs and multiplies them by weights computed from a training set of labeled speech. The corpus was divided into three sub-corpora: the paragraphs and sentences sub-corpus (par+sen), the words sub-corpus, and the syllables sub-corpus. For the par+sen sub-corpus, we obtained a 62.64% phoneme recognition rate with 75.68% of labels within 20 ms of hand-labeled transcriptions; for the words sub-corpus, a 63.93% phoneme recognition rate with 72.38% within 20 ms of hand-labeled transcriptions; and for the syllables sub-corpus, a 72.60% phoneme recognition rate with 75.69% within 20 ms of hand-labeled transcriptions.
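
The "within 20 ms of hand-labeled transcriptions" figures can be read as a boundary-agreement rate. The following is a minimal sketch of one way such a rate might be computed, not the authors' evaluation code. It assumes the automatic and hand-labeled boundaries are already paired one-to-one and that the tolerance is inclusive; the function name `fraction_within_tolerance` and the sample times are hypothetical.

```python
def fraction_within_tolerance(auto_ms, hand_ms, tol_ms=20.0):
    """Fraction of automatic boundaries within tol_ms of the paired hand label.

    Assumes auto_ms[i] and hand_ms[i] refer to the same phoneme boundary,
    with both times given in milliseconds.
    """
    if len(auto_ms) != len(hand_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not auto_ms:
        return 0.0
    # Count boundaries whose absolute deviation is at most the tolerance
    # (inclusive comparison is an assumption, not stated in the abstract).
    hits = sum(1 for a, h in zip(auto_ms, hand_ms) if abs(a - h) <= tol_ms)
    return hits / len(auto_ms)


# Hypothetical boundary times in milliseconds (not data from the paper).
auto = [100.0, 250.0, 410.0, 600.0]
hand = [110.0, 245.0, 440.0, 585.0]
agreement = fraction_within_tolerance(auto, hand)
print(f"{agreement:.2%} of boundaries within 20 ms")
```

In this toy input, three of the four boundaries deviate by 20 ms or less, so the agreement rate is 75%.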

Keywords

multilayer perceptrons; natural languages; speech processing; speech recognition; Filipino speech corpus; automatic speech recognition system; multilayer perceptron; phoneme alignment; start synchronous decoder; Automatic speech recognition; Decoding; Digital signal processing; Hidden Markov models; Laboratories; Multilayer perceptrons; Natural languages; Neural networks; Speech processing; Speech recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

TENCON 2003. Conference on Convergent Technologies for the Asia-Pacific Region

Print_ISBN

0-7803-8162-9

Type

conf

DOI

10.1109/TENCON.2003.1273390

Filename

1273390