Regional Information Center for Science and Technology - A comparison of phone and grapheme-based spoken term detection

DocumentCode :

3426789

Title :

A comparison of phone and grapheme-based spoken term detection

Author :

Wang, Dong ; Frankel, Joe ; Tejedor, Javier ; King, Simon

Author_Institution :

Centre for Speech Technol. Res., Univ. of Edinburgh, Edinburgh

fYear :

2008

fDate :

March 31, 2008 - April 4, 2008

Firstpage :

4969

Lastpage :

4972

Abstract :

We propose grapheme-based sub-word units for spoken term detection (STD). Compared to phones, graphemes have a number of potential advantages. For out-of-vocabulary search terms, phone-based approaches must generate a pronunciation using letter-to-sound rules. Using graphemes obviates this potentially error-prone hard decision, shifting pronunciation modelling into the statistical models describing the observation space. In addition, long-span grapheme language models can be trained directly from large text corpora. We present experiments on Spanish and English data, comparing phone and grapheme-based STD. For Spanish, where phone and grapheme-based systems give similar transcription word error rates (WERs), grapheme-based STD significantly outperforms a phone-based approach. The converse is found for English, where the phone-based system outperforms a grapheme approach. However, we present additional analysis which suggests that phone-based STD performance levels may be achieved by a grapheme-based approach despite lower transcription accuracy, and that the two approaches may usefully be combined. We propose a number of directions for future development of these ideas, and suggest that if grapheme-based STD can match phone-based performance, the inherent flexibility in dealing with out-of-vocabulary terms makes this a desirable approach.

Keywords :

speech processing; statistical analysis; error-prone hard decision; grapheme-based spoken term detection; grapheme-based subword units; letter-to-sound rules; long-span grapheme language model; out-of-vocabulary search terms; phone spoken term detection; pronunciation modelling; statistical model; transcription word error rates; Computer errors; Error analysis; Humans; Information retrieval; Laboratories; Lattices; Natural languages; Performance analysis; Speech; Vocabulary; Spoken term detection; graphemes;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on

Conference_Location :

Las Vegas, NV

ISSN :

1520-6149

Print_ISBN :

978-1-4244-1483-3

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2008.4518773

Filename :

4518773

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3426789