Title :
Statistical models for topic identification using phoneme substrings
Author :
Wright, Jerry H. ; Carey, Michael J. ; Parris, Eluned S.
Author_Institution :
Ensigma Ltd., Chepstow, UK
Abstract :
Phoneme substrings that recur within training data are detected and logged using dynamic programming procedures. The resulting keystrings (cluster centroids) are each given a usefulness rating based on their smoothed occurrence probabilities in wanted and unwanted data. For three language pairs from the OGI multi-language corpus, the rankings of the keystrings by usefulness, measured on training, development-test and final-test data, are highly consistent, showing that language-specific features are being found. Statistical measures of local association also suggest that keystring occurrences can be correlated in a manner similar to that of keywords for a particular topic. With improved recognition accuracy, it should be possible to exploit this information to enhance topic-identification performance.
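The abstract does not give the exact matching criterion or rating formula, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes (1) a keystring "occurs" in an utterance when some substring of the utterance's phoneme sequence lies within a small edit distance of it, computed by Sellers-style approximate-substring dynamic programming; (2) usefulness is the log ratio of add-alpha smoothed occurrence probabilities in wanted versus unwanted data; and (3) rank consistency across data sets is measured with Spearman's rank correlation. All function names, the threshold `max_dist` and the smoothing constant `alpha` are hypothetical.

```python
import math
from typing import List, Sequence


def approx_substring_distance(key: Sequence[str], utt: Sequence[str]) -> int:
    """Minimum edit distance between `key` and any substring of `utt`.

    Sellers-style dynamic programming: the row for the empty key prefix is
    all zeros, so a match may start anywhere in `utt`, and the minimum over
    the final row lets it end anywhere.
    """
    prev = [0] * (len(utt) + 1)
    for i, k in enumerate(key, start=1):
        curr = [i] + [0] * len(utt)  # cost of deleting the first i key symbols
        for j, u in enumerate(utt, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a key symbol
                curr[j - 1] + 1,         # insert an utterance symbol
                prev[j - 1] + (k != u),  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return min(prev)


def occurs(key, utt, max_dist=1) -> bool:
    """Hypothetical occurrence test: a close-enough substring exists."""
    return approx_substring_distance(key, utt) <= max_dist


def usefulness(key, wanted, unwanted, max_dist=1, alpha=1.0) -> float:
    """Hypothetical rating: log ratio of add-alpha smoothed occurrence probabilities.

    Positive values favour the wanted class, negative the unwanted one.
    """
    n_w = sum(occurs(key, u, max_dist) for u in wanted)
    n_u = sum(occurs(key, u, max_dist) for u in unwanted)
    p_w = (n_w + alpha) / (len(wanted) + 2 * alpha)
    p_u = (n_u + alpha) / (len(unwanted) + 2 * alpha)
    return math.log(p_w / p_u)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Undefined (division by zero) if either input has all values equal.
    """
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Toy data: letters stand in for phoneme labels (not OGI corpus data).
    keys = [list("silv"), list("mersi"), list("thank")]
    train = ([list("silvuple"), list("mersibo")], [list("thankyu"), list("pliz")])
    dev = ([list("mersi"), list("silvuple")], [list("thanks")])
    s_train = [usefulness(k, *train) for k in keys]
    s_dev = [usefulness(k, *dev) for k in keys]
    print("train:", s_train)
    print("dev:  ", s_dev)
    print("rank consistency:", spearman(s_train, s_dev))
```

Matching by edit distance rather than exact identity is what lets a keystring behave like a cluster centroid: nearby strings produced by recognition errors still count as occurrences, with `max_dist` controlling how much variation is absorbed. Smoothing keeps the rating finite when a keystring never appears in one class.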
Keywords :
correlation methods; dynamic programming; probability; smoothing methods; speech processing; speech recognition; statistical analysis; OGI multilanguage corpus; cluster centroids; correlation; development test; dynamic programming; keystrings; language-pairs; local association; performance; phoneme substrings; recognition accuracy; smoothed occurrence probabilities; statistical measures; statistical models; test data; topic identification; training data; usefulness rating; Cepstral analysis; Dynamic programming; Filter bank; Hidden Markov models; Mathematics; Parameter estimation; Speech recognition; Testing; Training data; Vocabulary;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
0-7803-3192-3
DOI :
10.1109/ICASSP.1996.540419