Regional Information Center for Science and Technology (RICeST) - Multilevel speech intelligibility for robust speaker recognition

DocumentCode :

3163504

Title :

Multilevel speech intelligibility for robust speaker recognition

Author :

Nemala, Sridhar Krishna ; Elhilali, Mounya

Author_Institution :

Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

4393

Lastpage :

4396

Abstract :

In the real world, natural conversational speech is an amalgam of speech segments, silences and environmental/background and channel effects. Labeling the different regions of an acoustic signal according to their information levels would greatly benefit all automatic speech processing tasks. In the current work, we propose a novel segmentation approach based on a perception-based measure of speech intelligibility. Unlike segmentation approaches based on various forms of voice-activity detection (VAD), the proposed parsing approach exploits higher-level perceptual information about signal intelligibility levels. This labeling information is integrated into a novel multilevel framework for the automatic speaker recognition task. The system processes the input acoustic signal along independent streams reflecting various levels of intelligibility and then fuses the decision scores from the multiple streams according to their intelligibility contribution. Our results show that the proposed system achieves significant improvements over standard baseline and VAD-based approaches, and attains a performance similar to the one obtained with oracle speech segmentation information.
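
Illustration :

The score-fusion step described in the abstract can be pictured with a minimal Python sketch. The abstract does not state the actual fusion rule, so the weighted average below is only an assumption, and the names `fuse_scores`, `stream_scores` and `stream_weights` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def fuse_scores(stream_scores: Sequence[float],
                stream_weights: Sequence[float]) -> float:
    """Combine per-stream decision scores into a single score.

    Assumption (not from the paper): fusion is a weighted average,
    where each weight stands for the intelligibility contribution
    of the corresponding stream.
    """
    if len(stream_scores) != len(stream_weights):
        raise ValueError("need exactly one weight per stream score")
    total_weight = sum(stream_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(s * w for s, w in zip(stream_scores, stream_weights))
    return weighted_sum / total_weight


# Hypothetical example: two streams, the first weighted three times
# as heavily as the second.
fused = fuse_scores([2.0, 4.0], [3.0, 1.0])
```

Normalizing by the total weight keeps the fused score on the same scale as the individual stream scores, which is one reasonable design choice among several.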

Keywords :

acoustic signal processing; speaker recognition; speech intelligibility; VAD; acoustic signal; automatic speaker recognition task; automatic speech processing task; channel effect; higher-level perceptual information; information level; multilevel speech intelligibility; natural conversational speech; parsing approach; perception-based measure; robust speaker recognition; segmentation approach; signal intelligibility level; speech segment; voice-activity detection; Acoustics; Multilevel systems; NIST; Speaker recognition; Speech; Speech processing; Speech recognition; Noise robustness; Speaker recognition; Speech intelligibility; Voice-activity detection;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6288893

Filename :

6288893

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3163504