Title :
Robust overlapped speech detection and its application in word-count estimation for Prof-Life-Log data
Author :
Shokouhi, Navid ; Ziaei, Ali ; Sangwan, Abhijeet ; Hansen, John H. L.
Author_Institution :
Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX, USA
Abstract :
The ability to estimate the number of words spoken by an individual over a given period of time is valuable in second language acquisition, healthcare, and the assessment of language development. However, establishing a robust automatic framework that achieves high accuracy is non-trivial in realistic, naturalistic scenarios because of factors such as differing conversation styles and the types of noise that appear in audio recordings, especially in multi-party conversations. In this study, we propose a noise-robust overlapped speech detection algorithm that estimates the likelihood of overlapping speech in a given audio file in the presence of environment noise. This information is embedded into a word-count estimator, which uses a linear minimum mean square estimator (LMMSE) to predict the number of words from the syllable rate. Syllables are detected using a modified version of the mrate algorithm. The proposed word-count estimator is tested on long-duration files from the Prof-Life-Log corpus. The data were recorded using a LENA recording device worn by a primary speaker in various environments and under different noise conditions. The overlap detection system significantly outperforms the baseline in noisy conditions. Furthermore, applying the overlap detection results to word-count estimation achieves a 35% relative improvement over our previous efforts, which included speech enhancement using spectral subtraction and silence removal.
Keywords :
least mean squares methods; speaker recognition; speech enhancement; speech intelligibility; LENA recording device; LMMSE; Prof-Life-Log corpus; audio file; environment noise; linear minimum mean square estimator; multi-party conversations; naturalistic scenarios; noise robust overlapped speech detection algorithm; overlap detection system; overlapping speech; primary speaker; realistic scenarios; robust automatic framework; silence removal; spectral subtraction; syllable rate; word-count estimator; Detection algorithms; Estimation; Harmonic analysis; Noise; Noise measurement; Speech; Speech processing; Massive audio data; Prof-Life-Log; Word-count estimation; overlapped speech detection
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178867