Title :
Computationally-efficient endpointing features for natural spoken interaction with personal-assistant systems
Author :
Arsikere, Harish ; Shriberg, Elizabeth ; Ozertem, Umut
Author_Institution :
Electr. Eng. Dept., Univ. of California, Los Angeles, Los Angeles, CA, USA
Abstract :
Current speech-input systems typically use a nonspeech threshold for end-of-utterance detection. While usually sufficient for short utterances, the approach can cut speakers off during pauses in more complex utterances. We elicit personal-assistant speech (reminders, calendar entries, messaging, search) using a recognizer with a dramatically increased endpoint threshold, and find frequent nonfinal pauses. A standard endpointer with a 500 ms threshold (latency) results in a 36% cutoff rate for this corpus. Based on the new data, we develop low-cost acoustic features to discriminate nonfinal from final pauses. The features capture periodicity, speaking rate, spectral constancy, duration/intensity, and pitch of prepausal speech, and use no speech recognition, speaker, or session information. Classification experiments yield a 20% equal error rate (EER) at a 100 ms latency, thereby reducing both cutoffs and latency compared with the threshold-only baseline. Additional results on computational cost, feature importance, and speaker differences are discussed.
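The threshold-only baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `endpoint_time_ms`, the 10 ms frame size, and the per-frame speech/nonspeech input are all assumptions made for illustration. The endpointer fires once the nonspeech run after speech onset reaches the threshold, so any nonfinal pause at least as long as the threshold cuts the speaker off.

```python
from typing import Optional, Sequence


def endpoint_time_ms(
    is_speech: Sequence[bool],
    frame_ms: int = 10,        # assumed frame size, not taken from the paper
    threshold_ms: int = 500,   # nonspeech threshold (latency)
) -> Optional[int]:
    """Return the time in ms at which a threshold-only endpointer fires.

    Returns None if the threshold is never reached after speech onset.
    """
    silence_ms = 0
    heard_speech = False
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silence_ms = 0          # any speech frame resets the pause timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= threshold_ms:
                return (i + 1) * frame_ms
    return None


# Hypothetical utterance: 300 ms speech, a 600 ms nonfinal pause,
# 200 ms more speech, then 1200 ms of trailing silence.
frames = [True] * 30 + [False] * 60 + [True] * 20 + [False] * 120

cut_off = endpoint_time_ms(frames, threshold_ms=500)    # fires inside the pause
complete = endpoint_time_ms(frames, threshold_ms=1000)  # fires after the final pause
```

In this made-up example the 500 ms threshold fires during the nonfinal pause, while the 1000 ms threshold waits for the true end of the utterance at the cost of higher latency. That trade-off is what the pause-classification features described in the abstract are intended to ease.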
Keywords :
feature extraction; natural language interfaces; natural language processing; speaker recognition; speech-based user interfaces; EER; computationally-efficient endpointing features; end-of-utterance detection; endpoint threshold; feature importance; natural spoken interaction; nonspeech threshold; personal-assistant speech; personal-assistant systems; prepausal speech; session information; speaker differences; speaker information; speech recognition; speech-input systems; time 100 ms; time 500 ms; Databases; Feature extraction; Market research; Modulation; Speech; Speech recognition; Standards; acoustic-prosodic features; computationally efficient; endpointing; pausing; personal assistants;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854199