DocumentCode :
699815
Title :
Application of the DYPSA algorithm to segmented time scale modification of speech
Author :
Thomas, Mark R. P. ; Gudnason, Jon ; Naylor, Patrick A.
Author_Institution :
Imperial Coll. London, London, UK
fYear :
2008
fDate :
25-29 Aug. 2008
Firstpage :
1
Lastpage :
5
Abstract :
This paper presents a method for time scale modification of speech. Voiced speech is pseudo-periodic, so its time scale can be modified by repeating or removing cycles as necessary. Unvoiced speech and the boundaries of voiced speech exhibit no such periodicity, so those regions should not be modified. To address this issue, the proposed approach is novel in its use of the DYPSA algorithm to derive speech periodicity from glottal closure instants (GCIs), followed by a Gaussian mixture model-based voiced/unvoiced/silence (VUS) classifier. A listening test based on ITU-T P.800 shows that, with VUS detection, the mean opinion score for the perceptual quality of processed speech exceeds that of a method without VUS detection by 0.61 on average over a range of modification factors. Results are presented as a function of modification factor for normal and fast original talking rates. Reliable, high-quality time scale modification enables many applications, such as time scale compression for fast scanning of recorded voicemail messages, slowing of talking rate for improved intelligibility in forensics, and lip synchronization in motion video.
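The idea of repeating or removing cycles only where the speech is voiced can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `modify_time_scale`, its arguments, and the plain concatenation of cycles (with no smoothing at the joins) are assumptions made for illustration. The GCI positions and the per-cycle voiced labels are taken as given inputs; in the paper they would come from DYPSA and the VUS classifier respectively.

```python
def modify_time_scale(signal, gcis, voiced, factor):
    """Hypothetical sketch of segmented time scale modification.

    signal : list of samples
    gcis   : sorted sample indices of glottal closure instants
    voiced : voiced[i] is True if the cycle gcis[i]..gcis[i+1] is voiced
    factor : > 1 lengthens voiced speech, < 1 shortens it
    """
    if len(voiced) != len(gcis) - 1:
        raise ValueError("need one voiced flag per cycle between GCIs")

    # Samples before the first GCI are copied unchanged.
    out = list(signal[:gcis[0]])
    credit = 0.0  # accumulated number of cycle copies still owed

    for i, is_voiced in enumerate(voiced):
        cycle = list(signal[gcis[i]:gcis[i + 1]])
        if not is_voiced:
            # Unvoiced or silent regions are passed through unmodified.
            out.extend(cycle)
            continue
        # Emit a whole number of copies; carry the fraction forward.
        credit += factor
        copies = int(credit)
        credit -= copies
        out.extend(cycle * copies)

    # Samples from the last GCI onward are copied unchanged.
    out.extend(signal[gcis[-1]:])
    return out


samples = list(range(10))
slower = modify_time_scale(samples, [2, 4, 6, 8], [True, True, False], 2.0)
faster = modify_time_scale(samples, [2, 4, 6, 8], [True, True, False], 0.5)
```

With a factor of 2.0 each voiced cycle is emitted twice, and with 0.5 every second voiced cycle is dropped, while the unvoiced cycle and the samples outside the GCI range are untouched in both cases. A practical system would likely cross-fade at the joins rather than concatenate raw cycles.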
Keywords :
Gaussian processes; mixture models; speech processing; DYPSA algorithm; GCI; Gaussian mixture model-based VUS classifier; ITU-T P.800; average mean opinion score; fast original talking rate; forensics; glottal closure instants; high audio quality; lip synchronization; listening test; modification factor; motion video; normal talking rate; perceptual quality; pseudoperiodic speech; speech periodicity; speech time scale modification; voiced speech; voiced-unvoiced-silence classifier; voicemail messages; Abstracts; Speech
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal Processing Conference, 2008 16th European
Conference_Location :
Lausanne
ISSN :
2219-5491
Type :
conf
Filename :
7080347