Title :
Use of Temporal Information: Detection of Periodicity, Aperiodicity, and Pitch in Speech
Author :
Deshmukh, O.M. ; Espy-Wilson, Carol Y. ; Salomon, Ariel ; Singh, Jawahar
Author_Institution :
Dept. of Electr., Univ. of Maryland, College Park, USA
Abstract :
In this paper, we present a time-domain aperiodicity, periodicity, and pitch (APP) detector that estimates 1) the proportion of periodic and aperiodic energy in a speech signal and 2) the pitch period of the periodic component. The APP system is particularly useful in situations where the speech signal contains simultaneous periodic and aperiodic energy, as in the case of breathy vowels and some voiced obstruents. The performance of the APP system was evaluated on synthetic speech-like signals corrupted with noise at various signal-to-noise ratios (SNRs) and on three different natural speech databases that consist of simultaneously recorded electroglottograph (EGG) and acoustic data. When compared on a frame basis (one frame every 2.5 ms), the results show excellent agreement between the periodic/aperiodic decisions made by the APP system and the estimates obtained from the EGG data (94.43% for periodicity and 96.32% for aperiodicity). The results also support previous studies that show that voiced obstruents are frequently manifested with either little or no aperiodic energy, or with strong periodic and aperiodic components. The EGG data were also used as the reference for evaluating the pitch detection algorithm; this ground truth was not manually checked to rectify or exclude incorrect estimates. The overall gross error rate in pitch prediction across the three speech databases was 5.67%. In the case of synthetic speech-like data, the estimated SNR was found to be in close proportion to the actual SNR, and the pitch was always accurately found regardless of the presence of any shimmer or jitter.
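The keywords list the average magnitude difference function (AMDF) as one of the paper's topics, but the abstract does not describe how the APP system uses it. As a rough illustration only, the following minimal Python sketch shows the textbook AMDF idea for estimating a pitch period in the time domain: the AMDF dips toward zero at lags equal to the period of a periodic signal. The function names, the 60-400 Hz search range, and the synthetic test signal are assumptions made for this example and are not taken from the paper.

```python
import math


def amdf(frame, max_lag):
    """Average magnitude difference function for lags 0..max_lag."""
    n = len(frame) - max_lag  # samples compared at every lag (assumes len(frame) > max_lag)
    return [
        sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
        for lag in range(max_lag + 1)
    ]


def estimate_pitch_period(frame, fs, f0_min=60.0, f0_max=400.0):
    """Return the pitch period in samples as the lag of the deepest AMDF dip.

    The search range (f0_min, f0_max) is an assumed, illustrative choice.
    """
    min_lag = int(fs / f0_max)
    max_lag = int(fs / f0_min)
    d = amdf(frame, max_lag)
    return min(range(min_lag, max_lag + 1), key=lambda lag: d[lag])


# Hypothetical test signal: a clean 100 Hz sine sampled at 8 kHz,
# whose true period is 80 samples.
fs = 8000
frame = [math.sin(2 * math.pi * 100 * t / fs) for t in range(400)]
period = estimate_pitch_period(frame, fs)
f0 = fs / period  # estimated fundamental frequency in Hz
```

On real speech, a plain minimum search like this is prone to octave errors and says nothing about how much of the energy is aperiodic, which is the part of the problem the APP detector is described as addressing.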
Keywords :
speech processing; speech recognition; time-domain analysis; acoustic data; aperiodicity detection; electroglottograph data; periodicity detection; pitch detection; signal to noise ratio; speech database; speech signal; synthetic speech like data; temporal information; Acoustic noise; Databases; Detection algorithms; Detectors; Error analysis; Jitter; Natural languages; Noise level; Speech; Aperiodic and periodic energy; average magnitude difference function (AMDF); speech preprocessing; voice quality; voiced obstruents
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
DOI :
10.1109/TSA.2005.851910