DocumentCode
2992485
Title
Automatic phrase segmentation and clustering in spontaneous speech
Author
Beke, Andras ; Szaszak, Gyorgy ; Varadi, Viola
Author_Institution
Res. Inst. of Linguistics, Hungary
fYear
2013
fDate
2-5 Dec. 2013
Firstpage
459
Lastpage
462
Abstract
The aim of this research is to segment spontaneous speech using an unsupervised learning technique. We approach the task from a machine perception (detection) point of view and focus on revealing structure in the prosody of spontaneous speech. The BEA spontaneous speech database is used to develop a speech segmentation system. The spontaneous narratives are annotated manually for intonational phrases (IPs), which are further divided into phonological phrases (PPs). Word-level transcription is also provided. For the automatic detection of IPs and embedded PPs, a two-step segmentation method is applied. In the first step, the IPs are detected automatically based on speech energy, spectral centroid and a double-thresholding technique. In the second step, PPs are segmented within the IPs, based on F0, energy and Kullback-Leibler divergence combined with an adaptive thresholding method. The results show that the proposed method provides a good and efficient framework for segmenting Hungarian spontaneous speech, with performance close to that obtained on read speech.
Keywords
audio databases; natural language processing; pattern clustering; speech recognition; unsupervised learning; word processing; BEA spontaneous speech database; F0; Hungarian spontaneous speech segmentation; Kullback-Leibler divergence; adaptive thresholding method; automatic IP detection; automatic phrase clustering; automatic phrase segmentation; double-thresholding technique; embedded PP; intonational phrases; machine perception; phonological phrases; spectral centroid; speech energy; spontaneous narratives; spontaneous speech segmentation system; two-step segmentation method; unsupervised learning technique; word level transcription; Accuracy; Clustering algorithms; Databases; Feature extraction; Speech; Stress; Unsupervised learning;
fLanguage
English
Publisher
ieee
Conference_Titel
Cognitive Infocommunications (CogInfoCom), 2013 IEEE 4th International Conference on
Conference_Location
Budapest
Print_ISBN
978-1-4799-1543-9
Type
conf
DOI
10.1109/CogInfoCom.2013.6719290
Filename
6719290