• DocumentCode
    2992485
  • Title
    Automatic phrase segmentation and clustering in spontaneous speech
  • Author
    Beke, Andras ; Szaszak, Gyorgy ; Varadi, Viola
  • Author_Institution
    Res. Inst. of Linguistics, Hungary
  • fYear
    2013
  • fDate
    2-5 Dec. 2013
  • Firstpage
    459
  • Lastpage
    462
  • Abstract
    The aim of this research is to segment spontaneous speech using an unsupervised learning technique. We approach the task from a machine perception (detection) point of view and focus on revealing some of the prosodic structure of spontaneous speech. The BEA spontaneous speech database is used to develop a speech segmentation system. The spontaneous narratives are manually annotated for intonational phrases (IPs), which are further divided into phonological phrases (PPs). A word-level transcription is also provided. For the automatic detection of IPs and the PPs embedded in them, a two-step segmentation method is applied. In the first step, the IPs are detected automatically based on speech energy, spectral centroid, and a double-thresholding technique. In the second step, PPs are segmented within the IPs based on F0, energy, and Kullback-Leibler divergence, combined with an adaptive thresholding method. The results show that the proposed method provides an effective and efficient framework for segmenting Hungarian spontaneous speech, with performance close to that obtained on read speech.
  • Keywords
    audio databases; natural language processing; pattern clustering; speech recognition; unsupervised learning; word processing; BEA spontaneous speech database; F0; Hungarian spontaneous speech segmentation; Kullback-Leibler divergence; adaptive thresholding method; automatic IP detection; automatic phrase clustering; automatic phrase segmentation; double-thresholding technique; embedded PP; intonational phrases; machine perception; phonological phrases; spectral centroid; speech energy; spontaneous narratives; spontaneous speech segmentation system; two-step segmentation method; unsupervised learning technique; word level transcription; Accuracy; Clustering algorithms; Databases; Feature extraction; Speech; Stress; Unsupervised learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cognitive Infocommunications (CogInfoCom), 2013 IEEE 4th International Conference on
  • Conference_Location
    Budapest
  • Print_ISBN
    978-1-4799-1543-9
  • Type
    conf
  • DOI
    10.1109/CogInfoCom.2013.6719290
  • Filename
    6719290
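
The abstract names the features used in the two steps but gives no formulas or parameter values. The following is only a minimal sketch of how such features might be computed, not the authors' implementation. It assumes that "double thresholding" means one threshold per feature (a frame counts as speech when both energy and spectral centroid exceed their thresholds), and that the Kullback-Leibler divergence is taken between two discrete distributions such as normalized feature histograms. All function names, frame sizes, and thresholds are hypothetical.

```python
import numpy as np

def frame_features(x, sr, frame_len=400, hop=160):
    """Return per-frame short-time energy and spectral centroid (in Hz)."""
    n = 1 + (len(x) - frame_len) // hop
    # Index matrix: row i selects samples [i*hop, i*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    frames = x[idx]
    energy = np.mean(frames ** 2, axis=1)
    mag = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    # Magnitude-weighted mean frequency; the small constant avoids 0/0 on silence.
    centroid = (mag @ freqs) / (mag.sum(axis=1) + 1e-12)
    return energy, centroid

def speech_segments(energy, centroid, e_thr, c_thr):
    """Assumed rule: a frame is speech when BOTH features exceed their thresholds.
    Returns runs of speech frames as (start, end) pairs, end exclusive."""
    mask = (energy > e_thr) & (centroid > c_thr)
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions.
    Inputs are smoothed by eps and renormalized to sum to 1."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

In a sketch like this, the frame runs returned by `speech_segments` would stand in for candidate IPs, and `kl_divergence` could compare feature distributions on either side of a candidate PP boundary. How the paper sets its thresholds adaptively is not stated in the abstract and is not modeled here.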