• DocumentCode
    3167342
  • Title
    Acoustic TextTiling for story segmentation of spoken documents
  • Author
    Zheng, Lilei ; Leung, Cheung-Chi ; Xie, Lei ; Ma, Bin ; Li, Haizhou
  • Author_Institution
    Shaanxi Provincial Key Lab. of Speech & Image Inf. Process., Northwestern Polytech. Univ., Xi'an, China
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    5121
  • Lastpage
    5124
  • Abstract
    We propose an acoustic TextTiling method based on segmental dynamic time warping for automatic story segmentation of spoken documents. Unlike most existing methods, which rely on LVCSR transcripts, this method detects story boundaries directly from audio streams. By analogy with the cosine-based lexical similarity between two text blocks in a transcript, we define an acoustic similarity measure between two pseudo-sentences in an audio stream. Experiments on the TDT2 Mandarin corpus show that acoustic TextTiling achieves performance comparable to lexical TextTiling based on LVCSR transcripts. We use MFCCs and Gaussian posteriorgrams as the acoustic representations, and our experiments show that Gaussian posteriorgrams are more robust when segmenting stories that each contain multiple speakers.
  • Keywords
    Gaussian processes; audio streaming; speech processing; Gaussian posteriorgrams; LVCSR transcripts; TDT2 Mandarin corpus; acoustic TextTiling method; acoustic representations; audio streams; cosine-based lexical similarity; lexical TextTiling; segmental dynamic time warping; spoken document story segmentation; text blocks; Acoustic measurements; Acoustics; Glass; Heuristic algorithms; Speech; Speech processing; Vectors; TextTiling; segmental dynamic time warping; spoken document processing; story segmentation; topic segmentation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289073
  • Filename
    6289073