• DocumentCode
    310586
  • Title

    Speaker identification based text to audio alignment for an audio retrieval system

  • Author

    Roy, Deb ; Malamud, Carl

  • Author_Institution
    MIT Media Lab., Cambridge, MA, USA
  • Volume
    2
  • fYear
    1997
  • fDate
    21-24 Apr 1997
  • Firstpage
    1099
  • Abstract
    We report on an audio retrieval system which lets Internet users efficiently access a large audio database containing recordings of the proceedings of the United States House of Representatives. The audio has been temporally aligned to text transcripts of the proceedings (which are manually generated by the US Government) using a novel method based on speaker identification. Speaker sequence and approximate timing information are extracted from the text transcript and used to constrain a Viterbi alignment of speaker models to the observed audio. Speakers are modeled by computing Gaussian statistics of cepstral coefficients extracted from samples of each person's speech. The speaker identification is used to locate speaker transition points in the audio, which are then linked to the corresponding speaker transitions in the text transcript. The alignment system has been successfully integrated into a World Wide Web based search and browse system as an experimental service on the Internet.
  • Keywords
    Gaussian processes; Internet; audio systems; cepstral analysis; government data processing; information retrieval systems; natural language interfaces; online front-ends; speaker recognition; speech processing; statistical analysis; Gaussian statistics; Internet; US Government; United States House of Representatives; Viterbi alignment; WWW interface; World Wide Web; alignment system; approximate timing information; audio retrieval system; browse system; cepstral coefficients; experimental service; large audio database; search system; speaker identification; speaker models; speaker sequence; speech samples; text to audio alignment; text transcripts; Audio databases; Audio recording; Cepstral analysis; Data mining; Information retrieval; Internet; Statistics; Timing; US Government; Viterbi algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
  • Conference_Location
    Munich
  • ISSN
    1520-6149
  • Print_ISBN
    0-8186-7919-0
  • Type

    conf

  • DOI
    10.1109/ICASSP.1997.596133
  • Filename
    596133
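
The alignment idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal-covariance Gaussians over toy one-dimensional "cepstral" features, and it uses only the speaker order from the transcript, leaving out the approximate timing constraints the abstract mentions. All names here (`fit_gaussian`, `log_likelihood`, `align`) and all data values are hypothetical.

```python
import math

def fit_gaussian(samples):
    """Per-dimension mean and variance of one speaker's training frames."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    # Floor the variance so a constant feature cannot cause division by zero.
    var = [max(sum((s[d] - mean[d]) ** 2 for s in samples) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_likelihood(frame, model):
    """Log density of a frame under a diagonal-covariance Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))

def align(frames, speaker_sequence, models):
    """Viterbi alignment of a known speaker order to audio frames.

    Returns the frame index at which each speaker turn starts.
    Each turn must cover at least one frame, and turns occur in order.
    """
    T, S = len(frames), len(speaker_sequence)
    if T < S:
        raise ValueError("need at least one frame per speaker turn")
    NEG = float("-inf")
    score = [[NEG] * S for _ in range(T)]
    stayed = [[True] * S for _ in range(T)]
    score[0][0] = log_likelihood(frames[0], models[speaker_sequence[0]])
    for t in range(1, T):
        for s in range(min(S, t + 1)):
            stay = score[t - 1][s]                        # same turn continues
            advance = score[t - 1][s - 1] if s > 0 else NEG  # next turn begins
            best = max(stay, advance)
            if best == NEG:
                continue  # turn s is not reachable at frame t
            stayed[t][s] = stay >= advance
            score[t][s] = best + log_likelihood(
                frames[t], models[speaker_sequence[s]])
    # Backtrace from the last turn at the last frame.
    starts = [0] * S
    s = S - 1
    for t in range(T - 1, 0, -1):
        if not stayed[t][s]:
            starts[s] = t
            s -= 1
    return starts

# Toy data: speaker "A" near 0.0, speaker "B" near 5.0 (made-up values).
models = {
    "A": fit_gaussian([[0.0], [0.2], [-0.2]]),
    "B": fit_gaussian([[5.0], [5.2], [4.8]]),
}
frames = [[0.1], [-0.1], [5.1], [4.9], [0.0], [0.1]]
starts = align(frames, ["A", "B", "A"], models)
print(starts)  # [0, 2, 4]
```

The returned indices mark the speaker transition points in the audio. In a system like the one described, each such point would be linked to the matching speaker change in the text transcript.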