• DocumentCode
    591916
  • Title
    Automatic transcription of academic lectures from diverse disciplines
  • Author
    AlHarbi, G.; Hain, Thomas
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Sheffield, Sheffield, UK
  • fYear
    2012
  • fDate
    2-5 Dec. 2012
  • Firstpage
    398
  • Lastpage
    403
  • Abstract
    In a multimedia world, it is now common to record professional presentations, either on video or as audio only. Such recordings include talks and academic lectures, which are becoming a valuable resource for students and professionals alike. However, organising such material across a diverse set of disciplines does not appear to be an easy task. One way to address this problem is to build an Automatic Speech Recognition (ASR) system and use its output to analyse such material. In this work, ASR results for lectures from diverse sources are presented. The work is based on a new collection of data obtained by the Liberated Learning Consortium (LLC). The study's primary goals are two-fold: first, to show the variability across disciplines from an ASR perspective and how to choose sources for the construction of language models (LMs); second, to provide an analysis of lecture transcriptions for the automatic determination of structure in lecture discourse. In particular, we investigate whether there are properties common to lectures from different disciplines. This study focuses on textual features. Lectures are multimodal experiences, so it is not clear whether textual features alone are sufficient to recognise such common elements or whether other features, e.g. acoustic features such as the speaking rate, are needed. The results show that such common properties are retained across disciplines even on ASR output with a Word Error Rate (WER) of 30%.
  • Keywords
    acoustic signal processing; audio recording; educational computing; multimedia computing; natural language processing; speech recognition; text analysis; video recording; word processing; ASR system; LLC; LM; Liberated Learning Consortium; WER; academic material analysis; acoustic features; automatic academic lecture transcription; automatic lecture discourse structure determination; automatic speech recognition system; language models; professional presentation recording; speaking rate; textual features; word error rate; Acoustics; Biology; Education; Hidden Markov models; Materials; Speech; Vocabulary; automatic speech recognition; lecture analysis; lecture transcription; perplexity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2012 IEEE
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4673-5125-6
  • Electronic_ISBN
    978-1-4673-5124-9
  • Type
    conf
  • DOI
    10.1109/SLT.2012.6424257
  • Filename
    6424257