Automatic transcription of academic lectures from diverse disciplines

Author

AlHarbi, G.; Hain, Thomas

Author_Institution

Dept. of Comput. Sci., Univ. of Sheffield, Sheffield, UK

fYear

2012

fDate

2-5 Dec. 2012

Firstpage

398

Lastpage

403

Abstract

In a multimedia world it is now common to record professional presentations, on video or with audio only. Such recordings include talks and academic lectures, which are becoming a valuable resource for students and professionals alike. However, organising such material from a diverse set of disciplines is not an easy task. One way to address this problem is to build an Automatic Speech Recognition (ASR) system and use its output to analyse such materials. In this work, ASR results for lectures from diverse sources are presented. The work is based on a new collection of data obtained by the Liberated Learning Consortium (LLC). The study's primary goals are two-fold: first, to show variability across disciplines from an ASR perspective and how to choose sources for the construction of language models (LMs); second, to provide an analysis of the lecture transcription for automatic determination of structures in lecture discourse. In particular, we investigate whether there are properties common to lectures from different disciplines. This study focuses on textual features. Lectures are multimodal experiences, and it is not clear whether textual features alone are sufficient for the recognition of such common elements, or whether other features, e.g. acoustic features such as the speaking rate, are needed. The results show that such common properties are retained across disciplines even on ASR output with a Word Error Rate (WER) of 30%.

Keywords

acoustic signal processing; audio recording; educational computing; multimedia computing; natural language processing; speech recognition; text analysis; video recording; word processing; ASR system; LLC; LM; Liberated Learning Consortium; WER; academic material analysis; acoustic features; automatic academic lecture transcription; automatic lecture discourse structure determination; automatic speech recognition system; language models; professional presentation recording; speaking rate; textual features; word error rate; Acoustics; Biology; Education; Hidden Markov models; Materials; Speech; Vocabulary; automatic speech recognition; lecture analysis; lecture transcription; perplexity

fLanguage

English

Publisher

IEEE

Conference_Titel

Spoken Language Technology Workshop (SLT), 2012 IEEE

Conference_Location

Miami, FL

Print_ISBN

978-1-4673-5125-6

Electronic_ISBN

978-1-4673-5124-9

Type

conf

DOI

10.1109/SLT.2012.6424257

Filename

6424257