DocumentCode :
591916
Title :
Automatic transcription of academic lectures from diverse disciplines
Author :
AlHarbi, G. ; Hain, Thomas
Author_Institution :
Dept. of Comput. Sci., Univ. of Sheffield, Sheffield, UK
fYear :
2012
fDate :
2-5 Dec. 2012
Firstpage :
398
Lastpage :
403
Abstract :
In a multimedia world it is now common to record professional presentations, on video or with audio only. Such recordings include talks and academic lectures, which are becoming a valuable resource for students and professionals alike. However, organising such material from a diverse set of disciplines is not an easy task. One way to address this problem is to build an Automatic Speech Recognition (ASR) system and use its output to analyse such materials. In this work, ASR results for lectures from diverse sources are presented. The work is based on a new collection of data, obtained by the Liberated Learning Consortium (LLC). The study's primary goals are two-fold: first, to show variability across disciplines from an ASR perspective and how to choose sources for the construction of language models (LMs); second, to provide an analysis of the lecture transcription for automatic determination of structures in lecture discourse. In particular, we investigate whether there are properties common to lectures from different disciplines. This study focuses on textual features. Lectures are multimodal experiences, so it is not clear whether textual features alone are sufficient for the recognition of such common elements, or whether other features, e.g. acoustic features such as the speaking rate, are needed. The results show that such common properties are retained across disciplines even on ASR output with a Word Error Rate (WER) of 30%.
Keywords :
acoustic signal processing; audio recording; educational computing; multimedia computing; natural language processing; speech recognition; text analysis; video recording; word processing; ASR system; LLC; LM; Liberated Learning Consortium; WER; academic material analysis; acoustic features; audio recording; automatic academic lecture transcription; automatic lecture discourse structure determination; automatic speech recognition system; language models; professional presentation recording; speaking rate; textual features; video recording; word error rate; Acoustics; Biology; Education; Hidden Markov models; Materials; Speech; Vocabulary; automatic speech recognition; lecture analysis; lecture transcription; perplexity; text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language Technology Workshop (SLT), 2012 IEEE
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4673-5125-6
Electronic_ISBN :
978-1-4673-5124-9
Type :
conf
DOI :
10.1109/SLT.2012.6424257
Filename :
6424257