Author_Institution :
Dept. of Comput. Sci., Columbia Univ., NY, USA
Abstract :
We introduce new techniques for extracting, analyzing, and visualizing textual content from instructional videos of low production quality. Using automatic speech recognition, approximate transcripts (≈75% word error rate) are obtained from the original, highly compressed videos of university courses, each comprising 10 to 30 lectures. Text material that accompanies the course, in the form of books or papers, is then used to filter meaningful phrases from the seemingly incoherent transcripts. The resulting index into the transcripts is tied together and visualized in 3 experimental graphs that help in understanding the overall course structure and provide a tool for localizing certain topics for indexing. We specifically discuss a transcript index map, which graphically lays out key phrases for a course; a textbook chapter to transcript match; and a lecture transcript similarity graph, which clusters semantically similar lectures. We test our methods and tools on 7 full courses with 230 hours of video and 273 transcripts. We are able to extract up to 98 unique key terms for a given transcript and up to 347 unique key terms for an entire course. The accuracy of the textbook chapter to transcript match exceeds 70% on average. The methods can be applied to other genres of video in which there are recurrent thematic words (news, sports, meetings, etc.).
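As a rough illustration of the filtering idea described in the abstract, the sketch below keeps only those transcript words that also appear in a vocabulary drawn from the course text, picks the chapter with the largest term overlap, and compares two lectures by the overlap of their key terms. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the single-word matching, the Jaccard measure, and the toy data are all hypothetical, and the abstract does not specify how phrases are matched or how similarity is computed.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def key_terms(transcript, vocabulary):
    """Keep only transcript words that also occur in the course-text vocabulary."""
    return set(tokenize(transcript)) & vocabulary


def best_chapter(transcript, chapters):
    """Return the chapter name whose term set overlaps the transcript the most."""
    words = set(tokenize(transcript))
    return max(chapters, key=lambda name: len(words & chapters[name]))


def jaccard(terms_a, terms_b):
    """Jaccard overlap of two key-term sets (an assumed similarity measure)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


# Toy data (hypothetical): per-chapter terms standing in for a textbook index.
chapters = {
    "sorting": {"quicksort", "pivot", "merge"},
    "graphs": {"vertex", "edge", "path"},
}
vocabulary = set().union(*chapters.values())

# A noisy, ASR-like transcript fragment (hypothetical).
noisy = "um the pivot is uh chosen and quick sort no quicksort then merge"

terms = key_terms(noisy, vocabulary)
chapter = best_chapter(noisy, chapters)
```

Filtering against a known vocabulary is what makes a high word error rate tolerable in this sketch: misrecognized words rarely coincide with course terms, so they drop out, while correctly recognized thematic words survive.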
Keywords :
courseware; data visualisation; feature extraction; speech recognition; automatic speech recognition; course structure; data analysis; data visualization; index words; lecture transcript similarity graph; recurrent thematic words; transcript index map; visualizing textual content; Automatic speech recognition; Books; Computer science; Data visualization; Error analysis; Filters; Indexing; Speech analysis; Testing; Videos;