DocumentCode :
3722396
Title :
Enhance Lecture Archive Search with OCR Slide Detection and In-Memory Database Technology
Author :
Martin Malchow;Matthias Bauer;Christoph Meinel
Author_Institution :
Hasso Plattner Inst., Univ. of Potsdam, Potsdam, Germany
fYear :
2015
Firstpage :
176
Lastpage :
183
Abstract :
The Web hosts many frequently used video lecture archives, which have grown rapidly over the last few years. As a result, there are now many lecture recordings covering a wide variety of subjects. These videos are typically searched by title and description. Unfortunately, not all important keywords and facts appear in the title or description, where these are available at all. Furthermore, there is no way to analyze how important the detected keywords are for the video as a whole. Another characteristic specific to lecture archives is that every regular university lecture is repeated yearly, which normally leads to duplicate lecture recordings. Such duplicates in search results are disruptive for students who want to watch the most recent lecture. This paper addresses these problems by analyzing the recorded lecture slides with Optical Character Recognition (OCR). In addition to the title and description, the OCR data is used in a full-text analysis to create an index for the lecture archive search. Furthermore, a fuzzy search is introduced to compensate for misspelled search requests and OCR detection errors. The paper also discusses the performance of full-text search on an in-memory database, issues in OCR detection, and the handling of duplicate recordings of lectures repeated every year. Finally, the search performance of the in-memory database is evaluated against other database approaches, and a user acceptance survey was conducted on whether the search results improve the learning experience on lecture archives. As a result, this paper shows how to handle the large amount of OCR data for a full-text live search on an in-memory database in reasonable time, with a fuzzy search performed alongside it to resolve spelling mistakes and OCR detection problems.
In conclusion, this paper presents a solution for an enhanced video lecture archive search that supports students in online research and enhances their learning experience.
Keywords :
"Optical character recognition software","Search problems","Indexes","Search engines","Synchronization","Multimedia communication"
Publisher :
IEEE
Conference_Title :
Computational Science and Engineering (CSE), 2015 IEEE 18th International Conference on
Type :
conf
DOI :
10.1109/CSE.2015.19
Filename :
7371371