DocumentCode :
1338845
Title :
A Semantics-Based Approach for Speech Annotation of Images
Author :
Kalashnikov, Dmitri V. ; Mehrotra, Sharad ; Xu, Jie ; Venkatasubramanian, Nalini
Author_Institution :
Dept. of Comput. Sci., Univ. of California at Irvine, Irvine, CA, USA
Volume :
23
Issue :
9
fYear :
2011
Firstpage :
1373
Lastpage :
1387
Abstract :
Associating textual annotations/tags with multimedia content is among the most effective approaches to organize and to support search over digital images and multimedia databases. Despite advances in multimedia analysis, effective tagging remains largely a manual process wherein users add descriptive tags by hand, usually when uploading or browsing the collection, long after the pictures have been taken. This approach, however, is not convenient in all situations or for many applications, e.g., when users would like to publish and share pictures with others in real time. An alternative approach is to utilize a speech interface through which users specify image tags that are transcribed into textual annotations by automated speech recognizers. Such a speech-based approach has all the benefits of human tagging without the cumbersomeness and impracticality typically associated with human tagging in real time. The key challenge in such an approach is the potentially low recognition quality of state-of-the-art recognizers, especially in noisy environments. In this paper, we explore how semantic knowledge in the form of co-occurrence between image tags can be exploited to boost the quality of speech recognition. We postulate the problem of speech annotation as that of disambiguating among multiple alternatives offered by the recognizer. An empirical evaluation has been conducted over both a real speech recognizer's output and synthetic data sets. The results demonstrate significant advantages of the proposed approach compared to the recognizer's output under varying conditions.
Keywords :
multimedia databases; speech recognition; multimedia analysis; multimedia database; semantics based approach; speech annotation; speech based approach; Correlation; Image recognition; Real time systems; Semantics; Speech; Tagging; Using speech for tagging and annotation; branch and bound algorithm; correlation-based approach; maximum entropy approach; using semantics to improve ASR
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2010.185
Filename :
5590246