Title :
Fast multimedia contents retrieval by partially spoken query
Author :
Jeong, So-Young ; Han, Icksang ; Kwak, Byung-Kwan ; Cho, Jeongmi ; Kim, Jeongsu
Abstract :
We present novel fast multi-pass decoding strategies for recognizing a large set of named entities on a low-resource embedded device, and thus for retrieving MP3 music using a spoken query that contains only partial segments of the full music titles and artist names. After acoustic-phonetic decoding in the first stage, we incorporate word boundary information together with a phonetic confusion matrix into the next-stage partial word matching. We then rescore the candidate phone lists using a more complex context-dependent acoustic model; the output of this rescoring is the list of retrieved songs. We tested our retrieval system on the task of retrieving 1000 songs on a commercial MP3 player and achieved a relative improvement of about 15.5% in response time over a conventional frame-based multi-pass decoding method without sacrificing recognition rate.
Keywords :
content-based retrieval; music; speech processing; speech recognition; MP3 music player; acoustic-phonetic decoding; context-dependent acoustic model; fast multimedia contents retrieval; low-resource embedded device; multipass decoding strategy; next stage partial word matching; phone lists; phonetic confusion matrix; spoken query; Acoustics; Context modeling; Decoding; Digital audio players; Speech; Speech recognition; Time factors;
Conference_Title :
Consumer Electronics (ICCE), 2011 IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-8711-0
DOI :
10.1109/ICCE.2011.5722893
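Illustrative sketch :
The second-stage partial word matching described in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the matching algorithm, so a confusion-weighted edit distance is swapped in here. The phone symbols, the cost values in `CONFUSION_COST`, the `GAP_COST` constant, the toy `CATALOG`, and the rule that a match may only begin at a word boundary are all hypothetical choices made for illustration. The first-stage acoustic-phonetic decoding and the final context-dependent rescoring are not shown.

```python
from typing import Dict, List, Tuple

Phone = str

# Hypothetical substitution costs for acoustically confusable phone pairs.
# A real system would derive these from a measured phonetic confusion matrix.
CONFUSION_COST: Dict[Tuple[Phone, Phone], float] = {
    ("b", "p"): 0.3, ("p", "b"): 0.3,
    ("d", "t"): 0.3, ("t", "d"): 0.3,
    ("m", "n"): 0.4, ("n", "m"): 0.4,
}
GAP_COST = 1.0  # assumed cost of an inserted or deleted phone


def sub_cost(a: Phone, b: Phone) -> float:
    """Cost of decoding phone `a` where the title has phone `b`."""
    if a == b:
        return 0.0
    return CONFUSION_COST.get((a, b), 1.0)


def partial_match_cost(query: List[Phone],
                       title_words: List[List[Phone]]) -> float:
    """Cheapest alignment of `query` to a stretch of the title that
    starts at a word boundary and may end anywhere in the title."""
    title = [p for word in title_words for p in word]
    starts = {0}
    pos = 0
    for word in title_words:
        starts.add(pos)
        pos += len(word)

    n = len(title)
    # Row for the empty query: free (cost 0) at word starts; elsewhere the
    # title phones since the previous position are charged as gaps.
    prev = [0.0] * (n + 1)
    for j in range(1, n + 1):
        prev[j] = 0.0 if j in starts else prev[j - 1] + GAP_COST

    for q in query:
        cur = [prev[0] + GAP_COST] + [0.0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j - 1] + sub_cost(q, title[j - 1]),  # match / confusion
                prev[j] + GAP_COST,                       # extra query phone
                cur[j - 1] + GAP_COST,                    # skipped title phone
            )
        prev = cur
    return min(prev)  # the match may end at any title position


def rank_titles(query: List[Phone],
                catalog: Dict[str, List[List[Phone]]]) -> List[Tuple[str, float]]:
    """Return (title, cost) pairs sorted from best to worst match."""
    scored = [(name, partial_match_cost(query, words))
              for name, words in catalog.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy catalog with made-up phone transcriptions.
CATALOG: Dict[str, List[List[Phone]]] = {
    "let it be": [["l", "eh", "t"], ["ih", "t"], ["b", "iy"]],
    "yesterday": [["y", "eh", "s", "t", "er", "d", "ey"]],
}

# A noisy decoding of the partial query "it be" (t -> d, b -> p).
QUERY: List[Phone] = ["ih", "d", "p", "iy"]

if __name__ == "__main__":
    for name, cost in rank_titles(QUERY, CATALOG):
        print(f"{name}: {cost:.2f}")
```

In this sketch the query covers only the last two words of the first title, and the two misrecognized phones are charged the reduced confusion cost rather than a full substitution, so the partially spoken title still ranks first. In a pipeline like the one the abstract outlines, the top-ranked candidates from such a pass would then be handed to the more expensive rescoring stage.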