DocumentCode :
2311527
Title :
Improved Spoken Document Retrieval With Dynamic Key Term Lexicon and Probabilistic Latent Semantic Analysis (PLSA)
Author :
Hsieh, Ya-Chao ; Huang, Yu-Tsun ; Wang, Chien-Chih ; Lee, Lin-shan
Author_Institution :
Graduate Inst. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei
Volume :
1
fYear :
2006
fDate :
14-19 May 2006
Abstract :
Spoken document retrieval will be very important in the future network era. In this paper, we propose using a "dynamic key term lexicon", automatically extracted from the ever-changing document archives, as an extra feature set in the retrieval task. This lexicon is much more compact yet semantically rich, so it can retrieve relevant documents more efficiently. The key terms include named entities and other terms selected by a new metric, referred to here as term entropy, derived from probabilistic latent semantic analysis (PLSA). Various configurations of retrieval models were tested on a broadcast news archive in Mandarin Chinese, and significant performance improvements were obtained, especially with the new version of PLSA models based on a key term lexicon rather than the full lexicon.
Keywords :
information retrieval; natural languages; probability; Mandarin Chinese; dynamic key term lexicon; probabilistic latent semantic analysis; spoken document retrieval; Computer science; Content based retrieval; Data mining; Entropy; Frequency; Information analysis; Information retrieval; Large scale integration; Speech analysis; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
ISSN :
1520-6149
Print_ISBN :
1-4244-0469-X
Type :
conf
DOI :
10.1109/ICASSP.2006.1660182
Filename :
1660182