DocumentCode :
3166900
Title :
Unsupervised two-stage keyword extraction from spoken documents by topic coherence and support vector machine
Author :
Chen, Yun-nung ; Huang, Yu ; Lee, Hung-yi ; Lee, Lin-shan
Author_Institution :
Coll. of EECS, Nat. Taiwan Univ., Taipei, Taiwan
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
5041
Lastpage :
5044
Abstract :
This paper proposes an unsupervised two-stage approach to automatically extract keywords from spoken documents. In the first stage, we compute for each candidate term a topic coherence and term significance measure (TCS) based on probabilistic latent semantic analysis (PLSA) models. In the second stage, we take the candidate terms with the highest and lowest TCS scores as positive and negative examples, respectively, train an SVM classifier on them in an unsupervised way using prosodic, lexical, and semantic features, and then classify the candidate keywords with this classifier. Experiments with course lectures showed that the first stage offered very good precision, so the second stage effectively extracted the keywords.
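The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the TCS scores and the two-dimensional feature vectors below are made-up toy values (the abstract does not give the TCS formula or the actual prosodic, lexical, and semantic features), the function names are hypothetical, and the SVM is a simple linear one trained by subgradient descent on the hinge loss rather than whatever solver the paper used.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def select_pseudo_labels(tcs_scores, k):
    """Stage 1: the k highest-scoring terms become positive examples,
    the k lowest-scoring terms become negative examples."""
    ranked = sorted(tcs_scores, key=tcs_scores.get, reverse=True)
    return ranked[:k], ranked[-k:]

def train_linear_svm(X, y, epochs=300, lr=0.1, lam=0.01):
    """Stage 2: linear SVM fitted by batch subgradient descent on the
    L2-regularised hinge loss (an assumed, simplified trainer)."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # example violates the margin
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def classify(w, b, x):
    """Return +1 (keyword) or -1 (non-keyword) for a feature vector."""
    return 1 if dot(w, x) + b >= 0 else -1

# Toy inputs (assumed values, for illustration only).
tcs_scores = {"entropy": 0.92, "viterbi": 0.85, "gaussian": 0.55,
              "okay": 0.08, "thing": 0.05}
features = {"entropy": [0.9, 0.8], "viterbi": [0.8, 0.9],
            "gaussian": [0.7, 0.6], "okay": [0.2, 0.1], "thing": [0.1, 0.2]}

positives, negatives = select_pseudo_labels(tcs_scores, k=2)
X = [features[t] for t in positives + negatives]
y = [1] * len(positives) + [-1] * len(negatives)
w, b = train_linear_svm(X, y)

# Classify every candidate term with the classifier trained on pseudo-labels.
keywords = [t for t in tcs_scores if classify(w, b, features[t]) == 1]
```

No human-labelled keywords are used anywhere: the only supervision signal for the classifier is the ranking produced by the first-stage scores, which is why the precision of that first stage matters.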
Keywords :
document handling; feature extraction; probability; speech processing; support vector machines; unsupervised learning; PLSA models; SVM classifier; TCS; probabilistic latent semantic analysis model; spoken documents; support vector machine; term significance measure; topic coherence; unsupervised two-stage approach; unsupervised two-stage keyword extraction; Coherence; Encyclopedias; Feature extraction; Internet; Semantics; Support vector machines; keyword; support vector machine (SVM); topic coherence and term significance measure (TCS);
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6289053
Filename :
6289053