DocumentCode :
179026
Title :
Unsupervised broadcast news story segmentation using distance dependent Chinese restaurant processes
Author :
Chao Yang ; Lei Xie ; Xiangzeng Zhou
Author_Institution :
Shaanxi Provincial Key Lab. of Speech & Image Inf. Process., Northwestern Polytech. Univ., Xi'an, China
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
4062
Lastpage :
4066
Abstract :
Traditional unsupervised approaches to broadcast news story segmentation require the number of segments to be set manually, yet this number is often unknown in real-world applications. In this paper, we solve this problem by modeling the generative process of stories as distance dependent Chinese restaurant process (dd-CRP) mixtures. We cut a news program into fixed-size text blocks and assume that the blocks in the same story are generated from a story-specific topic. Specifically, we add a dd-CRP prior, which carries an essential bias: a block's topic is more likely to be the same as that of nearby blocks. Story boundaries can then be found by detecting changes of topic. Experiments show that our approach outperforms both supervised and unsupervised approaches, and that the number of segments can be learned automatically from the data.
Keywords :
inference mechanisms; stochastic processes; television broadcasting; text analysis; unsupervised learning; automatic learning; block topic; dd-CRP mixtures; distance dependent Chinese restaurant processes; fixed-size text blocks; generative process modeling; news program; real-world applications; story boundaries; story generation; topic change detection; unsupervised broadcast news story segmentation; Bayes methods; Computational modeling; Dynamic programming; Image segmentation; Probabilistic logic; Speech; Speech processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854365
Filename :
6854365