DocumentCode :
518795
Title :
Removing fillers to induce semantic classes for a Chinese dialogue system
Author :
Li, Yali ; Yan, Yonghong
Author_Institution :
ThinkIT Lab., Chinese Acad. of Sci., Beijing, China
Volume :
4
fYear :
2010
fDate :
27-29 March 2010
Firstpage :
163
Lastpage :
166
Abstract :
In this paper, we introduce an unsupervised method for removing fillers from spoken dialogues semi-automatically, based on their probability distribution. Disfluencies such as fillers and repairs often make a sentence ill-formed, longer, and harder to process. This paper focuses on fillers rather than repairs. We compute the unigram and bigram distributions of fillers on our Chinese voice search data and find that, using these distributions alone, fillers fall within the first 1% of all words. We offer a new perspective on the distribution of fillers and a new measure for detecting fillers in a natural dialogue corpus.
Keywords :
natural language processing; statistical distributions; unsupervised learning; Chinese dialogue system; Chinese voice search data; fillers bigram distribution; fillers removal; fillers unigram distribution; natural dialogue corpus; probability distribution; unsupervised method; Acoustics; Bleaching; Delay; Laboratories; Natural language processing; Probability distribution; Speech processing; Speech recognition; Training data; fillers detection; fillers distribution; spoken dialogues;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Computer Control (ICACC), 2010 2nd International Conference on
Conference_Location :
Shenyang
Print_ISBN :
978-1-4244-5845-5
Type :
conf
DOI :
10.1109/ICACC.2010.5486981
Filename :
5486981