Title :
A novel similarity measure for semantic class induction in human-computer spoken dialogues
Author :
Li, Yali ; Bao, Changchun ; Yan, Yonghong
Author_Institution :
ThinkIT Lab., Chinese Acad. of Sci., Beijing, China
Abstract :
In this paper, we introduce a new similarity metric for semantic class induction that induces semantic classes from a set of domain-specific unannotated data. The metric emphasizes co-occurrence probability rather than relying only on distances between word probability distributions. Whereas the traditional approach computes similarity from either the right or the left context, our metric uses left and right context information simultaneously. Before processing, we remove fillers based on their unigram and bigram context distributions. We find that the co-occurrence metric is simple and effective and yields a lower proportion of misclassified items. Tested on our Chinese voice-search data, the metric achieves an F1 score of 84.3.
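The abstract does not give the metric's formula, so the following is only a minimal sketch of the general idea of using left and right context simultaneously. It assumes each word is described by counts of the (left neighbor, right neighbor) pairs it occurs between, and it swaps in plain cosine similarity over those counts; this is not the authors' co-occurrence metric. The function names, the `<s>`/`</s>` padding tokens, and the toy sentences are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def joint_context_counts(sentences):
    """For each word, count the (left, right) neighbor pairs it appears between."""
    contexts = defaultdict(Counter)
    for tokens in sentences:
        # Pad so that sentence-initial and sentence-final words also get a context.
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for i in range(1, len(padded) - 1):
            contexts[padded[i]][(padded[i - 1], padded[i + 1])] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed stand-in metric)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy, made-up utterances; not data from the paper.
sentences = [
    ["fly", "to", "beijing", "today"],
    ["fly", "to", "shanghai", "today"],
    ["call", "to", "beijing", "now"],
]
ctx = joint_context_counts(sentences)
sim_cities = cosine(ctx["beijing"], ctx["shanghai"])  # share the ("to", "today") context
sim_other = cosine(ctx["beijing"], ctx["fly"])        # share no joint context
```

In this toy setting, words that fill the same slot between the same neighbors score higher than unrelated words, which is the intuition behind grouping them into one semantic class.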
Keywords :
human computer interaction; interactive systems; natural language interfaces; natural language processing; probability; speech-based user interfaces; Chinese voice-search data; bigram context distribution; co-occurrence probability; domain-specific unannotated data; human-computer spoken dialogues; semantic class induction; similarity measure; unigram context distribution; word probability distribution; Acoustic measurements; Electrostatic precipitators; Entropy; Induction generators; Laboratories; Man machine systems; Natural languages; Probability distribution; Tagging; Testing; similarity metric
Conference_Title :
Information, Computing and Telecommunication, 2009. YC-ICT '09. IEEE Youth Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-5074-9
Electronic_ISBN :
978-1-4244-5076-3
DOI :
10.1109/YCICT.2009.5382351