DocumentCode :
1798842
Title :
Semi-supervised learning of dialogue acts using sentence similarity based on word embeddings
Author :
Xiaohao Yang ; Jia Liu ; Zhenfeng Chen ; Weilan Wu
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
fYear :
2014
fDate :
7-9 July 2014
Firstpage :
882
Lastpage :
886
Abstract :
This paper describes a methodology for semi-supervised learning of dialogue acts using the similarity between sentences. We assume that dialogue sentences sharing the same dialogue act are more similar in their semantic and syntactic information. Previous work on sentence similarity, however, mainly modeled a sentence as a bag of words and then compared groups of words using corpus-based or knowledge-based measures of word semantic similarity. In contrast, we present a vector-space sentence representation composed of word embeddings (distributed representations of the words), organised according to the syntactic structure of the sentence. Given the vectors of the dialogue sentences, a distance measure can be defined to compute the similarity between them. Finally, a seeded k-means clustering algorithm is used to classify the dialogue sentences into categories corresponding to particular dialogue acts. This constitutes the semi-supervised nature of the approach, which aims to reduce the reliance on the availability of annotated corpora. Experiments with the Switchboard Dialog Act corpus show that classification accuracy is improved by 14% compared to state-of-the-art methods based on Support Vector Machines.
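The pipeline outlined in the abstract (sentence vectors built from word embeddings, a distance between them, then seeded k-means) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the syntactic structure is encoded, so the sketch assumes a plain average of word embeddings and Euclidean distance, and the names `sentence_vector`, `seeded_kmeans`, and the toy 2-dimensional embeddings are hypothetical.

```python
import numpy as np

def sentence_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens.
    Assumption: plain averaging stands in for the paper's syntactic composition."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def seeded_kmeans(X, seed_idx, seed_labels, k, n_iter=20):
    """k-means whose initial centroids are the means of the labelled seed points.
    Assumption: every class 0..k-1 has at least one seed."""
    X = np.asarray(X, dtype=float)
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    centroids = np.stack(
        [X[seed_idx[seed_labels == c]].mean(axis=0) for c in range(k)]
    )
    assign = np.full(len(X), -1)
    for _ in range(n_iter):
        # Euclidean distance from every sentence vector to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # assignments are stable, so the clustering has converged
        assign = new_assign
        for c in range(k):
            members = X[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return assign, centroids

# Toy, made-up embeddings: class 0 = agreement-like, class 1 = question-like.
emb = {
    "yes": np.array([1.0, 0.0]), "yeah": np.array([0.9, 0.1]),
    "right": np.array([0.8, 0.2]), "what": np.array([0.0, 1.0]),
    "why": np.array([0.1, 0.9]), "how": np.array([0.2, 0.8]),
}
sentences = [["yes"], ["yeah", "right"], ["what"], ["why", "how"], ["right"], ["how"]]
X = [sentence_vector(s, emb, dim=2) for s in sentences]

# Only sentences 0 and 2 carry annotated dialogue acts (the seeds).
labels, _ = seeded_kmeans(X, seed_idx=[0, 2], seed_labels=[0, 1], k=2)
print(labels.tolist())
```

Because the centroids start from the labelled seeds, each cluster index keeps the meaning of its seed label, which is what lets a clustering method be read as a classifier with only a few annotated sentences.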
Keywords :
interactive systems; learning (artificial intelligence); pattern classification; pattern clustering; word processing; Switchboard Dialog Act corpus; annotated corpora; classification accuracy improvement; dialogue acts; dialogue sentence classification; dialogue sentence similarity; distance measurement; seeded k-means clustering algorithm; semantic information; semisupervised learning; sentence syntactic structure; syntactic information; vector-space sentence representation; word distributed representations; word embeddings; Clustering algorithms; Computational linguistics; Semantics; Supervised learning; Support vector machines; Syntactics; Vectors; dialog acts; seeded k-means; sentence similarity; word embeddings;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Audio, Language and Image Processing (ICALIP), 2014 International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4799-3902-2
Type :
conf
DOI :
10.1109/ICALIP.2014.7009921
Filename :
7009921