Title :
Text similarity calculation based on domain feature word
Author :
Yan Luo ; Ning OuYang
Author_Institution :
School of Information and Communication, Guilin University of Electronic Technology, 541004, Guangxi, China
Abstract :
This paper proposes an improved feature selection method that builds on traditional mutual information by establishing domain feature words, which exploit the differences in how a word is represented across classes. With this method, the feature set is reselected from the set already obtained by traditional mutual information. This not only reduces the dimension of the text vector but also represents the text more effectively. A text similarity calculation system is also designed. Experimental results show that the improved feature extraction method outperforms traditional mutual information and that the system performs well.
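For orientation, the baseline that the abstract calls traditional mutual information can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the common pointwise form MI(t, c) = log(P(t, c) / (P(t) P(c))) estimated from document counts, ranks each term by its maximum score over classes, and compares two texts by cosine similarity over the selected terms. The function names, the toy corpus, and the choice of cosine similarity are hypothetical; the abstract does not specify the domain-feature-word reselection step or the similarity measure, so neither is reproduced here.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Return {(term, cls): MI} estimated from document-level presence counts."""
    n = len(docs)
    class_count = Counter(labels)   # number of documents per class
    term_count = Counter()          # number of documents containing each term
    joint_count = Counter()         # documents of class cls containing term
    for tokens, cls in zip(docs, labels):
        for term in set(tokens):
            term_count[term] += 1
            joint_count[(term, cls)] += 1
    scores = {}
    for (term, cls), n_tc in joint_count.items():
        # MI(t, c) = log( P(t, c) / (P(t) * P(c)) ), with counts in place of probabilities
        scores[(term, cls)] = math.log(n_tc * n / (term_count[term] * class_count[cls]))
    return scores


def select_features(scores, k):
    """Keep the k terms with the highest maximum MI over classes (ties broken alphabetically)."""
    best = {}
    for (term, _cls), score in scores.items():
        best[term] = max(best.get(term, float("-inf")), score)
    return sorted(best, key=lambda t: (-best[t], t))[:k]


def cosine_similarity(tokens_a, tokens_b, vocab):
    """Cosine similarity of term-frequency vectors restricted to vocab (assumed measure)."""
    va = [tokens_a.count(t) for t in vocab]
    vb = [tokens_b.count(t) for t in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: two classes, four tokenized documents.
docs = [["goal", "match", "team"], ["team", "goal"],
        ["stock", "market", "team"], ["market", "price"]]
labels = ["sport", "sport", "finance", "finance"]

scores = mutual_information(docs, labels)
selected = select_features(scores, k=5)
similarity = cosine_similarity(docs[0], docs[1], selected)
```

In this toy corpus the word "team" occurs in both classes, so it scores lower than the class-specific words and falls outside the top five. That loosely mirrors the abstract's point that differences in how a word is represented across classes carry the useful signal.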
Keywords :
domain feature word; feature extraction; feature selection; text similarity calculation
Conference_Title :
Automatic Control and Artificial Intelligence (ACAI 2012), International Conference on
Conference_Location :
Xiamen
Electronic_ISBN :
978-1-84919-537-9
DOI :
10.1049/cp.2012.1399