DocumentCode :
3278594
Title :
A similarity measure for text processing
Author :
Jiang, Jung-Yi ; Cheng, Wen-Hao ; Chiou, Yu-Shu ; Lee, Shie-Jue
Author_Institution :
Dept. of Electr. Eng., Nat. Sun Yat-Sen Univ., Kaohsiung, Taiwan
Volume :
4
fYear :
2011
fDate :
10-13 July 2011
Firstpage :
1460
Lastpage :
1465
Abstract :
In this paper, we propose a novel similarity measure for document data processing. For two document vectors, the proposed measure takes three cases into account: a) the feature considered appears in both documents, b) the feature considered appears in only one document, and c) the feature considered appears in neither document. For the first case, we give a lower bound and decrease the similarity according to the difference between the feature values of the two documents. For the second case, we give a fixed value regardless of the magnitude of the feature value. For the last case, we treat it as an identity. Experimental results show that our proposed method can work more effectively than others.
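The abstract names the three cases but does not give the formula, so the Python sketch below is only an illustration under stated assumptions, not the authors' published definition. The function name `three_case_similarity` and the parameters `lower_bound`, `sigma`, and `penalty` are hypothetical, as are the Gaussian decay used for case a), the reading of "identity" in case c) as a zero contribution, and the final rescaling to [0, 1].

```python
import math


def three_case_similarity(d1, d2, lower_bound=0.5, sigma=1.0, penalty=1.0):
    """Illustrative three-case similarity between two equal-length feature vectors.

    All constants and the exact functional form are assumptions made for this
    sketch; the abstract does not specify them.
    """
    if len(d1) != len(d2):
        raise ValueError("document vectors must have the same length")

    score = 0.0
    counted = 0  # features present in at least one document
    for x, y in zip(d1, d2):
        if x != 0 and y != 0:
            # Case a: present in both. Equal values score 1.0; the score
            # decays toward lower_bound as the values move apart.
            gap = (x - y) / sigma
            score += lower_bound + (1.0 - lower_bound) * math.exp(-gap * gap)
            counted += 1
        elif x != 0 or y != 0:
            # Case b: present in only one. Fixed value, magnitude ignored.
            score -= penalty
            counted += 1
        # Case c: absent from both. Assumed to contribute nothing.

    if counted == 0:
        # Assumption: two vectors with no features at all count as identical.
        return 1.0

    raw = score / counted  # lies in [-penalty, 1]
    return (raw + penalty) / (1.0 + penalty)  # rescaled to [0, 1]
```

With these assumed defaults, two identical vectors score 1.0, two vectors with no shared features score 0.0, and features that are zero in both vectors leave the result unchanged.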
Keywords :
text analysis; document data processing; feature values; novel similarity measure; text processing; Document similarity; Euclidean distance; Jaccard distance; classification accuracy; k-NN; similarity measure;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
Conference_Location :
Guilin
ISSN :
2160-133X
Print_ISBN :
978-1-4577-0305-8
Type :
conf
DOI :
10.1109/ICMLC.2011.6016998
Filename :
6016998