DocumentCode
3278594
Title
A similarity measure for text processing
Author
Jiang, Jung-Yi ; Cheng, Wen-Hao ; Chiou, Yu-Shu ; Lee, Shie-Jue
Author_Institution
Dept. of Electr. Eng., Nat. Sun Yat-Sen Univ., Kaohsiung, Taiwan
Volume
4
fYear
2011
fDate
10-13 July 2011
Firstpage
1460
Lastpage
1465
Abstract
In this paper, we propose a novel similarity measure for document data processing. For two document vectors, the proposed measure takes three cases into account: a) the feature considered appears in both documents, b) the feature considered appears in only one document, and c) the feature considered appears in neither document. For the first case, we give a lower bound and decrease the similarity according to the difference between the feature values of the two documents. For the second case, we give a fixed value regardless of the magnitude of the feature value. For the last case, we treat it as an identity. Experimental results show that our proposed method can work more effectively than other measures.
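The three cases in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything concrete here is an assumption for illustration and not the authors' definition: the function name `three_case_similarity`, the Gaussian decay, the parameters `lower_bound`, `penalty`, and `sigma`, and the final rescaling to [0, 1]. Case (c) is interpreted as "features absent from both documents do not change the score".

```python
import math


def three_case_similarity(x, y, lower_bound=0.5, penalty=1.0, sigma=1.0):
    """Illustrative three-case similarity between two document vectors.

    Hypothetical sketch: the decay shape and all parameters are assumptions,
    not the formula from the paper.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")

    score = 0.0
    counted = 0  # number of features present in at least one document
    for a, b in zip(x, y):
        if a != 0 and b != 0:
            # Case (a): feature in both documents. Contribution starts at 1
            # and decays toward lower_bound as the values move apart.
            decay = math.exp(-((a - b) / sigma) ** 2)
            score += lower_bound + (1.0 - lower_bound) * decay
            counted += 1
        elif a != 0 or b != 0:
            # Case (b): feature in only one document. Fixed contribution,
            # independent of the magnitude of the nonzero value.
            score -= penalty
            counted += 1
        # Case (c): feature in neither document. Skipped, so it leaves
        # the score unchanged (an interpretation of "identity").

    if counted == 0:
        # Two empty documents: treated as identical by assumption.
        return 1.0

    raw = score / counted  # lies in [-penalty, 1]
    return (raw + penalty) / (1.0 + penalty)  # rescaled to [0, 1]
```

With these assumed defaults, two identical vectors score 1.0 and two vectors with no shared features score 0.0; scaling the value of a feature that appears in only one document does not change the result.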
Keywords
text analysis; document data processing; feature values; novel similarity measure; text processing; Document similarity; Euclidean distance; Jaccard distance; classification accuracy; k-NN; similarity measure
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
Conference_Location
Guilin
ISSN
2160-133X
Print_ISBN
978-1-4577-0305-8
Type
conf
DOI
10.1109/ICMLC.2011.6016998
Filename
6016998