Title :
A similarity measure for text processing
Author :
Jiang, Jung-Yi ; Cheng, Wen-Hao ; Chiou, Yu-Shu ; Lee, Shie-Jue
Author_Institution :
Dept. of Electr. Eng., Nat. Sun Yat-Sen Univ., Kaohsiung, Taiwan
Abstract :
In this paper, we propose a novel similarity measure for document data processing. For two document vectors, the proposed measure takes three cases into account: a) The feature considered appears in both documents, b) the feature considered appears in only one document, and c) the feature considered appears in none of the documents. For the first case, we give a lower bound and decrease the similarity according to the difference between the feature values of the two documents. For the second case, we give a fixed value disregarding the magnitude of the feature value. For the last case, we treat it as an identity, Experimental results show that our proposed method can work more effectively than others.
Keywords :
text analysis; document data processing; feature values; novel similarity measure; text processing; Document similarity; Euclidean distance; Jaccard distance; classification accuracy; k-NN; similarity measure;
Conference_Titel :
Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
Conference_Location :
Guilin
Print_ISBN :
978-1-4577-0305-8
DOI :
10.1109/ICMLC.2011.6016998