Title :
A Comment on “A Similarity Measure for Text Classification and Clustering”
Author :
Nagwani, Naresh Kumar
Author_Institution :
National Institute of Technology Raipur, Raipur, India
Abstract :
A similarity measure, the similarity measure for text processing (SMTP), was proposed by Lin et al. [1] for knowledge discovery on text collections. SMTP considers three cases when measuring the similarity between a pair of documents, based on the presence or absence of features in the two documents. The first case covers features that appear in both documents, the second covers features that appear in only one document, and the third covers features that appear in neither document. SMTP is considered ideal for measuring the similarity between a pair of text documents on the basis of the presence or absence of their features. However, on closer examination of the SMTP measure, it is found that the case of measuring the similarity between a pair of similar documents is not covered. The objective of this work is to highlight this gap and to propose a minor change that makes SMTP a complete similarity measurement technique for knowledge discovery, in line with other standard similarity techniques.
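The three feature cases described above can be illustrated with a minimal Python sketch. It assumes each document is represented as a vector of term weights over a shared vocabulary, with a weight of 0 meaning the feature is absent. The function name `partition_features` is hypothetical, and the sketch shows only how features split into the three cases, not the SMTP scoring formula of Lin et al. [1] or the modification proposed in this comment.

```python
from typing import Dict, List, Sequence


def partition_features(d1: Sequence[float], d2: Sequence[float]) -> Dict[str, List[int]]:
    """Split feature indices of a document pair into the three presence/absence cases.

    Assumption: both documents are term-weight vectors over the same vocabulary,
    and a weight of 0 means the feature does not occur in that document.
    """
    if len(d1) != len(d2):
        raise ValueError("documents must share the same feature space")
    cases: Dict[str, List[int]] = {"both": [], "one": [], "none": []}
    for j, (a, b) in enumerate(zip(d1, d2)):
        if a != 0 and b != 0:
            cases["both"].append(j)  # case 1: feature appears in both documents
        elif a != 0 or b != 0:
            cases["one"].append(j)   # case 2: feature appears in only one document
        else:
            cases["none"].append(j)  # case 3: feature appears in neither document
    return cases


doc_a = [2.0, 0.0, 1.0, 0.0]
doc_b = [1.0, 3.0, 0.0, 0.0]
print(partition_features(doc_a, doc_b))
# {'both': [0], 'one': [1, 2], 'none': [3]}
```

A similarity measure built on this split can then weight each case differently; how SMTP does so is defined in [1].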
Keywords :
data mining; pattern classification; pattern clustering; text analysis; SMTP similarity measurement; knowledge discovery; similarity measure for text processing; similarity techniques; text classification; text clustering; text collection; text documents; Classification; Knowledge discovery; Measurement techniques; Text processing
Journal_Title :
IEEE Transactions on Knowledge and Data Engineering
DOI :
10.1109/TKDE.2015.2451616