A Comment on “A Similarity Measure for Text Classification and Clustering”

Author

Nagwani, Naresh Kumar

Author_Institution

Nat. Inst. of Technol. Raipur, Raipur, India

Volume

27

Issue

9

fYear

2015

Firstpage

2589

Lastpage

2590

Abstract

Lin et al. [1] proposed a similarity measure, the similarity measure for text processing (SMTP), for knowledge discovery on text collections. The measure considers three cases when measuring the similarity between a pair of documents, based on the presence or absence of features in the two documents. The first case covers features that appear in both documents, the second covers features that appear in only one document, and the third covers features that appear in neither document. The measure is considered ideal for finding the similarity between a pair of text documents on the basis of the presence or absence of features; however, a closer examination of SMTP shows that the case of measuring similarity between a pair of similar documents is not covered. The objective of this work is to highlight this gap and to propose a minor change that makes SMTP a complete similarity measure for knowledge discovery, in line with other standard similarity techniques.
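
The three presence/absence cases described in the abstract can be illustrated with a minimal sketch. It only counts how many features fall into each case for a pair of documents; it does not implement the SMTP formula, which the abstract does not give. The function name `count_feature_cases` and the representation of documents as equal-length lists of feature weights (zero meaning the feature is absent) are assumptions made for illustration.

```python
from typing import Dict, Sequence


def count_feature_cases(d1: Sequence[float], d2: Sequence[float]) -> Dict[str, int]:
    """Count features by presence/absence case for two documents.

    Hypothetical helper: each document is assumed to be a vector of
    feature weights, where a weight of 0 means the feature is absent.
    """
    if len(d1) != len(d2):
        raise ValueError("documents must use the same feature space")

    counts = {"both": 0, "one": 0, "none": 0}
    for w1, w2 in zip(d1, d2):
        in_d1, in_d2 = w1 != 0, w2 != 0
        if in_d1 and in_d2:
            counts["both"] += 1   # case 1: feature appears in both documents
        elif in_d1 or in_d2:
            counts["one"] += 1    # case 2: feature appears in only one document
        else:
            counts["none"] += 1   # case 3: feature appears in neither document
    return counts


cases = count_feature_cases([1, 0, 2, 0], [3, 0, 0, 0])
```

For two documents with identical feature vectors, the "one" count in this sketch is always zero, so only the first and third cases occur.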

Keywords

data mining; pattern classification; pattern clustering; text analysis; SMTP similarity measurement; knowledge discovery; similarity measure for text processing; similarity techniques; text classification; text clustering; text collection; text documents; Classification; Knowledge discovery; Measurement techniques; Text processing;

fLanguage

English

Journal_Title

Knowledge and Data Engineering, IEEE Transactions on

Publisher

IEEE

ISSN

1041-4347

Type

jour

DOI

10.1109/TKDE.2015.2451616

Filename

7177179