• DocumentCode
    3604232
  • Title

    A Comment on “A Similarity Measure for Text Classification and Clustering”

  • Author

    Nagwani, Naresh Kumar

  • Author_Institution
    Nat. Inst. of Technol. Raipur, Raipur, India
  • Volume
    27
  • Issue
    9
  • fYear
    2015
  • Firstpage
    2589
  • Lastpage
    2590
  • Abstract
    A similarity measure, namely the similarity measure for text processing (SMTP), was proposed by Lin et al. [1] for knowledge discovery on text collections. The measure considers three cases when measuring the similarity between a pair of documents, based on the absence or presence of features in the two documents. The first case covers features that appear in both documents, the second covers features that appear in only one document, and the third covers features that appear in neither document. The measure is considered ideal for finding the similarity between a pair of text documents on the basis of the presence or absence of features; however, on exploring SMTP, it is found that the case of measuring similarity between a pair of similar documents is not covered. The objective of this work is to highlight this gap and to propose a minor change that makes SMTP a complete similarity measure for knowledge discovery, in line with other standard similarity techniques.
  • Keywords
    data mining; pattern classification; pattern clustering; text analysis; SMTP similarity measurement; knowledge discovery; similarity measure for text processing; similarity techniques; text classification; text clustering; text collection; text documents; Classification; Knowledge discovery; Measurement techniques; Text processing;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2015.2451616
  • Filename
    7177179