• DocumentCode
    3278594
  • Title
    A similarity measure for text processing
  • Author
    Jiang, Jung-Yi ; Cheng, Wen-Hao ; Chiou, Yu-Shu ; Lee, Shie-Jue
  • Author_Institution
    Dept. of Electr. Eng., Nat. Sun Yat-Sen Univ., Kaohsiung, Taiwan
  • Volume
    4
  • fYear
    2011
  • fDate
    10-13 July 2011
  • Firstpage
    1460
  • Lastpage
    1465
  • Abstract
    In this paper, we propose a novel similarity measure for document data processing. For two document vectors, the proposed measure takes three cases into account: a) the feature considered appears in both documents, b) the feature considered appears in only one document, and c) the feature considered appears in neither document. For the first case, we give a lower bound and decrease the similarity according to the difference between the feature values of the two documents. For the second case, we give a fixed value regardless of the magnitude of the feature value. For the last case, we treat it as an identity. Experimental results show that our proposed method can work more effectively than others.
  • Keywords
    text analysis; document data processing; feature values; novel similarity measure; text processing; Document similarity; Euclidean distance; Jaccard distance; classification accuracy; k-NN; similarity measure;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
  • Conference_Location
    Guilin
  • ISSN
    2160-133X
  • Print_ISBN
    978-1-4577-0305-8
  • Type
    conf
  • DOI
    10.1109/ICMLC.2011.6016998
  • Filename
    6016998
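
The three cases described in the abstract can be illustrated with a small Python sketch. This is not the authors' formula, which the abstract does not give. The function name `similarity_sketch`, the Gaussian-shaped decay, the parameters `lower_bound`, `penalty`, and `scale`, the averaging step, and the choice to skip features absent from both documents are all assumptions made for illustration.

```python
import math


def similarity_sketch(d1, d2, lower_bound=0.5, penalty=-1.0, scale=1.0):
    """Hypothetical three-case similarity between two document vectors.

    Every constant and the functional form are illustrative assumptions;
    the abstract only describes the three cases qualitatively.
    """
    if len(d1) != len(d2):
        raise ValueError("document vectors must have the same length")

    total = 0.0
    counted = 0
    for x, y in zip(d1, d2):
        if x != 0 and y != 0:
            # Case a: feature in both documents. The score starts at 1.0 for
            # equal values and decays toward lower_bound as they diverge.
            closeness = math.exp(-((x - y) / scale) ** 2)
            total += lower_bound + (1.0 - lower_bound) * closeness
            counted += 1
        elif x != 0 or y != 0:
            # Case b: feature in only one document. A fixed value is used,
            # whatever the magnitude of the non-zero entry.
            total += penalty
            counted += 1
        # Case c: feature in neither document. Assumed here to contribute
        # nothing, so it is skipped.

    if counted == 0:
        # Assumption: two empty vectors carry no evidence either way.
        return 0.0
    return total / counted
```

With these assumed defaults, `similarity_sketch([1, 2, 0], [1, 2, 0])` evaluates to 1.0, and two documents with no features in common score -1.0. Averaging over only the features that occur in at least one document is a design choice of this sketch, made so that the many features absent from both documents do not dominate the result.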