• DocumentCode
    2891592
  • Title
    Discriminative Application of String Similarity Methods to Chemical and Non-chemical Names for Biomedical Abbreviation Clustering
  • Author
    Yamaguchi, Atsuko ; Yamamoto, Yasunori ; Kim, Jin-Dong ; Takagi, Toshihisa ; Yonezawa, Akinori
  • Author_Institution
    Database Center for Life Science, Research Organization of Information and Systems, Tokyo, Japan
  • fYear
    2011
  • fDate
    12-15 Nov. 2011
  • Firstpage
    544
  • Lastpage
    549
  • Abstract
    Term clustering by measuring the string similarities between terms is known to be an effective method to improve the quality of texts and dictionaries. However, based on our observations, chemical names are difficult to cluster using string similarity measures such as the edit distance. To demonstrate this difficulty clearly, we compared the string similarities determined using the edit distance, the Monge-Elkan score, SoftTFIDF, and the bigram Dice coefficient for chemical names with those for other terms. The experimental results show that the discriminative application of string similarity methods to chemical and non-chemical names may be a simple but effective way to improve the performance of term clustering.
  • Keywords
    bioinformatics; pattern clustering; string matching; text analysis; Monge-Elkan score; SoftTFIDF; bigram Dice coefficient; biomedical abbreviation clustering; discriminative application; edit distance; nonchemical name; string similarity method; term clustering; Biomedical measurements; Chemicals; Databases; Dictionaries; Gold; Length measurement; Unified modeling language; Abbreviation dictionary; String similarity measures; Term clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2011 IEEE International Conference on
  • Conference_Location
    Atlanta, GA
  • Print_ISBN
    978-1-4577-1799-4
  • Type
    conf
  • DOI
    10.1109/BIBM.2011.98
  • Filename
    6120499
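
Two of the string similarity measures named in the abstract, the edit distance and the bigram Dice coefficient, can be sketched as follows. This is a minimal illustration only: the record does not state which variants, normalizations, or tokenizations the authors used, so the function names (`edit_distance`, `bigram_dice`) and the choice of unit edit costs and multiset bigrams are assumptions. The Monge-Elkan score and SoftTFIDF are omitted because they depend on a token-level secondary measure that the record does not specify.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumed variant)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bigrams(s: str) -> list:
    """Overlapping character bigrams of s, in order."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def bigram_dice(a: str, b: str) -> float:
    """Dice coefficient over character-bigram multisets (an assumed variant)."""
    ba, bb = bigrams(a), bigrams(b)
    total = len(ba) + len(bb)
    if total == 0:
        # Neither string has a bigram; treat identical strings as fully similar.
        return 1.0 if a == b else 0.0
    overlap = sum((Counter(ba) & Counter(bb)).values())
    return 2.0 * overlap / total
```

For example, `edit_distance("kitten", "sitting")` gives 3, and `bigram_dice("night", "nacht")` gives 0.25, since the two words share only the bigram `ht` out of four bigrams each.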