DocumentCode
2891592
Title
Discriminative Application of String Similarity Methods to Chemical and Non-chemical Names for Biomedical Abbreviation Clustering
Author
Yamaguchi, Atsuko ; Yamamoto, Yasunori ; Kim, Jin-Dong ; Takagi, Toshihisa ; Yonezawa, Akinori
Author_Institution
Database Center for Life Science, Research Organization of Information and Systems, Tokyo, Japan
fYear
2011
fDate
12-15 Nov. 2011
Firstpage
544
Lastpage
549
Abstract
Term clustering by measuring the string similarities between terms is known to be an effective method to improve the quality of texts and dictionaries. However, based on our observations, chemical names are difficult to cluster using string similarity measures such as the edit distance. To demonstrate this difficulty clearly, we compared the string similarities determined using the edit distance, the Monge-Elkan score, SoftTFIDF, and the bigram Dice coefficient for chemical names with those for other terms. The experimental results show that the discriminative application of string similarity methods to chemical and non-chemical names may be a simple but effective way to improve the performance of term clustering.
Keywords
bioinformatics; pattern clustering; string matching; text analysis; Monge-Elkan score; SoftTFIDF; bigram Dice coefficient; biomedical abbreviation clustering; discriminative application; edit distance; nonchemical name; string similarity method; term clustering; Biomedical measurements; Chemicals; Databases; Dictionaries; Gold; Length measurement; Unified modeling language; Abbreviation dictionary; String similarity measures; Term clustering
fLanguage
English
Publisher
IEEE
Conference_Title
2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Conference_Location
Atlanta, GA
Print_ISBN
978-1-4577-1799-4
Type
conf
DOI
10.1109/BIBM.2011.98
Filename
6120499