DocumentCode :
2891592
Title :
Discriminative Application of String Similarity Methods to Chemical and Non-chemical Names for Biomedical Abbreviation Clustering
Author :
Yamaguchi, Atsuko ; Yamamoto, Yasunori ; Kim, Jin-Dong ; Takagi, Toshihisa ; Yonezawa, Akinori
Author_Institution :
Database Center for Life Sci., Res. Organ. of Inf. & Syst., Tokyo, Japan
fYear :
2011
fDate :
12-15 Nov. 2011
Firstpage :
544
Lastpage :
549
Abstract :
Term clustering by measuring the string similarities between terms is known to be an effective method to improve the quality of texts and dictionaries. However, based on our observations, chemical names are difficult to cluster using string similarity measures such as the edit distance. To demonstrate this difficulty clearly, we compared the string similarities determined using the edit distance, the Monge-Elkan score, SoftTFIDF, and the bigram Dice coefficient for chemical names with those for other terms. The experimental results show that the discriminative application of string similarity methods to chemical and non-chemical names may be a simple but effective way to improve the performance of term clustering.
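Two of the measures named in the abstract, the edit distance and the bigram Dice coefficient, can be sketched in a few lines. The following is a minimal illustration under our own assumptions (character-level bigrams, multiset overlap, no case folding or tokenization); it is not the authors' implementation. The function names are hypothetical, and the Monge-Elkan score and SoftTFIDF are omitted.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def bigrams(s: str) -> list[str]:
    """Overlapping character bigrams of s (assumption: no padding)."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice_bigram(a: str, b: str) -> float:
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over bigram multisets."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Both strings are shorter than two characters (assumed convention).
        return 1.0 if a == b else 0.0
    overlap = sum((x & y).values())
    return 2.0 * overlap / total
```

As an example of our own choosing (not taken from the paper), `edit_distance("1-propanol", "2-propanol")` and `dice_bigram("1-propanol", "2-propanol")` both rate the two strings as very similar, even though they name different compounds.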
Keywords :
bioinformatics; pattern clustering; string matching; text analysis; Monge-Elkan score; SoftTFIDF; bigram Dice coefficient; biomedical abbreviation clustering; discriminative application; edit distance; nonchemical name; string similarity method; term clustering; Biomedical measurements; Chemicals; Databases; Dictionaries; Gold; Length measurement; Unified modeling language; Abbreviation dictionary; String similarity measures; Term clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2011 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4577-1799-4
Type :
conf
DOI :
10.1109/BIBM.2011.98
Filename :
6120499