DocumentCode :
3520150
Title :
Detecting Duplicate Biological Entities Using Markov Random Field-Based Edit Distance
Author :
Song, Min ; Rudniy, Alex
Author_Institution :
Inf. Syst., New Jersey Inst. of Technol., Newark, NJ
fYear :
2008
fDate :
3-5 Nov. 2008
Firstpage :
457
Lastpage :
460
Abstract :
Detecting duplicate entities in biological data has become an important research task. In this paper, we propose a novel context-sensitive Markov random field-based edit distance (MRFED). We apply Markov random field theory to the Needleman-Wunsch distance and combine MRFED with TFIDF, a token-based distance algorithm, to obtain SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms, including the token-based ones, on both matching tasks.
Keywords :
Markov processes; biology computing; Markov random field-based edit distance; Needleman-Wunsch distance; biological entities; biological entity matching; synonym matching; token-based distance algorithm; Bioinformatics; Computer science; Computer vision; Cost function; Databases; Explosives; Image processing; Information systems; Markov random fields; Speech processing; Duplicate entities detection; Edit Distance; Markov Random Field Theory;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine, 2008. BIBM '08. IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-0-7695-3452-7
Type :
conf
DOI :
10.1109/BIBM.2008.34
Filename :
4684939