DocumentCode
3520150
Title
Detecting Duplicate Biological Entities Using Markov Random Field-Based Edit Distance
Author
Song, Min ; Rudniy, Alex
Author_Institution
Inf. Syst., New Jersey Inst. of Technol., Newark, NJ
fYear
2008
fDate
3-5 Nov. 2008
Firstpage
457
Lastpage
460
Abstract
Detecting duplicate entities in biological data has become an in-demand research task. In this paper, we propose a novel context-sensitive Markov random field-based edit distance (MRFED). We apply Markov random field theory to the Needleman-Wunsch distance and combine the resulting MRFED with TFIDF, a token-based distance algorithm, to obtain SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on two tasks: biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms, including the token-based ones, on both matching tasks.
Keywords
Markov processes; biology computing; Markov random field-based edit distance; Needleman-Wunsch distance; biological entities; biological entity matching; synonym matching; token-based distance algorithm; Bioinformatics; Computer science; Computer vision; Cost function; Databases; Explosives; Image processing; Information systems; Markov random fields; Speech processing; Duplicate entities detection; Edit Distance; Markov Random Field Theory;
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics and Biomedicine, 2008. BIBM '08. IEEE International Conference on
Conference_Location
Philadelphia, PA
Print_ISBN
978-0-7695-3452-7
Type
conf
DOI
10.1109/BIBM.2008.34
Filename
4684939