Title :
Detecting Duplicate Biological Entities Using Markov Random Field-Based Edit Distance
Author :
Song, Min ; Rudniy, Alex
Author_Institution :
Inf. Syst., New Jersey Inst. of Technol., Newark, NJ
Abstract :
Detecting duplicate entities in biological data has become an important research task. In this paper, we propose a novel context-sensitive Markov random field-based edit distance (MRFED). We apply Markov random field theory to the Needleman-Wunsch distance and combine MRFED with TFIDF, a token-based distance algorithm, to obtain SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms and that its performance is superior to the token-based distance algorithms in both matching tasks.
Keywords :
Markov processes; biology computing; Markov random field-based edit distance; Needleman-Wunsch distance; biological entities; biological entity matching; synonym matching; token-based distance algorithm; Bioinformatics; Computer science; Computer vision; Cost function; Databases; Explosives; Image processing; Information systems; Markov random fields; Speech processing; Duplicate entities detection; Edit Distance; Markov Random Field Theory;
Conference_Title :
Bioinformatics and Biomedicine, 2008. BIBM '08. IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-0-7695-3452-7
DOI :
10.1109/BIBM.2008.34