• DocumentCode
    3520150
  • Title
    Detecting Duplicate Biological Entities Using Markov Random Field-Based Edit Distance
  • Author
    Song, Min; Rudniy, Alex
  • Author_Institution
    Inf. Syst., New Jersey Inst. of Technol., Newark, NJ
  • fYear
    2008
  • fDate
    3-5 Nov. 2008
  • Firstpage
    457
  • Lastpage
    460
  • Abstract
    Detecting duplicate entities in biological data has become an in-demand research task. In this paper, we propose a novel context-sensitive Markov random field-based edit distance (MRFED). We apply Markov random field theory to the Needleman-Wunsch distance and combine MRFED with TFIDF, a token-based distance algorithm, to obtain SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms and that its performance is superior to token-based distance algorithms in both matching tasks.
  • Keywords
    Markov processes; biology computing; Markov random field-based edit distance; Needleman-Wunsch distance; biological entities; biological entity matching; synonym matching; token-based distance algorithm; Bioinformatics; Computer science; Computer vision; Cost function; Databases; Explosives; Image processing; Information systems; Markov random fields; Speech processing; Duplicate entities detection; Edit Distance; Markov Random Field Theory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine, 2008. BIBM '08. IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-0-7695-3452-7
  • Type
    conf
  • DOI
    10.1109/BIBM.2008.34
  • Filename
    4684939