Title :
Integrating Biomedical Publications with Existing Metadata
Author :
Nikolov, Nikolay ; Stoehr, Peter
Author_Institution :
Eur. Bioinf. Inst., Poznan
Abstract :
Currently, biomedical literature is largely disconnected from its metadata. While there are freely accessible centralised metadata repositories, the publications themselves are spread across a large number of repositories. We address this problem by harvesting freely accessible biomedical publications from the Web and integrating them with the corresponding metadata. The system applies a knowledge-based algorithm to recognise the title of each harvested publication, then fuzzily matches the extracted title against the metadata records using an edit distance metric. So far, we have located more than 300,000 publications on the Web and achieved over 96% precision and nearly 85% recall on a random sample of 250 documents harvested from the Web.
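The fuzzy-matching step described in the abstract can be illustrated with a minimal sketch. It assumes plain Levenshtein distance, normalised by the length of the longer title. The pre-processing in `normalise`, the record layout (a dict from record ID to title) and the `max_ratio` threshold of 0.1 are all hypothetical choices made for illustration. They are not the authors' published parameters, and the knowledge-based title recognition itself is not modelled.

```python
import re
from typing import Optional


def normalise(title: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace (assumed pre-processing)."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; iterate over the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(extracted: str, records: dict[str, str],
               max_ratio: float = 0.1) -> Optional[str]:
    """Return the ID of the closest metadata title, or None if none is close enough.

    max_ratio is a hypothetical cut-off on edit distance divided by the
    length of the longer (normalised) title.
    """
    query = normalise(extracted)
    best_id, best_ratio = None, max_ratio
    for record_id, title in records.items():
        candidate = normalise(title)
        longest = max(len(query), len(candidate)) or 1
        ratio = edit_distance(query, candidate) / longest
        if ratio <= best_ratio:
            best_id, best_ratio = record_id, ratio
    return best_id


if __name__ == "__main__":
    # Hypothetical metadata records keyed by an illustrative ID.
    records = {
        "rec1": "Integrating Biomedical Publications with Existing Metadata",
        "rec2": "Text Mining for Systems Biology",
    }
    print(best_match("Integrating biomedical publications with existing meta-data", records))
```

Normalising the distance by title length lets a single threshold work for both short and long titles. A small absolute distance, such as the one space introduced by "meta-data", is then tolerated, while unrelated titles are rejected.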
Keywords :
Internet; electronic publishing; fuzzy set theory; knowledge based systems; medical computing; meta data; World Wide Web; biomedical literature; biomedical publication; fuzzy match; knowledge-based algorithm; metadata; Bioinformatics; Biomedical computing; Fuzzy systems; HTML; Indexing; Information retrieval; Text mining; Uniform resource locators; Web services; XML; data integration;
Conference_Title :
Computer-Based Medical Systems, 2008. CBMS '08. 21st IEEE International Symposium on
Conference_Location :
Jyvaskyla
Print_ISBN :
978-0-7695-3165-6
DOI :
10.1109/CBMS.2008.127