DocumentCode
2096073
Title
Integrating Biomedical Publications with Existing Metadata
Author
Nikolov, Nikolay ; Stoehr, Peter
Author_Institution
Eur. Bioinf. Inst., Poznan
fYear
2008
fDate
17-19 June 2008
Firstpage
653
Lastpage
655
Abstract
Biomedical literature is currently largely disconnected from its metadata. While freely accessible, centralised metadata repositories exist, the publications themselves are scattered across a large number of repositories. We address this problem by harvesting freely accessible biomedical publications from the Web and integrating them with the corresponding metadata. The system applies title recognition to the harvested publications using a knowledge-based algorithm, then performs a fuzzy match between the extracted title and the metadata records using an edit distance metric. So far we have located more than 300,000 publications on the Web and achieved more than 96% precision and nearly 85% recall on a random sample of 250 harvested documents.
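The fuzzy-matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which edit distance variant, normalisation, or acceptance threshold the authors used, so everything below is an assumption made for illustration: Levenshtein distance, a simple lowercase and punctuation-stripping normalisation, and a hypothetical `max_ratio` cut-off of 0.2. The names `edit_distance`, `normalise`, and `best_match` are likewise hypothetical and are not taken from the paper.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalise(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace (assumed)."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def best_match(extracted: str, records: dict, max_ratio: float = 0.2):
    """Return the id of the closest metadata title, or None if none is close.

    `records` maps a record id to its title. `max_ratio` is an assumed
    threshold on edit distance divided by the longer title's length.
    """
    query = normalise(extracted)
    best_id, best_ratio = None, max_ratio
    for record_id, title in records.items():
        candidate = normalise(title)
        longest = max(len(query), len(candidate))
        if longest == 0:
            continue  # nothing to compare
        ratio = edit_distance(query, candidate) / longest
        if ratio <= best_ratio:
            best_id, best_ratio = record_id, ratio
    return best_id
```

Normalising the distance by title length keeps a single cut-off usable for both short and long titles. A linear scan like this would be slow against a large metadata repository, so a real system would presumably narrow the candidates first; the abstract gives no detail on that.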
Keywords
Internet; electronic publishing; fuzzy set theory; knowledge based systems; medical computing; meta data; World Wide Web; biomedical literature; biomedical publication; fuzzy match; knowledge-based algorithm; metadata; Bioinformatics; Biomedical computing; Fuzzy systems; HTML; Indexing; Information retrieval; Text mining; Uniform resource locators; Web services; XML; data integration
fLanguage
English
Publisher
ieee
Conference_Titel
Computer-Based Medical Systems, 2008. CBMS '08. 21st IEEE International Symposium on
Conference_Location
Jyvaskyla
ISSN
1063-7125
Print_ISBN
978-0-7695-3165-6
Type
conf
DOI
10.1109/CBMS.2008.127
Filename
4562076