Title :
Automatic Extraction of Bibliographic Information from Biomedical Online Journal Articles Using a String Matching Algorithm
Author :
Kim, Jongwoo ; Le, Daniel X. ; Thoma, George R.
Author_Institution :
Nat. Libr. of Medicine, Bethesda, MD
Abstract :
A system has been developed to extract bibliographic data (grant numbers and databank accession numbers) from online biomedical journal articles for the National Library of Medicine's MEDLINE® database. Rule-based algorithms and a string matching algorithm are proposed to extract the bibliographic data from HTML-formatted articles. Experiments conducted with 411 medical articles from 73 journal issues show an accuracy exceeding 96%.
Keywords :
bibliographic systems; information retrieval; knowledge based systems; medical information systems; string matching; MEDLINE database; automatic extraction; bibliographic information; biomedical online journal articles; databank accession numbers; grant numbers; rule-based algorithms; string matching algorithm; Data mining; Databases; Genetics; HTML; Labeling; Libraries; Mars; Production; Protein sequence; XML;
Conference_Title :
19th IEEE International Symposium on Computer-Based Medical Systems (CBMS 2006)
Conference_Location :
Salt Lake City, UT
Print_ISBN :
0-7695-2517-1
DOI :
10.1109/CBMS.2006.55