DocumentCode
2156145
Title
Automatic Extraction of Bibliographic Information from Biomedical Online Journal Articles Using a String Matching Algorithm
Author
Kim, Jongwoo ; Le, Daniel X. ; Thoma, George R.
Author_Institution
Nat. Libr. of Medicine, Bethesda, MD
fYear
0
fDate
0-0 0
Firstpage
905
Lastpage
912
Abstract
A system has been developed to extract bibliographic data (grant numbers and databank accession numbers) from online biomedical journal articles for the National Library of Medicine's MEDLINE® database. Rule-based algorithms and a string matching algorithm are proposed to extract the bibliographic data from HTML-formatted articles. Experiments conducted with 411 medical articles from 73 journal issues show an accuracy exceeding 96%.
Keywords
bibliographic systems; information retrieval; knowledge based systems; medical information systems; string matching; MEDLINE database; automatic extraction; bibliographic information; biomedical online journal articles; databank accession numbers; grant numbers; rule-based algorithms; string matching algorithm; Data mining; Databases; Genetics; HTML; Labeling; Libraries; Mars; Production; Protein sequence; XML;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer-Based Medical Systems, 2006. CBMS 2006. 19th IEEE International Symposium on
Conference_Location
Salt Lake City, UT
ISSN
1063-7125
Print_ISBN
0-7695-2517-1
Type
conf
DOI
10.1109/CBMS.2006.55
Filename
1647685