• DocumentCode
    2156145
  • Title

    Automatic Extraction of Bibliographic Information from Biomedical Online Journal Articles Using a String Matching Algorithm

  • Author

    Kim, Jongwoo ; Le, Daniel X. ; Thoma, George R.

  • Author_Institution
    Nat. Libr. of Medicine, Bethesda, MD
  • fYear
    2006
  • fDate
    0-0 0
  • Firstpage
    905
  • Lastpage
    912
  • Abstract
    A system has been developed to extract bibliographic data (grant numbers and databank accession numbers) from online biomedical journal articles for the National Library of Medicine's MEDLINE® database. Rule-based algorithms and a string matching algorithm are proposed to extract the bibliographic data from HTML-formatted articles. Experiments conducted with 411 medical articles from 73 journal issues show an accuracy exceeding 96%.
  • Keywords
    bibliographic systems; information retrieval; knowledge based systems; medical information systems; string matching; MEDLINE database; automatic extraction; bibliographic information; biomedical online journal articles; databank accession numbers; grant numbers; rule-based algorithms; string matching algorithm; Data mining; Databases; Genetics; HTML; Labeling; Libraries; Mars; Production; Protein sequence; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer-Based Medical Systems, 2006. CBMS 2006. 19th IEEE International Symposium on
  • Conference_Location
    Salt Lake City, UT
  • ISSN
    1063-7125
  • Print_ISBN
    0-7695-2517-1
  • Type

    conf

  • DOI
    10.1109/CBMS.2006.55
  • Filename
    1647685