• DocumentCode
    3353915
  • Title
    Web Page Downloading and Classification
  • Author
    Tran, Loc Q.; Moon, Chan W.; Le, Daniel X.; Thoma, George R.
  • Author_Institution
    Nat. Libr. of Med., Bethesda, MD, USA
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    321
  • Lastpage
    326
  • Abstract
    Describes the processes of downloading and classifying Web-based articles in online medical journals as a preliminary step to extracting bibliographic data to populate MEDLINE(R), the widely used database of the National Library of Medicine (NLM). The processes are combined to develop an automated system named WPDC (“Web Page Downloading and Classification”). The system downloads the Web pages using Microsoft's Windows Internet API tool, WinInet, and combines several artificial intelligence (AI) techniques, including the breadth-first search algorithm and the constraint satisfaction method. These two techniques are used to traverse a Web page's links, identify the linked pages as abstract, full-text, PDF or image files, and recognize and generate the successors of the pages being downloaded.
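    The abstract pairs breadth-first link traversal with constraint-based page classification. The sketch below illustrates that combination under stated assumptions; it is not the WPDC implementation. WinInet is replaced by a hypothetical `fetch` callable (here an in-memory dictionary), and the constraint sets in `CLASSES` are invented for illustration: a page receives a label only if every predicate in that label's constraint set holds, and labels are tried in order. This is simple constraint checking, not a full constraint-satisfaction solver.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Page:
    url: str
    content_type: str  # MIME type reported by the server (hypothetical field)
    text: str = ""
    links: List[str] = field(default_factory=list)


Constraint = Callable[[Page], bool]

# Hypothetical constraint sets; a label applies only if ALL its predicates hold.
CLASSES: List[Tuple[str, List[Constraint]]] = [
    ("PDF", [lambda p: p.content_type == "application/pdf"]),
    ("image", [lambda p: p.content_type.startswith("image/")]),
    ("abstract", [lambda p: p.content_type == "text/html",
                  lambda p: "abstract" in p.text.lower(),
                  lambda p: "references" not in p.text.lower()]),
    ("full text", [lambda p: p.content_type == "text/html",
                   lambda p: "references" in p.text.lower()]),
]


def classify(page: Page) -> Optional[str]:
    """Return the first label whose constraints are all satisfied, else None."""
    for label, constraints in CLASSES:
        if all(check(page) for check in constraints):
            return label
    return None


def bfs_download(start: str,
                 fetch: Callable[[str], Optional[Page]],
                 max_pages: int = 50) -> Dict[str, Optional[str]]:
    """Visit pages breadth-first from `start`, labelling each one.

    `fetch` stands in for the real downloader (WinInet in the paper) and
    returns None for pages that cannot be retrieved.
    """
    labels: Dict[str, Optional[str]] = {}
    seen = {start}
    queue = deque([start])
    while queue and len(labels) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue
        labels[url] = classify(page)
        # Successors of the current page are queued once, in link order.
        for link in page.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return labels


# Usage with a hypothetical in-memory journal site.
SITE = {
    "toc": Page("toc", "text/html", "Table of contents", ["a1", "f1"]),
    "a1": Page("a1", "text/html", "Abstract: we study ...", ["f1", "p1"]),
    "f1": Page("f1", "text/html", "Introduction ... References ...", ["fig1"]),
    "p1": Page("p1", "application/pdf"),
    "fig1": Page("fig1", "image/gif"),
}
labels = bfs_download("toc", SITE.get)
print(labels)
```

    Breadth-first order visits every page one link away from the table of contents before going deeper, which suits journal sites where abstracts and full-text articles sit a fixed, shallow distance from an issue's index page. Ordering the constraint sets lets more specific rules (PDF, image) win before the HTML heuristics are tried.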
  • Keywords
    Internet; application program interfaces; bibliographic systems; classification; constraint handling; document handling; hypermedia; information resources; medical information systems; tree searching; Internet API tool; MEDLINE; Microsoft Windows; PDF file; WPDC; Web Page Downloading and Classification; Web page identification; Web page link traversal; WinInet; World Wide Web-based articles; abstract; artificial intelligence techniques; bibliographic data extraction; breadth-first search algorithm; constraint satisfaction method; downloading page successors; full text; image file; online medical journals; Biomedical imaging; Data mining; Image databases; Lab-on-a-chip; Libraries; Mars; Moon; Web pages; Web server
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer-Based Medical Systems, 2001. CBMS 2001. Proceedings. 14th IEEE Symposium on
  • Conference_Location
    Bethesda, MD
  • ISSN
    1063-7125
  • Print_ISBN
    0-7695-1004-3
  • Type
    conf
  • DOI
    10.1109/CBMS.2001.941739
  • Filename
    941739