• DocumentCode
    1282563
  • Title

    A parallel computing approach to creating engineering concept spaces for semantic retrieval: the Illinois Digital Library Initiative project

  • Author

    Chen, Hsinchun ; Schatz, Bruce ; Ng, Tobun ; Martinez, Joanne ; Kirchhoff, Amy ; Lin, Chienting

  • Author_Institution
    Dept. of Manage. Inf. Syst., Arizona Univ., Tucson, AZ, USA
  • Volume
    18
  • Issue
    8
  • fYear
    1996
  • fDate
    8/1/1996 12:00:00 AM
  • Firstpage
    771
  • Lastpage
    782
  • Abstract
    This research presents preliminary results generated from the semantic retrieval research component of the Illinois Digital Library Initiative (DLI) project. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across different concept spaces could potentially help alleviate the vocabulary (difference) problem evident in large-scale information retrieval. To address the scalability issue related to large-scale information retrieval and analysis for the current Illinois DLI project, we conducted experiments using the concept space approach on parallel supercomputers. Our test collection included computer science and electrical engineering abstracts extracted from the INSPEC database. The concept space approach called for extensive textual and statistical analysis (a form of knowledge discovery) based on automatic indexing and co-occurrence analysis algorithms, both previously tested in the biology domain. Initial testing results using a 512-node CM-5 and a 16-processor SGI Power Challenge were promising.
  • Keywords
    bibliographic systems; indexing; information retrieval; information services; library automation; parallel processing; statistical analysis; thesauri; CM-5; INSPEC database; Illinois Digital Library Initiative project; SGI Power Challenge; automatic indexing; automatic thesaurus generation; computer science abstracts; domain-specific concepts; electrical engineering abstracts; engineering concept spaces; graphs; large-scale information retrieval; parallel computing; parallel supercomputers; scalability; semantic retrieval; statistical analysis; terms; textual analysis; vocabulary; weighted co-occurrence relationships; Information analysis; Information retrieval; Large-scale systems; Merging; Parallel processing; Scalability; Software libraries; Testing; Thesauri; Vocabulary
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/34.531798
  • Filename
    531798