Title :
A parallel computing approach to creating engineering concept spaces for semantic retrieval: the Illinois Digital Library Initiative project
Author :
Chen, Hsinchun ; Schatz, Bruce ; Ng, Tobun ; Martinez, Joanne ; Kirchhoff, Amy ; Lin, Chienting
Author_Institution :
Dept. of Manage. Inf. Syst., Arizona Univ., Tucson, AZ, USA
fDate :
8/1/1996
Abstract :
This research presents preliminary results generated from the semantic retrieval research component of the Illinois Digital Library Initiative (DLI) project. Using a variation of automatic thesaurus generation techniques, which we refer to as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across different concept spaces could potentially help alleviate the vocabulary (difference) problem evident in large-scale information retrieval. To address the scalability issue related to large-scale information retrieval and analysis for the current Illinois DLI project, we conducted experiments using the concept space approach on parallel supercomputers. Our test collection included computer science and electrical engineering abstracts extracted from the INSPEC database. The concept space approach called for extensive textual and statistical analysis (a form of knowledge discovery) based on automatic indexing and co-occurrence analysis algorithms, both previously tested in the biology domain. Initial testing results using a 512-node CM-5 and a 16-processor SGI Power Challenge were promising.
Keywords :
bibliographic systems; indexing; information retrieval; information services; library automation; parallel processing; statistical analysis; thesauri; CM-5; INSPEC database; Illinois Digital Library Initiative project; SGI Power Challenge; automatic indexing; automatic thesaurus generation; computer science abstracts; domain-specific concepts; electrical engineering abstracts; engineering concept spaces; graphs; large-scale information retrieval; parallel computing; parallel supercomputers; scalability; semantic retrieval; terms; textual analysis; vocabulary; weighted co-occurrence relationships; Information analysis; Information retrieval; Large-scale systems; Merging; Parallel processing; Scalability; Software libraries; Testing; Thesauri; Vocabulary
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on