DocumentCode :
1282563
Title :
A parallel computing approach to creating engineering concept spaces for semantic retrieval: the Illinois Digital Library Initiative project
Author :
Chen, Hsinchun ; Schatz, Bruce ; Ng, Tobun ; Martinez, Joanne ; Kirchhoff, Amy ; Lin, Chienting
Author_Institution :
Dept. of Manage. Inf. Syst., Arizona Univ., Tucson, AZ, USA
Volume :
18
Issue :
8
fYear :
1996
fDate :
8/1/1996 12:00:00 AM
Firstpage :
771
Lastpage :
782
Abstract :
This research presents preliminary results generated from the semantic retrieval research component of the Illinois Digital Library Initiative (DLI) project. Using a variation of automatic thesaurus generation techniques, which we refer to as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across different concept spaces could potentially help alleviate the vocabulary (difference) problem evident in large-scale information retrieval. In order to address the scalability issue related to large-scale information retrieval and analysis for the current Illinois DLI project, we conducted experiments using the concept space approach on parallel supercomputers. Our test collection included computer science and electrical engineering abstracts extracted from the INSPEC database. The concept space approach called for extensive textual and statistical analysis (a form of knowledge discovery) based on automatic indexing and co-occurrence analysis algorithms, both previously tested in the biology domain. Initial testing results using a 512-node CM-5 and a 16-processor SGI Power Challenge were promising.
Keywords :
bibliographic systems; indexing; information retrieval; information services; library automation; parallel processing; statistical analysis; thesauri; CM-5; INSPEC database; Illinois Digital Library Initiative project; SGI Power Challenge; automatic indexing; automatic thesaurus generation; computer science abstracts; domain-specific concepts; electrical engineering abstracts; engineering concept spaces; graphs; large-scale information retrieval; parallel computing; parallel supercomputers; scalability; semantic retrieval; statistical analysis; terms; textual analysis; vocabulary; weighted co-occurrence relationships; Information analysis; Information retrieval; Large-scale systems; Merging; Parallel processing; Scalability; Software libraries; Testing; Thesauri; Vocabulary;
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/34.531798
Filename :
531798