DocumentCode :
2351590
Title :
Factor semantics for document retrieval
Author :
Calvo, Rafael A.
Author_Institution :
Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear :
1998
fDate :
9-11 Dec 1998
Firstpage :
198
Lastpage :
203
Abstract :
Principal component analysis is a useful technique for reducing the dimensionality of datasets, a paramount need in high-dimensional term spaces. We study three neural networks with Hebbian-like learning that approximately produce the principal components of a document database. The explained variance of the solutions shows how much information the reduced space retains. In this database, the first factor is strongly described by words that are usually included in a stop-word list, whereas other factors show high loadings on content-specific terms; this indicates that the networks are learning the semantic space. The retrieval performance on the reduced space is compared with that of other methods.
Keywords :
Hebbian learning; database management systems; feature extraction; neural nets; principal component analysis; query processing; Hebbian-like learning; datasets; dimensionality reduction; document database; document retrieval; factor semantics; neural networks; semantic space; Frequency; Helium; Information retrieval; Large scale integration; Mutual information; Reactive power; Redundancy; Thesauri
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 1998. Proceedings. Vth Brazilian Symposium on
Conference_Location :
Belo Horizonte
Print_ISBN :
0-8186-8629-4
Type :
conf
DOI :
10.1109/SBRN.1998.731028
Filename :
731028