Title : 
Latent semantic indexing and large dataset: Study of term-weighting schemes
         
        
            Author : 
Zaman, A.N.K. ; Brown, Charles Grant
         
        
            Author_Institution : 
Comput. Sci. Program, Univ. of Northern British Columbia (UNBC), Prince George, BC, Canada
         
        
        
        
        
            Abstract : 
The primary purpose of an information retrieval (IR) system is to retrieve all the documents that are relevant to the user query. The Latent Semantic Indexing/Analysis (LSI/LSA) based ad hoc document retrieval task investigates the performance of retrieval systems that search a static set of documents using new questions. The performance of LSI has been tested by others on several smaller datasets (e.g. MED, CISI abstracts); however, LSI has not been tested on a large dataset. We therefore tested LSI on a very large dataset, the TREC-8 LA Times collection. We applied three different term-weighting schemes and our own stop word list to judge the performance. A recall-precision graph and the Coefficient of Variation (CV) were used to evaluate the retrieval performance of the LSI-based retrieval system. We found that the tf-idf term-weighting scheme performs better than the log-entropy and raw term frequency weighting schemes when the test collection becomes very large.
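
The abstract names three term-weighting schemes and the Coefficient of Variation without giving formulas. The following is a minimal sketch of one common textbook form of each, not the authors' implementation. Everything in it is an assumption: the function names, the rows-are-terms layout of the count matrix, the unsmoothed natural-log idf, the log-entropy variant shown, and the use of the population standard deviation in the CV.

```python
import math
from statistics import mean, pstdev

# counts[i][j] = number of times term i occurs in document j
# (hypothetical toy term-by-document matrix, rows are terms).

def raw_tf(counts):
    """Raw term frequency: the weight is simply the count."""
    return [[float(c) for c in row] for row in counts]

def tf_idf(counts):
    """tf-idf, assuming idf = ln(N / df) with no smoothing."""
    n_docs = len(counts[0])
    weighted = []
    for row in counts:
        df = sum(1 for c in row if c > 0)  # documents containing the term
        idf = math.log(n_docs / df) if df else 0.0
        weighted.append([c * idf for c in row])
    return weighted

def log_entropy(counts):
    """Log-entropy, assuming local weight ln(1 + tf) and
    global weight 1 + sum_j(p_ij * ln p_ij) / ln N."""
    n_docs = len(counts[0])
    weighted = []
    for row in counts:
        gf = sum(row)  # total occurrences of the term in the collection
        entropy = sum((c / gf) * math.log(c / gf) for c in row if c > 0)
        g = 1.0 + entropy / math.log(n_docs) if n_docs > 1 else 1.0
        weighted.append([g * math.log(1 + c) for c in row])
    return weighted

def coefficient_of_variation(values):
    """CV = standard deviation / mean (population form assumed)."""
    return pstdev(values) / mean(values)

# Toy example: three terms, three documents.
toy_counts = [
    [1, 1, 1],  # term spread evenly over all documents
    [3, 0, 0],  # term concentrated in one document
    [2, 1, 0],
]
toy_tfidf = tf_idf(toy_counts)
toy_logent = log_entropy(toy_counts)
```

Under these assumed definitions, a term that occurs evenly in every document receives a weight of zero from both tf-idf and log-entropy, while raw term frequency keeps its counts unchanged. In an LSI pipeline the weighted matrix would then be passed to a truncated singular value decomposition, which this sketch leaves out.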
         
        
            Keywords : 
database indexing; query processing; very large databases; TREC-8 LA Times dataset; ad hoc document retrieval task; coefficient of variation; information retrieval system; latent semantic indexing; recall-precision graph; tf-idf term weighting scheme; user query; very large dataset; Artificial neural networks; Decision support systems; recall-precision; retrieval performance; term-weighting
         
        
        
        
            Conference_Title : 
Digital Information Management (ICDIM), 2010 Fifth International Conference on
         
        
            Conference_Location : 
Thunder Bay, ON
         
        
            Print_ISBN : 
978-1-4244-7572-8
         
        
        
            DOI : 
10.1109/ICDIM.2010.5664669