Title :
Evaluation of stop word lists in text retrieval using Latent Semantic Indexing
Author :
Zaman, A.N.K. ; Matsakis, Pascal ; Brown, Charles
Author_Institution :
Sch. of Comput. Sci., Univ. of Guelph, Guelph, ON, Canada
Abstract :
The goal of this research is to evaluate the use of English stop word lists in Latent Semantic Indexing (LSI)-based Information Retrieval (IR) systems with large text datasets. The literature claims that the use of such lists improves retrieval performance. Here, three different lists are compared: two were compiled by IR groups at the University of Glasgow and the University of Tennessee, and one is our own list developed at the University of Northern British Columbia. We also examine the case where stop words are not removed from the input dataset. Our research finds that using tailored stop word lists improves retrieval performance. On the other hand, using arbitrary (non-tailored) lists or not using any list reduces the retrieval performance of LSI-based IR systems with large text datasets.
Keywords :
indexing; information retrieval; natural languages; text analysis; English stop word list; IR group; LSI-based IR system; Northern British Columbia; University of Glasgow; University of Tennessee; large text dataset; latent semantic indexing-based information retrieval system; Educational institutions; Indexing; Information retrieval; Large scale integration; Matrix decomposition; Semantics; Vectors; Information Retrieval; Latent Semantic Indexing; recall-precision; stop words;
Conference_Title :
Digital Information Management (ICDIM), 2011 Sixth International Conference on
Conference_Location :
Melbourne, VIC
Print_ISBN :
978-1-4577-1538-9
DOI :
10.1109/ICDIM.2011.6093315