Title :
Efficient retrieval of Malay language documents using Latent Semantic Indexing
Author :
Sadjirin, Roslan; Rahman, Nurazzah Abd
Author_Institution :
Fac. of Comput. & Math. Sci., Univ. Teknol. MARA, Shah Alam, Malaysia
Abstract :
The main objective of this research is to investigate whether Latent Semantic Indexing (LSI) improves retrieval effectiveness on Malay documents compared with the exact term-matching technique. LSI is a mathematical approach that uses Singular Value Decomposition (SVD) to discover important associations between terms and terms, terms and documents, and documents and documents. Cosine similarity is used to measure the similarity of the query words to the terms as well as to the documents. This research uses the Malay Language Test Collection, which consists of 210 Malay documents, queries, and relevance judgments, together with a Malay stemmer to stem Malay terms. Results and analyses show that the LSI retrieval method outperformed the exact term-matching technique, despite the longer processing time it required during indexing. The best retrieval effectiveness for Malay documents in this domain, 80.2 percent, is achieved when the k-dimension is 4 and the threshold value is 0.8.
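The abstract names the LSI pipeline (a term-document matrix, an SVD truncated to k dimensions, cosine similarity against a threshold) without giving its details. The following is a minimal pure-Python sketch under our own assumptions, not the authors' implementation: raw term frequencies with no weighting or stemming; the top-k left singular vectors U_k obtained from an eigen-decomposition of A Aᵀ with Jacobi rotations instead of a dedicated SVD routine; query and documents both mapped into the latent space as U_kᵀx; and the threshold treated as a cutoff on cosine similarity. The function names `jacobi_eigen`, `cosine`, and `lsi_retrieve` and the toy collection are hypothetical.

```python
import math


def jacobi_eigen(m, tol=1e-12, max_sweeps=100):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors); eigenvector i is column i of the
    returned matrix. Eigenvalues are not sorted.
    """
    n = len(m)
    a = [row[:] for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Choose the rotation that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nx = math.sqrt(sum(xi * xi for xi in x))
    ny = math.sqrt(sum(yi * yi for yi in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(xi * yi for xi, yi in zip(x, y)) / (nx * ny)


def lsi_retrieve(term_doc, query, k, threshold):
    """Return (doc_index, score) pairs whose LSI cosine with the query >= threshold."""
    n_terms, n_docs = len(term_doc), len(term_doc[0])
    # A A^T (terms x terms): its eigenvectors are the left singular vectors of A.
    aat = [[sum(term_doc[i][d] * term_doc[j][d] for d in range(n_docs))
            for j in range(n_terms)] for i in range(n_terms)]
    eigvals, eigvecs = jacobi_eigen(aat)
    top = sorted(range(n_terms), key=lambda i: eigvals[i], reverse=True)[:k]

    def project(vec):
        # U_k^T vec: coordinates of a term-space vector in the k-dimensional space
        return [sum(eigvecs[t][i] * vec[t] for t in range(n_terms)) for i in top]

    q_hat = project(query)
    hits = []
    for d in range(n_docs):
        score = cosine(q_hat, project([term_doc[t][d] for t in range(n_terms)]))
        if score >= threshold:
            hits.append((d, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical toy collection. Rows: car, automobile, engine, flower, garden.
# Columns: d0 = "car engine", d1 = "automobile engine", d2 = "flower garden", d3 = "garden".
A = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]
query = [1, 0, 0, 0, 0]  # the single query term "car"
hits = lsi_retrieve(A, query, k=2, threshold=0.8)
print([d for d, _ in hits])  # documents 0 and 1 (tie order may vary)
```

On this toy collection, exact term matching for "car" returns only d0, whereas in the k = 2 space d1 ("automobile engine") also clears the 0.8 cutoff, because both documents load on the same latent dimension. Forming A Aᵀ squares the condition number, so for a collection of realistic size a library SVD routine would be the more robust choice.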
Keywords :
indexing; information retrieval; natural language processing; singular value decomposition; word processing; Malay language document retrieval; Malay language test collection; Malay stemmer; exact term matching technique; latent semantic indexing; DSL; Decision support systems; Latent Semantic Analysis; Latent Semantic Indexing; Malay Information Retrieval
Conference_Title :
2010 International Symposium in Information Technology (ITSim)
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4244-6715-0
DOI :
10.1109/ITSIM.2010.5561613