Title :
Concept Based Search Using LSI and Automatic Keyphrase Extraction
Author :
Rodrigues, Ravina; Asnani, Kavita
Author_Institution :
Department of Information Technology (M.E.), Padre Conceicao College of Engineering, Verna, India
Abstract :
Classic information retrieval models can perform poorly because unrelated documents may be included in the answer set, while relevant documents that do not contain at least one index term are missed. Retrieval based on index terms is vague and noisy. A user's information need relates more to concepts and ideas than to index terms. The Latent Semantic Indexing (LSI) model is a concept-based retrieval method that overcomes many of the problems evident in today's popular word-based retrieval systems. Most retrieval systems match words in the user's queries with words in the text of documents in the corpus, whereas the LSI model performs the match based on concepts. Singular Value Decomposition (SVD) is used to perform this concept mapping. Key phrases are also an important means of document summarization, clustering and topic search: they give a high-level description of a document's contents that makes it easy for prospective readers to decide whether or not it is relevant to them. In this paper, we first develop an automatic key phrase extraction model for extracting key phrases from documents and then use these key phrases as a corpus on which conceptual search is performed using LSI.
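The pipeline described in the abstract (extract key phrases from each document, then run LSI over the resulting key phrase corpus using SVD) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the key phrase step is a simplified frequency-based heuristic swapped in for the paper's extraction model, the stopword list and the names `extract_keyphrases`, `build_lsi` and `lsi_search` are hypothetical, and queries are projected into the concept space with U_k^T and ranked by cosine similarity.

```python
import re

import numpy as np

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "from", "in",
             "is", "of", "on", "or", "that", "the", "to", "which", "with"}


def extract_keyphrases(text, top_k=5):
    """Simplified stand-in for key phrase extraction (NOT the paper's model):
    candidates are maximal runs of non-stopwords between stopwords or
    punctuation, scored by the summed corpus frequency of their words."""
    tokens = re.findall(r"[a-z]+|[.,;:!?]", text.lower())
    candidates, current = [], []
    for tok in tokens:
        if tok in STOPWORDS or not tok.isalpha():
            if current:
                candidates.append(tuple(current))
            current = []
        else:
            current.append(tok)
    if current:
        candidates.append(tuple(current))
    freq = {}
    for phrase in candidates:
        for word in phrase:
            freq[word] = freq.get(word, 0) + 1
    scores = {p: sum(freq[w] for w in p) for p in candidates}
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [" ".join(p) for p in ranked[:top_k]]


def build_lsi(corpus, k):
    """Build a term-document count matrix A and keep the top-k singular
    triplets of A = U S V^T as the 'concept' space."""
    vocab = sorted({t for doc in corpus for t in doc.split()})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(corpus)))
    for j, doc in enumerate(corpus):
        for t in doc.split():
            A[index[t], j] += 1
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]
    # Column j of S_k V_k^T equals U_k^T a_j: document j in concept space.
    doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T
    return index, Uk, doc_vecs


def lsi_search(query, index, Uk, doc_vecs):
    """Project the query with U_k^T and return cosine similarity per document."""
    q = np.zeros(Uk.shape[0])
    for t in query.lower().split():
        if t in index:  # out-of-vocabulary terms are ignored
            q[index[t]] += 1
    q_k = Uk.T @ q
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_k)
    sims = np.zeros(len(doc_vecs))
    nonzero = norms > 0  # avoid dividing by zero for empty vectors
    sims[nonzero] = (doc_vecs[nonzero] @ q_k) / norms[nonzero]
    return sims


raw_docs = [
    "Repair of the car engine.",
    "Repair of an automobile engine.",
    "A recipe for cooking pasta.",
]
# Step 1: represent each document by its extracted key phrases.
keyphrase_corpus = [" ".join(extract_keyphrases(d)) for d in raw_docs]
# Step 2: LSI over the key phrase corpus, keeping k = 2 concepts.
index, Uk, doc_vecs = build_lsi(keyphrase_corpus, k=2)
scores = lsi_search("car", index, Uk, doc_vecs)
for doc, score in zip(raw_docs, scores):
    print(f"{score:.2f}  {doc}")
```

With k = 2 the query "car" should score the "automobile engine" document about as highly as the "car engine" one, even though it never contains the word "car", because both documents share "engine" and "repair" and fall onto the same concept; the pasta document shares no terms and should score near zero. The choice of k is the main tuning decision in this sketch: too few concepts merge unrelated topics, while keeping all of them reduces LSI back to plain term matching.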
Keywords :
indexing; information needs; information retrieval; singular value decomposition; LSI; automatic keyphrase extraction; concept based search; concept-based retrieval method; document clustering; document summarization; information need; information retrieval model; latent semantic indexing model; Keyphrases; Latent Semantic Indexing; Retrieval models
Conference_Title :
2010 3rd International Conference on Emerging Trends in Engineering and Technology (ICETET)
Conference_Location :
Goa
Print_ISBN :
978-1-4244-8481-2
Electronic_ISSN :
2157-0477
DOI :
10.1109/ICETET.2010.100