Title :
Shallow NLP techniques for noun phrase extraction
Author :
Subhashini, R. ; Kumar, V.J.S.
Author_Institution :
Sathyabama Univ., Chennai, India
Abstract :
The field of information retrieval plays an important role in searching the Internet. Most information retrieval systems are limited to keyword-based query processing. In an information retrieval system, matching the query against a set of text records is the core of the system. Retrieving relevant natural language text documents is more challenging. Most of today's search engines are based on keyword-based (bag-of-words) techniques, which have some disadvantages. For text retrieval, key phrases can help to narrow the search results or rank retrieved documents. We exploit shallow NLP techniques to support a range of NL queries and snippets over an existing keyword-based search. This paper describes a simple system for choosing noun phrases from a document as key phrases. The noun phrase extractor is made up of three modules: tokenization, part-of-speech tagging, and noun phrase identification using chunking. A preliminary evaluation was conducted to test this technique, first with standard IR benchmark collections such as the classic test collections and then with a collection of web snippets from search engine results. The experimental results have been encouraging.
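The three-module pipeline named in the abstract (tokenization, part-of-speech tagging, chunking) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy `LEXICON`, the function names, and the chunk pattern (optional determiner, any adjectives, one or more nouns) are all assumptions made for illustration, and a real system would use a trained tagger instead of a lookup table.

```python
import re

# Hypothetical toy lexicon (assumption); a real system would use a trained POS tagger.
LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "simple": "JJ", "relevant": "JJ", "shallow": "JJ",
    "system": "NN", "document": "NN", "noun": "NN", "phrases": "NNS",
    "extracts": "VBZ", "ranks": "VBZ",
    "from": "IN", "of": "IN",
}

def tokenize(text):
    """Module 1: split text into word, number, and punctuation tokens."""
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*|\d+|[^\w\s]", text)

def pos_tag(tokens):
    """Module 2: tag each token via the toy lexicon; unknown words default to NN."""
    tagged = []
    for tok in tokens:
        if not tok[0].isalnum():
            tag = "PUNCT"
        else:
            tag = LEXICON.get(tok.lower(), "NN")
        tagged.append((tok, tag))
    return tagged

def chunk_noun_phrases(tagged):
    """Module 3: collect spans matching the assumed pattern DT? JJ* (NN|NNS)+."""
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":          # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":  # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] in ("NN", "NNS"):  # one or more nouns
            k += 1
        if k > j:                         # at least one noun was found
            phrases.append(" ".join(tok for tok, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return phrases

def extract_noun_phrases(text):
    """Run the three modules in sequence."""
    return chunk_noun_phrases(pos_tag(tokenize(text)))
```

For example, `extract_noun_phrases("The simple system extracts noun phrases from a document.")` yields the three candidate key phrases "The simple system", "noun phrases", and "a document".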
Keywords :
Internet; natural language processing; query processing; search engines; NLP techniques; chunking; information retrieval system; keyword based techniques; natural language text document; noun phrase extraction; part-of-speech tagging; tokenization; Arrays; Data mining; Syntactics; Tagging; Information Retrieval; NLP; Noun Phrases
Conference_Title :
Trendz in Information Sciences & Computing (TISC), 2010
Conference_Location :
Chennai
Print_ISBN :
978-1-4244-9007-3
DOI :
10.1109/TISC.2010.5714612