Title :
Extraction of purpose data using surface text patterns
Author :
Mayee, P. Kiran ; Sangal, Rajeev ; Paul, Soma
Author_Institution :
Language Technol. Res. Centre, Int. Inst. of Inf. Technol., Hyderabad, India
Abstract :
This paper presents the concept of surface text patterns for extracting purpose data from the web. To obtain an optimal set of patterns, we developed a method for learning purpose patterns automatically. A corpus was downloaded from the Internet by bootstrapping: a few hand-crafted examples of each purpose pattern were submitted to a generic search engine. The returned documents were then tagged, and patterns were extracted from them automatically and standardized. The precision of each pattern and the average precision of each pattern group were computed. The extracted patterns were then used to extract purpose data, and the results of extraction from the web are reported.
Keywords :
Internet; information retrieval; learning (artificial intelligence); search engines; text analysis; World Wide Web; bootstrapping; generic search engine; learning purpose pattern; purpose data extraction; surface text pattern; Data mining; Encyclopedias; Google; Pattern matching; Surface morphology; Information Extraction; Surface Text Patterns
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
DOI :
10.1109/NLPKE.2010.5587860