Title :
A Novel Approach for Designing Indian Regional Language Based Raw-Text Extractor and Unicode Font-Mapping Tool
Author :
Bhattacharyya, Debnath ; Das, Poulami ; Ganguly, Debashis ; Mitra, Kheyali ; Mukherjee, Swarnendu ; Bandyopadhyay, Samir Kumar ; Tai-Hoon Kim
Author_Institution :
Comput. Sci. & Eng. Dept., Heritage Inst. of Technol., Kolkata, India
Abstract :
Extracting specific information from a collection of documents is called information extraction (IE). In general, information on the Web is well structured in HTML or XML format, and IE from such structured documents basically relies on learning techniques for pattern matching in the content. In this paper, we propose a novel approach to interactive information extraction. We describe how this approach enables even a naive user to efficiently extract Indian regional language based documents from a Web document, in a manner quite similar to using a standard search engine. In effect, it behaves like a pre-programmed information extraction engine.
Keywords :
XML; hypermedia markup languages; information retrieval; learning (artificial intelligence); natural language processing; pattern matching; text analysis; HTML; Indian regional language design; Unicode font-mapping tool; Web document; XML format; interactive information extraction technique; learning techniques; raw-text extractor; standard search engine; Application software; Computer science; Data mining; Design engineering; Knowledge engineering; Natural languages; Search engines; Web sites; Corpus; Information Extraction; Mapped
Conference_Title :
Advanced Science and Technology, 2009. AST '09. International e-Conference on
Conference_Location :
Daejeon
Print_ISBN :
978-0-7695-3672-9
DOI :
10.1109/AST.2009.16