DocumentCode :
3278933
Title :
A Novel Approach for Designing Indian Regional Language Based Raw-Text Extractor and Unicode Font-Mapping Tool
Author :
Bhattacharyya, Debnath ; Das, Poulami ; Ganguly, Debashis ; Mitra, Kheyali ; Mukherjee, Swarnendu ; Bandyopadhyay, Samir Kumar ; Tai-Hoon Kim
Author_Institution :
Comput. Sci. & Eng. Dept., Heritage Inst. of Technol., Kolkata, India
fYear :
2009
fDate :
7-9 March 2009
Firstpage :
24
Lastpage :
29
Abstract :
Extracting specific information from a collection of documents is called information extraction (IE). In general, information on the Web is well structured in HTML or XML format, and IE from such structured documents mainly uses learning techniques for pattern matching in the content. In this paper, we propose a novel approach to interactive information extraction. We describe how this approach enables even a naive user to efficiently extract Indian regional language text from a Web document, in a manner quite similar to using a standard search engine. The approach behaves much like a pre-programmed information extraction engine.
Keywords :
XML; hypermedia markup languages; information retrieval; learning (artificial intelligence); natural language processing; pattern matching; text analysis; HTML; Indian regional language design; Unicode font-mapping tool; Web document; XML format; interactive information extraction technique; learning techniques; pattern matching; raw-text extractor; standard search engine; Application software; Computer science; Data mining; Design engineering; HTML; Knowledge engineering; Natural languages; Pattern matching; Search engines; Web sites; Corpus; HTML; Information Extraction; Mapped;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Science and Technology, 2009. AST '09. International e-Conference on
Conference_Location :
Daejeon
Print_ISBN :
978-0-7695-3672-9
Type :
conf
DOI :
10.1109/AST.2009.16
Filename :
5231732