DocumentCode
3278933
Title
A Novel Approach for Designing Indian Regional Language Based Raw-Text Extractor and Unicode Font-Mapping Tool
Author
Bhattacharyya, Debnath ; Das, Poulami ; Ganguly, Debashis ; Mitra, Kheyali ; Mukherjee, Swarnendu ; Bandyopadhyay, Samir Kumar ; Tai-Hoon Kim
Author_Institution
Comput. Sci. & Eng. Dept., Heritage Inst. of Technol., Kolkata, India
fYear
2009
fDate
7-9 March 2009
Firstpage
24
Lastpage
29
Abstract
Information extraction (IE) is the task of extracting specific information from a collection of documents. In general, information on the Web is well structured in HTML or XML format, and IE from such structured documents basically uses learning techniques for pattern matching in the content. In this paper, we propose a novel approach to interactive information extraction. We describe how this approach enables even a naive user to efficiently extract an Indian regional language based document from a Web document, in a manner quite similar to using a standard search engine. In effect, it works like a pre-programmed information extraction engine.
Keywords
XML; hypermedia markup languages; information retrieval; learning (artificial intelligence); natural language processing; pattern matching; text analysis; HTML; Indian regional language design; Unicode font-mapping tool; Web document; XML format; interactive information extraction technique; learning techniques; pattern matching; raw-text extractor; standard search engine; Application software; Computer science; Data mining; Design engineering; HTML; Knowledge engineering; Natural languages; Pattern matching; Search engines; Web sites; Corpus; HTML; Information Extraction; Mapped;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Science and Technology, 2009. AST '09. International e-Conference on
Conference_Location
Daejeon
Print_ISBN
978-0-7695-3672-9
Type
conf
DOI
10.1109/AST.2009.16
Filename
5231732