• DocumentCode
    3278933
  • Title

    A Novel Approach for Designing Indian Regional Language Based Raw-Text Extractor and Unicode Font-Mapping Tool

  • Author

    Bhattacharyya, Debnath ; Das, Poulami ; Ganguly, Debashis ; Mitra, Kheyali ; Mukherjee, Swarnendu ; Bandyopadhyay, Samir Kumar ; Tai-Hoon Kim

  • Author_Institution
    Comput. Sci. & Eng. Dept., Heritage Inst. of Technol., Kolkata, India
  • fYear
    2009
  • fDate
    7-9 March 2009
  • Firstpage
    24
  • Lastpage
    29
  • Abstract
    Extracting specific information from a collection of documents is called information extraction (IE). In general, information on the Web is well structured in HTML or XML format, and IE from such structured documents basically uses learning techniques for pattern matching in the content. In this paper, we propose a novel approach to interactive information extraction. We describe how this approach enables any naive user to efficiently extract Indian regional language based documents from a Web document, in a manner quite similar to using a standard search engine. In effect, it behaves like a pre-programmed information extraction engine.
  • Keywords
    XML; hypermedia markup languages; information retrieval; learning (artificial intelligence); natural language processing; pattern matching; text analysis; HTML; Indian regional language design; Unicode font-mapping tool; Web document; XML format; interactive information extraction technique; learning techniques; pattern matching; raw-text extractor; standard search engine; Application software; Computer science; Data mining; Design engineering; HTML; Knowledge engineering; Natural languages; Pattern matching; Search engines; Web sites; Corpus; HTML; Information Extraction; Mapped;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Science and Technology, 2009. AST '09. International e-Conference on
  • Conference_Location
    Daejeon
  • Print_ISBN
    978-0-7695-3672-9
  • Type

    conf

  • DOI
    10.1109/AST.2009.16
  • Filename
    5231732