• DocumentCode
    2753807
  • Title
    A Novel Approach for Refinement of Corpus in the Field of Opinion Mining
  • Author
    Bhattacharyya, Debnath ; Das, Poulami ; Mitra, Kheyali ; Ganguly, Debashis ; Mukherjee, Swarnendu ; Bandyopadhyay, S.K. ; Kim, Tai-Hoon
  • Author_Institution
    Comput. Sci. & Eng. Dept., Heritage Inst. of Technol., Kolkata, India
  • fYear
    2009
  • fDate
    7-9 March 2009
  • Firstpage
    281
  • Lastpage
    285
  • Abstract
    In this paper, we present a heuristic, regular-expression-based approach to refining a corpus, together with its possible applications in the field of Opinion Mining. The work is based on a corpus of reviews. The crude corpus consists of the raw HTML files that contain the reviews. Each HTML file is first refined so that only the required part of the page is retained. The final output is a set of XML files that store the important parts of the review pages, extracted from the refined HTML. These XML files are then fed to the subsequent language-processing stage for machine learning in the field of Opinion Mining.
  • Keywords
    XML; data mining; hypermedia markup languages; learning (artificial intelligence); natural language processing; HTML files; XML files; corpus refinement; crude corpus; language processing; machine learning process; opinion mining field; review corpus; Application software; Computer science; Frequency; HTML; Humans; Learning systems; Machine learning; Natural language processing; Natural languages; Speech; Corpus; crude corpus; natural language processing; regular expression;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Future Networks, 2009 International Conference on
  • Conference_Location
    Bangkok
  • Print_ISBN
    978-0-7695-3567-8
  • Type
    conf
  • DOI
    10.1109/ICFN.2009.24
  • Filename
    5189944
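The Abstract above describes a two-step pipeline: raw review HTML is refined with regular expressions, and the retained parts are stored as XML. The record does not reproduce the paper's actual expressions or XML schema, so the following is only a minimal sketch under stated assumptions. The page layout (`<div class="review">` holding an `<h3>` title and `<p>` paragraphs), the function names `clean` and `refine`, and the element names `reviews`, `review`, `title`, and `text` are all hypothetical. It uses only Python's standard `re`, `html`, and `xml.etree.ElementTree` modules.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical page layout (not taken from the paper): each review sits in
# <div class="review"> with an <h3> title and one or more <p> paragraphs.
# The non-greedy match stops at the first </div>, so nested divs inside a
# review would break this pattern.
REVIEW_RE = re.compile(r'<div class="review">(.*?)</div>', re.S | re.I)
TITLE_RE = re.compile(r'<h3\b[^>]*>(.*?)</h3>', re.S | re.I)
PARA_RE = re.compile(r'<p\b[^>]*>(.*?)</p>', re.S | re.I)
TAG_RE = re.compile(r'<[^>]+>')
WS_RE = re.compile(r'\s+')


def clean(fragment: str) -> str:
    """Strip leftover tags, decode HTML entities, and collapse whitespace."""
    text = TAG_RE.sub(' ', fragment)
    text = html.unescape(text)
    return WS_RE.sub(' ', text).strip()


def refine(raw_html: str) -> ET.Element:
    """Refine one crude review page into an XML tree of reviews."""
    # Step 1: drop scripts, styles and comments, which never hold review text
    # but may contain markup that looks like a review block.
    page = re.sub(r'<(script|style)\b.*?</\1>', '', raw_html, flags=re.S | re.I)
    page = re.sub(r'<!--.*?-->', '', page, flags=re.S)

    # Step 2: keep only the review blocks and store their parts as XML.
    root = ET.Element('reviews')
    for block in REVIEW_RE.findall(page):
        review = ET.SubElement(root, 'review')
        title = TITLE_RE.search(block)
        ET.SubElement(review, 'title').text = clean(title.group(1)) if title else ''
        paragraphs = [clean(p) for p in PARA_RE.findall(block)]
        ET.SubElement(review, 'text').text = ' '.join(p for p in paragraphs if p)
    return root


# Usage on a small made-up page; the script tag holds a decoy review block.
sample = """<html><head><script>document.write('<div class="review"><h3>Ad</h3></div>');</script></head>
<body><div class="review"><h3>Great phone</h3><p>Battery &amp; screen are excellent.</p></div>
<div class="review"><h3>Disappointing</h3><p>Stopped   working
after a week.</p></div></body></html>"""

xml_out = ET.tostring(refine(sample), encoding='unicode')
print(xml_out)
```

Removing `<script>` blocks first matters here: without that step, the decoy markup inside `document.write` would be picked up as a third review. Regular expressions keep the sketch close to the Abstract's description, but they are brittle on nested or malformed markup; a real HTML parser is the usual alternative when page layouts vary.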