  • DocumentCode
    2175479
  • Title
    Information Extraction - a text mining approach
  • Author
    Kanya, N.; Geetha, S.
  • Author_Institution
    Panimalar Eng. Coll., Chennai
  • fYear
    2007
  • fDate
    20-22 Dec. 2007
  • Firstpage
    1111
  • Lastpage
    1118
  • Abstract
    Text mining concerns looking for patterns in unstructured text. The related task of information extraction (IE) is about locating specific items in natural-language documents. This paper presents a text mining framework called DISCOTEX (DIScovery from Text EXtraction). It uses a learned information extraction system to transform text into more structured data, which is then mined for interesting relationships. The initial version of DISCOTEX integrates an IE module acquired by an IE learning system with a standard rule induction module. In addition, rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of the underlying extraction system. Encouraging results are presented for applying these techniques to a corpus of computer job announcements posted to an Internet newsgroup.
  • Keywords
    data mining; learning (artificial intelligence); natural language processing; text analysis; DISCOTEX; IE learning system; Internet newsgroup; database extraction; information extraction; natural-language document; standard rule induction module; text mining approach; BWI; IR; Information Extraction; KDD; Rapier;
  • fLanguage
    English
  • Publisher
    IET
  • Conference_Title
    IET-UK International Conference on Information and Communication Technology in Electrical Sciences (ICTES 2007)
  • Conference_Location
    Tamil Nadu
  • ISSN
    0537-9989
  • Type
    conf
  • Filename
    4735960