• DocumentCode
    2666329
  • Title

    Automatic extraction and incorporation of purpose data into PurposeNet

  • Author

    Mayee, Kiran ; Sangal, Rajeev ; Paul, Soma

  • Author_Institution
    Language Technol. Res. Centre, Int. Inst. of Inf. Technol., Hyderabad, India
  • Volume
    6
  • fYear
    2010
  • fDate
    16-18 April 2010
  • Abstract
    PurposeNet is a knowledge base of objects and actions in which the knowledge is organized around purpose. Such knowledge also connects with language, namely through verbs for related actions. It can be used with an embedded reasoner, resulting in an effective system for QA, topic listing, summarization and other tasks. However, extracting PurposeNet-related data manually is time-consuming, labor-intensive, and expensive. This paper describes a framework for automatic extraction of purpose data from a given corpus. It identifies a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of purpose. It also deals with the subsequent automatic incorporation of this data into the PurposeNet resource. The results are used to augment and critique the structure of a large hand-built resource. Cases where the purpose data is incomplete have also been analyzed. The extent of success achieved in the process, in terms of the richness of the resource, is also discussed.
  • Keywords
    data mining; information retrieval; knowledge representation languages; pattern recognition; PurposeNet resource; automatic knowledge extraction language; embedded reasoner; lexico-syntactic patterns; purpose data extraction; Data mining; Classification; Information Retrieval; PurposeNet; Supervised learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Engineering and Technology (ICCET), 2010 2nd International Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4244-6347-3
  • Type

    conf

  • DOI
    10.1109/ICCET.2010.5486346
  • Filename
    5486346