• DocumentCode
    3335813
  • Title

    XONTO: An Ontology-Based System for Semantic Information Extraction from PDF Documents

  • Author

    Oro, Ermelinda ; Ruffolo, Massimo

  • Author_Institution
    DEIS, Univ. of Calabria, Rende
  • Volume
    1
  • fYear
    2008
  • fDate
    3-5 Nov. 2008
  • Firstpage
    118
  • Lastpage
    125
  • Abstract
    Information extraction is of paramount importance in several real-world applications in the areas of business intelligence and competitive and military intelligence. Although several sophisticated and indeed complex approaches have been proposed, they are still limited in many respects. This paper presents XONTO, a novel ontology-based system that allows the semantic extraction of information from unstructured PDF documents. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that allow ontology objects contained in PDF documents to be recognized and extracted automatically, even when the information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to extract and the rules that, in turn, populate the ontology itself. The behavior and structure of the XONTO system are sketched by means of a running example.
  • Keywords
    document handling; information retrieval; ontologies (artificial intelligence); PDF unstructured documents; XONTO system; ontology-based system; semantic information extraction; Artificial intelligence; Competitive intelligence; Data mining; Encoding; HTML; Intelligent structures; Ontologies; Pattern recognition; Visualization; Wrapping; Information Extraction; Knowledge representation and reasoning; PDF format; attribute grammars; ontology;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence, 2008. ICTAI '08. 20th IEEE International Conference on
  • Conference_Location
    Dayton, OH
  • ISSN
    1082-3409
  • Print_ISBN
    978-0-7695-3440-4
  • Type

    conf

  • DOI
    10.1109/ICTAI.2008.48
  • Filename
    4669679