• DocumentCode
    2386922
  • Title

    A hierarchical approach for semi-structured document indexing and terminology extraction

  • Author

    Bounhas, Ibrahim ; Slimani, Yahya

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Tunis ElManar, Tunis, Tunisia
  • fYear
    2010
  • fDate
    17-18 March 2010
  • Firstpage
    315
  • Lastpage
    320
  • Abstract
    Many approaches to terminology extraction make use of contextual information to acquire relations between terms. The quality and quantity of this information influence the accuracy of the terminology extractor. In this paper, we assume that the logical structure of documents constitutes a rich source of contextual information which can be used to infer semantic relations between terms and thus construct a termino-ontological resource. We propose a top-down indexing method which assigns the greatest importance to terms that appear in the head nodes of the document. Terms are weighted according to their position in the hierarchical structure of the document. Once documents are indexed, logical relationships between their fragments are mined to build a contextual network of terms. The links of this network help deduce semantic relations useful for terminology organization. The knowledge extracted in this way can be exploited as a mapping scheme in a domain-specific information retrieval (IR) system. We evaluate our approach on an Arabic corpus about animals.
  • Keywords
    indexing; information retrieval; knowledge acquisition; Arabic corpus; contextual information; information retrieval system; knowledge extraction; mapping scheme; semistructured document indexing; termino ontological resource; terminology extraction; top down indexing method; Animal structures; Computer science; Data mining; Indexing; Information management; Information retrieval; Intelligent structures; Navigation; Ontologies; Terminology; Document indexing; Domain-specific IR; Logical structure; Terminology extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Retrieval & Knowledge Management, (CAMP), 2010 International Conference on
  • Conference_Location
    Shah Alam, Selangor
  • Print_ISBN
    978-1-4244-5650-5
  • Type

    conf

  • DOI
    10.1109/INFRKM.2010.5466894
  • Filename
    5466894