Title :
A hierarchical approach for semi-structured document indexing and terminology extraction
Author :
Bounhas, Ibrahim ; Slimani, Yahya
Author_Institution :
Dept. of Comput. Sci., Univ. of Tunis ElManar, Tunis, Tunisia
Abstract :
Many terminology extraction approaches use contextual information to acquire relations between terms. The quality and quantity of this information influence the accuracy of the terminology extractor. In this paper, we assume that the logical structure of documents constitutes a rich source of contextual information, which can be used to infer semantic relations between terms and thus to construct a termino-ontological resource. We propose a top-down indexing method that attributes the greatest importance to terms appearing in the head nodes of the document. Terms are weighted according to their position in the hierarchical structure of the document. Once documents are indexed, logical relationships between their fragments are mined to build a contextual network of terms. The links of this network help deduce semantic relations useful for organizing the terminology. Knowledge extracted in this way can be exploited as a mapping scheme in a domain-specific information retrieval (IR) system. We test our approach on an Arabic corpus about animals.
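The two steps summarized in the abstract (position-based term weighting and mining parent/child fragment links into a term network) can be illustrated with a minimal sketch. The abstract does not give the actual weighting formula, so the geometric decay `decay ** depth` below is an assumption chosen only to show that terms in head (shallower) nodes receive more weight. The `Node` class, `index_top_down`, and `contextual_network` are hypothetical names, and the parent-to-child co-occurrence count is one simple way to turn logical structure into term links, not necessarily the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical fragment of a semi-structured document (e.g. a heading and its sub-sections)."""
    terms: list[str]
    children: list["Node"] = field(default_factory=list)


def index_top_down(root: Node, decay: float = 0.5) -> dict[str, float]:
    """Weight each term occurrence by decay**depth, so head nodes count most.

    Assumption: geometric decay; the paper only states that terms in head
    nodes get the greatest importance.
    """
    weights: dict[str, float] = defaultdict(float)
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        for term in node.terms:
            weights[term] += decay ** depth
        for child in node.children:
            stack.append((child, depth + 1))
    return dict(weights)


def contextual_network(root: Node) -> dict[tuple[str, str], int]:
    """Count directed links from terms of a parent fragment to terms of its child fragments."""
    edges: dict[tuple[str, str], int] = defaultdict(int)
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            for head in node.terms:
                for term in child.terms:
                    if head != term:  # skip self-links such as a term repeated in a sub-section
                        edges[(head, term)] += 1
            stack.append(child)
    return dict(edges)


if __name__ == "__main__":
    # Toy document with English placeholder terms (the paper's corpus is Arabic).
    doc = Node(["animal"], [
        Node(["mammal", "cat"], [Node(["cat", "fur"])]),
        Node(["bird"]),
    ])
    weights = index_top_down(doc)
    print(weights["animal"], weights["cat"])  # 1.0 0.75
    print(len(contextual_network(doc)))       # 6
```

Here "cat" appears at depths 1 and 2, so its weight is 0.5 + 0.25 = 0.75, while the root term "animal" keeps the full weight of 1.0. Edges such as ("animal", "mammal") are candidate hierarchical relations that a later step could label semantically.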
Keywords :
indexing; information retrieval; knowledge acquisition; Arabic corpus; contextual information; information retrieval system; knowledge extraction; mapping scheme; semi-structured document indexing; termino-ontological resource; terminology extraction; top-down indexing method; Animal structures; Computer science; Data mining; Indexing; Information management; Information retrieval; Intelligent structures; Navigation; Ontologies; Terminology; Document indexing; Domain-specific IR; Logical structure; Terminology extraction;
Conference_Title :
Information Retrieval & Knowledge Management, (CAMP), 2010 International Conference on
Conference_Location :
Shah Alam, Selangor
Print_ISBN :
978-1-4244-5650-5
DOI :
10.1109/INFRKM.2010.5466894