Title of article :
Automatic Thesaurus Development:
Term Extraction From Title Metadata
Author/Authors :
Jun Wang, author
Issue Information :
Monthly, serial issue number, 2006
Abstract :
The application of thesauri in networked environments is
seriously hampered by the challenges of introducing
new concepts and terminology into the formal controlled
vocabulary, which is critical for enhancing its retrieval
capability. The author describes an automated process
of adding new terms to thesauri as entry vocabulary
by analyzing the association between words/phrases
extracted from bibliographic titles and subject descriptors
in the metadata record (subject descriptors are
terms assigned from controlled vocabularies of thesauri
to describe the subjects of the objects [e.g., books, articles]
represented by the metadata records). The investigated
approach uses a corpus of metadata for scientific
and technical (S&T) publications in which the titles contain
substantive words for key topics. The three steps of
the method are (a) extracting words and phrases from the
title field of the metadata; (b) applying a method to identify
and select the specific and meaningful keywords
based on the associated controlled vocabulary terms
from the thesaurus used to catalog the objects; and
(c) inserting selected keywords into the thesaurus as
new terms (most of them are in hierarchical relationships
with the existing concepts), thereby updating the
thesaurus with new terminology that is being used in the
literature. The effectiveness of the method was demonstrated
by an experiment with the Chinese Classification
Thesaurus (CCT) and bibliographic data in China
Machine-Readable Cataloging Record (MARC) format
(CNMARC) provided by Peking University Library. This
approach is equally effective in large-scale collections
and in other languages.
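
The three steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state which association measure or thresholds were used, so everything below is an assumption made for illustration: the whitespace tokenizer, the confidence measure (the share of a candidate's occurrences that fall under its most frequent descriptor), the parameters min_count and min_conf, and the function names extract_candidates, select_terms, and update_thesaurus are all hypothetical. Chinese titles in CNMARC records would additionally need word segmentation, which is not shown.

```python
from collections import Counter, defaultdict


def extract_candidates(title, max_len=2):
    """Step (a): word n-grams (up to max_len words) taken from a title."""
    words = [w.strip(".,:;()").lower() for w in title.split()]
    words = [w for w in words if w]
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def select_terms(records, min_count=2, min_conf=0.8):
    """Step (b): keep candidates strongly associated with one descriptor.

    records is a list of (title, [subject descriptors]) pairs.
    The confidence measure and both thresholds are assumptions,
    not the measure used in the article.
    """
    cand_count = Counter()
    pair_count = defaultdict(Counter)
    known = set()
    for title, descriptors in records:
        known.update(d.lower() for d in descriptors)
        for cand in extract_candidates(title):
            cand_count[cand] += 1
            for d in descriptors:
                pair_count[cand][d] += 1

    selected = {}
    for cand, total in cand_count.items():
        # Skip rare candidates, existing descriptors, and unindexed titles.
        if total < min_count or cand in known or not pair_count[cand]:
            continue
        descriptor, hits = pair_count[cand].most_common(1)[0]
        if hits / total >= min_conf:
            selected[cand] = descriptor
    return selected


def update_thesaurus(thesaurus, selected):
    """Step (c): attach each selected keyword to its descriptor as an entry term."""
    for term, descriptor in selected.items():
        thesaurus.setdefault(descriptor, set()).add(term)
    return thesaurus


if __name__ == "__main__":
    # Toy records invented for this example; not data from the experiment.
    records = [
        ("Neural network training methods", ["machine learning"]),
        ("Deep neural network models", ["machine learning"]),
        ("Soil erosion models", ["soil science"]),
    ]
    print(update_thesaurus({}, select_terms(records)))
```

In this toy run a word such as "models" is rejected because its occurrences are split across two descriptors, while "neural network" is kept because it always appears under the same one. A simple filter like this still lets through general single words, so a real selection step would need a stricter test of specificity than the one sketched here.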
Journal title :
Journal of the American Society for Information Science and Technology