Title of article

Automatic Thesaurus Development: Term Extraction From Title Metadata

Author/Authors

Jun Wang، نويسنده ,

Issue Information

ماهنامه با شماره پیاپی سال 2006

Pages

From page

907

To page

920

Abstract

The application of thesauri in networked environments is seriously hampered by the challenges of introducing new concepts and terminology into the formal controlled vocabulary, which is critical for enhancing its retrieval capability. The author describes an automated process of adding new terms to thesauri as entry vocabulary by analyzing the association between words/phrases extracted from bibliographic titles and subject descriptors in the metadata record (subject descriptors are terms assigned from controlled vocabularies of thesauri to describe the subjects of the objects [e.g., books, articles] represented by the metadata records). The investigated approach uses a corpus of metadata for scientific and technical (S&T) publications in which the titles contain substantive words for key topics. The three steps of the method are (a) extracting words and phrases from the title field of the metadata; (b) applying a method to identify and select the specific and meaningful keywords based on the associated controlled vocabulary terms from the thesaurus used to catalog the objects; and (c) inserting selected keywords into the thesaurus as new terms (most of them are in hierarchical relationships with the existing concepts), thereby updating the thesaurus with new terminology that is being used in the literature. The effectiveness of the method was demonstrated by an experiment with the Chinese Classification Thesaurus (CCT) and bibliographic data in China Machine-Readable Cataloging Record (MARC) format (CNMARC) provided by Peking University Library. This approach is equally effective in large-scale collections and in other languages.

Journal title

Journal of the American Society for Information Science and Technology

Serial Year

2006

Journal title

Journal of the American Society for Information Science and Technology

Record number

844124

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=844124