  • DocumentCode
    1815136
  • Title
    Resolving the unencoded character problem for Chinese digital libraries
  • Author
    Juang, Derming ; Wang, Jenq-Haur ; Lai, Chen-Yu ; Hsieh, Ching-Chun ; Chien, Lee-Feng ; Ho, Jan-Ming
  • Author_Institution
    Institute of Information Science, Academia Sinica
  • fYear
    2005
  • fDate
    7-11 June 2005
  • Firstpage
    311
  • Lastpage
    319
  • Abstract
    Constructing a Chinese digital library, especially one for archiving historical articles, is often hindered by the limited character sets supported by current computer systems. This paper aims to resolve the unencoded character problem for Chinese digital libraries with a practical, composite approach consisting of a glyph expression model, a glyph structure database, and supporting tools. The approach resolves three problems. First, it preserves the extensibility of Chinese characters. Second, it makes unencoded characters as easy to generate, input, display, and search as encoded ones. Third, it remains compatible with the encoding schemes used by most computers. The approach has been adopted by organizations and projects in various application domains, including archeology, linguistics, ancient texts, calligraphy and paintings, and stone and bronze rubbings. For example, Academia Sinica used it to build Scripta Sinica, a very large full-text database of ancient texts. The Union Catalog of the National Digital Archives Project (NDAP) applied it to handle the unencoded characters encountered when merging metadata from 12 thematic domains contributed by various organizations. In addition, the Bronze Inscriptions Research Team (BIRT) of Academia Sinica added 3,459 bronze inscriptions, a valuable resource for education and research in historical linguistics.
  • Keywords
    character sets; digital libraries; history; linguistics; Academia Sinica; Bronze Inscriptions Research Team; Chinese characters; Chinese digital library; Scripta Sinica; Union Catalog of National Digital Archives Project; ancient texts; archeology; bronze rubbings; calligraphy; computer systems; education; glyph expression model; glyph structure database; historical article archiving; metadata; paintings; stone rubbings; unencoded character problem; very large full-text database; Character generation; Code standards; Computer displays; Databases; Encoding; Information retrieval; Information science; Natural languages; Permission; Software libraries; character encoding; digital library; glyph expression; unencoded Chinese characters
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Libraries, 2005. JCDL '05. Proceedings of the 5th ACM/IEEE-CS Joint Conference on
  • Conference_Location
    Denver, CO
  • Print_ISBN
    1-58113-876-8
  • Type
    conf
  • DOI
    10.1145/1065385.1065457
  • Filename
    4118559