• DocumentCode
    401776
  • Title
    A Chinese segmentation system based on document self-matching for identifying the unknown words
  • Author
    Sun, Yue-heng ; He, Pi-Lian ; Nie, Song ; Wu, Guang-Yuan
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tianjin Univ., China
  • Volume
    4
  • fYear
    2003
  • fDate
    2-5 Nov. 2003
  • Firstpage
    2080
  • Abstract
    This paper proposes a Chinese segmentation system based on document self-matching for identifying unknown words. We discuss in detail the realization process, modules, and model of the system. Experiments show that the system identifies unknown words effectively. We also study the features of these words, and the results show that they can serve as good indexing terms in information retrieval because they represent document content well, discriminate between documents, and return relevant documents for user queries.
  • Keywords
    dictionaries; information retrieval; natural languages; word processing; Chinese segmentation system; dictionary matching; document self-matching; realization process; unknown word identification; user queries; word frequency static; Computer science; Content based retrieval; Helium; Indexing; Particle separators; Sun; Terminology; Trademarks
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2003 International Conference on
  • Print_ISBN
    0-7803-8131-9
  • Type
    conf
  • DOI
    10.1109/ICMLC.2003.1259847
  • Filename
    1259847