Title :
A Chinese segmentation system based on document self-matching for identifying the unknown words
Author :
Sun, Yue-heng ; He, Pi-Lian ; Nie, Song ; Wu, Guang-Yuan
Author_Institution :
Dept. of Comput. Sci. & Technol., Tianjin Univ., China
Abstract :
This paper proposes a Chinese segmentation system based on document self-matching for identifying unknown words. We describe in detail the system's realization process, modules, and model. Experiments show that the system is highly effective at identifying unknown words. We also study the features of these words, and the results show that they make good indexing terms for information retrieval because they represent document content well, discriminate between documents, and return relevant documents for user queries.
Keywords :
dictionaries; information retrieval; natural languages; word processing; Chinese segmentation system; dictionary matching; document self-matching; information retrieval; realization process; unknown word identification; user queries; word frequency statistics; Computer science; Content based retrieval; Dictionaries; Helium; Indexing; Information retrieval; Particle separators; Sun; Terminology; Trademarks;
Conference_Title :
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN :
0-7803-8131-9
DOI :
10.1109/ICMLC.2003.1259847