Title :
Repairing errors for Chinese word segmentation and part-of-speech tagging
Author :
Yao, Tian-fang ; Ding, Wei ; Erbach, Gregor
Author_Institution :
Computational Linguistics Dept., Saarlandes Univ., Saarbrucken, Germany
Abstract :
To improve the recognition performance of Chinese named entities, transformation-based machine learning is introduced to repair errors that arise during word segmentation and part-of-speech (POS) tagging. Since Chinese text is not segmented into words, the words in a sentence must be segmented before they are processed by subsequent Chinese named entity recognition components. POS tagging is likewise a fundamental task for Chinese named entity recognition. To enhance the quality of word segmentation and POS tagging, it is necessary to explore different approaches to improving their performance. One approach is to repair as many errors as possible when a word segmentation and POS tagging tool is already at hand. This paper introduces an effective error repairer based on a transformation-based, error-driven machine learning technique. It covers detecting error positions, producing error-repairing rules, selecting higher-scoring rules, ordering rules, and distinguishing rule usage conditions. The experimental results show that word segmentation and POS tagging errors are significantly reduced and that performance is improved.
Keywords :
knowledge based systems; learning (artificial intelligence); natural languages; pattern recognition; Chinese language; Chinese named entity recognition; Chinese word segmentation; error repairing; error-driven machine learning; natural language processing; part-of-speech tagging; transformation based machine learning; Algorithm design and analysis; Error correction; Machine learning; Modems; Natural languages; Neck; Probability; Statistics; Tagging; Text recognition;
Conference_Title :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
DOI :
10.1109/ICMLC.2002.1175365