Title :
Extracting location names from Chinese texts based on SVM and KNN
Author :
Li, Lishuang ; Mao, Tingting ; Huang, Degen
Author_Institution :
Dept. of Comput. Sci. & Eng., Dalian Univ. of Technol., China
Date :
30 Oct.-1 Nov. 2005
Abstract :
This paper presents a method for extracting location names from Chinese texts based on support vector machines (SVM) and K-nearest neighbors (KNN). Four kinds of features make up each vector: the character itself, its character-based part-of-speech (POS) tag, whether the character appears in a table of location-name characteristic words, and context information. An SVM-based model is built to extract location names. The KNN algorithm is introduced to improve the accuracy of the SVM classifier, and a modified SVM-KNN classifier is proposed to handle unbalanced data. Experimental results show that the model is effective at identifying location names in Chinese texts: in the open test, recall, precision, and F-measure reach 90.38%, 92.12%, and 91.24%, respectively. The hybrid SVM-KNN model can be used to recognize location names and other unknown words, such as person names and organization names, in Chinese texts. The modified SVM-KNN model can be generalized to other machine learning tasks with unbalanced class distributions.
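The character-level features listed in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the contents of the characteristic word table (LOCATION_CHARS), the context window of two characters, the POS tag values, and the feature names. The sketch only shows how a character, its POS tag, table membership, and its context might be gathered into one record per character.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the "location name characteristic word table";
# the real table used in the paper is not given in the abstract.
LOCATION_CHARS: Set[str] = {"市", "省", "县", "区", "山", "河"}

# Placeholder (character, POS tag) for positions outside the sentence.
PAD: Tuple[str, str] = ("<PAD>", "<PAD>")


def char_features(
    tagged: List[Tuple[str, str]],
    index: int,
    window: int = 2,  # assumed context width; the abstract does not state one
) -> Dict[str, object]:
    """Collect features for the character at `index` and its neighbours."""
    features: Dict[str, object] = {}
    for offset in range(-window, window + 1):
        pos = index + offset
        char, tag = tagged[pos] if 0 <= pos < len(tagged) else PAD
        features[f"char[{offset}]"] = char                      # the character itself
        features[f"pos[{offset}]"] = tag                        # character-based POS tag
        features[f"in_table[{offset}]"] = char in LOCATION_CHARS  # table membership
    return features


# "我在大连市" ("I am in Dalian City") with illustrative character-level tags.
sentence = [("我", "r"), ("在", "p"), ("大", "ns"), ("连", "ns"), ("市", "ns")]
feats = char_features(sentence, 4)
```

Records like `feats` would still need to be encoded as numeric vectors (for example, one-hot encoded) before they could be passed to an SVM or KNN classifier.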
Keywords :
computational linguistics; natural languages; pattern classification; support vector machines; text analysis; Chinese texts; K nearest neighbors; SVM-KNN classifier; character-based part-of-speech tag; location name extraction; support vector machine; Computer science; Data mining; Kernel; Machine learning; Machine learning algorithms; Nearest neighbor searches; Paper technology; Support vector machine classification; Support vector machines; Testing
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
DOI :
10.1109/NLPKE.2005.1598764