Title :
An Optimization Algorithm of K-NN Classification
Author :
Zhan, Yan ; Chen, Hao ; Zhang, Guo-chun
Author_Institution :
Machine Learning Center, Hebei Univ.
Abstract :
The K-nearest neighbor (K-NN) algorithm is a classification method based on statistical theory. The Euclidean distance is usually chosen as its similarity measure, and this distance depends on all attributes. One practical issue in applying K-NN is therefore that the distance between instances is computed from every attribute of an instance. One approach to overcoming this problem is to weight each attribute differently when calculating the distance between two instances, so that feature weight learning determines how much each feature contributes. Another issue is that K still has to be chosen by testing different values. To avoid searching for K in nearest neighbor experiments and to improve accuracy and efficiency, this paper puts forward a validity function for judging a clustering when the class labels of the data set are known, and applies it to classification problems such as K-NN in combination with supervised classification. As a result, only the nearest neighbor (1-NN) needs to be selected, which both yields more precise classification and avoids the search for K; this greatly reduces query complexity and improves the efficiency of the nearest neighbor algorithm. In addition, the nearest neighbor algorithm is one of the most basic case-based reasoning (CBR) problems, and case-base maintenance (CBM) is an important issue in obtaining efficient case bases in a CBR system. This paper proposes a new approach that selects representative cases based on the generalization capability of cases. With this method, most redundant cases, which affect solution accuracy, can be deleted, improving indexing efficiency when searching for near neighbors.
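The attribute-weighting idea in the abstract can be illustrated with a minimal sketch of 1-NN under a weighted Euclidean distance. This is not the authors' method: the abstract does not give the weight-learning procedure or the validity function, so the weights below are set by hand, and the function names and toy data are hypothetical.

```python
import math


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - y_j)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def predict_1nn(query, cases, labels, weights):
    """Return the label of the single stored case closest to `query`."""
    best = min(
        range(len(cases)),
        key=lambda i: weighted_distance(query, cases[i], weights),
    )
    return labels[best]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
cases = [(0.0, 9.0), (1.0, 0.0)]
labels = ["a", "b"]
query = (0.2, 0.5)

# With equal weights the noisy feature dominates the distance.
uniform = predict_1nn(query, cases, labels, weights=(1.0, 1.0))

# Hand-set weights (an assumption, not learned) that ignore the noisy feature.
weighted = predict_1nn(query, cases, labels, weights=(1.0, 0.0))
```

In this toy setting the two calls can disagree, because down-weighting the second attribute changes which stored case is nearest. A learned weight vector would play the role of the hand-set tuple.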
Keywords :
case-based reasoning; generalisation (artificial intelligence); learning (artificial intelligence); optimisation; pattern classification; pattern clustering; statistical analysis; CBR problems; Euclidean distance; K-NN classification algorithm; case generalization capability; case-base maintenance; case-base reasoning problems; optimization algorithm; representative cases; similarity measure; statistical theory; Classification algorithms; Clustering algorithms; Computer science; Cybernetics; Electronic mail; Euclidean distance; Machine learning; Machine learning algorithms; Mathematics; Nearest neighbor searches; Optimization methods; Testing; K-NN algorithm; case-base maintenance; clustering validity; feature weight; generalization capability; similarity metrics;
Conference_Title :
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location :
Dalian, China
Print_ISBN :
1-4244-0061-9
DOI :
10.1109/ICMLC.2006.258667