Title :
Based on Rough Sets and the Associated Analysis of KNN Text Classification Research
Author :
Guo Aizhang;Yang Tao
Author_Institution :
Qilu Univ. of Technol., Jinan, China
Abstract :
With the rapid development of network information technology, text, as a basic information carrier, has begun to grow exponentially. Existing text classification methods cannot extract information from these vast resources in a timely and accurate manner. To address this problem, the paper proposes a new text categorization method: a KNN algorithm based on rough sets and correlation analysis. First, we introduce the concept of rough sets. In the vector space of the training set, we divide the text vector space of each category into certain and uncertain regions. For a text vector that falls in a certain region, the category can be determined directly. For a text vector that falls in an uncertain region, we determine the category with a KNN text classification algorithm based on correlation analysis. Experimental results show that the KNN text classification algorithm based on rough sets and correlation analysis greatly improves the efficiency and accuracy of text categorization and can meet the requirements of processing large amounts of text data.
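The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the certain regions are built or how correlation enters the KNN step, so everything below is an assumption made for illustration. The certain region of a class is approximated here as a ball around the class centroid whose radius is the distance to the nearest training vector of any other class, and Pearson correlation stands in for the correlation analysis. The names `fit_certain_regions`, `classify`, and the toy term-count vectors are hypothetical.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation(a, b):
    """Pearson correlation of two vectors; 0.0 if either one is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0


def fit_certain_regions(vectors, labels):
    """Assumed stand-in for the rough-set step: one (centroid, radius) per class.

    The radius is the distance from the centroid to the closest training
    vector of a different class, so no foreign training vector lies
    strictly inside the ball.
    """
    regions = {}
    for cls in set(labels):
        members = [v for v, l in zip(vectors, labels) if l == cls]
        others = [v for v, l in zip(vectors, labels) if l != cls]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        radius = min((distance(centroid, v) for v in others), default=math.inf)
        regions[cls] = (centroid, radius)
    return regions


def classify(x, vectors, labels, regions, k=3):
    """Label x directly if it lies in exactly one certain region, else use KNN."""
    hits = [cls for cls, (c, r) in regions.items() if distance(x, c) < r]
    if len(hits) == 1:
        return hits[0], "certain"
    # Uncertain region: rank training vectors by correlation and take a majority vote.
    ranked = sorted(zip(vectors, labels),
                    key=lambda pair: correlation(x, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0], "knn"


# Toy term-count vectors (hypothetical data, for illustration only).
train_vectors = [[5, 1, 0], [4, 0, 1], [0, 1, 5], [1, 0, 4]]
train_labels = ["sports", "sports", "finance", "finance"]
regions = fit_certain_regions(train_vectors, train_labels)

print(classify([5, 0, 0], train_vectors, train_labels, regions))
print(classify([2, 0, 3], train_vectors, train_labels, regions))
```

In this sketch the KNN search runs only for vectors that no single region claims, which is where a saving in classification time would come from. How much is saved depends on how many test vectors fall in a certain region.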
Keywords :
"Text categorization","Classification algorithms","Algorithm design and analysis","Training","Approximation algorithms","Rough sets","Correlation"
Conference_Title :
Distributed Computing and Applications for Business Engineering and Science (DCABES), 2015 14th International Symposium on
DOI :
10.1109/DCABES.2015.127