Title :
A new feature selection method for text categorization based on information gain and particle swarm optimization
Author :
Yigit, Ferruh ; Baykan, Omer Kaan
Author_Institution :
Ilgin Vocational Sch., Selcuk Univ., Konya, Turkey
Abstract :
The rapid growth in the number of documents created in digital media makes it necessary to analyze and classify these documents automatically. In document analysis and classification, feature extraction, feature selection, and classifier selection all affect performance. A fundamental problem in text document categorization is the very large number of extracted features. In this study, text categorization was performed using a new feature selection method based on the IG (information gain) and PSO (particle swarm optimization) algorithms. The Reuters-21578 and Classic3 corpora were used in the experiments. The roots of the words in the corpus texts were taken as the features. Feature selection and categorization were performed with the k-Nearest Neighbors (K-NN) and Naive Bayes classifiers, using the IG and PSO algorithms. The performance of the proposed system was evaluated using the CA (classification accuracy), precision, recall, and F-measure criteria.
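As an illustration of the IG step named in the abstract, the following is a minimal sketch of scoring terms by information gain over a labeled corpus. It is not the authors' implementation: the abstract does not give their exact IG formulation or describe how PSO is combined with it, so the PSO stage is omitted. The function names (`entropy`, `information_gain`, `top_k_terms`) and the representation of each document as a set of word roots are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term present / absent' with respect to the labels.

    docs is a list of sets of word roots (an assumed representation);
    labels is the parallel list of class labels.
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy remaining after splitting on the term, weighted by split size.
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest IG (ties keep alphabetical order)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]
```

In a pipeline like the one the abstract outlines, a ranking of this kind would typically shrink the vocabulary before a search procedure or classifier sees it; how the paper hands the IG result to PSO is not stated here.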
Keywords :
Bayes methods; feature selection; particle swarm optimisation; pattern classification; text analysis; CA; Classic3 corpus; F-measure criteria; IG algorithm; K-NN; PSO algorithm; Reuters-21578; classification accuracy; classifier selection; digital media; document analysis; document classification; feature extraction; information gain algorithm; k-nearest neighbor algorithm; naive Bayes classifiers; particle swarm optimization algorithm; performance evaluation; precision value; recall value; text document categorization; Classification algorithms; Optimization; Text categorization; particle swarm optimization
Conference_Title :
Cloud Computing and Intelligence Systems (CCIS), 2014 IEEE 3rd International Conference on
Print_ISBN :
978-1-4799-4720-1
DOI :
10.1109/CCIS.2014.7175792