Title :
Two new feature extraction methods for text classification: TESDF and SADF
Author :
Kilic, Erdal ; Ates, Nurullah ; Karakaya, Aykut ; Sahin, Durmus Ozkan
Author_Institution :
Bilgisayar Muhendisligi Bolumu, Ondokuz Mayis Univ., Samsun, Turkey
Abstract :
In this study, two new document weighting methods are proposed, based on term frequency-inverse document frequency (TF-IDF), which is widely used in text mining. In addition, the insignificance of verbs for text classification is put forward and tested as a new pre-processing step. The proposed methods gave better results when compared with other methods. It was also observed that omitting the verbs from the texts hardly changed the performance rate while reducing the size of the data processed.
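For orientation, the following is a minimal sketch of the standard TF-IDF weighting that the abstract names as the starting point, together with a verb-dropping pre-processing step. It is not the TESDF or SADF formulation, which the abstract does not define. The function names `tf_idf` and `drop_verbs`, the natural-log IDF variant, and the `"VERB"` tag label are assumptions made for illustration only.

```python
import math
from collections import Counter


def drop_verbs(tagged_tokens, verb_tag="VERB"):
    """Keep only tokens whose part-of-speech tag is not the verb tag.

    tagged_tokens: list of (token, pos_tag) pairs from any POS tagger.
    The "VERB" label is an assumed tag name, not taken from the paper.
    """
    return [token for token, tag in tagged_tokens if tag != verb_tag]


def tf_idf(docs):
    """Standard TF-IDF baseline (assumed variant: tf * ln(N / df)).

    docs: list of token lists.
    Returns one dict per document mapping term -> weight.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

Under these assumptions, a term that occurs in every document receives a weight of zero, and dropping verbs before weighting shrinks the vocabulary that has to be processed.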
Keywords :
document image processing; feature extraction; text analysis; SADF; TESDF; document weighting methods; feature extraction methods; term frequency-inverse document frequency; text classification; text mining methods; Automation; Conferences; Feature extraction; Niobium; Signal processing; Signal processing algorithms; Text categorization; inverse document frequency; term weighting; text classification;
Conference_Title :
Signal Processing and Communications Applications Conference (SIU), 2015 23rd
Conference_Location :
Malatya
DOI :
10.1109/SIU.2015.7129862