DocumentCode :
124171
Title :
Feature Selection and Term Weighting
Author :
Algarni, Abdulmohsen ; Tairan, Nasser
Author_Institution :
Coll. of Comput. Sci., King Khalid Univ., Abha, Saudi Arabia
Volume :
1
fYear :
2014
fDate :
11-14 Aug. 2014
Firstpage :
336
Lastpage :
339
Abstract :
Term-based approaches can extract many features from text documents, but most of them include noise. Many popular text-mining techniques have been adapted to reduce noisy information in the extracted features, but the results still contain some noisy features. However, the noisy features are extracted from the same training documents as the good features. Therefore, the main problem is that some training documents contain a large amount of noisy data. If we can reduce the noisy data in the training documents, that would help to reduce noise in the extracted features. Moreover, we believe that removing some of the training documents (documents that contain more noisy data than useful data) can help to improve the effectiveness of the classifier. Using the advantages of clustering methods can help to reduce the effect of noisy data. The main problem of clustering is defined as that of finding groups of similar objects in the data. In this paper we introduce a methodology that uses a clustering algorithm to group the training data before using it. We also test our theory that not all training documents are useful for training the classifier.
Keywords :
data mining; data reduction; feature extraction; feature selection; pattern classification; pattern clustering; text analysis; classifier; clustering algorithm; group training data; noise data reduction; noisy information reduction; term weighting approach; text documents; text-mining techniques; training documents; Frequency measurement; Information retrieval; Noise; Text categorization; Training; Text mining
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Warsaw
Type :
conf
DOI :
10.1109/WI-IAT.2014.53
Filename :
6927562