DocumentCode
2006139
Title
Simultaneously Removing Noise and Selecting Relevant Features for High Dimensional Noisy Data
Author
Byeon, Boseon ; Rasheed, Khaled
Author_Institution
Comput. Sci., Univ. of Georgia, Athens, GA
fYear
2008
fDate
11-13 Dec. 2008
Firstpage
147
Lastpage
152
Abstract
Classification of noisy, high-dimensional training data suffers from the combined negative effects of noise and of irrelevant or redundant features. Noise corrupts the training data, while irrelevant or redundant features keep the classifier from identifying the relevant features when it builds the model; both can therefore reduce classification accuracy. This paper introduces a novel approach that improves the quality of high-dimensional training sets with a noisy dependent variable by simultaneously removing noisy instances and selecting relevant features for classification. The approach relies on two genetic algorithms, one for noise detection and the other for feature selection, which exchange their results periodically at set generation intervals. In the noise detection method, prototype selection is used alongside the genetic algorithm to improve performance. The paper shows that this approach enhances the quality of noisy, high-dimensional training data sets and substantially increases classification accuracy.
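The cooperation scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the leave-one-out 1-NN fitness, the 0.5 penalty on removed instances, the tournament selection, the mutation rate, and all function and parameter names (`co_evolve`, `exchange_every`, and so on) are assumptions made for illustration, and the prototype selection step is omitted. It shows only the general idea of two genetic algorithms, one evolving an instance mask and one evolving a feature mask, that hand each other their current best mask every few generations.

```python
import random

def selected(mask):
    """Indices whose bit is set in a 0/1 mask."""
    return [k for k, bit in enumerate(mask) if bit]

def knn_accuracy(X, y, rows, cols):
    """Leave-one-out 1-NN accuracy on the chosen rows, using only the chosen columns."""
    if len(rows) < 2 or not cols:
        return 0.0
    correct = 0
    for i in rows:
        nearest = min((j for j in rows if j != i),
                      key=lambda j: sum((X[i][c] - X[j][c]) ** 2 for c in cols))
        correct += y[nearest] == y[i]
    return correct / len(rows)

def evolve(pop, fitness, rng, mutation_rate=0.05):
    """One generation: elitism, binary tournaments, uniform crossover, bit-flip mutation."""
    children = [max(pop, key=fitness)[:]]  # carry the best individual over unchanged
    while len(children) < len(pop):
        a = max(rng.sample(pop, 2), key=fitness)
        b = max(rng.sample(pop, 2), key=fitness)
        child = [rng.choice(pair) for pair in zip(a, b)]
        children.append([bit ^ (rng.random() < mutation_rate) for bit in child])
    return children

def co_evolve(X, y, generations=20, pop_size=10, exchange_every=5, seed=0):
    """Run the two GAs side by side, exchanging best masks every `exchange_every` generations."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    inst_pop = [[int(rng.random() < 0.9) for _ in range(n)] for _ in range(pop_size)]
    feat_pop = [[int(rng.random() < 0.5) for _ in range(d)] for _ in range(pop_size)]
    # Until the first exchange, each GA assumes the other keeps everything.
    best_inst, best_feat = [1] * n, [1] * d

    def inst_fitness(mask):
        # Assumed fitness: accuracy on kept instances, penalised for discarding data.
        kept = selected(mask)
        return knn_accuracy(X, y, kept, selected(best_feat)) - 0.5 * (1 - len(kept) / n)

    def feat_fitness(mask):
        # Assumed fitness: accuracy on the instances the noise GA currently keeps.
        return knn_accuracy(X, y, selected(best_inst), selected(mask))

    for gen in range(1, generations + 1):
        inst_pop = evolve(inst_pop, inst_fitness, rng)
        feat_pop = evolve(feat_pop, feat_fitness, rng)
        if gen % exchange_every == 0:
            # Both sides are evaluated before either shared mask is replaced.
            best_inst, best_feat = (max(inst_pop, key=inst_fitness),
                                    max(feat_pop, key=feat_fitness))
    return best_inst, best_feat
```

Calling `co_evolve(X, y)` on a list of numeric feature rows `X` and a label list `y` returns two 0/1 masks: the instances to keep and the features to keep. The exchange interval trades off how quickly each search reacts to the other against how stable its fitness landscape stays between exchanges.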
Keywords
feature extraction; genetic algorithms; learning (artificial intelligence); pattern classification; genetic algorithm; high dimensional noisy data classification; machine learning; noise detection; noise removal; prototype selection; relevant feature selection; Application software; Computer science; Filtering; Filters; Genetic algorithms; Machine learning; Nearest neighbor searches; Noise generators; Prototypes; Training data; Feature Selection; Genetic Algorithm; Noise Detection; Outlier Detection; Prototype Selection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
Conference_Location
San Diego, CA
Print_ISBN
978-0-7695-3495-4
Type
conf
DOI
10.1109/ICMLA.2008.87
Filename
4724968