  • DocumentCode
    2006139
  • Title
    Simultaneously Removing Noise and Selecting Relevant Features for High Dimensional Noisy Data
  • Author
    Byeon, Boseon ; Rasheed, Khaled
  • Author_Institution
    Comput. Sci., Univ. of Georgia, Athens, GA
  • fYear
    2008
  • fDate
    11-13 Dec. 2008
  • Firstpage
    147
  • Lastpage
    152
  • Abstract
    The classification for the noisy training data in high dimension suffers from concurrent negative effects by noise and irrelevant/redundant features. Noise disrupts the training data and irrelevant/redundant features prevent the classifier from picking relevant features in building the model. Therefore they may reduce classification accuracy. This paper introduces a novel approach to improve the quality of training data sets with noisy dependent variable and high dimensionality by simultaneously removing noisy instances and selecting relevant features for classification. Our approach relies on two genetic algorithms, one for noise detection and the other for feature selection, and allows them to exchange their results periodically at certain generation intervals. Prototype selection is used to improve the performance along with the genetic algorithm in the noise detection method. This paper shows that our approach enhances the quality of noisy training data sets with high dimension and substantially increases the classification accuracy.
  • Keywords
    feature extraction; genetic algorithms; learning (artificial intelligence); pattern classification; genetic algorithm; high dimensional noisy data classification; machine learning; noise detection; noise removal; prototype selection; relevant feature selection; Application software; Computer science; Filtering; Filters; Genetic algorithms; Machine learning; Nearest neighbor searches; Noise generators; Prototypes; Training data; Feature Selection; Genetic Algorithm; Noise Detection; Outlier Detection; Prototype Selection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-0-7695-3495-4
  • Type
    conf
  • DOI
    10.1109/ICMLA.2008.87
  • Filename
    4724968
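
The scheme outlined in the abstract (two genetic algorithms, one deciding which training instances to keep and one deciding which features to use, exchanging their results at fixed generation intervals) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the 1-nearest-neighbour fitness, the truncation selection with uniform crossover, the mutation rate, and the exchange interval are all assumptions, and the prototype selection step mentioned in the abstract is omitted.

```python
import random

def nn_accuracy(train, valid, inst_mask, feat_mask):
    """1-NN accuracy on `valid`, using only kept instances and selected features.

    `train` and `valid` are lists of (feature_list, label) pairs.
    """
    kept = [row for row, keep in zip(train, inst_mask) if keep]
    feats = [j for j, use in enumerate(feat_mask) if use]
    if not kept or not feats:
        return 0.0  # an empty training set or feature set cannot classify
    correct = 0
    for x, y in valid:
        nearest = min(kept, key=lambda r: sum((r[0][j] - x[j]) ** 2 for j in feats))
        correct += nearest[1] == y
    return correct / len(valid)

def evolve(pop, fitness, rng, mut_rate=0.05):
    """One generation: keep the fitter half, refill with mutated uniform crossovers.

    Assumes len(pop) >= 4 so that two distinct parents can be sampled.
    """
    pop = sorted(pop, key=fitness, reverse=True)
    parents = pop[: len(pop) // 2]
    children = []
    while len(parents) + len(children) < len(pop):
        a, b = rng.sample(parents, 2)
        child = [rng.choice(pair) for pair in zip(a, b)]
        children.append([bit ^ (rng.random() < mut_rate) for bit in child])
    return parents + children  # parents[0] is the best individual just evaluated

def co_evolve(train, valid, generations=20, pop_size=10, exchange_every=5, seed=0):
    """Run the two GAs side by side and swap their best masks periodically."""
    rng = random.Random(seed)
    n_inst, n_feat = len(train), len(train[0][0])
    # Seed each population with the "keep everything" mask plus random masks.
    inst_pop = [[1] * n_inst] + [[rng.randint(0, 1) for _ in range(n_inst)]
                                 for _ in range(pop_size - 1)]
    feat_pop = [[1] * n_feat] + [[rng.randint(0, 1) for _ in range(n_feat)]
                                 for _ in range(pop_size - 1)]
    best_inst, best_feat = [1] * n_inst, [1] * n_feat
    for gen in range(generations):
        # Each GA scores its masks against the other GA's last exchanged result.
        inst_pop = evolve(inst_pop, lambda m: nn_accuracy(train, valid, m, best_feat), rng)
        feat_pop = evolve(feat_pop, lambda m: nn_accuracy(train, valid, best_inst, m), rng)
        if (gen + 1) % exchange_every == 0:
            best_inst, best_feat = inst_pop[0], feat_pop[0]
    return best_inst, best_feat
```

In this sketch a mask bit of 1 means "keep this instance" or "use this feature". Between exchanges each population is evaluated against a frozen copy of the other's best mask, which is one simple way to read "exchange their results periodically"; the paper may couple the two searches differently.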