• DocumentCode
    751097
  • Title

    Input variable selection: mutual information and linear mixing measures

  • Author

    Trappenberg, Thomas ; Ouyang, Jie ; Back, Andrew

  • Author_Institution
    Dept. of Comput. Sci., Dalhousie Univ., Halifax, NS, Canada
  • Volume
    18
  • Issue
    1
  • fYear
    2006
  • Firstpage
    37
  • Lastpage
    46
  • Abstract
    Determining the most appropriate inputs to a model has a significant impact on the performance of the model and associated algorithms for classification, prediction, and data analysis. Previously, we proposed an algorithm, ICAIVS, which uses independent component analysis (ICA) as a preprocessing stage to overcome issues of dependencies between inputs before the data are passed to an input variable selection (IVS) stage. While we demonstrated previously with artificial data that ICA can prevent an overestimation of the number of necessary input variables, we show here that mixing between input variables is common in real-world data sets, so that ICA preprocessing is useful in practice. This experimental test is based on new measures introduced in this paper. Furthermore, we extend the implementation of our variable selection scheme to a statistical dependency test based on mutual information and test several algorithms on Gaussian and sub-Gaussian signals. Specifically, we propose a novel method of quantifying linear dependencies using ICA estimates of mixing matrices with a new linear mixing measure (LMM).
  • Keywords
    data mining; independent component analysis; statistical testing; Gaussian signal; ICA; data preprocessing; input variable selection; linear mixing measure; mutual information estimation; statistical dependency test; sub-Gaussian signal; Classification algorithms; Input variables; Mutual information; Parameter estimation; Performance evaluation; Predictive models; Testing; modeling
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2006.11
  • Filename
    1549826