• DocumentCode
    3105985
  • Title
    A Novel Method for Detecting Outlying Subspaces in High-dimensional Databases Using Genetic Algorithm
  • Author
    Zhang, Ji ; Gao, Qigang ; Wang, Hai
  • Author_Institution
    Fac. of Comput. Sci., Dalhousie Univ., Halifax, NS
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    731
  • Lastpage
    740
  • Abstract
    Detecting outlying subspaces is a relatively new research problem in outlier-ness analysis for high-dimensional data. An outlying subspace for a given data point p is a subspace in which p is an outlier. Outlying subspace detection can facilitate better characterization of the detected outliers. It can also enable outlier mining for high-dimensional data to be performed more accurately and efficiently. In this paper, we propose a new method that uses the genetic algorithm paradigm to search for outlying subspaces efficiently. We develop a technique for efficiently computing the lower and upper bounds of the distance between a given point and its kth nearest neighbor in each possible subspace. These bounds are used to speed up the fitness evaluation of the genetic algorithm designed for outlying subspace detection. We also propose a random sampling technique to further reduce the computation of the genetic algorithm. The optimal number of sampled data points is specified to ensure the accuracy of the result. Through a set of experiments conducted on both synthetic and real-life datasets, we show that the proposed method is efficient and effective in handling the outlying subspace detection problem.
  • Keywords
    data mining; genetic algorithms; genetic algorithm; high-dimensional data mining; high-dimensional databases; outlierness analysis; random sampling technique; subspace detection; Algorithm design and analysis; Computer science; Credit cards; Data analysis; Data mining; Genetic algorithms; Nearest neighbor searches; Sampling methods; Spatial databases; Upper bound;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2006. ICDM '06. Sixth International Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2701-7
  • Type
    conf
  • DOI
    10.1109/ICDM.2006.6
  • Filename
    4053098