• DocumentCode
    2709262
  • Title

    Inlier-Based Outlier Detection via Direct Density Ratio Estimation

  • Author

    Hido, Shohei ; Tsuboi, Yuta ; Kashima, Hisashi ; Sugiyama, Masashi ; Kanamori, Takafumi

  • Author_Institution
    IBM Res., Tokyo Res. Lab., Tokyo
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    223
  • Lastpage
    232
  • Abstract
    We propose a new statistical approach to the problem of inlier-based outlier detection, i.e., finding outliers in the test set based on the training set consisting only of inliers. Our key idea is to use the ratio of training and test data densities as an outlier score; we estimate the ratio directly in a semi-parametric fashion without going through density estimation. Thus our approach is expected to have better performance in high-dimensional problems. Furthermore, the applied algorithm for density ratio estimation is equipped with a natural cross-validation procedure, allowing us to objectively optimize the value of tuning parameters such as the regularization parameter and the kernel width. The algorithm offers a closed-form solution as well as a closed-form formula for the leave-one-out error. Thanks to this, the proposed outlier detection method is computationally very efficient and is scalable to massive datasets. Simulations with benchmark and real-world datasets illustrate the usefulness of the proposed approach.
  • Keywords
    data analysis; learning (artificial intelligence); statistical analysis; closed-form formula; closed-form solution; direct density ratio estimation; high-dimensional problem; inlier-based outlier detection; kernel width; leave-one-out error; machine learning; natural cross-validation procedure; regularization parameter; semiparametric fashion; statistical approach; Data mining; density ratio; importance; outlier detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
  • Conference_Location
    Pisa
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3502-9
  • Type

    conf

  • DOI
    10.1109/ICDM.2008.49
  • Filename
    4781117