• DocumentCode
    2445224
  • Title
    DB-Outlier Detection by Example in High Dimensional Datasets
  • Author
    Li, Yuan ; Kitagawa, Hiroyuki
  • Author_Institution
    Grad. Sch. of Syst. & Inf. Eng., Tsukuba Univ., Tsukuba
  • fYear
    2007
  • fDate
    15 April 2007
  • Firstpage
    73
  • Lastpage
    78
  • Abstract
    Outlier detection is an important problem with applications in many fields. Such applications generally process high dimensional datasets. Among the existing methods of detecting outliers, Distance-Based outlier (DB-Outlier) detection is one of the simplest and most commonly used approaches, since it detects outliers only by calculating distances between data points. However, in high dimensional space, data is sparse, so every data point becomes a good outlier candidate. A Subspace-Based method has been proposed to deal with the curse of dimensionality. It shows that meaningful outliers are likely to be identified by examining the behavior of data in low dimensional projections. On the other hand, most existing methods detect outliers using parameters that users must set in advance. Such parameters usually encode a hidden user view of outliers. Example-Based outlier detection methods have shown promise in discovering this hidden user view. In this paper, we discuss a new technique to detect DB-Outliers in high dimensional datasets based on user examples. Our proposed method combines Subspace-Based and Example-Based methods to discover the subspace in which the user examples stand out more significantly than in any other subspace, and reports the DB-Outliers detected in that subspace.
  • Keywords
    data mining; very large databases; distance-based outlier detection; example-based outlier detection; high dimensional dataset; subspace-based outlier detection; Clustering algorithms; Credit cards; Data engineering; Data mining; Density measurement; Extraterrestrial measurements; Object detection; Robustness; Systems engineering and theory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Databases for Next Generation Researchers, 2007. SWOD 2007. IEEE International Workshop on
  • Conference_Location
    Istanbul
  • Print_ISBN
    1-4244-0903-9
  • Electronic_ISBN
    1-4244-0904-7
  • Type
    conf
  • DOI
    10.1109/SWOD.2007.353201
  • Filename
    4163065
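
Illustrative sketch (not taken from the paper): the abstract describes choosing a low dimensional subspace in which the user's example outliers stand out, then reporting DB-Outliers in that subspace. The Python below is a minimal sketch of that idea under stated assumptions. It uses the common DB(p, D) definition (a point is an outlier if at least a fraction p of the other points lie farther than distance D from it). The scoring rule in best_subspace (mean far-fraction of the examples minus that of the remaining points), the exhaustive search up to max_dim, and all function and parameter names are hypothetical choices made for illustration; the abstract does not specify the authors' actual criterion or search strategy.

```python
from itertools import combinations
from math import dist


def far_fraction(data, i, dims, radius):
    """Fraction of the other points farther than `radius` from point i,
    measured with Euclidean distance in the projection onto `dims`."""
    p = [data[i][d] for d in dims]
    far = sum(
        1
        for j, row in enumerate(data)
        if j != i and dist(p, [row[d] for d in dims]) > radius
    )
    return far / (len(data) - 1)


def db_outliers(data, dims, radius, p):
    """Indices of DB(p, radius)-outliers in the subspace `dims`."""
    return [
        i for i in range(len(data))
        if far_fraction(data, i, dims, radius) >= p
    ]


def best_subspace(data, example_ids, radius, max_dim=2):
    """Hypothetical scoring: pick the subspace where the examples have the
    largest mean far-fraction relative to all remaining points.
    Ties keep the first (lowest dimensional) subspace found."""
    examples = set(example_ids)
    others = [i for i in range(len(data)) if i not in examples]
    best, best_score = None, float("-inf")
    for k in range(1, max_dim + 1):
        for dims in combinations(range(len(data[0])), k):
            ex = sum(far_fraction(data, i, dims, radius) for i in examples) / len(examples)
            rest = sum(far_fraction(data, i, dims, radius) for i in others) / len(others)
            if ex - rest > best_score:
                best, best_score = dims, ex - rest
    return best


# Toy data (made up): the last point is unusual only in dimension 0.
data = [
    (0.0, 0.1, 5.0),
    (0.1, 0.0, 1.0),
    (0.2, 0.2, 9.0),
    (0.1, 0.3, 3.0),
    (0.0, 0.2, 7.0),
    (5.0, 0.1, 5.0),
]
dims = best_subspace(data, example_ids=[5], radius=1.0)
print(dims, db_outliers(data, dims, radius=1.0, p=0.9))
```

A fixed radius favors higher dimensional subspaces because distances grow with dimension, and the exhaustive enumeration is only practical for small max_dim; a real implementation would need to address both points.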