  • DocumentCode
    88609
  • Title
    Mining Statistically Significant Co-location and Segregation Patterns
  • Author
    Barua, Simul; Sander, Joerg
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Alberta, Edmonton, AB, Canada
  • Volume
    26
  • Issue
    5
  • fYear
    2014
  • fDate
    May 2014
  • Firstpage
    1185
  • Lastpage
    1199
  • Abstract
    In spatial domains, interaction between features gives rise to two types of interaction patterns: co-location and segregation patterns. Existing approaches to finding co-location patterns have several shortcomings: (1) they depend on user-specified thresholds for prevalence measures; (2) they do not take spatial auto-correlation into account; and (3) they may report co-locations even if the features are randomly distributed. Segregation patterns have yet to receive much attention. In this paper, we propose a method for finding both types of interaction patterns, based on a statistical test. We introduce a new definition of co-location and segregation patterns, we propose a model for the null distribution of features so that spatial auto-correlation is taken into account, and we design an algorithm for finding both co-location and segregation patterns. We also develop two strategies to reduce the computational cost compared to a naïve approach based on simulations of the data distribution, and we propose an approach to reduce the runtime of our algorithm even further by using an approximation of the neighborhood of features. We evaluate our method empirically using synthetic and real data sets and demonstrate its advantages over a state-of-the-art co-location mining algorithm.
  • Keywords
    Bayes methods; data mining; pattern classification; statistical distributions; statistical testing; Naive approach; computational cost; data distribution; feature interaction; feature null distribution; interaction patterns; random distribution; statistical test; statistically significant colocation pattern mining; statistically significant segregation pattern mining; user specified thresholds; Atmospheric measurements; Computational modeling; Data mining; Data models; Indexes; Particle measurements; Runtime; Data mining; Database Applications; Database Management; Information Technology and Systems; Spatial data; Spatial databases; Systems; co-location; segregation; spatial interaction; statistically significant pattern
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2013.88
  • Filename
    6523223