• DocumentCode
    3557927
  • Title

    Reducing Redundancy in Subspace Clustering

  • Author

    Chu, Yi-Hong ; Chen, Ying-Ju ; Yang, De-Nian ; Chen, Ming-Syan

  • Author_Institution
    Dept. of Electr. Eng., Nat. Taiwan Univ., Taipei, Taiwan
  • Volume
    21
  • Issue
    10
  • fYear
    2009
  • Firstpage
    1432
  • Lastpage
    1446
  • Abstract
    In this paper, we first study an important but unsolved dilemma in the literature of subspace clustering, which is referred to as ldquoinformation overlapping-data coveragerdquo challenge. Current solutions of subspace clustering usually invoke a grid-based apriori-like procedure to identify dense regions and construct subspace clusters afterward. Due to the nature of monotonicity property in apriori-like procedures, it is inherent that if a region is identified as dense, all its projected regions are also identified as dense, causing overlapping/redundant clustering information to be inevitably reported to users when generating clusters from such highly correlated regions. However, naive methods to filter redundant clusters will incur a challenging problem in the other side of the dilemma, called the ldquodata coveragerdquo issue. Note that two clusters may have highly correlated dense regions but their data members could be highly different to each other. Arbitrarily removing one of them may lose the coverage of data with clustering information, thus likely reporting an incomplete and biased clustering result. In this paper, therefore, we further propose an innovative algorithm, called "nonredundant subspace cluster miningrdquo (NORSC), to efficiently discover a succinct collection of subspace clusters while also maintaining the required degree of data coverage. NORSC not only avoids generating the redundant clusters with most of the contained data covered by higher dimensional clusters to resolve the information overlapping problem but also limits the information loss to cope with the data coverage problem. As shown by our experimental results, NORSC is very effective in identifying a concise and small set of subspace clusters, while incurring time complexity in orders of magnitude better than that of previous works.
  • Keywords
    computational complexity; data mining; pattern clustering; data coverage; grid-based apriori-like procedure; higher dimensional cluster; information overlapping; innovative algorithm; monotonicity property; naive method; nonredundant subspace cluster mining; redundancy reduction; time complexity; Clustering; Data mining; and association rules; classification; redundancy filtering.; subspace clustering;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • Conference_Location
    10/10/2008 12:00:00 AM
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2008.207
  • Filename
    4641928