• DocumentCode
    3324061
  • Title
    A General Framework for Fast Co-clustering on Large Datasets Using Matrix Decomposition
  • Author
    Pan, Feng ; Zhang, Xiang ; Wang, Wei
  • Author_Institution
    Dept. of Comput. Sci., Univ. of North Carolina, Chapel Hill, NC
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    1337
  • Lastpage
    1339
  • Abstract
    Simultaneously clustering the columns and rows (co-clustering) of a large data matrix is an important problem with wide applications, such as document mining, microarray analysis, and recommendation systems. Several co-clustering algorithms have been shown to be effective in discovering hidden clustering structures in the data matrix. For a data matrix of m rows and n columns, the time complexity of these methods is usually on the order of m × n (if not higher). This limits their applicability to data matrices involving a large number of columns and rows. Moreover, an implicit assumption made by existing co-clustering methods is that the whole data matrix needs to be held in main memory. In this paper, we propose a general framework, CRD, for co-clustering large datasets utilizing recently developed sampling-based matrix decomposition methods. The time complexity of our approach is linear in m and n, and it does not require the whole data matrix to be in main memory. Experimental results show that CRD achieves accuracy competitive with existing co-clustering methods at much lower computational cost.
  • Keywords
    computational complexity; data handling; matrix decomposition; pattern clustering; sampling methods; very large databases; fast co-clustering method; large data matrix; large datasets; sampling-based matrix decomposition methods; time complexity; Application software; Clustering algorithms; Computational efficiency; Computer science; Data analysis; Data mining; Gene expression; Matrix decomposition; Partitioning algorithms; Text analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type
    conf
  • DOI
    10.1109/ICDE.2008.4497548
  • Filename
    4497548