• DocumentCode
    1790798
  • Title

    On the sample complexity of subspace clustering with missing data

  • Author

    Pimentel, D. ; Nowak, Robert ; Balzano, L.

  • Author_Institution
    Electr. & Comput. Eng., Univ. of Wisconsin, Madison, WI, USA
  • fYear
    2014
  • fDate
    June 29 2014-July 2 2014
  • Firstpage
    280
  • Lastpage
    283
  • Abstract
    Subspace clustering is a useful tool for analyzing large complex data, but in many relevant applications missing data are common. Existing theoretical analysis of this problem shows that subspace clustering from incomplete data is possible, but that analysis requires the number of samples (i.e., partially observed vectors) to be super-polynomial in the dimension d. Such huge sample sizes are unnecessary when no data are missing and uncommon in applications. There are two main contributions in this paper. First, it is shown that if subspaces have rank at most r and the number of partially observed vectors greater than dr+1 (times a poly-logarithmic factor), then with high probability the true subspaces are the only subspaces that agree with the observed data. We may conclude that subspace clustering may be possible without impractically large sample sizes and that we can certify the output of any subspace clustering algorithm by checking its fit to the observed data. The second main contribution is a novel EM-type algorithm for subspace clustering with missing data. We demonstrate and compare it to several other algorithms. Experiments with simulated and real data show that such algorithms work well in practice.
  • Keywords
    computational complexity; data analysis; expectation-maximisation algorithm; pattern clustering; EM-type algorithm; large complex data analysis; missing data; partially observed vectors; poly-logarithmic factor; sample complexity; subspace clustering algorithm; super-polynomial; Clustering algorithms; Computational modeling; Conferences; Monitoring; Signal processing; Signal processing algorithms; Vectors; Matrix Completion; Subspace Clustering;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Statistical Signal Processing (SSP), 2014 IEEE Workshop on
  • Conference_Location
    Gold Coast, VIC
  • Type

    conf

  • DOI
    10.1109/SSP.2014.6884630
  • Filename
    6884630