Title :
Divide-and-Conquer Anchoring for Near-Separable Nonnegative Matrix Factorization and Completion in High Dimensions
Author :
Tianyi Zhou ; Wei Bian ; Dacheng Tao
Author_Institution :
Centre for Quantum Comput. & Intell. Syst., Univ. of Technol. Sydney, Sydney, NSW, Australia
Abstract :
Nonnegative matrix factorization (NMF) becomes tractable in polynomial time with unique solution under separability assumption, which postulates all the data points are contained in the conical hull of a few anchor data points. Recently developed linear programming and greedy pursuit methods can pick out the anchors from noisy data and results in a near-separable NMF. But their efficiency could be seriously weakened in high dimensions. In this paper, we show that the anchors can be precisely located from low-dimensional geometry of the data points even when their high dimensional features suffer from serious incompleteness. Our framework, entitled divide-and-conquer anchoring (DCA), divides the high-dimensional anchoring problem into a few cheaper sub-problems seeking anchors of data projections in low-dimensional random spaces, which can be solved in parallel by any near-separable NMF, and combines all the detected low-dimensional anchors via a fast hypothesis testing to identify the original anchors. We further develop two non-iterative anchoring algorithms in 1D and 2D spaces for data in convex hull and conical hull, respectively. These two rapid algorithms in the ultra low dimensions suffice to generate a robust and efficient near-separable NMF for high-dimensional or incomplete data via DCA. Compared to existing methods, two vital advantages of DCA are its scalability for big data, and capability of handling incomplete and high-dimensional noisy data. A rigorous analysis proves that DCA is able to find the correct anchors of a rank-k matrix by solving math cal O(klog k) sub-problems. Finally, we show DCA outperforms state-of-the-art methods on various datasets and tasks.
Keywords :
data handling; divide and conquer methods; geometry; iterative methods; linear programming; matrix decomposition; conical hull; convex hull; divide-and-conquer anchoring; fast hypothesis testing; greedy pursuit methods; linear programming; low- dimensional geometry; low-dimensional random spaces; near-separable nonnegative matrix factorization; polynomial time; separability assumption; Bismuth; Geometry; Noise; Noise measurement; Robustness; Testing; Vectors; Nonnegative matrix factorization; anchor points; big data; conical hull; divide-and-conquer; matrix completion; random projection; separability assumption;
Conference_Titel :
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
DOI :
10.1109/ICDM.2013.29