Author_Institution :
Dept. of Stat., Stanford Univ., Stanford, CA, USA
Abstract :
We consider recovery of low-rank matrices from noisy data by hard thresholding of singular values, in which empirical singular values below a threshold λ are set to 0. We study the asymptotic mean squared error (AMSE) in a framework where the matrix size is large compared with the rank of the matrix to be recovered, and the signal-to-noise ratio of the low-rank piece stays constant. The AMSE-optimal choice of hard threshold, in the case of an n-by-n matrix in white noise of level σ, is simply (4/√3)√nσ ≈ 2.309√nσ when σ is known, or simply 2.858 · y<sub>med</sub> when σ is unknown, where y<sub>med</sub> is the median empirical singular value. For nonsquare m-by-n matrices with m ≠ n, the thresholding coefficients 4/√3 and 2.858 are replaced with constants, provided in the paper, that depend on the aspect ratio m/n. Asymptotically, this thresholding rule adapts to unknown rank and unknown noise level in an optimal manner: it is always better than hard thresholding at any other value, and is always better than ideal truncated singular value decomposition (TSVD), which truncates at the true rank of the low-rank matrix we are trying to recover. Hard thresholding at the recommended value to recover an n-by-n matrix of rank r guarantees an AMSE at most 3nrσ². In comparison, the guarantees provided by TSVD, by optimally tuned singular value soft thresholding, and by the best shrinkage of the data singular values are 5nrσ², 6nrσ², and 2nrσ², respectively. The recommended value for the hard threshold also offers, among hard thresholds, the best possible AMSE guarantees for recovering matrices with bounded nuclear norm. Empirical evidence suggests that the performance improvement over TSVD and other popular shrinkage rules can be substantial, for different noise distributions, even for relatively small n.
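The thresholding rule described in the abstract is simple to state concretely. The following sketch, assuming the square n-by-n case and NumPy (the function name `hard_threshold_denoise` is for illustration; the nonsquare constants depending on m/n are not covered here), applies the recommended threshold (4/√3)√nσ when σ is known, or 2.858 times the median empirical singular value when it is not:

```python
import numpy as np

def hard_threshold_denoise(Y, sigma=None):
    """Denoise a square n-by-n matrix Y by hard thresholding of its
    singular values at the AMSE-optimal level (square-case sketch)."""
    n, m = Y.shape
    assert n == m, "this sketch covers only the square case"
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    if sigma is not None:
        # Known noise level: lambda = (4/sqrt(3)) * sqrt(n) * sigma ~ 2.309 sqrt(n) sigma
        lam = (4.0 / np.sqrt(3.0)) * np.sqrt(n) * sigma
    else:
        # Unknown noise level: lambda = 2.858 * median empirical singular value
        lam = 2.858 * np.median(s)
    s_thr = np.where(s >= lam, s, 0.0)  # singular values below lambda are set to 0
    return (U * s_thr) @ Vt
```

For a pure-noise input at the known level σ, the threshold sits above the bulk edge (near 2√n σ for square matrices), so the rule returns the zero matrix; for a strong low-rank signal, the leading singular values survive and the noise bulk is removed.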
Keywords :
mean square error methods; singular value decomposition; white noise; AMSE-optimal choice; TSVD; asymptotic mean squared error; data singular value shrinkage; low-rank matrix recovery; median empirical singular value; n-by-n matrix; noise distribution; noisy data; optimal hard threshold; optimally tuned singular value soft thresholding; signal-to-noise ratio; thresholding coefficient; truncated singular value decomposition; unknown noise level; Approximation methods; Information theory; Noise level; Noise reduction; Signal to noise ratio; Vectors; Singular values shrinkage; bulk edge; low-rank matrix denoising; optimal threshold; quarter circle law; scree plot elbow truncation; unique admissible