SIGNAL RECOVERY FROM INCOMPLETE MEASUREMENTS IN THE PRESENCE OF OUTLIERS

Abstract. We study the restoration of a sparse signal or an image with a sparse gradient from a relatively small number of linear measurements which are additionally corrupted by a small amount of white Gaussian noise and by outliers. We minimize ℓ1-ℓ1 and ℓ1-TV regularization functionals using various algorithms and present numerical results for different measurement matrices as well as different sparsity levels of the initial signal/image and of the outlier vector.


1. Introduction
Recently, substantial progress was made in solving the fundamental problem of recovering a finite signal from a limited set of measurements [17, 8, 12, 29, 30, 19, 20]. Let A ∈ C^{n,N}, n ≤ N, be a matrix with some 'good' properties. Typical examples of such matrices are random matrices from the Gaussian ensemble or the symmetric Bernoulli ensemble, or matrices whose rows are n random vectors from the unit sphere in R^N. Let x0 be a sparse vector. Usually, we measure sparsity in the ℓ0-seminorm ‖x‖_0 := |{j : x_j ≠ 0}|. Moreover, for 1 ≤ p < ∞, we will use the ℓp-norms ‖x‖_p := (∑_{j=0}^{N−1} |x_j|^p)^{1/p}. In this paper, we are interested in recovering x0 from the measurements

b := A x0 + z0 + e0,

where z0 denotes a vector with small ℓ2-norm ‖z0‖_2 ≤ ε, e.g., white Gaussian noise, and e0 is a sparse vector with large non-zero coefficients (outliers), e.g., due to missing data in the measurements. Gross sparse error vectors combined with errors of small ℓ2-norm were recently considered, in a setting different from the one considered in this paper, in [6]. If we have no noise, i.e., z0 = e0 = 0, and n is sufficiently large, then, with high probability, the linear problem

arg min_x { ‖x‖_1 subject to Ax = b }

has a unique solution which is equal to x0, cf. [9]. If we have Gaussian noise but no outliers, i.e., e0 = 0, and n is sufficiently large, then, with high probability, the solution of the quadratic problem

(P^ε_{2,1})  arg min_x { ‖x‖_1 subject to ‖Ax − b‖_2 ≤ ε },

or equivalently (for appropriate ε, λ) of

(P_{2,1})  arg min_x { (1/2) ‖Ax − b‖_2^2 + λ ‖x‖_1 },

is unique and provides a good approximation to x0, cf. [10, 7].
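To fix ideas, the measurement model b := Ax0 + z0 + e0 can be simulated as follows. This is a minimal sketch: the sizes, noise level and outlier magnitude are illustrative choices of ours, not the settings of the experiments in Section 4.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m, K = 64, 40, 5, 4  # signal length, measurements, sparsity levels

# random matrix from the Gaussian ensemble (mean 0, variance 1/n)
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))

# m-sparse signal x0
x0 = np.zeros(N)
x0[rng.choice(N, size=m, replace=False)] = rng.normal(size=m)

# small Gaussian noise z0 and K-sparse outlier vector e0 with large entries
z0 = 0.01 * rng.normal(size=n)
e0 = np.zeros(n)
e0[rng.choice(n, size=K, replace=False)] = 10.0 * rng.choice([-1.0, 1.0], size=K)

b = A @ x0 + z0 + e0  # the corrupted measurements
```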
In the presence of outliers, method (P_{2,1}) fails to work. However, in various papers on image restoration, e.g., [18, 3], the minimization of functionals with an ℓ1 data-fitting term has shown good performance. A great amount of theoretical work in this direction was done by M. Nikolova in [24, 25, 26]. In this paper, we want to adapt these ideas to the incomplete measurement problem.
More precisely, in the case z0 = 0, we examine the performance of

(P_{1,1})  arg min_{x,e} { λ ‖x‖_1 + ‖e‖_1 subject to Ax + e = b },

or equivalently of

(P_{1,1})  arg min_x { λ ‖x‖_1 + ‖Ax − b‖_1 }.

If, in addition, we have Gaussian noise with ‖z0‖_2 ≤ ε, it makes sense to solve

(P^ε_{2,1,1})  arg min_{x,e} { λ ‖x‖_1 + ‖e‖_1 subject to ‖Ax + e − b‖_2 ≤ ε },

or equivalently (for appropriate ε, α)

(P_{2,1,1})  arg min_{x,e} { (1/2) ‖Ax + e − b‖_2^2 + α (λ ‖x‖_1 + ‖e‖_1) }.

It can be proved that problems (P_{2,1,1}) and (P_{1,1}) become equivalent as α, respectively ε, tends to zero, cf. [1]. Finally, in order to recover images x0 with sparse gradients, we replace the ℓ1-norm of x by a (discrete) total variation norm ‖·‖_TV and consider

(P_{1,TV})  arg min_x { λ ‖x‖_TV + ‖Ax − b‖_1 }.

Alternatively, it is possible to replace the TV-norm by the Besov norm in B^1_{1,1} and to use more general wavelet frames.
Problems (P_{2,1,1}) and (P_{1,1}) are convex with coercive functionals, so that a solution exists; in general, however, it is not unique.
The outline of this paper is as follows: In order to get an idea of the number of outliers that can be handled, Section 2 starts by proving an ℓ0 minimization result. In Section 3 we present the numerical algorithms for solving (P_{1,1}) and (P_{2,1,1}) and their TV counterparts (P_{1,TV}) and (P_{2,1,TV}). Numerical examples examining the reconstruction capability of our algorithms with respect to the sparsity of x0 and e0 are presented in Section 4. In particular, we numerically evaluate the probability that the solution of (P_{1,1}) coincides with the original x0 in dependence on the sparsity levels ‖x0‖_0 and ‖e0‖_0.

2. ℓ0 Minimization
The starting point for examining the recovery of signals from incomplete data was the question whether it is possible to reconstruct a sparse vector x0 from its incomplete measurements b := Ax0 by solving

(P_0)  arg min_x { ‖x‖_0 subject to Ax = b }.

One result can be formulated in terms of the smallest number of linearly dependent columns of A, denoted by spark(A), cf. [16]:

Proposition 2.1. If every set of n columns of A is linearly independent, i.e., spark(A) = n + 1, then perfect reconstruction of every x0 ∈ C^N with ‖x0‖_0 ≤ m by (P_0) is guaranteed if and only if n ≥ 2m.
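For tiny matrices, spark(A) can be computed by brute force, which makes the proposition easy to experiment with. This is a sketch of ours: the search over column subsets is combinatorial and infeasible beyond toy sizes.

```python
import numpy as np
from itertools import combinations

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns of A (brute force)."""
    n, N = A.shape
    for k in range(1, N + 1):
        for cols in combinations(range(N), k):
            # k columns are linearly dependent iff their rank is below k
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k
    return N + 1  # all columns independent (possible only if N <= n)

# every pair of columns is independent, all three together are not:
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
print(spark(A))  # -> 3, i.e. spark(A) = n + 1 with n = 2
```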
To get a clue about the influence of the outliers, let us first assume that the positions Ω ⊂ {1, . . . , n} of the K outliers are known. By Ω̄ we denote the complement of Ω in {1, . . . , n}. Since the outliers carry no information about x0, the best we can do is to solve

arg min_x { ‖x‖_0 subject to A|_Ω̄ x = b|_Ω̄ },

where b|_Ω̄ denotes the restriction of b to those components with indices in Ω̄ and A|_Ω̄ ∈ C^{n−K,N} contains the rows of A with indices in Ω̄. By Proposition 2.1 this problem has the unique solution x0 if 2m < spark(A|_Ω̄). Thus, if every set of n − K columns of A|_Ω̄ is linearly independent, perfect reconstruction is guaranteed if n − K ≥ 2m, i.e., n ≥ 2m + K.
In general, we have no oracle that gives us the positions of the outliers. Thus, given b := Ax0 + e0, we are looking for conditions such that

arg min_x { ‖x‖_0 + ‖Ax − b‖_0 }

has the unique solution x0. At least for matrices having only invertible square submatrices a sufficient condition will be proved in the next proposition. Examples of such matrices are the N-th Fourier matrix F_N := (e^{2πijk/N})_{j,k=0}^{N−1} of prime size N = p, cf. [28], and the Toeplitz matrix with entries from the Gaussian radial basis function, (e^{−σ(j−k)^2})_{j,k=0}^{N−1}, cf. [22].

3. Numerical Algorithms
Since the practical solution of (P_0) is too expensive (its cost increases exponentially with the problem size N), the ℓ0 minimization was only discussed theoretically, while for practical computations the ℓ0-seminorm is replaced by the ℓ1-norm. Before providing the numerical algorithms for solving the corresponding minimization problems, we prove the following simple but interesting proposition, which can be summarized as follows: whether x0 is a solution of (P_{1,1}) does not depend on the magnitude of the outliers e0, see also [26].
Substituting f := x − x0 in (P_{1,1}) and using b = Ax0 + e0, we see that x0 solves (P_{1,1}) if and only if f̂ = 0 solves

(1)  arg min_f { λ ‖x0 + f‖_1 + ‖Af − e0‖_1 }.

Since the functional is convex, we know that f̂ is a solution of (1) if and only if the zero vector is in the subdifferential of the functional at f̂, i.e.,

0 ∈ λ (x0 + f̂)/|x0 + f̂| + A^* (Af̂ − e0)/|Af̂ − e0|,

where the quotient is meant componentwise and, as usual, x/|x| := sgn(x) for x ≠ 0 and x/|x| := [−1, 1] for x = 0. In particular, we have that x0 is a solution of (P_{1,1}) if and only if f̂ = 0 is a solution of (1), i.e., if and only if

0 ∈ λ x0/|x0| − A^* (e0/|e0|).

However, this condition depends only on sgn(x0) and sgn(e0), but not on their sizes. Hence we are done.

Problem (P_{1,1}). If the matrix A is real-valued, we can compute a minimizer of (P_{1,1}) by applying the algorithm proposed in [18]. To this end, we use the decompositions x = u − v and Ax − b = e⁺ − e⁻ with u, v, e⁺, e⁻ ≥ 0. Hence, we can solve the following linear program, e.g., with the MATLAB routine LINPROG resp. with CPLEX:

(2)  arg min_{u,v,e⁺,e⁻} { λ 1_N^T (u + v) + 1_n^T (e⁺ + e⁻) subject to A(u − v) − e⁺ + e⁻ = b, u, v, e⁺, e⁻ ≥ 0 },

where 1_N denotes the vector consisting of N components 1.
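The linear program (2) can be sketched with SciPy's linprog in place of LINPROG/CPLEX. The variable ordering (u, v, e⁺, e⁻), the function name and all sizes below are our own choices.

```python
import numpy as np
from scipy.optimize import linprog

def solve_p11(A, b, lam=1.0):
    """Minimize lam*||x||_1 + ||Ax - b||_1 via the splitting
    x = u - v, b - Ax = e+ - e-, with u, v, e+, e- >= 0."""
    n, N = A.shape
    # objective: lam*1^T(u + v) + 1^T(e+ + e-)
    c = np.concatenate([lam * np.ones(2 * N), np.ones(2 * n)])
    # equality constraint: A(u - v) + e+ - e- = b
    A_eq = np.hstack([A, -A, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:2 * N]
    return u - v

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 30)) / np.sqrt(20)
x0 = np.zeros(30); x0[[3, 17]] = [2.0, -1.5]
b = A @ x0
b[5] += 10.0                     # one gross outlier
x_hat = solve_p11(A, b, lam=0.1)
# for such a sparse x0 and a single outlier, x_hat typically equals x0
```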
In the case of a complex-valued matrix A, we can compute a minimizer by second-order cone programming (SOCP), cf. [21], [31]. It is easily seen that a solution to (P_{1,1}) can be found by solving

arg min_{x,s,t} { λ 1_N^T s + 1_n^T t subject to (Re x_j, Im x_j, s_j) ∈ K_3, j = 0, . . . , N − 1, (Re (Ax − b)_j, Im (Ax − b)_j, t_j) ∈ K_3, j = 1, . . . , n }.

The cone K_p is defined by

K_p := { (y, t) ∈ R^{p−1} × R : ‖y‖_2 ≤ t }.

We used MOSEK to solve this SOCP.

Problem (P_{2,1,1}). We solve (P_{2,1,1}) with the following alternating minimization algorithm suggested in [2] in the context of image decomposition. As initial value we use e^(0) := 0, but any other initialization is possible.

Algorithm 3.2. For k = 0, 1, . . . , repeat until a stopping criterion is reached:

(4)  x^(k+1) ∈ arg min_x { (1/2) ‖Ax + e^(k) − b‖_2^2 + αλ ‖x‖_1 },

e^(k+1) := arg min_e { (1/2) ‖Ax^(k+1) + e − b‖_2^2 + α ‖e‖_1 }.

Let us have a look at the two subproblems. In the second step, the functional is strictly convex and has a unique solution ê, which can be obtained by soft shrinkage of ỹ := b − Ax^(k+1) with threshold α, cf. [3], i.e., ê = S_α(ỹ), where

S_α(y_j) := sgn(y_j) max{|y_j| − α, 0}.

In the first step, there exists a minimizer due to the convexity and coercivity of the functional, but uniqueness is not guaranteed. Since the computation of the minimizer requires similar considerations as the approach to (P_{2,1,TV}) in the next paragraph, we consider more generally

(6)  arg min_x { (1/2) ‖Ax − y‖_2^2 + β ‖ |Lx| ‖_1 },

where L ∈ R^{pN,N} and |X| := ( (∑_{k=0}^{p−1} X_{j+kN}^2)^{1/2} )_{j=0}^{N−1}. The functional J(x) := ‖ |Lx| ‖_1 is convex and one-homogeneous, so that its dual J* is the indicator function of the set

C := { L^T W : ‖ |W| ‖_∞ ≤ 1 },

see [13], [27]. Then the following proposition holds true, cf. [4].

Proposition 3.3. Let µ > 0 be chosen such that µ ‖A^* A‖_2 < 1. Then any solution of the fixed point equation

(8)  x = (I − Π_{µβC}) ( x − µ A^* (Ax − y) ),

where Π_{µβC} denotes the orthogonal projection onto µβC, is a solution of (6). Assuming the existence of a solution of (6), the sequence

x^(k+1) := (I − Π_{µβC}) ( x^(k) − µ A^* (Ax^(k) − y) )

converges for any initial vector x^(0) to a fixed point of (8).
For our problem (P_{2,1,1}) we have that β = αλ, L = I and p = 1. Consequently,

Π_{µβC}(y) = arg min_{‖w‖_∞ ≤ βµ} ‖y − w‖_2^2.

This can be solved componentwise, i.e., for j = 0, . . . , N − 1 we have to compute

(10)  arg min_{|w_j| ≤ βµ} (y_j − w_j)^2.

Obviously, the solution of (10) is given by βµ if y_j ≥ βµ, by −βµ if y_j ≤ −βµ, and by y_j otherwise. Hence, I − Π_{µβC} is the soft shrinkage operator with threshold βµ.
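In this special case the x-step therefore reduces to iterating soft shrinkage. A minimal sketch (the function names, the fixed iteration count and the absence of a stopping rule are our simplifications):

```python
import numpy as np

def soft_shrink(y, tau):
    """Soft shrinkage S_tau(y) = sgn(y) * max(|y| - tau, 0), i.e. the
    operator I - Pi_{tau C} derived above."""
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

def x_step(A, y, beta, mu, iters=500):
    """Iterate x <- S_{beta*mu}(x - mu*A^T(Ax - y)), the fixed point
    iteration for L = I, p = 1; requires mu*||A^T A||_2 < 1."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_shrink(x - mu * A.T @ (A @ x - y), beta * mu)
    return x
```

Inside the alternating minimization, the x-step would be called with beta = α·λ and y = b − e^(k), and the subsequent e-step is simply soft_shrink(b − A @ x, α).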
Concerning the convergence of the alternating minimization algorithm, we have the following proposition. The proof applies the ideas of [1, 2] to our setting in a straightforward way and is left to the reader.

Proposition 3.4. The iterates (x^(k), e^(k)) of Algorithm 3.2 converge to a minimizer of the functional in (P_{2,1,1}).

Problems (P_{1,TV}) and (P_{2,1,TV}). Finally, we deal with images X ∈ R^{Ñ,Ñ} having a sparse gradient rather than being sparse themselves. A typical example of such an image is the Shepp-Logan phantom in Fig. 2. We reshape these images columnwise into a vector x of length N = Ñ^2. For defining a discrete version of the TV-norm, we introduce the matrix D ∈ R^{2N,N} of directional forward differences in the horizontal and vertical directions and set ‖x‖_TV := ‖ |Dx| ‖_1. Problem (P_{2,1,TV}) can be solved by an alternating minimization algorithm similar to Algorithm 3.2. We only have to replace ‖x‖_1 in (4) by ‖x‖_TV. Then L = D and p = 2 in (6). Of course, this results in another indicator set C and consequently in another projection operator Π_{µβC}. The resulting projections can be computed, e.g., by using Chambolle's algorithm [13], see also [27].
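The discrete TV-norm itself is straightforward to evaluate. The sketch below uses forward differences with a replicate (Neumann) boundary, one common convention, which need not match the boundary treatment of the matrix D used in our experiments:

```python
import numpy as np

def forward_diff(X):
    """Directional forward differences of an image X (set to zero at the
    last row/column, i.e. replicate boundary)."""
    dx = np.zeros_like(X); dy = np.zeros_like(X)
    dx[:, :-1] = X[:, 1:] - X[:, :-1]  # horizontal differences
    dy[:-1, :] = X[1:, :] - X[:-1, :]  # vertical differences
    return dx, dy

def tv_norm(X):
    """Isotropic discrete TV-norm || |Dx| ||_1 with p = 2: sum over all
    pixels of the Euclidean length of the two-component gradient."""
    dx, dy = forward_diff(X)
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))

X = np.zeros((8, 8)); X[2:6, 2:6] = 1.0  # piecewise constant test image
print(tv_norm(X))  # 14 + sqrt(2) for this square
```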
For solving problem (P_{1,TV}), one can try to follow the lines of [14] or [26]. Both algorithms introduce an additional parameter ε in the data-fitting and/or the regularization term to cope with the singularity of the absolute value function at zero, and they converge rather slowly in general. Another possibility to solve (P_{1,TV}) is SOCP. This was recently proposed for several image processing problems, e.g., in [31] and [21]. To this end, we restrict our attention to real matrices A and rewrite (P_{1,TV}) as

(13)  arg min_{x,s,t} { 1_n^T t + λ 1_N^T s subject to ((Ax − b)_j, t_j) ∈ K_2, j = 1, . . . , n, ((Dx)_j, (Dx)_{j+N}, s_j) ∈ K_3, j = 0, . . . , N − 1 }.

The corresponding problem for complex matrices can also be solved by SOCP, since we can separate the real and imaginary parts of A and b. This results in cones of 'dimension' 3 and 5.

4. Numerical Examples
In our numerical examples we consider two types of matrices. First, real matrices A ∈ R^{n,N} which are constructed by randomly selecting n rows of the orthogonal discrete cosine transform matrix of type II,

C_N := √(2/N) ( ε_j cos( j(2k+1)π / (2N) ) )_{j,k=0}^{N−1},

where ε_0 = 1/√2 and ε_j = 1 for j ≠ 0. We mention that we have obtained similar results for the solution of (P_{1,1}) and (P_{2,1,1}) with matrices having Gaussian random numbers with mean zero and variance 1/n as entries. In our tests we choose N = 64. The second class of matrices we considered consists of randomly chosen rows of the discrete Fourier transform matrix F_N of length N.
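The first class of measurement matrices can be generated as follows (a sketch; the row selection is the only randomness involved):

```python
import numpy as np

def dct2_matrix(N):
    """Orthogonal DCT-II matrix C_N = sqrt(2/N)*(eps_j*cos(j(2k+1)pi/(2N)))
    with eps_0 = 1/sqrt(2) and eps_j = 1 otherwise."""
    j = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(j * (2 * k + 1) * np.pi / (2 * N))
    C[0, :] /= np.sqrt(2.0)  # eps_0 = 1/sqrt(2)
    return C

rng = np.random.default_rng(0)
N, n = 64, 40
C = dct2_matrix(N)
A = C[rng.choice(N, size=n, replace=False), :]  # n randomly selected rows
print(np.allclose(C.T @ C, np.eye(N)))  # -> True (C_N is orthogonal)
```

The Fourier case works the same way, with the matrix np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) in place of C.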
We are interested in the probability that the solution x̂ of (P_{1,1}) coincides with x0 for various values of m = ‖x0‖_0 and K = ‖e0‖_0. The supports of x0 and e0 were chosen at random, and the K outliers were set to one of the extreme values

e_min := min(real(Ax0)) + i min(imag(Ax0)),   e_max := max(real(Ax0)) + i max(imag(Ax0))

('min-max outliers'). We say that we have recovered x0 if x̂ coincides with x0 up to a small numerical tolerance.
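The outlier corruption can be sketched as follows. Since the text does not specify how the choice between e_min and e_max is made for each corrupted component, below it is made uniformly at random (an assumption of ours):

```python
import numpy as np

def add_minmax_outliers(Ax0, K, rng):
    """Add K 'min-max' outliers at random positions: each outlier takes
    the value e_min or e_max computed from the clean measurements."""
    y = Ax0.astype(complex).copy()
    e_min = np.min(y.real) + 1j * np.min(y.imag)
    e_max = np.max(y.real) + 1j * np.max(y.imag)
    idx = rng.choice(len(y), size=K, replace=False)
    y[idx] += np.where(rng.random(K) < 0.5, e_min, e_max)
    return y, idx

rng = np.random.default_rng(0)
b_clean = rng.normal(size=20)
b, pos = add_minmax_outliers(b_clean, K=4, rng=rng)
```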
The parameter λ was set to 1. The experiment was repeated 5000 times for every pair (m, K).
In our first example we considered the matrices A = C_N and A = F_N for N = n = 64, as well as matrices which were constructed by randomly choosing n = 40 rows of these two matrices. Fig. 1 shows the probability that the solution x̂ of (P_{1,1}) coincides with x0 in dependence on m and K for these four matrices. Finally, we are interested in images having a sparse gradient. We use the Shepp-Logan phantom image (Ñ = 64). The measurement operator A is constructed as follows: first, we construct two matrices A1, A2 ∈ R^{n,Ñ} by randomly selecting n = 40 rows of the matrix C_Ñ; then we set A := A1 ⊗ A2.
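In practice the Kronecker matrix A = A1 ⊗ A2 never has to be formed explicitly: for the columnwise reshaping used above, (A1 ⊗ A2) vec(X) = vec(A2 X A1^T). A small sketch with stand-in sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, n = 8, 5  # small stand-ins for Ntilde = 64 and n = 40
A1 = rng.normal(size=(n, Nt))
A2 = rng.normal(size=(n, Nt))
X = rng.normal(size=(Nt, Nt))

x = X.flatten(order="F")  # columnwise vectorization vec(X)

b_full = np.kron(A1, A2) @ x                 # explicit Kronecker product
b_fast = (A2 @ X @ A1.T).flatten(order="F")  # same result, much cheaper
print(np.allclose(b_full, b_fast))  # -> True
```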
As before, we corrupt the measured data by K = 16 min-max outliers. Furthermore, we add a small amount of Gaussian noise. Solving (P 2,1,TV ) and (P 1,TV ) yields the results shown in Fig. 3 and Fig. 4. The solution to problem (P 2,1,TV ) was computed by applying Algorithm 3.2 to the two-dimensional case (parameters: µ = 0.9, λ = 0.003, α = 0.04 for Gaussian noise with standard deviation 0.001 and α = 0.03 for Gaussian noise with standard deviation 0.01, respectively). The algorithm was stopped when the relative distance between two consecutive images produced by the algorithm, measured in the Frobenius norm, was smaller than 10 −8 . A solution to problem (P 1,TV ) was found by solving the SOCP (13) with MOSEK (parameters: λ = 5 for Gaussian noise with standard deviation 0.001 and λ = 6 for Gaussian noise with standard deviation 0.01, respectively). Fig. 3 and Fig. 4 show that the proposed methods perform remarkably well.