Robust time-of-arrival localization via ADMM

This article considers the problem of source localization (SL) using possibly unreliable time-of-arrival (TOA) based range measurements. Adopting the strategy of statistical robustification, we formulate TOA SL as minimization of a versatile loss that possesses resistance against the occurrence of outliers. We then present an alternating direction method of multipliers (ADMM) to tackle the nonconvex optimization problem in a computationally attractive iterative manner. Moreover, we prove that the solution obtained by the proposed ADMM will correspond to a Karush-Kuhn-Tucker point of the formulation when the algorithm converges, and discuss reasonable assumptions about the robust loss function under which the approach can be theoretically guaranteed to be convergent. Numerical investigations demonstrate the superiority of our method over many existing TOA SL schemes in terms of positioning accuracy and computational simplicity. In particular, the proposed ADMM achieves estimation results with mean square error performance closer to the Cramér-Rao lower bound than its competitors in our simulations of impulsive noise environments.


I. INTRODUCTION
Passive source localization (SL) refers to determining the position of a signal-emitting target from the measurements collected using multiple spatially separated sensors [1]. Owing to its great significance to many location-based applications (e.g., emergency assistance [2], asset tracking [3], the Internet of Things [4], and radar [5]), the SL problem has received much attention in the literature over the past decades [6]. Depending on the measurements being used, methods for SL can be roughly divided into two groups: range- and direction-based. A representative, and perhaps the most extensively studied, instance of the former is to exploit the time-of-arrival (TOA) observations characterizing the source-sensor range information for positioning. Under the Gaussian noise assumption, least squares (LS) is undoubtedly the natural choice for delivering statistically meaningful estimation results. Along this line, numerous LS techniques have been proposed in the literature, such as algebraic explicit and exact solutions [7]-[9], convex programming approaches [10], [11], and local optimization algorithms [12]-[15], to name but a few.

(Corresponding author: Wenxin Xiong.)
Wenxin Xiong and Christian Schindelhauer are with the Department of Computer Science, University of Freiburg, Freiburg 79110, Germany (e-mail: w.x.xiong@outlook.com; schindel@informatik.uni-freiburg.de).
Hing Cheung So is with the Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China (e-mail: hcso@ee.cityu.edu.hk).
Statistical model mismatch will appear in real-world scenarios where the Gaussian noise assumption may be violated due to the presence of unreliable sensor data [22], thereby largely deteriorating the performance of the LS methodology [16]. To reduce the negative impact of non-Gaussian measurement errors on the positioning accuracy, researchers have resorted to the strategies of joint estimation of location coordinates and a balancing parameter [17], [18], the worst-case criterion [19]-[21], as well as robust statistics [23]-[28]. Here, we are particularly interested in the statistical robustification of TOA-based LS position estimators. Countermeasures of this type have been attracting increasing attention in recent years, mainly because of their relatively low prior knowledge requirements and low complexity of implementation [28].
Consider a single-source localization system deployed in H-dimensional space. It consists of a radiating source that is to be located, with unknown position x ∈ R^H, and L sensors with signal receiving capabilities and known positions {x_i ∈ R^H | i = 1, ..., L}. Synchronization among the source and sensors is assumed to be guaranteed, so that the TOA of the emitted signal at each of the L sensors can be estimated. Specifically, the TOA-based range measurements are modeled as [1]

r_i = ‖x − x_i‖_2 + e_i, i = 1, ..., L, (1)

where ‖·‖_2 denotes the ℓ2-norm and e_i accounts for the observation uncertainty associated with the ith source-sensor path. Under the assumption that {e_i} are uncorrelated zero-mean Gaussian processes with variances {σ_i^2}, and in the maximum likelihood (ML) sense [29], the TOA SL problem is mathematically stated as [12]

min_x Σ_{i=1}^{L} (r_i − ‖x − x_i‖_2)^2 / σ_i^2. (2)

The ℓ2-norm based estimator (2) is, however, vulnerable to outlying (unreliable) data, which manifest themselves as range measurements immersed in non-Gaussian, potentially bias-like errors. Common causes of them in the localization context include the non-line-of-sight and multipath propagation of signals, interference, sensor malfunction, and malicious attacks [5], [16], [22], [30]. To achieve resistance against the occurrence of outliers in such adverse situations, (2) was statistically robustified in [25]-[28] by substituting the ℓ2 loss (·)^2 with a certain fitting error measure less sensitive to biased r_i, i.e.,
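As a quick illustration of the measurement model (1) and the weighted ℓ2 (ML) objective behind (2), the following sketch simulates noisy ranges for a random geometry; the sensor layout, noise level, and seed are arbitrary assumptions for demonstration only, not the paper's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(0)

H, L = 2, 8                           # dimension and number of sensors
x_true = np.array([2.0, 3.0])         # source position (illustrative)
X = rng.uniform(-10.0, 10.0, (L, H))  # sensor positions {x_i}

sigma = 0.1                           # common noise standard deviation
d_true = np.linalg.norm(x_true - X, axis=1)
r = d_true + sigma * rng.standard_normal(L)  # model (1): r_i = ||x - x_i||_2 + e_i

def ml_cost(x, r, X, sigma2):
    """Weighted l2 (ML) objective of (2)."""
    res = r - np.linalg.norm(x - X, axis=1)
    return np.sum(res ** 2 / sigma2)
```

A far-off guess scores worse than the true position under this objective, which is the basis of any LS-type search.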
min_x Σ_{i=1}^{L} f(r_i − ‖x − x_i‖_2), (3)

where f(·) turns into the ℓ1 loss |·| in [26], the Huber loss with radius R_i > 0 in [25], the Welsch loss with parameter σ_W in [27], and the ℓp loss |·|^p with 1 ≤ p < 2 in [28]. Forming the associated epigraph problem [46], one may directly apply the classical convex approximation techniques of second-order cone programming (SOCP) [47] or difference-of-convex programming (DCP) [48] to address the ℓ1-minimization version of (3):

min_x Σ_{i=1}^{L} |r_i − ‖x − x_i‖_2|. (5)

Smoothed approximations through the composition of the natural logarithm and hyperbolic cosine functions [26], on the other hand, make the Lagrange-type neurodynamic optimization framework of the Lagrange programming neural network (LPNN) applicable, where

f_ν(u) = ν ln(cosh(u/ν)), (7)

with parameter ν > 0, is also known as (a.k.a.) the smoothed ℓ1 loss [49]. The idea of resorting instead to the Huber convex underestimator for (1), based on a composite loss f involving the ramp function [·]_+ = max(·, 0), was conceptualized in [25]. Nonetheless, the focus of [25] is more on the extension to sensor network node localization than on single-source positioning. In [27], the intractable range-based Welsch loss minimization problem was approximated by using the squared ranges, after which a half-quadratic (HQ) optimization algorithm was devised. The authors of [28] converted the ℓp-minimization problem, i.e., (3) with f(·) = |·|^p, into an iteratively reweighted LS (IRLS) framework and applied sum-product message passing (MP) to solve the x-update IRLS subproblem in closed form.
While implementing the convex programs via interior-point methods is known to scale badly with L, numerical realization of the neurodynamic approach in [26] usually takes several thousand iterations to reach equilibrium [50]. Despite the low-complexity advantage of HQ, the squared-range formulation (11) in [27] inherently suffers from an impaired statistical efficiency.
The MP-assisted IRLS [28] is computationally attractive, yet the lack of a theoretical foundation supporting its convergence may affect the soundness of the technique. With these concerns in mind, we are motivated to explore new avenues for coping with (3), especially ones that are less computationally demanding but more statistically efficient and theoretically complete than the existing solutions.
Our main contribution in this work is to develop an alternating direction method of multipliers (ADMM) [31] for tackling the general robust TOA positioning problem (3). As the versatility of the cost function is embodied only in the subproblem that amounts to computing the proximal operator of f(·), we are able to adapt f(·) to specific noise environments with ease. In an analytical manner, we prove that if the proposed ADMM is convergent, the limit point to which it converges will satisfy the Karush-Kuhn-Tucker (KKT) conditions for the equivalent constrained reformulation of (3). We also show the possibility of ensuring the theoretical convergence of the ADMM, under several extra assumptions about f(·). In addition, numerical examples are presented to corroborate the superiorities of our solution over its competitors in terms of positioning accuracy and computational simplicity. It should be pointed out that, compared with the ADMM in [12], our scheme is of independent interest in the following two aspects. First, the algorithm in [12] deals with the non-outlier-resistant formulation (2), whereas we aim at solving (3) with a robustness-conferring fitting error measure. Second, we provide analytical convergence results for the presented ADMM. The authors of [12], in contrast, incorrectly assumed the nonconvex constrained optimization reformulation of (2) to be convex and, as a result, omitted to probe into the convergence of their method under nonconvexity.
The rest of this contribution is organized as follows. The equivalent constrained reformulation of (3) and the ADMM solution to it are described in Section II. The complexity and convergence properties of the proposed ADMM are discussed in detail in Section III. Performance evaluations are conducted in Section IV. Ultimately, Section V concludes the article.

II. ALGORITHM DEVELOPMENT
Let us start by transforming (3) into

min_{x, d} Σ_{i=1}^{L} f(r_i − d_i) subject to d_i = ‖x − x_i‖_2, i = 1, ..., L, (13)

where d = [d_1, ..., d_L]^T ∈ R^L is a dummy vector of decision variables for the source-sensor distances, independent of x.
We further rewrite (13) as

min_{x, d, β} Σ_{i=1}^{L} f(r_i − d_i)
subject to x − x_i = d_i β_i, ‖β_i‖_2 = 1, i = 1, ..., L, (14a)
d_i ≥ 0, i = 1, ..., L, (14b)

by introducing an auxiliary vector β = [β_1^T, ..., β_L^T]^T ∈ R^{HL} into the characterization of range information [12]. Here, β_i ∈ R^H is a unit vector that indicates the source's direction with respect to (w.r.t.) the ith sensor, a.k.a. the direction vector of arrival [32].
For (14), we construct the following augmented Lagrangian with constraints:

L_ρ(x, d, β, λ) = Σ_{i=1}^{L} f(r_i − d_i) + Σ_{i=1}^{L} λ_i^T (x − x_i − d_i β_i) + (ρ/2) Σ_{i=1}^{L} ‖x − x_i − d_i β_i‖_2^2, (15)

where λ = [λ_1^T, ..., λ_L^T]^T ∈ R^{HL} is a vector containing the Lagrange multipliers for the equality constraints in (14a) and ρ > 0 is the augmented Lagrangian parameter. Splitting the primal variables into two parts, the ADMM for solving (14) consists of the following iterative steps:

x^(k+1) = arg min_x L_ρ(x, d^(k), β^(k), λ^(k)), (16a)
(d^(k+1), β^(k+1)) = arg min_{d, β} L_ρ(x^(k+1), d, β, λ^(k)), (16b)
λ_i^(k+1) = λ_i^(k) + ρ (x^(k+1) − x_i − d_i^(k+1) β_i^(k+1)), i = 1, ..., L, (16c)

where the iteration index is indicated by (·)^(k). To be specific, (16a) and (16b) sequentially minimize the augmented Lagrangian (15) w.r.t. the primal variables, after which (16c) updates the dual variables via a gradient ascent with step size ρ. We note that there are only two (namely, x and (d, β)) rather than three primal blocks in the ADMM governed by (16). By comparison, a natural extension of the basic two-block ADMM to multi-block cases may result in divergence [33].
More detailed explanations for (16a) and (16b) are given as follows.
By ignoring the constant terms independent of x, the subproblem (16a) can be simplified into an LS form, which has the following closed-form solution:

x^(k+1) = (1/L) Σ_{i=1}^{L} (x_i + d_i^(k) β_i^(k) − λ_i^(k)/ρ).

Similarly, the other subproblem (16b) is re-expressed as

min_{d, β} Σ_{i=1}^{L} [ f(r_i − d_i) + (ρ/2) ‖x^(k+1) − x_i − d_i β_i + λ_i^(k)/ρ‖_2^2 ] subject to ‖β_i‖_2 = 1, d_i ≥ 0, i = 1, ..., L. (20)

Exploiting the particular structure of (20), the optimal β can be obtained first as

β_i^(k+1) = (x^(k+1) − x_i + λ_i^(k)/ρ) / ‖x^(k+1) − x_i + λ_i^(k)/ρ‖_2, i = 1, ..., L. (22)

Thus, (20) is reduced to a problem separable w.r.t. the partition of d into its L elements, leading to the following L subproblems:

min_{d_i ≥ 0} f(r_i − d_i) + (ρ/2) (d_i − (β_i^(k+1))^T (x^(k+1) − x_i + λ_i^(k)/ρ))^2, i = 1, ..., L. (24)

With the following proposition, we may simplify (24) to some extent in order to facilitate the calculations.
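Under the constraint convention x − x_i = d_i β_i assumed above (the exact equation displays are partially lost in this copy), the x- and β-updates admit the simple closed forms sketched below; the array layout is our own choice.

```python
import numpy as np

def x_update(X, d, B, lam, rho):
    # closed-form LS minimizer of (16a): the average of the per-sensor
    # "votes" x_i + d_i * beta_i - lam_i / rho
    return np.mean(X + d[:, None] * B - lam / rho, axis=0)

def beta_update(x, X, lam, rho):
    # for fixed d_i >= 0 the objective is linear in beta_i on the unit
    # sphere, so the optimum aligns beta_i with x - x_i + lam_i / rho, cf. (22)
    V = x - X + lam / rho
    return V / np.linalg.norm(V, axis=1, keepdims=True)
```

With zero multipliers and consistent (d, β), the x-update reproduces the source position exactly, which is a convenient unit test.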
PROPOSITION 1 Under mild assumptions about the robust loss and range observations, namely that f(·) is an even function strictly increasing on the nonnegative semi-axis and that {r_i} are always nonnegative, coping with (24) will be equivalent to handling its counterpart (25) in which the constraint d_i ≥ 0 is dropped.

PROOF See Appendix A.
It is worth noting that the assumptions made in Proposition 1 are actually rather common (see our justification in the following). In such cases (where Proposition 1 applies), solving (24) boils down to computing the proximal operator of f(·), complying with the definition of proximal mapping below.
DEFINITION 1 The proximal mapping of a function f: R → R with parameter τ > 0 is defined for any b ∈ R as

prox_{τf}(b) = arg min_{a ∈ R} f(a) + (1/(2τ)) (a − b)^2. (27)

Obviously, the computational simplicity of the proximal mapping procedure is crucial to the efficient update of d in each iteration of the proposed ADMM. In fact, the calculation of (27) can be done in a relatively unencumbered manner or in closed form for many choices of f(·) exhibiting outlier-resistance [34]-[37]. As summarized in Table I,^1 this article restricts the scope of discussion to two typical options: (i) the ℓp loss |·|^p for 1 ≤ p < 2 and (ii) the Huber function. The two corresponding instantiations of (3) are then (12) and its Huber counterpart, respectively. Our justification for them is given as follows. The ℓp-minimization criterion with 1 ≤ p < 2 is known to show a considerable degree of robustness against outliers in a wide range of adverse environments [28], [35], [56], [59]. Closely resembling its ℓ1 (resp. ℓ2) counterpart for large (resp. small) fitting errors, the Huber loss minimization scheme offers another means of allowing some controllability therebetween [38]. Most importantly, all these cost functions are compatible with our assumptions in Proposition 1.
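Definition 1 can be checked numerically: for f = |·|, the proximal mapping reduces to the well-known soft-thresholding operator. The sketch below compares a brute-force scalar minimization of (27) against that closed form; the bounded search interval is an arbitrary assumption.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def prox_numeric(f, b, tau):
    """prox_{tau f}(b) = argmin_a f(a) + (a - b)^2 / (2 tau), cf. Definition 1."""
    res = minimize_scalar(lambda a: f(a) + (a - b) ** 2 / (2.0 * tau),
                          bounds=(b - 10.0, b + 10.0), method="bounded")
    return res.x

def soft_threshold(b, tau):
    # closed-form prox of the l1 loss |.|
    return np.sign(b) * max(abs(b) - tau, 0.0)
```

Agreement between the two confirms that the table of closed-form proximal computations can be trusted in the d-update.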
We note that a_+ and a_− in Table I are the unique positive root of (a − b)/τ + p·a^{p−1} = 0 for a ∈ [0, b] (when b ≥ 0) and the unique negative root of (a − b)/τ − p·(−a)^{p−1} = 0 for a ∈ [b, 0] (when b < 0), respectively, both of which can be conveniently found using simple root-finding algorithms (e.g., bisection) [35].
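The bisection just described can be sketched as follows for the ℓp loss; by the evenness of |·|^p, the negative-b case is handled through the sign symmetry noted in the text.

```python
def prox_lp(b, tau, p, iters=60):
    """prox of the tau-scaled loss |.|^p, 1 <= p <= 2, via bisection on the
    stationarity condition (a - |b|)/tau + p * a**(p-1) = 0 over [0, |b|]."""
    if b == 0.0:
        return 0.0
    sign, m = (1.0, b) if b > 0 else (-1.0, -b)
    g = lambda a: (a - m) / tau + p * a ** (p - 1.0)  # increasing in a on [0, m]
    lo, hi = 0.0, m
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return sign * 0.5 * (lo + hi)
```

For p = 2 this recovers b/(1 + 2τ), and for p = 1 the soft-thresholding rule, which gives a quick sanity check on the implementation.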
The steps of the ADMM for dealing with the robust TOA SL problem are summarized in Algorithm 1, where δ is the tolerance to terminate the iterations.

^1 For the purpose of completeness, we also include the non-robust case of p = 2, in which the ℓp-minimization estimator coincides with (2), providing optimum estimation performance as long as the disturbances {e_i} are independent and identically distributed (i.i.d.) Gaussian processes.
Output: Estimate of source location x.

III. COMPLEXITY AND CONVERGENCE PROPERTIES
The complexity and convergence properties of the proposed ADMM are discussed in detail in this section.
From Algorithm 1, it is not difficult to see that the complexity of the ADMM is dominated by that of the d-update steps, in which the proximal mapping calculations are involved. Therefore, the total complexity of Algorithm 1 will be O(N_ADMM L) if the loss function taken from Table I admits closed-form d-updates and O(N_ADMM K L) otherwise, where N_ADMM represents the iteration number of the ADMM and K the number of searches involved in the root-finding process at each ADMM iteration. Empirically, a few steps of the bisection method applied in each d-update and a few tens to hundreds of ADMM iterations are already enough to yield an accurate source location estimate. By comparison, realizing the second-order cone program derived from (5), the difference-of-convex program for (5) (requiring N_CCCP concave-convex procedure (CCCP) iterations), and the ramp function based Huber convex underestimator (9) in [25] using interior-point methods results in O(L^3.5), O(N_CCCP L^3.5), and O(L^3.5) complexity, respectively [47]. The LPNN in [26], the HQ technique in [27], and the MP-assisted IRLS in [28] are three other representative TOA positioning schemes from the literature that adopt the strategy of statistical robustification as well. They lead to O(N_LPNN L), O(N_HQ K L), and O(N_IRLS L) complexity, respectively, where N_LPNN is the number of iterations in the numerical implementation of the LPNN, N_HQ the number of steps needed for the HQ algorithm to converge, and N_IRLS that of the IRLS iterations. As reported in [27], [28], [50], N_HQ typically takes a value of several tens, and the same goes for K and N_IRLS, whereas N_LPNN is usually of several thousands. It turns out that the proposed ADMM with loss functions permitting closed-form proximal computations, the HQ algorithm in [27], and the IRLS in [28] are the most computationally efficient.
With an appropriately selected value of ρ, Algorithm 1 is observed to be always convergent in the simulation studies. Nonetheless, existing theoretical analyses of the convergence of ADMM under nonconvexity [39] are not easily transferable to our design, owing to the disparity between the formulation (14) and the paradigms specified in [39]. We note that similar dilemmas have been faced by many practitioners, despite empirically sound ADMM use cases in their respective contributions [12], [24], [35], [40]. In what follows, we will present our own analytic proofs for the convergence of Algorithm 1 in lieu of pinning all hopes on the generally applicable results from the literature.
The following proposition establishes the equivalence between (14) and its simplification (31), obtained by disregarding the constraint (14b); we may therefore shift our focus to (31) in the convergence analysis.

PROPOSITION 2 Under the same assumptions as in Proposition 1, the formulations (14) and (31) are equivalent to one another.
PROOF The proof of Proposition 2 is similar to that of Proposition 1. Please see Appendix B for the details.
Next, we derive the following theorem for the optimality of the tuples produced by Algorithm 1.

THEOREM 1 Let (x^(k), d^(k), β^(k), λ^(k)), k = 1, ..., be the tuples of primal and dual variables generated by Algorithm 1. If

lim_{k→∞} (x^(k), d^(k), β^(k), λ^(k)) = (x⋆, d⋆, β⋆, λ⋆), (32)

then the limit (x⋆, d⋆, β⋆, λ⋆) will satisfy the KKT conditions (a.k.a. the first-order necessary conditions [41]) for (31),^2 namely, stationarity w.r.t. x, d, and β [conditions (33a)-(33c)], feasibility of the equality constraints (33d), and the unit-norm feasibility of {β_i⋆} (33e), where

L(x, d, β, λ) = Σ_{i=1}^{L} f(r_i − d_i) + Σ_{i=1}^{L} λ_i^T (x − x_i − d_i β_i)

is the associated Lagrangian of (31).

^2 In cases where f(·) is non-differentiable at the origin, the condition (33b) should be substituted with the inclusion 0_L ∈ ∂_d L(x⋆, d⋆, β⋆, λ⋆) [42], where ∂_d(·) denotes the generalized gradient of a function w.r.t. d that takes into consideration the subdifferential calculus [43]. For simplicity's sake, we confine our analyses here to the special case with a differentiable f(·). Nevertheless, it is worth pointing out that, with slight modifications, the results can be readily extended to scenarios in the presence of non-differentiability.
PROOF See Appendix C.
Also mentionable is the linear independence constraint qualification (LICQ), which has been one of the most commonly adopted constraint qualifications and is necessary to guarantee that an optimal solution will satisfy the KKT conditions [44]. Moreover, the uniqueness of the Lagrange multipliers can be ensured by it [41]. In our case, the LICQ concerns the linear independence of the constraint gradients at the local solution point y⋆ = [(x⋆)^T, (d⋆)^T, (β⋆)^T]^T, where 0_{a×b} (resp. I_a) denotes the a × b zero (resp. a × a identity) matrix. In a nontrivial source localization setting, the position of the source is different from those of the sensors [45]. That is, none of the elements of {β_i⋆} should be equal to zero, and the same goes for {d_i⋆}. It is then easily deducible that the LICQ holds.

REMARK 1 Theorem 1 does not guarantee the convergence of the proposed ADMM. Instead, it only reveals the optimality of the solution when the algorithm converges. To further improve the analytical completeness, one may borrow the idea from [40] to relax the formulation (31) somewhat by introducing additional auxiliary variables and an extra quadratic penalty, whereby rigorously proving the convergence of the generated sequence is made possible. This will, however, give rise to degradation of the estimator's performance [40] (i.e., the incompleteness of analytical convergence results can be seen as an acceptable trade-off for higher estimation accuracy). To avoid such performance-impairing approximations to the source localization formulation, one would have to limit the scope of choice of f(·), as elaborated in the following statements.
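The LICQ argument can be checked numerically. The sketch below stacks the gradients of the constraints x − x_i − d_i β_i = 0 and ‖β_i‖² − 1 = 0 w.r.t. y = (x, d, β) and verifies full row rank at a nontrivial configuration; the ordering of the variable blocks is our own reconstruction, since the paper's gradient-matrix display is lost in this copy.

```python
import numpy as np

def licq_jacobian(X, x, d, B):
    """Constraint-gradient matrix at y = (x, d, beta) for
    x - x_i - d_i*beta_i = 0 (H rows each) and ||beta_i||^2 - 1 = 0 (1 row each)."""
    L, H = X.shape
    n = H + L + H * L
    rows = []
    for i in range(L):
        Ji = np.zeros((H, n))
        Ji[:, :H] = np.eye(H)                                          # w.r.t. x
        Ji[:, H + i] = -B[i]                                           # w.r.t. d_i
        Ji[:, H + L + i * H: H + L + (i + 1) * H] = -d[i] * np.eye(H)  # w.r.t. beta_i
        rows.append(Ji)
        gi = np.zeros((1, n))
        gi[0, H + L + i * H: H + L + (i + 1) * H] = 2.0 * B[i]         # unit-norm row
        rows.append(gi)
    return np.vstack(rows)
```

Full row rank fails exactly when some d_i⋆ = 0, i.e., when the source coincides with a sensor, matching the nontriviality argument above.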
THEOREM 2 Suppose that f(r_i − d_i) is convex w.r.t. d_i, ∀i ∈ {1, ..., L}, and that its gradient is Lipschitz continuous. Then, for a sufficiently large ρ, the augmented Lagrangian sequence {L_ρ(x^(k), d^(k), β^(k), λ^(k))}_{k=1,...} is monotonically nonincreasing.

PROOF See Appendix D.

THEOREM 3 Under the assumptions of Theorem 2, the sequence {L_ρ(x^(k), d^(k), β^(k), λ^(k))}_{k=1,...} is bounded from below for a sufficiently large ρ.

PROOF See Appendix E.

COROLLARY 1 The sequence {L_ρ(x^(k), d^(k), β^(k), λ^(k))}_{k=1,...} is convergent under the above assumptions about f(·) and ρ.

PROOF This corollary is established via Theorems 2 and 3.
Equipped with these promising tools, we are finally able to present Theorem 4, which asserts that Algorithm 1 can be theoretically guaranteed to converge under certain reasonable assumptions.

IV. SIMULATION RESULTS
In this section, we carry out computer simulations to evaluate the performance of Algorithm 1.
Comparisons are made between the proposed ADMM and several existing TOA-based location estimators that are also built upon robust statistics. Table II gives a summary of these methods, and we note that the convex approximation techniques are all realized using the MATLAB CVX package [52]. To implement the LPNN [26], we invoke the MATLAB routine ode15s, a variable-step, variable-order solver relying on formulas for numerical differentiation of orders 1 to 5 [53].
An SL system with H = 2 and L = 8 is considered. Unless otherwise mentioned, the locations of the source and sensors are randomly generated inside an origin-centered 20 m × 20 m square area in each of the Monte Carlo (MC) runs. The positioning accuracy metric is the root-mean-square error (RMSE), defined as

RMSE = sqrt( (1/N_MC) Σ_{j=1}^{N_MC} ‖x̂{j} − x{j}‖_2^2 ),

where N_MC denotes the total number of MC runs and is fixed at 3000 here, and x̂{j} represents the estimate of the source position, x{j}, in the jth MC run. User-specified parameters of the ADMM are set as δ = 10^{−5} and ρ = 5 [24]. The tunable parameter σ_W of the Welsch loss is adaptively chosen according to Silverman's heuristic [54], in the same fashion as indicated in [27]. The smoothing parameter ν is set to 0.1 to ensure feasibility of the ode15s solver.
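The RMSE metric just defined can be computed with a minimal helper like the one below.

```python
import numpy as np

def rmse(estimates, truths):
    """sqrt( (1/N_MC) * sum_j ||xhat_j - x_j||_2^2 ) over Monte Carlo runs."""
    e = np.asarray(estimates, dtype=float) - np.asarray(truths, dtype=float)
    return float(np.sqrt(np.mean(np.sum(e ** 2, axis=1))))
```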
Taking into account the presence of outliers, we follow [28], [56] and use the class of α-stable distributions for modeling {e_i} in (1). Stable processes are well known for their suitability in characterizing skewness and heavy-tailedness. Except for a few special instances, members of the stable distribution family do not have an explicit expression for their probability density function (PDF). Instead, the PDF p(z) is implicitly described through the inverse Fourier transform of the characteristic function Φ(t; α, ζ, γ, µ), whose detailed analytical parameterization can be found in [57]. As one may see, there are four parameters defining the family. The stability parameter, 0 < α ≤ 2, controls the tails of the distribution. Generally speaking, the smaller the value of α, the heavier the tails and the more impulsive the random variable being modeled. µ determines the location. The skewness parameter, −1 ≤ ζ ≤ 1, is a measure of asymmetry. In the simplest case where ζ = 0, the distribution becomes symmetric about its mean (which is µ when α > 1) and degenerates into the so-called symmetric α-stable (SαS) distribution. By contrast, the distribution is said to be right-skewed (resp. left-skewed) for ζ > 0 (resp. ζ < 0). γ > 0 is the scale parameter, measuring to what extent the distribution spreads out (similar to the variance of the normal distribution).
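One way to draw such α-stable samples is SciPy's levy_stable distribution (note that SciPy names the skewness parameter beta, which corresponds to ζ here); the particular parameter values below are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import levy_stable

# alpha: tail heaviness, zeta: skewness, gamma: scale, mu: location
alpha, zeta, gamma, mu = 1.5, 0.0, 0.3, 0.0
e = levy_stable.rvs(alpha, zeta, loc=mu, scale=gamma, size=1000, random_state=7)

# a heavier-tailed (smaller alpha) draw for comparison
e_heavy = levy_stable.rvs(1.1, 0.0, loc=0.0, scale=0.3, size=1000, random_state=7)
```

Such draws can directly replace the Gaussian {e_i} in (1) to reproduce impulsive-noise experiments of the kind reported in this section.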
For illustration purposes, Fig. 1 plots the PDFs for several representative choices of α and ζ. In our case, the stable-distributed range measurement noise {e_i} is assumed to be i.i.d.
Because the variance of the stable distribution is undefined for α < 2, we introduce the concept of the generalized signal-to-noise ratio (GSNR) from [28] to quantify the relative noise level. Furthermore, the square root of the trace of the MC-approximated Cramér-Rao lower bound matrix (termed RCRLB) [58] is included in the comparison results to offer a benchmark for the accuracy of different robust position estimators in i.i.d. non-Gaussian noise. As it is in principle difficult to work out a general schedule for adjusting the Huber radius under stably modeled non-Gaussianity, we simply follow [55] and assign a fixed value of 1 to R_i.
To begin with, Figs. 2 and 3 show the convergence behavior of ℓp-ADMM (p taking different values from 1 to 2) and Huber-ADMM in a single MC run for α = 1.5 and GSNR = 20 dB. We should also note that, for reproducibility, in this test the L = 8 sensors are evenly placed on the perimeter of the afore-defined square region and the source is deterministically deployed at x = [2, 3]^T m. Both ℓp-ADMM and Huber-ADMM are observed to rapidly decrease the objective function and converge to a point close to the true source position within the first few tens of iterations.
In our following simulation studies, the value of p required by ℓp-ADMM and ℓp-IRLS is set to the optimal one hinging on α [60], in a way analogous to [28], [56]. The two proposed ADMM approaches deliver the lowest level of positioning errors for all GSNRs.
In particular, the RMSE performance of ℓp-ADMM is the closest to the RCRLB benchmark (with a small gap of about half a meter). Among the six competitors, ℓp-IRLS produces the best results. This reconfirms the findings reported in [28], namely that ℓp-IRLS can be fairly statistically efficient in impulsive noise if p is aptly chosen. The direct nonlinear optimization based ℓ1-minimization techniques, ℓ1-DCP and ℓ1-LPNN, are slightly inferior to ℓp-IRLS due to model mismatch but still capable of outperforming the remaining schemes, whose estimation performance looks relatively poorer. ℓ1-LPNN is no longer comparable to ℓ1-DCP as the GSNR decreases from 25 dB, which is attributed to the smoothed approximations introduced in (7). Nonetheless, the former does not have the tightness issues from which the traditional convex relaxation methods ℓ1-SOCP and Huber-CUE suffer, nor is it quite as statistically inefficient as Welsch-HQ.
Setting the GSNR to 20 dB and the impulsiveness-controlling parameter as α = 1.5, we plot the RMSE versus the number of sensors used for localization, L (up to 9), in Fig. 5. Again, the two ADMM solutions have the best performance. Although none of the estimators can attain the RCRLB, our algorithms are still the closest ones to it, particularly in the scenarios with a large number of sensors. Fig. 6 further plots the RMSE as a function of α ∈ [1.1, 1.9] for L = 8 and GSNR = 20 dB, which is once more a case where ℓp-ADMM is demonstrated to be superior to the rest of the approaches.
In Table III, we list the per-sample average CPU time of the eight location estimators in the above simulations.It is seen that the running time can vary by several orders of magnitude.
Huber-ADMM (admitting a closed-form proximal mapping of f(·)) and Welsch-HQ are the second and third fastest, respectively, taking only slightly more time than the state-of-the-art method ℓp-IRLS. Because of the extra bisection procedure incorporated into the ADMM algorithm, ℓp-ADMM may not be as computationally efficient as Huber-ADMM in actual CPU time comparisons. Nonetheless, its problem-solving process still takes less time than those of the classical convex approximation techniques. In general, the simulation results agree with our complexity analysis in Section III.

V. CONCLUSION
This article focused on the problem of outlier-resistant TOA SL. Based on the ADMM, we presented an iterative algorithm to handle the statistically robustified positioning formulation.
Each iteration of our ADMM consists of two primal variable minimization steps and a dual variable update, all of which can be effortlessly implemented provided that the loss function relied on allows convenient proximal computations. What is more, we proved that the ADMM will converge to a KKT point of the nonconvex constrained optimization problem under certain conditions, and verified that the LICQ holds at that point for nontrivial SL configurations. The superiority of the devised scheme over a number of robust statistics type TOA SL methods, in terms of positioning accuracy in impulsive noise and computational simplicity, was demonstrated through simulations.

APPENDIX B PROOF OF PROPOSITION 2
Let (x*, d*, β*) be the point corresponding to the globally optimal solution to (31). To prove the proposition, it suffices to show that d*_i ≥ 0 holds ∀i ∈ {1, ..., L}, which follows from the assumptions made and the reverse triangle inequality, in a manner analogous to the proof of Proposition 1.

APPENDIX C PROOF OF THEOREM 1
Since (22) renders the solution to the β-subproblem in each of the ADMM iterations certainly feasible, the condition (33e) is satisfied. On the other hand, based on the dual variable update schedule (16c) and our assumption about lim_{k→∞} λ^(k) in (32), it is straightforward to deduce that the condition (33d) is satisfied as well. We now proceed to check the remaining conditions (33a)-(33c). As x^(k+1) and (d^(k+1), β^(k+1)) are the minimizers of the subproblems (16a) and (16b), respectively, the corresponding first-order optimality relations (42a)-(42c) hold. Putting (32), (42a), and the verified condition (33d) together, we obtain (33a). Analogously, it follows from (32), (33d), and (42b) that (33b) holds, and from (32), (33d), and (42c) that (33c) holds. The proof is complete.

APPENDIX D PROOF OF THEOREM 2
In order to study the monotonicity properties of the sequence {L_ρ(x^(k), d^(k), β^(k), λ^(k))}, we decompose the difference in L_ρ between two successive ADMM iterations into the four terms (47a)-(47d) and analyze their ranges one by one.
For (47a) and (47b), it follows from the x- and β-update schedules of the proposed ADMM that the bounds (48) and (49) hold by definition.
On the other hand, additional assumptions about f(·) are required to facilitate the analysis of (47c) and (47d). As long as we assume the convexity of f(r_i − d_i) w.r.t. d_i, ∀i ∈ {1, ..., L}, it is easily verified that the objective of the d-subproblem is strongly convex w.r.t. d (with parameter M_2 > 0), by examining the positive semidefiniteness of the difference between its Hessian and M_2 · I_L. Applying the equivalent inequality that characterizes strong convexity [46] to the points d^(k+1) and d^(k), we obtain a bound for (47c), which subsequently leads to (51). The remaining difference of two evaluated augmented Lagrangians, (47d), can be re-expressed as (53) based on (16c). Since our d_i-minimization schedule implies taking advantage of the convexity of f(r_i − d_i) w.r.t. d_i, ∀i ∈ {1, ..., L}, once again, we derive (55) and (56). Utilizing the fact that β_i^(k) and β_i^(k+1) are both unit vectors, together with our convexity as well as Lipschitz continuity assumptions about f(r_i − d_i), we arrive at (57) after putting (55) and (56) into (53).

APPENDIX E PROOF OF THEOREM 3
Combining the convexity of f(r_i − d_i) w.r.t. d_i with the M_1-Lipschitz continuity of its gradient gives the inequality (59) [51, Lemma 4]. Having in mind that β_i^(k) is a unit column vector, we construct (60) from (59) by letting η_1 and η_2 be (β_i^(k))^T (x^(k) − x_i) and d_i^(k), respectively. Plugging (56) into (60) further yields (61). Based on (61) and the Cauchy-Schwarz inequality, we obtain (62). Under the symmetry and monotonicity assumptions made earlier about f(·) in Proposition 1, it is not hard to see that the first part of the expression enclosed within square brackets in (62c) is bounded from below. The same can also be said of the second part for ρ ≥ M_1 (in fact, ρ should satisfy ρ ≥ max(M_1, 2M_1^2/M_2) so that Propositions 1 and 2 remain valid), trivially, as it is lower bounded by 0. This completes the proof.

APPENDIX F PROOF OF THEOREM 4
First of all, it follows from (58) and Corollary 1 that (63) holds as well. Akin to (50) and (51), the strong convexity characterizing inequality (with a positive strong convexity parameter) associated with the points x = x^(k+1) and x = x^(k) gives (66), which implies (67) by invoking Corollary 1 once more. Likewise, we can show (68) for some parameter M_4 > 0, following a procedure similar to (65)-(67). The proof is complete.

THEOREM 4 The sequence (x^(k), d^(k), β^(k), λ^(k)), k = 1, ..., is convergent, namely, (32) holds, under the same circumstances as in Corollary 1.

PROOF See Appendix F.

COROLLARY 2 Let us adopt the setting of Corollary 1. The solution obtained by Algorithm 1 corresponds to a KKT point of the nonconvex constrained optimization problem (31).

PROOF This corollary is established via Theorems 1 and 4.

Figs. 2 and 3. Convergence behavior of ℓp-ADMM (p taking different values from 1 to 2) and Huber-ADMM in a single MC run for α = 1.5 and GSNR = 20 dB.
APPENDIX A PROOF OF PROPOSITION 1
Denote the globally optimal solution to (25) by d*. Straightforwardly, the proposition will hold if we are able to show that d*_i ≥ 0 holds ∀i ∈ {1, ..., L}. Based on the assumptions made and the reverse triangle inequality ||a| − |b|| ≤ |a − b|, we obtain (40), which means that the objective function value associated with the global optimum d = d* is greater than or equal to its counterpart associated with d = abs(d*), where abs(·) stands for the element-wise absolute value function. Namely, d* would no longer be the globally optimal solution if the inequality in (40c) held strictly. It then follows that the ≥ in (40c) must degrade into =, which holds if and only if d*_i = |d*_i|, ∀i ∈ {1, ..., L}, since the sum of two strictly monotonic functions of the same kind of monotonicity is still a monotonic function. Therefore, d*_i ≥ 0 is tacitly satisfied ∀i ∈ {1, ..., L}. The proof is complete.

TABLE I PROXIMAL COMPUTATIONS FOR ℓp (1 ≤ p ≤ 2) AND HUBER LOSS FUNCTIONS

TABLE III AVERAGE CPU TIME FOR SIMULATIONS CONDUCTED ON A LAPTOP WITH 16 GB MEMORY AND 4.7 GHZ CPU