
Nonconvex nonsmooth optimization via convex–nonconvex majorization–minimization


Abstract

The class of majorization–minimization algorithms is based on the principle of successively minimizing upper bounds of the objective function. Each upper bound, or surrogate function, is locally tight at the current estimate, and each minimization step decreases the value of the objective function. We present a majorization–minimization approach based on a novel convex–nonconvex upper bounding strategy for the solution of a certain class of nonconvex nonsmooth optimization problems. We propose an efficient algorithm for minimizing the (convex) surrogate function based on the alternating direction method of multipliers. A preliminary convergence analysis for the proposed approach is provided. Numerical experiments show the effectiveness of the proposed method for the solution of nonconvex nonsmooth minimization problems.



Acknowledgments

We would like to thank the referees for comments that led to improvements in the presentation. Research by IS was supported by the NSF (USA) under Grant No. CCF-1525398. Research by SM and FS was supported in part by the National Group for Scientific Computation (GNCS-INDAM), Research Projects 2015.

Author information

Correspondence to S. Morigi.

Appendix

Proof of Proposition 3.3

By applying formula (4) in Lemma 2, namely \(\phi (t;a) = u(t;a) + |t|\), and carrying out simple algebraic manipulations, the functional \(\mathcal {J}(x)\) in (1) can be equivalently rewritten as follows:

$$\begin{aligned} \mathcal {J}(x) \,\;{=}\;\, \underbrace{\frac{1}{2} \, \Vert A x \Vert _2^2 + \, \sum _{i=1}^{s} \mu _i u\left( (L x)_i;a_i \right) }_{\mathcal {J}_1(x)} \,\;{+}\;\, \underbrace{\frac{1}{2} \, \Vert b \Vert _2^2 \,{-}\; b^{T} \!A x \,\;{+}\;\, \sum _{i=1}^{s} \mu _i | (L x)_i |}_{\mathcal {J}_2(x)}. \end{aligned}$$
(89)
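The split in (89) rests on the elementary expansion of the quadratic fidelity term, assuming, as is implicit in (89), that the data term of (1) is \(\frac{1}{2} \Vert A x - b \Vert _2^2\):

$$\begin{aligned} \frac{1}{2} \, \Vert A x - b \Vert _2^2 \,\;{=}\;\, \frac{1}{2} \, \Vert A x \Vert _2^2 \;{-}\; b^{T} \!A x \;{+}\; \frac{1}{2} \, \Vert b \Vert _2^2, \qquad \mu _i \, \phi \big ( (L x)_i;a_i \big ) \,\;{=}\;\, \mu _i \, u\big ( (L x)_i;a_i \big ) \;{+}\; \mu _i \, \big | (L x)_i \big |, \end{aligned}$$

with the smooth terms collected in \(\mathcal {J}_1(x)\) and the remaining convex terms in \(\mathcal {J}_2(x)\).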

Since the functional \(\mathcal {J}_2(x)\) in (89) is convex, convexity of \(\mathcal {J}(x)\) follows from convexity of \(\mathcal {J}_1(x)\), which is twice continuously differentiable due to statement 1) of Lemma 2. Hence, a sufficient condition for \(\mathcal {J}(x)\) to be strictly convex is that the Hessian matrix H(x) of the functional \(\mathcal {J}_1(x)\) in (89) is positive definite for all \(x \in {\mathbb R}^n\), that is:

$$\begin{aligned} H(x) \,\;{=}\;\, \underbrace{A^{T} A}_{H_A} \,\;{-}\;\, \underbrace{L^{T} \, \Gamma (x) \, L}_{H_L(x)} \,\;{\succ }\;\, 0 \quad \forall x \in {\mathbb R}^n, \end{aligned}$$
(90)

where \(\Gamma (x)\) is the \(s \times s\) diagonal matrix depending on x defined as:

$$\begin{aligned} \Gamma (x) \;{=}\; \mathrm {diag}\big ( \gamma _1(x),\ldots ,\gamma _s(x) \big ) \; , \quad \; \gamma _i(x) \;{=}\;\, -\mu _i \, u''\!\left( (L x)_i;a_i\right) . \end{aligned}$$
(91)

Since \(\mu _i>0\) by assumption, from statement 2) of Lemma 2 it follows that:

$$\begin{aligned} \gamma _i(x) \;{\in }\;\, [\,0, \, \mu _i a_i\,] \quad \;\, \forall \, x \,{\in }\, {\mathbb R}^n, \;\;\, \forall \, i \,{\in }\, \{1,\ldots ,s\} . \end{aligned}$$
(92)

Hence, the two \(n \times n\) matrices \(H_A\) and \(H_L(x)\) in (90) are both at least positive semi-definite, if not positive definite, for any \(x \in {\mathbb R}^n\). We notice that if the matrix \(H_A\) is only positive semi-definite, that is \(\ker \{A^{T}A\} \ne \{0\}\), then the Hessian matrix H(x) in (90) cannot be positive definite for all \(x \in {\mathbb R}^n\): indeed, since \(H_L(x) \succeq 0\), every \(v \in \ker \{A^{T}A\} {\setminus } \{0\}\) satisfies \(v^{T} H(x) \, v \le 0\). This justifies condition (5) for strict convexity of \(\mathcal {J}(x)\).

To further investigate positive-definiteness of matrix H(x) in (90), we introduce the singular value decomposition (SVD) of matrices \(A \in {\mathbb R}^{m \times n}\) and \(L \in {\mathbb R}^{s \times n}\):

$$\begin{aligned} \begin{array}{llll} A \,\;{=}\;\, U_A \, \Sigma _A V_A^{T}, &{} U_A \in {\mathbb R}^{m \times n}, &{} \!\!\!\Sigma _A \in {\mathbb R}^{n \times n}, &{} \!\!\!V_A \in {\mathbb R}^{n \times n}, \\ L \,\;{=}\;\, U_L \,\!\, \Sigma _L \, V_L^{T}, &{} U_L \,\! \in {\mathbb R}^{s \times p}, &{} \!\!\!\Sigma _L \,\! \in {\mathbb R}^{p \times p}, &{} \!\!\!V_L \,\! \in {\mathbb R}^{n \times p}, \;\;\, p \;{:=}\, \min \{s,n\}, \end{array} \end{aligned}$$
(93)

where in the SVD of the matrix \(A \in {\mathbb R}^{m \times n}\) we are implicitly assuming that \(m \,{\ge }\, n\), since otherwise condition (5) cannot be satisfied. We recall that \(\Sigma _A\) and \(\Sigma _L\) in (93) are diagonal matrices containing the singular values of matrices A and L, respectively, while \(U_A,V_A\) and \(U_L,V_L\) are matrices with orthonormal columns containing the left and right singular vectors of matrices A and L, respectively, and are such that \(U_A^{T} U_A = V_A^{T} V_A = V_A V_A^{T} = I_n\) and \(U_L^{T} U_L = V_L^{T} V_L = I_p\).

By substituting (93) into the positive-definiteness condition (90), we obtain:

$$\begin{aligned} H(x) \,\;{=}\;\, \underbrace{V_A \Sigma _A^2 V_A^{T}}_{H_A} \,\;{-}\;\, \underbrace{V_L \Sigma _L U_L^{T} \, \Gamma (x) \, U_L \Sigma _L V_L^{T}}_{H_L(x)} \,\;{\succ }\;\, 0 \quad \forall x \in {\mathbb R}^n. \end{aligned}$$
(94)

In order to obtain sufficient conditions for (94) to be satisfied, we introduce a lower bound (in the sense of positive definiteness) \(\underline{H}_{A}\) for \(H_A\):

$$\begin{aligned} {\,\,\underline{H}}_{\!A} \,\;{:=}\;\, V_A \, {\,\underline{\Sigma }}_A^2 \, V_A^{T} \,\;{=}\;\, V_A \, \sigma _{A,\mathrm {min}}^2 I_n \, V_A^{T} \,\;{=}\;\, \sigma _{A,\mathrm {min}}^2 I_n \,\;\;{\preceq }\;\;\, H_A, \end{aligned}$$
(95)

and an upper bound \({\bar{H}}_{\!L}\) for \(H_L(x)\):

$$\begin{aligned} {\bar{H}}_{\!L} \,\;{:=}\;\, \sigma _{L,\mathrm {max}}^2 \; V_L \, U_L^{T} \, \bar{\Gamma } \, U_L \, V_L^{T} \,\;\;{\succeq }\;\;\, H_L(x) \;\;\, \forall \, x \,{\in }\, {\mathbb R}^n, \quad \;\; \bar{\Gamma } \,\;{:=}\;\, \mathrm {diag}\big (\mu _1 a_1,\ldots ,\mu _s a_s\big ), \end{aligned}$$
(96)

where \(\sigma _{A,\mathrm {min}}\) and \(\sigma _{L,\mathrm {max}}\) denote the minimum and maximum among the singular values of matrices A and L, respectively, and where the upper bound \(\bar{\Gamma }\) comes from properties (92) of the diagonal matrix \(\Gamma (x)\) defined in (91). By substituting the lower bound \(\underline{H}_{A}\) in (95) for \(H_A\) and the upper bound \(\bar{H}_{L}\) in (96) for \(H_L(x)\) into the definition of matrix H(x) in (94), and introducing the matrix

$$\begin{aligned} X \,{:=}\, U_L V_L^{T} \in {\mathbb R}^{s \times n}, \end{aligned}$$
(97)

we obtain a lower bound \(\,\,\underline{H}\) for H(x):

$$\begin{aligned} {\,\,\underline{H}} \,\;{:=}\;\, {\,\,\underline{H}}_{\!A} \,\;{-}\;\, {\bar{H}}_{\!L} \,\;{=}\;\, \sigma _{A,\mathrm {min}}^2 \, I_n \,\;{-}\;\, \sigma _{L,\mathrm {max}}^2 \; X^{T} \, \bar{\Gamma } \, X \,\;\;{\preceq }\;\;\, H(x) \quad \forall x \in {\mathbb R}^n. \end{aligned}$$
(98)

We notice that, since \(U_L\) and \(V_L\) are orthogonal matrices, the matrix \(X \in {\mathbb R}^{s \times n}\) defined in (97) has full (column and/or row) rank, that is \(\mathrm {rank}\{X\} = \min \{s,n\} = p\). To conclude the proof, we consider separately the two cases \(s \ge n\) (square or tall matrix) and \(s < n\) (wide matrix) for the linear operator \(L \in {\mathbb R}^{s \times n}\).

Case \({s \ge n}\) In this case, since \(p \;{:=}\, \min \{s,n\} \,{=}\; n\), the SVD in (93) of the square or tall matrix \(L \in {\mathbb R}^{s \times n}\) reads as follows: \(L \,\;{=}\;\, U_L \,\!\, \Sigma _L \,\!\, V_L^{T}\), \(U_L \,{\in }\;\, {\mathbb R}^{s \times n}\), \(\Sigma _L \,{\in }\;\, {\mathbb R}^{n \times n}\), \(V_L \,{\in }\;\, {\mathbb R}^{n \times n}\), where the orthogonal matrices \(U_L\) and \(V_L\) are such that \(U_L^{T} U_L \;{=}\; V_L^{T} V_L \;{=}\; V_L V_L^{T} \;{=}\; I_n, \) and the matrix \(X \in {\mathbb R}^{s \times n}\) defined in (97) satisfies:

$$\begin{aligned} X^{T} X \;{=}\; V_L U_L^{T} U_L V_L^{T} \;{=}\; I_n, \end{aligned}$$
(99)

that is the n (\(\le \)s) columns of X are s-dimensional orthonormal vectors. By substituting the expression (99) for \(I_n\) in (98), the lower bound matrix \(\,\,\underline{H}\) can be equivalently rewritten as follows:

$$\begin{aligned} {\,\,\underline{H}}= & {} \sigma _{A,\mathrm {min}}^2 \, X^{T} X \,\;{-}\;\; \sigma _{L,\mathrm {max}}^2 \, X^{T} \! \mathrm {diag}(\mu _1 a_1, \ldots , \mu _s a_s) \, X \nonumber \\= & {} X^{T} \mathrm {diag}(\sigma _{A,\mathrm {min}}^2 {-}\; \sigma _{L,\mathrm {max}}^2 \, \mu _1 a_1,\ldots , \sigma _{A,\mathrm {min}}^2 {-}\; \sigma _{L,\mathrm {max}}^2 \, \mu _s a_s) X. \end{aligned}$$
(100)

Recalling that \(\,\,\underline{H}\) in (100) is a lower bound of H(x) in (90) for any \(x \in {\mathbb R}^n\) and that the matrix \(X \in {\mathbb R}^{s \times n}\) in (100) has full column rank, it follows from Lemma 1 that:

$$\begin{aligned} H(x) \,\;{\succeq }\; {\,\,\underline{H}} \,\;{\succ }\;\, 0 \;\; \forall x \in {\mathbb R}^n \quad \; \mathrm {if} \quad \; \sigma _{A,\mathrm {min}}^2 \,\!\,\!{-}\;\, \sigma _{L,\mathrm {max}}^2 \, \mu _i a_i \;{>}\;\, 0 \;\;\, \forall \, i \,{\in }\, \{1,\ldots ,s\} \,, \end{aligned}$$
(101)

thus proving the second condition for strict convexity of \(\mathcal {J}(x)\) in (6).

Case \({s < n}\) In this case, \(p \;{:=}\, \min \{s,n\} \,{=}\; s\) and the SVD in (93) of the wide matrix \(L \in {\mathbb R}^{s \times n}\) is \(L \,\;{=}\;\, U_L \,\!\, \Sigma _L \,\!\, V_L^T\), \(U_L \,{\in }\;\, {\mathbb R}^{s \times s}\), \(\Sigma _L \,{\in }\;\, {\mathbb R}^{s \times s}\), \(V_L \,{\in }\;\, {\mathbb R}^{n \times s}\), where the orthogonal matrices \(U_L\) and \(V_L\) are such that \(U_L^T U_L \;{=}\; U_L U_L^T \;{=}\; V_L^T V_L \;{=}\; I_s\), and the matrix \(X \in {\mathbb R}^{s \times n}\) defined in (97) satisfies: \(X X^T \;{=}\; U_L V_L^T V_L U_L^T \;{=}\; I_s\), that is the s (\(< n\)) rows of X are n-dimensional orthonormal vectors. Hence, it is always possible to build a square orthogonal matrix \(\widetilde{X} \in {\mathbb R}^{n \times n}\) defined as follows:

$$\begin{aligned} \widetilde{X} \;{:=}\; \big (\begin{array}{cccc} X^{T}&\tilde{v}_1&\ldots&\tilde{v}_{n-s} \end{array} \big )^T, \quad \tilde{v}_i \in {\mathbb R}^n \;\;\mathrm {such}\;\mathrm {that}\;\; \widetilde{X} \widetilde{X}^T = \widetilde{X}^T \! \widetilde{X} = I_n. \end{aligned}$$
(102)
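To make the completion in (102) concrete, the following minimal sketch (assuming NumPy; the sizes and the wide matrix L are arbitrary placeholders, not taken from the paper) builds \(X = U_L V_L^T\) from an economy SVD and appends an orthonormal basis of the orthogonal complement of its row space:

import numpy as np

rng = np.random.default_rng(0)
s, n = 3, 5                            # wide operator: s < n
L = rng.standard_normal((s, n))        # placeholder regularization operator

# Economy SVD of L as in (93): U_L (s x s), Sigma_L (s x s), V_L (n x s).
U_L, sigma_L, VLt = np.linalg.svd(L, full_matrices=False)
V_L = VLt.T

X = U_L @ V_L.T                        # X in (97); its s rows are orthonormal

# The last n - s right singular vectors of X span the orthogonal complement of
# the row space of X, so stacking them below X yields the orthogonal matrix (102).
complement = np.linalg.svd(X)[2][s:, :]          # (n - s) x n, orthonormal rows
X_tilde = np.vstack([X, complement])             # n x n

print(np.allclose(X @ X.T, np.eye(s)))                 # X X^T = I_s
print(np.allclose(X_tilde.T @ X_tilde, np.eye(n)))     # X~^T X~ = I_n
print(np.allclose(X_tilde @ X_tilde.T, np.eye(n)))     # X~ X~^T = I_n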

Based on (102), the lower bound matrix \(\,\,\underline{H}\) defined in (98) is rewritten as follows:

$$\begin{aligned} {\,\,\underline{H}}= & {} \sigma _{A,\mathrm {min}}^2 \widetilde{X}^T \! \widetilde{X} \;{-}\;\, \sigma _{L,\mathrm {max}}^2 \, \widetilde{X}^T \mathrm {diag}(\mu _1 a_1,\ldots ,\mu _s a_s, \underbrace{0,\ldots ,0}_{n-s\;\,entries}) \, \widetilde{X} \nonumber \\= & {} \widetilde{X}^T \mathrm {diag}\! \bigg (\sigma _{A,\mathrm {min}}^2 {-}\; \sigma _{L,\mathrm {max}}^2 \, \mu _1 a_1,\;\ldots ,\, \sigma _{A,\mathrm {min}}^2 {-}\; \sigma _{L,\mathrm {max}}^2 \, \mu _s a_s, \nonumber \\&\times \underbrace{\sigma _{A,\mathrm {min}}^2,\;\ldots \;,\,\sigma _{A,\mathrm {min}}^2}_{n-s\;\,entries}\bigg ) \, \widetilde{X}. \end{aligned}$$
(103)

Since \(\,\,\underline{H}\) in (103) is a lower bound of H(x) in (90) for any \(x \in {\mathbb R}^n\) and \(\widetilde{X} \in {\mathbb R}^{n \times n}\) in (103) has full (column and row) rank, from Lemma 1 it follows that:

$$\begin{aligned} H(x) \,\;{\succeq }\; {\,\,\underline{H}} \,\;{\succ }\;\, 0 \;\; \forall x \in {\mathbb R}^n \;\;\, \mathrm {if} \;\;\, \left\{ \! \begin{array}{l} \displaystyle {\sigma _{A,\mathrm {min}}^2 \,\!\,\!{-}\;\, \sigma _{L,\mathrm {max}}^2 \, \mu _i a_i \;{>}\;\, 0 \;\;\, \forall \, i \,{\in }\, \{1,\ldots ,s\}} \\ \displaystyle {\sigma _{A,\mathrm {min}}^2 \;{>}\;\, 0} \end{array} \right. \!\!\!. \end{aligned}$$
(104)

Since \(\ker \{A^TA\} = \{0\}\), the second condition \(\sigma _{A,\mathrm {min}}^2 \;{>}\;\, 0\) in (104) is always satisfied, while the first condition is equivalent to the convexity condition in (6). \(\square \)
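As a numerical illustration of the sufficient condition just derived, the minimal sketch below (assuming NumPy; A, L and the products \(\mu _i a_i\) are arbitrary placeholders) checks condition (6) and then inspects the worst-case Hessian obtained by replacing \(\Gamma (x)\) in (90) with its upper bound from (92):

import numpy as np

rng = np.random.default_rng(1)
m, n, s = 20, 8, 12
A = rng.standard_normal((m, n))        # m >= n, so generically ker(A^T A) = {0}
L = rng.standard_normal((s, n))

sigma_A_min = np.linalg.svd(A, compute_uv=False).min()
sigma_L_max = np.linalg.svd(L, compute_uv=False).max()

# Pick mu_i * a_i so that condition (6) holds with some margin.
mu_a = np.full(s, 0.5 * sigma_A_min**2 / sigma_L_max**2)
print("condition (6):", np.all(sigma_A_min**2 - sigma_L_max**2 * mu_a > 0))

# Worst case of the Hessian (90): Gamma(x) replaced by diag(mu_i a_i), see (92).
# If condition (6) holds, its smallest eigenvalue must be positive.
H_worst = A.T @ A - L.T @ np.diag(mu_a) @ L
print("lambda_min(H_worst) =", np.linalg.eigvalsh(H_worst).min())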

Proof of Proposition 3

By substituting v for t in (12) and using the expression for \(c_m\) given in (13), we obtain (14).

Recalling the definition of \(w_m\) in (13), the first-order partial derivative \(m_t\) of the majorant function m in (12) with respect to t, for \(t,v \in {\mathbb R}{\setminus }\{0\}\), is given by:

$$\begin{aligned} m_t(t,v;a_m) \,\;{=}\;\, \frac{\phi '(v;a)}{\phi '(v;a_m)} \;\, \phi '(t;a_m). \end{aligned}$$
(105)

Substituting v for t in (105) we have (15). In the case \(v=0\), the majorant function in (12) reduces to \(m(t,0;a_m) \; {=} \; \phi (t;a_m)\). Since both the majorized and the majorant functions belong to the family of penalty functions \(\phi \) defined in Sect. 2, it follows from assumption A6) that \(m_t(0^{\pm },0;a_m) \; = \; \phi '(0^{\pm };a) \; = \; \pm 1\), hence (16).

Since both the majorized function \(\phi (t;a)\) and the majorizing function \(m(t,v;a_m)\) are continuous and even in t for any \(v \in {\mathbb R}\) due to assumptions A1) and A2), it is sufficient to prove (17) for \(v > 0\) and \(t > 0\). Noting that \(\phi (t;a)\) and \(m(t,v;a_m)\) are both continuously differentiable in t for \(t > 0\) thanks to assumption A3), we have:

$$\begin{aligned} m(t,v;a_m) \,\;= & {} \;\, m(v,v;a_m) \;{+} \int _v^t \! m_t(\xi ,v;a_m) \, d\xi \,, \end{aligned}$$
(106)
$$\begin{aligned} \phi (t;a) \,\;= & {} \,\; \phi (v;a) \;{+} \int _v^t \! \phi '(\xi ;a) \, d\xi \,, \end{aligned}$$
(107)

Hence, by subtracting (107) from (106) and recalling (14), we obtain:

$$\begin{aligned} m(t,v;a_m) - \phi (t;a) \,\;{=}\, \int _v^t \! \big ( m_t(\xi ,v;a_m) - \phi '(\xi ;a) \big ) \, d\xi . \end{aligned}$$
(108)

Given the definition of \(m_t\) in (105), we can thus write:

$$\begin{aligned} m(t,v;a_m) - \phi (t;a)= & {} \int _v^t \! \left( \frac{\phi '(v;a)}{\phi '(v;a_m)} \, \phi '(\xi ;a_m) \;{-}\; \phi '(\xi ;a) \right) d\xi \nonumber \\= & {} \int _v^t \! \left( \frac{\phi '(v;a)}{\phi '(v;a_m)} \;{-}\; \frac{\phi '(\xi ;a)}{\phi '(\xi ;a_m)} \right) \phi '(\xi ;a_m) \,\, d\xi \nonumber \\= & {} \int _v^t \! \big ( h(v) \;{-}\; h(\xi ) \big ) \, \phi '(\xi ;a_m) \, d\xi \nonumber \\= & {} \int _v^t \! h'(\vartheta ) (v - \xi ) \, \phi '(\xi ;a_m) \,\, d\xi \,, \end{aligned}$$
(109)

where in the third equality we introduced the function \(h: {\mathbb R}_+^* \rightarrow {\mathbb R}_+^*\) defined as:

$$\begin{aligned} h(z) \,\;{=}\;\, \frac{\phi '(z;a)}{\phi '(z;a_m)} \,, \quad z > 0 \, , \end{aligned}$$
(110)

and in the last equality in (109), which holds for some \(\vartheta \) between v and \(\xi \), we applied the mean value theorem to h, namely \(h(v) - h(\xi ) \,{=}\, h'(\vartheta ) \, (v - \xi )\). The first-order derivative of the function h in (110) is guaranteed to exist for any \(z > 0\) due to assumption A3) and is given by:

$$\begin{aligned} h'(z)= & {} \frac{\phi ''(z;a)\phi '(z;a_m)-\phi '(z;a)\phi ''(z;a_m)}{\left( \phi '(z;a_m)\right) ^2} \nonumber \\= & {} \frac{\phi '(z;a)}{\phi '(z;a_m)} \; \left( \frac{\phi ''(z;a)}{\phi '(z;a)} - \frac{\phi ''(z;a_m)}{\phi '(z;a_m)} \right) \nonumber \\\le & {} 0 \quad \forall \, z > 0, \; 0< a_m \!< a, \end{aligned}$$
(111)

where the last inequality (111) follows from assumption A4) and assumption A7) with \(a_1 = a_m\), \(a_2 = a\).
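As a concrete instance (using the logarithmic penalty \(\phi (t;a) = \frac{1}{a}\log (1 + a|t|)\), which is compatible with assumptions A3), A4) and A7), although the proof does not rely on this particular choice), one has, for \(z > 0\),

$$\begin{aligned} h(z) \,\;{=}\;\, \frac{\phi '(z;a)}{\phi '(z;a_m)} \,\;{=}\;\, \frac{1 + a_m z}{1 + a z} \, , \qquad h'(z) \,\;{=}\;\, \frac{a_m - a}{(1 + a z)^2} \,\;{<}\;\, 0 \quad \mathrm {for} \;\; 0< a_m \! < a \, , \end{aligned}$$

in agreement with (111).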

We finally rewrite (109), distinguishing the cases according to the relative order of the integration limits:

$$\begin{aligned} m(t,v;a_m) - \phi (t;a) \,\;{=}\;\, \left\{ \begin{array}{rl} \displaystyle {\int _v^t (\xi - v) \, (-h'(\vartheta )) \, \phi '(\xi ;a_m) \,\, d\xi } &{} \;\; \mathrm {if} \;\; t > v \\ 0 &{} \;\; \mathrm {if} \;\; t = v \\ \displaystyle {\int _t^v (v - \xi ) \, (-h'(\vartheta )) \, \phi '(\xi ;a_m) \,\, d\xi } &{} \;\; \mathrm {if} \;\; t < v \end{array} \right. .\qquad \end{aligned}$$
(112)

Recalling (111) and assumption A4), we conclude that the two integrands in (112) are nonnegative for any \(\xi \) in the associated integration domain, for any \(t, v > 0\) and any \(0< a_m \! < a\); hence (17) holds. \(\square \)
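The following minimal numerical sketch (assuming NumPy and, again, the logarithmic penalty as an illustrative choice; the reconstruction of \(m(t,v;a_m)\) below uses only the tangency relation (14) and the derivative formula (105)) checks the majorization property (17) on a grid of points:

import numpy as np

def phi(t, a):
    # Logarithmic penalty, one penalty compatible with the assumed properties.
    return np.log(1.0 + a * np.abs(t)) / a

def dphi(t, a):
    # Derivative of phi for t != 0.
    return np.sign(t) / (1.0 + a * np.abs(t))

def majorant(t, v, a, a_m):
    # m(t, v; a_m) reconstructed from m(v, v; a_m) = phi(v; a), see (14), and
    # m_t(t, v; a_m) = [phi'(v; a) / phi'(v; a_m)] phi'(t; a_m), see (105).
    if v == 0.0:
        return phi(t, a_m)                       # m(t, 0; a_m) = phi(t; a_m)
    c = dphi(v, a) / dphi(v, a_m)
    return phi(v, a) + c * (phi(t, a_m) - phi(v, a_m))

a, a_m, v = 2.0, 0.5, 1.3                        # 0 < a_m < a, as required
ts = np.linspace(-4.0, 4.0, 2001)
gap = np.array([majorant(t, v, a, a_m) for t in ts]) - phi(ts, a)

print("majorization (17) holds:", bool(np.all(gap >= -1e-12)))
print("tangency (14) at t = v :", abs(majorant(v, v, a, a_m) - phi(v, a)) < 1e-12)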


Cite this article

Lanza, A., Morigi, S., Selesnick, I. et al. Nonconvex nonsmooth optimization via convex–nonconvex majorization–minimization. Numer. Math. 136, 343–381 (2017). https://doi.org/10.1007/s00211-016-0842-x
