Linear Convergence of Prox-SVRG Method for Separable Non-smooth Convex Optimization Problems under Bounded Metric Subregularity

Journal of Optimization Theory and Applications

Abstract

Using bounded metric subregularity, which is weaker than strong convexity, we show the linear convergence of the proximal stochastic variance-reduced gradient (Prox-SVRG) method for solving a class of separable non-smooth convex optimization problems in which the smooth term is the composition of a strongly convex function and a linear function. We introduce an equivalent characterization of bounded metric subregularity in terms of the calmness condition of a perturbed linear system. This equivalent characterization allows us to provide a verifiable sufficient condition that ensures linear convergence of the Prox-SVRG and randomized block-coordinate proximal gradient methods. Furthermore, we verify that this sufficient condition holds automatically when the non-smooth term is the generalized sparse group Lasso regularizer.

Acknowledgements

This paper is supported by the National Natural Science Foundation of China (Nos. 11901380, 11971220), Shenzhen Science and Technology Program (No. RCYX20200714114700072), the Stable Support Plan Program of Shenzhen Natural Science Fund (No. 20200925152128002), Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515011152) and Shanghai Pujiang Program (No. 2020PJC058). We are grateful to the editor and two anonymous referees for their comments which have helped us improve the paper substantially.

Author information

Corresponding author

Correspondence to Xide Zhu.

Additional information

Communicated by Xiaojun Chen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 5.1

Let \(j\in J\) be arbitrary. If \((\varvec{s}_J)_j\in [-\lambda ,\lambda ]\), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big )_j=0\). Clearly, (22) is true for any \(\varvec{y}_J\in \partial \Vert \varvec{x}_J\Vert _1\). If \((\varvec{s}_J)_j>\lambda \), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big )_j=(\varvec{s}_J)_j-\lambda \). Since \(\Vert \varvec{y}_J\Vert _{\infty }\le 1\), we further obtain \( 0<(\varvec{s}_J)_j-\lambda \le (\varvec{s}_J-\lambda \varvec{y}_J)_j. \) If \((\varvec{s}_J)_j<-\lambda \), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big )_j=(\varvec{s}_J)_j+\lambda \). Similarly, we obtain \( (\varvec{s}_J-\lambda \varvec{y}_J)_j\le (\varvec{s}_J)_j+\lambda <0. \) In all three cases, we have (22). Consequently, (23) holds for any \(\varvec{x}_J \in \mathbb {R}^{|J|}\) and \(\varvec{y}_J \in \partial \Vert \varvec{x}_J\Vert _1\). \(\square \)
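The case analysis above can be checked numerically. The following is a minimal sketch, assuming that \(\varvec{\mathcal {T}}_{\lambda }\) is the componentwise soft-thresholding operator (as the three cases suggest), that (22) is the componentwise bound \(|(\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J))_j|\le |(\varvec{s}_J-\lambda \varvec{y}_J)_j|\), and that (23) is the corresponding bound in the \(q\)-norm. The function names and the sampled parameter values are illustrative choices, not part of the paper.

```python
import random

def soft_threshold(s, lam):
    """Componentwise soft-thresholding: shrink each entry toward 0 by lam."""
    return [max(abs(v) - lam, 0.0) * (1 if v > 0 else -1) for v in s]

def q_norm(v, q):
    """The l_q norm of a list of floats."""
    return sum(abs(t) ** q for t in v) ** (1.0 / q)

def check_lemma_5_1(trials=1000, n=5, lam=0.7, q=3.0, seed=0):
    """Random test of the assumed forms of (22) and (23)."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = [rng.uniform(-3, 3) for _ in range(n)]
        # The proof only uses ||y||_inf <= 1, so any such y is sampled.
        y = [rng.uniform(-1, 1) for _ in range(n)]
        t = soft_threshold(s, lam)
        r = [si - lam * yi for si, yi in zip(s, y)]
        # Componentwise bound, assumed to be (22).
        if any(abs(tj) > abs(rj) + 1e-12 for tj, rj in zip(t, r)):
            return False
        # Norm bound, assumed to be (23).
        if q_norm(t, q) > q_norm(r, q) + 1e-12:
            return False
    return True
```

A random test of this kind is only a sanity check on the inequalities; it does not replace the proof.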

Proof of Lemma 5.2

From (24), we know \( \varvec{s}_J\in w_J \partial \Vert \varvec{x}_J\Vert _p+\lambda {\partial \Vert \varvec{x}_J\Vert }_1. \) Since \(p\in ]1, \infty [\), we have \(\frac{p}{q}\in ]0, \infty [\). From (19), we know that since \(\varvec{x}_J\ne \varvec{0}\), there exists \(\varvec{y}_J\in \partial \Vert \varvec{x}_J\Vert _1\) such that

$$\begin{aligned} \varvec{s}_J-\lambda \varvec{y}_J = w_J \frac{\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}. \end{aligned}$$
(A.1)

Let \(j\in J\) be arbitrary. If \((\varvec{x}_J)_j>0\), then \(\big (\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\big )_j>0\) and \((\varvec{y}_J)_j=1\). It follows from (A.1) that \((\varvec{s}_J)_j-\lambda =(\varvec{s}_J-\lambda \varvec{y}_J)_j\ge 0\) holds for such j. If \((\varvec{x}_J)_j=0\), then \(\big (\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\big )_j=0\) and \((\varvec{y}_J)_j\in [-1,1]\). It follows from (A.1) that \((\varvec{s}_J-\lambda \varvec{y}_J)_j=0\) holds for such j. If \((\varvec{x}_J)_j<0\), then \(\big (\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\big )_j<0\) and \((\varvec{y}_J)_j=-1\). It follows from (A.1) that \((\varvec{s}_J)_j+\lambda =(\varvec{s}_J-\lambda \varvec{y}_J)_j\le 0\) holds for such j. In each of the three cases, we know that \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big )_j=(\varvec{s}_J-\lambda \varvec{y}_J)_j\) holds for each \(j\in J\). Thus, we obtain (25). \(\square \)
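As an illustration of the identity just proved, the sketch below builds a vector \(\varvec{s}_J\) from a nonzero \(\varvec{x}_J\) through (A.1) and checks that soft-thresholding recovers the right-hand side of (A.1). It assumes that \(\varvec{\varphi }_t\) acts componentwise as \(z\mapsto \text{sign}(z)|z|^t\), that \(q\) is the conjugate exponent of \(p\), and that \(\varvec{\mathcal {T}}_{\lambda }\) is soft-thresholding. The concrete vectors and the helper names are hypothetical.

```python
def sign(z):
    return (z > 0) - (z < 0)

def phi(x, t):
    """Componentwise signed power: z -> sign(z) * |z|**t."""
    return [sign(v) * abs(v) ** t for v in x]

def q_norm(v, q):
    return sum(abs(t) ** q for t in v) ** (1.0 / q)

def soft_threshold(s, lam):
    return [sign(v) * max(abs(v) - lam, 0.0) for v in s]

def lemma_5_2_residual(p=1.5, w=2.0, lam=0.7):
    """Largest gap between T_lam(s) and the right-hand side of (A.1)."""
    q = p / (p - 1.0)                  # conjugate exponent, 1/p + 1/q = 1
    x = [1.2, 0.0, -0.4, 0.0]          # nonzero vector with some zero entries
    # y_j = sign(x_j) where x_j != 0, any value in [-1, 1] where x_j == 0
    y = [1.0, 0.3, -1.0, -1.0]
    g = phi(x, p / q)
    nq = q_norm(g, q)
    grad = [w * v / nq for v in g]     # right-hand side of (A.1)
    s = [a + lam * b for a, b in zip(grad, y)]
    t = soft_threshold(s, lam)
    return max(abs(a - b) for a, b in zip(t, grad))
```

The returned residual should be at the level of floating-point rounding.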

Proof of Proposition 5.1

The case (i) is trivial since \(\partial g_J(\varvec{x}_J) \equiv \{\varvec{0}\}\) for all \(\varvec{x}_J\in \mathbb {R}^{|J|}\). We now consider the case (ii). In this case, we have \(\partial g_J(\varvec{x}_J)=\lambda {\partial \Vert \varvec{x}_J\Vert }_1\) for all \(\varvec{x}_J\in \mathbb {R}^{|J|}\), which means that for any fixed \(\varvec{s}_J\in \mathbb {R}^{|J|}\),

$$\begin{aligned} (\partial g_J)^{-1}(\varvec{s}_J)=\big (\lambda {\partial \Vert \cdot \Vert }_1\big )^{-1}(\varvec{s}_J). \end{aligned}$$

The case \(\Vert \varvec{s}_J\Vert _{\infty }>\lambda \) is trivial. Now, we suppose that there exists \(\varvec{x}_J\in \mathbb {R}^{|J|}\) satisfying \(\varvec{x}_J\in \big (\lambda {\partial \Vert \cdot \Vert }_1\big )^{-1}(\varvec{s}_J)\), and then, we have \(\varvec{s}_J\in \lambda {\partial \Vert \varvec{x}_J\Vert _1}\). Clearly, we have \(\Vert \varvec{s}_J\Vert _{\infty }\le \lambda \). Moreover, we have \(\big (\lambda {\partial \Vert \cdot \Vert }_1\big )^{-1}(\varvec{s}_J)=\{\varvec{0}\}\) if \(\Vert \varvec{s}_J\Vert _{\infty }<\lambda \). Thus, let us consider the case where \(\Vert \varvec{s}_J\Vert _{\infty }=\lambda \). According to (15), we know that if \(\Vert \varvec{s}_J\Vert _{\infty }=\lambda \), then

$$\begin{aligned} (\varvec{x}_J)_j \in \left\{ \begin{array}{ll} [0,+\infty [ &{} ~\text{ if }~~ (\varvec{s}_J)_j=\lambda ,\\ ]-\infty ,0] &{} ~\text{ if }~~ (\varvec{s}_J)_j=-\lambda ,\\ \{0\} &{} ~\text{ if }~~ (\varvec{s}_J)_j\in ]-\lambda ,\lambda [. \end{array}\right. \end{aligned}$$

Combined with (28), we obtain (27) immediately.

(iii) If \(p\in ]1, \infty [\), then we have \(q\in ]0,\infty [\) and \(\frac{q}{p}\in ]0,\infty [\). For any fixed \(\varvec{s}_J\in \mathbb {R}^{|J|}\), we distinguish three cases according to how \(\Vert \varvec{s}_J\Vert _q\) compares with \(w_J\), namely \(\Vert \varvec{s}_J\Vert _q>w_J\), \(\Vert \varvec{s}_J\Vert _q<w_J\) and \(\Vert \varvec{s}_J\Vert _q=w_J\). It follows from (19) that if there exists \(\varvec{x}_J\in \mathbb {R}^{|J|}\) such that \(\varvec{s}_J\in w_J \partial \Vert \varvec{x}_J\Vert _p\), then we have \(\Vert \varvec{s}_J\Vert _q\le w_J\), and moreover, if \(\varvec{x}_J\ne \varvec{0}\), then we have \(\Vert \varvec{s}_J\Vert _q=w_J\). Thus, we immediately obtain that \(\big (w_J \partial \Vert \cdot \Vert _p\big )^{-1}(\varvec{s}_J)=\emptyset \) if \(\Vert \varvec{s}_J\Vert _q>w_J\) and \(\big (w_J \partial \Vert \cdot \Vert _p\big )^{-1}(\varvec{s}_J)=\{\varvec{0}\}\) if \(\Vert \varvec{s}_J\Vert _q<w_J\). Next, we consider the case \(\Vert \varvec{s}_J\Vert _q=w_J\). Suppose that there exists \(\varvec{x}_J\in \mathbb {R}^{|J|}\) such that \(\varvec{s}_J\in w_J \partial \Vert \varvec{x}_J\Vert _p\). Then either \(\varvec{x}_J=\varvec{0}\), or \(\varvec{x}_J\ne \varvec{0}\) and

$$\begin{aligned} \varvec{s}_J=w_J \frac{\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}. \end{aligned}$$
(A.2)

Clearly, (A.2) means that the vectors \(\varvec{s}_J\) and \(\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\) are linearly dependent. In either case, there must exist \({\alpha }_J \ge 0\) such that

$$\begin{aligned} \varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)={({\alpha }_J)^{\frac{p}{q}}} \varvec{s}_J. \end{aligned}$$
(A.3)

Substituting (A.3) into (17), we can further obtain

$$\begin{aligned} \varvec{x}_J=\varvec{\varvec{\varphi }}_{{\frac{q}{p}}}\big (\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\big )=\varvec{\varvec{\varphi }}_{{\frac{q}{p}}}\big ({({\alpha }_J)^{\frac{p}{q}}} \varvec{s}_J\big ) = {\alpha }_J {\varvec{\varvec{\varphi }}_{{\frac{q}{p}}}(\varvec{s}_J)}. \end{aligned}$$

Thus, we have (29).
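The last chain of equalities uses two properties of the map \(\varvec{\varphi }\): \(\varvec{\varphi }_{\frac{q}{p}}\) inverts \(\varvec{\varphi }_{\frac{p}{q}}\), and \(\varvec{\varphi }_{\frac{q}{p}}(\alpha ^{\frac{p}{q}}\varvec{s}_J)=\alpha \varvec{\varphi }_{\frac{q}{p}}(\varvec{s}_J)\) for \(\alpha \ge 0\). The sketch below checks both numerically, assuming that \(\varvec{\varphi }_t\) is the componentwise map \(z\mapsto \text{sign}(z)|z|^t\) and that (17) is the inversion identity. The sample values are arbitrary.

```python
def sign(z):
    return (z > 0) - (z < 0)

def phi(x, t):
    """Componentwise signed power: z -> sign(z) * |z|**t."""
    return [sign(v) * abs(v) ** t for v in x]

def max_gap(u, v):
    """Largest componentwise distance between two vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

p = 1.5
q = p / (p - 1.0)          # conjugate exponent, here q = 3
x = [1.2, 0.0, -0.4]
s = [0.5, -2.0, 0.0]
alpha = 2.5

# Inversion: phi_{q/p}(phi_{p/q}(x)) = x.
roundtrip_gap = max_gap(phi(phi(x, p / q), q / p), x)

# Homogeneity: phi_{q/p}(alpha**(p/q) * s) = alpha * phi_{q/p}(s).
scaling_gap = max_gap(phi([alpha ** (p / q) * v for v in s], q / p),
                      [alpha * v for v in phi(s, q / p)])
```

Both gaps should vanish up to rounding, which is the step that turns (A.3) into the ray description of the inverse subdifferential.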

(iv) In this case, we have \(\partial g_J(\varvec{x}_J)=w_J \partial \Vert \varvec{x}_J\Vert _1\). Similar to the proof of case (ii), we can obtain (30).

(v) We first consider the case where \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q>w_J\). In this case, we claim that

$$\begin{aligned} \big (w_J \partial \Vert \cdot \Vert _p+\lambda {\partial \Vert \cdot \Vert }_1\big )^{-1}(\varvec{s}_J)=\emptyset . \end{aligned}$$
(A.4)

Indeed, if (A.4) does not hold, then there exists \(\varvec{x}_J\in \mathbb {R}^{|J|}\) such that

$$\begin{aligned} \varvec{s}_J\in w_J \partial \Vert \varvec{x}_J\Vert _p+\lambda {\partial \Vert \varvec{x}_J\Vert }_1. \end{aligned}$$
(A.5)

Moreover, from (A.5), there exists \(\varvec{y}_J\in {\partial \Vert \varvec{x}_J\Vert }_1\) such that \( \varvec{s}_J-\lambda \varvec{y}_J\in w_J \partial \Vert \varvec{x}_J\Vert _p. \) Clearly, we have \(\Vert \varvec{s}_J-\lambda \varvec{y}_J \Vert _q\le w_J\). Since \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q>w_J\), we have

$$\begin{aligned} \Vert \varvec{s}_J-\lambda \varvec{y}_J \Vert _q<\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q. \end{aligned}$$

This contradicts (23). Thus, (A.4) holds if \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q>w_J\). Next, let us consider the second case where \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q<w_J\). In this case, we know from (19) that \(\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\in w_J \partial \Vert \varvec{0}\Vert _p\). Moreover, from (21), we have \(\varvec{s}_J-\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\in \lambda {\partial \Vert \varvec{0}\Vert }_1\). Thus, we have \( \varvec{s}_J\in w_J \partial \Vert \varvec{0}\Vert _p+\lambda {\partial \Vert \varvec{0}\Vert }_1, \) that is,

$$\begin{aligned} \varvec{0} \in \big (w_J \partial \Vert \cdot \Vert _p+\lambda {\partial \Vert \cdot \Vert }_1\big )^{-1}(\varvec{s}_J). \end{aligned}$$
(A.6)

We claim that if \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q<w_J\), then

$$\begin{aligned} \big (w_J \partial \Vert \cdot \Vert _p+\lambda {\partial \Vert \cdot \Vert }_1\big )^{-1}(\varvec{s}_J)=\{\varvec{0}\}. \end{aligned}$$
(A.7)

Indeed, if there exists \(\varvec{x}_J\ne \varvec{0}\) satisfying (A.5), then by Lemma 5.2 we have

$$\begin{aligned} \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J) = w_J \frac{\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}. \end{aligned}$$
(A.8)

From (A.8), we can further obtain that \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q=w_J\). This contradicts the assumption that \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q<w_J\). Thus, we have (A.7) if \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q<w_J\). Lastly, we consider the third case \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q=w_J\). Clearly, \(\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\in w_J \partial \Vert \varvec{0}\Vert _p\). Similarly, we have (A.6). Now, we suppose that there exists \(\varvec{x}_J\ne \varvec{0}\) satisfying (A.5). According to Lemma 5.2, we have (A.8). Since \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q=w_J\), we can rewrite (A.8) as

$$\begin{aligned} \frac{\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)}{\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q} = \frac{\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}. \end{aligned}$$
(A.9)

Clearly, (A.9) means that \(\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\) and \(\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\) are linearly dependent nonzero vectors. Thus, we know that there must exist \({\alpha }_J > {0}\) such that

$$\begin{aligned} \varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)= (\alpha _J)^{\frac{p}{q}} \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J). \end{aligned}$$
(A.10)

Using (17), we can further obtain from (A.10) that \( \varvec{x}_J=\varvec{\varvec{\varphi }}_{\frac{q}{p}}\big (\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\big ) = {\alpha }_J \varvec{\varvec{\varphi }}_{\frac{q}{p}}\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big ). \) This, together with (A.6), yields

$$\begin{aligned} \big ( w_J \partial \Vert \cdot \Vert _p + \lambda {\partial \Vert \cdot \Vert }_1 \big )^{-1}(\varvec{s}_J) = \big \{\alpha _J \varvec{\varvec{\varphi }}_{{\frac{q}{p}}}\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big )\in \mathbb {R}^{|J|}: {\alpha }_J \ge {0}\big \}. \end{aligned}$$

From the above analysis, we obtain (31).
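The third case of (v) can be illustrated with a small numerical experiment. The sketch below constructs \(\varvec{s}_J\) with \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q=w_J\) and tests whether points \(\alpha _J \varvec{\varphi }_{\frac{q}{p}}\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big )\) with \(\alpha _J>0\) satisfy (A.5). It assumes \(\lambda >0\), the componentwise form \(z\mapsto \text{sign}(z)|z|^t\) of \(\varvec{\varphi }_t\), soft-thresholding for \(\varvec{\mathcal {T}}_{\lambda }\), and the gradient formula for \(\Vert \cdot \Vert _p\) at nonzero points. The membership test `in_sum_subdiff` is a hypothetical helper that handles only \(\varvec{x}_J\ne \varvec{0}\).

```python
def sign(z):
    return (z > 0) - (z < 0)

def phi(x, t):
    return [sign(v) * abs(v) ** t for v in x]

def q_norm(v, q):
    return sum(abs(t) ** q for t in v) ** (1.0 / q)

def soft_threshold(s, lam):
    return [sign(v) * max(abs(v) - lam, 0.0) for v in s]

def in_sum_subdiff(x, s, w, lam, p, tol=1e-9):
    """Test s in w*d||x||_p + lam*d||x||_1 for x != 0, 1 < p < inf, lam > 0."""
    q = p / (p - 1.0)
    g = phi(x, p / q)
    nq = q_norm(g, q)
    for xj, sj, gj in zip(x, s, g):
        r = (sj - w * gj / nq) / lam      # forced value of y_j
        if xj != 0 and abs(r - sign(xj)) > tol:
            return False                  # need y_j = sign(x_j)
        if xj == 0 and abs(r) > 1.0 + tol:
            return False                  # need y_j in [-1, 1]
    return True

p, w, lam = 1.5, 2.0, 0.7
q = p / (p - 1.0)
t = [0.6, 0.0, -1.1]
scale = w / q_norm(t, q)
t = [scale * v for v in t]                # now ||t||_q = w
# Choose s so that T_lam(s) = t; the entry with t_j = 0 may be any value in [-lam, lam].
s = [v + lam * sign(v) if v != 0 else 0.2 for v in t]
direction = phi(soft_threshold(s, lam), q / p)

on_ray = [in_sum_subdiff([a * d for d in direction], s, w, lam, p)
          for a in (0.5, 1.0, 4.0)]
off_ray = in_sum_subdiff([1.0, 1.0, 1.0], s, w, lam, p)
```

Points on the ray pass the membership test for every sampled \(\alpha _J>0\), while the off-ray point fails it, in line with the set description above.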

(vi) In this case, we have \(\partial g_J(\varvec{x}_J) = (w_J + \lambda ) \partial \Vert \varvec{x}_J\Vert _1\). Similar to the proof of case (ii), we obtain (32). This completes the proof. \(\square \)

Proof of Lemma 5.3

Since \(p \in ]1, 2]\), we have \(\frac{q}{p} \in [1,\infty [\) and \(\frac{p}{q} \in ]0, 1]\). For any \(p \in ]1, 2]\), it is clear that the function \(z \rightarrow \text{ sign }(z) |z|^{\frac{q}{p}}\) is continuously differentiable on \(\mathbb {R}\) and hence locally Lipschitz. Thus, for any fixed \(\varvec{x}^0\in \mathbb {R}^n\), there exist \(\delta _{J}, \kappa _{J}>0\) such that for all \(\varvec{\xi }_{J,1}, \varvec{\xi }_{J,2} \in \mathbb {U}_{\delta _{J}}\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\big )\),

$$\begin{aligned} \left\| \varvec{\varphi }_{\frac{q}{p}}(\varvec{\xi }_{J,1}) - \varvec{\varphi }_{\frac{q}{p}}(\varvec{\xi }_{J,2})\right\| \le \kappa _{J} \Vert \varvec{\xi }_{J,1} - \varvec{\xi }_{J,2}\Vert . \end{aligned}$$
(A.11)

Next, we will show that there exists \(\epsilon _J>0\) such that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\),

$$\begin{aligned} \big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)-\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\big \Vert \le \delta _{J}, \end{aligned}$$
(A.12)

and

$$\begin{aligned} \Big \Vert \varvec{s}_J^0 {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}-\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\Big \Vert \le \delta _{J}. \end{aligned}$$
(A.13)

Since the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{p}{q}}\) is continuous on \(\mathbb {R}\), there exists \(\epsilon _{J,1}>0\) such that (A.12) is satisfied for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,1}}(\varvec{x}_J^0)\). Next, let us consider (A.13). If \(\varvec{x}^0_J=\varvec{0}\), then we have \(\lim _{\varvec{x}_J\rightarrow \varvec{x}^0_J}\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)=\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)=\varvec{0}\) since the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{p}{q}}\) is continuous. Hence, in this case, we have

$$\begin{aligned} \lim _{\varvec{x}_J\rightarrow \varvec{x}^0_J}\Big \Vert \varvec{s}_J^0 {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}-\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\Big \Vert =0. \end{aligned}$$
(A.14)

Otherwise, from (19), we have \(\varvec{s}_J^0=w_J {\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\big \Vert _q}\). Considering the continuity of the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{p}{q}}\), we have \(\lim _{\varvec{x}_J\rightarrow \varvec{x}^0_J}\varvec{s}_J^0 {w_J}^{-1}{\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}=\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\). In either case, we have (A.14). Thus, there exists \(\epsilon _{J,2}>0\) such that (A.13) holds for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,2}}(\varvec{x}_J^0)\).

Let \(\epsilon _J:=\min \{\epsilon _{J,1},\epsilon _{J,2}\}\). For any \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J}}(\varvec{x}_J^0)\), we have (A.12) and (A.13). Combining (A.11) with (17), we obtain the claimed inequality for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J}}(\varvec{x}_J^0)\). \(\square \)
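The proof relies on local Lipschitz continuity of \(z\rightarrow \text{sign}(z)|z|^{\frac{q}{p}}\), which needs \(\frac{q}{p}\ge 1\), while only continuity is used for the exponent \(\frac{p}{q}\le 1\). The sketch below illustrates the difference at the origin for the illustrative choice \(p=1.5\), for which \(\frac{q}{p}=2\) and \(\frac{p}{q}=\frac{1}{2}\). The function names and step sizes are assumptions made for this example.

```python
def signed_power(z, t):
    """The scalar map z -> sign(z) * |z|**t."""
    sgn = (z > 0) - (z < 0)
    return sgn * abs(z) ** t

def quotient_at_zero(t, h):
    """Difference quotient |f(h) - f(0)| / h for f(z) = sign(z)|z|**t."""
    return abs(signed_power(h, t) - signed_power(0.0, t)) / h

steps = [10.0 ** (-k) for k in range(1, 9)]

# Exponent q/p = 2: quotients stay bounded (they equal h), so f is Lipschitz near 0.
smooth = [quotient_at_zero(2.0, h) for h in steps]

# Exponent p/q = 1/2: quotients grow like h**(-1/2), so f is not Lipschitz near 0.
rough = [quotient_at_zero(0.5, h) for h in steps]
```

This is why the Lipschitz estimate (A.11) is stated for \(\varvec{\varphi }_{\frac{q}{p}}\) and not for \(\varvec{\varphi }_{\frac{p}{q}}\).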

Proof of Lemma 5.4

Since \(p\in ]1,2]\), we have \(\frac{q}{p}\ge 1\). For any \(p\in ]1,2]\), it is clear that the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{q}{p}}\) is continuously differentiable on \(\mathbb {R}\) and hence locally Lipschitz. Thus, for \(\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\), there exist \(\delta _{J},\kappa _{J}>0\) such that for all \(\varvec{\xi }_{J,1}, \varvec{\xi }_{J,2} \in \mathbb {U}_{\delta _{J}}\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\big )\),

$$\begin{aligned} \left\| \varvec{\varphi }_{\frac{q}{p}}(\varvec{\xi }_{J,1}) - \varvec{\varphi }_{\frac{q}{p}}(\varvec{\xi }_{J,2})\right\| \le \kappa _{J} \Vert \varvec{\xi }_{J,1}-\varvec{\xi }_{J,2}\Vert . \end{aligned}$$
(A.15)

Next, we will show that there exists \(\epsilon _J>0\) such that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\),

$$\begin{aligned} \left\| \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)-\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\right\| \le \delta _{J}, \end{aligned}$$
(A.16)

and

$$\begin{aligned} \left\| \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0) {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}-\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\right\| \le \delta _{J}. \end{aligned}$$
(A.17)

Since \(p\in ]1,2]\), we have \(\frac{p}{q}\in ]0, 1]\). Since the function \(z\rightarrow \text{ sign }(z) |z|^{t}\) is continuous for any fixed \(t\in ]0, 1]\), there exists \(\epsilon _{J,1}>0\) such that (A.16) holds for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,1}}(\varvec{x}_J^0)\). Next, let us consider (A.17). If \(\varvec{x}^0_J=\varvec{0}\), we have \(\lim _{\varvec{x}_J\rightarrow \varvec{x}^0_J}\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)=\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)=\varvec{0}\) since the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{p}{q}}\) is continuous. Hence, in this case, we obtain

$$\begin{aligned} \lim _{\varvec{x}_J\rightarrow \varvec{x}^0_J} \left\| \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0) {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}-\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\right\| = 0. \end{aligned}$$
(A.18)

If \(\varvec{x}^0_J \ne \varvec{0}\), then by Lemma 5.2 we have \(\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0)=w_J {\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\big \Vert _q}\). Considering the continuity of the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{p}{q}}\), we have \(\lim _{\varvec{x}_J\rightarrow \varvec{x}^0_J} \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0) {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}=\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\). In either case, we have (A.18). Thus, there exists \(\epsilon _{J,2}>0\) such that (A.17) holds for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,2}}(\varvec{x}_J^0)\).

Let \(\epsilon _J:=\min \{\epsilon _{J,1},\epsilon _{J,2}\}\). For any \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J}}(\varvec{x}_J^0)\), we have (A.16) and (A.17). With (A.15) and (17), we obtain that (35) is satisfied for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J}}(\varvec{x}_J^0)\). \(\square \)

Proof of Lemma 5.5

Let \(j\in J\) be arbitrary. Since \(\varvec{x}_J\ne \varvec{0}\), we have

$$\begin{aligned} \partial \Vert \varvec{x}_J\Vert _p=\frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}. \end{aligned}$$

(i) In the case where \((\varvec{s}_J^0)_j>\lambda \), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\big )_j=(\varvec{s}_J^0)_j-\lambda >0\). If \((\varvec{x}_J)_j>0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j>0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=1\). Clearly, we have

$$\begin{aligned} \Bigg (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Bigg )_j = \Bigg (\varvec{s}_J^0-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}-\lambda \partial \Vert \varvec{x}_J\Vert _1\Bigg )_j. \end{aligned}$$

If \((\varvec{x}_J)_j<0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j<0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=-1\). Clearly, we have

$$\begin{aligned} 0<\Bigg (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Bigg )_j < \Bigg (\varvec{s}_J^0-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}-\lambda \partial \Vert \varvec{x}_J\Vert _1\Bigg )_j. \end{aligned}$$

If \((\varvec{x}_J)_j=0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j=0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=[-1,1]\). Clearly, for any \(\varvec{y}_J\in \partial \Vert \varvec{x}_J\Vert _1\),

$$\begin{aligned} 0<\Bigg (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Bigg )_j \le \Bigg (\varvec{s}_J^0-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}-\lambda \varvec{y}_J\Bigg )_j. \end{aligned}$$

(ii) In the case where \((\varvec{s}_J^0)_j<-\lambda \), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\big )_j=(\varvec{s}_J^0)_j+\lambda <0\). If \((\varvec{x}_J)_j>0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j>0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=1\). Clearly, we have

$$\begin{aligned} 0>\Bigg (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Bigg )_j > \Bigg (\varvec{s}_J^0-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}-\lambda \partial \Vert \varvec{x}_J\Vert _1\Bigg )_j. \end{aligned}$$

If \((\varvec{x}_J)_j<0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j<0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=-1\). Clearly, we have

$$\begin{aligned} \Bigg (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Bigg )_j = \Bigg (\varvec{s}_J^0-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}-\lambda \partial \Vert \varvec{x}_J\Vert _1\Bigg )_j. \end{aligned}$$

If \((\varvec{x}_J)_j=0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j=0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=[-1,1]\). Clearly, for any \(\varvec{y}_J\in \partial \Vert \varvec{x}_J\Vert _1\),

$$\begin{aligned} 0 > \Bigg (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Bigg )_j \ge \Bigg (\varvec{s}_J^0-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}-\lambda \varvec{y}_J\Bigg )_j. \end{aligned}$$

(iii) In the case where \((\varvec{s}_J^0)_j\in [-\lambda , \lambda ]\), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\big )_j=0\). If \((\varvec{x}_J)_j>0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j>0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=1\). Clearly, we have

$$\begin{aligned} 0>\Bigg (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Bigg )_j \ge \Bigg (\varvec{s}_J^0-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}-\lambda \partial \Vert \varvec{x}_J\Vert _1\Bigg )_j. \end{aligned}$$

If \((\varvec{x}_J)_j<0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j<0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=-1\). Clearly, we have

$$\begin{aligned} 0<\Bigg (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Bigg )_j \le \Bigg (\varvec{s}_J^0-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}-\lambda \partial \Vert \varvec{x}_J\Vert _1\Bigg )_j. \end{aligned}$$

If \((\varvec{x}_J)_j=0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j=0\). Hence, we have

$$\begin{aligned} \Bigg (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Bigg )_j=0. \end{aligned}$$

From the above analysis, in every case, for any \(\varvec{y}_J\in \partial \Vert \varvec{x}_J\Vert _1\), we have

$$\begin{aligned} \Bigg |\Bigg (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Bigg )_j\Bigg | \le \Bigg |\Bigg (\varvec{s}_J^0-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}-\lambda \varvec{y}_J\Bigg )_j\Bigg |,~~\forall ~j\in J. \end{aligned}$$

Thus, we obtain (36). This completes the proof. \(\square \)
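The componentwise inequality above can be spot-checked numerically. The following Python sketch is only a sanity check, not part of the proof. It assumes that \(\varvec{\mathcal {T}}_{\lambda }\) is the componentwise soft-thresholding operator (consistent with the case analysis above) and that \(\varvec{\varphi }_{r}\) is the signed power map \(x_j\mapsto \mathrm{sign}(x_j)|x_j|^r\); all function names are illustrative.

```python
import math
import random

def soft_threshold(s, lam):
    """Componentwise soft-thresholding: sign(s_j) * max(|s_j| - lam, 0)."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in s]

def phi(x, r):
    """Assumed signed power map: sign(x_j) * |x_j|**r, componentwise."""
    return [math.copysign(abs(v) ** r, v) for v in x]

def q_norm(v, q):
    return sum(abs(t) ** q for t in v) ** (1.0 / q)

def inequality_36_holds(x, s, y, w, lam, p):
    """Check |(T(s) - w*phi/||phi||_q)_j| <= |(s - w*phi/||phi||_q - lam*y)_j| for all j."""
    q = p / (p - 1.0)
    g = phi(x, p / q)
    nrm = q_norm(g, q)                      # requires x != 0
    t = soft_threshold(s, lam)
    for j in range(len(x)):
        lhs = abs(t[j] - w * g[j] / nrm)
        rhs = abs(s[j] - w * g[j] / nrm - lam * y[j])
        if lhs > rhs + 1e-12:               # small tolerance for rounding
            return False
    return True

def random_trial(rng, n=5, p=1.5, w=0.7, lam=0.3):
    x = [rng.choice([0.0, rng.uniform(-2, 2)]) for _ in range(n)]
    x[0] = 1.0                              # keep x nonzero so the normalization is defined
    s = [rng.uniform(-1.5, 1.5) for _ in range(n)]
    # y is an element of the subdifferential of the l1 norm at x.
    y = [math.copysign(1.0, v) if v != 0.0 else rng.uniform(-1, 1) for v in x]
    return inequality_36_holds(x, s, y, w, lam, p)

rng = random.Random(0)
all_ok = all(random_trial(rng) for _ in range(2000))
```

The sampled points include zero components of \(\varvec{x}_J\), so all three sub-cases of each case are exercised.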

Proof of Proposition 5.3

Let \(\left( \varvec{x}^0_J, \varvec{s}^0_J \right) \in \mathrm{gph} (\partial g_J)\) be arbitrary. Then

$$\begin{aligned} \varvec{s}_J^0 \in w_J \partial \Vert \varvec{x}_J^0\Vert _p + \lambda \partial \Vert \varvec{x}_J^0\Vert _1. \end{aligned}$$
(A.19)

We consider the following five cases: (i) \(p=1\); (ii) \(p \in ]1, 2]\), \(w_J=0\) and \(\lambda = 0\); (iii) \(p \in ]1, 2]\), \(w_J>0\) and \(\lambda =0\); (iv) \(p \in ]1, 2]\), \(w_J=0\) and \(\lambda >0\); (v) \(p \in ]1, 2]\), \(w_J>0\) and \(\lambda >0\).

(i) In this case, we have \(g_J(\varvec{x}_J) = (w_J+\lambda ) \Vert \varvec{x}_J\Vert _1\) for all \(\varvec{x}_J \in \mathbb {R}^{|J|}\). If \(w_J + \lambda > 0\), then \(g_J\) is a polyhedral convex function. From [39, Section 4.2], we know that \(\partial g_J\) is metrically subregular at \(\left( \varvec{x}_J^0, \varvec{s}_J^0 \right) \). Otherwise, we have \(g_J( \varvec{x}_J ) \equiv 0\) for all \(\varvec{x}_J \in \mathbb {R}^{|J|}\). Considering (A.19), we have \(\varvec{s}_J^0 = \varvec{0}\). It follows from (26) that \(\left( \partial g_J \right) ^{-1}\left( \varvec{s}^0_J\right) = \mathbb {R}^{|J|}\). Thus, for all \(\epsilon _J > 0\) and \(\varvec{x}_J \in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\),

$$\begin{aligned} \mathrm{{dist}}\left( \varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\right) =\mathrm{{dist}}\big (\varvec{s}_J^0,\partial g_J(\varvec{x}_J)\big )=0. \end{aligned}$$
(A.20)

(ii) In this case, we have \(g_J(\varvec{x}_J)\equiv 0\) for all \(\varvec{x}_J\in \mathbb {R}^{|J|}\). Similar to the proof of case (i), we can show that (A.20) holds for all \(\epsilon _J>0\) and \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\).

(iii) In this case, we have \(g_J(\varvec{x}_J) = w_J \Vert \varvec{x}_J\Vert _p\) for all \(\varvec{x}_J \in \mathbb {R}^{|J|}\). It follows from (19) that either (iiia) \(\Vert \varvec{s}_J^0\Vert _q< w_J\) or (iiib) \(\Vert \varvec{s}_J^0\Vert _q=w_J\).

(iiia) If \(\Vert \varvec{s}_J^0\Vert _q<w_J\), then we have \(\big (w_J \partial \Vert \cdot \Vert _p\big )^{-1}(\varvec{s}_J^0)=\{\varvec{0}\}\) and hence \(\varvec{x}_J^0=\varvec{0}\). Thus, for any \(\varvec{x}_{J}\in \mathbb {R}^{|J|}\), we have

$$\begin{aligned} \mathrm{{dist}}\Big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\Big ) = \left\| \varvec{x}_J-\varvec{x}_J^0 \right\| . \end{aligned}$$
(A.21)

Set \( \epsilon _J {:}{=} \min _{\varvec{z} \in \mathbb {R}^{|J|}}\left\{ \left\| \varvec{s}_J^0-w_J \varvec{z} \right\| : \Vert \varvec{z}\Vert _q=1 \right\} , \) then \(\epsilon _J>0\) due to \(\Vert \varvec{s}_J^0\Vert _q<w_J\).

Let \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\) be arbitrary. In the case where \(\varvec{x}_J=\varvec{0}\), we can obtain (A.20) immediately. In the case where \(\varvec{x}_J\ne \varvec{0}\), from (A.21), we have \( \mathrm{{dist}}\Big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\Big )=\Vert \varvec{x}_J-\varvec{x}_J^0\Vert \le \epsilon _J. \) From (19), we have \(\partial \Vert \varvec{x}_J\Vert _p={\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\). Since \(\left\| {\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\right\| _q=1\), we have

$$\begin{aligned} \mathrm{{dist}}\big (\varvec{s}_J^0,\partial g_J(\varvec{x}_J)\big )=\left\| \varvec{s}_J^0-w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\right\| \ge \epsilon _J. \end{aligned}$$

In either case, we obtain that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\),

$$\begin{aligned} \mathrm{{dist}}\Big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\Big )\le \mathrm{{dist}}\big (\varvec{s}_J^0,\partial g_J(\varvec{x}_J)\big ). \end{aligned}$$
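Case (iiia) can be illustrated numerically. The Python sketch below assumes that \(\Vert \cdot \Vert \) is the Euclidean norm and that \(\varvec{\varphi }_{r}\) is the signed power map \(x_j\mapsto \mathrm{sign}(x_j)|x_j|^r\). The quantity `margin` \(= w_J-\Vert \varvec{s}_J^0\Vert _q\) is not taken from the paper: it is a lower bound on \(\epsilon _J\) that follows from \(\Vert \varvec{v}\Vert _q\le \Vert \varvec{v}\Vert \) for \(q\ge 2\) and the triangle inequality. All names are illustrative.

```python
import math
import random

def phi(x, r):
    """Assumed signed power map: sign(x_j) * |x_j|**r, componentwise."""
    return [math.copysign(abs(v) ** r, v) for v in x]

def norm(v, q=2.0):
    """l_q norm; the default q=2 is the Euclidean norm."""
    return sum(abs(t) ** q for t in v) ** (1.0 / q)

def dist_to_subdiff(s0, x, w, p):
    """Distance from s0 to the singleton w * d||x||_p, valid for x != 0."""
    q = p / (p - 1.0)
    g = phi(x, p / q)
    scale = w / norm(g, q)
    return norm([a - scale * b for a, b in zip(s0, g)])

def check_case_iiia(rng, n=4, p=1.5, w=1.0):
    q = p / (p - 1.0)
    raw = [rng.uniform(-1, 1) for _ in range(n)]
    s0 = [0.8 * w * v / norm(raw, q) for v in raw]    # ||s0||_q = 0.8 * w < w
    margin = w - norm(s0, q)                           # lower bound on eps_J (uses q >= 2)
    direction = [rng.uniform(-1, 1) for _ in range(n)]
    radius = rng.uniform(0.1, 1.0) * margin
    x = [radius * v / norm(direction) for v in direction]   # 0 < ||x|| <= margin
    d = dist_to_subdiff(s0, x, w, p)
    # dist(x, {0}) = ||x|| <= margin <= dist(s0, dg(x)): subregularity with modulus 1.
    return d >= margin - 1e-9 and norm(x) <= d + 1e-9

rng = random.Random(1)
case_iiia_ok = all(check_case_iiia(rng) for _ in range(2000))
```

Under these assumptions the check confirms, on random samples, that the distance to the solution set \(\{\varvec{0}\}\) is dominated by the residual \(\mathrm{dist}\big (\varvec{s}_J^0,\partial g_J(\varvec{x}_J)\big )\) near \(\varvec{x}_J^0=\varvec{0}\).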

(iiib) If \(\Vert \varvec{s}_J^0\Vert _q=w_J\), it is clear that

$$\begin{aligned} \varvec{s}_J^0\in w_J \partial \Vert \varvec{0}\Vert _p. \end{aligned}$$
(A.22)

In addition, from (29), we have

$$\begin{aligned} (\partial g_J)^{-1}(\varvec{s}^0_J)=\Big \{\alpha _J \varvec{\varphi }_{\frac{q}{p}}(\varvec{s}_J^0): \alpha _J\ge 0\Big \}. \end{aligned}$$
(A.23)

In the case where \(\varvec{x}_J=\varvec{0}\), together with (A.22) and (A.23), we can obtain (A.20) immediately. Next, we focus on the case where \(\varvec{x}_J\ne \varvec{0}\). First, from (19), we have \(\partial \Vert \varvec{x}_J\Vert _p={\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\). Using (A.23), we have

$$\begin{aligned} \mathrm{{dist}}\left( \varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\right)= & {} \min \left\{ \left\| \varvec{x}_J-\alpha _J \varvec{\varphi }_{\frac{q}{p}}(\varvec{s}_J^0)\right\| : \alpha _J\ge 0\right\} \nonumber \\\le & {} \left\| \varvec{x}_J-\left( {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}\right) ^{\frac{q}{p}} \varvec{\varphi }_{\frac{q}{p}}\big (\varvec{s}_J^0\big )\right\| . \end{aligned}$$

This, together with (17), yields that

$$\begin{aligned} \mathrm{{dist}}\Big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\Big )\le \Big \Vert \varvec{x}_J-\varvec{\varphi }_{\frac{q}{p}}\big (\varvec{s}_J^0 {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q} \big )\Big \Vert . \end{aligned}$$
(A.24)

By Lemma 5.3, there exist \(\epsilon _J,\kappa _{J,1}>0\) such that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\),

$$\begin{aligned} \Big \Vert \varvec{x}_J-\varvec{\varphi }_{\frac{q}{p}}\big (\varvec{s}_J^0 {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}\big )\Big \Vert \le \kappa _{J,1} \Big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)-\varvec{s}_J^0 {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}\Big \Vert ,\nonumber \\ \end{aligned}$$
(A.25)

and moreover, there must exist \(\kappa _{J,2}>0\) such that

$$\begin{aligned} \big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J) \big \Vert _q \le \kappa _{J,2}, ~~\forall ~\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0). \end{aligned}$$
(A.26)

Let \(\kappa _J{:}{=}\kappa _{J,1} \kappa _{J,2} {w_J}^{-1}\), then \(\kappa _J>0\). Thus, for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\), we have

$$\begin{aligned} \mathrm{{dist}}\Big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\Big )\le & {} \Big \Vert \varvec{x}_J-\varvec{\varphi }_{\frac{q}{p}}\big ({w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q} \varvec{s}_J^0\big )\Big \Vert \nonumber \\\le & {} \kappa _{J,1} \Big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)-{w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q} \varvec{s}_J^0\Big \Vert \nonumber \\\le & {} \kappa _{J,1} \kappa _{J,2} {w_J}^{-1} \Big \Vert w_J {\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}-\varvec{s}_J^0\Big \Vert \nonumber \\= & {} \kappa _J \mathrm{{dist}}\big (\varvec{s}_J^0,\partial g_J(\varvec{x}_J)\big ), \end{aligned}$$

where the first inequality follows from (A.24), the second is due to (A.25), the third is due to (A.26), and the last equality comes from the definition of \(\kappa _J\).
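The ingredients of case (iiib) that do not depend on Lemma 5.3 can be checked directly. The Python sketch below assumes the Euclidean norm for \(\Vert \cdot \Vert \) and the signed power map \(\varvec{\varphi }_{r}(\varvec{x})_j=\mathrm{sign}(x_j)|x_j|^r\). It verifies three things on random data: that the distance to the ray in (A.23) is bounded by the distance to the particular point used in the proof, that this point can be rewritten by positive homogeneity of \(\varvec{\varphi }_{\frac{q}{p}}\), and that a nonzero point of the ray has \(\varvec{s}_J^0\) as its scaled gradient. Function names are illustrative.

```python
import math
import random

def phi(x, r):
    """Assumed signed power map: sign(x_j) * |x_j|**r, componentwise."""
    return [math.copysign(abs(v) ** r, v) for v in x]

def norm(v, q=2.0):
    """l_q norm; the default q=2 is the Euclidean norm."""
    return sum(abs(t) ** q for t in v) ** (1.0 / q)

def dist_to_ray(x, d):
    """Euclidean distance from x to the ray {alpha * d : alpha >= 0}, d != 0."""
    alpha = max(0.0, sum(a * b for a, b in zip(x, d)) / sum(t * t for t in d))
    return norm([a - alpha * b for a, b in zip(x, d)])

def check_case_iiib(rng, n=4, p=1.5, w=1.0):
    q = p / (p - 1.0)
    raw = [rng.uniform(-1, 1) for _ in range(n)]
    s0 = [w * v / norm(raw, q) for v in raw]        # ||s0||_q = w
    d = phi(s0, q / p)                              # direction of the ray in (A.23)
    x = [rng.uniform(-1, 1) for _ in range(n)]      # test point (nonzero almost surely)
    c = norm(phi(x, p / q), q) / w
    alpha = c ** (q / p)                            # feasible step used in the proof
    point = [alpha * v for v in d]
    bound = norm([a - b for a, b in zip(x, point)])
    # Homogeneity: alpha * phi_{q/p}(s0) equals phi_{q/p}(c * s0).
    homog = phi([c * v for v in s0], q / p)
    homog_err = max(abs(a - b) for a, b in zip(point, homog))
    # A nonzero point on the ray satisfies s0 = w * phi_{p/q}(x) / ||phi_{p/q}(x)||_q.
    g = phi([2.0 * v for v in d], p / q)
    grad_err = max(abs(a - w * b / norm(g, q)) for a, b in zip(s0, g))
    return dist_to_ray(x, d) <= bound + 1e-12 and homog_err < 1e-9 and grad_err < 1e-9

rng = random.Random(3)
case_iiib_ok = all(check_case_iiib(rng) for _ in range(2000))
```

The constants \(\epsilon _J\) and \(\kappa _{J,1}\) come from Lemma 5.3 and are not computed here.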

(iv) In this case, we have \(g_J(\varvec{x}_J)=\lambda \Vert \varvec{x}_J\Vert _1\) for all \(\varvec{x}_J\in \mathbb {R}^{|J|}\). Since \(\lambda >0\), \(g_J\) is a polyhedral convex function. As in the first part of case (i), it follows from [39, Section 4.2] that \(\partial g_J\) is metrically subregular at \(\left( \varvec{x}_J^0, \varvec{s}_J^0 \right) \).

(v) In this case, we have \(g_J(\varvec{x}_J)=w_J \Vert \varvec{x}_J\Vert _p+\lambda \Vert \varvec{x}_J\Vert _1\) for all \(\varvec{x}_J\in \mathbb {R}^{|J|}\). It follows from (31) that either (va) \(\Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0)\Vert _q< w_J\) or (vb) \(\Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0)\Vert _q=w_J\).

(va) If \(\Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0)\Vert _q<w_J\), then from (31) we have

$$\begin{aligned} (\partial g_J)^{-1}(\varvec{s}^0_J)=\{\varvec{0}\}, \end{aligned}$$
(A.27)

and hence, \(\varvec{x}_J^0 = \varvec{0}\). Thus, for any \(\varvec{x}_{J}\in \mathbb {R}^{|J|}\), we have

$$\begin{aligned} \mathrm{{dist}}\left( \varvec{x}_J, (\partial g_J)^{-1}(\varvec{s}^0_J) \right) = {\left\| \varvec{x}_J-\varvec{x}_J^0 \right\| } = \Vert \varvec{x}_J\Vert . \end{aligned}$$
(A.28)

In the case \(\varvec{x}_J=\varvec{0}\), we have (A.20) immediately. In the case \(\varvec{x}_J \ne \varvec{0}\), we set

$$\begin{aligned} \epsilon _J {:}{=} \min _{\varvec{z}_J,\varvec{y}_J\in \mathbb {R}^{|J|}} \Big \{{\left\| \varvec{s}_J^0-(w_J \varvec{z}_J+\lambda \varvec{y}_J) \right\| } : \Vert \varvec{z}_J\Vert _q=1, \varvec{y}_J\in \partial \Vert \varvec{z}_J\Vert _1 \Big \}. \end{aligned}$$

We claim \(\epsilon _J>0\). Otherwise, if \(\epsilon _J=0\), then there exist \(\varvec{z}^0_J\in \mathbb {R}^{|J|}\) satisfying \(\Vert \varvec{z}^0_J\Vert _q=1\) and \(\varvec{y}^0_J\in \partial \Vert \varvec{z}^0_J\Vert _1\) such that \( \varvec{s}_J^0=w_J \varvec{z}^0_J+\lambda \varvec{y}^0_J. \) Since \(\Vert \varvec{z}^0_J\Vert _q=1\), we know \(\varvec{z}^0_J\ne \varvec{0}\). Take \(\tilde{\varvec{x}}^0_J{:}{=}\varvec{\varphi }_{\frac{q}{p}}(\varvec{z}^0_J)\). It is easy to verify that \(\varvec{z}^0_J={\varvec{\varphi }_{\frac{p}{q}}(\tilde{\varvec{x}}^0_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\tilde{\varvec{x}}^0_J)\big \Vert _q}\). From (19), we have \(\varvec{z}^0_J\in \partial \Vert \tilde{\varvec{x}}^0_J\Vert _p\). Moreover, we have \(\partial \Vert \varvec{z}^0_J\Vert _1=\partial \Vert \tilde{\varvec{x}}^0_J\Vert _1\) since \(\varvec{z}^0_J\ne \varvec{0}\). Thus, we have

$$\begin{aligned} \varvec{s}_J^0\in w_J \partial \Vert \tilde{\varvec{x}}^0_J \Vert _p + \lambda \partial \Vert \tilde{\varvec{x}}^0_J \Vert _1. \end{aligned}$$

Hence \(\tilde{\varvec{x}}^0_J\in (\partial g_J)^{-1}(\varvec{s}^0_J)\) with \(\tilde{\varvec{x}}^0_J\ne \varvec{0}\), which contradicts (A.27). Therefore, \(\epsilon _J>0\).
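The identity \(\varvec{z}^0_J={\varvec{\varphi }_{\frac{p}{q}}(\tilde{\varvec{x}}^0_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\tilde{\varvec{x}}^0_J)\big \Vert _q}\) used in this argument can be confirmed numerically. The following Python sketch assumes that \(\varvec{\varphi }_{r}\) is the signed power map \(x_j\mapsto \mathrm{sign}(x_j)|x_j|^r\), so that \(\varvec{\varphi }_{\frac{p}{q}}\) inverts \(\varvec{\varphi }_{\frac{q}{p}}\) and both maps preserve the sign pattern. The sample vector and function names are illustrative.

```python
import math
import random

def phi(x, r):
    """Assumed signed power map: sign(x_j) * |x_j|**r, componentwise."""
    return [math.copysign(abs(v) ** r, v) for v in x]

def norm_q(v, q):
    return sum(abs(t) ** q for t in v) ** (1.0 / q)

def roundtrip_error(z, p):
    """Max componentwise gap between z and phi_{p/q}(x~) / ||phi_{p/q}(x~)||_q."""
    q = p / (p - 1.0)
    x_tilde = phi(z, q / p)
    g = phi(x_tilde, p / q)
    nrm = norm_q(g, q)
    return max(abs(a - b / nrm) for a, b in zip(z, g))

def same_sign_pattern(u, v):
    """True if u and v are positive, negative, and zero in the same positions."""
    return all((a > 0) == (b > 0) and (a < 0) == (b < 0) for a, b in zip(u, v))

p = 1.5
q = p / (p - 1.0)
rng = random.Random(2)
raw = [rng.uniform(-1, 1) for _ in range(5)]
raw[1] = 0.0                                  # include a zero component
z0 = [v / norm_q(raw, q) for v in raw]        # ||z0||_q = 1
err = roundtrip_error(z0, p)                  # should be at rounding level
signs_ok = same_sign_pattern(z0, phi(z0, q / p))
```

Since \(\partial \Vert \cdot \Vert _1\) depends only on the sign pattern of its argument, the second check is what gives \(\partial \Vert \varvec{z}^0_J\Vert _1=\partial \Vert \tilde{\varvec{x}}^0_J\Vert _1\).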

Since \(\varvec{x}_J\ne \varvec{0}\), we have \(\partial \Vert \varvec{x}_J\Vert _p={\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\). Moreover, we have \(\partial \Vert \varvec{x}_J\Vert _1=\partial \Big \Vert {\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Big \Vert _1\). Considering the definition of \(\epsilon _J\), we can obtain \( \mathrm{{dist}}\big (\varvec{s}_J^0,\partial g_J(\varvec{x}_J)\big )\ge \epsilon _J. \) This, together with (A.28), yields that

$$\begin{aligned} \mathrm{{dist}}\Big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\Big )\le \mathrm{{dist}}\big (\varvec{s}_J^0,\partial g_J(\varvec{x}_J)\big ),~~\forall ~\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0). \end{aligned}$$

(vb) If \(\Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\Vert _q= w_J\), then from (31) we have

$$\begin{aligned} (\partial g_J)^{-1}(\varvec{s}^0_J)= \big \{\beta _J \varvec{\varphi }_{{\frac{q}{p}}}\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\big )\in \mathbb {R}^{|J|}:\beta _J\ge 0\big \}. \end{aligned}$$
(A.29)

Clearly, we have \(\varvec{0} \in (\partial g_J)^{-1}(\varvec{s}^0_J)\), namely

$$\begin{aligned} \varvec{s}_J^0\in w_J \partial \Vert \varvec{0}\Vert _p+\lambda \partial \Vert \varvec{0}\Vert _1. \end{aligned}$$

In the case where \(\varvec{x}_J=\varvec{0}\), we can obtain (A.20) immediately. Next, we consider the case \(\varvec{x}_J\ne \varvec{0}\). Using (A.29), we have

$$\begin{aligned} \mathrm{{dist}}\Big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\Big )= & {} \min \Big \{\big \Vert \varvec{x}_J-\beta _J \varvec{\varphi }_{{\frac{q}{p}}}\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\big )\big \Vert : \beta _J\ge 0\Big \}\nonumber \\\le & {} \Big \Vert \varvec{x}_J-\left( {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}\right) ^{\frac{q}{p}} \varvec{\varphi }_{{\frac{q}{p}}}\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\big )\Big \Vert . \end{aligned}$$

This, together with (17), yields

$$\begin{aligned} \mathrm{{dist}}\Big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\Big ) \le \Big \Vert \varvec{x}_J-\varvec{\varphi }_{\frac{q}{p}}\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J) {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q} \big )\Big \Vert . \end{aligned}$$
(A.30)

By Lemma 5.4, there exist \(\epsilon _{J,1},\kappa _{J,1}>0\) such that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,1}}(\varvec{x}_J^0)\),

$$\begin{aligned}&\Big \Vert \varvec{x}_J-\varvec{\varphi }_{\frac{q}{p}}\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0) {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}\big )\Big \Vert \nonumber \\\le & {} \kappa _{J,1} \Big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)-\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0) {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}\Big \Vert , \end{aligned}$$
(A.31)

and moreover, there exists \(\kappa _{J,2}>0\) such that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,1}}(\varvec{x}_J^0)\),

$$\begin{aligned} \big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J) \big \Vert _q\le \kappa _{J,2}. \end{aligned}$$
(A.32)

Let \(\epsilon _J{:}{=}\epsilon _{J,1}\) and \(\kappa _J{:}{=}\kappa _{J,1} \kappa _{J,2} {w_J}^{-1}\); then \(\kappa _J>0\). Thus, for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\), we obtain

$$\begin{aligned} \mathrm{{dist}}\Big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\Big )\le & {} \Big \Vert \varvec{x}_J-\varvec{\varphi }_{\frac{q}{p}}\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J) {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q} \big )\Big \Vert \nonumber \\\le & {} \kappa _{J,1} \Big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)-\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J) {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}\Big \Vert \nonumber \\\le & {} \kappa _{J} \Big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)-w_J {\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Big \Vert \nonumber \\\le & {} \kappa _J \mathrm{{dist}}\big (\varvec{s}_J^0,\partial g_J(\varvec{x}_J)\big ), \end{aligned}$$

where the first inequality follows from (A.30), the second is due to (A.31), the third is due to (A.32) and the definition of \(\kappa _J\), and the last comes from Lemma 5.5.
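The description (A.29) of the solution set in case (vb) can be spot-checked as follows. The Python sketch assumes that \(\varvec{\mathcal {T}}_{\lambda }\) is componentwise soft-thresholding and that \(\varvec{\varphi }_{r}\) is the signed power map \(x_j\mapsto \mathrm{sign}(x_j)|x_j|^r\). It builds \(\varvec{s}_J^0\) with \(\Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0)\Vert _q=w_J\), takes a point \(\varvec{x}_J=\beta _J \varvec{\varphi }_{\frac{q}{p}}\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\big )\) with \(\beta _J>0\), and tests whether \(\varvec{s}_J^0-w_J\partial \Vert \varvec{x}_J\Vert _p\) lies in \(\lambda \partial \Vert \varvec{x}_J\Vert _1\). The construction of the test data and all names are illustrative.

```python
import math
import random

def soft_threshold(s, lam):
    """Componentwise soft-thresholding: sign(s_j) * max(|s_j| - lam, 0)."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in s]

def phi(x, r):
    """Assumed signed power map: sign(x_j) * |x_j|**r, componentwise."""
    return [math.copysign(abs(v) ** r, v) for v in x]

def norm_q(v, q):
    return sum(abs(t) ** q for t in v) ** (1.0 / q)

def check_case_vb(rng, n=5, p=1.5, w=1.0, lam=0.4, beta=2.0):
    q = p / (p - 1.0)
    raw = [rng.choice([0.0, rng.uniform(-1, 1)]) for _ in range(n)]
    raw[0] = 1.0                                     # guarantee a nonzero vector
    t = [w * v / norm_q(raw, q) for v in raw]        # intended T_lam(s0), ||t||_q = w
    # Choose s0 so that soft-thresholding it returns t (up to rounding).
    s0 = [v + math.copysign(lam, v) if v != 0.0 else rng.uniform(-lam, lam) for v in t]
    x = [beta * v for v in phi(soft_threshold(s0, lam), q / p)]   # point on the ray (A.29)
    g = phi(x, p / q)
    resid = [a - w * b / norm_q(g, q) for a, b in zip(s0, g)]     # s0 - w * d||x||_p
    for xj, rj in zip(x, resid):
        if xj != 0.0:
            if abs(rj - math.copysign(lam, xj)) > 1e-9:   # must equal lam * sign(x_j)
                return False
        elif abs(rj) > lam + 1e-9:                        # must lie in [-lam, lam]
            return False
    return True

rng = random.Random(4)
case_vb_ok = all(check_case_vb(rng) for _ in range(2000))
```

The constants \(\epsilon _{J,1}\) and \(\kappa _{J,1}\) are supplied by Lemma 5.4 and are not estimated by this check.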

In summary, in every case, there exist \(\epsilon _J,\kappa _{J}>0\) such that

$$\begin{aligned} \mathrm{{dist}}\Big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\Big ) \le \kappa _J \mathrm{{dist}}\big (\varvec{s}_J^0,\partial g_J(\varvec{x}_J)\big ),~~\forall ~\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0). \end{aligned}$$

Consequently, \(\partial g_J\) is metrically subregular at \(\left( \varvec{x}_J^0, \varvec{s}_J^0 \right) \in \mathrm{gph} (\partial g_J)\). \(\square \)

Cite this article

Zhang, J., Zhu, X. Linear Convergence of Prox-SVRG Method for Separable Non-smooth Convex Optimization Problems under Bounded Metric Subregularity. J Optim Theory Appl 192, 564–597 (2022). https://doi.org/10.1007/s10957-021-01978-w
