Abstract
Using bounded metric subregularity, which is weaker than strong convexity, we establish the linear convergence of the proximal stochastic variance-reduced gradient (Prox-SVRG) method for a class of separable non-smooth convex optimization problems in which the smooth term is the composition of a strongly convex function and a linear function. We introduce an equivalent characterization of bounded metric subregularity in terms of the calmness condition of a perturbed linear system. This characterization allows us to provide a verifiable sufficient condition that ensures linear convergence of Prox-SVRG and of randomized block-coordinate proximal gradient methods. Furthermore, we verify that this sufficient condition holds automatically when the non-smooth term is the generalized sparse group Lasso regularizer.
References
Aragón Artacho, F.J., Geoffroy, M.H.: Characterization of metric regularity of subdifferentials. J. Convex Anal. 15(2), 365–380 (2008)
Aragón Artacho, F.J., Geoffroy, M.H.: Metric subregularity of the convex subdifferential in Banach spaces. J. Nonlinear Convex Anal. 15(1), 35–47 (2014)
Aubin, J.P.: Lipschitz behavior of solutions to convex minimization problems. Math. Oper. Res. 9(1), 87–111 (1984)
Borwein, J.M., Zhu, Q.J.: Techniques of Variational Analysis. Springer, New York (2005)
Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K.Q. (eds) Advances in Neural Information Processing Systems, vol. 27, pp. 1646–1654. Curran Associates, Inc. (2014)
Dontchev, A.L., Rockafellar, R.T.: Regularity and conditioning of solution mappings in variational analysis. Set-Valued Anal. 12(1–2), 79–109 (2004)
Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. arXiv preprint arXiv:1602.06661 (2016)
Facchinei, F., Pang, J.S.: Finite-dimensional Variational Inequalities and Complementarity Problems. Springer, Berlin (2007)
Fercoq, O., Richtárik, P.: Optimization in high dimensions via accelerated, parallel, and proximal coordinate descent. SIAM Rev. 58(4), 739–771 (2016)
Fornasier, M., Rauhut, H.: Recovery algorithms for vector-valued data with joint sparsity constraints. SIAM J. Numer. Anal. 46(2), 577–613 (2008)
Friedman, J., Hastie, T., Tibshirani, R.: A note on the Lasso and a sparse group Lasso. arXiv preprint arXiv:1001.0736 (2010)
Gfrerer, H., Ye, J.J.: New constraint qualifications for mathematical programs with equilibrium constraints via variational analysis. SIAM J. Optim. 27(2), 842–865 (2017)
Gong, P., Ye, J.: Linear convergence of variance-reduced stochastic gradient without strong convexity. arXiv preprint arXiv:1406.1102 (2014)
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD, pp. 795–811. Lecture Notes in Computer Science, vol. 9851. Springer, Cham (2016)
Kowalski, M.: Sparse regression using mixed norms. Appl. Comput. Harmon. Anal. 27(3), 303–324 (2009)
Liu, J., Ye, J.: Efficient \(l_1/l_q\) norm regularization. arXiv preprint arXiv:1009.4766 (2010)
Luo, Z.Q., Tseng, P.: On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control. Optim. 30(2), 408–425 (1992)
Ma, C., Tappenden, R., Takáč, M.: Linear convergence of randomized feasible descent methods under the weak strong convexity assumption. J. Mach. Learn. Res. 17(1), 8138–8161 (2016)
Meier, L., Van De Geer, S., Bühlmann, P.: The group Lasso for logistic regression. J. Roy. Stat. Soc. Ser. B. 70(1), 53–71 (2008)
Necoara, I., Clipici, D.: Parallel random coordinate descent method for composite minimization: convergence analysis and error bounds. SIAM J. Optim. 26(1), 197–226 (2016)
Necoara, I., Nesterov, Y., Glineur, F.: Random block coordinate descent methods for linearly constrained optimization over networks. J. Optim. Theory Appl. 173(1), 227–254 (2017)
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rockafellar, R.T., Wets, R.: Variational Analysis. Springer, Berlin (2009)
Robinson, S.M.: Stability theory for systems of inequalities. Part I: linear systems. SIAM J. Numer. Anal. 12(5), 754–769 (1975)
Robinson, S.M.: Strongly regular generalized equations. Math. Oper. Res. 5(1), 43–62 (1980)
Simon, N., Friedman, J., Hastie, T., Tibshirani, R.: A sparse-group Lasso. J. Comput. Graph. Stat. 22(2), 231–245 (2013)
Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B. 58(1), 267–288 (1996)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2009)
Wang, P.W., Lin, C.J.: Iteration complexity of feasible descent methods for convex optimization. J. Mach. Learn. Res. 15(1), 1523–1548 (2014)
Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4), 2057–2075 (2014)
Ye, J.J., Ye, X.Y.: Necessary optimality conditions for optimization problems with variational inequality constraints. Math. Oper. Res. 22(4), 977–997 (1997)
Ye, J.J., Yuan, X.M., Zeng, S.Z., Zhang, J.: Variational analysis perspective on linear convergence of some first order methods for nonsmooth convex optimization problems. Set-Valued Var. Anal. (2021). https://doi.org/10.1007/s11228-021-00591-3
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B. 68(1), 49–67 (2006)
Zheng, X.Y., Ng, K.F.: Metric subregularity of piecewise linear multifunctions and applications to piecewise linear multiobjective optimization. SIAM J. Optim. 24(1), 154–174 (2014)
Zhou, H., Sehl, M.E., Sinsheimer, J.S., Lange, K.: Association screening of common and rare genetic variants by penalized regression. Bioinformatics 26(19), 2375–2382 (2010)
Zhou, Z., So, A.M.C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. 165(2), 689–728 (2017)
Zhou, Z., Zhang, Q., So, A.M.C.: \(\ell _{1, p}\)-norm regularization: error bounds and convergence rate analysis of first-order methods. ICML, pp. 1501–1510 (2015)
Acknowledgements
This paper is supported by the National Natural Science Foundation of China (Nos. 11901380, 11971220), Shenzhen Science and Technology Program (No. RCYX20200714114700072), the Stable Support Plan Program of Shenzhen Natural Science Fund (No. 20200925152128002), Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515011152) and Shanghai Pujiang Program (No. 2020PJC058). We are grateful to the editor and two anonymous referees for their comments which have helped us improve the paper substantially.
Additional information
Communicated by Xiaojun Chen.
Appendix
Proof of Lemma 5.1
If \((\varvec{s}_J)_j\in [-\lambda ,\lambda ]\), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big )_j=0\). Clearly, (22) is true for any \(\varvec{y}_J\in \partial \Vert \varvec{x}_J\Vert _1\). If \((\varvec{s}_J)_j>\lambda \), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big )_j=(\varvec{s}_J)_j-\lambda \). Considering \(\Vert \varvec{y}_J\Vert _{\infty }\le 1\), we can further obtain \( 0<(\varvec{s}_J)_j-\lambda \le (\varvec{s}_J-\lambda \varvec{y}_J)_j. \) If \((\varvec{s}_J)_j<-\lambda \), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big )_j=(\varvec{s}_J)_j+\lambda \). Similarly, we obtain \( (\varvec{s}_J-\lambda \varvec{y}_J)_j\le (\varvec{s}_J)_j+\lambda <0. \) In each of the three cases, we have (22). Consequently, (23) holds for any \(\varvec{x}_J \in \mathbb {R}^{|J|}\) and \(\varvec{y}_J \in \partial \Vert \varvec{x}_J\Vert _1\). \(\square \)
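The componentwise argument above can be checked numerically. The following is a minimal sketch, assuming that (22) is the componentwise bound \(\big |\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big )_j\big |\le \big |(\varvec{s}_J-\lambda \varvec{y}_J)_j\big |\) and that \(\varvec{\mathcal {T}}_{\lambda }\) is the usual soft-thresholding operator; the function names are illustrative and do not come from the paper. Only \(\Vert \varvec{y}_J\Vert _{\infty }\le 1\) is used, exactly as in the proof.

```python
import random

def soft_threshold(s, lam):
    """Componentwise soft-thresholding: sign(s_j) * max(|s_j| - lam, 0)."""
    return [(1.0 if v > 0 else -1.0) * max(abs(v) - lam, 0.0) for v in s]

def check_lemma_5_1(trials=1000, dim=5, lam=0.7, seed=0):
    """Sample s and y with ||y||_inf <= 1 and test |T(s)_j| <= |(s - lam*y)_j|."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
        y = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        t = soft_threshold(s, lam)
        for tj, sj, yj in zip(t, s, y):
            # Small slack only guards against floating-point rounding.
            if abs(tj) > abs(sj - lam * yj) + 1e-12:
                return False
    return True

print(check_lemma_5_1())
```

Since the bound holds for every component, it carries over to any \(\ell _q\) norm of the two vectors, which is how the proof passes from (22) to (23).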
Proof of Lemma 5.2
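From (24), we know \( \varvec{s}_J\in w_J \partial \Vert \varvec{x}_J\Vert _p+\lambda {\partial \Vert \varvec{x}_J\Vert }_1. \) Since \(p\in ]1, \infty [\), we have \(\frac{p}{q}\in ]0, \infty [\). From (19), we know that since \(\varvec{x}_J\ne \varvec{0}\), there exists \(\varvec{y}_J\in \partial \Vert \varvec{x}_J\Vert _1\) such that
\[ \varvec{s}_J-\lambda \varvec{y}_J=w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}. \tag{A.1} \]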
Let \(j\in J\) be arbitrary. If \((\varvec{x}_J)_j>0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j>0\) and \((\varvec{y}_J)_j=1\). It follows from (A.1) that \((\varvec{s}_J)_j-\lambda =(\varvec{s}_J-\lambda \varvec{y}_J)_j\ge 0\) holds for such j. If \((\varvec{x}_J)_j=0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j=0\) and \((\varvec{y}_J)_j\in [-1,1]\). It follows from (A.1) that \((\varvec{s}_J-\lambda \varvec{y}_J)_j=0\) holds for such j. If \((\varvec{x}_J)_j<0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j<0\) and \((\varvec{y}_J)_j=-1\). It follows from (A.1) that \((\varvec{s}_J)_j+\lambda =(\varvec{s}_J-\lambda \varvec{y}_J)_j\le 0\) holds for such j. In each case, we know that \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big )_j=(\varvec{s}_J-\lambda \varvec{y}_J)_j\) holds for each \(j\in J\). Thus, we obtain (25). \(\square \)
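The identity \(\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)=\varvec{s}_J-\lambda \varvec{y}_J\) can be illustrated on a concrete vector. The sketch below assumes that \(\varvec{\varphi }_{t}\) acts componentwise as \(z\mapsto \text{sign}(z)|z|^{t}\), that \(q=p/(p-1)\), and that for \(\varvec{x}_J\ne \varvec{0}\) the subdifferential of \(\Vert \cdot \Vert _p\) is the single vector \(\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)/\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q\); the helper names and the test vector are hypothetical.

```python
def phi(x, t):
    """Componentwise signed power: sign(x_j) * |x_j|**t."""
    return [(1.0 if v > 0 else -1.0 if v < 0 else 0.0) * abs(v) ** t for v in x]

def norm(x, r):
    """l_r norm for finite r >= 1."""
    return sum(abs(v) ** r for v in x) ** (1.0 / r)

def soft_threshold(s, lam):
    """Componentwise soft-thresholding: sign(s_j) * max(|s_j| - lam, 0)."""
    return [(1.0 if v > 0 else -1.0) * max(abs(v) - lam, 0.0) for v in s]

def lemma_5_2_gap(x, y_at_zero, p, w, lam):
    """Build s in w*subdiff||x||_p + lam*subdiff||x||_1 for x != 0 and
    return max_j |T_lam(s)_j - (s - lam*y)_j|."""
    q = p / (p - 1.0)
    d = phi(x, p / q)
    nd = norm(d, q)
    grad = [w * v / nd for v in d]  # the element of w * subdiff ||x||_p
    # y in subdiff ||x||_1: sign on nonzero entries, any value in [-1, 1] at zeros.
    y = [1.0 if v > 0 else -1.0 if v < 0 else y_at_zero for v in x]
    s = [gj + lam * yj for gj, yj in zip(grad, y)]
    t = soft_threshold(s, lam)
    return max(abs(tj - gj) for tj, gj in zip(t, grad))

print(lemma_5_2_gap([1.5, 0.0, -0.4], y_at_zero=0.3, p=1.5, w=2.0, lam=0.7))
```

The printed gap should be zero up to rounding, whatever value in \([-1,1]\) is chosen for the entries of \(\varvec{y}_J\) at the zero components of \(\varvec{x}_J\).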
Proof of Proposition 5.1
The case (i) is trivial since \(\partial g_J(\varvec{x}_J) \equiv \{\varvec{0}\}\) for all \(\varvec{x}_J\in \mathbb {R}^{|J|}\). We now consider the case (ii). In this case, we have \(\partial g_J(\varvec{x}_J)=\lambda {\partial \Vert \varvec{x}_J\Vert }_1\) for all \(\varvec{x}_J\in \mathbb {R}^{|J|}\), which means that for any fixed \(\varvec{s}_J\in \mathbb {R}^{|J|}\),
The case \(\Vert \varvec{s}_J\Vert _{\infty }>\lambda \) is trivial. Now, we suppose that there exists \(\varvec{x}_J\in \mathbb {R}^{|J|}\) satisfying \(\varvec{x}_J\in \big (\lambda {\partial \Vert \cdot \Vert }_1\big )^{-1}(\varvec{s}_J)\), and then, we have \(\varvec{s}_J\in \lambda {\partial \Vert \varvec{x}_J\Vert _1}\). Clearly, we have \(\Vert \varvec{s}_J\Vert _{\infty }\le \lambda \). Moreover, we have \(\big (\lambda {\partial \Vert \cdot \Vert }_1\big )^{-1}(\varvec{s}_J)=\{\varvec{0}\}\) if \(\Vert \varvec{s}_J\Vert _{\infty }<\lambda \). Thus, let us consider the case where \(\Vert \varvec{s}_J\Vert _{\infty }=\lambda \). According to (15), we know that if \(\Vert \varvec{s}_J\Vert _{\infty }=\lambda \), then
Combined with (28), we obtain (27) immediately.
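The structure of \(\big (\lambda {\partial \Vert \cdot \Vert }_1\big )^{-1}(\varvec{s}_J)\) in case (ii) can be made concrete with a componentwise membership test. This is a sketch under the assumption \(\lambda >0\), using the standard description of \(\partial \Vert \cdot \Vert _1\) (the sign on nonzero entries, the interval \([-1,1]\) at zero entries); the function name is illustrative only.

```python
def in_inverse_l1_subdiff(x, s, lam, tol=1e-12):
    """Return True iff s lies in lam * subdiff ||x||_1,
    i.e. x belongs to (lam * subdiff ||.||_1)^{-1}(s). Assumes lam > 0."""
    for xj, sj in zip(x, s):
        if xj > 0 and abs(sj - lam) > tol:
            return False  # a positive entry forces s_j = lam
        if xj < 0 and abs(sj + lam) > tol:
            return False  # a negative entry forces s_j = -lam
        if xj == 0 and abs(sj) > lam + tol:
            return False  # a zero entry only needs |s_j| <= lam
    return True

lam = 0.5
s = [lam, 0.3 * lam, -lam]  # ||s||_inf = lam
print(in_inverse_l1_subdiff([2.0, 0.0, -1.0], s, lam))
print(in_inverse_l1_subdiff([2.0, 1.0, -1.0], s, lam))
```

In the example, entries of \(\varvec{x}_J\) may be nonzero only where \(|(\varvec{s}_J)_j|=\lambda \), and then only with the matching sign, so the first call succeeds and the second fails.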
(iii) If \(p\in ]1, \infty [\), then we have \(q\in ]0,\infty [\) and \(\frac{q}{p}\in ]0,\infty [\). For any fixed \(\varvec{s}_J\in \mathbb {R}^{|J|}\), we distinguish three cases according to the relation between \(\Vert \varvec{s}_J\Vert _q\) and \(w_J\), namely \(\Vert \varvec{s}_J\Vert _q>w_J\), \(\Vert \varvec{s}_J\Vert _q<w_J\) and \(\Vert \varvec{s}_J\Vert _q=w_J\). It follows from (19) that if there exists \(\varvec{x}_J\in \mathbb {R}^{|J|}\) such that \(\varvec{s}_J\in w_J \partial \Vert \varvec{x}_J\Vert _p\), then we have \(\Vert \varvec{s}_J\Vert _q\le w_J\), and moreover, if \(\varvec{x}_J\ne \varvec{0}\), then we have \(\Vert \varvec{s}_J\Vert _q=w_J\). Thus, we immediately obtain that \(\big (w_J \partial \Vert \cdot \Vert _p\big )^{-1}(\varvec{s}_J)=\emptyset \) if \(\Vert \varvec{s}_J\Vert _q>w_J\) and \(\big (w_J \partial \Vert \cdot \Vert _p\big )^{-1}(\varvec{s}_J)=\{\varvec{0}\}\) if \(\Vert \varvec{s}_J\Vert _q<w_J\). Next, we consider the case \(\Vert \varvec{s}_J\Vert _q=w_J\). Suppose that there exists \(\varvec{x}_J\in \mathbb {R}^{|J|}\) such that \(\varvec{s}_J\in w_J \partial \Vert \varvec{x}_J\Vert _p\). Then either \(\varvec{x}_J=\varvec{0}\), or \(\varvec{x}_J\ne \varvec{0}\) and
\[ \varvec{s}_J=w_J \frac{\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}. \tag{A.2} \]
Clearly, (A.2) means that vectors \(\varvec{s}_J\) and \(\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\) are linearly dependent. In either case, there must exist \(\varvec{\alpha }_J \ge 0\) such that
Substituting (A.3) into (17), we can further obtain
Thus, we have (29).
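The boundary case \(\Vert \varvec{s}_J\Vert _q=w_J\) of (iii) can be illustrated numerically. The sketch below assumes that the solution set described by (29) is the ray \(\{\alpha \varvec{\varphi }_{\frac{q}{p}}(\varvec{s}_J):\alpha \ge 0\}\), that \(\varvec{\varphi }_{t}\) is the componentwise map \(z\mapsto \text{sign}(z)|z|^{t}\), and that \(q=p/(p-1)\). It checks that every nonzero point of this ray has \(\varvec{s}_J\) as its scaled \(\ell _p\)-norm gradient. All helper names are hypothetical.

```python
def phi(x, t):
    """Componentwise signed power: sign(x_j) * |x_j|**t."""
    return [(1.0 if v > 0 else -1.0 if v < 0 else 0.0) * abs(v) ** t for v in x]

def norm(x, r):
    """l_r norm for finite r >= 1."""
    return sum(abs(v) ** r for v in x) ** (1.0 / r)

def grad_p_norm(x, p):
    """Gradient of ||.||_p at x != 0, written as phi_{p/q}(x) / ||phi_{p/q}(x)||_q."""
    q = p / (p - 1.0)
    d = phi(x, p / q)
    nd = norm(d, q)
    return [v / nd for v in d]

def residual_on_ray(s_dir, p, w, alpha):
    """Scale s_dir so that ||s||_q = w, take x = alpha * phi_{q/p}(s) with alpha > 0,
    and return max_j |w * grad||x||_p - s|_j."""
    q = p / (p - 1.0)
    n = norm(s_dir, q)
    s = [w * v / n for v in s_dir]
    x = [alpha * v for v in phi(s, q / p)]
    g = grad_p_norm(x, p)
    return max(abs(w * gj - sj) for gj, sj in zip(g, s))

for alpha in (0.3, 1.0, 4.0):
    print(alpha, residual_on_ray([1.0, -2.0, 0.5], p=1.5, w=2.0, alpha=alpha))
```

The residual should vanish up to rounding for every \(\alpha >0\), reflecting that \(\partial \Vert \cdot \Vert _p\) is invariant under positive scaling of its argument.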
(iv) In this case, we have \(\partial g_J(\varvec{x}_J)=w_J \partial \Vert \varvec{x}_J\Vert _1\). Similar to the proof of case (ii), we can obtain (30).
(v) We first consider the case where \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q>w_J\). In this case, we claim that
Indeed, if (A.4) does not hold, then there exists \(\varvec{x}_J\in \mathbb {R}^{|J|}\) such that
Moreover, from (A.5), there exists \(\varvec{y}_J\in {\partial \Vert \varvec{x}_J\Vert }_1\) such that \( \varvec{s}_J-\lambda \varvec{y}_J\in w_J \partial \Vert \varvec{x}_J\Vert _p. \) Clearly, we have \(\Vert \varvec{s}_J-\lambda \varvec{y}_J \Vert _q\le w_J\). Since \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q>w_J\), we have
Clearly, this result conflicts with (23). Thus, we have (A.4) if \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q>w_J\). Next, let us consider the second case where \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q<w_J\). In this case, we know from (19) that \(\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\in w_J \partial \Vert \varvec{0}\Vert _p\). Moreover, from (21), we have \(\varvec{s}_J-\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\in \lambda {\partial \Vert \varvec{0}\Vert }_1\). Thus, we have \( \varvec{s}_J\in w_J \partial \Vert \varvec{0}\Vert _p+\lambda {\partial \Vert \varvec{0}\Vert }_1, \) that is,
We claim that if \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q<w_J\), then
Indeed, if there exists \(\varvec{x}_J\ne \varvec{0}\) satisfying (A.5), then by Lemma 5.2 we have
From (A.8), we can further obtain that \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q=w_J\). Clearly, this result conflicts with the assumption that \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q<w_J\). Thus, we have (A.7) if \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q<w_J\). Lastly, we consider the third case \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q=w_J\). Clearly, \(\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\in w_J \partial \Vert \varvec{0}\Vert _p\). Similarly, we have (A.6). Now, we suppose that there exists \(\varvec{x}_J\ne \varvec{0}\) satisfying (A.5). According to Lemma 5.2, we have (A.8). Since \(\big \Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big \Vert _q=w_J\), we can rewrite (A.8) as
Clearly, (A.9) means that \(\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\) and \(\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\) are linearly dependent nonzero vectors. Thus, we know that there must exist \({\alpha }_J > {0}\) such that
Using (17), we can further obtain from (A.10) that \( \varvec{x}_J=\varvec{\varvec{\varphi }}_{\frac{q}{p}}\big (\varvec{\varvec{\varphi }}_{\frac{p}{q}}(\varvec{x}_J)\big ) = {\alpha }_J \varvec{\varvec{\varphi }}_{\frac{q}{p}}\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\big ). \) This, together with (A.6), yields
From the above analysis, we obtain (31).
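The three-way case distinction in (v) lends itself to a small classifier. The sketch below is written under the assumption, read off from the argument above, that \((\partial g_J)^{-1}(\varvec{s}_J)\) is empty, \(\{\varvec{0}\}\), or the ray \(\{\alpha \varvec{\varphi }_{\frac{q}{p}}(\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)):\alpha \ge 0\}\) according to whether \(\Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J)\Vert _q\) is larger than, smaller than, or equal to \(w_J\). Function names, the tolerance, and the test data are illustrative. As a sanity check it uses the fact that, for a positively homogeneous convex \(g_J\), \(\varvec{s}_J\in \partial g_J(\varvec{x}_J)\) implies \(g_J(\varvec{x}_J)=\langle \varvec{s}_J,\varvec{x}_J\rangle \).

```python
def phi(x, t):
    """Componentwise signed power: sign(x_j) * |x_j|**t."""
    return [(1.0 if v > 0 else -1.0 if v < 0 else 0.0) * abs(v) ** t for v in x]

def norm(x, r):
    """l_r norm for finite r >= 1."""
    return sum(abs(v) ** r for v in x) ** (1.0 / r)

def soft_threshold(s, lam):
    """Componentwise soft-thresholding: sign(s_j) * max(|s_j| - lam, 0)."""
    return [(1.0 if v > 0 else -1.0) * max(abs(v) - lam, 0.0) for v in s]

def inverse_subdiff_case(s, p, w, lam, tol=1e-10):
    """Classify (subdiff g)^{-1}(s) for g = w*||.||_p + lam*||.||_1 with w, lam > 0.
    Returns ("empty", None), ("zero", None) or ("ray", direction)."""
    q = p / (p - 1.0)
    t = soft_threshold(s, lam)
    nt = norm(t, q)
    if nt > w + tol:
        return "empty", None
    if nt < w - tol:
        return "zero", None
    return "ray", phi(t, q / p)

def g(x, p, w, lam):
    """Sparse group Lasso term on one block."""
    return w * norm(x, p) + lam * sum(abs(v) for v in x)

# Build s with ||T_lam(s)||_q = w, then test a point on the predicted ray.
p, w, lam = 1.5, 2.0, 0.7
q = p / (p - 1.0)
t = [1.0, -2.0, 0.0]
scale = w / norm(t, q)
t = [scale * v for v in t]
s = [t[0] + lam, t[1] - lam, 0.3 * lam]
kind, direction = inverse_subdiff_case(s, p, w, lam)
x = [2.5 * v for v in direction]
gap = g(x, p, w, lam) - sum(a * b for a, b in zip(s, x))
print(kind, gap)
```

The equality check is only a necessary condition for membership, so the sketch illustrates the formula rather than proving it.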
(vi) In this case, we have \(\partial g_J(\varvec{x}_J) = (w_J + \lambda ) \partial \Vert \varvec{x}_J\Vert _1\). Similar to the proof of case (ii), we obtain (32). This completes the proof. \(\square \)
Proof of Lemma 5.3
Since \(p \in ]1, 2]\), we have \(\frac{q}{p} \in [1,\infty [\) and \(\frac{p}{q} \in ]0, 1]\). For any \(p \in ]1, 2]\), it is clear that the function \(z \rightarrow \text{ sign }(z) |z|^{\frac{q}{p}}\) is continuously differentiable on \(\mathbb {R}\) and hence locally Lipschitz. Thus, for any fixed \(\varvec{x}^0\in \mathbb {R}^n\), there exist \(\delta _{J}, \kappa _{J}>0\) such that for all \(\varvec{\xi }_{J,1}, \varvec{\xi }_{J,2} \in \mathbb {U}_{\delta _{J}}\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\big )\),
Next, we will show that there exists \(\epsilon _J>0\) such that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\),
and
Since the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{p}{q}}\) is continuous on \(\mathbb {R}\), there exists \(\epsilon _{J,1}>0\) such that (A.12) is satisfied for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,1}}(\varvec{x}_J^0)\). Next, let us consider (A.13). If \(\varvec{x}^0_J=\varvec{0}\), then we have \(\lim _{\varvec{x}_J\rightarrow \varvec{x}^0_J}\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)=\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)=\varvec{0}\) since the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{p}{q}}\) is continuous. Hence, in this case, we have
Otherwise, from (19), we have \(\varvec{s}_J^0=w_J {\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\big \Vert _q}\). Considering the continuity of the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{p}{q}}\), we have \(\lim _{\varvec{x}_J\rightarrow \varvec{x}^0_J}\varvec{s}_J^0 {w_J}^{-1}{\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}=\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\). In either case, we have (A.14). Thus, there exists \(\epsilon _{J,2}>0\) such that (A.13) holds for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,2}}(\varvec{x}_J^0)\).
Let \(\epsilon _J:=\min \{\epsilon _{J,1},\epsilon _{J,2}\}\). For any \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J}}(\varvec{x}_J^0)\), we have (A.12) and (A.13). Applying (A.11) together with (17), we obtain the estimate claimed in Lemma 5.3 for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J}}(\varvec{x}_J^0)\). \(\square \)
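The restriction \(p\in ]1,2]\) enters through the local Lipschitz property of \(z\rightarrow \text{sign}(z)|z|^{\frac{q}{p}}\), which needs \(\frac{q}{p}\ge 1\). The following sketch, with hypothetical function names and grid sizes, compares difference quotients of the signed power map for an exponent at least 1 and for an exponent below 1 on a grid containing the origin.

```python
def signed_power(z, t):
    """The scalar map z -> sign(z) * |z|**t."""
    if z > 0:
        return z ** t
    if z < 0:
        return -((-z) ** t)
    return 0.0

def max_slope(t, radius, steps=1000):
    """Largest difference quotient of signed_power(., t) between adjacent
    points of a uniform grid on [-radius, radius]."""
    grid = [-radius + 2.0 * radius * i / steps for i in range(steps + 1)]
    vals = [signed_power(z, t) for z in grid]
    return max(
        abs(vals[i + 1] - vals[i]) / (grid[i + 1] - grid[i]) for i in range(steps)
    )

# p = 1.5 gives q = 3 and q/p = 2: the slopes stay bounded under refinement.
print(max_slope(2.0, 1.0, steps=1000), max_slope(2.0, 1.0, steps=10000))
# p = 3 gives q = 1.5 and q/p = 0.5: the slopes near 0 grow under refinement.
print(max_slope(0.5, 1.0, steps=1000), max_slope(0.5, 1.0, steps=10000))
```

For the exponent 2 the quotients are bounded by the largest derivative on the interval, whereas for the exponent 0.5 they grow as the grid is refined around zero, which is why the argument of the lemma would not go through for \(p>2\).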
Proof of Lemma 5.4
Since \(p\in ]1,2]\), we have \(\frac{q}{p}\ge 1\). For any \(p\in ]1,2]\), it is clear that the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{q}{p}}\) is continuously differentiable on \(\mathbb {R}\) and hence locally Lipschitz. Thus, for \(\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\), there exist \(\delta _{J},\kappa _{J}>0\) such that for all \(\varvec{\xi }_{J,1}, \varvec{\xi }_{J,2} \in \mathbb {U}_{\delta _{J}}\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\big )\),
Next, we will show that there exists \(\epsilon _J>0\) such that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\),
and
Since \(p\in ]1,2]\), we have \(\frac{p}{q}\in ]0, 1]\). Since the function \(z\rightarrow \text{ sign }(z) |z|^{t}\) is continuous for any fixed \(t\in ]0, 1]\), there exists \(\epsilon _{J,1}>0\) such that (A.16) holds for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,1}}(\varvec{x}_J^0)\). Next, let us consider (A.17). If \(\varvec{x}^0_J=\varvec{0}\), we have \(\lim _{\varvec{x}_J\rightarrow \varvec{x}^0_J}\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)=\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)=\varvec{0}\) since the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{p}{q}}\) is continuous. Hence, in this case, we obtain
If \(\varvec{x}^0_J \ne \varvec{0}\), then by Lemma 5.2 we have \(\varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0)=w_J {\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\big \Vert _q}\). Considering the continuity of the function \(z\rightarrow \text{ sign }(z) |z|^{\frac{p}{q}}\), we have \(\lim _{\varvec{x}_J\rightarrow \varvec{x}^0_J} \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0) {w_J}^{-1} {\Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\Vert _q}=\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}^0_J)\). In either case, we have (A.18). Thus, there exists \(\epsilon _{J,2}>0\) such that (A.17) holds for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,2}}(\varvec{x}_J^0)\).
Let \(\epsilon _J:=\min \{\epsilon _{J,1},\epsilon _{J,2}\}\). For any \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J}}(\varvec{x}_J^0)\), we have (A.16) and (A.17). With (A.15) and (17), we obtain that (35) is satisfied for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J}}(\varvec{x}_J^0)\). \(\square \)
Proof of Lemma 5.5
Let \(j\in J\) be arbitrary. Since \(\varvec{x}_J\ne \varvec{0}\), we have
(i) In the case where \((\varvec{s}_J^0)_j>\lambda \), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\big )_j=(\varvec{s}_J^0)_j-\lambda >0\). If \((\varvec{x}_J)_j>0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j>0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=1\). Clearly, we have
If \((\varvec{x}_J)_j<0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j<0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=-1\). Clearly, we have
If \((\varvec{x}_J)_j=0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j=0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=[-1,1]\). Clearly, for any \(\varvec{y}_J\in \partial \Vert \varvec{x}_J\Vert _1\),
(ii) In the case where \((\varvec{s}_J^0)_j<-\lambda \), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\big )_j=(\varvec{s}_J^0)_j+\lambda <0\). If \((\varvec{x}_J)_j>0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j>0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=1\). Clearly, we have
If \((\varvec{x}_J)_j<0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j<0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=-1\). Clearly, we have
If \((\varvec{x}_J)_j=0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j=0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=[-1,1]\). Clearly, for any \(\varvec{y}_J\in \partial \Vert \varvec{x}_J\Vert _1\),
(iii) In the case where \((\varvec{s}_J^0)_j\in [-\lambda , \lambda ]\), we have \(\big (\varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\big )_j=0\). If \((\varvec{x}_J)_j>0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j>0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=1\). Clearly, we have
If \((\varvec{x}_J)_j<0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j<0\) and \((\partial \Vert \varvec{x}_J\Vert _1)_j=-1\). Clearly, we have
If \((\varvec{x}_J)_j=0\), then \(\big (\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big )_j=0\). Hence, we have
From the above analysis, in each case, for any \(\varvec{y}_J\in \partial \Vert \varvec{x}_J\Vert _1\), we have
Thus, we obtain (36). This completes the proof. \(\square \)
Proof of Proposition 5.3
Let \(\left( \varvec{x}^0_J, \varvec{s}^0_J \right) \in \mathrm{gph} (\partial g_J)\) be arbitrary. Then
We consider the following five cases: (i) \(p=1\); (ii) \(p \in ]1, 2]\), \(w_J=0\) and \(\lambda = 0\); (iii) \(p \in ]1, 2]\), \(w_J>0\) and \(\lambda =0\); (iv) \(p \in ]1, 2]\), \(w_J=0\) and \(\lambda >0\); (v) \(p \in ]1, 2]\), \(w_J>0\) and \(\lambda >0\).
(i) In this case, we have \(g_J(\varvec{x}_J) = (w_J+\lambda ) \Vert \varvec{x}_J\Vert _1\) for all \(\varvec{x}_J \in \mathbb {R}^{|J|}\). If \(w_J + \lambda > 0\), then \(g_J\) is a polyhedral convex function. From [39, Section 4.2], we know that \(\partial g_J\) is metrically subregular at \(\left( \varvec{x}_J^0, \varvec{s}_J^0 \right) \). Otherwise, we have \(g_J( \varvec{x}_J ) \equiv 0\) for all \(\varvec{x}_J \in \mathbb {R}^{|J|}\). Considering (A.19), we have \(\varvec{s}_J^0 = \varvec{0}\). It follows from (26) that \(\left( \partial g_J \right) ^{-1}\left( \varvec{s}^0_J\right) = \mathbb {R}^{|J|}\). Thus, for all \(\epsilon _J > 0\) and \(\varvec{x}_J \in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\),
(ii) In this case, we have \(g_J(\varvec{x}_J)\equiv 0\) for all \(\varvec{x}_J\in \mathbb {R}^{|J|}\). Similar to the proof of case (i), we can show that (A.20) holds for all \(\epsilon _J>0\) and \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\).
(iii) In this case, we have \(g_J(\varvec{x}_J) = w_J \Vert \varvec{x}_J\Vert _p\) for all \(\varvec{x}_J \in \mathbb {R}^{|J|}\). It follows from (19) that either (iiia) \(\Vert \varvec{s}_J^0\Vert _q< w_J\) or (iiib) \(\Vert \varvec{s}_J^0\Vert _q=w_J\).
(iiia) If \(\Vert \varvec{s}_J^0\Vert _q<w_J\), then we have \(\big (w_J \partial \Vert \cdot \Vert _p\big )^{-1}(\varvec{s}_J^0)=\{\varvec{0}\}\) and hence \(\varvec{x}_J^0=\varvec{0}\). Thus, for any \(\varvec{x}_{J}\in \mathbb {R}^{|J|}\), we have
Set \( \epsilon _J {:}{=} \min _{\varvec{z} \in \mathbb {R}^{|J|}}\left\{ \left\| \varvec{s}_J^0-w_J \varvec{z} \right\| : \Vert \varvec{z}\Vert _q=1 \right\} , \) then \(\epsilon _J>0\) due to \(\Vert \varvec{s}_J^0\Vert _q<w_J\).
Let \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\) be arbitrary. In the case where \(\varvec{x}_J=\varvec{0}\), we can obtain (A.20) immediately. In the case where \(\varvec{x}_J\ne \varvec{0}\), from (A.21), we have \( \mathrm{{dist}}\Big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\Big )=\Vert \varvec{x}_J-\varvec{x}_J^0\Vert \le \epsilon _J. \) From (19), we have \(\partial \Vert \varvec{x}_J\Vert _p={\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\). Since \(\left\| {\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\right\| _q=1\), we have
In either case, we obtain that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\),
(iiib) If \(\Vert \varvec{s}_J^0\Vert _q=w_J\), it is clear that
In addition, from (29), we have
In the case where \(\varvec{x}_J=\varvec{0}\), together with (A.22) and (A.23), we can obtain (A.20) immediately. Next, we focus on the case where \(\varvec{x}_J\ne \varvec{0}\). First, from (19), we have \(\partial \Vert \varvec{x}_J\Vert _p={\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\). Using (A.23), we have
This, together with (17), yields that
By Lemma 5.3, there exist \(\epsilon _J,\kappa _{J,1}>0\) such that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\),
and moreover, there must exist \(\kappa _{J,2}>0\) such that
Let \(\kappa _J{:}{=}\kappa _{J,1} \kappa _{J,2} {w_J}^{-1}\), then \(\kappa _J>0\). Thus, for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\), we have
where the first inequality follows from (A.24), the second is due to (A.25), the third is due to (A.26), and the last comes from the definition of \(\kappa _J\).
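The chain of estimates in case (iiib) can be probed numerically. The sketch below assumes that (A.20) is the usual metric subregularity estimate \(\mathrm{dist}\big (\varvec{x}_J,(\partial g_J)^{-1}(\varvec{s}^0_J)\big )\le \kappa _J\, \mathrm{dist}\big (\varvec{s}_J^0,\partial g_J(\varvec{x}_J)\big )\) with Euclidean distances, that \(g_J=w_J\Vert \cdot \Vert _p\), and that the solution set is the ray \(\{\alpha \varvec{\varphi }_{\frac{q}{p}}(\varvec{s}^0_J):\alpha \ge 0\}\). It samples points near a nonzero \(\varvec{x}^0_J\) and reports the largest observed ratio of the two distances. The function name, sampling scheme, and data are hypothetical.

```python
import math
import random

def phi(x, t):
    """Componentwise signed power: sign(x_j) * |x_j|**t."""
    return [(1.0 if v > 0 else -1.0 if v < 0 else 0.0) * abs(v) ** t for v in x]

def norm(x, r):
    """l_r norm for finite r >= 1."""
    return sum(abs(v) ** r for v in x) ** (1.0 / r)

def max_ratio(p, w, x0, radius, trials=2000, seed=0):
    """Largest sampled value of dist(x, ray) / dist(s0, w * grad||x||_p)
    over points x in a box of half-width `radius` around x0 (x0 != 0)."""
    q = p / (p - 1.0)
    d0 = phi(x0, p / q)
    n0 = norm(d0, q)
    s0 = [w * v / n0 for v in d0]  # s0 in w * subdiff ||x0||_p, so ||s0||_q = w
    ray = phi(s0, q / p)           # assumed solution set: {alpha * ray, alpha >= 0}
    rr = sum(v * v for v in ray)
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [v + rng.uniform(-radius, radius) for v in x0]
        # Euclidean projection of x onto the ray.
        a = max(0.0, sum(xi * ri for xi, ri in zip(x, ray)) / rr)
        dist_x = math.sqrt(sum((xi - a * ri) ** 2 for xi, ri in zip(x, ray)))
        d = phi(x, p / q)
        nd = norm(d, q)
        dist_s = math.sqrt(sum((si - w * di / nd) ** 2 for si, di in zip(s0, d)))
        if dist_s > 1e-12:  # skip points that sit numerically on the ray
            worst = max(worst, dist_x / dist_s)
    return worst

print(max_ratio(p=1.5, w=2.0, x0=[1.0, 0.5, -0.5], radius=0.1))
print(max_ratio(p=2.0, w=2.0, x0=[1.0, 0.5, -0.5], radius=0.1))
```

A bounded ratio on the sample is consistent with the existence of \(\kappa _J\) but does not establish it. For \(p=2\) a direct computation gives the bound \(\Vert \varvec{x}_J\Vert /w_J\), which offers a simple cross-check of the second printed value.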
(iv) In this case, we have \(g_J(\varvec{x}_J)=\lambda \Vert \varvec{x}_J\Vert _1\) for all \(\varvec{x}_J\in \mathbb {R}^{|J|}\). Similar to the proof of case (i), we can show that (A.20) holds for all \(\epsilon _J>0\) and \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\).
(v) In this case, we have \(g_J(\varvec{x}_J)=w_J \Vert \varvec{x}_J\Vert _p+\lambda \Vert \varvec{x}_J\Vert _1\) for all \(\varvec{x}_J\in \mathbb {R}^{|J|}\). It follows from (31) that either (va) \(\Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0)\Vert _q< w_J\) or (vb) \(\Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0)\Vert _q=w_J\).
(va) If \(\Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}_J^0)\Vert _q<w_J\), then from (31) we have
and hence, \(\varvec{x}_J^0 = \varvec{0}\). Thus, for any \(\varvec{x}_{J}\in \mathbb {R}^{|J|}\), we have
In the case \(\varvec{x}_J=\varvec{0}\), we have (A.20) immediately. In the case \(\varvec{x}_J \ne \varvec{0}\), we set
We claim \(\epsilon _J>0\). Otherwise, if \(\epsilon _J=0\), then there exist \(\varvec{z}^0_J\in \mathbb {R}^{|J|}\) satisfying \(\Vert \varvec{z}^0_J\Vert _q=1\) and \(\varvec{y}^0_J\in \partial \Vert \varvec{z}^0_J\Vert _1\) such that \( \varvec{s}_J^0=w_J \varvec{z}^0_J+\lambda \varvec{y}^0_J. \) Since \(\Vert \varvec{z}^0_J\Vert _q=1\), we know \(\varvec{z}^0_J\ne \varvec{0}\). Take \(\tilde{\varvec{x}}^0_J:=\varvec{\varphi }_{\frac{q}{p}}(\varvec{z}^0_J)\). It is easy to verify that \(\varvec{z}^0_J={\varvec{\varphi }_{\frac{p}{q}}(\tilde{\varvec{x}}^0_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\tilde{\varvec{x}}^0_J)\big \Vert _q}\). From (19), we have \(\varvec{z}^0_J\in \partial \Vert \tilde{\varvec{x}}^0_J\Vert _p\). Moreover, we have \(\partial \Vert \varvec{z}^0_J\Vert _1=\partial \Vert \tilde{\varvec{x}}^0_J\Vert _1\) since \(\varvec{z}^0_J\ne \varvec{0}\). Thus, we have
Clearly, this result conflicts with (A.27). Hence, we have \(\epsilon _J>0\).
Since \(\varvec{x}_J\ne \varvec{0}\), we have \(\partial \Vert \varvec{x}_J\Vert _p={\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\). Moreover, we have \(\partial \Vert \varvec{x}_J\Vert _1=\partial \Big \Vert {\varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)}\big /{\big \Vert \varvec{\varphi }_{\frac{p}{q}}(\varvec{x}_J)\big \Vert _q}\Big \Vert _1\). Considering the definition of \(\epsilon _J\), we can obtain \( \mathrm{{dist}}\big (\varvec{s}_J^0,\partial g_J(\varvec{x}_J)\big )\ge \epsilon _J. \) This, together with (A.28), yields that
(vb) If \(\Vert \varvec{\mathcal {T}}_{\lambda }(\varvec{s}^0_J)\Vert _q= w_J\), then from (31) we have
Clearly, we have \(\varvec{0} \in (\partial g_J)^{-1}(\varvec{s}^0_J)\), namely
In the case where \(\varvec{x}_J=\varvec{0}\), we can obtain (A.20) immediately. Next, we consider the case \(\varvec{x}_J\ne \varvec{0}\). Using (A.29), we have
This, together with (17), yields
By Lemma 5.4, there exist \(\epsilon _{J,1},\kappa _{J,1}>0\) such that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,1}}(\varvec{x}_J^0)\),
and moreover, there exists \(\kappa _{J,2}>0\) such that for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _{J,1}}(\varvec{x}_J^0)\),
Let \(\kappa _J{:}{=}\kappa _{J,1} \kappa _{J,2} {w_J}^{-1}\), then \(\kappa _J>0\). Thus, for all \(\varvec{x}_J\in \mathbb {U}_{\epsilon _J}(\varvec{x}_J^0)\), we obtain
where the first inequality follows from (A.30), the second is due to (A.31), the third is due to (A.32) and the definition of \(\kappa _J\), and the last comes from Lemma 5.5.
In summary, in each case, there exist \(\epsilon _J,\kappa _{J}>0\) such that
Consequently, \(\partial g_J\) is metrically subregular at \(\left( \varvec{x}_J^0, \varvec{s}_J^0 \right) \in \mathrm{gph} (\partial g_J)\). \(\square \)
Cite this article
Zhang, J., Zhu, X. Linear Convergence of Prox-SVRG Method for Separable Non-smooth Convex Optimization Problems under Bounded Metric Subregularity. J Optim Theory Appl 192, 564–597 (2022). https://doi.org/10.1007/s10957-021-01978-w
DOI: https://doi.org/10.1007/s10957-021-01978-w
Keywords
- Linear convergence
- Bounded metric subregularity
- Calmness
- Proximal stochastic variance-reduced gradient
- Randomized block-coordinate proximal gradient