Abstract
We consider the problem of minimizing the difference of two nonsmooth convex functions over a simple convex set. To deal with this class of nonsmooth and nonconvex optimization problems, we propose new proximal bundle algorithms and show that the given approaches generate subsequences of iterates that converge to critical points. Trial points are obtained by solving strictly convex master programs defined by the sum of a convex cutting-plane model and a freely-chosen Bregman function. In the unconstrained case with the Bregman function being the Euclidean distance, new iterates are solutions of strictly convex quadratic programs of limited sizes. Stronger convergence results (d-stationarity) can be achieved depending on (a) further assumptions on the second DC component of the objective function and (b) solving possibly more than one master program at certain iterations. The given approaches are validated by encouraging numerical results on some academic DC programs.
Notes
We have used \(\rho =10^4\times n\) in our numerical experiments.
Probability computed using Matlab's mvncdf function.
References
Astorino, A., Miglionico, G.: Optimizing sensor cover energy via DC programming. Optim. Lett. 10(2), 355–368 (2016)
Bagirov, A.M.: A method for minimization of quasidifferentiable functions. Optim. Methods Softw. 17(1), 31–60 (2002)
Bagirov, A.M., Yearwood, J.: A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems. Eur. J. Oper. Res. 170(2), 578–596 (2006)
Ben-Tal, A., Nemirovski, A.: Non-Euclidean restricted memory level method for large-scale convex optimization. Math. Program. 102, 407–456 (2005)
Bonnans, J., Gilbert, J., Lemaréchal, C., Sagastizábal, C.: Numerical Optimization: Theoretical and Practical Aspects, 2nd edn. Springer, Berlin (2006)
Clarke, F.H.: Optimization and Nonsmooth Analysis. Soc. Ind. Appl. Math. (1990). https://doi.org/10.1137/1.9781611971309
Cruz Neto, J.X., Oliveira, P.R., Soubeyran, A., Souza, J.C.O.: A generalized proximal linearized algorithm for DC functions with application to the optimal size of the firm problem. Ann. Oper. Res. (2018). https://doi.org/10.1007/s10479-018-3104-8
de Oliveira, W., Solodov, M.: A doubly stabilized bundle method for nonsmooth convex optimization. Math. Program. 156(1), 125–159 (2016)
de Oliveira, W.: Target radius methods for nonsmooth convex optimization. Oper. Res. Lett. 45(6), 659–664 (2017)
de Oliveira, W., Sagastizábal, C., Lemaréchal, C.: Convex proximal bundle methods in depth: a unified analysis for inexact oracles. Math. Program. Ser. B 148, 241–277 (2014)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)
Frangioni, A.: Generalized bundle methods. SIAM J. Optim. 13(1), 117–156 (2002)
Frangioni, A., Gorgone, E.: Generalized bundle methods for sum-functions with “easy” components: applications to multicommodity network design. Math. Program. 145(1), 133–161 (2014)
Fuduli, A., Gaudioso, M., Giallombardo, G.: A DC piecewise affine model and a bundling technique in nonconvex nonsmooth minimization. Optim. Methods Softw. 19(1), 89–102 (2004)
Gaudioso, M., Giallombardo, G., Miglionico, G.: Minimizing piecewise-concave functions over polyhedra. Math. Oper. Res. 43(2), 580–597 (2018)
Gaudioso, M., Giallombardo, G., Miglionico, G., Bagirov, A.M.: Minimizing nonsmooth DC functions via successive DC piecewise-affine approximations. J. Glob. Optim. 71(1), 37–55 (2018)
Hare, W., Sagastizábal, C.: A redistributed proximal bundle method for nonconvex optimization. SIAM J. Optim. 20(5), 2442–2473 (2010)
Hare, W., Sagastizábal, C., Solodov, M.: A proximal bundle method for nonsmooth nonconvex functions with inexact information. Comput. Optim. Appl. 63(1), 1–28 (2016)
Henrion, R.: A Critical Note on Empirical (Sample Average, Monte Carlo) Approximation of Solutions to Chance Constrained Programs (Chapter 3 in [24]). Springer, Berlin (2013)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Grundlehren der mathematischen Wissenschaften, vol. 305, 2nd edn. Springer, Berlin (1996)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II. Grundlehren der mathematischen Wissenschaften, vol. 306, 2nd edn. Springer, Berlin (1996)
Hiriart-Urruty, J.B.: Generalized Differentiability/Duality and Optimization for Problems Dealing with Differences of Convex Functions, pp. 37–70. Springer, Berlin, Heidelberg (1985)
Holmberg, K., Tuy, H.: A production–transportation problem with stochastic demand and concave production costs. Math. Program. 85(1), 157–179 (1999)
Hömberg, D., Tröltzsch, F. (eds.): System Modeling and Optimization. IFIP Advances in Information and Communication, vol. 391. Springer, Berlin (2013)
Hong, L.J., Yang, Y., Zhang, L.: Sequential convex approximations to joint chance constrained programs: A Monte Carlo approach. Oper. Res. 59(3), 617–630 (2011)
Joki, K., Bagirov, A., Karmitsa, N., Mäkelä, M.M., Taheri, S.: Double bundle method for finding Clarke stationary points in nonsmooth DC programming. SIAM J. Optim. 28(2), 1892–1919 (2018)
Joki, K., Bagirov, A.M., Karmitsa, N., Mäkelä, M.M.: A proximal bundle method for nonsmooth DC optimization utilizing nonconvex cutting planes. J. Glob. Optim. 68(3), 501–535 (2017)
Kelley, J.E.: The cutting-plane method for solving convex programs. J. Soc. Ind. Appl. Math. 8(4), 703–712 (1960)
Khalaf, W., Astorino, A., d’Alessandro, P., Gaudioso, M.: A DC optimization-based clustering technique for edge detection. Optim. Lett. 11(3), 627–640 (2017)
Kiwiel, K.C.: A proximal bundle method with approximate subgradient linearizations. SIAM J. Optim. 16(4), 1007–1023 (2006)
Le Thi, H.A., Tao, P.D.: DC programming in communication systems: challenging problems and methods. Vietnam J. Comput. Sci. 1(1), 15–28 (2014)
Le Thi, H.A., Pham Dinh, T., Ngai, H.V.: Exact penalty and error bounds in DC programming. J. Glob. Optim. 52(3), 509–535 (2012)
Lemaréchal, C.: An algorithm for minimizing convex functions. Inf. Process. 1, 552–556 (1974)
Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Program. 69(1), 111–147 (1995)
Lewis, A.S., Overton, M.L.: Nonsmooth optimization via quasi-Newton methods. Math. Program. 141(1), 135–163 (2013)
Mäkelä, M.M., Miettinen, M., Lukšan, L., Vlček, J.: Comparing nonsmooth nonconvex bundle methods in solving hemivariational inequalities. J. Glob. Optim. 14(2), 117–135 (1999)
Noll, D., Apkarian, P.: Spectral bundle methods for non-convex maximum eigenvalue functions: first-order methods. Math. Program. 104(2), 701–727 (2005)
Pang, J.S., Razaviyayn, M., Alvarado, A.: Computing B-stationary points of nonsmooth DC programs. Math. Oper. Res. 42(1), 95–118 (2017)
Prékopa, A.: Stochastic Programming. Kluwer, Dordrecht (1995)
Rockafellar, R.: Convex Analysis, 1st edn. Princeton University Press, Princeton (1970)
Souza, J.C.O., Oliveira, P.R., Soubeyran, A.: Global convergence of a proximal linearized algorithm for difference of convex functions. Optim. Lett. 10(7), 1529–1539 (2016)
Tao, P.D., Le Thi, H.A.: Convex analysis approach to DC programming: theory, algorithms and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)
Tuy, H.: Convex Analysis and Global Optimization. Springer Optimization and Its Applications, 2nd edn. Springer, Berlin (2016)
van Ackooij, W.: Eventual convexity of chance constrained feasible sets. Optimization 64(5), 1263–1284 (2015)
van Ackooij, W., Cruz, J.B., de Oliveira, W.: A strongly convergent proximal bundle method for convex minimization in Hilbert spaces. Optimization 65(1), 145–167 (2016)
van Ackooij, W., Henrion, R.: Gradient formulae for nonlinear probabilistic constraints with Gaussian and Gaussian-like distributions. SIAM J. Optim. 24(4), 1864–1889 (2014)
van Ackooij, W., de Oliveira, W.: Level bundle methods for constrained convex optimization with various oracles. Comput. Optim. Appl. 57(3), 555–597 (2014)
van Ackooij, W., Sagastizábal, C.: Constrained bundle methods for upper inexact oracles with application to joint chance constrained energy problems. SIAM J. Optim. 24(2), 733–765 (2014)
Acknowledgements
The author is grateful to the reviewers for their remarks and constructive suggestions that considerably improved the original version of this article.
A Appendix
A.1 A self-contained analysis of the sequence of infinitely many null steps generated after a last serious step
We assume that after the \({\hat{\ell }}\)th stability center \(x^{k({\hat{\ell }})} = {\hat{x}}\) only null steps are performed, i.e.,
where \({\hat{g}}_2 = g_2^{k({\hat{\ell }})}\) is the last subgradient computed for \(f_2\). Notice that in this case the sequence \(\{\mu _k\}_{k\ge k({\hat{\ell }})}\) is nondecreasing. In what follows we present some auxiliary results to prove Lemma 4. We start by defining the following two useful functions:
Notice that \(F^{-k}\) is twice differentiable (because \(\omega \) defining D is so):
where \(\nabla ^2 \omega (x) \in {\mathbb {R}}^{n\times n}\) is the Hessian of the function \(\omega \). Since \(\omega \) is strongly convex, \(\nabla ^2 \omega (x)\) is positive definite for all \(x \in {\mathbb {R}}^n\). It follows from (10) that \(\nabla F^{-k}(x^{k+1})=0\), i.e., the point \(x^{k+1}\) is the unique minimizer of \( F^{-k}(x)\) over \({\mathbb {R}}^n\). The Taylor and mean value theorems [5, Sect. 13] give, for some \(z = \lambda x^{k+1}+(1-\lambda ){\hat{x}}\) and \(\lambda \in [0,1]\),
where the inequality is due to the assumption that \(\omega \) is strongly convex with parameter \(\rho >0\) and norm \(\Arrowvert \cdot \Arrowvert _p\) in (5) (\(\langle \nabla ^2 \omega (z) (x-x^{k+1}),x-x^{k+1}\rangle \ge \rho \Arrowvert x-x^{k+1} \Arrowvert _p^2\)), and the last equality follows from (31) and (32). The above development is crucial to show the following lemma, which is essentially a reformulation of [10, Lemma 6.3] to our setting.
Lemma 7
Let \({\hat{x}}= x^{k({\hat{\ell }})}\) be the last stability center, generated by Algorithm 1 at iteration \(k({\hat{\ell }})\), after which only null steps are performed. Assume also that for \(k\ge k({\hat{\ell }})\) the function \(F^k\) is the model given in (31) and \(x^{k+1}\) is an iterate obtained from a null step. If \(\{\mu _k\}_{k\ge k({\hat{\ell }})}\) is nondecreasing, then
(i) the sequence \(\{ F^{k}(x^{k+1})\}_{{k\ge k({\hat{\ell }})}}\) is nondecreasing and satisfies
$$\begin{aligned} F^{k}(x^{k+1})+\frac{\mu _k \rho }{2}\Arrowvert x^{k+2}-x^{k+1} \Arrowvert ^2_p \le F^{{k+1}}(x^{k+2}) \;{\hbox { for all }\; k\ge k({\hat{\ell }})}; \end{aligned}$$
(ii) the sequence \(\{F^{k}(x^{k+1})\}_{{k\ge k(\hat{\ell })}}\) is bounded from above:
$$\begin{aligned} F^{k}(x^{k+1})+\frac{\mu _k \rho }{2}\Arrowvert {\hat{x}}-x^{k+1} \Arrowvert ^2_p \le f_1({\hat{x}}) \;{\hbox { for all }\; k\ge k({\hat{\ell }})}; \end{aligned}$$
(iii) the following inequality holds for all \(k\ge k({\hat{\ell }})\):
$$\begin{aligned} {\check{f}}_1^{k}(x^{k+1})-{\check{f}}_1^{k-1}(x^k) \le F^{k}(x^{k+1}) -F^{k-1}(x^k) + \mu _{k-1}[D(x^k,{\hat{x}}) - D(x^{k+1},{\hat{x}})] -\langle {\hat{g}}_2,x^k-x^{k+1}\rangle . \end{aligned}$$
Proof
Algorithm 1 ensures that the aggregate index \(-k\) enters the bundle in every null step. In particular \(-k \in {\mathcal {B}}_1^{{k+1}}\) for all \(k\ge k({\hat{\ell }})\). Then for all \(x \in X\) and all \(k\ge k({\hat{\ell }})\) we have that
where the first inequality is due to \(s^{k+1}\in N_X(x^{k+1})\), the second follows from inequality \(\bar{f}_1^{-k}{(\cdot )}\le {\check{f}}_1^{k+1}{(\cdot )}\) ensured because \(-k \in {\mathcal {B}}_1^{{k+1}}\), and the last inequality follows from the assumption \(\mu _{k+1}\ge \mu _{k}\). Set \(x=x^{k+2}\) in (33) to obtain (i), and \(x={\hat{x}}\) to obtain \(F^{k}(x^{k+1}) + \frac{\mu _k\rho }{2}\Arrowvert {\hat{x}}-x^{k+1} \Arrowvert ^2_p\le F^{-k}({\hat{x}}) \le F^{k+1}({\hat{x}})= {\check{f}}_1^{k+1}({\hat{x}}) \le f_1({\hat{x}})\). To show (iii), note that for all \(k\ge k({\hat{\ell }})\)
where the inequality is due to \(\mu _k\ge \mu _{k-1}\) and the last equality follows from (31) (with k therein replaced with \(k-1\)). The result thus follows. \(\square \)
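The prox terms in Lemma 7(i)-(ii) rest on the strong-convexity lower bound \(D(x,y) \ge \frac{\rho }{2}\Arrowvert x-y \Arrowvert _p^2\) for the Bregman distance generated by \(\omega \). A minimal numerical check of this bound, assuming the Euclidean choice \(\omega (x)=\tfrac{1}{2}\Arrowvert x \Arrowvert _2^2\) (for which \(\rho =1\) and \(D\) is exactly half the squared distance), can be sketched as follows:

```python
import random

# Euclidean choice: omega(x) = 0.5 * ||x||_2^2, strongly convex with rho = 1
def omega(x):
    return 0.5 * sum(xi * xi for xi in x)

def grad_omega(x):
    return list(x)

def bregman(x, y):
    # D(x, y) = omega(x) - omega(y) - <grad omega(y), x - y>
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_omega(y), x, y))
    return omega(x) - omega(y) - inner

rho = 1.0
random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in range(3)]
    y = [random.uniform(-5.0, 5.0) for _ in range(3)]
    lower = 0.5 * rho * sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    assert bregman(x, y) >= lower - 1e-9  # D(x, y) >= (rho/2)||x - y||^2
```

For a general strongly convex kernel \(\omega \) the same inequality holds with the corresponding \(\rho \) and norm \(\Arrowvert \cdot \Arrowvert _p\), which is what the proof above actually uses.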
Given the above properties, the following lemma shows that the cutting-plane model \({\check{f}}_1^k\) asymptotically approximates the DC component \(f_1\) on the sequence of null iterates.
Lemma 8
Under the assumptions of Lemma 7, Algorithm 1 ensures that \(\{x^k\}_{{k}}\) is a bounded sequence and
Proof
Lemma 7(i) ensures that the sequence \(\{F^k(x^{k+1})\}_{{k\ge k({\hat{\ell }})}}\) is nondecreasing. Thus, there exists a constant \(C>0\) such that \(F^k(x^{k+1}) \ge -C\) for all \(k\ge k({\hat{\ell }})\). Using Lemma 7(ii) we conclude that
showing that the sequence \(\{\Arrowvert {\hat{x}}-x^k \Arrowvert _p\}_{{k\ge k({\hat{\ell }})}}\) is bounded because \(\{\mu _k\}_{{k> k({\hat{\ell }})}}\) is nondecreasing. Accordingly, \(\{x^k\}_{{k}}\) is also bounded. It follows from Lemma 7(ii) that the sequence \(\{F^k(x^{k+1})\}_{{k\ge k({\hat{\ell }})}}\) is bounded from above by \(f_1({\hat{x}})\). Lemma 7(i) shows that \(\{F^k(x^{k+1})\}_{{k\ge k({\hat{\ell }})}}\) is nondecreasing and hence
where the second limit follows from Lemma 7(i) and the assumption that \(\{\mu _k\}_{{k\ge k({\hat{\ell }})}}\) is nondecreasing. Note that, by (5),
Since \(\{\mu _k\}_{{k}}\) is a bounded sequence and \(\omega \) is a continuous function, we conclude that
The inclusion \(k \in {\mathcal {B}}_1^k\) implies \( f_1(x^k)+\langle g_1^k,x-x^k\rangle =\bar{f}_1^k(x)\le {\check{f}}_1^k(x) \; \hbox { for all }x\in {\mathbb {R}}^n. \) Setting \(x=x^{k+1}\) in this inequality yields \( f_1(x^k)=\bar{f}_1^k(x^{k+1}) + \langle g_1^k,x^k-x^{k+1}\rangle \). Therefore, for \(k>k({\hat{\ell }})\)
where the last inequality is due to Lemma 7(iii). Passing to the limit as \(k\rightarrow \infty \) in the above inequalities, taking into account (35) and (36), and recalling that \(\{x^k\}_{{k}}\) and \(\{g_1^k\}_{{k}}\) are bounded sequences, we conclude that \(\limsup _{k\rightarrow \infty } [f_1(x^k)-{\check{f}}_1^{k-1}(x^k)]\le 0\). Since \(f_1\) is convex we have \(f_1(x^k)\ge {\check{f}}_1^{k-1}(x^k)\) and the result follows. \(\square \)
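Lemma 8 is the classical cutting-plane phenomenon: during a run of null steps with a fixed center, the model catches up with \(f_1\) along the trial points. A self-contained sketch, not the paper's Algorithm 1 but the same mechanism on hypothetical data (\(f_1(x)=x^2\) in one dimension, Euclidean prox with \(\mu =1\), center \({\hat{x}}=1\), and a crude grid search standing in for the master program), illustrates the vanishing gap:

```python
def f1(x):          # hypothetical 1-D first DC component
    return x * x

def subgrad_f1(x):  # a subgradient of f1
    return 2.0 * x

xhat, mu = 1.0, 1.0
cuts = [(f1(xhat), subgrad_f1(xhat), xhat)]     # linearizations (value, slope, point)
grid = [i / 1000.0 - 2.0 for i in range(4001)]  # crude stand-in for the master QP

def model(x):       # cutting-plane model of f1
    return max(fv + g * (x - xv) for fv, g, xv in cuts)

gaps = []
for _ in range(15):  # a run of null steps: the center xhat never moves
    xk = min(grid, key=lambda x: model(x) + 0.5 * mu * (x - xhat) ** 2)
    gaps.append(f1(xk) - model(xk))             # f1 minus model at the trial point
    cuts.append((f1(xk), subgrad_f1(xk), xk))   # enrich the bundle

assert gaps[0] > 0.0 and gaps[-1] <= 1e-2       # the gap vanishes, as in Lemma 8
```

The first gap is large (the single initial cut is a poor model), and each added cut makes the model exact at the previous trial point, driving \(f_1(x^k)-{\check{f}}_1^{k-1}(x^k)\) to zero.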
A.2 Proof of Lemma 4
Suppose that \({\hat{x}}\) is not a cluster point of \(\{x^{k+1}\}_{{k\ge k({\hat{\ell }})}}\). Then there would exist \(\epsilon >0\) and an index \({\tilde{k}}\ge k({\hat{\ell }})\) such that
for all indices \(k+1\ge {\tilde{k}}\). It follows from Lemma 8 that there exists an index \({\bar{k}}\ge k({\hat{\ell }})\) such that
The definition of \(x^{k+1}\), the feasibility of \({\hat{x}}\), and the inequality \({\check{f}}_1^k {(\cdot )}\le f_1{(\cdot )}\) yield
Since by assumption \(\kappa \in (0,1)\) and \(\mu _k \ge {\underline{\mu }}\), taking \(k+1 > \max \{{\tilde{k}},\,{\bar{k}}\}\) we get
showing that \(x^{k+1}\) satisfies the descent test (13), i.e., \(x^{k+1}\) becomes the new stability center. This contradicts the fact that \({\hat{x}}\) is the last stability center. Hence, the sequence \(\{x^{k+1}\}_{{k}}\) has a subsequence that converges to \({\hat{x}}\), i.e., \(\lim _{k\in {\mathcal {K}}} x^{k} ={\hat{x}}\) for some index set \({\mathcal {K}}\subset {\{k({\hat{\ell }})+1,k({\hat{\ell }})+2,\ldots \}}\). We now proceed to show that the whole sequence in fact converges to \({\hat{x}}\): it follows from (31) that
and, therefore, \(\lim _{k\in {\mathcal {K}}} F^{k-1}(x^k) = f_1({\hat{x}})\) from Lemma 8. Lemma 7(i) shows that \(\{F^{k-1}(x^k)\}_{{k\ge k({\hat{\ell }})}}\) is nondecreasing and hence \(\lim _{k\rightarrow \infty } F^{k}(x^{k+1}) =\lim _{k\in {\mathcal {K}}} F^{k-1}(x^k)= f_1({\hat{x}})\). This property combined with (34) shows that the whole sequence \(\{x^k\}_{{k}}\) converges to \({\hat{x}}\). \(\square \)
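The contradiction argument above hinges on the descent test: a trial point achieving a fixed fraction \(\kappa \) of the predicted decrease is promoted to a stability center, otherwise a null step enriches the bundle. A toy serious/null loop in that spirit (again hypothetical data rather than Algorithm 1 itself: \(f_1(x)=x^2\), \(f_2(x)=|x|\), Euclidean prox, grid search in place of the master program, and a descent test of the usual form on the convex function \(f_1 - \langle {\hat{g}}_2,\cdot \rangle \)) can be sketched as:

```python
def f1(x): return x * x                          # first DC component
def g1(x): return 2.0 * x                        # subgradient of f1
def f2(x): return abs(x)                         # second DC component
def g2sub(x): return 1.0 if x >= 0.0 else -1.0   # subgradient of f2

mu, kappa = 1.0, 0.1
grid = [i / 1000.0 - 2.0 for i in range(4001)]

xhat = 1.0                                       # initial stability center
g2 = g2sub(xhat)                                 # linearization of f2 at the center
cuts = [(f1(xhat), g1(xhat), xhat)]

def model(x):                                    # cutting-plane model of f1
    return max(fv + g * (x - xv) for fv, g, xv in cuts)

for _ in range(25):
    # master program: model of f1 minus linearized f2, plus the prox term
    xk = min(grid, key=lambda x: model(x) - g2 * x + 0.5 * mu * (x - xhat) ** 2)
    predicted = (f1(xhat) - g2 * xhat) - (model(xk) - g2 * xk)
    if f1(xk) - g2 * xk <= f1(xhat) - g2 * xhat - kappa * predicted:
        xhat = xk                                # serious step: move the center...
        g2 = g2sub(xhat)                         # ...and re-linearize f2 there
    cuts.append((f1(xk), g1(xk), xk))            # in both cases, enrich the bundle

assert abs(xhat - 0.5) < 1e-2                    # x = 1/2 is critical for x^2 - |x|
```

Starting from \({\hat{x}}=1\), the loop performs a null step, then a serious step to \(x=1/2\), a critical point of \(x^2-|x|\), where it settles; in the regime analyzed in this appendix the serious step never fires, and Lemma 4 shows the null iterates then converge to the last center.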
Cite this article
de Oliveira, W. Proximal bundle methods for nonsmooth DC programming. J Glob Optim 75, 523–563 (2019). https://doi.org/10.1007/s10898-019-00755-4