Differentiable McCormick relaxations

Published in: Journal of Global Optimization

A Correction to this article was published on 25 January 2018

Abstract

McCormick’s classical relaxation technique constructs closed-form convex and concave relaxations of compositions of simple intrinsic functions. These relaxations have several properties which make them useful for lower bounding problems in global optimization: they can be evaluated automatically, accurately, and computationally inexpensively, and they converge rapidly to the relaxed function as the underlying domain is reduced in size. They may also be adapted to yield relaxations of certain implicit functions and differential equation solutions. However, McCormick’s relaxations may be nonsmooth, and this nonsmoothness can create theoretical and computational obstacles when relaxations are to be deployed. This article presents a continuously differentiable variant of McCormick’s original relaxations in the multivariate McCormick framework of Tsoukalas and Mitsos. Gradients of the new differentiable relaxations may be computed efficiently using the standard forward or reverse modes of automatic differentiation. Extensions to differentiable relaxations of implicit functions and solutions of parametric ordinary differential equations are discussed. A C++ implementation based on the library MC++ is described and applied to a case study in nonsmooth nonconvex optimization.

Change history

  • 25 January 2018

    These errata correct various errors in the closed-form relaxations provided by Khan, Watson, and Barton in the article “Differentiable McCormick Relaxations” (J Glob Optim, 67:687–729, 2017). Without these corrections, the provided closed-form relaxations may fail to be convex or concave and may fail to be valid relaxations.

References

  1. Achterberg, T.: SCIP: solving constraint integer programs. Math. Program. Comput. 1, 1–41 (2009)

  2. Achterberg, T., Berthold, T., Koch, T., Wolter, K.: Constraint integer programming: a new approach to integrate CP and MIP. In: Proceedings of the Fifth International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pp. 6–20. Paris (2008)

  3. Adjiman, C.S., Dallwig, S., Floudas, C.A., Neumaier, A.: A global optimization method, \(\alpha \)BB, for general twice-differentiable constrained NLPs—I. Theoretical advances. Comput. Chem. Eng. 22, 1137–1158 (1998)

  4. Alefeld, G., Mayer, G.: Interval analysis: theory and applications. J. Comput. Appl. Math. 121, 421–464 (2000)

  5. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 3rd edn. Wiley, Hoboken (2006)

  6. Beckers, M., Mosenkis, V., Naumann, U.: Adjoint mode computation of subgradients for McCormick relaxations. In: Forth, S., Hovland, P., Phipps, E., Utke, J., Walther, A. (eds.) Recent Advances in Algorithmic Differentiation, pp. 103–113. Springer, Berlin (2012)

  7. Belotti, P.: COUENNE: A user’s manual. https://projects.coin-or.org/Couenne (2006)

  8. Bertsekas, D.P.: Nondifferentiable optimization via approximation. In: Balinski, M., Wolfe, P. (eds.) Mathematical Programming Study 3, pp. 1–25. North-Holland Publishing Company, Amsterdam (1975)

  9. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)

  10. Bompadre, A., Mitsos, A.: Convergence rate of McCormick relaxations. J. Glob. Optim. 52, 1–28 (2012)

  11. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  12. Broyden, C.G., Dennis Jr., J.E., Moré, J.J.: On the local and superlinear convergence of quasi-Newton methods. J. Inst. Math. Appl. 12, 223–245 (1973)

  13. Chachuat, B.: MC++: a toolkit for bounding factorable functions, v1.0. Retrieved 2 July 2014 https://projects.coin-or.org/MCpp (2014)

  14. Clarke, F.H.: Optimization and Nonsmooth Analysis. SIAM, Philadelphia (1990)

  15. Coddington, E.A., Levinson, N.: Theory of Ordinary Differential Equations. McGraw Hill Co., Inc., New York (1955)

  16. Du, K., Kearfott, R.B.: The cluster problem in multivariate global optimization. J. Glob. Optim. 5, 253–265 (1994)

  17. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. 2. Springer, New York (2003)

  18. Feehery, W.F., Tolsma, J.E., Barton, P.I.: Efficient sensitivity analysis of large-scale differential-algebraic systems. Appl. Numer. Math. 25, 41–54 (1997)

  19. Gabriel, S.A., Moré, J.J.: Smoothing of Mixed Complementarity Problems. Preprint MCS-P541-0995, Argonne National Laboratory (1995)

  20. Gill, P.E., Murray, W., Saunders, M.A.: SNOPT: an SQP algorithm for large-scale constrained optimization. SIAM Rev. 47(1), 99–131 (2005)

  21. Griewank, A., Rabier, P.J.: On the smoothness of convex envelopes. Trans. Am. Math. Soc. 322, 691–709 (1990)

  22. Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Other Titles in Applied Mathematics, 2nd edn. SIAM, Philadelphia (2008)

  23. Grossmann, I.E., Yeomans, H., Kravanja, Z.: A rigorous disjunctive optimization model for simultaneous flowsheet optimization and heat integration. Comput. Chem. Eng. 22(98), 157–164 (1998)

  24. Hartman, P.: Ordinary Differential Equations, 2nd edn. SIAM, Philadelphia (2002)

  25. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals. A Series of Comprehensive Studies in Mathematics. Springer, Berlin (1993)

  26. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods. A Series of Comprehensive Studies in Mathematics. Springer, Berlin (1993)

  27. Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches, 2nd edn. Springer, Berlin (1993)

  28. Kesavan, P., Allgor, R.J., Gatzke, E.P., Barton, P.I.: Outer approximation algorithms for separable nonconvex mixed-integer nonlinear programs. Math. Program. Ser. A 100, 517–535 (2004)

  29. Khan, K.A.: Sensitivity analysis for nonsmooth dynamic systems. Ph.D. thesis, Massachusetts Institute of Technology (2015)

  30. Kiwiel, K.C.: Methods of Descent for Nondifferentiable Optimization. Lecture Notes in Mathematics. Springer, Berlin (1985)

  31. Lemaréchal, C., Strodiot, J.J., Bihain, A.: On a bundle algorithm for nonsmooth optimization. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming 4. Academic Press, New York (1981)

  32. Li, X., Tomasgard, A., Barton, P.I.: Nonconvex generalized Benders decomposition for stochastic separable mixed-integer nonlinear programs. J. Optim. Theory Appl. 151, 425–454 (2011)

  33. Liberti, L., Pantelides, C.C.: Convex envelopes of monomials of odd degree. J. Glob. Optim. 25, 157–168 (2003)

  34. Lin, Y., Schrage, L.: The global solver in the LINDO API. Optim. Methods Softw. 24, 657–668 (2009)

  35. Mäkelä, M.M.: Multiobjective proximal bundle method for nonconvex nonsmooth optimization: Fortran subroutine MPBNGC 2.0. Reports of the Department of Mathematical Information Technology, Series B, Scientific computing B 13/2003, University of Jyväskylä (2003)

  36. Maly, T., Petzold, L.R.: Numerical methods and software for sensitivity analysis of differential-algebraic systems. Appl. Numer. Math. 20, 57–79 (1996)

  37. Mangasarian, O.L.: A simple characterization of solution sets of convex programs. Oper. Res. Lett. 7(1), 21–26 (1988)

  38. McCormick, G.P.: Computability of global solutions to factorable nonconvex programs: Part I—convex underestimating problems. Math. Program. 10, 147–175 (1976)

  39. Misener, R., Floudas, C.A.: ANTIGONE: algorithms for continuous/integer global optimization of nonlinear equations. J. Glob. Optim. 59, 503–526 (2014)

  40. Mitsos, A., Chachuat, B., Barton, P.I.: McCormick-based relaxations of algorithms. SIAM J. Optim. 20, 573–601 (2009)

  41. Moore, R.E.: Methods and Applications of Interval Analysis. SIAM, Philadelphia (1979)

  42. Najman, J., Mitsos, A.: Convergence analysis of multivariate McCormick relaxations. J. Glob. Optim. in press (2016)

  43. Naumann, U.: The Art of Differentiating Computer Programs. SIAM, Philadelphia (2012)

  44. Neumaier, A.: Interval Methods for Systems of Equations. Cambridge University Press, Cambridge (1990)

  45. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer series in operations research and financial engineering, 2nd edn. Springer, New York (2006)

  46. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Classics in Applied Mathematics. SIAM, Philadelphia (2000)

  47. Qi, L., Sun, D.: Smoothing functions and smoothing Newton method for complementarity and variational inequality problems. J. Optim. Theory Appl. 113, 121–147 (2002)

  48. Rockafellar, R.T.: Convex Analysis. Princeton Landmarks in Mathematics and Physics. Princeton University Press, Princeton (1970)

  49. Ryoo, H.S., Sahinidis, N.V.: Global optimization of nonconvex NLPs and MINLPs with applications in process design. Comput. Chem. Eng. 19(5), 551–566 (1995)

  50. Sahinidis, N.V.: BARON: a general purpose global optimization software package. J. Glob. Optim. 8, 201–205 (1996)

  51. Sahinidis, N.V.: BARON 15.9: Global Optimization of Mixed-Integer Nonlinear Programs, User’s Manual. https://www.gams.com/help/topic/gams.doc/solvers/baron/index.html (2015)

  52. Schaber, S.D.: Tools for dynamic model development. Ph.D. thesis, Massachusetts Institute of Technology (2014)

  53. Scholz, D.: Theoretical rate of convergence for interval inclusion functions. J. Glob. Optim. 53, 749–767 (2012)

  54. Scott, J.K.: Reachability analysis and deterministic global optimization of differential-algebraic systems. Ph.D. thesis, Massachusetts Institute of Technology (2012)

  55. Scott, J.K., Barton, P.I.: Convex and concave relaxations for the parametric solutions of semi-explicit index-one differential-algebraic equations. J. Optim. Theory Appl. 156, 617–649 (2013)

  56. Scott, J.K., Barton, P.I.: Improved relaxations for the parametric solutions of ODEs using differential inequalities. J. Glob. Optim. 57, 143–176 (2013)

  57. Scott, J.K., Chachuat, B., Barton, P.I.: Nonlinear convex and concave relaxations for the solutions of parametric ODEs. Optim. Control Appl. Methods 34, 145–163 (2013)

  58. Scott, J.K., Stuber, M.D., Barton, P.I.: Generalized McCormick relaxations. J. Glob. Optim. 51, 569–606 (2011)

  59. Shor, N.Z.: Minimization Methods for Non-differentiable Functions. Springer series in computational mathematics. Springer, Berlin (1985)

  60. Stuber, M.D., Scott, J.K., Barton, P.I.: Convex and concave relaxations of implicit functions. Optim. Methods Softw. 30(3), 424–460 (2015)

  61. Tawarmalani, M., Sahinidis, N.V.: Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications. Nonconvex Optimization and Its Applications. Springer, Dordrecht (2002)

  62. Tawarmalani, M., Sahinidis, N.V.: Global optimization of mixed-integer nonlinear programs: a theoretical and computational study. Math. Program. A 99, 563–591 (2004)

  63. Tsoukalas, A., Mitsos, A.: Multivariate McCormick relaxations. J. Glob. Optim. 59, 633–662 (2014)

  64. Watson, H.A.J., Khan, K.A., Barton, P.I.: Multistream heat exchanger modeling and design. AIChE J. 61(10), 3390–3403 (2015)

  65. Wechsung, A.: Global optimization in reduced space. Ph.D. thesis, Massachusetts Institute of Technology (2014)

  66. Wechsung, A., Aspelund, A., Gundersen, T., Barton, P.I.: Synthesis of heat exchanger networks at subambient conditions with compression and expansion of process streams. AIChE J. 57(8), 2090–2108 (2011)

  67. Wechsung, A., Schaber, S.D., Barton, P.I.: The cluster problem revisited. J. Glob. Optim. 58, 429–438 (2014)

  68. Wechsung, A., Scott, J.K., Watson, H.A.J., Barton, P.I.: Reverse propagation of McCormick relaxations. J. Glob. Optim. 63(1), 1–36 (2015)

  69. Whitney, H.: Analytic extensions of differentiable functions defined in closed sets. Trans. Amer. Math. Soc. 36, 63–89 (1934)

Acknowledgments

The authors would like to thank Achim Wechsung and Spencer Schaber for several helpful discussions.

Author information

Correspondence to Paul I. Barton.

Additional information

This material was supported by Novartis Pharmaceuticals as part of the Novartis-MIT Center for Continuous Manufacturing and by Statoil, and was based in part on work supported by the US Department of Energy, Office of Science, under contract DE-AC02-06CH11357.

A correction to this article is available online at https://doi.org/10.1007/s10898-017-0601-2.

Appendices

Proofs of results

1.1 Proof of Proposition 3

This proof proceeds by showing that the requirements of [69, Theorem I] are met by f on C. Since the components of f may be considered separately, it suffices to consider the case in which \(m=1\). For each \(x\in C\), assume that \(N_x\) is convex without loss of generality; if this is not true, then redefine \(N_x\) to be an open convex subset containing x. Since C is compact, there exists a finite subset \(I\subset C\) for which \(C\subset \bigcup _{x\in I}N_x\).

Suppose, to obtain a contradiction, that there exist \(x,\xi \in I\) and \(y\in C\) for which \(y\in N_x\cap N_\xi \) but \({\nabla } \phi _x(y)\ne {\nabla } \phi _\xi (y)\). Since C is convex and has nonempty interior, either \(y\in \mathrm {int}(C)\) or \(y\in \mathrm {bd}\left( \mathrm {int}(C)\right) \). In either case, there exists a sequence \(\{z_{(i)}\}_{i\in {\mathbb {N}}}\) in the nonempty open set \(\tilde{N}:=N_x\cap N_\xi \cap \mathrm {int}(C)\) that converges to y. Since \(\phi _x\equiv \phi _\xi \equiv f\) on \(\tilde{N}\), \({\nabla } \phi _x(z_{(i)})={\nabla } \phi _\xi (z_{(i)})\) for each i. The continuity of \({\nabla } \phi _x\) and \({\nabla } \phi _\xi \) on \(\tilde{N}\) then yields \({\nabla } \phi _{x}(y)={\nabla } \phi _\xi (y)\), which contradicts the choices of x, \(\xi \), and y. Thus, there exists a single continuous function g on C for which, for each \(x\in I\), \(g\equiv {\nabla } \phi _x\) on \(N_x\cap C\).

To show that f is Whitney-\({\mathscr {C}}^1\) on C, it suffices in light of [69, Theorem I] to show that, for each \(\epsilon >0\), there exists \(\delta _\epsilon >0\) for which

$$\begin{aligned} \Vert f(y)-f(x)-\langle g(x), y-x \rangle \Vert <\epsilon \Vert y-x\Vert \end{aligned}$$
(16)

whenever \(x,y\in C\) and \(\Vert y-x\Vert <\delta _\epsilon \). Thus, choose any \(\epsilon >0\). Since g is continuous on the compact set C, g is uniformly continuous on C; there exists \(\tilde{\delta }_\epsilon >0\) for which \(\Vert g(y)-g(x)\Vert <\epsilon \) whenever \(x,y\in C\) satisfy \(\Vert y-x\Vert <\tilde{\delta }_\epsilon \). Now, consider any \(x,y\in C\) with \(\Vert y-x\Vert <\tilde{\delta }_{\epsilon }\); the bound (16) will be shown to hold for x and y.

Define the line segment \(L:=\mathrm {conv}\{x,y\}\). Since L is compact and \(L\subset C\), choose \(J\subset I\) as a set for which \(L\subset \bigcup _{\xi \in J}N_\xi \) but \(L\not \subset (\bigcup _{\xi \in J}N_\xi )\backslash N_\eta \) for each \(\eta \in J\). Using these constructions, choose \(k\in {\mathbb {N}}\), \(0=\lambda _0<\lambda _1<\cdots <\lambda _{k}=1\), and \(\xi _{(1)},\ldots ,\xi _{(k)}\in J\) for which:

  • \(x_{(0)}:=x\in N_{\xi _{(1)}}\),

  • \(x_{(k)}:=y\in N_{\xi _{(k)}}\), and

  • \(x_{(q)}:=(1-\lambda _q)x + \lambda _qy\in N_{\xi _{(q+1)}}\cap N_{\xi _{(q)}}\cap L\), for each \(q\in \{1,2,\ldots ,k-1\}\).

Observe that, for each \(q\in \{1,\ldots ,k\}\), \(x_{(q-1)}\in N_{\xi _{(q)}}\) and \(x_{(q)}\in N_{\xi _{(q)}}\). So, the mean-value theorem and the established properties of g yield the following, for some \(y_{(q)}\in \mathrm {conv}\{x_{(q-1)},x_{(q)}\}\subset L\):

$$\begin{aligned} f\left( x_{(q)}\right) -f\left( x_{(q-1)}\right) = \left\langle {{\nabla } \phi _{\xi _{(q)}}(y_{(q)})},{x_{(q)}-x_{(q-1)}}\right\rangle = \left\langle {g(y_{(q)})},{x_{(q)}-x_{(q-1)}}\right\rangle . \end{aligned}$$

Hence,

$$\begin{aligned}&\Vert f(y)-f(x)-\langle g(x), y-x \rangle \Vert \\&\qquad = \left\| \sum _{q=1}^k\left( f\left( x_{(q)}\right) -f\left( x_{(q-1)}\right) -\left\langle {g(x)},{x_{(q)}-x_{(q-1)}}\right\rangle \right) \right\| \\&\qquad =\left\| \sum _{q=1}^k\left\langle {g\left( y_{(q)}\right) -g(x)},{x_{(q)}-x_{(q-1)}}\right\rangle \right\| \\&\qquad \le \sum _{q=1}^k\left\| g\left( y_{(q)}\right) -g(x)\right\| \left\| x_{(q)}-x_{(q-1)}\right\| \\&\qquad \le \sum _{q=1}^k\epsilon \left\| x_{(q)}-x_{(q-1)}\right\| =\epsilon \Vert y-x\Vert , \end{aligned}$$

as required; the final equation above follows from the definitions of each \(x_{(q)}\) and the inequality chain \(0=\lambda _0<\lambda _1<\cdots <\lambda _k=1\). \(\square \)

1.2 Proof of Proposition 4

This proof employs the following intermediate result.

Lemma 1

Consider an interval \([\underline{x},\overline{x}]\subset {\mathbb {R}}\), a Lipschitz continuous function \(f:[\underline{x},\overline{x}]\rightarrow {\mathbb {R}}\), and the convex envelope \(\underline{f}^{\mathrm {C}}\) of f on \([\underline{x},\overline{x}]\). Then, \(\underline{f}^{\mathrm {C}}(\underline{x})=f(\underline{x})\) and \(\underline{f}^{\mathrm {C}}(\overline{x})=f(\overline{x})\). Moreover, \(\underline{f}^{\mathrm {C}}\) is Lipschitz continuous on \([\underline{x},\overline{x}]\), with the same Lipschitz constant as f. Analogous results hold for the concave envelope of f on \([\underline{x},\overline{x}]\).

Proof

Only the convex envelope of f will be considered here; a similar argument addresses the concave envelope. The required results are trivial if \(\underline{x}=\overline{x}\), so assume that \(\underline{x}<\overline{x}\). Let \(k_f\) denote a Lipschitz constant for f on \([\underline{x},\overline{x}]\). Applying the definition of the convex envelope,

$$\begin{aligned} f(y)\ge \underline{f}^{\mathrm {C}}(y)\ge f(\underline{x})-k_f\left( y-\underline{x}\right) ,\quad \forall y\in [\underline{x},\overline{x}]; \end{aligned}$$
(17)

the first inequality above is due to f dominating \(\underline{f}^{\mathrm {C}}\), and the second inequality is due to \(\underline{f}^{\mathrm {C}}\) dominating each convex underestimator of f on \([\underline{x},\overline{x}]\) (here, the affine mapping \(y\mapsto f(\underline{x})-k_f(y-\underline{x})\), which underestimates f by Lipschitz continuity). Setting y to \(\underline{x}\) in the above inequality chain yields \(\underline{f}^{\mathrm {C}}(\underline{x})=f(\underline{x})\).

A similar argument yields:

$$\begin{aligned} f(y)\ge \underline{f}^{\mathrm {C}}(y)\ge f(\overline{x})-k_f\left( \overline{x}-y\right) ,\quad \forall y\in [\underline{x},\overline{x}]; \end{aligned}$$
(18)

setting y to \(\overline{x}\) yields \(\underline{f}^{\mathrm {C}}(\overline{x})=f(\overline{x})\).

Thus, with these endpoint equalities, (17) and (18) become:

$$\begin{aligned} \underline{f}^{\mathrm {C}}(y)\ge \underline{f}^{\mathrm {C}}(\underline{x})-k_f\left( y-\underline{x}\right) \quad \text {and}\quad \underline{f}^{\mathrm {C}}(y)\ge \underline{f}^{\mathrm {C}}(\overline{x})-k_f\left( \overline{x}-y\right) ,\quad \forall y\in [\underline{x},\overline{x}]. \end{aligned}$$

Defining \(D_+\underline{f}^{\mathrm {C}}\) and \(D_-\underline{f}^{\mathrm {C}}\) as the right-derivative and left-derivative of \(\underline{f}^{\mathrm {C}}\) described in [25, Theorem I.4.1.1], it follows from [25, Proposition I.4.1.3] that \(D_+\underline{f}^{\mathrm {C}}(\underline{x})\) and \(D_-\underline{f}^{\mathrm {C}}(\overline{x})\) both exist, are finite, and satisfy \(D_+\underline{f}^{\mathrm {C}}(\underline{x}) \ge -k_f\) and \(D_-\underline{f}^{\mathrm {C}}(\overline{x}) \le k_f\). Thus, \(\underline{f}^{\mathrm {C}}\) is continuous at \(\underline{x}\) and \(\overline{x}\). Moreover, [25, Theorem I.4.2.1] implies that, for each \(y\in (\underline{x},\overline{x})\), each subgradient of \(\underline{f}^{\mathrm {C}}\) at y is an element of \([-k_f,k_f]\). This result, combined with the mean-value theorem [25, Theorem I.4.2.4], shows that \(\underline{f}^{\mathrm {C}}\) is Lipschitz continuous on \([\underline{x},\overline{x}]\), with a Lipschitz constant of \(k_f\). \(\square \)

Using the above lemma, Proposition 4 may be proved as follows. Only the convex envelope of f will be considered here; a similar argument addresses the concave envelope. The required result is trivial if \(\underline{x}=\overline{x}\), so assume that \(\underline{x}<\overline{x}\). Theorem 3.2 in [21] implies that \(\underline{f}^{\mathrm {C}}\) is \({\mathscr {C}}^1\) on \((\underline{x},\overline{x})\); it remains to be shown that \(\underline{f}^{\mathrm {C}}\) is also \({\mathscr {C}}^1\) at \(\underline{x}\) and \(\overline{x}\). Noting that f is Lipschitz continuous on \([\underline{x},\overline{x}]\), construct the right-derivative \(D_+\underline{f}^{\mathrm {C}}\) and the left-derivative \(D_-\underline{f}^{\mathrm {C}}\) as in the proof of Lemma 1. As in the proof of Lemma 1, \(D_+\underline{f}^{\mathrm {C}}(\underline{x})\) and \(D_-\underline{f}^{\mathrm {C}}(\overline{x})\) each exist and are finite. Define the following function, which extends the domain of \(\underline{f}^{\mathrm {C}}\) to \({\mathbb {R}}\):

$$\begin{aligned} \psi :{\mathbb {R}}\rightarrow {\mathbb {R}}:y\mapsto \left\{ \begin{array}{ll} \underline{f}^{\mathrm {C}}(\underline{x}) + D_+\underline{f}^{\mathrm {C}}(\underline{x})\left( y-\underline{x}\right) ,&{}\quad \text {if }y<\underline{x}, \\ \underline{f}^{\mathrm {C}}(y),&{}\quad \text {if }\underline{x}\le y\le \overline{x}, \\ \underline{f}^{\mathrm {C}}(\overline{x}) + D_-\underline{f}^{\mathrm {C}}(\overline{x})\left( y-\overline{x}\right) ,&{}\quad \text {if }y>\overline{x}. \end{array} \right. \end{aligned}$$

The function \(\psi \) is evidently continuous, and is \({\mathscr {C}}^1\) at each \(y\in {\mathbb {R}}\backslash \{\underline{x},\overline{x}\}\). Applying the definitions of \(D_+\underline{f}^{\mathrm {C}}\) and \(D_-\underline{f}^{\mathrm {C}}\), it follows that \(\psi \) is differentiable at \(\underline{x}\) and \(\overline{x}\) as well; thus,

$$\begin{aligned} \psi '(\underline{x})=D_+\underline{f}^{\mathrm {C}}(\underline{x})\quad \text {and}\quad \psi '(\overline{x})=D_-\underline{f}^{\mathrm {C}}(\overline{x}). \end{aligned}$$

This equation, together with [25, Theorem I.4.2.1(iii)], shows that \(\psi \) is \({\mathscr {C}}^1\) even at \(\underline{x}\) and \(\overline{x}\), and is therefore \({\mathscr {C}}^1\) on \({\mathbb {R}}\). Hence, \(\underline{f}^{\mathrm {C}}\) is Whitney-\({\mathscr {C}}^1\) on \([\underline{x},\overline{x}]\). \(\square \)

1.3 Proof of Theorem 4

The proof of Theorem 4 uses several intermediate results concerning a generic optimal-value function described by the following assumption.

Assumption 3

For some \(m\in {\mathbb {N}}\), consider a convex open set \(X\subset {\mathbb {R}}^m\) and a convex set \(C\subset X\) with nonempty interior. Define sets:

$$\begin{aligned} K&:=\{(\ell ,u): \ell \in C,\,\, u\in C,\,\, \ell \le u\}\subset {\mathbb {R}}^{2m}, \\ \text {and}\quad H&:=\{(\ell ,u): \ell \in X,\,\, u\in X,\,\, \ell \le u\}\subset {\mathbb {R}}^{2m}. \end{aligned}$$

Consider a convex \({\mathscr {C}}^1\) function \(\psi :X\rightarrow {\mathbb {R}}\), and an optimal-value function:

$$\begin{aligned} \gamma :H\rightarrow {\mathbb {R}}:(\ell ,u)\mapsto \min \{\psi (\xi ):\ell \le \xi \le u\}. \end{aligned}$$

Lemma 2

Suppose that Assumption 3 holds with \(m=2\). Define a function

$$\begin{aligned} \omega :H\times X\rightarrow {\mathbb {R}}:((a,b),c)\mapsto \gamma ((c,a),(c,b)). \end{aligned}$$

The function \(\omega \) is convex and \({\mathscr {C}}^1\) on \(\mathrm {int}(H)\times X\).

Proof

For each \(((a,b),c)\in H\times X\), observe that \(\omega ((a,b),c)=\min \{\psi (\xi ):\xi _1=c,\,\,a\le \xi _2\le b\}\). Hence, according to [48, Section 29], \(\omega \) is convex. It then suffices by [48, Corollary 25.5.1] to show that \(\omega \) is differentiable at some arbitrary \(((\hat{a},\hat{b}),\hat{c})\in \mathrm {int}(H)\times X\). Let (CP\({}_\omega \)) denote the convex program \(\min \{\psi (\xi ):\xi _1=\hat{c},\,\,\hat{a}\le \xi _2\le \hat{b}\}\). Weierstrass’s Theorem guarantees the existence of an optimal solution of (CP\({}_\omega \)); thus, choose a particular solution \(\xi ^*\in C\). By [48, Theorem 28.3], there exists a Karush-Kuhn-Tucker (KKT) vector \((\lambda ,\mu )\in {\mathbb {R}}\times {\mathbb {R}}^2\) satisfying the following KKT conditions (among others) for all solutions \(\eta ^*\) of (CP\({}_\omega \)) simultaneously:

$$\begin{aligned} 0&= {\nabla } {\psi }(\eta ^*) + \lambda e_{(1)} + \left( \mu _2-\mu _1\right) e_{(2)}, \\ \lambda&\in {\mathbb {R}}, \quad \mu _1\ge 0, \quad \mu _2\ge 0, \quad \mu _1\left( \hat{a}-\eta ^*_1\right) =0,\quad \mu _2\left( \eta ^*_2-\hat{b}\right) =0. \end{aligned}$$

Moreover, since \(\hat{a}<\hat{b}\), any such vector \((\lambda ,\mu )\) is unique; when \(\eta ^*:=\xi ^*\), the above KKT conditions imply:

$$\begin{aligned} \lambda = -\frac{\partial {\psi }}{\partial {\xi _1}}(\xi ^*), \quad \mu _1=\left\{ \begin{array}{ll} \frac{\partial {{\psi }}}{\partial {\xi _2}}(\xi ^*),&{}\quad \text {if }\xi ^*_2=\hat{a}, \\ 0, &{}\quad \text {if }\xi ^*_2\ne \hat{a}, \end{array} \right. \quad \text {and}\quad {\mu }_2=\left\{ \begin{array}{ll} -\frac{\partial {{\psi }}}{\partial {\xi _2}}(\xi ^*),&{} \quad \text {if }\xi ^*_2=\hat{b}, \\ 0, &{} \quad \text {if }\xi ^*_2\ne \hat{b}. \end{array} \right. \end{aligned}$$

According to [48, Corollary 29.1.3], this uniqueness shows that \(\omega \) is differentiable at \(((\hat{a},\hat{b}),\hat{c})\), as required. \(\square \)

Lemma 3

Suppose that Assumption 3 holds, \(m=2\), C is compact, and there exists a vector \(d\in {\mathbb {R}}^2\) such that, for all \(\xi \in X\), \(\langle {\nabla } \psi (\xi ), d \rangle > 0\). The function \(\gamma \) is Whitney-\({\mathscr {C}}^1\) on the compact set K.

Proof

Without loss of generality, suppose that \(d\ge 0\); the other cases are handled similarly. Observe that K has nonempty interior under Assumption 3; to see this, choose any \(x\in \mathrm {int}(C)\) and let \(e\in {\mathbb {R}}^m\) be a vector whose components are all equal to unity. For sufficiently small \(\tau >0\), \((x,x+\tau e)\in \mathrm {int}(K)\), as required.

By Proposition 3, it then suffices to show that, for some arbitrary \((\ell ,u)\in K\), there exists a neighborhood \(N\subset {\mathbb {R}}^2\times {\mathbb {R}}^2\) of \((\ell ,u)\) and a \({\mathscr {C}}^1\) function \(\phi :N\rightarrow {\mathbb {R}}\) for which \(\phi \equiv \gamma \) on \(N\cap K\). Now,

$$\begin{aligned} 0<\langle {\nabla } \psi (\ell ), d \rangle =d_1\frac{\partial {\psi }}{\partial {x_1}}(\ell ) + d_2\frac{\partial {\psi }}{\partial {x_2}}(\ell ). \end{aligned}$$

Since \(d\ge 0\), the above inequality implies that \(\frac{\partial {\psi }}{\partial {x_i}}(\ell )> 0\) for some \(i\in \{1,2\}\). Suppose that \(\frac{\partial {\psi }}{\partial {x_1}}(\ell )> 0\); the case in which \(\frac{\partial {\psi }}{\partial {x_2}}(\ell )> 0\) is handled similarly. Since \({\nabla } \psi \) is continuous on X, there exists a neighborhood \(N_\ell \subset X\) of \(\ell \) for which \(\frac{\partial {\psi }}{\partial {x_1}}(a)>0\) for each \(a\in N_\ell \). Since \(N_\ell \) is open and \(C\subset X\) is compact, choose \(\delta >0\) for which:

  • \(y\in X\) whenever \(x\in C\) and \(\Vert y-x\Vert <3\delta \), and

  • \(a\in N_\ell \) whenever \(\Vert a-\ell \Vert <3\delta \).

Define a neighborhood

$$\begin{aligned} N_{(\ell ,u)}:=\left\{ (a,b)\in X^2:\Vert (a,b)-(\ell ,u)\Vert <\delta \right\} , \end{aligned}$$

and a function:

$$\begin{aligned} \phi :N_{(\ell ,u)}\rightarrow {\mathbb {R}}: (a,b)&\mapsto \gamma ((a_1,a_2),(a_1,u_2+2\delta )) \\&\qquad +{} \gamma ((a_1,\ell _2-2\delta ),(a_1,b_2)) - \gamma ((a_1,\ell _2-2\delta ),(a_1,u_2+2\delta )). \end{aligned}$$

Since \(\ell ,u\in C\) and \(\ell \le u\), if \((a,b)\in N_{(\ell ,u)}\), then \(a_2<u_2+2\delta \) and \(\ell _2-2\delta <b_2\). Thus, \(\phi \) is indeed well-defined. Lemma 2 shows that each “\(\gamma \)” term in the definition of \(\phi \) is \({\mathscr {C}}^1\) with respect to (ab), which in turn shows that \(\phi \) is \({\mathscr {C}}^1\) on \(N_{(\ell ,u)}\).

To complete this proof, it will be shown that \(\phi \equiv \gamma \) on \(N_{(\ell ,u)}\cap K\). Consider some arbitrary point \((a,b)\in N_{(\ell ,u)}\cap K\), and let (\(\text {CP}_\gamma \)) denote the convex program:

$$\begin{aligned} \min \{\psi (\xi ):a\le \xi \le b\}. \end{aligned}$$

First, it will be shown that the set \(\{\xi \in [a,b]:\xi _1=a_1\text { or }\xi _2=a_2\}\) contains all solutions of (\(\text {CP}_\gamma \)). To obtain a contradiction, suppose that this is not so, and choose some solution \(\eta ^*\) of (\(\text {CP}_\gamma \)) accordingly for which \(a_1<\eta ^*_1\le b_1\) and \(a_2<\eta ^*_2\le b_2\). Since \(d\ge 0\), there exists \(\tau >0\) solving the linear program:

$$\begin{aligned} \max \left\{ \tau \ge 0: \eta ^*_1-\tau d_1\ge a_1, \quad \eta ^*_2-\tau d_2\ge a_2\right\} . \end{aligned}$$

Thus, with \(\zeta :=\eta ^*-\tau d\), \(\zeta \in [a,b]\) and either \(\zeta _1=a_1\) or \(\zeta _2=a_2\). By the mean-value theorem, there exists \(s\in [0,\tau ]\) for which

$$\begin{aligned} \psi (\zeta )=\psi (\eta ^*-\tau d) = \psi (\eta ^*) - \tau \langle {\nabla } \psi (\eta ^*-sd), d \rangle < \psi (\eta ^*), \end{aligned}$$

which contradicts the definition of \(\eta ^*\).

Thus, all solutions of (\(\text {CP}_\gamma \)) lie in the set \(\{\xi \in [a,b]:\xi _1=a_1\text { or }\xi _2=a_2\}\). Now, since the mapping \(t\mapsto \psi (a+te_{(1)})\) is convex on \([0,b_1-a_1]\), the mapping \(t\mapsto \frac{\partial {\psi }}{\partial {x_1}}(a+te_{(1)})\) is increasing on \([0,b_1-a_1]\). This implies:

$$\begin{aligned} 0\le t\frac{\partial {\psi }}{\partial {x_1}}(a) \le \int _0^t\frac{\partial {\psi }}{\partial {x_1}}\left( a+se_{(1)}\right) \,{\hbox {d}}s =\psi \left( a+te_{(1)}\right) -\psi (a), \quad \forall t\in [0,b_1-a_1]. \end{aligned}$$

Hence, there exists a solution of (\(\text {CP}_\gamma \)) in the set \(\{\xi \in [a,b]:\xi _1=a_1\}\), which implies that

$$\begin{aligned} \gamma (a,b)=\gamma ((a_1,a_2),(a_1,b_2)). \end{aligned}$$

The convexity of \(\psi \) then yields:

$$\begin{aligned} \gamma (a,b)=\max \left\{ \gamma ((a_1,a_2),(a_1,u_2+2\delta )),\,\,\, \gamma ((a_1,\ell _2-2\delta ),(a_1,b_2))\right\} ; \end{aligned}$$
(19)

to see this, assume, to obtain a contradiction, that (19) does not hold. In this case, since \((a,b)\in N_{(\ell ,u)}\) implies that \(b_2<u_2+2\delta \) and \(\ell _2-2\delta <a_2\), the definition of \(\gamma \) implies that the left-hand side of (19) is strictly greater than the right-hand side. Thus, there exist \(\underline{\beta }\in [\ell _2-2\delta ,b_2]\) and \(\overline{\beta }\in [a_2,u_2+2\delta ]\) for which

$$\begin{aligned} \psi (a_1,\underline{\beta })&=\gamma ((a_1,\ell _2-2\delta ),(a_1,b_2))<\gamma ((a_1,a_2),(a_1,b_2)) \\ \text {and}\quad \psi (a_1,\overline{\beta })&=\gamma ((a_1,a_2),(a_1,u_2+2\delta ))<\gamma ((a_1,a_2),(a_1,b_2)). \end{aligned}$$

The above inequalities and the definition of \(\gamma \) imply that neither \(\underline{\beta }\) nor \(\overline{\beta }\) is contained in the interval \([a_2,b_2]\); thus, \(\underline{\beta }\in [\ell _2-2\delta ,a_2]\) and \(\overline{\beta }\in [b_2,u_2+2\delta ]\). Since \([a_2,b_2]\subset [\underline{\beta },\overline{\beta }]\), the convexity of \(\psi \) would then imply that

$$\begin{aligned} \psi \left( a_1,\tfrac{1}{2}(a_2+b_2)\right) < \gamma ((a_1,a_2),(a_1,b_2)), \end{aligned}$$

which is contradicted by the definition of \(\gamma \).

Lastly, the following equation follows from the definition of \(\gamma \):

$$\begin{aligned}&\gamma ((a_1,\ell _2-2\delta ),(a_1,u_2+2\delta ))\nonumber \\&\quad = \min \left\{ \gamma ((a_1,a_2),(a_1,u_2+2\delta )),\,\,\, \gamma ((a_1,\ell _2-2\delta ),(a_1,b_2))\right\} . \end{aligned}$$
(20)

Along with the definition of \(\phi \), (19) and (20) show that \(\phi (a,b)=\gamma (a,b)\), as required. \(\square \)
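The coordinate-direction gradient inequality used in the proof above (nonnegativity of \(t\,\frac{\partial \psi }{\partial x_1}(a)\) together with \(t\,\frac{\partial \psi }{\partial x_1}(a)\le \psi (a+te_{(1)})-\psi (a)\)) can be illustrated numerically. The convex function below is a hypothetical stand-in for the \(\psi \) of the proof, chosen so that its first partial derivative at \(a\) is nonnegative:

```python
# Sketch: for a convex psi whose partial derivative d(psi)/dx1 is increasing
# along the segment and nonnegative at a, the bound
#   0 <= t * dpsi_dx1(a) <= psi(a + t*e1) - psi(a)
# holds for all t in [0, b1 - a1]. psi below is a hypothetical example,
# not the psi of the proof.

def psi(x1, x2):
    return x1 ** 2 + x1 + x2 ** 2  # convex; dpsi/dx1(0, 0) = 1 >= 0

def dpsi_dx1(x1, x2):
    return 2.0 * x1 + 1.0

a = (0.0, 0.0)
for t in [0.0, 0.25, 0.5, 1.0]:
    lower = t * dpsi_dx1(*a)
    upper = psi(a[0] + t, a[1]) - psi(*a)  # = t**2 + t here
    assert 0.0 <= lower <= upper
```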

Lemma 4

Suppose that Assumption 3 holds, \(m=2\), \(X={\mathbb {R}}^2\), C is compact, and there exists a nonzero vector \(d\in {\mathbb {R}}^2\) such that, for all \(\xi \in {\mathbb {R}}^2\), \(\langle {\nabla } \psi (\xi ), d \rangle = 0\). The function \(\gamma \) is Whitney-\({\mathscr {C}}^1\) on the compact set K.

Proof

Since \(d\ne 0\), either \(d_1\ne 0\), \(d_2\ne 0\), or both. Thus, without loss of generality, suppose that \(d_1\ge 0\) and \(d_2>0\); the other cases are handled similarly. Let \(\pi \) denote the linear transformation \(\xi \in {\mathbb {R}}^2\mapsto (\xi _1-(\frac{d_1}{d_2})\xi _2,0)\in {\mathbb {R}}^2\), and observe that \(\pi (\xi ) = \xi - (\tfrac{\xi _2}{d_2})d\) for each \(\xi \in {\mathbb {R}}^2\). Thus,

$$\begin{aligned} \psi (\pi (\xi ))=\psi (\xi )-\int _0^{\tfrac{\xi _2}{d_2}}\left\langle {{\nabla } \psi (\xi -sd)},{d}\right\rangle \mathrm {d}s=\psi (\xi ), \quad \forall \xi \in {\mathbb {R}}^2; \end{aligned}$$

which implies that, for each \((\ell ,u)\in K\),

$$\begin{aligned} \gamma (\ell ,u)=\min \{\psi (\eta ):\eta =\pi (\xi ),\,\,\ell \le \xi \le u\}. \end{aligned}$$

Since the transformation \(\pi \) is linear and \([\ell ,u]\) is convex, the set \(\{\pi (\xi ):\ell \le \xi \le u\}\) is the convex hull of \(\{\pi (\ell _1,\ell _2),\pi (\ell _1,u_2),\pi (u_1,\ell _2),\pi (u_1,u_2)\}\). Since \(d\ge 0\), this set is readily evaluated to be:

$$\begin{aligned} \{\pi (\xi ):\ell \le \xi \le u\} = \left\{ (\eta _1,0)\in {\mathbb {R}}^2:\ell _1-\left( \tfrac{d_1}{d_2}\right) u_2 \le \eta _1 \le u_1-\left( \tfrac{d_1}{d_2}\right) \ell _2\right\} . \end{aligned}$$

The following reformulation of \(\gamma \) is thus obtained:

$$\begin{aligned} \gamma (\ell ,u) = \min \left\{ \psi (\eta _1,0): \ell _1-\left( \tfrac{d_1}{d_2}\right) u_2 \le \eta _1 \le u_1-\left( \tfrac{d_1}{d_2}\right) \ell _2\right\} , \quad \forall (\ell ,u)\in K. \end{aligned}$$

Since the univariate mapping \(\eta _1\in {\mathbb {R}}\mapsto \psi (\eta _1,0)\) is convex and \({\mathscr {C}}^1\), Theorem 2 implies that \(\gamma \) is Whitney-\({\mathscr {C}}^1\) on K. \(\square \)
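The projection \(\pi \) and the interval image of the box \([\ell ,u]\) computed in the proof of Lemma 4 can be sketched numerically. The direction \(d\) and the box bounds below are illustrative placeholders, not values from the article:

```python
# Sketch of the projection in Lemma 4: with d = (d1, d2), d1 >= 0, d2 > 0,
#   pi(xi) = (xi1 - (d1/d2)*xi2, 0),
# the image of the box [l, u] is the segment
#   [l1 - (d1/d2)*u2, u1 - (d1/d2)*l2] x {0}.
# d and the bounds below are illustrative, not from the article.

d1, d2 = 1.0, 2.0
l = (-1.0, 0.0)
u = (2.0, 3.0)

def pi(xi1, xi2):
    return (xi1 - (d1 / d2) * xi2, 0.0)

lo = l[0] - (d1 / d2) * u[1]
hi = u[0] - (d1 / d2) * l[1]

# The image of the box is the convex hull of the corner images, so checking
# the four corners suffices: their images span exactly [lo, hi].
corners = [(l[0], l[1]), (l[0], u[1]), (u[0], l[1]), (u[0], u[1])]
images = [pi(x, y)[0] for (x, y) in corners]
assert min(images) == lo and max(images) == hi
```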

Theorem 4 is then proved as follows. The claim of the theorem regarding \(\underline{g}^{\mathrm {C}}\) will be demonstrated; a similar argument yields the claim regarding \(\overline{g}^{\mathrm {C}}\). Define a compact convex set

$$\begin{aligned} B:=\{(\ell ,u): \ell \in X, u\in X, \ell \le u\}\subset {\mathbb {R}}^2\times {\mathbb {R}}^2. \end{aligned}$$

Lemmas 3 and 4 show that the function \(\gamma :B\rightarrow {\mathbb {R}}:(\ell ,u)\mapsto \min \left\{ \underline{\phi }^{\mathrm {C}}(\xi ):\ell \le \xi \le u\right\} \) is Whitney-\({\mathscr {C}}^1\). Observing that \(\underline{g}^{\mathrm {C}}\) is equivalent to the mapping

$$\begin{aligned} z\mapsto \gamma \left( \underline{f}^{\mathrm {C}}_1(z),\ldots ,\underline{f}^{\mathrm {C}}_m(z),\overline{f}^{\mathrm {C}}_1(z),\ldots ,\overline{f}^{\mathrm {C}}_m(z)\right) \end{aligned}$$

on Z, the chain rule for Whitney-\({\mathscr {C}}^1\) functions applies to this representation of \(\underline{g}^{\mathrm {C}}\). This yields the required result. \(\square \)

Gradients for suggested multivariate relaxations

The following two propositions present gradients for the Whitney-\({\mathscr {C}}^1\) relaxations provided in Theorems 6 and 7. In each case, the provided gradients may be computed directly using the chain rule for Whitney-\({\mathscr {C}}^1\) functions.

Proposition 15

Suppose that the conditions of Theorem 6 hold. Gradients for the relaxations \(\underline{g}^{\mathrm {C}}_\times \) and \(\overline{g}^{\mathrm {C}}_\times \) are as follows, at any \(z\in Z\). Arguments of partial derivatives are suppressed here, and are the same as the analogous function arguments in Theorem 6. The partial derivatives of \(\underline{\psi }_{\times ,\mathrm {A}}\) may be computed at any argument corresponding to the “min” in the definition of \(\underline{g}^{\mathrm {C}}_{\times ,\mathrm {A}}\); this follows from a gradient invariance property of Mangasarian [37].

$$\begin{aligned} {\nabla } \underline{g}^{\mathrm {C}}_\times (z)&=\left\{ \begin{array}{ll} \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {x}}\,{\nabla } \underline{f}^{\mathrm {C}}_1(z) + \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {y}}\,{\nabla } \underline{f}^{\mathrm {C}}_2(z), &{}\quad \text {if both }0\le {\underline{x}_1}\text { and }0\le \underline{x}_2, \\ -\frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {x}}\,{\nabla } \overline{f}^{\mathrm {C}}_1(z) - \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {y}}\,{\nabla } \overline{f}^{\mathrm {C}}_2(z), &{}\quad \text {if both }{\overline{x}_1}\le 0\text { and }\overline{x}_2\le 0, \\ \max \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {x}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_1(z) + \min \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {x}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_1(z) \\ \qquad {}+ \max \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {y}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_2(z)&{}\\ \qquad + \min \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {y}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_2(z), &{}\quad \text {otherwise,} \end{array} \right. 
\\ {\nabla } \overline{g}^{\mathrm {C}}_\times (z)&=\left\{ \begin{array}{ll} \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {x}}\,{\nabla } \overline{f}^{\mathrm {C}}_1(z) - \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {y}}\,{\nabla } \underline{f}^{\mathrm {C}}_2(z), &{}\quad \text {if both }{\overline{x}_1}\le 0 \text { and }0\le \underline{x}_2, \\ -\frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {x}}\,{\nabla } \underline{f}^{\mathrm {C}}_1(z) + \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {y}}\,{\nabla } \overline{f}^{\mathrm {C}}_2(z), &{}\quad \text {if both }0\le {\underline{x}_1}\text { and }\overline{x}_2\le 0, \\ \max \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {x}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_1(z) + \min \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {x}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_1(z) \\ \qquad {}- \max \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {y}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_2(z)&{}\\ \qquad - \min \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {y}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_2(z), &{}\quad \text {otherwise,} \end{array} \right. \end{aligned}$$

where the required partial derivatives of \(\underline{\psi }_{\times ,\mathrm {A}}\) and \(\underline{\psi }_{\times ,\mathrm {B}}\) are as follows.

$$\begin{aligned} \frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {x}}(x,y,\varvec{\zeta },\varvec{\eta })&= \frac{1}{2}\left( \underline{\eta }+\overline{\eta } + (\mu +1)\left( \overline{\eta }-\underline{\eta }\right) \left( \tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right) \left| \tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right| ^{\mu +1}\right) , \\ \frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {y}}(x,y,\varvec{\zeta },\varvec{\eta })&= \frac{1}{2}\left( \underline{\zeta }+\overline{\zeta } + (\mu +1)\left( \overline{\zeta }-\underline{\zeta }\right) \left( \tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right) \left| \tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right| ^{\mu +1}\right) , \\ \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {x}}(x,y,\varvec{\zeta },\varvec{\eta })&= \underline{\eta } + (\mu +1)\left( \overline{\eta }-\underline{\eta }\right) \left( \max \left\{ 0,\tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right\} \right) ^\mu , \\ \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {y}}(x,y,\varvec{\zeta },\varvec{\eta })&= \underline{\zeta } + (\mu +1)\left( \overline{\zeta }-\underline{\zeta }\right) \left( \max \left\{ 0,\tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right\} \right) ^\mu . \end{aligned}$$
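As a concrete illustration, the stated partials of \(\underline{\psi }_{\times ,\mathrm {A}}\) and the sign-splitting rule in the "otherwise" branch of \({\nabla } \underline{g}^{\mathrm {C}}_\times \) can be assembled as follows. The value of \(\mu \), the bounds \(\varvec{\zeta },\varvec{\eta }\), and the relaxation gradients used here are illustrative placeholders, not values from the article:

```python
# Sketch: "otherwise" branch of the product-relaxation gradient in
# Proposition 15, assembled from the stated partials of psi_{x,A}.
# mu, the bounds, and the gradient vectors are illustrative placeholders.

mu = 2  # smoothing exponent (assumed value)

def dpsiA_dx(x, y, zl, zu, el, eu):
    t = (y - el) / (eu - el) - (zu - x) / (zu - zl)
    return 0.5 * (el + eu + (mu + 1) * (eu - el) * t * abs(t) ** (mu + 1))

def dpsiA_dy(x, y, zl, zu, el, eu):
    t = (y - el) / (eu - el) - (zu - x) / (zu - zl)
    return 0.5 * (zl + zu + (mu + 1) * (zu - zl) * t * abs(t) ** (mu + 1))

def grad_lower_product(px, py, grad_f1_lo, grad_f1_hi, grad_f2_lo, grad_f2_hi):
    # Positive partials weight the convex underestimator gradients;
    # negative partials weight the concave overestimator gradients.
    return [max(0.0, px) * g1l + min(0.0, px) * g1h
            + max(0.0, py) * g2l + min(0.0, py) * g2h
            for g1l, g1h, g2l, g2h
            in zip(grad_f1_lo, grad_f1_hi, grad_f2_lo, grad_f2_hi)]

px = dpsiA_dx(0.5, 0.5, -1.0, 1.0, -1.0, 1.0)
py = dpsiA_dy(0.5, 0.5, -1.0, 1.0, -1.0, 1.0)
g = grad_lower_product(px, py, [1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8])
```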

Proposition 16

Suppose that the conditions of Theorem 7 hold. Gradients for the relaxations \(\underline{g}^{\mathrm {C}}_{\max }\) and \(\overline{g}^{\mathrm {C}}_{\max }\) are as follows, at any \(z\in Z\). The required partial derivatives of \(\underline{\psi }_{\max }\) may be evaluated at any argument that yields \(\underline{g}^{\mathrm {C}}_{\max }\) in Theorem 7; this follows from a gradient invariance property of Mangasarian [37].

$$\begin{aligned} {\nabla } \underline{g}^{\mathrm {C}}_{\max }(z)&=\left\{ \begin{array}{ll} {\nabla } \underline{f}^{\mathrm {C}}_1(z), &{}\quad \text {if }\overline{x}_2\le \underline{x}_1, \\ {\nabla } \underline{f}^{\mathrm {C}}_2(z), &{}\quad \text {if }\overline{x}_1\le \underline{x}_2, \\ \\ \max \left\{ 0,\frac{\partial {\underline{\psi }_{\max }}}{\partial {x}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_1(z) + \min \left\{ 0,\frac{\partial {\underline{\psi }_{\max }}}{\partial {x}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_1(z) \\ \qquad {}+ \max \left\{ 0,\frac{\partial {\underline{\psi }_{\max }}}{\partial {y}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_2(z) + \min \left\{ 0,\frac{\partial {\underline{\psi }_{\max }}}{\partial {y}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_2(z), &{}\quad \text {otherwise,} \end{array} \right. \end{aligned}$$

and the mapping \(\overline{g}^{\mathrm {C}}_{\max }\) has the following gradient, evaluated at any \(z\in Z\):

$$\begin{aligned} {\nabla } \overline{g}^{\mathrm {C}}_{\max }(z)&=\left\{ \begin{array}{ll} {\nabla } \overline{f}^{\mathrm {C}}_1(z),&{} \quad \text {if }\overline{x}_2\le \underline{x}_1, \\ {\nabla } \overline{f}^{\mathrm {C}}_2(z),&{} \quad \text {if }\overline{x}_1\le \underline{x}_2, \\ \left( \frac{\max \{\overline{x}_1,\overline{x}_2\} - \max \{\underline{x}_1,\overline{x}_2\}}{\overline{x}_1-\underline{x}_1}\right) {\nabla } \overline{f}^{\mathrm {C}}_1(z) &{}\\ \qquad + \left( \frac{\max \{\overline{x}_1,\overline{x}_2\} - \max \{\overline{x}_1,\underline{x}_2\}}{\overline{x}_2-\underline{x}_2}\right) {\nabla } \overline{f}^{\mathrm {C}}_2(z) \\ \qquad {}+(\mu +1){\Delta }\left( \max \left\{ 0,\frac{\overline{x}_1-\overline{f}^{\mathrm {C}}_1(z)}{\overline{x}_1-\underline{x}_1} -\frac{\overline{f}^{\mathrm {C}}_2(z)-\underline{x}_2}{\overline{x}_2-\underline{x}_2}\right\} \right) ^\mu &{}\\ \qquad \left( \frac{{\nabla } \overline{f}^{\mathrm {C}}_1(z)}{\overline{x}_1-\underline{x}_1} + \frac{{\nabla } \overline{f}^{\mathrm {C}}_2(z)}{\overline{x}_2-\underline{x}_2}\right) , &{} \quad \text {otherwise,} \end{array} \right. \end{aligned}$$

where \({\Delta }:=\max \{\underline{x}_1,\overline{x}_2\} +\max \{\overline{x}_1,\underline{x}_2\} -\max \{\underline{x}_1,\underline{x}_2\} -\max \{\overline{x}_1,\overline{x}_2\}\), and where the required partial derivatives of \(\underline{\psi }_{\max }\) are as follows.

$$\begin{aligned} \frac{\partial {\underline{\psi }_{\max }}}{\partial {x}}(x,y,\varvec{\zeta },\varvec{\eta })&= \left\{ \begin{array}{ll} 1 - (\mu +1)\left( \max \left\{ 0,\frac{y-x}{\overline{\eta }-\underline{\zeta }}\right\} \right) ^\mu , &{}\quad \text {if }\underline{\eta }\le \underline{\zeta }<\overline{\eta }, \\ (\mu +1)\left( \max \left\{ 0,\frac{x-y}{\overline{\zeta }-\underline{\eta }}\right\} \right) ^\mu , &{}\quad \text {if }\underline{\zeta }<\underline{\eta }<\overline{\zeta }, \\ \end{array} \right. \\ \frac{\partial {\underline{\psi }_{\max }}}{\partial {y}}(x,y,\varvec{\zeta },\varvec{\eta })&= \left\{ \begin{array}{ll} (\mu +1)\left( \max \left\{ 0,\frac{y-x}{\overline{\eta }-\underline{\zeta }}\right\} \right) ^\mu , &{}\quad \text {if }\underline{\eta }\le \underline{\zeta }<\overline{\eta }, \\ 1 - (\mu +1)\left( \max \left\{ 0,\frac{x-y}{\overline{\zeta }-\underline{\eta }}\right\} \right) ^\mu , &{}\quad \text {if }\underline{\zeta }<\underline{\eta }<\overline{\zeta }. \\ \end{array} \right. \end{aligned}$$
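In both cases of the display above, the two partial derivatives of \(\underline{\psi }_{\max }\) sum to one, consistent with the fact that shifting both arguments of a relaxation of \(\max \) by \(t\) shifts its value by \(t\). This invariant can be checked with a short sketch; \(\mu \) and the bound values below are illustrative placeholders, not values from the article:

```python
# Sketch: the stated partials of psi_max sum to 1 in each case of
# Proposition 16. mu and the bounds below are illustrative placeholders.

mu = 2  # smoothing exponent (assumed value)

def dpsimax(x, y, zl, zu, el, eu):
    """Return (d/dx, d/dy) of psi_max per the two cases above."""
    if el <= zl < eu:
        s = (mu + 1) * max(0.0, (y - x) / (eu - zl)) ** mu
        return 1.0 - s, s
    elif zl < el < zu:
        s = (mu + 1) * max(0.0, (x - y) / (zu - el)) ** mu
        return s, 1.0 - s
    raise ValueError("outside the cases covered by Proposition 16")

dx, dy = dpsimax(0.2, 0.6, 0.0, 1.0, 0.0, 1.0)     # first case
assert abs(dx + dy - 1.0) < 1e-12

dx2, dy2 = dpsimax(0.4, 0.1, -0.5, 1.0, 0.0, 1.0)  # second case
assert abs(dx2 + dy2 - 1.0) < 1e-12
```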


Khan, K.A., Watson, H.A.J. & Barton, P.I. Differentiable McCormick relaxations. J Glob Optim 67, 687–729 (2017). https://doi.org/10.1007/s10898-016-0440-6
