Differentiable McCormick relaxations

Published in: Journal of Global Optimization

A Correction to this article was published on 25 January 2018

Abstract

McCormick’s classical relaxation technique constructs closed-form convex and concave relaxations of compositions of simple intrinsic functions. These relaxations have several properties which make them useful for lower bounding problems in global optimization: they can be evaluated automatically, accurately, and computationally inexpensively, and they converge rapidly to the relaxed function as the underlying domain is reduced in size. They may also be adapted to yield relaxations of certain implicit functions and differential equation solutions. However, McCormick’s relaxations may be nonsmooth, and this nonsmoothness can create theoretical and computational obstacles when relaxations are to be deployed. This article presents a continuously differentiable variant of McCormick’s original relaxations in the multivariate McCormick framework of Tsoukalas and Mitsos. Gradients of the new differentiable relaxations may be computed efficiently using the standard forward or reverse modes of automatic differentiation. Extensions to differentiable relaxations of implicit functions and solutions of parametric ordinary differential equations are discussed. A C++ implementation based on the library MC++ is described and applied to a case study in nonsmooth nonconvex optimization.

Change history

  • 25 January 2018

    These errata correct various errors in the closed-form relaxations provided by Khan, Watson, and Barton in the article “Differentiable McCormick Relaxations” (J Glob Optim, 67:687–729, 2017). Without these corrections, the provided closed-form relaxations may fail to be convex or concave and may fail to be valid relaxations.

References

  1. Achterberg, T.: SCIP: solving constraint integer programs. Math. Program. Comput. 1, 1–41 (2009)

  2. Achterberg, T., Berthold, T., Koch, T., Wolter, K.: Constraint integer programming: a new approach to integrate CP and MIP. In: Proceedings of the Fifth International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pp. 6–20. Paris (2008)

  3. Adjiman, C.S., Dallwig, S., Floudas, C.A., Neumaier, A.: A global optimization method, \(\alpha \)BB, for general twice-differentiable constrained NLPs—I. Theoretical advances. Comput. Chem. Eng. 22, 1137–1158 (1998)

  4. Alefeld, G., Mayer, G.: Interval analysis: theory and applications. J. Comput. Appl. Math. 121, 421–464 (2000)

  5. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 3rd edn. Wiley, Hoboken (2006)

  6. Beckers, M., Mosenkis, V., Naumann, U.: Adjoint mode computation of subgradients for McCormick relaxations. In: Forth, S., Hovland, P., Phipps, E., Utke, J., Walther, A. (eds.) Recent Advances in Algorithmic Differentiation, pp. 103–113. Springer, Berlin (2012)

  7. Belotti, P.: COUENNE: A user’s manual. https://projects.coin-or.org/Couenne (2006)

  8. Bertsekas, D.P.: Nondifferentiable optimization via approximation. In: Balinski, M., Wolfe, P. (eds.) Mathematical Programming Study 3, pp. 1–25. North-Holland Publishing Company, Amsterdam (1975)

  9. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)

  10. Bompadre, A., Mitsos, A.: Convergence rate of McCormick relaxations. J. Glob. Optim. 52, 1–28 (2012)

  11. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  12. Broyden, C.G., Dennis Jr., J.E., Moré, J.J.: On the local and superlinear convergence of quasi-Newton methods. J. Inst. Math. Appl. 12, 223–245 (1973)

  13. Chachuat, B.: MC++: a toolkit for bounding factorable functions, v1.0. Retrieved 2 July 2014 https://projects.coin-or.org/MCpp (2014)

  14. Clarke, F.H.: Optimization and Nonsmooth Analysis. SIAM, Philadelphia (1990)

  15. Coddington, E.A., Levinson, N.: Theory of Ordinary Differential Equations. McGraw Hill Co., Inc., New York (1955)

  16. Du, K., Kearfott, R.B.: The cluster problem in multivariate global optimization. J. Glob. Optim. 5, 253–265 (1994)

  17. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. 2. Springer, New York (2003)

  18. Feehery, W.F., Tolsma, J.E., Barton, P.I.: Efficient sensitivity analysis of large-scale differential-algebraic systems. Appl. Numer. Math. 25, 41–54 (1997)

  19. Gabriel, S.A., Moré, J.J.: Smoothing of Mixed Complementarity Problems. Preprint MCS-P541-0995, Argonne National Laboratory (1995)

  20. Gill, P.E., Murray, W., Saunders, M.A.: SNOPT: an SQP algorithm for large-scale constrained optimization. SIAM Rev. 47(1), 99–131 (2005)

  21. Griewank, A., Rabier, P.J.: On the smoothness of convex envelopes. Trans. Am. Math. Soc. 322, 691–709 (1990)

  22. Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Other Titles in Applied Mathematics, 2nd edn. SIAM, Philadelphia (2008)

  23. Grossmann, I.E., Yeomans, H., Kravanja, Z.: A rigorous disjunctive optimization model for simultaneous flowsheet optimization and heat integration. Comput. Chem. Eng. 22(98), 157–164 (1998)

  24. Hartman, P.: Ordinary Differential Equations, 2nd edn. SIAM, Philadelphia (2002)

  25. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals. A Series of Comprehensive Studies in Mathematics. Springer, Berlin (1993)

  26. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods. A Series of Comprehensive Studies in Mathematics. Springer, Berlin (1993)

  27. Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches, 2nd edn. Springer, Berlin (1993)

  28. Kesavan, P., Allgor, R.J., Gatzke, E.P., Barton, P.I.: Outer approximation algorithms for separable nonconvex mixed-integer nonlinear programs. Math. Program. Ser. A 100, 517–535 (2004)

  29. Khan, K.A.: Sensitivity analysis for nonsmooth dynamic systems. Ph.D. thesis, Massachusetts Institute of Technology (2015)

  30. Kiwiel, K.C.: Methods of Descent for Nondifferentiable Optimization. Lecture Notes in Mathematics. Springer, Berlin (1985)

  31. Lemaréchal, C., Strodiot, J.J., Bihain, A.: On a bundle algorithm for nonsmooth optimization. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming 4. Academic Press, New York (1981)

  32. Li, X., Tomasgard, A., Barton, P.I.: Nonconvex generalized Benders decomposition for stochastic separable mixed-integer nonlinear programs. J. Optim. Theory Appl. 151, 425–454 (2011)

  33. Liberti, L., Pantelides, C.C.: Convex envelopes of monomials of odd degree. J. Glob. Optim. 25, 157–168 (2003)

  34. Lin, Y., Schrage, L.: The global solver in the LINDO API. Optim. Methods Softw. 24, 657–668 (2009)

  35. Mäkelä, M.M.: Multiobjective proximal bundle method for nonconvex nonsmooth optimization: Fortran subroutine MPBNGC 2.0. Reports of the Department of Mathematical Information Technology, Series B, Scientific computing B 13/2003, University of Jyväskylä (2003)

  36. Maly, T., Petzold, L.R.: Numerical methods and software for sensitivity analysis of differential-algebraic systems. Appl. Numer. Math. 20, 57–79 (1996)

  37. Mangasarian, O.L.: A simple characterization of solution sets of convex programs. Oper. Res. Lett. 7(1), 21–26 (1988)

  38. McCormick, G.P.: Computability of global solutions to factorable nonconvex programs: Part I—convex underestimating problems. Math. Program. 10, 147–175 (1976)

  39. Misener, R., Floudas, C.A.: ANTIGONE: algorithms for continuous/integer global optimization of nonlinear equations. J. Glob. Optim. 59, 503–526 (2014)

  40. Mitsos, A., Chachuat, B., Barton, P.I.: McCormick-based relaxations of algorithms. SIAM J. Optim. 20, 573–601 (2009)

  41. Moore, R.E.: Methods and Applications of Interval Analysis. SIAM, Philadelphia (1979)

  42. Najman, J., Mitsos, A.: Convergence analysis of multivariate McCormick relaxations. J. Glob. Optim. in press (2016)

  43. Naumann, U.: The Art of Differentiating Computer Programs. SIAM, Philadelphia (2012)

  44. Neumaier, A.: Interval Methods for Systems of Equations. Cambridge University Press, Cambridge (1990)

  45. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer series in operations research and financial engineering, 2nd edn. Springer, New York (2006)

  46. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Classics in Applied Mathematics. SIAM, Philadelphia (2000)

  47. Qi, L., Sun, D.: Smoothing functions and smoothing Newton method for complementarity and variational inequality problems. J. Optim. Theory Appl. 113, 121–147 (2002)

  48. Rockafellar, R.T.: Convex Analysis. Princeton Landmarks in Mathematics and Physics. Princeton University Press, Princeton (1970)

  49. Ryoo, H.S., Sahinidis, N.V.: Global optimization of nonconvex NLPs and MINLPs with applications in process design. Comput. Chem. Eng. 19(5), 551–566 (1995)

  50. Sahinidis, N.V.: BARON: a general purpose global optimization software package. J. Glob. Optim. 8, 201–205 (1996)

  51. Sahinidis, N.V.: BARON 15.9: Global Optimization of Mixed-Integer Nonlinear Programs, User’s Manual. https://www.gams.com/help/topic/gams.doc/solvers/baron/index.html (2015)

  52. Schaber, S.D.: Tools for dynamic model development. Ph.D. thesis, Massachusetts Institute of Technology (2014)

  53. Scholz, D.: Theoretical rate of convergence for interval inclusion functions. J. Glob. Optim. 53, 749–767 (2012)

  54. Scott, J.K.: Reachability analysis and deterministic global optimization of differential-algebraic systems. Ph.D. thesis, Massachusetts Institute of Technology (2012)

  55. Scott, J.K., Barton, P.I.: Convex and concave relaxations for the parametric solutions of semi-explicit index-one differential-algebraic equations. J. Optim. Theory Appl. 156, 617–649 (2013)

  56. Scott, J.K., Barton, P.I.: Improved relaxations for the parametric solutions of ODEs using differential inequalities. J. Glob. Optim. 57, 143–176 (2013)

  57. Scott, J.K., Chachuat, B., Barton, P.I.: Nonlinear convex and concave relaxations for the solutions of parametric ODEs. Optim. Control Appl. Methods 34, 145–163 (2013)

  58. Scott, J.K., Stuber, M.D., Barton, P.I.: Generalized McCormick relaxations. J. Glob. Optim. 51, 569–606 (2011)

  59. Shor, N.Z.: Minimization Methods for Non-differentiable Functions. Springer series in computational mathematics. Springer, Berlin (1985)

  60. Stuber, M.D., Scott, J.K., Barton, P.I.: Convex and concave relaxations of implicit functions. Optim. Methods Softw. 30(3), 424–460 (2015)

  61. Tawarmalani, M., Sahinidis, N.V.: Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications. Nonconvex Optimization and Its Applications. Springer, Dordrecht (2002)

  62. Tawarmalani, M., Sahinidis, N.V.: Global optimization of mixed-integer nonlinear programs: a theoretical and computational study. Math. Program. A 99, 563–591 (2004)

  63. Tsoukalas, A., Mitsos, A.: Multivariate McCormick relaxations. J. Glob. Optim. 59, 633–662 (2014)

  64. Watson, H.A.J., Khan, K.A., Barton, P.I.: Multistream heat exchanger modeling and design. AIChE J. 61(10), 3390–3403 (2015)

  65. Wechsung, A.: Global optimization in reduced space. Ph.D. thesis, Massachusetts Institute of Technology (2014)

  66. Wechsung, A., Aspelund, A., Gundersen, T., Barton, P.I.: Synthesis of heat exchanger networks at subambient conditions with compression and expansion of process streams. AIChE J. 57(8), 2090–2108 (2011)

  67. Wechsung, A., Schaber, S.D., Barton, P.I.: The cluster problem revisited. J. Glob. Optim. 58, 429–438 (2014)

  68. Wechsung, A., Scott, J.K., Watson, H.A.J., Barton, P.I.: Reverse propagation of McCormick relaxations. J. Glob. Optim. 63(1), 1–36 (2015)

  69. Whitney, H.: Analytic extensions of differentiable functions defined in closed sets. Trans. Amer. Math. Soc. 36, 63–89 (1934)

Acknowledgments

The authors would like to thank Achim Wechsung and Spencer Schaber for several helpful discussions.

Author information

Correspondence to Paul I. Barton.

Additional information

This material was supported by Novartis Pharmaceuticals as part of the Novartis-MIT Center for Continuous Manufacturing and by Statoil, and was based in part on work supported by the US Department of Energy, Office of Science, under contract DE-AC02-06CH11357.

A correction to this article is available online at https://doi.org/10.1007/s10898-017-0601-2.

Appendices

Proofs of results

1.1 Proof of Proposition 3

This proof proceeds by showing that the requirements of [69, Theorem I] are met by f on C. Since the components of f may be considered separately, it suffices to consider the case in which \(m=1\). For each \(x\in C\), assume that \(N_x\) is convex without loss of generality; if this is not true, then redefine \(N_x\) to be an open convex subset containing x. Since C is compact, there exists a finite subset \(I\subset C\) for which \(C\subset \bigcup _{x\in I}N_x\).

Suppose, to obtain a contradiction, that there exist \(x,\xi \in I\) and \(y\in C\) for which \(y\in N_x\cap N_\xi \) but \({\nabla } \phi _x(y)\ne {\nabla } \phi _\xi (y)\). Since C is convex and has nonempty interior, either \(y\in \mathrm {int}(C)\) or \(y\in \mathrm {bd}\left( \mathrm {int}(C)\right) \). In either case, there exists a sequence \(\{z_{(i)}\}_{i\in {\mathbb {N}}}\) in the nonempty open set \(\tilde{N}:=N_x\cap N_\xi \cap \mathrm {int}(C)\) that converges to y. Since \(\phi _x\equiv \phi _\xi \equiv f\) on \(\tilde{N}\), \({\nabla } \phi _x(z_{(i)})={\nabla } \phi _\xi (z_{(i)})\) for each i. The continuity of \({\nabla } \phi _x\) and \({\nabla } \phi _\xi \) on \(\tilde{N}\) then yields \({\nabla } \phi _{x}(y)={\nabla } \phi _\xi (y)\), which contradicts the choices of x, \(\xi \), and y. Thus, there exists a single continuous function g on C for which, for each \(x\in I\), \(g\equiv {\nabla } \phi _x\) on \(N_x\cap C\).

To show that f is Whitney-\({\mathscr {C}}^1\) on C, it suffices in light of [69, Theorem I] to show that, for each \(\epsilon >0\), there exists \(\delta _\epsilon >0\) for which

$$\begin{aligned} \Vert f(y)-f(x)-\langle g(x), y-x \rangle \Vert <\epsilon \Vert y-x\Vert \end{aligned}$$
(16)

whenever \(x,y\in C\) and \(\Vert y-x\Vert <\delta _\epsilon \). Thus, choose any \(\epsilon >0\). Since g is continuous on the compact set C, g is uniformly continuous on C; there exists \(\tilde{\delta }_\epsilon >0\) for which \(\Vert g(y)-g(x)\Vert <\epsilon \) whenever \(x,y\in C\) satisfy \(\Vert y-x\Vert <\tilde{\delta }_\epsilon \). Now, consider any \(x,y\in C\) with \(\Vert y-x\Vert <\tilde{\delta }_{\epsilon }\); the bound (16) will be shown to hold for x and y.

Define the line segment \(L:=\mathrm {conv}\{x,y\}\). Since L is compact and \(L\subset C\), choose \(J\subset I\) as a set for which \(L\subset \bigcup _{\xi \in J}N_\xi \) but \(L\not \subset (\bigcup _{\xi \in J}N_\xi )\backslash N_\eta \) for each \(\eta \in J\). Using these constructions, choose \(k\in {\mathbb {N}}\), \(0=\lambda _0<\lambda _1<\cdots <\lambda _{k}=1\), and \(\xi _{(1)},\ldots ,\xi _{(k)}\in J\) for which:

  • \(x_{(0)}:=x\in N_{\xi _{(1)}}\),

  • \(x_{(k)}:=y\in N_{\xi _{(k)}}\), and

  • \(x_{(q)}:=(1-\lambda _q)x + \lambda _qy\in N_{\xi _{(q+1)}}\cap N_{\xi _{(q)}}\cap L\), for each \(q\in \{1,2,\ldots ,k-1\}\).

Observe that, for each \(q\in \{1,\ldots ,k\}\), \(x_{(q-1)}\in N_{\xi _{(q)}}\) and \(x_{(q)}\in N_{\xi _{(q)}}\). So, the mean-value theorem and the established properties of g yield the following, for some \(y_{(q)}\in \mathrm {conv}\{x_{(q-1)},x_{(q)}\}\subset L\):

$$\begin{aligned} f\left( x_{(q)}\right) -f\left( x_{(q-1)}\right) = \left\langle {{\nabla } \phi _{\xi _{(q)}}(y_{(q)})},{x_{(q)}-x_{(q-1)}}\right\rangle = \left\langle {g(y_{(q)})},{x_{(q)}-x_{(q-1)}}\right\rangle . \end{aligned}$$

Hence,

$$\begin{aligned}&\Vert f(y)-f(x)-\langle g(x), y-x \rangle \Vert \\&\qquad = \left\| \sum _{q=1}^k\left( f\left( x_{(q)}\right) -f\left( x_{(q-1)}\right) -\left\langle {g(x)},{x_{(q)}-x_{(q-1)}}\right\rangle \right) \right\| \\&\qquad =\left\| \sum _{q=1}^k\left\langle {g\left( y_{(q)}\right) -g(x)},{x_{(q)}-x_{(q-1)}}\right\rangle \right\| \\&\qquad \le \sum _{q=1}^k\left\| g\left( y_{(q)}\right) -g(x)\right\| \left\| x_{(q)}-x_{(q-1)}\right\| \\&\qquad \le \sum _{q=1}^k\epsilon \left\| x_{(q)}-x_{(q-1)}\right\| =\epsilon \Vert y-x\Vert , \end{aligned}$$

as required; the final equation above follows from the definitions of each \(x_{(q)}\) and the inequality chain \(0=\lambda _0<\lambda _1<\cdots <\lambda _k=1\). \(\square \)

1.2 Proof of Proposition 4

This proof employs the following intermediate result.

Lemma 1

Consider an interval \([\underline{x},\overline{x}]\subset {\mathbb {R}}\), a Lipschitz continuous function \(f:[\underline{x},\overline{x}]\rightarrow {\mathbb {R}}\), and the convex envelope \(\underline{f}^{\mathrm {C}}\) of f on \([\underline{x},\overline{x}]\). Then, \(\underline{f}^{\mathrm {C}}(\underline{x})=f(\underline{x})\) and \(\underline{f}^{\mathrm {C}}(\overline{x})=f(\overline{x})\). Moreover, \(\underline{f}^{\mathrm {C}}\) is Lipschitz continuous on \([\underline{x},\overline{x}]\), with the same Lipschitz constant as f. Analogous results hold for the concave envelope of f on \([\underline{x},\overline{x}]\).

Proof

Only the convex envelope of f will be considered here; a similar argument addresses the concave envelope. The required results are trivial if \(\underline{x}=\overline{x}\), so assume that \(\underline{x}<\overline{x}\). Let \(k_f\) denote a Lipschitz constant for f on \([\underline{x},\overline{x}]\). Applying the definition of the convex envelope,

$$\begin{aligned} f(y)\ge \underline{f}^{\mathrm {C}}(y)\ge f(\underline{x})-k_f\left( y-\underline{x}\right) ,\quad \forall y\in [\underline{x},\overline{x}]; \end{aligned}$$
(17)

the first inequality above is due to f dominating \(\underline{f}^{\mathrm {C}}\), and the second inequality is due to \(\underline{f}^{\mathrm {C}}\) dominating each convex underestimator of f on \([\underline{x},\overline{x}]\) (here, the affine mapping \(y\mapsto f(\underline{x})-k_f(y-\underline{x})\), which underestimates f by Lipschitz continuity). Setting y to \(\underline{x}\) in the above inequality chain yields \(\underline{f}^{\mathrm {C}}(\underline{x})=f(\underline{x})\).

A similar argument yields:

$$\begin{aligned} f(y)\ge \underline{f}^{\mathrm {C}}(y)\ge f(\overline{x})-k_f\left( \overline{x}-y\right) ,\quad \forall y\in [\underline{x},\overline{x}]; \end{aligned}$$
(18)

setting y to \(\overline{x}\) yields \(\underline{f}^{\mathrm {C}}(\overline{x})=f(\overline{x})\).

Thus, with these endpoint equalities, (17) and (18) become:

$$\begin{aligned} \underline{f}^{\mathrm {C}}(y)\ge \underline{f}^{\mathrm {C}}(\underline{x})-k_f\left( y-\underline{x}\right) \quad \text {and}\quad \underline{f}^{\mathrm {C}}(y)\ge \underline{f}^{\mathrm {C}}(\overline{x})-k_f\left( \overline{x}-y\right) ,\quad \forall y\in [\underline{x},\overline{x}]. \end{aligned}$$

Defining \(D_+\underline{f}^{\mathrm {C}}\) and \(D_-\underline{f}^{\mathrm {C}}\) as the right-derivative and left-derivative of \(\underline{f}^{\mathrm {C}}\) described in [25, Theorem I.4.1.1], it follows from [25, Proposition I.4.1.3] that \(D_+\underline{f}^{\mathrm {C}}(\underline{x})\) and \(D_-\underline{f}^{\mathrm {C}}(\overline{x})\) both exist, are finite, and satisfy \(D_+\underline{f}^{\mathrm {C}}(\underline{x}) \ge -k_f\) and \(D_-\underline{f}^{\mathrm {C}}(\overline{x}) \le k_f\). Thus, \(\underline{f}^{\mathrm {C}}\) is continuous at \(\underline{x}\) and \(\overline{x}\). Moreover, [25, Theorem I.4.2.1] implies that, for each \(y\in (\underline{x},\overline{x})\), each subgradient of \(\underline{f}^{\mathrm {C}}\) at y is an element of \([-k_f,k_f]\). This result, combined with the mean-value theorem [25, Theorem I.4.2.4], shows that \(\underline{f}^{\mathrm {C}}\) is Lipschitz continuous on \([\underline{x},\overline{x}]\), with a Lipschitz constant of \(k_f\). \(\square \)

Using the above lemma, Proposition 4 may be proved as follows. Only the convex envelope of f will be considered here; a similar argument addresses the concave envelope. The required result is trivial if \(\underline{x}=\overline{x}\), so assume that \(\underline{x}<\overline{x}\). Theorem 3.2 in [21] implies that \(\underline{f}^{\mathrm {C}}\) is \({\mathscr {C}}^1\) on \((\underline{x},\overline{x})\); it remains to be shown that \(\underline{f}^{\mathrm {C}}\) is also \({\mathscr {C}}^1\) at \(\underline{x}\) and \(\overline{x}\). Noting that f is Lipschitz continuous on \([\underline{x},\overline{x}]\), construct the right-derivative \(D_+\underline{f}^{\mathrm {C}}\) and the left-derivative \(D_-\underline{f}^{\mathrm {C}}\) as in the proof of Lemma 1. As in the proof of Lemma 1, \(D_+\underline{f}^{\mathrm {C}}(\underline{x})\) and \(D_-\underline{f}^{\mathrm {C}}(\overline{x})\) each exist and are finite. Define the following function, which extends the domain of \(\underline{f}^{\mathrm {C}}\) to \({\mathbb {R}}\):

$$\begin{aligned} \psi :{\mathbb {R}}\rightarrow {\mathbb {R}}:y\mapsto \left\{ \begin{array}{ll} \underline{f}^{\mathrm {C}}(\underline{x}) + D_+\underline{f}^{\mathrm {C}}(\underline{x})\left( y-\underline{x}\right) ,&{}\quad \text {if }y<\underline{x}, \\ \underline{f}^{\mathrm {C}}(y),&{}\quad \text {if }\underline{x}\le y\le \overline{x}, \\ \underline{f}^{\mathrm {C}}(\overline{x}) + D_-\underline{f}^{\mathrm {C}}(\overline{x})\left( y-\overline{x}\right) ,&{}\quad \text {if }y>\overline{x}. \end{array} \right. \end{aligned}$$

The function \(\psi \) is evidently continuous, and is \({\mathscr {C}}^1\) at each \(y\in {\mathbb {R}}\backslash \{\underline{x},\overline{x}\}\). Applying the definitions of \(D_+\underline{f}^{\mathrm {C}}\) and \(D_-\underline{f}^{\mathrm {C}}\), it follows that \(\psi \) is differentiable at \(\underline{x}\) and \(\overline{x}\) as well; thus,

$$\begin{aligned} \psi '(\underline{x})=D_+\underline{f}^{\mathrm {C}}(\underline{x})\quad \text {and}\quad \psi '(\overline{x})=D_-\underline{f}^{\mathrm {C}}(\overline{x}). \end{aligned}$$

This equation, together with [25, Theorem I.4.2.1(iii)], shows that \(\psi \) is \({\mathscr {C}}^1\) even at \(\underline{x}\) and \(\overline{x}\), and is therefore \({\mathscr {C}}^1\) on \({\mathbb {R}}\). Hence, \(\underline{f}^{\mathrm {C}}\) is Whitney-\({\mathscr {C}}^1\) on \([\underline{x},\overline{x}]\). \(\square \)

1.3 Proof of Theorem 4

The proof of Theorem 4 uses several intermediate results concerning a generic optimal-value function described by the following assumption.

Assumption 3

For some \(m\in {\mathbb {N}}\), consider a convex open set \(X\subset {\mathbb {R}}^m\) and a convex set \(C\subset X\) with nonempty interior. Define sets:

$$\begin{aligned} K&:=\{(\ell ,u): \ell \in C,\,\, u\in C,\,\, \ell \le u\}\subset {\mathbb {R}}^{2m}, \\ \text {and}\quad H&:=\{(\ell ,u): \ell \in X,\,\, u\in X,\,\, \ell \le u\}\subset {\mathbb {R}}^{2m}. \end{aligned}$$

Consider a convex \({\mathscr {C}}^1\) function \(\psi :X\rightarrow {\mathbb {R}}\), and an optimal-value function:

$$\begin{aligned} \gamma :H\rightarrow {\mathbb {R}}:(\ell ,u)\mapsto \min \{\psi (\xi ):\ell \le \xi \le u\}. \end{aligned}$$

Lemma 2

Suppose that Assumption 3 holds with \(m=2\). Define a function

$$\begin{aligned} \omega :H\times X\rightarrow {\mathbb {R}}:((a,b),c)\mapsto \gamma ((c,a),(c,b)). \end{aligned}$$

The function \(\omega \) is convex and \({\mathscr {C}}^1\) on \(\mathrm {int}(H)\times X\).

Proof

For each \(((a,b),c)\in H\times X\), observe that \(\omega ((a,b),c)=\min \{\psi (\xi ):\xi _1=c,\,\,a\le \xi _2\le b\}\). Hence, according to [48, Section 29], \(\omega \) is convex. It then suffices by [48, Corollary 25.5.1] to show that \(\omega \) is differentiable at some arbitrary \(((\hat{a},\hat{b}),\hat{c})\in \mathrm {int}(H)\times X\). Let (CP\({}_\omega \)) denote the convex program \(\min \{\psi (\xi ):\xi _1=\hat{c},\,\,\hat{a}\le \xi _2\le \hat{b}\}\). Weierstrass’s Theorem guarantees the existence of an optimal solution of (CP\({}_\omega \)); thus, choose a particular solution \(\xi ^*\in C\). By [48, Theorem 28.3], there exists a Karush-Kuhn-Tucker (KKT) vector \((\lambda ,\mu )\in {\mathbb {R}}\times {\mathbb {R}}^2\) satisfying the following KKT conditions (among others) for all solutions \(\eta ^*\) of (CP\({}_\omega \)) simultaneously:

$$\begin{aligned} 0&= {\nabla } {\psi }(\eta ^*) + \lambda e_{(1)} + \left( \mu _2-\mu _1\right) e_{(2)}, \\ \lambda&\in {\mathbb {R}}, \quad \mu _1\ge 0, \quad \mu _2\ge 0, \quad \mu _1\left( \hat{a}-\eta ^*_1\right) =0,\quad \mu _2\left( \eta ^*_2-\hat{b}\right) =0. \end{aligned}$$

Moreover, since \(\hat{a}<\hat{b}\), any such vector \((\lambda ,\mu )\) is unique; when \(\eta ^*:=\xi ^*\), the above KKT conditions imply:

$$\begin{aligned} \lambda = -\frac{\partial {\psi }}{\partial {\xi _1}}(\xi ^*), \quad \mu _1=\left\{ \begin{array}{ll} \frac{\partial {{\psi }}}{\partial {\xi _2}}(\xi ^*),&{}\quad \text {if }\xi ^*_2=\hat{a}, \\ 0, &{}\quad \text {if }\xi ^*_2\ne \hat{a}, \end{array} \right. \quad \text {and}\quad {\mu }_2=\left\{ \begin{array}{ll} -\frac{\partial {{\psi }}}{\partial {\xi _2}}(\xi ^*),&{} \quad \text {if }\xi ^*_2=\hat{b}, \\ 0, &{} \quad \text {if }\xi ^*_2\ne \hat{b}. \end{array} \right. \end{aligned}$$

According to [48, Corollary 29.1.3], this uniqueness shows that \(\omega \) is differentiable at \(((\hat{a},\hat{b}),\hat{c})\), as required. \(\square \)

Lemma 3

Suppose that Assumption 3 holds, \(m=2\), C is compact, and there exists a vector \(d\in {\mathbb {R}}^2\) such that, for all \(\xi \in X\), \(\langle {\nabla } \psi (\xi ), d \rangle > 0\). The function \(\gamma \) is Whitney-\({\mathscr {C}}^1\) on the compact set K.

Proof

Without loss of generality, suppose that \(d\ge 0\); the other cases are handled similarly. Observe that K has nonempty interior under Assumption 3; to see this, choose any \(x\in \mathrm {int}(C)\) and let \(e\in {\mathbb {R}}^m\) be a vector whose components are all equal to unity. For sufficiently small \(\tau >0\), \((x,x+\tau e)\in \mathrm {int}(K)\), as required.

By Proposition 3, it then suffices to show that, for some arbitrary \((\ell ,u)\in K\), there exists a neighborhood \(N\subset {\mathbb {R}}^2\times {\mathbb {R}}^2\) of \((\ell ,u)\) and a \({\mathscr {C}}^1\) function \(\phi :N\rightarrow {\mathbb {R}}\) for which \(\phi \equiv \gamma \) on \(N\cap K\). Now,

$$\begin{aligned} 0<\langle {\nabla } \psi (\ell ), d \rangle =d_1\frac{\partial {\psi }}{\partial {x_1}}(\ell ) + d_2\frac{\partial {\psi }}{\partial {x_2}}(\ell ). \end{aligned}$$

Since \(d\ge 0\), the above inequality implies that \(\frac{\partial {\psi }}{\partial {x_i}}(\ell )> 0\) for some \(i\in \{1,2\}\). Suppose that \(\frac{\partial {\psi }}{\partial {x_1}}(\ell )> 0\); the case in which \(\frac{\partial {\psi }}{\partial {x_2}}(\ell )> 0\) is handled similarly. Since \({\nabla } \psi \) is continuous on X, there exists a neighborhood \(N_\ell \subset X\) of \(\ell \) for which \(\frac{\partial {\psi }}{\partial {x_1}}(a)>0\) for each \(a\in N_\ell \). Since \(N_\ell \) is open and \(C\subset X\) is compact, choose \(\delta >0\) for which:

  • \(y\in X\) whenever \(x\in C\) and \(\Vert y-x\Vert <3\delta \), and

  • \(a\in N_\ell \) whenever \(\Vert a-\ell \Vert <3\delta \).

Define a neighborhood

$$\begin{aligned} N_{(\ell ,u)}:=\left\{ (a,b)\in X^2:\Vert (a,b)-(\ell ,u)\Vert <\delta \right\} , \end{aligned}$$

and a function:

$$\begin{aligned} \phi :N_{(\ell ,u)}\rightarrow {\mathbb {R}}: (a,b)&\mapsto \gamma ((a_1,a_2),(a_1,u_2+2\delta )) \\&\qquad +{} \gamma ((a_1,\ell _2-2\delta ),(a_1,b_2)) - \gamma ((a_1,\ell _2-2\delta ),(a_1,u_2+2\delta )). \end{aligned}$$

Since \(\ell ,u\in C\) and \(\ell \le u\), if \((a,b)\in N_{(\ell ,u)}\), then \(a_2<u_2+2\delta \) and \(\ell _2-2\delta <b_2\). Thus, \(\phi \) is indeed well-defined. Lemma 2 shows that each “\(\gamma \)” term in the definition of \(\phi \) is \({\mathscr {C}}^1\) with respect to (ab), which in turn shows that \(\phi \) is \({\mathscr {C}}^1\) on \(N_{(\ell ,u)}\).

To complete this proof, it will be shown that \(\phi \equiv \gamma \) on \(N_{(\ell ,u)}\cap K\). Consider some arbitrary point \((a,b)\in N_{(\ell ,u)}\cap K\), and let (\(\text {CP}_\gamma \)) denote the convex program:

$$\begin{aligned} \min \{\psi (\xi ):a\le \xi \le b\}. \end{aligned}$$

First, it will be shown that the set \(\{\xi \in [a,b]:\xi _1=a_1\text { or }\xi _2=a_2\}\) contains all solutions of (\(\text {CP}_\gamma \)). To obtain a contradiction, suppose that this is not so, and choose some solution \(\eta ^*\) of (\(\text {CP}_\gamma \)) accordingly for which \(a_1<\eta ^*_1\le b_1\) and \(a_2<\eta ^*_2\le b_2\). Since \(d\ge 0\), there exists \(\tau >0\) solving the linear program:

$$\begin{aligned} \max \left\{ \tau \ge 0: \eta ^*_1-\tau d_1\ge a_1, \quad \eta ^*_2-\tau d_2\ge a_2\right\} . \end{aligned}$$

Thus, with \(\zeta :=\eta ^*-\tau d\), \(\zeta \in [a,b]\) and either \(\zeta _1=a_1\) or \(\zeta _2=a_2\). By the mean-value theorem, there exists \(s\in [0,\tau ]\) for which

$$\begin{aligned} \psi (\zeta )=\psi (\eta ^*-\tau d) = \psi (\eta ^*) - \tau \langle {\nabla } \psi (\eta ^*-sd), d \rangle < \psi (\eta ^*), \end{aligned}$$

which contradicts the definition of \(\eta ^*\).

Thus, all solutions of (\(\text {CP}_\gamma \)) lie in the set \(\{\xi \in [a,b]:\xi _1=a_1\text { or }\xi _2=a_2\}\). Now, since the mapping \(t\mapsto \psi (a+te_{(1)})\) is convex on \([0,b_1-a_1]\), the mapping \(t\mapsto \frac{\partial {\psi }}{\partial {x_1}}(a+te_{(1)})\) is increasing on \([0,b_1-a_1]\). This implies:

$$\begin{aligned} 0\le t\frac{\partial {\psi }}{\partial {x_1}}(a) \le \int _0^t\frac{\partial {\psi }}{\partial {x_1}}\left( a+se_{(1)}\right) \,{\hbox {d}}s =\psi \left( a+te_{(1)}\right) -\psi (a), \quad \forall t\in [0,b_1-a_1]. \end{aligned}$$

Hence, there exists a solution of (\(\text {CP}_\gamma \)) in the set \(\{\xi \in [a,b]:\xi _1=a_1\}\), which implies that

$$\begin{aligned} \gamma (a,b)=\gamma ((a_1,a_2),(a_1,b_2)). \end{aligned}$$

The convexity of \(\psi \) then yields:

$$\begin{aligned} \gamma (a,b)=\max \left\{ \gamma ((a_1,a_2),(a_1,u_2+2\delta )),\,\,\, \gamma ((a_1,\ell _2-2\delta ),(a_1,b_2))\right\} ; \end{aligned}$$
(19)

to see this, assume, to obtain a contradiction, that (19) does not hold. In this case, since \((a,b)\in N_{(\ell ,u)}\) implies that \(b_2<u_2+2\delta \) and \(\ell _2-2\delta <a_2\), the definition of \(\gamma \) implies that the left-hand side of (19) is strictly greater than the right-hand side. Thus, there exist \(\underline{\beta }\in [\ell _2-2\delta ,b_2]\) and \(\overline{\beta }\in [a_2,u_2+2\delta ]\) for which

$$\begin{aligned} \psi (a_1,\underline{\beta })&=\gamma ((a_1,\ell _2-2\delta ),(a_1,b_2))<\gamma ((a_1,a_2),(a_1,b_2)) \\ \text {and}\quad \psi (a_1,\overline{\beta })&=\gamma ((a_1,a_2),(a_1,u_2+2\delta ))<\gamma ((a_1,a_2),(a_1,b_2)). \end{aligned}$$

The above inequalities and the definition of \(\gamma \) imply that neither \(\underline{\beta }\) nor \(\overline{\beta }\) is contained in the interval \([a_2,b_2]\); thus, \(\underline{\beta }\in [\ell _2-2\delta ,a_2]\) and \(\overline{\beta }\in [b_2,u_2+2\delta ]\). Since \([a_2,b_2]\subset [\underline{\beta },\overline{\beta }]\), the convexity of \(\psi \) would then imply that

$$\begin{aligned} \psi \left( a_1,\tfrac{1}{2}(a_2+b_2)\right) < \gamma ((a_1,a_2),(a_1,b_2)), \end{aligned}$$

which is contradicted by the definition of \(\gamma \).

Lastly, the following equation follows from the definition of \(\gamma \):

$$\begin{aligned}&\gamma ((a_1,\ell _2-2\delta ),(a_1,u_2+2\delta ))\nonumber \\&\quad = \min \left\{ \gamma ((a_1,a_2),(a_1,u_2+2\delta )),\,\,\, \gamma ((a_1,\ell _2-2\delta ),(a_1,b_2))\right\} . \end{aligned}$$
(20)

Along with the definition of \(\phi \), (19) and (20) show that \(\phi (a,b)=\gamma (a,b)\), as required. \(\square \)
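The coordinate-direction gradient inequality used in the proof above (nonnegativity of \(t\,\frac{\partial \psi }{\partial x_1}(a)\) together with \(t\,\frac{\partial \psi }{\partial x_1}(a)\le \psi (a+te_{(1)})-\psi (a)\)) can be illustrated numerically. The convex function below is a hypothetical stand-in for the \(\psi \) of the proof, chosen so that its first partial derivative at \(a\) is nonnegative:

```python
# Sketch: for a convex psi whose partial derivative d(psi)/dx1 is increasing
# along the segment and nonnegative at a, the bound
#   0 <= t * dpsi_dx1(a) <= psi(a + t*e1) - psi(a)
# holds for all t in [0, b1 - a1]. psi below is a hypothetical example,
# not the psi of the proof.

def psi(x1, x2):
    return x1 ** 2 + x1 + x2 ** 2  # convex; dpsi/dx1(0, 0) = 1 >= 0

def dpsi_dx1(x1, x2):
    return 2.0 * x1 + 1.0

a = (0.0, 0.0)
for t in [0.0, 0.25, 0.5, 1.0]:
    lower = t * dpsi_dx1(*a)
    upper = psi(a[0] + t, a[1]) - psi(*a)  # = t**2 + t here
    assert 0.0 <= lower <= upper
```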

Lemma 4

Suppose that Assumption 3 holds, \(m=2\), \(X={\mathbb {R}}^2\), C is compact, and there exists a nonzero vector \(d\in {\mathbb {R}}^2\) such that, for all \(\xi \in {\mathbb {R}}^2\), \(\langle {\nabla } \psi (\xi ), d \rangle = 0\). The function \(\gamma \) is Whitney-\({\mathscr {C}}^1\) on the compact set K.

Proof

Since \(d\ne 0\), either \(d_1\ne 0\), \(d_2\ne 0\), or both. Thus, without loss of generality, suppose that \(d_1\ge 0\) and \(d_2>0\); the other cases are handled similarly. Let \(\pi \) denote the linear transformation \(\xi \in {\mathbb {R}}^2\mapsto (\xi _1-(\frac{d_1}{d_2})\xi _2,0)\in {\mathbb {R}}^2\), and observe that \(\pi (\xi ) = \xi - (\tfrac{\xi _2}{d_2})d\) for each \(\xi \in {\mathbb {R}}^2\). Thus,

$$\begin{aligned} \psi (\pi (\xi ))=\psi (\xi )-\int _0^{\tfrac{\xi _2}{d_2}}\left\langle {{\nabla } \psi (\xi -sd)},{d}\right\rangle \mathrm {d}s=\psi (\xi ), \quad \forall \xi \in {\mathbb {R}}^2; \end{aligned}$$

which implies that, for each \((\ell ,u)\in K\),

$$\begin{aligned} \gamma (\ell ,u)=\min \{\psi (\eta ):\eta =\pi (\xi ),\,\,\ell \le \xi \le u\}. \end{aligned}$$

Since the transformation \(\pi \) is linear and \([\ell ,u]\) is convex, the set \(\{\pi (\xi ):\ell \le \xi \le u\}\) is the convex hull of \(\{\pi (\ell _1,\ell _2),\pi (\ell _1,u_2),\pi (u_1,\ell _2),\pi (u_1,u_2)\}\). Since \(d\ge 0\), this set is readily evaluated to be:

$$\begin{aligned} \{\pi (\xi ):\ell \le \xi \le u\} = \left\{ (\eta _1,0)\in {\mathbb {R}}^2:\ell _1-\left( \tfrac{d_1}{d_2}\right) u_2 \le \eta _1 \le u_1-\left( \tfrac{d_1}{d_2}\right) \ell _2\right\} . \end{aligned}$$

The following reformulation of \(\gamma \) is thus obtained:

$$\begin{aligned} \gamma (\ell ,u) = \min \left\{ \psi (\eta _1,0): \ell _1-\left( \tfrac{d_1}{d_2}\right) u_2 \le \eta _1 \le u_1-\left( \tfrac{d_1}{d_2}\right) \ell _2\right\} , \quad \forall (\ell ,u)\in K. \end{aligned}$$

Since the univariate mapping \(\eta _1\in {\mathbb {R}}\mapsto \psi (\eta _1,0)\) is convex and \({\mathscr {C}}^1\), Theorem 2 implies that \(\gamma \) is Whitney-\({\mathscr {C}}^1\) on K. \(\square \)
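The projection \(\pi \) and the interval image of the box \([\ell ,u]\) computed in the proof of Lemma 4 can be sketched numerically. The direction \(d\) and the box bounds below are illustrative placeholders, not values from the article:

```python
# Sketch of the projection in Lemma 4: with d = (d1, d2), d1 >= 0, d2 > 0,
#   pi(xi) = (xi1 - (d1/d2)*xi2, 0),
# the image of the box [l, u] is the segment
#   [l1 - (d1/d2)*u2, u1 - (d1/d2)*l2] x {0}.
# d and the bounds below are illustrative, not from the article.

d1, d2 = 1.0, 2.0
l = (-1.0, 0.0)
u = (2.0, 3.0)

def pi(xi1, xi2):
    return (xi1 - (d1 / d2) * xi2, 0.0)

lo = l[0] - (d1 / d2) * u[1]
hi = u[0] - (d1 / d2) * l[1]

# The image of the box is the convex hull of the corner images, so checking
# the four corners suffices: their images span exactly [lo, hi].
corners = [(l[0], l[1]), (l[0], u[1]), (u[0], l[1]), (u[0], u[1])]
images = [pi(x, y)[0] for (x, y) in corners]
assert min(images) == lo and max(images) == hi
```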

Theorem 4 is then proved as follows. The claim of the theorem regarding \(\underline{g}^{\mathrm {C}}\) will be demonstrated; a similar argument yields the claim regarding \(\overline{g}^{\mathrm {C}}\). Define a compact convex set

$$\begin{aligned} B:=\{(\ell ,u): \ell \in X, u\in X, \ell \le u\}\subset {\mathbb {R}}^2\times {\mathbb {R}}^2. \end{aligned}$$

Lemmas 3 and 4 show that the function \(\gamma :B\rightarrow {\mathbb {R}}:(\ell ,u)\mapsto \min \left\{ \underline{\phi }^{\mathrm {C}}(\xi ):\ell \le \xi \le u\right\} \) is Whitney-\({\mathscr {C}}^1\). Observing that \(\underline{g}^{\mathrm {C}}\) is equivalent to the mapping

$$\begin{aligned} z\mapsto \gamma \left( \underline{f}^{\mathrm {C}}_1(z),\ldots ,\underline{f}^{\mathrm {C}}_m(z),\overline{f}^{\mathrm {C}}_1(z),\ldots ,\overline{f}^{\mathrm {C}}_m(z)\right) \end{aligned}$$

on Z, the chain rule for Whitney-\({\mathscr {C}}^1\) functions applies to this representation of \(\underline{g}^{\mathrm {C}}\). This yields the required result. \(\square \)

Gradients for suggested multivariate relaxations

The following two propositions present gradients for the Whitney-\({\mathscr {C}}^1\) relaxations provided in Theorems 6 and 7. In each case, the provided gradients may be computed directly using the chain rule for Whitney-\({\mathscr {C}}^1\) functions.

Proposition 15

Suppose that the conditions of Theorem 6 hold. Gradients for the relaxations \(\underline{g}^{\mathrm {C}}_\times \) and \(\overline{g}^{\mathrm {C}}_\times \) are as follows, at any \(z\in Z\). Arguments of partial derivatives are suppressed here, and are the same as the analogous function arguments in Theorem 6. The partial derivatives of \(\underline{\psi }_{\times ,\mathrm {A}}\) may be computed at any argument corresponding to the “min” in the definition of \(\underline{g}^{\mathrm {C}}_{\times ,\mathrm {A}}\); this follows from a gradient invariance property of Mangasarian [37].

$$\begin{aligned} {\nabla } \underline{g}^{\mathrm {C}}_\times (z)&=\left\{ \begin{array}{ll} \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {x}}\,{\nabla } \underline{f}^{\mathrm {C}}_1(z) + \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {y}}\,{\nabla } \underline{f}^{\mathrm {C}}_2(z), &{}\quad \text {if both }0\le {\underline{x}_1}\text { and }0\le \underline{x}_2, \\ -\frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {x}}\,{\nabla } \overline{f}^{\mathrm {C}}_1(z) - \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {y}}\,{\nabla } \overline{f}^{\mathrm {C}}_2(z), &{}\quad \text {if both }{\overline{x}_1}\le 0\text { and }\overline{x}_2\le 0, \\ \max \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {x}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_1(z) + \min \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {x}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_1(z) \\ \qquad {}+ \max \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {y}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_2(z)&{}\\ \qquad + \min \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {y}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_2(z), &{}\quad \text {otherwise,} \end{array} \right. 
\\ {\nabla } \overline{g}^{\mathrm {C}}_\times (z)&=\left\{ \begin{array}{ll} \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {x}}\,{\nabla } \overline{f}^{\mathrm {C}}_1(z) - \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {y}}\,{\nabla } \underline{f}^{\mathrm {C}}_2(z), &{}\quad \text {if both }{\overline{x}_1}\le 0 \text { and }0\le \underline{x}_2, \\ -\frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {x}}\,{\nabla } \underline{f}^{\mathrm {C}}_1(z) + \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {y}}\,{\nabla } \overline{f}^{\mathrm {C}}_2(z), &{}\quad \text {if both }0\le {\underline{x}_1}\text { and }\overline{x}_2\le 0, \\ \max \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {x}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_1(z) + \min \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {x}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_1(z) \\ \qquad {}- \max \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {y}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_2(z)&{}\\ \qquad - \min \left\{ 0,\frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {y}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_2(z), &{}\quad \text {otherwise,} \end{array} \right. \end{aligned}$$

where the required partial derivatives of \(\underline{\psi }_{\times ,\mathrm {A}}\) and \(\underline{\psi }_{\times ,\mathrm {B}}\) are as follows.

$$\begin{aligned} \frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {x}}(x,y,\varvec{\zeta },\varvec{\eta })&= \frac{1}{2}\left( \underline{\eta }+\overline{\eta } + (\mu +1)\left( \overline{\eta }-\underline{\eta }\right) \left( \tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right) \left| \tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right| ^{\mu +1}\right) , \\ \frac{\partial {\underline{\psi }_{\times ,\mathrm {A}}}}{\partial {y}}(x,y,\varvec{\zeta },\varvec{\eta })&= \frac{1}{2}\left( \underline{\zeta }+\overline{\zeta } + (\mu +1)\left( \overline{\zeta }-\underline{\zeta }\right) \left( \tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right) \left| \tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right| ^{\mu +1}\right) , \\ \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {x}}(x,y,\varvec{\zeta },\varvec{\eta })&= \underline{\eta } + (\mu +1)\left( \overline{\eta }-\underline{\eta }\right) \left( \max \left\{ 0,\tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right\} \right) ^\mu , \\ \frac{\partial {\underline{\psi }_{\times ,\mathrm {B}}}}{\partial {y}}(x,y,\varvec{\zeta },\varvec{\eta })&= \underline{\zeta } + (\mu +1)\left( \overline{\zeta }-\underline{\zeta }\right) \left( \max \left\{ 0,\tfrac{y-\underline{\eta }}{\overline{\eta }-\underline{\eta }} - \tfrac{\overline{\zeta }-x}{\overline{\zeta }-\underline{\zeta }}\right\} \right) ^\mu . \end{aligned}$$
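As a concrete illustration, the stated partials of \(\underline{\psi }_{\times ,\mathrm {A}}\) and the sign-splitting rule in the "otherwise" branch of \({\nabla } \underline{g}^{\mathrm {C}}_\times \) can be assembled as follows. The value of \(\mu \), the bounds \(\varvec{\zeta },\varvec{\eta }\), and the relaxation gradients used here are illustrative placeholders, not values from the article:

```python
# Sketch: "otherwise" branch of the product-relaxation gradient in
# Proposition 15, assembled from the stated partials of psi_{x,A}.
# mu, the bounds, and the gradient vectors are illustrative placeholders.

mu = 2  # smoothing exponent (assumed value)

def dpsiA_dx(x, y, zl, zu, el, eu):
    t = (y - el) / (eu - el) - (zu - x) / (zu - zl)
    return 0.5 * (el + eu + (mu + 1) * (eu - el) * t * abs(t) ** (mu + 1))

def dpsiA_dy(x, y, zl, zu, el, eu):
    t = (y - el) / (eu - el) - (zu - x) / (zu - zl)
    return 0.5 * (zl + zu + (mu + 1) * (zu - zl) * t * abs(t) ** (mu + 1))

def grad_lower_product(px, py, grad_f1_lo, grad_f1_hi, grad_f2_lo, grad_f2_hi):
    # Positive partials weight the convex underestimator gradients;
    # negative partials weight the concave overestimator gradients.
    return [max(0.0, px) * g1l + min(0.0, px) * g1h
            + max(0.0, py) * g2l + min(0.0, py) * g2h
            for g1l, g1h, g2l, g2h
            in zip(grad_f1_lo, grad_f1_hi, grad_f2_lo, grad_f2_hi)]

px = dpsiA_dx(0.5, 0.5, -1.0, 1.0, -1.0, 1.0)
py = dpsiA_dy(0.5, 0.5, -1.0, 1.0, -1.0, 1.0)
g = grad_lower_product(px, py, [1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8])
```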

Proposition 16

Suppose that the conditions of Theorem 7 hold. Gradients for the relaxations \(\underline{g}^{\mathrm {C}}_{\max }\) and \(\overline{g}^{\mathrm {C}}_{\max }\) are as follows, at any \(z\in Z\). The required partial derivatives of \(\underline{\psi }_{\max }\) may be evaluated at any argument that yields \(\underline{g}^{\mathrm {C}}_{\max }\) in Theorem 7; this follows from a gradient invariance property of Mangasarian [37].

$$\begin{aligned} {\nabla } \underline{g}^{\mathrm {C}}_{\max }(z)&=\left\{ \begin{array}{ll} {\nabla } \underline{f}^{\mathrm {C}}_1(z), &{}\quad \text {if }\overline{x}_2\le \underline{x}_1, \\ {\nabla } \underline{f}^{\mathrm {C}}_2(z), &{}\quad \text {if }\overline{x}_1\le \underline{x}_2, \\ \\ \max \left\{ 0,\frac{\partial {\underline{\psi }_{\max }}}{\partial {x}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_1(z) + \min \left\{ 0,\frac{\partial {\underline{\psi }_{\max }}}{\partial {x}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_1(z) \\ \qquad {}+ \max \left\{ 0,\frac{\partial {\underline{\psi }_{\max }}}{\partial {y}}\right\} {\nabla } \underline{f}^{\mathrm {C}}_2(z) + \min \left\{ 0,\frac{\partial {\underline{\psi }_{\max }}}{\partial {y}}\right\} {\nabla } \overline{f}^{\mathrm {C}}_2(z), &{}\quad \text {otherwise,} \end{array} \right. \end{aligned}$$

and the mapping \(\overline{g}^{\mathrm {C}}_{\max }\) has the following gradient, evaluated at any \(z\in Z\):

$$\begin{aligned} {\nabla } \overline{g}^{\mathrm {C}}_{\max }(z)&=\left\{ \begin{array}{ll} {\nabla } \overline{f}^{\mathrm {C}}_1(z),&{} \quad \text {if }\overline{x}_2\le \underline{x}_1, \\ {\nabla } \overline{f}^{\mathrm {C}}_2(z),&{} \quad \text {if }\overline{x}_1\le \underline{x}_2, \\ \left( \frac{\max \{\overline{x}_1,\overline{x}_2\} - \max \{\underline{x}_1,\overline{x}_2\}}{\overline{x}_1-\underline{x}_1}\right) {\nabla } \overline{f}^{\mathrm {C}}_1(z) &{}\\ \qquad + \left( \frac{\max \{\overline{x}_1,\overline{x}_2\} - \max \{\overline{x}_1,\underline{x}_2\}}{\overline{x}_2-\underline{x}_2}\right) {\nabla } \overline{f}^{\mathrm {C}}_2(z) \\ \qquad {}+(\mu +1){\Delta }\left( \max \left\{ 0,\frac{\overline{x}_1-\overline{f}^{\mathrm {C}}_1(z)}{\overline{x}_1-\underline{x}_1} -\frac{\overline{f}^{\mathrm {C}}_2(z)-\underline{x}_2}{\overline{x}_2-\underline{x}_2}\right\} \right) ^\mu &{}\\ \qquad \left( \frac{{\nabla } \overline{f}^{\mathrm {C}}_1(z)}{\overline{x}_1-\underline{x}_1} + \frac{{\nabla } \overline{f}^{\mathrm {C}}_2(z)}{\overline{x}_2-\underline{x}_2}\right) , &{} \quad \text {otherwise,} \end{array} \right. \end{aligned}$$

where \({\Delta }:=\max \{\underline{x}_1,\overline{x}_2\} +\max \{\overline{x}_1,\underline{x}_2\} -\max \{\underline{x}_1,\underline{x}_2\} -\max \{\overline{x}_1,\overline{x}_2\}\), and where the required partial derivatives of \(\underline{\psi }_{\max }\) are as follows.

$$\begin{aligned} \frac{\partial {\underline{\psi }_{\max }}}{\partial {x}}(x,y,\varvec{\zeta },\varvec{\eta })&= \left\{ \begin{array}{ll} 1 - (\mu +1)\left( \max \left\{ 0,\frac{y-x}{\overline{\eta }-\underline{\zeta }}\right\} \right) ^\mu , &{}\quad \text {if }\underline{\eta }\le \underline{\zeta }<\overline{\eta }, \\ (\mu +1)\left( \max \left\{ 0,\frac{x-y}{\overline{\zeta }-\underline{\eta }}\right\} \right) ^\mu , &{}\quad \text {if }\underline{\zeta }<\underline{\eta }<\overline{\zeta }, \\ \end{array} \right. \\ \frac{\partial {\underline{\psi }_{\max }}}{\partial {y}}(x,y,\varvec{\zeta },\varvec{\eta })&= \left\{ \begin{array}{ll} (\mu +1)\left( \max \left\{ 0,\frac{y-x}{\overline{\eta }-\underline{\zeta }}\right\} \right) ^\mu , &{}\quad \text {if }\underline{\eta }\le \underline{\zeta }<\overline{\eta }, \\ 1 - (\mu +1)\left( \max \left\{ 0,\frac{x-y}{\overline{\zeta }-\underline{\eta }}\right\} \right) ^\mu , &{}\quad \text {if }\underline{\zeta }<\underline{\eta }<\overline{\zeta }. \\ \end{array} \right. \end{aligned}$$
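In both cases of the display above, the two partial derivatives of \(\underline{\psi }_{\max }\) sum to one, consistent with the fact that shifting both arguments of a relaxation of \(\max \) by \(t\) shifts its value by \(t\). This invariant can be checked with a short sketch; \(\mu \) and the bound values below are illustrative placeholders, not values from the article:

```python
# Sketch: the stated partials of psi_max sum to 1 in each case of
# Proposition 16. mu and the bounds below are illustrative placeholders.

mu = 2  # smoothing exponent (assumed value)

def dpsimax(x, y, zl, zu, el, eu):
    """Return (d/dx, d/dy) of psi_max per the two cases above."""
    if el <= zl < eu:
        s = (mu + 1) * max(0.0, (y - x) / (eu - zl)) ** mu
        return 1.0 - s, s
    elif zl < el < zu:
        s = (mu + 1) * max(0.0, (x - y) / (zu - el)) ** mu
        return s, 1.0 - s
    raise ValueError("outside the cases covered by Proposition 16")

dx, dy = dpsimax(0.2, 0.6, 0.0, 1.0, 0.0, 1.0)     # first case
assert abs(dx + dy - 1.0) < 1e-12

dx2, dy2 = dpsimax(0.4, 0.1, -0.5, 1.0, 0.0, 1.0)  # second case
assert abs(dx2 + dy2 - 1.0) < 1e-12
```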


Khan, K.A., Watson, H.A.J. & Barton, P.I. Differentiable McCormick relaxations. J Glob Optim 67, 687–729 (2017). https://doi.org/10.1007/s10898-016-0440-6
