
1 Introduction

Optimization problems and constraint satisfaction problems are ubiquitous in engineering, industry, and finance. These include the problem of finding an element of \(\mathbb {R}^n\) satisfying a finite set of constraints or determining that the constraints are unsatisfiable; the problem of bounding the value of an objective function over a domain defined by such a set of constraints; and the problem of finding a value of the domain that maximizes (or minimizes) the value of an objective function. Linear programming, revolutionized by Dantzig's introduction of the simplex algorithm in 1947, deals with the case in which the constraints and objective function are linear. The development of interior point methods in the 1980s made it possible to solve problems defined by convex constraints and objective functions efficiently, giving rise to the field of convex programming [10, 36, 43]. Today there are numerous back-end solvers for convex optimization problems, including MOSEK [30], SeDuMi [41], and Gurobi [23]. They employ a variety of methods, each with its own particular strengths and weaknesses. (See [1, Section 1.2] for an overview.)

Using such software requires interpreting the problem one wants to solve in terms of one or more associated optimization problems. Often, this is straightforward; proving the safety of an engineered system might require showing that a certain quantity remains within specified bounds, and an industrial problem might require determining optimal or near-optimal allocation of certain resources. Other applications are less immediate. For example, proving an interesting mathematical theorem may require a lemma that bounds some quantity of interest (e.g. [4]). Once one has formulated the relevant optimization problems, one has to transform them into ones that the available software can solve, and one has to ensure that the conditions under which the software is designed to work correctly have been met. Mathematical knowledge and domain-specific expertise are often needed to transform a problem to match an efficient convex programming paradigm. A number of modeling packages then provide front ends that apply further transformations so that the resulting problem conforms to a back-end solver’s input specification [15, 17, 20, 26, 42]. The transformed problem is sent to the back-end solver and the solver produces a response, which then has to be reinterpreted in terms of the original problem.

Our goal here is to develop ways of using formal methods to make the passage from an initial mathematical problem to the use of a back-end solver more efficient and reliable. Expressing a mathematical problem in a computational proof assistant provides clarity by endowing claims with a precise semantics, and having a formal library at hand enables users to draw on a body of mathematical facts and reasoning procedures. These make it possible to verify mathematical claims with respect to the primitives and rules of a formal axiomatic foundation, providing strong guarantees as to their correctness. Complete formalization places a high burden on practitioners and often imposes a standard that is higher than users want or need, but verification is not an all-or-nothing affair: users should have the freedom to decide which results they are willing to trust and which ones ought to be formally verified.

With respect to the use of optimization software, the soundness of the software itself is one possible concern. Checking the correctness of a solution to a satisfaction problem is easy in principle: one simply plugs the result into the constraints and checks that they hold. Verifying the correctness of a bounding problem or optimization problem is often almost as easy, in principle, since the results are often underwritten by the existence of suitable certificates that are output by the optimization tools. In practice, these tasks are made more difficult by the fact that floating point calculation can introduce numerical errors that bear on the correctness of the solution.

Here, instead, we focus on the task of manipulating a problem and reducing it to a form that a back-end solver can handle. Performing such transformations in a proof assistant offers strong guarantees that the results are correct and have the intended meaning, and it enables users to perform the transformations interactively or partially, and thus introspect and explore the results of individual transformation steps. Moreover, in constructing and reasoning about the transformations, users can take advantage of an ambient mathematical library, including a database of functions and their properties.

In Section 3, we describe the process that CVXPY and other systems use to transform optimization problems expressed in the disciplined convex programming (DCP) framework to conic form problems that can be sent to solvers like MOSEK [30]. In Section 4, we explain how our implementation in the Lean programming language and proof assistant [32, 33] augments that algorithm so that, at the same time, it produces a formal proof that the resulting reduction is correct. DCP relies on a library of basic atoms that serve as building blocks for reductions, and in Section 5, we explain how our implementation makes it possible to add new atoms in a verified way. In Section 6, we provide an example of the way that one can further leverage the power of an interactive theorem prover to justify the reduction of a problem that lies outside the DCP framework to one that lies within it, using the mathematical library to verify its correctness. In Section 7, we describe our interface between Lean and an external solver, which transforms an exact symbolic representation of a problem into a floating point approximation. Related work is described in Section 8 and conclusions are presented in Section 9.

We have implemented these methods in a prototype, CvxLean (footnote 1). We offer more information about the implementation in Section 9. A preliminary workshop paper [6] described our initial plans for this project and the reduction framework presented here in Section 2.

2 Optimization problems and reductions

The general structure of a minimization problem is expressed in Lean 4 as follows:

[figure a: the Lean definition of the Minimization structure]
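The listing is rendered as a figure in the source. A minimal sketch of what it contains, based on the description below and our reading of the CvxLean repository (names should be taken as approximate):

    structure Minimization (D R : Type) where
      objFun : D → R            -- the objective function on the domain D
      constraints : D → Prop    -- the constraints, as a predicate on D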

Here the data type D is the domain of the problem and R is the data type in which the objective function takes its values. The field objFun represents the objective function and constraints is a predicate on D, which, in Lean, is represented as a function from D to propositions: for every value a of the domain D, the proposition constraints a, which says that the constraints hold of a, is either true or false. The domain D is often \(\mathbb {R}^n\) or a space of matrices, but it can also be something more exotic, like a space of functions. The data type R is typically the real numbers, but in full generality it can be any type that supports an ordering. A maximization problem is represented as a minimization problem for the negation of the objective function.

A feasible point for the minimization problem p is an element point of D satisfying p.constraints. Lean’s foundational framework allows us to package the data point with the condition that it satisfies those constraints:

[figure b: the Lean definition of a feasible point]
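A sketch of this definition, under the same caveats about names:

    structure FeasPoint {D R : Type} [Preorder R] (p : Minimization D R) where
      point : D
      feasibility : p.constraints point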

The curly and square brackets denote parameters that can generally be inferred automatically. A solution to the minimization problem p is a feasible point, denoted point, such that for every feasible point y the value of the objective function at point is smaller than or equal to the value at y.

[figure c: the Lean definition of a solution]
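A sketch of the corresponding structure, which extends a feasible point with the optimality property just described:

    structure Solution {D R : Type} [Preorder R] (p : Minimization D R) where
      point : D
      feasibility : p.constraints point
      optimality : ∀ y : D, p.constraints y → p.objFun point ≤ p.objFun y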

Feasibility and bounding problems can also be expressed in these terms. If the objective function is constant (e.g. the constant zero function), a solution to the optimization problem is simply a feasible point. Given a domain, an objective function f, and constraints, the value b is a strict lower bound on the value of the objective function over the domain if and only if the feasibility problem obtained by adding the inequality \(f(x) \le b\) to the constraints has no solution.

Lean 4 allows us to implement convenient syntax for defining optimization problems. For example, the following specifies the problem of maximizing \(\sqrt{x - y}\) subject to the constraints \(y = 2x - 3\) and \(x^2 \le 2\):

[figure e: the example problem in CvxLean syntax]
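A sketch of the declaration in CvxLean's syntax (the constraint names c1–c3 are referred to below; the exact surface syntax may differ slightly):

    def p :=
      optimization (x y : ℝ)
        maximize sqrt (x - y)
        subject to
          c1 : y = 2*x - 3
          c2 : x^2 ≤ 2
          c3 : 0 ≤ x - y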

The third condition, c3, ensures that the objective function makes sense and is concave on the domain determined by the constraints. In some frameworks, like CVXPY, this constraint is seen as implicit in the use of the expression sqrt (x - y), but we currently make it explicit in CvxLean. Problems can also depend on parameters and background conditions. For example, we can replace c1 above by y = a*x - 3 for a parameter a, and we can replace the objective function by b * sqrt (x - y) with the background assumption \(0 \le b\).

In Section 6, we will consider the problem of covariance estimation for Gaussian variables, which can be expressed as follows, for a tuple of sample values y:

[figure g: the covariance estimation problem in CvxLean syntax]
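A sketch of the declaration, with hypothetical parameter names for the dimension n and the number of samples N:

    def covEstimation (n N : ℕ) (y : Fin N → Fin n → ℝ) :=
      optimization (R : Matrix (Fin n) (Fin n) ℝ)
        maximize ∏ i, gaussianPdf R (y i)
        subject to
          c_pos_def : R.posDef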

Here Matrix (Fin n) (Fin n) ℝ is Lean's representation of the data type of \(n \times n\) matrices over the reals, gaussianPdf is the Gaussian probability density function defined in Section 6, and the constraint R.posDef specifies that R ranges over positive definite matrices.

If p and q are problems, a reduction from p to q is a function mapping any solution to q to a solution to p. The existence of such a reduction means that to solve p it suffices to solve q. If p is a feasibility problem, it means that the feasibility of q implies the feasibility of p or, equivalently, that the infeasibility of p implies the infeasibility of q. We can now easily describe what we are after: we are looking for a system that helps a user reduce a problem p to a problem q that can be solved by an external solver. (For a bounding problem q, the goal is to show that the constraints with the negated bound are infeasible by finding a reduction from an infeasible problem p.) At the same time, we wish to verify the correctness of the reduction, either automatically or with user interaction. This will ensure that the results from the external solver really address the problem that the user is interested in solving.

This notion of a reduction is quite general, and is not restricted to any particular kind of constraint or objective function. In the sections that follow, we explain how the notion can be applied to convex programming.

3 Reduction to conic form

Disciplined Convex Programming (DCP) is a framework for writing constraints and objective functions in such a way that they can automatically be transformed into problems that can be handled by particular back-end solvers. It aims to be flexible enough to express optimization problems in a natural way but restrictive enough to ensure that problems can be transformed to meet the requirements of the solvers. To start with, the framework guarantees that expressions satisfy the relevant curvature constraints [1, 21], assigning a role to each expression:

  • Constant expressions and variables are affine.

  • An expression \(f(\textsf{expr}_1, \ldots , \textsf{expr}_n)\) is affine if f is an affine function and for each i, \(\textsf{expr}_i\) is affine.

  • An expression \(f(\textsf{expr}_1, \ldots , \textsf{expr}_n)\) is convex if f is convex and for each i, one of the following conditions holds:

    • f is increasing in its ith argument and \(\textsf{expr}_i\) is convex.

    • f is decreasing in its ith argument and \(\textsf{expr}_i\) is concave.

    • \(\textsf{expr}_i\) is affine.

  • The previous statement holds with “convex” and “concave” switched.

An affine expression is both convex and concave. Some functions f come with side conditions on the range of arguments for which such curvature properties are valid; e.g. \(f(x) = \sqrt{x}\) is concave and increasing on the domain \(\{ x \in \mathbb {R} \mid x \ge 0 \}\).

A minimization problem is amenable to the DCP reduction if, following the rules above, its objective function is convex and the expressions occurring in its constraints are concave or convex, depending on the type of constraint. For example, maximizing \(\sqrt{x - y}\) requires minimizing \(- \sqrt{x - y}\), and the DCP rules tell us that the latter is a convex function of x and y on the domain where \(x - y \ge 0\), because \(x - y\) is affine, \(\sqrt{\cdot }\) is concave and increasing in its argument, and negation is affine and decreasing in its argument.

CvxLean registers the properties of atomic functions \(f(\bar{a})\) in a library of atoms. Each such function f is registered with a formal representation \(\textsf{expr}_f(\bar{a})\) using expressions, like x * log x or log (det A), that can refer to arbitrary functions defined in Lean's library. The atom also registers the relevant properties of f. The curvature of f, \(\textsf{curv}_f\), has one of the values \(\textsf{convex}\), \(\textsf{concave}\), or \(\textsf{affine}\), and the monotonicity of the function in each of its arguments is tagged as \(\textsf{increasing}\), \(\textsf{decreasing}\), or \(\textsf{neither}\). CvxLean also allows the monotonicity value \(\textsf{auxiliary}\), which indicates an argument that serves as a fixed parameter in the sense that it is independent of the variables in the optimization problem. Atoms can also come with background conditions \(\textsf{bconds}_f(\bar{a})\), which are independent of the domain variables, and variable conditions \(\textsf{vconds}_f(\bar{a})\), which constrain the domain on which the properties hold. Notably, the atoms also include proofs of properties that are needed to justify the DCP reduction.

By storing additional information with each atom, a DCP framework can use the compositional representation of expressions to represent a problem in a form appropriate to a back-end solver. For example, solvers like MOSEK expect problems to be posed in a certain conic form [30]. To that end, CVXPY stores a graph implementation for each atomic function f, which is a representation of f as the solution to a conic optimization problem. By definition, the graph implementation of an atomic function f is an optimization problem in conic form, given by a list of variables \(\bar{v}\), an objective function \(\textsf{obj}_f(\bar{x}, \bar{v})\), and a list of constraints \(\textsf{constr}_f(\bar{x}, \bar{v})\), such that the optimal value of the objective under the constraints is equal to \(f(\bar{x})\) for all \(\bar{x}\) in the domain of validity. For example, for any \(x \ge 0\), the concave function \(\sqrt{x}\) can be characterized as the maximum value of the objective function \(\textsf{obj}(x, t) = t\) satisfying the constraint \(\textsf{constr}(x, t)\) given by \(t^2 \le x\). Once again, a notable feature of CvxLean is that the atom comes equipped with a formal proof of this fact.

The idea is that we can reduce a problem to the required form by iteratively replacing each application of an atomic function by an equivalent characterization in terms of the graph implementation. For example, we can replace a subexpression \(\sqrt{x - y}\) by a new variable t and add the constraint \(t^2 \le x - y\), provided that the form of the resulting problem ensures that, for any optimal solution to the constraints, t will actually be equal to \(\sqrt{x - y}\). Given a well-formed DCP minimization problem, CvxLean must perform the reduction and construct a formal proof of the associated claims. In this section we describe the reduction, and in the next section we describe the proofs. A more formal description of both is given in an extended version of this paper [7].

Let e be a well-formed DCP expression. CvxLean associates to each such expression a tree T whose leaves are expressions that are affine with respect to the variables of the optimization problem. For example, this is the tree associated with the expression -sqrt (x - y):

[figure i: the atom tree for -sqrt (x - y)]

    neg
    └── sqrt
        └── sub
            ├── x
            └── y

Alternatively, we could use a single leaf for x - y. Denoting the variables of the optimization problem by \(\bar{y}\), we can recursively assign to each node n a subexpression \(\textsf{oexpr}_n(\bar{y})\) of e that corresponds to the subtree with root n. In the example above, the subexpressions are x, y, x - y, sqrt (x - y), and -sqrt (x - y). To each internal node, we assign a curvature, \(\textsf{convex}\), \(\textsf{concave}\), or \(\textsf{affine}\), subject to the rules of DCP. An expression that is affine can be viewed as either convex or concave. Equalities and inequalities are also atoms; for example, \(e_1 \le e_2\) describes a convex set if and only if \(e_1\) is convex and \(e_2\) is concave. A formalization of the DCP rules allows us to recursively construct formal proofs of these curvature claims, modulo the conditions and assumptions of the problem. We elaborate on this process in the next section.

Now consider a well-formed DCP minimization problem with objective function o and constraints \(c_1, \dots , c_n\). We call these expressions the components of the problem. Recall the following example from the previous section, recast as a minimization problem:

[figure j: the example recast as a minimization problem]
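A sketch of the recast problem:

    optimization (x y : ℝ)
      minimize -sqrt (x - y)
      subject to
        c1 : y = 2*x - 3
        c2 : x^2 ≤ 2
        c3 : 0 ≤ x - y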

Here the components are -sqrt (x - y), y = 2*x - 3, x^2 ≤ 2, and 0 ≤ x - y.

First, we assign to each component c an atom tree \(T_c\) as described above. If \(\bar{y}\) are the variables of the original problem, the variables of the reduced problem are \(\bar{y} \cup \bar{z}\), where \(\bar{z}\) is a collection of variables consisting of a fresh set of variables for the graph implementation at each internal node of each tree, for those atoms whose graph implementations introduce new variables. To each node n of each atom tree, we assign an expression \(\textsf{rexpr}_n(\bar{y}, \bar{z})\) in the language of the reduced problem representing the expression \(\textsf{oexpr}_n(\bar{y})\) in the original problem. At the leaves, \(\textsf{rexpr}_n(\bar{y}, \bar{z})\) is the same as \(\textsf{oexpr}_n(\bar{y})\). At internal nodes we use the objective function of the corresponding atom’s graph implementation, applied to the interpretation of the arguments. The objective of the reduced problem is the expression assigned to the root of \(T_o\).

As for the constraints of the reduced problem, recall that each internal node of the original problem corresponds to an atom, which has a graph implementation. The graph implementation, in turn, is given by a list of variables \(\bar{v}\), an objective function \(\textsf{obj}_f(\bar{a}, \bar{v})\), and a list of constraints \(\textsf{constr}_f(\bar{a}, \bar{v})\). These constraints, applied to the expressions representing the arguments, are part of the reduced problem. Moreover, the constraints of the original problem, expressed in terms of the reduced problem, are also constraints of the reduced problem, with one exception. Recall that atoms can impose conditions \(\textsf{vconds}_f(\bar{a})\), which are assumed to be among the constraints of the original problem and to be implied by the graph implementation. For example, the condition \(0 \le x\) is required to characterize \(\sqrt{x}\) as the maximum value of a value t satisfying \(t^2 \le x\), but, conversely, the existence of a t satisfying \(t^2 \le x\) implies \(0 \le x\). So a constraint \(0 \le x\) that is present in the original problem to justify the use of sqrt x can be dropped from the reduced problem.

In the example above, there is a tree corresponding to each of the components -sqrt (x - y), x^2 ≤ 2, 0 ≤ x - y, and y = 2*x - 3. As n ranges over the nodes of these trees, \(\textsf{oexpr}_n(x, y)\) ranges over all the subexpressions of these components, namely, x, y, x - y, sqrt (x - y), -sqrt (x - y), x^2, 2, 0, and so on. The only atoms whose graph implementations introduce extra variables are the square root and the square. Thus, CvxLean introduces the variable t.0, corresponding to the expression sqrt (x - y), and the variable t.1, corresponding to the expression x^2. The values of \(\textsf{rexpr}_n(x, y, t_0, t_1)\) corresponding to some of the expressions above are as follows:

[figure w: the assignment of reduced expressions, including]

  • x ↦ x and y ↦ y (variables are unchanged)

  • x - y ↦ x - y (affine leaves are unchanged)

  • sqrt (x - y) ↦ t.0

  • -sqrt (x - y) ↦ -t.0

  • x^2 ↦ t.1

The constraints c1 and c2 of the original problem translate to cone constraints c1' and c2' on the new variables, the constraint c3 is implied by the graph representation of sqrt (x - y), and the graph representations of sqrt (x - y) and x^2 become new cone constraints c4' and c5'. Thus the reduced problem is as follows:

[figure z: the reduced problem in conic form]
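A sketch of the reduced problem, assuming a predicate rotatedSoCone v w x expressing the rotated second-order cone condition \(\sum _i x_i^2 \le 2 v w\) with \(v, w \ge 0\) (the names and exact rendering may differ from CvxLean's output):

    optimization (x y t.0 t.1 : ℝ)
      minimize -t.0
      subject to
        c1' : y = 2*x - 3                        -- zero cone (affine equality)
        c2' : t.1 ≤ 2                            -- nonnegative orthant
        c4' : rotatedSoCone (x - y) 0.5 ![t.0]   -- t.0 ^ 2 ≤ x - y
        c5' : rotatedSoCone t.1 0.5 ![x]         -- x ^ 2 ≤ t.1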

Here, ![t.0] and ![x] denote singleton vectors and the meaning of the cone constraints is annotated in the comments. For a description of the relevant conic forms, see the MOSEK modeling cookbook [31].

4 Verifying the reduction

The reduction described in the previous section is essentially the same as the one carried out by CVXPY. The novelty of CvxLean is that it provides a formal justification that the reduction is correct. The goal of this section is to explain how we manage to construct a formal proof of that claim. In fact, given a problem P with an objective function f, CvxLean constructs a new problem Q with an objective g, together with the following additional pieces of data:

  • a function \(\varphi \) from the domain of P to the domain of Q such that for any feasible point x of P, \(\varphi (x)\) is a feasible point of Q with \(g(\varphi (x)) \le f(x)\)

  • a function \(\psi \) from the domain of Q to the domain of P such that for any feasible point y of Q, \(\psi (y)\) is a feasible point of P with \(f(\psi (y)) \le g(y)\).

These conditions guarantee that if y is a solution to Q then \(\psi (y)\) is a solution to P, because for any feasible point x of P we have

$$ f(\psi (y)) \le g(y) \le g(\varphi (x)) \le f(x). $$

This shows that \(\psi \) is a reduction of P to Q, and the argument with P and Q swapped shows that \(\varphi \) is a reduction of Q to P. Moreover, whenever y is a solution to Q, instantiating x to \(\psi (y)\) in the chain of inequalities implies \(f(\psi (y)) = g(y)\). Similarly, when x is a solution to P, we have \(g(\varphi (x)) = f(x)\). So the conditions above imply that P has a solution if and only if Q has a solution, and when they do, the minimum values of the objective functions coincide. Below, we will refer to the data \((\varphi , \psi )\) as a strong equivalence between the two problems.
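In Lean, these data can be packaged as a structure along the following lines (a sketch with our own field names; CvxLean's actual definition may differ):

    structure StrongEquivalence {D E R : Type} [Preorder R]
        (p : Minimization D R) (q : Minimization E R) where
      phi : D → E   -- forward map
      psi : E → D   -- reverse map
      phi_feasibility : ∀ x, p.constraints x → q.constraints (phi x)
      phi_optimality  : ∀ x, p.constraints x → q.objFun (phi x) ≤ p.objFun x
      psi_feasibility : ∀ y, q.constraints y → p.constraints (psi y)
      psi_optimality  : ∀ y, q.constraints y → p.objFun (psi y) ≤ q.objFun y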

To construct and verify such a strong equivalence between the original problem and the result of applying the transformation described in Section 3, we need to store additional information with each atom. Specifically, for each atomic function \(f(\bar{a})\), that atom must provide solutions \(\textsf{sol}_f(\bar{a})\) to the graph implementation variables \(\bar{v}\), as well as formal proofs of the following facts:

  • The function \(f(\bar{a})\) satisfies the graph implementation: for each \(\bar{a}\) satisfying the conditions \(\textsf{vconds}_f(\bar{a})\), we have:

    • solution feasibility: \(\textsf{sol}_f(\bar{a})\) satisfies the constraints \(\textsf{constr}_f(\bar{a}, \textsf{sol}_f(\bar{a}))\)

    • solution correctness: we have \(\textsf{obj}_f(\bar{a}, \textsf{sol}_f(\bar{a})) = \textsf{expr}_f(\bar{a})\), where \(\textsf{expr}_f(\bar{a})\) is the expression representing f.

  • The function \(f(\bar{a})\) is the optimal solution to the graph implementation, in the following sense. Write \(\bar{a}' \mathrel {\triangle } \bar{a}\) to express the assumptions that \(a'_i \ge a_i\) for increasing arguments to f, \(a'_i \le a_i\) for decreasing arguments, and \(a'_i\) and \(a_i\) are syntactically identical for other arguments. If f is convex and \(\bar{a} \mathrel {\triangle } \bar{a}'\), we require \(\textsf{obj}_f(\bar{a},\bar{v}) \ge \textsf{expr}_f(\bar{a}')\) for any \(\bar{v}\) such that \(\textsf{constr}_f(\bar{a}, \bar{v})\) holds. If f is concave and \(\bar{a}' \mathrel {\triangle } \bar{a}\), we require \(\textsf{obj}_f(\bar{a},\bar{v}) \le \textsf{expr}_f(\bar{a}')\) for any \(\bar{v}\) such that \(\textsf{constr}_f(\bar{a}, \bar{v})\) holds. For affine atoms, we require both.

Finally, as noted in the previous section, the graph implementation implies the conditions needed for the reduction. Under the assumptions on \(\bar{a}\) and \(\bar{a}'\) in the second case above, we also require a proof of \(\textsf{vconds}_f(\bar{a}')\). We refer to this as condition elimination.

For a concrete example, consider the atom for the concave function \(\sqrt{a}\). In that case, \(\textsf{vconds}(a)\) is the requirement \(a \ge 0\), and \(\textsf{expr}(a)\), the Lean representation of the function, is given by Lean’s sqrt function. The graph implementation adds a new variable v. The only constraint \(\textsf{constr}(a, v)\) is \(v^2 \le a\), and the objective function is \(\textsf{obj}(a, v) = v\). The solution function \(\textsf{sol}(a)\) returns \(\sqrt{a}\) when a is nonnegative and an arbitrary value otherwise. The atom for \(\sqrt{\cdot }\) stores Lean proofs of all of the following:

  • solution feasibility: for every \(a \ge 0\), \((\sqrt{a})^2 \le a\)

  • solution correctness: for every \(a \ge 0\), \(\textsf{obj}(a, \textsf{sol}(a)) = \sqrt{a}\)

  • optimality: for all a, \(a'\), and v, if \(v^2 \le a\) and \(a \le a'\), then \(v \le \sqrt{a'}\)

  • condition elimination: for all a, \(a'\), and v, if \(v^2 \le a\) and \(a \le a'\), then \(0 \le a'\).

More precisely, the atom stores the representation of the graph of the square root function as a cone constraint, and the properties above are expressed in those terms. These properties entail that sqrt is concave, but we do not need to prove concavity explicitly.

Let the variables \(\bar{y}\) range over the domain of the original problem, P, and let the variables \(\bar{y}, \bar{z}\) be the augmented list of variables in the reduced problem, Q. We wish to construct a strong equivalence between P and Q. To that end, we need to define a forward map \(\varphi \) and a reverse map \(\psi \). The definition of \(\psi \) is easy: we simply project each tuple \(\bar{y}, \bar{z}\) to \(\bar{y}\). The definition of the forward map, \(\varphi \), is more involved, since we have to map each tuple \(\bar{y}\) of values to an expanded tuple \(\bar{y}, \bar{z}\). The values of \(\bar{y}\) remain unchanged, so the challenge is to define, for each new variable z, an expression \(\textsf{interp}_z(\bar{y})\) to interpret it.

Recall that for each subexpression \(\textsf{oexpr}_n(\bar{y})\) in the original problem, corresponding to a node n, there is an expression \(\textsf{rexpr}_n(\bar{y}, \bar{w})\) involving new variables from the reduced problem. Suppose a node n corresponds to an expression \(f(u_1, \ldots , u_r)\) in the original problem, and the graph implementation of f introduces new variables \(\bar{v}\). For each \(v_j\), we need to devise an interpretation \(\textsf{interp}_{v_j}(\bar{y})\). To start with, \(\textsf{sol}_f\) provides a solution to \(v_j\) in terms of the arguments \(u_1, \ldots , u_r\). For each of these arguments, \(\textsf{rexpr}\) provides a representation in terms of the variables \(\bar{y}\) and other new variables. Composing these, we get an expression \(\textsf{e}(\bar{y}, w_1, \ldots , w_{\ell })\) for \(v_j\) in terms of the variables \(\bar{y}\) of the original problem and new variables \(w_1, \ldots , w_{\ell }\). Recursively, we find interpretations \(\textsf{interp}_{w_k}(\bar{y})\) of each \(w_k\), and define \(\textsf{interp}_{v_j}(\bar{y})\) to be \(\textsf{e}(\bar{y}, \textsf{interp}_{w_1}(\bar{y}), \ldots , \textsf{interp}_{w_{\ell }}(\bar{y}))\). In other words, we read off the interpretation of each new variable of the reduced problem from the intended solution to the graph implementation, which may, in turn, require the interpretation of other new variables that were previously introduced.

In the end, the forward map \(\varphi \) is the function that maps the variables \(\bar{y}\) in the original problem to the tuple \((\bar{y}, \textsf{interp}_{z_1}(\bar{y}), \dots , \textsf{interp}_{z_m}(\bar{y}))\), where \(z_1, \ldots , z_m\) are the new variables. To show that \((\varphi , \psi )\) is a strong equivalence, we must show that for any feasible point \(\bar{y}\) of the original problem, \(\varphi (\bar{y})\) is a feasible point of the reduced problem. This follows from the solution feasibility and solution correctness requirements above. We also need to show that if \(f(\bar{y})\) is the objective function of the original problem and \(g(\bar{y}, \bar{z})\) is the objective function of the reduced problem, then \(g(\varphi (\bar{y})) \le f(\bar{y})\). In fact, the solution correctness requirement enables us to prove the stronger property \(g(\varphi (\bar{y})) = f(\bar{y})\). Finally, we need to show that for any feasible point \(\bar{y}, \bar{z}\) of the reduced problem, the tuple \(\bar{y}\) is a feasible point of the original problem and \(f(\bar{y}) \le g(\bar{y}, \bar{z})\). To do that, we recursively use the optimality requirement to show \(\textsf{rexpr}_{n}(\bar{y},\bar{z}) \ge \textsf{oexpr}_{n}(\bar{y})\) whenever the node n marks a convex expression or an affine expression in the role of a convex expression, and \(\textsf{rexpr}_{n}(\bar{y},\bar{z}) \le \textsf{oexpr}_{n}(\bar{y})\) whenever the node n marks a concave expression or an affine expression in the role of a concave expression.

A proof that the maps \(\varphi \) and \(\psi \) constructed above form a strong equivalence can be found in the extended version of this paper [7], but it is helpful to work through the example from Section 3 to get a sense of what the proof means. For this example, the forward map is \(\varphi (x, y) = (x, y, \sqrt{x - y}, x^2)\) and the reverse map is \(\psi (x, y, t_0, t_1) = (x, y)\). Assuming that (x, y) is a solution to the original problem, the fact that \(\varphi (x, y)\) satisfies c1' follows from c1, the fact that it satisfies c2' follows from c2, and the fact that it satisfies c4' and c5' follows from the fact that \(\sqrt{x - y}\) and \(x^2\) are correct solutions to the graph implementation constraints. In this direction, \(g(\varphi (x, y)) = -\sqrt{x - y} = f(x, y)\). In the other direction, assuming that \((x, y, t_0, t_1)\) is a solution to the reduced problem, the fact that (x, y) satisfies c1 follows from c1', the fact that it satisfies c2 follows from c2' and c5', and the fact that it satisfies c3 follows from c4'. Here we have \(f(\psi (x, y, t_0, t_1)) = -\sqrt{x - y}\) and \(g(x, y, t_0, t_1) = -t_0\), and the fact that the former is less than or equal to the latter follows from c4'.

5 Adding atoms

One important advantage of using an interactive theorem prover as a basis for solving optimization problems is that it is possible to extend the atom library in a verified way. In a system like CVXPY, one declares a new atom with its graph implementation on the basis of one's background knowledge or a pen-and-paper proof that the graph implementation is correct and that the function described has the relevant properties over the specified domain. In CvxLean, we have implemented syntax with which any user can declare a new atom in Lean and provide formal proofs of these facts. The declaration can be made in any Lean file, and it becomes available in any file that imports that one as a dependency. Lean has a build system and package manager that handles dependencies on external repositories, allowing a community of users to share such mathematical and computational content.

For example, the declaration of the atom for the logarithm looks as follows:

[figure ae: the declaration of the log atom]
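A sketch of the declaration, based on the atom interface described in Sections 3 and 4 (the keywords reflect our reading of the CvxLean repository and should be taken as approximate; the ellipses stand for the formal proofs mentioned below):

    declare_atom log [concave] (x : ℝ)+ : Real.log x :=
    vconditions (cond : 0 < x)
    implementationVars (t : ℝ)
    implementationObjective t
    implementationConstraints (c_exp : expCone t 1 x)
    solution (t := Real.log x)
    solutionEqualsAtom ...
    feasibility (c_exp : ...)
    optimality ...
    vconditionElimination (cond : ...)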

The ellipses indicate places that are filled by formal proofs. Proof assistants like Lean allow users to write such proofs interactively in an environment that displays proof obligations, the local context, and error messages, all while the user types. For example, placing the cursor at the beginning of the optimality block displays the following goal:

[figure af: the proof goal at the optimality block]
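The goal display is rendered as a figure; based on the description below, it is approximately:

    x t : ℝ
    c_exp : expCone t 1 x
    ⊢ ∀ (y : ℝ), x ≤ y → t ≤ Real.log y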

In other words, given real values x and t and the relevant constraint in terms of the exponential cone, we need to prove that for every \(y \ge x\), we have \(t \le \log (y)\).

For the example we present in the next section, we had to implement the log-determinant atom [10, Example 9.5], whose arguments consist of a natural number n and a matrix \(A \in \mathbb {R}^{n \times n}\). This function is represented in Lean by the atom expression \(\textsf{expr}_{ log\text {-}det } = {\texttt {log (det A)}}\), where the parameter n is implicit in the type of A. The curvature is specified to be concave, the monotonicity in n is auxiliary because we do not support the occurrence of optimization variables in this argument, and the monotonicity in A is neither because the value of \(\log (\det A)\) is neither guaranteed to increase nor guaranteed to decrease as A increases. (The relevant order here on matrices is elementwise comparison.) The correctness of the reduction requires the assumption that A is positive definite. Following CVXPY, we used the following graph implementation:

[figure ag: the graph implementation of the log-det atom]
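The graph implementation is shown as a figure in the source; following the MOSEK cookbook formulation that we verified, it is approximately the following, with implementation variables \(t \in \mathbb {R}^n\) and \(Y \in \mathbb {R}^{n \times n}\):

$$\begin{aligned} \text {maximize}\;\; \sum _{i} t_i\;\; \text {over}\;\; t, Y\;\; \text {subject to}\;\; (t, 1, y) \in K_{\textrm{exp}}^n \;\;\text {and}\;\; \begin{pmatrix} D & Z \\ Z^T & A \end{pmatrix} \succeq 0. \end{aligned}$$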

Here y is the diagonal of Y; Z is obtained from Y by setting all entries below the diagonal to 0; and D is obtained from Y by setting all entries off the diagonal to 0. Here, saying that the tuple (t, 1, y) is in the exponential cone means that \(e^{t_i} \le y_i\) for each i. Our implementation in CvxLean required proving that this graph implementation is correct. To do so, we formalized an argument in the MOSEK documentation (footnote 2). This, in turn, required proving properties of the Schur complement, triangular matrices, Gram-Schmidt orthogonalization, and LDL factorization. Moreover, the argument uses the subadditivity of the determinant function, for which we followed an argument by Andreas Thom on MathOverflow (footnote 3).

6 User-defined reductions

An even more important advantage of using an interactive proof assistant as a framework for convex optimization is that, with enough work, users can carry out any reduction that can be expressed and justified in precise mathematical terms. As a simple example, DCP cannot handle an expression of the form \(\exp (x) \exp (y)\) in a problem, requiring us instead to write it as \(\exp (x + y)\). But in CvxLean, we have the freedom to express the problem in the first form if we prefer to and then verify that the trivial reduction is justified:

[figure ah: the reduction declaration for this example]
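A sketch of such a declaration, assuming a reduction command that states the problem as written and discharges the reduction with a short tactic proof (the command and tactic names reflect our reading of the repository and may differ in detail):

    reduction red/q :
        optimization (x y : ℝ)
          minimize exp x * exp y
          subject to
            h : 0 ≤ x := by
      -- rewrite the objective using the library lemma
      -- Real.exp_add : exp (x + y) = exp x * exp y
      conv_obj => rw [← Real.exp_add]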

Here an appeal to the library lemma Real.exp_add supplies the short formal proof that \(\exp (x + y)\) can be replaced by \(\exp (x) \exp (y)\).

Of course, this functionality becomes more important as the reductions become more involved. As a more substantial example, we have implemented a reduction needed to solve the covariance estimation problem for Gaussian variables [10, p. 355]. In this problem, we are given N samples \(y_1, \dots, y_N \in \mathbb {R}^n\) drawn from a Gaussian distribution with zero mean and unknown covariance matrix R. We assume that the Gaussian distribution is nondegenerate, so R is positive definite and the distribution has density function

$$\begin{aligned} p_R(y) = \frac{1}{\sqrt{(2 \pi )^n \det R}} \exp \left( -\frac{1}{2} y^T R^{-1} y \right) . \end{aligned}$$

We want to estimate the covariance matrix R using maximum likelihood estimation, i.e., we want to find the covariance matrix that maximizes the likelihood of observing \(y_1, \dots , y_N\). The maximum likelihood estimate for R is the solution to the following problem:

$$\begin{aligned} \text {maximize}\;\;\prod _{k=1}^N p_R(y_k)\;\; \text {over}\;\; R\;\; \text {subject to}\;\; R \text { positive definite}. \end{aligned}$$

As stated, this problem has a simple analytic solution, namely, the sample covariance of \(y_1, \ldots , y_N\), but the problem becomes more interesting when one adds additional constraints, for example, upper and lower matrix bounds on R, or constraints on the condition number of R (see [10]). We can easily reduce the problem to maximizing the logarithm of the objective function above, but that is not a concave function of R. It is, however, a concave function of \(S = R^{-1}\), and common constraints on R translate to convex constraints on S. We can therefore reduce the problem above to the following:

$$\begin{aligned} \text {maximize}\;\; \log ( \det (S) ) - \sum _{k=1}^N y_k^T S y_k\;\; \text {over}\;\; S \;\; \text {subject to}\;\; S \text { positive definite}, \end{aligned}$$

possibly with additional constraints on S. We express the sum using the sample covariance \(Y = \frac{1}{N}\sum _{k=1}^N y_k y_k^T\) and the trace operator:

$$\begin{aligned}&\text {maximize}\;\; \log ( \det (S) ) - N \cdot {{\,\textrm{tr}\,}}(YS^T)\;\; \text {over}\;\; S\;\;\\&\text {subject to}\;\; S \text { positive definite} \end{aligned}$$

The problem can then be solved using disciplined convex programming. The constraint that S is positive definite is eliminated while applying the graph implementation of \( \log ( \det (S) )\).

We have formalized these facts in Lean and used them to justify the reduction. An example with an additional sparsity constraint on R can be found in CvxLean/Examples in our repository.

7 Connecting Lean to a conic optimization solver

Once a problem has been reduced to conic form, it can be sent to an external back-end solver. At this point, we must pass from the realm of precise symbolic representations and formal mathematical objects to the realm of numeric computation with floating point representations. We traverse our symbolic expressions, replacing functions on the reals from Lean’s mathematical library with corresponding numeric functions on floats, for example associating the floating point exponential function Float.exp to the real exponential function Real.exp. Our implementation makes it easy to declare such associations with the following syntax: addRealToFloat : Real.exp := Float.exp.

This is one area where more verification is possible. We could use verified libraries for floating point arithmetic [2, 9, 19, 44], we could use dual certificates to verify the results of the external solver, and we could carry out formal sensitivity analysis to manage and bound errors. Our current implementation is only designed to verify correctness up to the point where the problem is sent to the back-end solver, and to facilitate the last step, albeit in an unverified way.

We have implemented a solve command in CvxLean which takes an optimization problem prob in DCP form and carries out the following steps:

  1. It applies the dcp procedure to obtain a reduced problem, prob.reduced, and a reduction.

  2. It carries out the translation to floats, traversing each expression and applying the registered translations.

  3. It extracts the numerical data from the problem. At this point, we have scalars, arrays, and matrices associated to every type of constraint.

  4. It writes the problem to an external file in the conic benchmark format (footnote 4).

  5. It calls MOSEK and receives a status code in return, together with a solution, if MOSEK succeeds in finding one. The problem status is added to the environment and, if the problem is infeasible or ill-posed, we stop.

  6. Otherwise, the solve command interprets the solution so that it matches the shape of the variables of prob.reduced. It also expresses these values as Lean reals, resulting in an approximate solution p to prob.reduced. It declares a corresponding Solution to prob.reduced, using a placeholder for the proofs of feasibility and optimality (since we simply trust the solver here).

  7. It then uses the reduction from prob to prob.reduced, again reinterpreted in terms of floats, to compute an approximate solution to prob.

Finally, the results are added to the Lean environment. In the following example, the command solve so1 results in the creation of new Lean objects so1.reduced, so1.status, so1.value, and so1.solution. The first of these represents the conic-form problem that is sent to the back-end solver, while the remaining three comprise the resulting solution.

[figure al: using the solve command]
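A sketch of such a session (the problem so1 here is our own illustration, and the displayed outputs are indicative only):

    def so1 :=
      optimization (x y : ℝ)
        minimize -sqrt (x - y)
        subject to
          c1 : y = 2*x - 3
          c2 : x^2 ≤ 2
          c3 : 0 ≤ x - y

    solve so1

    #print so1.reduced    -- the conic-form problem sent to the solver
    #eval so1.status      -- e.g. "PRIMAL_AND_DUAL_FEASIBLE"
    #eval so1.value       -- approximate optimal value
    #eval so1.solution    -- approximate optimal point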

8 Related work

Our work builds on decades of research on convex optimization [10, 36, 39, 43], and most directly on the CVX family and disciplined convex programming [15, 17, 20, 21, 42]. Other popular packages include Yalmip [26].

Formal methods have been used to solve bounding problems [18, 38], constraint satisfaction problems [16], and optimization problems [25]. This literature is too broad to survey here, but [14] surveys some of the methods that are used in connection with the verification of cyber-physical systems. Proof assistants in particular have been used to verify bounds in various ways. Some approaches use certificates from numerical packages; Harrison [24] uses certificates from semidefinite programming in HOL Light, and Magron et al. [27] and Martin-Dorel and Roux [28] use similar certificates in Coq. Solovyev and Hales use a combination of symbolic and numeric methods in HOL Light [40]. Other approaches have focused on verifying symbolic and numeric algorithms instead. For example, Muñoz, Narkawicz, and Dutle [34] verify a decision procedure for univariate real arithmetic in PVS and Cordwell, Tan, and Platzer [13] verify another one in Isabelle. Narkawicz and Muñoz [35] have devised a verified numeric algorithm to find bounds and global optima. Cohen et al. [11, 12] have developed a framework for verifying optimization algorithms using the ANSI/ISO C Specification Language (ACSL) [5].

Although the notion of a convex set has been formalized in a number of theorem provers, we do not know of any full development of convex analysis. The Isabelle [37] HOL-Analysis library includes properties of convex sets and functions, including Carathéodory’s theorem on convex hulls, Radon’s theorem, and Helly’s theorem, as well as properties of convex sets and functions on normed spaces and Euclidean spaces. A theory of lower semicontinuous functions by Grechuk [22] in the Archive of Formal Proofs [8] includes properties of convex functions. Lean’s mathlib [29] includes a number of fundamental results, including a formalization of the Riesz extension theorem by Kudryashov and Dupuis and a formalization of Jensen’s inequality by Kudryashov. Allamigeon and Katz have formalized a theory of convex polyhedra in Coq with an eye towards applications to linear optimization [3]. We do not know of any project that has formalized the notion of a reduction between optimization problems.

9 Conclusions

We have argued that formal methods can bring additional reliability and interactive computational support to the practice of convex optimization. The success of our prototype shows that it is possible to carry out and verify reductions using a synergistic combination of automation and user interaction.

The implementation of CvxLean is currently spread between two versions of Lean [32, 33]. Lean 3 has a formal library, mathlib [29], which comprises close to a million lines of code and covers substantial portions of algebra, linear algebra, topology, measure theory, and analysis. Lean 4 is a performant programming language as well as a proof assistant, but its language is not backward compatible with that of Lean 3. All of the substantial programming tasks described here have been carried out in Lean 4, but we rely on a binary translation of the Lean 3 library and some additional results proved there. This arrangement is not ideal, but a source-level port of the Lean 3 library is already underway, and we expect to move the development entirely to Lean 4 in the near future.

There is still a lot to do. We have implemented and verified all the atoms needed for the examples presented in this paper, but these are still only a fraction of the atoms that are found in CVXPY. The DCP transformation currently leaves any side conditions that it cannot prove for the user to fill in, and special-purpose tactics, i.e. small-scale automation, could help discharge proof obligations like monotonicity. Textbooks often provide standard methods and tricks for carrying out reductions (e.g. [10, Section 4.1.3]), and these should also be supported by tactics in CvxLean. Our project, as well as Lean's library, would benefit from more formal definitions and theorems in convex analysis and optimization. We need to implement more efficient means of extracting numeric values for the back-end solver, and it would be nice to verify more of the numeric computations and claims. Finally, and most importantly, we need to work out more examples like the ones presented here to ensure that the system is robust and flexible enough to join the ranks of conventional optimization systems like CVXPY.