A Penalty-free Method for Equality Constrained Optimization

A penalty-free method is introduced for solving nonlinear programming problems with nonlinear equality constraints. The method uses neither a penalty function nor a filter. It uses a trust region technique to compute trial steps. By comparing the measures of feasibility and optimality, the algorithm either tries to reduce the objective function value by solving a normal subproblem and a tangential subproblem, or tries to improve feasibility by solving a normal subproblem only. In order to guarantee global convergence, the measure of constraint violation at each iteration is required not to exceed a progressively decreasing limit. Under usual assumptions, we prove that the given algorithm is globally convergent to first order stationary points. Preliminary numerical results on CUTEr problems are reported.

1. Introduction. We consider the following equality constrained optimization problem

    min f(x)   s.t.   c(x) = 0,                                   (1)

where f : R^n → R and c : R^n → R^m are twice continuously differentiable functions.
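In this notation, a first order stationary point of (1), to which the global convergence result refers, is a KKT point. Writing g(x) = ∇f(x) and A(x) = ∇c(x), the standard first order conditions read:

```latex
g(x^*) + A(x^*)\,\lambda^* = 0, \qquad c(x^*) = 0 .
```

Equivalently, if Z(x) denotes a basis of the null space of A(x)^T, stationarity amounts to Z(x^*)^T g(x^*) = 0 together with c(x^*) = 0, which is exactly the pair of measures the algorithm monitors.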
Here, we propose a new trust-region-based algorithm for solving (1) whose main feature is that it uses neither a penalty function nor a filter. Trust region methods are an important class of methods for (1); see, e.g., [9] and the references therein. Besides trust region methods, there are many other methods for solving (1), such as augmented Lagrangian methods [1], SQP methods [5] and inexact Newton methods [6]. Traditional methods for (1) resort to some penalty or merit function to balance the two objectives of improving feasibility and reducing the objective function. However, there are some difficulties in practice, of which the most significant is choosing an appropriate merit function. To avoid the shortcomings of penalty methods, many researchers try to find methods that do not use any penalty function, i.e., penalty-free methods. The filter method, first introduced by Fletcher and Leyffer [11], is one of the most important penalty-free methods. So far, the filter technique has been applied to SQP and SLP type methods [8,12], interior point methods [18], and line search methods [19]. In [13], Fletcher reviewed the history of filter methods.

ZHONGWEN CHEN, SONGQIANG QIU AND YUJIE JIAO
The filter method is not the only penalty-free method. In fact, in 1995, Zoppke-Donaldson [22] proposed an SQP method for (1) that does not use any penalty function. Although no convergence proof is given, the numerical results on problems from the CUTEr set [3] are impressive. M. Ulbrich et al. [17] introduced a non-monotone trust region method for (1) that uses neither a penalty function nor a filter, and showed its global and local convergence. Chen [7] also gave a penalty-free-type nonmonotone trust-region method for nonlinear constrained optimization. Yamashita et al. [21], motivated by an early report by Yamashita [20], presented a penalty-free SQP algorithm. The main algorithm contains a restoration phase, which requires the solution of one quadratic problem, and a minimization phase, which needs the solutions of one linear programming problem and two quadratic programming problems. Bielschowsky et al. [2] gave a trust cylinder method which keeps the infeasibility under dynamic control. Gould et al. [14] proposed a trust funnel method which uses neither a penalty function nor a filter. In the trust funnel algorithm, the iterates are forced towards a solution within a so-called "trust funnel". The authors proved its global convergence without a linear independence assumption. Liu et al. [15] proposed a sequential quadratic programming method without a penalty function or a filter for (1); under weaker assumptions, its global convergence and local superlinear convergence are analyzed. Qiu et al. [16] proposed another penalty-free-type algorithm based on trust region techniques for (1), which uses a feasibility safeguarding value, a progressively decreasing limit (like the "trust funnel") on the permitted constraint violation of the successive iterates, to drive global convergence.
Motivated by the ideas in [14,16,17], we propose a new penalty-free method for equality constrained optimization. The new algorithm keeps a balance between feasibility and optimality. When the measure of optimality is worse than the measure of feasibility, the algorithm focuses on improving optimality and computes a composite step, composed of a normal step and a tangential step. Otherwise, it improves feasibility simply by solving a normal subproblem. The framework of the new algorithm is simpler than, and different from, that of M. Ulbrich et al. [17]. Moreover, the trust region subproblem for the normal step and the acceptance criteria for the trial step are different from those of Gould et al. [14] and Qiu et al. [16], which is very important for the global convergence analysis. Under usual assumptions, we prove that the given algorithm is globally convergent to first order stationary points.
The paper is organized as follows. Section 2 gives a detailed description of the algorithm. Section 3 proves that the algorithm is well defined. In Section 4, global convergence is analyzed. Preliminary numerical results on CUTEr problems are reported in Section 5, and some final remarks are outlined.
2. Description of the algorithm. Define the measure of constraint violation as h(x) = (1/2)||c(x)||^2, so that ∇h(x) = A(x)c(x). Suppose the current iterate is x_k and the current upper bound on the constraint violation is h_k^max. Without loss of generality, we assume that h(x_k) ≤ h_k^max.
If the algorithm does not stop at x_k, it seeks a normal step d_k^n by solving the normal subproblem (2). Having obtained d_k^n, the algorithm tries to reduce the objective function value in the null space of A_k^T while not destroying the reduction of the constraint violation obtained by the normal step. Let B_k be a symmetric matrix that approximates the Hessian of the Lagrangian function of (1). A tangential step d_k^t is then obtained by solving the tangential subproblem (3), and the trial step is defined as the combination of the normal and tangential steps, i.e., d_k = d_k^n + d_k^t. Similarly, define the actual reduction in h(x) obtained by the trial step d_k and the predicted one. The new algorithm needs to give attention to the measure of feasibility and the measure of optimality at the same time. ||c_k|| is the measure of feasibility. To measure optimality, we let Z_k ∈ R^{n×(n−m)} be a matrix whose columns form an orthonormal basis of the null space of A_k^T, that is, Z_k^T Z_k = I, where I is the identity matrix. Then ||Z_k^T g_k|| is the measure of optimality. According to the relation between the measure of feasibility and that of optimality, the algorithm decides whether to improve the feasibility or the optimality. Therefore, we consider the following switch condition:

    ||Z_k^T g_k|| ≥ γ ||c_k||.                                   (4)

If (4) holds, then the measure of feasibility is better than the measure of optimality at the current iterate, so the algorithm solves (2) and (3) successively to obtain a normal step d_k^n and a tangential step d_k^t. Otherwise, if ||Z_k^T g_k|| < γ ||c_k|| holds, which shows that the current optimality is better than the feasibility, the algorithm focuses on improving the feasibility: it relaxes the trust region constraint in (2) and solves the relaxed subproblem (5) to get a normal step d_k^n. The tangential step in this case is set to zero, that is, d_k^t = 0.
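As a concrete illustration of the switch between the two kinds of iteration (this is not the paper's code; the test problem, the value of γ, and the helper names are hypothetical), consider a single-constraint problem, for which an orthonormal null-space basis of A_k^T is available in closed form:

```python
import math

# Hypothetical test instance: min x1 + x2  s.t.  c(x) = x1^2 + x2^2 - 1 = 0.
def grad_f(x):                      # g_k = gradient of f at x_k
    return [1.0, 1.0]

def c(x):                           # constraint value c(x_k)
    return x[0] ** 2 + x[1] ** 2 - 1.0

def A(x):                           # A_k = gradient of c (a single column, n=2, m=1)
    return [2.0 * x[0], 2.0 * x[1]]

def measures(x, gamma=0.1):
    """Return the optimality measure ||Z_k^T g_k||, the feasibility measure
    ||c_k||, and the branch chosen by the switch condition (4)."""
    g, a, ck = grad_f(x), A(x), abs(c(x))
    na = math.hypot(a[0], a[1])
    z = [-a[1] / na, a[0] / na]     # orthonormal basis of null(A_k^T)
    opt = abs(z[0] * g[0] + z[1] * g[1])      # ||Z_k^T g_k||
    branch = "f-type" if opt >= gamma * ck else "c-type"
    return opt, ck, branch

# Near-feasible but non-stationary point: the algorithm works on optimality.
print(measures([1.0, 0.0]))         # large ||Z^T g||, zero ||c||: f-type
# Stationary-in-the-null-space but infeasible point: it works on feasibility.
print(measures([math.sqrt(2), math.sqrt(2)]))
```

At the first point the reduced gradient dominates and an f-type iteration (normal plus tangential step) is taken; at the second, condition (4) fails and a c-type iteration (normal step only) results.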
Now let us consider the case in which both the normal step and the tangential step are computed. The normal step is a descent direction for the constraint violation, but not necessarily a descent direction for the objective function. This leads to a reasonable requirement for a successful step: the predicted reduction of f(x) along the tangential step should not be destroyed by the possible increase contributed by the normal step. Formally, the normal step and the tangential step should satisfy condition (6), with κ_δ ∈ (1, +∞) fixed. In the practical algorithm, we will not use (6) but the slightly relaxed condition (7), where κ̄_δ = 1 − 1/κ_δ and κ_d ∈ (0, 1) is a very small constant, e.g., κ_d = 10^{-5}. On the other hand, the algorithm requires that the new trial point keep reasonable feasibility, that is, condition (8). Thus it is possible for the algorithm to reduce the objective function value while keeping the constraint violation within the reasonable bound h_k^max. If the trial step d_k satisfies the sufficient descent condition (9), then the algorithm accepts d_k as a successful step, sets x_{k+1} = x_k + d_k, and the current iteration is referred to as an f-type iteration; otherwise the trial step is rejected and the trust region radius is reduced. At an f-type iteration, the bound h_k^max is unchanged whether the trial step is accepted or not. If (4) does not hold, then the current optimality is better than the feasibility; the main task of the current iteration becomes improving feasibility, and the current iteration is called a c-type iteration. If the trial step is accepted, the trust region radius is updated accordingly; in this case, the upper bound h_k^max on the constraint violation is also updated, in the way of [14] given in (10), where κ_h, κ̄_h ∈ (0, 1). Now we give the detailed description of the algorithm as follows.

Algorithm 2.1.
Initialization. Given an initial point x_0 ∈ R^n and an initial symmetric matrix B_0. While the stopping test fails, compute a trial step as described above, and accept it if, at an f-type iteration, (7), (8) and (9) hold; end while.

Algorithm 2.1 generates an iterate sequence {x_k} which is made up of f-type iterations and c-type iterations. A c-type iteration improves the measure of constraint violation, while an f-type iteration mainly reduces the objective function value. Owing to the restriction imposed by condition (7), the objective function value is reduced while reasonable feasibility is kept in this phase. In fact, since an f-type iteration also contains a normal step, the measure of feasibility is still improved to a certain extent, which is validated by the preliminary numerical experiments.
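The acceptance test used at both f-type and c-type iterations compares the actual reduction with the predicted one. A generic trust-region sketch of this mechanism follows; the threshold η_1 and shrink factor γ_1 appear in the paper, while the expansion constants η_2 and γ_2 below are illustrative assumptions, since Algorithm 2.1's specific update formulas are not shown in this excerpt:

```python
def accept_and_update(ared, pred, delta, eta1=0.25, eta2=0.75,
                      gamma1=0.5, gamma2=2.0):
    """Generic trust-region acceptance test and radius update: the step is
    successful when rho = ared/pred, the ratio of actual to predicted
    reduction, is at least eta1.  The constants here are illustrative."""
    rho = ared / pred
    if rho < eta1:                   # unsuccessful: shrink the radius
        return False, gamma1 * delta
    if rho >= eta2:                  # very successful: allow the radius to grow
        return True, gamma2 * delta
    return True, delta               # successful: keep the radius

ok, d = accept_and_update(ared=0.9, pred=1.0, delta=1.0)
print(ok, d)                         # accepted, radius enlarged
ok, d = accept_and_update(ared=0.1, pred=1.0, delta=1.0)
print(ok, d)                         # rejected, radius shrunk
```

In Algorithm 2.1 the same ratio test is applied with ρ_k^f (f-type) or ρ_k^c (c-type), and only c-type successes additionally tighten h_k^max via (10).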
3. Well-definedness of the algorithm. In this section, we analyze the well-definedness of Algorithm 2.1. For this purpose, we make the following assumptions (Assumptions 3.1).
(ii) The iterate sequence {x_k} generated by Algorithm 2.1 is contained in a compact convex set Ω.
(iii) The matrix A(x) has full column rank for all x ∈ Ω.

As a direct consequence of Assumptions 3.1, there exist constants that uniformly bound the quantities g_k, c_k, A_k and B_k appearing below. In order to analyze the well-definedness of Algorithm 2.1, we introduce some results. First, let us recall a well-known result on the sufficient decrease achieved in a trust region subproblem; see, for example, [9, Thm. 6.3.1 and Cor. 6.3.2].

Lemma 3.2. Let s* be a solution of the trust region subproblem

    min q(s) = g^T s + (1/2) s^T H s,   s.t.   ||s|| ≤ ∆;

then the standard sufficient decrease estimate holds.

By Lemma 3.2, we have the following estimate, under Assumptions 3.1, of the predicted reduction in h(x) achieved by the normal step.

Lemma 3.3. Let d_k^n be a solution of subproblem (2); then there exists a constant κ_cn ∈ (0, 1) such that the corresponding decrease estimate holds, where ∆_k^c = κ_n min{1, ∆_k^σ} ∆_k. Since A_k has full rank by Assumptions 3.1, the same argument applies when d_k^n is a solution of subproblem (5), which also implies the result of Lemma 3.3.
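The sufficient decrease estimate invoked in Lemma 3.2 is, in its classical form (see [9]), q(0) − q(s*) ≥ (1/2)||g|| min{∆, ||g||/||H||}. Since the solution s* is at least as good as the Cauchy point (the minimizer of q along −g inside the ball), the bound can be sanity-checked numerically on a small diagonal quadratic; the data below are illustrative:

```python
import math

def cauchy_decrease(g, H, delta):
    """Model decrease q(0) - q(s_C) achieved by the Cauchy point s_C = -t g
    of q(s) = g^T s + (1/2) s^T H s within ||s|| <= delta (H diagonal here,
    stored as its diagonal entries)."""
    ng2 = sum(gi * gi for gi in g)                   # ||g||^2
    ghg = sum(Hi * gi * gi for Hi, gi in zip(H, g))  # g^T H g
    t = delta / math.sqrt(ng2)                       # step to the boundary
    if ghg > 0:
        t = min(t, ng2 / ghg)                        # unconstrained minimizer along -g
    return t * ng2 - 0.5 * t * t * ghg

g, H, delta = [1.0, 2.0], [2.0, 4.0], 0.5            # illustrative data, H = diag(2, 4)
dec = cauchy_decrease(g, H, delta)
ng = math.sqrt(5.0)
bound = 0.5 * ng * min(delta, ng / max(H))           # (1/2)||g|| min{delta, ||g||/||H||}
print(dec >= bound)                                  # the classical bound holds
```

Lemma 3.3 and Lemma 3.4 apply exactly this estimate to the normal and tangential models of h(x) and f(x), respectively.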
Similarly, we have the following result.
Lemma 3.4. Let d_k^t be a solution of the tangential subproblem (3); then there exists a constant κ_ft ∈ (0, 1) such that the corresponding decrease estimate holds, where ḡ_k is defined accordingly. Indeed, if u_k is a solution of subproblem (11), the estimate follows from Lemma 3.2.

Lemma 3.5. Under Assumptions 3.1, the following inequalities hold.

Proof. By Taylor's theorem, we obtain the expansion of f, where ξ_k is some point on the segment connecting x_k and x_k + d_k; this gives the first inequality. Noting that ∇h(x) = A(x)c(x), we obtain the corresponding expansion of h, where ζ_k is some point on the segment connecting x_k and x_k + d_k; this gives the second inequality. So the results are proved.
Next we provide two properties of the upper bounds {h_k^max} on the constraint violation.
Lemma 3.6. For all k, there holds 0 ≤ h(x_k) ≤ h_k^max.

Proof. By the definition of h(x), we know that h(x_k) ≥ 0 for all k. Next we show h(x_k) ≤ h_k^max by mathematical induction. For k = 0, h_0^max = max{1, h(x_0)} ≥ h(x_0) obviously. Assume now that the statement holds for all j ≤ k. We are going to show that h(x_{k+1}) ≤ h_{k+1}^max. Two cases are considered, depending on whether h_k^max is updated at the k-th iteration. By the mechanism (10) of updating h_k^max, the claim follows in both cases. Therefore, the proof is finished.

Lemma 3.7. The sequence {h_k^max} is non-increasing. Moreover, for all j ≥ k, there holds h(x_j) ≤ h_k^max and h_k^max > 0.

Proof. See Lemma 2.3 in [14].
Theorem 3.8. Algorithm 2.1 is well defined; that is, if the algorithm does not terminate at x_k, then after reducing the trust region radius finitely many times, the algorithm can find a successful trial step.
Proof. If Algorithm 2.1 does not terminate at x_k, then there exists a constant ε > 0 such that max{||Z_k^T g_k||, ||c_k||} ≥ ε. Assume, to arrive at a contradiction, that the trial steps d_{k+i} are rejected for all i, where ζ_{k+i} denotes some point on the segment connecting x_k and x_k + d_{k+i}. By Lemma 3.4 and Lemma 3.5, we have that, for sufficiently large i, there holds ρ_{k+i}^f ≥ η_1. Hence, the step d_{k+i} is a successful trial step when i is large enough.
Case (ii). Hence, for sufficiently large i, there holds ρ_{k+i}^c ≥ η_1, which implies that d_{k+i} is a successful trial step when i is large enough.
If, on the other hand, (4) holds for sufficiently large i, it follows from Lemma 3.4 that the corresponding lower bound on the predicted reduction holds. We also have the companion estimate for sufficiently large i. By (12), (13) and (14), it follows that, for sufficiently large i, condition (7) holds. Now we turn to showing that h(x_{k+i} + d_{k+i}) ≤ h_{k+i}^max holds for sufficiently large i. By Taylor's theorem and Lemma 3.3, this is the case for sufficiently large i.
The results above, combined with Lemma 3.4 and Lemma 3.5, lead to the conclusion that, for sufficiently large i, ρ_{k+i}^f ≥ η_1. Summarizing the results above, we have shown that, for sufficiently large i, (7), (8) and (9) all hold, i.e., d_{k+i} is a successful trial step.
4. Global convergence. First, we define two index sets, S and H, which contain the successful iterations and the iterations at which h_k^max is changed, respectively.
Without loss of generality, we assume that |S| = +∞. Since {h_k^max} is monotonically non-increasing and bounded below, we may assume that lim_{k→∞} h_k^max = ĥ ≥ 0.
Noting that ||Z_k^T g_k|| < γ ||c_k|| for all k ∈ H, we obtain the limit along H, and thus the result is proved. We assume that ĥ > 0 in the following discussion.
Lemma 4.2. Suppose that Assumptions 3.1 hold and ĥ > 0. If Algorithm 2.1 does not terminate at x_k, then there exists a constant ∆ > 0 such that d_k is a successful trial step as long as ∆_k ≤ ∆.
Proof. Since Algorithm 2.1 does not terminate at x_k, we have that max{||Z_k^T g_k||, ||c_k||} > 0. We consider two cases as follows.
In a word, in this case, ∆_k ≤ ∆ implies that (7), (8) and (9) all hold, so the k-th iteration is a successful f-type iteration.

Case (ii). It follows from Lemma 3.3 and Lemma 3.5 that the corresponding estimate holds. It follows from ĥ > 0 that |H_1| < +∞, and thus there exists k_0 > 0 such that the associated bound holds for all k ≥ k_0. Suppose, by contradiction, that there exists ε ∈ (0, 1) such that (15) holds for all k. It follows from (16) and Lemma 3.3 that (15) implies ||c_k|| ≥ min{1, γ^{-1}} ε. Having this lower bound in mind, by (17) there holds (18). On the other hand, it follows from (15) and Lemma 4.2 that lim inf_{k∈S} ∆_k > 0, which, since H ⊆ S, implies that lim inf_{k∈H} ∆_k > 0, a contradiction with (18). So the proof is finished.

Proof. Assume that there exists ε ∈ (0, 1) such that (15) holds for all k. Since |H| < +∞, there exists k_0 > 0 such that the corresponding bound holds for all k ≥ k_0. It follows from the mechanism of updating h_k^max that all iterations after k_0 are f-type iterations, which implies that ||Z_k^T g_k|| ≥ γ ||c_k|| for all k ≥ k_0. It then follows from (15) that the estimate holds for all k ≥ k_0. Noting that the sequence {f(x_k)}_{k=k_0}^{+∞} decreases monotonically and is bounded below, we have that lim_{k∈S} pred_k^f = 0. By (7) and lim_{k∈S} pred_k^f = 0, we have that lim_{k∈S} ||d_k^n|| = 0, and by (19), lim_{k∈S} ∆_k = 0. If there were only a finite number of unsuccessful iterations, then there would exist k̄ > k_0 such that ∆_k ≥ ∆_{k̄} > 0 for all k ≥ k̄, which contradicts lim_{k∈S} ∆_k = 0. Hence we consider the alternative case in which there are infinitely many unsuccessful iterations after k_0. For an unsuccessful iteration k, we have the corresponding relation. Suppose, without loss of generality, that it holds in the form (21).

By (21), we have the estimate (22). Since the number of unsuccessful iterations after k_0 is infinite, it follows from (22) that there exists a sufficiently large k such that x_{k−1} is an unsuccessful iterate while x_k is a successful iterate with ∆_k ≤ γ_1 ∆, where ∆ > 0 is the constant of Lemma 4.2. By the mechanism of updating the trust region radius, ∆_{k−1} = ∆_k/γ_1 ≤ ∆, which implies that x_{k−1} is a successful iterate by Lemma 4.2. A contradiction is reached and the proof is finished.
5. Numerical results. In this section, a preliminary numerical implementation of Algorithm 2.1 is given. A Matlab code was written for this implementation and its performance was tested on 82 problems selected from Bongartz et al. [3,4]. The exact Hessian matrix of the Lagrangian function of (1) is used, where the estimate λ_k of the Lagrange multiplier is obtained by MATLAB's LSQR method. We compute the null space matrix Z_k of A_k^T directly by MATLAB's null space routine. The trust region subproblems are solved by MATLAB's trust routine, i.e., trust.m. For the numerical tests we have used the following parameter settings in Algorithm 2.1.
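For illustration (this is not the paper's MATLAB code), the two linear algebra ingredients, the null space basis Z_k and the least squares multiplier estimate λ_k, can be sketched in pure Python for the case m = 1; a general implementation would use a QR or SVD factorization, and the data below are hypothetical:

```python
import math

def nullspace_and_multiplier(a, g):
    """For a single constraint (m = 1) in R^2: an orthonormal basis z of
    null(A_k^T) (the analogue of MATLAB's `null`) and the least-squares
    multiplier estimate lam = argmin ||g - A_k * lam|| (what LSQR computes),
    where a is the column A_k and g is the gradient g_k."""
    na = math.hypot(a[0], a[1])
    z = [-a[1] / na, a[0] / na]                       # z . a = 0, ||z|| = 1
    lam = (a[0] * g[0] + a[1] * g[1]) / (na * na)     # normal-equations solution
    return z, lam

a, g = [2.0, 0.0], [1.0, 1.0]                         # illustrative A_k column and g_k
z, lam = nullspace_and_multiplier(a, g)
# The least-squares residual g - lam * a lies in null(A_k^T), so its norm
# coincides with the optimality measure ||Z_k^T g_k||.
resid = [g[0] - lam * a[0], g[1] - lam * a[1]]
print(z, lam, resid)
```

The stopping test of the implementation then reads max{||Z_k^T g_k||, ||c_k||} < ε, consistent with the first order conditions of (1).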
The algorithm stops if max{||Z_k^T g_k||, ||c_k||} < ε = 10^{-5}. The numerical results are reported in Table 5.1 and Table 5.2. Each problem is described by the number of variables n and the number of constraints m. The notations nf, ngf, nc and ngc represent the numbers of evaluations of f(x), ∇f(x), c(x) and ∇c(x), respectively. M denotes that the number of function evaluations is larger than 1000 or that the solver fails. For comparison, we have included the corresponding results obtained by the default versions of the LANCELOT and MINOS packages (see [4]); i.e., exact Hessian information is also used there. We also use the performance profile proposed by Dolan and Moré [10] to display, in Fig. 1, the performance based on the computational results in Tables 5.1 and 5.2.

6. Final remarks. The proposed algorithm constructs quadratic models for the objective function and the constraint violation, respectively. It keeps a balance between optimality and feasibility and computes trial steps selectively. If feasibility is better than optimality, the algorithm reduces the objective function value by solving normal subproblems and tangential subproblems in turn; otherwise, it improves only feasibility by solving normal subproblems. The constraint violation is not allowed to exceed a decreasing limit, which ensures that cycling cannot happen and that global convergence is achieved. Under usual assumptions, we prove that the algorithm is well defined and globally convergent. Similar to filter methods, the new algorithm allows for nonmonotonicity of both the constraint violation and the objective function value. The new algorithm uses neither a penalty function nor a filter, but it imposes nonmonotone decrease conditions that control feasibility and optimality in a loosely coupled way. We test some problems from Bongartz et al. [4] and compare the results with those of the LANCELOT and MINOS packages. The preliminary numerical results show that the new method is viable and effective. However, we have not yet established local superlinear convergence results for our algorithm, which is a problem that we are still studying.
