Practical Aspects of Primal-Dual Nonlinear Rescaling Method with Dynamic Scaling Parameter Update

The primal-dual nonlinear rescaling method with dynamic scaling parameter update (PDNRD) is an optimization method from the class of nonlinear rescaling techniques. Previous work does not discuss practical aspects of the PDNRD method, such as the explanation and the setting of its parameters. To complete the framework, this paper describes the parameters in detail. Moreover, the PDNRD method was applied to two quadratic programming problems with quadratic constraints and recommendations about the setting of the parameters were made.


Introduction
Optimization theory develops in parallel with the appearance of real-life problems and with the need to solve them. Thanks to that, some types of problems have gained a special status among the others and many tools for solving them have been developed. An example is linear programming (LP). In 1947, George Dantzig introduced the simplex method for solving LP [3]. However, the simplex method has a strong competitor: interior-point methods, especially primal-dual interior-point methods [15].
Interior-point methods have also been applied in the field of nonlinear programming (NLP). However, the situation in NLP is more complicated than in LP and interior-point methods sometimes experience numerical difficulties. This fact motivated Roman Polyak and Igor Griva to design an alternative method based on the nonlinear rescaling (NR) theory.
Nowadays, NR methods can solve large-scale NLP problems with thousands of variables and constraints. They have been successfully applied to radiotherapy treatment planning and are used in some hospitals in the USA and Europe [1].
The basic idea of NR methods is a nonlinear transformation of the constraint functions that improves the properties of the Lagrangian. Originally, the modified barrier methods [8] were introduced along with a few modified barrier functions. Afterwards, the log-sigmoid function was also found usable for NR [9], [10]. Consequently, these pieces of knowledge were refined and put together, and a generalization of the techniques led to the concept of NR methods and NR functions. Similar to the progress with interior-point methods, the primal-dual nonlinear rescaling (PDNR) method was developed [11].
The PDNR method is locally convergent with a Q-linear convergence rate. To improve these properties, the PDNR method can be combined with another optimization method (e.g. the primal-dual path-following method) to obtain global convergence [11]. Another way to improve the convergence of the PDNR method is a dynamic scaling parameter update [5] together with some globalization strategy (e.g. a step length computation). Recently, generalizations and other improvements [12], [13] were developed to improve the asymptotic convergence rate and to reduce the computational effort. The main purposes of this paper are to describe the parameters of the PDNRD method in detail and to give recommendations about their setting.
The paper is organized as follows. First, the convex optimization problem is stated and basic assumptions are discussed. Then, NR functions are defined and the key idea of the NR method is explained. Afterwards, the basic primal-dual variant of the NR method is presented. Next, the PDNRD method is explained and its parameters are described. Finally, numerical experiments with different parameter settings were made and the results are presented in Section 7.

Statement of the Problem
We consider the convex optimization problem

minimize f(x) subject to c_i(x) ≥ 0, i = 1, ..., r. (1)

Function f is convex and functions c_i are concave, ∀i = 1, ..., r. Let S ⊆ ℝ^n be the admissible set of problem Eq. (1). For simplicity we define the mapping c : ℝ^n → ℝ^r as c(x) = (c_1(x), ..., c_r(x))^T. We suppose that:
• (A) Functions f, c_i, ∀i = 1, ..., r, are at least twice continuously differentiable on ℝ^n.
• (B) The optimal set X* = Argmin {f(x); x ∈ S} is bounded and not empty.
• (C) The Slater condition holds, i.e. there exists a point x̃ with c_i(x̃) > 0, ∀i = 1, ..., r.
For problem Eq. (1) we define the Lagrangian L(x; λ) = f(x) − Σ_{i=1}^{r} λ_i c_i(x). Due to assumption (C), the Karush–Kuhn–Tucker (KKT) conditions can be used to test optimality. If x ∈ X*, then there is a vector λ ∈ ℝ^r_+ such that the pair (x, λ) satisfies Eq. (3). Conversely, if a pair (x, λ) ∈ S × ℝ^r_+ satisfies Eq. (3), then x ∈ X*.
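To make Eq. (3) concrete, the following sketch evaluates a KKT residual on a hypothetical toy instance (f(x) = x_1^2 + x_2^2 with one concave constraint c(x) = x_1 + x_2 − 2 ≥ 0, solution x* = (1, 1), λ* = 2). Both the instance and the max-norm residual form are illustrative assumptions, not taken from the paper, which reports computations in MATLAB.

```python
import numpy as np

# Hypothetical instance: f(x) = x1^2 + x2^2 (convex),
# single concave constraint c(x) = x1 + x2 - 2 >= 0.
grad_f = lambda x: 2.0 * x
c      = lambda x: x[0] + x[1] - 2.0
grad_c = lambda x: np.array([1.0, 1.0])

def kkt_residual(x, lam):
    """Max violation of the KKT conditions: stationarity of the
    Lagrangian, primal feasibility, dual feasibility, complementarity."""
    stationarity = np.linalg.norm(grad_f(x) - lam * grad_c(x), np.inf)
    primal = max(0.0, -c(x))          # c(x) >= 0
    dual = max(0.0, -lam)             # lam >= 0
    compl = abs(lam * c(x))           # lam * c(x) = 0
    return max(stationarity, primal, dual, compl)

x_star, lam_star = np.array([1.0, 1.0]), 2.0
```

At (x*, λ*) the residual vanishes, while perturbing the primal point makes it strictly positive.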

Nonlinear Rescaling Functions
First, we define functions that will be used to transform the constraints of problem Eq. (1).
For example, the exponential transformation, the modified logarithmic function and the hyperbolic barrier function are nonlinear rescaling functions [11]. Functions ψ_i, i = 2, 3, can be modified so that ψ_i ∈ C^2(ℝ). The function ψ_1 is already twice continuously differentiable, so the following modification is not necessary, although of course it can be done. For a given parameter τ ∈ (−1; 0), the quadratic extrapolation replaces ψ_i by a quadratic function q_i for t < τ, whereas the coefficients of the function q_i are determined by matching the values of ψ_i, ψ_i′ and ψ_i″ at the point τ. From these equations we obtain the coefficients of q_i.
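The quadratic extrapolation can be sketched as follows. The modified logarithmic function ψ(t) = ln(1 + t) is used here as a representative NR function (an assumption for illustration; the paper's ψ_2 may differ in detail). The quadratic branch q matches ψ, ψ′ and ψ″ at t = τ, which makes the extended function twice continuously differentiable on the whole real axis.

```python
import numpy as np

# Representative NR function (an assumption for illustration):
# the modified logarithmic transformation psi(t) = ln(1 + t).
def psi(t):   return np.log(1.0 + t)
def dpsi(t):  return 1.0 / (1.0 + t)
def d2psi(t): return -1.0 / (1.0 + t) ** 2

def quadratic_extrapolation(tau):
    """Coefficients a, b, c of q(t) = a t^2 + b t + c obtained by
    matching psi, psi' and psi'' at t = tau (C^2 smooth joint)."""
    a = 0.5 * d2psi(tau)
    b = dpsi(tau) - d2psi(tau) * tau
    c = psi(tau) - dpsi(tau) * tau + 0.5 * d2psi(tau) * tau ** 2
    return a, b, c

def psi_ext(t, tau=-0.5):
    """Extended function: psi for t >= tau, quadratic branch for t < tau."""
    a, b, c = quadratic_extrapolation(tau)
    return np.where(t >= tau, psi(np.maximum(t, tau)), a * t ** 2 + b * t + c)
```

The matching conditions q(τ) = ψ(τ), q′(τ) = ψ′(τ), q″(τ) = ψ″(τ) hold by construction, which can be checked numerically.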

Nonlinear Rescaling Approach
NR methods are based on the idea of transforming problem Eq. (1), using a nonlinear rescaling function ψ defined on the whole real axis, into the equivalent problem Eq. (4). From the definition of an NR function, it is obvious that problems Eq. (1) and Eq. (4) have the same admissible sets, because ψ(0) = 0 and ψ is increasing, so c_i(x) ≥ 0 if and only if (1/k)ψ(k c_i(x)) ≥ 0. Hence, the optimal sets are also the same. The positive real number k is the scaling parameter, which can be fixed or dynamically enlarged during the iteration process, see Algorithm 3.
The Lagrangian for the equivalent problem Eq. (4) is given by the formula L(x; λ, k) = f(x) − (1/k) Σ_{i=1}^{r} λ_i ψ(k c_i(x)). For any r ∈ ℕ we denote ℝ^r_+ = {λ ∈ ℝ^r; λ_i ≥ 0, i = 1, ..., r} and ℝ^r_{++} = {λ ∈ ℝ^r; λ_i > 0, i = 1, ..., r}. Suppose for a while that we know the solution of the dual problem λ* ∈ ℝ^r_+. Then it is sufficient to minimize the function L(x; λ*, k) in the primal variable x. The constrained optimization problem would be transformed into an unconstrained one.
Since the Lagrange multipliers λ* are not known, we estimate them and update the estimate in every step of the method, just like the primal variable. In consequence, the constrained optimization problem is converted into a sequence of unconstrained optimization problems. Newton's method or a variant of it is applied in each step to minimize the Lagrangian L in the primal variable.

Algorithm 1. (The basic concept of NR methods)
Let k > 0 be a scaling parameter. Initial approximations x^0 ∈ ℝ^n and λ^0 ∈ ℝ^r_{++} are given. We suppose that an approximation (x^s, λ^s) ∈ ℝ^n × ℝ^r_{++}, s ∈ ℕ_0, is already known. We find the next primal-dual pair (x^{s+1}, λ^{s+1}) using the formulas Eq. (6). The NR method converges for any fixed but large enough k > 0 under the standard second order optimality conditions [5].
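A minimal sketch of Algorithm 1, assuming the exponential transformation ψ(t) = 1 − e^{−t} and a toy one-dimensional problem: minimize (x + 1)^2 subject to x ≥ 0, with solution x* = 0 and multiplier λ* = 2. The instance is an illustrative assumption; the iteration (inner minimization of L in x, then the multiplier update λ^{s+1} = λ^s ψ′(k c(x^{s+1}))) follows the scheme described above.

```python
import math

# Toy problem (an assumed instance): minimize f(x) = (x + 1)^2
# subject to x >= 0; solution x* = 0 with multiplier lam* = 2.
# Exponential NR function psi(t) = 1 - exp(-t), psi'(t) = exp(-t).
def nr_method(k=10.0, lam=1.0, x=1.0, outer=50):
    for _ in range(outer):
        # Inner Newton loop: minimize L(x; lam, k) = f(x) - (lam/k) psi(k x).
        for _ in range(100):
            g = 2.0 * (x + 1.0) - lam * math.exp(-k * x)   # grad_x L
            h = 2.0 + k * lam * math.exp(-k * x)           # Hessian, > 0
            x -= g / h
            if abs(g) < 1e-12:
                break
        lam *= math.exp(-k * x)   # lam^{s+1} = lam^s * psi'(k c(x^{s+1}))
    return x, lam

x, lam = nr_method()
```

Note that the iterates approach x* = 0 from the infeasible side, illustrating the exterior-point character of NR methods mentioned below.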
Remark 2. If λ^s ∈ ℝ^r_{++}, then also λ^{s+1} ∈ ℝ^r_{++}. This property follows from condition (ii) of the NR function ψ. In other words, NR methods are interior-point methods in the dual variable.
Algorithm 1 is well defined due to Theorem 1.
The main purpose of NR is to improve the properties of the Lagrangian. The classical Lagrangian L, as a connection between constrained and unconstrained optimization, does not always work, because the existence of an unconstrained minimizer of the Lagrangian is unknown in general. On the other hand, the unconstrained minimizer of the Lagrangian L(·; λ, k) always exists (according to Theorem 1). Moreover, NR dramatically sharpens the reaction of the Lagrangian to constraint violation, which has an impact on the computations.
In comparison to interior-point methods, there is no "infinite wall": NR methods are exterior-point methods. Also, an unbounded increase of the scaling parameter is not needed to guarantee the convergence of the method.
We illustrate the influence of NR on the Lagrangian with the following simple example.
Example 1. The Lagrangian of the problem Eq. (7), minimize x subject to x ≥ 0, x ∈ ℝ^1, and the Lagrangian of the equivalent problem Eq. (8) are considered.
We take a look at the graphs of both Lagrangians, which are defined by formulas Eq. (9). NR fundamentally changes the shape of the Lagrangian. The sharp reaction of the Lagrangian L(·; λ, k) to constraint violation is obvious in Fig. 2.
Fig. 2: The Lagrangian for problem Eq. (7) (top) and for the equivalent problem Eq. (8) with k = 1 (bottom).
We project the graphs of both Lagrangians onto the phase plane in Fig. 3. Suppose that we want to find the Lagrange minimizer for λ = 2. After substitution, formulas Eq. (9) give L(x; 2), a decreasing linear function. We cannot find its Lagrange minimizer, because it is unbounded below.
If we rearrange the first equation in formulas Eq. (9) as L(x; λ) = (1 − λ)x, it is apparent that for any λ ≠ 1 we cannot find the Lagrange minimizer. On the other hand, we know from Theorem 1 that the Lagrangian L(·; λ, k) has a minimizer for any λ > 0. This is the main contribution of NR.
Fig. 3: The projection of the Lagrangian for problem Eq. (7) (top) and for the equivalent problem Eq. (8) with k = 1 (bottom) onto the phase plane.

Primal-dual Nonlinear Rescaling Method
The update of the Lagrange multipliers is clear from equations Eq. (6). When we want to calculate x^{s+1}, we must solve a nonlinear system of equations using Newton's method. Although this approach works well far from the solution, the primal-dual variant of the NR method performs better in the neighborhood of the solution [5]. In Section 6, we combine these two techniques to obtain a stable and globally convergent method.
Let approximations x ∈ ℝ^n, λ ∈ ℝ^r_{++} be known. We suppose that x̄ = x + ∆x, λ̄ = λ + ∆λ, where (∆x, ∆λ) is a primal-dual Newton step and λ̂ is a predictor of the Lagrange multipliers, given by the formula λ̂ = Ψ′(kc(x))λ. The reason for defining λ̂ in this way lies in the equality ∇_x L(x; λ, k) = ∇_x L(x; λ̂).
One step of Newton's method consists of solving the system Eq. (11) of (n + r) linear equations, where the pair (∆x, ∆λ) is unknown. The symbols Ψ′(kc(x)), Ψ″(kc(x)) and Λ denote diagonal matrices defined as Ψ′(kc(x)) = diag(ψ′(kc_i(x)))_{i=1}^{r}, Ψ″(kc(x)) = diag(ψ″(kc_i(x)))_{i=1}^{r} and Λ = diag(λ_i)_{i=1}^{r}. If N(·) is sparse, we can use numerical linear algebra techniques for sparse matrices [15] to solve the system Eq. (11).
In the opposite case, we express ∆λ from the second equation of the system Eq. (11) and substitute it into the first equation. Then we must solve only the system Eq. (12) of n equations instead of a system of (n + r) equations. Remark 3. It is obvious that M(·) is a symmetric matrix. It can be shown ([11], page 120) that if the second order optimality conditions hold at the point (x*, λ*), then M(·) is a positive definite matrix for all (x, λ) sufficiently close to the optimal primal-dual pair, provided the scaling parameter is sufficiently large.
In each step we have two options:
• We calculate the Newton step (∆x, ∆λ) from system Eq. (11), or
• we find the primal Newton step ∆x from formula Eq. (12) and then compute the dual Newton step using the relation ∆λ = kΨ″(kc(x))Λ∇c(x)∆x.
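The equivalence of the two options can be illustrated by generic block elimination. The matrices H, J and D below are placeholders (assumptions) standing for the Hessian block, the constraint Jacobian and the diagonal factor of the paper's N(·); the point is only that eliminating ∆λ from the full (n + r) system yields the same primal step as the reduced n-equation system.

```python
import numpy as np

# Generic sketch of reducing the (n + r) primal-dual system to n equations.
# H, J, D are placeholders for the blocks of N(.): H for the Hessian of
# the Lagrangian, J for grad c(x), D for the diagonal factor (the exact
# entries follow the paper's Eq. (11)).
rng = np.random.default_rng(0)
n, r = 5, 3
H = rng.standard_normal((n, n)); H = H @ H.T + n * np.eye(n)  # SPD block
J = rng.standard_normal((r, n))
D = np.diag(rng.uniform(0.1, 1.0, r))
r1 = rng.standard_normal(n)
r2 = rng.standard_normal(r)

# Full (n + r) x (n + r) system:
#    H dx - J^T dl = r1
#   -D J dx +   dl = r2
N_full = np.block([[H, -J.T], [-D @ J, np.eye(r)]])
sol = np.linalg.solve(N_full, np.concatenate([r1, r2]))
dx_full, dl_full = sol[:n], sol[n:]

# Reduced n x n system after eliminating dl = r2 + D J dx:
M = H - J.T @ D @ J
dx_red = np.linalg.solve(M, r1 + J.T @ r2)
dl_red = r2 + D @ J @ dx_red
```

Both routes produce the same Newton step, while the reduced route factorizes only an n × n matrix.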
Algorithm 2 describes only one step of the PDNR method. Although the input vector of multipliers belongs to ℝ^r_{++}, it is not guaranteed that the updated vector of multipliers also has all components positive. In contrast, this statement is true for Algorithm 1, according to Remark 2. Therefore, the PDNR method cannot stand alone in the general case and an improvement is needed (see Algorithm 3).

Dynamic Scaling Parameter Update
To obtain a higher convergence rate of the method, we dynamically change the scaling parameter. Moreover, we use Newton's method with a step length (e.g. the backtracking line search algorithm) to solve formulas Eq. (6).
We introduce a function which measures the distance between the approximation (x, λ) and the solution (x*, λ*).
Definition 2. The function ν : ℝ^n × ℝ^r → ℝ_+, defined as follows, is called the merit function.
For a known primal-dual pair (x, λ) we set the parameter k as a function of the merit value ν(x, λ). It is obvious that if a primal-dual sequence {(x^s, λ^s)}_{s=0}^{+∞} tends to (x*, λ*), then ν(x^s, λ^s) → 0^+ and also k^s → +∞ when s → +∞.
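A hedged sketch of the dynamic update: the merit function below is an assumed KKT-residual form, not necessarily the paper's exact Definition 2, and the rule k = ν^{−1/2} (bounded below by k_init) is one possible choice consistent with the behaviour described above, where ν → 0^+ forces k → +∞.

```python
import numpy as np

# Assumed max-norm KKT-residual merit function (an illustrative stand-in
# for the paper's Definition 2).
def merit(grad_L, c_val, lam):
    """Max of: stationarity residual, constraint violation, complementarity."""
    return max(np.linalg.norm(grad_L, np.inf),
               np.linalg.norm(np.minimum(c_val, 0.0), np.inf),
               np.linalg.norm(lam * c_val, np.inf))

def update_k(nu, k_init=10.0):
    """Enlarge k as the iterate approaches the solution: nu -> 0+ drives
    k -> +infinity, matching the discussion in the text. The exponent -1/2
    is an assumption."""
    return max(k_init, nu ** (-0.5)) if nu > 0 else np.inf
```

Far from the solution k stays at its initial value; close to the solution it grows without bound.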
To prevent a singularity in system Eq. (11), we solve the following regularized system, where the primal-dual pair (x, λ) is an approximation of the solution, k is the scaling parameter and λ̂ = Ψ′(kc(x))λ is the dual predictor of the Lagrange multipliers. It is important to remark that the regularization has no effect on the 1.5-superlinear convergence rate of the PDNRD method [5].
In contrast to interior-point methods, the inverse of the matrix N_k(·) exists at the optimal point for every k ∈ ℝ_{++}.
Assume that an approximation (x, λ) of the point (x*, λ*) is known. First, we use the PDNRD step. If there is a superlinear decrease of the merit function, we have found the next approximation. In the opposite case, we use the primal Newton step ∆x (calculated during the PDNRD step) to minimize the function L(x; λ, k), where λ and k are fixed. We apply the backtracking line search method to guarantee global convergence of the minimization process. In this way, we obtain the globally convergent PDNRD method with the 1.5-superlinear convergence rate.
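The backtracking line search used in the damped Newton phase can be sketched as follows. The contraction factor β = 0.5 and the one-dimensional test function are illustrative assumptions; η plays the role of the sufficient-decrease parameter discussed later.

```python
# Minimal backtracking (Armijo) line search sketch for the damped Newton
# phase; eta is the sufficient-decrease parameter, beta the (assumed)
# contraction factor.
def backtracking(f, grad, x, dx, eta=0.01, beta=0.5, t_min=1e-12):
    """Shrink the step length t until f decreases sufficiently along dx."""
    t, fx, slope = 1.0, f(x), grad(x) * dx   # directional derivative
    while f(x + t * dx) > fx + eta * t * slope and t > t_min:
        t *= beta
    return t

# Usage on a 1-D quadratic f(x) = x^2 with a deliberately long step:
f = lambda x: x * x
grad = lambda x: 2.0 * x
t = backtracking(f, grad, x=1.0, dx=-4.0)
```

The full step t = 1 overshoots the minimizer, so the step length is halved until the Armijo condition holds.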
In Algorithm 3, whenever the merit function does not decrease sufficiently, the scaling parameter is enlarged by setting k := ωk and the inner minimization is repeated.

Numerical Experiments
From the finite element approximation of contact problems of linear elasticity with friction in three space dimensions arises the minimization problem Eq. (16), where n = 3m is the number of variables, A ∈ ℝ^{n×n} is a symmetric and positive definite matrix, b ∈ ℝ^n, g ∈ ℝ^m_+, l ∈ ℝ^m and I = {1, 2, ..., m}. This is a convex programming problem, so we can use the NR approach to solve it. We use the function ψ_{q2} with τ = −1/2 to rescale the constraints and subsequently obtain the equivalent problem. The reason for this choice is that for a sufficiently large class of functions c(x), the function −ln c(x) is self-concordant. If we apply Newton's method to a self-concordant function, we can say more about its convergence [2].
The PDNRD method was tested on two model problems: the chord problem, which in contrast to Eq. (16) also contains unconstrained variables, and the steel brick problem, whose finite element approximation leads exactly to Eq. (16). All computations were performed in MATLAB on a PC with an Intel Core i7 (2.4 GHz) and 8 GB RAM. In the tables below we report the number of iterations (it), the number of solutions of the primal-dual system (n_S) and the solution time in seconds (time). If some data is missing, it means that the solution time was too long in comparison to the other cases in the table.

Chord Problem
We consider problem Eq. (17).
Fig. 4: The chord deformation.
Minimization problem Eq. (17) describes a loaded chord fixed at its endpoints. The chord is partially above a plane and partially inside a cylindrical tube (Fig. 4). The function u(t) is the chord deflection. The chord problem was presented as a model problem in [6].
First, we determined whether the primal-dual system has a sparse matrix. We say that a matrix is sparse if it has at most 10 % nonzero elements. In Fig. 5, the structure of the matrix N_k(·) ∈ ℝ^{192×192} for n = 128 is depicted. This matrix has only 636 nonzero elements (1.7 %), hence N_k(·) is sparse. In the same way we could argue for other choices of n.
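The sparsity test can be reproduced in a few lines; the tridiagonal matrix below is a stand-in (an assumption) for the actual N_k(·) of the chord problem.

```python
import numpy as np

# Sketch of the sparsity test: a matrix is treated as sparse when at
# most 10% of its entries are nonzero. The tridiagonal test matrix
# below is only a stand-in for N_k(.).
def density(A):
    """Fraction of nonzero entries."""
    return np.count_nonzero(A) / A.size

n = 192
A = np.diag(np.full(n, 2.0)) + np.diag(np.full(n - 1, -1.0), 1) \
    + np.diag(np.full(n - 1, -1.0), -1)
is_sparse = density(A) <= 0.10
```

A 192 × 192 tridiagonal matrix has 574 nonzero entries, i.e. a density of about 1.6 %, so it passes the 10 % threshold comfortably.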
We solved the chord problem for different settings of the parameters of the PDNRD method. The main result from Tab. 1 is a non-increasing number of iterations and a non-increasing number of solutions of the primal-dual system while increasing the number of variables. The best choice of the scaling parameter was k_init = 10 (Tab. 1: the chord problem, PDNRD method with parameters ω = 10, σ = (1/2)k_init, θ = 0.4, q = 0.5, η = 0.01, ε = 10^{−6}). From Tab. 2 it is obvious that the choice q = 0.1 was the best one. Parameters η and θ had no significant impact on the computation in this case.

Steel Brick Problem
Let us consider a steel brick lying on a rigid obstacle. The brick occupies the domain S = (0; 3) × (0; 1) ×

Based on the tests of the PDNRD method on the examples from Sections 7.1 and 7.2, we made the following considerations about a suitable setting of the parameters.

Factor ω
The factor ω affects the rate of increase of the scaling parameter. For ω ∈ [5; 20] we obtain almost the same results, so we set ω = 10. However, even for other choices ω > 1 there are no significant changes; at most, the method may take a few extra steps.

Parameter σ
The choice of the parameter σ itself is less important than the ratio between σ and the initial value of the scaling parameter k_init. It is the fraction σ/k which decides the number of inner iterations (and thus the number of Newton steps in the damped phase of Newton's method). It is clear that for σ ≫ k there are too few inner steps; on the other hand, for σ ≪ k there are too many of them. According to the results of the numerical experiments, the choice σ = (1/2)k_init is suitable.

Parameter θ
The parameter θ affects whether the inner solver runs or not. If we set θ = 0, the inner solver runs whenever the 1.5-superlinear decrease of the merit function is not achieved. By choosing θ = 0.5, we say that we are satisfied with only a linear decrease of the merit function. Therefore, it is wise to set θ ∈ [0; 0.5]. It appears (see Tab. 2 and Tab. 4) that the method does not depend on this parameter.
In the condition related to the parameter θ, the term min{ν^{3/2−θ}, 1 − θ} is calculated. One may ask why we do not simply use the term ν^{3/2−θ} instead. As we know, the classical Newton method is effective only in the neighborhood of the solution, so if we are "far" from the solution, the damped Newton method is better suited. The expression "far" means that the merit function is greater than 1 − θ.
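The acceptance test described above can be written as a one-line predicate. That the comparison is made against the new merit value is an assumption about how the criterion is applied.

```python
# Sketch of the acceptance test: the primal-dual step is accepted when
# the new merit value drops below min(nu_old**(1.5 - theta), 1 - theta).
# Far from the solution (nu_old > 1 - theta) the cap 1 - theta keeps the
# test meaningful, since nu_old**(1.5 - theta) would exceed 1 there.
def accept_step(nu_old, nu_new, theta=0.4):
    return nu_new <= min(nu_old ** (1.5 - theta), 1.0 - theta)
```

Near the solution the bound ν_old^{3/2−θ} enforces the superlinear decrease; far from it the constant cap 1 − θ takes over.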

Factor q
The number of inner steps is influenced by the fraction σ/k together with the factor q. This factor also affects how often the scaling parameter is increased. Due to the way in which this parameter is used in the PDNRD method, it must be set so that q ∈ (0; 1). Moreover, we must choose the factor q so that the method does not use too many inner steps. Based on the data in Tab. 2 and Tab. 4, any factor q ∈ (0; 1) was suitable; however, the best results were obtained with q = 0.1.

Parameter η
The backtracking line search parameter η is usually chosen in the range from 0.01 to 0.3 (see [2], page 466). According to the data in Tab. 2 and Tab. 4, the choice of the parameter η ∈ [0.01; 0.3] is arbitrary.

Initial value of the scaling parameter
Based on the results shown in Tab. 1 and Tab. 3, increasing the initial value of the scaling parameter led to instabilities in the computations. The best results in both model examples were obtained with k_init = 10. Therefore, we recommend setting k_init around ten or less. A further reason to set k_init rather low is that the scaling parameter can be increased during the computation, but it cannot be decreased.

Initial approximation
The computation also depends on the initial approximation. The number of steps can be decreased if a "lucky" initial approximation is chosen. In problems like beam deflection or contact problems we choose the initial state of the system as the initial approximation, because the shape changes of a body are usually very small. Thus we set x^0 = (0, 0, ..., 0).

Conclusion
The practical aspects of the PDNRD method were described. In particular, the meaning of the parameters of the PDNRD method was explained in detail. The PDNRD method was tested on two quadratic programming problems with quadratic constraints (the chord problem and the steel brick problem). Based on the tests, recommendations about the setting of the parameters were made.
The solution of the primal-dual system is the most expensive operation, thus the number of solutions of the primal-dual system determines the total complexity of the computations. It was found that increasing the number of variables in both presented problems does not increase the number of solutions of the primal-dual system (see Tab. 1 and Tab. 3). This fact supports the applicability of the PDNRD method to problems of arbitrary size.