Proximal Bundle Method for Contact Shape Optimization Problem

From the mathematical point of view, the contact shape optimization is a problem of nonlinear optimization with a specific structure, which can be exploited in its solution. In this paper, we show how to overcome the difficulties related to the nonsmooth cost function by using the proximal bundle methods. We describe all steps of the solution, including linearization, construction of a descent direction, line search, stopping criterion, etc. To illustrate the performance of the presented algorithm, we solve a shape optimization problem associated with the discretized two-dimensional contact problem with Coulomb’s friction.


Introduction
Shape optimization problems arise naturally in mechanical engineering whenever the design requirements include an optimal performance of a machine or a gadget comprising several bodies in mutual contact.From the mathematical point of view, these problems can be characterized by a locally Lipschitz continuous cost function, which is differentiable in most but not all points.Moreover, its gradient is not defined explicitly even in the point, at which it exists.Shape optimization problems have the following form: The solution of such problems can be obtained by a suitable iterative algorithm -its typical structure reads as follows.
Step 1: (Stopping criterion) If x k is "close enough" to the required solution then STOP.
Step 2: (Direction finding) Find a feasible descent direction d k ∈ R n : f (x k + td k ) < f (x k ) and x k + td k ∈ Ω for some t > 0.
Step 3: (Line search) Find a step size t k > 0 such that: Step 4: (Updating) Set x k+1 = x k + t k d k , k = k + 1 and go on to Step 1.

The hardest difficulty is the direction searching in
Step 2 since the cost function f mentioned in the above section is not differentiable but only locally Lipschitz continuous.This implies that to minimize the function f , we can choose an algorithm from the following two classes: derivative-free methods (like genetic algorithms) and methods that use the subgradient information (like subgradient or bundle methods).Since the subgradient information is available for our problem, we have chosen the latter class of algorithms.In this paper, the proximal bundle method (see [4]) is presented.This method needs the function value f (x) and one (arbitrary) Clarke subgradient of f at x in every step of the iteration process.
Key terms and theorems are introduced in the first part of this paper and then some important sections present direction finding, line search or subgradient aggregation in detail.To show the functionality of the presented algorithm, we solve a shape optimization problem with the discretized two-dimensional contact problem with Coulomb's friction in the last part.

Nonsmooth Analysis -Clarke Calculus
Before we come to the derivation of the aforementioned method, it is necessary to be familiar with the necessary theory.Since we often use the term generalized gradient and subgradients, it is suitable to define them.
Then conv (Ω) denotes the convex hull of the set Ω, which is defined by: Remark 1.The set conv (Ω) is the set of all convex combinations of the points in Ω.
Definition 5. Let the objective function f : R n → R be locally Lipschitz continuous (in R n ).The generalized gradient of the objective function f at x ∈ R n is the set where Each element g ∈ ∂f (x) is called a subgradient of the objective function f at x. Definition 6.Let f be locally Lipschitz continuous at x ∈ R n .Then the generalized (Clarke) directional derivative of f at the point x in the direction v is defined by: for all v ∈ R n .
We illustrate the previous definition Def. 5 in the following example.
Example 1 (Generalized gradient).Find the generalized gradient of a given function f at x = 0, where This function (see its graph in Fig. 1) can be rewritten in the form As the function f is not smooth at x = 0, we have to use one-sided limits of the derivatives of the function f to compute the generalized gradient (see Def. 5).Let us start with the interval (0, 1).For each x ∈ (0, 1) the derivative is ∇f (x) = 1 and therefore g = lim i→∞ ∇f (x i ) = 1 applies for x i → 0 + .We analogously have g = lim i→∞ ∇f (x i ) for x i → 0 − .The generalized gradient of f at x is then: and its graph can be seen in Fig. 2. In a similar way, we can compute the generalized gradient of the function f for all x ∈ R (see its graph in Fig. 3).

Optimality Conditions
Here are some necessary conditions for a constrained and for an unconstrained optimization problem.We should note that these conditions are only basic ones.

1) Optimality Conditions for Unconstrained Problems
Theorem 1.Let f be Lipschitz continuous at x and let f attain its local minimum at x, then Consider the first part o ∈ ∂f (x) and note that in the Exm. 1 above, the function f attains its local minimum at x = 0.The generalized gradient of f at x = 0 is the convex hull of {−1, 1}, which is the interval −1, 1 , and it is no surprise that 0 ∈ −1, 1 .
Theorem 2. Let f be a convex function, then the following conditions are equivalent: • f attains its global minimum at x,

2) Optimality Conditions for Constrained Problems
Tangent and normal cones are closely related to the optimality conditions for constrained problems, thus it is important to define them.
Definition 8.The tangent cone of a convex set Ω at x ∈ Ω is defined by: Definition 9.The normal cone of the convex set Ω at the point x ∈ Ω is given by the formula: We can see some examples of the tangent and normal cones in Fig. 4. Now, we can write the corresponding conditions.
Theorem 3. Let f be Lipschitz continuous at x ∈ Ω and let x be a minimizer of the function f over the set Theorem 4. If the function f is convex and if the set Ω is also convex, then the following conditions are equivalent: • f attains its global minimum at x over the set Ω.
For proofs of Thm. 1, Thm. 2, Thm. 3, Thm. 4 and further details on the theory of nonsmooth analysis the interested reader is reffered to [4], [6] or [5].Further definitions, pictures and examples can be found there.

Derivation of the Method
Consider the following nonlinear constrained optimization problem: where the objective function f : R n → R is locally Lipschitz continuous in R n , C ∈ R m×n is an constraint matrix, b ∈ R m is a right-hand side vector and x max ∈ R n , x min ∈ R n are bound vectors.To make these notations simple we suppose that the simple bounds x min , x max are included in the linear system Cx ≤ b.For further details on the proximal bundle method the interested reader is reffered to [4].

1) Direction Finding
Our aim is to solve the problem with respect to where Suppose that we have some starting point x 1 ∈ Ω, the current iteration point x k ∈ Ω and that we have subgradients g f j ∈ ∂f (y j ) for all j ∈ J k f , where J k f ⊂ {1, . . ., k} is a nonempty index set and where y j ∈ Ω is an auxiliary point.Denoting the linearization of our cost function is but we can rewrite the formulation Eq. ( 16) into its recursive form Moreover, we can employ this linearization for polyhedral approximation of the objective function (e.g. in Fig. 5) and then we can define the improved polyhedral function Ĥk By employing the proximal bundle idea (the idea of adding a penalty to be able to limit the step length) and after a series of adjustments, we can rewrite the whole problem Eq. ( 15) into its dual form where We denote the solution of the problem Eq. ( 21) as λ k and ν k .

2) Subgradient Aggregation
There is still one hidden but equally important difficulty in the problem Eq. ( 21).Let us consider the index set J k f .How to choose this set?The simplest option seems to let However, this is not the case.Because, in every iteration step, the index set will enlarge, which causes larger and larger memory requirements.We have to find another way.
The idea is to aggregate the constraints made up by the previous subgradients.This strategy allows us to keep the quantity of constraints bounded.First, we need to scale the multipliers for all j ∈ J k f and to be able to do that, we denote λ k f = j∈J k f λ k j , then the aggregate subgradients can be presented.
Definition 10.We define the scaled multipliers for all j ∈ J k f by: and the aggregate subgradients by: where α k f,j = f (x k ) − f k j is the linearization error.And finally the aggregate linearization is defined by: The vector p k f is unknown at the beginning, but we can solve this lack by generating p k f recursively (see Eq. (31)).Let x 1 ∈ Ω be a feasible starting point, then we begin the algorithm with this settings: and J 1 = {1} and we also define α k f,p = f (x k ) − f k p .Using stated equations, we are able to rewrite the problem Eq. ( 21) in suitable form to keep the index set bounded.Via restriction and dualization we obtain Suppose that we found the Lagrange multipliers of the problem above λ k p , λ k j for all j ∈ J k f and ν k i for all i ∈ I = {1, . . ., m}.First, we sum up the multipliers λ k j and then we can scale the multipliers and figure up the aggregate subgradients where f k p is updated recursively by Now we are able to choose the index set J k f bounded without any difficulties.Several types of selection exist and the simplest one (not necessarily the best one) is to keep the length of the index set bounded and replace the oldest element in J k f by the new one as it can be seen below:

3) Nonconvexity
Let us recall that α k f,j = f (x k ) − f k j is the linearization error.If f is convex, then α k f,j ≥ 0 for all j ∈ J k f and f j (x) ≤ f (x) for all x ∈ Ω.It means that our linearization approximates the cost function f from bellow and α k f,j indicates how good our linearization is.But this is true only if the cost function f is convex.Unfortunately, in the nonconvex case, the inequality f j (x) ≤ f (x) is not valid at every x ∈ Ω.The linear approximation can be above the cost function f and the linearization error may takes values less then zero.We can see this situation in Fig. 6.So what to do when the cost function f is not convex?Due to this (we can say) difficulty, we need to generalize the subgradient error α k f,j .We will use the subgradient locality measure.To achieve this, we will also need some information about the distance between the trial point y j and the actual iteration point x k .Definition 11.Let us define the distance measure at every iteration k by: And now we are able to define the aforementioned subgradient locality measure.
Definition 12.At every iteration step k, the subgradient locality measure is defined by: where γ ≥ 0 is the distance measure parameter, which is equal to zero, when the cost function is convex.
We choose the parameter γ heuristically.As it can be seen, we did not use the aggregate subgradients, which were described in the previous subsection.It is still important to be able to reduce the memory requirements, so we also define the aggregate version of the distance measures.
Definition 13.Denoting s 1 f = 0, the aggregate distance measures at each iteration step k are defined by: and the subgradient locality measures by: where γ ≥ 0 is the distance measure parameter.Set the parameter γ = 0, if f is convex.

4) Line Search
The descent direction are known.But we do not know yet how far we can go in the direction d k to evaluate the next value x k+1 .A solution to this problem was presented by Kiwiel in 1990 in his contribution [3].
After the initial setting of parameters, we are searching for the maximum value t k L ∈ 0, 1 that fulfils following conditions: If such a parameter exists, we obtain the so-called long serious step If only the first condition a) holds and also 0 < t k L < t, we take the short serious step: choose t k R ∈ t k L , 1 and and finally, in the case when t k L = 0, we have the null step (39)

5) Weight Update
One of the last but still very important questions is the choice of weight update u k .We cannot keep u k constant, because it could make some difficulties (e.g. if the parameter u k is large, values |v k | and d k will be very small and therefore all the steps will be taken as the serious ones and the decrease will be small).
This seemingly small difficulty was also solved by Kiwiel in 1990.The whole weight update strategy can be found in the book [4] and in the article [3].

Description of the Proximal Bundle Method Algorithm
At the beginning of our algorithm, we need to set several parameters such as stopping tolerance ε S > 0, which is used in the stopping criterion; the maximum number of stored subgradients M g ≥ 2; the line search parameters m L , m R , t and distance parameters γ f > 0.
In the next step of the algorithm, we should find multipliers λ k p , λ k j by solving the dual problem, see Eq. ( 28).In the first half of the algorithm, there is also the stopping criterion.We need to evaluate whether w k ≤ ε S , where w k = 1 2 p k 2 + βk f,p , holds or not.If so, the algorithm stops and we obtain the desired result.Otherwise the algorithm continues by line search and after finding the step size, we make the linearization update.
The final part of the algorithm consists of the weight update and the index set updating in the subsection Subgradient aggregation.Now it remains to increase the iteration counter k by 1 and to repeat the whole algorithm from the part with the dual problem.

Numerical Experiment
The proximal bundle method described in the previous sections will now be used to solve a model example.We chose the shape optimization of a discretized twodimensional contact problem with Coulomb friction as the model example.Shape optimization is a part of the optimal control, in which the control variables are linked to the geometry of elastic bodies that are in contact.The aim of the problem on the lower level, which is contact problem with friction is to find the set of the state variables for the fixed vector of control variables.The state vector contains variables which describe the displacements, and the variables which describe the normal stress on the contact boundary.Hereafter, the contact problem with Coulomb friction will be considered as the state problem.The mapping describing the solution of the state problem (i.e. the vector of the displacements and the contact stresses) for the prescribed control variable is named as the controlstate mapping.A typical feature of the contact shape optimization with Coulomb friction is its nonsmooth character due to the fact that the respective controlstate mapping is typically nondifferentiable.Shape optimization of a discretized two-dimensional contact problem with Coulomb friction was considered in [1].Shape optimization of a discretized three-dimensional contact problem with Coulomb friction was considered in [2].Sensitivity analysis (computation of the subgradients of the minimized function) with help of calculus of Clarke (for 2D case) and calculus of Mordukhovich (for 3D case) was proposed in mentioned papers [1] and [2].In this contribution, we approximate subgradients only numerically by the forward finite difference approximation.
Example 2. Now let us deal with the shape optimization of a discretized two-dimensional contact problem with Coulomb friction only briefly.Let J be a cost function.The shape optimization problem is defined generally as follows minimize J (α, S(α)), where the admissible set U ad is given by: We will try to smooth down the peaks of the normal contact stress distribution.To this aim, we should minimize the max-norm of the discrete normal contact stress λ.The objective function J , however, must be continuously differentiable to ensure that the composite function J (α, S(α)) is locally Lipschitz, so we will use the p power of the p norm of the vector λ with p = 4 as the objective function J .The shape optimization problem then reads as follows: minimize λ 4  4 , subject to α ∈ U ad . (41) The vector α denotes the control vector, u denotes the displacement and λ denotes the normal stress and mapping S: α ∈ U ad ⊂ R d → (u, λ) ∈ R 3p denotes the control-state mapping.Number d is the dimension of the control vector α, p is the number of the nodes of the discretized elastic body Ω(α) and U ad is the set of the admissible control variables.For more detailed description, see [1].
The shape of the elastic body Ω(α), α ∈ U ad , is defined through a Bezier function F α as follows (cf.Fig. 7): where the vector α contains the control points of the Bezier function F α .The stopping tolerance was set to ε S = 1•10 −6 .This required precision was reached after 11 iterations.We depict the initial shape in Fig. 8 and the distribution of the von Mises stress in the loaded body in Fig. 9. Figure 10 shows the optimal shape and Fig. 11 shows the von Mises stress in the deformed optimal body.Finally, figures Fig. 12 and Fig. 13 compare the contact normal stresses for the initial and optimal shape, respectively.Note that during the optimization process the initial value J (α 0 ) = 2.8612 • 10 11 of the cost functional dropped to J (α opt ) = 1.0695 • 10 11 .The decrease of the peak stress is also quite significant.The experiment was carried out in Mathworks Matlab.

Conclusion
In this contribution we have briefly introduced the proximal bundle method for nonsmooth optimization problems with linear constraints Cx ≤ b and with some simple bounds x min , x max .We also outlined the imple-mented algorithm, which was employed to solve our model example.We tried to deal with the shape optimization of a discretized two-dimensional contact problem with Coulomb friction.To find a solution, we had to solve nonsmooth minimization problem with linear and box constraints.To be able to solve even more complicated nonsmooth problems, we should enhance our algorithm with solver for nonlinear constraints and to improve the line search part.

Fig. 3 :
Fig. 3: Graph of the general gradient of f .

Fig. 4 :
Fig. 4: Examples of the tangent and normal cones of given convex set Ω.

f
as we exemplified c 2017 ADVANCES IN ELECTRICAL AND ELECTRONIC ENGINEERING

From
Fig. 7 we can also see the distribution of external pressures on the boundary Γ P , given as P 1 = (0; −200 MPa) on (0, 2) × {1}, while P 2 = (100 MPa; 40 MPa) on {2} × (0, 1) Further, Γ u is the part of the boundary where the zero displacements are prescribed.The set of the admissible designs U ad is specified as follows: a = 2, b = 1 and C 0 = 0.75, C 1 = 1, C 21 = 1.8, C 22 = 2.The Young modulus E = 1 GPa and the Poisson constant σ = 0.3 are used for the definition of the mapping S. The value of the coefficient of the Coulomb friction is 0.25.The state problem on Ω(α) is discretized by isoparametric quadrilateral elements of Lagrange type.The total number of nodes (vertices of quadrilaterals) is 3976 for any α ∈ U ad .The dimension of the control vector α, generating the Bezier function and defining Ω(α), is d = 8.

Fig. 8 :
Fig. 8: Example, initial design -the initial shape of the body.

Fig. 9 :
Fig. 9: Example, initial design -the distribution of the von Mises stress.

Fig. 10 :
Fig.10: Example, optimal design -the optimal shape of the body.