On q-steepest descent method for unconstrained multiobjective optimization problems

1 College of Economics, Shenzhen University, Shenzhen 518060, China
2 Department of Mathematics, Institute of Science, Banaras Hindu University, Varanasi 221005, India
3 Department of Mathematics, Indian Institute of Technology, Kharagpur 721302, India
4 Department of Economic Sciences, Indian Institute of Technology Kanpur, Kanpur 208016, India
5 DST-Centre for Interdisciplinary Mathematical Sciences, Institute of Science, Banaras Hindu University, Varanasi 221005, India


Introduction
In multiobjective problems, several objective functions have to be minimized or maximized simultaneously. Since the objective functions typically conflict, it is generally impossible to find a single point that minimizes all of them at once. Therefore, the concept of optimality is replaced by Pareto optimality, a measure of efficiency in multiobjective problems [1]. A point is called Pareto-optimal or efficient if there does not exist a different point whose objective values are no worse in every objective and strictly better in at least one. Applications of such optimization problems can be found in space exploration [2,3], engineering [4], truss optimization [5,6], design [7,8], environmental analysis [9,10], statistics [11], management science [12][13][14], economic sciences [15], etc.
There are many solution strategies for finding a Pareto optimal point of an unconstrained multiobjective optimization problem (UP). One popular approach is to reformulate (UP) as a scalar optimization problem whose objective function is a weighted linear combination of all objective functions. However, this approach may yield unbounded scalar problems if the weights are not properly chosen [20]. Another popular approach is the evolutionary algorithm, which searches for a Pareto optimum in a set of candidate solutions with genetic operators. No convergence proofs are known in this case, and empirical results show that convergence is quite slow [16]. Another general solution strategy for multiobjective optimization problems is the ε-constrained method [17]. A drawback of this method is that ε may be selected so that the feasible region becomes empty. Other parameter-free multiobjective optimization techniques use an ordering of the objective functions based on priority. In this case, the ordering has to be prespecified, and the corresponding optimization process is usually augmented by an interactive procedure [18]. Another kind of approach is descent methods, which extend the traditional descent methods of scalar optimization to (UP), such as the gradient descent method [19], Newton's method [10,20], etc. A remarkable property of these methods is that they are parameter free, quite unlike the weighted approach for (UP). With a descent method, there is no need to analyze prior information such as the relationships and conflicts between the objectives; such information is very important for choosing the weights in the weighted method. Specifically, an example is shown in [20] where the weighted method fails for a large range of weights, leading to an unbounded weighted objective, while a descent method works well from any starting point.
The steepest descent method is one of the oldest and simplest descent methods for minimizing real-valued functions, proposed by Baron Augustin-Louis Cauchy (1789-1857) [21] in 1847. Cauchy developed the gradient method to solve nonlinear equations. The idea of this method is that the function value always decreases along the negative gradient direction. Almost all optimization textbooks [26] discuss this method as a foundation for more advanced optimization algorithms. The method is also used to solve ridge regression or regularized least squares problems [22]; see also [23][24][25]. The steepest descent method for single-objective problems was extended to unconstrained multiobjective optimization problems (UP) [19,27]. Further, the method was presented to find critical points of (UP) using maps from a Euclidean space to a Banach space [28]. Recently, it was shown that the descent method for (UP) decreases the gradient at the rate 1/√k regardless of the starting point; this rate of convergence is derived for obtaining a point satisfying weak Pareto optimality [29].
On the other hand, quantum calculus (briefly, q-calculus) is dedicated to the study of calculus without the use of limits. By introducing the parameter q, one defines q-analogues of calculus concepts in such a way that the classical notions are recaptured as q → 1. Euler founded this branch of mathematics by using the parameter q in Newton's work on infinite series. The q-calculus has been a research interest in mathematics and physics for the last few decades. The q-analogue of the ordinary derivative was first proposed by F.H. Jackson [30,31]. Its wide applications can be seen in several areas such as operator theory [32], mean value theorems of q-calculus [33], the q-Taylor formula and its remainder [34,35], fractional integrals and derivatives [36], integral inequalities [37], variational calculus [38], transform calculus [39], sampling theory [40], etc. Recently, the qFunctions Mathematica package was developed for q-series and partition-theory applications [41].
A q-version of the classical steepest descent method has been developed to solve single-objective unconstrained optimization problems [42][43][44][45]. In this method, the quantum parameter q acts as a dilation parameter that controls the balance between global and local search directions in the context of the q-gradient. The q-LMS (Least Mean Square) algorithm utilizes the q-gradient to compute the secant of the cost function instead of the tangent [46]; the algorithm takes larger steps towards the optimum solution and achieves a higher convergence rate. A new class of stochastic gradient algorithms based on q-calculus was developed to enhance the q-LMS algorithm; the approach utilizes a parameterless concept of error-correlation energy and signal normalization for fast convergence, stability, and low steady-state error. The q-Taylor formula for functions of several variables and mean value theorems of q-calculus have been utilized for solving systems of equations [48]. In multiobjective optimization, Newton's method using q-calculus [49] has been extended to unconstrained multiobjective optimization problems [50]. Following this trend, the quasi-Newton method [51] has been further extended to solve (UP) [52], where the components of the Hessian matrix are computed using q-calculus.
The goal of this paper is to present the q-steepest descent method, a generalization of the steepest descent method for (UP) [10,27]. A subproblem is formulated using a quadratic approximation of all objective functions at every iteration, and a feasible descent direction based on the q-gradient is obtained as the solution of this subproblem [53]. No ordering of the objective functions is required. Instead of the classical gradient, the q-gradient of the objective functions is used in the proposed algorithm to find critical points and (weak) efficient points, and the theoretical results are validated by convergence proofs. The advantage of using the q-gradient is that it allows the steepest descent direction to range over a more diverse set of directions, which makes it possible to escape local critical points. Several examples are presented in which the q-analogue shifts the search process from a global search in the beginning to an almost local search in the end. To evaluate the performance of the proposed method, we compare it with [28] and with the weighted sum method. In general, the involvement of q-calculus effectively reduces the number of iterations needed to reach a critical point or (weak) efficient point. Thus, the proposed method is promising for convex, non-convex, and multimodal multiobjective optimization problems.
The paper is organized as follows: Section 2 recalls concepts related to q-calculus and multiobjective optimization. In Section 3, the subproblem for the descent direction using the q-derivative is presented. The proposed algorithm and its convergence analysis, with numerical examples, are provided in Section 4. A comparison with an existing method is given in Section 5. The conclusion is given in the last section.

Preliminaries
Denote R as the set of real numbers, R^n_+ := {x ∈ R^n | x_i ≥ 0, ∀ i = 1, . . . , n}, and R^n_{++} := {x ∈ R^n | x_i > 0, ∀ i = 1, . . . , n}. A function that is continuous on any interval not containing 0 is called continuously q-differentiable. Let the q-integer [n] be defined as [n] = (1 − q^n)/(1 − q) for n ∈ N, so that the q-derivative of x^n with respect to x is [n]x^{n−1}. The q-derivative [30] of a function f is then given as:

D_q f(x) = (f(x) − f(qx)) / ((1 − q)x) for x ≠ 0, and D_q f(0) = f'(0) otherwise. (2.1)

The q-derivative reduces to the ordinary derivative as q → 1. The first-order partial q-derivative [42] of a function f with respect to the variable x_i, where i = 1, . . . , n, is:

D_{q_i,x_i} f(x) = (f(x_1, . . . , x_{i−1}, x_i, x_{i+1}, . . . , x_n) − f(x_1, . . . , x_{i−1}, q_i x_i, x_{i+1}, . . . , x_n)) / ((1 − q_i) x_i), for x_i ≠ 0.

The q-gradient of f is the vector of the n first-order partial q-derivatives of f:

∇_q f(x) = (D_{q_1,x_1} f(x), . . . , D_{q_n,x_n} f(x))^T,

where the parameter q is a vector q = (q_1, . . . , q_i, . . . , q_n) ∈ R^n. Consider the following unconstrained multiobjective optimization problem (UP):

minimize F(x) = (f_1(x), . . . , f_m(x))^T subject to x ∈ R^n.

A feasible solution x* ∈ X ⊆ R^n is called a locally (weakly) efficient solution [20] if there is a neighborhood W ⊆ X of x* containing no point x with F(x) ≤ F(x*) and F(x) ≠ F(x*) (respectively, F(x) < F(x*) componentwise). If x* is a (weakly) efficient solution, then F(x*) is called a locally (weakly) non-dominated point. Let X ⊂ R^n be a convex set; F is called componentwise convex [17] if each f_i, i = 1, . . . , m, is convex. If the domain of the multiobjective optimization problem is a convex set and the objective functions are componentwise convex, then every critical point is a weakly efficient point; if the objective functions are componentwise strictly convex, then every critical point is an efficient point. The relationship between critical points and efficient points is discussed in [20]. The classical gradient is obtained in the limit q_i → 1 for all i = 1, . . . , n.
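These definitions are easy to evaluate numerically. Below is a minimal Python sketch of the Jackson q-derivative (2.1) and the q-gradient built from partial q-derivatives; the function names `q_derivative` and `q_gradient` are our own illustrative choices, not from the paper, and the secant formulas assume x ≠ 0 and x_i ≠ 0.

```python
def q_derivative(f, x, q):
    """Jackson q-derivative (2.1): D_q f(x) = (f(x) - f(q*x)) / ((1 - q)*x), x != 0."""
    if x == 0:
        raise ValueError("at x = 0 the q-derivative falls back to the ordinary derivative")
    return (f(x) - f(q * x)) / ((1.0 - q) * x)

def q_gradient(f, x, q):
    """q-gradient: the i-th entry is the partial q-derivative with parameter q[i],
    obtained by scaling only the i-th coordinate of x by q[i]."""
    grad = []
    for i, (xi, qi) in enumerate(zip(x, q)):
        xq = list(x)
        xq[i] = qi * xi                      # scale one coordinate
        grad.append((f(x) - f(xq)) / ((1.0 - qi) * xi))
    return grad

# D_q x^3 = [3] x^2, where the q-integer is [3] = 1 + q + q^2
f = lambda t: t ** 3
q = 0.5
print(q_derivative(f, 2.0, q))       # 7.0
print((1 + q + q * q) * 2.0 ** 2)    # [3] * x^2 = 7.0
```

The two printed values agree, illustrating the q-integer rule D_q x^n = [n] x^{n−1} stated above.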

The q-steepest descent for multiobjective optimization
The transpose of the Jacobian matrix of partial q-derivatives of the vector-valued objective F, denoted J_q F(x), is:

J_q F(x) = [∇_q f_1(x) · · · ∇_q f_m(x)]^T. (3.1)

Note that a parameter q ∈ (0, 1) has been considered in the above matrix. The range of the linear mapping given by the Jacobian matrix J_q F(x) is range(J_q F(x)). A necessary condition for x ∈ R^n to be a (local) weak Pareto minimizer is:

range(J_q F(x)) ∩ (−R^m_{++}) = ∅, (3.2)

that is, there is no direction d ∈ R^n such that

∇_q f_i(x)^T d < 0 for all i = 1, . . . , m (3.3)

(see Lemma 3.1 of [55]). In the literature, (3.2) or (3.3) is often considered as a necessary condition of weak efficiency. The points satisfying (3.2) are called Pareto critical points. If a point x ∈ R^n is not Pareto critical, then there exists a descent direction d_q ∈ R^n for F satisfying

∇_q f_i(x)^T d_q < 0, i = 1, . . . , m. (3.4)

If d_q ∈ R^n is not a descent direction, then

∇_q f_j(x)^T d_q ≥ 0 for at least one j ∈ {1, . . . , m}. (3.5)

The descent gradient method for (UP) takes the scheme x^{k+1} = x^k + α_k d^k_q, where α_k ∈ (0, 1] is the step length computed by an Armijo-Wolfe line search with backtracking. In the single-objective case, the negative q-gradient −∇_q f(x) is the steepest descent direction, and it is the solution of a quadratic programming problem whose objective function is an approximation to f. For (UP), the subproblem introduced in [27], which approximates the greatest common reduction of all objective functions, is given as:

min_{d_q ∈ R^n} max_{i=1,...,m} ∇_q f_i(x^k)^T d_q + (1/(1+q)) ||d_q||^2, (3.6)

where q ∈ (0, 1), and the search direction d^k_q is the solution of (3.6). The min-max subproblem (3.6) can be written as a q-differentiable quadratic optimization problem (q-DQOP) with linear inequality constraints:

minimize t + (1/(1+q)) ||d_q||^2
subject to ∇_q f_i(x^k)^T d_q ≤ t, i = 1, . . . , m, (d_q, t) ∈ R^n × R. (3.7)

Note that the term (1/(1+q))||d_q||^2 for q ∈ (0, 1) in the objective function ensures that the problem is bounded, and (d^k_q, t^k) is the optimal solution of (q-DQOP). The above subproblem is a convex programming problem and satisfies Slater's constraint qualification, since for t = 1 and d_q = 0 the above inequalities become strict. The Karush-Kuhn-Tucker (KKT) conditions [54] for (q-DQOP) at x^k are:

(2/(1+q)) d^k_q + Σ_{i=1}^m λ^k_i ∇_q f_i(x^k) = 0, (3.8)

Σ_{i=1}^m λ^k_i = 1, λ^k_i ≥ 0, i = 1, . . . , m, (3.9)

λ^k_i (∇_q f_i(x^k)^T d^k_q − t^k) = 0, i = 1, . . . , m, (3.10)

where λ^k_i, i = 1, . . . , m, are the Lagrange multipliers associated with the linear inequality constraints of (3.7).
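For m = 2 objectives, the KKT system (3.8)-(3.10) admits a closed-form solution: by (3.8), d = −((1+q)/2)·ḡ, where ḡ is the minimum-norm point of the segment between the two q-gradients, so the dual problem is a one-dimensional quadratic over λ ∈ [0, 1]. The following sketch, with illustrative names not taken from the paper, implements this special case:

```python
def qdqop_direction(g1, g2, q):
    """Closed-form solution of (q-DQOP) for m = 2 objectives.
    From KKT condition (3.8): d = -((1+q)/2) * gbar, where gbar is the
    minimum-norm convex combination lam*g1 + (1-lam)*g2, lam in [0, 1]."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(c * c for c in diff)
    if denom == 0.0:
        lam = 0.5   # g1 == g2: every convex combination coincides
    else:
        # unconstrained minimizer of the 1-D quadratic, clipped to [0, 1]
        lam = max(0.0, min(1.0, sum(b * (b - a) for a, b in zip(g1, g2)) / denom))
    gbar = [lam * a + (1.0 - lam) * b for a, b in zip(g1, g2)]
    d = [-(1.0 + q) / 2.0 * c for c in gbar]
    t = -(2.0 / (1.0 + q)) * sum(c * c for c in d)   # optimal value t^k
    return d, t, lam

# two assumed q-gradients at the current iterate
d, t, lam = qdqop_direction([1.0, 0.0], [0.0, 1.0], q=0.5)
# d is a common descent direction: g_i^T d < 0 for both objectives
```

Here t is computed from the identity t^k = −(2/(1+q))||d^k_q||^2 obtained by summing the complementarity conditions; for the example gradients above, both inner products g_i^T d are negative, so d satisfies (3.4).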
Taking the summation of (3.10) over i from 1 to m and using (3.8) and (3.9), we get

t^k = Σ_{i=1}^m λ^k_i ∇_q f_i(x^k)^T d^k_q = −(2/(1+q)) ||d^k_q||^2 ≤ 0.

The solution of (q-DQOP) is related to Pareto criticality as stated in [27].
The multipliers satisfy λ^k_i ≥ 0 for i = 1, . . . , m, otherwise Part 2 would not be satisfied, and for any fixed ī we have t^k ≥ ∇_q f_ī(x^k)^T d^k_q. Part 1 of this lemma holds because (t^k, d^k_q) = (0, 0) is a feasible point of (q-DQOP); the constraint of (q-DQOP) is equivalent to (3.13). Since (d_q, t) = (0, 0) is feasible, the optimal value satisfies t^k ≤ 0, and thus (3.12) holds. If x^k is not Pareto critical, then the solution d^k_q of the subproblem is a descent direction with d^k_q ≠ 0, and following the arguments in [50,55] we select a suitable step length. Suppose θ^k_i is the angle between ∇_q f_i(x^k) and d^k_q. If cos^2(θ^k_i) ≥ δ, where δ > 0, holds for all i, then we select α_k satisfying the Armijo condition

f_i(x^k + α_k d^k_q) ≤ f_i(x^k) + β_1 α_k ∇_q f_i(x^k)^T d^k_q, i = 1, . . . , m, (3.14)

and the curvature condition

∇_q f_i(x^k + α_k d^k_q)^T d^k_q ≥ β_2 ∇_q f_i(x^k)^T d^k_q, i = 1, . . . , m, (3.15)

where 0 < β_1 < β_2 < 1 and α_k ∈ (0, 1]. If cos^2(θ^k_j) < δ for some j, then only α_k satisfying (3.14) is selected. Condition (3.14) is the criterion for accepting a step length along the multiobjective descent direction. We start with α = 1; if the condition is not satisfied, we set α := α/2, and the process continues.
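The backtracking rule just described (start with α = 1 and halve until (3.14) holds for every objective simultaneously) can be sketched as follows; the function name `armijo_step` and the test objectives are our illustrative assumptions, and the loop assumes d is a descent direction so that it terminates:

```python
def armijo_step(fs, q_grads, x, d, beta1=0.1):
    """Backtracking line search: halve alpha until the Armijo condition (3.14)
    holds for every objective (requires grad_i^T d < 0 for all i)."""
    slopes = [sum(gi * di for gi, di in zip(g, d)) for g in q_grads]  # grad_i^T d
    alpha = 1.0
    while not all(f([xi + alpha * di for xi, di in zip(x, d)])
                  <= f(x) + beta1 * alpha * s
                  for f, s in zip(fs, slopes)):
        alpha *= 0.5
    return alpha, [xi + alpha * di for xi, di in zip(x, d)]

# assumed test objectives and precomputed q-gradients at x = [3.0]
fs = [lambda v: v[0] ** 2, lambda v: (v[0] - 1.0) ** 2]
alpha, xn = armijo_step(fs, [[5.7], [3.7]], [3.0], [-3.7])
```

In this example the full step α = 1 violates (3.14) for the second objective, so one halving is performed and α = 0.5 is accepted.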

Convergence analysis and numerical examples
Let us now summarize the steepest descent method for multiobjective optimization proposed in [19], based on the q-derivative, as Algorithm 1:

Algorithm 1: q-Steepest Descent for Unconstrained Multiobjective Optimization (q-SDUP)
Data: β_1 ∈ (0, 1), x^0 ∈ R^n, tolerances ε_1, ε_2, a small positive number δ > 0, q ∈ (0, 1)
Result: a critical point or (weak) efficient point
for k = 0, 1, 2, . . . do
    Solve (q-DQOP) and compute (d^k_q, t^k, λ^k); if ||d^k_q|| < ε_1, stop
    if cos^2(θ^k_i) ≥ δ for all i = 1, . . . , m then
        choose a step length α_k ∈ (0, 1] satisfying (3.14) and (3.15) with x^k + α_k d^k_q ∈ X
    else
        choose α_k satisfying (3.14)
    end
    set x^{k+1} := x^k + α_k d^k_q
end

Theorem 1. Let f_i, i = 1, . . . , m, be continuously q-differentiable on a set X ⊆ R^n with β_1, q ∈ (0, 1), and let {x^k} be the sequence updated by x^{k+1} = x^k + α_k d_q(x^k), where α_k satisfies (3.14). Then {f_i(x^k)} is monotonically decreasing for each i, and every accumulation point of {x^k} is a Pareto critical point of (UP).

Proof. Since d_q(x^k) is a descent direction, (3.14) gives

f_i(x^{k+1}) ≤ f_i(x^k) + β_1 α_k ∇_q f_i(x^k)^T d_q(x^k) < f_i(x^k), i = 1, . . . , m;

that is, each sequence {f_i(x^k)} is monotonically decreasing. For at least one i_1 ∈ {1, . . . , m}, f_{i_1} is bounded below, i.e., f_{i_1}(x) > −∞ for all x ∈ X. The sequence {f_{i_1}(x^k)} is monotonically decreasing and bounded below, so f_{i_1}(x^k) converges to f_{i_1}(x*) as k → ∞, where f_{i_1}(x*) > −∞. Summing the Armijo inequality over the iterations, we obtain

β_1 Σ_{j=0}^∞ α_j (−∇_q f_{i_1}(x^j)^T d_q(x^j)) ≤ f_{i_1}(x^0) − f_{i_1}(x*) < ∞.

Since ∇_q f_{i_1}(x^j)^T d_q(x^j) ≤ 0 for all j [10,52], the series on the left is finite, and therefore β_1 α_k (−∇_q f_{i_1}(x^k)^T d_q(x^k)) → 0 as k → ∞. The level set L_0 of x^0 is bounded; if the step lengths α_k were not bounded away from zero, L_0 would be unbounded, which contradicts this assumption. Hence α_k ≥ β for all k and some β ∈ (0, 1), and we get −∇_q f_{i_1}(x^k)^T d_q(x^k) → 0 as k → ∞. Since ∇_q f_{i_1}(x^k)^T d_q(x^k) ≤ t^k = −(2/(1+q)) ||d_q(x^k)||^2, it follows that d_q(x^k) → 0 as k → ∞. Since L_0 is bounded, {x^k} has at least one accumulation point. Let {R*_1, R*_2, . . . , R*_r} be the set of accumulation points of {x^k}. Since d_q(·) is a continuous function and d_q(x^k) → 0, we have d_q(R*_s) = 0 for every s ∈ {1, 2, . . . , r}; that is, each R*_s is a Pareto critical point of (UP).
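As a concrete illustration, Algorithm 1 can be sketched in a few lines of Python for a bi-objective problem in one variable. The test pair f1(x) = x^2, f2(x) = (x − 1)^2 and all names are our assumptions for illustration, not a problem from the paper; for m = 2 the (q-DQOP) direction has the closed form d = −((1+q)/2)(λ g1 + (1−λ) g2), with λ ∈ [0, 1] minimizing the norm of the combination.

```python
def q_sdup_1d(f1, f2, x0, q=0.9, beta1=0.1, eps=1e-8, max_iter=100):
    """Sketch of Algorithm 1 (q-SDUP) for two objectives of one variable."""
    x = x0
    for k in range(max_iter):
        if x == 0.0:
            x = 1e-12                                   # q-secant undefined at 0
        g1 = (f1(x) - f1(q * x)) / ((1.0 - q) * x)      # partial q-derivatives
        g2 = (f2(x) - f2(q * x)) / ((1.0 - q) * x)
        # (q-DQOP) direction: minimize |lam*g1 + (1-lam)*g2| over lam in [0, 1]
        if g1 != g2:
            lam = max(0.0, min(1.0, g2 * (g2 - g1) / (g1 - g2) ** 2))
        else:
            lam = 0.5
        d = -(1.0 + q) / 2.0 * (lam * g1 + (1.0 - lam) * g2)
        if abs(d) < eps:                                # Pareto critical: stop
            return x, k
        alpha = 1.0                                     # backtracking Armijo (3.14)
        while (f1(x + alpha * d) > f1(x) + beta1 * alpha * g1 * d
               or f2(x + alpha * d) > f2(x) + beta1 * alpha * g2 * d):
            alpha *= 0.5
        x = x + alpha * d
    return x, max_iter

x_star, iters = q_sdup_1d(lambda x: x ** 2, lambda x: (x - 1.0) ** 2, x0=3.0)
```

Starting from x^0 = 3, the iterates reach the critical set of this pair within a few steps. Note that with q-secant gradients the critical set of this pair is [0, 2/(1+q)], slightly larger than the classical efficient set [0, 1]; the two coincide as q → 1.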
Theorem 2. Let f_i, i = 1, . . . , m, be continuously q-differentiable on a set X ⊂ R^n, let {x^k} be the sequence generated by x^{k+1} = x^k + α_k d_q(x^k), and suppose that
1. d_q(x^k) → 0 as k → ∞,
2. α_k satisfies (3.14) for all i = 1, . . . , m, and
3. cos^2 θ^k_i ≥ δ for some δ > 0, for all i = 1, . . . , m, where θ^k_i is the angle between d_q(x^k) and ∇_q f_i(x^k).
Then every accumulation point of {x^k} is a weak efficient solution of (UP).
Proof. We have proved that every accumulation point of {x^k} is a critical point of f_i for all i = 1, . . . , m. Let x* be an accumulation point of {x^k}. If ∇_q f_{i_0}(x*) = 0 for some i_0 ∈ {1, . . . , m}, then x* is a (weak) efficient solution. From the Cauchy-Schwarz inequality, for all i = 1, . . . , m, we have

∇_q f_i(x^k)^T d_q(x^k) = ||∇_q f_i(x^k)|| ||d_q(x^k)|| cos θ^k_i,

where θ^k_i is the angle between ∇_q f_i(x^k) and d_q(x^k). From part 1 of this theorem, ||d_q(x^k)|| → 0 as k → ∞, and from part 2, ∇_q f_i(x^k)^T d_q(x^k) → 0. Taking k → ∞, we obtain

||∇_q f_i(x^k)||^2 cos^2 θ^k_i → 0 as k → ∞.

Since cos^2 θ^k_i ≥ δ > 0 for all i = 1, . . . , m, it follows that min_{i=1,...,m} ||∇_q f_i(x^k)||^2 → 0 as k → ∞. Fix any i_0 ∈ {1, . . . , m} such that ||∇_q f_{i_0}(x^k)||^2 → 0 as k → ∞. Since ||∇_q f_{i_0}(·)|| is a continuous function and ∇_q f_{i_0}(x^k) → 0 as k → ∞, we get ∇_q f_{i_0}(x*) = 0 for every accumulation point x* of {x^k}. Thus, x* is a local weak efficient solution.
Iteration complexity of Algorithm 1: Based on the ideas above, we consider iteration complexity. For a given iteration k, the iterate x^k reaches an approximately optimal point under a desired tolerance ε. The steepest descent direction moves from d^k_q to some d^{k+1}_q, where ||d^{k+1}_q − d^k_q|| is small enough to determine the associated solution on the Pareto front. In the non-convex case, Algorithm 1 has a convergence rate of order 1/√k, and in the convex case, Algorithm 1 attains the desired 1/k rate for a certain sequence of weights {λ^k} (see Theorem 4.1 of [29]), since for large k the q-gradient behaves as the classical gradient [43]. These global rates translate into worst-case complexity bounds of order ε^{−2}, ε^{−1}, and log(1/ε) iterations, respectively, to reach an approximate optimality criterion of the form ||d^k_q|| ≤ ε for some ε ∈ (0, 1), where d^k_q is the steepest descent direction from (3.6). Due to the existence of a uniform lower bound on the step length α_k, Algorithm 1 always stops in a finite number of steps.
It is important to note that the (weak) efficient solution of (UP) is not unique. Therefore, one can execute Algorithm 1 from any starting point to reach one of these (weak) efficient points. The user may consider a sufficiently large compact subset of R^n as the domain of (UP). We have implemented Algorithm 1 with the Armijo-Wolfe line search with backtracking, and we have considered both convex and non-convex problems. To avoid unbounded solutions of the subproblem at an iterate x^k, we consider the following bounded subproblem:

minimize t + (1/(1+q)) ||d_q||^2
subject to ∇_q f_i(x^k)^T d_q ≤ t, i = 1, . . . , m, and lb ≤ x^k + d_q ≤ ub,

where lb and ub are the lower and upper bounds on x^k, respectively. We now illustrate the methodology with numerical examples.
Example 2 (One Dimension [55]). Consider the problem (UP) with two convex objective functions and one decision variable. The set of efficient solutions of this problem is [0, 1], as shown in Figure 1. Using Algorithm 1 with any starting point x^0 > 1, one finds an efficient point approximately equal to 1. With starting point x^0 = 10, the approximate solution is obtained at the 29th step as x^{29} = 1.00240356 ≈ 1 in [55]. Under the same starting point and stopping criterion, i.e., ε_1 = 10^{−7}, ε_2 = 10^{−11}, and δ = 10^{−4}, our algorithm provides the efficient solution at the 2nd step. The Pareto front obtained using Algorithm 1 and [28] for −2 ≤ x ≤ 2 is a convex curve, shown in Figure 2.

Example 3 (Two Dimension). Consider the unconstrained multiobjective optimization problem from [55]. After running Algorithm 1 and the methodology used in [28], we obtain the point x* = (1, 2)^T with λ*_1 = 0.9774 and λ*_2 = 0.0226 in two iterations as the final solution with starting point x^0 = (1, 3)^T. One can verify that this point satisfies the necessary condition (3.2). Note that the function f_2 is not convex. A multiobjective optimization problem does not produce a unique solution but a set of efficient solutions. The algorithm does not depend upon the choice of the starting point: the sequence converges to one of the critical or weak efficient solutions from any starting point. For example, with starting point x^0 = (0, 0)^T, Algorithm 1 converges to x* = (0.9975, 0.9936)^T in 60 iterations, while the methodology used in [28] converges to x* = (0.9981, 0.9950)^T in 64 iterations.
Comparison of the approximate Pareto front with the weighted sum method: To obtain an approximate Pareto front, we considered a multi-start technique. Here, 100 uniformly distributed random points are selected to execute Algorithm 1 and the weighted sum (WS) algorithm individually. The approximate Pareto front obtained by Algorithm 1 is compared with that obtained by the weighted sum method. In the (WS) method we used the weights (1, 0), (0, 1), and 98 random weights. Each single-objective optimization problem is solved by the single-objective steepest descent method with a random initial approximation. Approximate Pareto fronts generated by Algorithm 1 for Example 2 and Example 4 are given in Figure 3(a) and (b), respectively. One can observe that Algorithm 1 generates an approximate Pareto front for both problems, but (WS) fails for Example 4 (a non-convex problem). Table 1 lists the average number of iterations (it), function evaluations (evalf), gradient evaluations (evalg), and subproblems solved (evalsp), to show the performance of the q-steepest descent method for (UP). The data reported in Table 1 represent a typical execution of Algorithm 1. We conclude that Algorithm 1 performs better in terms of the number of iterations than the method of [28]. It is noteworthy that the number of steepest descent direction calculations equals the number of iterations. Example 5 (Three Dimension). Consider the following multiobjective problem. We solve it with the method described in [28] and compare the result with the proposed method of this paper.
Consider a starting point x^0 = (2, 1, 3)^T and the stopping criterion min{||∇_q f_1||, ||∇_q f_2||, ||∇_q f_3||} < 10^{−5}. A MATLAB (2017a) program was written along the lines of Algorithm 1. The search process moves from a global search at the beginning to an almost local search at the end, thanks to the q-derivative, as shown in Table 2. The solution to this problem is found at the 14th iteration, and this point is a local weak efficient solution of the above problem.
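The nondominated filtering step of the multi-start strategy above (collect the outcomes of many runs, keep only the points no other point dominates) can be sketched as follows; the sampled objective pair f1(x) = x^2, f2(x) = (x − 1)^2 is an illustrative assumption:

```python
import random

def nondominated(points):
    """Keep only the nondominated objective vectors: p is discarded when some
    other point is <= p in every component and differs from p."""
    front = []
    for p in points:
        dominated = any(all(qi <= pi for qi, pi in zip(other, p)) and other != p
                        for other in points)
        if not dominated:
            front.append(p)
    return front

random.seed(0)
# hypothetical multi-start: sample candidate solutions, evaluate both objectives,
# and keep the nondominated image points as the approximate Pareto front
xs = [random.uniform(-2.0, 2.0) for _ in range(100)]
pts = [(x * x, (x - 1.0) ** 2) for x in xs]
front = nondominated(pts)
```

The O(n^2) filter is adequate for the 100-point samples used here; for larger archives a sorting-based sweep would be preferable.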

Comparison with the existing method
We select the work of [28] for comparison with the proposed method. The basic difference between the method of [28] and ours is that the q-gradient based steepest descent algorithm gives faster convergence for q ∈ (0, 1). The q-derivative computes the secant of the objective functions rather than the tangent, and therefore takes larger steps towards the optimum solution. Table 3 summarizes the results of executing the method of [28] and the proposed algorithm of this paper with several starting points for the problem of Example 5. Both methods were implemented in MATLAB (2017a). The number of iterations of Algorithm 1 is smaller than that of the methodology of [28].

Conclusions
We have applied the gradient descent approach to solve (UP) and given a necessary optimality condition based on the q-derivative. We proved the convergence of the proposed algorithm. The steepest descent method for multiobjective optimization with the q-derivative converges to a Pareto critical point independently of the starting point. The solution strategy does not require any ordering information. The proposed algorithm has been compared with an existing method to measure its efficiency.