Article

A Family of Hybrid Stochastic Conjugate Gradient Algorithms for Local and Global Minimization Problems

by
Khalid Abdulaziz Alnowibet
1,
Salem Mahdi
2,
Ahmad M. Alshamrani
1,
Karam M. Sallam
3 and
Ali Wagdy Mohamed
4,*
1
Statistics and Operations Research Department, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia
2
Department of Mathematics & Computer Science, Faculty of Science, Alexandria University, Alexandria 21544, Egypt
3
School of IT and Systems, University of Canberra, Canberra, ACT 2601, Australia
4
Operations Research Department, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(19), 3595; https://doi.org/10.3390/math10193595
Submission received: 23 August 2022 / Revised: 19 September 2022 / Accepted: 24 September 2022 / Published: 1 October 2022

Abstract:
This paper contains two main parts, Part I and Part II, which discuss the local and global minimization problems, respectively. In Part I, a fresh conjugate gradient (CG) technique is suggested and then combined with a line-search technique to obtain a globally convergent algorithm. The finite difference approximations approach is used to compute the approximate values of the first derivative of the function f. The convergence analysis of the suggested method is established. The comparisons between the performance of the new CG method and the performance of four other CG methods demonstrate that the proposed CG method is promising and competitive for finding a local optimum point. In Part II, three formulas are designed by which a group of solutions is generated. This set of random formulas is hybridized with the globally convergent CG algorithm to obtain a hybrid stochastic conjugate gradient algorithm denoted by HSSHZ. The HSSHZ algorithm finds the approximate value of the global solution of a global optimization problem. Five combined stochastic conjugate gradient algorithms are constructed. Performance profiles are used to assess and compare the performance of the family of hybrid stochastic conjugate gradient algorithms. The comparison results between our proposed HSSHZ algorithm and four other hybrid stochastic conjugate gradient techniques demonstrate that the suggested HSSHZ method is competitive with, and in all cases superior to, the four algorithms in terms of efficiency, reliability and effectiveness in finding the approximate solution of a global optimization problem that contains a non-convex function.

1. Introduction

The major goal of this paper is to find the local and global minima of a convex and non-convex function. The local and global minimization problems are defined as follows.
Definition 1.
A local minimum $x_{lo} \in S$ of the function f, $f: S \to \mathbb{R}$, is an input element with $f(x_{lo}) \le f(x)$ for all x neighboring $x_{lo}$. If $S \subseteq \mathbb{R}^n$, it is formulated by
$$\exists\, x_{lo}\ \exists\, \varepsilon > 0 : \quad f(x_{lo}) \le f(x) \quad \forall\, x \in S,\ \|x - x_{lo}\| \le \varepsilon.$$
Definition 2.
The point $x_{gl} \in S$ is called the global minimizer of the function f, $f: S \to \mathbb{R}$, if $f(x_{gl}) \le f(x)\ \forall x \in S$. When $S \subseteq \mathbb{R}^n$, the problem can be formulated by
$$\min_{x \in S} f(x), \quad f: S \to \mathbb{R},$$
In both problems (formulae), $S \subseteq \mathbb{R}^n$ is the region over which we seek the global minimizer of $f(x)$, and $f(x)$ is continuously differentiable.
Global optimization (GO) attempts to find an approximate solution of the objective function shown in Problem (2).
However, this task can be difficult since the knowledge about f is usually only local. On the other hand, the fast local optimization (LO) algorithms tend to find a local point, since these algorithms are not capable of finding the global solution at each run.
The bottom line is that the core difference between the GO methods and the LO algorithms is as follows: the GO methods focus on solving Problem (2) over the given set, while the task of the LO methods is to solve (1). Consequently, solving Problem (1) is relatively simple by using deterministic (classical) local optimization methods. On the contrary, finding the global optimum of Problem (2) is an NP-hard problem.
Challenging problems arise in different application fields, for example, technical sciences, industrial engineering, economics, networks, chemical engineering, etc. See [1,2,3,4,5,6,7,8,9,10,11].
Recently, many optimization algorithms have been proposed to deal with these problems. Most of these suggested methods are based on meta-heuristic (random search) strategies.
There are different classifications for meta-heuristic methods [12].
Mohamed et al. [7] presented a brief description of these classifications.
In random algorithms, the minimization technique relies partly on probability.
In contrast, no probabilistic element is used in deterministic algorithms. Hence, deterministic techniques need an exhaustive examination of the search domain of the function f to find the approximate solution to Problem (2) at each run; otherwise, they fail in this task.
For random techniques, on the other hand, convergence to an approximate solution of Problem (2) can be established in the sense of asymptotic convergence in probability. See [13,14,15].
There are many deterministic methods that have been proposed for dealing with the local optimization problems. See, for example, Refs. [16,17,18,19,20].
The most popular deterministic method is the CG method [18]. CG methods are exceedingly utilized to find the local minimizer of Problem (1) [21].
However, CG algorithms have a numerical weakness: their subsequent steps may be slow if a small step is generated far from the local point. Hence, to resolve this issue, a line-search technique is combined with the CG technique to create a globally convergent algorithm [22,23].
Accordingly, many conjugate gradient line-search methods have been suggested; see, for example, Refs. [18,24,25,26,27,28].
The CG method is an efficient and inexpensive technique to deal with Problem (1).
The CG method is an iterative algorithm. Therefore, the candidate solutions are generated by the following recursive formula.
$$x_{k+1} = x_k + \alpha_k d_k,$$
where the step size α k > 0 , and the directions d k are created by the following formula:
$$d_{k+1} = -g_{k+1} + \beta_k d_k, \quad d_0 = -g_0,$$
where g k denotes the gradient vector of the function f at the point x k .
Several versions of the CG methods are suggested. The core difference between those CG algorithms relies on choosing the parameter β k [18,27,28,29]. The main features of the CG method are as follows: it has low memory requirements, it is strongly local, and it has global convergence properties [30].
Many authors presented several studies to analyze the CG method; see, for example, Refs. [31,32].
In 1964, the authors of [33] applied the CG methods to nonlinear problems, and they proposed the following parameter.
$$\beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}.$$
The authors of [34,35] established the global convergence of the scheme defined in (5); they used an exact line search and an inexact line search respectively.
However, the author of [36] showed that there are some cases with jamming behavior; these jamming occurrences happen when the search directions $d_k$ become almost orthogonal to the gradient vector $g_k$ [18].
The authors of [37,38] presented a modification of the parameter β k F R for treating the noise event denoted in [36]. Hence, they proposed the following parameter.
$$\beta_k^{PRP} = \frac{y_k^T g_{k+1}}{\|g_k\|^2},$$
where $y_k = g_{k+1} - g_k$. When noise occurs, $g_{k+1} \approx g_k$, so $\beta_k^{PRP} \approx 0$ and $d_{k+1} \approx -g_{k+1}$; i.e., when jamming happens, the search direction is no longer nearly orthogonal to the gradient vector, but is aligned with $-g_{k+1}$. This built-in restart feature of the $\beta_k^{PRP}$ parameter usually gives faster convergence than the parameter $\beta_k^{FR}$ [18].
The authors of [39] proposed an approach closely related to β k P R P , and it is defined as follows.
$$\beta_k^{HS} = \frac{y_k^T g_{k+1}}{d_k^T y_k},$$
in the case that the step size $\alpha_k$ is found by an exact line search algorithm. Hence, by (4) and the orthogonality condition $g_{k+1}^T d_k = 0$, the following can be obtained:
$$d_k^T y_k = (g_{k+1} - g_k)^T d_k = -g_k^T d_k = \|g_k\|^2.$$
Therefore, $\beta_k^{HS} = \beta_k^{PRP}$ when the step size $\alpha_k$ is calculated by an exact line search method. Other fundamental one-term formulas for the parameter $\beta_k$ are listed as follows.
$$\beta_k^{LS} = \frac{g_{k+1}^T y_k}{-d_k^T g_k}.$$
Formula (9) was proposed by [40].
$$\beta_k^{DY} = \frac{\|g_{k+1}\|^2}{y_k^T d_k}.$$
Formula (10) was proposed by Dai and Yuan [41]. It is noteworthy that when f is quadratic and the step size $\alpha_k$ is selected to minimize f along $d_k$, the above choices of the parameter $\beta_k$ coincide; for a generic nonlinear function, however, the different alternatives have markedly different convergence properties [18].
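As a concrete illustration, the classical one-term choices of $\beta_k$ can be computed side by side from $g_{k+1}$, $g_k$ and $d_k$. The following is a minimal NumPy sketch we add for illustration (the helper name `cg_betas` is ours, not from the paper):

```python
import numpy as np

def cg_betas(g_new, g_old, d_old):
    """Classical one-term CG parameters, Eqs. (5)-(10)."""
    y = g_new - g_old                            # y_k = g_{k+1} - g_k
    return {
        "FR":  (g_new @ g_new) / (g_old @ g_old),   # Fletcher-Reeves
        "PRP": (y @ g_new) / (g_old @ g_old),       # Polak-Ribiere-Polyak
        "HS":  (y @ g_new) / (d_old @ y),           # Hestenes-Stiefel
        "LS":  (y @ g_new) / -(d_old @ g_old),      # Liu-Storey
        "DY":  (g_new @ g_new) / (y @ d_old),       # Dai-Yuan
    }
```

Under the exact-line-search orthogonality $g_{k+1}^T d_k = 0$, the HS and PRP values returned by this sketch coincide, as in (8).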
Many versions of the parameter $\beta_k$ have been proposed in two- and three-term forms; see, for example, Refs. [32,42,43,44,45,46,47,48,49,50].
In particular, the following two approaches are the ones we modify to obtain a new CG method; see Section 2.
$$\beta_k^{HZ} = \frac{(y_k^T g_k)(d_{k-1}^T y_k) - 2\|y_k\|^2 (d_{k-1}^T g_k)}{(d_{k-1}^T y_k)^2}.$$
Formula (11) was proposed by [30].
$$\beta_k^{MHZ} = \frac{(y_k^T g_k)(d_{k-1}^T y_k) - 2\|y_k\|^2 (d_{k-1}^T g_k)}{\max\{\sigma \|y_k\|^2 \|d_k\|^2,\; (d_{k-1}^T y_k)^2\}},$$
where $\sigma > 0.5$ is a constant. Formula (12) was proposed by [49]. The denominator $(d_{k-1}^T y_k)^2$ in $\beta_k^{HZ}$ is modified to $\max\{\sigma\|y_k\|^2\|d_k\|^2, (d_{k-1}^T y_k)^2\}$ in $\beta_k^{MHZ}$. This modification may help $d_k$ stay within a trust region automatically at each iteration [49]. Furthermore, when $\sigma\|y_k\|^2\|d_k\|^2 < (d_{k-1}^T y_k)^2$, $\beta_k^{MHZ}$ reduces to $\beta_k^{HZ}$ with $\alpha_k$ calculated to satisfy an inexact line search. Moreover, $\beta_k^{HZ}$ reduces to $\beta_k^{HS}$ under the exact line search.
Consequently, by using a line search method, the CG method can satisfy the following descent condition:
$$g_k^T d_k \le -C \|g_k\|^2,$$
where C > 0 is a constant.
The sufficient descent condition (13) has a core task in the convergence analysis of the algorithms. See [17,30,31,32,35,41,49,51,52].
However, the CG method has a numerical obstacle: its subsequent steps might be slow if a small step is generated far from the intended point [49].
Recently, the authors of [48,49] proved that the CG algorithm includes powerful convergence features if it satisfies the trust-region feature that is determined by
$$\|d_k\| < C_v \|g_k\|,$$
where C v > 0 is a constant. It is shown, therefore, that the trust-region property can enable the search direction d k to be bounded in the trust radius [49]. Numerous researchers proposed many CG algorithms that give perfect results and powerful convergence properties. See [30,48,49,51].
The selection of the right step size α k can help the CG algorithms to achieve global convergence.
The exact line search is defined as follows:
$$f(x_k + \alpha_k d_k) = \min_{\alpha \ge 0} \theta(\alpha) = \min_{\alpha \ge 0} f(x_k + \alpha d_k).$$
It is clear that in large-scale problems, the exact line search cannot be used.
Therefore, many inexact techniques exist to approximate (15); the weak Wolfe–Powell (WWP) line search, for example, is a popular and widely used technique. The WWP technique is designed to find a step size $\alpha_k$ satisfying the following inequalities:
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k,$$
and
$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k,$$
where $\delta \in (0, 0.5)$ and $\sigma \in (\delta, 1)$ are constants.
Inequality (16) is named the Armijo condition, and the WWP line search becomes the strong Wolfe–Powell (SWP) line search by substituting Inequality (17) with the following inequality:
$$|g(x_k + \alpha_k d_k)^T d_k| \le -\sigma g_k^T d_k.$$
Generally, under the WWP line search, it is assumed that the gradient g ( x ) is Lipschitz continuous in the convergence analysis. Therefore, the following inequality is satisfied:
$$\|g(x) - g(y)\| \le L \|x - y\|,$$
where L is a constant and $x, y \in \mathbb{R}^n$.
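The WWP conditions (16) and (17) can be searched for with a standard bracketing/bisection scheme. The sketch below is a common textbook scheme we add for illustration, not the routine used in the paper; the function name `weak_wolfe` and the default constants are our assumptions:

```python
import numpy as np

def weak_wolfe(f, grad, x, d, delta=1e-4, sigma=0.9, max_iter=50):
    """Bisection sketch of a weak Wolfe-Powell line search:
    returns alpha satisfying the Armijo condition (16) and the
    curvature condition (17) for a descent direction d."""
    lo, hi = 0.0, np.inf
    alpha = 1.0
    fx, slope = f(x), grad(x) @ d          # slope = g_k^T d_k < 0
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + delta * alpha * slope:
            hi = alpha                     # (16) violated: step too long
        elif grad(x + alpha * d) @ d < sigma * slope:
            lo = alpha                     # (17) violated: step too short
        else:
            return alpha                   # both (16) and (17) hold
        alpha = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * alpha
    return alpha
```

When the Armijo test fails the bracket shrinks from above; when the curvature test fails the step is grown, so the returned $\alpha_k$ satisfies both inequalities whenever the loop exits early.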
In fact, the CG technique with line-search methods has proven notable in solving the local optimization problem [18,27,28]. However, in trying to solve Problem (2), the CG method fails to achieve this task per run because it gets trapped at a local point. To prevent sticking at a local point, random parameters are used [53].
We can summarize the essence of the above discussions as follows.
Recently, many approaches have been proposed to improve the performance of deterministic methods, such as CG methods, gradient descent methods, Newton methods, etc. These new approaches are designed to deal with local optimization problems. See, for example, Refs. [16,17,18,19,20].
On the other hand, a plentiful number of stochastic approaches are suggested to deal with the global optimization problems. See, for example, Refs. [1,2,4,5,7,54].
Therefore, to gain the features of both deterministic and stochastic methods, many studies have presented ideas and suggestions for combining deterministic and stochastic techniques to obtain a new technique that is efficient and effective in solving Problem (2). Numerical outcomes have demonstrated that the hybridization of classical and stochastic techniques is hugely successful. See [55,56,57,58,59].
This work focuses on solving the local and global minimization problems. The first part of this study deals with Problem (1) by suggesting a new modified CG method, while the second part of this paper presents a new random approach that includes three formulae by which the candidate solutions are generated randomly.
Therefore, the new proposed stochastic approach is combined with the new modified CG method that is proposed in the first part of this paper to obtain a new hybrid stochastic conjugate gradient algorithm that solves Problem (2). The new hybrid stochastic conjugate gradient algorithm has four formulae by which the candidate solutions are created. One of the four formulae is a purely deterministic formula, the second one is a mixture of deterministic and stochastic parameters, and the other two formulas contain parameters generated randomly. The bottom line is that we can claim that the main merit that makes the new hybrid algorithm capable of finding the approximate solution to the global minimum of a non-convex function comes from the hybridization of random and non-random parameters.
Consequently, the contribution of this paper is divided into two parts.
Part I presents the following contributions.
  • A new modified CG technique is proposed and combined with a line search to obtain a globally convergent algorithm that solves Problem (1). It is abbreviated as SHZ.
  • The convergence analysis of the SHZ algorithm is designed.
  • The gradient vector is estimated by using a numerical approximation approach (DFF); the step size h (finite-difference interval) is chosen randomly.
  • The convergence analysis of the DFF method is designed.
  • The four FR, HS, HZ and MHZ methods are designed like the SHZ algorithm to solve Problem (1).
  • Numerical experiments of the five SHZ, FR, HS, HZ and MHZ algorithms are analyzed by using the performance profiles.
Part II presents the following contributions.
  • A stochastic parameters (SP) technique is designed.
  • The five SHZ, FR, HS, HZ and MHZ algorithms are hybridized with the SP technique to obtain five hybrid algorithms: HSSHZ, HSFR, HSSH, HSHZ and HSMHZ. These five algorithms solve Problem (2).
  • Numerical experiments of the five HSSHZ, HSFR, HSSH, HSHZ and HSMHZ algorithms are analyzed by using the performance profiles.
Consequently, the remainder of the study is arranged as follows.
Part I contains the following sections: Section 2 presents the new modified CG-SHZ technique with its convergence analysis.
In Section 3, the approximate value of the gradient vector is calculated by using the numerical differentiation. Section 4 presents the numerical investigations of the local minimization problem. Part II contains the following sections: Section 5 presents a random approach for unconstrained global optimization. Section 6 presents the hybridization of the conjugate gradient method with stochastic parameters. The numerical experiments of Problem (2) are presented in Section 7. Some concluding remarks are given in Section 8.

Part I: Local Minimization Problem

In this part, a new modified CG technique is presented, the convergence analysis of this technique is designed, the numerical differentiation approach is utilized to calculate the approximate values of the first derivative, the five algorithms are designed to solve Problem (1), and their numerical experiments are analyzed by using the performance profiles.

2. Suggested CG Method

Recently, the authors of [49] suggested a new MHZ-CG method, relying on the study which was proposed by the authors of [30]. The  MHZ method contains the sufficient descent and the trust-region features independent of a line search technique. The parameter of the MHZ is defined by (12).
Therefore, the story in this section begins with the authors of [30] who proposed a new CG-HZ method, where the parameter of the HZ method is defined by (11). The parameter β k H Z can ensure that d k satisfies the following inequality:
$$d_k^T g_k \le -\frac{7}{8} \|g_k\|^2,$$
where (20) is proved in [30]. If the step size $\alpha_k$ is calculated by an exact line search, then $\beta_k^{HZ}$ reduces to the $\beta_k^{HS}$ proposed by [39], because $d_{k-1}^T g_k = 0$ holds [49].
Hence, to obtain global convergence for a general function, Hager and Zhang [30] dynamically adjusted the lower bound of $\beta_k^{HZ}$ by
$$d_k = -g_k + \beta_k^{HZ+} d_{k-1}, \quad d_0 = -g_0,$$
$$\beta_k^{HZ+} = \max\{\beta_k^{HZ}, r_k\}, \quad r_k = \frac{-1}{\|d_{k-1}\| \min\{r, \|g_{k-1}\|\}},$$
where $r > 0$ is a constant.
Many researchers have suggested several modifications and refinements to improve the performance of the CG-HZ algorithm. The latest version of the CG-HZ method was offered by [49]. Yuan et al. [49] presented some modifications to the HZ-CG method, and the result was obtaining the new CG-MHZ algorithm.
The CG-MHZ algorithm contains a sufficient condition and the trust-region feature.
The search direction of the MHZ-CG technique is designed as follows:
$$d_k = -g_k + \beta_k^{MHZ} d_{k-1}, \quad d_0 = -g_0,$$
where the β M H Z is defined by (12).
In this paper, the MHZ method is extended and modified to obtain a new proposed method called the SHZ method such that the SHZ method has a sufficient condition and the trust-region feature. This method is defined as follows:
$$d_k = -g_k + \beta_k^{SHZ} d_{k-1}, \quad d_0 = -g_0,$$
$$\beta_k^{SHZ} = \frac{(y_k^T g_k)(d_{k-1}^T y_k) - 2\|y_k\|^2 (d_{k-1}^T g_k)}{\max\{\vartheta \|y_k\|^2 \|d_k\|^2,\; (d_{k-1}^T y_k)^2\}},$$
where $\vartheta = \max\{\rho, R_k\}$; $\rho$ and $R_k$ are defined as follows. The parameter $\rho$ is changed randomly at each iteration, with its values taken from the range $[0.8, 2)$, and $R_k = \frac{\Delta f}{\Delta x}$. The values of $\Delta f$ and $\Delta x$ are calculated by
$$\Delta f = |f_0 - f_{Itr}|,$$
where Itr is the number of iterations; after Itr iterations, $f_{Itr}$ and $\Delta f$ are computed, and we then set $f_0 = f_{Itr}$. $\Delta x$ is defined by
$$\Delta x = \|x_{k+1} - x_k\|, \quad \text{for } k = 0, 1, \ldots, Itr.$$
Hence, when $\vartheta = \sigma$, $\beta_k^{SHZ}$ inevitably reduces to one of the methods $\{\beta_k^{MHZ}, \beta_k^{HZ}, \beta_k^{HS}\}$, as follows.
If $\vartheta = \sigma$ and $\vartheta \|y_k\|^2 \|d_k\|^2 > (d_{k-1}^T y_k)^2$, then $\beta_k^{SHZ}$ reduces to $\beta_k^{MHZ}$. Otherwise, $\beta_k^{SHZ}$ reduces to $\beta_k^{HZ}$, or to $\beta_k^{HS}$ under the exact line search [49]. This procedure gives the advantages of the MHZ, HZ and HS methods to the proposed SHZ method; in other words, the SHZ algorithm inherits the characteristics of the three MHZ, HZ and HS algorithms, which is why it can outperform the four other methods, MHZ, HZ, HS and FR.
Note: The authors of [49] required $\sigma > 0.5$ to be a constant, while the parameter $\vartheta$ here is adjusted dynamically at each iteration.

Convergence Analysis of Algorithm 1

In this section, we present the features of Algorithm 1. We also present the convergence analysis of this algorithm, and we show that the search direction d k that is defined by Formula (23) satisfies the sufficient descent condition and the trust-region merit, which are defined by Formulae (13) and (14), respectively.
Algorithm 1 A conjugate gradient method (CG-SHZ).
Input: $f: \mathbb{R}^n \to \mathbb{R}$, $f \in C^1$, $\gamma \in (0, 1)$, $k = 0$, a starting point $x_k \in \mathbb{R}^n$ and $\varepsilon > 0$.
Output: $x^* = x_{loc}$, the local minimizer of f, and $f(x^*)$, the value of f at $x^*$.
1: Set $d_0 = -g_0$ and $k := 0$.
2: while $\|g_k\| > \varepsilon$ do
3:     Compute $\alpha_k$ to satisfy (16) and (17).
4:     Calculate a new point $x_{k+1} = x_k + \alpha_k d_k$.
5:     Compute $f_k = f(x_{k+1})$, $g_k = g(x_{k+1})$.
6:     Set $k = k + 1$.
7:     Calculate the search direction $d_k$ by (23).
8: end while
9: return $x^*$, the local minimizer, and its function value $f(x^*)$.
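The loop above can be sketched in code. The following is a simplified Python sketch we add for illustration, not the paper's implementation: an Armijo backtracking line search stands in for the Wolfe conditions (16)–(17), the parameter `theta` stands in for the paper's randomly updated $\vartheta = \max\{\rho, R_k\}$ and is held fixed, and an analytic gradient is used instead of the finite-difference approximation of Section 3:

```python
import numpy as np

def cg_shz(f, grad, x0, theta=0.9, eps=1e-6, max_iter=2000):
    """Simplified sketch of Algorithm 1 (CG-SHZ); see the hedges above."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:         # stopping rule of Algorithm 1
            break
        # Backtracking Armijo search for the step size alpha_k
        alpha, fx, slope = 1.0, f(x), g @ d
        while f(x + alpha * d) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        y = g_new - g                        # gradient difference
        # SHZ update parameter, Eq. (24), with d playing d_{k-1}
        denom = max(theta * (y @ y) * (d @ d), (d @ y) ** 2)
        if denom == 0.0:
            break
        beta = ((y @ g_new) * (d @ y) - 2.0 * (y @ y) * (d @ g_new)) / denom
        d = -g_new + beta * d                # Eq. (23)
        x, g = x_new, g_new
    return x, f(x)
```

With `theta >= 0.8`, Lemma 1 below guarantees that every `d` computed this way is a sufficient-descent direction, so the backtracking loop always terminates.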
Two sensible hypotheses are assumed as follows.
Hypothesis 1.
We suppose that Problems (1) and (2) have an objective function $f(x)$ that is continuous and differentiable.
Hypothesis 2.
In some neighborhood ℵ of the level set
$$\mathcal{L} = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\},$$
the gradient vector $g(x)$ is Lipschitz continuous. This means that there is a fixed real number $L < \infty$ such that
$$\|g(x) - g(y)\| \le L \|x - y\|,$$
for all $x, y \in$ ℵ.
Lemma 1.
Suppose that the sequence $\{x_k\}$ is obtained by Algorithm 1. If $d_k^T y_k \ne 0$, then
$$g_k^T d_k \le -c \|g_k\|^2,$$
and
$$\|d_k\| \le r_v \|g_k\|,$$
where $c = 1 - \frac{7}{9\vartheta} > 0$, $\vartheta = \max\{\rho, R_k\}$, $\rho$ is taken randomly from $[\frac{8}{10}, 2)$ at each iteration of Algorithm 1, $0 \le R_k < \infty$, and $r_v = 1 + \frac{3}{\vartheta}$ is the trust-region radius.
Proof. 
If $k = 0$, $d_0 = -g_0$, then $g_0^T d_0 = -\|g_0\|^2$ and $\|d_0\| = \|g_0\|$, which indicates (27) and (28) by picking $c \in (0, 1]$ and $r_v \in [1, \infty)$.
Merging (23) with (24), the result is the following:
$$g_k^T d_k = \frac{(y_k^T g_k)(d_{k-1}^T y_k)(g_k^T d_{k-1}) - 2\|y_k\|^2 (g_k^T d_{k-1})^2}{\max\{\vartheta \|y_k\|^2 \|d_{k-1}\|^2,\; (d_{k-1}^T y_k)^2\}} - \|g_k\|^2.$$
The inequality $u^T v \le \frac{1}{2}(\|u\|^2 + \|v\|^2)$ is applied to the first term of the numerator of Inequality (29), where $u = (g_k^T y_k)\, d_{k-1}$ and $v = (g_k^T d_{k-1})\, y_k$; clearly, $u^T v \le \frac{7}{9}(\|u\|^2 + \|v\|^2)$ also holds.
Therefore, the following inequalities are obtained:
$$g_k^T d_k \le -\|g_k\|^2 + \frac{\frac{7}{9}\|y_k\|^2 \|g_k\|^2 \|d_{k-1}\|^2 + \frac{7}{9}\|y_k\|^2 (g_k^T d_{k-1})^2 - 2\|y_k\|^2 (g_k^T d_{k-1})^2}{\max\{\vartheta \|y_k\|^2 \|d_{k-1}\|^2,\; (d_{k-1}^T y_k)^2\}}$$
$$= -\|g_k\|^2 + \frac{\frac{7}{9}\|y_k\|^2 \|g_k\|^2 \|d_{k-1}\|^2 - \frac{11}{9}\|y_k\|^2 (g_k^T d_{k-1})^2}{\max\{\vartheta \|y_k\|^2 \|d_{k-1}\|^2,\; (d_{k-1}^T y_k)^2\}}$$
$$\le -\|g_k\|^2 + \frac{\frac{7}{9}\|y_k\|^2 \|g_k\|^2 \|d_{k-1}\|^2}{\max\{\vartheta \|y_k\|^2 \|d_{k-1}\|^2,\; (d_{k-1}^T y_k)^2\}} \le \left(\frac{7}{9\vartheta} - 1\right)\|g_k\|^2,$$
such that
$$\max\{\vartheta \|y_k\|^2 \|d_{k-1}\|^2,\; (d_{k-1}^T y_k)^2\} \ge \vartheta \|y_k\|^2 \|d_{k-1}\|^2,$$
where $\vartheta = \max\{\rho, R_k\}$. Since $\vartheta \ge \frac{8}{10}$ and $c = 1 - \frac{7}{9\vartheta} > 0$, (27) is true.
By using (30), it is obvious that
$$\|d_k\| = \left\| -g_k + \frac{(y_k^T g_k)(d_{k-1}^T y_k) - 2\|y_k\|^2 (d_{k-1}^T g_k)}{\max\{\vartheta \|y_k\|^2 \|d_{k-1}\|^2,\; (d_{k-1}^T y_k)^2\}}\, d_{k-1} \right\| \le \|g_k\| + \frac{\|y_k\|^2 \|g_k\| \|d_{k-1}\|^2 + 2\|y_k\|^2 \|g_k\| \|d_{k-1}\|^2}{\vartheta \|y_k\|^2 \|d_{k-1}\|^2} = \left(1 + \frac{3}{\vartheta}\right)\|g_k\|.$$
Consequently, (28) is met, where $r_v \in \left[1 + \frac{3}{\vartheta}, \infty\right)$. The proof is complete.    □
Corollary 1.
According to Formula (28) of Lemma 1, the following formula is met.
$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} = \infty.$$
Proof. 
Since $\|d_k\| \le r_v \|g_k\|^2$, where $1 < r_v < \infty$, then $\|d_k\|^2 \le r_v^2 \|g_k\|^4$; therefore, $\frac{\|d_k\|^2}{\|g_k\|^4} \le r_v^2$; hence, $\frac{\|g_k\|^4}{\|d_k\|^2} \ge \frac{1}{r_v^2}$. Now, the final expression is summed as $k \to \infty$, which gives the following inequality: $\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} \ge \sum_{k=0}^{\infty} \frac{1}{r_v^2} = \infty$. Therefore, (31) is met.    □
Under the above hypotheses, we give a helpful lemma that was originally proved by Zoutendijk [60] and Wolfe [61,62].
Lemma 2.
Assume that $x_0$ is an initial point for which Hypothesis 1 is satisfied. For any method of the form (23) in which $d_k$ is a descent direction and $\alpha_k$ satisfies the standard Wolfe conditions (16) and (17), the following inequality is met:
$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty.$$
Proof. 
It follows from Formula (17) that
$$d_k^T y_k = d_k^T (g_{k+1} - g_k) \ge (\sigma - 1) g_k^T d_k.$$
On the other hand, the Lipschitz condition (19) implies
$$(g_{k+1} - g_k)^T d_k \le \alpha_k L \|d_k\|^2.$$
The above two inequalities give
$$\alpha_k \ge \frac{\sigma - 1}{L} \cdot \frac{g_k^T d_k}{\|d_k\|^2},$$
which with (16) implies that
$$f_k - f_{k+1} \ge c\, \frac{(g_k^T d_k)^2}{\|d_k\|^2},$$
where $c = \frac{\delta(1 - \sigma)}{L}$. By summing (36) and observing that f is bounded below, we see that (32) holds, which concludes the proof.    □
Theorem 1.
Suppose that Hypotheses 1 and 2 hold, and by utilizing the outcome of Corollary 1, the sequence { g k } that is generated by Algorithm 1 satisfies the following:
$$\liminf_{k \to \infty} \|g_k\| = 0.$$
Proof. 
By contradiction, suppose that (37) is not true; then, for some ϵ > 0 , the following inequality is true:
$$\|g_k\| \ge \epsilon.$$
Hence, with Inequalities (38) and (27), we obtain
$$-g_k^T d_k \ge c \|g_k\|^2 \ge c \epsilon^2.$$
Then, we have
$$\frac{-g_k^T d_k}{\|d_k\|} \ge \frac{c \epsilon^2}{\|d_k\|}; \qquad \frac{(g_k^T d_k)^2}{\|d_k\|^2} \ge \frac{c^2 \epsilon^4}{\|d_k\|^2},$$
and by summing the final expression, we obtain
$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} \ge \sum_{k=0}^{\infty} \frac{c^2 \epsilon^4}{\|d_k\|^2} = \infty.$$
Therefore, the above leads to a contradiction with (32). So, (37) is met.    □
Note 1: The search direction d k that is defined by Formula (23) satisfies the sufficient descent condition which is defined by Formula (13).
Note 2: Lemma 1 guarantees that Algorithm 1 has a sufficient descent property and the trust-region feature automatically.
Note 3: Theorem 1 confirms that the sequence $\{\|g_k\|\}$ obtained by Algorithm 1 approaches 0 as $k \to \infty$.
In the next section, the numerical differentiation approach is discussed by which the first derivative is estimated and the step size α k is computed.

3. Numerical Differentiation

We now turn our attention to the numerical approximation used to compute the approximate value of the gradient vector. In principle, it is possible to find an analytic form for the first derivative of any continuous and differentiable function. However, in some cases, the analytic form is very complicated. The numerical approximation of the derivative may be sufficient for some purposes.
In this paper, the values of $\alpha_k$, $g_k$ and the direction $d_k$ are computed by using the numerical differentiation method. Moreover, we have another step size and search directions that are generated randomly.
Several suggested methods have given fair outcomes for computing the gradient vector values numerically. See [63,64,65,66,67].
The common approaches by which the first derivative is computed are the finite difference approximation methods. Therefore, the first derivative f ( x ) can be estimated by the following numerical differentiation formula:
$$D_f f(x_i) = \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} = \frac{f(x_i + h) - f(x_i)}{h},$$
where h is finite and small, but not necessarily infinitesimally small.
Reasonably, if the value of the h is small, the approximated value of the first derivative may improve. The forward difference and the central difference are the familiar and common methods used in many studies; see for example, [68,69,70,71,72].
The Taylor series can be used to derive these formulas. Thus, 3, 4 or 5 points can be utilized to derive higher-order formulas, but this is more costly than utilizing 2 points. The central difference method is known to balance accuracy and precision [73], but it needs 2n function evaluations, against n function evaluations per iteration for the forward-difference approximation approach. So, in this study, the forward-difference approximation approach is used, because it is a cheap method and it has sensible precision [66,68].
The success of the finite difference approximation approaches relies on choosing suitable values of h.
Error approximation of the first derivative is discussed in the next section.
Therefore, the discussion of the error analysis guides us to define an appropriate finite-difference interval for the forward-difference approximation that balances the truncation error that grows from the error in the Taylor formula, and the magnitude error that is obtained from noise during computing the function values [66].
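The coordinate-wise forward-difference estimate of the gradient can be sketched as follows. This is an illustrative sketch we add (the helper name `fd_gradient` is ours); the fixed default h is for demonstration only, whereas the paper draws h randomly per iteration, as described in Section 3.2:

```python
import numpy as np

def fd_gradient(f, x, h=1e-7):
    """Forward-difference gradient estimate, Eq. (41) applied
    coordinate-wise; costs n extra function evaluations per gradient."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.empty(x.size)
    for i in range(x.size):
        step = np.zeros(x.size)
        step[i] = h                      # perturb only coordinate i
        g[i] = (f(x + step) - fx) / h
    return g
```

A central-difference variant would replace each quotient by $(f(x + step) - f(x - step)) / (2h)$, doubling the evaluation count to 2n as noted above.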

3.1. Error Analysis

Formula (41) contains the forward-difference approximation form that is used to estimate the first derivative of the function f. Its error is proportional to some power of the value of h, so it appears that the error keeps decreasing as h is reduced. However, this is only part of the picture, since it accounts only for the truncation error produced by truncating the higher-order terms in the Taylor series expansion and does not take into account the round-off error induced by quantization. The round-off error arises alongside the truncation error; both are discussed in this section as follows.
For this purpose, suppose that the function values $f(x)$ and $f(x+h)$ are quantized to $\theta_1 = f(x+h) + \epsilon_1$ and $\theta_0 = f(x) + \epsilon_0$, with the sizes of the round-off errors $\epsilon_1$ and $\epsilon_0$ both smaller than some positive number $\varepsilon$; that is, $|\epsilon_j| \le \varepsilon$, with $j = 0, 1$.
Hence, the total error of the forward-difference approximation defined by (41) is derived by
$$D_f f(x) = \frac{\theta_1 - \theta_0}{h} = \frac{f(x+h) + \epsilon_1 - f(x) - \epsilon_0}{h} = f'(x) + \frac{\epsilon_1 - \epsilon_0}{h} + \frac{T_f}{2} h.$$
Hence,
$$|D_f f(x) - f'(x)| \le \left|\frac{\epsilon_1 - \epsilon_0}{h}\right| + \frac{|T_f|}{2} h \le \frac{2\varepsilon}{h} + \frac{|T_f|}{2} h,$$
with $T_f = f''(x)$. Therefore, the upper bound of the error is given by the right-hand side of Formula (43). This bound contains two terms: the first comes from the rounding error and is inversely proportional to the step size h, while the second comes from the truncation error and is directly proportional to h. The two parts can be written as a function of h, $\phi(h) = \frac{2\varepsilon}{h} + \frac{|T_f|}{2} h$. Now, if we find the minimizer $h^*$ of the function $\phi(h)$, then the value $\phi(h^*)$ is the upper bound of the total error. Setting $\frac{d\phi(h)}{dh} = -\frac{2\varepsilon}{h^2} + \frac{|T_f|}{2} = 0$ yields
$$h^* = 2\sqrt{\frac{\varepsilon}{|T_f|}} = 2\sqrt{\frac{\varepsilon}{|f''(x)|}}.$$
Therefore, it can be concluded that as we create small values of h, the round-off error might grow, whilst the truncation error reduces. It is called the “step-size dilemma”.
Consequently, there must be an optimal value $h^*$ for the forward difference approximation formula, as derived analytically in (44). However, Formula (44) is only of theoretical value and cannot be used practically to determine $h^*$, because we have no information about the second derivative and therefore cannot estimate the value of $T_f$.
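The step-size dilemma can be observed numerically. The following sketch is an illustration we add, using $f = \exp$ at $x = 0$ (so $|f''| = 1$) and assuming the quantization noise $\varepsilon$ is at the level of machine epsilon; it compares the forward-difference error (43) at a large h, a tiny h, and near the theoretical optimum (44):

```python
import math

def fd_error(h, x=0.0):
    """Absolute error of the forward difference (41) for f = exp,
    whose exact derivative at x is exp(x)."""
    approx = (math.exp(x + h) - math.exp(x)) / h
    return abs(approx - math.exp(x))

# The bound (43) is minimized near h* = 2*sqrt(eps / |f''|), Eq. (44):
# a large h inflates the truncation term, a tiny h the round-off term.
h_star = 2.0 * math.sqrt(2.2e-16 / 1.0)   # |f''(0)| = 1 for exp
```

In this setting the error at $h \approx 10^{-8}$ (near $h^* \approx 3 \times 10^{-8}$) falls well below the truncation-dominated error at $h = 10^{-1}$ and the round-off-dominated error at $h = 10^{-12}$, which is exactly the trade-off behind (44).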
Therefore, there are many approaches which have been presented to deal with the step-size dilemma.
Recently, Shi et al. [66] proposed a bisection search for finding a finite-difference interval for a finite-difference method. Their approach was presented to balance the truncation error that grows from the error in the Taylor formula and the measurement error obtained from noise in the function evaluation. According to their numerical experience, the finite-difference interval $h^*$ is bounded within the ranges $[2 \times 10^{-4}, 6.32 \times 10^{-1}]$, $[2.72 \times 10^{-4}, 8.26 \times 10^{0}]$ and $[8.44 \times 10^{-3}, 3.94 \times 10^{0}]$ when using the forward and central differences to estimate the values of the first derivative of f.
Additionally, the authors of [68] gave a study of the theoretical and practical comparison of the approximate values of the gradient vector in derivative-free optimization. These authors analyzed some approaches for approximating gradients of noisy functions utilizing only function values; those techniques include a finite difference.
The values of the finite difference interval reported there satisfy 10⁻⁸ ≤ h* ≤ 1.
According to the earlier investigations, the core difference between all approaches is how the step size h is determined; across them, the step size ranges over the interval h* ∈ [10⁻¹⁰, 1].
In this paper, h is designed so that its values are generated randomly. Moreover, the values of h are linked to the function values at each iteration so as to cover this domain; the key feature is therefore that h is updated randomly at every iteration.
A new approach for determining h* is presented in the following section.

3.2. Selecting a Step-Size h

The forward difference approach is cheap compared to other techniques.
It has also shown promising results for minimizing noisy black-box functions [66].
Under the assumptions listed in Section 2, let x_0 be any starting point; the function f then satisfies f_0 ≥ f_1 ≥ ⋯ ≥ f_k, for k = 0, 1, 2, …. The numerical results reported in past papers indicate that the values of the step size h belong to the range [10⁻¹⁰, 1].
Therefore, Algorithm 2 below is created to generate the values of h* randomly from the interval [10⁻⁸, 0.1].
Algorithm 2 Algorithm for calculating the values of h * .
Step 1: At each iteration k, we generate a set of ten random values between 10⁻⁷ and 10⁻², denoted by L_ϵ = {l_ϵ1, l_ϵ2, …, l_ϵ10}.
Step 2: The minimum and maximum of the set L_ϵ are extracted as M_ϵ = min{l_ϵi : i = 1, 2, …, 10} and N_ϵ = max{l_ϵi : i = 1, 2, …, 10}, respectively, and we set M_f = 1/M_ϵ.
Step 3: The function value f is calculated at each k; f k = f ( x k ) .
Now we distinguish two cases according to the magnitude of the function value |f_k|, as follows.
Case 1: If |f_k| ∈ [10⁻¹, ∞), the value of h is determined by
h_k = 2√(M_ϵ/M_f) if |f_k| > M_f, and h_k = 2√(M_ϵ/|f_k|) otherwise.
Case 2: If |f_k| ∈ [0, 10⁻¹), the value of h is drawn randomly from the interval [10⁻⁸, 10⁻⁴].
Example: In this example, we show how the above algorithm is run.
Let us suppose that the starting point x_0 takes four different values, with four corresponding function values, for example f_0 = f(x_0) ∈ {10¹⁰, 10⁶, 10³, 10⁻¹}, and suppose we generate the set L_ϵ as random values between 10⁻⁷ and 10⁻¹ such that L_ϵ = {1.50×10⁻⁴, 5.10×10⁻⁶, 1.01×10⁻⁶, 1.40×10⁻², 1.78×10⁻⁷, 1.92×10⁻⁵, 1.09×10⁻³, 2.77×10⁻⁴, 2.99×10⁻⁴, 5.15×10⁻⁴}. Then M_ϵ = 1.78×10⁻⁷ and hence M_f = 1/M_ϵ = 5.618×10⁶. Since f_0 = 10¹⁰ > M_f = 5.618×10⁶, we set F_0 = M_f = 5.618×10⁶ and h_1 = 2√(M_ϵ/M_f) = 2√(1.78×10⁻⁷/5.618×10⁶) = 3.56×10⁻⁷. If f_0 = 10⁶, then f_0 = 10⁶ < M_f = 5.618×10⁶, so we set F_0 = 10⁶ and h_1 = 2√(M_ϵ/F_0) = 2√(1.78×10⁻⁷/10⁶) = 8.438×10⁻⁷. If f_0 = 10³ < M_f = 5.618×10⁶, we set F_0 = 10³, and then h_1 = 2√(M_ϵ/F_0) = 2√(1.78×10⁻⁷/10³) = 2.6683×10⁻⁵.
Finally, if f_0 = 10⁻¹, then h_1 = 2√(1.78×10⁻⁷/10⁻¹) = 2.67×10⁻³.
The above example shows how Case 1 is implemented by using Formula (45).
Regarding Case 2, when 0 ≤ |f_k| < 0.1, the value of h_k is drawn randomly from the interval [10⁻⁸, 10⁻⁴].
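Algorithm 2 can be sketched in Python as follows. This is a reconstruction for illustration only: the log-uniform sampling of L_ϵ and the branch formulas h_k = 2√(M_ϵ/M_f) and h_k = 2√(M_ϵ/|f_k|) are inferred from the worked example above, so the names and details are assumptions rather than the authors' code.

```python
import math
import random

# Sketch of Algorithm 2 (a reconstruction; the sampling scheme and the
# branch formulas are inferred from the worked example, not taken from
# the authors' implementation).
def step_size(fk, n_samples=10):
    # Step 1: a set of random values between 1e-7 and 1e-2
    # (log-uniform sampling is an assumption).
    L_eps = [10.0 ** random.uniform(-7, -2) for _ in range(n_samples)]
    # Step 2: minimum of the sample and the threshold M_f = 1/M_eps.
    M_eps = min(L_eps)
    M_f = 1.0 / M_eps
    # Case 2: tiny |f_k| -> draw h randomly from [1e-8, 1e-4].
    if abs(fk) < 1e-1:
        return 10.0 ** random.uniform(-8, -4)
    # Case 1: cap |f_k| at M_f and scale h to the capped magnitude.
    F_k = M_f if abs(fk) > M_f else abs(fk)
    return 2.0 * math.sqrt(M_eps / F_k)

for fk in (1e10, 1e6, 1e3, 1e-1, 1e-3):
    print(f"|f_k| = {fk:.0e}  ->  h = {step_size(fk):.3e}")
```

As in the worked example, larger function values yield smaller intervals h, and h changes at every iteration because L_ϵ is resampled.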

3.3. Estimating Gradient Vector

The forward finite difference (DFF) is utilized to compute the approximate value of the gradient vector of function f at x R n by
[DFF]_i = (f(x + h e_i) − f(x))/h, for i = 1, 2, …, n,
where h > 0 is the finite difference interval defined in Section 3.2, and e_i ∈ R^n is the i-th column of the identity matrix.
Therefore, g(x) ≈ DFF(x) is the approximate value of the gradient vector of the function f at the point x.
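For concreteness, Formula (46) can be written as a short Python routine (a minimal sketch; the paper's implementation is in MATLAB, and here h is passed in as a fixed number rather than produced by Algorithm 2):

```python
import numpy as np

# Minimal sketch of the DFF estimator (46): forward differences, one extra
# function evaluation per coordinate.
def dff(f, x, h):
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0                      # e_i: i-th column of the identity
        g[i] = (f(x + h * e) - fx) / h  # [DFF]_i = (f(x + h e_i) - f(x)) / h
    return g

# Example on R2(x) = 100(x1^2 - x2)^2 + (x1 - 1)^2 at the point [2, -1],
# where the exact gradient is [4002, -1000].
r2 = lambda x: 100.0 * (x[0] ** 2 - x[1]) ** 2 + (x[0] - 1.0) ** 2
print(dff(r2, [2.0, -1.0], 1e-7))
```

With a well-chosen h, the estimate agrees with the exact gradient to several digits while using only n + 1 function values.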
Therefore, the step size φ k is defined in the following.
The function f ( x ) is estimated by utilizing Taylor’s expansion up to the linear term around the point x k , for each iteration k. Then we have
f(x_k + p) ≈ f(x_k) + g(x_k)^T p.
We define the quadratic model of f ( x ) at x k as
m_k(p) = ½[f(x_k) + g(x_k)^T p]² = ½ f(x_k)² + f(x_k) g(x_k)^T p + ½ p^T g(x_k) g(x_k)^T p.
Set p = −φ g(x_k), where φ is the step size along −g(x_k). The optimal value of φ is picked by solving the following subproblem: min_{φ∈R} m_k(φ) = ½ f(x_k)² − φ f(x_k) g(x_k)^T g(x_k) + ½ φ² (g(x_k)^T g(x_k))². This gives
φ_k = f(x_k)/‖g(x_k)‖².
Therefore,
‖g(x_k)‖² = f(x_k)/φ_k, φ_k ≠ 0,
where g(x_k) ≈ DFF(x_k).
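In code, the step size (47) and the identity (48) amount to one line each; the snippet below (illustrative only, with the DFF estimate from the Section 5 worked example plugged in as data) verifies the relationship:

```python
import numpy as np

# Sketch of Formula (47): phi_k = f(x_k) / ||g(x_k)||^2, with g replaced by
# its DFF estimate; identity (48) then recovers ||g||^2 = f(x_k) / phi_k.
def phi_step(fk, g):
    g = np.asarray(g, dtype=float)
    return fk / float(g @ g)

fk = 2501.0
g = np.array([3994.522, -1004.34])   # DFF estimate from the worked example
phi = phi_step(fk, g)
print(phi)                           # about 1.474e-4 (rounded to 0.0002 in the text)
assert np.isclose(float(g @ g), fk / phi)   # identity (48)
```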

3.4. Convergence Analysis of DFF

The condition which is usually utilized in the convergence analysis of first-order methods with inexact gradient (DFF) vectors is defined by
‖DFF(x) − g(x)‖ ≤ C ‖g(x)‖,
for some 0 ≤ C < 1. This condition was introduced by [74,75] and is called the norm condition. It guarantees that −DFF(x) is a descent direction for the function f [68].
However, condition (49) cannot be applied, unless we know g ( x ) ; therefore, this condition might be hard or impossible to verify.
There are many authors who have attempted to deal with this issue; see, for example, Refs. [68,76,77,78,79]. Byrd et al. [76] suggested a practical approach to estimate g ( x k ) , and they utilized it to guarantee some approximation of (49). Cartis and Scheinberg [77] and Paquette and Scheinberg [79] replaced condition (49) by
‖DFF(x) − g(x)‖ ≤ κ α_k ‖g(x)‖,
where κ > 0 and α_k is the line-search step size; convergence rate analyses were derived for a line search method that has access to deterministic function values in [77] and stochastic function values (under additional assumptions) in [79]. Berahas et al. [68] established conditions under which (49) holds. For the forward finite difference method (DFF), they set h* = 2√(Mε/L), where Mε bounds the noise in f and L is the Lipschitz constant of the gradient.
Therefore, we present the following
Theorem 2.
Under Assumptions 1 and 2 of Section 2, let DFF(x) denote the forward finite difference approximation to the gradient g(x). Then, for all x ∈ R^n, the following inequality holds:
‖DFF(x_k) − g(x_k)‖ ≤ (1/h) max_{1≤i≤n} |f(x_k + h e_i) − f(x_k)| + f(x_k)/φ_k, φ_k ≠ 0,
where the value of φ_k is estimated by (47). We know that ‖X‖_∞ and ‖X‖ are the infinity norm and the 2-norm, respectively, and they are defined by
‖X‖_∞ = max_{1≤i≤n} |x_i|,
‖X‖ = √(∑_i x_i²),
and then
‖X‖_∞ = max_{1≤i≤n} |x_i| ≤ √(∑_i x_i²) = ‖X‖.
According to (46), which defines the gradient approximation by forward differences, the i-th component of DFF(x_k) is [DFF(x_k)]_i = (1/h)[f(x_k + h e_i) − f(x_k)], where i = 1, 2, …, n; then
‖DFF(x_k)‖_∞ = max_{1≤i≤n} |(f(x_k + h e_i) − f(x_k))/h| = (1/h) max_{1≤i≤n} |f(x_k + h e_i) − f(x_k)|.
By using (48), (51), (54) and (55), we obtain ‖DFF(x_k) − g(x_k)‖ ≤ ‖DFF(x_k)‖_∞ + ‖g(x_k)‖ ≤ (1/h) max_{1≤i≤n} |f(x_k + h e_i) − f(x_k)| + ‖g(x_k)‖² = (1/h) max_{1≤i≤n} |f(x_k + h e_i) − f(x_k)| + f(x_k)/φ_k, φ_k ≠ 0.
Therefore, the theorem holds.

4. Numerical Experiments of Part I

All experiments were run on a PC with an Intel(R) Core(TM) i5-3230M CPU @ 2.60 GHz and 4.00 GB of RAM on a Windows 10 operating system. The five methods were coded in MATLAB version 8.5.0.197613 (R2015a), and the machine epsilon was about 10⁻¹⁶.
The optimization test problems are categorized into two types. The first type contains convex functions, while the second type contains non-convex functions. Both kinds of test problems are listed in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8, where the second type is marked by *. Columns 1–4 of Table 1 give the data of the test problems as follows: the abbreviation of the function f is given in Column 1, the number of variables n is listed in Column 2, the exact function value f(x*) at the global point x* is presented in Column 3, and the exact value of the norm of the gradient ‖g(x*)‖ is given in Column 4, where the mark “−” denotes that the norm of the gradient for the convex function satisfies the stopping criterion ‖g(x*)‖ < 10⁻⁶. Columns 5–8 mirror Columns 1–4.
The data in Table 1 are taken from [56].
The numerical results for the local minimizers of all test problems are listed in Table 2, Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8. Columns 1–2 and 8–9 contain the abbreviation of the function f and the number of variables n, respectively. Columns 3–7 correspond to the five algorithms SHZ, MHZ, HZ, HS and FR and report, for each, the number of worst iterations, number of worst function evaluations, number of best iterations, number of best function evaluations, average CPU time, average number of iterations and average number of function evaluations. Columns 10–14 are similar to Columns 3–7.
Note 1: It is worth noting that the full name of each test function is given in Appendix A, together with the reference from which the test problem is taken.
Note 2: F denotes that the algorithm failed to find the local minimizer of the function f according to the stopping criteria of Algorithm 1, which are listed in Section 4.1 below.
The stopping criteria of Algorithm 1 are as follows.

4.1. Stopping Criteria of Algorithm 1

Since this section focuses on finding a local minimizer of all test problems, the stopping criteria of Algorithm 1 can be defined as follows.
According to the convergence analysis discussed in the previous sections, Algorithm 1 stops if ‖g(x_k)‖ ≤ ε₁ is satisfied, where ε₁ ∈ [10⁻⁸, 10⁻⁶]. However, the exact value of the gradient vector is unknown, since it is estimated by Formula (46); therefore, this condition is replaced by ‖DFF_k‖ ≤ ε₂ or FEs = n × 10⁴, i.e., Algorithm 1 stops if either of them is met, where ε₂ ∈ [10⁻⁹, 10⁻⁷], FEs denotes the maximum number of function evaluations and n is the number of variables of f.
In the following section, the performance profile is presented as an easy tool to compare the performance of our proposed method versus other methods in finding local minimizers of convex or non-convex functions regarding the worst and best numbers of iterations and function evaluations, the average of CPU time and the average of iterations and function evaluations, respectively.

4.2. Performance Profiles

The performance profile is the best tool for testing the performance of the proposed algorithms [80,81,82,83,84].
In this paper, the five algorithms' performance evaluation standards are as follows: the worst and best numbers of iterations and function evaluations, and the averages of the CPU time, iterations and function evaluations. They are abbreviated as itr.w, itr.be, FEs.w, FEs.be, time.a, itr.a and FEs.a, respectively. In the remainder of the paper, the set Fit will be used to denote these seven criteria: Fit = {itr.w, itr.be, FEs.w, FEs.be, time.a, itr.a, FEs.a}.
Therefore, the numerical outcomes are presented in the form of performance profiles, as described in [82]. The most important characteristic of performance profiles is that the different solvers can be compared in a single figure by plotting, for each solver, a cumulative distribution function ρ_s(τ).
The performance ratio is defined by first setting r_{p,s} = t_{p,s} / min{t_{p,s} : s ∈ S}, where p ∈ P, P is a set of test problems, S is the set of solvers, and t_{p,s} is the value obtained by solver s on test problem p.
Then, define ρ_s(τ) = (1/|P|) size{p ∈ P : r_{p,s} ≤ τ}, where |P| is the number of test problems.
The value of ρ s ( 1 ) is the probability that the solver will win over the remaining ones, i.e., it will yield a value lower than the values of the remaining ones.
In the following, the performance profiles are utilized to evaluate the performance of the five methods: SHZ, MHZ, HZ, HS and FR.
Therefore, in this paper, the term t_{p,s} indicates one element of the set Fit, and |P| = 46 is the number of test problems. We have 46 unconstrained test problems, 14 of which involve non-convex functions. The group of solvers S = {SHZ, MHZ, HZ, HS, FR} finds the local minimizers of the 46 test problems; therefore, the values of Fit are taken from the results of the 46 test problems as follows.
Each solver s of the set S is run 51 times on each of the 46 problems; at each run, every element of the set Fit receives a value. These values are analyzed in the following.
r_{p,s} = fit_{p,s} / min{fit_{p,s} : s ∈ S} if solver s succeeds in solving problem p, and r_{p,s} = ∞ otherwise,
where fit p , s is an element of the Fit for the test problem p by using the solver s.
Note: Formula (56) means that if the final result, obtained by a solver s S , satisfies Inequality (57), then the first branch of (56) is computed. Otherwise, we set r p , s = .
‖DFF_k‖ ≤ ε₂,
where ε₂ ∈ [10⁻⁹, 10⁻⁵].
Therefore, the performance profile of solver s is defined as follows:
δ(r_{p,s}, τ) = 1 if r_{p,s} ≤ τ, and δ(r_{p,s}, τ) = 0 otherwise.
The performance profile for solver s is then given by the following function:
ρ_s(τ) = (1/|P|) ∑_{p∈P} δ(r_{p,s}, τ), τ ≥ 1.
As we mentioned above, | P | = 46 and τ [ 1 , 60 ] .
By the definition of fit_{p,s}, ρ_s(1) denotes the fraction of test problems for which solver s performs best. In general, ρ_s(τ) can be interpreted as the probability, for solver s ∈ S, that the performance ratio r_{p,s} is within a factor τ of the best possible ratio. An essential characteristic of performance profiles is that they present data on the relative performance of several solvers [82,83].
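The profile computation (56)–(58) is mechanical; the following Python sketch (illustrative code with made-up toy data) computes ρ_s(τ) from a matrix of per-problem metric values, using ∞ for failed runs as in (56):

```python
import numpy as np

# Sketch of the Dolan-More performance profile: rows of `t` are problems,
# columns are solvers, np.inf marks a run that failed the success test.
def perf_profile(t, taus):
    t = np.asarray(t, dtype=float)
    best = t.min(axis=1, keepdims=True)     # min over solvers per problem
    r = t / best                            # performance ratios r_{p,s}
    # rho_s(tau) = fraction of problems with r_{p,s} <= tau
    return np.array([[float(np.mean(r[:, s] <= tau)) for s in range(t.shape[1])]
                     for tau in taus])

# Toy data: 3 problems, 2 solvers; solver 1 failed on problem 2.
t = np.array([[10.0, 20.0],
              [30.0, 15.0],
              [5.0, np.inf]])
rho = perf_profile(t, taus=[1.0, 2.0])
print(rho)   # rho_0(1) = 2/3, rho_1(1) = 1/3; at tau = 2, solver 0 solves all
```

Reading the result: at τ = 1 each ρ_s(1) is the fraction of problems a solver wins outright, and the curves' values at large τ give the fraction of problems each solver solves at all.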
The numerical outcomes of the five methods are analyzed by using the performance profiles as follows. Figure 1, Figure 2, Figure 3 and Figure 4 show the performance profiles of the set solvers S, for each element of the set Fit, respectively.
The performance profile depicted on the left of Figure 1 (in the term itr.w) compares the five techniques for a set of the 46 test problems.
The SHZ method has the best performance for the 46 test problems; this means that our suggested approach is capable of finding a local minimizer to the 46 test problems as fast as, or faster than, the other four approaches.
For instance, if τ = 1, the SHZ technique is capable of finding the local minimizer for 65% of the problems, versus 33%, 20%, 20% and 13% of the test set solved by the MHZ, HS, FR and HZ methods, respectively.
In general, for the term itr.w, τ = 60 shows that all test problems are solved by SHZ, against 96% of test problems solved by each of the MHZ, HZ and FR methods, while 93% of test problems are solved by the HS method. At τ ≥ 400, all test problems are solved by the MHZ, HZ and FR methods, while 98% of test problems are solved by HS.
The right graph of Figure 1 shows that the method SHZ is capable of finding the local minimum of all test problems regarding term FEs.w.
The rest of Figure 2, Figure 3 and Figure 4 show that the SHZ algorithm is superior to the four algorithms regarding the rest of the terms of the set Fit.
Therefore, the SHZ technique includes the characteristics of efficiency, reliability and effectiveness in solving Problem (1) compared to the other four methods.
Note: The power of the SHZ technique comes from the fact that the SHZ method gains the features of the four methods MHZ, HZ, HS and FR, as we mentioned in Section 2.

Part II: Global Minimization Problem

It is worth mentioning that the final results of Part I for the second set of test problems contain some global minimizers at some runs for some non-convex functions. This means that the pure CG technique, being a local method, could not find the global minimizer of the second type of test problems on every run.
Therefore, to make this method capable of solving Problem (2) on every run, a random technique is proposed and added to the CG approach to obtain a new hybrid SP-CG technique that solves Problem (2). In many studies, the numerical outcomes indicate that hybridizing a classical method with a random technique is very successful in overcoming the weaknesses of both methods. See [55,56,57,58,59].
Consequently, this part of the paper seeks to solve Problem (2).
Therefore, each method of the five CG methods mentioned in Part I is hybridized with the stochastic technique to obtain five algorithms to try to solve Problem (2).
In the next section, a stochastic technique is presented.

5. Random Technique

In this section, a new random parameter “SP” is presented. This stochastic technique contains three different formulas by which three different points are generated. This set of formulas is combined with the CG method to obtain a new algorithm that solves Problem (2).

Random Parameters (SP Technique)

Step 1: The first point is computed as follows. Generate a random vector V_k ∈ [−1, 1]^n and set γ_k = 10^{ψ_k}, ψ_k ∈ [0.01, 1), where the interval [0.01, 1) is divided into Itr fractions and, at every iteration k, the parameter ψ_k takes one of these values. Then compute the search direction with step lengths λ_k componentwise by λ_i = ((1 + γ_k)^{|V_i|}/γ_k) S_{V_i}, where i = 1, 2, …, n, n is the number of variables, Itr is the number of iterations, and S_{V_i} denotes the sign of V_i, defined by
S_{V_i} = −1 if V_i < 0, and S_{V_i} = 1 otherwise.
Thus, a  point is calculated as follows:
x 1 = x ac + λ k ,
where x_ac is the best point obtained so far, and then we compute f₁ = f(x₁).
Step 2: The second point is defined by
x 2 = x ac + η k B k ,
where B_k = φ_k d_k, φ_k is defined by (47), η_k ∈ (0, 2) is a random number, and d_k is defined by (23). Then, we compute f₂ = f(x₂).
Step 3: This point is defined by
x₃ = X_w + ½ Dx,
where Dx is computed componentwise as Dx_i = (((1 + μ_k)^{|V_i|} − 1)/(μ_k + 0.1)) S_{V_i}, μ_k = |f_ac|², f_ac is the function value at the accepted point x_ac, and X_w is a stochastic point picked from the feasible range of the objective function. This means that X_w ∈ [a, b]^n, where a and b are the lower and upper bounds of the feasible range, respectively, and the random vector V with its signs S_{V_i} is as defined in the first step.
Therefore, we calculate f 3 = f ( x 3 ) .
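The three steps can be sketched as one Python routine. This is an illustrative reconstruction: the function name, the seeded generator, and passing B_k = φ_k d_k in directly as a vector are assumptions made to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded for reproducibility (an assumption)

# Sketch of the SP technique: one candidate point per formula (61)-(63).
def sp_points(x_ac, f_ac, B_k, bounds, psi_k):
    n = x_ac.size
    V = rng.uniform(-1.0, 1.0, n)
    S = np.where(V < 0.0, -1.0, 1.0)          # sign vector of V
    gamma = 10.0 ** psi_k
    # (61): x1 = x_ac + lambda_k, lambda_i = ((1+gamma)^|V_i| / gamma) S_i
    x1 = x_ac + ((1.0 + gamma) ** np.abs(V) / gamma) * S
    # (62): x2 = x_ac + eta_k B_k, with eta_k random in (0, 2)
    x2 = x_ac + rng.uniform(0.0, 2.0) * B_k
    # (63): x3 = X_w + Dx/2, X_w random in the feasible box [a, b]^n
    mu = f_ac ** 2
    Dx = (((1.0 + mu) ** np.abs(V) - 1.0) / (mu + 0.1)) * S
    a, b = bounds
    x3 = rng.uniform(a, b, n) + 0.5 * Dx
    return x1, x2, x3

# Data of the worked example below: x_ac = [2, -1], f_ac = 2501, B = phi*d.
x1, x2, x3 = sp_points(np.array([2.0, -1.0]), 2501.0,
                       np.array([-0.799, 0.201]), (-5.0, 10.0), psi_k=0.406)
print(x1, x2, x3)
```

Note how x₁ and x₂ perturb the incumbent point, while x₃ restarts from a random point of the feasible box; this is what lets the hybrid method leave a local basin.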
For finding the global minimizer of a non-convex function, the above stochastic technique is used, since Algorithm 1 is not capable of finding the global solution at each run. In other words, in some runs, Algorithm 1 fails to find the global solution of such a function because it gets stuck at a local point.
In the following example, we show how the SP algorithm is run.
Example: This example shows how the three steps of the SP algorithm are implemented.
We use the first test problem from the list in Appendix A, R₂(x) = 100(x₁² − x₂)² + (x₁ − 1)². To facilitate the explanation of the mechanism of the SP algorithm (Formulas (61)–(63)), we use the following simple data for the function R₂(x): n = 2 is the number of variables, and x_ac = [2; −1] or x_ac = [2; 1], where x_ac represents the best solution accepted so far or the starting point; hence, the function values at the two points are R₂(x_ac) = 100(2² + 1)² + (2 − 1)² = 2500 + 1 = 2501 and R₂(x_ac) = 100(2² − 1)² + (2 − 1)² = 900 + 1 = 901.
Supposing Itr = 5 is the number of iterations, the interval [0.01, 1) is divided into five fractions with step size (1 − 0.01)/5 = 0.198, so the set of fractions is A = {0.01, 0.208, 0.406, 0.604, 0.802}. Let k = 3, which means the algorithm is at the third iteration. Then ψ₃ = 0.406 and γ₃ = 10^{ψ₃} = 10^{0.406} = 2.5468. Let V₃ = [−0.5; 1]; then λ₃ = [((1 + 2.5468)^{|−0.5|}/2.5468) × (−1); ((1 + 2.5468)^{|1|}/2.5468) × 1] = [−1.8833/2.5468; 3.5468/2.5468] = [−0.73948; 1.3926].
Therefore, the new solution is computed by Formula (61) as follows.
x₁ = x_ac + λ₃ = [2; −1] + [−0.73948; 1.3926] = [1.2605; 0.3926] or x₁ = x_ac + λ₃ = [2; 1] + [−0.73948; 1.3926] = [1.2605; 2.3926].
The function values at both points are as follows.
R₂(x₁) = 100(1.2605² − 0.3926)² + (1.2605 − 1)² = 143.1 + 0.06786 = 143.17 or R₂(x₁) = 100(1.2605² − 2.3926)² + (1.2605 − 1)² = 64.6 + 0.06786 = 64.668.
Therefore, R₂(x₁) < R₂(x_ac); this means the solution generated by Formula (61) reduces the function value.
In the following, we explain how the candidate solution is generated by Formula (62).
Let M_ϵ = 1.2×10⁻⁶. By using Formula (45), we obtain h₃ = 4.381×10⁻⁵ as the step size h (a random interval) for the difference approximation method, and then we have x_h1 = [x_ac(1) + h₃; x_ac(2)] = [2 + 4.381×10⁻⁵; −1] and x_h2 = [x_ac(1); x_ac(2) + h₃] = [2; −1 + 4.381×10⁻⁵].
Therefore, the values of the function at the three points x_ac, x_h1 and x_h2 are as follows:
R₂(x_ac) = 2501, R₂(x_h1) = 2501.175 and R₂(x_h2) = 2500.956.
We compute the approximate value of the gradient vector by Formula (46) as follows:
DFF(x_ac) = [(2501.175 − 2501)/(4.381×10⁻⁵); (2500.956 − 2501)/(4.381×10⁻⁵)] = [3994.522; −1004.34],
and φ₃ = 2501/‖DFF‖² = 0.0002, where φ₃ is defined by (47).
We take d₃ = −g(x_ac) ≈ [−3994.522; 1004.34], because we do not have information about the value of d₂ in this illustrative example.
Now, we apply Formula (62) as follows: B₃ = φ₃ d₃ = [−0.799; 0.201]; we take η₃ = 0.971 as a random number from the range (0, 2); then x₂ = [2; −1] + 0.971 × [−0.799; 0.201] = [1.2242; −0.80483], and the function value at the point x₂ is R₂(x₂) = 530.66.
We note that R₂(x₂) = 530.66 < R₂(x_ac) = 2501, i.e., the function value is reduced by the point x₂.
In the following, we explain how the candidate solution is generated by Formula (63).
μ₃ = |f_ac|² = 2501² = 6,255,001, and Dx = [(((1 + 6,255,001)^{|−0.5|} − 1)/(6,255,001 + 0.1)) × (−1); (((1 + 6,255,001)^{|1|} − 1)/(6,255,001 + 0.1)) × 1] = [−(2501 − 1)/6,255,001.1; 6,255,001/6,255,001.1] = [−0.0004; 0.999]. X_w = [−3.095; 8.701] is a random vector picked from the range [−5, 10]², and then x₃ = [−3.095; 8.701] + ½[−0.0004; 0.999] = [−3.095; 8.701] + [−0.0002; 0.4995] = [−3.0952; 9.2005].
We compute the function value at the point x₃: R₂(x₃) = 100((−3.0952)² − 9.2005)² + (−3.0952 − 1)² = 14.422 + 16.771 = 31.193.
We note that R₂(x₃) = 31.193 < R₂(x_ac) = 2501. Therefore, the point x₃ reduces the function value.
According to the above example that illustrates the mechanism of Formulas (61)–(63), we deduce the following results.
Remark 1.
Formulas (3), (61) and (62) are the main formulas used in the new hybrid algorithm described in Section 6. However, Formula (63) is used when Δf = 0, where Δf is defined by Formula (25); in this case, Algorithm 3 has reached a critical point, and if this point is the approximate global minimizer of f, then Algorithm 3 stops according to the condition in Line 4 or Line 1 of Algorithm 3. Otherwise, the candidate solution is generated by Formula (63); see Section 6. Consequently, in this example, at iteration k = 3, the result obtained by Formula (63) cannot be taken into account because Δf ≠ 0.
Remark 2.
All of Formulas (61)–(63) reduce the function value from any starting point.

6. Hybridization of the CG Method with Stochastic Parameters

When a stochastic method (a global optimization technique) is combined with a globally convergent deterministic method, the result is a global optimization algorithm [55,56].
Therefore, the SP technique is hybridized with each of the five conjugate gradient methods SHZ, MHZ, HZ, HS and FR to obtain five techniques.
Our proposed algorithm, a hybrid stochastic CG method abbreviated HSSHZ, solves Problem (2). Algorithm 3 in fact represents five alternative algorithms: when the SHZ method is hybridized with the SP technique, we obtain the algorithm abbreviated HSSHZ, and when we instead combine MHZ, HZ, HS or FR, we obtain four further algorithms abbreviated HSMHZ, HSHZ, HSHS and HSFR, respectively.
In general, the outputs of this paper are five algorithms that solve Problem (2), of which the best is the HSSHZ algorithm, as illustrated in the numerical experiments section of Part II.
In the following, Algorithm 1 is combined with SP technique to obtain Algorithm 3.
The SP method permits an exhaustive sweep of the search range, to guarantee that the global minimizer point is visited at least once per run.
Algorithm 3 Hybrid stochastic CG method.
Input: f: R^n → R, f ∈ C¹, f_ac = f_cg gained by Algorithm 1, and ε > 0.
Output: x_gl = x_ac, the global minimizer of f, and f(x_gl), the value of f at x_gl.
1: while |f_ac − f*| > ε and FEs < n × 10⁴ do
2:     f_cg is a function value of f gained by Algorithm 1.
3:     f_ac = min{f_cg, f₁, f₂}, and x_ac is the best point, which gives f_ac.
4:     if |f_ac − f*| ≤ ε then
5:         Stop.
6:     end if
7:     if Δf == 0 then
8:         calculate x₃ and f₃ = f(x₃) by Formula (63).
9:         if f₃ < f_ac then
10:            x₃ is accepted; set x_ac ← x₃, f_ac ← f₃, and go to Line 1.
11:        else
12:            generate another point x₃ by Formula (63).
13:        end if
14:    else
15:        go to Line 1.
16:    end if
17: end while
18: return x_ac, the best point, and its function value f_ac
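The control flow of Algorithm 3 can be sketched in Python as follows. This is a schematic illustration only: the CG step of Algorithm 1 is replaced by a plain gradient step, the SP candidate by a random perturbation, and the restart rule is a simplified stand-in for Lines 7–13.

```python
import numpy as np

rng = np.random.default_rng(1)

# Schematic sketch of Algorithm 3's accept/escape logic (the CG engine and
# the SP formulas are replaced by simple stand-ins).
def hybrid_sketch(f, grad, x0, f_star, a, b, eps=1e-6, max_fes=10_000):
    x_ac = np.asarray(x0, dtype=float)
    f_ac = f(x_ac)
    fes = 1
    while abs(f_ac - f_star) > eps and fes < max_fes:
        g = grad(x_ac)
        cand = [x_ac - 0.1 * g,                            # stand-in for the CG point
                x_ac + rng.uniform(-1.0, 1.0, x_ac.size)]  # stand-in for an SP point
        for x in cand:                                     # keep the best point so far
            fx = f(x)
            fes += 1
            if fx < f_ac:
                x_ac, f_ac = x, fx
        if np.linalg.norm(g) < 1e-8 and abs(f_ac - f_star) > eps:
            # stuck at a non-global critical point: restart in the feasible box
            x_ac = rng.uniform(a, b, x_ac.size)
            f_ac = f(x_ac)
            fes += 1
    return x_ac, f_ac

sphere = lambda x: float(x @ x)
sphere_grad = lambda x: 2.0 * x
x_best, f_best = hybrid_sketch(sphere, sphere_grad, [3.0, -4.0], 0.0, -10.0, 10.0)
print(f_best)   # at most 1e-6 on this convex example
```

The two stopping standards of Line 1 (|f_ac − f*| ≤ ε or a function evaluation budget of n × 10⁴) appear here as the while condition, and the only way f_ac changes is by accepting a strictly better point.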

The Mechanism of Running Algorithm 3

As mentioned above, Algorithm 3 is a combination of two methods: the first is a CG method from the five techniques CG = {SHZ, MHZ, HZ, HS, FR} discussed in Part I, and the second is the random method described in Section 5. The point x_cg is obtained by Algorithm 1 and serves as an input to Algorithm 3.
Algorithm 3 begins with Line 1, which is the stopping criterion of the algorithm. Algorithm 3 ends if one of the following criteria is satisfied: the first is |f_ac − f*| ≤ ε, and the second is FEs ≥ n × 10⁴, where f_ac is the best function value gained, f* is the true solution, ε = 10⁻⁶, FEs is the number of function evaluations, and FEs = n × 10⁴ is a stopping criterion suggested by [85,86].
In Line 3, the best value of f is selected from the three values f_cg, f₁ and f₂ and denoted by f_ac; these three values are calculated by Algorithm 1 and Formulas (61) and (62), respectively, and x_ac denotes the corresponding point.
In Line 4, if |f_ac − f*| ≤ ε is fulfilled, the algorithm ends. The criterion in Line 7 gives the algorithm an opportunity to escape from local points: if Δf = 0, the algorithm has reached a critical point, and if the norm of the gradient vector is 0 or ≈0, this point is either a local point or the global point. Through these actions, the hybrid algorithm is granted successive opportunities to escape from a trap (a local point); the procedures in Lines 8–12 help the algorithm flee this trap, especially since the second stopping criterion guarantees that most of the search domain is scanned.
The numerical outcomes of the five methods are given in the next section.

7. Numerical Experiments of Part II

The numerical results for the second type of test problems (non-convex functions) are presented; these results are obtained by Algorithm 3.
The performance profiles tool described in Part I is used here to assess the achievement of Algorithm 3, which comprises five alternative algorithms, as mentioned above in Section 6.
The numerical results for the second type of test problems are listed in Table 9, Table 10, Table 11, Table 12, Table 13, Table 14 and Table 15. Columns 1–2 and 8–9 contain the abbreviation of the function f and the number of unknowns n, respectively. Columns 3–7 correspond to the five algorithms HSSHZ, HSMHZ, HSHZ, HSHS and HSFR and report, for each, the number of worst iterations, number of worst function evaluations, number of best iterations, number of best function evaluations, average CPU time, average number of iterations and average number of function evaluations. Columns 10–14 are similar to Columns 3–7.
Note: F denotes that the algorithm failed to find the global minimizer of the function f according to the stopping criteria of Algorithm 3, which are listed in Section 6.
The performance profiles for the five algorithms are analyzed as follows.
Figure 5, Figure 6, Figure 7 and Figure 8 show the performance profiles of the five set solvers S regarding the set standard Fit that is mentioned in Section 4.2.
The performance profiles drawn on the left of Figure 5 (for the term itr.w) compare the five methods on the 14 test problems.
The HSSHZ technique performs well (for the term itr.w) on all test problems, which indicates that the HSSHZ technique is capable of solving Problem (2) as fast as or faster than the other four techniques.
For instance, if τ = 1 , the HSSHZ algorithm solves 71 % of the 14 test problems against 14 % , 14 % , 7 % and 0 % , of the 14 test problems solved by the HSMHZ, HSHZ, HSFR and HSHS algorithms, respectively.
In general, for the term itr.w, τ = 60 shows that all of the second type of test problems are solved by HSSHZ, while 64%, 71%, 43% and 50% of the test problems are solved by the HSMHZ, HSHZ, HSHS and HSFR algorithms, respectively.
Figure 5, Figure 6, Figure 7 and Figure 8 demonstrate that the performance of the HSSHZ technique is better than the performance of the four techniques regarding the seven standards listed in the set Fit, respectively.
Therefore, the HSSHZ technique includes the characteristics of efficiency, reliability and effectiveness in finding the global minimizer of the non-convex function f compared to the other four methods.
It is worth observing that the power of the HSSHZ algorithm comes from the fact that the SHZ method gains the features of the four methods, MHZ, HZ, HS and FR, as mentioned in Section 2.
Note 1: In Algorithm 3, a run is considered successful if Inequality (64) is met.
|f_ac − f*| ≤ 10⁻⁵,
where f* is the exact global solution listed in Columns 3 and 7 of Table 1, respectively, and f_ac is the final result obtained by Algorithm 3.
Note 2: Formula (56) means that if the final result f_ac obtained by Algorithm 3 satisfies Inequality (64), then the first branch of (56) is computed; otherwise, we set r_{p,s} = ∞.

8. Conclusions and Future Work

A new modified CG algorithm, named SHZ, is suggested. SHZ finds the local minimizers of unconstrained optimization problems. Although the updated formulae of the SHZ algorithm are more involved than previous approaches, the numerical performance of SHZ is very strong. The convergence analysis of the SHZ algorithm is established. We also analyzed the gradient approximation g(x) ≈ DFF constructed by finite differences (the forward difference method). This method includes a new approach for selecting a suitable value of h according to the value of the objective function, updated dynamically at each iteration. The numerical results demonstrate that the performance of the SHZ method is highly competitive with the other four conjugate gradient methods, based on performance profiles.
Comparing the final values of the gradient vector obtained by the DFF method with the exact values of the gradient vector demonstrates that the new technique succeeds in picking the right value of h. The proposed random approach plays a critical role in making the SHZ method capable of finding the global minimizers of unconstrained optimization test problems, especially when the objective function is non-convex.
It is worth observing that the power of the HSSHZ algorithm comes from the fact that the SHZ method gains the characteristics of the four methods MHZ, HZ, HS and FR.
The suggested approach can be improved and modified to deal with constrained and multi-objective optimization problems, and it will be applied to image restoration.

Author Contributions

Conceptualization, K.A.A.; Data curation, A.M.A.; Formal analysis, A.M.A. and K.M.S.; Funding acquisition, K.A.A.; Investigation, A.W.M.; Methodology, S.M.; Project administration, K.A.A. and K.M.S.; Software, S.M.; Supervision, A.W.M.; Validation, A.M.A.; Writing—original draft, S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Researchers Supporting Program at King Saud University (Project No. RSP-2021/305).

Data Availability Statement

Not applicable.

Acknowledgments

The authors present their appreciation to King Saud University for funding the publication of this research through the Researchers Supporting Program (RSP-2021/305), King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. List of Test Problems

1 $R_n$:
Rosenbrock functions [57,87,88]
$$\min_x \sum_{i=1}^{n-1} \left[ 100 \left( x_i^2 - x_{i+1} \right)^2 + \left( x_i - 1 \right)^2 \right].$$
Range of starting points $-5 < x_i < 10$, $i = 1, 2, \ldots, n$.
Global minima: $f(x^*) = 0$ at $x^* = (1, 1, \ldots, 1)$.
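As a quick sanity check of the definition above, a minimal NumPy sketch (illustrative only; the function name and vectorized layout are ours, not the authors' code):

```python
import numpy as np

def rosenbrock(x):
    """Variant used here: sum over i = 1..n-1 of 100*(x_i^2 - x_{i+1})^2 + (x_i - 1)^2."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[:-1] ** 2 - x[1:]) ** 2 + (x[:-1] - 1.0) ** 2))
```

At the stated minimizer (1, …, 1) both terms vanish, so the value is exactly 0.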
2 $Z_n$:
Zakharov functions [57,80,87,88]
$$\min_x \sum_{i=1}^{n} x_i^2 + \left( \sum_{i=1}^{n} 0.5\, i\, x_i \right)^2 + \left( \sum_{i=1}^{n} 0.5\, i\, x_i \right)^4.$$
Range of starting points $-5 < x_i < 10$, $i = 1, 2, \ldots, n$.
Global minima: $f(x^*) = 0$ at $x^* = (0, 0, \ldots, 0)$.
3 PW:
Powell function [80]
$$\min_x \sum_{i=1}^{n/4} \left[ \left( x_{4i-3} + 10 x_{4i-2} \right)^2 + 5 \left( x_{4i-1} - x_{4i} \right)^2 + \left( x_{4i-2} - 2 x_{4i-1} \right)^4 + 10 \left( x_{4i-3} - x_{4i} \right)^4 \right].$$
Range of starting points $-600 < x_i < 600$, $i = 1, 2, \ldots, n$.
Global minima: $f(x^*) = 0$ at $x^* = (0, 0, \ldots, 0)$.
4 SP:
Sphere function [89]
$$\min_x \sum_{i=1}^{n} x_i^2.$$
Range of starting points $-10 \le x_i \le 10$, $i = 1, 2, \ldots, n$.
Global minima: $f(x^*) = 0$ at $x^* = (0, 0, \ldots, 0)$.
5 Tr:
Trid function [80]
$$\min_x \sum_{i=1}^{n} (x_i - 1)^2 - \sum_{i=2}^{n} x_i x_{i-1}.$$
Range of starting points $-n^2 < x_i < n^2$, $i = 1, 2, \ldots, n$.
Global minima: $f(x^*) = -\frac{n(n+4)(n-1)}{6}$ at $x_i^* = i\,(n + 1 - i)$, $i = 1, \ldots, n$.
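The closed-form minimum of the Trid function can be verified numerically; the sketch below (hypothetical helper names, not the authors' code) evaluates the stated minimizer $x_i^* = i(n+1-i)$ for n = 10 and compares it with $-n(n+4)(n-1)/6 = -210$:

```python
def trid(x):
    """Trid function: sum_i (x_i - 1)^2 - sum_{i>=2} x_i * x_{i-1}."""
    quad = sum((xi - 1.0) ** 2 for xi in x)
    cross = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    return quad - cross

n = 10
x_star = [i * (n + 1 - i) for i in range(1, n + 1)]  # stated minimizer
f_star = -n * (n + 4) * (n - 1) / 6.0                # = -210 for n = 10
```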
6 Su:
Sum Squares function [90]
$$\min_x \sum_{i=1}^{n} i\, x_i^2.$$
Range of starting points $-100 < x_i < 100$, $i = 1, 2, \ldots, n$.
Global minima: $f(x^*) = 0$ at $x^* = (0, 0, \ldots, 0)$.
7 CV:
Colville function [57,80,91]
$$\min_x \left\{ 100 \left( x_1^2 - x_2 \right)^2 + \left( x_1 - 1 \right)^2 + \left( x_3 - 1 \right)^2 + 90 \left( x_3^2 - x_4 \right)^2 + 10.1 \left[ \left( x_2 - 1 \right)^2 + \left( x_4 - 1 \right)^2 \right] + 19.8 \left( x_2 - 1 \right) \left( x_4 - 1 \right) \right\}.$$
Range of starting points $-10 < x_i < 10$, $i = 1, \ldots, 4$.
Global minima: $f(x^*) = 0$ at $x^* = (1, 1, 1, 1)$.
8 BR:
Branin function [57,92,93]
$$\min_x \left( x_2 - \frac{5.1}{4 \pi^2} x_1^2 + \frac{5}{\pi} x_1 - 6 \right)^2 + 10 \left( 1 - \frac{1}{8 \pi} \right) \cos(x_1) + 10.$$
Range of starting points $-5 < x_i < 15$, $i = 1, 2$.
Global minima: $f(x^*) = 0.397887$ (one global minimum value, attained at three points) at $x^* \in \{ (-\pi, 12.275),\ (\pi, 2.275),\ (9.42478, 2.475) \}$.
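A small sketch (illustrative, not the authors' code) confirms that the Branin formula takes the value 0.397887 at each of the listed minimizers:

```python
import math

def branin(x1, x2):
    """Branin function with the usual constants 5.1/(4*pi^2), 5/pi, 6, 10, 1/(8*pi)."""
    quad = (x2 - 5.1 / (4.0 * math.pi ** 2) * x1 ** 2 + 5.0 / math.pi * x1 - 6.0) ** 2
    return quad + 10.0 * (1.0 - 1.0 / (8.0 * math.pi)) * math.cos(x1) + 10.0
```

At $(\pi, 2.275)$ the quadratic term vanishes and $\cos(\pi) = -1$, leaving $10/(8\pi) \approx 0.397887$.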
9 DJ:
De Joung function [57,87,88]
$$\min_x x_1^2 + x_2^2 + x_3^2.$$
Range of starting points $-5 < x_i < 15$, $i = 1, 2, 3$.
Number of local minima: no local minima.
Global minima: $f(x^*) = 0$ at $x^* = (0, 0, 0)$.
10 BO:
Booth function [89]
$$\min_x \left( x_1 + 2 x_2 - 7 \right)^2 + \left( 2 x_1 + x_2 - 5 \right)^2.$$
Range of starting points $-10 < x_i < 10$, $i = 1, 2$.
Global minima: $f(x^*) = 0$ at $x^* = (1, 3)$.
11 Ma:
Matyas function [90]
$$\min_x 0.26 \left( x_1^2 + x_2^2 \right) - 0.48\, x_1 x_2.$$
Range of starting points $-10 < x_i < 10$, $i = 1, 2$.
Global minima: $f(x^*) = 0$ at $x^* = (0, 0)$.
12 $S_m^*$:
Shekel functions [57,80,87,88,92,93,94]
$$\min_x - \sum_{j=1}^{m} \left[ \sum_{i=1}^{4} \left( x_i - A_{ij} \right)^2 + c_j \right]^{-1},$$
where $c = 0.1\,[1, 2, 2, 4, 4, 6, 3, 7, 5, 5]^T$ and
$$A = \begin{pmatrix} 4 & 1 & 8 & 6 & 3 & 2 & 5 & 8 & 6 & 7 \\ 4 & 1 & 8 & 6 & 7 & 9 & 3 & 1 & 2 & 3 \\ 4 & 1 & 8 & 6 & 3 & 2 & 5 & 8 & 6 & 7 \\ 4 & 1 & 8 & 6 & 7 & 9 & 3 & 1 & 2 & 3 \end{pmatrix}.$$
Range of starting points $0 < x_i < 10$, $i = 1, \ldots, n$.
Number of local minima: $m$ local minima.
Global minima:
$$f(x^*)_{n,m} = \begin{cases} -10.1532, & m = 5, \\ -10.4029, & m = 7, \\ -10.5364, & m = 10. \end{cases}$$
Global minima for the three functions at $x^* = (4, 4, 4, 4)$.
13 GP$^*$:
Goldstein and Price function [57,80,87,88,92,94]
$$u(x) = 1 + \left( x_1 + x_2 + 1 \right)^2 \left( 19 - 14 x_1 + 3 x_1^2 - 14 x_2 + 6 x_1 x_2 + 3 x_2^2 \right),$$
$$v(x) = 30 + \left( 2 x_1 - 3 x_2 \right)^2 \left( 18 - 32 x_1 + 12 x_1^2 + 48 x_2 - 36 x_1 x_2 + 27 x_2^2 \right),$$
$$\min_x u(x)\, v(x).$$
Range of starting points $-2 < x_i < 2$, $i = 1, 2$.
Number of local minima: 4 local minima.
Global minima: $f(x^*) = 3$ at $x^* = (0, -1)$.
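The factors u and v can be checked at the stated global minimizer $(0, -1)$, where $u = 1$ and $v = 30 + 9 \cdot (-3) = 3$, so $f = 3$. A sketch (names are illustrative, not the authors' code):

```python
def goldstein_price(x1, x2):
    """Goldstein-Price function f = u * v as defined above."""
    u = 1 + (x1 + x2 + 1) ** 2 * (19 - 14 * x1 + 3 * x1 ** 2 - 14 * x2 + 6 * x1 * x2 + 3 * x2 ** 2)
    v = 30 + (2 * x1 - 3 * x2) ** 2 * (18 - 32 * x1 + 12 * x1 ** 2 + 48 * x2 - 36 * x1 * x2 + 27 * x2 ** 2)
    return u * v
```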
14 Ras$^*$:
Rastrigin function [93]
$$\min_x x_1^2 + x_2^2 - \cos(18 x_1) - \cos(18 x_2).$$
Range of starting points $-1 < x_i < 1$, $i = 1, 2$.
Number of local minima: many local minima.
Global minima: $f(x^*) = -2$ at $x^* = (0, 0)$.
15 Bh1$^*$:
Bohachevsky function [80]
$$\min_x x_1^2 + 2 x_2^2 - 0.3 \cos(3 \pi x_1) - 0.4 \cos(4 \pi x_2) + 0.7.$$
Range of starting points $-100 < x_i < 100$, $i = 1, 2$.
Number of local minima: many local minima.
Global minima: $f(x^*) = 0$ at $x^* = (0, 0)$.
16 SH$^*$:
Shubert function in [57,80,87,88,92]
$$\min_x \left( \sum_{i=1}^{5} i \cos\left( (i+1) x_1 + i \right) \right) \left( \sum_{i=1}^{5} i \cos\left( (i+1) x_2 + i \right) \right).$$
Range of starting points $-5.12 < x_i < 5.12$, $i = 1, 2$.
Number of local minima: 760 local minima.
Global minima: $f(x^*) = -186.7309$ at 18 different points $x^*$.
17 P8$^*$
Ref. [92]
$$\min_x \frac{\pi}{n} \left\{ k_1 \sin^2(\pi y_1) + \sum_{i=1}^{n-1} \left( y_i - k_2 \right)^2 \left[ 1 + k_1 \sin^2(\pi y_{i+1}) \right] + \left( y_n - k_2 \right)^2 \right\},$$
with $y_i = 1 + \frac{1}{4}(x_i + 1)$, $k_1 = 10$ and $k_2 = 1$.
Range of starting points $-10 \le x_i \le 10$, $i = 1, 2, 3$.
Number of local minima: $5^3$ local minima.
Global minima: $f(x^*) = 0$ at $x^* = (-1, -1, -1)$.
18 P16$^*$
Ref. [92]
$$\min_x k_3 \left\{ \sin^2(\pi k_4 x_1) + \sum_{i=1}^{n-1} \left( x_i - k_5 \right)^2 \left[ 1 + k_6 \sin^2(\pi k_4 x_{i+1}) \right] + \left( x_n - k_5 \right)^2 \left[ 1 + k_6 \sin^2(\pi k_7 x_n) \right] \right\},$$
where $k_3 = 0.1$, $k_4 = 3$, $k_5 = 1$, $k_6 = 1$, $k_7 = 2$.
Range of starting points $-5 \le x_i \le 5$, $i = 1, \ldots, n$.
Number of local minima: $15^5$ local minima.
Global minima: $f(x^*) = 0$ at $x^* = (1, 1, 1, 1, 1)$.
19 CB$^*$:
Camel back in [80] and camel function in [93]
$$\min_x 4 x_1^2 - 2.1 x_1^4 + \frac{1}{3} x_1^6 + x_1 x_2 - 4 x_2^2 + 4 x_2^4.$$
Range of starting points $-5 < x_i < 5$, $i = 1, 2$.
Number of local minima: many local minima.
Global minima: $f(x^*) = -1.0316285$ at $x^* \in \{ (0.089842, -0.71266),\ (-0.089842, 0.71266) \}$.
20 H3$^*$:
Hartmann function [57,80,87,88,92,93,94]
$$\min_x - \sum_{i=1}^{4} c_i \exp\left( - \sum_{j=1}^{3} a_{ij} \left( x_j - p_{ij} \right)^2 \right).$$
Range of starting points $-1 < x_j < 1$, $j = 1, 2, 3$.
Number of local minima: 4 local minima.
Global minima: $f(x^*) = -3.86278$ at $x^* = (0.114614, 0.555649, 0.852547)$.
21 H6$^*$:
Hartmann function [57,80,87,88,92,93,94]
$$\min_x - \sum_{i=1}^{4} c_i \exp\left( - \sum_{j=1}^{6} a_{ij} \left( x_j - p_{ij} \right)^2 \right).$$
Range of starting points $-1 < x_j < 1$, $j = 1, 2, \ldots, n$.
Number of local minima: 4 local minima.
Global minima: $f(x^*) = -3.32237$ at $x^* = (0.201690, 0.150011, 0.476874, 0.275332, 0.311652, 0.657300)$.
22 HM$^*$:
Hump function [57]
$$\min_x 1.0316285 + 4 x_1^2 - 2.1 x_1^4 + \frac{1}{3} x_1^6 + x_1 x_2 - 4 x_2^2 + 4 x_2^4.$$
Range of starting points $-5 < x_i < 5$, $i = 1, 2$.
Number of local minima: 3 local minima.
Global minima: $f(x^*) = 0$ at $x^* \in \{ (0.0898, -0.7126),\ (-0.0898, 0.7126) \}$.
23 Le$^*$:
Levy function [95]
$$\min_x \sin^2(\pi w_1) + \sum_{i=1}^{n-1} \left( w_i - 1 \right)^2 \left[ 1 + 10 \sin^2\left( \pi w_i + 1 \right) \right] + \left( w_n - 1 \right)^2 \left[ 1 + \sin^2\left( 2 \pi w_n \right) \right],$$
where $w_i = 1 + \frac{x_i - 1}{4}$, for $i = 1, \ldots, n$.
Range of starting points $-10 < x_i < 10$, $i = 1, 2, \ldots, n$.
Number of local minima: many local minima.
Global minima: $f(x^*) = 0$ at $x^* = (1, 1, \ldots, 1)$.
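A direct transcription of the Levy function (an illustrative sketch, not the authors' code) shows that the value at x = (1, …, 1) is zero up to floating-point round-off, matching the stated global minimum:

```python
import math

def levy(x):
    """Levy function with w_i = 1 + (x_i - 1)/4."""
    w = [1.0 + (xi - 1.0) / 4.0 for xi in x]
    first = math.sin(math.pi * w[0]) ** 2
    middle = sum(
        (w[i] - 1.0) ** 2 * (1.0 + 10.0 * math.sin(math.pi * w[i] + 1.0) ** 2)
        for i in range(len(w) - 1)
    )
    last = (w[-1] - 1.0) ** 2 * (1.0 + math.sin(2.0 * math.pi * w[-1]) ** 2)
    return first + middle + last
```

At x = (1, …, 1) every w_i equals 1, so the middle and last terms vanish and only sin²(π) ≈ 0 remains.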

References

1. Abdel-Baset, M.; Hezam, I. A Hybrid Flower Pollination Algorithm for Engineering Optimization Problems. Int. J. Comput. Appl. 2016, 140, 10–23.
2. Agrawal, P.; Ganesh, T.; Mohamed, A.W. A novel binary gaining–sharing knowledge-based optimization algorithm for feature selection. Neural Comput. Appl. 2021, 33, 5989–6008.
3. Ayumi, V.; Rere, L.; Fanany, M.I.; Arymurthy, A.M. Optimization of Convolutional Neural Network using Microcanonical Annealing Algorithm. arXiv 2016, arXiv:1610.02306.
4. Lobato, F.S.; Steffen, V., Jr. Fish swarm optimization algorithm applied to engineering system design. Lat. Am. J. Solids Struct. 2014, 11, 143–156.
5. Mazhoud, I.; Hadj-Hamou, K.; Bigeon, J.; Joyeux, P. Particle swarm optimization for solving engineering problems: A new constraint-handling mechanism. Eng. Appl. Artif. Intell. 2013, 26, 1263–1273.
6. Mohamed, A.W.; Sabry, H.Z. Constrained optimization based on modified differential evolution algorithm. Inf. Sci. 2012, 194, 171–208.
7. Mohamed, A.W.; Hadi, A.A.; Mohamed, A.K. Gaining-sharing knowledge based algorithm for solving optimization problems: A novel nature-inspired algorithm. Int. J. Mach. Learn. Cybern. 2020, 11, 1501–1529.
8. Rere, L.; Fanany, M.I.; Arymurthy, A.M. Metaheuristic Algorithms for Convolution Neural Network. Comput. Intell. Neurosci. 2016, 2016, 1537325.
9. Samora, I.; Franca, M.J.; Schleiss, A.J.; Ramos, H.M. Simulated annealing in optimization of energy production in a water supply network. Water Resour. Manag. 2016, 30, 1533–1547.
10. Shao, Y. Dynamics of an Impulsive Stochastic Predator–Prey System with the Beddington–DeAngelis Functional Response. Axioms 2021, 10, 323.
11. Vallepuga-Espinosa, J.; Cifuentes-Rodríguez, J.; Gutiérrez-Posada, V.; Ubero-Martínez, I. Thermomechanical Optimization of Three-Dimensional Low Heat Generation Microelectronic Packaging Using the Boundary Element Method. Mathematics 2022, 10, 1913.
12. Blum, C.; Roli, A. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Comput. Surv. (CSUR) 2003, 35, 268–308.
13. Aarts, E.; Korst, J. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing; John Wiley & Sons, Inc.: New York, NY, USA, 1989.
14. Hillier, F.S.; Price, C.C. International Series in Operations Research & Management Science; Springer: Berlin/Heidelberg, Germany, 2001.
15. Laarhoven, P.J.V.; Aarts, E.H. Simulated Annealing: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 1987.
16. Farid, M.; Leong, W.J.; Hassan, M.A. A new two-step gradient-type method for large-scale unconstrained optimization. Comput. Math. Appl. 2010, 59, 3301–3307.
17. Gilbert, J.C.; Nocedal, J. Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 1992, 2, 21–42.
18. Hager, W.W.; Zhang, H. Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Trans. Math. Softw. (TOMS) 2006, 32, 113–137.
19. Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747.
20. Shi, Z. A new memory gradient method under exact line search. Asia-Pac. J. Oper. Res. 2003, 20, 275–284.
21. Zhang, L.; Zhou, W.; Li, D.H. A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 2006, 26, 629–640.
22. Abubakar, A.B.; Malik, M.; Kumam, P.; Mohammad, H.; Sun, M.; Ibrahim, A.H.; Kiri, A.I. A Liu-Storey-type conjugate gradient method for unconstrained minimization problem with application in motion control. J. King Saud Univ.-Sci. 2022, 34, 101923.
23. Dai, Y.; Yuan, Y. An efficient hybrid conjugate gradient method for unconstrained optimization. Ann. Oper. Res. 2001, 103, 33–47.
24. Deng, S.; Wan, Z. A three-term conjugate gradient algorithm for large-scale unconstrained optimization problems. Appl. Numer. Math. 2015, 92, 70–81.
25. Ma, G.; Lin, H.; Jin, W.; Han, D. Two modified conjugate gradient methods for unconstrained optimization with applications in image restoration problems. J. Appl. Math. Comput. 2022, 1–26.
26. Mtagulwa, P.; Kaelo, P. An efficient modified PRP-FR hybrid conjugate gradient method for solving unconstrained optimization problems. Appl. Numer. Math. 2019, 145, 111–120.
27. Waziri, M.Y.; Kiri, A.I.; Kiri, A.A.; Halilu, A.S.; Ahmed, K. A modified conjugate gradient parameter via hybridization approach for solving large-scale systems of nonlinear equations. SeMA J. 2022, 1–23.
28. Zhang, L.; Zhou, W.; Li, D. Global convergence of a modified Fletcher–Reeves conjugate gradient method with Armijo-type line search. Numer. Math. 2006, 104, 561–572.
29. Alshamrani, A.M.; Alrasheedi, A.F.; Alnowibet, K.A.; Mahdi, S.; Mohamed, A.W. A Hybrid Stochastic Deterministic Algorithm for Solving Unconstrained Optimization Problems. Mathematics 2022, 10, 3032.
30. Hager, W.W.; Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005, 16, 170–192.
31. Golub, G.H.; O’Leary, D.P. Some history of the conjugate gradient and Lanczos algorithms: 1948–1976. SIAM Rev. 1989, 31, 50–102.
32. Hager, W.W.; Zhang, H. A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2006, 2, 35–58.
33. Fletcher, R.; Reeves, C.M. Function minimization by conjugate gradients. Comput. J. 1964, 7, 149–154.
34. Powell, M.J. Nonconvex minimization calculations and the conjugate gradient method. In Numerical Analysis; Springer: Berlin/Heidelberg, Germany, 1984; pp. 122–141.
35. Al-Baali, M. Descent property and global convergence of the Fletcher Reeves method with inexact line search. IMA J. Numer. Anal. 1985, 5, 121–124.
36. Powell, M.J.D. Restart procedures for the conjugate gradient method. Math. Program. 1977, 12, 241–254.
37. Polak, E.; Ribiere, G. Note sur la convergence de méthodes de directions conjuguées. ESAIM Math. Model. Numer. Anal. 1969, 3, 35–43.
38. Polyak, B.T. The conjugate gradient method in extremal problems. USSR Comput. Math. Math. Phys. 1969, 9, 94–112.
39. Hestenes, M.R.; Stiefel, E. Methods of Conjugate Gradients for Solving. J. Res. Natl. Bur. Stand. 1952, 49, 409.
40. Liu, Y.; Storey, C. Efficient generalized conjugate gradient algorithms, part 1: Theory. J. Optim. Theory Appl. 1991, 69, 129–137.
41. Dai, Y.H.; Yuan, Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 1999, 10, 177–182.
42. Abubakar, A.B.; Kumam, P. A descent Dai-Liao conjugate gradient method for nonlinear equations. Numer. Algorithms 2019, 81, 197–210.
43. Abubakar, A.B.; Muangchoo, K.; Ibrahim, A.H.; Muhammad, A.B.; Jolaoso, L.O.; Aremu, K.O. A new three-term Hestenes-Stiefel type method for nonlinear monotone operator equations and image restoration. IEEE Access 2021, 9, 18262–18277.
44. Babaie-Kafaki, S.; Ghanbari, R. A descent family of Dai–Liao conjugate gradient methods. Optim. Methods Softw. 2014, 29, 583–591.
45. Dai, Y.; Liao, L. New conjugacy conditions and related nonlinear conjugate gradient methods. Appl. Math. Optim. 2001, 43, 87–101.
46. Ibrahim, A.H.; Kumam, P.; Kumam, W. A family of derivative-free conjugate gradient methods for constrained nonlinear equations and image restoration. IEEE Access 2020, 8, 162714–162729.
47. Su, Z.; Li, M. A Derivative-Free Liu–Storey Method for Solving Large-Scale Nonlinear Systems of Equations. Math. Probl. Eng. 2020, 2020, 6854501.
48. Yuan, G.; Zhang, M. A three-terms Polak–Ribière–Polyak conjugate gradient algorithm for large-scale nonlinear equations. J. Comput. Appl. Math. 2015, 286, 186–195.
49. Yuan, G.; Jian, A.; Zhang, M.; Yu, J. A modified HZ conjugate gradient algorithm without gradient Lipschitz continuous condition for non convex functions. J. Appl. Math. Comput. 2022, 1–22.
50. Zhou, Y.; Wu, Y.; Li, X. A new hybrid PRPFR conjugate gradient method for solving nonlinear monotone equations and image restoration problems. Math. Probl. Eng. 2020, 2020, 6391321.
51. Yuan, G.; Meng, Z.; Li, Y. A modified Hestenes and Stiefel conjugate gradient algorithm for large-scale nonsmooth minimizations and nonlinear equations. J. Optim. Theory Appl. 2016, 168, 129–152.
52. Yuan, G.; Wei, Z.; Yang, Y. The global convergence of the Polak–Ribière–Polyak conjugate gradient algorithm under inexact line search for nonconvex functions. J. Comput. Appl. Math. 2019, 362, 262–275.
53. Kan, A.R.; Timmer, G. Stochastic methods for global optimization. Am. J. Math. Manag. Sci. 1984, 4, 7–40.
54. Alnowibet, K.A.; Alshamrani, A.M.; Alrasheedi, A.F.; Mahdi, S.; El-Alem, M.; Aboutahoun, A.; Mohamed, A.W. An Efficient Modified Meta-Heuristic Technique for Unconstrained Optimization Problems. Axioms 2022, 11, 483.
55. Alnowibet, K.A.; Mahdi, S.; El-Alem, M.; Abdelawwad, M.; Mohamed, A.W. Guided Hybrid Modified Simulated Annealing Algorithm for Solving Constrained Global Optimization Problems. Mathematics 2022, 10, 1312.
56. EL-Alem, M.; Aboutahoun, A.; Mahdi, S. Hybrid gradient simulated annealing algorithm for finding the global optimal of a nonlinear unconstrained optimization problem. Soft Comput. 2021, 25, 2325–2350.
57. Hedar, A.R.; Fukushima, M. Hybrid simulated annealing and direct search method for nonlinear unconstrained global optimization. Optim. Methods Softw. 2002, 17, 891–912.
58. Pedamallu, C.S.; Ozdamar, L. Investigating a hybrid simulated annealing and local search algorithm for constrained optimization. Eur. J. Oper. Res. 2008, 185, 1230–1245.
59. Yiu, K.F.C.; Liu, Y.; Teo, K.L. A hybrid descent method for global optimization. J. Glob. Optim. 2004, 28, 229–238.
60. Zoutendijk, G. Nonlinear programming, computational methods. In Integer and Nonlinear Programming; Abadie, J., Ed.; North-Holland: Amsterdam, The Netherlands, 1970; pp. 37–86.
61. Wolfe, P. Convergence conditions for ascent methods. SIAM Rev. 1969, 11, 226–235.
62. Wolfe, P. Convergence conditions for ascent methods. II: Some corrections. SIAM Rev. 1971, 13, 185–188.
63. Conn, A.R.; Scheinberg, K.; Vicente, L.N. Introduction to Derivative-Free Optimization; SIAM: Philadelphia, PA, USA, 2009.
64. Kramer, O.; Ciaurri, D.E.; Koziel, S. Derivative-free optimization. In Computational Optimization, Methods and Algorithms; Springer: Berlin/Heidelberg, Germany, 2011; pp. 61–83.
65. Larson, J.; Menickelly, M.; Wild, S.M. Derivative-free optimization methods. Acta Numer. 2019, 28, 287–404.
66. Shi, H.J.M.; Xie, Y.; Xuan, M.Q.; Nocedal, J. Adaptive Finite-Difference Interval Estimation for Noisy Derivative-Free Optimization. arXiv 2021, arXiv:2110.06380.
67. Shi, H.J.M.; Xuan, M.Q.; Oztoprak, F.; Nocedal, J. On the numerical performance of derivative-free optimization methods based on finite-difference approximations. arXiv 2021, arXiv:2102.09762.
68. Berahas, A.S.; Cao, L.; Choromanski, K.; Scheinberg, K. A theoretical and empirical comparison of gradient approximations in derivative-free optimization. Found. Comput. Math. 2022, 22, 507–560.
69. Curtis, A.; Reid, J. The choice of step lengths when using differences to approximate Jacobian matrices. IMA J. Appl. Math. 1974, 13, 121–126.
70. Calio, F.; Frontini, M.; Milovanović, G.V. Numerical differentiation of analytic functions using quadratures on the semicircle. Comput. Math. Appl. 1991, 22, 99–106.
71. Gill, P.E.; Murray, W.; Saunders, M.A.; Wright, M.H. Computing forward-difference intervals for numerical optimization. SIAM J. Sci. Stat. Comput. 1983, 4, 310–321.
72. Xie, Y. Methods for Nonlinear and Noisy Optimization. Ph.D. Thesis, Northwestern University, Evanston, IL, USA, 2021.
73. De Levie, R. An improved numerical approximation for the first derivative. J. Chem. Sci. 2009, 121, 935–950.
74. Carter, R.G. On the global convergence of trust region algorithms using inexact gradient information. SIAM J. Numer. Anal. 1991, 28, 251–265.
75. Rivet, A.; Souloumiac, A. Introduction to Optimization; Optimization Software, Publications Division: New Delhi, India, 1987.
76. Byrd, R.H.; Chin, G.M.; Nocedal, J.; Wu, Y. Sample size selection in optimization methods for machine learning. Math. Program. 2012, 134, 127–155.
77. Cartis, C.; Scheinberg, K. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Math. Program. 2018, 169, 337–375.
78. Grapiglia, G.N. Quadratic regularization methods with finite-difference gradient approximations. Comput. Optim. Appl. 2022, 1–21.
79. Paquette, C.; Scheinberg, K. A stochastic line search method with expected complexity analysis. SIAM J. Optim. 2020, 30, 349–376.
80. Ali, M.M.; Khompatraporn, C.; Zabinsky, Z.B. A numerical evaluation of several stochastic algorithms on selected continuous global optimization test problems. J. Glob. Optim. 2005, 31, 635–672.
81. Barbosa, H.J.; Bernardino, H.S.; Barreto, A.M. Using performance profiles to analyze the results of the 2006 CEC constrained optimization competition. In Proceedings of the IEEE Congress on Evolutionary Computation, Barcelona, Spain, 18–23 July 2010; pp. 1–8.
82. Dolan, E.D.; Moré, J.J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213.
83. Moré, J.J.; Wild, S.M. Benchmarking derivative-free optimization algorithms. SIAM J. Optim. 2009, 20, 172–191.
84. Vaz, A.I.F.; Vicente, L.N. A particle swarm pattern search method for bound constrained global optimization. J. Glob. Optim. 2007, 39, 197–219.
85. Liang, J.; Runarsson, T.P.; Mezura-Montes, E.; Clerc, M.; Suganthan, P.N.; Coello, C.C.; Deb, K. Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. J. Appl. Mech. 2006, 41, 8–31.
86. Mohamed, A.W.; Hadi, A.A.; Mohamed, A.K.; Awad, N.H. Evaluating the performance of adaptive gaining-sharing knowledge based algorithm on CEC 2020 benchmark problems. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19–24 July 2020; pp. 1–8.
87. Bessaou, M.; Siarry, P. A genetic algorithm with real-value coding to optimize multimodal continuous functions. Struct. Multidisc. Optim. 2001, 23, 63–74.
88. Chelouah, R.; Siarry, P. Tabu search applied to global optimization. Eur. J. Oper. Res. 2000, 123, 256–270.
89. Fan, S.K.S.; Zahara, E. A hybrid simplex search and particle swarm optimization for unconstrained optimization. Eur. J. Oper. Res. 2007, 181, 527–548.
90. Paulavičius, R.; Chiter, L.; Žilinskas, J. Global optimization based on bisection of rectangles, function values at diagonals, and a set of Lipschitz constants. J. Glob. Optim. 2018, 71, 5–20.
91. Cardoso, M.F.; Salcedo, R.L.; De Azevedo, S.F. The simplex-simulated annealing approach to continuous non-linear optimization. Comput. Chem. Eng. 1996, 20, 1065–1080.
92. Dekkers, A.; Aarts, E. Global optimization and simulated annealing. Math. Program. 1991, 50, 367–393.
93. Tsoulos, I.G.; Stavrakoudis, A. Enhancing PSO methods for global optimization. Appl. Math. Comput. 2010, 216, 2988–3001.
94. Siarry, P.; Berthiau, G.; Durbin, F.; Haussy, J. Enhanced Simulated-Annealing Algorithm for Globally Minimizing Functions of Many Continuous Variables. ACM Trans. Math. Softw. 1997, 23, 209–228.
95. Laguna, M.; Martí, R. Experimental testing of advanced scatter search designs for global optimization of multimodal functions. J. Glob. Optim. 2005, 33, 235–255.
Figure 1. Plotting the results of the terms itr.w and FEs.w for 5 algorithms.
Figure 2. Plotting the results of the terms itr.be and FEs.be for 5 algorithms.
Figure 3. Plotting the results of the term time.a “CPU” for 5 algorithms.
Figure 4. Plotting the results of the terms itr.a and FEs.a for 5 algorithms.
Figure 5. Plotting the results of the terms itr.w and FEs.w for 5 algorithms.
Figure 6. Plotting the results of the terms itr.be and FEs.be for 5 algorithms.
Figure 7. Plotting the results of the term time.a “CPU” for 5 algorithms.
Figure 8. Plotting the results of the terms itr.a and FEs.a for 5 algorithms.
Table 1. List of both kinds of test problems.
fn | n | f(x*) | g(x*) | fn | n | f(x*) | g(x*)
R n 10, 30, 50, 80, 1000- Z n 10, 30, 50, 80, 1000-
P W 8, 32, 84, 1200- S P 10, 30, 80, 1000-
T r 10, 30, 60, 80 n ( n + 4 ) ( n 1 ) 6 - S u 10, 30, 50, 80, 1000-
C V 40- B R 20.397887-
D J 30- B O 20-
M a 20- S 5 * 4−10.15323.2 × 10 5
S 7 * 4−10.4029- S 10 * 4−10.53643 × 10 5
G P * 232 × 10 6 R a s * 2−22.5 × 10 6
B h 1 * 202.4 × 10 5 S H * 2−186.73092 × 10 6
P 8 * 30- P 16 * 501.2 × 10 6
C B * 2−1.03162852 × 10 5 H 3 * 3−3.862782 × 10 5
H 6 * 6−3.322376 × 10 5 H M * 201.1 × 10 8
L e * 1002.1 × 10 6
Table 2. The number of worst iterations.
fn | SHZ | MHZ | HZ | HS | FR | fn | SHZ | MHZ | HZ | HS | FR
R n 1029153740570550805185 R n 3022703555517051405050
R n 5026053805570552905145 R n 8027504010579551505890
R n 10028202950505059305840 Z n 10145170225210195
Z n 301075995182515751425 Z n 5022952600418036453515
Z n 8053354900925586107345 Z n 10090957490990599059905
P W 81470223071203980970 P W 3221354515970097002075
P W 8433456575988598852145 P W 12043857750992099204495
S P 101525253025 S P 301525303030
S P 801530253525 S P 1001530303530
T r 10575160135355155 T r 3028301765205596802280
T r 6098409840984098409840 T r 10098809905990599059905
S u 100155155190200185 S u 80140135175190185
S u 5011595130130130 S u 307580909580
S u 104540454040 B R 275757065200
C V 420701745176024555705 D J 31515354030
B O 23535404035 M a 28010565F140
S 5 * 4115445150750155 S 7 * 42002752201500215
S 10 * 4100250205620120 G P * 266706670667066706670
R a s * 2301751665280220 B h 1 * 235504007075
S H * 266706670667066706670 P 8 * 4208000800018804730
P 16 * 5208000800018804730 C B * 2252511525150
H 3 * 341565513003657500 H 6 * 6445142521908575565
H M * 22530252525 L e * 1011051575181510251200
Table 3. The number of worst function evaluations.
fn | SHZ | MHZ | HZ | HS | FR | fn | SHZ | MHZ | HZ | HS | FR
R n 1032,06541,140290,95555,88057,035 R n 3070,370110,205160,270159,340156,550
R n 50132,855194,055290,955269,790262,395 R n 80222,750324,810469,395417,150477,090
R n 100284,820297,950510,050598,930589,840 Z n 1015951870247523102145
Z n 3033,32530,84556,57548,82544,175 Z n 50117,045132,600213,180185,895179,265
Z n 80432,135396,900749,655697,410594,945 Z n 100918,595756,4901,000,4051,000,4051,000,405
P W 813,23020,07064,08035,8208730 P W 3270,455148,995320,100320,10068,475
P W 84284,325558,875840,225840,225182,325 P W 120530,585937,7501,200,3201,200,320543,895
S P 10165275275330275 S P 30465775930930930
S P 8012152430202528352025 S P 10015153030303035353030
T r 1063251760148539051705 T r 3087,73054,71563,705300,08070,680
T r 60600,240600,240600,240600,240600,240 T r 100800,2801,000,4051,000,4051,000,4051,000,405
S u 10015,65515,65519,19020,20018,685 S u 8011,34010,93514,17515,39014,985
S u 5058654845663066306630 S u 3023252480279029452480
S u 10495440495440440 B R 2225225210195600
C V 410,3508725880012,27528,525 D J 36060140160120
B O 2105105160120105 M a 2240315195F420
S 5 * 457522257503750775 S 7 * 410001375110075001075
S 10 * 4500125010253100600 G P * 220,01020,01020,01020,01020,010
R a s * 2905254995840660 B h 1 * 21051501200210225
S H * 220,01020,01020,01020,01020,010 P 8 * 410040,00040,000940023,650
P 16 * 510040,00040,000940023,650 C B * 2757534575450
H 3 * 3166026205200146030,000 H 6 * 63115997515,33060,0253955
H M * 27590757575 L e * 1012,15517,32519,96511,27513,200
Table 4. The number of best iterations.
fn | SHZ | MHZ | HZ | HS | FR | fn | SHZ | MHZ | HZ | HS | FR
R n 10360460490520510 R n 30230485190590420
R n 50705375490650440 R n 80230400920125460
R n 100240275885670705 Z n 1060601158075
Z n 30245330875875810 Z n 50570905223517651885
Z n 809351565408043653495 Z n 10016702545634560455095
P W 81801752080375225 P W 3228026101250390280
P W 8451035254115520410 P W 12053527452765395435
S P 10510102010 S P 301010102010
S P 801010102015 S P 1001010102015
T r 108585658055 T r 307351370230350220
T r 609840984098409840430 T r 10098809905990599059905
S u 1007080959575 S u 8065557510070
S u 505055607050 S u 304040404540
S u 102025202520 B R 21515151510
C V 4275275690370600 D J 31010102010
B O 21515202020 M a 2302020F15
S 5 * 415202025125 S 7 * 415151530100
S 10 * 415152015100 G P * 22518017060165
R a s * 21020951545 B h 1 * 22020302575
S H * 239025584015520010 P 8 * 41015515125
P 16 * 51015515125 C B * 21515201545
H 3 * 3555515 H 6 * 650505050175
H M * 21015151530 L e * 10654010570550
Table 5. The number of best function evaluations.
fn | SHZ | MHZ | HZ | HS | FR | fn | SHZ | MHZ | HZ | HS | FR
R n 103960506024,99057205610 R n 30713015,035589018,29013,020
R n 5035,95519,12524,99033,15022,440 R n 8018,63032,40074,52010,12537,260
R n 10024,24027,77589,38567,67071,205 Z n 106606601265880825
Z n 30759510,23027,12527,12525,110 Z n 5029,07046,155113,98590,01596,135
Z n 8075,735126,765330,480353,565283,095 Z n 100168,670257,045640,845610,545514,595
P W 81620157518,72033752025 P W 32924086,13041,25012,8709240
P W 8443,350299,625349,77544,20034,850 P W 12064,735332,145334,56547,79552,635
S P 1055110110220110 S P 30310310310620310
S P 8081081081016201215 S P 10010101010101020201515
T r 10935935715880605 T r 3022,78542,470713010,8506820
T r 60600,240600,240600,240600,24026,230 T r 100800,2801,000,4051,000,4051,000,4051,000,405
S u 10070708080959595957575 S u 8052654455607581005670
S u 5025502805306035702550 S u 3012401240124013951240
S u 10220275220275220 B R 24545454530
C V 413751375345018503000 D J 34040408040
B O 24545806060 M a 2906060F45
S 5 * 475100100125125 S 7 * 4757575150100
S 10 * 4757510075100 G P * 275540510180165
R a s * 230602854545 B h 1 * 26060907575
S H * 21170765252046520,010 P 8 * 450752575125
P 16 * 550752575125 C B * 24545604545
H 3 * 31515151515 H 6 * 6300300300300175
H M * 23045454530 L e * 107154401155770550
Table 6. The average time.
[Numeric entries corrupted in extraction. Columns: test function f, dimension n, and the average time for each conjugate gradient method (SHZ, MHZ, HZ, HS, FR); F marks a failed run.]
Table 7. The average number of iterations.
[Numeric entries corrupted in extraction. Columns: test function f, dimension n, and the average number of iterations for each conjugate gradient method (SHZ, MHZ, HZ, HS, FR); F marks a failed run.]
Table 8. The average number of function evaluations.
[Numeric entries corrupted in extraction. Columns: test function f, dimension n, and the average number of function evaluations for each conjugate gradient method (SHZ, MHZ, HZ, HS, FR); F marks a failed run.]
Table 9. The number of worst iterations.
[Numeric entries corrupted in extraction. Columns: test function f, dimension n, and the number of worst iterations for each hybrid stochastic conjugate gradient algorithm (the proposed HSSZH and the hybrids of the MHZ, HZ, HS and FR methods); F marks a failed run.]
Table 10. The number of worst function evaluations.
[Numeric entries corrupted in extraction. Columns: test function f, dimension n, and the number of worst function evaluations for each hybrid stochastic conjugate gradient algorithm (the proposed HSSZH and the hybrids of the MHZ, HZ, HS and FR methods); F marks a failed run.]
Table 11. The number of best iterations.
[Numeric entries corrupted in extraction. Columns: test function f, dimension n, and the number of best iterations for each hybrid stochastic conjugate gradient algorithm (the proposed HSSZH and the hybrids of the MHZ, HZ, HS and FR methods); F marks a failed run.]
Table 12. The number of best function evaluations.
[Numeric entries corrupted in extraction. Columns: test function f, dimension n, and the number of best function evaluations for each hybrid stochastic conjugate gradient algorithm (the proposed HSSZH and the hybrids of the MHZ, HZ, HS and FR methods); F marks a failed run.]
Table 13. The average time.
[Numeric entries corrupted in extraction. Columns: test function f, dimension n, and the average time for each hybrid stochastic conjugate gradient algorithm (the proposed HSSZH and the hybrids of the MHZ, HZ, HS and FR methods); F marks a failed run.]
Table 14. The average number of iterations.
[Numeric entries corrupted in extraction. Columns: test function f, dimension n, and the average number of iterations for each hybrid stochastic conjugate gradient algorithm (the proposed HSSZH and the hybrids of the MHZ, HZ, HS and FR methods); F marks a failed run.]
Table 15. The average number of function evaluations.
[Numeric entries corrupted in extraction. Columns: test function f, dimension n, and the average number of function evaluations for each hybrid stochastic conjugate gradient algorithm (the proposed HSSZH and the hybrids of the MHZ, HZ, HS and FR methods); F marks a failed run.]
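Per-algorithm metrics like those tabulated above (iterations, function evaluations, average time, with F marking failures) are summarized by performance profiles when ranking the algorithms. A minimal sketch of the Dolan–Moré performance profile computation, using an invented cost matrix rather than values from the tables:

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profiles.

    T: (n_problems, n_solvers) array of costs (e.g. function
       evaluations); np.inf marks a failed run ('F' in the tables).
    taus: 1-D array of ratio thresholds tau >= 1.
    Returns rho with shape (len(taus), n_solvers): rho[i, s] is the
    fraction of problems solver s solves within taus[i] times the
    cost of the best solver on that problem.
    """
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best cost on each problem
    ratios = T / best                     # r_{p,s}; failures stay inf
    taus = np.asarray(taus, dtype=float)
    # For each tau, count the problems with ratio <= tau per solver.
    rho = (ratios[None, :, :] <= taus[:, None, None]).mean(axis=1)
    return rho

# Hypothetical costs for 4 problems x 3 solvers; inf = failure.
T = [[100, 150, np.inf],
     [200, 200, 400],
     [ 50,  75,  60],
     [300, 250, 900]]
rho = performance_profile(T, taus=[1, 2, 4])
# rho[0] is the fraction of problems on which each solver was best.
```

A solver whose curve rho_s(tau) sits highest near tau = 1 is the most efficient; the height as tau grows measures robustness, which is how the paper's profile figures compare the five algorithms.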
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Alnowibet, K.A.; Mahdi, S.; Alshamrani, A.M.; Sallam, K.M.; Mohamed, A.W. A Family of Hybrid Stochastic Conjugate Gradient Algorithms for Local and Global Minimization Problems. Mathematics 2022, 10, 3595. https://doi.org/10.3390/math10193595
