A Family of Hybrid Stochastic Conjugate Gradient Algorithms for Local and Global Minimization Problems

Alnowibet, Khalid Abdulaziz; Mahdi, Salem; Alshamrani, Ahmad M.; Sallam, Karam M.; Mohamed, Ali Wagdy

doi:10.3390/math10193595

Open AccessArticle

A Family of Hybrid Stochastic Conjugate Gradient Algorithms for Local and Global Minimization Problems

¹

Statistics and Operations Research Department, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia

²

Department of Mathematics & Computer Science, Faculty of Science, Alexandria University, Alexandria 21544, Egypt

³

School of IT and Systems, University of Canberra, Canberra, ACT 2601, Australia

⁴

Operations Research Department, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt

^*

Author to whom correspondence should be addressed.

Mathematics 2022, 10(19), 3595; https://doi.org/10.3390/math10193595

Submission received: 23 August 2022 / Revised: 19 September 2022 / Accepted: 24 September 2022 / Published: 1 October 2022

(This article belongs to the Special Issue Development and Optimization of Mathematical Models for Operations Research)

Download

Browse Figures

Versions Notes

Abstract

:

This paper contains two main parts, Part I and Part II, which discuss the local and global minimization problems, respectively. In Part I, a fresh conjugate gradient (CG) technique is suggested and then combined with a line-search technique to obtain a globally convergent algorithm. The finite difference approximations approach is used to compute the approximate values of the first derivative of the function f. The convergence analysis of the suggested method is established. The comparisons between the performance of the new CG method and the performance of four other CG methods demonstrate that the proposed CG method is promising and competitive for finding a local optimum point. In Part II, three formulas are designed by which a group of solutions are generated. This set of random formulas is hybridized with the globally convergent CG algorithm to obtain a hybrid stochastic conjugate gradient algorithm denoted by HSSZH. The HSSZH algorithm finds the approximate value of the global solution of a global optimization problem. Five combined stochastic conjugate gradient algorithms are constructed. The performance profiles are used to assess and compare the rendition of the family of hybrid stochastic conjugate gradient algorithms. The comparison results between our proposed HSSZH algorithm and four other hybrid stochastic conjugate gradient techniques demonstrate that the suggested HSSZH method is competitive with, and in all cases superior to, the four algorithms in terms of the efficiency, reliability and effectiveness to find the approximate solution of the global optimization problem that contains a non-convex function.

Keywords:

global optimization; unconstrained minimization; numerical approximations of gradients; meta-heuristics; stochastic parameters; conjugate gradient methods; efficient algorithm; performance profiles; comparisons; testing

MSC:

90C26

1. Introduction

The major goal of this paper is to find the local and global minima of a convex and non-convex function. The local and global minimization problems are defined as follows.

Definition 1.

A local minimum

x_{lo} \in S

of the function f,

f : S \to R

is an input element with

f (x_{lo}) \leq f (x)

for all

x

neighboring

x_{lo}

. If

S \subseteq R^{n}

, it is formulated by

\forall x_{lo} \exists ε > 0 : f (x_{lo}) \leq f (x) \forall x \in S, ∥ x - x_{lo} ∥ \leq ε .

(1)

Definition 2.

The point

x_{gl} \in S

is called the global minimizer of the function f;

f : S \to R

such that

f (x_{gl}) \leq f (x)

\forall x \in S

. When

S \subseteq R^{n}

, then the problem can be formulated by

min_{x \in S} f (x) : S \to R,

(2)

In both problems (formulae)

S \subseteq R^{n}

is the range in which we find the global minimizer of

f (x)

.

f (x)

is continuously differentiable.

Global optimization (GO) attempts to find the approximate solution of the objective function are shown in Problem (2).

However, this task can be difficult since the knowledge about f is usually only local. On the other hand, the fastest algorithms (LO) prefer to find a local point since these algorithms are not capable of finding the global solution at each run.

The bottom line is that the core difference between the GO methods and the LO algorithms is as follows: the GO methods focus on solving Problem (2) over the given set, while the task of the LO methods is to solve (1). Consequently, solving Problem (1) is relatively simple by using deterministic (classical) local optimization methods. On the contrary, finding the global optimum of Problem (2) is an NP-hard problem.

Challenging problems arise in different application fields, for example, technical sciences, industrial engineering, economics, networks, chemical engineering, etc. See [1,2,3,4,5,6,7,8,9,10,11].

Recently, many optimization algorithms have been proposed to deal with these problems. The thoughts of those suggested methods rely on the standard of meta-heuristic strategies (random search).

There are different classifications for meta-heuristic methods [12].

Mohamed et al. [7] presented a brief description of these classifications.

In random algorithms, the minimization technique relies partly on probability.

In contrast, in the deterministic algorithms, a guessing scale is not utilized. Hence, deterministic techniques need an exhaustive examination over the research domain of function f to find the approximate solution to Problem (2) at each run. Otherwise, they fail in this task.

Therefore, finding the approximate solution to Problem (2) by using random techniques can be proved by the asymptotic convergence probability. See [13,14,15].

There are many deterministic methods that have been proposed for dealing with the local optimization problems. See, for example, Refs. [16,17,18,19,20].

The most popular deterministic method is the CG method [18]. CG methods are exceedingly utilized to find the local minimizer of Problem (1) [21].

However, the CG algorithms have a numerical weakness, so their subsequent actions might be low if a little step is created away from the local point. Hence, for solving this issue, a line-search technique is combined with the CG technique to create a globally convergent algorithm [22,23].

Therefore, many conjugant gradient line-search methods are suggested; see, for example, refs. [18,24,25,26,27,28].

The CG method is an efficient and inexpensive technique to deal with Problem (1).

The CG method is an iterative algorithm. Therefore, the candidate solutions are generated by the following recursive formula.

x_{k + 1} = x_{_{k}} + α_{_{k}} d_{_{k}},

(3)

where the step size

α_{_{k}} > 0

, and the directions

d_{_{k}}

are created by the following formula:

d_{k + 1} = - g_{_{k + 1}} + β_{_{k}} d_{_{k}}, d_{_{0}} = - g_{_{0}} .

(4)

where

g_{_{k}}

denotes the gradient vector of the function f at the point

x_{_{k}}

.

Several versions of the CG methods are suggested. The core difference between those CG algorithms relies on choosing the parameter

β_{_{k}}

[18,27,28,29]. The main features of the CG method are as follows: it has low memory requirements, it is strongly local, and it has global convergence properties [30].

Many authors presented several studies to analyze the CG method; see, for example, Refs. [31,32].

In 1964, the authors of [33] applied the CG methods to nonlinear problems, and they proposed the following parameter.

β_{_{k}}^{F R} = \frac{∥ g_{k + 1} ∥^{2}}{∥ g_{_{k}} ∥^{2}} .

(5)

The authors of [34,35] established the global convergence of the scheme defined in (5); they used an exact line search and an inexact line search respectively.

However, the author of [36] showed that there are some cases that have some strays; these jamming occurrences happen when the search directions

d_{_{k}}

are almost orthogonal to the gradient vector

g_{_{k}}

[18].

The authors of [37,38] presented a modification of the parameter

β_{_{k}}^{F R}

for treating the noise event denoted in [36]. Hence, they proposed the following parameter.

β_{_{k}}^{P R P} = \frac{y_{_{k}}^{T} g_{k + 1}}{| | g_{_{k}} {| |}^{2}},

(6)

where

y_{_{k}} = g_{k + 1} - g_{_{k}}

. When a noise occurs

g_{k + 1} \approx g_{_{k}}

,

β_{_{k}}^{P R P} \approx 0

, and

d_{k + 1} \approx - g_{k + 1}

, i.e., when jamming happens, the search direction

d_{_{k}}

is no longer perpendicular to the gradient vector

g_{_{k}}

, but it is aligned with the vector

- g_{_{k}}

. This built-in restart advantage of the

β_{_{k}}^{P R P}

parameter usually has better quick convergence when compared to the parameter

β_{_{k}}^{F R}

[18].

The authors of [39] proposed an approach closely related to

β_{_{k}}^{P R P}

, and it is defined as follows.

β_{_{k}}^{H S} = \frac{y_{_{k}}^{T} g_{k + 1}}{d_{_{k}}^{T} y_{_{k}}} .

(7)

in the case that step-size

α_{_{k}}

is found by an exact line search algorithm. Hence, by (4) and the orthogonality situation

g_{k + 1}^{T} y_{_{k}} = 0

, the following can be obtained:

d_{k}^{T} y_{_{k}} = {(g_{k + 1} - g_{_{k}})}^{T} d_{_{k}} = - d_{_{k}}^{T} g_{_{k}} = | | g_{_{k}} {| |}^{2} .

(8)

Therefore,

β_{_{k}}^{H S} = β_{_{k}}^{P R P}

when the step size

α_{_{k}}

is calculated by an exact line search method. Other fundamentals formulas of the parameter

β_{_{k}}

which contain one term are listed as follows.

β_{_{k}}^{L S} = \frac{g_{k + 1}^{T} y_{_{k}}}{- d_{_{k}}^{T} g_{_{k}}} .

(9)

Formula (9) was proposed by [40].

β_{_{k}}^{D Y} = \frac{∥ g_{k + 1} {| |}^{2}}{y_{_{k}}^{T} d_{_{k}}} .

(10)

Formula (10) was proposed by Dai and Yuan [41]. It is noteworthy that when the f is quadratic and step size

α_{_{k}}

is selected to reduce f along

d_{_{k}}

, the options of the parameter

β_{_{k}}

mentioned above are alike for the generic nonlinear function.

Different alternatives have fully different convergence possessions [18].

Many version of the parameter

β_{_{k}}

have been proposed in two- and three terms; see, for example, Refs. [32,42,43,44,45,46,47,48,49,50].

For example, in the following two approaches, we present some modifications to obtain a new CG method. See Section 2.

β_{_{k}}^{H Z} = \frac{(y_{_{k}}^{T} g_{_{k}}) (d_{k - 1}^{T} y_{_{k}}) - 2 | | y_{_{k}} {| |}^{2} (d_{k - 1}^{T} g_{_{k}})}{{(d_{k - 1}^{T} y_{_{k}})}^{2}} .

(11)

Formula (11) was proposed by [30].

β_{_{k}}^{M H Z} = \frac{(y_{_{k}}^{T} g_{_{k}}) (d_{k - 1}^{T} y_{_{k}}) - 2 | | y_{_{k}} {| |}^{2} (d_{k - 1}^{T} g_{_{k}})}{max {σ | | y_{_{k}} | |^{2} | | d_{_{k}} | |^{2}, {(d_{k - 1}^{T} y_{_{k}})}^{2}}},

(12)

where

σ > 0.5

is a constant. Formula (12) was proposed by [49]. The denominator

{(d_{k - 1}^{T} y_{_{k}})}^{2}

in the

β_{_{k}}^{H Z}

is modified to

max {σ ∥ y_{_{k}} ∥^{2} ∥ d_{_{k}} ∥^{2}, {(d_{k - 1}^{T} y_{_{k}})}^{2}}

in the

β_{_{k}}^{M H Z}

. This procedure may help the

d_{_{k}}

stay in a trusted area automatically beneath each iteration [49]. Furthermore, in a situation

σ | | y_{_{k}} {| |}^{2} | | d_{_{k}} {| |}^{2} < {(d_{k - 1}^{T} y_{_{k}})}^{2}

,

β_{_{k}}^{M H Z}

decreases to

β_{_{k}}^{H Z}

with

α_{_{k}}

calculated to satisfy the inexact line search. Moreover,

β_{_{k}}^{H Z}

decreases to

β_{_{k}}^{H S}

under the exact line search.

Consequently, by using a line search method, the CG method can satisfy the following descent condition:

g_{_{k}}^{T} d_{_{k}} \leq - C {∥ g_{_{k}} ∥}^{2},

(13)

where

C > 0

is a constant.

The sufficient descent condition (13) has a core task in the convergence analysis of the algorithms. See [17,30,31,32,35,41,49,51,52].

However, the CG method has a numerical obstacle; its sub-sequential phases might be low if a little step is created away from the intended point [49].

Recently, the authors of [48,49] proved that the CG algorithm includes powerful convergence features if it satisfies the trust-region feature that is determined by

| | d_{_{k}} | | < C_{_{v}} | | g_{_{k}} | |,

(14)

where

C_{_{v}} > 0

is a constant. It is shown, therefore, that the trust-region property can enable the search direction

d_{_{k}}

to be bounded in the trust radius [49]. Numerous researchers proposed many CG algorithms that give perfect results and powerful convergence properties. See [30,48,49,51].

The selection of the right step size

α_{_{k}}

can help the CG algorithms to achieve global convergence.

The exact line search is defined as follows:

f (x_{_{k}} + α_{_{k}} d_{_{k}}) = min_{α \geq 0} θ (α) = f (x_{_{k}} + α d_{_{k}}) .

(15)

It is clear that in big-scale problems, the exact line search cannot be used.

Therefore, there are many techniques to achieve this task. Formula (15), for example, the weak Wolfe–Powell algorithm (WWP), is a popular technique, and it is exceedingly utilized. The WWP technique is designed to find the step size

α_{_{k}}

to satisfy the following inequalities:

f (x_{_{k}} + α_{_{k}} d_{_{k}}) \leq f (x_{_{k}}) + δ α_{_{k}} g_{_{k}}^{T} d_{_{k}},

(16)

and

g {(x_{_{k}} + α_{_{k}} d_{_{k}})}^{T} d_{_{k}} \geq σ g_{_{k}}^{T} d_{_{k}},

(17)

where

δ \in (0, 0.5)

and

σ \in (δ, 1)

are constants.

Inequality (16) is named the Armijo condition, and the WWP line search decreases to strong Wolfe–Powell (SWP) by substituting Inequality (17) with the following inequality:

| g {(x_{_{k}} + α_{_{k}} d_{_{k}})}^{T} d_{_{k}} | \leq - σ g_{_{k}}^{T} d_{_{k}},

(18)

Generally, under the WWP line search, it is assumed that the gradient

g (x)

is Lipschitz continuous in the convergence analysis. Therefore, the following inequality is satisfied:

| | g (x) - g (y) | | \leq L | | x - y | |,

(19)

with L is a constant

\forall x, y \in R^{n}

.

In fact, the CG technique with the line search methods has proven notability in solving the local optimization problem [18,27,28]. However, in trying to solve Problem (2), the CG method fails to achieve this task per run because it is trapped to a local point. To prevent sticking in a local point, random parameters are used [53].

We can summarize the essence of the above discussions as follows.

Recently, there have been many and many proposed approaches presented to improve the performance of deterministic methods, such as CG methods, gradient descent methods, Newton methods, etc. Those new approaches are designed to deal with the local optimization problems. See, for example, Refs. [16,17,18,19,20].

On the other hand, a plentiful number of stochastic approaches are suggested to deal with the global optimization problems. See, for example, Refs. [1,2,4,5,7,54].

Therefore, to gain the features of both deterministic and stochastic methods, many studies presented several ideas and suggestions to combine deterministic and stochastic techniques to obtain a new technique that is efficient and effective in solving Problem (2). Numerical outcomes demonstrated that the interbreed between classical and stochastic techniques has been hugely successful. See [55,56,57,58,59].

This work focuses on solving the local and global minimization problems. So, the first part of this study trades with Problem (1) by suggesting a new modified CG method, while the second part of this paper presents a new random approach that includes three formulae by which the candidate solutions are generated randomly.

Therefore, the new proposed stochastic approach is combined with the new modified CG method that is proposed in the first part of this paper to obtain a new hybrid stochastic conjugate gradient algorithm that solves Problem (2). The new hybrid stochastic conjugate gradient algorithm has four formulae by which the candidate solutions are created. One of the four formulae is a purely deterministic formula, the second one is a mixture of deterministic and stochastic parameters, and the other two formulas contain parameters generated randomly. The bottom line is that we can claim that the main merit that makes the new hybrid algorithm capable of finding the approximate solution to the global minimum of a non-convex function comes from the hybridization of random and non-random parameters.

Consequently, the contribution of this paper is divided into two parts.

Part I presents the following contributions.

A new modified CG technique is proposed and added with a line search for obtaining a globally convergent algorithm that solves Problem (1). It is abbreviated by SHZ.
The convergence analysis of the SHZ algorithm is designed.
The gradient vector is estimated by using a numerical approximation approach (DFF); step-size h (interval) is randomly.
The convergence analysis of the DFF method is designed.
The four FR, SH, HZ and MZH methods are designed like the SHZ algorithm to solve Problem (1).
Numerical experiments of the five SHZ FR, SH, HZ and MZH algorithms are analyzed by using the performance profiles.

Part II presents the following contributions.

⋄: Stochastic parameters are designed (SP).
⋄: The five SHZ, FR, SH, HZ and MZH algorithms are hybridized with the SP technique to obtain five hybrid algorithms; HSSHZ, HSFR, HSSH, HSHZ and HSMZH. These five algorithms solve Problem (2).
⋄: Numerical experiments of the five HSSHZ, HSFR, HSSH, HSHZ and HSMZH algorithms are analyzed by using the performance profiles.

Consequently, the remainder of the study is arranged as follows.

Part I contains the following sections: Section 2 presents a new modified CG- SHZ technique with its convergence analysis.

In Section 3, the approximate value of the gradient vector is calculated by using the numerical differentiation. Section 4 presents the numerical investigations of the local minimization problem. Part II contains the following sections: Section 5 presents a random approach for unconstrained global optimization. Section 6 presents the hybridization of the conjugate gradient method with stochastic parameters. The numerical experiments of Problem (2) are presented in Section 7. Some concluding remarks are given in Section 8.

Part I: Local Minimization Problem

In this part, a new modified CG technique is presented, the convergence analysis of this technique is designed, the numerical differentiation approach is utilized to calculate the approximate values of the first derivative, the five algorithms are designed to solve Problem (1), and their numerical experiments are analyzed by using the performance profiles.

2. Suggested CG Method

Recently, the authors of [49] suggested a new MHZ-CG method, relying on the study which was proposed by the authors of [30]. The MHZ method contains the sufficient descent and the trust-region features independent of a line search technique. The parameter of the MHZ is defined by (12).

Therefore, the story in this section begins with the authors of [30] who proposed a new CG-HZ method, where the parameter of the HZ method is defined by (11). The parameter

β_{_{k}}^{H Z}

can ensure that

d_{_{k}}

satisfies the following inequality:

d_{_{k}}^{T} g_{_{k}} \leq - \frac{7}{8} | | g_{_{k}} {| |}^{2},

(20)

where (20) is proved by [30]. If the step size

α_{_{k}}

is calculated by the true line search, then

β_{_{k}}^{H Z}

decreases to the

β_{_{k}}^{H S}

that was proposed by [39] because

d_{_{k}}^{T} g_{_{k}} = 0

is true [49].

Hence, for obtaining the global convergence for a general function, Hager and Zhang [30] dynamically adjusted the down limitation of

β_{_{k}}^{H Z}

by

d_{_{k}} = - g_{_{k}} + β_{_{k}}^{H Z^{+}} d_{k - 1}, d_{_{0}} = - g_{_{0}},

(21)

β_{_{k}}^{H Z^{+}} = max {β^{H Z}, r_{_{k}}}

,

r_{_{k}} = \frac{- 1}{| | d_{k - 1} | | min {r, | | g_{_{k - 1}} | |}}

, where

r > 0

is a constant.

Many researchers have suggested several modifications and refinements to improve the performance of the CG-HZ algorithm. The latest version of the CG-HZ method was offered by [49]. Yuan et al. [49] presented some modifications to the HZ-CG method, and the result was obtaining the new CG-MHZ algorithm.

The CG-MHZ algorithm contains a sufficient condition and the trust-region feature.

The research direction of the MHZ-CG technique is designed as follows:

d_{_{k}} = - g_{_{k}} + β_{_{k}}^{M H Z} d_{k - 1}, d_{_{0}} = - g_{_{0}},

(22)

where the

β^{M H Z}

is defined by (12).

In this paper, the MHZ method is extended and modified to obtain a new proposed method called the SHZ method such that the SHZ method has a sufficient condition and the trust-region feature. This method is defined as follows:

d_{_{k}} = - g_{_{k}} + β_{_{k}}^{S H Z} d_{k - 1}, d_{_{0}} = - g_{_{0}},

(23)

β_{_{k}}^{S H Z} = \frac{(y_{_{k}}^{T} g_{_{k}}) (d_{k - 1}^{T} y_{_{k}}) - 2 | | y_{_{k}} {| |}^{2} (d_{k - 1}^{T} g_{_{k}})}{max {ϑ ∥ y_{_{k}} ∥^{2} ∥ d_{_{k}} ∥^{2}, {(d_{k - 1}^{T} y_{_{k}})}^{2}}},

(24)

where the

ϑ = max {ρ, R_{_{k}}}

, the

ρ

and

R_{_{k}}

are defined as follows. The parameter

ρ

is changed randomly at each iteration and its values are taken from the range

[0.8, 2)

and

R_{_{k}} = ▵ f ▵ x

. The values of

▵ f

and

▵ x

are calculated by

▵ f = | f_{_{0}} - f_{_{I t r}} |,

(25)

where Itr is the number of iterations, and after the Itr number of iterations,

f_{_{I t r}}

and

▵ f

are computed. Then, we set

f_{_{0}} = f_{_{I t r}}

, while

▵ x

is defined by

▵ x = ∥ x_{_{k + 1}} - x_{_{k}} ∥, for k = 0, 1, \dots, I t r .

(26)

Hence, when

ϑ = σ

,

β_{_{k}}^{S H Z}

inevitably reduces to one of the following methods

{β_{_{k}}^{M H Z}, β_{_{k}}^{H Z}, β_{_{k}}^{H S}}

as follows.

If

ϑ = σ

and

δ ∥ y_{_{k}} ∥^{2} {∥ d_{_{k}} ∥}^{2} > {(d_{k - 1}^{T} y_{_{k}})}^{2}

, the

β_{_{k}}^{S H Z}

reduces to the

β_{_{k}}^{M H Z}

. Otherwise,

β_{_{k}}^{S H Z}

reduces to

β_{_{k}}^{H Z}

or to

β_{_{k}}^{H S}

under the exact line search [49]. This procedure gives the advantages of the MHZ, HZ and HS methods to the proposed SHZ method. In other words, the SHZ algorithm gains the characteristics of the three MHZ, HZ and HS algorithms. This is why the SHZ algorithm is superior to the four other MHZ, HZ, HS and FR methods.

Note: The authors of [49] imposed that the

σ > 0.5

is a constant, while the parameter

ϑ

is modified dynamically at each iteration.

Convergence Analysis of Algorithm 1

In this section, we present the features of Algorithm 1. We also present the convergence analysis of this algorithm, and we show that the search direction

d_{_{k}}

that is defined by Formula (23) satisfies the sufficient descent condition and the trust-region merit, which are defined by Formulae (13) and (14), respectively.

Algorithm 1 A conjugate gradient method (CG-SHZ).

Input:: $f : R^{n} \to R$ , $f \in C^{1}$ , $γ \in (0, 1)$ , $k = 0$ , a starting point $x_{_{k}} \in R^{n}$ and $ε > 0$ .
Output:: $x^{*} = x_{_{loc}}$ the local minimizer of f, $f (x^{*})$ , the value of f at $x^{*}$
1:: Set $d_{_{0}} = - g_{_{0}}$ and $k : = 0$ .
2:: while $∥ g_{_{k}} ∥ > ε$ . do
3:: compute $α_{_{k}}$ to satisfy (16) and (17).
4:: Calculate a new point $x_{_{k + 1}} = x_{_{k}} + α_{_{k}} d_{_{k}}$ .
5:: compute $f_{_{k}} = f (x_{_{k + 1}})$ , $g_{_{k}} = g (x_{_{k + 1}})$
6:: Set $k = k + 1$ .
7:: calculate the search direction $d_{_{k}}$ by (23).
8:: end while
9:: return $x_{ac}$ the local minimizer and its function value $f_{a c}$

Two sensible hypotheses are assumed as follows.

Hypothesis 1.

We suppose that Problems (1) and (2) contain an objective function

f (x)

with the following characteristics: continuity and differentiability properties.

Hypothesis 2.

In some neighborhood ℵ of the level set

ℓ = {x \in R^{n} : f (x) \leq f (x_{_{0}})},

the gradient vector

g (x)

is Lipschitz continuous. This means that there is a fixed real number

L < \infty

such that

∥ g (x) - g (y) ∥ \leq L ∥ x - y ∥,

for all

x, y \in ℵ

.

Lemma 1.

Suppose that the sequence

{x_{_{k}}}

is obtained by Algorithm 1. If

d_{_{k}}^{T} y_{_{k}} \neq 0

, then

g_{_{k}}^{T} d_{_{k}} \leq - c {∥ g_{_{k}} ∥}^{2},

(27)

and

| | d_{_{k}} | | \leq r_{_{v}} ∥ g_{_{k}} ∥,

(28)

where

c = 1 - \frac{7}{9 ϑ} > 0

,

ϑ = max {ρ, R_{_{k}}}

, ρ is taken randomly from

\in [\frac{8}{10}, 2)

at each iteration of Algorithm 1,

0 \leq R_{_{k}} < \infty

, and

r_{_{v}} = (1 + \frac{3}{ϑ})

is the trust-region radius.

Proof.

If

k = 0

,

d_{_{0}} = - g_{_{0}}

, then

g_{_{0}}^{T} d_{_{0}} = - | | g_{_{0}} {| |}^{2}

and

| | d_{_{0}} | | = ∥ g_{_{0}} ∥

, which indicates (27) and (28) by picking

c \in (0, 1]

and

r_{_{v}} \in [1, \infty)

.

Merging (23) with (24), the result is obtaining the following:

g_{_{k}}^{T} d_{_{k}} = \frac{(y_{_{k}}^{T} g_{_{k}}) (d_{k - 1}^{T} y_{_{k}}) (g_{_{k}}^{T} d_{k - 1}) - 2 | | y_{_{k}} {| |}^{2} {(g_{_{k}}^{T} d_{k - 1})}^{2}}{max {ϑ ∥ y_{_{k}} ∥^{2} ∥ d_{k - 1} ∥^{2}, {(d_{k - 1}^{T} y_{_{k}})}^{2}}} - {∥ g_{_{k}} ∥}^{2} .

(29)

The following inequality

u^{T} v \leq \frac{1}{2} {(| | u | |}^{2} + {∥ v ∥}^{2})

is applied to the first term of the numerator of Inequality (29), where

u = d_{k - 1} g_{k}^{T} y_{_{k}}

,

v = y_{_{k}} g_{_{k}}^{T} d_{k - 1}

, and it is clear that

u^{T} v \leq \frac{7}{9} {(| | u | |}^{2} + {∥ v ∥}^{2})

is right.

Therefore, the following inequality obtains

g_{_{k}}^{T} d_{_{k}} = \frac{(y_{_{k}}^{T} g_{_{k}}) (d_{k - 1}^{T} y_{_{k}}) (g_{_{k}}^{T} d_{k - 1}) - 2 | | y_{_{k}} {| |}^{2} {(g_{_{k}}^{T} d_{k - 1})}^{2}}{max {ϑ ∥ y_{_{k}} ∥^{2} ∥ d_{k - 1} ∥^{2}, {(d_{k - 1}^{T} y_{_{k}})}^{2}}} - {∥ g_{_{k}} ∥}^{2} \leq

- ∥ g_{_{k}} ∥^{2} + \frac{\frac{7}{9} | | y_{_{k}} {| |}^{2} ∥ g_{_{k}} ∥^{2} | | d_{k - 1} {| |}^{2} + \frac{7}{9} | | y_{_{k}} {| |}^{2} {(g_{_{k}}^{T} d_{k - 1})}^{2} - 2 | | y_{_{k}} {| |}^{2} {(g_{_{k}}^{T} d_{k - 1})}^{2}}{max {ϑ ∥ y_{_{k}} ∥^{2} ∥ d_{k - 1} ∥^{2}, {(d_{k - 1}^{T} y_{_{k}})}^{2}}} =

- ∥ g_{_{k}} ∥^{2} + \frac{\frac{7}{9} | | y_{_{k}} {| |}^{2} ∥ g_{_{k}} ∥^{2} | | d_{k - 1} {| |}^{2} - \frac{11}{9} | | y_{_{k}} {| |}^{2} {(g_{_{k}}^{T} d_{k - 1})}^{2}}{max {ϑ ∥ y_{_{k}} ∥^{2} ∥ d_{k - 1} ∥^{2}, {(d_{k - 1}^{T} y_{_{k}})}^{2}}} \leq

- ∥ g_{_{k}} ∥^{2} + \frac{\frac{7}{9} | | y_{_{k}} {| |}^{2} ∥ g_{_{k}} ∥^{2} | | d_{k - 1} {| |}^{2}}{max {ϑ ∥ y_{_{k}} ∥^{2} ∥ d_{k - 1} ∥^{2}, {(d_{k - 1}^{T} y_{_{k}})}^{2}}} \leq (\frac{7}{9 ϑ} - 1) {∥ g_{_{k}} ∥}^{2},

such that

max \{ϑ ∥ y_{_{k}} ∥^{2} {∥ d_{k - 1} ∥}^{2}, {(d_{k - 1}^{T} y_{_{k}})}^{2}\} \geq ϑ ∥ y_{_{k}} ∥^{2} {∥ d_{k - 1} ∥}^{2},

(30)

where

ϑ = max {ρ, R_{_{k}}}

. Since

ϑ \geq \frac{8}{10}

and

c = 1 - \frac{7}{9 ϑ} > 0

, (27) is true.

By using (30), it is obvious that

\begin{matrix} ∥ d_{_{k}} ∥ = ∥ - g_{_{k}} + \frac{(y_{_{k}}^{T} g_{_{k}}) (d_{k - 1}^{T} y_{_{k}}) - 2 | | y_{_{k}} {| |}^{2} (d_{k - 1}^{T} g_{_{k}})}{max {ϑ ∥ y_{_{k}} ∥^{2} ∥ d_{_{k}} ∥^{2}, {(d_{k - 1}^{T} y_{_{k}})}^{2}}} d_{k - 1} ∥ \leq \\ ∥ - g_{_{k}} ∥ + \frac{| | y_{_{k}} {| |}^{2} ∥ g_{_{k}} ∥ ∥ d_{k - 1} ∥^{2} + 2 ∥ y_{_{k}} ∥^{2} ∥ g_{_{k}} ∥ ∥ d_{k - 1} ∥^{2}}{ϑ ∥ y_{_{k}} ∥^{2} {∥ d_{k - 1} ∥}^{2}} = (1 + \frac{3}{ϑ}) ∥ g_{_{k}} ∥ \end{matrix}

Consequently, (28) is met, where

r_{_{v}} \in [1 + \frac{3}{ϑ}, \infty)

. The proof is complete. □

Corollary 1.

According to Formula (28) of Lemma 1, the following formula is met.

\sum_{k = 0}^{\infty} \frac{∥ g_{_{k}} ∥^{4}}{∥ d_{_{k}} ∥^{2}} = \infty .

(31)

Proof.

Since

∥ d_{_{k}} ∥ \leq r_{_{v}} {∥ g_{_{k}} ∥}^{2}

, where

1 < r_{_{v}} < \infty

, then

∥ d_{_{k}} ∥^{2} \leq r_{_{v}}^{2} {∥ g_{_{k}} ∥}^{4}

, therefore,

\frac{∥ d_{_{k}} ∥^{2}}{∥ g_{_{k}} ∥^{4}} \leq r_{_{v}}^{2}

, hence

\frac{∥ g_{_{k}} ∥^{4}}{∥ d_{_{k}} ∥^{2}} \geq \frac{1}{r_{_{v}}^{2}}

. Now, the final expression is summed as

k \to \infty

. The result is obtaining the following inequality:

\sum_{k = 0}^{\infty} \frac{∥ g_{_{k}} ∥^{4}}{∥ d_{_{k}} ∥^{2}} \geq \sum_{k = 0}^{\infty} \frac{1}{r_{_{v}}^{2}} = \frac{1}{r_{_{v}}^{2}} \sum_{k = 0}^{\infty} 1 = \infty

. Therefore, (31) is met. □

Under the assumptions, we give a helpful lemma that was basically proved by Zoutendijk [60] and Wolfe [61], Wolfe [62].

Lemma 2.

Assume that the

x_{_{0}}

is the initial point by which Assumption 1 is satisfied. Regarding any algorithm of Formula (23),

d_{_{k}}

is a descent direction, and

α_{_{k}}

satisfies the standard Wolfe conditions (16) and (17). Hence, the following inequality is met:

\sum_{k = 0}^{\infty} \frac{{(g_{_{k}}^{T} d_{_{k}})}^{2}}{∥ d_{_{k}} ∥^{2}} < \infty

(32)

Proof.

It tracks Formula (17), such that

d_{_{k}}^{T} y_{_{k}} = d_{_{k}}^{T} (g_{_{k + 1}} - g_{_{k}}) \geq (σ - 1) g_{_{k}}^{T} d_{_{k}} .

(33)

On the other hand, the Lipschitz condition (19) implies

{(g_{_{k + 1}} - g_{_{k}})}^{T} d_{_{k}} \leq α_{_{k}} L {∥ d_{_{k}} ∥}^{2} .

(34)

The above two inequalities give

α_{_{k}} \geq \frac{σ - 1}{L} . \frac{g_{_{k}}^{T} d_{_{k}}}{∥ d_{_{k}} ∥^{2}},

(35)

which with (16) implies that

f_{_{k}} - f_{_{k + 1}} \geq c \frac{{(g_{_{k}}^{T} d_{_{k}})}^{2}}{∥ d_{_{k}} ∥^{2}},

(36)

where

c = \frac{δ (1 - σ)}{L}

. By summing (36) and with the observation that f is limited below, we see that (32) holds, which concludes the proof. □

Theorem 1.

Suppose that Hypotheses 1 and 2 hold, and by utilizing the outcome of Corollary 1, the sequence

{g_{_{k}}}

that is generated by Algorithm 1 satisfies the following:

lim_{k \to \infty} inf ∥ g_{_{k}} ∥ = 0,

(37)

Proof.

By contradiction, suppose that (37) is not true; then, for some

ϵ > 0

, the following inequality is true:

∥ g_{_{k}} ∥ \geq ϵ .

(38)

Hence, with inequality (38) and (27), we obtain

{g_{k}}^{T} d_{_{k}} \leq - c {∥ g_{_{k}} ∥}^{2} \leq - ϵ^{2} .

(39)

Then, we have

\frac{{g_{k}}^{T} d_{_{k}}}{∥ d_{_{k}} ∥} \leq \frac{- ϵ^{2}}{∥ d_{_{k}} ∥};

\frac{{g_{k}}^{T} d_{_{k}}}{∥ d_{_{k}} ∥} \geq \frac{ϵ^{4}}{∥ d_{_{k}} ∥^{2}},

and by summing the final expression, we obtain

\sum_{k = 0}^{\infty} \frac{{({g_{k}}^{T} d_{_{k}})}^{2}}{∥ d_{_{k}} ∥^{2}} \geq \sum_{k = 0}^{\infty} \frac{ϵ^{4}}{∥ d_{_{k}} ∥^{2}} = \infty .

(40)

Therefore, the above leads to a contradiction with (32). So, (37) is met. □

Note 1: The search direction

d_{_{k}}

that is defined by Formula (23) satisfies the sufficient descent condition which is defined by Formula (13).

Note 2: Lemma 1 guarantees that Algorithm 1 has a sufficient descent property and the trust-region feature automatically.

Note 3: Theorem 1 confirms that the series

{g_{_{k}}}

that is obtained by Algorithm 1 approaches to 0 as long as

k \to \infty

.

In the next section, the numerical differentiation approach is discussed by which the first derivative is estimated and the step size

α_{k}

is computed.

3. Numerical Differentiation

We now turn our attention to the numerical approximation to compute the approximate value of the gradient vector. In precept, it can be possible to find an analytic form for the first derivative for any continuous and differentiable function. However, in some cases, the analytic form is very complicated. The numerical approximation of the derivative may be sufficient for some purposes.

In this paper, the values of the

α_{_{k}}

,

g_{_{k}}

and the direction

d_{_{k}}

are computed by using the numerical differentiation method. Moreover, we have another step size and research directions that are generated randomly.

Several suggested methods have given fair outcomes for computing the gradient vector values numerically. See [63,64,65,66,67].

The common approaches by which the first derivative is computed are the finite difference approximation methods. Therefore, the first derivative

f^{'} (x)

can be estimated by the following numerical differentiation formula:

D_{_{f}} f (x_{_{i}}) = \frac{f (x_{_{i + 1}}) - f (x_{_{i}})}{x_{_{i + 1}} - x_{_{i}}} = \frac{f (x_{_{i}} + h) - f (x_{_{i}})}{h},

(41)

where h is limited and little, but it is not necessarily infinitesimally small.

Reasonably, if the value of the h is small, the approximated value of the first derivative may improve. The forward difference and the central difference are the familiar and common methods used in many studies; see for example, [68,69,70,71,72].

The Taylor series can be used to derive these formulas. Thus, 3, 4 and 5 points can be utilized to derive these formulas, but it will be more costly than utilizing 2 points. The central difference method is known to include aspects of both accuracy and precision [73] but it needs

2 n

function evaluations against the forward-difference approximation approach, which needs n function evaluations for each iteration. So, in this study, the forward-difference approximation approach is used, because it is a cheap method and it has sensible precision [66,68].

The advantage of the finite difference approximation approaches relies on choosing the fit values of the h.

Error approximation of the first derivative is discussed in the next section.

Therefore, the discussion of the error analysis guides us to define an appropriate finite-difference interval for the forward-difference approximation that balances the truncation error that grows from the error in the Taylor formula, and the magnitude error that is obtained from noise during computing the function values [66].

3.1. Error Analysis

Formula (41) contains the forward-difference approximation form that is used to estimate the first derivative of the function f. Its errors are proportional to some power of the values of h. Therefore, it appears that the errors go on to reduce if h is reduced. However, it is a part of the problem since it is assumed only the truncation error yielded by truncating the high-order terms in the Taylor series expansion and does not take into account the round-off error induced by quantization. The round-off error is beside the truncation error; all of them are discussed in this section as follows.

Regarding this goal, suppose that the function values

f (x)

,

f (x + h)

, are quantized to

θ_{_{1}} = f (x + h) + ϵ_{_{1}}

,

θ_{_{0}} = f (x) + ϵ_{_{0}}

, with the sizes of the round-off errors

ϵ_{_{1}}

and

ϵ_{_{0}}

all being smaller than some positive number

ε

, that is

| ϵ_{_{j}} | \leq ε

; with

j = 0, 1

.

Hence, the total error of the forward difference approximation defined by (41) is derived by

D_{_{f}} f (x) = \frac{θ_{_{1}} - θ_{_{0}}}{h} = \frac{f (x + h) + ϵ_{_{1}} - f (x) - ϵ_{_{0}}}{h} = f^{'} (x) + \frac{ϵ_{_{1}} - ϵ_{_{0}}}{h} + \frac{T_{_{f}}}{2} h .

(42)

Hence,

| D_{_{f}} f (x) - f^{'} (x) | \leq | \frac{ϵ_{_{1}} - ϵ_{_{0}}}{h} | + | \frac{T_{_{f}}}{2} | h \leq \frac{2 ε}{h} + \frac{| T_{_{f}} |}{2} h,

(43)

with

T_{_{f}} = f ″ (x)

. Therefore, the upper bound of the error is illustrated by the right-hand side of Formula (43). The maximum limited of error contains two expressions; the first comes from the rounding error and in inverse proportion to step-size h, whilst the second comes from the truncation error and in direct proportion to h. These two parts can be formulated as a function

ϕ (h)

with respect to h as follows

ϕ (h) = \frac{2 ε}{h} + \frac{| T_{_{f}} |}{2} h

. Now, if we find the minimizer

h^{*}

of the function

ϕ (h)

, then the value

ϕ (h^{*})

is the upper bound of the total error. Hence

\frac{d ϕ (h)}{d h} = \frac{- 2 ε}{h^{2}} + \frac{| T_{_{f}} |}{2} = 0

, then

h^{*} = 2 \sqrt{\frac{ε}{| T_{_{f}} |}} = 2 \sqrt{\frac{ε}{| f ″ (x) |}} .

(44)

Therefore, it can be concluded that as we create small values of h, the round-off error might grow, whilst the truncation error reduces. It is called the “step-size dilemma”.

Consequently, there have to be some optimal values of the

h^{*}

for the forward difference approximation formula, as derived analytically in (44). However, Formula (44) is only of theoretical value and cannot be used practically to determine

h^{*}

because we do not have any information about the second derivative and, therefore, we cannot estimate the values of

T_{_{f}}

.

Therefore, there are many approaches which have been presented to deal with the step-size dilemma.

Recently, Shi et al. [66] proposed a bisection search for finding a finite-difference interval for a finite-difference method. Their approach was presented to balance the truncation error that grows from the error in the Taylor formula and the measurement error obtained from noise in the function evaluation. According to their numerical experience, the finite-difference interval

h^{*}

are bounded between the following ranges

[2 \times 10^{- 4}, 6.32 \times 10^{- 1}]

,

[2.72 \times 10^{- 4}, 8.26 \times 10^{0}]

and

[8.44 \times 10^{- 3}, 3.94 \times 10^{0}]

by using the forward and central differences to estimate the values of the first derivative of the f.

Additionally, the authors of [68] gave a study of the theoretical and practical comparison of the approximate values of the gradient vector in derivative-free optimization. These authors analyzed some approaches for approximating gradients of noisy functions utilizing only function values; those techniques include a finite difference.

The values of the finite difference interval are as follows

10^{- 8} \leq h^{*} \leq 1

.

According to the earlier investigations, the core of the difference between all approaches is to determine the step size h. Hence, the value of the step size is ranged between this range

h^{*} \in [1, 12 \times 10^{- 10}]

.

In this paper, the h is designed in a way that makes its values generated randomly. Additionally, the values of the h are connected to the function values per iteration to cover this domain, thus the feature here is that the value of h is modified per iteration randomly.

Therefore, a fresh approach to define the

h^{*}

is presented in the following section.

3.2. Selecting a Step-Size h

The forward difference approach is a cheap method compared to the different techniques.

The forward difference approach has shown promising results for minimizing noisy black-box functions [66].

Depending on the hypotheses which are listed in Section 2, let

x_{_{0}}

be any starting point, thus function f satisfies the following

f_{_{0}} \geq f_{_{1}} \geq \dots \geq f_{_{k}}

, for

k = 0, 1, 2, \dots

. The numerical outcomes that are given in the past papers denote that the values of step-size h belong to the following range

[10^{- 10}, \leq 1]

.

Therefore, the next Algorithm 2 is created to generate the values of the

h^{*}

randomly from the intervals

[0.1, 10^{- 8}]

.

Algorithm 2 Algorithm for calculating the values of

h^{*}

.

Step 1: At each iteration k, we generate a set random values between

10^{- 2}

, and

10^{- 7}

, and this set of random values is denoted by

L_{_{ϵ}} = {l_{_{ϵ_{1}}}, l_{_{ϵ_{2}}}, \dots, l_{_{ϵ_{10}}}}

.

Step 2: The minimum and maximum of the set

L_{_{ϵ}}

are extracted, respectively, as follows

M_{_{ϵ}} = min {l_{_{ϵ_{i}} : i = 1, 2, \dots, 10}}

,

N_{_{ϵ}} = max {l_{_{ϵ_{i}} : i = 1, 2, \dots, 10}}

and set

M_{_{f}} = M_{_{ϵ}}^{- 1}

.

Step 3: The function value f is calculated at each k;

f_{_{k}} = f (x_{_{k}})

.

Now we determine two cases according to the function values of the

| f_{_{k}} |

as follows.

Case 1: If

| f_{_{k}} | \in [10^{- 1}, \infty)

, the value of the h is determined by

h_{_{k}} = \{\begin{matrix} \sqrt{\frac{N_{_{ϵ}}}{M_{_{f}}}} & if | f_{_{k}} | > M_{_{f}}, \\ \sqrt{\frac{M_{_{ϵ}}}{| f_{_{f}} |}} & otherwise . \end{matrix}

(45)

Case 2: If

| f_{_{k}} | \in [0, 10^{- 1})

, the value of the h is determined by a random way from the range

[10^{- 4}, 10^{- 8}]

.

Example: In this example, we show how the above algorithm is run.

Let us suppose that the point

x_{_{0}}

has four different values as starting points with four different values of f, for example,

f_{_{0}} = f (x_{_{0}}) = {10^{10}, 10^{6}, 10^{3}, 10^{- 1}}

and suppose we generate the set

L_{_{ϵ}}

as random values between

10^{- 1}

, and

10^{- 7}

such that

L_{_{ϵ}} = {1.50 \times 10^{- 4}, 5.10 \times 10^{- 6}, 1.01 \times 10^{- 6}, 1.40 \times 10^{- 2}, 1.78 \times 10^{- 7}, 1.92 \times 10^{- 5}, 1.09 \times 10^{- 3}, 2.77 \times 10^{- 4}, 2.99 \times 10^{- 04}, 5.15 \times 10^{- 4}}

,

M_{_{ϵ}} = 1.78 \times 10^{- 7}

; hence,

M_{_{f}} = 5.618 \times 10^{6}

, since

f_{_{0}} = 10^{10} > M_{_{f}} = 5.618 \times 10^{6}

, then we set

F_{_{0}} = M_{_{f}} = 5.618 \times 10^{6}

and

h_{_{1}} = 2 \sqrt{\frac{M_{_{ϵ}}}{M_{_{f}}}} = 2 \sqrt{\frac{1.78 \times 10^{- 7}}{5.618 \times 10^{6}}} = 3.56 \times 10^{- 7}

. If

f_{_{0}} = 10^{6}

,

f_{_{0}} = 10^{6} < M_{_{f}} = 5.618 \times 10^{6}

, and then

h_{_{1}} = 2 \sqrt{\frac{M_{_{ϵ}}}{F_{_{0}}}} = 2 \sqrt{\frac{5.618 \times 10^{6}}{10^{6}}} = 8.438 \times 10^{- 7}

, and

f_{_{0}} = {10^{3} < M_{_{f}} 5.618 \times 10^{6}

, we set

F_{_{0}} = 10^{3}

, then

h_{_{1}} = 2 \sqrt{\frac{M_{_{ϵ}}}{F_{_{0}}}} = 2 \sqrt{\frac{5.618 \times 10^{6}}{10^{3}}} = 2.6683 \times 10^{- 5}

.

Finally, if

f_{_{0}} = 10^{- 1}

, then

h_{_{1}} = 2 \sqrt{\frac{5.618 \times 10^{6}}{10^{- 1}}} = 2.67 \times 10^{- 3}

.

The above example shows how Case 1 is implemented by using Formula (45).

Regarding Case 2 when

0 \leq | f_{_{k}} | < 0.1

, the value of the

h_{_{k}}

is taken randomly from the range

[10^{- 4}, 10^{- 8}]

.

3.3. Estimating Gradient Vector

The forward finite difference (DFF) is utilized to compute the approximate value of the gradient vector of function f at

x \in R^{n}

by

{[DFF]}_{_{i}} = \frac{f (x + {he}_{_{i}}) - f (x)}{h}, for i = 1, 2, \dots, n .

(46)

where

h > 0

is the finite difference interval defined in Section 3.2, and

e_{_{i}} \in R^{n}

is the

i^{t h}

column of the identity matrix.

Therefore,

g (x) \approx DFF (x)

, is the approximate value of the gradient vector of function f at point

x

.

Therefore, the step size

φ_{_{k}}

is defined in the following.

The function

f (x)

is estimated by utilizing Taylor’s expansion up to the linear term around the point

x_{_{k}}

, for each iteration k. Then we have

f (x_{_{k}} + p) \approx f (x_{_{k}}) + g {(x_{_{k}})}^{T} p .

We define the quadratic model of

f (x)

at

x_{_{k}}

as

m_{k} (p) = \frac{1}{2} {(f (x_{_{k}}) + g {(x_{_{k}})}^{T} p)}^{2} = \frac{1}{2} f {(x_{_{k}})}^{2} + f (x_{_{k}}) g {(x_{_{k}})}^{T} p + \frac{1}{2} p^{T} g (x_{_{k}}) g {(x_{_{k}})}^{T} p .

Set

p = - φ g (x_{_{k}})

where

φ

is the step size along the −

g (x_{_{k}})

. The optimal value of the

φ

is picked by solving the following subproblem:

min_{φ \in R} m_{k} (φ) = \frac{1}{2} f {(x_{_{k}})}^{2} - φ f (x_{_{k}}) g {(x_{_{i}})}^{T} g (x_{_{k}}) + \frac{1}{2} φ^{2} {(g {(x_{_{k}})}^{T} g (x_{_{k}}))}^{2}

. This gives

φ_{k} = \frac{f (x_{_{k}})}{∥ g (x_{_{k}}) ∥^{2}} .

(47)

Therefore,

∥ g (x_{_{k}}) ∥^{2} = \frac{f (x_{_{k}})}{φ_{_{k}}}, φ_{_{k}} \neq 0,

(48)

where

g (x_{_{k}}) \approx DFF (x_{_{k}})

.

3.4. Convergence Analysis of DFF

The condition which is usually utilized in the convergence analysis of first-order methods with inexact gradient (DFF) vectors is defined by

| | DFF (x) - g (x) | | \leq C | | g (x) | |,

(49)

for some

0 \leq C < 1

. This condition is introduced by [74,75] and it is called a norm condition. This condition denotes that the

g (x) \approx D F F (x)

is a descent direction for the function f [68].

However, condition (49) cannot be applied, unless we know

∥ g (x) ∥

; therefore, this condition might be hard or impossible to verify.

There are many authors who have attempted to deal with this issue; see, for example, Refs. [68,76,77,78,79]. Byrd et al. [76] suggested a practical approach to estimate

∥ g (x_{_{k}}) ∥

, and they utilized it to guarantee some approximation of (49). Cartis and Scheinberg [77] and Paquette and Scheinberg [79] replaced condition (49) by

∥ D F F (x) - g (x) ∥ \leq k α_{_{k}} | | g (x) | |,

(50)

where

k > 0

, and convergence rate analysis were derived for a line search method that has access to deterministic function values in [77] and stochastic function values (with additional assumptions) in [79]. Berahas et al. [68] established conditions under which (49) holds. For the forward finite differences method (DFF), they set

h^{*} = 2 \sqrt{\frac{M_{_{ε}}}{L}}

.

Therefore, we present the following

Theorem 2.

Under Assumptions 1 and 2 of Section 2, let

DFF (x)

denote the forward finite difference approximation to the gradient

g (x)

. Then, for all

x \in R^{n}

, the following inequality is true:

| ∥ D F F (x_{_{k}}) ∥_{_{\infty}} - {∥ g (x_{_{k}}) ∥}_{_{\infty}} | \leq | f {(x_{_{k}})}_{h_{_{i}}} - f (x_{_{k}}) | + \frac{f (x_{_{k}})}{φ_{_{k}}}, φ_{_{k}} \neq 0,

(51)

where the value of the

φ_{_{k}}

is estimated by (47). We know that

{∥ X ∥}_{_{\infty}}

and

∥ X ∥

are the norm infinity and the 2-norm, respectively, and they are defined by

{∥ X ∥}_{_{\infty}} = max_{1 \leq i \leq n} | x_{_{i}} |,

(52)

∥ X ∥ = \sqrt{\sum_{_{i}} x_{_{i}}^{2}},

(53)

and then

{∥ X ∥}_{_{\infty}} = max_{1 \leq i \leq n} | x_{_{i}} | \leq \sqrt{\sum_{_{i}} x_{_{i}}^{2}} .

(54)

According to (46) which defines the gradient approximation by forward differences, the vector of

{[DFF (x_{_{k}})]}_{_{i}}

is described by

{[DFF (x_{_{k}})]}_{_{i}} = \frac{1}{h} {[f (x_{_{k}} + e_{_{i}} h) - f (x_{_{k}})]}_{_{i}}

, wherer

i = 1, 2, \dots, n

, then

∥ DFF (x_{_{k}}) ∥_{_{\infty}} = max_{1 \leq i \leq n} | {[\frac{f (x_{_{k}} + e_{_{i}} h) - f (x_{_{k}})}{h}]}_{_{i}} | = \frac{1}{h} max_{1 \leq i \leq n} | {[f (x_{_{k}} + e_{_{i}} h) - f (x_{_{k}})]}_{_{i}} |,

and therefore, the next inequality is true

∥ D F F (x_{_{k}}) ∥_{_{\infty}} = \frac{1}{h} max_{1 \leq i \leq n} | {[f (x_{_{k}} + e_{_{i}} h) - f (x_{_{k}})]}_{_{i}} | \leq | f {(x_{_{k}})}_{h_{_{i}}} - f (x_{_{k}}) | .

(55)

By using (48), (51), (54) and (55), we obtain

| ∥ DFF (x_{_{k}}) ∥_{_{\infty}} - ∥ g (x_{_{k}}) ∥_{_{\infty}} | \leq ∥ DFF (x_{_{k}}) ∥_{_{\infty}} + ∥ g (x_{_{k}}) ∥_{_{\infty}} \leq | f {(x_{_{k}})}_{h_{_{i}}} - f (x_{_{k}}) | + ∥ g (x_{_{k}}) ∥^{2} = | f {(x_{_{k}})}_{h_{_{i}}} - f (x_{_{k}}) | + \frac{f (x_{_{k}})}{φ_{_{k}}}, φ_{_{k}} \neq 0

.

Therefore, the theorem holds.

4. Numerical Experiments of Part I

All experiments were run on a PC with Intel(R) Core(TM) i5-3230M [email protected] 2.60 GHz with RAM 4.00 GB of memory on a Windows 10 operating system. The five methods were coded by utilizing MATLAB version 8.5.0.197613 (R2015a) and the machine epsilon was about

10^{- 16}

.

The model optimization test problems are categorized into two types. The first type is the test problems that contain a convex function, while the second type include a non-convex function. Both kinds of test problems are listed in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8 such that the second type of the test problem is referred to by *. Columns 1–4 of Table 1 give the data of the test problems as follows: the abbreviation of the function f is given on Column 1, the number of variables n is listed on Column 2, the exact function value

f (x^{*})

at the global point

x^{*}

is presented on Column 3, and the exact value of the norm of the gradient

∥ g (x^{*}) ∥

vector is given by Column 4, where the mark “−” denotes that the value of the norm of the gradient

∥ g (x^{*}) ∥

for the convex function satisfies the stopping criterion

∥ g (x^{*}) ∥ < 10^{- 6}

. Columns 5–8 are as Columns 1–4.

The data in Table 1 are taken from [56].

The numerical results for the local minimizers of all test problems are listed in Table 2, Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8. Columns 1–2 and 8–9 contain the abbreviation of the function f and the number of the variables n, respectively. Columns 3–7 contain the abbreviation of each algorithm of the five algorithm SHZ, MHZ, HZ, HS and FR, which present the number of worst iterations, number of worst function evaluations, number of best iterations, number of best function evaluations, average of time (CPU), average of the number of iterations and average of the number of function evaluations, respectively. Columns 10–14 are similar to Columns 3–7.

Note 1: It is worth noting that the full name for each test function is mentioned in Appendix A according to the reference in which the test problem is.

Note 2: F denotes that the algorithm has failed to find the local minimizer of the function f according to the stopping criteria of Algorithm 1 which are listed in Section 4.1 below.

The stopping criteria of Algorithm 1 are as follows.

4.1. Stopping Criteria of Algorithm 1

Since this section focuses in finding a local minimizer of all test problems, the stopping criteria of Algorithm 1 can be defined as follows.

According to the discussions of the convergence analysis which are mentioned in the previous sections, the stopping criterion of Algorithm 1 is, if

∥ g (x_{_{k}}) ∥ \leq ε_{_{1}}

is satisfied, Algorithm 1 stops, where

ε_{_{1}} \in [10^{- 6}, 10^{- 8}]

. However, the exact value of the gradient vector is unknown since the value of the gradient vector is estimated by Formula (46); therefore, this condition is replaced by

∥ {DFF}_{_{k}} ∥ \leq ε_{_{2}}

or

FEs = n 10^{4}

, i.e., if one of them is met, Algorithm 1 stops, where

ε_{_{2}} \in [10^{- 7}, 10^{- 9}]

, FEs denotes the maximum function evaluations and n is the number variables of the f.

In the following section, the performance profile is presented as an easy tool to compare the performance of our proposed method versus other methods in finding local minimizers of convex or non-convex functions regarding the worst and best numbers of iterations and function evaluations, the average of CPU time and the average of iterations and function evaluations, respectively.

4.2. Performance Profiles

The performance profile is the best tool for testing the performance of the proposed algorithms [80,81,82,83,84].

In this paper, the five algorithms’ performance evaluation standards are as follows: the worst and best numbers of iterations and function emulations, and the average of the CPU time, iterations and function emulations. They are abbreviated as itr.w, itr.be, FEs.w, FEs.be, time.a, itr.a and EFs.a, respectively. In the remainder of the paper, the set Fit will be used to denote the seven criteria;

Fit = {itr.w, itr.be, FEs.w, FEs.be, time.a, itr.a, EFs.a}

.

Therefore, the numerical outcomes are presented in the form of performance profiles, as depicted in [82]. The most important characteristic of the performance profiles is that they can be shown in one figure by plotting for the different solvers a cumulative distribution function

ρ_{_{s}} (τ)

.

The performance ratio is defined by first setting

r_{p, s} = \frac{t_{p, s}}{min {t_{p, s} : s \in S}}

, where

p \in P

, P is a set of test problems, S is the set of solvers, and

t_{p, s}

is the value obtained by solver s on test problem p.

Then, define

ρ_{_{s}} (τ) = \frac{1}{| P |} size {p \in P : r_{p, s} \leq τ}

, where

| P |

is the number of test problems.

The value of

ρ_{_{s}} (1)

is the probability that the solver will win over the remaining ones, i.e., it will yield a value lower than the values of the remaining ones.

In the following, the performance profiles are utilized to evaluate the performance of the five methods: SHZ, MHZ, HZ, SH and FR.

Therefore, in this paper, the term

t_{p, s}

indicates one element of the set Fit,

| P | = 46

is the number of test problems. We have 46 unconstrained test problems, 14 of which include non-convex functions. The group of solvers

S = {S H Z, M H Z, H Z, S H, F R}

finds the local minimizers of the 46 test problems; therefore, the values of the

F i t

are taken from the results of the 46 test problems as follows.

Each solver s of the set S is run 51 times for each of the 46 problems; at each run, every element of the set Fit has owned its value. So, they are analyzed in the following.

r_{p, s} = \{\begin{matrix} \frac{{fit}_{p, s}}{min {{fit}_{p, s} : s \in S}} & if the s pass to solve the p, \\ \infty & otherwise, \end{matrix}

(56)

where

{fit}_{p, s}

is an element of the Fit for the test problem p by using the solver s.

Note: Formula (56) means that if the final result, obtained by a solver

s \in S

, satisfies Inequality (57), then the first branch of (56) is computed. Otherwise, we set

r_{p, s} = \infty

.

∥ D F F_{_{k}} ∥ \leq ε_{_{2}},

(57)

where

ε_{_{2}} \in [10^{- 5}, 10^{- 9}]

.

Therefore, the performance profile of solver s is defined as follows:

δ (r_{p, s}, τ) = \{\begin{matrix} 1 & if r_{p, s} \leq τ, \\ 0 & otherwise, \end{matrix}

(58)

Therefore, the performance profile for solver s is then given by the following function:

ρ_{_{s}} (τ) = \frac{1}{| P |} \{\sum_{p \in P} δ (r_{p, s}, τ)\}, τ \geq 1 .

(59)

As we mentioned above,

| P | = 46

and

τ \in [1, 60]

.

By definition of

{Fit}_{p, s}

,

ρ_{_{s}} (1)

denotes the fraction of test problems for which solver s performs the best. In general,

ρ_{_{s}} (τ)

can be explained as the probability for solver

s \in S

that the performance ratio

r_{p, s}

is within a factor

τ

of the best possible ratio. Additionally, the essential characteristic of performance profiles is that they present data on the proportional performance of numerous solvers [82,83].

The numerical outcomes of the five methods are analyzed by using the performance profiles as follows. Figure 1, Figure 2, Figure 3 and Figure 4 show the performance profiles of the set solvers S, for each element of the set Fit, respectively.

The performance profile depicted on the left of Figure 1 (in the term itr.w) compares the five techniques for a set of the 46 test problems.

The SHZ method has the best performance for the 46 test problems; this means that our suggested approach is capable of finding a local minimizer to the 46 test problems as fast as, or faster than, the other four approaches.

For instance, if

τ = 1

, the SHZ technique is capable of finding the local minimizer for

65 %

of problems versus the

33 %

,

20 %

,

20 %

and

13 %

of a set of test problems solved by the MHZ, HS, FR and HZ methods, respectively.

In general, the term itr.w,

τ = 60

displays that all test problems are solved by SHZ against

96 %

of test problems solved by the MHZ, HZ and FR methods respectively, while

93 %

of test problems are solved by the HS method. At

τ \geq 400

, all test problems are solved by the MHZ, HZ and FR methods respectively, while

98 %

of test problems are solved by the HS.

The right graph of Figure 1 shows that the method SHZ is capable of finding the local minimum of all test problems regarding term FEs.w.

The rest of Figure 2, Figure 3 and Figure 4 show that the SHZ algorithm is superior to the four algorithms regarding the rest of the terms of the set Fit.

Therefore, the SHZ technique includes the characteristics of efficiency, reliability and effectiveness in solving Problem (1) compared to the other four methods.

Note: The power of the SHZ technique comes from the fact that the SHZ method gains the features of the four methods MHZ, HZ and HS, as we mentioned in Section 2.

Part II: Global Minimization Problem

It is worth mentioning that the final results of Part I for the second set of test problems contain some global minimizers at some runs for some non-convex functions. This means that the pure CG technique could not find the global minimizer of the second type of test problems for each run because it is a local method.

Therefore, to make this method capable of solving Problem (2) per run, the random technique is proposed and it is added to the CG approach to gain a new PS-CG hybrid technique that solves Problem (2). In many studies, the numerical outcomes indicated that the interbreed between a classical method and a random technique is very successful in overcoming the weakness of these methods. See [55,56,57,58,59].

Consequently, this part of the paper seeks to solve Problem (2).

Therefore, each method of the five CG methods mentioned in Part I is hybridized with the stochastic technique to obtain five algorithms to try to solve Problem (2).

In the next section, a stochastic technique is presented.

5. Random Technique

In this section, a new random parameter “SP” is presented. This stochastic technique contains three different formulas by which three different points are generated. This set of formulas is combined with the CG method to obtain a new algorithm that solves Problem (2).

Random Parameters (SP Technique)

Step 1: The first point is computed as follows, generate

V_{_{k}} \sim {[- 1, 1]}^{n}

is as a random vector, set

γ_{_{k}} = 10^{ψ_{_{k}}}

,

ψ_{_{k}} \in [0.01, 1)

, where the interval

[0.01, 1)

is divided into Itr of fractions and at every iteration k, the parameter

ψ_{_{k}}

takes one value of the Itr and then computes

λ_{_{k}} = \frac{{(1 + γ_{_{k}})}^{| V_{_{i}} |}}{γ_{_{k}}} S V_{_{i}}

as a research direction with the step lengths, where

i = 1, 2, \dots, n

, n is the of number variables, Itr is the number of iterations, and

S V_{_{i}}

denotes the signs of the

V

and is defined by

S V_{_{i}} = \{\begin{matrix} - 1 & if V_{_{i}} < 0, \\ 1 & otherwise . \end{matrix}

(60)

Thus, a point is calculated as follows:

x_{_{1}} = x_{ac} + λ_{_{k}},

(61)

where

x_{ac}

is the best point obtained yet, and then we compute

f_{_{1}} = f (x_{_{1}})

.

Step 2: The second point is defined by

x_{_{2}} = x_{_{ac}} + η_{_{k}} B_{_{k}},

(62)

where

B_{_{k}} = φ_{_{k}} d_{_{k}}

,

φ_{_{k}}

is defined by (47),

η_{_{k}} \in (0, 2)

is a random number, and the

d_{_{k}}

is defined by (23). Then, we compute

f_{_{2}} = f (x_{_{2}})

.

Step 3: This point is defined by

x_{_{3}} = X_{_{w}} + \frac{1}{2} Dx,

(63)

where

Dx = \frac{{(1 + μ_{_{k}})}^{| V_{_{i}} |} - 1}{μ_{_{k}} + 0.1} S V_{_{i}}

,

μ_{_{k}} = {| f_{a c} |}^{2}

,

f_{a c}

is the function value at the point

x_{a c}

that has been accepted, and

X_{_{w}}

is a stochastic variable picked from the feasible range of the objective function. This means that for

X_{_{w}} \sim {[a, b]}^{n}

, a and b are the lower and upper bounds of the feasible range, respectively, and the random vector

V

with its signs

S V_{_{i}}

is defined by the first step.

Therefore, we calculate

f_{_{3}} = f (x_{_{3}})

.

For finding the global minimizer of a non-convex function, the above stochastic technique is used since Algorithm 1 is not capable of finding the global solution at each run. In other words, in some runs, Algorithm 1 fails to find the global solution to this function due to it sticking to a local point.

In the following example, we show how the SP algorithm is run.

Example: This example shows how the three steps of the SP algorithm are implemented.

We use the first test problem of the list of the test problems that are listed in Appendix A.

R_{_{2}} (x) = 100 {(x_{_{1}}^{2} - x_{_{2}})}^{2} + {(x_{_{1}} - 1)}^{2}

, to facilitate an explanation of the mechanism of using the Sp algorithm (Formulas (61)–(63)), we use the following easy information about the function

R_{_{2}} (x)

,

n = 2

is the number of the variables,

x_{a c} = [2; - 1]

, or

x_{a c} = [2; 1]

, where

x_{a c}

represents the best solution has been accepted so far or the starting point; hence, the function values at the two points are

R_{_{2}} (x_{a c}) = 100 {(2^{2} + 1)}^{2} + {(2 - 1)}^{2} = 2500 + 1 = 2501

and

R_{_{2}} (x_{a c}) = 100 {(2^{2} - 1)}^{2} + {(2 - 1)}^{2} = 900 + 1 = 901

.

Supposing

I t r = 5

is the number of iterations, the interval

[0.01; 1)

is divided into five fractions with step size

\frac{1 - 0.01}{5} = 0.198

, and thus the set of this fractions is

A = {0.01, 0.208, 0.406, 0.604, 0.802}

, let k be 3 which means the algorithm is at the third iteration. Then,

ψ_{_{3}} = 0.406

,

γ_{_{3}} = 10^{ψ_{_{3}}} = 10^{0.406} = 2.5468

. Let

V_{_{3}}

be

[- 0.5; 1]

, then

λ_{_{3}} = [\frac{{(1 + 2.5468)}^{| - 0.5 |}}{2.5468} \times - 1; \frac{{(1 + 2.5468)}^{| 1 |}}{2.5468} \times 1] = [- \frac{1.8833}{2.5468}; \frac{3.5468}{2.5468}] = [- 0.73948; 1.3926] .

Therefore, the new solution is computed by Formula (61) as follows.

x_{_{1}} = x_{a c} + λ_{_{3}} = [2; - 1] + [- 0.73948; 1.3926] = [1.2605; 0.3926]

or

x_{_{1}} = x_{a c} + λ_{_{3}} = [2; 1] + [- 0.73948; 1.3926] = [1.2605; 2.3926]

.

The function values at both points are as follows.

R_{_{2}} (x_{_{1}}) = 100 {(1 . 2605^{2} - 0.3926)}^{2} + {(1.2605 - 1)}^{2} = 143.1 + 0.06786 = 143.17

or

R_{_{2}} (x_{_{1}}) = 100 {(1 . 2605^{2} - 2.3926)}^{2} + {(1.2605 - 1)}^{2} = 64.6 + 0.06786 = 64.668

.

Therefore,

R_{_{2}} (x_{_{1}}) < R_{_{2}} (x_{a c})

; this means the solution that is generated by Formula (61) reduces the function value.

In the following, we explain how the candidate solution is generated by Formula (62).

Let

M_{_{ϵ}} = 1.2 \times 10^{- 6}

. By using Formula (45), we obtain

h_{_{3}} = 4.381 \times 10^{- 5}

as the step size h (a random interval) to the difference approximations method, and then we have

x_{{_{h}}_{_{1}}} = [x_{a c} (1) + h_{_{3}}; x_{a c} (2)] = [2 + 4.381 \times 10^{- 5}; - 1]

,

x_{{_{h}}_{_{2}}} = [x_{a c} (1); x_{a c} (2) + h_{_{3}}] = [2; - 1 + 4.381 \times 10^{- 5}]

.

Therefore, the values of the function at the three points

x_{a c}

,

x_{{_{h}}_{_{1}}}

and

x_{{_{h}}_{_{2}}}

are listed in the following.

R_{_{2}} (x_{a c}) = 2501

,

R_{_{2}} (x_{{_{h}}_{_{1}}}) = 2501.175

and

R_{_{2}} (x_{{_{h}}_{_{2}}}) = 2500.956

.

We compute the approximate value of the gradient vector by Formula (46) as follows:

D F F (x_{a c}) = [\frac{2501.175 - 2501}{4.381 \times 10^{- 5}}; \frac{2500.956 - 2501}{4.381 \times 10^{- 5}}] = [3994.522; - 1004.34],

φ_{_{3}} = \frac{2501}{{∥ DFF ∥}^{2}} = 0.0002

, where

φ_{_{3}}

is defined by (47).

We consider

d_{_{3}} = - g (x_{a c}) \approx [- 3994.522; 1004.34]

because we do not have information about the value of the

d_{_{2}}

in this illustration example.

Now, we apply Formula (62), as follows

B_{_{3}} = φ_{_{3}} d_{_{2}} = [- 0.799; 0.201]

, we take

η_{_{3}} = 0.971

as a random number from the range

(0, 2)

, then

x_{_{2}} = [2; - 1] + 0.971 \times [- 0.799; 0.201] = [1.2242; - 0.80483]

, the function value at the point

x_{_{2}}

is

R_{_{2}} (x_{_{2}}) = 530.66

.

We note that the

R_{_{2}} (x_{_{2}}) = 530.66 < R_{_{2}} (x_{a c}) = 2501

, i.e., the function value is reduced by the point

x_{_{2}}

.

In the following, we explain how the candidate solution is generated by Formula (63).

μ_{_{3}} = {| f_{a c} |}^{2} = 2501^{2} = 6, 255, 001

,

Dx = [\frac{{(1 + 6, 255, 001)}^{| - 0.5 |} - 1}{6, 255, 001 + 0.1} \times - 1; \frac{{(1 + 6, 255, 001)}^{| 1 |} - 1}{6, 255, 001 + 0.1} \times 1] = [- \frac{2501 - 1}{6, 255, 001.1}; \frac{6, 255, 002 - 1}{6, 255, 001.1}] = [- 0.0004; 0.999] .

X_{_{w}} = [- 3.095; 8.701]

is as a random vector picked from the range

{[- 5, 10]}^{2}

, and then

x_{_{3}} = [- 3.095; 8.701] + \frac{1}{2} [- 0.0004; 0.999] = [- 3.095; 8.701] + [- 0.0002; 0.4995] = [- 3.0952; 9.2005]

.

We compute the function value at the point

x_{_{3}}

;

R_{_{2}} (x_{_{3}}) = 100 {({(- 3.0952)}^{2} - 9.2005)}^{2} + {(- 3.0952 - 1)}^{2} = 14.422 + 16.771 = 31.193

.

We note that the

R_{_{2}} (x_{_{3}}) = 31.193 < R_{_{2}} (x_{a c}) = 2501

. Therefore, the point

x_{_{3}}

minimizes the function value.

According to the above example that illustrates the mechanism of Formulas (61)–(63), we deduce the following results.

Remark 1.

Formulas (3), (61) and (62) are the main formulas which are used in the new hybrid proposed algorithm that is described in Section 6. However, Formula (63) is used when

Δ f = 0

that is defined by Formula (25); in this case, Algorithm 3 reaches a critical point, thus if this point is the approximate value of the global minimizer point of the f, then Algorithm 3 stops according to the condition in Line 4 or Line 1 of Algorithm 3. Otherwise, the candidate solution is generated by Formula (63); see Section 6. Consequently, in this example, at iteration

k = 3

, the result which is obtained by Formula (63) cannot be taken into account due to the

Δ f \neq 0

.

Remark 2.

All Formulas (61)–(63) minimize the function value from any starting point.

6. Hybridization of the CG Method with Stochastic Parameters

When a stochastic method as a global optimization algorithm is combined with a globally convergent method (deterministic method), the result is a global optimization algorithm [55,56].

Therefore, the SP technique is hybridized with each of the five conjugate gradient methods SHZ, MHZ, HZ, HS and FR to obtain five techniques.

Our proposed algorithm is called a hybrid stochastic CG method abbreviated by HSSHZ that solves Problem (2). However, Algorithm 3 represents five alternative algorithms when the SHZ method is hybridized with the PS technique, then we obtain a new algorithm abbreviated by HSSHZ. When we combine any method of MHZ, HZ, HS or FR, we obtain four other abbreviations of algorithms as follows: HSMHZ, HSHZ, HSHS and HSFR, respectively.

In general, the outputs of this paper are five algorithms that solve Problem (2), where the best one is the HSSHZ algorithm as illustrated by the numerical experiments section of Part II.

In the following, Algorithm 1 is combined with SP technique to obtain Algorithm 3.

The SP method permits conducting an exhaustive wipe of the search range to guarantee that the global minimizer point is visited at least once per run.

Algorithm 3 Hybrid stochastic CG method.

Input:: $f : R^{n} \to R$ , $f \in C^{1}$ , $f_{a c} = f_{_{c g}}$ gained by Algorithm 1 and $ε > 0$ .
Output:: $x_{_{gl}} = x_{_{ac}}$ the global minimizer of f, $f (x_{_{gl}})$ , the value of f at $x_{_{gl}}$ .
1:: while $| f_{_{a c}} - f^{*} | > ε$ or FEs $< n 10^{4}$ do
2:: $f_{c g}$ is a function value f gained by Algorithm 1.
3:: $f_{a c} = min {f_{c g}, f_{_{1}}, f_{_{2}}}$ and $x_{a c}$ the best point gives the $f_{a c}$ .
4:: if $| f_{_{a c}} - f^{*} | \leq ε$ then
5:: Stop.
6:: end if
7:: if $▵ f = = 0$ then
8:: calculate the $x_{_{3}}$ and the $f_{_{3}} = f (x_{_{3}})$ by Formula (63).
9:: if $f_{_{3}} < f_{a c}$ then
10:: the $x_{_{3}}$ is accepted, compute the $x_{ac} \to x_{_{3}}$ , $f_{a c} \to f_{_{3}}$ , and go to Line 1.
11:: else
12:: generate another point $x_{_{3}}$ by Formula (63).
13:: end if
14:: else
15:: go to Line 1.
16:: end if
17:: end while
18:: return $x_{ac}$ the best point and its function value $f_{a c}$

A Mechanism Running Algorithm 3

As we mentioned above, Algorithm 3 is a combination of two methods; the first is a CG method of the five techniques CG

= {S H Z, M H Z, H Z, S H, F R}

that are discussed in Part I, and the second is a random method is depicted by Section 5. The point

x_{_{cg}}

is obtained by Algorithm 1 and it will be an input to Algorithm 3.

Algorithm 3 begins with Line 1 that is the stopping standard of the algorithm. Therefore, Algorithm 3 ends if one of the following standards is satisfied: The first standard is

| f_{_{a c}} - f^{*} | \leq ε

, and the second standard is FEs

\geq n 10^{4}

, where

f_{_{a c}}

the best value of the function f is gained, the

f^{*}

is the true solution,

ε = 10^{- 6}

, FEs is the number of function evaluations, and FEs

= n 10^{4}

is a stopping standard indicated by [85,86].

In Line 3, the best value of f is selected from the three values of the function

f_{c g}

,

f_{_{1}}

and

f_{_{2}}

, and indicated by

f_{a c}

, the three values of the function f are calculated by Algorithms (1), (61) and (62), respectively, and

x_{ac}

indicates this.

In Line 4, if

| f_{_{a c}} - f^{*} | \leq ε

is fulfilled, the algorithm ends. The standard that is listed in Line 7 gives the algorithm an opportunity to flee from the local points. Consequently, if

▵ f = 0

, then the algorithm has reached a crucial point. Therefore, if the norm of the gradient vector is 0 or ≈0, this point is either a local point or the global point. According to the above actions, the hybrid algorithm has been granted sequential opportunities to escape out of a snare (a local point). Thus, the procedures in Lines 8–12 are eligible for helping the algorithm to flee this snare, especially since the second stopping standard guarantees that most of the research domain is scanned.

The numerical outcomes of the five methods are given in the next section.

7. Numerical Experiments of Part II

The numerical results for the second test problems (non-convex functions) are presented, and these results are obtained by Algorithm 3.

The performance profiles tool that is described in Part I is used here for assessing the achievement of Algorithm 3 that contains five alternatives of algorithms as we mentioned above in Section 6.

The numerical results of the second type of the test problems are listed in Table 9, Table 10, Table 11, Table 12, Table 13, Table 14 and Table 15. Columns 1–2 and 8–9 contain the abbreviation of the function f and the number of the unknowns n, respectively. Columns 3–7 contain the abbreviation of each algorithm of the five algorithm HSSHZ, HSMHZ, HSHZ, HSHS and HSFR, which present the number of worst iterations, number of worst function evaluations, number of best iterations, number of best function evaluations, average of time (CPU), average of number of iterations and average of number of function evaluations, respectively. Columns 10–14 are similar to Columns 3–7.

Note: F denotes that the algorithm has failed to find the local minimizer of the function f according to the stopping criteria of Algorithm 3 which are listed in Section 6.

The performance profiles for the five algorithms are analyzed as follows.

Figure 5, Figure 6, Figure 7 and Figure 8 show the performance profiles of the five set solvers S regarding the set standard Fit that is mentioned in Section 4.2.

The performance profiles which are drawn on the left of Figure 5 (in the term itr.w) compares 5 methods for the 14 test problems.

The HSSHZ technique has a good achievement (for the term itr.w) for all test problems, which indicates that the HSSHZ technique is capable of solving Problem (2) as fast as or faster than the four techniques.

For instance, if

τ = 1

, the HSSHZ algorithm solves

71 %

of the 14 test problems against

14 %

,

14 %

,

7 %

and

0 %

, of the 14 test problems solved by the HSMHZ, HSHZ, HSFR and HSHS algorithms, respectively.

In general, for the term itr.w,

τ \geq 60

exhibits that the second type of the test problems are solved by HSSHZ, while

64 %

,

71 %

,

43 %

and

50 %

of test problems are solved by the HSMHZ, HSHZ, HSHS and HSFR algorithms respectively.

Figure 5, Figure 6, Figure 7 and Figure 8 demonstrate that the performance of the HSSHZ technique is better than the performance of the four techniques regarding the seven standards listed in the set Fit, respectively.

Therefore, the HSSHZ technique includes the characteristics of efficiency, reliability and effectiveness in finding the global minimizer of the non-convex function f compared to the other four methods.

It is worth observing that the power of the HSSHZ algorithm comes from the fact that the SHZ method gains the features of the four methods, MHZ, HZ, HS and FR, as mentioned in Section 2.

Note 1: In Algorithm 3, a run is considered successful if Inequality (64) is met.

| f_{_{a c}} - f^{*} | \leq 10^{- 5},

(64)

where

f^{*}

is the exact global solution that is listed in Columns 3 and 7 of Table 1, respectively, and the

f_{_{a c}}

is the final result obtained by Algorithm 3.

Note 2: Formula (56) means if the final result

f_{_{a c}}

, obtained by Algorithm 3 satisfies Inequality (64), then the first branch of (56) is computed; otherwise, we set

r_{p, s} = \infty

.

8. Conclusions and Future Work

A new modified CG algorithm is suggested, named SHZ. The SHZ finds the local minimizers of unconstrained optimization problems. The modernized formulae of the SHZ algorithm are more complicated than previous approaches; nevertheless, the numerical experiments of the SHZ are very strong. The convergence analysis of the SHZ algorithm is designed. We also analyzed the gradient approximation

g (x) \approx

DFF constructed by finite differences (the forward differences method). This method includes a new approach for selecting the fit value of the h according to the value of the objective function and it is updated dynamically at each iteration. The numerical results demonstrate that the performance of the SHZ method is positively competitive with the other four conjugate gradient methods based on performance profiles.

Comparing the final results of the gradient vector that were obtained by the method DFF to the exact values of the gradient vector demonstrates that the fresh technique succeeded in picking the right value of h. The proposed random approach recreates a critical role to make the SHZ method capable of finding the global minimizers of unconstrained optimization test problems, especially when the objective function is non-convex.

It can be worth observing that the power of the HSSHZ algorithm comes from the fact that the SHZ method gains the characteristics of the four methods, MHZ, HZ, HS and FR.

The suggested approach can be improved and modified to deal with constrained, multi-objective optimization problems, and it will be used for image restorations.

Author Contributions

Conceptualization, K.A.A.; Data curation, A.M.A.; Formal analysis, A.M.A. and K.M.S.; Funding acquisition, K.A.A.; Investigation, A.W.M.; Methodology, S.M.; Project administration, K.A.A. and K.M.S.; Software, S.M.; Supervision, A.W.M.; Validation, A.M.A.; Writing—original draft, S.M. All authors have read and agreed to the published version of the manuscript.

Funding

The research is funded by Researchers Supporting Program at King Saud University (Project# RSP-2021/305).

Data Availability Statement

Not applicable.

Acknowledgments

The authors present their appreciation to King Saud University for funding the publication of this research through Researchers Supporting Program (RSP-2021/305), King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. List of Test Problems

1 $R_{n}$ :: Rosenbrock functions [57,87,88]

$min_{x} \{\sum_{i = 1}^{n - 1} [100 {(x_{i}^{2} - x_{i + 1})}^{2} + {(x_{i} - 1)}^{2}]\} .$

Range of starting points $- 5 < x_{i} < 10, i = 1, 2, . . ., n .$
Global minima: $f (x^{*}) = 0$ at $x^{*} =$ (1, 1, …, 1).
2 $Z_{n}$ :: Zakharov functions [57,80,87,88]

$min_{x} \{\sum_{i = 1}^{n} x_{i}^{2} + {(\sum_{i = 1}^{n} 0.5 i x_{i})}^{2} + {(\sum 0.5 i x_{i})}^{4}\} .$

Range of starting points $- 5 < x_{i} < 10$ , $i = 1, 2, . . ., n .$
Global minima: $f (x^{*}) = 0$ at $x^{*} =$ (0, 0, …, 0).
3 PW:: Powell function [80]
$min_{x} \{\sum_{i = 1}^{\frac{n}{4}} [{(x_{4 i - 3} + 10 x_{4 i - 2})}^{2} + 5 {(x_{4 i - 1} - x_{4 i})}^{2} + {(x_{4 i - 2} - 2 x_{4 i - 1})}^{4} + 10 {(x_{4 i - 3} - x_{4 i})}^{4}]\} .$ Range of starting points $- 600 < x_{i} < 600$ , $i = 1, 2, . . ., n .$
Global minima: $f (x^{*}) = 0$ at $x^{*} =$ (0, 0, …, 0).
4 SP:: Sphere function [89]

$min_{x} \{\sum_{i = 1}^{n} x_{i}^{2}\} .$

Range of starting points $- 10 \leq x_{i} \leq 10$ , $i = 1, 2, . . ., n .$
Global minima: $f (x^{*}) = 0$ at $x^{*} =$ (0, 0, …, 0).
5 Tr:: Trid function [80]

$min \sum_{i = 1}^{n} {(x_{i} - 1)}^{2} - \sum_{i = 2}^{n} x_{i} x_{_{i - 1}} .$

Range of starting points $- n^{2} < x_{i} < n^{2}$ , $i = 1, 2, . . ., n .$
Global minima: $f (x^{*}) = \frac{n (n + 4) (n - 1)}{6}$ . at $x^{*} = i (n + 1 - i)$
6:: Sum Squares function [90]

$min_{x} \{\sum_{i = 1}^{n} i x_{i}^{2}\} .$

Range of starting points $- 100 < x_{i} < 100$ , $i = 1, 2, . . ., n .$
Global minima: $f (x^{*}) = 0$ at $x^{*} =$ (0, 0, …, 0).
7 CV:: Colville function [57,80,91]

$\begin{matrix} min_{x} {100 {(x_{1}^{2} - x_{2})}^{2} + {(x_{1} - 1)}^{2} + {(x_{3} - 1)}^{2} + 90 {(x_{3}^{2} - x_{4})}^{2} \\ + 10.1 ({(x_{2} - 1)}^{2} + {(x_{4} - 1)}^{2}) + 19.8 (x_{2} - 1) {(x_{4} - 1)}^{2}} . \end{matrix}$

Range of starting points $- 10 < x_{i} < 10, i = 1, 2, . . ., n .$
Global minima: $f (x^{*}) = 0$ at $x^{*} = (1, 1, 1, 1)$ .
8 BR:: Branin function [57,92,93]
$min_{x} \{{(x_{2} - \frac{5.1}{4 π^{2}} x_{1}^{2} + \frac{5}{π} x_{1} - 6)}^{2} + 10 (1 - \frac{1}{8 π} c o s (x_{1})) + 10\} .$
Range of starting points $- 5 < x_{i} < 15, i = 1, 2 .$
Only one global minima: $f (x^{*}) = 0.397887 .$ at $x^{*} = {(- π, 12.275), (9.42478, 2.475), (π, 2.275)}$ .
9 DJ:: De Joung function [57,87,88]

$min_{x} \{x_{1}^{2} + x_{2}^{2} + x_{3}^{2}\} .$

Range of starting points $- 5 < x_{i} < 15$ , $i = 1, 2, 3 .$
Number of local minima: no local minima.
Global minima: $f (x^{*}) = 0$ at $x^{*} = (0, 0, 0)$ .
10 BO:: Booth function [89]

$min_{x} \{{(x_{1} + 2 x_{2} - 7)}^{2} + {(2 x_{1} + x_{2} - 5)}^{2}\} .$

Range of starting points $- 10 < x_{i} < 10$ , $i = 1, 2, . . ., n .$
Global minima: $f (x^{*}) = 0$ at $x^{*} =$ (1, 3).
11 Ma:: Matyas function [90]

$min_{x} \{0.26 (x_{_{1}}^{2} + x_{_{2}}^{2}) - 0.48 x_{_{1}} x_{_{2}}\} .$

Range of starting points $- 10 < x_{i} < 10$ , $i = 1, 2, . . ., n .$
Global minima: $f (x^{*}) = 0$ at $x^{*} =$ (0, 0 ).
12 Sm $^{*}$ :: Shekel functions [57,80,87,88,92,93,94]

$min_{x} \{- \sum_{j = 1}^{m} {(\sum_{i = 1}^{4} {(x_{i} - A_{i j})}^{2} + c_{j})}^{- 1}\} .$

where $c = 0.1 [1, 2, 2, 4, 4, 6, 3, 7, 5, 5]$ ,
$A = [\begin{matrix} 4.0 & 1.0 & 8.0 & 6.0 & 3.0 & 2.0 & 5.0 & 8.0 & 6.0 & 7.0 \\ 4.0 & 1.0 & 8.0 & 6.0 & 7.0 & 9.0 & 3.0 & 1.0 & 2.0 & 3.0 \\ 4.0 & 1.0 & 8.0 & 6.0 & 3.0 & 2.0 & 5.0 & 8.0 & 6.0 & 7.0 \\ 4.0 & 1.0 & 8.0 & 6.0 & 7.0 & 9.0 & 3.0 & 1.0 & 2.0 & 3.0 \end{matrix}]$
Range of starting points $0 < x_{i} < 10, i = 1, . . ., n .$
Number of local minima: m local minima.
Global minima:

$f {(x^{*})}_{n, m} = \{\begin{matrix} - 10.1532, & when m = 5, \\ - 10.4029, & when m = 7, \\ - 10.5364, & when m = 10 . \end{matrix}$

Global minima for three functions at $x^{*} = (4, 4, 4, 4)$ .
13 GP $^{*}$ :: Goldstein and Price function [57,80,87,88,92,94]
$u (x) = 1 + {(x_{1} + x_{2} + 1)}^{2} (19 - 14 x_{1} + 3 x_{1}^{2} - 14 x_{2} + 6 x_{1} x_{2} + 3 x_{2}^{2})$
$v (x) = 30 + {(2 x_{1} - 3 x_{2})}^{2} (18 - 32 x_{1} + 12 x_{1}^{2} + 48 x_{2} - 36 x_{1} x_{2} + 27 x_{2}^{2})$ .

$min_{x} \{v (x) u (x)\} .$

Range of starting points $- 2 < x_{i} < 2$ , $i = 1, 2 .$
Number of local minima: 4 local minima.
Global minima: $f (x^{*}) = 3$ at $x^{*} = (0, - 1) .$
14 Ras $^{*}$ :: Rastrigin function [93]

$min_{x} \{x_{1}^{2} + x_{2}^{2} - c o s (18 x_{1}) - c o s (18 x_{2})\} .$

Range of starting points $- 1 < x_{i} < 1$ , $i = 1, 2 .$
Number of local minima: many local minima.
Global minima: $f (x^{*}) = - 2$ at $x^{*} = (0, 0)$ .
15 Bh1 $^{*}$ :: Bohachevsky function [80]

$min_{x} \{x_{1}^{2} + 2 x_{2}^{2} - 0.3 c o s (3 π x_{1}) - 0.4 c o s (4 π x_{2}) + 0.7\} .$

Range of starting points $- 100 < x_{i} < 100$ , $i = 1, 2$ .
Number of local minima: many local minima.
Global minima: $f (x^{*}) = 0$ at $x^{*} =$ (0, 0).
16 SH $^{*}$ :: Shubert function in [57,80,87,88,92]

$min_{x} \{(\sum_{i = 1}^{5} i c o s ((i + 1) x_{1} + i)) (\sum_{i = 1}^{5} i c o s ((i + 1) x_{2} + i))\} .$

Range of starting points $- 5.12 < x_{i} < 5.12$ , $i = 1, 2 .$
Number of local minima: 760 local minima.
Global minima: $f (x^{*}) = - 186.7309$ at 18 point different of $x^{*}$ .
17 P8 $^{*}$: Ref. [92]

$min_{x} \{\frac{π}{n} (k_{1} s i n {(π y_{1})}^{2} + \sum_{i = 1}^{n - 1} {(y_{i} - k_{2})}^{2} [1 + k_{1} s i n {(π y_{_{i + 1}})}^{2}] + {(y_{n} - k_{2})}^{2})\},$

with $y_{i} = 1 + \frac{1}{4} (x_{i} + 1)$ , $k_{1} = 10$ and $k_{2} = 1$ .
Range of starting points $- 10 \leq x_{i} \leq 10$ , $i = 1, 2, 3 .$
Number of local minima: $5^{3}$ local minima.
Global minima: $f (x^{*}) = 0$ at $x^{*} = (- 1, - 1, - 1) .$
18 P16 $^{*}$: Ref. [92]

$min_{x} k_{3} \{s i n^{2} (π k_{4} x_{1}) + \sum_{i = 1}^{n - 1} {(x_{i} - k_{5})}^{2} [1 + k_{6} s i n^{2} (π k_{4} x_{i + 1})] + {(x_{n} - k_{5})}^{2} [1 + k_{6} s i n^{2} (π k_{7} x_{n})]\},$

where $k_{3} = 0.1$ , $k_{4} = 3$ , $k_{5} = 1$ , $k_{6} = 1$ , $k_{7} = 2$ .
Range of starting points $- 5 \leq x_{i} \leq 5$ , $i = 1, . ., n .$
Number of local minima: $15^{5}$ local minima.
Global minima: $f (x^{*}) = 0$ at $x^{*} = (1, 1, 1, 1, 1) .$
19 CB $^{*}$ :: Camel back in [80] and camel function in [93]

$min_{x} \{4 x_{1}^{2} - 2.1 x_{1}^{4} + \frac{1}{3} x_{1}^{6} + x_{1} x_{2} - 4 x_{2}^{2} + 4 x_{2}^{4}\} .$

Range of starting points $- 5 < x_{i} < 5$ , $i = 1, 2 .$
Number of local minima: many local minima.
Global minima: $f (x^{*}) = - 1.0316285$ at $x^{*} = {(0.089842, - 0.71266), (- 0.089842, 0.71266)}$ .
20 H3 $^{*}$ :: Hartmann function [57,80,87,88,92,93,94]

$min_{x} \{- \sum_{i = 1}^{4} c_{i} e x p (- \sum_{j = 1}^{3} a_{i j} {(x_{j} - p_{i j})}^{2})\} .$

Range of starting points $- 1 < x_{j} < 1$ , $j = 1, 2, 3 .$
Number of local minima: 4 local minima.
Global minima: $f (x^{*}) = - 3.86278$ at $x^{*} = (0.114614, 0.555649, 0.852547)$ .
21 H6 $^{*}$ :: Hartmann function [57,80,87,88,92,93,94]

$min_{x} \{- \sum_{i = 1}^{4} c_{i} e x p (- \sum_{j = 1}^{6} a_{i j} {(x_{j} - p_{i j})}^{2})\} .$

Range of starting points $- 1 < x_{j} < 1$ , $j = 1, 2, . . ., n .$
Number of local minima: 4 local minima.
Global minima: $f (x^{*}) = - 3.32237$ at $x^{*} =$ (0.201690, 0.150011, 0.476874,
0.275332, 0.311652, 0.657300).
22 HM $^{*}$ :: hump Function [57]

$min_{x} \{1.0316285 + 4 x_{1}^{2} - 2.1 x_{1}^{4} + \frac{1}{3} x_{1}^{6} + x_{1} x_{2} - 4 x_{2}^{2} + 4 x_{2}^{4}\} .$

Range of starting points $- 5 < x_{i} < 5, i = 1, 2 .$
Number of local minima: 3 local minima.
Global minima: $f (x^{*}) = 0$ at $x^{*} = {(0.0898, - 0.7126), (- 0.0898, 0.7126)}$ .
23 Le $^{*}$ :: Levy function [95]

$min_{x} \{s i n^{2} (π w_{1}) + \sum_{i = 1}^{n - 1} {(w_{i} - 1)}^{2} (1 + 10 s i n^{2} (π w_{i} + 1)) + {(w_{n} - 1)}^{2} [1 + s i n^{2} (2 π w_{n})]\},$

where $w_{i} = 1 + \frac{x_{i} - 1}{4}$ , for $i = 1, . . ., n .$
Range of starting points $- 10 < x_{i} < 10$ , $i = 1, 2, . . ., n .$
Number of local minima: many local minima.
Global minima: $f (x^{*}) = 0$ at $x^{*} =$ (1, 1, …, 1).

References

Abdel-Baset, M.; Hezam, I. A Hybrid Flower Pollination Algorithm for Engineering Optimization Problems. Int. J. Comput. Appl. 2016, 140, 10–23. [Google Scholar] [CrossRef]
Agrawal, P.; Ganesh, T.; Mohamed, A.W. A novel binary gaining–sharing knowledge-based optimization algorithm for feature selection. Neural Comput. Appl. 2021, 33, 5989–6008. [Google Scholar] [CrossRef]
Ayumi, V.; Rere, L.; Fanany, M.I.; Arymurthy, A.M. Optimization of Convolutional Neural Network using Microcanonical Annealing Algorithm. arXiv 2016, arXiv:1610.02306. [Google Scholar]
Lobato, F.S.; Steffen, V., Jr. Fish swarm optimization algorithm applied to engineering system design. Lat. Am. J. Solids Struct. 2014, 11, 143–156. [Google Scholar] [CrossRef] [Green Version]
Mazhoud, I.; Hadj-Hamou, K.; Bigeon, J.; Joyeux, P. Particle swarm optimization for solving engineering problems: A new constraint-handling mechanism. Eng. Appl. Artif. Intell. 2013, 26, 1263–1273. [Google Scholar] [CrossRef]
Mohamed, A.W.; Sabry, H.Z. Constrained optimization based on modified differential evolution algorithm. Inf. Sci. 2012, 194, 171–208. [Google Scholar] [CrossRef]
Mohamed, A.W.; Hadi, A.A.; Mohamed, A.K. Gaining-sharing knowledge based algorithm for solving optimization problems: A novel nature-inspired algorithm. Int. J. Mach. Learn. Cybern. 2020, 11, 1501–1529. [Google Scholar] [CrossRef]
Rere, L.; Fanany, M.I.; Arymurthy, A.M. Metaheuristic Algorithms for Convolution Neural Network. Comput. Intell. Neurosci. 2016, 2016, 1537325. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Samora, I.; Franca, M.J.; Schleiss, A.J.; Ramos, H.M. Simulated annealing in optimization of energy production in a water supply network. Water Resour. Manag. 2016, 30, 1533–1547. [Google Scholar] [CrossRef]
Shao, Y. Dynamics of an Impulsive Stochastic Predator–Prey System with the Beddington–DeAngelis Functional Response. Axioms 2021, 10, 323. [Google Scholar] [CrossRef]
Vallepuga-Espinosa, J.; Cifuentes-Rodríguez, J.; Gutiérrez-Posada, V.; Ubero-Martínez, I. Thermomechanical Optimization of Three-Dimensional Low Heat Generation Microelectronic Packaging Using the Boundary Element Method. Mathematics 2022, 10, 1913. [Google Scholar] [CrossRef]
Blum, C.; Roli, A. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Comput. Surv. (CSUR) 2003, 35, 268–308. [Google Scholar] [CrossRef]
Aarts, E.; Korst, J. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing; John Wiley & Sons, Inc.: New York, NY, USA, 1989. [Google Scholar]
Hillier, F.S.; Price, C.C. International Series in Operations Research & Management Science; Springer: Berlin/Heidelberg, Germany, 2001. [Google Scholar]
Laarhoven, P.J.V.; Aarts, E.H. Simulated Annealing: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 1987. [Google Scholar]
Farid, M.; Leong, W.J.; Hassan, M.A. A new two-step gradient-type method for large-scale unconstrained optimization. Comput. Math. Appl. 2010, 59, 3301–3307. [Google Scholar] [CrossRef] [Green Version]
Gilbert, J.C.; Nocedal, J. Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 1992, 2, 21–42. [Google Scholar] [CrossRef] [Green Version]
Hager, W.W.; Zhang, H. Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Trans. Math. Softw. (TOMS) 2006, 32, 113–137. [Google Scholar] [CrossRef]
Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747. [Google Scholar]
Shi, Z. A new memory gradient method under exact line search. Asia-Pac. J. Oper. Res 2003, 20, 275–284. [Google Scholar]
Zhang, L.; Zhou, W.; Li, D.H. A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 2006, 26, 629–640. [Google Scholar] [CrossRef]
Abubakar, A.B.; Malik, M.; Kumam, P.; Mohammad, H.; Sun, M.; Ibrahim, A.H.; Kiri, A.I. A Liu-Storey-type conjugate gradient method for unconstrained minimization problem with application in motion control. J. King Saud Univ.-Sci. 2022, 34, 101923. [Google Scholar] [CrossRef]
Dai, Y.; Yuan, Y. An efficient hybrid conjugate gradient method for unconstrained optimization. Ann. Oper. Res. 2001, 103, 33–47. [Google Scholar] [CrossRef]
Deng, S.; Wan, Z. A three-term conjugate gradient algorithm for large-scale unconstrained optimization problems. Appl. Numer. Math. 2015, 92, 70–81. [Google Scholar] [CrossRef]
Ma, G.; Lin, H.; Jin, W.; Han, D. Two modified conjugate gradient methods for unconstrained optimization with applications in image restoration problems. J. Appl. Mathemat. Comput. 2022, 1–26. [Google Scholar] [CrossRef]
Mtagulwa, P.; Kaelo, P. An efficient modified PRP-FR hybrid conjugate gradient method for solving unconstrained optimization problems. Appl. Numer. Math. 2019, 145, 111–120. [Google Scholar] [CrossRef]
Waziri, M.Y.; Kiri, A.I.; Kiri, A.A.; Halilu, A.S.; Ahmed, K. A modified conjugate gradient parameter via hybridization approach for solving large-scale systems of nonlinear equations. SeMA J. 2022, 1–23. [Google Scholar] [CrossRef]
Zhang, L.; Zhou, W.; Li, D. Global convergence of a modified Fletcher–Reeves conjugate gradient method with Armijo-type line search. Numer. Math. 2006, 104, 561–572. [Google Scholar] [CrossRef]
Alshamrani, A.M.; Alrasheedi, A.F.; Alnowibet, K.A.; Mahdi, S.; Mohamed, A.W. A Hybrid Stochastic Deterministic Algorithm for Solving Unconstrained Optimization Problems. Mathematics 2022, 10, 3032. [Google Scholar] [CrossRef]
Hager, W.W.; Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005, 16, 170–192. [Google Scholar] [CrossRef] [Green Version]
Golub, G.H.; O’Leary, D.P. Some history of the conjugate gradient and Lanczos algorithms: 1948–1976. SIAM Rev. 1989, 31, 50–102. [Google Scholar] [CrossRef]
Hager, W.W.; Zhang, H. A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2006, 2, 35–58. [Google Scholar]
Fletcher, R.; Reeves, C.M. Function minimization by conjugate gradients. Comput. J. 1964, 7, 149–154. [Google Scholar] [CrossRef] [Green Version]
Powell, M.J. Nonconvex minimization calculations and the conjugate gradient method. In Numerical Analysis; Springer: Berlin/Heidelberg, Germany, 1984; pp. 122–141. [Google Scholar]
Al-Baali, M. Descent property and global convergence of the Fletcher Reeves method with inexact line search. IMA J. Numer. Anal. 1985, 5, 121–124. [Google Scholar] [CrossRef]
Powell, M.J.D. Restart procedures for the conjugate gradient method. Math. Program. 1977, 12, 241–254. [Google Scholar] [CrossRef]
Polak, E.; Ribiere, G. Note sur la convergence de méthodes de directions conjuguées. ESAIM Math. Model. Numer. Anal. 1969, 3, 35–43. [Google Scholar] [CrossRef]
Polyak, B.T. The conjugate gradient method in extremal problems. USSR Comput. Math. Math. Phys. 1969, 9, 94–112. [Google Scholar] [CrossRef]
Hestenes, M.R.; Stiefel, E. Methods of Conjugate Gradients for Solving. J. Res. Natl. Bur. Stand. 1952, 49, 409. [Google Scholar] [CrossRef]
Liu, Y.; Storey, C. Efficient generalized conjugate gradient algorithms, part 1: Theory. J. Optim. Theory Appl. 1991, 69, 129–137. [Google Scholar] [CrossRef]
Dai, Y.H.; Yuan, Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 1999, 10, 177–182. [Google Scholar] [CrossRef] [Green Version]
Abubakar, A.B.; Kumam, P. A descent Dai-Liao conjugate gradient method for nonlinear equations. Numer. Algorithms 2019, 81, 197–210. [Google Scholar] [CrossRef]
Abubakar, A.B.; Muangchoo, K.; Ibrahim, A.H.; Muhammad, A.B.; Jolaoso, L.O.; Aremu, K.O. A new three-term Hestenes-Stiefel type method for nonlinear monotone operator equations and image restoration. IEEE Access 2021, 9, 18262–18277. [Google Scholar] [CrossRef]
Babaie-Kafaki, S.; Ghanbari, R. A descent family of Dai–Liao conjugate gradient methods. Optim. Methods Softw. 2014, 29, 583–591. [Google Scholar] [CrossRef]
Dai, Y.; Liao, L. New conjugacy conditions and related nonlinear conjugate gradient methods. Appl. Math. Optim. 2001, 43, 87–101. [Google Scholar] [CrossRef]
Ibrahim, A.H.; Kumam, P.; Kumam, W. A family of derivative-free conjugate gradient methods for constrained nonlinear equations and image restoration. IEEE Access 2020, 8, 162714–162729. [Google Scholar] [CrossRef]
Su, Z.; Li, M. A Derivative-Free Liu–Storey Method for Solving Large-Scale Nonlinear Systems of Equations. Math. Probl. Eng. 2020, 2020, 6854501. [Google Scholar] [CrossRef]
Yuan, G.; Zhang, M. A three-terms Polak–Ribière–Polyak conjugate gradient algorithm for large-scale nonlinear equations. J. Comput. Appl. Math. 2015, 286, 186–195. [Google Scholar] [CrossRef]
Yuan, G.; Jian, A.; Zhang, M.; Yu, J. A modified HZ conjugate gradient algorithm without gradient Lipschitz continuous condition for non convex functions. J. Appl. Mathemat. Comput. 2022, 1–22. [Google Scholar] [CrossRef]
Zhou, Y.; Wu, Y.; Li, X. A new hybrid prpfr conjugate gradient method for solving nonlinear monotone equations and image restoration problems. Math. Probl. Eng. 2020, 2020, 6391321. [Google Scholar] [CrossRef]
Yuan, G.; Meng, Z.; Li, Y. A modified Hestenes and Stiefel conjugate gradient algorithm for large-scale nonsmooth minimizations and nonlinear equations. J. Optim. Theory Appl. 2016, 168, 129–152. [Google Scholar] [CrossRef]
Yuan, G.; Wei, Z.; Yang, Y. The global convergence of the Polak–Ribière–Polyak conjugate gradient algorithm under inexact line search for nonconvex functions. J. Comput. Appl. Math. 2019, 362, 262–275. [Google Scholar] [CrossRef]
Kan, A.R.; Timmer, G. Stochastic methods for global optimization. Am. J. Math. Manag. Sci. 1984, 4, 7–40. [Google Scholar] [CrossRef]
Alnowibet, K.A.; Alshamrani, A.M.; Alrasheedi, A.F.; Mahdi, S.; El-Alem, M.; Aboutahoun, A.; Mohamed, A.W. A Efficient Modified Meta-Heuristic Technique for Unconstrained Optimization Problems. Axioms 2022, 11, 483. [Google Scholar] [CrossRef]
Alnowibet, K.A.; Mahdi, S.; El-Alem, M.; Abdelawwad, M.; Mohamed, A.W. Guided Hybrid Modified Simulated Annealing Algorithm for Solving Constrained Global Optimization Problems. Mathematics 2022, 10, 1312. [Google Scholar] [CrossRef]
EL-Alem, M.; Aboutahoun, A.; Mahdi, S. Hybrid gradient simulated annealing algorithm for finding the global optimal of a nonlinear unconstrained optimization problem. Soft Comput. 2021, 25, 2325–2350. [Google Scholar] [CrossRef]
Hedar, A.R.; Fukushima, M. Hybrid simulated annealing and direct search method for nonlinear unconstrained global optimization. Optim. Methods Softw. 2002, 17, 891–912. [Google Scholar] [CrossRef]
Pedamallu, C.S.; Ozdamar, L. Investigating a hybrid simulated annealing and local search algorithm for constrained optimization. Eur. J. Oper. Res. 2008, 185, 1230–1245. [Google Scholar] [CrossRef]
Yiu, K.F.C.; Liu, Y.; Teo, K.L. A hybrid descent method for global optimization. J. Glob. Optim. 2004, 28, 229–238. [Google Scholar] [CrossRef]
Zoutendijk, G. Nonlinear programming, computational methods. In Integer and Nonlinear Programming; Abadie, J., Ed.; North-Holland: Amsterdam, The Netherlands, 1970; pp. 37–86. [Google Scholar]
Wolfe, P. Convergence conditions for ascent methods. SIAM Rev. 1969, 11, 226–235. [Google Scholar] [CrossRef]
Wolfe, P. Convergence conditions for ascent methods. II: Some corrections. SIAM Rev. 1971, 13, 185–188. [Google Scholar] [CrossRef]
Conn, A.R.; Scheinberg, K.; Vicente, L.N. Introduction to Derivative-Free Optimization; SIAM: Philadelphia, PA, USA, 2009. [Google Scholar]
Kramer, O.; Ciaurri, D.E.; Koziel, S. Derivative-free optimization. In Computational Optimization, Methods and Algorithms; Springer: Berlin/Heidelberg, Germany, 2011; pp. 61–83. [Google Scholar]
Larson, J.; Menickelly, M.; Wild, S.M. Derivative-free optimization methods. Acta Numer. 2019, 28, 287–404. [Google Scholar] [CrossRef] [Green Version]
Shi, H.J.M.; Xie, Y.; Xuan, M.Q.; Nocedal, J. Adaptive Finite-Difference Interval Estimation for Noisy Derivative-Free Optimization. arXiv 2021, arXiv:2110.06380. [Google Scholar] [CrossRef]
Shi, H.J.M.; Xuan, M.Q.; Oztoprak, F.; Nocedal, J. On the numerical performance of derivative-free optimization methods based on finite-difference approximations. arXiv 2021, arXiv:2102.09762. [Google Scholar]
Berahas, A.S.; Cao, L.; Choromanski, K.; Scheinberg, K. A theoretical and empirical comparison of gradient approximations in derivative-free optimization. Found. Comput. Math. 2022, 22, 507–560. [Google Scholar] [CrossRef]
Curtis, A.; Reid, J. The choice of step lengths when using differences to approximate Jacobian matrices. IMA J. Appl. Math. 1974, 13, 121–126. [Google Scholar] [CrossRef]
Calio, F.; Frontini, M.; Milovanović, G.V. Numerical differentiation of analytic functions using quadratures on the semicircle. Comput. Math. Appl. 1991, 22, 99–106. [Google Scholar] [CrossRef] [Green Version]
Gill, P.E.; Murray, W.; Saunders, M.A.; Wright, M.H. Computing forward-difference intervals for numerical optimization. SIAM J. Sci. Stat. Comput. 1983, 4, 310–321. [Google Scholar] [CrossRef]
Xie, Y. Methods for Nonlinear and Noisy Optimization. Ph.D. Thesis, Northwestern University, Evanston, IL, USA, 2021. [Google Scholar]
De Levie, R. An improved numerical approximation for the first derivative. J. Chem. Sci. 2009, 121, 935–950. [Google Scholar] [CrossRef] [Green Version]
Carter, R.G. On the global convergence of trust region algorithms using inexact gradient information. SIAM J. Numer. Anal. 1991, 28, 251–265. [Google Scholar] [CrossRef] [Green Version]
Rivet, A.; Souloumiac, A. Introduction to Optimization; Optimization Software, Publications Division: New Delhi, India, 1987. [Google Scholar]
Byrd, R.H.; Chin, G.M.; Nocedal, J.; Wu, Y. Sample size selection in optimization methods for machine learning. Math. Program. 2012, 134, 127–155. [Google Scholar] [CrossRef]
Cartis, C.; Scheinberg, K. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Math. Program. 2018, 169, 337–375. [Google Scholar] [CrossRef] [Green Version]
Grapiglia, G.N. Quadratic regularization methods with finite-difference gradient approximations. Comput. Optim. Appl. 2022, 1–21. [Google Scholar] [CrossRef]
Paquette, C.; Scheinberg, K. A stochastic line search method with expected complexity analysis. SIAM J. Optim. 2020, 30, 349–376. [Google Scholar] [CrossRef]
Ali, M.M.; Khompatraporn, C.; Zabinsky, Z.B. A numerical evaluation of several stochastic algorithms on selected continuous global optimization test problems. J. Glob. Optim. 2005, 31, 635–672. [Google Scholar] [CrossRef]
Barbosa, H.J.; Bernardino, H.S.; Barreto, A.M. Using performance profiles to analyze the results of the 2006 CEC constrained optimization competition. In Proceedings of the IEEE Congress on Evolutionary Computation, Barcelona, Spain, 18–23 July 2010; pp. 1–8. [Google Scholar]
Dolan, E.D.; Moré, J.J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213. [Google Scholar] [CrossRef]
Moré, J.J.; Wild, S.M. Benchmarking derivative-free optimization algorithms. SIAM J. Optim. 2009, 20, 172–191. [Google Scholar] [CrossRef] [Green Version]
Vaz, A.I.F.; Vicente, L.N. A particle swarm pattern search method for bound constrained global optimization. J. Glob. Optim. 2007, 39, 197–219. [Google Scholar] [CrossRef] [Green Version]
Liang, J.; Runarsson, T.P.; Mezura-Montes, E.; Clerc, M.; Suganthan, P.N.; Coello, C.C.; Deb, K. Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. J. Appl. Mech. 2006, 41, 8–31. [Google Scholar]
Mohamed, A.W.; Hadi, A.A.; Mohamed, A.K.; Awad, N.H. Evaluating the performance of adaptive gainingsharing knowledge based algorithm on cec 2020 benchmark problems. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
Bessaou, M.; Siarry, P. A genetic algorithm with real-value coding to optimize multimodal continuous functions. Struct. Multidisc. Optim. 2001, 23, 63–74. [Google Scholar] [CrossRef]
Chelouah, R.; Siarry, P. Tabu search applied to global optimization. Eur. J. Oper. Res. 2000, 123, 256–270. [Google Scholar] [CrossRef]
Fan, S.K.S.; Zahara, E. A hybrid simplex search and particle swarm optimization for unconstrained optimization. Eur. J. Oper. Res. 2007, 181, 527–548. [Google Scholar] [CrossRef]
Paulavičius, R.; Chiter, L.; Žilinskas, J. Global optimization based on bisection of rectangles, function values at diagonals, and a set of Lipschitz constants. J. Glob. Optim. 2018, 71, 5–20. [Google Scholar] [CrossRef]
Cardoso, M.F.; Salcedo, R.L.; De Azevedo, S.F. The simplex-simulated annealing approach to continuous non-linear optimization. Comput. Chem. Eng. 1996, 20, 1065–1080. [Google Scholar] [CrossRef]
Dekkers, A.; Aarts, E. Global optimization and simulated annealing. Math. Program. 1991, 50, 367–393. [Google Scholar] [CrossRef] [Green Version]
Tsoulos, I.G.; Stavrakoudis, A. Enhancing PSO methods for global optimization. Appl. Math. Comput. 2010, 216, 2988–3001. [Google Scholar] [CrossRef]
Siarry, P.; Berthiau, G.; Durbin, F.; Haussy, J. Enhanced Simulated-Annealing Algorithm for Globally Minimizing Functions of Many Continuous Variables. ACM Trans. Math. Softw. 1997, 23, 209–228. [Google Scholar] [CrossRef]
Laguna, M.; Martí, R. Experimental testing of advanced scatter search designs for global optimization of multimodal functions. J. Glob. Optim. 2005, 33, 235–255. [Google Scholar] [CrossRef]

Figure 1. Plotting the results of the terms itr.w and FEs.w for 5 algorithms.

Figure 2. Plotting the results of the terms itr.be and FEs.be for 5 algorithms.

Figure 3. Plotting the results of the term time.a “CPU” for 5 algorithms.

Figure 4. Plotting the results of the terms itr.a and FEs.a for 5 algorithms.

Figure 5. Plotting the results of the terms itr.w and FEs.w for 5 algorithms.

Figure 6. Plotting the results of the terms itr.be and FEs.be for 5 algorithms.

Figure 7. Plotting the results of the term time.a “CPU” for 5 algorithms.

Figure 8. Plotting the results of the terms itr.a and FEs.a for 5 algorithms.

Table 1. List of both kinds of test problems.

f	n	$f (x^{*})$	$∥ g (x^{*}) ∥$	f	n	$f (x^{*})$	$∥ g (x^{*}) ∥$
$R n$	10, 30, 50, 80, 100	0	-	$Z n$	10, 30, 50, 80, 100	0	-
$P W$	8, 32, 84, 120	0	-	$S P$	10, 30, 80, 100	0	-
$T r$	10, 30, 60, 80	$\frac{- n (n + 4) (n - 1)}{6}$	-	$S u$	10, 30, 50, 80, 100	0	-
$C V$	4	0	-	$B R$	2	0.397887	-
$D J$	3	0	-	$B O$	2	0	-
$M a$	2	0	-	$S 5^{*}$	4	−10.1532	3.2 × 10 $^{- 5}$
$S 7^{*}$	4	−10.4029	-	$S 10^{*}$	4	−10.5364	3 × 10 $^{- 5}$
$G P^{*}$	2	3	2 × 10 $^{- 6}$	$R a s^{*}$	2	−2	2.5 × 10 $^{- 6}$
$B h 1^{*}$	2	0	2.4 × 10 $^{- 5}$	$S H^{*}$	2	−186.7309	2 × 10 $^{- 6}$
$P 8^{*}$	3	0	-	$P 16^{*}$	5	0	1.2 × 10 $^{- 6}$
$C B^{*}$	2	−1.0316285	2 × 10 $^{- 5}$	$H 3^{*}$	3	−3.86278	2 × 10 $^{- 5}$
$H 6^{*}$	6	−3.32237	6 × 10 $^{- 5}$	$H M^{*}$	2	0	1.1 × 10 $^{- 8}$
$L e^{*}$	10	0	2.1 × 10 $^{- 6}$

Table 2. The number of worst iterations.

f	n	SHZ	MHZ	HZ	HS	FR	f	n	SHZ	MHZ	HZ	HS	FR
$R n$	10	2915	3740	5705	5080	5185	$R n$	30	2270	3555	5170	5140	5050
$R n$	50	2605	3805	5705	5290	5145	$R n$	80	2750	4010	5795	5150	5890
$R n$	100	2820	2950	5050	5930	5840	$Z n$	10	145	170	225	210	195
$Z n$	30	1075	995	1825	1575	1425	$Z n$	50	2295	2600	4180	3645	3515
$Z n$	80	5335	4900	9255	8610	7345	$Z n$	100	9095	7490	9905	9905	9905
$P W$	8	1470	2230	7120	3980	970	$P W$	32	2135	4515	9700	9700	2075
$P W$	84	3345	6575	9885	9885	2145	$P W$	120	4385	7750	9920	9920	4495
$S P$	10	15	25	25	30	25	$S P$	30	15	25	30	30	30
$S P$	80	15	30	25	35	25	$S P$	100	15	30	30	35	30
$T r$	10	575	160	135	355	155	$T r$	30	2830	1765	2055	9680	2280
$T r$	60	9840	9840	9840	9840	9840	$T r$	100	9880	9905	9905	9905	9905
$S u$	100	155	155	190	200	185	$S u$	80	140	135	175	190	185
$S u$	50	115	95	130	130	130	$S u$	30	75	80	90	95	80
$S u$	10	45	40	45	40	40	$B R$	2	75	75	70	65	200
$C V$	4	2070	1745	1760	2455	5705	$D J$	3	15	15	35	40	30
$B O$	2	35	35	40	40	35	$M a$	2	80	105	65	F	140
$S 5^{*}$	4	115	445	150	750	155	$S 7^{*}$	4	200	275	220	1500	215
$S 10^{*}$	4	100	250	205	620	120	$G P^{*}$	2	6670	6670	6670	6670	6670
$R a s^{*}$	2	30	175	1665	280	220	$B h 1^{*}$	2	35	50	400	70	75
$S H^{*}$	2	6670	6670	6670	6670	6670	$P 8^{*}$	4	20	8000	8000	1880	4730
$P 16^{*}$	5	20	8000	8000	1880	4730	$C B^{*}$	2	25	25	115	25	150
$H 3^{*}$	3	415	655	1300	365	7500	$H 6^{*}$	6	445	1425	2190	8575	565
$H M^{*}$	2	25	30	25	25	25	$L e^{*}$	10	1105	1575	1815	1025	1200

Table 3. The number of worst function evaluations.

f	n	SHZ	MHZ	HZ	HS	FR	f	n	SHZ	MHZ	HZ	HS	FR
$R n$	10	32,065	41,140	290,955	55,880	57,035	$R n$	30	70,370	110,205	160,270	159,340	156,550
$R n$	50	132,855	194,055	290,955	269,790	262,395	$R n$	80	222,750	324,810	469,395	417,150	477,090
$R n$	100	284,820	297,950	510,050	598,930	589,840	$Z n$	10	1595	1870	2475	2310	2145
$Z n$	30	33,325	30,845	56,575	48,825	44,175	$Z n$	50	117,045	132,600	213,180	185,895	179,265
$Z n$	80	432,135	396,900	749,655	697,410	594,945	$Z n$	100	918,595	756,490	1,000,405	1,000,405	1,000,405
$P W$	8	13,230	20,070	64,080	35,820	8730	$P W$	32	70,455	148,995	320,100	320,100	68,475
$P W$	84	284,325	558,875	840,225	840,225	182,325	$P W$	120	530,585	937,750	1,200,320	1,200,320	543,895
$S P$	10	165	275	275	330	275	$S P$	30	465	775	930	930	930
$S P$	80	1215	2430	2025	2835	2025	$S P$	100	1515	3030	3030	3535	3030
$T r$	10	6325	1760	1485	3905	1705	$T r$	30	87,730	54,715	63,705	300,080	70,680
$T r$	60	600,240	600,240	600,240	600,240	600,240	$T r$	100	800,280	1,000,405	1,000,405	1,000,405	1,000,405
$S u$	100	15,655	15,655	19,190	20,200	18,685	$S u$	80	11,340	10,935	14,175	15,390	14,985
$S u$	50	5865	4845	6630	6630	6630	$S u$	30	2325	2480	2790	2945	2480
$S u$	10	495	440	495	440	440	$B R$	2	225	225	210	195	600
$C V$	4	10,350	8725	8800	12,275	28,525	$D J$	3	60	60	140	160	120
$B O$	2	105	105	160	120	105	$M a$	2	240	315	195	F	420
$S 5^{*}$	4	575	2225	750	3750	775	$S 7^{*}$	4	1000	1375	1100	7500	1075
$S 10^{*}$	4	500	1250	1025	3100	600	$G P^{*}$	2	20,010	20,010	20,010	20,010	20,010
$R a s^{*}$	2	90	525	4995	840	660	$B h 1^{*}$	2	105	150	1200	210	225
$S H^{*}$	2	20,010	20,010	20,010	20,010	20,010	$P 8^{*}$	4	100	40,000	40,000	9400	23,650
$P 16^{*}$	5	100	40,000	40,000	9400	23,650	$C B^{*}$	2	75	75	345	75	450
$H 3^{*}$	3	1660	2620	5200	1460	30,000	$H 6^{*}$	6	3115	9975	15,330	60,025	3955
$H M^{*}$	2	75	90	75	75	75	$L e^{*}$	10	12,155	17,325	19,965	11,275	13,200

Table 4. The number of best iterations.

f	n	SHZ	MHZ	HZ	HS	FR	f	n	SHZ	MHZ	HZ	HS	FR
$R n$	10	360	460	490	520	510	$R n$	30	230	485	190	590	420
$R n$	50	705	375	490	650	440	$R n$	80	230	400	920	125	460
$R n$	100	240	275	885	670	705	$Z n$	10	60	60	115	80	75
$Z n$	30	245	330	875	875	810	$Z n$	50	570	905	2235	1765	1885
$Z n$	80	935	1565	4080	4365	3495	$Z n$	100	1670	2545	6345	6045	5095
$P W$	8	180	175	2080	375	225	$P W$	32	280	2610	1250	390	280
$P W$	84	510	3525	4115	520	410	$P W$	120	535	2745	2765	395	435
$S P$	10	5	10	10	20	10	$S P$	30	10	10	10	20	10
$S P$	80	10	10	10	20	15	$S P$	100	10	10	10	20	15
$T r$	10	85	85	65	80	55	$T r$	30	735	1370	230	350	220
$T r$	60	9840	9840	9840	9840	430	$T r$	100	9880	9905	9905	9905	9905
$S u$	100	70	80	95	95	75	$S u$	80	65	55	75	100	70
$S u$	50	50	55	60	70	50	$S u$	30	40	40	40	45	40
$S u$	10	20	25	20	25	20	$B R$	2	15	15	15	15	10
$C V$	4	275	275	690	370	600	$D J$	3	10	10	10	20	10
$B O$	2	15	15	20	20	20	$M a$	2	30	20	20	F	15
$S 5^{*}$	4	15	20	20	25	125	$S 7^{*}$	4	15	15	15	30	100
$S 10^{*}$	4	15	15	20	15	100	$G P^{*}$	2	25	180	170	60	165
$R a s^{*}$	2	10	20	95	15	45	$B h 1^{*}$	2	20	20	30	25	75
$S H^{*}$	2	390	255	840	155	20010	$P 8^{*}$	4	10	15	5	15	125
$P 16^{*}$	5	10	15	5	15	125	$C B^{*}$	2	15	15	20	15	45
$H 3^{*}$	3	5	5	5	5	15	$H 6^{*}$	6	50	50	50	50	175
$H M^{*}$	2	10	15	15	15	30	$L e^{*}$	10	65	40	105	70	550

Table 5. The number of best function evaluations.

f	n	SHZ	MHZ	HZ	HS	FR	f	n	SHZ	MHZ	HZ	HS	FR
$R n$	10	3960	5060	24,990	5720	5610	$R n$	30	7130	15,035	5890	18,290	13,020
$R n$	50	35,955	19,125	24,990	33,150	22,440	$R n$	80	18,630	32,400	74,520	10,125	37,260
$R n$	100	24,240	27,775	89,385	67,670	71,205	$Z n$	10	660	660	1265	880	825
$Z n$	30	7595	10,230	27,125	27,125	25,110	$Z n$	50	29,070	46,155	113,985	90,015	96,135
$Z n$	80	75,735	126,765	330,480	353,565	283,095	$Z n$	100	168,670	257,045	640,845	610,545	514,595
$P W$	8	1620	1575	18,720	3375	2025	$P W$	32	9240	86,130	41,250	12,870	9240
$P W$	84	43,350	299,625	349,775	44,200	34,850	$P W$	120	64,735	332,145	334,565	47,795	52,635
$S P$	10	55	110	110	220	110	$S P$	30	310	310	310	620	310
$S P$	80	810	810	810	1620	1215	$S P$	100	1010	1010	1010	2020	1515
$T r$	10	935	935	715	880	605	$T r$	30	22,785	42,470	7130	10,850	6820
$T r$	60	600,240	600,240	600,240	600,240	26,230	$T r$	100	800,280	1,000,405	1,000,405	1,000,405	1,000,405
$S u$	100	7070	8080	9595	9595	7575	$S u$	80	5265	4455	6075	8100	5670
$S u$	50	2550	2805	3060	3570	2550	$S u$	30	1240	1240	1240	1395	1240
$S u$	10	220	275	220	275	220	$B R$	2	45	45	45	45	30
$C V$	4	1375	1375	3450	1850	3000	$D J$	3	40	40	40	80	40
$B O$	2	45	45	80	60	60	$M a$	2	90	60	60	F	45
$S 5^{*}$	4	75	100	100	125	125	$S 7^{*}$	4	75	75	75	150	100
$S 10^{*}$	4	75	75	100	75	100	$G P^{*}$	2	75	540	510	180	165
$R a s^{*}$	2	30	60	285	45	45	$B h 1^{*}$	2	60	60	90	75	75
$S H^{*}$	2	1170	765	2520	465	20,010	$P 8^{*}$	4	50	75	25	75	125
$P 16^{*}$	5	50	75	25	75	125	$C B^{*}$	2	45	45	60	45	45
$H 3^{*}$	3	15	15	15	15	15	$H 6^{*}$	6	300	300	300	300	175
$H M^{*}$	2	30	45	45	45	30	$L e^{*}$	10	715	440	1155	770	550

Table 6. The average of time.

f	n	SHZ	MHZ	HZ	HS	FR	f	n	SHZ	MHZ	HZ	HS	FR
$R n$	10	1.463	1.441	3.316	2.436	3.215	$R n$	30	2.771	3.702	6.831	6.934	6.816
$R n$	50	5.571	6.123	13.288	12.286	13.258	$R n$	80	10.149	11.273	19.529	21.934	22.283
$R n$	100	14.761	15.973	29.596	29.495	32.934	$Z n$	10	0.083	0.091	0.139	0.137	0.115
$Z n$	30	1.336	1.365	2.477	3.220	2.290	$Z n$	50	5.445	5.297	11.689	11.293	10.938
$Z n$	80	24.818	27.532	58.547	55.403	50.910	$Z n$	100	53.552	51.210	107.859	104.313	109.531
$P W$	8	0.493	1.231	4.760	0.985	0.309	$P W$	32	1.783	6.812	21.873	6.496	1.085
$P W$	84	8.779	38.644	76.499	16.061	4.562	$P W$	120	16.955	72.473	113.171	29.623	7.631
$S P$	10	0.011	0.016	0.020	0.026	0.017	$S P$	30	0.021	0.032	0.033	0.042	0.036
$S P$	80	0.060	0.099	0.096	0.165	0.096	$S P$	100	0.075	0.137	0.125	0.191	0.148
$T r$	10	0.183	0.084	0.068	0.144	0.069	$T r$	30	2.948	2.891	1.982	24.737	1.256
$T r$	60	63.990	73.812	80.235	58.588	70.106	$T r$	100	90.259	130.122	134.463	145.078	135.992
$S u$	100	4.542	4.706	4.736	5.194	5.127	$S u$	80	2.288	2.753	2.839	2.948	2.716
$S u$	50	0.780	0.799	0.842	0.921	0.889	$S u$	30	0.294	0.265	0.298	0.296	0.247
$S u$	10	0.051	0.043	0.038	0.041	0.036	$B R$	2	0.022	0.024	0.022	0.019	0.045
$C V$	4	0.568	0.505	0.762	0.774	6.317	$D J$	3	0.008	0.009	0.013	0.020	0.013
$B O$	2	0.014	0.014	0.016	0.016	0.016	$M a$	2	0.026	0.026	0.017	F	0.019
$S 5^{*}$	4	0.107	0.321	0.166	0.297	0.162	$S 7^{*}$	4	0.231	0.204	0.180	0.554	0.273
$S 10^{*}$	4	0.124	0.180	0.208	0.432	0.194	$G P^{*}$	2	6.068	7.927	5.410	11.203	3.164
$R a s^{*}$	2	0.021	0.091	1.355	0.109	0.120	$B h 1^{*}$	2	0.019	0.030	0.184	0.039	0.043
$S H^{*}$	2	13.341	11.597	13.226	12.487	17.294	$P 8^{*}$	4	0.011	0.307	0.234	0.247	0.155
$P 16^{*}$	5	0.128	0.276	3.452	0.129	3.886	$C B^{*}$	2	0.014	0.015	0.060	0.015	0.058
$H 3^{*}$	3	0.103	0.411	0.203	0.114	0.400	$H 6^{*}$	6	0.224	0.902	0.205	1.064	0.164
$H M^{*}$	2	0.016	0.021	0.015	0.015	0.016	$L e^{*}$	10	0.501	0.513	0.836	0.553	0.612

Table 7. The average of number of iterations.

f	n	SHZ	MHZ	HZ	HS	FR	f	n	SHZ	MHZ	HZ	HS	FR
$R n$	10	1469.3	1479.3	3114.8	2517.2	2952.5	$R n$	30	1273.4	1523.8	2877.9	2867.5	2721.5
$R n$	50	1375	1530.6	3114.8	2746.4	2851.1	$R n$	80	1379.2	1535.2	2524.7	2885.5	2593.2
$R n$	100	1403.9	1421.2	2821	2839	2809.7	$Z n$	10	104.61	108.92	168.04	155.2	142.16
$Z n$	30	654.02	674.22	1229.3	1187.3	1078.8	$Z n$	50	1491.3	1479.2	2947.1	2914.6	2817
$Z n$	80	3378.6	3298.6	6529	6519.2	6185.1	$Z n$	100	5125.6	4818.4	9066.1	8703.7	9048.1
$P W$	8	697.16	1731.6	5774.2	1339.6	441.47	$P W$	32	1042.5	3675	8852.9	2993.3	660.1
$P W$	84	1665.2	5767.7	9547.3	2632.5	817.55	$P W$	120	1774.8	6851.4	9424	2964.5	897.55
$S P$	10	10.294	16.765	16.471	23.824	18.529	$S P$	30	10.784	17.941	18.627	25.294	21.176
$S P$	80	11.176	19.118	19.216	26.471	19.412	$S P$	100	10.588	19.314	18.725	26.765	20.98
$T r$	10	283.24	125.88	100.59	182.06	96.961	$T r$	30	1713.5	1610.5	1053	7268.8	649.41
$T r$	60	9840	9840	9840	9840	9117.5	$T r$	100	9880	9905	9905	9905	9905
$S u$	100	116.08	112.35	152.06	147.94	141.86	$S u$	80	96.373	99.02	134.9	137.25	117.35
$S u$	50	76.667	72.353	95.98	95.686	89.51	$S u$	30	58.725	54.314	70.392	69.608	57.255
$S u$	10	32.451	29.706	31.961	33.627	29.706	$B R$	2	36.961	32.255	35.686	31.667	49.902
$C V$	4	704.41	634.9	1114.1	1015.2	3365.8	$D J$	3	11.078	11.373	21.176	31.275	18.824
$B O$	2	24.608	23.824	29.02	28.235	27.451	$M a$	2	58.824	60.882	39.902	F	40.882
$S 5^{*}$	4	52	106.5	76.75	255.25	66.75	$S 7^{*}$	4	73.25	73	61.25	387.75	76.5
$S 10^{*}$	4	40.25	49.75	57.75	172.75	51.75	$G P^{*}$	2	2053.8	3273.3	2340.3	4125	1381
$R a s^{*}$	2	21.5	68.25	802.25	86.25	88.25	$B h 1^{*}$	2	28.5	33.25	119.5	38	42.5
$S H^{*}$	2	5927.5	5855	6184.5	6344.3	6670	$P 8^{*}$	4	13.25	771.5	575.25	603.75	367.5
$P 16^{*}$	5	13.25	771.5	575.25	603.75	367.5	$C B^{*}$	2	19.75	19	50.75	19.75	47.5
$H 3^{*}$	3	92.25	201.25	225.5	104.75	450	$H 6^{*}$	6	108.25	250	130.25	580.5	103.75
$H M^{*}$	2	19.75	20.25	19	19.25	19	$L e^{*}$	10	303.5	323.25	550.75	375	379

Table 8. The average of number of function evaluations.

f	n	SHZ	MHZ	HZ	HS	FR	f	n	SHZ	MHZ	HZ	HS	FR
$R n$	10	16,163	16,273	158,855	27,689	32,477	$R n$	30	39,476	47,239	89,216	88,894	84,366
$R n$	50	70,125	78,060	158,855	140,065	145,405	$R n$	80	111,717	124,351	204,501	233,725	210,052
$R n$	100	141,796	143,539	284,919	286,741	283,780	$Z n$	10	1151	1198	1848	1707	1564
$Z n$	30	20,275	20,901	38,109	36,805	33,444	$Z n$	50	76,055	75,440	150,300	148,645	143,665
$Z n$	80	273,669	267,189	528,851	528,057	500,993	$Z n$	100	517,684	486,662	915,674	879,076	913,862
$P W$	8	6274	15,584	51,968	12,057	3973	$P W$	32	34,404	121,275	292,147	98,780	21,783
$P W$	84	141,542	490,258	811,517	223,758	69,492	$P W$	120	214,751	829,016	1,140,306	358,706	108,603
$S P$	10	113	184	181	262	204	$S P$	30	334	556	578	784	657
$S P$	80	905	1549	1557	2144	1572	$S P$	100	1069	1951	1891	2703	2119
$T r$	10	3116	1385	1107	2003	1067	$T r$	30	53,119	49,925	32,644	225,333	20,132
$T r$	60	600,240	600,240	600,240	600,240	556,165	$T r$	100	800,280	1,000,405	1,000,405	1,000,405	1,000,405
$S u$	100	11,724	11,348	15,358	14,942	14,328	$S u$	80	7806	8021	10,927	11,118	9506
$S u$	50	3910	3690	4895	4880	4565	$S u$	30	1821	1684	2182	2158	1775
$S u$	10	357	327	352	370	327	$B R$	2	111	97	107	95	150
$C V$	4	3522	3175	5571	5076	16,829	$D J$	3	44	46	85	125	75
$B O$	2	74	72	116	85	82	$M a$	2	177	183	120	F	123
$S 5^{*}$	4	260	533	384	1276	334	$S 7^{*}$	4	366	365	306	1939	383
$S 10^{*}$	4	201	249	289	864	259	$G P^{*}$	2	6161	9820	7021	12,375	4143
$R a s^{*}$	2	65	205	2407	259	265	$B h 1^{*}$	2	86	100	359	114	128
$S H^{*}$	2	17,783	17565	18,554	19,033	20,010	$P 8^{*}$	4	66	3858	2876	3019	1838
$P 16^{*}$	5	66	3858	2876	3019	1838	$C B^{*}$	2	59	57	152	59	143
$H 3^{*}$	3	369	805	902	419	1800	$H 6^{*}$	6	758	1750	912	4064	671
$H M^{*}$	2	59	61	57	58	57	$L e^{*}$	10	3339	3556	6058	4125	4169

Table 9. The number of worst iterations.

f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR	f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR
$S 5^{*}$	4	3150	55	85	F	F	$S 7^{*}$	4	10,000	F	10,000	F	F
$S 10^{*}$	4	710	F	3020	F	F	$H M^{*}$	2	40	100	95	75	180
$H^{*}$	3	300	590	1155	465	1270	$H^{*}$	6	50	500	300	9550	F
$C B^{*}$	2	55	145	15	200	90	$P 8^{*}$	4	20	15	15	550	10
$P 16^{*}$	5	755	835	3280	F	7300	$S H^{*}$	2	100	115	200	250	190
$B h 1^{*}$	2	205	F	F	F	F	$R a s^{*}$	2	1310	F	F	F	F
$G P^{*}$	2	20	F	F	300	F	$L e^{*}$	10	2470	1430	F	F	3100

Table 10. The number of worst function evaluations.

f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR	f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR
$S 5^{*}$	4	12,600	220	340	F	F	$S 7^{*}$	4	40,000	F	40,000	F	F
$S 10^{*}$	4	2840	F	12,080	F	F	$H M^{*}$	2	120	200	190	150	360
$H^{*}$	3	900	1770	3465	1395	3810	$H^{*}$	6	300	3000	1800	57,300	F
$C B^{*}$	2	110	290	30	400	180	$P 8^{*}$	4	80	60	60	2200	40
$P 16^{*}$	5	3775	4175	16,400	F	36,500	$S H^{*}$	2	200	230	400	500	380
$B h 1^{*}$	2	410	F	F	F	F	$R a s^{*}$	2	2620	F	F	F	F
$G P^{*}$	2	40	F	F	600	F	$L e^{*}$	10	24,700	14,300	F	F	31,000

Table 11. The number of best iterations.

f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR	f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR
$S 5^{*}$	4	50	35	55	F	F	$S 7^{*}$	4	750	F	520	F	F
$S 10^{*}$	4	20	F	70	F	F	$H M^{*}$	2	15	10	10	10	5
$H^{*}$	3	50	60	85	20	130	$H^{*}$	6	50	100	100	50	F
$C B^{*}$	2	15	10	10	50	10	$P 8^{*}$	4	5	5	5	50	5
$P 16^{*}$	5	150	35	80	F	40	$S H^{*}$	2	10	10	10	50	10
$B h 1^{*}$	2	20	F	F	F	F	$R a s^{*}$	2	10	F	F	F	F
$G P^{*}$	2	15	F	F	50	F	$L e^{*}$	10	400	120	F	F	395

Table 12. The number of best function evaluations.

f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR	f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR
$S 5^{*}$	4	200	140	220	F	F	$S 7^{*}$	4	3000	F	2080	F	F
$S 10^{*}$	4	80	F	280	F	F	$H M^{*}$	2	45	20	20	20	10
$H^{*}$	3	150	180	255	60	390	$H^{*}$	6	300	600	600	300	F
$C B^{*}$	2	30	20	20	100	20	$P 8^{*}$	4	20	20	20	200	20
$P 16^{*}$	5	725	175	400	F	200	$S H^{*}$	2	20	20	20	100	20
$B h 1^{*}$	2	40	F	F	F	F	$R a s^{*}$	2	20	F	F	F	F
$G P^{*}$	2	30	F	F	100	F	$L e^{*}$	10	4000	1200	F	F	3950

Table 13. The average of time.

f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR	f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR
$S 5^{*}$	4	0.720	0.050	0.046	F	F	$S 7^{*}$	4	7.368	F	13.249	F	F
$S 10^{*}$	4	0.151	F	0.885	F	F	$H M^{*}$	2	0.017	0.031	0.028	0.021	0.053
$H^{*}$	3	0.186	0.353	0.409	0.271	0.361	$H^{*}$	6	0.057	0.194	0.143	4.712	F
$C B^{*}$	2	0.018	0.025	0.010	0.049	0.030	$P 8^{*}$	4	0.014	0.015	0.009	0.116	0.007
$P 16^{*}$	5	0.319	0.135	0.683	F	1.606	$S H^{*}$	2	0.028	0.037	0.050	0.084	0.039
$B h 1^{*}$	2	0.039	F	F	F	F	$R a s^{*}$	2	0.261	F	F	F	F
$G P^{*}$	2	0.015	F	F	0.078	F	$L e^{*}$	10	1.221	0.627	F	F	2.087

Table 14. The average of number of iterations.

f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR	f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR
$S 5^{*}$	4	416.7	47	67	F	F	$S 7^{*}$	4	5928.7	F	8648	F	F
$S 10^{*}$	4	131.3	F	589	F	F	$H M^{*}$	2	24.3	29	34.8	25.8	50.8
$H^{*}$	3	213.3	268.8	373.3	247	333.8	$H^{*}$	6	50	205	177.5	2382.5	F
$C B^{*}$	2	26	23.3	12.8	57.5	36.8	$P 8^{*}$	4	12	11	10.5	267.5	6.3
$P 16^{*}$	5	376	171.25	878.8	F	2208.5	$S H^{*}$	2	30.3	39	48.5	65	41.5
$B h 1^{*}$	2	74.3	F	F	F	F	$R a s^{*}$	2	346.7	F	F	F	F
$G P^{*}$	2	18	F	F	62.5	F	$L e^{*}$	10	1012.7	506	F	F	1380.3

Table 15. The average of number of function evaluations.

f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR	f	n	HSSHZ	HSMHZ	HSHZ	HSHS	HSFR
$S 5^{*}$	4	1666.7	188	268	F	F	$S 7^{*}$	4	23,714.7	F	34,592	F	F
$S 10^{*}$	4	525.3	F	2356	F	F	$H M^{*}$	2	73	58	69.5	51.5	101.5
$H^{*}$	3	640	806.3	1119.8	741	1001.3	$H^{*}$	6	300	1230	1065	14,295	F
$C B^{*}$	2	52	46.5	25.5	115	73.5	$P 8^{*}$	4	48	44	42	1070	25
$P 16^{*}$	5	1880	856.3	4393.8	F	11,042.5	$S H^{*}$	2	60.7	78	97	130	83
$B h 1^{*}$	2	148.7	F	F	F	F	$R a s^{*}$	2	693.3	F	F	F	F
$G P^{*}$	2	36	F	F	125	F	$L e^{*}$	10	10,126.7	5060	F	F	13,802.5

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Alnowibet, K.A.; Mahdi, S.; Alshamrani, A.M.; Sallam, K.M.; Mohamed, A.W. A Family of Hybrid Stochastic Conjugate Gradient Algorithms for Local and Global Minimization Problems. Mathematics 2022, 10, 3595. https://doi.org/10.3390/math10193595

AMA Style

Alnowibet KA, Mahdi S, Alshamrani AM, Sallam KM, Mohamed AW. A Family of Hybrid Stochastic Conjugate Gradient Algorithms for Local and Global Minimization Problems. Mathematics. 2022; 10(19):3595. https://doi.org/10.3390/math10193595

Chicago/Turabian Style

Alnowibet, Khalid Abdulaziz, Salem Mahdi, Ahmad M. Alshamrani, Karam M. Sallam, and Ali Wagdy Mohamed. 2022. "A Family of Hybrid Stochastic Conjugate Gradient Algorithms for Local and Global Minimization Problems" Mathematics 10, no. 19: 3595. https://doi.org/10.3390/math10193595

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Family of Hybrid Stochastic Conjugate Gradient Algorithms for Local and Global Minimization Problems

Abstract

1. Introduction

Part I: Local Minimization Problem

2. Suggested CG Method

Convergence Analysis of Algorithm 1

3. Numerical Differentiation

3.1. Error Analysis

3.2. Selecting a Step-Size h

3.3. Estimating Gradient Vector

3.4. Convergence Analysis of DFF

4. Numerical Experiments of Part I

4.1. Stopping Criteria of Algorithm 1

4.2. Performance Profiles

Part II: Global Minimization Problem

5. Random Technique

Random Parameters (SP Technique)

6. Hybridization of the CG Method with Stochastic Parameters

A Mechanism Running Algorithm 3

7. Numerical Experiments of Part II

8. Conclusions and Future Work

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. List of Test Problems

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI