A MODIFIED NONLINEAR CONJUGATE GRADIENT ALGORITHM FOR UNCONSTRAINED OPTIMIZATION AND PORTFOLIO SELECTION PROBLEMS

Abstract. Conjugate gradient methods play a vital role in finding solutions of large-scale optimization problems due to their ease of implementation, low memory requirements and good convergence properties. In this paper, we propose a new conjugate gradient method whose direction satisfies the sufficient descent property. We establish global convergence of the new method under the strong Wolfe line search conditions. Numerical results show that the new method performs better than other relevant methods in the literature. Furthermore, we use the new method to solve a portfolio selection problem.


Introduction
Conjugate gradient (CG) methods have become a widely attractive option for solving large-scale unconstrained optimization problems. They are easy to implement and have good global convergence properties. Unlike Newton and quasi-Newton methods, they require little memory as they do not store any matrices. Conjugate gradient methods are applicable in areas such as the reconstruction of radial magnetic resonance (MR) images [41], portfolio selection [1,5,11], motion control problems [2,3], and compressive sensing and image restoration problems [24,40,42].
Generally, large-scale unconstrained optimization problems are of the form
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. To solve (1), CG methods follow the iterative scheme
$$x_{k+1} = x_k + \alpha_k d_k, \qquad k = 0, 1, 2, \ldots,$$
where $x_k$ and $x_{k+1}$ are the current and next iteration points, respectively, $\alpha_k$ is the step length and $d_k$ is the search direction. A single iteration thus moves a point $x_k$ to a new point $x_{k+1}$ along the search direction $d_k$, taking a step of size $\alpha_k$. Here, $\alpha_k$ can be determined using an exact or inexact line search, and the direction $d_k$ is given by
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \geq 1, \end{cases}$$
where $g_k = \nabla f(x_k)$ and $\beta_k$ is a CG parameter which determines the CG method. Over the years, researchers have explored numerous ways in which $\beta_k$ can be chosen. Amongst these are the classical Hestenes and Stiefel (HS) [19], Fletcher and Reeves (FR) [18], Polak, Ribière and Polyak (PRP) [32,33], Dai and Yuan (DY) [10], Liu and Storey (LS) [26] and Conjugate Descent (CD) [17] parameters.
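To make the generic scheme concrete, the following MATLAB sketch implements the iteration above with the FR parameter as the choice of $\beta_k$ and a simple backtracking (Armijo) step standing in for a Wolfe-type line search; the function handles f and gradf and all names here are illustrative, not part of any particular method from the literature.

```matlab
% A minimal sketch of the generic CG iteration (names illustrative).
% beta_k is the FR choice; the backtracking step is a stand-in for the
% (strong) Wolfe line searches used by the methods discussed in the text.
function x = cg_sketch(f, gradf, x0, tol, maxit)
    x = x0;  g = gradf(x);  d = -g;           % d_0 = -g_0
    for k = 1:maxit
        if norm(g) <= tol, break; end         % stop when ||g_k|| <= tol
        if g'*d >= 0, d = -g; end             % safeguard: restart if descent is lost
        alpha = 1;                            % backtrack until the Armijo test holds
        while f(x + alpha*d) > f(x) + 1e-4*alpha*(g'*d)
            alpha = alpha/2;
        end
        x = x + alpha*d;                      % x_{k+1} = x_k + alpha_k d_k
        gnew = gradf(x);
        beta = (gnew'*gnew)/(g'*g);           % FR: beta_k = ||g_k||^2/||g_{k-1}||^2
        d = -gnew + beta*d;                   % d_k = -g_k + beta_k d_{k-1}
        g = gnew;
    end
end
```

For instance, cg_sketch(@(x) x'*x, @(x) 2*x, ones(100,1), 1e-6, 10000) minimizes a simple quadratic.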
Research has shown that, of these classical CG methods, the PRP, HS and LS methods have good computational performance, while the other three have good convergence properties. Hence, much recent research has gone into constructing CG algorithms that possess both good numerical performance and strong convergence properties. This has led to the development of variations of CG methods such as hybrid CG methods [13,15,31], spectral CG methods [5,21] and three-term CG methods [20,23,27,36], among others.
In an attempt to improve numerical performance by incorporating both gradient and function value information, Yin et al. [39] recently proposed a CG algorithm that satisfies the sufficient descent property and possesses the trust region feature (that is, $\|d_k\|$ is bounded by a constant multiple of $\|g_k\|$, where $\|\cdot\|$ denotes the Euclidean norm); their direction $d_k$ involves a parameter $\tau > 0$ and two further constants in $(0, 1)$. Another way to improve the numerical performance of a CG method is to incorporate a restart feature in the direction scheme [25]. Using this idea, Jiang et al. [23] proposed a three-term CG method with a restart feature, whose search direction involves a parameter $\zeta$ with $0 < \zeta < 1$.

They showed that the method is globally convergent when using the strong Wolfe line search conditions, given by (2) and
$$|g(x_k + \alpha_k d_k)^{\top} d_k| \leq \sigma |g_k^{\top} d_k|,$$
where $0 < \delta < \sigma < 1$. Delladji et al. [12] proposed a hybrid CG algorithm in which the direction $d_k$, which satisfies the conjugacy condition $d_k^{\top} y_{k-1} = 0$ at every iteration, is generated by a rule whose CG parameter is taken to be a convex combination of $\beta_k^{FR}$ and $\beta_k^{BA}$.
Global convergence of the method was established under the strong Wolfe line search. Another CG method that satisfies the sufficient descent property and possesses the trust region feature is that proposed by Wu [37], where the direction is calculated in terms of three positive constants. Global convergence of this method was established under the Wolfe line search conditions given by (2) and
$$g(x_k + \alpha_k d_k)^{\top} d_k \geq \sigma g_k^{\top} d_k,$$
where $0 < \delta < \sigma < 1$. The numerical results show that the method is effective and reliable when compared with relevant methods in the literature.
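Since the methods above are analysed under these line search conditions, it is convenient to state them operationally. The MATLAB fragment below (a sketch; the handles f and gradf and all names are illustrative) checks whether a trial step satisfies the strong Wolfe conditions with parameters $\delta$ and $\sigma$:

```matlab
% Sketch: test the strong Wolfe conditions for a trial step alpha along a
% descent direction d at the point x (0 < delta < sigma < 1).
function ok = strong_wolfe_ok(f, gradf, x, d, alpha, delta, sigma)
    gd = gradf(x)' * d;                                        % g_k' d_k (< 0)
    armijo    = f(x + alpha*d) <= f(x) + delta*alpha*gd;       % sufficient decrease
    curvature = abs(gradf(x + alpha*d)' * d) <= sigma*abs(gd); % strong curvature
    ok = armijo && curvature;
end
```

Relaxing the curvature test to gradf(x + alpha*d)' * d >= sigma*gd gives the (weak) Wolfe conditions used by Wu [37].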
In [16], Faramarzi and Amini proposed a modified spectral CG method and showed that its direction satisfies the sufficient descent property. Global convergence was established under the strong Wolfe line search, and the method was shown to be superior when compared with other methods in the literature. In Kou and Dai [25], a three-term CG method with a restart procedure is presented. The method uses a restart feature whenever a truncation happens; in that case, a term of the direction is set to zero and the method is restarted along a prescribed direction. The authors showed that, owing to the restart procedure, the method achieves much better results.
In the next section, we propose a new CG method that satisfies the sufficient descent condition.We then establish the global convergence of this new method under the strong Wolfe line search in Section 3. We present the numerical experiments followed by an application in portfolio optimization in Section 4. Finally, in the last section we give the concluding remarks.

The method
In 2001, Dai and Liao [9] proposed an extension of the Hestenes-Stiefel (HS) conjugate gradient parameter using their proposed conjugacy condition, giving the Dai-Liao (DL) conjugate gradient parameter
$$\beta_k^{DL} = \frac{g_k^{\top}(y_{k-1} - t s_{k-1})}{d_{k-1}^{\top} y_{k-1}},$$
where $t > 0$ is a scalar, $y_{k-1} = g_k - g_{k-1}$ and $s_{k-1} = x_k - x_{k-1}$. This method was shown to be globally convergent for general functions. Following this, further developments of the DL method were proposed [28,38,43]. In Dai [8], the RMIL+ conjugate gradient parameter is presented as a modification that accomplishes the global convergence of the RMIL CG parameter of Rivaie et al. [34]. Now, motivated by the ideas of these conjugate gradient parameters, specifically the DL numerator $g_k^{\top}(y_{k-1} - t s_{k-1})$ and the HS and RMIL+ CG parameters, we construct a new CG parameter $\beta_k^{DP}$, given in (4) and (5), with $t > 0$ a constant. We choose our direction as
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k^{DP} d_{k-1}, & k \geq 1. \end{cases} \tag{6}$$
Notice that from (4) and (5) we obtain the bound (7) on the parameter, which is used in the convergence analysis. The step length $\alpha_k$ is determined using the strong Wolfe line search conditions. The algorithm of our new CG method is presented as follows.
Algorithm 1. A new DP Conjugate Gradient method.
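In outline, and using the notation above, the iteration proceeds as follows (a sketch consistent with the construction in this section; the tolerance $\epsilon$ and the strong Wolfe parameters $\delta$ and $\sigma$ match the choices reported in the numerical experiments):

Step 0. Choose $x_0 \in \mathbb{R}^n$, $t > 0$, $0 < \delta < \sigma < 1/4$ and a tolerance $\epsilon > 0$; set $d_0 = -g_0$ and $k = 0$.
Step 1. If $\|g_k\| \leq \epsilon$, stop.
Step 2. Compute a step length $\alpha_k$ satisfying the strong Wolfe conditions.
Step 3. Set $x_{k+1} = x_k + \alpha_k d_k$ and compute $g_{k+1}$.
Step 4. Compute $\beta_{k+1}^{DP}$ from (4) and (5), and set $d_{k+1} = -g_{k+1} + \beta_{k+1}^{DP} d_k$.
Step 5. Set $k = k + 1$ and go to Step 1.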

Global convergence
To present the convergence analysis, we make the following fundamental assumptions about the objective function $f$, which have been widely used in the literature.
Assumption 3.1. The level set
$$\Omega = \{x \in \mathbb{R}^n : f(x) \leq f(x_0)\},$$
where $x_0$ is the starting point, is bounded.

Assumption 3.2. The function $f(x)$ is continuously differentiable and its gradient is Lipschitz continuous in some neighbourhood $N$ of $\Omega$; that is, there exists a constant $L > 0$ such that
$$\|\nabla f(x) - \nabla f(y)\| \leq L\|x - y\| \quad \text{for all } x, y \in N.$$

Lemma 3.1. Let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 1 for all $k \geq 0$ with $0 < \sigma < 1/4$. Then the bound (10) holds.

Proof. Firstly, we use the elementary facts (11) and (12), which hold for any real number $a$. We prove (10) by induction. The result follows immediately for $k = 0$. Now, suppose that (10) holds for some $k \geq 0$. Re-writing (6) at iteration $k + 1$ and multiplying through by $g_{k+1}$, we obtain (13). Applying the strong Wolfe line search condition (9) and the triangle inequality to (13), then using (7) and the Cauchy-Schwarz inequality, and finally dividing by $\|g_{k+1}\|^2$ and applying the induction hypothesis (10), we arrive at a two-sided bound. By (11) and (12), this bound establishes (10) at iteration $k + 1$. This completes the proof.
Notice that squaring both sides of (10) yields (15), which will be used in the sequel.

Lemma 3.2. Let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 1 with $0 < \sigma < 1/4$. Then the sufficient descent property
$$g_k^{\top} d_k \leq -c\|g_k\|^2, \tag{19}$$
for all $k \geq 0$ and some constant $c > 0$, holds.
Next, we present the well-known Zoutendijk condition, which was initially discussed in [44].
Lemma 3.3. Suppose Assumptions 3.1 and 3.2 hold and $d_k$ is computed by Algorithm 1. Then
$$\sum_{k=0}^{\infty} \frac{(g_k^{\top} d_k)^2}{\|d_k\|^2} < \infty. \tag{20}$$
In the following theorem, we establish the global convergence of Algorithm 1.
Theorem 3.1. Let Assumptions 3.1 and 3.2 hold, and let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 1. Then
$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{21}$$

Proof. We prove by contradiction. Suppose (21) does not hold; that is, there exists a constant $\omega > 0$ such that $\|g_k\| \geq \omega$ for all $k \geq 0$, which gives (22). When $d_k = -g_k + \beta_k^{DP} d_{k-1}$ is re-written as $d_k + g_k = \beta_k^{DP} d_{k-1}$ and both sides are squared, expanding $\|d_k + g_k\|^2 = \|d_k\|^2 + 2g_k^{\top} d_k + \|g_k\|^2$ yields
$$\|d_k\|^2 = (\beta_k^{DP})^2 \|d_{k-1}\|^2 - 2g_k^{\top} d_k - \|g_k\|^2. \tag{23}$$
From (19) and (23), on using (7), we obtain an upper bound on $\|d_k\|^2$. Dividing both sides of the resulting inequality by $\|g_k\|^4$, and using (15) and (22), it follows that $\sum_{k \geq 0} 1/\|d_k\|^2 = \infty$. Since, by (19), $(g_k^{\top} d_k)^2 \geq c^2 \|g_k\|^4 \geq c^2 \omega^4$ for all $k \geq 0$, this implies
$$\sum_{k \geq 0} \frac{(g_k^{\top} d_k)^2}{\|d_k\|^2} = \infty.$$
This contradicts (20). Hence (21) is true. The proof is complete.

Numerical experiments
In this section, we present results obtained from running the DP method (Algorithm 1) on a set of 105 unconstrained optimization problems with dimensions varying from 60 to 10 000. We also report results of three other CG methods from the literature for comparison with our method. The first two are IMPRP by Jian et al. [22] and JJSL in [23], both of which use the strong Wolfe line search with $\sigma = 0.1$ and $\delta = 0.01$. The third is the hFRBA method of Delladji et al. [12], which also uses the strong Wolfe line search, with $\sigma = 0.1$ and $\delta = 0.0001$. For our DP method, we choose $\sigma = 0.1$, $\delta = 0.01$ and $t = 0.2$.
The problems are taken from [4], except for Problems 13, 14 and 15, which are taken from [30]. In Table 1, we present the names of the functions (Function Name), starting points ($x_0$) and dimensions (Dim) of these problems. The algorithms are stopped either when the number of iterations exceeds 10 000 or when the inequality $\|g_k\| \leq \epsilon$ is satisfied, where $\epsilon = 10^{-6}$. All codes are written in MATLAB 2015b and run on a DELL desktop with an Intel(R) Core(TM) i5-2400 CPU @ 3.10 GHz processor, 4 GB of RAM and the Windows 10 operating system. Table 2 shows the results of the experiments in terms of the number of iterations (NI), function evaluations (FE), gradient evaluations (GE) and the time in seconds (TIME(s)) taken to solve each problem. An entry of "F" is made if the method fails to solve the problem within the maximum number of iterations. From Table 2, we can see that the DP method successfully solves 94% of the problems used, followed by the IMPRP method at 90%, JJSL at 89% and, lastly, the hFRBA method at 85%.
Another way to present the results is through the performance profiles suggested by Dolan and Moré [14]. Let $P$ be the set of problems used for testing, $n_p$ the number of problems in $P$, $S$ the set of solvers (methods) in comparison and $n_s$ the number of solvers in $S$. Here, $t_{p,s}$ is the number of iterations, number of function/gradient evaluations or CPU time in seconds required by solver $s \in S$ to solve problem $p \in P$. The performance comparison between the solvers is based on the ratio
$$r_{p,s} = \frac{t_{p,s}}{\min\{t_{p,s} : s \in S\}}.$$
We set $r_{p,s} = 2\max\{r_{p,s} : s \in S\}$ for an entry of "F" in Table 2. The performance profiles for the number of iterations, number of function evaluations, number of gradient evaluations and CPU time in seconds are shown in Figures 1, 2, 3 and 4, respectively. In all these figures, we can see that the DP method is highly competitive and efficient, because its graph is always above the other graphs or among the top graphs. In particular, for values of $\tau$ between 2.3 and 5.3, where the DP method is more efficient, its percentage success is as follows. In Figure 1, the DP method is the highest at 96%, followed by IMPRP with 91%, then JJSL with 87%, and finally hFRBA with 86%. For Figure 2, we have the DP method with 97%, followed by IMPRP with 93%, then JJSL with 90%, and lastly hFRBA with 88%. In Figure 3, we have the DP method with 96%, then IMPRP with 92%, followed by JJSL with 91%, and lastly hFRBA with 87%.
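The profiles themselves can be generated in a few lines; the MATLAB sketch below assumes a matrix T whose entry T(p,s) is the cost (iterations, evaluations or time) of solver s on problem p, with failed runs already penalized as described above (all names illustrative).

```matlab
% Sketch of Dolan-More performance profiles. T is np-by-ns; T(p,s) is the
% cost of solver s on problem p, failures already assigned a large value.
ratios = bsxfun(@rdivide, T, min(T, [], 2)); % r_{p,s} = t_{p,s}/min_s t_{p,s}
tau = unique(ratios(:));                     % evaluation points
rho = zeros(numel(tau), size(T, 2));
for s = 1:size(T, 2)
    for i = 1:numel(tau)
        rho(i, s) = mean(ratios(:, s) <= tau(i)); % fraction solved within tau(i)
    end
end
semilogx(tau, rho);                          % one profile curve per solver
```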

Application in portfolio selection
The theory of portfolio selection was initially proposed by Markowitz [29]. A stock portfolio is a group of assets or stocks owned by an investor. Investors always seek to employ the best strategy in allocating and selecting their portfolio in order to make a profit while incurring some risk. A criterion that can be employed here may be one that maximizes return, minimizes risk, or minimizes risk subject to a specific target return [6,7]. In this paper, we focus on the criterion that only minimizes risk.
Consider a portfolio consisting of $n$ stocks. The return of stock $i$, denoted $r_{i,t}$, $1 \leq i \leq n$, at time $t$ is defined by
$$r_{i,t} = \frac{p_t - p_{t-1}}{p_{t-1}},$$
where $p_t$ and $p_{t-1}$ are the closing prices of the stock at times $t$ and $t-1$, respectively. The mean return of the stock is defined as
$$\bar{r}_i = \frac{1}{T}\sum_{t=1}^{T} r_{i,t},$$
where $T$ is the number of returns on the stock. The expected return of a portfolio of $n$ assets is defined as
$$\mu = \sum_{i=1}^{n} w_i \bar{r}_i, \tag{24}$$
where $\bar{r}_i$ is the expected return of stock $i$ and $w_i$ is the corresponding weight of the stock in the portfolio. The variance of the return of a stock, which measures how far the asset price has moved from the mean, is calculated as
$$\sigma_i^2 = \frac{1}{T}\sum_{t=1}^{T}(r_{i,t} - \bar{r}_i)^2.$$
It represents the risk of a portfolio [35]. Covariance measures the relationship between two stocks in the portfolio; a positive covariance means the stock returns move together, whereas a negative covariance means the stock returns move inversely. It is calculated as
$$\sigma_{i,j} = \frac{1}{T}\sum_{t=1}^{T}(r_{i,t} - \bar{r}_i)(r_{j,t} - \bar{r}_j).$$
We define the portfolio risk as the variance of the portfolio and denote it by $\sigma_p^2$. The risk-averse portfolio optimization problem, for a portfolio with $n$ stocks, can be formulated as
$$\min_{w} \; \sigma_p^2 = w^{\top} \Sigma w \quad \text{subject to} \quad \sum_{i=1}^{n} w_i = 1, \tag{26}$$
where $w = [w_1, \ldots, w_n]^{\top}$ and $w_1, \ldots, w_n$ are the portfolio investment weighted proportions of the stocks in the portfolio. The matrix $\Sigma$ is the variance-covariance matrix of the stock returns, with the variances on the diagonal and the covariances elsewhere.
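Given the closing prices, these quantities are straightforward to compute; the MATLAB sketch below assumes a $T$-by-$n$ matrix prices of weekly closing prices (names illustrative).

```matlab
% Sketch: returns, mean returns and the variance-covariance matrix from a
% T-by-n matrix of closing prices (one column per stock).
r     = (prices(2:end, :) - prices(1:end-1, :)) ./ prices(1:end-1, :);
rbar  = mean(r);          % mean return of each stock
Sigma = cov(r, 1);        % variance-covariance matrix with 1/T weighting, as
                          % in the definitions above; cov(r) uses 1/(T-1)
mu    = @(w) rbar * w;    % expected portfolio return for a weight vector w
```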
Notice that (26) is a constrained optimization problem, which can be transformed into an unconstrained optimization problem of the form (1) by setting
$$w_n = 1 - \sum_{i=1}^{n-1} w_i.$$
Furthermore, after some algebraic computations, equation (26) can be re-written as
$$\min_{(w_1, \ldots, w_{n-1}) \in \mathbb{R}^{n-1}} \sigma_p^2, \tag{27}$$
where the objective is expressed in terms of $w_1, \ldots, w_{n-1}$ and the entries $\sigma_{i,j}$ of $\Sigma$ alone, with $\sigma_{1,1} = \sigma_1^2, \ldots, \sigma_{n,n} = \sigma_n^2$. For the numerical experiments, we choose a portfolio of 20 stocks ($n = 20$). We use weekly closing prices of 20 companies listed on the Johannesburg Stock Exchange. These companies are listed in Table 3, and the data is collected over a period spanning from 12 April 2020 to 3 April 2022 from the database https://www.investing.com/. Table 4 shows the means and variances of the stocks. Table 5 shows the variances and covariances of the stocks, which form the symmetric matrix $\Sigma$ in (26).
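A minimal MATLAB sketch of this reduction, assuming Sigma is the 20-by-20 matrix of Table 5 (the handle names and the CG driver are illustrative):

```matlab
% Sketch: risk-only objective in the reduced variables w_1,...,w_{n-1},
% with the eliminated weight w_n = 1 - sum(w_1..w_{n-1}).
n = size(Sigma, 1);
expand = @(w) [w; 1 - sum(w)];              % reinstate the eliminated weight
f = @(w) expand(w)' * Sigma * expand(w);    % portfolio risk sigma_p^2
J = [eye(n-1); -ones(1, n-1)];              % Jacobian of expand(w)
gradf = @(w) 2 * J' * (Sigma * expand(w));  % gradient in the reduced variables
w0 = 0.1 * ones(n-1, 1);                    % one of the starting points below
% w = cg_sketch(f, gradf, w0, 1e-6, 10000); % minimize, e.g. with the earlier sketch
% risk = f(w);  weights = expand(w);        % sigma_p^2 and the full allocation
```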
Running Algorithm 1 and the hFRBA, IMPRP and JJSL methods to solve (27) with initial points $x_0 = (0.1, \ldots, 0.1)$, $(0.2, \ldots, 0.2)$, $(0.3, \ldots, 0.3)$ and $(0.01, \ldots, 0.01)$, where $x_0 \in \mathbb{R}^{19}$, we obtained the optimal weights. Using (24) and (26), together with the values in Tables 4 and 5, we obtain $\mu = 0.0018$ and $\sigma_p^2 = 0.00144$. This gives the allocation for each stock when investing under a criterion of minimizing risk, as shown in Table 6, with a portfolio risk of 0.00144 and an expected portfolio return of 0.0018. A negative allocation means the investor is short selling the stock, that is, selling stock that one does not own or that has been acquired on loan from a broker. Notice that, because of the risk-return trade-off, the strategy of minimizing risk when formulating the portfolio selection problem (27) lowers the expected return as well.

Conclusion and future work
In this paper, we proposed a new conjugate gradient method whose direction satisfies the sufficient descent property. The method is based on the ideas of the DL and RMIL+ conjugate gradient parameters. Its global convergence was established under the strong Wolfe line search. The method's efficacy was tested on a number of unconstrained optimization problems, and the numerical results showed it to be efficient and robust compared with other competing methods in the literature. Furthermore, the method's applicability was explored in portfolio selection, where a risk-averse portfolio optimization problem with $n$ stocks was solved by transforming it into an unconstrained optimization problem. For future work, the proposed method can be extended to solve portfolio selection problems with more practical constraints, such as restrictions on the minimum and maximum proportions of assets in a portfolio. The method can also be extended to solve other practical problems that arise in motion control, compressive sensing and image deblurring.

Figure 1. Number of iterations performance profiles.

Figure 2. Number of function evaluations performance profiles.

Figure 3. Number of gradient evaluations performance profiles.

Figure 4. CPU time performance profiles.

Table 1. Table of problems, starting points and dimensions.

Table 2. Table of number of iterations, function evaluations, gradient evaluations and time in seconds.

Table 3. Table of companies.

Table 4. Table of means and variances.

Table 5. Table of variances and covariances.

Table 6. Table of stocks and allocations.