Optimal Solution Techniques in Decision Sciences: A Review∗

Methods to find the optimization solution are fundamental and extremely crucial for scientists who program computational software to solve optimization problems efficiently and for practitioners who use such software. Thus, it is essential to know the idea, origin, and usage of these methods. Although the methods have been used for a very long time, the theory was developed so long ago that most, if not all, of the authors who developed it are unknown, and the theory has not been stated clearly and systematically. To bridge this gap in the literature and provide academics and practitioners with an overview of the methods, this paper reviews and discusses the four most commonly used methods to find the optimization solution: the bisection, gradient, Newton-Raphson, and secant methods. We first introduce the origin and idea of each method and develop all the necessary theorems to prove the existence and convergence of the estimate for each method. We then give two examples to illustrate the approaches. Thereafter, we review the literature on applications of these optimization methods to some important issues and discuss the advantages and disadvantages of each method. We note that all the theorems developed in our paper could be well known, but, so far, we have not seen any book or paper that discusses all of them in detail. Thus, we believe the theorems developed in our paper could still contribute to the literature. Our review is useful for academics and practitioners seeking optimization solutions in their studies.


Introduction
In many fields of science, economics, finance, and many other areas, problems of finding optimization solutions play a very important role in research. In statistics, the problem of finding the optimization solution arises in essentially all models, for example, regression and time series models. In economics and finance, it occurs in solving most economic and financial problems, for example, those involving interest rates and stock returns. In social science research, it often arises, for example, in transportation problems. In marketing, it occurs in most marketing and production problems, for example, those involving the cost of goods. Methods to find the optimization solution are fundamental and extremely crucial for scientists who program computational software to solve optimization problems efficiently and for practitioners who use such software. Thus, it is essential to know the idea, origin, and usage of these methods.
There are numerous methods to find optimization solutions. The four most ubiquitous approaches are the bisection, gradient, Newton-Raphson (N-R), and secant methods. These methods are often applied in numerical analysis and have many applications in statistics and numerous other fields. In this regard, they have been studied and utilized in many different disciplines; see, for example, Broyden (1967), Chalco et al. (2015), Chin et al. (2018), Exl et al. (2019), Reddy et al. (2018), Sahu et al. (2018), Wu et al. (2017), and Yang et al. (2019).
Finding the optimization solution remains a very important problem. So far, the N-R method is found to be the most commonly utilized. It is employed by many statisticians to find optimization solutions when estimating functions of many models, for example, regression with missing data. Wang et al. (2002) use the N-R method to find the optimization solution in the logistic regression model with missing covariates. Lukusa et al. (2016) employ it in the zero-inflated Poisson (ZIP) regression model with missing covariates. Other papers using this method include Hsieh et al. (2009) and Lee et al. (2012). Readers may refer to Pho and Nguyen (2018) for the details of the algorithm and applications of the N-R method.
Although methods to find the optimization solution have been used for a very long time and numerous articles present such methods, the theory was developed so long ago that most, if not all, of the authors who developed it are unknown, and the theory has not been stated clearly and systematically. To bridge this gap in the literature and provide academics and practitioners with an overview of the methods, our primary goal in this paper is to review and discuss the four most commonly used methods to find the optimization solution: the bisection, gradient, Newton-Raphson, and secant methods. In this study, we discuss the origin and idea of the methods, how to utilize them, and their applications.
To do so, we first introduce the origin and idea of each method and develop all the necessary theorems to prove the existence and convergence of the estimate for each method. We then give two examples to illustrate the approaches. Thereafter, we review the literature on applications of these methods to some important issues and discuss the advantages and disadvantages of each method. We note that all the theorems developed in our paper could be well known, but, so far, we have not seen any book or paper that discusses all of them in detail; thus, we believe they could still contribute to the literature. Our review is useful for academics and practitioners seeking optimization solutions in their studies.
The rest of the paper is structured as follows. In Section 2, we introduce the origin and idea of each method and develop all the necessary theorems to prove the existence and convergence of its estimate. Section 3 provides two examples to illustrate the approaches. Section 4 reviews the literature on applications of the methods to some important issues. The last section gives some concluding remarks and discusses the advantages and disadvantages of the methods.

The methods to find the optimization solution
In this section, we discuss the origin and idea of the four approaches to find optimization solutions that are most used in the literature: the bisection method, gradient method, Newton-Raphson method, and secant method. We first discuss the bisection method.

Bisection method
The bisection method is a naive method based on the following theorem.

Theorem 1
Assume that g(u) is a continuous function on [a, b] and g(a)g(b) < 0. Then there exists at least one solution of the equation g(u) = 0 in the interval (a, b).
Theorem 1 is also known as Bolzano's theorem. It was first proved by Bernard Bolzano in 1817; Augustin-Louis Cauchy provided an alternative proof in 1821. The procedure to implement the bisection method is as follows. One first finds an interval (a, b) satisfying g(a)g(b) < 0. Thereafter, one divides (a, b) into two equal intervals. Let (a_1, b_1) be the one of the two intervals that satisfies g(a_1)g(b_1) < 0. Afterwards, (a_1, b_1) is divided into two equal intervals; let (a_2, b_2) be the one that satisfies g(a_2)g(b_2) < 0. One then repeats the above process. If at some step g(a_i) = 0 or g(b_i) = 0, then one can conclude that a_i or b_i is a solution of g(u) = 0.
The working process of the bisection method is described in Figure 1, which shows a few steps of the method starting from the initial interval [a_1, b_1]. The bigger red dot is an optimization solution of g(u) = 0.
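For concreteness, the bisection procedure above can be sketched as a short program; the function names, the tolerance, and the test equation g(u) = u² − 2 are our own illustrative choices rather than part of the original method:

```python
def bisection(g, a, b, tol=1e-10, max_iter=200):
    """Find a root of g in [a, b], assuming g(a) * g(b) < 0 (Theorem 1)."""
    if g(a) * g(b) > 0:
        raise ValueError("g(a) and g(b) must have opposite signs")
    for _ in range(max_iter):
        m = (a + b) / 2.0
        if g(m) == 0 or (b - a) / 2.0 < tol:
            return m
        # Keep the half-interval whose endpoints still bracket the root.
        if g(a) * g(m) < 0:
            b = m
        else:
            a = m
    return (a + b) / 2.0

# Solve u^2 - 2 = 0 on [0, 2]; the root is sqrt(2).
root = bisection(lambda u: u ** 2 - 2, 0.0, 2.0)
```

Each pass halves the bracketing interval, which is the mechanism behind Theorem 2: g(a_i) and g(b_i) are squeezed toward 0.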

Theorem 2
In the procedure described above, we have g(a_i) → 0 and g(b_i) → 0 as i → ∞.
We note that Theorem 2 holds not only for the bisection method but also for all the methods discussed in this paper. Since the contents of the convergence theorems for the estimates in all other methods are the same, we skip displaying them for the other methods for simplicity.

Gradient method
Another commonly used approach is the gradient method, one of the most ubiquitous methods to find the optimization solution (Terlaky, 2013). Our main objective is to obtain a sequence {u_k} approaching u* such that the difference between u_k and u_{k+1} is within our acceptance level, where u_k is the point obtained after the k-th loop. The best situation is that u_k coincides with u*, but usually this is not the case. We first construct the following theorem, which shows the working process of the gradient method:

Theorem 3
Let g : R → R be a differentiable function, u_k be the point obtained after the k-th loop, and u* be an optimization solution of g(u). We have the following: 1. if g'(u_k) > 0, then u_k is on the right of u*; and 2. if g'(u_k) < 0, then u_k is on the left of u*.
The gradient method could be used for the one-dimension case and the multi-dimension case.

One-dimension
We first discuss the one-dimension case. The idea of the gradient method in the one-dimension case is based on the derivative. From Theorem 3, if g'(u_k) > 0, then u_k is on the right of u*; similarly, if g'(u_k) < 0, then u_k is on the left of u*. In either situation, the next value u_{k+1} should move closer to u*, and thus one needs to move opposite to the sign of g'(u_k). In general, we have
u_{k+1} = u_k + α,
where α is a quantity whose sign is opposite to that of g'(u_k). It can be rewritten as follows:
u_{k+1} = u_k − γ g'(u_k),     (1)
where γ > 0 is the learning rate. The minus sign in the above expression shows that one needs to move against the sign of the derivative. The formula in (1) is the gradient method in the one-dimension case. The process of the gradient method is described in Figure 2, in which the objective function is g(u) = u² and the corresponding optimization solution is u = 0. If u_k > 0, then g'(u_k) > 0, as shown by the red dots on the right-hand side of the figure; if u_k < 0, then g'(u_k) < 0, as shown by the blue dots on the left-hand side. The point u = 0 is illustrated by a dot with both red and blue colors; this is also the optimization solution of g(u) = u².
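A minimal sketch of the iteration in Equation (1); the learning rate, stopping rule, and test function g(u) = u² (so g'(u) = 2u) are our own illustrative choices:

```python
def gradient_descent_1d(dg, u0, gamma=0.1, tol=1e-10, max_iter=10000):
    """Iterate u_{k+1} = u_k - gamma * g'(u_k) until the step is tiny."""
    u = u0
    for _ in range(max_iter):
        step = gamma * dg(u)
        u -= step
        if abs(step) < tol:
            break
    return u

# Minimize g(u) = u^2, whose derivative is 2u; the minimizer is u* = 0.
u_star = gradient_descent_1d(lambda u: 2 * u, u0=5.0)
```

With γ = 0.1 the iterates satisfy u_{k+1} = 0.8 u_k, so they shrink geometrically toward u* = 0.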

Multi-dimension
We turn to the multi-dimension case. Let g : R^n → R be a differentiable function. The direction of steepest descent is the vector −∇g(u_0), where u_0 is an initial value. To explain this phenomenon, we consider the function
H(y) = g(u_0 + y a),
where a is a unit vector, that is, ‖a‖ = 1. Using the chain rule, we have
H'(y) = (∂g/∂u_1)(∂u_1/∂y) + ··· + (∂g/∂u_n)(∂u_n/∂y) = (∂g/∂u_1)a_1 + ··· + (∂g/∂u_n)a_n = ∇g(u_0 + y a) · a.

Thus, we obtain
H'(0) = ∇g(u_0) · a = ‖∇g(u_0)‖ cos α,     (2)
where α is the angle between ∇g(u_0) and a. We remark that H'(0) in (2) is minimized when α = π. Therefore, we get
a = −∇g(u_0)/‖∇g(u_0)‖.
Hence, the problem of minimizing a function of numerous variables can be reduced to a single-variable minimization problem by solving for the minimum of H(y) under this choice of a.
After obtaining the minimizer y_0, we let
u_1 = u_0 − y_0 ∇g(u_0)
and continue the process by moving from u_1 in the direction of −∇g(u_1) to get u_2, minimizing H_1(y) = g(u_1 − y∇g(u_1)), and so on. We now discuss the procedure of the method of steepest descent. Given a starting value u_0, the sequence of iterates {u_k} is calculated by the formula
u_{k+1} = u_k − y_k ∇g(u_k),     (3)
where y_k > 0 minimizes the function
H_k(y) = g(u_k − y ∇g(u_k)).
The expression in (3) is the gradient method in the multi-dimension case.
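The steepest-descent scheme in (3) can be sketched as follows; since exactly minimizing H_k(y) is problem-specific, this sketch substitutes a crude halving line search, and the test function and tolerances are our own assumptions:

```python
def steepest_descent(g, grad, u0, max_iter=1000, tol=1e-8):
    """Iterate u_{k+1} = u_k - y_k * grad g(u_k) as in (3); y_k comes from
    a crude halving line search instead of exactly minimizing H_k(y)."""
    u = list(u0)
    for _ in range(max_iter):
        d = grad(u)
        if sum(c * c for c in d) ** 0.5 < tol:
            break  # gradient (almost) vanishes: stationary point reached
        y = 1.0
        # Shrink y until the step strictly decreases g, a stand-in for
        # solving min_y H_k(y) = g(u - y * grad g(u)) exactly.
        while y > 1e-12 and g([ui - y * di for ui, di in zip(u, d)]) >= g(u):
            y /= 2.0
        u = [ui - y * di for ui, di in zip(u, d)]
    return u

# Minimize g(u1, u2) = u1^2 + 2*u2^2, whose minimizer is the origin.
g = lambda u: u[0] ** 2 + 2 * u[1] ** 2
grad = lambda u: [2 * u[0], 4 * u[1]]
u_min = steepest_descent(g, grad, [3.0, -2.0])
```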

Newton-Raphson method
We turn to discuss how to use the Newton-Raphson (N-R) method to find the optimization solution of the equation. The N-R method could also be used for the one-dimension case and the multi-dimension case.

One-dimension
We first discuss the one-dimension case, in which we seek the optimization solution of the equation g(u) = 0. The N-R method originates from the approximation of the derivative of a function at a point:
g'(u_1) ≈ (g(u_2) − g(u_1))/(u_2 − u_1).
One has to find u* that satisfies g(u*) = 0; this means that g(u_2) = 0. Hence,
u_2 = u_1 − g(u_1)/g'(u_1).
Generally, the formula of the N-R method can be expressed as follows:
u_{n+1} = u_n − g(u_n)/g'(u_n),     (6)
in which n = 0, 1, 2, ..., with u_0 a starting value. We repeat the above formula until the difference between two contiguous solutions is smaller than α, a very small value we can accept.
We note that, when using the N-R method to find the optimization solution of the equation g'(u) = 0, the iteration can be obtained from Equation (6) as follows:
u_{n+1} = u_n − g'(u_n)/g''(u_n).
The working process of the Newton-Raphson method is provided in Figure 3, which describes finding the optimization solution of y = g(u) on the initial interval [a, b]. Choosing an initial value u_0 = b, the tangent of y = g(u) at u_0 = b cuts the u-axis at the point u_1, the tangent at u_1 cuts the u-axis at the point u_2, and so on; repeating this process, one obtains the optimization solution u* of y = g(u). It can be seen that the Newton-Raphson method utilizes the intersection of the tangent and the horizontal axis, and thus it is also called the tangent method.
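A minimal sketch of the iteration in Equation (6); the function names, tolerance, and test equation u² − 2 = 0 are our own illustrative choices:

```python
def newton_raphson(g, dg, u0, tol=1e-12, max_iter=100):
    """Iterate u_{n+1} = u_n - g(u_n) / g'(u_n) until two contiguous
    solutions are closer than tol."""
    u = u0
    for _ in range(max_iter):
        u_new = u - g(u) / dg(u)
        if abs(u_new - u) < tol:
            return u_new
        u = u_new
    return u

# Solve g(u) = u^2 - 2 = 0 starting from u0 = 1; the root is sqrt(2).
root = newton_raphson(lambda u: u ** 2 - 2, lambda u: 2 * u, 1.0)
```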

Multi-dimension
We turn to the multi-dimension case. To build the N-R method for the equation g(u) = 0 in the multi-dimension case, we approximate g(u) near the current iterate u_n by a function h_n(u) such that the system of equations h_n(u) = 0 is uncomplicated to solve. Thereafter, we utilize the solution as the next iterate u_{n+1} and repeat the procedure. An appropriate selection for h_n(u) is the linear approximation of g(u) at u_n, whose graph is the tangent of g(u) at u_n:
h_n(u) = g(u_n) + g'(u_n)(u − u_n).
Then, we solve the following system of equations:
h_n(u) = 0.
Therefore, we obtain
u_{n+1} = u_n − [g'(u_n)]^{−1} g(u_n).     (7)
Likewise, in case the objective function is g(u) itself and one solves ∇g(u) = 0, the formula in (7) can be adapted to give the N-R algorithm in the multi-dimension case by substituting g(u) by the gradient ∇g(u) and replacing [g'(u_n)]^{−1} by the inverse of the Hessian matrix Hg(u). The formula of the N-R algorithm in the multi-dimension case can then be described as follows:
u_{n+1} = u_n − [Hg(u_n)]^{−1} ∇g(u_n),     (10)
where n = 0, 1, 2, .... Similar to the procedure used in the one-dimensional case, to achieve the optimization solution by employing the Newton-Raphson algorithm in the multi-dimension situation, one needs to iterate formula (10) until an adequately accurate value is reached.
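The multi-dimension update in formula (10) can be sketched for a two-variable problem, inverting the 2×2 Hessian in closed form; the quadratic test function below is our own choice (for a quadratic the method converges in a single step):

```python
def newton_2d(grad, hess, u0, tol=1e-10, max_iter=50):
    """Iterate u_{n+1} = u_n - [Hg(u_n)]^{-1} grad g(u_n), inverting the
    2x2 Hessian in closed form."""
    u1, u2 = u0
    for _ in range(max_iter):
        g1, g2 = grad((u1, u2))
        (a, b), (c, d) = hess((u1, u2))
        det = a * d - b * c
        # Solve Hg * s = grad g via the explicit 2x2 inverse.
        s1 = (d * g1 - b * g2) / det
        s2 = (a * g2 - c * g1) / det
        u1, u2 = u1 - s1, u2 - s2
        if abs(s1) + abs(s2) < tol:
            break
    return u1, u2

# Minimize g(u1, u2) = u1^2 + u1*u2 + u2^2; the minimizer is (0, 0).
grad = lambda u: (2 * u[0] + u[1], u[0] + 2 * u[1])
hess = lambda u: ((2.0, 1.0), (1.0, 2.0))
u_min = newton_2d(grad, hess, (3.0, -1.0))
```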

Secant method
The last method we discuss is the secant method for finding the optimization solution of the equation. The secant method could also be used for the one-dimension case and the multi-dimension case.

One-dimension
We first discuss the one-dimension case. It is well known that
g'(u_n) ≈ (g(u_n) − g(u_{n−1}))/(u_n − u_{n−1}).     (11)
By substituting Equation (11) into Equation (6), we obtain the following equation:
u_{n+1} = u_n − g(u_n)(u_n − u_{n−1})/(g(u_n) − g(u_{n−1})).     (12)
The formula in Equation (12) is referred to as the secant method, in which the derivative is substituted by its approximation. Thus, it is also called a quasi-Newton method.
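A minimal sketch of the secant iteration in Equation (12); the starting points, tolerance, and test equation u² − 2 = 0 are our own illustrative choices:

```python
def secant(g, u0, u1, tol=1e-12, max_iter=100):
    """Iterate Equation (12): the derivative in Newton's formula is
    replaced by the finite-difference slope through the last two points."""
    for _ in range(max_iter):
        denom = g(u1) - g(u0)
        if denom == 0:
            break  # slope unavailable; u1 is (numerically) converged
        u2 = u1 - g(u1) * (u1 - u0) / denom
        if abs(u2 - u1) < tol:
            return u2
        u0, u1 = u1, u2
    return u1

# Solve g(u) = u^2 - 2 = 0 from the two starting points 0 and 2.
root = secant(lambda u: u ** 2 - 2, 0.0, 2.0)
```

Unlike the Newton-Raphson method, no derivative g' needs to be supplied, which is the practical appeal of this quasi-Newton variant.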

Multi-dimension
We turn to the multi-dimension case. Similarly to the one-dimension case, to find the minimum of g(u), one can use its second-order approximation. The Taylor expansion is
g(u_k + Δu) ≈ g(u_k) + ∇g(u_k)ᵀ Δu + (1/2) Δuᵀ B Δu,
where ∇g is the gradient and B is an approximation of the Hessian matrix H. The approximation of the gradient is, thus, given by
∇g(u_k + Δu) ≈ ∇g(u_k) + B Δu.
Setting the above expression equal to zero, we get
Δu = −B^{−1} ∇g(u_k).
Thus, B is selected to satisfy
B(u_{k+1} − u_k) = ∇g(u_{k+1}) − ∇g(u_k).
This result is well known in the literature and is also called the secant equation.
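As a small numerical check (with our own illustrative quadratic), the following sketch verifies that for g(u) = (1/2)uᵀAu, whose gradient is Au and whose Hessian is A, taking B = A satisfies the secant equation exactly:

```python
# For the quadratic g(u) = 0.5 * u^T A u, the gradient is A u and the
# Hessian is A; the matrix and the two points below are our own choices.
A = [[8.0, -4.0], [-4.0, 4.0]]

def grad(u):
    return [A[0][0] * u[0] + A[0][1] * u[1],
            A[1][0] * u[0] + A[1][1] * u[1]]

u0, u1 = [1.0, 1.0], [0.5, 1.0]
s = [u1[i] - u0[i] for i in range(2)]                 # step u_{k+1} - u_k
y = [grad(u1)[i] - grad(u0)[i] for i in range(2)]     # gradient change
Bs = [A[i][0] * s[0] + A[i][1] * s[1] for i in range(2)]
# B = A satisfies B s = y, i.e., the secant equation, exactly.
```

Quasi-Newton methods exploit this property in reverse: instead of computing the Hessian, they update B so that each new iterate pair satisfies the secant equation.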
(ii) Gradient method
Since it is easy to get g'(u) = 3u² + 2u + 1, to use the gradient method we choose the learning rate γ = 0.009. Following the expression in Equation (1), the sequence {u_n} is defined by u_{n+1} = u_n − 0.009(3u_n² + 2u_n + 1). From these iterations, one can see that if more iterations are performed, then the value will converge to the exact solution of the original equation.

(iii) Newton-Raphson method
To use the Newton-Raphson method, one can observe that g'(v) = 3v² + 2v + 1. The sequence {v_n} is then defined by the Newton-Raphson formula. Last, we illustrate how to use the secant method. Based on the formula in (12), the sequence {u_n} is defined accordingly. Assuming that u_0 = 1 and u_1 = 0, it can be seen that if more iterations are performed, then the value will converge to the exact solution of the original equation.
Example 1 provides an illustration of the one-dimensional case. We now provide an example for the two-dimensional case. To save space, we only discuss getting the optimization solution by utilizing the gradient and Newton-Raphson methods, as shown in the following example.
Example 2: To find the optimization solution of g(u_1, u_2) = 4u_1² − 4u_1u_2 + 2u_2² by utilizing the gradient and Newton-Raphson methods with a starting value u^(0) = (1, 1), we perform six iterations of each method. We first discuss the gradient method. It can be seen that
∇g(u_1, u_2) = (8u_1 − 4u_2, −4u_1 + 4u_2).
Thereafter, one minimizes the function
H(y) = g(u^(0) − y∇g(u^(0))) = g(1 − 4y, 1) = 64y² − 16y + 2.
This strictly convex function has a strict global minimum when H'(y) = 128y − 16 = 0, or y = 1/8, as can be seen by noting that H''(y) = 128 > 0. We now conduct the following iterations:

Iteration 1
In Iteration 1, we first obtain ∇g(u^(0)) = (4, 0). Thereafter, we get y_0 = 1/8. We then let u^(1) = u^(0) − y_0∇g(u^(0)) = (1/2, 1).

Iteration 2
We turn to Iteration 2. To do so, we first get the corresponding gradient and line-search function H(y). Thus, H'(y) = 0 when y = 1/8 and H''(y) = 32, so this critical point is a strict global minimizer in Iteration 2.

Iteration 3
We turn to Iteration 3. To do so, we first get the corresponding gradient and line-search function H(y). Thus, H'(y) = 0 when y = 1/4 and H''(y) = 4, so this critical point is a strict global minimizer in Iteration 3.

Iteration 4
We now conduct Iteration 4. To do so, we first get the corresponding gradient and proceed as in the previous iterations.

Iteration 5
Next, we illustrate Iteration 5. To do so, we first obtain the corresponding gradient and line-search function H(y). Thus, we get H'(y) = 0 when y = 1/4 and H''(y) = 1, so this critical point is a strict global minimizer in Iteration 5.

Iteration 6
Last, we illustrate Iteration 6. To do so, we first get the corresponding gradient and proceed as before.
It can be seen that the method of steepest descent produces a sequence of iterates u_k converging to the strict global minimizer of g(u_1, u_2) at u* = (0, 0). This ends our illustration of the gradient method.
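The steepest-descent iterations for Example 2 can also be reproduced programmatically. For a quadratic g(u) = (1/2)uᵀAu with A = [[8, −4], [−4, 4]], the exact line-search minimizer of H_k(y) has the closed form y_k = (d·d)/(d·Ad) with d = ∇g(u_k); this closed form is our own shortcut rather than part of the text:

```python
# Steepest descent on g(u1, u2) = 4u1^2 - 4u1u2 + 2u2^2 = 0.5 u^T A u.
A = [[8.0, -4.0], [-4.0, 4.0]]

def grad(u):
    return [8 * u[0] - 4 * u[1], -4 * u[0] + 4 * u[1]]

u = [1.0, 1.0]
for _ in range(6):
    d = grad(u)
    Ad = [A[0][0] * d[0] + A[0][1] * d[1],
          A[1][0] * d[0] + A[1][1] * d[1]]
    # Exact line search for a quadratic: y_k = (d . d) / (d . A d).
    y = (d[0] * d[0] + d[1] * d[1]) / (d[0] * Ad[0] + d[1] * Ad[1])
    u = [u[0] - y * d[0], u[1] - y * d[1]]
```

The first step gives y_0 = 1/8, as in Iteration 1, and every two iterations halve the iterate, so the sequence shrinks toward u* = (0, 0).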

Iteration 2
In Iteration 2, we first get the gradient and Hessian at the current iterate; thus, we obtain the next iterate. It can be seen from this example that the Newton-Raphson method converges faster than the gradient method. From Examples 1 and 2, it can be seen that the Newton-Raphson method converges the fastest among the methods illustrated.
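Because g in Example 2 is quadratic with constant Hessian Hg = [[8, −4], [−4, 4]], a single Newton-Raphson step from u^(0) = (1, 1) lands exactly on the minimizer, which is one way to see why the method converges faster here; the following short check (our own) verifies this:

```python
# One Newton-Raphson step for g(u1, u2) = 4u1^2 - 4u1u2 + 2u2^2 at (1, 1).
g1, g2 = 8 * 1 - 4 * 1, -4 * 1 + 4 * 1    # grad g(1, 1) = (4, 0)
a, b, c, d = 8.0, -4.0, -4.0, 4.0         # entries of the constant Hessian
det = a * d - b * c                       # det Hg = 16
s1 = (d * g1 - b * g2) / det              # first entry of Hg^{-1} grad g
s2 = (a * g2 - c * g1) / det              # second entry
u_new = (1 - s1, 1 - s2)                  # the exact minimizer (0, 0)
```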

Applications of Optimization Solutions in Decision Sciences
In order to provide academics and practitioners with an overview, in this article we review the four most commonly used approaches to find the optimization solution. After applying the methods in the analysis, one can utilize many other statistical models, for example, regression models, to find optimization solutions for problems including interest rates, transportation, issues involving the cost of goods, etc. There are many areas in decision sciences that can use the approaches discussed in our paper to get optimization solutions. We discuss a few in this section.

Portfolio Optimization
The first area in decision sciences that can use the approaches discussed in our paper to get optimization solutions is portfolio optimization. Portfolio optimization has been the milestone of modern finance theory for asset allocation, investment diversification, and optimal portfolio construction since Markowitz (1952) introduced the theory. In the procedure, investors select portfolios that maximize profit subject to achieving a specified level of calculated risk or, equivalently, minimize variance subject to obtaining a predetermined level of expected gain. Bai, Liu, and Wong (2009a) prove that the estimate proposed by Markowitz (1952) seriously departs from its theoretical optimal return, a phenomenon they call "over-prediction." To circumvent this over-prediction problem, they use a new method that incorporates the idea of the bootstrap into the theory of large-dimensional random matrices. They develop new bootstrap-corrected estimates for the optimal return and its asset allocation, and prove that these estimates can analytically correct the over-prediction and drastically reduce the error. They also show that the bootstrap-corrected estimate of return and its corresponding allocation estimate are proportionally consistent with their counterpart parameters.
There are some advantages of the approach introduced by Bai, Liu, and Wong (2009a) to obtain the optimal return and its asset allocation. However, the weakness of their approach is that there is no closed form for their estimator. To circumvent the limitation, Leung,

Stochastic Dominance
Another important area in decision sciences that can use the approaches discussed in our paper to get optimization solutions is stochastic dominance (SD) for different types of investors.
Readers may refer to Wong and Li (1999), Li and Wong (1999), and Wong (2007, 2016) for the SD theory for risk averters and risk seekers; refer to Levy and Levy (2002, 2004) and Wong and Chan (2008) for the prospect SD (PSD) and Markowitz SD (MSD), which link to investors with the corresponding S-shaped and reverse S-shaped utility functions; and refer to Leshno and Levy (2002), Guo, Zhu, Wong, and Zhu (2013), Guo, Post, Wong, and Zhu (2014), and Guo, Wong, and Zhu (2016) for the theory of almost SD.
The approaches discussed in our paper are useful to the SD theory because several SD tests can use them to get optimization solutions. For example, Bai, Li, McAleer, and Wong (2015) extend the SD test statistics developed by Davidson and Duclos (2000) to obtain SD tests for risk averters and risk seekers, and Bai, Li, Liu, and Wong (2011) show that the test based on Davidson and Duclos (2000) has better size and power performance than two alternative tests. The approaches discussed in our paper are useful to their SD test statistics.

Risk Measures
The third important area in decision sciences that can use the approaches discussed in our paper to get optimization solutions is risk measures. We include the mean-variance (MV) rule as one of the risk measures. Readers may refer to Markowitz (1952) and Wong (2007) for the MV rule for risk averters and risk seekers, respectively; refer to Leung and Wong (2008), Wong, Wright, Yam, and Yung (2012), and the references therein for the Sharpe ratio; refer to Ma and Wong (2010) and the references therein for VaR and conditional VaR (CVaR); refer to Guo, Jiang, and Wong (2017), Guo, Chan, Wong, and Zhu (2018), and the references therein for the Omega ratio; refer to Niu, Wong, and Xu (2017) and the references therein for the n-order Kappa ratio; refer to Guo, Niu, and Wong (2019) and the references therein for the Farinelli and Tibiletti ratio; refer to Niu, Guo, McAleer, and Wong (2018), Lu, Yang, and Wong (2018), Lu, Hoang, and Wong (2019), and the references therein for the economic performance measure of risk and the economic index of riskiness; refer to Bai, Wang, and Wong (2011) and Bai, Hui, Wong, and Zitikis (2012) for the mean-variance ratio test; and refer to Tang, Sriboonchitta, Ramos, and Wong (2014) and Ly, Pho, Ly, and Wong (2019a,b) for copulas.
Furthermore, there are other risk measures; see, for example, Guo, Li, McAleer, and Wong (2018). In addition, there are many applications of these risk measures; see, for example, our discussion in Sections 4.1 and 4.2.

Behavioral Models
We first review the utility functions that are the basics of the behavioral models. Utility starts with Bernoulli (1738) who first notes that people are risk averse. However, academics find that people are not always risk averse or even risk neutral; most people have risk-seeking behavior like buying lottery tickets. Hammond (1974), Stoyan (1983), Wong and Li (1999), Li and Wong (1999), Wong (2007), Levy (2015), Guo and Wong (2016), and others consider investors could be risk-averse or risk-seeking. Markowitz (1952), Levy andLevy (2002, 2004), Wong and Chan (2008) suggest investors could follow S-shaped as well as reverse S-shaped utility functions. Broll, Egozcue, Wong, and Zitikis (2010) and Egozcue, Fuentes Garca, Wong, and Zitikis (2011) further study investment behaviors for investors could follow Sshaped as well as reverse S-shaped utility functions. Guo, Lien, and Wong (2016) develop the exponential utility function with a 2n-order approximation for any integer n. Thompson and Wong (1991), Thompson and Wong (1996), Wong and Chan (2004) and others extend the dividend yield plus growth model (Gordon and Shapiro, 1956) by estimating the cost of capital using discounted cash flow (DCF) methods requires forecasting dividends and proving the existence and uniqueness of the reliability. Wong (2010, 2012), Fung, Lam, Siu, and Wong (1998), and Guo, McAleer, Wong, and Zhu (2017) apply the cost of capital model and use Bayesian models to explain investors' behavioral biases by using the conservatism heuristics and the representativeness heuristics.
Readers may ask why behavioral models can use the approaches discussed in our paper to get optimization solutions. Our answer is that after one develops any behavioral model, one may then develop the corresponding econometric models so that the behavioral model can be estimated. For example, Fabozzi, Fung, Lam, and Wong (2013) extend the models developed by Wong (2010, 2012), Guo, McAleer, Wong, and Zhu (2017), and others by developing three tests for the magnitude effect of short-term underreaction and long-term overreaction; these tests can use the approaches discussed in our paper to get optimization solutions. On the other hand, Wong, Chow, Hon, and Woo (2018) conduct a questionnaire survey to examine the theory developed by Wong (2008, 2010), Guo, McAleer, Wong, and Zhu (2017), and others, which can also use the approaches discussed in our paper to get optimization solutions.
There are many other behavioral models as well. For example, Egozcue and Wong (2010a) and Egozcue, Fuentes García, Wong, and Zitikis (2012a) develop an analytical theory to explain the behavior of investors with extended value functions in segregating or integrating multiple outcomes when evaluating mental accounting. Guo, Wong, Xu, and Zhu (2015) and Egozcue, Guo, and Wong (2015) develop models to investigate regret-averse firms' production and hedging behaviors, while Guo, Egozcue, and Wong (2019) develop several properties of using disappointment aversion to model production decisions.

Economic and Financial Indicators
Most economic and financial indicators could be related to decision sciences and can use the approaches discussed in our paper to get optimization solutions. We only discuss those related to our work.
We have developed some financial indicators and applied some economic indicators to study important economic issues that could be related to decision sciences and can use the approaches discussed in our paper to get optimization solutions. For example, Wong, Chew, and Sikorski (2001) find that trading strategies based on their indicator not only perform better than buy-and-hold strategies, but also produce greater wealth compared with TA strategies without trading rules.
In addition, Chong, Cao, and Wong (2017) develop a new market sentiment index for the Hong Kong stock market by using the turnover ratio, short-selling volume, money flow, HIBOR, the returns of the U.S. and Japanese markets, and the Shanghai and Shenzhen Composite indices. Thereafter, they incorporate the threshold regression model with the sentiment index as a threshold variable to capture the state of the Hong Kong stock market. Sethi, Wong, and Acharya (2018) examine the sectoral impact of disinflationary monetary policy by calculating the sacrifice ratios for several OECD and non-OECD countries. Sacrifice ratios calculated through the episode method reveal that disinflationary monetary policy has a differential impact across three sectors in both OECD and non-OECD countries.

Statistical and Econometric Models
Most statistical and econometric models could be related to decision sciences and can use the approaches discussed in our paper to get optimization solutions. We only discuss those related to our work.
We first discuss unit root, cointegration, causality, and nonlinearity tests. Tiku and Wong (1998) develop a unit root test for data that follow an AR(1) model. Penm, Terrell, and Wong (2003) present simulations and an application demonstrating the usefulness of the zero-non-zero patterned vector error-correction models (VECMs). Lam, Wong, and Wong (2006) develop some properties of the autocorrelation of the k-period returns for the general mean reversion (GMR) process, in which the stationary component is not restricted to the AR(1) process but takes the form of a general ARMA process. Bai, Wong, and Zhang (2010) develop a nonlinear causality test in multivariate settings. Bai, Li, Wong, and Zhang (2011) first discuss linear causality tests in multivariate settings and thereafter develop a nonlinear causality test in multivariate settings. Bai, Hui, Jiang, Lv, Wong, and Zheng (2018) revisit the issue by estimating the probabilities and reestablish the CLT of the new test statistic. Hui, Wong, Bai, and Zhu (2017) propose a quick and efficient method to examine whether a time series possesses any nonlinear feature by testing the dependence remaining in the residuals after fitting the dependent variable with a linear model. All the above models are related to decision sciences and can use the approaches discussed in our paper to get optimization solutions. The literature applying unit root, cointegration, causality, and nonlinearity tests includes Wong, Penm, Terrell, and Lim (2004), Wong, Khan, and Du (2006), Qiao, Liew, and Wong (2007), Foo, Wong, and Chong (2008), Qiao, Smyth, and Wong (2008), Qiao, Chiang, and Wong (2008), Chiang, Qiao, and Wong (2009), Qiao, McAleer, and Wong (2009), Qiao, Li, and Wong (2011), Vieito, Wong, and Zhu (2015), Batai, Chu, Lv, and Wong (2017), Chow, Cunado, Gupta, and Wong (2018), Chow, Vieito, and Wong (2018), Zhu, Bai, Vieito, and Wong (2018), Demirer, Gupta, Lv, and Wong (2019), Chow, Gupta, Suleman, and Wong (2019), and many others.
We next discuss some robust estimation that can use the approaches discussed in our paper to get optimization solutions. We only discuss those related to our work. Firstly, Wong and Bian (1997) develop an alternative approach to estimate regression coefficients while Wong and Bian (2000) introduce the robust Bayesian estimator developed by Bian and Dickey (1996) to the estimation of the Capital Asset Pricing Model (CAPM) in which the distribution of the error component is well-known to be flat-tailed. Tiku, Wong, Vaughan, and Bian (2000) consider AR(q) models in time series with non-normal innovations represented by a member of a wide family of symmetric distributions (Student's t).
Since the ML (maximum likelihood) estimators are intractable, they derive the MML (modified maximum likelihood) estimators of the parameters and show that they are remarkably efficient. They use these estimators for hypothesis testing and show that the resulting tests are robust and powerful. Tiku, Wong, and Bian (1999a) extend the work by considering AR(q) models in time series with asymmetric innovations represented by gamma and generalized logistic distributions. Tiku, Wong, and Bian (1999b) estimate coefficients in a simple regression model with autocorrelated errors, where the underlying distribution is assumed to follow the Student's t family. Wong and Bian (2005) extend the results to the case where the underlying distribution is a generalized logistic distribution.
We have been developing or applying some other statistical and econometric models that can use the approaches discussed in our paper to get optimization solutions. We state a few here. For example, Mou, Wong, and McAleer (2018) analyze core enterprise credit risks in supply chain finance by means of a 'fuzzy analytical hierarchy process' to construct a supply chain financial credit risk evaluation system, making quantitative measurements and evaluations of core enterprise credit risk.
In addition, Pham, Wong, Moslehpour, and Musyoki (2018) suggest an outsourcing hierarchy model based on the concept of the analytic hierarchy process (AHP) with four levels of the most concerned attributes: competitiveness, human resources, business environment, and government policies. Their comparison between the AHP and the Fuzzy AHP shows some significant differences but leads to similar conclusions. They provide decision makers with an outsourcing hierarchy model based on the AHP and Fuzzy AHP approaches with the most concerned factors.
We note that it is not only statistical and econometric models related to decision sciences that can use the approaches discussed in our paper to get optimization solutions. There are many other models, for example, probability and mathematical models, that can do so as well, for two reasons: first, the models themselves could use the approaches discussed in our paper to get optimization solutions; second, the probability and mathematical models can be further extended to develop corresponding statistical and econometric models for use in real data analysis, and these, in turn, could use the approaches discussed in our paper to get optimization solutions.
Here, we give a few examples. Egozcue, Fuentes García, and Wong (2009) develop decision rules for multiple products, which they generally call 'exposure units' to naturally cover manifold scenarios spanning well beyond 'products'. All the above models could use the approaches discussed in our paper to get optimization solutions.
Last, we note that there are many other areas in decision sciences that can use the approaches discussed in our paper to get optimization solutions, some of which we have reviewed above. For more applications in decision sciences, readers may refer to Chang, McAleer, and Wong (2015, 2018a, 2018b, 2018c) for more information.

Concluding Remarks and Discussion
In order to provide academics and practitioners with an overview of the methods to find the optimization solution, in this article we review the four most commonly used approaches to find the optimization solution, including the bisection, gradient, Newton-Raphson, and secant methods. We have also developed all the necessary theorems to prove the existence and convergence of the estimate for each method and have given two examples to illustrate our approaches. Since there are many areas in decision sciences that can use the approaches discussed in our paper to get optimization solutions, we also review the literature on applications of getting optimization solutions in some important issues by using the methods discussed in our paper and discuss the advantages and disadvantages of each method.
We note that all the theorems developed in our paper could be well known, but, so far, we have not seen any book or paper that discusses all the theorems stated in our paper in detail; thus, we believe the theorems developed in our paper could still make some contributions to the literature. Our review is useful for academics and practitioners in finding optimization solutions in their studies.
From our discussion and illustration, all four approaches, including the bisection, gradient, Newton-Raphson, and secant methods, can find the optimization solution. We note that the bisection method can converge very slowly, especially in multi-dimensional studies. The disadvantage of the gradient method is that it depends not only on the starting value but also on the learning rate. Thus, to make good use of this approach, one needs to provide both a good starting value and a good learning rate. On the other hand, both the Newton-Raphson and secant methods depend only on starting values: the secant method needs two starting values, while the Newton-Raphson method needs only one.
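The dependence on brackets, starting values, and learning rates described above can be sketched in code. The following is a minimal illustration, not a production implementation; the quadratic objective f(x) = (x - 2)^2, the bracket [0, 5], the starting value, and the learning rate are all assumptions chosen purely for demonstration.

```python
# Illustrative objective f(x) = (x - 2)**2 with derivative f'(x) = 2(x - 2);
# a minimizer of f is a root of f'.

def fprime(x):
    return 2.0 * (x - 2.0)

def bisection(df, a, b, tol=1e-8):
    """Locate a root of df (a stationary point of f) inside the bracket [a, b]."""
    assert df(a) * df(b) < 0, "bracket must straddle the root"
    while b - a > tol:
        m = 0.5 * (a + b)
        if df(a) * df(m) <= 0:   # root lies in the left half
            b = m
        else:                    # root lies in the right half
            a = m
    return 0.5 * (a + b)

def gradient_descent(df, x0, lr, n_iter=200):
    """Minimize f from starting value x0 with a fixed learning rate lr."""
    x = x0
    for _ in range(n_iter):
        x = x - lr * df(x)       # step against the gradient
    return x

x_bis = bisection(fprime, 0.0, 5.0)
x_gd = gradient_descent(fprime, x0=0.0, lr=0.1)
print(round(x_bis, 6), round(x_gd, 6))  # both close to the minimizer 2
```

Note that halving the bracket gains only one binary digit of accuracy per iteration, which reflects the slow convergence mentioned above, and that an overly large learning rate (here, lr > 1) would make the gradient iteration diverge.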
It can be seen that the Newton-Raphson method is one of the most powerful tools to find the optimization solution, which is why most common optimization software uses the Newton-Raphson algorithm as a foundation. Hence, we recommend utilizing the Newton-Raphson method to find the optimization solution in the estimation problem. Nevertheless, in numerous practical applications it is very difficult to obtain the first and second derivatives of the objective function, and so it is not easy to apply the Newton-Raphson method. To solve this issue, we propose applying the quasi-Newton (secant) method, which we have presented in our paper. This is also why much common optimization software uses quasi-Newton algorithms as a foundation, and hence we also recommend utilizing the quasi-Newton method to find the optimization solution in the estimation problem.
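The Newton-Raphson update x_{k+1} = x_k - f'(x_k)/f''(x_k) can be sketched as follows. This is a demonstration under assumed inputs: the objective f(x) = (x - 2)^2, its derivatives, and the starting value are illustrative choices, not part of any particular software package.

```python
# Newton-Raphson iteration for minimizing f: repeatedly divide the first
# derivative by the second derivative and step accordingly.

def newton_raphson(df, d2f, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)   # requires both derivatives at every iterate
        x -= step
        if abs(step) < tol:     # stop once the update is negligible
            break
    return x

# For f(x) = (x - 2)**2: f'(x) = 2(x - 2), f''(x) = 2.
x_nr = newton_raphson(lambda x: 2.0 * (x - 2.0), lambda x: 2.0, x0=10.0)
print(round(x_nr, 6))  # 2.0 (a single step suffices for a quadratic objective)
```

The single-step convergence here is special to quadratics; for general smooth objectives, the method converges quadratically near the solution but still requires both derivatives, which motivates the quasi-Newton alternative.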
In this regard, one can utilize the secant method or the numerical method for the Newton-Raphson approach (see, for example, Nash (1990), Toomet et al. (2015), and Zhu et al. (1997)).
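A minimal sketch of the secant update illustrates how the second derivative in Newton-Raphson is replaced by a finite difference of first derivatives built from two starting values; the objective and starting values below are illustrative assumptions only.

```python
# Secant (quasi-Newton) iteration: approximate f''(x) by
# (f'(x1) - f'(x0)) / (x1 - x0), so only the gradient is required.

def secant(df, x0, x1, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        denom = df(x1) - df(x0)
        if denom == 0.0:                     # flat difference: cannot proceed
            break
        x2 = x1 - df(x1) * (x1 - x0) / denom  # secant step
        x0, x1 = x1, x2
        if abs(x1 - x0) < tol:
            break
    return x1

# For f(x) = (x - 2)**2, f'(x) = 2(x - 2); two starting values are needed.
x_sec = secant(lambda x: 2.0 * (x - 2.0), x0=0.0, x1=5.0)
print(round(x_sec, 6))  # 2.0
```

This is the trade-off summarized above: the secant method needs two starting values instead of one, but it dispenses with the second derivative entirely.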