Comparison of Newton-Raphson Algorithm and MaxLik Function

Our main objective is to compare the performance of two approaches, the Newton-Raphson (N-R) algorithm and the maxLik function in the statistical software R, for obtaining optimization roots of estimating functions. We present both approaches, illustrate them with examples, and discuss them in detail. In addition, we show that the N-R algorithm can still be applied when the data set contains missing values, whereas the maxLik function cannot be executed in this situation. Finally, we compare the results, as well as the running time needed to produce them, for the two approaches through an example introduced in [1].


Introduction
In statistical inference and applied mathematics, estimating functions play an extremely important role in research. Once an estimating function is available, several approaches can be applied to find its roots.
Comprehensive theory and its applications can be obtained from numerous reference books on statistics.
In [2], Godambe presented estimating functions, in which a function involves both the data set and the parameters to be estimated.
In general, an estimating function can be described by H with the condition H(data, ψ) = 0, where ψ ∈ Ψ and Ψ is a parameter space. The issues associated with finding an optimization root of estimating functions are exceedingly important in several areas, such as statistical inference, mathematics, technology and economics. Therefore, it is extremely worthwhile to study these problems. There are numerous approaches to obtain optimization roots, for instance the secant method, the gradient method and the Newton-Raphson algorithm, among which the iterative Newton-Raphson algorithm is one of the most widely used. In this regard, several scholars have studied and applied it. For example: Riks [3] presented the application of Newton's method to the problem of elastic stability; Broyden [4] introduced quasi-Newton methods and their application to function minimisation; Polyak [5] studied the N-R method and its application in optimization; Chalco et al. [6] presented the Newton method for solving fuzzy optimization problems; Bakari et al. [7] introduced the application of the Newton-Raphson method to non-linear models; Wu et al. [8] studied a regularized Newton method for computing ground states of Bose-Einstein condensates; Chin et al. [9] presented an efficient alternating Newton method for learning factorization machines; Ferreira et al. [10] introduced an inexact Newton method for nonlinear functions with values in a cone; and Mokhtari et al. [11] studied IQN, an incremental quasi-Newton method with a local superlinear convergence rate. Furthermore, there are several available functions in R to obtain optimization roots; for example, the maxLik function was first introduced by Henningsen et al. [12]. Nash [13] developed the optim function, which is derived from the algorithm of Zhu et al. [14]. Hasselman [15] proposed the nleqslv function. Although the problem of finding the optimization solution of estimating functions has been extensively studied and widely applied in various fields, the issue of comparing the performance of these approaches has not yet been researched.

© 2018 Journal of Advanced Engineering and Computation (JAEC)

MaxLik function
The maxLik function is a widely used, readily available function for obtaining an optimization root. This function was first introduced by Henningsen et al. [12]. Like other R packages, the maxLik package needs to be installed and loaded before use. The commands to install and load the maxLik package are as follows:

> install.packages("maxLik")
> library(maxLik)

The simplest form of the maxLik function is maxLik(logLik, start), where logLik is the log-likelihood function of a target function and start is a starting value for the parameters to be estimated, which can be a real value or a vector.
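To illustrate the kind of computation that maxLik(logLik, start) carries out, the following minimal Python sketch (our own illustration, not the R implementation; the function name max_lik, the finite-difference derivatives and the Poisson example are all assumptions made here) maximizes a one-parameter log-likelihood by Newton-Raphson starting from a user-supplied value:

```python
import math

def max_lik(log_lik, start, tol=1e-10, max_iter=100, h=1e-5):
    """Maximize a one-parameter log-likelihood by Newton-Raphson,
    mimicking the role of maxLik(logLik, start) in R.
    Derivatives are approximated by central finite differences."""
    u = start
    for _ in range(max_iter):
        g1 = (log_lik(u + h) - log_lik(u - h)) / (2 * h)                 # first derivative
        g2 = (log_lik(u + h) - 2 * log_lik(u) + log_lik(u - h)) / h**2   # second derivative
        step = g1 / g2
        u -= step
        if abs(step) < tol:
            break
    return u

# Poisson log-likelihood for a small sample; the MLE of the mean
# is known in closed form to be the sample mean.
data = [2, 3, 4, 1, 5]
loglik = lambda lam: sum(x * math.log(lam) - lam for x in data)
lam_hat = max_lik(loglik, start=1.0)
```

For this Poisson sample the maximizer coincides with the sample mean, so the numerical result can be checked against the closed-form MLE.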

Complete cases (CC) estimator
Assume that our data set contains missing values. The complete-case (CC) estimator is computed only on the records with no missing values, while records containing missing values are removed. As a result, the sample size is reduced significantly, which can seriously affect the results of a study. Let η_i be the missingness status of X_i, i.e., η_i = 1 when X_i is observed and η_i = 0 otherwise. Let T be a surrogate variable of X such that T is independent of Y given (X, Z). The validation data set (η_i = 1) includes (Y_i, X_i, V_i), and the remaining records form the non-validation data set (η_i = 0). The general estimating function of the CC estimator of the regression model parameters when covariates are missing at random (MAR), denoted by U_CC,n(α), can be represented as follows:

U_CC,n(α) = Σ_{i=1}^{n} η_i S_i(α),

where α are the parameters of interest and S_i(α) is the first derivative of the log-likelihood function of P(Y_i = 1 | X_i, V_i) with respect to α.
By solving U_CC,n(α) = 0, we obtain α̂_CC, which is an estimator of α.
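For intuition, the complete-case estimating equation can be solved numerically by Newton-Raphson. The following Python sketch (our own toy illustration, not code from the paper) does this for an intercept-only logistic model, where the score contribution is S_i(α) = Y_i − expit(α) and records with η_i = 0 are simply dropped:

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def cc_estimate(y, eta, alpha0=0.0, tol=1e-10, max_iter=50):
    """Solve the complete-case estimating equation
    U_CC(alpha) = sum_i eta_i * (y_i - expit(alpha)) = 0
    for an intercept-only logistic model by Newton-Raphson."""
    alpha = alpha0
    for _ in range(max_iter):
        p = expit(alpha)
        u = sum(e * (yi - p) for yi, e in zip(y, eta))   # score U_CC(alpha)
        du = -sum(eta) * p * (1 - p)                     # derivative dU/dalpha
        step = u / du
        alpha -= step
        if abs(step) < tol:
            return alpha
    return alpha

# 4 of 6 records are complete; the CC estimate uses only those,
# so it equals the logit of the mean response among complete cases.
y   = [1, 0, 1, 1, 0, 1]
eta = [1, 1, 1, 1, 0, 0]
alpha_hat = cc_estimate(y, eta)
```

Note how the two incomplete records contribute nothing to the score, which is exactly the sample-size reduction described above.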
It has been shown that E[U_CC,n(α)] ≠ 0, so U_CC,n(α) is a biased estimating function. Wang et al. [16] and Lukusa et al. [17] also stated that the complete-case estimator is not a trustworthy approach.

Inverse probability weighting (IPW) estimator
The inverse probability weighting (IPW) estimator is an improvement over the complete-case (CC) estimator. Zhao and Lipsitz [18] proposed an IPW estimator. Basically, this approach works like the complete-case estimator, i.e., it only considers the records with no missing values.
Nevertheless, this approach is based on weighting the observations, and the authors have shown that it is a reliable method under the MAR mechanism.
In general, its formula can be illustrated as follows:

U_W,n(α, π) = Σ_{i=1}^{n} (η_i / π(Y_i, V_i)) S_i(α),

where α are the parameters of interest, S_i(α) is the first derivative of the log-likelihood function of P(Y_i = 1 | X_i, V_i) with respect to α, and π(Y_i, V_i) denotes the selection probability P(η_i = 1 | Y_i, V_i).
Let α̂_W be an estimator of α, which can be acquired by solving U_W,n(α, π) = 0.
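As a sketch of what solving U_W,n(α, π) = 0 involves, the following Python illustration (our own toy example, with made-up data and assumed-known selection probabilities π_i) applies a weighted Newton-Raphson iteration to an intercept-only logistic model:

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def ipw_estimate(y, eta, pi, alpha0=0.0, tol=1e-10, max_iter=50):
    """Solve the IPW estimating equation
    U_W(alpha) = sum_i (eta_i / pi_i) * (y_i - expit(alpha)) = 0
    for an intercept-only logistic model by Newton-Raphson.
    Complete cases are up-weighted by the inverse of their
    selection probability pi_i."""
    w = [e / p for e, p in zip(eta, pi)]   # weights eta_i / pi_i
    alpha = alpha0
    for _ in range(max_iter):
        pr = expit(alpha)
        u = sum(wi * (yi - pr) for yi, wi in zip(y, w))  # weighted score
        du = -sum(w) * pr * (1 - pr)
        step = u / du
        alpha -= step
        if abs(step) < tol:
            return alpha
    return alpha

y   = [1, 0, 1, 1, 0, 1]
eta = [1, 1, 1, 1, 0, 0]
pi  = [0.8, 0.8, 0.5, 0.5, 0.5, 0.8]   # hypothetical known selection probabilities
alpha_hat = ipw_estimate(y, eta, pi)
```

The solution is the logit of the inverse-probability-weighted mean of the responses among complete cases, which is why records observed with low probability count for more.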
In practice, π(Y_i, V_i) is usually unknown and is typically estimated by a non-parametric method [16]. If π(Y_i, V_i) is correctly estimated, α̂_W will usually be a consistent estimator of α.
Let v_1, v_2, ..., v_m be the distinct values of the V_i's. The non-parametric estimator of π(y, v) is given as follows:

π̂(y, v) = Σ_{i=1}^{n} η_i I(Y_i = y, V_i = v) / Σ_{i=1}^{n} I(Y_i = y, V_i = v),

where I(A) is the indicator function of A, y ranges over the natural numbers and v ∈ {v_1, v_2, ..., v_m}.
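The non-parametric estimator of π(y, v) is simply the cell-wise proportion of fully observed records; a minimal Python sketch (with hypothetical toy data of our own) is:

```python
from collections import defaultdict

def estimate_pi(y, v, eta):
    """Non-parametric estimator of the selection probability pi(y, v):
    within each cell (y, v), the fraction of fully observed records,
    pi_hat(y, v) = sum_i eta_i * I(Y_i=y, V_i=v) / sum_i I(Y_i=y, V_i=v)."""
    num = defaultdict(int)
    den = defaultdict(int)
    for yi, vi, ei in zip(y, v, eta):
        den[(yi, vi)] += 1   # all records falling in cell (y, v)
        num[(yi, vi)] += ei  # fully observed records in that cell
    return {cell: num[cell] / den[cell] for cell in den}

y   = [1, 1, 0, 0, 1, 0, 1, 0]
v   = ['a', 'a', 'a', 'b', 'b', 'b', 'a', 'a']
eta = [1, 0, 1, 1, 1, 0, 1, 1]
pi_hat = estimate_pi(y, v, eta)
```

The resulting dictionary can then be plugged into the IPW estimating function in place of the unknown π.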
The estimating function (4) can then be expressed with π(y, v) replaced by its non-parametric estimator π̂(y, v), i.e., the estimator is obtained by solving U_W,n(α, π̂) = 0.

Joint conditional likelihood (JCL) estimator
The joint conditional likelihood (JCL) estimator was first introduced by Wang et al. [19]. This approach is based on both the validation and the non-validation data sets. In general, its formula can be described as follows:

U_J,n(α, π) = Σ_{i=1}^{n} [ η_i S_1i(α) + (1 − η_i) S_2i(α) ],

where α are the parameters of interest and S_1i(α) and S_2i(α) are the first derivatives of the log-likelihood functions of P(Y_i = 1 | X_i, V_i, η_i = 1) and P(Y_i = 1 | V_i, η_i = 0) with respect to α, respectively. The authors have shown that E[U_J,n(α, π)] = 0; as a result, U_J,n(α, π) is an unbiased estimating function.
Missing data is a ubiquitous issue that is frequently encountered in, e.g., health, education and transportation. It arises for numerous reasons: respondents do not respond to a certain item in a survey, refuse to respond, give an incomprehensible response, etc. [20]. The issues associated with missing data can be classified into two different types: missing outcomes and missing covariates.
The problems of estimating parameters in regression models with missing data have been extensively studied and widely applied in various fields by several scholars. For instance, Wang et al. [19] used a JCL estimator to estimate parameters in logistic regression with missing covariates. This method was also extended by Hsieh et al. [21] and Lee et al. [22] in their studies.

All of the above authors performed a Newton-Raphson algorithm and the maxLik function to obtain the optimization root of estimating functions. We present the maxLik function because it is a widely available function for obtaining the optimization root and it does not use the Hessian matrix in its formula. In numerical analysis, Newton-Raphson is a ubiquitous iterative algorithm for finding the roots of a target function g(u) (the solutions of g(u) = 0). In statistics and optimization, the N-R algorithm is one of the most widely used algorithms for finding the roots of the derivative of a function g(u) (the solutions of g'(u) = 0). Our main objective is to compare the performance of the two approaches for obtaining the optimization root. Hence, we do not describe how to set up the algorithms in detail; we only present the formulas and some examples of the two approaches.

Newton-Raphson algorithm

(a) Case 1: One-dimension

Let g(u) be a target function whose roots are to be found by the Newton-Raphson algorithm. The root iteration of the N-R algorithm can be described as follows:

u_{t+1} = u_t − g(u_t) / g'(u_t),  t = 0, 1, 2, ...  (2)
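A minimal Python sketch of the one-dimensional iteration u_{t+1} = u_t − g(u_t)/g'(u_t) (our own illustration; the tolerance, iteration cap and test function are arbitrary choices) is:

```python
def newton_raphson(g, dg, u0, tol=1e-12, max_iter=100):
    """One-dimensional Newton-Raphson: iterate
    u_{t+1} = u_t - g(u_t) / g'(u_t)
    until the update is smaller than tol."""
    u = u0
    for _ in range(max_iter):
        step = g(u) / dg(u)
        u -= step
        if abs(step) < tol:
            return u
    return u

# Root of g(u) = u^2 - 2, i.e. the positive square root of 2.
root = newton_raphson(lambda u: u * u - 2, lambda u: 2 * u, u0=1.0)
```

Starting from u0 = 1, the iterates converge quadratically to the root once they are close to it, which is the behaviour exploited throughout this paper.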

(b) Case 2: Multi-dimension

The expression (2) can be extended to the N-R algorithm in several dimensions by substituting the derivative of the target function with the gradient, ∇g(u), and the reciprocal of the second derivative with the inverse of the Hessian matrix, H_g(u). The iteration of the N-R algorithm in multiple dimensions can then be illustrated by:

u_{t+1} = u_t − [H_g(u_t)]^{−1} ∇g(u_t),  t = 0, 1, 2, ...
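The multi-dimensional update u_{t+1} = u_t − [H_g(u_t)]^{−1} ∇g(u_t) can be sketched in Python as follows (our own 2-D illustration with a hand-rolled 2×2 linear solve; the quadratic test function is an assumption made here):

```python
def newton_multidim(grad, hess, u0, tol=1e-12, max_iter=100):
    """Multi-dimensional Newton-Raphson for optimizing g(u):
    u_{t+1} = u_t - H_g(u_t)^{-1} grad_g(u_t),
    specialized to 2 dimensions with an explicit 2x2 solve."""
    x, y = u0
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        (a, b), (c, d) = hess(x, y)
        det = a * d - b * c
        dx = (d * gx - b * gy) / det     # solve H * [dx, dy] = grad
        dy = (-c * gx + a * gy) / det
        x, y = x - dx, y - dy
        if abs(dx) + abs(dy) < tol:
            return x, y
    return x, y

# Minimize g(x, y) = (x - 1)^2 + 2*(y + 3)^2: the gradient vanishes at (1, -3).
grad = lambda x, y: (2 * (x - 1), 4 * (y + 3))
hess = lambda x, y: ((2.0, 0.0), (0.0, 4.0))
xy = newton_multidim(grad, hess, (0.0, 0.0))
```

Because the test function is quadratic, the iteration reaches the stationary point in a single Newton step, illustrating why the method is so fast near the optimum.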

are independently.In this study, we assume that Z i2 = X i2 and Z i3 = X i6 and choosing initial values as follows: α = (−0.3,1.2, 0.5, −0.75, −1, 0.8, 0) T and β = (−0.55,−0.7, −1, 0.45, 0) T Investigating numerous sample sizes (n = 150, 300, 500) and h i ∈ {4, 5, 6}.The numbers h i are allowed to change across subjects.Let(k 4 , k 5 , k 6 ) =(card{i : h i = 4}, card{i : h i = 5}, card{i : h i = 6}) With n = 150, using (k 4 , k 5 , k 6 ) = (60, 50, 40).When n = 300, performing (k 4 , k 5 , k 6 ) = (120, 100, 80) and with n = 500, choosing (k 4 , k 5 , k 6 ) = (200, 170, 130).Utilizing above values, the average proportion of zero-ination in our data set is 25%.The number of repetitions in simulation is chosen N = 5000 times and gure out the maximum likelihood estimation (MLE) γn = αT n , βT n T In this study, we execute two approaches: the Newton-Raphson method and maxLik function to estimate parameters.These results are provided in Tab. 1 and Tab. 2, respectively (in Appendix).It can seen be that, the biases of estimators are very small, the values of SD and ASE are very close and the values of CP are very close to 0.95.These prove that our estimated results are very trustworthy.In addition, it has been seen that the bias, SE, SD, and l(CI) of all estimators decrease as the sample size increases.Furthermore, it can be seen that the normal Q -Q plots are provided in Figs.1-4 (in Appendix) that the Gaussian approximation of the distribution of the MLE in the zero-inated Binomial (ZIB) regression model is reasonably satised.About the results, the authors in article of Diallo et al. 
(2017) performed the maxLik function in their simulation study. The results in this paper were obtained using two approaches, the maxLik function and the N-R method, and the results of the two approaches are largely the same. We employed an HP desktop computer configured with an Intel Core i5, 8 GB of RAM and a 1 TB hard drive to measure the running time of the two approaches. To obtain the above results, the maxLik function takes 60 minutes, while the Newton-Raphson method takes only 30 minutes; thus, the Newton-Raphson algorithm produces the results faster than the maxLik function. It can be observed that, in general, the maxLik function and some other available functions in statistical software can only be used to obtain the optimization root when the data set contains no missing values, although their structure is easier to use than the N-R algorithm. Nevertheless, the N-R algorithm is a robust tool for obtaining the optimization root and for estimating parameters in regression models. It can be executed when the data set contains missing values, a situation in which some available functions in statistical software are unworkable: such functions can only be applied to the validation data set (η = 1) and cannot be executed on the non-validation data set (η = 0). Meanwhile, the Newton-Raphson algorithm can be employed in all situations.