Beyond Newton: a new root-finding fixed-point iteration for nonlinear equations

Finding roots of equations is at the heart of most computational science. A well-known and widely used iterative algorithm is Newton's method. However, its convergence depends heavily on the initial guess, with poor choices often leading to slow convergence or even divergence. In this paper, we present a new class of methods that improve upon classical Newton's method. The key idea behind the new approach is to develop a relatively simple multiplicative transformation of the original equations, which leads to a significant reduction in nonlinearity, thereby alleviating the limitations of Newton's method. Based on this idea, we propose two novel classes of methods and present their application to several mathematical functions (real, complex, and vector). Across all examples, our numerical experiments suggest that the new methods converge for a significantly wider range of initial guesses with a minimal increase in computational cost. Given the ubiquity of Newton's method, an improvement in its applicability and convergence is a significant step forward and will reduce computation times severalfold across many disciplines. Additionally, this multiplicative transformation may improve other techniques where a linear approximation is used.


I. INTRODUCTION
The Newton-Raphson, often referred to simply as Newton's, fixed-point iteration method has been the gold standard for numerically solving equations for several centuries. In order to set the symbols and nomenclature, we start by defining a generic problem: find x = x* such that

r(x*) = 0    (1)

for a given function r : K → K (K = C or R). In general, fixed-point iteration methods start with a guess x_n = x_0 and iteratively update it using

x_{n+1} = φ(x_n),    (2)

where φ(x) depends on r(x) and the chosen numerical scheme. It is required that x_n → x* as n → ∞ for the numerical scheme to converge, and, in practice, the fastest possible convergence is desired. If we expand r(x) in a finite Taylor series up to second order about the point x_n and evaluate it at x = x* to solve r(x*) = 0, we get

r(x*) = r(x_n) + r'(x_n)(x* − x_n) + (1/2) r''(ζ)(x* − x_n)² = 0    (3)

for some ζ ∈ [x_n, x*]. The last term is called the remainder and can be written in other forms [1]. Here, ζ is unknown. Therefore, in the standard Newton's method, we neglect the second-order term and approximate x* with an updated guess to the solution, x_{n+1}. Thus, we obtain Newton's fixed-point iteration, also called the Newton-Raphson method (using the notation r' = dr/dx):

x_{n+1} = x_n − r(x_n)/r'(x_n).    (4)

Using Eq. (3) and Eq. (4), one can rearrange to get the evolution of the error e_n = |x* − x_n|:

e_{n+1} = |r''(ζ)/(2 r'(x_n))| e_n².    (5)

The above step shows (at least) quadratic convergence of Newton's method when r(x) has only simple roots. This quadratic convergence has led to wide adoption of Newton's method in solving problems in all scientific disciplines. Furthermore, this attractive property has also motivated a large body of work on root-finding algorithms that either imitate or approximate Newton's method. Despite the aforementioned attractive convergence of Newton's method, it can be seen from Eq. (5) that to achieve quadratic convergence, the current guess x_n must be "sufficiently close" to the solution.
When the initial guess is not close to the solution, the convergence can be slower than quadratic, or, worse, the iterations may not converge, oscillate, or diverge.
As an example, if we consider r(x) = e^x − 500 and solve it using Eq. (4) with x_0 = 0, the iterations diverge until we reach numerical overflow (Fig. 1, solid red line). Instead, if we start with x_0 = 3, the error decreases sub-quadratically before achieving quadratic convergence closer to the solution (Fig. 1, dashed red line).
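As a minimal sketch of this behavior (the tolerance and iteration budget below are illustrative choices, not values from the text), the standard Newton iteration of Eq. (4) applied to r(x) = e^x − 500 can be written as:

```python
import math

def newton(r, rp, x0, tol=1e-10, max_iter=200):
    """Standard Newton iteration x_{n+1} = x_n - r(x_n)/r'(x_n), Eq. (4).

    Returns the final iterate and the number of iterations taken."""
    x = x0
    for n in range(max_iter):
        fx = r(x)
        if abs(fx) < tol:
            return x, n
        x -= fx / rp(x)
    return x, max_iter

r  = lambda x: math.exp(x) - 500.0
rp = lambda x: math.exp(x)

# From x0 = 3 the first step overshoots to x1 ~ 26.9, after which the
# iterates creep back by roughly 1 per iteration before quadratic
# convergence finally sets in near the root x* = ln(500) ~ 6.2146.
root, iters = newton(r, rp, 3.0)
```

The many near-unit steps taken far from the root are exactly the sub-quadratic phase visible in Fig. 1.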

II. METHODS AND RESULTS
The basin of attraction for a root x* is defined as the set of initial guesses from which a given fixed-point iteration converges to x* (this can be seen as the domain of convergence for initial guesses). Naturally, it is desirable to have a large basin of attraction for the root so that convergence is achieved for a wide range of initial guesses. Even though, for a general r(x), determining the size of the basin of attraction is challenging, it is clear from Eq. (5) that the size of the basin depends on a measure of nonlinearity

N(ζ, x_n) := r''(ζ) / (2 r'(x_n)).
We pose a question: can we pre-multiply the original equation by a function so that we obtain a larger (or, if possible, infinite) basin of attraction? That is, instead of solving Eq. (1), can we solve

P(x) r(x) = 0,    (6)

still using Newton's method? The idea is to choose a P(x) that decreases the nonlinearity N and, hence, gives a larger basin of attraction, while retaining at least quadratic convergence close to the root. For Eq. (6), the measure of nonlinearity is

N = (Pr)''(ζ) / (2 (Pr)'(x_n)).    (7)

We note that if P(x) is not a function of x, we do not change the nonlinearity of the problem, and this case would fall under the purview of the highly developed field of linear preconditioning. In an attempt to minimize N, Eq. (7), we equate N = 0, which requires P(x) r(x) to be linear in x and gives

P(x) = (c_1 x − c_2) / r(x)    (8)

for arbitrary integration constants c_1 and c_2. However, this has two problems. Multiplying the above form of P(x) by r(x): 1. eliminates the original root x = x*, and 2. introduces a new root x = c_2/c_1.
In order to solve these problems, we add another constant κ ≠ 0 in the denominator of P(x) in Eq. (8):

P(x) = (c_1 x − c_2) / (r(x) + κ).    (9)

This avoids the elimination of the root x*. In addition, we eliminate the undesirable new root x = c_2/c_1 by choosing κ such that r(c_2/c_1) + κ = 0; that is, κ = −r(c_2/c_1). Furthermore, c_1 = 0 makes c_2/c_1 indeterminate, and any value c_1 ≠ 0 only scales the equation by a constant without affecting its nonlinearity. Therefore, without loss of generality, we set c_1 = 1 and c_2 = c. That is,

P(x) = (x − c) / (r(x) − r(c)).    (10)

Thus, instead of applying Newton's method to r(x) = 0, if we apply Newton's method to

(x − c) r(x) / (r(x) − r(c)) = 0,    (11)

we get a new fixed-point iteration

x_{n+1} = x_n + Δx^EN,  Δx^EN = −(x_n − c) r(x_n) [r(x_n) − r(c)] / { r(x_n) [r(x_n) − r(c)] − (x_n − c) r'(x_n) r(c) }.    (12)

The superscript EN denotes the "Extended Newton" method.
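A minimal sketch of the Extended Newton update of Eq. (12) follows; the test function, starting point, and value of c are illustrative choices, not prescriptions from the text:

```python
import math

def extended_newton_step(r, rp, x, c):
    """One Extended Newton step, Eq. (12):
    dx = -(x - c) r(x) [r(x) - r(c)] /
         { r(x) [r(x) - r(c)] - (x - c) r'(x) r(c) }."""
    rx, rc = r(x), r(c)
    num = -(x - c) * rx * (rx - rc)
    den = rx * (rx - rc) - (x - c) * rp(x) * rc
    return num / den

r  = lambda x: math.exp(x) - 500.0
rp = lambda x: math.exp(x)

# Standard Newton from x0 = 0 overshoots badly (x1 = 499), whereas the
# Extended Newton iteration with, e.g., c = 1 walks steadily toward the
# root x* = ln(500).
x = 0.0
for _ in range(30):
    x += extended_newton_step(r, rp, x, c=1.0)
```

Note that only r(c) appears in the update, so it can be computed once and reused across iterations.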

Choice of c
The new fixed-point iteration Eq. (12) contains an arbitrary constant c, which remains to be determined. We note that if c = x* such that r(c) = 0, we have x_{n+1} = x_n + Δx^EN = x_n + (c − x_n) = c = x*. That is, we will find the solution in a single iteration irrespective of the starting point x_0 and the function r. This is not surprising; if one could choose c = x*, the solution would already be known, making its basin of attraction infinite.
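This one-step property can be checked directly with the scalar step of Eq. (12); the quadratic below is a hypothetical test function chosen because its root is known exactly:

```python
def extended_newton_step(r, rp, x, c):
    # Extended Newton step, Eq. (12)
    rx, rc = r(x), r(c)
    return -(x - c) * rx * (rx - rc) / (rx * (rx - rc) - (x - c) * rp(x) * rc)

r  = lambda x: x * x - 4.0   # hypothetical test function, roots at +/- 2
rp = lambda x: 2.0 * x

# With c equal to the exact root, r(c) = 0 and the step reduces to c - x_n,
# so any starting point lands on the root in a single iteration.
x1 = 10.0 + extended_newton_step(r, rp, 10.0, c=2.0)   # x1 == 2.0
```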
However, it is not clear how the distance of c from the solution will affect convergence.
Even though one could study the effect of c on the nonlinearity by assessing N(ζ, x_n) from Eq. (7), such an analysis depends on the specific function r(x). Nevertheless, we look at the limiting case when c → x_n. Rewriting Eq. (12) and taking the limit, we get

Δx^CN := lim_{c→x_n} Δx^EN = −[r(x_n)/r'(x_n)] / [1 − r(x_n) r''(x_n)/(2 r'(x_n)²)].    (13)

This gives us a new fixed-point iteration

x_{n+1} = x_n + Δx^CN,    (14)

where the superscript CN denotes the "Corrected Newton" method (the choice of this name will become clear shortly). The advantage of this fixed-point iteration is that no arbitrary constant is involved; however, we pay the price of calculating the second derivative of r(x).
To analyze Eq. (14) further, we re-write the Taylor expansion Eq. (3) with ζ = x_n (thereby neglecting the remainder term of third order) and Δx = x_{n+1} − x_n as

r(x_n) + [r'(x_n) + (1/2) r''(x_n) Δx] Δx = 0.    (17)

If we substitute the Δx within the brackets by the one from the standard Newton method, Eq. (4), and rearrange, we get Eq. (14). That is, Eq. (14) can be thought of as a correction to Newton's equation using the second derivative r''(x). Thus, we can re-write Eq. (14) as

Δx^CN = −r(x_n) / [r'(x_n) + (1/2) r''(x_n) Δx^N],    (18)

where Δx^N is obtained from the standard Newton's method. While the above form is obtained with the aim of expanding the basin of attraction, we note that Eq. (18) is identical to Halley's method, with cubic-order convergence close to the root [3]. This observation will be useful in developing the extension of the method to vector functions.
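The Corrected Newton (Halley) update in the form of Eq. (18) can be sketched as follows, again on the illustrative exponential example from above:

```python
import math

def corrected_newton_step(r, rp, rpp, x):
    """Corrected Newton step, Eq. (18): dx = -r / (r' + r'' dx_N / 2),
    where dx_N = -r/r' is the standard Newton step."""
    rx, rpx = r(x), rp(x)
    dx_newton = -rx / rpx
    return -rx / (rpx + 0.5 * rpp(x) * dx_newton)

r   = lambda x: math.exp(x) - 500.0
rp  = lambda x: math.exp(x)
rpp = lambda x: math.exp(x)

# Unlike standard Newton, the iteration started at x0 = 0 does not
# overshoot to x1 = 499; it reaches x* = ln(500) in a handful of steps.
x = 0.0
for _ in range(20):
    x += corrected_newton_step(r, rp, rpp, x)
```

For this r(x), the denominator of the step stays positive for all x, so the iteration is well defined from any starting point.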
We solve the problem e^x − H = 0 for varying values of c, x_0, and H and find significantly improved convergence using the proposed fixed-point iterations (Fig. 2). In order to verify the generality of the new schemes, we solve several different nonlinear equations and provide the results in the Supporting Information (SI).

Complex Plane
The extension to functions on the complex plane is straightforward, and the same equations can be applied. Applying Newton's method to r(z) = z³ − 1 gives us a Newton fractal, and the Extended Newton and Corrected Newton methods modify that fractal (Fig. 3).

We also note that taking the same limit c → x_n directly in our modified equation Eq. (11) gives

lim_{c→x} (x − c) r(x)/(r(x) − r(c)) = r(x)/r'(x) = 0.    (15)

If we apply Newton's method to the above equation, we get

Δx = −[r(x)/r'(x)] / [1 − r(x) r''(x)/r'(x)²].    (16)

There is a factor of 2 in the second-derivative term between Δx^CN and Eq. (16). Numerically, we verified that Eq. (14) performs better than Eq. (16).
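The basin structure behind Fig. 3 can be sampled in a few lines; the grid resolution and iteration budget below are arbitrary illustrative choices:

```python
import cmath

ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]  # cube roots of 1

def newton_basin(z, max_iter=50, tol=1e-9):
    """Return the index of the root of z^3 - 1 that Newton's method reaches
    from z, or None if it fails to converge within the budget."""
    for _ in range(max_iter):
        if abs(z) < 1e-12:          # derivative 3 z^2 vanishes at the origin
            return None
        z -= (z**3 - 1.0) / (3.0 * z**2)
        for k, root in enumerate(ROOTS):
            if abs(z - root) < tol:
                return k
    return None

# Sample which basin each point of a coarse grid belongs to; coloring these
# indices over a fine grid is what produces the Newton fractal.
grid = [[newton_basin(complex(x, y) / 10.0)
         for x in range(-20, 21)] for y in range(-20, 21)]
```

Replacing the Newton step inside the loop with the EN or CN step of Eqs. (12) and (13) produces the modified fractals.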
Vector Functions

We now consider a system of m equations,

r_i(x) = 0, i = 1, . . . , m,    (19)

where all m unknowns are collectively written as x. Applying the standard Newton's method gives us a system of m linear equations to be solved to obtain the step Δx^N:

Σ_j r_{i,j} Δx_j^N = −r_i,    (20)

where r_{i,j} = ∂r_i/∂x_j and we have dropped the subscript n to denote the current iteration value of x, as that is implied.
There could be several different ways of extending Eq. (11) to this case of multiple unknowns. If it is possible to recognize a subset of the equations that are most nonlinear, one may use the Extended Newton method for that subset of equations and the standard Newton's method for the rest. Here, we consider only the case where all equations are transformed.
Thus, one straightforward extension is the following modified set of equations:

(x_i − c_i) r_i(x) / [r_i(x) − r_i(x_{c_i})] = 0, i = 1, . . . , m,    (21)

for chosen values of c_i (i = 1, . . . , m), where x_{c_i} is the vector x with the i-th element replaced by c_i. Accordingly, applying Newton's method, we get our new system of linear equations

Σ_j { δ_ij r_i [r_i − r_i(x_{c_i})] − (x_i − c_i) r_i(x_{c_i}) r_{i,j} + (1 − δ_ij)(x_i − c_i) r_i r_{i,j}(x_{c_i}) } Δx_j^EN = −(x_i − c_i) r_i [r_i − r_i(x_{c_i})],    (22)

where δ_ij = 1 when i = j and 0 otherwise. For real or complex functions with a single unknown, we have to calculate r(c) only once, and there is no need for computing r'(c).
However, in the above extension to multiple unknowns, r_i(x_{c_i}) needs to be evaluated at every iteration, along with r_{i,j}(x_{c_i}). Thus, the Extended Newton method for the case with multiple unknowns leads to a cumbersome set of calculations.
On the other hand, extending the Corrected Newton method, Eq. (18), to multiple unknowns is simpler:

Σ_j [ r_{i,j} + (1/2) Σ_k r_{i,jk} Δx_k^N ] Δx_j^CN = −r_i,    (23)

where r_{i,jk} = ∂²r_i/∂x_j∂x_k. Using the approximation

r_{i,i} r_{i,jk} ≈ r_{i,k} r_{i,ji},    (24)

and utilizing Eq. (20), we get a simplified system

Σ_j [ r_{i,j} − r_i r_{i,ji}/(2 r_{i,i}) ] Δx_j^QCN = −r_i.    (25)

This equation gives us the step Δx_j^QCN by solving only one system of m equations, and we call it the quasi-Corrected Newton (QCN) method. In numerical methods, computing the Jacobian r_{i,j} is common, but calculating another gradient r_{i,jk} for Eq. (23) may not be desirable. However, in Eq. (25), we note that if we have an expression for r_{i,j}, r_{i,ji} can be computed using finite differences in only O(m) operations, making this approach highly attractive.
As an extension of the exponential example from the single-variable case, we solve the two equilibrium equations of two springs in series with one end fixed and the other end under a force H, where each spring's force is nonlinearly related to its change in length Δl as e^{Δl} − 1. Similar to the single equation, the standard Newton's method fails to converge for the initial guess x_1 = x_2 = 0, whereas the Extended Newton (for c_2 < −0.5) and Corrected Newton methods converge in fewer than 10 iterations (Fig. 4). As another example, we consider the minimization of Easom's function of two variables. For this function, with a single minimum at (0, 0), the basin of attraction of the standard Newton's method is relatively small. Our proposed methods increase that basin significantly (Fig. 5). Surprisingly, the improvement using the quasi-Corrected Newton method is even greater than that of the Corrected Newton method. Furthermore, the Extended Newton method can provide an even larger basin of attraction if c is chosen close enough to the solution (Fig. 5, bottom row). A summary of the proposed methods is provided in Table 1.
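The quasi-Corrected Newton step of Eq. (25) for a 2x2 system can be sketched as follows, with r_{i,ji} obtained by finite differences of the analytical Jacobian. Since the exact equilibrium equations behind Fig. 4 are not reproduced here, the spring-like system below (equal spring forces, with the second spring balancing H) is a plausible stand-in, not the paper's exact formulation:

```python
import math

H = 500.0

def residual(x):
    # Hypothetical two-spring equilibrium with force law e^dl - 1:
    # equal forces in both springs, second spring balances the load H.
    return [math.exp(x[0]) - math.exp(x[1]),
            math.exp(x[1]) - 1.0 - H]

def jacobian(x):
    return [[math.exp(x[0]), -math.exp(x[1])],
            [0.0,             math.exp(x[1])]]

def qcn_step(x, h=1e-6):
    """Quasi-Corrected Newton step, Eq. (25):
    sum_j [ r_{i,j} - r_i r_{i,ji} / (2 r_{i,i}) ] dx_j = -r_i."""
    r, J = residual(x), jacobian(x)
    A = [row[:] for row in J]
    for i in range(2):
        # r_{i,ji} ~ [r_{i,j}(x + h e_i) - r_{i,j}(x)] / h, i.e. a finite
        # difference of row i of the Jacobian in the x_i direction
        xp = x[:]; xp[i] += h
        Jp_row = jacobian(xp)[i]
        for j in range(2):
            A[i][j] -= r[i] * (Jp_row[j] - J[i][j]) / h / (2.0 * J[i][i])
    # Solve the 2x2 system A dx = -r by Cramer's rule
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    dx0 = (-r[0] * A[1][1] + r[1] * A[0][1]) / det
    dx1 = (-r[1] * A[0][0] + r[0] * A[1][0]) / det
    return dx0, dx1

# Standard Newton from (0, 0) overshoots as in the scalar case, whereas
# the QCN iterates approach the solution x1 = x2 = ln(1 + H) steadily.
x = [0.0, 0.0]
for _ in range(30):
    d0, d1 = qcn_step(x)
    x[0] += d0; x[1] += d1
```

Only one extra Jacobian row per equation is evaluated for the finite differences, and only one linear system is solved per iteration, as the text describes.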

III. DISCUSSION
Newton's method is one of the most widely used methods for finding roots in science and engineering. This is particularly true when the computation of the Jacobian is inexpensive. In the opposite case, a significant amount of work has focused on developing methods that approximate Newton's method, largely through approximations of r_{i,j}, leading to the so-called quasi-Newton methods, so that convergence properties similar to those of Newton's method can be achieved at reduced computational expense [4]. On the other hand, different methods have been proposed that achieve higher-than-quadratic convergence rates using higher-order derivatives, for example, Halley's method, Cauchy's method, the Chebyshev method [3], the class of Householder methods [5], and other variants [6-9]. However, the convergence of all of these methods is examined when the starting guess is close to the solution.
The objective of this article is to go beyond the classical approaches. The central idea is to decrease the system nonlinearity with a view to improving convergence properties far from the solution. Starting with one variable and one equation, we premultiplied the original equation and formulated a new equation with the same roots (Eq. 11). Applying Newton's method to this equation, we proposed a new fixed-point iteration, which we term the Extended Newton (EN) method (Eq. 12). This method has an arbitrary constant c, and, in fact, the choice of c makes the Extended Newton method a family of fixed-point iterations. Using several example functions, we found that the Extended Newton method, unlike the standard Newton's method, can find the solution even when the starting point is considerably far from the solution. We found that, irrespective of the choice of c, the performance of EN was always better than that of the standard Newton's method. It is also important to recognize that even if we choose c close to the starting guess, the convergence is greatly improved. Therefore, when no prior information on the choice of c is available, we recommend choosing c = x_0 + ε for small ε.

Table 1. Summary of the proposed methods.

Method                       | Scalar function update                                          | Vector function update
Extended Newton (EN)         | Δx^EN, Eq. (12)                                                 | Eq. (22)
Corrected Newton (CN)        | Δx^CN = −[r(x_n)/r'(x_n)] / [1 − r(x_n) r''(x_n)/(2 r'(x_n)²)]  | Eq. (23)
quasi-Corrected Newton (QCN) | identical to CN                                                 | Eq. (25)

We note that although we used a constant value of c during the iterations, there are other options for choosing its value at every iteration based on the current guess x_n, the previous guess x_{n−1}, and/or the function form r(x). In fact, if we choose c = x_{n−1}, EN can be thought of as a combination of the secant method and Newton's method. As the limiting case c → x_n, we rediscovered Halley's method, Eq. (18), which we call the Corrected Newton (CN) method.
While Halley's method was motivated by achieving cubic convergence, our approach has been motivated by increasing the size of the basin of attraction. It is, therefore, remarkable that both approaches lead to an identical algorithm. Thus, one contribution of this work is to elucidate that Halley's method not only provides cubic convergence but also results in a larger basin of attraction. We also note that the Extended Newton method provides at least quadratic convergence close to the root, because it is Newton's method applied to P(x)r(x). In this sense, the EN method is no worse than Newton's method while providing a larger basin of attraction. The advantage of the Corrected Newton over the Extended Newton method is that there is no arbitrary constant to choose; however, it requires the calculation of the second derivative r''(x). With symbolic computation [10] and automatic differentiation [11] becoming commonplace, this limitation is becoming less important. The underlying reason for this behavior remains unclear, especially because the application of this method to Easom's function did not show any such behavior.
On the other hand, the extension of the Corrected Newton method to systems of equations is straightforward and unique (Eq. 23). Even though this approach worked perfectly, it has two computational disadvantages: the calculation and storage of r_{i,jk} with m³ terms, and the solution of two linear systems: first for the standard Newton step (Eq. 20) and then for the correction (Eq. 23).
As a simplification, we proposed its approximation (Eq. 25), which we call the quasi-Corrected Newton method. This approach requires calculating only one diagonal slice of r_{i,jk}, namely r_{i,ij}, and solving only one linear system of equations. The name quasi-Corrected Newton is inspired by the multitude of quasi-Newton algorithms that are used to imitate Newton's method while decreasing the computational requirements. We note that even if explicit expressions for r_{i,jk} are unavailable or cumbersome to calculate, r_{i,ij} can be computed using finite differences in only O(m) operations, which can be easily parallelized (unlike the iterations of Newton's method).
To our knowledge, the quasi-Corrected Newton method has no hitherto reported counterpart. We believe that one of the reasons why these higher-order methods (Halley's method, Householder's, or other higher-order methods) are not widely used in science and engineering is that the faster convergence rates of these methods are usually offset by their higher computational cost at each iteration. In this respect, the quasi-Corrected Newton method provides a good balance between improved convergence properties and computational expense. In addition, the larger basin of attraction also outweighs the small extra computational cost of these proposed methods.
Among the different methods proposed, EN is associated with the least computational expense and, in many cases, outperforms the Corrected Newton method, provided a good value of c is available (see SI). We believe that, in the future, different scientific communities will report optimal ranges of c, obtained through numerical analysis and/or experimentation, for specific problems of wide interest. We foresee a large impact of the proposed methods on the numerical solution of partial differential equations, where the nonlinearities only allow a step-by-step increase in boundary conditions using the classical Newton's method after discretization (a common example being nonlinear finite element analysis in structural mechanics, where the applied loads can only be increased incrementally in the classical framework). The new methods will potentially allow significantly larger step sizes, similar to those reported in [12], thereby substantially reducing solution times.
We believe there are many opportunities for extending this approach to other problems, and we end our discussion by mentioning a few potential avenues. For extending the EN method to vector functions, there are other options as well, which need to be explored in the future.
Similarly, simplifications of the CN method to give quasi-Corrected Newton methods (or quasi-Halley's methods) would create a new area of research. In general, the extension of such root-finding methods to multiple variables is a broad area, which also takes additional factors, such as sparsity and symmetry of the system, into account. The idea of multiplying by a function of x instead of a constant can be used in the development of preconditioners. Unlike linear preconditioners, which are well understood, this leads to nonlinear preconditioning, which is a relatively under-explored area [13].
Application of a similar approach to nonlinear least-squares fitting (or regression) will likely improve the fitting procedure, which is widely used in parameter estimation. Similarly, in optimization problems with multiple unknowns, a line search is commonly used. This approach could be used to improve the convergence of the line search, which is akin to solving an equation in a single unknown. There is also the possibility of extending similar ideas to nonlinear dynamics, which is usually solved by numerical time-integration schemes. It is well-known that an accurate and stable solution of stiff ordinary differential equations, owing to their nonlinearity and coupling of different time scales, requires extremely small time steps. A similar approach of reducing their nonlinearity may be used to improve time-integration methods so that larger time steps can be used.
It took several decades to develop methods based on the classical Newton's method. We anticipate that the new approach presented in this work, which resulted in some known