A Novel Recurrent Neural Network-Based Ultra-Fast, Robust, and Scalable Solver for Inverting a “Time-Varying Matrix”

The concept presented in this paper builds on previous dynamical methods for realizing a time-varying matrix inversion. It is essentially a set of coupled ordinary differential equations (ODEs) which constitutes a recurrent neural network (RNN) model. The coupled ODEs form a universal modeling framework for realizing a matrix inversion: the proposed model converges to the exact inverse if the matrix is invertible, and otherwise converges to an approximate inverse. Although various methods exist to solve a matrix inversion in various areas of science and engineering, most of them assume either that the time-varying matrix inversion is free of noise or that a denoising module is applied before the matrix inversion computation starts. In practice, however, the presence of noise is a very serious problem, and the denoising process is computationally expensive and can violate the real-time property of the system. Hence, a new matrix inversion method that inherently integrates noise cancellation is highly demanded. In this paper, a new combined/extended method for time-varying matrix inversion is proposed and investigated. The proposed method extends both the gradient neural network (GNN) and the Zhang neural network (ZNN) concepts. Our new model is proven to be exponentially stable according to Lyapunov theory. Furthermore, when compared to the related previous methods (namely GNN, ZNN, the Chen neural network, and the integration-enhanced Zhang neural network or IEZNN), it has a much better theoretical convergence speed. Finally, all named models (the new one versus the old ones) are compared through practical examples, and their respective convergence and error rates are measured. It is observed that the proposed method has a better practical convergence rate than the other models.
Regarding the amount of noise, it is demonstrated that a very good approximation of the matrix inverse is obtained even in the presence of noise.


Introduction
Matrix inversion is extensively used in linear algebra (e.g., for solving linear equations). Although matrix inversion is already referred to in very ancient books, tremendous attention has been devoted to it mainly since the 17th century. This interest has led to the development of various methods, concepts, and algorithms for solving linear equations [1]. Solving matrix inversion is very useful in engineering, physics, and other natural sciences [2]. Solving a real-time/online matrix inversion is part of mathematics and control theory. It finds important applications in various areas such as traffic simulation and/or online control in the frame of intelligent transportation systems, robotics (e.g., for kinematics and inverse kinematics) [13][14][15], communications [3], machine learning [4], smart/complex antennas (MIMO) [5][6][7], Field Programmable Gate Arrays (FPGA) [8,9], signal processing [10], and image processing [11,12], etc.
Matrix inversion is also very useful in several decision-making algorithms. There are plenty of heuristics and goal-programming algorithms which use linear relationships to solve problems, and they often do so via matrix inversion; this is the case, for example, in social media networks, where it can help to speed up the ranking amongst different categories [16,17].
Matrix inversion is further widely used and of critical importance in various processing contexts in smart sensors. In some cases, the accuracy of certain measurements can be significantly improved by involving matrix inversion. For example, a real-time matrix inversion is used to ensure high precision in multi-parameter sensing with so-called fiber Bragg grating (FBG)-based sensors [18]. In that approach, the sensor functionality is approximated by a linear system, and the problem is thus solved through a linear system of equations. This approximation has further advantages, such as the capability of reconstructing lost information, which is exploited, for example, in the context of so-called compressed sensing [19,20].
In general, the measurements of different physical quantities related to a dynamical system can be described by linear measurement equations or multi-variate sensing processes [21]. The measurement of such quantities often requires the real-time computation of matrix inversions. Further, the relationship between different measurements can also be used for calibration purposes for the related sensors [22].
Finally, matrix inversion used in sensor-related processing also provides the capability to significantly reduce noise in measurements. For example, matrix inversion helps to reject noisy measurements in spatially distributed sensors [23].
To fulfil the expected role in various applications, different methods have been developed to achieve both fast convergence and high accuracy of matrix inversion-related calculations. Some of the most famous are the following: elimination of variables, Gaussian elimination (also known as row reduction), lower-upper (LU) decomposition, Newton's method, eigenvalue decomposition, Cholesky decomposition, and many others. Broadly, these methods fall into two groups: iterative methods and direct methods.
The first group encompasses iterative methods like Gauss-Seidel or gradient descent. An initial condition (or starting point) is provided, and each step uses the last value to calculate the new one. In each iteration, the solution approximation improves until the desired accuracy is reached [27,28].
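As a minimal sketch of this iterative group, a Gauss-Seidel sweep can be written as follows; the 2x2 system, the tolerance, and all names are our own illustrative choices, not taken from the cited works:

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Iteratively refine x so that A @ x approaches b (Gauss-Seidel sweeps)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # Use the freshest values for components already updated this sweep.
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            break  # desired accuracy reached
    return x

# Toy diagonally dominant system (Gauss-Seidel is guaranteed to converge here).
A = np.array([[4.0, 1.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])
x = gauss_seidel(A, b)
```

Each iteration starts from the previous approximation, exactly as described above; the loop stops once successive iterates differ by less than the tolerance.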
On the other hand, direct methods like Cholesky or Gaussian elimination typically compute the solution in a finite number of steps. They can find the exact solution if there is no rounding error [29]. Those analytical methods normally have a minimum arithmetic complexity of O(n³) (provided the algorithms are implemented on a single CPU). For parallel systems, this arithmetic complexity is different, as the algorithm must be restructured to fit efficient processing on a parallel system architecture. Depending on the parallelizability potential of the algorithm, the number of available parallel cores affects the final effective complexity. However, in real benchmarking implementations, most parallel implementations of these algorithms require the transfer of large amounts of data amongst processors, and this communication need leads to a loss of efficiency with respect to the speed-up achievable with multiple cores. Therefore, implementing the same algorithm on a parallel system is often very inefficient in terms of speed-up. On the other hand, in practice, noise is a very serious problem, and the related denoising process is an expensive one, which can provoke a violation of the system's real-time property. The need for developing a new concept to solve this concern is therefore sufficiently justified. For answering the above-described parallelizability problem, several parallel concepts have been introduced, one good example being the so-called cellular neural network (CNN). It was introduced by Leon Chua in 1988 in the seminal paper entitled "Cellular Neural Networks: Theory" [30]. In 1993, a further article entitled "The CNN Universal Machine: An Analogic Array Computer" was published by Tamas Roska and Leon Chua, where the first analog CNN processor was presented [31].
Since then, many articles have been published showing the applicability of CNN processors for various usages.
In this article, we use a more generic concept called the recurrent neural network (RNN), which is basically the more general neural network family to which CNN belongs. Essentially, RNNs are building elements of the so-called dynamic neural network (DNN) [32]. The parallel computing nature of RNN and its inherent fast processing speed while solving problems make it a very good building block of a novel concept for efficiently solving differential problems on multiple cores [33].
An RNN processor, similarly to its ancestors (artificial neural networks), is a set of cells, which do have mathematical relations (i.e., coupling) with their respective neighbors. The dynamic property of each cell is expressed by one differential equation. Furthermore, the dynamic property of the network can be customized for solving a given ODE (ordinary differential equation) by appropriate values of ODE's parameter settings (or coefficients) expressed in the form of matrices called "templates".
Playing with the templates provides the possibility to solve different kinds of problems without changing the physical properties and/or the architecture of the RNN machine or processor.
The inherent flexibility and fast processing speed of RNN provide two advantages over traditional or competing computational methods or concepts [34][35][36]. First, we can solve different kinds of problems by changing templates. Second, the problem solving can be significantly accelerated (speed-up) without losing accuracy.
The above-mentioned good features of RNN (e.g., flexibility, speed-up in the presence of multiple cores, etc.) motivate the use of this promising paradigm for the accelerated solving of linear algebraic equations on either one-core or multi-core platforms. This is not a simple conversion, as we face completely different hardware computing frameworks with different parameters to be taken care of. All RNN parameters should be adjusted such that the resulting new dynamical system converges to the target solution of the problem.
In this paper, we formulate the central problem (i.e., realizing a "matrix inversion") in a manner such that the final state of the RNN system is the solution of the target linear algebraic system of equations; see Equation (1).
In more detail, we define the linear algebraic equation system as Equation (1):

M(t) X(t) = I, (1)

where M(t) is an n × n matrix of smooth, time-varying real values, X is an n × n matrix, and I is the identity matrix of size n × n. We would like to find X such that Equation (1) is satisfied. Our target is to find corresponding RNN templates such that, for any initial value problem (IVP), after a finite number of iterations our dynamic system (see Equation (2)) converges to a solution X*, which approximates X.
Equation (2) shows an ODE in its general form, Ẋ(t) = f(X(t), t); its second part shows our problem, which we wish to convert into the form of such an ODE-solving task.
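The problem statement can be restated numerically as follows; the concrete M(t) below is a hypothetical smooth, always-invertible test matrix of our own choosing, not one from the paper:

```python
import numpy as np

# Equation (1) asks for X(t) such that M(t) @ X(t) equals the identity.
# Illustrative smooth time-varying matrix; det = (cos t + 2)^2 + sin^2 t >= 1,
# so M(t) is invertible for every t.
def M_of(t):
    return np.array([[np.cos(t) + 2.0, np.sin(t)],
                     [-np.sin(t), np.cos(t) + 2.0]])

def residual(X, t):
    """Frobenius norm of M(t) X - I; zero exactly when X solves Equation (1)."""
    return np.linalg.norm(M_of(t) @ X - np.eye(2))

# Ground-truth inverse at t = 1: the target final state of the dynamic system.
X_star = np.linalg.inv(M_of(1.0))
```

A dynamic system solving Equation (1) should drive this residual to zero as its state evolves.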
There exist several published works, in which one has tried to solve similar problems with dynamical systems like recurrent neural networks (RNN) [13,[37][38][39][40][41][42][43] or artificial neural networks (ANN) [44][45][46][47], etc. Although most of those works are also related to RNN, our work and the novel concepts presented in this paper have more potential to reliably provide faster convergence towards the solution of Equation (1).
One of the main differences between our method and other related works, especially RNN concepts, is that the proposed model does not need training, as is required for most RNN networks. Our model converges to the problem's solution if and only if a correct selection/setting of the internal dynamic parameters is done. This property makes our model unique amongst other types of RNN models, which are usually used in machine learning [48].
For training, most of the published related works use common traditional methods like gradient descent [39] to create dynamical systems and do then customize them for an implementation on a target platform.
In this paper, we compare the performance of our novel method with already implemented traditional methods from the relevant literature. This provides the basis for a fair judgement of the advantages and disadvantages of each method, the old ones versus our new one.
The overall structure of this paper is as follows. Section 2 contains both background and a comprehensive overview of previous dynamic system models used for matrix inversion; besides, related requirements with respect to the target platform for implementing the dynamic system (RNN, ANN, etc.) are discussed. In Section 3, we formulate our problem (i.e., inverting a time-varying matrix) with the required restrictions related to RNN. The effectiveness of the developed model for solving the matrix inversion problem is demonstrated in Section 4. The main proof and most interesting result is the one showing the global (fast) convergence of the RNN-based dynamic system to the final solution of Equation (1).
In Section 5, the new method will be used to create a new dynamical system, i.e., an RNN model. The performance of this novel RNN will be compared to that of previous/original concepts.
In the last section (i.e., Section 7), some concluding remarks summarize the quintessence of the key achievements in this work.

Related Works of Dynamical Neural Networks
Solving Equation (1) using any dynamic method like the so-called dynamic neural network (DNN) requires creating a dynamic system with an attractor (if and only if Equation (1) has a real solution), which is a fixed point equal to the solution of Equation (1). Such systems can be implemented in different ways, but the most famous way of creating them is to use either the gradient descent method, the Zhang dynamics, or the Chen dynamics. All three methods essentially use the same concept, but the second and third use smaller step sizes and therefore require more memory; this can improve/speed up the convergence of the algorithms. In the next sections, we provide a full explanation of the advantages of each method; afterwards, a generic new method for solving Equation (1) is provided.

The Gradient Method
This method involves a first-order optimization technique for finding the minimum point of a function. The technique takes steps in the direction of the negative gradient. It is based on the observation that the multivariate function F(X) decreases fastest at a point a in the direction of the negative gradient of F at a, i.e., −∇F(a):

b = a − γ∇F(a), γ ∈ R, γ > 0, (3)

where γ is the step size for updating the decision neurons. For small γ, F(a) ≥ F(b); with this remark, we can extend the observation to the iteration of Equation (4):

a_{k+1} = a_k − γ∇F(a_k). (4)
That means that, by iterating Equation (4), F(X) converges to the minimum point of the function F.
Gradient descent (see Equation (5)) can also be described as the "Euler method" discretization for solving the ordinary differential equation

Ẋ(t) = −γ∇F(X(t)). (5)
Gradient descent can be used to solve linear equations. In that case, the problem is reformulated as the minimization of a quadratic function,

F(X) = (1/2) ‖MX − I‖², (6)

which is then minimized with the gradient descent method.
Then, based on Equation (6), we have:

∇F(X) = M^T (MX − I). (7)

By substituting Equation (7) into Equation (5), the following dynamic model is obtained:

Ẋ(t) = −γ M^T (M X(t) − I). (8)
The exact analytical solution of Equation (8) is expressed as follows:

X(t) = M^{-1} + e^{−γ M^T M t} (X(0) − M^{-1}), (9)

whereby the first derivative of X(t), denoted Ẋ(t), is:

Ẋ(t) = −γ M^T M e^{−γ M^T M t} (X(0) − M^{-1}). (10)
Hence, the dynamic system of Equation (8) and the algebraic system of Equation (1) converge to the same solution, namely the point M^{-1}.
For all γ > 0 and nonsingular M, γ M^T M > 0 holds. This method is also called "gradient-based" dynamics. It can be designed from norm-based energy functions [45,49]. The advantage of this model is its ease of implementation; however, due to the convergence factor visible in Equation (9), it takes a long time to converge to the solution of the problem, and this convergence rate affects the noise sensitivity of the model, making it more sensitive to noise. Also, this model is not appropriate for real-time (time-varying) matrices.
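A minimal sketch of these gradient (GNN) dynamics, Equation (8), integrated with the explicit Euler method; the gain γ, the step size, and the test matrix are our own illustrative choices:

```python
import numpy as np

# Euler integration of the GNN dynamics X' = -gamma * M^T (M X - I)
# for a constant matrix M. Each step moves X along the negative gradient
# of F(X) = 0.5 * ||M X - I||_F^2.
def gnn_inverse(M, gamma=10.0, dt=1e-3, steps=20000):
    n = M.shape[0]
    I = np.eye(n)
    X = np.zeros((n, n))  # arbitrary initial value X(0)
    for _ in range(steps):
        X = X - dt * gamma * (M.T @ (M @ X - I))
    return X

M = np.array([[2.0, 1.0], [1.0, 3.0]])  # illustrative nonsingular matrix
X = gnn_inverse(M)
```

With this step size the discrete iteration is stable, and the state X slowly approaches M^{-1}, mirroring the slow convergence factor noted above.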

Zhang Dynamics
There exists another method [13,50] to create a dynamic system converging to the required solution. In this method, the error function is defined as:

E(t) = M(t) X(t) − I. (11)

This means the error increases proportionally to the amount of divergence of X from its true solution; larger values of the difference create larger values of the error.
Equation (11) changes over time in a way that fits the form of Equation (2). This requires one to move against the error to come closer to the solution. Therefore, one can define the derivative of this function as:

Ė(t) = −γ E(t). (12)
The solution of Equation (12) can be expressed as Equation (13):

E(t) = E(0) e^{−γt}. (13)

This equation shows how the error function decreases exponentially, in the direction opposite to the error, which forces the system to converge to the solution. Using this dynamic helps to create a new dynamic system for our problem: by substituting E(t) and Ė(t) into Equation (12), we obtain a new dynamic system [11]:

M(t) Ẋ(t) = −Ṁ(t) X(t) − γ (M(t) X(t) − I). (14)
The solution of Equation (14) is expressed in Equation (15); taking the first derivative of Equation (15) yields Equation (16).
By combining Equations (15) and (16), we recover Equation (14). It can again be observed that, as t goes to infinity, the dynamic system converges to the solution of Equation (1). This model has a very good convergence rate, which mitigates the problem of noise sensitivity. On the other hand, it is much more difficult to implement.
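The Zhang dynamics of Equation (14) can be sketched numerically as follows. The time-varying test matrix, gain γ, and step size are our own illustrative choices; the implicit left-hand side M(t)Ẋ is resolved here with a linear solve at each step (a hardware RNN realizes this without an explicit solve):

```python
import numpy as np

# Euler sketch of the ZNN dynamics for a time-varying matrix:
# M(t) X'(t) = -M'(t) X(t) - gamma * (M(t) X(t) - I).
# Illustrative M(t); det = (sin t + 2)^2 + cos^2 t >= 1, so always invertible.
def M_of(t):
    return np.array([[np.sin(t) + 2.0, np.cos(t)],
                     [-np.cos(t), np.sin(t) + 2.0]])

def dM_of(t):  # analytic time-derivative of M(t)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t), np.cos(t)]])

def znn_track(T=2.0, gamma=100.0, dt=1e-4):
    I = np.eye(2)
    X = np.zeros((2, 2))  # arbitrary X(0)
    t = 0.0
    while t < T:
        M, dM = M_of(t), dM_of(t)
        rhs = -dM @ X - gamma * (M @ X - I)
        X = X + dt * np.linalg.solve(M, rhs)  # resolve M X' = rhs for X'
        t += dt
    return X
```

Because the error obeys Ė = −γE, the state tracks the moving inverse M(t)^{-1} with an exponentially shrinking error.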

Chen Dynamics
This method is a combination of the two previously described methods. It assumes that the matrix M does not change over time; therefore, the time-derivative of M is zero. If we multiply Equation (8) by M and then sum it with Equation (14), we obtain a new dynamic system, Equation (17) [39]. Solving Equation (17) leads to the solution given in Equation (18), and differentiating Equation (18) leads back to the dynamical system of Equation (19).
Combining Equations (17) and (18) leads to Equation (19). If M^T M > 0, we can state that this method has a better convergence rate than the first and second ones mentioned above [39]. As seen in Equation (19), t is multiplied by (M^T M + I) in the exponent term, which produces a larger exponent value than those of the previous two methods. This model is better than the previous models: it has a very good convergence rate and its sensitivity to noise is very low. On the other hand, its implementation is difficult, as it has more coefficients to calculate than the other methods, and it is not suited for real-time matrix inversion (i.e., for inverting a time-varying matrix).

Summary of the Main Previous/Traditional Methods
Comparing the properties of the above-presented methods reveals big differences amongst them. Table 1 shows the major differences between those models using four criteria. The convergence rate refers to the convergence over time during the solving process. The "time-varying matrix inversion" criterion refers to the ability of a given model to be used for inverting a time-varying matrix. The "implementation" criterion refers to how easy it is to implement a given model on RNN machines/processors. The last criterion in Table 1 refers to how sensitive a given model is to noise present in the time-varying matrix values. This last criterion is very important because it expresses the resilience of a given model to noise, which is always present in analog computing signals. Although noise is relatively low in digital systems, digital systems nevertheless introduce a noise-equivalent signal distortion originating from the computational rounding of numbers as they are digitally represented and processed in fixed-size digital arithmetic.
The Zhang model has a fixed convergence rate over time, whereas the gradient descent model contains a coefficient which can be changed to influence (increase or decrease) the convergence rate. The Chen model is much better than the two previous ones, as it potentially provides a much better convergence rate.
The convergence rate has a direct impact on the noise sensitivity of a given model: an increase of the convergence rate decreases the noise sensitivity. Therefore, the Chen model has the highest level of stability with respect to noise, although it is the most complex to implement.
Furthermore, amongst the three models listed in Table 1, only the Zhang model offers the capability of inverting a time-varying matrix (i.e., a real-time matrix inversion).

Our Concept: The Novel RNN Method
According to Chen [39], his model converges to the solution of Equation (1) under any initial value. One can, however, re-formulate the Chen model [39] as the result of the goal function given in Equation (20). We can add another positive term to Equation (20): we multiply the last term of Equation (20), i.e., the term ((MX − I)²), by the matrix M and add it again to the function Z.
After adding that new term to the right-hand side of Equation (20) and solving Z (according to Equation (21)), one obtains the dynamical system of Equation (22). The solution of this equation can be expressed as Equation (23): as time goes to infinity, the limit of X converges to M(t)^{-1}, thereby providing the solution of Equation (1). C is a constant value (a matrix) which appears while solving Equation (22); see Table 2 for an illustration. The newly added terms (M^T M)² + M^T M produce a better convergence rate. The main reason for this convergence rate is the positive value of the integral, which provides additional factors compared to the previous time-varying models. Therefore, by adding more coefficients to the right-hand side of Equation (22), we obtain the following, more general equation:

M(t) Ẋ(t) = −Ṁ(t) X(t) − γ (Σ_{i=0}^{n} (M^T M)^i) (M(t) X(t) − I). (24)

Equation (24) is more general and reduces to the model of Equation (22), which is a specific configuration of it; for this, we just need to set the parameter n to 2.

Theorem 1. For any given nonsingular matrix M ∈ R^{n×n}, the state matrix X(t) ∈ R^{n×n}, starting from any initial value problem (IVP) X(0) ∈ R^{n×n}, Equation (24) achieves global convergence to X*(t) = M^{-1}(t).
Proof of Theorem 1. Let E(t) = X(t) − X*(t) be the error value during the process of finding the solution. Multiplying this equation by M leads to M E(t) = M X(t) − M X*(t) = M X(t) − I. Thus, the derivative of the error function is d/dt (M E(t)) = Ṁ(t) X(t) + M(t) Ẋ(t). By replacing this in Equation (24), we obtain the following expression:

d/dt (M E(t)) = −γ (Σ_{i=0}^{n} (M^T M)^i) M E(t). (25)

Let us define the Lyapunov function v(t) = ‖M E(t)‖², which is always a positive function (for a non-zero error). The derivative of this function can be obtained as in Equation (26); by replacing Equation (25) into Equation (26), it leads to:

v̇(t) = −2γ tr[(M E(t))^T (Σ_{i=0}^{n} (M^T M)^i) (M E(t))].

Hence, one can bound the middle term Σ_{i=0}^{n} (M^T M)^i from below by a large enough value μ > 0 (note that the i = 0 term is the identity), so that v̇(t) ≤ −2γμ ‖M E(t)‖². Thus, v̇(t) is always negative; furthermore, v̇(t) = 0 if and only if X(t) = X*(t). Therefore, our differential equation globally converges towards a point (matrix) which is the equilibrium point of this function. □
Equation (30) is the result of analytically solving Equation (24). Increasing t in this equation leads to the solution of the algebraic equation (i.e., of Equation (1)).
In this equation, C is a constant value (matrix) which appears while solving the differential equation. Obviously, this equation has a much better rate of convergence compared to the previous implementations; and, as for the previous solutions/concepts, the minimum eigenvalue of M^T M should be positive.
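A numerical sketch of the proposed dynamics, reading Equation (24) as MẊ = −ṀX − γ(Σ_{i=0}^{n}(M^T M)^i)(MX − I), here for a constant matrix M (so Ṁ = 0); γ, the step size, and the test matrix are our own illustrative choices:

```python
import numpy as np

# Euler integration of the proposed dynamics for constant M:
# M X' = -gamma * sum_{i=0}^{n} (M^T M)^i (M X - I).
def novel_inverse(M, n=2, gamma=5.0, dt=1e-4, steps=20000):
    In = np.eye(M.shape[0])
    # The gain sum_{i=0}^{n} (M^T M)^i is constant when M is constant.
    gain = sum(np.linalg.matrix_power(M.T @ M, i) for i in range(n + 1))
    X = np.zeros_like(M)  # arbitrary initial value X(0)
    for _ in range(steps):
        rhs = -gamma * gain @ (M @ X - In)
        X = X + dt * np.linalg.solve(M, rhs)  # resolve M X' = rhs for X'
        # the error E = M X - I then decays as E' = -gamma * gain * E
    return X

M = np.array([[2.0, 1.0], [1.0, 3.0]])  # illustrative nonsingular matrix
X_hat = novel_inverse(M)
```

Setting n = 0 recovers the Zhang dynamics; larger n enlarges the gain spectrum and hence speeds up the decay of the error, as claimed above.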
Also, following the Chen model, if Equation (24) is extended by introducing a monotonically increasing function F with F(0) = 0, the system again converges to the solution of Equation (1):

M(t) Ẋ(t) = −Ṁ(t) X(t) − γ (Σ_{i=0}^{n} (M^T M)^i) F(M(t) X(t) − I). (31)

Theorem 2. For any given nonsingular matrix M ∈ R^{n×n}, the state matrix X(t) ∈ R^{n×n}, starting from any IVP (initial value problem) X(0) ∈ R^{n×n} and with a monotonically increasing function F where F(0) = 0, Equation (31) achieves global convergence to X*(t) = M^{-1}(t).

Proof of Theorem 2. As before, let E(t) = X(t) − X*(t), so that M E(t) = M X(t) − I and d/dt (M E(t)) = Ṁ(t) X(t) + M(t) Ẋ(t). By replacing this in Equation (31), we obtain the following expression:

d/dt (M E(t)) = −γ (Σ_{i=0}^{n} (M^T M)^i) F(M E(t)). (32)

Let us define the Lyapunov function v(t) = ‖M E(t)‖², which is always a positive function. The derivative of this function can be obtained as in Equation (33); by replacing Equation (32) into Equation (33), it leads to:

v̇(t) = −2γ tr[(M E(t))^T (Σ_{i=0}^{n} (M^T M)^i) F(M E(t))].

One can again bound the middle term Σ_{i=0}^{n} (M^T M)^i from below by a large enough value μ > 0. In the last expression, (M E(t))^T F(M E(t)) is always positive because if (an element of) M E(t) becomes negative, the corresponding element of F(M E(t)) also becomes negative, and vice-versa. Thus, v̇(t) is always negative, and v̇(t) = 0 if and only if X(t) = X*(t). Therefore, the differential Equation (31)/(32) globally converges towards a point (matrix) which is the equilibrium point of this function. □
By choosing different forms of the function F, various dynamical properties can be expressed by this model. Examples of suitable functions for F are: sigmoid, linear, cubic, arctangent, hyperbolic tangent, etc. All of these are suitable for use in Equation (31), as they are monotonically increasing functions and all satisfy the F(0) = 0 condition (see Figure 1).
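The admissibility conditions on F (monotonically increasing, F(0) = 0) can be checked numerically for the candidates used later in the examples; this small sketch is our own illustration:

```python
import numpy as np

# Candidate element-wise activation functions F for Equation (31).
# Each must be monotonically increasing and satisfy F(0) = 0 (Theorem 2).
activations = {
    "linear": lambda x: x,
    "cubic": lambda x: x ** 3,
    "arctan": np.arctan,
    "tanh": np.tanh,
}

xs = np.linspace(-3.0, 3.0, 601)
for name, F in activations.items():
    assert abs(float(F(np.array([0.0]))[0])) < 1e-12  # F(0) = 0
    assert np.all(np.diff(F(xs)) >= 0)                # monotonically increasing
```

Any of these can be plugged into Equation (31) as the element-wise activation F without breaking the convergence argument of Theorem 2.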

Model Implementation in SIMULINK

Equation (24) or Equation (31) can be implemented directly in SIMULINK; the resulting model is shown in Figure 2, and the corresponding RNN block diagram in Figure 3. Note that the matrix to be inverted is M. The dynamic model has the components shown in Table 2. If M is not a time-varying matrix, we can simply put zero values in the M' matrix; otherwise, M' should be the corresponding time-derivative of the matrix M. Executing the model produces the SIMULINK output which is the solution of Equation (1); therefore, the model works as expected and delivers the solution of Equation (24).

Table 2. Components of the SIMULINK model to implement Equation (24).

Component | Description
Output (scope) | The SIMULINK output, i.e., the state matrix converging to the solution of Equation (1).
M' | Time-derivative of M; if M is constant, just put zero values in the matrix M'.
C | Weight matrix computed for the chosen setting of n (e.g., n = 2). Note: if the matrix M is time-varying, C will no longer be constant and will also be time-varying.
Illustrative Examples
In this section we consider some numerical examples to demonstrate the performance of our RNN implementation for solving a system of linear algebraic equations.

Illustrative Example 1
Let us define M and V as follows (here, instead of the identity matrix in Equation (1), the matrix V is used on the right-hand side). Thus, one can find X as follows. This value of X is the target value (ground truth). In the following, we shall see how closely and how fast the developed models (Equation (24) and/or Equation (31)) reach this target value.
M is not singular and can therefore be inverted; thus, our system of equations has one unique solution. C, as explained in the previous section (see Table 2), is a weight-value matrix calculated for the chosen n. Increasing the value of n will increase the convergence rate of Equation (24) or Equation (31). Here, we choose the value n = 3. The corresponding RNN parameters are defined as follows (see Figure 3). These parameters are implemented in the previously described SIMULINK model (see Figure 3) for the numerical simulations, and F is (for simplicity) taken as a linear function (Figure 1).
Figure 4 shows the system simulation results over the time interval t = [0, 0.5]. The output (value of the state vector) at the end of this interval shows that our system model comes very close to the analytical solution; the difference is small (approximately 0.001) and can be further reduced by adjusting the step size.

Illustrative Example 2
Now we repeat the experiment, this time computing the inverse of a time-varying matrix M. The matrix M is defined as follows: M is invertible for all t and satisfies the conditions of Equation (1); therefore, we can use it to test our dynamic system model.

Also, we define F as a linear function (Figure 1). As explained previously (see Table 2), we calculate the weight matrix C again for n = 3 as follows: Now all parameters are introduced into the Simulink model to simulate our system (see Equation (24)). The simulation is run for t = [0, 2.0]. Figure 5 contains benchmarking results for the first element of the time-varying matrix (i.e., the element in the first row, first column). Here (see Figure 5) the speed of convergence is compared to that of the original Zhang model, considering different values of the parameter n for our model. One can clearly see that our novel model strongly outperforms the original Zhang model. Table 3 shows the difference very clearly: by choosing n = 3, our model converges to the problem solution 4 times faster than the ZNN model; note that the ZNN model corresponds to n = 0 in Table 3. The amount of memory needed to store this model is very small, since only the main template parameters of the dynamic model in Equation (24) or Equation (31) must be saved. Therefore, as seen in Table 3, the memory usage is very small and the model can be implemented on small computers.
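The ZNN baseline used for comparison above (the n = 0 case) can be sketched as follows. The classical Zhang design enforces Ė = −γF(E) on the inversion error E(t) = M(t)X(t) − I, which with linear F yields MẊ = −ṀX − γ(MX − I). The time-varying matrix M(t) below (invertible for all t) and all parameters are illustrative assumptions; the paper's own M and its model in Equation (24)/(31) are not reproduced here.

```python
import numpy as np

# An assumed time-varying test matrix, invertible for all t
# (its determinant is (2 + sin t)^2 + 0.25 cos^2 t > 0).
def M(t):
    return np.array([[2.0 + np.sin(t), 0.5 * np.cos(t)],
                     [-0.5 * np.cos(t), 2.0 + np.sin(t)]])

def M_dot(t):  # analytical time derivative of M(t)
    return np.array([[np.cos(t), -0.5 * np.sin(t)],
                     [0.5 * np.sin(t), np.cos(t)]])

gamma = 50.0     # convergence gain (assumed)
dt = 1e-4        # Euler step size (assumed)
X = np.eye(2)    # initial guess for M(0)^{-1}

t = 0.0
for _ in range(20000):                       # simulate t in [0, 2]
    E = M(t) @ X - np.eye(2)                 # inversion error E(t)
    # ZNN design: M Xdot = -Mdot X - gamma * F(E), with F linear
    X_dot = np.linalg.solve(M(t), -M_dot(t) @ X - gamma * E)
    X = X + dt * X_dot
    t += dt

print(np.max(np.abs(X @ M(t) - np.eye(2))))  # X tracks M(t)^{-1}
```

The derivative feed-forward term −ṀX is what allows the state to track the moving inverse rather than lag behind it; the paper's model accelerates the decay of E beyond this baseline.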

Illustrative Example 3
In the last illustrative experiment, we test and analyze the effect of the choice of the function F in Equation (31) on the convergence rate of our model.
For this test, we take the same parameter setting as in the first experiment (see Illustrative Example 1), but we change the function F and calculate the coefficients for n = 3. The selected functions are the following: the linear function (Y = X), the sigmoid function (Y = ½(|X + 1| − |X − 1|)), the arctangent function (Y = arctan(X)), the hyperbolic tangent function (Y = tanh(X)), and the polynomial function (Y = X³). All other parameters are kept fixed so as to isolate the effect of the function choice. As the results of this simulation experiment show (see Figure 6), the polynomial function exhibits by far the best convergence amongst all considered functions. Therefore, it is evident that selecting an appropriate function helps our model converge faster to the solution of the matrix inversion problem. It is also evident that higher values of n perform much better. Table 4 shows our simulation results up to t = 0.05 for this illustrative example; it clearly shows the effect of the different functions on the convergence of the system/model. The polynomial and linear functions are the best choices for this task, while the sigmoid and tanh functions are the worst in this matrix inversion context. Figure 6. Differences in convergence speed for different function types F in Equation (28). Note that n = 3 for the polynomial function F.
The superiority with respect to convergence speed of the polynomial function F is very evident.
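The qualitative ranking of the activation functions can be reproduced with a small scalar sketch: for the error dynamics ė = −γF(e), the cubic F drives a large initial error down far faster than the saturating functions, whose output is bounded near ±1. The initial error, gain, and horizon below are assumed values chosen to match the large-error regime; this is not the paper's simulation setup.

```python
import numpy as np

# Scalar error dynamics e' = -gamma * F(e) for the activation choices
# discussed in the text. e0, gamma, dt, and the horizon are assumptions.
def simulate(F, e0=10.0, gamma=100.0, dt=1e-5, steps=500):
    e = e0
    for _ in range(steps):              # forward Euler over t in [0, 0.005]
        e = e - dt * gamma * F(e)
    return abs(e)                       # residual error at the horizon

activations = {
    "linear":     lambda x: x,
    "sigmoid":    lambda x: 0.5 * (abs(x + 1) - abs(x - 1)),
    "arctan":     np.arctan,
    "tanh":       np.tanh,
    "polynomial": lambda x: x ** 3,     # cubic, as in the text
}

residuals = {name: simulate(F) for name, F in activations.items()}
for name, r in sorted(residuals.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {r:.3e}")
```

For a large error, F(e) = e³ produces a much stronger restoring term than the bounded tanh, arctan, and sigmoid outputs, which is consistent with the ranking reported in Table 4 (polynomial and linear best, sigmoid and tanh worst).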

Comparison of Our Novel Method with Previous Studies
In this section, we compare our novel model with the following related methods, well-known from the relevant literature: the gradient descent method, Zhang dynamics, and Chen dynamics. For this purpose, 1000 different static random matrices are generated and used in the experiment, and the results are summed up as shown in Figure 7. Figure 7 displays how the error converges towards zero as the different models are executed over time, thereby illustrating the convergence speed of all considered models. The error function is calculated using the formula ‖MX − I‖.
Comparing the different methods with ours (Figure 7), the fastest convergence to the exact solution is observed for our method. Specifically, our method is at least 3 times faster than the Chen method when implemented on a CPU (the results of Figure 7 were obtained on a CPU). An implementation on a multi-core platform should display an even higher relative speed-up for our model. Obviously, by using more coefficients (i.e., higher values of n) in Equation (24), we can reach a much better convergence rate.
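The benchmarking protocol above can be sketched as follows, using the GNN dynamics as a stand-in for the compared models (the paper's own Equation (24) is not reproduced in this excerpt): many random invertible matrices are generated, each is inverted by running the dynamics, and the error ‖MX − I‖ is summed across trials at every time step. The trial count, matrix size, and all parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gnn_error_curve(M, gamma=20.0, dt=1e-3, steps=500):
    """Run GNN inversion dynamics X' = -gamma * M^T (M X - I) on a
    static matrix M and record the error ||M X - I||_F at each step."""
    n = M.shape[0]
    X = np.zeros((n, n))
    errs = []
    for _ in range(steps):
        E = M @ X - np.eye(n)
        errs.append(np.linalg.norm(E))   # Frobenius norm of the error
        X -= dt * gamma * (M.T @ E)
    return np.array(errs)

# Sum the error curves over many random, well-conditioned matrices
# (100 trials here instead of the paper's 1000, for speed).
total = np.zeros(500)
for _ in range(100):
    M = np.eye(3) + 0.3 * rng.standard_normal((3, 3))  # invertible w.h.p.
    total += gnn_error_curve(M)

print(total[0], total[-1])  # the summed error decays over time
```

Plotting `total` against the step index yields one curve of the kind shown in Figure 7; repeating the loop for each competing dynamics and overlaying the curves gives the convergence-speed comparison.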

Conclusions
A novel method for solving the very important "matrix inversion" problem has been developed and validated in this paper. Consequently, it can also solve linear algebraic systems of equations. The concept developed is essentially an RNN-based analog computing machine.
We have compared this novel method with other dynamical systems methods from the relevant literature. It has been demonstrated that our novel method does theoretically have an exponential convergence rate towards the exact solution of the problem for any IVP (initial value problem). Also, the convergence rate of this method is much higher than those of other related competing concepts.
It is further possible to customize this novel model in order to enable its implementation on a cellular neural network processor machine; this has already been done in our previous works.
For validation, we have extensively compared our method with relevant competing methods, namely the gradient descent, Zhang, and Chen methods, in one large simulation experiment. Here, we have considered the following scenario: 1000 random matrices M are generated and applied to the different dynamical system models for matrix inversion. At each time point, the errors reached are summed up using the formula ‖MX − I‖.
It has become clear that with our novel method we reach a visible, significant speed-up, even on a single CPU, compared to the other methods. Using more coefficients (i.e., higher values of n in the model Equations (24) and (31)) results in an even higher convergence rate. On the other hand, using more coefficients requires more preparation time to create the templates and more computing resources to run our novel RNN processor dynamical model. Therefore, it is recommended to strike a balance/trade-off between the convergence rate and the number of required coefficients (i.e., the value of n).
To finish, the last experiments have demonstrated that using the polynomial function for F in Equation (31) leads to a clearly higher convergence rate compared to the other types of function; furthermore, the higher the polynomial order, the better. Figure 7. Comparison of different DNN models with respect to convergence speed under the same conditions (benchmarking). In this figure, GD refers to the gradient descent method. Our novel method ("novel" in the figure; Equation (24)) is computed for n = 3. The fast convergence of our method to the exact solution is clearly demonstrated; this convergence would be even faster for higher values of n (n > 3).

Conflicts of Interest:
The authors declare no conflict of interest.