Fully complex conjugate gradient-based neural networks using Wirtinger calculus framework: Deterministic convergence and its application☆
Introduction
Complex-valued neural networks (CVNNs) have recently been applied successfully in computational intelligence, pattern recognition and signal processing (Chen et al., 2016, Fink et al., 2014, Liu et al., 2016, Nait-Charif, 2010, Tanaka, 2013). Differing from real-valued network models, CVNNs employ complex-valued parameters and variables to process data in the complex field. In particular, CVNNs can reduce the number of parameters and operations, and perform well on classification problems (Aizenberg, 2011, Nitta, 2003). Depending on the activation function, there are two main kinds of CVNNs: the fully CVNNs (FCVNNs) (Kim and Adali, 2003, Li et al., 2005, Savitha et al., 2012) and the split CVNNs (SCVNNs) (Nitta, 1997). Fully complex activation functions have significant advantages (e.g. some elementary transcendental functions provide adequate squashing-type nonlinear discrimination with well-defined first-order derivatives (Kim & Adali, 2003)) and have been successfully used in training various network models such as multi-layer perceptrons (Kim & Adali, 2003), extreme learning machines (Li et al., 2005) and radial basis function networks (Savitha et al., 2012).
For real-valued neural networks, the BP algorithm based on the gradient descent method (BPG) is the most widely used learning strategy (Li and Zhao, 2017, Xie, 2017, Zhao et al., 2016). As an extension, the complex BPG has been applied to train SCVNNs and FCVNNs, yielding the split complex BPG (SCBPG) (Nitta, 1997, Zhang, Xu et al., 2014, Zhang et al., 2009) and the fully complex BPG (FCBPG) (Li and Adali, 2008, Xu et al., 2015, Zhang, Liu et al., 2014), respectively. However, these methods converge slowly because consecutive steps use only the negative gradient of the error function as the updating direction, and the convergence behavior of network training is significantly affected by this direction. There are already some modifications of the gradient descent method that speed up the convergence rate (Lu et al., 2002, Papalexopoulos et al., 1994). Unfortunately, these improvements still do not solve the problem very well, particularly when the training procedure meets steep valleys.
In solving optimization problems, the conjugate gradient (CG) and Newton methods are two common alternative training schemes (Goodband et al., 2008, Saini and Soni, 2002). Although the Newton method has the fastest convergence of the three, it must compute the Hessian matrix and its inverse, which imposes a huge computational burden, especially for large-scale problems. As a compromise, the CG method not only converges quickly but can also be implemented without calculating the second derivatives of the error function (Hagan et al., 1996, Nocedal and Wright, 2006).
As a result of its fast convergence and low computational requirements, the CG method has attracted more and more attention in training real-valued neural networks. In the early stage, the linear and nonlinear CG methods were introduced to solve linear systems whose coefficient matrix is positive definite (Hestenes & Stiefel, 1952) and large-scale nonlinear optimization problems (Fletcher & Reeves, 1964), respectively. According to the choice of conjugate direction parameter, CG methods fall into three typical types: the Hestenes–Stiefel (HS) (Hestenes and Stiefel, 1952, Liu and Li, 2011), Fletcher–Reeves (FR) (Fletcher & Reeves, 1964) and Polak–Ribière–Polyak (PRP) (Polak and Ribiere, 1969, Polyak, 1969, Wan et al., 2017) CG methods. To improve the convergence, Sun and Liu (2004) proposed a modified conjugate direction combined with an Armijo search method. With a non-monotone line search, a modified PRP CG method was introduced to solve non-smooth convex optimization problems (Yuan & Wei, 2016). Without a line search technique, a non-monotone HS conjugate gradient method was presented to guarantee a sufficient descent direction (Dong, Liu, Li, He, & Liu, 2016). For training FCVNNs, only a few studies apply the conjugate gradient method. To solve complex quadratic programming systems, two fast complex-valued optimization algorithms were proposed in Zhang and Xia (2016) which considered linear constraints with and without -norm. In Zhang and Xia (2018), two complex-valued algorithms were presented to deal with nonlinearly constrained optimization of real-valued functions. In this paper, we attempt to construct a novel conjugate gradient method for fully complex-valued neural network models, through which one can obtain a sufficient descent direction during the training procedure.
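The three classical conjugate coefficients above differ only in how the previous search direction is reweighted. The following is an illustrative real-valued sketch (the function names and the small quadratic test problem are ours, not from the paper):

```python
import numpy as np

def beta_fr(g_new, g_old, d_old):
    # Fletcher-Reeves: ratio of successive squared gradient norms.
    return (g_new @ g_new) / (g_old @ g_old)

def beta_prp(g_new, g_old, d_old):
    # Polak-Ribiere-Polyak: uses the gradient difference y = g_new - g_old.
    return (g_new @ (g_new - g_old)) / (g_old @ g_old)

def beta_hs(g_new, g_old, d_old):
    # Hestenes-Stiefel: same numerator as PRP, normalized by d_old^T y.
    y = g_new - g_old
    return (g_new @ y) / (d_old @ y)

def cg_direction(g_new, g_old, d_old, beta_fn):
    # Next search direction: steepest descent plus a beta-weighted
    # contribution of the previous direction.
    return -g_new + beta_fn(g_new, g_old, d_old) * d_old
```

On a quadratic objective with exact line search, all three coefficients reduce to the linear CG of Hestenes and Stiefel; they only behave differently on general nonlinear problems.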
In training network models, the step size (learning rate) is, besides the updating direction, another crucial factor affecting the convergence speed. It is a common technique to set the learning rate to a positive constant when training CVNNs. In Xu et al. (2015), Zhang et al. (2009), Zhang, Liu et al. (2014) and Zhang, Xu et al. (2014), the learning rates in the weight updating formulas were all set to positive constants. However, a larger learning rate may lead to a more pronounced zig-zagging trajectory during training, while a smaller one makes efficient convergence hard to achieve. As improved strategies, some exact line search methods (Magoulas, Vrahatis, & Androulakis, 1997) were employed to obtain a suitable learning rate when training real-valued neural networks. However, these methods are generally inefficient and unreliable when the initial points are not near the optimum. More importantly, they are very time-consuming, since satisfying the exact line search conditions requires much large-scale computation. To reduce the computational burden, inexact line search techniques are preferred, such as the Wolfe search rule (Wang & Chen, 2015) and the Armijo search rule (Dong, Yang, & Huang, 2015). Based on the generalized Armijo search criterion, a three-term conjugate gradient method was discussed in Sun and Liu (2004) to handle unconstrained optimization problems; it provided a different proof strategy for the convergence properties of neural network models. According to the numerical results in Wang, Zhang, Sun, Hao, and Sun (2018), this generalized Armijo step size rule shows competitive performance in searching for a suitable learning rate while training real-valued network models. It achieves sufficient descent of the error function in the iterative updating process and consequently yields deterministic convergence of the weight sequence.
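The backtracking form of the Armijo rule can be sketched as follows. The parameter values (`sigma`, `rho`) are illustrative defaults; the generalized Armijo criterion used in the paper additionally bounds the step from below, which this minimal version omits:

```python
import numpy as np

def armijo_step(f, grad, w, d, alpha0=1.0, sigma=1e-4, rho=0.5, max_backtracks=50):
    """Backtracking Armijo line search: shrink alpha until the
    sufficient-decrease condition f(w + a*d) <= f(w) + sigma*a*(g^T d) holds."""
    fw, g = f(w), grad(w)
    slope = g @ d  # directional derivative; must be negative for a descent direction
    alpha = alpha0
    for _ in range(max_backtracks):
        if f(w + alpha * d) <= fw + sigma * alpha * slope:
            return alpha
        alpha *= rho  # geometric backtracking
    return alpha
```

For example, minimizing f(w) = ||w||^2 from w = (2, 0) along the negative gradient, the first trial step overshoots and one backtrack suffices.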
However, directly extending this approach to the complex domain inevitably raises big challenges, such as the boundedness and differentiability of the activation function. Fortunately, Wirtinger calculus offers a possible solution to these difficulties (Kreutz-Delgado, 2009, Mandic and Goh, 2009, Wirtinger, 1927). Within this framework, we attempt to build an efficient CG method to train a fully complex-valued neural network. The proposed CG method is implemented by combining the generalized Armijo search technique with a modified conjugate coefficient.
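For a real-valued cost f of a complex variable z = x + iy, the Wirtinger derivative with respect to the conjugate variable, ∂f/∂z̄ = (∂f/∂x + i ∂f/∂y)/2, is the quantity used to form gradients in this setting. A minimal numerical sketch (the helper name and finite-difference scheme are ours):

```python
import numpy as np

def wirtinger_conj_grad(f, z, h=1e-6):
    """Numerical Wirtinger derivative df/d(conj z) = (df/dx + i*df/dy)/2
    for a real-valued f of a complex variable z = x + iy."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)          # real-axis central difference
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # imaginary-axis central difference
    return 0.5 * (dfdx + 1j * dfdy)
```

For f(z) = |z|^2 = z z̄ this returns approximately z, consistent with ∂(z z̄)/∂z̄ = z.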
As we know, stability analysis is an important research topic for feed-forward systems with unknown control (Wang and Zhu, 2018, Zhu, 2018, Zhu and Wang, 2018). The convergence analysis of network models is another crucial issue in real applications, and there is a considerable literature on the theoretical analysis of CVNNs. The monotonic decrease and the convergence of FCBPG have been comprehensively discussed in Zhang, Liu et al. (2014), where the description complexity of the network model was significantly reduced by applying the Wirtinger differential operator. However, the obtained theoretical results depended on the Schwarz symmetry condition. In Xu et al. (2015), this restriction on training FCVNNs was relaxed by introducing an augmented covariance matrix, which extended the possible choices of the activation functions. To speed up the convergence and control the magnitude of the weights, momentum and penalty terms were added to establish the so-called SCBPG model (Zhang, Xu et al., 2014), which effectively improved the generalization of the network model; the boundedness of the weight sequence and the convergence of the presented algorithm were established as well. In Wang et al. (2017), a fractional-order SCVNN was introduced by virtue of the Caputo-type definition, taking advantage of the hereditary characteristics of the fractional differential operator. It is easy to see that all of these theoretical analyses focused on the gradient descent method; there is little literature on the deterministic convergence of CG-based FCVNNs.
Inspired by Sun and Liu (2004) and Wang et al. (2018), a modified CG method based on the generalized Armijo search is constructed in this paper to train fully complex BP neural networks (FCVCGGA). It adopts the Wirtinger differential operator to handle the derivatives of fully complex functions. Wirtinger calculus provides an elegant way to compute the gradients of the objective function with respect to the weights and greatly reduces the complexity of describing the proposed algorithm. Compared with the traditional FCBPG, the FCVCGGA algorithm greatly accelerates the convergence and achieves sufficient descent of the error function. In comparison with existing results, the main contributions of this paper are:
- (A)
To speed up the convergence, we have designed a novel conjugate gradient method to train fully complex-valued neural networks.
Different from the typical CG method, the conjugate coefficient adopted in this paper is enlarged from a single value to an interval. It employs the information of the current gradient, the last updating direction and their spatial relationship. This not only accelerates the convergence rate but also yields sufficient descent of the objective function. Furthermore, the generalized Armijo search is used to determine the optimal step size (learning rate) along the constructed conjugate direction at each iteration, which speeds up the training process as well. The simulations in Section 6 demonstrate the efficiency of this method.
- (B)
The monotonicity and the convergence results of FCVCGGA have been rigorously established under mild conditions. The weak convergence states that the norm of the gradient of the error function with respect to the weights goes to zero; the strong convergence states that the weight sequence approaches an optimal stationary point as the number of iterations increases.
Based on the gradient descent method, Xu et al. (2015) and Zhang, Liu et al. (2014) made significant progress on deterministic convergence analyses for FCVNNs. In this work, we thoroughly analyze the convergence behavior of the proposed FCVCGGA algorithm. In addition, the boundedness assumption on the weights, a requisite condition in Xu et al. (2015) and Zhang, Liu et al. (2014), is relaxed in our discussion of weak convergence.
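The sufficient descent property emphasized in contribution (A) can be stated, for a complex search direction d and Wirtinger gradient g, as Re(dᴴg) ≤ −c‖g‖² for some c > 0. A small numerical check of this condition (the helper name and the constant c = 0.25 are illustrative, not the paper's):

```python
import numpy as np

def is_sufficient_descent(d, g, c=0.25):
    """Check Re(d^H g) <= -c * ||g||^2 for a complex direction d and
    (conjugate) gradient g; c in (0, 1) is an illustrative constant."""
    # np.vdot conjugates its first argument, so vdot(d, g) = d^H g.
    return np.real(np.vdot(d, g)) <= -c * np.linalg.norm(g) ** 2
```

The negative gradient itself always satisfies the condition (with c up to 1), while any ascent direction fails it.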
The remainder of this paper is arranged as follows. Section 2 gives notation and useful rules for Wirtinger calculus. Section 3 introduces the structure of FCVNNs and the proposed algorithm, FCVCGGA. Section 4 presents the main convergence results, and the corresponding proofs follow in Section 5. Section 6 reports two kinds of experiments that support the effectiveness of the proposed algorithm and its convergence results. Finally, Section 7 concludes the paper.
Section snippets
Preliminaries
For simplicity, we introduce the following notation. $\bar{z}$ and $|z|$ stand for the complex conjugate and the modulus of a complex variable $z$, respectively. The Euclidean norm of a complex vector $\mathbf{z}$ is expressed as $\|\mathbf{z}\|$, and the Frobenius norm of a complex matrix $\mathbf{A}$ is written as $\|\mathbf{A}\|_F$. In addition, we assume that the Schwarz symmetry principle is valid for the activation functions used in this paper, that is, $f(\bar{z})=\overline{f(z)}$ (Kim and Adali, 2003, Needham, 1998, Novey, 2008).
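The Schwarz symmetry principle can be checked numerically at sample points. The fully complex tanh, a common activation choice, satisfies it (its Taylor coefficients are real), while e.g. f(z) = iz does not; the helper name and tolerance below are ours:

```python
import numpy as np

def satisfies_schwarz_symmetry(f, z, tol=1e-12):
    """Check the Schwarz symmetry principle f(conj(z)) == conj(f(z))
    at a sample point z."""
    return abs(f(np.conj(z)) - np.conj(f(z))) < tol
```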
For the convenience of analysis, we
Algorithms
Without loss of generality, a common three-layer fully complex-valued neural network is considered with input nodes, hidden nodes and one output node. Suppose that is the weight vector that connects the th hidden node and all of the input nodes, where . Write as the weight matrix that connects the hidden and input layers. Denote the weight vector connecting the hidden and output layers as . For brevity, we combine all
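A forward pass through such a three-layer fully complex-valued network can be sketched as follows, with tanh as an illustrative fully complex activation and the layer sizes chosen arbitrarily (the function and variable names are ours):

```python
import numpy as np

def fcvnn_forward(x, V, u, act=np.tanh):
    """Forward pass of a three-layer fully complex-valued network.
    x : complex input vector (n_in,)
    V : complex hidden weight matrix (n_hidden, n_in)
    u : complex output weight vector (n_hidden,)
    act is applied in the complex domain (numpy's tanh supports complex input)."""
    h = act(V @ x)   # complex hidden-layer activations
    return act(u @ h)  # single complex output
```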
Convergence results
For the proposed algorithm, FCVCGGA, we establish the following weak and strong convergence results. They theoretically assure the convergence behavior of the presented method and provide reliable guidance for practical applications. For the weak convergence, we only need the following necessary assumption:
The activation functions, and , are analytic and their first derivatives, and , are both uniformly continuous in a local region.
In addition, two more assumptions are
Proofs
In this section, we focus on the rigorous proofs of the main theoretical results for the proposed FCVCGGA algorithm. First, we observe that the magnitude of the current updating direction can be bounded by the gradient norm of the objective function with respect to the current weight vector.
Lemma 5.1 Assume that is not a fixed point of the problem ; then the following inequality holds, where .
Proof According to (19), (21), (22), we have
Illustrated simulations
In this section, we present two kinds of experiments, on regression and on classification problems. On the one hand, they effectively demonstrate the advantages of the proposed algorithm over its counterparts; on the other hand, these simulations verify the theoretical results. For regression, we consider a complex noncircular signal (Van, 1994) and a practical wind speed prediction problem. For classification, we carry out the presented algorithms on the
Conclusion
In this paper, motivated by the HS conjugate gradient method, we extend it to the complex domain and construct the FCVHSCG algorithm to train complex-valued network models. Moreover, another fully complex training algorithm, FCVCGGA, is proposed to improve the convergence behavior. The derivation of the algorithms is simplified under the framework of the Wirtinger differential operator. The deterministic convergence results of FCVCGGA are rigorously proved, which sufficiently provide a
Acknowledgments
The authors would like to express their gratitude to the reviewers for their insightful comments and suggestions which greatly improved this work.
References (61)
- Predicting component reliability and level of degradation with complex-valued neural networks. Reliability Engineering & System Safety (2014)
- Fully complex extreme learning machine. Neurocomputing (2005)
- BP artificial neural network based wave front correction for sensor-less free space optics communication. Optics Communications (2017)
- An improved maximum spread algorithm with application to complex-valued RBF neural networks. Neurocomputing (2016)
- Effective backpropagation training with variable stepsize. Neural Networks (1997)
- An extension of the back-propagation algorithm to complex numbers. Neural Networks (1997)
- Solving the XOR problem and the detection of symmetry using a single complex-valued neuron. Neural Networks (2003)
- Iterative solution of nonlinear equations in several variables. Society for Industrial and Applied Mathematics (1970)
- The conjugate gradient method in extremal problems. USSR Computational Mathematics and Mathematical Physics (1969)
- Deterministic convergence of conjugate gradient method for feedforward neural networks. Neurocomputing (2011)
- A novel conjugate gradient method with generalized Armijo search for efficient training of feedforward neural networks. Neurocomputing
- Stability analysis of semi-Markov switched stochastic systems. Automatica
- The heat load prediction model based on BP neural network–Markov model. Neural Computing and Applications
- Convergence analysis of an augmented algorithm for fully complex-valued neural networks. Neural Networks
- Stability analysis of stochastic delay differential equations with Lévy noise. Systems & Control Letters
- Output feedback stabilization of stochastic feedforward systems with unknown control coefficients and unknown output function. Automatica
- Complex-valued neural networks with multivalued neurons
- Single-layered complex-valued neural network for real-valued classification problems. Neurocomputing
- A complex gradient operator and its application in adaptive array theory. IEE Proceedings H - Microwaves, Optics and Antennas
- Channel equalization using adaptive complex radial basis function networks. IEEE Journal on Selected Areas in Communications
- Feature selection using a neural framework with controlled redundancy. IEEE Transactions on Neural Networks and Learning Systems
- Complex-valued B-spline neural network and its application to iterative frequency-domain decision feedback equalization for Hammerstein communication systems
- A modified nonmonotone Hestenes–Stiefel type conjugate gradient method for large-scale unconstrained problems. Numerical Functional Analysis and Optimization
- Global convergence of a new conjugate gradient method with Armijo search. Journal of Henan Normal University
- Function minimization by conjugate gradients. The Computer Journal
- A comparison of neural network approaches for on-line prediction in IGRT. Medical Physics
- Neural network design
- Method of conjugate gradients for solving linear systems
- Approximation by fully complex multilayer perceptrons. Neural Computation
☆ This work was supported in part by the National Natural Science Foundation of China (No. 61305075), the Natural Science Foundation of Shandong Province (No. ZR2015AL014, ZR201709220208) and the Fundamental Research Funds for the Central Universities (No. 15CX08011A, 18CX02036A).

1 Y.S. Liu and B.J. Zhang contributed equally to this paper.