Neural Networks

Volume 115, July 2019, Pages 50-64

Fully complex conjugate gradient-based neural networks using Wirtinger calculus framework: Deterministic convergence and its application

https://doi.org/10.1016/j.neunet.2019.02.011

Abstract

The conjugate gradient method has proven to be an effective strategy for training neural networks owing to its low memory requirements and fast convergence. In this paper, we propose an efficient conjugate gradient method for training fully complex-valued network models based on the Wirtinger differential operator. Two techniques are adopted to enhance training performance. One is to construct a sufficient descent direction during training by designing a fine-tuned conjugate coefficient. The other is to pursue an optimal learning rate at each iteration, determined by a generalized Armijo search, instead of a fixed constant. In addition, we rigorously prove weak and strong convergence results, i.e., the norm of the gradient of the objective function with respect to the weights approaches zero as the iterations increase, and the weight sequence tends to the optimal point. To verify the effectiveness and rationality of the proposed method, four illustrative simulations are performed on typical regression and classification problems.

Introduction

Complex-valued neural networks (CVNNs) have been applied successfully in computational intelligence, pattern recognition and signal processing (Chen et al., 2016, Fink et al., 2014, Liu et al., 2016, Nait-Charif, 2010, Tanaka, 2013). Unlike real-valued network models, CVNNs employ complex-valued parameters and variables to process data in the complex domain. In particular, CVNNs can reduce the number of parameters and operations, and perform well on classification problems (Aizenberg, 2011, Nitta, 2003). According to the activation functions used, there are mainly two kinds of CVNNs: fully CVNNs (FCVNNs) (Kim and Adali, 2003, Li et al., 2005, Savitha et al., 2012) and split CVNNs (SCVNNs) (Nitta, 1997). Fully complex activation functions have significant advantages (e.g., some elementary transcendental functions provide adequate squashing-type nonlinear discrimination with well-defined first-order derivatives (Kim & Adali, 2003)) and have been used successfully to train various network models such as multi-layer perceptrons (Kim & Adali, 2003), extreme learning machines (Li et al., 2005) and radial basis function networks (Savitha et al., 2012).

For real-valued neural networks, the backpropagation (BP) algorithm based on the gradient descent method (BPG) is the most widely used learning strategy (Li and Zhao, 2017, Xie, 2017, Zhao et al., 2016). As an extension, the complex BPG has been applied to train SCVNNs and FCVNNs, yielding the split complex BPG (SCBPG) (Nitta, 1997, Zhang, Xu et al., 2014, Zhang et al., 2009) and the fully complex BPG (FCBPG) (Li and Adali, 2008, Xu et al., 2015, Zhang, Liu et al., 2014), respectively. However, these methods converge slowly because consecutive steps use only the negative gradient of the error function as the updating direction, and the convergence behavior of network training is significantly affected by this direction. In fact, several modifications of the gradient descent method have been proposed to speed up convergence (Lu et al., 2002, Papalexopoulos et al., 1994). Unfortunately, these improvements still do not solve the problem well, particularly when the training procedure encounters steep valleys.

In solving optimization problems, the conjugate gradient (CG) and Newton methods are two common alternative training schemes (Goodband et al., 2008, Saini and Soni, 2002). Although the Newton method converges fastest among these three methods, it has to compute the Hessian matrix and its inverse, which incurs a huge computational burden, especially for large-scale problems. As a compromise, the CG method not only achieves fast convergence, but can also be stated easily without calculating the second derivatives of the error function (Hagan et al., 1996, Nocedal and Wright, 2006).

Owing to its fast convergence and low computational requirements, the CG method has attracted increasing attention for training real-valued neural networks. In the early stage, the linear and nonlinear CG methods were introduced to solve linear systems with positive definite coefficient matrices (Hestenes & Stiefel, 1952) and large-scale nonlinear optimization problems (Fletcher & Reeves, 1964), respectively. On the basis of the conjugate direction parameter, CG methods can be categorized into three typical types: the Hestenes–Stiefel (HS) (Hestenes and Stiefel, 1952, Liu and Li, 2011), Fletcher–Reeves (FR) (Fletcher & Reeves, 1964) and Polak–Ribière–Polyak (PRP) (Polak and Ribiere, 1969, Polyak, 1969, Wan et al., 2017) CG methods. To improve convergence, Sun and Liu (2004) proposed a modified conjugate direction combined with an Armijo search method. With a non-monotonic line search, a modified PRP CG method was introduced to solve non-smooth convex optimization problems (Yuan & Wei, 2016). Without a line search technique, a non-monotonic HS conjugate gradient method was presented to guarantee a sufficient descent direction (Dong, Liu, Li, He, & Liu, 2016). For training FCVNNs, only a few studies have applied the conjugate gradient method. To solve complex quadratic programming systems, two fast complex-valued optimization algorithms were proposed in Zhang and Xia (2016), which considered linear constraints with and without the l1-norm. In Zhang and Xia (2018), two complex-valued algorithms were presented to deal with nonlinearly constrained optimization of real-valued functions. In this paper, we attempt to construct a novel conjugate gradient method for fully complex-valued neural network models, through which a sufficient descent direction can be obtained during the training procedure.
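
For concreteness, the three classical conjugate coefficients mentioned above can be summarized in a short sketch. This is an illustrative, real-valued NumPy sketch only; the function name cg_direction and the argument names are ours, not notation from the paper, which works in the complex domain via Wirtinger calculus.

```python
import numpy as np

def cg_direction(g_new, g_old, d_old, variant="PRP"):
    """Illustrative CG update d_new = -g_new + beta * d_old with one of the
    classical conjugate coefficients (HS, FR or PRP)."""
    y = g_new - g_old                                   # gradient difference
    if variant == "HS":                                 # Hestenes-Stiefel
        beta = g_new.dot(y) / d_old.dot(y)
    elif variant == "FR":                               # Fletcher-Reeves
        beta = g_new.dot(g_new) / g_old.dot(g_old)
    else:                                               # Polak-Ribiere-Polyak
        beta = g_new.dot(y) / g_old.dot(g_old)
    return -g_new + beta * d_old
```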

In training network models, the step size (learning rate) is another crucial factor affecting the convergence speed besides the updating direction. A common technique is to set the learning rate to a positive constant when training CVNNs. In Xu et al. (2015), Zhang et al. (2009), Zhang, Liu et al. (2014) and Zhang, Xu et al. (2014), the learning rates in the weight updating formulas were all set to positive constants. However, a larger learning rate may lead to a more pronounced zig-zagging trajectory during training, while a smaller rate makes it hard to converge efficiently. As improved strategies, some exact line search methods (Magoulas, Vrahatis, & Androulakis, 1997) were employed to obtain a suitable learning rate in the process of training real-valued neural networks. However, these methods are generally inefficient and unreliable if the initial points are not near the optimum. More importantly, they are very time-consuming, since large-scale computation is required to satisfy the exact line search conditions. To reduce the computational burden, inexact line search techniques are preferred, such as the Wolfe search rule (Wang & Chen, 2015) and the Armijo search rule (Dong, Yang, & Huang, 2015). Based on the generalized Armijo search criterion, a three-term conjugate gradient method was discussed in Sun and Liu (2004) for unconstrained optimization; it provided a different proof strategy for the convergence properties of neural network models. According to the numerical simulation results in Wang, Zhang, Sun, Hao, and Sun (2018), this generalized Armijo step size rule offers competitive performance in searching for a suitable learning rate while training real-valued network models. It achieves sufficient descent of the error function in the iterative updating process and can consequently lead to deterministic convergence of the weight sequence. However, directly extending it to the complex domain inevitably raises challenges, such as the boundedness and differentiability of the activation function. Fortunately, Wirtinger calculus offers a possible solution to these difficulties (Kreutz-Delgado, 2009, Mandic and Goh, 2009, Wirtinger, 1927). Within this complex-domain framework, we attempt to build an efficient CG method to train a fully complex-valued neural network. The proposed CG method combines the generalized Armijo search technique with a modified conjugate coefficient.
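
As a rough illustration of how an inexact line search chooses the step size, the following sketch implements a plain backtracking Armijo rule. It is a simplified stand-in, not the paper's exact generalized Armijo criterion, and the parameter names (eta0, rho, sigma) are ours.

```python
import numpy as np

def armijo_step(E, w, d, g, eta0=1.0, rho=0.5, sigma=1e-4, max_backtracks=30):
    """Backtracking Armijo line search (simplified): shrink the step size
    until the objective E decreases sufficiently along direction d.
    g is the gradient at w; for complex weights the slope uses Re(<g, d>)."""
    e0 = E(w)
    slope = np.real(np.vdot(g, d))          # negative for a descent direction
    eta = eta0
    for _ in range(max_backtracks):
        if E(w + eta * d) <= e0 + sigma * eta * slope:
            return eta
        eta *= rho                           # backtrack
    return eta
```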

As is well known, stability analysis is an important research topic for various feed-forward systems with unknown control (Wang and Zhu, 2018, Zhu, 2018, Zhu and Wang, 2018). The convergence analysis of network models is another crucial issue in real applications, and there is a considerable body of literature on the theoretical analysis of CVNNs. The monotonic decrease of the error and the convergence of FCBPG were comprehensively discussed in Zhang, Liu et al. (2014), where the descriptive complexity of the network model was significantly reduced by applying the Wirtinger differential operator. However, the obtained theoretical results depend on the Schwarz symmetry condition. In Xu et al. (2015), this restriction on training FCVNNs was relaxed by introducing an augmented covariance matrix, thus extending the possible choices of activation functions. To speed up convergence and control the magnitude of the weights, extra momentum and penalty terms were added to establish the so-called SCBPG model (Zhang, Xu et al., 2014), which effectively improved the generalization of the built network model; the boundedness of the weight sequence and the convergence of the presented algorithm were also obtained. In Wang et al. (2017), a fractional-order SCVNN was introduced by virtue of the Caputo-type definition, taking advantage of the hereditary characteristics of the fractional differential operator. It is easy to see that all of these theoretical analyses focus on the gradient descent method; there is little literature on the deterministic convergence of CG-based FCVNNs.

Inspired by Sun and Liu (2004) and Wang et al. (2018), a modified CG method based on the generalized Armijo search is constructed in this paper to train fully complex BP neural networks (FCVCGGA). It adopts the Wirtinger differential operator to handle the derivatives of fully complex functions. Wirtinger calculus provides an elegant way to compute the gradients of the objective function with respect to the weights and greatly reduces the complexity of describing the proposed algorithm. Compared with the traditional FCBPG, the FCVCGGA algorithm greatly accelerates convergence and achieves sufficient descent of the error function. In comparison with existing results, the main contributions of this paper are as follows:

  • (A)

    To speed up the convergence, we have designed a novel conjugate gradient method to train fully complex-valued neural networks.

    Different from the typical CG method, the conjugate coefficient adopted in this paper is relaxed from a single value to an interval. It employs the information of the current gradient, the last updating direction and their spatial relationship. This not only accelerates the convergence rate, but also yields the sufficient descent property of the objective function (see the sketch after this list). Furthermore, the generalized Armijo search is used to determine the optimal step size (learning rate) along the constructed conjugate direction at each iteration, which speeds up the training process as well. The simulations in Section 6 demonstrate the efficiency of this method.

  • (B)

    The monotonicity and the convergence of FCVCGGA are rigorously guaranteed under mild conditions. The weak convergence states that the sequence of gradient norms of the error function with respect to the weights tends to zero. The strong convergence states that the weight sequence approaches the optimal stationary point as the iterations increase.

    Based on the gradient descent method, Xu et al. (2015) and Zhang, Liu et al. (2014) took a significant step forward in the deterministic convergence analysis of FCVNNs. In this work, we thoroughly analyze the convergence behavior of the proposed FCVCGGA algorithm. In addition, the boundedness assumption on the weights is relaxed when we discuss the weak convergence, whereas it is a prerequisite in Xu et al. (2015) and Zhang, Liu et al. (2014).
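
To make contribution (A) concrete, the sketch below accepts a conjugate direction only when a sufficient descent condition holds and otherwise restarts with the negative gradient. The threshold delta and the acceptance test shown here are assumptions of this illustration, not the exact interval of admissible coefficients derived in the paper.

```python
import numpy as np

def safeguarded_direction(g, d_old, beta, delta=0.25):
    """Hypothetical sufficient-descent safeguard: keep the conjugate direction
    d = -g + beta * d_old only if Re(<d, g>) <= -delta * ||g||^2; otherwise
    fall back to the steepest descent direction -g."""
    d = -g + beta * d_old
    if np.real(np.vdot(d, g)) <= -delta * np.real(np.vdot(g, g)):
        return d
    return -g
```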

The remaining sections of this paper are arranged as follows. Section 2 gives some notation and useful rules of Wirtinger calculus. Section 3 introduces the structure of FCVNNs and the proposed algorithm, FCVCGGA. Section 4 presents the main convergence results, and the corresponding proofs follow in Section 5. Two kinds of experiments are reported in Section 6, supporting the effectiveness of the proposed algorithm and its convergence results. Finally, Section 7 concludes the paper.

Section snippets

Preliminaries

For simplicity, we introduce the following notation. $\bar{z}$ and $|z|$ stand for the complex conjugate and the modulus of a complex variable $z$, respectively. The Euclidean norm of a complex vector $z$ is written as $\|z\|$, and the Frobenius norm of a complex matrix $Z$ is written as $\|Z\|_F$. In addition, we assume that the Schwarz symmetry principle holds for the activation functions used in this paper, that is, $f(\bar{z}) = \overline{f(z)}$ (Kim and Adali, 2003, Needham, 1998, Novey, 2008).
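
For readers unfamiliar with Wirtinger (CR) calculus, the underlying pair of operators can be stated as follows; this is the standard definition (cf. Brandwood, 1983; Kreutz-Delgado, 2009), restated here for convenience with $z = x + iy$.

```latex
\frac{\partial f}{\partial z}
  = \frac{1}{2}\left(\frac{\partial f}{\partial x} - i\,\frac{\partial f}{\partial y}\right),
\qquad
\frac{\partial f}{\partial \bar{z}}
  = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right).
```

For a real-valued cost $E$, the direction of steepest descent with respect to a complex weight vector $w$ is $-\nabla_{\bar{w}} E$, which is why gradients with respect to the conjugate weights appear throughout the analysis.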

For the convenience of analysis, we

Algorithms

Without loss of generality, a three-layer fully complex-valued neural network is considered, with $l$ input nodes, $m$ hidden nodes and one output node. Suppose that $v_i = (v_{i1}, v_{i2}, \ldots, v_{il})^T \in \mathbb{C}^l$ is the weight vector connecting the $i$th hidden node and all of the input nodes, where $i = 1, 2, \ldots, m$. Write $V = (v_1, v_2, \ldots, v_m)^T \in \mathbb{C}^{m \times l}$ for the weight matrix connecting the hidden and input layers. Denote the weight vector connecting the hidden and output layers by $u = (u_1, u_2, \ldots, u_m)^T \in \mathbb{C}^m$. For brevity, we combine all
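
Based on the architecture just described, a minimal forward pass might look as follows. This is a sketch under our own assumptions: tanh stands in for the generic fully complex activations (denoted g and f in the paper), and the function and variable names are ours.

```python
import numpy as np

def forward(z, V, u, g=np.tanh, f=np.tanh):
    """Minimal sketch of the three-layer fully complex-valued network.
    z : (l,)   complex input vector
    V : (m, l) complex weights between input and hidden layers
    u : (m,)   complex weights between hidden and output layers
    g, f : fully complex activation functions (hidden / output)."""
    h = g(V @ z)        # hidden-layer outputs, shape (m,)
    return f(u @ h)     # scalar complex network output
```

Since np.tanh accepts complex arrays, the same code runs unchanged on complex inputs and weights.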

Convergence results

For the proposed algorithm, FCVCGGA, we build the following weak and strong convergence results. They theoretically assure the convergent behavior of the presented method and provide reliable guidance for practical applications. For the weak convergence, we only need the following necessary assumption

  • (A1)

    The activation functions, $g(z)$ and $f(z)$, are analytic, and their first derivatives, $g'(z)$ and $f'(z)$, are both uniformly continuous in a local region.

In addition, two more assumptions are

Proofs

In this section, we mainly focus on the strict proofs of the main theoretical results of the proposed FCVCGGA algorithm. Firstly, we observe that the magnitude of the current updating direction can be constrained by the gradient norm of the objective function with respect to the current weight vector.

Lemma 5.1

Assume that $w^n$ is not a fixed point of the problem $\min_{w \in \mathbb{C}^{m(l+1)}} E(w)$; then the following inequality holds: $\|d^n\| \le C_1 \|\nabla_{\bar{w}} E(w^n)\|$, where $C_1 = 1 + \frac{1}{\delta}$.

Proof

According to (19), (21) and (22), we have

Illustrated simulations

In this section, we present two kinds of experiments, covering both regression and classification problems. On the one hand, they demonstrate the advantages of the proposed algorithm over its counterparts; on the other hand, they verify the theoretical results as well. For regression, we consider a complex noncircular signal (Van, 1994) and a practical wind speed prediction problem. For classification, we carry out the presented algorithms on the

Conclusion

In this paper, motivated by the HS conjugate gradient method, we extend it to the complex domain and construct the FCVHSCG algorithm to train complex-valued network models. Moreover, another fully complex training algorithm, FCVCGGA, is proposed to improve the convergence behavior. We note that the derivation of the algorithms is simplified under the framework of the Wirtinger differential operator. The deterministic convergence results of FCVCGGA are rigorously proved, which sufficiently provide a

Acknowledgments

The authors would like to express their gratitude to the reviewers for their insightful comments and suggestions which greatly improved this work.

References (61)

  • Wang, J., et al. (2018). A novel conjugate gradient method with generalized Armijo search for efficient training of feedforward neural networks. Neurocomputing.

  • Wang, B., et al. (2018). Stability analysis of semi-Markov switched stochastic systems. Automatica.

  • Xie, L. (2017). The heat load prediction model based on BP neural network-Markov model. Neural Computing and Applications.

  • Xu, D.P., et al. (2015). Convergence analysis of an augmented algorithm for fully complex-valued neural networks. Neural Networks.

  • Zhu, Q.X. (2018). Stability analysis of stochastic delay differential equations with Lévy noise. Systems & Control Letters.

  • Zhu, Q.X., et al. (2018). Output feedback stabilization of stochastic feedforward systems with unknown control coefficients and unknown output function. Automatica.

  • Aizenberg, I. (2011). Complex-valued neural networks with multi-valued neurons.

  • Amin, M.F., et al. (2009). Single-layered complex-valued neural network for real-valued classification problems. Neurocomputing.

  • Bartholomew-Biggs, M.

  • Brandwood, D.H. (1983). A complex gradient operator and its application in adaptive array theory. IEE Proceedings H - Microwaves, Optics and Antennas.

  • Cha, I., et al. (1995). Channel equalization using adaptive complex radial basis function networks. IEEE Journal on Selected Areas in Communications.

  • Chakraborty, R., et al. (2015). Feature selection using a neural framework with controlled redundancy. IEEE Transactions on Neural Networks and Learning Systems.

  • Chen, S., et al. Complex-valued B-spline neural network and its application to iterative frequency-domain decision feedback equalization for Hammerstein communication systems.

  • Dong, X.L., et al. (2016). A modified nonmonotone Hestenes-Stiefel type conjugate gradient method for large-scale unconstrained problems. Numerical Functional Analysis and Optimization.

  • Dong, X.L., et al. (2015). Global convergence of a new conjugate gradient method with Armijo search. Journal of Henan Normal University.

  • Fletcher, R., et al. (1964). Function minimization by conjugate gradients. The Computer Journal.

  • Goodband, J.H., et al. (2008). A comparison of neural network approaches for on-line prediction in IGRT. Medical Physics.

  • Hagan, M.T., et al. (1996). Neural network design.

  • Hestenes, M.R., et al. (1952). Methods of conjugate gradients for solving linear systems.

  • Kim, T., et al. (2003). Approximation by fully complex multilayer perceptrons. Neural Computation.

This work was supported in part by the National Natural Science Foundation of China (No. 61305075), the Natural Science Foundation of Shandong Province (Nos. ZR2015AL014 and ZR201709220208) and the Fundamental Research Funds for the Central Universities (Nos. 15CX08011A and 18CX02036A).

1. Y.S. Liu and B.J. Zhang contributed equally to this paper.
