Adaptive complex-valued stepsize based fast learning of complex-valued neural networks
Introduction
Over recent years, various kinds of neural networks have been widely and successfully applied in many fields with the desired performance (Wang et al., 2019, Wang and Liu, 2018). Among them, complex-valued neural networks (CVNNs) play an increasingly important role because of their advantages such as high functionality and fast learning (Aizenberg, 2011). For many practical applications (e.g., image classification, natural language processing and complex-valued signal processing), it has been verified that the performance achieved by CVNNs is superior to that of their real-valued counterparts (Nakamura et al., 2019, Sun et al., 2019, Trabelsi et al., 2017, Zhang et al., 2017).
Depending on the activation functions, CVNNs are generally classified into three types: real-imaginary type CVNNs (Nitta, 1997), amplitude–phase type CVNNs (Hirose & Yoshida, 2012) and fully CVNNs (Kim & Adalı, 2003). Amplitude–phase type CVNNs perform a nonlinear transformation on the amplitude and phase of signals by employing two real-valued activation functions. They are generally applied to complex-valued signal processing because they can directly deal with the relationship between the amplitude and phase of complex-valued signals (Hirose & Yoshida, 2012). Real-imaginary type CVNNs employ a pair of real-valued functions as a nonlinear complex-valued activation function, which circumvents the issue caused by the unboundedness of holomorphic complex-valued functions (by Liouville's theorem, a holomorphic function that is bounded on the whole complex plane must be constant, so a holomorphic activation cannot be both bounded and nonconstant). However, the correlation between the real and imaginary components is ignored, since the two parts are processed separately. To overcome this limitation, fully CVNNs have been introduced for practical applications. Fully CVNNs adopt elementary transcendental functions as activations so that well-defined first-order derivative information is available. In addition, several efficient training methods for fully CVNNs were proposed in Savitha, Suresh, and Sundararajan (2013) and Zhang, Liu, Cao, Wu, and Wang (2019) by adopting Wirtinger calculus (Brandwood, 1983, Kreutz-Delgado, 2009).
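For concreteness, one representative activation of each type can be written as follows; these specific forms are common choices in the cited literature and are given here for illustration, not necessarily the exact functions adopted in this paper:

$$f_{\mathrm{RI}}(z) = \phi(\operatorname{Re} z) + j\,\phi(\operatorname{Im} z), \qquad f_{\mathrm{AP}}(z) = \rho(|z|)\, e^{j \arg z}, \qquad f_{\mathrm{FC}}(z) = \tanh(z),$$

where $\phi$ and $\rho$ are bounded real-valued nonlinearities (e.g., the logistic sigmoid) and $\tanh(z)$ is the complex hyperbolic tangent, an elementary transcendental function with a well-defined complex derivative.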
It is known that, up to now, a real-valued learning rate has been utilized in almost all learning algorithms for CVNNs (Xu et al., 2015, Zhang et al., 2014). However, different from real-valued neural networks, most of the critical points of CVNNs, where the gradient of the objective function vanishes, are saddle points (Nitta, 2013). As discussed in Fukumizu and Amari (2000), complex-valued gradient based training algorithms with real positive stepsizes may converge very slowly when the iterate falls into the region near a critical point, which eventually degrades the performance of learning algorithms. In Goh and Mandic (2007) and Mandic (2004), adaptive real-valued stepsize based methods were established to accelerate the convergence process. However, this issue was not completely resolved by these methods since the search direction was still the negative gradient. On the other hand, the authors of Kim and Adali (2002) and Kim and Adalı (2003) first investigated the effect of complex-valued stepsizes on complex multilayer perceptrons. It was found that the use of a complex-valued stepsize can enhance the performance for complex-valued tensor decomposition problems (Chen, Han, & Qi, 2011). Recently, a theoretical and experimental analysis of the advantages and disadvantages of complex-valued stepsizes for complex gradient descent learning algorithms was presented in Zhang and Mandic (2015). It was shown that, when a complex-valued stepsize is adopted for a pre-calculated search direction, the search space is enlarged from a half line to a half plane, which makes it possible to escape from saddle points. Moreover, a complex Barzilai–Borwein method (CBBM) was proposed in Zhang and Mandic (2015) for the choice of complex-valued stepsize by extending the real Barzilai–Borwein method (Barzilai and Borwein, 1988, Raydan, 1993) to the complex domain. However, according to our experiments, similar to its real-valued counterpart, the CBBM suffers from probabilistic divergence and may be unstable during the training procedure. Therefore, the design of complex-valued stepsizes for the learning of CVNNs has not yet been fully addressed and remains challenging, and it is of great significance to investigate this issue further.
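As a rough illustration of how a complex-valued stepsize can arise from a BB-type rule, the natural complex extension takes the form below; this is a hedged paraphrase under the assumption that the real BB formula carries over with Hermitian inner products, and the precise definition in Zhang and Mandic (2015) may differ in detail:

$$\mu_k = \frac{\mathbf{s}_{k-1}^{H}\,\mathbf{s}_{k-1}}{\mathbf{s}_{k-1}^{H}\,\mathbf{y}_{k-1}}, \qquad \mathbf{s}_{k-1} = \mathbf{w}_{k} - \mathbf{w}_{k-1}, \qquad \mathbf{y}_{k-1} = \mathbf{g}_{k} - \mathbf{g}_{k-1},$$

where $\mathbf{g}_k$ denotes the conjugate (Wirtinger) gradient of the objective at $\mathbf{w}_k$. Since the denominator is a complex inner product, $\mu_k$ is generally complex, so it rotates as well as rescales the search direction.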
Recently, an adaptable learning rate tree (ALRT) algorithm for real-valued feedforward neural networks was proposed in Takase, Oyama, and Kurihara (2018), where the stepsize is efficiently determined according to the training loss. Different from most adaptive methods such as Adam (Kingma & Ba, 2014), AdamW (Loshchilov & Hutter, 2017) and Amsgrad (Reddi, Kale, & Kumar, 2019), where the stepsize gradually decreases during the training procedure, the ALRT can increase or decrease the stepsize adaptively so that the objective function decreases as much as possible at each iteration. Experimental results further showed that the ALRT outperforms other adaptive stepsize methods under certain conditions. However, the ALRT algorithm of Takase et al. (2018) operates in the real domain and only changes the length of the step along the negative gradient direction. There is much evidence that the negative gradient direction is not necessarily the best search direction for optimization. Fortunately, a complex-valued stepsize with a proper phase provides the possibility of deviating the search direction from the negative gradient. That is to say, a better search direction can be obtained by multiplying the negative gradient by a suitable complex-valued stepsize with nonzero phase.
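A minimal sketch of the ALRT idea in Python is given below. The function name alrt_step, the branching factors and the reuse of the winning stepsize are illustrative assumptions; Takase et al. (2018) describe the full tree-search procedure with beam pruning, which is not reproduced here.

```python
def alrt_step(w, grad_fn, loss_fn, lr, factors=(0.5, 1.0, 2.0)):
    """One ALRT-style update: evaluate a few scaled stepsizes along the
    negative gradient and keep the one yielding the smallest training loss.
    The branching factors are illustrative, not the paper's values."""
    g = grad_fn(w)
    trials = [(loss_fn(w - lr * c * g), lr * c) for c in factors]
    _, best_lr = min(trials, key=lambda t: t[0])
    return w - best_lr * g, best_lr  # reuse the winning stepsize next iteration

# Toy usage: minimize f(w) = w**2 starting from w = 3.0.
w, lr = 3.0, 0.1
for _ in range(20):
    w, lr = alrt_step(w, grad_fn=lambda w: 2 * w, loss_fn=lambda w: w ** 2, lr=lr)
print(w, lr)
```

Note that the candidate set always contains the current stepsize (factor 1.0), so the loss achieved is never worse than that of plain gradient descent at the same rate.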
Motivated by these observations, in this paper, an adaptive complex-valued stepsize design method is presented for the learning of fully CVNNs based on an improved ALRT technique. By introducing scaling and rotation factors, both the amplitude of the learning rate and the search direction can be adjusted simultaneously to guarantee that the objective function is reduced as much as possible at each iteration. For simplicity, the method proposed in this study is named the complex-valued ALRT (CALRT). The contributions and advantages of the proposed CALRT are as follows: (1) The complex-valued stepsize can be efficiently and easily determined according to the training loss at each iteration. (2) Owing to the varying phase of the complex-valued stepsize, the CALRT is more capable of escaping from saddle points than methods with a constant complex-valued stepsize, and it thus converges more quickly. (3) Compared with other adaptive stepsize design methods, the CALRT finds more accurate solutions and the training process is more stable. (4) When the training reaches the vicinity of the optimal solution, the CALRT converges easily since the steepest descent direction is always considered. (5) The initial stepsize is easy to prescribe since it has only a limited effect on the performance of the CALRT. Finally, experimental results on function approximation and pattern classification are given to illustrate the effectiveness and advantages of the proposed algorithm over some existing ones.
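To make the scaling-and-rotation idea concrete, the sketch below generates complex-valued stepsize candidates of the form lr * rho**p * exp(j*theta) and keeps the one that reduces the loss most. The function name calrt_step and the particular grids for the scaling exponent p and rotation angle theta are assumptions for illustration, not the parameter values of the CALRT.

```python
import numpy as np

def calrt_step(w, grad_fn, loss_fn, lr, rho=2.0,
               thetas=(-np.pi / 4, 0.0, np.pi / 4)):
    """One CALRT-style update for complex weights (dtype=complex).

    Candidate stepsizes mu = lr * rho**p * exp(1j*theta) both rescale the
    step length (scaling factor rho**p) and rotate the search direction away
    from the negative gradient (rotation factor exp(1j*theta)); theta = 0
    keeps the steepest descent direction among the candidates."""
    g = grad_fn(w)  # conjugate (Wirtinger) gradient of the real-valued loss
    candidates = [lr * rho ** p * np.exp(1j * th)
                  for p in (-1, 0, 1) for th in thetas]
    trials = [(loss_fn(w - mu * g), mu) for mu in candidates]
    _, best_mu = min(trials, key=lambda t: t[0])
    return w - best_mu * g, abs(best_mu)  # keep the winning amplitude as next lr

# Toy usage: minimize E(w) = |w|**2, whose conjugate gradient is w itself.
w, lr = 1.0 + 1.0j, 0.1
for _ in range(30):
    w, lr = calrt_step(w, grad_fn=lambda w: w, loss_fn=lambda w: abs(w) ** 2, lr=lr)
print(abs(w))
```

Because theta = 0 is always in the candidate grid, the pure steepest descent step is always one of the options, which matches advantage (4) above.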
The rest of this paper is organized as follows. Some preliminaries on fully CVNNs, Wirtinger calculus and the ALRT technique are introduced in Section 2. In Section 3, the basic principle of the CALRT is described. The dynamics of the CALRT is analyzed with simulation results in Section 4 to verify that it can efficiently escape from saddle points. In Section 5, the effect of some parameters on the performance of the CALRT is discussed and experimental results are presented to demonstrate the advantages of the CALRT over some existing algorithms. Finally, conclusions are drawn in Section 6.
Section snippets
Fully CVNNs
A fully CVNN is composed of an alternating stack of layers with linear and nonlinear operations. For $l = 1, \ldots, L$, the output of the $l$th layer is calculated in a recursive way and has the form
$$\mathbf{z}_l = f_l\left(\mathbf{W}_l \mathbf{z}_{l-1} + \mathbf{b}_l\right),$$
where $\mathbf{z}_l$ is the output vector of the $l$th layer, $\mathbf{z}_0$ is the input of the CVNN, $\mathbf{W}_l$ is the weight matrix connecting the neurons between the $(l-1)$th and $l$th layers, and $\mathbf{b}_l$ and $f_l(\cdot)$ are respectively the bias vector and complex-valued activation function of the neurons in the $l$th layer.
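A minimal NumPy sketch of this recursion for complex inputs and weights is given below; the layer sizes are arbitrary, and tanh is used as one admissible fully complex activation (NumPy evaluates the complex hyperbolic tangent for complex input), which may differ from the activation used in the paper's experiments.

```python
import numpy as np

def forward(z0, weights, biases, act=np.tanh):
    """Recursive forward pass of a fully CVNN: z_l = f(W_l z_{l-1} + b_l).
    All arrays are complex-valued."""
    z = z0
    for W, b in zip(weights, biases):
        z = act(W @ z + b)
    return z

# Example: a 2-3-1 fully CVNN with random complex parameters.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2)),
           rng.standard_normal((1, 3)) + 1j * rng.standard_normal((1, 3))]
biases = [rng.standard_normal(3) + 1j * rng.standard_normal(3),
          rng.standard_normal(1) + 1j * rng.standard_normal(1)]
print(forward(np.array([0.3 + 0.1j, -0.2j]), weights, biases))
```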
CVNNs are
CALRT
In the framework of the ALRT, the stepsize must be real-valued since it is designed for real-valued neural networks. As a result, the search direction in Takase et al. (2018) is actually not changed by the ALRT. On the other hand, it is well recognized that the negative gradient direction is generally not the best search direction for unconstrained optimization problems. In order to better train CVNNs and fully take advantage of the complex-valued stepsize, the ALRT technique is improved and
Ability to escape from saddle points
There are generally many saddle points in the objective function of fully CVNNs (Nitta, 2013), which seriously affect the convergence speed of training algorithms. However, compared with a real-valued stepsize or a constant complex-valued stepsize, an adaptively varying complex-valued stepsize can make the training algorithm escape from saddle points easily and achieve rapid convergence, because a better search direction is chosen at each iteration based on (14). It should be noted that the change
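In update form, the point can be stated as follows; the notation here is assumed for illustration, since Eq. (14) itself is not reproduced in this excerpt:

$$\Delta \mathbf{w}_k = -\mu_k\, \mathbf{g}_k, \qquad \mu_k = |\mu_k|\, e^{j\varphi_k}.$$

With a real positive stepsize ($\varphi_k = 0$), every reachable update lies on the half line $\{-t\,\mathbf{g}_k : t > 0\}$; letting $\varphi_k$ vary over $(-\pi/2, \pi/2)$ sweeps out a half plane of directions, so an update component off the saddle's stable manifold can be selected even when the steepest descent step along the escape direction is vanishingly small.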
Experimental results
In this section, more experiments are conducted to further show the advantages of the CALRT algorithm. First of all, the Ionosphere dataset in Dua and Graff (2017) is used to analyze the effect of some parameters (such as the initial stepsize, the scaling and rotation factors, the number of branches and the beam size) on the performance of the CALRT. The Ionosphere dataset has two classes of samples and each sample has features. A fully CVNN with hidden neurons is constructed for this
Conclusion
In this paper, an adaptive CALRT method has been established for the determination of the complex-valued stepsize for the learning of fully CVNNs. The scaling and rotation factors have been introduced so that a more appropriate search direction is efficiently found and as much learning progress as possible is guaranteed at each iteration. As one of its advantages, the proposed algorithm is fully capable of escaping from saddle points during the training procedure. Compared with some
Acknowledgments
The authors would like to thank the anonymous reviewers for their constructive comments that have greatly improved the quality of this paper. This work was jointly supported by the Natural Science Foundation of Jiangsu Province of China under Grant no. BK20181431 and the Qing Lan Project of Jiangsu Province.
References (37)
- Local minima and plateaus in hierarchical structures of multilayer perceptrons. Neural Networks (2000).
- An extension of the back-propagation algorithm to complex numbers. Neural Networks (1997).
- Local minima in hierarchical structures of complex-valued neural networks. Neural Networks (2013).
- A complex-valued neuro-fuzzy inference system and its learning mechanism. Neurocomputing (2014).
- Effective neural network training with adaptive learning rate based on training loss. Neural Networks (2018).
- Convergence analysis of an augmented algorithm for fully complex-valued neural networks. Neural Networks (2015).
- Fully complex conjugate gradient-based neural networks using Wirtinger calculus framework: Deterministic convergence and its application. Neural Networks (2019).
- Complex-valued neural networks with multi-valued neurons, Vol. 353. Springer (2011).
- Single-layered complex-valued neural network for real-valued classification problems. Neurocomputing (2009).
- Two-point step size gradient methods. IMA Journal of Numerical Analysis (1988).
- A complex gradient operator and its application in adaptive array theory. IEE Proceedings H - Microwaves, Optics and Antennas (1983).
- Stabilized Barzilai–Borwein method.
- New ALS methods with extrapolating search directions and optimal step size for complex-valued tensor decompositions. IEEE Transactions on Signal Processing (2011).
- UCI machine learning repository (2017).
- Stochastic gradient-adaptive complex-valued nonlinear neural adaptive filters with a gradient-adaptive step size. IEEE Transactions on Neural Networks (2007).
- Generalization characteristics of complex-valued feedforward neural networks in relation to signal coherence. IEEE Transactions on Neural Networks and Learning Systems (2012).
- Fully complex multi-layer perceptron network for nonlinear signal processing. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology (2002).
- Approximation by fully complex multilayer perceptrons. Neural Computation (2003).