
Multilayer Perceptrons: Other Learning Techniques

Chapter in Neural Networks and Statistical Learning

Abstract

Training of feedforward networks can be viewed as an unconstrained optimization problem. Backpropagation (BP) converges slowly when the error surface is flat along a weight dimension. Second-order optimization techniques have a strong theoretical basis and provide significantly faster convergence.
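To make the second claim concrete, the sketch below (not taken from the chapter) trains a tiny one-hidden-layer perceptron on a toy regression task with Levenberg-Marquardt updates, one of the second-order techniques this chapter surveys. The network size, data set, damping schedule, and finite-difference Jacobian are illustrative assumptions chosen for brevity, not the chapter's own implementation.

```python
# Illustrative sketch: Levenberg-Marquardt training of a one-hidden-layer MLP
# on a toy 1-D regression task. The Jacobian of the residuals is approximated
# by finite differences to keep the code short; in practice it is computed
# analytically, e.g., by backpropagation.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = sin(x) plus a little noise
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(40)

n_hidden = 8  # illustrative network size

def unpack(w):
    """Split the flat parameter vector into layer weights and biases."""
    i = 0
    W1 = w[i:i + n_hidden].reshape(1, n_hidden); i += n_hidden
    b1 = w[i:i + n_hidden]; i += n_hidden
    W2 = w[i:i + n_hidden].reshape(n_hidden, 1); i += n_hidden
    b2 = w[i]
    return W1, b1, W2, b2

def forward(w, X):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)        # hidden layer
    return (h @ W2).ravel() + b2    # linear output

def residuals(w):
    return forward(w, X) - y

def jacobian(w, eps=1e-6):
    """Finite-difference Jacobian of the residual vector w.r.t. the weights."""
    r0 = residuals(w)
    J = np.empty((r0.size, w.size))
    for k in range(w.size):
        wp = w.copy(); wp[k] += eps
        J[:, k] = (residuals(wp) - r0) / eps
    return J

# Levenberg-Marquardt iteration: solve (J^T J + mu I) dw = -J^T r
w = 0.5 * rng.standard_normal(3 * n_hidden + 1)
mu = 1e-2
for it in range(200):
    r = residuals(w)
    J = jacobian(w)
    g = J.T @ r
    dw = np.linalg.solve(J.T @ J + mu * np.eye(w.size), -g)
    if np.sum(residuals(w + dw) ** 2) < np.sum(r ** 2):
        w, mu = w + dw, max(mu / 10, 1e-12)  # accept step, move toward Gauss-Newton
    else:
        mu *= 10                              # reject step, move toward gradient descent
print("final sum of squared errors:", np.sum(residuals(w) ** 2))
```

The damping factor interpolates between Gauss-Newton behavior (small mu) and small gradient-descent steps (large mu); the factor-of-ten schedule here is one common but arbitrary choice.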



Author information

Corresponding author: Ke-Lin Du.


Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Du, K.-L., & Swamy, M. N. S. (2014). Multilayer Perceptrons: Other Learning Techniques. In: Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-5571-3_5

  • DOI: https://doi.org/10.1007/978-1-4471-5571-3_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5570-6

  • Online ISBN: 978-1-4471-5571-3

  • eBook Packages: Engineering, Engineering (R0)
