
Multilayer Perceptrons: Other Learning Techniques

Chapter in Neural Networks and Statistical Learning

Abstract

Training of feedforward networks can be viewed as an unconstrained optimization problem. Backpropagation (BP) converges slowly when the error surface is flat along a weight dimension. Second-order optimization techniques have a strong theoretical basis and provide significantly faster convergence.
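To make the second claim concrete, the sketch below (not taken from the chapter) trains a tiny one-hidden-layer perceptron on a toy regression task with Levenberg-Marquardt updates, one of the second-order techniques this chapter surveys. The network size, data set, damping schedule, and finite-difference Jacobian are illustrative assumptions chosen for brevity, not the chapter's own implementation.

```python
# Illustrative sketch: Levenberg-Marquardt training of a one-hidden-layer MLP
# on a toy 1-D regression task. The Jacobian of the residuals is approximated
# by finite differences to keep the code short; in practice it is computed
# analytically, e.g., by backpropagation.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = sin(x) plus a little noise
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(40)

n_hidden = 8  # illustrative network size

def unpack(w):
    """Split the flat parameter vector into layer weights and biases."""
    i = 0
    W1 = w[i:i + n_hidden].reshape(1, n_hidden); i += n_hidden
    b1 = w[i:i + n_hidden]; i += n_hidden
    W2 = w[i:i + n_hidden].reshape(n_hidden, 1); i += n_hidden
    b2 = w[i]
    return W1, b1, W2, b2

def forward(w, X):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)        # hidden layer
    return (h @ W2).ravel() + b2    # linear output

def residuals(w):
    return forward(w, X) - y

def jacobian(w, eps=1e-6):
    """Finite-difference Jacobian of the residual vector w.r.t. the weights."""
    r0 = residuals(w)
    J = np.empty((r0.size, w.size))
    for k in range(w.size):
        wp = w.copy(); wp[k] += eps
        J[:, k] = (residuals(wp) - r0) / eps
    return J

# Levenberg-Marquardt iteration: solve (J^T J + mu I) dw = -J^T r
w = 0.5 * rng.standard_normal(3 * n_hidden + 1)
mu = 1e-2
for it in range(200):
    r = residuals(w)
    J = jacobian(w)
    g = J.T @ r
    dw = np.linalg.solve(J.T @ J + mu * np.eye(w.size), -g)
    if np.sum(residuals(w + dw) ** 2) < np.sum(r ** 2):
        w, mu = w + dw, max(mu / 10, 1e-12)  # accept step, move toward Gauss-Newton
    else:
        mu *= 10                              # reject step, move toward gradient descent
print("final sum of squared errors:", np.sum(residuals(w) ** 2))
```

The damping factor interpolates between Gauss-Newton behavior (small mu) and small gradient-descent steps (large mu); the factor-of-ten schedule here is one common but arbitrary choice.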



Author information

Corresponding author: Ke-Lin Du.


Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Du, K.-L., & Swamy, M. N. S. (2014). Multilayer Perceptrons: Other Learning Techniques. In: Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-5571-3_5

  • DOI: https://doi.org/10.1007/978-1-4471-5571-3_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5570-6

  • Online ISBN: 978-1-4471-5571-3

  • eBook Packages: Engineering, Engineering (R0)
