A case study of a distributed high-performance computing system for neurocomputing

https://doi.org/10.1016/S1383-7621(99)00017-X

Abstract

We model here a distributed implementation of cross-stopping, a combination of the cross-validation and early-stopping techniques, for the selection of the optimal architecture of feed-forward networks. Due to the very large computational demand of the method, we use the RAIN system (Redundant Array of Inexpensive workstations for Neurocomputing) as a target platform for the experiments and show that this kind of system can be used effectively for computationally intensive neurocomputing tasks.

Introduction

Artificial Neural Networks (ANNs) are an effective tool for Pattern Recognition (PR) tasks [4]. The classification rules underlying a typical PR problem are usually unknown, but ANNs are able to learn them from examples (i.e. a set of already classified samples called the training set). After learning, the most important performance measure of a neural network is its generalisation capacity or, in other words, its ability to correctly classify a new pattern according to the rules learnt on the training set.

Asymptotic results have demonstrated the optimal behaviour of neural networks in classification tasks [15]; unfortunately, in practice one is confronted with a limited data set, where the generalisation is far from optimal. A large amount of theoretical work, based on statistical learning theory, has been developed, and outstanding results have been derived (e.g. [16]). However, practitioners have found on many occasions that these bounds are overly pessimistic and that the behaviour of neural networks is often better than theory predicts.

A practical way to estimate the generalisation capacity of ANNs, and of many other PR systems, is to split the available examples into a training set and a test set: the generalisation error of the system, trained on the former, is estimated on the latter. Unfortunately, in many cases the available samples are too few, so a further subdivision inevitably involves a loss of efficiency in the system design. Furthermore, it has been demonstrated [10] that such an approach is very sensitive to the specific splitting of the data. Several other techniques have been designed to overcome this problem [11]; the cost of using such methods, however, is a higher computational requirement.

We focus on two practical methods for designing effective neural networks: k-fold cross-validation and early-stopping. The first is a method for estimating the generalisation capabilities of the network [8]: given n patterns, several networks are designed, each using a set of (n−k) patterns, and the average of the error rate on the k test patterns is computed; the network with the lowest average error is then selected and trained on the entire set. The case k=1 is the well-known leave-one-out method [13]. The second technique is used to avoid overfitting: learning is stopped at the minimum of the error on the test set, usually well before the minimum on the training set is reached. It has been argued [18] that this is equivalent to designing a network of smaller complexity and, therefore, of better generalisation ability. It is quite obvious that the computational load of cross-validation can be very high; in fact, several networks must be trained on different sets of (n−k) patterns.
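As an illustration of the estimate just described, the following Python sketch averages the error over disjoint blocks of k held-out patterns (k=1 giving leave-one-out); it is only a schematic rendering, where train_network and error_rate are hypothetical stand-ins for the actual training and evaluation routines:

```python
def cross_validation_error(patterns, targets, k, train_network, error_rate):
    """Train on (n-k) patterns, test on the k held out, and average the
    error rate over the disjoint test blocks, as described in the text."""
    n = len(patterns)
    errors = []
    for start in range(0, n, k):
        test_idx = set(range(start, min(start + k, n)))
        train_x = [p for i, p in enumerate(patterns) if i not in test_idx]
        train_y = [t for i, t in enumerate(targets) if i not in test_idx]
        test_x = [patterns[i] for i in sorted(test_idx)]
        test_y = [targets[i] for i in sorted(test_idx)]
        net = train_network(train_x, train_y)           # learn on n-k patterns
        errors.append(error_rate(net, test_x, test_y))  # error on the k left out
    return sum(errors) / len(errors)                    # estimated generalisation error
```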

We experiment with the combination of the two methods described above: cross-validation is used to estimate the performance of different network architectures stopped at different times during learning. We use the term cross-stopping for this procedure, by analogy with the term boot-stopping, used for the combination of bootstrap and early-stopping [17].

The purpose of cross-stopping is to find a network with an optimal architecture (i.e. number of hidden neurons), trained for an optimal number of steps, with respect to its estimated generalisation ability.
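The selection loop can be outlined as follows; this is a hedged sketch, not the paper's actual implementation. For every candidate hidden-layer size h, the validation error of each fold is recorded epoch by epoch, and the (h, epoch) pair with the lowest average error is returned (make_network, train_one_epoch and error_rate are placeholder names):

```python
def cross_stopping(folds, hidden_sizes, max_epochs,
                   make_network, train_one_epoch, error_rate):
    """Return the (h, stopping epoch) pair with the lowest average
    cross-validated error, together with that error estimate."""
    best = (None, None, float('inf'))
    for h in hidden_sizes:
        curves = []                                  # one error curve per fold
        for train_set, test_set in folds:
            net = make_network(h)
            curve = []
            for _ in range(max_epochs):
                train_one_epoch(net, train_set)
                curve.append(error_rate(net, test_set))
            curves.append(curve)
        for epoch in range(max_epochs):              # best stopping time for this h
            avg = sum(c[epoch] for c in curves) / len(curves)
            if avg < best[2]:
                best = (h, epoch + 1, avg)
    return best
```

The selected architecture and stopping time are then used to train the final network on the entire data set, as described above.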

In this paper, we describe a distributed implementation of cross-stopping using the RAIN system (Redundant Array of Inexpensive workstations for Neurocomputing), consisting of several workstations connected by a LAN, acting as a single parallel computer.

In the following section, we briefly describe the rationale behind the RAIN system. Section 3 details the cross-stopping algorithm as implemented on RAIN, and in Section 4 we provide a computing model for evaluating the performance of such systems. Section 5 shows some experimental results obtained on real-world problems.

The RAIN system

The need for high-performance computing in approaching statistical validation methods is quite obvious; in our case, there are at least two main reasons for resorting to this kind of distributed architecture as a computing platform:

  • the cross-validation phase is trivially parallel: it consists of several independent learning phases in which communication is usually negligible with respect to computation time, and data transfer, if any, occurs only at the end of each learning phase (a minimal sketch of this task farming follows the list);

  • the
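The first point can be made concrete with a small sketch. Here Python's multiprocessing module stands in for the message-passing layer (the reference list suggests PVM is used on RAIN), and train_and_score is a hypothetical worker that runs one complete, independent learning phase and reports its result only at the end:

```python
from multiprocessing import Pool

def train_and_score(task):
    """Hypothetical worker: one full learning phase for architecture h on
    one cross-validation fold; communicates only its final test error."""
    h, fold = task
    # ... independent training of a network with h hidden neurons ...
    error = 0.0                      # placeholder for the fold's test error
    return h, fold, error

if __name__ == '__main__':
    # one task per (architecture, fold) pair; all tasks are independent
    tasks = [(h, f) for h in range(1, 11) for f in range(10)]
    with Pool() as workers:          # one slave process per local processor
        results = workers.map(train_and_score, tasks)
```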

The MBP algorithm

Table 1 reports the MBP algorithm [2]. The weights and biases of the network are stored, respectively, in matrices WH, WO and vectors bH, bO. Input patterns are stored in matrix SI in row order and target patterns in matrix T. Matrices SH and SO contain the output of the corresponding layer when SI is applied to the input of the network, with f(·) as the activation function of the neurons. The back-propagated error is stored in matrices ΔO and ΔH, and the variations of weights and
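Table 1 is not reproduced in this excerpt, but the matrix formulation can be sketched with numpy. The fragment below is a minimal rendering of one back-propagation iteration under assumed choices, a sigmoid activation and a plain gradient step, i.e. without the momentum term and other refinements of the full MBP algorithm:

```python
import numpy as np

def f(x):
    # sigmoid activation (an assumption; f'(y) = y * (1 - y))
    return 1.0 / (1.0 + np.exp(-x))

def mbp_step(SI, T, WH, bH, WO, bO, eta=0.1):
    """One back-propagation iteration in matrix form, using the matrix
    names of the text: the whole training set SI is processed at once."""
    # forward pass: layer outputs for all Np patterns
    SH = f(SI @ WH + bH)                  # hidden-layer output
    SO = f(SH @ WO + bO)                  # output-layer output
    # backward pass: back-propagated errors (DeltaO, DeltaH in the text)
    dO = (T - SO) * SO * (1.0 - SO)
    dH = (dO @ WO.T) * SH * (1.0 - SH)
    # weight and bias variations (plain gradient step, no momentum)
    WO += eta * SH.T @ dO
    bO += eta * dO.sum(axis=0)
    WH += eta * SI.T @ dH
    bH += eta * dH.sum(axis=0)
    return SO
```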

A time-computing model for the distributed cross-stopping algorithm

The number of operations (n_MBPop) needed by the MBP algorithm has been computed in Ref. [3]. Using this result, we can write, for our particular case:

n_MBPop(h) = [2N_p(3N_O + 2N_I) + (3 + k_1 + k_2)N_p + 4(N_O + N_I) + 4] h + [(3 + k_1 + k_2)N_p N_O − N_p + 4N_O],

where the network is composed of N_I input neurons, 1 ≤ h ≤ N_h^max hidden neurons and N_O output neurons; the training set consists of N_p = n − k patterns, and k_1 and k_2 are, respectively, the number of operations needed for the computation of the activation function of the neurons and its derivative.
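Transcribing the count into code makes it easy to estimate per-task workloads, for instance when mapping learning phases onto heterogeneous machines; the function below is an illustrative transcription, not taken from the paper:

```python
def n_mbp_op(h, NI, NO, Np, k1, k2):
    """Operations per MBP iteration as a function of the hidden-layer
    size h, transcribing the formula above."""
    per_hidden = 2*Np*(3*NO + 2*NI) + (3 + k1 + k2)*Np + 4*(NO + NI) + 4
    constant = (3 + k1 + k2)*Np*NO - Np + 4*NO
    return per_hidden*h + constant
```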

Experimental results

The cluster used for our experiments is composed of seven heterogeneous machines: the master is an HP 9000/735, and the slaves are two IBM RS/6000 workstations (models 550 and 250) and four HP Pentium Pro personal computers. The three workstations are interconnected by an Ethernet 10Base2 LAN (10 Mb/s), whereas the four PCs use an HP 100VG-AnyLAN (100 Mb/s); the two LANs are interconnected through a bridge. Note that the raw computational power and cost of the seven computers are very different:

Conclusions

We have sketched an implementation of the cross-stopping method on a distributed high-performance computing system. We believe that computer-intensive statistical methods will gain more and more popularity thanks to the growing availability of low-cost, high-performance computing. The RAIN system has been developed to demonstrate the feasibility of an effective and low-cost approach to high-performance neurocomputing.

More information on RAIN and instructions on how to obtain the code are available

Acknowledgements

The RAIN project acts within the framework of the European Commission Esprit Programme: “Demonstration and assessment of HPCN in neural network applications for industry and medicine”. The Pentium Pro PCs were kindly donated by Hewlett-Packard Italy. The Lyme data set is courtesy of the “Centro Reumatologico Istituto Bruzzone”, Genoa, Italy. We thank the anonymous reviewers for their suggestions on how to improve the original manuscript.

References (18)

  • D. Anguita et al., An efficient implementation of BP on RISC-based workstations, Neurocomputing (1994).
  • D. Anguita et al., Mixing floating- and fixed-point formats for neural network learning on neuroprocessors, Microprocessing and Microprogramming (1996).
  • E.C. Anderson et al., Performance of LAPACK: A portable library of numerical linear algebra routines, Proc. IEEE (1993).
  • C. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, ...
  • A. Corana, C. Rolando, S. Ridella, A highly efficient implementation of back-propagation algorithm on SIMD computers, ...
  • A. Corana et al., Use of Level 3 BLAS kernels in neural networks: The back-propagation algorithm, Parallel Computing (1990).
  • A. Geist et al., PVM: Parallel Virtual Machine, a User's Guide and Tutorial for Networked Parallel Computing, The MIT ...
  • J.S. Hjorth, Computer Intensive Statistical Methods: Validation, Model Selection and Bootstrap, Chapman & Hall, London, ...
  • R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, in: Proc. IJCAI 1995, ...


Davide Anguita was born in Genoa in 1963. He obtained the “Laurea” degree in Electronic Engineering in 1989 and the Ph.D. in Electronic Engineering and Computer Science in 1994 from the University of Genoa. Since 1993 he has been working in the field of modelling, simulation and VLSI implementation of artificial neural networks. He is currently an assistant professor at the Department of Biophysical and Electronic Engineering of the University of Genoa.


Andrea Boni was born in Genoa, Italy, in 1969 and received the “Laurea” degree in Electronic Engineering from the University of Genoa in 1996. He is pursuing the Ph.D. degree in Electronic and Computer Science Engineering in the Electronic Systems Group of the Department of Biophysical and Electronic Engineering (DIBE) at the University of Genoa. In 1997 he worked as a research consultant with DIBE. His main scientific interests focus on the engineering of high-performance systems and neural networks.


Giancarlo Parodi was born in Genoa in 1948. He received the “Laurea” degree in Electronic Engineering in 1973 from the University of Genoa. He was Associate Professor of Applied Electronics at DIBE until 1994; currently he is Full Professor of Applied Electronics at the same Department. He is a member of AEI, AICA and IEEE. He is currently teaching the following courses:

  • Industrial electronics,

  • Applied electronics.
