USING NEURAL NETWORKS AND DEEP LEARNING ALGORITHMS IN ELECTRICAL IMPEDANCE TOMOGRAPHY

This paper discusses the use of Artificial Neural Networks and Convolutional Neural Networks in impedance tomography. Machine Learning methods can be used to teach computers to solve a variety of technical problems. Conventional artificial neural networks can be used efficiently in tomography and are able to visualize objects effectively. This work also presents a first step towards implementing Deep Learning methods in Electrical Impedance Tomography.


Introduction
An Artificial Neural Network (ANN) imitates the action of the human brain [3,4]. It consists of neurons, which are the counterparts of nerve cells. Individual neurons are interconnected to form a network. Neural networks have found wide use in modelling nonlinear, complex, and multi-dimensional data, and also in analysing experimental, industrial, and satellite data. Neural network methods have been applied successfully to X-ray tomography [6], electron tomography [1] and various other tomographic tasks [5]. For years researchers have tried to devise and combine different methods in order to obtain better results, and various new techniques have arisen in this way. An example of such an approach is the use of capacitance tomography to determine the number of fruit passing through an industrial process [10,11]. Neural networks are also utilized in optical tomography. For neural networks, it is important to choose the proper training method. In one such approach, the model for image reconstruction consists of a neural network trained within the Bayesian framework by maximizing the a posteriori probability. In order to solve the mixed binary and continuous optimization problem, a coupled gradient neural network was proposed. The optimization was realized by following the evolution of the neural network under a properly defined energy function [13]. Another area of ANN application is the investigation of anomalies in breast tissue using electrical impedance tomography supported by neural network algorithms [9].
The common feature of Machine Learning methods is that they can be used to teach computers in a manner analogous to how people learn by example. One of the Machine Learning techniques is Deep Learning, which is usually associated with Convolutional Neural Networks (CNN). The term "deep" distinguishes the CNN model from the shallow architecture of the well-known Artificial Neural Networks (ANN). In this context, deep learning takes place if the neural network structure contains so-called convolutional layers and pooling layers. Another difference is the number of hidden layers. While typical ANNs usually contain no more than 2-3 hidden layers, a CNN can have more than a hundred. CNNs can be used for applications such as:
• image classification, object detection, localization;
• face recognition;
• speech and natural language processing;
• medical imaging and interpretation;
• seismic imaging and interpretation, etc.

Determining the location of an object using a multilayer perceptron neural network
The efficient use of conventional artificial neural networks in tomography is possible, but the effectiveness of this tool depends on many conditions. First of all, artificial neural networks (ANN) are able to effectively visualize objects for which many parameters are already known. An example is the problem of determining the location of an object inside another substance (wall, dam, ground, etc.), which prevents standard visual identification.
If the number and size of objects in the area are known and the purpose of the tomographic process is to determine their location, the classic ANN can be used successfully. When the electrical tomography measuring system is based on 16 electrodes, the input signal vector (voltages) contains 208 values. These are real numbers that reflect the voltages between different combinations of pairs of electrodes. It should be noted that the 208-element vector of voltages refers to only one cross-section, which is insufficient for 3D objects. If a registered set of training records is available, consisting of pairs of input vectors and corresponding output vectors that determine the location and size of the object being recognized by the CT (computed tomography) scanner, a good solution is to use a multilayer perceptron in conjunction with a supervised training procedure (training with a teacher) [7,8,12].
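The figure of 208 values can be reproduced with a short count. The sketch below assumes an adjacent injection and adjacent measurement pattern, in which voltage pairs that share an electrode with the current-injecting pair are skipped. The paper does not name the measurement protocol, so this pattern and the function name `adjacent_measurement_pairs` are assumptions made for illustration only.

```python
def adjacent_measurement_pairs(n_electrodes=16):
    """Enumerate (injection pair, measurement pair) combinations.

    Assumption: current is injected through neighbouring electrodes and
    voltage is measured between neighbouring electrodes; measurement pairs
    that share an electrode with the injecting pair are skipped.
    """
    combos = []
    for i in range(n_electrodes):
        inject = (i, (i + 1) % n_electrodes)
        for m in range(n_electrodes):
            measure = (m, (m + 1) % n_electrodes)
            if set(inject) & set(measure):
                continue  # pair touches an injecting electrode
            combos.append((inject, measure))
    return combos


combos = adjacent_measurement_pairs(16)
print(len(combos))  # 16 injections x 13 measurements = 208
```

Under this assumption each of the 16 injections leaves 13 usable voltage pairs, which gives the 208-element input vector for one cross-section.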
The results of the research for two problem cases are presented below. In both cases it is assumed that the hidden object is round (spherical). The first case concerned an ANN used to determine the location and size (radius) of a single hidden object. The second case refers to a similar problem, but takes into account two objects located in the area covered by the CT scanner. Each of the above cases was considered in two variants. The first variant was based on positioning using Cartesian coordinates, while the second variant performed this task using polar coordinates.
IAPGOŚ 3/2017 p-ISSN 2083-0157, e-ISSN 2391-6761
In the Cartesian-coordinate variant for a single object, the input vector consisted of 208 measurements (1). Each element contained the voltage between a specified pair of electrodes.
$$X = [x_1, x_2, \ldots, x_{208}]^T \qquad (1)$$
The output vector contained three elements: the coordinates and the radius of the object being sought (2).
$$Y = [y_1, y_2, y_3]^T \qquad (2)$$
where: $y_1$ is the horizontal coordinate of the centre of mass of the object, $y_2$ is the vertical coordinate of the centre of mass of the object, and $y_3$ is the radius R of the cross-section of the spherical object.
Fig. 1 shows the schema of the applied neural network model. The network has 208 inputs, 10 neurons in the hidden layer and 3 neurons in the output layer. The hidden layer uses a logistic transfer function. In the output layer, the transfer function is linear.

Fig. 1. ANN structure
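The 208-10-3 structure described above can be sketched as a single forward pass. This is a minimal illustration, not the authors' implementation: the weights are random placeholders rather than trained values, and the names `logistic` and `mlp_forward` are hypothetical.

```python
import numpy as np


def logistic(z):
    """Logistic (sigmoid) transfer function used in the hidden layer."""
    return 1.0 / (1.0 + np.exp(-z))


def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass: logistic hidden layer followed by a linear output layer."""
    hidden = logistic(W1 @ x + b1)
    return W2 @ hidden + b2


# Placeholder (untrained) parameters with the layer sizes given in the text.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(10, 208))  # 208 inputs -> 10 hidden neurons
b1 = np.zeros(10)
W2 = rng.normal(scale=0.1, size=(3, 10))    # 10 hidden neurons -> 3 outputs
b2 = np.zeros(3)

x = rng.normal(size=208)                    # stand-in for one voltage vector
y = mlp_forward(x, W1, b1, W2, b2)
print(y.shape)  # (3,)
```

The three outputs correspond to the two coordinates and the radius of the object once the network has been trained on real measurement data.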
A dataset of 320 cases was used to train the neural network. The following results apply to the Levenberg-Marquardt training variant. This algorithm typically requires more memory but less time. Training stops automatically when generalization stops improving, as indicated by an increase in the mean squared error of the validation samples.
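For reference, the weight update usually associated with the Levenberg-Marquardt method can be written as below. The symbols are not taken from this paper and are introduced only for illustration: w is the vector of all network weights and biases, e is the vector of network errors over the training cases, J is the Jacobian of e with respect to w, I is the identity matrix, and μ is the damping parameter.

```latex
w_{k+1} = w_k - \left( J^{T} J + \mu I \right)^{-1} J^{T} e
```

For the network with 208 inputs, 10 hidden neurons and 3 outputs, w has 208·10 + 10 + 10·3 + 3 = 2123 elements, so the matrix J^T J has 2123 × 2123 entries. This gives a rough idea of why the method is described as memory-intensive.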
The training results of the best of the developed networks are presented in Fig. 2.

Fig. 2. ANN training results
The data set was divided into 3 parts: training set (224 cases), validation set (48 cases) and test set (48 cases). The highest Mean Squared Error (MSE), 0.00257, was obtained for the test set. A slightly smaller error of 0.00172 was noted for the validation set. MSE is the average squared difference between outputs and targets; lower values are better, and zero means no error. The training set had the lowest error, which is the most common and expected situation, because the network is best fitted to the cases it was trained on. Another indicator of network quality was the regression coefficient R. An R value of 1 means a close relationship, and 0 a random relationship. As can be seen in Fig. 2, R is close to 1 in all three cases. That this also holds for the test and validation sets is particularly valuable. Values close to 1 testify to a good match between the network outputs (output vectors) and the targets contained in the individual sets (training, validation, and test).
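The split and the two quality indicators described above can be sketched as follows. The paper does not state how cases were assigned to the three sets, so the random assignment here is an assumption. R is taken as the Pearson correlation between outputs and targets, which is also an assumption about how the coefficient was computed. All function names are hypothetical.

```python
import numpy as np


def split_indices(n_cases=320, n_val=48, n_test=48, seed=0):
    """Randomly split case indices into training, validation and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_cases)
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test


def mse(outputs, targets):
    """Average squared difference between outputs and targets."""
    diff = np.asarray(outputs, dtype=float) - np.asarray(targets, dtype=float)
    return float(np.mean(diff ** 2))


def regression_r(outputs, targets):
    """Pearson correlation between flattened outputs and targets."""
    o = np.ravel(np.asarray(outputs, dtype=float))
    t = np.ravel(np.asarray(targets, dtype=float))
    return float(np.corrcoef(o, t)[0, 1])


train, val, test = split_indices()
print(len(train), len(val), len(test))  # 224 48 48
```

Evaluating `mse` and `regression_r` separately on each index set gives the kind of per-set summary reported in Fig. 2.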
The validation set is used to determine when the training process stops. When the dynamics of the gradient change approach zero, the learning process ends. The test set is applied after the training phase and is used to verify the quality of the network. The results obtained by testing the network with the test set are the most reliable indicator of network efficiency, because the cases in this set do not participate in the training process. The good indicators (MSE and R) for the test set show that there was no overtraining and that the network is able to generalize knowledge (i.e., to correctly transform input into output not only for the training set).
Fig. 3 shows the correlation diagrams of the discussed network. Some scatter of results away from the targets is visible, but the level of correlation is still high. This is evidenced by the overlapping correlation lines for all studied cases: the training set, the validation set, the test set, and all sets together.
The gradient graph shows that the gradient had stabilized at an even level over the preceding epochs (the change dynamics was close to zero). At the analogous point in the graph of the parameter mu (the damping factor of the Levenberg-Marquardt algorithm), it can be seen that mu reaches its minimum in the 48th epoch. The last (bottom) curve of the graph shows the number of consecutive preceding epochs that did not improve the validation error. It was assumed that if the validation error does not fall over six consecutive epochs, the training process should be terminated. That is why the training process ended at the 48th epoch.
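The stopping rule described above (terminate after six consecutive epochs without a lower validation error) can be sketched as follows. The function name `early_stopping` and the sample error values are hypothetical and are not taken from the reported experiment.

```python
def early_stopping(val_errors, max_fail=6):
    """Return (stop_epoch, best_epoch), both 1-based.

    stop_epoch is the epoch at which max_fail consecutive epochs have
    passed without a new lowest validation error, or None if that never
    happens within the given sequence.
    """
    best = float("inf")
    best_epoch = None
    fails = 0
    for epoch, err in enumerate(val_errors, start=1):
        if err < best:
            best, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                return epoch, best_epoch
    return None, best_epoch


# Synthetic validation curve: improves for three epochs, then gets worse.
errors = [0.9, 0.5, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36]
print(early_stopping(errors))  # (9, 3)
```

In practice the weights from the best epoch, not the last one, would normally be kept as the final network.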

Determining the position of a single object by polar coordinates
As in the case of Cartesian coordinates, the input vector consisted of 208 measurements (1). The output vector contained three elements: the polar coordinates (angle α and the leading radius R) and the radius r determining the size of the object sought (Fig. 6). The structure of the neural network model is the same as in the previous case. The network has 208 inputs, 10 neurons in the hidden layer and 3 neurons in the output layer. The hidden layer uses a logistic transfer function. In the output layer, the transfer function is linear.
The output vector is given by relationship (3).
$$Y = [\alpha, R, r]^T \qquad (3)$$
A collection of 3210 cases was used to train the neural network. The Levenberg-Marquardt algorithm was used to train the network. The training results of the best of the developed networks are presented in Fig. 7.
The highest Mean Squared Error (MSE), 48.3, was found for the training set. A slightly smaller error of 44.4 was noted for the test set. The level of regression for all three sets was very high; for the test set R equals 0.98. A comparison of the R coefficients of the Cartesian variant (Fig. 3) with those of the polar variant shows some differences in the distribution of output values, but the regression is higher for the second variant.
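The two target encodings describe the same object position and can be converted into each other. The sketch below assumes that the origin lies at the centre of the cross-section and that the angle α is expressed in degrees, measured counter-clockwise from the positive horizontal axis. Neither convention is stated in the paper, and the function names are hypothetical.

```python
import math


def cartesian_to_polar(x, y):
    """Return (alpha_deg, R) for a point (x, y) relative to the origin."""
    R = math.hypot(x, y)
    alpha_deg = math.degrees(math.atan2(y, x)) % 360.0
    return alpha_deg, R


def polar_to_cartesian(alpha_deg, R):
    """Return (x, y) for an angle in degrees and a leading radius R."""
    a = math.radians(alpha_deg)
    return R * math.cos(a), R * math.sin(a)


alpha, R = cartesian_to_polar(0.3, -0.4)
x, y = polar_to_cartesian(alpha, R)
print(round(x, 6), round(y, 6))
```

Because MSE depends on the scale of the outputs, the error values of the two variants can be compared directly only if the targets are scaled in the same way. The R coefficient does not have this limitation.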

Tomographic imaging with the use of Deep learning
The ideal variant of tomographic imaging would convert a set of measured values into a high-resolution pixel map with a rich colour palette. Such a solution would make it possible to identify hidden objects accurately by tomographic reconstruction. This method does not require any preliminary assumptions about, for example, the number and shape of the identified objects. This kind of conversion is a difficult challenge, due to its high degree of complexity and the lack of obvious rules for converting input variables into an output image. To solve this problem, a model based on Convolutional Neural Networks (CNN), which belong to the relatively new field called Deep Learning, was applied [2].
In this example, it is assumed that the entire background image (cross-section of the sought object) consists of pixels with a constant value, such as zero. Each learning case (pattern of the output image) contains eight pixels with the same non-zero value. Pixel values correspond to the specific conductivity, which allows proper identification of the material of the hidden object under investigation. With this approach, non-zero pixels can create differentiated images on a uniform background. If the above problem can be solved (i.e., imaging is effective), the next step should be to differentiate the pixel values.
The input vector was the same 208-element set of measurements that was used in the previous examples (1). The output matrix is shown in Fig. 8. It is a 128-element set of real numbers taking the values 0 or 1.2. To simplify the calculation, it is assumed that each pattern contains eight pixels with values other than the background. In Fig. 9, we see two objects consisting of eight pixels in total (4+4).
Fig. 10 shows the structure used in the CNN experiment along with the parameters of the given layers. It consists of an input layer, three convolution layers, and an output layer. The convolution layers (1, 3 and 5) are separated by pooling layers. Fig. 11 shows the parameters of the individual layers, such as support, filter dimensions, stride, pad, etc. Fig. 12 shows the course of CNN training. The shape of the energy drop curve, corresponding to the gradient of the deviation from the pattern, indicates that the network is learning properly.
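A training target of the kind described above (a uniform zero background with eight non-zero pixels forming two four-pixel objects) can be sketched as follows. The 8 × 16 pixel grid, the 2 × 2 object shape, the placeholder pixel value of 1.0 and the function name `make_target` are all assumptions made for illustration. The paper specifies only that the output has 128 elements.

```python
import numpy as np


def make_target(blocks, shape=(8, 16), value=1.0):
    """Build a flattened target image with 2x2 objects on a zero background.

    blocks: list of (row, col) top-left corners of the 2x2 objects.
    shape and value are illustrative assumptions, not the paper's settings.
    """
    image = np.zeros(shape)
    for r, c in blocks:
        image[r:r + 2, c:c + 2] = value
    return image.ravel()


# Two non-overlapping objects of four pixels each (4+4).
target = make_target([(1, 2), (5, 10)])
print(target.size, np.count_nonzero(target))  # 128 8
```

Pairing each such target with its 208-element voltage vector gives one training case for a regression-type network.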

Remarks and conclusion
This paper presents two approaches to tomographic reconstruction. The first two sections refer to cases in which a multilayer perceptron was implemented. The results obtained show the high efficiency of common artificial neural networks when the number of network outputs is not high. The results are similar in both Cartesian and polar coordinates.
Section 3 presented a first step towards implementing Deep Learning methods in Electrical Impedance Tomography. The idea of the presented solution assumed that a convolutional neural network could convert a vector of electrical values into a vector (or matrix) representing the reconstructed image of the object scanned by the CT scanner. Convolutional networks are most commonly used in classification problems, but in this case the problem is one of regression. The model was based on Convolutional Neural Networks, which are a relatively new field of science. An open question that requires further investigation is how to determine the following CNN parameters:
• proper design of the fully connected layer, the last (output) layer of the network,
• adjustment of the number of CNN layers,
• selection of parameters of individual network layers (dimensions of filters, bias, stride, pad),
• dimensions of filters in different layers,
• number of channels and number of filters in each layer,
• Learning Rate parameter selection,
• choice of a condition for stopping the training process.
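When the parameters listed above are explored, it helps to track how filter size, stride and pad change the size of the data passing through the layers. The sketch below uses the standard output-size formula for convolution and pooling layers. The layer settings are hypothetical placeholders, not the values from Fig. 10 or Fig. 11, and treating the 208-element input as a one-dimensional signal is also an assumption.

```python
def conv_output_size(n, filter_size, stride=1, pad=0):
    """Output length of a convolution or pooling layer along one dimension."""
    return (n + 2 * pad - filter_size) // stride + 1


def trace_sizes(n, layers):
    """Apply conv_output_size for each (filter_size, stride, pad) in turn."""
    sizes = []
    for filter_size, stride, pad in layers:
        n = conv_output_size(n, filter_size, stride, pad)
        sizes.append(n)
    return sizes


# Hypothetical stack: convolution layers 1, 3 and 5 separated by pooling.
layers = [
    (5, 1, 2),  # layer 1: convolution
    (2, 2, 0),  # layer 2: pooling
    (5, 1, 2),  # layer 3: convolution
    (2, 2, 0),  # layer 4: pooling
    (3, 1, 1),  # layer 5: convolution
]
print(trace_sizes(208, layers))  # [208, 104, 104, 52, 52]
```

The size after the last layer determines the number of inputs of the fully connected output layer, which must then map to the 128 image elements.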