3.1 Data
A total of 496 mine water samples were collected in the Pingdingshan coal mine. Because the dataset is large, only a subset is shown in Table 2. As Table 2 makes clear, the measured concentrations span roughly five orders of magnitude. Features on such different scales are difficult to compare directly, and a feature with a large numeric range would dominate the training of a neural network. Data standardization removes this effect by ensuring the data are internally consistent, that is, by putting every feature on a common scale with zero mean and unit variance. Each raw feature is therefore standardized individually according to Eq. (1):
$$Z_{ij} = \frac{x_{ij} - \mathrm{mean}(x_j)}{\mathrm{std}(x_j)} \qquad (1)$$
where the subscript i indexes the rows of the data matrix (samples), the subscript j indexes the columns (features), Z_ij is the standardized value, x_ij is the raw value, and mean and std denote the mean and standard deviation of the corresponding column [10].
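As an illustration, the following minimal NumPy sketch applies Eq. (1) column by column. It is not the authors' code; the small array `X` is built from the first three rows of Table 2 merely to make the example runnable.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score standardization as in Eq. (1)."""
    mean = X.mean(axis=0)      # mean(x_j) for each column j
    std = X.std(axis=0)        # std(x_j) for each column j
    return (X - mean) / std    # Z_ij = (x_ij - mean(x_j)) / std(x_j)

# Example: three samples of the six ion concentrations (mg/L), from Table 2
X = np.array([[284.16, 13.03, 7.05, 31.56, 4.94, 768.84],
              [68.31, 57.72, 23.35, 18.08, 94.14, 329.50],
              [27.37, 143.08, 19.68, 36.16, 179.15, 314.25]])
Z = standardize(X)             # each column now has zero mean and unit variance
```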
Table 2
Hydrochemical compositions and discriminant results of the water-filling aquifers (unit: mg/L). In the last column, the groundwater type (label column), 0 represents surface water, 1 represents pore water of the Quaternary, 2 represents karst water of the Carboniferous limestone, 3 represents sandstone water of the Permian, and 4 represents karst water of the Cambrian limestone.
Na+ + K+ | Ca2+ | Mg2+ | Cl− | SO42− | HCO3− | Groundwater type
284.16 | 13.03 | 7.05 | 31.56 | 4.94 | 768.84 | 3
68.31 | 57.72 | 23.35 | 18.08 | 94.14 | 329.5 | 3
27.37 | 143.08 | 19.68 | 36.16 | 179.15 | 314.25 | 2
29.21 | 179.36 | 25.64 | 58.49 | 208.93 | 386.26 | 2
40.1 | 80.59 | 11.08 | 31.75 | 80.05 | 257.92 | 2
18.03 | 82.56 | 10.21 | 17.33 | 40.35 | 268.48 | 2
12.65 | 75.59 | 11.55 | 18.53 | 32.51 | 248.96 | 2
15.85 | 86.11 | 8.52 | 16.97 | 39.31 | 253.099 | 2
18.4 | 78.6 | 12 | 12 | 21 | 295.9 | 2
5.52 | 80.7 | 10.6 | 8.5 | 14.3 | 271.2 | 2
2.69 | 98.4 | 6.6 | 28.4 | 18.1 | 268.49 | 2
5.28 | 79.65 | 14.99 | 9.46 | 8.82 | 82.02 | 2
0.14 | 93.29 | 15.95 | 14.3 | 44.45 | 225.75 | 2
7.43 | 112.44 | 16.68 | 25.75 | 80.67 | 275.37 | 2
37.03 | 91.98 | 49.82 | 59.56 | 107.11 | 389.92 | 2
77.05 | 94.59 | 31.71 | 36.51 | 244.47 | 278.25 | 2
83.72 | 169.14 | 36.21 | 63.46 | 303.03 | 423.48 | 2
38.87 | 82.16 | 18.1 | 30.13 | 101.34 | 263.61 | 4
133.57 | 44.09 | 19.46 | 86.55 | 47.86 | 379.18 | 4
246.07 | 41.21 | 30.3 | 63.48 | 218.37 | 430.97 | 4
246.07 | 41.21 | 30.3 | 63.48 | 218.37 | 430.97 | 4
234.01 | 48.22 | 30.92 | 64.38 | 277.47 | 433.91 | 4
31.17 | 89.04 | 10.98 | 12.94 | 31.38 | 345.67 | 0
20.17 | 148.8 | 24.46 | 70.08 | 35.78 | 454.57 | 0
32.34 | 50 | 3.76 | 10.78 | 41.46 | 184.67 | 0
35.19 | 68.25 | 7.46 | 15.28 | 51.35 | 246.23 | 0
10.28 | 72.95 | 10.21 | 10.99 | 19.21 | 241.63 | 1
16.74 | 94.37 | 11.1 | 8.55 | 11.73 | 348.41 | 1
452.1 | 12.4 | 5.96 | 116.42 | 86.16 | 922 | 2
303.51 | 8.02 | 2.43 | 51.06 | 5.76 | 629.7 | 2
119.6 | 87.17 | 22.72 | 60.26 | 179.15 | 365.51 | 0
110.86 | 119.64 | 27.22 | 114.86 | 352.54 | 149.5 | 0
47.59 | 75.55 | 14.34 | 28.61 | 90.96 | 257 | 0
11.57 | 30.32 | 6.93 | 6.2 | 21.4 | 118.98 | 0
52.9 | 201.8 | 24.18 | 62.39 | 298.75 | 389.31 | 1
100.97 | 114.63 | 16.65 | 64.52 | 310.75 | 194.65 | 1
29 | 87.06 | 8.04 | 9.8 | 72.32 | 266.98 | 1
809.6 | 4.8 | 15.07 | 151.02 | 143.12 | 1796.42 | 3
1109.73 | 15.63 | 8.88 | 94.66 | 24.98 | 2498.77 | 3
1036.85 | 10.42 | 1.22 | 64.17 | 4.8 | 2288.25 | 3
284.16 | 13.03 | 7.05 | 31.56 | 4.94 | 768.84 | 3
321.34 | 3.8 | 1.38 | 81.52 | 21.13 | 599.42 | 3
68.31 | 57.72 | 23.35 | 18.08 | 94.14 | 329.5 | 3
158.24 | 167.13 | 173.4 | 30.49 | 1335.23 | 52.48 | 3
31.27 | 76.35 | 11.07 | 21.27 | 46.5 | 277.03 | 3
14.02 | 76.18 | 12.64 | 19.01 | 40.33 | 248.96 | 3
14.25 | 73.21 | 13.98 | 18.53 | 39.09 | 249.57 | 3
27.37 | 143.08 | 19.68 | 36.16 | 179.15 | 314.25 | 2
29.21 | 179.36 | 25.64 | 58.49 | 208.93 | 386.26 | 2
In the dataset, the label column is categorical (string values). These labels have no inherent order, and since they are strings, a deep learning model cannot operate on them directly [11]. One way to handle this is label encoding, in which each label is assigned an integer; for example, surface water and pore water of the Quaternary are mapped to 0 and 1. However, this can introduce bias into the model: because 1 > 0, the model may implicitly prefer pore water of the Quaternary over surface water, even though both labels are equally important. To avoid this, we use one-hot encoding, which represents each label as a binary vector of length 5. For example, the label 'surface water', integer-encoded as 0, becomes the binary vector [0,0,0,0,1], as shown in Table 3 (a minimal code sketch follows the table).
Table 3
Natural number | One-hot encoding
0 | 0,0,0,0,1
1 | 0,0,0,1,0
2 | 0,0,1,0,0
3 | 0,1,0,0,0
4 | 1,0,0,0,0
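Note that the mapping in Table 3 places the one-hot bit at position 4 − label (reading left to right), rather than at the position equal to the label, as is more common. The following minimal NumPy sketch reproduces exactly this mapping; the function name `one_hot` is ours, not from the original paper.

```python
import numpy as np

def one_hot(labels, num_classes=5):
    """Encode integer labels as binary vectors following Table 3,
    where label 0 -> [0,0,0,0,1] and label 4 -> [1,0,0,0,0]."""
    vectors = np.zeros((len(labels), num_classes), dtype=int)
    for row, label in enumerate(labels):
        vectors[row, num_classes - 1 - label] = 1  # reversed position, per Table 3
    return vectors

print(one_hot([0, 3]))  # [[0 0 0 0 1]
                        #  [0 1 0 0 0]]
```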
3.2 Deep learning basics
A machine learning algorithm is an algorithm that is able to learn from data. Most modern deep learning models are based on artificial neural networks (ANNs), a class of supervised learning techniques that mimic biological neural networks and form the basis of most deep learning methods (Fig. 3). An ANN is built from one or more layers, each containing a series of neurons [12]. The weights and biases between neurons are adjusted as learning proceeds, with the aim of minimizing the loss between the predicted output and the actual output. Training an ANN is thus the process of adjusting its weights and biases, which is carried out by the backpropagation procedure: the gradient descent algorithm updates the weights and biases by estimating the gradient of the loss function with respect to them. During training, each weight and bias receives an adjustment proportional to the partial derivative of the loss function with respect to its current value. As the number of layers increases, however, the vanishing gradient problem makes ANNs hard to train [13].
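The update rule described above can be written in a few lines. The sketch below shows a single gradient descent step; the variable names are illustrative, and the learning rate of 0.01 is taken from Table 4.

```python
learning_rate = 0.01  # step size, as in Table 4

def gradient_step(weights, gradients):
    """One gradient descent update: each parameter moves against the
    gradient of the loss, scaled by the learning rate (w <- w - lr * dL/dw)."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]
```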
Typically, when training an ANN model, we have access to a training set; we can compute some error measure on this set, called the training error, and we reduce that error. Thus far, what we have described is simply an optimization problem; what distinguishes machine learning is that we also want the error on new, unseen data to be low. The training and test data are assumed to be generated by the same probability distribution over datasets [14].
3.3 Deep learning architectures
Deep learning is a subset of machine learning built on artificial neural networks with many layers. The idea is that the additional levels of abstraction improve the network's ability to generalize to unseen data, so that it outperforms a traditional ANN on data outside the training set. The learning process is 'deep' because the network consists of an input layer, an output layer, and multiple hidden layers. Each layer contains units that transform the input data into information that the next layer can use for a given predictive task [15].
While indisputably powerful tools, traditional artificial neural networks (ANNs) and more classical machine learning techniques rely on developers identifying the features that best describe the problem. In this work, a deep learning approach is applied to the problem of discriminating the source of mine water inrush. Deep learning further exploits the power of ANNs by relying on the network itself to identify, extract, and combine the inputs into abstract features that carry far more pertinent information for solving the problem, that is, predicting the output, as illustrated in Fig. 4. The six input features are the concentrations of Na+ + K+, Ca2+, Mg2+, Cl−, SO42−, and HCO3−. Every neuron receives inputs from the neurons in the previous layer and transforms them through a linear or nonlinear activation function (e.g., ReLU). The six ion concentrations are propagated from the input layer to the output layer, whose units correspond to the classes to be predicted: surface water, pore water of the Quaternary, sandstone water of the Permian, karst water of the Carboniferous limestone, and karst water of the Cambrian limestone.
An ANN with three hidden layers and one output layer is shown in Fig. 5. Every layer constitutes a module through which gradients can be back-propagated. At each layer, we first compute the total input z to every unit, a weighted sum of the outputs of the units in the layer below; a nonlinear function f is then applied to z to obtain the unit's output. For simplicity, the bias terms are omitted. The nonlinearity used in the hidden layers is the rectified linear unit (ReLU), f(z) = max(0, z). At the output layer, the softmax function, which has become standard in recent years, is used to compute the probability of each water source [16].
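The forward pass just described can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the list of weight matrices `Ws` is assumed to be given, and biases are omitted as in the text.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

def forward(x, Ws):
    """Forward pass: ReLU in the hidden layers, softmax at the output.
    Returns the activations of every layer, input included."""
    activations = [x]
    for W in Ws[:-1]:
        z = W @ activations[-1]    # total input: weighted sum of layer below
        activations.append(relu(z))
    y = softmax(Ws[-1] @ activations[-1])  # class probabilities (5 sources)
    activations.append(y)
    return activations
```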
During backpropagation, at every hidden layer we compute the error derivative with respect to the output of each unit, which is a weighted sum of the error derivatives with respect to the total inputs of the units in the layer above. This output derivative is then converted into the derivative with respect to the unit's input by multiplying it by the gradient of f. At the output layer, the error derivative with respect to a unit's output is obtained by differentiating the cost function; for unit l with cost ½(y_l − t_l)², where t_l is the target value, this gives y_l − t_l. Once ∂E/∂z_k is known, the error derivative for the weight w_jk on the connection from unit j in the layer below is simply y_j ∂E/∂z_k.
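Continuing the forward-pass sketch above, the following sketch implements these backpropagation formulas. One assumption to note: at the output layer it uses the common simplification that, with a softmax output and a cross-entropy cost, ∂E/∂z reduces to y − t, which matches the y_l − t_l form derived in the text; ReLU's gradient is 1 where the input was positive and 0 elsewhere.

```python
def backward(activations, Ws, t):
    """Backpropagate error derivatives; returns dE/dW for each layer.
    activations: outputs of forward(); t: one-hot target vector."""
    grads = []
    delta = activations[-1] - t    # dE/dz at output (softmax + cross-entropy)
    for i in range(len(Ws) - 1, -1, -1):
        grads.append(np.outer(delta, activations[i]))  # dE/dw_jk = y_j * dE/dz_k
        if i > 0:
            d_out = Ws[i].T @ delta               # dE/dy: weighted sum from above
            delta = d_out * (activations[i] > 0)  # times ReLU gradient -> dE/dz
    return grads[::-1]
```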
The Python deep learning library Keras, with a TensorFlow backend and GPU acceleration, is used to train the ANN. TensorFlow is an end-to-end open-source platform for machine learning, with a comprehensive, flexible ecosystem of tools, libraries, and community resources that allows researchers to push the state of the art in deep learning and lets developers easily build and deploy DL-powered applications. The model parameters of the DNN model are listed in Table 4; a minimal Keras sketch of a model with these parameters follows the table.
Table 4
Model parameters of the intelligent evaluation of the DNN model

Number | Parameter | Value
1 | Type of model | Sequential model
2 | Number of neurons in the input layer | 6
3 | Number of hidden layers, neurons per layer | 3, 5
4 | Number of neurons in the output layer | 5
5 | Activation function of hidden layers | ReLU
6 | Activation function of output layer | Softmax
7 | Epochs | 200
8 | Learning rate | 0.01
9 | Optimizer function | Adam
10 | batch_size | 10
11 | Dropout rate | 0.5
12 | Error limitation | 1 × 10⁻⁴
13 | Momentum coefficient | η = 0.8
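As referenced above, the following is a minimal Keras sketch of a Sequential model configured with the parameters in Table 4. It is one plausible reading, not the authors' code: the table does not specify where the dropout layers sit, which loss function is used (categorical cross-entropy is assumed, as is standard with softmax outputs and one-hot labels), or how the error limitation and momentum coefficient are wired in; `X_train`/`y_train` are placeholders for the standardized features and one-hot labels.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequential model: 6 inputs, three hidden layers of 5 ReLU units,
# dropout rate 0.5, and a 5-unit softmax output (one per water source).
model = keras.Sequential([
    layers.Input(shape=(6,)),
    layers.Dense(5, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(5, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(5, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(5, activation="softmax"),
])

# Adam optimizer with learning rate 0.01 (Table 4); cross-entropy loss
# assumed for the one-hot encoded labels.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(X_train, y_train, epochs=200, batch_size=10)  # per Table 4
```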