Interpretable Optimization Training Strategy-Based DCNN and Its Application on CT Image Recognition



Introduction
Computed tomography (CT) images play an important role in the detection of pulmonary nodules and have become an important auxiliary means for doctors to diagnose and treat lung cancer [1]. Modern imaging techniques help doctors better detect nodules in the lung parenchyma and improve patients' chances of survival. How to use advances in science and technology to achieve intelligent image detection and recognition has become a topic of wide concern [2].
Based on the above requirements for the detection and recognition of pulmonary nodules, many researchers are committed to realizing adaptive detection and recognition of pulmonary nodules. Zhu and Liu [2] proposed a computer-aided detection algorithm for the automatic detection of pulmonary nodules in CT images. Shi et al. [3] proposed a method for detecting pulmonary nodules in low-dose CT images based on the convolutional neural network. Sun et al. [4] proposed an automatic detection algorithm for pulmonary nodules based on deep learning using the threshold method, region growth algorithm, and morphological processing. Li et al. [5] proposed an automatic detection method for pulmonary nodules based on a target detection algorithm, together with a pulmonary parenchyma CT image processing pipeline that combines threshold segmentation and digital morphology processing. Xi and Liu et al. [6], based on the deep convolutional neural network model, discussed the influence of lung nodule images of different scales and modes on model classification performance and proposed a 2D multiview fusion lung image processing method. The above methods or algorithms can achieve a certain detection and recognition effect on their respective datasets, but they still rely on expert experience or are poorly interpretable. Khan et al. [7] proposed a deep learning framework to support the automated detection of lung nodules in CT images. In their work, a deep learning framework named VGG-SegNet was used to mine the deep features, and these features were then serially concatenated with handcrafted features, such as the Grey Level Co-Occurrence Matrix (GLCM), Local Binary Pattern (LBP), and Pyramid Histogram of Oriented Gradients (PHOG), to enhance the disease detection accuracy. Sahlol et al. [8] proposed a novel method for detecting tuberculosis in chest radiographs using artificial ecosystem-based optimization of deep neural network features. Połap et al. [9] presented research results on the application of a heuristic method for the detection of over-aggregated X-ray images that come from implemented segmentation. The above methods obtain good verification results on their respective experimental datasets, but there is still room for improvement. Most of them optimize the feature extraction process to obtain better detection and recognition results, but often ignore the influence of input sample quality and model training strategy on the recognition results. In most practical engineering scenarios, the quality and quantity of the labeled samples available for training are limited; hence, how to detect and recognize unknown samples with limited effective samples also deserves attention. Obtaining a higher detection and recognition rate with intelligent detection methods under these conditions is the focus of this paper.
Affected by instrument sensitivity and the transmission channel, CT images are inevitably corrupted by noise, which greatly affects doctors' judgment of lesions and may lead to wrong or missed detection [10]. According to the literature, CT images of pulmonary nodules are often affected by impulse noise [11]. At present, the mainstream noise reduction algorithms mainly include wavelet noise reduction, Gaussian filtering, and other filtering algorithms [12]. Taken as a whole, the wavelet denoising procedure is complicated and cumbersome, which is not conducive to online diagnosis. Although the Gaussian filter has a relatively ideal filtering effect on Gaussian noise, its effect on impulse noise is poor. Other filtering algorithms, such as mean filtering and Wiener filtering, are also limited by computational cost and noise reduction ability; thus, they cannot serve as a strong universal noise reduction algorithm. Different from the above denoising algorithms, the median filter algorithm can overcome the image blurring caused by linear filters and the high computational cost of nonlinear filters, and it has a relatively ideal denoising effect on the impulse noise in CT images [13].
With the significant improvement of computer performance, the deep learning algorithm [14], as an intelligent detection and recognition algorithm, performs well in a growing number of scenarios. As one of the typical representatives of deep learning algorithms, the Deep Convolutional Neural Network (DCNN) can adaptively extract features from images and perform pattern recognition based on its classifier [15]. The DCNN method avoids manual extraction of feature indexes and manual classifier design to the maximum extent, thus effectively improving detection efficiency and intelligence [16]. However, as with most methods requiring model training, the quality of DCNN model training has a great influence on the detection and recognition effect [17]. To obtain an ideal training model, the existing labeled training samples are generally randomly dispersed and input into the constructed DCNN model in multiple batches for training [18]. Han et al. [19] obtained a recognition model with better detection performance by training the model in multiple stages. Zhang et al. [20] realized pattern recognition tasks under limited samples by dispersing the training samples and reducing the capacity of each single batch. However, dispersing the samples and inputting them into the training model in multiple batches is not highly explainable and lacks a rigorous theoretical derivation. Therefore, the existing DCNN algorithm presents two major problems in the model training process: First, what is the theoretical basis for the random dispersion of samples? Second, how can the number of sample input batches be scientifically quantified? The deep learning algorithm has faced the "black box" problem since it was put forward. Because of this "black box" effect, the model training problems encountered by the DCNN algorithm in engineering practice have troubled many researchers.
Therefore, aiming at the lack of theoretical guidance in the DCNN model training strategy and at the question of how to scientifically quantify training batches, this paper theoretically derives why samples should be randomly dispersed and further proposes a method of scientifically quantifying the sample input batches. Combined with the above sample pretreatment methods, this paper preprocesses images with the median filtering algorithm and then uses DCNN under the guidance of interpretable training to detect and identify the pulmonary nodules in CT images, so as to comprehensively verify the effectiveness of the proposed method and training strategy.

Image Preprocessing
Because the median filtering algorithm can take into account both the efficiency of the linear filter and the accuracy of the nonlinear filter, it is used to preprocess each sample before the sample is input into the DCNN detection model for training or testing, so as to effectively reduce the sample noise and enhance the details. The median filtering process is shown in Figure 1. Firstly, the median value of the local pixels in the image is obtained, and then the local correlation is used to reset the pixels so as to effectively remove the scattered salt and pepper noise. The specific steps are as follows:
(1) Reorder the local pixels according to their values.
(2) Select the median value of the pixel sequence as the new pixel value.
(3) Move the median filter window and repeat steps (1) and (2) until the whole image has been traversed and denoised.
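The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the window size of 3 × 3, the function name median_filter, and the border handling (using only the neighbours that fall inside the image) are not specified in the paper and are chosen here for brevity.

```python
def median_filter(image, size=3):
    """Apply a size x size median filter to a 2D list of pixel values.

    Assumption: border pixels use only the neighbours inside the image;
    the paper does not state how borders are handled.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    rows, cols = len(image), len(image[0])
    output = [[0] * cols for _ in range(rows)]
    # Step 3: slide the window over every pixel of the image.
    for r in range(rows):
        for c in range(cols):
            # Step 1: collect the local pixels and reorder them by value.
            window = sorted(
                image[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            )
            # Step 2: the middle element becomes the new pixel value
            # (the upper middle one when a border window has even length).
            output[r][c] = window[len(window) // 2]
    return output


# A flat region with one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
denoised = median_filter(noisy)
```

In this toy image both impulse pixels are replaced by the surrounding value, because an isolated outlier can never be the median of its neighbourhood.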

Deep Convolutional Neural Network (DCNN)
DCNN enhances image features through the convolution operation and performs down-sampling based on the local correlation of images to achieve fast dimension reduction of samples. Figure 2 shows the basic internal calculation process of DCNN. Firstly, the denoised image is traversed and convolved with convolution kernels $f_x$ of different scales. Secondly, the bias $b_x$ is added to the convolution result to obtain the enhanced feature map $c_x$, which uses the local correlation principle for noise reduction and feature enhancement. Thirdly, the down-sampling layer is adjusted by the weight $\omega_x$ and the bias $b_{x+1}$. Finally, the image features $S_{x+1}$ are obtained through the Sigmoid activation function. A typical DCNN repeats these convolution and down-sampling processes several times to mine the input samples deeply and obtain the image feature information adaptively, and then provides detection and recognition results through its classifier. As in a conventional neural network, the training process of the DCNN model includes a forward pass that computes the output and a backward pass that adjusts the parameters, and the two are combined to minimize the model error. Let the sample set composed of $m$ samples be $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, belonging to $n$ categories, where $y^{(i)}$ is the category label of sample $x^{(i)}$. The training objective function $J(\omega, b)$ of the DCNN model is given by Equation (1), in which $\omega$ and $b$ have the same meanings as in Figure 2 and $h_{\omega,b}(x^{(i)})$ denotes the detection and recognition result. The gradient descent method is used to minimize $J(\omega, b)$ through the iterative formulas (2) and (3), in which $\alpha$ is the learning rate. The partial derivatives in (2) and (3) are obtained with the BP algorithm.
Firstly, forward propagation is carried out to calculate the output value $h_{\omega,b}(x^{(i)})$ of the last layer of the network. The gap between this predicted value and the actual label is then calculated and defined as $\delta_i^{(nl)}$ (where $nl$ denotes the output layer). Then, the residual of each layer is obtained by propagating the residual of the output layer backward, which yields the partial derivatives in (2) and (3). In the residual formula of the last layer of the traditional neural network, $Z_i^{(l)}$ is the weighted sum of the inputs of unit $i$ at layer $l$, and $Z_i^{(nl)}$ is the weighted sum of the inputs of unit $i$ at the last layer.
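The displayed equations referred to in this section are not reproduced in the text. For orientation, a standard squared-error formulation that is consistent with the symbols defined above is sketched below. It is an assumption, not necessarily the exact form used in this paper (for example, a weight-decay term may also be present); the activation $a_i^{(nl)} = f(Z_i^{(nl)})$ with activation function $f$ is notation introduced here.

```latex
\begin{align}
J(\omega, b) &= \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2}
  \left\| h_{\omega,b}\!\left(x^{(i)}\right) - y^{(i)} \right\|^{2}
  && \text{(objective, cf. (1))} \\
\omega &:= \omega - \alpha \, \frac{\partial J(\omega, b)}{\partial \omega}
  && \text{(weight update, cf. (2))} \\
b &:= b - \alpha \, \frac{\partial J(\omega, b)}{\partial b}
  && \text{(bias update, cf. (3))} \\
\delta_i^{(nl)} &= \frac{\partial}{\partial Z_i^{(nl)}} \,
  \frac{1}{2} \left\| y - h_{\omega,b}(x) \right\|^{2}
  = -\left( y_i - a_i^{(nl)} \right) f'\!\left( Z_i^{(nl)} \right)
  && \text{(output-layer residual)}
\end{align}
```

Under this form, the residual of the output layer is the prediction error scaled by the slope of the activation function, and the residuals of earlier layers follow by the chain rule.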

Explainable Training Strategies for DCNN
As an intelligent image detection algorithm, the image recognition performance of DCNN depends on whether the DCNN model acquired after training has a strong enough generalization ability, i.e., whether it can achieve ideal recognition results for images with different features. As mentioned above, in order to obtain a DCNN model with good detection and recognition ability, researchers often adopt the training strategy of randomly dispersing the training samples and inputting them in multiple small-capacity batches. But two thorny problems remain: First, the random dispersion of samples lacks a theoretical basis; second, the method for setting the sample training batches is not clear. Next, this paper focuses on the detection and recognition mechanism of DCNN and explains the first problem through mathematical derivation. For the second problem, a scientific quantitative method of batch division is proposed. For convenience, the model acquired by training on each batch is defined as a single-batch Deep Convolutional Neural Network (BDCNN). The DCNN model obtained by image sample training can be expressed as $p(x_k, \theta_k)$, $k = 1, 2, \ldots, K$, where $p(\cdot)$ is the detection and recognition model; $x_k \in X$; $X$ is the total image training sample set; $\mathrm{size}(X) = K \cdot n$; $K$ is the number of input batches, i.e., the number of BDCNNs obtained from the image samples; $n$ is the sample capacity of a single batch; and $\theta_k$ is the set of model parameters obtained by training on a single batch. The generalization error $PE^*$ of the DCNN model is defined in Equation (5), in which $P_{X,Y}(\cdot)$ is the probability function, $I(\cdot)$ is the discriminant function, $av_k(\cdot)$ is the mean function, $Y$ is the correct label set, and $J$ is the misclassified label set. The generalization error can be used to measure the image detection and recognition performance of the DCNN model.
Conclusion to be proved: The generalization error of the DCNN training model is positively correlated with the correlation between BDCNNs and negatively correlated with the detection and recognition performance of BDCNN. Before the proof, five definitions or property theorems are stated:
(1) BDCNN correlation: the correlation between the DCNN models acquired by single-batch training. Its expression is described in detail in the proof below.
(2) Detection and recognition performance of BDCNN: the sample recognition ability of the DCNN model acquired by single-batch training. Its mathematical expression is given in the derivation below.
(3) Convergence almost everywhere [21]: Let $\xi$ and $\{\xi_n\}$ be random variables defined on the probability space $(\Omega, F, P)$. If there is a null set $\Omega_0 \in F$ with $P(\Omega_0) = 0$ such that $\xi_n(\omega) \to \xi(\omega)$ for all $\omega \in \Omega \setminus \Omega_0$, then $\xi_n$ converges almost everywhere to $\xi$, written $\xi_n \xrightarrow{a.s.} \xi$.
(4) Borel's strong law of large numbers [22]: Suppose $\{\xi_n\}$ is a sequence of independent and identically distributed random variables on the probability space $(\Omega, F, P)$ with finite expectation; then $\frac{1}{n} \sum_{i=1}^{n} \xi_i \xrightarrow{a.s.} E\xi_1$.
(5) Chebyshev's inequality [23]: For a random variable $x$ whose expectation $Ex$ and variance $Dx$ exist, $P(|x - Ex| \geq \varepsilon) \leq Dx / \varepsilon^2$ for all $\varepsilon > 0$.
From these, the following conclusion can be obtained: when the number of BDCNNs $K \to \infty$, the generalization error in Equation (5) converges almost everywhere to the limit given in Equation (6).
Proof process: By property (3), comparing Equation (5) with Equation (6) shows that, to prove Equation (6), it suffices to prove that for any $j$ there exists a null set $C$ in the value space of $(\theta_1, \theta_2, \ldots, \theta_k, \ldots)$ such that the convergence in Equation (7) holds for all samples $x$ outside $C$. For limited training and test sample sets, the sample sets $\{x : p(x, \theta) = j\}$ that can arise from the training batches are limited in number. Let these sets be $S_1, S_2, \ldots, S_R$, where $R$ is a finite number; for a set with a total of $N$ samples, $R \leq 2^N$. Define $\varphi(\theta_k) = r$ when $\{x : p(x, \theta_k) = j\} = S_r$, and let $K_r$ be the number of times that $\varphi(\theta_k) = r$ among the $K$ BDCNNs, where $k = 1, 2, \ldots, K$; then Equation (8) holds. In Equation (8), the left and right sides both represent the number of BDCNNs that misclassify. When $K \to \infty$, Borel's strong law of large numbers gives the almost-everywhere limit of $K_r / K$. Therefore, for any $j$, there exists a null set $C$ in the value space of $(\theta_1, \theta_2, \ldots, \theta_k, \ldots)$ such that the required convergence holds for all samples $x$ outside $C$. Thus, (6) is deduced.
Figure 2: Convolution and down-sampling process of the DCNN (panels: image, convolution process, down-sampling process).
Next, the margin function $mr(X, Y)$ of DCNN is defined. The margin function represents the degree to which the number of correctly classifying BDCNNs exceeds the number of misclassifying BDCNNs during model training. The larger the value of $mr(X, Y)$, the higher the confidence of the DCNN model obtained by training. The margin function $mr(X, Y)$ can then be rewritten in terms of the function $rmg(\theta, X, Y)$.
It has been proved that $PE^* \xrightarrow{a.s.} P_{X,Y}(mr(X, Y) < 0)$, so the upper bound of the generalization error $PE^*$ can be obtained by analyzing $P_{X,Y}(mr(X, Y) < 0)$. To show that the detection and recognition results of the DCNN model are credible, the expectation $E_{X,Y}\, mr(X, Y)$ is introduced; $E_{X,Y}\, mr(X, Y) > 0$ represents the expected confidence of DCNN in the classification result of each sample. Chebyshev's inequality then gives an upper bound on $PE^*$. The detection and recognition capability of BDCNN is defined as $s$, and the average correlation between BDCNNs is defined as $\rho$; in their expressions, $\rho(\theta, \theta^*)$ represents the correlation between $rmg(\theta, X, Y)$ and $rmg(\theta^*, X, Y)$, and $sd(\theta)$ represents the standard deviation of $rmg(\theta, X, Y)$. The upper bound of $PE^*$ represented by $s$ and $\rho$ is obtained based on the following proof procedure.
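The displayed equations of this derivation are not reproduced in the text. The argument has the same structure as the classical margin-based bound for ensembles of randomized classifiers (as used for random forests), so a sketch in that form is given here for orientation. It is a reconstruction under that assumption, not necessarily the exact expressions of this paper; the symbols $\hat{j}(X, Y)$, $\mathrm{var}(\theta)$, and $\bar{\rho}$ (written $\rho$ in the text) are notation introduced for the sketch.

```latex
\begin{align}
PE^{*} &= P_{X,Y}\!\Big( av_k\, I\big(p(X,\theta_k) = Y\big)
  - \max_{J \neq Y} av_k\, I\big(p(X,\theta_k) = J\big) < 0 \Big) \\
PE^{*} &\xrightarrow{a.s.} P_{X,Y}\big( mr(X,Y) < 0 \big), \qquad
mr(X,Y) = P_{\theta}\big(p(X,\theta) = Y\big)
  - \max_{J \neq Y} P_{\theta}\big(p(X,\theta) = J\big) \\
\hat{j}(X,Y) &= \arg\max_{J \neq Y} P_{\theta}\big(p(X,\theta) = J\big), \qquad
rmg(\theta, X, Y) = I\big(p(X,\theta) = Y\big)
  - I\big(p(X,\theta) = \hat{j}(X,Y)\big) \\
mr(X,Y) &= E_{\theta}\, rmg(\theta, X, Y), \qquad
s = E_{X,Y}\, mr(X,Y) > 0 \\
PE^{*} &\leq P_{X,Y}\big( |mr(X,Y) - s| \geq s \big)
  \leq \frac{\mathrm{var}\big(mr(X,Y)\big)}{s^{2}}
  && \text{(Chebyshev)} \\
\mathrm{var}\big(mr(X,Y)\big)
  &= E_{\theta,\theta^{*}}\big[ \rho(\theta,\theta^{*})\, sd(\theta)\, sd(\theta^{*}) \big]
  = \bar{\rho}\, \big( E_{\theta}\, sd(\theta) \big)^{2}
  \leq \bar{\rho}\, E_{\theta}\, \mathrm{var}(\theta) \\
\bar{\rho} &= \frac{E_{\theta,\theta^{*}}\big[ \rho(\theta,\theta^{*})\, sd(\theta)\, sd(\theta^{*}) \big]}
  {E_{\theta,\theta^{*}}\big[ sd(\theta)\, sd(\theta^{*}) \big]} \\
E_{\theta}\, \mathrm{var}(\theta)
  &= E_{\theta} E_{X,Y}\, rmg^{2} - E_{\theta}\big( E_{X,Y}\, rmg \big)^{2}
  \leq 1 - s^{2} \\
PE^{*} &\leq \frac{\bar{\rho}\, \big(1 - s^{2}\big)}{s^{2}}
\end{align}
```

Here $\theta$ and $\theta^{*}$ are independent draws of the single-batch parameters, and the last inequality uses $|rmg| \leq 1$ together with Jensen's inequality. The final line shows the two levers explicitly: the bound grows with the average correlation and shrinks as the strength $s$ increases.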
To sum up, the DCNN training model's generalization error can be reduced by reducing the correlation between BDCNN and improving the classification intensity of BDCNN so as to enhance the DCNN model's generalization ability and improve the confidence of detection and recognition results.
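The qualitative statement above can be checked numerically. The sketch below assumes that the upper bound takes the standard ensemble-margin form $\rho (1 - s^2) / s^2$, which matches the symbols $s$ and $\rho$ defined in this section; the exact expression of this paper is not reproduced in the text, so the formula and the function name are assumptions.

```python
def generalization_error_bound(strength, mean_correlation):
    """Upper bound rho * (1 - s^2) / s^2 on the generalization error.

    Assumption: this is the standard ensemble-margin bound; `strength` is s
    (performance of a single-batch model) and `mean_correlation` is rho
    (average correlation between single-batch models).
    """
    if not 0 < strength <= 1:
        raise ValueError("strength must lie in (0, 1]")
    if not 0 <= mean_correlation <= 1:
        raise ValueError("mean_correlation must lie in [0, 1]")
    return mean_correlation * (1 - strength ** 2) / strength ** 2


baseline = generalization_error_bound(strength=0.8, mean_correlation=0.4)
less_correlated = generalization_error_bound(strength=0.8, mean_correlation=0.2)
stronger = generalization_error_bound(strength=0.9, mean_correlation=0.4)
```

With these illustrative values, halving the correlation halves the bound, and raising the strength from 0.8 to 0.9 also lowers it, which is the behaviour the conclusion relies on.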
Therefore, during the sample input process, the correlation between BDCNNs is generally reduced through sample randomization. In order to improve the detection and recognition performance of BDCNN, samples can be randomly dispersed and scientifically quantified into batches from the perspective of sample training. To a certain extent, the random dispersion of samples ensures the strong generalization ability of the trained network model and prevents the training model from falling into a local optimum. At the same time, when the total number of training samples is constant, the number of sample input batches depends on the batch sample size and the number of iterations of the model training cycle.
In order to further illustrate the random dispersion process of samples, the specific algorithm flow is given in Algorithm 1.
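As a language-neutral illustration of the random dispersion step, the following Python sketch shuffles the sample indices and cuts them into equal batches. The function names are hypothetical, and dropping the incomplete last batch is an assumption made here for simplicity.

```python
import random


def random_batches(num_samples, batch_size, seed=None):
    """Shuffle the sample indices and split them into equal-sized batches.

    Assumption: samples that do not fill a complete batch are dropped.
    """
    rng = random.Random(seed)
    order = list(range(num_samples))
    rng.shuffle(order)  # random dispersion of the samples
    num_batches = num_samples // batch_size
    return [
        order[k * batch_size:(k + 1) * batch_size]
        for k in range(num_batches)
    ]


def total_input_batches(num_samples, batch_size, num_iterations):
    """Number of batches fed to the model over all training iterations."""
    return (num_samples // batch_size) * num_iterations


batches = random_batches(num_samples=20, batch_size=5, seed=0)
flattened = sorted(index for batch in batches for index in batch)
```

Every sample index appears in exactly one batch, and for a fixed training set the total number of input batches is determined by the batch size and the number of iterations.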
To scientifically quantify the sample input batches, as shown in Figure 3, this paper takes the final model generalization error $PE^*$ as the cost function. A topographic map of $PE^*$ is obtained over different numbers of iterations and batch sample sizes, and the optimal number of sample input batches is then found on this map by the grid optimization method.
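A minimal sketch of such a grid optimization is shown below. The cost function toy_error is a hypothetical stand-in for the measured generalization error, which in practice would come from training and evaluating the model at each grid point; the candidate values are also illustrative.

```python
def grid_search(cost, batch_sizes, iteration_counts):
    """Return (lowest cost, batch size, iterations) over the whole grid."""
    best = None
    for batch_size in batch_sizes:
        for iterations in iteration_counts:
            value = cost(batch_size, iterations)
            if best is None or value < best[0]:
                best = (value, batch_size, iterations)
    return best


def toy_error(batch_size, iterations):
    """Hypothetical error surface: a bowl with its minimum at (50, 30)."""
    return 0.05 + ((batch_size - 50) / 100) ** 2 + ((iterations - 30) / 100) ** 2


best_error, best_batch, best_iters = grid_search(
    toy_error,
    batch_sizes=[10, 25, 50, 100],
    iteration_counts=[10, 20, 30, 40],
)
```

An exhaustive grid is affordable here because only two hyperparameters are searched; the optimal number of input batches then follows from the selected batch size and iteration count.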
In conclusion, the steps of the pulmonary nodule detection and recognition method based on image pretreatment and interpretable training-guided DCNN are shown in Figure 4. In other words, the denoising pretreatment of label samples is carried out based on the median filtering algorithm, and then the random discrete input of the DCNN model is used for training and recognition. Based on the sample random discretization criterion mentioned above, the pre-processed training samples are randomly discretized and then input into the DCNN model to obtain a detection and recognition model with better generalization performance.

Experiment and Result Analysis
In order to demonstrate the effectiveness of the proposed method and training strategy, this paper carried out method validation on the public online lung image dataset LIDC-IDRI (Lung Image Database Consortium) [24] and conducted a comparative analysis with the methods of related literature on the same dataset.
The LIDC-IDRI dataset is provided by the National Cancer Institute of the United States, and each sample is in the standard DICOM format with 512 × 512 pixels. Taking the 1018 research cases included in the dataset as an example, each sample was read by four experienced chest radiologists: the first physician diagnosed the sample independently, and the other three physicians then reviewed that diagnosis. Finally, the sample was annotated according to the majority principle. In this paper, 250 groups each of negative and positive cases are selected from the dataset, and the CT images are preprocessed. Meanwhile, in order to increase the training sample size of the model, the processed images are rotated and inverted, and then randomly dispersed and input into the DCNN model for training and recognition.
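The rotation and inversion used to enlarge the training set can be sketched as follows. The paper does not state the rotation angles or the inversion axis, so the choice of 90-degree rotations and a horizontal flip, as well as the function names, are assumptions for illustration.

```python
def rotate90(image):
    """Rotate a 2D list of pixels by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D list of pixels left to right."""
    return [row[::-1] for row in image]


def augment(image):
    """Return the four 90-degree rotations of an image and their mirrors."""
    rotations = [image]
    for _ in range(3):
        rotations.append(rotate90(rotations[-1]))
    return rotations + [flip_horizontal(r) for r in rotations]


sample = [[1, 2], [3, 4]]
augmented = augment(sample)
```

Under these assumptions each original image yields eight variants, all of which keep the original label.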

Image Preprocessing.
In order to eliminate the noise interference introduced by equipment and better detect and identify pulmonary nodules, the median filtering algorithm was introduced to denoise the original CT images. To compare and verify the effect of median filtering, a comparison experiment with Gaussian filtering is added, and the obtained results are shown in Figure 5. Figure 5(a) shows the CT image before noise reduction, and Figure 5(b) shows the CT image after noise reduction. A specific example of median filtering is shown in Figure 6. The large number of scattered dark spots are impulse noise generated by the influence of sensors or channels. The result after median filtering is shown in Figure 6(b), which achieves an ideal noise reduction effect. As shown in Figure 5, although Gaussian filtering also has a certain noise reduction effect, it is clearly less effective than median filtering at reducing salt and pepper noise. It can be found that the median filtering method not only suppresses the impulse noise but also retains the image edge details, which provides better conditions for the subsequent identification of pulmonary nodules.
From the above examples, it can be seen that the median filtering algorithm achieves a better noise reduction effect for common impulse noise. In addition to noise reduction, the median filtering algorithm enhances image details, which lays a solid foundation for subsequent DCNN model training and testing.
Figure 7 shows the basic DCNN framework adopted by the method in this paper. Considering both training error and calculation cost, the training error and calculation duration of the DCNN model at different depths are obtained, and the results are shown in Table 1. It can be seen that the optimal structure of the DCNN model consists of three convolution layers and three down-sampling layers. The size of the input image is 512 × 512, and the down-sampling form is mean down-sampling. The size of the convolution kernel used by the first two convolution layers is 5 × 5, and the size of the convolution kernel used by the third convolution layer is 6 × 6. The numbers of convolution kernels used by the three convolution layers are 6, 12, and 12, respectively. After passing through the fully connected layer, pattern recognition is carried out and detection results are output. The samples are input into the training model with different batch sizes, and different numbers of iterations are set, to obtain the generalization error topographic map of the DCNN training model, as shown in Figure 8. A search over this map yields the optimal batch sample size and the optimal number of iterations, and hence the optimal number of sample input batches, with which the final DCNN model is trained.
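The feature-map sizes implied by this structure can be traced with a short calculation. The sketch assumes unpadded ("valid") convolution with stride 1 and 2 × 2 down-sampling, neither of which is stated explicitly in the text; the function name is hypothetical.

```python
def feature_map_sizes(input_size, kernel_sizes, pool=2):
    """Side length of the feature map after each convolution and pooling.

    Assumptions: square inputs, valid convolution with stride 1, and
    non-overlapping pool x pool down-sampling after every convolution.
    """
    sizes = []
    size = input_size
    for kernel in kernel_sizes:
        size = size - kernel + 1  # valid convolution shrinks the map
        sizes.append(size)
        if size % pool != 0:
            raise ValueError(f"size {size} is not divisible by pool {pool}")
        size //= pool  # mean down-sampling over pool x pool blocks
        sizes.append(size)
    return sizes


sizes = feature_map_sizes(512, [5, 5, 6])
```

Under these assumptions the 512 × 512 input shrinks to 60 × 60 after the third down-sampling layer. A 5 × 5 kernel in the third layer would give a 121 × 121 map, which cannot be halved evenly, so this may be one reason for the 6 × 6 kernel there.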
    % Number of input training samples
    m = size(x, 3);
    % Sample order after random dispersion
    kk = randperm(m);
    % Number of batches
    numbatches = m / opts.batchsize;
    for l = 1 : numbatches
        % Take out batchsize shuffled samples and the corresponding labels
        batch_x = x(:, :, kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize));
        batch_y = y(:, kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize));
        % Forward pass: network output under the current weights and inputs
        net = cnnff(net, batch_x);
        % BP algorithm: derivative of the error with respect to the weights,
        % computed from the corresponding sample labels
        net = cnnbp(net, batch_y);
        % Update the weights with the computed derivatives
        net = cnnapplygrads(net, opts);
        % Smoothed record of the training loss
        if isempty(net.rL)
            net.rL(1) = net.L;
        end
        net.rL(end + 1) = 0.99 * net.rL(end) + 0.01 * net.L;
    end
ALGORITHM 1: The samples are randomly dispersed and then input into the model for training.

Mathematical Problems in Engineering
In order to show more intuitively how sample dispersion influences the image features mined by the deep model, the features mined from the test samples are visualized. The obtained results are shown in Figures 9(a) and 9(b). Samples nos. 1-10 are healthy, and samples nos. 11-20 are pulmonary nodules. It can be seen that, without sample dispersion, the features mined by the detection model are chaotic, while the features mined by the model trained with the proposed method distinguish the two types of samples well, indicating that sample dispersion can effectively improve the feature mining capability of the model.
SEN indicates sensitivity, SPE indicates specificity, ACC indicates accuracy, and FDP indicates false diagnosis proportion. According to the actual evaluation criteria of pulmonary nodules in the medical field, the test criteria can be obtained as shown in Table 2. TP represents the proportion of true-positive results, TN represents the proportion of true-negative results, FP represents the proportion of false-positive results, and FN represents the proportion of false-negative results.
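The usual definitions of these indices can be written out as below. Table 2 is not reproduced in the text, so the formulas are the standard ones and are an assumption about its content; FDP is omitted because its definition cannot be confirmed here. The counts in the example are hypothetical.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and accuracy from the four outcome totals.

    The inputs may be counts or proportions; only their ratios matter.
    """
    return {
        "SEN": tp / (tp + fn),                   # detected positives
        "SPE": tn / (tn + fp),                   # correctly rejected negatives
        "ACC": (tp + tn) / (tp + tn + fp + fn),  # all correct decisions
    }


# Hypothetical counts for 250 positive and 250 negative test cases.
metrics = diagnostic_metrics(tp=241, tn=239, fp=11, fn=9)
```

With these illustrative counts the sketch gives a sensitivity of 0.964, a specificity of 0.956, and an accuracy of 0.960.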
Based on the proposed method, the detection and recognition results of pulmonary nodules obtained from the LIDC-IDRI dataset are shown in Table 3 and Figure 10. In order to verify the benefit of sample preprocessing by median filtering and the influence of random sample dispersion on the detection model, four experimental schemes are set up for comparison. In scheme 1, the original samples are directly input into the DCNN model for detection and recognition without median filtering or random dispersion. In scheme 2, the original samples are not preprocessed but are randomly dispersed and then input into DCNN for detection and recognition. In scheme 3, the original samples are processed by median filtering but are not randomly dispersed before being input into DCNN for detection and recognition. In scheme 4, the original samples are first processed by median filtering and then randomly dispersed and input into DCNN for detection and recognition. Figure 10 mainly compares the advantages and disadvantages of each method and sample training strategy horizontally. To compare them vertically as well, a spider plot is shown in Figure 11.
As can be seen from the above experimental results, compared with the results obtained by the methods used in the literature [3,4,6], the experimental results obtained by scheme 1, scheme 2, and scheme 3 are not very ideal, but the internal comparison of schemes shows that the median filtering algorithm and the random dispersion of samples can effectively improve the detection and recognition performance of DCNN. Moreover, sample random dispersion has a greater impact on the detection and recognition performance of the DCNN model.
Thanks to the intelligent detection and recognition ability of DCNN guided by image pretreatment and interpretable training, scheme 4 achieves a better detection and recognition effect for pulmonary nodules. Although the advantages of scheme 4 are not obvious compared with the algorithm in the literature [5], inputting the pretreated samples into the DCNN model under the guidance of interpretable training achieves an effective improvement in all indicators, especially in the specificity index SPE and the accuracy index ACC. Thus, the validity of the proposed method is verified again.

Conclusion
Aiming at the problem that CT images are corrupted by noise, the median filtering algorithm is used in this paper to preprocess the images, which significantly reduces the salt and pepper noise in CT images and highlights the target information, thus providing better-quality samples for training and testing the subsequent detection and recognition models. For the problems that the theoretical basis of randomly dispersing DCNN training samples is unclear and that training batches cannot be scientifically quantified, the theoretical basis of random sample dispersion has been derived from the perspective of the DCNN training strategy, and a quantitative method for training batches has been proposed. Experimental verification shows that the proposed method performs well in sensitivity, specificity, and accuracy, with test results of 96.40%, 95.60%, and 96.00%, respectively. Compared with other algorithms in the literature, the proposed method has improved in all indicators. Moreover, the proposed method has strong transferability, offers useful guidance for the training of most deep learning network models, and can be used in more image recognition tasks. However, the following aspects of the method can still be studied further: First, the pattern recognition problem with incomplete sample coverage; second, the adaptive construction and optimization of the network structure of the deep learning model; third, the uncertainty evaluation of the test results. Future research will focus on these problems in depth.

Data Availability
This paper carried out method validation on the public online lung image dataset LIDC-IDRI (Lung Image Database Consortium) [24].