Leakage Identification of Underground Structures Using Classification Deep Neural Networks and Transfer Learning

Water leakage defects often occur in underground structures, leading to accelerated structural aging and threatening structural safety. Leakage identification can detect early defects of underground structures and provide important guidance for reinforcement and maintenance. Deep learning-based computer vision methods have developed rapidly and are widely used in many fields. However, establishing a deep learning model for underground structure leakage identification usually requires a lot of training data on leakage defects, which is very expensive to obtain. To overcome the data shortage, a deep neural network method for leakage identification is developed based on transfer learning in this paper. For comparison, four famous classification models, including VGG16, AlexNet, SqueezeNet, and ResNet18, are constructed. To train the classification models, a transfer learning strategy is developed, and a dataset of underground structure leakage is created. Finally, the classification performance of the different deep learning models on the leakage dataset is comparatively studied under different sizes of training data. The results showed that the VGG16, AlexNet, and SqueezeNet models with transfer learning can overall provide higher and more stable classification performance on the leakage dataset than those without transfer learning. The ResNet18 model with transfer learning provides classification performance similar to that without transfer learning, but its performance is more stable. In addition, the SqueezeNet model obtains overall higher and more stable performance than the comparative models on the leakage dataset for all classification metrics.


Introduction
With the rapid development of urban areas all over the world, many engineering structures are constructed underground, such as metro tunnels [1], underground garages [2], underground substations [3], etc. Underground structures are usually affected by many adverse internal and external effects, including corrosion, groundwater, and operation loads. These effects may lead to performance deterioration of the underground structures. Water leakage is a very common defect of underground structures, which can accelerate structural aging and cause serious structural safety accidents. Therefore, it is very important to conduct water leakage detection for underground structures during the operation process.
Conventional manual detection methods require a lot of human and material resources, and their efficiency is low. In recent years, structural health monitoring has been studied and applied widely [4][5][6][7][8][9]. Using advanced sensors and data mining algorithms, structural health monitoring systems can provide timely condition information on structures. Compared with conventional manual detection methods, the detection results of structural health monitoring are more objective, labor-saving, and cost-effective [10]. There are many kinds of sensors used in structural health monitoring, which can obtain different condition information on the monitoring objectives. Generally, these sensors can be divided into contact and non-contact sensors. Contact sensors need to be in contact with the monitoring object to collect information on its changes, including physical quantities such as pressure, stress, strain, and temperature. When using a non-contact sensor, physical contact with the monitoring object is not required, and the data on the monitoring object can be measured by measuring changes in the electric field, magnetic field, light, sound, etc. In recent years, as a kind of non-contact sensor, the vision sensor has been widely used in civil engineering structural health monitoring [11][12][13][14][15][16]. In the field of leakage identification of underground structures, many computer vision techniques have also been proposed for the classification and location of leakage defects [17][18][19][20]. These computer vision techniques can generally be divided into four categories: image classification [21], object detection [22], semantic segmentation [23], and instance segmentation [24]. The image classification method is used to identify whether there is a leakage defect in an image. Object detection methods can detect the leakage defects in the images and locate their regions. The semantic segmentation method can classify the pixels of different defects in the images. Finally, the instance segmentation method can classify the pixels of each individual instance of a defect in the images. Compared with the other three methods, the image classification method generally needs lower computational cost and is ideal for fast leakage defect detection.
With the development of artificial intelligence, many deep learning-based computer vision methods have been proposed. These methods have been validated to be more powerful than conventional computer vision methods. The VGG model [25], AlexNet model [26], SqueezeNet model [27], and ResNet model [28] are four very famous deep learning-based classification models that can provide high-accuracy classification results for many datasets. However, most deep learning models need a lot of training data to ensure classification performance. If there is insufficient training data to train the deep neural network, the accuracy of leakage identification will be low. To deal with the insufficient training data problem, transfer learning [29][30][31], few-shot meta-learning [32,33], semi-supervised learning [34,35], active learning [36][37][38], and other advanced methods have been proposed recently. Among these methods, the latter several usually use carefully designed network structures or training data selection strategies to improve the classification performance of the deep neural network, but the training data are still restricted to the limited objective dataset. In contrast, transfer learning methods take advantage of related datasets, which may be large enough to pretrain the deep neural network of the classification models. The pretrained deep neural network has a strong ability to extract the features of the images. Then, much less training data from the objective dataset is needed to retrain the deep neural network. Because the transfer learning method can use the information from both the training data of the objective dataset and the related datasets, and the size of the related datasets is not limited, the transfer learning method greatly reduces the required training data in the objective dataset. Recently, transfer learning has attracted some attention in the field of defect detection of engineering structures [39][40][41]. However, most of these references only consider one classification model. How transfer learning affects the performance of different deep learning-based classification models for leakage identification of underground structures remains unclear.
In this paper, a leakage identification method for underground structures is developed based on classification deep neural networks and transfer learning. The main contributions of this paper are as follows: (1) the performance of classification models incorporating transfer learning is validated for the leakage identification of underground structures, and (2) the classification performance of different deep learning models with and without transfer learning is comparatively studied under different sizes of training data.

Methodology

Deep Neural Network for Leakage Identification
In this paper, four deep neural networks for leakage identification were studied: the VGG16 model, AlexNet model, SqueezeNet model, and ResNet18 model. All pretrained models were downloaded from the Torchvision library. The input size of the models was 224 × 224 × 3. To classify the leakage dataset, the output size of the models was changed to 1 × 1 × 2.
The VGG (Visual Geometry Group) model is a deep convolutional neural network architecture proposed by the Visual Geometry Group in 2014. The architecture of the modified VGG16 model is shown in Figure 1. The original VGG16 model had 16 weight layers, including 13 convolutional layers and 3 fully connected layers. To classify the leakage defect, a fully connected layer with two outputs was added to the pretrained model. VGG16 has a simple network structure and is easy to implement, making it one of the classic models in the field of deep learning-based computer vision. Due to the deep convolutional structure, the number of parameters in the VGG model is very large.
AlexNet is a classic convolutional neural network, and it was proposed in the 2012 ImageNet image classification competition. The architecture of the modified AlexNet model is shown in Figure 2. As shown in this figure, the modified AlexNet model had five convolutional layers and four fully connected layers, where the last fully connected layer was added to change the output classes. AlexNet was an early implementation of a deep convolutional neural network. By increasing the network depth, AlexNet was able to better learn the features of the dataset, thereby improving the accuracy of image classification.
SqueezeNet is a lightweight deep learning model that can achieve high prediction accuracy with fewer model parameters. Figure 3 shows the architecture of the modified SqueezeNet model. The modified SqueezeNet model had two convolutional layers, four pooling layers, and eight fire modules.
The architecture of the modified ResNet18 model is shown in Figure 4. As shown in this figure, the modified ResNet18 model had many convolutional layers and pooling layers. The last pooling layer was fully connected with the output layer. In the ResNet model, shortcut connections are built between skip layers to deal with the vanishing and exploding gradient problems, so ResNet can be very deep.

Transfer Learning Strategy
As mentioned above, all pretrained models were downloaded from the Torchvision library. The models were pretrained on large datasets (e.g., ImageNet). Therefore, these pretrained models had the ability to extract features from the images. As shown in the examples in Figure 5, in transfer learning, the lower-level layers of the pretrained models were frozen, and the parameters of these layers were not updated in the retraining process. However, to classify the target leakage dataset, the high-level fully connected layers of the VGG16 model and AlexNet model and the resized output layers of the SqueezeNet model and ResNet18 model were retrained using the training data from the target leakage dataset.


In the training of all classification models, the cross-entropy between the predicted distribution and the real distribution was used as the loss function. The cross-entropy was calculated by

H(P, Q) = −∑_i Q_i(x) log P_i(x),

where P_i(x) is the predicted probability that the sample x belongs to the ith class and Q_i(x) is the real probability that the sample x belongs to the ith class. The mini-batch gradient descent and RMSProp (root mean square propagation) algorithm were used to update the deep neural networks. The mini-batch size was set to 5, and the learning rate was set to 10^−5. The number of training epochs was determined by k-fold cross-validation.

Dataset Preparation
The datasets used in the examples were created through three approaches: an online search, on-site photography, and an open-source dataset [42]. As shown in Figures 6-9, the datasets contained water leakage images of an underground garage, water leakage images of an underground equipment room, water leakage images of underground tunnel lining, images of underground structures without water leakage, etc. There were 136 high-resolution leakage images collected through the online search and on-site photography. By cropping, flipping, and adding noise, 1431 small-sized leakage images were obtained. The other 4655 leakage images were directly collected from the open-source dataset. By cropping 100 high-resolution no-leakage images of the open-source dataset, 1200 small-sized no-leakage images were obtained. Then, by flipping in two directions and adding noise, the other 6000 small-sized no-leakage images were obtained. Finally, the original datasets were extended to a new dataset with 13,286 images in total. In the new dataset, the number of water leakage images was 6086, and the number of images without water leakage was 7200. All images were in RGB format and had a size of 224 × 224 × 3.
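An illustrative augmentation pipeline for the described expansion (cropping, flipping in two directions, and adding noise); the paper does not specify the exact crop positions or noise level, so the center crop and Gaussian noise standard deviation below are assumptions:

```python
import numpy as np

def augment(image, crop=224, noise_std=10.0, seed=0):
    """Return a list of augmented 224x224 patches from one RGB image."""
    rng = np.random.default_rng(seed)
    h, w, _ = image.shape
    top, left = (h - crop) // 2, (w - crop) // 2
    patch = image[top:top + crop, left:left + crop]   # center crop
    out = [patch,
           patch[:, ::-1],                            # horizontal flip
           patch[::-1, :]]                            # vertical flip
    noisy = patch.astype(np.float32) + rng.normal(0, noise_std, patch.shape)
    out.append(np.clip(noisy, 0, 255).astype(np.uint8))  # additive noise
    return out
```

Applying such transforms to the high-resolution source images is how a small set of originals can be expanded into thousands of 224 × 224 × 3 training samples.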

In the experiments, a certain proportion of the data was randomly selected as the training set, and the remaining data were used to test the classification performance. To compare the performance of the pretrained models under different sizes of the training set, the ratio of the training set was varied from 0.05 to 0.30 with a step of 0.01; accordingly, the size of the training set increased from 664 to 3986, and the size of the remaining test set decreased from 12,622 to 9300. A detailed configuration of the training set and test set of the 26 different experiments is shown in Table 1.

Identification Performance Evaluation Metrics
The classification metrics used in this paper included classification accuracy, classification precision, classification recall, and classification F1 score. As shown by the confusion matrix in Figure 10, TP represents the number of leakage images accurately classified as leakage, and FP represents the number of no-leakage images mistakenly classified as leakage. Similarly, TN represents the number of no-leakage images accurately classified as no leakage, and FN represents the number of leakage images mistakenly classified as no leakage.
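The four metrics follow directly from these confusion-matrix counts, with leakage treated as the positive class:

```python
def classification_metrics(tp, fp, tn, fn):
    # Accuracy: fraction of all images classified correctly.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Precision: fraction of predicted-leakage images that are leakage.
    precision = tp / (tp + fp)
    # Recall: fraction of true leakage images that are detected.
    recall = tp / (tp + fn)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```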

Experimental Results
To test the impact of transfer learning on the classification performance of the deep neural networks, the networks trained using different sizes of data were used to classify the remaining test set. For each test, both the training methods with and without transfer learning were used. In the method with transfer learning, the deep neural networks and their pretrained parameters were both downloaded from the Torchvision library, and the parameters of the low-level layers were not updated in the training process. On the contrary, in the method without transfer learning, the deep neural networks were also downloaded from the Torchvision library, but all model parameters were randomly initialized. It is worth noting that the models with and without transfer learning used the same hyperparameter configuration.

Example 1: Test Results of the VGG16 Model
In this section, the test results of the VGG16 model are discussed. As shown in Figures 11-14, the classification metrics by the VGG16 model with transfer learning were overall higher than those of the method without transfer learning. When the ratio of training data was 0.05, classification accuracy, classification precision, classification recall, and classification F1 score by the VGG16 model with transfer learning were 0.928, 0.927, 0.929, and 0.928, respectively, which were higher than the 0.868, 0.867, 0.867, and 0.867 by the VGG16 model without transfer learning. The heatmap of the confusion matrix of the prediction results by the VGG16 models when the ratio of training data was 0.05 is shown in Figure 15. With the increase in the ratio of training data, both methods obtained higher classification metrics. However, the increasing process of the VGG16 model with transfer learning was more stable than that of the VGG16 model without transfer learning.
When the ratio of training data reached 0.2, the classification metrics by the VGG16 model without transfer learning were similar to those of the VGG16 model with transfer learning. When the ratio of training data was 0.3, classification accuracy, classification precision, classification recall, and classification F1 score by the VGG16 model with transfer learning were 0.962, 0.966, 0.960, and 0.962, respectively, and the results by the VGG16 model without transfer learning were 0.965, 0.966, 0.965, and 0.965, respectively. Due to the small randomness of training, the trained VGG16 model without transfer learning achieved slightly better results than the VGG16 model trained with transfer learning. The results showed that transfer learning could significantly improve the classification performance of the VGG16 model on leakage defects when the ratio of training data was lower than 0.2 and could deal with the insufficient training data problem.

Example 2: Test Results of the AlexNet Model
As shown in Figures 16-19, the classification metrics on the leakage dataset by the AlexNet model with transfer learning were overall higher than those of the method without transfer learning. When the ratio of training data was 0.05, classification accuracy, classification precision, classification recall, and classification F1 score by the AlexNet model with transfer learning were 0.934, 0.933, 0.934, and 0.933, respectively, which were higher than the 0.861, 0.866, 0.856, and 0.859 by the AlexNet model without transfer learning. The heatmap of the confusion matrix of the prediction results by the AlexNet models when the ratio of training data was 0.05 is shown in Figure 20. Notably, the AlexNet model with transfer learning reached very high classification metrics when the ratio of training data was only 0.05. With the increase in the ratio of training data, the classification metrics by the AlexNet model with transfer learning still slightly increased. However, the classification metrics by the AlexNet model without transfer learning were not stable; they fluctuated within a range of approximately 0.85 to 0.95 as the ratio of training data increased.
When the ratio of training data was 0.3, classification accuracy, classification precision, classification recall, and classification F1 score by the AlexNet model with transfer learning were 0.971, 0.973, 0.969, and 0.970, respectively, but the results by the AlexNet model without transfer learning were 0.912, 0.916, 0.917, and 0.912, respectively. The results showed that the AlexNet model with transfer learning could obtain higher classification performance on leakage defects than that without transfer learning. Because of the constant pretrained model parameters, the AlexNet model with transfer learning also obtained a more stable classification performance on leakage defects.

Example 3: Test Results of the SqueezeNet Model
In this section, the test results of the SqueezeNet model are discussed. Table 4 shows the prediction results of the SqueezeNet model with and without transfer learning.

With the increase in the ratio of training data, the classification metrics by the SqueezeNet model with transfer learning increased approximately monotonically. However, the increasing process of the SqueezeNet model without transfer learning was unstable. When the ratio of training data was 0.3, the SqueezeNet models with and without transfer learning obtained similar classification metrics on the leakage dataset: classification accuracy, classification precision, classification recall, and classification F1 score by the SqueezeNet model with transfer learning were 0.977, 0.977, 0.976, and 0.977, respectively, and the results by the SqueezeNet model without transfer learning were 0.969, 0.968, 0.970, and 0.969, respectively. The results showed that, using transfer learning, the SqueezeNet model could provide higher and more stable classification metrics on leakage defects.

Example 4: Test Results of the ResNet18 Model
This section presents the test results of the ResNet18 model. The detailed prediction results of the ResNet18 model with and without transfer learning are shown in Table 5.
Classification accuracy, classification precision, classification recall, and classification F1 score by the ResNet18 model with and without transfer learning are shown in Figures 26-29. As shown in these figures, the classification metrics on the leakage dataset by the ResNet18 model with transfer learning were overall similar to those without transfer learning. When the ratio of training data was 0.05, classification accuracy, classification precision, classification recall, and classification F1 score by the ResNet18 model with transfer learning were 0.938, 0.940, 0.935, and 0.937, respectively. The prediction results by the ResNet18 model without transfer learning were 0.933, 0.935, 0.931, and 0.933, respectively. The heatmap of the confusion matrix of the prediction results by the ResNet18 model when the ratio of training data was 0.05 is shown in Figure 30. With the increase in the ratio of training data, the classification metrics by the ResNet18 model with transfer learning increased slightly. However, the classification metrics by the ResNet18 model without transfer learning were not stable. When the ratio of training data was 0.3, classification accuracy, classification precision, classification recall, and classification F1 score by the ResNet18 model with transfer learning were 0.961, 0.963, 0.959, and 0.961, respectively, similar to the 0.967, 0.969, 0.965, and 0.966 of the ResNet18 model without transfer learning. The results showed that the classification metrics on leakage defects by the ResNet18 model with transfer learning were overall similar to those without transfer learning. This may be because the specific network structure of ResNet18 (e.g., shortcut connections between skip layers) gives it strong feature extraction capabilities, so even without transfer learning it can achieve similar performance using a small amount of training data. However, due to the pretrained model parameters, the prediction results by the ResNet18 model with
transfer learning were more stable than the ResNet18 model without transfer learning.
When the train data ratio was 0.05, the prediction accuracy of the SqueezeNet model was 0.957, which was higher than the 0.928, 0.934, and 0.938 of the VGG16, AlexNet, and ResNet18 models. For classification precision, the SqueezeNet model obtained a value of 0.959 when the train data ratio was 0.05, but the values for the VGG16, AlexNet, and ResNet18 models were 0.927, 0.933, and 0.940, respectively. When the train data ratio was 0.05, the classification recall obtained by the SqueezeNet model was 0.955, which was also higher than the 0.929, 0.934, and 0.935 of the VGG16, AlexNet, and ResNet18 models. Considering the classification F1 score, the value of the SqueezeNet model was 0.957, but the results of the VGG16, AlexNet, and ResNet18 models were 0.928, 0.933, and 0.937, respectively. With the increase in the train data ratio, all models obtained higher classification performance. However, the classification performance of the SqueezeNet model increased more stably than that of the other models. When the ratio of training data was 0.3, the classification accuracy, classification precision, classification recall, and classification F1 score of the VGG16 model with transfer learning were 0.962, 0.966, 0.960, and 0.962, respectively. Those of the AlexNet model with transfer learning were 0.971, 0.973, 0.969, and 0.970, respectively. The four classification metrics of the ResNet18 model with transfer learning were 0.961, 0.963, 0.959, and 0.961, respectively. However, when the ratio of training data was 0.3, the classification accuracy, classification precision, classification recall, and classification F1 score of the SqueezeNet model with transfer learning were 0.977, 0.977, 0.976, and 0.977, respectively, which were higher than those of the other methods. The results showed that the SqueezeNet model with transfer learning has a higher classification performance on the leakage defects than the other comparative methods.
To test the computational efficiency of the models, the training time of the different final models with and without transfer learning was compared, as shown in Table 6 and Figure 35. The results showed that AlexNet achieved the highest computational efficiency, regardless of whether transfer learning was used. The VGG16 and AlexNet models with transfer learning used less training time than those without transfer learning, because they required a smaller number of training epochs to provide high prediction accuracy. However, the SqueezeNet and ResNet18 models with transfer learning used more training time than those without transfer learning, which may be because the training process of the models without transfer learning was more unstable, and the optimal number of training epochs determined by cross-validation was smaller. As shown in Figure 35, the training time of the models with transfer learning increased with the train data ratio. However, the training time of the models without transfer learning was unstable as the train data ratio increased. This phenomenon may also be caused by the unstable training process of the models without transfer learning.

Conclusions and Discussion
In this paper, a transfer learning strategy was developed to deal with the problem of data shortage in training deep learning-based classification models for underground structure leakage identification. The classification performance of four well-known classification models, including VGG16, AlexNet, SqueezeNet, and ResNet18, with transfer learning on the leakage dataset was comparatively studied under different sizes of training data.
The results showed that the VGG16, AlexNet, and SqueezeNet models with transfer learning could overall provide higher and more stable classification performance on the leakage dataset than those without transfer learning. The ResNet18 model with transfer learning could overall provide a classification performance on the leakage dataset similar to that without transfer learning, but its classification performance was more stable than that without transfer learning. When the ratio of training data was 0.05, the classification accuracy, classification precision, classification recall, and classification F1 score of the VGG16 model with transfer learning were 0.928, 0.927, 0.929, and 0.928, respectively, and those of the VGG16 model without transfer learning were 0.868, 0.867, 0.867, and 0.867, respectively. The classification metrics of the AlexNet model with transfer learning were 0.934, 0.933, 0.934, and 0.933, respectively, which were higher than the 0.861, 0.866, 0.856, and 0.859 of the AlexNet model without transfer learning. The classification metrics of the SqueezeNet model with transfer learning were 0.957, 0.959, 0.955, and 0.957, respectively, and those of the SqueezeNet model without transfer learning were 0.934, 0.934, 0.933, and 0.934. The classification metrics of the ResNet18 model with transfer learning were 0.938, 0.940, 0.935, and 0.937, respectively, and the prediction results of the ResNet18 model without transfer learning were 0.933, 0.935, 0.931, and 0.933, respectively.
In addition, the SqueezeNet model obtained overall higher and more stable performance than the comparative models on the leakage dataset for all classification metrics. When the train data ratio was 0.05, the prediction accuracy of the SqueezeNet model was 0.957, which was higher than the 0.928, 0.934, and 0.938 of the VGG16, AlexNet, and ResNet18 models. For the classification precision, the SqueezeNet model obtained a value of 0.959 when the train data ratio was 0.05, but the values for the VGG16, AlexNet, and ResNet18 models were 0.927, 0.933, and 0.940, respectively. The classification recall obtained by the SqueezeNet model was 0.955, which was also higher than the 0.929, 0.934, and 0.935 of the VGG16, AlexNet, and ResNet18 models. Considering the classification F1 score, the value of the SqueezeNet model was 0.957, but the results of the VGG16, AlexNet, and ResNet18 models were 0.928, 0.933, and 0.937, respectively.
In the methods with transfer learning, the classification models inherit a strong ability to extract image features. During the retraining process in this paper, the feature extraction layers were frozen, and only the weights in the classification layers were updated. The retraining process thus learns only the mapping from the extracted features to the classification results. Therefore, retraining is generally easier than training the entire classification model without transfer, and less training data is required. In this paper, all the feature extraction layers of the models with transfer learning were fixed during the retraining process. However, if some of the convolutional layers are also retrained, the classification performance may change. The optimal retraining scheme for different models is worth further research.
The influence of transfer learning on different classification models (including the VGG16 model, AlexNet model, SqueezeNet model, and ResNet18 model) and the influence of the training data size on their classification performance are comparatively studied.

Figure 1. Modified VGG16 model.

AlexNet is a classic convolutional neural network, proposed in the 2012 ImageNet image classification competition. The architecture of the modified AlexNet model is shown in Figure 2. As shown in this figure, the modified AlexNet model had five convolutional layers and four fully connected layers, where the last fully connected layer was added to change the output classes. AlexNet was an early implementation of a deep convolutional neural network. By increasing the network depth, AlexNet was able to better learn the features of the dataset, thereby improving the accuracy of image classification. SqueezeNet is a lightweight deep learning model that can achieve high prediction accuracy with fewer model parameters. Figure 3 shows the architecture of the modified SqueezeNet model. The modified SqueezeNet model had two convolutional layers, four pooling layers, and eight fire modules. The architecture of the modified ResNet18 model is shown in Figure 4. As shown in this figure, the modified ResNet18 model had many convolutional and pooling layers. The last pooling layer was fully connected with the output layer. In the ResNet model, shortcut connections are built between skip layers to deal with the vanishing and exploding gradient problems, so ResNet can be very deep.

Figure 5. Transfer learning configurations of a deep neural network for water leakage identification.

Figure 6. Water leakage image of the underground garage.


The cross-entropy loss was used to train the networks:

L(x) = −Σ_i y_i(x) log P_i(x)

where P_i(x) is the predicted probability that the sample x belongs to the i-th class, and y_i(x) is the real probability that the sample x belongs to the i-th class. Mini-batch gradient descent with the RMSProp (root mean square propagation) algorithm was used to update the deep neural networks. The mini-batch size was set to 5, and the learning rate was set to 10^-5. The number of training epochs was determined by k-fold cross-validation.
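The training configuration above can be sketched as follows. This is a toy stand-in, not the paper's code: the model is a trivial classifier and the data are random tensors in place of real leakage images; the mini-batch size of 5 and the RMSProp optimizer follow the paper, while the epoch count here is fixed rather than chosen by k-fold cross-validation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in classifier: a real run would use one of the modified CNNs.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
loss_fn = nn.CrossEntropyLoss()  # cross-entropy between predicted and true classes
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-5)  # RMSProp update rule

images = torch.randn(20, 3, 8, 8)    # toy dataset: 20 random "images"
labels = torch.randint(0, 2, (20,))  # 0 = no leakage, 1 = leakage
batch_size = 5                       # mini-batch size used in the paper

# Epoch count fixed here; the paper determines it by k-fold cross-validation.
for epoch in range(3):
    for i in range(0, len(images), batch_size):
        x, y = images[i:i + batch_size], labels[i:i + batch_size]
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()   # mini-batch gradient of the cross-entropy loss
        optimizer.step()  # RMSProp parameter update
```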


Figure 7. Water leakage image of the underground equipment room.

Figure 8. Water leakage image of the underground tunnel lining.

Figure 9. Images of the underground structure without water leakage.

The classification metrics used in this paper included classification accuracy, classification precision, classification recall, and classification F1 score. As shown by the confusion matrix in Figure 10, TP represents the number of leakage images accurately classified as leakage, and FP represents the number of no-leakage images mistakenly classified as leakage. Similarly, TN represents the number of no-leakage images accurately classified as no leakage, and FN represents the number of leakage images mistakenly classified as no leakage.

Figure 10. Confusion matrix of leakage classification.

The classification metrics were then calculated by

Accuracy = (TP + TN) / (TP + FP + FN + TN)
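Only the accuracy formula survives in the extracted text; the sketch below computes all four metrics from the confusion matrix counts, using the standard definitions of precision, recall, and F1 score (which match the metrics named above, though their formulas are not reproduced here from the paper).

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the four classification metrics from the Figure 10
    confusion matrix counts (standard definitions)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration: 90 leakage images correctly found,
# 10 false alarms, 85 correct no-leakage, 15 missed leakages.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```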

Figure 11. Example 1: prediction accuracy of different methods under different training data ratios.

Figure 12. Example 1: prediction precision of different methods under different training data ratios.

Figure 13. Example 1: prediction recall of different methods under different training data ratios.

Figure 14. Example 1: prediction F1 score of different methods under different training data ratios.
The test results of the AlexNet model are discussed in this section. The detailed prediction results of the AlexNet model with and without transfer learning are shown in Table 3, and the classification accuracy, classification precision, classification recall, and classification F1 score of the AlexNet model with and without transfer learning are shown in Figures 16-19. As shown in Figures 16-19, the classification metrics on the leakage dataset of the AlexNet model with transfer learning were overall higher than those of the method without transfer learning. When the ratio of training data was 0.05, the classification accuracy, classification precision, classification recall, and classification F1 score of the AlexNet model with transfer learning were 0.934, 0.933, 0.934, and 0.933, respectively, which were higher than the 0.861, 0.866, 0.856, and 0.859 of the AlexNet model without transfer learning. The heatmap of the confusion matrix of the prediction results of the AlexNet models when the ratio of training data was 0.05 is shown in Figure 20. The AlexNet model with transfer learning reached very high classification metrics even when the ratio of training data was only 0.05. With the increase in the ratio of training data, the classification metrics of the AlexNet model with transfer learning still slightly increased. However, the classification metrics of the AlexNet model without transfer learning were not stable; they changed within a range of approximately 0.85 to 0.95 as the ratio of training data increased. When the ratio of training data was 0.3, the classification accuracy, classification precision, classification recall, and classification F1 score of the AlexNet model with transfer learning were 0.971, 0.973, 0.969, and 0.970, respectively, but the results of the AlexNet model without transfer learning were 0.912, 0.916, 0.917, and 0.912, respectively. The results showed that the AlexNet model with transfer learning could obtain higher classification performance on leakage defects than that without transfer learning. Because the pretrained model parameters were kept constant, the AlexNet model with transfer learning also obtained a more stable classification performance on leakage defects.

Figure 15. Example 1: heatmap of the confusion matrix of the prediction results when the ratio of training data was 0.05. (a) VGG16 with transfer and (b) VGG16 without transfer.

Figure 16. Example 2: prediction accuracy of different methods under different training data ratios.

Figure 17. Example 2: prediction precision of different methods under different training data ratios.

Figure 18. Example 2: prediction recall of different methods under different training data ratios.

Figure 19. Example 2: prediction F1 score of different methods under different training data ratios.
Figures 21-24 compare the classification accuracy, classification precision, classification recall, and classification F1 score of the SqueezeNet model with and without transfer learning. As shown in Figures 21-24, the classification metrics of the SqueezeNet model with transfer learning were overall higher than those of the method without transfer learning. When the ratio of training data was 0.05, the classification accuracy, classification precision, classification recall, and classification F1 score of the SqueezeNet model with transfer learning were 0.957, 0.959, 0.955, and 0.957, respectively, and those of the SqueezeNet model without transfer learning were 0.934, 0.934, 0.933, and 0.934.

Figure 20. Example 2: heatmap of the confusion matrix of the prediction results when the ratio of training data was 0.05. (a) AlexNet with transfer and (b) AlexNet without transfer.

Figure 21. Example 3: prediction accuracy of different methods under different training data ratios.

Figure 22. Example 3: prediction precision of different methods under different training data ratios.

Figure 23. Example 3: prediction recall of different methods under different training data ratios.

Figure 24. Example 3: prediction F1 score of different methods under different training data ratios.

Figure 25. Example 3: heatmap of the confusion matrix of the prediction results when the ratio of training data was 0.05. (a) SqueezeNet with transfer and (b) SqueezeNet without transfer.
As shown in Figures 26-29, the classification metrics on the leakage dataset of the ResNet18 model with transfer learning were overall similar to those without transfer learning. When the ratio of training data was 0.05, the classification accuracy, classification precision, classification recall, and classification F1 score of the ResNet18 model with transfer learning were 0.938, 0.940, 0.935, and 0.937, respectively. The prediction results of the ResNet18 model without transfer learning were 0.933, 0.935, 0.931, and 0.933, respectively. The heatmap of the confusion matrix of the prediction results of the ResNet18 model when the ratio of training data was 0.05 is shown in Figure 30. With the increase in the ratio of training data, the classification metrics of the ResNet18 model with transfer learning increased slightly. However, the classification metrics of the ResNet18 model without transfer learning were not stable. When the ratio of training data was 0.3, the classification accuracy, classification precision, classification recall, and classification F1 score of the ResNet18 model with transfer learning were 0.961, 0.963, 0.959, and 0.961, respectively, which are similar to the 0.967, 0.969, 0.965, and 0.966 of the ResNet18 model without transfer learning. The results showed that the classification metrics on leakage defects of the ResNet18 model with transfer learning were overall similar to those without transfer learning. This may be because ResNet18 has a specific network structure (e.g., shortcut connections between skip layers) with strong feature extraction capabilities, so the model without transfer learning can achieve performance similar to that with transfer learning using a small amount of training data. However, due to the pretrained model parameters, the prediction results of the ResNet18 model with transfer learning were more stable than those of the ResNet18 model without transfer learning.

Figure 26. Example 4: prediction accuracy of different methods under different training data ratios.

Figure 27. Example 4: prediction precision of different methods under different training data ratios.

Figure 28. Example 4: prediction recall of different methods under different training data ratios.

Figure 29. Example 4: prediction F1 score of different methods under different training data ratios.

Figure 30. Example 4: heatmap of the confusion matrix of the prediction results when the ratio of training data was 0.05. (a) ResNet18 with transfer and (b) ResNet18 without transfer.

Figure 32. Comparison of mean prediction precision by different pretrained models.

Figure 33. Comparison of mean prediction recall by different pretrained models.

Figure 34. Comparison of mean prediction F1 score by different pretrained models.

There are many factors that affect training time, such as the computer equipment, the number of model parameters, the training data size, and the number of training epochs. The computer equipment used in this paper was a personal computer with an AMD Ryzen 9 5950X 16-core CPU and an NVIDIA GeForce RTX 3090 GPU. For a specific ratio of training data, the training data sizes for the different models were equal, so the number of model parameters and training epochs might be the main factors affecting the training time. As shown in Table 6 and Figure 35, the average training times for the VGG16, AlexNet, SqueezeNet, and ResNet18 models with transfer learning were 5.2 min, 4.4 min, 18.8 min, and 16.5 min, respectively. The average training times for the VGG16, AlexNet, SqueezeNet, and ResNet18 models without transfer learning were 13.8 min, 8.8 min, 11.3 min, and 10.6 min, respectively.

Figure 35. Comparison of the training time of different models under different training data ratios.

Table 1. Configuration of the training set and test set of different experiments.
Table 2 shows the prediction results of the VGG16 model with and without transfer learning. Figures 11-14 compare the classification accuracy, classification precision, classification recall, and classification F1 score of the VGG16 model with and without transfer learning. As shown in Figures 11-14, the classification metrics of the VGG16 model with transfer learning were overall higher than those of the method without transfer learning.

Table 2. Example 1: prediction results of the VGG16 model with and without transfer learning.

Table 3. Example 2: prediction results of the AlexNet model with and without transfer learning.

Table 4. Example 3: prediction results of the SqueezeNet model with and without transfer learning.

Table 5. Example 4: prediction results of the ResNet18 model with and without transfer learning.

Table 6. Training time of different models under different training data ratios (min).