Data on evolutionary hybrid neural network approach to predict shield tunneling-induced ground settlements

The dataset presented in this article comprises records of shield tunneling-induced ground settlements in Guangzhou Metro Line No. 9. Field monitoring results from both tunnel lines are presented. In total, 17 principal variables affecting ground settlements are tabulated; they fall into two categories: geological condition parameters and shield operation parameters. The shield operation parameters are provided as time series. A further value of the dataset is that it accounts for karst encountered in the shield tunnel area, including the karst cave height, the distance between the karst cave and the tunnel invert, and the karst cave treatment scheme. The dataset can be used to enrich the database of settlements caused by shield tunneling and to train artificial intelligence-based ground settlement prediction models. The dataset was used in the article titled “Evolutionary hybrid neural network approach to predict shield tunneling-induced ground settlements” (Zhang et al., 2020).

Subject: Civil Engineering
Specific subject area: Geotechnical Engineering and Engineering Geology
Type of data: Table
How data were acquired: Geological investigation, on-site settlement monitoring, automatic transmission of shield machine operation data
Data format: Raw and analyzed
Parameters for data collection: Data monitoring and collection were required during the entire tunnel excavation.

Description of data collection
Geological investigation determined the geological conditions along the tunnel lines. Ground surface settlement markers were installed at approximately 5 m intervals along the tunnel alignment. Ground settlements at the study site were recorded daily, except when circumstances made this impossible. Shield machine operation data were recorded in real time, and the operation parameters were averaged over each ring.

Value of the Data
• The dataset contains major factors that influence the ground settlement in shield tunneling.
• Those who study shield tunneling-induced ground settlements, especially under the influence of karst geology, may benefit from this dataset, because it includes the karst treatment information relevant to ground settlement prediction.
• The dataset is useful for establishing an artificial intelligence-based model for predicting tunneling-induced ground settlements. It can be conveniently reused by researchers. In particular, it can be directly applied to train a single-target regression model. Where applicable, it can also form part of a larger engineering dataset, providing more information not only for training predictive models but also for understanding the mechanism of tunneling-induced ground settlements.

Data Description
The dataset in this article (see the supplementary Excel files) was collected in the Maanshan Station-Liantang Station section of Metro Line No. 9, Guangzhou, China [1]. The ground settlement field data were divided into a training set (Table 1 in the DatasetForTraining.xlsx file) and a testing set (Table 2). The rows correspond to the 328 monitoring markers along the tunnel centerlines where the data were collected. In addition, the 17 variables describing the geological conditions and shield operation parameters that may affect ground settlement are tabulated in separate columns. The file titled "MonitoringResults.xlsx" records the tunnel excavation process and the detailed variation of the shield operation parameters.
The geological condition variables include groundwater level (GL), thickness of backfill over tunnel crown (BCT), thickness of sand-soil over tunnel crown (SCT), thickness of weathered rock over tunnel crown (RCT), thickness of sand-soil under tunnel invert (SIT), thickness of rock under tunnel invert (RIT), height of karst cave (KH) and distance between karst cave and tunnel invert (KD). The selected operation parameters include total thrust (T) to push the TBM during the excavation of each ring, cutter head torque (CT), penetration rate (V), tail void grouting pressure (GP), grouting volume (GV), face pressure (FP), tunneling deviation (TD), tail void (TV), and karst cave treatment scheme (KTS). Of the 17 variables considered, 16 can be determined directly from the recorded data; the exception is KTS. In this study, a dummy variable represents KTS: KTS = 1 denotes a karst cave with treatment, KTS = 0 denotes a karst cave without treatment, and KTS = 0.5 denotes that no karst cave was detected. Note that TD and TV in Table 1 and Table 2 were processed using principal component analysis (PCA). The original monitored TD and TV data, stored in Table 3 and Table 4 of the supplementary DatasetForTraining.xlsx file, consist of several orientation values. PCA is used to reduce the number of interrelated variables [2, 3].

Table 1. Testing set for shield tunneling-induced ground settlement prediction.
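The KTS encoding described above can be written as a small helper. This is only an illustrative sketch: the function name `encode_kts` and its two boolean arguments are hypothetical and do not come from the dataset files; only the three output values (1, 0, 0.5) follow the text.

```python
def encode_kts(cave_detected: bool, treated: bool = False) -> float:
    """Map the karst cave situation to the KTS dummy variable.

    Hypothetical helper: the argument names are assumptions made for
    illustration; the returned values follow the encoding in the text.
    """
    if not cave_detected:
        return 0.5  # no karst cave detected
    return 1.0 if treated else 0.0  # cave with / without treatment


# Example calls covering the three cases
kts_treated = encode_kts(cave_detected=True, treated=True)
kts_untreated = encode_kts(cave_detected=True, treated=False)
kts_no_cave = encode_kts(cave_detected=False)
```

Note that 0.5 places the "no cave" case midway between the treated and untreated cases on a single numeric axis, which is the convention stated in the text rather than a general-purpose categorical encoding.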

Experimental Design, Materials and Methods
The geological data provided below (Table 2) were acquired mainly through geological survey. The survey determines the ground information for the tunnel and its surroundings, including soil type, soil mechanical properties, groundwater level, and karst. It comprises on-site investigation of the ground conditions and laboratory tests of soil mechanical properties. The shield operation data (recorded in the MonitoringResults.xlsx file) are output automatically in real time by the shield machine's intelligent control system. When establishing a prediction model, the operation data of the shield machines are averaged over each ring and then used as input data for the model.
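The per-ring averaging of operation data can be sketched as follows. This is a minimal illustration under assumptions: the record layout, the key names (`ring`, `thrust`, `torque`), and the numeric values are all made up for the example and are not taken from MonitoringResults.xlsx.

```python
from collections import defaultdict
from statistics import mean


def average_per_ring(records, ring_key="ring"):
    """Average every numeric parameter over the readings of each ring.

    `records` is a list of dicts, one per real-time reading. The layout
    and key names are hypothetical, chosen only for illustration.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        ring = rec[ring_key]
        for name, value in rec.items():
            if name != ring_key:
                grouped[ring][name].append(value)
    # One averaged row per ring, ordered by ring number
    return {
        ring: {name: mean(values) for name, values in params.items()}
        for ring, params in sorted(grouped.items())
    }


# Made-up readings: two for ring 101, one for ring 102
records = [
    {"ring": 101, "thrust": 10000.0, "torque": 2000.0},
    {"ring": 101, "thrust": 11000.0, "torque": 2200.0},
    {"ring": 102, "thrust": 12000.0, "torque": 2100.0},
]
ring_means = average_per_ring(records)
```

Each resulting row then corresponds to one ring and can be aligned with the settlement marker nearest to that ring when assembling model inputs.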
The original monitored TD and TV consist of several orientation values. The PCA algorithm [2] is used to reduce data redundancy. The main steps of PCA are: (1) standardize the indicator data; (2) compute the covariance matrix of the standardized dataset; (3) compute the eigenvalues and eigenvectors of the covariance matrix; (4) determine the principal components. Algorithm 1 displays the calculation process of the PCA algorithm. The PCA algorithm is implemented in Fortran (see the supplementary Fortran code file).

Algorithm 1: The calculation process of the PCA algorithm [2].
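The four PCA steps listed above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' Fortran implementation: the function name `pca_scores`, the synthetic four-column input standing in for several orientation readings, and the choice to keep a single component are all hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=1):
    """PCA by eigendecomposition of the covariance of standardized data.

    Returns the component scores and the fraction of variance explained
    by each retained component. Assumes no column of X is constant.
    """
    X = np.asarray(X, dtype=float)
    # (1) Standardize each column to zero mean and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # (2) Covariance matrix of the standardized data
    C = np.cov(Z, rowvar=False)
    # (3) Eigenvalues and eigenvectors (eigh: symmetric input, ascending order)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]  # sort descending by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (4) Project onto the leading eigenvectors to obtain the components
    scores = Z @ eigvecs[:, :n_components]
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, explained


# Synthetic stand-in for four correlated orientation readings (made-up data)
rng = np.random.default_rng(0)
base = rng.normal(size=50)
weights = np.array([1.0, 0.8, 1.2, 0.9])
X = base[:, None] * weights + 0.1 * rng.normal(size=(50, 4))

scores, explained = pca_scores(X, n_components=1)
```

Because the four synthetic columns are strongly correlated, the first component captures most of the variance, which is the situation in which replacing several orientation values with one PCA-derived value loses little information.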

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.