Data on prediction of geological characteristics during shield tunnelling in mixed soil and rock ground

Abstract
The data presented in this article are measurements obtained during earth pressure balance (EPB) shield tunnelling for the Guangzhou-Foshan intercity railway project. The measured data consist of the geological characteristics and the main shield parameters recorded for each lining ring during shield tunnelling. The distribution of the raw data is displayed, and the geological characteristics from field records are compared with the predictions of an improved stacking method. The value of the database lies in relating the shield operational parameters to the geological characteristics encountered in the tunnelling area, including formations of soft soil, majority of soft soil, and majority of hard rock. The raw data were standardized and reduced to low-dimensional data by principal component analysis, which makes them better suited to the classification of geological characteristics. The presented data are used to identify geological characteristics in the article titled "Prediction of geological characteristics from shield operational parameters by integrating grid search and K-fold cross validation into stacking classification algorithm".

Value of the Data
• The shield operational parameter data can be used to analyze the relationship between these parameters and the geological characteristics during shield tunnelling [2,3].
• The geological characteristics recorded for each lining ring can be applied to ensure the safety of the shield tunnelling process and to plan the construction schedule [4,5].
• The data can help other researchers study the efficiency of shield tunnelling and the project's cost [6,7]. Additionally, scholars can evaluate performance and risk during shield tunnelling in soil-rock ground [8-10].
• The steps of geological characteristics prediction can help researchers understand the process and application of the stacking classification algorithm integrated with grid search (SCA-GS).

Data Description
In this article, the database consists of shield parameters measured in the field by sensors and the types of geological characteristics recorded by shield operators. The operational parameters were collected from the data acquisition system of the shield machine. The transformed parameters were also used as input data. Fig. 1 shows the distribution of the original input data to help readers understand the data structure. The original data consist of six shield parameters, namely cutterhead rotation speed (CRS), advance rate (AR), mean thrust (MF), mean cutterhead torque (MT), upper earth pressure (UEP), and lower earth pressure (LEP), and four transformed factors: penetration rate (PR), torque penetration index (TPI), specific energy (SE), and field penetration index (FPI) [11][12][13]. The dimension of the original data was reduced to six principal components through principal component analysis (PCA). The geological characteristics were then classified with an improved stacking classification algorithm (SCA-GS), using both the original data and the dimension-reduced data (PCA data). Fig. 2 shows the geological characteristics recorded after shield tunnelling. The original data (standardized FPI, TPI, and SE in Fig. 2a) and the PCA data (principal components 1, 2, and 3) were used to plot 3D feature spaces that show the distribution of the geological characteristics.
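The standardization and dimension reduction described above can be illustrated with a minimal sketch. It assumes NumPy and uses randomly generated numbers as a stand-in for the ten input features; the function names `standardize` and `pca_reduce` are hypothetical and are not taken from the authors' program.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance (z-score)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca_reduce(X_std, n_components):
    """Project standardized (centered) data onto its leading principal components."""
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_std, full_matrices=False)
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    scores = X_std @ Vt[:n_components].T
    return scores, explained_ratio[:n_components]


# Synthetic placeholder (NOT the published data): 200 rings x 10 features,
# standing in for CRS, AR, MF, MT, UEP, LEP, PR, TPI, SE, and FPI.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

X_std = standardize(X)
scores, ratio = pca_reduce(X_std, n_components=6)
print(scores.shape, ratio.round(3))
```

The first three columns of `scores` correspond to the kind of principal components 1, 2, and 3 that a 3D feature-space plot would use.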

Experimental Design, Materials and Methods
The shield parameters respond to variations in the geological characteristics, and geological conditions are continuous within a construction site. Therefore, the shield parameters can be clustered as factors to evaluate the types of geological characteristics [14][15][16].
In this article, six shield parameters and four transformed parameters were used to analyze the geological characteristics [17]. The correlation between these factors and the geological characteristics was assessed with the Spearman correlation coefficient (SCC), as listed in Table 1. The results show that PR and AR are negatively correlated with the geological characteristics, whereas TPI, SE, and FPI are the most strongly positively correlated. TPI, SE, and FPI are therefore the most suitable for displaying the results of geological characteristics identification. Table 2 presents the steps of geological characteristics prediction using SCA-GS. Before running the SCA-GS, the original data were standardized and processed by PCA, and the PCA output was taken as input data with k dimensions (D). Then, 80% of the database was randomly selected as the training set, and the remaining 20% was used as the test set [18]. The dataset, primary classifiers, and meta-classifier were input into the SCA-GS model. The selected hyper-parameters of each primary classifier were combined into pairs by grid search (GS). Next, the primary classifiers with each hyper-parameter pair were trained on the training set and evaluated by K-fold cross validation (K-CV). The models and corresponding hyper-parameters with the highest accuracy were then selected and integrated into the stacking algorithm. The outputs of the primary classifiers were used to train the meta-classifier. Finally, the classification results were obtained by applying the meta-classifier to the test set. Further details of the data processing and classification process can be found in the companion article [1].
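For readers unfamiliar with the Spearman correlation coefficient used for Table 1, the sketch below computes it as the Pearson correlation of rank-transformed values, with tied values given their average rank (ties are unavoidable when one variable is a geological class). The numbers are invented for illustration and are not taken from the database; the integer coding of the geological classes (0, 1, 2) is also an assumption.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up illustrative values for six rings (NOT from the published data).
geology = [0, 0, 1, 1, 2, 2]      # assumed coding: 0 = soft soil ... 2 = majority hard rock
fpi = [10, 12, 30, 28, 90, 85]    # hypothetical field penetration index
ar = [60, 55, 30, 35, 10, 12]     # hypothetical advance rate

rho_fpi = spearman(fpi, geology)
rho_ar = spearman(ar, geology)
print(round(rho_fpi, 3), round(rho_ar, 3))
```

In this toy example the FPI-like series rises with the class code and the AR-like series falls, so the two coefficients take opposite signs, which mirrors the qualitative pattern reported for Table 1.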
The SCA-GS prediction model for geological characteristics was developed in Python. Readers can contact the author to request the source program.
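The authors' program is not reproduced here. As a rough illustration of the workflow described for Table 2, the following sketch uses scikit-learn on synthetic data. Everything specific in it is an assumption made for the example: the three primary classifiers, the logistic-regression meta-classifier, the hyper-parameter grids, the choice of K = 5, and the random seeds. It should not be read as the configuration used in the companion article [1].

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the database: 10 features, 3 geological classes.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_redundant=2, n_classes=3, random_state=0)

# Standardize, then reduce to six principal components, as in the text.
X_pca = PCA(n_components=6).fit_transform(StandardScaler().fit_transform(X))

# Random 80% / 20% split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.2, random_state=0)

# Hypothetical primary classifiers with small hyper-parameter grids.
candidates = {
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, None]}),
    "svc": (SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}),
}

# Grid search with K-fold cross validation picks the most accurate setting
# for each primary classifier.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
best_estimators = []
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="accuracy", cv=cv)
    search.fit(X_train, y_train)
    best_estimators.append((name, search.best_estimator_))

# The tuned primary classifiers feed a meta-classifier in a stacking ensemble.
stack = StackingClassifier(estimators=best_estimators,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=cv)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

`StackingClassifier` trains the meta-classifier on cross-validated predictions of the primary classifiers, which keeps the meta-classifier from seeing predictions made on data the primary models were fitted to.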