Generation of Synthetic Density Log Data Using Deep Learning Algorithm at the Golden Field in Alberta, Canada

This study proposes a deep neural network- (DNN-) based prediction model for creating synthetic logs. Unlike previous studies, it focuses on building a reliable prediction model based on two criteria: fit-for-purpose for a target field (the Golden field in Alberta) and compliance with domain knowledge. First, in the target field, the density log has advantages over the sonic log for porosity analysis because of the carbonate depositional environment. Considering the correlation between the density and sonic logs, we take the sonic log as input and the density log as output for the DNN. Although only five wells in the field have a pair of training data (i.e., sonic and density logs), we obtain, based on geological knowledge, 29 additional wells sharing the same depositional setting in the Slave Point Formation. After securing the data, 5 of the 29 wells are excluded from the dataset during preprocessing (elimination of abnormal data and min-max normalisation) to improve the prediction model. Two cases are designed according to the usage of well information at the target field. Case 1 uses only 23 of the surrounding wells to train the prediction model, and another surrounding well is used for model testing. In Case 1, the Levenberg-Marquardt algorithm shows a fast and reliable performance, and the numbers of neurons in the two hidden layers are 45 and 14, respectively. In Case 2, the 24 surrounding wells and four wells from the target field are used to train the DNN with the optimised parameters from Case 1. The synthetic density logs from Case 2 mitigate an underestimation problem observed in Case 1 and follow the overall trend of the true density logs. The developed prediction model utilises the sonic log to generate the synthetic density log, and a reliable porosity model can be created by combining the given and synthetic density logs.


Introduction
Reservoir modelling is essential for understanding and assessing a target reservoir, and a reservoir model is used to implement a reservoir simulation when preparing a field development plan. Well log data are the most important data in reservoir modelling. Based on the petrophysical properties determined from well logging, a spatial correlation (e.g., variogram) is estimated and geostatistical algorithms are applied to build a reliable, three-dimensional reservoir model. Even though reservoir modelling is affected by the amount of available well log data, a data shortage problem always exists because well log data can only be acquired through an expensive drilling process. Sometimes, although well log data are obtained by drilling, a specific log type that predicts the desired reservoir properties may not have been measured. For instance, when a porosity model is needed, sonic or density logs may not have been obtained during well logging. Moreover, the log data may be missing for the depth of interest. In these cases, one solution is to acquire additional well log data by drilling a new well or by rerunning the well logging to obtain the required log type for an already drilled well. However, drilling a new well or stopping production to rerun logging incurs a huge additional cost, and some log types are not measurable owing to casing [1][2][3].
Lately, there has been research on how to handle these problems by generating synthetic (or pseudo) well log data using machine learning. Rolon et al. designed three synthetic well log prediction models, i.e., for resistivity, density, and neutron logs, in the northeast of the United States [4]. They mentioned that the geometry of the training wells was not related to the performance of a neural network. In addition, the quality control of training data considerably affected the trained prediction model. Long et al. built synthetic density logs with the concept of pairwise well prediction [3]. Even though eight wells were available in the field, they chose only one ideal well to train a neural network. However, the two aforementioned studies used a simple neural network model with a single hidden layer [3,4].
Salehi et al. trained a deep neural network (DNN) with three hidden layers using two wells from a carbonate oil reservoir in the southwest of Iran [2]. The target field consisted of eight pays, but they chose one pay zone to generate the prediction models because each pay had a different lithology. They generated three prediction models (i.e., for true resistivity, sonic, and shallow resistivity logs) with seven input logs, including water saturation, density, neutron, and deep resistivity. Therefore, several well logs are necessarily required to use the trained prediction models.
Korjani et al. built a resistivity prediction model for heavy oil reservoirs in the San Joaquin Valley, California [5]. They compared three strategies for the input layer: information of surrounding wells (angle and distance), kriging coefficients, and fuzzy kriging ranges. After more than 1,200 wells were used to train each strategy, the pseudoresistivity log from the fuzzy case showed the best performance. When the synthetic log was used to build a three-dimensional facies model, it displayed the geological trend (e.g., channel connectivity) properly. Previous studies predicted missing logs where wells already existed, but in [5], a neural network model was created that predicts log data at arbitrary locations without drilling. Instead of information from a target location, well log data of 10 surrounding wells and their location data were used to train the prediction model. Further, three logs were predicted simultaneously. However, as a considerable amount of well information was needed to train the model, the approach is difficult to apply in areas where surrounding well information is limited.
Akinnikawe et al. compared several machine learning algorithms (e.g., artificial neural networks (ANN), decision trees, gradient boosting, and random forests) for the generation of synthetic well logs [6]. In addition, they predicted unusual logs such as the photoelectric (PE) and unconfined compressive strength (UCS) logs because the PE log is often not measured during well logging and the UCS log requires an expensive core experiment. Although the neural networks and random forest outperformed the other algorithms, they had the disadvantage of requiring more than 10 input data.
Because previous studies applied simple feedforward neural networks, the trained prediction models may not consider the sequence of log curves with depth. Zhang et al. used a recurrent neural network to analyse well logs as sequence data [1]. They discussed that previous ANN-based prediction models found correlations among well logs at the same depth while ignoring the geological trend in reservoirs. A long short-term memory algorithm was successfully introduced to generate missing or whole well logs for both vertical and horizontal wells in the Eagle Ford Shale. They also emphasised the importance of geological criteria for training data because a log curve has a unique hidden pattern for each stratum.
The previous studies have focused on the application of machine learning algorithms to well logging. In this research, we set two criteria for a practical application of machine learning: fit-for-purpose for a target field and domain knowledge. First, the input and output layers of a prediction model should be determined by the availability of well log data for the target field. If the goal is to construct a reliable porosity model and a target field lacks sonic logs, a prediction model for the sonic log should be trained with the available log data. However, previous studies tried to build prediction models without considering which logs are needed for the prediction and why [2,3,6]. Therefore, a trained model required numerous well logs for the input layer, and hundreds of wells were used for training [5,7]. If hundreds of well logs of several log types were already available for a target area, a reliable reservoir model could be built without synthetic logs.
Second, it is essential to preprocess the log data based on knowledge of petroleum geology and engineering, rather than using all available data. In [8], because of the importance of geological criteria, a prediction model was generated by dividing the vertical log data according to formation. Therefore, the goal of this study is to apply a machine learning algorithm correctly and effectively to well log prediction based on the given field conditions and domain knowledge.
The target field in this study is the Golden field, which belongs to the Beaverhill Lake Group and was deposited in the Western Canada Sedimentary Basin during the middle to upper Devonian. The main production zone is the Slave Point (SLVP) Formation, a subdivision of the group. The depositional environment of the formation has been interpreted as shallow marine, and its sedimentary facies show complex reef carbonate deposits [9][10][11]. Dolomitisation extensively affected the carbonate rocks at the Golden field, and dissolution developed secondary pores such as intercrystalline and interconnected pore spaces [10]. Owing to these diagenetic effects, the porosity calculated from sonic log data can be underestimated compared with the total porosity obtained from density log data [12]. Because the porosity estimated from well logging is critical information for geostatistics, underestimated porosity data at a well location affect the overall trend of a three-dimensional porosity model. Consequently, this causes inaccurate reservoir simulation results and an unreasonable economic evaluation for the target field. In this study, the density log means the bulk density log.
The density log properly accounts for the effect of secondary porosity and can also be utilised for the identification of evaporite minerals [4]. Regarding the target field conditions (e.g., depositional setting and dolomitisation), density log data, rather than sonic logs, are the key information for building a reliable porosity model. However, in the target field of this study, only 17 wells have density log data, and thus, additional density log data are obviously required for reasonable porosity modelling. Therefore, the density log is assigned to the output layer of the neural network. Long et al. [3] created a prediction model for a synthetic density log, but it required more than 30 data for the input layer. In the target field, that trained model cannot be applied because the log data for the input layer are not available for most of the wells. Although the problem in the target field is that the sonic log is not acceptable for estimating porosity, it still correlates highly with the density log. Therefore, we take the sonic log, together with the three-dimensional coordinates (i.e., latitude, longitude, and depth), as the input layer of the neural network.
In the target field, 12 wells have the sonic log without the density log. If synthetic density logs can be generated from the sonic logs, the reliability of a porosity model can be improved by using both the 12 synthetic density logs and the existing 17 density logs. Note that the basic structure of the neural network (i.e., the input and output layers) is determined based on fit-for-purpose for the target field.
A pair of input and output data is needed to train a supervised machine learning algorithm. For the target field, only five wells have both sonic and density logs. Therefore, additional paired data are searched for based on domain knowledge. Previous studies selected training wells according to the distance between the prediction location and the available well locations. However, geological similarity is more important than physical distance. If a prediction model is trained with well log data from the same geological formation as the target reservoir, the performance of the trained model will be improved.
In this research, we examine the effect of the two criteria (fit-for-purpose for a target field and domain knowledge) for preparing training data on a synthetic well log prediction model. In Section 2, we explain in detail the specific workflow of the proposed method to generate synthetic density logs for the target reservoir. It consists of data acquisition, preprocessing of the selected data, structuring of a neural network, and determination of hyperparameters. In Section 3, we analyse the synthetic density logs from the trained prediction model. Two cases are considered depending on the usage of well information at the target field. One case generates a prediction model using information only from wells around the target field. In the other case, not only the surrounding wells but also the wells belonging to the target field are included. Then, the key outcomes are summarised in Conclusions (Section 4).

Methodology
This study follows the procedure described in Figure 1. First, additional sonic and density log data are collected because only five wells in the target field have both logs. Well log data are obtained in the SLVP Formation near the Golden field from the AccuMap database (Figure 1(a)). Note that instead of simply selecting the wells nearest to the target field, we selected additional wells that share the same depositional environment. The data consist of the sonic and density logs along with the location information of each well (i.e., depth, latitude, and longitude). Preprocessing procedures are applied to the obtained data, including elimination of abnormal values and data normalisation (Figure 1(b)). After a sensitivity analysis for a default neural network structure (Figure 1(c)), a proper training function (Figure 1(d)) and the numbers of nodes in the hidden layers (Figure 1(e)) are fixed. Finally, the best and worst trained neural networks (Figure 1(f)) are verified by test wells (Figure 1(g)).

2.1. Data Acquisition and Preprocessing. An objective of this study is to suggest how to predict a synthetic density log from the sonic log. First, well log data that contain both sonic and density measurements are necessary. It is well known in deep learning that obtaining qualified training data is the most important factor in building a reliable prediction model because enough data are necessary to train a DNN properly [13,14]. Intuitively, the closer the wells are to the target area, the higher the quality. However, we consider not only the spatial relationship but also the geological similarity. In other words, we acquire well data from the same depositional environment rather than relying merely on physical distance.
The target of this study is the SLVP Formation at the Golden field, and Figure 2 shows the oil and gas wells (the black and red circles) near the target field. The empty circles are wells with no gas or oil, and the black-coloured ones are wells showing oil or gas. The SLVP Formation was deposited during a transgressive sequence, and this sequence is divided into two major sea level rise cycles according to the relative degree of rise. For this reason, the carbonate depositional environment differs between the two cycles [9,15]. In Figure 2(a), the solid blue line indicates the Bank margin, cycle 2, and the solid red line is the depositional limit of the transgressive sequence. These two lines are taken as boundaries defining a depositional setting similar to that of the target field. Therefore, the blue highlighted area in Figure 2(a) indicates the territory between the two geological frontiers, and it has a consistent geological environment. Even though there are many wells (black) near the target field (the purple circle) in Figure 2(a), only the wells located in the blue area are of interest in terms of depositional setting.
However, most wells in the blue area do not have information on the SLVP Formation. The red wells were drilled into the SLVP Formation and are located in the blue area at the same time. In previous research on synthetic logging, training well data were selected by physical distance without considering domain knowledge, but in this study, the data are chosen based on geological meaning. Figure 2(a) is redrawn as Figure 2(b), which clearly shows the red-coloured wells in the research area without the boundaries and highlighted area. Some wells have neither sonic nor density logs, although both data types are required to train a DNN model. Among the red wells, only 34 wells have both sonic and density data, and these are carried to the preprocessing stage. Figure 3(a) presents the locations of the 34 wells. First, we check the quality of the wells based on domain knowledge.

Geofluids
Five wells belong to the black dotted ellipse in Figure 3(a), which indicates the Golden field. We excluded abnormal sonic and density data according to the density correction log and the caliper log, which indicate borehole conditions such as mud cake or washout. Typically, we expect a positive correlation between sonic velocity and density data because the faster the sonic wave, the denser the rock. In this study, however, the correlation between the sonic and density logs is expected to be negative because the sonic log is recorded as slowness, with units of time over distance. Therefore, if a correlation coefficient is positive, the data should not be used for training a prediction model.
In Figure 3(b), the data of the five blue-coloured wells, numbers 4, 9, 19, 22, and 25, are removed from the well list because they showed strong positive correlation coefficients or suspiciously constant values. Wells 4, 9, and 25 among the excluded five wells in Figure 3(b) have correlation coefficients between the sonic and density logs greater than +0.3. In addition, wells 19 and 22 are deleted from the training data because of their flat values without any trend.
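The per-well screening described above can be sketched as a small check, assuming the logs arrive as paired numeric arrays (the function name and flatness tolerance are illustrative, not from the paper):

```python
import numpy as np

def well_passes_qc(sonic, density, flat_tol=1e-6):
    """Return True when a well's paired logs are usable for training.

    A well is rejected when the sonic-density correlation is positive
    (slowness should fall as density rises, so the expected sign is
    negative) or when either curve is essentially flat.
    """
    sonic = np.asarray(sonic, dtype=float)
    density = np.asarray(density, dtype=float)
    # Flat (constant) curves carry no trend; the correlation is undefined.
    if np.std(sonic) < flat_tol or np.std(density) < flat_tol:
        return False
    r = np.corrcoef(sonic, density)[0, 1]
    return bool(r < 0.0)
```

Under this rule, wells such as 4, 9, and 25 (r > +0.3) fail on sign, while the flat wells 19 and 22 fail on the variability check.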
In Case 1, from a total of 34 wells in Figure 3(a), five wells (the blue circles) are removed because of their poor data quality, and another five wells (in the black dotted area) in the Golden field are hypothetically assumed not to exist (Figure 3(b)). Thus, the remaining 24 wells are used for further processing, and one well among the 24 is set as the test well (the red circle).
Case 2 is set for comparison with Case 1 so that the effect of well data at the target field can be examined. Case 2 utilises both the 24 wells of Case 1 and the five wells in the target field (Figure 3(c)). The test well of Case 1 belongs to the training set, and the five abnormal wells are still not used. Case 2 takes one well among the five wells at the target field as the test well. In Case 1, training runs are conducted to select the preferred training conditions (e.g., optimisation algorithm and the number of neurons in the hidden layers), and the hyperparameters from Case 1 are applied to Case 2.
A reliable neural network can be built from properly prepared training data, and we apply two steps of preprocessing. After the selection of well data from a geologic depositional system similar to that of the Golden field, five well datasets were entirely eliminated because their overall trends were not acceptable. For the remaining 24 wells, partly erroneous logs are removed and the remaining logs are utilised for training. In Figure 4(c), well 2 has partly improper sonic and density logs. Both sonic and density data are acceptable from 1607 to 1620 m. However, after 1620 m, the sonic data show unreasonable constant values. Moreover, after approximately 1635 m, all the density data indicate −999, which signals a malfunction of the logging equipment or other problems. These useless parts are eliminated before training. The correlation coefficient of well 2 improves after the elimination (from −0.0573 to −0.2885 in Figure 4(c)). In the case of Figure 4(d), because the overall correlation shows a high positive value, 0.4861, the entire dataset of well 4 is removed instead of receiving partial treatment. Figure 5 shows the effect of the elimination of improper data for the 29 wells. Figure 5(a) presents the correlation coefficients of the sonic and density data for each of the 29 wells before any data preprocessing; the coefficients are calculated separately for each well. Figure 5(b) shows the changed distribution of correlation coefficients after the entire or partial elimination of abnormal values for each well. The mean correlation coefficient improves, in terms of the negative correlation between sonic and density data, from −0.2647 to −0.3913. For example, well 27 has a positive correlation coefficient (Figure 5(a), the red-coloured frequencies). However, this is caused by a strange flat constant, not by a wrong relationship between the sonic and density log data.
After the elimination, all the data presenting positive correlation coefficients are removed (Figure 5(b)). This preprocessing is critical to the overall training performance, and the difference in prediction performance depending on the data preprocessing is presented in Results and Discussion.
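The partial elimination can be sketched as follows; the −999 sentinel comes from the text, while the run-length threshold and function name are illustrative assumptions:

```python
import numpy as np

SENTINEL = -999.0  # equipment-malfunction flag observed in the raw logs

def clean_paired_logs(sonic, density, flat_run=10):
    """Remove depth samples where either log equals the sentinel, plus
    samples inside long constant runs of the sonic curve, and return
    the cleaned pair for training."""
    sonic = np.asarray(sonic, dtype=float)
    density = np.asarray(density, dtype=float)
    keep = (sonic != SENTINEL) & (density != SENTINEL)
    # Flag runs of >= flat_run identical sonic values as unreasonable.
    flat = np.zeros(len(sonic), dtype=bool)
    start = 0
    for i in range(1, len(sonic) + 1):
        if i == len(sonic) or sonic[i] != sonic[start]:
            if i - start >= flat_run:
                flat[start:i] = True
            start = i
    keep &= ~flat
    return sonic[keep], density[keep]
```

Recomputing np.corrcoef on the cleaned pair then shows the kind of improvement reported for well 2 (−0.0573 to −0.2885).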
For a single well, the log data are evenly spaced, and each well has nearly two hundred data points. However, the number of data points differs among wells because the logging interval may differ for each well (e.g., 2.5, 10, 12.5, or 15.2 cm). For Case 1, the training and validation data comprise 8,684 points from the 23 wells, and the test data comprise 219 points from the test well (Figure 3(b)). Note that each data point consists of latitude, longitude, depth, sonic, and density values. Thus, the preprocessed training and validation dataset is a 5 by 8,684 matrix, and the test dataset is a 5 by 219 matrix. For Case 2, the training and validation data exceed 9,000 points because Case 2 has more data from the additional wells. Compared with Case 1, Case 2 includes the test well of Case 1 and the four wells at the Golden field. In Case 2, the number of data points depends on which well among the five wells at the Golden field is classified as the test data.
One of the factors affecting the training performance of a neural network is the normalisation of the given data. All the data, such as latitude, longitude, depth, sonic, and density, have different units and scales. The depth unit is m, and the sonic and density log units are μs/m and kg/m³, respectively. Table 1 presents statistical information on the training and validation data. These values in different units need to be normalised in a consistent way for proper training of a DNN model [16]. Min-max normalisation is usually applied to deep learning data because it transforms the given data into a range between 0 and 1 without exception [17,18]. Each category of data is normalised by the following equation:

Data_norm = (Data − Data_min) / (Data_max − Data_min), (1)

where Data_norm is the normalised data between 0 and 1; Data are the raw data of each parameter (i.e., latitude, longitude, depth, sonic, and density data); and Data_max and Data_min are the maximum and minimum values of the raw data, respectively. These preprocessed data are utilised for training the DNN-based prediction model.
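The scaling and its inverse can be written as a short helper (a minimal sketch; the per-column layout matches the latitude, longitude, depth, sonic, and density matrix described above):

```python
import numpy as np

def minmax_normalise(X):
    """Column-wise min-max scaling to [0, 1]; also returns the per-column
    minima and maxima so predictions can be mapped back to kg/m3 later."""
    X = np.asarray(X, dtype=float)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return (X - xmin) / (xmax - xmin), xmin, xmax

def minmax_denormalise(Xn, xmin, xmax):
    """Invert the scaling, e.g. to recover density from the DNN output."""
    return Xn * (xmax - xmin) + xmin
```

Keeping the per-column minima and maxima is the design point: the same transform fitted on the training data must be reused on test wells and inverted on the network's output.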

2.2. Structure of Neural Network. The structure of a neural network is definitely important for the generation of pseudodensity data from sonic data. Because there might be a nonlinear relationship between the sonic and density log data, it is difficult to generate pseudodensity data from a single input feature (sonic data). The log data also contain three-dimensional location information, namely, latitude, longitude, and depth, and this spatial information helps find the intrinsic relationship between the sonic and density data. Thus, the latitude, longitude, depth, and sonic data are applied to the input layer of the neural network, and the density data are placed on the output layer. Figure 6 shows the neural network used in Case 1. The subscripts i, IL, FH, SH, and OL refer to the ith training data, input layer, first hidden layer, second hidden layer, and output layer, respectively.
The number of hidden layers is set to two because two hidden layers are likely to be more advantageous than one for capturing the complicated, nonlinear relationship between the sonic and density data. However, three hidden layers are excessive, as they increase the computational cost and thus result in inefficient performance. Each hidden layer has 10 nodes in the basic case.
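The 4-10-10-1 stack can be sketched in NumPy as a forward pass (the paper used MATLAB's toolbox; the sigmoid hidden activation and linear output here are common defaults, assumed rather than stated in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n_in=4, n_h1=10, n_h2=10, n_out=1):
    """Small random weights and zero biases for the two-hidden-layer DNN."""
    return {
        "W1": 0.1 * rng.standard_normal((n_in, n_h1)), "b1": np.zeros(n_h1),
        "W2": 0.1 * rng.standard_normal((n_h1, n_h2)), "b2": np.zeros(n_h2),
        "W3": 0.1 * rng.standard_normal((n_h2, n_out)), "b3": np.zeros(n_out),
    }

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(params, X):
    """X rows hold [latitude, longitude, depth, sonic] (normalised);
    the single output column is the synthetic density estimate."""
    h1 = sigmoid(X @ params["W1"] + params["b1"])   # first hidden layer
    h2 = sigmoid(h1 @ params["W2"] + params["b2"])  # second hidden layer
    return h2 @ params["W3"] + params["b3"]         # linear output layer
```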

2.3. Selection of the Training Algorithm and Hidden Layer Nodes. The process of training a neural network is, in practice, the updating of the weights and biases between layers to obtain an optimised network. Usually, the default objective function is the mean squared error (MSE) between the outputs of the DNN and the target outputs (Equations (2) and (3)). In this study, the target outputs are the original density data, and they are compared with the synthetic density data from the DNN model.
a_i,SH-OL = f( Σ_{j=1}^{N_S} w_j,SH-OL a_i,j,SH + b_SH-OL ), (2)

MSE = (1/n) Σ_{i=1}^{n} (D_i − a_i,SH-OL)², (3)

where f is an activation function; MSE is the error between the target density data D_i and the predicted density data a_i,SH-OL; and n and N_S are the number of training data and the number of neurons in the second hidden layer, respectively.
To minimise the objective function MSE, the weights and biases are updated by a training algorithm. This study is implemented with the deep learning package in MATLAB, which provides several training functions. We tested the eight training functions listed in Table 2 to find an appropriate one. The algorithms fall into three categories: gradient descent, conjugate gradient, and quasi-Newton. Gradient descent minimises an objective function by taking steps proportional to the negative of the gradient of the objective function [19]. The conjugate gradient method is a numerical solution algorithm for linear equation systems whose matrix is symmetric and positive-definite [20,21]. Quasi-Newton methods are alternatives to the full Newton's method when its application is too expensive and complicated [22].
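As a rough analogue of comparing training functions, the sketch below fits the same tiny network with SciPy's conjugate gradient and quasi-Newton (BFGS) minimisers; MATLAB's trainlm (Levenberg-Marquardt) has a SciPy counterpart in least_squares(method="lm"). The toy data, tanh activation, and sizes are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.random((120, 4))                  # stand-in for normalised inputs
y = 0.3 + 0.4 * X[:, 3:4]                 # toy "density" target

sizes = [(4, 10), (10, 10), (10, 1)]      # the 4-10-10-1 default network
n_params = sum(a * b + b for a, b in sizes)

def unpack(theta):
    """Slice the flat parameter vector into (weight, bias) pairs."""
    mats, k = [], 0
    for a, b in sizes:
        W = theta[k:k + a * b].reshape(a, b); k += a * b
        c = theta[k:k + b]; k += b
        mats.append((W, c))
    return mats

def mse(theta):
    (W1, b1), (W2, b2), (W3, b3) = unpack(theta)
    h1 = np.tanh(X @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    return float(np.mean((h2 @ W3 + b3 - y) ** 2))

theta0 = 0.1 * rng.standard_normal(n_params)
final_mse = {m: minimize(mse, theta0, method=m, options={"maxiter": 60}).fun
             for m in ("CG", "BFGS")}     # conjugate gradient vs quasi-Newton
```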
In terms of the hidden layers' nodes, there could theoretically be infinite combinations. As mentioned in Section 2.2, the number of hidden layers is set to two, with 10 nodes in each layer, during determination of the best training algorithm. Then, the number of nodes in the hidden layers is analysed to find a proper combination. Table 3 compares Cases 1 and 2 with regard to training and test data. In Case 1, the DNN is optimised using the 23 wells for training and is verified with one well for testing (Figure 3(b)). The test well (well 29) is selected from the central part because it is spatially unbiased. Compared with Case 1, Case 2 has data from the five additional wells that belong to the Golden field (the black dotted ellipse in Figure 3(c)). In Case 2, one of the five wells in the Golden field is set as the test well, and the remaining four are used as the training and validation data. According to the combinations of training and test data, five subordinate cases exist: Cases 2-1, 2-2, 2-3, 2-4, and 2-5, listed in Table 3. Note that the five additional wells (1st to 5th) at the Golden field are presented in Figure 3(c).

Results and Discussion
3.1. Case 1. As mentioned in Section 2.1, the preprocessed data are applied to the default neural network: two hidden layers with the basic 10-10 node combination. The structure of the default neural network is schematically presented in Figure 7. More detailed training options are summarised in Table 4. In Section 3.1.1, the default training condition of the neural network is decided first. Then, a sensitivity analysis of the training algorithms is conducted using the default settings because it is important to efficiently find a training condition that gives trustworthy training performance at reasonable training cost. Moreover, based on the default condition of the neural network, the performances of the neural networks trained before and after data preprocessing are analysed to verify the effectiveness and necessity of the preprocessing. In Section 3.1.2, another sensitivity analysis is performed to determine the number of nodes in the hidden layers. A hierarchical analysis is used to find the best combination of the numbers of neurons in the hidden layers.
3.1.1. Sensitivity Analysis for Training Algorithm. Two aspects are considered to decide whether a training run is properly designed and performs well: the errors on the validation data and on the test data. In Case 1, the training and validation data are randomly selected among the 8,684 data points with assigned ratios of 0.85 and 0.15 (Table 4), while well 29 is fixed as the test well (219 data points). The validation and test data ratio relative to the training data is about 18%. Even if the structure and hyperparameters of a specific neural network are the same, training may give somewhat different results owing to the random selection of the training and validation datasets. We build several prediction models to mitigate the effect of this random selection. Based on the central limit theorem, a total of 30 prediction models are created, and their average is regarded as the performance of the trained DNN in this study. In other words, we predict 30 synthetic density logs for the same input sonic log, one from each prediction model.
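The repeated-training scheme can be sketched as below; a closed-form linear fit stands in for one trained DNN so that averaging over 30 random 85/15 splits stays fast (the stand-in model and array sizes are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((400, 4))                      # stand-in training inputs
y = 0.2 + 0.5 * X[:, 3] + 0.02 * rng.standard_normal(400)
X_test = rng.random((219, 4))                 # 219 test points, as for well 29

def fit_linear(Xtr, ytr):
    """Least-squares linear model standing in for one trained DNN."""
    A = np.column_stack([np.ones(len(Xtr)), Xtr])
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return coef

preds = []
for _ in range(30):                           # 30 prediction models
    idx = rng.permutation(len(X))
    n_train = int(0.85 * len(X))              # 0.85 train / 0.15 validation
    coef = fit_linear(X[idx[:n_train]], y[idx[:n_train]])
    preds.append(np.column_stack([np.ones(len(X_test)), X_test]) @ coef)

synthetic_density = np.mean(preds, axis=0)    # average of the 30 logs
```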
In terms of the validation and test data, two errors are compared across the eight training algorithm functions. Figure 9 shows the MSEs of the validation data for each training algorithm as a bar chart. The bar for each algorithm is the average error of 30 pseudodensity logs from randomly selected training and validation datasets under the same training conditions. In Figures 9(a) and 9(b), both validation and test errors are consistently calculated with the values preprocessed by Equation (1). These results are normalised to the range between 0 and 1 to reflect both the validation and test errors fairly (Figures 9(c) and 9(d)).
Some training algorithms, such as trainbfg, trainrp, and trainscg (Table 2), show decent validation-error performance compared with trainlm (Table 2). However, trainlm has difficulty producing consistent results for both the validation and test errors. Figure 9(e) is the sum of Figures 9(c) and 9(d), providing a general comparison of the errors of the eight algorithms. The best training algorithm is trainrp because it showed not only fast but also stable performance for both error measures. Therefore, trainrp is selected as the default algorithm and is used for the sensitivity analysis of the number of neurons in the hidden layers in Section 3.1.2.
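Combining the two normalised error measures, as in Figures 9(c)-9(e), amounts to a min-max scale and a sum; the helper below is an illustrative sketch (names are assumptions):

```python
import numpy as np

def rank_by_combined_error(names, val_err, test_err):
    """Scale each error vector to [0, 1], sum them, and order the
    algorithms from best (smallest combined error) to worst."""
    def scale(e):
        e = np.asarray(e, dtype=float)
        return (e - e.min()) / (e.max() - e.min())
    total = scale(val_err) + scale(test_err)
    return [names[i] for i in np.argsort(total)]
```

The scaling step matters because the raw validation and test MSEs live on different scales; without it, one measure would dominate the sum.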
Among the eight algorithms, the reconstruction performances on the test data of four representative algorithms are compared in Figure 10. They are chosen from each category of training algorithms: traingdm and trainrp for gradient descent, trainscg for conjugate gradient, and trainlm for quasi-Newton (Table 2). In Figure 10, the blue lines indicate the mean of the 30 synthetic density logs from the 30 trained neural networks, and the red lines are the true density log of the test well. Thus, the closer the match between the blue and red lines, the better the prediction performance of the training algorithm. The two plots in the first row are good matching examples (Figures 10(a) and 10(b)), whereas the two results in the second row are poor matching examples (Figures 10(c) and 10(d)). In Figure 10(a), trainscg gives an average line following the test well trend. In spite of some discrepancy between the test data and the reconstruction, the overall trends match each other. In Figure 10(b), trainrp matches the test line about as well as in Figure 10(a), although the middle part, between data points 50 and 150, has an almost flattened trend. In Figure 10(c), compared with the other three training algorithms, the average by trainlm does not follow the trend of the reference well data, which results in the high error in Figure 9(d). In Figure 10(d), traingdm gives a flattened prediction that does not represent the pattern of the real density data.
In Section 2.1, the importance of proper preprocessing of the density and sonic log data is highlighted. The results in Figure 10 are obtained from preprocessed training data, whereas Figure 11 shows the prediction results without proper preprocessing. In Figures 11(a) and 11(b), the trend of the blue lines is similar to that in Figures 10(a) and 10(b). However, the gap between the blue and red lines becomes larger because the deviations of the blue lines decrease for both trainscg and trainlm (Figures 11(a) and 11(b)). trainrp still presents a large discrepancy between the test data and the reconstructed density log (Figure 11(c)). Figure 11(d) highlights how improper training data affect training performance: poor training data appear to cause flattening in the estimates for the test data because faulty data make it difficult to find the essential intrinsic relationship between the input and output data.
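As a concrete sketch of the preprocessing step described above (elimination of abnormal data followed by min-max normalisation), the following Python function drops non-finite readings and percentile-based outliers before scaling to [0, 1]. The percentile cutoffs are illustrative assumptions; the paper does not state the exact outlier criterion applied to the well logs.

```python
import numpy as np

def preprocess(log, lo_pct=1.0, hi_pct=99.0):
    """Remove abnormal readings from a well log and min-max normalise it.

    The percentile thresholds are hypothetical; any field-specific
    criterion for abnormal data could be substituted here.
    """
    log = np.asarray(log, dtype=float)
    log = log[np.isfinite(log)]                    # drop null/NaN readings
    lo, hi = np.percentile(log, [lo_pct, hi_pct])
    log = log[(log >= lo) & (log <= hi)]           # drop abnormal spikes
    return (log - log.min()) / (log.max() - log.min())  # scale to [0, 1]
```

Normalising input and output logs to a common range keeps the network weights well scaled, which is the usual motivation for min-max normalisation before training.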
Although the results of the four algorithms seem to follow similar patterns in Figures 10 and 11, Table 5 quantitatively presents the difference in performance according to data preprocessing. The discrepancy between the true and predicted density data is calculated as the mean of the RMSE over the trained networks:

$$\mathrm{RMSE} = \frac{1}{n}\sum_{j=1}^{n}\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\mathrm{Test}_{i}-\mathrm{Recon}_{i,j}\right)^{2}}, \tag{4}$$

where Test is the density log of the test well and Recon denotes the reconstructed values corresponding to Test. The subscripts i and j indicate the ith data point and the jth trained model, respectively; m is the number of data points of the test well, and n is the number of trained neural networks. In this study, m and n are 219 and 30, respectively.
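Equation (4) translates directly into a few lines of NumPy. The sketch below assumes the reconstructions of the n trained networks are stacked row-wise in an (n, m) array; this layout is a convenience of the example, not something specified in the text.

```python
import numpy as np

def mean_rmse(test, recon):
    """Mean RMSE over n trained networks (Equation (4)).

    test  : (m,)   true density log of the test well
    recon : (n, m) reconstructions from the n trained networks
    """
    test = np.asarray(test, dtype=float)
    recon = np.asarray(recon, dtype=float)
    per_model = np.sqrt(np.mean((recon - test) ** 2, axis=1))  # RMSE of model j
    return per_model.mean()                                    # average over j
```

Note that the square root is taken per trained model before averaging, matching the "mean of the RMSE" wording rather than an RMSE of the pooled errors.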
The data preprocessing results in decreased errors for all four training algorithms. trainrp shows unreliable training performance regardless of preprocessing, and the results of traingdm are sensitive to it. The overall reduced errors in Table 5 verify the necessity of proper preprocessing to obtain qualified training data. These results agree with previous studies that emphasised the importance of data processing [4, 6]. Next, to find the best case, we set eight combinations by varying the numbers of neurons in the two hidden layers; note that the training algorithm is fixed as trainlm. Figure 12 shows the error results of the eight combinations, where each error value is the average error of 30 synthetic density logs. Figures 12(a) and 12(b) show the validation and test errors, respectively. In Figure 12(a), the validation error consistently decreases as the number of nodes in the hidden layers increases, so a large number of nodes is advantageous for the validation error. However, the test error fluctuates as the number of nodes in the hidden layers changes (Figure 12(b)); the lowest test error occurs for the combination with 30-30 hidden layer nodes. The validation and test errors seem to behave differently because the test dataset does not share exactly the same distribution as the training or validation data, a problem that could be resolved if sufficient data were available.
Despite this discrepancy between the trends of the validation and test errors, a compromise between the two is needed to decide an appropriate combination of hidden layer nodes. Thus, we calculate the total error (Figure 12(e)) as the sum of the two normalised errors after the validation and test errors are normalised, as shown in Figures 12(c) and 12(d). Consequently, a proper combination of hidden layer nodes should lie around the 30-30 case. From the eight combinations at the first level of the hierarchical approach, we can thus define a reasonable range of neuron numbers for further analysis.
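The compromise between the validation and test errors can be written as a small scoring routine: min-max normalise each error vector across the candidate configurations and sum the two. The numeric values below are made up purely to show the mechanics, not taken from the study.

```python
import numpy as np

def total_score(val_err, test_err):
    """Min-max normalise the validation and test errors across candidate
    configurations, then sum them (as in Figures 9(e) and 12(e))."""
    def minmax(e):
        e = np.asarray(e, dtype=float)
        return (e - e.min()) / (e.max() - e.min())
    return minmax(val_err) + minmax(test_err)

# Hypothetical errors for four candidate configurations.
val = [0.10, 0.08, 0.05, 0.07]
tst = [0.20, 0.12, 0.25, 0.11]
score = total_score(val, tst)
best = int(np.argmin(score))  # configuration with the best compromise
```

Normalising before summing puts the two error types on equal footing, so neither dominates the choice simply because of its scale.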
Because the 30-30 combination is preferred in Figure 12, we randomly generate 200 combinations by changing the number of neurons of each of the two hidden layers from 5 to 45. Figure 13 shows the 200 combinations: the x- and y-axes are the numbers of nodes in the first and second hidden layers, respectively, and each circle represents one combination. The red and blue circles mark the best and worst 20 combinations, respectively. The performance of the 200 combinations is estimated in the same way as in Figures 9(e) and 12(e), i.e., as the sum of the normalised validation and test errors. The blue and red circles are clearly separated into the left and right sides. This reveals, first, that a large number of nodes in the first hidden layer is needed to achieve high performance and, second, that the overall performance is much less sensitive to the number of nodes in the second hidden layer. The best of the 200 combinations is the 45-14 case, marked with a red dotted circle in Figure 13, and the worst is the 5-33 case, marked with a blue dotted circle. After the best and worst combinations are trained with the 23 wells of Case 1, the two prediction models are applied to the test well of Case 1 (Figure 14) and to the additional five test wells in the Golden field (Figure 15). As mentioned above, although the five additional wells are supposed to be used in Case 2 (Figure 3(c)), they are used here for a comparison of the best and worst combinations. The difference in performance between the best and worst combinations appears more clearly for the additional five test wells (Figure 15). For both combinations, a large uncertainty is expected because of the limited well log data available for the additional five test wells (Figure 3(c)).
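The second level of the hierarchical search can be sketched as a random search over node combinations. The evaluate function below is a synthetic placeholder (the real study trains 30 networks per combination and sums the normalised errors); its shape, favouring a large first hidden layer and being only weakly sensitive to the second, is merely chosen to mimic the observation reported above.

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 random combinations of neurons for the two hidden layers,
# each drawn from the explored range 5 to 45.
combos = rng.integers(5, 46, size=(200, 2))

def evaluate(n1, n2):
    """Placeholder score standing in for training the 2-hidden-layer DNN.
    Lower is better. Purely illustrative: 1/n1 rewards a large first
    layer, the weak |n2 - 14| term and noise keep the second layer and
    run-to-run variability from mattering much."""
    return 1.0 / n1 + 0.002 * abs(n2 - 14) + rng.normal(0.0, 0.005)

scores = np.array([evaluate(n1, n2) for n1, n2 in combos])
order = np.argsort(scores)
best20, worst20 = combos[order[:20]], combos[order[-20:]]
```

With a real objective, the same loop would reproduce the red/blue separation of Figure 13: the best combinations cluster at large first-layer sizes while spreading across second-layer sizes.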
Even though the best combination is trained with limited information, its averages (the blue lines in Figures 15(a)-15(e)) tend to follow the overall trend of the test wells. In contrast, the worst combination presents large discrepancies between the blue and red lines (Figures 15(f)-15(j)): although its blue lines seem to mimic the pattern of the test data, they underestimate the density data of the test wells, and the degree of underestimation is worse than that of the best combination. Table 6 compares the performance of the best and worst cases using the RMSE, where each RMSE is the mean of the reconstruction results from the 30 trained networks (Equation (4)). The worst case generally has an RMSE about twice as large as that of the best case, and the larger discrepancy with the test data mostly results from underestimation. Except for test well 1, the worst case shows more severe underestimation than the best case for the remaining four wells (Figure 15).

Case 2.
For both the best and worst combinations of neurons in the hidden layers, the pseudodensity data underestimate the actual test data (Figure 15). In [4], the authors mentioned that the prediction performance for wells located in the middle of the dataset is better because a neural network may interpolate the information of adjacent wells. Zhang et al. also stressed that prediction performance can be improved significantly by information from the target field [1]. Therefore, to address this problem, Case 2 uses the five additional wells in the Golden field to train and test the best-combination model, as presented in Table 3 and Figure 3(c). Figure 16 shows that Case 2 has five subordinate cases: 2-1, 2-2, 2-3, 2-4, and 2-5. For example, Case 2-2 uses one test well (well number 2 among the five) and the remaining four wells (numbers 1, 3, 4, and 5) as training data (the upper graph in Figure 16(b)). Case 2 is expected to yield better test performance than Case 1 because of the additional training data from the target field. Note that the test well of Case 1 also belongs to the training data of Case 2.
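The five subordinate cases of Case 2 amount to a leave-one-out split over the Golden-field wells, with the surrounding wells always kept in the training set. A minimal sketch, using randomly generated stand-ins for the well logs (all names, counts per log, and values are hypothetical):

```python
import numpy as np

# Hypothetical stand-ins: (depth-sample, [sonic, density]) pairs for the
# 24 surrounding wells and the 5 wells of the Golden field.
rng = np.random.default_rng(2)
surrounding = [rng.normal(size=(200, 2)) for _ in range(24)]
golden = [rng.normal(size=(200, 2)) for _ in range(5)]

def make_case2_splits(surrounding, golden):
    """Build Cases 2-1 to 2-5: each Golden-field well is held out for
    testing once, while the 24 surrounding wells plus the remaining four
    Golden-field wells form the training set."""
    splits = []
    for k, test_well in enumerate(golden):
        train = surrounding + [w for j, w in enumerate(golden) if j != k]
        splits.append((train, test_well))
    return splits

splits = make_case2_splits(surrounding, golden)
# Each split trains on 28 wells and tests on the single held-out well,
# e.g., Case 2-2 holds out well number 2.
```

Holding out each target-field well in turn gives five independent checks of how much the in-field training data improve the synthetic logs.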
In Figures 16(a)-16(e), the graphs in the first row are the synthetic density logs for each test well; they should be compared with Figures 15(a)-15(e) to analyse the effect of the additional training data. The maps in the second row show the locations of the wells. In the case of Figure 15, only the 23 training wells are used, and they are separated to some extent from the target field: the 23 wells are mostly located around longitude −116.1° and latitude 56.3°, whereas the Golden field, including the five wells, is positioned around longitude −116.2° and latitude 56.5°. Therefore, it is difficult to properly predict the density logs of the five test wells in Figure 15 without geologically and spatially related data.
The predictions for the test data in Figure 16 show improved performance compared with those in Figure 15. No matter which of the five additional wells is set as the test well, the results are better than those in Figures 15(a)-15(e). In particular, in Figures 16(b) and 16(c), the underestimation problem of Case 1 is mitigated compared with Figures 15(b) and 15(c). This improvement is also confirmed quantitatively (Table 6): the test error of Case 2 generally decreases to about half of the error of the best combination of Case 1. These results stem from two factors. First, the number of training data increases in Case 2 relative to Case 1. Second, geologically and spatially suitable wells help capture the nonlinear relationship between the sonic and density logs of the target field.
However, a limitation still exists. Although the synthetic log in Figure 16(e) mimics the overall trend of the true density curve, it fails to predict the abnormal value of around 2,400 kg/m³ near the 50th data point. Despite this problem, the results in Figure 16 can be regarded as a reasonable prediction because it is more difficult to generate a density log than other logs, such as acoustic and resistivity logs [1, 4]. Table 7 summarises the results of the error calculation according to the following equation.