Managing Uncertainty in Geological Scenarios Using Machine Learning-Based Classification Model on Production Data



Introduction
Reliable reservoir modeling is one of the most important tasks in the decision-making process of field development planning. Various types of static data are combined to build a reservoir model, most commonly core samples, well logs, seismic interpretation, outcrops, and geological concepts. In particular, cores and well logs are important as hard data for reservoir modeling, but they are only available through costly drilling. Therefore, understanding spatial correlations, expressed through variograms and training images (TIs), is important for generating reservoir properties where drilling data are not available.
Conventional geostatistical algorithms such as sequential Gaussian simulation and kriging describe spatial relationships using variograms. However, a variogram can only assess the relationship between two points, even when other data are available nearby [1]. Starting in the 1990s, multipoint geostatistics (MPS), which considers sets of three or more data points, has been developed [2,3]. In MPS, a TI is used in place of a variogram to convey spatial correlation information. Whereas variograms are estimated by robust formulas, TIs are based on geological scenarios and geological concepts [1].
When modeling channelized reservoirs, the channel direction in the TI is one of the most important parameters because it affects connectivity between injection and production wells. Some researchers have used multiple TIs to evaluate plausible geological scenarios and thereby account for uncertainty in channel direction [4][5][6]. Even when the same hard and soft data are used with the same geostatistical method, quite different reservoir models can be created depending on which TI is used [6]. In addition, channel stationarity and ergodicity are also important for securing connectivity when modeling a channel reservoir [7].
To determine the most reasonable TI from among multiple candidates, previous studies have proposed two approaches: production-based TI rejection and the blind well test. TI rejection excludes unsuitable TIs, i.e., those whose static models yield large errors against the observed production history after reservoir simulation [6,8,9]. However, it requires complex procedures, e.g., distance-based clustering, which is sensitive to the number of clusters and the definition of distance [10]. The blind well test quantitatively measures how well logging data excluded from the MPS run are restored by models from the various candidate TIs [11]. However, this approach is sensitive to random seeds, because of the equiprobability of sequential simulation, and to which logging data are withheld for the blind test.
Machine learning has been applied to a wide variety of research topics including speech recognition [12,13], public health [14], and gameplay [15]. Recently, machine learning algorithms have been suggested as a solution to problems in reservoir characterization. If a set of reservoir models and their production data is available, both proxy and inverse models can be built by machine learning methods. To construct a proxy (or surrogate) model, supervised learning is carried out using reservoir parameters in the input layer and production responses in the output layer. This approach has primarily been studied as a replacement for compositional and unconventional simulations, which require significant simulation time [16][17][18][19][20]. Inverse models reverse the roles of the parameters, using production responses in the input layer and reservoir parameters in the output layer. After implementing reservoir simulation on hundreds of initial reservoir models to obtain training data, observed production data can be used to build history-matched reservoir models [21,22].
Most previous studies have relied on simple artificial neural network (ANN) algorithms, which have occasionally been extended to deeper ANN models by increasing the number of hidden layers [23]. Recent advances in deep learning have been driven by more state-of-the-art algorithms, such as the probabilistic neural network (PNN), recurrent neural network (RNN), convolutional neural network (CNN), and generative adversarial network (GAN). PNN consists of input, pattern, summation, and decision layers and has been applied to lithofacies classification for more reliable permeability modeling [24]. A number of studies have recently attempted to apply RNNs to reservoir time-series data such as production rate and pressure [25][26][27][28][29]. RNNs have been applied in place of decline curve analysis (DCA) to predict the production of shale reservoirs, as unconventional reservoirs do not satisfy standard assumptions of DCA [25,27,28]. Some studies have applied CNN algorithms to image data, e.g., seismic data and core images. Using seismic data, fault interpretation [30] and object classification [31] can be automated via CNN. The CNN algorithm has been successfully applied to microcomputed tomography images to estimate porosity and pore size [32], to drilling cuttings to classify lithofacies [33], and to SEM image segmentation for mineral characterization [34]. A convolutional autoencoder has been applied to extract main features from seismic images, with the resulting reparameterized data used for ensemble-based history matching [35]. GAN is a popular generative model that trains two different networks simultaneously. GAN can generate new samples based on the distribution of training data, and the samples can be controlled by conditioning the networks [36]. Also, spatially correlated data can be generated by GAN without any additional MPS, which reduces the computational cost of reservoir modeling [37].
In this study, a novel classification model for four TIs with different channel directions was developed from production data using machine learning algorithms. The proposed classification model takes production data in the input layer and provides a probability for each TI in the output layer. In other words, given a production history in the input layer, the model indicates which channel direction is most likely. The probability for each TI was then used to regenerate channelized reservoir models and to assess the effect of reducing uncertainty in channel direction. For comparison, three algorithms, support vector machine (SVM), ANN, and CNN, were applied and their TI classification accuracies were compared. In particular, this is the first application of a CNN to a two-dimensional matrix of production history. The SVM is a conventional machine learning algorithm, not based on the concept of a neural network, that has shown reliable performance in facies classification [38][39][40]; in this study, it was verified against the TI classification problem.
In Section 2.1, the workflow of the proposed method is introduced, and the following Section 2.2 explains the SVM, ANN, and CNN algorithms. Section 2.3 deals with the dataset used in this study. Sections 3.1 and 3.2 show classification results among the four TIs for the training and test data sets using the SVM, ANN, and CNN. In Section 4.1, the trained models are tested against the reference field to obtain the probability of each TI, and this probability is then used to regenerate channel models in Section 4.2. Section 5 summarizes the findings of this research and directions for future work.

Methodology
2.1. Proposed Classification Model. Geological uncertainty in reservoir models can hinder reliable prediction of production performance. Because reservoir properties are highly heterogeneous in channel reservoirs, it is especially important to construct plausible geological models for managing this uncertainty. Our goal is to construct a machine learning-based classification model for the TI in MPS to reduce geological uncertainty. Figure 1 shows the concept of the proposed method, which is a kind of inverse model because a static parameter, the TI, is estimated from production history. The input data in our model are oil and water production histories, and the output is the probability of each geological scenario, which indicates the most probable TI of the reservoir.
We tested three machine learning algorithms in this study: SVM, ANN, and CNN. In particular, we focused on whether the CNN can be applied to time-series data such as production rates, although it is best known for its performance on image data, e.g., distributions of permeability, saturation, and pressure. In this study, we arranged the production data into a 2D matrix to apply a 2D CNN.

Machine Learning Methods
2.2.1. Support Vector Machine (SVM). SVM is one of the most powerful classifiers among supervised learning algorithms [41,42]. SVM finds a hyperplane that separates a given set of data from other groups. It is effective at classifying data into two categories but can also be used for multiclass classification [43,44]. Figure 2(a) shows a toy problem involving 80 training data points within a 2-dimensional space. The data are categorized into two classes, colored red and blue. Using the training data, the SVM finds a hyperplane (a solid line) that maximizes the margin between the two groups. The data closest to the hyperplane are defined as support vectors (highlighted in green), and the margin is calculated as the sum of the distances between the hyperplane and the support vectors (dotted lines). After training, the SVM model can classify new datasets, as shown in Figure 2(b). The 20 new data points, 9 yellow and 11 cyan, are successfully grouped into the red and blue areas, respectively.
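A toy problem like the one in Figure 2 can be sketched in a few lines with scikit-learn. The cluster centers, spreads, and random seed below are illustrative choices, not values from the paper's figure:

```python
# Toy maximum-margin SVM classification: two well-separated 2D point clouds,
# 80 training points and 20 new points, analogous to Figure 2.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 80 training points: 40 per class around two separated centers
X_train = np.vstack([
    rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(40, 2)),
    rng.normal(loc=[+2.0, 0.0], scale=0.5, size=(40, 2)),
])
y_train = np.array([0] * 40 + [1] * 40)

clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# 20 new points drawn from the same two distributions (cf. Figure 2(b))
X_new = np.vstack([
    rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(9, 2)),
    rng.normal(loc=[+2.0, 0.0], scale=0.5, size=(11, 2)),
])
y_new = np.array([0] * 9 + [1] * 11)
print(clf.score(X_new, y_new))  # well-separated clusters classify cleanly
```

The support vectors found during training are available as `clf.support_vectors_`, which is what Figure 2(a) highlights in green.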

2.2.2. Artificial Neural Network (ANN). Neural networks are inspired by the human brain, which is good at pattern recognition [45]. Each neural network has input and output layers and multiple hidden layers between them to solve nonlinear problems [46][47][48]. The multilayer perceptron is the basic form of neural network, from which various methods such as CNN, RNN, and GAN have been developed. In this study, an ANN with two hidden layers was used to train the TI classification model.
The ANN can show unstable performance when the amount of data is relatively small for the given problem [23]. In this case, hyperparameters such as the number of hidden layers and the number of neurons in each hidden layer should be examined through sensitivity analysis. Nowadays, hyperparameter optimization can be implemented automatically by automated machine learning tools such as the AutoKeras algorithm [49].
Training data are entered into the input layer, and their labels are used in the output layer to train the ANN model. The objective is to minimize the misfit between the true labels and the estimated values by optimizing the weights and biases of all connections between layers. There are several ways to improve the classification performance of an ANN [50]; we used min-max scaling and the dropout technique. Because we want the output layer to give the probability of each TI, the softmax activation function is used so that the output neurons sum to 1.
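The two ingredients named above, min-max scaling of the inputs and a softmax output over the four TIs, can be sketched in NumPy. The numeric values below are made-up illustrations, not data from the study:

```python
import numpy as np

def min_max_scale(x):
    """Scale a feature vector linearly to [0, 1], as done for the production data."""
    return (x - x.min()) / (x.max() - x.min())

def softmax(z):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

rates = np.array([120.0, 80.0, 40.0, 0.0])  # hypothetical oil rates (STB/day)
scaled = min_max_scale(rates)               # largest -> 1.0, smallest -> 0.0

logits = np.array([2.0, 0.5, -1.0, 0.1])    # hypothetical scores for the 4 TIs
p = softmax(logits)
print(scaled, p.sum())                      # probabilities sum to 1
```

In a trained network the softmax inputs are the activations of the final layer rather than hand-picked numbers, but the normalization is the same.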

2.2.3. Convolutional Neural Network (CNN). CNN is one of the most popular deep learning algorithms and is superior to the ANN in terms of feature extraction and training ability for image data. The input data format for a CNN is a 2D or 3D matrix, making CNNs widely used in image and vision processing [51][52][53]. The main factor differentiating CNN from ANN is the use of convolution and pooling layers to extract image features automatically [54].
In the convolution layer, filters are used to extract the main features from the image data, and the weights in the filters are adjusted to minimize the objective function during training (Figure 3(a)). Pooling is a downsampling scheme that retains the main features; Figure 3(b) shows a simple example of max pooling, in which the maximum value is selected among the values within the pooling filter. The combination of convolution and pooling makes the CNN superior to other deep learning methods at learning from image data. Detailed information on the CNN architecture used in this study is provided in Section 3.2.1.
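Max pooling as illustrated in Figure 3(b) reduces each 2 × 2 block of a feature map to its largest value. A minimal NumPy version, with a made-up 4 × 4 feature map:

```python
import numpy as np

def max_pool_2x2(img):
    """2x2 max pooling with stride 2: keep the largest value in each block."""
    h, w = img.shape
    # trim odd edges, split into 2x2 blocks, take the max of each block
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])
print(max_pool_2x2(feature_map))
# -> [[4 2]
#     [2 8]]
```

Unlike the convolution filters, this operation has no trainable weights; it only downsizes the feature map by 50% in each direction.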

2.3. Dataset for the Analyses. Synthetic reservoir models were generated using the Stanford Geostatistical Modeling Software (SGeMS). We constructed two-dimensional channel reservoir models with distinct distributions of sand and shale facies. High-permeability sand was spread out as channel facies, making it important to characterize the direction and connectivity of the channels to properly predict oil and water production. We assume that there is uncertainty in the geological scenarios of channel direction and that the four images in Figure 4 are plausible TIs [6]. We want to know which TI is most appropriate for the given production history data.
To train the machine learning-based classification model, a total of 800 models were constructed by the single normal equation simulation module in SGeMS: 200 models for each TI. Each model was laid out on a 25 × 25 × 1 grid system, with each grid block measuring 50 × 50 × 50 = 125,000 ft³. As shown in Figure 5(a), we assumed nine wells at which core data confirmed sand facies. They consist of eight production wells and a single water injection well, located in the center of the model to improve production efficiency via waterflooding. The reference model used in the study is shown in Figure 5(b); it has channel connectivity in the vertical direction because it was generated using the vertical (0-degree) TI in Figure 4(a). Uniform permeability was assigned to each facies: 1,000 md for sand and 1 md for shale [5,6,8]. Figure 5(c) shows four of the 200 reservoir models corresponding to each TI. Even though the same hard data were used with the same geostatistical algorithm, the uncertainty in the geological scenario, i.e., the TI, has a large impact on the resulting models. We split the 800 reservoir models into three groups for constructing the machine learning-based classification model. Twenty of the 200 models from each TI were selected for the test set, and the remaining 180 models were divided in an 80/20 ratio into 144 training models and 36 validation models. Thus, of the 800 initial models from the four TIs, the 576 training and 144 validation data were chosen randomly, while the 80 test data were selected uniformly from each TI group.
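The per-TI split described above (20 test, then 144/36 train/validation from the remaining 180) can be sketched as follows. The seed and the choice of which indices go to the test set are illustrative; only the group sizes come from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed is illustrative
train_idx, val_idx, test_idx = [], [], []

for ti in range(4):                      # four TIs, 200 models each
    idx = np.arange(ti * 200, (ti + 1) * 200)
    test_idx.extend(idx[:20])            # 20 test models per TI (uniform)
    rest = rng.permutation(idx[20:])     # remaining 180 models, shuffled
    train_idx.extend(rest[:144])         # 80% -> 144 training models
    val_idx.extend(rest[144:])           # 20% -> 36 validation models

print(len(train_idx), len(val_idx), len(test_idx))  # -> 576 144 80
```

Splitting within each TI group keeps all three sets balanced across the four geological scenarios.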
We have each geological model and its TI, but additional information, the model's dynamic data, is required to build the classification model based on production data. Reservoir simulation was conducted on the 800 initial models using ECLIPSE 100 from Schlumberger to obtain production responses. Four dynamic quantities, i.e., well oil production rates, well water cuts, and field cumulative oil and water productions, were recorded up to 1,000 days at 20-day intervals. Therefore, each model has a total of 900 production data: 400 well oil production rates (50 time steps × 8 producers), 400 well water cuts, and 50 field cumulative oil and 50 field cumulative water productions. Finally, each initial model has a set of production data and a TI index, which were used in the input and output layers, respectively.
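Assembling the 900-element feature vector per model can be sketched as below. The concatenation order (oil rates, water cuts, cumulative oil, cumulative water) is an assumption for illustration; random numbers stand in for simulator output:

```python
import numpy as np

n_wells, n_steps = 8, 50                 # 8 producers, 50 time steps (20-day)
rng = np.random.default_rng(0)           # placeholder for ECLIPSE output
wopr = rng.random((n_wells, n_steps))    # well oil production rates (400 values)
wwct = rng.random((n_wells, n_steps))    # well water cuts (400 values)
fopt = rng.random(n_steps)               # field cumulative oil production (50)
fwpt = rng.random(n_steps)               # field cumulative water production (50)

features = np.concatenate([wopr.ravel(), wwct.ravel(), fopt, fwpt])
print(features.shape)  # -> (900,)
```

One such 900-vector per model, paired with its TI index as the label, forms one training example.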

Training and Validation of Machine Learning Models
The purpose of the machine learning-based classification models is to predict the TI index (coded as 1, 2, 3, and 4 for 0, 45, 90, and 135 degrees, respectively) corresponding to the production data in the input layer. We trained the SVM, ANN, and CNN using the dataset described in Section 2.3. Classification performance for the TI was compared quantitatively using the accuracy scores for the training, validation, and test sets. In particular, we paid attention to the test set score because it indicates the general applicability of the trained models. We used Python 3.7 and several Python libraries, including NumPy and scikit-learn.

3.1.1. SVM Models with the Original Raw Data. When the dimensionality of the data is high, the hyperplane can become obscure, which deteriorates the classification. Therefore, rather than using all input parameters (the 900 production data in this study), it is preferable to exclude redundant features via feature selection to improve classification results. Unlike the CNN, the SVM requires feature extraction as a preprocessing step before training. The SVM was therefore coupled with a discrete wavelet transform (DWT), a widely used method for extracting principal features from input data [55][56][57]. The DWT uses the superposition of a group of wavelets to construct basis functions for the wavelet transform. Various types of wavelets can be used for this purpose, including Haar, Daubechies, biorthogonal, coiflet, symlet, and Meyer wavelets [58]. Because our dataset was not sensitive to the wavelet type, we used Haar wavelets, the simplest type of wavelet, to carry out the DWT.
The number of selected features can be adjusted via the decomposition level of the DWT. Increasing the decomposition level reduces the number of features; a rule of thumb is that the level should not exceed $\log_2 N$ for $N$ input features [59], which is nine or less here because the input feature vector has 900 elements. A sensitivity analysis on the decomposition level was implemented to optimize the performance of the SVM through feature extraction. Table 1 lists the accuracies of the SVM models on the training, validation, and test sets as the decomposition level is adjusted from zero to nine. The accuracy scores were calculated by dividing the number of correct classifications by the size of each dataset: 576 training, 144 validation, and 80 test data. Conventional classification models, e.g., for predicting whether a landslide has occurred or not, are typically evaluated with a confusion matrix because there are only two options [60][61][62]. However, the TI classification model uses the probabilities of the TIs themselves to evaluate geological scenarios instead of selecting only the highest-probability TI as a one-hot encoded solution. It is more important to review possible geological concepts and assess the uncertainty in TIs than to obtain a single clear answer.
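The mapping from decomposition level to feature count can be sketched with a minimal Haar approximation cascade: at each level the signal is pairwise averaged (and odd lengths padded by repeating the last sample, an assumed boundary convention; the actual handling in [58] may differ), roughly halving the number of coefficients:

```python
import numpy as np

def haar_approx(signal, level):
    """Keep only the Haar approximation coefficients after `level` decompositions."""
    x = np.asarray(signal, dtype=float)
    for _ in range(level):
        if len(x) % 2:                 # pad odd-length signals (assumed convention)
            x = np.append(x, x[-1])
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # scaled pairwise averages
    return x

signal = np.random.default_rng(0).random(900)  # a 900-point feature vector
print([len(haar_approx(signal, lv)) for lv in range(1, 10)])
# -> [450, 225, 113, 57, 29, 15, 8, 4, 2]
```

Note that this cascade reproduces the feature counts discussed for Table 1: 450 features at level 1, 15 at level 6, and 2 at the highest level, 9.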
As the decomposition level increases, the number of selected features decreases. The SVM model using only two features (the highest decomposition level) gave the poorest classification for the three data sets, although two features are convenient for visualization. Figure 6 shows the distribution of the 80 test reservoir models in the space of the two features. If the test set could be classified properly with only these two features, a sharp hyperplane could be defined to separate the models according to TI; however, the mixed TI indexes on the 2D plane indicate the poor accuracy of the SVM.
The curves in Figure 7 plot the accuracy scores of the sensitivity analysis in Table 1. The blue, red, and green lines indicate the results for the training, validation, and test sets, respectively. Increasing the number of features to four, eight, and then fifteen significantly improved the scores, which fluctuated beyond fifteen features. However, more features are not always better: the best test set score was obtained using 450 features, while the second best was obtained using 29 features. Because the slopes of the lines started to change beyond decomposition level 6, 15 features were selected as the desirable number for the given problem.
3.1.2. SVM Models with Reduced Raw Data. In the previous section, the sensitivity analysis addressed feature reduction from the original 900 raw data: 400 oil production rates, 400 water cuts, and 50 cumulative oil and 50 cumulative water productions. Because most problems in reservoir engineering are underdetermined, a large amount of input data generally helps in obtaining improved solutions.

However, in classifying the TIs using the SVM, a large number of features did not guarantee better results. Moreover, redundant measurement data can introduce spurious information during history matching [63]. Thus, in this section, we tested the performance of the SVM with reduced raw datasets to improve the accuracy score.
The number of raw data was reduced by either (a) using only the oil production rates and water cuts of the producers (800 features in total) or (b) using the oil production rates only (400 features). Tables 2 and 3 show the scores of the SVM for each case. In Table 3, the maximum decomposition level was eight, rather than the nine used in Tables 1 and 2, because the number of features was limited to 400. The results in Tables 2 and 3 followed a trend similar to that of the SVM in Table 1. When only two features were used, for both the 800 and 400 raw data, the SVM was not trained properly and gave unreliable classifications for the test set. The largest increases in test set score occurred at four and seven features for the 800 and 400 raw data, respectively.
Above thirteen features, the scores appeared to remain stable. Figure 8 compares the test set scores for the three cases: the 900 data from Section 3.1.1 (purple line), the 800 data (black line), and the 400 data (cyan line). The 400-data line showed the lowest test set scores regardless of the decomposition level. This suggests that it is preferable to use various types of data, i.e., oil and water, to improve the SVM rather than a single type of production data. The 800-feature line showed higher scores than the 900-feature line, probably because mixing similar types of data, e.g., well oil production and field cumulative oil production, hinders the extraction of principal features via the DWT. Determining an optimal number of features in the DWT is important: all three SVM models showed reliable classification performance when the decomposition level was near six, regardless of the number of raw data.

3.2. ANN and CNN. In this section, we analyzed the TI classification problem using the ANN and CNN, which have multiple hidden layers for training classification models. Based on Section 3.1.2, 800 dynamic data, the 400 oil production rates and 400 water cuts, were used in the input layer. The main difference between the ANN and CNN is the arrangement of the 800 input data: the CNN requires input data as an image, i.e., a matrix, while the ANN takes a vector in the input layer. Accordingly, we constructed a matrix-form dataset of the production data for the CNN (Figure 9(a)). This matrix displays the production trend over time as an image (Figure 9(b)). Before training both algorithms, the 800 production data were normalized to values from 0 to 1 using a min-max scaler to remove the effect of unit scale. We used this image as the input to the CNN.

Figure 8: Test set scores of the decomposition level depending on the number of raw data: the 400, 800, and 900 data.
3.2.1. The Structure of the ANN and CNN Models. An ANN generally improves as its hidden layers are made "deeper." It is therefore important to set similar numbers of parameters in the ANN and CNN architectures when comparing their performances. Tables 4 and 5 show the structures of the ANN and CNN, respectively. We built the ANN with two hidden layers of sizes 450 and 200. The input and output layers were fixed as the 800 production data and the probabilities of the four TIs, respectively. The total number of parameters across the layers in Table 4 is 451,454. Table 5 provides the detailed structure of the CNN model. The CNN architecture was configured with two convolution layers and two fully connected layers. The number of elements in the input matrix (16 × 50) was equal to that of the input vector in the ANN. The first convolution layer used 32 filters of size 5 × 5, and zero-padding was used to maintain the input size. Then, max pooling was used to downsize the data by 50%. After repeating the process with 64 filters of size 5 × 5 × 32 in the second convolution layer, the data were flattened for the fully connected layer, which had 128 nodes. Finally, it was fully connected to the four-node output layer, matching the number of TIs. A total of 478,724 parameters was used in the CNN, summing the parameters of the convolutional and fully connected layers. Note that the max pooling layers have no trainable parameters because their operation is predetermined. Therefore, the numbers of parameters in the ANN and CNN were reasonably similar.
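The parameter totals in Tables 4 and 5 can be verified by simple counting arithmetic. The pooled width of 13 (from 25) implies that the pooling also pads odd sizes upward, an assumption consistent with the stated CNN total:

```python
def conv_params(kh, kw, c_in, c_out):
    """Trainable parameters of a 2D convolution layer (weights + biases)."""
    return kh * kw * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer (weights + biases)."""
    return n_in * n_out + n_out

# ANN: 800 inputs -> 450 -> 200 -> 4 outputs
ann_total = dense_params(800, 450) + dense_params(450, 200) + dense_params(200, 4)
print(ann_total)  # -> 451454

# CNN: 16x50x1 input; two conv blocks (zero-padded conv, 2x2 max pooling), then FC
# Feature map sizes after pooling twice: 16x50 -> 8x25 -> 4x13
cnn_total = (conv_params(5, 5, 1, 32)          # first conv layer: 832
             + conv_params(5, 5, 32, 64)       # second conv layer: 51,264
             + dense_params(4 * 13 * 64, 128)  # flatten -> 128 nodes: 426,112
             + dense_params(128, 4))           # output layer: 516
print(cnn_total)  # -> 478724
```

Both totals match the values reported in the text, confirming that the two architectures are of comparable capacity.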

3.2.2. Classification Model by the ANN Algorithm. Using the structure described in Section 3.2.1, we trained the ANN model to classify the TIs based on the production data. The Adam optimizer was applied to minimize the loss function at a learning rate of 0.0001. Training ran for 1,000 epochs with a batch size of ten.
The accuracy scores of the training, validation, and test sets by the ANN are shown in Figure 10. The difference between Figures 10(a) and 10(b) is the effect of the dropout layer, a useful regularization technique for preventing the ANN from overfitting the training set. When the dropout layer was not added (Figure 10(a)), overfitting occurred after 100 epochs, with the training score converging to 1. Because the ANN model barely changed after these epochs, the validation and test scores did not improve any further. Validation and test performance matter more than training performance, since they determine how reasonably the classification model applies to unseen observed production data.
When dropout was used (Figure 10(b)), the scores of the validation and test sets varied between 0.8 and 0.9, while the training set score converged to 0.95 at 1,000 epochs. However, the ANN model required a very large number of epochs for training, and the test set score was still fluctuating. Although the ANN model could be further improved by changing the batch size, optimizer, and structure, these hyperparameters were fixed because our purpose was to compare the classification performance of the ANN and CNN.

3.2.3. Classification Model by the CNN Algorithm. The CNN was applied to the matrix of the 800 production data. Other settings such as batch size and optimizer were the same as those used for the ANN. The dropout rate was set to 0.5, and the model was trained for 100 epochs. Figure 11 shows the results of the CNN for the data arrangement in Figure 9(a). The training set score was approximately 0.95, indicating that the network was properly fitted. The scores of the validation and test sets at the 100th epoch were 0.9 and 0.85, respectively. Because the test score increased by 30.77% compared with the ANN's score in Figure 10(b) at the same epoch, the CNN was more effective than the ANN at making stable and reliable TI classifications. In the case of the ANN, important information about the production data, such as well index, time index, and type of dynamic data, was lost in the flat vector of 800 values. In the matrix arrangement, by contrast, the columns follow the time steps and the rows follow the well index and the type of dynamic data (Figure 9(a)). It is also apparent that the input production image was effectively interpreted by the CNN. We then examined whether the accuracy of the CNN was affected by the data arrangement. Because the production image in Figure 9(b) is not a picture with a distinct shape, we reconstructed the image by varying the arrangement of the 800 production data. Figures 12(a) and 12(b) show the alternatives to the original arrangement in Figure 9(a). In case 1 (Figure 12(a)), the 800 data were arranged by production well, and the matrix size was the same as in the original case, 16 × 50. Case 2 (Figure 12(b)) was a transposition of the original data configuration (a 50 × 16 matrix).

Figure 11: Accuracy scores of the CNN for the data arrangement in Figure 9(a).
The accuracy scores for cases 1 and 2, shown in Figures 12(c) and 12(d), respectively, followed a trend similar to that of the original case in Figure 11. Although there were some differences in accuracy scores among the three cases, the effect was not critical to the overall performance of the CNN. Therefore, the classification was not sensitive to the arrangement of the production data, because the CNN can extract important features automatically through its convolutional and pooling layers [64].

Uncertainty Quantification Using the Trained Models

4.1. Application to the Reference Model. Because the 800 initial reservoir models were generated from the four different TIs (Figures 4 and 5(c)), there was a high degree of uncertainty in the initial models. To reduce the uncertainty in geological scenarios, we want to know which TI is most appropriate for the reference model in Figure 5(b). After the trained SVM, ANN, and CNN models were applied to the observed production data from the reference field, the probability of each TI, the output of the trained models, was used to regenerate reservoir models in proportion to its value. Table 6 compares the probabilities of the TIs obtained by the three trained models. The SVM assigns TI 1 a probability of 83.4%, a reliable result because the reference model was generated using TI 1. The SVM-estimated probability for TI 3 is the lowest among the four indexes, at 0.46%, because the horizontal channel connectivity of TI 3 in Figure 4(c) contrasts most strongly with the vertical connectivity of TI 1 in Figure 4(a). Therefore, the SVM gives a proper estimation when applied to the reference production data.
The ANN produces a completely erroneous estimation in which the observed data are linked to TI 4 only: the probability of TI 4 is 100%, and the remaining TIs are discarded. The probability for TI 1 from the CNN, 94.77%, is higher than that from the other models, and the probability for TI 3 is zero. The estimates for TI 2 and TI 4 are 1.14% and 4.09%, respectively, both relatively small values. The CNN provides the best estimate among the three trained models; it can interpret the hidden geological information in the observed data, such as the time sequence and the locations of wells, through the matrix of the production history.

4.2. Comparisons of the Initial Reservoir Models and the Regenerated Models. One hundred new reservoir models were generated using the probabilities in Table 6. The concept of TI rejection was adapted for the regeneration of the reservoir models [6,8,9]. Table 7 shows the composition of the 100 regenerated models for each of the three machine learning algorithms. For example, the 100 new models from the CNN consist of 95 models from TI 1, one model from TI 2, and four models from TI 4. Note that the initial 800 models were built from the four TIs at the same rate (200 models per TI).
Even though the hard data were fixed as the nine points in Figure 5(a), the concept of a pseudo facies probability map [6,8] was applied as soft data in the MPS algorithm to improve the reliability of the regenerated models. The map is defined as the mean of selected models among the 800 initial models, namely those with less error between the simulated and observed dynamic data. Note that no additional reservoir simulation was required to calculate the error, because reservoir simulations had already been performed for all initial models to train the machine learning models. The error was evaluated by the absolute misfit of the well oil production rates as follows:

$$E = \frac{1}{N_w N_t} \sum_{i=1}^{N_w} \sum_{j=1}^{N_t} \left| q_{o,i,j}^{\mathrm{sim}} - q_{o,i,j}^{\mathrm{obs}} \right|,$$

where $N_w$ and $N_t$ are the numbers of production wells and observed time steps, respectively, which are set to 8 and 50, and $q_o^{\mathrm{sim}}$ and $q_o^{\mathrm{obs}}$ are the simulated and observed oil production rates, respectively.
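The absolute misfit over all wells and time steps is a one-liner in NumPy; whether the absolute differences are summed or averaged does not affect the ranking used to pick the best models. The arrays below are placeholders, not simulation output:

```python
import numpy as np

def oil_rate_misfit(q_sim, q_obs):
    """Mean absolute misfit of well oil production rates over all wells and steps."""
    return np.mean(np.abs(q_sim - q_obs))

n_wells, n_steps = 8, 50
rng = np.random.default_rng(1)
q_obs = rng.random((n_wells, n_steps)) * 100            # placeholder observed rates
q_sim = q_obs + rng.normal(0, 5, (n_wells, n_steps))    # placeholder simulated rates

print(oil_rate_misfit(q_sim, q_obs))
```

Ranking all 800 initial models by this misfit and keeping the five smallest gives the models averaged into the facies probability map described next.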
In this study, the five permeability models with the smallest misfit were selected and transformed into facies models: 1,000 md to sand (index 1) and 1 md to shale (index 0). The mean of the five facies models in Figure 13 was then used as a facies probability map for the sand facies. This map was combined with the TI guideline in Table 7 and the hard data in Figure 5(a) within the MPS algorithm. Figure 14 shows the means of the 800 initial models and of the 100 regenerated models for each machine learning algorithm. The mean of the initial models shows no connectivity (Figure 14(a)) because the four TIs were applied equally when generating the initial models; only the locations of the hard data can be identified. By contrast, the means of the regenerated models show specific channel patterns. In the ANN case, because the 100 models were generated using only TI 4, which has a 135-degree channel direction (Figure 4(d)), the mean of the regenerated models shows connectivity along an approximately 135-degree direction (Figure 14(c)), which differs significantly from the reference field in Figure 5(a).
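The facies probability map described here is simply the cell-wise mean of the binarized selected models. A minimal sketch, assuming permeability is exactly 1,000 md for sand and 1 md for shale as stated; the tiny 2-by-2 grid is a toy stand-in for the real model dimensions:

```python
import numpy as np

def facies_probability_map(perm_models, sand_md=1000.0):
    """Build a sand-facies probability map from selected permeability models.

    perm_models: array (n_models, ny, nx) with values 1,000 md (sand)
                 or 1 md (shale). Each model is binarized (sand -> 1,
                 shale -> 0) and the ensemble mean gives P(sand) per cell.
    """
    facies = (np.asarray(perm_models) >= sand_md).astype(float)
    return facies.mean(axis=0)

# toy example: five 2x2 "models", sand in one cell of the first three
models = np.ones((5, 2, 2))            # start as shale (1 md)
models[:3, 0, 0] = 1000.0              # sand in three of five models
p_sand = facies_probability_map(models)
print(p_sand[0, 0])                    # 0.6 = 3/5
```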
The mean models of the regenerated models from the SVM and CNN reasonably mimicked the high-permeability connectivity of the reference field. For example, the connection between P1 and P2 in Figure 5(a) is present in Figures 14(b) and 14(d). The two algorithms produced similar results because most of the regenerated models in both cases were created from the true TI 1. Compared to the mean of the 800 initial models (Figure 14(a)), the models regenerated by the SVM and CNN significantly reduced the uncertainty in the channel distribution by identifying a proper TI from the observed data.

Table 7: TI rejection based on the TI probabilities in Table 6 to regenerate 100 models.

Conclusions
In this paper, we proposed a machine learning-based TI classification method driven by observed dynamic data. The probabilities of the four candidate TIs were evaluated in the output layer using the 400 oil production rates and 400 water cuts in the input layer. Three algorithms, the SVM, ANN, and CNN, were trained on the TIs and production histories of the 800 initial channelized reservoir models.
In the case of the SVM, the TI classification result was sensitive to the number of raw data points as well as the number of selected features. The accuracy score for the training set of 576 models increased sharply until the decomposition level reached 6, after which the score converged. This means that using a large number of features does not guarantee a reliable SVM result, and a sensitivity analysis should be conducted to determine the optimal feature size for the SVM.
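This sensitivity analysis can be sketched as a loop over candidate feature sizes. The example below uses scikit-learn's SVC on synthetic stand-in data, with simple feature truncation in place of the paper's wavelet decomposition; the data, feature sizes, and kernel settings are illustrative assumptions, not the study's configuration:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for the production-data features: 576 training
# models, 800 raw dynamic-data values each, 4 TI classes. The study
# extracts features by wavelet decomposition; here we simply keep the
# first k values to illustrate the feature-size sensitivity loop.
rng = np.random.default_rng(42)
n_models, n_raw, n_classes = 576, 800, 4
y = rng.integers(0, n_classes, size=n_models)
X = rng.normal(size=(n_models, n_raw)) + y[:, None] * 0.3  # class-shifted

scores = {}
for k in (5, 25, 100, 400, 800):        # candidate feature sizes
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X[:, :k], y)
    scores[k] = clf.score(X[:, :k], y)  # training accuracy

print(scores)  # accuracy does not keep rising with more features
```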
The ANN used a vector of the 800 dynamic data points as the input layer, whereas the CNN adopted a 16-by-50 matrix of the dynamic data. Under similar network complexity, the CNN was superior to the ANN in terms of accuracy scores because the CNN preserves structure in the dynamic data, such as the time dependence and the well locations.
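To illustrate why a matrix input preserves time dependence and well ordering, the sketch below runs a hand-rolled forward pass of a tiny CNN (one convolution layer, ReLU, global average pooling, softmax over four TIs) on a 16-by-50 input. It is a toy with random weights, not the trained network from the study:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tiny_cnn_forward(x, kernels, w_out):
    """Forward pass of a minimal CNN: one valid convolution with ReLU,
    global average pooling, and a softmax output over 4 TIs.

    x: (16, 50) production-data matrix (8 oil rates + 8 water cuts
       stacked as rows, 50 time steps as columns).
    kernels: (n_k, kh, kw) convolution filters sliding over wells x time.
    w_out: (4, n_k) dense weights mapping pooled features to TI logits.
    """
    n_k, kh, kw = kernels.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    pooled = np.empty(n_k)
    for k in range(n_k):
        fmap = np.empty((oh, ow))
        for i in range(oh):
            for j in range(ow):
                fmap[i, j] = np.sum(x[i:i+kh, j:j+kw] * kernels[k])
        pooled[k] = np.maximum(fmap, 0.0).mean()   # ReLU + global pooling
    return softmax(w_out @ pooled)                 # P(TI 1..4)

rng = np.random.default_rng(0)
probs = tiny_cnn_forward(rng.normal(size=(16, 50)),
                         rng.normal(size=(6, 3, 5)),
                         rng.normal(size=(4, 6)))
print(probs)   # four TI probabilities summing to 1
```

Each filter sees a local patch of adjacent wells and consecutive time steps, which is exactly the spatiotemporal structure a flattened 800-vector discards.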
After training the three algorithms on the initial reservoir models, the trained models were applied to the observed dynamic data from the reference model to obtain the TI probabilities. The CNN produced the best estimate, assigning about 95% probability to TI 1, the TI used for the reference field. Using the probability of each TI, we regenerated 100 channel models to reduce the uncertainty in channel direction. Whereas the 100 models regenerated by the ANN failed to mimic the channel connectivity of the reference field, those regenerated by the SVM and CNN had permeability distributions similar to the reference field. These results demonstrate that trained machine learning algorithms can reduce uncertainty in the geological scenario by guiding the choice of a reasonable TI. In addition, the matrix of dynamic data was successfully applied to the CNN as image data. In future work, the regenerated models can serve as reliable prior models for history matching.

Data Availability
Data are available on request.

Additional Points
Highlights. (i) A classification model for training images (TIs) is developed using machine learning-based methods. (ii) The output of the trained model for observed production data suggests a proper TI. (iii) A production data matrix is constructed to apply a convolutional neural network (CNN). (iv) The CNN outperforms the support vector machine and artificial neural network by reducing uncertainty in facies distribution.

Conflicts of Interest
The authors declare no conflict of interest.