One method of generating synthetic data to assess the upper limit of machine learning algorithms performance

Abstract Based on statistics from the World Nuclear Association, Kazakhstan has the highest uranium production in the world. Most of the uranium in the country is mined via in-situ leaching, and accurate classification of lithologic composition from electric logging data is economically crucial for this type of mining. In general, this classification is done manually, which is both inefficient and error-prone. Information technology tools, such as predictive analytics with Supervised Machine Learning (SML) algorithms and Artificial Neural Network (ANN) models, are now widely used to automate geophysical processes, but little is known about their application to uranium mines. Previous experiments showed an ANN accuracy of about 60% in the task of lithological interpretation of logging data. To determine the upper limit of the accuracy of machine learning algorithms in such a task, and to assess the experts' influence indirectly, a digital borehole model was developed. This made it possible to generate a complete set of data free of subjective expert assessments. Using these data, the performance of various ML algorithms, from simple ones (kNN) to deep learning models (LSTM), was evaluated.


PUBLIC INTEREST STATEMENT
Kazakhstan accounts for 40% of the world's mined uranium, which is extracted by the method of in-situ leaching. The correct interpretation of geophysical data plays an important role in the process of uranium production. This process is quite complex and time consuming, and a correct interpretation usually requires a lot of experience in a particular field. To increase efficiency and accuracy, it is proposed to use machine learning algorithms that, using the experience of experts, could perform lithological classification automatically. In this work, we propose a method for generating "ideal" geophysical data for assessing the upper limit of the quality of classifiers and provide comparative results for evaluating classifiers of various types (artificial neural network, gradient boosting, support vector machines, long short-term memory) on real data. The use of machine learning systems for the interpretation of geophysical data will improve the economic efficiency and environmental friendliness of uranium mining in Kazakhstan.

Introduction
Since 2009, Kazakhstan has been the world leader in uranium mine production, generating over one-third of the total global uranium volume. Uranium production in Kazakhstan has grown 3.5 times in the last 7 years, demonstrating a strong and compelling need to apply novel ideas and technologies to uranium mining. Optimization of commercial uranium production processes, from ore mining to nuclear power, is a cornerstone requirement, for example, for KazAtomProm, an effective and sustainable nationwide organization.
Automation of production processes, e.g. Geophysical Data Interpretation for Boreholes (GDIB) in uranium mines, is just one of many ways this optimization can be achieved. Erroneous and inaccurate results from geophysical data analysis may lead to serious financial losses on different levels: from an overall decrease in active boreholes to unjustified labor costs and low production volume. GDIB processes can specifically be used for analyzing the lithological composition of the borehole, while predictive analytics with SML algorithms and ANN models can be used to automate and streamline the overall analytical process.
The success of data interpretation processes depends on properly prepared input data. Quality, content, and format of the input data directly affect the machine learning processes, and subsequently the final accuracy of classification. The accuracy of automated log data classification to a great extent depends on expert's manual assessment because it is used as input for training ML automatic classifiers. However, cross-comparison of assessments for core sampling provided by different experts shows significant discrepancies. Specifically, the difference between assessments grows higher with decreased value of geological samples (e.g. sandstones have more deviations in manual classification than claystones). Here, we aim to capture the nature of this phenomenon and to define the limits of automatic classification while applying ANN that are trained on expert assessment input data.
This paper is organized as follows: Part 1. Literature review, with delineation of known issues for ANN classification of log data from uranium mines; Part 2. Synthetic log data; Part 3. Algorithm for generating synthetic log data; Part 4. Brief description of the ML methods used; Part 5. Results with real and synthetic data; Part 6. Conclusion, with a summary of results and recommendations for the design of the next set of analytical experiments.
In Kazakhstan, uranium is mined via sub-surface in-situ leaching of boreholes, which is one of the low-cost and ecologically safe mining methods (Yashin, 2008). Moreover, the cost-efficiency of uranium ore mining depends on the speed and accuracy of geophysical data acquisition and interpretation. Most of the collected data are generated via electricity-based methods such as logging of apparent resistance (AR), spontaneous polarization potential (SP), and induction log (IL). Log results are usually presented as diagrams, which in turn are used by experts to manually assess the depth and value of the uranium deposits. In the course of data interpretation, an expert usually extracts information about bedding rock layers and performs lithological classification, describing the borehole throughout its depth. This manual processing of borehole data is inevitably slow and has low accuracy. From the uranium production point of view, the importance of accurate electrical log data from uranium boreholes is hard to overestimate. Ultimately, these data define the location for the long-term installation of an expensive filter that is required for uranium sub-surface mining via in-situ leaching.
Here, we used historically generated classifications from two minefields (~200 boreholes) to train our ANN model with ML algorithms.
In general, ANN models are capable of resolving poorly formalized tasks (Neurocomputers, 2004). In our case, however, this approach faces several challenges: (1) Inconsistency of experts' opinions in data assessment; (2) Requirement for an equal and large number of examples from each class of data; (3) Inability of the ANN to explain its outcome; (4) Requirement for thorough preparation of input data prior to analysis (e.g. outlier removal, normalization, data smoothing).
There are a number of publications focused on tasks and issues related to automatic interpretation of log data from uranium deposits. For example, results of analytical testing with ANN as an approach for log data classification can be found in (Muhamedyev, Kuchin, & Muhamedyeva, 2012; xxxx, 2011), while several ML methods and their comparative results are described in (Amirgaliev et al., 2014; Muhamedyev et al., 2015). There, it was shown that a feedforward neural network demonstrates much better classification quality than the k-nearest neighbor (k-NN) or support vector machine (SVM) algorithms. Furthermore, results from a combination of ML algorithms applied to a similar underlying task were reviewed in (Muhamediyev, Amirgaliev, Iskakov, Kuchin, & Muhamedyeva, 2014b; Muhamedyev, Iskakov, Gricenko, Yakunin, & Kuchin, 2014). This article raises the question of the upper limit of classification quality for various ML algorithms in relation to this problem.

Synthetic log data
Due to inherent issues with log data classification (e.g. limited information about actual distribution of lithotypes along the borehole axes), the accurate assessment of classification's quality for real geophysical data is not feasible. Core sampling is usually done for just a few boreholes with limited interpretation throughout the well's depth. That is why the system of ML classification relies on expert's assessment data. In other words, the system of ML classification is trained primarily on expert's opinion/assessment, assuming that it is accurate. This approach, however, leads to a value paradox where actual merit of such expert's interpretation has many inherent contradictions.
As mentioned above, our earlier experiments demonstrated that automatic classification with a feedforward neural network generally performs with ~60% accuracy.
The question arises whether this is a good result, and what upper limit of accuracy ML algorithms can be expected to reach in such a task. The answer will also allow an indirect assessment of the impact of the inconsistency of expert assessments.
To test the quality of ML system performance, one can generate a synthetic dataset equivalent to real log data in one or several parameters. These synthetic data would fit basic physical criteria for log instruments and have the properties of the ore from uranium deposits. Of course, such an evaluation is not exhaustive, yet it can demonstrate an algorithm's potential and limitations, and indirectly provide insight into the value of expert assessment. Additionally, the extent of contradictions in expert assessment can be elucidated simply by comparing the results of assessments performed by several independent experts.
The only true and reliable definition of lithologic composition along the borehole axis comes from actual coring (collection of ore samples from the well). However, even that does not ensure complete information, because the core recovery from sedimentary layers (characteristic of uranium deposits in Kazakhstan) in general does not exceed 80%. Moreover, additional errors are introduced when mapping the core samples to the borehole depth and during the core description itself. It is challenging to identify and account for all these errors. When reliable information about the lithologic composition of the borehole section is lacking, and both electrical log data and coring provide only approximate values, modeling the registered log signal for a predefined distribution of lithotypes with known physical properties along the borehole seems not just practical but also the most reliable option. This approach allows the generation of artificial but fully defined log values, unambiguously linked to specific lithotypes.
Physical properties of each type of lithotype (e.g. apparent resistance, AR) have specific range of values. Distribution of values within that range can be explored during geological expedition or in the laboratory. During geological expedition the coring is done, the laboratory analysis is performed, and physical property of each type of ore, including their characteristic AR values, is being registered (Methodical recommendations, 2003;Technical instruction, 2010). Examples of typical AR-values distribution are shown in Figure 1 (the diagram is a courtesy of GeoTechnoService organization). On the graph, the apparent resistance (ohm·m) is plotted along the X-axis, and the number of intervals having such resistance is plotted along the Y-axis. For example, for lithotype 3, the peak value indicates that at 310 intervals where this lithotype is present, the apparent resistance value is about 10 ohm·m. At the same time, number of intervals with AR value of more than 15 and less than 7 ohm·m is negligible. Explanation of lithotype codes is given in Appendix 1. (Each lithotype is color/number coded).
As you can see, there is a significant overlap between AR distributions for 13 lithotypes in GeoTechnoService study. The diagrams also suggest that a normal distribution curve within a given value range can be used for modeling in the first order of approximation. On the other hand, the values for spontaneous polarization potential (SP) are more complicated for simulation, because they depend on at least two external factors: properties of drilling solution and groundwater mineralization.
Once the thickness distribution for a lithotype layer is defined, AR values are chosen for each 10-cm sub-layer within that geological layer, which in turn allows an AR distribution curve to be built along the whole borehole.
To accurately model the registered AR curve, it is also important to take into account the parameters of the drilling instrument (probe type, distance between electrodes), the borehole diameter, and the properties of the drilling solution, among other factors. Thus, we develop an artificial AR log record that corresponds to a given lithotype composition, so the information about that composition is known with complete certainty. The same approach is used for modeling the SP distribution curve.
Synthetic AR and SP data can further be used for tuning and improving various ML algorithms. The best approximation of the model to actual conditions allows for increased accuracy of real data recognition and for outlining the upper limit of detection accuracy.
The process of synthetic data generation consists of three parts:
(1) Lithological types generation
(2) Physical properties values generation
(3) Measuring process simulation
First, a sequence of lithologic layers is generated based on the statistical distribution of lithotype occurrence (e.g. 20% of layers are clay, 50% are sand layers, etc.) and the typical layer thickness for the modeled horizon of the given minefield (e.g. 1-4 m for a sand layer, 0.2-2 m for clay, 0.2-0.5 m for sandstones, etc.).
There is also an option to set a static layer distribution, for example, to copy lithological layers of an existing borehole. This option was used further in the section to validate the generation algorithm.
Then, for each 10-cm increment of sub-layer inside each of the generated layers, an AR value is generated using statistical distribution from existing data. In current implementation, a Gaussian distribution is used, since AR and SP values distribution is close to Gaussian, as illustrated by Figure 1.
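The first two generation steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the lithotype table, its occurrence shares, thickness ranges, and Gaussian AR parameters are all hypothetical values, and the function names are introduced here only for the example.

```python
import numpy as np

# Hypothetical lithotype table: occurrence share, thickness range (m),
# and Gaussian AR parameters (ohm*m). All numbers are illustrative only.
LITHOTYPES = {
    "sand":      {"share": 0.5, "thickness_m": (1.0, 4.0), "ar_mean": 10.0, "ar_std": 1.5},
    "clay":      {"share": 0.3, "thickness_m": (0.2, 2.0), "ar_mean": 4.0,  "ar_std": 0.8},
    "sandstone": {"share": 0.2, "thickness_m": (0.2, 0.5), "ar_mean": 25.0, "ar_std": 4.0},
}
STEP_M = 0.1  # 10-cm depth increment


def generate_layers(n_samples, rng):
    """Step 1: draw (lithotype, number of 10-cm sub-layers) until the depth is filled."""
    names = list(LITHOTYPES)
    shares = np.array([LITHOTYPES[n]["share"] for n in names])
    shares = shares / shares.sum()
    layers, filled = [], 0
    while filled < n_samples:
        name = names[rng.choice(len(names), p=shares)]
        lo, hi = LITHOTYPES[name]["thickness_m"]
        n = max(1, int(round(rng.uniform(lo, hi) / STEP_M)))
        n = min(n, n_samples - filled)  # truncate the last layer at the bottom
        layers.append((name, n))
        filled += n
    return layers


def generate_ar(layers, rng):
    """Step 2: draw one Gaussian AR value per 10-cm sub-layer."""
    labels, values = [], []
    for name, n in layers:
        params = LITHOTYPES[name]
        labels.extend([name] * n)
        values.extend(rng.normal(params["ar_mean"], params["ar_std"], size=n))
    # Resistance cannot be negative, so the Gaussian tail is clipped at zero.
    return np.array(labels), np.clip(np.array(values), 0.0, None)


rng = np.random.default_rng(0)
layers = generate_layers(500, rng)          # 500 samples = 50 m of borehole
labels, true_ar = generate_ar(layers, rng)  # exact lithotype label for every sample
```

Because the labels are produced together with the values, every 10-cm sample carries a lithotype that is known exactly, which is the property the synthetic data are meant to provide.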
The last step is measuring process simulation. Physical properties of drilling solution and groundwater, borehole diameter, and drilling instrument parameters may affect log-registered values for AR and SP. In our model, we used parameters of drilling instrument as main factors affecting registered values.
The logging data are measured at 10-cm increments; however, the typical distance between electrodes is 1 m, which means that the measured value for each 10-cm interval is affected by the physical properties of the surrounding soil. The extent to which the measured value is affected by the surrounding soil is set by numerical coefficients, which are one of the parameters of the synthetic data generation process, along with the statistical distributions of lithotypes and physical properties. Results of AR-value generation by the drilling instrument model are shown in Figure 2 (the bottom plot shows generated AR values, while the top plot is the result of the measurement simulation). The figure illustrates two main factors that make classification based on measured physical values inherently difficult:
(1) The measured data are very smooth (averaged), which means that the actual bounds of each layer are difficult to distinguish.
(2) When thin layers with a high AR value are measured, the layer may not be thick enough to saturate the measuring instrument, so a much lower AR value is recorded. This problem can be partially solved by taking the curvature of the signal into account.
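One simple way to picture the measurement simulation is a weighted moving average over neighboring 10-cm intervals. The sketch below assumes this form; the coefficient values are hypothetical and are not the coefficients used in the model described above.

```python
import numpy as np


def simulate_measurement(true_values, coefficients):
    """Weighted moving average: each reading mixes the surrounding 10-cm intervals."""
    kernel = np.asarray(coefficients, dtype=float)
    kernel = kernel / kernel.sum()                  # preserve the overall signal level
    pad = len(kernel) // 2
    padded = np.pad(true_values, pad, mode="edge")  # repeat end values at the borehole ends
    return np.convolve(padded, kernel, mode="valid")


# Hypothetical symmetric coefficients spanning 11 samples (about a 1-m probe).
coefficients = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

# A thin (0.3 m) high-resistance layer inside a low-resistance background.
true_ar = np.full(60, 5.0)
true_ar[30:33] = 30.0
measured_ar = simulate_measurement(true_ar, coefficients)

print(true_ar.max(), round(float(measured_ar.max()), 2))
```

In this toy case the thin layer never reaches its true value in the simulated record and its boundaries are smeared over about a metre, which reproduces both difficulties listed above.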
To validate the model, we used some boreholes from our training set to model "measured" AR and SP values for a given distribution of lithotypes along the borehole. Results for one of the boreholes (Figure 3) show good agreement between modeled and actual log data. The fact that the modeled AR diagram is smoother than the actual data can be explained by the incomplete input of factors such as borehole diameter, measurement errors, and others. The modeled SP diagram is more expressive than the actual one; however, it is practically impossible to find a good-quality actual SP record, as it requires special preparation (e.g. solution, measurement conditions) that is hard to fulfill.
Unsupervised learning (UL) methods solve the problem of clustering, in which a set of previously unlabeled objects is divided into groups by an automatic procedure based on the properties of these objects (Barbakh, Ying, & Fyfe, 2009; Jain, Murty, & Flynn, 1999).
Supervised learning (SL) methods solve the problem of classification or regression. The task of classification arises when, in a potentially infinite set of objects, finite groups of objects are marked in some way. Usually, the groups are formed by an expert. The classification algorithm should, using this initial classification as a model, assign subsequent unlabeled objects to a particular group based on the properties of these objects.
In our case, since we have data labeled by experts, it is advisable to apply SL methods, which include k-Nearest-Neighbor (k-NN) (Altman, 1992), Logistic regression, Decision Tree (DT), Support Vector Classifier (Cortes & Vapnik, 1995), a large family of artificial neural networks (Zhang, 2000), and compositions of algorithms, for example, using boosting.
We shall briefly describe the listed algorithms and how to call them in programs. Note that the description of the logistic regression is for reference only. The logistic regression model is useful for understanding the SVM and ANN algorithms. Among the many ANNs, we have chosen to apply the "classic" feedforward neural networks in the form of a multilayer perceptron (MLP) and one of the varieties of deep learning networks (LSTM), which is successfully used to analyze time series.

Logistic regression
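The algorithm is used when it is necessary to separate objects of two classes, for example, "negative" and "positive". In this case, the hypothesis function is required to satisfy the condition $0 \le h_\theta(x) \le 1$, which is achieved using a sigmoidal (logistic) function

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}, \quad (1)$$

where $\theta$ is the parameter vector and $x$ is the vector of input variables.
After the hypothesis function has been selected, the parameters $\theta$ are chosen to minimize the cost function of the form

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2, \quad (2)$$

where $m$ is the number of training examples, $n$ is the number of input variables, $y^{(i)}$ is the class of the i-th object, and $\lambda$ is the regularization parameter that allows you to control the degree of generalization of the algorithm.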
For large λ, the algorithm will form more smoothed boundaries between classes.
The cost function is usually minimized with the gradient descent algorithm, but the conjugate gradient method (Møller, 1993), BFGS, and L-BFGS (Liu & Nocedal, 1989) (lbfgs below) are also used. Although the logistic regression algorithm was originally proposed for classification into two classes or in a one-vs-all scheme, the current version of this model allows classification of objects of several classes when the multi_class parameter is set. The main adjustable parameter C is the inverse of the regularization strength. The smaller it is, the stronger the regularization, that is, the smoother the boundary that separates the classes. The algorithm is imported from the sklearn.linear_model library:
    from sklearn.linear_model import LogisticRegression
A classifier is created using the following command:
    clf = LogisticRegression(C=1, solver="lbfgs")
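The effect of the parameter C can be illustrated on toy data. The sketch below uses synthetic one-dimensional readings whose means and spreads are purely illustrative; it is not taken from the experiments reported here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy two-class data standing in for low- and high-resistance readings (illustrative values).
X = np.concatenate([rng.normal(4.0, 1.0, 200), rng.normal(10.0, 1.5, 200)]).reshape(-1, 1)
y = np.array([0] * 200 + [1] * 200)

weak_reg = LogisticRegression(C=100.0, solver="lbfgs").fit(X, y)    # weak regularization
strong_reg = LogisticRegression(C=0.001, solver="lbfgs").fit(X, y)  # strong regularization

# A smaller C shrinks the weight, which gives a flatter sigmoid around the boundary.
print(abs(weak_reg.coef_[0, 0]), abs(strong_reg.coef_[0, 0]))
```

The fitted weight is smaller in magnitude for the small value of C, which corresponds to the smoother class boundary described above.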

kNN
The algorithm (Dudani, 1976; K-nearest neighbor algorithm, 2012) is based on counting the number of objects of each class in a sphere (hypersphere) centered on the object being classified. The object is assigned to the class whose objects are the most numerous in this region. If the weights of objects are not the same, then instead of counting the number of objects, we can sum up their weights. Thus, if in a sphere around an object being classified there are 10 standard objects of class A with weight 2 and 15 erroneous/border objects of class B with weight 1, then the object will be classified as class A, because the total weight of class A (10 × 2 = 20) exceeds that of class B (15 × 1 = 15).
The weights of objects in a sphere can be represented as inversely proportional to the distance to a recognizable object. Thus, the closer the object, the more significant it is for a given recognizable object. Depending on the modification of the algorithm, distances can be estimated using different metrics (DistanceMetric.html, 2019).
As a result, the classifier can be described as:

$$a(u; X^l) = \arg\max_{y \in Y} \sum_{i=1}^{l} \left[y_u^{(i)} = y\right] w(i, u), \quad (3)$$

where $w(i, u)$ is the weight of the i-th neighbor of the recognized object $u$, $y_u^{(i)}$ is the class of that neighbor, and $a(u; X^l)$ is the class of the object $u$ recognized from the sample $X^l$.
In general, it is one of the fastest, but not the most accurate, classification algorithms. The classifier is created by loading the corresponding library and calling its constructor.
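A minimal sketch of such a call, assuming the scikit-learn implementation (KNeighborsClassifier) with distance-based weights; the number of neighbors and the toy AR readings are illustrative choices, not the settings used in the experiments.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy AR readings (ohm*m) with illustrative lithotype codes 1 and 3.
X_train = np.array([[4.0], [4.5], [5.0], [9.5], [10.0], [10.5]])
y_train = np.array([1, 1, 1, 3, 3, 3])

# weights="distance" makes each neighbor's vote inversely proportional to its distance.
clf = KNeighborsClassifier(n_neighbors=3, weights="distance")
clf.fit(X_train, y_train)

pred = clf.predict(np.array([[4.8], [9.0]]))
print(pred)
```

The metric parameter of the same class selects the distance measure, which corresponds to the choice of metrics mentioned above.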

SVM
In the general case, support vectors are constructed so as to minimize a cost function analogous to the logistic regression cost function (2), in which the functions $S_1$ and $S_0$ replace $\log(h_\theta)$ and $\log(1 - h_\theta)$, and the input variables are replaced by the values of a kernel function $f_k$. Usually $S_1$ and $S_0$ are piecewise linear functions.
$f_k$ is a kernel function that determines the significance of the objects of the training set in the feature space. Often, a Gaussian function

$$f_k^{(i)}(x) = \exp\left(-\frac{\lVert x - x^{(i)} \rVert^2}{2\delta^2}\right)$$

is used, which for any $x$ allows us to estimate its proximity to the reference object $x^{(i)}$ and thus, by setting the value of $\delta$, to form the boundaries between classes closer to or farther from the reference object.
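The Gaussian kernel and its use in a classifier can be sketched as follows. The example assumes the scikit-learn SVC class, whose RBF kernel is written as exp(-gamma * ||x - x'||^2), so that gamma = 1 / (2 * delta^2); the value of delta and the toy data are illustrative.

```python
import numpy as np
from sklearn.svm import SVC


def gaussian_kernel(x, x_ref, delta):
    """Proximity of x to the reference object x_ref, between 0 and 1."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_ref, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * delta ** 2))


delta = 2.0
k_same = gaussian_kernel([1.0, 2.0], [1.0, 2.0], delta)  # identical points
k_far = gaussian_kernel([1.0, 2.0], [9.0, 2.0], delta)   # distant points

# The same kernel width expressed through scikit-learn's gamma parameter.
clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * delta ** 2), C=1.0)
X = np.array([[4.0], [4.5], [5.0], [9.5], [10.0], [10.5]])
y = np.array([0, 0, 0, 1, 1, 1])
clf.fit(X, y)
pred = clf.predict(np.array([[4.2], [10.2]]))
```

A smaller delta (larger gamma) makes the kernel fall off faster, so the boundary follows individual reference objects more closely.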

XGBoost
The XGBoost (Extreme Gradient Boosting) algorithm is based on the concept of gradient boosting introduced in Friedman (2001) and provides optimized distributed gradient boosting on decision trees (About XGBoost, 2019). The essence of boosting is that, after the optimal values of the regression coefficients have been calculated and the hypothesis function $h_\theta(x)$ has been obtained using some algorithm (a), an error is calculated and a new function $h_{b\theta}(x)$ is selected, possibly using another algorithm (b), so that it minimizes the previous error:

$$h_\theta(x^{(i)}) + h_{b\theta}(x^{(i)}) - y^{(i)} \to \min.$$

In other words, it is a matter of minimizing the function

$$J_b(\theta) = \sum_{i=1}^{m} L\left(y^{(i)},\; h_\theta(x^{(i)}) + h_{b\theta}(x^{(i)})\right),$$

where $L$ is the error function, taking into account the results of the algorithms (a) and (b). If $J_b(\theta)$ is still large, a third algorithm (c) is selected, and so on. Decision trees of relatively shallow depth are often used as algorithms (a), (b), (c), etc. To find the minimum of the function $J_b(\theta)$, the value of the gradient of the error function is used.
Considering that the function $J_b\left(\left(h_{b\theta}(x^{(i)})\right)_{i=1}^{m}\right)$ is minimized in the direction of the antigradient of the error function, the algorithm (b) is adjusted so that its target values are not $\left(y^{(i)}\right)_{i=1}^{m}$ but the antigradient $\left(-L'\left(y^{(i)}, h_\theta(x^{(i)})\right)\right)_{i=1}^{m}$; that is, when training the algorithm (b), the pairs $\left(x^{(i)}, -L'\left(y^{(i)}, h_\theta(x^{(i)})\right)\right)$ are used instead of the pairs $\left(x^{(i)}, y^{(i)}\right)$.
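The idea of training each next algorithm on the antigradient can be shown with a small hand-written loop. This is a sketch of plain gradient boosting for the squared error $L = \frac{1}{2}(y - h)^2$, for which the antigradient is simply the residual $y - h$; it is not the XGBoost library itself, and the learning rate, tree depth, and toy data are assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosting(X, y, n_rounds=20, learning_rate=0.5, max_depth=2):
    """Fit shallow trees one after another, each on the current antigradient."""
    base = float(np.mean(y))                 # algorithm (a): a constant prediction
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        antigradient = y - prediction        # -dL/dh for L = 0.5 * (y - h)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, antigradient)            # targets are the antigradient, not y
        prediction = prediction + learning_rate * tree.predict(X)
        trees.append(tree)
    return base, learning_rate, trees


def predict_boosting(model, X):
    base, learning_rate, trees = model
    return base + learning_rate * sum(tree.predict(X) for tree in trees)


rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

model = fit_boosting(X, y)
mse_base = float(np.mean((y - y.mean()) ** 2))
mse_boost = float(np.mean((y - predict_boosting(model, X)) ** 2))
print(mse_base, mse_boost)
```

Each added tree lowers the training error left by the previous ones, which is the mechanism described above for algorithms (a), (b), (c), and so on.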

MLP
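Multilayer artificial neural networks (multilayer perceptrons) are among the most popular classifiers, especially for the case of several classes. To adjust the weights $\Theta$ of the neural network (network training), a cost function resembling the logistic regression cost function is used:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k + (1 - y_k^{(i)})\log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2,$$

where $L$ is the number of layers of the neural network; $s_l$ is the number of neurons in layer $l$; $K$ is the number of classes (equal to the number of neurons in the output layer); and $\Theta$ is the weights matrix.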
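For reference, a cost of this kind can be evaluated directly. The sketch below assumes the usual regularized cross-entropy summed over the K output neurons, with an L2 penalty on all weights; the function name and the toy numbers are introduced only for the example.

```python
import numpy as np


def mlp_cost(h, y_onehot, weights, lam):
    """Cross-entropy over K outputs plus an L2 penalty on the weight matrices."""
    m = y_onehot.shape[0]
    h = np.clip(h, 1e-12, 1.0 - 1e-12)  # avoid log(0)
    data_term = -np.sum(y_onehot * np.log(h) + (1.0 - y_onehot) * np.log(1.0 - h)) / m
    reg_term = lam / (2.0 * m) * sum(np.sum(w ** 2) for w in weights)
    return data_term + reg_term


# One example, two classes, an undecided network output of 0.5 for each class.
h = np.array([[0.5, 0.5]])
y = np.array([[1.0, 0.0]])
weights = [np.array([[1.0, 2.0]])]  # bias weights are assumed to be excluded

cost_no_reg = mlp_cost(h, y, weights, lam=0.0)
cost_reg = mlp_cost(h, y, weights, lam=2.0)
print(cost_no_reg, cost_reg)
```

With these numbers the data term equals 2 ln 2 and the penalty adds 2 / (2 · 1) · (1² + 2²) = 5.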

LSTM
Long short-term memory (LSTM) is a kind of recurrent neural network that can retain values over short or long periods.
When an error is backpropagated through a large network, the error transmitted from the output to the input gradually decreases until it becomes vanishingly small. To reduce this effect, the concept of a constant error carousel (CEC) was introduced in (Hochreiter & Schmidhuber, 1997). The CEC property is achieved by using a linear unit with a fixed self-connection, which is the central element of the LSTM, around which the entire so-called memory cell is built. The j-th LSTM cell is denoted $c_j$. In addition to the "normal" input signal, it receives a signal from a multiplicative output element (the output gate, out gate) and a multiplicative input element (the input gate, in gate) with sigmoidal activation functions (later literature also adds a forget gate). Inside the cell, there are an input layer and a hidden layer with sigmoidal activation functions and a linear link that process the listed set of signals.
Out gate values:

$$y^{out_j}(t) = f_{out_j}\left(net_{out_j}(t)\right);$$

In gate values:

$$y^{in_j}(t) = f_{in_j}\left(net_{in_j}(t)\right),$$

where

$$net_{out_j}(t) = \sum_u w_{out_j u}\, y^u(t-1), \quad (9)$$

$$net_{in_j}(t) = \sum_u w_{in_j u}\, y^u(t-1).$$

The processing includes five main stages, indicated by numbers in Figure 4. The final stage (5) gives the output of the cell:

$$y^{c_j}(t) = y^{out_j}(t)\, h\left(s_{c_j}(t)\right).$$
Such an architecture makes it possible to "forget" the part of the information that is no longer needed at the new step t + 1 by adjusting the weights of the input gate, and to add a new piece of information by adjusting the weights of the output gate.
The LSTM network is constructed from a sequence of the described cells, so that the input signal is fed to the network inputs and input gates of all memory cells, and the cell outputs are fed to the inputs and input gates of the following memory cells (Figure 5) (Keras LSTM tutorial, 2019).
LSTM is effective in working with time series. The task of lithological classification can be considered as the task of working with a series of values, but not in time but in space (in depth).
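A single step of one memory cell in the original formulation (Hochreiter & Schmidhuber, 1997) can be sketched in NumPy. Several simplifications are assumptions of this example: tanh is used for both squashing functions g and h, the source vector consists only of the current input sample and the previous cell output, and the weights are arbitrary rather than trained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def memory_cell_step(y_prev, s_prev, w_in, w_out, w_c):
    """One time (depth) step of a single memory cell without a forget gate."""
    net_in = np.dot(w_in, y_prev)        # input gate pre-activation
    net_out = np.dot(w_out, y_prev)      # output gate pre-activation
    net_c = np.dot(w_c, y_prev)          # cell input pre-activation
    y_in = sigmoid(net_in)
    y_out = sigmoid(net_out)
    s = s_prev + y_in * np.tanh(net_c)   # CEC: the state is carried over with weight 1
    y_c = y_out * np.tanh(s)             # gated cell output
    return y_c, s


# With zero weights nothing is written into the cell, so the state is preserved.
zeros = np.zeros(2)
y_c0, s0 = memory_cell_step(np.array([0.3, 0.0]), 1.0, zeros, zeros, zeros)

# Run the cell along a short sequence of readings with arbitrary weights.
w_in, w_out, w_c = np.array([0.5, 0.1]), np.array([0.2, -0.3]), np.array([1.0, 0.4])
state, output, outputs = 0.0, 0.0, []
for x_t in [0.1, 0.8, 0.9, 0.2]:
    output, state = memory_cell_step(np.array([x_t, output]), state, w_in, w_out, w_c)
    outputs.append(float(output))
```

For lithological classification the loop index plays the role of depth rather than time, in line with the remark above.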

Results with real and synthetic data
After linear normalization, the real and simulated data were fed to the input of several classifiers (ANN, kNN, SVM, XGBoost, LSTM) in the form of a "floating window" (Figure 6).
The use of "floating data windows" is a common method of analyzing data sequences, for example, time series or, as in our case, the dependence of recorded physical parameters on depth. Since the expert takes the shape of the logging curve into account when evaluating the data, it is reasonable to feed the data to the input in the form of a floating window of n + 1 + n points. That is, n measurements above the current value, the current value, and n measurements below it are considered. The next window is formed in the same way, shifted one point down.
Since the logging probe length is 1 m, which corresponds to 10 depth measurements, it is natural to set n = 5. Presenting data in the form of a floating window allows, to some extent, to take into account the form of the curve, and not just the value of recorded parameter at a specific depth.
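The window construction can be sketched with NumPy as follows. The function name and its parameters are hypothetical; separate counts above and below the current point are allowed so that the same helper also covers windows that are not symmetric.

```python
import numpy as np


def floating_windows(values, n_above=5, n_below=5):
    """Return one row per depth point: n_above values, the current value, n_below values."""
    values = np.asarray(values, dtype=float)
    size = n_above + 1 + n_below
    if len(values) < size:
        return np.empty((0, size)), np.empty(0, dtype=int)
    windows = np.lib.stride_tricks.sliding_window_view(values, size).copy()
    centers = np.arange(n_above, len(values) - n_below)  # index of the current point
    return windows, centers


log_values = np.arange(20)                       # stand-in for one borehole's AR curve
X_sym, centers_sym = floating_windows(log_values)              # n = 5 on both sides
X_asym, centers_asym = floating_windows(log_values, n_above=8, n_below=2)
print(X_sym.shape, centers_sym[0], X_asym.shape, centers_asym[0])
```

Points closer than n samples to either end of the curve do not get a full window and are dropped, so windows have to be built separately for each borehole.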
Figure 6. Floating data window.
Kuchin et al., Cogent Engineering (2020), 7: 1718821 https://doi.org/10.1080/23311916.2020
For each of the 36 boreholes from the dataset, its "ideal" variant was simulated. The boreholes were divided into training and test sets in the ratio of 30/6. It should be noted that, because of the floating window, sklearn.model_selection.train_test_split and similar functions could not be used to divide the data correctly into training and test sets. The classes are not balanced; for example, one of the possible combinations (lithotype code: number of values) is 6: 4504, 4: 8177, 3: 21,746, 1: 15,653, 5: 251, 7: 4494, 9: 5. For this reason, the weighted F1 was chosen as the main quality metric (algorithm performance), and weighted precision (Prec), weighted recall (Recall), and weighted F1 are given in the tables to assess the performance of the algorithms.
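Splitting by whole boreholes and computing the weighted metrics can be sketched as follows. The helper function, the borehole numbering, and the toy label vectors are assumptions made for the example; only the scikit-learn metric functions are real library calls.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score


def split_boreholes(borehole_ids, test_ids):
    """Keep every borehole whole: all of its windows go to one side of the split."""
    test_set = set(test_ids)
    train = [b for b in borehole_ids if b not in test_set]
    test = [b for b in borehole_ids if b in test_set]
    return train, test


borehole_ids = list(range(36))
train_ids, test_ids = split_boreholes(borehole_ids, range(24, 30))  # one 30/6 split

# Toy expert labels and predictions (illustrative lithotype codes).
y_true = np.array([3, 3, 3, 3, 1, 1, 4, 4, 5, 3])
y_pred = np.array([3, 3, 3, 1, 1, 1, 4, 3, 3, 3])

# average="weighted" weights each class score by its number of true samples.
weighted_prec = precision_score(y_true, y_pred, average="weighted", zero_division=0)
weighted_recall = recall_score(y_true, y_pred, average="weighted", zero_division=0)
weighted_f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)
print(weighted_prec, weighted_recall, weighted_f1)
```

Splitting at the borehole level avoids placing overlapping windows from the same curve in both the training and the test set, which is the difficulty with a row-wise random split.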
Several models of the ANN, LSTM, and XGBoost classifiers have been tested (described in Appendix 2, https://drive.google.com/open?id=1eJk8XIeOfYoBCjvlKjbuBCAjWyDEHU8n). For each type of classifier, the model that showed on average the best result over all folds was selected. Since the classification result for ANN and LSTM depends on the initial weight initialization, training and evaluation were carried out five times in order to increase statistical reliability. That is, each model of these two types of classifiers was re-initialized, trained, and evaluated. The full results of the computational experiments are given in Appendix 3 (https://drive.google.com/open?id=1al9QqJGydtAwNzapWribV3rBPzkSPdvC). The tables below show the results for different partitions. It can be seen that the accuracy of the classification depends significantly on the splitting; in addition, even on the ideal data, the accuracy of the classifiers differs significantly. The results of the classification are shown in Table 1.
The same classifiers have been applied to real data.
On real data, the accuracy of all classifiers is significantly lower; it is also clear that ANN, LSTM, and XGBoost show the best results. From Table 2 it can be seen that the splitting in which wells 24-29 fall into the test set leads to the worst results. In this regard, the idea arose to try to replace them with "ideal" generated data. Since the results depend on dataset splitting, for subsequent experiments we fixed the splitting (0-6 test boreholes). The results are shown in Table 3 (ORIGINAL: source data; DATA_ID = True: synthetic data are marked; DATA_ID = False: synthetic data are not marked).
It can be seen that mixing real data with synthetic data without marking the latter leads to a drop in the accuracy of all classifiers. However, the use of labeled synthetic data slightly improves the accuracy of SVM. Next, an experiment was conducted in which the training set was formed from 30 real and 30 synthetic boreholes. The results are shown in Table 4.
It can be seen that the addition of synthetic data (assuming they are labeled) has no significant effect on the quality of classification. Additional experiments are needed to analyze synthetic data generation models.
Another parameter whose influence was investigated is the size of the floating window. Since the floating window cuts a part of the curve, not allowing it to be interpreted, certain restrictions are imposed on the size of the floating window, especially on its lower part. In this regard, an asymmetrical window was used, the size of n + 1 + m points where m was fixed, and n took values from 5 to 150. The results are shown in Table 5.
A certain increase in the indicators of the XGBoost and LSTM classifiers can be noted.
The boreholes located nearby often have a similar distribution of rocks in depth, which is often used by an expert interpreter to refine and correct the results of lithological interpretation across boreholes (usually geological cross-sections are used for this purpose). Since the coordinates were available for each of the boreholes, the simplest way to use the information about the mutual arrangement of the boreholes was to submit the coordinates as training parameters. The results are shown in Table 6.
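Submitting coordinates as additional training parameters can be as simple as appending two columns to every window of a borehole. The sketch below assumes this layout; the function name and the coordinate values are hypothetical.

```python
import numpy as np


def add_coordinates(windows, x_coord, y_coord):
    """Append the borehole's (x, y) position to each floating-window row."""
    coords = np.tile([x_coord, y_coord], (len(windows), 1))
    return np.hstack([windows, coords])


windows = np.zeros((4, 11))  # four windows of 5 + 1 + 5 points
features = add_coordinates(windows, x_coord=1250.0, y_coord=830.0)
print(features.shape)
```

Every row of one borehole receives the same two values, so the classifier can only use them to relate boreholes to each other; like the logging values, they would typically be normalized first.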
We can conclude that the use of coordinates allowed us to achieve a slight increase in indicators. This is especially noticeable on fold 24-29, which is much worse than average.

Conclusion
Accurate interpretation of electric log data is vital for uranium production needs, specifically for selecting filter installation location when using the method of uranium extraction via sub-surface in-situ leaching of boreholes. In the process of data interpretation, an expert identifies bedding layers of lithotypes and practically performs lithologic classification by describing borehole structure throughout its depth.
To exclude the subjectivity of expert assessments, a digital model of the borehole was developed, which made it possible to estimate the upper limit of the accuracy of various ML algorithms.
On simulated data, the best result was shown by XGBoost (F1 = 0.907), followed by LSTM (F1 = 0.904). On real data, the results are significantly lower. The best results were shown by XGBoost (F1 = 0.467) and ANN (F1 = 0.449). One of the reasons for this discrepancy is probably the subjectivity (inconsistency) of the expert assessments on which, for lack of objective data about the rocks, the classifiers were trained.
Adding simulated data (assuming they are labeled) has practically no effect on the quality of the classification. It can be assumed that improving the quality of the results is possible with further adaptation of the methods for synthesizing artificial logging data to the existing physical logging model.
In general, it can be concluded that ANN, XGBoost, and LSTM are promising for solving the problem of lithological interpretation, and that the inconsistency of expert assessments needs to be assessed and, if possible, taken into account.