At-bit estimation of rock density from real-time drilling data using deep learning with online calibration

We present a novel streaming learning approach that uses a deep neural network (DNN) to learn from data available during operation and to estimate at-bit density from drilling parameters. Since every wellbore is different, the relationship between drilling parameters and at-bit density varies. Equipment used, well trajectory, friction and bit wear are examples of conditions that affect this relationship and make a pre-trained model unable to represent an accurate input/output mapping applicable to all wells. However, delayed density log measurements make it possible to supervise continuous updates to the model during operation. The algorithm has been tested on drilling data from wells on a field operated by Equinor and compared to a standard deep learning approach; the results show that the streaming learning approach outperforms the traditional method. Statistical analyses have been performed to verify the statistical significance and effect size on the data sets. Data visualizations using t-distributed Stochastic Neighbor Embedding (t-SNE) indicate that the relationship between drilling parameters and density log indeed varies between wellbores, making generalizability an issue for a traditional supervised learning approach to this problem, and motivating a streaming learning approach. Using the proposed method, more accurate at-bit estimates can be made, providing preliminary indications ahead of the tool placed 20–30 m behind the bit, which, dependent on rate of penetration (ROP), will be


Introduction
The drilling operation is complex. It is also subject to significant uncertainty, which needs to be managed in order to ensure a safe and efficient drilling operation. One such source of uncertainty is at-bit lithology, which is a central part of the environment with which the drilling system interacts. Lithology evaluation is relevant for best-practice selection of suitable drilling parameters, evaluation of reservoir structure and evaluation of well placement, to mention a few. Logging while drilling (LWD) tools (Arps and Arps, 1964) are typically used by experts to classify lithology. However, they are placed some distance behind the bit, making at-bit lithology evaluation directly from LWD tools impossible. The density log is an LWD tool that can be used to separate harder lithologies such as stringers from softer lithologies. Although not a conclusive lithology indicator on its own, it provides valuable information on downhole conditions, and in combination with other LWD logs an accurate understanding of downhole lithology can be achieved. This tool is typically mounted 20-30 m behind the bit. Due to the placement of these tools, drilling parameters are the earliest indicators for changes in at-bit lithology, although they are [...] where the neutron porosity, delta-time shear, and array induction two-foot resistivity are estimated based on gamma ray and delta-time compressional. Results include comparisons between the LSTM and DNN, where they found the LSTM to be superior for their study.

Another study (Osarogiagbon et al., 2020) simulates missing data in a wellbore, and presents results for several machine learning algorithms attempting to estimate the gamma ray log from drilling parameters. Their results are compared to those of an LSTM and a bi-directional LSTM. Gowida et al. (2020) present experiments using both a DNN and an adaptive network-based fuzzy inference system (ANFIS) for estimation of the density log using drilling parameters as input. The models in this study are trained and tested on 2400 observations from the same well, where missing data is simulated for a section of the well, on which the trained models perform with high accuracy. If a well has available log data for only certain sections, these standard deep learning methods could be applied to fill in sections of missing log data. A novel Bayesian neural network approach, using neutron porosity, gamma ray, deep resistivity, photoelectric factor and density logs to estimate the sonic log, is presented in Feng et al. (2021b). Their results are comparable to those of traditional neural networks, and in addition offer quantification of uncertainty in the predictions.
Lithology classification based on LWD logs using data-driven methods has been presented in several previous publications. Examples of methods include scalable gradient boosted decision trees (Dev and Eden, 2019), support vector machines (SVMs) (Al-Anazi and Gates, 2010) and bidirectional gated recurrent units (GRUs) (Zeng et al., 2020; Tian et al., 2021). Evaluation studies (Xie et al., 2018) have also been presented.
Other downhole characteristics have also been addressed using machine learning. In Tunkiel et al. (2021), a streaming learning system for inclination prediction in directional drilling using a GRU is presented. This work is motivated by avoiding delayed corrective actions, thus improving well placement. It is also argued that standard machine learning approaches without online retraining fail to generalize well to different wellbores. In Feng et al. (2021a), a Bayesian approach is applied to fault detection using a convolutional neural network (CNN), allowing risk evaluation through uncertainty quantification.
The contribution of this work is two-fold. First, we present a novel streaming learning approach, using a DNN to solve the problem of estimating the at-bit density log from drilling parameters. A pre-trained model continuously learns from the data stream available during operation, allowing the model to adapt to case-specific conditions. The method is applied to data from several wellbores on a field operated by Equinor to demonstrate performance, and compared to the performance of a baseline model, which represents the traditional approach to log estimation. Next, an unsupervised learning data analysis is performed, which visualizes how data from different wellbores are structured differently. This analysis indicates the weakness in using a pre-trained and static model to estimate at-bit density using drilling parameters, motivating the streaming learning approach.
The paper is structured as follows: Section 2 presents the data used, along with the different methods and algorithms used in this study, including our own $n$-bin experience replay buffer. Section 3 presents results for data visualizations and at-bit density estimates versus measurements. Lastly, Section 4 offers conclusions and suggestions for further work.

Data
Data from the reservoir sections of 9 different wells on a homogeneous field operated by Equinor were gathered for this work, for a total of 1.75 million observations. The training set contained 5 wells, for a total of approximately 740 000 observations, and the validation set consisted of 1 well with approximately 365 000 observations. The test set contained 3 wells, with a total of 645 000 observations. Data cleaning was performed by visual inspection. Obviously erroneous measurements were removed by logic specifying reasonable values. Next, data was centered and scaled to have zero mean and unit variance before training commenced.

Measurements description
The drilling system is composed of several subsystems, where data acquisition is different for different subsystems. For the purpose of this work, we divide measurements into two groups: surface measurements and downhole measurements. Surface measurements are taken on the rig, and examples are drilling parameters like hook load (HKLD), surface torque (T), surface drill string rotation (RPM), flow (Q) and hook height. Weight on bit (WOB) is closely related to, and derived from, HKLD, while bit depth, hole depth and rate of penetration (ROP) are derived from hook height. In this work, downhole measurements refer to the density log. The density logging tool measures density of lithology perpendicular to the drill string by emitting gamma rays and detecting backscatter, giving a measure of average electron density in the lithology, which is strongly correlated with bulk density. In other words, it gives an indirect measurement of the bulk density. These measurements are communicated to the rig by mud pulse telemetry.
It is established (Bourgoyne and Young, 1974) that lithologies with different properties yield different bit-rock interactions, meaning that ROP is dependent on rock properties and input actuation like WOB, T and RPM. This means that information regarding at-bit conditions like density should be latently available in real time through these surface measurements. The density logging tool, on the other hand, is installed in the bottomhole assembly (BHA), 20-30 m behind the bit. Dependent on ROP, this amounts to significant time delays on logs, typically in the range of 20-120 min. To eliminate this delay, we propose to estimate the at-bit density from surface measurements.

Depth correction & resampling
Surface measurements are sampled with a sampling period of typically 2-3 s, while the density log is sampled at a lower rate, typically every 10-15 s. Also, surface measurements and downhole measurements recorded at a given time provide information at different depths. For these reasons, a method to correct for depth and differing sampling rates was required. This was done using a holding buffer that stored incomplete observations. As Fig. 1 illustrates, an observation can be completed once the distance between bit and tool has been drilled, and the label corresponding to some set of surface measurements is obtained. We start by denoting the sampling periods for surface measurements $x$, and the density log $y$, as $T_x$ and $T_y$, respectively. The $i$th surface measurements are taken at time $t_i$, so that $x_i = x(t_i)$. Similarly, the $j$th density log measurement is taken at time $t_j$, so that $y_j = y(t_j)$. With $d_b(t)$ denoting the bit depth at time $t$, $x_i$ provides information at the bit, at depth $d_{x,i} = d_b(t_i)$, while $y_j$, measured at a known distance behind the bit, $l_t$, provides information at depth $d_{y,j} = d_b(t_j) - l_t$. Once we obtain the first density log measurement $y_j$, where $d_{y,j} > d_{x,i}$ for any indices $i$ in the holding buffer, we can pair observations so that each such $x_i$ in the holding buffer is paired with the previous density log measurement, $y_{j-1}$, where $d_{y,j-1} \le d_{x,i}$. In addition to correcting for depth, this procedure is equivalent to resampling the $y$'s using the forward fill method.
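The pairing rule above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pair_by_depth`, the tuple layout and the assumption that depths never decrease over time are all illustrative choices made here.

```python
from bisect import bisect_right

def pair_by_depth(surface, density, tool_offset_m):
    """Pair surface samples with density labels by depth (forward fill).

    surface: list of (bit_depth_m, features), ordered by time.
    density: list of (bit_depth_when_measured_m, density), ordered by time.
    tool_offset_m: assumed known distance from bit to density sensor.
    Assumes depths are non-decreasing (no tripping out of hole).
    """
    # The tool senses the formation at bit depth minus the sensor offset.
    sensed = [(depth - tool_offset_m, rho) for depth, rho in density]
    sensed_depths = [depth for depth, _ in sensed]

    paired, holding = [], []
    for bit_depth, features in surface:
        # Last density sample sensed at or above this bit depth.
        k = bisect_right(sensed_depths, bit_depth) - 1
        # The label is final only once a deeper density sample has arrived.
        if k >= 0 and k + 1 < len(sensed_depths):
            paired.append((features, sensed[k][1]))
        else:
            # Either no usable label exists yet, or none has been confirmed.
            holding.append((bit_depth, features))
    return paired, holding
```

Surface samples that cannot be completed remain in `holding`, mirroring the holding buffer described in the text; in a live system they would be re-examined whenever a new density measurement arrives.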

Variable selection
The variables used as input to the DNN were selected based on domain knowledge and availability. To eliminate the density log delay, we limit ourselves to measurements providing at-bit information, which eliminates other LWD tools. In addition to using drilling parameters, we can perform some feature engineering. Mechanical specific energy (MSE) quantifies the amount of energy required to remove a unit volume of rock. It is a function of ROP, WOB, T and RPM, and known to be different for different lithologies (Dupriest and Koederitz, 2005). MSE is given by:

$$\mathrm{MSE} = \frac{\mathrm{WOB}}{A_b} + \frac{120\pi \cdot \mathrm{RPM} \cdot T}{A_b \cdot \mathrm{ROP}}, \quad (1)$$

where units are WOB (lb), T (lb-ft), ROP (ft/h), and $A_b$ is bit area (in.²).
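As a worked illustration of the MSE feature, the sketch below assumes the common oilfield-unit form of the equation used by Dupriest and Koederitz (2005), with an axial term WOB/A_b and a rotary term 120π·RPM·T/(A_b·ROP). The function name and argument names are hypothetical.

```python
import math

def mechanical_specific_energy(wob_lb, torque_lbft, rpm, rop_ft_per_h, bit_area_in2):
    """Return MSE in psi from WOB (lb), torque (lb-ft), RPM, ROP (ft/h), bit area (in.^2)."""
    if rop_ft_per_h <= 0 or bit_area_in2 <= 0:
        # MSE is undefined when the bit is not advancing.
        raise ValueError("ROP and bit area must be positive")
    axial = wob_lb / bit_area_in2
    rotary = 120.0 * math.pi * rpm * torque_lbft / (bit_area_in2 * rop_ft_per_h)
    return axial + rotary
```

Because ROP sits in the denominator, samples recorded while the bit is off bottom or stationary have to be filtered out before the feature is computed; the guard above makes that explicit.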
Another parameter of interest is the hydro-mechanical specific energy (HMSE), which also accounts for the weakening of the rock ahead of the bit due to flow. HMSE is given by:

$$\mathrm{HMSE} = \frac{\mathrm{WOB}}{A_b} + \frac{120\pi \cdot \mathrm{RPM} \cdot T + 1154\,\eta\,\Delta p_b\,Q}{A_b \cdot \mathrm{ROP}}, \quad (2)$$

where $\eta$ is the hydraulic energy reduction factor, $\Delta p_b$ is the bit pressure drop at the nozzle (psi), and Q is the flow rate (gpm). Some parameters related to the bit and mud were not available to compute the hydraulic contribution of HMSE (Osarogiagbon et al., 2020). Default parameter values were set for these, as described in Table 1.
From the default parameters, we can compute HMSE, and further define the input vector of predictors for the DNN, $x$, as:

$$x = [\mathrm{WOB}, \mathrm{RPM}, T, \mathrm{ROP}, \mathrm{MSE}, \mathrm{HMSE}]^{\top}. \quad (3)$$

The output of the DNN, $\hat{y}$, is simply at-bit density. As a sanity check, we wish to investigate the correlations between the density log and the inputs to the model, which are illustrated in Fig. 2. The correlations are presented separately for each wellbore, and it can be seen that the strength of the linear relationships varies between wellbores. Correlations between density and ROP are the most consistent. MSE and HMSE are also quite strongly correlated with density for most wellbores, although the strength varies more. Especially for training wells 3 and 5, these relationships are weaker. RPM, WOB and T correlations vary more, although for some wells, these can be seen to be strongly correlated with density. Several of the drilling parameters are controlled by the driller. Autodrillers can be set to maintain constant WOB or ROP.
When the driller suspects that a stringer is being drilled, the WOB is routinely increased while RPM is reduced. This will typically lead to a reduction in T as well. For training wells 2-5, the correlations for RPM, WOB and T in Fig. 2 support this. For training well 1, however, the signs of the correlations for these parameters are inverted. For the validation well, the sign of the correlation for T is inverted. Since these parameters are controlled by the driller, correlations with density will be highly affected by the driller's choices. As an example, if the drilling strategy is to increase both WOB and RPM for stringers, the resulting ROP might increase as well. In isolation, increased ROP for stringers might seem counterintuitive, but would be explained by the overall drilling strategy and the actions performed by the driller. MSE and HMSE, on the other hand, eliminate the driller's actions, and define the input energy required to drill through the rock. The correlations between density and these parameters are consistently positive, indicating that more energy is required to drill denser lithology.

Deep neural networks
The specifics of the DNN used in this work are outlined in this section. First, we provide notation for the DNN:

$W^{[l]}$: trainable weight matrix for layer $l$
$b^{[l]}$: trainable bias vector for layer $l$

For a given layer $l$, $Z^{[l]}$ denotes the linear combination of activations from the previous layer, $A^{[l-1]}$, determined by the trainable weights $W^{[l]}$ and biases $b^{[l]}$:

$$Z^{[l]} = W^{[l]\top} A^{[l-1]} + b^{[l]}\mathbf{1}, \quad (8)$$

where $\mathbf{1}$ is a row vector of ones. Eq. (8) describes the linear component of the forward propagation. Next, $Z^{[l]}$ is passed through the layer activation function $g^{[l]}$, which is the nonlinear component of the forward propagation:

$$A^{[l]} = g^{[l]}(Z^{[l]}). \quad (9)$$

The leaky ReLU activation function is utilized for every hidden layer, so that

$$A^{[l]} = \max\{Z^{[l]}, \gamma Z^{[l]}\}, \quad l = 1, \dots, L-1, \quad (10)$$

where $\gamma$ is the slope in the left half plane. Note that $A^{[0]} = X$. Finally, the DNN outputs the estimated density at-bit, which is a quantitative output, resulting in a regression layer:

$$\hat{y} = A^{[L]} = Z^{[L]}. \quad (11)$$

Eqs. (8)-(11) describe the forward propagation of the DNN. The weight initialization for layer $l$ is then given by:

$$W^{[l]} \sim \mathcal{N}_{\mathrm{trunc}}(\mu, \sigma^{2}).$$

The biases, $b^{[l]}$, are initialized as zeros. The optimizer used was Adam optimization (Kingma and Ba, 2015), which adaptively estimates appropriate momentum for the gradient updates. The Adam optimization algorithm is given by Algorithm 1.
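The forward propagation described in this section can be sketched in plain Python for a single observation. The layer sizes follow the 3 hidden layers of 12 neurons reported for the baseline model; the six inputs, the slope of 0.01, the standard He normal scale of sqrt(2/n_in) and the truncation at two standard deviations are assumptions made for this illustration, as are all function names.

```python
import math
import random

def he_normal(n_in, n_out, rng):
    """Draw an n_in x n_out weight matrix from N(0, 2/n_in), truncated at 2 sigma."""
    sigma = math.sqrt(2.0 / n_in)

    def draw():
        w = rng.gauss(0.0, sigma)
        while abs(w) > 2.0 * sigma:  # resample values outside the truncation range
            w = rng.gauss(0.0, sigma)
        return w

    return [[draw() for _ in range(n_out)] for _ in range(n_in)]

def leaky_relu(z, slope=0.01):
    """Identity for positive z, small linear slope for negative z."""
    return max(z, slope * z)

def forward(x, weights, biases, slope=0.01):
    """Propagate one input vector through the network; the last layer is linear."""
    a = list(x)
    last = len(weights) - 1
    for layer, (w, b) in enumerate(zip(weights, biases)):
        # Linear component: weighted sum of previous activations plus bias.
        z = [sum(a[i] * w[i][j] for i in range(len(a))) + b[j] for j in range(len(b))]
        # Nonlinear component on hidden layers only (regression output).
        a = z if layer == last else [leaky_relu(v, slope) for v in z]
    return a[0]

rng = random.Random(0)
sizes = [6, 12, 12, 12, 1]  # assumed 6 inputs, 3 hidden layers of 12, 1 output
weights = [he_normal(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
biases = [[0.0] * n_out for n_out in sizes[1:]]
estimate = forward([0.0] * 6, weights, biases)
```

In practice a deep learning framework would handle batching, gradients and the Adam updates; the sketch only makes the layer-by-layer computation concrete.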

n-bin prioritized experience replay
In conventional supervised learning, a model is typically trained on a fixed data set for multiple passes, aiming to converge towards a well-performing model for unseen data assumed to come from the same distribution. However, this simple assumption breaks down in many cases due to a variety of factors, such as shift in the independent or dependent variables, or due to an evolving underlying process. This phenomenon is commonly known as nonstationarity or concept drift, and is harmful to the predictive power of such models (Ditzler et al., 2015; Elwell and Polikar, 2011). The aim of streaming learning is to continuously update the model to correct for these effects. However, a common problem in streaming learning is catastrophic forgetting (McCloskey and Cohen, 1989), where old representations are forgotten due to adaptation to the non-stationary environment. In our attempt to adapt to drifting concepts while mitigating catastrophic forgetting, an $n$-bin prioritized experience replay buffer $\mathcal{B} \in \mathbb{R}^{n \times k}$ was developed. The prediction space $y \in [y_{\min}, y_{\max}]$ is divided into $n$ bins, and each bin in the buffer contains $k$ observations. This configuration ensures that the replay buffer always contains observations covering the span of the prediction space, and thus that the model is better equipped to give accurate estimates overall, and not be biased by the distribution of the latest available observations. A similar configuration has been used for multi-class classification, with one buffer for each class (Hayes et al., 2019). When learning from a data stream, an observation $(x_i, y_i)$ is allocated to a bin in the experience replay buffer based on the value of $y_i$, and consequently, the oldest observation of that bin is discarded. The mini-batch used for backpropagation is sampled from the experience replay by prioritization using the softmax function. At every update of the model, the observations are given a probability of being sampled for the next update by:

$$P(i) = \frac{\exp(e_i)}{\sum_{j=1}^{N} \exp(e_j)},$$

where $e_i$ is the model error on observation $i$ and $N = nk$ is the total number of observations in the buffer.
It can be seen that a higher model error on an observation results in a higher probability of being sampled for the next training step. This method is well known in the reinforcement learning field, and has been shown to be an improvement over sampling observations from a uniform distribution (Schaul et al., 2016), due to added focus on areas where the model performs poorly. The $n$-bin prioritized experience replay algorithm is given formally as Algorithm 2.
Retention of observations from the entire range of the prediction space in this fashion allows smaller replay buffers. Rather than retaining all previous observations for further training, a small subset is kept, making this method memory efficient. In addition, dividing the prediction space into bins ensures that historical observations in one bin are available as long as no new observations stream into that bin. Thus, the algorithm can take into account older representations while making updates during operation.
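A minimal sketch of such a buffer is given below, under several assumptions: the class name `BinnedReplayBuffer`, the use of absolute error as the priority, and sampling with replacement are choices made here for illustration and are not taken from the authors' algorithm listing.

```python
import math
import random
from collections import deque

class BinnedReplayBuffer:
    """Replay buffer with one fixed-size FIFO bin per range of the target value."""

    def __init__(self, bin_limits, bin_size):
        self.bin_limits = sorted(bin_limits)
        # n bins require n - 1 limits; each deque drops its oldest entry when full.
        self.bins = [deque(maxlen=bin_size) for _ in range(len(self.bin_limits) + 1)]

    def add(self, x, y):
        """Store the observation in the bin selected by its label y."""
        index = sum(y > limit for limit in self.bin_limits)
        self.bins[index].append((x, y))

    def sample(self, predict, batch_size, rng=random):
        """Draw a mini-batch with softmax probabilities over absolute model errors."""
        items = [obs for bin_ in self.bins for obs in bin_]
        if not items:
            return []
        errors = [abs(y - predict(x)) for x, y in items]
        peak = max(errors)  # subtracting the maximum keeps exp() numerically stable
        weights = [math.exp(e - peak) for e in errors]
        # choices() normalizes the weights and samples with replacement.
        return rng.choices(items, weights=weights, k=batch_size)
```

With three bins, two limits on the density log separate low, mid and high observations, so a long run of high-density samples can only overwrite the high bin and leaves the other two intact.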

Streaming learning system
The streaming learning system is based on a pre-trained and validated model, which is loaded as the baseline model, on which to iteratively perform updates during operation. As surface measurements become available during operation, they are first fed to the DNN to estimate at-bit density. Subsequently, they are stored in the holding buffer until their corresponding label, the density log measurement, is available. At this point, the completed observation is moved from the holding buffer to the experience replay buffer, and then used for further training. Note that the learning phase begins at time $t$ when the depth of the density log measurement reaches the depth of the first available surface measurements, $d_{y,j} \ge d_{x,0}$, meaning that we are not interested in the density log measurements above the first available drilling parameter measurements. Algorithm 3 summarizes the method.
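The predict, hold, complete and update cycle can be sketched as follows. This is a deliberately simplified illustration and not the authors' Algorithm 3: a one-variable linear model stands in for the pre-trained DNN, a fixed step delay stands in for the depth-based label delay, and a plain list of recent observations stands in for the n-bin replay buffer. All names are hypothetical.

```python
class LinearModel:
    """Stand-in for the pre-trained DNN: y = w * x + b, updated by SGD."""

    def __init__(self, w=0.0, b=0.0, lr=0.05):
        self.w, self.b, self.lr = w, b, lr

    def predict(self, x):
        return self.w * x + self.b

    def update(self, batch):
        for x, y in batch:
            err = self.predict(x) - y
            self.w -= self.lr * err * x
            self.b -= self.lr * err

def stream(model, samples, delay):
    """Run the online loop; the label of each sample arrives `delay` steps late."""
    holding, replay, estimates = [], [], []
    for step, (x, y) in enumerate(samples):
        # 1. Estimate at the bit before any label for this sample exists.
        estimates.append(model.predict(x))
        # 2. Park the incomplete observation until its label arrives.
        holding.append((x, y))
        # 3. Once the delay has passed, the oldest held observation is complete.
        if step >= delay:
            replay.append(holding.pop(0))
            # 4. Retrain on the most recently completed observations.
            model.update(replay[-8:])
    return estimates
```

The estimates returned are the ones that would have been shown to the driller in real time, so their error reflects what the model knew at each step rather than its final state.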

Pre-training and validation
Pre-training and validation of the baseline model was an iterative process. Hyperparameters such as DNN architecture (neurons and layers), learning rate, mini-batch size and number of epochs were tuned in an informal search based on validation set performance. Once a satisfactory model was found, the streaming learning hyperparameters were tuned on the same data set. Table 2 shows the hyperparameter settings. The resulting baseline model from pre-training and validation had 3 hidden layers, each with 12 neurons. Streaming learning hyperparameters refer to the settings for streaming learning during operation. It can be seen that here, the learning rate and the mini-batches sampled from the experience replay are smaller than during pre-training. Experience replay bin limits refer to the limits on the density log used in Algorithm 2 to select the appropriate bin. The density log for the data used in this study was in the range 2.0-2.7 g/cm³, so that observations below the first bin limit belonged to the low bin, observations between the two limits belonged to the mid bin, and lastly, observations above the second bin limit belonged to the high bin. The hyperparameter values presented in Table 2 should make decent initial values for similar problems. However, note that the experience replay bin limits in particular will be very case-dependent. The bin limits ensure retention of observations in different parts of the prediction space, making an understanding of the dependent variable key. We suggest consideration of important areas in the prediction space when tuning these parameters. If several areas are important, the number of bins could also be increased.

Streaming learning results
This subsection is dedicated to the presentation and evaluation of the performance of the streaming learning approach compared to the baseline model. We provide results for the validation well, along with the 3 test wells. Along with each plot, the mean absolute error (MAE) is presented. MAE is given by:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|,$$

where $n$ is the number of observations. Although the raw data used for the algorithm was time data, the results are converted into depth data with equidistant points at a resolution of 1 m by downsampling. At every integer depth, data points within 0.5 m are averaged. Fig. 4 shows the comparison between the baseline and streaming learning approach on the validation well. For the baseline model, one can see that the model suffers from bias on low-density observations throughout the wellbore, and that the model gradually overestimates density from 6000 m and towards the end. An inspection of the raw data revealed that for this wellbore, the torque gradually increased, which indicates concept drift. The streaming learning approach can be seen to mitigate both the bias and the drift, resulting in an MAE decrease from 0.1087 g/cm³ to 0.0615 g/cm³. This is a relative decrease of approximately 43%.
Figs. 5 and 6 show the same comparisons for test wells 1 and 2. On these wellbores, the baseline model performs quite well. Still, on test set 1 at approximately 5000-5500 m depth, the baseline model visibly deviates from the measurements. On test set 2, the baseline model overestimates low-density observations. For test wells 1 and 2, the streaming learning approach reduces these errors, resulting in 21% and 7% decreases in MAE, respectively. For test well 2, the resulting improvement from the streaming learning algorithm is incremental, and the added complexity of implementing the method compared to its baseline counterpart may not be worthwhile. For wellbores with similar input/output relationships to those of the training wells, the baseline model will perform well without the need for continuous updates. However, in many cases it is not known in advance whether or not this will be the case. Such considerations should be a part of the validation process. If it is found that a static model performs well during validation, a traditional supervised learning approach might be sufficient. If, however, heterogeneity between wellbores is found during this process, a streaming learning approach might be warranted to correct for drifts and shifts in concepts.
Lastly, Fig. 7 shows the results for test well 3. It can be seen that the baseline model is very flat, and unable to capture any trends in the at-bit density. This indicates that the mapping from drilling parameters to at-bit density is significantly different from that of the wellbores in the training set. We can call this a shift in concepts. Although the streaming learning approach is not as good as for the previous wells, it is to a much larger degree able to capture the trends by adapting to the concept shift, resulting in an MAE decrease of 36%. Even though the maximum recorded density for these wellbores is 2.7 g/cm³, it can be seen that the online model estimates densities above this value at depths 5030 m and 5110 m. At depth 5030 m, an unusually low value for RPM was measured, along with a high WOB. At 5110 m, an unusually high MSE was recorded, indicating an unusual combination of drilling parameters. As neural networks do not extrapolate well, these events result in erroneously high density estimates. The flattening effect apparent for this wellbore can also to some extent be observed in the other wellbores. In the other wellbores this takes the form of a cutoff effect, so that low-density observations are not estimated well by the baseline model. This is likely due to heterogeneity between the training wellbores. Since the input/output relationships in the wells differ, the baseline model is trained to fit an "average" of these, resulting in a compressive effect on the predictions.
The rate of adaptation to newly available data can be seen to differ for the different data sets. For the validation, test 1 and test 2 sets, we can see that the baseline model overestimates the low-density observations in the beginning of the drilling operation. For the streaming learning approach, we can see that this is quickly mitigated by the online retraining. Also for test set 3, the performance is improved, although the corrections are not as fast. From the poor baseline model performance, we could argue that this wellbore is the most different from the training sets, and that the speed of the online retraining is dependent on how much the model must be corrected to fit well to the newly available data. The learning rate of the system determines the size of the gradient descent updates; however, setting this hyperparameter too high can lead to unstable optimization. Thus, the online retraining rate must be a trade-off between speed and stability.
From the absolute errors on the depth-converted data sets, we can investigate the statistical significance of the difference in performance between the two approaches, along with effect size and statistical power. We can perform paired, two-tailed t-tests for each wellbore to determine statistical significance. The null hypothesis is $H_0: \mu_1 = \mu_2$, where $\mu_1$ is the population mean absolute error for the baseline model, and $\mu_2$ is the population mean absolute error for the streaming learning approach. We also determine Cohen's $d$, which is a measure of standardized effect size. The proposed interpretation of $d$ is along a continuum, with conventional small, medium and large effect sizes at approximately 0.2, 0.5 and 0.8, respectively. Combined with a statistical significance level $\alpha_0 = 0.05$, which is the accepted probability of rejecting the null hypothesis given that it is actually correct (type I error), we can find the statistical power, $1 - \beta$, where $\beta$ quantifies the probability of failing to reject a false null hypothesis (type II error) (Portney, 2020). In Table 3 we present $n$, the number of observations in each depth-converted data set, $\mathrm{MAE}_b$, the mean absolute error using the baseline model, $\mathrm{MAE}_s$, the mean absolute error using the streaming learning approach, $\Delta\mathrm{MAE} = \mathrm{MAE}_b - \mathrm{MAE}_s$, Cohen's $d$, $p$-values and statistical power, $1 - \beta$. We can see from the $p$-values that the difference in performance is statistically significant for all wellbores. Thus, we reject $H_0$. From Cohen's $d$, we see that the effect sizes on the validation and test 3 wellbores are medium to large. The effect size for the test 1 wellbore is medium, and small for the test 2 wellbore. For all wellbores, the probability of committing a type II error, $\beta$, is ≈ 0.
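The quantities behind such a comparison can be computed from the two series of absolute errors as sketched below. The sketch assumes Cohen's d is taken as the mean of the paired differences divided by their standard deviation, which is one common convention for paired data and may differ from the variant used for Table 3; the p-value and power are omitted because they require the t-distribution, which is not in the Python standard library. Function names are illustrative.

```python
import math
from statistics import mean, stdev

def mae(measured, estimated):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(y - y_hat) for y, y_hat in zip(measured, estimated))

def paired_summary(errors_baseline, errors_streaming):
    """Paired t statistic and effect size for two series of absolute errors."""
    diffs = [b - s for b, s in zip(errors_baseline, errors_streaming)]
    n = len(diffs)
    d_mean, d_sd = mean(diffs), stdev(diffs)  # stdev uses the n - 1 denominator
    return {
        "n": n,
        "delta_mae": d_mean,                   # positive when streaming is better
        "t": d_mean / (d_sd / math.sqrt(n)),   # compare against t with n - 1 d.o.f.
        "cohens_d": d_mean / d_sd,             # assumed paired-difference convention
    }
```

With thousands of depth points per wellbore, even a small effect size yields a very large t statistic, which is why significance and effect size are reported side by side.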

Data visualization with t-SNE
Data visualizations with t-SNE in two dimensions are provided to complement the results in Figs. 4-7 (see Appendix for a brief summary of the t-SNE method). From these plots, we have observed several different scenarios: drift, minor offsets and significant concept shift. We wish to visualize the data to better understand these effects. In the t-SNE analyses, a set of data points in 7 dimensions, taken as:

$$[\mathrm{WOB}, \mathrm{RPM}, T, \mathrm{ROP}, \mathrm{MSE}, \mathrm{HMSE}, y],$$

with $y$ the density log, is reduced to the set of data points $[z_1, z_2]$ in two dimensions. Using this setup, we can identify the structures of the data in the wellbores, for example whether similar drilling parameters result in similar or different density log measurements for different wellbores. Fig. 8 illustrates the two-dimensional mapping for 3 random subsets from the same training well, each containing one third of the observations. As expected, these subsets occupy similar spaces in the plot. This indicates that a model trained on one of these subsets could perform well on the other two. Fig. 9 illustrates the same analysis on training well 1, and test sets 1 and 2. The baseline model was found to perform quite well for these test sets, and from this t-SNE analysis, we can see that the observations from these wellbores indeed overlap reasonably well with the training well in the two-dimensional mapping. Lastly, we inspect Fig. 10, which shows the two-dimensional mapping for training well 1, along with the validation well and test well 3. The validation well and test well 3 exhibited concept drift and shift respectively, and the baseline model performed poorly on them. From the plot, we can observe natural clustering within each wellbore, which might be attributed to the fact that low-density and high-density observations should be different. However, it can be seen that several clusters from the validation well and test well 3 are isolated from each other and from the training well.
As they form separate clusters, this indicates that all 3 wellbores belong to their own natural grouping. Because of this, no baseline model, regardless of model type and architecture, can be trained on this training well and be expected to generalize well to the others. In other words, streaming learning is essential for obtaining acceptable performance in this case.

Conclusions & further work
Since the density log is typically mounted 20-30 m behind the bit, the driller is rendered blind to at-bit conditions. In the current work, a DNN is used to estimate at-bit density from drilling parameters to eliminate this delay. The DNN is pre-trained on historical data from wells on a field operated by Equinor, serving as a baseline model. Using delayed density log measurements, the model is continuously updated during operation. This streaming learning approach allows adjustments to changing conditions that are not explicitly included in the model as variables. Comparisons of the results for the baseline model and the streaming learning approach indeed show that performance in terms of mean absolute error can be greatly improved using a streaming learning approach. This is especially true for wellbores where the relationships between drilling parameters and density log are significantly different from what the model has seen before during training. The method gives preliminary at-bit density log estimates that are available in real time, while adapting to change, thus increasing generalizability so that the model is applicable to a wider range of cases. t-SNE is used to visualize the data from different wellbores and shows that the data sets are structurally different. This indicates that a pre-trained model, regardless of model architecture, will be unable to generalize to

Appendix. t-distributed Stochastic Neighbor Embedding
t-distributed Stochastic Neighbor Embedding (t-SNE) (Van der Maaten and Hinton, 2008) falls within the category of unsupervised learning algorithms. It is typically used for visualization of high-dimensional data by dimensionality reduction. It is a nonlinear method capable of preserving the local structure of high-dimensional data while revealing global structures such as clusters. When converting the high-dimensional data $\mathcal{X} = \{x_1, x_2, \dots, x_n\}$ to a low-dimensional mapping $\mathcal{Y} = \{y_1, y_2, \dots, y_n\}$, t-SNE starts by converting the high-dimensional Euclidean distances between data points to similarities $p_{j|i}$, quantifying the conditional probability that $x_i$ would pick $x_j$ as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at $x_i$. This similarity is given as:

$$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}.$$

From these, the joint probabilities are defined to be symmetrized conditional probabilities, that is:

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n},$$

where $\sigma_i^2$ is the variance of the Gaussian centered at $x_i$. This parameter can be indirectly tuned by the user through the perplexity hyperparameter. For a user-specified perplexity, t-SNE performs a binary search for the value of $\sigma_i$ that produces a probability distribution $P_i$ over all the other data points with the same perplexity $\mathrm{Perp}(P_i)$. This is defined as:

$$\mathrm{Perp}(P_i) = 2^{H(P_i)},$$

where $H(P_i)$ is the Shannon entropy measured in bits:

$$H(P_i) = -\sum_{j} p_{j|i} \log_2 p_{j|i}.$$

Perplexity can be viewed as a smoothing measure for the number of effective neighbors, and t-SNE is fairly robust to changes in this parameter.
For the low-dimensional mappings, the similarities $q_{ij}$ are computed using a Student t-distribution with one degree of freedom, resulting in:

$$q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}}.$$

Note that $p_{ii}$ and $q_{ii}$ are set to 0 since t-SNE is only interested in modeling pairwise similarities. Next, t-SNE minimizes the Kullback-Leibler divergence between the two probability distributions $P$ and $Q$ through gradient descent. The Kullback-Leibler divergence is given by:

$$C = \mathrm{KL}(P \,\|\, Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}},$$

from which the gradient w.r.t. the low-dimensional map can be found to be:

$$\frac{\partial C}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \|y_i - y_j\|^2\right)^{-1},$$

which can be used to update the low-dimensional mapping $\mathcal{Y}$ from an initial value.
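To make the role of the Gaussian width and the perplexity concrete, the following sketch computes the conditional similarities of one point and the resulting perplexity for a fixed width. It assumes the standard definitions from Van der Maaten and Hinton (2008); the function names are illustrative, and the binary search over the width that a full t-SNE implementation performs is left out.

```python
import math

def conditional_similarities(points, i, sigma):
    """Return p_{j|i} for all j under a Gaussian of width sigma centred on point i."""
    xi = points[i]
    weights = []
    for j, xj in enumerate(points):
        if j == i:
            weights.append(0.0)  # a point is never its own neighbour
            continue
        sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)  # can underflow to zero if sigma is far too small
    return [w / total for w in weights]

def perplexity(probabilities):
    """Two to the power of the Shannon entropy (in bits) of the distribution."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0.0)
    return 2.0 ** entropy
```

For a point with two close neighbours and one distant one, the perplexity comes out near 2, matching the reading of perplexity as an effective number of neighbours; widening the Gaussian pushes it toward the number of other points.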

Fig. 1. Schematics illustrating the drilling operation and the availability of measurements.
Eqs. (8)-(11) describe the forward propagation of the DNN. The model can be visualized by a layered model with connected nodes, as shown in Fig. 3. The weight initialization is He normal (He et al., 2015), which mitigates exploding and vanishing gradients by managing the variance of the activations throughout the layers of the network. This is done by drawing the weights in each layer from a truncated normal distribution with mean $\mu = 0$ and standard deviation $\sigma = \sqrt{2 / n^{[l-1]}}$, where $n^{[l-1]}$ is the number of neurons in layer $l-1$.

Fig. 2. Correlations between density log and drilling parameters for training wells and validation well.

Fig. 4. Measurements and estimates on the validation set. Top: Baseline model performance. Middle: Streaming learning performance. Bottom: Absolute errors (AE).

Table 1
Default values set for unknown parameters.
Parameter        Default value
Bit type         Polycrystalline Diamond Compact (PDC)
Junk slot area   14 in.²
Flow area        0.12 in.²
Mud weight       20 ppg

$\beta_1$, $\beta_2$ and $\epsilon$ are tunable hyperparameters for Adam optimization, and $t$ is the current iteration number. $V_{dW}$ and $V_{db}$ are biased first moment estimates. $S_{dW}$ and $S_{db}$ are biased second raw moment estimates. Superscript $c$ denotes their bias-corrected counterparts. $dW$ and $db$ are $\partial J / \partial W$ and $\partial J / \partial b$, respectively. $\alpha$ is the learning rate. Cost $J$ is defined as

Table 2
Summary of hyperparameters.