On gray-box modeling for virtual flow metering

A virtual flow meter (VFM) enables continuous prediction of flow rates in petroleum production systems. The predicted flow rates may aid the daily control and optimization of a petroleum asset. Gray-box modeling is an approach that combines mechanistic and data-driven modeling. The objective is to create a computationally feasible VFM for use in real-time applications, with high prediction accuracy and scientifically consistent behavior. This article investigates five different gray-box model types in an industrial case study using real, historical production data from 10 petroleum wells, spanning at most four years of production. The results are diverse, with oil flow rate prediction errors in the range of 1.8%-40.6%. Further, the study sheds light on the nontrivial task of balancing learning from both physics and data. Consequently, providing general recommendations towards the suitability of different hybrid models is challenging. Nevertheless, the results are promising and indicate that gray-box VFMs may reduce the prediction error of a mechanistic VFM while remaining scientifically consistent. The findings motivate further experimentation with gray-box VFM models and suggest several future research directions to improve upon the performance and scientific consistency.


Introduction
To optimally control a petroleum asset and maximize the recovery of oil and gas, it is necessary to have an adequate understanding of the behavior of the petroleum production system. The production system consists of the reservoir, wells, flowlines, pipelines, and separators. Commonly, a mathematical model of the flow through the production system is developed as an aid to information gathering and analysis of the system response to changes in control variables. Such a model is often referred to as a virtual flow meter (VFM) (Toskey, 2012). A VFM aims to continuously predict the multiphase flow rates (a mixture of gas, oil, and water) at strategic locations in the asset, for instance in individual wells. The characteristics of multiphase flow represent a particular challenge to prediction. Several types of VFM models exist, ranging from mechanistic to data-driven, thus, from white-box to black-box, respectively (de Prada et al., 2018). Depending on the prior knowledge about the system and the available process data, one model type can be more suitable than others, see Figure 1.

Virtual flow meter models
Mechanistic models are based on prior knowledge about the process and utilize first-principle laws, with possible empirical closure relations, to describe the relationship between the process input, internal, and output variables (Shippen, 2012). Contrarily, data-driven models require no prior knowledge of the process, and rather exploit patterns in available process data to describe the input-output relationship. Therefore, data-driven models often lack scientific consistency. A model may be considered scientifically consistent if the output of the model is plausible and in line with existing scientific principles (Roscher et al., 2020). Although this concept is hard to quantify and dependent on the user's scientific knowledge, it is an important characteristic as it promotes trust in the model. As mechanistic models are derived from physical laws, their scientific consistency is high. On the other hand, assumptions and simplifications of the process physics are typically necessary for a mechanistic model to be computationally feasible and suitable for use in real-time control and optimization applications (Solle et al., 2016). Accordingly, mechanistic models often lack flexibility, which is the ability to adapt to unknown and unmodeled physical phenomena. Conversely, due to the generic structure of data-driven models, their flexibility is high and the models may adapt to arbitrarily complex physical behavior as long as this is reflected in the available data. Yet, data-driven models are data-hungry and sensitive to the quality and variability of the data. If care is not taken, overfitting of the model to data is a frequent outcome that results in poor extrapolation abilities to future process conditions (Solle et al., 2016).
Gray-box models, or hybrid models, are a combination of mechanistic and data-driven models. The goal is to achieve a computationally feasible model that has high flexibility and scientifically consistent behavior. There exist numerous ways of constructing hybrid models. According to Willard et al. (2020), gray-box models can be divided into two domains: 1) data-driven modeling to advance first-principle models, or 2) utilization of first principles to guide data-driven models. The two domains correspond to either side of the grayscale illustrated in Figure 1 and will be referred to as the white-to-gray and the black-to-gray approach. Taking VFM as an example, a white-to-gray model is obtained if a mechanistic model is used as a baseline whereupon data-driven models are inserted to replace assumptions or simplifications. For instance, a common approach to estimate the density of gas in a mechanistic model is with the real gas law. Instead, if this relation is described with a data-driven model, a white-to-gray VFM model is obtained. Another example is to introduce a data-driven model to capture the error between the output of the mechanistic model and the corresponding measurements, see an example in Bikmukhametov and Jäschke (2020a). In general, the data-driven models may substitute any factors or terms in the mechanistic model. An example of a black-to-gray VFM model is if first principles are exploited to calculate additional features to be applied as input to a data-driven model. This is commonly referred to as feature engineering. A different approach is to divide the system into natural submodels, for instance individual wells in an asset, describe each with a data-driven model, and combine the outputs using first-principle laws. The two approaches can also be used side by side. For instance, both a mechanistic and a data-driven model can be developed to predict the multiphase flow rate and the model outputs combined in an ensemble model.
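As a minimal illustration of the white-to-gray idea, the sketch below keeps a deliberately simplified, hypothetical mechanistic choke structure but swaps the real gas law for a learned density relation. All function names, the toy flow relation, and the coefficients are made up for illustration; in the article, the data-driven part is a neural network trained on process data.

```python
import math

R = 8.314  # universal gas constant [J/(mol K)]

def gas_density_real_gas_law(p, T, M_g, Z=1.0):
    """Mechanistic sub-relation: the real gas law, rho = p*M_G/(Z*R*T)."""
    return p * M_g / (Z * R * T)

def gas_density_data_driven(p, T, weights):
    """Hypothetical data-driven replacement for the real gas law.
    A linear model is used here for brevity; in practice this role
    would be played by a neural network trained on process data."""
    w0, w1, w2 = weights
    return w0 + w1 * p + w2 * T

def mechanistic_flow(u, p1, p2, rho_g):
    """Toy mechanistic choke relation (illustrative only): flow grows
    with the choke opening u and the square root of the pressure drop."""
    return u * math.sqrt(max(p1 - p2, 0.0) * rho_g)

def hybrid_flow(u, p1, p2, T1, weights):
    """White-to-gray model: the mechanistic structure is kept, but the
    gas density term is supplied by the data-driven sub-model."""
    rho_g = gas_density_data_driven(p1, T1, weights)
    return mechanistic_flow(u, p1, p2, rho_g)
```

The additive-error variant mentioned above would instead keep the mechanistic output intact and add a learned correction term to it.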
Independent of the gray-box model type, measures should be taken to determine an appropriate degree of influence the mechanistic and data-driven part should have on the model output. In other words, there should exist a pertinent balance between learning from physics and learning from data. For instance, if the available process data are inaccurate, the mechanistic part of the model should influence the gray-box model output the most.
If the process exhibits unknown behavior, the data-driven part should have the greatest impact. Desirably, the gray-box model should learn as much as possible from both physics and data.

Literature review
The literature reports substantial research on mechanistic and data-driven modeling of VFMs (Amin, 2015; Zangl et al., 2014; Ajmi et al., 2015; AL-Qutami et al., 2017a,b,c, 2018; Omrani et al., 2018; Bikmukhametov and Jäschke, 2019; Ghorbani et al., 2018). An extensive review is found in Bikmukhametov and Jäschke (2020b). Some well-known commercial mechanistic VFMs are Olga, LedaFlow, FlowManager, ValiPerformance, and Prosper. In the study by Amin (2015), it was found that all the above-mentioned commercial mechanistic VFMs achieved errors of less than 5% and 10% for the prediction of oil and gas flow rates, respectively. The noticeable series of studies on data-driven VFMs by AL-Qutami et al. (2017a,b,c, 2018) achieved errors of 1.5%, 4.2%, and 4.7% on the predictions of gas, oil, and water flow rates, respectively. Despite recently emerging tools for hybrid, gray-box modeling, such as gPROMS (Siemens Process Systems Engineering, 2021), and even a commercially available hybrid VFM, TurbulentFlux (Ruden, 2020), little literature on the performance of gray-box VFMs exists. TurbulentFlux reports an error of 4% on multiphase flow rate predictions over two months for one of the tested wells. However, the robustness of the performance across different wells is not reported. Furthermore, as no reference model is tested on the available data, it is difficult to conclude whether the hybrid model performs better than alternative approaches. Nevertheless, some examples exist in the literature (Xu et al., 2011; Al-Rawahi et al., 2012; Kanin et al., 2019; Bikmukhametov and Jäschke, 2020a). Most of these studied different gray-box approaches on synthetic data, either from an experimental setup in a test rig (Xu et al., 2011) or a multiphase flow loop (Al-Rawahi et al., 2012), or using lab data available online (Kanin et al., 2019). The study in Bikmukhametov and Jäschke (2020a) investigated several hybrid VFM variants on real production data, with a large focus on the black-to-gray modeling approach.
However, their results were based on process data from only one subsea well and the modeling approach could benefit from a deeper study of more petroleum wells.

Contributions
This research contributes to the field of gray-box VFM modeling with an in-depth study of five white-to-gray VFM models of a petroleum production choke valve. A mechanistic and a data-driven model are developed for comparison of the performance and scientific consistency. The study is a significant expansion of the work done in Hotvedt et al. (2020, 2021). The number of tested gray-box models is increased, the complexity of the model components is higher, and data from more wells are included. The VFM models are developed for 10 petroleum wells at Edvard Grieg (Lundin Energy Norway, 2020). Real, historical production data are used in the model development, thus no experimental setup or simulator is required for data acquisition. With data from 10 wells, the robustness of the modeling approaches can be investigated to a certain extent.
The results in this research pertain to the VFM application; the generalizability to other application areas is not considered.

Production choke valve models
A production system is illustrated in Figure 2, from the down-hole, the closest measurement point to the reservoir, to the separator. The volumetric flow rates from several wells are commingled, and the total production from the asset is separated into three phases, oil (Q_O), water (Q_W), and gas (Q_G), at the separator. The production choke valve is located in the wellhead of the production system. The choke is a key element in the daily control and optimization of a petroleum production system. By adjusting the choke opening, the multiphase flow rate through the production system can be controlled to maximize production while meeting operational requirements such as production capacity constraints. In this research, only the production choke is modeled. This reduces the model complexity and avoids the utilization of down-hole sensor measurements, which is advantageous for assets where down-hole measurements are lacking or faulty. Naturally, for assets with good down-hole measurements, the VFM can be expanded.
To model the choke for individual wells, the following measurements are required: the choke opening (u), the pressures (p) and temperatures (T) upstream (1) and downstream (2) of the choke valve, and measurements of the flow rate (q). Measurements of the phasic flow rates in individual wells, q = (q_O, q_G, q_W), can be obtained from well tests, for instance using a test separator, or from multiphase flow meters (MPFMs) if these are available. Furthermore, the phasic fluid mass fractions are required. Ideally, these should be calculated with a separate model for each new sample, for example using a simplified wellbore model as in Kittilsen et al. (2014). Nevertheless, in this research, the mass fractions are treated as measurements, calculated using the flow rates from the MPFM in the previous measurement sample. Consequently, the utilized mass fractions will lag behind the true mass fractions. However, under the assumption of a slowly time-varying process, the mass fractions should not change significantly between samples.
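The lagged mass-fraction scheme described above can be sketched as follows. This is illustrative only: the variable names and density values are assumptions, and the conversion simply turns volumetric rates into mass rates and normalizes.

```python
def mass_fractions(q_oil, q_gas, q_wat, rho_oil, rho_gas, rho_wat):
    """Convert volumetric flow rates to phasic mass fractions
    (eta_O, eta_G, eta_W) using the phasic densities."""
    m_oil = q_oil * rho_oil
    m_gas = q_gas * rho_gas
    m_wat = q_wat * rho_wat
    total = m_oil + m_gas + m_wat
    return m_oil / total, m_gas / total, m_wat / total

def lagged_fractions(samples, rho):
    """For each sample k >= 1, compute the mass fractions from the
    MPFM flow rates of sample k-1, so the fractions lag one sample.
    `samples` is a chronological list of (q_oil, q_gas, q_wat) tuples,
    `rho` a tuple (rho_oil, rho_gas, rho_wat)."""
    return [mass_fractions(*prev, *rho) for prev in samples[:-1]]
```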

Mechanistic production choke model
Several mechanistic models of the production choke exist, with varying complexity in space and time. Mechanistic choke models are usually developed assuming steady-state, one-dimensional (lumped) flow, since increasing the dimensionality of the problem requires a numerical solution of the complex Navier-Stokes equations. Such solutions are computationally demanding and may not be suitable for use in real-time optimization (Shippen, 2012). There are several well-known choke models in the literature and industry (Selmer-Olsen, 1995; Sachdeva et al., 1986; Perkins, 1993; Al-Safran and Kelkar, 2009). In this research, the Sachdeva model is used as the baseline model for hybridization.
The Sachdeva model is one of the less complex models as it introduces many assumptions and simplifications. It is expected that introducing data-driven elements into the mechanistic model will increase the flexibility of the model and possibly supersede some of the simplifications. The exception is distributed effects in space and time, as the Sachdeva model is assumed lumped and steady-state.
The Sachdeva model is derived from the combined mass and momentum balance equations (Jansen, 2015, p. 107), in which s is the position along a streamline, ρ is the fluid mixture density, v is the fluid mixture velocity, ṁ is the mass flow rate, and A is the area of the choke valve. Positions (1) and (2) indicate the inlet and outlet, respectively. By integrating Equation (1) between locations (1) and (2) and introducing several assumptions, a model for the mass flow rate through the choke valve is obtained (see Sachdeva et al. (1986) for the complete derivation). The assumptions are:
• no-slip: the gas and liquid travel through the choke with equal velocity,
• incompressible liquid: liquid densities are constant along s, resulting in the oil and water densities being independent of the process conditions,
• frozen flow: no mass transfers from one phase to another across the choke, resulting in constant mass fractions independent of the process conditions,
• adiabatic gas expansion across the choke: no mass or heat transfer between the fluid and the surroundings,
• a thoroughly and homogeneously mixed fluid,
• negligible momentum effects at (1) due to A_2 ≪ A_1, yielding v_2^2 ≫ v_1^2.
Here ρ_i and η_i, i ∈ {G, O, W}, are the phasic densities and mass fractions, respectively, M_G is the molar mass of gas, and p_r is the downstream-to-upstream pressure ratio. The gas expansion coefficient κ is treated as a constant in this article, but is in practice a function of pressure and temperature, κ = κ(p_1, p_2, T_1, T_2).
The gas compressibility factor Z is calculated using the correlation in Sutton (1985). The discharge coefficient C_D is commonly introduced to account for modeling errors. The area of the choke, A_2 = A_2(u), is a function of the choke opening. The model differentiates between critical and subcritical flow. In short, critical flow is a phenomenon where the mass flow rate through the choke does not increase for a decreasing downstream pressure p_2 at a fixed upstream pressure p_1. A rule of thumb for the critical flow boundary p_r,c for multiphase flow with a mixture of gas, oil, and water is p_r,c ≈ 0.6 (Jansen, 2015). The volumetric flow rate may be obtained using the mass flow rate and the mixture density at standard conditions (SC), typically 1 atm and 15 °C (International Organization for Standardization, 1996). In this research, the model output is the oil volumetric flow rate. Mathematically, the mechanistic model (MM) in (3)-(9) is described with a generic function f that predicts the oil volumetric flow rate from the input measurements x and the set of model parameters φ_MM: Q_O = f(x; φ_MM). The φ_MM are components of the model which are considered constant due to certain assumptions or simplifications. For instance, as described above, the oil and water densities are constant parameters due to the assumption of incompressible liquid.
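The critical/subcritical distinction can be sketched in code. The model below is a toy restriction-flow relation, not the full Sachdeva derivation; the point is only the clamping of the pressure ratio at the critical boundary, below which a further decrease in downstream pressure no longer increases the flow.

```python
import math

def effective_pressure_ratio(p1, p2, p_r_crit=0.6):
    """Clamp the downstream/upstream pressure ratio at the critical
    boundary: below it the flow is choked (critical) and no longer
    responds to a decreasing downstream pressure."""
    return max(p2 / p1, p_r_crit)

def toy_mass_flow(C_D, A2, rho2, p1, p2, p_r_crit=0.6):
    """Illustrative restriction-flow relation (NOT the Sachdeva model):
    mass flow through an area A2 with discharge coefficient C_D, with
    the effective pressure drop saturated at critical flow."""
    p_r = effective_pressure_ratio(p1, p2, p_r_crit)
    dp = p1 * (1.0 - p_r)
    return C_D * A2 * math.sqrt(2.0 * rho2 * dp)
```

In the subcritical regime the flow depends on both pressures; once p_2/p_1 drops below p_r,c ≈ 0.6, the predicted flow becomes independent of p_2, which is the defining property of critical flow.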

Hybridization of the mechanistic model
To hybridize the MM, any of the factors or terms in (3)-(9) can be substituted with a data-driven model (DM). Approaching the hybridization from a physical point of view, some of the mechanistic model assumptions or simplifications can be imprecise, yielding an erroneous physical behavior. For instance, in low temperature and high-pressure conditions, the real gas law relation in (4) may be inaccurate. Instead of using a different, and possibly more complex, mechanistic relation such as van der Waals equation of state, the hybrid model utilizes a DM to substitute the real gas law. Presumably, by learning the gas density relation from patterns in the measurements only, a relation that is suitable for the process and adaptable to the current conditions is obtained. Taking another example, the adiabatic gas expansion equation in (5) assumes that no heat or mass transfer occurs between the system and surroundings, yet, in practice, both exist. If the available measurements reflect these physical phenomena, a DM substituting (5) should, to some extent, be able to implicitly capture the effect of, for instance, heat transfer on the flow rate, even without measurements of the ambient temperature. Similarly, most of the assumptions listed above may be replaced with a data-driven model to account for erroneous physics.
Consequently, the model should be more generic and suitable for utilization in a larger range of process conditions. Nevertheless, data-driven models are generally only valid in the domain of the data they have been exposed to. Hence, if the system is exposed to previously unseen process conditions, the hybrid models will likely have to be retrained or recalibrated to adapt to the new data.
There is an abundant number of hybridization options for the mechanistic model. Therefore, only a few of the simplifications and assumptions of the baseline model are investigated. Further, numerous combinations of these simplifications are viable; for simplicity, only one simplification is considered at a time. Thereby, five hybrid model (HM) variants are developed, each addressing and substituting one of the following simplifications with a DM:
1. The area function, A_2(u)
2. The upstream gas density function, replacing (4)
3. The adiabatic gas expansion function, replacing (5)
4. The mixture density function
5. An additive error model to capture structural errors of the MM
Mathematically, the inserted DM is defined by a generic function with a set of nonphysical parameters, and the HM is defined as a combination of the MM and the DM. In this research, the DM is a feed-forward neural network, whose parameters are estimated with numerical optimization; this will be introduced in Section 3. In short, a feed-forward neural network is a collection of L layers: at each layer, the inputs are transformed with an affine function with weight matrix W_i and bias b_i and sent through an activation function a. The rectified linear unit (ReLU) activation function has been used, which is the elementwise maximum operator ReLU(z_i) = max{0, z_i}. This results in the neural network being a set of piecewise linear equations. The nonphysical parameters of the network are the collection of weights and biases on all layers.
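The layer equations above can be written out directly. A minimal NumPy sketch follows, with ReLU on the hidden layers and a linear output layer; the layer sizes and random values are arbitrary.

```python
import numpy as np

def relu(z):
    """Elementwise maximum operator ReLU(z) = max{0, z}."""
    return np.maximum(z, 0.0)

def forward(x, params):
    """Feed-forward pass through L layers: each layer applies an affine
    transform W_i z + b_i followed by ReLU; the output layer is linear."""
    z = x
    for W, b in params[:-1]:
        z = relu(W @ z + b)
    W_out, b_out = params[-1]
    return W_out @ z + b_out

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 1]  # input, two hidden layers, scalar output
params = [(rng.normal(0.0, 1.0, (m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
y = forward(np.ones(3), params)
```

Because ReLU is piecewise linear and the layers are affine, the whole network is a set of piecewise linear equations, as noted above.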

Parameter estimation of hybrid models
Regardless of the location of the model on the gray-scale in Figure 1, the uncertain model parameters should be estimated from data. For a fully mechanistic model, good prior values of the parameters often exist, and parameter estimation is then not strictly required, although it is usually necessary to obtain high accuracy model predictions. For a fully data-driven model, the parameters are initialized randomly and parameter estimation is a requirement. The latter also applies to hybrid models, which contain randomly initialized data-driven components. Parameter estimation is also referred to as model training.

Maximum a posteriori estimation
Consider a dataset D = {(x_i, y_i)}_{i=1}^n with n measurements of the process explanatory variables x and the target variable y. Assume the process is described by the measurement model y_i = h(x_i; φ) + ε_i, where h(x_i; φ) are the model predictions of the target variable, with model parameters φ ∈ R^m and normally distributed measurement noise ε_i with zero mean and variance σ_ε,i^2. Observe that this measurement model can incorporate output measurements from different devices, if available, by changing σ_ε,i^2 appropriately for measurement i. Even synthetic data generated with mechanistic simulators may be included in this approach.
In parameter estimation problems, the parameters φ of the model h are inferred using the available data D. This can be done with Bayesian inference, where the prior parameter distribution p(φ) is updated to a posterior parameter distribution using Bayes' rule: p(φ | D) = p(D | φ) p(φ) / p(D). The evidence p(D) includes intractable integrals (Blei et al., 2017), and approximation techniques are commonly required for a numerical solution. In this research, maximum a posteriori (MAP) estimation is applied.
In MAP estimation, only the mode of the posterior distribution is considered, and the parameters are found with the optimization problem φ_MAP = arg max_φ {log p(D | φ) + log p(φ)}, where log p(D | φ) is the log-likelihood of the model. By further assuming independent, normally distributed parameters φ_j ~ N(µ_j, σ_j^2), j ∈ {1, ..., m}, a regularized least-squares problem may be derived (Bishop, 2006). In short, MAP estimation is a trade-off between minimizing the error between target variable predictions and measurements, and minimizing the parameter deviation from the prior means µ. By setting a constant noise level σ_ε^2 = const., the noise variance acts as a common weight on the data-fit term. Further, regularization must be used to avoid overfitting of the model and ensure adequate generalization performance (Goodfellow et al., 2016).
A different perspective of the MAP estimation problem is that it balances learning from physics and learning from data. With softer regularization, achieved by setting flat, noninformative prior parameter distributions, σ_i → ∞, the data will have a large influence on the estimation outcome. This is because the regularization terms are down-weighted in the optimization. The same effect is achieved with a small noise variance, implying that the measurements are accurate. With harder regularization, the opposite effect is achieved: the physics, in this case the parameter priors, will have a higher influence on the estimation outcome, and the adaptation to data is down-weighted in the optimization.
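The trade-off described above can be made concrete with a sketch of the negative log-posterior used as a MAP objective. The model h, the prior means, and the variances are placeholders; the point is how sigma_noise and sigma_prior weight the two terms against each other.

```python
import numpy as np

def map_objective(phi, x, y, h, mu, sigma_prior, sigma_noise):
    """Negative log-posterior (up to additive constants) for a Gaussian
    likelihood and independent Gaussian parameter priors.
    A small sigma_noise makes the data dominate the estimate; a small
    sigma_prior makes the priors (the 'physics') dominate."""
    residuals = y - h(x, phi)
    data_fit = np.sum(residuals**2) / (2.0 * sigma_noise**2)
    regularization = np.sum((phi - mu)**2 / (2.0 * sigma_prior**2))
    return data_fit + regularization
```

Minimizing this objective with respect to phi (e.g. with a gradient-based optimizer) yields the MAP estimate; with flat priors (large sigma_prior) it reduces to a least-squares fit.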
For the HMs in Section 2, the MAP objective function is divided into three terms: the likelihood term and two parameter regularization terms, one for the m_1 physical parameters and one for the m_2 nonphysical parameters. In this research, only MPFM measurements are used as target measurements.

Priors on the physical parameters
For the physical model parameters, good prior values of the means µ_i,MM often exist. For instance, for the freshwater density, µ_ρ,W ≈ 1000 kg/m^3. The parameter variances may be set to reflect the uncertainty in the prior mean value.
If the assumption of normally distributed parameters is exploited, the variance may be approximated using the absolute maximum and minimum values of the parameters and calculating the 6σ band of the distribution, that is, σ_i = (φ_max − φ_min)/6, for which the probability of obtaining values outside the band is ≈ 0.3%. For harder regularization of a specific parameter, the variance may be decreased, resulting in a sharper distribution, and the opposite for softer regularization.

Priors on the nonphysical parameters
Finding priors for the nonphysical parameters in the model is not trivial.
However, He-initialization is recommended for neural networks with ReLU as the activation function (He et al., 2015). With He-initialization, each element of the weight matrix W_i on layer i (see Section 2) is initialized from a zero-mean normal distribution with a variance of 2 divided by the number of inputs to the layer. In this research, the prior parameter means are subsequently updated by pretraining the network on synthetic data. However, as the network is pretrained on synthetic data only, it is assumed that the updated prior parameter means are just as uncertain as before. Therefore, the parameter variances in (22) are utilized. If real measurements of the variable existed, such as density measurements, these could be used in the pretraining.
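He-initialization draws each weight from N(0, 2/fan_in), where fan_in is the number of inputs to the layer. A sketch:

```python
import numpy as np

def he_init(fan_in, fan_out, rng):
    """He-initialization for ReLU networks: zero-mean normal weights
    with variance 2 / fan_in; biases start at zero."""
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
    b = np.zeros(fan_out)
    return W, b

rng = np.random.default_rng(42)
W, b = he_init(50, 20, rng)
```

The scaling by fan_in keeps the variance of the layer activations roughly constant through the network, which is why this scheme is the recommended starting point for ReLU networks.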

Priors on the measurement noise
In an industrial setting, a common measure of the error of a measurement device is the mean absolute percentage error (MAPE), comparing the measured signal to a known reference value y_ref. Following the derivation in Grimstad et al. (2021), the MAPE may be translated into the variance of the measurement noise: for zero-mean Gaussian noise, E|ε| = σ_ε √(2/π), giving σ_ε = α y_ref √(π/2), where α is the MAPE, for instance α = 0.1 for 10% MAPE. In this study, the reference value is not known, and the variance of the measurement noise is approximated using the available data. Because the MAP estimation in (20) assumes a constant noise level σ_ε^2 = const., the mean value of the measured target variable in the training data is used as the reference value, y_ref = (1/n) Σ_{i=1}^n y_i. As mentioned in Section 3.1, in practice σ_ε^2 may be adjusted to influence the degree of regularization of the parameters.
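The translation from a device MAPE to a noise variance, using the training-data mean as the reference value as described above, can be sketched as follows (the derivation assumes zero-mean Gaussian noise, for which E|ε| = σ √(2/π)):

```python
import math

def noise_variance_from_mape(alpha, y_train):
    """Translate a device MAPE (alpha, e.g. 0.1 for 10%) into a
    Gaussian measurement-noise variance. For zero-mean Gaussian noise,
    E|eps| = sigma * sqrt(2/pi), so sigma = alpha * y_ref * sqrt(pi/2).
    The mean of the training targets is used as the reference value."""
    y_ref = sum(y_train) / len(y_train)
    sigma = alpha * y_ref * math.sqrt(math.pi / 2.0)
    return sigma**2

# Example: a 10% MAPE device on a signal averaging 100 units.
var = noise_variance_from_mape(0.10, [90.0, 100.0, 110.0])
```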

Case study
The case study develops the five white-to-gray VFM models listed in Section 2 for 10 petroleum wells at Edvard Grieg (Lundin Energy Norway, 2020).
Edvard Grieg is an asset on the Norwegian Continental Shelf and consists of under-saturated oil without a gas cap. The asset is relatively new; production commenced in 2015. The wells, hereafter referred to as W01-W10, are well-instrumented, with available measurements of the explanatory variables defined in (11). An MPFM located in the wellhead of each well provides measurements of the volumetric flow rate. The models are trained with the MAP estimation introduced in Section 3 using real, historical production data from the 10 wells.
The number of data samples per well is unequal and spans approximately 1.5-4 years. No additional experimental or synthetic data are considered. For comparison, the Sachdeva model in Section 2, and a fully connected feed-forward neural network, are implemented. Two aspects of the models are investigated.
First, the predictive performance in terms of accuracy is analyzed in Section 4.1.
Thereafter, the scientific consistency is examined in Section 4.2. Considerations for improvements in future work are discussed in Section 4.3.
The datasets for each well are preprocessed in two steps. First, the processing technology in Grimstad et al. (2016) is utilized to generate a compressed dataset of steady-state operating points suitable for steady-state modeling. Second, a set of filters is applied to remove data samples that likely originate from erroneous sensor data, such as negative pressures or choke openings. The dataset is split into a training and a test set according to time, to mimic an industrial setting where the developed models are used to predict the future responses of the process. The test set consists of the three latest months of data samples.
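A chronological split of the kind described, holding out the latest months for testing, can be sketched as follows (the sample layout and the 30-day month approximation are assumptions):

```python
from datetime import datetime, timedelta

def chronological_split(samples, test_months=3):
    """Split time-stamped samples into train/test sets by time, keeping
    the latest `test_months` months (approximated as 30-day periods)
    for testing. `samples` is a list of (timestamp, features, target)."""
    samples = sorted(samples, key=lambda s: s[0])
    cutoff = samples[-1][0] - timedelta(days=30 * test_months)
    train = [s for s in samples if s[0] <= cutoff]
    test = [s for s in samples if s[0] > cutoff]
    return train, test
```

Unlike a random split, this guarantees that no test sample precedes any training sample, mimicking the industrial setting of predicting future process responses.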
The regularization method early stopping (Goodfellow et al., 2016) is utilized when training the models. This algorithm monitors the error on a validation dataset during model training to find the appropriate number of loops through the training data, called epochs, for which the model can be trained without overfitting. The validation data comprise 20% of the training data, extracted in randomly chosen chunks, each chunk representing data samples from two consecutive weeks. Due to the stochasticity of the training algorithm, the early stopping algorithm is run several times, and the average number of epochs is used to train the final model. The optimizer Adam (Kingma and Ba, 2015) is applied with mini-batches, and the learning rate is α = 10^-4.
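The early-stopping logic can be sketched in a framework-agnostic way. The validation-loss curve below is synthetic, and the patience value is an assumption (the article does not state one); the function returns the epoch count at the best validation loss, stopping once no improvement has been seen for `patience` epochs.

```python
def early_stopping_epochs(val_losses, patience=5):
    """Return the epoch count at the minimum validation loss, stopping
    once the loss has not improved for `patience` consecutive epochs.
    `val_losses` is iterated lazily, mimicking a training loop."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Toy validation curve: improves for 10 epochs, then overfits.
curve = [1.0 / e for e in range(1, 11)] + [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
n_epochs = early_stopping_epochs(curve, patience=5)
```

As described above, this count would be averaged over several runs before training the final model on the full training data.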
An overview of the seven implemented models is found in Table 1. In the optimization, a parameter transformation is used to obtain the real parameter values, where ζ is a small constant introduced to avoid vanishing gradients in the optimization problem.

Predictive performance
Figure 4 and Table 2 present the predictive performance of the models. There are several interesting observations to make. Firstly, the median errors are large for all model types and not at the level of the errors reported in the literature, see Section 1. Figure 4 shows that the DM is the only model achieving a median MAPE below 10%, though barely, with 9.4%. Secondly, the results indicate that moving the model on the gray-scale from white to gray does improve the average performance significantly, see Table 2: the MM achieves an error of 17.2% against 10.3% for the best HM. However, compared to the DM, with an error of 10.4%, the HMs offer only a small improvement. Thirdly, large variations in performance across the different choke models are observed in Table 2. For instance, for W01, all the model types perform excellently and are on the level of the errors reported in the literature (less than 4% MAPE).
Yet, for W02, the performance is unsatisfactory for all model types. The large differences in performance may also be observed in the cumulative deviation plots in Appendix A, Figure A.2. These plots show the percentage of test points that fall within a certain percentage deviation from the true value (Corneliussen et al., 2005). Several factors may cause the observed prediction accuracy of the different models, and three of these are discussed in the following. Section 4.1.1 focuses on the impact model simplifications may have on the accuracy, Section 4.1.2 elaborates on the task of balancing learning from physics and learning from data, and Section 4.1.3 discusses the likely influence of the available data.
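The cumulative deviation statistic referenced above (the fraction of test points within a given percentage deviation from the true value) can be computed as follows; the numbers are made up for illustration:

```python
def cumulative_deviation(y_true, y_pred, thresholds):
    """For each percentage threshold, return the fraction of test
    points whose absolute percentage deviation from the true value
    is within that threshold."""
    devs = [abs(p - t) / abs(t) * 100.0 for t, p in zip(y_true, y_pred)]
    return [sum(d <= thr for d in devs) / len(devs) for thr in thresholds]

frac = cumulative_deviation([100.0, 100.0, 100.0, 100.0],
                            [101.0, 105.0, 90.0, 125.0],
                            thresholds=[5.0, 10.0, 25.0])
```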

The possible impact of model simplifications
First, it must be kept in mind that only the production choke valve is modeled, and any effects of the remaining production system on the multiphase flow, such as those of the wellbore, are disregarded. It is believed that the average predictive performance would improve by modeling a larger part of the production system. Second, several assumptions and simplifications are introduced in the baseline mechanistic choke model. Depending on the process conditions, flow regimes, and fluid composition, these may be appropriate for describing the physical behavior of the flow through the choke in some wells but imprecise in others.
For instance, observe how the HM(A 2 ) for W03 has a much better performance than any of the other model types. This may indicate that the mechanistic area function is poorly calibrated for this well in the other model types. For W01, HM(ρ G,1 ) has the best performance and may suggest that the assumption of the real gas law is inadequate. Naturally, these are only indications and the results could benefit from a deeper analysis of the suitability of different hybrid models in different cases.

The nontrivial task of balancing learning from physics and data
With adequate design and training, the HMs were expected to exploit both physics and data to their full extent and thereby perform better than nonhybrid models. Certainly, on a well level, six wells perform better with an HM. However, seen from Table 2, wells W04-W07 perform better with either a mechanistic or a data-driven model. This may cast light upon the nontrivial task of balancing learning from physics and data. The HM may be too simplistic, and consequently, not flexible enough to capture complex physical behavior.
Likewise, the data-driven elements may be erroneously influenced by the data.
Hence, an appropriate approach to control the influence of the mechanistic and data-driven component is yet to be discovered, at least for the white-to-gray hybrid model types investigated in this research.

The influence of the available data
As neural networks have the capacity to adapt to arbitrarily complex patterns in the data, the large MAPEs seen for many of the DMs may indicate that the quality of the available data is inadequate. Real, historical production data are used in both model training and testing. It is not uncommon for production data to be noisy and biased, which complicates the modeling process and may yield an unfair indication of predictive performance for some models. Naturally, other model types or estimation techniques exist which to a greater extent exploit uncertainty in the model parameters and measurements. On the other hand, such methods require specifications of uncertainty that are not easily available, and the resulting models are usually of higher complexity. Further, it is believed that the large errors for several of the choke models are mainly caused by the datasets originating from an underlying, nonstationary process. As the reservoir is depleted, the down-hole pressure will decrease.
If the goal is to maintain a steady production rate, the operators must increase the choke opening. Extracting the test dataset chronologically may therefore result in a set of process conditions that are substantially different from the conditions seen in the training dataset. If so, a steady-state model like the baseline mechanistic model or a standard neural network will not be able to capture the slowly varying, underlying changes, and a chronological test split may be inappropriate. It may also discredit the high-accuracy prediction potential of the models. Using the developed models to predict the process response only one week ahead greatly increases the accuracy; see the comparison of three-month predictions against one-week predictions in Figure 6.
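The effect of nonstationarity on a chronological test split can be reproduced on synthetic data. The drifting signal below is a made-up stand-in for depletion-driven production data, and the "model" is simply a mean predictor; the point is only that a model fitted once and tested months ahead suffers far more from drift than the same model refitted weekly and evaluated one week ahead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic nonstationary signal: a slow downward drift mimicking
# reservoir depletion, plus noise. Illustrative, not real well data.
days = np.arange(365)
q = 100.0 - 0.05 * days + rng.normal(0.0, 1.0, days.size)

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Chronological split: a steady-state model fitted once on the first
# nine months (here simply the training mean) is tested on the last three.
train, test = q[:270], q[270:]
mape_static = mape(test, np.full(test.size, train.mean()))

# One-week-ahead evaluation: the model is refitted each week on the most
# recent data and only predicts the next seven days.
preds = []
for start in range(270, 365, 7):
    window_mean = q[start - 7:start].mean()   # "refit" on the last week
    preds.extend([window_mean] * len(q[start:start + 7]))
mape_weekly = mape(test, np.array(preds))
```

On this synthetic signal the weekly refitted predictor achieves a much lower MAPE than the static one, consistent with the one-week-ahead improvement reported in Figure 6.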

Scientific consistency
One consideration of a model is its performance in terms of accuracy; another is its scientific consistency. Inconsistent physical behavior may cast doubt on the trustworthiness of the models and cause poor generalization abilities. First, the outputs from the neural networks in the hybrid models are investigated. Figures 7a and 7b show the output from the neural network in HM(A2) and HM(ρ_G,1), respectively, as a function of one of the inputs, for three of the wells. The results are diverse. In some of the choke models, the output of the neural network has a trend coherent with the expected physical behavior, illustrated with the mechanistic relation. This is seen for W01. However, notice that some of the other curves go to zero or explode, illustrating scientific inconsistency. This effect has also been observed for the HM(ρ_G,2) and the HM(ρ). There are two likely explanations for the nonphysical behaviors. Firstly, the behavior may be influenced by a lack of data or erroneous data. For instance, for W03, data are lacking for choke openings greater than 40%. Secondly, due to the high capacity of neural networks, the data-driven part of a hybrid model may capture any modeling error and not just the factor or term the network was intended to represent. For instance, even though the HM(A2) had the best performance for W03 of all models, the area function is not in line with the expected physical behavior. This indicates that the learned neural network area function may have captured other modeling errors than just a poorly calibrated area function.

Additionally, a short sensitivity study is conducted to investigate the scientific consistency of the output of the seven implemented VFM models. The choke models trained on data from W01, for which all models achieved a good performance, are examined, see Table 2. Five test points are randomly picked from the test dataset, and the choke opening u and the upstream pressure p_1 are varied about each point. Considering the production choke as an isolated unit without the influence of the rest of the production system, the oil flow rate should be expected to 1) increase with increasing choke opening, and 2) increase with increasing upstream pressure.
The sensitivity study is presented in Figure 8.
Most of the models seem to mimic the expected physical behavior except for the DM, for which the oil flow rate decreases with increased pressure above a certain threshold. This effect is caused by the DM being influenced by the available data to a larger degree than the other model types, and by the available data reflecting the behavior of the complete production system and not only the choke. This can be explained in more detail by looking at the correlation plot of the available measurements in the dataset corresponding to W01, see Figure 9. Observe the negative correlation between the oil flow rate q_O and the upstream pressure p_1. Looking at the choke as an isolated unit, this correlation contradicts the expected physical behavior. On the other hand, when additionally considering the wellbore, the observed correlation has a scientific explanation: increased pressure in the wellhead may result in a decreased pressure drop across the wellbore and, consequently, a lower flow rate from the reservoir.
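A sensitivity check of the kind described above can be automated. The choke relation below is a simplified, assumed valve-type equation, not any of the article's seven models; the sketch only shows the mechanics of sweeping one input at a time around an operating point and testing the two expected trends.

```python
import numpy as np

def choke_oil_rate(u, p1, p2=30e5, rho=700.0):
    # Illustrative mechanistic choke relation (assumed form): the rate
    # should grow with the opening u and with the upstream pressure p1.
    Cd, k = 0.84, 1e-3
    return Cd * k * u * np.sqrt(2.0 * rho * max(p1 - p2, 0.0))

def is_monotone_increasing(f, grid):
    """True if f is nondecreasing over the given 1-D grid."""
    vals = np.array([f(x) for x in grid])
    return bool(np.all(np.diff(vals) >= 0.0))

# Sweep each input around a hypothetical operating point, one at a time,
# and check the two expected trends: dq/du >= 0 and dq/dp1 >= 0.
ok_u = is_monotone_increasing(lambda u: choke_oil_rate(u, 60e5),
                              np.linspace(0.1, 1.0, 20))
ok_p1 = is_monotone_increasing(lambda p1: choke_oil_rate(0.5, p1),
                               np.linspace(35e5, 80e5, 20))
```

Applied to a trained model's prediction function instead of this stand-in, such a check would flag the DM's decreasing rate at high pressures as a violation of trend 2).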

Suggestions for improvements in future work
From the results presented in Section 4, there are several aspects that can be investigated to improve both the prediction accuracy and the scientific consistency of hybrid models in future work.
Firstly, only a few simplifications and assumptions are investigated as hybridization options in Section 2.2, although numerous exist. It is likely that other hybrid model types may be better at balancing learning from physics and learning from data. Further, different types of data-driven models or other mechanistic choke models may yield better performances for these wells. There is also the question raised in Section 4.1.1 on the suitability of different hybrid models in different cases. One approach in this direction is to utilize an advanced simulator to generate synthetic data, in which process conditions and other characteristics can be controlled.

Figure 9: A visualization of the correlation between the explanatory variables and the target variable measurements in the dataset corresponding to W01.
Secondly, Section 4.1.3 discussed the influence of the available data on the prediction accuracy and pointed out noisy and biased measurements, together with nonstationary process conditions, as influential factors. A future research path is to experiment with different estimation methods or model types that exploit knowledge regarding the uncertainty in parameters and measurements.
Some examples are variational inference as estimation method, state estimation techniques such as the Kalman Filter (Kalman, 1960), or probabilistic models.
In case of nonstationary process conditions, time-dependent models may be utilized. Yet, such models greatly increase the computational complexity and may not be suitable for real-time applications. Another possibility is online learning, a learning method that may improve upon future predictive performance without adding complexity to the models.
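The online learning idea can be sketched in a few lines. A plain linear model updated one sample at a time with SGD stands in for the article's neural networks, and the drifting slope is a made-up proxy for nonstationary process conditions; the point is that incremental updates let a fixed-complexity model track a slowly changing relation without refitting from scratch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Online SGD on a drifting linear relation y = slope(t) * x + noise.
w, b, lr = 0.0, 0.0, 0.05
abs_errors = []
for t in range(2000):
    x = rng.uniform(0.0, 1.0)
    true_slope = 2.0 + 1e-3 * t            # slow drift, e.g. depletion
    y = true_slope * x + rng.normal(0.0, 0.05)
    err = (w * x + b) - y                  # predict before updating
    abs_errors.append(abs(err))
    # Plain SGD step on the squared prediction error
    w -= lr * err * x
    b -= lr * err

early = float(np.mean(abs_errors[:100]))   # error before adaptation
late = float(np.mean(abs_errors[-100:]))   # error while tracking the drift
```

The per-sample update cost is constant, which is what makes online learning attractive for real-time VFM applications.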
Lastly, in Section 4.2, the scientific consistency of the gray-box models was discussed and several issues were raised. Several possible approaches may be investigated to improve upon the scientific consistency. Firstly, a stronger regularization toward the priors obtained from the pretrained neural networks could possibly result in the network replicating the mechanistic relation to a higher degree, whilst avoiding capturing other modeling errors. Secondly, the inclusion of additional data-driven elements in a gray-box model, for instance an error term, could enable the original data-driven element to capture the proposed physics only. Thirdly, the utilization of methods that enable learning from datasets across wells, for instance transfer learning or multitask learning, may positively change the results as more data are exploited.
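The first suggestion, stronger regularization toward a pretrained prior, amounts to adding a penalty lam * ||theta - theta_prior||^2 to the training loss. In the sketch below a linear model stands in for the network and theta_prior is a made-up example value; with a linear model the penalized fit even has a closed form, which makes the effect of the penalty strength easy to see.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data from a linear relation; the "mechanistic" prior is a
# deliberately slightly-off parameter vector (assumed for illustration).
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -0.5]) + rng.normal(0.0, 0.1, 50)
theta_prior = np.array([0.8, -0.4])

def fit(lam):
    """Minimize mean((X @ theta - y)^2) + lam * ||theta - theta_prior||^2.
    Closed-form normal equations for the penalized linear least squares."""
    n = len(y)
    A = X.T @ X / n + lam * np.eye(2)
    b = X.T @ y / n + lam * theta_prior
    return np.linalg.solve(A, b)

theta_weak, theta_strong = fit(0.01), fit(100.0)
```

Increasing lam pulls the estimate toward theta_prior, at the cost of fitting the data less closely; for a neural network the same penalty would be applied to the weights of the pretrained data-driven element.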

Concluding remarks
This article contributes towards the development of gray-box virtual flow meters in the petroleum industry. The focus has been on white-to-gray box models where a mechanistic model is used as a baseline and data-driven elements inserted to increase model flexibility. The choke valve of 10 petroleum wells has been modeled using real production data spanning at most four years of production.
The results are diverse, with a prediction error in the range of 1.8%-40.6%, and no general recommendations towards the suitability of different gray-box models may be drawn. The results cast light upon the nontrivial task of balancing learning from both physics and data. It is believed that the accuracy is strongly influenced by nonstationarity in the available data. Nevertheless, the results indicate that gray-box models may outperform a mechanistic and a data-driven model if an appropriate balance between the model components is identified. In particular, the gray-box modeling approach seems to increase the accuracy compared to mechanistic models and may improve the scientific consistency compared to data-driven models.
While the gray-box modeling approaches are tested on 10 different wells, these wells, although fairly typical offshore wells, are hardly representative of all wells. Therefore, a direct generalization of the results to other assets is difficult. Assuredly, the results could benefit from a deeper analysis of gray-box modeling on wells with significantly different characteristics. Furthermore, the research has studied the approach with VFM as the application, and generalization to other application areas cannot be assumed without further experimentation. On the other hand, the gray-box modeling approach itself should apply to any process system where both physical equations and process data exist.
To this end, the results reported in this study are promising, although the true potential of gray-box modeling is yet to be discovered. For example, hybrid modeling could hold great potential in the small data regime, where data-driven models are known to struggle. Several interesting research directions exist for future consideration. Among these are online learning and multitask learning.