SlurryNet: Predicting Critical Velocities and Frictional Pressure Drops in Oilfield Suspension Flows

Improving the accuracy of slurry flow predictions across different operating flow regimes remains a major focus of multiphase flow research, especially for industrial applications such as oil and gas. In this paper we develop a robust integrated method, consisting of an artificial neural network (ANN) and support vector regression (SVR), to estimate the critical velocity, the slurry flow regime change, and ultimately the frictional pressure drop for a solid–liquid slurry flow in a horizontal pipe, covering wide ranges of flow and geometrical parameters. Three distinct datasets were used to develop the machine learning models, with totals of 100, 365, and 125 data points for the critical velocity and for the frictional pressure drops in the heterogeneous and bed-load regimes respectively. For each dataset, 80% of the data were used for training and the remaining 20% for evaluating the out of sample performance. The K-fold technique was used for cross-validation. The developed integrated method significantly outperforms the widely used correlations and models in the literature. Additionally, with an average absolute relative error (AARE) of 0.084, the proposed integrated method outperformed the model developed without regime classification, which had an AARE of 0.155. The proposed integrated model not only offers reliable predictions over a wide range of operating conditions and different flow regimes for the first time, but also introduces a general framework for utilizing prior physical knowledge to achieve more reliable performance from machine learning methods.


Introduction
This paper addresses the application of machine learning (ML) methods to making accurate and relevant predictions of slurry flow behavior. Slurries are complex multi-phase systems that have been studied actively from a physical perspective for more than 70 years. Flow regime prediction is inexact, generally relying on semi-empirical correlations fitted to different data sets, which are expensive and non-trivial to gather. These regime predictions are used to make design decisions for pipelines and other transport applications where errors are costly. In applying ML methods to any mature industrial or scientific field one has two choices: (i) start from scratch with no prior knowledge; (ii) incorporate existing knowledge. The second approach is the one used here. Below, we therefore review both the relevant slurry flow fundamentals and ML applications in this domain.
Pipe flows of slurries are commonly encountered in the mining industry (slurry transport) and in oil and gas well operations: hole cleaning, hydraulic fracturing, and gravel packing. Here we deal with slurry applications in well drilling, where the relevance of multi-phase flow has long been recognized [1]. Horizontal slurry flow of sand and a Newtonian/non-Newtonian carrier liquid in a pipe geometry is encountered widely in horizontal wells, and also in gathering and transition lines. The key novelty of our approach is that we work with the known physical structure of slurry flows. First, we use dimensional analysis to eliminate redundancy in the variables. Second, we integrate two models to mimic the physical studies: (a) a model to predict the regimes and their transition; (b) knowing the regime, a model to predict the pressure drop. This improves the accuracy in a physically consistent way.
An outline of the paper is as follows. In Section 2 we outline the dimensional analysis and the development of the features used as inputs to our models for critical velocity and frictional pressure drop. Section 3 provides a brief background on the ANN and SVR models, and discusses the important hyper-parameters of each that must be tuned during training. In Section 4 we introduce our modeling and training approach in detail, especially the development of the integrated model for predicting the slurry friction factor using knowledge of the flow regime. Section 5 presents the experimental data acquired from the literature and the detailed results produced by our model, compared against well-known correlations from the literature.

Dimensional Analysis and Feature Selection
For a solid-liquid Newtonian slurry flowing through a horizontal pipe, we may assume that the steady flow depends on at least the following parameters: the pipe diameter, D̂, the liquid phase density, ρ̂_l, the solids phase density, ρ̂_s, the liquid phase viscosity, μ̂_l, the particle diameter of the solids phase, d̂_p, the gravitational acceleration, ĝ, the mean slurry velocity, Û_s, and the mean volumetric concentration of solids in the pipe cross section, C_v. The last parameter is dimensionless, whereas the rest are dimensional. Throughout this paper we write all dimensional quantities with a hat (ˆ) symbol and dimensionless parameters without.

Critical Velocity
The deposition velocity, also referred to as the critical velocity, V̂_c, is one of the key design parameters for most slurry transport systems. It is defined as the velocity below which a stationary bed exists at the bottom of the pipe. Over the past decades, many researchers have developed empirical and/or semi-empirical correlations and models to predict the critical velocity in pipe geometry. Table 1 lists the correlations suggested by Durand [21], Zandi et al. [13], Yufin [22], Oroskar and Turian [9], and Kokpinar et al. [10]. Table 1. Proposed correlations for the critical velocity.

For the prediction of the critical velocity, Û_s is replaced with V̂_c, whose value is to be derived. The critical velocity thus depends on the following parameters:

V̂_c = f(D̂, ρ̂_l, ρ̂_s, μ̂_l, d̂_p, ĝ, C_v). (1)

Some researchers proposed predictive correlations for the critical velocity that involve the particle settling velocity in the mixture, ω̂_m, and the viscosity of the mixture, μ̂_m. However, we know that ω̂_m is a function of μ̂_l, ρ̂_l, C_v, and s, and μ̂_m is a function of μ̂_l and C_v [19]. By performing dimensional analysis on the parameters in (1), we derive the following dimensionless parameters, from which the critical velocity can be predicted:

V̂_c/√(ĝD̂) = f(δ, s, Re_pw, C_v), (2)

where δ = d̂_p/D̂, s = ρ̂_s/ρ̂_l, and Re_pw = ρ̂_l ω̂ d̂_p/μ̂_l are the diameter ratio, density ratio, and particle Reynolds number respectively. It should be noted that Re_pw is based on the settling velocity in clear water, ω̂. The functional relationship (2) among the dimensionless parameters, which has four inputs (features) and one output (target), is used to develop the predictive machine learning algorithms.
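As a concrete illustration, the four features in (2) can be assembled from the raw dimensional quantities. This is a minimal sketch: the clear-water settling velocity ω̂ is approximated here by Stokes' law purely for illustration, whereas in practice measured settling velocities would be used.

```python
def critical_velocity_features(D, rho_l, rho_s, mu_l, d_p, C_v, g=9.81):
    """Return the four dimensionless features (delta, s, Re_pw, C_v).

    The settling velocity omega is from Stokes' law, an illustrative
    assumption valid only for small particle Reynolds numbers.
    """
    delta = d_p / D                     # particle-to-pipe diameter ratio
    s = rho_s / rho_l                   # density ratio
    omega = (rho_s - rho_l) * g * d_p**2 / (18.0 * mu_l)  # Stokes settling velocity
    Re_pw = rho_l * omega * d_p / mu_l  # particle Reynolds number in clear water
    return delta, s, Re_pw, C_v

# Example: 0.5 mm sand (s = 2.65) in water in a 51 mm pipe at C_v = 0.1
features = critical_velocity_features(0.051, 1000.0, 2650.0, 1e-3, 0.5e-3, 0.1)
```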

Frictional Pressure Drop
For the prediction of the frictional pressure drop (dp̂/dẑ) of the slurry flow, we can introduce two important dimensionless groups, the Reynolds number (Re) and the Froude number (Fr), which relate the balances of the representative forces and stresses in a slurry pipe flow:

Re = ρ̂_l Û_s D̂/μ̂_l, (3)
Fr = Û_s²/((s − 1) ĝ D̂). (4)

It is possible (and necessary) to define and utilize the Re and Fr numbers for the prediction of the frictional pressure drop because here the mean slurry velocity Û_s is an input parameter, in contrast to the critical velocity prediction task, where this parameter is unknown. We also define the slurry friction factor (f_sl) as the dimensionless parameter obtained from the frictional pressure drop:

f_sl = (dp̂/dẑ) D̂/(2 ρ̂_l Û_s²). (5)

Therefore, the dimensionless parameters governing the slurry friction factor are as follows:

f_sl = f(δ, s, Re, Fr, C_v). (6)

It has also been found that the friction factor f_w of clean water at the same flow parameters is useful for predicting the slurry friction factor. f_w can be derived from the Colebrook-White correlation, which gives a Darcy-Weisbach friction factor as a function of the Reynolds number and the relative roughness of the pipe (f_w = f_CW(Re, r)). We therefore add f_w as an extra feature that can potentially improve the model's predictive performance. The functional relationship (6) among the dimensionless parameters, plus f_w, thus provides six features from which the target f_sl is to be predicted.
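Because the Colebrook-White equation is implicit in the friction factor, f_w must be obtained iteratively. A minimal sketch of a fixed-point solver follows; the relative roughness value in the example is illustrative, not taken from the paper's datasets.

```python
import math

def colebrook_white(Re, rel_roughness):
    """Darcy-Weisbach friction factor from the implicit Colebrook-White
    equation, 1/sqrt(f) = -2 log10( r/(3.7 D) + 2.51/(Re sqrt(f)) ),
    solved by fixed-point iteration on x = 1/sqrt(f)."""
    x = 2.0  # initial guess for 1/sqrt(f)
    for _ in range(100):
        x_new = -2.0 * math.log10(rel_roughness / 3.7 + 2.51 * x / Re)
        if abs(x_new - x) < 1e-12:
            break
        x = x_new
    return 1.0 / x**2

# Illustrative smooth-ish pipe: Re = 1e5, relative roughness r/D = 1e-4
f_w = colebrook_white(Re=1.0e5, rel_roughness=1e-4)
```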

Machine Learning Methodology
We now briefly outline the background to the methods we have used in this study. For more detail on these methods the reader is referred to [23,24] for SVR modeling and [25,26] for ANN.

ANN Modeling
Artificial neural networks (ANNs) are composed of a large number of interconnected processing elements, called neurons or cells, that are tied together with weighted connections. Neural networks are inspired by the system of biological neurons, whose connections are provided by synapses [25,27]. The learning process in neural networks occurs in an analogous way, through training on a dataset of true inputs and outputs, during which the connection weights are iteratively adjusted to solve the specific problem at hand.
The most widely applied feed-forward ANN for supervised regression and classification is the multi-layer perceptron (MLP), which consists of an input layer, an output layer, and one or more hidden layer(s), in which each layer has a weight matrix, W, and a bias vector, b [26]. Figure 1 illustrates the architecture of an MLP network. Observe that each node in every layer of the MLP, including the bias node, is fully connected to all the nodes in the subsequent layer. The number of nodes in the input layer is equal to the number of input parameters. The output layer may also contain more than one node, corresponding to the number of predictions the network is responsible for making. However, the number of hidden layers and the number of their nodes are adjustable hyperparameters, tuned so that the model achieves the desired approximation and a suitable generalization capability.

Figure 1. The architecture of an MLP network with an input layer, two hidden layers, and an output layer.

In the forward pass, each layer l first computes the linear combination z^[l] = W^[l] a^[l−1] + b^[l] from the activations of the previous layer. Subsequently, a nonlinear activation function, ψ(z), is applied element-wise to the vector z^[l] to obtain the final values of all the nodes in layer l, contained in the vector a^[l] = ψ(z^[l]). If we have m data points in the training batch, we can write the feed-forward equations in matrix form as:

Z^[l] = W^[l] A^[l−1] + b^[l], A^[l] = ψ(Z^[l]),

where the matrix A^[l] collects the column vectors a^[l] for all m data points. The activation function acts as a mathematical gate between the inputs fed to a neuron and the output passed to the next layer. Non-linear activation functions allow the model to create complex mappings between each layer's input and output, which are vital for learning and modeling complex data [28]. Using linear activation functions, in contrast, leads to a model as simple as linear regression, with a significant under-fitting problem. An additional important requirement is that activation functions should be computationally inexpensive. The most common activation functions for MLPs are the logistic sigmoid function, ψ(z) = 1/(1 + e^(−z)), the hyperbolic tangent function, ψ(z) = (e^z − e^(−z))/(e^z + e^(−z)), the ReLU function, ψ(z) = max(0, z), and the Leaky ReLU function, ψ(z) = max(0.01z, z).
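The feed-forward pass described above can be sketched in a few lines of NumPy. The layer sizes and the ReLU choice here are illustrative, not the tuned architecture selected later in the paper.

```python
import numpy as np

def relu(z):
    """ReLU activation, psi(z) = max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def feed_forward(X, weights, biases):
    """MLP forward pass: X has shape (n_features, m) for a batch of m points.
    Hidden layers apply ReLU; the output layer is linear for regression."""
    A = X
    for W, b in zip(weights[:-1], biases[:-1]):
        A = relu(W @ A + b)          # Z[l] = W[l] A[l-1] + b[l], A[l] = psi(Z[l])
    W, b = weights[-1], biases[-1]
    return W @ A + b                 # linear output layer

# Illustrative network: 4 input features, two hidden layers of 8, 1 output
rng = np.random.default_rng(0)
sizes = [4, 8, 8, 1]
weights = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(3)]
biases = [np.zeros((sizes[i + 1], 1)) for i in range(3)]
y_hat = feed_forward(rng.standard_normal((4, 5)), weights, biases)  # batch of 5
```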
Training is an iterative process during which the network tries to "learn" the relationship between the provided input(s) and the corresponding output(s) by altering the weights and biases until it achieves a satisfactory prediction within a reasonable error margin. The weights and biases are slightly adjusted during each iteration through the training set until this task is accomplished. At this stage, the learning process on the training set is finished and the model is ready to be examined on unseen data. Iterating once over the entire training set is called one epoch. The optimal number of training epochs depends on many hyper-parameters, such as the learning rate, the optimization algorithm, the complexity of the dataset itself, etc.
The back propagation algorithm is the most popular method for modifying and adjusting the weights and biases in every iteration: the difference (error) between the ground truth and the obtained outputs is propagated back through each layer, and the weights and biases are adjusted accordingly [29]. The main goal of the back propagation process is to minimize a prespecified loss function. The most widely used loss function for training a regression problem is the mean squared error,

MSE = (1/m) Σ_{i=1}^{m} (ŷ_i − y_i)²,

where ŷ and y are the predicted and true outputs respectively. All machine learning models, including neural networks, are considered satisfactory in terms of their predictions if they perform well on an unseen dataset (test set) that was not used in the training process. In other words, the learning model has a suitable generalization capability if the out of sample error is within an acceptable margin. It is possible for a learning model to perform well on the training set but fail to make accurate predictions on the test set. This issue is referred to as overfitting, or the variance problem, in which the generalizability of the model is poor. On the other hand, if the model or MLP structure is so simple that the performance on both the training and test sets is poor, the model has high bias, or an underfitting problem. To examine a specific trained model, one should perform a cross-validation procedure in which the generalizability of the model is monitored on data that was not seen by the network during training. The most suitable model is the one with the lowest cross-validation error. This can be accomplished either by defining another set (development set) or by performing K-fold cross validation [30].
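K-fold cross-validation with the mean-squared-error loss can be sketched as follows. Plain linear least squares stands in for the actual ANN/SVR models, purely to keep the example self-contained.

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error between predicted and true outputs."""
    return float(np.mean((y_pred - y_true) ** 2))

def k_fold_mse(X, y, k=5, seed=0):
    """Average validation MSE over k folds; the stand-in model is linear
    least squares fitted on the training folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)  # fit on k-1 folds
        scores.append(mse(X[val] @ w, y[val]))                   # score on held-out fold
    return float(np.mean(scores))
```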
Finding the most suitable model for a supervised learning task is essentially a trial and error process which often involves conducting a grid search over different hyperparameters of the model. For instance, MLP hyperparameters include the learning rate, weight matrices initialization, optimization algorithm, network architecture, regularization parameter, number of training epochs, etc.

SVR Modeling
Support vector regression (SVR) is the most common application form of support vector machines (SVMs). Suppose our training set contains {(x_1, y_1), (x_2, y_2), . . . , (x_m, y_m)}, where x_i ∈ R^n are the input variables (features) and y_i ∈ R are the corresponding output (target) values. The modeling goal in ε-SVR is to propose a function f(x) that deviates by at most ε from the actual target values for all of the input variables in the training set [24,31]. Meanwhile, the obtained function should be as flat as possible to prevent the high-variance issue. For this purpose, the loss function is penalized only if the predicted output deviates from the target by more than ε. SVR considers the following linear estimation function:

f(x) = ⟨w, x⟩ + b,

where w ∈ R^n and b ∈ R denote the weight vector and bias respectively, and ⟨·, ·⟩ represents the dot product in the feature space R^n. A viable way of increasing the flatness of f(x) is to minimize the norm of w. Therefore, we transform to the following convex optimization problem:

minimize ½‖w‖²
subject to |y_i − ⟨w, x_i⟩ − b| ≤ ε, i = 1, . . . , m. (11)

The optimization problem (11) assumes there exists a function f such that the prediction errors for all the data points are within the ε margin. However, this assumption might not always be satisfied. One might also consider allowing for some errors by defining a "soft margin" loss function, which was first adapted to SVMs by Cortes and Vapnik [32] to prevent overfitting and enhance the generalizability of the proposed function/model. To this end, slack variables ξ and ξ* can be introduced to relax the infeasible constraints of the optimization problem (11). The corresponding formulation is as follows:

minimize ½‖w‖² + C Σ_{i=1}^{m} (ξ_i + ξ_i*)
subject to y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i,
⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*,
ξ_i, ξ_i* ≥ 0, (12)

where the constant C > 0 determines the degree to which output predictions with errors larger than ε are penalized, and thereby also regulates the flatness of f.
Therefore, the corresponding ε-insensitive loss function |ξ|_ε is described by:

|ξ|_ε = 0 if |ξ| ≤ ε, and |ξ|_ε = |ξ| − ε otherwise. (13)

It is found that the optimization problem (12) can be solved more simply when converted to its dual form. For this purpose, Lagrange multipliers can be used, as described in Fletcher et al., where the idea is to construct a Lagrange function from the primal objective function and the corresponding constraints by introducing a dual set of variables. Considering the Lagrangian function and its properties, and performing some mathematical operations (see details in [24]), we arrive at the following dual optimization problem:

maximize −½ Σ_{i,j=1}^{m} (α_i − α_i*)(α_j − α_j*)⟨x_i, x_j⟩ − ε Σ_{i=1}^{m} (α_i + α_i*) + Σ_{i=1}^{m} y_i (α_i − α_i*)
subject to Σ_{i=1}^{m} (α_i − α_i*) = 0, α_i, α_i* ∈ [0, C], (14)

where α_i, α_i* are the Lagrange multipliers. In practice, only some of the coefficients (α_i − α_i*) are non-zero, due to the specific character of the quadratic programming problem (14). The input vectors x_i whose coefficients are non-zero are referred to as support vectors (SVs). The SVs can be considered the data points that carry the information content of the entire training dataset. The final form of the estimation function using (14) is:

f(x) = Σ_{i=1}^{m} (α_i − α_i*)⟨x_i, x⟩ + b. (15)

From Equation (15) it can be deduced that w is a linear combination of the training points, w = Σ_{i=1}^{m} (α_i − α_i*) x_i. Figure 2 shows a schematic diagram of a non-linear support vector regression using the ε-sensitive loss function. As can be observed, according to (13), if the predicted value of a data point (blue dots) lies within the ε-tube, the loss is zero; otherwise (red dots), the loss equals the magnitude of the difference between the predicted value and the radius ε of the tube. As can be observed in Figure 2, the SVR algorithm tries to situate the tube around the data points with the help of the support vectors (green dots). Figure 2. The diagram of non-linear support vector regression with a soft margin using an ε-sensitive loss function. The circular dots represent the data points: blue and red dots are the points inside and outside the ε-tube respectively, and green dots are support vectors.
The true power of SVMs is achieved by introducing non-linearity into the original algorithm. This can be accomplished by mapping the training data points to another feature space Φ : R^n → R^k of higher dimension k > n, in which the dot product operation can be substituted by a kernel function, i.e., K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩. In other words, the kernel function represents the dot product in a higher-dimensional feature space. Substituting the kernel function into Equation (15) introduces the mentioned non-linearity into the SVM algorithm:

f(x) = Σ_{i=1}^{m} (α_i − α_i*) K(x_i, x) + b. (16)

It is important to note that kernel functions must have some specific characteristics so that they correspond to a dot product operation in some other feature space. The two most widely used kernel functions in the SVM algorithm are as follows:

Polynomial: K(u, v) = (⟨u, v⟩ + 1)^P, (17)
Gaussian radial basis function: K(u, v) = exp(−‖u − v‖²/(2σ²)), (18)

where u, v are the kernel arguments, P is the degree of the polynomial, and σ is the width of the radial basis function (RBF). Similar to ANNs, it is important to find an optimized set of hyper-parameters for the SVR algorithm so that the proposed hypothesis function f offers acceptable generalization performance and avoids the high-bias or high-variance problem. The tunable hyperparameters of SVR are ε, C, the kernel type, and the corresponding kernel parameters, e.g., P for the polynomial kernel and γ = 1/(2σ²) for the RBF kernel. Choosing a specific kernel type is usually based on domain knowledge of the application and should also reflect the distribution of the dataset. C determines the trade-off between the flatness of the hypothesis function and the degree up to which deviations larger than ε are tolerated. In other words, it also has a regularization effect, such that the smaller the value of C, the more significantly the objective function in (12) is regularized.
The parameter ε defines the radius of a "tube" zone in which the loss function is zero: a larger ε leads to a flatter hypothesis function with fewer support vectors. It can therefore be deduced that both C and ε affect the model complexity.
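The ε-insensitive loss and the RBF kernel discussed above can be written directly. A minimal sketch, with γ = 1/(2σ²) as in the text:

```python
import numpy as np

def eps_insensitive_loss(y_pred, y_true, eps):
    """Epsilon-insensitive loss: zero inside the eps-tube, linear outside."""
    return np.maximum(0.0, np.abs(y_pred - y_true) - eps)

def rbf_kernel(u, v, gamma):
    """Gaussian RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.exp(-gamma * np.sum((u - v) ** 2)))
```

A point predicted within ε of its target contributes no loss, which is what keeps only a subset of the training points active as support vectors.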

Modeling Approach
The purpose of this study is to develop learning models using the ANN and SVR algorithms for prediction of the critical velocity and frictional pressure drop of slurry flow in pipe geometry. For the critical velocity, we use the four dimensionless features δ, s, Re_pw, and C_v developed in Section 2.1 as inputs to develop the above-mentioned learning algorithms with satisfactory generalizability. For the prediction of the frictional pressure drop, however, we also need to understand the effect of the slurry flow regime on the friction factor. Figure 3 shows a schematic of the frictional pressure drop as a function of the mean slurry velocity for the different flow regimes. The slurry flow regime is governed by the competition between the turbulent eddies and the particle settling tendency due to gravity: the former tends to suspend the solid particles in the carrier liquid, while the latter drives the particles to settle at the bottom of the pipe. The frictional pressure drop of a slurry flow depends on the various stresses and forces present, whose nature and strength depend strongly on the flow regime [6,16]. At low flow rates, the turbulent eddies are not strong enough to suspend the solid phase. As a result, a considerable portion of the pipe is occupied by a stationary sedimentation bed, above which there is a heterogeneous layer with a recognizable solids concentration gradient. This regime of slurry flow is also referred to as the bed-load regime. As observed in Figure 3, the frictional pressure drop decreases with the mean velocity in this regime. This is explained by the fact that at low velocities the stresses and forces are dominated by the solids phase, and these weaken as the slurry velocity increases.
As the mean superficial velocity increases, the turbulent eddies become more capable of suspending the solids, until the entire static bed layer is eroded and there is a moving bed layer at the bottom of the pipe whose concentration is close to maximum packing. As the flow rate is further increased, we reach the heterogeneous or fully suspended regime, in which there is a solids concentration gradient in the direction of gravity. At extremely high flow rates, turbulent eddies become significantly more dominant and the solid phase becomes progressively more homogeneously distributed in the carrier liquid. As shown in Figure 3, the frictional pressure drop increases with the mean velocity through the saltation, heterogeneous, and homogeneous regimes. Furthermore, the rate of increase of the pressure drop itself grows at higher velocities, as the liquid phase plays a more dominant role in the suspension stresses.
As noted above, the frictional pressure drop behavior changes noticeably when the regime passes from bed-load to saltation flow, i.e., at the critical velocity. Therefore, we can introduce this prior knowledge into our predictive modeling approach. Figure 4a,b show the workflow charts for developing our predictive models. We develop two separate learning models, with satisfactory accuracy and generalization capability, for the bed-load and heterogeneous flow regimes, according to the workflow chart illustrated in Figure 4a. For this task, we also need to train the two models on separate datasets representing the corresponding regimes. To examine the generalizability of the developed predictive model for the frictional pressure drop, we first determine the flow regime using the developed model for the critical velocity. Subsequently, we feed the six dimensionless parameters (see Section 2.2) as features to the corresponding predictive learning model for frictional pressure drop prediction. This procedure is illustrated in Figure 4b for further clarification of our integrated scheme for predicting the slurry friction factor. Consequently, we have one dataset for the critical velocity and two distinct datasets for the frictional pressure drop: one for the bed-load regime and one for the rest of the regimes.
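The regime-routing step of the integrated scheme reduces to a small dispatch function. In this sketch the three model arguments are placeholders for the trained SVR/ANN models; here they are illustrated with trivial callables.

```python
def predict_friction_factor(features, U_s, critical_velocity_model,
                            bed_load_model, heterogeneous_model):
    """Integrated prediction: identify the regime by comparing the mean slurry
    velocity U_s with the predicted critical velocity, then route the features
    to the matching regime-specific friction-factor model."""
    V_c = critical_velocity_model(features)
    if U_s < V_c:                    # stationary bed present: bed-load regime
        return bed_load_model(features)
    return heterogeneous_model(features)

# Illustrative usage with placeholder models
result = predict_friction_factor(
    features=None, U_s=1.0,
    critical_velocity_model=lambda f: 2.0,      # pretend V_c = 2.0 m/s
    bed_load_model=lambda f: "bed-load",
    heterogeneous_model=lambda f: "heterogeneous")
```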
We develop the most suitable ANN and SVR predictive models for each of the three datasets via a grid search over their corresponding hyperparameters. The chosen ANN hyperparameters for tuning include the architecture of the network, i.e., the number of hidden layer(s) and neurons in each hidden layer, the activation function, the number of training epochs, and the learning rate; those for SVR include C, ε, the kernel type, and the kernel parameter (the polynomial degree for the polynomial kernel, and γ = 1/(2σ²) for the radial basis function). We then pick the model with the best validation score as our final proposed model. For model development we take 80% of each dataset randomly as the training set and the remaining 20% as the test set. We perform 5-fold cross validation on the training set to examine the generalization capacity of the model on data it was not trained on. The best model, with its specific set of hyperparameters, is chosen based on this validation score.
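The grid search itself is a plain exhaustive loop over hyperparameter combinations, keeping the combination with the best validation score. The toy scoring function below stands in for the 5-fold cross-validation loss of an actual model.

```python
from itertools import product

def grid_search(param_grid, score):
    """Exhaustively score every hyperparameter combination and return the
    one with the lowest validation score."""
    best_params, best_score = None, float("inf")
    for combo in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        s = score(params)
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score

grid = {"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1]}
# Toy stand-in score whose minimum is at C = 1.0, epsilon = 0.1
best, _ = grid_search(grid, lambda p: (p["C"] - 1.0) ** 2 + (p["epsilon"] - 0.1) ** 2)
```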

Results and Discussion
As the magnitudes of the input features differ significantly, the data should be normalized before being fed to the training algorithms. If the inputs are on different scales, the weights connected to the inputs with larger scales are updated much faster than the others, which can considerably hurt the learning process. There are also various practical reasons why normalizing the inputs can make training faster and reduce the chances of getting stuck in local optima. We use the standard normalization as follows:

x_norm,i = (x_i − μ_train)/σ_train, (19)

where x_norm,i is the normalized input of the ith sample, and μ_train and σ_train are the mean and standard deviation of the data points in the training set. The output is normalized in a similar way to (19). Table 2 shows the parameters of the 100 experimental data points collected from the literature measuring the critical velocity, which we use to train and test our proposed models. Additionally, Figure 5a-e show the estimated probability density function and box plot of all the input features along with the output, which provide insightful information about the distribution and statistical parameters of the dataset. Each data point is the result of an experimental test by the listed authors, performed in different flow loop facilities. As can be observed, these experiments cover a wide range of particle sizes, d̂_p = 0.23–5.34 mm, pipe diameters, D̂ = 0.025–0.152 m, mean solids concentrations, C_v = 0.007–0.30, and density ratios, s = 1.04–2.68. Most of the data are taken from the measurements conducted by Kokpinar et al. [10], who used coarse particles of different materials, partly to observe the effect of s on the critical velocity. They used sand, coarse sand, coal, blue plastic, black plastic, fine tuff, and coarse tuff, with specific densities of s = 2.60, 2.55, 1.74, 1.20, 1.35, 1.31, and 1.04 respectively.
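A minimal sketch of the normalization in (19); the key point is that the training-set statistics are reused unchanged for any other split, so no test-set information leaks into training.

```python
import numpy as np

def fit_scaler(X_train):
    """Compute mu_train and sigma_train, column-wise, on the training set only."""
    return X_train.mean(axis=0), X_train.std(axis=0)

def transform(X, mu, sigma):
    """Apply x_norm = (x - mu_train) / sigma_train to any split."""
    return (X - mu) / sigma

# Illustrative data: 80 training samples with 4 features
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(80, 4))
mu, sigma = fit_scaler(X_train)
X_train_norm = transform(X_train, mu, sigma)
```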
We train and obtain a validation score (loss) on the randomly chosen 80 data points (training set) and report the out of sample results on the remaining 20 data points. Table 3 shows the optimum hyperparameters for the SVR algorithm, Table 4 shows the parameters of the experimental data points collected from the literature, and Table 5 shows the optimum hyperparameters for the ANN algorithms. Tables 3 and 5 also show the corresponding validation losses. It should be noted that the validation loss refers to the average mean squared error obtained by 5-fold cross validation. As observed, the optimum SVR model outperforms the ANN in terms of the validation score, and hence the generalization capability. Therefore, the SVR model is chosen as the final prediction model for the critical velocity. Table 6 shows the performance of the chosen model on the training and test sets in terms of the average absolute relative error (AARE), the cross-correlation coefficient (R), and the standard deviation of the error (σ). We can compare the proposed model's performance against the most widely used predictive correlations in the literature, listed in Table 1. The out of sample average absolute relative errors are 0.099, 0.153, 0.308, 0.322, 0.412, and 0.447 for the proposed SVR model, Kokpinar et al. [10], Oroskar and Turian [9], Durand [11], Yufin [22], and Zandi et al. [13] respectively. It is evident that the prediction error of the critical velocity is reduced considerably in the present work. Figure 6 shows the parity plot of the experimentally measured and predicted dimensionless critical velocity for the training and test sets, with AAREs of 0.073 and 0.099 respectively.
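The reported error metrics can be computed as follows; AARE is the mean of the absolute relative errors, and σ here denotes the standard deviation of the relative error.

```python
import numpy as np

def aare(y_pred, y_true):
    """Average absolute relative error, (1/N) * sum(|y_pred - y_true| / |y_true|)."""
    return float(np.mean(np.abs((y_pred - y_true) / y_true)))

def error_std(y_pred, y_true):
    """Standard deviation of the relative error."""
    return float(np.std((y_pred - y_true) / y_true))
```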
We have also directly compared the performance of our model with that of Kokpinar et al. [10] on their own 42 experimental data points. Figure 7 shows the parity plot of the corresponding predictions versus the measured dimensionless critical velocities. The AAREs of the estimations are 0.142 and 0.062 for the Kokpinar et al. [10] and present models respectively. As observed in Figure 7, the present model performs better in particular where V̂_c/√(ĝD̂) > 1.8. Figure 8 illustrates the effect of the hyperparameter C on the loss function (mean squared error) of the training, validation, and test sets. As mentioned in Section 3.2, C determines the trade-off between the flatness of the hypothesis function and the degree up to which deviations larger than ε are tolerated in the SVR algorithm. In practice it also has a regularization effect, such that the lower the value of C, the more the objective function is regularized. As seen in Figure 8, there is an optimal C at which the loss function is minimized on the validation and test sets, below which the hypothesis function suffers from high bias (under-fitting) and above which it suffers from high variance (over-fitting). Naturally, the values of all hyperparameters, including C, are chosen based on the validation score. Table 4 shows the parameters of the experimental data points collected from the literature measuring the frictional pressure drop for the heterogeneous and bed-load regimes, which we use to train and test our proposed models. The total numbers of experimental data points are 365 and 125 for the heterogeneous and bed-load regimes respectively. As can be observed, the experiments mostly used fine particles, except for Doron et al.'s data [38] and part of Durand's measurements in the bed-load regime [5], where particle sizes of d̂_p = 0.23 mm and 5.34 mm were used respectively.
Pipe diameters in the range D̂ = 0.051–0.155 m were used in the experiments, with a flow velocity range of Û_s = 0.24–7.77 m/s and mean delivered solids concentrations of C_s = 0.042–0.40. Most of the experiments were conducted using sand particles with density ratios of s = 2.44–2.87, except for Doron et al.'s work, where General Electric "Black Acetal" with a density ratio of s = 1.24 was used [38]. Figure 9a-g show the kernel density estimation and box plot of all the input features along with the output for both the heterogeneous and bed-load regime datasets. An interesting observation is that the distributions of Fr and f_sl differ considerably between the two regimes. The reason is that, according to (4) and (5), both of these dimensionless variables include the term Û_s² in their definitions, and the mean slurry velocity in the bed-load regime is less than that of the heterogeneous regime. Therefore, Fr is considerably lower, while f_sl is larger, in the bed-load regime compared to the heterogeneous regime. Figure 9. Kernel density distribution and box plot of all the input features (C_v, s, Re, Fr, δ, f_w) and the output (f_sl) (a-g, respectively) for the slurry friction factor prediction model in the heterogeneous and bed-load regimes.
Similar to the critical velocity case, we randomly take 80% of the dataset for training and validation, and the remaining 20% as the test set for evaluating the out of sample performance. As can be observed from Tables 3 and 5, the best developed ANN models outperform SVR for both the heterogeneous and bed-load regimes. Figure 10a,b show the parity plots comparing the measured and predicted slurry friction factor for both regimes. The corresponding out of sample results are given in Table 6. For a fair comparison against the existing correlations and models from the literature, we also need to investigate the integrated method's performance in predicting the frictional pressure drop. In other words, we would like to determine the out of sample error when the prior knowledge of the flow regime is ignored, which is the case in real-life scenarios, particularly in industrial applications. To serve this purpose, we feed each data point to the developed SVR algorithm for critical velocity prediction and compare the predicted result with the mean slurry velocity as a means of identifying the regime. For this process the key assumption is C_v = C_s at the critical velocity, which is a reasonable assumption to make. After the regime identification, we feed the data point to the corresponding model for predicting the frictional pressure drop. The out of sample results for the integrated method's predictions are shown in Table 6.
Once again, we can compare the out of sample AARE against that of some recognized correlations and models available in the literature for predicting the pressure drop. For slurry friction factor prediction in the heterogeneous regime, the AAREs of the correlations developed by Zandi and Govatos [13], Durand and Condolios [5], and Turian and Yuan [6] are 0.643, 0.449, and 0.348, respectively, whereas for the bed-load regime, the AAREs of the models proposed by Gruesbeck et al. [14], Penberthy et al. [15], and Turian and Yuan [6] are 0.837, 0.769, and 0.529, respectively. It is clear that the prediction performance of the current study, with an AARE of 0.084, significantly outperforms that of the mentioned models. Figure 11 illustrates the effect of the epoch number, a key hyperparameter for ANNs, on the loss function of the training, validation, and test sets for the heterogeneous regime ANN model. As can be observed, there is an optimal epoch number for training, after which the validation loss starts to increase; in other words, after around 400 training epochs the model over-fits the training dataset. To investigate whether the proposed integrated method is indeed required for a satisfactory prediction of the frictional pressure drop, we also performed batch training using all 450 frictional pressure drop data points, without any supervised or unsupervised classification based on the flow regime. We trained and tested another learning model under this condition, following the same procedure as for the other models. Tables 3 and 5 show that, in terms of generalization capacity, the SVR model performs better than the ANN in this case. Figure 12 illustrates the corresponding parity plot of the measured slurry friction factor against the predicted values.
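The epoch selection behaviour seen in Figure 11 is what early stopping automates: halt training once the validation loss has stopped improving and keep the best epoch. A minimal, framework-agnostic sketch of such a monitor is shown below; it illustrates the idea only and is not the training loop used in the paper.

```python
class EarlyStopping:
    """Track validation loss across epochs and signal when to stop.

    Training halts once the validation loss has not improved for
    `patience` consecutive epochs; the best epoch is retained so the
    model weights from that epoch can be restored.
    """

    def __init__(self, patience=20):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def should_stop(self, epoch, val_loss):
        """Record this epoch's validation loss; return True to halt training."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

For the heterogeneous regime model discussed above, such a monitor would halt training shortly after the validation loss bottoms out near epoch 400.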
To compare the performance of the batch-trained model with the integrated method, the corresponding parity plots are shown in Figure 13a,b. Figure 13a shows the measured and predicted slurry friction factor for the integrated method. As can be observed, there are four heterogeneous data points whose regime was incorrectly classified as bed-load (blue squares), and three bed-load data points that were misclassified. The predicted slurry friction factor for the misclassified heterogeneous data points tends to be higher than the measured value, whereas the reverse is true for the misclassified bed-load data points. The reason is that the slurry friction factor in the bed-load regime is generally greater than in the heterogeneous regime. Nevertheless, the out of sample results of the integrated method are more satisfactory than those of the batch-trained model, with an AARE of 0.084 for the former and 0.155 for the latter, as shown in Table 6. Consequently, although the integrated method's prediction depends strongly on the performance of the regime classification, i.e., the SVR model for critical velocity prediction, it is more effective in practice to classify the flow regime first, as done in this work, before feeding the data to the model for satisfactory prediction of the friction factor.
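The AARE values compared throughout this section follow the usual definition of average absolute relative error, the mean of |predicted − measured|/|measured|; we assume this standard form here, which in code is simply:

```python
def aare(measured, predicted):
    """Average absolute relative error: mean of |pred - meas| / |meas|.

    Standard-definition sketch, assumed to match the metric used in the text.
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) / abs(m)
               for m, p in zip(measured, predicted)) / len(measured)
```

On this scale, the integrated method's 0.084 means its friction factor predictions are off by about 8.4% of the measured value on average, versus about 15.5% for the batch-trained model.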

Summary
We have developed a robust integrated method using ANN and SVR algorithms for the prediction of critical velocity and frictional pressure drop by identifying and incorporating existing knowledge of the main slurry regimes. The proposed model clearly outperforms existing well-known and widely used correlations and models for the prediction of critical velocity and frictional pressure drop. Furthermore, it overcomes the limitation of previous machine learning models, which only targeted the estimation of frictional pressure drop in the heterogeneous regime.
The features have been extracted based on a dimensional analysis of the geometrical and flow parameters that appear in the governing equations of a solid–liquid slurry flow in a pipe. This ensures that we preserve all the data information with the smallest input dimension, which is one of the main goals in developing machine learning algorithms and other predictive methods. Indeed, this is a relatively simple step that can be taken for any physical/mechanical scenario. Additionally, we have shown that the slurry friction factor estimation improves noticeably when the data are classified by regime before being fed to the developed model.
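The dimensional analysis step can be sketched as follows. The exact definitions used in the paper are given by its Equations (4) and (5), which are not reproduced here; the forms below (pipe Reynolds number, densimetric Froude number) are common textbook choices and should be treated as assumptions. The water friction factor f_w is omitted since it requires a separate single-phase friction correlation.

```python
def dimensionless_features(U_s, D, rho_l, mu_l, rho_s, d_p, C_v, g=9.81):
    """Reduce raw pipe/flow quantities to dimensionless inputs.

    Illustrative textbook definitions (assumed, not taken from the paper):
      U_s   mean slurry velocity [m/s]      D     pipe diameter [m]
      rho_l, mu_l  liquid density [kg/m^3] and viscosity [Pa s]
      rho_s solid density [kg/m^3]          d_p   particle diameter [m]
      C_v   delivered solids concentration [-]
    """
    s = rho_s / rho_l                   # solid-liquid density ratio
    Re = rho_l * U_s * D / mu_l         # pipe Reynolds number
    Fr = U_s**2 / (g * D * (s - 1.0))   # densimetric Froude number (contains U_s^2)
    delta = d_p / D                     # particle-to-pipe diameter ratio
    return {"C_v": C_v, "s": s, "Re": Re, "Fr": Fr, "delta": delta}
```

Whatever the precise definitions, the point is the same: seven or eight raw quantities collapse to a handful of dimensionless groups, so the learning problem retains the physics while its input dimension shrinks.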
One limitation of the proposed integrated method is that its accuracy depends strongly on the regime classification performance. However, the overall prediction accuracy can be improved by ensuring that the data used for training the critical velocity and frictional pressure drop models are drawn from the same distribution. Another limitation of this study is the limited number of data points available for efficient training of the proposed models; using more complex machine learning methods, together with more data as they become available, is a promising direction for future work.
In general, the message of the paper is that one should not discard old methodologies in the assumption that new machine learning algorithms will automatically solve all problems. The challenge in industrial applications, where we need to predict important variables, is to integrate new predictive methodologies with the old and with our prior physical knowledge and know-how. In this respect our results are promising, showing a significant advance in predictive ability with a small investment in dimensional analysis.

Funding: This research has been carried out at the University of British Columbia, supported financially by NSERC and Schlumberger through CRD project 505549-16, and by UBC through the 4YF programme.