Modeling friction factor in pipeline flow using a GMDH-type neural network

ABSTRACT: The standard methods of calculating the fluid friction factor, the Colebrook–White and Haaland equations, require iterative solution of an implicit, transcendental function, which entails high computational cost for large-scale piping networks while introducing as much as 15% error. This study applies the group method of data handling to the development of an artificial neural network, optimized by multi-objective genetic algorithms, to find an explicit polynomial model for friction factor. We developed a relatively simple and explicit model for friction factor that performs well over the entire range of applicability of the Colebrook–White equation: Reynolds number from 4,000 to 10⁸ with relative roughness ranging from 5 × 10⁻⁶ to 0.05. For a network with only two hidden layers and a total of five neurons, this model was found to have a mean relative error of only 3.4% in comparison with the Colebrook–White equation; the coefficient of determination (R²) over the range of input data was calculated to be 0.9954. The accuracy and simplicity of this model may make it preferable to traditional, transcendental representations of fluid friction factor. Further, this method of model development can be applied to any pertinent data-set; that is, the model can be tuned to the physical situation and input data range of interest.

ABOUT THE AUTHOR Saeb M. Besarati earned his bachelor's and master's degrees in mechanical engineering from the University of Guilan, Iran. He earned his PhD in chemical engineering from the University of South Florida (USF) in 2014. The focus of Dr. Besarati's work has been the use of advanced numerical techniques to solve complex engineering problems in the field of clean energy research. He conducted his doctoral studies at the USF Clean Energy Research Center under the guidance of Dr. Yogi Goswami. His doctoral research concerned the analysis of supercritical carbon dioxide power cycles for concentrated solar power applications. During his PhD study, he contributed to the writing of two book chapters and ten technical papers. He has been a lecturer for courses in the Department of Mechanical Engineering at the Azad University of Takestan and the Department of Chemical Engineering at USF. He is currently an engineering analyst with the firm eSolar.

PUBLIC INTEREST STATEMENT
In this paper, we analyze a classical engineering problem, that of friction in pipeline flow, using the comparatively modern techniques of artificial neural networks and genetic algorithms. In the process, we obtain a set of simple (i.e. non-transcendental) equations that give excellent agreement with the Colebrook-White equation. The method can also be tailored to experimental data for more specific applications.

Introduction
The flow of real fluids (e.g. through pipes, ducts, etc.) involves the loss of mechanical energy through the friction that accompanies flow-induced shear stresses. The magnitude of this loss depends on the viscosity of the fluid(s) in question as well as the size of the fluid flow system. Often, this loss can be quite severe, and calculations to determine the degree of viscous friction become a necessary part of the design, implementation, and control of engineered applications of fluid flow. In process control especially, such calculations must be repeated continually, and for systems of substantial size (e.g. large-scale piping networks) they constitute a high computational cost.

Colebrook-White equation
A common and comparatively simple context for these fluid flow analyses is fully developed Newtonian flow in pipes or ducts. Through dimensional analysis, it is possible to express the viscous losses as a function of a dimensionless friction factor; as a matter of convention, fluid mechanics texts typically choose the Darcy friction factor, f, which is a function of Reynolds number, Re, and the relative roughness of the pipe, ϵ/D. In such a way, the loss of mechanical energy, or head loss, h_f, as it is termed when expressed as an equivalent height of fluid, can be calculated with the Darcy-Weisbach equation (White, 1998).

h_f = f (L/D) (V²/2g)  (1)

Here, L represents the length of the pipe or duct, D is the hydraulic diameter, V is the average flow velocity, and g is the acceleration due to gravity. It should be noted that this equation is valid for any flow regime or pipe/duct geometry, provided the hydraulic diameter and the Darcy friction factor are appropriately derived.
The straightforward nature of the calculation of head loss for a given friction factor (i.e. Equation 1) belies the complexity of the calculation of the friction factor itself. While, in the laminar case (Re < 2,000), the friction factor becomes a rather simple function of Reynolds number (White, 1998), turbulent flow requires more complicated, typically transcendental functions to capture the variation of the friction factor with respect to Reynolds number and relative roughness. The most broadly applicable function for determining the friction factor in turbulent flow regimes is the Colebrook-White equation, developed by the eponymous researcher, Colebrook (1939).
1/√f = −2 log₁₀( (ϵ/D)/3.7 + 2.51/(Re√f) )  (2)

This expression is known to apply over a range in Reynolds number of approximately 4,000–10⁸ (White, 1998).
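Because the friction factor appears on both sides of the Colebrook-White equation, it must be solved iteratively for a given Reynolds number and relative roughness. A minimal sketch in Python, iterating on x = 1/√f; the initial guess, tolerance, and iteration cap are assumptions, not values from the study:

```python
import math

def colebrook_white(re, rr, tol=1e-10, max_iter=100):
    """Solve the Colebrook-White equation for the Darcy friction factor f,
    given Reynolds number `re` and relative roughness `rr` (eps/D), by
    fixed-point iteration on x = 1/sqrt(f)."""
    x = 4.0  # plausible 1/sqrt(f) for turbulent flow (assumed starting point)
    for _ in range(max_iter):
        x_new = -2.0 * math.log10(rr / 3.7 + 2.51 * x / re)
        if abs(x_new - x) < tol:
            return 1.0 / x_new ** 2
        x = x_new
    raise RuntimeError("Colebrook-White iteration did not converge")
```

For Re = 10⁵ and ϵ/D = 0.001 this converges in a handful of iterations to f ≈ 0.022, consistent with the Moody diagram.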
It is clear that, owing to the implicit nature of the friction factor in Equation 2, iterative methods are required to calculate the friction factor for a given Reynolds number and relative roughness. Although approximations of the Colebrook-White equation that are explicit in the Darcy friction factor f have been formulated, they tend to apply to specific flow scenarios and/or pipe geometries, and they introduce additional error. The Haaland equation (Equation 3), for example, assumes the same basic form as the Colebrook-White, but is explicit in terms of the friction factor; it is among the more accurate approximations, varying less than 2% from the behavior of Equation 2 (Haaland, 1983; White, 1998).

1/√f = −1.8 log₁₀[ ((ϵ/D)/3.7)^1.11 + 6.9/Re ]  (3)
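The explicit Haaland approximation, by contrast, can be evaluated directly with no iteration. A sketch in Python (the function name is ours; the constants are those of the published Haaland equation):

```python
import math

def haaland(re, rr):
    """Explicit Haaland approximation for the Darcy friction factor:
    1/sqrt(f) = -1.8 * log10((rr/3.7)**1.11 + 6.9/re)."""
    inv_sqrt_f = -1.8 * math.log10((rr / 3.7) ** 1.11 + 6.9 / re)
    return 1.0 / inv_sqrt_f ** 2
```

For Re = 10⁵ and ϵ/D = 0.001, this gives f ≈ 0.022, within about 2% of the iterated Colebrook-White value.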
It should be noted that the Colebrook-White equation itself is known to introduce a degree of error, perhaps as high as 15%, over its range of applicability (White, 1998). Of course, this error applies with respect to the real (i.e. experimentally quantified) behavior of fluids. The error in the Haaland equation, on the other hand, applies with respect to the performance of the Colebrook-White equation, thereby compounding the overall error in a design calculation, potentially to a degree greater than 15% of the real value.

Calculation of friction factor via neural networks
For calculations concerning smaller data-sets, the iteratively solved Colebrook-White equation will generally suffice. However, a review of recent literature demonstrates the need for more efficient methods of computation, especially for larger, more complex piping networks, such as large-scale micro-irrigation systems (Shayya & Sablani, 1998), combined sewer systems (Haaland, 1983), etc. In particular, much work has been done to employ artificial neural networks (ANNs) and evolutionary algorithms (EAs) to calculate pipe flow friction factors. Indeed, such numerical techniques have proved to be a powerful tool in solving non-linear, transcendental, or otherwise complex mathematical relations such as the Colebrook-White equation, as they allow for the modeling of perhaps mathematically intractable phenomena with little or no a priori knowledge of the mathematical characteristics of the physical processes that inhere therein.
An ANN is typically structured in three distinct parts: an input layer, an output layer, and any number of hidden layers between the input and output. Each of these layers is made up of neurons, and the neurons of adjacent layers are interconnected. An ANN is developed via "training" with a data-set, allowing the network to adjust the relative weights of the connections between neurons until an output of acceptable error is produced (Fausett, 1993). In the case of the Colebrook-White equation considered in this study, the network is made up of two input neurons, corresponding to each input (Reynolds number and relative roughness), an undetermined number of neurons in the hidden layers to operate on the input data, and one output neuron (the desired calculable quantity, the friction factor). The input or output data used to train the network may be transformed in any number of ways to improve the performance of the resultant model; for example, the base-ten logarithm of the Reynolds number may be chosen as an input in lieu of the Reynolds number itself, as described in the methodology section. Of course, additional input neurons may be introduced by considering the individual parameters that constitute the Reynolds number (i.e. fluid velocity, effective diameter, and kinematic viscosity) (Mittal & Zhang, 2007), though such a complication was deemed unnecessary for the purposes of this study.

Literature review
A significant amount of effort has been expended on the application of ANNs to the problem of fluid flow friction factor determination in both Newtonian (Bilgil & Altun, 2008; Fadare & Ofidhe, 2009; Mittal & Zhang, 2007; Özger & Yildirim, 2009; Shayya & Sablani, 1998; Yazdi & Bardi, 2011; Yuhong & Wenxin, 2009) and non-Newtonian (Mittal & Zhang, 2007; Sablani, Shayya, & Kacimov, 2003; Shayya, Sablani, & Campo, 2005) flows. For reference, a summary of these studies is included here as Table 1. Inasmuch as the Colebrook-White equation assumes Newtonian flow, this study will focus primarily on the work dealing with such flow behavior. While in some cases the data range for Reynolds number employed in ANN development was outside the acceptable range for the Colebrook-White equation (Fadare & Ofidhe, 2009; Shayya & Sablani, 1998), very close agreement was obtained for sufficiently large networks and appropriate transformation of input and/or output parameters. In one particular study, an attempt was made to fit the friction factor behavior in both the laminar and turbulent flow regimes, though the inclusion of points in the highly indeterminate transition region introduces some uncertainty as to the reliability of the model and its ostensibly low error (Mittal & Zhang, 2007). Another study performed a model fit of the Haaland equation (Equation 3) with very high accuracy, although, here again, points were included in the training of the network that fall outside the realm of acceptability for such empirical relations (Yazdi & Bardi, 2011). The work performed in this field demonstrated that rather complex ANNs achieve a high degree of accuracy in modeling experimental data (Yuhong & Wenxin, 2009), the Colebrook-White equation (Fadare & Ofidhe, 2009; Mittal & Zhang, 2007; Shayya & Sablani, 1998), and other empiricisms of Newtonian flow (Bilgil & Altun, 2008; Mittal & Zhang, 2007; Yazdi & Bardi, 2011; Yuhong & Wenxin, 2009).
No explicit relations for the friction factor were put forth in these works. There is a tendency in ANN-based modeling to develop the network model as a "black box" (Tufail & Ormsbee, 2006): a complex, abstruse system of functions that converts input data to desired results, oftentimes quite accurately, yet with little innate understanding of the mathematical machinery of the model. For the problem considered in this paper, it is clear that such a complex model defeats the purpose of approximating the Colebrook-White equation in the first place. Equation 3 is acceptable in terms of approximating the Colebrook-White equation, although its transcendental nature makes it less desirable, in terms of computational effort, than a simple polynomial (Davidson, Savic, & Walters, 1999). However, a simply stated transcendental function is certainly more tractable than a complex system of intermarried polynomial functions, such as those often output by proprietary neural network software packages. The maxim of model parsimony may suggest the choice of the simpler and slightly less accurate over the more accurate and more unwieldy, a balancing act described in the non-Newtonian flow ANN modeling of Shayya et al. (2005). For more information on the black-box method employing software packages for ANN development, the reader is encouraged to peruse the pertinent references (Fadare & Ofidhe, 2009; Mittal & Zhang, 2007; Shayya & Sablani, 1998; Yazdi & Bardi, 2011).
Also of unique interest in this field is the combination of fuzzy logic with ANNs in model development. An adaptive neuro-fuzzy inference modeling system involves the use of stochastic membership functions and conditional statements with associated model functions to fit real values of interest from pseudo-qualitative assessment of input data (e.g. low, medium, and high). Such a system was shown to have near-perfect agreement with the Colebrook-White equation over the Reynolds number range of interest, 4,000–10⁸ (Özger & Yildirim, 2009). Unfortunately, the multitude of fuzzy rules and attendant modeling functions render this method a black box as well.
EAs present a potential improvement when wedded with conventional ANN techniques (Onwubolu, 2009). The idea behind the use of such genetic algorithms in this context is that models or functional interactions that produce the best fit to the training data have the best chance of reproducing and therefore shaping the final model system. Already, genetic algorithms, without the use of neural networks, have been employed to fit the Colebrook-White equation. In one study, standalone polynomial functions of varying complexity were developed with such a technique, and most were shown to yield a good fit (Davidson et al., 1999). This work was extended in a subsequent study that utilized a fixed function set genetic algorithm that considered a wider variety of functional forms in its model development (Tufail & Ormsbee, 2006). Explicit functions for the Colebrook-White equation were presented in both cases; however, the models developed applied only over a relatively narrow range of Reynolds number and relative roughness (10⁵ ≤ Re ≤ 10⁶; 0.001 ≤ ϵ/D ≤ 0.01). A wider range of Reynolds number and relative roughness (4 × 10⁴ ≤ Re ≤ 10⁸; 10⁻⁶ ≤ ϵ/D ≤ 0.05) was considered in the gene expression programming (GEP) analysis of Samadianfard (2012). The explicit expression obtained in that study compared well with the various approximations described therein; however, a separate study highlighted the potentially high errors obtained with its use and also suggested similar expressions with improved accuracy (Vatankhah, 2014). It should be noted that the GEP techniques referenced here, while generally related to genetic algorithms, make no use of ANNs.

Investigation objectives
The main goal of this study is to apply the group method of data handling (GMDH) ANN, optimized by multi-objective genetic algorithms, to find an explicit model for friction factor. This methodology has proven effective in the modeling and prediction of complex and non-linear processes such as explosive cutting, a variable valve-timing spark-ignition engine, and others (Atashkari, Nariman-Zadeh, Gölcü, Khalkhali, & Jamali, 2007; Onwubolu, 2009). The quasi-Monte Carlo method of Hammersley (Press, Teukolsky, Vetterling, & Flannery, 2007) will be used to generate an input data-set of random yet uniformly spaced points over the entire range of applicability of the Colebrook-White equation: Reynolds number from 4,000 to 10⁸ and relative roughness from 5 × 10⁻⁶ to 0.05. The Colebrook-White equation shall be solved with non-linear solution methods to obtain the friction factor values for use in training and testing the neural network. Ultimately, the development of this GMDH-type ANN shall yield an explicit polynomial model for friction factor, presented herein.

Construction of the neural network
By means of the GMDH algorithm, a model can be represented as a set of neurons in which different neuron pairs in each layer are connected through a quadratic polynomial and thus produce new neurons in subsequent layers. Such a representation can be used in modeling to map inputs to outputs. The formal definition of the identification problem is to develop a function, f̂, that approximates the actual function, f, with sufficient accuracy to predict an output ŷ, in place of the actual output y, for a given input vector X = (x₁, x₂, …, xₙ). Therefore, given M observations of multi-input, single-output data pairs, we define the system in the following manner.

yᵢ = f(xᵢ₁, xᵢ₂, …, xᵢₙ),  i = 1, 2, …, M  (4)
It is possible to train a GMDH-type neural network to predict the output values for any given input vector, as illustrated in Equation 5.

ŷᵢ = f̂(xᵢ₁, xᵢ₂, …, xᵢₙ),  i = 1, 2, …, M  (5)
The problem is now to determine a GMDH-type network so that the square of the differences between the actual output and the predicted one is minimized, that is:

∑ᵢ₌₁ᴹ [ f̂(xᵢ₁, xᵢ₂, …, xᵢₙ) − yᵢ ]² → min

The general connection between the inputs and the output variables can be expressed by a discrete form of the Volterra functional series, as follows.

y = a₀ + ∑ᵢ aᵢxᵢ + ∑ᵢ∑ⱼ aᵢⱼxᵢxⱼ + ∑ᵢ∑ⱼ∑ₖ aᵢⱼₖxᵢxⱼxₖ + ⋯  (6)
This form of the series is known as the Kolmogorov-Gabor polynomial (Farlow, 1984). This full mathematical description can be represented by a system of partial quadratic polynomials consisting of only two input variables (neurons), in the following form.

ŷ = G(xᵢ, xⱼ) = a₀ + a₁xᵢ + a₂xⱼ + a₃xᵢ² + a₄xⱼ² + a₅xᵢxⱼ  (7)
Such a partial quadratic description is recursively used in a network of connected neurons to build the general mathematical relation between the input and output variables. The coefficients aᵢ in Equation 7 are calculated using regression techniques (Farlow, 1984; Iba, DeGaris, & Sato, 1996; Ivakhnenko, 1971), so that the difference between the actual output, y, and the calculated one, ŷ, for each pair of xᵢ, xⱼ as input variables is minimized. It can be seen that a tree of polynomials is constructed using the quadratic form, Equation 7, whose coefficients are regressed in a least-squares sense. In this way, the coefficients of each quadratic function Gᵢ are obtained to optimally fit the output over the whole set of input/output data pairs.
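Because Equation 7 is linear in its six coefficients, each neuron's regression reduces to an ordinary linear least-squares solve. A minimal illustration in Python using NumPy; the function names are ours, and the study does not specify its regression implementation:

```python
import numpy as np

def fit_quadratic_neuron(xi, xj, y):
    """Fit the partial quadratic polynomial of Equation 7,
    yhat = a0 + a1*xi + a2*xj + a3*xi^2 + a4*xj^2 + a5*xi*xj,
    to observed outputs y in a least-squares sense."""
    A = np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs  # a0 ... a5

def eval_quadratic_neuron(coeffs, xi, xj):
    """Evaluate the quadratic neuron for given coefficients and inputs."""
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 + a1 * xi + a2 * xj + a3 * xi**2 + a4 * xj**2 + a5 * xi * xj
```

Fitting data generated from known coefficients recovers those coefficients to machine precision, which is a quick sanity check on the design matrix.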
In the basic form of the GMDH algorithm, all possible pairs of two independent variables out of the total n input variables are taken in order to construct the regression polynomial (Equation 7) that best fits the dependent observations (Jamali, Nariman-zadeh, Darvizeh, Masoumi, & Hamrang, 2009; Nariman-Zadeh, Darvizeh, & Ahmad-Zadeh, 2003; Nariman-Zadeh, Darvizeh, Jamali, & Moeini, 2005). In this paper, the general structure of GMDH-type neural network (GS-GMDH) (Jamali et al., 2009; Nariman-Zadeh et al., 2005) is used for modeling and prediction of the friction factor. In a GS-GMDH ANN, neuron connections can occur between layers that are not necessarily immediately adjacent, unlike the conventional structure GMDH (CS-GMDH) neural networks in which such connections only occur between adjacent layers. Consequently, this generalization of the network's structure can extend the performance of GS-GMDH neural networks in modeling of real-world complex processes. Genetic algorithms can be used to derive the optimal connectivity configurations of such GMDH networks. The genome or chromosome representation, which depicts the topology of a GMDH-type neural network, consists of a symbolic string composed of alphabetic representations of the input variables. In this coding scheme, each input variable is assigned an alphabetic name, and the chromosome is a string consisting of the concatenated sub-strings of these names. In other words, for a given input vector X = (x₁, x₂, …, xₙ), a chromosome can be represented as a string of concatenated symbols αᵢ ∈ {a, b, c, d, …} in the form chromosome = (α₁, α₂, …, αᵢ, …), where "a," "b," "c," etc., stand for the alphabetical names of inputs x₁, x₂, x₃, etc., respectively.
The procedure for defining chromosomes described by Nariman-Zadeh et al. (2005) can now be readily modified to apply to GS-GMDH networks. This modification is accomplished by repeating the name of a neuron that passes directly to a later layer. For example, the network structure depicted in Figure 1 exhibits such a connection between neuron "ad" and the output layer. It is clear that neuron "ad" in the first hidden layer is connected to the output layer by directly bypassing the second hidden layer. Therefore, the name of the output neuron includes "ad" twice, as "abbcadad." Essentially, a virtual neuron named "adad" has been constructed in the second hidden layer and used with "abbc" in the same layer to make the output neuron "abbcadad." It should be noted that in this coding method such repetition occurs whenever a neuron bypasses adjacent hidden layers and connects to a neuron in a subsequent layer. The number of repetitions of that neuron's signifier depends on the number of bypassed hidden layers, ñ, and is calculated as 2^ñ. It is evident that a chromosome such as "ababbcbc," unlike chromosome "ababacbc," is not valid in GS-GMDH networks, and therefore it should be rewritten as "abbc."
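The 2^ñ repetition rule for this chromosome coding fits in a couple of lines. A small illustration in Python (a hypothetical helper; the paper describes the rule, not this function):

```python
def bypass_name(neuron, skipped_layers):
    """Chromosome coding for a GS-GMDH neuron that bypasses hidden layers:
    its name is repeated 2**skipped_layers times in the names of the
    downstream neurons it feeds."""
    return neuron * (2 ** skipped_layers)
```

Concatenating "abbc" with bypass_name("ad", 1) reproduces the output neuron name "abbcadad" from the Figure 1 example.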

Application of the network to the Colebrook-White equation
For the purposes of finding an explicit form of the Colebrook-White equation, it is sufficient to consider two inputs, the Reynolds number and relative roughness, and one output, the friction factor. As suggested by multiple prior studies (Shayya et al., 2005) and independently verified by this study, transforming the inputs with a base-ten logarithm greatly improves the accuracy of the resultant ANN model. As such, the two inputs used in developing this ANN model were the base-ten logarithms of Reynolds number and relative roughness, i.e. log₁₀(Re) and log₁₀(ϵ/D). Also, a variety of transforms for the friction factor output was essayed, including a base-ten logarithm and several negative rational exponents between zero and one. It was determined through analysis of the resultant models' coefficients of determination (compared to the Colebrook-White equation) that the most accurate networks were obtained when the friction factor output was transformed by raising it to the power of negative one-half. As such, the output used to train and test the network was the inverse of the square root of the friction factor, 1/√f.
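The input/output transformation just described amounts to a one-line mapping. A small Python sketch (the helper name is ours):

```python
import math

def to_network_io(re, rr, f):
    """Map a (Re, eps/D, f) triple to the (input, input, output) form used
    to train the network: base-ten logarithms of the two inputs and the
    inverse square root of the friction factor."""
    return math.log10(re), math.log10(rr), f ** -0.5
```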
These inputs and outputs are not unexpected, as they mimic the form of the Colebrook-White and Haaland equations (Equations 2 and 3).
In addition to strategic transformation of the input and output data, a sufficiently large and appropriately spaced sampling set of input variables is necessary to ensure that the training and testing of the neural network results in an accurate model. To achieve adequate distribution over the range of input variables, Hammersley sequence sampling (Press et al., 2007) was employed to generate 250 sample points between 0 and 1 in two dimensions, and these fractional values were applied over the individual variable ranges to create the input data sampling set. The range of inputs considered, prior to logarithmic transform, was as follows: Reynolds number from 4,000 to 10⁸ and relative roughness from 5 × 10⁻⁶ to 0.05. It was determined that using such a sampling method would allow for greater accuracy with fewer training/testing points. The sampling set is shown in Figure 2.
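Two-dimensional Hammersley sampling pairs the fraction i/n with the base-2 radical inverse of i. A sketch in Python; the log-space mapping onto the physical ranges is our assumption, chosen to be consistent with the base-ten-logarithm input transform (the study does not state exactly how the unit-square points were scaled):

```python
import math

def radical_inverse_base2(i):
    """Van der Corput radical inverse of integer i in base 2."""
    result, frac = 0.0, 0.5
    while i:
        if i & 1:
            result += frac
        i >>= 1
        frac *= 0.5
    return result

def hammersley_2d(n):
    """n quasi-random, uniformly spaced points in the unit square:
    (i/n, radical inverse of i in base 2)."""
    return [(i / n, radical_inverse_base2(i)) for i in range(n)]

def scale_samples(points):
    """Map unit-square samples onto Re in [4e3, 1e8] and eps/D in
    [5e-6, 0.05], interpolating in log space (our assumption)."""
    lo_re, hi_re = math.log10(4e3), math.log10(1e8)
    lo_rr, hi_rr = math.log10(5e-6), math.log10(0.05)
    return [(10 ** (lo_re + u * (hi_re - lo_re)),
             10 ** (lo_rr + v * (hi_rr - lo_rr)))
            for u, v in points]
```

`scale_samples(hammersley_2d(250))` then yields a 250-point sampling set spread over the full input range.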
The output data for training and testing the network were obtained by non-linear solution of the Colebrook-White equation at each of the 250 data points generated by the Hammersley sampling algorithm. A variant of the Powell dogleg method was used to solve the non-linear, transcendental equation at each data point; thereafter, the friction factor data were transformed into the appropriate ANN output, as described previously. To train the network, a total of 125 points of the sampling set were used; that is, an equal number of points were used to train and then test the network. To prevent either the training or testing set from becoming biased toward any particular region of the input data range, the 250-point Hammersley sampling set was randomized prior to equal separation into training and testing sets. In the next step, genetic algorithms were used to find an optimal structure of the GMDH-type neural network to yield an explicit equation for friction factor. Two hidden layers were ultimately settled upon for the general structure of the network: while three hidden layers provided somewhat improved accuracy, the reduction in error in that case was deemed too slight to justify the additional complexity of the model.
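The randomized 50/50 split described above can be sketched simply (the seed is an arbitrary assumption, included only for reproducibility):

```python
import random

def split_half(points, seed=42):
    """Shuffle the sampling set, then split it into equal training and
    testing halves so neither half is biased toward one region of the
    input range."""
    pts = list(points)
    random.Random(seed).shuffle(pts)
    half = len(pts) // 2
    return pts[:half], pts[half:]
```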

Error analysis
In comparison with the Colebrook-White equation and other efforts made in this field of study, the following error analysis metrics were used.
Here, f p is the predicted value from the ANN, f d is the desired value from the iterative solution of the Colebrook-White equation, and n is the total number of data points-in this case, 250. R 2 is especially important here, as it measures the accuracy of the overall fit of the predicted data to the desired outputs. A value of unity indicates perfect correlation. For the mean-relative and mean-absolute errors (MRE, MAE), the greater the quantity, the greater the disagreement between the predicted and desired values. The MRE, incidentally, provides a more objective quantification of error than the MAE, as it normalizes the measured error to the mean value of the dataset in question.
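These three metrics can be implemented in a few lines. The formulas below are the common definitions of R², mean relative error (as a percentage), and mean absolute error; since the study's own equations are not reproduced in this excerpt, treat the exact forms as an assumption:

```python
def error_metrics(f_pred, f_des):
    """R^2, mean relative error (as a percentage), and mean absolute error
    between predicted (f_pred) and desired (f_des) friction factors."""
    n = len(f_pred)
    mean_d = sum(f_des) / n
    ss_res = sum((p - d) ** 2 for p, d in zip(f_pred, f_des))
    ss_tot = sum((d - mean_d) ** 2 for d in f_des)
    r2 = 1.0 - ss_res / ss_tot
    mre = 100.0 * sum(abs(p - d) / d for p, d in zip(f_pred, f_des)) / n
    mae = sum(abs(p - d) for p, d in zip(f_pred, f_des)) / n
    return r2, mre, mae
```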

Results and discussion
The input-output data-set was used to train a GMDH-type neural network to find a polynomial model of friction factor in terms of the input variables. In order to genetically refine this GMDH-type ANN, a population of 40 individuals in 120 generations with a crossover probability of 0.95 and a mutation probability of 0.01 was used; no additional improvement was observed for larger population sizes. The objective functions to be minimized simultaneously were the prediction and test error functions; the final trade-off structure of GMDH-type neural network was optimally selected by a genetic algorithm that minimized these two objective functions. The resultant ANN structure is shown in Figure 3, where "a" and "b" represent log 10 (Re) and log 10 (ϵ/D), respectively.
The corresponding polynomial representation of the model is as follows, where a and b are the transformed inputs, y₁ is the output of the first-hidden-layer neuron, and the output neuron represents 1/√f.

y₂ = −0.264590878809085 + 0.104548473076918a + 0.991211881271789y₁ + 0.027044863177819a² + 0.036354915327913y₁² − 0.075133478936023ay₁  (10)

y₃ = 0.049539595715222 − 0.574835644104601b + 0.741634220598117y₂ − 0.096991484123168b² + 0.016506184006448y₂² − 0.006996298079579by₂

f = 1/y₃²

Figure 3. Optimal structure of GS-GMDH network for modeling of friction factor.

Figure 4 shows the ability of the polynomial function to predict friction factor with a high level of accuracy. In Figure 4(a), we see the model function's behavior relative to that of the Colebrook-White equation over the entire 250-point data-set, with most deviation observed for higher values of the friction factor. In Figure 4(b), selected constant values for relative roughness were used to reproduce a simplified version of the classical Moody diagram: the performance of the ANN model is shown in solid and dashed lines with data points from the Colebrook-White equation overlain. Again, we see greater deviation at
higher friction factor values. Also, it seems the model is more accurate (compared to the Colebrook-White equation) for lower relative roughness values and higher Reynolds number values.
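The cascaded evaluation of the model can be sketched directly from the published coefficients. Note that the polynomial for y₁ is not reproduced in this excerpt, so the sketch below takes y₁ as a given input; a and b are log₁₀(Re) and log₁₀(ϵ/D):

```python
def friction_factor_from_network(a, b, y1):
    """Evaluate the final two neurons of the published model (Equation 10
    and the output neuron). y1 is the first-hidden-layer neuron's output,
    whose coefficients are not reproduced here. The output neuron gives
    1/sqrt(f), so f = 1/y3**2."""
    y2 = (-0.264590878809085 + 0.104548473076918 * a
          + 0.991211881271789 * y1 + 0.027044863177819 * a ** 2
          + 0.036354915327913 * y1 ** 2 - 0.075133478936023 * a * y1)
    y3 = (0.049539595715222 - 0.574835644104601 * b
          + 0.741634220598117 * y2 - 0.096991484123168 * b ** 2
          + 0.016506184006448 * y2 ** 2 - 0.006996298079579 * b * y2)
    return 1.0 / y3 ** 2
```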
In the review of previous work on this topic, the authors noted several statistical measures common to many of the publications. Three of these were selected to compare the ANN model developed herein to previously developed ANNs. Table 2 shows the error measurements and complexity of the various neural networks. It is clear from this table that the model developed in this study has the simplest structure among those considered, yet it maintains a high degree of accuracy. Only one other study produced a polynomial equation as an approximation for friction factor. Using a related method of genetic programming, Davidson et al. (1999) produced a 14-term polynomial to explicitly determine f. Further, they found that this polynomial performed 33% better than the explicit Haaland equation; it was also computationally cheaper, as such a model requires no transcendental functions, unlike the Haaland or any other explicit approximation of the Colebrook-White equation (Davidson et al., 1999). However, Davidson's polynomial applies to a much narrower range of both Reynolds number and relative roughness, 10⁵–10⁶ and 0.001–0.01, respectively.
This study considered the entire range of applicability of the Colebrook-White equation: 4,000–10⁸ for Reynolds number; 5 × 10⁻⁶–0.05 for relative roughness. The GS-GMDH approach offers a compromise on both fronts, whereby a polynomial can be given that applies to a realistic range of input variables while still avoiding transcendental functions, large ANNs, and complex computations. Further, as is evident from Table 2, a good fit with acceptable error is obtained from a very simple network and system of model equations.

Conclusions
A GMDH-type neural network model was designed to predict friction factor from Reynolds number and relative roughness. The minimum error was achieved by using the base-ten logarithms of the inputs (Reynolds number and relative roughness) and the inverse square root of the output (friction factor) to train and test the neural network. Use of the Hammersley sequence sampling method decreased the quantity of input data to only 250 points, of which 125 were used for training and 125 for testing the neural network. A genetic algorithm was used to refine the model based on the minimization of two objective functions, the training and testing errors. The regressed polynomial representation of friction factor showed a low percentage of error despite its simplicity. The value of the numerical methods used in the development of this model lies in their ability to derive a relatively simple system of equations at relatively low computational cost: a lower volume of training/testing data, smaller population sizes, fewer generations in the EA, etc.
The simplicity of the ANN model developed here is one of its primary strengths. Transcendental functions, such as those seen in the Colebrook-White equation and many of its approximations, carry a high computational cost. For applications such as large-scale piping networks, which involve high volume of computation by their very nature, or for continuous computation applications, such as process system controls, the use of a model such as that presented here would improve the speed of the computations involved in calculation of the friction factor (e.g. for real-time determination or estimation of pressure drop in pipe flow).
Another strength of this method is its inherent tunability. Note that while this model showed a 3.4% relative error with respect to the data with which it was trained and tested, the Colebrook-White equation itself introduces as much as 15% error with respect to experimental data (White, 1998). The same development procedure can therefore be applied directly to experimental data-sets, tuning the model to the physical situation and input data range of interest.