Artificial Intelligence Based Methods for Asphaltenes Adsorption by Nanocomposites: Application of Group Method of Data Handling, Least Squares Support Vector Machine, and Artificial Neural Networks

Asphaltenes deposition is considered a serious production problem. The literature does not include comprehensive studies on the adsorption phenomena involved in inhibitor-based handling of asphaltenes deposition, and effective protocols for managing asphaltenes deposition are still lacking. In this study, three efficient artificial intelligence models, namely group method of data handling (GMDH), least squares support vector machine (LSSVM), and artificial neural network (ANN), are proposed for estimating asphaltenes adsorption onto NiO/SAPO-5, NiO/ZSM-5, and NiO/AlPO-5 nanocomposites based on a databank of 252 points. Variables influencing asphaltenes adsorption include pH, temperature, the amount of nanocomposites over asphaltenes initial concentration (D/C0), and nanocomposite characteristics such as BET surface area and micropore volume. The models are also optimized using nine optimization techniques, namely coupled simulated annealing (CSA), genetic algorithm (GA), Bayesian regularization (BR), scaled conjugate gradient (SCG), ant colony optimization (ACO), Levenberg-Marquardt (LM), imperialistic competitive algorithm (ICA), conjugate gradient with Fletcher-Reeves updates (CGF), and particle swarm optimization (PSO). According to the statistical analysis, the proposed RBF-ACO (radial basis function network optimized by ACO) and LSSVM-CSA are the most accurate approaches, predicting asphaltenes adsorption with average absolute percent relative errors of 0.892% and 0.94%, respectively. The sensitivity analysis shows that temperature has the greatest impact on asphaltenes adsorption from model oil solutions.


Introduction
Depending on oil composition and process conditions, asphaltenes deposition may pose a serious concern during the production of light and heavy oils [1]. Based on solubility criteria, asphaltenes are the crude oil fractions that are insoluble in low molecular weight (MW) paraffins but soluble in light aromatics such as pyridine, benzene, and toluene [2]. Several studies have examined asphaltenes adsorption onto nanocomposites [62]. Mohammadi et al. synthesized NiO/AlPO-5 and NiO/ZSM-5 to investigate asphaltenes adsorption under various conditions, including pH, the amount of nanoparticles (NPs) over asphaltenes initial concentration (D/C0), and temperature [63]. Asphaltenes adsorption onto these particles was modeled using an adaptive neuro-fuzzy inference system (ANFIS) [63].
In this work, asphaltenes adsorption from model solutions using different nanocomposites is modeled at various temperatures and pH values and with different amounts of nanocomposites of varying physicochemical properties, including total surface area and pore volume. It should be noted that the pH value refers to the solution of nanocomposites (adsorbent), base, and acid before adding the model oil (asphaltene + toluene). pH is adjusted by adding acids (e.g., citric acid) and bases (e.g., ethylenediamine) [61]. Different machine learning protocols, including least squares support vector machine (LSSVM), artificial neural network (ANN), and group method of data handling (GMDH), are utilized to predict the adsorption at various process and thermodynamic conditions. Furthermore, various optimization methods, namely Levenberg-Marquardt (LM), the Bayesian regularization (BR) algorithm, conjugate gradient with Fletcher-Reeves updates (CGF), the scaled conjugate gradient (SCG) approach, genetic algorithm (GA), particle swarm optimization (PSO), coupled simulated annealing (CSA), imperialistic competitive algorithm (ICA), and ant colony optimization (ACO), are implemented to obtain the optimal values of the model parameters. The precision and reliability of the collected data are assessed as a first step toward developing proper models. To the best of our knowledge, this is the first time that asphaltenes adsorption by three nanocomposites, namely NiO/ZSM-5, NiO/SAPO-5, and NiO/AlPO-5, is modeled using three connectionist modeling protocols and optimized by nine procedures. Furthermore, the quality of the experimental data is evaluated on the basis of common statistical parameters. The findings of this study can help to better understand the parameters impacting asphaltenes adsorption and to design and operate effective asphaltene removal techniques.

Experimental Dataset
In order to develop reliable models based on LSSVM, ANN, and GMDH for predicting asphaltenes adsorption from oils using nanocomposites, comprehensive experimental adsorption data pertaining to NiO/ZSM-5, NiO/SAPO-5, and NiO/AlPO-5 at various temperatures, pH values, and amounts of nanocomposites were collected from the literature. This set of experimental data consists of 252 points under different operational conditions, as detailed in the literature [61,62,64]. Nanocomposite properties and the experimental conditions are listed in Tables 1 and 2, respectively.

Least Squares Support Vector Machine (LSSVM)
Support vector machine (SVM) is a conventional machine learning approach. In general, SVM performs classification and regression while minimizing the structural risk by mapping the data into a high-dimensional feature space through a kernel function, as shown in Equation (1):

f(X) = ω^T φ(X) + b (1)

The modified version of the SVM approach, namely least squares support vector machine (LSSVM), replaces the inequality constraints with a least-squares cost function to obtain the minimum structural risk [65-67]. Thus, the fundamental formulation of LSSVM can be expressed as follows:

min J(ω, e) = (1/2) ω^T ω + (γ/2) Σ_{i=1}^{N} e_i^2, subject to Y_i = ω^T φ(X_i) + b + e_i, i = 1, ..., N (2)

where J, X_i, and Y_i denote the risk bound, the input vector, and the target, respectively; γ, ω, b, φ(X_i), and e_i stand for the regularization parameter, weight vector, bias, mapping (kernel) function, and error (slack) variable, respectively. To solve this problem, the Lagrangian function is determined as follows:

L(ω, b, e; α) = J(ω, e) − Σ_{k=1}^{N} α_k [ω^T φ(X_k) + b + e_k − Y_k] (3)

In Equation (3), α_k represents the Lagrangian multipliers. Setting the derivatives of Equation (3) with respect to ω, b, e_k, and α_k to zero yields the conditions used to determine the parameters:

∂L/∂ω = 0 → ω = Σ_{k=1}^{N} α_k φ(X_k); ∂L/∂b = 0 → Σ_{k=1}^{N} α_k = 0; ∂L/∂e_k = 0 → α_k = γ e_k; ∂L/∂α_k = 0 → ω^T φ(X_k) + b + e_k = Y_k (4)

Following the above equations, and eliminating ω and e, a linear function system is defined as given below:

[0, 1^T; 1, Ω + I/γ] [b; α] = [0; Y] (5)

The kernel matrix Ω_ij can be formulated by the following equation [68]:

Ω_ij = K(X_i, X_j) = φ(X_i)^T φ(X_j) (6)

In the current study, the radial basis function, K(X_i, X_j) = exp(−‖X_i − X_j‖^2/σ^2), is selected as the kernel function for the LSSVM algorithm.
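In practice, training an LSSVM amounts to one dense linear solve, and prediction is a kernel evaluation against the training set. A minimal NumPy sketch is shown below; the hyperparameter values gamma and sigma are illustrative placeholders, not the tuned values of this study:

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-||X_i - Y_j||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma**2)

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVM system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # constraint row: sum of alphas = 0
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = K + np.eye(n) / gamma   # regularized kernel block
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]              # alpha, b

def lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
    """f(x) = sum_k alpha_k K(x, X_k) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

With a very large gamma, the regularization term vanishes and the model nearly interpolates the training targets, which is a quick sanity check on the solver.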

Artificial Neural Network (ANN)
One of the popular branches of computational modeling is the artificial neural network (ANN), which is constructed on the basis of biological nervous systems. An ANN effectively explores patterns within the data and builds relationships between the target value and the important variables in the system. ANNs consist of a large number of interconnected elements known as neurons [69,70]. Neurons act as processing units and are organized in layers. ANNs are applied to pattern recognition, clustering, function approximation, and classification. The radial basis function (RBF) and multilayer perceptron (MLP) neural networks are prominent forms of ANNs; the main difference between these networks lies in how their neurons process signals. The RBF-ANN is constructed from an input layer, a hidden layer, and an output layer. The hidden-layer neurons use a radial basis function as their activation function. Implementing a linear optimization approach, this algorithm finds the best solution by adjusting the weights during mean square error minimization. The output for the input pattern x can be obtained using the following relationship [71]:

y(x) = Σ_{i=1}^{n} w_i ∅_i(x) (11)

where w_i and ∅_i denote the connection weight and radial basis function, respectively. There are different types of radial basis functions; the Gaussian function is given below:

∅_i(x) = exp(−‖x − x_i‖^2/(2σ^2)) (12)

In Equation (12), x_i and σ refer to the center of the function and the Gaussian spread, respectively. As stated previously, MLP is the other main form of ANN. This algorithm has several layers, with the first being the input layer and the last being the output layer. The input and output layers are connected by intermediate, hidden layers.
In the hidden and output layers, different forms of activation functions can be applied, for example the binary step function:

f(x) = 0 for x < 0 and f(x) = 1 for x ≥ 0

Considering an MLP model with two hidden layers, tansig and logsig activation functions for the hidden layers, respectively, and purelin for the output layer, the output can be calculated as follows:

y = purelin(w_3 logsig(w_2 tansig(w_1 x + b_1) + b_2) + b_3) (13)

where b_1 and b_2 are the first and second hidden layer bias vectors and b_3 is the output layer bias vector. In addition, w_1 and w_2 represent the first and second hidden layers' weight matrices, respectively, and w_3 is the output layer weight matrix. In this study, optimization algorithms, namely CGF, SCG, BR, and LM, are employed to enhance the performance of the MLP model. A schematic of the MLP-ANN algorithm is depicted in Figure 1.
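The two-hidden-layer forward pass described above can be sketched in a few lines; the weight shapes below are illustrative, and the trained values would come from the LM/BR/SCG/CGF optimizers, which are not shown:

```python
import numpy as np

def tansig(x):
    """Hyperbolic tangent sigmoid activation."""
    return np.tanh(x)

def logsig(x):
    """Log-sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-x))

def purelin(x):
    """Linear (identity) output activation."""
    return x

def mlp_forward(x, w1, b1, w2, b2, w3, b3):
    """y = purelin(w3 @ logsig(w2 @ tansig(w1 @ x + b1) + b2) + b3)."""
    h1 = tansig(w1 @ x + b1)    # first hidden layer
    h2 = logsig(w2 @ h1 + b2)   # second hidden layer
    return purelin(w3 @ h2 + b3)
```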

Group Method of Data Handling (GMDH)
This method was proposed by Shankar as a self-organizing system [72]. GMDH was later used for pattern recognition, artificial intelligence, regression analysis [73,74], acoustic and seismic analysis, microprocessor-based hardware, multisensor signal processing, weather modeling, medical diagnostics, and prediction and classification in various engineering and science disciplines such as chemical, petroleum, mechanical, and environmental engineering [74-77]. GMDH, also called polynomial neural network (PNN), is built on a layered structure whose independent neurons are coupled by quadratic polynomials. Initially, Ivakhnenko proposed GMDH based on the optimum selection of quadratic polynomial formulations [78]. To model the relationship between outputs and inputs, the Volterra-Kolmogorov-Gabor series is utilized, as shown below:

Y = a + Σ_{i=1}^{M} b_i x_i + Σ_{i=1}^{M} Σ_{j=1}^{M} c_ij x_i x_j + Σ_{i=1}^{M} Σ_{j=1}^{M} Σ_{k=1}^{M} d_ijk x_i x_j x_k + ... (20)

where x_i and Y introduce the inputs and output; M refers to the number of independent parameters; and a, b_i, c_ij, and d_ijk denote the polynomial coefficients. Pairs of independent parameters are then coupled by a quadratic polynomial formulation, producing new parameters Z_1, ..., Z_n that replace the former values. The quadratic polynomial can be written as follows:

z = G(x_i, x_j) = a + b x_i + c x_j + d x_i^2 + e x_j^2 + f x_i x_j (21)

The new matrix is expressed by v_z = (z_1, ..., z_n). To determine the coefficients of Equation (21), the least squares method is used. This method minimizes the sum of the squared deviations between real and predicted values:

E = (1/N) Σ_{i=1}^{N} (Y_i − G_i)^2 → min (22)

in which A = {a, b, c, d, e, f} denotes the quadratic polynomial coefficient vector and T stands for the transpose operator. Finally, the least squares method leads to the following solution:

A = (Z^T Z)^{-1} Z^T Y (23)

A schematic of the GMDH model proposed in this study is illustrated in Figure 2. As is clear from Figure 2, the designed network has an input layer, seven middle layers, and an output layer.
The genome and nodal formulation of this network can be determined by the expressions given in Table 3.
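Each GMDH neuron is simply an ordinary least-squares fit of the six-coefficient quadratic described above. A single-neuron sketch (the variable names are illustrative, and a full GMDH would repeat this fit for every input pair in every layer):

```python
import numpy as np

def quad_features(xi, xj):
    """Design matrix Z for z = a + b*xi + c*xj + d*xi^2 + e*xj^2 + f*xi*xj."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])

def fit_gmdh_neuron(xi, xj, y):
    """Least-squares coefficient vector A = (Z^T Z)^-1 Z^T y for one neuron."""
    Z = quad_features(xi, xj)
    A, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return A

def eval_gmdh_neuron(A, xi, xj):
    """Evaluate the fitted quadratic neuron on new data."""
    return quad_features(xi, xj) @ A
```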

Figure 2. A simplified structure of the presented group method of data handling (GMDH) for estimating asphaltenes adsorption.

Table 3. Correlations developed by group method of data handling (GMDH) for predicting the amount of asphaltenes adsorption onto nanocomposites.

Optimization Approaches
In order to optimize the models applied in this study, nine optimization procedures are used. The employed optimization techniques include particle swarm optimization (PSO), imperialistic competitive algorithm (ICA), ant colony optimization (ACO), conjugate gradient with Fletcher-Reeves updates (CGF), scaled conjugate gradient (SCG), Levenberg-Marquardt (LM), Bayesian regularization (BR), genetic algorithm (GA), and coupled simulated annealing (CSA).

Genetic Algorithm
A metaheuristic algorithm named the genetic algorithm (GA) was inspired by the natural selection process. GAs use operators, including selection, crossover, and mutation, in their search and optimization procedure. In this algorithm, probable solutions, called a population of individuals, move toward the optimum solution. First, the population is produced randomly; then, according to the fitness value obtained for each member of the population, the best individuals are selected to create the next population by applying crossover and mutation to them. The previous population is replaced by the new population, and the process continues until the algorithm reaches a satisfactory accuracy or the maximum number of iterations [87,88]. A brief procedure of GA optimization is depicted in Figure 3.
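The loop described above can be sketched as follows. The operator choices here (truncation selection, arithmetic crossover, uniform mutation) are one common set of options, not necessarily those used in this study:

```python
import random

def genetic_algorithm(fitness, bounds, pop_size=30, generations=100,
                      crossover_rate=0.8, mutation_rate=0.1):
    """Minimize `fitness` over the box `bounds` with selection, crossover, mutation."""
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                      # rank (lower fitness is better)
        parents = pop[:pop_size // 2]              # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = random.sample(parents, 2)
            if random.random() < crossover_rate:   # arithmetic crossover
                a = random.random()
                child = [a * u + (1 - a) * v for u, v in zip(p1, p2)]
            else:
                child = list(p1)
            for k, (lo, hi) in enumerate(bounds):  # uniform mutation
                if random.random() < mutation_rate:
                    child[k] = random.uniform(lo, hi)
            children.append(child)
        pop = parents + children                   # new population replaces the old
    return min(pop, key=fitness)
```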

Particle Swarm Optimization
Particle swarm optimization (PSO) [89] was developed by Kennedy based on the natural flocking and swarming of birds and insects. The first step of PSO is the generation of a population of random solutions, known as particles. This population moves through the problem space guided by the current best particles. The main characteristics of a particle are its position and velocity, which are used to search the space for an appropriate fitness value. It is worth noting that each particle stores two dominant pieces of information: the best global position (g_best) and the best visited position (p_best) [90]. The algorithm is iterative, so the solution obtained at each iteration is compared with the global best and the particle's own local best. The velocity and next position of a particle can be determined as follows:

v_i^{t+1} = w v_i^t + c_1 r_1 (p_best,i − x_i^t) + c_2 r_2 (g_best − x_i^t) (24)

x_i^{t+1} = x_i^t + v_i^{t+1} (25)

where v_i and x_i are the velocity and position of a particle; r_1 and r_2 are random numbers in [0, 1]; w denotes the inertia weight, which controls the impact of past velocities; and c_1 and c_2 represent the relative impact of the cognitive and social components, respectively [91]. A simple flowchart of the PSO approach is illustrated in Figure 4.
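The velocity and position updates described above translate directly into code. A minimal sketch follows; the parameter values w, c1, and c2 are typical defaults, not the tuned values of this study:

```python
import random

def pso(objective, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    """Minimize `objective`; v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)."""
    dim = len(bounds)
    x = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]                     # best position seen by each particle
    pbest_val = [objective(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]    # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                v[i][d] = (w * v[i][d] + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                x[i][d] += v[i][d]
            val = objective(x[i])
            if val < pbest_val[i]:                  # update personal and global bests
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val
```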

Coupled Simulated Annealing
An upgraded form of simulated annealing (SA) is coupled simulated annealing (CSA), which enhances the precision of SA without a considerable reduction in convergence speed. SA has the ability to move from the present solution to a worse one in order to escape from a local optimum; during the process, the probability of such a movement decreases. CSA has been suggested for several theoretical and practical optimization cases because it escapes local optima more easily, so that the precision of the optimization solution improves without an unwanted impact on the convergence speed. The major difference between CSA and SA is the acceptance probability. Suykens summarized the basic principles of CSA for avoiding local optima in nonconvex problems [92]. More details can be found in the literature [93]. The methodology of the LSSVM-CSA optimization is presented in Figure 5.

Imperialistic Competitive Algorithm
The imperialistic competitive algorithm (ICA) is a socially inspired counterpart of the GA, introduced by Atashpaz-Gargari and Lucas. ICA shows an excellent ability to detect the global optimum [94,95]. The algorithm uses three common terms: countries, decades, and the cost function. The cost function represents the equation to be optimized; a decade is an individual iteration; and countries are the counterparts of chromosomes in the GA. The countries with the lowest cost function values are assigned as imperialists, and the others are termed colonies. The main operators are assimilation, revolution, and competition. The movement of colonies toward their imperialist is created by assimilation, while the revolution operator randomly changes the location of countries; this operator helps the search avoid local minima and improves the ability to find the best solution. During assimilation and revolution, if a colony reaches a lower cost than its imperialist, the colony and the imperialist exchange positions. Imperialistic competition is defined as taking possession of the colonies of other, weaker empires.
The aforementioned competition can be quantified through the total cost (TC) of each empire, which is described by the following expression [96,97]:

TC_n = Cost(imperialist_n) + ξ mean{Cost(colonies of empire_n)} (27)

where ξ stands for the colonies' contribution coefficient in TC. The normalization of Equation (27) is expressed as follows:

NTC_n = TC_n − max_i{TC_i} (28)

where NTC refers to the normalized TC. The possession probability of each empire can then be determined by the following relationship:

PP_n = |NTC_n / Σ_{i=1}^{N_imp} NTC_i| (29)

In Equation (29), the number of imperialists and the possession probability are shown by N_imp and PP_n, respectively. Empire selection in ICA is similar to selection in GA; however, instead of a selection approach such as the roulette wheel, ICA uses a mechanism that does not require the cumulative distribution. The probability vector P is determined as follows:

P = [PP_1, PP_2, ..., PP_{N_imp}] (30)

A random number vector R, with elements drawn uniformly from [0, 1], is generated, and the combined vector D is obtained as follows:

D = P − R (31)

The contested colonies are handed to the empire whose corresponding index in D is maximal. The method of ICA optimization is shown in Figure 6.
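The total-cost and possession-probability calculations above can be illustrated with a short sketch; the empire costs used in the example are made-up numbers:

```python
import numpy as np

def possession_probabilities(imperialist_costs, colony_costs_per_empire, xi=0.1):
    """TC_n = cost(imperialist_n) + xi * mean(colony costs of empire n);
    normalize by the worst TC and convert to possession probabilities PP_n."""
    tc = np.array([c + xi * np.mean(cols)
                   for c, cols in zip(imperialist_costs, colony_costs_per_empire)])
    ntc = tc - tc.max()            # normalized total cost (all entries <= 0)
    pp = np.abs(ntc / ntc.sum())   # stronger (cheaper) empires get higher PP
    return pp
```

Note that the empire with the lowest total cost receives the largest possession probability, which is what drives colonies toward strong empires during competition.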

Ant Colony Optimization
One of the effective population-based algorithms is ant colony optimization (ACO), which was developed based on Dorigo's work [98]. Searching for the shortest path between the food and the nest is the main idea behind the ACO algorithm: the ant population uses a chemical component called pheromone as a footprint to mark the best way between the food and the nest [99,100]. The original algorithm is employed for discrete paths; hence, for continuous problems, a composite probabilistic model built from Gaussian distributions is implemented to generate probable solutions, making the pheromone approach applicable to continuous domains. The probabilistic strategy obtains the best solution by comparing the results with those of the previous step. In order to find the solution vector x, it is necessary to minimize the objective function (OF). The steps below express the computations in the ACO algorithm [101-103]:

1. For N selected random solutions, the OF should be determined.
2. The solutions are sorted, with the best and worst initial solutions denoted by x_1 and x_N, respectively.
3. The following expression is used to assign a weight to each individual solution:

w_l = (1/(q N √(2π))) exp(−(l − 1)^2/(2 q^2 N^2)), l = 1, ..., N

For all weights, the following relationship should hold:

Σ_{l=1}^{N} p_l = 1, with p_l = w_l / Σ_{r=1}^{N} w_r

4. Then, the Gaussian composite probabilistic model is constructed based on the following expression:

G^j(x[j]) = Σ_{l=1}^{N} p_l N(x[j]; μ_l^j, σ_l^j)

where x[j] is the j-th component of the solution x (a decision parameter). The following equations represent the mean parameter and standard deviation:

μ_l^j = x_l[j], σ_l^j = ξ Σ_{r=1}^{N} |x_r[j] − x_l[j]| / (N − 1)

where ξ is a real positive value, which indicates the exploration-exploitation balance.

5. The M samples forming the solution offspring are created by sampling the multidimensional model.
6. The M offspring and the previous solutions are pooled, and the N best of them are kept.
7. The termination criterion is checked.

A schematic of the ACO approach is described in Figure 7.
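The seven steps above correspond to the continuous, Gaussian-kernel variant of ACO. A simplified sketch under those assumptions follows; the parameter values q and xi are illustrative, and the weights here are computed once from the archive ranks rather than recomputed each iteration:

```python
import math
import random

def aco_continuous(objective, bounds, n_ants=30, archive=10, iters=200, q=0.2, xi=0.85):
    """Simplified continuous ACO: sample offspring from a Gaussian kernel built
    on a ranked archive of solutions, then keep the best `archive` solutions."""
    dim = len(bounds)
    sols = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(archive)]
    sols.sort(key=objective)
    # rank-based weights (best rank gets the largest weight), normalized to sum to 1
    w = [math.exp(-(l ** 2) / (2 * (q * archive) ** 2)) for l in range(archive)]
    p = [wl / sum(w) for wl in w]
    for _ in range(iters):
        offspring = []
        for _ in range(n_ants):
            child = []
            for j in range(dim):
                l = random.choices(range(archive), weights=p)[0]  # guiding solution
                mu = sols[l][j]
                sigma = xi * sum(abs(s[j] - mu) for s in sols) / (archive - 1)
                child.append(random.gauss(mu, sigma))
            offspring.append(child)
        # pool offspring with the archive and keep the best solutions
        sols = sorted(sols + offspring, key=objective)[:archive]
    return sols[0]
```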

Results and Discussion
In a computational and modeling work, the modeling outputs are compared to experimental data in order to assess the model performance. The statistical indexes employed in this work for model evaluation are the root mean square error (RMSE), coefficient of determination (R^2), standard deviation (SD), average percent relative error (APRE, %), and average absolute percent relative error (AAPRE, %), defined as follows:

RMSE = [(1/N) Σ_{i=1}^{N} (y_i^exp − y_i^pred)^2]^{1/2}

R^2 = 1 − Σ_{i=1}^{N} (y_i^exp − y_i^pred)^2 / Σ_{i=1}^{N} (y_i^exp − ȳ^exp)^2

APRE% = (100/N) Σ_{i=1}^{N} (y_i^exp − y_i^pred)/y_i^exp

AAPRE% = (100/N) Σ_{i=1}^{N} |(y_i^exp − y_i^pred)/y_i^exp|

SD = [(1/(N − 1)) Σ_{i=1}^{N} ((y_i^exp − y_i^pred)/y_i^exp)^2]^{1/2}

In the current study, 252 data points from 3 nanocomposites are gathered to properly simulate asphaltene adsorption onto nanocomposites with various characteristics under different operating conditions. Ten models are developed, in which MLP is optimized using BR, LM, SCG, and CGF; RBF is optimized through ACO, ICA, GA, and PSO; LSSVM is optimized with CSA; and the last one is GMDH. The computational time of each model employed in this study is presented in Table 4. In predicting asphaltene adsorption, low values of SD, RMSE, AAPRE, and APRE and high values of R^2 for the training and testing phases imply the accuracy and general applicability of the proposed models. Overall, the accuracy of the suggested models can be ranked on the basis of the results shown in Table 4.

Table 4. Magnitudes of root mean square error (RMSE), standard deviation (SD), average percent relative error (APRE), average absolute percent relative error (AAPRE), computational time, and coefficient of determination (R^2) for all the proposed models for prediction of asphaltenes adsorption.

In order to graphically confirm the accuracy of the models, cross plots of predicted versus experimental data as well as the error distributions for the testing and training data are plotted in Figures 8 and 9, respectively. Figure 8 shows an excellent match between the model predictions and the experimental asphaltene adsorption onto the surface of the nanocomposites; the data are mostly located very close to the y = x line.
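The statistical indexes defined at the start of this section can be computed in one pass; the SD definition below follows the relative-error form given above:

```python
import numpy as np

def error_metrics(y_exp, y_pred):
    """RMSE, R^2, APRE%, AAPRE%, and SD of the relative errors."""
    y_exp, y_pred = np.asarray(y_exp, float), np.asarray(y_pred, float)
    rel = (y_exp - y_pred) / y_exp          # relative error per data point
    n = len(y_exp)
    return {
        "RMSE": float(np.sqrt(np.mean((y_exp - y_pred) ** 2))),
        "R2": float(1 - np.sum((y_exp - y_pred) ** 2)
                      / np.sum((y_exp - y_exp.mean()) ** 2)),
        "APRE%": float(100 * rel.mean()),
        "AAPRE%": float(100 * np.abs(rel).mean()),
        "SD": float(np.sqrt(np.sum(rel ** 2) / (n - 1))),
    }
```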
Figure 9 demonstrates that the majority of data points of all models are placed at (or close to) the zero-error line, which, in turn, verifies the consistency between the predicted and real data points. For most of the models, the maximum relative error between the predicted and experimental data is around 30%; however, for RBF-ICA the maximum error is around 70%, showing higher deviations in prediction. Although all of the proposed models exhibit a very good match with the experimental data, it is constructive to identify the best algorithm in this research in terms of precision and reliability. The magnitudes of the average absolute percent relative error (AAPRE), which is the most vital criterion for the assessment of model performance, are presented in Figure 10. Among the ten proposed models, RBF-ACO and LSSVM-CSA, with AAPRE values under 1%, appear the most accurate, whereas GMDH has the least accuracy. However, one major asset of GMDH is that it provides an explicit relationship between the inputs and output; it can also be easily applied.

In order to make a graphical comparison among all of the models, the variations of absolute relative error in terms of the cumulative frequency of data points for all models are illustrated in Figure 11. It should be mentioned that the precision and robustness of the models increase as the graphs move closer to the y axis. It is obvious from Figure 11 that both RBF-ICA and RBF-ACO have great accuracy, as these models can predict 85% of the data with an absolute relative error less than 1%. The other deterministic tools estimate 70% of the data points with an absolute relative error of around 1%. However, both the MLP-CGF and GMDH techniques show the highest deviation, as they can predict only 60% and 30% of the data points, respectively, with an absolute relative error less than 1%. According to Figures 10 and 11, RBF-ACO has the best performance among all algorithms. The error indexes for RBF-ACO are APRE% = −0.08, AAPRE% = 0.89, RMSE = 1.42, and R^2 = 0.9937. One practical tool for evaluating model performance is plotting actual versus predicted data, as depicted in Figure 12 for RBF-ACO. As is evident from Figure 12, excellent agreement between the RBF-ACO model's predictions and the experimental asphaltene adsorption data is noticed for the training and testing phases, again indicating the high degree of accuracy attained by this model. The accuracy of RBF-ACO covers the entire range of experimental conditions. For example, Figure 13 shows an excellent match between the RBF-ACO model fit and the experimental asphaltene adsorption within the temperature range of 295 to 355 K. Higher temperatures negatively affect the asphaltene uptake by NPs and nanocomposites, as shown in Figure 13, which is in agreement with published experimental investigations [58,104]. In fact, higher temperatures affect the aggregation state of NPs and asphaltenes as well as the crude oil properties.
It is worth mentioning that thermodynamic studies should be conducted in order to better scrutinize the effects of temperature. A key statistical analysis implemented in the current work is sensitivity analysis. This method is used to quantify the impacts of the type of nanocomposite, pH, D/C0, and temperature on asphaltenes adsorption. The relevancy factor, which is the main parameter in this method, is expressed by the following relationship [68,88]:

r = Σ_{i=1}^{N} (X_{k,i} − X̄_k)(Z_i − Z̄) / [Σ_{i=1}^{N} (X_{k,i} − X̄_k)^2 Σ_{i=1}^{N} (Z_i − Z̄)^2]^{1/2}

where X_{k,i}, X̄_k, Z_i, and Z̄ introduce the i-th value of the k-th input, the input average, the target parameter, and its average, respectively. Figure 14 shows this parameter for each input variable. It follows that the micropore volume, S_BET, pH, and D/C0 display a direct (positive) relationship with asphaltene adsorption from the model oil by nanocomposites. Moreover, increasing temperature, which is the most effective parameter, decreases asphaltenes adsorption from the model solutions, as proved by experimental studies. It can be concluded that the two most influential parameters are the temperature and D/C0. It was found that the efficiency of asphaltene adsorption increases as D/C0 increases. Furthermore, at low pH, electrostatic forces cause attraction between the adsorbent, with a positive surface charge, and asphaltenes, with a negative surface charge. An increase in pH causes repulsion between the negatively charged asphaltenes and OH− ions, which, in turn, leads to a decline in the asphaltene adsorption process [62]. Various algorithms and procedures have been proposed for the detection and exclusion of outliers. In this study, the Leverage strategy is applied to scrutinize the experimental data. This approach determines the model deviations from the experimental data points [105,106]. In this method, the Hat matrix is computed as follows [106-109]:

H = A (A^T A)^{-1} A^T

where A denotes an a × b matrix in which b and a are the number of the model's parameters and samples, respectively.
Another important parameter in this method is the leverage limit, defined as:

H* = 3(b + 1)/a

Hat values (the diagonal elements of H) are plotted against the standardized residuals (SR: deviations between experimental data and modeling results); the resulting figure is called the Williams plot. In this strategy, the experimental data are of good quality and the model is statistically valid if the majority of the data points are located in the feasibility domain of the RBF-ACO model (0 ≤ Hat ≤ 0.071 and −3 ≤ SR ≤ 3). Figure 15 demonstrates that only four experimental asphaltene adsorption data points fall in the suspect area; thus, the collected experimental dataset is sufficiently reliable to train and test the models, and the RBF-ACO model is also statistically acceptable.
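The relevancy factor and the Leverage diagnostics can be sketched together. In the sketch below, A holds the input data (one row per sample), and the leverage limit reproduces H* = 3(b + 1)/a ≈ 0.071 for a = 252 samples and b = 5 inputs; the residual standardization is one common convention, assumed here for illustration:

```python
import numpy as np

def relevancy_factor(x, z):
    """Pearson-type relevancy factor r between one input x and the target z."""
    x, z = np.asarray(x, float), np.asarray(z, float)
    dx, dz = x - x.mean(), z - z.mean()
    return float(np.sum(dx * dz) / np.sqrt(np.sum(dx**2) * np.sum(dz**2)))

def leverage_diagnostics(A, residuals):
    """Hat matrix H = A (A^T A)^-1 A^T, leverage limit H* = 3(b+1)/a, and
    standardized residuals for a Williams plot."""
    A = np.asarray(A, float)
    a, b = A.shape
    H = A @ np.linalg.inv(A.T @ A) @ A.T
    hat = np.diag(H)                              # leverage of each sample
    h_star = 3 * (b + 1) / a                      # leverage limit
    r = np.asarray(residuals, float)
    sr = (r - r.mean()) / r.std(ddof=1)           # standardized residuals
    valid = (hat <= h_star) & (np.abs(sr) <= 3)   # inside the feasibility domain
    return hat, h_star, sr, valid
```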

Conclusions
Asphaltene precipitation is one of the most problematic issues in many oil reservoirs worldwide. A novel method for the treatment of asphaltenes, i.e., NPs-based treatments, has shown an excellent ability to tackle asphaltene-related problems, since NPs and nanocomposites have a great affinity for adsorbing and removing asphaltenes from crude oils. In this work, artificial intelligence models are developed to forecast the adsorption of asphaltenes onto nanocomposites as a function of the type of nanocomposite, pH, D/C0, and temperature. The outputs of the presented algorithms are graphically and statistically compared with actual asphaltenes adsorption data, and the deviation between the model fit and the experiments is described using different statistical indexes. It was found that LSSVM, ANN, and GMDH optimized by LM, BR, CGF, SCG, GA, PSO, CSA, ICA, and ACO adequately simulate asphaltenes adsorption onto nanocomposites. The radial basis function neural network and the least squares support vector machine, optimized by ant colony optimization and coupled simulated annealing, respectively, exhibit the best performance: RBF-ACO and LSSVM-CSA are the most accurate, with AAPRE values of 0.89% and 0.94%, respectively. In addition, the impact of the different input variables on asphaltenes adsorption is analyzed; the temperature and the pore volume of the nanocomposites have the most and the least influence on asphaltenes adsorption, respectively. Lastly, the leverage strategy is employed to assess the quality of the collected data, confirming the reliability of the data and of the proposed RBF-ACO model.

Conflicts of Interest:
The authors declare no conflict of interest.