Estimation of Adsorption Capacity of CO2, CH4, and their Binary Mixtures in Quidam Shale using LSSVM: Application in CO2 Enhanced Shale Gas Recovery and CO2 Storage

Carbon dioxide enhanced shale gas recovery depends strongly on adsorption properties of carbon dioxide and methane. In this work, Least Squares Support Vector Machine (LSSVM) optimized by Particle Swarm Optimization, has been proposed to learn and then predict adsorption capacity of methane and carbon dioxide from pure and binary gas mixtures in Jurassic shale samples from the Qaidam Basin in China based on input parameters pressure, temperature, gas composition and TOC. A literature dataset of 348 points was applied to train and validate the model. The predicted values were compared with the experimental data by statistical and graphical approaches. The coefficients of determination of carbon dioxide adsorption were calculated to 0.9990 and 0.9982 for training and validation datasets, respectively. For CH4 the numbers are 0.9980 and 0.9966. The model was extrapolating reasonable trends beyond measurement ranges. More extensive datasets are needed to properly parameterize the role of shale properties.

Due to economic development and population growth, worldwide demand for energy is increasing [1]. Shale gas reservoirs have become one of the dominant unconventional energy sources with vast resources particularly in Canada, China and the United States. They are associated with a high storage capacity particularly due to significant adsorption [2][3][4][5][6][7]. In the stated countries, the study and development of shale gas reservoirs have become the focus of energy industries and producing the adsorbed gas is a central topic [8,9]. At high pressure and temperature conditions, carbon dioxide has higher adsorption capacity than methane especially in the micropore volume fraction [10]. In recent investigations, it was shown that five carbon dioxide molecules could take the place of one methane molecule. Hence, carbon dioxide injection into organic-rich gas shales has benefits such as enhancing adsorbed methane recovery, effective sequestration of carbon dioxide and mitigating global warming caused by carbon dioxide emissions [11]. Gas shale thus is an interesting alternative for carbon dioxide sequestration [12,13]. Currently, this approach has not yet been commercialized, although numerous efforts have been made to investigate its feasibility [14][15][16]. Key investigated topics include transport properties of gas mixtures in shale, adsorption mechanisms and solid-gas molecular interactions. The parameters affecting carbon dioxide enhanced methane recovery and co-adsorption of a mixture of carbon dioxide and methane still are investigated. A recent review of experimental results indicating trends on the adsorption capacity of CO2 and CH4 in shales is reported by Klewiah et al. [7]. The focus of this study will be on the correlation and prediction of adsorption of CH4-CO2 mixtures in shales.
The first studies of the shale gas adsorption were mainly about the mechanism of shale gas storage [17,18]. Furthermore, numerous investigations on the relation between adsorption and effective parameters of shale such as clay mineral content, organic matter, and pore structure were carried out. The previous investigations expressed that there is an approximately linear relationship between total organic carbon content (denoted TOC) of shale and methane adsorption ability [18,19]. The thermal maturity and organic matter type can also strongly affect methane adsorption capacity and the rate of adsorption [20,21]. Clay mineral content and their structure also have clear impacts on adsorption ability of shale [22,23]. Cheng et al. expressed that larger surface areas can enhance the methane adsorption in shale [24]. Bai et al. proposed a linear relationship between BET surface area and methane adsorption capacity [25]. Ross et al. concluded that as the micropore volume and organic carbon content increase, the methane adsorption capacity increases [6]. Weniger and coworkers developed correlations to estimate adsorption of carbon dioxide and methane for shales and coals in Brazil [26]. Chareonsuppanimit et al. investigated methane, carbon dioxide and nitrogen adsorption in shale. Their results shows the higher amount of adsorption for carbon dioxide [10]. Kang et al. studied the carbon dioxide storage capacity in shales in terms of pressure by considering adsorption and compressibility effects [5].
Artificial intelligence approaches have demonstrated extensive applications in interpretation and prediction [27][28][29]. In this work we seek to determine the relations controlling adsorption capacity of CO2 and CH4 in shale under various conditions. For doing so, we apply the Least Squares Support Vector Machine (LSSVM) algorithm as a predictive tool to forecast adsorption of methane and carbon dioxide binary mixtures adsorption in shale. This method is considered accurate and user-friendly. Optimal training parameters for the model were found using Particle Swarm Optimization (PSO). The algorithms are implemented in Matlab. Furthermore, sensitivity analysis carried out to identify the most influential of the input parameters on gas adsorption capacity.

2.1.Aims and workflow
The main purpose of the present research is the development of an LSSVM algorithm which optimized by the PSO algorithm can be used to predict adsorption capacity of carbon dioxide and methane when these gases are exposed to shale as pure component or binary mixtures. Based on the available dataset the input parameters are pressure, temperature, gas mole fraction of CO2 and TOC of shale as shown in Figure 1. Effectively this provides a multicomponent adsorption isotherm that does not assume a particular relation between sorption and pressure (which is the most frequent measurement while holding other parameters fixed) and also integrates the effect of the other stated parameters. The input data are normalized before implementing in LSSVM-PSO and after that outputs are realized and altered to adsorption capacity. The workflow of our approach is summarized in Figure 2 while more details are explained in the following sections.

Data Gathering
The consistency and efficient performance of any proposed model extensively depend on the quality of the dataset the model was trained from. A comprehensive dataset of experimental measurements generated by [30] was applied in this work; which includes 348 measurements each for carbon dioxide or methane adsorption capacity (measured in cm 3 /g) on Jurassic shale from the Qaidam Basin in China for different combinations of the four (input) parameters pressure, temperature, carbon dioxide mole fraction and TOC. The ranges of input variables are reported in Table 1. It is worth mentioning that Luo et al. [30] used van der Waals equation of state to determine absorbed gas density, a parameter difficult to measure experimentally, which was further used to calculate the excess adsorbed gas from the experimental data [30]. In order to prepare and test the machine learning models, the collected data were randomly divided into two groups, respectively for training and validation. Note that the terms 'validation' and 'testing' will be used interchangeably.

Identification of outliers
The utilized data points in preparation of prediction model can influence the predictive ability of proposed model. In analysis of large datasets, identifying outlier data is important to assess their representativeness. To this end, Leverage analysis was employed where the socalled Hat matrix is defined as follows [31,32]: where normalized residuals are plotted on the vertical axis against hat values on the x-axis. As indicated in Figure 3, three outlier points were pointed out by standard residuals being outside the range -3 to 3. A warning leverage value ( * ) was also included in the William's plot defined by: * = 3( + 1)/ No points had hat values above this threshold and hence were not given too much leverage in the data analysis. Hence, although three points appeared different from the rest, they or others did not have a strong influence on the model calibration compared to other data points. For simplicity all data were included.

Least square support vector machine (LSSVM)
The Support Vector Machine (SVM) approach was introduced by Vapnik et al. [33][34][35]  reformulating the problem in terms of linear, rather than non-linear, constraints on the function error terms. The LSSVM approach is outlined in the following based on the notation and derivations in [38,39]. A minimization problem is then formulated as: where ∈ is an objective function which should be minimized for a given weight vector , T denotes the transpose operation, ( = 1, . . , ) is the difference between output estimate ( ) ∈ and the true output ∈ (the output is the adsorption capacity in this case) and ∈ is a socalled regularization parameter (defining which term is more important to minimize). is the number of measurements. The estimated output of data point is defined by the weighted combination of the elements in a nonlinear function ( ) and a constant bias parameter . This function maps the input space (in our case a four dimensional vector with values of pressure, temperature, CO2 fraction and TOC) to a higher dimension with same order as . Note that is a useful temporary variable for the method derivation, but is not explicitly calculated, hence its dimension need not be specified. The true output is by definition equal to the estimate ( ) plus the error : This formulation of function estimation as an optimization (minimization) problem with linear constraints is characteristic for the LSSVM method. The problem can be reformulated to finding the saddle point of the Lagrange function: where are the socalled Lagrange multipliers and its associated terms correspond to the incorporation of the equality constraints in (4). The saddle point is found by setting partial derivatives of with respect to the parameters , , , to zero which implies that: = , ( = 1, . . , ) ( ) + + − = 0, ( = 1, . . , ) The set of equations to be solved can be expressed by: ( , ) is a kernel function which must be defined with some restrictions. With this function we effectively replace the need to define and compute . is symmetric and positive semi-definite which guarantees the existence of its inverse. For a defined kernel function ( , ) the optimal values of and the vector can be computed: Using that the weight vector is given by (4) we can update the output estimate ( ) in (4) for a general input vector as follows: One of the most popular choices of kernel functions applied in LSSVM, and also applied here, is the (Gaussian) Radial Basis function (see [36] for other choices): Here σ 2 indicates the squared bandwidth and ‖ − ‖ denotes the magnitude of the vector difference − . 2 and the regularization coefficient are both given parameters for an application of the LSSVM algorithm. To further optimize the performance of the LSSVM algorithm in terms of matching given data and prediction of remaining data; these two parameters were optimized using Particle Swarm Optimization as described in the following section.

Particle swarm optimization (PSO)
The assigned parameters , 2 in the structure of the LSSVM algorithm can have extensive impacts on the performance quality for prediction so it is required to determine these parameters carefully.
In the literature, different heuristic and analytical methods exist to lead predicting tools to their best structure. Among analytical approaches, gradient-based manners can be utilized to forecast these parameters, while in heuristic approaches, evolutionary algorithms like simulated annealing, particle swarm optimization (PSO), and genetic algorithms are utilized. In this study, PSO was applied. This algorithm has great ability to improve estimates of local and global optima [40,41] and is simple to implement.
In the PSO optimization algorithm a number of candidate solutions ( = [ , 2 ]), each called a particle, are initially randomly distributed in the search space and given initial velocities also randomly. The entire set of 'particles' is called the swarm. The solution estimate of particle corresponds to its position . At a given iteration the quality of the solution parameters for each particle is calculated; This was quantified by the Root Mean Square Error (RMSE) evaluated over the training data when the optimized LSSVM algorithm applied the 'particle' position values of and 2 . The best solution position for every individual particle (throughout its movements / iterations) and the best solution position of all the particles (the swarm) are kept track of and denoted and , respectively. New velocities are then calculated for each particle as a function of the previous velocity , and based on how far the particle is from its historic best position and from the swarm's historic best position [42][43][44][45][46][47]: = + Here 1 and 2 are known as the acceleration constants, stating how quickly the particle will be steered towards the two currently best positions. is the uniform distribution function and assigns a random value between 0 and the indicated acceleration constant. It has the role of applying stochastic weight in the particle velocity updating. is the socalled inertia weight and is used for controlling the search velocity balancing between global exploration (large steps to cover the search space) and local optimization (small steps near the optimum). This parameter is reduced with increasing iteration number as calculated below: Here and indicate the inertia weight at the new and old iteration step, respectively, and is the damping factor at which is reduced at every iteration. Particle position updates may in some cases fall outside of the defined search space. This is solved by randomly assigning a new position to an 'exiting' particle to a new location in the search space as follows: = ( , − , ) (0,1) + . , ( = 1, … ,4) Here and indicate the vectors with maximum and minimum values of the parameter range of the search space.
Briefly summarized, the optimization process can be described in the following steps: (1) First assign initial values of positions and velocities, after that, the inertia weight, learning parameters and maximum number of iteration should be selected.
(2) The fitness of particles should be computed.
(3) The position and speed of particles should be updated.
(4) The stop condition will be checked. If the assumed condition is obtained, the loop is terminated, otherwise, the loop will be continued.

Assessment of matching quality
Some statistical approaches described in the following formulations were utilized to evaluate the explanatory power (quality of match) between the observed data and the simulated data = ( ).
1. Coefficient of determination ( 2 ): 2. Mean relative error (MRE): 3. Root mean squared error (RMSE): 4. Standard deviation of the error ( ): In the above formulas, denotes the mean of observed data and ̅ the mean error between estimated and observed data.

Results and discussion
In this section results are presented for CO2 adsorption as function of the four input parameters temperature, pressure, CO2 composition and TOC. The analysis for CH4 adsorption is completely similar and the results are summarized in the Appendix.

Parameter relevance for prediction
The impact of the four input parameters on CO2 adsorption were first assessed by the Pearson correlation coefficient , also termed the relevancy factor. indicates the impact of input parameter in the vector on the output and is defined as follows [48,49]: The factor is in the range of -1 to 1. Higher absolute value expresses stronger correlation, and a positive or negative sign indicates whether the dependent parameter increases or decreases with variations in , respectively. The determined relevancy factors for each input parameter for impact on CO2 adsorption are shown in Figure 4. The figure expresses that all four parameters contribute to affect adsorption capacity, although the TOC content is less significant compared to the others (~0.2). Pressure dependence appears most significant (~0.7), followed by CO2 composition (~0.4) and temperature (~− 0.4). In accordance with findings in the literature, adsorption is negatively correlated only with temperature. It should be noted that the weak correlation to TOC could be due to a lack of measurements of this parameter. The reported dataset were classified into three shale types where only one value of TOC was assigned to each shale type. Although this value can vary from sample to sample and usually is considered strongly linked to adsorption capacity the statistical variation of this parameter was not reported in the dataset. We also note that the relevancy factor calculation assumes a linear dependence between the parameters, which is generally not the case. Hence, a higher correlation should be expected when non-linear functional dependence is introduced via the LSSVM algorithm.

Model calibration and validation
After PSO optimization, the LSSVM model had been parameterized and calibrated to the training dataset. The parameters applied in the optimized LSSVM model (particularly including the band width 2 and regularization coefficient ) are listed in Table 2, while details related to the PSO algorithm (such as the number of particles, number of iterations, acceleration constants 1 , 2 , inertia weight and damping factor ) are listed in Table 3. By applying the latter parameters in the PSO algorithm the former parameters were obtained for the LSSVM algorithm, defining the optimum LSSVM structure for describing the data. A graphical comparison between forecasted and actual adsorption values of CO2 is shown in Figure 5. %, for adsorption capacity from 1 to 12 cm 3 /g, but higher deviation is seen at lower adsorption capacities. More uniform deviations could be obtained by minimizing the relative (rather than absolute) errors.
After the visual comparison, a quantitative comparison can be useful. A summary of the aforementioned statistical indices (STD, RMSE, MRE and R 2 defined in (19) to (22)) is given in Although the performance naturally is better for the trained set, both sets have very high and comparable performance and especially the model is able to predict data in the validation dataset. system as function of pressure and CO2 mole fraction is shown in Figure 9 for shale 1, shale 2, and shale 3 at constant temperature conditions of 308 K. Also here good agreement between model and experiments is demonstrated, although higher deviations can be seen in this case, with differences up to 0.5 cm 3 /g for some points. Furthermore, the overall trends of adsorption capacity of CO2 in terms of gas composition, pressure and temperature show reasonable variations based on the sensitivity analysis.

Prediction
A significant strength of the LSSVM model is that it can be calibrated to match large datasets of non-linearly correlated data and then be used to predict behavior under new conditions. For demonstration, the LSSVM model is hence used for prediction of CO2 adsorption in binary gas systems vs pressure at two temperatures of 318 and 338 K as shown in Figure 10 and Figure 11.
This is demonstrated for all three shale types. Smooth trends are obtained where increased pressure and increased CO2 fractions both lead to higher adsorption of CO2.

Conclusions
In this paper the machine learning algorithm Least Squares Support Vector Machine (LSSVM) was applied, optimized by Particle Swarm Optimization (PSO), to correlate CO2 adsorption in shale from pure or binary CO2-CH4 gas mixtures as function of pressure, temperature, CO2 gas fraction and Total Organic Carbon (TOC). A total of 348 data points (combinations of these four inputs and the adsorption capacity) were used with 80 % for calibration and the remaining 20 % for validation. The same methodology was applied to CH4 adsorption. The obtained capacity values were compared with the experimental values for both the training and validation datasets by graphical and statistical methods. Coefficients of determination of 2~0 .998 between experimental and calculated adsorption were obtained for both datasets. The applied dataset did not contain a sufficiently detailed reporting of TOC to reliably estimate the impact of this parameter. We also note that this model was parameterized against a dataset from the Qaidam Basin in China and generalization to other shales is uncertain.
The proposed algorithm has demonstrated an ability to match and predict gas adsorption capacity for CO2 as discussed in the main text of the paper. For the interested reader, in this Appendix the analysis is repeated for the corresponding experimental results of CH4 from the same experimental dataset. The parameters used to optimize and calibrate the LSSVM and PSO algorithms are listed in Table 2 and Table 3, respectively. Figure A1 compares model performance against each data point for the training and test dataset, while Figure A2 directly compares modeled vs predicted adsorption and Figure A3 shows the relative deviation. A quantitative summary of the LSSVM-PSO model performance for CH4 adsorption is given in Table 4. Figure A4 presents CH4 modeled and experimental adsorption vs pressure at different temperatures and shale types when pure CH4 is exposed to shale and Figure A5 presents CH4 adsorption vs pressure at different gas compositions and shale types for the same temperature of 308 K. The model is used to extrapolate the adsorption behavior to mixed gas composition at higher temperatures in Figure A6 (318 K) and Figure A7 (338 K).