A data-driven model to estimate the pore volume to breakthrough for carbonate acidizing

This research investigates the impact of different rock, acid, and reaction dynamic properties on the pore volume to breakthough (PVBT) at different acid injection rates using in-house developed two-scale continuum simulation model. We analyzed the parameters relation and developed a reliable machine learning model to accurately predict the PVBT at similar range of investigated parameters. In the simulation, it was found that different acid concentrations result in the same optimum injection velocity but at large contrast in PVBT between low and high acid concentration. However, other parameters such as diffusion coefficient and reaction rate exhibited an inverse PVBT behavior across optimum injection velocity due to change in acid transport and reaction behavior. After that, different reliable machine learning algorithms were employed to predict the optimum PVBT for carbonates matrix acidizing. The utilized machine learning models undergone multiple optimizations and comparison to obtain the most accurate prediction performance. The artificial neural network model with 2 hidden layers outperforms the other optimizations with 11.27% estimation error, 0.96 R2 and 0.98 correlation coefficient for the testing data set. Finally, an empirical correlation was developed to accurately estimate PVBT at a low cost and very short time compared to lab experiments and numerical simulation models. The novelty of this research stems from examining PVBT curves behavior by varying five matrix acidizing parameters independently, analyzing the correlation between these parameters and developing machine learning model for handy and reliable optimum PVBT estimation.


Introduction
Matrix stimulation is deemed one of the most effective and reliable techniques in well-performance enhancement (Economides and Nolte 2000;Shafiq and Mahmud 2017). Its use in the oil and gas industry goes back to the late nineteenth century and was further adopted after the introduction of corrosion inhibitors in the 1930s (Economides and Nolte 2000;Leong and ben Mahmud 2019). Matrix stimulation involves the injection of acid solution in the formation at injection pressure lower than formation fracture pressure to dissolve the rock minerals, thereby creating larger flow channels to bypass the damaged zone (Rae and di Lullo 2003;Garrouch and Jennings 2017;Khalil et al. 2020). The acid reaction with limestone results in unstable dissolution fronts, known as wormholes, that preferentially propagate in the direction of the largest pores as acid flows (Golfier et al. 2001;Hao et al. 2013; Leong and ben Mahmud 2019). The most common and cheapest type of acid used in the industry is hydrochloric acid (HCl), which is usually selected for calcite and dolomite stimulation (Chand et al. 2009;Shukla et al. 2011). The reaction process between rock and acid is controlled by two limiting factors (Egiebor and Oni 2007;Mahmoud et al. 2020). First, if the reaction of the acid with the rock surface is much faster than the acid diffusion from the bulk to the rock, then the reaction is called mass-transfer limited since the slowest step, in this case, is the acid transfer (Hoefner et al. 1987;Elsayed et al. 2020). This is usually the case for HCl acid reactions with calcite (Khalil et al. 2020). On the contrary, if the slowest step is the reaction kinetics, then the reaction is deemed as reaction-rate limited, which is a common trait for weak acid reactions at low temperatures .
A good understanding of different acid/rock reaction behavior on a small scale is key to optimizing the acidizing design in field scales (Egiebor and Oni 2007;Khalil et al. 2020). Reaschers like (Buijse and Glasbergen 2005;Economides 2014) have developed empirical models to study the radial distance of wormhole propagation in field-scale cases. These models, in essence, are developed based on inputs from lab-scale experiments such as pore volume to break through (PVBT). PVBT is a measure of the acid volume needed to create a continuous wormhole channel across a lab scale core in term of core pore volume.
Several laboratory experiments and numerical simulation studies were established to examine factors that impact wormhole propagation patterns and optimum acid volume consumption to achieve long wormhole length (Golfier et al. 2002). These approaches helped in providing an insightful understanding of the acidizing process. For instance, (Wang et al. 1993;Fredd and Fogler 1999;Bazin 2001) performed core flooding experiments to investigate the effect of different acid concentrations and core lengths on optimum PVBT. They concluded that higher acid concentration corresponds to a higher optimum flow rate and lower PVBT (Bazin 2001). Also, experiments revealed an increase in optimum flow rate and PVBT at longer cores (Selvaraj et al. 2020). However, the acidizing curves seem to converge for the different core length at higher injection rates which indicates that PVBT is no longer dependent on the core length at high flow rates. (Ziauddin and Bize 2007) performed comprehensive experiments on eight different limestone rocks to study the effect of porosity distribution, density, and permeability on the dissolution pattern. They observed that cores with well-connected pore (grain-dominated) samples have lower wormhole velocity than poorly connected pores (moldic) samples. They attribute this observation to the flow of acid through a smaller fraction of pores in moldic samples. This consequently results in lower PVBT with a higher optimum injection rate for the poorly connected pore rocks.
The two-scale continuum model is a numerical approach to estimate the PVBT. (Akanni and Nasr-El-Din 2016) modified the two-scale continuum model to simulate wormhole propagation in calcite rocks using alternative acids, namely acetic acid and chelating agents ethylenediaminetetraacetic acid (EDTA) and diethylenediaminetetraacetic acid (DTPA). They simulated PVBT and optimum injection rates of acetic acid at pH 2.5 and 4.6. The authors reported that PVBT increase was observed in general in acetic acid compared to HCl. Also, the PVBT with acetic acid stimulation increased as pH increased from 2.65 to 4.6. Moreover, the injection rate where lowest PVBT was achieved was much lower than acetic acid compared to HCl acid. This is mainly due to the lower reaction rate constant in acetic acid which dictates longer contact period between the rock and acid. Also, (Maheshwari et al. 2013) developed a 3D model to simulate dissolution and the sensitivity to different parameters, one of which is permeability. The permeability influence analysis was simulated at variant permeability values, but porosity distribution was fixed. The simulations show that the optimum injection rate is negatively proportional to the permeability. However, the PVBT increases with permeability.
Different approaches have been used to examine the performance of acidizing treatment and estimate the optimum acid volumes (Selvaraj et al. 2020;Mehrzad et al. 2022). The experimental techniques and simulation models can be resourceful but not readily accessible. Hence, the utilization of machine learning techniques can help in providing fast and meaningful analysis for informed decision-making in matrix acidizing which is the focus of this study. In this work, the two-scale continuum model was used to estimate the PVBT for a wide range of rock and acid properties. Also, a sensitivity analysis was carried out to understand the influence of acid concentration, rock porosity, and acid type on the required acid volume. Moreover, different machine learning (ML) techniques were implemented to estimate the PVBT based on the rock properties and treatment parameters. Finally, a new correlation is proposed based on the optimized ANN model to allow fast and direct estimation for the PVBT.

Model development
A two-scale continuum model developed in MATLAB software was used to study the PVBT sensitivities. The two-scale continuum model consists of a group of continuum formulations for fluid flow in porous media and reaction kinetics at Darcy scale coupled with pore-scale properties evolution at the pore scale as the acid/rock reaction takes place Aljawad et al. 2021). The fluid flow in porous media is governed by mass conservation (the continuity equation) and Darcy's Law as follows: Mass Conservation: where is the fluid density, u is the fluid velocity vector, is the porosity, and t is the simulation time. The fluid superficial velocity can be obtained through Darcy's law: where k is the permeability tensor and P is the pressure. After that, the system reactive transport equation is defined as: where C A is the acid concentration, D e is the acid diffusion coefficient, a v rock reaction-specific area, R is the reaction rate, and C s is the acid concentration at the acid/rock interface. The reaction rate of HCl acid with calcite is a firstorder reaction and is defined as: where k r is the reaction rate constant. At steady-state reaction, the reaction rate is the mass transfer rate and can be written as: where k c is the mass transfer coefficient. Solving (7) for C s and substituting in (6) gives: In acid/mineral reaction at Darcy scale, the effective acid reaction coefficient (k eff ) determines whether the reaction is mass transfer limited or reaction rate limited. The effective acid reaction coefficient is defined as: In reaction rate limited reaction, k r is very small, and the term in dominator reduces to k c so that k eff equals k r and vice versa for mass transfer limited reaction.
As the acid reacts with the rock, porosity evolution over simulation time is given by: where B 100 is the dissolving power of acid and m is the rock mineral density. The acid-dissolving power can be calculated as: where m and c are the stoichiometric coefficients for mineral and acid, respectively, MW m is the mineral molecular weight, and MW a is the acid's molecular weight. The flow and reaction governing equations explained so far are solved constitutively. First, the pressure profile is obtained from the continuity equation and Darcy's law. Then, the acid concentration profile is obtained from Eq. 5 and the rock porosity change can be estimated by Eq. 10. Hence, the mass transfer coefficient ( k c ) can be calculated through a set of dimensionless equations as the following: where Sh ∞ is the asymptotic Sherwood number and b is a constant.

Reynold's Number of Flow in Pores (Re p ):
where r p is the pore radius.

Schmidt Number (Sc):
where r p is the pore radius.

Molecular Diffusion (D m ):
Mass Transfer Coefficient (k c ): The acid reaction occurs in pore-scale properties, namely, porosity ( ), pore radius ( r p ), permeability ( k ), and rock reaction-specific area ( a v ) change. The following equations are used to update these properties: where , , and are empirical parameters that can be adjusted to match simulation results with experimental pressure drop, PVBT, and wormhole pattern.
The pressure distribution can be obtained by applying the following initial and boundary condi- where p i is the initial pressure, q is the acid injection rate, B is the formation volume factor, A c is grid cross-sectional area, L is the core length, p out is the core outlet pressure and n is the normal vector.
Finally, the acid distribution is obtained through the following initial and boundary conditions: where C ini is the initial acid concentration.

Model set-up
The fine-scale 2D model was set in MATLAB software with a 300 (L x ) × 300 (L y ) grid block size, with each element containing solid and liquid fractions represented by the porosity. The system of differential equations explained in the methodology section is solved in small time steps until a breakthrough is accomplished, as presented by the model algorithm in Fig. 1. The system declares breakthrough occurrence when the differential pressure across the core is less than a certain value. This happens when a highly permeable path, i.e., a wormhole, is created along the core length. The breakthrough differential pressure assigned to the system is 0.1 psi, below which the model declares breakthrough and prints the results. Gaussian number generator is utilized to randomly distribute heterogenous properties such as porosity, pore size, and permeability across the domain with property mean, standard deviation, and correlation length values in the domain directions. Also, the system pressure was maintained at 2000 psi to keep evolved CO 2 in the solution. The simulation scenarios were performed to examine the HCl acid reaction with limestone rock under a linear flow regime. Several parameters were individually simulated to study their impact on the matrix acidizing wormhole propagation, distribution, and optimum PVBT. The studied parameters are classified into core properties related to acid and reaction kinetics related. Each parameter effect on wormhole propagation and breakthrough was tested under ten different injection rates ranging from 0.1to 10 cc/min. The base case core properties, system conditions, fluid properties, and reaction parameters used in the simulation, as well as model statistical parameters, are presented in Table 1. Also, the initial porosity and permeability distribution across the model domain is shown in Fig. 2.
It is worth highlighting that although some of the parameters are strongly correlated, such as permeability and porosity, temperature, and reaction kinetics, the research intends to capitalize on the model's capability to dissect the influence of each parameter which is virtually unachievable in

PVBT estimation using machine learning tools
The second objective of this work is to use machine learning techniques to develop a model that can estimate matrix acidizing PVBT in carbonate formation. Three machine learning models were used for this purpose, including artificial neural network (ANN), fuzzy logic (FL), and support vector machine (SVM). These three ML models are widely adopted in machine learning functions and were particularly selected because they have shown the most reliable and accurate prediction performance among other ML models. In addition, mathematical correlations can be extracted from the non-linear regression process in these models. The ML study was conducted by performing exploratory data analysis, then, the relative importance between the inputs and output was examined. Finally, multiple statistical quality analysis tools were used to infer the performance of selected ML techniques.

Exploratory data analysis
A range of seven different rock and acid properties and reaction kinetics aspects were studied to examine their influence on the PVBT in matrix acidizing. More than 700 scenarios were generated from the two continuum scale model simulations. The data were carefully analyzed to eliminate redundant, outliers, and noisy data points to increase ML models' prediction accuracy and avoid underfitting or overfitting. Hence, 40 data points were removed, and a total of 660 remaining scenarios will be utilized for ML models. After that, a statistical analysis of the dataset was carried out. Statistical analysis is essential in machine learning model development and defining their applicability limitations for better predictive performance. The statistical analysis includes determining minimum and maximum values of the input parameter, their mean, range, variance, standard deviation, skewness, etc., as presented in Table 2. The resultant PVBT changes from 0.9 to 43.19. It is worth mentioning that the generated statistical analysis values are quite reasonable and representative of different matrix acidizing conditions.

Relative importance
Relative importance assessment between input data and output is vital to developing a robust and optimized machine learning model. This is because machine learning models are data-driven, and their prediction performance is dependent on the number of input parameters and the impact on output values. Incorporating too many parameters can lead to a long computational time and a higher percentage of error. Three correlation coefficients were used to examine the relative importance assessment in this study (Fig. 3), namely Pearson's correlation coefficient by the linear regression model, Spearman's correlation coefficient, and Kendall's correlation coefficient.
The correlation coefficient between two variables falls in the range between "− 1" and "1" where -1 indicates that the input and output variables have a strong opposite relationship and 1 means that they have a strong direct relationship. As the correlation coefficient value approaches zero, the pair of variables become unrelated. The relative importance analysis showcases that injection velocity, acid concentration and reaction rate constant have more significant  impact on the PVBT compared to acid diffusion coefficient and core length and porosity. These parameters can be controlled by the engineer designing the stimulation job, such as the pumping rate, the concentration of acid, and the type of acid (i.e., reaction rate constant). Ideally, the engineer should select the optimum injection rate, concentration, and acid type to reduce the PVBT.

Error metrics
Multiple statistical quality analysis tools will be used to infer the performance of selected ML techniques. The evaluation indices that will be used are Average Absolute Percentage Error (AAPE), Average Absolute Deviation (AAD), the previously defined Pearson's Correlation Coefficient (CC), and Coefficient of Determination (R 2 ). The comparison between the accuracy of ML models will be based on these evaluation indices. Representing the departure of predicted values from the actual ones is an essential indicator of model reliability as the lower AAD, and higher CC indicates better prediction performance. The AAPE is defined as: And AAD is defined as: where N is the total number of samples, Yp is the predicted value, and Ya stands for the actual value. (20)

The impact of rock and acid properties on the PVBT
The impact of several parameters on the optimum acid volume was studied. The selectivity analysis can help in determining the optimum range for different parameters in order to maximize the performance of the acidizing treatment. In this section, the impact of core length, rock porosity, acid concentration, and acid type (diffusion coefficient and reaction constant) on the PVBT will be presented.

Core length
The first parameter examined with the two-scale continuum model was the core length. The acid efficiency curves of five different core lengths are presented in Fig. 4. The figure shows longer cores correspond to higher PVBT of acid, especially at flow velocities lower than optimum, i.e., face dissolution and conical dissolution patterns. This is because the cumulative volume of pore spaces is larger in longer cores considering similar core porosity. Moreover, a stepchange in optimum injection velocity is observed among the selected range where longer cores exhibited higher optimum velocity. For example, the optimum PVBT in the 3-inch-long core was recorded at 0.797 cm/min injection velocity with 1.4 PVBT, while it is 2.99 cm/min with 2.009 PVBT in the 16.5-inch-long core. In addition, Fig. 5 reveals that acid efficiency curves converge after the optimum PVBT, and it becomes independent of the core length at injection rates higher than the dominant dissolution pattern rates. The curves seem to be converging post the optimum PVBT. (Bazin 2001) argues that this is only valid for the mass-transferred limited reaction regime. These findings agree with (Bazin 2001) and (Dong 2012) experimental work as well as (Jia et al. 2021) two-scale continuum model findings.

Acid concentration
One of the matrix acidizing efficiency key factors is the concentration of acid used. Hence, changing the acid concentration would result in different densities and diffusion 3" Long Core 16" Long Core   Fig. 6. Figure 7 reveals that the optimum injection rate is independent of the acid concentration, where all concentrations showed the same rate at the optimum PVBT. Furthermore, it can be observed that using acid concentrations higher than 20% would not result in substantial volume optimization and becomes a subject of return on investment. It is worth highlighting that these findings do not agree with experimental work done by (Bazin 2001), where PVBT was observed to be decreasing and optimum injection velocity increasing at higher acid concentrations. Nonetheless, the findings in this work agree with the experimental results of (Furui et al. 2012) and continuum scale model simulation results by (Jia et al. 2021). Analyzing wormhole propagation profiles in Fig. 7, we find no major differences in terms of wormhole shapes and thickness at low and higher acid concentrations, except that small discontinued wormholes appear at 28% acid concentration.

Rock porosity
Initial porosity is another rock feature examined in this study. The porosity was set across the domain using Gaussian distribution to yield a 22% mean porosity value in the base case run. Then, a wide range (6-25%) of porosity values was examined. The acidizing curves generated are depicted in Fig. 8. It is worth highlighting that the injection rates are used to plot the curves as independent of the porosity since interstitial velocity is a function of medium porosity. Moreover, several parameters that are a function of porosity, such as permeability, pore radius, and surface area, were assumed constant in this study to investigate each parameter individually.
In general, the least PVBT is achieved at lower injection rates as the porosity increases. On the contrary, better acidizing performance in terms of PVBT is accomplished at the lower porosity domain. In addition, it was found that less

Fig. 9
Wormhole profile for different rock porosity at optimum PVBT acid consumption took place at a higher porosity medium in the face dissolution and conical wormhole regimes. This is attributed to the fact that mineral/bulk density in lower porosity rocks is higher. Hence, a larger volume of acid is needed in the face dissolution regime. However, the acid consumption behavior flips as the injection rates approach the dominant wormhole propagation since acid leak-off and density of wormholes increases in more porous rock at higher injection rates. This is also shown in Fig. 9, where porosity and concentration profiles are compared between 6%, 16%, and 25% porosity domains. Similarly, more acid volume is required for higher porosity at very high injection rates as acid is lost in pores before achieving the breakthrough. The numerical study conducted by (Kalia and Balakotaiah 2009) has shown similar observations.

Diffusion coefficient
The acid molecular diffusion coefficient role cannot be underestimated since it controls the acid mass transfer in the acidizing process, as stated in Eq. (16). Diffusion coefficient value can vary from an order of 10 -9 cm 2 /s for organic acids and an order of 10 -7 cm 2 /s for emulsified acids to the order of 10 -5 for the straight HCl acid. In this study, a range of 10 -6 -10 -4 cm 2 /s values was examined to assess the importance of diffusion coefficient on matrix acidizing and PVBT, as shown in Fig. 10. The results clearly show a reduction in PVBT for low diffusion coefficient values at low injection rates. At Darcy scale, acid transport is governed by diffusion and convection. When acid is injected at low rates, acid transport from the bulk phase to the rock surface is dominated by diffusion. This is in addition to the fact that the mass transfer rate is higher at a higher diffusion coefficient. Hence, higher values of diffusion coefficient led to a reverse effect with more acid consumption and thicker wormholes in the acidizing process. On the other end, the acid transport shifts to being convection dominated at higher injection rates which results in both higher optimum injection velocity and lower PVBT for high diffusion coefficient values.
Researchers have been studying the relation acid transport and reaction through Damkohler number. (Panga et al. 2005) defined it as the rate of acid dissolution to the rate of acid transport by convection. Damkohler number has an inverse relationship with injection velocity, and some researchers present the acid efficiency curve as a function of the Damkohler number. It is worth explaining that the rate of dissolution is the mass transfer rate in mass-transfer limited systems while it is the surface reaction rate in reaction rate limited systems. Different researchers have highlighted, based on experiments, that the Damkohler number at which the optimum PVBT takes place around 0.29. (Fredd and Scott Fogler 1998) presented the Damkohler number formula for mass transfer limited reaction as: The comparison between high and low diffusion coefficients wormhole profile at optimum PVBT in Fig. 11 shows less dissolved core inlet and thinner wormhole at higher diffusion value.

Reaction rate constant.
A thorough analysis of reaction rate constant influence is another merit that the two-scale continuum model offers over experimental studies. Determining the reaction rate constant is not straightforward in acid/rock reactions because the reaction rate is affected by the transport of acid from bulk to the rock surface, the reaction with rock, and the transport of reaction products. In addition, multiple chemical reactions take place at the same time in the acidizing matrix process (Jia et al. 2021). This indicates that the reaction rate constant relation with acid prosperities is not very well established in the literature. Also, several researchers have reported different reaction rate constant values for HCl acid under different conditions. The range used in this study varies from 1.5 × 10 -5 to 1.1 × 10 -5 m/s. The simulation results are presented in Fig. 12.
The results show an inverse relationship between the PVBT and reaction rate constant. This more evident at values between 1.5 × 10 -5 and 1.5 × 10 -4 m/s. The shift in the acidizing curve as the reaction rate constant value increases is theoretically explained by Eq. (9) in this study. The increase in the reaction rate constant almost role out the value of k c in the dominator and k r in the numerator and the dominator cancels out. Hence the reaction tends to be mass transfer limited and vice versa for high mass transfer coefficient values where the reaction moves to be reaction rate limited. Therefore, it is obvious that dissolution at 1.5 × 10 -5 m/s is a reaction rate limited. Subsequently, a higher optimum PVBT is expected. In contrast, the acid/rock reaction moves to transition at 1.5 × 10 -4 m/s and optimum drops at higher injection rates. Finally, the reaction shifts to mass transfer limited at higher k r values, which is also translated into lower PVBT at even higher rates. It should be the role of increasing k r above 1.5 × 10 -3 m/s fades out as the reaction is already mass transfer limited. This behavior of acidizing efficacy change due to reaction rate constant was reported by (Fredd and Scott Fogler 1998), where experiments on five types of acids were carried out to evaluate the reaction rate impact on the acidizing process. The superiority of the higher reaction rate constant is obvious in Fig. 13, where a concentration profile of 1.5 × 10 -2 m/s showed two discontinued and one dominant thin wormhole compared to 1.5 × 10 -5 m/s, which revealed virtual face dissolution at the core inlet domain and two thick and branched wormholes.

ML techniques implementation
After performing data exploratory analysis and exploring the relative importance between the input dataset and target output, this section is discussing Artificial Neural Network (ANN), Fuzzy Logic (FL), and Support Vector Machine (SVM) utilization to develop a robust model to predict PVBT in calcite matrix stimulation. After optimizing and tuning the ML models, the best performance will be judged based on the pre-defined error metrics, i.e., lowest AAPE and AAD and highest CC and R 2 . Generally, the data will be randomly divided into two sets (training and testing). The ratio of data division will be 70:30; i.e., 70% of the input data will be used to train the ML models while the remaining 30% will be kept hidden to test the model after completing the training phase. It is worth mentioning that the 70:30 data division ratio is common across machine learning-related research.

Artificial neural network (ANN) predictive model
Feed-forward ANN model was built to predict PVBT as a function of multiple rock aspects, fluid properties, and reaction kinetics. The ANN model has three main layers; the input layer which has model input data and features, a hidden layer with neurons, and the output layer the is composed of the PVBT. It is worth highlighting that a transfer function is used between the input and hidden layer. The ANN model was set with two neurons as a starting point. The optimum number of hidden layers and associated neurons was based on trial and error. Also, for training and transfer between input and hidden layers, Levenberg-Marquardt (trainlm) and hyperbolic tangent sigmoid (tansig) were used, respectively for the optimum model due to better prediction performance and faster calculations. The model optimizations were performed by varying the number of hidden layers and neurons in the hidden layers to obtain the best performance based on the error metrics. The first ANN model involved one hidden layer with testing different numbers of neurons between 2 and 10 for optimization. In contrast, the second ANN model was developed with two hidden layers and a various number of neurons for testing. In addition, the learning rate and maximum iterations used were 0.12 and 1000, respectively, in all developed ANN models. The PVBT estimation results in the testing stage of the one hidden layer model are shown in Table 3.
Assessing the evaluation indices, the one hidden layer with the 10 neurons model seems to outperform the other models in comparison. Similarly, the 2 hidden layer model optimization results are presented in Table 4.  The two hidden layers model and 10 and 5 neurons marked the best prediction performance with the highest R 2 and CC and the lowest AAPE and AAD at the same time. Hence, this particular model is considered the optimum model developed with the ANN technique. The performance of the developed ANN model in the training and testing phases is presented in Fig. 14. The curves prove that no model memorization took place as the validation errors are always higher than the training errors. Moreover, the best validation performance was obtained at a minimum error of 2.2263 from testing data at 25 epochs.
In addition, the learning rate and maximum iterations used were 0.12 and 1000, respectively.
The PVBT prediction performance of the optimum ANN model during training and testing stages versus actual values is shown in Figs. 15 and 16, respectively.
The error distribution of the optimum ANN model is exhibited in Fig. 17. It is worth noting that all studied cases showed an error percentage of less than 50%. The sample that showed an error percentage of less than 5% represents around 80% of the dataset. On the contrary, samples that showed an error percentage higher than 10% are less than 5% of the total data which proves the robustness of the developed model.

Fuzzy logic (FL) and support vector machine (SVM) predictive models
Fuzzy logic (FL) prediction model was also examined for PVBT prediction. The FL model was set up using Sugeno Fuzzy Inference System (FIS) where an adaptive neuron-fuzzy inference system (ANFIS) is applied to estimate PVBT. Also, the data clustering technique chosen to build the model was subtractive clustering (genfis 2). Different values of clustering radius were used from 0.2 to 0.9 to optimize model estimation and minimize performance errors. The highest coefficient of determination (R 2 ) obtained at clustering radius 0.2, which was 0.61. The last ML technique used to predict matrix acidizing PVBT was the support vector machine. This ML technique's performance accuracy is dictated by kernel functions in the training stage. In this work, several optimizations were carried out at multiple kernel function values of Gaussian type ranging from 1.9 to 10. Out of the examined SVM models, the model with a kernel option of 4.5 exhibited the best performance with R 2 of 0.31 in contrast to the other kernel function values.
The comparison shows that the ANN model resulted in lower AAPE by five folds compared to the other ML techniques. Moreover, the AAD value for ANN is extremely lower than the AAD results for FL and SVM. On the other hand, ANN marked a high performance in terms of CC with 0.98. A comparison between actual versus predicted PVBT in the testing phase for the three ML models is demonstrated in Fig. 18.
Overall, ANN model generated better PVBT prediction performance over the FL and SVM and hence it increases the confidence in ML utilization for in this work.

Correlation development
A new empirical correlation was developed based on the optimized ANN model; the proposed equation can help in estimating the required acid volume without the need of using the ANN code. The required inputs to estimate the PVBT are interstitial velocity, acid concentration, diffusion coefficient, permeability, and acid density using Eq. (23). The weights and biases required for estimating the acid volume are provided in Table 5.
where PVBT is the required acid volume to achieve the breakthrough, N is the total number of neurons, w1 and w2 1 + e −2(w 1i,1 X j n +b 1i ) + b 2 are the weights of the input and output layers respectively, X j n represents the normalized model inputs, and b1 and b2 are the biases for the input and output layers, respectively. Importantly, the range of applicability of this correlation is provided in Table 5. Also, all inputs used in the equation should be used as normalized values ( X j n ). The minimum ( X j min ) and maximum ( X j max ) can be used to determine the normalized values for each input, as given by Eq. (25). The minimum ( X j min ) and maximum ( X j max ) values for all parameters are provided in Table 5.
(24) X j n = 2 × X j − X j min X j max − X j min − 1

Conclusions
• This research has provided an understanding of multiple, rock, acid, and reaction kinetics properties that impact calcite matrix acidizing with HCl acid. • Two scale continuum model was constructed to simulate the influence of these parameters on the pore volume to breakthrough (PVBT) and wormhole patterns. • The highlighted findings were benchmarked against literature lab experiments and other simulation work conducted by other researchers. • It was found that injection velocity, acid concentration and reaction rate constant have more significant impact on the PVBT compared to acid diffusion coefficient, core length, and porosity. • The investigation has led to the generation of more than 1,000 data points. Data cleaning and exploratory analysis were conducted to figure out data correlation and distribution. Then, three different machine learning techniques were utilized to predict PVBT: artificial neural network (ANN), fuzzy logic (FL), and support vector machine (SVM). • Machine learning model optimizations and comparison showed that the ANN model with 2 hidden layers outperforms the other optimizations with 11.27% average absolute percentage error (AAPE), 0.41 average absolute deviation (AAD), 0.98 correlation coefficient (CC), and 0.96 R 2 in the testing phase. • An empirical correlation was developed to accurately estimate PVBT at a low cost and very short time compared to lab experiments and numerical simulation models.