Physics-Informed Neural Networks for Multi-Phase Flow in Porous Media Considering Dual Shocks and Interphase Solubility

Physics-Informed Neural Networks (PINNs) integrate physical principles into machine learning, finding wide applications in various science and engineering fields. However, solving nonlinear hyperbolic partial differential equations (PDEs) with PINNs presents challenges due to inherent discontinuities in the solutions. This is particularly true for the Buckley-Leverett (B-L) equation, a key model for multi-phase fluid flow in porous media. In this paper, we demonstrate that PINNs, in conjunction with Welge's Construction, can achieve superior precision in handling the B-L equations in different scenarios including one shock and one rarefaction wave, two shocks connected by a rarefaction wave traveling in the same direction, and two shocks connected by a rarefaction wave traveling in opposite directions. Our approach accounts for variations in fluid mobility, fluid solubility, and gravity effects, with applications in modeling 1D water flooding, polymer flooding, gravitational flow, and CO$_2$ injection into saline aquifers. Additionally, we applied PINNs to inverse problems to estimate multiple PDE parameters from observed data, demonstrating robustness under conditions of slight scarcity and up to 5% impurity of labeled data, as well as shortages in collocation data.


Introduction
Physics-Informed Machine Learning (PIML) bridges the gap between data-driven and physics-based approaches by integrating physical principles into machine learning (ML).PIML methods include training with synthetic data, post-processing to filter non-physical solutions, transfer learning, customizing neural network architectures to enforce physical constraints, and encoding governing physical laws into the loss function (Latrach et al., 2023).PIML has various applications in subsurface energy, such as autonomous directional drilling (Kesireddy et al., 2023), Pressure Transient Analysis (PTA) (Badawi and Gildin, 2023), geoscience data interpretation, production forecasting, reservoir characterization, and Carbon Capture, Utilization, and Storage (CCUS) simulations (Wang and Chen, 2023;Latrach et al., 2023).
Among PIML methodologies, Physics-Informed Neural Networks (PINNs) (Raissi et al., 2019) stand out for their explicit integration of physics equations into the loss function of the neural network, steering the model toward physically plausible solutions.PINNs excel in both dataindependent solving and data-driven discovery of governing equations.In forward problems, neural networks learn the solution by minimizing the losses on governing PDE residues, initial conditions, and boundary conditions.In inverse problems, PINNs use observed data to identify unknown parameters within the governing equations while adhering to the constraints imposed by PDE residues (Fraces et al., 2020).
This study applies PINNs to solve the Buckley-Leverett (B-L) equation (1942) and its variants to model multi-phase fluid flow in porous media, such as water displacing oil and CO 2 displacing brine.The B-L equation complexity and nonlinearity make traditional analytical or numerical solutions challenging.Previous efforts to solve B-L problems with PINNs have encountered difficulties due to solution discontinuities.Enhancement methods, such as artificial viscosity, attention-based mechanisms, and convex hull construction for the flux function, have been proposed to address these challenges.Adding an artificial viscosity term transforms the B-L equation from hyperbolic to parabolic, approximating the exact solution as the viscosity coefficient nears zero.However, this approach can lead to a smoothed shock front that mimics the diffused shock caused by truncation or discretization errors associated with numerical methods, thereby diminishing the strength of PINNs (Fuks and Tchelepi, 2020;Fraces et al., 2020;Coutinho et al., 2023).Attention-based mechanisms help neural networks focus on specific data segments, adjusting the "attention" level to different elements in the sequence, but they can complicate implementation and increase computational demands(Rodriguez-Torrado et al., 2022;Diab et al., 2022).Alternatively, Welge's Construction imposes the Rankine-Hugoniot and Oleinik entropy conditions to avoid non-physical, multi-valued solutions by constructing a convex hull for the flux function, resulting in a sharp, physically plausible shock front (Welge, 1952;Magzymov et al., 2021;Fraces and Tchelepi, 2021).
In this study, we use Welge's Construction method with vanilla PINNs to solve the original B-L equation and two challenging variants: the dual-shock B-L model accounting for gravity and the dual-shock B-L model accounting for inter-phase solubility.We examine the sensitivity of PINNs' performance in various scenarios, showing that standard PINNs handle varying mobility ratios, dip angles, and multiple discontinuities in solutions.This proficiency enables accurate prediction of engineering problems, such as water flooding, polymer flooding, mixed flow with dominant gravity, purely gravitational flow, and CO 2 injection into saline aquifers.Additionally, we apply the Welge's Construction method with vanilla PINNs in inverse problems.Starting with the mobility ratio as a learnable parameter, we evaluate the impact of observed data volume and quality on PINN effectiveness, demonstrating robustness under slight data scarcity and impurity.We then conduct a two-parameter learning experiment to identify both the mobility ratio and gravity term, where our method maintained strong performance.This study underscores the essential role of Welge's Construction in tackling hyperbolic PDEs.
The remainder of this paper is organized as follows: Section 2 provides an introduction to the training algorithm of PINNs, including basic concepts and overall workflow.Section 3 details the physical law, including the construction method of the flux function and two variants of dual-shock B-L models.Section 4 presents the results of the implementation of forward and inverse PINN training.Finally, Section 5 summarizes the key findings in the paper and discusses their implications for future research.

Physics-informed neural networks
Physics-Informed Neural Networks (PINNs) rely on a structured workflow to model system behaviors, as illustrated in Figure .1.Initially, a fully connected neural network (NN) processes input data sampled across spatial and temporal domains, providing a preliminary solution estimate.Automatic differentiation (AD) then computes the derivatives of the neural network output with respect to its input coordinates and model hyperparameters.These derivatives are used to calculate the loss function and update the model parameters, respectively.The iterative process of minimizing the composite loss optimizes the model's weights and biases, refining the NN's solution û(x, t), and gradually converging towards the exact solution u(x, t).Neural networks consist of interconnected layers (input, hidden, and output) where each neuron processes inputs through weighted sums and activation functions, making them universal function approximators.In PINNs, the NN approximates the solution to a PDE, with AD ensuring the computation of derivatives with machine precision and computa- The physical laws governing a system are integrated into the NN's loss function.For forward problems, the aim is to predict system behavior without labeled data, acting as unsupervised learning.The loss function includes three terms: L P DE penalizes deviations of solutions from the PDE, while L IC and L BC regulate the solution to adhere to the initial conditions u (x 0 , 0) and boundary conditions u (x b , t b ).
Loss function (2) Here, i=1 represent boundary condition points, and x i r , t i r Nr i=1 are collocation points sampled within the domain of time and space.Typically, the number of collocation points greatly exceeds the number of initial or boundary condition points.
PINNs offer a versatile framework for addressing both forward and inverse problems with minimal modification in code (Fraces et al., 2020;Cuomo et al., 2022).Inverse problems focus on parameter estimation from observed data, aligning with supervised learning.The loss function for inverse problems consists of two components: where L data quantifies the discrepancy between model predictions and observed data as: x i s , t i s Ns i=1 are observed or labeled data points.

Multi-phase fluid flow model
In PINNs, integrated physical laws are often modeled by PDEs, generally represented as: where u(x, t) represents the solution to be determined, N [•] is a nonlinear differential operator, and Ω is a subset of R d .This study focuses on multi-phase fluid flow dynamics modeled by the nonlinear hyperbolic PDE known as the Buckley-Leverett (B-L) equation, which involves one shock and one rarefaction wave in its original solution.More complex variants of the B-L model were later developed to account for gravity effects and inter-phase solubility.To deepen the understanding of the physical principles involved in PINNs, this section starts with the single-shock Buckley-Leverett equation and the construction method of its flux function.We will then detail the two important variants of Buckley-Leverett equation for CO 2 injection modeling and purely gravitational flow.

Single-shock Buckley-Leverett equation
The original Buckley-Leverett equation, based on mass conservation and Darcy's law, characterizes the transport of two immiscible fluids in porous media (Buckley and Leverett, 1942): where S w denotes the water saturation, t and x are dimensionless time and length, and α represents the angle of flow deviation from the horizontal plane, as shown in Figure .2.
The fractional flow or flux function f w , the ratio of the water flow rate q w to the total flow rate q t , is a function of water saturation as: The mobility ratio M quantifies the relative mobility of two phases and the gravity number N quantifies the effect of gravity on flow velocity: Here, µ w and µ o are the viscosities of water and oil, k is the absolute permeability of rock, A is the cross-sectional area, g is the gravitational constant, and ∆ρ = ρ w − ρ o .k rw and k ro quantify the effective permeability of the medium to each fluid in the presence of both via the Corey-Brook model (Brooks and Corey, 1966): with S being the effective water saturation: k 0 rw and k 0 ro are the maximum values of k rw and k ro at the endpoints of the water saturation profile (S wc and 1 − S or ), respectively.Figure .3illustrates an example of k rw and k ro profiles.The initial and boundary conditions of Eq.8 are:  The single-shock B-L model presents that when pure water is pumped into a 1D oil reservoir, at a production well (x = 1), one obtains pure oil until the water front arrives, followed by a mixture of oil and water with increasing water cut as time goes on (Araujo et al., 2020).It is widely applied to model 1D horizontal water and polymer flooding processes, both important techniques to increase oil recovery in the petroleum industry.

Construction of convex hull
For enhanced clarity, Eq.8 is hereafter simplified as follows: Here, u(x, t) symbolizes water saturation in water flooding scenario or gas saturation in CO 2 injection scenario.A typical flux function curve is depicted with a dashed line in Figure .4a.The derivative df (u) du represents the propagation velocity for a specific saturation level u.Using this velocity to determine how far a certain saturation level travels over time enables the construction of a saturation profile, as plotted by the dashed curve in Figure .4b.However, this profile is physically implausible because the water saturation is triple-valued at a single location.This irrationality originates from the non-convex nature of the flux function, where the velocity df (u)  du initially increases with increasing saturation, peaks, and subsequently decreases, causing higher saturation levels to overtake lower ones and form a shock front (Dake, 1983).Because of the shock, the mathematical expression of the Buckley-Leverett problem in Eq.9, assuming continuity and differentiability of u, fails to accurately represent the dynamics ahead of the shock, rendering it inadequate for integration in PINNs as the governing equation.
To develop a fully valid solution for the Buckley-Leverett problem, we introduce the concept of Riemann problems, which are hyperbolic conservation laws accompanied by piecewise initial con- The Buckley-Leverett model with its non-convex flux function is a classic example of a Riemann problem, where the left state is u l = 1 − S or and the right state is u r = S wc .In numerical analysis, the imposition of Rankine-Hugoniot jump condition and the E-condition of Oleinik on the flux function helps to select the unique and proper solution for Riemann problems (LeVeque and Leveque, 1992).The E-condition of Oleinik, as shown in Eq.17, ensures that the rarefaction wave trailing the shock does not surpass it, thereby adhering to the second law of thermodynamics, which mandates that entropy in an isolated system should not decrease: The Rankine-Hugoniot jump condition equates the flow rates of the displacing fluid on either side of the shock, ensuring the conservation of mass across the shock front.It helps determine the speed at which a shock wave moves through the medium, as shown in Eq.18.
So, from points at(u r , 0) to (u f , f (u f )), the original f (u) curve is replaced by a straight line segment to represents a shock jumping from u = u r to u = u f , forming a convex hull.Behind the shock, in the saturation range of [u f , u l ], Eq.9 remains valid (Welge, 1952).With this construction, the solution can be obtained as shown by the solid line in Figure .4b.

Dual-shock B-L model with inter-phase solubility
Injecting CO 2 into subsurface saline aquifers can reduce greenhouse gas emissions and combat climate change (Green et al., 1998;Li et al., 2024).The original Buckley-Leverett model, assuming strict immiscibility between phases, was modified to include a retardation factor that accounts for the partial solubility of CO 2 and brine (Noh et al., 2007;Burton et al., 2009;Azizi and Cinar, 2013;Bai et al., 2024).This adjustment creates a dynamic two-phase, two-component system with constant phase properties.The system involves three distinct regions: pure CO 2 region (I), fresh brine region (II), and two-phase region (J) where CO 2 and H 2 O dissolve in each other's phases, indicating hydrodynamic and solubility trapping of CO 2 .These three regions are separated by leading and trailing shocks with saturation S g1 and S g2 , as illustrated in Figure .5.
The velocities of the leading and trailing shocks depend on the solubilities of CO 2 in the aqueous phase and H 2 O in the gaseous phase, respectively.
The retardation factors D, which quantify inter-phase mass transfer (mutual solubility) as following equations, are influenced by temperature, pressure, and salinity, shaping the interactions within each phase.
D I→II and D II→J , located on the line with a unit slope through the origin, correspond to conditions in pure brine (initial) and pure CO 2 (injection), respectively.Typically, CO 2 's solubility in water exceeds H 2 O's solubility in gas, causing the leading shock to advance faster than the trailing shock.
The trailing shock disappears when the slope of v trailing reaches zero, corresponding to zero water solubility in gas, with s g2 = 1 − s wr .

Dual-shock B-L model with dominant gravity
The non-zero gravity term N sin α in Eq.9 signals that the horizontal multi-phase flow is subject to gravity effect.However, when gravity completely dominates the flow, the fluid dynamics are described by Buckley-Leverett equation with a different flux function (Araujo et al., 2020): displayed by the solid curve: from u r to the low-saturation shock and from high-saturation shock to u l , the original curves are replaced by convex hulls.It can be observed that the velocity of low saturations is positive while the velocity of high saturations is negative, leading to two front shocks traveling in opposite directions.

Implementation and training results
This section presents the results of applying the PINN framework to both forward and inverse problems.The neural network structure utilized here was simple and straightforward, featuring an input layer with two neurons (for spatial and temporal inputs), eight hidden layers with 20 neurons each, and a single-neuron output layer (for the solution).The entire implementation was carried out with TensorFlow 2.x, and we employed the keras.models.Sequential() method for PINN model development, initializing hyperparameters with the Xavier method and optimizing with the Adam optimizer.tf.GradientTape() was used to compute gradients of the solution with respect to time and space for PDE residues, as well as gradients of the loss concerning model hyperparameters for PINN training.The typical runtime for a case on a T4 GPU system in Google Colab ranged from 10 to 20 minutes.

Forward problems
The objective of forward training was to solve the Buckley-Leverett equation by minimizing a composite loss function that included residual loss, initial condition loss, and boundary condition loss.
The training dataset consisted of 10,000 collocation points within the solution domain, along with 300 data points to enforce the initial condition and another 300 data points to enforce the boundary condition, all generated using the Latin Hypercube Sampling (LHS) method.No labeled data was used for this process.The maximum iteration number was set at 20,000.
Initially, the hyperbolic tangent (tanh) function was chosen as the activation function across all layers.This setup, however, led to non-physical results where the solution values fell below zero at the shock front, contrary to the expectation that water saturation levels should remain within the [0, 1] interval.To remedy this, we transitioned to using the sigmoid function for the output layer's activation, ensuring solution values were constrained within the appropriate range.This change led to a more accurate solution map, as displayed in Figure .9.
The subsequent subsections will explore the results from forward training for the single-shock scenario, examine the sensitivity of PINN performance to various fluid mobility, incorporate gravity into the governing equation, and subsequently tackle two dual-shock Buckley-Leverett models.

Base case
In the base case, we excluded the influence of gravity and use a unit mobility ratio (M ).Osher's method was utilized to compute analytical solutions, serving as benchmarks for evaluating the PINN  predictions (Ketcheson et al., 2020).The progression of the training is illustrated in Figure .10,displaying how the solution's profile changed over distance at specific time intervals (0.1, 0.4, and 0.9) through various stages of iteration (200, 1000, 5000, and 20000).By approximately the 5000th iteration, the trained solution closely aligned with the exact solution.The ultimate L 2 -norm error of the PINN solution was 4.55% and an L 2 -norm loss is calculated at 1.36E-6, according to Eq.1.We explored four additional cases to assess the flexibility of standard PINNs.The details of these cases, including their specific parameters, losses, and errors, were compiled in Table1.These cases were categorized into two groups to facilitate sensitivity analyses.

Sensitivity analysis on fluid mobility
The first group of cases (base, L1, L2) investigated the impact of the mobility ratio.In the base case, a unity M reflects equal mobility between the displacing (water) and displaced (oil) phases.For water flooding, M is typically less than 1 due to water being more mobile than oil.In contrast, for polymer flooding, where the viscosity of the displacing phase is intentionally increased through polymer addition, M can be significantly higher.Extreme values were examined by setting M = 0.1 for the L1 case and M = 10 for the L2 case.As summarized in Table1, cases L1 and L2 yielded comparably low errors and losses relative to the base case, indicating the proficiency of standard PINNs in resolving Buckley-Leverett models across different M values.This was further evidenced by a side-by-side 2D comparison of analytical and PINN solutions in Figure .11,displaying cooler hues for higher water saturation and warmer ones for higher oil saturation.There was no noticeable difference between the PINN and analytical solutions.When comparing cases with different M , a smaller M led to a higher front saturation u f and a delayed breakthrough time of water t bt (the value of t when x = 1).The shock front is marked by the transition between cold and warm color zones.The influence of M on flux function and oil recovery was further illustrated in Figure .12.On the left, the points where the dashed line (original fractional flow) intersected with the solid line (constructed fractional flow) signified u f , with values of 0.30, 0.71, and 0.95 for the three cases, respectively.On the right, oil recovery factors were calculated by integrating the oil rate (1 − u) over time at the producer's location (x = 1), with the slope representing oil production rate.Oil was produced at a constant rate across all cases until the water breakthrough occurred at different times: 0.463 for the low-mobility-ratio case, 0.828 for the base case, and 0.976 for the high-mobility-ratio case.Upon breakthrough, the water cut at the producer jumped from 0 to u f and continued to rise as flooding progresses through the reservoir.A high mobility ratio led to an early breakthrough, leaving significant oil unrecovered-an unfavorable scenario.A moderate mobility ratio delayed the breakthrough and improved sweep efficiency.A low mobility ratio caused a late breakthrough, enabling nearly complete oil recovery, making it the most advantageous scenario.

Sensitivity analysis on gravity
The second set of cases (base, N1, N2) incorporated the gravity term (N sin α) into the Buckley-Leverett equation, adding complexity to the flux function.The gravity term, as defined in Eq.11, is influenced by the reservoir's inclination angle α, rock permeability, the density difference between water and oil, and the flow rate.A positive α indicates upward flooding, whereas a negative α suggests downward flooding.We explored N sin α values ranging from -3 to 3.
PINN training results from the N1 and N2 cases, detailed in Table1, showed small errors and losses, demonstrating the effectiveness of PINNs in solving gravity-modified Buckley-Leverett models.
Figure .13 reinforced this point through a 2D comparison, affirming PINNs' precision in capturing the comprehensive solution landscape for these scenarios.A negative gravity term resulted in faster water breakthrough and reduced front saturation, compared to a positive one.
Furthermore, the negative gravity effect modifies the initial conditions of the displacement process.Due to the density difference between oil and water, saturation distribution changes immediately after water injection begins.As a result, modeling of down-dip flooding cases requires careful adjustment of new boundary conditions from the original value of 1 − S or .Figure .14examined gravity's impact on the fractional flow curve and oil recovery.In scenarios with steeply downwarddipping reservoirs, the value of f (u) may exceed one, promoting water flow while restricting oil production.Conversely, up-dip flooding impairs water flow, resulting in high front saturation u f and slow movement.Thus, a positive gravity term leads to more oil displacement at breakthrough, albeit occurring later, as shown in Figure .14b.A late t bt is more favorable because displacement efficiency tends to be poor after breakthrough.Predictive modeling extended the system's future solution behaviors until t=1.2.Until now, vanilla PINNs have managed to model the Buckley-Leverett equation featuring a single shock in the solution.Yet, whether PINNs can effectively tackle scenarios with two discontinuities in the governing equation remains to be seen.To investigate this, PINNs are further trained on adapted Buckley-Leverett equations that model gas-displacing-water processes and purely gravitational flows.shows, at a rate of 4 cubic meter per day (reservoir conditions) for 30 years, we can project the expansion of the CO 2 plume.The distance the leading shock spreads is: The distance the trailing shock reaches is: Figure .18depicted the PINN solution versus distance at periodic time slices.At the midpoint (z = 0), the saturation value remained constant.Due to gravity, one water shock and one oil shock formed and traveled at different speeds in opposite directions.
The PINN trained solution was further compared with the analytical solution in Figure .19.Oil shock, water shock, and the whole solution map were accurately captured by the PINN model.

Inverse problems
Within the framework of PINNs, inverse problems leverage observed (labeled) data to unravel the hidden parameters in the governing equations, such as rock and fluid properties in this study, thereby enabling the comprehensive prediction of the system's behavior over space and time.The training configurations for inverse PINNs mirror those of forward PINNs, including neural network architecture, initialization method, and optimization strategies.However, the composition of the loss function of inverse problems comprises solely the observed data error and the PDE residual error, as defined by Eq.5.Initial and boundary conditions are unknown in these scenarios.In addition to L 2 -norm error and loss, the parameter error is employed to assess the performance of inverse PINN training, defined as: It is important to point out that the fractional flow is not predefined but dynamically constructed during each iteration, with the front saturation calculated by Eq.18.This iterative refinement ensures that the PDE parameters and their constructions are continuously updated to align with observed data.
The forthcoming subsections will delve into the results of inverse training within the base scenario, which focuses on a single learnable parameter.We will examine how variations in the quantity and quality of sampling data, as well as the size of collocation points, impact PINN performance.Subsequently, we will address the complexities of learning with two parameters in the context of the Buckley-Leverett model.

Base case
For the base case, the focus was on learning the mobility ratio (M ) , with gravity effects momentarily set aside.We assembled a dataset comprising 10,000 collocation points alongside 10,000 labeled data points, gathered through Latin Hypercube Sampling (LHS).This method ensured broad coverage across both time and spatial domains, aiming for thorough characterization of the system's dynamics.A learning rate of 1E-4 was employed for the neural network weights and a learning rate of 1E-3 was used for the learnable parameter.
The progression of loss, error, and parameter error throughout the training process was depicted in Figure .20.We identified the optimal model at an iteration where both the parameter error and solution error showed reductions compared to preceding values.Eventually, at iteration 11,416, the model precisely predicted M to be 1.000000119 (true M=1), achieving an L 2 -norm error of 1.18% and a loss of 6.17E-05.Expanding our analysis, we adjusted M values for training and compiled the results in Table2.For instance, the M1 case yielded an estimated M value of 0.09999999404 (true M = 0.1) and for case M2, the NN model estimated M to be 9.99998664856 (true M = 10).
The inverse PINN training undertaken in these cases consistently produced commendable results regarding parameter error, solution error, and the loss function.
To facilitate a more in-depth investigation on PINNs' performance with various sampling strategies of labeled data, eight additional cases were performed.The outcomes of these experiments, detailed in Table2, were categorized into three groups for analysis.

Sensitivity analysis on sampling size
Ideally, the number of labeled data points should be comparable to the number of collocation points, such as the base, M1, and M2 cases.However, data scarcity in scientific and engineering contexts often necessitates an examination of how the effectiveness of PINNs fluctuates with varying sizes of labeled data.To address this, we conducted experiments to assess PINN performance with progressively smaller datasets in the D1, D2, and D3 cases, which utilized 1,000, 100, and 10 labeled data points, respectively.Figure .21illustrated the model's learning outcomes using just 100 labeled data points.The results from these cases, as presented in Table2, validated that our method for selecting the optimal model consistently facilitated reliable parameter estimation across various datasets.Despite the reduced data sizes, the recorded loss values for these PINN models remained within modest ranges.However, as depicted in Figure .22,the clarity in distinguishing between high and low saturation regions diminished with smaller sample sizes, indicating that solution prediction accuracy degraded as the number of sampling points decreased.Training with as few as 10 data points led to a significant decline in predictive performance, with the loss increasing by an order of magnitude compared to the base case.

Sensitivity analysis on sampling purity
Beyond the impact of sampling size, given that real-world data collection frequently encounters the challenge of noise or impurities, we assessed the resilience of PINNs against such imperfections by introducing Gaussian noise at varying intensities of 1% (C1 case), 3% (C2 case), and 5% (C3 case) into the pristine base case data.The corresponding training outcomes detailed in Table2.
Our results indicated that PINNs could successfully estimate the unknown parameter M despite slight data corruption.For better visualization, errors and losses of these four cases were plotted in Figure .23.In the noise-free base case, the error and loss were recorded at 1.18% and 6.17E-05, respectively.The introduction of 1% noise slightly altered the results to an error of 1.00% (lower than that of the base case, likely attributable to sampling variability) and a loss of 1.41E-04.At a 3% noise level, the error and loss increased to 1.33% and 9.63E-04, respectively.With 5% noise level, the most corrupted scenario, the results further deteriorated to an error of 2.48% and a loss of 2.74E-03.Despite the apparent trend that increased corruption impaired PINN solution accuracy, the error and loss values, even at a 5% noise level, remained relatively modest.This demonstrated  the robustness of PINN training against minor noise interference, enhancing their applicability in real-world scenarios where data often comes with inherent inaccuracies.

Sensitivity analysis on collocation data size
Unlike labeled data, collocation data generated from governing PDEs across spatial and temporal domains serve as 'free data' that ensure the neural network solution adheres to the underlying physical laws.Ideally, a large number and thorough distribution of collocation points should be used to guarantee the performance of PINNs, provided computational resources permit.To test the robustness of PINNs, we conducted two additional cases alongside the base case.The base case used 10,000 collocation points, while the additional cases, P1 and P2, used 1,000 and 100 collocation points, respectively.Figure .24illustrates the different sizes of input collocation points for these cases.
The training errors, losses, and accuracy of parameter estimation for these cases were summarized in Table2.The training losses remained relatively stable across the three cases.In the base case, which utilized 10,000 collocation points, the training error was found to be 1.18%, with a corresponding estimated value of the mobility ratio of 1.00000011921.For the P1 case, which used 1,000 collocation points, the training error increased slightly to 1.33%, with the estimated M being 1.00000083447.In the P2 case, with only 100 collocation points, the training error further increased to 1.34%, and the loss was 5.05E-05, with the estimated M being 0.99996614456.A smaller size of collocation data points required more iterations of training to approach the exact solutions.However, a maximum iteration threshold of 20,000 was adequate for all three cases to achieve well-trained models.
Additionally, the Latin Hypercube Sampling (LHS) method ensured a relatively uniform distribution of collocation data across space and time, contributing to the robustness of PINN performance.In summary, while there was a general trend of slightly lower accuracy with fewer collocation points, PINNs exhibited significant robustness in their performance.

Two learnable parameters
While vanilla PINNs demonstrate promising performance for inverse problems involving a single parameter, this subsection explores their capability to tackle the Buckley-Leverett model with two unknown parameters in the governing PDE: the mobility ratio (M ) and the gravity term (N sin α).10,000 labeled data points and 10,000 collocation points were used as inputs.
The journey of training with two parameters proved to be considerably more complex, marked by notable fluctuations in losses and errors.To manage this complexity, three individual optimizers were deployed-one for the model hyperparameters and one each for the mobility ratio and gravity term-alongside careful adjustments of the learning rate for each optimizer.Consequently, the training duration extended to 65 minutes compared to approximately 15 minutes for the one-parameter case.
Through meticulous adjustments and optimization, stable convergence was achieved.The final error and loss were recorded at 1.43% and 2.79E-6, respectively.The evolution of solution profiles during training was displayed in Figure .25.The PINN estimated M to be 0.999979854 (true value = 1) and the gravity term to be -1.00098896(true value = -1), showcasing the model's high accuracy even with increased parameter complexity.

Conclusion and discussion
This research leverages state-of-the-art Physics-Informed Neural Network (PINN) techniques to solve and discover Buckley-Leverett equations and their variants, which exhibit intricate solution behaviors.Focusing on real-world petroleum engineering challenges, such as water flooding in subsurface hydrocarbon reservoirs and carbon sequestration in saline aquifers, we have identified several key findings: 1.The success of PINN training for the Buckley-Leverett model critically depends on Welge's construction of a convex hull for the original flux function.This method imposes precise physical constraints on the solutions, eliminating issues of multiple saturation values at a single location related to the original fractional flow.This strategy is generally applicable to other nonlinear hyperbolic PDEs exhibiting discontinuities.
2. PINNs in their most elemental forms efficiently solve the Buckley-Leverett equation across different fluid mobility ratios and gravity terms, without relying on labeled data.Our find- ings indicate that lower mobility ratios and upward-inclined reservoirs are favorable for delaying water breakthrough, thus enhancing oil recoveries in water-displacing-oil processes.
3. Vanilla PINNs demonstrate the capability to resolve not just a single saturation shock but also dual shocks in both a semi-miscible gas-displacing-water process, where shocks travel in the same direction, and in purely gravitational flow processes, where shocks travel in opposite directions.The presence of additional discontinuities, due to inter-phase solubility and dominant gravity effects, does not impede the effectiveness of PINNs.This capability provides a valuable tool for modeling the spread of a CO 2 plume and gravitational flows.
4. Utilizing observed data, inverse PINNs precisely identify hidden parameters in the governing equations, such as the mobility ratio.The constraints imposed by the governing equations reduce the dependence of inverse PINNs on labeled data.Our sensitivity analysis shows that PINNs demonstrate resilience to data impurities of up to 5% and cope well with moderate data shortages.Furthermore, PINNs exhibit significant robustness when varying the number of collocation points, maintaining accuracy even with reduced data sizes.
5. Inverse PINNs are capable of identifying multiple parameters in the Buckley-Leverett equation, enabling comprehensive mapping of the entire solution space through meticulous adjustments of learning rates for individual optimizers.
PINNs exhibit independence from labeled data and excel in extrapolation or prediction capabilities for forward problems, outperforming other machine learning methods.Compared to numerical methods, the meshless characteristic of PINNs eliminates the need for fine grid blocks to track shock front movements, thereby avoiding the discretization errors associated with numerical simulation.However, it is essential to recognize that PINNs are not designed to replace but to augment traditional simulation methods.Currently, PINN techniques are in their developmental stages and encounter challenges in modeling complex physical phenomena, similar to their governing models.For instance, the Buckley-Leverett equation assumes constant rock and fluid properties and neglects capillary pressure, which restricts the generalizability of PINNs in highly heterogeneous and fractured reservoirs exhibiting hysteresis.Future research efforts will focus on overcoming these challenges and extending the application of PINNs to two-dimensional and even three-dimensional models, as well as incorporating capillarity, thereby broadening their impact in engineering and scientific research fields.

Figure 3 :
Figure 3: Relative permeability profiles for water and oil .

Figure 4 :
Figure 4: Example of (a) flux function as a function of water saturation and (b) resultant saturation profile.

Figure 5 :
Figure 5: Schematic of a miscible gas-water displacement.Two saturation shocks divide the medium into three regions.
Figure.6 demonstrates the construction of the flux function by drawing tangent lines from points (D I→II , D I→II ) and (D II→J , D II→J )to intersect the original flux function at S g1 and S g2 .The slopes of these lines dictate the velocities of the shocks, as detailed in Eq.19 and Eq.20.

Figure 6 :
Figure 6: Construction of the flux function for miscible gas-water displacement.

Figure 7 :
Figure 7: Initial saturation distribution of purely gravitational flow.
Figure 8: Example of flux function of purely gravitational flow.

Figure 9 :
Figure 9: 3D visualization of Buckley-Leverett solutions with (a) tanh and (b) sigmoid as the activation function for the output layer.

Figure 10 :
Figure 10: Evolution of solution profiles during forward PINN training for the base case.

Figure 15 :
Figure 15: Evolution of PINN solution for the model with two shocks traveling in the same direction.

Figure 16 :
Figure 16: Analytical vs. PINN solution profiles: 2D comparison for the dual-shock B-L model for CO 2 injection.

Figure 18 :
Figure 18: PINN solution for the model with two shocks traveling in the opposite direction.

Figure 19 :
Figure 19: Analytical vs. PINN solution profiles: 2D comparison for the dual-shock B-L model for gravitational flow.

Figure 20 :
Figure 20: Inverse training of base case: evolution of loss, error, and parameter error.

Figure 22 :
Figure 22: Analytical vs. PINN solution profiles: 2D comparison for various data sampling sizes.

Figure 23 :
Figure 23: PINN prediction accuracy relative to data purity level.

Figure 24 :
Figure 24: Comparison of collocation points for base case, P1 case, and P2 case.

Figure 25 :
Figure 25: Evolution of solution profiles during inverse PINN training for the two-parameter case.

Table 1 :
Summary of Single-shock Forward PINN Training Cases

Table 2 :
Summary of Inverse PINN Training Cases (One Learnable Parameter) Figure 21: PINN solution map learnt from 100 labeled data.