Inferring incompressible two-phase flow fields from the interface motion using physics-informed neural networks

In this work, physics-informed neural networks are applied to incompressible two-phase flow problems. We investigate both the forward problem, where the governing equations are solved from initial and boundary conditions, and the inverse problem, where continuous velocity and pressure fields are inferred from scattered-time data on the interface position. We employ a volume-of-fluid approach, i.e. the auxiliary variable is the volume fraction of the fluids within each phase. For the forward problem, we solve the two-phase Couette and Poiseuille flows. For the inverse problem, three classical test cases for two-phase modeling are investigated: (i) drop in a shear flow, (ii) oscillating drop and (iii) rising bubble. Data on the interface position over time are generated by numerical simulation. An effective way to distribute the spatial training points that fit the interface, i.e. the volume fraction field, and the residual points is proposed. Furthermore, we show that appropriate weighting of the losses associated with the residuals of the partial differential equations is crucial for successful training. The benefit of using adaptive activation functions is evaluated for both the forward and the inverse problem.


Introduction
Since multiphase flows with fluid-fluid interfaces are omnipresent in nature and industrial applications, the accurate numerical simulation of such flows is the subject of intense research. Despite the challenges posed by multiphase flows, such as large density and material parameter ratios (Bussmann et al., 2002; Scardovelli & Zaleski, 1999) and the modeling of surface tension (Popinet, 2018), a wide range of interface tracking/capturing algorithms has been developed. Examples include the Level-set (Sussman, 1994), Front-tracking (Tryggvason et al., 2001) and Volume-of-fluid (Hirt & Nichols, 1981) methods.
Recent successes of machine learning in cognitive science (Lake et al., 2015) and genomics (Alipanahi et al., 2015) have motivated researchers to apply such techniques to computational physics. Deep learning in particular has brought about major breakthroughs in image and speech recognition (Lecun et al., 2015). The origin of deep learning methods for solving physical problems governed by partial differential equations (PDE) may be traced back to Lagaris et al. (1998). Raissi et al. (2019a) introduced physics-informed neural networks (PINN): deep neural networks that are constrained to respect the underlying physical laws of the observed data. The strength of this approach lies in its flexibility, as it may be used for inference, i.e. data driven solution of PDE, but also for system identification, i.e. data driven discovery of PDE. In the latter case, parameters of the governing equations may be completely or partially unknown, and are learned from the observed data. For inference, PINN provide an alternative approach to traditional mesh-based methods for solving PDE from initial and boundary conditions, referred to as the forward problem. However, they also provide an arguably more attractive application, which is solving the inverse problem.
In this case, potentially sparse and noisy data scattered across space and time of either auxiliary variables and/or some quantities of interest (QoI) are available. These data alongside the given physical laws are used to infer the entirety of the QoI within the whole spatio-temporal domain.
The performance of PINN has been improved continuously. Lu et al. (2019) presented DeepXDE, a customizable Python framework providing building blocks to construct individual problems regarding the spatio-temporal domain and boundary conditions. They proposed a new method to distribute the training points for the residual of the PDE (residual points). Analogously to adaptive refinement in mesh-based solvers, residual points are added during training where the residual of the PDE is large, improving the efficiency of the training process. Dwivedi et al. (2019) proposed distributed PINN to evade the difficulty associated with training deep neural networks due to the well known problem of unstable and vanishing gradients (Pascanu et al., 2013). Instead of using one potentially deep PINN for the whole spatio-temporal domain, the authors suggest dividing the domain into cells and locally installing a PINN in each of these cells. To preserve global continuity and differentiability of the solution, it is necessary to minimize corresponding losses at the cell faces. Compared to one global PINN, the local PINNs need to perform a less complex task. Thus, they require fewer hidden layers and are therefore easier to train. Jagtap et al. (2019, 2020) suggest adaptive activation functions by introducing scaled coefficients into the activation functions, which are optimized alongside the network parameters, resulting in better convergence and accuracy. The possibility of solving the inverse problem naturally gives rise to novel approaches for the quantification of flow fields. Direct measurement of the velocity and pressure for an entire flow field is usually impractical. The idea is to combine more easily obtainable data of auxiliary variables with the knowledge of underlying physical laws to extract quantitative information of the entire flow field. One example is the inference of velocity and pressure fields from a passive scalar, e.g. 
smoke or contrast agents, which are used to experimentally visualize the flow field. By encoding the incompressible Navier-Stokes equations and an advection-diffusion equation for a passive scalar into a PINN, Raissi et al. have shown that hidden fluid mechanics (Raissi et al., 2019b, 2020) is a promising application. Another example was presented by Mao et al. (2020), who quantified compressible flow fields using data of the density gradient, which may be generated experimentally using Schlieren photography. We extend this idea to two-phase flows using a Volume-of-fluid approach.
Here, the neural network receives data on the interface position scattered over time, motivated by experimental setups where the interface position is obtained using non-invasive methods, namely optical techniques combined with image processing (Murai et al., 2001; Takamasa et al., 2003; Murai et al., 2006; Poletaev et al., 2020), provided a direct observation of the interface is possible. In experiments that do not allow for direct observation, methods such as ultrasonic detection (Murai et al., 2010) may be used. Best-practice recommendations for the training point distribution and the choice of hyper-parameters for the neural network and training process are given. The predicted results are validated against analytical and CFD solutions. The CFD simulations are performed with ALPACA (Winter et al., 2019; Kaiser et al., 2019b,a), a sharp interface capturing (Level-set) two-phase solver for compressible flows.
The paper is structured as follows: Section 2 provides a description of physics-informed neural networks and the governing equations including the Volume-of-fluid method. In section 3 the results are presented. For the forward problem, a 1D two-phase Couette flow and a quasi-1D two-phase Poiseuille flow are investigated. Subsequently, the inverse problem is studied on various 2D test cases including a drop in a shear flow, an oscillating drop and a buoyancy driven rising bubble. Section 4 concludes this work with final remarks.

Governing equations
The governing equations for an incompressible, Newtonian fluid under the effect of surface tension and gravitational forces in non-dimensional ( * ) form are given by

    ∇ · u* = 0,                                                               (1)

    ∂u*/∂t* + (u* · ∇)u* = −(1/ρ*) ∇p* + 1/(ρ* Re) ∇ · [µ*(∇u* + (∇u*)ᵀ)]
                           + 1/(ρ* We) κ ∇α + (1/Fr²) g*,                     (2)

where u* = [u*, v*]ᵀ, p* and ρ* denote the velocity, pressure and density, respectively, and g* is the unit vector in direction of gravity. The Reynolds number Re, Weber number We and Froude number Fr are defined as

    Re = ρ_r u_r L_r / µ,   (3)    We = ρ_r u_r² L_r / σ,   (4)    Fr = u_r / √(g L_r),   (5)

with ρ_r, u_r and L_r being reference quantities for the density, velocity and length scale. The dynamic viscosity, surface tension coefficient and gravitational acceleration are indicated by µ, σ and g, respectively. The relations between dimensional and non-dimensional quantities are as follows: u = u* u_r, ρ = ρ* ρ_r, x = x* L_r and p = p* ρ_r u_r². In the Volume-of-fluid (Hirt & Nichols, 1981) method, α represents the volume fraction of the respective fluid in each computational cell. Within the proposed deep learning framework, no background computational mesh is used. In this case, a phase characteristic variable φ (Scardovelli & Zaleski, 1999), which can only adopt the values 0 and 1, identifies the phase assignment of any point of the domain. The relation between both scalar fields is α = ∫φ dV/∆V, where ∆V is the volume of a computational cell. The local averaged density ρ and viscosity µ are evaluated as

    ρ* = α ρ*_1 + (1 − α) ρ*_2,   (6)    µ* = α µ*_1 + (1 − α) µ*_2.   (7)

Note that fluid 1 and fluid 2 correspond to a volume fraction of 1 and 0, respectively. The interface region is indicated by 0 < α < 1. The volume fraction is advected by the local fluid velocity,

    ∂α/∂t* + u* · ∇α = 0.                                                     (8)

The curvature is given by

    κ = ∇ · (∇α / |∇α|).                                                      (9)
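As a minimal illustration of the mixture relations above, the local density and viscosity can be evaluated as arithmetic averages of the phase properties weighted by the volume fraction. The sketch below uses hypothetical property values and our own variable names:

```python
import numpy as np

def mixture_properties(alpha, rho1, rho2, mu1, mu2):
    """Arithmetic mixture of phase properties weighted by the volume
    fraction. alpha = 1 corresponds to fluid 1, alpha = 0 to fluid 2;
    the interface region is indicated by 0 < alpha < 1."""
    alpha = np.asarray(alpha, dtype=float)
    rho = alpha * rho1 + (1.0 - alpha) * rho2
    mu = alpha * mu1 + (1.0 - alpha) * mu2
    return rho, mu

# Hypothetical property values for demonstration only.
rho, mu = mixture_properties([0.0, 0.5, 1.0],
                             rho1=1000.0, rho2=1.0, mu1=1.0, mu2=0.1)
```

In the interface region (here alpha = 0.5) the evaluated properties are simple blends of the two phases.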
The solution is approximated by a deep neural network Û(x*, t*; w), with w being the network parameters, namely its weights and biases, which are tuned during the training process. For a detailed description of how deep neural networks function regarding feed-forward and backpropagation, the reader is referred to (Goodfellow et al., 2016). Using automatic differentiation (Guenes Baydin et al., 2018; Abadi et al., 2016), Û may be derived with respect to its inputs. This way, the residual network f̂, which maps the spatio-temporal coordinates to the residuals of the governing equations described above, is defined. Note that Û and f̂ share weights and biases. During training, Û receives the observed data, i.e. initial and boundary conditions in the forward problem and scattered measurements of auxiliary variables or partial quantities of interest in the inverse problem. Simultaneously, the residual network f̂ acts as a regularization agent, constraining possible solutions that fit the observed data to also fulfill the governing equations. We minimize the mean squared error between prediction and observed data MSE_u and the mean squared error of the residual network output MSE_f,

    MSE_total = MSE_u + MSE_f,  with
    MSE_u = (1/N_u) Σ_{i=1}^{N_u} |Û(x_u^i, t_u^i) − U^i|²  and
    MSE_f = (1/N_f) Σ_{i=1}^{N_f} |f̂(x_f^i, t_f^i)|².                        (10)

Here, {x_u^i, t_u^i, U^i}_{i=1}^{N_u} and {x_f^i, t_f^i}_{i=1}^{N_f} denote the set of training points for the observed data and the set of residual points, respectively. The corresponding number of samples is indicated by N_u and N_f. Two important notes shall be made. First, MSE_u is not yet explicitly defined, as in practice it is a sum of multiple losses that are associated with observed data. These might e.g. be boundary/initial conditions and/or scattered measurements across the spatio-temporal domain.
This implies that the composition of MSE_u is case dependent and needs to be specified for each case individually. Second, MSE_f is a weighted sum of the mean squared errors of the continuity, momentum and volume fraction advection equation residuals,

    MSE_f = Σ_i w_{f,i} MSE_{f,i},  i ∈ {m, u, v, α},                         (11)

with

    MSE_{f,i} = (1/N_f) Σ_{j=1}^{N_f} |f̂_i(x_f^j, t_f^j)|².                   (12)

The indices m, u, v and α indicate the continuity, x- and y-momentum and advection equation, respectively. The loss weights are crucial for the training process when inferring flow fields with surface tension forces, as will be shown later. They are case dependent and will thus be specified in the results for each case individually. Note that the components MSE_{f,i} are all computed at the same set of residual points. Jagtap et al. (2019) introduce a scaled coefficient n · a into the activation function argument. Here, n is the scale factor and a is the adaptive activation coefficient, which is optimized along with the neural network parameters. The scale factor is used to adjust the sensitivity of the optimization process with respect to a. The initial values of a and n are case dependent and will thus be specified for each case individually in the results.
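A minimal sketch of the weighted residual loss in the spirit of equations (11) and (12), with hypothetical residual values and the kind of weighting used later for the surface-tension-dominated cases:

```python
import numpy as np

def weighted_residual_loss(residuals, weights):
    """Weighted sum of mean squared PDE residuals. residuals/weights are
    dicts keyed by 'm', 'u', 'v', 'alpha' for the continuity, x- and
    y-momentum and volume fraction advection equations."""
    return sum(weights[k] * np.mean(np.asarray(residuals[k]) ** 2)
               for k in residuals)

# Hypothetical residual samples at the residual points.
res = {"m": np.array([0.0, 0.2]), "u": np.array([1.0, -1.0]),
       "v": np.array([0.5, 0.5]), "alpha": np.array([0.1, 0.3])}
# Example weighting: the momentum losses are damped relative to the
# continuity and advection losses.
w = {"m": 1.0, "u": 1e-3, "v": 1e-3, "alpha": 1.0}
mse_f = weighted_residual_loss(res, w)
```

With these numbers the large raw momentum residuals contribute only 0.001 and 0.00025 to the total, so they no longer dominate the loss.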

Results
To quantify the error between neural network predictions and analytical or CFD solutions (both will be referred to as exact solution), the mean absolute error MAE and the relative L1 and L2 error norms will be used. For an arbitrary quantity q with exact values q^i and predictions q̂^i at N evaluation points, these errors are computed as follows:

    MAE = (1/N) Σ_{i=1}^{N} |q̂^i − q^i|,
    L1  = Σ_{i=1}^{N} |q̂^i − q^i| / Σ_{i=1}^{N} |q^i|,
    L2  = ( Σ_{i=1}^{N} (q̂^i − q^i)² )^{1/2} / ( Σ_{i=1}^{N} (q^i)² )^{1/2}.

For the following test cases, weights and biases are initialized by Glorot normal initialization (Glorot & Bengio, 2010) and zero initialization, respectively. The activation functions for the hidden layers are hyperbolic tangents. The output layer activation functions are linear for the velocities. For the pressure and the volume fraction, an exponential and a logistic function are used, respectively.
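Assuming the standard definitions of these norms, the three error metrics can be computed as follows (NumPy sketch; the exact summation convention in the paper may differ):

```python
import numpy as np

def error_metrics(q_pred, q_exact):
    """Mean absolute error and relative L1/L2 error norms between a
    prediction and the exact solution (standard definitions assumed)."""
    q_pred = np.asarray(q_pred, dtype=float)
    q_exact = np.asarray(q_exact, dtype=float)
    diff = q_pred - q_exact
    mae = np.mean(np.abs(diff))
    rel_l1 = np.sum(np.abs(diff)) / np.sum(np.abs(q_exact))
    rel_l2 = np.linalg.norm(diff) / np.linalg.norm(q_exact)
    return mae, rel_l1, rel_l2

# Toy example: small perturbations of an exact solution.
mae, l1, l2 = error_metrics([1.1, 1.9, 3.0], [1.0, 2.0, 3.0])
```

Note that the relative norms are undefined for an identically zero exact solution, which is why the initial relative errors reported later (where the exact velocity is zero) are large.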
As for the training procedure, the network parameters are optimized using the Adam algorithm (Kingma & Ba, 2015). The training data are shuffled at the start of each epoch and subsequently split into multiple batches. The training and neural network hyper-parameters will be specified for each problem individually. The implementation is done with the library TensorFlow (Abadi et al., 2016). We provide the code to train and visualize the rising bubble case at https://github.com/aaronbuhendwa/twophasePINN.
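The shuffle-and-split procedure performed at the start of each epoch can be sketched as follows (generic NumPy version; the paper's actual implementation is in TensorFlow):

```python
import numpy as np

def epoch_batches(data, batch_size, rng):
    """Shuffle the training data at the start of an epoch and split it
    into batches, as done before every Adam epoch. The last batch may
    be smaller when the data size is not divisible by batch_size."""
    perm = rng.permutation(len(data))
    shuffled = data[perm]
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]

rng = np.random.default_rng(0)
data = np.arange(10)            # stand-in for the set of training points
batches = epoch_batches(data, batch_size=4, rng=rng)
```

Every epoch visits each training point exactly once, but in a different random order.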

Forward Problems
The forward problem is characterized by solving the partial differential equations from initial and boundary conditions. Thus, the total loss function MSE_total here consists of the losses associated with the initial condition MSE_IC, the boundary conditions MSE_BC and the residual MSE_f. The loss functions for the initial and boundary conditions are

    MSE_IC = (1/N_IC) Σ_{i=1}^{N_IC} |Û(x_IC^i, 0) − U_IC^i|²  and
    MSE_BC = (1/N_BC) Σ_{i=1}^{N_BC} |Û(x_BC^i, t_BC^i) − U_BC^i|²,

where {x_IC^i, U_IC^i}_{i=1}^{N_IC} represents the data for the initial condition and {x_BC^i, t_BC^i, U_BC^i}_{i=1}^{N_BC} corresponds to the data for the boundary conditions. The loss weights are w_f,i = 1.0 with i = {m, u, v, α}. Note that MSE_f is computed as described by equations (11) and (12).
Couette flow

The training points are drawn from a random uniform distribution within the respective bounds. The terms in equations (1), (2) and (8) that may be neglected for this specific case are not implemented for computational time reasons. In particular, these are the terms with v and ∂/∂x as well as the gravitational and surface tension forces. From experience, it has proven useful to successively reduce the learning rate during the training procedure. Table 1 lists the training hyper-parameters for this case.
In Figure 2 the predicted time evolution for a fixed viscosity ratio is shown on the left. Furthermore, the predicted and exact steady state solutions for various viscosity ratios are compared on the right. To evaluate the error between prediction and analytical solution, a linearly spaced grid of 100 spatial points in y-direction is used. The corresponding errors are presented in Table 2, with relative errors ranging from 1.4% to 3.6%, increasing with the viscosity ratio. Note that these results are for fixed activation functions.
With adaptive activation functions, a reduction of the steady state relative errors of about 20% is achieved. At the same time, the computational cost per epoch increases by about 28%.

Poiseuille flow
The Poiseuille flow describes a viscous channel flow driven by a constant pressure gradient. The quasi-1D two-phase configuration studied here is depicted in Figure 4.
The analytical steady state solution for this setup consists of two quadratic functions, one for each phase, with a kink at the interface. It is given in (Rezavand et al., 2018), where u_i with i ∈ {1, 2} denotes the steady state velocity profile within the respective fluid and b = 0.5 is half the domain height. All reference quantities for non-dimensionalization are 1.0.
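The piecewise-quadratic steady state can also be recovered numerically from first principles. Assuming, for illustration, that fluid 1 occupies the lower half of the channel and fluid 2 the upper half (a hypothetical layout), the four integration constants follow from no-slip at the walls and continuity of velocity and shear stress at the interface:

```python
import numpy as np

b, dpdx = 0.5, -1.0            # half channel height, pressure gradient
mu1, mu2 = 1.0, 0.5            # dynamic viscosities of the two fluids

# Each profile is u_i(y) = dpdx/(2 mu_i) y^2 + A_i y + B_i. The four
# constants (A1, B1, A2, B2) follow from the linear system below.
M = np.array([[-b, 1.0, 0.0, 0.0],    # u1(-b) = 0  (no slip, lower wall)
              [0.0, 0.0, b, 1.0],     # u2(+b) = 0  (no slip, upper wall)
              [0.0, 1.0, 0.0, -1.0],  # u1(0) = u2(0) (velocity continuity)
              [mu1, 0.0, -mu2, 0.0]]) # mu1 u1'(0) = mu2 u2'(0) (stress)
rhs = np.array([-dpdx / (2.0 * mu1) * b**2,
                -dpdx / (2.0 * mu2) * b**2,
                0.0, 0.0])
A1, B1, A2, B2 = np.linalg.solve(M, rhs)

u1 = lambda y: dpdx / (2.0 * mu1) * y**2 + A1 * y + B1
u2 = lambda y: dpdx / (2.0 * mu2) * y**2 + A2 * y + B2
```

The kink at the interface comes from the shear-stress condition: the velocity gradients differ by the viscosity ratio while the stress itself is continuous.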
The loss functions for the initial and boundary conditions are defined analogously to the previous case, where {x_NS^i, t_NS^i, U_NS^i}_{i=1}^{N_NS} is the training data for the north and south boundary and {x_EW^i, t_EW^i}_{i=1}^{N_EW} represents the data for the east and west boundary. The loss weights are w_f,i = 1.0 with i = {m, u, v, α}.
Note that M SE f is computed as described by equations (11) and (12).
The distribution of the training points for t = 0 is shown in Figure 5. For the initial condition, the interface is refined with 300 points within a range of 0.02 in both positive and negative y-direction from the interface. For the rest of the domain, 400 points are used, resulting in N_IC = 700. We draw 20 time snapshots randomly from a uniform distribution within the respective bounds to distribute the spatial points for each boundary and the residual points. Here, it is ensured that t = 0.0 and t = 0.5 are included in the set of time snapshots and that the time snapshots for the east and west boundary are equal, since this is necessary for the periodic condition. At each time snapshot, the interface is refined with 800 spatial residual points. The refinement region for the residual points spans 0.04 in positive and negative y-direction from the interface. The rest of the domain is filled with 4000 spatial residual points. As for the boundary conditions, 20 spatial points are drawn from a random uniform distribution at the north and south boundary and linearly spaced at the east and west boundary within the respective bounds, resulting in N_NS = 800 and N_EW = 400, respectively.
Note that for periodic boundaries, one sample corresponds to a pair of points. As described in the previous case, the set of points for the initial and boundary conditions are added to the set of residual points. This results in a total of N f = 98300 points.
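The snapshot sampling described above, which ensures that the initial and final times are always contained in the set, might look as follows (illustrative sketch):

```python
import numpy as np

def draw_time_snapshots(n, t_min, t_max, rng):
    """Draw n random time snapshots within [t_min, t_max], ensuring
    that both endpoints are always included (required so the initial
    condition and the final time are covered)."""
    interior = rng.uniform(t_min, t_max, size=n - 2)
    return np.sort(np.concatenate(([t_min], interior, [t_max])))

rng = np.random.default_rng(42)
snapshots = draw_time_snapshots(20, 0.0, 0.5, rng)
```

Reusing the same snapshot array for the east and west boundary points automatically satisfies the pairing needed for the periodic condition.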
Again, terms that may be neglected for this specific case are not implemented, in particular the terms with v and the gravitational/capillary forces. Table 4 lists the training hyper-parameters. In Figure 6 the predicted time evolution for a fixed viscosity ratio is shown on the left. On the right side, the predicted and exact steady state solutions for various viscosity ratios are compared. Table 5 lists the corresponding errors. The relative errors range from 0.55% to 4.4%, increasing with the viscosity ratio µ1/µ2. These errors have been evaluated on a linearly spaced grid of 100 points in y-direction at x = 0.
A single adaptive activation coefficient a is introduced for each node of the hidden layers to study its influence on the training process. The initial value is 0.1 and a scale factor of 10 is used. Figure 7 compares the loss history with and without adaptive activation functions for a fixed viscosity ratio. The observed pattern is quite similar to the previous case, with an initial drop of a. Subsequently, after the first learning rate reduction at epoch = 3000, both a and the convergence rate increase.

Inverse Problems
While the direct measurement of velocity and pressure of entire flow fields is impractical, the measurement of auxiliary variables like e.g. a passive scalar (smoke or dye) to visualize the flow is comparatively easy. Combining these data of experimentally obtainable auxiliary variables with the knowledge of the underlying physical laws to extract quantitative information of the entire flow field is the motivation for the inverse problem. Here, the auxiliary variable is the interface position, i.e. the volume fraction field, which in an experimental setup is obtained using a combination of optical techniques and image processing (Murai et al., 2001, 2006; Takamasa et al., 2003; Poletaev et al., 2020). Three 2D test cases are investigated, namely a drop in a shear flow, an oscillating drop and a rising bubble. These are simulated with ALPACA (Winter et al., 2019; Kaiser et al., 2019b,a), a finite-volume based solver for compressible two-phase flows that captures the interface using the Level-set method (Sussman, 1994). From the CFD solution, the information of the interface position over time is retrieved. The neural network predictions are then compared to the CFD solution.
The interface refinement strategy with regard to the training point distribution is more sophisticated compared to what has been shown in the previous subsection for the 1D problems, since surface tension effects now have to be considered. In Figure 8, a representative interface segment with all refinement regions is shown. We distinguish between three distinct regions, namely the interface refinement region for the residual points, the interface refinement region for the points to fit the volume fraction field α, and the nearfield refinement region for the residual points. The interface thickness δ_I predicted by the trained model can thus be controlled with δ_1 and, depending on the quality of the fit, equals about 2δ_1. Furthermore, it is ensured that the residual points within the interface refinement region always measure a non-zero absolute gradient |∇α| ≠ 0.
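A sketch of how points may be placed in a refinement band around a circular interface (hypothetical geometry and band widths; in the paper's cases the interface position is extracted from the CFD data):

```python
import numpy as np

def sample_interface_band(n, radius, delta, rng):
    """Randomly place n points in a band of half-width delta around a
    circular interface of the given radius, centered at the origin."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    r = rng.uniform(radius - delta, radius + delta, n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

rng = np.random.default_rng(0)
# Refinement regions analogous to Figure 8: a narrow band (half-width
# delta_1) for the volume fraction fit and a wider band (half-width
# delta_3) for the residual points. Values are illustrative.
pts_alpha = sample_interface_band(500, radius=0.25, delta=4e-3, rng=rng)
pts_resid = sample_interface_band(600, radius=0.25, delta=1.5e-2, rng=rng)
```

Sampling in polar coordinates around the interface guarantees that every point lands inside the prescribed band, unlike rejection sampling from the whole domain.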
The total loss function MSE_total for the inverse problem is composed of the loss associated with the fit of the interface, i.e. the volume fraction loss MSE_α, plus the loss associated with the residual of the governing equations MSE_f. Additionally, in order to infer the unique solution that is given by the CFD, the neural network must be informed of the corresponding boundary conditions via MSE_BC. Here, {x_α^i, t_α^i, α^i}_{i=1}^{N_α} denotes the training data for the volume fraction. The computation of the boundary losses will be specified for each case individually. Note that MSE_f is computed as described by equations (11) and (12).

Drop in a shear flow
A liquid drop is immersed within another fluid and a shear flow is generated by moving boundaries. Viscous forces will cause the drop to deform, changing its circular shape to an ellipsoidal shape. Capillary forces counteract this process, resulting in a steady state deformation at equilibrium. The final shape of the drop is described by the viscosity ratio µ_b/µ_d and the capillary number Ca = µ_b R γ̇/σ (Taylor, 1934), where R is the initial drop radius, γ̇ is the shear rate, σ is the surface tension coefficient and the indices b and d indicate the bulk and the drop phase, respectively. This is a standard case studied in two-phase modeling (Luo et al., 2015; Hu & Adams, 2007). Figure 9 shows the initial and boundary conditions for the present setup. The training data should be within the same order of magnitude, i.e. the labels should be normalized. With that in mind, an ambient pressure p∞ = 10 is prescribed.
The loss functions for the boundary conditions are defined as in the previous cases, where {x_NS^i, t_NS^i}_{i=1}^{N_NS} denotes the training data for the north and south boundary condition and {x_EW^i, t_EW^i}_{i=1}^{N_EW} represents the training points for the east and west boundary condition. The data is used to enforce the ambient pressure at all boundaries at t = 0.
For small Weber numbers, the loss of the momentum equation residual is orders of magnitude larger than all other losses due to the error between the surface tension term and the pressure gradient. In particular, the gradient of the volume fraction ∇α becomes larger the better the fit of the interface. Therefore, this loss has to be weighted accordingly. When using the mean squared error as loss function, we found that weighting the loss of the momentum equation residual with a factor of O(10⁻³) is suitable for cases with Weber numbers of O(10⁻²). The Weber number here is We = 0.04. The loss weights should be increased successively while monitoring the training history to achieve the best possible results. Here, the loss weights are w_f,m = 1.0 and w_f,u = w_f,v = 1e−3. Note that MSE_f is computed as described by equations (11) and (12).
The neural network that is used for this case maps x, y, t → u, v, p, α and is composed of 7 hidden layers with 250 nodes each. The training hyper-parameters are listed in Table 7 and the training point distribution is illustrated in Figure 11. The predicted qualitative time evolution is shown in Figure 12. Noticeable are the non-zero velocities at t = 0, which may be improved by increasing the amount of time snapshots at very early stages or by introducing losses that prescribe initial conditions for u and v. We compare the neural network prediction of the steady state at t = 0.5 to the exact solution in Figure 13. The predicted and exact pressure fields have been normalized by subtracting the corresponding spatial average pressure from both fields, since for incompressible flows the pressure is unique only up to a constant. In Figure 14 the mean absolute error and the relative error norms between prediction and exact solution over time are depicted. Looking at the velocity, after high initial relative errors resulting from the fact that the exact velocity at t = 0 is zero, the relative errors are about 0.5% and 5% for u and v, respectively. The relative L1 and L2 errors for the pressure are about 10% and 4%, respectively. As for the quality of the interface fit, the predicted interface thickness δ_I is computed by measuring the orthogonal distance between the isolines α = 0.01 and α = 0.99 in x- and y-direction and averaging the results for t = 0. This results in δ_I = 8.674e−3, which is about 2δ_1. To study the influence of adaptive activation functions, the following two configurations will be used. For the first one, referred to as adaptive 1, a single adaptive activation coefficient a is introduced into each node of all hidden layers, with an initial value of 0.1 and a scale factor of 10. The second one, referred to as adaptive 2, introduces two layer-wise adaptive activation coefficients a1 and a2 into each node of the hidden layers 1−4 and 5−7, respectively.
We scale a2 with a value of 10 and initialize it with 0.1. Since the first coefficient is expected to be harder to optimize as it appears in earlier layers, a scale factor of 20 and an initial value of 0.05 are used. This way, the optimization process is more sensitive towards a2. Table 8 lists both configurations.
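The adaptive activation function of Jagtap et al. (2019, 2020) simply scales the argument of the activation by n · a. With the initial values above, both coefficient configurations start out as a plain hyperbolic tangent, since n · a = 1 (illustrative sketch):

```python
import numpy as np

def adaptive_tanh(x, a, n):
    """Activation tanh(n * a * x) with a trainable coefficient a and a
    fixed scale factor n, following Jagtap et al. (2019, 2020)."""
    return np.tanh(n * a * x)

# Configuration "adaptive 2": two layer-wise coefficients with
# different scale factors, so the optimizer is more sensitive to a2.
a1, n1 = 0.05, 20.0   # hidden layers 1-4
a2, n2 = 0.10, 10.0   # hidden layers 5-7
```

During training only a1 and a2 are updated; the scale factors n1 and n2 stay fixed and merely amplify the gradient with respect to the coefficients.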
The training history for the models fixed and adaptive 2 is illustrated in Figure 15. After the first learning rate reduction, a2 also shows a small jump towards larger values, however, it subsequently decreases again. Only after the second learning rate reduction at epoch = 13000 does a2 increase successively at a very small rate. In Table 9, the final losses for fixed and adaptive activation functions are listed.
While adaptive 1 shows increased final losses, the configuration adaptive 2 significantly reduces all losses. An evaluation of the errors between prediction and exact solution for the models fixed and adaptive 2 reveals that there is a significant reduction of the initial relative errors of u and v (compare Figure 14), i.e. the inference of the initial zero velocity is improved. For the rest of the temporal domain, though, no major improvements regarding the relative errors are observed.
The average computational cost per epoch increases by about 30% when using the configurations adaptive 1 and adaptive 2 compared to fixed.

Oscillating droplet
An ellipsoidal drop is immersed into another fluid. Surface tension forces will drive an oscillation that is associated with a transfer between potential and kinetic energy. The oscillation period is given by the analytical formula of Rayleigh (Strutt, 1879), where the indices b and d indicate the bulk and drop phase, respectively. The radius R is the effective circle radius. This is a further standard test case studied in two-phase modeling (Luo et al., 2015; Garrick et al., 2017). Figure 16 shows the setup and the reference quantities used for non-dimensionalization; all other reference quantities are 1.0. The boundary conditions are zero gradient for u and v and an ambient pressure of p∞ = 10. Furthermore, the neural network is given actual measurements of the velocities u and v at the boundaries. While this is not necessary to infer a unique solution, these measurements have been found to significantly improve the inference of the velocity field. A comparison of the predictions with and without measurements for u and v at the boundaries is shown in Appendix B.
The loss functions are computed as in the previous case, where {x_NS^i, t_NS^i}_{i=1}^{N_NS} denotes the points for the north and south zero gradient boundary condition and {x_EW^i, t_EW^i}_{i=1}^{N_EW} represents the points for the east and west zero gradient boundary condition. The boundary data is used to enforce the measurements for u and v as well as the ambient pressure. For this case, the loss weights are w_f,m = 1.0 and w_f,u = w_f,v = 1e−3. Note that MSE_f is computed as described by equations (11) and (12).
The CFD solution provides data on a uniform 512×512 grid in linearly spaced time snapshots with ∆t = 0.005 for two entire oscillation periods. Here, we infer one oscillation period. For the present configuration, the oscillation period is T = 0.41. The temporal domain, however, is t ∈ [0.0, 0.45], as a few time snapshots are added before and after to better capture the still interface at maximum potential energy correlated with an ellipsoidal shape. In particular, we use 46 linearly spaced time snapshots, i.e. ∆t = 0.01, to randomly distribute the spatial points for the volume fraction α and the residual points. At each time snapshot, the interface refinement region for α is populated with 500 points, with δ1 = 4e−3 and δ2 = 8e−3. For the rest of the domain, 300 points are used, resulting in N_α = 36800. The interface and nearfield refinement regions for the residual points are filled with 600 and 1500 points, respectively, using δ3 = 1.5e−2 (see Figure 8 for the definition of the refinement regions). In Figure 19, the predicted qualitative time evolution is shown. In particular, the time snapshots corresponding to the minima and maxima of the kinetic energy of the drop are displayed. Noticeable are the non-zero velocities at the times of maximum potential energy, i.e. when the drop has an ellipsoidal shape. As described earlier, the temporal domain was extended by a few time snapshots before and after the start and end of the oscillation period for this very reason. The exemplary time snapshot t = 0.33 of the prediction is compared to the exact solution in Figure 20. The pressure fields of both the prediction and the exact solution have been normalized by subtracting the corresponding mean pressure from the pressure field. This is necessary since the pressure in incompressible flows is unique only up to a constant.
In the transition regions, where the drop is going from an ellipsoidal to a circular shape, the relative error norms for the velocities range between 8% and 10%. As for the pressure, the relative L1 and L2 errors are about 3% and 5%, respectively. The predicted and exact kinetic energy of the drop over time are compared in Figure 22. The largest relative error of the kinetic energy is observed at the peaks, where it is about 3%.
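The kinetic energy of the drop can be approximated from the predicted fields by masking them with the volume fraction, so that only the drop phase contributes (assumed discretization; a small NumPy sketch on a toy grid):

```python
import numpy as np

def drop_kinetic_energy(alpha, u, v, rho_d, dV):
    """Approximate the kinetic energy of the drop on a uniform grid:
    E_k = sum over cells of alpha * 1/2 * rho_d * (u^2 + v^2) * dV.
    The volume fraction alpha restricts the sum to the drop phase."""
    return np.sum(alpha * 0.5 * rho_d * (u**2 + v**2) * dV)

# Toy 2x2 example with cell volume dV = 0.01 (values illustrative).
alpha = np.array([[1.0, 0.5], [0.0, 1.0]])
u = np.array([[1.0, 0.0], [2.0, 1.0]])
v = np.zeros((2, 2))
ek = drop_kinetic_energy(alpha, u, v, rho_d=1.0, dV=0.01)
```

Note that the cell with alpha = 0 contributes nothing, even though its velocity is the largest on the grid.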

Rising bubble
A bubble is immersed into another, heavier fluid. Buoyancy causes the bubble to rise and subsequently undergo moderate shape deformations. This test case was studied by Hysing et al. (2009) as a benchmark case to compare various two-phase solvers. The losses associated with the boundary conditions are computed as in the previous cases, where {x_NS^i, t_NS^i}_{i=1}^{N_NS} denotes the training data for the north/south no slip boundary, {x_EW^i, t_EW^i}_{i=1}^{N_EW} are the points for the east/west periodic boundary and {x_N^i, t_N^i}_{i=1}^{N_N} represents the data to enforce the ambient pressure at the north boundary. The predicted qualitative time evolution is shown in Figure 24. Noticeable is the symmetry-breaking error in u for early times within the bubble. This may be reduced by distributing more residual points within the bubble for these times. In Figure 25, a comparison between the prediction and the exact solution for an exemplary time snapshot t = 1.5 is depicted. To compare the predicted and exact pressure fields, they are normalized by subtracting the corresponding mean pressure. The errors between prediction and exact solution over time are displayed in Figure 26. The high initial relative errors of u and v are due to the fact that the exact velocity at t = 0 is 0. Moving forward in time, the relative errors for u and v are about 10% and 8%, respectively. As for the pressure, the high initial relative errors are explained by the initialization of the CFD: only a linear hydrostatic pressure profile within the bulk fluid has been prescribed, while the initial pressure within the drop is constant. The average relative errors of the pressure are about 2%.
Figure 26: Mean absolute error and relative error norms between prediction and exact solution of the rising bubble.
The adaptive activation function configurations adaptive 1 and adaptive 2 are the same as in the previous case (compare Table 11). Both adaptive configurations result in a reduction of all final losses. Adaptive 2 in particular improves the relative L1 error with respect to the exact solution, reducing the error for u, v and p on average over time by 10%, 8.8% and 0.3%, respectively. The computational cost per epoch increases by about 28% for both configurations.

Conclusion
Physics-informed neural networks give rise to a new approach for the quantification of flow fields by combining available data of auxiliary variables with the underlying physical laws. This way, the velocity and pressure of entire flow fields may be inferred, for which a direct measurement is usually impractical. We extend previous work, where this method is applied to incompressible (Raissi et al., 2019b) and compressible (Mao et al., 2020) single-phase flows, to incompressible two-phase flows using a Volume-of-fluid approach. The auxiliary variable here is the interface position, i.e. the volume fraction field, motivated by experimentally accessible data of the interface position using a combination of optical techniques and image processing (Murai et al., 2001; Takamasa et al., 2003; Murai et al., 2006; Poletaev et al., 2020).
We investigate three classical test cases in two-phase modeling: a drop in a shear flow, an oscillating drop and a rising bubble. The training data is generated with ALPACA (Winter et al., 2019; Kaiser et al., 2019b,a), a compressible two-phase solver that captures the interface using the Level-set method. An effective strategy of distributing the training points that are used to fit the volume fraction and the points to minimize the residual of the partial differential equations (PDE) is presented. Generally, it is found that the neural networks should be increased in width rather than depth to increase their capability. Furthermore, it is shown that the weighting of the losses of the PDE residual is crucial for the optimizer to converge. Depending on the Weber number, the loss of the momentum equation residual can become orders of magnitude larger than all other losses due to the error between pressure gradient and surface tension term. To compensate for this during the training procedure, the momentum equations must be weighted accordingly. With the reduced momentum loss weights, a reduction of the corresponding residual losses of up to −54% is achieved. However, an increase of all other losses is observed. In particular, the loss associated with the volume fraction MSE_α increases by 194%. This corresponds to an increase of the mean interface thickness by 59%. The relative L1 error between prediction and exact solution for both loss weights over time is depicted in Figure A.27. On average over time, there is no significant difference with regard to the error between both models.
Furthermore, we compare the hyperbolic tangent, which is used throughout this work, with a sine as activation function for the hidden layers. In Figure A.28, the loss history for both activation functions is illustrated. It shows that the sine activation leads to significantly more noise, in particular regarding the loss MSE_α. Furthermore, the convergence rate of all losses is reduced.