Dataset of numerically-generated interfaces of Newtonian jets in CIJ regime

The so-called Rayleigh-Plateau instability of fluid jets has been widely studied and is extensively used in the Continuous InkJet (CIJ) printing process. The present dataset contains the numerically-generated interfaces of Newtonian fluid jets in CIJ jetting conditions for low to moderately high stimulation amplitudes. We used Basilisk, an open-source Computational Fluid Dynamics (CFD) software specialized in multiphase flows, to compute thousands of fluid jets for Reynolds numbers ranging from 100 to 1000. The dataset provides the raw liquid-air interfaces from the CFD simulations for each Reynolds number and stimulation amplitude pair. The 10 GB dataset contains ≈110000 interfaces, which makes it possible to apply machine learning and deep learning approaches to explore the evolution of jet morphologies that cannot be addressed with the classical Rayleigh theory.


Specifications
Subject: Hydrodynamics
Specific subject area: Multiphase Flow, Jets of fluids, Continuous InkJet
Type of data: Text files
How data were acquired: Numerical simulation using Basilisk software
Data format: Raw
Parameters for data collection: Jets are axisymmetric, with a dimensionless radius of 1 and a dimensionless inlet velocity of 1. A periodic disturbance of the inlet velocity u_0 triggers and controls the Rayleigh-Plateau instability; its dimensionless frequency is f_r = 1/7 and its stimulation amplitude δ ranges from 1 to 3.5% of the inlet velocity in steps of 0.5%. The fluid viscosities correspond to Reynolds numbers from 100 to 1000 in steps of 5. The surface tension is fixed so that the Weber number is We = 600. The density and viscosity ratios between the liquid and gas phases are fixed to 1000 and 500, respectively.
Description of data collection: The dataset has been numerically generated using the open-source software Basilisk [1] on the GRICAD infrastructure (https://gricad.univ-grenoble-alpes.fr)
Data source location: Grenoble Alpes Université, Grenoble, France
Data accessibility: https://data.mendeley.com/datasets/3ds9h73pnv/1

Value of the Data
• The present dataset gives the interfaces of jetted fluids for a large range of both Reynolds numbers and disturbance amplitudes. These interfaces, which cannot be retrieved analytically, are generated by solving the full Navier-Stokes equations, which requires computationally intensive simulations.
• The data can be useful for engineers and researchers who work in the fluid jetting research area, with a particular focus on CIJ.
• The information contained in the present dataset can support either fundamental research on jetted fluids and drops, by comparing analytical developments to numerical simulations, or applied CIJ research, by further developing the process.
• By using machine learning, and more specifically deep learning approaches, the data may give better insight into the morphology of CIJ jets (i.e. breakup length, satellite regime, drop shape, etc.) and help to further improve the CIJ process or even to use it in new application fields.

Data Description
The data is organized as follows: for each Reynolds number and stimulation amplitude, the data files are stored in a folder named Re_Amp, with Re the Reynolds number ranging from 100 to 1000 and Amp the stimulation amplitude ranging from 0.01 to 0.035. In each folder, i.e. for every Reynolds number and stimulation amplitude pair, 101 raw text files are provided, named interfaces_Re_Amp_Time.dat, where Time refers to the simulation time at which the interface has been saved (see next section for a more detailed description of the simulation). The data in the interface files are in Gnuplot-style [2] format: the interface is described by segments of 2 points, each with an x and y position, separated by a line break. Fig. 1 illustrates the generated interface (Re = 975, Amp = 0.035 at Time = 102.06) plotted using Gnuplot. Along with the interface files, each folder contains a coefficients.csv file which gives information on the first 10 drops for every Time.
The first three columns, Reynolds, Amplitude and Time, are self-explanatory; for every drop, six columns containing the barycenter, width, height, Feret diameter, area and volume of the drop are computed during the simulation and added to the file, as illustrated in Tab. 1. Note that the computation of both the area and the volume accounts for the axisymmetry of the model.
Note: As pictured in Fig. 2, the main jet is always considered as the first drop and is thus numbered drop 0. The drops are numbered from the inlet (left) to the outlet (right); the drop numbering is not preserved over the simulation and changes when a new drop is generated. A maximum of 10 drops are kept and NA values are added where no data is available, i.e. when fewer than 10 drops are present.
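As a minimal sketch of how the Gnuplot-style interface files described above could be read, the following Python function splits a text into two-point segments. It assumes whitespace-separated x and y values with one point per line, and a blank line between consecutive segments; this layout has not been checked against the dataset files. The name parse_segments and the sample values are illustrative and are not part of the dataset.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def parse_segments(text: str) -> List[Segment]:
    """Parse Gnuplot-style text into a list of two-point segments.

    Assumed layout (not verified against the dataset files): each point is
    an "x y" line and consecutive segments are separated by a blank line.
    Blocks that do not contain exactly two points are skipped.
    """
    segments: List[Segment] = []
    points: List[Point] = []
    # The trailing empty string flushes the last block when the text does
    # not end with a blank line.
    for line in text.splitlines() + [""]:
        if line.lstrip().startswith("#"):
            continue  # ignore Gnuplot comment lines
        fields = line.split()
        if len(fields) >= 2:
            points.append((float(fields[0]), float(fields[1])))
        elif points:
            if len(points) == 2:
                segments.append((points[0], points[1]))
            points = []
    return segments


# Hypothetical sample with two segments (made-up coordinates).
sample = "0.0 1.0\n0.5 1.02\n\n0.5 1.02\n1.0 0.97\n"
segments = parse_segments(sample)
print(len(segments))
```

For a real file, the text passed to parse_segments would simply be the content of one interfaces_Re_Amp_Time.dat file. The segments are returned in file order, which is not necessarily the order along the interface.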

Governing equations
The dataset has been numerically generated using the Basilisk software [1], which is dedicated to solving partial differential equations. It uses a tree data structure (quadtree in 2D and octree in 3D) that allows the mesh to be refined locally and dynamically based on automatic or user-defined criteria [3]. In the present case it solves the multiphase, unsteady and incompressible Navier-Stokes equations

$$\nabla \cdot \mathbf{u} = 0 \qquad (1)$$

$$\rho \left( \frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} \right) = -\nabla p + \nabla \cdot \left( 2 \mu \mathbf{D} \right) \qquad (2)$$

with u the velocity field, p the pressure, ρ the density of the considered phase, D the deformation tensor such that D = [∇u + (∇u)^T] / 2 and μ the dynamic viscosity.
The interface between the fluids is tracked with a Volume-Of-Fluid (VOF) method [4]. At the interface, the term

$$\sigma \kappa \nabla f$$

is also added to the right-hand side of Eq. (2) to account for the surface tension effect, with σ the (constant) surface tension, κ the interface mean curvature and f the volume fraction of fluids describing the interface. The model is axisymmetric (see Fig. 2) with a velocity boundary condition u_0 on the liquid phase, which includes a periodic amplitude disturbance that triggers and controls the Rayleigh-Plateau instability:

$$u_0 = 1 + \delta \sin \left( 2 \pi f_r t \right)$$

with δ the disturbance amplitude, f_r the frequency of the perturbation and t the simulation time. The frequency is fixed to f_r = 1/7 for all jets.
An outflow boundary condition is imposed on all the remaining boundaries. The initial radius R_0 is set to 1 and the density and viscosity ratios of the liquid-gas system are fixed to 1000 and 500, respectively. The Reynolds number is directly related to the inverse of the Newtonian viscosity,

$$Re = \frac{\rho_l \, u_0^{t=0} \, R_0}{\mu_l} = \frac{1}{\mu_l}$$

(the subscript l stands for liquid), as ρ_l, R_0 and u_0^{t=0} are set to 1 for all jets.
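The relations between the dimensionless numbers and the fluid properties can be sketched as follows. This is an illustration under stated assumptions, not the authors' Basilisk setup: the Weber number is assumed to be built on the jet radius, We = ρ_l u² R_0 / σ, and the density and viscosity ratios are assumed to be liquid-to-gas ratios. The function name fluid_parameters is hypothetical.

```python
def fluid_parameters(reynolds: float,
                     weber: float = 600.0,
                     density_ratio: float = 1000.0,
                     viscosity_ratio: float = 500.0) -> dict:
    """Return dimensionless fluid properties for a given Reynolds number.

    Liquid density, initial jet radius and inlet velocity are all 1,
    as stated in the text.
    """
    rho_l = r0 = u0 = 1.0
    mu_l = rho_l * u0 * r0 / reynolds       # Re = 1 / mu_l
    # Assumption: We = rho_l * u0**2 * r0 / sigma (radius-based definition).
    sigma = rho_l * u0 ** 2 * r0 / weber
    return {
        "rho_l": rho_l,
        "mu_l": mu_l,
        "rho_g": rho_l / density_ratio,     # assumed liquid-to-gas ratio
        "mu_g": mu_l / viscosity_ratio,     # assumed liquid-to-gas ratio
        "sigma": sigma,
    }


params = fluid_parameters(reynolds=500)
print(params)
```

With these assumptions, only the liquid and gas viscosities change from one Reynolds number to the next; the densities and the surface tension are the same for all jets.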

Meshing strategy and convergence
Basilisk provides a powerful Automatic Mesh Refinement (AMR) strategy based on the use of quadtrees (in 2D). The domain is a square of dimension 512 and the mesh is refined locally and dynamically based on a wavelet-estimated discretization error [5]: a user-defined list of fields is analysed and the mesh is refined/coarsened based on user-defined error criteria (one per field). In the present simulation, the adaptation is based on both the phase fraction f and the velocity field u, with error criteria of 10^-4 and 10^-3, respectively, and with a user-specified maximum refinement level n, such that the element size Δx can be as small as Δx = 512/2^n in the most refined zone. To assess the most efficient meshing approach, three strategies are compared in terms of accuracy and performance:
1. An automatic refinement up to the maximum level is forced around the interface, i.e. in zones where the fluid fraction is between zero and one. This approach is similar to what can be performed with the Gerris software [6], the predecessor of Basilisk (and done in [7] for example).
2. The adaptive wavelet algorithm of the Basilisk toolbox is used to refine/coarsen wherever it is needed, whatever the liquid fraction value. The adaptation is based on both the phase fraction f and the velocity field u, with error criteria of 10^-4 and 10^-3, respectively.
3. The above two strategies are mixed by forcing the finest mesh close to the interface and adapting elsewhere as needed. This strategy is considered as the reference as it should be the most accurate.
As one can expect, the meshing strategy has a great impact on the number of cells and, consequently, on the computation time. Tab. 2 gives the CPU time spent for each strategy using 4 cores on the same CPU (Intel E5-2670). Strategy 1 is the fastest, while the other two strategies are more expensive because the surrounding air is also partly refined, even though it has little influence on the jet morphology.
When comparing the jet morphologies at breakup, at the same simulation time t = 128 and for Re = 500 (Fig. 3), the three strategies give very similar morphologies, with the best agreement between strategies 2 and 3. Hereafter, strategy 2 is used for all the simulations as it is slightly more accurate in terms of interface shape than strategy 1 and less computationally expensive than strategy 3.
Using meshing strategy 2, a convergence study was then performed and the converged refinement level was found to be 15: as pictured in Fig. 4, the interfaces obtained with a refinement level of 14 are almost identical to those obtained with a refinement level of 15.
With a refinement level of 15 in the most refined zone, the element size can be as small as Δx ≈ 0.0156, or approximately 1.5% of the initial jet radius. An example of the resulting adaptive mesh is plotted in Fig. 5.
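The element size quoted above follows from dividing the domain size by 2^n, with n the refinement level. A short check of this arithmetic is given below; the function name smallest_cell_size is illustrative.

```python
def smallest_cell_size(level: int, domain_size: float = 512.0) -> float:
    """Cell size of a quadtree refined `level` times in a square domain."""
    return domain_size / 2 ** level


for level in (14, 15):
    print(level, smallest_cell_size(level))
# 14 0.03125
# 15 0.015625
```

With the initial jet radius set to 1, level 15 corresponds to 64 cells across the radius in the most refined zone.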
It is worth pointing out that the timestep is automatically adjusted so that the Courant number remains lower than 0.4.
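To give an order of magnitude for the resulting timestep, the sketch below applies the usual definition of the Courant number, C = |u| Δt / Δx, to the finest cell size. It is only an illustration: the function name max_timestep is hypothetical, the velocity magnitude of 1 is the dimensionless inlet velocity rather than the actual maximum velocity in the simulations, and other stability constraints in the solver may impose a smaller timestep.

```python
def max_timestep(cell_size: float, max_speed: float,
                 courant: float = 0.4) -> float:
    """Largest time step keeping |u| * dt / dx at or below `courant`."""
    if cell_size <= 0 or max_speed <= 0:
        raise ValueError("cell_size and max_speed must be positive")
    return courant * cell_size / max_speed


# Finest cell at refinement level 15 in a domain of size 512,
# with the dimensionless inlet velocity of 1 as a reference speed.
dt = max_timestep(cell_size=512 / 2 ** 15, max_speed=1.0)
print(dt)
```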

Comparison with experimental data
The present numerical results are compared in Fig. 6 to experimental ones from [8]. The experimental results have an Ohnesorge number Oh = √We / Re = 0.2, corresponding to Re = 120 in the present dataset. Although the numerical inlet velocity is different from the experimental one, it has been shown in [8,9] that, up to a moderate stimulation amplitude, the jet morphology is not influenced by the nozzle geometry. Both morphologies show a very good overall agreement and the small discrepancy observed is due to the different wave number x, with x = 0.9 and x = 0.6 for the numerical and experimental jet, respectively.
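The two dimensionless numbers used in this comparison can be checked numerically. The Ohnesorge number uses the relation Oh = √We / Re given above. The expression used for the reduced wave number, x = 2π f_r R_0 / u, is an assumption (a disturbance convected at the jet velocity) that is not stated in the text; it is shown here only because it reproduces the numerical value of about 0.9. Function names are illustrative.

```python
import math


def ohnesorge(weber: float, reynolds: float) -> float:
    """Ohnesorge number from the Weber and Reynolds numbers."""
    return math.sqrt(weber) / reynolds


def reduced_wavenumber(frequency: float, radius: float = 1.0,
                       velocity: float = 1.0) -> float:
    """Assumed definition: x = 2 * pi * f * R0 / u."""
    return 2.0 * math.pi * frequency * radius / velocity


oh = ohnesorge(weber=600.0, reynolds=120.0)
x_num = reduced_wavenumber(frequency=1.0 / 7.0)
print(round(oh, 3), round(x_num, 3))
```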

Data collection
A general overview of the methodology used to collect and generate the data is given in Fig. 7. For each simulation output, i.e. for each Re-Amp pair and at each simulation time t, two sets of information are generated: 1. descriptors of each drop are computed (area, volume, etc.) and stored in coefficients.csv; 2. fluid-air interface segments are extracted and saved in an interfaces_Re_Amp_Time.dat file.
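The size of the dataset can be cross-checked from the parameter ranges given in the specifications (Reynolds numbers from 100 to 1000 in steps of 5, amplitudes from 1 to 3.5% in steps of 0.5%, and 101 saved interfaces per pair). The sketch below only enumerates the parameter pairs; it does not assume how the values are formatted in the folder and file names.

```python
# Reynolds numbers from 100 to 1000 inclusive, in steps of 5.
reynolds_values = list(range(100, 1001, 5))

# Stimulation amplitudes from 0.010 to 0.035 in steps of 0.005.
amplitudes = [round(0.010 + 0.005 * i, 3) for i in range(6)]

# One folder per (Re, Amp) pair, 101 interface files per folder.
files_per_pair = 101
pairs = [(re, amp) for re in reynolds_values for amp in amplitudes]
total_interfaces = len(pairs) * files_per_pair

print(len(reynolds_values), len(amplitudes), len(pairs), total_interfaces)
# 181 6 1086 109686
```

The total of 109686 interface files is consistent with the ≈110000 interfaces announced for the dataset.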

Ethics Statement
The authors followed generally expected standards of ethical behavior in scientific publishing throughout article construction.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.