Lagrangian and Eulerian dataset of the wake downstream of a smooth cylinder at a Reynolds number equal to 3900

The dataset contains Eulerian velocity and pressure fields, and Lagrangian particle trajectories, of the wake flow downstream of a smooth cylinder at a Reynolds number of 3900. The open-source Direct Numerical Simulation (DNS) flow solver Incompact3d was used to compute the Eulerian field around the cylinder. Synthetic Lagrangian tracer particles were transported using a fourth-order Runge-Kutta scheme in time and trilinear interpolation in space. Trajectories of roughly 200,000 particles in two 3D sub-domains are publicly available. The dataset can serve as a test case for assessing tracking algorithms and supports studies of Lagrangian physics, statistical analysis, machine learning, and data assimilation.


Specifications
Subject: Physics, Engineering
Specific subject area: 4D Particle Tracking Velocimetry (4D-PTV), Lagrangian Particle Tracking (LPT), Direct Numerical Simulation (DNS)
Type of data: Text file
How data were acquired: Direct Numerical Simulation (DNS); synthetic particle transport
Data format: Raw
Parameters for data collection: The Eulerian velocity and pressure fields as well as the Lagrangian trajectories were collected for every 10

Value of the Data
• The recent rapid development of time-resolved three-dimensional Particle Tracking Velocimetry (4D-PTV) and Particle Image Velocimetry (PIV) has created a need for ground-truth datasets. To this end, a reference dataset was generated from a highly-resolved Direct Numerical Simulation (DNS).
• The data can be used by PIV/PTV algorithm developers for assessment and validation, as well as by researchers interested in machine learning and data assimilation in fluid mechanics. Moreover, scientists can use the Eulerian and Lagrangian snapshots in the dataset to explore the physics of turbulent wake flows.
• Four types of data are available in the repository for two sub-domains: Lagrangian trajectories, 3D velocity fields, 2D velocity snapshots, and pressure fields. As listed in Table 1, one or more types of data can be used depending on the application.
• An open-access Lagrangian particle transport software package is included in the data repository for readers who require tracer particle trajectories with different properties, including particle concentration, temporal scale, and noise level.

Data Description
A highly-resolved Direct Numerical Simulation (DNS) of the flow over a smooth cylinder at a subcritical Reynolds number of 3900 (based on the cylinder diameter D and the free-stream velocity) was performed to generate the data. Double-precision Eulerian and Lagrangian fields were collected for two sub-domains, as shown in Fig  for studies requiring the highest possible temporal resolution. Details of the two sub-domains are given in Table 2. One Eulerian snapshot of the wake flow is shown in Fig. 2. For both sub-domains, Lagrangian trajectories are provided for roughly 200,000 synthetic particles. The data repository is organised into three main categories: Sub-domain-1, Sub-domain-2, and Software. The snapshots are stored as text files (.txt) and bundled in compressed archives (.zip). No special software is required to read and open the data. The naming format of each snapshot is shown in Fig. 3. The Eulerian 3D snapshots are saved in vector format, so they must be unpacked into 3D arrays using three nested loops over the x, y and z directions. Users also need to download the grid file separately to obtain the corresponding coordinates.
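The unpacking step described above can be sketched as follows. This is a minimal illustration under loud assumptions: each snapshot file is assumed to hold a single scalar component as whitespace-separated numbers with no header, and the values are assumed to be ordered with x varying fastest, then y, then z (Fortran-style ordering). Both assumptions should be checked against the grid file and the naming format before use; the function names are hypothetical.

```python
"""Sketch: unpack a flattened Eulerian 3D snapshot into field[i][j][k]."""
from pathlib import Path


def read_values(path):
    """Read all whitespace-separated numbers from a .txt snapshot.

    ASSUMPTION: the file contains only numbers (no header line).
    """
    return [float(tok) for tok in Path(path).read_text().split()]


def reshape_xyz(values, nx, ny, nz):
    """Unpack a flat vector into a nested list indexed field[i][j][k].

    ASSUMPTION: x is the fastest-varying index, then y, then z.
    nx, ny, nz are the grid sizes of the sub-domain (see Table 2 / grid file).
    """
    if len(values) != nx * ny * nz:
        raise ValueError(f"expected {nx * ny * nz} values, got {len(values)}")
    field = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    n = 0
    for k in range(nz):          # outermost loop: z
        for j in range(ny):      # middle loop: y
            for i in range(nx):  # innermost loop: x (fastest-varying)
                field[i][j][k] = values[n]
                n += 1
    return field
```

If the actual ordering turns out to be z-fastest instead, only the nesting order of the three loops needs to change.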

Experimental Design, Materials and Methods
The PIV/PTV community consistently requires synthetic datasets to assess and validate newly developed image-based methods. The EUROPIV Synthetic Image Generator (SIG) developed a standardised synthetic dataset framework for the PIV/PTV community [16]. SIG targeted three objectives: algorithm performance assessment, algorithm sensitivity analysis as a function of characteristic parameters, and algorithm comparison. The characteristic parameters are particle concentration (i.e., seeding density), temporal scale, and noise ratio, which together determine how closely a synthetic dataset resembles a real experiment. Since then, as the capabilities of PIV/PTV techniques have grown, algorithm assessment has increasingly required datasets of flows with relatively complex, high-gradient regions and three-dimensional dynamics. This motivated the generation of a database of Eulerian velocity and pressure fields, together with Lagrangian trajectories, for the complex wake flow downstream of a smooth cylinder. Applications of the current dataset are summarised in Table 1.
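To make the role of the three characteristic parameters concrete, here is a hedged sketch of how they might be applied to ground-truth trajectories: particle concentration by randomly retaining a fraction of particles, temporal scale by keeping one sample out of every n, and noise by adding Gaussian perturbations to each coordinate. The function `degrade` and its parameters are hypothetical, and mapping a "noise ratio" onto a Gaussian standard deviation is an assumption; this is not the procedure used by SIG or by the software package in the repository.

```python
import random


def degrade(trajectories, keep_fraction, every_n, noise_std, seed=0):
    """Apply three characteristic parameters to ground-truth trajectories.

    trajectories: list of particles, each a list of (x, y, z) samples taken
        at the transport time step.
    keep_fraction: fraction of particles retained (particle concentration).
    every_n: keep one sample out of every_n (temporal scale).
    noise_std: std of Gaussian noise added to each coordinate; treating this
        as the 'noise ratio' is an ASSUMPTION.
    """
    rng = random.Random(seed)  # seeded for reproducible synthetic datasets
    n_keep = round(keep_fraction * len(trajectories))
    chosen = rng.sample(trajectories, n_keep)
    out = []
    for traj in chosen:
        sub = traj[::every_n]  # coarser temporal sampling
        noisy = [tuple(c + rng.gauss(0.0, noise_std) for c in p) for p in sub]
        out.append(noisy)
    return out
```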

Eulerian method
The computations were carried out with the open-source flow solver Incompact3d [17,18], which uses sixth-order compact finite-difference schemes for the spatial discretisation on a Cartesian grid. The simplicity of the Cartesian grid makes it possible to implement high-order, spectral-like schemes for the spatial discretisation. For the current simulation, time advancement was performed with an explicit third-order Adams-Bashforth scheme. The governing equations are solved with a fractional step method to enforce the incompressibility constraint, which requires an additional projection step: the solution of a Poisson equation for the pressure. This Poisson equation is solved fully in spectral space using three-dimensional Fast Fourier Transforms (FFTs). In the present work, the smooth cylinder is modelled using a customised immersed boundary method (IBM) with an artificial flow inside the cylinder, which keeps the velocity field smooth while imposing a no-slip boundary condition at the cylinder surface. More details about the flow solver can be found in Laizet and Lamballais [17]. Incompact3d relies on a 2D domain decomposition for simulations on supercomputers: the computational domain is split into a number of subregions (pencils), each assigned to an MPI process. The derivatives and interpolations in the x-direction (y-direction, z-direction) are performed in X-pencils (Y-pencils, Z-pencils). The 3D FFTs required by the Poisson solver are likewise broken down into series of 1D FFTs computed in one direction at a time. Global transpositions from one pencil orientation to another are performed with the MPI command MPI_ALLTOALL(V). Incompact3d scales well up to hundreds of thousands of MPI processes for simulations with several billion grid nodes [18]. Inflow/outflow boundary conditions are imposed in the streamwise direction, with free-slip and periodic boundary conditions in the vertical and spanwise directions, respectively.
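As an illustration of what a sixth-order compact scheme computes, the sketch below applies the classical tridiagonal scheme of Lele (coefficients α = 1/3, a = 14/9, b = 1/9) to a first derivative on a uniform periodic 1D grid. It is not Incompact3d's implementation: non-periodic boundaries, stretched meshes and the parallel pencil layout are not reproduced, and a dense Gaussian elimination is used for clarity instead of an efficient cyclic tridiagonal solver. In a pencil decomposition, a 1D derivative like this one is what each MPI process can compute locally along the direction its pencil spans.

```python
import math


def solve_dense(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(m, rhs)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def compact_derivative_periodic(f, h):
    """Sixth-order compact first derivative on a uniform periodic grid.

    Solves  alpha*d[i-1] + d[i] + alpha*d[i+1]
          = a*(f[i+1]-f[i-1])/(2h) + b*(f[i+2]-f[i-2])/(4h)
    with alpha = 1/3, a = 14/9, b = 1/9. Requires len(f) >= 5.
    """
    n = len(f)
    alpha, a, b = 1.0 / 3.0, 14.0 / 9.0, 1.0 / 9.0
    rhs = [
        a * (f[(i + 1) % n] - f[(i - 1) % n]) / (2.0 * h)
        + b * (f[(i + 2) % n] - f[(i - 2) % n]) / (4.0 * h)
        for i in range(n)
    ]
    # Cyclic tridiagonal left-hand side, stored densely for simplicity.
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0
        m[i][(i - 1) % n] = alpha
        m[i][(i + 1) % n] = alpha
    return solve_dense(m, rhs)
```

With only 32 points per wavelength of sin(x), the derivative matches cos(x) to better than 10^-6, which reflects the spectral-like resolution of compact schemes.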
The simulation was performed on nearly 4 × 10^10 grid points (see Table 2). The grid was uniform in the streamwise and spanwise directions, while a non-uniform grid, refined towards the centre of the cylinder, was used in the vertical direction. The finest grid size in the vertical direction was Δy_min = 0.00563D. The DNS time step was Δt = 0.00075D/U∞ (where U∞ is the free-stream velocity). It takes 6667 DNS time steps to simulate one vortex shedding cycle, and 1333 DNS time steps correspond to one integral time scale D/U∞.
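The step counts above translate directly into physical time scales. With Δt = 0.00075D/U∞, one shedding cycle lasts T = 6667 × 0.00075 ≈ 5.0 D/U∞, which gives a Strouhal number St = fD/U∞ = 1/T ≈ 0.2, in line with commonly reported values near 0.2 at this Reynolds number; 1333 steps give ≈ 1.0 D/U∞. The small helper below (a hypothetical name) reproduces this arithmetic and also converts two other quantities quoted in this article: the Sub-domain 1 sampling interval (every 10 DNS steps) and the Kolmogorov time scale, which is stated to be roughly 20 DNS time steps.

```python
def wake_time_scales(dt=0.00075, steps_per_shedding=6667, steps_per_unit=1333):
    """Convert the step counts quoted in the text into units of D/U_inf."""
    shedding_period = steps_per_shedding * dt  # T ≈ 5.0 D/U_inf
    return {
        "shedding_period": shedding_period,
        "strouhal": 1.0 / shedding_period,      # St = f D / U_inf with f = 1/T
        "unit_time": steps_per_unit * dt,       # ≈ 1.0 D/U_inf
        "sub_domain_1_interval": 10 * dt,       # sampling every 10 DNS steps
        "kolmogorov_time_approx": 20 * dt,      # "roughly 20" DNS steps per the text
    }
```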

Particle transport
The Johns Hopkins Turbulence Database (JHTDB), generated from DNS, has been widely employed for the quantitative performance assessment of PIV/PTV algorithms [19]. JHTDB contains nine multi-terabyte datasets of turbulent flows such as homogeneous isotropic turbulence (HIT) and channel flow. The current study adds to the available databases [16,19,20] by providing a wake-flow case. The wake behind a cylinder at a subcritical Reynolds number features numerous flow complexities, which makes it a challenging test case for quantitative assessments.
In the present dataset, synthetic particles were transported in time using a conventional fourth-order Runge-Kutta scheme. The Lagrangian velocities of the synthetic particles were calculated by trilinear spatial interpolation over the eight nearest neighbouring grid points. To mimic real experimental conditions, three characteristic parameters must be defined: temporal scale, particle concentration, and noise ratio. The synthetic time step can be chosen according to the desired temporal scale, bearing in mind that the DNS time step of the current dataset is roughly 20 times smaller than the Kolmogorov time scale. The desired temporal scale should be defined based on the experimental hardware, such as the illumination pulse rate or the . For the current dataset, the Kolmogorov length scale is about 2.8 times smaller than the average grid size in the vertical direction. In a real experiment, the achievable spatial resolution is strongly limited by the particle seeding system and the performance of the PIV/PTV algorithm. Therefore, an appropriate number of synthetic particles in the domain can be selected depending on the desired spatial resolution. An open-access tracer particle transport software package with a MATLAB graphical user interface (GUI) is available as an additional tool in the data repository. Interested users can create tracer particle trajectories with different properties, including particle concentrations up to the DNS spatial resolution, temporal resolution up to that of the DNS, and different noise levels.
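The transport step described above can be sketched as follows: trilinear interpolation from the eight nodes surrounding a particle, combined with a classical fourth-order Runge-Kutta update. This is a minimal illustration, not the authors' code or the MATLAB package in the repository. It assumes each velocity component is stored as a nested list indexed field[i][j][k] with sorted node coordinates xs, ys, zs (which may be non-uniform, as in the vertical direction here), and the names `locate`, `trilinear` and `rk4_step` are hypothetical.

```python
from bisect import bisect_right


def locate(coords, x):
    """Return the cell index i (coords[i] <= x <= coords[i+1]) and local weight.

    Works for non-uniform, sorted coordinates. Points outside the grid are
    clamped to the nearest cell and therefore linearly extrapolated.
    """
    i = bisect_right(coords, x) - 1
    i = min(max(i, 0), len(coords) - 2)
    w = (x - coords[i]) / (coords[i + 1] - coords[i])
    return i, w


def trilinear(field, xs, ys, zs, p):
    """Interpolate field[i][j][k] at point p = (x, y, z) from its 8 nodes."""
    i, wx = locate(xs, p[0])
    j, wy = locate(ys, p[1])
    k, wz = locate(zs, p[2])
    value = 0.0
    for di, fx in ((0, 1.0 - wx), (1, wx)):
        for dj, fy in ((0, 1.0 - wy), (1, wy)):
            for dk, fz in ((0, 1.0 - wz), (1, wz)):
                value += fx * fy * fz * field[i + di][j + dj][k + dk]
    return value


def rk4_step(velocity, t, p, dt):
    """Advance position p by one classical fourth-order Runge-Kutta step.

    velocity(t, p) must return a 3-tuple (u, v, w) at time t and position p.
    """
    def shift(a, b, s):
        return tuple(ai + s * bi for ai, bi in zip(a, b))

    k1 = velocity(t, p)
    k2 = velocity(t + dt / 2, shift(p, k1, dt / 2))
    k3 = velocity(t + dt / 2, shift(p, k2, dt / 2))
    k4 = velocity(t + dt, shift(p, k3, dt))
    return tuple(
        pi + dt / 6 * (a + 2 * b + 2 * c + d)
        for pi, a, b, c, d in zip(p, k1, k2, k3, k4)
    )
```

For a single frozen snapshot, `velocity` could be `lambda t, p: (trilinear(u, xs, ys, zs, p), trilinear(v, xs, ys, zs, p), trilinear(w, xs, ys, zs, p))`, where u, v, w are the three component arrays. A time-dependent transport would additionally need to interpolate between consecutive snapshots at the RK4 sub-steps, which is not shown here.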

Particle transport accuracy
To quantify the uncertainty level of the trajectories in Sub-domain 1, particles transported every 10 DNS time steps (i.e., the temporal scale of Sub-domain 1) were compared with particles transported at every DNS time step, as in Sub-domain 2. After 1000 DNS time steps in the larger domain, the mean deviation between the trajectories at the two temporal scales is 3.28η, where η is the Kolmogorov length scale. The standard deviation of the position error is σ = 0.017η. Fig. 4 shows a 2D map of the position deviation between the two temporal scales, normalised by η and averaged in the spanwise direction. Therefore, it is recommended to use the data from Sub-domain 2 for studies requiring accurate trajectories inside the wake region, while the data from Sub-domain 1 are better suited for studies focusing on large-scale motions.
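Statistics of this kind could be computed along the following lines. The sketch assumes two position sets for the same particles at the same instant (one transported with the coarse time step, one with the fine time step) and takes the Euclidean distance between matched positions, normalised by η; the exact definitions behind the quoted mean and σ are not spelled out here, so this definition, the use of the population standard deviation, and the function name are assumptions.

```python
import math
import statistics


def deviation_stats(positions_a, positions_b, eta):
    """Mean and population std of |x_a - x_b| / eta over matched particles.

    positions_a, positions_b: lists of (x, y, z) for the same particles, in
    the same order, at the same time (e.g. coarse vs fine transport step).
    eta: Kolmogorov length scale, in the same units as the positions.
    """
    if len(positions_a) != len(positions_b):
        raise ValueError("both sets must contain the same particles")
    d = [math.dist(pa, pb) / eta for pa, pb in zip(positions_a, positions_b)]
    return statistics.fmean(d), statistics.pstdev(d)
```

Evaluating the same deviation per particle and binning it on an x-y grid (averaging over z) would give a map comparable in spirit to the one in Fig. 4.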

Ethics Statement
This study did not involve experiments on humans or animals.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.