Dataset of a parameterized U-bend flow for deep learning applications

This dataset contains 10,000 fluid flow and heat transfer simulations in U-bend shapes. Each of them is described by 28 design parameters, which are processed with the help of Computational Fluid Dynamics methods. The dataset provides a comprehensive benchmark for investigating various problems and methods from the field of design optimization. For these investigations supervised, semi-supervised and unsupervised deep learning approaches can be employed. One unique feature of this dataset is that each shape can be represented by three distinct data types including design parameter and objective combinations, five different resolutions of 2D images from the geometry and the solution variables of the numerical simulation, as well as a representation using the cell values of the numerical mesh. This third representation enables considering the specific data structure of numerical simulations for deep learning approaches. The source code and the container used to generate the data are published as part of this work.


Subject: Computational Mechanics
Specific subject area: Numerical modeling of fluid flow and heat transfer phenomena (conjugate heat transfer) using Computational Fluid Dynamics (CFD)
Type of data: Table; Image; Code files (Python, Bash)
How the data were acquired: A parameterized model of the U-bend flow was defined. A variety of open-source tools, including Python and OpenFOAM, were used to model and solve the physical phenomena associated with this parameterized model. To ensure transparency and reproducibility, the source code has been made publicly available along with the dataset.
Data format: Raw; Filtered
Description of data collection: The dataset consists of 10,000 samples of U-bend shapes. Each is described by 28 parameters. These samples have been generated, evaluated and processed using Python and OpenFOAM. The samples are independent and identically distributed (i.i.d.) to ensure a diverse and reliable dataset for deep learning applications. The performance of each shape is determined by its pressure loss and cooling capacity.
Data source location:
• Institution: University of Kassel, Department for Intelligent Embedded Systems
• City: Kassel
• Country: Germany
Data accessibility:
Data repository name: DaKS - Datenrepository der Universität Kassel [1]
Data identification number: 10.48662/daks-17
Direct URL to data: https://daks.uni-kassel.de/handle/123456789/50

Value of the Data
• Despite its high complexity, the present multiphysics problem can be interpreted and understood from the perspective of fluid dynamics. This allows the results proposed by innovative design optimization algorithms to be properly examined.
• The dataset holds significance for researchers engaged in the optimization of topologies, shapes, or the design of components.
• The dataset offers the potential to develop innovative algorithms for design optimization, particularly those based on deep learning methods that typically require a substantial volume of data. Currently, publicly available datasets of such magnitude for this purpose are limited.
• To measure the performance of different algorithms, this dataset serves as a reliable benchmark in the field of design optimization.
• Due to the number of partial differential equations, there are large numbers of target variables, which are particularly interesting for concepts from transfer learning. For instance, it may become feasible to reduce the number of partial differential equations that must be solved explicitly.
• Each sample of the dataset is represented in three different ways (parameters, images, cells), enabling multi-modal learning.

Objective
The dataset may be of interest to researchers combining the fields of deep learning and design optimization using numerical simulations. In that area, this dataset can serve as a versatile benchmark, since each sample is represented as three different data types. Research on multimodal features, transfer learning, active learning and learning from image and/or graph data using supervised, unsupervised and/or semi-supervised learning methods is feasible. The dataset was originally used as a case study to demonstrate the versatile applicability of a developed framework for anomaly and novelty detection [3]. To enable other researchers to use this dataset, and for subsequent investigations, it is made publicly accessible under the CC-BY-NC-4.0 license.

Experimental Design, Materials and Methods
In this section, the parameterization of the design space, the material properties, the boundary conditions and the two objective functions are presented first, followed by the experimental setup.

Parameterization of the design space:
The parameterized model developed in this work is based on a benchmark test case from the von Karman Institute for Fluid Dynamics, originally introduced by Verstraete et al. [5]. The original model was extended with additional design parameters and with a solid region introduced by Goeke and Wünsch [6]. The original three-dimensional problem is simplified to two dimensions, as shown in Fig. 2.
It has a circular U-bend with an outer radius of $1.26 \cdot d_h$; the fluid area is bounded by an inner radius of $0.26 \cdot d_h$, and the solid, with a thickness of $0.13 \cdot d_h$, is attached to it. The inlet and outlet sections have a length of $10 \cdot d_h$. The hydraulic diameter is $d_h = 0.075$ m. The initial U-bend with both regions, solid (dark grey) and fluid (light grey), is pictured in Fig. 2. The U-bend is partitioned into levels A to G. Each level marks a position on the outer layer (o) or the inner layer (i). So that the complete length of the inlet and outlet channels need not be displayed, the distance from level A/G to level B/F is depicted compressed in Fig. 2.

Boundary points with white-filled circles are fixed and therefore provide no degrees of freedom to the system. Boundary points with green-filled circles each represent two design parameters and thus contribute two degrees of freedom. These points can vary within the dashed boxes, which represent the limits of the design parameters of the respective boundary point. The boxes measure $0.75\,d_h \times 0.75\,d_h$ on the outer layer and $0.52\,d_h \times 0.13\,d_h$ on the inner layer. On the outer layer, each point with a green-filled circle is parameterized in both coordinates x and y between -1 and 1 and has its origin (initial solution) at the parameter values (0, 0). Positive parameter values widen the flow area, whereas negative values constrict it. On the inner layer, one degree of freedom is likewise parameterized between -1 and 1, while the other is parameterized between 0 and 1.

Third-order Bézier curves are used to connect the boundary points. Each curve is controlled by two curve design parameters, which are represented in Fig. 2 by the red dots. The blue lines indicate the origin of each curve parameter. A red dot and its respective blue line can only move along a single axis (x or y). Each curve parameter takes a value between 0 and 1, and the value 0 places the curve parameter directly on the corresponding boundary point. Six boundary points and eight curves, each with two design parameters, result in a total of 28 design parameters. Two example designs are presented in Fig. 3.
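The curve construction described above can be illustrated with a minimal sketch of a third-order Bézier curve between two boundary points. The helper `control_point` and its linear mapping from a curve parameter in [0, 1] to a position along one axis are assumptions made for illustration; the published source code may place the control points differently.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=50):
    """Evaluate a third-order Bezier curve at n evenly spaced parameter values."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

def control_point(origin, axis, reach, c):
    """Hypothetical mapping: move a control point along one axis only.

    c = 0 places the control point on its origin boundary point,
    c = 1 moves it by `reach` along the chosen axis (0 = x, 1 = y).
    """
    point = np.array(origin, dtype=float)
    point[axis] += c * reach
    return point

# Two boundary points and one curve with two curve parameters.
start, end = (0.0, 0.0), (1.0, 1.0)
cp_start = control_point(start, axis=0, reach=1.0, c=0.8)   # moves along x
cp_end = control_point(end, axis=1, reach=-1.0, c=0.8)      # moves along y
curve = cubic_bezier(start, cp_start, cp_end, end)

# With both curve parameters at 0 the control points coincide with the
# boundary points and the curve degenerates to a straight segment.
line = cubic_bezier(start, start, end, end)
```

Because a cubic Bézier curve always starts at its first control point and ends at its last, the boundary points are interpolated exactly regardless of the curve parameter values.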

Material properties and boundary conditions:
The basic material properties and boundary conditions are presented below. They are kept constant for each sample in the dataset. The U-bend depicted in Fig. 2 consists of a fluid and a solid region. The selected fluid is air and the solid region is made of construction steel. The material properties are assumed to be constant and are gathered in Table 1. Boundary conditions for each area are specified separately and described according to the names commonly used in OpenFOAM [7]. The noSlip condition applies to the flow on all walls, which means that the flow velocity there must be zero. A fixed value is specified at the inlet. At the outlet, the gradient of the velocity in the normal direction must be zero due to the free flow. The fluid enters with a temperature of 300 K. The pressure $p_{rgh}$, which is the static pressure $p$ reduced by the hydrostatic pressure, is calculated at the inlet. On the adiabatic walls and at the limit wall the fixedFluxPressure boundary condition is used. At the outlet, the pressure $p_{rgh}$ is specified with a fixed value of zero. At each boundary the static pressure $p$ is calculated. At the surface heat, a heat flux of $q = 100{,}000$ W/m$^2$ is imposed via externalWallHeatFluxTemperature. All boundaries are shown in Fig. 2 and explained in detail in Table 2 for the fluid region and in Table 3 for the solid region. At the interface between fluid and solid the temperature must be the same, and the heat flow supplied to the fluid must equal the heat conducted from the solid. These two conditions are realized with turbulentTemperatureCoupledBaffleMixed, which must be specified for both regions. Therefore, two boundaries instead of one need to be defined. For the fluid region, the boundary is named fluid_to_solid; for the solid region, it is named solid_to_fluid.
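Written out, the pressure definition and the interface coupling described above take the following form. The notation is an assumption introduced here for illustration: $\rho$ is the fluid density, $\mathbf{g}$ the gravitational acceleration, $\mathbf{h}$ the position vector, $\kappa_f$ and $\kappa_s$ the thermal conductivities of fluid and solid, and $n$ the interface normal.

```latex
% Static pressure reduced by the hydrostatic contribution
p_{rgh} = p - \rho \, \mathbf{g} \cdot \mathbf{h}

% Coupling at the fluid-solid interface: equal temperature and equal heat flux
T_f = T_s ,
\qquad
\kappa_f \frac{\partial T_f}{\partial n} = \kappa_s \frac{\partial T_s}{\partial n}
```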
A Mach number of 0.05 permits the incompressible assumption. The Reynolds number of 40,000 indicates a fully turbulent flow. The two-equation $k$-$\omega$ SST model is employed to account for turbulence. The boundary conditions for the turbulent kinetic energy $k$, the turbulent viscosity $\nu_t$ and the specific turbulent dissipation rate $\omega$ are listed in Table 4. The boundary condition inletOutlet normally behaves like zeroGradient. However, it switches to a fixedValue when the velocity vector at the boundary points into the domain. This fixed value is the value at the inlet.
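The switching behaviour of inletOutlet can be sketched per boundary face as follows. This is an illustrative re-implementation, not OpenFOAM code: the function name and arguments are hypothetical, and it assumes the usual convention that a negative face flux means flow entering the domain.

```python
import numpy as np

def inlet_outlet(face_flux, adjacent_cell_value, inlet_value):
    """Illustrative per-face logic of an inletOutlet-type condition.

    Outflow (flux >= 0): zero gradient, the face takes the adjacent cell value.
    Inflow  (flux <  0): fixed value, the face takes the prescribed inlet value.
    """
    face_flux = np.asarray(face_flux, dtype=float)
    adjacent_cell_value = np.asarray(adjacent_cell_value, dtype=float)
    return np.where(face_flux < 0.0, inlet_value, adjacent_cell_value)

# Two outlet faces: the first sees outflow, the second sees backflow.
k_faces = inlet_outlet([1.0, -0.5], [2.0, 3.0], inlet_value=0.1)
```

The second face receives the prescribed value instead of extrapolating from the interior, which keeps recirculating flow at the outlet from importing unbounded turbulence quantities.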
In order to ensure a fully developed velocity profile at the inlet, a one-dimensional channel flow is precomputed. Furthermore, the turbulence values $k$, $\nu_t$ and $\omega$ are taken from this precalculation. This ensures a suitable initial solution for the boundaries of the 2D U-bend duct.
Using the initial geometry, a comprehensive mesh study was performed. The most reliable results were obtained with a mesh of 60 cells along the cross section in the fluid region, 15 in the solid region, and 780 cells along the flow course. The solution for the initial geometry was compared with and validated against particle image velocimetry measurements by Coletti et al. [8] and a 3D simulation by Hayek et al. [9]. To save computation time, the converged numerical solution of the initial problem is used as a starting point for the later generation of samples.
Objective functions:
Two objective functions are introduced to evaluate the different designs and to measure their performance. An obvious optimization problem in fluid mechanics is the minimization of pressure loss. The first objective, $J_1$, is therefore the pressure loss, calculated as the difference between the area-averaged pressure at the inlet and at the outlet, i.e. the integral of the pressure $p$ over the respective area $A$ normalized by that area. A low pressure loss is accompanied by a low pumping effort. The boundary condition for the volume flow, i.e. the quantity of pumped medium, must be kept constant, since a zero flow rate would otherwise trivially result in a zero pressure loss.
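One reading of this verbal definition, written here as $J_1$ to match the symbol used later in the text, is the difference of the area-averaged pressures. The subscripts "in" and "out" for the inlet and outlet areas are notation assumed for this sketch.

```latex
J_1 = \frac{1}{A_{in}} \int_{A_{in}} p \,\mathrm{d}A
    - \frac{1}{A_{out}} \int_{A_{out}} p \,\mathrm{d}A
```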
The second objective function, $J_2$, quantifies the cooling performance by means of the temperature at the heating surface. For a prescribed heat flux, a lower heating-surface temperature indicates better heat transfer. The objective is the squared temperature difference between the heating wall temperature $T_{heat}$ and the inlet temperature $T_{in}$, integrated over the heating surface $A_{heat}$. This objective was introduced by Goeke and Wünsch [10]. From a physical perspective, the two objectives conflict, since no single design minimizes both. The reason is that convective heat transfer and pressure loss are coupled through the Reynolds number, the flow velocity and the wall shear stress. $J_1$ favors a lower Reynolds number, whereas a high Reynolds number promotes convective heat transfer and thus lowers $J_2$.
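Following the description above, the cooling objective can be sketched as the integral below. Any additional normalization used in the original work is not stated in the text and is therefore omitted here.

```latex
J_2 = \int_{A_{heat}} \left( T_{heat} - T_{in} \right)^2 \mathrm{d}A
```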

Experimental design:
The workflow of the computer experiment is shown in Fig. 4. Only free and open-source software was used to set up the experiment: the programming language Python [11], the workload manager Slurm [12], the container software Podman [13] (to paravirtualize a Linux operating system) and OpenFOAM [7] as CFD software. The ➀ Design-Sampler generates a vector of 28 i.i.d. random variables. In this step they are further processed into a format that can be read by OpenFOAM to create the mesh. Subsequently, the ➁ Job-Scheduler is used to parallelize a predefined number of jobs. The fully automated Job-Scheduler creates a job for each sample and can schedule jobs across multiple nodes in the computing cluster. Once simulations are completed, the results are gathered by the ➂ Data-Collector, and new jobs are scheduled until the desired number of samples has been generated.
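A minimal sketch of such a design sampler is given below. It is not the published implementation: the uniform distribution, the ordering of the parameters and the split of the six boundary points into three outer and three inner points are assumptions made for illustration. Only the parameter ranges and the total of 28 parameters come from the text.

```python
import numpy as np

# Hypothetical split of the six boundary points between the two layers;
# the text only states that there are six boundary points in total.
N_OUTER_POINTS = 3
N_INNER_POINTS = 3
N_CURVES = 8

def parameter_bounds():
    """Return lower and upper bounds of the 28 design parameters."""
    lower, upper = [], []
    for _ in range(N_OUTER_POINTS):   # x and y in [-1, 1]
        lower += [-1.0, -1.0]
        upper += [1.0, 1.0]
    for _ in range(N_INNER_POINTS):   # one component in [-1, 1], one in [0, 1]
        lower += [-1.0, 0.0]
        upper += [1.0, 1.0]
    for _ in range(N_CURVES):         # two curve parameters in [0, 1]
        lower += [0.0, 0.0]
        upper += [1.0, 1.0]
    return np.array(lower), np.array(upper)

def sample_designs(n_samples, seed=None):
    """Draw independent designs, assuming a uniform distribution per parameter."""
    rng = np.random.default_rng(seed)
    lower, upper = parameter_bounds()
    return rng.uniform(lower, upper, size=(n_samples, lower.size))

designs = sample_designs(5, seed=0)   # array of shape (5, 28)
```

Drawing every design independently from the same distribution is what makes the samples i.i.d.; fixing the seed makes a sampling run reproducible.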
The three large, parallel dark grey boxes in Fig. 4 show the process of a single simulation. With the help of the Podman paravirtualization software, a container is created for each simulation. This container runs a Linux-based guest operating system, the CFD software OpenFOAM and other auxiliary utilities that are required to process the data as desired. The Mesh-Generator receives the file provided by the Design-Sampler and generates the computational mesh. The quality of the mesh is then checked in Mesh Control. Since the solution quality of numerical simulations is very sensitive to the quality of the mesh, a sample whose mesh does not fulfill the defined quality criteria is discarded. In addition, the parameterization allows the inner and outer layers to intersect, which is physically impossible and therefore prohibited. The parameters used to evaluate the mesh quality are the maximal skewness, the maximal aspect ratio and the maximal non-orthogonality of single cells in the mesh.
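The mesh check can be pictured as a simple threshold test on the three quantities named above. The sketch below is illustrative only: the class, the function and in particular the limit values are placeholders, since the criteria actually used for the dataset are not given in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MeshQuality:
    max_skewness: float
    max_aspect_ratio: float
    max_non_orthogonality: float  # in degrees

# Placeholder limits chosen for illustration, not the authors' criteria.
PLACEHOLDER_LIMITS = MeshQuality(
    max_skewness=4.0,
    max_aspect_ratio=1000.0,
    max_non_orthogonality=70.0,
)

def mesh_is_acceptable(quality, limits=PLACEHOLDER_LIMITS):
    """Accept a mesh only if all three worst-cell metrics stay within limits."""
    return (
        quality.max_skewness <= limits.max_skewness
        and quality.max_aspect_ratio <= limits.max_aspect_ratio
        and quality.max_non_orthogonality <= limits.max_non_orthogonality
    )

good = mesh_is_acceptable(MeshQuality(0.9, 35.0, 42.0))
bad = mesh_is_acceptable(MeshQuality(6.2, 35.0, 42.0))   # skewness too high
```

Checking the worst cell rather than an average reflects that a single badly shaped cell can already degrade or destabilize the numerical solution.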

Fig. 1. The structure of the present dataset. Each of the i.i.d. samples has its own folder with input and output directories in which the respective files for the three different representations are stored. Information on the structure of the dataset is provided in dataset.json and dataset_red.json.

Fig. 2. Parameterized initial geometry with boundary points in green and curve parameters in red. The figure is modified from Decke et al. [3]. The boundary points can vary within the dashed boxes, while the curve parameters indicate the variation between the origin boundary point and its opposite boundary point. A smaller curve parameter results in a more pronounced curve, while a larger curve parameter results in a smoother transition between the boundary points.

Fig. 4. Experimental design to produce the U-bend design dataset. The Design-Sampler generates 10,000 designs which are managed by the Job-Scheduler on the Slurm-based computing cluster. According to the number of parallel jobs, the complex computer simulation is performed using OpenFOAM. Its quality is checked and the three data representations (Parameter, Image and Cells) are provided accordingly. The data is collected by the Data-Collector and prepared for use in deep learning applications by the Data-Processor.
The chtMultiRegionSimpleFoam solver is used as the Evaluator. It is a steady-state solver for buoyant, turbulent fluid flow and solid heat conduction with conjugate heat transfer between solid and fluid regions. The Residual Control monitors the quality of the solution using the residuals of the individual numerical solution variables. Residuals are monitored for two reasons: first, to stop the simulation as soon as a certain solution quality has been achieved, and second, to filter samples that do not meet the minimum requirements for solution quality after the simulation has been completed. The simulation of a single sample on an AMD EPYC 7002 processor (boost clock rate up to 3.35 GHz) takes between 15 and 240 min. Samples that lack sufficient mesh quality or sufficient solution quality are penalized with artificially high objective values. Whether these samples should be considered or excluded depends on the research question under investigation, so they remain part of the dataset. The Post-Processor calculates the defined objective values from the numerical solution and passes them to the Data Export. There the solutions are prepared and exported in the data formats shown in Fig. 1 and forwarded to the ➃ Data-Collector. As a last step, the ➄ Data-Processor deletes unnecessary log files. A *.json file is compiled to represent the existing folder and data structure of the dataset. This *.json file allows the data to be used quickly and efficiently for deep learning tasks.
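Since penalized samples remain in the dataset, a user who wants to exclude them can filter on the objective values. The sketch below assumes the two objectives are available as an array with one row per sample; the function name and the threshold are hypothetical, because the actual penalty value and file layout are not stated in the text.

```python
import numpy as np

def split_penalized(objectives, penalty_threshold):
    """Separate regular samples from penalized ones.

    objectives: array of shape (n_samples, 2) holding J1 and J2 per sample.
    penalty_threshold: hypothetical cut-off above which a value is treated
    as an artificial penalty rather than a physical result.
    """
    objectives = np.asarray(objectives, dtype=float)
    is_valid = np.all(objectives < penalty_threshold, axis=1)
    return objectives[is_valid], is_valid

# Toy values for illustration only, not taken from the dataset.
toy = [[12.0, 3.5], [1e6, 1e6], [9.0, 4.1]]
valid, mask = split_penalized(toy, penalty_threshold=1e5)
```

Returning the boolean mask alongside the filtered values makes it possible to apply the same selection to the parameter, image and cell representations of each sample.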

Table 1
Material properties for the solid area (steel) and the fluid area (air).

Table 2
Boundary conditions of the fluid.

Table 3
Boundary conditions of the solid.

Table 4
Turbulence boundary conditions.