VoroCrack3d: An annotated semi-synthetic 3d image data set of cracked concrete

Sustainability is an important topic in the field of materials science and civil engineering. In particular, concrete, as a building material, needs to be of high quality to ensure its durability. Damage and failure processes such as cracks in concrete can be evaluated non-destructively by micro-computed tomography. Cracks can be detected in the images, for example via edge-detection filters or machine learning models. To study the goodness, robustness, and generalizability of these methods, annotated 3d image data are of fundamental importance. However, data acquisition and, in particular, its annotation is often tedious and error-prone. To overcome data shortage, realistic data can be synthesized. The data set described in this article addresses the lack of freely available annotated 3d images of cracked concrete. To this end, seven concrete samples without cracks were scanned via micro-computed tomography. Realizations of a dedicated stochastic geometry model are discretized to binary images and morphologically transformed to mimic real crack structures. These are superimposed on the concrete images and simultaneously yield the label images that distinguish crack from non-crack regions. The data set contains 1 344 of such image pairs and includes a large variety of crack structures. The data set may be used for training machine learning models and for objectively testing crack segmentation methods.



Value of the Data
• 3d image segmentation methods from the field of machine learning, such as random forests and convolutional neural networks, are usually trained on large annotated data sets. This data set addresses the need for annotated 3d images of cracked concrete. It consists of 1 344 images obtained by superimposing μCT images of concrete with synthetic crack structures, together with the corresponding label images. That is, the ground truths are provided as well.
• The images show seven concrete types, each with four levels of noise. The synthetic crack structures are based on four stochastic geometry models, include several levels of branching, and appear on multiple scales. The variability within this data set makes it unique and useful for several applications.
• First, the data can be used to train machine learning models such as convolutional neural networks for segmenting cracks in 3d images of concrete. Within this context, the goal is to train models that can be used automatically or semi-automatically for segmenting various concrete types and crack structures. Due to its variability, the data set described here can be considered highly suitable for this task.
• Second, based on this data set, crack segmentation methods can be evaluated objectively. It is thus suitable for benchmarking tests, validating robustness, and studying the generalizability of segmentation methods.
• The data set also provides the blank concrete images without cracks as well as the blank label images that contain only the cracks, making it useful for generating additional images that are not necessarily restricted to concrete.

Background
Annotated data is a valuable foundation in the field of pattern recognition. It is a prerequisite for training or assessing the performance of algorithms for image segmentation.
However, annotated data is often scarce. This holds true especially for 3d images, since manually annotating the regions of interest is time-consuming and usually not a trivial task, even for experts. A solution to that problem is offered by the generation of (semi-)synthetic data. Here, the data comes in pairs of input and ground truth images that objectively classify the regions of the input images. This makes them well suited for training machine learning models. Trained on semi-synthetic images, these models have already been applied successfully in many contexts such as crack segmentation in concrete [2, 13-15] and defect segmentation on metal surfaces [12]. Furthermore, segmentation methods, both from classical image processing and from machine learning, can be validated objectively [2, 3].
In the case of 2d images of cracked concrete, several annotated data sets exist [4, 5]. Annotated 2d images are also available for the related field of cracks in (road) pavements [9, 10]. In a previous study of the authors, 3d concrete images with synthetic cracks as realizations of fractional Brownian surfaces were considered. The annotated data set is freely available [2]. However, it is restricted to cracks on just one scale that appear on just one concrete type.
Here, we consider cracks that are realizations of a more flexible stochastic geometry model, both in terms of crack shape and crack thickness. Cracks are discretized to binary images, morphologically transformed, and embedded into real μCT images of concrete. The images feature high variability regarding concrete backgrounds, noise levels, crack widths and structures. Hence, we consider the data set unique and capable of filling the gap of freely available, annotated 3d image data of cracked concrete. This makes it highly valuable for the image processing, deep learning, materials science, and civil engineering communities.
The crack modeling and discretization procedure was originally proposed in [1]. While the authors of [1] focus on the development of the stochastic modelling approach, this paper describes a new data set based on a variety of CT images of concrete with embedded model realizations.

Data Description
VoroCrack3d [11] comprises a total of 1 344 volume images of concrete. All images are 16-bit 3d grayvalue images in tif format and of size 400 × 400 × 400 voxels. The data is split up with respect to the seven particular concrete backgrounds shown in Fig. 1. The backgrounds are cropped from reconstructed μCT images of the respective concrete samples. The directory structure is as follows:
1. hpc: High-performance concrete reinforced with fibers made of glass fiber-reinforced polymer, voxel size 20.4 μm
2. nc: Normal concrete reinforced with fibers made of glass fiber-reinforced polymer, voxel size 22.7 μm
3. pores: Air pore concrete, voxel size 2.8 μm
4. ppfiber: High-performance concrete, reinforced with polypropylene fibers, voxel size 60.4 μm
5. steelfiber-crimped: Ultra-high-performance concrete, reinforced with crimped steel fibers, voxel size 106 μm
6. steelfiber-hooked-end: Ultra-high-performance concrete, reinforced with hooked-end steel fibers, voxel size 88.5 μm
7. steelfiber-straight: Ultra-high-performance concrete, reinforced with straight steel fibers, voxel size 49.4 μm
For each concrete type, synthetic cracks were generated as a subset of facets of a random Voronoi diagram. We used four different point process models as generators of the diagram. The input and label folders then each contain 48 images representing several levels of crack widths and branching (Fig. 2 and Fig. 3). The degree of branching is given by the number of crack branches per image. The local variation of crack width is determined by the Bernoulli parameter, see Section 'Experimental Design'.
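Since the voxel sizes differ by a factor of almost 40 between the backgrounds, the same 400-voxel cube covers very different physical volumes. The following minimal sketch converts the voxel sizes listed above into physical edge lengths. The dictionary and function names are illustrative and are not part of the data set.

```python
# Voxel sizes in micrometres, copied from the directory list above.
VOXEL_SIZE_UM = {
    "hpc": 20.4,
    "nc": 22.7,
    "pores": 2.8,
    "ppfiber": 60.4,
    "steelfiber-crimped": 106.0,
    "steelfiber-hooked-end": 88.5,
    "steelfiber-straight": 49.4,
}
EDGE_VOXELS = 400  # every image is 400 x 400 x 400 voxels


def edge_length_mm(concrete_type: str) -> float:
    """Physical edge length of one image cube in millimetres."""
    return EDGE_VOXELS * VOXEL_SIZE_UM[concrete_type] / 1000.0


if __name__ == "__main__":
    for name in VOXEL_SIZE_UM:
        print(f"{name}: {edge_length_mm(name):.2f} mm")
```

A crack that is one voxel wide therefore corresponds to 2.8 μm in the pores images but to 106 μm in the steelfiber-crimped images.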

Experimental Design, Materials and Methods
The concrete samples were prepared by Frank Schuler (steelfiber-hooked-end, steelfiber-crimped), Bianca Dornisch-Bund (pores), Kasem Maryamh (steelfiber-straight), Martin Kiesche (hpc, nc), and Szymon Grzesiak (ppfiber), all at RPTU. CT imaging was done by Franz Schreiber (steelfiber-hooked-end, pores, steelfiber-straight, nc, hpc; at Fraunhofer ITWM), Michael Salamon (ppfiber; at Fraunhofer EZRT) and Ralf Löffler (steelfiber-crimped; at Hochschule Aalen). The image data was acquired by means of μCT using the hardware setups A-C.
The synthetic crack structures are realizations of a stochastic geometry model. In the following, we present the basic idea behind the modeling procedure and report the parameters that were chosen to produce the cracks. For more details, we refer to [1]. Cracks are modeled via a connected set of facets of a bounded 3d Voronoi diagram. The generators of the Voronoi diagram are realizations of several types of point process models. In the first step, we choose a Voronoi vertex on each of the edges of the bounding cuboid pointing in y-direction. To avoid boundary issues, the vertices are chosen in the center of the cuboid. The four vertices are connected via Dijkstra's algorithm using only Voronoi edges on the faces of the cuboid. The edges are weighted according to their length. This procedure yields a contour on the boundary of the cuboid (Fig. 5, middle). Then, a minimum-weight surface (MWS) bounded by that contour is computed (Fig. 5, right).
This is done by solving a linear binary integer program that minimizes the total weight of the selected facets. Here, F is the set of Voronoi facets, w is a function that assigns each facet a positive weight, and D is an incidence matrix with D_{j,i} = 1 if edge a_j and face f_i are incident and coherent, D_{j,i} = −1 if edge a_j and face f_i are incident and anti-coherent, and D_{j,i} = 0 otherwise. Similarly, the vector q has entries q_j = 1 if a_j is part of the contour and coherent to it, q_j = −1 if a_j is part of the contour and anti-coherent to it, and q_j = 0 otherwise. After solving the program, y_i = 1 means that facet f_i is part of the minimum-weight surface. For more details we refer to [1].
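For orientation, a program of this kind can be written in the following standard form. This is a reconstruction from the definitions of F, w, D, q and y given in the text, not a formula copied from [1]. In particular, the equality constraint D y = q is an assumption based on how the incidence matrix and the contour vector are defined.

```latex
\begin{aligned}
\min_{y} \quad & \sum_{f_i \in F} w(f_i)\, y_i \\
\text{subject to} \quad & D\,y = q, \\
& y_i \in \{0, 1\} \quad \text{for all } f_i \in F .
\end{aligned}
```

Under this reading, each row of the constraint states that the selected facets meeting at edge a_j cancel out unless a_j belongs to the prescribed contour, so that the selected facets form a surface whose boundary is exactly that contour.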
Here, we choose the facet weights to be equal to their respective area. The Voronoi diagram's level of regularity depends on the chosen underlying point process model. We consider an observation window of size 400 × 150 × 400 and Poisson point processes ('ppp') with intensity 0.0002, Matérn cluster processes ('matclust') with parent intensity 0.0002/50, offspring intensity 50, cluster radii 20, and hard-core point processes ('hc') obtained from force-biased packings of spheres [8] with constant radii, intensity 0.000025 and 60% volume fraction, see Fig. 6. Furthermore, we consider a Poisson point process in an observation window of size 200 × 150 × 200 with intensity 0.0002. The resulting Voronoi diagram will be scaled in the discretization procedure explained below ('ppp-scaled').
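To make the parameters concrete, the sketch below samples the Poisson and Matérn cluster generators with NumPy. It is an illustration under stated assumptions, not the authors' implementation: 'offspring intensity 50' is read as a mean of 50 offspring per parent, offspring are placed uniformly in a ball of radius 20 around the parent, and parents are sampled in an enlarged window to reduce edge effects. The force-biased packing used for the hard-core process is not reproduced here.

```python
import numpy as np


def poisson_points(rng, intensity, window):
    """Homogeneous Poisson point process in the box [0,a] x [0,b] x [0,c]."""
    window = np.asarray(window, dtype=float)
    n = rng.poisson(intensity * window.prod())
    return rng.uniform(0.0, window, size=(n, 3))


def matern_cluster_points(rng, parent_intensity, mean_offspring, radius, window):
    """Matern cluster process (assumed reading of the parameters in the text)."""
    window = np.asarray(window, dtype=float)
    # Sample parents in a window enlarged by the cluster radius on every side.
    parents = poisson_points(rng, parent_intensity, window + 2 * radius) - radius
    clusters = []
    for parent in parents:
        k = rng.poisson(mean_offspring)
        # Uniform points in a ball: random direction times radius * U^(1/3).
        direction = rng.normal(size=(k, 3))
        direction /= np.linalg.norm(direction, axis=1, keepdims=True)
        r = radius * rng.uniform(size=(k, 1)) ** (1.0 / 3.0)
        clusters.append(parent + r * direction)
    points = np.concatenate(clusters) if clusters else np.empty((0, 3))
    inside = np.all((points >= 0) & (points <= window), axis=1)
    return points[inside]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = (400, 150, 400)
    ppp = poisson_points(rng, 0.0002, window)
    matclust = matern_cluster_points(rng, 0.0002 / 50, 50, 20, window)
    print(len(ppp), len(matclust))
```

With these values the expected number of generators is 0.0002 · 400 · 150 · 400 = 4800 for both the Poisson and the cluster process, so the two models differ in the spatial arrangement of the cells rather than in their number.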
The MWS is then discretized. In the first step, a label image is computed where the grayvalue of each voxel (i,j,k) is set to the index of the Voronoi cell the point (i,j,k) is contained in. In a second step, a binary image is computed by considering adjacent pairs of voxels whose grayvalues in the label image differ. The values of these voxels are set to 1 in the binary image if their labels correspond to cells that generate a facet that is part of the minimum-weight surface. The remaining voxels are set to 0.
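The two discretization steps can be sketched as follows. This is a simplified illustration: the nearest generator is found by brute force, which only works for small grids, adjacency is taken along the coordinate axes, and the selected facets are passed as pairs of cell indices. The function names and the facet representation are assumptions made for this example.

```python
import numpy as np


def voronoi_label_image(generators, shape):
    """Label each voxel with the index of its nearest generator (brute force)."""
    axes = [np.arange(n) for n in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    # Squared distances from every voxel to every generator.
    d2 = ((grid[..., None, :] - generators) ** 2).sum(axis=-1)
    return d2.argmin(axis=-1)


def discretize_facets(labels, facet_pairs):
    """Mark axis-adjacent voxel pairs whose cells generate a selected facet."""
    pairs = {frozenset(p) for p in facet_pairs}
    binary = np.zeros(labels.shape, dtype=np.uint8)
    for axis in range(labels.ndim):
        lo = [slice(None)] * labels.ndim
        hi = [slice(None)] * labels.ndim
        lo[axis] = slice(0, -1)
        hi[axis] = slice(1, None)
        a, b = labels[tuple(lo)], labels[tuple(hi)]
        hit = np.zeros(a.shape, dtype=bool)
        for p in pairs:
            i, j = tuple(p)  # two distinct cell indices
            hit |= ((a == i) & (b == j)) | ((a == j) & (b == i))
        # Both voxels of each matching pair become crack voxels.
        binary[tuple(lo)][hit] = 1
        binary[tuple(hi)][hit] = 1
    return binary


if __name__ == "__main__":
    gens = np.array([[2.0, 2.0, 2.0], [7.0, 2.0, 2.0]])
    labels = voronoi_label_image(gens, (10, 5, 5))
    crack = discretize_facets(labels, [(0, 1)])
    print(crack.sum())
```

For full-size images a spatial index such as a k-d tree would replace the brute-force distance computation.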
In the case of the structure in the 200 × 150 × 200 observation window, the discretized image is scaled by a factor of two in x- and z-direction.
Afterwards, the structure is dilated. The cracks with a fixed width are dilated with a structuring element of fixed size (3 × 3, 5 × 5, 7 × 7) or not dilated at all. For the multi-scale cracks, we choose an adaptive dilation procedure: Starting from slice x = 1, every 2d slice is dilated separately and repeatedly with a 2 × 2 structuring element. The number of repetitions is derived from a random walk with Bernoulli-distributed increments with parameter p (Bernoulli parameter). We chose p from {0.01, 0.02, 0.05, 0.1, 0.2}.
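A minimal sketch of the adaptive dilation is given below. The text does not specify how the random walk is mapped to the number of repetitions, so this example assumes the simplest reading: the count starts at zero and increases by one with probability p from slice to slice. The anchoring of the 2 × 2 structuring element at its upper-left corner is likewise an assumption.

```python
import numpy as np


def dilate_2x2(mask):
    """One binary dilation of a 2d boolean mask with a 2 x 2 structuring element."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:, 1:] |= mask[:, :-1]
    out[1:, 1:] |= mask[:-1, :-1]
    return out


def adaptive_dilation(crack, p, rng):
    """Dilate each x-slice; the repetition count follows a Bernoulli(p) walk."""
    result = np.zeros(crack.shape, dtype=bool)
    repetitions = 0
    for x in range(crack.shape[0]):
        # Assumed increment rule: +1 with probability p, otherwise unchanged.
        repetitions += int(rng.random() < p)
        slice_2d = crack[x].astype(bool)
        for _ in range(repetitions):
            slice_2d = dilate_2x2(slice_2d)
        result[x] = slice_2d
    return result


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    crack = np.zeros((400, 32, 32), dtype=bool)
    crack[:, 16, 16] = True  # toy crack: a single line of voxels along x
    wide = adaptive_dilation(crack, 0.05, rng)
    print(wide[0].sum(), wide[-1].sum())
```

Under this reading, a larger Bernoulli parameter makes the crack width grow faster along the x-direction, which matches the role of p as the parameter controlling local width variation.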
To model the rough crack boundary, a second Voronoi diagram is generated from a Poisson process in 400 × 150 × 400 with intensity 0.2. This diagram is also discretized. The expected size of a cell of this diagram is roughly 5 voxels. Every cell of this second diagram that touches the initial dilated crack is merged to the crack. For cracks which have not been dilated, this procedure yields widths which exceed the desired width of 1 voxel. In this case we only consider crack voxels that are adjacent to the lower part of the background to be part of the crack.
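The merging step can be sketched in a few lines. The expected cell size follows from the intensity: a Poisson process with intensity 0.2 places on average one generator per 1/0.2 = 5 voxels. In the example below, 'touches' is interpreted as 'shares at least one voxel with the crack', which is an assumption, and the function name is illustrative.

```python
import numpy as np


def refine_boundary(crack, fine_labels):
    """Merge every cell of the fine diagram that contains a crack voxel."""
    touching = np.unique(fine_labels[crack])
    return np.isin(fine_labels, touching)


if __name__ == "__main__":
    fine_labels = np.array([[0, 0, 1, 1],
                            [2, 2, 3, 3]])
    crack = np.zeros(fine_labels.shape, dtype=bool)
    crack[0, 1] = True  # one crack voxel inside cell 0
    print(refine_boundary(crack, fine_labels).astype(int))
```

The function works for arrays of any dimension, so the same call applies to the 3d label image of the second Voronoi diagram.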
Crack branching is realized by combining several cracks. The main crack is generated as described above. Branches are added as follows: For the contour, we take two of the vertices that were used for the contour of the main crack. The other two are chosen uniformly at random from the respective edges of the cuboid. Branches are not dilated. For modeling the rough boundary, the same Voronoi diagram as above is used.
The resulting label image is then padded to a size of 400 × 400 × 400 to obtain the ground truth image. It is embedded into the 3d image of uncracked concrete: The crack's grayvalues are sampled from a normal distribution. Mean and variance are estimated from the empirical distribution of the concrete's air pores. To this end, the air pores are segmented via a Frangi filter [6] for dark blob-like structures on a brighter background. Finally, a Gaussian filter with σ = 0.6 is applied to the transition area of crack and concrete background to mimic the partial volume effect. The dilation and embedding procedure is shown in Fig. 7.
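The grayvalue sampling can be illustrated as follows. The sketch assumes that a boolean pore mask is already available (the authors obtain it with a Frangi filter, which is not reproduced here) and it omits the final Gaussian smoothing of the transition area. All names are illustrative.

```python
import numpy as np


def embed_crack(image, crack, pores, rng):
    """Overwrite crack voxels with values drawn from a normal distribution
    whose mean and standard deviation are estimated from the pore voxels."""
    pore_values = image[pores].astype(np.float64)
    mean, std = pore_values.mean(), pore_values.std()
    out = image.copy()
    samples = rng.normal(mean, std, size=int(crack.sum()))
    # Clip to the 16-bit range before casting back to the image type.
    out[crack] = np.clip(np.rint(samples), 0, 65535).astype(image.dtype)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    image = rng.integers(28000, 32000, size=(20, 20, 20)).astype(np.uint16)
    pores = np.zeros(image.shape, dtype=bool)
    pores[2:5, 2:5, 2:5] = True
    image[pores] = rng.integers(900, 1100, size=int(pores.sum()))
    crack = np.zeros(image.shape, dtype=bool)
    crack[:, 10, :] = True
    cracked = embed_crack(image, crack, pores, rng)
    print(cracked[crack].mean(), image[pores].mean())
```

Sampling from the pore statistics makes the synthetic crack look like air to a segmentation method, which is the physically expected content of an open crack.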
Steel reinforcements are not expected to fail during crack propagation. Hence, the synthetic cracks do not go through these reinforcements. That is, the fiber system from the background image should be unaffected by the crack embedding described above. To this end, we compute a mask by segmenting the steel reinforcements via a Frangi filter for bright tube-like structures on a darker background. An example is shown in Fig. 8. Cracks are then only added outside the mask.
These considerations also yield the conclusion that an alternative ground truth is beneficial. First, the data set includes a ground truth of the whole crack structure ('label'). For the second (alternative) ground truth ('label-altern'), the intersections of crack and fibers are computed. These intersections are then considered background (no crack, value 0) in the ground truth since we assume only uncracked fibers (Fig. 8, right).
Also, it can be argued whether air pores intersecting a crack should be considered part of the crack. Here, the intersection of a crack and an air pore is considered background (no crack) if the crack does not contain the whole air pore. The concept is visualized in Fig. 9.
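The two rules for the alternative ground truth can be combined in a short sketch. It assumes that a boolean fiber mask and a pore label image (0 for 'no pore', a distinct positive integer per pore) have been computed beforehand, for instance by connected-component labeling of the pore segmentation. These inputs and the function name are assumptions made for this example.

```python
import numpy as np


def alternative_label(label, fiber_mask, pore_labels):
    """Derive 'label-altern' from 'label' under the two rules in the text."""
    crack = label > 0
    alt = label.copy()
    # Rule 1: intersections of crack and fibers are background.
    alt[fiber_mask] = 0
    # Rule 2: a pore that is not fully contained in the crack is background.
    for pore_id in np.unique(pore_labels[crack]):
        if pore_id == 0:  # 0 is assumed to mean "no pore"
            continue
        pore = pore_labels == pore_id
        if not np.all(crack[pore]):
            alt[pore] = 0
    return alt


if __name__ == "__main__":
    label = np.array([0, 2, 2, 2, 0, 0])
    pore_labels = np.array([0, 1, 0, 2, 2, 0])
    fiber_mask = np.array([False, False, True, False, False, False])
    print(alternative_label(label, fiber_mask, pore_labels))
```

In the toy example, pore 1 lies entirely inside the crack and is kept, while pore 2 extends beyond the crack, so its overlap with the crack is reset to 0.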
The grayvalues of the ground truth images correspond to the local crack thickness. The images can be thresholded with threshold 1 to obtain binary label images.
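As a small illustration, the thresholding can be done as follows. The sketch assumes that background voxels have value 0 and crack voxels have values of at least 1, so that 'threshold 1' means 'greater than or equal to 1'.

```python
import numpy as np


def binarize(label, threshold=1):
    """Turn a thickness-valued label image into a binary crack mask."""
    return (label >= threshold).astype(np.uint8)


if __name__ == "__main__":
    label = np.array([[0, 1, 3],
                      [0, 0, 7]], dtype=np.uint16)
    print(binarize(label))
```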

Limitations
Our crack modeling procedure is based on the assumption that cracks are connected structures which possibly exhibit multiple levels of thickness. Typically, these kinds of cracks appear in concrete samples that were exposed to stress tests such as tensile or bending tests. However, other crack structures are possible. For example, cracks that emerge from alkali-silica reactions typically exhibit a different topology [7]. In particular, a network of single, disconnected cracks can be observed. Our model does not account for these kinds of structures, such that the data set is limited to connected crack structures.

Fig. 5. Crack modeling procedure: Voronoi diagram bounded by a cuboid (left), contour on the cuboid that is a set of Voronoi edges (middle), minimum-weight surface inside the Voronoi diagram that is bounded by the contour (right).

Fig. 6. MWS with different levels of regularity. The Voronoi diagrams were generated by a Poisson point process (left), a Matérn cluster process (middle) and a hard-core process (right).

Fig. 7. From left to right: Discretization of the MWS, adaptively dilated crack, crack boundary refinement, crack embedded into a CT image of real concrete.

Fig. 9. From left to right: Polypropylene fiber-reinforced concrete, air pore segmentation, label and alternative label image. The green pore in the segmentation intersects the crack but is not fully contained in it. Thus, it is not considered part of the crack and is given value 0 in the alternative label. The orange pore is fully contained in the crack. Thus, it is part of the crack and is not considered background.
© 2024 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
The image data were acquired by means of micro-computed tomography (μCT) at Fraunhofer ITWM, Fraunhofer EZRT, and Hochschule Aalen using the CT devices described in Section 'Experimental Design'. The synthetic crack structures are realizations of a stochastic geometry model which are discretized and morphologically transformed as described in Section