Advanced Approach of Material Region Detections on Fibre-Reinforced Concrete CT-Scans

Detections of material regions on CT-scans of solids are commonly treated manually by an expert. Although such manual detections have many advantages, some amount of human error is also incorporated. Moreover, expert opinions may vary significantly. We present an application of the k-means++ clustering as an alternative option to manual way of material area detections. k-means++ clustering is derived from k-means (the method of vector quantization, originally from signal processing), popular for cluster analysis in data mining and image processing communities. The algorithm's main advantages are its simple implementation and fast convergence to a local optimum of an objective function. We benchmark the suggested approach on transverse CT-scans of a fibre-reinforced concrete solid. Moreover, we introduce a technique for processing air distribution, such that the appropriate pixels detected as the pixels of air are converted into pixels representing concrete. The technique is based on the connected component algorithm. Benchmark and results of proposed method conclude the paper.


Introduction
Image processing techniques form an essential part of modern simulation methods especially in preprocessing, post-processing and visualizations.In this paper, we present an application of the commonly known k -means++ algorithm [1] as a solver for finding k maximal homogenous parts of transverse CT-scans of a solid.We call these parts material regions, which correspond to specific material areas in a CT-scan image.We suggest the technique as a semi-automatized alternative to the manual thresholding approach [2], which is used mainly in material science communities.Then, we focus on a problem formulation, which can be solved by k -means type algorithms, commonly known as Lloyd algorithms [3] and [4].For practical purposes, we propose to use the k -means++ variant of this algorithm.Standard k -means and k -means++ are presented.Benchmark of proposed method concludes the paper.

Fundamental Notations and Definitions
Let Ω ⊂ R 2 be the image function area.It is suitable to define the image function f (x, y) as the mapping f : Ω → 0, 255 in case of a CT-scan image.
Definition 1 (Region of Interest).The Region Of Interest Ω ROI ⊂ Ω is a part of an image function area which is selected by the segmentator/classifier (a program or human user).Region Ω ROI is typically a part of image foreground and Ω \ Ω ROI is interpreted as the background of the image.
The Region of Interest is often denoted by the acronym ROI.For the purposes of this text, the ROI definition is sufficient; see [5, pp. 340] for further information.For extracting the ROI from an image function area Ω, a binary mask is used in image processing field.In general, a binary mask corresponds to a characteristic function of a set.

Definition 2 (Binary mask).
Let Ω be the image function area, Ω ROI be ROI, and f be the image function.An image binary mask f mask : Ω → {0, 1} determines points p 1 , p 2 , . . ., p n ∈ Ω which belong to Ω ROI and it is defined as the characteristic function of Ω ROI : where j ∈ {1, . . ., n}.
Further, properties of pixels are maximally similar within each determined region and maximally dissimilar among the regions.More formally, the length of the interval I containing values of the function f |Ω i ROI is minimized, whereas values of f |Ω\Ω i ROI are as far as possible from the centroid c i of the interval I. Other region requirements are not treated such as region compactness, or the total length of the regions boundary.The problem formulated above implies that partitioning of the image scene area is not locally specified; it treats k region homogenizations of pixel value distributions and formulates them as a clustering problem which can be solved by standard clustering algorithms.We propose Lloyd based algorithms, commonly known as algorithms of k-means type, to solve this problem.

4.
The k -means Algorithm (∀p j ∈ P ) (∃!S i ∈ S : In general, it is an image partitioning technique which decomposes an image area into reasonable noncompact parts; by definition, determination of ROI is not a local specified problem.However, the k-means is an optimizer, hence, the solution returned by the solver, is a relax of the optimal solution.Thus, let c 1 , c 2 , . . ., c k be centeroids of the aprropriate image regions S 1 , S 2 , . . ., S k .The regions are built so that they minimize a sum of squares within each region, i.e. the k-means solves the minimazition problem where • 2 denotes the Euclidean norm.In general, we can divide k-means algorithm procedure into two significant steps, i.e. an initialization step and an update step, as follows: • Initialization step: The algorithm determines the nearest centroid from the set of centroids {c 1 , c 2 , . . ., c k }, which we denote by C, of the appropriate regions S 1 , S 2 , . . ., S k for each input point p j ∈ P as follows: it compares stepwise Euclidean distance between the point and each centroid c i ∈ C, and the point is assigned to region S i ∈ S, in which the Euclidean distance between the point and appropriate centroid is minimal.
The initial state is typically a random assignment of points p 1 , p 2 , . . ., p n to regions S 1 , S 2 , . . ., S k or a random selection of centroids from the input point set P and then an assignment of input points by the approach mentioned above.Mersenne twister [6] is typically used as the pseudo-random integer generator.
• Update step: The algorithm computes new coordinates of region centroids as an arithmetic mean of the points within an appropriate region.
These two steps are repeated until the maximum iteration count is exceeded or point region membership changes are relatively small.By the following pseudocode, we simply describe the entire k-means procedure.In the implementation of the k -means algorithm, a point assignment to an appropriate region is implemented so that the point memberships are only stored in membership vector U .This approach is efficient for a computer memory management, because there is no point reassignment overload.for p j ∈ P : U {i t } = j t , where j t is index of the point and i t is the nearest centroid index.
5. Compute new region centroids as the arithmetic mean of points within-region, i.e.

5.
The k -means++ Algorithm The k-means algorithm quickly converges to a local optimum of the potential function φ, but there are many practical examples for which it generates arbitrarily bad clustered regions, i.e. φ φopt is unbounded even when n := |P | and k are fixed.It becomes apparent that the reasons of this fact are the random choices of starting centroids and appropriate assignment of input points to starting regions.Therefore, in 2007, David Arthur and Sergei Vassilvitskii proposed an efficient technique for selecting k-means starting centroids from an input point set P , and they called this technique k-means++ [1].Its general principle is a uniformly random choice of starting centroids from input point set at probability distribution, which is computed as the squared Eu-clidean distance to the centroid that has already been chosen, as follows: Algorithm 2: k-means++ Input : input set P = {p 1 , p 2 , . . ., p n }, desired region count k, reasonable small ε ∈ R, maximum iteration count M .Output: regions S 1 , S 2 , . . ., S k .
1. Choose uniformly randomly a centroid c 0 1 from the input point set and assign c 0 1 to the set of centroids C.
2. For each p j ∈ P , compute the best label distance D(x) as the minimal Euclidean distance between point and the centroids that have already been chosen.
3. Choose next starting centroid c 0 i randomly at probability distribution D(x) 2 .
4. Repeat 2 and 3 until the k starting centroids have not been chosen, then continue with the standard k -means algorithm.
By this augmentation of the k-means, the k-means++ algorithm becomes O (log k)-competitive, where k is the desired region count.Moreover, it can be proved that E [φ] ≤ 8 (ln k + 2) φ opt [1], i.e. the solution returned by the solver is at most 8 (ln k + 2) worse by factor than the optimal solution.

Non-Local Material Detections in CT-Scan Value Distributions
Industrial CT-scans are typically 8-bit grayscale images; it implies that appropriate intensities of desired materials are coded into at most 256 values, i.e. 0, 1, . . ., 255.From Institute of Geonics of the CAS, we have obtained a 1491 large CT -scan dataset of the Fiber-Reinforced Concrete (FRC), the 591 st image is shown in Fig. 1.All the presented binary masks are associated with this CT-scan.
In the FRC dataset, a Fiber-Reinforced Concrete consists of 3 most significant material types, i.e. air, fiber, and concrete.The material detection problem is not locally specified by definition, therefore it is appropriate to use any of the Lloyd type clustering algorithms to solve the material detection problem.As noted in the previous section, the k-means++ algo-  rithm is bounded.Therefore, we propose to use it instead of the unbounded standard k-means algorithm.
In practical applications, the outside-air region (the hatched part in Fig. 3) is not suitable for additional processing, e.g.construction of mesh material parts from determined binary masks.Therefore, we propose to detect image connected components [7] and remove a connected component from an outside-air region.Moreover, small parts of an air region, i.e. small air bubbles in concrete, are typically not appropriate for additional computations, e.g.plasticity [8] and [9] or plasticity with damage [10] computations on the FRC.Hence, we propose to remove air bubbles with a size below a certain threshold from the air region and consider them as parts of the concrete region.In Fig. 4, the threshold is set to 15 pixels.
For implementation, we recommend to use C++ OpenCV framework [11].It is an open-source image processing framework frequently used in imageprocessing communities.Moreover, the algorithms above, i.e. k-means, k-means++ and image connected components algorithm, are implemented in the OpenCV framework, see [12].
We benchmark the presented approach on FRC dataset containing 1491 CT-scans.As already mentioned, such k binary masks determine as homogenous as possible k parts of geocomposite material parts.Furthermore, the binary masks can be used for another processing, e.g.building of hull and volume meshes by the Marching cubes algorithm [13].

7.
Future Work In case of industrial simulations such as plasticity computation on a heterogeneous solid, it is suitable to convert the real-world solid into a finite element model enabling highly accurate simulations.Image partitioning/segmentation plays an important role in detection of material regions of the CT-scan image scene and set up their binary masks.Furthermore, the determined binary masks are used for building hull-mesh by the marching cube algorithm as we present in the text above.For a tetrahedral volume mesh geometry generation from hull-mesh boundary representation, the NETGEN library [14] is typically used.
The plasticity computations will be proceeding in PERMON [15] (Parallel, Efficient, Robust, Modular, Object-oriented, Numerical) software toolbox developed at IT4Innovations National Supercomputing Center, Ostrava, Czech Republic.PERMON is a set of solvers combining quadratic programming and domain decomposition methods.It makes use of and extends the PETSc [16] framework for numerical computations.Among the main applications are contact problems of mechanics.They can be decomposed by means of TFETI (Total Finite Element Tearing and Interconnecting) [17] non-overlapping domain decomposition method implemented in the PermonFLLOP package.The resulting quadratic programming problems can be solved by the PermonQP package efficiently.

Conclusions
In this paper, we introduce an advanced tool for detections of material regions on CT-scans of fiberreinforced concrete.The essential part is based on the k -means++ clustering for detecting binary masks of desired material areas on the CT-scan image scene.
We benchmark the presented approach on FRC dataset containing 1491 CT-scans.Three significant material regions are detected, i.e. air, concrete, fibers.Moreover, we introduce a technique based on an image connected component algorithm for reduction of the air bubble distribution.Based on the achieved results, we propose the automatized presented technique as a suitable alternative to the manual thresholding approach, which is a standard way to solve such problem in material engineering communities.Additionally, the most significant benefits of an automatized technique like ours, which is being presented in this paper, is its possibility to be integrated into a pipeline, in which an industrial process is simulated.The optimization and tuning of the pipeline runned on high performance computers are subjects of our further research.

About Authors
Marek PECHA was born in Zilina, Slovak republic.He has received his M.Sc.degree in Computer Science and Applied Mathematics from VSB-Technical University of Ostrava (VSB-TUO), Czech Republic (2016 Faculty of Electrical Engineering and Computer Science dean's award for diploma thesis).Since 2014 and 2016, he has been a Research assistant at IT4Innovations National Supercomputing Center and a Ph.D. student at the Department of Applied Mathematics, respectively, both institutions being within VSB-TUO.His research of interest covers numerical analysis, machine learning, computer graphics, material sciences, and massively parallel programming.He is a co-author of the PermonSVM library for linear support vector machines problems, and also a co-author of massively parallel version of the PRAGTIC software for fatigue analysis.