Image Processing-Based Spall Object Detection Using Gabor Filter, Texture Analysis, and Adaptive Moment Estimation (Adam) Optimized Logistic Regression Models

This study proposes a computer vision model for automatic recognition of localized spall objects appearing on surfaces of reinforced concrete elements. The new model integrates image processing techniques and machine learning approaches. The Gabor filter, supported by principal component analysis and k-means clustering, is used for identifying the region of interest within an image sample. The binary gradient contour, gray level co-occurrence matrix, and color channels' statistical measurements are employed to compute the texture of the extracted region of interest. Based on the computed texture-based features, the logistic regression model trained by the state-of-the-art adaptive moment estimation (Adam) is utilized to establish a decision boundary that delivers predictions on the status of "nonlocalized spall" and "localized spall." Experimental results demonstrate that the newly developed model achieves good detection accuracy, with classification accuracy rate = 85.32%, precision = 0.86, recall = 0.79, negative predictive value = 0.85, and F1 score = 0.82. Thus, the proposed computer vision model can assist decision makers in the task of the periodic survey of structure health condition.


Introduction
Public safety is a major concern of civil engineers who design and maintain high-rise buildings. Despite considerable efforts in design and advanced knowledge of building structures, accidents can still happen in the built environment due to excessive usage, structural aging, and inclement weather conditions [1]. Among the hazards occurring in high-rise buildings, falling objects from overhead caused by concrete spalling can be particularly dangerous and have a high potential severity to occupants' health [2]. The effect of concrete debris can be devastating for human lives if it breaks off from surfaces of exterior wall systems of high-rise buildings [3].
A concrete spall (Figure 1) is regarded as flakes of concrete/mortar broken off from a concrete element (e.g., beam, wall, and ceiling) [4]. Spalling is typically caused by stresses brought about by differential movement of materials. Most often, spalling in concrete is due to corrosion of steel reinforcement embedded in the structure. To prevent such accidents and to ensure the safety and serviceability of the built environment, periodic visual surveys of structural health condition and proper maintenance activities are crucial [5].
In developing countries, including Vietnam, manual visual inspection is still the principal method for evaluating structural health conditions. This activity is performed at regular intervals to identify potential damages and guarantee the service/safety requirements of high-rise buildings. Provided that well-trained technicians experienced in structural health assessment are available, manual visual inspection is able to provide accurate surveying outcomes. Nevertheless, due to the increasing number of buildings that need to be inspected periodically and the limited number of experienced technicians, timely evaluation of building elements becomes infeasible and inspection deficiencies become a major concern of property owners. Therefore, there is a practical need to substitute the unproductive manual visual survey with a more effective approach.
Recently, due to the ease of access to low-cost and high-quality visual sensing equipment including digital cameras, computer vision-based models have been increasingly used for automatic structural health monitoring [6].
These advanced approaches have proved to be viable alternatives to the labor-intensive and subjective methods that rely on manual survey. With the use of advanced image processing techniques operated on image samples collected from digital cameras, the physical condition of civil structures can be continuously surveyed and reported to maintenance agencies. This evaluation outcome can be effectively used to support the decision-making process regarding maintenance prioritization and funding allocation.
For these reasons, a large number of computer vision-based approaches have been proposed to detect various forms of structural defects such as cracking and spalling. Abdel-Qader et al. [7] employ a principal component analysis-based model to recognize cracking defects appearing on bridge surfaces; the principal component analysis is utilized to support data cluster identification with a large database of bridge images. O'Byrne et al. [8] utilize texture analysis for detecting damages appearing on infrastructural elements; the texture-based image segmentation relies on pixel intensity values and the gray level co-occurrence matrix. Subsequently, a support vector machine model is employed for the data classification task. Lattanzi and Miller [9] rely on a data clustering approach for image segmentation based on the Canny and k-means algorithms; the research finds that the combined algorithms can deliver good accuracy of crack recognition under different environmental circumstances.
As can be seen from the literature, a large number of previous studies have been dedicated to crack detection for concrete structures [10][11][12][13][14][15][16][17][18][19][20]. Only recently has there been an increasing focus on detecting other forms of damage including concrete spalling [21][22][23][24]. German et al. [25] construct a combination of segmentation, template matching, and morphological preprocessing for detecting spalls appearing on surfaces of concrete columns. Machine learning models including support vector machines, the Naïve Bayesian classifier, and random forest have been employed to identify concrete defects [8,26]. A model for localization and quantification of concrete spalling defects based on terrestrial laser scanning has been proposed in [27]. Dawood et al. [21] presented a computer vision-based model for spalling detection in the environment of subway networks.
Hoang [28] relies on a steerable filter and machine learning to recognize wall defects such as cracks and spalls. A concrete spalling detection model for metro tunnels from point clouds that employs a roughness descriptor has been developed by Wu et al. [24]. Hoang [29] presents an image processing approach for automatic detection of concrete spalling using machine learning algorithms and image texture analysis. Nevertheless, this model focused on machine learning-based texture discrimination and was not capable of isolating the entire individual spall object.
Yao et al. [30] establish a convolutional neural network-based model for concrete bughole detection; a large number (about 10,000) of image examples have been used as a training dataset. Li et al. [31] proposed a model for detecting exposed aggregate appearing on stilling basin slabs using the attention U-Net network. Chow et al. [32] employ deep learning with a convolutional autoencoder for anomaly detection of defects existing on concrete structures. A model for recognizing damaged ceiling areas in large-span structures has been proposed by Wang et al. [33]; this model employs a convolutional neural network for pattern recognition. Although deep learning-based models are capable of performing the feature extraction phase automatically, these supervised learning models generally demand a large training dataset and a meticulous process of data labeling [34][35][36]. This data labeling process itself can be time-consuming as well as error prone [5]. In addition, deep learning models also require experience and a trial-and-error process to adjust a significant number of the model's free parameters.
An effort to combine unsupervised learning and machine learning-based data classification has been recently introduced in [37]. The k-means clustering algorithm and a machine learning classifier have been integrated to form an automatic approach for estimating stripping of asphalt. The k-means clustering algorithm is utilized to separate pixels with similar values on the surface of aggregates; subsequently, machine learning models are used to categorize the identified clusters into groups of asphalt-coated and uncoated areas.
As pointed out by previous studies, the current challenges faced by computer vision-based concrete damage detection, including spall recognition, are complex environmental conditions (e.g., noisy background image) [5] and the difficulty of the image labeling process [32]. More efforts should be dedicated to automatically identifying the damage's region of interest (ROI) via unsupervised learning methods. Capable machine learning methods with few free parameters should be investigated as viable alternatives to sophisticated models used for data classification. This is because simple and manageable models significantly facilitate the development and application of hybrid computer vision-machine learning approaches for concrete spalling detection.
Based on this motivation, this study proposes and verifies an automated method for recognizing localized spall objects based on an integration of a Gabor filter, k-means clustering, image texture analysis, and logistic regression pattern classification models. The Gabor filter coupled with principal component analysis and k-means clustering are used synergistically for automatic identification of the ROI on a concrete surface. The image texture analysis combines powerful texture discriminators of binary gradient contours, color channels' properties, and the gray level co-occurrence matrix. The logistic regression model trained by the state-of-the-art adaptive moment estimation (Adam) optimizer is employed for data classification.
The subsequent sections of the study are organized as follows: the second section reviews the research methodology. The third section presents the image data collection process. The proposed integrated model used for concrete spall detection is described in the next section, followed by the experimental results and discussion. The final section summarizes the research findings with several concluding remarks.

Gabor Filter (GF).
Image segmentation is the process of separating an image into distinctive regions [38,39]. The GF is a widely applied approach for segmenting images [40,41]. This approach is inspired by the multichannel operation of the human visual system used for visual interpretation in real-world circumstances [42][43][44]. Experiments have shown that the GF resembles simple cells in the mammalian vision system. Thus, this filter can be a reasonable model of how humans actually recognize and discriminate areas characterized by different textures [45].
The GF consists of two-dimensional Gabor filters, which can be described as complex sinusoidal waves modulated by Gaussian envelopes [43]. This filter carries out a localized and oriented frequency analysis of a two-dimensional signal.
The GF yields a response that can be mathematically given as follows [45]:
$$h(x, y) = \exp\left\{-\frac{1}{2}\left[\frac{x^{2}}{\sigma_x^{2}} + \frac{y^{2}}{\sigma_y^{2}}\right]\right\}\cos(2\pi u_0 x),$$
where $u_0$ denotes the frequency of a sinusoidal plane wave along the x axis, and $\sigma_x$ and $\sigma_y$ represent the space constants of the Gaussian envelope along the x and y axes, respectively. It is noted that the GF with arbitrary orientations can be obtained via a rigid rotation of the x-y coordinate system. The frequency domain representation of the GF is given by [45]
$$H(u, v) = A\left(\exp\left\{-\frac{1}{2}\left[\frac{(u - u_0)^{2}}{\sigma_u^{2}} + \frac{v^{2}}{\sigma_v^{2}}\right]\right\} + \exp\left\{-\frac{1}{2}\left[\frac{(u + u_0)^{2}}{\sigma_u^{2}} + \frac{v^{2}}{\sigma_v^{2}}\right]\right\}\right),$$
where $\sigma_u = 1/(2\pi\sigma_x)$, $\sigma_v = 1/(2\pi\sigma_y)$, and $A = 2\pi\sigma_x\sigma_y$.
The tuning parameters of the GF, including the orientation angles and the radial frequency, must be specified. Based on the suggestions of Jain and Farrokhnia [45], four orientation values are often employed: 0°, 45°, 90°, and 135°. Given an image with a width of $N_w$ pixels, the following values of radial frequency $u_0$ can be considered: $1\sqrt{2}, 2\sqrt{2}, 4\sqrt{2}, \ldots, (N_w/4)\sqrt{2}$ cycles per image width.
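To make the filter bank concrete, the following minimal sketch builds even-symmetric Gabor kernels for the four orientations and a dyadic set of radial frequencies. It is an illustration under stated assumptions, not the implementation used in this study (which relies on the Accord.NET Framework): the function names, the kernel size, the isotropic space constant `sigma`, and the conversion from cycles per image width to cycles per pixel are all hypothetical choices, and the cosine form follows the formulation of Jain and Farrokhnia [45].

```python
import numpy as np


def gabor_kernel(u0, theta_deg, sigma_x, sigma_y, size=15):
    """Even-symmetric Gabor kernel: Gaussian envelope times a cosine wave.

    u0 is the radial frequency in cycles per pixel (assumed unit).
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    theta = np.deg2rad(theta_deg)
    # Rigid rotation of the x-y coordinate system gives the orientation.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
    return envelope * np.cos(2.0 * np.pi * u0 * xr)


def gabor_bank(image_width, size=15, sigma=3.0):
    """Kernels for orientations 0/45/90/135 and dyadic radial frequencies.

    Frequencies are 1*sqrt(2), 2*sqrt(2), ... up to (image_width/4)*sqrt(2)
    cycles per image width; size and sigma are illustrative values only.
    """
    bank = {}
    cycles = 1.0
    while cycles <= image_width / 4:
        u0 = cycles * np.sqrt(2.0) / image_width  # cycles per pixel
        for theta in (0, 45, 90, 135):
            bank[(cycles, theta)] = gabor_kernel(u0, theta, sigma, sigma, size)
        cycles *= 2.0
    return bank
```

For a 64-pixel-wide image sample this yields five radial frequencies and four orientations, i.e., 20 kernels; each kernel would then be convolved with the grayscale image to obtain one filtered response per pixel.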

The K-Means Clustering Algorithm.
In this study, the unsupervised machine learning approach of k-means clustering [46] is employed to divide an image into different regions based on the analysis results obtained from the GF. This unsupervised machine learning method is a simple yet powerful algorithm for automatic data grouping [47]. With this method, image pixels that have similar properties can be grouped in one cluster. Accordingly, data samples belonging to one cluster feature the smallest degree of variation. The iterative algorithm used to compute the cluster centers is presented in Algorithm 1.
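The iterative procedure of Algorithm 1 can be sketched in a few lines. This is a minimal illustration rather than the code used in the study; the function name, the random seed, and the iteration cap `max_iter` are assumptions introduced here.

```python
import numpy as np


def kmeans(data, k, seed=0, max_iter=100):
    """Plain k-means: returns (labels, centers) for data of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Randomly pick k distinct data samples as the initial centers.
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # Assign each data point to the cluster with the nearest mean.
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        # Recalculate the mean of the points assigned to each cluster.
        for c in range(k):
            members = data[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels, centers
```

In the segmentation setting, each row of `data` would hold the feature vector of one pixel, and the returned labels would be reshaped back to the image grid.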

Binary Gradient Contours (BGC).
The BGC, proposed by Fernández et al. [48], is a group of computationally simple texture descriptors. Given a 3 × 3 grayscale image patch, these texture descriptors employ a set of eight binary gradients between pairs of pixels along a closed path around the central pixel [49].
The BGC includes three versions: single-loop, double-loop, and triple-loop descriptors. Experiments have found that the BGC operator achieves good texture discrimination outcomes.
A matrix S containing the pixel intensities of an image patch of size 3 × 3 is given as follows:
$$S = \begin{bmatrix} I_7 & I_6 & I_5 \\ I_0 & I_c & I_4 \\ I_1 & I_2 & I_3 \end{bmatrix},$$

Advances in Civil Engineering
where $I_c$ denotes the central pixel and $I_0, I_1, \ldots, I_7$ are the neighboring pixels. The schematic representations of the single-loop, double-loop, and triple-loop versions of the BGC are presented in Figure 2. In addition, to facilitate the mathematical formulation of these texture descriptors, a square crop $S_{m,n}$ is defined, where $I_{m,n}$ represents the pixel at the m-th row and n-th column. The formulations of the single-loop, double-loop, and triple-loop versions are given in [48].
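As an illustration of the single-loop idea, the sketch below encodes the eight binary gradients taken between consecutive neighbors along the closed path $I_0 \rightarrow I_1 \rightarrow \cdots \rightarrow I_7 \rightarrow I_0$. It is a hedged reading of the descriptor in [48], not a reproduction of the formulation used in this study: the traversal direction, the thresholding rule $s(x) = 1$ for $x \geq 0$, and the offset of 1 are assumptions, as are the function names.

```python
import numpy as np

# (row, col) positions of I0..I7 in the 3x3 patch, for the layout
#   I7 I6 I5
#   I0 Ic I4
#   I1 I2 I3
PATH = [(1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]


def bgc1_code(patch):
    """Single-loop code of one 3x3 patch, in the range 0..254."""
    values = [int(patch[r][c]) for r, c in PATH]
    code = 0
    for n in range(8):
        gradient = values[n] - values[(n + 1) % 8]
        if gradient >= 0:  # binarised gradient, s(x) = 1 if x >= 0 else 0
            code += 2 ** n
    # On a closed path the gradients cannot all be negative, so the
    # all-zero pattern never occurs and the code is shifted down by 1.
    return code - 1


def bgc1_histogram(gray):
    """Histogram of single-loop codes over all interior pixels."""
    hist = np.zeros(255, dtype=int)
    rows, cols = gray.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[bgc1_code(gray[r - 1:r + 2, c - 1:c + 2])] += 1
    return hist
```

Statistical measurements of such a histogram could then serve as texture features; which statistics are taken is not specified here and is left open.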

RGB Channels' Properties.
Since the color properties of spall and nonspall objects are expected to be dissimilar, this study employs the statistical measurements of three color channels, red (R), green (G), and blue (B), as a means of texture description. Given an image sample I, the first-order histogram P(I) can be computed. Accordingly, the mean ($\mu_c$), standard deviation ($\sigma_c$), skewness ($S_c$), kurtosis ($K_c$), entropy ($E_c$), and range ($R_c$) of the three color channels (R, G, and B) can be calculated [29,50].
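The six statistics per channel can be computed directly from the first-order histogram, as the following sketch shows. The function names are hypothetical, and two details are assumptions made for illustration: entropy is taken with a base-2 logarithm, and kurtosis is the non-excess (fourth standardized moment) form.

```python
import numpy as np


def channel_statistics(channel, levels=256):
    """Mean, std, skewness, kurtosis, entropy, range of one 8-bit channel."""
    values = np.arange(levels)
    hist = np.bincount(channel.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()  # first-order histogram P(I)
    mean = np.sum(values * p)
    std = np.sqrt(np.sum((values - mean) ** 2 * p))
    if std > 0:
        skew = np.sum((values - mean) ** 3 * p) / std ** 3
        kurt = np.sum((values - mean) ** 4 * p) / std ** 4
    else:
        skew = kurt = 0.0  # constant channel: moments are undefined
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    value_range = int(channel.max()) - int(channel.min())
    return [mean, std, skew, kurt, entropy, value_range]


def rgb_features(image):
    """18 features: six statistics for each of the R, G, and B channels."""
    return [s for c in range(3) for s in channel_statistics(image[:, :, c])]
```

With three channels and six statistics each, the routine returns 18 values per region of interest.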
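Gray Level Co-Occurrence Matrix (GLCM).
This approach focuses on capturing the repeated occurrence of certain gray-level patterns [53]. Therefore, indices extracted from a GLCM can be effectively utilized to evaluate the coarseness/fineness of an image region. Let r and θ represent a distance and a rotation relationship between two individual pixels. The GLCM, denoted as $P_\delta$, gives the probability of two gray levels i and j having the relationship specified by r and θ [54]. Based on the recommendations of Haralick et al. [51], the GLCM can be constructed with r = 1 and θ = 0°, 45°, 90°, and 135°. Accordingly, for each matrix, four indices of angular second moment (AM), contrast (CO), correlation (CR), and entropy (ET) can be computed as follows [29,55]:
$$AM = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g}\left[P_\delta^{N}(i, j)\right]^{2},$$
$$CO = \sum_{k=0}^{N_g - 1} k^{2} \sum_{|i - j| = k} P_\delta^{N}(i, j),$$
$$CR = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} i\, j\, P_\delta^{N}(i, j) - \mu_X \mu_Y}{\sigma_X \sigma_Y},$$
$$ET = -\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} P_\delta^{N}(i, j)\log P_\delta^{N}(i, j),$$
where $P_\delta^{N}$ is the normalized GLCM; $N_g$ denotes the number of gray level values; and $\mu_X$, $\mu_Y$, $\sigma_X$, and $\sigma_Y$ are the means and standard deviations of the marginal distributions associated with $P_\delta^{N}(i, j)$.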
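A compact sketch of the GLCM construction for r = 1 and the four indices is given below. It is illustrative only: the function names, the symmetric counting of pixel pairs, the natural logarithm in the entropy, and the zero-based gray-level indexing are assumptions made here.

```python
import numpy as np

# (row, col) step to the neighbour at distance r = 1 for each angle.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(gray, angle, levels):
    """Normalized, symmetric co-occurrence matrix of an integer image."""
    dr, dc = OFFSETS[angle]
    rows, cols = gray.shape
    p = np.zeros((levels, levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = gray[r, c], gray[r2, c2]
                p[i, j] += 1
                p[j, i] += 1  # count the pair in both directions
    return p / p.sum()


def glcm_indices(p):
    """Angular second moment, contrast, correlation, and entropy."""
    i, j = np.indices(p.shape)
    am = np.sum(p ** 2)
    co = np.sum((i - j) ** 2 * p)  # equals sum_k k^2 sum_{|i-j|=k} p
    mu_x, mu_y = np.sum(i * p), np.sum(j * p)
    sd_x = np.sqrt(np.sum((i - mu_x) ** 2 * p))
    sd_y = np.sqrt(np.sum((j - mu_y) ** 2 * p))
    denom = sd_x * sd_y
    cr = (np.sum(i * j * p) - mu_x * mu_y) / denom if denom > 0 else 0.0
    nonzero = p[p > 0]
    et = -np.sum(nonzero * np.log(nonzero))
    return [am, co, cr, et]


def glcm_features(gray, levels):
    """16 features: four indices for each of the four angles."""
    return [v for a in (0, 45, 90, 135) for v in glcm_indices(glcm(gray, a, levels))]
```

For a two-level image whose columns alternate between 0 and 1, the horizontal GLCM places all of its mass off the diagonal, so the contrast is 1 and the correlation is -1.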
Determine the number of clusters k
Randomly assign k centers of data samples
(1) Loop
(2)   Assign each data point to the cluster with the nearest mean
(3)   Recalculate means for data points assigned to each cluster
(4) Until the data assignments are unchanged
ALGORITHM 1: The k-means clustering.

Logistic Regression Model (LRM).
The task at hand is to construct a decision boundary that categorizes the input data into two distinctive regions. Therefore, given a vector of input data $x_i = [x_{i1}, x_{i2}, \ldots, x_{iD}]$, where D is the number of features used for classification, the model derives the class output y with either y = 0 (for the negative class of nonspall) or y = 1 (for the positive class of spall).
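The probability of the positive class $h_\theta(x_i)$ derived by an LRM is given by [59]
$$h_\theta(x_i) = \frac{1}{1 + \exp(-\eta_i)},$$
where $\eta_i = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \cdots + \theta_D x_{iD}$ and $\theta$ is the vector of model parameters.
As a supervised learning approach, a set of training examples needs to be prepared so that the vector $\theta$ can be adapted during the model training phase. An LRM can be trained by either minimizing the least square loss function or maximizing the log likelihood function.
The least square loss function is given by [60]
$$\mathrm{Loss} = \frac{1}{2}\sum_{i=1}^{M}\left(h_\theta(x_i) - y_i\right)^{2},$$
where M is the number of training data. The log likelihood function is described as follows [61,62]:
$$LL = \sum_{i=1}^{M}\left[y_i \log h_\theta(x_i) + (1 - y_i)\log\left(1 - h_\theta(x_i)\right)\right].$$
An LRM can be trained via the stochastic gradient descent framework [29]. If the least square loss function is used, the update rule for adapting the model parameters is given by [60]
$$\theta_j \leftarrow \theta_j - Lr\,\left(h_\theta(x_i) - y_i\right)h_\theta(x_i)\left(1 - h_\theta(x_i)\right)x_{ij},$$
where Lr denotes the learning rate parameter and $x_{i0} = 1$.
Meanwhile, if the log likelihood function is selected, the update rule used to compute $\theta$ is given by [61,62]
$$\theta_j \leftarrow \theta_j + Lr\,\left(y_i - h_\theta(x_i)\right)x_{ij}.$$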

Adaptive Moment Estimation (Adam) Optimizer.
Adam, proposed by [63], is an algorithm for first-order gradient-based optimization of stochastic objective functions. This algorithm relies on adaptive estimates of lower-order moments. Adam can be considered an extension of the stochastic gradient descent employed to train machine learning models via an iterative weight updating process [64]. It is noted that the conventional stochastic gradient descent employs a constant learning rate (Lr) for all weight updates. Adam seeks to improve the model training phase by adaptively adjusting the Lr parameter.
Adam harnesses information obtained from the averages of the first and second moments of the gradients. In detail, this optimization algorithm computes exponential moving averages of the gradient and the squared gradient. Moreover, a set of parameters ($\beta_1$ and $\beta_2$) is used to dictate the decay rates of these moving averages [64]. As reported in [63], the advantages of this optimizer include efficient computation, straightforward implementation, low memory requirements, and the capability of dealing with a large number of optimized parameters.
In order to implement Adam to optimize an LRM, it is necessary to compute the gradient ($g_t$). The gradient $g_t$ in the case of using the least square loss function is given by [60]
$$g_t = \left(h_\theta(x_i) - y_i\right)h_\theta(x_i)\left(1 - h_\theta(x_i)\right)x_i.$$
If the log likelihood function is employed, the gradient $g_t$ of the negative log likelihood is given by [22,61,62]
$$g_t = \left(h_\theta(x_i) - y_i\right)x_i.$$
Accordingly, the Adam procedure (illustrated in Algorithm 2) used for training an LRM can be performed iteratively with the following steps:
(i) Compute gradient $g_t$
(ii) Update the biased first and second raw moment estimates
(iii) Compute the bias-corrected moment estimates
(iv) Adapt the optimized parameters
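The four steps above can be sketched as follows for both loss functions. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names are hypothetical, the gradient is averaged over the full training set rather than sampled stochastically, the parameters start at zero instead of random values (for reproducibility), and the small constant `eps` follows the usual Adam convention.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gradient(theta, x, y, loss):
    """Gradient of the minimised loss, averaged over all rows of x."""
    h = sigmoid(x @ theta)
    if loss == "LS":  # least square loss
        err = (h - y) * h * (1.0 - h)
    else:  # "LL": negative log likelihood
        err = h - y
    return x.T @ err / len(y)


def train_adam(x, y, loss="LL", alpha=0.001, beta1=0.9, beta2=0.9999,
               eps=1e-8, iterations=1000):
    """Train logistic regression parameters with the Adam update."""
    x = np.hstack([np.ones((len(x), 1)), x])  # column of ones for theta_0
    theta = np.zeros(x.shape[1])  # assumption: zero start, not random
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, iterations + 1):
        g = gradient(theta, x, y, loss)          # (i) gradient
        m = beta1 * m + (1.0 - beta1) * g        # (ii) biased 1st moment
        v = beta2 * v + (1.0 - beta2) * g * g    # (ii) biased 2nd raw moment
        m_hat = m / (1.0 - beta1 ** t)           # (iii) bias correction
        v_hat = v / (1.0 - beta2 ** t)
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # (iv) update
    return theta


def predict(theta, x):
    """Class 1 (spall) when the estimated probability is at least 0.5."""
    x = np.hstack([np.ones((len(x), 1)), x])
    return (sigmoid(x @ theta) >= 0.5).astype(int)
```

Because each Adam step is bounded by roughly `alpha` per parameter, 1000 iterations with `alpha = 0.001` move each parameter by at most about 1 from its starting value, which is one reason input normalization matters for this setting.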

Collection of Image Samples
The LRM used in this study belongs to the category of supervised machine learning methods. To train this LRM with the aforementioned Adam optimizer, it is necessary to prepare a set of training image samples as well as a set of testing image samples to verify the model construction phase. Therefore, this study has carried out field surveys at several high-rise buildings in Danang city (Vietnam) to collect a set of 600 image samples. Among them, 300 samples contain localized spall objects and 300 samples consist of nonlocalized spall objects. Notably, the labels of the two classes, nonspall (class label = 0) and spall (class label = 1), have been assigned by a human inspector for the purposes of model training and testing. The Canon EOS M10 (CMOS 18.0 MP) and Nikon D5100 (CMOS 16.2 MP) have been employed to collect image samples. In addition, the image size has been standardized to 64 × 64 pixels to facilitate the computation process.
The image set has been collected so that diverse backgrounds (e.g., cracks and stains) are included. The collected image set is illustrated in Figure 3.

The Proposed Hybrid Approach of Image Processing and Machine Learning for Automatic Detection of Concrete Spall
This section describes the structure of the proposed hybrid approach of image processing and machine learning used for recognizing localized spall objects.
The overall structure of the proposed approach is presented in Figure 4. After the GFs with different orientations and radial frequencies are computed, principal component analysis (PCA) is performed to transform the set of GFs and reduce the data dimensionality (Figure 5). The number of PCA-transformed components is selected to correspond to 99% of the cumulative variance explained. It is noted that the GF and the PCA operations have been implemented via built-in functions provided by the Accord.NET Framework [65].

Define step size $\alpha = 0.001$
Define exponential decay rates $\beta_1 = 0.9$ and $\beta_2 = 0.9999$
Define the objective function $f(\theta)$
Randomly initialize the searched variable $\theta$
Assign $m_0 = 0$, $v_0 = 0$, and $t = 0$
(1) While ($\theta_t$ not converged)
(2)   Set $t \leftarrow t + 1$ and compute gradient: $g_t = \nabla_\theta f(\theta_{t-1})$
(3)   Update biased 1st moment estimate: $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$
(4)   Update biased 2nd raw moment estimate: $v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^{2}$
(5)   Calculate bias-corrected 1st moment estimate: $\hat{m}_t = m_t / (1 - \beta_1^{t})$
(6)   Calculate bias-corrected 2nd raw moment estimate: $\hat{v}_t = v_t / (1 - \beta_2^{t})$
(7)   Update the searched parameter: $\theta_t = \theta_{t-1} - \alpha\, \hat{m}_t / (\sqrt{\hat{v}_t} + \varepsilon)$, where $\varepsilon$ is a small constant that prevents division by zero
ALGORITHM 2: The Adam optimization procedure.
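The component selection rule stated in this section for PCA (retain enough components to reach 99% of the cumulative variance explained) can be sketched as follows. The study itself uses built-in Accord.NET functions; the NumPy routine below is only an illustrative equivalent, and its name and the layout of `responses` (one row per pixel, one column per Gabor response) are assumptions.

```python
import numpy as np


def pca_reduce(responses, explained=0.99):
    """Project data onto the fewest components reaching the variance target.

    responses: array of shape (n_pixels, n_filters).
    Returns (projected data, number of retained components).
    """
    centred = responses - responses.mean(axis=0)
    # Squared singular values are proportional to the component variances.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest count whose cumulative ratio is at least the target.
    n_keep = int(np.searchsorted(ratio, explained) + 1)
    return centred @ vt[:n_keep].T, n_keep
```

Strongly correlated filter responses collapse onto very few components, which reduces the dimensionality handed to the k-means step.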
Based on the PCA result, the k-means clustering algorithm is used to segment the image sample. Via experimentation, the suitable number of clusters for the collected dataset is found to be 3. Subsequently, the morphological operations of filling and removing small objects are utilized to process the segmented image. Moreover, the operation of background removal is performed to remove redundant objects. In this study, an object within an image sample is considered to be background if its width or height is equal to that of the image sample.
Accordingly, each image cluster or segment is presented as a binary image. The connected component labeling algorithm [66] is then used to analyze the positions of the binary-1 pixels and separate them into distinctive component regions. Essentially, all pixels that have the binary value 1 and are connected to each other are grouped into one object [38]. To remove crack objects, for each group of pixels obtained from the connected component labeling analysis, an object slenderness index (OSI) is computed from the following quantities: $L_{OX}$ and $L_{OY}$, the object lengths along the X axis and Y axis, respectively, and $\mu_{OX}$ and $\mu_{OY}$, the mean thicknesses of the object along the X axis and Y axis, respectively.
If the OSI of an object is greater than a certain threshold ($T_{OSI}$), this object is classified as a crack. Via several trial-and-error experiments with the collected image samples, a suitable value for the threshold $T_{OSI}$ is found to be 5. After the ROIs have been identified, operations of image convolution and cropping are employed to isolate the areas of interest within the image sample. The processes of ROI identification and isolation are demonstrated in Figures 6 and 7.
Thus, the GLCM texture descriptor yields 4 × 4 = 16 features. In total, there are 15 + 18 + 16 = 49 numerical features that can be extracted from the image texture computation process.
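The connected component grouping and the background rule described above (an object is background if its width or height equals that of the image) can be illustrated with a short sketch. The breadth-first labeling, the use of 8-connectivity, and the function names are assumptions made here; the study relies on the algorithm of [66]. The OSI itself is not reproduced because its formula is not restated in this section.

```python
from collections import deque

import numpy as np


def connected_components(binary):
    """Group 8-connected 1-pixels; returns one list of (row, col) per object."""
    rows, cols = binary.shape
    seen = np.zeros((rows, cols), dtype=bool)
    objects = []
    for r in range(rows):
        for c in range(cols):
            if binary[r, c] and not seen[r, c]:
                queue, pixels = deque([(r, c)]), []
                seen[r, c] = True
                while queue:
                    pr, pc = queue.popleft()
                    pixels.append((pr, pc))
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            nr, nc = pr + dr, pc + dc
                            if (0 <= nr < rows and 0 <= nc < cols
                                    and binary[nr, nc] and not seen[nr, nc]):
                                seen[nr, nc] = True
                                queue.append((nr, nc))
                objects.append(pixels)
    return objects


def is_background(pixels, shape):
    """True if the object's bounding box spans the full image height or width."""
    rs = [p[0] for p in pixels]
    cs = [p[1] for p in pixels]
    height = max(rs) - min(rs) + 1
    width = max(cs) - min(cs) + 1
    return height == shape[0] or width == shape[1]
```

Objects that pass the background test are discarded, and the remaining ones are candidates for the slenderness check.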

Pattern Classification Using LRM Trained by the Adam Optimizer.
Using the extracted ROIs and the aforementioned texture descriptors, a dataset with 790 samples and 49 features can be constructed.
This dataset contains 465 nonlocalized spall samples and 325 localized spall samples. As stated earlier, the output class is either 0 for the negative class or 1 for the positive class. Moreover, in order to standardize the input features' range, the numerical texture descriptors have been normalized by the Z-score equation as follows:
$$X_{ZN} = \frac{X_o - m_X}{s_X},$$
where $X_o$ and $X_{ZN}$ represent the original and normalized input data, respectively, and $m_X$ and $s_X$ are the mean and the standard deviation of the original input data, respectively. Based on the aforementioned dataset, the LRM is trained with the Adam optimizer using the least square and log likelihood loss functions.
These two LRMs are denoted as Adam-LS-LR and Adam-LL-LR. It is noted that 90% of the collected dataset has been employed to construct the LRM. Meanwhile, the rest of the dataset is reserved to verify the generalization capability of the model.
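The normalization and the 90%/10% partition can be sketched as below. The function names are hypothetical, and the guard for constant features is an assumption added only to avoid division by zero; whether the study computes $m_X$ and $s_X$ on the whole dataset or on the training portion only is not stated, so the routine simply normalizes whatever array it is given.

```python
import numpy as np


def z_score(features):
    """Column-wise Z-score: (X_o - m_X) / s_X."""
    m_x = features.mean(axis=0)
    s_x = features.std(axis=0)
    s_x = np.where(s_x == 0, 1.0, s_x)  # assumption: leave constant columns at 0
    return (features - m_x) / s_x


def split_90_10(n_samples, rng):
    """Random index split into 90% training and 10% testing."""
    order = rng.permutation(n_samples)
    n_train = int(round(0.9 * n_samples))
    return order[:n_train], order[n_train:]
```

For the 790-sample dataset, this gives 711 training samples and 79 testing samples per random partition.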

Research Findings and Discussion
As mentioned earlier, the whole collected dataset is divided into two subsets: a training set (90%) and a testing set (10%). Moreover, to diminish the effect of randomness brought about by data sampling and to assess the generalization capability of the integrated method reliably, the data sampling process has been repeated 20 times. The partitioned datasets used for model training and testing are demonstrated in Table 1. In addition, the LRMs trained with the stochastic gradient descent algorithm using the least square and log likelihood loss functions are employed as benchmark models. The stochastic gradient descent models coupled with the former and latter loss functions are denoted as LS-LR and LL-LR, respectively. Furthermore, the two LRMs trained with the Adam optimizer are denoted as Adam-LS-LR and Adam-LL-LR. All of the LRMs have been trained with 1000 iterations.
In addition, the classification accuracy rate (CAR), precision, recall, negative predictive value (NPV), and F1 score are computed to quantify the model predictive accuracy. These performance measurement indices are provided as follows [67]:
$$CAR = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%,$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN},$$
$$NPV = \frac{TN}{TN + FN}, \quad F1\ \mathrm{score} = \frac{2TP}{2TP + FP + FN},$$
where FN, FP, TP, and TN denote the number of false-negative, false-positive, true-positive, and true-negative samples, respectively. The experimental results obtained from the repetitive data sampling with 20 runs are reported in Table 2. As can be seen from this table, the Adam-LL-LR has achieved the best predictive accuracy in both the training (CAR = 85.25%, precision = 0.84, recall = 0.81, NPV = 0.86, and F1 score = 0.82) and testing phases (CAR = 85.32%, precision = 0.86, recall = 0.79, NPV = 0.85, and F1 score = 0.82). Since the prediction performances obtained from the training and testing phases of the Adam-LL-LR are relatively similar, the results suggest that this model has not suffered from overfitting. In addition, the LL-LR model is the second best approach (with CAR = 81.90% and F1 score = 0.78), followed by the Adam-LS-LR (with CAR = 72.03% and F1 score = 0.71) and the LS-LR (with CAR = 70.82% and F1 score = 0.70). Herein, the F1 score is emphasized because it represents the harmonic mean of the precision and recall.
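The five indices follow directly from the confusion counts, as the sketch below shows. The function name is hypothetical, and the counts in the usage note are invented for illustration; they are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """CAR (%), precision, recall, NPV, and F1 score from confusion counts."""
    return {
        "CAR": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "NPV": tn / (tn + fn),
        # Equivalent to the harmonic mean of precision and recall.
        "F1": 2 * tp / (2 * tp + fp + fn),
    }
```

For example, hypothetical counts of TP = 26, TN = 41, FP = 5, and FN = 7 on a 79-sample testing set give a CAR of about 84.8%, a precision of about 0.84, a recall of about 0.79, an NPV of about 0.85, and an F1 score of about 0.81.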
The training and testing performances of the employed models are graphically presented in Figures 8 and 9. The boxplot shown in Figure 10 demonstrates the testing performances of the LRMs. In addition, to confirm the statistical difference of each pair of localized spall detection models, the Wilcoxon signed-rank test with a significance level of 0.05 is employed in this section of the study.
The test outcomes are reported in Table 3. The experimental results show that all of the p values are lower than the significance level. Thus, the null hypothesis that the performances of the two models under testing are statistically indistinguishable can be confidently rejected. This hypothesis test supports the superiority of the Adam-LL-LR model over the other benchmark approaches.
Based on the experimental results, the Adam-LL-LR model is best suited for the collected dataset at hand. The performance of this model is further studied in this section. Illustrations of correctly recognized spall objects yielded by Adam-LL-LR are presented in Figure 11. As can be observed, the model can deliver accurate detection results in the … Nevertheless, as shown in Figure 13, the newly developed model has produced incorrect detection results in cases of complex background. As observed in Figure 13(a), an area in the background has a texture property similar to that of the

Figure 2: The graphical representation of BGC.

Figure 3: Demonstration of the collected image samples: (a) images containing localized spall objects and (b) images containing nonlocalized spall objects.

Figure 6: ROI extraction for images containing localized spall objects: (a) one object and (b) multiple objects.

Figure 7: ROI extraction for images containing nonlocalized spall objects: (a) one object and (b) multiple objects.

Figure 8: Performance measurement indices for the training phase.

Table 1: Demonstration of the collected dataset.

Table 3: Wilcoxon signed-rank test results.

… desired prediction accuracy. Therefore, the proposed integrated model can be a useful tool to assist building maintenance agencies in the task of evaluating structural health condition.