Machine Learning Application for Concrete Surface Defects Automatic Damage Classification

Defects in concrete structures are an important indicator of structural integrity and serviceability. The condition of a defective concrete surface is usually investigated by visual inspection. However, this method is subjective, tedious, time-consuming, and complicated, and it requires access to many components of a large project. Therefore, a machine learning classifier based on the Discriminant Analysis Classifier was introduced to extract the type of concrete surface defect from digital images more accurately. The aim of this research is to increase the efficiency of concrete surface defect analysis in terms of quality, time and cost. A total of 200 images were collected, with 50 images for each concrete defect class (crack, corrosion, spalling, and no defect) serving as control data. The Gray Level Co-Occurrence Matrix (GLCM) is used as the image processing and feature extraction algorithm. The model is trained using 80% of the image data and tested using the remaining 20%. The model achieved 95% accuracy on the training data and 70% on the test data when using Quadratic Discriminant Analysis. These findings are important in helping engineers and construction inspectors in inspection activities.


INTRODUCTION
Engineering structures, such as reinforced concrete (RC) structures, are frequently subjected to various dynamic and cyclic stresses, which in the long run cause defects and deterioration at the microscopic level on the structure's surface. The initiation of surface defects reduces the cross-sectional area and stiffness of the structure exposed to these stresses, which eventually leads to material fracture (Bernard & Richard 1976; Jacob 1987). Early detection of these surface defects enables preventative measures to be implemented to minimize damage and failure (Dhital & Lee 2012). Commonly, surface defect detection is conducted using destructive tests and Non-destructive Tests (NDT) to classify the types of defects, i.e., cracks, spalling, and rust formation, and to quantify the severity of the damage on the surface of the structure. As a result, developing appropriate inspection procedures is critical for achieving an accurate and reliable diagnosis of the surface condition of a concrete structure.
NDT is an alternative method for the inspection and maintenance of RC structures. It assesses the types of surface damage without physically disturbing the structure, and it is mobile and relatively rapid to execute (Senin et al. 2019). One of the most prominent and powerful preliminary NDT approaches to identify this type of surface damage is the visual inspection (VI) technique. VI can provide a lot of information that can lead to a specific diagnosis of the source of the distress. According to earlier studies in other areas, VI detection rates tend to vary substantially depending on the application and kind of examination. Drury and Fox (1975) report a 20-30% error rate; however, these values vary greatly between applications and contexts. Despite their limited precision (up to a few hundredths of a millimeter), these VI procedures are time-demanding and only capable of one-dimensional, point-wise measurements, which limits them to a few discrete parts of the RC structure (Valenca et al. 2019). As a result, a different technique is needed to address these challenges.
Currently, there is increasing interest in the automatic classification of surface defects on RC structures using machine learning algorithms. Patrik (2013) stated that the main advantage of this approach is that it provides more accurate results than traditional manual approaches. The difficulty of defect detection is highly dependent on the image size (pixels). The image resolution of newer digital cameras is over 10 megapixels. This higher resolution allows detailed photographs of RC surfaces to be taken. With modern commercial cameras, a large area of a concrete surface can be captured in a single shot. A long-range image can be used for the detection of surface defects on RC structures in low-cost applications (Rodriguez et al. 2016).
This study principally aims to establish an NDT method for classifying the types of surface damage via digital image processing (DIP) of camera images. The purpose was to find answers to three major issues about the quality of VI for surface defect classification on RC structures using the Discriminant Analysis Classifier (DAC):

METHODOLOGY
SURFACE DEFECT TYPES AND DIGITAL DATA MANAGEMENT
Two hundred digital images covering three selected types of concrete surface defect, together with defect-free surfaces, were collected with a digital camera from RC buildings on Pulau Pinang. Each digital image was then converted to JPG format at a resolution of 227 × 227 pixels, which was taken as the most suitable format to represent the surface defects. Cracks, corrosion, and spalling were chosen as the surface defect categories because they are commonly known surface defects in RC structures. Fifty digital images were acquired for each of these surface defects, and fifty for the non-defect surface.
The whole set of digital images was divided into two portions, with 80 percent of the data used for training and the remaining 20 percent used for testing. According to Gholamy et al. (2018), there is no established guideline for the proportions of the training and testing data sets, and most researchers employ an 80:20 split. Therefore, in order to get the optimum model performance, this study splits the training and testing data in an 80:20 ratio. The images of all surface defects on RC structures are shown in Figures 1 to 4.
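The 80:20 division can be sketched in a few lines. This is a minimal illustration and not the study's own procedure: it assumes the split is made separately within each class (the text does not say whether it was), and the file names, the `stratified_split` helper and the random seed are hypothetical.

```python
import random

CLASSES = ["crack", "corrosion", "spalling", "no_defect"]


def stratified_split(items_by_class, train_fraction=0.8, seed=0):
    """Split each class separately so both subsets keep the class balance."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        train += [(path, label) for path in shuffled[:cut]]
        test += [(path, label) for path in shuffled[cut:]]
    return train, test


# Hypothetical file names: 50 images per class, 200 in total.
images = {c: [f"{c}_{i:02d}.jpg" for i in range(50)] for c in CLASSES}
train_set, test_set = stratified_split(images)
print(len(train_set), len(test_set))
```

With 50 images per class, this leaves 40 training and 10 testing images in every class.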

FEATURE EXTRACTION AND SELECTION
The Gray Level Co-occurrence (GLC) matrix was employed as the main digital image processing algorithm to extract all 21 average features in all surface defect images (Table 1). Generally, the GLC algorithm characterises an image's texture by counting how frequently pairs of pixels with specific values occur in the image at a given spatial relationship, creating the GLC matrix, and then extracting statistical features (twenty-two in total) from that matrix (Priyanka & Kumar, 2020).
FIGURE 1. The surface crack defect digital image samples data of the study
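The co-occurrence counting described in this section can be illustrated on a toy image. The sketch below is not the study's MATLAB code: it assumes a single horizontal offset (one pixel to the right), four gray levels, and only three of the usual texture statistics (contrast, energy, homogeneity); the function names are hypothetical.

```python
def glcm(image, levels, dr=0, dc=1):
    """Normalized co-occurrence matrix for the pixel offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the pair (value at pixel, value at its neighbour).
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def texture_features(p):
    """A few statistics computed from a normalized co-occurrence matrix."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v ** 2 for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


# Toy 4 x 4 image with gray levels 0..3 (illustrative values only).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(toy, levels=4)
features = texture_features(p)
print(features)
```

A real pipeline would first quantize the 227 × 227 image to a fixed number of gray levels and would typically combine several offsets and directions.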
All of these matrix feature values were imported into Microsoft Excel, and the average values for each surface defect class were then processed in the feature selection stage using MATLAB. The variance threshold approach is used to remove the unwanted average GLC matrix values, namely those with a variance of more than 10 percent.
In Equation (1), µ_k and Σ_k are the mean and covariance of each class k of the surface defect. All of this computation is performed in MATLAB. The automatic classification of surface defect k is based on the largest probability for a randomly selected testing sample.
Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) were the Discriminant Analysis Classifier algorithms used to train on the GLC matrix datasets. Confusion matrices for LDA and QDA were plotted to measure the accuracy of the current approach.
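The difference between the two classifiers can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the study: equal class priors, synthetic two-feature data, and hypothetical helper names (`fit_gaussians`, `predict`). LDA pools one covariance across classes, while QDA keeps one covariance per class.

```python
import numpy as np


def fit_gaussians(X, y, shared_covariance):
    """Estimate a mean and covariance per class.

    shared_covariance=True pools one covariance across classes (LDA);
    False keeps a separate covariance per class (QDA).
    """
    classes = np.unique(y)
    means = {k: X[y == k].mean(axis=0) for k in classes}
    covs = {k: np.cov(X[y == k], rowvar=False) for k in classes}
    if shared_covariance:
        n = {k: int(np.sum(y == k)) for k in classes}
        pooled = sum((n[k] - 1) * covs[k] for k in classes) / (len(y) - len(classes))
        covs = {k: pooled for k in classes}
    return classes, means, covs


def predict(X, model):
    """Assign each row to the class with the largest Gaussian log-density."""
    classes, means, covs = model
    scores = []
    for k in classes:
        diff = X - means[k]
        inv = np.linalg.inv(covs[k])
        maha = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(covs[k])
        scores.append(-0.5 * (maha + logdet))  # equal priors assumed
    return classes[np.argmax(np.vstack(scores), axis=0)]


# Synthetic data: two classes with the same centre but different spread.
rng = np.random.default_rng(0)
tight = rng.normal(0.0, 0.5, size=(200, 2))
wide = rng.normal(0.0, 3.0, size=(200, 2))
X = np.vstack([tight, wide])
y = np.repeat([0, 1], 200)

lda_acc = np.mean(predict(X, fit_gaussians(X, y, shared_covariance=True)) == y)
qda_acc = np.mean(predict(X, fit_gaussians(X, y, shared_covariance=False)) == y)
print(f"LDA accuracy: {lda_acc:.3f}, QDA accuracy: {qda_acc:.3f}")
```

The synthetic classes are chosen so that only their covariances differ, which is the situation where a quadratic boundary helps and a linear one cannot.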

RESULTS AND DISCUSSION
FEATURE SELECTION AND TYPES OF CLASSIFIER EFFECT TO THE CLASSIFICATION ACCURACY
The variance threshold approach was applied to the 22 average GLC matrix features, and any feature with more than 10 percent variance was removed from the original set. Five features, namely correlation 2, cluster tone, information measure of correlation, information measure of correlation 2 and inverse difference moment, were eliminated from the 21 original features. In this paper, it is hypothesized that the accuracy of surface defect classification is higher when using only the remaining 18 features selected from 21 than when using the entire 22 features.
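The selection rule can be sketched as follows. This is only one possible reading of the description: it assumes that the variance is taken across the four class averages of each feature and that "10 percent" corresponds to a threshold of 0.10. The feature names and values are hypothetical.

```python
from statistics import pvariance


def select_features(class_averages, threshold=0.10):
    """Keep features whose variance across class averages is at most `threshold`."""
    kept, dropped = [], []
    for name, values in class_averages.items():
        if pvariance(values) > threshold:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Hypothetical per-class averages (crack, corrosion, spalling, no defect).
averages = {
    "contrast": [0.30, 0.42, 0.38, 0.25],
    "energy": [0.20, 0.18, 0.22, 0.35],
    "cluster_prominence": [0.10, 1.20, 0.60, 0.05],
}
kept, dropped = select_features(averages)
print("kept:", kept)
print("dropped:", dropped)
```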
Table 2 compares the classification accuracy obtained with 22 features (complete features without selection) and after feature selection (18 features). In general, QDA performs better than LDA in classifying surface defects, regardless of the number of features used. The removal of 5 unnecessary features from the original data increases the classification accuracy of the discriminant analysis classifier by 7.5 percent (LDA) and 5.6 percent (QDA). This improvement from removing unnecessary features is consistent with the findings of other researchers (Jensen, 2005; Rahman et al. 2009). However, changing the analysis from LDA to QDA yields only a very small improvement in classification accuracy (0.6 to 2.5 percent), regardless of the number of features.

CLASSIFICATION PERFORMANCE VALIDATION USING CONFUSION CHART
The classification performance for each surface defect class (i.e., cracks, corrosion, and spalling) was validated using the confusion charts shown in Figure 5 and Figure 6, expressed as the True Positive Rate and False Negative Rate. The True Positive Rate (TPR) is the percentage of actual cases of a class that are correctly identified by the DAC algorithm, while the False Negative Rate (FNR) is the percentage of actual cases of that class that are missed by the DAC algorithm.
In both plots, the rows correspond to the actual class of surface defect identified prior to image acquisition, while the columns represent the class predicted by the DAC algorithm. Figure 5 shows that the LDA classifier predicted surface cracks and corrosion with full accuracy, followed by the no-defect surface (95 percent), with the lowest prediction accuracy (82.5 percent) for the spalling defect.
Figure 6 shows that QDA has similar accuracy to LDA in predicting the defect-free surface and corrosion classes. However, QDA improved the classification prediction for the spalling class by 10 percent. Tharwat (2016) explains that a possible reason for such an increase is that QDA allows a different feature covariance matrix for each class, resulting in a quadratic decision boundary for classification. In contrast, the prediction accuracy for surface cracks decreased by 7.5 percent compared to LDA.
The results show that the LDA and QDA models perform differently in predicting different types of defects. The LDA model showed excellent learning capability, as it perfectly predicted both corrosion and crack defects. This indicates that the LDA model is a reliable approach for predicting these types of defects in concrete. However, the LDA model had some difficulty in predicting spalling defects, with an accuracy of 82.5 percent. On the other hand, the QDA model performed well in predicting corrosion defects, achieving perfect prediction accuracy. However, it showed limitations in predicting crack defects, with a false negative rate of 7.5 percent. Despite this limitation, the QDA model still achieved a true positive rate of 92.5 percent in predicting crack defects. Furthermore, the QDA model performed better than the LDA model in predicting spalling defects, with an accuracy of 92.5 percent. Overall, different models have varying strengths and weaknesses in predicting different types of concrete defects. Therefore, it is important to choose the most appropriate model for each specific defect type to obtain accurate results.
FIGURE 5. Confusion chart of all surface defects types prediction using LDA
FIGURE 6. Confusion chart of all surface defects types prediction using QDA
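The per-class rates can be read off a confusion matrix by normalizing each row. A minimal sketch follows, assuming 40 images per class (80 percent of 50). The diagonal counts are chosen to match the QDA rates quoted in the text, while the placement of the misclassified images in the off-diagonal cells is invented purely for illustration.

```python
def per_class_rates(confusion, labels):
    """Row-normalize a confusion matrix (rows = actual class, columns = predicted)."""
    rates = {}
    for i, label in enumerate(labels):
        total = sum(confusion[i])
        tpr = confusion[i][i] / total
        rates[label] = {"TPR": tpr, "FNR": 1.0 - tpr}
    return rates


labels = ["crack", "corrosion", "spalling", "no defect"]
# Illustrative counts only: off-diagonal placement is hypothetical.
confusion = [
    [37, 0, 3, 0],   # actual crack
    [0, 40, 0, 0],   # actual corrosion
    [2, 0, 37, 1],   # actual spalling
    [0, 0, 2, 38],   # actual no defect
]
rates = per_class_rates(confusion, labels)
correct = sum(confusion[i][i] for i in range(len(labels)))
overall = correct / sum(map(sum, confusion))
for label in labels:
    print(label, rates[label])
print("overall accuracy:", overall)
```

Since every actual case is either identified or missed, TPR and FNR for a class always sum to one.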

CONCLUSION
The main contribution of this study is an automated system for concrete damage classification using the Discriminant Analysis Classifier. The discriminant analysis was applied to 200 images from 4 concrete surface classes. The model was evaluated using 80 percent of the data for training and 20 percent for testing. The proposed automated classification system reduces the number of features and improves classification accuracy.
The following findings were highlighted in this study:
1. In general, the classification accuracy of LDA and QDA is satisfactory. The accuracy of the two techniques differs only slightly, ranging from 94.375% to 95%.
2. The reduction of unnecessary features from 22 to 17 features improves the classification accuracy by 7.5% (LDA) and 5.6% (QDA). However, only an insignificant improvement was found when the QDA classifier was used instead of LDA.
3. The QDA classifier was found to improve classification prediction for spalling surface defects by 10% compared to the LDA classifier and to decrease prediction accuracy for surface crack defects by 7.5%. A higher-order model such as the QDA classifier is able to learn better than the LDA classifier for certain concrete defects, such as corrosion defects.

FIGURE 2. The surface corrosion stain defect digital image samples data of the study

TABLE 1. Average values of the twenty-one GLC matrix features of surface defect
The DAC was chosen as the intelligent automatic image classifier to discriminate, or separate, the classes of surface defect. To separate the surface defect classes k, the probability density function of each surface defect class k is estimated from a training dataset X containing the computed average GLC matrix features. The probability density function can be expressed by Equation (1).
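Equation (1) itself is not reproduced in this text. Assuming the usual multivariate normal model that underlies discriminant analysis, with µ_k and Σ_k the mean and covariance of class k, it would take the following form. The symbol d, for the number of selected features, is introduced here for illustration.

```latex
P(x \mid k) = \frac{1}{(2\pi)^{d/2}\,\lvert \Sigma_k \rvert^{1/2}}
\exp\!\left( -\tfrac{1}{2}\,(x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right)
\tag{1}
```

Under this model, LDA shares a single covariance matrix across all classes, whereas QDA estimates a separate Σ_k for each class.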