Predicting Concrete Compressive Strength using Deep Convolutional Neural Network based on Image Characteristics

In this study, we examined the efficacy of a deep convolutional neural network (DCNN) in recognizing concrete surface images and predicting the compressive strength of concrete. A digital single-lens reflex (DSLR) camera and a microscope were used simultaneously to obtain the concrete surface images used as input data for the DCNN. DCNNs were then trained, validated, and tested separately on the DSLR camera and microscope image data. The results indicated that the DCNN using DSLR image data achieved higher accuracy. This was attributed to the wider field of view of the DSLR camera, which was beneficial for extracting a larger number of features. Moreover, the DSLR camera produced more realistic images than the microscope. Thus, evaluating the compressive strength of concrete with a DCNN using DSLR camera images reduced time and cost while increasing practical usefulness. Furthermore, an indirect comparison of the accuracy of the DCNN with that of existing non-destructive methods for evaluating concrete strength supported the reliability of the DCNN-derived strength predictions. In addition, the DCNN approach used in this study could be extended to detect and evaluate various deterioration factors that affect the durability of structures, such as salt damage, carbonation, sulfation, corrosion, and freezing-thawing.


Introduction
Concrete is a commonly used construction material that comprises water, cement, sand, gravel, and various other admixtures. It has been used for several centuries to ensure the integrity of structures. Given its low cost and other advantages, suitable alternative materials are difficult to find [Chahal, Siddique and Rajor (2012); Maia and Fiquieras (2012)]. The compressive strength of concrete, a vital element of structural design, is one of the most important mechanical properties characterizing concrete quality. Several other properties of concrete, such as impermeability, modulus of elasticity, and resistance to weathering agents, are directly or indirectly related to its compressive strength [Nematzadeh and Naghipour (2012)]. Traditionally, the compressive strength of concrete has been evaluated by destructive testing. However, destructive tests require considerable time and cost, and their applicability to materials and structures in active use is limited [Sbartai, Breysse, Larget et al. (2012); Alwash, Breysse and Sbartai (2015)]. To address these issues, various non-destructive techniques have been developed, such as the impact hammer method, the ultrasonic velocity method, the pull-out test, penetration methods, and magnetic and radioactive methods [Başyiğit, Çomak, Kılınçarslan et al. (2012)]. These methods are typically used separately, although they can be combined depending on requirements. They are also subject to errors ranging from 4% to 15% [Sbartai, Breysse, Larget et al. (2012); Bogas, Gomes and Gomes (2013)]. Additionally, researchers have been evaluating the potential of image processing for recognizing concrete characteristics such as aggregate dispersion [Soroushian, Elzafraney and Nossoni (2003); Kabir, Rivard, He et al. (2009); Barbosa, Beaucour, Farage et al. (2011)].
Typically, empirical formulas are used when evaluating concrete characteristics through non-destructive techniques and image processing. As a result, these methods have a limited ability to interpret image data obtained under the wide range of conditions encountered in practice [Başyiğit, Çomak, Kılınçarslan et al. (2012); Cha, Choi and Buyukozturk (2017)]. Considering the importance of the strength of concrete used for structural support, studies on evaluating the compressive strength of concrete have focused on faster, more accurate, and more practical methods [Dogan, Arslan and Ceylan (2017)]. Researchers in a variety of fields have applied machine learning to evaluate phenomena recognized via image processing under real-world conditions [Jahanshahi, Masri, Padgett et al. (2013); O'Byrne, Ghosh, Schoefs et al. (2014); Alwasel, Sabet, Nahangi et al. (2017); Dawood, Zhu and Zayed (2017)]. However, typical machine learning methods such as support vector machines (SVMs) and artificial neural networks (ANNs) use only one or two layers; hence, they are limited in their ability to capture the image complexity required to evaluate the strength characteristics of concrete [Zhang, Wang, Li et al. (2017)]. Recently, deep learning has attracted significant interest worldwide, owing to rapid developments in big data processing and computing technology. Deep learning employs a multi-layer neural network structure based on artificial neural network theory, enabling machines to extract and process features from data to produce analytical results [Hinton, Osindero and Teh (2006)]. In particular, deep convolutional neural networks (DCNNs) are well suited for recognizing complex images of objects [Barat and Ducottet (2016); Shi, Bai and Yao (2016); Zhang, Wang, Li et al. (2017)].
Accordingly, recent studies in civil engineering have employed DCNNs for applications such as crack detection and aggregate shape evaluation [Zhang, Wang, Li et al. (2017); Gopalakrishnan, Khaitan, Choudhary et al. (2017); Cha, Choi and Buyukozturk (2017); Tong, Gao and Zhang (2017)].
In this study, we propose a DCNN for recognizing concrete surface images and evaluating the compressive strength of concrete, and we examine and demonstrate its usefulness. In particular, a commercially available digital single-lens reflex (DSLR) camera and a microscope were used to capture the concrete surface images, and the performance of the proposed DCNN model was compared between the two image-capturing methods in terms of accuracy and ease of use. The remainder of this paper is organized as follows. Section 2 reviews related research and the limitations of existing studies. Section 3 describes the proposed DCNN architecture. Section 4 presents the training calculations and results of DCNNs trained on concrete surface images captured with a DSLR camera and a microscope, reports test results on separate data, and discusses the performance and potential of the proposed DCNN architecture. Finally, Section 5 summarizes the conclusions of this research.

Literature review
Image processing has been extensively used to analyze concrete characteristics such as pore structure and aggregate dispersion. Soroushian et al. [Soroushian, Elzafraney and Nossoni (2003)] developed specimen preparation, image processing, and image analysis techniques for the automated quantitative microstructural investigation of concrete, focusing on micro-cracks and voids. Kabir et al. [Kabir, Rivard, He et al. (2009)] evaluated various edge-detection algorithms, as well as transform-based and statistical methods, for their effectiveness in assessing damage in a concrete dam from digital borehole imagery obtained using an acoustic televiewer. Barbosa et al. [Barbosa, Beaucour, Farage et al. (2011)] proposed an image processing-based technique for evaluating the uniformity of aggregate distribution in lightweight concrete. Başyiğit et al. [Başyiğit, Çomak, Kılınçarslan et al. (2012)] assessed the compressive strength of different concrete classes using image processing. Dogan et al. [Dogan, Arslan and Ceylan (2017)] used artificial neural networks (ANNs) and image processing (IP) together to determine mechanical properties of concrete, such as the compressive strength, modulus of elasticity, and maximum deformation, with a certain degree of success. Because these characteristics influence the strength of concrete, these studies suggest that the surface characteristics and strength of concrete can be assessed through image processing. Beyond image capture, most of these studies also relied on empirical formulas to assess concrete characteristics from the captured images. However, the ability of this approach to interpret image data captured under real-world conditions is limited, and machine learning offers a way to address this limitation.
Accordingly, studies have used machine learning, instead of standardized empirical formulas, to predict the strength of concrete under a variety of actual conditions. Yan et al. [Yan and Shi (2010)] used an SVM to predict the elastic moduli of normal- and high-strength concrete and compared the elastic moduli predicted by the SVM from experimental data with those predicted by other models. Castelli et al. [Castelli, Vanneschi and Silva (2013)] proposed an intelligent system based on genetic programming for predicting the strength of high-performance concrete. Chou et al. [Chou, Tsai, Pham et al. (2014)] comprehensively compared various learning techniques, used individually and in combination, for simulating the compressive strength of concrete based on multi-nation datasets with diverse additive materials. Additionally, researchers in civil engineering have proposed a variety of analysis techniques that combine image processing and machine learning for crack detection and the motion analysis of workers. Jahanshahi et al. [Jahanshahi, Masri, Padgett et al. (2013)] introduced a contactless, remote-sensing crack detection and quantification methodology based on three-dimensional (3D) scene reconstruction via computer vision, image processing, and SVMs. O'Byrne et al. [O'Byrne, Ghosh, Schoefs et al. (2014)] proposed an image analysis-based damage detection technique to supplement and strengthen existing visual inspection methods as a quick and convenient source of quantitative information. Alwasel et al. [Alwasel, Sabet, Nahangi et al. (2017)] proposed and validated a supervised SVM-based machine learning algorithm to classify the poses of masonry workers according to their expertise. Dawood et al. [Dawood, Zhu and Zayed (2017)] developed an integrated model based on image processing techniques and machine learning to automate consistent spalling detection and the numerical representation of distress in subway networks.
Although these studies established machine learning methods, they are limited by inappropriate feature extraction during image processing and by optimization problems arising from image complexity. Deep learning, as developed by Hinton et al. [Hinton, Osindero and Teh (2006)], is a method for resolving the optimization problems associated with existing ANNs. The DCNN has attracted considerable research attention in a variety of fields because it achieves impressive results in extracting features from image data. Consequently, studies have examined the efficacy of DCNNs in evaluating a variety of features using structural image data. Zhang et al. [Zhang, Wang, Li et al. (2017)] proposed an efficient network architecture based on the convolutional neural network (CNN) for pavement crack detection on 3D asphalt surfaces at pixel-perfect accuracy. Gopalakrishnan et al. [Gopalakrishnan, Khaitan, Choudhary et al. (2017)] employed a DCNN pre-trained on the "big data" ImageNet database, which contains millions of images, and transferred this learning to automatically detect cracks in images of hot-mix asphalt and Portland cement concrete pavement surfaces that included a variety of non-crack anomalies and defects. Cha et al. [Cha, Choi and Buyukozturk (2017)] proposed a vision-based method employing a deep CNN architecture to detect concrete cracks without calculating defect features. Tong et al. [Tong, Gao and Zhang (2017)] employed a 3D-CNN to provide a model for automatically evaluating aggregate angularity with respect to time, based on digital images. This study examines the efficacy of a DCNN in evaluating the compressive strength of concrete using concrete surface images as input data.

Research methodology
In this study, we employed a DCNN, a deep neural network specialized for image recognition. DCNNs improve on shallower neural networks by processing image data through many successive layers. The concept of the DCNN was developed in the 1980s and 1990s [LeCun, Boser, Denker et al. (1990)], and with the introduction of AlexNet in 2012, DCNNs came to be regarded as unparalleled in image processing [Krizhevsky, Sutskever and Hinton (2012)]. A DCNN is composed of two main components: (1) a feature extraction component, comprising convolution and pooling layers, which extracts features from the input images, and (2) a classification component based on fully-connected layers.

Overall architecture
In this study, the analysis was based on the AlexNet architecture mentioned above. Fig. 1 depicts the proposed DCNN architecture, and Tab. 1 lists the dimensions of each layer and operator. The DCNN comprises five convolution layers and three fully-connected layers. The first layer is an input layer of 224×224×3 pixels, corresponding to the three RGB channels. A feature map is created as the input data passes through each layer, and a final feature map of 6×6×256 is produced by the last convolution (C5) and pooling (P3) layers. This final feature map is passed through the three fully-connected layers, and a single value is produced in the final output layer (L11). A DCNN generally uses a softmax function in the final output layer; softmax is an activation function suited to classification rather than regression. However, because the goal of this study was to identify the correlation between the surface images and the compressive strength of concrete, which is a regression problem, the Euclidean loss function expressed in Eq. (1) was used in the final output layer instead of a softmax function.
$$E = \frac{1}{2N}\sum_{i=1}^{N}\left\| o_i - y_i \right\|_2^2 \qquad (1)$$
where $N$ is the total number of data, $o_i$ is the output of the DCNN (i.e., the predicted compressive strength), and $y_i$ is the actual compressive strength. The weights of the DCNN were trained using the backpropagation algorithm to minimize this loss function. A rectified linear unit (ReLU) was used as the nonlinear activation function for the five convolutional layers (L1, L3, L5, L6, and L7) and two fully-connected layers (L9 and L10). Furthermore, dropout [Srivastava, Hinton, Krizhevsky et al. (2014)] was applied in the fully-connected layers (L9 and L10) to minimize overfitting.
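As a minimal sketch, assuming Eq. (1) takes the common Euclidean-loss form E = 1/(2N) Σ (o_i − y_i)², the form used by Caffe's EuclideanLoss layer, the loss for a batch of scalar strength predictions can be computed as below. The factor 1/2 is part of that assumption, and the function name `euclidean_loss` and the sample values are hypothetical.

```python
from typing import Sequence


def euclidean_loss(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Euclidean loss E = 1/(2N) * sum_i (o_i - y_i)^2 for scalar outputs.

    predicted: DCNN outputs o_i (predicted compressive strengths, MPa)
    actual:    measured compressive strengths y_i (MPa)
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predicted)
    squared_error_sum = sum((o - y) ** 2 for o, y in zip(predicted, actual))
    return squared_error_sum / (2 * n)


# Hypothetical predictions vs. measured strengths (MPa), not data from this study.
loss = euclidean_loss([20.0, 25.0], [18.0, 24.0])
print(loss)  # 1.25
```

Because the output layer produces an unbounded real value and this loss penalizes squared deviations in MPa, the network is trained as a regressor rather than a classifier.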

Convolution and pooling layers
The convolution layers create new feature maps that highlight distinctive features in the input image. Because each filter's weights are shared across its entire feature map, the number of parameters is reduced and correlations between features in nearby areas can be learned. As shown in Fig. 2, the convolution layer applies the convolution operation used in computer vision [Lecun, Bottou, Bengio et al. (1986)], so each feature map is determined by the values of its convolution filter. The filter weights, i.e., the element values of each filter, are learned during CNN training. An odd filter size is generally used, although this can vary depending on user preference. The stride is the number of pixels by which the filter moves at each step, and the number of output feature maps equals the number of filters [Lecun, Bottou, Bengio et al. (1986)]. In this study, five convolution layers were used. The first convolution layer had a filter size of 11×11 and a stride of 4 and created 96 feature maps; the fifth and final convolution layer had a filter size of 3×3 and a stride of 1 and created 256 feature maps. These feature maps then pass through the pooling and fully-connected layers to produce the final output.
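The convolution and stride arithmetic described above can be sketched as follows, assuming single-channel, unpadded convolution for the `conv2d` example. The second part traces spatial sizes through an AlexNet-style stack using the output-size rule floor((W + 2P − K)/S) + 1. The filter counts for C2 to C4 and all padding values are assumptions taken from common AlexNet implementations rather than from this paper, chosen so that the 224×224×3 input reproduces the 6×6×256 feature map stated for P3. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """Single-channel, unpadded convolution, computed as cross-correlation as in
    CNN layers: the filter slides `stride` pixels per step."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def out_size(size: int, k: int, s: int, p: int = 0) -> int:
    """Output width of a conv or pooling layer: floor((W + 2P - K) / S) + 1."""
    return (size + 2 * p - k) // s + 1


# (name, filter size K, stride S, padding P, number of filters or None for pooling)
# Filter counts for C2-C4 and all paddings are assumed AlexNet-style values.
LAYERS: List[Tuple[str, int, int, int, Optional[int]]] = [
    ("C1", 11, 4, 2, 96),
    ("P1", 3, 2, 0, None),
    ("C2", 5, 1, 2, 256),
    ("P2", 3, 2, 0, None),
    ("C3", 3, 1, 1, 384),
    ("C4", 3, 1, 1, 384),
    ("C5", 3, 1, 1, 256),
    ("P3", 3, 2, 0, None),
]


def trace_shapes(size: int = 224, channels: int = 3) -> List[Tuple[str, int, int, int]]:
    shapes = []
    for name, k, s, p, filters in LAYERS:
        size = out_size(size, k, s, p)
        if filters is not None:  # a conv layer outputs one map per filter
            channels = filters
        shapes.append((name, size, size, channels))
    return shapes


for name, h, w, c in trace_shapes():
    print(f"{name}: {h}x{w}x{c}")
```

With these assumed paddings, the trace gives 55×55×96 after C1 and ends at 6×6×256, i.e., 9,216 values entering the first fully-connected layer.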

Figure 2: Convolution example
As observed in Fig. 3, the pooling layer subsamples the output of the convolution layer.
Pooling refines the results of feature extraction by convolution. Convolution alone extracts a very large number of features; pooling is therefore applied between convolution stages to retain the stronger features and discard the weaker ones. Common pooling methods include mean pooling, which extracts the mean value of each window, and max pooling, which extracts the largest value; of the two, max pooling has been reported to yield superior performance [Scherer, Muller and Behnke (2010)]. Therefore, in this study, the convolution layers C1, C2, and C5 were each followed by a pooling layer (P1, P2, and P3, respectively), and max pooling with a filter size of 3×3 and a stride of 2 was employed.
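A minimal sketch of the two pooling methods, applied to a single feature map with the 3×3 window and stride of 2 stated above. Because the window is larger than the stride, neighbouring windows overlap by one row or column. The function name and the 5×5 toy map are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def pool2d(fmap: Matrix, size: int = 3, stride: int = 2, mode: str = "max") -> Matrix:
    """Pool one feature map with a size x size window moved `stride` pixels at a time.
    mode="max" keeps the strongest response in each window; mode="mean" averages it."""
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            window = [fmap[r * stride + i][c * stride + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


fmap = [[float(5 * i + j) for j in range(5)] for i in range(5)]  # 5x5 toy feature map
print(pool2d(fmap))               # [[12.0, 14.0], [22.0, 24.0]]
print(pool2d(fmap, mode="mean"))  # [[6.0, 8.0], [16.0, 18.0]]
```

Max pooling passes on only the strongest response in each window, which is the "retain stronger features" behaviour described above; mean pooling blends strong and weak responses.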

Activation function
In a neural network (NN), the function that converts input signals into output signals is called the activation function. In a conventional ANN, the logistic sigmoid and hyperbolic tangent functions are among the most commonly used activation functions for introducing nonlinearity. However, these functions saturate for inputs of large magnitude, where their gradients approach zero, which slows training. For a small NN this problem is not severe, but a much larger NN, such as a DCNN, is significantly affected by the slow training associated with these functions [Glorot, Bordes and Bengio (2012)]. To resolve this problem, the ReLU was used as the nonlinear activation function [Nair and Hinton (2010)]. Fig. 4 illustrates different types of nonlinear activation functions. Whereas the outputs of the sigmoid and hyperbolic tangent functions are bounded (between 0 and 1, and between -1 and 1, respectively), the output of ReLU is unbounded for positive inputs and zero for negative inputs. Consequently, the gradient of ReLU is always either zero or one. ReLU is therefore simple and requires fewer calculations; moreover, it learns faster and yields more accurate results than other nonlinear activation functions [Cha, Choi and Buyukozturk (2017)]. Therefore, in this study, ReLU was used as the activation function of the neurons.
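The saturation argument above can be checked numerically with a small sketch of the three activation functions and their derivatives. Treating the ReLU gradient at exactly zero as 0 is a convention assumed here; the function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in exp for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25; shrinks toward 0 as |x| grows


def tanh_grad(x: float) -> float:
    return 1.0 - math.tanh(x) ** 2  # at most 1; shrinks toward 0 as |x| grows


def relu(x: float) -> float:
    return max(0.0, x)  # unbounded for x > 0, zero for x <= 0


def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0  # always exactly 0 or 1 (0 at x = 0 by convention)


for x in (-5.0, 0.0, 5.0):
    print(f"x={x:+.0f}  sigmoid'={sigmoid_grad(x):.4f}  "
          f"tanh'={tanh_grad(x):.4f}  relu'={relu_grad(x):.0f}")
```

At x = ±5 the sigmoid and tanh derivatives are already small (about 0.0066 and 0.0002), while the ReLU derivative stays at 1 for any positive input, so gradients passed back through many ReLU layers are not repeatedly shrunk.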

Methods for reducing overfitting
Overfitting refers to a situation in which an NN fits the training data too closely and cannot appropriately handle other data, leading to poor validation and testing results.
Overfitting mainly occurs when the training data are insufficient for the number of parameters to be learned. To address overfitting, data augmentation and dropout were employed in this study. The data augmentation comprised random cropping and horizontal flipping, as illustrated in Fig. 5. Random cropping is a method in which images are randomly cropped or scaled. In this study, a random point was selected within the 18×18 area at the top-left of each image reduced to 112×112 pixels, and this point was used as the top-left corner of an 84×84 crop. The images were also flipped by rotating them 180° about the x-axis to further increase the amount of data.
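A sketch of the two augmentation steps, under explicit assumptions: the randomly chosen point is treated as the top-left corner of the 84×84 crop (with a 112×112 image, this placement keeps every crop inside the image), and the 180° rotation about the x-axis is read as reversing the row order. If "horizontal flipping" is meant as a left-right mirror instead, each row would be reversed. The image is a single channel given as nested lists, and all names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[float]]  # one channel; the same crop and flip would be applied to R, G and B


def random_crop(img: Image, crop: int = 84, max_offset: int = 18,
                rng: Optional[random.Random] = None) -> Tuple[Image, int, int]:
    """Choose a random point in the max_offset x max_offset area at the image's
    top-left and use it as the top-left corner of a crop x crop patch."""
    if rng is None:
        rng = random.Random()
    if len(img) < max_offset - 1 + crop or len(img[0]) < max_offset - 1 + crop:
        raise ValueError("image too small for this crop size and offset range")
    top = rng.randrange(max_offset)   # 0 .. max_offset - 1
    left = rng.randrange(max_offset)
    patch = [row[left:left + crop] for row in img[top:top + crop]]
    return patch, top, left


def flip_about_x_axis(img: Image) -> Image:
    """Rotate 180 degrees about the horizontal (x) axis, i.e. reverse the row order."""
    return img[::-1]


size = 112  # side length of the reduced image
reduced = [[float(r * size + c) for c in range(size)] for r in range(size)]
patch, top, left = random_crop(reduced, rng=random.Random(0))
flipped = flip_about_x_axis(patch)
print(len(patch), len(patch[0]), top, left)
```

Each call yields a slightly shifted view of the same surface, so repeated crops (and their flipped versions) multiply the number of distinct training inputs without new photographs.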
Dropout was also used to minimize overfitting. Besides data augmentation, overfitting can be reduced by simplifying the network structure so that fewer parameters must be learned; dropout, currently the most commonly used such method, does this during training. Rather than training the entire NN at each step, dropout randomly deactivates part of the NN according to a specified dropout rate and trains the remaining part [Srivastava, Hinton, Krizhevsky et al. (2014)].
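Dropout with the rate of 0.5 used in this study can be sketched as below. This uses the common "inverted" variant, which scales the surviving activations by 1/(1 − rate) during training so that nothing needs rescaling at test time; Srivastava et al. instead scale the weights at test time, and the two are equivalent in expectation. The function name and the activation values are hypothetical.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], rate: float = 0.5, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: during training each unit is zeroed with probability `rate`
    and survivors are scaled by 1/(1-rate); at test time the input passes unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


fc_outputs = [0.2, 1.5, 0.7, 3.0, 0.0, 2.2]  # hypothetical fully-connected activations
print(dropout(fc_outputs, rate=0.5, rng=random.Random(42)))  # a random half is zeroed
print(dropout(fc_outputs, training=False))                   # unchanged at test time
```

Because a different random subset of units is dropped at each step, no unit can rely on the presence of specific other units, which discourages co-adapted features that fit only the training data.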

Experimental study
This study used a DCNN to predict concrete compressive strength from features of concrete surface images. A DSLR camera and a microscope were used simultaneously to capture the images. The DSLR camera was portable and could easily capture wide areas at high resolution, but it had difficulty capturing images at high magnification. In contrast, the microscope, which is designed for high-magnification imaging, captured detailed surface characteristics, but its images covered a relatively narrow area. To examine the characteristics of these two imaging devices, separate DCNNs were implemented using the DSLR camera and microscope images as input data, and their results were evaluated. The AlexNet architecture was used for training both DCNNs. Finally, the performance of the DCNN models was analyzed indirectly by comparing their accuracy with that of models that evaluate concrete compressive strength using existing non-destructive testing (NDT) methods. The analysis was performed on a workstation with four GPUs (CPU: Intel Xeon E5-2620 v4 @ 2.1 GHz; 64 GB RAM; four Nvidia GTX 1080 Ti GPUs).

Data preparation
The data for the DCNN model for evaluating concrete strength were prepared in advance. The concrete specimens had 28-day design strengths of 18 MPa, 24 MPa, and 40 MPa, as listed in Tab. 2. The specimens of each strength were grouped by age in days, and compressive strength tests were performed; the measured strengths listed in the table constitute the output data. The images were obtained by photographing the surfaces of the concrete specimens with the DSLR camera and the microscope. The DSLR camera captured a total of 332 images at 4,096×2,160 pixels, and the microscope captured 300 images at 1,920×1,080 pixels. The distance between the imaging equipment and the concrete surface was approximately 1.0-1.5 cm. Random cropping and horizontal flipping were applied to the captured images, as shown in Fig. 6, and the images were resized to 84×84 pixels. In total, 4,709 DSLR images and 3,846 microscope images were created, as shown in Tab. 3. The data were randomly divided into training, validation, and test sets.
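The random division into training, validation, and test sets can be sketched as a single shuffle followed by slicing. The training and validation counts (3,601 and 515 for the DSLR images) are those reported in the Training subsection; the test set here is simply the remainder (593 images), an arithmetic consequence rather than a figure stated in the text. The function name and seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_by_counts(items: Sequence, n_train: int, n_val: int,
                    seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle once, then slice: the first n_train items for training, the next
    n_val for validation, and the remainder for testing (disjoint sets)."""
    if n_train + n_val > len(items):
        raise ValueError("not enough items for the requested split")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


dslr_ids = range(4709)  # placeholder indices standing in for the 4,709 DSLR images
train, val, test = split_by_counts(dslr_ids, n_train=3601, n_val=515)
print(len(train), len(val), len(test))  # 3601 515 593
```

Shuffling individual crops, as here, can place crops of the same photograph in different sets; splitting by source photograph instead keeps such near-duplicates together.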

Training
The DCNN model for evaluating concrete compressive strength from concrete surface images was trained with the stochastic gradient descent (SGD) backpropagation algorithm given in Eq. (2):
$$v_{i+1} = 0.9\,v_i - 0.0005\,\epsilon\,w_i - \epsilon\left\langle \left.\frac{\partial L}{\partial w}\right|_{w_i} \right\rangle_{D_i}, \qquad w_{i+1} = w_i + v_{i+1} \qquad (2)$$
where $i$ is the iteration index, $v$ is the momentum variable, $\epsilon$ is the learning rate, $w$ denotes the weights, and $\left\langle \left.\partial L/\partial w\right|_{w_i} \right\rangle_{D_i}$ is the average over the $i$th batch $D_i$ of the derivative of the objective $L$ with respect to $w$, evaluated at $w_i$. For the DSLR camera, 3,601 training images and 515 validation images were used; for the microscope, 2,804 training images and 401 validation images were used. Both models used a mini-batch size of 128. The initial learning rate, weight decay, and momentum were set to 0.01, 0.0005, and 0.9, respectively. Convolution layer C1 had a stride of 4, and convolution layers C2-C5 had a stride of 1. Pooling layers P1-P3 had a stride of 2. The dropout rate was set to 0.5. Fig. 7 shows the DCNN learning curves for the DSLR camera and microscope data. As can be seen in Fig. 7, both the training and validation losses decreased as the number of iterations increased, indicating that no significant overfitting occurred. These results indicate that the image data from both the DSLR camera and the microscope contain patterns that can predict compressive strength, and that these patterns can be learned by the DCNN.
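The update rule of Eq. (2), SGD with momentum and weight decay as used for AlexNet [Krizhevsky, Sutskever and Hinton (2012)], can be sketched with the hyperparameters listed above (learning rate 0.01, weight decay 0.0005, momentum 0.9). The toy quadratic objective, the function name, and the plain-list weights are assumptions for illustration; in the real model `grad` would be the mini-batch average of the loss gradient over 128 images.

```python
from typing import List, Tuple


def sgd_momentum_step(w: List[float], v: List[float], grad: List[float],
                      lr: float = 0.01, momentum: float = 0.9,
                      weight_decay: float = 0.0005) -> Tuple[List[float], List[float]]:
    """One SGD update with momentum and weight decay:
        v_{i+1} = momentum * v_i - weight_decay * lr * w_i - lr * grad_i
        w_{i+1} = w_i + v_{i+1}
    `grad` is the gradient averaged over the current mini-batch, evaluated at w_i."""
    new_v = [momentum * vi - weight_decay * lr * wi - lr * gi
             for wi, vi, gi in zip(w, v, grad)]
    new_w = [wi + vi for wi, vi in zip(w, new_v)]
    return new_w, new_v


# Toy objective L(w) = (w - 3)^2 with gradient 2(w - 3); a stand-in for the DCNN loss.
w, v = [1.0], [0.0]
for _ in range(2000):
    grad = [2.0 * (w[0] - 3.0)]
    w, v = sgd_momentum_step(w, v, grad)
print(round(w[0], 4))  # 2.9993
```

With weight decay, the iterate settles at 0.06/0.020005 ≈ 2.99925 rather than exactly 3, because the decay term adds 0.0005·w to the effective gradient and so pulls the weights slightly toward zero.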

Testing
After training, performance tests were conducted to assess the efficacy of the DCNN for evaluating concrete compressive strength. The box plots in Fig. 8 illustrate the error rates of the DCNN for each level of compressive strength using images captured by the DSLR camera and the microscope. As observed in Fig. 8, although the maximum error rate was extremely high in some cases, most values between the first and third quartiles were close to the median. Comparing the error rates by strength level, as the design strength increased, both the maximum error rate and the size of the box (the interquartile range) gradually decreased, indicating increased accuracy.
Figure 8: Box plots of deep convolutional neural network (DCNN) error rate according to strength: (a) DSLR camera; (b) microscope
Comparing the DCNNs trained on DSLR camera and microscope images, we found that DCNN accuracy was higher for the DSLR camera images. When Figs. 8(a) and 8(b) are compared, the boxes for most strength levels are narrower for the DCNN using DSLR images than for the DCNN using the microscope dataset, meaning that the error rates were concentrated around the median. Thus, the DCNN performed better with DSLR camera images than with microscope images. This observation is supported by the metrics in Tab. 4: the root-mean-square error (RMSE) and mean absolute percentage error (MAPE) of the DCNN using DSLR images were relatively low. We attribute this to the relatively wide area covered by the DSLR images, which suggests that images covering a wider area are better suited for extracting more features.
Most of the literature on non-destructive testing proposes empirical formulas and compares them with existing empirical formulas. Therefore, to assess the performance of the proposed DCNNs, we consulted the literature evaluating the performance of existing empirical formulas and indirectly compared their prediction performance with that of the proposed DCNNs. For a comprehensive evaluation of the prediction performance of existing empirical formulas, refer to Kim et al. [Kim, Oh and Oh (2016)]. Fig. 9 shows this indirect comparison between the reported performance of the empirical formulas and that of the proposed DCNNs. As can be observed in the figure, the DCNNs achieved relatively low RMSE values, i.e., relatively high prediction performance. Therefore, we consider the DCNN prediction performance to be at least acceptable.
Figure 9: Comparison of RMSE between NDT methods and the proposed DCNN
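The metrics used in this subsection can be sketched as below. The per-sample error rate plotted in Fig. 8 is assumed here to be the absolute percentage error |o_i − y_i| / y_i × 100, so that MAPE is its mean. The box-plot summary uses `statistics.quantiles` with the inclusive method, which is one of several quartile conventions. All function names and values are hypothetical, not results from this study.

```python
import math
import statistics
from typing import Dict, List, Sequence


def _check(predicted: Sequence[float], actual: Sequence[float]) -> None:
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")


def error_rates(predicted: Sequence[float], actual: Sequence[float]) -> List[float]:
    """Per-sample absolute percentage error: |o_i - y_i| / y_i * 100."""
    _check(predicted, actual)
    return [abs(p - a) / abs(a) * 100.0 for p, a in zip(predicted, actual)]


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the data (MPa here)."""
    _check(predicted, actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error (%), i.e. the mean of error_rates."""
    return statistics.fmean(error_rates(predicted, actual))


def box_summary(rates: Sequence[float]) -> Dict[str, float]:
    """Values drawn by a box plot; the box spans q1..q3 (the interquartile range)."""
    q1, median, q3 = statistics.quantiles(rates, n=4, method="inclusive")
    return {"min": min(rates), "q1": q1, "median": median,
            "q3": q3, "max": max(rates), "iqr": q3 - q1}


# Hypothetical test-set values (MPa) for one strength level.
actual = [24.0, 24.0, 24.0, 24.0, 24.0]
predicted = [23.0, 24.5, 26.4, 22.8, 30.0]
print(f"RMSE = {rmse(predicted, actual):.3f} MPa, MAPE = {mape(predicted, actual):.2f} %")
print(box_summary(error_rates(predicted, actual)))
```

RMSE is expressed in MPa and weights large errors heavily, whereas MAPE is relative to the measured strength, so the same absolute error counts for less at higher strength levels.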

Conclusion
Concrete strength evaluations are extremely important because concrete plays a major role in supporting structures. Non-destructive methods for concrete strength evaluation are gradually attracting more attention because of their efficiency, and deep learning technology is attracting increasing interest owing to the growth of big data technology. This study proposed, examined, and demonstrated the usefulness of a deep convolutional neural network (DCNN) for recognizing concrete surface images and evaluating concrete compressive strength. The analysis showed that the DCNN using DSLR camera image data achieved higher accuracy than the DCNN using microscope image data. We attribute this to the wider area captured by the DSLR camera, which makes its images better suited for extracting more features. These results also indicate that the DCNN is an effective tool for image-based analysis. In addition, the DSLR camera was much easier to use in the field than the microscope, reducing time and cost and thereby increasing its usefulness. When the accuracy of the proposed DCNN approach was indirectly compared with that of existing non-destructive methods for evaluating concrete compressive strength, the proposed DCNN exhibited better prediction performance than the other approaches. Although more data must be collected and evaluated and a more detailed analysis should be conducted, this study establishes the potential of DCNNs as non-destructive testing methods for evaluating concrete compressive strength.
To ensure the durability of structures, methods are needed for detecting and evaluating not only the compressive strength of concrete but also other deterioration factors such as salt damage, carbonation, sulfation, corrosion, and freezing-thawing. Most methods for evaluating these deterioration factors have limited practicality because they rely on empirical formulas. Although the DCNN in this study was applied to evaluate concrete compressive strength, the approach could be extended to these additional deterioration factors.
Funding Statement: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2018R1A2B6007333). This study was supported by a 2018 Research Grant from Kangwon National University.

Conflicts of Interest:
The authors declare no conflicts of interest.