Application of Deep Learning and Unmanned Aerial Vehicle on Building Maintenance

Advances in Civil Engineering


Introduction
Changes in customer preference may negatively affect building sustainability, well-being, and safety, and may eventually intensify competition in the market. For proactive and prompt building maintenance and repair work, customers seek quick, effective building monitoring approaches to avoid severe damage and unnecessary expenditure [1]. Conventional approaches to examining building structures typically require building surveyors to assess building elements. These assessments include lengthy site inspections to systematically record the physical condition of building elements on the basis of note-taking, photographs, drawings, and customer-supplied information [2], followed by analysis of the collected data and preparation of a building health assessment report. The components of this report include the assessed building's current state, recent updates, maintenance and repair records, and future long-term repair cost estimates [3]. However, this approach is time-, labor-, and cost-intensive and can endanger the surveyors' health and safety, particularly when the building to be assessed is a mid- to high-rise structure. Convolutional neural networks (CNNs) have been applied to detect deterioration in many structures, such as roads, bridges, and tunnels, but have rarely been employed to detect deterioration of building external walls [4][5][6]. Moreover, unmanned aerial vehicles (UAVs) are widely used in deterioration detection. Consequently, combining UAVs and CNNs for external wall deterioration detection could have practical applications while ensuring surveyor safety.
In this study, we focused on the automated image-based detection and localization of key defects (efflorescence, spalling, cracking, and defacement) in the external wall tiles of buildings. However, because this was a pilot study, it has a few limitations: (1) the model could not consider multiple defect types simultaneously; in other words, each image belonged to only one category; and (2) the model considered only images with visible defects.
This study reports a CNN application for the automated assessment of the condition of external wall tiles of buildings, with a brief discussion of the method for selecting the most common defects of these tiles. First, we provide a brief overview of various applications of deep learning techniques, including CNNs, for resolving computer vision-related problems, followed by a description of the theoretical basis for the current study. This research proposes a detection and localization model based on transfer learning, in which VGG-16 performs both feature extraction and feature classification. Next, the localization problem and the class activation mapping (CAM) technique incorporated in the defect localization model are discussed. Subsequently, we discuss the dataset, the developed model, and the results, followed by conclusions and directions for future studies.

Factors Leading to Building Deterioration.
Building lifespans can range from decades to centuries. In general, building durability can be increased through continual protection, repair, and maintenance [7,8]. The rate and degree of deterioration differ among building components, with construction design, material, method, construction quality, and environment being the crucial influencing factors [9]. The factors leading to building deterioration can be divided into the following categories: natural environment (temperature, relative humidity, sunshine, wind, and water), natural disasters (earthquakes and typhoons), and human factors (design, construction, users, management, and maintenance) [10][11][12][13].

Building External Wall Tile Defects and Their Types.
External wall tile defects not only influence the overall appearance of buildings but also endanger public safety; for instance, falling tiles may cause injuries. External wall tile defects can be roughly divided into five types: defacement, efflorescence, cracking, spalling, and bulging. Of these, defacement, efflorescence, cracking, and spalling have been the main focus of most studies.
(1) Defacement. Defacement, the most significant and common type of external wall tile deterioration in buildings, is closely related to the architectural shape and design of a building and to the long-term influence of wind and rain on it [14]. Several major factors cause the defacement of external wall tiles. For instance, when rebar is exposed through external wall cracks, water containing rust from the corroded rebar flows out of the walls and defaces the affected areas. Moreover, the installation of accessories can damage external wall tiles, thus promoting algal and fungal growth on the affected walls.
(2) Efflorescence. Efflorescence, commonly known as whiskering, saltpetering, or "wall cancer," often affects the hollow bricks of building finishes, the joints of external wall tiles, or the joints of stone veneers. Efflorescence cannot be completely prevented in cement mortar or concrete-based structures.
(3) Cracking. The main causes of external wall cracking include overloading of buildings, uneven land subsidence, and violent shaking during earthquakes [15]. Drying shrinkage of external wall concrete, corrosion expansion of rebar, secondary construction of external wall accessories, and man-made disasters such as fire and explosion can aggravate this cracking. Furthermore, tile breakage can allow rainwater to enter the main body of a building, resulting in internal and external structural deterioration. Hence, cracks on a building's facade can affect the building's appearance and allow rainwater intrusion, possibly causing inconvenience in daily life, loss of property, or even reduced building safety and durability.
(4) Spalling. Spalling is the detachment of surface decorative materials (e.g., tiles and coatings) due to reduced adhesive strength, aging of cement mortar and concrete, poor tile quality, high temperatures caused by fire, or natural forces (e.g., strong wind and violent shaking during earthquakes) [16][17][18].
(5) Bulging. Bulging mainly occurs between the concrete and the base cement mortar. Gaps form between the cement mortar layers and the surfaces of external wall tiles, resulting in material separation. Long-term changes in temperature or humidity reduce adhesive strength and cause the adhesive interfaces to separate for various types of adhesive.
The methods of building deterioration detection include visual assessment, percussion-based identification, rebound intensity assessment, ultrasonic wave propagation assessment, pull-out testing, infrared thermography, and UAV use [44][45][46]. Compared with other methods, UAVs offer a more efficient way to collect large amounts of building data [47,48].
In addition to deterioration detection, UAVs can be used for environment monitoring, traffic management, pollution monitoring, and security [49][50][51]. UAVs are also an important emerging technology for developing sustainable communities [52].

CNN Use for Building Deterioration Detection.
With the development of deep learning, automatic defect detection is increasingly applied to community infrastructure and the built environment. CNNs have been used for rapid structural damage detection and maintenance cost estimation after a serious earthquake to provide owners and decision-makers with a reference for making accurate and timely risk management decisions [53]. Region-based CNN (R-CNN) and Faster R-CNN have also been used for road damage detection and classification [54]. Other CNN applications include the detection of concrete cracks [55][56][57], automated detection of deformation at the bottom of steel box girders of long-span bridges [58], and automated detection of building types in street images [59]. CNNs have also gradually been used to detect defects in building external walls. Agyemang and Bader applied a CNN to detect cracks on building external walls and assess the defects therein [3]. Perez et al. also used CNNs to detect building defects [9]. As shown in related research, VGG-16 and CAM are commonly used methods for building defect detection.
In summary, although deep learning has been used in many engineering fields [60,61], it has rarely been used to detect external wall deterioration. Moreover, integrating UAVs with deep learning applications may increase the practical value of automated external wall deterioration detection.

Materials and Methods
This study developed a deep learning model that classifies defects, namely efflorescence, spalling, cracking, and defacement, in the external wall tiles of buildings. In applying CNNs, we identified limitations and challenges arising from the nature of both the defects under investigation and their surroundings. First, images showing the defect types from different external wall tile sources were collected, and the data were appropriately cropped and resized; the resulting dataset was then used to train the network model. Next, using a transfer learning technique with a VGG-16 model pretrained on ImageNet, this study customized and initialized the weights. Subsequently, a separate set of images, not previously seen by the trained model, was used to validate the model and examine its robustness. Finally, CAM was applied to address the localization problem.

Dataset.
All external wall tile images were obtained using mobile phones, handheld cameras, and drones; thus, their resolutions and sizes differed. Accordingly, to increase the size of the dataset, the obtained images were sliced into images with resolutions of 224 × 224 and 3024 × 4032 pixels. In total, 5680 images were used as the training dataset for our model, all of which were labeled and categorized as efflorescence (n = 1382), spalling (n = 1386), cracking (n = 1551), and defacement (n = 1361) images (Figure 1). Additionally, 10% of the images in the dataset were randomly selected to form a validation dataset. To prevent overfitting, this study applied a variety of image augmentation processes, namely rescaling, rotation, and height and width shifts, to the training dataset. The datasets can be viewed on the public website:
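The slicing and splitting steps can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes, for illustration, that a 3024 × 4032 photograph is cut into non-overlapping 224 × 224 patches on a regular grid (edge remainders discarded) and that the 10% validation set is a simple seeded random hold-out. The function names `tile_boxes` and `split_validation` are hypothetical; the actual cropping of pixel data would be done with an image library using these boxes.

```python
import random

TILE = 224  # patch size matching the 224 x 224 model input


def tile_boxes(width, height, tile=TILE):
    """Return (left, top, right, bottom) boxes for non-overlapping tiles.

    Assumption: tiles lie on a regular grid, and any strip at the right or
    bottom edge that is narrower than one tile is discarded.
    """
    boxes = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


def split_validation(items, fraction=0.1, seed=0):
    """Randomly hold out `fraction` of the items; returns (train, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = round(len(shuffled) * fraction)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    # A 3024 x 4032 photo yields a 13 x 18 grid of 224 x 224 tiles.
    print(len(tile_boxes(3024, 4032)))  # 234
    train, val = split_validation(range(5680))
    print(len(train), len(val))  # 5112 568
```

A fixed seed keeps the hold-out identical across training runs, so validation results remain comparable when hyperparameters are changed.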

Method for Automated Defect Detection.
This study used a modified model as the feature extractor (Figure 2) and applied fine-tuned transfer learning to an ImageNet-pretrained VGG-16 network [62]. In transfer learning, a network is first trained on a large dataset so that it acquires a basic ability to recognize objects. Its classification layers are then replaced with layers for the required categories, making the network more robust for the target task.
This study used VGG-16 because it is powerful yet has a simple architecture with relatively few layers.
This architecture comprises five convolutional blocks with max pooling for feature extraction, followed by three fully connected layers and a final 1 × 1000 softmax layer. The network takes 224 × 224-pixel RGB images as input; the first block consists of two convolutional layers with 64 filters of size 3 × 3, and the second, third, fourth, and fifth blocks use 3 × 3 filters numbering 128, 256, 512, and 512, respectively. This simple architecture eases model modification for transfer learning and CAM while preserving the model's accuracy.
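To make the architecture concrete, the sketch below counts the parameters of each VGG-16 block. It assumes the standard VGG-16 configuration (configuration "D" of Simonyan and Zisserman), whose ImageNet-pretrained weights use 64, 128, 256, 512, and 512 filters in the five blocks; it is a pure-Python illustration, not the authors' code, and the function names are hypothetical. The counts also show how the parameters split between the first four blocks and the fifth block.

```python
# Standard VGG-16 (configuration "D"): number of 3 x 3 filters in each
# convolutional layer, grouped into the five blocks that each end with
# 2 x 2 max pooling.
VGG16_BLOCKS = [
    [64, 64],
    [128, 128],
    [256, 256, 256],
    [512, 512, 512],
    [512, 512, 512],
]


def conv_block_params(in_channels, filters, kernel=3):
    """Count weights and biases in a stack of kernel x kernel conv layers."""
    total = 0
    for out_channels in filters:
        total += in_channels * out_channels * kernel * kernel + out_channels
        in_channels = out_channels
    return total, in_channels


def vgg16_conv_params():
    """Parameters per convolutional block for a 3-channel (RGB) input."""
    channels, per_block = 3, []
    for filters in VGG16_BLOCKS:
        params, channels = conv_block_params(channels, filters)
        per_block.append(params)
    return per_block


def fc_params(num_classes, hidden=4096, flat=7 * 7 * 512):
    """Three fully connected layers; five 2 x 2 poolings shrink 224 -> 7."""
    return ((flat * hidden + hidden)
            + (hidden * hidden + hidden)
            + (hidden * num_classes + num_classes))


blocks = vgg16_conv_params()
print(blocks)                         # [38720, 221440, 1475328, 5899776, 7079424]
print(sum(blocks) + fc_params(1000))  # 138357544 (ImageNet classifier)
print(sum(blocks[:4]), blocks[4])     # 7635264 7079424 (blocks 1-4 vs block 5)
```

Swapping the 1000-way output layer for a four-way one shrinks that layer from 4,097,000 to 16,388 parameters, while the convolutional blocks are unchanged.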
Some hyperparameters were set to their default values, whereas others were determined by testing and adjustment on the training data. The default optimizer (stochastic gradient descent, SGD), momentum (0.9), and weight decay (5 × 10⁻⁴) were used without modification [63]. The learning rate (lr) was tested over the range 0.001 to 0.01, and convergence was more efficient at 0.01. Although many loss functions exist, cross-entropy loss was used because the research objective is basic classification. Batch size is usually chosen as a power of 2, and a batch size of 2⁵ (32) was selected on the basis of system performance. To fine-tune the VGG-16 model, the first four convolutional blocks were used as a generic feature extractor, and the final 1 × 1000 softmax layer was replaced with a 1 × 4 classifier (for efflorescence, spalling, cracking, and defacement). Finally, the modified model was retrained such that only the weights of the fifth convolutional block were updated during training.
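The hyperparameters above correspond to standard cross-entropy training with SGD. A minimal pure-Python sketch of the loss and of a single parameter update follows, assuming the PyTorch convention in which weight decay is added to the gradient before the momentum buffer is updated (other frameworks may order these terms differently). The function names are hypothetical and the code is illustrative, not the authors' implementation.

```python
import math

BATCH_SIZE = 2 ** 5  # 32 images per mini-batch


def softmax(logits):
    """Convert raw class scores into probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy loss: negative log-probability of the true class index."""
    return -math.log(softmax(logits)[target])


def cross_entropy_grad(logits, target):
    """Gradient of the loss w.r.t. the logits: softmax(logits) - one_hot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9, weight_decay=5e-4):
    """One SGD update with momentum and L2 weight decay.

    PyTorch-style ordering: decay is added to the gradient, the momentum
    buffer is updated, and each parameter moves against the buffer.
    """
    new_params, new_velocity = [], []
    for w, g, v in zip(params, grads, velocity):
        g = g + weight_decay * w   # L2 weight-decay term
        v = momentum * v + g       # accumulate velocity
        new_params.append(w - lr * v)
        new_velocity.append(v)
    return new_params, new_velocity
```

With four equal logits, `cross_entropy` returns ln 4 ≈ 1.386, the loss of a model that cannot yet distinguish the four defect classes. In a real network, the gradient with respect to the logits is back-propagated to the trainable weights before `sgd_momentum_step` is applied to them.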

CAM-Based Object Localization.
Object localization differs from image classification. In localization, an algorithm not only determines the class of an object in an image but also locates it, usually by placing a rectangular bounding box around it together with a confidence score for the object's presence [64]. For each detected object, the network outputs four numbers that parameterize this bounding box.
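The four-number parameterization can be illustrated as follows. This sketch assumes one common convention, a box described by its centre (bx, by) plus its width and height (other conventions use corner coordinates), and derives the box from a binary mask of object pixels, such as a thresholded activation map. The function names are hypothetical.

```python
def mask_to_box(mask):
    """Tight bounding box around the truthy cells of a 2D mask.

    Returns (x_min, y_min, x_max, y_max) in inclusive pixel coordinates,
    or None when no pixel is set.
    """
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def corners_to_center(box):
    """Convert corners to the four numbers (bx, by, bw, bh).

    Pixel i is treated as spanning [i, i + 1), so widths count whole pixels.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0 + 1, y1 - y0 + 1
    return (x0 + w / 2, y0 + h / 2, w, h)


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
print(mask_to_box(mask))                     # (1, 1, 2, 2)
print(corners_to_center(mask_to_box(mask)))  # (2.0, 2.0, 2, 2)
```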
CAM can be combined with classification-trained CNNs to identify discriminative regions in an image. In CAM, the image regions relevant to a specific class are highlighted by reusing the weights of the CNN's classifier layer, which yields good localization results. In this study, applying CAM to the model increased the accuracy of image localization.
Figures 3(a) and 3(b) illustrate the loss and learning curves of our model for the training dataset. The epoch, on the horizontal axis of both curves, is one training cycle in which the entire dataset is passed through the network. A lower value on the loss curve indicates a lower probability of image recognition error, and a value close to 1.0 on the learning curve indicates high training accuracy. As indicated in Figure 3(a), the loss curve converged stably at around the 50th cycle, achieving good image recognition. As presented in Figure 3(b), model training remained stable. The training dataset included 5680 images, and training ran for 500 cycles. As shown in Figure 3, our model was well trained. The optimal training accuracy was 86%, with a final loss of 0.0576 at the end of the 500th cycle, and no overfitting was observed during training. As presented in Table 1, the model's accuracy rates for efflorescence, cracking, and defacement were 91%, 86%, and 98%, respectively, whereas that for spalling was only 76%.

Defect Localization Using CAM.
To analyze why the accuracy rate for spalling was low, we visualized the dataset by applying CAM, a computationally inexpensive method. In the resulting images (Figure 4), large network responses are indicated in red. Figure 4 shows the regions on which the artificial neural networks focused.
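The CAM computation itself can be sketched in a few lines. This is a minimal pure-Python illustration assuming the original CAM formulation, in which the last convolutional feature maps f_k are globally average-pooled and fed to a linear classifier with weights w_k^c, so that M_c(x, y) = Σ_k w_k^c f_k(x, y). Upsampling the map to the 224 × 224 input and overlaying it as a colour map are omitted, and all function names are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """CAM for one class: M(x, y) = sum_k w_k * f_k(x, y).

    feature_maps: K maps from the last conv layer, each an H x W list of lists.
    class_weights: K classifier weights linking each pooled map to the class.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, w in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w * fmap[y][x]
    return cam


def class_score(feature_maps, class_weights):
    """Classifier score after global average pooling: sum_k w_k * mean(f_k)."""
    return sum(w * sum(map(sum, f)) / (len(f) * len(f[0]))
               for f, w in zip(feature_maps, class_weights))


def normalize(cam):
    """Rescale to [0, 1] so the map can be colour-coded (1.0 = strongest)."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat map
    return [[(v - lo) / span for v in row] for row in cam]


maps = [[[1, 0], [0, 0]], [[0, 0], [0, 2]]]
weights = [2.0, 0.5]
cam = class_activation_map(maps, weights)
print(cam)             # [[2.0, 0.0], [0.0, 1.0]]
print(normalize(cam))  # [[1.0, 0.0], [0.0, 0.5]]
```

Averaging the CAM over all positions reproduces the class score (0.75 in this toy example), which is why CAM can reuse the classifier weights without any additional training.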
Next, a confusion matrix (…) that 94.44% and 5.56% of these images presented mosaic tiles and lath bricks, respectively. Because mosaic tiles have a small unit area, they left dirty black stains behind when they fell off. Moreover, while images of the sample areas were being captured, trees may have blocked the light and created shadows (Figure 6). Similarly, lighting problems during image capture caused efflorescence to be misclassified as spalling (Figure 7, red circles): when sunlight was too bright or when the spalling pattern was irregular, the model misclassified efflorescence as spalling during training (Figure 8). Finally, some cracking was also misclassified as spalling during training (Figures 9 and 10, red circles).
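Misclassifications such as these are read off a confusion matrix. As a hedged illustration, the sketch below computes per-class precision (column-wise) and recall (row-wise) from a 4 × 4 matrix. The counts are invented for the example and are not the study's results, and the function name is hypothetical.

```python
CLASSES = ["efflorescence", "spalling", "cracking", "defacement"]


def per_class_precision_recall(matrix, classes=CLASSES):
    """matrix[i][j] = number of images of true class i predicted as class j."""
    n = len(matrix)
    results = {}
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        results[classes[c]] = (precision, recall)
    return results


# Hypothetical counts: rows are true classes, columns are predictions.
example = [
    [8, 2, 0, 0],   # 2 efflorescence images predicted as spalling
    [0, 10, 0, 0],
    [0, 1, 9, 0],   # 1 cracking image predicted as spalling
    [0, 0, 0, 10],
]
for name, (p, r) in per_class_precision_recall(example).items():
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this made-up matrix, the efflorescence and cracking images predicted as spalling lower spalling's precision to 10/13 (about 77%) while its recall stays at 100%.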

Conclusions
This study combined a UAV with a deep learning model for the automated detection of external wall tile deterioration in buildings and made modifications to improve the efficiency of the method. The results indicated that the model achieved high accuracy and recall: 91% and 80%, respectively, for efflorescence; 76% and 100% for spalling; 86% and 86% for cracking; and 98% and 78% for defacement (Table 1). Compared with traditional detection methods, UAVs are inexpensive and offer higher mobility, efficiency, and safety. However, UAV efficiency can be affected by the climate, lighting, wind, and blind spots in the test area, as well as by limitations in UAV operation technology. In the future, these limitations may be overcome through more robust camera lenses, sensors, systems, and automation technologies, making UAVs safer and more efficient and broadening their application in construction.
In the current study, the recognition accuracy for spalling was relatively low, indicating some limitations in recognizing spalling from the existing images. Therefore, in future studies, the use of infrared scanners, which detect differences in depth and recognize whether tiles have fallen, is highly recommended to improve recognition accuracy. In addition to using a larger dataset, a deeper network could be considered, because deeper networks can identify more detailed features and thus improve accuracy. To identify multiple defect types simultaneously, each image could be given multiple labels and trained with a corresponding loss function. For normal photos (without deterioration), the model would assign relatively low probabilities to all four deterioration types. Two methods are considered to further improve the model's adaptability: (1) setting a basic threshold in the model, so that an input photo whose highest predicted probability falls below the threshold is classified as background (not belonging to any of the four deterioration types); and (2) collecting photos of normal exterior wall tiles, equal in number to the photos of each single deterioration type, as a background (fifth) class and then retraining the model.
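Method (1) can be sketched as a simple rejection rule on the classifier's output probabilities. The threshold of 0.5 is a hypothetical value that would need to be tuned, for example on validation images of normal tiles, and the function name is illustrative rather than part of the study's model.

```python
CLASSES = ["efflorescence", "spalling", "cracking", "defacement"]
BACKGROUND = "background"


def classify_with_background(probs, classes=CLASSES, threshold=0.5):
    """Return the most probable defect class, or BACKGROUND when even the
    highest probability is below `threshold` (hypothetical default 0.5)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return BACKGROUND
    return classes[best]


print(classify_with_background([0.30, 0.25, 0.25, 0.20]))  # background
print(classify_with_background([0.70, 0.10, 0.10, 0.10]))  # efflorescence
```

Unlike method (2), this rule requires no retraining, but a single global threshold may trade missed defects against false alarms differently for each defect type.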

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.