Real-Time Road Crack Mapping Using an Optimized Convolutional Neural Network

Pavement surveying and distress mapping is completed by roadway authorities to quantify the topical and structural damage levels for strategic preventative or rehabilitative action. The failure to time the preventative or rehabilitative action and control distress propagation can lead to severe structural and financial loss of the asset, requiring complete reconstruction. Continuous and computer-aided surveying measures not only can eliminate human error when analyzing, identifying, defining, and mapping pavement surface distresses, but also can provide a database of road damage patterns and their locations. The database can be used for timely road repairs to gain the maximum durability of the asphalt and the minimum cost of maintenance. This paper introduces an autonomous surveying scheme to collect, analyze, and map the image-based distress data in real time. A descriptive approach is considered for identifying cracks from collected images using a convolutional neural network (CNN) that classifies several types of cracks. Typically, CNN-based schemes require relatively large processing power to detect desired objects in images in real time. However, the portability objective of this work requires the use of lightweight processing units. To that end, the CNN training was optimized by the Bayesian optimization algorithm (BOA) to achieve the maximum accuracy and minimum processing time with the minimum number of neural network layers. First, a database consisting of a diverse population of crack distress types such as longitudinal, transverse, and alligator cracks, photographed at multiple angles, was prepared. Then, the database was used to train a CNN whose hyperparameters were optimized using BOA. Finally, a heuristic algorithm is introduced to process the CNN's output and produce the crack map. The performance of the classifier and mapping algorithm is examined against still images and videos of cracked pavement captured by a drone. In both instances, the proposed CNN was able to classify the cracks with 97% accuracy. The mapping algorithm is able to map a diverse population of surface crack patterns in real time at a speed of 11.1 km per hour.


Introduction
Societal economic vitality and growth are intimately tied to the state of infrastructure and its ability to safely and efficiently handle the transfer of goods from one point to another. Pavement management, preservation, and rehabilitation strategies are critical components in maintaining the viability of infrastructure and economy over the long term. Roadway networks contain millions of miles of pavements, and maintenance operations of these systems cost upwards of $25 billion per year [1]. As part of the maintenance operations, pavement surveys, which include both surface and subsurface assessments, are required frequently to assess the state of the pavement and help to prioritize rehabilitative and preservative action. Moreover, according to the National Highway Traffic Safety Administration, 16% of traffic crashes are caused by roadway environmental factors, mainly poor pavement conditions [2]. Poor road conditions also lead to excessive wear on vehicles and tend to increase the number of delays and crashes, which can lead to additional financial losses [3]. Currently, manual inspection is the most common technique for identifying pavement distress in road surveys [4]. Manual inspection can be, however, time-consuming, costly, and labor-intensive. Furthermore, during manual inspection operations, human visual error is possible, the operation itself can be unsafe due to the passing of nearby motor vehicles, and the operations may impede traffic flow [5]. To overcome the limitations of manual inspection, automated and/or semiautomated crack detection techniques can be developed to measure, monitor, and map the evolution of the pavement surface and subsurface structure and distress profile [6]. Semiautomated modern pavement distress mapping or diagnosis techniques need to be nondestructive, cost-effective, accurate, capable of data acquisition at high speed, and relatively user and environmentally friendly [7]. As part of an effort to lower costs and accelerate maintenance operations, transportation departments are prioritizing the development of automated profiling systems for pavement distress assessment [8]. There remains a need, however, to develop automated and real-time distress mapping and assessment tools that can provide the end-user with large quantities of information related to the distress type, geometry, and distress source without manual surveillance either in situ or by proxy.
The prominent solution of replacing expert inspectors with robots that can automatically gather data and analyze them has been studied or suggested extensively in recent years.
In the literature, automated crack detection works have emphasized both using image processing for data analysis and developing automatic methods for fast data collection, such as using robots or vehicles. Advantages like portability, being nondestructive, and lane closure avoidance are some of the important aspects of using vehicles for data collection in pavement distress studies, as suggested in many publications [9,10].
Despite all the benefits of automated data collection methods, they lead to vast amounts of raw pavement data. Interpreting the raw data requires a human expert for analysis and decision making. Given the importance of timely pavement maintenance, it is impossible to process all raw data relying on expert human performance, which has led researchers to develop automatic intelligent algorithms for processing the gathered raw data. The utilization of computer vision methods for pavement engineering applications has grown exponentially over the last few decades [11], while many challenges should be addressed to achieve a full and seamless realization due to unwanted and highly variable image noise from random variation of brightness, color, camera, and the environment [12]. In recent years, many transportation and highway agencies in the US have become interested in image processing-based methods for analyzing collected raw data from highways and roads [13].
Although numerous suggested classical methods have helped in pavement distress analysis, some drawbacks, such as being prone to environmental noise, not being applicable under all road conditions, and being dependent on a certain image quality, have reduced their robustness when processing varying data. In the recent literature, machine learning-based methods, especially deep learning, show promising results in pavement distress analysis [20][21][22][23][24]. Unlike classical methods, machine learning-based algorithms have proved to be more robust for processing different pavement distress images under noisy conditions. Several machine learning methods like neural classifiers [25] and support vector machines (SVM) [26] have been suggested for pavement distress analysis. In a comprehensive review of computer vision-based defect detection and condition assessment of asphalt pavement, Koch et al. [11] identified SVM as the most robust machine learning technique for image-based pavement distress detection in 2015. Moreover, recently, deep learning has become a popular alternative in pavement distress analysis due to its convincing performance over SVM and other methods [27].
Cha et al. [10] created a database with 40,000 images of size 256 × 256 pixels and annotated them into crack or intact bins, utilizing MatConvNet [28] for crack detection that could achieve 98% accuracy. Gopalakrishnan et al. [29] studied transfer learning on a single-layer pretrained neural network classifier for pavement distress detection. They labeled their data as crack ("1") or no-crack ("0"). Also, they could achieve 90% accuracy by using ImageNet's pretrained VGG-16 as the feature extractor. Dorafshan et al. [30] prepared a database of 1,574 images with cracks and 16,426 images without cracks. They compared deep learning and edge detection methods and suggested a combination of both for improving the results in crack detection of concrete. Also, they used the AlexNet architecture for feature extraction. Smartphone-based data collection is proposed in Maeda et al. [31], who tested several object detection systems like Faster R-CNN, YOLO, SSD, and R-FCN. Gopalakrishnan, in [27], has extensively reviewed the most recent deep learning-based methods for pavement distress detection. Besides developing crack distress detection algorithms, whether based on classical or deep learning methods, some papers have suggested specific platforms for data collection. Due to the vastness of the roadway system, automatic pavement screening is needed. Prasanna et al. [9] suggested an automated crack detection algorithm, called spatially tuned robust multifeatured, for monitoring concrete bridges and implemented the algorithm on a robot platform. In [32], the authors studied an automatic image-based road crack detection method and used a vehicle-based data collection platform to collect data from different locations for further processing. Among the several data collection methods, vehicle-mounted cameras are the most popular.
The integration of computer vision methods using deep convolutional neural networks (CNNs) shows exceptional promise for use in crack detection applications, but requires many images for the training process [10,27,33]. Although utilizing CNNs improves the crack detection accuracy, there are known drawbacks in the conventional approaches for studying the cracks [10]. In the available works [10,27], whether on asphalt pavement or other surfaces, the objective is to distinguish cracked areas of pavement from uncracked ones, which yields a binary decision with two outcomes: cracked or noncracked. Despite the proven outstanding performance of deep learning compared with other machine learning methods like SVM, Adaboost, and random forest, some shortcomings still exist. For instance, in [11], although it is proved that a CNN has the capability of detecting cracks with high accuracy, the authors suggest optimizing the CNN as a future improvement. Also, using transfer learning for pavement distress detection, Gopalakrishnan et al. [29] suggested adding a feature for evaluating the severity of detected cracks, which shows possible further improvements for pavement distress detection. In deep learning-based methods for pavement distress detection, the current focus is on improving the accuracy of the neural network for identifying cracks. Also, most of the recent work in this application uses the AlexNet or VGG-16 CNN architectures and some transfer learning methods for the pavement distress detection task. It is important to consider that most of the mentioned architectures are designed and tested on datasets that do not include pavement distress data. Although in many cases transfer learning is applicable for reducing training time and improving accuracy, the objects in datasets like MNIST, ImageNet, and CIFAR-100 do not share similar patterns with pavement applications, so the benefit of using transfer learning with similar CNN architectures is limited.

Methodology
As reviewed above, deep learning-based methods for pavement distress detection improve false detection accuracy within noncracked pavements, whether using different CNN architectures, transfer learning, or pretrained models. This paper proposes an approach to geometrically map a surface crack on asphalt pavement using a technique that involves image partitioning and crack geometrical and spatial classification. This technique allows the user to both detect the presence of and map a crack on the road surface in real time using raw input images. The work extends the functionality of CNN-based classification techniques, which to date are limited to crack presence detection only and do not provide simultaneous geometrical mapping of the object [34,35]. Crack images are aggregated in the database and indexed according to their orientation and spatial position within a squared partitioned area of the larger raw image file, which then allows the position and orientation to be estimated heuristically using thirteen unique categories. By applying this approach, instead of predicting the crack position in each frame with a marginal error that depends on the searching window, we are able not only to detect and classify the cracks, but also to map the crack and avoid errors caused by the searching window.
Instead of relying on predesigned CNN architectures, we propose an optimized architecture for the CNN and its hyperparameters for the pavement distress detection task. Also, rather than taking the approach focused on distinguishing crack from noncrack, which introduces some false positives after classification, in this work we propose a heuristic crack mapping algorithm (CMA) that is able to regenerate the crack shape automatically. Alternatively, in this work, we propose a method for mapping a crack's shape or analyzing the results based on the pattern of a crack in an entire image or video frame, which could yield a smoother crack map by eliminating smaller cracks or errors. Moreover, the proposed method is a general concept for automating road surface crack analysis that can later be adapted to the protocols of several highway agencies, such as the American Association of State Highway and Transportation Officials (AASHTO) protocols (PP67-10 and PP44-00) or the Mechanistic-Empirical Pavement Design Guide (MEPDG) [36]. In other words, by changing the camera parameters like sensor size, focal length, lens type, and distance of the camera from the pavement, it is possible to achieve the minimum deficiency length that is considered a crack in several protocols.
In the proposed scheme, each image is first partitioned into 300 equal square tiles. Then, a CNN is developed and trained that classifies the cracks in the tiles into predefined categories. Since the categorization of the cracks is conducted tile-by-tile, the resulting map may show discontinuities at the borders of the tiles. To mitigate such discontinuity errors, a heuristic real-time crack mapping algorithm (CMA) is introduced.
The CMA processes the classification results and, based on the regional crack information, modifies the current segment, which yields a unified and continuous map of cracks on the road surface image. Further, the CMA has the ability to eliminate small cracks or false-positive objects that are isolated in one partitioned tile rather than continuous over multiple partitioned tiles. Since the objective of this paper is to map the crack in real time in image frames from a streaming video, the CNN hyperparameters (HPs) had to be optimized so that the processing time and the classification error for the input images are minimized simultaneously. To that end, a Bayesian optimization algorithm (BOA) was utilized in lieu of trial and error methods. Experimental results show that the CNN installed on a portable computer can process 5 frames per second, providing the ability to map one band of a road in real time at a speed of 11.1 km per hour. The primary objective of this work is to use real-time images to map cracks on the surface of an asphalt pavement. A crack is defined as a mechanical or thermal strain-induced separation of material.
This material separation allows moisture to infiltrate the pavement structure internally, leading to premature failure or accelerated deterioration. Cracks are classified by their geometric orientation, source, width, and concentration per unit length or area. Only cracks visible and distinguishable to the naked eye are considered in the distress survey.
As illustrated in Figure 1, the proposed crack mapping scheme is comprised of three stages: database preparation, training and optimizing, and real-time crack mapping. To prepare the database, a descriptive approach was taken to categorize a given crack based on its relative position and geometric orientation within the image.
This approach labels the cracks based on their geometric orientation. Multiple images of cracked asphalt pavements were gathered from the field, and each image was subdivided into smaller tiles (Tls).
The Tls were classified into 12 different categories based on whether the Tl contained a crack. If a crack is detected, it is further classified by crack position and orientation, i.e., horizontal, vertical, or diagonal. Tls that did not contain a visible crack, but rather contained objects like grass, shadows, patched cracks, pavement markings, or general uncracked pavement, were binned into the 13th category. A CNN is then used to learn the unique features contained within each Tl, and each Tl is then classified into a specific category (1-13) based on crack presence, position, and orientation. To further refine the process, BOA is used to objectively and systematically achieve optimal HPs by selecting the optimal initial learning rate and momentum, which served to significantly reduce training time; such tuning operations have been shown to be tedious when performed manually [10,12]. The trained CNN is used for real-time crack mapping using video frames of a cracked roadway surface that were received from a camera installed on an aerial vehicle. The classified Tls are then sent to the CMA, which maps the cracks in real time. The CMA is designed to enhance the decisions made by the CNN by eliminating cracks found only within one isolated Tl and requiring the crack maps to be contiguous across multiple Tls to increase the smoothness and accuracy of the mapped crack field.

Database Preparation.
To develop and evaluate the CMA, 1500 images of cracked asphalt pavement surface were collected to prepare a database to train the CNN. Images were taken by a FLIR E5 camera with a 55° × 43° field of view and 640 × 480 pixels resolution. The camera was placed between 1.5 m and 2.5 m above the road with a vertical line of sight and ±10° of tilt error. Each image was then divided into 300 equal Tls containing 32 × 32 pixels. Each Tl was then virtually divided into 9 equal blocks, as depicted in Figure 2(a), where the hashed blocks indicate the range of possible crack locations within a given bounded region, which is uniquely defined for each category. Figure 2(b) shows how an actual pavement surface image is aligned with a given category. Category groups {1, 2, 3}, {4, 5, 6}, and {7, ..., 12} represent horizontal, vertical, and diagonal cracks, respectively. The noncracked category (13) is shown in Figure 2(c).
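The partitioning step can be illustrated with a minimal sketch, assuming a 640 × 480 frame and 32 × 32 pixel tiles as stated above. The function name `tile_grid` and the label scheme (row letters A to O, column numbers 1 to 20, following the convention the paper uses for Figure 11) are choices made here for illustration, not the authors' implementation.

```python
import string

TILE = 32  # tile edge in pixels, as stated in the text


def tile_grid(width=640, height=480, tile=TILE):
    """Map tile labels such as 'A1' to (x0, y0, x1, y1) pixel boxes."""
    if width % tile or height % tile:
        raise ValueError("image size must be a multiple of the tile size")
    rows, cols = height // tile, width // tile
    boxes = {}
    for r in range(rows):
        for c in range(cols):
            # Hypothetical labeling: row letter plus 1-based column number.
            label = f"{string.ascii_uppercase[r]}{c + 1}"
            boxes[label] = (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
    return boxes


grid = tile_grid()
print(len(grid))     # 300 tiles (15 rows x 20 columns)
print(grid["A1"])    # (0, 0, 32, 32)
print(grid["O20"])   # (608, 448, 640, 480)
```

Each box can then be used to crop the corresponding Tl from the frame before it is passed to the classifier.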
Based on the established definition of categories, a total of 6,695 Tls that fit into one of the 13 categories were handpicked from the 1500 images. For representing the data, the CIFAR-10 [37] layout was used, which yielded 13 different data-batches, equal to the number of crack categories. Each data-batch contains similar manually annotated Tls that represent one of the 13 categories, along with an assigned label Lb ∈ {1, 2, ..., 13} indicating the Tl's category number. Then, 90% of each data-batch was randomly taken along with the labels to form the training set, and the other 10% was used for the test set. The test set is then used to measure the accuracy of the trained CNN.
The first layer is the input layer that receives the input image Tls for classification. The crack features in each data-batch were extracted using multiple convolution layers. This layer consists of various sets of neurons whose weights and biases will be updated relative to the crack features. In the convolutional layer, the neuron input consists of small sectors from the previous layer, and this window is called the filter (kernel).
The size of the filter, S_f, can be tuned from 1 × 1 pixels up to the size of the input image. In the convolution layer, the filter moves along the input and builds a convolved feature map. To increase the number of feature maps, multiple filters should be used, and each filter has different weights and biases to be able to extract various features of the image. The stride (amount of horizontal and vertical movement of the filter on the input per convolution) is set to 2 pixels.
After the convolution layer, a batch normalization layer is used to reduce the CNN's sensitivity to the initial HP values and to decrease the training time. Following the batch normalization layer, a Rectified Linear Unit (ReLU) activation layer is added to apply a zero threshold to all negative values from the batch normalization layer; that is, each input b from the previous layer is passed through max(0, b). The max-pooling layer downsamples the input by dividing it into rectangular pooling regions and computing the maximum of each region of the gathered feature matrices. After the feature extractor, the fully connected (FC) layer is used to map the feature matrix of the last layer to a 1 × s vector, where s = 13 is chosen equal to the number of categories in the database. To represent the probability distribution over multiple classes in the output of the classifier, a generalization of the binary logistic regression classifier (the SoftMax function) is applied after the FC layer [38,39]. Consider the input of the SoftMax function to be a sample tile that belongs to one of the 13 categories, Tl ∈ Cat_j, where j ∈ {1, ..., 13}; then the category prior probability [40] is defined as P(Cat_j), which shows the probability of Tl ∈ Cat_j, and the conditional probability as P(Tl, ϕ | Cat_j), where ϕ = [w, b] is the parameter vector that consists of weights (w) and biases (b). The SoftMax function is described as
$$S_j(Tl, \phi) = \frac{\exp\left(r_j(Tl, \phi)\right)}{\sum_{k=1}^{13} \exp\left(r_k(Tl, \phi)\right)}, \tag{1}$$
where r_j(Tl, ϕ) = ln(P(Tl, ϕ | Cat_j) P(Cat_j)) and S_j is a probability distribution of the SoftMax function output, where 0 ≤ S_j ≤ 1 and Σ_{j=1}^{13} S_j(Tl, ϕ) = 1. Following the SoftMax function, the classification output layer (cross entropy function) is used to assign each input to one of the n = 13 mutually exclusive categories using the loss function shown in the following:
$$l(\phi) = -\sum_{i=1}^{p} \sum_{j=1}^{13} d_{ij} \ln S_j(Tl_i, \phi), \tag{2}$$
where p is the number of samples and d_ij is a matrix that shows with what probability the i-th sample of Tl belongs to the j-th category.
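The SoftMax and cross-entropy steps described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the names `softmax` and `cross_entropy` are chosen here, the FC-layer scores are made up, and d_ij is assumed to be one-hot (each sample belongs to exactly one category).

```python
import math


def softmax(scores):
    """Turn 13 FC-layer scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(prob_rows, labels):
    """Sum of -ln(probability of the true category); labels are 1..13.

    Assumes one-hot targets, so only the true category contributes per sample.
    """
    return -sum(math.log(p[lb - 1]) for p, lb in zip(prob_rows, labels))


# Hypothetical scores for one tile: category 5 clearly dominates.
scores = [0.0] * 13
scores[4] = 4.0
probs = softmax(scores)
predicted = probs.index(max(probs)) + 1   # category number, 1-based
loss = cross_entropy([probs], [5])
print(predicted, round(sum(probs), 6))
```

When all 13 scores are equal, every category receives probability 1/13 and the per-sample loss equals ln 13, which is a convenient sanity check for an untrained classifier.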

Training.
Stochastic Gradient Descent with Momentum (SGDM) was used to train the CNN for classification. This method updates the CNN's weights and biases to minimize the loss function, which measures the difference between correctly and incorrectly classified Tls. The SGDM uses a subset of the training data (mini-batch). The gradient derived from the data within the mini-batch is used for updating the weights and biases. Each update to the weights and biases is defined as one iteration. The gradient descent update law is described as
$$\phi_{k+1} = \phi_k - \lambda \nabla l(\phi_k) + \eta \left(\phi_k - \phi_{k-1}\right), \tag{3}$$
where the subscript k represents the iteration number, the initial learning rate is 0 < λ < 1, ϕ is a vector that contains the weights and biases, l(ϕ) is the loss function, and 0 ≤ η ≤ 1 is the momentum, which defines the level of contribution from the previous step. For λ values close to 0, the learning process is slowed, and values close to 1 lead to either diverging or suboptimal weights. Moreover, to prevent overfitting of the CNN during the training process, L2 regularization [39,41] is utilized as follows:
$$l_R(\phi) = l(\phi) + \frac{\tau}{2} w^{\top} w, \tag{4}$$
where l_R(ϕ) is the regularized loss and τ is the regularization factor. To prevent overfitting and feature memorization and to improve the generalization of the SoftMax classifier during the training process, a modified data augmentation procedure is used during each iteration [38], where the Tls were translated randomly in the horizontal and vertical directions by a maximum of ±4 pixels. It is noteworthy that the Tls cannot be flipped or rotated since the classification process is dependent on crack orientation.
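A minimal sketch of one SGDM iteration with L2 regularization follows, using the symbols from the text (λ as learning rate, η as momentum, τ as regularization factor). The function name `sgdm_step`, the toy quadratic loss, and the numeric settings are assumptions for illustration; for brevity the penalty is applied to every parameter, whereas a real implementation would typically regularize the weights only.

```python
def sgdm_step(phi, phi_prev, grad, lam, eta, tau=0.0):
    """One SGDM update with an L2 penalty folded into the gradient.

    phi_next = phi - lam * (grad + tau * phi) + eta * (phi - phi_prev)
    """
    return [p - lam * (g + tau * p) + eta * (p - q)
            for p, q, g in zip(phi, phi_prev, grad)]


# Toy problem (hypothetical): minimize l(phi) = 0.5 * phi^2, whose gradient is phi.
phi_prev = [1.0]
phi = [1.0]
for _ in range(200):
    phi, phi_prev = sgdm_step(phi, phi_prev, grad=phi, lam=0.1, eta=0.5), phi
print(abs(phi[0]) < 1e-3)
```

The momentum term reuses the previous displacement, so setting η = 0 reduces the update to plain gradient descent.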
and 0 < λ < 1. The possible values for ND, S_f, and N_f are integers, and those for η, τ, and λ are logarithmically spaced values between 0 and 1. The classification error is the number of Tls misclassified by the classifier (SoftMax). The objective of the optimization is to find optimal values for the HPs such that the classification error is minimized. So, the objective function can be considered a function with the HPs as the input and the classification error as the output. Modeling this objective function is algebraically complicated and computationally intensive. The BOA is capable of optimizing the HPs to minimize the classification error while the objective function is treated as a black box [42]. To perform the BOA, a validation set was defined that consists of 10% randomly selected Tls from the training set. The inputs of the objective function are the training set and the validation set. As shown in Figure 4, the objective function trains the CNN and returns the classification error on the validation set. By modeling the calculated error using a Gaussian process (GP), as mentioned in [43], over multiple iterations z = 1, 2, ..., 100, the BOA finds the optimal values for the HPs that minimize the classification error. The kernel function used for the GP is the Automatic Relevance Determination (ARD) Matérn 5/2 kernel in [44]. In addition, the acquisition function q_z(HP) used for the GP is the Expected Improvement function E(·) [45], as follows:
$$q_z(HP) = E\left[\max\left(0,\ f(HP) - f_z^{\max}(HP)\right)\right], \tag{5}$$
where f_z^max(HP) is the current maximum observed value for the objective function. The next estimation for maximizing the objective function is obtained by using the acquisition function. The GP posterior is updated in each iteration using equation (6), where u = (HP_z, f_z), z = 1:100. The extrema of f_z(HP) were obtained numerically at sampled values of the function. A closed-form expression of the objective function is not required within the BOA mathematical structure [46]. The objective function and acquisition function for two of the SGDM HPs, i.e., η and λ, during the optimization process are shown in Figure 5. As depicted in Figure 5(a), the observed points f_z(HP) are demarcated by blue dots, the model mean that is obtained from the observations is depicted as the red surface, and the subsequent evaluation point is demarcated with a black dot. Moreover, Figure 5(b) illustrates the acquisition function. The objective function is shown to reach a minimum at the 69th iteration; this point is demarcated with a black star. Figure 5(b) shows the maximum feasible value that is generated upon minimizing the classification error. The total number of iterations was set to 100. Each iteration calculates the classification error among 600 randomly selected Tls from the 13 data-batches. Figure 6 represents the estimated (expected) improvement in each iteration and the calculated improvement during optimization.
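For a Gaussian posterior, the Expected Improvement acquisition has a well-known closed form, sketched below under stated assumptions: the GP posterior mean and standard deviation at each candidate are taken as given, the objective is framed as maximization (for example, the negative classification error), and the candidate names and numbers are invented. This is the standard textbook expression, not a formula quoted from the paper.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(0.0, mu - best)  # no uncertainty: improvement is deterministic
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


# Hypothetical posterior summaries for two HP candidates: name -> (mean, std).
candidates = {"a": (0.95, 0.01), "b": (0.93, 0.05)}
best_so_far = 0.96
pick = max(candidates,
           key=lambda k: expected_improvement(*candidates[k], best_so_far))
print(pick)
```

In this toy case the candidate with the lower mean but larger uncertainty is selected, which shows how the acquisition function trades exploitation against exploration when proposing the next HP set.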
The BOA was evaluated statistically using the Wald method [47] by representing the images in the test set as independent events with a known probability of success.
The number of misclassified images was represented with a binomial distribution. By applying the trained CNN with optimized HPs to the test set and computing the number of correctly classified Tls, the test error E_t is defined as follows:
$$E_t = 1 - \frac{Ts}{b}, \tag{7}$$
where Ts and b are the number of correctly classified Tls and the total number of Tls in the test set, respectively. Note that, to evaluate the trained CNN's performance on the test set without exposing the CNN to the optimization process, E_s is used to obtain the standard error. This approach helps to increase the optimization speed. The standard error is represented as follows:
$$E_s = \sqrt{\frac{E_t \left(1 - E_t\right)}{b}}. \tag{8}$$
Moreover, as the target of this research, to obtain a ±3% error margin, a confidence interval of 97% is defined to calculate the generalization error E_G, defined as
$$E_G = E_t \pm z_c E_s, \tag{9}$$
where z_c is the standard normal critical value for the chosen confidence level. The final HP values for the CNN were ND = 4, S_f1 = 5, S_f2 = 5, S_f3 = 6, S_f4 = 2, N_f1 = 119, N_f2 = 119, N_f3 = 108, and N_f4 = 96. Also, the optimized values for SGDM were λ = 0.0005145, η = 0.040069, and τ = 0.002859.
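The Wald evaluation can be sketched as follows. The function name `wald_interval` is chosen here, and the counts in the example are hypothetical: the text does not state the exact test-set size, so 670 (roughly 10% of 6,695 Tls) and 648 correct classifications are assumptions used only to exercise the arithmetic.

```python
import math
from statistics import NormalDist


def wald_interval(correct, total, confidence=0.97):
    """Return (test error, standard error, (low, high)) for a binomial test set."""
    e_t = 1.0 - correct / total                  # test error
    e_s = math.sqrt(e_t * (1.0 - e_t) / total)   # binomial standard error
    # Two-sided critical value of the standard normal distribution.
    z_c = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return e_t, e_s, (e_t - z_c * e_s, e_t + z_c * e_s)


# Hypothetical counts, not taken from the paper.
e_t, e_s, (low, high) = wald_interval(correct=648, total=670)
print(f"E_t={e_t:.4f}  E_s={e_s:.4f}  E_G in [{low:.4f}, {high:.4f}]")
```

The interval narrows as the test set grows, since the standard error scales with the inverse square root of the number of test Tls.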
Applying the optimal HP values to the CNN and SGDM yields a CNN with 24 layers and 96.67% accuracy in 10 epochs, as shown in Figure 7(a). Moreover, the minimized value for the loss function was 0.033 in 10 epochs, as shown in Figure 7(b). The generalized error E_G interval for the test set was [0.0184, 0.0476]. Figure 8 illustrates 24 randomly selected outputs of the fourth convolution layer, indicating the features of the Tls extracted by the CNN. Extracted features of the position and orientation of a crack in each Tl can be easily recognized in Figure 8. Figure 9 depicts the confusion matrix for the test set, which is obtained based on the CNN that is trained using the final values. As shown in Figure 9, the highest confusion is between categories 5 and 6.
The reason is that, from Figure 2(a), categories 5 and 6 both represent vertical cracks, in the middle and on the right side of the Tl, respectively, which intrinsically leads to a higher probability of misclassification. Figure 10 shows 24 Tls randomly selected from the test set for testing the optimized trained CNN, along with the percentage probability that they belong to each category. The CNN trained in this section will be used to map the cracks with the CMA described in the following section.

Real-Time Crack Mapping Algorithm (CMA)
In this section, the proposed real-time Crack Mapping Algorithm is discussed. So far, a CNN has been trained that classifies the cracks in a Tl. However, if the CNN is directly used to map a crack, the resulting map will not be continuous. The discontinuity is the result of the classification errors, due to the size of the tile blocks and the limited number of categories that the CNN classifier recognizes. To mitigate this problem, as indicated in Figure 1, a CMA block is added to the mapping scheme that smooths the resulting final map. As shown in Figure 3, an input image or video frame is divided into Tls of the same size as the database Tls, assuming that the input images have 640 × 480 pixels. The classifier assigns a score to each Tl related to its similarity to each category. The higher the score, the more probable it is that the Tl's crack belongs to a category. Among all assigned scores for a Tl, if a score is less than a defined threshold value of 85%, the Tl is assumed to be a noncracked Tl, that is, the 13th category. As shown in Figure 11, the divided input image has 15 rows, r = A, ..., O, and 20 columns, c = 1, ..., 20. As depicted in Figure 12, for each crack category, a raw mapping plot was defined.
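The score-thresholding rule can be sketched as below. The reading adopted here is an interpretation of the text: a Tl keeps its top-scoring category only when that score reaches the 85% threshold, and otherwise falls back to the noncracked category 13. The function name `assign_category` and the example scores are hypothetical.

```python
NONCRACK = 13      # noncracked category number, from the text
THRESHOLD = 0.85   # score threshold, from the text


def assign_category(scores, threshold=THRESHOLD):
    """scores: 13 SoftMax probabilities for categories 1..13."""
    best = max(range(len(scores)), key=scores.__getitem__)
    # Low-confidence tiles are treated as noncracked pavement.
    return best + 1 if scores[best] >= threshold else NONCRACK


confident = [0.0] * 13
confident[4] = 0.90      # category 5 at 90%
confident[5] = 0.10
uncertain = [0.0] * 13
uncertain[4] = 0.60      # category 5 at only 60%
uncertain[5] = 0.40
print(assign_category(confident), assign_category(uncertain))
```

Raising the threshold trades missed cracks for fewer false positives, which the CMA then refines further using neighboring tiles.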
The raw mapping plot is a collection of straight-line segments that estimates crack position and orientation based on its classification. This mapping will not be interconnected between tiles, which leads to two immediate deficiencies, isolated cracks and nonconnected cracks, both of which contribute to an overall mapping error.
To improve the crack connectivity between tiles, without refining the pixel dimensions of the Tls and increasing the number of classification categories, a CMA block was created to "tie" the crack line segments between neighboring tiles. This procedure is described as follows: if a Tl has any common side (Si) or corner with another Tl, it is called a neighbor Tl. Each Tl has at least 3 neighbors at the image corner and at most 8 neighbors in the image interior.
Figure 14 shows the connected CMA modified mapping plots (red lines) overlaying both the cracked pavement image and the raw mapping plots (blue lines: without segment connection). An isolated crack (blue line) is detected in tile C7 and classified into category 7. No neighbor tiles to C7 have a detected crack. More isolated cracks are detected in C7, E9, H17, and K8. The isolated cracks are eliminated within the CMA.
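The neighbor definition and the elimination of isolated Tls can be sketched as follows. Only the isolation filter is shown; the step that ties line segments across tile borders is not reproduced here. The function names and the small example grid are hypothetical, with zero-based (row, column) indices so that tile C7 corresponds to (2, 6).

```python
def neighbors(r, c, rows=15, cols=20):
    """Tiles sharing a side or a corner with (r, c), clipped to the grid."""
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols]


def drop_isolated(categories, noncrack=13):
    """Return a copy of the category grid with isolated cracked tiles reset."""
    rows, cols = len(categories), len(categories[0])
    out = [row[:] for row in categories]
    for r in range(rows):
        for c in range(cols):
            if categories[r][c] == noncrack:
                continue
            # A cracked tile survives only if some neighbor is also cracked.
            if all(categories[nr][nc] == noncrack
                   for nr, nc in neighbors(r, c, rows, cols)):
                out[r][c] = noncrack
    return out


grid = [[13] * 20 for _ in range(15)]
grid[2][6] = 7                 # isolated crack, like tile C7 in the text
grid[0][8] = grid[1][8] = 5    # two vertically adjacent cracked tiles (A9, B9)
cleaned = drop_isolated(grid)
print(cleaned[2][6], cleaned[0][8], cleaned[1][8])
```

Neighbor checks read from the original grid rather than the partially cleaned copy, so the result does not depend on the order in which tiles are visited.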
The surviving linear segments are those that have neighbor Tls, which are shown from A9, B9, B10, C9, D9, . . . to O13. By defining the neighbor Tl, the mapped plots from A9, B9, B10, C9, D9, . . . to O13 form a pattern that maps the underlying crack. The proposed method is not limited to a certain vertical distance of the camera from the road. Given the camera's parameters, such as focal length and sensor size, the pinhole camera model in [48] can be used to find the real dimensions of the road surface in each image.
This information is required to determine the size of the crack in each image. As stated in Section 3, images were taken from a distance of 1.5 to 2 meters above the cracked asphalt. For instance, for the camera used to collect the images in this work, a 640 × 480 pixel image would cover a block of 1036 mm × 915 mm to 1280 mm × 1097 mm on the road. In that case, each Tl would cover an area of 51.8 mm × 61 mm to 64 mm × 73.1 mm. Therefore, eliminating the isolated Tls in the CMA excludes cracks with a maximum length of 80.02 mm to 97.1 mm, which is the diagonal of a single Tl. On the other hand, crack width is another quantity that different protocols require for defining the severity of a deficiency. For instance, the AASHTO protocols (PP44-00 and PP67-10) define three levels of damage severity. Level 1 is defined as cracks with a width of less than 3 mm, level 2 refers to cracks with a width between 3 mm and 6 mm, and level 3 is cracks with a width greater than 6 mm. Moreover, the three major types of cracks in most protocols are longitudinal, transverse, and alligator cracks [36,49]. Although converting the length and width of cracks to a percentage of lane area is straightforward for transverse and longitudinal cracks, there is no established way to apply the same method to alligator cracks. Hence, some protocols, such as HPMS, LTPP, and PP44-00, focus on transverse and longitudinal cracks and treat alligator cracks as a combination of those two types. With the equipment used in this paper, it is possible to detect cracks with a minimum width of 2 mm and a maximum width of 30 mm in images taken from 1.5 to 2 meters above the pavement surface. The proposed algorithm places no limitation on the length of the crack. Moreover, the proposed algorithm is capable of detecting all three types of cracks defined in widely accepted protocols.
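The tile footprint and the severity levels quoted above can be checked with a short Python sketch. The ground coverage values come from the text; the function names are hypothetical, and assigning widths of exactly 3 mm and 6 mm to level 2 is an assumption, since the text does not say where the boundaries fall.

```python
import math


def tile_footprint_mm(ground_w_mm, ground_h_mm, n_cols=20, n_rows=15):
    """Width, height, and diagonal (mm) of one tile on the road surface."""
    width = ground_w_mm / n_cols
    height = ground_h_mm / n_rows
    return width, height, math.hypot(width, height)


def severity_level(crack_width_mm):
    """Severity level by crack width, following the three levels in the text.

    Boundary handling (3 mm and 6 mm fall in level 2) is an assumption.
    """
    if crack_width_mm < 3:
        return 1
    if crack_width_mm <= 6:
        return 2
    return 3


near = tile_footprint_mm(1036, 915)    # lower end of the coverage range
far = tile_footprint_mm(1280, 1097)    # upper end of the coverage range
print(near, far)
```

The diagonals come out at roughly 80 mm and 97 mm, in line with the maximum length of an excluded isolated crack given in the text.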

Experimental Results
In this section, the test results of the trained CNN are presented in two subsections: (i) the performance of the algorithm was tested on several single images with various crack shapes, and (ii) a real-time mapping evaluation was done on a video captured from a randomly selected pavement roadway. The test images were not taken from the training and testing database, were not filtered or modified, and were taken under varying light conditions and camera positions. In this paper, MATLAB was used for training and optimizing the CNN and for implementing the CMA.

Real-Time Mapping Evaluation. To evaluate the CMA performance in real time, a video was captured using a DJI Phantom 4 drone with a mounted camera. The CMA was applied to the image frames of the captured video. The drone was set to an altitude of 2.5 m above the ground. The camera was perpendicular to the ground, and the video was recorded at a resolution of 640 × 480 and 120 fps. Since the drone was flown at a slow speed above the crack, and owing to the high frame rate, the frame interval for processing was set to 60 frames. A 30.48 cm ranging rod with orange and white bands was set beside a road crack, and the drone was flown 3.6 meters in the forward direction. The laptop used for processing the captured videos was an Alienware 15 R3 with an NVIDIA GTX 1070 GPU and an Intel Core i7 processor. As shown in Figure 17, the algorithm was able to detect and map deep cracks and to avoid oil spills on the road surface. Referring to Figure 17, each frame was taken every 0.2 milliseconds. A total of 5 frames could be processed per second, which covers 0.6 m of the road pavement. This indicates that the maximum speed of the real-time mapping with the current hardware is 11.1 km/h. This speed, however, can be increased with the advent of lighter and more powerful graphical and processing units. Figure 18 shows a video obtained from the drone footage with real-time crack mapping segments produced by the CMA.
The algorithm proposed in this work improves upon existing work [10] by integrating crack detection with crack mapping using image segmentation and classification within a CNN architecture. In addition, the optimized CNN architecture proposed here uses a significantly lower number of filters in the convolution layer (256), leading to reduced computational demand in both CNN training and real-time processing. As mentioned in Section 6, the BOA is used to compute the HPs. To verify that the selected optimal values maximize the CNN accuracy, all HPs were perturbed during the training process by ±5%, ±10%, ±20%, and ±30% white noise. As shown in Table 1, perturbing the HPs by ±5% decreases the accuracy by about 1%, while increasing the perturbation to ±30% decreases the accuracy by at most 8%.
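The perturbation test can be pictured with the following Python sketch. It is an illustration under stated assumptions rather than the authors' procedure: the noise is drawn uniformly within the stated percentage band, the hyperparameter names and values are hypothetical placeholders, and the retraining step that would follow each perturbation is omitted.

```python
import random


def perturb(hps, fraction, rng):
    """Scale each hyperparameter by a random factor in [1 - fraction, 1 + fraction]."""
    return {
        name: value * (1.0 + rng.uniform(-fraction, fraction))
        for name, value in hps.items()
    }


# Hypothetical placeholder values, not the optimum reported in the paper.
base_hps = {"learning_rate": 0.01, "l2_regularization": 0.001}

rng = random.Random(0)  # fixed seed so the perturbations are repeatable
for fraction in (0.05, 0.10, 0.20, 0.30):
    perturbed = perturb(base_hps, fraction, rng)
    # A full check would retrain the CNN with `perturbed` and record accuracy.
    print(fraction, perturbed)
```

Comparing the accuracy obtained at each perturbation level with the accuracy at the unperturbed values is what produces a table like Table 1.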

Conclusion
In this paper, an algorithm for mapping road cracks in real time using convolutional neural networks was proposed and tested. The authors gathered the database for this work, and owing to limited available resources, the size of the database was limited to 6695 images. The convolutional neural network in this work was optimized using the Bayesian optimization algorithm. A heuristic algorithm for real-time crack mapping was introduced and tested on different images with complicated crack positions and orientations. Also, a video was recorded and processed to test the real-time ability of the algorithm. The database was carefully selected and curated in this work, and the authors attempted to include a robust population of crack images to improve the selection and classification power of the CNN. However, this study is limited to only one block size and 13 classification categories. Certainly, for commercial applications, increasing the number of images in the training database and increasing the computing power will allow users to reduce the size of the tiles and increase the number of classification categories, which may further refine the smoothness of the mapping segments. The mapping results from the CMA may also be used for crack type classification and causation, for analyzing what type of asphalt is more prone to cracking, what types of asphalt are more suitable for different road conditions with respect to the traffic, and how to choose the best asphalt for various conditions, and finally for estimating the repair and protection costs for each individual road type.
Analyzing the crack propagation patterns based on geographical information of the road using the CMA provides more analytical information, in combination with other data, that could help during the decision-making process for road construction.
Figure 1: Flow chart for crack detection and mapping. Training stage: categorizing tiles based on crack type, building data batches using 13 different categories, optimizing hyperparameters using the Bayesian method, and training the CNN classifier using SGDM. Implementation and results stage: acquiring new images/videos, trained CNN for classifying tiles, mapping classified cracks with the CMA, and mapped cracked image.

Figure 2: (a) Categorization method based on the position of a crack in each tile, indicating a defective section. (b) Samples of tiles in 12 categories. (c) Sample images of noncracked tiles (13th category).

Figure 4: Block diagram of optimizing HPs using BOA.

Figure 5: (a) The observation function model. (b) Acquisition function for two parameters of the SGDM. The starred points are the optimal values calculated at the 69th iteration for λ and η.

Figure 6: Evolution of the minimum observed objective and the estimated minimum objective over 100 iterations in total, where the minimum calculated value is 0.07.

Figure 7: (a) Training the CNN with the optimized calculated values reached 97% accuracy after 10 epochs. (b) Loss function output during training; the error value is 0.033 after 10 epochs.

Figure 8: 24 randomly extracted features in the last convolution layer.

Figure 10: 24 randomly selected tiles from the test set. The category and its confidence are shown on top of each tile.

Figure 13: Applying the CMA to unify the crack mapping.

Figure 14: Blue lines show the raw mapping of the cracks. The crack map is not unified, and there are isolated mapped cracks in the image. Red lines show the improved mapping of the cracks using the CMA: isolated cracks are eliminated, and the crack map is unified and smoothed.

5.1. Single Image Mapping Evaluation. As depicted in Figure 15, samples of three images having typical types of cracks with vertical (Figure 15(a)), horizontal (Figure 15(b)), and diagonal (Figure 15(c)) orientations were used to test the CMA. There are no isolated tile cracks, and the connected mapped cracks cover the main crack with substantial accuracy.

Figure 17: Four processed frames with detected and mapped cracks. The mapping algorithm was able to separate all four significant cracks and map them.
To test the algorithm's performance under different weather and illumination conditions, multiple cracks were selected and photographed in different situations. Figures 16(a)-16(f) show the wet, bright, and dark conditions, respectively. The CMA could map cracks regardless of the road condition.

Figure 18: The real-time video of the CMA is available at this link: https://youtu.be/SGh03B5EEpk.
Figure 12: Raw plot based on the position of each crack in a single tile for all 12 categories that contain a crack.

Table 1: Retraining the CNN by changing the HPs to validate the values calculated by the BOA.