1 Introduction

Pavement management is the process of planning the maintenance and repair of roads so as to keep the road network in an optimal state in terms of life-cycle costs. Since pavement management requires complex decisions involving resource allocation and work scheduling, a software tool called a pavement management system (PMS) [33] is used to support these decisions. The key to successful pavement management is to detect and repair defects on roads in a timely manner so that pavement deterioration does not compromise safe driving. In large cities with high traffic, fast crack detection and maintenance become even more important because more vehicles are exposed to cracks and the severity of defects worsens rapidly under heavy traffic. When there are too many roads to manage with limited resources, quantified decision-making based on risk assessment is required to handle these situations effectively. Determining the severity of road cracks is very important for a proper risk assessment. To achieve an effective risk assessment in PMS, the severity assessment should be accompanied by crack type detection because different types of road cracks have different severity assessment methods. As summarized in Fig. 1, pavement management basically consists of four phases. During the data collection phase, large numbers of road surface images are continuously captured from patrol vehicles via road scanner or camera equipment, along with location information (GPS data) for that section of the road. In the case of Seoul, the storage required to scan 8,323.7 km of paved roads at 1-m intervals in 2020 was about 370 TB. This data volume easily doubles as the number of lanes and the number of scans per year increase. Data collection is usually done automatically in most countries using dedicated hardware. During the crack detection phase, the crack type is determined along with the presence or absence of cracks in the images.
Although many recent studies have reported achievements related to automated crack detection [2, 32], to the best of our knowledge, these results have not yet been applied to PMS and appear to be still in the research phase. The crack detection phase of the Korean PMS also still relies heavily on visual inspections performed by many operators, making it a bottleneck that causes significant delays in the overall process. Many operators classify crack types in road images and manually mark crack pixels. However, recent advances in crack detection research are moving this phase from the conventional manual process to a semi-automatic process, depending on the country. In the severity assessment phase, the manually marked pixel data are used to evaluate the severity of individual cracks, and the overall severity of that road segment is then determined. Finally, in the maintenance planning phase, a repair schedule for road sections with high-severity cracks is generated by reflecting available resources [8].

Fig. 1 The main process of road pavement management

To accelerate the entire pavement management process, it is necessary to expedite crack detection and severity assessment, which are bottlenecks that depend heavily on human resources. Recently, AI-related research to assist or replace human tasks has been actively conducted in various industries. In particular, research on deep neural networks (DNNs) that exploit the high-performance computing power of GPUs has shown good results and is expanding its field of application. For the detection and classification of road cracks, studies using DNNs to determine the presence of cracks or to classify crack types have been actively conducted, and results with significantly improved accuracy are being published [1, 6, 7, 13]. However, since many studies have been limited to classifying a relatively small number of crack types, the number of classifiable crack types must be expanded for actual field application. In addition, research on automatic estimation of crack severity is still in its early stages, making it difficult to fully automate the process within PMS. To automate the estimation of crack severity, it is necessary to accurately delimit the crack region in the road image and calculate the degree of deformation according to the crack type. In this paper, the authors focus on implementing practical applications for automating crack detection and severity assessment in PMS while adopting and extending existing research. The authors have expanded the crack types that can be classified to five categories so that the system can be used in the field: alligator crack (AC), longitudinal crack (LC), transverse crack (TC), pothole, and patching. They have also built a pilot system that can determine the severity of cracks by identifying crack regions through object detection.
In this pilot system, DNN-based image segmentation is performed to clearly delineate crack regions in the input image for severity assessment, and crack classification is performed before segmentation to improve segmentation accuracy by reflecting the characteristics of each crack type. The overall accuracy of the pilot system reached 91.2% on 1330 test images, which appears sufficient for actual fieldwork. The paper is organized as follows: Sect. 2 introduces background knowledge and prior research related to this study. Section 3 describes the model details and experimental results of the proposed approach to assess the severity of road cracks. The overall structure of the pilot system for analyzing road cracks is presented in Sect. 4. Finally, the conclusion in Sect. 5 briefly reviews the contributions of this study and presents suggestions for further research.

2 Related works

The configuration and relative frequency of cracks in asphalt roads can vary depending on several factors, such as the root cause, the surrounding climate, and the road usage pattern. For this reason, many countries want to customize the types and severity measures of road cracks to better reflect their own situation and to manage them systematically. In Korea, as shown in Fig. 2, the five crack types that occur most frequently and require strict management have been identified [13, 32]. Alligator crack, or crocodile crack, is a common type of distress in asphalt pavement, characterized by interconnected or interlaced cracking in the asphalt layer that resembles the pattern of crocodile hide. Longitudinal crack is a form of distress whose direction is typically parallel to the edge of the pavement shoulder. Transverse crack occurs roughly perpendicular to the centerline of the pavement, mainly due to shrinkage of the asphalt layer or reflection from an existing crack. A pothole is a hole of varying size and shape that occurs when a weak spot in the asphalt layer collapses or is displaced by the weight of passing vehicles. The term patching refers to the process of filling potholes or overlaying excavated areas in the asphalt pavement. Although patching is not an actual asphalt distress, it is managed as an important crack type in Korea because, when the surface patch is thin, the underlying crack is likely to be exposed again through wear and tear.

Fig. 2 Five types of road cracks [32]

Road crack analysis initially identifies the type and location of cracks from the input image, and the severity is determined by measuring the maximum width of the crack or by calculating the area of the distress according to the type of crack. Many existing studies on road crack detection using DNN still focus only on the classification of crack types and do not cover all five types mentioned above [8, 12, 16]. In particular, research on patch detection is difficult to find, so additional research is needed in Korea, where patching is to be managed as a major crack type. Unlike previous studies that use a separate analysis algorithm to identify crack segments while evaluating the severity of each crack type [1, 2, 10], this study uses an object detection technique to simultaneously handle the classification of crack types and the confinement of crack regions.

2.1 Crack detection using deep learning

Many studies have tried to detect cracks in objects such as concrete walls, bridges, pipelines, glass, and asphalt pavements by combining image processing techniques and deep learning [2, 3, 4, 6, 8, 10, 11, 12, 13, 24, 25, 26, 27, 30, 34]. Basically, crack detection is performed in three main steps: pre-processing, detection, and classification. In these studies, images were pre-processed by applying conventional image processing techniques such as smoothing, normalization, and filtering. In the detection step, the existence of a crack is determined by applying analytical or logical methods such as the Otsu method [28], statistical approaches, and thresholding. Much of the research on road crack detection falls into this category: it determines the existence of cracks by incorporating various segmentation techniques to improve image quality, or it generates crack regions from the input image in the form of bounding boxes [35, 36, 37, 39], but it does not determine the individual crack type. Some studies utilize additional types of input data, such as acoustic-sensor data [38] or 3D scanned data [40], to better detect cracks hidden below the surface. Finally, the actual multi-type classification is done using deep learning methods such as convolutional neural networks (CNNs) or using various mathematical techniques. Approaches using analytical or logical methodologies are generally fast, but have an accuracy of about 80 to 90%, which is somewhat insufficient for practical use [20].

In CNN approaches, detection is usually performed together with classification rather than as a separate step, or a segmentation step that finds crack regions or contours in the input image is included in place of the detection step. Although CNN approaches require more computational resources than analytical or logical methods, they achieve improved accuracy of more than 90% through the development of new network models and continued training on accumulated data. Studies on the detection of cracks in asphalt pavements have shown a trend similar to that in other fields of crack research [3, 18]. Zou et al. proposed a deep convolutional neural network (DCNN) called DeepCrack, based on an encoder-decoder architecture that employs hierarchical multi-scale features for automatic crack detection in pavement and stone surface images. DeepCrack is reported to achieve an F-measure over 0.87 on the test dataset [21]. Feng X. et al. proposed a method based on a DCNN fusion model, which combines the advantages of the multitarget single-shot multibox detector (SSD) CNN model and the U-Net model.

Segmentation and crack type classification are carried out sequentially in this model. Test results for this fusion model show that the recognition accuracy for TC, LC, and AC is 86.8%, 87.6%, and 85.5%, respectively [22]. Although this model performs crack detection and classification for three types with relatively high accuracy, it has the disadvantage of requiring substantial computational capacity because it is a large model with many parameters. Hu G.X. et al. conducted several experiments applying a set of YOLOv5 object detection models to pavement crack detection. In their experiments, the YOLOv5l model recorded the highest detection accuracy of 88.1%, and the YOLOv5s model recorded the shortest detection time of 11.1 ms per image [23].

In terms of road pavement management, if the existing studies are simply grouped according to the functions they are dealing with, most of them can be classified into three types: ‘detection—classification’, ‘segmentation—classification’, and ‘detection—severity assessment’. As mentioned earlier, there are many studies focusing on classification, and few studies on severity assessment [29, 31]. Implementing PMS, however, requires handling both classification and severity assessment for all major crack types, including pothole and patching.

2.2 Road crack classification and severity assessment

To quickly implement classification and severity assessment together in PMS, it is practical to use a combination of methods proposed in previous studies. This approach, however, has a drawback in that the size of model increases due to the combined use of existing networks. Because on-site reprocessing is required to validate the PMS results, the size of the model must be considered so that it can operate even with limited computing resources of the mobile terminals. Jo, H. et al. published an experiment comparing the performance of two self-designed CNNs and SqueezeNet in the process of classifying crack severity into high, medium, and low for 7 crack types. As is well known, SqueezeNet is useful for applications with memory or computational limitations and shows good performance while achieving AlexNet-level accuracy on ImageNet dataset. Although SqueezeNet is a CNN designed to have a small number of parameters, it showed relatively high accuracy in classifying the severity of road cracks in their experiment [6, 19]. Ha J. et al. conducted various experiments on segmentation and classification to detect road cracks and proposed a network using U-Net and Mobilenet-SSD together [32]. U-Net is a CNN developed for biomedical image processing to work with fewer training images and to yield segmentations that are more precise. Because U-Net adopts a patch method that splits the entire image into grid tiles and processes them separately, it shows faster processing speed than the conventional sliding window method that frequently recalculates the overlapped window area. In particular, the authors believed that U-Net’s ability to segment cell structures well would be suitable for segmenting lattice-shaped cracks such as alligator cracks [14, 32]. The authors conducted object detection using an SSD model to confine the crack region and assess the severity. 
SSD is a representative object detection model that uses a single-stage approach to detect multiple objects in an input image, as opposed to two-stage models that use a region proposal mechanism, such as R-CNN and Fast(er) R-CNN. SSD is known for its speed and simplicity; it uses feature maps of different sizes to provide high detection performance while minimizing the impact of variations in object size within the image. In their experiment, the SSD300 model with Mobilenet v1 as the backbone was used to achieve fast operation while maintaining good performance even with limited computing resources [5, 9, 15]. Because this study extends their work so that severity assessment can be performed, it is necessary to briefly review the experiments they conducted. Figure 3 outlines two experiments conducted by Ha J. et al. They compared the performance of two Mobilenet-SSD networks trained separately on the original images and the mask images to detect five types of cracks. The results of the two networks are briefly summarized in Table 1. The mAP (mean Average Precision) values obtained from the two networks were 0.6818 and 0.9382, respectively. The network trained on the original images did not properly identify the crack types, but the network trained on the mask images identified them with very high accuracy. This result indicates that masking of images is very important for crack identification using object detection.

Fig. 3 Two networks trained on different image datasets

Table 1 Performance comparison of two networks trained on different image datasets

Thus, a good segmentation method is needed to obtain a mask image automatically from the original image without manual intervention. For this reason, Ha J. et al. added U-Net, FPHBN [17], and FPN networks for crack segmentation in front of the object detection network and compared their performance. However, even with U-Net, which showed the best result among the three, the mIoU (mean Intersection over Union) was only about 0.4256, so the cracks could not be properly segmented. To improve crack segmentation performance, another experiment was performed with the configuration shown in Fig. 4 below, using different segmentation networks for linear cracking (AC, LC, and TC) and area cracking (patching and pothole). Significant performance improvements were achieved with this new configuration, and the results are summarized in Table 2 below.
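For reference, an mIoU value such as the one quoted above can be computed from pairs of predicted and manually created binary masks. The following is a minimal sketch and not the evaluation code used in the cited study: the function names `iou` and `mean_iou` are illustrative, masks are assumed to be nested lists of 0/1 values, and averaging over images (rather than over classes) is an assumption.

```python
def iou(pred, truth):
    """Intersection over Union of two equally sized binary masks (lists of rows)."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) mask pairs."""
    scores = [iou(pred, truth) for pred, truth in pairs]
    return sum(scores) / len(scores)


pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 0]]
print(iou(pred, truth))  # intersection of 2 pixels over a union of 3 pixels
```

A thin crack occupies few pixels, so even a small offset between the predicted and the manual mask lowers the IoU noticeably, which is one reason segmentation quality matters so much for the later severity step.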

Fig. 4 Separated segmentation networks for object-detection

Table 2 Test result of separated segmentation approach

The study of Ha et al., however, still needs to add severity assessment for the implementation of automated PMS. In addition, to classify the input images fed to the two U-Nets that perform segmentation by crack type, we need another classifier in front of the U-Nets. The remainder of this paper deals with the extension of the above network and its experimental results.

3 Extensions for assessing crack severity

The severity criteria for road cracks can vary from country to country, but in general, depending on the type of crack, either the maximum crack width or the relative ratio of the damaged area is calculated and classified into three levels: low, medium, and high. High-severity cracks can lead to dangerous situations, so it is very important for PMS to detect and repair them quickly. Table 3 summarizes the severity assessment criteria used in this study, which were established with the help of a construction expert from South Korea and with reference to previous studies [13]. The authors applied the criteria in Table 3 to determine crack severity for input images, each representing an area 0.6 m wide and 1.06 m long.

Table 3 Criteria for assessment of road crack severity

3.1 Severity of linear cracking

The severity of linear cracking such as AC, TC, and LC is assessed based on the maximum width or thickness of the crack line. The actual size of a pixel in the input image can be easily calculated from the size of the captured area. In this study, since an area 0.6 m wide and 1.06 m long is captured as a 224 × 224 image, 1 pixel corresponds to about 2.68 mm in width and 4.73 mm in height. Crack severity is estimated as high, medium, or low based on the maximum crack width calculated from the segmented image. The accuracy of the model is measured by comparing this estimated severity with the severity obtained from the manually created mask image. Figure 5 briefly summarizes the severity estimation process for linear cracking used in this study.
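The pixel-size arithmetic above, and one simple way of turning a segmented mask into a width in millimetres, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the run-length measurement, the function names, and the convention that a longitudinal crack runs from top to bottom of the image are all hypothetical, and the mask is assumed to be a nested list of 0/1 values.

```python
IMAGE_SIZE_PX = 224      # input images are 224 x 224 pixels
AREA_WIDTH_M = 0.6       # real-world width covered by one image
AREA_LENGTH_M = 1.06     # real-world length covered by one image


def pixel_size_mm(extent_m, pixels=IMAGE_SIZE_PX):
    """Real-world size of one pixel along an axis, in millimetres."""
    return extent_m * 1000.0 / pixels


def longest_run(values):
    """Length of the longest run of consecutive crack (non-zero) pixels."""
    best = run = 0
    for v in values:
        run = run + 1 if v else 0
        best = max(best, run)
    return best


def max_crack_width_mm(mask, orientation):
    """Rough maximum crack width for a binary mask given as a list of rows.

    Assumption: a 'longitudinal' crack runs top to bottom, so its width is
    measured along rows; a 'transverse' crack runs left to right, so its
    width is measured along columns.
    """
    if orientation == "longitudinal":
        return max(longest_run(row) for row in mask) * pixel_size_mm(AREA_WIDTH_M)
    if orientation == "transverse":
        return max(longest_run(col) for col in zip(*mask)) * pixel_size_mm(AREA_LENGTH_M)
    raise ValueError(f"unknown orientation: {orientation}")


print(round(pixel_size_mm(AREA_WIDTH_M), 2))   # 2.68
print(round(pixel_size_mm(AREA_LENGTH_M), 2))  # 4.73
```

Because the pixels are not square in real-world units, the same run of pixels corresponds to a different physical width depending on the axis along which it is measured; a diagonal or interlaced crack such as AC would need a more careful measurement than this axis-aligned sketch.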

Fig. 5 Severity estimation for linear cracking

The severity estimation for the 4383 linear cracking images (AC: 1755, LC: 1123, TC: 1505) achieved an accuracy of 94.39%. This high accuracy shows that automatically segmented images can be used in practice for crack severity estimation instead of manually created mask images. Table 4 summarizes the severity estimation results for linear cracking.

Table 4 Result of severity estimation for linear cracks

3.2 Severity of area cracking

The severity of area cracking such as pothole and patching is assessed by calculating the proportion of distress region to the whole image. The crack severity of area cracking is also estimated as high, medium, and low for segmented images, just like linear cracking. The accuracy is again measured by comparing this estimated severity for the segmented image with the severity obtained from the manually-created mask image. Figure 6 outlines the severity estimation process for area cracking used in this study.

Fig. 6 Severity estimation for area cracking

In this study, the severity is calculated as the ratio of the crack area to the total image area, but it can also be defined as the ratio of the crack area to the bounding box found through object detection. In such cases, the threshold values that define the severity levels should be adjusted accordingly. The severity estimation for the 2267 area cracking images (pothole: 1562, patching: 705) achieved an accuracy of 89.68%. Although this is lower than the accuracy for linear cracking, it is still sufficient for practical use. Table 5 summarizes the severity estimation results for area cracking.
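The two ways of defining the area ratio described above can be compared with a short sketch. The code is illustrative only: `area_ratio` and `severity_level` are hypothetical helper names, the mask is assumed to be a nested list of 0/1 values, and the cut-off values are placeholders chosen for the example, not the criteria of Table 3.

```python
def area_ratio(mask, bbox=None):
    """Fraction of crack pixels in the whole mask, or inside bbox if given.

    bbox is (top, left, bottom, right) with exclusive bottom/right indices.
    """
    if bbox is None:
        top, left, bottom, right = 0, 0, len(mask), len(mask[0])
    else:
        top, left, bottom, right = bbox
    total = (bottom - top) * (right - left)
    crack = sum(1 for row in mask[top:bottom] for v in row[left:right] if v)
    return crack / total


def severity_level(ratio, medium_from, high_from):
    """Map a ratio to 'low', 'medium' or 'high' using caller-supplied cut-offs."""
    if ratio >= high_from:
        return "high"
    if ratio >= medium_from:
        return "medium"
    return "low"


# Placeholder cut-offs for illustration only; they are NOT the values in Table 3.
MEDIUM_FROM, HIGH_FROM = 0.10, 0.30

mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
whole = area_ratio(mask)                     # 4 of 16 pixels
boxed = area_ratio(mask, bbox=(1, 1, 3, 3))  # 4 of 4 pixels
print(whole, severity_level(whole, MEDIUM_FROM, HIGH_FROM))
print(boxed, severity_level(boxed, MEDIUM_FROM, HIGH_FROM))
```

The same mask yields a much larger ratio when the bounding box is used as the denominator, which illustrates why the severity thresholds have to be recalibrated if the definition is changed.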

Table 5 Result of severity estimation for area cracking

4 Automatic crack analysis for PMS

Through our experiments, it was confirmed that severity estimation can be performed with high accuracy even when automatically segmented images are used. For the crack image to be properly segmented, however, the input image must first be classified as linear or area cracking and fed to the corresponding segmentation network. That is, severity assessment in PMS requires a classifier that determines at the outset whether the input image shows linear or area cracking. In this study, the SqueezeNet-based classifier trained in the previous study was used [6, 19]. Figure 7 shows the overall architecture of the proposed system for automatic crack analysis.

Fig. 7 Automatic crack analysis system

For the training of SqueezeNet and U-Net, the parameter values summarized in Table 6 were used. These values were taken from the best training result among randomly sampled values within the search range.

Table 6 Parameter values for experiment

The SqueezeNet used for classification was trained with 5320 images, which account for 80% of the 6650 images in the dataset. It determines whether the input image shows linear or area cracking with an accuracy of 99.6%. The image classified by SqueezeNet is fed to the dedicated U-Net responsible for segmenting linear cracking or area cracking, which generates a black-and-white segmented image. This segmented image is then fed to Mobilenet-SSD, which performs object detection to determine the detailed crack types and crack regions. Finally, the detailed crack types and crack regions are used together with the segmented image to evaluate the crack severity. Table 7 shows the final accuracy of our automatic crack analysis system. For this result, fivefold cross-validation was carried out, and the average accuracy is given in Table 7.
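The data flow described above (group classifier, dedicated segmenter, object detector, severity assessment) can be outlined as a small orchestration sketch. Everything in it is hypothetical scaffolding: the class name `CrackAnalysisPipeline`, its fields, and the toy lambdas are stand-ins for the trained SqueezeNet, U-Net, and Mobilenet-SSD models, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Detection = Tuple[str, Tuple[int, int, int, int]]  # (crack type, bounding box)


@dataclass
class CrackAnalysisPipeline:
    classify: Callable[[Image], str]                 # returns "linear" or "area"
    segmenters: Dict[str, Callable[[Image], Image]]  # one segmenter per group
    detect: Callable[[Image], List[Detection]]       # runs on the segmented mask
    assess: Callable[[Image, Detection], str]        # returns a severity level

    def analyze(self, image: Image) -> List[Tuple[str, str]]:
        group = self.classify(image)                 # step 1: linear vs. area
        mask = self.segmenters[group](image)         # step 2: dedicated segmenter
        detections = self.detect(mask)               # step 3: type and region
        # step 4: severity per detected crack, using the mask and the detection
        return [(det[0], self.assess(mask, det)) for det in detections]


# Toy stand-ins so the sketch runs; real models would replace these lambdas.
pipeline = CrackAnalysisPipeline(
    classify=lambda img: "linear",
    segmenters={"linear": lambda img: img, "area": lambda img: img},
    detect=lambda mask: [("LC", (0, 0, len(mask), len(mask[0])))],
    assess=lambda mask, det: "low",
)
print(pipeline.analyze([[0, 1], [0, 1]]))  # [('LC', 'low')]
```

Keeping each stage behind a plain callable makes it straightforward to swap a stage, for example replacing the two segmenters and the detector with a single network if one that segments and classifies together becomes available.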

Table 7 Overall performance of crack analysis system

The final accuracy represents the percentage of test images for which both the crack type and the crack severity were predicted correctly, measured on the 1330 images (20% of the total) set aside for testing. For linear cracking, testing with 858 images showed that both crack type and crack severity could be determined with 93.27% accuracy. For area cracking, an accuracy of 87.43% was recorded for 472 images. In total, the proposed crack analysis system achieved 91.2% accuracy for both crack type classification and crack severity assessment. Since the accuracy of the classification done by SqueezeNet is about 99.6%, the remaining 0.4% of test images are misclassified as the wrong cracking group and fed to the wrong U-Net for segmentation. Of those 5 misclassified test images, 4 (80%) were correctly reclassified into their original crack types at the object detection stage performed by Mobilenet-SSD. If an image is initially misclassified, the segmentation result may be of poor quality, leading to the wrong severity level. More test data will likely be needed to better evaluate the detailed performance of the proposed system, especially with respect to the impact of misclassification on severity assessment.
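The figures in this paragraph are mutually consistent, as the following arithmetic check shows. It assumes that the overall accuracy is the image-count-weighted average of the two group accuracies; the text does not state this explicitly, so the weighting is an assumption, and the variable names are illustrative.

```python
# Figures quoted in the text.
linear_images, linear_acc = 858, 93.27
area_images, area_acc = 472, 87.43
classifier_acc = 99.6

total_images = linear_images + area_images
# Assumption: overall accuracy is weighted by the number of test images per group.
overall_acc = (linear_images * linear_acc + area_images * area_acc) / total_images
# Number of test images expected to be sent to the wrong segmentation network.
misrouted = round(total_images * (100 - classifier_acc) / 100)

print(total_images)           # 1330
print(round(overall_acc, 1))  # 91.2
print(misrouted)              # 5
```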

In this system, a SqueezeNet classifier is used to classify the input into linear cracks and area cracks before segmentation, but after segmentation, detailed classification of crack types is performed again using Mobilenet-SSD. This partially overlapping functionality needs to be further optimized through continued research and the introduction of new networks that segment and classify cracks together. Until then, it seems practical to use existing networks in combination to satisfy field needs.

5 Conclusions

In this study, the existing research has been advanced to enable the realization of PMS by expanding the crack types into five types and performing crack severity assessment with high accuracy through automated segmentation. The contribution of this study is significant in that it can perform crack detection, classification, and severity assessment in one system with high accuracy by synthesizing studies related to road cracks that have not been effectively combined so far. To configure the whole system, SqueezeNet, U-Net, and Mobilenet-SSD have been combined. By using two U-Nets to separate segmentations for linear cracking and area cracking, the accuracy of crack severity assessment has been improved to 94.39% for linear cracking and 89.68% for area cracking. The final system achieved an accuracy of 91.2% for both the assessment of crack severity and the classification of crack type.

The data used for training and testing in this study are 2D images without any depth information. Augmenting the data with depth information could further improve the results and allow the severity of cracks such as patching and potholes to be assessed better. Although the authors have tentatively considered using pixel darkness to infer crack depth from the 2D images, this can be a challenging task because the lighting conditions at capture greatly affect the brightness of the whole image. With the introduction of 3D road scanners, it is expected that higher accuracy can be achieved in the near future by utilizing the depth information from the captured 3D model of the road pavement. The authors put considerable effort into using small and efficient networks, but the use of many network modules has inevitably led to a large system. In particular, since the classification network added to improve segmentation performance overlaps with the identification of crack types in the object detection network, an in-depth study should be conducted to reduce the size of the overall system.