Automatic Segmentation and Enhancement of Pavement Cracks Based on 3D Pavement Images

Pavement cracking is a significant symptom of pavement deterioration and deficiency. Conventional manual inspections of road condition are gradually being replaced by automated inspection systems. As a result, a great amount of pavement surface information is digitized by these systems at high resolution. With pavement surface data, pavement cracks can be detected using crack detection algorithms. In this paper, a fully automated algorithm for segmenting and enhancing pavement cracks is proposed, which consists of four major procedures. First, a preprocessing procedure is employed to remove spurious noise and rectify the original 3D pavement data. Second, crack saliency maps are segmented from the 3D pavement data using a steerable matched filter bank. Third, 2D tensor voting is applied to the crack saliency maps to achieve better curve continuity of crack structures and higher accuracy. Finally, postprocessing procedures are used to remove the remaining noise. The proposed procedures were evaluated on 200 asphalt pavement images with diverse cracks. The experimental results demonstrated that the proposed method achieved an average precision of 88.38%, recall of 93.15%, and F-measure of 90.68%. Accordingly, the proposed approach can be helpful in automated pavement condition assessment.


Introduction
Effective and efficient pavement condition assessment is crucial for determining pavement maintenance schedules, evaluating performance, planning rehabilitation, etc. Because pavement cracking is an important indicator of pavement deterioration and deficiency, it is widely considered an integral part of regional pavement distress surveys [1]. Many studies show that timely and accurate inspection of pavement cracks can help transportation agencies reduce road maintenance costs and extend pavement service life [2,3].
In some developing countries, pavements are mainly investigated by human inspectors [4]. Traditional manual pavement inspection is unsafe, time-consuming, expensive, and subjective. Hence, automation in pavement inspection and evaluation has become increasingly popular and dominant [5]. Two types of imaging techniques are extensively adopted in automated pavement data collection: two-dimensional (2D) imaging technologies and three-dimensional (3D) imaging technologies. The early 2D imaging based pavement detection systems [6,7] were developed by integrating hardware such as line-scan cameras, laser illumination systems, and other auxiliary equipment. With the emergence of advanced technologies such as high-speed and high-resolution 3D industrial cameras, pavement inspection methods based on 3D scanning have attracted more and more interest for the following reasons: (1) The surface information in 3D images collected by advanced data acquisition systems is more accurate than that in 2D images. Figure 1 shows a comparison between 2D and 3D pavement images. The 2D images collected in gray-scale formats have a limited data range, while the 3D images are able to represent the actual depths of pavement surfaces.
Due to recent developments and innovations in hardware devices, laser line-scanning based techniques are becoming mature enough for high-resolution 3D pavement data collection. Laurent et al. [13] developed a Laser Crack Measurement System (LCMS) composed of two laser profilers to acquire high-resolution 3D road surface data. Moreno et al. [14] proposed an electric vehicle equipped with a laser scanner to achieve a high density of surveyed points. Furthermore, the PaveVision3D System mounted on the Digital Highway Data Vehicle (DHDV) (Figure 2) is able to obtain full-lane-scale 3D data at 1 mm resolution at highway speeds up to 100 km/h during both daytime and nighttime [15,16].
Although automation in pavement data collection has achieved remarkable progress, automated distress detection still faces great challenges due to the complexity and diversity of pavement surfaces [17,18]. As a major task of distress surveys, automated crack detection has been studied for a long time. Intensity-thresholding methods have been proposed to transform pavement images into a binary domain such that the pavement distresses are easier to recognize [19,20]. However, those methods fail to handle images with unevenly distributed illuminance. Edge detection based methods, such as morphological filters [21] and BEMD [22], have also been introduced for pavement crack detection. Nevertheless, those methods tend to generate discontinuous or incomplete cracks. Wavelet-based approaches [23] have been utilized to decompose the original data into different frequency subbands. Unfortunately, those approaches have limitations in detecting discontinuous or high-curvature cracks. Currently, there are some successful applications of machine learning techniques, such as Artificial Neural Networks (ANN) and Support Vector Machines (SVM), in classifying cracks on pavement surfaces [24].
In the rest of this paper, the proposed method is first explained in detail. Second, an image library of 200 3D pavement images is used to verify the accuracy and effectiveness of the proposed method. Lastly, the discussion and conclusions are given.

Methodology
In this paper, all the testing and validation data are 3D pavement images collected by the PaveVision3D System. Each 3D image, 2048 × 4096 pixels in size, covers roughly 2 × 4 m² of surface area at 1 mm resolution. As shown in Figure 3, the proposed method comprises the following procedures: (1) Preprocessing techniques are utilized to remove noise and to rectify the pavement data. (2) A Steerable Matched Filter Bank (SMFB) is applied to the 3D pavement data to segment crack saliency maps. (3) 2D Tensor Voting is used to enhance crack continuity based on the crack saliency maps. (4) Postprocessing is conducted to remove false-positive errors.

Pavement Data Rectification.
Spurious noise removal and 3D pavement data rectification are needed at the first stage. A typical 2D Gaussian filter with standard deviation $\sigma$ is used for noise removal. Equation (1) gives each value of the 2D Gaussian filter at position $\mathbf{p}_0 = (x, y)$:

$$g(\mathbf{p}_0) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \tag{1}$$
In this case, the size of the filter is 3 × 3, and $\sigma$ is equal to 2. In order to determine the presence of spurious noise at each point $(x, y)$, the criterion in (2) is applied, where $I(x, y)$ is the original pixel value at the point $(x, y)$, $I_g(x, y)$ is the filtered pixel value at the point $(x, y)$, and thres is a given threshold.
After obtaining the denoised image $I_d(x, y)$, another large 2D Gaussian filter is applied to smooth the entire image. In this case, the filter size is 101 × 101, and $\sigma$ is equal to 80. Let $I_s(x, y)$ be the convolved image based on $I_d(x, y)$; then the rectified image is given by (3), where $I_r$ is the rectified pixel value at the point $(x, y)$ and $I_s$ is the convolved pixel value at the point $(x, y)$.
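The two preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that a pixel counts as spurious when it differs from its Gaussian-filtered value by more than thres and is then replaced by the filtered value, and that rectification subtracts the heavily smoothed surface from the denoised one. The function names (`gaussian_kernel`, `convolve2d`, `remove_spurious`, `rectify`), the replicated-border handling, and the toy data are hypothetical, and the example uses a small smoothing kernel instead of the 101 × 101 one to keep it fast.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def convolve2d(img, kernel):
    """Same-size filtering with replicated borders (the kernel is symmetric)."""
    h, w = len(img), len(img[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in range(-half, half + 1):
                rr = min(max(r + dr, 0), h - 1)
                for dc in range(-half, half + 1):
                    cc = min(max(c + dc, 0), w - 1)
                    acc += img[rr][cc] * kernel[dr + half][dc + half]
            out[r][c] = acc
    return out

def remove_spurious(img, thres, size=3, sigma=2.0):
    """ASSUMED criterion: use the filtered value where |original - filtered| > thres."""
    filt = convolve2d(img, gaussian_kernel(size, sigma))
    return [[f if abs(v - f) > thres else v for v, f in zip(rv, rf)]
            for rv, rf in zip(img, filt)]

def rectify(img, size, sigma):
    """ASSUMED rectification: subtract the heavily smoothed surface."""
    smooth = convolve2d(img, gaussian_kernel(size, sigma))
    return [[v - s for v, s in zip(rv, rs)] for rv, rs in zip(img, smooth)]

# Toy 7 x 7 tilted surface with a single spike standing in for an invalid laser point.
surface = [[0.5 * c for c in range(7)] for _ in range(7)]
surface[3][3] += 50.0
denoised = remove_spurious(surface, thres=10.0)
rectified = rectify(denoised, size=5, sigma=4.0)
```

With these assumptions the spike is pulled back toward its neighbours while pixels that already agree with their filtered value are left untouched, and a perfectly flat surface rectifies to zero everywhere.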
Figure 4 shows sample profiles in both the transverse and longitudinal directions. The top images (a) and (b) show the original pavement profiles. The bottom images (c) and (d) illustrate the rectified profiles based on (3). The red lines are their smoothed profiles.

Steerable Matched Filter Bank (SMFB).
The steerable filter introduced by Freeman and Adelson [25] is a linear combination of a few basic filters. In particular, steerable filters are popular in crack and ridge detection due to their high efficiency [26-28]. In this study, the SMFB method uses second-derivative Gaussians as basic filters. Equation (1) gives the 2D Gaussian with standard deviation $\sigma$, and (4) gives its second derivatives. Equation (5) shows the formulation of the filter $f(\sigma, \theta)$, where $\theta \in [-\pi/2, \pi/2]$ is the orientation of the filter.
A filter bank, namely the SMFB, is generated from the steerable matched filter. Table 1 lists the 52 components of the SMFB with different parameters: filter size, $\sigma$, and $\theta$. Four different values of $\sigma$ are assigned to account for the varying widths of cracks. The orientations are incremented with a fixed angle interval of 15° to capture crack segments in varying orientations. In order to yield nearly zero responses within noncrack areas, the filters are shifted to have a zero mean. All filters in the SMFB are illustrated in Figure 5.
Each preprocessed 3D pavement image is convolved with all 52 filters in the SMFB. At each pixel, only the maximum convolutional response is preserved as the result of the SMFB operation. Mathematically, (6) to (8) give the specific calculation procedures, where $\mathbf{p}$ denotes a pixel located at $(x, y)$; $I_r(\mathbf{p})$ denotes the preprocessed 3D pavement data; $f_i(\sigma, \theta)$ denotes the $i$th steerable filter in the SMFB with parameters $\sigma$ and $\theta$; $R_i(\mathbf{p}; \sigma, \theta)$ denotes the response obtained by convolving $f_i$ with $I_r(\mathbf{p})$; $R^*(\mathbf{p})$ denotes the maximum response; and $B^*(\mathbf{p})$ denotes the binary crack map obtained by thresholding.
As illustrated in Figure 6, crack saliency maps are generated after implementing the SMFB. Due to pavement texture, some noncrack pixels have high responses, resulting in false-positive errors. In addition, some crack pixels have low or even zero responses, resulting in crack discontinuity and false-negative errors. Thus, an additional procedure is needed to improve the detection accuracy.
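The filter bank and the per-pixel maximum response can be illustrated with the following Python sketch. It rests on several assumptions that are not taken from the source: the steered filter is the standard Freeman and Adelson second-derivative combination $\cos^2\theta\, g_{xx} + 2\cos\theta\sin\theta\, g_{xy} + \sin^2\theta\, g_{yy}$, with $\theta$ taken as the direction of differentiation; the four $\sigma$ values and the kernel-size rule are placeholders (the actual values are those of Table 1); and borders are handled by replication. One way to arrive at 52 components is 4 values of $\sigma$ times 13 orientations from −90° to 90° in 15° steps, which is what the sketch builds.

```python
import math

def steerable_filter(size, sigma, theta):
    """Zero-mean second-derivative-of-Gaussian kernel steered to theta (radians)."""
    half = size // 2
    c, s = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            g = math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            g_xx = (x * x - sigma ** 2) / sigma ** 4 * g   # second derivative in x
            g_yy = (y * y - sigma ** 2) / sigma ** 4 * g   # second derivative in y
            g_xy = x * y / sigma ** 4 * g                  # mixed derivative
            row.append(c * c * g_xx + 2 * c * s * g_xy + s * s * g_yy)
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]     # shift to zero mean

def build_bank(sigmas, step_deg=15):
    """One filter per (sigma, theta) pair; kernel size 2*int(3*sigma)+1 is an assumption."""
    thetas = [math.radians(d) for d in range(-90, 91, step_deg)]
    return [(sg, th, steerable_filter(2 * int(3 * sg) + 1, sg, th))
            for sg in sigmas for th in thetas]

def max_response(img, r, c, bank):
    """Return (response, sigma, theta) of the strongest filter at pixel (r, c)."""
    h, w = len(img), len(img[0])
    best = (float("-inf"), None, None)
    for sg, th, kernel in bank:
        half = len(kernel) // 2
        acc = 0.0
        for dy in range(-half, half + 1):
            rr = min(max(r + dy, 0), h - 1)
            for dx in range(-half, half + 1):
                cc = min(max(c + dx, 0), w - 1)
                acc += img[rr][cc] * kernel[dy + half][dx + half]
        if acc > best[0]:
            best = (acc, sg, th)
    return best

bank = build_bank([1.0, 1.5, 2.0, 2.5])     # hypothetical sigma values
depth = [[0.0] * 21 for _ in range(21)]
for row in depth:
    row[10] = -5.0                          # toy vertical crack, 5 units deep
crack_resp, crack_sigma, crack_theta = max_response(depth, 10, 10, bank)
flat_resp, _, _ = max_response(depth, 10, 2, bank)
```

On this toy depth map the crack pixel responds strongly to the narrowest filter differentiating across the crack, while a pixel on the flat surface yields a zero response because every filter has zero mean.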

Tensor Voting.
Tensor voting (TV) is a perceptual grouping method proposed by Guy and Medioni [29]. In computer vision, TV is widely utilized to infer curvilinear structures [30], locally link corrupted data [31], and extract lines and curves from noisy images [32]. It is highly possible that some cracks have weak responses to the SMFB for various reasons. Consequently, some cracks may be detected as discontinuous fragments. In this paper, TV is adopted to enhance connections between crack fragments.
A second-order symmetric positive semidefinite tensor $T$ is associated with each pixel in the crack saliency maps. $T$ is mapped to a matrix $(t_{ij})_{2 \times 2}$, whose eigenvalues are $\lambda_1 \ge \lambda_2 \ge 0$, with corresponding eigenvectors $\vec{e}_1$ and $\vec{e}_2$. The tensor can thereby be decomposed as follows:

$$T = (\lambda_1 - \lambda_2)\,\vec{e}_1 \vec{e}_1^{\top} + \lambda_2 \left(\vec{e}_1 \vec{e}_1^{\top} + \vec{e}_2 \vec{e}_2^{\top}\right) \tag{9}$$

The first term in (9) represents a stick tensor as an elongated ellipsoid, and the second is called the "ball tensor" as a circular disk.
First, a ball voting is used to estimate the crack-curve orientation at each crack pixel from the crack saliency maps; that is, each detected crack pixel is initialized as a ball tensor $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, and noncrack pixels do not participate in voting. The ball voting is conducted by adding the fields generated by stick tokens spanning 360° at regular intervals. In this way, the principal direction at each crack pixel is found, which is set as the orientation of the stick token. Then a stick voting is applied by casting votes from each stick token to all the pixels (crack pixels and noncrack pixels). For stick voting, assuming that $O$ is the origin location and $P$ is the voting location as shown in Figure 7, the voting field can be defined by using a decay function:

$$DF(s, k, \sigma) = \exp\left(-\frac{s^2 + c k^2}{\sigma^2}\right) \tag{10}$$

where $s$ is the arc length from this token to a target point in the voting field, $k$ is the curvature, $\sigma$ is the scale of voting, and $c$ is a parameter controlling the degree of decay with curvature, defined in (11) as

$$c = \frac{-16 \ln(0.1)\,(\sigma - 1)}{\pi^2} \tag{11}$$

As TV is used to enhance connections between crack fragments in this paper, after the stick voting stage, the dense tensor map is extracted as the tensor voting result, which is different from the original tensor voting method presented in [33]. Lastly, the OR operation is executed on the dense tensor map and the crack saliency map. An overall illustration of our method is shown in Figure 8.
After the TV operation, missing parts of the detected cracks can be retrieved, resulting in enhanced continuity of cracks. Figure 9 provides typical examples of connecting discontinuous parts using TV. In Figure 9, small fragments close to each other are linked together as a whole, as highlighted by the dashed circles.
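The tensor decomposition and the stick-vote decay can be made concrete with a short Python sketch. It follows the standard tensor voting formulation from the literature, which is assumed here to match the paper's usage: for a receiver at distance $l$ and angle $\theta$ from the voter's tangent, the arc length is $s = \theta l / \sin\theta$, the curvature is $k = 2\sin\theta / l$, and no vote is cast beyond 45°. The function names and the choice to place the voter at the origin with its tangent along the x-axis are illustrative only.

```python
import math

def decompose(t11, t12, t22):
    """Split a symmetric 2x2 tensor into (stick saliency, ball saliency, e1 angle)."""
    trace = t11 + t22
    gap = math.hypot(t11 - t22, 2.0 * t12)      # equals lambda1 - lambda2
    lam2 = (trace - gap) / 2.0
    angle = 0.5 * math.atan2(2.0 * t12, t11 - t22)  # orientation of e1 in radians
    return gap, lam2, angle

def decay_constant(sigma):
    """Curvature decay parameter c as a function of the voting scale sigma."""
    return -16.0 * math.log(0.1) * (sigma - 1.0) / math.pi ** 2

def stick_vote_strength(dx, dy, sigma):
    """Magnitude of the vote cast from a unit stick token at the origin,
    with tangent along +x, to the receiver at offset (dx, dy)."""
    l = math.hypot(dx, dy)
    if l == 0.0:
        return 1.0
    theta = math.atan2(abs(dy), abs(dx))        # angle between tangent and line OP
    if theta > math.pi / 4:
        return 0.0                              # outside the 45-degree voting cone
    if theta == 0.0:
        s, k = l, 0.0                           # straight continuation
    else:
        s = theta * l / math.sin(theta)         # arc length of the osculating circle
        k = 2.0 * math.sin(theta) / l           # its curvature
    c = decay_constant(sigma)
    return math.exp(-(s * s + c * k * k) / sigma ** 2)

# A ball tensor has no preferred direction; a stick tensor has no ball part.
ball_only = decompose(1.0, 0.0, 1.0)
cos30, sin30 = math.cos(math.radians(30)), math.sin(math.radians(30))
stick_only = decompose(cos30 * cos30, cos30 * sin30, sin30 * sin30)
```

Votes decay with distance along the tangent and faster still when reaching the receiver requires a curved path, which is why nearby collinear fragments reinforce each other while isolated responses receive little support.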

Postprocessing.
After the TV operation, some noise pixels may still exist. The remaining noise can seriously affect the precision of crack detection. Hence, postprocessing is needed to further remove noise and refine the final detection output. In this paper, all connected components smaller than 1000 pixels are removed.
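A minimal Python sketch of this size-based filtering is given below. The source does not state the connectivity used, so 8-connectivity is an assumption, and the function name `remove_small_components` and the tiny example mask are hypothetical; the paper's threshold of 1000 pixels would be passed as `min_size`.

```python
from collections import deque

def remove_small_components(mask, min_size):
    """Keep only connected components (8-connectivity assumed) with >= min_size pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one whole component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) >= min_size:
                for y, x in component:
                    out[y][x] = 1
    return out

# Toy binary map: one 3-pixel fragment and two isolated noise pixels.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
cleaned = remove_small_components(mask, min_size=3)
```

In the toy mask the two isolated pixels are discarded and the three-pixel fragment survives.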

Experimental Results and Comparison
In this paper, precision, recall, and F-measure are used to evaluate the performance of the proposed method. Precision measures the exactness or fidelity of detection and segmentation, while recall describes the completeness of detection and segmentation. F-measure is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0. The definitions of precision, recall, and F-measure are as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{13}$$

$$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{14}$$

In (12) and (13), TP denotes true positives, that is, pixels labeled as crack pixels in the ground truth that are correctly recognized as crack pixels; FP denotes false positives, that is, pixels labeled as noncrack pixels in the ground truth that are incorrectly recognized as crack pixels; FN represents false negatives, that is, pixels labeled as crack pixels in the ground truth that are incorrectly detected as noncrack pixels. The ground truths of cracks were obtained in two steps: in the first step, crack maps were generated automatically by applying the method proposed by Zhang [34]; in the second step, manual labeling was used to refine the crack maps provided by the first step. A pixel-to-pixel comparison is conducted during the evaluation.
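The pixel-to-pixel evaluation can be sketched in Python as follows. The function name `crack_metrics`, the convention of returning 0 when a denominator is empty, and the tiny example maps are assumptions made for illustration.

```python
def crack_metrics(pred, truth):
    """Pixel-wise precision, recall, and F-measure for two binary maps of equal size."""
    tp = fp = fn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1          # crack in both maps
            elif p and not t:
                fp += 1          # detected, but not in the ground truth
            elif t and not p:
                fn += 1          # in the ground truth, but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

detected = [[1, 1, 0],
            [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
precision, recall, f_measure = crack_metrics(detected, ground_truth)
```

Here two of the three detected pixels are correct and two of the three true crack pixels are found, so precision, recall, and F-measure all equal 2/3.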
A test data set consisting of two hundred 3D pavement images is selected to evaluate the proposed method. This test set covers images from different road sections, various lighting conditions (i.e., daytime and nighttime), and diverse severities of cracks (i.e., low-level or no crack, medium-level crack, and high-level crack). The computer hardware used for the experiments is summarized as follows: Intel Core i7-6700T, 3.00 GHz CPU, and 32 GB RAM. All algorithms are implemented on the MATLAB platform. Figure 10 presents some typical detection results using the testing images. As shown in Figure 10, the proposed method can detect cracks with varying widths, severity levels, and contexts. In addition, typical false-positive and false-negative errors are shown within the dashed rectangles in Figure 10(b) and within the dashed circles in Figure 10(d), respectively.
Accordingly, the precisions and recalls for all the selected images are illustrated in Figure 11. The precision fluctuates between 84.00% and 97.00%, with an average of 88.38%, while the recall ranges from 85.00% to 99.00%, with an average of 93.15%. In addition, the F-measure lies between 85.00% and 97.00%, with an average of 90.68%. Many research works have reported the performance of their methods for crack segmentation, as listed in Table 2, which indicates that the proposed method performs better than the other methods. It is acknowledged that the same set of images should be used to compare all methods, judge their performance, and estimate their potential. That requires direct access to both the datasets and the programs, codes, or algorithms. Further efforts by the related research agencies are needed to create a benchmark dataset and form a comparison protocol.

Conclusions and Future Work
Automated pavement crack surveys have drawn more and more attention from both researchers and transportation agencies. This article proposed a novel method for segmenting crack maps based on 3D pavement images. The proposed method implements the SMFB operation, Tensor Voting, and preprocessing as well as postprocessing procedures in a specific order to detect cracks from 3D pavement images.
The experiment using 200 testing images demonstrated that the proposed method can achieve a high level of detection performance, quantified as an average precision of 88.38%, an average recall of 93.15%, and an average F-measure of 90.68%. The average precision was slightly lower than the average recall, implying that some noncrack pixels were incorrectly detected as crack pixels. A possible reason is that the edges of some pavement markings present height differences similar to those occurring in cracked areas. The proposed method used the same set of fixed parameters and yielded similar detection accuracies for all 200 testing images, implying that it generalizes well over varying cracks on 3D pavement surfaces.
Although the proposed method is effective in detecting pavement cracks, it still takes roughly 10.3 s per image because of expensive computations, primarily introduced by TV. In the future, parallel computing techniques may be considered to improve the processing speed.

Figure 1: The 2D pavement image (left) and 3D pavement image (right); the upper green line is the transverse profile of a patch of pavement surface.

Figure 3: Flowchart of the proposed method.

Figure 4: Examples of pavement 3D data rectification: (a) and (b) the original transverse profile and longitudinal profile; (c) and (d) the corresponding transverse profile and longitudinal profile.

Figure 7: Votes cast by a stick tensor at the origin O.

Figure 9: Examples of the TV approach: the size of each patch is 700 × 300 pixels. The patches in the left column are cropped from the crack saliency maps; the patches in the right column are cropped from the results of the TV operation.
Pavement Data Rectification.3D pavement data may have noises caused by invalid laser points and vehicle vibration or movement.Therefore, spurious noises removal and pavement 3D data rectification

Table 2: Performance comparison of different methods.