An Automatic Road Distress Visual Inspection System Using an Onboard In-Car Camera

In road maintenance, most governments prefer a preventive maintenance strategy. Many governments own special vehicles that can accurately detect and classify many types of road distresses. By running these vehicles frequently, small road distresses can be detected before they grow into big ones. However, because these huge and expensive vehicles are not easy to operate, road inspection in practice ends up being infrequent even where automatic road inspection vehicles are available. In this paper, we investigate and develop an automatic and nondestructive visual inspection system whose setup and usage are designed around the drivers, driving styles, and road conditions of Bangkok, the capital city of Thailand. Our proposal includes a workflow diagram of a vision-based road inspection system that is capable of detecting, classifying, tracking, measuring, and pricing road distresses. As a proof of concept, our current system focuses on detecting one specific type of road distress, the pothole, using only one onboard in-car camera. Experimental results reveal that the context of Bangkok


Introduction
For the government of Thailand, preventive maintenance is the preferred strategy of the Department of Highways (DOH), whose responsibilities include keeping more than sixty thousand kilometers of asphalt and concrete roads in good condition. To apply the preventive maintenance strategy successfully, frequent road inspection is required. This ensures that any minor road distress is detected and repaired immediately, before it becomes a major distress that may lead to unexpected consequences. So far, manual inspection by human experts is one of the most practical solutions. Its obvious downside is that the road network is far too large for the small number of available experts. Hence, the frequent and thorough inspection required by the preventive maintenance strategy is not possible in practice. Besides, results from human-based inspection are highly subjective, which makes them difficult to interpret and compare.
To inspect thousands of kilometers of roads frequently, automatic road inspection systems are needed. In the past decade, many such systems have become available both in research communities and in commercial markets, and many governments, including the government of Thailand, already own them. Nevertheless, despite the availability of expensive road analysis systems, manual inspection by human experts has been reported to remain the most practical solution because of its ease of use. To resolve this contradiction, the goal of this paper is to study and develop a low-cost, easy-to-use, and nondestructive automatic road inspection system using an onboard in-car camera. Such cameras have become increasingly popular in Thailand, particularly since March 2017. In that month, the Office of Insurance Commission (OIC) announced a 5-10% discount on car insurance premiums for any car equipped with one or more onboard in-car cameras.
Although nondestructive vision-based road analysis systems are not new in research communities, many problems and challenges remain unsolved. Researchers from many countries continue to propose new solutions; recent examples come from the United States [1,2], Canada [3], France [4], and Italy [5]. Nevertheless, roads in Bangkok, the capital city of Thailand, present some unique scenarios that previous works have not addressed. These scenarios affect the ease of use and practicality of an automatic road analysis system in Bangkok and include inconsistent road conditions, damaged road markings, confusing driving styles, and traffic problems.
In the remainder of this paper, Section 2 describes some recent techniques for vision-based road distress analysis. Section 3 then discusses road distress analysis from a vision-based perspective. Section 4 describes the context of Bangkok in more detail, together with our proposed proof-of-concept system and experimental results. Finally, Section 5 concludes this paper.

Related Works
To give readers a broader background, this section discusses previous research on automatic road analysis systems, particularly vision-based ones. Its main focus is to introduce the methods used by previous systems so that later sections can analyze them further for the case of Bangkok, Thailand.
Recently, Mohan and Poobal [9] provided a good review of vision-based road crack detection, showing that this field of automatic road analysis is both beneficial and essential. Many crack detection methods had been presented before that review. For example, Pascual and Ortiz [10] presented defect detection algorithms for hull inspection that use the morphological properties of cracks on steel surfaces. Crack locations were identified using a corrosion detector and a crack detector, and these algorithms ran almost in real time. Moreover, Sorncharean and Phiphobmongkol [11] proposed a crack detection method that applies image processing algorithms to asphalt surface images. Its purpose was to reduce human workload in highway management. They used Grid Cell Analysis (GCA) to identify cracked cells and to reduce false detections at shadow borders.
Lins and Givigi [3] created a vision-based system for automating the crack measurement process. Using a camera installed on a robot, they performed automatic crack detection and measurement by combining the RGB color model with a particle filter. With this method, they can estimate the number of pixels in a cross section and interpret it as the crack's dimension. In addition, Aldea and Le Hegarat-Mascle [4] presented an a contrario modeling-based strategy for crack detection using unmanned aerial vehicles. This strategy reduces the need to define various thresholds during crack segment detection. As a result, it is easier to reconnect crack segments despite different image conditions and different degrees of structural degradation.
Ryu et al. [12] built a pothole detection system based on 2D images, which was expected to benefit ITS (Intelligent Transportation System) services and road management systems. Furthermore, Wang et al. [2] proposed a method driven by multiple high-level contexts to detect cracks with image processing. Using geometric/structural context and physical context, they formulated a sparse decomposition problem that incorporates the context. They then applied it to the automatic inspection of surface and subsurface defects on aircraft, where it helps reduce false detections. More recently, Phung et al. [13] presented a crack detection method using histogram analysis. Unmanned aerial vehicles were used to collect the data, and a 3D model of the structure built from laser scanners was then used for clustering, histogram analysis, and peak detection.
Nevertheless, our proposed system differs from most existing works [2,4,10-13]. The work of [10] does not address road distresses, let alone in the context of Bangkok as our paper does. The work of [4] relies on unmanned aerial vehicles rather than on-road vehicles like ours. The focus and context of [11] are similar to ours, namely reducing or replacing visual inspection in Thailand's highway management system. However, that work mainly addresses how to detect and recognize road cracks and barely mentions other challenges in real situations. For example, its input images were specifically cropped to contain only the target crack, without unrelated salient objects such as shadows or road markings. The system in [12] can perform vision-based pothole detection, but it still has several limitations, including misdetections due to unexpected pothole shapes. The systems in [2,13] are applied in settings different from ours. For instance, according to the prototype experiment in [13], the method worked well on an artificial vertical wall, but it was not tested on horizontal roads.

Landscape of Road Distresses in Computer Vision
In Bangkok, there are two main types of roads: asphalt roads and concrete roads. They share some distresses, and each also has its own unique distresses. Excluding characteristics that cannot be recognized by visual inspection, there are three main characteristics to consider during detection and classification of road distresses:
(i) Dimension (2D, 3D, or both): this characteristic describes whether a distress can be detected by 2D or 3D image inspection alone, or whether it requires inspecting 2D and 3D visual characteristics together. For example, rutting (Figure 1(a)) is a road distress whose visual appearance is easily recognized in 3D but not in 2D.
(ii) On-surface pattern (linear pattern, area pattern, or both): this characteristic describes whether a distress appears as linear patterns on the road surface (e.g., the crack in Figure 1(b)) or as a texture spanning a continuous area (e.g., the bad patching in Figure 1(c)).
(iii) On-surface location and orientation: some road distresses are defined by where, and/or in which orientation, they occur on the road. For example, a corner crack (Figure 1(d)) is a crack that specifically occurs near a corner.
Regardless of how many types of road distresses or which visual characteristics are to be detected and classified, the overall computational workflow of a complete vision-based automatic road analysis system is similar to Figure 2. This workflow is derived from our discussions with civil engineers in Bangkok, who explained their expectations of an automatic road distress evaluation system. The workflow starts with A: Preprocessing, which prepares an input image for further processing. Preprocessing can include noise removal, image rectification, geometric calibration, image resizing or subsampling, and other image processing techniques. These techniques enhance the quality of the input image, increase accuracy, and reduce computation time in later processes.
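Stage A is intentionally open-ended. The minimal sketch below covers three of the preprocessing operations later used in step (1) of the algorithm in Section 4.3: grayscale conversion, bilinear subsampling, and histogram equalization. It assumes 8-bit images stored as nested Python lists, a dependency-free stand-in for a real image library. The BT.601 luma weights are our own assumption, and the function names are hypothetical. Nonlocal means denoising is omitted for brevity; OpenCV, for example, provides it as `cv2.fastNlMeansDenoising`.

```python
# Minimal, dependency-free sketch of three preprocessing operations.
# RGB pixels are (r, g, b) tuples; gray pixels are ints in 0..255.

def to_gray(rgb):
    """Step 1(a): color to grayscale using BT.601 luma weights (an assumption)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def resize_bilinear(img, new_h, new_w):
    """Step 1(b): resample to new_h x new_w with bilinear interpolation."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(new_h):
        # Map the output pixel centre back into input coordinates, clamped to the image.
        sy = min(max((y + 0.5) * h / new_h - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(new_w):
            sx = min(max((x + 0.5) * w / new_w - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(round(top * (1 - fy) + bottom * fy))
        out.append(row)
    return out


def equalize_hist(img):
    """Step 1(d): classic CDF-based histogram equalization for 8-bit images."""
    flat = [p for row in img for p in row]
    hist = [0] * 256
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # a single intensity value: nothing to stretch
        return [row[:] for row in img]
    lut = [round((c - cdf_min) * 255 / (n - cdf_min)) if c > 0 else 0 for c in cdf]
    return [[lut[p] for p in row] for row in img]
```

The operations chain in the order of step (1), e.g. `equalize_hist(resize_bilinear(to_gray(frame), 240, 320))`, where `frame` and the target size are hypothetical.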
After that, B: Detecting, Grouping, and Classifying detects road distresses (if any) in the preprocessed image and groups detected results that lie in reasonable proximity. It then classifies each group into its most appropriate distress type(s). The results of this process include the locations and types of road distresses found by the system. This process is perhaps the most complicated one and is the core of this kind of system. A straightforward solution is to detect road distresses by looking for visually salient objects on road surfaces. However, this is not an easy task, for two main reasons. First, visually salient objects on road surfaces include not only road distresses but also road surface markings, road patching, shadows, etc. Finding road distresses among these objects is therefore difficult, particularly on Bangkok's roads that are not in good condition, where clear and stable road surface markings cannot be assumed. Second, on roads in good condition, most input images contain no road distress. This raises the risk of false alarms and also leaves very few images of road distresses for properly training and testing the developed system. To avoid this problem, some previous works such as [14] developed algorithms that detect normal road surfaces in addition to visually salient objects or road distresses.
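A crude form of "looking for visually salient objects" corresponds to steps 2(a)-2(c) of our algorithm in Section 4.3. The sketch below illustrates that stage under assumptions that the text does not state. First, pixels darker than a ratio `alpha` times the mean intensity are treated as foreground, since potholes tend to appear dark. Second, the structuring element is a 3x3 square. Third, 4-connected components stand in for the contour extraction of step 2(c). All function and parameter names are hypothetical.

```python
# Sketch of candidate extraction (cf. steps 2(a)-2(c)); binary masks hold 0/1 ints.

def binarize_dark(img, alpha):
    """Foreground (1) = pixels darker than alpha * mean intensity (dark-is-foreground is assumed)."""
    flat = [p for row in img for p in row]
    threshold = alpha * sum(flat) / len(flat)
    return [[1 if p < threshold else 0 for p in row] for row in img]


def _morph(mask, keep):
    """3x3 square structuring element; neighbours outside the image are ignored.
    keep=all gives erosion, keep=any gives dilation."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(1 if keep(window) else 0)
        out.append(row)
    return out


def open_mask(mask):
    """Erosion followed by dilation (a morphological opening): drops foreground
    specks smaller than the 3x3 element while roughly preserving larger blobs."""
    return _morph(_morph(mask, all), any)


def components(mask, min_frac, max_frac):
    """4-connected components (a stand-in for contours), keeping only regions whose
    area, as a fraction of the whole image, lies within [min_frac, max_frac]."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            stack, pixels = [(y, x)], []
            while stack:
                cy, cx = stack.pop()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if min_frac <= len(pixels) / (h * w) <= max_frac:
                regions.append(set(pixels))
    return regions
```

Dark shadows, cracks, and worn markings also survive this stage. The candidates therefore still need further filtering, which is the false-alarm problem described above.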
Once the locations and types of all road distresses in a single input image are known, some systems extend these results to detect road distresses continuously in a stream of images. At this point, C: Counting and Tracking is useful for correctly counting the road distresses detected in a series of consecutive images. Unless a line-scan camera is used as in [14], shooting a stream of images with a common camera (i.e., an area-scan camera) causes visual overlap between consecutive image frames. Without a good visual tracker, a road distress whose visual appearance spans many image frames will therefore be counted more than once. As an additional advantage, a good visual tracker can reduce the computational load and time of process B in subsequent image frames.
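Process C is not implemented in our proof-of-concept. The following sketch illustrates why tracking matters for counting. It rests on assumptions of our own: detections are axis-aligned boxes, motion between consecutive frames is small, and boxes are matched greedily by intersection over union (IoU) against a hypothetical `min_iou` threshold. It counts a distress once even when it appears in several overlapping frames. It is not a robust tracker. A real system on a moving car would likely need motion compensation, for example from optical flow or odometry, and would also have to handle distresses that are missed in an intermediate frame.

```python
# Sketch: count distinct distresses across overlapping frames via greedy IoU matching.

def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_unique(frames, min_iou=0.3):
    """frames: per-frame lists of detected boxes, in temporal order.
    Each box is linked to the best-overlapping, not-yet-linked box of the previous
    frame; a box with no sufficiently overlapping predecessor starts a new track.
    Returns the number of tracks, i.e. the estimated number of distinct distresses."""
    tracks = 0
    previous = []
    for boxes in frames:
        unmatched = list(previous)
        for box in boxes:
            best = max(unmatched, key=lambda p: iou(p, box), default=None)
            if best is not None and iou(best, box) >= min_iou:
                unmatched.remove(best)  # continue an existing track
            else:
                tracks += 1  # first sighting of a new distress
        previous = boxes
    return tracks
```

Consider one pothole that drifts down the image over three frames, e.g. boxes `(10, 10, 30, 30)`, `(10, 15, 30, 35)`, and `(10, 20, 30, 40)`. Naive per-frame counting reports three distresses, whereas this sketch counts one.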
Processes A, B, and C mainly rely on the computer vision and image analysis knowledge of computer scientists. In contrast, processes D: Measuring and E: Pricing require civil engineering expertise: how to measure the size and severity of each type of road distress and how to estimate its repair cost. Nevertheless, to the best of our knowledge, little research focuses on these two processes, as they require seamless knowledge integration and close cooperation between computer scientists and civil engineers.

Proposed System
4.1. Challenges of Bangkok.
Bangkok residents suffer daily from the world's most severe evening rush hour traffic (as ranked by CNN Money in February 2017). From their perspective, it is preferable not to cause more traffic jams by blocking roads or lanes in order to run a huge, full-scale automatic road distress analysis system/vehicle as proposed in [14,15]. Such systems produce better and more reliable results, but they are impractical for everyday use and make frequent inspection very difficult.
Apart from severe traffic, roads in Bangkok pose another challenge for any automatic road analysis system, including self-driving cars: the condition of roads and road surface markings is unstable. For example, some previous works [16,17] detect road lanes by looking for white line road markings. However, this assumption does not always hold in Bangkok. Some well-maintained roads have clear white line road markings, whereas others have damaged markings or no visible markings at all, as shown in Figure 3.

4.2. Proof-of-Concept.
To support the preventive maintenance strategy by performing road inspection as frequently as possible, we propose an automatic road distress visual inspection system that focuses on the following:
(i) Ease of use: the system must be easy to set up, easy to use, and compatible with the Bangkokian lifestyle.
(ii) Low cost: the system must not involve additional equipment with high price tags.
The system consists of a single onboard in-car camera, since such cameras have recently become increasingly popular among Bangkokian drivers. To simplify visual analysis by removing as much nonroad visual information as possible, we recommend fixing the camera firmly in place. It should point downward to view the road surface from a bird's-eye or similar perspective. Although this setup is easy and low cost, its obvious disadvantage is that it cannot detect or recognize road distresses with significant 3D visual characteristics.
As a proof-of-concept system, this paper focuses on developing a visual detection algorithm (process B in Figure 2) for potholes, one type of road distress, as shown in Figures 4 and 5 (except for experiment 8, whose image is not a pothole). This lets us concentrate on the core process first and understand the actual problems specific to roads in Bangkok. Many works have contributed to the detection and classification of different types of road cracks. In comparison, the number of previous works on pothole detection is small. Our proof-of-concept system deals with potholes because they are a direct nuisance to all car drivers and big potholes can even lead to fatal road accidents.
In research communities, there are two popular approaches to pothole detection: vibration-based and image-based. Vibration-based approaches, as in [18-21], usually use an accelerometer (together with a gyroscope in [20]) to measure the real-time vibration of the car. When salient vibration is detected, the car is assumed to be passing through a pothole, and the size and severity of that pothole can be estimated by further signal analysis. Vibration-based solutions are popular for pothole detection because they require less computational effort than image-based solutions. Their downside is that they cannot detect a pothole until the car has driven into it. Our long-term goal is a system that lets government officers inspect roads frequently and facilitates the evaluation of road distress maintenance costs. Vibration-based solutions are unsuitable for this goal because they are specific to potholes. In contrast, image-based solutions are more flexible for future extension, as many other types of road distresses can be detected by visual inspection.
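For comparison, the core of a vibration-based detector can be very small, which is part of why it is computationally cheap. The sketch below is a generic fixed-threshold heuristic of our own, not the method of any of the cited works. It assumes a vertical-axis accelerometer signal in m/s² and flags samples whose deviation from gravity exceeds a hypothetical `threshold`. Nearby hits are merged into one event, and the peak deviation serves as a crude severity proxy.

```python
def detect_bumps(z_accel, threshold=4.0, gravity=9.81, min_gap=5):
    """Flag samples whose vertical acceleration deviates from gravity by more than
    `threshold` (m/s^2). Hits at most `min_gap` samples after the previous hit are
    merged into the same event. Returns (first_index, peak_deviation) per event."""
    events = []  # each event is a list of sample indices
    for i, a in enumerate(z_accel):
        if abs(a - gravity) > threshold:
            if events and i - events[-1][-1] <= min_gap:
                events[-1].append(i)
            else:
                events.append([i])
    return [(e[0], max(abs(z_accel[k] - gravity) for k in e)) for e in events]
```

By construction, such a detector can only report a pothole after the wheel has hit it, which is the limitation discussed above.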
Few works propose image-based pothole detection. Available examples include [5,6], which use the machine learning techniques of an SVM (Support Vector Machine) and an LBP (Local Binary Patterns) cascade classifier, respectively. Machine learning has become a popular approach in computer vision over the past decade. However, a good machine learning model depends directly on a good training dataset with sufficient numbers of positive and negative samples. This is why our proof-of-concept does not use machine learning at this moment: we do not have enough positive samples (pothole images of roads in Bangkok). Creating a good dataset of Bangkok road distress images is part of our future work and will require much more time and effort.
Other image-based pothole detection systems that do not involve machine learning were proposed in [12,22]; both use handcrafted image processing algorithms. In [22], a pothole is assumed to be "any strong dark edge within the extracted road surface", subject to some size constraints. The limitation is that this Canny-edge-based algorithm cannot deal with potholes whose edges are barely visible because of white sand or dirt. In addition, the authors do not mention how to deal with strong dark edges caused by nonpothole instances such as shadows or damaged road markings. As for [12], its pothole detection algorithm shares several subalgorithms with our algorithm proposed in Section 4.3: image denoising, image binarization, morphological operations, and the Sobel first derivative. However, the detailed implementation and decision rules are entirely different. Besides, the work of [12] does not group/cluster potholes in the same proximity as ours does. Grouping/clustering road distresses of the same (or similar) type in the same proximity is a feature inspired by our discussions with civil engineers. They explained that damaged roads are repaired by fixing all nearby distresses of the same (or similar) type at once. Consequently, if several distresses of the same (or similar) type appear in the same proximity, they prefer to count them as a single road distress to be fixed.
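Step (3) of the algorithm in Section 4.3 formalizes this rule. The following sketch simplifies it in one stated way: it uses axis-aligned rectangles `(x, y, w, h)` instead of the minimum-area rotated rectangles described there. OpenCV's `cv2.minAreaRect` and `cv2.boxPoints` could supply rotated vertices instead. The merge `threshold` is hypothetical. As in step (3), the distance between two clusters is the minimum Euclidean distance between a vertex of one and a vertex of the other. Merging repeats until no pair of clusters is closer than the threshold.

```python
from itertools import combinations
from math import dist


def rect_vertices(x, y, w, h):
    """Corners of an axis-aligned rectangle (a simplification of rotated boxes)."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


def min_vertex_distance(a, b):
    """Minimum Euclidean distance between a vertex of cluster a and one of cluster b."""
    return min(dist(p, q) for p in a for q in b)


def group_rectangles(rects, threshold):
    """Start with one cluster per rectangle, then repeatedly merge any pair of
    clusters closer than `threshold` until no pair can be merged.
    Returns groups of input indices."""
    clusters = [rect_vertices(*r) for r in rects]
    members = [[i] for i in range(len(rects))]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if min_vertex_distance(clusters[i], clusters[j]) < threshold:
                # i < j, so popping index j leaves index i valid.
                clusters[i].extend(clusters.pop(j))
                members[i].extend(members.pop(j))
                merged = True
                break  # indices changed; rescan all pairs
    return [sorted(m) for m in members]
```

A vertex-only distance has one notable property. A small rectangle lying entirely inside a large one can be farther than the threshold from every vertex of the large one, so the two stay unmerged. Checking for overlap first would be one possible refinement.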

4.3. Image-Based Pothole Detection Algorithm.
This section explains our vision-based pothole detection algorithm. As mentioned earlier, it is difficult to collect many example image frames of a specific road distress. In our experience, some roads contained no distress at all for several kilometers, while others contained many road distresses but none of the type we were looking for. Hence, a complex model that requires many sample images, such as deep learning, is not our choice at this moment. The algorithm below was designed by inspecting the salient visual appearance of potholes and by conducting many trial-and-error experiments.
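The minimal sketch below makes the two pothole-likelihood scores of steps 2(d)-2(f) more concrete. The normalizations are our own assumptions, since the text does not specify them. The first score is the absolute difference between the mean intensity inside the region and the mean over all other pixels of the image, divided by 255. A perimeter pixel counts as "sharp" when its Sobel gradient magnitude is at least a hypothetical `grad_thresh`. Regions are sets of `(row, col)` pixels rather than contours, and images are nested lists of 8-bit intensities.

```python
# Sketch of steps 2(d)-2(f). Images: nested lists of 0-255 ints.
# Regions: sets of (row, col) pixels; a region must not cover the whole image.

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        above, here, below = img[y - 1], img[y], img[y + 1]
        for x in range(1, w - 1):
            gx = (above[x + 1] + 2 * here[x + 1] + below[x + 1]) - (
                above[x - 1] + 2 * here[x - 1] + below[x - 1])
            gy = (below[x - 1] + 2 * below[x] + below[x + 1]) - (
                above[x - 1] + 2 * above[x] + above[x + 1])
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def perimeter(region):
    """Region pixels with at least one 4-neighbour outside the region."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return [(y, x) for (y, x) in region
            if any((y + dy, x + dx) not in region for dy, dx in steps)]


def acceptance_score(img, region, grad_thresh):
    """Equally weighted average of the two pothole-likelihood scores, each in [0, 1]."""
    h, w = len(img), len(img[0])
    inside = [img[y][x] for (y, x) in region]
    outside = [img[y][x] for y in range(h) for x in range(w) if (y, x) not in region]
    # 2(d): contrast between the region and the rest of the image.
    s1 = abs(sum(inside) / len(inside) - sum(outside) / len(outside)) / 255
    # 2(e): share of perimeter pixels lying on a sharp (high-gradient) edge.
    mag = sobel_magnitude(img)
    edge = perimeter(region)
    s2 = sum(1 for (y, x) in edge if mag[y][x] >= grad_thresh) / len(edge)
    # 2(f): equal weighting; the caller compares this against an acceptance threshold.
    return (s1 + s2) / 2
```

A dark region with gradual, blurry edges, such as a soft shadow, gets a low second score and therefore a lower acceptance score. This is the intended effect of step 2(e).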
(1) Preprocessing (a) Convert an input image from color to grayscale. (b) Resize (i.e., subsample) the grayscale image in order to remove noise and reduce the computational load in later steps. In our experiments, simple bilinear interpolation yielded results similar to those of manually sliding a window and subsampling the image based on a predefined threshold value as in [14]. (c) Denoise the resized image using the nonlocal means denoising algorithm [23] in order to remove noise while preserving fine structures and details. In many experiments, this technique reduced road-surface noise considerably more than other traditional image smoothing techniques. (d) Apply histogram equalization in order to enhance the contrast of the denoised image.
(2) Detecting (a) Binarize the histogram-equalized image using a threshold value of $\alpha \cdot V$, where $\alpha$ is a predefined ratio and $V$ is the mean intensity computed over all pixels of the histogram-equalized image. (b) Perform morphological erosion followed by dilation in order to remove tiny black noise and connect white areas in the binarized image. (c) Find all contours in the morphed image. Then, for each detected contour, reject the contour immediately if its area is too big or too small compared to the entire image area. (d) Compute the first pothole-likelihood score by measuring the difference between the mean intensities of pixels inside and outside each contour. The bigger the difference, the higher the first pothole-likelihood score. This reflects one of our assumptions: a pothole usually consists of an inner area and an outer area with noticeably different intensities. (e) Compute the second pothole-likelihood score using the ratio $N_{\text{sharp}} / N_{\text{all}}$. Here $N_{\text{sharp}}$ is the number of sharp contour perimeter pixels and $N_{\text{all}}$ is the number of all pixels on the contour perimeter. The higher the ratio, the higher the second pothole-likelihood score. In our algorithm, a perimeter pixel is considered sharp when its gradient magnitude is high. The gradient magnitude image is calculated by applying the Sobel first-derivative operator to the histogram-equalized image. The intention of this step is to filter out nonpothole visually salient objects with blurry edges, such as soft shadows and water stains. (f) For each remaining contour, compute a contour acceptance score as the equally weighted average of the two pothole-likelihood scores from 2(d) and 2(e). Any contour whose acceptance score is lower than a predefined threshold is rejected. (g) The detection result is a set of contours representing all detected potholes (if any).
Table 1: Information of images used as inputs of our experiments as shown in Figures 4 and 5.
(3) Grouping/Clustering (a) For each contour in the pothole detection result, find its minimum bounding (rotated) rectangle and assign each rectangle to its own cluster. (b) For each cluster, compute the minimum distance between it and every other cluster. In our algorithm, the minimum distance between two clusters is the minimum Euclidean distance between a vertex of one cluster and a vertex of the other. (c) Merge two clusters into one new cluster if and only if the minimum distance between them is less than a predefined threshold. (d) Repeat 3(b) and 3(c) until no pair of clusters can be merged.

4.4. Experimental Results and Discussion.
Figures 4 and 5 show some of our experimental results based on a small set of road distress images collected from the Internet. Experiments 1-7 show pothole road distresses, whereas experiment 8 shows a road patching. Our experimental input images differed in dimensions, camera-to-road distance, shooting perspective, and road orientation (as shown in both figures and summarized in Table 1). For this reason, manual parameter tuning was required in step 2(c) of the algorithm (i.e., rejecting contours that are too small or too big). Apart from this, all experiments shared the same set of parameters. Once we have collected our own image dataset shot under a consistent shooting setup, this manual parameter tuning should no longer be required.
In experiments 1-7, the nonlocal means denoising algorithm [23] noticeably smooths out noisy pixels caused by road surface materials, and histogram equalization emphasizes the sharp edges around each pothole. The sixth row of images shows all resulting contours of detected potholes. Contours with colored bounding rectangles are our pothole detection results, whereas grey-outlined contours are those rejected by our algorithm. Finally, the images in the last row show the results of grouping/clustering. Blue rectangles represent individual contours before grouping, whereas yellow rectangles are the grouped results.
In experiments 1-7, where one or more potholes exist in the input images, our proposed algorithm successfully locates the potholes regardless of the road surface material. However, as the results of experiments 5, 6, and 7 show, there are false alarms caused by nonpothole road damage and white line lane markings. Using color to differentiate between dark potholes and white lane markings is not a reliable solution, because potholes are sometimes filled with white sand or dirt. Experiment 8 shows another limitation of our current algorithm. In this experiment, there are false alarms caused by white line lane markings and shadows, and a road patching is also misinterpreted as a pothole. This confusion between patching and potholes is hard to resolve in our system because a single onboard in-car camera (as assumed by our system) cannot perform accurate 3D measurement. It is therefore hard to tell whether a visually salient object is a layer of patching material above the surface or the cavity of a pothole below it.
Apart from the results and limitations discussed above, we encountered other problems while developing this proof-of-concept system. The most notable ones are summarized as follows:
(1) Camera position and orientation problem: on the one hand, inappropriate camera positions or orientations caused large parts of the images to be blocked by the vehicle itself, wasting much of the image area. On the other hand, when the camera was fixed firmly on a static pole extending far away from the vehicle, the shadow of the pole itself appeared as a visually salient object in every image frame. Nevertheless, as long as the shadow appears at a static location, it can be easily eliminated using a simple image masking technique.
Extending the pole farther away from the vehicle and pointing the camera perpendicularly down at the road surface reduces the work needed for image rectification. It also makes the most of the image area. The downside is that a long pole can get in the way of, or cause accidents to, other vehicles and commuters in Bangkok's cramped and confusing traffic.
(2) False alarm problem: as shown by the misdetections in experiments 3, 5, 6, 7, and 8 in Figures 4 and 5, many unrelated visually salient objects on the road led to false alarms in our system. In our experiments, white line road markings were the most frequent source of false alarms. Our attempt to eliminate these false alarms by looking for white rectangular shapes did not succeed, for two reasons. Road markings in Bangkok are in unpredictable conditions (see the examples in Figure 3), and our system does not require the driver to drive straight within a single lane. The resulting unpredictable orientation of the vehicle relative to the road surface makes false alarms from white line road markings even more difficult to eliminate.
(3) Inability to detect large road distresses: at present, our algorithm, like most previously proposed vision-based pothole detection algorithms, finds potholes by looking for visually salient objects whose visual characteristics match some predefined rules. However, a very large pothole may cover most of the image area or span many image frames, and it is then no longer a visually salient object. In that case, most vision-based pothole detection algorithms, including ours, will fail.
(4) No mechanism for precisely pinpointing detected road distresses: to preserve ease of use for Bangkokian drivers, our system does not require the driver to drive slowly at a constant speed or to stay in one specific lane at all times. Driving in Bangkok sometimes means staying still on a road for hours and at other times means frequent sudden acceleration and lane changes. These driving styles make it difficult to pinpoint each road distress detected by vision automatically and precisely, unless a mapping system builds a road map from real-time car movements. To the best of our knowledge, no previous work in computer vision has solved this problem yet.

Conclusion and Future Works
In this paper, we study and develop an automatic visual inspection system so that road distress inspection can be done more easily and more frequently. To match the lifestyle of Bangkokian drivers, our proof-of-concept system assumes a single onboard in-car camera combined with our visual analysis algorithm for pothole detection. Our trial-and-error process and experimental results reveal many problems not mentioned in previous work that stem from road conditions and driving styles in Bangkok. Solving them requires additional computation modules in order to maintain both the ease of use and the accuracy of the system. The current pothole detection algorithm still produces false alarms caused by nonpothole road distresses and unrelated visually salient objects, particularly inconsistent white line road markings. Reducing these false alarms is part of our future work. We also plan to form a team to collect more images and videos of actual road conditions in Bangkok. The goal is a reliable image/video database of road surfaces and road distresses in Bangkok. Such a database will be useful when we scale up the system to other types of road distresses and when we move to more sophisticated solutions such as machine learning and deep learning.

Figure 3: Example of white line road markings in Thailand. (a) Markings in good condition; (b) damaged markings (images from [7,8]).

Figure 4: Results of the 1st-4th experiments. Pothole road distress detection and grouping based on 2D image analysis (sources of all input images are summarized in Table 1).

Figure 5: Results of the 5th-8th experiments. Pothole road distress detection and grouping based on 2D image analysis (sources of all input images are summarized in Table 1).