RDD2020: An annotated image dataset for automatic road damage detection using deep learning

This data article provides details of the RDD2020 dataset, which comprises 26,336 road images from India, Japan, and the Czech Republic with more than 31,000 instances of road damage. The dataset covers four types of road damage (longitudinal cracks, transverse cracks, alligator cracks, and potholes) and is intended for developing deep learning-based methods to detect and classify road damage automatically. Because the images in RDD2020 were captured using vehicle-mounted smartphones, the dataset is useful for municipalities and road agencies developing low-cost methods for monitoring road pavement surface conditions. Machine learning researchers can also use the dataset to benchmark different algorithms on other problems of the same type (image classification, object detection, etc.). RDD2020 is freely available at [1]. The latest updates and the corresponding articles related to the dataset can be accessed at [2].


Value of the Data
• The RDD2020 data provides the basis for smartphone-based automatic road damage detection and is useful for municipalities and road agencies for low-cost monitoring of road conditions.
• RDD2020 is valuable for developing new deep convolutional neural network architectures or modifying existing architectures to improve their performance. Researchers can use the data to train, validate, and test algorithms for detecting road damage in multiple countries.
• Currently, the data contains road images from three countries (India, Japan, and the Czech Republic). Researchers or pavement engineers may extend the data to other countries by following the procedure given in the research article [4].
• At present, the data supports the detection and classification of road cracks (longitudinal, transverse, and alligator) and potholes. It can be extended to cover other damage categories.
• Machine learning researchers can use the dataset to benchmark the performance of different algorithms on other problems of the same type (image classification, object detection, etc.).
• RDD2020 can be used to organize data challenges. For instance, the Global Road Damage Detection Challenge (GRDDC'2020), organized as an IEEE Big Data Cup in 2020, used RDD2020 to evaluate the road damage detection models proposed by participants [5, 6].
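
Benchmarking detection models on a dataset like RDD2020 requires deciding when a predicted box counts as a correct detection. The sketch below is one common approach, not the official GRDDC'2020 scorer: it assumes boxes are stored as hypothetical `(label, xmin, ymin, xmax, ymax)` tuples, matches predictions to ground truth greedily by intersection over union (IoU), and reports an F1 score at an assumed IoU threshold of 0.5.

```python
from typing import List, Tuple

# Hypothetical box representation: (label, xmin, ymin, xmax, ymax) in pixels.
Box = Tuple[str, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes; labels are ignored."""
    ix1, iy1 = max(a[1], b[1]), max(a[2], b[2])
    ix2, iy2 = min(a[3], b[3]), min(a[4], b[4])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[3] - a[1]) * (a[4] - a[2])
    area_b = (b[3] - b[1]) * (b[4] - b[2])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(preds: List[Box], truths: List[Box], iou_thr: float = 0.5) -> float:
    """F1 from greedy one-to-one matching of predictions to ground truth.

    A prediction is a true positive when it shares the label of a not-yet-matched
    ground-truth box and their IoU is >= iou_thr (0.5 is an assumed default).
    Predictions are processed in the given order, so pass them sorted by confidence.
    """
    matched = set()
    tp = 0
    for p in preds:
        best_iou, best_j = 0.0, -1
        for j, t in enumerate(truths):
            if j in matched or t[0] != p[0]:
                continue
            score = iou(p, t)
            if score > best_iou:
                best_iou, best_j = score, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth boxes that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


truths = [("D00", 0, 0, 10, 10), ("D40", 20, 20, 30, 30)]
preds = [("D00", 1, 1, 10, 10), ("D20", 20, 20, 30, 30)]
print(f1_score(preds, truths))  # 0.5
```

Here the first prediction overlaps its ground-truth box with IoU 0.81 and counts as a hit, while the second has the wrong label, giving one true positive, one false positive, and one false negative.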

Experimental Design, Materials and Methods
The data collection involved capturing road images using a smartphone mounted on a moving vehicle. A smartphone application was designed to capture road images at a rate of one image per second; at an average vehicle speed of approximately 40 km/h (about 11 m/s), this rate photographs the road without leakage (gaps between consecutive images) or duplication (overlap between them). The installation setup of the smartphone in the car is the same as that used by the authors in [4].
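
The capture rate and vehicle speed together fix how far the car travels between two photographs. The sketch below works through that arithmetic; the function names and the `visible_length_m` parameter (the stretch of road visible in one image) are hypothetical and not part of the original setup.

```python
import math


def frame_spacing_m(speed_kmh: float, frames_per_second: float = 1.0) -> float:
    """Road distance travelled between two consecutive captures, in metres."""
    speed_ms = speed_kmh * 1000.0 / 3600.0  # convert km/h to m/s
    return speed_ms / frames_per_second


def coverage(speed_kmh: float, visible_length_m: float,
             frames_per_second: float = 1.0) -> str:
    """Compare frame spacing with the (hypothetical) road length visible per image."""
    spacing = frame_spacing_m(speed_kmh, frames_per_second)
    if math.isclose(spacing, visible_length_m, rel_tol=1e-9):
        return "contiguous"
    # Spacing longer than the visible stretch leaves gaps; shorter gives overlap.
    return "gaps" if spacing > visible_length_m else "overlap"


print(round(frame_spacing_m(40), 2))  # 11.11
```

At one image per second and 40 km/h, consecutive images are roughly 11 m apart, so the visible road stretch in each image would need to be about that long for contiguous coverage.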
First, 9,053 road images were captured in Japan in 2018 [7]. This Japanese dataset was augmented in 2019 using a Generative Adversarial Network (GAN) [8], and in 2020 with images from India and the Czech Republic [4]. The collected images were annotated in PASCAL VOC format [3] using the labelImg tool; each annotation records the road damage label and its location (bounding box) in the image. The damage categories for annotating the Japanese data are defined in the Japanese Road Maintenance and Repair Guidebook 2013; accordingly, eight damage categories were considered for annotating the images collected in Japan [7].
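
Each PASCAL VOC annotation is an XML file that lists the image filename and one `<object>` element per labelled instance, with the class in `<name>` and the box in `<bndbox>`. The sketch below reads such a file with Python's standard `xml.etree.ElementTree`; the sample filename and the label `D00` are illustrative assumptions, not taken from the dataset.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Obj = Tuple[str, int, int, int, int]  # (label, xmin, ymin, xmax, ymax)


def parse_voc(xml_text: str) -> Tuple[str, List[Obj]]:
    """Return the image filename and the labelled boxes of one VOC annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename", default="")
    objects: List[Obj] = []
    for obj in root.findall("object"):
        label = (obj.findtext("name") or "").strip()
        box = obj.find("bndbox")
        if box is None:  # skip objects without a bounding box
            continue
        # Coordinates are sometimes written as floats, so parse via float() first.
        xmin, ymin, xmax, ymax = (
            int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((label, xmin, ymin, xmax, ymax))
    return filename, objects


# Hypothetical annotation for illustration only.
SAMPLE = """<annotation>
  <filename>example_road.jpg</filename>
  <object>
    <name>D00</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc(SAMPLE))  # ('example_road.jpg', [('D00', 10, 20, 110, 60)])
```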
However, road standards for evaluating road marking deterioration, such as Crosswalk Blur or White Line Blur, differ significantly across countries. These categories were therefore excluded from the annotations for images collected in India and the Czech Republic, so that generalized models applicable to monitoring road conditions in more than one country can be trained. The annotation pipeline is shown in Fig. 7. A summary of several state-of-the-art deep learning models trained on the RDD2020 dataset for global road damage detection is presented in [5].
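
When training a model across countries, annotations outside the shared damage types can be filtered out in the same spirit as the exclusion described above. The sketch below does this for parsed annotations; the label codes `D00`, `D10`, `D20`, `D40` for the four shared types and `D43`, `D44` for road-marking categories are assumptions here and should be checked against the dataset's own documentation.

```python
from collections import Counter
from typing import List, Tuple

Obj = Tuple[str, int, int, int, int]  # (label, xmin, ymin, xmax, ymax)

# Assumed label codes for the four damage types shared across countries.
SHARED_CLASSES = {
    "D00": "longitudinal crack",
    "D10": "transverse crack",
    "D20": "alligator crack",
    "D40": "pothole",
}


def keep_shared(objects: List[Obj]) -> List[Obj]:
    """Drop annotations whose label is not one of the shared damage classes."""
    return [o for o in objects if o[0] in SHARED_CLASSES]


def class_counts(objects: List[Obj]) -> Counter:
    """Count the remaining annotations per human-readable damage type."""
    return Counter(SHARED_CLASSES[o[0]] for o in keep_shared(objects))


# "D43" and "D44" stand in for road-marking categories (assumed codes).
japan_objs = [("D00", 0, 0, 5, 50), ("D43", 10, 10, 90, 40),
              ("D40", 60, 60, 80, 80), ("D44", 0, 90, 99, 95)]
print(keep_shared(japan_objs))  # [('D00', 0, 0, 5, 50), ('D40', 60, 60, 80, 80)]
```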