LiRA-CD: An open-source dataset for road condition modelling and research

This data article presents the details of the Live Road Assessment Custom Dataset (LiRA-CD), an open-source dataset for road condition modelling and research. The dataset captures GPS trajectories of a fleet of electric vehicles and their time-series data from 50 different sensors collected on 230 km of highway and urban roads in Copenhagen, Denmark. Additionally, road condition measurements were collected by standard survey vehicles, which serve as high-quality reference data. The in-vehicle measurements were collected onboard with an Internet-of-Things (IoT) device, then periodically transmitted before being saved in a database. Researchers can use the dataset for prediction modelling related to standard road condition parameters such as surface friction and texture, road roughness, road damages, and energy consumption. Furthermore, researchers and pavement engineers can use the dataset as a template for future studies and projects, benchmarking the performance of different algorithms and solving problems of the same type. LiRA-CD is freely available and can be accessed at https://doi.org/10.11583/DTU.c.6659909.



Value of the Data
• The LiRA-CD provides the basis for development of road condition prediction models suitable for wide-area implementation, i.e., models for prediction of surface friction and texture, road roughness, road damages, and energy consumption (see e.g., [5][6][7][8][9][10]), and could be useful for road operators and owners, such as road agencies and municipalities.
• The data are suitable for developing new interpretation schemes (e.g., utilizing physical models) and machine-learning algorithms. Researchers can use the data to train, validate and test algorithms for estimating road conditions for a variety of road types and conditions.
• The data include high-quality reference measurements from standard surveying vehicles. Hence, the dataset presents a unique opportunity for researchers to link in-vehicle sensor data to standard road condition parameters.
• The reference data may be complemented with images and image datasets (see e.g., [11]). Although this effort requires additional collection and processing of image data, it will increase the quality of training data and enable the utilization of vision-based methods (see e.g., [6]).
• Researchers and pavement engineers can use the dataset as a template for future studies and projects.
• The dataset can be used for benchmarking the performance of different algorithms solving problems of the same type.

Objective
This data article aims to provide researchers and pavement engineers with a vehicle dataset that is open-source and suitable for developing large-scale road condition monitoring methods. Specifically, this dataset links in-vehicle sensor data from regular cars to standard road condition parameters utilized by public road agencies (i.e., parameters used as input for planning and managing road networks). Moreover, the article provides visual aids to familiarize readers with the data in the repository and detailed descriptions of the data collected in the LiRA project, including storage formats, file types, and relevant metadata, making the data usable by researchers and engineers worldwide.

Data Description
The LiRA-CD contains 1796 km of road data from highway and urban roads in the Copenhagen area collected during the LiRA project [1, 2]. It includes more than 50 in-vehicle sensor signals from Renault Zoe electric cars operated by GreenMobility (GM) and 92 road condition parameters collected with standard vehicles operated by the Danish Road Directorate (DRD). A graphical outline of the data collection and data infrastructure is shown in Fig. 1. The LiRA-CD is a collection of three selected datasets from the LiRA project:
#3 Data subset for road condition modelling - platoon friction test. This subset includes data from an additional measurement campaign involving a car driving behind the VIAFRIK vehicle. The data are divided into several subsets, i.e., friction data from the VIAFRIK vehicle ('..fric_custom…csv') and car sensor data from the AutoPi and the vehicle CAN bus ('task_7505…txt').
An overview of the LiRA-CD directory and filenames is shown in Fig. 2. In the filenames, the terms '..cph1..', '..cph6..', '..m3..', and '..m13..' refer to the road name, and the terms '..hh..' and '..vh..' refer to the right-hand side and the left-hand side of the road, respectively.
The car sensor data are stored in a structured format (.hdf5); an overview of the file structure is depicted in Fig. 3. The figure shows that the data are structured in five levels: (i) route name at the top level (i.e., 'CPH1', 'CPH6', 'M3' or 'M13'), (ii) device/vehicle (in this case only 'GM' for GreenMobility cars), (iii) trip number or task id, (iv) pass number, and (v) GPS and sensor signals. The different signal types and names are described in Table 1.
The raw car data were translated on-board utilizing a standard On-Board Diagnostics (OBD) protocol [3] and the CanZE application [4]. Processing the data on-board involved converting bits to decimals and then translating the decimal data to actual physical units. For some of the car sensors in LiRA-CD, further translation of the sensor signal, s, is required, i.e.: where s_LiRA-CD is the sensor signal stored in LiRA-CD, b* and r* are the offset and resolution values (respectively) given in [4], and b and r are the corrected offset and resolution values (respectively) found in the LiRA project. Relevant offset and resolution values are given in Table 2.
The reference data (.csv) are stored in a flat format; each column represents a road condition parameter, GPS information or timestamp, while each row represents a new record in time or space. The different data fields from the reference data, i.e., data collected with the P79 Profilometer (P79), the ViaFriction measurement device (VIAFRIK) and the Automatic Road Analyser (ARAN), are described in Tables 3, 4, 5 and 6.
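The five-level layout of the car sensor files (route, device, trip, pass, signal) can be flattened into an index before any analysis. The following is a minimal sketch of such a traversal, not the tooling used in the LiRA project. It walks any mapping-like object that exposes `.items()`. The nested dictionary `demo` is a hypothetical stand-in for an opened file, and every name below the route and device levels (`trip_a`, `pass_0`, `signal_x`, and so on) is invented for illustration.

```python
from typing import Any, Iterator, Tuple

# Level names follow the five levels described in the text.
LEVELS = ("route", "device", "trip", "pass", "signal")


def iter_signals(root: Any) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (path, node) for every node exactly five levels below `root`."""

    def walk(node: Any, path: Tuple[str, ...]):
        if len(path) == len(LEVELS):
            yield path, node
            return
        if not hasattr(node, "items"):
            return  # branch is shallower than expected; skip it
        for name, child in node.items():
            yield from walk(child, path + (str(name),))

    yield from walk(root, ())


# Hypothetical stand-in for an opened file; names below the route and
# device levels are invented for illustration only.
demo = {
    "CPH1": {
        "GM": {
            "trip_a": {
                "pass_0": {"gps": [(55.68, 12.57)], "signal_x": [0.1, 0.2]},
                "pass_1": {"gps": [(55.69, 12.58)]},
            }
        }
    }
}

# One row per signal, keyed by level name.
index = [dict(zip(LEVELS, path)) for path, _ in iter_signals(demo)]
for row in index:
    print(row)
```

If the h5py package is available, the same function should also accept an open `h5py.File`, since HDF5 groups expose `.items()`; the real group and signal names are those shown in Fig. 3 and Table 1.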
The raw data have not been aligned or structured; hence, several pre-processing steps may be required by users before further analysis can be performed. Recommended pre-processing steps include: (i) re-orientation of the AutoPi accelerometer axes to align with the principal car axes; (ii) map-matching, where the GPS routes are corrected using the reference measurements; (iii) interpolation, where GPS coordinates are assigned to all sensor readings; and (iv) structuring of data, where sensor readings are re-sampled to ensure consistency between car and reference data across all sensors.
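Pre-processing step (iii), assigning GPS coordinates to all sensor readings, can be illustrated with a simple linear interpolation in time. This is a minimal sketch under stated assumptions, not the procedure used in the LiRA project: GPS timestamps are assumed to be strictly increasing and expressed in the same time base as the sensor timestamps, sensor readings outside the GPS time range are clamped to the nearest fix, and interpolating latitude and longitude linearly is only reasonable over the short intervals between consecutive fixes. The function name and all numeric values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def interpolate_gps(
    gps_t: Sequence[float],
    gps_lat: Sequence[float],
    gps_lon: Sequence[float],
    sensor_t: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return one (lat, lon) pair per sensor timestamp.

    Assumes gps_t is strictly increasing. Timestamps outside the GPS
    range are clamped to the first or last fix.
    """
    if not gps_t or not (len(gps_t) == len(gps_lat) == len(gps_lon)):
        raise ValueError("GPS arrays must be non-empty and of equal length")

    out: List[Tuple[float, float]] = []
    for t in sensor_t:
        if t <= gps_t[0]:
            out.append((gps_lat[0], gps_lon[0]))
            continue
        if t >= gps_t[-1]:
            out.append((gps_lat[-1], gps_lon[-1]))
            continue
        j = bisect_right(gps_t, t)  # first fix strictly later than t
        i = j - 1                   # last fix at or before t
        w = (t - gps_t[i]) / (gps_t[j] - gps_t[i])
        lat = gps_lat[i] + w * (gps_lat[j] - gps_lat[i])
        lon = gps_lon[i] + w * (gps_lon[j] - gps_lon[i])
        out.append((lat, lon))
    return out


# Hypothetical values: three GPS fixes one second apart and three
# sensor timestamps (between fixes, on a fix, and after the last fix).
positions = interpolate_gps(
    gps_t=[0.0, 1.0, 2.0],
    gps_lat=[55.0, 55.2, 55.4],
    gps_lon=[12.0, 12.1, 12.2],
    sensor_t=[0.5, 1.0, 3.0],
)
print(positions)
```

Map-matching (step ii) would normally be applied to the GPS track before this interpolation, so that the interpolated positions already lie on the reference route.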

Experimental Design, Materials and Methods
The data collection involved capturing car sensor signals while driving and standard road condition parameters for the same roads. The choice of roads was guided by the desire to cover a wide variety of pavement conditions with respect to age, type, and distress severity.
The dataset has been used to develop several models to estimate road conditions. In [5], a data-driven method is proposed for estimating the tire-grip potential of asphalt roads utilizing transverse accelerations and VIAFRIK measurements. In [6], a machine-learning model for evaluating pavement friction based on in-vehicle sensor data and images collected with a GoPro camera is proposed. The authors utilized longitudinal and lateral accelerations, speed, yaw rate, wheel RPM, engine torque, steering angle, and reference measurements from the VIAFRIK vehicle. In [7], vertical accelerations, speed measurements, and GPS location are utilized to estimate the International Roughness Index (IRI). In [8], vertical acceleration readings and a simple dynamic model are utilized for road profile inversion. In both cases, the P79 measurements were used as a reference. In [9] and [10], new road energy efficiency monitoring concepts are proposed based on vehicle speed, longitudinal acceleration, wheel torque, and traction power measurements. Overview maps of the different routes in LiRA-CD are depicted in Fig. 4.
In-vehicle sensor data were collected with an AutoPi Telematics Unit (3rd generation) connected to the GM vehicle's controller area network (CAN). This unit includes a single-board Raspberry Pi computer with added GPS and accelerometer modules, as described in [2]. The AutoPi units were physically fixed to the frame in the middle of the GM car, close to the front axle. The installation of the devices is depicted in Fig. 5.
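Because the AutoPi unit is fixed to the car frame, its accelerometer axes need not coincide with the principal car axes, which is why re-orientation is listed among the recommended pre-processing steps. The sketch below shows the simplest possible case and is an illustration only: it assumes the mounting offset is a pure rotation about the vertical (z) axis by a known angle. The function name, the angle, and its sign convention are hypothetical and would have to be calibrated for each unit; a general mounting would need a full three-axis rotation.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def rotate_about_z(acc: Vec3, yaw_deg: float) -> Vec3:
    """Rotate an (x, y, z) acceleration sample about the z axis.

    A positive yaw_deg rotates the vector counter-clockwise when viewed
    from the positive z axis. The z component is left unchanged.
    """
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    x, y, z = acc
    return (c * x - s * y, s * x + c * y, z)


# Hypothetical sample: 1 m/s^2 along the sensor x axis plus gravity on z,
# rotated by an assumed 90 degree mounting offset.
aligned = rotate_about_z((1.0, 0.0, 9.81), 90.0)
print(aligned)
```

Applying the function with the negative of the same angle undoes the rotation, which gives a quick consistency check on a calibrated offset.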
The standard road conditions, i.e., the reference dataset, were collected with vehicles operated by the DRD. The surveying included: (i) the P79 Profilometer, a van equipped with a beam hosting 25 point lasers that measure longitudinal and transverse profiles. Two of these lasers operate at an exceptionally high acquisition frequency for measuring surface texture depth. The P79 also measures cross-fall slope, vertical curvature, IRI, mean profile depth, and rutting, and offers a 3D view of the entire pavement surface; (ii) the ARAN9000, a multi-functional road scanning vehicle that quantifies road defects and distresses using cameras and a Laser Cracking Measure-
The car sensor data were collected during the summer and autumn of 2020 and the spring and summer of 2021 utilizing eight different cars. Designated drivers were instructed to drive in the right lane at a constant speed of 90 km/h on the highways and 50 km/h on the urban roads, or to follow the speed limit (e.g., where this was lower than the instructed speed) or the traffic (e.g., in the case of congestion). An overview of the road metadata and test instructions during measurement campaigns is shown in Table 7.
An overview of the timing of the measurement campaigns for the car data collection, together with surface and weather conditions, is shown in Table 8; the heading 'TaskId' refers to a specific test (which could include several passes and directions over the same road section), 'Unit no.' refers to the different AutoPi units (cars) used, and 'Filename' refers to the filename in LiRA-CD.
An overview of the timing of the measurement campaigns for the reference data, together with surface and weather conditions, is shown in Table 9; the heading 'Filename' refers to the filename in LiRA-CD.
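The filenames listed in these tables embed the route and road-side terms described in the Data Description ('cph1', 'cph6', 'm3', 'm13', 'hh', 'vh'). A small helper can recover them when batching files. This is a sketch under assumptions: it supposes that the terms appear as separate tokens delimited by non-alphanumeric characters, and the example filenames are hypothetical, not actual names from the repository.

```python
import re
from typing import Dict, Optional

# Terms taken from the filename description in the text.
ROUTES = {"cph1", "cph6", "m3", "m13"}
SIDES = {"hh": "right", "vh": "left"}


def parse_filename(name: str) -> Dict[str, Optional[str]]:
    """Extract the route name and road side from a filename, if present."""
    # Assumption: terms are separated by '_', '-', '.', or similar.
    tokens = [t for t in re.split(r"[^a-z0-9]+", name.lower()) if t]
    route = next((t.upper() for t in tokens if t in ROUTES), None)
    side = next((SIDES[t] for t in tokens if t in SIDES), None)
    return {"route": route, "side": side}


# Hypothetical filenames for illustration only.
for example in ("p79_cph1_hh.csv", "aran_m13_vh.csv", "notes.txt"):
    print(example, parse_filename(example))
```

Matching whole tokens rather than substrings avoids confusing 'm3' with 'm13'; files whose names do not follow the assumed delimiter pattern return `None` for both fields and should be checked by hand.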

Ethics Statement
This work did not involve human subjects, animal experiments, or data collected from social media platforms.