A dataset of full-stack ITS-G5 DSRC communications over licensed and unlicensed bands using a large-scale urban testbed

A dataset of measurements of ETSI ITS-G5 Dedicated Short Range Communications (DSRC) is presented. Our dataset consists of network interactions happening between two On-Board Units (OBUs) and four Road Side Units (RSUs). Each OBU was fitted onto a vehicle driven across the FLOURISH Test Track in Bristol, UK. Each RSU and OBU was equipped with two transceivers operating at different frequencies. During our experiments, each transceiver broadcasts Cooperative Awareness Messages (CAMs) over the licensed DSRC band, and over the unlicensed Industrial, Scientific, and Medical radio (ISM) bands 2.4 GHz-2.5 GHz and 5.725 GHz-5.875 GHz. Each transmitted and received CAM is logged along with its Received Signal Strength Indicator (RSSI) value and accurate positioning information. The Media Access Control layer (MAC) layer Packet Delivery Rates (PDRs) and RSSI values are also empirically calculated across the whole length of the track for any transceiver. The dataset can be used to derive realistic approximations of the PDR as a function of RSSI values under urban environments and for both the DSRC and ISM bands – thus, the dataset is suitable to calibrate (simplified) physical layers of full-stack vehicular simulators where the MAC layer PDR is a direct function of the RSSI. The dataset is not intended to be used for signal propagation modelling. The dataset can be found at https://doi.org/10.5523/bris.eupowp7h3jl525yxhm3521f57, and it has been analyzed in the following paper: I. Mavromatis, A. Tassi, and R. J. Piechocki, “Operating ITS-G5 DSRC over Unlicensed Bands: A City-Scale Performance Evaluation,” IEEE PIMRC 2019. [Online]. Available: https://arxiv.org/abs/1904.00464.


Subject area
Computer Science

More specific subject area Computer Networks and Communications
Type of data The network data exchanged by each RSU and OBU is provided as multiple PCAP traces [1] and CSV files.For each experimental session, each transceiver onboard each RSU and OBU is associated with a PCAP trace and CSV file.For each transceiver pertaining to each RSU and OBU, we also included a processed version of the data saved as MATLAB tables [2].
How data was acquired The Innovate UK-funded FLOURISH Test Track, which includes multiple RSUs and OBUs and spans a length of 5 km including key roads in the centre of Bristol, UK.
Data format Raw data format, as recorded by the RSUs and OBUs pertaining to the FLOURISH Test Track, in the form of PCAP traces and CSV files.The raw data are then filtered in order to remove not required information from the PCAP traces and the CSV files, such as the parts of an IEEE 802.11p frames encapsulating CAMs.Filtered data has been recorded as MATLAB tables.

Parameters for data collection
The data has been recorded by employing two vehicles driving around the whole length of the FLOURISH Test Track.Since the data set has been recorded across multiple days, we relied on different drivers driving for ∼ 4 h per day (namely, ∼ 2 h in the morning and ∼ 2 h in the afternoon).Furthermore, drivers experienced different traffic conditions in the city -thus the time needed to drive the whole length of the track depended on the time of the day.

Description of data collection
Eight data recording sessions of ∼ 2 h each, where one vehicle was driven in a clockwise and the other in an anticlockwise fashion across the whole length of the FLOURISH Test Track.Two transmission frequencies were tested during each data recording session (one per transceiver).

Value of the Data
• Our dataset consists of network interactions recorded among RSUs and OBUs in an urban environment in both a Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) fashion.Network interactions account for CAMs exchanged according to the standard ITS-G5 DSRC, operated over the licensed DSRC band and unlicensed ISM bands 2.4 GHz-2.5 GHz and 5.725 GHz-5.875GHz.Along with the dataset we also provided the necessary data processing tools to estimate the MAC layer PDR as a function of RSSI values.The empirical PDR value can be directly integrated into (simplified) physical layers that are widely adopted in full-stack network simulators.It is beyond the scope of our experiments to perform a measurement campaign suitable for deriving accurate propagation models.
• By building upon the empirical PDR values (expressed as a function of the RSSI values), experts from the Wireless Communications and Networks communities can directly assess the impact of operating an ITS-G5 DSRC system over both licensed and unlicensed frequency bands, being respectively the DSRC and the ISM bands.For instance, this will facilitate further research on the provision of safety-critical and non-safety critical automotive services over the DSRC band, the ISM bands, or both at the same time.
• Licensed DSRC band is expected to be (at least partially) allocated to Cellular-V2X (C-V2X) systems.As such, the empirical PDR values (expressed as a function of RSSI values) derived from this dataset can be directly compared against empirical PDR values measured for C-V2X systems, in urban environments.Furthermore, a similar performance investigation can be carried out by comparing the empirical PDR values of C-V2X against ITS-G5 DSRC operated over the unlicensed ISM bands we considered in our experiments.
• Since our dataset has been recorded employing vehicles driving in a real urban environment; it can also be used by cyber-security experts.A potential usage is to establish what the "standard behavior" of a connected vehicle is when observed from the communication perspective -thus making it possible to train anomaly detection systems to identify potential threats [3].For instance, a vehicle can maliciously broadcast CAMs where its location has been altered.However, any RSU receiving malicious CAMs will be able to measure their RSSI values.Since the RSSI value is a function of the environment and the distance between the transmitter and the receiver, an anomaly detection system can verify if a measured RSSI value is comparable with the expected value given the positioning information encapsulated into each CAM.

Data
The raw data associated with each transceiver (namely, PCAP traces [1] and CSV files) installed on each RSU and OBU are organized as follows.During each day of trial, our dataset includes one or two sequences of raw data files for each transceiver.That depends on if the transceiver is installed on either an RSU or an OBU.In particular, raw data files associated with each RSU collates two experimental sessions (namely, morning and afternoon) per-file, while raw data files pertaining to each OBU always refer to a single experimental session.
As for the PCAP traces, each of them records the whole of the network interaction where each transceiver is involved in.In particular, each received IEEE 802.11p frame is recorded into a PCAP trace with its corresponding RSSI value.Our PCAP traces can be accessed using our modified version of the tool Tcpdump [4].Furthermore, for each transceiver, key fields of each of the transmitted and received CAMs are recorded in a tabular format in the same CSV file.Since, the FLOURISH Testing Track logs both the transmitted and received CAMs in the same raw data file, for the sake of data readability and debugging purposes, the labels of each column are replicated in each data row, i.e., each data row follows the format: For the sake of reducing the overall size of the dataset and hence, making it possible to manipulate the dataset with a smaller memory footprint, the raw data has been filtered into MATLAB data files [2].In particular, for each set of CSV file and PCAP trace associated to a given transceiver, we generated the corresponding MATLAB tables for the transmitted/received CAMs, and one MATLAB table per-PCAP trace referring to the received CAMs.Further details about the labels of the columns in the CSV raw data files and MATLAB tables are provided in Table 1.
The MATLAB framework designed for the above data manipulation and filtering can be found online in [5].Our framework provides also tools to investigate the Key Performance Indicators (KPIs) shown in Figs.[3][4][5][6][7][8][9][10].These figures will be further discussed in the next section.
Table 1: Definition of the labels of each column of the CSV raw data files and MATLAB tables in the dataset.The four RSUs providing network coverage across the 5 km length of the FLOURISH Testing Track (see Fig. 1) are fitted outside premises of the University of Bristol.Each site, along with its ID and characteristics, is listed below: • MVB -Merchant Ventures Building, BS8 1UB, UK : RSU installed at a height of ∼8 m opposite to a T-junction.
• DH -Dorothy Hodgkin Building, BS1 3NY, UK : RSU located at ∼12 m from the ground beside a curvy road.
• SU -Students Union Building, BS8 1LN, UK : RSU fitted at ∼25 m from the ground beside a straight stretch of road.
All aforementioned sites host an identical RSU operating two transceivers at the same time.
Since the first transceiver has a maximum transmission power of 25 dBm and the second can reach up to 29 dBm, in our dataset, we regard the first one as the Low Power (LP) transceiver and the second one as the High Power (HP) transceiver.During each experimental session, two vehicles have also been used, with the only exception of the afternoon experimental session held on 15 th February 2019 when only one vehicle had been used.When two vehicles were present, one was driven in a clockwise direction while the other in an anticlockwise fashion.Each vehicle was equipped with an OBU operating the same set of transceivers that we employed in each RSU.Further details about both the hardware and software pertaining to the RSUs and OBUs can be found in [8].During each day of trials, the LP and HP transceivers onboard each RSU and OBU have been operated on 10 MHz-width channels.The center frequencies chosen for each day, the considered transmission power and Modulation and Coding Scheme (MCS) are listed in Table 2. Since the data traffic exchanged between the OBUs and RSUs was entirely made of BTP and CAM messages, we adopted the MCS QPSK-1/2 as imposed by the ITS-G5 DSRC standard [9].
All the RSUs and OBUs communicate using ETSI's ITS-G5 standard [10] and broadcast a CAM every 10 ms.Our setup is built upon the IEEE 802.11pDSRC physical layer as described before.On top of that, a MAC and Logical Link Control (LLC) layers act as the adaptation between the physical and the upper layers.Later, the Basic Transport Protocol (BTP) operating on top of the MAC and the LLC, provides an end-to-end connection-less transport service in an ad-hoc fashion.The BTP is also responsible for multiplexing/demultiplexing packets originating from/to the Facilities layer [10].The exchanged CAMs are generated at the level of the Facilities layer.During our trials, we only referred to CAMs carrying positioning information (namely, position, speed and heading) pertaining to each OBUs and RSUs -obviously, in the case of the RSUs, both speed and heading is set to zero.For our research activities, we extended the standard CAM format described in [6] by adding two extra fields, namely, the Timestamp and the Sequence number.The usage of  these fields has been described in [8], and it allowed us to reconcile the logged data during the data processing phase.All the fields encapsulated within the exchanged CAMs can be seen in Fig. 2(a).
Each transmitted and received CAM has been recorded both in the CSV and PCAP traces.
When a packet appears in the transmission or reception queue of a transceiver, a decision is made whether it is going to be recorded or not.The flowchart describing that, can be seen in Fig. 2(b).For the sake of limiting the size of the dataset, only the CAMs exchanged were recorded in the traces (Fig. 2(b)-i).The rest of the wireless networks interactions within our nodes are discarded.The validity of CAMs within the queues is then checked (Fig. 2(b)-ii) by evaluating the length of the packet, and the expected fields based on the legacy CAM structure.For each valid CAM, the positioning information is extracted (Fig. 2(b)-iii).Subsequently, duplicated packets are then removed from the transmission/reception queue (Fig. 2(b)-iv).Whenever, the CAM is not valid or duplicates, the messaged is not recorded in the log files.Otherwise, the message is logged (Fig. 2(b)-v).
In particular, for what concerns an RSU, raw data recorded during a specific day of trial and pertaining to a specific transceiver, is recorded under the following filename and directory naming convention: ./rawData/ddmmyyyy/RSUs-ddmmyyyy/log_XX_YY.txt and ./rawData/ddmmyyyy/RSUs-ddmmyyyy/log_XX_TCP_YY.pcap where "ddmmyyyy" signifies the date when the experiment took place, "XX" can either be "HP" or "LP" depending on the transceiver under consideration, and "YY" refers to one of the four RSUs (namely, "MVB", "DH", "HW" or "SU").Thus, for instance, files "./rawData/11022019/RSUs-11022019/log_HP_MVB.txt" and "./rawData/11022019/RSUs-11022019/log_HP_TCP_MVB.txt"refer to the HP transceiver of the RSU located on the MVB site during the 11 th February 2019.A similar conversion has been adopted for recording raw data associated with the OBUs, which is defined as follows: Table 3: Naming conventions and per-file content description for the filtered part of our dataset.
The raw data part of our dataset has been filtered by means of our MATLAB framework [5].
The output of the data filtering operations has been made available under the directories "importedData/ddmmyyyy", where files follow the naming convention summarized in Table 3.
Our MATLAB framework [5] can also be used to investigate the following Key Performance Indicators (KPIs).For simplicity, a representative subset of figures was chosen for each KPI: • Heatmaps for the PDR values associated with the HP/LP transceivers across the roads forming the FLOURISH Testing Track.A sample of the sixteen resulting figures is shown in Figs.3-6.A CAM is considered as successfully delivered, when all the encapsulated sensor information can be successfully extracted (as shown in Fig. 2(b)).
• Awareness horizons for both HP and LP transceivers showing the PDR as a function of the distance to a given RSU.A sample of the sixteen resulting figures is shown in Figs.7-8.The successful CAM reception is considered as in the heatmaps KPI.
• RSSI value for both HP and LP transceivers as a function of the distance to a given RSU.A sample of the thirty-two resulting figures is shown in Figs.9-10.
Our framework can be used to generate the entire set of figures for all the above-mentioned KPIs.It can also be further extended to accommodate or more specific KPIs pertaining to the network interactions at specific road junctions at specific time of the day, or the network interactions among moving vehicles.As for the latter, for convenience, our MATLAB framework [5] includes a complementary script (namely, "v2vInteractions/v2vTimings.m") to locate the the timestamps when each vehicle received a CAM from another one, and saves them into MATLAB table.
Our MATLAB framework also gives the reader the option of filtering the entire dataset in order to retrieve all the network interactions between the pair of OBUs considered in our experiments.In particular, when the script "v2vInteractions/v2vInteractions.m" is executed, for each day and transceiver, it produces a table-formatted MAT file defined by the columns summarized in Table 1 where each row corresponds to a transmitted/received CAM.For the fields that exist on both the TX and the RX sides (for e.g. the "GpsLon"), we distinguish their values in the table by adding either "-RX" or "-TX" at the column name1 (for e.g."GpsLon-TX" and "GpsLon-RX").Similarly, the reader can develop a filtering script to retrieve the network interactions pertaining to a given OBU and RSU.The starting point can be any of the existing scripts under the folder "scriptResults" folder that can be modified as follows: 1) Modifying the two main loops of the script, namely, the loops associated with the day and the transmission power.The user can then choose to evaluate the results for a specific day or TX power.Based on that, our framework will load only the network interactions related to the chosen day and TX power.
2) Modifying the internal loops of our script, namely, the ones related with the OBUs and the RSUs.One or more specific pairs of OBU/RSU can be chosen to be processed.
3) Similarly to the "v2vInteractions.m"script, the unique sequence number for the transmitted packets per session, as well as the MAC address from the transmitter can be parsed at first.
4) Using the above information, it can be identified whether a packet is successfully delivered or not, i.e., checking that a packet with the given MAC address and sequence number exists in the list of received packets.

5)
Having this pairwise correlation between transmitted and received packets, a table can be later generated incorporating all the given interactions between a set of devices (as in "v2vInteractions.m").
6) The above steps can be replicated for the requested number of OBU-RSU pairs generating a list of all the desired network interactions.
StationIDID of the transmitter/receiverGenDeltaTimeThe remainder of T imestamp/65546[6] SeqNum Sequence number of the transmitted/received CAM[7] GpsLon Longitude of the current position of the transmitter/receiver GpsLat Latitude of the current position of the transmitter/receiver CamLon Longitude of the position of the transmitter/receiver as encapsulated in a CAM CamLat Latitude of the position of the transmitter/receiver as encapsulated in a CAM SpeedConf Encoding configuration for the speed [6] VehHeading Heading of the transmitter GpsSpeed Speed of the transmitter/receiver CamSpeed Speed of the transmitter as encapsulated in a CAM Timestamp Timestamp at the moment of creation of a CAM CamLength Byte-length of a CAM RSSI The RSSI value of a received CAM Experimental Design, Materials, and Methods

Figure 1 :
Figure 1: The Flourish Testing Track and the locations of the four RSUs.
The flowchart showing the decisions for recording a CAM within the dataset.

Figure 2 :
Figure 2: Considered CAM and flowchart of the data logging process.When a packet is received in the TX or the RX queue, it is checked whether it is CAM message at first.Later, the validity of the packet is specified before being recorded in our dataset.

Figure 3 :
Figure 3: Heatmaps for the PDR associated with the first day of trials (DSRC band).

Figure 4 :
Figure 4: Heatmaps for the PDR associated with the third day of trials (ISM band).

Figure 5 :
Figure 5: Heatmaps for the RSSI values (expressed in dBm) associated with the first day of trials (DSRC band).

Figure 6 :
Figure 6: Heatmaps for the RSSI (expressed in dBm) associated with the fourth day of trials (ISM band).

Figure 7 :
Figure 7: Awareness horizon associated with the first day of trials (DSRC band) and the OBU-00.Each group of four figures shows the awareness horizon associated with the RSU at sites MVB, DH, HW and SU.

Figure 8 :
Figure 8: Awareness horizon associated with the third day of trials (ISM band) and the OBU-00.Each group of four figures shows the awareness horizon associated with the RSU at sites MVB, DH, HW and SU.

Figure 9 :
Figure 9: RSSI measurement as a function of the distance, associated with the first day of trials (DSRC band) and the RSU at the DH site.

Figure 10 :
Figure 10: RSSI measurement as a function of the distance, associated with the fourth day of trials (ISM band) and the RSU at the HW site.
A typical urban setting in Bristol, UK, comprising the following roads: Marlborough Street, Upper Maudlin Street, Park Row, Woodland Road and Queens Road. //arxiv.org/abs/1904.00464.

Table 2 :
The center frequency, transmission power and MCS used throughout the four days of trials.