A dataset of urban traffic flow for 13 Romanian cities amid lockdown and after ease of COVID19 related restrictions

This dataset comprises street-level traces of traffic flow as reported by Here Maps™ for 13 cities in Romania from 15 May 2020 until 5 June 2020. This period starts two days before the lifting of the mobility restrictions imposed by the COVID19 nationwide State of Emergency and ends four days after the second wave of relaxation, announced for 1 June 2020. Data were sampled at a 15-min interval, consistent with the Here API update time. The data are annotated with relevant political decisions and religious events that might influence the traffic flow. Given the relative scarcity of real-life traffic data, the dataset can be used for micro-simulation during the development and validation of Intelligent Transportation Systems (ITS) algorithms. It can also serve the social and political sciences when discussing the effectiveness and impact of statewide restrictions during the COVID19 pandemic.


Specifications
Subject: Transportation
Specific subject area: Traffic flow demand data
Type of data: Table; Figure; CSV data files
How data were acquired: Software application (available in the dataset, as part of the article), developed in Python, using the Here API for gathering raw data regarding live traffic and a set of custom scripts for cleaning the data (detailed below) and plotting visual representations of the instantaneous traffic flow. Hand annotation was used to provide supplementary information on specific events regarding the national policy against COVID19 and to describe the cities.
Data format: Raw; Preprocessed; Annotated
Parameters for data collection: The dataset covers the period from 15 May 2020 until 5 June 2020, with a sampling period of 15 minutes, using the standard Here Maps Traffic API.
Description of data collection: Three software scripts are used. The first is responsible for job automation and runs the grabbing script at a 15-minute interval. The grabbing script in turn launches the API requests for each of the cities and writes the XML files with raw data to disk. Later, the third script iterates over the XML files and extracts the road information and traffic flow data, discarding the geometrical properties of the road.

Value of the Data
• There is a scarcity of available data regarding traffic flow and road use demand. Even if larger cities in highly developed nations have near real-time data from ITS systems, in other cases such data are practically impossible to gather with good quality and at reasonable cost. This dataset covers a broad range of demands and loads, from almost empty roads (during COVID19 restrictions) up to full traffic (after the second set of relaxation rules).
• This dataset is directly useful for practitioners in the field of ITS systems design, for assessing transportation capacity and developing algorithms and policies for congestion prediction and mitigation, and also for sociologists researching the impact of COVID19 restrictions and the public reaction to the restrictions and to their gradual lifting.
• The main usage of the data, in the field of ITS, is to provide real-life data from a variety of Romanian cities (ranging from small to large in population, area and road network size), useful for training machine learning algorithms for congestion prediction and for simulating the impact of traffic incidents on the traffic flow. Practitioners in the field of social sciences can benefit from the data in the analysis of specific reactions of the population to COVID19 restrictions.
• Descriptive statistics could be used for simple analysis of the data and for detecting anomalies in the traffic flow, which in turn can be used for inferring hidden events such as an incident on a minor street that feeds a major artery.
• Machine learning methods and tools can be used for identifying signature features of traffic flow that predict congestion, with high spatial resolution.
• Qualitative analysis of the impact of COVID19 transportation restrictions can be made, with ramifications for both the economic sector and the epidemiological one.
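As a minimal sketch of the descriptive-statistics use mentioned in the list above, the following Python snippet flags samples whose speed deviates strongly from the mean of the same street. The sample values, the function name flag_anomalies and the threshold of two standard deviations are assumptions made for illustration only; they are not part of the dataset or of the authors' tooling.

```python
from statistics import mean, pstdev


def flag_anomalies(speeds, threshold=2.0):
    """Return the indices of samples whose z-score magnitude exceeds `threshold`."""
    mu = mean(speeds)
    sigma = pstdev(speeds)  # population standard deviation of the series
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, s in enumerate(speeds) if abs(s - mu) / sigma > threshold]


# Hypothetical speed samples (km/h) for one street, one value per 15-min interval.
samples = [42.0, 40.5, 41.2, 43.1, 12.0, 41.8, 40.9, 42.4]
print(flag_anomalies(samples))
```

A sudden drop such as the fifth sample is the kind of signature that could hint at an unreported incident. A real analysis would likely compare each sample against the same time of day on other days rather than against a global mean.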

Data Description
Within the field of Transportation, Intelligent Transportation Systems (ITS) form a distinct subfield, characterized by the use of methods and tools from computation, mathematics and control theory for maximizing the usability of the existing infrastructure (transportation capacity and quality) or for supporting the decision to develop new infrastructure [1]. One of the important current topics in this field is congestion prediction [2, 3], and many of the approaches rely on machine learning to leverage the value of the past (historic data) in order to predict the future (when congestion will arise) [4]. Another subject of interest, directly connected to the problem of congestion, is traffic incident management [2, 5]. Many of the rules, policies and systems are designed for, and work well in, stable nominal conditions (when all the participants obey the traffic laws and everything works as intended). Analysis of the root causes of major gridlocks showed that the complex dynamics of road traffic allow minor incidents (e.g. a car not giving way when changing lanes) to become major sources of trouble, lasting dozens of minutes and spanning a radius of a few blocks (hundreds of meters) [6].
Both problems can be addressed in a virtual environment using what is called traffic micro-simulation [7]. When fed high-quality data and a good description of the existing infrastructure, current software tools for micro-simulation are capable of mirroring actual traffic conditions over a time span ranging from dozens of minutes to hours [2]. The topology of the road infrastructure, the placement of road signs and the traffic signaling plans are core components of the simulation scenarios and can be obtained either from local authorities or from open data [4, 8], together with some initial legwork (for collecting data regarding signaling plans). The missing component is the actual conditions on the road. These can be obtained from the existing infrastructure (car counting loops and equipment), which is costly to deploy and provides low spatial resolution, or by deploying human observers to make assessments, which is costly and provides low temporal resolution [2, 4].
Over the last decades, with the development of mobile applications aimed at assisting drivers on the road, a new set of sources has appeared in the form of traces from the mobile devices of drivers (or passengers), but most of these sources still do not provide means of accessing historical data [9]. Major players in the field provide current data inside their applications, and most of the time historic data are provided in an aggregate manner, which suffices for the average user but is not of good enough quality for practitioners in the field of ITS [10, 11].
We selected Here Maps (TM) [10] for gathering data because it provides data access via an API, allowing scripted automation, and because automated collection of the data is allowed by its Terms and Conditions. Data provided by the API always describe current conditions, but they may be inferred by the Here Maps engine when the actual number of traffic participants is low; this is expressed by the Confidence Level (see below) [10]. We chose a sampling frequency of 4 times per hour (once every 15 min) based on empirical observations of how often the data change and on limitations of the software license we used. A sampling period shorter than 5 min is not useful because the Here Traffic API does not update the data that often.
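To illustrate the sampling scheme, the sketch below computes the next sampling instant on a 15-min grid. Aligning the samples to quarter hours counted from midnight is an assumption made for this example; the text only states the sampling period, not how the job automation script aligns its runs. The name next_sample_time is hypothetical.

```python
from datetime import datetime, timedelta

SAMPLING_PERIOD = timedelta(minutes=15)  # 4 samples per hour, as stated in the text


def next_sample_time(now, period=SAMPLING_PERIOD):
    """Return the first instant strictly after `now` that lies on the period grid.

    The grid is counted from midnight of the same day (an assumption of this sketch).
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    steps = (now - midnight) // period + 1  # whole periods elapsed, plus one
    return midnight + steps * period


print(next_sample_time(datetime(2020, 5, 15, 10, 7, 30)))
```

A scheduler loop could sleep until the returned instant and then launch the grabbing script once per city.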
Each of the cities was defined through a rectangular bounding box with geo-coordinates described in Table 1 .
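A rectangular bounding box of this kind can be represented as in the sketch below. The class name BoundingBox, its field names and the coordinates are hypothetical and purely illustrative; the actual values for each city are those listed in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Rectangular area given by its northern/southern latitudes and
    western/eastern longitudes, in decimal degrees."""
    north: float
    west: float
    south: float
    east: float

    def contains(self, lat, lon):
        """Return True when the point (lat, lon) lies inside or on the box."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Illustrative coordinates only, not taken from Table 1.
example_box = BoundingBox(north=44.55, west=25.95, south=44.35, east=26.25)
print(example_box.contains(44.43, 26.10))
```

Such a check can be useful when combining the flow records with other geo-referenced data, such as weather stations, to decide which city a point belongs to.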
The time span covered by this dataset ranges from 15 May 2020 until 5 June 2020, during the mobility restrictions imposed by Romanian authorities for containing the COVID19 pandemic, and provides the opportunity to capture a diverse and broad spectrum of scenarios in terms of traffic demand. The cities for which we provide traffic data also represent a diverse set in terms of demographics, urban development and geographical placement in Romania. A detailed description is provided in Table 1.
Table 1. List of cities comprising the dataset, the coordinates of the corresponding bounding boxes and some remarks on the inclusion of these cities in the set.
The dataset consists of three parts:
For a better interpretation of the flow data stored in the .csv file, Table 2 provides a detailed description of the fields extracted from the Here Maps Traffic API, with comments on the semantics and calculation of the fields. More in-depth documentation can be found in [10, 12].
For a more complete, in-depth analysis that takes into account the context of the data (the transportation and traffic restrictions imposed at the national level because of the SARS-CoV-2/COVID19 pandemic), Table 3 presents the most important events with an impact on the traffic flow. Users of the dataset can augment these data with supplementary data (such as weather), depending on their own avenue of investigation.

Experimental Design, Materials and Methods
The parse.py script takes as its first command-line argument the basePath holding the XML files produced by multiple runs of grab.py. The second command-line argument is the label of the city (city name), and the third is the base folder path for the output (where the post-processed CSV is to be stored).
For each of the XML files found in the basePath, the script extracts the metadata encoded in the file name (city, date and time) and iterates over the <FI> items, extracting the relevant traffic flow information. The structure of the data is described in the Data Description section. As a defensive programming measure, values are checked for None and defaults are stored whenever the actual data are corrupted or missing (e.g. when the "DE" field, representing the street/road name, is missing, it is replaced by "N/A"). For each folder (set of records about a specific city), parse.py produces a concatenated .csv file with all the available records, one per line. These per-city files represent the core element of this dataset and are provided in the data repository both separately for each city and as an archive with all the cities and all the records. The data regarding the shapes of the roads are discarded in the CSV files but remain available in the raw XML files, stored under the ./raw path in the dataset.
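The extraction step described above can be sketched as follows. This is not the parse.py shipped with the dataset, only a minimal illustration under stated assumptions: the sample XML is invented, it carries no XML namespace, and all attribute names other than "DE" (here LE, CN, SP, JF) as well as the output column names are assumptions. Decoding the city, date and time from the file name is left out because the naming pattern is not given here, so these values are passed in directly.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample; element layout and attributes other than DE are assumptions.
SAMPLE_XML = """<FIS>
  <FI><TMC DE="Example Street" LE="0.8"/><CF CN="0.9" SP="32.5" JF="1.2"/></FI>
  <FI><TMC LE="0.4"/><CF CN="0.7" SP="18.0" JF="4.5"/></FI>
</FIS>"""

FIELDS = ["city", "timestamp", "street", "speed", "jam_factor", "confidence"]


def get_attr(element, name, default="N/A"):
    """Read an attribute defensively: a missing element or attribute yields `default`."""
    if element is None:
        return default
    value = element.get(name)
    return default if value is None else value


def extract_rows(xml_text, city, timestamp):
    """Turn every <FI> item into one flat record, dropping any road geometry."""
    rows = []
    for fi in ET.fromstring(xml_text).iter("FI"):
        tmc, cf = fi.find("TMC"), fi.find("CF")
        rows.append({
            "city": city,
            "timestamp": timestamp,
            "street": get_attr(tmc, "DE"),
            "speed": get_attr(cf, "SP"),
            "jam_factor": get_attr(cf, "JF"),
            "confidence": get_attr(cf, "CN"),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the records as CSV text, one record per line after the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_XML, "ExampleCity", "2020-05-15 10:15")
print(rows_to_csv(rows))
```

The second <FI> item has no "DE" attribute, so its street name falls back to "N/A", mirroring the default-value behaviour described in the text.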

Ethics Statement
This work did not include any human subjects nor animal experiments.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.