COVID-19 in Europe: Dataset at a sub-national level

The COVID-19 pandemic has strained health care systems, economies, and governments worldwide. In response, a major global effort has been mounted to collect, analyze, and make data publicly available. However, many existing public COVID-19 datasets (i) are aggregated at country level, and (ii) tend not to couple COVID-19-specific data with socio-demographic, economic, public policy, health, pollution and environmental factors, all of which may be key to studying the transmission of SARS-CoV-2 and its severity. To aid the evaluation of the determinants and impact of the COVID-19 pandemic at a large scale, we present a new dataset with socio-demographic, economic, public policy, health, pollution and environmental factors for the European Union at the small-region level (NUTS3). The database is freely accessible at http://dx.doi.org/10.17632/2ghxnrkr9p.4. This dataset can help to monitor COVID-19 mortality and infections at the sub-national level and enable analyses that may inform future policymaking.



Value of the Data
• This dataset is a useful input for improving the understanding of the inter-relationships between COVID-19 mortality and infections and socio-demographic, economic, public policy, health, air pollution and environmental factors at the finest possible spatial (NUTS2-3) and temporal (daily, weekly, monthly) resolutions, in support of the fight against the pandemic across Europe.
• The beneficiaries of these data are the general public, policy-makers, organizations, and researchers who deal with the spread of COVID-19 from the local (sub-country) to the large (continental) scale. These data can be used: (i) to conduct cross-comparisons between European countries at either NUTS2 or NUTS3 level, (ii) to inform European citizens about the spread of COVID-19 in Europe, and (iii) to support researchers in future socio-epidemiological research.
• It can be combined with survey or census health data for a wide range of applications. The dataset contributes to a better scientific understanding of the COVID-19 outbreak and facilitates the search for science-driven solutions.

Data Description
In Table 1, we present several key variables of this dataset: the health data on COVID-19 cases, mortality, and tests performed at sub-national level (NUTS3), collected until August 31st, 2020. Furthermore, we include in Table 2 a set of variables capturing non-COVID-19-related health aspects that might predispose people to infection and/or increase the risk of complications when infected with SARS-CoV-2, i.e. chronic obstructive pulmonary disease (COPD), diabetes and smoking. In addition, we add the mortality rates for respiratory and cardiovascular causes and for diabetes. This dataset also includes physician density and (where available) the number of beds in intensive care and/or reanimation units available in hospitals at NUTS2 level.

Table 3 describes the socio-demographic and economic data available at NUTS3 level for all European countries (source: Eurostat). These data comprise the population density, the population growth, and the surface area of the region. In addition, we provide the population split into five age groups, as well as the percentage of the population aged above 60 and the percentages of females and males in the population. We also include variables capturing the number of households and dwellings at NUTS3 level. The economic data refer to the unemployment rate at NUTS2 level and the nightlight intensity, for which we collected the 2016 average at NUTS3 level.

Table 4 includes the environmental variables. For these variables, we collected annual averages over a period of 16 years, which were then averaged and aggregated at NUTS3 level.

Table 5 refers to the variables tracking the public policies put in place by authorities to mitigate the spread of the virus (i.e. lockdown measures). We calculated the number of days from the first reported case to the first day of lockdown, as well as the duration of lockdown in each country. Furthermore, we add a variable describing the lockdown severity in each country.

Note: The sources of the COVID-19 and socioeconomic variables are given in the database [1]. Where a spatial resolution is listed for a variable, it refers to the resolution at which the source dataset was downloaded; our dataset contains the same variable aggregated at NUTS3 level, and the data sources of these variables are also given in the dataset [1].

All tables include four more variables: COUNTRY, CODE_COUNTRY, NUTS3, and CODE_NUTS3. COUNTRY is the name of the country and NUTS3 the name of the sub-region; CODE_COUNTRY is the letter code of each country (e.g., LUX for Luxembourg), and CODE_NUTS3 is the classification code of each NUTS3 sub-region. In some open sources for COVID-19, however, the data were available only at NUTS2 level; we therefore also include these data at NUTS2 level.
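The two lockdown timing variables described above can be sketched as simple date arithmetic. This is only an illustration: the function name, the dictionary keys and the dates are hypothetical and are not taken from the dataset, and the sketch assumes that the lockdown end date is not counted as a lockdown day.

```python
from datetime import date


def lockdown_timing(first_case: date, lockdown_start: date, lockdown_end: date) -> dict:
    """Days from the first reported case to lockdown, and lockdown length in days.

    Assumption: lockdown_end is the first day without lockdown (end exclusive).
    """
    return {
        "days_to_lockdown": (lockdown_start - first_case).days,
        "lockdown_duration": (lockdown_end - lockdown_start).days,
    }


# Hypothetical dates, for illustration only (not values from the dataset)
example = lockdown_timing(
    first_case=date(2020, 2, 1),
    lockdown_start=date(2020, 3, 15),
    lockdown_end=date(2020, 5, 10),
)
print(example)
```

If the end date should instead count as a lockdown day, the duration would be one day longer; the database documentation [1] is the place to check which convention applies.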
In Fig. 1, we present the relationship of a sample of variables of the dataset with COVID-19 mortality and positive cases. This figure is given as an example to illustrate the potential use and usefulness of this dataset.

Fig. 2. Workflow of the process of data collection and processing (adapted from [7]).

Experimental Design, Material and Methods
Following the outbreak of the novel coronavirus pandemic at the beginning of 2020, several countries around the world developed dashboards [4, 5] and open data sources [6] that provide open access to COVID-19 data in real time and/or over time (i.e., daily, weekly, monthly). These open sources aim to inform the population about the status of the pandemic and to help researchers understand the impact of the virus on our surroundings. However, COVID-19 dashboards generally provide aggregated data at the NUTS1 level and rarely at the sub-national levels (those from governmental agencies). To overcome this limitation, we collected COVID-19 data from multiple sources at the lowest possible administrative scale (NUTS2-3) and compiled them in one place. To build this dataset, we followed the workflow described in Fig. 2 [7]. This workflow is composed of several processes: data collection, processing/cleaning, analysis, and visualization. The resulting dataset is ready to use by a large community of researchers in a wide range of applications [1]. It contains 35 socio-demographic, economic, public policy, health, air pollution and environmental variables that can help researchers, practitioners, authorities, and others interested in this subject.
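Because some variables are available only at NUTS2 level while others are at NUTS3 level, combining them relies on the hierarchical structure of NUTS codes, in which a NUTS3 code extends its parent NUTS2 code by one character. A minimal sketch of such a join is given below. It assumes the standard Eurostat code format (e.g., ES300 within ES30); the function name, the `CODE_NUTS2`, `cases` and `unemployment_rate` field names and all values are hypothetical, and this is not necessarily how the dataset itself was assembled.

```python
def attach_nuts2_values(nuts3_rows, nuts2_values, key):
    """Copy a NUTS2-level value onto each NUTS3 row using the code prefix."""
    merged = []
    for row in nuts3_rows:
        # A NUTS3 code is its parent NUTS2 code plus one extra character
        nuts2_code = row["CODE_NUTS3"][:4]
        merged.append({
            **row,
            "CODE_NUTS2": nuts2_code,
            key: nuts2_values.get(nuts2_code),  # None when no NUTS2 value exists
        })
    return merged


# Hypothetical values, for illustration only (not taken from the dataset)
nuts3_rows = [
    {"CODE_NUTS3": "ES300", "cases": 120},
    {"CODE_NUTS3": "LU000", "cases": 15},
]
unemployment_nuts2 = {"ES30": 10.5}

rows = attach_nuts2_values(nuts3_rows, unemployment_nuts2, "unemployment_rate")
for row in rows:
    print(row)
```

Regions without a matching NUTS2 value are kept with a missing value rather than dropped, so gaps in the source data stay visible after the join.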
To visualize all the collected data at NUTS3 level (with both static and dynamic components), a web-based dashboard application was developed [2]. This application allows automatic processing of spatial raster and vector datasets to obtain relevant statistics (i.e., mean, minimum, maximum, and standard deviation). The application also interactively displays COVID-19 mortality and positive cases at the same time. The user can set the region of interest (i.e., country), the NUTS level (i.e., NUTS1-2-3), the type of pollutant (e.g., NO₂), the year and the desired statistic. A choropleth map is then generated, accompanied by a chart of the evolution of COVID-19 cases in the selected area. As an example, Fig. 3 shows the distribution of NO₂ across Europe during March 2020 at the NUTS3 level [8]. In addition, the dashboard generates charts showing temporal changes in COVID-19 mortality and positive cases, such as the example in Fig. 4, which shows the daily variation of COVID-19 positive cases in the Madrid NUTS3 region.
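The per-region statistics mentioned above (mean, minimum, maximum, and standard deviation of raster cells within each region) correspond to what is commonly called zonal statistics. The sketch below illustrates the idea on a tiny grid using only the Python standard library. It is not the dashboard's implementation, which is not described here: the function name, the zone labels and the cell values are hypothetical, the zones are assumed to be already rasterized onto the same grid as the data, and the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev


def zonal_statistics(raster, zones, nodata=None):
    """Group raster cells by zone label and summarize each group."""
    cells = {}
    for raster_row, zone_row in zip(raster, zones):
        for value, zone in zip(raster_row, zone_row):
            if zone is None or value == nodata:
                continue  # skip cells outside any region or without data
            cells.setdefault(zone, []).append(value)
    return {
        zone: {
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
            "std": pstdev(values),  # population standard deviation (assumption)
        }
        for zone, values in cells.items()
    }


# Hypothetical pollutant grid and region labels, for illustration only
raster = [[10.0, 12.0, 30.0],
          [14.0, 16.0, 34.0]]
zones = [["A", "A", "B"],
         ["A", "A", "B"]]

stats = zonal_statistics(raster, zones)
for zone, summary in stats.items():
    print(zone, summary)
```

In practice, real raster and vector layers would first need to be aligned to a common grid and coordinate reference system, which a geospatial library would normally handle.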

Ethics Statement
None.

Declaration of Competing Interest
The authors declare that they have no competing financial interests or personal relationships that could have influenced the work reported in this article.