Escalating environmental inequalities in larger European regions: A data mining

Environmental inequality has been a focus of European scientists and policymakers in recent decades. This database provides researchers with a multivariate dataset on environmental inequality across multiple spatial scales, i.e. the so-called NUTS region levels. It offers the population-weighted average and Gini coefficient at four European NUTS region levels (NUTS 0, NUTS 1, NUTS 2, NUTS 3) for exposure to air pollution (NO2, O3, PM10, PM2.5), summer land surface temperature (LST), and tree and non-tree vegetated surfaces. The dataset can be used to compare and map the magnitude of inequalities related to each of the environmental hazards/services. Furthermore, it helps to identify the scale levels with the highest and lowest environmental inequality. To this end, this manuscript provides histograms and maps that illustrate potential uses of the database.


Value of the Data
• The database offers the possibility of comparing different types of environmental inequalities (air pollution, land cover, and surface temperature) at different levels of scale, i.e. the so-called European NUTS 0, NUTS 1, NUTS 2, and NUTS 3 levels. In this respect, the database offers a novel possibility for multivariate, multiscale environmental inequality studies at the continental scale.
• The database enables human geographers, environmental scientists and regional studies experts to examine to what extent environmental inequalities are associated with geographical factors such as demography, urbanity and spatial distribution of urban-rural areas, climate types, and land cover. The database also allows for studying the associations between environmental inequalities and other kinds of inequality, namely income, gender and ethnic inequalities, across the European regions.
• The database can be further used by health researchers and spatial planners to identify the regions with high exposure to hazards such as air pollution and surface temperature and those deprived in terms of access to green spaces. This can pave the way for further spatial interventions, e.g., land use or transport modification and provision of health services.

Data analysis
The database comprises measures of environmental inequality in exposure to air pollution and land surface temperature in the EU regions. Environmental inequality is measured using the population-weighted Gini coefficient. To calculate the population-weighted Gini coefficient in region $g \in [1, G]$, the 1 km x 1 km raster data on population and environmental hazards/services (air pollution, land surface temperature, presence of trees, or presence of non-tree vegetated surfaces), hereinafter called cells, are used. In the first step, the cells are ranked based on their population, with $r \in [1, R]$ denoting the rank of a cell. $P_{gr}$ represents the population of the cell ranked $r$ in region $g$, and $W_{gr}$ is the weight of that cell (see Eq. (1)).
$$W_{gr} = \frac{P_{gr}}{\sum_{k=1}^{R} P_{gk}} \tag{1}$$

Subsequently, cell weights are used to calculate the population-weighted average of environmental hazard/service exposure in region $g$ (see Eq. (2)):

$$\bar{H}_g = \sum_{r=1}^{R} W_{gr} H_{gr} \tag{2}$$

where $H_{gr}$ is the magnitude of the hazard in the cell ranked $r$ in region $g$. (Note that the ranks are based on cell population.) Adapted from Lerman and Yitzhaki [1], the population-weighted Gini coefficient, i.e. the measurement of hazard-exposure inequality, is calculated as follows (Eq. (3)):

$$G_g = \frac{2}{\bar{H}_g} \sum_{r=1}^{R} W_{gr} \left(H_{gr} - \bar{H}_g\right)\left(\hat{F}_{gr} - \bar{F}\right) \tag{3}$$

where $G_g$ represents the population-weighted Gini coefficient of the environmental factor in region $g$, $\hat{F}_{gr}$ is the weighted cumulative distribution of the environmental factor (Eq. (4)), and $\bar{F}$ is the average of $\hat{F}_{gr}$:

$$\hat{F}_{gr} = \sum_{k=0}^{r-1} W_{gk} + \frac{W_{gr}}{2}, \qquad W_{g0} = 0 \tag{4}$$
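The calculation in Eqs. (1) to (4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used to build the database: the function name `weighted_mean_and_gini` is hypothetical, and the sketch assumes that cells are ordered by hazard value, as in the standard Lerman and Yitzhaki estimator, and that the weighted mean is positive. A different ordering can be substituted by changing the sort key.

```python
import numpy as np


def weighted_mean_and_gini(hazard, population):
    """Population-weighted mean and Gini coefficient of a hazard over cells.

    Hypothetical helper for illustration; assumes a positive weighted mean.
    """
    hazard = np.asarray(hazard, dtype=float)
    population = np.asarray(population, dtype=float)

    # Unpopulated cells carry zero weight, so they can be dropped up front.
    keep = population > 0
    hazard, population = hazard[keep], population[keep]

    # Assumption of this sketch: cells are ordered by hazard value.
    order = np.argsort(hazard, kind="stable")
    hazard, population = hazard[order], population[order]

    weights = population / population.sum()        # Eq. (1)
    mean = np.sum(weights * hazard)                # Eq. (2)
    cum = np.cumsum(weights) - weights / 2.0       # Eq. (4)
    cum_mean = np.sum(weights * cum)               # weighted average of Eq. (4)
    gini = 2.0 * np.sum(
        weights * (hazard - mean) * (cum - cum_mean)
    ) / mean                                       # Eq. (3)
    return mean, gini


# Two cells: hazard 10 with 3 inhabitants, hazard 30 with 1 inhabitant.
mean, gini = weighted_mean_and_gini([10.0, 30.0], [3, 1])
```

For this two-cell example the weighted mean is 15 and the Gini coefficient is 0.25, the same value as the unweighted Gini coefficient of the four individual exposures (10, 10, 10, 30), which is what population weighting is meant to reproduce.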
The population-weighted inequalities of each environmental hazard/service are calculated at four scales, the so-called NUTS region levels (Fig. 2). The GIS file with the boundaries of the regions (.shp) is saved as NUTS_RG_20M_2021_3035.shp [9]. In detail, the database provides insights into variations of inequality across the levels of scale (Fig. 5).
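As a rough illustration of the multiscale calculation, the sketch below groups a handful of cells by region at each of the four NUTS levels. It relies on the hierarchical structure of NUTS codes, in which the two-letter country code (NUTS 0) is extended by one character per level. The cell values, the tuple layout, and the helper names are hypothetical, and the sketch assumes each cell carries its NUTS 3 code; the Gini helper orders cells by hazard value, which is an assumption of this example.

```python
from collections import defaultdict

import numpy as np


def weighted_mean_and_gini(hazard, population):
    """Population-weighted mean and Gini coefficient (hypothetical helper)."""
    hazard = np.asarray(hazard, dtype=float)
    population = np.asarray(population, dtype=float)
    order = np.argsort(hazard, kind="stable")  # assumption: order by hazard
    hazard, population = hazard[order], population[order]
    weights = population / population.sum()
    mean = np.sum(weights * hazard)
    cum = np.cumsum(weights) - weights / 2.0
    gini = 2.0 * np.sum(weights * (hazard - mean) * (cum - 0.5)) / mean
    return mean, gini


# Made-up cells: (NUTS 3 code, population, NO2 concentration).
cells = [
    ("DE111", 100, 20.0),
    ("DE111", 300, 30.0),
    ("DE112", 200, 25.0),
    ("DE211", 400, 15.0),
    ("FR101", 250, 40.0),
    ("FR101", 250, 10.0),
]


def summarise(cells, level):
    """Summarise cells per region at the given NUTS level (0 to 3)."""
    groups = defaultdict(list)
    for nuts3, pop, value in cells:
        # A NUTS code at level n has 2 + n characters.
        groups[nuts3[: 2 + level]].append((value, pop))
    summary = {}
    for code, pairs in groups.items():
        values, pops = zip(*pairs)
        summary[code] = weighted_mean_and_gini(values, pops)
    return summary


# One dictionary of {region code: (mean, gini)} per NUTS level.
summary = {level: summarise(cells, level) for level in range(4)}
```

Comparing `summary[0]` with `summary[3]` for the same country shows how the measured inequality changes with the size of the region over which it is computed.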

Experimental Design, Materials and Methods
The workflow to generate the database is as follows. The raw raster data are downloaded and collected from different sources. The resolution of all the raster files is set at 1 km x 1 km, and pixels are "snapped" to the population raster to obtain an exact spatial match. When the raster values need to be aggregated from 250 m x 250 m to 1 km x 1 km, the mean value of the pixels is used. To do so, the "Resample" function of ArcGIS Pro 2.9 is used. Subsequently, the "Sample" function in ArcGIS Pro produces a data file (.dbf) with the values of all overlapping pixels. Each row of the data file includes the (x, y) coordinates of the centroid of the pixel in question, the identification codes of the European regions the pixel is located in, and the population and environmental hazards/services values. The data file is imported in Python and, using the weighted Gini function of the "inequalipy" package, the population-weighted Gini coefficient and the population-weighted average of the environmental hazards/services are obtained.
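The aggregation from 250 m to 1 km by the mean can be pictured with a small NumPy sketch. It is a stand-in for illustration only and does not reproduce the ArcGIS Pro tool: the function name `aggregate_by_mean` is hypothetical, and the sketch assumes the fine grid is already aligned ("snapped") to the coarse grid so that each 1 km cell covers exactly 4 x 4 fine pixels.

```python
import numpy as np


def aggregate_by_mean(raster, factor=4):
    """Average non-overlapping factor x factor blocks of a 2-D raster.

    Hypothetical helper; assumes the fine grid is aligned with the coarse one.
    """
    raster = np.asarray(raster, dtype=float)
    rows, cols = raster.shape
    if rows % factor or cols % factor:
        raise ValueError("raster shape must be a multiple of the factor")
    # Split each axis into (coarse index, position inside the block),
    # then average over the two within-block axes.
    blocks = raster.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# An 8 x 8 grid of 250 m pixels becomes a 2 x 2 grid of 1 km cells.
fine = np.arange(64, dtype=float).reshape(8, 8)
coarse = aggregate_by_mean(fine)
```

The sketch ignores missing values; a raster with no-data pixels would need a masked or NaN-aware mean instead.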
The database can be used to study the impact of unequal exposure to heat and different land cover types on energy consumption (similar to [10, 11, 12]), energy poverty (similar to [13, 14]), environmental inequality among socioeconomic groups (similar to [15, 16]), and distribution of electric vehicles to reduce air pollution inequality (by approaches similar to [17, 18, 19]).

Fig. 1. The data on population and environmental hazards/services.

Fig. 2. Inequality analyses are conducted in four scales, i.e. the so-called NUTS region levels.

Fig. 4. Comparison of inequalities of the environmental hazards/services at different regional levels.