ERA5-HEAT: A global gridded historical dataset of human thermal comfort indices from climate reanalysis

The mean radiant temperature (MRT) and the Universal Thermal Climate Index (UTCI) are widely used as human biometeorology parameters to assess


| INTRODUCTION
In recent years, the occurrence of extreme meteorological events with detrimental impacts on public health has uncovered the close linkages between human well-being and the environment. Heatwaves and cold spells, for instance, are nowadays well-known hazards associated with increased mortality and morbidity, especially in vulnerable population groups (Seltenrich, 2015). Heat-and cold-related fatalities occur as ultimate consequences of human body exposure to extreme thermal conditions. Every day, a range of environmental factors such as air temperature, humidity, wind and radiation (thermal environment) challenges the ability of the human body to maintain its core temperature within the range required for optimal comfort, performance and health (McGregor and Vanos, 2018). Exposure to extreme thermal environments can lead to the failure of this thermoregulatory mechanism. If it is too hot, for instance, the human body may produce or absorb more heat than it dissipates thus increasing its core temperature to values that can potentially cause discomfort, heat-related illnesses and ultimately death (Koppe et al., 2004).
In 2009, the Universal Thermal Climate Index (UTCI) was created as a multivariate parameter describing the synergistic heat exchanges between the thermal environment and the human body, namely its energy budget, physiology and clothing . The UTCI is based on the UTCI-Fiala model, which combines an advanced dynamic multi-node thermoregulation model of the human body with a state-of-the-art temperature-adaptive clothing insulation model for outdoor climates Havenith et al., 2012). For any combination of air temperature, wind, radiation and humidity, the UTCI is defined as the air temperature of a reference condition that would cause in the human body the same UTCI-Fiala model response as the real condition . The UTCI is classified into 10 thermal stress categories. Each stress category, defined by a specific range of UTCI values, corresponds to a specific set of human physiological responses to the thermal environment (Błażejczyk et al., 2013).
The UTCI is valid at all spatial and temporal scales from the micro to the macro and suitable for a variety of applications Staiger et al., 2019). It has been extensively used, for instance, to define the thermal environment and its variability under current and future climates (Błażejczyk et al., 2010;Bröde, Krüger, et al., 2012b;Coccolo et al., 2016;Kjellstrom et al., 2017;Di Napoli et al., 2018). It has been applied to investigate the negative impacts of heat and cold exposure on human comfort and health, also within vulnerable groups (Urban and Kyselý, 2014;Burkart et al., 2016;Krzyżewska et al., 2017;Romaszko et al., 2017Romaszko et al., , 2019Błażejczyk et al., 2018;Di Napoli et al., 2018Skutecki et al., 2019). It has also been implemented as a forecasting tool for weather conditions associated to thermal stress, such as heatwaves (Novák, 2013;Pappenberger et al., 2015;Heusinkveld et al., 2017;Leroyer et al., 2018;Vitolo et al., 2019). Further applications include tourism industry, outdoor recreational activities and occupational health (Ge et al., 2017;Nassiri et al., 2017;Honjo et al., 2018).
Despite the potential of the UTCI as a human biometeorology index for the assessment of human health-environment linkages, its calculation has mostly relied on data recorded at ground-based meteorological stations. There are four variables required for the computation of the UTCI: 2 m air temperature, 2 m dew point temperature (or relative humidity), wind speed at 10 m above ground level and mean radiant temperature (MRT). MRT is a critical physical quantity representing how human beings experience radiation and is being addressed in a variety of disciplines, from urban planning to public health and climate change, as a major component of thermal-related comfort Thorsson et al., 2014;Thorsson et al., 2017;Guo et al., 2020). Its estimation, however, is generally considered problematic (Kántor and Unger, 2011). In the past years, the development of specific modelling software (see e.g. Lindberg et al., 2008;Matzarakis et al., 2010) has eased the calculation of MRT from environmental variables, such as solar/thermal radiation and cloud cover, observed at meteorological stations or over limited areas. Recently, however, the computation of MRT as a spatially gridded quantity over the globe has been demonstrated by means of climate reanalyses (Di Napoli et al., 2020). A climate reanalysis combines model data with observations from across the world to provide a globally complete and consistent description of the Earth's climate and its evolution in recent decades. Differently from station data that are location specific, may suffer from temporal gaps and may provide a limited set of environmental variables, a climate reanalysis has the advantage of temporal and spatial completeness for multiple atmospheric and land-surface parameters (Parker, 2016). The importance of computing MRT from a climate reanalysis is twofold. First, it confirms that climate information can be used as an input data source to derive thermal stress-related parameters at the global scale and across historical periods (see e.g. Buzan et al., 2015). Second, because of its definition, it includes the effects of radiation which have been often ignored in the assessment of heat and cold stress (Buzan and Huber, 2020). This paper presents the first reanalysis dataset ever produced for two biometeorology-relevant indices: the UTCI and MRT. The dataset, called ERA5-HEAT (Human thEr-mAl comforT), uses environmental data from ERA5, the latest global climate reanalysis created by the European Centre for Medium-Range Weather Forecasts (ECMWF), to generate worldwide historical gridded maps of UTCI and MRT spanning the most recent decades. The automated routine implemented to generate ERA5-HEAT is described in detail in Section 2. The two reanalyses are compared and discussed | 3 DI NAPOLI et AL.
against data from ground-based meteorological stations in Section 3. Details on the data file format and how to access ERA5-HEAT from the Climate Data Store (CDS, developed as part of the Copernicus Climate Change Service C3S implemented by ECMWF, https://cds.clima te.coper nicus.eu/) are reported in Section 4. Finally, Section 5 illustrates potential applications and limitations of the dataset.

METHODS
The workflow of the automated routine implemented to create the dataset from climate reanalysis data is illustrated in Figure 1.

| Input file preparation
ERA5-HEAT is derived using climate variables from ECMWF ERA5 reanalysis (Hersbach et al., 2020). ERA5 incorporates vast amounts of quality-controlled historical observations into global estimates by means of advanced modelling and data assimilation systems. The result is a comprehensive record of the climate observed in recent decades. ERA5 provides consistent time series of multiple climate variables. Each time series is a sequence of spatial grids at 0.25° resolution (approximately 31 km at the equator) and hourly intervals. The following ERA5 climate variables are used in the computation of the dataset: 2 m air temperature (T a ), 2 m dew point temperature (T d ), wind speed at 10 m above ground level (va), solar radiation and thermal radiation at the Earth's surface. For each variable, an automated routine retrieves 12 hourly data grids at every model's initialization time (06:00 and 18:00 Coordinated Universal Time, UTC) and merge them into a stack. Stacks are in GRIB (GRIdded Binary) format, a WMO standard for the storage and distribution of gridded data. Their spatial extent covers all the globe (land and sea) north of 60°S latitude. The same routine also performs two additional tasks. One task consists in calculating relative humidity (RH) from T a and T d via the Clausius-Clapeyron relation (Alduchov and Eskridge, 1996). The second task computes the cosine of solar zenith angle, that is, the angle between the zenith and the centre of the Sun's disc. The cosine of solar zenith angle influences the amount of solar radiation reaching a standing person. Its calculation is performed as described by Di Napoli et al. (2020). These two tasks generate two gridded stacks-one for RH and one for the cosine of solar zenith angle. These, together with the stacks of T a , va, solar radiation and thermal radiation, are the input files to the calculation of the dataset.

| Calculation of indices
When all necessary input files are generated, the automatic routine triggers the calculation of MRT and UTCI reanalyses.

| MRT
MRT reanalysis is computed via a three-step algorithm. In the first step the surface projection factor f p is calculated. The surface projection factor represents the portion of body surface exposed to direct solar radiation. For a rotationally symmetric standing or walking person it is computed from the solar elevation angle γ as (Jendritzky et al. 1990).
where γ is the complementary angle to the solar zenith angle (γ in degrees). The second step computes the direct, diffuse and reflected part of both thermal and solar radiation, each composed of two components. For the thermal radiation: the downwelling thermal component from the atmosphere (L dn surf ) and the upwelling thermal component from the ground (L up surf ). For the solar radiation: a direct component from the sun (I * ) and a diffuse component, with the latter equal to the sum of the isotropic diffuse solar radiation flux (S dn,diffuse surf ) and the surface-reflected solar radiation flux (S up surf ). These five radiation components are computed from the solar and thermal data stored in ERA5. The details of this computation and of radiation sources are described in Di Napoli et al. (2020). In the third step, the surface projection factor and the five radiation components are input into the following formula to calculate MRT (Staiger and Matzarakis, 2010).
(1) f p = 0.308 cos (0.998 − 2 ∕50000) where σ is the Stefan-Boltzmann constant (5.67 × 10 −8 W/ m 2 K 4 ), α ir is the absorption coefficient of the body surface area irradiated by solar radiation (standard value 0.7), ε p is the emissivity of the clothed human body (standard value 0.97) and f a is an angle factor. The factor f a is set to 0.5 which corresponds to considering the surroundings of a human body as made of a lower hemisphere (ground) and an upper hemisphere (sky) only. This assumption is valid for most applications at the macro-scale, that is beyond urban level (Kántor and Unger, 2011), and it is consistent with the ERA5 spatial resolution of 31 km. The formula (2) generates a gridded stack for MRT where the layers correspond in number and time step to the layers from input files.

| UTCI
Once the MRT stack is ready, the computation of the UTCI starts. This is performed by an algorithm that reads in the gridded stacks retrieved for T a and va, and those computed for RH and MRT, and inputs them into an operational procedure which generates a UTCI gridded stack as output (Bröde, Fiala, et al., 2012a). The operational procedure computes the offset between the UTCI and T a via a six-order polynomial equation in the four parameters T a , MRT, va and RH. From the computational point of view, running the operational procedure is faster than solving the original Fiala multi-node model, thus providing an optimal tool for operational numerical weather forecasts and climate reanalysis applications. The procedure approximates the UTCI-Fiala model within an average root mean squared error of 1.1°C (Bröde, Fiala, et al., 2012a). The approximation is valid for input parameters values within specific ranges, namely • 2 m air temperature: −50°C ≤ T a ≤ + 50°C • Mean radiant temperature: −30°C ≤ MRT − T a ≤ + 70°C • Relative humidity: RH ≤ 100% equivalent to pa ≤ 5 kPa where pa is water vapour pressure in kPa.
These ranges are applied in the algorithm. Furthermore, the wind speed is capped at 17 m/s as higher speeds have been observed to generate extremely low UTCI values (Novák, 2013;Pappenberger et al., 2015). Should input parameters not satisfy the ranges above, such as in hurricane-like conditions (which are usually characterized by wind speeds above the 17 m/s threshold), UTCI values are considered undefined and set to NaN.

| Output file preparation
The automated routine generates two output files: a gridded MRT stack and a gridded UTCI stack. Each stack is composed by 12 hourly layers with time steps starting at each model's initialization time (06:00 and 18:00 UTC). Stacks are routinely archived on an internal ECMWF database in GRIB. They are then transformed into NetCDF (Network Common Data Form) in compliance with the format standard adopted by CDS. To ease transmission and retrieval from the database stacks are packaged in files of 24 hourly layers covering the full day period (00-23UTC). Layers are defined on a regular grid in the World Geodetic System 1984 (WGS 84). Each grid is made of 601 rows and 1,440 columns. MRT and UTCI data are stored in Kelvin (K). They can be converted into degree Celsius (°C) by subtracting 273.15.
The MRT and UTCI reanalyses so generated currently contain over 40 years of data starting 1 January 1979. Two reanalysis streams are produced. One stream (consolidated) uses ERA5 data as inputs, and it is updated monthly on CDS as ERA5 data become available, that is, 2-3 months behind real time. Another stream (intermediate) uses ERA5T, a preliminary no-quality-assured dataset for ERA5. MRT and UTCI reanalyses from ERA5T are updated on the CDS on a daily basis within five days from real time.
The climatology of MRT and UTCI for the period 1981-2010 from ERA5-HEAT (consolidated stream) is shown in Figure 2 as daily mean value over the climatological period for a month, here July. Highest MRT and UTCI values are found in almost all the Northern Hemisphere and the tropical regions. Regional variations, mainly related to land-surface type and topography, can be recognized. For instance, MRT values are generally higher in desert areas, where the low cloudiness and high albedo increase the portion of the solar radiation incident on the ground and reflected by it. Differences can also be observed for the UTCI with values in mountainous areas such as the Andes and the Tibetan plateau are generally lower than their surroundings.

| COMPARISON AGAINST OBSERVATIONS
ERA5-HEAT was compared against observations from 177 meteorological stations of the surface synoptic observations (SYNOP) network. SYNOP stations were selected outside forest areas and on the condition that their elevation is similar to the elevation represented in the model (<5 m difference). Observations of 2 m air temperature, 2 m dew point temperature, wind speed at 10 m and total cloud cover were retrieved for 2018 and input to RayMan Pro software (Version 3.0 Beta; Matzarakis et al., 2010) to calculate observed MRT and UTCI at each SYNOP station. Observed data (o i ) were compared against reanalysis data (m i ) previously extracted at the grid cells where stations are located. Two metrics were used: and the root mean square error Metrics were calculated at each station and mapped out as shown in Figure 3. R-squared is above 0.6 at the majority of stations with a mean value equal to 0.80 ± 0.13, which indicates a general correlation between reanalysis data and corresponding observed data (Figure 3a). RMSE distribution shows that MRT and UTCI reanalysis data deviate, on average, from observed values by 8.6 ± 2.5°C and 5.2 ± 2.5°C, respectively (Figure 3b). This is in agreement with the uncertainty found in previous studies for the same indices (Weihs et al., 2012;Schreier et al., 2013). It is worth noting that metrics are better for the UTCI than they are for MRT. The

|
DI NAPOLI et AL.
accuracy of MRT is related to the correct representation of radiation in numerical weather prediction models, which is usually challenging and prone to errors (Di Napoli et al., 2020). When the operational procedure is applied, the contribution of MRT to the UTCI is weighted by numerical factors (see Equation 2) so is its uncertainty.  Figure 4 and reports a description of the dataset and its metadata. Users can navigate through the dataset in a dynamic and easy-to-use interface. Whenever a parameter-variable (i.e. index), year or month-is selected, the interface is automatically updated to reflect data availability. Once all selections are made, users can submit their request to download the data. They can also visualize the corresponding Python script and use the CDS-API (Climate Data Store Application Program Interface) service to download data in a more programmatic way.
To facilitate transfer across the network UTCI and MRT reanalyses data are provided as compressed files. Each file contains as many stacks as the number of days selected. Each stack contains 24 time steps at 0.25 resolution of the requested human biometeorology index. It is in NetCDF format and can be displayed, manipulated and analysed with many tools, that is Panoply, R and Python. Files follow the naming convention 'ECMWF_index_YYYYMMDD_version_dataset.nc' where index is the acronym of the selected biometeorology index (utci or mrt), YYYYMMDD is the selected date in a yearmonth-date format, version is the current version of the dataset (v1.0) and dataset is to distinguish between the consolidated (con) and the intermediate (int) stream.

| DATASET USE AND REUSE
ERA5-HEAT is meant for studies in climate research, climate adaptation and other services, including the monitoring and comparison of current thermal conditions with those of the past, or the association of heat and cold extremes to weather patterns. The human-based definition of the UTCI and MRT makes the dataset relevant also for public health applications ranging from epidemiology, with the identification of health impacts (i.e. mortality) in relationship to thermally hazardous events, to tourism. The possibility of aggregating grid cells to regional, national, continental or user-defined boundaries can support policymakers in the assessment, for instance, of advisory thresholds and corresponding precautionary planning for heat and cold extremes (see e.g. Di Vitolo et al., 2019).
As described in Section 3, the dataset shows biases due to inherent uncertainties in the numerical weather prediction model used to generate the reanalysis. Even at the smallest grid cell, a reanalysis is a collection of values averaged over an area whereas station measurements are values collected at one specific point which leads to a representative error. The difference between reanalysis and measurements increases as the grid cell size increases, the terrain becomes complex (i.e. mountains) and its surface changes roughness and type. Uncertainties are also related to the ability of the model to represent parameters such as wind speed and radiation (Pappenberger et al., 2015;Di Napoli et al., 2020). Another aspect worth considering is the use of standard values in the definition of ERA5-HEAT biometeorology indices, that is, the emissivity of the clothed human body is set to 0.97 in the MRT calculation (formula 2). Users are recommended to consider these caveats when utilizing the dataset.
ERA5-HEAT was released in January 2020 as version 1.0, and it is automatically updated as input data from ERA5 and ERA5T become available. The dataset will be upgraded to a new version in case new biometeorology indices are added or ERA5 undergoes major updates. Registered users will be promptly notified in either case.

| SUMMARY
This paper describes the first-ever global historical dataset of two human biometeorology indices, namely MRT and the UTCI. The dataset, called ERA5-HEAT (Human thErmAl comforT), is computed using multiple climate variables from ERA5, a comprehensive quality-controlled climate reanalysis from the European Centre for Medium-Range Weather Forecasts. ERA5-HEAT provides hourly MRT and UTCI records on regular latitude-longitude grids at 0.25°× 0.25° resolution. It currently spans from 1979 to present, and it will be extended in time as updates of ERA5 are made available. It consists of two streams, a consolidated and an intermediate one, that are released at 2/3 months and 5 days behind real time, respectively. Data are released under open-source licence and are free to download. They are aimed at a wide range of end users, from scientists to policymakers, interested in the linkages between environment and human health at any spatial and temporal scale.