Data supporting the short-term health effects of temperature and air pollution in Valencia, Spain

The data presented in this article is part in essence of a more extensive dataset aimed at evaluating patterns of change in the temperature–mortality relationship on population health in the city of Valencia, Spain on population health in the city of Valencia, Spain. The complete dataset was used in the framework of the European multi-city project PHASE (Public Health Adaptation Strategies to Extreme weather events) [1]. The data includes daily counts of all-cause mortality, excluding external causes and cardiovascular and respiratory diseases. All-cause mortality is also classified by gender and age groups. Besides temperature, we included other meteorological variables and air pollutants from the PHASE dataset, as well as influenza epidemics. The variable Saharan dust events was also added. All these data were collected from public Governmental data repositories accessible under request. The dataset of this article provides a basis for comparison with similar models for time-series regression, allowing researchers to integrate additional model components without duplication of effort.


a b s t r a c t
The data presented in this article is part in essence of a more extensive dataset aimed at evaluating patterns of change in the temperature-mortality relationship on population health in the city of Valencia, Spain on population health in the city of Valencia, Spain. The complete dataset was used in the framework of the European multi-city project PHASE (Public Health Adaptation Strategies to Extreme weather events) [1] . The data includes daily counts of all-cause mortality, excluding external causes and cardiovascular and respiratory diseases. All-cause mortality is also classified by gender and age groups. Besides temperature, we included other meteorological variables and air pollutants from the PHASE dataset, as well as influenza epidemics. The variable Saharan dust events was also added. All these data were collected from public Governmental data repositories accessible under request. The dataset of this article provides a basis for comparison with similar models for time-series regression, allowing

Value of the Data
• This dataset allows the estimation of short-term health effects of meteorological variables and air pollutants, and assess the impact of environmental policies on population health. • These data can be used for educational purposes to illustrate the use of time-series regression and distributed lag non-linear models in environmental epidemiology studies. • These data provide a basis for comparison with similar models and allow researchers to integrate additional model components without duplication of effort.

Description of Study Area
The city of Valencia, the capital of the Spanish province of the same name, is located on the eastern coast of the Iberian Peninsula and the western part of the Mediterranean Sea (latitude 39 °28 N, longitude 0 °22 W). Spreading on an area of 135 km2, Spain's third-most populated municipality, with 789,744 inhabitants.
According to the Köppen climate classification (Iberian Climate Atlas (2011)), Valencia has a Hot-summer Mediterranean climate (csa category) with mild winters and hot, dry summers.

Meteorological Data
Daily mean, minimum and maximum temperature ( °C) and relative humidity (%) were collected from the National Institute of Meteorology at one city's weather station located in the Meteorological Centre of Valencia [1] . We also identified heatwave days defined as those periods of at least two days with maximum temperature exceeding the 90th percentile of the monthly distribution between May and October, or those periods of at least two days in which the minimum temperature exceeds the 90th percentile of the monthly distribution and the maximum temperature exceeds the median monthly value [2] . All series of meteorological data were complete (no missing values).

Air Pollution Data
Daily particulate matter with aerodynamic diameter ≤ 10 μm (PM10, 24 h average), nitrogen dioxide (NO2, 24 h average), and ozone (O3, maximum 8 h moving average) were collected from the Valencian community's Air Pollution Monitoring Network [1] . A conversion factor was used to obtain estimates of PM10 measurements from total suspended particles (TSP), as PM10 = TSP × 0.58 [3] . The number of missing values were n = 282 (11%), 158 (6.2%) and 88 (3.4%) for PM10, NO2 and ozone, respectively. We also collected those days with Saharan dust intrusions from the Spanish Ministry for the Ecological Transition (Ministerio para la Transición Ecológica, MITECO). It is based on the daily interpretation of air mass back trajectories, synoptic meteorological charts, satellite imagery and daily consultation of dust forecast models [4] . However, the identification of Saharan dust in Spain was established in 2003. Since this year, the series has been completed (no missing values).

Other Data
The dataset also includes an indicator variable for influenza epidemics, complete along the period (no missing values), obtained from the epidemiological services of the city of Valencia [5] , and calendar variables for date, year, month, day of the month, day of the week and public holidays.

Experimental Design, Materials and Methods
The dataset has been used to evaluate the association between temperature and mortality using time-series regression [1] . Here, exposure and outcome data are available at regular time intervals (i.e. daily temperature and mortality counts) [6] . The time-series design has been widely used in environmental epidemiology to investigate short-term associations between environmental exposures such as air pollution, weather variables or pollen, and health outcomes such as mortality or disease-specific hospital admissions [7] . In this illustration, we analysed the association between daily mean temperature and mortality. Fig. 1 shows the seasonal patterns for daily temperature and mortality over time.
Data is analysed using quasi-Poisson regression with distributed lag non-linear models (DLNM) [8] . This class of models can describe complex non-linear and lagged dependencies by combining two functions that define the conventional exposure-response association and the additional lag-response association, respectively. The lag-response association represents the temporal change in risk after a specific exposure, and it estimates the distribution of immediate and delayed effects cumulated across the lag period.
Specifically, we modelled the temperature-mortality relationship using a natural cubic spline, with three internal knots at the 10th, 75th, and 90th percentiles of the temperature distribution, and the lag-response relationship using a natural cubic spline with three internal knots equally spaced on the logarithmic scale. The lag period was extended to 21 to capture the long delay in the effects of cold. The model also included a natural cubic spline of time with 10 degrees of freedom per year to control seasonal variations and long-term trends and indicator variables for days of the week and public holidays. These modelling choices are based on the previously extensive work using an overlapping dataset and have been thoroughly tested by sensitivity analyses [9][10][11] .
Statistical analysis was performed in R software, version 4.1.1, using the library DLNM [12] . The R code to analyse the data is available in the Appendix as Supplementary data.
The specification of a DLNM implies a complex parametrization of the exposure series relying on a set of coefficients which straightforward estimation (common regression) but with no straightforward interpretation. The library DLNM, helps the user by providing him with the interpretation of these coefficients in terms of a surface of estimated effects along the two dimensions (object crosspred ) and robust graphical capabilities ( plot.crosspred ), allowing for interesting summaries, as illustrated in Figs. 2 and 3 . Fig. 2 shows the relative risk (RR) of mortality along with daily ambient temperature and lag dimension. The RR is interpreted as the ratio between the risk of a specific temperature at a specific lag compared with the risk at the temperature at which the risk of mortality is minimum (MMT) (9), located at 25 °C. The blue lines show the RR across the lag dimension for the cold and heat effects defined at the 2.5th and 97.5th percentile of the temperature distribution, located at 7 °C and 28 °C, respectively. In contrast, the red lines indicate the RR at lags 0 and 14 days along with temperature distribution (i.e., the same day when the temperature exposure occurs and one week after the exposure). Fig. 3 shows the cumulative exposure-response

Ethics Statements
The authors declare that there are no ethical issues with data and methods used in this research and that these data are neither involved with human subjects, animal experiments nor obtained from social media platforms.

Declaration of Competing Interest
None.