Mortality Associated with Ambient PM2.5 Exposure in India: Results from the Million Death Study

Background: Studies on the extent to which long-term exposure to ambient particulate matter (PM) with aerodynamic diameter ≤2.5μm (PM2.5) contributes to adult mortality in India are few, despite over 99% of Indians being exposed to levels that the World Health Organization (WHO) considers unsafe. Objective: We conducted a retrospective cohort study within the Million Death Study (MDS) to provide the first-ever quantification of national mortality from exposure to PM2.5 in India from 1999 to 2014. Methods: We calculated relative risks (RRs) by linking a total of ten 3-y intervals of satellite-based estimated PM2.5 exposure to deaths 3 to 5 y later in over 7,400 small villages or urban blocks covering a total population of 6.8 million. We applied using a model-based geostatistical model, adjusted for individual age, sex, and year of death; smoking prevalence, rural/urban residency, area-level female illiteracy, languages, and spatial clustering and unit-level variation. Results: PM2.5 exposure levels increased from 1999 to 2014, particularly in central and eastern India. Among 212,573 deaths at ages 15–69 y, after spatial adjustment, we found a significant RR of 1.09 [95% credible interval (CI): 1.04, 1.14] for stroke deaths per 10-μg/m3 increase in PM2.5 exposure, but no significant excess for deaths from chronic respiratory disease and ischemic heart disease (IHD), all nonaccidental causes, and total mortality (after excluding stroke). Spatial adjustment attenuated the RRs for chronic respiratory disease and IHD but raised those for stroke. The RRs were consistent in various sensitivity analyses with spatial adjustment, including stratifying by levels of solid fuel exposure, by sex, and by age group, addition of climatic variables, and in supplementary case–control analyses using injury deaths as controls. Discussion: Direct epidemiological measurements, despite inherent limitations, yielded associations between mortality and long-term PM2.5 inconsistent with those reported in earlier models used by the WHO to derive estimates of PM2.5 mortality in India. The modest RRs in our study are consistent with near or null mortality effects. They suggest suitable caution in estimating deaths from PM2.5 exposure based on MDS results and even more caution in extrapolating model-based associations of risk derived mostly from high-income countries to India. https://doi.org/10.1289/EHP9538

. List of satellite-derived PM2.5 data sources and ground-level PM2.5 data collected for validation. Column ID shows the identifier used to refer to the data set elsewhere in the text. SE stands for satellite estimate and GD for ground data.                      Table S1 lists these satellite-derived data sources together with a few ground-level ones. The latter are used below to further validate the satellite-derived PM 2.5 data used in our study. Here, SE stands for satellite estimate and GD for ground data. The five satellite-derived PM 2.5 data sources are shown in Figure S3 for the year 2010. Generally speaking, these data sets show similar patterns with higher PM 2.5 values in the north and lower ones in the south of India. In 2010, SE3 showed the highest mean (34.9 µg/m 3 ) whereas SE4 showed the lowest (20.2 µg/m 3 ). The 2010 mean for SE1, our paper's main estimated PM 2.5 exposure data set, was 24.6 µg/m 3 .

Comparison between satellite-derived PM 2.5 estimates and direct groundlevel PM 2.5 readings
We collected ground-level PM 2.5 data sources from various locations and times and compared these to SE1 data source. All data were linked by location and time. For GD2 and GD3, there was a three to five-year lag time linkage with SE1 2012 given these two ground data sources were from the period 2015-17. Similarly, GD4 represents an average for the period 2012-14 and this average was linked to SE1 2012. Figure S1 shows the locations of ground observations used to evaluate satellite-derived PM 2.5 exposure estimates. The locations of ground-level data sources collected by van Donkelaar et al. (2015) 1 (green circles) and those collected in this study (triangles and stars) are shown. Some areas in the north and south of India with high population density (year 2010) are not represented.
In Figure S2a and b, SE1 is compared with ground-level readings for January (peak month for PM 2.5 pollution) and annual averages for the year 2010. SE1 correlated well (positively) with ground data for both the month of January (r=0.88, p<0.001) and the annual average (r=0.88, p<0.001). Similarly, all other satellite-derived data sources correlated well with January ground-level PM 2.5 (lowest: SE3 and SE5, r=0.86) and with ground-level PM 2.5 annual averages (lowest: SE3, r=0.84; highest: SE4, r=0.9). SE1 showed the highest correlation with January ground-level data and the second highest correlation with the ground-level annual average for 2010.
We also compared SE1 PM 2.5 data with older ground-level data from various sources, spanning the period 2004-14. All data were also linked by location and time. Figure S2c compares average PM 2.5 for six Indian cities for the year 2007 (GD5) with SE1. Also, included in the plot is a single point that represents average PM 2.5 for the period 2012-14 for rural Tamil Nadu (GD4). Figure S2d compares ground-level readings for the year 2005 (GD6) with SE1 for the same year. The correlation coefficient, R, in Figure S2d is relatively low compared to that of Figure S2a, b and c, but this might be due to the fact that PM 2.5 estimates were converted from PM 10 ones. To explore regional differences between ground-level data and SE1 estimates, we classified all ground data (GD1 to GD6) north of 23 degrees latitude into a northern region and the rest into a southern region. The correlation between ground data and SE1 estimates was higher for the northern region (R = 0.72) than for the southern region (R= 0.48). Overall, SE1 shows substantial positive correlation with older direct PM 2.5 ground-level data. Figure S12 shows time-series plots of ground-level readings of PM 2.5 for six selected stations (two in New Delhi, one each in Chennai, Hyderabad, Kolkata and Solapur). Overall, PM 2.5 shows seasonality with peaks in January and lows towards the last quarter of the year. In contrast, satellite-derived data only provide a single mean (or median) annual value. Thus, satellite-based data (including SE1 used in our study), are missing important variation that may help better model the relationship between PM 2.5 and mortality.
In summary, there is a handful of satellite-based PM 2.5 data sets available in the public domain. In general terms, these satellite-based data sets are similar to each other. Compared to ground level PM 2.5 readings, satellite-based data provide more conservative estimates but do show robust positive correlations with ground station data. We chose to use van Donkelaar et al. (2015)'s 1 data set in our study for two key reasons. First, the data showed strong correlations with January and annual ground-level PM 2.5 data and second, it overlapped in terms of time with the Million Death Study (MDS) mortality outcomes.

Implementation of the INLA methodology
The model-based Generalized Linear Geostatistical model is fit using Bayesian inference, and posterior distributions are computed using the INLA methodology. 7 As there are over 7000 sampling units, and 7000 different spatial locations , model fitting is computationally intensive and an approximation to the spatial covariance matrix is necessary. The Markov random field approximation is used here, 8 and implemented in the geostatsp package 9 for the R statistical programming language, 10 which in turns calls the inla software. 7 A full description of the methodology can be found in Brown (2016). 11 Although there are other methods available for fitting models of this type, the task is complex and computationally demanding and there is currently no rival to the Bayesian methodology in the inla software for fitting spatial models with non-Normal responses. 7 Bayesian inference requires specifying prior distributions, and spatial models are particularly susceptible to producing spurious results from ill-chosen priors for the spatial parameters and . Here the penalized complexity prior distributions from Simpson et al. (2017) are used, 12 priors which discourage a spatial effect (wanting ( ) flat and close to zero) unless the data indicate a clear preference for a spatial model. Following Simpson et al. (2017), 12 the standard deviations and have exponential priors, as does the scale parameter 1/ . The prior median for , the distance beyond which the correlation between two locations is under 10%, is 500 or roughly one sixth of the distance across India. The prior medians of and are both (2), a value at which a one standard deviation increase in ( ) or doubles mortality risk.
The non-parametric effects for PM 2.5 are second order random walks, or lines where the slope continually changes by a modest amount. 13 As the model contains an intercept parameter, the random walks must be constrained for the model to be identifiable. The PM 2.5 effect is constrained with ( ) = 0 when ≤ 2, a level of exposure below which Burnett et al. (2018) regards as having no adverse health effects. 14 Table S1: List of satellite-derived PM 2.5 data sources and ground-level PM 2.5 data collected for validation. Column ID shows the identifier used to refer to the data set elsewhere in the text. SE stands for satellite estimate and GD for ground data.

ID
Data set description

SE1
Data were downloaded from the Atmospheric Composition Analysis Group's (ACAG) website (https://sites.wustl.edu/acag/datasets/surface-pm2-5/). This is version V3.01, released in 2015 by van Donkelaar et al. (2015). 1 This version did not include ground data in the modeling but did use ground data for validation. The spatial resolution is 0.1 • × 0.1 • at the Equator. Annual estimates (based on three-year medians as produced by van Donkelaar et al. (2015)) for the period 1998-2012 are available. This data set is the PM 2.5 exposure estimates used in our study.

SE2
Data were also downloaded from the ACAG's website. This version (V4.GL.02.NOGWR) was released in 2016 and incorporates ground data in its PM 2.5 modeling. 2 The spatial resolution is 0.1 • × 0.1 • at the Equator. Annual estimates for 1998-2012 are available.

SE3
Data were also downloaded from ACAG's website. This version (V4.GL.02.GWR) was released in 2016. 2 This data set incorporates ground data and geographic weighted regression (GWR) in its PM 2.5 modeling. The spatial resolution is 0.1 • ×0.1 • at the Equator. Annual estimates for the period 1998-2012 are available.

SE4
Data were produced by Brauer et al. (2015). 4 The data are released in CSV format and were download from http://pubs.acs.org/doi/abs/10.1021/acs.est.5b03709. The CSV file was converted to raster format prior to analysis. These PM 2.

SE5
Data were also produced by Brauer et al. (2015). 4 The data are released in CSV format and were download from the same source as SE4. The CSV file was converted into raster format prior to analysis. These PM 2.

GD1
This data set is from India's Central Pollution Control Board (CPCB). 6 The data were downloaded from http://www.cpcb.gov.in/. The PM 2.5 data represent direct measurements from ground stations located throughout the country. We geocoded the location of these stations using city or land mark names. The earliest data used here were from 2012.

GD2
Data are from the United States government. US consulates have ground stations in-situ that collect daily ambient air pollution data. Data are publicly available and were downloaded from https://in.usembassy.gov/air-quality-data-information-4/. We geocoded the location of the cities (India: Delhi, Ahmedabad, Kolkata, Mumbai, Hyderabad, Bangalore, and Chennai; Bangladesh: Dhaka) where the consulates are located. These data are for the period 2015-17.
Table S1 (continued): List of satellite-derived PM 2.5 data sources and ground-level PM 2.5 data collected for validation. Column ID shows the identifier used to refer to the data set elsewhere in the text. SE stands for satellite estimate and GD for ground data.

ID Data set description GD3
These data are from China's Air Quality Online Monitoring and Analysis Platform. Data were downloaded from https://www.aqistudy.cn/historydata/. We downloaded data for two Chinese cities (Shigatse and Ngari) located close to India. These data are daily PM 2.5 measures from the period 2015-17 and were geocoded using city names.

GD4
These data are from Balakrishnan et al. (2015) 5 and represent PM 2.5 ground readings from 29 sites in rural Tamil Nadu. Data were collected from several grounds stations in or near villages. We geocoded these stations using village names. These data are yearly averages for the period 2012-14.

GD5
These data represent direct urban PM 2.5 24-hour readings from India's CPCB. 6 The data are from 135 locations belonging to six Indian cities. The actual ground station locations were geocoded. These data are yearly averages from the year 2007.

GD6
These data represent urban PM 2.5 estimates from India's CPCB. 6 The estimates were derived from PM 10 ground readings by using a PM 2.5 /PM 10 ratio of 0.42. We geocoded the city names found in the data. These data are yearly averages for the period 2004-11. Note: Death counts do not sum to total counts due to missing information on age, rural/urban setting, or geocoding for sampling units.     Note: Deaths were due to asthma and chronic obstructive pulmonary disease (chronic respiratory), ischaemic heart disease (IHD), and stroke. IQR: Inter-quartile range. PM 2.5 : Particulate matter smaller than 2.5µm in diameter.     Note: Sampling units shown in the map are from the Million Death Study, 2004-13. State administrative boundaries came from the 2011 Indian census. We used a simplified version of the state boundaries which was created in-house. Map was created using R software 10 version 4.1.0 with the "sp", "raster", "mapmisc", and "RColorBrewer" packages.  Note: Death were restricted to those that could be linked to PM 2.5 values lagged 2-4 years, 3-5 years, and 4-6 years for comparison purpose. Models adjusted for death year, urban/rural residency, smoking prevalence in the sampling unit, sub-district-level percentage of female illiteracy and dominant language groups (from 2011 Indian census), and included smoothly varying spatial random effects and independent unit-level random effects. Box sizes are weighted by the sample sizes. Credible intervals (CI) of 95% were reported for the relative risks (RR).  Note: Dominant language groups and percentage of female illiteracy and were derived from the 2011 Indian census. State administrative boundaries came from the 2011 Indian census. We used a simplified version of the state boundaries which was created in-house. Maps were created using R software 10 version 4.1.0 with the "sp", "raster", "mapmisc", and "RColorBrewer" packages.  Note: Main models adjusted for death year, urban/rural residency, smoking prevalence in the sampling unit, sub-district-level percentage of female illiteracy and dominant language groups (from 2011 Indian census), and included smoothly varying spatial random effects and independent unit-level random effects. Box sizes are weighted by the sample sizes. Credible intervals (CI) of 95% are reported for the relative risks (RR).

Figure S11
. Relative risks of long-term exposure to PM 2.5 stratified by households with or without male smokers for female deaths.
Note: Models adjusted for death year, urban/rural residency, smoking prevalence in the sampling unit, sub-district-level percentage of female illiteracy and dominant language groups (from 2011 Indian census), and included smoothly varying spatial random effects and independent unit-level random effects. Box sizes are weighted by the sample sizes. Credible intervals (CI) of 95% are reported for the relative risks (RR).