Long-term satellite-based estimates of air quality and premature mortality in Equatorial Asia through deep neural networks

Atmospheric pollution of particulate matter (PM) is a major concern for its deleterious effects on human health and climate. Over the past 50 years, Equatorial Asia has experienced significant land-use change and urbanization, which have contributed to more intense and frequent extreme PM concentrations associated with increased anthropogenic and wildfire emissions. Recent advances in remote sensing instrumentation and retrieval protocols have enabled effective monitoring of PM from space in near real time with almost global coverage. In this study, long-term satellite-based observations of key chemical and physical parameters, integrated with ground-based concentrations of PM with aerodynamic diameter <10 µ m (PM 10 ) measured at 52 stations, are used to develop a machine learning approach for continuous PM 10 monitoring. As PM atmospheric pollution, like most of environmental processes, is highly non-linear and influenced by numerous variables, machine learning approaches seem very suitable. Herein, deep neural networks are developed and tested over different temporal scales and used to map PM 10 over Equatorial Asia during the period 2005–2015. The proposed model captures both PM 10 seasonal variability and the occurrence of extreme episodes, which are found to impact air quality on the regional scale. The modeled annual mean fine PM (PM 2.5 ) concentrations are used to estimate long-term premature mortality. This study indicates that the region is experiencing increasing mortality rates related to long-term exposure to PM 2.5 , with 150 000 (108 000–193 000) premature deaths in 2005 and 204 000 (145 000–260 000) in 2015. This is mostly due to air quality worsening and high population growth in urban areas, although the contribution of years of intense wildfires results as well significant.


Introduction
Atmospheric particles (i.e. aerosols) have been shown to be responsible for increased morbidity and premature mortality (Cohen et al 2017, Burnett et al 2018, particularly in developing countries, where extreme aerosol episodes are more frequent and intense than in high-income countries (WHO 2018). In recent decades, many countries worldwide have experienced rapid development, with fast economic growth, * Corresponding Author: P Crippa. industrialization and urbanization (Muntean 2018, United Nations 2018, which have led to increased primary emissions and enhanced secondary formation of aerosols in the atmosphere.
To improve understanding of atmospheric pollution impacts and inform policymakers on effective mitigation strategies, there is a strong need to assess aerosols' properties at high spatio-temporal resolution. This includes information on the distribution of particulate matter (PM) with aerodynamic diameter smaller than 10 µm and 2.5 µm (PM 10 and PM 2.5 , respectively). Large uncertainties in the estimates of PM environmental and societal impacts arise from the incomplete understanding of the key controls dictating the spatio-temporal variability of degraded air quality conditions and extreme aerosol events. Multiple factors, including climate variability, meteorological conditions and land-use change, potentially play different and changing roles in increased occurrence of extreme pollution episodes globally (Jacob and Winner 2009, Fiore et al 2012, Hong et al 2019, Turnock et al 2020. Data from monitoring networks are frequently used to produce localized assessments, but their spatio-temporal coverage degrades dramatically in developing countries, where air quality is generally worse and in need of monitoring. Advances in satellite-borne instrumentation and data retrieval protocols now allow identification of pollution sources and detailed monitoring of atmospheric properties and thus of air quality conditions with almost global and near real-time coverage. Multiple mathematical approaches have been proposed to infer ground-level PM concentrations from satellite retrievals of column integrated aerosol optical depth (AOD). Simple statistical linear models (LMs) have demonstrated high potential for global mapping (Donkelaar et al 2010, Reid et al 2012 and more sophisticated proxies have been also successfully proposed to predict ultrafine particle concentrations (Kulmala et al 2011, Crippa et al 2013 and to account for aerosol dynamics (Sullivan et al 2016, Crippa et al 2017. However, LMs present limited skills in predicting PM 10 spatio-temporal distribution and cannot capture mechanisms involved in aerosol dynamics, chemistry and transport processes which are characterized by a strong non-linearity and interactions between variables (Seinfield and Pandis 2016). To overcome major limitations of prior studies, machine learning approaches represent a unique opportunity given their high predictive skills (Grgurić et al 2014, Chen et al 2018, Di et al 2019, Shtein et al 2019 and low computational expense compared to the widely used Earth System Models (Huntingford et al 2019, Reichstein et al 2019. Specifically, artificial neural networks have shown to be one of the most effective and low-demanding tools in predicting the spatio-temporal distribution of both gaseous pollutants and atmospheric PM (Feng et al 2019, Lautenschlager et al 2020, especially over areas with sparse monitoring sites (Alimissis et al, 2018).
The present study develops a novel and general Deep Learning approach for spatially and temporally continuous air pollution mapping based on a suite of satellite retrievals. As previous studies mainly rely on AOD, meteorological and land-use data to infer ground-level PM (Ma et al 2014, Wei et al 2019, this machine learning application aims to account for chemical processes connected to primary and secondary aerosol formation/evolution and for different emission sources by exclusively relying on satellite-retrieved data. Moreover, the presented application targets an entire decade (2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015), with the purpose to quantify and capture spatiotemporal patterns and long-term trends in PM 10 and PM 2.5 concentrations and their epidemiological impacts. Specifically, this work aims to (i) investigate the predictive skills of a set of satellite-based proxies in reproducing ground-level PM 10 through deep neural networks (DNNs), (ii) explore the predicted seasonal and inter-annual changes in air quality over the period 2005-2015 in response to variable atmospheric composition, land use and emissions, and (iii) analyze the chronic health impacts due to PM 2.5 exposure. The focus is on Equatorial Asia, a tropical region particularly sensitive to changes in climate that in recent decades has experienced significant urbanization and land-use/land-cover change (Field et al 2009, Gaveau et al 2014. Recent studies have shown that haze episodes and extreme air pollution concentrations have become more frequent due to both increased local/urban emissions and transboundary pollution (Aouizerats et al 2015, Lee et al 2018, Hansen et al 2019, Alifa et al 2020. Equatorial Asia is also currently one of the most densely populated regions in the world, thus the need of improving air quality to reduce harmful impacts on human health is particularly pressing.

PM 10 observations
Long-term observations of PM 10 concentrations from a network comprising 52 ground-level monitoring stations across Peninsular Malaysia and Malaysian Borneo are analyzed (figure 1). The sites have been active during the period 1997-2015 and monitored PM 10 concentration through beta attenuation or tapered element oscillating microbalance instruments, as part of the continuous air quality monitoring program of Malaysia. Measurements have been standardized using universal calibration approaches. In this work, daily mean PM 10 values are used to investigate air pollution variability at multiple time scales, including monthly, seasonal, annual, and inter-annual.

Satellite observations of aerosols, atmospheric trace gases and land use
Satellite retrievals of aerosol properties, trace gases and land use are used to develop a satellite-based proxy able to capture the variability of ground-level PM 10 . Key features of the analyzed satellite retrievals are summarized in table S1. Specifically, our proxy is based on: • Aerosol optical depth (AOD) data from MODIS (Moderate-resolution Imaging Spectroradiometer) Collection 6 deployed onboard the NASA Terra and Aqua satellites. Level 2 (L2) daily AOD (at the λ = 550 nm wavelength) at 1 km × 1 km resolution from the multiangle implementation of atmospheric correction (MAIAC) algorithm (Lyapustin et al 2018) is used. MAIAC is chosen because characterized by a wider spatial coverage and higher retrieval accuracy, compared to other algorithms applied in neighboring regions (Mhawish et al 2019). AOD is chosen as a proxy for suspended aerosols in the atmosphere, including fine solid PM. Over land, 95.5% of the AOD data used have the highest quality in the MAIAC product.
• Column water vapor (CWV), retrieved as a daily, 1 km × 1 km resolution data from MODIS on Terra and Aqua, and corrected through MAIAC. CWV is considered as indicator of liquid suspended particles/droplets, which affect AOD measurements as absorbing part of the radiation detected by MODIS. • Normalized difference vegetation index (NDVI), retrieved as monthly Level 3 (L3) 1 km × 1 km resolution quantity from MODIS onboard Aqua. NDVI denotes the vegetation surface coverage and is used as a proxy of natural emissions of PM precursors (e.g. volatile organic compounds (VOCs), mainly isoprene), as well as an important absorber of both atmospheric PM 10 and its gaseous precursors (Nowak et al 2014(Nowak et al , 2018. • Carbon monoxide (CO), tropospheric amount derived from the Measurements of Pollution in the Troposphere (MOPITT) sensor onboard Terra (Deeter et al 2017), as gridded L3 (Version 8) monthly averages at 1 • × 1 • latitude × longitude resolution. CO is taken into account as an indicator of primary PM emissions derived from both anthropogenic (e.g. traffic) and natural (e.g. wildfires) combustion processes.
• Urban fraction (UF), from the Consensus Landcover dataset (Tuanmu and Jetz 2014), as a single satellite image with 30-arc-second spatial resolution (∼1 km at the equator). UF is included, as expected to be positively associated with anthropogenic emissions of both primary PM 10 and gaseous precursors of secondary aerosol.
Tropospheric amounts of trace gases and ultraviolet (UV) irradiance, measured by the ozone monitoring instrument (OMI) onboard the NASA's Aura spacecraft, are also considered in this study as key precursors of both inorganic and organic secondary atmospheric PM: • Nitrogen dioxide (NO 2 ), daily tropospheric column, cloud-screened at 30%, with 0.25 × 0.25 • latitude × longitude resolution from the OMNO2d (V3) L3 product (Duncan et al 2018 (Sullivan et al 2016), as a proxy for the availability of secondary organic aerosol precursors (e.g. VOCs). • Ultra-violet irradiance (UV), daily gridded L2 retrieval (OMUVBG, V3) at 0.25 • × 0.25 • resolution, measured at λ = 310 nm. UV is considered as the main energy source of the photochemical reactions that lead to secondary aerosol formation.

Data pre-processing
To test models skills in predicting ground-level PM 10 , satellite data are extracted by averaging the values of the pixels falling within a 20 km radius around each measuring station. The radius choice derives from our sensitivity analysis that identifies the minimum radius maximizing the correlation between daily PM 10 and AOD while retaining an appreciable number of non-missing data over the entire period 2005-2015. AOD averages are computed if at least 5% of the total number of pixels inside the 20 km radius are non-missing data. An autocorrelation analysis is performed at each site to quantify the actual scales of PM 10 temporal variability. As the PM 10 autocovariance function displays an exponential decay (also shown by (Alifa et al 2020)), the mean autocovariance among sites reaches the value of 1/e (∼0.37) at a lag equal to 7 d (figure S1 (https://stacks.iop.org/ERL/15/104088/mmedia)), thus we average daily PM 10 concentration using a 7-d moving average without discarding significant temporal variability. This moving average is applied only when at least 3/7 values are non-missing. An analogous moving average is applied to daily values of AOD, CWV, NO 2 , SO 2 , HCHO and UV. Monthly satelliteretrievals of CO and NDVI are replicated on a daily basis for each month and a 7-d moving average is applied to smooth the transition between consecutive months.
Satellite retrievals are homogenized to the reference OMI 0.25 • × 0.25 • (latitude × longitude) grid when aiming to predict PM 10 maps over Equatorial Asia. As CO is available at 1 • × 1 • resolution, each grid cell is divided into 16 sub-cells containing the same value of the initial one to match the reference grid.

Deep neural networks
DNNs are powerful non-parametric approaches to explain highly non-linear relationships between input and output (Goodfellow et al 2016) and hence appropriate to explain atmospheric chemistry processes in the Earth system (Reichstein et al 2019). Here DNNs are trained using satellite data extracted with a 20 km averaging radius around each station and aggregated with a 7-d moving average. Based on our sensitivity analysis on model performance (figure S2), we define our DNN to comprise two subsequent hidden layers of ten and nine nodes, respectively. Given the 9 inputs and 1 output, the total number of model's parameters is 209 ( figure S3). DNNs are trained on a randomly extracted 80% subset of all available data; then, validation is performed on the remaining 20%. The data for DNN are selected when all the nine input variables are non-missing at the same time-step. One hundred trials are performed to eliminate the dependence of model performance on individual random sampling of training data. The overall performance of DNN is evaluated with the Pearson (r) and Spearman (ρ) correlation coefficients between observed and modeled values. The model bias and error are quantified on a seasonal basis using the normalized mean bias factor (NMBF) and the normalized mean absolute error factor (NMAEF) (Yu et al 2006), defined as: where m i represents the estimated PM 10 and o i the observed one, whilem andō their associated means and n the number of samples of the entire dataset. DNN predictive skills are also compared against the performance of a LM having the same input and output variables. Moreover, as seasonal phenomena (mainly monsoons and wildfires) are present in the analyzed area, model evaluation is also performed over distinct seasons: winter, spring, summer and fall (DJF, MAM, JJA and SON, respectively).
To predict annual mean PM 10 spatial fields, other DNNs are trained on monthly aggregated data. PM 10 patterns are thus predicted from the monthly aggregated satellite variables homogenized to the reference grid at 0.  (see table S2), while they are more relevant over  shorter time scales and when more data are available  (table S3). Thus when aiming to predict annual mean PM 10 fields, meteorological data are not incorporated into DNN.

Premature mortality estimates
We apply the global exposure mortality models (GEMMs) (Burnett et al 2018) to estimate the mortality burden associated with the satellite-derived PM 10 concentrations. As GEMMs require PM 2.5 data, we estimate the PM 2.5 /PM 10 ratio from simulations of the Weather Research and Forecasting model with Chemistry (WRF-Chem) presented in  for September, October and November 2015. The annual domain average PM 2.5 /PM 10 , retrieved by assuming September and October to be wildfire months and November being representative of the other 10 'ordinary' months, is estimated by averaging all the on-land grid cells. A value of 0.622 ± 0.036 is found, consistent with (Amil et al 2016).
The relative risk (RR), which is the probability of a fatal outcome from a specific disease due to PM 2.5 chronic exposure divided by the risk of the same outcome in the case of no-exposure, is calculated for each grid cell i and year, following (Burnett et al 2018): where θ, α, µ, are age and disease-specific parameters andz i the maximum between zero and the difference between PM 2.5 concentration in grid cell i and the no-observed-effect concentration (2.4 µg m −3 ). RR is calculated by referring to 5-year age groups >25 years old (i.e. 25-30, 31-35, 36-40… 80 plus) and to four chronic diseases: chronic obstructive pulmonary disease (COPD), lung cancer (LC), ischemic heart disease (IHD) and stroke (S). The premature deaths in each year and grid cell i are computed as: where P i is the total population living in cell i for a specific year and B the yearly baseline mortality for a specific chronic disease, derived from the Global Burden of Disease (Naghavi et al, 2017). The total population living in the region in figure 1 is considered for the health impact assessment. The uncertainty of (i) PM 2.5 /PM 10 ratio, (ii) GEMMs parameter (θ) and (iii) B, is propagated into the estimates of yearly premature deaths with a Monte Carlo approach on 10 000 simulations, assuming a Gaussian distribution of each of the parameters, similarly to (Giani et al 2020). Estimation of the uncertainty associated with DNNs parameters remains an open question in computer science (Goodfellow et al 2016), thus the uncertainty from PM 10 predictions cannot be included in our assessment.

Sources of seasonal and inter-annual variability of air quality in Equatorial Asia
During 2005-2015, significant seasonal patterns are observed in ground-measured PM 10 , with highest concentrations during summer and lowest in winter ( figure S4) S4), due to the dry conditions brought by ENSO and associated wildfires. During these years, central Sumatra and southern Borneo were hotspots of intense and widespread wildfires, which degraded air quality in surrounding areas , Field et al 2016. In addition to PM 10 seasonal variability due to climate and meteorological conditions, satellite retrievals also display a strong seasonality which may also explain the PM 10 variability. AOD and CO show a strong seasonality, with highest values over southern Borneo and central Sumatra during fall (figures S5 and S6), as a result of the Southwest Monsoon and possible wildfires occurrence. A seasonal pattern is also noticeable in SO 2 , which highlights the impact of volcanic emissions (Carn et al 2017), especially over Java island and during winter ( figure S7). Conversely, no clear seasonal pattern is observed for NO 2 and HCHO (figures S8 and S9), which however clearly identify key anthropogenic sources. Given the significant seasonal variability of the predictors, our DNNs are developed on a seasonal basis to capture seasonal phenomena key to explain PM 10 variability and ultimately improve satellitebased PM 10 predictions.

Model evaluation
The evaluation of seasonal DNN performance and comparison to LM, as described in section 3.2, are here presented. Significantly higher r and ρ are found for DNN compared to LM (table 1), which indicate that DNNs present higher performance in predicting ground-level PM 10 values. This is likely due to the presence of non-linear mechanisms and interactions between variables, typical of environmental pollution systems, which are not captured by the LM. The accuracy of DNN predictions is also higher, as both model bias (NMBF) and absolute error (NMAEF) are significantly closer to 0. Moreover, while LM underestimates PM 10 observations in every season, DNN presents a negative NMBF in summer and fall, when PM 10 peaks generally occur, and a slightly overestimation in winter and spring. NMBF remains anyway modest and lower compared to prior chemical transport model simulations performed over the area (Gao et al 2014. The seasonal variability of model performance reflects the complexity of the relationship between dependent (measured PM 10 ) and independent variables (satellite-retrieved). DNNs skill on relatively low PM 10 (∼40 µg m −3 ) remains similar among seasons; however, the occurrence of higher PM 10 in summer and fall favors an improved fit, with the model being able to estimate a wider range of values (figure S10). Such model behavior suggests the presence of a baseline PM 10 level that cannot be fully explained by the predictors included in the model. Some satellite retrievals are also subject to higher uncertainty when aiming to detect low levels of trace gases, thus other predictors, such as meteorological variables, may be included to provide additional information on PM 10 variability.
DNN seasonal performance is generally higher when input data are aggregated on a monthly basis. The monthly averaging reduces some short-term variability, while still capturing seasonal patterns, and produces a non-negligible increase in the linear correlation coefficients r and ρ, compared to DNN trained on 7-d moving averages (table 2 and figure  S11). In this case, training is performed on all the available data, as the monthly temporal aggregation reduces the sample size by an order of magnitude and precludes generating random samples, representative of most of data variability, for training and validation.   High values of PM 10 (>50 µg m −3 ) also occur over Peninsular Malaysia, central/eastern Sumatra and part of Java every year (see figure 1 for the locations of these islands), due to the combined effect of local emission sources and transnational pollution transport (Lee et al 2016). Urban scale pollution is also captured by the model, as localized pollution peaks are present over metropolitan areas including Singapore, Jakarta and Kuala Lumpur. The yearly average PM 10 over these areas always exceeds the World Health Organization threshold of 50 µg m −3 , thus suggesting that both wildfires and large anthropogenic emissions are critical in deteriorating the regional air quality and lead to potentially severe impacts on human health. Analogous results are found for yearly maps of PM 2.5 where most of the analyzed domain exceeds the yearly average WHO standard of 10 µg m −3 (figure S12).

Trends in human health impacts
Yearly satellite-based PM 10 spatial fields, generated with DNNs, are converted to PM 2.5 maps (see example in figure S12) using a PM 2.5 /PM 10 ratio estimated from WRF-Chem (see Methods) and fed to the GEMMs. Premature deaths are computed for each year by integrating the estimated RR with population distribution maps. The total premature mortality burden and the associated 95% confidence interval (C.I.) are reported in table 3.
The most relevant diseases over the analyzed decade are IHD and stroke, which on average  Table 3. Total estimated premature deaths associated with PM2.5 and 95% confidence interval (C.I.) (columns 2 and 3, respectively) over the analyzed domain for each year during 2005-2015. Columns 4-7 indicate the percentage of premature deaths associated with each of the diseases analyzed (see section 3.3 for their definition). Column 8 contains the total exposed population (in millions) over the analyzed domain.

Year
Total  4). Big metropolitan areas, including Jakarta, Singapore and Kuala Lumpur, stand out clearly as the most affected areas. This is certainly due to the large population, but PM 2.5 also plays a crucial role, as it peaks over those locations (figure 2). Other PM 2.5 -related health effects, beyond to big cities, are prominent in highly urbanized areas, such as Java   figure 4). Conversely, wildfire contribution to premature mortality is most likely determined by transport phenomena from burnt areas to densely populated areas, thus impacting the total amount of victims (table 3).

Conclusions
In this study, we develop a novel DNN approach trained on a suite of satellite-retrieved variables related to atmospheric physics and chemistry, and land use, to remotely predict ground-level PM 10 concentrations. The model is developed for Equatorial Asia but its applicability extends to any region with a sparse monitoring network. DNNs generally show enhanced predictive skills and lower bias compared to the more classical LM approach, as able to capture significant non-linear mechanisms and variable interactions dictating PM 10 concentrations. On a seasonal basis, the dry period brought by the Southwest monsoon, possibly enhanced by ENSO, is found to be associated with higher PM 10 that lead to an improved model performance during summer and fall. Higher PM 10 in the fall is associated with the less frequent wet deposition processes and enhanced wildfires occurrence, which determined the extreme haze events recorded in 2006 and 2015. As significant spatio-temporal variability remains poorly explained for relatively low PM 10 , future research should focus on including additional meteorological variables (e.g. wind speed/direction, ground temperature and planetary boundary layer height), which may enable description of aerosol vertical profiles and transport and dispersion processes on the local scale. Further, while this study shows high skill of DNN in predicting surface PM concentrations, future investigations could be directed to quantify the predictive skills of a different machine learning approaches (e.g. random forests, gradient boosting machine or mixed models).
The annual PM 10 and PM 2.5 maps reveal significant spatial and inter-annual patterns related to both anthropogenic drivers and wildfires. The estimated health impacts indicate that metropolitan areas remain the most affected, due to the combined effect of numerous anthropogenic emissions and high population density. Conversely, the effect of wildfires dominates on the regional scale, as indicated by the strong inter-annual variability in the number of premature deaths over the region, which are significantly higher during fire years than adjacent non-fire years. We also found a significant increasing trend of PM 2.5related mortality of +1600 additional deaths per year on average over 2005-2015. In addition to the population growth, a possible explanation for this include the ongoing urbanization and land-use and landcover changes, as well as large-scale climatic changes, such as the enhanced intensity of ENSO and more frequent wildfire events. Future research will be directed to attribute the role of these drivers through numerical model simulations including different climate conditions and emission scenarios. Environment (DOE) for providing access to the observational data used in this study. The authors acknowledge NASA for accessibility to MODIS and OMI data (downloaded from open-access website https://earthdata.nasa.gov), MERRA-2 output (https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/data_access/) and population/demographic data (obtained from NASA SEDAC (https://sedac.ciesin. columbia.edu). Global Burden of Disease used in this study have been accessed from the Institute for Health Metric and Evaluation website: http://ghdx.healthdata.org/ihme_data.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.