Estimating historical PM2.5 exposures for three decades (1987-2016) in Japan using measurements of associated air pollutants and land use regression.

Accurate estimation of historical PM2.5 exposures for epidemiological studies is challenging when extensive monitoring data are limited in duration. Here, we develop a national-scale PM2.5 exposure model for Japan using measurements recorded between 2014 and 2016 to estimate monthly means for 1987 through 2016. Our objective is to obtain accurate PM2.5 estimates for years prior to implementation of extensive PM2.5 monitoring, using observations from a limited period. We utilize a neural network to convey the non-linear relationship between the target pollutant and predictors, while incorporating the associated air pollutants. We obtain high R2 values of 0.76 and 0.73 through spatial and temporal cross validation, respectively. We evaluate estimation accuracy using an independent data set and achieve an R2 of 0.75. Moreover, monthly variations for 2000-2013 are well reproduced with correlation coefficients of greater than 0.78, obtained through a comparison with observations. We estimate monthly means at 1 × 1 km resolution from 1987 through 2016. The estimates show decreases in the area and population weighted means beginning in the 1990s. We successfully estimate monthly mean PM2.5 concentrations over three decades with outstanding predictive accuracy. Our findings illustrate that the presented approach achieves accurate long-term historical estimations using observations limited in duration.


Introduction
Exposure to PM 2:5 (particulate matter with aerodynamic diameters equal to or less than 2.5 mm) has been associated with adverse health outcomes in many epidemiological studies (Rich et al., 2009;Faiz et al., 2013;Puett et al., 2014;Fleischer et al., 2014;Cohen et al., 2017). Proper exposure assessment requires accurate estimates of the spatial and temporal variations in the study domain and period. Statistical models, referred to as land use regression (LUR) models and their derivatives, have typically been used for this purpose (Beelen et al., 2013;Sampson et al., 2013;Beckerman et al., 2013;Di et al., 2016a). Because these models are dependent on observations, many previous models estimated exposures in modeling periods for which measurements are available (Knibbs et al., 2014;Bechle et al., 2015;Proietti et al., 2016;Huang et al., 2018).
In cohort studies, accurate estimation of historical exposures is important in assessing the longitudinal impact of PM 2.5 on chronic health outcomes such as cancer and cardiovascular disease. However, for areas where monitoring networks have been installed only recently, studying PM 2.5 exposures and health outcomes will be difficult until sufficient monitoring data are available to properly estimate exposures. Meanwhile, epidemiological knowledge specific to the area is required for proper implementation of environmental policies such as emission controls and air quality standards.
Previous studies have presented historical estimation of PM 2:5 for time periods prior to the implementation of extensive monitoring networks. Lall et al. (2004) built a model based on the ratio of PM 2:5 to PM 10 or TSP (total suspended particle) obtained from observations recorded between 1987 and 2000, and estimated quarterly PM 2:5 in the U.S. from 1979 through 1986. Li et al. (2017b) trained a generalized additive model using measurements recorded between 1999 and 2013, and predicted daily PM 2:5 in California, U.S., from 1990 through 1998. Kim et al. (2017) constructed a model using the estimated long-term trend in observed PM 2:5 between 1999 and 2010 to predict annual PM 2:5 in the U.S. from 1990 through 1998. These attempts used relatively long-term data sets (i.e., spanning more than ten years), which is reasonable because historical trends can be better estimated using observations recorded over longer periods. This would be possible in a country or area, such as the U.S., where a comprehensive monitoring network was implemented early on. Conversely, back-extrapolation would be infeasible for countries or areas where extensive networks developed more recently, because available observations are limited in duration.
In the statistical model, predictors such as land use, trafficrelated, and meteorological variables are generally used for model constructions and predictions. Although these predictors represent the spatial variations at least in a certain time window, they would not sufficiently represent temporal variability when pollution levels change over time. Thus, variables sufficiently reflecting temporal variation are essential for back-extrapolation. Satelliteretrieved data has been widely used as a predictor in statistical models for NO 2 (Hoek et al., 2015;Xu et al., 2019) and PM 2:5 (Mao et al., 2012;Xu et al., 2019). Because they can be a useful predictor for capturing temporal as well as spatial variations, they are utilized in models estimating long-term variations in NO 2 (Knibbs et al., 2014;Bechle et al., 2015) and PM 2:5 (Beckerman et al., 2013;Di et al., 2016a;Zhang et al., 2018;Xu et al., 2018). However, this approach cannot be applied to estimates for the time period before Terra was launched in 1999 (Li et al., 2017b).
Meanwhile, PM 10 could be a predictor for modeling historical PM 2:5 concentrations in earlier years, because it was monitored extensively before the comprehensive PM 2:5 network was implemented. Li et al. (2017b) used PM 10 measurements recorded between 1999 and 2013 as one of the predictors for a generalized additive model, and estimated daily PM 2:5 in California, U.S., from 1990 through 2013 with a cross-validated R2 of 0.79. Sulfate and nitrate, which are secondary constituents of PM 2:5 , are generated by the oxidation of SO 2 and NO 2 , respectively. Thus, in addition to coarse particulate matter, these air pollutants share emission sources with PM 2:5 and can be informative predictors in the PM 2:5 model. Moreover, similar to PM 10 , SO 2 and NO 2 were monitored extensively before the launch of the comprehensive PM 2:5 network, which raises the possibility of pre-1999 estimations. However, to our best knowledge, applications of these pollutants to model predictors for historical exposure assessments have been limited thus far.
In Japan, the national air quality standard for PM 2:5 was implemented in 2009. National-scale PM 2:5 monitoring started in 2010 with a small number of stations, and was gradually extended until an extensive network was established after 2014 when the number of monitors generally reached more than 700. At the time of this study, the extensive data set only covers the years 2014e2016. Before the launch of the national monitoring network, a limited number of monitoring stations (approximately ten, depending on the year) performed governmental monitoring starting in 2000. Thus, detailed spatial variation is unclear before 2010, and no information is available prior to 2000.
In this study, we develop a national-scale monthly PM 2:5 exposure model for Japan using measurements recorded between 2014 and 2016, incorporating the associated air pollutants as predictors. The overall objective is to obtain accurate PM 2:5 estimations for the years before the launch of the extensive monitoring network, using monitoring data from a limited time period. We utilize a neural network to model the complex, non-linear relationships between the target pollutant and predictors. We validate our model through cross validation, as well as external validation using an independent data set containing 14 years (2000e2013) of data, and estimate monthly mean PM 2:5 from 1987 through 2016. Further, we discuss the accuracy of our model and the estimated historical trend.

Study area
The study area includes the main islands of Japan (130 E À 146 E, 31 N -46 N); remote or small islands are omitted (Fig. S1).

Air quality measurements
We obtained PM 2:5 observations from two databases: the national air quality monitoring network (AQMN) database and the governmental monitoring database. The AQMN was developed in the early 1980s and PM 2:5 monitoring was started in 2010. The AQMN stations are maintained by local governments. Detail of the AQMN database usage is presented in Supplementary Material. Governmental PM 2:5 monitoring was started by the Japanese Ministry of the Environment with four stations in 2000, ten years before the implementation of the national-scale network. The number of governmental sites increased gradually to 18 in 2009, one year before the implementation of the national network. We obtained data from four stations that are continuously operated: Nonodake (NND), Kawasaki (KWS), Osaka (OSK), and Amagasaki (AMG). Station NND is located in the rural area of northern Japan. Station KWS is installed in the Tokyo metropolitan area. Stations OSK and AMG are both positioned in the Osaka metropolitan area. These monitoring sites have been continuously operated since 2000. The locations of the stations are presented in Fig. S1. The availability and usage of the PM 2:5 monitoring data are summarized in Table S2.
We used the monitoring data recorded between 2014 and 2016 for model building, the latest available three years, when the number of monitors are more than 700 and became generally constant (Fig. S2), in order to avoid heterogeneity of observations in space and time and to better represent the study area and period. The remaining data from 2010 to 2013 are used as an independent test dataset for model validation. The governmental monitoring data are used to evaluate the correlation and trend. The model validation is described in more detail below. Fig. S3 shows the time series of the monthly means we used for model construction. The temporal variation is not clearly recognized, although it is generally high during spring.

Data set
Associated air pollutants, including suspended particulate matter (SPM), SO 2 , and NO 2 , were used as predictors. SPM has been measured in Japan since early 1970s as airborne particulate matter and is defined as particulate matter with a diameter of less than 10 mm by the Air Pollution Control Act. Thus, SPM is different from PM 10 and corresponds to approximately to PM 7 (Kasahara, 2002), that is, particulate matter with an aerodynamic diameter of less than 7 mm. We obtained the associated pollutant monitoring data from the AQMN database. We spatially interpolated the monthly mean values of the three pollutants by ordinary kriging to derive gridded data at 1 km resolution and extracted the kriged values at the location and time of the observed PM 2:5 . We did not use observations from collocated monitors because PM 2:5 monitors are not always collocated with all three pollutant monitors.
We include the emission intensity of large point sources as a predictor. Emission intensity was derived from the Emissions Database for Global Atmospheric Research (EDGAR v4.3.2) (Crippa et al., 2018), which is a global emission inventory at 0.1 resolution covering an extensive time series (1970e2012). We retrieved annual grid data from road transport and large point sources for the years corresponding to our study period. The sectors we used for the latter sources (Table S3) were selected on the basis of estimated emission amounts in Japan (Crippa et al., 2018). Because emission data are not available for years after 2012, we used 2012 emission data for the analyses between 2013 and 2016 without any correction. For large point sources, we consider the distance decay effect (Araki et al., 2018) using the moving window approach (Vienneau et al., 2009).
We prepared gridded monthly mean precipitation, temperature, relative humidity, and wind speed at 1 km resolution by interpolation of the observations via ordinary kriging (Araki et al., 2018). The build-up area and green area ratio were derived using land use data (Araki et al., 2018). The land use and population data we used are originally at 1 km resolution. These data are available only for specific years, as shown in Table S4. For the years for which no data are available, we used the data of the closest year. We used the nearest straight-line distance from the observation and prediction points to the coastline as a predictor (Araki et al., 2017). We utilized longitude as a predictor representing the effect of long-range transport from the Asian continent (Araki et al., 2017).
We summarized the predictor variables described above in Table 1. Details of the data sets and sources are given in Table S4.

Neural networks
Artificial neural networks are mathematical models inspired by brain activity, consisting of artificial neurons (nodes) in layers (Gardner and Dorling, 1998;Agirre-Basurko et al., 2006;Cobourn et al., 2000). The network is formed by three layers of nodes: an input layer, one or more hidden layers, and an output layer. Neural networks make no prior assumptions regarding data distribution, unlike other statistical techniques (Gardner and Dorling, 1998;Cobourn et al., 2000). Neural networks can handle complex nonlinear relationships between input and output variables (Di et al., 2016a), and are used in many applications related to atmospheric problems (Agirre-Basurko et al., 2006;Cobourn et al., 2000;Russo et al., 2013;Biancofiore et al., 2017;Di et al., 2016a, b;Ganesh et al., 2018).
Using the above listed variables, we trained a neural network against the measured PM 2:5 concentrations. We used one hidden layer with seven nodes, because additional layers could have made the model unnecessarily complex and difficult to train (Cobourn et al., 2000). To eliminate randomness in the initial weights, we trained 100 models (unless otherwise noted) with different random number seeds, and averaged the output from each network to obtain predictions (Kuhn et al., 2019). The weight decay was set to 0.01. The input data were normalized to the range of [0,1] by y ¼ ðx À x min Þ=ðx max À x min Þ, where y is the normalized vector of the input variable and x is the raw input variable.

Cross validation
We conducted five-fold cross validation to evaluate the accuracy of our model. The monitoring locations were evenly split into five groups at random. The observations from one group of locations were removed for the whole period, and the 10 networks were trained with different random number seeds using the observations from the remaining four splits. Predictions at the time and location of the removed observations were obtained by averaging the output from each network. This procedure was repeated for the four groups. We iterated the five-fold cross validation 50 times with different random number seeds to extinguish the randomness of the splits. Finally, the cross validation predictions were obtained by averaging the 50 outputs and compared to the observed values. Thus, 500 networks in total were trained. We calculated R 2 and root mean squared error (RMSE) between the predictions and observations as the prediction accuracy indexes. R 2 was obtained by 1 À MSE=varðYÞ, where Y is a vector of observations and MSE is the mean squared error. We refer to this procedure as spatial cross validation.
We also performed temporal cross validation. In this process, observations were removed for a particular year (12 months), and the model was built using the data for the remaining two years (24 months). Values at the monitoring locations in the selected year were estimated by the model. This procedure was repeated for the other two years, and R 2 and RMSE values were obtained. Because we intended to back-extrapolate for more than 25 years, we removed observations year by year, not month by month, in order to evaluate the temporal predictive power of our model in a manner better suited to our purpose.

External validation
In addition to cross validation, we evaluated our model using an independent data set that was not included in the model building process. We refer to this process as external validation. From the AQMN database, we obtained PM 2:5 data from 2010 to 2013 and used them for external validation. We estimated monthly PM 2:5 at the time and locations of the external validation data. We computed R 2 as well as the RMSE between the estimated and measured values.

Reproducibility in monthly variations
We compared our model estimates against the observed monthly PM 2:5 at the four continuously operated stations to further validate our model performance in terms of the reproducibility of monthly variations and trends. The observations were obtained from governmental monitoring performed during 2000e2013. Because the monitors at the four sites do not incorporate the standard analytical method for the current national-scale network set by the Japanese Ministry of the Environment, we only evaluate the correlation and trend, rather than comparing the absolute concentrations.

Estimation
We estimated the monthly mean PM 2:5 at 1 Â 1 km resolution in the whole study area from 1987 through 2016. To examine the long-term trend of PM 2:5 , we calculated the monthly arithmetic average of the estimations at all the prediction grids, as well as the population weighted means. The former can be seen as area weighted means. The latter is obtained by (1) where pwm is the population weighted mean, C i and p i is the estimation and population in the i th prediction grid, respectively, and n is the number of prediction grids. The population data from the closest year is used for the calculation, because the data are not available for every year, as mentioned earlier.

Computation
All the computations were done using R statistical software 3.5.3 (R Core Team, 2019); the raster (Hijmans, 2019) and gstat (Pebesma, 2004) packages were used for preparing the predictor variables, and the nnet (Venables and Ripley, 2002) and caret (Kuhn et al., 2019) packages were used for training the neural network.

Cross validation
The scatter plot of the observations and the predictions obtained by the spatial cross validation is given in Fig. 1 a). The colors of the dots represent the density of the points in the plot; red represents higher density and blue represents lower density. R 2 and RMSE are shown in the panel. We achieved an R 2 value of 0.76, with an RMSE value of 1.9 mg m À3 . Fig. S4 presents the spatial cross validation results plotted year by year. R 2 and RMSE values are similar between years. Fig. 1 b) presents a scatter plot of the observations and the predictions obtained by the temporal cross validation. We obtained an R 2 value of 0.73, with an RMSE value of 2.0 mg m À3 . Fig. S5 presents the annual plots of the temporal cross validation. The differences in R 2 and RMSE values for the three years are marginal. Fig. 2 shows the external validation result obtained by using the independent data set for 2010e2013. We achieved an R 2 of 0.75 with an RMSE value of 2.2 mg m À3 . These statistical indicators are generally similar to those for the spatial and temporal cross validation, and are comparable between years (Fig. S6). The estimations are not significantly over-or underestimated, which is apparent from Fig. 2.  Fig. 3 presents the temporal variation in the observed and estimated monthly means for the four continuously operated monitoring stations: NND, KWS, OSK, and AMG. The plotted lines for the observed means are disconnected where observations are missing. The temporal variations are generally well reproduced, with high Pearson's correlation coefficients (r) of greater than 0.78. We aggregated the observed and estimated monthly means in each calendar year into annual means. The estimations also conform to the long-term trends of the annual observations at the four locations, which is apparent from Fig. S7. Fig. 4 presents the annual prediction maps at 1 Â 1 km resolution obtained by aggregating the estimated monthly means in each calendar year into annual means. While the concentrations in the southern part of Japan are consistently higher than those in the northern part throughout the study period, the estimations clearly decreased over the three decades in the whole study area. consistently higher than the area weighted means, although the difference decreases in later years. The decreasing trend is exhibited in both the area weighted and population weighted means. The decrement in concentrations for both of the means became remarkable in the 1990s.

Discussion
We developed the national-scale PM 2:5 exposure model to estimate the monthly means for three decades (1987e2016). Our model is accurate, with spatial cross validated R 2 and RMSE values of 0.76 and 1.9 (mg m À3 ), respectively, and temporal cross validated R 2 and RMSE values of 0.73 and 2.0 (mg m À3 ), respectively. Making predictions for the years beyond the modeling period might involve larger uncertainty due to the potential change in the relationship between PM 2:5 and predictors. Thus, in addition to the cross validation, we evaluated our model using two independent data sets recorded before the year 2014: the AQMN data and the governmental monitoring data. For external validation, we used AQMN data recorded between 2010 and 2013 and achieved R 2 and RMSE values of 0.76 and 2.2 (mg m À3 ), respectively, which are nearly identical to the indicators obtained by the spatial and temporal cross validation. This indicates that our model is robust without overfitting. No significant over-or underestimation is recognized in the three validation results, as illustrated in Figs. 1 and 2. The spatial and temporal cross validations yield similar R 2 and RMSE values between years. This shows the homogeneity in the model accuracy between years. Meanwhile, we assessed the reproducibility of the long-term trend and the monthly variations between 2000 and 2013 using the independent data sets obtained from governmental monitoring. The monthly variations and the long-term trend are well reproduced with r values of greater than 0.78, as presented in Fig. 3. These results indicate that our model accurately reproduces the spatial and temporal variability of PM 2:5 concentrations for the times when observations are available.
Meanwhile, we were unable to validate the prediction accuracy for the years without recorded observations (1987e1999). Levy et al. (2015) estimated annual NO x concentrations in Israel between 1961 and 2011, employing a LUR model developed using observations recorded between 1991 and 2011. They validated the estimations against the independent annual NO x emission estimates from road transport sources. Unlike NO x , PM 2:5 consists of primary and secondary pollutants such as sulfate and nitrate, which are produced by oxidation of SO 2 and NO 2 , respectively. Long-range transport from the Asian continent has been reported to contribute to PM 2:5 concentrations in Japan (Aikawa et al., 2010;Shimadera et al., 2016). In addition, we used PM 2:5 emissions from large point sources as one of the predictors. Therefore, using emission data for evaluating our estimations would not be appropriate. Similarly, the associated air pollutants could not be used for validation of the model. Satellite derived AOD cannot be applied for evaluations for years prior to 1999 (Li et al., 2017b).
Alternatively, we compare our trend estimations with that of a previous study. Li et al. (2017a) estimated gridded global PM 2:5 trends between 1989 and 2013 using a chemical transport model. In their results, the grids over Japan exhibit a decreasing trend of approximately 0.3e0.4 mg m À3 yr À1 . This is slightly lower than, but in a similar range as, our result of 0.4 mg m À3 yr À1 , which is calculated from the area weighted means presented in Fig. 5. This comparison shows that our trend estimates are generally consistent with the previous study.
These evaluation results suggest that our model estimations can be expected to be accurate not only for the later period (2000e2016) but also for the years (1987e1999) for which monitoring data are not available, thus sufficiently capturing the spatial and temporal variation of PM 2:5 .
The population weighted means are consistently higher than the area weighted means, as indicated in Fig. 5; this implies that areas with higher population densities have higher PM 2:5 concentrations due to local anthropogenic emissions. Similar results were reported in a previous study (Li et al., 2017a), confirming that population weighted PM 2:5 is higher than area weighted PM 2:5 in populated regions worldwide. The downward trend in PM 2:5 may owe to the decrease in local emissions. Itahashi and Hayami (2018) conducted PM 2:5 source apportionment in Japan for the years 2000e2008 using a chemical transport model, and found that decreases in domestic anthropogenic emissions, particularly those from road transport, contributed to the downward trend. The decline in local anthropogenic emissions may reduce the difference between the population and area weighted means. This might be the effect of governmental regulations on automobile exhaust and volatile organic compound (VOC) emissions.
We utilized the associated air pollutant information as the model predictors. These predictors exhibit high correlation with the monitored PM 2:5 (Fig. S8) and could contribute to the performance of the model. These pollutants share some emission sources with PM 2:5 , and in turn are influenced by meteorological conditions in a similar manner as with PM 2:5 . Thus, they can be expected to reflect the spatial and temporal variation of PM 2:5 to some extent. Additionally, these variables have the same temporal resolution as the observed PM 2:5 . The relationship between the associated pollutants and PM 2:5 is not causality, but correlation. Therefore, the associated pollutants are essentially different from commonly used predictors such as land use, meteorological, and traffic variables, which directly or indirectly affect PM 2:5 concentrations. Although satellite derived data are similar in this regard and have higher spatial coverage, the associated air pollutant information provides better temporal representativeness with significantly less missing data because it is recorded by onsite monitoring stations in continuous operation. Given these features, the information from the associated air pollutants can contribute to the predictive accuracy of statistical models and should be further examined for its applicability in a different context.
In this regard, building a daily model would be technically possible using the associated air pollutant information as well as meteorological parameters. More accurate monthly estimates could be derived by aggregating daily predictions obtained by a successfully trained daily model which capture spatial variations at a daily level. Meanwhile, the performance of a daily model would be crucial for improving the monthly estimation accuracy. Preparing predictors which sufficiently represent PM 2:5 daily variations is required to successfully build a daily model. However, it is particularly difficult for variables representing emission intensity. Therefore, constructing a daily model aiming at improving monthly estimation accuracy should be further examined.
There are some limitations in this study. First, we did not use some predictors that are unavailable in data from previous years. For instance, we used PM 2:5 emissions from large point sources obtained from EDGAR v4.3.2 (Crippa et al., 2018), which is available from 1970 through 2012 at 0.1 Â 0.1 resolution. Although there is an emission inventory for Japan at 1 Â 1 km resolution (Kannari et al., 2007;Fukui et al., 2014), this database does not contain data for years prior to 2000. Additionally, we did not use road network related variables as our model predictors, because historical road network data are not publicly available. Instead, we used road transport emission data obtained from EDGAR v4.3.2 (Crippa et al., 2018) at 0.1 Â 0.1 resolution, which may be too coarse for our model. Therefore, the accuracy of the model for recent years might be compromised to maintain the accuracy for the whole study period. We prioritized the temporal consistency of the model predictors over the estimation accuracy of a limited number of years, which may affect the prediction accuracy for the overall period.
Second, our estimations may not well represent the PM 2:5 concentrations in non-residential areas. To generate predictors for our model, we spatially interpolated the monitored concentrations of the associated air pollutants by ordinary kriging. The monitoring sites are not homogeneously distributed; in fact, they are mostly located in residential areas where the concentrations of such air pollutants are generally higher than those in non-residential areas. The kriged concentrations, the predictors of our model, in the nonresidential areas are likely to be over-influenced by the observed values of the associated pollutants in residential areas. Our model estimations in non-residential areas are in turn strongly influenced by the monitored values of the associated pollutants in residential areas. Thus, our estimations in non-residential areas may not well represent the actual pollution levels, rather reflecting the concentrations in residential areas. Therefore, our model may overestimate PM 2:5 concentrations in non-residential areas. Conversely, the potential uncertainty in our predictions is limited to nonresidential areas. This is acceptable because we focus on PM 2:5 exposures. To better estimate PM 2:5 concentrations over an entire area, a more accurate spatial interpolation of the associated pollutants should be considered.

Conclusion
Despite these limitations, we successfully developed our model for estimating national-scale monthly mean PM 2:5 concentrations for three decades with outstanding predictive accuracy. We demonstrated that an accurate long-term historical estimation can be achieved with observations limited in duration by our innovative approach, allowing to help fill the gap between limited data availability and prompt epidemiological knowledge accumulation.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.