Using geographic information systems to assess individual historical exposure to air pollution from traffic and house heating in Stockholm.

A specific aim of a population-based case-control study of lung cancer in Stockholm, Sweden, was to use emission data, dispersion models, and geographic information systems (GIS) to assess historical exposure to several components of ambient air pollution. Data collected for 1,042 lung cancer cases and 2,364 population controls included information on residence from 1955 to the end of follow-up for each individual, 1990-1995. We assessed ambient air concentrations of pollutants from road traffic and heating throughout the study area for three points in time (1960, 1970, and 1980) using reconstructed emission data for the index pollutants nitrogen oxides (NO(x)/NO(2)) and sulfur dioxide together with dispersion modeling. NO(2) estimates for 1980 compared well with actual measurements, but no independently measured (study-external) data were available for SO(2), precluding similar validation. Subsequently, we used linear interpolation and extrapolation to obtain estimates for all other years 1955-1990. Eleven thousand individual addresses were transformed into geographic coordinates through automatic and manual procedures, with an estimated error of < 100 m for 90% of the addresses. Finally, we linked annual air pollution estimates to annual residence coordinates, yielding long-term residential exposure indices for each individual. There was a wide range of individual long-term average exposure, with an 11-fold interindividual difference in NO(2) and an 18-fold difference in SO(2). The 30-year average for all study subjects was 20 µg/m(3) NO(2) from traffic and 53 µg/m(3) SO(2) from heating. The results indicate that GIS can be useful for exposure assessment in environmental epidemiology studies, provided that detailed geographically related exposure data are available for relevant time periods.

Recent developments of geographic information systems (GIS) have provided environmental epidemiologists with new tools to study associations between environmental exposures and disease. A GIS is a powerful computer mapping and analysis tool, which permits spatial linking of different types of data (e.g., residential addresses, environmental exposure levels, demographic information) (1,2). GIS can also be used for data management, such as automated address matching to geographic coordinates (geocoding).
One important GIS application is the mapping of environmental exposure (3,4). Modeling of air pollution data has been used to assess ambient concentration fields, usually considered proxies of exposure fields for residents in the area (5,6). Dispersion models are physical (deterministic) models that use existing data on emissions and meteorologic and topographic conditions to create maps of pollutant concentrations (7). Such models are generally based on Gaussian plume dispersion equations. Several models have been constructed, such as the BREEZE model developed for the U.S. Environmental Protection Agency, the CAR model developed by the Dutch environmental ministry (4), and the AIRVIRO models developed by the Swedish Meteorological and Hydrological Institute (8-10).
It has long been argued that exposure to urban air pollution contributes to the development of lung cancer, although the increase in risk is thought to be limited (11-13). Studies from many countries indicate a smoking-adjusted relative risk in urban areas over countryside areas of up to 1.5 (14,15). The extent to which urban air pollution contributes to this excess is uncertain, but studies on diesel-exposed occupational groups support a causative role for traffic-related pollution (14,16).
Lung cancer is the most common cause of cancer death among males in Sweden. Stockholm County has a higher incidence of lung cancer than Sweden as a whole, and in the central parts of Stockholm the incidence increases even further (17). A large case-control study of lung cancer in Stockholm males, the LUCAS (Lung Cancer in Stockholm) study, aimed to determine the role of known and potential risk factors, such as smoking, occupational exposures, and exposure to ambient air pollution. Our methodologic study, based on the LUCAS study, specifically aimed to develop methods to assess individual historical exposure to ambient pollution using reconstructed emission data, dispersion models, and GIS techniques. Nitrogen oxides (NO x and NO 2 ) and sulfur dioxide were chosen as indicators of air pollution from motor vehicles and residential heating, respectively.

Materials and Methods
The source population consists of all men 40 to 75 years of age who lived in Stockholm County at any time between 1985 and 1990, and who had lived in the County since 1950, with a maximum of five years of residence outside the County. From this population we recruited three groups, which form the study population for this study: 1,042 lung cancer cases reported to the Stockholm County cancer registry 1985-1990, and two population-based control groups comprising 2,364 individuals, age-stratified to the case group (18,19).
We collected information on individual exposures using a postal questionnaire sent to living respondents or to next of kin when the study subject had died. The questionnaire included questions on occupations, smoking habits, and food consumption. To assess exposure to air pollution, we also collected information on all dwellings where the case or control had lived for more than one year from 1955 onward. Where the residential histories were incomplete because of gaps, overlaps, or other inconsistencies, we sought supplemental information from the local offices for demography (parishes) and/or the tax authority. Respondents who had lived in apartment houses were also asked whether they had lived on the ground floor.
We transformed the addresses into geographic coordinates using the MAPINFO computer software (20) in conjunction with a regional geographic address database (21). The address database covers all urban areas in Stockholm County, defined as areas with more than 200 inhabitants and no more than 200 meters between houses (22). The database contains the names of all streets in the urban area. Each street is divided into segments (generally the length of a block), with geographic coordinates on both ends of the street segment and with street numbers on each side of the street at both endpoints. When an address with street and street number is matched to the address database, coordinates for street numbers between endpoints of the street segment are calculated through interpolation.
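The interpolation of a street number between segment endpoints can be illustrated with a minimal sketch. The record layout (`Segment`) and the function name are hypothetical and are not taken from MAPINFO or the regional address database; the sketch only assumes that street numbers are spread evenly along one side of a segment.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Segment:
    """One side of a street segment (hypothetical record layout)."""
    x_start: float
    y_start: float
    x_end: float
    y_end: float
    num_start: int  # street number at the start of the segment
    num_end: int    # street number at the end of the segment


def interpolate_address(seg: Segment, number: int) -> Tuple[float, float]:
    """Place a street number linearly between the segment endpoints."""
    lo, hi = sorted((seg.num_start, seg.num_end))
    if not lo <= number <= hi:
        raise ValueError("street number outside the segment range")
    if seg.num_end == seg.num_start:
        t = 0.5  # single-number segment: fall back to the midpoint
    else:
        t = (number - seg.num_start) / (seg.num_end - seg.num_start)
    x = seg.x_start + t * (seg.x_end - seg.x_start)
    y = seg.y_start + t * (seg.y_end - seg.y_start)
    return x, y
```

For a 100 m segment carrying the numbers 1 to 11 on one side, number 6 would fall at the midpoint of the segment under this assumption.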
We geocoded the 10,800 addresses in the study in a three-step procedure. The first step used the automatic geocoding function in MAPINFO. Addresses that could not be coded directly by the program were coded interactively in a second step, with an operator adjusting spelling or street number. In the third step, addresses not matched interactively were coded manually.
We tested the reliability of the automatic coding function using two external suppliers of geocoding services [Telemedia (Stockholm, Sweden), a private company making the maps for the Swedish telephone directory, and CFD (Centrala fastighetsdata; Gävle, Sweden), central authority for real estate data]. Telemedia uses a technique similar to that of MAPINFO, whereas CFD sets the coordinates for an address to the center of the property.
We assessed exposure to traffic-related air pollution (NO x and NO 2 ) for 1955-1990 for the whole territory of Stockholm County, approximately 6,500 km 2 . The exposure assessment was based on geographic modeling of three points in time: 1960, 1970, and 1980. We used the AIRVIRO dispersion modeling system in conjunction with retrospectively constructed emission databases for these years. We used a detailed regional emission database for 1993, collected by the Stockholm-Uppsala Air Quality Management Association (Stockholm), as a basis (23). It contains approximately 4,300 line sources related to road traffic, covering all roads with more than 1,000 vehicles/24 hr, which corresponds to 90% of the estimated total emission from road traffic. It also contains information on over 500 point sources, including major industries and energy plants as well as small industry and ferries in ports. Limited diffuse emission sources, e.g., air traffic and merchant vessels in commercial routes, are treated as area sources, and several population-density related sources, such as local heating, are mapped as grid sources (250 m or 1,000 m grids). Work machine emissions are treated as grid sources, with one-quarter distributed over industrial areas and the remainder distributed according to population density. The total emission of NO x in the region in 1993 is estimated at 18,900 tons, of which 79% originated from road traffic.
To reconstruct comparable historical emission databases, we retrospectively collected data from 1955 to 1993 on the growth of the urban areas in the Stockholm County region, the development of the district heating system, and the growth and redistribution of the road traffic (Table 1). We constructed separate emission databases for regional emissions of NO x for 1960, 1970, and 1980. The first two databases contain only contributions from road traffic, whereas the 1980 database was constructed in two versions: with road traffic only and with all sources. The total emissions of NO x from road traffic in these databases are 13,400, 18,100, and 20,800 tons/year, respectively.
Emissions from heating were assessed as SO 2 in a way similar to that used for NO x emissions from road traffic. We reconstructed three historical regional emission databases for the same years as for the NO x emissions, with information again based on the 1993 database. Detailed data on the development of district heating (point sources) and other energy plants, and local restrictions on the sulfur content of oil were used for the reconstruction (Table 2), as well as SO 2 measurement data. We used data from five monitoring stations (annual and decade mean levels) to calibrate the model for average emission levels from grid-type sources (mainly local oil-fueled house heating) for 1970 and 1980, through iterative model runs. Measured levels from the two central monitoring stations were allowed to influence the relative level of emissions from grid sources in the 1960 and 1970 databases, respectively.
We performed dispersion calculations from the retrospectively constructed emission databases using the Gaussian model in the AIRVIRO system (8). We used the 1990s average distribution of climate in 180 different conditions over a year for all periods, and we used no correction factors. We calculated NO x and SO 2 annual mean concentrations for the relevant period, using only the contributions from road traffic and heating sources, respectively. The NO x concentrations were transformed to NO 2 data using an equation in which the parameters A, B, and C were estimated from measurements in Stockholm County in the early 1980s (hourly means from 10 fixed and five mobile stations). The parameters A and B were allowed to vary according to location, and the estimates were in the ranges 0.60-0.70 and 30-40, respectively. The parameter C was estimated at 100 throughout the county. The performance of these conversions was tested in 220,418 paired 1-hr measurements of NO x and NO 2 from the 1990s. The correlation between [...]
Table 1. Examples of input data for reconstruction of NO x emission data for 1960, 1970, and 1980, and local interpolation of NO 2 levels 1955-1959, 1961-1969, 1971-1979, and 1981-1990.
We performed the dispersion calculations in four different resolutions: 2,000 m × 2,000 m, 500 m × 500 m, 200 m × 200 m, and 100 m × 100 m grids. We used the highest resolutions in central city areas and the lowest resolutions in the countryside. For each grid we estimated the average levels of the index pollutants. All illustrations are based on 500 m × 500 m grids. The isolines in the illustrations were not used for the calculations.
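One way to read a concentration for a coordinate from grids of several resolutions is to prefer the finest grid that covers the point. The sketch below assumes this rule and a simple row/column grid layout; the `Grid` class and `concentration_at` function are hypothetical and do not describe the AIRVIRO data format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Grid:
    """Square-cell concentration grid (hypothetical layout)."""
    x0: float      # x coordinate of the lower-left corner (m)
    y0: float      # y coordinate of the lower-left corner (m)
    cell: float    # cell side length (m), e.g. 100, 200, 500, or 2000
    values: Sequence[Sequence[float]]  # values[row][col], row 0 at y0

    def lookup(self, x: float, y: float) -> Optional[float]:
        """Return the cell average at (x, y), or None if outside the grid."""
        col = int((x - self.x0) // self.cell)
        row = int((y - self.y0) // self.cell)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[row]):
            return self.values[row][col]
        return None


def concentration_at(grids: Sequence[Grid], x: float, y: float) -> float:
    """Use the finest-resolution grid that covers the point (assumed rule)."""
    for grid in sorted(grids, key=lambda g: g.cell):
        value = grid.lookup(x, y)
        if value is not None:
            return value
    raise ValueError("point is outside all grids")
```

Under this assumption, an address in the city center would receive the 100 m cell average, whereas a rural address would fall back to the 2,000 m cell average.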
For main streets in the city center, we added a street canyon contribution to the assessed roof concentrations of NO x . We assessed this contribution by dispersion calculations with the AIRVIRO street canyon model, and summarized it in an 80% addition at the ground floor and a 40% addition for all other dwellings in the streets concerned. We gave 68 addresses (2% of 3,406) an addition to the 1960 dispersion estimate; 58 addresses an addition to the 1970 dispersion estimate, and 44 to the 1980 dispersion estimate. Additions to the assessed levels of NO 2 were made in the same way, but with a 50% addition at the ground floor and a 20% addition for other dwellings in these streets.
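The street canyon additions described above amount to a multiplicative adjustment of the modeled roof-level concentration. A minimal sketch, with a hypothetical function name and using only the percentages stated in the text:

```python
# Percentage additions from the text: (ground floor, other dwellings).
CANYON_ADDITIONS = {
    "NOx": (0.80, 0.40),
    "NO2": (0.50, 0.20),
}


def add_street_canyon(roof_level: float, pollutant: str, ground_floor: bool) -> float:
    """Apply the street canyon addition to a roof-level concentration.

    Intended only for addresses on the main city-center streets concerned;
    all other addresses keep the roof-level estimate unchanged.
    """
    ground, other = CANYON_ADDITIONS[pollutant]
    addition = ground if ground_floor else other
    return roof_level * (1.0 + addition)
```

For example, a roof-level NO 2 estimate of 30 µg/m 3 would become 45 µg/m 3 for a ground-floor dwelling and 36 µg/m 3 for a dwelling on a higher floor.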
For 1980, we compared the dispersion model calculations including all sources of NO x (after transformation to NO 2 as indicated above) with measurements of NO 2 at five rooftop or background sites (one stationary and four mobile stations) taken in the early 1980s (Figure 1). In the first model run, all five annual values predicted by the model at the highest resolution (not shown in Figure 1) were within ± 20% of the value measured at their respective locations. The model was successively calibrated to minimize this deviation. All available measurements for SO 2 had been used to calibrate the model, so no formal validation of the SO 2 modeling was possible.
To create annual levels of exposure to SO 2 and NO 2 from the modeled concentration fields, we computed extrapolated and interpolated values for each address for each year between 1955 and 1990. We modeled the NO 2 concentration at a geographic point to increase linearly from 1955 to 1970, using the estimated values for 1960 and 1970 to obtain the slope. The rationale for linearity is that the traffic counts increased approximately linearly in this period. Between 1970 and 1980 the changes were small and were also modeled as linear. We modeled the increase after 1980 as linear with the same slope as 1955-1970, motivated by a similar linear increase in traffic counts. In the vicinity of new major roads and the corresponding relieved roads, the slopes before and after the change were extracted from surrounding areas.
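The piecewise linear scheme for NO 2 can be written out as a short sketch. The function name and arguments are hypothetical; the special handling near new major roads is not included, and no lower bound is applied to the extrapolated values.

```python
def no2_for_year(year: int, c1960: float, c1970: float, c1980: float) -> float:
    """Annual NO2 level at one location from the three modeled years.

    1955-1970: straight line through the 1960 and 1970 estimates.
    1970-1980: straight line between the 1970 and 1980 estimates.
    1980-1990: line from the 1980 estimate with the 1955-1970 slope.
    """
    if not 1955 <= year <= 1990:
        raise ValueError("year outside the study period 1955-1990")
    early_slope = (c1970 - c1960) / 10.0
    if year <= 1970:
        return c1960 + early_slope * (year - 1960)
    if year <= 1980:
        return c1970 + (c1980 - c1970) / 10.0 * (year - 1970)
    return c1980 + early_slope * (year - 1980)
```

With modeled values of 10, 20, and 22 µg/m 3 for 1960, 1970, and 1980, this gives 5 µg/m 3 for 1955, 21 µg/m 3 for 1975, and 32 µg/m 3 for 1990.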
We modeled SO 2 concentrations as constant during the period 1955-1967 (1960 [...]
Articles • Assessing air pollution exposure using GIS Environmental Health Perspectives • VOLUME 109 | NUMBER 6 | June 2001
Table 2. Examples of input data for reconstruction of SO 2 emission data for 1960, 1970, and 1980, and local interpolation of SO 2 levels 1955-1959, 1961-1969, 1971-1979, and 1981-1990.
Besides the individual exposure estimates, we calculated the average annual level of all addresses in the study, which may be interpreted as an annual area average, approximately weighted for average population. The results from epidemiologic analyses based on these exposure estimates are presented elsewhere (19).
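The linkage of annual pollution estimates to each individual's residential history can be sketched as a simple average over the years of interest. The data structures below (a year-to-address mapping and a lookup function) are hypothetical stand-ins for the GIS linkage and are not the study's actual implementation.

```python
from statistics import mean
from typing import Callable, Dict, Hashable


def period_average(
    residence_by_year: Dict[int, Hashable],
    level_at: Callable[[Hashable, int], float],
    first_year: int,
    last_year: int,
) -> float:
    """Average the annual level at the residence held in each year.

    residence_by_year maps a calendar year to an address identifier;
    level_at(address, year) returns the estimated annual concentration.
    Years without residence information are skipped (an assumption).
    """
    values = [
        level_at(residence_by_year[year], year)
        for year in range(first_year, last_year + 1)
        if year in residence_by_year
    ]
    if not values:
        raise ValueError("no residence information in the requested period")
    return mean(values)
```

Calling the function with the 30 years before the end of follow-up would give a long-term index; calling it per decade would give decade-specific indices.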

Results
The dispersion calculations for NO 2 from road traffic show a substantial increase of the average level with time (Figures 2 and 3), mainly from 1960 to 1970. The increase affects most parts of greater Stockholm and is caused mainly by the increase in traffic volume (Table 1). The dispersion calculations for SO 2 from heating show a substantial decrease of the levels with time, from above 130 µg/m 3 in central Stockholm at the end of the 1960s to less than 50 µg/m 3 in the whole area at the beginning of the 1980s (Figures 4 and 5). The main reasons for this decrease are restrictions on the sulfur content of heating oil, massive investment in district heating, and investment in emission control at the energy plants (Table 2). The decrease is more pronounced in densely populated areas because of earlier implementation of stricter sulfur standards and a more rapid development of district heating.
Of the 10,800 addresses in the study, 56% were geocoded by the automatic geocoding function in MAPINFO. A further 27% were geocoded by the computer program with minor operator assistance. The remaining 17% of the addresses had to be geocoded manually; these addresses were either outside urban areas or too inexact to be geocoded interactively. Three percent of all the addresses could not be found and were given approximate coordinates, e.g., of a village center. We checked the result of the geocoding visually by examining maps where the geocoded addresses were plotted.
For a subset (n = 100) of the addresses, we performed a reliability test of the automatic coding of geographic coordinates using two external suppliers of geocoding services. In this subset, 68 addresses could be geocoded automatically with all three procedures. Both external codings agreed reasonably well with our automatic coding. The average deviation from our coding was about 50 m in both; 80% of the addresses agreed within 100 m for all three codings; and our coding agreed within 100 m for almost 90% in comparison with any one of the other two codings. The maximal deviation was 168 m and 237 m, respectively, when our coordinates were compared with those from the external suppliers.
The estimated annual mean concentration levels for NO 2 from road traffic and SO 2 from heating, at all of the 10,800 addresses in the study (i.e., considering the whole period for every address), clearly show the increase of NO 2 and decline of SO 2 in Stockholm County (Figures 6 and 7). The corresponding annual means for the study population individuals are similar to those of all the addresses. There is a slight tendency toward a temporal decrease in the individual means compared with the average address means, caused by a net migration from more polluted to less polluted areas (Figures 6 and 7).
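The agreement measures reported for the reliability test (average deviation, maximal deviation, and share within 100 m) can be computed from paired coordinates as in the following sketch. The function name and the assumption of planar coordinates in meters are illustrative only.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def geocoding_agreement(
    ours: Sequence[Point],
    other: Sequence[Point],
    threshold_m: float = 100.0,
) -> Dict[str, float]:
    """Summarize distances between two codings of the same addresses.

    Coordinates are assumed to be planar (x, y) pairs in meters, listed
    in the same address order in both sequences.
    """
    if len(ours) != len(other) or not ours:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [
        math.hypot(x1 - x2, y1 - y2)
        for (x1, y1), (x2, y2) in zip(ours, other)
    ]
    return {
        "mean_m": sum(distances) / len(distances),
        "max_m": max(distances),
        "share_within": sum(d <= threshold_m for d in distances) / len(distances),
    }
```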
All individual exposure indicators show approximately normal distributions (Table 3). Despite the increase in NO 2 levels during the period, the average estimated individual traffic-related NO 2 levels during the three decades before the end of follow-up are well below the current World Health Organization (WHO) ambient concentration guideline of 40 µg/m 3 for NO 2 , which refers to contributions from all sources. This guideline defines an annual limit value for the protection of human health, and is numerically identical to that of the European Union recommendation proposed for 2010 (24). The maximum of the individual period average is about 11 times higher than the minimum. The total range extends just over the WHO limit. Despite the decline of emissions since 1960, the average estimated individual heating-related SO 2 level during the 30-year study period exceeds the current WHO recommended guideline of 50 µg/m 3 for the annual mean ambient SO 2 concentration from all sources (25). About 58% of the individuals have study-period mean exposures from heating-related sources that exceed this level. The total range of individual average levels is quite large, with a maximum that is about 20 times higher than the minimum. That the levels decrease with time is reflected in differences in individual average levels in the three decades preceding individual end of follow-up. The mean and most other parameters decrease by about two-thirds from the first to the last decade (Table 3). The interquartile ratio is about 1.7 for all exposure metrics, and the ratio of the means of the upper and lower quartiles is about 3 (Table 3).
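The contrast measures quoted above (maximum-to-minimum ratio, interquartile ratio, and ratio of the means of the upper and lower quartile groups) can be computed from a list of individual averages as sketched below. The function name is hypothetical, and the quartile cut points follow the default method of Python's `statistics.quantiles`, which may differ slightly from the method used in the study.

```python
import statistics
from typing import Dict, Sequence


def exposure_contrasts(values: Sequence[float]) -> Dict[str, float]:
    """Summarize interindividual contrast in positive exposure values."""
    if len(values) < 4:
        raise ValueError("need at least four values to form quartile groups")
    ordered = sorted(values)
    q1, _median, q3 = statistics.quantiles(ordered, n=4)
    k = len(ordered) // 4  # size of the lowest and highest quartile groups
    lower_mean = statistics.mean(ordered[:k])
    upper_mean = statistics.mean(ordered[-k:])
    return {
        "max_min_ratio": ordered[-1] / ordered[0],
        "interquartile_ratio": q3 / q1,
        "quartile_mean_ratio": upper_mean / lower_mean,
    }
```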
The individual 30-year average exposures to NO 2 and SO 2 are correlated (r = 0.65). Despite this correlation, there are obvious differences in individual exposure to the two agents. For example, at an individual average SO 2 exposure of 80 µg/m 3 , there is a 3-fold total variation (15- [...]
[Figure legend: Year; Mean (area); Mean (population); 10th to 90th percentile (population)]

Discussion
Recently, atmospheric dispersion models have been applied in a retrospective study of health effects of a point source of arsenic dust (26). Some other epidemiologic studies have used regression models of measured air pollution levels in GIS-based exposure assessment, for example, the Small Area Variation in Air Pollution and Health (SAVIAH) project (27). To our knowledge, our study is the first attempt at retrospective application of atmospheric dispersion models to assess exposure to a wide range of emission sources that are followed over several decades. We assessed the exposure at the individual level, linked to residence over a 30-year period, and based our assessment on indices of complex mixtures of air pollution from road traffic and house heating in an entire region.
The geocoding of addresses is often central to the use of a GIS in environmental epidemiology research. The success of the geocoding depends on the quality of the databases available. It became clear in the present study that the regional address database contained errors such as spelling mistakes and local differences in the completeness of data. This is likely to occur in most detailed databases of this kind. Also, the interpolation technique used in the present study yields a set of address coordinates close to the road, which should describe road traffic exposure more accurately than the method using the property centroid. A more formal validation of the address coordinates (e.g., by using the satellite Global Positioning System technique) was not performed, but we compared our method to geocoding using two other address databases for a subset of the addresses. There was a difference of less than 100 m for 90% of the addresses. Since the highest resolution of the air dispersion model was 100 m × 100 m, the geocoding accuracy seems satisfactory.
We assessed both NO x and NO 2 as indices of pollution from road traffic. NO x represents the direct emissions and is thus, in principle, a better index of traffic exhaust exposure. NO 2 is also important because of its toxicity and widespread use in exposure assessment. The relation between the two indices was derived from observations in Stockholm in the 1980s. The assessed relation is relevant for the period before catalytic converters were implemented, and showed good agreement with the measured relation in the beginning of the 1990s. However, it should not be used outside our study period and location, because it depends on the composition of the automobile fleet and on climatologic conditions.
Gaussian dispersion models running on personal computers can now be implemented in several places (7). The main challenge is not the model itself, but rather the quality of the emission and meteorologic data. We used the same average yearly distribution of meteorologic conditions for all study periods, which should be sufficient for calculation of long-term average levels. The alternative is using time-series weather data, which would allow spatial assessment of the pollutant level day to day. However, this places enormous demands on computational capacity and was judged unnecessary. The ultimate answer on how well a model run performs is gained by comparison with independently measured pollutant levels. Unfortunately, we could perform this type of validation only for NO 2 .
The 1993 emission database for Stockholm County contains very detailed descriptions of road traffic and house heating. We constructed historic databases for 1980, 1970, and 1960 mainly by successive additions and deletions to the 1993 database. Modeled ambient NO 2 data based on the 1980 database compared well with actual measurements for that period. The current database has been somewhat improved, and we used the 1995 edition to compare measured NO 2 data in 16 locations throughout Stockholm for 1994-1997. Modeled annual average levels correlated very well (r = 0.96) with the measurements [site averages in the range 4-26 µg/m 3 (28)].
Assumptions regarding emission factors, proportion of heavy traffic, and different heating fuel qualities are likely to be less valid for the beginning than for the end of the study period. The average pollutant levels assessed by modeling based on the early databases may therefore be subject to greater measurement error. For traffic emissions, however, the main source of geographic exposure contrast at any given time is the difference in traffic work between different neighborhoods, which is well documented in traffic flow charts from the 1960s onward. The interindividual NO x or NO 2 exposure contrasts within limited time periods, e.g., decades, are therefore expected to be reasonably valid even in the beginning of the study period.
In contrast, the differences in SO 2 levels depend on the distribution of different heating methods. The estimated emissions of SO 2 from heating in 1970 were lower per capita than those estimated for the whole country based on fuel consumption in the beginning of the 1970s (29,30). Despite this discrepancy, we judge the temporal development of average house heating to be quite accurate. However, the geographic variations are less well documented in the reconstructed emission databases. The interindividual SO 2 exposure contrasts in the early study period may therefore be less valid than those for traffic exhausts.
Table 3. Descriptive statistics for average estimated individual exposure to NO 2 from road traffic and SO 2 from house heating, used in a case-control study of lung cancer in Stockholm. Individual exposure is reported for three decades before the year in which follow-up ended. Lung cancer incidence was followed up in 1985-1990 and the individual decades fall within the calendar years 1955-1969, 1965-1979, and 1975-1989.
Despite the mobility of study subjects (on average about three residences in 30 years), the results of the present study show ranges of assessed individual exposure to be quite wide. Considering the average individual exposure for three decades before the end of follow-up, the ratio between maximum and minimum was approximately 20 for SO 2 and 10 for NO 2 . When the subjects are divided into quartiles, the ratio between the average exposure in the upper group and that in the lower group is almost 3 for both indices. Within the different decades, this contrast is somewhat higher. It thus seems possible to categorize this population into different levels of exposure, which permits the detection of exposure-response relations. Furthermore, despite the overall correlation between the two agents, there were substantial differences in the relation between exposures to NO 2 and SO 2 for study individuals, which should make it possible to distinguish between these two pollutants in epidemiologic analyses. However, the outdoor levels assessed in residential areas reflect true individual total exposure only to a varying extent. Exposure contributions at work and in traveling are not included. In addition, even perfectly measured individual levels of these pollutants have proxy properties, because they are single components in very complex mixtures of pollutants from road traffic and heating. The relation of NO 2 and SO 2 with the other components of these mixtures is likely to have changed over time, and may also vary across the territory.
Any positive finding of exposure-response based on this exposure information is therefore likely to be an underestimate of the total effects of air pollution from these sources.

Conclusion
To our knowledge, this is the first attempt to use reconstructed emission data together with dispersion modeling and GIS for retrospective individual exposure assessment of multisource air pollution in an epidemiologic study. The results show that the technique may be useful for exposure assessment in environmental epidemiologic studies, given that detailed emission data or environmental exposure data are available for study-relevant time periods and with good spatial resolution. Another prerequisite is that the biologically relevant true individual exposure contrast must be high enough for individual contrast to remain in a geographically based assessment of proxy exposures.