Uncertainty propagation of meteorological and emission data in modeling pollutant dispersion in the atmosphere

Variability is true heterogeneity existing within a population that cannot be reduced or eliminated by more or better measurements. Uncertainty represents ignorance about poorly characterized phenomena, but it can be reduced by collecting more data. The aim of this paper was to study the impact of the variability and uncertainty of the main variables, i.e., emissions and meteorology, on the PM10 concentration caused by a point source located at Malagueño (Córdoba, Argentina). To perform this analysis, a scheme was developed using the USEPA Industrial Source Complex model algorithms with a Monte Carlo methodology. Using a simulation with one hundred thousand iterations, the concentration distribution was obtained and showed that the uncertainty in wind direction had the greatest impact on the estimates.


Introduction
Atmospheric dispersion models are tools used for predicting the fate and transport of air pollutants to assess the impact of emission sources on air quality (Monteiro et al., 2008). However, because atmospheric dispersion is a stochastic phenomenon, the concentration at a given time and place cannot be predicted accurately (Chatwin, 1982). In addition to the inherent uncertainty and variability of atmospheric processes, there are also errors associated with the air quality models and the parameters used (Hanna et al., 1998). Examples are errors associated with the input data, the use of surrogate data, the model formulation and its subsequent application outside its validity range.
While variability is the true heterogeneity observed in nature (with temporal, spatial or inter-individual differences), uncertainty is defined as the incomplete knowledge of a specific magnitude whose "real value" could only be established if there were a perfect measuring device (Cullen and Frey, 1999). Variability is a property of the system under study and cannot be reduced even by improving the measurement system. In contrast, uncertainty is considered to be a property of the measurement process and can be minimized, for example, by obtaining more data or higher quality data (Dabberdt and Miller, 2000).
Baumann-Stanzer and Stenzel (2011) quantified the effects of meteorological uncertainties on hazard distances using different models for scenarios with chlorine, ammonia and butane. Hanna et al. (2007) analyzed the effects of emission, meteorological and dispersion model parameter uncertainties on the annually averaged concentrations of benzene and 1,3-butadiene estimated with the ISC3 and AERMOD models using the Monte Carlo method. Yegnan et al. (2002) calculated the uncertainty in ground-level concentrations from ISCST calculations using first- and second-order Taylor series.
Garcia-Diaz and Gozalvez-Zafrilla (2012) applied uncertainty and sensitivity analysis methods to the ISC3 model to analyze the influence of wind speed, wind direction, and pollutant emission rate on the predicted ground-level concentration of sulfur dioxide emitted by a power plant.
The aim of this work was to evaluate the uncertainty in the estimation of the PM10 concentration (particulate matter ≤ 10 microns), which contributes to the ambient value, associated with a point source located 2 km south of Malagueño City (Province of Córdoba, Argentina). Here, only one source of uncertainty was considered: the dispersion model input data. This type of uncertainty includes systematic errors or biases in the data collection process, imprecision in analytical measurements, inference due to limited data or an unrepresentative variable, as well as the extrapolation or use of surrogate data for the parameters of interest.
The variables analyzed were meteorological variables (wind speed and direction, atmospheric stability and temperature) and emission variables (exhaust temperature, emission rate, and exhaust velocity). The meteorological variables were obtained from the National Meteorological Service, and the emission variables were estimated by evaluating the specific factors of industrial processes. To address this problem, the ISC (Industrial Source Complex) (EPA, 1995) dispersion model was used in conjunction with Monte Carlo (MC) simulation.

Dispersion Model
From the wide variety of mathematical tools available for dispersion modeling (Lagrangian, Eulerian, grid models, etc.), a Gaussian model application, the ISC3 computational model (EPA, 1995a), was chosen due to its simplicity and low computational cost. ISC3 is a steady-state Gaussian plume model that was created by the U.S. Environmental Protection Agency (EPA) for regulatory purposes. It was the preferred EPA model before being superseded by AERMOD (EPA, 2005) in 2006. Since then, ISC has been known as an "alternative model", although it is still used as a regulatory model in many parts of the world as a result of its robustness, adaptability to different situations, availability of required data and relative ease of use compared to more advanced models.
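As an illustration of what a steady-state Gaussian plume formulation computes, the following minimal sketch evaluates the textbook plume equation with ground reflection. It is not the ISC3 code: plume rise, the stability-dependent dispersion curves, deposition and building effects are all omitted, and the dispersion parameters `sigma_y` and `sigma_z` are assumed to be supplied by the caller. The function and argument names are hypothetical.

```python
import math


def gaussian_plume(q, u, sigma_y, sigma_z, y, z, h_eff):
    """Textbook steady-state Gaussian plume concentration (g/m^3).

    q        emission rate (g/s)
    u        wind speed at release height (m/s)
    sigma_y  crosswind dispersion parameter at the receptor distance (m)
    sigma_z  vertical dispersion parameter at the receptor distance (m)
    y        crosswind distance from the plume centerline (m)
    z        receptor height above ground (m)
    h_eff    effective release height (m)
    """
    if u <= 0 or sigma_y <= 0 or sigma_z <= 0:
        raise ValueError("u, sigma_y and sigma_z must be positive")
    crosswind = math.exp(-y ** 2 / (2.0 * sigma_y ** 2))
    # The second exponential is the image source that models ground reflection.
    vertical = (math.exp(-(z - h_eff) ** 2 / (2.0 * sigma_z ** 2))
                + math.exp(-(z + h_eff) ** 2 / (2.0 * sigma_z ** 2)))
    return q / (2.0 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical
```

The concentration is linear in the emission rate and inversely proportional to the wind speed, which is why uncertainties in both inputs propagate directly to the estimate.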

Uncertainty propagation: Monte Carlo Simulation
The steps usually followed in the MC simulation are (i) identify the mathematical model that best represents the system under study, (ii) describe the probability distributions of the variables of the model, (iii) take random samples from the distributions that characterize the input data, and (iv) obtain the output value given by the mathematical model for each sample. Finally, statistical analysis is performed on the output values to support decision making (Glasserman, 2003). The minimum number of iterations in an MC simulation depends on the number of input parameters (and whether they are correlated or not) and the confidence required in the output probability distribution (Graettinger and Dowding, 2001). As a rule of thumb, the model must be run a sufficient number of times to achieve numerical stability in the distribution tail. Although other methods exist, the most direct route for generating a random sample from a given distribution is the inverse method (Raychaudhuri, 2008). This scheme uses the inverse of the cumulative distribution function to convert a random number between 0 and 1 into a random value of the input distribution.
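The steps above, together with the inverse method, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the scheme used in this work: the exponential distribution is chosen only because its inverse cumulative distribution function has a closed form, and the names `sample_exponential` and `monte_carlo` are hypothetical.

```python
import math
import random


def sample_exponential(rate, rng):
    """Inverse method: push u ~ U(0, 1) through F^-1(u) = -ln(1 - u) / rate."""
    u = rng.random()  # random number in [0, 1)
    return -math.log(1.0 - u) / rate


def monte_carlo(model, samplers, n_iter, seed=0):
    """Steps (iii) and (iv): sample every input, run the model, keep the outputs.

    model     callable taking the sampled inputs as keyword arguments
    samplers  dict mapping an input name to a function rng -> sampled value
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_iter):
        inputs = {name: draw(rng) for name, draw in samplers.items()}
        outputs.append(model(**inputs))
    return outputs


# Toy usage: a stand-in model with one exponentially distributed input.
toy_outputs = monte_carlo(
    model=lambda x: 2.0 * x,
    samplers={"x": lambda rng: sample_exponential(2.0, rng)},
    n_iter=10_000,
)
```

The list of outputs is then summarized statistically, for example through its percentiles. Fixing the seed makes a run reproducible, which helps when checking whether the tail of the output distribution has stabilized as the number of iterations grows.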

Emission source characterization
The emission source used as the application case belongs to a cement plant located in Malagueño (Province of Córdoba). Currently, in this town, there are several activities related to lime mining and its industrialization. These operations emit particulate matter that can affect human health, primarily the respiratory system (WHO, 2005). The particulate source analyzed uses a so-called "dry process" for the production of cement. Prior to emission to the atmosphere, the combustion gases are filtered using a baghouse filter.
Emission rates are normally determined by continuous or periodic emission monitoring, by material balances, or through emission factors derived from similar sources. In our case, there were no monitoring data and no access to the process material balance; therefore, emission factors for PM10 were estimated using the AP-42 guidance (EPA, 1995b). According to Schvarzer and Petelski (2005), the production of this plant (its activity rate) in 2005 was 1.5 × 10^6 tons of cement, which resulted in an emission rate of 4.75 g/s. Although the current production is unknown, it was reported that the consumption of cement in Argentina was 9 × 10^6 tons in 2005 (Farfaro Ruiz, 2010) and that 11.5 × 10^6 tons were sold by late 2012. Assuming that a linear relationship has existed in recent years between the consumption and the production of cement in Argentina (Farfaro Ruiz, 2010), it was estimated that the Malagueño plant could be producing 1.9 × 10^6 tons, resulting in an emission rate of 6.02 g/s of PM10 in 2012.
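The arithmetic behind the 2012 estimate can be reproduced with a short sketch. It assumes, as the text implies, that plant production scales with national consumption and that the emission rate scales with production (a constant emission factor). The variable and function names are hypothetical.

```python
def scale_linearly(reference_value, reference_driver, new_driver):
    """Scale a quantity in proportion to the variable that drives it."""
    return reference_value * new_driver / reference_driver


production_2005 = 1.5e6    # tons of cement (Schvarzer and Petelski, 2005)
emission_2005 = 4.75       # g/s of PM10
consumption_2005 = 9.0e6   # tons, national consumption
consumption_2012 = 11.5e6  # tons, national sales by late 2012

# Production is assumed to follow national consumption.
production_2012 = scale_linearly(production_2005, consumption_2005, consumption_2012)

# The text rounds the production estimate to 1.9e6 tons before scaling the emission rate.
production_2012_rounded = 1.9e6
emission_2012 = scale_linearly(emission_2005, production_2005, production_2012_rounded)
```

With these inputs the production estimate is about 1.92 × 10^6 tons, and the emission rate computed from the rounded production figure is about 6.02 g/s, in agreement with the value quoted above.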
Because knowledge of the behavior of the emission variables was limited, probability distributions of minimum complexity were used. Accordingly, each variable was described by a triangular distribution defined by its lower and upper limits, with the mean value taken as the most likely value. The exhaust velocity and temperature of the gases were taken from descriptions of similar processes (Schneider et al., 1996; EQM, 2008; EC, 2000; NEPA, 2004).
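A triangular distribution of this kind can be sampled as in the sketch below. The limits shown are hypothetical placeholders chosen only to make the example run; they are not the values used in this study. The mode is set to the midpoint of the limits, following the description above.

```python
import random


def sample_triangular(low, high, rng):
    """Draw from a triangular distribution whose mode is the midpoint of its limits."""
    if not low < high:
        raise ValueError("low must be smaller than high")
    mode = (low + high) / 2.0
    # Standard-library argument order is (low, high, mode).
    return rng.triangular(low, high, mode)


# HYPOTHETICAL limits, for illustration only (not the values of this study).
emission_limits = {
    "emission_rate_g_s": (4.0, 8.0),
    "exhaust_velocity_m_s": (10.0, 20.0),
    "exhaust_temperature_K": (350.0, 450.0),
}

rng = random.Random(2012)
draw = {name: sample_triangular(low, high, rng)
        for name, (low, high) in emission_limits.items()}
```

Each Monte Carlo iteration would take one such draw per emission variable and pass it to the dispersion model together with the sampled meteorology.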

Characterization of meteorology
Meteorological data are the cause of more than half of the uncertainty in predicting hourly concentrations with dispersion models (Rao, 2005). In addition to the natural variability of the atmosphere, the use of meteorological data taken at non-representative locations, the use of inappropriate instruments, and non-systematic recording and data storage are some of the most influential factors affecting a dispersion model's uncertainty.
For the present study, 5 years of consecutive data (2007-2011) were provided by the National Weather Service (NWS) and measured at Ambrosio Taravella International Airport (approximately 20 km from the application zone). These data correspond to wind speed, wind direction, ambient temperature and atmospheric stability.
Two of the variables provided by the NWS (i.e., wind speed and direction) are characterized by discrete jumps. These discontinuities may be caused by the sensitivity of the measuring instruments, the methodology used for processing the measurements and how the data are "packaged" for distribution. This characteristic generates uncertainty about the "true" distribution of both variables because they are continuous variables. Therefore, the measurement and recording of these variables generated an intra-range uncertainty, and consequently, both wind direction and wind speed were first characterized as continuous variables and then as discrete variables to assess the impact on the estimated PM10 concentration. The distributions for wind speed and direction are shown in Figures 1 to 4. Additionally, the effect of the interannual variation of the meteorology was taken into account by using each of the five years independently to obtain more robust confidence intervals.
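The difference between the two characterizations can be illustrated with a simple stand-in. The sketch resamples recorded wind directions either exactly as reported (discrete) or spread uniformly within an assumed reporting interval (continuous). This is not the distribution fitting used in this work, and the 10-degree interval and the function name are assumptions made only for the example.

```python
import random


def sample_direction(records, rng, bin_width=None):
    """Resample a wind direction (degrees) from recorded values.

    bin_width=None  discrete: return a recorded value unchanged
    bin_width=w     continuous: spread the value uniformly over +/- w/2
                    and wrap the result back into the 0-360 range
    """
    value = rng.choice(records)
    if bin_width is None:
        return value
    jitter = rng.uniform(-bin_width / 2.0, bin_width / 2.0)
    return (value + jitter) % 360.0


# Toy records reported in discrete steps (illustrative values only).
records = [0.0, 90.0, 90.0, 180.0, 270.0]
rng = random.Random(7)
discrete_draws = [sample_direction(records, rng) for _ in range(5)]
continuous_draws = [sample_direction(records, rng, bin_width=10.0) for _ in range(5)]
```

The discrete draws can only take the reported values, whereas the continuous draws fill the gaps between them, which is the intra-range uncertainty referred to above.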

Variability and uncertainty
The variability and uncertainty analysis was performed over the maximum exposed receptor (MER). To determine the MER, EPA's ISC model was run. The urban area of Malagueño was represented by a 36 km² grid (with the origin at the point source) and a total of 931 nodes spaced 200 meters apart. Furthermore, the study area was considered rural because less than 50% of the area of influence (determined by a circle with a 3 km radius that was centered on the point source) is industrial, commercial or residential (EPA, 2005).
Then, the Monte Carlo method was applied. Two runs per year (for a total of 10 runs) were performed: the first with wind speed and wind direction as "continuous" variables and the second with wind speed and wind direction as "discrete" variables, while the remaining atmospheric variables (temperature and stability) kept the same characterizations. Each simulation for the MER was performed with 1 × 10^5 iterations. For each of the 10 scenarios, a distribution of concentrations (response distribution) was calculated. With these 10 response distributions, the confidence limits were estimated and the uncertainty bands shown in Figure 5 were constructed. The "y" axis is the cumulative frequency, here named the "variability percentile". First, this naming convention helps to distinguish the empirical frequency distributions of the input variables from the output (concentration) percentiles estimated with the Monte Carlo method. Second, it emphasizes the fact that this axis shows the temporal variability at the MER. As can be observed in Figure 5, the uncertainty bands are wider between the 30th and 90th percentiles, whereas they narrow at the lower and higher percentiles. According to the EPA (USEPA, 2002), decisions regarding the management of health risks should be based on the worst-case scenario. If sufficient information exists (e.g., to build probability distributions), decisions may be based on the tail of the distribution (e.g., the 90th or 95th percentiles). Following the EPA suggestions, Figure 6 shows the confidence intervals (uncertainty) at the 90th and 95th percentiles of the concentration's variability.
From this figure, it can be argued with 90% confidence (width of the confidence interval) that 90% (i.e., the 90th percentile) of the PM10 hourly average concentrations did not exceed 3.62 µg/m³; 2.97 µg/m³ is the most likely value for this percentile. In addition, the 95th percentile of the PM10 hourly average concentration (90% confidence) did not exceed 3.94 µg/m³, with a median value of 3.39 µg/m³.
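The construction of uncertainty bands from several response distributions can be sketched as follows. The sketch assumes that the confidence limits at each variability percentile are taken as the 5th and 95th percentiles across the scenarios; the exact estimator used in this work is not specified, so this choice is an assumption. The concentrations are synthetic (lognormal) stand-ins, not model output, and the function names are hypothetical.

```python
import random


def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def uncertainty_bands(scenarios, variability_percentiles, lower=5.0, upper=95.0):
    """For each variability percentile, return (lower, median, upper) across scenarios."""
    bands = {}
    for p in variability_percentiles:
        across = [percentile(outputs, p) for outputs in scenarios]
        bands[p] = (percentile(across, lower),
                    percentile(across, 50.0),
                    percentile(across, upper))
    return bands


# Ten SYNTHETIC response distributions standing in for the ten Monte Carlo runs.
rng = random.Random(1)
scenarios = [[(0.8 + 0.05 * k) * rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
             for k in range(10)]
bands = uncertainty_bands(scenarios, [50, 90, 95])
```

Each entry of `bands` corresponds to one horizontal slice of a plot such as Figure 5: the spread between the lower and upper values expresses uncertainty, while moving from one variability percentile to another expresses variability.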

Sensitivity
Sensitivity analysis evaluates how the change in a model's output variables can be attributed, qualitatively or quantitatively, to different sources of variation in the input data (Saltelli et al., 2008). Figures 7 and 8 show the Spearman correlation coefficients for the 50th and 95th concentration percentiles, respectively. The emission variables (PM10 emission rate, exhaust velocity and temperature) are grouped under one label due to their joint effect on sensitivity. The variables with the greatest impact on the median (50th percentile) were the meteorological variables, primarily wind speed and wind direction, followed by the ambient temperature and stability. The set of emission variables (emission rate, exhaust velocity and gas temperature) had the lowest impact. Given the implications for decision making, what is observed in the 95th percentile (distribution tail) is very important. In this percentile, the wind direction was the variable with the greatest impact. In contrast with the 50th percentile, the second most influential variable was the uncertainty introduced by the emissions, whose impact was, to an extent, similar to that of the wind speed; atmospheric stability and ambient temperature had the lowest impacts.
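A rank-based sensitivity measure of this kind can be sketched as follows. The Spearman coefficient is computed as the Pearson correlation of the ranks of the sampled inputs and outputs. The model in the usage part is a toy stand-in (it is not ISC3), and how the coefficients were tied to specific concentration percentiles in this work is not reproduced here. All names are hypothetical.

```python
import random


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy usage: the output depends strongly on wind speed and weakly on temperature.
rng = random.Random(42)
wind_speed = [rng.uniform(1.0, 10.0) for _ in range(2000)]
temperature = [rng.uniform(280.0, 300.0) for _ in range(2000)]
concentration = [1.0 / u + 0.001 * (t - 290.0) for u, t in zip(wind_speed, temperature)]
sensitivity = {
    "wind_speed": spearman(wind_speed, concentration),
    "temperature": spearman(temperature, concentration),
}
```

Inputs are then ordered by the absolute value of their coefficient. Because the coefficient works on ranks, it captures monotonic but nonlinear relationships, such as the inverse dependence of concentration on wind speed.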

Conclusions
The concentration of PM10 (and any other air pollutant concentration) is a random variable that cannot be predicted accurately, but it can be described using a probability distribution, which provides a better understanding of the concentration. A deterministic estimate does not provide complete information about the scenario because the concentration's uncertainty is not considered. For that reason, ISC3 (and other regulatory models) should be modified to accept uncertainties as input data and to report estimated concentrations along with their uncertainties. This would allow decision makers to evaluate the validity of these estimates in actual applications. For example, if the lower confidence limit (5th percentile) is above a regulatory standard, then corrective/preventive actions will most likely be needed. If the upper confidence limit (95th percentile) is below the standard, then corrective/preventive measures will most likely not be required. Thus, dispersion modeling supported by an uncertainty analysis could provide an important tool for decision making.
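The decision rule described above can be written as a small helper. The third outcome, for a standard that falls inside the confidence interval, is an addition made here for completeness and is not stated in the text. The function name and the returned labels are hypothetical.

```python
def screening_decision(lower_limit, upper_limit, standard):
    """Compare the confidence limits of a concentration percentile with a standard.

    lower_limit  lower confidence limit (e.g., 5th percentile of uncertainty)
    upper_limit  upper confidence limit (e.g., 95th percentile of uncertainty)
    standard     regulatory limit in the same units
    """
    if lower_limit > upper_limit:
        raise ValueError("lower_limit must not exceed upper_limit")
    if lower_limit > standard:
        return "corrective/preventive action most likely needed"
    if upper_limit < standard:
        return "corrective/preventive action most likely not required"
    # Assumption: the standard lies within the interval, so the estimate cannot decide.
    return "inconclusive: reduce the uncertainty before deciding"
```

In the inconclusive case, collecting better input data to narrow the interval is the natural next step.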
Modeling the dispersion of air pollutants requires a large number of inputs, some of which are subject to large uncertainties. Additionally, assumptions often have to be made that tend to overestimate the actual concentrations.
In this work, the variability and uncertainty in PM10 concentration modeling were estimated using a point source located in Córdoba Province as a case study. Two main sources of uncertainty were considered: meteorology and emissions.
The meteorological data used presented discrete jumps in wind speed and wind direction, even though both variables are continuous. Therefore, the propagation of this source of uncertainty was analyzed considering (i) continuous probability functions and (ii) discrete probability functions.
The AP-42 method was used for the estimation of the emission factors, which also introduced uncertainties into the calculations; however, these uncertainties were not considered here because, according to our criteria (uncertainty is a property of the analyst), they are overshadowed by the uncertainty in the activity rate, given that no recent data are available.
Five years of meteorological data were fitted to several distributions, and 10 runs (two runs per year) were performed (1 × 10^5 iterations each) with the support of the ISC3 model combined with Monte Carlo. Ten PM10 concentration distributions were obtained for the maximum exposed receptor.
By defining wind speed and direction with discrete distributions, it was observed that higher concentrations were produced than with their continuous counterparts, resulting in very large uncertainty bands between the 30th and 90th percentiles. With 90% confidence, the 90th and 95th percentiles were 3.62 µg/m³ and 3.94 µg/m³, respectively.
Through sensitivity analysis, it was determined that the wind direction and the emission variables were the most important factors affecting the uncertainty in the distribution tail. With these findings, it can be concluded that by obtaining higher quality data (wind speed, wind direction and emission variables), the uncertainty bands will decrease dramatically and thus improve the quality of the estimations made with the ISC3 dispersion model.
Finally, it is important to note that although there are other meteorological and emission variables subject to uncertainties (e.g., roughness length, mixing height, precipitation, cloud cover, ceiling height, and emission factor), they were not considered here. Nevertheless, it would be important to study the implications of these factors' uncertainties in the dispersion modeling.

Figure 5. Uncertainty bands of PM10 hourly concentrations. The "y" axis shows the cumulative frequency bands of the PM10 concentration ("x" axis). These bands indicate the lower and upper limits (confidence interval 90%) of the speculated "actual" variability.

Figure 6. 90th and 95th percentile box plot of PM10 concentrations. Here, the confidence intervals (uncertainty) for the EPA's suggested percentiles are highlighted. Note that this graph has the same axes as Figure 5 (and nearly the same information), but it emphasizes the percentiles of interest.

Figure 7. Spearman correlation coefficient of the 50th percentile

Figure 8. Spearman correlation coefficient of the 95th percentile

. Parameters used for characterizing emission uncertainties
, which employed ranges to describe these variables. The following table summarizes the values used: