Evaluation of the high resolution WRF-Chem ( v 3 . 4 . 1 ) air quality forecast and its comparison with statistical ozone predictions

An integrated modelling system based on the regional online coupled meteorology–atmospheric chemistry WRF-Chem model configured with two nested domains with horizontal resolutions of 11.1 and 3.7 km has been applied for numerical weather prediction and for air quality forecasts in Slovenia. In the study, an evaluation of the air quality forecasting system has been performed for summer 2013. In the case of ozone (O3) daily maxima, the firstand second-day model predictions have been also compared to the operational statistical O3 forecast and to the persistence. Results of discrete and categorical evaluations show that the WRFChem-based forecasting system is able to produce reliable forecasts which, depending on monitoring site and the evaluation measure applied, can outperform the statistical model. For example, the correlation coefficient shows the highest skill for WRF-Chem model O3 predictions, confirming the significance of the non-linear processes taken into account in an online coupled Eulerian model. For some stations and areas biases were relatively high due to highly complex terrain and unresolved local meteorological and emission dynamics, which contributed to somewhat lower WRF-Chem skill obtained in categorical model evaluations. Applying a bias correction could further improve WRF-Chem model forecasting skill in these cases.


Introduction
Real-time air quality forecasting (RT-AQF) is a relatively new discipline in atmospheric sciences, which has evolved as a response to societal and economic needs, reflecting the progress in scientific understanding of physical processes and numerical and computational technologies (Zhang et al., 2012a).The first RT-AQF systems, developed for forecasting air pollution in exposed urban regions, were either empirical methods based on persistence, climatology, human expertise and meteorological forecast (e.g.Wolff and Lioy, 1978), or statistical models taking advantage of links between pollutant concentrations, meteorological variables (wind speed and direction, temperature, cloudiness, moisture etc.) and physical (emissions) parameters (e.g.McCollister and Wilson, 1975;Cobourn, 2007;Vlachogianni et al., 2011).The next step in evolution of RT-AQF systems was the use of sophisticated chemical transport models that represent all major processes (meteorological and chemical) that lead to the formation and accumulation of air pollutants.Many of these RT-AQF systems consist of an offline coupled meteorological model and a chemical-transport model, where the meteorological model (e.g.ALADIN, ALADIN International Team, 1997;MM5, Grell et al., 1994;WRF, Skamarock et al., 2008) provides meteorological input for the chemical-transport model (e.g.EMEP, van Loon et al., 2004;CMAQ, Byun and Schere, 2006;CAMx, ENVIRON, 2011;CHIMERE, Menut et al., 2013) with an output time interval typically of around 1 h.Examples are the EURAD (http://db.eurad.uni-koeln.de/index_e.html),SILAM (http://silam.fmi.fi/), ForeChem (http://atmoforum.aquila.infn.it/forechem/),CALIOPE (http://www.bsc.es/caliope/)forecast systems and others.The new generation of online coupled models (e.g.MCCM, Grell et al., 2000;GATOR-GCMM, Jacobson, 2001; Meso-NH-C, Tulet et al., 2003;WRF-Chem, Grell et al., 2005;Enviro-HIRLAM, Baklanov et al., 2008; GEM-AQ, Kaminski et al. 2008;COSMO-ART, Vogel et al., 2009; WRF-Chem-MADRID, Zhang et al., 2010a) presents an alternative approach with one unified modelling system, in which meteorological and air quality variables are simulated together within the same model.The online approach permits the simulation of two-way interactions between different atmospheric processes including emissions, chemistry, clouds and radiation, and a better response of the simulated pollutant transport to changes of the wind field (Grell et al., 2004) and, thus, provides a more realistic representation of the atmosphere.The use of online coupled models can be particularly important in regions with high aerosol loadings and cloud coverage (Otte et al., 2005;Eder et al., 2006), where physical processes in the atmosphere may be modified by the aerosol direct effect on radiation or by aerosol-cloud interactions.Several reviews summarized the strengths and limitations of offline and online coupled models (e.g.Zhang, 2008;Klein et al., 2012;Baklanov et al., 2014).There is an increasing awareness that an integrated online approach is needed not only for assessment, forecasting and communication of air quality but also for weather forecasting (e.g.Baklanov, 2010;Grell and Baklanov, 2011;Klein et al., 2012;Zhang et al., 2012b;Baklanov et al., 2014).Nevertheless, there are several issues regarding the inclusion of chemistry into numerical weather prediction models.More evidence is required to determine whether an integrated model can produce a good climatology of the most important chemical species and if such a model is, considering many uncertainties, able to beat persistence forecasts of these species (Grell and Baklanov, 2011).These questions are calling for further research and studies exploring the performance of the models with an online coupled chemistry.
In recent years extensive efforts have been devoted to develop air quality (AQ) forecasting systems for Slovenia.In this study we explore the use of the state-of-the-science WRF-Chem model (Grell et al., 2005) with coupled meteorological, microphysical, chemical, and radiative processes for forecasting AQ in Slovenia during summertime conditions.In last decade WRF-Chem has been increasingly applied to many areas worldwide (e.g.Misenis and Zhang, 2010;Fast et al., 2009;Zhang et al., 2010a, b;Li et al., 2011;Tie et al., 2009;Hu et al., 2012;Forkel et al., 2012Forkel et al., , Žabkar et al., 2011Forkel et al., , 2013)).In most of these studies the WRF-Chem model has been successfully used to simulate historical poor AQ conditions with a hindcast approach.To our knowledge, only a few studies focused on using WRF-Chem for forecast-ing AQ, most of these applied the WRF-Chem forecast before and during field campaigns (McKeen et al., 2005(McKeen et al., , 2007(McKeen et al., , 2009;;Yang et al., 2011).Takigawa et al. (2007) evaluated the O 3 forecast for a 1-month time period from a one-way nested global-regional RT-AQF system with full chemistry based on the global CHASER (Sudo et al., 2002) and regional WRF-Chem models, while Saide et al. (2011) evaluated a forecast system based on the WRF-Chem model for simulating carbon monoxide (CO) as a PM 10 /PM 2.5 surrogate over Santiago de Chile for wintertime conditions.WRF-Chem-MADRID (Zhang et al., 2010a) with two additional gas-phase mechanisms, sectional representation for particle size distribution and more advanced model treatments compared to WRF-Chem, was applied by Chuang et al. (2011) and by Yahya et al. (2014) for forecasting AQ over the southeastern US.In spite of a limited number of evaluation studies published in the literature, an increasing number of real-time weather and air quality forecasting systems based on WRF-Chem are implemented worldwide (http://ruc.noaa.gov/wrf/WG11/Real_time_forecasts.htm).
In our study we explore the forecasting skill of WRF-Chem model over the topographically complex and geographically diverse area of Slovenia for three summer months (June-August 2013).Furthermore, in the case of O 3 we compare WRF-Chem predictions with a statistical model for predicting O 3 daily maxima, currently used at the Slovenian Environment Agency (SEA).Both first-day (1-day) and secondday (2-day) forecasts are considered, while a persistence model, which assumes that pollutant level today and tomorrow will be the same as yesterday, is used as a threshold for useful model prediction.Since the availability of an accurate and reliable forecasting system could be useful to the local authorities and could help to advise the public the proper preventive actions, we want to answer the question of whether WRF-Chem model outperforms the statistical model or persistence.Namely, considering many uncertainties related to one unified model, it may not be easy for models with online chemistry to be able to perform well enough to meet the required standards, and more research and studies are needed to investigate that (Grell and Baklanov, 2011).Due to the limited number of previous studies focused on online coupled forecasting systems, the aim of our study is also to provide a greater insight into the potential that lies in the approach based on a unified model for forecasting weather and air pollution.Finally, identified strengths, limitations and deficiencies of analysed RT-AQFs are expected to present the basis for further research.

WRF-Chem forecast system
The RT-AQF system for Slovenia based on the WRF-Chem model version 3.4.1 is configured with two nested domains (Fig. 1) with horizontal resolutions of 11.1 and 3.7 km, and 151 × 100 and 181 × 145 grid points, respectively.A oneway nesting is applied by two separate consecutive simulations, where outputs from the coarse grid integration are processed to provide boundary conditions for the nested run every 15 min.The vertical structure of the atmosphere is resolved with 42 vertical levels extending up to 50 hPa, with the highest resolution of ∼ 25 m near the ground.About 15 levels are located within the lowest 2 km to assure high vertical resolution of the daytime planetary boundary layer (PBL).To produce the 48 h forecast, the model is run every day, starting at 00:00 UTC, with meteorological initial conditions (ICs) and lateral boundary conditions (BCs) taken from the 0.5 • data of the Global Forecast System (GFS) operated by the US National Weather Service (NWS).For chemical BCs, forecasts from the global MOZART-4/GEOS-5 (Emmons et al., 2010) RT-AQF system with temporal availability of 6 h are used.The instantaneous outputs at the 24th hour of the previous day's forecast are used to initialize next day's forecasting simulation.An exception is the very first day of the first 48 h forecasting cycle, when global MOZART-4/GEOS-5 fields were also used to initialize chemistry.A 3-day spinup ahead of the first analysed forecast day is then taken into account to allow pollutants to accumulate in the air masses.
In the WRF-Chem model, several choices for parameterizations of physical and chemical processes are available (Grell et al., 2005;Skamarock et al., 2008;Peckham et al., 2012) and these can have a strong impact on the model predictions.In both domains we decided to apply the same schemes as were used in simulation SI1 for phase-2 of the Air Quality Model Evaluation International Initiative (AQMEII) (e.g.Balzarini et al., 2015;Baró et al., 2015;Curci et al., 2015;Forkel et al., 2015;Im et al., 2015a, b;Kong et al., 2015;San Josè et al., 2015).These include the Yonsei University (YSU) PBL scheme (Hong et al., 2006), NOAH land-surface model (Chen and Dudhia, 2001), Rapid Radiative Transfer Method for Global (RRTMG) long-wave and short-wave radiation scheme (Iacono et al., 2008), Grell 3-D ensemble cumulus parameterization scheme (Grell and Devenyi, 2002) with radiative feedback, Morrison doublemoment cloud microphysics (Morrison et al., 2009), Fast-J photolysis scheme (Wild et al., 2000), RADM2 gas-phase chemistry (Regional Acid Deposition Model, Stockwell et al., 1990) and the MADE/SORGAM aerosol module (Ackermann et al., 1995;Schell et al., 2001).Current model implementation includes a modified RADM2 gas-phase chemistry solver as described in Forkel et al. (2015), which avoids underrepresentation of nocturnal O 3 titration in areas with high NO emissions.According to Forkel et al. (2015) the modified solver tends to overestimate the low NO 2 concentration for pristine regions and in the free troposphere, which results in an overestimation of O 3 .Due to the focus on polluted regions this deficiency was considered as less important than the advantage of better description of the titration.In addition, the comparatively small modelling domain (D1) ensures that the boundary conditions constrain the high bias of the modified solver for O 3 and NO 2 in the free troposphere.Also, according to our sensitivity tests (results not shown) the modified solver showed better performance for O 3 daily maxima and O 3 nighttime minima than the QSSA (quasi steady state approximation) RADM2 solver supplied originally with the WRF-Chem model.
Among feedbacks only the aerosol direct effects on radiation according to Fast et al. (2006) and Chapman et al. (2009) are taken into account.As shown by Kong et al. (2015) for two air pollution episodes, this degree of aerosol-meteorology interactions in the 3.4.1 version of the WRF-Chem improved model performance for high aerosol loads, while the representation of the indirect effects needs to be further improved to be able to outperform simulations with direct effects only.
Biogenic emissions are estimated using MEGAN (Model of Emissions of Gases and Aerosols from Nature; Guenther et al., 2006) online model calculations, while dust emissions are modelled according to Shaw et al. (2008) with an adjustment to avoid high dust fluxes from some Dalmatian Islands in Croatia.A detailed anthropogenic inventory for pollutants CO, NH 3 , NO x , SO 2 , and NMVOCs (non-methane volatile organic compounds), created for the purpose of AQ forecasting constructed for year 2009 by SEA (SEA, 2014), is used to estimate anthropogenic emissions in Slovenia.For areas outside Slovenia the recently updated anthropogenic emissions for the year 2009 based on the TNO-MACC-II (Netherlands Organization for Applied Scientific Research, Monitoring Atmospheric Composition and Climate -Interim Implementation), the same as prepared for phase-2 of the AQMEII exercise (Pouliot et al., 2014), are being used.Daily updates of the WRF-Chem-based experimental AQ forecast are provided at http://meteo.fmf.uni-lj.si/onesnazenje.

Statistical ozone daily maximum forecast
The statistical O 3 model (Žabkar, 2011), currently used at SEA for forecasting O 3 daily maxima at eight measuring sites in Slovenia (Fig. 3), is a multivariate regression tool combined with clustering algorithms to take into account measured data, weather forecast data, and the predicted backward trajectories of each monitoring site.As regards measurements, the previous day's (at 12:00, 15:00, 18:00 and 21:00 local time, daily maximum, daily minimum, daily average) and present day's early morning (07:00 local time) meteorological (pressure, relative humidity, direct and diffusive solar radiation, wind speed) and AQ data (O 3 , NO x , NO 2 , CO, PM 10 , SO 2 ) are used.For meteorological predictions, the 24 h ECMWF forecast variables at 12:00 UTC of the forecast day at different vertical levels (1000, 925, 850, 500, 300 hPa) above the measuring sites are taken into account.Among all these variables, by using the stepwise technique based on F statistics, only significant variables were selected to be included in multivariate regression equations for different monitoring sites (from 15 to 26 variables, depending on monitoring site).
The important part of the statistical forecast is the calculation of 24 h backward trajectories on meteorological fields of the ALADIN/SI forecast.The inclusion of 24 h predicted trajectories into a statistical model is based on the study of Žabkar et al. (2008) which showed that the highest O 3 daily maxima at monitoring sites in Slovenia are in general associated with short (slow-moving) backward trajectories with a southwestern origin, while the lowest measured daily maximum O 3 values for all the stations are associated with the clusters of long northwestern trajectories.Clusters of similar trajectories were for the purpose of the statistical forecast calculated by k means clustering algorithms (Moody and Galloway, 1988;Žabkar et al., 2008) on 6 years (2004-2010) of data (ALADIN/SI trajectories).As an example, Fig. 2 shows a mean O 3 daily maxima for clusters of similar trajectories for one of the monitoring sites.The same 6-year time period of training data was used in the stepwise multiple regression procedure to determine the multiple regression prognostic equations associated with monitoring sites and trajectory clusters, from measurements, ECMWF forecast data, average cluster O 3 daily maximum, and day-of-the-year variable.
The first step of the statistical O 3 prediction is the calculation of trajectories approaching the monitoring stations at 12:00 UTC of the forecast day.In the next step, the backward trajectories of each monitoring site are associated with the nearest pre-calculated cluster of similar trajectories.Finally, the multiple regression equation of the associated group of trajectories is used to calculate the O 3 daily maximum prediction.It must also be noted that the decision on declaring O 3 episodes is only partially based on the results from this statistical model, it also involves a decision made by AQ forecasters.

Evaluation methodology
We evaluate the 1-day and 2-day WRF-Chem meteorological and AQ forecasts on the high-resolution domain during a 3-month period (June-August 2013).The main focus is on O 3 predictions.In the case of air pollutants, the instantaneous lowest model level mixing ratios (with grid point centre at about 12 m above model orography -an exception is the KRV station as explained below) are compared to the hourly averaged concentrations measured at monitoring stations (which have a typical inlet height of 3 m) from the national network and some other environmental information systems in Slovenia.Figure 3 shows locations of these AQ monitoring stations and Table 1 lists the basic characteristics, including comparison of the station altitude, the height of model orography, model analysis height, and pollutants with higher than 75 % availability of valid data during the analysed time period for each of the AQ monitoring sites.In the case of the elevated alpine KRV station, AQ variables are evaluated for the fifth model layer instead of the first model layer.We made this exception for KRV since the height of the model topography was significantly underestimated there (Table 1) and the station is known to be strongly influenced by the conditions of the free troposphere.The selection of the fifth model layer for the KRV station is based on analyses performed for different model layers (results not shown) and was found to reduce the negative bias for O 3 due to too low WRF-Chem topography at this location.Although even for this model layer the location of the grid point representing the KRV station (1414 m) is still well below the true station altitude (1740 m), the O 3 bias for the KRV station is signifi- cantly smaller than for the first layer, while the correlation coefficient between the measured and simulated O 3 levels remains similar in both cases (the fifth or the lowest model layer).Taking results from higher model layers would further decrease the negative model bias but would also worsen the correlation coefficient for O 3 at this station due to decreased impact of surface processes.
All AQ stations are background, seven are measuring urban background, one suburban and nine rural conditions.Valid O 3 measurements are for the analysed time period available for 13 AQ stations.When studying the general model performance, data from four additional stations for two other pollutants (NO 2 , PM 10 ) are also analysed to get a better picture of model behaviour over the domain, known for its large topographical and climate diversity.The coverage of three climate zones in Slovenia (Mediterranean, subalpine and mountainous) with monitoring stations is the following: NG, KOP and OTL are Mediterranean sites, KRV is a mountainous station, and the remaining stations are subalpine.As the elevated station KRV, the ISK, OTL and VNA stations are also influenced by regional transport of pollutants.
For evaluation of predicted meteorological variables, data from SEA meteorological stations (MET, Fig. 3) for 2 m temperature (T2 m), 10 m wind speed (W10 m), relative humidity (RH), incoming solar radiation (SR) and precipitation (RR) are used.It must be noted that MET stations with lower spatial representativeness (e.g.alpine stations) were not a priori excluded from the analyses, which needs to be taken into account when looking at evaluation results.The reason for not excluding these stations was that some information about the AQ forecast can also be gained by the evaluation of meteorological forecasts for these stations.
Basic statistical measures (correlation coefficient (CORR), mean error (ME), mean absolute error (MAE) and root mean square error (RMSE)) are used for evaluating the model's forecasting skills of meteorological and AQ variables.In the case of O 3 , correlation coefficients are presented also by Taylor diagrams (Taylor, 2001), which graphically summarize the similarity between model forecasts and observations not only in terms of their correlation but also with their centred root mean square difference and the amplitude of their variations, represented by their standard deviations.Furthermore, some additional discrete statistical measures, including in-R.Žabkar et al.: Evaluation of the high resolution WRF-Chem air quality forecast dex of agreement (IOA), the mean normalized bias error (MNBE), and the mean normalized gross error (MNGE) are calculated for O 3 daily maximum concentrations predicted by the different models.Finally, to evaluate the model's ability to predict exceedances and non-exceedances, several categorical indices including equitable threat score (ETS), critical success index (CSI), bias (B), false alarm ratio (FAR) and probability of detection (POD) are calculated for different thresholds.Definitions of the statistical measures are shown in Appendix A.

Meteorology and air quality of June-August 2013
The analysed period was marked by three heat wave events, which contributed to the summer characterized by high temperatures, sunny weather and lack of precipitation in Slovenia.The first heat wave event with a measured temperature daily maximum of up to 35 • C occurred after a rather cold beginning of the month and lasted from 15 to 21 June.The event was terminated by a cold front passage and followed by the pronounced cold episode during the end of June and the beginning of July.Another heat wave event with temperatures above 35 • C, observed in the lowland, started on 26 July and was briefly interrupted on 29 July, when thunderstorms related to frontal passage were accompanied by exceptionally strong wind gusts.The most remarkable of three three extraordinary hot episodes was recorded from 01 to 08 August.On the last day of this episode, 08 August, temperatures reached 40 • C at some measuring sites in Slovenia, and many of them observed their highest temperature ever recorded.
As expected for summertime conditions, measured concentrations of most air pollutants, including PM 10 , were in general low during the analysed time period.The only exception was O 3 with exceedances of the 8 h target value (120 µg m −3 ) measured at all AQ monitoring stations during the three heat wave events, which is the reason why the main focus of the present study is on this pollutant.During the following two events (in July and August) threshold exceedances of 1 h daily maxima were also recorded for O 3 .In spite of the hot and sunny conditions during the first heat wave event in June 2013, measured daily O 3 maxima at the Slovenian stations did not exceed the 1 h information threshold value (1 h ITV; 180 µg m −3 ) but reached 171 µg m −3 at the Mediterranean OTL and the elevated alpine KRV stations.During the second heat wave event, the 1 h daily maxima exceeded 180 µg m −3 at KRV, OTL, NG and KP (23-28 July), and the highest number of 1 h exceedances (20) was measured in July at the OTL station.Similarly, during the August heat wave event O 3 concentrations exceeded the 1 h ITV at LJ, MB, OTL, NG and KP from 02 to 07 August.To summarize, the Mediterranean stations (NG, OTL, KP), due to very high O 3 concentrations measured during the heat wave events (especially the second two events), exhibited the poorest AQ in Slovenia during the analysed time pe-riod, while the legislation's limit values were exceeded only occasionally for the subalpine stations.

Evaluation of meteorological variables
Table 2 shows conventional statistical scores evaluating the 1-day WRF-Chem forecast for the basic meteorological variables: 2 m temperature (T2 m; for hourly values and daily maxima), 10 m wind speed (W10 m), RH and incoming SR.Results for three selected measuring sites (LJ, NG, MS) and the overall result for all 24 MET monitoring sites (shown in Fig. 3) are presented separately.
Incoming solar radiation is the main energy source that drives all atmospheric processes, including PBL processes, and has a critical role also in atmospheric chemistry.For almost all sites the mean SR was overestimated by the model, with overall MEs of 16 and 11 W m −2 for 1-day and 2-day forecasts, respectively.CORR was higher for 1-day (0.77) than for 2-day (0.71) forecasts, with a range of 0.64-0.90for 1-day forecasts at different stations.The larger positive bias for the first day than for the second day can be attributed to less cloudy conditions during the first day of simulation.
In the case of T2 m 1-day (2-day), the WRF-Chem meteorological forecast showed an overall correlation with measurements of 0.93 (0.94) for all 1 h values and 0.97 (0.96) for 1 h daily maxima.With an exception of three alpine stations with higher simulated positive bias, daily T2 m maxima were simulated with MEs between −3.9 and −0.6 • C, depending on the station's spatial representativeness.All meteorological variables, including soil temperature and soil moisture, are always initialized with GFS data.This explains higher negative bias for T2 m during the first day of simulation in spite of the overestimated solar radiation.An average systematic underestimation of T2 m daily maxima was −2.1 • C both for 1-day and 2-day forecasts.Nighttime T2 m minima showed lower systematic bias for the 2-day forecast, which resulted in an overall bias for all hourly T2 m values of −1.3 • C for 1-day and −0.8 • C for 2-day forecasts.Predominantly weak wind conditions with variable direction at stations located in complex topography were challenging to simulate.The general model tendency was to overestimate W10 m with an overall ME of 0.8 m s −1 for 1-day and 2-day forecasts, which for some stations bias can be very low (e.g.LJ; Table 2) and much higher for other stations due to their local positioning in complex topography (e.g.HRA located in valley with ME of 1.9 m s −1 ).For hourly values the correlation is lower (Table 2), but for mean daily W10 m values a Pearson correlation coefficient of between 0.4 and 0.9 has been simulated, depending on the monitoring site.Relative humidity shows slightly better results for 1-day than for 2-day forecasts with CORR of 0.77 and low overall ME of 2 % for the 1-day fore-Table 2. Statistical scores for 1 h values of 2 m temperature (T2 m), 10 m wind speed (W10 m), relative humidity (RH), and for daily average incoming solar radiation (SR).Shown are results for 1-day forecast, calculated separately for three measuring sites (LJ, NG, MS), and for 24 MET monitoring stations (ALL) during the 3-month period.In the case of temperature, results for daily maxima are also shown.2).
Precipitation has an important role in cleansing of the atmosphere by wet deposition and scavenging.On average, the predicted precipitation underestimated the measured 3month accumulations by −55 mm (1-day) or −8 mm (2-day forecast), where the station-averaged, predicted 3-month precipitation was 145 mm for 1-day and 194 mm for 2-day forecasts (results not shown).It must also be taken into account that the 3.4.1 model version does not allow including the information about hydrometeors at the boundaries of the nested domain (in the applied 1-way nesting procedure), which contributes to the negative simulated bias of precipitation.A large decrease in the precipitation bias from day 1 to day 2 suggests that different initialization methodology (e.g. using 1-day spin-up for meteorology) could improve the prediction of precipitation events.

Evaluation of air quality variables
In this section we evaluate WRF-Chem predictions for O 3 , NO 2 and PM 10 , as three of the most problematic pollutants in terms of harm to human health and compliance with EU limit values (EEA, 2012).Table 3 shows the domain-wide performance statistics for 1-day and 2-day forecasts of these pollutants where, in the case of O 3 , 1 h and 8 h averages and daily maxima are analysed separately.The comparison of 1-day and 2-day forecasts shows that concentrations of air pollutants were somewhat better forecasted 1-day ahead by means of almost all of statistics shown in Table 3, with higher impact on O 3 predictions.Although the 2-day prediction was generally not worse for the majority of meteorological variables, the reason for a better 1-day prediction in the case of O 3 could be the somewhat stronger simulated winds on the second day of simulation.Stronger winds impact the transport and dispersion of pollutants and have the greatest consequence on secondary pollutants (like O 3 ) which need time to be formed.
As shown in Table 3 the WRF-Chem simulations tend to overestimate the 1 and 8 h O 3 values with MEs of 14.5 and 14.6 µg m −3 , respectively.Looking at MAE, RMSE and CORR statistics, agreement with measurements is better for 8 h (22.6, 28.1 µg m −3 and 0.69) than for 1 h O 3 values (25.1, 32.1 µg m −3 and 0.65), which is in line with results of previous studies (e.g.Tong and Mauzerall, 2006) and suggests that the current modelling system has problems simulating the small-scale fluctuations of O 3 .On the other hand, evaluations of predicted 8 h and daily O 3 maxima, which are of most concern, show good model performance (ME, MAE RMSE and CORR of −2.7, 13.3, 16.7 µg m −3 and 0.81 for daily maxima, respectively) in line or even better than those obtained in some previous studies (e.g.Tong and Mauzerall, 2006; Chuang et al., 2011;Yahya et al., 2014), which could be to some extent related to higher model resolution.
To understand results of the domain-wide statistics (in Table 3) we further analyse spatial and temporal characteristics of model O 3 predictions.Figure 4 shows a spatial pattern of average, simulated 1-day predictions for O 3 , NO 2 and PM 10 overlaid with measured averages where, in the case of O 3 , results for all hourly values and for daily maxima are shown separately.Examples of forecasted and measured time series for O 3 at different stations are shown in Fig. 5.In Fig. 4a the elevated alpine KRV station is the only one with a high negative bias (−12 µg m −3 ) in forecasted 1 h O 3 concentrations at the lowest model layer, which can be explained by the too low altitude of the KRV station in model topography.The high negative bias for hourly O 3 concentrations at the KRV station is reduced to a value of only −2 µg m −3 by using the fifth model layer concentrations as explained in Sect.2.3.The fifth model level predictions will be used for KRV in all analyses that follow.Besides KRV, the Mediterranean KOP and OTL stations, as well as the rural ZAV site, are stations with comparatively high measured nighttime O 3 levels, which results in a low overall bias for all hourly O 3 values for these stations (from −2 to −7 µg m −3 ).Namely, the WRF-Chem model cannot capture well the profound nighttime O 3 reductions (shown also by Žabkar et al., 2013;Im et al., 2015a), which contributes to the overall overprediction of hourly O 3 concentrations (from 10 to 36 µg m −3 ) for stations with very low measured nighttime O 3 concentrations.For sites with the highest positive bias in 1 h O 3 concentrations (TRB, ZAG, HRA and ISK, with bias of 36, 31, 26 and 32 µg m −3 , respectively), this can also be partly explained by the too high altitude of the stations in model orography (Table 1), since the mean O 3 concentration increases with height.Looking at the O 3 daily maxima (Fig. 4b), the underpredictions occur at the alpine KRV station (−16 µg m −3 for the lowest model level shown in Fig. 4) and at three Mediterranean stations (OTL, NG, KOP; from −14 to −11 µg m −3 ).For Mediterranean stations the underestimations of daily maxima are most probably due to inaccurate representation of costal processes in model, which are crucial for PBL height evolution and accumulation of pollution in the nearground air layers.For TRB station located in a narrow valley of very complex terrain, which cannot be appropriately resolved in the current model topography, the model over- predicts O 3 daily maxima for 14 µg m −3 .For other subalpine stations the bias of O 3 daily maxima predictions is lower.
To some extent the previously mentioned model overpredictions of nighttime O 3 minima could be explained by model error in predicted NO 2 levels.When evaluating the primary pollutants one must be aware that in the model the instantaneous emissions are spread over an entire grid box, which results in underestimated emissions and concentrations close to the source regions and overestimated emissions and concentrations at rural locations adjacent to the source regions, and can thus cause a combined effect of negative and positive biases at urban and rural sites.Comparisons of WRF-Chem-predicted NO 2 levels with measurements show that in spite of the high spatial resolution the concentrations of the small urban areas are insufficiently represented by the model (Fig. 4c).In Slovenia many towns are located in basins or very narrow valleys, usually poorly or, in some cases, not resolved in model topography.Smoothed local emissions for these towns show significant underestimations of NO 2 concentrations (e.g.ZAG in Fig. 6).In combination with poorly reproduced meteorological processes (calm and stable nighttime conditions in valleys and basins) this results in an underestimation of the O 3 loss by titration.This can explain the positive nighttime bias of O 3 found at these sites.The situation is better for bigger cities, located in wider basins, like LJ or CE (LJ; Fig. 6), while at rural sites NO 2 is either well simulated (e.g.MOH; Fig. 6) or slightly overpredicted due to increased emissions from adjacent urban areas (e.g.ZAD; Fig. 6).The overall agreement of hourly NO 2 predictions with measurements was good for rural sites while urban sites experienced underpredictions, which were highest for small cities, especially for NG (ME of −13 µg m −3 ) and ZAG (ME of −14 µg m −3 ).
Also interesting to discuss are the results for predicted PM 10 concentrations (Table 3, Fig. 4d) showing slight overprediction of daily PM 10 levels at all stations, which is somewhat surprising due to the fact that nearly all current offline and online coupled chemical transport models show large systematic PM 10 underestimations.For example, within the AQMEII exercise, where 17 modelling groups from Europe and North America were brought together, running eight operational online coupled air quality models over Europe and North America, the rural PM 10 concentrations over Europe were underestimated by all models (model configurations) by up to 66 % while for the urban PM 10 concentrations the underestimations were even much larger (up to 75 %) (Im et al., 2015b).The reason for slight overprediction of PM 10 levels could be to some extent attributed to the high model spatial resolution used in our study.Furthermore, the CORR values for daily PM 10 concentrations is rather low (0.34 and 0.37 for 1-day and 2-day forecasts, respectively; Table 3), which is partly due to the low temporal dynamics of measured daily PM 10 concentrations during the analysed time period (no recorded PM 10 exceeding), and partly due to the simulated PM 10 overestimations during the heat wave events.These overpredictions contributed also to the overall positive bias of predicted PM 10 levels.As shown in Fig. 7 for two monitoring sites, there was a significant PM 10 overprediction simulated on 10 June (day 8 in Fig. 7) related to the pre-frontal advection of polluted air masses coming from the northwestern part of the domain D2 (coming from domain D1).The next significant PM 10 overprediction occurred dur- ing the first heat wave episode (17-22 June), when during the hot and low wind conditions (after 17 June) the PM 10 levels started to build up in the PBL over entire domain D2 (and over southwestern parts of domain D1) and reached the maximum concentrations in Slovenia again with prefrontal advection of polluted air masses.Both overpredictions contributed to an overall positive bias in forecasted PM 10 concentrations.Detailed analyses showed that high concentrations in domain D1 originated from boundary conditions, and appear to be a consequence of overestimated advection of Saharan dust in MOZART model predictions.The increase in PM 10 concentrations over Slovenia was also simulated during the prefrontal advection related to the cold front which terminated the next two heat wave events in July and August (days 56 and 57 and days 67 and 68 in Fig. 7); however, during these days the predicted PM 10 levels were close to the measured PM 10 concentrations.

Evaluation and comparison of different methods for O 3 daily maximum predictions
In this section we want to answer the question: how accurate is the 1 h O 3 daily maximum WRF-Chem forecast in comparison to the statistical model prediction or to persistence?According to Zhang et al. (2012a) statistical models are known to be generally more suitable for complex site-specific relations between concentrations of air pollutants and predictors.With appropriate and accurate predictors they have a higher accuracy as compared to deterministic models which is along with their computational efficiency their main advantage (Zhang et al., 2012a).Among the strengths of the deterministic models are that they give prognostic time-and spatially resolved concentrations under typical and atypical scenarios and can give scientific insights into pollutant formation processes (Zhang et al., 2012a).Furthermore, they also allow for forecasts in locations which are not monitored due to their complete spatial coverage.In spite of simplified descriptions of physical and chemical processes in the deterministic models and inaccuracies and uncertainties in model inputs (in particular the emissions), some previous studies already suggested that deterministic models can also have skills similar to statistical forecasting tools (e.g.Manders et al., 2009).In addition to evaluation and comparison of O 3 daily maxima predictions with WRF-Chem and the statistical model, we decided to add a persistence model as a threshold for useful model prediction.Persistence works well under stationary conditions but, because it cannot handle changes in weather and emissions, fails at the beginning and at the end of the episodes (Zhang et al., 2010a).Regarding the extremes, models of all types are known to have problems to accurately predict them, while persistence predicts extremes with a 1-day (2-day) time lag. Figure 8 compares discrete statistics site by site for 1-day and 2-day model predictions of 1 h O 3 daily maxima.Similarly, Table 4 shows these statistics for all data with different thresholds applied (only for WRF-Chem and persistence, because a statistical forecast is not available for all stations) and separately for different types of stations (subalpine urban, rural, Mediterranean urban) with an available statistical forecast.Looking at the ME, persistence gives results close to zero as long as no threshold is applied, while with a threshold of 140 µg m −3 (Table 4) the ME of 1-day persistence (−10.2 µg m −3 ) is very close to the WRF-Chem model for 1-day predictions (−11.2 µg m −3 ), and for 2-day predictions WRF-Chem (−13.8 µg m −3 ) already beats persistence (−19.4 µg m −3 ).Site-by-site comparison (Fig. 8) shows that for most stations the statistical forecast has a lower ME than WRF-Chem forecast, but there are also stations (ISK, HRA, LJ, KRV) with a lower or equal ME to WRF-Chem than for the statistical model, indicating the possible occurrence The Taylor diagrams in Fig. 9 show CORR together with the centred root-mean-square difference (RMSD) between model forecasts and observations, and the amplitude of their variations (standard deviation).The ideal model would have a correlation coefficient of 1 and a standard deviation equal to the observations, which means that it would be co-located with the black dot on the diagram.WRF-Chem gives higher CORR and lower RMSD for all types of stations, while the standard deviation of WRF-Chem O 3 daily maxima predictions is underestimated and lower than for other model fore-casts.The latter shows that the variability in WRF-Chem model predictions is not as large as that in observed values.The MNBE in Fig. 8 has very similar results to ME.For all forecasts except WRF-Chem for the TRB site (with MNBE of 16 %), which is located in a narrow valley that is not resolved in the current model resolution, MNBE is below the ±10-15 %, which is the US EPA (1991) recommended threshold for the models used for regulatory applications.For MNGE the US EPA recommendation below 30-35 % for O 3 applications is met by all forecasts, even in the case of the 2-day persistence model.With the exception of the MS and KOP sites, the MNGE is lower for WRF-Chem than for the statistical forecast, while for the KOP and KRV sites 1-day persistence gives best results followed by the statistical forecast or WRF-Chem.The results are very similar for the IOA statistic with the range of 0-1, score 1 indicating perfect model agreement with the observations.We can conclude that for most stations the WRF-Chem predictions are in line or even outperform the statistical model.With the exception of the stations with high bias due to very complex local topography (TRB) or unresolved coastal processes (KOP), the WRF-Chem forecasts are more accurate than persistence.Here we recall that a high negative bias in the WRF-Chem forecast for the alpine KRV site, due to its low altitude  in model topography, was compensated by taking predictions from the fifth model level.
The key requirement for a forecast system is to be able to predict O 3 concentration levels greater than a given threshold.Thus, in addition to the discrete evaluation just presented, the contingency-table-based statistics are also an important metric of forecast performance.Table 5 summarizes the categorical evaluation results for three different thresholds (120, 140, 160 µg m −3 ) of elevated O 3 levels, which pose a greater risk to human health.Namely, it is important to take into account that results of categorical statistics are very sensitive to the threshold chosen, as well as to the overall pollution levels during the analysed months.Equitable threat score (ETS) measures the fraction of observed and/or correctly predicted events, adjusted for the frequency of hits that would be expected to occur by random chance.Although this score takes into account the climatology, it is not truly equitable.It ranges from −1/3 to 1, where the minimum value depends on climatology (it is near 0 for rare events).Looking at Table 5, ETS shows equal skill for WRF-Chem and the statistical forecast, higher than persistence for the 120 µg m −3 threshold (1-day and 2-day forecast).ETS decreases with increasing the threshold for both WRF-Chem and the statistical forecast, indicating the challenge that both models have to accurately predict the extremes.In the case of a 140 µg m −3 threshold, WRF-Chem has the same ETS as persistence, higher than the statistical model for the 1-day forecast, while for the 2-day forecast WRF-Chem outperforms the statistical model, followed by persistence.In the case of a 160 µg m −3 threshold, persistence has the highest ETS for a 1-day forecast, followed by the statistical model and WRF-Chem, while in the case of 2-day predictions the statistical model shows the highest skill and WRF-Chem the lowest.Another measure, the critical success index (CSI), is similar to ETS except that it does not take into account the climatology of the events and thus gives poorer scores for rarer events.It measures the percentage of cases that are correctly forecasted out of those either forecasted or observed, and ranges from 0 to 1 (1 indicating the perfect forecast).Similar to ETS, CSI gives higher scores for persistence in the case of the 1-day forecast for the two higher thresholds, while on the second day WRF-Chem or the statistical model already perform better.Bias (B) determines whether the same fraction of events are both forecasted and observed.A tendency of the statistical model and of WRF-Chem to under- predict O 3 threshold exceedances shows as a B below 1 for these two models.The false alarm ratio (FAR) that measures the percentage of forecasted high O 3 events that turn out to be false alarms gives the highest skill to WRF-Chem, followed by the statistical model and persistence.The probability of detection (POD) is a measure of how often a high threshold occurrence is actually predicted to occur and is relatively low for WRF-Chem with respect to other models.
It must be noted that in categorical evaluations systematic biases like those obtained with WRF-Chem for some stations (e.g.KOP) significantly impact the model performance.For example, if the KOP station was excluded from categorical evaluations, WRF-Chem performance improved by means of all statistical measures (results not shown).If correction techniques, based on observations and the previous day's forecast (e.g.McKeen et al., 2005McKeen et al., , 2007;;Kang et al., 2008), were to be applied to correct the systematic biases, WRF-Chem forecasts might outperform the other two models even in categorical evaluations.

Summary and conclusion
A high resolution modelling system based on an online coupled WRF-Chem has been applied for numerical weather prediction and for forecasting air quality in Slovenia.In the study, the evaluation of the forecasting system has been conducted for 3 summer months.Since the selection of physical or chemical parameterization schemes influences and possibly changes the outcomes, we decided to apply schemes which are well documented and have previously been used in other applications (e.g.AQMEII).Both 1-day and 2-day predictions of meteorological and air quality variables have been analysed.The focus has been on O 3 as it is the only pollutant with recorded exceedances of the legislation's limit values during the three heat wave events in June, July and August 2013.WRF-Chem daily O 3 maximum predictions have also been compared to the operational statistical model and persistence forecasts to answer the question of how skillful are the WRF-Chem model predictions in comparison to these two other models.
The 1-day and 2-day WRF-Chem PM 10 forecasts showed a very low bias.Exceptions were two events with significantly overpredicted PM 10 levels due to prefrontal advection of polluted air masses from neighbouring regions.Knowing that the majority of the current chemical transport models show large negative biases in simulated PM 10 concentrations, these results present a good starting point for studying the importance of aerosol feedbacks with realistic model aerosol concentrations, left for future research.
The overall agreement of the WRF-Chem NO 2 forecast with measurements was good for rural sites, while urban sites experienced model underpredictions which were highest for small towns.One important reason is that many small towns are located in basins or very narrow valleys, usually poorly presented in model topography.Smoothed local emissions result in model underestimations of NO 2 concentrations for these towns.This in combination with insufficiently reproduced calm meteorological conditions in basins and valleys during the nighttime hours explains also WRF-Chem overpredictions of nighttime O 3 concentrations.
Evaluations of predicted 1 and 8 h daily O 3 maxima, which are in the case of this pollutant of the highest interest, show good WRF-Chem model performance.Nevertheless, there are also stations which experience high over-or underpredictions of O 3 daily maximum levels.For Mediterranean sites the underpredictions of the daily maxima are most probably due to inaccurate representation of costal processes in the model, which are crucial for the PBL height evolution and accumulation of pollution in the near-ground air layers.For some subalpine stations the reason for the higher bias in O 3 daily maximum predictions is their location either at elevated mountainous or coastal regions or in narrow valleys which cannot be appropriately resolved in the current model resolution -which impacts how accurately a model simulates the local processes responsible for the level of local pollution.Comparisons of WRF-Chem O 3 daily maximum forecasts with persistence and with statistical model predic-tions show that with respect to some statistical parameters the deterministic WRF-Chem forecast can outperform the other two for both 1-and 2-day predictions.For example, the correlation coefficient shows a higher skill for the WRF-Chem model, confirming the importance of complex processes as taken into account in an online coupled Eulerian model.Further improvement of WRF-Chem forecasting skill could be obtained by applying one of the bias-correction methods in order to account for unresolved topographical and coastal effects, as well as emission patterns.Chemical data assimilation, although currently still in its infancy for online coupled meteorology-chemistry models (Bocquet et al., 2015), could in future also be used as an efficient method for improving prediction of chemical concentration fields.For the WRF-Chem model, a technical note on the implementation of the aerosol assimilation and a guide for prospective users has been recently published by Pagowski et al. (2014).

Figure 2 .
Figure 2. Example of ozone analysis for the Nova Gorica (NG) monitoring site (average daily maximum ± standard deviation) for seven clusters of similar trajectories, as used in the statistical ozone daily maximum forecast for the NG station.

Figure 4 .
Figure 4.The 3-month average 1-day predictions of (a) hourly O 3 , (b) O 3 daily maximum, (c) hourly NO 2 , and (d) daily PM 10 concentrations for the first model layer, overlaid with measurements.

Figure 5 .
Figure 5.Time evolution of hourly ozone concentrations for 1-day (F 1 day) and 2-day (F 2 day) WRF-Chem predictions and measurements for some stations during the 3-month period.

Figure 6 .
Figure 6.The same as Fig. 5 but for NO 2 at the LJ, ZAG and MOH stations.

Figure 7 .
Figure 7.The same as Fig. 5 but for daily PM 10 concentrations at the MS and ZAD stations.

Figure 8 .
Figure8.Site-by-site comparison of discrete statistics for 1-day and 2-day WRF-Chem (F 1 day, F 2 day), statistical (SF 1 day, SF 2 day) and persistence model (P 1 day, P 2 day) predictions of ozone daily maxima during the 3 summer months analysed.
2 No Figure 3. Locations of monitoring stations used in evaluation of air quality variables (AQ stations) and meteorological variables (MET stations).Green dots indicate measuring sites with available ozone daily maximum statistical forecast (SF).For the meaning of the AQ site abbreviations see Table 1.

Table 4 .
Discrete evaluation of 1 h daily maximum ozone predictions (PER: persistence).

Table 5 .
Categorical evaluation of 1 h daily maximum ozone predictions for different thresholds, calculated for eight monitoring sites with available statistical forecast.