Performance evaluation of approaches to predict sub-hourly peak odour concentrations

Atmospheric dispersion models are widely used to predict the potential for annoyance of odour emissions. It is well established that the predictive accuracy of these models is higher for the mean concentration. In reality, the concentration of a substance advected by a turbulent boundary layer flow often shows large fluctuations around its mean value. In the case of odorous gases, the mean concentration field alone may thus hide information on the real impact on receptors, since odour perception is a nonlinear process. Short-term odour concentrations have to be accounted for, thereby requiring additional knowledge on concentration fluctuations. Here, the main goal is to evaluate three selected approaches to predict sub-hourly odour peak concentrations: the constant factor of 4, an empirically based peak-to-mean procedure and the concentration-variance computation. A full-scale field dispersion experiment (Uttenweiler), for which concentration fluctuation measurements have been conducted, was selected for the investigation. In this respect, additional objectives of this work are to investigate the mean flow conditions encountered in the different dispersion trials, to examine the measured fast-response tracer concentration time series, and to characterise the performance of a Lagrangian particle dispersion model (LASAT) for the mean concentration. Several statistical indicators are used to compare predictions against observations. First, there is evidence, gained from evaluating the ultrasonic anemometer data, that not only neutral but also stable atmospheric conditions occurred during the trials. Second, the dispersion model showed an overall satisfactory performance under the given range of study conditions. A general bias towards underestimation was detected, with the dispersion model performing better at distances further away from the emission source.
Third, for the concentration fluctuations, results are presented and discussed in terms of the fluctuation intensity, intermittency, peak-to-mean factors and probability density functions. Finally, regarding the three evaluated approaches, it is found that as the inherent complexity of the approaches grows, more accurate predictions of peak-to-mean factors Ψ90 are obtained. Namely, the concentration-variance computation performed best, with a bias towards overestimating Ψ90. While the constant factor of 4 overestimated all Ψ90 observations, the empirically based peak-to-mean procedure underestimated Ψ90 to a great extent. The results also reveal the advantages and shortcomings of each evaluated approach. The findings of this work have potential implications for future research and policymaking.


Introduction
Industrial and livestock activities are the principal economic sectors releasing odorous compounds to the atmosphere. Malodour, as an ambient stressor (Campbell, 1983), can cause a broad range of adverse effects in exposed communities (Schiffman et al., 2005; Sucker et al., 2009). Annoyance is one of the most important adverse effects of odour exposure (Cantuaria et al., 2017; Shusterman, 1992). Complaints usually arise when annoyance has become severe and people no longer tolerate the situation (VDI 3883 Part 4, 2017). A significant fraction of the complaints arriving at environmental agencies' hotlines concerns odours (Henshaw et al., 2006), thereby requiring governments' efforts to tackle the problem. Such complaints are also one of the main barriers faced by site operators seeking to develop their business further.
Turbulence is effective in mixing and advecting substances. If the substance has a negligible effect on the flow, it is called a passive scalar. The scalar concentration can show complex dynamic behaviour with resemblances to the turbulent velocity field, but with decoupled statistics (Shraiman and Siggia, 2000). A practical way to characterise the turbulent mixing phenomenon is through local measurements of the advected field. The scalar measured with respect to time fluctuates around its mean value at the observation point as fluid parcels sweep by (Shraiman and Siggia, 2000). If a process depends linearly on the concentration, the mean concentration C (the first statistical moment) can describe the ensemble-average nature of that process satisfactorily. However, nonlinearity is inevitable in several cases of importance to society, highlighting the practical relevance of obtaining information on the second and higher statistical moments of the probability density function (PDF) of concentration fluctuations (Marro et al., 2018). Examples of such cases include hazards posed by discharges of flammable and toxic gases, atmospheric chemistry and, the focus of the present work, exposure to odorous gases. The human sense of smell responds within seconds to a stimulus, characterising a non-linear, near-instantaneous perception. In odour impact assessments, peak concentrations are thus very important.
Atmospheric dispersion modelling is a fundamental tool in air quality management. The use of dispersion models is an integral part of several odour regulations and guidelines worldwide, being nowadays the preferred technique for compliance demonstrations (Brancher et al., 2017; Capelli et al., 2013). Most dispersion models are grounded on the Reynolds-averaging concept; thus, these models have the potential to predict C with sufficient accuracy. To define what can be considered acceptable protection against community annoyance, odour concentration thresholds that shall not be exceeded over a period of time have been set (Brancher et al., 2017). The allowed exceedances are usually assessed on the basis of hourly predictions. The ratio between a short-term mean value and the long-term mean value defines the peak-to-mean factor Ψ (Schauberger et al., 2012). In the field of environmental odours, such peak-to-mean factors are widely used to describe concentration fluctuations. The peak value, taken to represent the short-term mean value, can be defined in manifold ways. Germany pioneered a regulatory procedure based on the frequency of 1-h periods (so-called odour hours) for which the peak value is defined by the 90th percentile (GOAA, 2008).
In the European Union, a common basis for the Member States to assess odour in ambient air has been achieved through the standardisation of a technique called field inspections (EN 16841-1, 2016). Empirical measurements using the grid method allow measuring the frequency of odour hours. In the grid method, one odour hour is defined if a recognisable odour (unambiguously identified) is smelled in at least 10% of the measurement interval. For practical reasons, a sample period of 10 min (60 breaths) is considered to represent 1 h; if 6 out of the 60 breaths within the 10-min period are assessed as odorous by a panellist, this triggers an odour hour. Conversely, for modelling, one needs to determine the 90th percentile C90 of the corresponding cumulative distribution function of the time series of ambient odour concentrations. Then, C90 is normalised by the mean concentration C, so defining Ψ90; namely, Ψ90 = C90/C. Currently, in Germany, the constant factor of 4 for Ψ90 is to be used for compliance demonstrations using dispersion modelling. An odour concentration threshold of 1 ou_E m⁻³ is applied to C90, and one odour hour is counted as such if the predicted odour level exceeds this detection threshold. Hence, the exceedance probability is given by the fraction of odour hours with respect to the total number of hours of the calculation period (e.g., 8760 h in a typical year). Based on this concept, assessment conclusions from either the grid method or dispersion modelling are in general considered equivalent. Inconsistencies between modelled and observed odour-hour frequencies have nevertheless been discussed (Brancher et al., 2019b; Oettl et al., 2018).
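As a minimal numerical illustration of these definitions (the function names are ours and the concentration series is synthetic, not from the experiment), Ψ90 and the odour-hour test can be sketched as:

```python
import numpy as np

def psi90(concentrations):
    """Peak-to-mean factor: the 90th percentile of a concentration
    time series normalised by its mean."""
    c = np.asarray(concentrations, dtype=float)
    return np.percentile(c, 90) / c.mean()

def is_odour_hour(c90, threshold=1.0):
    """One odour hour is counted when the predicted C90 exceeds the
    detection threshold of 1 ou_E per cubic metre."""
    return c90 > threshold

# Synthetic ambient odour concentrations (ou_E m^-3) over one interval
series = [0.2, 0.4, 0.3, 1.5, 0.1, 2.8, 0.6, 0.2, 0.9, 0.5]
print(psi90(series), is_odour_hour(np.percentile(series, 90)))
```

A constant signal yields Ψ90 = 1 by construction; the more intermittent the signal, the larger Ψ90 becomes.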
The primary goal of this work is to evaluate three approaches to predict sub-hourly peak concentrations in the context of odour pollution. This will allow answering the following research question: can more sophisticated modelling techniques deliver better Ψ90 estimates? For the evaluation, we use observational data from the Uttenweiler field experiment (Bächlin et al., 2002). We address this objective by comparing Ψ90 predictions against observations.
The present work has additional goals. The second goal is to investigate the mean flow conditions encountered in the different dispersion trials. It is undertaken by scrutinising the available ultrasonic anemometer data. The third goal is to examine the measured fast-response tracer concentration time series. It is achieved by calculating fundamental statistics from the fast-response concentration signals. The fourth goal is to characterise the performance of a Lagrangian particle dispersion model (LASAT). It is attained by comparing model predictions against observations for the mean concentration.
The remainder of this work is organised as follows. Sec. 2 details the three evaluated approaches. Sec. 3 describes the Uttenweiler field experiment, dispersion modelling set-up and statistical methods for performance evaluation. Sec. 4 presents the results and discussion. Finally, Sec. 5 concludes the work.

Approach 1
The first approach to be evaluated (hereafter Approach 1) is the one currently enforced in Germany, for which Ψ90 = 4 has been set (GOAA, 2008). The peak concentration (defined by the 90th percentile) is estimated by multiplying the long-term mean value (typically hourly) by that constant factor.

Approach 2
An empirically based approach (hereafter Approach 2) for estimating variable peak-to-mean factors has been established (Piringer et al., 2015 and references therein). It takes into account two aspects known to influence peak-to-mean factors: atmospheric stability and distance from a single-point source. The conceptualisation of this approach starts from a widely used power-law relationship, so that the peak-to-mean factor Ψ0 = Cp/C is given by

Ψ0 = (t_m/t_p)^n    (1)

with the mean concentration C calculated for an integration time t_m and the peak concentration Cp for an integration time t_p; n is a non-dimensional empirical exponent that depends on atmospheric stability. The values t_m = 600 s and t_p = 10 s were set based on the conditions of the Uttenweiler field experiment. Accordingly, Table 1 shows the maximum peak-to-mean factors Ψ0, which apply near the source, for different values of the exponent n (Beychock, 1994). Besides, the Ψ0 values were related to the Klug/Manier (K/M) stability classification scheme. The K/M scheme is now the default atmospheric stability classification method in Germany (VDI 3782 Part 6, 2017). The K/M stability classes are numbered from I to V (indices 1-6) and are roughly comparable to the Pasquill/Gifford classes F-A. An exponential attenuation function (Mylne, 1992; Mylne and Mason, 1991) adjusts the maximum peak-to-mean factors Ψ0 with travel time, herein yielding Ψ90:

Ψ90 = 1 + (Ψ0 − 1) exp(−0.7317 T/T_L)    (2)

where T = x/ū(z) is the travel time and T_L stands for the Lagrangian time scale. T is estimated from the radial distance x between source and receptor and the mean wind speed ū(z) at a height z above ground level (a.g.l.), which is taken as the anemometer height. T_L assumes the form

T_L = 2σ²/(C0 ε) = 2σ²κ z_r/(C0 u*³)    (3)

where C0 is the Kolmogorov constant and σ² is the variance of the wind speed, calculated as the mean of the variances of the three wind components u, v and w.
Mylne and Mason (1991) note that such a parametrisation for T_L should not be assumed equal to the Lagrangian integral time scale (it is merely a measure of the time scale of the motions). The turbulent kinetic energy dissipation rate ε is specified by the following parametrisation (Pasquill and Smith (1983), cited by Mylne and Mason (1991)):

ε = u*³/(κ z_r)

where κ is the von Kármán constant (assumed to be 0.4), u* is the friction velocity and z_r is the receptor height a.g.l. (1 m). The parametrisation with σ_w in Eq. (3) has been used in previous studies (Piringer et al., 2015 and references therein). However, here we considered the right-hand side of Eq. (3) because u* is available.
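Taken together, the power law, the attenuation function and the time-scale parametrisation described above condense into a few lines. The sketch below is illustrative only: the exponent n would be read from Table 1 for the prevailing K/M class, the C0 value is the one assumed later for Approach 3, and all numerical inputs are made up.

```python
import math

KAPPA = 0.4   # von Kármán constant
C0 = 4.5      # Kolmogorov constant (value assumed, as in Approach 3)

def approach2_psi90(n, x, u_mean, u_star, z_r, sigma2, t_m=600.0, t_p=10.0):
    """Variable peak-to-mean factor (a sketch of Approach 2).

    n: stability-dependent empirical exponent (from Table 1)
    x: source-receptor distance (m); u_mean: mean wind speed (m/s)
    u_star: friction velocity (m/s); z_r: receptor height a.g.l. (m)
    sigma2: mean variance of the three wind components (m^2/s^2)
    """
    psi0 = (t_m / t_p) ** n                 # maximum near-source factor
    eps = u_star ** 3 / (KAPPA * z_r)       # dissipation, Pasquill & Smith form
    t_l = 2.0 * sigma2 / (C0 * eps)         # Lagrangian time scale
    t_travel = x / u_mean                   # travel time T
    # exponential attenuation of psi0 towards 1 with travel time
    return 1.0 + (psi0 - 1.0) * math.exp(-0.7317 * t_travel / t_l)

# Illustrative neutral case: receptor 150 m downwind
print(approach2_psi90(n=0.35, x=150.0, u_mean=4.0, u_star=0.3,
                      z_r=1.0, sigma2=0.5))
```

At the source the function returns Ψ0 itself, and it decays monotonically towards 1 with distance, as intended by the attenuation formulation.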

Approach 3
A new approach (hereafter Approach 3) has been proposed recently for providing variable Ψ90 estimates (Oettl and Ferrero, 2017). It relies on a simplified advection-diffusion equation for the concentration variance, neglecting the transport and diffusion terms. Ψ90 is calculated by assuming a Weibull PDF that is defined by two parameters: the concentration fluctuation intensity i_c = σ_c/C, with σ_c being the standard deviation of concentration fluctuations, and the mean concentration C. The simplified equation for computing the concentration variance reads

dc'²/dt = 2 Σ_i σ_i² T_Li (∂C/∂x_i)² − c'²/t_d    (4)

where c'² is the concentration variance, σ_i² are the wind velocity variances and T_Li are the Lagrangian time scales. The x_i-direction (with i = 1, 2, 3) represents the along-wind x, crosswind y and vertical z directions, respectively; u, v and w denote the velocity components in the x-, y- and z-directions, respectively (Hsieh et al., 2007). In general, both σ_i² and T_Li depend on the height a.g.l. and on atmospheric stability. T_Li was parametrised as T_Li = 2σ_i²/(C0 ε), where C0 is the Kolmogorov constant. The value C0 = 4.5 was assumed (Nironi et al., 2015), and ε = σ_w³/z_r was used following Ferrero and Oettl (2019). The ordinary differential equation (ODE) of Eq. (4) and its components can be interpreted in multiple ways with respect to explicit and implicit time dependencies. In general, and as also proposed by Ferrero et al. (2017), the dissipation time scale t_d can be a time-dependent function. It is a straightforward exercise to examine the variations of the analytic solutions of the ODE for different time dependencies of its constituents (the difficulty is to arrive at a proper parametrisation for t_d). However, in order to follow the originally published approach, t_d was approximated as a constant given by t_d = 2T_Lw, where T_Lw is the Lagrangian time scale for the vertical component.
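With constant coefficients, Eq. (4) reduces to a linear ODE of the generic form dc'²/dt = P − c'²/t_d, where P here denotes the lumped gradient-production term. Its analytic solution is elementary; the sketch below assumes that generic form and is not the published implementation:

```python
import math

def variance_solution(P, t_d, t, c2_0=0.0):
    """Closed-form solution of dc'2/dt = P - c'2/t_d for constant
    production P and dissipation time scale t_d: an exponential
    relaxation towards the steady state P * t_d."""
    return P * t_d + (c2_0 - P * t_d) * math.exp(-t / t_d)
```

The steady state P·t_d illustrates why the dissipation time scale (here approximated as the constant t_d = 2T_Lw) directly controls the magnitude of the computed concentration variance.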
The solution of Eq. (4) is used to estimate the fluctuation intensity i_c. Finally, Ψ90 is calculated via the following procedure: the shape parameter k is obtained from the fluctuation intensity through i_c² = Γ(1 + 2/k)/Γ²(1 + 1/k) − 1, the scale parameter through λ = C/Γ(1 + 1/k), and the 90th percentile of the Weibull distribution follows as C90 = λ(ln 10)^(1/k), so that Ψ90 = (ln 10)^(1/k)/Γ(1 + 1/k). Here, k, λ > 0 are the shape and scale parameters of the Weibull PDF f(c), respectively, c designates the instantaneous concentration and Γ(⋅) denotes the Gamma function. The two-parameter Weibull PDF for computing Ψ90 was originally parametrised by taking it to the power of 1.5. In doing so, a better fit to the large event values of observed Ψ90 was stated to be achieved (Oettl and Ferrero, 2017). However, in a subsequent study, instead of continuing to take the Weibull PDF to the power of 1.5, it was multiplied by the factor 1.5 (Oettl et al., 2018). Moreover, it has to be noted that in Oettl and Ferrero (2017) PDFs were not fitted to time series of concentration data. Instead, curves were fitted to a dependent variable (Ψ90) against an explanatory variable (i_c) such that the curve has the same shape as a particular PDF. It should also be mentioned that the assumptions of Approach 3, along with the chosen upper limit for i_c of 3.77, restrict the solution space of Ψ90 to a value range of ~1.5-4.0. It has been stated that this minimum of 1.5 was considered to safeguard conservativeness in conclusions of real case studies under regulatory compliance using the GRAL dispersion model (Oettl et al., 2018).
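The Weibull relations can be inverted numerically in a few lines. The sketch below uses the unmodified two-parameter Weibull only (i.e., without the power-of-1.5 or factor-1.5 modifications discussed above), solving for the shape parameter by bisection:

```python
import math

def ic_from_k(k):
    """Fluctuation intensity of a two-parameter Weibull PDF with shape k
    (monotonically decreasing in k)."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    return math.sqrt(g2 / g1 ** 2 - 1.0)

def psi90_weibull(i_c, lo=0.05, hi=50.0):
    """Psi90 from the fluctuation intensity via the unmodified Weibull
    PDF: solve the shape k from i_c by bisection, then
    Psi90 = (ln 10)**(1/k) / Gamma(1 + 1/k)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ic_from_k(mid) > i_c:   # mid is too small: intensity too high
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    return math.log(10.0) ** (1.0 / k) / math.gamma(1.0 + 1.0 / k)
```

For i_c = 1 (the exponential limit, k = 1) this returns ln 10 ≈ 2.30, and Ψ90 increases monotonically with i_c, consistent with the behaviour described above.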
Since this procedure primarily depends on a given mean concentration field, it could, in principle, be applied to any dispersion model. In this work, we realised Approach 3 as a post-processing routine of the LASAT dispersion model.

Field experiment
The dispersion trials of the Uttenweiler field experiment were undertaken on a pig farm (48.136° N, 9.652° E, ~160 m above sea level, Uttenweiler, Germany) during two sets of events in December 2000 and October 2001. The site was located mostly within an area of flat farmland. There was a small wooded area (circa 100 m × 100 m) ~100 m to the north of the site. The pig farm consisted of two conjugated buildings of 7.7 m and 10.6 m height. For the dispersion trials, the smaller building had a single-point source with a height of 8.5 m connected to internal ventilation systems. The stack consisted of three distinct shafts with rectangular outlets of 1.2 m² each, totalling 3.6 m². Continuous releases of odour and a passive tracer (SF6) occurred from the stack.

Table 1
Maximum peak-to-mean factors Ψ0 as from Eq. (1) for different values of the exponent n and assuming t_p = 10 s and t_m = 600 s. K/M: Klug/Manier stability classification scheme from V (very unstable) to I (very stable).

Odour intensity levels (not considered herein) and SF6 bag measurements were carried out at 11 or 12 receptors downwind of the pig farm. The sampling points were distributed either along two traverses (Trials B-H, M-O) or a single traverse (Trials I-L) perpendicular to the wind direction. All trials featured a 10-min averaging time.
Moreover, at two receptors, fast-response concentration measurements of SF 6 were undertaken with a sampling rate of 0.1 Hz. In this case, the SF 6 concentration in the air samples was analysed off-line by gas chromatography. Depending on the trial, these two receptors were placed at ~144 m and 279 m on average from the stack.
Additional information on this dataset can be found in Bächlin et al. (2002).

Meteorology
Upwind of the pig farm (approximately in the south-southwest sector), at a distance of ~150 m from the stack, a meteorological mast was deployed. At the reference height of 10 m, wind direction and speed (5-min averages) were recorded with a cup anemometer. During the trials on 31/Oct/01 (Trials I-O), the device on the measuring mast went down due to a technical problem. As an alternative, wind measurements were carried out using a 2 m high auxiliary mast located nearby (at a distance of ~151 m from the stack). During all trials, meteorological measurements were also conducted with an ultrasonic anemometer (hereafter sonic). This instrument was mounted downwind at z = 3.5 m a.g.l. near the first traverse at which fast-response concentration measurements were undertaken. It was operated with a sampling frequency of 10 Hz, with average values made available over 10 s. As for the fast-response concentration measurements, 60 observations per 10-min measurement interval were available for each trial.
Consequently, the average meteorological observations stem from three different instruments mounted at different heights and positions. To work with meteorological data from a single instrument, height and area, we decided to consider the sonic records only. On top of that, this instrument offers the advantage of providing direct measurements of turbulent quantities. Notably, the Obukhov length (L, in units of m) can be derived from such data in order to characterise the stability of the surface layer.
For neutral stratification, L can by definition approach positive or negative infinity. Values of L > 0 indicate stable stratification, while values of L < 0 characterise unstable stratification. The physical interpretation of |L| is the height at which mechanical production and buoyancy production of turbulence become equal: above this height buoyancy prevails, and below it shear dominates. Atmospheric stability is also commonly expressed in terms of the dimensionless Obukhov stability parameter ζ = (z − z_d)/L (Arriga et al., 2017; Kent et al., 2018), where z_d is the displacement height. Likewise, when ζ → 0, the boundary layer is classified as neutral. Table 2 summarises the data of each trial which are relevant to this work. Fig. 1 illustrates the Uttenweiler field experiment with a schematic diagram.
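These sign conventions translate directly into a classification rule. In the sketch below, the near-neutral threshold on |1/L| is purely illustrative (it is not a regulatory class boundary), and working with 1/L avoids the singularity at neutral stratification:

```python
def stability_class(L, inv_L_neutral=0.005):
    """Qualitative stability from the Obukhov length L (m), a simplified
    sketch: near-neutral when |1/L| is small; the threshold value here
    is illustrative only."""
    if L == 0:
        raise ValueError("L = 0 is undefined")
    inv_L = 1.0 / L
    if abs(inv_L) <= inv_L_neutral:
        return "neutral"
    return "stable" if inv_L > 0 else "unstable"

def zeta(z, z_d, L):
    """Dimensionless Obukhov stability parameter (z - z_d) / L."""
    return (z - z_d) / L
```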

Ultrasonic anemometer data
We examined the available sonic records before their use. Primarily, the following meteorological parameters were considered in this examination: wind direction and speed, ambient temperature, standard deviations of the three wind components (σ_u, σ_v, σ_w), u* and L. The temperature and wind data were of high quality, not requiring data revision. However, negative values for u* occurred, resulting in missing values of L. Some extreme (unrealistic) values of L were detected as well.

Table 2
Data summary of the 14 trials performed in the Uttenweiler field experiment. Receptor 1 and Receptor 2 represent the two points at which fast-response concentration measurements were undertaken. The Cartesian coordinates (X, Y) are relative to the origin of the coordinate system, that is, the stack.

When a Lagrangian particle model such as LASAT encounters invalid meteorological data, the current calculation is interrupted at this point and restarted once valid data are available again. As a consequence, this can skew the results immediately after the restart, since contributions from the preceding averaging interval will be missing (Janicke Consulting, 2019). Hence, we decided to revise the available sonic records. We applied a tool called Usat (version 1.1.18) for this revision, considering the following steps (Janicke Consulting, 2019): i. Measurement gaps were filled applying an iterative procedure based on persistence. The maximum number of filling steps was set to two, since this value was enough to close all gaps; ii. A negative value of u* causes a gap in L. In such a case, u* can be calculated from σ_w and then used to recalculate L if the covariance of temperature and vertical velocity is given; iii. A limit of 1 as absolute value for the reciprocal of L (i.e., 1/L) was set to filter out extreme values.
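The three revision steps can be mimicked in a few lines. This is an illustrative re-implementation, not the Usat tool itself, and the σ_w-to-u* ratio used in step ii is an assumed neutral-limit value, not one given in the source:

```python
def fill_gaps_persistence(values, max_steps=2):
    """Step i (sketch): fill missing records (None) by carrying the last
    valid value forward, at most max_steps consecutive times."""
    out = list(values)
    carried = 0
    for i in range(1, len(out)):
        if out[i] is None and out[i - 1] is not None and carried < max_steps:
            out[i] = out[i - 1]
            carried += 1
        elif out[i] is not None:
            carried = 0
    return out

def ustar_from_sigma_w(sigma_w, ratio=1.3):
    """Step ii (sketch): recover u* from sigma_w; the neutral-limit
    ratio sigma_w/u* ~ 1.3 is an assumption, not taken from the source."""
    return sigma_w / ratio

def filter_inverse_L(L, limit=1.0):
    """Step iii: reject records with |1/L| above the limit."""
    return L if abs(1.0 / L) <= limit else None
```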

Surface roughness length z_0
The logarithmic wind law establishes that for a height z the mean wind speed ū(z) can be determined by (Kent et al. (2018) and references therein):

ū(z) = (u*/κ) ln((z − z_d)/z_0)    (9)

From this equation follows that the aerodynamic roughness length z_0 can be derived if the other variables are known. We calculated the mean z_0 in this manner from the measured wind and turbulence observations. The tool Usat (version 1.1.18) was again used for this procedure (Janicke Consulting, 2019). First, the average of w_s/u* over all situations was computed, where w_s is in this case used to denote the measured 10-s wind speed values. Then, z_0 was calculated by

z_0 = z/(exp(κ ⟨w_s/u*⟩) + δ)

where δ = z_d/z_0, which is generally of the order of 6 (Janicke Consulting, 2019). The logarithmic wind-speed profile is strictly valid in a neutrally stratified atmosphere (Kaimal and Finnigan, 1994). Although stability conditions can be identified more precisely using the Obukhov length L, for simplicity only observations with w_s ≥ 2 m s⁻¹ were retained: the higher the wind speed, the more likely the atmospheric stability is to approach neutral conditions. The calculation was initially completed for 10° sectors to distinguish any dependency of z_0 on the wind direction. Lastly, we determined z_0 as a sector average (Table 3). We calculated z_0 both from the complete sonic dataset and from the revised dataset. These data were also related to each of the three days of trials. Complete means that all records were considered, since meteorological measurements were also performed and made available for longer periods than those of the trials. This allowed including more data in the calculation once we put a constraint on w_s.
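The described procedure amounts to two lines of arithmetic. The sketch below assumes the rearrangement z_0 = z/(exp(κ⟨w_s/u*⟩) + δ), which follows from the log law once z_d is written as δ·z_0:

```python
import math

KAPPA = 0.4  # von Kármán constant

def mean_z0(w_s, u_star, z=3.5, delta=6.0):
    """Roughness length from the log law (a sketch of the described
    procedure): average q = <w_s/u*> over all retained situations,
    then z0 = z / (exp(kappa*q) + delta), with delta = z_d/z0 (~6)."""
    q = sum(w / u for w, u in zip(w_s, u_star)) / len(w_s)
    return z / (math.exp(KAPPA * q) + delta)
```

The inversion is exact for synthetic log-law data: generating w_s from a known z_0 and feeding it back recovers that z_0.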
The mean values of z_0 remained mostly unaltered after the revision of the sonic records. This indicates that the adopted procedure was satisfactory in preserving the original dataset information while filling the gaps. The mean values of z_0 adopted for the dispersion calculations followed the CORINE land-use class values given in a German guideline (VDI 3783 Part 8, 2017). Moreover, we carried out sensitivity tests for δ considering values between 1 and 12; a negligible effect on z_0 was observed. Yee and Biltoft (2004) stated that the diagnosed value of z_0 in their case study (an idealised urban area given by regularly arrayed obstacles, viz. the Mock Urban Setting Test (MUST) field experiment) seemed to be relatively insensitive to the exact value of z_d.

Dispersion modelling
We used LASAT version 3.4.5 for computing the mean concentration fields (10-min averages). LASAT is a Lagrangian particle model that meets the requirements of the German guideline VDI 3945 Part 3 (2000). The model formulation describes the temporal evolution of a representative sample of fluid particles in a turbulent field employing a continuous stochastic Markov process (Janicke Consulting, 2019). For the modelling runs, we selected the stack as the reference location of the coordinate system. We then defined a domain of 1 km × 1 km centred on the stack. The terrain was considered flat. Concentrations were computed releasing 500 particles per second on a three-dimensional grid with a horizontal mesh width of 5 m. Preliminary tests with 1000 particles per second did not improve the statistical error sufficiently to justify the higher computational time. The standard boundary layer model (version 2.1) of LASAT was used.
In the Uttenweiler field experiment, the two buildings, with irregular aspect ratios, were repeatedly at an oblique angle to the incident winds. This setting creates a complex flow dynamic in the vicinity of those buildings that is complicated to reproduce. In addition, the stack height was ~1.1 times the height of the building on which it was positioned. The releases took place through a rectangular stack rather than a circular one. This source configuration would have to be converted because the dispersion model requires circular-shaped stack parameters as input. But when using LASAT to model the release of a point-like source without considering plume rise, such a conversion is avoided (the only required source input parameter, besides the emission rate, is the stack height). Therefore, following the strategy employed in the development of the AUSTAL2000 model (Janicke and Janicke, 2007), we realised the modelling runs without considering buildings or plume rise. This simpler strategy for dealing with the case in question proved effective for the present purposes, as will be shown in Sec. 4.3. It is also important to note that the closest traverse of receptors was located on average at ~19 times the height of the building containing the stack.
The modelling runs were conducted using an Intel® Xeon® Processor E5-2680 v2 with 2.80 GHz, 32 GB RAM and 8 Cores.

Performance evaluation
We conducted the performance evaluation by selecting various statistical indicators, comprising qualitative and quantitative viewpoints in the analyses.
We selected as performance measures the fraction of predictions within a factor of two (FAC2), mean bias (MB), normalised mean bias (NMB), mean absolute error (MAE), fractional bias (FB), root mean squared error (RMSE), normalised mean squared error (NMSE), Pearson correlation coefficient (r) and index of agreement (IOA). For brevity, the reader is referred to preceding studies (Bennett et al., 2013; Carslaw and Ropkins, 2012; Chang and Hanna, 2004; Hanna and Chang, 2012; Jackson et al., 2019; Willmott et al., 2012) for comprehensive definitions and formulas of these metrics. We did not select the geometric mean and the geometric variance given the nature of the concentration data of the trials.

Table 3
Values of mean roughness length z_0 (m).

Bias is considered in some error metrics, and it is given by the sign of the resultant value. However, to avoid negative and positive errors cancelling each other out, some metrics deliberately eliminate bias effects via absolute or squared error values. A drawback is that such error metrics do not indicate systematic over- or underprediction because the result will always be positive. There is no such thing as a universal performance measure tailored to all situations; a balanced evaluation is often more useful to explore different aspects of model performance (Bennett et al., 2013; Carslaw and Ropkins, 2012; Chang and Hanna, 2004; Hanna and Chang, 2012; Jackson et al., 2019; Willmott et al., 2012). Also, we devised a composite diagram known as the Taylor diagram (Taylor, 2001). The Taylor diagram depicts a concise visual interpretation of model performance, showing how the centred RMSE, r and the magnitude of the data spread (represented by the standard deviation) vary concurrently. The RMSE is centred because the mean values of the observations and predictions are subtracted first (Carslaw and Ropkins, 2012).
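For orientation, several of the listed measures can be sketched as follows. The formulas are the common forms found in the cited evaluation literature (e.g., Chang and Hanna, 2004), written out here rather than quoted from this paper:

```python
import math

def metrics(obs, pred):
    """A sketch of several of the listed indicators, using the usual
    textbook definitions (not expressions quoted from this paper)."""
    n = len(obs)
    mo = sum(obs) / n                                    # mean of observations
    mp = sum(pred) / n                                   # mean of predictions
    mb = mp - mo                                         # mean bias
    nmb = mb / mo                                        # normalised mean bias
    mae = sum(abs(p - o) for o, p in zip(obs, pred)) / n
    fb = 2.0 * (mo - mp) / (mo + mp)                     # fractional bias
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)
    nmse = sum((p - o) ** 2 for o, p in zip(obs, pred)) / (n * mo * mp)
    fac2 = sum(1 for o, p in zip(obs, pred) if 0.5 <= p / o <= 2.0) / n
    return {"MB": mb, "NMB": nmb, "MAE": mae, "FB": fb,
            "RMSE": rmse, "NMSE": nmse, "FAC2": fac2}
```

Note the sign conventions: with FB defined this way, overprediction yields a negative FB, while MB and NMB are positive for overprediction.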
We considered classical scatter plots and quantile-quantile plots for the performance evaluation as well.
All selected statistical indicators assess the prediction skill of models utilising predicted data, observed data and the number of samples, with or without pairing in time and space. Two models can also be compared by assuming a reference situation.
A flowchart with the main points of the methodology is provided (Fig. S1 in the Supplementary Material) for a better understanding of the working principles applied herein.


Mean flow conditions
Table 4 shows the summary statistics of wind and turbulence conditions as 10-min averages. These results relate to the mean flow conditions of each dispersion trial at the sonic height (z = 3.5 m a.g.l.). They are indispensable for a better understanding of the pollutant transport and dispersion at the site during the trials. Moreover, the outcomes of the next sections depend at least partially on what is found within this section. The conversion from L to K/M classes followed Table 17, Annex 3 of TA Luft (TA Luft, 2002). This procedure returns the K/M classes as a function of the roughness length z_0 and the L values.
The mean wind speed ū(z) ranged from 2.3 m s⁻¹ (Trial D) to 6.7 m s⁻¹ (Trial E), while u* varied between 0.09 m s⁻¹ (Trial D) and 0.37 m s⁻¹ (Trial J). Thus, on average, the friction velocities were 15-24 times smaller than the 10-min averaged wind speeds. While the minimum u* of 0.09 m s⁻¹ was linked to the minimum ū(z) of 2.3 m s⁻¹ (Trial D), the maximum u* of 0.37 m s⁻¹ was associated not with the maximum ū(z) but with a speed of 5.7 m s⁻¹ (Trial J). For all trials, σ_u, σ_v and σ_w were higher than u*.
The trial days were reportedly selected so that the Uttenweiler field experiment could be carried out under comparable boundary layer conditions. To this end, requirements were placed on the meteorological situation. In terms of atmospheric stability, there was an attempt to run the trials mainly during neutral stratification (Bächlin et al., 2002). Neutral or near-neutral conditions occurred in 9 out of 14 trials based upon the L, ζ and K/M results. The Uttenweiler field experiment has been replicated in an engineering wind tunnel, in which case only neutral conditions were accomplished (Aubrun and Leitl, 2004). However, these authors mentioned that the field trials were performed under neutral and stable conditions.
The sonic data suggest that Trial O, in particular, featured a very stable stratification. Stably stratified boundary layers are known to form more often at night over land (in which case they are also called nocturnal boundary layers). However, a stable boundary layer can also develop during the day through advection of warmer air (e.g., after a warm frontal passage) over a cooler surface (Stull, 1988). Very low σ_w can occur under stable conditions, occasionally even lower than u*. This was not the case here, as turbulence, presumably of mechanical origin, was not suppressed (note the σ_u, σ_v and σ_w values of Trial O). One example of σ_w equal to u* has been reported for a field trial conducted at night-time under a very stable condition associated with low wind speeds (Finn et al., 2018).
The requirements placed on the meteorology were indeed able to prevent convectively unstable conditions. The estimates of L and ζ are positive for all cases, thus not indicating the occurrence of unstable conditions.

Table 4. Summary statistics of meteorological conditions. The ultrasonic anemometer was mounted at z = 3.5 m a.g.l., about 145 m on average downwind of the point source. L: Obukhov length; ζ: Obukhov stability parameter; K/M: Klug/Manier stability classification scheme; σ_u, σ_v, σ_w: standard deviations of the along-wind, crosswind and vertical wind velocity components, respectively; u*: surface friction velocity; ū(z): mean wind speed at height z; w_d: wind direction. Values denote averages over 10 min, except for wind direction, for which additional statistics (range and variation) are shown.

Under unstable conditions, it is reasonable to expect stronger turbulence in the vertical plane. The σ_w values are smaller than those of the other two velocity components (σ_u and σ_v) for all trials. Hence, the non-occurrence of unstable conditions is also confirmed from this angle. In short, the sonic data suggest that neutral and stable stability conditions occurred at z = 3.5 m. These results may have important implications for future studies considering this dataset. The sonic was positioned in the range of 138-149 m downwind of the stack. According to the LASAT reference book (Janicke Consulting, 2019), the influence of a building on the wind field structure typically extends out to a distance of 5 times the building height, while its influence on the turbulence quantities of the flow can extend up to 10 times the building height. As the pig farm consisted of two conjugated buildings of 7.7 m and 10.6 m in height, one can thus, as far as possible, rule out the influence of obstacles on the sonic wind and turbulence measurements.
The sonic was positioned ~90 m on average from the closest edge of the small wooded area (circa 100 m × 100 m, mean tree height of about 20 m) (Bächlin et al., 2002). The instrument was thus likely within the area of influence of the wooded area. However, when the releases took place, the flow was not oriented from that direction. Fig. 2 presents the wind roses of each trial to give a graphical picture of the wind frequency distribution within the 10-min measurement interval. The 10-s wind directions oscillated from 17° (Trial B and Trial M) up to 38° (Trial F). Such variability in wind direction may prompt departures of the ensemble-averaged crosswind concentration profiles from the Gaussian distribution. Wind speeds <0.5 m s⁻¹ were not observed. As a comparison exercise, we related the friction velocities reported in Table 4 (as calculated from u* = √|u'w'|, where u'w' is the vertical turbulent momentum flux) against the friction velocities from Eq. (9) using the revised z_0 values reported in Table 3. In general, there was good agreement between the two methods (e.g., FAC2 = 100%, MB = 0.07, MAE = 0.08, FB = −0.25, NMSE = 0.09, RMSE = 0.09, r = 0.87, n = 14), with a tendency towards a slight overestimation by Eq. (9). As noted earlier, the logarithmic law strictly applies to neutral conditions. The estimates become less reliable as the stability departs from neutral, so, in practice, wind-speed profiles will deviate gradually from logarithmic (Kaimal and Finnigan, 1994). Even when subsetting by neutral cases only, a slight overestimation by Eq. (9) was again detected (e.g., FAC2 = 100%, MB = 0.07, MAE = 0.08, FB = −0.22, NMSE = 0.08, RMSE = 0.09, r = 0.84, n = 9). Note that considering the displacement height z_d in Eq. (9) has virtually no effect on these results since its value is very low. This parameter is central to, for example, the study of canopy flows.
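The two u* estimation routes compared above can be sketched as follows. This is a minimal illustration with synthetic numbers; the von Kármán constant κ = 0.4 and the example z_0 value are assumptions for the sketch, not the revised Table 3 values.

```python
import numpy as np

KAPPA = 0.4  # von Karman constant (assumed value)

def ustar_from_flux(u, w):
    """Friction velocity from sonic covariances: u* = sqrt(|mean(u'w')|)."""
    up = u - u.mean()
    wp = w - w.mean()
    return np.sqrt(np.abs(np.mean(up * wp)))

def ustar_from_log_law(u_mean, z, z0, zd=0.0):
    """Log-law estimate, strictly valid for neutral stratification:
    u* = kappa * u(z) / ln((z - zd) / z0)."""
    return KAPPA * u_mean / np.log((z - zd) / z0)

# Synthetic 10-min records (illustrative only)
u = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([4.0, 3.0, 2.0, 1.0])
print(ustar_from_flux(u, w))
print(ustar_from_log_law(5.7, 3.5, 0.05))  # hypothetical z0 = 0.05 m
```

As in the comparison above, the two estimates can then be confronted receptor-by-receptor (FAC2, MB, etc.); the log-law route inherits the neutral-stratification assumption.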

Short-term concentrations
Fast-response concentration signals by themselves allow the extraction of meaningful information. Fundamental statistics that can be extracted from such time series are the four statistical moments, (a) mean C, (b) standard deviation σ_c, (c) skewness and (d) kurtosis, besides the (e) fluctuation intensity i_c, (f) peak concentration, (g) intermittency factor I and (h) power spectrum. Here we show results for (a), (e) and (g) (Table 5). Gaps in the concentration signals were removed for this purpose.
Recall that the fluctuation intensity i_c was defined as i_c = σ_c / C. The peak concentration was defined by the 90th percentile C90, which is then normalised by the mean concentration C, yielding Ψ90 = C90 / C. In this way, results for (b) and (f) are implicitly presented as well. The intermittency factor I is typically given as the frequency of non-zero concentrations. A threshold value below which the concentration is taken to be zero then comes in handy. This threshold has usually been grounded on the detection limit of concentration analysers (Klein and Young, 2011). Clearly, I factors may be sensitive to the threshold below which the concentration is taken to be zero (Wilson, 1995). For simplicity, we set this threshold to a zero-concentration value. Moreover, most of the concentration signals analysed herein had significantly non-zero concentrations. Consequently, in the Uttenweiler case, only a few I factors would be affected by such a threshold definition. Examples are Trials B and D, Receptors 1 and 2 (Table 5).
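The statistics (a), (e) and (g) defined above can be computed directly from a gap-free record. A minimal sketch, assuming numpy's default (linear-interpolation) percentile estimator, a choice that can matter for Ψ90:

```python
import numpy as np

def fluctuation_stats(c, threshold=0.0):
    """Mean C, fluctuation intensity i_c, peak-to-mean Psi90 and intermittency
    factor I of a concentration time series c (gaps already removed)."""
    c = np.asarray(c, dtype=float)
    c_mean = c.mean()
    return {
        "C": c_mean,
        "i_c": c.std() / c_mean,                 # i_c = sigma_c / C
        "Psi90": np.percentile(c, 90) / c_mean,  # C90 / C
        "I": np.mean(c > threshold),             # fraction of non-zero readings
    }
```

Setting `threshold` to an analyser detection limit instead of zero reproduces the sensitivity of I discussed above.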
We classified the receptors closer to the source (those located at the first traverse) as near-field, and the ones further away (at the second traverse) as far-field. There is, however, no wide agreement in the scientific literature on what constitutes near-field or far-field. For example, in a previous study, near-field was defined as full-scale distances less than ~100 m from the source (Hanna and Chang, 2015). Here, near-field denotes the receptors placed at distances of ~150 m from the source. Perhaps in future studies a more harmonised definition of near- and far-field distances can be sought in relation to the Lagrangian time scale.
Observed i_c varied from 0.23 to 2.17. In general, comparable i_c results have been previously reported in relatively flat and homogeneous terrain settings. Besides, Finn et al. (2010) discussed the magnitude of concentration fluctuations in an urban boundary layer (Joint Urban 2003 field experiment in Oklahoma City) and found that i_c results were similar to those in more homogeneous situations. They also reported little difference between conditional and unconditional i_c results for the heterogeneous urban environment under study (Finn et al., 2010).
Observed Ψ90 varied from 1.3 to 3.8. As mentioned, the Uttenweiler field experiment has been replicated in an engineering wind tunnel (Aubrun and Leitl, 2004), where Ψ90 values always less than 4 have been reported. Considering the complete database (accessible at www.mi.uni-hamburg.de/cedval), a few cases of Ψ90 higher than 4 have been stated to occur, with a maximum of 6.5 (Aubrun and Leitl, 2004). Liu et al. (2011) reported Ψ90 values, derived from 2-s averaged data, generally ranging from 2 to 5 for a wind tunnel experiment reproducing gas dispersion around a complex-shaped high-rise building. Fackrell and Robins (1982) reported a fairly constant value of about 4.5-5 for Ψ99 (the peak defined by the 99th percentile) in a wind tunnel study of plumes from an elevated and a ground-level source in a neutrally stratified turbulent boundary layer. Klein and Young (2011) reported mean Ψ98 (the peak defined by the 98th percentile) values of 4.76 and 4.37 for IOP3 and IOP6, trials part of the Joint Urban 2003 field experiment.
In terms of atmospheric stability, i_c results varied from 0.39 to 1.96 (mean value of 1.16) under neutral conditions and from 0.23 to 2.17 (mean value of 1.12) under stable conditions. Ψ90 results varied from 1.4 to 3.8 (mean value of 2.7) under neutral conditions and from 1.3 to 3.7 (mean value of 2.7) under stable conditions. Noteworthy differences in i_c and Ψ90 ranges and mean values between stability conditions were thus not distinguishable. A possible reason for this is the nature of the stable conditions of the present trials. That is, they were linked to relatively high wind speeds with turbulence in the three planes (Table 4), unlike those stable conditions that generally occur at night with low wind speeds and reduced turbulent mixing.
For a distance of ~144 m (receptors in the near-field), i_c and Ψ90 results ranged from 0.23 to 2.17 (mean value of 1.2) and from 1.3 to 3.8 (mean value of 2.8), respectively. For a distance of ~279 m (receptors in the far-field), i_c and Ψ90 results ranged from 0.39 to 1.56 (mean value of 1.05) and from 1.4 to 3.5 (mean value of 2.4), respectively. Consequently, on average, i_c and Ψ90 were higher closer to the emission source.
When examining the results of the preceding paragraph in more detail, by additionally considering the position of the two receptors in relation to the mean modelled plumes, the general tendency detected was a centreline decrease of the unconditional i_c as downwind distance increases. This is to some extent reflected in the mean results of i_c and Ψ90 reported above. However, such a pattern is not directly echoed in each individual i_c value. The most likely explanation appears to be the crosswind position of some receptors. That is, they were at the edges of the mean modelled plumes, a region where higher fluctuations take place more often than at the plume centreline. The Ψ90 results are likely to be affected accordingly. Indeed, for five trials (Trials B, E, M, N and O), the observed Ψ90 is larger in the far-field than in the near-field. In Trial F, the values for the observed Ψ90 are approximately equal (Table 5). Finn et al. (2010) note that the signature in fast-response concentration time series, taken at any receptor, is due to the distance downwind and the vertical and crosswind positions within the plume, along with other factors including atmospheric stability. The intermittency remains close to unity near the plume centreline and decreases abruptly towards the margins of the plume. Hence, larger i_c values are related to receptors closer to the edges of the plume and to the source, while smaller i_c values are related to receptors closer to the nominal centreline of the plume and further downwind. Interestingly, i_c can thus be used as a proxy for spatial reference (Finn et al., 2010). The last row of Table 5 gives the mean values for C, i_c and Ψ90 taken over all 10-s concentration records (n = 1614). Note that these are not the mean of the means, because the sample size varies between the trials. The computations over the whole dataset returned mean values of C, i_c and Ψ90 equal to 9.95 μg m⁻³, 1.26 and 2.9, respectively.
These results differ from those reported in previous studies (Ferrero and Oettl, 2019; Oettl and Ferrero, 2017), although the values are of the same order of magnitude. Such differences seem mainly due to considerations in the computation of the mean values and, to a lesser extent, to the percentile estimation method. Fig. 3 shows exemplary concentration signals for Trial E and Trial M for the two receptors at which fast-response concentration measurements were recorded. These plots demonstrate the wide range of length and time scales acting on the turbulent mixing of a pollutant emitted from a point source (Yee, 1999). The intricate concentration pattern that emerges at any receptor further emphasises that, for an improved assessment of odour impacts, a better understanding of concentration fluctuations is essential.
Concentration fluctuations can be studied by constructing, from the temporal sequence, an empirical probability density function (PDF) (Shraiman and Siggia, 2000). Accordingly, we also present exemplary results of the statistical characterisation of fluctuations at the fixed points in the field shown above, independently (in space and time) of any other point. This strategy is a practical way of approaching the characterisation of fluctuations known as the one-point one-time concentration PDF (Marro et al., 2018). The concentration PDF is key because it embraces all the statistical moments.
In fact, there is still no broad agreement on which PDF fits concentration fluctuation data best. The functional form of the concentration PDF has been difficult to resolve from prior field, wind tunnel and water channel experiments. A large collection of PDFs has been proposed as candidates for a variety of conditions (see Table 1 of Efthimiou et al. (2016) for a summary). Examples include the exponential, gamma, beta, lognormal, clipped normal, generalised Pareto, Weibull, extreme value and g-and-h distributions. A crucial point is whether a candidate PDF describes the physical phenomenon at hand and whether it has practical application for a later implementation into dispersion models. This means that so-called overfitting should be avoided. By integrating the PDF, a cumulative distribution function (CDF) is obtained. The CDF gives the probability that a random variable has a value less than or equal to a certain value. The slope of the central part of the CDF indicates the fluctuation intensity i_c: the lower the i_c, the steeper the slope (Santos et al., 2005). The intermittency factor I can be deduced from the point at which the CDF crosses the vertical axis (Mylne and Mason, 1991). In the case of the clipped normal, I can be inferred as the area under the clipped-off tail of the normal distribution that extends below zero concentration (Wilson, 1995). The CDF can also provide information on the ratio between the maximum peak of the time series and the mean, which is given by the value where the curve reaches one (Santos et al., 2005).
We plotted the empirical PDFs and CDFs normalising the instantaneous concentration c by the mean concentration C (Fig. 4 and Fig. 5). In addition, we attempted to fit some standard two-parameter statistical distributions (gamma, lognormal, Weibull and clipped normal) to the Uttenweiler observational data. The resultant fits clearly show the remarkable changes in the shape of the concentration PDFs depending on the experimental conditions and the receptor position in the field. None of the parametric statistical distributions fits the entire range of the data effectively for all sample trials. From a qualitative appraisal of the fits (the goal here is not to assess goodness-of-fit statistics and criteria), the gamma and Weibull perform reasonably well. The clipped normal also does a good job in general, particularly on the upper tail. The clipped normal, however, fits the data from Trial E, Receptor 2 (Fig. 4, bottom panels) rather poorly over the entire range of the distribution. This is explained by the fact that conditional (in-plume) fluctuation intensities are restricted to unity for the clipped normal (Wilson, 1995); for Trial E, Receptor 2 the in-plume i_c was 1.3. Overall, the lognormal fails to capture the high concentration peaks, which are relevant in non-linear exposure assessments, except for Trial M, Receptor 1. For this case, Fig. 5 (top panel on the left-hand side) shows that the gamma distribution approximates the lognormal. Besides, for Trial E, Receptor 1 (Fig. 4, top panel on the left-hand side), one can note that the gamma moves toward an exponential-like distribution.
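As an illustration of how such fits can be constructed, the sketch below builds the empirical CDF of the normalised concentration and a method-of-moments gamma fit. The moments route is an assumption made to keep the example dependency-free; the fitting procedure actually used for Figs. 4 and 5 is not implied here.

```python
import numpy as np

def empirical_cdf(c):
    """Empirical CDF of the normalised concentration chi = c / mean(c)."""
    chi = np.sort(np.asarray(c, dtype=float) / np.mean(c))
    prob = np.arange(1, chi.size + 1) / chi.size
    return chi, prob

def gamma_moments_fit(c):
    """Method-of-moments fit of a two-parameter gamma PDF to chi = c / mean(c).
    Since mean(chi) = 1: shape k = 1/var(chi), scale theta = var(chi)."""
    chi = np.asarray(c, dtype=float) / np.mean(c)
    var = chi.var()
    return 1.0 / var, var  # (k, theta)
```

The resulting (chi, prob) pairs can be plotted directly against the fitted CDFs; the CDF intercept at chi = 0 then gives a visual estimate of the intermittency factor I, as discussed above.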
The versatility of the gamma distribution in fitting different types of datasets has long been acknowledged. The PDF of the gamma distribution takes diverse shapes depending on its shape parameter (the same is true for the Weibull distribution). The potential existence of a universal function for the concentration PDF that can be appropriately described by a family of one-parameter gamma distributions has been proposed in the scientific literature (Nironi et al., 2015; Villermaux and Duplat, 2003; Yee and Skvortsov, 2011). In particular, Nironi et al. (2015) showed that the concentration PDFs were well described, on the plume centreline, by a family of one-parameter gamma distributions. Their wind tunnel study considered a single-point source placed in neutrally stratified boundary layers. The one-parameter gamma distribution was capable of reproducing the evolution of the shape of the concentration PDFs along the plume centreline as the distance from the point source increased: from an exponential distribution in the near-field, through a lognormal-like distribution with a short tail in the intermediate field, to a normal-like distribution in the far-field (Nironi et al., 2015). The referred gamma distribution accounts for in-plume concentration fluctuations only (in other words, intermittency is not incorporated). The shape transitions were mathematically controlled by i_c only; physically, they were interpreted in light of the intermittency factors due to the meandering and relative dispersion mechanisms controlling the turbulent mixing (Nironi et al., 2015).
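The one-parameter family can be written down compactly: for a gamma density with unit mean, the shape parameter is fixed by k = 1/i_c², so i_c alone controls the shape. The sketch below illustrates this relation only; it is not Nironi et al.'s parameterisation of k versus distance.

```python
import math

def gamma_pdf_one_param(chi, i_c):
    """Gamma PDF for chi = c/C with unit mean: shape k = 1/i_c**2, scale 1/k.
    i_c = 1 recovers exp(-chi) (exponential, near-field limit); small i_c
    gives a narrow, normal-like density centred on chi = 1 (far-field limit)."""
    k = 1.0 / i_c ** 2
    # f(chi) = chi**(k-1) * exp(-chi/theta) / (Gamma(k) * theta**k), theta = 1/k
    return chi ** (k - 1.0) * math.exp(-k * chi) * k ** k / math.gamma(k)
```

Sweeping i_c from 1 towards 0 reproduces, qualitatively, the exponential-to-normal shape transition described above.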
Although the Uttenweiler field experiment has been used as a benchmark for odour modelling studies (Ferrero and Oettl, 2019;Hoinaski et al., 2017;Janicke and Janicke, 2007;Oettl and Ferrero, 2017;Piringer and Baumann-Stanzer, 2009;Stocker et al., 2019), in our view, it has at least two limitations that should be noted. The first limitation relates to the frequency of the concentration measurements of 0.1 Hz. According to Mylne and Mason (1991), such a frequency response cannot resolve the smallest scales of turbulence, nor is fast enough to resolve the time-scales related to the frequency of the human sense of smell of approximately 0.3 Hz. Second, the dataset on concentration fluctuations consists of 28 time series obtained from 14 trials involving two measurement points. Of course, such a dataset cannot incorporate a wide range of dispersion and emission conditions found in reality.

Mean concentration
We carried out the evaluation of LASAT's performance for the mean concentration by including the data of all 11-12 receptors present in each trial. Current dispersion models can potentially return predictions with higher accuracy for an averaging time of typically 1 h, which is linked to their internal turbulence parameterisations. However, the averaging time in Uttenweiler was 10 min. Care is thus necessary when evaluating model performance under such an averaging time. Fig. 6 presents a scatter plot displaying the overall relationship between observations and predictions. The 1:1 solid line (perfect model) and the 1:0.5 and 1:2 dashed lines (FAC2) were added to the scatter plot to assist the interpretation of results. We mapped the data points by atmospheric stability (shape) and field (colour). By doing so, a single plot conveys a great amount of information. Overall, one can note that LASAT performed satisfactorily, because a decent fraction of the data is within FAC2. The results also seem to show that the model predicts the mean concentrations fairly well at most receptors. However, a bias towards model underprediction, mainly occurring in the near-field, was detected. LASAT did not adequately predict the highest observed mean concentrations, which occurred in the near-field under both neutral and stable conditions. In contrast, LASAT overpredicted a few observations in the near-field under neutral conditions. Thus, the model showed improved performance in the far-field. Table 6 quantitatively dissects the previous findings through the computation of several statistical metrics. Besides computing the metrics for the whole dataset, we also scrutinised how model performance differs by distance or atmospheric stability. The FAC2 value for the "All cases" category shows that 53% of predicted concentrations were within a factor of two of the observations.
On average, LASAT underpredicted the mean concentration with MB = 1.98 μg m⁻³, NMB = 30%, MAE = 3.47 μg m⁻³ and RMSE = 6.31 μg m⁻³. The values for FB and NMSE were 0.35 and 0.68, respectively. The Pearson correlation coefficient r and IOA values were 0.67 and 0.71, respectively. For the "By distance" category, all metric results consistently confirm better model performance in the far-field. For the "By stability" category, most metric results indicate better model performance under stable atmospheric conditions; however, for some metrics (MB, NMB and FB), the differences were not striking and therefore not conclusive.
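For reference, the main metrics quoted above can be reproduced as follows. The sign conventions are assumed such that positive MB and FB indicate underprediction (observed minus predicted), consistent with the values reported here; IOA is omitted for brevity.

```python
import numpy as np

def evaluation_metrics(obs, prd):
    """Common model-evaluation metrics; obs = observations, prd = predictions.
    Conventions assumed: MB = mean(obs - prd), FB = (Co - Cp)/(0.5*(Co + Cp))."""
    obs, prd = np.asarray(obs, float), np.asarray(prd, float)
    diff = obs - prd
    co, cp = obs.mean(), prd.mean()
    return {
        "FAC2": np.mean((prd >= 0.5 * obs) & (prd <= 2.0 * obs)),
        "MB": diff.mean(),
        "NMB": diff.mean() / co,
        "MAE": np.abs(diff).mean(),
        "RMSE": np.sqrt(np.mean(diff ** 2)),
        "FB": 2.0 * (co - cp) / (co + cp),
        "NMSE": np.mean(diff ** 2) / (co * cp),
        "r": np.corrcoef(obs, prd)[0, 1],
    }
```

Conditioning "By distance" or "By stability" then amounts to calling the function on the corresponding data subsets.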
As an attempt to gain more insight, we explored the results further by conditioning on both field and atmospheric stability. Fig. 7 reports the outcomes in the form of normalised Taylor diagrams. The normalisation was done by dividing the centred RMSE and the standard deviation of the predictions by the standard deviation of the observations. In the Taylor diagram, model performance is related to the distance of a marker from the "observed" point and its black dashed line: the closer the marker is to the "observed" point and the black dashed line, the better the performance of a model. The normalised Taylor diagrams also show the general tendency for underprediction by LASAT. But they further reveal that LASAT tended to slightly overpredict the variability of concentrations in the far-field under stable conditions. This is because the marker linked to this conditioning lies outside the black dashed line (Fig. 7, left panel). In fact, the marker closest to the "observed" point is the one linked to far-field and stable conditions. All other markers lie inside the black dashed line, therefore demonstrating the underestimation tendency. Fig. 8 presents quantile-quantile (Q-Q) plots. While the prior results represent a strict evaluation because the data points are paired, Q-Q plots are less restrictive because pairing is not accounted for. To create a Q-Q plot, the data are first sorted by rank, and then their quantiles are plotted against each other. Strictly speaking, pairing is not fully considered for the "All cases" subplot; for the other cases, there is some pairing (in space) as per the downwind distance. The general tendency for underprediction by LASAT is again captured, although in some parts of the distribution, in particular the lower concentrations, LASAT shows a very good performance. The distribution again suggests a higher bias (towards underestimation) over the high end of the mean concentrations.
Again from this angle, the results indicate that LASAT tended to slightly overpredict in the far-field under stable conditions.
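The normalisation used in the Taylor diagrams and the rank-based construction of the Q-Q plots can be summarised in a few lines. This is a sketch; the normalised centred RMSE follows from the law-of-cosines identity underlying the diagram.

```python
import numpy as np

def taylor_stats(obs, prd):
    """Normalised Taylor-diagram coordinates: predicted/observed standard
    deviation ratio sigma, correlation r, and normalised centred RMSE."""
    obs, prd = np.asarray(obs, float), np.asarray(prd, float)
    r = np.corrcoef(obs, prd)[0, 1]
    sigma = prd.std() / obs.std()
    crmse = np.sqrt(1.0 + sigma ** 2 - 2.0 * sigma * r)  # law of cosines
    return sigma, r, crmse

def qq_points(obs, prd, probs=np.linspace(0.05, 0.95, 19)):
    """Quantile pairs for a Q-Q plot: pairing is dropped, ranks are compared."""
    return np.quantile(obs, probs), np.quantile(prd, probs)
```

A marker with sigma > 1 lies outside the dashed unit-standard-deviation arc, which is how the slight overprediction of variability in the far-field stable case is read off the diagram.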
Fig. 7. Normalised Taylor diagrams for mean concentrations considering the distance from the source (near-field or far-field) and atmospheric stability (neutral or stable) as conditioning variables.

Previous studies that explicitly stated having considered building downwash effects have reported a bias towards underprediction too (Janicke and Janicke, 2007; Oettl and Ferrero, 2017; Stocker et al., 2019). We conclude that LASAT provided an overall good agreement for the mean concentrations, even though its performance was evaluated under an "unfair" situation, viz., for an averaging time of 10 min.

Peak-to-mean factor Ψ90

Fig. 9 presents scatter plots of Ψ90 observations against predictions for the three evaluated approaches. Approach 1 overestimated all Ψ90 observations, as shown by the data points lying above the 1:1 line (Fig. 9, top panel). Besides, one can note that for Approach 1 there is a systematic divergence from the 1:1 line, pointing out unmodelled behaviour. This unmodelled behaviour was certainly expected due to the use of a constant peak-to-mean factor, as is the case for Approach 1, which uses the constant factor of 4. Of concern is that there seems to be a systematic disagreement from the 1:1 line for Approach 2 as well (Fig. 9, central panel), this time however the other way around, as Approach 2 tended to underpredict most of the Ψ90 observations. These simple scatter plots prove their usefulness by suggesting that both Approach 1 and Approach 2 are missing some features of the system.
The prediction skill of Approach 3 is reasonably good at most receptors; a good fraction of the data points is within FAC2 (Fig. 9, bottom panel). For Approach 3, it can be seen that the data points lying outside the FAC2 lines occurred mainly for receptors located in the near-field under stable conditions. A tendency for overestimation by Approach 3 can also be anticipated, because there are more data points between the 1:1 and 1:2 lines than between the 1:1 and 1:0.5 lines.
The inference acquired from such an analysis (Fig. 9) is that, as the inherent complexity of the three approaches grows, more accurate predictions of Ψ90 are obtained. The Ψ90 scatter plots were also quantitatively scrutinised by statistical metrics (Table 7). Results were not computed for r and IOA because Fig. 9 shows that these correlation metrics would return meaningless results; correspondingly, a Taylor diagram was not warranted. All metric results shown in Table 7 are dimensionless because Ψ90 is by definition a non-dimensional quantity. It is important to stress that the interpretation of such results should be linked to the observed Ψ90 range (1.3-3.8, Table 5).
The FAC2 values, a measure not excessively influenced by outliers (Chang and Hanna, 2004), were quite decent for Approach 1, Approach 2 and Approach 3, namely 61, 68 and 79%, respectively. Among the three evaluated approaches, the worst performance by Approach 1 is once more detected, with the highest errors for almost all metrics (e.g., MB = 1.64, NMB = 70%, MAE = 1.64 and RMSE = 1.79). Also, the fact that Approach 3 performed best in general was again spotted. The results of Approach 2 (e.g., MB = −1.04, NMB = −44%, MAE = 1.04 and RMSE = 1.28) were not far off from those of Approach 1. NMSE is an estimation of the overall deviation between two datasets; in this sense, Approach 2 had a noticeably higher NMSE than Approach 1. RMSE is sensitive to outliers, or else, it highlights larger errors. The MAE is similar to the RMSE, but an absolute value is taken instead, thereby reducing the weight of large-value events (Bennett et al., 2013; Jackson et al., 2019). In either case, Approach 3 had slightly higher MAE and RMSE than Approach 2. This might be because scaled error metrics such as MAE and RMSE do not respond to linear transforms or offsets (Jackson et al., 2019). Table 8 reports Ψ90 results conditioned by receptor position in the field, despite the relatively low number of samples for such an analysis. Approach 1 outcomes did not change much as a function of distance categorisation, except for a better FAC2 result in the near-field. Approach 2 returned better results in the near-field, as expected. Conversely, the results of all metrics reveal that Approach 3 had superior performance in the far-field. Table 9 reports the Ψ90 results conditioned by atmospheric stability. This analysis also suffers from the limitation of a relatively low number of samples. In general terms, Approach 1 predicted Ψ90 slightly better under neutral conditions.
Nevertheless, differences between neutral and stable conditions are not pronounced, except for a higher FAC2 result under neutral conditions. The same is true for Approach 2, for which a more palpable difference between stability conditions could have been expected, though. Why this did not occur can be explained in light of the experimental conditions with which Approach 2 was confronted, as clarified previously. In summary, the results displayed in Table 9 suggest that both Approach 1 and Approach 2 performances were independent of atmospheric stability.
As an attempt to better understand the behaviour of the Ψ90 results that Approach 3 provides, these results can be contrasted against those for the modelled mean concentrations, considering the distance from the source and atmospheric stability. In terms of the latter, LASAT exhibited in general a better performance under stable conditions, although for some metrics the differences in results were not striking and so not conclusive. Such an outcome is to some extent reflected in the Ψ90 results given by Approach 3 (Table 9). This time, coincidentally or not, only the error metrics that indicate bias (MB, NMB and FB) are better for Approach 3 under stable conditions. On the other hand, the error metrics that do not indicate bias (MAE, RMSE and NMSE) are better for Approach 3 under neutral conditions. A noticeable difference in results was captured by NMSE, for which the error is more than twice as large under stable conditions as under neutral conditions. We remark that confidence interval estimation for the metric results is outside the scope of this work; thus, we cannot firmly state that there are statistically significant differences between the comparisons. Regardless, future research may be needed, for example exploring a broader range of experimental settings, to arrive at a firmer conclusion on the performance of Approach 3 concerning atmospheric stability. We consider this important because Approach 3 can be applied for regulatory purposes. In such cases, odour modelling studies are typically conducted using at least a one-year meteorological dataset (Brancher et al., 2019a) comprising all stability conditions. Regarding the distance from the source, the attempted contrast is clearer: LASAT had improved performance in the far-field, consistent with the Ψ90 results given by Approach 3. Fig. 10 presents quantile-quantile (Q-Q) plots for the Ψ90 results from Approach 2 and Approach 3.
A thicker red line highlights the interquartile range (between the 25th and 75th percentiles) to reference the region with the bulk of the data (50%). Even though the Q-Q plot does not consider data pairing, the overall tendency for underprediction by Approach 2 at all parts of the distribution is again made clear. The contrary is true for Approach 3. Accordingly, Approach 3 overestimated the observed Ψ90, but not as much as Approach 1. The performance of Approach 2 improves as the Ψ90 observed quantiles tend to unity. The distributions indicate a lower bias for Approach 3 as compared to Approach 2. Also, from the Q-Q plot, one can note a systematic divergence from the 1:1 line for Approach 2 as the Ψ90 observed quantiles increase. Fig. 10 (right panel) and Fig. 9 (bottom panel) show graphically that four data points in Approach 3 were set to the minimum value for Ψ90 of 1.5, as described in Section 2.3. This consideration slightly improved the performance of Approach 3. This was however expected, because Oettl and Ferrero (2017) stated that Approach 3 was designed to provide more conservative Ψ90 estimates for the lowest range of values.
As seen e.g. from Fig. 9, Approach 1 performed rather poorly. The use of a constant factor to describe concentration fluctuations and mimic the human sense of smell is of course very practical and straightforward, but it has caveats, as our work points out. Peak-to-mean factors depend on many aspects, the main ones being the distance from the emission source, receptor position in the plume, atmospheric stability, intermittency and source configuration (Schauberger et al., 2012). Two main limitations inherently related to Approach 2 can explain its outcomes:

• According to Eq. (2), the peak-to-mean factors decrease exponentially to reach values close to unity at approximately 100 m from the source. This rapid characteristic attenuation, at such a distance, is similar under all atmospheric stability conditions. As seen, the fast-response concentration signals in the Uttenweiler field experiment were acquired at distances of ~144 and 279 m on average from the stack. For a rural site in the Austrian flatlands (Kittsee), which is characterised by a large abundance of neutral conditions combined with high wind speeds, that attenuation was retarded, and peak-to-mean factors decreased more slowly downwind of a source (Piringer et al., 2015);
• Approach 2 is strictly valid along the plume centreline. It seems that in the Uttenweiler field experiment, some receptors were at the borders of the plume, a condition for which Approach 2 was not developed.
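To make the first limitation concrete, the qualitative behaviour of an exponentially attenuating peak-to-mean factor can be sketched as below. This is NOT the paper's Eq. (2): the functional form, the source value Ψ0 = 4 and the time scale tau are purely illustrative assumptions.

```python
import math

def peak_to_mean(x, u_mean, psi_0=4.0, tau=8.0):
    """Illustrative exponential attenuation of the peak-to-mean factor with
    travel time t = x/u. psi_0 (source value) and tau (s) are hypothetical."""
    t = x / u_mean
    return 1.0 + (psi_0 - 1.0) * math.exp(-t / tau)
```

With these hypothetical parameters the factor is already close to unity around 100 m for u ≈ 4 m s⁻¹, mirroring the rapid attenuation described in the first bullet, whereas the Uttenweiler receptors sat at ~144 and ~279 m from the stack.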
A further limitation of Approach 2 concerns the validity of Eq. (3). This equation was stated to give a good approximation when the Obukhov stability parameter ζ is less than 0.1 (Mylne and Mason, 1991). As our results point out, not only neutral but also stable conditions occurred in the Uttenweiler field experiment. More direct estimates of ε can be obtained, for instance, from the velocity spectrum in the inertial subrange, but this depends, first of all, on the (high) sampling frequency of the ultrasonic anemometer (Kaimal and Finnigan, 1994).
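As a sketch of that alternative route, the dissipation rate can be estimated by inverting the one-dimensional Kolmogorov spectrum over the measured inertial subrange. The Kolmogorov constant C1 ≈ 0.5 and Taylor's frozen-turbulence hypothesis are standard assumptions; the function and variable names are ours:

```python
import numpy as np

def dissipation_from_spectrum(f, S_u, U, C1=0.5):
    """Estimate the TKE dissipation rate eps (m^2 s^-3) from the
    streamwise velocity spectrum S_u(f) inside the inertial subrange.
    Taylor's hypothesis maps frequency to wavenumber, k1 = 2*pi*f/U,
    and the Kolmogorov form E11(k1) = C1 * eps**(2/3) * k1**(-5/3)
    is inverted for eps; the result is averaged over the subrange."""
    f, S_u = np.asarray(f, float), np.asarray(S_u, float)
    k1 = 2.0 * np.pi * f / U
    E11 = S_u * U / (2.0 * np.pi)        # spectral density in k1-space
    eps = (E11 * k1 ** (5.0 / 3.0) / C1) ** 1.5
    return eps.mean()
```

Only frequencies that genuinely show a −5/3 slope should be passed in; with a 10–20 Hz sonic anemometer this usable window is narrow, which is the dependency on instrument frequency noted above.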
Approach 3 returned the best Ψ90 results across the board. However, this approach also has limitations that should be underlined, as acknowledged by Oettl and Ferrero (2017). The estimation of t_d is important in computing concentration variances. This parameter is a dissipation time scale, characteristic of the decay of concentration fluctuations in the scalar field. Molecular diffusion effects are gathered into t_d, making it crucial for describing the concentration mixing and the subsequent decay of the variance of the concentration fluctuations (Hsieh et al., 2007; Yee et al., 2009). It has been set as t_d = 2T_L3. This simple assumption eliminates the source geometry dependency, and so makes the approach readily available for practical cases with different emission source typologies (Oettl and Ferrero, 2017). However, Approach 3 has not been evaluated against concentration fluctuation measurements in overlapping plumes, bearing in mind that such datasets are uncommon. Approach 3, as reported elsewhere (Oettl et al., 2018), has been tested on the basis of odour-hour frequencies and by multiplying the Weibull PDF by 1.5. This differs from the original formulation, in which it was taken to the power of 1.5 (Oettl and Ferrero, 2017).

[Fig. 11 caption] Predicted Ψ90 contour plots using Approach 3 for Trial E until Trial H. The red crosses denote the two receptors at which fast-response concentration measurements were undertaken.
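To make the role of t_d concrete, the variance budget can be caricatured as a relaxation equation, dσc²/dt = P − σc²/t_d, where P stands in for the gradient-production source term. The explicit Euler step below is our own schematic, not the model's actual numerics:

```python
def variance_step(var_c, production, t_d, dt):
    """One explicit Euler step of a schematic concentration-variance
    budget d(var)/dt = production - var/t_d. t_d is the dissipation
    time scale (set to 2*T_L3 in Approach 3); 'production' stands in
    for the mean-gradient source term of Eq. (4)."""
    return var_c + dt * (production - var_c / t_d)

# With constant production the variance relaxes towards production * t_d,
# i.e. a larger t_d (slower scalar mixing) sustains larger fluctuations:
var = 0.0
for _ in range(10000):
    var = variance_step(var, production=2.0, t_d=5.0, dt=0.01)
```

The steady state production × t_d shows directly why the choice of t_d controls the predicted fluctuation level, and hence Ψ90.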
The source term for the concentration variance is proportional to the square of the mean concentration gradients (Eq. (4)), so that underestimating or overestimating the mean concentration affects that source term. As the downwind distance increases, the plume grows, meandering is restrained and relative dispersion becomes dominant, producing more homogeneous mean concentration gradients. The result is a decrease of the source term in Eq. (4). Bearing this in mind, as shown previously, on average the dispersion model underestimated the mean concentration, while Approach 3 overestimated the Ψ90 values (both taken at receptor height). In fact, before arriving at the end of the chain (namely, Ψ90), an overall overestimation of the concentration fluctuation intensity i_c (see Fig. S2 in the Supplementary Material) was detected. The predicted Ψ90 contour plots using Approach 3 are shown for all trials (Figs. 11 and 12). We did not apply special methods, such as low-pass filters, to smooth the Ψ90 contour maps. Ψ90 is evidently highly dependent on i_c. The expected overall picture of lower fluctuation intensities and Ψ90 values along the plume centreline and higher values at the borders of the plume was well captured in most trials. Particularly for Trial E, Trial F and Trial I, this picture held up to a certain distance from the source, beyond which it deteriorated.
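The strong dependence of Ψ90 on i_c can be illustrated with a PDF-based calculation in the spirit of Approach 3: a two-parameter Weibull distribution is matched to a unit mean and a given i_c, and its 90th percentile is read off. The moment-matching route and the bisection solver below are our own simplification, not the published formulation; only the floor of 1.5 follows Section 2.3.

```python
import math

def psi90_from_intensity(i_c, floor=1.5):
    """Illustrative Psi90 = C90 / C from a two-parameter Weibull PDF
    matched to mean 1 and fluctuation intensity i_c (moment matching;
    a simplification, not the published Approach 3 formulation)."""
    g = math.gamma
    # Solve Gamma(1+2/k)/Gamma(1+1/k)**2 - 1 = i_c**2 for the shape k.
    # The left-hand side decreases monotonically with k, so bisection works.
    lo, hi = 0.05, 50.0
    for _ in range(100):
        k = 0.5 * (lo + hi)
        if g(1 + 2 / k) / g(1 + 1 / k) ** 2 - 1 > i_c ** 2:
            lo = k          # spread too large -> increase shape k
        else:
            hi = k
    lam = 1.0 / g(1 + 1 / k)                  # scale for unit mean
    c90 = lam * (-math.log(0.1)) ** (1 / k)   # Weibull 90th percentile
    return max(c90, floor)                    # minimum Psi90 of 1.5
```

For i_c = 1 (an exponential PDF) this gives Ψ90 ≈ ln 10 ≈ 2.3, while small intensities fall onto the 1.5 floor, mirroring how any bias in i_c propagates directly into Ψ90.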
Moreover, the configuration of the Uttenweiler field experiment consisted of trials with winds blowing from the source to the receptors. In Oettl and Ferrero (2017), high Ψ90 values appeared further behind the source, suggesting that the fluid particles, driven by the turbulent field, moved in that direction. Oettl and Ferrero (2017) initialised the GRAL dispersion model with averaged wind data and, for all trials, a neutral stability class; turbulence quantities, including the standard deviations of the wind velocity components, were thus estimated by the model parametrisations. In the present work, the dispersion model was initialised with wind and turbulence data measured at z = 3.5 m a.g.l. The resulting Ψ90 estimates (Figs. 11 and 12) at receptor height practically did not depart from the modelled plumes. The sampling error inherent to Lagrangian stochastic models was satisfactory for LASAT, specifically at the plume edges, where low concentrations arise and, by definition, i_c increases considerably. The values of Ψ90 = 1.5 over the rest of the model domain are due to the a priori consideration of this minimum.

Summary, conclusions and future outlook
This work contributes to the limited body of research formally addressing the performance of approaches for predicting sub-hourly odour peak concentrations. Reliable calculations of these values are a key aspect of predicting annoyance caused by odour exposure. Here, the 90th percentile characterised the peak value. Thus, the peak-to-mean factor Ψ90, given by normalising the 90th percentile by the mean concentration, was the quantity of interest when evaluating three selected approaches. Using concentration measurements from the Uttenweiler field dispersion experiment (Bächlin et al., 2002), the selected approaches were the factor of 4 currently enforced in Germany (Approach 1; GOAA, 2008), an empirically based peak-to-mean procedure (Approach 2; Piringer et al., 2015), and the concentration-variance computation (Approach 3; Oettl and Ferrero, 2017). Approach 1 assumes a constant Ψ90, while Approach 2 and Approach 3 were designed to provide spatially varying peak-to-mean factors. Approach 1 was considered as is. Approach 2 was slightly modified by considering another parametrisation for the dissipation rate. Approach 3 was tested, in its original formulation, as a post-processing routine of LASAT, a Lagrangian particle dispersion model, for the first time. Efforts were also dedicated to investigating the mean flow conditions encountered in the different dispersion trials, evaluating LASAT's performance for the mean concentration, and computing observed concentration fluctuation statistics.
The following conclusions are drawn:

• The results favoured the use of more sophisticated modelling techniques, which seek to reflect the physics of concentration fluctuations more faithfully. By doing so, more accurate predictions of sub-hourly peak concentrations were obtained. That is, Approach 3 performed best, confirming the research hypothesis posed in the Introduction;

• Approach 1 overestimated all Ψ90 observations. Under tiered regulatory frameworks, advanced modelling methods are typically required if a screening analysis indicates that a standard could be exceeded. In this context, Approach 1 could be seen as a screening procedure to indicate whether more refined peak-to-mean modelling is necessary; screening procedures often use simpler techniques and are more conservative;

• Approach 2 was strongly biased towards underestimation, mainly caused by the rapid exponential decrease of Ψ90 with downwind distance. This approach is strictly valid along the plume centreline and so cannot account for receptors at the borders of the plume;

• Approach 3 tended to overestimate Ψ90, though not to the extent Approach 1 did. Approach 3, in its current form, restricts the solution space of Ψ90. Nevertheless, for the tracer study considered herein, a generally satisfactory agreement with the Ψ90 observations was found;

• A closer examination of the available ultrasonic anemometer data revealed that the trials were performed under neutral and stable atmospheric conditions. This finding has important implications for interpreting the results of any study using the Uttenweiler field experiment dataset; to the best of our knowledge, several studies over the years have assumed neutral conditions for all trials;

• The observed concentration fluctuation intensity i_c varied from 0.23 to 2.17, while the observed Ψ90 varied from 1.3 to 3.8, with no appreciable differences between stability conditions. Considering the whole fast-response tracer concentration dataset, indicative mean values of C, i_c and Ψ90 of 9.95 μg m⁻³, 1.26 and 2.9, respectively, were obtained;

• For some trials, i_c and Ψ90 were higher at larger distances from the emission source. The most likely explanation appears to be the crosswind position (edges of the plume) of some receptors. Approach 3 was able to capture this occurrence;

• There is currently no universal solution for the functional form of the one-point, one-time concentration PDF of a passive scalar. Fitting conventional two-parameter PDFs (gamma, lognormal, Weibull and clipped normal) to the concentration fluctuation measurements illustrated the difficulty the tested distributions have in fitting the full range of the data adequately;

• LASAT's evaluation for the mean concentration showed a satisfactory performance under the given study settings. On average, the dispersion model tended to underpredict, with better agreement obtained for distances further from the source.
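The observed statistics quoted above are straightforward to reproduce from a fast-response concentration record; the helper below (names are our own) computes the mean concentration C, the fluctuation intensity i_c = σ_c/C and the peak-to-mean factor Ψ90 = C90/C:

```python
import numpy as np

def fluctuation_stats(c, percentile=90.0):
    """Mean concentration C, fluctuation intensity i_c = sigma_c / C,
    and peak-to-mean factor Psi_p = C_p / C of a fast-response
    concentration time series c."""
    c = np.asarray(c, dtype=float)
    mean = c.mean()
    i_c = c.std(ddof=0) / mean
    psi = np.percentile(c, percentile) / mean
    return mean, i_c, psi

# A perfectly steady signal gives i_c = 0 and Psi90 = 1; intermittent,
# spiky signals push both quantities well above these baselines.
mean, i_c, psi = fluctuation_stats(np.ones(600))
```

Because both i_c and Ψ90 are normalised by the mean, they can be compared directly across receptors and trials, which is how the 0.23–2.17 and 1.3–3.8 ranges were tabulated.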
Like any other tracer study, the Uttenweiler field experiment has limitations (e.g., relatively flat and homogeneous terrain, two adjoining buildings, daytime releases, a single point source, and limited temporal and spatial resolution of the concentration measurements and atmospheric stability range). Accordingly, there is a clear need for further research to improve our understanding of the general validity of the approaches investigated herein. Approach 3, in particular, is attractive for practical applications, but further evaluations and sensitivity analyses are highly recommended. Importantly in this context, there is a general need for more public tracer study datasets with the high-frequency concentration measurements required to investigate concentration fluctuations. Such a need should go hand in hand with the latest instrumentation technology. Also, future model evaluation studies should try to consider the uncertainty of the measurements to reach even more solid findings. Finally, when using dispersion models for compliance demonstrations, the demand for these models to be fit for purpose under all atmospheric conditions occurring year-round is growing steadily. The current investigation demonstrated the benefits of analysing observational data together with the associated model evaluation exercise; this represents an essential means of gaining more confidence in the everyday use of dispersion models.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.