Holistic approach to wind turbine noise: From blade trailing-edge modifications to annoyance estimation

Wind turbines represent an encouraging option for sustainable energy but their noise emissions can be an issue for their public acceptance. Noise reduction measures, such as trailing-edge serrations or permeable inserts, seem to offer promising results in reducing wind turbine noise levels. This manuscript presents a novel holistic approach for perception-based evaluation of wind turbine noise and the performance of reduction measures using synthetic sound auralization. To demonstrate its feasibility, a case study featuring four state-of-the-art noise reduction trailing-edge add-ons synthetically applied to two full-scale wind turbines at nominal power is presented. The synthetic sound signals were auralized and propagated to three observer locations. The expected annoyance in each case was estimated by employing a combination of psychoacoustic sound quality metrics and a listening experiment featuring 16 participants. A close relation was found between the results of the psychoacoustic metrics and the listening experiment. In general, this holistic approach provides valuable information for the design of optimal noise reduction measures and wind turbines.


Introduction
Wind turbines are a promising approach for producing sustainable energy, but their noise emissions are an important cause of annoyance for the population living next to wind farms and, thus, represent a critical factor for their public acceptance [1][2][3]. Despite typically having noise exposures with lower sound pressure levels ($L_p$) than other sources of community noise, such as aircraft [4,5] or road traffic [6], the percentage of highly annoyed people (%HA) due to wind turbine noise is higher and has a steeper dose-response relationship than for those other community noise sources at a given noise level [7][8][9][10][11]. Therefore, noise regulations for wind turbines become stricter with time, whereas the ever-increasing demand for wind energy only aggravates this issue. Because the cost of building wind turbines on land is considerably lower than offshore, an increasing number of turbines are being erected near densely populated areas [7]. The emitted noise levels may prevent wind turbines from operating at maximum power settings and may even force them to stop operating at night. In practice, wind turbine noise levels are typically controlled by operational restrictions, such as low-noise modes, that limit the power generation [12] and, hence, the revenue. A decrease of a single decibel in a usual wind turbine noise signature is expected to enable an increase in power generation by 2% to 4% [13]. The expected lifetime of a typical modern wind turbine is approximately 20 years.
The horizontal axis wind turbine (HAWT) is the most popular and most widely implemented design worldwide [15]. These turbines are usually located in large numbers in so-called wind farms or wind parks. HAWT noise emissions depend on several parameters, of which the wind speed (and consequently the rotational speed of the blades, usually selected for maximizing the power generation), the rotor radius $R$, and the hub height $H$ are the most important [15]. The actual noise exposure perceived by nearby residents also depends on the sound propagation conditions, which are a function of the observer location and the characteristics of the environment (e.g. ground type) and atmosphere (e.g. air temperature, relative humidity, and atmospheric turbulence). An acoustic measurement procedure for estimating the noise impact near wind farms that takes the sound propagation into account has been conceived by Gallo et al. [16].
Wind turbine noise is a considerably complex sound that consists of broadband noise and sometimes additional discrete tonal components [1,15,17]. Furthermore, the motion of the blades causes a periodic amplitude modulation (AM) of the sound with the modulation frequency being equal to the blade passing frequency (BPF). This sound characteristic is usually described in literature as an annoying swishing, lapping or thumping hearing sensation [8,15,18,19,20]. Wind turbine noise is typically assessed using classical indicators that describe the sound exposure in a general and highly averaged way [21,22], such as the equivalent continuous A-weighted sound pressure level ($L_{p,A,eq}$) [15]. Some studies [23] reported that HAWT sounds with tonal components and a stronger high-frequency content (for frequencies above 1 kHz) were perceived as more annoying and caused higher awareness than those without tonal components and a stronger low-frequency content, despite having the same $L_{p,A,eq}$ value. Therefore, it is questionable to use only these conventional indicators to assess wind turbine noise and its abatement procedures, since they do not fully capture the sound properties responsible for the perceived annoyance [7,18,22,24].
Of all the wind turbine noise generation mechanisms, the turbulent boundary layer trailing edge (TBL-TE) noise of the rotor blades is considered the main noise source of modern HAWTs operating within their typical envelope [25,26]. TBL-TE noise is generated when the unsteady surface pressure fluctuations convected within the boundary layer arrive at the trailing edge, where they experience a sudden change in acoustic impedance and scatter efficiently as broadband noise [25]. The noise emissions generated at the blade sections near the tips are usually dominant due to their comparably higher velocity [25,27]. Several noise reduction measures have been proposed to alleviate this acoustic impedance mismatch [28,29], of which trailing-edge serrations [30,31] and permeable inserts [32][33][34] seem to be the most promising approaches, showing sound pressure level reductions ($\Delta L_p = L_{p,\mathrm{baseline}} - L_{p,\mathrm{add\text{-}on}}$) up to approximately 10 dB in some one-third-octave frequency bands, with respect to the straight, solid trailing-edge baseline configuration. However, there is little literature [35] investigating how these reductions in $L_p$ are perceived by the population in terms of noise annoyance reduction, if any.
During the last decades, remarkable scientific progress has been achieved in the field of auralization of environmental acoustical sceneries [36]. Auralization is the technique of artificially making an acoustical situation audible, which can be considered as the acoustical counterpart to visualization. The auralization process typically begins with the generation of a source audio signal that is then filtered for a selected observer location to consider propagation effects and finally reproduced as an audible sound field [36]. Current parametric, physics-based calculation models are already able to auralize wind turbine noise scenarios [12,37] and create plausible listening experiences. This simulation approach enables the evaluation of listening experiences that do not necessarily exist in reality. For example, a recent study by Pieren et al. [22] evaluated future low-noise aircraft technologies and flight procedures.
The objective of this research is to present a novel holistic approach for a perception-based evaluation of wind turbine sounds and the performance of noise reduction measures. In this multidisciplinary approach, different wind turbine scenarios are assessed by human listeners within a virtual acoustical environment regarding their induced short-term noise annoyance. The proposed simulation process considers the wind turbine type, its operational conditions, sound propagation effects, as well as human perception. The feasibility of this novel approach is demonstrated with a case study based on two existing HAWTs as a baseline, which are both synthetically equipped with four different state-of-the-art add-ons for TBL-TE noise abatement. The expected changes in noise generation ($\Delta L_p$) of the measures considered are obtained from previous research, including wind-tunnel experiments [34,38,39,40,41] and computational aeroacoustic (CAA) simulations [42], and scaled up to full-scale wind turbine conditions. Different virtual observer locations are considered for this study, including the standard location for noise certification of wind turbines [17]. The associated noise annoyance in each case is estimated in a psychoacoustic listening experiment with human subjects, as well as by the computation of psychoacoustic sound quality metrics (SQMs) [43][44][45]. This approach provides valuable information for effective future improvements and optimization of wind turbine noise abatement procedures. This manuscript represents an extension of some preliminary results in [35].
The paper is structured as follows: The proposed approach is explained in Section 2, including a brief explanation of the required inputs, the auralization process, the SQMs, and the listening experiment. The details of the case study are described in Section 3. Section 4 analyzes and discusses the results obtained and, lastly, Section 5 contains the main conclusions drawn.

Overview of the approach
Fig. 1 illustrates the core concept of the approach proposed in this paper. It describes the assessment of the performance of current and future wind turbine noise reduction measures, considering human perception, by creating a virtual reality (VR) with audio simulations based on data from laboratory and field measurements. The process begins with the selection of the wind turbine type and operational conditions. Ideally, wind turbine noise field measurements would be available as an input for the parametric simulation tool. For the current case study, the outcomes of a previous experimental campaign featuring two different full-scale HAWTs [12,37] were employed and used as validation, see Section 3.2. The present study only considers acoustical scenarios featuring one HAWT, but this approach can easily be extended to evaluate full wind farms, see left dashed block in Fig. 1. The design of the wind turbine blades (i.e. the geometry and material) and the results from wind-tunnel experiments and CAA simulations featuring novel noise reduction measures are also inputs to the simulation tool. In this research, the $\Delta L_p$ values achieved by the noise reduction measures considered, as reported in the literature [34,38,39,40,41,42], are scaled up to represent full-scale wind turbine conditions. Using a parametric wind turbine noise synthesis tool [12,37] and considering the observer locations and sound propagation model selected by the user, the sound signals for all the cases considered are then auralized. The noise annoyance is then estimated by using SQMs and psychoacoustic listening experiments.
Considering that the annoyance generated by wind turbines also depends on the visual impact of these devices [46][47][48][49][50], the approach proposed here can potentially be enhanced by using VR audio-visual simulations [51], see right dashed block in Fig. 1. This is, however, considered as future work because the main application of the present research is to evaluate only the influence of sound.
The main principles of these processes are explained in the following subsections.

Acoustical field measurements
Acoustical field measurements with full-scale wind turbines provide valuable information because they represent the actual blade geometries and operational conditions [52], compared to typical wind-tunnel experiments and CAA simulations, which normally only consider airfoil sections. Moreover, the time dependency of the noise levels, such as the AM at the BPF, can also be studied in this kind of field measurement [14].
In comparison with wind-tunnel experiments and CAA simulations, however, field measurements offer little or no control over the flow characteristics, typically involve larger distances between the sound source and the observer, provide less accurate knowledge of the exact model position, and have to deal with a sound source in constant motion [53,54]. In addition, these experimental campaigns are typically more expensive than wind-tunnel tests.

Wind-tunnel measurements
Measurements in wind tunnels have been one of the main methods for aeroacoustic testing over the last decades [55][56][57], as they offer controllable conditions (flow velocity, angle of attack, etc.), repeatability of the results, and, in most cases, accurate knowledge of the model position.
One of the main challenges for wind-tunnel experiments is to replicate the exact conditions present at full-scale wind turbines in a field. The smaller scale and usually less-detailed geometry of the wind-tunnel models, together with the typically lower flow velocities, can lead to a discrepancy in the Reynolds number [58], see Section 2.5.

Computational aeroacoustic simulations
Computational aeroacoustic simulations typically aim at solving the equations for fluid motion and the acoustic wave equation [59][60][61], which account for the time-resolved aerodynamic sound generation and the propagation of the acoustic waves, respectively. These computations are normally very time-consuming, which limits the complexity of the shape of the model to be analyzed to relatively simple geometries [62]. On the other hand, they provide accurate information on the aerodynamic properties within the whole flow field selected.

Up-scaling of the noise reductions
The broadband noise reductions $\Delta L_p$ measured in wind-tunnel experiments and CAA simulations need to be scaled to represent the geometry and flow conditions of the full-scale wind turbines. This is due to the typically smaller model size and lower flow velocities $V_\infty$ employed in these two approaches with respect to the actual conditions present in full-scale field measurements. In this research, it was assumed that the noise reduction measures considered provide the same $\Delta L_p$ value for the same non-dimensional frequency, expressed as the Strouhal number $St_{\delta^*}$ based on the displacement thickness of the boundary layer at the trailing edge $\delta^*$, which is defined as:
$$St_{\delta^*} = \frac{f \, \delta^*}{V_\infty},$$
where $f$ denotes the frequency in Hz. The flow velocities $V_\infty$ around the airfoils tested in the previous studies on the selected noise reduction measures ranged between 20 m/s and 30 m/s, see Table 1. These values are considerably lower than the typical flow velocities around the blade tips of a full-scale wind turbine in normal operational conditions, which are approximately 70 m/s for the two wind turbines considered in this study, see Section 3.2.
Moreover, $V_\infty$ influences $\delta^*$, which also depends on the chord length $\hat{c}$ of the rotor blade. All the investigations on the noise reduction [...]
$$Re_{\hat{c}} = \frac{\rho \, V_\infty \, \hat{c}}{\mu},$$
where $Re_{\hat{c}}$ is the chord-based Reynolds number, $\rho$ is the air density, and $\mu$ is the dynamic air viscosity. The assumptions made within this rather simple scaling imply some limitations that are discussed in Section 4.1.
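As a minimal sketch of the scaling assumption described in this section (equal noise reduction at equal Strouhal number based on the displacement thickness), the following Python snippet maps a frequency measured at model scale to the full-scale frequency with the same Strouhal number. The function name and all numerical values of the displacement thickness are hypothetical and chosen only for illustration; they are not data from the cited studies.

```python
def scale_frequency(f_model, delta_star_model, v_model, delta_star_full, v_full):
    """Return the full-scale frequency [Hz] that shares the model-scale Strouhal number.

    The Strouhal number is taken as St = f * delta_star / V, and the noise
    reduction is assumed to be identical at equal St (assumption of this sketch).
    """
    strouhal = f_model * delta_star_model / v_model
    return strouhal * v_full / delta_star_full


# Hypothetical inputs: 2 mm displacement thickness at 20 m/s in the wind tunnel,
# 7 mm displacement thickness at 70 m/s near the blade tip of a full-scale turbine.
f_full = scale_frequency(
    f_model=1000.0,
    delta_star_model=0.002,
    v_model=20.0,
    delta_star_full=0.007,
    v_full=70.0,
)
print(f"Full-scale frequency: {f_full:.1f} Hz")
```

In this sketch, a one-third-octave band spectrum of noise reductions would be shifted band by band with the same mapping before being applied to the full-scale source spectrum.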

Auralization of wind turbine sound
The auralization tool for wind turbine noise employed in this research was developed by Pieren et al. within the research project VisAsim [64], and consists of a parametric emission synthesizer [12], a propagation filter and a vegetation noise synthesizer [37], a reproduction rendering [65], and an audio-visual reproduction system [51,66].
The parametric synthesizer is based on spectral modeling synthesis and characterizes a wind turbine signal by only a few components, such as tones, broadband noise, and temporal variations. Sounds are artificially generated considering the narrowband spectral content (including potential discrete tonal components) and the frequency-dependent AM in one-third-octave bands [12]. For each band, level fluctuations are synthesized as the superposition of a random process and a periodic function steered by the BPF. Overall, the model employs approximately 120 low-precision input parameters, which were obtained by analyzing sound pressure signals from wind turbines recorded in field experiments [12]. A previous listening experiment with 12 experienced listeners demonstrated that the synthetic wind turbine sounds are so realistic that they can be easily confused with the original sound [12]. Audio-visual experiments in [66] further showed only minor differences in the annoyance ratings between recordings and simulations of wind turbine noise.
The signal processing involves the following five consecutive steps:
1. Signal analysis is applied to a field recording to deduce the synthesis parameters, which allow the recording to be reconstructed or resynthesized. The signal operations involve a separation into frequency bands and into different AM types and modulation bands.
2. Sound propagation effects are inverted to derive synthesis parameters at a virtual source position at the hub of the wind turbine.
3. Possible modifications to the source parameters are applied (such as the virtual application of noise reduction measures).
4. For the modified source parameters, a source signal of arbitrary duration is synthesized. This is done using digital sound synthesis techniques at a sampling frequency of 44.1 kHz.
5. The source signal is propagated to a receiver location by applying a network of digital filters to the source signal. This results in a sound pressure signal that can be reproduced via loudspeakers and from which acoustic indicators (such as SQMs) can be calculated.
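To illustrate the synthesis step in a strongly simplified form, the sketch below generates a single band of white noise whose level fluctuates periodically at the BPF, sampled at 44.1 kHz. It is not the synthesizer of [12]: the function name, the sinusoidal modulation shape, the modulation depth, and the omission of the random level fluctuation, the band filtering, and the tonal components are all assumptions made for brevity.

```python
import numpy as np


def synthesize_am_noise(duration_s, bpf_hz, mod_depth_db, fs=44100, seed=0):
    """Generate white noise with a periodic level fluctuation at the BPF.

    mod_depth_db is the peak-to-trough level difference in dB (hypothetical
    parameter of this sketch, not a parameter name from the original tool).
    """
    rng = np.random.default_rng(seed)
    n_samples = int(round(duration_s * fs))
    t = np.arange(n_samples) / fs
    noise = rng.standard_normal(n_samples)
    # Level fluctuation in dB around 0 dB, steered by the blade passing frequency
    level_db = 0.5 * mod_depth_db * np.sin(2.0 * np.pi * bpf_hz * t)
    gain = 10.0 ** (level_db / 20.0)
    return t, noise * gain


# Example: 4 s of noise modulated at 0.75 Hz with a 4 dB modulation depth
t, signal = synthesize_am_noise(duration_s=4.0, bpf_hz=0.75, mod_depth_db=4.0)
```

A closer approximation of the described model would apply such a modulation separately in each one-third-octave band and add a random level process to the periodic term.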
The sound propagation modeling considers the propagation effects that are of relevance in wind turbine noise exposure situations that feature relatively large source-receiver distances and an elevated source. The propagation effects of geometrical spreading, atmospheric absorption, ground effect, and atmospheric turbulence are taken into account. Geometrical spreading is modeled assuming a point source behavior. Atmospheric attenuation due to air absorption is highly frequency-dependent and modeled as a function of the propagation distance, assuming a homogeneous atmosphere with constant air temperature and relative humidity. Ground reflection is simulated by considering the vertical source extension and the acoustical properties of the ground. The effect of atmospheric turbulence is modeled by a frequency- and distance-dependent random AM. The proposed methodology allows for changes in the meteorological conditions to investigate their effect on perception, but this was considered beyond the scope of this study. However, it is expected that different meteorological conditions would hardly affect the relative effects of the noise reduction measures.
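The first two propagation effects can be sketched for a single frequency band as follows. This is only an illustration under stated assumptions: point-source spherical spreading (a decrease of about 6 dB per doubling of distance) and a constant absorption coefficient in dB/m that the user must supply for the band of interest. The function name and the example coefficient are hypothetical, and the ground effect and atmospheric turbulence are deliberately left out.

```python
import math


def propagated_level(level_ref_db, r_ref_m, r_m, alpha_db_per_m=0.0):
    """Band level at distance r_m, given the level at the reference distance r_ref_m.

    Includes spherical spreading of a point source and a linear-in-distance
    atmospheric absorption term. Ground effect and turbulence are not modeled.
    """
    spreading_db = 20.0 * math.log10(r_m / r_ref_m)
    absorption_db = alpha_db_per_m * (r_m - r_ref_m)
    return level_ref_db - spreading_db - absorption_db


# Doubling the distance without absorption lowers the level by about 6 dB
level_no_absorption = propagated_level(50.0, 100.0, 200.0)
# Same geometry with a hypothetical absorption coefficient of 0.005 dB/m
level_with_absorption = propagated_level(50.0, 100.0, 200.0, alpha_db_per_m=0.005)
print(round(level_no_absorption, 2), round(level_with_absorption, 2))
```

Because the absorption coefficient grows with frequency, applying this relation band by band reproduces the progressive loss of high-frequency content at larger distances.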

Acoustic and psychoacoustic sound quality metrics (SQMs)
Acoustic metrics typically employed for assessing wind turbine noise, such as the $L_{p,A,eq}$, have some difficulties when explaining the actual annoyance experienced by people exposed to this type of noise [18]. Some metrics from the field of aircraft noise, such as the effective perceived noise level (EPNL) [69], are more complex and account for the duration of the sound signal and the presence of tonal sound using the one-third-octave band spectrum [70]. The EPNL metric, however, was specifically tailored to analyze the noise impact of turbojet aircraft in the 1960s [69].
Hence, more sophisticated Sound Quality Metrics (SQMs) from the field of psychoacoustics are currently being studied and considered for their application in wind turbine noise [18,35] and aircraft noise [43,44,70,71,72,73]. In general, SQMs provide sensation magnitudes instead of stimulus magnitudes, i.e. they describe the hearing sensation instead of a purely physical magnitude, such as the acoustic pressure or $L_p$. The relationships between stimulus magnitudes and SQMs are typically non-linear, and normally a minimum threshold variation in an SQM is required to generate sensations different enough to hear [45]. Since the explanation of the complete calculation process of each SQM is rather lengthy and out of the scope of this manuscript, only a brief description is provided here. The interested reader is referred to Refs. [43,44,45,74] for more detailed information. The five most common SQMs, which are also used here, are: Loudness ($N$) [75], Tonality ($K$) [76], Sharpness ($S$) [77], Roughness ($R$) [78] and Fluctuation strength ($FS$) [45].
With the exception of loudness [75], the calculation of these metrics is not standardized, and calculations using different algorithms may, thus, differ somewhat [18]. In the current study, the 5% percentiles of these SQMs are employed, i.e. the value that is exceeded 5% of the time. Thus, a subscript ''5'' is added to the SQMs henceforth.
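The percentile convention used here can be sketched in a few lines of Python: the value exceeded 5% of the time corresponds to the 95th percentile of the time series of a metric. The function name and the synthetic time series are hypothetical and serve only as an illustration.

```python
import numpy as np


def value_exceeded_5_percent(time_series):
    """Return the value exceeded 5% of the time (95th percentile of the series)."""
    return float(np.percentile(np.asarray(time_series, dtype=float), 95))


# Hypothetical time series of a sound quality metric (e.g. loudness over time)
metric_series = np.linspace(0.0, 100.0, 101)
metric_5 = value_exceeded_5_percent(metric_series)
print(metric_5)
```

Note that `np.percentile` interpolates linearly between samples by default, so other tools may return slightly different values for short signals.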
SQMs can be used to estimate human reactions using combined metrics, such as the Psychoacoustic Annoyance (PA) parameter introduced by Fastl and Zwicker [45]. In this research, a modified PA metric, as suggested by Di et al. [79] to also include the effect of tonality, was employed and is referred to as modified psychoacoustic annoyance ($PA_{mod}$) henceforth. The $PA_{mod}$ metric combines the effects of the five aforementioned SQMs, but, in general, loudness ($N$) has the strongest influence.
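For orientation, the sketch below implements the original (unmodified) psychoacoustic annoyance model as it is commonly stated following Fastl and Zwicker [45], using the 5% percentile loudness in sone, sharpness in acum, fluctuation strength in vacil, and roughness in asper. The tonality term added by Di et al. [79] is not reproduced here, so this is not the modified metric used in this study; the function and argument names are hypothetical.

```python
import math


def psychoacoustic_annoyance(n5_sone, sharpness_acum, fluct_vacil, rough_asper):
    """Unmodified psychoacoustic annoyance (no tonality term), as commonly stated.

    n5_sone must be positive. The sharpness term only contributes above 1.75 acum.
    """
    if sharpness_acum > 1.75:
        w_s = (sharpness_acum - 1.75) * 0.25 * math.log10(n5_sone + 10.0)
    else:
        w_s = 0.0
    # Combined contribution of fluctuation strength and roughness
    w_fr = 2.18 / n5_sone ** 0.4 * (0.4 * fluct_vacil + 0.6 * rough_asper)
    return n5_sone * (1.0 + math.sqrt(w_s ** 2 + w_fr ** 2))


# Hypothetical metric values for two sounds that differ only in loudness
pa_quiet = psychoacoustic_annoyance(10.0, 1.0, 0.1, 0.1)
pa_loud = psychoacoustic_annoyance(20.0, 1.0, 0.1, 0.1)
print(round(pa_quiet, 2), round(pa_loud, 2))
```

The loudness enters as a multiplicative factor, which is consistent with the observation that loudness generally has the strongest influence on the combined metric.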
One of the main advantages of the SQMs and the $PA_{mod}$ metric is that they provide a quick estimate of the annoyance due to a sound signal without performing listening experiments. This can be especially useful for perception-based design loops involving a large number of parameters to optimize (and, hence, sound signals to evaluate) or for calculations featuring a large number of observer positions [22]. In those examples, performing psychoacoustic listening experiments may not be a practical option for assessing all cases, but would certainly be recommended as validation once the number of test cases of interest is reduced to a manageable value.

Perception-based experimental evaluation
While SQMs and the $PA_{mod}$ metric have the advantage of providing quick estimations of the annoyance evoked by a sound without performing a listening experiment, they do not yield results as reliable as those of a dedicated listening experiment and are, thus, subject to uncertainty. Therefore, a listening experiment was performed within this study as the standard of perception-based evaluation. The results are then compared with $PA_{mod}$ to verify the reliability and applicability of the latter; if these are confirmed, the perception-based suitability of further trailing-edge modifications might be estimated exclusively by $PA_{mod}$.
The design of the experiment has to be specifically adapted to the desired endpoint of the evaluation; there are a number of design variables to consider [80]. The present experiment focused on short-term noise annoyance (see Section 3.3). Other endpoints, however, such as the audibility of tonal components, would require other experimental designs.
A brief explanation of the four studies regarding each noise reduction measure is provided below, whereas their main parameters are gathered in Table 1. All results correspond to a symmetric NACA 0018 airfoil with a chord of $\hat{c} = 0.2$ m at zero angle of attack ($\alpha = 0^\circ$) and, for the cases featuring serrations, a flap angle of $\varphi_s = 0^\circ$ was selected (i.e. the serrations were flow-aligned), see Fig. 2(a). All airfoils were tripped at 20% of the chord on both the pressure and suction sides to ensure the transition to a turbulent boundary layer.
For the experimental cases, the far-field noise emissions of the airfoil's trailing edge were isolated and measured by using a 64-microphone array placed outside of the flow [34,38,81] and acoustic imaging and integration techniques [82][83][84][85]. For the CAA simulations [42], the far-field noise was estimated by propagating the pressure fluctuations around the model with the Ffowcs Williams-Hawkings (FW-H) acoustic analogy [87], an extension of Lighthill's acoustic analogy [86], and the time-domain formulation of Farassat [60].
All studies included a baseline configuration with a solid, straight trailing edge with respect to which the $\Delta L_p$ values are defined. In addition, the turbulent boundary layer profile at the trailing edge was measured in all cases, including its displacement thickness $\delta^*$ at the trailing edge. This parameter is required for scaling the $\Delta L_p$ values to the full-scale wind turbine geometries considered in this research, as explained in Section 2.5.
Fig. 2. [...] [42]. (c) Illustration of the NACA 0018 airfoil equipped with a permeable trailing-edge insert (in dark purple) [34].

Sawtooth serrations
The performance of sawtooth serrations in the reduction of TBL-TE noise was investigated by Arce León et al. [38][39][40] on a NACA 0018 airfoil with a span of 0.4 m manufactured in aluminum, see Fig. 2(a). The airfoil was tested in the open-jet vertical wind tunnel (V-tunnel) at Delft University of Technology, before it was refurbished into an anechoic wind tunnel (the current A-tunnel) [14,81].
The serrations employed had a length of $2h_s = 40$ mm (i.e. 20% of the airfoil chord $\hat{c}$) and a width of $\lambda_s = 20$ mm, see Fig. 2(a), and were manufactured in laser-cut steel of 1 mm thickness and retrofitted to the airfoil. For a flow velocity of $V_\infty = 30$ m/s, noise reductions up to 7 dB were reported in the measurements [38][39][40]. For additional details on the experimental setup, the reader is referred to [38][39][40].

Concave serrations
The results referring to concave serrations were obtained from the study by Avallone et al. [42], which employed the commercial software package Exa PowerFLOW 5.3b based on the Lattice Boltzmann Method (LBM) [88]. A Very Large Eddy Simulation (VLES) was implemented as viscosity model and a total of around 150 million voxels were used for the discretization of the computational domain.
The airfoil had a span of 80 mm and was equipped with concave serrations designed in a similar manner as the sawtooth serrations explained in Section 3.1.1, but using a spline curve. At the root, the concave serrations are perpendicular to the baseline straight trailing edge, whereas at the tip they are tangent to the line connecting the tip point with the point at 3/4 of the serration length (i.e. $3h_s/2$), see Fig. 2(b). These serrations also had a length of $2h_s = 40$ mm, a width of $\lambda_s = 20$ mm, and a thickness of 1 mm. For a flow velocity of 20 m/s, the concave serrations provided an additional decibel of maximum noise reduction with respect to the standard sawtooth serrations, i.e. $\Delta L_{p,\max} \approx 8$ dB. For additional details on the computational parameters employed, the reader is referred to [42].

3D-printed permeable inserts
A similar setup was employed as for the sawtooth serrations study (see Section 3.1.1), but in this case, the experiments were performed in the anechoic vertical open-jet wind tunnel (A-tunnel) [81] of Delft University of Technology, which is the result of the anechoic refurbishment of the previous V-tunnel.
The airfoil employed was manufactured in aluminum and could be retrofitted with trailing-edge inserts covering the last 20% of the chord (i.e. 40 mm) made of either solid aluminum (baseline) or permeable materials, see Fig. 2(c). This approach was employed by Rubio Carpio et al. [41] for studying the performance of several 3D-printed permeable inserts for TBL-TE noise abatement. The insert considered in this study was 3D-printed using HTM 140 V2, a high-temperature molding material, see Fig. 2(d). The insert had cylindrical channels normal to the chordwise direction that connected the suction and the pressure side with a hole diameter of 0.8 mm and a spacing between holes of 1.5 mm. This insert had a permeability of $5.4 \times 10^{-9}$ m$^2$ and a porosity of 0.392. The flow velocity selected for this study was 26 m/s, for which maximum $\Delta L_p$ values of almost 10 dB were reported. It should be noted that these inserts generated a tonal noise at about 630 Hz in the experiments [41]. The physical mechanism responsible for this tonal noise is not yet known, but it was not observed at angles of attack other than zero. More information about the experimental setup can be found in [41].

Metal foam inserts
The study regarding the metal foam inserts was performed in essentially the same way as the one for 3D-printed permeable inserts explained in Section 3.1.3. In a similar experiment by Rubio Carpio et al. [34], trailing-edge inserts of different metal foams were tested on the same airfoil from Section 3.1.3 in the A-tunnel.
Of the different metal foam inserts studied, the one providing the highest $\Delta L_p$ values was made of a foam of NiCrAl alloy with a porous cell diameter of 800 μm and a flow permeability of $27 \times 10^{-10}$ m$^2$, see Fig. 2(e). The flow velocity selected for this paper was 20 m/s, for which $\Delta L_{p,\max} \approx 10$ dB. This add-on presented a considerable increase of the noise emissions for frequencies higher than about 2.5 kHz, most likely due to the surface roughness. For further details of this experimental setup, the reader is referred to Ref. [34].

Case studies
The four trailing-edge noise reduction measures listed in Section 3.1 were synthetically applied to two large, three-blade HAWTs of 2 MW nominal power. Multiple observer locations were considered to account for different sound propagation conditions: the standard certification location and two further locations representing nearby residents, see Fig. 3.
The two wind turbine types were selected based on the field measurements performed by Pieren et al. [12]:
1. WT I was a Vestas V90-2.0 MW [89]. The hub height of the turbine was $H = 95$ m, the rotor radius was $R = 45$ m, and the rotational speed was 15 rpm. The measurement was performed in the downwind direction (position no. 1 according to the IEC 61400-11 standard for noise certification of wind turbines [17]).
2. WT II was an Enercon E82-2.0 MW [90]. The hub height of the turbine was $H = 78$ m, the rotor radius was $R = 41$ m, and the rotational speed was 16 rpm. The measurement was performed in the upwind direction (position no. 3 according to the IEC 61400-11 standard [17]).
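As a quick plausibility check of these operating points, the sketch below computes the blade passing frequency and the blade tip speed due to rotation for both three-bladed turbines. It yields BPF values of 0.75 Hz and 0.8 Hz and tip speeds of roughly 71 m/s and 69 m/s, in line with the approximate 70 m/s quoted in Section 2.5. The function names are hypothetical, and the wind inflow velocity is neglected in the tip speed.

```python
import math


def blade_passing_frequency(rpm, n_blades=3):
    """Blade passing frequency in Hz for a rotor turning at rpm revolutions per minute."""
    return n_blades * rpm / 60.0


def tip_speed(rpm, rotor_radius_m):
    """Blade tip speed in m/s due to rotation only (wind inflow neglected)."""
    return 2.0 * math.pi * rpm / 60.0 * rotor_radius_m


# (rotational speed in rpm, rotor radius in m) as stated for the two turbines
turbines = {"WT I": (15.0, 45.0), "WT II": (16.0, 41.0)}
for name, (rpm, radius) in turbines.items():
    bpf = blade_passing_frequency(rpm)
    v_tip = tip_speed(rpm, radius)
    print(f"{name}: BPF = {bpf:.2f} Hz, tip speed = {v_tip:.1f} m/s")
```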
Both recordings were performed at strong wind conditions with the turbines operating at nominal power [12]. The microphone was located at a horizontal distance of $R_0 = H + R$ from the tower on a ground plate, following the IEC 61400-11 standard [17], see Fig. 3. Henceforth, this location is referred to as norm. The terrain around both HAWTs was flat and grassy ground. The measured sound signal had a duration of 20 s and is available in [64].
For both wind turbines, the baseline configuration (without any noise reduction measures implemented), as well as the cases with the assumed scaled $\Delta L_p$ provided by the four noise reduction measures explained in Section 3.1, were simulated and auralized. These audio signals were then propagated to three different observer locations for each turbine, see Fig. 3:
1. An observer placed on the ground on a rigid ground plate at a horizontal distance from the tower of $R_0 = H + R$ (i.e. at the norm location). This location replicates the recording position of the field measurements and follows the IEC 61400-11 standard for noise certification of wind turbines [17].
2. An observer placed at a horizontal distance from the tower of 400 m and a height of 1.7 m over the ground (i.e. approximate ear level of an average standing person).
3. Same as the previous location but for a distance of 600 m, instead of 400 m.
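The geometry of these observer locations can be sketched as follows, assuming a single virtual point source at the hub (as in the synthesis parameters described in Section 2.6) and flat terrain. The horizontal distance of the norm location is the hub height plus the rotor radius, i.e. 140 m for WT I and 119 m for WT II. The function name and the dictionary layout are hypothetical.

```python
import math


def slant_distance(horizontal_m, hub_height_m, receiver_height_m=0.0):
    """Straight-line distance in m between a source at the hub and the receiver."""
    return math.hypot(horizontal_m, hub_height_m - receiver_height_m)


# (hub height, rotor radius) in m for the two turbines of the case study
turbines = {"WT I": (95.0, 45.0), "WT II": (78.0, 41.0)}
distances = {}
for name, (hub_height, radius) in turbines.items():
    norm_horizontal = hub_height + radius  # horizontal distance of the norm location
    distances[name] = {
        "norm": slant_distance(norm_horizontal, hub_height),
        "400 m": slant_distance(400.0, hub_height, receiver_height_m=1.7),
        "600 m": slant_distance(600.0, hub_height, receiver_height_m=1.7),
    }
    print(name, {key: round(value, 1) for key, value in distances[name].items()})
```

These slant distances are the inputs that a spreading and absorption model would need when propagating the source signal to each observer.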
The last two locations aim at representing an observer in residential areas close to a wind farm, but larger distances could also be considered since sound propagation is an explicit part of the modeling. For the simulations, flat grassy ground (with an airflow resistivity of 200 kPa s/m$^2$), an air temperature of 10 °C, and a relative humidity of 60% were assumed for the propagation. These meteorological conditions reflect the spatial and yearly averages of Switzerland (and, in fact, central Europe) and how noise exposure is commonly assessed. Therefore, these conditions were employed within the virtual environment to render the auralizations (but not during the conduct of the listening experiment) and to allow for a direct comparison between wind turbines, noise reduction measures, and propagation distances. All the simulated cases amount to a total of 30 auralized audio signals (two wind turbine types × three observer locations × five trailing-edge configurations).

Psychoacoustic experiments
Perception-based evaluation was performed by means of a laboratory listening experiment in the listening experiments facility AuraLab at Empa in Switzerland, see Fig. 4. AuraLab features controlled room acoustics with a reflective floor, low reverberation time (T_mid = 0.11 s) and low background noise level (7 dBA, GK0). The laboratory setup is described in [91]. Monophonic sound reproduction was used via a frontal loudspeaker (Neumann KH 120 A) at a listening distance of 2 m and four subwoofers (Neumann KH 805) for the low-frequency content, allowing for a reasonably flat frequency response between 20 Hz and 20 kHz. Prior to the experiment, the reproduction chain was calibrated with a sound level meter positioned at the listening spot.

Factorial design and stimuli
For the present study, the following design variables were selected: wind turbine type (WT I and WT II), observer distance (norm and 400 m), and trailing-edge configuration (baseline and the four noise reduction measures considered: sawtooth serrations, concave serrations, 3D-printed permeable inserts, metal foam inserts). The experiment used a full factorial design (a complete combination of all levels of all variables) with respect to the design variables, resulting in a total of 20 stimuli (2 wind turbine types × 2 observer distances × 5 trailing-edge configurations). The stimulus duration was set to 20 s, which was previously found to be an optimal stimulus length for wind turbine noise annoyance ratings [8]. The audio data was precomputed as described in Section 2.6.
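The counts quoted in the text can be reproduced by enumerating the factor levels. The following is a minimal sketch; the variable names are illustrative and not taken from the authors' software.

```python
from itertools import product

# Factor levels as named in the text.
TURBINES = ["WT I", "WT II"]
SIMULATED_LOCATIONS = ["norm", "400 m", "600 m"]   # all auralized locations
TESTED_LOCATIONS = ["norm", "400 m"]               # locations used in the listening experiment
CONFIGURATIONS = [
    "baseline",
    "sawtooth serrations",
    "concave serrations",
    "3D-printed permeable inserts",
    "metal foam inserts",
]

# Full factorial combinations: every level of every variable with every other.
auralized_cases = list(product(TURBINES, SIMULATED_LOCATIONS, CONFIGURATIONS))
stimuli = list(product(TURBINES, TESTED_LOCATIONS, CONFIGURATIONS))

print(len(auralized_cases), "auralized signals,", len(stimuli), "experiment stimuli")
```

The listening experiment therefore uses a subset of the auralized signals: the 600 m cases were auralized and analyzed with the metrics but not rated by the subjects.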

Experimental procedure
The listening experiment was designed to assess the short-term noise annoyance reactions to the wind turbines in the different trailing-edge configurations. The experiment was approved by the ethical committee of Empa (approval CMI 2020-143 of 8 June 2020). The procedure followed guidelines, such as [74] and [92], and was conducted similarly to the experiments within the framework of perception-based evaluation of future aircraft technologies [22]. The experiments were performed in a within-subject design, in which all subjects were exposed to all stimuli. The subjects performed the experiments individually, one at a time, doing focused tests in which they listened to and rated the stimuli regarding annoyance. To that end, they used the ICBEN 11-point scale [93] to answer the following question taken from [8] during or after playback of each stimulus (in German): "When you imagine that this is the sound situation in your garden, what number from 0 to 10 best shows how much you would be bothered, disturbed or annoyed by it?" Here, 0 represents the lowest and 10 the highest annoyance rating.
The experimental procedure consisted of:
1. A short introduction to the research topic.
2. Filling out a consent form to participate in the study.
3. A questionnaire about self-reported hearing capability and wellbeing as inclusion/exclusion criteria for participation.
4. The actual listening experiment with an orientation (example stimuli covering the range of situations the subjects would be exposed to), exercise ratings, and the main experiment.
5. A post-experimental questionnaire with questions on subjects' characteristics, such as gender, age, noise sensitivity (NoiSeQ-R [94]: 0 for noise-insensitive to 3 for highly noise-sensitive), and attitudes towards wind turbines ([8]: 0 for very negative to 4 for very positive attitude).
A software application with a graphical user interface guided the subjects throughout the experiment, with automatic playback of the stimuli and recording of the entered annoyance ratings. The procedure closely followed the one described in detail by Schäffer et al. [8].
The experiment took about 40 minutes per subject, and the subjects received a compensation of 20 CHF (approx. 19 €) for participating.
For the playback order of the stimuli, an incomplete counterbalanced hierarchical design was chosen for the two receiver positions and the two wind turbines within the position (four blocks per participant), while the 5 stimuli per block of position per wind turbine (i.e. trailing-edge configurations) were partially balanced. Besides, the first stimulus per block of position per wind turbine was repeated in order to be able to check for consistency of the ratings by means of inter-rater reliability. This resulted in 16 to 23 observations per stimulus, depending on the number of times a stimulus was at the beginning of the block and, thus, repeated. Hence, the subjects rated 24 stimuli in the main experiment (20 different stimuli plus 4 stimuli rated twice).
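The block structure described above can be illustrated with a short sketch. Two simplifying assumptions are made here and are not taken from the study: the orders are drawn at random (instead of the counterbalanced and partially balanced scheme that was actually used), and the repeated first stimulus is placed at the end of its block.

```python
import random
from collections import Counter

CONFIGURATIONS = [
    "baseline",
    "sawtooth serrations",
    "concave serrations",
    "3D-printed permeable inserts",
    "metal foam inserts",
]

def playback_sequence(rng):
    """Build one subject's sequence: 2 positions x 2 turbines = 4 blocks of 5 + 1 stimuli."""
    positions = ["norm", "400 m"]
    rng.shuffle(positions)                      # assumption: random, not counterbalanced
    sequence = []
    for position in positions:
        turbines = ["WT I", "WT II"]
        rng.shuffle(turbines)
        for turbine in turbines:
            block = [(turbine, position, cfg) for cfg in CONFIGURATIONS]
            rng.shuffle(block)
            block.append(block[0])              # assumption: repeat placed at end of block
            sequence.extend(block)
    return sequence

rng = random.Random(0)
sequences = [playback_sequence(rng) for _ in range(16)]   # 16 subjects
observations = Counter(stimulus for seq in sequences for stimulus in seq)

print("ratings per subject:", len(sequences[0]))
print("observations per stimulus:", min(observations.values()), "to", max(observations.values()))
```

With 16 subjects, every stimulus is rated at least 16 times, and each time a stimulus opens a block it collects one additional observation, which is why the number of observations per stimulus varies.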

Subjects
Sixteen subjects (8 females, 8 males) with self-reported normal hearing, who felt healthy and well, and who were not tired at the time of the experiment participated in the study. They were 27 to 57 years old (mean of 41.2 years). The participants covered a wide range of possible noise sensitivities with values from 0.9 to 2.5 (median of 2.1). Thus, the participants were rather noise sensitive. Their attitude towards wind turbines was quite positive, with values from 2.0 to 3.4 (median of 2.7). All subjects were employees of Empa.

Statistical analysis
In a first step, the annoyance ratings were exploratively analyzed and visualized as a function of the design variables (wind turbine, observer distance, and trailing-edge configuration), as well as of the sequence (playback number) and the first presented receiver position, which were both found to be important in previous studies [8,51]. Further, the observed annoyance ratings were compared with PA_mod by means of regression analysis.
In a second step, the significance of the effects was statistically explored using linear mixed-effects models. These models allow separating fixed effects (here, in particular, the design variables) and random effects (the subjects, which were randomly chosen from a population). Such hierarchical analysis (here, with the lower level representing the individual ratings and the upper level the subjects) has been successfully applied in previous wind turbine noise studies [8,51,91]. As fixed effects, the design variables were tested a priori. Further, two-fold interactions between these three variables were tested to check for inter-dependencies, such as whether the annoyance to the trailing-edge configurations depends on the wind turbine type. Finally, simple order effects due to the sequence (playback number) of the stimuli [95], as well as primacy effects due to the first presented receiver position, were also studied, but details are not shown here for the sake of brevity. As described in more detail in [8], several models of different degrees of complexity were tested to choose the final model.
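For orientation, a model of this kind can be written as follows. This is a generic sketch, assuming a random intercept per subject and dummy-coded design variables; the interaction and primacy terms are omitted, and the exact structure of the final model is given in [8] rather than here.

```latex
y_{ij} = \beta_0
       + \beta_{\mathrm{WT}}\, x^{\mathrm{WT}}_{i}
       + \beta_{\mathrm{dist}}\, x^{\mathrm{dist}}_{i}
       + \sum_{k=1}^{4} \beta_{\mathrm{TE},k}\, x^{\mathrm{TE},k}_{i}
       + \beta_{\mathrm{seq}}\, s_{ij}
       + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Here, y_ij is the annoyance rating of subject j for stimulus i, the x terms are indicator variables for the wind turbine type, the observer distance, and the four add-ons relative to the baseline, s_ij is the playback number, u_j is the subject-specific random intercept, and ε_ij is the residual.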
In a third step, linear mixed-effects models using either L_p,A,eq or SQMs as predictors were tested to explore how well these metrics may predict the observed annoyance. For the SQM models, loudness (N_5), tonality (K_5), and sharpness (S_5) were included as predictors, along with sequence, because roughness (R_5) and fluctuation strength (FS_5) were nearly identical between the stimuli. Again, different models were tested to select the final model.
Personal characteristics, such as gender or age, were also tested in the above models but none of them significantly affected the annoyance ratings (with p ranging from 0.13 to 0.78), which was also found in previous studies (e.g. [8,51]). In the last step, the goodness-of-fit of the final linear mixed-effects models was assessed according to [96] and [97], using the marginal (R²_m) and conditional (R²_c) coefficients of determination to quantify the variance explained by the fixed factors and by the fixed and random factors, respectively. These metrics were used to compare the different mixed-effects models.
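The two coefficients can be illustrated with the usual variance decomposition for a random-intercept model. The sketch below assumes this common definition; the function name and the variance components are hypothetical values chosen for illustration, not estimates from the study.

```python
def marginal_conditional_r2(var_fixed, var_random, var_residual):
    """Return (marginal R2, conditional R2) from variance components.

    var_fixed:    variance of the fixed-effects predictions
    var_random:   variance of the random effects (here: between subjects)
    var_residual: residual variance
    """
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    r2_marginal = var_fixed / total                     # fixed factors only
    r2_conditional = (var_fixed + var_random) / total   # fixed and random factors
    return r2_marginal, r2_conditional

# Hypothetical variance components (arbitrary units), for illustration only.
r2_m, r2_c = marginal_conditional_r2(var_fixed=0.59, var_random=0.21, var_residual=0.20)
print(f"marginal R2 = {r2_m:.2f}, conditional R2 = {r2_c:.2f}")
```

The gap between the two coefficients reflects how much of the variance is attributable to differences between subjects rather than to the design variables.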

Results and discussion
This is the first time that the effects of possible wind turbine blade modifications were artificially made audible before being built in a real-life application and that a virtual acoustic environment was used to assess and compare different wind turbine noise scenarios consisting of different wind turbine types, mitigation measures, and propagation distances. The following results first focus on the frequency spectra, then on the acoustic and psychoacoustic metrics, and finally on the short-term noise annoyance observed in the listening experiment as the main indicator for a perception-based evaluation.

Frequency spectra
The ratio between the TBL displacement thickness values (δ*) expected in the full-scale wind turbine and those measured in each of the studies on the noise reduction measures [34,38-42] was approximated using Eq. (2). Values of δ* of around 6 mm were estimated for both wind turbine types considered. Overall, this scaling lowers the frequencies for which the ΔL_p values were obtained by about 15%. Fig. 5 depicts the final aerodynamic broadband noise reduction values for WT I for the four noise reduction measures considered, after interpolating the scaled values to match the center frequencies of the one-third-octave bands. For frequencies not covered in the studies described in Section 3.1, ΔL_p was set to 0 dB. This assumption is expected to be more valid for lower frequencies, for which the low-noise add-ons are comparably smaller than the acoustic wavelengths and, hence, expected to be less effective. The four curves in Fig. 5 follow a similar trend throughout most of the frequency range considered, except for the high-frequency noise increase of the metal foam inserts (above 2 kHz in the scaled application) and the aforementioned tonal noise at 630 Hz (observed as a local minimum here) generated by the 3D-printed permeable inserts. The narrowband spectra for the five generated sound signals (baseline and four add-ons) for each wind turbine type at the norm observer location are presented in Fig. 6. Tones observed as sharp peaks for WT I at 490 Hz and 620 Hz and for WT II at 3.2 kHz are most likely due to mechanical components in the nacelle [12,15,35] as present in the field recordings.
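The scaling and interpolation steps described above can be sketched as follows. The frequency ratio of 0.85 corresponds to the "about 15%" quoted in the text; the input spectrum is hypothetical, and the linear interpolation in dB over log-frequency is an assumption, since the interpolation scheme is not specified here.

```python
import math

# Nominal one-third-octave band center frequencies (subset), in Hz.
BAND_CENTERS_HZ = [100, 125, 160, 200, 250, 315, 400, 500, 630, 800,
                   1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000]

def scale_frequencies(freqs_hz, frequency_ratio):
    """Shift frequencies so that the Strouhal number is preserved at full scale."""
    return [f * frequency_ratio for f in freqs_hz]

def interpolate_to_bands(freqs_hz, reductions_db, band_centers_hz):
    """Interpolate reductions onto band centers; 0 dB outside the covered range.

    freqs_hz must be sorted in ascending order.
    """
    log_f = [math.log10(f) for f in freqs_hz]
    result = []
    for center in band_centers_hz:
        x = math.log10(center)
        if x < log_f[0] or x > log_f[-1]:
            result.append(0.0)          # no data available: assume no noise reduction
            continue
        for i in range(len(log_f) - 1):
            if log_f[i] <= x <= log_f[i + 1]:
                t = (x - log_f[i]) / (log_f[i + 1] - log_f[i])
                result.append(reductions_db[i] + t * (reductions_db[i + 1] - reductions_db[i]))
                break
    return result

# Hypothetical small-scale noise reduction spectrum (not data from the cited studies).
model_freqs_hz = [500.0, 1000.0, 2000.0, 4000.0]
model_reductions_db = [2.0, 5.0, 6.0, 1.0]

full_scale_freqs_hz = scale_frequencies(model_freqs_hz, frequency_ratio=0.85)
band_reductions_db = interpolate_to_bands(full_scale_freqs_hz, model_reductions_db, BAND_CENTERS_HZ)

for center, reduction in zip(BAND_CENTERS_HZ, band_reductions_db):
    print(f"{center:5d} Hz: {reduction:4.1f} dB")
```

In this example the scaled data cover 425 Hz to 3400 Hz, so the bands at 400 Hz and below and at 4 kHz and above receive 0 dB.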
The assumptions made for scaling the noise reductions from small-scale experiments and simulations to full-scale wind turbines explained in Section 2.5, however, pose the following limitations:
• The assumption that the ΔL_p values hold for the same Strouhal numbers is unlikely to be completely fulfilled, given the considerably different flow conditions (Reynolds number, turbulence levels, etc.) found in the wind-tunnel experiments and simulations compared to those in full-scale wind turbines. In addition, all the ΔL_p values from literature refer to a NACA 0018 symmetric airfoil, whereas typical wind turbines are equipped with more complex and cambered airfoils. The performance of the noise reduction measures in those airfoils is likely to be poorer unless they are redesigned accordingly [98].
• The angle of attack considered for all the add-ons was 0° and no serration flap angle (i.e. 0°) was considered, see Fig. 2(a). Actual operational conditions will likely differ from these, altering the performance of noise reduction measures [39], most probably towards lower ΔL_p values.
• For simplicity, the noise reduction measures were assumed to be omnidirectional, i.e. to provide the same ΔL_p in all the emission directions, rather than considering a more complicated and realistic directivity pattern [42,99].
Therefore, due to the combined effect of all these limitations, the expected performance of the four noise reduction measures considered in this research is possibly overestimated, i.e. they might be less effective in reality [52]. Hence, the results presented here should be considered as indicative.

Conventional and sound quality metrics
To acoustically characterize the scenarios, the metrics introduced in Section 2.7 were calculated for the synthesized sound signals. Tables 2 and 3 contain the data for WT I and WT II, respectively.
The values of the conventional metric L_p,A,eq for all cases are presented in Fig. 7. The L_p,A,eq values are larger for the norm location compared to the representative residential observer locations, as expected due to the sound pressure level decrease with distance. Between the 400 m and 600 m locations, the average level difference amounts to 3.5 dBA. All the trailing-edge add-ons reveal a noise reduction compared to the baseline at all observer locations and for both turbines. Similar trends between the studied measures are observed for the different observer locations and wind turbine types. Overall, the sawtooth serrations, concave serrations, and 3D-printed permeable inserts provide a reduction in L_p,A,eq of about 3 dBA, whereas the metal foam inserts perform slightly worse and only reduce the levels by approximately 2 dBA, most likely due to their high-frequency noise increase (see Fig. 5). The concave serrations seem to offer the best noise attenuation performance in terms of L_p,A,eq.
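As a plausibility check of the level difference between the 400 m and 600 m locations, the contribution of geometrical spreading alone can be estimated. The sketch below assumes a point source and uses the horizontal distances only; it ignores the source height, atmospheric absorption, and ground effects, all of which are part of the actual propagation model.

```python
import math

def spreading_loss_db(r_near_m, r_far_m):
    """Level decrease between two distances for spherical spreading from a point source."""
    return 20.0 * math.log10(r_far_m / r_near_m)

difference_db = spreading_loss_db(400.0, 600.0)
print(f"Spherical spreading, 400 m -> 600 m: {difference_db:.1f} dB")
```

This simple estimate is of the same order as the average difference reported above, which suggests that geometrical spreading dominates the level decrease between the two residential locations.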
A behavior comparable to that of the L_p,A,eq metric is observed for the EPNL, but with slightly more accentuated differences between the five cases. The concave serrations and the 3D-printed permeable inserts achieve the largest EPNL reductions (up to about 5.5 EPNdB). In general, it seems that larger noise reductions are obtained for WT II and for increasing distance between the observer and the wind turbine.
Also, similar tendencies are observed for the loudness (N_5) values in phon (i.e. on a logarithmic scale). This is somewhat expected since the A-weighting present in the L_p,A,eq metric is a simplification of the loudness-based threshold of hearing [45]. In this case, however, the differences are more pronounced, and the reductions ΔN_5 with respect to the baseline case reach up to 8 phon for the concave serrations at 600 m distance. In general, the relative improvements provided by the noise reduction measures become more pronounced for larger observer distances. Sawtooth serrations and 3D-printed permeable inserts seem to behave comparably, and offer N_5 reductions up to 6 phon. Metal foam inserts, on the other hand, are slightly louder and only decrease N_5 by up to about 4 phon.
WT I presents higher tonalities (K_5) than WT II, which agrees with its stronger tonal signature observed in the narrowband spectra, see Fig. 6. All the K_5 values range from 0.05 t.u. to 0.2 t.u., which are very similar to those observed by Persson Waye and Öhrström [18] in their experimental study featuring five different wind turbine types at an observer distance of 100 m. All the noise reduction measures slightly increase the tonality with respect to the baseline, especially for the case of WT I. This is explained by the reduction in broadband aerodynamic noise around the two tones present in the baseline spectrum at approximately 490 Hz and 620 Hz (see Fig. 6(a)) provided by the add-ons. A lower broadband noise around the tones decreases the masking of the tones, i.e. they become easier to perceive, which results in higher tonality values. This behavior is less pronounced for WT II because the noise reduction measures have a smaller impact around the main tone present at around 3.2 kHz.
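To give a feeling for the loudness reductions in phon quoted above, they can be converted into ratios of loudness in sone. The sketch below assumes the standard relation in which a change of 10 phon corresponds to a factor of two in loudness, which holds for loudness levels above about 40 phon; whether all stimuli lie in that range is not stated here, so the numbers are indicative only.

```python
def loudness_ratio_from_phon_reduction(delta_phon):
    """Ratio of loudness (in sone) after a reduction of delta_phon in loudness level.

    Assumes the doubling-per-10-phon rule, valid above roughly 40 phon.
    """
    return 2.0 ** (-delta_phon / 10.0)

for label, delta in [("concave serrations", 8.0),
                     ("sawtooth / 3D-printed inserts", 6.0),
                     ("metal foam inserts", 4.0)]:
    ratio = loudness_ratio_from_phon_reduction(delta)
    print(f"{label}: up to {delta:.0f} phon -> loudness ratio {ratio:.2f}")
```

Under this assumption, a reduction of 8 phon corresponds to a little more than half of the baseline loudness in sone, whereas 4 phon corresponds to roughly three quarters.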
The noise reduction measures (except for the metal foam inserts) seem to reduce the sharpness value (S_5) only slightly, since their expected ΔL_p values for frequencies higher than 2900 Hz (those more relevant for the sharpness metric) are relatively low. The metal foam inserts, on the other hand, seem to increase the sharpness in all cases due to the aforementioned high-frequency noise increase caused by these devices. The S_5 values decrease when the distance to the wind turbine is increased because higher frequencies are more strongly affected by the atmospheric attenuation of the sound [100]. The experimental research of Persson Waye and Öhrström [18] found average sharpness values around 2.3 acum at 100 m from the wind turbine, which are larger than the ones observed in this study, which are below 1.2 acum. This difference is most likely due to the shorter distance to the source (100 m) and smaller turbine size (with an average rotor radius of about 18 m, instead of 45 m) present in [18], which cause higher noise levels at higher frequencies [15].
The values of roughness (R_5) and fluctuation strength (FS_5) did not vary significantly with the implementation of the noise reduction add-ons and are, therefore, not presented here for brevity. Moreover, these two parameters are considered to be the least important for the calculation of the psychoacoustic annoyance [43,45].

Fig. 8 depicts the mean observed annoyance reactions in the listening experiment as a function of the wind turbine type, observer distance, and trailing-edge configuration. The metal foam does not reduce the annoyance significantly, compared to the baseline case. In contrast, sawtooth serrations, concave serrations, and 3D-printed permeable inserts all result in clearly reduced annoyance ratings compared to the baseline. This reflects the observed broadband noise reductions obtained with the measures (see Fig. 5), as well as the resulting overall reduction in the L_p,A,eq (see Fig. 7). Further, the observed annoyance to WT I is higher than to WT II, and the annoyance reactions to the latter decrease more strongly with distance (i.e. from norm to 400 m) than those to the former. In contrast, the effectiveness of the measures was quite similar for both wind turbines and did not change significantly between the distances either. Similar performance can, therefore, also be expected for the noise reduction measures at distances between norm and 400 m (and also for larger distances, although this extrapolation of results would have to be experimentally verified).
The mixed-effects model analysis confirmed the observed effects. Trailing-edge configuration, wind turbine type, and observer distance were all significantly linked to noise annoyance and a significant interaction was observed between wind turbine type and observer distance (all p < 0.001). Fig. 9 shows the modeled mean noise annoyance pooled over the two wind turbines as a function of the noise reduction measures for the observer distance of 400 m. As aforementioned, the metal foam inserts did not significantly improve the noise annoyance compared to the baseline, whereas the other measures did. Overall, the mixed-effects model represents the observed annoyance well, explaining approximately 80% of the variance (R²_m = 0.59, R²_c = 0.80).

R. Merino-Martínez et al.

Fig. 9. Mean noise annoyance of the noise reduction measures (modeled data (bars) with the respective 95% confidence intervals), for the observer distance of 400 m, pooled over the two wind turbine types. Statistically significant differences between the measures are indicated by differing letters above.

Potential of the SQMs to predict the observed annoyance
Whereas the mixed-effects model presented above explains a large part of the variance of the observed annoyance ratings, it is relatively complex (including also simple order effects and primacy effects), requiring 13 degrees of freedom (details not shown).
To evaluate the potential of the SQMs or the conventional L_p,A,eq to predict noise annoyance, two alternative models were established (see Section 3.3.4). The model based on SQMs revealed that two of the three SQM predictors, as well as the playback number, were significantly linked to the annoyance (p < 0.001), whereas the third SQM was not (p > 0.25). Despite being much simpler (the model only requires 5 degrees of freedom), its performance was very similar to the model based on the design variables, explaining 79% of the variance (R²_m = 0.55, R²_c = 0.79).
Also the model with L_p,A,eq revealed a significant link of L_p,A,eq and playback number with the annoyance (p < 0.001), requiring only 4 degrees of freedom. This model performs well too, explaining 77% of the variance (R²_m = 0.53, R²_c = 0.77). To interpret these results, however, one has to consider the large range of L_p,A,eq covered by the stimuli, whereas other characteristics, such as the fluctuation strength, were nearly identical between stimuli. If the range of L_p,A,eq were smaller and/or other acoustical characteristics varied, these other characteristics might become more important (see, e.g. [91]), and L_p,A,eq would not be sufficient to explain differences in annoyance, as it does not represent other sound characteristics, such as fluctuation strength (see e.g. [8]).

Fig. 10 compares the observed noise annoyance in the listening experiment with the one estimated using PA_mod. There is a close relation (R² = 0.95) between both parameters, with a non-linear trend and some scatter between the two indicators. However, the relationship between observed annoyance and PA_mod is not universal, but would rather change for different noise sources and/or experiments. Accordingly, Di et al. [79] found substantially different relationships between the measured annoyance and PA_mod in their experiments than those found in the current manuscript. Nevertheless, once the relationship between PA_mod and the observed annoyance has been established for a certain data set as in the current study, PA_mod can potentially be used to estimate annoyance reactions evoked, e.g. by an additional noise reduction measure or different observer locations.
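To illustrate how a combined psychoacoustic annoyance metric is assembled from the individual SQMs, the sketch below implements the classical formulation by Zwicker as commonly given in the psychoacoustics literature. This is not PA_mod: the modified metric used in this study additionally accounts for tonality and is defined in Section 2.7, which is not reproduced here. The function name and the input values are hypothetical.

```python
import math

def zwicker_psychoacoustic_annoyance(n5_sone, sharpness_acum, fluctuation_vacil, roughness_asper):
    """Classical Zwicker psychoacoustic annoyance (no tonality term).

    n5_sone: loudness exceeded 5% of the time, in sone (not phon).
    """
    if n5_sone <= 0:
        raise ValueError("loudness must be positive")
    # Sharpness only contributes above 1.75 acum.
    if sharpness_acum > 1.75:
        w_s = 0.25 * (sharpness_acum - 1.75) * math.log10(n5_sone + 10.0)
    else:
        w_s = 0.0
    # Combined contribution of fluctuation strength and roughness.
    w_fr = 2.18 / n5_sone ** 0.4 * (0.4 * fluctuation_vacil + 0.6 * roughness_asper)
    return n5_sone * (1.0 + math.sqrt(w_s ** 2 + w_fr ** 2))

# Hypothetical input values, for illustration only.
pa = zwicker_psychoacoustic_annoyance(n5_sone=10.0, sharpness_acum=1.0,
                                      fluctuation_vacil=0.1, roughness_asper=0.05)
print(f"PA = {pa:.2f}")
```

In this classical form, sharpness values below 1.75 acum (such as those below 1.2 acum reported in this study) do not contribute at all, and the result is dominated by loudness, which is in line with the good performance of the loudness-related predictors observed above.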

Conclusions
This paper proposed an innovative holistic approach to estimate the annoyance caused by wind turbine noise and to evaluate the performance of rotor blade trailing-edge add-ons to reduce it. This approach consists of auralizing plausible acoustical sceneries of wind turbine noise using a parametric wind turbine synthesis tool based on field experiments. The expected modifications caused by noise reduction measures in the wind turbine noise emission can be synthetically applied and then propagated to different observer locations. The obtained synthetic sound signals can then be reproduced in listening tests or analyzed with psychoacoustic sound quality metrics to estimate the short-term annoyance.
The feasibility of this approach was successfully demonstrated by incorporating data on noise reduction measures from wind-tunnel experiments and computational aeroacoustic simulations from the literature into the auralization process for the first time. A case study featuring two wind turbine types (Vestas V90-2.0 MW and Enercon E82-2.0 MW), three observer locations (IEC 61400-11 standard location, 400 m, and 600 m away from the wind turbine), and five trailing-edge configurations (baseline case, sawtooth serrations, concave serrations, 3D-printed permeable inserts, and metal foam permeable inserts) was employed as a demonstration. The effects of the blade modifications on the noise annoyance were proven and measured by listening experiments and also estimated by psychoacoustic metrics. In the considered example, the concave serrations showed the overall best performance in reducing the annoyance.
The importance of the sound characteristics of the wind turbine noise for the perceived annoyance was highlighted, such as the tonality, the spectral content, or the amplitude modulation (not varied in this study). The characterization of sound by psychoacoustic metrics can help to quickly estimate the short-term annoyance caused in different scenarios and for different observer locations. This is especially useful if the findings are validated by listening experiments.
The differentiation of the modifications allows for rankings of measures and a more reliable estimation of their effect on the residents living near wind farms. This differential perception provides very valuable information for the design of optimal noise reduction measures and, in general, wind turbine types.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.