Auralization of railway noise: Emission synthesis of rolling and impact noise

Within the research project TAURA, a traffic noise auralization system was developed that covers road traffic and railway noise. This paper focuses on an emission synthesizer for railway noise and presents a concept for rolling and impact noise. The synthesis is based on a physical approach in which the noise generation mechanism is modeled in the time domain. As a starting point, equivalent roughness patterns of each wheel and the rail are generated. These spatial signals are used to implicitly model the mechanical excitation of the wheel/rail system. Transfer paths describing the vibrational behavior and the radiation of wheels and rail are implemented as digital filters. This approach features a high degree of flexibility but requires knowledge of the detailed model parameters.


Introduction
Auralization is the technique of artificially making a situation audible. By incorporating prediction models into the auralization process, this technique allows one to listen to situations that do not exist yet. Auralization already has a long tradition in architectural acoustics, notably in room acoustics [1][2][3]. For instance, concert halls and opera houses are typically auralized during the planning process. Auralization has been applied to environmental noise only recently. Several studies on the auralization of road traffic noise [4][5][6][7], aircraft noise [8][9][10][11] and wind turbines [12,13] have been published.
To date, only a few studies have been related to the auralization of railway noise. In [14] the sound quality of traction noise of starting vehicles was assessed using synthesized sounds. In [15] train pass-bys have been auralized based on a combination of filtered and resynthesized binaural recordings. Within the SILENCE project, a software called VAMPPASS was developed which features audio synthesis capabilities for railway vehicle pass-bys [16][17][18]. The tool mainly uses recordings obtained by a microphone array and simulates physical acoustic sources on the vehicle by equivalent point sources [19]. Initial attempts to auralize train pass-bys are also indicated in [20]. In [21] beamforming was applied to obtain audio recordings of sub-sources during train pass-bys. These recordings may be used as input data to synthesize source signals [22].
Based on auralization, different acoustical scenarios may be compared in terms of their perception, e.g. in listening test experiments. For railway noise, where noise mitigation measures are diverse and costly, such an assessment of the effectiveness of different measures could be helpful. Measures at the source as well as on the propagation paths are viable options. Therefore an auralization model for railway noise should be able to simulate both intervention types independently, i.e. measures at the source only, on the propagation path only, or on both. This suggests using separate source and propagation modules, which is in line with Vorländer's definition of the auralization process [2].
Auralization models are either based on audio recordings or include a synthesizer that artificially generates audible signals. In contrast to architectural acoustics where mainly speech or music signals are used, in environmental acoustics it is often desirable to synthesize the emission signals instead of relying on audio recordings. The latter only allows for little variation of different signal aspects. A more versatile method with a much higher degree of freedom, as well as full control of the influencing signal parameters is to synthesize the sounds.
Railway noise consists of different contributions, which may dominate depending on the vehicle and track type, traveling speed, frequency and geometry. It consists of rolling and impact noise as well as noise from secondary sources such as traction, aggregates and aerodynamic noise [23]. Today, in most noise exposure situations rolling and impact noise dominate the A-weighted sound pressure level. Rolling noise is generated by very small amplitude undulations of the wheel and the rail running surfaces. The resulting varying contact forces excite the wheel/rail structure, which consequently vibrates and radiates sound waves. The three main parts contributing to the radiated noise of the wheel/rail interaction are the wheel, the rail and the track sleepers. Impact noise arises due to discrete irregularities on these surfaces [24]. These transient sounds occur notably in the context of wheel flats [25], insulated rail joints or switches [26]. The frequency content of rolling and impact noise is very wide and covers almost the whole audible frequency range. The maximum sound pressure level lies in the mid frequencies, typically in the range of 500 Hz to 2 kHz [27,23]. Hence in an auralization, signals with a wide frequency content need to be utilized.
In the research project TAURA: Traffic Noise Auralisator (2014-2016), an auralization model for traffic noise was developed that covers road traffic [28,7] and railway noise [29]. It will form the basis for future listening test experiments to assess different noise mitigation measures. The objective of this article is to present an emission synthesizer for the rolling and impact noise components of railway noise. Section 2 briefly describes the railway noise measurement campaign that provided suitable data for the model development. In Section 3 the auralization model is established and presented.

Measurements
In the years 2007 and 2008 a large railway noise measurement campaign was carried out in Switzerland in the context of the sonRAIL project. It involved 15 measurement sites and, along with the regular rail traffic, a dedicated measurement train. At all sites, sound pressure and rail accelerations were synchronously measured and passing axles were detected using light barriers. The measurement sites typically consisted of two-track sections (see [30] for the set-up). For each track, sound pressure was measured on both sides at the reference position 'A' according to the standard ISO 3095 [31], that is at a distance of 7.5 m from the centerline of the track and 1.2 m above rail. Furthermore, direct rail roughnesses, track decay rates and propagation attenuations were determined. A vertical microphone array was used for sound source separation.
In Section 3, experimental data from one measurement point and the measurement train is shown (Figs. 3, 4 and 14). The site was located in Lussy (Swiss national coordinates [CH1903+] 2'562'275/1'173'600) on the route Lausanne-Fribourg. The southern track was built in 1994 and consists of concrete monobloc sleepers on ballast substructure and UIC60 rails. Direct roughness measurements from August 2007 yielded weighted roughness levels $L_{\lambda,\mathrm{CA}}$ [32] of 7.4 and 9.4 dB for the two rails, respectively. On that basis, the rail roughness was classified as average [30].
The measurement train consisted of two locomotives, 7 passenger cars and 6 freight wagons and was composed as listed in Table 1. The train passed each measurement site six times in each direction at different speeds, i.e. 1 × 60 km/h, 3 × 80 km/h and 2 × 100 km/h. Subsequent to the pass-by measurements, direct roughness measurements of all 36 freight wagon wheels were conducted.
Parts of this measurement data set were used for the development and validation of the auralization model, which is described in the following section.
Fig. 1 shows the structure of the auralization model, which contains three separate modules. The first module provides the signals emitted by the sources. The emissions are synthesized based on source specifications. The second module is a series of filters that simulate the propagation effects of the sound waves traveling from the source to the virtual observer point. These propagation filters are generated as a function of the source-receiver geometry, topography and weather conditions. Propagation effects due to obstacles, such as buildings [33], barriers [34,13] or natural objects [35][36][37], may also be considered here, e.g. reflections, scattering and shielding. The third module is a reproduction system, which adequately renders the received signals to headphones or a multi-channel loudspeaker system. By considering the sound incidence direction, a spatial impression can be created.

Point source model
To comply with the fundamentals of auralization [2], in particular the separation of sound generation and propagation, the proposed model follows the source-path-receiver concept and uses distributed point sources.
The train composition as an acoustical source is represented by a series of moving point sources. They all move at the same speed which is equivalent to the traveling speed V of the train. For most exposure situations, a train has to be viewed as an extended source as its total length is typically larger than or of similar magnitude as the distance to the receiver point. Therefore the point sources have to be horizontally spread across the train composition. To correctly model ground reflections and shielding, the source height is of importance. As railway noise consists of noise sources of different heights, the point sources in the model are thus horizontally and vertically distributed along the train.
Primary point sources, denoted as $S_{\mathrm{tr}}$ and $S_{\mathrm{veh}}$, are used to represent rolling and impact noise. $S_{\mathrm{tr}}$ represents the contribution radiated by the track, and $S_{\mathrm{veh}}$ the contribution radiated by the vehicle. For each axle $i$, two primary point sources, $S_{\mathrm{tr},i}$ and $S_{\mathrm{veh},i}$, are located at heights 0 and 0.5 m above rail. Secondary point sources, denoted as $S_{\mathrm{sec}}$, are introduced for traction noise, aerodynamic noise, aggregates, etc. Secondary point sources are positioned at different heights, i.e. at 0.5, 2, 3, and 4 m above rail according to state-of-the-art engineering models [38,39,30]. The sonRAIL model [30] describes the sound powers of secondary sources at these predefined heights for different vehicle types. The horizontal distribution of the secondary point sources along the wagons should be defined according to the physical positions of the real sources, i.e. individually for each vehicle type. As a default setting, one stack is located on top of each bogie; the point source locations are exemplified in Fig. 2. However, it can be expected that in many cases fewer secondary sources suffice, e.g. at large distances or in unshielded cases. For a wagon with $N_{\mathrm{ax}}$ axles this yields a maximum of $2N_{\mathrm{ax}} + 8$ point sources. Consequently, for a whole train composition, the number of point sources typically lies well above 100. Note that replacing the emission of the wheel/track system by two stacked equivalent point sources with a prescribed directivity is a simplification of the real system. In fact, the wheel/track system features an extended source, the rail, which also radiates at locations away from the point of contact, especially at medium and high frequencies where vibration attenuation along the rail is low. Here, we follow the standard engineering models such as CNOSSOS-EU [40], where it is assumed that, as the wheel moves on the rail, this system is approximated by a single point source.
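The source count stated above can be checked with a short Python sketch. It assumes that the constant 8 arises from two bogie stacks with four secondary heights each; this reading, the function name and the 15-vehicle, four-axle example are illustrative assumptions and not data from the measurement train.

```python
def max_point_sources(n_axles: int, n_stacks: int = 2, heights_per_stack: int = 4) -> int:
    """Upper bound 2*N_ax + 8 on the number of point sources of one wagon.

    Two primary sources (track and vehicle) per axle, plus secondary sources.
    Assumption: 8 = 2 stacks (one per bogie) x 4 heights (0.5, 2, 3 and 4 m).
    """
    return 2 * n_axles + n_stacks * heights_per_stack


# Hypothetical train: 15 vehicles with four axles each.
total = sum(max_point_sources(4) for _ in range(15))
print(max_point_sources(4), total)
```

With these assumed numbers the total is consistent with the statement that a whole train composition typically requires well above 100 point sources.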

Synthesis approach
Various techniques for digital sound synthesis exist [41]. In the context of sounds from aircraft, wind turbines and road traffic, a combination of additive [42][43][44] and subtractive synthesis [42,43], referred to as spectral modeling synthesis [45,46,44], is commonly used [8,12,10,11,7]. With this synthesis method the emitted sound pressure $p_e$ at time $t$ may be generated by
$$p_e(t) = \sum_l a_l(t) \cos\big(\phi_l(t) + 2\pi f_l t\big) + \sum_j b_j(t)\, m_j(t) \qquad (1)$$
where $a_l$ and $b_j$ are amplitude modulation functions (with $a_l \ge 0$ and $b_j \ge 0$), $\phi_l$ is a phase modulation function (with $0 \le \phi_l \le 2\pi$), $f_l$ is frequency in Hz, $m_j$ denotes third-octave band filtered stationary noise, and $l$ and $j$ are indices of summation. For car engine sounds, a method based on granular synthesis was also developed [6]. As implemented in [16,17], the most obvious approach is to use spectral modeling synthesis for the emission synthesis of train pass-bys as well. For secondary sources, this approach seems particularly justified as their sound generation principles resemble those of other environmental noise sources, e.g. cars or aircraft. However, our initial tests using this synthesizer type (Eq. (1)) for train pass-bys failed. Two reasons for this failure were impact noise, which could not be represented, and the sound characteristics (timbre) of rolling noise, which could not be reproduced. Yet these subtle temporal and spectral patterns are important for realistic auralizations of railway noise. Experimental data that illustrate these characteristics are shown in Figs. 3 and 4. The spectrogram in Fig. 3 contains an excerpt of the sound pressure of the measurement train recorded close to the track (see Section 2). In this example, a series of typical features of railway noise can be pointed out. Firstly, clear broadband differences between the first part (up to time instant 3 s) and the second part can be observed.
This is due to different types of wagons: during the first part, passenger cars (K-braked) are passing by, and in the second part, freight wagons (Ci-braked). Secondly, between seconds 6 and 7, a temporal pattern is present. These vertical lines represent transient sounds which are due to a wheel flat on one of the freight wagon axles. Thirdly, fine spectral patterns consisting of peaks and dips can be observed throughout this example. These spectral peaks and dips seem to differ between the two wagon types. For the passenger cars, peaks occur at about 1.5, 2.3 and 3.0 kHz, whereas for the freight wagons peaks occur at somewhat higher values of 1.8, 2.5 and 3.2 kHz. Fig. 4 shows measured narrowband sound pressure spectra of flat freight wagons traveling at different speeds. Identical wagons were measured at the identical location. Flat wagons were chosen for this example as secondary sources are negligible for this vehicle type and rolling noise dominates. In Fig. 4 spectral peaks are also clearly apparent. Two different kinds of peaks can be discriminated (marked (a) and (b) in Fig. 4): (a) Below 1600 Hz, two widely spaced peaks per speed are present. These peaks show an increasing amplitude as a function of the traveling speed as well as a distinct frequency shift. The frequency location of the peaks scales with the speed. (b) Above 1600 Hz, two peaks per speed are present which feature an increasing amplitude with increasing speed. However, their frequency remains constant. Their bandwidth slightly increases with speed, which is mainly due to the Doppler frequency shift that leads to a frequency smearing. This indicates that two different physical phenomena are involved in creating these spectral patterns.
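For reference, the spectral modeling synthesizer of Eq. (1) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions: the function names, the FFT-masking noise generator and all parameter values (one 1 kHz tone, one noise band from 891 to 1122 Hz) are hypothetical and are not the settings used in the initial tests reported above.

```python
import numpy as np


def band_noise(n, f_s, f_lo, f_hi, rng):
    """Stationary noise limited to [f_lo, f_hi) by zeroing FFT bins (illustrative filter)."""
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n, d=1.0 / f_s)
    spec[(f < f_lo) | (f >= f_hi)] = 0.0
    return np.fft.irfft(spec, n)


def sms_synthesize(t, tones, noise_bands):
    """Evaluate Eq. (1): sum of modulated tones plus sum of modulated band noise.

    tones:       list of (a_l, phi_l, f_l) with callables a_l(t), phi_l(t) and f_l in Hz
    noise_bands: list of (b_j, m_j) with callable b_j(t) and a noise array m_j
    """
    p = np.zeros_like(t)
    for a_l, phi_l, f_l in tones:
        p += a_l(t) * np.cos(phi_l(t) + 2.0 * np.pi * f_l * t)
    for b_j, m_j in noise_bands:
        p += b_j(t) * m_j
    return p


f_s = 48_000
t = np.arange(f_s) / f_s  # one second of audio
rng = np.random.default_rng(1)
tones = [(lambda t: np.ones_like(t), lambda t: np.zeros_like(t), 1000.0)]
bands = [(lambda t: 0.1 * np.ones_like(t), band_noise(len(t), f_s, 891.0, 1122.0, rng))]
p_e = sms_synthesize(t, tones, bands)
```

Such a synthesizer only shapes stationary tones and noise bands, which illustrates why the transients of a wheel flat and the speed-dependent peaks of Fig. 4 are difficult to represent with it.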
Therefore in the next section we propose a physical approach that includes modeling of the mechanical excitation of the wheel/rail system and the vibrational behavior of the system separately. A physical approach on that basis has the advantage that it allows the extrapolation to situations for which no synthesizer parameters have been measured. This approach has already proven to be successful for the synthesis of transient sounds, specifically percussion and bells [47][48][49] or footstep sounds [50]. For railway noise, transient sounds occur notably in the context of wheel flats [25], insulated rail joints or switches [26].

Emission synthesis of rolling and impact noise
As mentioned above, the synthesis of rolling and impact noise is based on a physical approach. The same approach is also reflected in theoretical models such as TWINS [51] and state-of-the-art railway noise engineering models such as Harmonoise [38], IMAGINE [39], sonRAIL [30] or CNOSSOS-EU [40]. The basis of the proposed auralization model is (in a statistical sense) an estimate of the  surface microstructure (i.e. the roughness) of the rail and the wheels. These roughness signals are then combined and processed to obtain the mechanical excitation of the wheel/track structure. Using the vehicle speed, a transformation from the spatial into the time domain is performed. Next, the modal behavior of the structure and the radiation are simulated. These effects are characterized by transfer paths which are applied in the time domain using digital filters.
As an overview, Fig. 5 shows the signal flow of the proposed emission synthesizer for a pass-by of a single axle $i$. The yellow blocks represent generator modules which produce spatial signals, i.e. depending on the location along the rail axis. These spatial signals are summed up and modified by a contact filter. Subsequently, the resulting spatial signal is converted to the time domain by a vehicle speed dependent resampling. After a differentiation with respect to time to yield vertical speed, the resulting excitation time signal is then fed into two transfer filters modeling the vibration and the radiation efficiency of the system. Their output is amplitude modulated by a directivity function representing the radiation pattern of the source. The resulting signals correspond to the free field sound pressure at a defined reference distance of 1 m, as radiated by the track and the vehicle, respectively. Accordingly these signals are attributed to point sources $S_{\mathrm{tr},i}$ and $S_{\mathrm{veh},i}$.
The synthesizer in Fig. 5 thus delivers two signals per axle. These are two digital audio signals $e_{\mathrm{tr},i}[k]$ and $e_{\mathrm{veh},i}[k]$ for each axle $i$ with $k$ as sample index. Generally an audio sampling rate of $f_s = 48$ kHz is used. The following sections show in detail how these signals are generated.

Roughness generators
The four generator modules in Fig. 5 produce roughness signals as a function of the spatial coordinate X. In order to do this, they use either measured or predicted roughness spectra as input. The operating principle of the generators is depicted in Fig. 6. It is based on the subtractive synthesis technique but differs for each generator in terms of input and basic waveform. Measurement data indicate a strong wavenumber dependency of roughness signals with substantial low wavenumber content (see e.g. Figs. 7 and 9). These wavenumber dependencies are realized by applying filters to broadband signals.
The basis of the rail and wheel roughness profiles are white noise signals. These signals are processed using digital FIR filters. In doing so, the initial constant spectral content is shaped. The corresponding filters are designed in the frequency domain based on spectral input data. By contrast, for impacts and wheel flats, discrete impulses are utilized as basic waveforms. These impulses are spatially shifted to their respective location and filtered in order to bring them into the desired spectral shape. In the following, the details on the computation of the roughness profiles (rail, wheel, impact and flats) are given.
The rail roughness generator uses a rail roughness spectrum $L_{r,\mathrm{tr}}$ given in 1/3 octave bands as input. This data may be obtained by measurements on a rail or from a railway noise calculation model. It is generally assumed that roughness data is available in 1/3 octave bands for wavelengths from 63 to 0.1 cm, such as depicted in Fig. 7. To be able to reproduce this range with digital signals (Nyquist-Shannon sampling theorem), a resolution of $\Delta X = 0.45$ mm is used for the spatial coordinate $X$, which corresponds to a spatial sampling rate of $n_s = 1/\Delta X \approx 2200$ samples per meter. Based on that, the rail roughness is modeled by a white Gaussian noise signal $w$ (basic waveform) that is spectrally shaped using a digital FIR filter (shaping filter) with impulse response $h_{\mathrm{track}}$. The rail roughness is generated by
$$r_{\mathrm{track}}[x] = (w * h_{\mathrm{track}})[x] = \sum_{m=0}^{N-1} h_{\mathrm{track}}[m]\, w[x-m] \qquad (2)$$
where $*$ denotes the linear discrete convolution, $x$ is the sample index and $N$ is the number of filter taps of the filter $h_{\mathrm{track}}$ with order $N - 1$. Eq. (2) is efficiently evaluated by the overlap-add method, which uses the fast Fourier transform (FFT). The noise signal $w$ is scaled to have unit power per wavenumber; its samples are derived from random numbers $R_x$ that follow a normal distribution with zero mean and variance 1. For the filter design, i.e. the calculation of the filter coefficients $h_{\mathrm{track}}[m]$, the frequency sampling method is adopted. This method uses the inverse discrete Fourier transform (IDFT), or the inverse FFT (IFFT), respectively. Prior to this transformation some interpolation and extrapolation steps are necessary:
1. From the rail roughness levels $L_{r,\mathrm{tr},j}$ given per 1/3 octave band $j$, the signal power per unit spatial frequency $n$ is estimated (Eq. (4)), using the center frequency $f_{c,j}$ and the bandwidth $B_j$ of the 1/3 octave band filter $j$.
2. $L_{r,\mathrm{tr}}[n]$ is linearly extrapolated to the spatial frequencies $n = 0$ and $n_s/2$.
3. $L_{r,\mathrm{tr}}[n]$ is interpolated to the FFT bin frequencies $n$ using piecewise cubic Hermite interpolation to obtain a scaled filter magnitude response $A[n]$ in dB.
4. From $A[n]$ a linear phase FIR filter is calculated using the IFFT. A basic version of the filter, $h_{\mathrm{basic}}$, is obtained from the IFFT of the magnitude response, with the scaling variable $a_0$ being equal to the reference roughness $r_0$, i.e. $a_0 = r_0 = 10^{-6}$ m.
5. As a final step, $h_{\mathrm{basic}}$ is symmetrically truncated around $m = 0$ to a total of $N$ filter taps, shifted by $N/2$ samples to make the filter causal and multiplied by an $N$-point Tukey window with $\alpha = 0.2$.
The resulting signal corresponds to the filter coefficients of the desired FIR filter.
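The five design steps can be sketched as follows in Python with NumPy. Several details are assumptions made for illustration only: step 1 is taken as the band power divided by the 1/3 octave bandwidth, the extrapolation is linear in dB over spatial frequency, plain linear interpolation replaces the piecewise cubic Hermite interpolation to avoid a further dependency, and the noise variance $n_s/2$ is one possible reading of "unit power per wavenumber". The band levels in the usage part are hypothetical.

```python
import numpy as np


def tukey(n_taps, alpha=0.2):
    """Tukey (tapered-cosine) window, written out to avoid extra dependencies."""
    x = np.linspace(0.0, 1.0, n_taps)
    w = np.ones(n_taps)
    rise, fall = x < alpha / 2, x > 1 - alpha / 2
    w[rise] = 0.5 * (1 + np.cos(2 * np.pi / alpha * (x[rise] - alpha / 2)))
    w[fall] = 0.5 * (1 + np.cos(2 * np.pi / alpha * (x[fall] - 1 + alpha / 2)))
    return w


def design_shaping_filter(fc, band_level_db, n_s, n_taps=2000, n_fft=2**15, a0=1e-6):
    """Frequency-sampling design of a shaping FIR filter from 1/3 octave band levels."""
    fc = np.asarray(fc, dtype=float)  # ascending band centers in 1/m, below n_s/2
    band_level_db = np.asarray(band_level_db, dtype=float)
    # Step 1 (assumed form): band power divided by the 1/3 octave bandwidth.
    bandwidth = fc * (2 ** (1 / 6) - 2 ** (-1 / 6))
    psd_db = band_level_db - 10 * np.log10(bandwidth)
    # Step 2: linear extrapolation (dB over spatial frequency) to 0 and n_s/2.
    lo = psd_db[0] - fc[0] * (psd_db[1] - psd_db[0]) / (fc[1] - fc[0])
    hi = psd_db[-1] + (n_s / 2 - fc[-1]) * (psd_db[-1] - psd_db[-2]) / (fc[-1] - fc[-2])
    grid_f = np.concatenate(([0.0], fc, [n_s / 2]))
    grid_db = np.concatenate(([lo], psd_db, [hi]))
    # Step 3: interpolate to the FFT bins (linear here instead of cubic Hermite).
    bins = np.fft.rfftfreq(n_fft, d=1.0 / n_s)
    a_db = np.interp(bins, grid_f, grid_db)
    # Step 4: zero-phase impulse response from the linear magnitude via the inverse FFT.
    h_basic = np.fft.irfft(a0 * 10 ** (a_db / 20), n_fft)
    # Step 5: keep n_taps around m = 0, shift by n_taps/2 (causal), apply the window.
    return np.roll(h_basic, n_taps // 2)[:n_taps] * tukey(n_taps, 0.2)


n_s = 1 / 0.45e-3                       # spatial sampling rate in samples per meter
fc = 10.0 ** (np.arange(2, 31) / 10)    # 1/3 octave centers, about 1.6 to 1000 1/m
levels = 20.0 - 10.0 * np.log10(fc)     # hypothetical band levels in dB re 1e-6 m
h_track = design_shaping_filter(fc, levels, n_s)

rng = np.random.default_rng(0)
w = np.sqrt(n_s / 2) * rng.standard_normal(int(10 * n_s))  # assumed noise scaling
r_track = np.convolve(w, h_track, mode="same")             # 10 m of rail roughness in m
```

For long signals, the direct convolution in the last line would be replaced by an FFT-based overlap-add evaluation as mentioned for Eq. (2).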
The above described filter design procedure was numerically evaluated for the six spectra given in Fig. 7. Rail roughness signals $r_{\mathrm{track}}$ of length 200 m were calculated by Eq. (2) and analyzed in 1/3 octave bands. Fig. 8 shows the mean absolute errors (MAE) of a parametric study. It revealed that 2000 filter taps are needed to keep the errors low at low wavenumbers as well, meaning that a filter length of about 1 m is required. Further, note that filter errors are larger if measurement data is used as input instead of data from prediction models. This is due to the uneven, jagged shape of a single measurement compared to the smooth shape of model curves (see Fig. 7), with the latter being easier to model by a filter.
Fig. 7. Rail roughness level data as a function of the wavenumber in 1/3 octave bands. Three curves for different rail condition classes from the sonRAIL model [30], the two curves from CNOSSOS-EU [40] (denoted as based on ISO 3095:2013 [31] and to represent an average network, respectively) and a single measurement are shown. Note that at low and high wavenumbers most values appear to be (linearly) extrapolated.
Next, the wheel roughness generator produces a periodic signal with the period being equal to the wheel perimeter $P_i$ in m (i.e., of the running surface) for wheel $i$. Along with the wheel perimeter, a wheel roughness spectrum $L_{r,\mathrm{veh},i,j}$ is needed as input. Fig. 9 shows wheel roughness data in 1/3 octave bands. The wheel roughness is modeled by a sequence $s$ of finite length white noise snippets (basic waveform) that is spectrally shaped using a digital FIR filter (shaping filter) with impulse response $h_{\mathrm{veh},i}$. The wheel roughness $r_{\mathrm{veh},i}$ of wheel $i$ is generated by
$$r_{\mathrm{veh},i}[x] = (s_i * h_{\mathrm{veh},i})[x] \qquad (7)$$
The basic waveform $s_i$ in Eq. (7) is scaled to have unit power per wavenumber and has a signal period equal to the wheel perimeter expressed in samples, i.e. $P_i n_s$ rounded to the nearest integer. Within one period, its samples are derived from random numbers $R_{i,x}$ that follow a normal distribution with zero mean and variance 1. The filters $h_{\mathrm{veh},i}$ are designed based on wheel roughness levels $L_{r,\mathrm{veh},i,j}$ analogously to the rail roughness. However, as a relatively short independent signal of only one period per wheel is generated, larger errors are expected for the roughness of a single wheel than for the rail, due to the statistical nature of the basic waveform. A typical wheel diameter of 0.92 m results in a perimeter of about 2.9 m and a signal period of about 6400 samples. For rolling highway vehicles, by contrast, the wheel perimeter is only approx. 1.1 m, i.e. about 2500 samples, which is of the same order of magnitude as the number of filter taps required for $h_{\mathrm{veh}}$. Fig. 10 shows simulation results where the wheel roughness generation is evaluated. It can be observed that larger errors occur at low wavenumbers and particularly for the small wheel. In these cases, the model underestimates the required roughness levels. Furthermore, the difference between mean and mean absolute errors indicates that a single wheel may exhibit large errors at some frequencies, while the average error over many wheels approaches 0 dB. Thus, for the auralization of a full train consisting of many wheels, this approach seems justified.
The rail impact generator uses a spatially shifted impulse as basic waveform and a shaping filter $h_{\mathrm{imp}}$. The equivalent impact roughness $r_{\mathrm{imp}}$ is calculated by filtering a scaled Kronecker delta located at $x_I$ with $h_{\mathrm{imp}}$ (Eq. (11)), where $x_I$ denotes the sample index of the impact location $I$ on the rail. The scaling factor in Eq. (11) ensures that each Kronecker delta function (basic waveform) has unit energy per wavenumber.
$h_{\mathrm{imp}}$ is designed based on an equivalent roughness spectrum $L_{r,\mathrm{imp},j}$ for a reference segment of 1 m length given in 1/3 octave bands (see Fig. 11). However, in the literature, different definitions for the equivalent roughness of impacts exist. On the one hand, CNOSSOS-EU [40] contains an equivalent roughness spectrum for impact noise, which corresponds to the data set from the IMAGINE model [39]. This spectrum is added to the total roughness, i.e. without applying a contact filter. However, the presented auralization model uses an impact roughness spectrum which is added before application of the contact filter and thus requires a compensation.
On the other hand, the sonRAIL emission model contains impact roughness spectra for wooden and concrete sleepers. However, the model and thus the data is defined in such a way that the impact roughness only affects the sound power radiated by the track, and not the component radiated by the vehicle, which is in contrast to the model structure shown in Fig. 5. A correction spectrum can be estimated as the level difference between the track transfer function and the total transfer function (energetic sum of track and vehicle contribution), both transformed to wavelengths. Thus, both data sets have to be adapted in order to be used in the described auralization model. Fig. 11 shows modified data based on the values from CNOSSOS-EU and sonRAIL. The spectra feature a pronounced low-wavenumber content with remarkably larger values than the rail roughness spectra from Fig. 7. This data may be used to design $h_{\mathrm{imp}}$ with the above described five-step filter design procedure starting with Eq. (4). This results in symmetrically shaped impulses. Calculations for these spectra showed that a filter length of 2000 taps is also needed here in order to keep the filter errors low. Additionally, to attenuate the occurring DC component, $h_{\mathrm{imp}}$ is high-pass filtered with a cutoff frequency of 0.8 m$^{-1}$, which is well below the lowest 1/3 octave band.
[Figure caption fragment: data from the sonRAIL model [30] and from CNOSSOS-EU [40] are shown together with measurement data of a single wheel.]
The roughness corresponding to a wheel flat is modeled using a spectrally shaped impulse train, i.e. the wheel flat generator uses an impulse train with a period equal to the wheel perimeter as basic waveform and a shaping filter $h_{\mathrm{flat}}$. The equivalent wheel flat roughness $r_{\mathrm{flat},i}$ is calculated by filtering this scaled impulse train with $h_{\mathrm{flat}}$ (Eq. (12)), where $x_0$ is a random offset of the impulse train and the scaling factor ensures unit power per wavenumber of the basic waveform. The impulse response $h_{\mathrm{flat}}$ is calculated based on an equivalent roughness spectrum for a wheel flat, $L_{r,\mathrm{flat},j}$, given in 1/3 octave bands. Here, $L_{r,\mathrm{flat}}$ is defined analogously to $L_{r,\mathrm{veh}}$, i.e. as a signal power, but considering only the isolated wheel flat and not the remaining roughness of the wheel, which is already represented by $r_{\mathrm{veh}}$. Fig. 12 shows measurement data of two wheels with a wheel flat. The two wheels belong to the same wheelset and are responsible for the transient sounds shown in Fig. 3. Equivalent roughness spectra $L_{r,\mathrm{flat}}$ and $L_{r,\mathrm{veh}}$ as used by the presented auralization model are labeled as "flat" and "residual", respectively, in Fig. 12. They were derived by spatially windowing roughness raw data before applying a 1/3 octave filterbank. In this data set, the wheel flat dominates the total wheel roughness at low wavenumbers only. This means that the total equivalent roughness (as given in the sonRAIL model) should not be attributed entirely to the wheel flat. This justifies the chosen modeling approach in which, for a wheel with a wheel flat, the remaining wheel roughness is also considered.
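The periodic basic waveforms of the wheel and wheel flat generators can be sketched as follows in Python with NumPy. The scaling factors of the original equations are deliberately left out, and the function names are hypothetical; only the period in samples (wheel perimeter times $n_s$, rounded) and the periodic structure are illustrated.

```python
import numpy as np


def wheel_period_samples(diameter_m, n_s):
    """Signal period in samples: wheel perimeter times spatial sampling rate, rounded."""
    return int(round(np.pi * diameter_m * n_s))


def periodic_noise(n_samples, period, rng):
    """Wheel basic waveform: one white-noise snippet repeated with the wheel period."""
    snippet = rng.standard_normal(period)
    reps = -(-n_samples // period)  # ceiling division
    return np.tile(snippet, reps)[:n_samples]


def impulse_train(n_samples, period, offset):
    """Wheel flat basic waveform: unit impulses every `period` samples, starting at `offset`."""
    train = np.zeros(n_samples)
    train[offset % period::period] = 1.0
    return train


n_s = 1 / 0.45e-3
period = wheel_period_samples(0.92, n_s)       # about 6400 samples for a 0.92 m wheel
rng = np.random.default_rng(0)
s_i = periodic_noise(int(20 * n_s), period, rng)                       # 20 m of track
flat = impulse_train(int(20 * n_s), period, int(rng.integers(period)))  # random offset x_0
```

Both waveforms would subsequently be shaped by $h_{\mathrm{veh},i}$ and $h_{\mathrm{flat}}$, respectively, in the same way as the white noise of the rail generator.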

From roughness to excitation
The total roughness $r_{\mathrm{tot},i}$ is modeled by summation of the four signals obtained from Eqs. (2), (7), (11) and (12), from which the effective roughness $r_{\mathrm{eff}}$ is obtained by applying a contact filter with impulse response $h_{\mathrm{contact}}$. The contact filter models the effect of the finite contact zone of wheel and rail. This leads to a smoothing of the roughness, i.e. an attenuation of short wavelength variations, which yields a low-pass behavior in the spatial domain. This effect depends on the wheel diameter and the axle load. The output of this filter corresponds to the effective roughness
$$r_{\mathrm{eff}}[x] = (r_{\mathrm{tot},i} * h_{\mathrm{contact}})[x].$$
$h_{\mathrm{contact}}$ is designed based on data, $A_{3,i,j}$, describing its magnitude response in 1/3 octave bands. A procedure similar to the above described five-step filter design procedure starting with Eq. (4) is used for the filter design. However, step 1 is replaced by $A_{3,i}[n = f_{c,j}] = A_{3,i,j}$ because $A_3$ already describes an attenuation, and the reference variable is set to $a_0 = 1$. Basically, $A_3$ describes a low-pass behavior and depends on the wheel size and the axle load. At low wavenumbers, $A_3$ is 0 dB, with a cutoff frequency at a wavelength of about 2.5 cm. Thanks to its broadly constant attenuation at low frequencies, significantly fewer filter taps (<200) are needed for $h_{\mathrm{contact}}$ than for the filters within the above described roughness generators, which require 2000 taps. Fig. 13 shows an example of the spatial signals which were generated with the described model for an extreme case where an impact as well as a wheel flat are present. The equivalent roughness of a rail, an impact, a wheel and a wheel flat as well as the combined effective roughness are illustrated. From top to bottom, the first panel shows the random structure of the rail roughness which was generated based on the measurement data from Fig. 7. At location 7 m an impact is present which has a much higher magnitude than the rail roughness.
In the second panel, the wheel diameter of 0.92 m leads to signal periods of 2.9 m which is also partially reflected in the effective roughness (third panel). The signals were generated based on the measurement data of the wheel W1 from Fig. 12. The peaks of the wheel flat are distinctly above   [39,30]. This translates into X ¼ tV ¼ x=n s where t denotes time in seconds. Applying this transformation to the spatial signal r eff ½x yields the deflection time signal where k is the time sampling index, f s is the audio sampling rate, n s denotes the spatial sampling rate and T 0;i is the time instant when axle i passes the location x ¼ 0. This compression of the abscissa by V corresponds to a stretching of the frequency axis by V, and vice versa. In the digital domain, this transformation is known as resampling. In Eq. (15), it cannot be assumed that x are integers. Therefore the resampling process needs some interpolation. If the vehicle speed is constant, this process is known as synchronous resampling or sampling rate conversion. From Eq. (15) it can be seen that the conversion ratio C depends on V; n s and f s and is given by In our case, the conversion rate C takes values between 1.3 and 0.55 for speeds between 60 and 140 km/h. At a speed of about 78 km/h C ¼ 1, which means that no resampling operation is needed. A computationally efficient method to solve Eq. (15) is rational (or fractional) resampling which involves up-and downsampling operations by integers L and M, respectively. This however requires C to be a fixed, rational number. Therefore in an implementation it is convenient to approximate C by e.g.
In choosing $M$, a compromise between computational load and temporal and spectral errors has to be found. The approximation in Eq. (17) leads to a maximal relative frequency error of $1/(2 C M)$ (18). Hence, the error is most critical at high traveling speeds, where $C$ is small. For 140 km/h, an acceptable value of 2% for the maximal frequency error can be achieved with $M = 44$. The lowest panel in Fig. 13 shows the deflection signal $f$, which was calculated by Eq. (15) from the effective roughness $r_{\mathrm{eff}}$ shown in the panel above for $T_0 = 0$ s and $V = 80$ km/h. The subsequent differentiation of $f_i$ with respect to time $t$ yields the effective roughness velocity time signal $v_i(t) = \mathrm{d}f_i(t)/\mathrm{d}t$ (19). This signal is the basis for the excitation of the dynamic wheel/rail system. Together with the frequency-dependent mobilities of the track, the wheel and the contact zone, the contact forces can be derived [23]. However, the derivation of the contact forces requires very detailed knowledge of the complex dynamic system and is thus very demanding. Therefore, in the proposed auralization model, the contact forces are not explicitly identified but rather included in an integral way as described in Section 3.3.3. The differentiation in Eq. (19) corresponds to a multiplication by $j\omega$ in the frequency domain. This operation may be approximated using an FIR filter, where the filter order determines the quality at high frequencies. It is therefore convenient not to implement Eq. (19) explicitly, but rather implicitly by integrating it into the subsequent transfer path filters.
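The relation between vehicle speed, conversion ratio and its rational approximation can be illustrated numerically. The following Python sketch is an illustration under stated assumptions, not the implementation used in this work: it assumes $C = f_s/(n_s V)$, which reproduces the values quoted above, an audio sampling rate `FS` of 44100 Hz and a spatial sampling rate `NS` of 2035 samples per metre (both values are assumptions chosen so that $C$ is close to 1 near 78 km/h). The function names are hypothetical, and linear interpolation stands in for whatever interpolation an actual implementation would use.

```python
FS = 44100.0   # audio sampling rate in Hz (assumed value)
NS = 2035.0    # spatial sampling rate in samples per metre (assumed value)


def conversion_ratio(speed_kmh, fs=FS, ns=NS):
    """Conversion ratio C = f_s / (n_s * V), with V converted to m/s."""
    return fs / (ns * speed_kmh / 3.6)


def rational_approximation(c, m):
    """Approximate C by L/M with integer L = round(C * M).

    Returns L, the actual relative error and its worst-case bound 1/(2*C*M).
    """
    l = max(1, round(c * m))
    rel_error = abs(l / m - c) / c
    bound = 1.0 / (2.0 * c * m)
    return l, rel_error, bound


def deflection_signal(r_eff, speed_kmh, t0=0.0, fs=FS, ns=NS):
    """Read r_eff at x = n_s * V * (k / f_s - t0) using linear interpolation.

    Positions outside the spatial signal are set to zero.
    """
    v = speed_kmh / 3.6
    n_out = int((len(r_eff) - 1) * fs / (ns * v)) + 1
    out = []
    for k in range(n_out):
        x = ns * v * (k / fs - t0)
        if x < 0.0 or x > len(r_eff) - 1:
            out.append(0.0)
            continue
        i = min(int(x), len(r_eff) - 2)   # lower neighbour index
        frac = x - i
        out.append((1.0 - frac) * r_eff[i] + frac * r_eff[i + 1])
    return out


if __name__ == "__main__":
    for speed in (60.0, 78.0, 140.0):
        c = conversion_ratio(speed)
        l, err, bound = rational_approximation(c, 44)
        print(f"{speed:5.0f} km/h  C={c:.3f}  L/M={l}/44  "
              f"error={100 * err:.2f}%  bound={100 * bound:.2f}%")
```

With these assumed values, $M = 44$ at 140 km/h gives $L = 25$ and a worst-case bound of about 2%, in line with the figure quoted above.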

Transfer path filters
The two subsequently applied transfer path filters implicitly describe a series of effects. They incorporate the transformation from the velocity signal $v_i$ to the contact forces which excite the wheel/rail structure. Further, they simulate the vibrational behavior of the wheel and track and their radiation efficiencies. The outputs correspond to the radiated sound pressure at a reference distance. In today's engineering models, these effects are often summarized by two transfer functions, one for the vehicle and one for the track [39,30,40]. Following this approach, we formulate the radiated sound pressure in the axial direction, $p_{\mathrm{ax}}$, at a reference distance of 1 m separately for the vehicle and the track using two filter impulse responses $h_{\mathrm{struc}}$, i.e. as the convolution $p_{\mathrm{ax,veh},i}[k] = (v_i * h_{\mathrm{struc,veh}})[k]$ for the vehicle (Eq. (20)) and the corresponding expression for the track (Eq. (21)). The transfer functions describe the combinations of Eq. (19) and (20) or (21), respectively. However, in the literature, different definitions are used, which may, for example, be formulated for a different reference distance or for sound power instead of sound pressure. This can be compensated for either by adding a constant to the level data or by applying an additional gain correction to the resulting signals, $p_{\mathrm{ax}}$. The published spectra feature a smooth high-pass behavior as they also include the differentiation stated above (Eq. (19)), which corresponds to an $\omega$-proportionality in the frequency domain.
In a first attempt, the transfer path filters were designed with linear phase and a smooth magnitude response based on 1/3 octave band data. However, listening tests revealed that different filter types with a more complex frequency response are needed. The 1/3 octave band frequency resolution is justified for noise prediction but appears to be too coarse for realistic auralizations, for which the timbre plays an important role. This is a major difference from previous attempts at railway noise auralization, e.g. VAMPPASS.
Similar to Fig. 4, Fig. 14 shows measured narrowband spectra of sound pressure signals recorded close to the passing of axles of the same type. Between 1 and 8 kHz, the fluctuating frequency dependency of rolling noise can be clearly observed. Fig. 14 further illustrates that the distinct peaks do not depend on the vehicle speed. Thus, it appears that they are not linked to the spatial domain, i.e. to roughness. For higher speeds, the peaks are still present but somewhat less pronounced due to the Doppler effect. We conclude that the narrowband level variations shown in Fig. 14 can most probably be attributed to the modal behavior of the structure. Consequently, within the model structure depicted in Fig. 5, this effect has to be incorporated in the transfer path filters.
The separation into excitation and structural vibration is typical for physical modeling synthesis. To describe the vibration, we propose to use the modal synthesis technique, in which a resonating structure is described in terms of its modes [52,48]. For an underdamped system, the modal resonators can be modeled by second-order oscillators (also known as damped harmonic oscillators). They can be realized by $U$ digital filters that are connected in parallel [48,49]. Each filter is designed using the resonance frequency $f_u$ and the decay rate $\alpha_u$ of the respective mode $u$. The decay rate is related to the structural reverberation time $T_{60,u}$ in seconds for a 60 dB drop by
$\alpha_u = \frac{3 \ln(10)}{T_{60,u}}$ (22)
and to the bandwidth $\Delta f_u$ of the resonance by $\Delta f_u = \alpha_u/\pi$. The total impulse response $h_{\mathrm{decay}}$ of a transfer path filter may be approximated by
$h_{\mathrm{decay}}(t) = \sum_{u=1}^{U} A_u \, e^{-\alpha_u (t - \tau_u)} \sin\left(2\pi f_u (t - \tau_u)\right) H(t - \tau_u)$ (23)
with the Heaviside function $H$, delays $\tau_u$ and amplitudes $A_u$. The main challenge is to find an appropriate setting of the parameters $f_u$, $\alpha_u$, $A_u$ and $\tau_u$. To do so, detailed information about the dynamic behavior is required. The required parameters cannot be determined separately for a single wheel or a track, as they differ for the combined coupled dynamic system. The interaction adds further resonances and damping [23]. It is also known that the rolling of the wheel affects these parameters. For instance, compared to a wheel at rest, a rolling wheel possesses additional and differing resonances [23]. Consequently, as isolated measurements or simulations are not expedient and data from the current literature are not sufficient, the estimation of the model parameters in Eq. (23) may be treated as an inverse problem. We propose to use suitable pass-by measurements to fit the required parameters.
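The modal description above can be sketched in a few lines of Python. The sketch below is illustrative only and makes several assumptions: the decay rate uses the standard 60 dB relation $\alpha = 3\ln(10)/T_{60}$, the mode parameters in the usage example are placeholders rather than fitted values, and the function names are hypothetical. It evaluates the sum of damped sinusoids directly and, for a single mode without delay, also realizes the same response as a two-pole recursive filter, which is one common way to implement a second-order oscillator.

```python
import math


def decay_rate(t60):
    """Decay rate alpha in 1/s for a 60 dB drop within t60 seconds."""
    return 3.0 * math.log(10.0) / t60


def modal_impulse_response(modes, fs, n):
    """Sum of damped sinusoids; modes = [(f_u, t60_u, a_u, tau_u), ...]."""
    h = [0.0] * n
    for f_u, t60_u, a_u, tau_u in modes:
        alpha = decay_rate(t60_u)
        for k in range(n):
            t = k / fs - tau_u
            if t >= 0.0:  # Heaviside step H(t - tau_u)
                h[k] += a_u * math.exp(-alpha * t) * math.sin(2.0 * math.pi * f_u * t)
    return h


def resonator_impulse_response(f_u, t60_u, a_u, fs, n):
    """Single mode (no delay) realised as a two-pole recursive filter."""
    r = math.exp(-decay_rate(t60_u) / fs)   # pole radius
    theta = 2.0 * math.pi * f_u / fs        # pole angle
    b1 = a_u * r * math.sin(theta)          # feed-forward gain on x[k-1]
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y = [0.0] * n
    for k in range(n):
        x_prev = 1.0 if k == 1 else 0.0     # unit impulse delayed by one sample
        y1 = y[k - 1] if k >= 1 else 0.0
        y2 = y[k - 2] if k >= 2 else 0.0
        y[k] = b1 * x_prev + a1 * y1 + a2 * y2
    return y


if __name__ == "__main__":
    fs = 44100.0
    # placeholder mode: 2.5 kHz, T60 = 0.1 s, unit amplitude, no delay
    direct = modal_impulse_response([(2500.0, 0.1, 1.0, 0.0)], fs, 500)
    recursive = resonator_impulse_response(2500.0, 0.1, 1.0, fs, 500)
    print(max(abs(a - b) for a, b in zip(direct, recursive)))
```

Running several such resonators in parallel and summing their outputs corresponds to the parallel filter bank described above; the delays would then be applied per branch.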
In the freight wagon sound pressure spectra of Fig. 4, distinct peaks related to the vibration are located at 1.8, 2.5 and 2.9 kHz. In the passenger wagon spectra of Fig. 14, distinct peaks may be recognized at different frequencies, namely at 1.4, 2.2, 2.9, 3.6 and 4.4 kHz. For higher frequencies, it is more difficult to discern clear spectral patterns. The data further show that the peaks are more easily localized at a low speed of 60 km/h than at 80 or 100 km/h. At lower speeds, spectral smearing due to the Doppler frequency shift is reduced. However, as rolling noise is reduced at lower speeds, secondary sources increasingly contaminate the data. The measurement data in Fig. 14 thus exhibit five peaks between 1 and 5 kHz, which means that the number of peaks in this frequency range is only slightly lower than the number of 1/3 octave bands. As modal density increases with frequency [23], a higher number of resonances per 1/3 octave band is expected at higher frequencies. This might explain the difficulty of recognizing distinct peaks at high frequencies. Furthermore, the spectra of Fig. 14 are averaged over 12 axles with a certain variation in their resonance frequencies. This variation is also expected to be larger at higher frequencies. Therefore, at higher frequencies a modal density of more than one mode per 1/3 octave band and some variation of the resonance frequencies between wheels is needed.
The amplitudes $A_u$ in Eq. (23) are set for each resonance individually. They control the gain of each resonance and may be set based on measured 1/3 octave band transfer functions [39,30,40]. According to Eq. (23), the setting of $A_u$ thus also depends on the decay rate $\alpha_u$ of each mode, which is related to the structural reverberation time by Eq. (22). Structural reverberation times $T_{60}$ of freely suspended wheels may exceed 1 s for certain modes [53]. However, our measurements on rails and the literature suggest that structural reverberation times lie well below 1 s in practical situations, where the boundaries and the interaction introduce a significant amount of damping. Field measurements revealed values of 0.07-0.23 s for five resonances of a running wheel in the range of 1.7 to 3.7 kHz by observing bandwidths of velocity levels [54]. Similar information might also be obtained by measuring narrowband sound pressure levels. However, this is delicate due to the Doppler effect and because, in such measurements, resonance peaks of multiple wheels overlap in time. A reverberation time of   iod to realize random phase relations between the modes. Further, 1/3 octave data from the sonRAIL model is used. As explained above, it can be clearly seen that the resonance peaks in Fig. 15 are much narrower than those in the measured data in Fig. 14. Furthermore, some variations between the two simulated filters can be observed in the impulse responses as well as in the frequency responses, particularly in the regions between the resonances. This is due to the randomly set delays $\tau$, which lead to varying interference between the resonances. This also has implications for the accuracy of the filter's magnitude response as measured in 1/3 octave bands and thus for the filter design process. Fig. 15 further illustrates the spectral smearing effect of sound propagation. The simulation considers directivity (see Section 3.3.4), geometrical spreading and the Doppler frequency shift.
The data shown were calculated for a distance of 7.5 m, a speed of 60 km/h and an integration time of 1 s. Fig. 15 shows that even at such low speeds the peak bandwidths are strongly increased, as explained above. This increase is due to the Doppler frequency shift, which is large for short distances and high speeds. However, the applied directivity, as defined in Section 3.3.4, is maximal at the shortest distance, which leads to an attenuation of the frequency-shifted energy and thus decreases spectral smearing.
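The size of the Doppler smearing relative to the resonance bandwidth can be estimated with a short calculation. The Python sketch below is a rough illustration under simplifying assumptions: a point source moving on a straight line past a receiver at rest, a speed of sound of 343 m/s, an integration window centered on the closest point and expressed in emission time (propagation delay is ignored), and no directivity weighting. The function names and the example reverberation time of 0.1 s are assumptions for illustration.

```python
import math

C0 = 343.0  # speed of sound in m/s (assumed value)


def doppler_factor(s, distance, speed_kmh, c0=C0):
    """Received/emitted frequency ratio for a source at along-track position s.

    s < 0 means the source is still approaching the closest point (s = 0),
    s > 0 means it is receding. The receiver is at rest at lateral `distance`.
    """
    v = speed_kmh / 3.6
    # component of the source velocity pointing towards the receiver
    v_towards = -v * s / math.hypot(s, distance)
    return 1.0 / (1.0 - v_towards / c0)


def doppler_spread(f0, distance, speed_kmh, integration_time):
    """Lowest and highest received frequency within the integration window."""
    s_max = 0.5 * integration_time * speed_kmh / 3.6
    low = f0 * doppler_factor(+s_max, distance, speed_kmh)
    high = f0 * doppler_factor(-s_max, distance, speed_kmh)
    return low, high


def resonance_bandwidth(t60):
    """Bandwidth alpha/pi of a mode with reverberation time t60."""
    return 3.0 * math.log(10.0) / (math.pi * t60)


if __name__ == "__main__":
    low, high = doppler_spread(2500.0, 7.5, 60.0, 1.0)
    print(f"received frequency range: {low:.0f} to {high:.0f} Hz")
    print(f"modal bandwidth for T60 = 0.1 s: {resonance_bandwidth(0.1):.1f} Hz")
```

Under these assumptions, the frequency range swept by a 2.5 kHz resonance during the pass-by is considerably wider than the bandwidth of the resonance itself, which is consistent with the broadened peaks described above.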
The filter design process starts by creating a decay filter impulse response $h_{\mathrm{decay}}$ according to Eq. (23) with constant amplitudes $A_u$ for all modes within the same 1/3 octave band. The amplitudes are scaled so that the sum over the corresponding resonances leads to a unit magnitude response in the respective band. Secondly, the target response given in 1/3 octave bands, such as shown in Fig. 16, is linearly extrapolated to a frequency range of 50 Hz to 12.5 kHz. Based on that, an initial shaping filter $h_{\mathrm{shape},0}$ with a length of 5000 samples is created as described in Section 3.3.2 for the contact filter $A_3$. An initial transfer path FIR filter $h_{\mathrm{init}}$ is obtained by convolving these two impulse responses, i.e. $h_{\mathrm{init}} = h_{\mathrm{shape},0} * h_{\mathrm{decay}}$. Due to the absence of resonances, interfering resonances and interpolation, $h_{\mathrm{init}}$ may exhibit errors of several dB, as exemplified in Fig. 16. Therefore, the deviations of $h_{\mathrm{init}}$ are analyzed in 1/3 octave bands and used to redesign the shaping filter, yielding $h_{\mathrm{shape,corr}}$, which is pre-equalized by modifying the target response. The final transfer path filter is obtained as $h_{\mathrm{TP}} = h_{\mathrm{shape,corr}} * h_{\mathrm{decay}}$, applying a temporal truncation after $4 \cdot T_{60}$ and a linear fade-out of a few milliseconds. Fig. 16 illustrates that this filter may closely agree with the target spectrum.
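Three of the steps in the design process above lend themselves to a compact sketch: the convolution of shaping and decay filter, the pre-equalization of the band targets, and the final truncation with fade-out. The Python code below is a minimal illustration, not the actual design tool. It assumes that the band levels of $h_{\mathrm{init}}$ have already been measured by some 1/3 octave analysis (not shown), that the correction is a single pass in which each band target is shifted by the observed deviation, and that the fade-out lasts 5 ms. All function names are hypothetical.

```python
def convolve(a, b):
    """Direct-form linear convolution of two impulse responses."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def corrected_target(target_db, achieved_db):
    """Pre-equalise: shift each band target by the deviation of h_init.

    A band that came out 2 dB too high gets a target lowered by 2 dB.
    """
    return [t + (t - a) for t, a in zip(target_db, achieved_db)]


def truncate_and_fade(h, fs, t60, fade_ms=5.0):
    """Cut h after 4 * T60 and fade the last fade_ms linearly to zero."""
    n_keep = min(len(h), int(round(4.0 * t60 * fs)))
    n_fade = min(n_keep, int(round(fade_ms * 1e-3 * fs)))
    out = list(h[:n_keep])
    for j in range(n_fade):
        # gain falls linearly and reaches exactly 0 at the last kept sample
        gain = (n_fade - 1 - j) / n_fade
        out[n_keep - n_fade + j] *= gain
    return out


if __name__ == "__main__":
    target = [0.0, -3.0, -6.0]      # placeholder band targets in dB
    achieved = [1.0, -5.0, -6.0]    # placeholder band levels of h_init in dB
    print(corrected_target(target, achieved))
```

A direct convolution is used here for clarity; for filters of several thousand taps, an FFT-based convolution would be the usual choice.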

Directivity
The directivity of the rolling and impact noise sources is assumed to be frequency independent, i.e. shielding effects due to the vehicle body or resonance-specific radiation patterns are not considered here. The radiation pattern can thus be understood as a scaling of the emitted sound pressure as a function of the emission angle. This amplitude modulation is performed based on the horizontal directivity function $\Delta L_{W,\mathrm{dir,hor},i}$ proposed in CNOSSOS-EU [40]. Converted to sound pressure, the modulation function reads   model, which was developed based on a monopole assumption, requires $K = 1.1$.
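The angle-dependent scaling can be illustrated with the horizontal directivity of CNOSSOS-EU for rolling and impact noise, $\Delta L_{W,\mathrm{dir,hor}} = 10 \lg(0.01 + 0.99 \sin^2\varphi)$, where $\varphi$ is the angle between the track direction and the line from source to receiver. The Python sketch below converts this level to a pressure gain by taking the square root of the energy term. It is an illustration only: the constant $K$ mentioned above is not included, the geometry is reduced to the horizontal plane, and the function names are hypothetical.

```python
import math


def horizontal_directivity_db(phi):
    """CNOSSOS-EU horizontal directivity in dB for the angle phi in radians."""
    return 10.0 * math.log10(0.01 + 0.99 * math.sin(phi) ** 2)


def pressure_gain(s, distance):
    """Pressure scaling for a source at along-track position s.

    s = 0 is the point of closest approach; `distance` is the lateral
    distance of the receiver from the track (height differences ignored).
    """
    sin_phi = distance / math.hypot(s, distance)
    return math.sqrt(0.01 + 0.99 * sin_phi ** 2)


if __name__ == "__main__":
    for s in (0.0, 7.5, 30.0, 100.0):
        print(f"s = {s:6.1f} m  gain = {pressure_gain(s, 7.5):.3f}")
```

The gain is largest when the source is abeam of the receiver and drops towards 0.1 (i.e. -20 dB) along the track direction.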

Evaluation
As a demonstration of the presented auralization concept, the measured data from panel (i) of Fig. 4 was reproduced using the proposed emission synthesizer structure and propagation filters. Panel (ii) of Fig. 4 shows sound pressure spectra which were calculated from auralized pass-bys at an observer point close to the track. In the simulation, the only varying parameter was the vehicle speed V. It was set to 60, 80 and 100 km/h, whereas all the other model parameters were kept constant.
In the simulation, the measured rail roughness shown in Fig. 7 from the measurement site (see Section 2) was adopted. Five wagons of the type SBB Slmmnps, resulting in a total of 20 axles ($i = 1, \ldots, 20$), were modeled. The wheel roughness profiles were generated based on the single wheel measurement spectrum from Fig. 9 and a wheel perimeter of $P = 2.9$ m. The transfer path spectra were taken from the sonRAIL model [30]. The wheel modes were modeled as described in Section 3.3.3 with a supplemental random variation of the resonance frequencies, i.e. slightly differing resonances for all wheels. The relative frequency shift of each mode had zero mean and a standard deviation of 2%. For the track, an exponentially increasing modal density was used. Regarding sound propagation effects, geometrical spreading, the Doppler effect and air absorption were considered and modeled according to [7]. The ground reflection was neglected because, in the specific situation, controlled measurements using a loudspeaker have indicated no significant ground effect above 200 Hz [55]. In general, at close distances to the railway track, the simulation of the ground reflection is challenging due to the extended reaction of the ballast [56]. The receiver point was located at a distance of 7.5 m and 1.2 m above the rail.
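The random variation of the wheel resonance frequencies described above can be sketched as follows. The Python code is an illustration under assumptions: the relative shifts are drawn from a normal distribution (the text only specifies zero mean and a standard deviation of 2%), the nominal frequencies are the freight wagon peaks read from Fig. 4, and the function name and the fixed seed are hypothetical choices for reproducibility.

```python
import random


def detuned_modes(nominal_freqs_hz, n_axles, rel_std=0.02, seed=0):
    """Return one list of resonance frequencies per axle.

    Each frequency is the nominal value times (1 + shift), where the shift
    is drawn independently with zero mean and standard deviation rel_std.
    """
    rng = random.Random(seed)
    return [
        [f * (1.0 + rng.gauss(0.0, rel_std)) for f in nominal_freqs_hz]
        for _ in range(n_axles)
    ]


if __name__ == "__main__":
    nominal = [1800.0, 2500.0, 2900.0]   # peaks of the freight wagon spectra
    for i, freqs in enumerate(detuned_modes(nominal, 20), start=1):
        print(i, [round(f) for f in freqs])
```

Averaging the spectra of axles detuned in this way broadens the resonance peaks, in addition to the Doppler smearing.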
In panel (ii) of Fig. 4, the behavior of both types of spectral peaks observed in panel (i) is well reproduced by the model, i.e. peaks due to the excitation and peaks due to the vibration. Above 1.6 kHz, three peaks are present whose amplitude increases with speed while their frequency remains constant. These peaks are attributed to the resonances included in the transfer path filters of the vehicle as defined in Eq. (23), i.e. they are due to the vibration. For the peak at 2.5 kHz, the speed-related increase of the bandwidth due to the Doppler shift is also visible.
For the studied situation, at frequencies below 1.2 kHz the transfer path of the track dominates the simulated sound pressure. The wide peak from 650 to 850 Hz at a speed of 60 km/h is shifted upwards in frequency for higher speeds. These peaks are due to the distinct peak at a wavelength of 2 cm in the wheel roughness used, as shown in Fig. 9, and are therefore related to the excitation. In the simulation, however, the magnitude of these peaks stays constant, whereas the measurements suggest an increase. Below 800 Hz, the pattern is less clear. The peak which shifts from 400 to 600 Hz in the measurements in Fig. 4 is not well reproduced. This peak might be attributed to the roughness patterns of other wheels. In fact, direct wheel roughness measurements have revealed that some wheels of the same type had a dominant peak at a wavelength of 5 cm [55]. This demonstrates the variation between axles and underlines the importance of detailed input data for the described model.
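The speed dependence of the excitation-related peaks follows from the relation between roughness wavelength and frequency, $f = V/\lambda$. The short Python sketch below evaluates this relation for the two wavelengths and three speeds discussed above; the function name is hypothetical.

```python
def excitation_frequency(speed_kmh, wavelength_m):
    """Frequency in Hz at which a roughness wavelength is excited at speed V."""
    return speed_kmh / 3.6 / wavelength_m


if __name__ == "__main__":
    for wavelength in (0.02, 0.05):
        for speed in (60.0, 80.0, 100.0):
            f = excitation_frequency(speed, wavelength)
            print(f"lambda = {100 * wavelength:.0f} cm, "
                  f"V = {speed:.0f} km/h: f = {f:.0f} Hz")
```

For the 2 cm wavelength, this gives about 833 Hz at 60 km/h, which lies within the wide peak from 650 to 850 Hz, and the frequency scales linearly with speed.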

Conclusions
Our study showed that for a realistic auralization of railway noise, temporal and spectral patterns with a high resolution need to be considered. The temporal patterns arise, on the one hand, from differences between the vehicles and axles. On the other hand, transient sounds are also characteristic of railway noise. They occur particularly in the case of wheel flats, or as impact noise caused by insulated rail joints or switches. Spectral patterns of rolling noise were observed in narrowband spectra that showed distinct peaks arising from different physical phenomena. Empirical data suggest that they can be attributed either to the mechanical excitation or to the vibration of the dynamic wheel/track system. These spectral patterns evoke the characteristic metallic timbre of rolling noise.
In former studies, a method denoted spectral modeling synthesis has proven successful in the context of wind turbine, road traffic and aircraft noise. However, in this study it was found that this method is not capable of satisfactorily representing the above-described characteristics of railway noise. Further, it is insufficient to rely on 1/3 octave band data only. In addition, modal data of the wheel/rail system have to be included to create the required timbre and thus realistic sounds.
In this article, a synthesizer structure was proposed that is based on a physical approach in which the noise generation mechanism is modeled in the time domain. It allows for the reproduction of the above-described sound characteristics and concurrently represents rolling and impact noise. The physical model further allows for auralizing the effect of varying wheel and rail roughness as well as different vehicle speeds. Thus, different noise mitigation measures at the source and on the propagation path can be simulated without the need for elaborate measurements. The audio signals are generated purely artificially using a parametric synthesizer. This results in full control over the signals and a maximal degree of flexibility. The latter means that the approach allows for interpolation between known situations but also for extrapolation to new situations, i.e. future scenarios.
For the auralization of full train pass-bys, secondary sources such as the noise from the traction system, aggregates or aerodynamic noise also need to be considered in addition to rolling and impact sounds. Realistic emission sounds of these sources may probably be generated using spectral modeling synthesis. Further, typical audible railway noise characteristics such as squeal and rattling would demand a model with more complexity and subtlety. However, in the assessment and comparison of noise mitigation measures, such as a reduction of roughness or the introduction of dampers or noise barriers, the focus is on relative differences between scenarios, and for this purpose these characteristics may possibly be neglected. The presented auralization model is able to synthesize rolling and impact noise of railway vehicles without audible artifacts and at a high audio quality. It offers a high degree of flexibility, however at the cost of numerous input parameters which are difficult to determine. Future efforts will include better estimation methods for the input data as well as perceptual validation of the proposed auralization system through listening tests.