Separation of rail and wheel contributions to pass-by noise with sparse regularization methods

This paper proposes a method for separating rail and wheel noise contributions via sparse regularization of microphone array data. The underlying idea is to promote sparse solutions, which jointly approximate the two noise contributions with few non-zero coefficients. The main hypothesis is that the acoustic field radiated by the rail is sparse in a dictionary of plane waves, that the acoustic field radiated by the wheels is sparse in a dictionary of moving sources, and that both acoustic fields are dense in the opposite dictionaries. How well this happens is studied with the coherence between the plane waves and the moving sources. The strength of the proposed approach is that it does not require static tests prior to the pass-by. The separation is performed in three main steps, executed in the time-frequency domain. First, the rail contribution is separated from the total pass-by noise using matching pursuit optimization, promoting solutions with a limited number of plane waves per frequency. Second, the residual between the total pass-by noise and the estimated rail noise is calculated. And third, the wheel contribution is separated from the residual via 1 -norm minimization, promoting solutions that are row-sparse at all frequencies. The separation performance is investigated with synthetic data, and validated with experimental data against reference predictions with the TWINS software, for pass-by noise measurements of trains running at 40, 80 and 160 km/h. © 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license. ( http://creativecommons.org/licenses/by/4.0/ )


Introduction
There is a present urge for quantifying railway pass-by noise regardless of the track on which the train is running. Specially so for running speeds under 300 km/h, a regime in which rolling noise is a dominant source. In this regime, the vehicle noise-the target component for the homologation of European trains [1] , is often masked by the noise radiated by the track in the frequency range between 500 and 1600 Hz; leading to geographically-dependent sound emissions of the same vehicle. This dependency is traditionally taken care of by means of transporting the vehicle and measuring the pass-by noise in so-called "silent" tracks.
A number of methods based on microphone array data have been proposed as an alternative to this traditional solution. In 2006, Kitagawa and Thompson reported the use of 1D beamforming arrays to quantify the wheel and rail noise contributions. The approach resulted in an under-estimation of the rail contribution, attributed to the normal orientation of the beamformer towards the track [2] . A few years later, the problem was revisited by steering the beamformer towards a radiation angle that is dictated by the ratio between structural and acoustic wavelengths [3] . Reasonable agreement could be found by tuning the beamformer angles; however, no clear strategy for choosing such angles could be provided. Similar conclusions were drawn in 2012 by Le Courtois et al. with 2D beamforming array data [4] , and further examinations on more complex source directivity were recently carried out by Zhang et al. [5] . Another approach, proposed by Faure et al. in 2015, is the SWEAM method [6] , which approximates the sound radiation of the track with cylindrical waves, and performs an inverse estimation of the structural wavenumbers and decay rates of the waves. More recently, Zea et al. introduced the wave signature extraction (WSE) method [7] , whereby using wavenumber band-pass filters the rail contribution is extracted from microphone array measurements of the pass-by near the track. As the WSE method is tailored for the rail contribution, its obvious limitation is the inability to estimate the wheel contribution. For a recent round robin of state-of-the-art separation methods the reader is referred to the paper by Thompson et al. [8] .
A different perspective to the separation problem is related to the concept of sparsity. A signal that is sparse has few non-zero active coefficients, meaning that sampling only such coefficients should ideally allow for the recovery of the entire signal. This rationale was first applied in the1970 s by Claerbout and Muir [9] , who were concerned with the problem of statistical regression of geophysical data. Since then, mathematicians put a tremendous effort in understanding the machinery underlying sparse signal recovery, eventually leading to the sampling paradigm of compressed sensing, formalized in 2006 by Candès et al. and Donoho [10,11] . In particular, numerous applications of sparse regularization methods to the problem of sound source quantification have appeared in the recent years. The inverse solution is then enforced to meet a sparsity requirement, such that it can be represented with a few coefficients in some vector space: known as dictionary. In acoustics, such dictionaries are typically composed of elementary functions that faithfully resemble vibro-acoustic fields: e.g. plane waves for the vibrations of plates [12] , or equivalent sources for multipole-like sound radiation [13] . Further studies have also considered a combination of dictionaries and solution approaches. For example, Fernandez-Grande and Daudet have used block-sparse representations of the sound field and its derivatives in order to reconstruct both compact and spatially extended sources [14] . In addition, the concept of sparse regularization lies in close connection to the problem of structural (geometric) separation of signals [15] . For instance, in cosmological data analysis, galaxies are sparsely structured into pointlike singularities and filaments [16] ; and in material sciences, microscopic images of composites tend to consist of granular and fibrous structures [17] . Consequently, enforcing sparsity on the different classes of signals facilitates the separation of such signals.
On the basis of these works and ideas, we propose a method to separate the contribution of the rail and the wheel to pass-by noise by means of sparse regularization methods. In essence, the noise contributions are found by projecting the measured pressure field onto different dictionaries: plane waves and moving equivalent sources. To the best of our knowledge, such an application has not been explored yet. The rail contribution is enforced to have a sparse representation in the wavenumber-frequency domain, that is, in a dictionary of plane waves with given incidence angles. And the wheel contribution is enforced to have a sparse representation into equivalent sources that move at the speed of the train. Such dictionary of moving sources can be considered an extension of the sparse equivalent source method [13] .
The structure of the paper is as follows. Section 2 introduces a sparse model for the rail and the wheel noise contributions, including the dictionary representations-plane waves and moving equivalent sources, followed by an analysis of the sensing coherence between them. The separation method is then introduced in Section 3 , describing the separation process step by step. A numerical assessment is included in Section 4 , demonstrating the robustness of the method applied to synthetic data. Section 5 presents experimental validations with pass-by recordings of trains running at 40, 80, and 160 km/h, comparing the separated results with reference predictions by TWINS as well as with estimations via the WSE method; and including an analysis of multipolar sound radiation of the wheels. Section 6 entails a discussion regarding state-of-the-art separation methods. Conclusions are presented in Section 7 .

A sparse model for rail and wheel noise
The motivation to exploit sparsity in the context of rolling noise comes from the existing knowledge about the sound directivity of the track and the wheels [18] . The dispersion of a railway track consists of a finite number of bending wave families that spread across frequencies [19] . In fact, several studies suggest that the rail sound directivity can be approximated with such wave families, which radiate at an angle depending on the ratio of the acoustic and structural wavelengths [3,5,7] . These arguments set grounds for choosing a finite number of plane waves-or wavenumbers-to sparsely represent the rail contribution.
Similarly, studies on the sound directivity of the wheel [18,20] reveal that it radiates nearly equally in all directionslike a monopole-when it is subject to radial motion, whereas it radiates efficiently (inefficiently) in the direction parallel (orthogonal) to the wheel axis-like a dipole-when it is subject to axial motion. In addition, the spatial distribution of wheels is intrinsically sparse throughout the length of the train, typically consisting of two sets per bogie (i.e. per 26 m). Using these arguments it seems reasonable to represent the sound field radiated by the moving wheels with a sparse set of moving equivalent sources. ( Fig. 1 ).

Dictionary representations
The following definitions hold throughout the remainder of the paper: ( x m , y m , z m ) is the position of the m -th microphone of the array, ( ˜ x n (t) , ˜ y n , ˜ z n ) is the position of the n -th moving equivalent source, t is a given time instant, ω is a given angular frequency in rad/s, c and V are respectively the speed of sound in air and the speed of the train (both assumed constant in time), j is the imaginary unit, and · p = ( | · | p ) 1 /p is the p -norm for 0 < p < ∞ .

Band-limited plane waves
Consider a set of L plane waves at frequency ω, whose complex amplitudes at time t are collected in the column vector α ≡ α(ω, t ) = [ α 1 , . . . , α L ] T ∈ C L . Then, the sound pressure at the m -th microphone position due to the plane waves is defined as where k l is the spatial frequency associated with the l -th plane wave. As the strongest contribution of rail noise is bandlimited to a portion of the propagating waves in the wavenumber domain [7] , we band-limit the spatial frequencies to the parabolic region | k l | ≤ k bw = 50 √ ω /c rad/m. This region can be physically understood as a set of bending wavenumbers k whose phase velocities resemble the parabolic-like dispersion of the waves propagating on the track k 4 ∝ ω 2 [21] . The need for band-limitedness will later be supported with the coherence analysis in Section 2.2 , as well as with numerical results in Section 4 .
The pressure field r ≡ r (ω, t ) ∈ C M at the M microphone positions due to the L plane waves can be written in matrix form as r = α, (2) where is the plane wave dictionary. This dictionary executes a similar operation to that of an inverse spatial Fourier transform, synthesizing back the pressure from a collection of spatial frequency components with harmonic time-dependence e j ωt .

Moving equivalent sources
Consider now a set of N monopoles at frequency ω, moving parallel to the x -axis with constant subsonic speed V < c , as shown in Fig. 2 . The source strengths at time t are collected in the vector β ≡ β( Then, the sound pressure at the m -th microphone position due to the moving sources including Doppler effect is defined as where the two terms of φ nm correspond to the far-and near-field components of the sound pressure [22] . Here τ nm denotes the time elapsed since the sound field is emitted from the n -th source until it is recorded at the m -th microphone position. This emission time is earlier than the measurement time t , and its value is found from the solution to the (quadratic) equation [23] c ( t − τ nm ) = d nm The pressure field w ≡ w (ω, t ) ∈ C M at the M microphone positions due to the N moving equivalent sources can be written in matrix form as w = β, (8) where is the moving equivalent source dictionary. This dictionary executes a similar operation to the equivalent source dictionary [13] , with the difference that the effect of source motion is taken into account.
The transfer functions φ nm in Eq. (4) are defined in free space. This means that effects such as ground/ballast reflection and scattering due to the vehicle body are not accounted for, which might cause discrepancies in the wheel estimates. In principle it is possible to measure or model other kinds of transfer functions and account for these effects. However, the numerical estimation of the phased sound field is not trivial in the presence of complex boundaries and/or scatterers. The transfer function would require precise knowledge of phase shifts between transducers (as in e.g. matched field processing), which in the presence of complex boundaries of heterogeneous or unknown properties, is out of reach. Therefore, it is sensible to utilize only the direct field, knowing that the reflections will have less amplitude/carry less energy, leading to robust estimations-this is for example the rationale typically used in sound field control in rooms, and similar problems, where phased propagation is not trivial to estimate.

Sensing coherence
Ideally we want a dense representation of the rail noise into moving sources (i.e. a non-sparse representation, requiring a large number of coefficients), as much as a dense representation of the wheel noise into plane waves. In this way the chances of successful separation increase. As the coherence is an inner product between the columns of both dictionaries concatenated D = ∈ C M×(L + N) , the linear dependency between such columns must be as close to zero as possible. For this one can calculate the coherence metric [24] at fixed time and frequency, where · , · denotes the inner product, and d i is the i -th column of D . Note that the above coherence metric is a single number, while what one wants is that the inner product between all possible column combinations is as low as possible-to verify that there is maximal dissimilarity. One way to examine this situation is to look at the Gram matrix of the concatenated dictionary, which, when plotted, illustrates the linear dependency between all combinations of columns d i , d j . The largest off-diagonal element of G is then equivalent to μ( D ).  can be seen in Figs. 3 (a) and 4 (a) that the off-diagonal coefficients follow a rather fast decay for the wavenumbers considered. Such a decay increases with frequency as the sound field becomes more directional. In contrast to this, the off-diagonal elements in Figs. 3 (d) and 4 (d) exhibit a fast decay only within the distance spanned between 0 and about 10 m, which in fact corresponds to the array aperture 64 × 0 . 16 = 10 . 24 m. As a consequence, sources located outside the array aperture are highly coherent with each other, indicated by the non-zero off-diagonal elements in Figs. 3 (d) and 4 (d). Nevertheless, as it will later be shown in Section 5.1 , the source positions appear to be correctly estimated even when they lie outside the aperture. Similarly, the coefficients in the blocks (b) H and (c) H should ideally be as close to zero as possible. Otherwise, the sparse regularization applied to the combined noise contributions may result in wrongly allocating weights to plane waves which should be allocated to moving sources, and viceversa. As it can be seen in Figs. 3 (b)-(c) and 4 (b)-(c), there is little linear dependency between plane waves and moving sources-the coherence is no more than 0.7 at the largest wavenumbers; see e.g. the coherence values at | k l | ≈ 13 rad/m in Fig. 4 (c). The coefficients in the blocks H and H can then be regulated by means of band-limiting the plane wave dictionary.

Coherence and band-limitedness of plane waves
This relationship can be understood in the wavenumber (spatial Fourier) domain. Propagating plane waves are defined with wavenumbers | k l | ≤ ω/ c , which corresponds to real incidence angles θ ∈ [ −π / 2 , π / 2 ] . What happens is that sources moving outside the array aperture, ˜ , radiate waves incident with grazing angle | θ | = π / 2 , i.e. nearly parallel to the array. This results in an increased coherence between (i) plane waves propagating with grazing incidence angle and (ii) moving sources outside the array aperture. In other words, plane waves and moving sources are increasingly linearly dependent and can be mistaken by one another. An increased coherence can lead to an over-estimation of the plane waves-the pressure due to the moving sources is also collected in the plane wave dictionary; potentially resulting in an under-estimation of the moving source amplitudes.
The main purpose of band-limiting the plane wave dictionary is then to remove wavenumbers corresponding to grazing incidence angles, thus minimizing the coherence with moving sources that are outside the array aperture. As it will later be shown in Section 4 , band-limitedness establishes a compromise between identifying the plane wave families present in the sound field, while keeping the coherence-and separation errors-small.

Separation method
Consider the pressure data measured with the microphone array at fixed frequency ω and time t , denoted with p ≡ p (ω, t ) ∈ C M . This is simply the result of applying spectral density estimation to the time-domain microphone signals.
Assume that (i) the microphone array is close and parallel to the track, and (ii) the train speed is below 300 km/h, such that rolling noise is the dominant source of noise during the pass-by. The measurement model is written as where the entries of z ∈ C M account for modelling errors and/or contributions of other sources of noise: e.g. sleepers, ground scattering, etc. The above linear system of equations becomes under-determined when the number of unknowns L + N exceeds the number of observations M .

Step 1: Rail separation
This first step involves solving the non-convex optimization problem individually for every frequency ω b , b = 1 , . . . , B, in the 1/3 octave band. Here the positive integer K < M , and the 0 pseudonorm α 0 = (i | α i = 0) counts the number of non-zero wavenumber atoms. To solve this optimization problem one can use for instance the orthogonal matching pursuit (OMP) [25] , which seeks for a solution α that is K -sparse. Then, the rail contribution to the array pressure at the b -th frequency r ( ω b ) follows from Eq. (2) . In contrast to the linear filtering of the wave signature extraction method [7] , the proposed separation strategy for the rail contribution entails a nonlinear filtering of the measured data, in which the Fourier coefficients of the filter correspond exactly to the K sparse plane wave atoms. As a consequence, the Doppler effect can be accounted for since the OMP program finds the strongest plane waves per time instant, even when the wavenumbers are shifted.

Step 2: Residual calculation
Once the rail contribution at the array has been estimated in the whole frequency band, the residual with respect to the total pass-by noise is computed, for all frequencies b = 1 , . . . , B in the 1/3 octave band. The residuals at all frequencies and microphone positions are then stacked into the column vector

Step 3: Wheel separation
Intuitively, the wheels should be localized at the same position regardless of the frequency. This means that the array pressure should be row-sparse at multiple frequencies; that is, the same spatial sparsity profile of moving sources is shared across frequencies. A multiple measurement vector approach [26,27] can be suitable for this task. One can then cast the problem under the framework of the least absolute shrinkage and selection operator (LASSO) as for a suitable regularization parameter > 0 that can be related to the noise floor. A suitable choice can be found with a parameter selection method, e.g. the Pareto frontier curve [28] . Then, the corner of the logarithmic plot β ( all ) 1 versus 2 corresponds to the optimal regularization parameter. The block diagonal matrix stacks the dictionary matrices ( ω b ) for all frequencies b = 1 , . . . , B in the 1/3 octave band. Then, the wheel contribution at the array is obtained for all frequencies as

Sound pressure level spectra
The sound pressure level (SPL) spectra per 1/3 octave frequency band centered at f c , at the m -th microphone in the array, are obtained as (18) for the rail contribution; and similarly for the wheel contribution as (19) where δ f is the 1/3 octave bandwidth per unit Hz, p ref = 20 μPa, and A ( f c ) is the A-weight curve coefficient at the center frequency f c . Fig. 5 shows a block diagram of the signal processing chain of the proposed approach. The input consists of the raw time-series data measured with the microphones, which is first filtered into 1/3 octave bands. Thereafter, an iterative timedomain process is executed, beginning with spectral density estimation via the Welch's method [29] . The array calibration data is taken from [7] , and it is used to account for the sound scattering introduced by the frame and supports of the microphone array. Up to this point there is no difference in the processing with respect to the wave signature extraction method [7] .

Processing chain
Next, Step 1 runs the OMP individually per sub-frequency in the 1/3 octave band, leading to an estimate of the rail contribution at each sub-frequency r ( ω b ), as described in Section 3.1 . Once the field is processed at all sub-frequencies ω b , . . . , ω B , the residuals are calculated and stacked altogether in ( all ) , as described in Step 2 in Section 3.2 .
Step 3 involves executing the LASSO optimization routine for the whole frequency band, leading to the estimate of the wheel contribution at all frequencies w ( all ) . The algorithm is repeated until the last block of time data is processed, and the calculation of the rail and wheel contributions entails averaging the SPL values obtained per block of time data.

Separation of synthetic data
The purpose of this section is to assess the performance of the method using synthetic data, given various choices of dictionary atoms L and N , as well as varying bandwidth of the plane wave dictionary k bw . The time-harmonic pressure is synthesized at narrow-band frequency f as the sound generated by 6 plane waves and 4 moving sources: using Eqs.

Separation errors
To assess the separation performance we compute relative separation errors of the time series at the m -th array position pw (x m , f ) = 10 log 10 for the plane waves, and, similarly, for the moving equivalent sources. Here the superscripts (sep) and (ref) denote separated and reference pressure. The errors are evaluated at the center of the array m = 21 . For each changing parameter ( L, N or k bw ), the separation is executed for a hundred realizations of random plane wave amplitudes and moving source strengths; and the separation errors are presented in terms of mean errors with 95% confidence intervals. These error metrics can be understood as the negative of the SNR for each of the sound field components. That is, error values < −SNR correspond to a likely accurate separation; and, conversely, errors > −SNR correspond to a likely poor separation. Fig. 6 summarizes the results. The sparsity of the plane wave solution, K in Eq. (13) , should if known be chosen as the number of plane waves present in the pressure field, i.e. K = 6 . From looking at Fig. 6 (a), the more plane wave atoms L the lower the separation error of the plane waves pw . This can be understood in the wavenumber domain. If the number of plane wave atoms L is too small, the dictionary has too coarse a wavenumber resolution and will likely misplace the K plane waves present in the pressure field. The denser the dictionary resolution, the more likely it will resolve the K plane waves. After about L = 50 plane wave atoms, the separation error at both frequencies seems to converge: using more atoms does not significantly improve the separation. The separation error of the moving sources mes , shown in Fig. 6 (b), appears to be relatively constant with respect to the number of plane wave atoms. The results in Fig. 6 (a) and (b) suggest that as one chooses L ≥ 50 plane wave atoms, both separation errors pw and mes become smaller.
In these simulations a total of 4 moving sources is considered, mimicking four wheels of a train passing by the array. Because of the sequential nature of the separation method (first plane waves, then moving sources), the choice of moving source atoms N does not influence the separation error of plane waves pw . On the other hand, as shown in Fig. 6 (c), the separation error of moving sources mes does not significantly improve when the number of atoms N is greater than three times the actual number of moving sources (4). In fact, there seems to be a small increase in error when N is too large, e.g. see mes for N > 30 at 1250 Hz in Fig 6 (c). Overall, these results suggest that as one chooses N ≥ 12 moving source atoms, the separation error mes becomes smaller.
As explained earlier in Section 2.2.1 , the plane wave dictionary ought to be band-limited. Fig. 6 (d) demonstrates how the band-limitedness of the plane waves plays an important role in the separation of synthetic data. The bandwidth of the plane wave dictionary k bw is varied in the set 50 √ ω /c, ω/c rad/m, and expressed as percentage of the acoustic wavenumber k = ω/c. The dictionaries have L = 50 plane wave atoms and N = 12 moving source atoms. Both errors increase the more the bandwidth approaches the acoustic wavenumber. In particular, the separation error of plane waves pw is relatively constant when the bandwidth is below 65% of k ; thereafter it increases nearly linearly with 0.3 dB per % of k . Similarly, the separation error of moving sources mes increases rather slowly when the bandwidth is below 90% of k ; thereafter it increases nearly linearly with 1.2 dB per % of k . These results suggest that the bandwidth k bw should be large enough to identify the plane waves present in the sound field. But not too large either, otherwise it increases the coherence between (i) plane waves at grazing incidence and (ii) moving sources outside the array aperture, and, as a consequence, a likely poorer separation performance.

Separated SPL at the array
This section includes results of the separated sound fields at the array for various parameters: number of dictionary atoms L and N , and the bandwidth of the plane wave dictionary k bw . Fig. 7 shows the separated and reference SPL at the array, for both plane waves and moving sources. In this case we examine one realization of random plane wave amplitudes and moving source strengths. Time is fixed at t = 3 . 9 ms and the harmonic frequency is f = 1250 Hz. The separated SPLs are compared with the reference SPL. and (c) indicate that the sound field is strongest toward the left hand side of the array, which is expected considering the position of the moving sources with respect to the array. Furthermore, as Fig. 7 (d) demonstrates, the choice of bandwidth k bw influences the separation of both sound fields. In this case the dictionaries have L = 50 and N = 12 atoms. The separated SPL due to plane waves is closer to the reference SPL when the plane waves are band-limited ( + ), compared with the separated SPL when they are not band-limited ( •). The same happens with the separated SPL due to moving sources: it is closer to the reference SPL with band-limited plane waves (--) than with non band-limited plane waves (--). The results from this Section follow the separation performance observed in Fig. 6 .

Separation of experimental data
The experimental data used in this paper is taken from a measurement campaign executed during the H2020-636032-ROLL2RAIL project in 2016 (more details in [7,8] ). Pass-bys at three different speeds are considered: 40, 80, and 160 km/h. These speeds correspond to Mach numbers M = 0 . 03 , 0.06, and 0.13, when c = 340 m/s. Fig. 8 shows a photograph of the microphone array setup at the test site near Munich. The array consists of M = 42 microphones spaced by s = 8 cm, uniformly distributed on a line parallel to the track. The microphone spacing allows for aliasing-free sampling up to 2.1 kHz.
See Table 1 for a summary of the relevant parameters describing the train pass-bys, the data acquisition, and the postprocessing. The dashed box in Fig. 9 shows the segment of the train that is used for the analyses. This segment is approximately 26 m long, corresponding to a bogie and two wheel-sets (i.e. four wheels).   Relevant parameters for the train pass-bys, the data acquisition, and the post-processing. See Fig. 2 for an illustration of the positions of the microphones and the moving sources. Further details on the instrumentation and equipment specifications can be found in [7] .  Based on the results from Section 4 , the dictionaries have L = 50 and N = 12 atoms. The moving sources span a total length of 12 m, i.e. one source per meter. The offset d 0 is chosen according to the time analysis window of interest: in this case d 0 = −9 m guarantees that the N moving source atoms span the region where the wheels are located in the train (see minima in the dashed box of Fig. 9 ). The spacing between moving sources results then from dividing the length of the equivalent source distribution (12 m) by the number of dictionary atoms N . Readings from one accelerometer in the vertical direction of the track can be used to estimate the passage of the wheels.
The sparsity of plane waves in the OMP program of Eq. (13) is set to K = 8 atoms. This choice is motivated with the number of bending wave families propagating in the track. In the frequency range up to 2 kHz there are typically up to four families that cut on: vertical bending, lateral bending, torsional waves, and web bending [19] . Each of these families propagate along the positive and negative directions of the track, hence K = 2 × 4 wavenumbers.
The processing is performed in MATLAB 2019b . The plane wave amplitudes that optimize Eq. (13) are found with the toolbox CoSaMP and OMP for sparse recovery [30] , whereas the moving source strengths that optimize Eq. (15) are found with the toolbox SPGL1 [28,31] .

Time-domain separation results
This section includes results of the iterative time-domain processes in Fig. 5 . The results are shown for the trains running at 80 and 160 km/h in Figs. 10 and 11 , for the frequency bands centered at 500 Hz and 1.25 kHz, respectively. Each of these figures show plane wave amplitudes α and moving source strengths β within the entire frequency band, at three different time instants.  Table 2 ). Right column: absolute value of equivalent source strengths β versus frequency and source position. The dotted vertical lines indicate the position of the wheels as estimated from the acceleration signals in (a), (d) and (g).

Table 2
Wavenumbers of a free Timoshenko beam corresponding to the vertical and lateral bending wave families, for 1/3 octave bands centered at frequencies 500 Hz and 1.25 kHz [7] .

Frequency
Vertical bending Lateral bending 500 Hz ± 3.33 rad/m ± 5.00 rad/m 1250 Hz ± 5.82 rad/m ± 8.27 rad/m It can be seen that the wavenumbers of the plane waves estimated with the OMP program closely resemble those of a free Timoshenko beam. These Timoshenko wavenumbers are included in Table 2 below. The resemblance is clear at 500 Hz, shown in Fig. 10 (b), (e), and (h), in which two wave families can be seen, indicated with the arrows: one with wavenumber of approximately ± 3 rad/m and another of ± 5 rad/m.
It must be stressed that the rail contribution is stronger than that of the wheel at such a low frequency, and, as a consequence, the associated plane waves can be detected even when the array is in front of the wheels; see e.g. Fig. 10 (e) and (h). The situation changes as frequency increases, because the wheel contribution to the pass-by noise increases. Inspection of Fig. 11 suggests that the wave families can be better detected when the array is far from the wheels. That is, the wave families centered at approximately ± 6 rad/m and ± 8 rad/m are more clearly visible in Fig. 11 (b) than in Fig. 11 (e) and (h).
Regarding the moving sources, Fig. 10 (c), (f), and (i), as well as in Fig. 11 (c), (f), and (i), demonstrate that the solution is row-sparse as intended: the sparsity profile is shared across the frequency band. This corroborates the intuition of using the multiple-measurement vector optimization described in Section 3.3 . One key difference between Figs. 10 and 11 is the precision of localizing individual moving sources, which appears to be frequency-dependent. We suspect this is due to the difference in spherical spreading of the sound field. At lower frequencies, such as in Fig. 10 , it results more challenging to distinguish moving sources from one another. In particular, the localization of the sources is more accurate when the array  Table 2  aperture is closer to the wheels: e.g. comparing Fig. 10 (d) and (g) with Fig. 10 (f) and (i). When the array is far from the wheels the sources are not accurately localized: as seen by comparing Fig. 10 (a) with Fig. 10 (c). This result is in agreement with the increase in coherence between moving sources that lie outside the array aperture, as described in Section 2.2 .
On the other hand, moving sources at higher frequencies, such as in Fig. 11 , are localized and distinguished more clearly. Surprisingly so even when the array aperture is far from the sources: see e.g. Fig. 11 (c). Furthermore, the estimated locations of the sources agree reasonably well with the actual positions of the wheels (indicated with the dashed lines). Overall, the estimated moving sources seem to be in fairly good agreement with the moving wheels, justifying the proposed physical model.

Separated SPL spectra
The separated spectra are evaluated at the center-most microphone in the array m = M/ 2 + 1 , and compared against reference predictions obtained with TWINS [8] , as well as with the results obtained using the wave signature extraction (WSE) method [7] . Figs. 12-14 show the SPL spectra for the rail and the wheel, for train speeds of 40, 80, and 160 km/h, respectively. The wheel contribution is also inferred with the WSE method from the energetic difference between the total pass-by and the estimated rail contribution, shown in Figs. 12 (b), 13 (b) and 14 (b).
As regards the rail contribution, it can be seen that the spectra estimated with the proposed approach agree well with the reference data for the three train speeds investigated. The differences are in the order of ± 2 dBA, comparable with the performance of the WSE method up to 1.25 kHz. As it was found in [7] , the WSE method under-estimates the rail contribution above 1.25 kHz. In [7] , the authors attributed such an under-estimation to the presence of higher-order bending waves that were unaccounted for in the WSE filters, and the letter [32] further verified this by means of modifying the filter shapes. The spectra in Figs. 12 (a) and 13 (a) indicate that the proposed approach slightly under-estimates the rail contribution in the higher frequency end; however, not as much as the WSE method does. Unlike the WSE method, the proposed method does not have a fixed wavenumber filter shape, but, rather, it finds the strongest active wavenumbers at every time instant.  Sound pressure level (SPL) spectra of (a) rail contribution and (b) wheel contribution in dBA, for a train running at 80 km/h. Solid lines: reference spectra predicted by TWINS. Solid/dotted lines and triangles: estimated spectra via the WSE method [7] . Solid lines and circles: estimated spectra via the proposed method. The wheel contribution estimated with the WSE method in (b), marked with ( E ) in the legend, is inferred from the energetic difference between the total pass-by and the estimated rail contribution. Fig. 14 (a) shows that the rail contribution estimated by the proposed approach closely follows the reference prediction-and slightly over-estimates it at 2 kHz. As regards the wheel contribution, the spectra separated with the proposed method agree reasonably well for the three train speeds, most noticeably at 800 Hz and 1.25 kHz. Figs. 12 (b), 13 (b), and 14 (b) show that the wheel is over-estimated by the proposed approach at 1 kHz by about 5 dBA; considerably less, however, than the over-estimation resulting from the energetic difference estimate via the WSE method. For frequencies above 1.25 kHz, the proposed approach under-estimates the wheel contribution, particularly for the train running at 40 km/h shown in Fig. 12 (b). The under-estimation at frequencies above 1.6 kHz seems to be the result of having too few equivalent sources to represent the sound radiation of the wheels. If the number of equivalent sources N is increased with frequency, the estimated wheel spectra approaches the ref- Sound pressure level (SPL) spectra of (a) rail contribution and (b) wheel contribution in dBA, for a train running at 160 km/h. Solid lines: reference spectra predicted by TWINS. Solid/dotted lines and triangles: estimated spectra via the WSE method [7] . Solid lines and circles: estimated spectra via the proposed method. The wheel contribution estimated with the WSE method in (b), marked with ( E ) in the legend, is inferred from the energetic difference between the total pass-by and the estimated rail contribution.
erence spectra more closely. Another reason one could consider is the multipolar sound radiation of the wheels, which is investigated in more detail in Section 5.3 below.
On the other hand, Figs. 13 (b) and 14 (b) demonstrate excellent agreement at 630 and 800 Hz between the proposed approach and the reference prediction, with differences below 1 dBA. At 500 Hz, however, the wheel contribution is underestimated in the three cases, and by as much as 15 dBA in Fig. 12 (b) for the train running at 40 km/h. As discussed in Section 5.1 , such an under-estimation is at present attributed to the difference in spherical spreading of the sound field, which complicates the task of distinguishing and localizing individual sources in motion-particularly when the array is far from the wheels, and its aperture is small compared with the wavelength. Another possible explanation to the underestimation is the effect of ground reflection, which is not accounted for.
Lastly, it is worth mentioning that the proposed approach separates the wheel contribution fairly accurately even when its spectrum is far below the rail spectrum-compare ordinate axes between e.g. Fig. 13 (a) and (b) at 630 Hz. This strongly suggests that the use of sparse regularization methods may lead to accurate separation even in cases of poor signal-to-noise ratio.

Multipolar wheel sound radiation
One may wonder whether simple moving monopoles are sufficient to model the radiation of wheels, specially at higher frequencies since more complex modal patterns arise. Naturally, more complex modal patterns would result in a more complex sound field, of potentially multipolar nature. In this section we address this by appending a dictionary of moving dipoles to represent the wheel contribution. In mathematical terms, define the pressure field at the m -th microphone due to ˆ N moving dipoles as [22] where, similarly to Eq. (4) , the two terms of υ nm correspond to the far-and near-field components of the sound pressure.
The pressure contribution of the ˆ N moving dipoles at the M microphones is written in matrix form as is the moving dipole dictionary. Appending the sound field due to the moving dipoles means the pressure field at fixed frequency ω and time t , recorded at the microphone array, is redefined as Then, the matrix defined in Eq. (16) is redefined as ˆ ≡ˆ (ω, t ) ∈ C MB ×(N+ ˆ N ) B , the concatenation of monopole and dipole dictionaries, and the solution vector to the LASSO program (15) is modified accordingly, Lastly, the wheel contribution to the sound pressure at the array is obtained for all frequencies as ˆ . In what follows we set ˆ N = N, and the problem parameters presented in Table 1 remain unchanged. Table 3 lists the SPL spectra of the wheel using only moving monopoles, using both moving monopoles and moving dipoles, and the reference spectra predicted by TWINS.
The most noticeable differences between the two cases-using only monopoles versus using both monopoles and dipoles-appear at 500 Hz and 2 kHz, with an effective increase of the estimated wheel contribution by 0.5 dBA. Such an increase indeed makes the estimated wheel contribution closer to the reference predictions. In the remainder of the frequency range, however, the differences are rather negligible, in the order of 0.1 dBA. Similar results are obtained for the two other train speeds of 40 and 160 km/h, not reported here for the sake of brevity. From these findings, we suspect that using higher-order multipole expansions will not result in considerable differences.

Discussion
It is straightforward to compare the proposed method with the wave signature extraction (WSE) method [7] . On the one hand, the method proposed here estimates the bending wavenumbers during the pass-by with the matching pursuit optimization program, whereas such estimates of the WSE method rely on sufficiently long recordings of the sound field before and after the pass-by . Thus, the proposed approach circumvents altogether the dependency of available recording time (before and after the pass-by) on the rail contribution estimation, a crucial limitation of the WSE method for measurements on tracks with higher decay rates. Yet another disadvantage of the WSE method is its inability to estimate the wheel contribution: it can only provide an estimate of the energetic difference between the total pass-by noise and the rail noise. Such an estimate most often leads to an over-estimation of the wheel noise, as shown in Figs. 12-14 . In contrast to this, the method proposed here performs a (sparse) linear regression to the residual between the total pass-by noise and the estimated rail contribution, using a physical model of the moving wheels into moving equivalent sources.
A few words can be said about other state-of-the-art methods. According to the report by Thompson et al. [8] , only two methods have been proposed to identify the wheel component: beamforming and pass-by analysis (PBA). Beamforming consistently over-estimates the wheel contribution, with differences of up to 20 dBA at 1 kHz (see Fig. 19(a) in [8] ). The two variants of the PBA method studied in that paper either over-estimate the wheel contribution significantly at frequencies below 1.6 kHz, or produce inconsistent estimates with respect to the reference results (see Fig. 19(b) in [8] ). One must however be careful in comparing the data obtained here with the data reported there, as the measurement distances are different. The application of the proposed method at other measurement distances should be performed in order to make a fair comparison. It is at any rate promising that the wheel contribution estimated with the proposed method is in fairly good agreement with the reference predictions by TWINS.

Conclusions
This paper introduces a method to separate the rail and wheel contributions to pass-by noise, by means of sparse regularization of microphone array data. The main contribution of this work is to enable the joint separation of the two components without requiring static tests prior to the pass-by, using a collection of microphones near the track. Because sparse regularization methods are used to separate the two noise components, requirements regarding signal sparsity and sensing coherence must be met. Based on the existing knowledge about the sound radiation of the rail and the wheels, the sound field radiated by the rail is assumed to be sparse in a dictionary of band-limited plane waves; and the field radiated by the wheels is assumed to be sparse in a dictionary of equivalent sources that move with the speed of the train. An analysis of the sensing coherence between the two dictionaries is included, showing that band-limited plane waves and moving sources are incoherent within the array aperture. A numerical assessment is also included, showing the performance of the proposed method to varying dictionary sizes; as well as the bandwidth of the plane wave dictionary, demonstrating the need for band-limited plane waves. Lastly, the proposed method is experimentally tested with measurements taken during the European project H2020-636032-ROLL2RAIL. Overall, the separated spectra of the rail and the wheel are in good agreement with the reference data predicted by the TWINS software.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.