Physics-constrained adaptive kernel interpolation for region-to-region acoustic transfer function: a Bayesian approach

A kernel interpolation method for the acoustic transfer function (ATF) between regions is proposed that is constrained by the physics of sound while remaining adaptive to the data. Most ATF interpolation methods aim to model the ATF for a fixed source by fitting the estimate to the measurements without taking the physics of the problem into consideration. We aim to interpolate the ATF in a region-to-region setting, meaning we account for variation of both source and receiver positions. Starting from a very general formulation of the reproducing kernel function, we construct a kernel that treats the directed and residual fields as two separate kernel functions. The directed field kernel captures a sparse selection of reflective field components with large amplitudes and is formulated as a combination of directional kernels. The residual field comprises the remaining densely distributed components with lower amplitudes; its kernel weight is represented by a universal approximator, a neural network, so that it can freely learn patterns from the data. The kernel parameters are learned using Bayesian inference, both under the assumption of Gaussian priors and with a Markov chain Monte Carlo sampling method that performs inference in a more directed manner. We compare all established kernel formulations with each other in numerical simulations, showing that the proposed kernel model is capable of properly representing the complexities of the ATF.


Introduction
Predicting how an environment alters a sound wave propagating within it is a complex problem with no closed-form solution. The effects that scatterers, the boundaries of the environment, and other such environmental factors can have on sound waves are not fully understood. Assuming the environment has time-invariant acoustic characteristics and alters sound linearly, we can model these changes rather effectively by studying the behavior of the room impulse response (RIR) between any source and receiver positions in the environment, and specifically its frequency response, the acoustic transfer function (ATF), which has interesting physical properties we can utilize for interpolation. Our main concern in this paper is the ATF for variable source and receiver positions within regions, measured at a discrete set of points.
Most current methods for ATF and RIR interpolation are limited to point-to-point, meaning a single source and receiver, or point-to-region, meaning a single source with variable receiver positions [1]. Examples of such methods include modeling ATFs as standard linear time-invariant pole/zero systems [2,3], approximating them using elementary wave functions [4-6], embedding physical properties using physics-informed neural networks [7], and reconstructing RIRs from data by exploiting spatial patterns and similarities using data-driven universal approximators [8,9]. These methods do not account for source position variation and impose restrictions on the environment that are not broadly applicable, such as requiring large data sets with specific sampling point distributions. We aim to create an ATF model that can be employed broadly.
Thus, our objective is to create a region-to-region ATF interpolation approach that is constrained by the laws of physics. Specifically, we want to construct a linear estimator that always satisfies the Helmholtz equation [10]. In [11], such an estimator was proposed by considering the ATF to be the superposition of a direct and a reverberant component, while approximating the latter with a truncated series expansion of wave function solutions to the Helmholtz equation. Previously, we extended this method to the equivalent of a series expansion of infinite order by using kernel ridge regression [12,13]. We incorporated the physics of the problem into the kernel function by representing the reverberant component as a continuous superposition of plane waves. We later extended this formulation to target reflective field components of higher amplitude while remaining insensitive to components of lower amplitude [14].
We propose the use of a fully adaptive kernel function for the ATF that treats the directed and residual sound fields as separate kernel functions added together, creating a complete model for the reverberant field caused by the environment. The directed reverberation field is caused by a sparse set of reflective components with much greater amplitudes than average. This field is modeled based on a point-to-region kernel proposed in [15] to interpolate the sound field using von Mises-Fisher distributions [16]. The residual component is composed of the remaining densely distributed plane wave components of lower amplitude. Because these densely distributed higher order reflections of low amplitude add up together, we employ a neural network (NN) for the weight function. Using our generalized formulation for the ATF, this weight function can fully represent the nuances of the ATF. This kernel function formulation was initially proposed in [17] and is explored further in this work. An analogous formulation was also adapted for the point-to-region case in [18].
Previously, we optimized the parameters of the kernel functions using robust loss functions [14] and simple least squares [17] combined with other optimization techniques [19]. In this work, we take a probabilistic approach and instead propose two different optimization schemes based on Bayesian inference [20]. One of them is the well-established Gaussian process regression (GPR) [21], or Kriging, which represents the optimal solution for kernel ridge regression under the assumption of Gaussian priors [22], in contrast to the simple least squares used in [17]. The other approach utilizes probabilistic programming [23] to infer the effects the kernel parameters have on the resulting distribution using a Markov chain Hamiltonian Monte Carlo (MCHMC) sampling method [24,25], specifically the No-U-Turn Sampler (NUTS) [25], a variant of MCHMC. We combine these more sophisticated techniques with a more general formulation that does not assume the amplitude or phase of the direct component to be known, unlike in [17,18], which makes the method more broadly applicable.
The remainder of this paper is organized as follows: in Sect. 2, we discuss the necessary preliminary knowledge on the ATF and the formulation of the central problem from which we derive the kernel functions, and briefly discuss previous kernel formulations. In Sect. 3, we discuss the formulation of the adaptive kernel and justify the choices made in its design. Next, we go over the optimization of the model parameters using Bayesian inference in Sect. 4, deriving both GPR- and MCHMC-based estimations. Following that, we go over the numerical experiments and the results of the attempted interpolations. Finally, we discuss the totality of our findings in the conclusion.

Preliminaries and objectives
Suppose that there is a space Ω ⊂ R³ with stationary acoustic properties. Suppose also that the speed of sound is c and the frequency is f. Consider that Ω contains two simply connected regions of interest with arbitrary geometry, referred to as the source region Ω_S ⊂ Ω and the receiver region Ω_R ⊂ Ω. The ATF h : R³ × R³ × R → C between a monopole source placed at position s ∈ Ω_S and a pressure microphone placed at receiver position r ∈ Ω_R, with wave number k = 2πf/c, is written as h(r|s, k). Henceforth, the wave number k will be omitted from the arguments for notational simplicity unless explicitly necessary.

Physical model of ATF
The behavior of h depends on the properties of Ω; however, there are properties of the ATF model that we assume to hold regardless of environment. The first is that the ATF can be divided into a direct component h_D and a reverberant component h_R [11] as

h(r|s) = h_D(r|s) + h_R(r|s). (1)

The second property is that the form of h_D is considered to be known:

h_D(r|s) = α_0 G_0(r|s) = α_0 e^{ik‖r−s‖} / (4π‖r−s‖), (2)

where G_0 is the free-field Green's function [10] and α_0 is a multiplicative coefficient unknown to us. This model is similar to that of the exterior field with a scatterer proposed in [26], and could plausibly be extended to multipole sources using a similar formulation.

The third property is that, as Ω_S and Ω_R are free of scatterers and of sources external to those relevant to the measurements, the reverberant component has an unknown form with known physics [13], namely the homogeneous Helmholtz equation applied to both source and receiver coordinates:

(∇²_r + k²) h_R(r|s) = 0, (3)

(∇²_s + k²) h_R(r|s) = 0, (4)

where ∇²_r is the Laplacian operator applied only to the receiver position coordinates and ∇²_s is the Laplacian operator applied only to the source position coordinates.

Finally, as our source is a monopole, the ATF is considered to be reciprocal [27]: interchanging the roles of source and receiver should not change the measured pressure field. In other words,

h(r|s) = h(s|r). (5)

Given that h_D has a straightforward formulation that can be expressed with a single coefficient, we must define a feature space H for the reverberant component that satisfies all of the properties expanded upon here while also allowing for solutions we can feasibly derive.
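As an illustration, the direct component of (2) can be evaluated directly. The following Python sketch computes G_0; the frequency, speed of sound, and positions are illustrative values, not taken from the paper.

```python
import numpy as np

def green_free_field(r, s, k):
    """Free-field Green's function G_0(r|s) = e^{ik||r - s||} / (4*pi*||r - s||)
    of Eq. (2); r and s are 3-D positions, k is the wave number."""
    d = np.linalg.norm(np.asarray(r, float) - np.asarray(s, float))
    return np.exp(1j * k * d) / (4.0 * np.pi * d)

# The direct component is h_D = alpha_0 * G_0 for an unknown coefficient alpha_0.
k = 2.0 * np.pi * 1000.0 / 343.0   # wave number at 1 kHz with c = 343 m/s
g = green_free_field([0.5, 0.0, 0.0], [0.0, 0.0, 0.0], k)
```

The amplitude of G_0 decays as 1/(4π‖r − s‖), while the phase advances with the propagation distance.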

Problem statement
Suppose that we distribute a total of L point sources in the source region at known positions {s_l}_{l=1}^{L} ⊂ Ω_S and M pressure microphones in the receiver region at positions {r_m}_{m=1}^{M} ⊂ Ω_R, as shown in Fig. 1. We then record all possible N = LM ATF samples between each source and microphone position pair as

y_n = h(r_m | s_l) + ε_n, (6)

where n = m + (l − 1)M is the index of the pairs. We also define the vectors of measurements y = [y_1, …, y_N]^T and noises ε = [ε_1, …, ε_N]^T, here not assumed to have any specific probabilistic distribution.
Given the properties outlined in Sect. 2.1, our objective is to define a probabilistic model constrained by those physical requirements. We can find the most probable model parameters given our data through

p(α_0, h_R | y) = p(y | α_0, h_R) p(α_0, h_R) / p(y), (7)

where p(α_0, h_R | y) is the posterior distribution, H is the functional space containing the reverberant component, p(y | α_0, h_R) is the likelihood, p(α_0, h_R) is the prior, and p(y) is the marginal probability distribution associated with measuring y, a quantity we cannot know for certain and that we cannot alter. This formula, known as Bayes' theorem [20], indicates that optimizing the posterior is equivalent to optimizing only the numerator, meaning our model parameters are estimated with a maximum a posteriori (MAP) approach. This estimation is determined as

(α̂_0, ĥ_R) = arg max_{α_0 ∈ C, h_R ∈ H} p(y | α_0, h_R) p(α_0, h_R), (10)

where α̂_0 is the direct component coefficient and ĥ_R is the interpolation function of the reverberant component.
The interpolation function of the direct component ĥ_D is obtained by using the value of α̂_0 in (2). Then, our final interpolation function ĥ is the sum of both.
In order to solve (10), we need to properly define H as well as determine the associated distributions. It is important that the feature space of the ATF allows for a tractable formula to compute the interpolation function. We opted to define H as a reproducing kernel Hilbert space (RKHS), which yields a comprehensive and very general framework.

Reproducing kernel Hilbert space for reverberant component
Kernel methods have many applications in machine learning and statistical modeling [21,22] that make the formulation of our feature space as a RKHS especially attractive. A RKHS is a type of functional space that is also a Hilbert space, meaning a complete inner product space [28], equipped with an inner product ⟨·, ·⟩ and a kernel function κ called the reproducing kernel of H, which is unique [29].
The impetus behind defining such a particular space is the representer theorem [29,30], which states that any empirical optimization criterion can be optimized using a linear estimator based on the reproducing kernel evaluated at the measurement arguments. An empirical optimization criterion in this context means the optimization of some objective function based on empirical measurements, for which (10) qualifies. In other words, the interpolation function of the reverberant component can be guaranteed to be of the form

ĥ_R(r|s) = Σ_{n=1}^{N} α_n κ(r|s, r_n|s_n) = α^T κ(r|s),

where α ∈ C^N is the coefficient vector, (r_n|s_n) denotes the n-th measurement pair, and κ(r|s) is the functional vector collecting the kernel evaluations. Therefore, as long as we define the space H so that it has the desirable properties for the reverberant component, our interpolation function will satisfy those conditions as well.

Formulation of RKHS with directional weighting
Given that the sound field interpolation problem and the ATF interpolation problem are closely related, we look to the sound field estimation problem and the usage of the Herglotz wave function [31,32] as a way of defining a general RKHS [15,33]. This allows for the creation of an equivalent formulation for the ATF [14,17], except defined on two directional components instead of one:

h_R(r|s) = ∫_{S²} ∫_{S²} h̃_R(r̂, ŝ) e^{ik(r̂·r + ŝ·s)} dŝ dr̂ =: T[h̃_R; r|s],

where S² ⊂ R³ is the unit sphere, representing the directional space, ŝ ∈ S² is the directional component relating to the source position, r̂ ∈ S² is the one for the receiver, and h̃_R represents both the phase and amplitude of each directional pair.
Using this formulation, the inner product space (H, ⟨·, ·⟩_H) can be defined as [14,17]

⟨g_1, g_2⟩_H = ∫_{S²} ∫_{S²} g̃_1(r̂, ŝ) g̃_2(r̂, ŝ)* / w(r̂, ŝ) dŝ dr̂,

where (·)* is the complex conjugate, g_1, g_2 ∈ H are generic functions in the feature space with directional representations g̃_1, g̃_2, w : S² × S² → R_+ is the directional weighting function, and L²(S² × S², w) represents the square-integrable functions on the domain S² × S² for the weight function w.
The relation h̃_R(r̂, ŝ) = h̃_R(ŝ, r̂) guarantees the reciprocity of the ATF, meaning (H, ⟨·, ·⟩_H) is an inner product space that satisfies our model requirements outlined in Sect. 2.1 regardless of the weight function w. Because H inherits the completeness of L² spaces, it is a Hilbert space.
We can also define a bivariate kernel function κ(·|·, r|s) ∈ H associated with this inner product. Operating with it on a generic function g_R ∈ H shows that

⟨g_R, κ(·|·, r|s)⟩_H = g_R(r|s),

to which we can apply the reciprocity to show that the same holds for κ(·|·, s|r), meaning κ is a reproducing kernel for a wide variety of weight functions w and thus our space is indeed a RKHS. Therefore, we have constructed a physical model guaranteed to satisfy the Helmholtz equation and to be reciprocal, while the weight function w can freely learn using only data-driven methods.

Uniform weight
In [12], we proposed a kernel ridge regression (KRR) method for ATF interpolation. In [13], we showed that the proposed model was equivalent to a uniform weight, i.e., a constant weight function w(r̂, ŝ) ∝ 1, for which the kernel factors into products of zeroth-order spherical Bessel functions of the form j_0(k‖r − r′‖). While this kernel function can be sufficient in many problem configurations, its lack of parameters harms its performance, as it cannot adapt to the properties of the environment.
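Assuming the j_0-product form noted above, a minimal sketch of the uniform-weight ATF kernel could read as follows; the symmetrization over the two position pairs mirrors the combination rule used later for the directed and residual kernels, and the test positions are illustrative.

```python
import numpy as np

def j0(x):
    """Spherical Bessel function of order zero, j_0(x) = sin(x)/x."""
    return np.sinc(x / np.pi)   # np.sinc(y) = sin(pi*y)/(pi*y)

def kappa_uniform(r, s, rp, sp, k):
    """ATF kernel under a constant directional weight: products of j_0
    factors, symmetrized over (r, s) and (r', s') to enforce reciprocity."""
    f = lambda a, b: j0(k * np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))
    return 0.5 * (f(r, rp) * f(s, sp) + f(r, sp) * f(s, rp))

# Reciprocity check: swapping source and receiver leaves the kernel unchanged.
v1 = kappa_uniform([0.1, 0, 0], [2.0, 0, 0], [0.0, 0.2, 0], [1.8, 0.1, 0], k=10.0)
v2 = kappa_uniform([2.0, 0, 0], [0.1, 0, 0], [1.8, 0.1, 0], [0.0, 0.2, 0], k=10.0)
```

Note that the kernel has no free parameters beyond the wave number, which is exactly the limitation discussed above.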

Sunken sphere weight
Due to the separation of the direct and reverberant fields, the wave components associated with directions similar to the direct path should be attenuated, as those components are already present in the direct field function. For that reason, the authors proposed a weight function in [14] that rejects components in the direction η̂_0 connecting the centers of the regions and promotes lateral components. This weight will be referred to as the sunken sphere weight w_sk, due to the shape of its gain plot [14].
We guarantee reciprocity by making the weight function w_sk separable, meaning w_sk(r̂, ŝ) = ẘ_sk(r̂) ẘ_sk(ŝ). The auxiliary weight ẘ_sk determines the auxiliary kernel function κ̊_sk(x, x′), where x and x′ are positions, x̄ = x − x′, Re{·} denotes the real part, and γ_sk, β_sk are parameters of the weight. The resulting reverberant component kernel function κ_sk can be expressed in terms of the auxiliary kernels as

κ_sk(r|s, r′|s′) = ½ [ κ̊_sk(r, r′) κ̊_sk(s, s′) + κ̊_sk(r, s′) κ̊_sk(s, r′) ].

This adaptive kernel is capable of delivering better estimations than the uniform kernel [14], but it still represents a rather limited model for the reverberant field.

Adaptive kernel function for directed and residual reverberation
Both the uniform and sunken sphere kernels lack the necessary complexity to fully represent the features of the reverberant component for arbitrary environments. The uniform kernel assumes equal gain for all directions, which is unrealistic, as we expect incoming wave components to be stronger or weaker depending on the relative positioning of Ω_R within Ω. The sunken sphere kernel allows for greater complexity, but it still assigns the same gain to directions coaxially distributed around the antibias direction η̂_0.
We propose a kernel function that is fully adaptive to the environment by defining a combined kernel function that addresses the directed and residual reverberant fields as separate models. The directed field component represents a select set of reflective field components with high amplitude, expected to result from early reflections on the boundary of Ω and possible constructive interference. The residual field represents the remaining components of the reverberant field, an infinite superposition of reflections with weaker directionality.

Directed kernel function
The directed field kernel function aims to represent wave components with strong directionality, and as such is directed towards a sparse set of bias directions where the weight w_dir shows higher gains. The auxiliary weight ẘ_dir can be expressed as

ẘ_dir(r̂; β, γ) = (1/4π) Σ_{d=1}^{D} γ_d C(β_d) e^{β_d r̂·η̂_d}, with ‖γ‖_1 = 1, (25)

where {η̂_d}_{d=1}^{D} ⊂ S² are the bias directions, β collects the gain coefficients of each direction, γ collects the combination coefficients of the directions, and C is the normalization function for the kernel. This auxiliary weight is a combination of von Mises-Fisher distributions, chosen both for their rapid growth, which promotes greater amplitudes, and for their convenient properties in directional statistics [16], which allow a closed-form solution for the auxiliary kernel κ̊_dir [34]. It was initially introduced in [15]. The ℓ1-norm criterion for γ also induces sparsity, coherent with our assumptions about the directed field.

We can then obtain the ATF weight w_dir and kernel κ_dir in a similar manner to the sunken sphere ATF kernel as

w_dir(r̂, ŝ) = ẘ_dir(r̂) ẘ_dir(ŝ),

κ_dir(r|s, r′|s′) = ½ [ κ̊_dir(r, r′) κ̊_dir(s, s′) + κ̊_dir(r, s′) κ̊_dir(s, r′) ], (31)

which always guarantees that reciprocity is upheld. Given that j_0 is an entire function [35] and even, compensating for the square root, this kernel is differentiable everywhere with respect to both parameters, making it amenable to optimization with simple descent criteria. This kernel function is much more nuanced than the sunken sphere kernel, being able to represent more directions with coefficients learned from data. The directed weight has been shown to be very effective at representing reflections with very strong directionality [15]; however, the residual field, composed of densely distributed lower amplitude wave components, could not be entirely represented. For that reason, this kernel is combined with a residual kernel.
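To make the von Mises-Fisher construction concrete, the following sketch evaluates a directional weight of the form of (25); the normalizer β/(4π sinh β) is the standard vMF normalization constant on S², and all hyperparameter values here are illustrative.

```python
import numpy as np

def vmf_weight(d_hat, etas, beta, gamma):
    """Directed auxiliary weight: convex combination of von Mises-Fisher
    densities on the unit sphere (cf. Eq. (25)).
    d_hat: (3,) unit direction; etas: (D, 3) unit bias directions;
    beta: (D,) concentrations; gamma: (D,) weights with ||gamma||_1 = 1."""
    dots = etas @ d_hat                         # cosine with each bias direction
    C = beta / (4.0 * np.pi * np.sinh(beta))    # vMF normalizer in 3 dimensions
    return float(np.sum(gamma * C * np.exp(beta * dots)))

# Monte Carlo check: the weight integrates to ~1 over S^2 since gamma sums to 1.
rng = np.random.default_rng(0)
u = rng.normal(size=(20000, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)
vals = [vmf_weight(d, np.eye(3), np.array([5.0, 2.0, 1.0]),
                   np.array([0.5, 0.3, 0.2])) for d in u]
est = 4.0 * np.pi * np.mean(vals)   # ~ 1 up to Monte Carlo error
```

The weight concentrates gain around the bias directions, with β_d controlling the sharpness of each lobe and γ_d its relative contribution.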

Residual kernel function
This kernel was created to address the expected behavior of the residual field [17], composed of numerous lower amplitude, higher order components that could not be adequately addressed with the simpler directed field model. This field is more challenging to characterize than the directed component, as it comprises densely distributed wave components generating directional patterns that are harder to represent mathematically. Instead, this kernel approximates the expression in (17) with a more flexible weight function, without as many predetermined biases as the directed component. As we have only a shallow understanding of the expected behavior of this field, we opt to use a universal approximator, in this case a NN, as the weight function so that it can adapt to patterns learned from data.
We can impart into the weight the known properties of the residual field by carefully choosing the architecture of the NN and the field representation. In the interest of easing the numerical load and reducing redundant calculations, we have opted to make this weight separable as well, thus directly enforcing reciprocity. The auxiliary weight ẘ_res and auxiliary kernel κ̊_res can thus be expressed as

ẘ_res(r̂) = NN(r̂; θ), (32)

κ̊_res(x, x′) = NI[ ∫_{S²} ẘ_res(r̂) e^{−ik r̂·x̄} dr̂ ], (33)

where NN is a fully connected neural network receiving as argument a direction vector r̂ ∈ S², x̄ = x − x′, NI[·] indicates that the integral is performed numerically, and θ are the parameters of the NN. The ATF weight w_res and kernel κ_res are obtained as combinations of the auxiliary kernels, like the previous kernels:

w_res(r̂, ŝ) = ẘ_res(r̂) ẘ_res(ŝ), (34)

κ_res(r|s, r′|s′) = ½ [ κ̊_res(r, r′) κ̊_res(s, s′) + κ̊_res(r, s′) κ̊_res(s, r′) ]. (35)

This kernel function is not expected to perform properly by itself. NNs are structured to take advantage of patterns in data, meaning that representing sudden localized growth requires either a special architecture or a deep network trained on external data, which would carry too many parameters to be feasibly trained under the constraints of our problem. In other words, this kernel is not equipped to represent the directed field, but it is expected to perform well in conjunction with the directed kernel.
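A rough numerical sketch of (32)-(33) follows, with a toy stand-in network and a crude Monte Carlo spherical grid in place of the Lebedev quadrature used in the experiments; the network sizes and random parameters are placeholders, not the trained weights of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in "network" 3 -> 8 -> 1 with a final ReLU so the weight stays
# nonnegative (the paper's network uses two 12-neuron layers with rowdy
# activations; this is only a placeholder).
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

def nn_weight(d_hat):
    h = np.tanh(W1 @ d_hat + b1)
    return float(np.maximum(W2 @ h + b2, 0.0))

# Crude uniform Monte Carlo grid on the sphere standing in for Lebedev points.
Q = 500
dirs = rng.normal(size=(Q, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
quad_w = np.full(Q, 4.0 * np.pi / Q)

def kappa_res_aux(x, xp, k):
    """Numerical approximation of the auxiliary kernel (33): the NN weight
    integrated against plane-wave phases over the discretized sphere."""
    xbar = np.asarray(x, float) - np.asarray(xp, float)
    weights = np.array([nn_weight(d) for d in dirs])
    return np.sum(quad_w * weights * np.exp(-1j * k * dirs @ xbar))
```

By construction, κ̊_res(x, x) is real and nonnegative (it is the integral of the nonnegative weight), and κ̊_res(x, x′) is the conjugate of κ̊_res(x′, x), matching the Hermitian symmetry expected of a kernel.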
The integration grid is not a trainable parameter of the kernel, but it is still malleable, allowing for resampling in order to balance model complexity against precision in the computation of the integral. As this estimation is essentially a discretization of (17), the defined kernel has an associated inner product and thus approximates a properly defined kernel function.
The NN weight employed here is a continuous function capable of estimating an arbitrary number of weight values with a fixed set of parameters. Given that our objective is to create kernel functions for interpolation between source and receiver arrays, the expectation is that data will be relatively scarce, as experiments with variable source position are difficult to perform. For that reason, this network should remain compact so as to avoid overfitting. This creates an issue of representational power, which we addressed by employing a technique used primarily for keeping NNs memory-efficient (Fig. 2).

Network architecture
The architecture of the network was meant to remain compact, meaning we did not assign many hidden layers to make it a deep neural network (DNN), but rather employed 2 hidden layers with 12 neurons each. We also employed an extra layer that simply applies a ReLU function, ReLU(x) = max(x, 0), to the output of the previous layer in order to keep the weight nonnegative, one of the few requirements we imposed on it. For the rest of the network, we employed a special type of activation function, the rowdy activation [36].
The rowdy activation function belongs to a special class of activation functions that are also adaptive to the data. That is to say, the rowdy activation carries network parameters that are few in number but influence the entire network. We define the rowdy activation function ϕ as

ϕ(x) = ϕ_0(x) + Σ_{t=1}^{T} τ_t sin(ω_t x) + Σ_{t=T+1}^{2T} τ_t cos(ω_t x), (36)

where ϕ_0 represents a standard activation function, in our case tanh, t is an iterator, T is the number of sinusoidal perturbations, {ω_t}_{t=1}^{2T} are the angular frequencies of the sinusoidal functions, and {τ_t}_{t=1}^{2T} are the amplitudes. As tanh is analytic on the real line [35] and both sin and cos are entire functions, this activation function is always differentiable and poses no problems for the learning of the NN.
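A small sketch of this activation as reconstructed in (36); with all amplitudes τ set to zero it reduces exactly to the base tanh.

```python
import numpy as np

def rowdy(x, omega, tau):
    """Rowdy activation of Eq. (36): a tanh base plus T sine and T cosine
    perturbations; omega and tau each have length 2T."""
    omega, tau = np.asarray(omega, float), np.asarray(tau, float)
    T = len(omega) // 2
    out = np.tanh(x)
    for t in range(T):
        out = out + tau[t] * np.sin(omega[t] * x)
    for t in range(T, 2 * T):
        out = out + tau[t] * np.cos(omega[t] * x)
    return out

x = np.linspace(-2.0, 2.0, 5)
print(np.allclose(rowdy(x, [1.0, 1.0], [0.0, 0.0]), np.tanh(x)))  # True
```

The trainable frequencies and amplitudes let a small network express oscillatory directional patterns that a plain tanh network of the same size would struggle to fit.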

Parameter optimization based on Bayesian inference
The introduction of Bayesian inference methods has had a monumental impact on machine learning and on inverse problems as a whole [20,37]. By taking uncertainties in our model into account, we can derive estimations that are less sensitive to unexpected observations. It is very unlikely for a function resulting entirely from randomness to be continuous, much less differentiable [38]. For that reason, random signals such as noise have a very low probability of showing the same patterns as a signal satisfying a differential equation, which requires differentiability. By imposing physical constraints on our methods, we can guarantee differentiability. For a proper choice of partial differential equation, our data models should therefore serve as strong priors.
Considering the physics of the ATF outlined in Sect. 2, we can create models coherent with the behavior of the ATF satisfying the Helmholtz equation. This makes the process of Bayesian inference more guided. We experimented with two different Bayesian inference techniques: GPR and MCHMC.

Gaussian process regression
GPR, also known as Kriging [21], is the solution to our problem statement when the signal and noise are modeled as Gaussian distributions and the variance of the noise is known.
Consider that, by evaluating the model on the measurement points, we obtain

y = α_0 G + K α + ε, (38)

where G is the vector of direct components and K is the Gram matrix. The optimization of α can be performed using kernel ridge regression [17,22] and the optimization of α_0 using least squares [19] as

α̂ = (K + λ I)^{−1} (y − α̂_0 G), (41)

α̂_0 = [G^H (K + λ I)^{−1} y] / [G^H (K + λ I)^{−1} G], (42)

where λ = σ² is the regularization constant, which also corresponds to the variance of the noise. Under these assumptions, we can rewrite (10) as an optimization problem only over the parameters of the model χ, to find

χ̂ = arg max_χ p(y | χ).

The parameters χ are internal to the kernels and are thus dependencies of the Gram matrix K and of the coefficients α and α_0. The detailed derivation of this estimation is explored in Appendix 1.
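The coefficient estimates (41)-(42) can be sketched as follows; the closed form used for α_0 here is one consistent generalized-least-squares choice under the model (38) and is an assumption of this sketch, not necessarily the exact expression derived in the appendix.

```python
import numpy as np

def fit_coefficients(K, G, y, lam):
    """Jointly estimate alpha_0 and alpha for the model y ~ alpha_0*G + K@alpha,
    with regularization lam = sigma^2.
    K: (N, N) Gram matrix; G: (N,) direct components; y: (N,) measurements."""
    A = np.linalg.inv(K + lam * np.eye(len(y)))        # (K + lam*I)^{-1}
    alpha0 = (G.conj() @ A @ y) / (G.conj() @ A @ G)   # direct-path coefficient
    alpha = A @ (y - alpha0 * G)                       # kernel ridge regression
    return alpha0, alpha
```

As λ → 0 the fitted model reproduces the measurements exactly, since the residual shrinks with the regularization; larger λ trades fit for robustness to noise.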
The optimization of χ is performed with gradient descent [22] for the parameters of the residual kernel, and with the reduced gradient method [19] for the directed component, in order to determine the descent direction of γ while keeping the unit-norm condition [17].
GPR is a well-established technique that supersedes simple KRR [21]. However, it relies on both data and noise having normal distributions of known variance. We next study a method that can quantify uncertainty in the estimations.

MCHMC regression
Most Bayesian models need assumptions about the distributions involved. One rather common assumption is that of Gaussian priors [38]. GPR is a widely used technique, attractive because the form of the posterior is tractable and straightforward to optimize. However, the assumed posterior might be nonconvex with respect to the kernel parameters and might have saddles and local minima that hinder optimization.
Another option is to estimate the parameters of the problem using probabilistic programming. We use a similar approach for regression as [39] did for the related problem of sound field reconstruction, meaning we impose that our model parameters χ are entirely composed of random variables. We evaluate the posterior based on samples of χ taken using a probabilistic programming language [23]. We also assume a Markov process, meaning the next sampled state χ_{t+1} depends only on the current state χ_t, with the probability densities connected by a transition function τ_t such that

p(χ_{t+1}) = ∫_X τ_t(χ_{t+1} | χ_t) p(χ_t) dχ_t, (43)

where X represents the values our random variable can take, and the transition function is a conditional probability density function that informs the sampling strategy we are using.
When the transition function preserves the underlying posterior distribution, our samples will tend towards the typical set for a large enough number of randomly generated samples [24]. An example of a sampling strategy that does so is Hamiltonian Monte Carlo, which makes it a very potent sampling strategy. Our chosen sampler, NUTS [25], is a variation of Hamiltonian Monte Carlo.
With a sufficiently large number of samples, we can obtain the probability of observing y conditioned on our distribution of χ as p(y|χ), at which point we can perform Bayesian inference on the parameters using Bayes' theorem. The use of MCHMC thus allows us to quantify the effects of our parameters on the model continuously, without needing to sample the model continuously, while approaching the typical set conditioned on the observations.
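For intuition, the following is a minimal random-walk Metropolis sketch of a posterior-preserving Markov transition; NUTS replaces the random walk with gradient-guided Hamiltonian trajectories and is considerably more sample-efficient, but the invariance principle illustrated here is the same.

```python
import numpy as np

def metropolis(log_post, chi0, n_samples=10000, step=1.0, seed=0):
    """Minimal random-walk Metropolis sampler: propose a Gaussian step and
    accept with probability min(1, posterior ratio). This transition leaves
    the target posterior invariant."""
    rng = np.random.default_rng(seed)
    chi = np.asarray(chi0, float)
    lp = log_post(chi)
    samples = []
    for _ in range(n_samples):
        prop = chi + step * rng.normal(size=chi.shape)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis acceptance rule
            chi, lp = prop, lp_prop
        samples.append(chi.copy())
    return np.array(samples)

# Sampling a standard normal "posterior" recovers its moments approximately.
s = metropolis(lambda c: -0.5 * np.sum(c ** 2), np.zeros(1))
```

In practice, the log posterior would combine the likelihood of (38) with the hierarchical priors on the kernel parameters χ described in the experimental settings.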

Experiments
We experimented with the methods in numerical simulations within a shoebox-shaped room of size 7.0 m × 6.4 m × 2.7 m, with the center of the room being the origin of the coordinate system. The simulations were conducted using the image source method [40]. The reflection coefficients and the maximum order of terms for the calculations were obtained using pyroomacoustics [41]. We also generated noise with a probabilistic wind noise simulator [42,43]. The choice of wind noise was due to two factors: first, wind noise is a relevant form of unpredictable, probabilistic noise, as it can result from devices such as air conditioning units; second, wind noise has a much less predictable probability distribution than Gaussian noise, so it should serve to establish how well each method performs under less than ideal conditions.
The source region Ω_S was considered to be a sphere of radius 0.5 m centered at (0, 0, 0) m. The receiver region Ω_R was also a sphere of radius 0.5 m, but centered at (−2.5, −1.75, −0.2) m. The source and receiver positions were both placed in identical dual-layer spherical arrays with a disposition determined by spherical t-design [44,45] with t = 4 in both layers, resulting in a total of 50 points per region and 2500 measurements of the ATF. These configurations can be seen in Fig. 3.
Noise signals were added to the measurements, with their standard deviation adjusted so that each sample had a signal-to-noise ratio (SNR) of 20 dB, as the ATF interpolation problem is performed under quite controlled conditions. We experimented with three different reflection coefficients for the walls, equivalent to reverberation times T_60 ∈ {0.2, 0.4, 0.8} s. Finally, each frequency was multiplied by a complex number chosen randomly with a complex uniform distribution, meant to represent the direct field component, as it was also part of the estimation.
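The noise scaling can be sketched as follows; the noise signal itself is arbitrary (wind noise in the experiments, a Laplacian draw in this illustration), and only its power is adjusted to meet the target SNR.

```python
import numpy as np

def add_noise_at_snr(y, noise, snr_db):
    """Scale an arbitrary noise signal so that adding it to y yields the
    requested SNR in dB (20 dB in the experiments); the noise distribution
    itself is unconstrained."""
    y, noise = np.asarray(y), np.asarray(noise)
    p_sig = np.mean(np.abs(y) ** 2)
    p_noise = np.mean(np.abs(noise) ** 2)
    scale = np.sqrt(p_sig / (p_noise * 10.0 ** (snr_db / 10.0)))
    return y + scale * noise
```

Because only the second moment of the noise is matched, the same routine works for Gaussian, wind, or any other noise model.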
Our proposed formulations deriving the model parameters of the mixed kernel using GPR, Proposed (GPR), and NUTS sampling, Proposed (MCHMC), were compared to GPR for the uniform weight kernel, Uniform, and the sunken sphere kernel, Sunken sphere. In order to validate the choice of a dual kernel model and quantify the effects of adopting only one of the models, we also compared the proposed methods to the directed kernel by itself, Directed only, and the residual kernel by itself, Residual only. The numerical integrator in (33) for Residual only, Proposed (GPR), and Proposed (MCHMC) was based on Lebedev quadrature [46] of order 15. The bias directions {η̂_d}_{d=1}^{D} introduced in (25) for Directed only, Proposed (GPR), and Proposed (MCHMC) were calculated using Lebedev quadrature of order 7. Each kernel model was trained independently of the others, and the integration grid orders were chosen empirically.

Settings for parameter optimization
The parameters for the GPR estimation were obtained by optimizing the posterior log likelihood. The regularization constant λ was set to 10^{−2} for all the kernels, as its true value should not be considered a parameter of GPR; doing otherwise can lead to overfitting. The coefficients α and α_0 were determined as described in (41) and (42). The other kernel parameters were determined using gradient descent, apart from γ, which was optimized using the reduced gradient descent algorithm [19] with line searching in order to guarantee that the unit norm was preserved. The descent direction associated with γ is δ = [δ_1, δ_2, …, δ_D]^T, obtained from the gradient of L_opt, where L_opt represents (10) when the coefficients α_0 and α are given by (42) and (41), respectively, and γ_max is the current largest value in γ. This constrained optimization was implemented in the Julia programming language [47] using the Optim library [48,49] under the Flux framework [50].
For the MCHMC regression on the mixed kernel, the settings correspond to defining prior distributions for our model parameters and how they affect the posterior. For these priors, we want γ to be composed of only positive numbers and to promote sparsity. As these coefficients always act together with the coefficients β, we placed them in a hierarchical model in which each gain β_d follows an exponential distribution E and each combination coefficient γ_d follows an inverse gamma distribution Γ^{−1} conditioned on β_d. The logic behind this choice is simple: our hypothesis is that the directed component is composed of a select number of reverberant field components, hence the usage of sparsity-inducing criteria. The inverse gamma distribution skews towards smaller values of γ_d, while the exponential distribution was chosen for each β_d because our only requirement is that it be nonnegative. Additionally, if γ does turn out to be sparse, our directed and residual models are given further credence.
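To make the hierarchical prior concrete, the following draws one illustrative prior sample of (β, γ); the hyperparameter values are placeholders, not those of the paper, and γ is renormalized to satisfy the unit ℓ1 norm used by the directed kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 26  # illustrative number of bias directions (Lebedev order 7 has 26 points)

# One draw from a prior of the sketched hierarchical form:
# exponential gains beta_d >= 0, and inverse-gamma combination coefficients
# gamma_d > 0 skewed towards small values, renormalized so ||gamma||_1 = 1.
beta = rng.exponential(scale=1.0, size=D)
gamma = 1.0 / rng.gamma(shape=3.0, scale=1.0, size=D)   # inverse-gamma draws
gamma /= gamma.sum()
```

Most of the renormalized γ mass concentrates in a few directions under such a draw, which is the sparsity pattern the directed model hypothesizes.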
Given that our coefficient vector $\alpha$ has a total of 2500 parameters, we opted to keep it deterministic in order to lessen the computational load, and it was calculated using (41). However, for this estimation, the regularization constant was treated as a variable of the model, since MCHMC admits quantified uncertainty. As such, it was assigned an exponential distribution with a minimum value of $10^{-3}$. Another parameter we introduce for the sampling stage is the amplitude of the direct component $\alpha_0$, which serves as an alternative to the simple division applied in (42), as those operations can be rather unstable and costly, requiring repeated operations with the matrix inverse. Noise was quantified by assigning it a dedicated distribution. Finally, we have no conditions for our network parameters $\theta$; given the freedom embedded in these parameters, we assigned them Gaussian priors, as in (45)–(47), where $\mathcal{N}$ is the real-valued normal distribution.
Finally, the model was fitted to our observations by calculating the Gram matrix $K$, sampling the predictions $\hat{y}$, and adding the error to them; these were sampled 1000 times with the NUTS implementation in [23]. Once we obtained the posterior distribution of the data model, we used Bayes' theorem to obtain the marginal likelihoods of our parameters. Finally, the parameters were estimated by sampling the parameter space around the mean within one standard deviation and picking the samples with the highest likelihood.
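The sampling loop can be illustrated with a deliberately simple stand-in. The sketch below uses random-walk Metropolis (far cruder than NUTS, but the accept/reject structure is the same) to infer a single noise-scale parameter of a toy model $y = K\alpha + \varepsilon$; the data, the Exponential(1) prior, and the proposal scale are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data model: y = K @ alpha + noise, with unknown noise scale sigma.
K = rng.normal(size=(40, 40))
alpha = rng.normal(size=40)
y = K @ alpha + 0.3 * rng.normal(size=40)
resid = y - K @ alpha  # residuals at the (here fixed) coefficients

def log_post(sigma):
    """Unnormalized log posterior: Gaussian likelihood of the residuals
    with an illustrative Exponential(1) prior on the noise scale."""
    if sigma <= 0.0:
        return -np.inf
    return (-resid.size * np.log(sigma)
            - 0.5 * np.sum(resid**2) / sigma**2
            - sigma)

# Random-walk Metropolis: a simplified stand-in for the NUTS sampler.
chain, cur, lp = [], 1.0, log_post(1.0)
for _ in range(1000):
    prop = cur + 0.1 * rng.normal()
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
        cur, lp = prop, lp_prop
    chain.append(cur)
chain = np.array(chain)
```

The retained `chain` plays the role of the 1000 posterior draws; summarizing it (mean, standard deviation, high-likelihood samples) mirrors the parameter-selection step described above.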

Evaluation metrics
We evaluated all competing kernel methods using two different criteria. The first criterion was the evaluation of each method in aggregate, over the entire region. The second criterion was a spatial visualization of the ATF and an analysis of pointwise performance.
For the first criterion, we sampled the target region on a regular grid with a step of 0.2 m in the source and receiver regions, resulting in $L_e = 217$ evaluation source positions $\{s_{e,l}\}_{l=1}^{L_e} \subset S$ and $M_e = 217$ evaluation receiver positions $\{r_{e,m}\}_{m=1}^{M_e} \subset R$, giving us a total of $N_e = M_e L_e = 47{,}089$ evaluation pairs. For the aggregate evaluation, our chosen metric was the normalized mean square error (NMSE) calculated for these evaluation points as
$$\mathrm{NMSE} = 10 \log_{10} \frac{\sum_{l=1}^{L_e} \sum_{m=1}^{M_e} \left| \hat{h}(r_{e,m}|s_{e,l}) - h(r_{e,m}|s_{e,l}) \right|^2}{\sum_{l=1}^{L_e} \sum_{m=1}^{M_e} \left| h(r_{e,m}|s_{e,l}) \right|^2}.$$
The second evaluation was performed by simulating the ATF caused by a single source at the center of $S$, $s_0 = [0, 0, 0]^{\mathsf T}$, for a reverberation time of 0.4 s. We simulated the ATFs in the square area $\{(x, y, z) : -3 < x < -2,\ -2.25 < y < -1.25,\ z = -0.2\}$ in order to visualize how well the methods can reconstruct ATFs. We evaluated both the ability of each kernel function to reconstruct the real part of the ATF and the pointwise normalized square error (NSE), defined as
$$\mathrm{NSE}(r_{e,m}|s_0) = \frac{\left| \hat{h}(r_{e,m}|s_0) - h(r_{e,m}|s_0) \right|^2}{\left| h(r_{e,m}|s_0) \right|^2}.$$
Finally, we also mapped out the gain of the final weight function $w$ of the Bayesian regression at the same frequency analyzed for the visualization criterion, 1.65 kHz, in order to show how well our hypothesis about the combined kernel held up.
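Both metrics are direct to compute from arrays of true and estimated ATF values. A minimal sketch (assuming the common dB convention for NMSE, with flattened arrays over all evaluation pairs):

```python
import numpy as np

def nmse_db(h_true, h_est):
    """Aggregate normalized mean square error in dB over all
    evaluation source/receiver pairs (flattened complex arrays)."""
    num = np.sum(np.abs(h_est - h_true) ** 2)
    den = np.sum(np.abs(h_true) ** 2)
    return 10.0 * np.log10(num / den)

def nse(h_true, h_est):
    """Pointwise normalized square error at each evaluation point."""
    return np.abs(h_est - h_true) ** 2 / np.abs(h_true) ** 2

# Tiny worked example: one component exact, one completely missed.
h = np.array([1.0 + 0.0j, 1.0j])
err = nmse_db(h, np.array([1.0 + 0.0j, 0.0j]))  # 10*log10(1/2) ≈ -3.01 dB
```

The aggregate NMSE is what Fig. 4 reports per frequency, while the pointwise NSE is what the spatial maps in Fig. 6 display.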

Experimental results
The results of our aggregate error analysis can be seen in Fig. 4, which shows that both Proposed (GPR) and Proposed (MCHMC) achieved lower error across every frequency. For all reverberation times and frequencies, Uniform was the worst estimation, as expected given that this kernel has no parameters to learn. For the lowest reverberation time of 0.2 s, Directed only and Sunken sphere performed very similarly to Proposed (GPR), as the reverberant field is expected to consist mostly of wave components that have not been deeply affected by the environment. Under these conditions, the simpler Directed only model is likely sufficient for estimation. However, for both 0.4 s and 0.8 s, the proposed methods remain somewhat stable while the performance of Directed only and Sunken sphere degrades, approaching that of Residual only as the reverberation time increases. As expected, the Residual only kernel weight was not capable of interpolating the ATF properly on its own, likely due to the way it is structured. The error estimation we included, which introduced uncertainty into Proposed (MCHMC), resulted in the most robust evaluations across all metrics.
The attempted reconstructions of the real part can be seen in Fig. 5, showing that, once again, Proposed (GPR) and Proposed (MCHMC) produced the closest pointwise reconstructions of the ATF, likely due to the representative power of the kernel model, which considers a more complete representation of the reverberant field than the other kernels. Perhaps because of the abundance of measurement points from the array present in the region, the differences between Proposed (GPR) and Proposed (MCHMC) can be difficult to tell apart. However, Fig. 6 shows more clearly that Proposed (MCHMC) outperforms Proposed (GPR), with higher overall representative power spatially as well.
Uniform was not capable of properly interpolating the ATF pointwise, showing large dark regions and very low gain even where the approximation was somewhat close. Sunken sphere has a very smooth and uniform error distribution, likely due to the simplicity of its weight estimation. Interestingly, the gain in the direction pointing toward the source and in the directions orthogonal to it is rather stable, likely due to how the weight is configured. Residual only shows unstable error behavior, indicating that the residual kernel alone is not suitable for interpolation without an added kernel capable of representing strong directionality.
Finally, we also observed the kernel weights of Proposed (MCHMC) in order to confirm whether the Monte Carlo simulation-based estimation was capable of learning the patterns we expected. These gain plots can be seen in Fig. 7. The directed kernel weight has regions of high gain separated by regions of low gain, indicating that the distribution of $\gamma$ conditioned on $\beta$ induced sparsity. The residual weight also shows behavior somewhat complementary to the directed weight, with some notably darker spots where the directed weight has its peaks. The combined full weight shows that the gain of the directed kernel has more influence on the overall shape of the weight function, which would explain the directed kernel performing better by itself than the residual kernel. Interestingly, gains at lower elevations feature more strongly in the directed weight, likely because the array for the receiver region was placed closer to the ground. This lower gain is compensated by the residual weight, which has higher gain at higher elevations.

Conclusion
We introduced a fully adaptive kernel interpolation model for the acoustic transfer function between regions that considers directed and residual reverberant fields. The proposed estimation achieves this complete field representation by considering separate kernel models for the directed and residual fields. The directed kernel aims to represent wave components with strong directionality and is therefore expressed as a combination of functions that bias the estimation toward particular directions. The residual kernel is meant to represent the far less predictable reverberant field components with lower amplitudes and was represented by a neural network weight. The parameters of this joint kernel were chosen by a probabilistic criterion using both Gaussian process regression and a Markov chain Monte Carlo sampler, in order to incorporate uncertainty into the estimation and to explore the behavior of the model as the parameters are sampled. This uncertainty-aware estimation outperformed the competing kernel estimations in numerical simulations using a wind noise simulator.

Appendix 1: Derivation of GPR loss
In order to determine the values of the coefficients, we look at the prior distributions, where $\mathcal{N}_{\mathbb{C}}$ is the complex-valued normal distribution and $\sigma$ is the estimated standard deviation of the noise. Then, we can apply these considerations to the prior distributions in (10) in order to obtain the posterior. Since the logarithm is a monotonically increasing function, we can show that (10) is equivalent to a log-likelihood criterion, where $\hat{\alpha}_0$ is the optimal coefficient of the direct component, $\chi$ are the trainable parameters of the kernel weights (e.g., $\beta$, $\gamma$, and $\theta$ for the proposed kernel weight), and $\hat{\chi}$ are the optimized kernel parameters. We can then reorganize this criterion by eliminating the constants in order to arrive at the final loss, where $\hat{\alpha}$ is the optimal coefficient vector of the kernel estimation.
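For reference, a criterion of this kind typically reduces to the standard GPR negative log marginal likelihood. The form below is an illustrative sketch only, since the extracted equations were lost; the paper's exact loss, including the direct-component term governed by $\alpha_0$, may carry additional terms.

```latex
% Standard GPR negative log marginal likelihood (illustrative form):
\hat{\chi} = \operatorname*{arg\,min}_{\chi} \;
  \mathbf{y}^{\mathsf{H}} \left( \mathbf{K}_{\chi} + \sigma^{2} \mathbf{I} \right)^{-1} \mathbf{y}
  + \log \det \left( \mathbf{K}_{\chi} + \sigma^{2} \mathbf{I} \right)
```

The quadratic term rewards fitting the measurements, while the log-determinant term penalizes overly flexible kernels, which is what makes the criterion suitable for selecting the kernel parameters $\chi$ rather than only the coefficients.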

Fig. 1 Example of the region-to-region ATF interpolation problem

Fig. 2 Diagram of the architecture of the NN weight $w_{\mathrm{res}}$, with a directional variable $r$ as input and the weight value as output

Fig. 3 Disposition of source and receiver positions within the room

Fig. 5 Original and estimated pressure distributions at the frequency of 1.65 kHz and $T_{60} = 0.4$ s. The black circle represents the boundary of the target region. a Original. b Uniform. c Sunken sphere. d Residual only. e Directed only. f Proposed (GPR). g Proposed (MCHMC)

Fig. 6 Distribution of NSE at the frequency of 1.65 kHz. b Uniform. c Sunken sphere. d Residual only. e Directed only. f Proposed (GPR). g Proposed (MCHMC)

Fig. 7 Weight function gains in dB at the frequency of 1.65 kHz and a reverberation time of 0.4 s for Proposed (MCHMC). a Full kernel weight. b Directed kernel weight. c Residual kernel weight