A database of near-field head-related transfer functions based on measurements with a laser spark source

This paper presents a database of near-field head-related transfer functions (HRTFs) of an artificial head, measured at four distances (0.2, 0.3, 0.4 and 0.5 m), with 49 positions recorded at each distance, for a total of 196 measurement points. The HRTFs were recorded using an acoustic pulse created by a laser-induced breakdown of air (LIB), which realizes a close to ideal, massless, monopole sound source. While the LIB produces a high amplitude pressure pulse, the amplitude decays toward low frequencies, which introduced a low frequency limit of about 200 Hz for this particular setup. Thus, a spherical head model based on the analytical expression for scattering by a rigid sphere was fitted to the measured data, and used to extend the low frequency range of the measurements. A brief evaluation of the processed dataset was undertaken, considering interaural time and level differences. The measured and processed database, as well as the low frequency extension procedure are made publicly available to support future research into nearby sound localization, and virtual/augmented reality applications. © 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license.


Introduction
Head-related transfer functions, or HRTFs, are the frequency-domain representations of the effect of the human head and torso on the acoustic pressure signals reaching the entrance of the ear canals. HRTFs capture the various acoustical cues (phase, level and spectral changes) from which the brain interprets the perceived location of a sound in space. Thus, the particular features of the HRTFs depend strongly on the angle and the distance of the sound source in relation to the head (and body), as well as the size and shape of the listener's outer ears (pinnae) and head. Measured or simulated sets of HRTFs have been widely used to spatialize sounds in many applications, from entertainment to assistive devices (e.g. [1]). Further away than approximately 1 m, HRTFs are largely distance independent. In the near-field (< 1 m), however, there is a strong effect of distance, due to the increased contributions of head-shadowing and attenuation through the inverse-square law [2]. The auralization of near-field sound sources is of particular interest in e.g. virtual reality applications, as persons or objects a user is interacting with are often positioned in the acoustic near field. This paper presents a database of near-field HRTFs, measured close to an artificial head, and using a laser-induced spark as a sound source.
In order to measure a true impulse response or transfer function between a point in space and the ears of the head, the acoustic source should emit a broadband, spherical wavefront of sufficient amplitude, and should not itself reflect sound. The latter point is particularly important when the source is placed close to another, reflecting object, like the head, as the secondary reflections would alter the resulting sound field at the ears. Thus, typical sound sources, like loudspeakers, are not ideal for such a setup, due to their size. Nevertheless, some near-field HRTFs measured using loudspeakers have been published ([3][4][5]). In this paper, a truly massless acoustic point source, generated using a laser-induced breakdown of air (LIB), was used instead [6]. Previous work has shown that the properties of the acoustic sparks produced by LIB are close to those of an ideal point source [6,7]. In particular, the LIB source produces a high amplitude spherical wavefront with good reproducibility in terms of directivity and spectrum [7]. A downside of the approach is the potential hazard presented by the high-power laser, which precludes using this technique directly on humans, and necessitates employing a head mannequin instead.
The measurement setup presented in this paper has previously been used to validate numerical simulations of near-field HRTFs, where detailed considerations regarding measurement and modeling errors were reported [7]. However, only frequencies above about 400 Hz were evaluated in that study. The current paper presents a different set of measurements, and focuses on auralization as an application of the collected dataset, necessitating a consideration of the entire audible frequency range (20 Hz to 20 kHz).
Despite producing a relatively high amplitude pressure pulse, the magnitude spectrum of the LIB pulse decays towards low frequencies at about 20 dB per decade [7, suppl. mat.]. Thus, the signal-to-noise ratio (SNR) is likely to be insufficient at the lower end of the audible frequency range, and the limitations of the LIB technique in this regard need to be evaluated.
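As a rough worked example of what such a slope implies, the following sketch assumes an idealized decay of exactly 20 dB per decade below a reference frequency $f_\mathrm{ref}$; the choice of 10 kHz as the reference is illustrative and not a measured property of the source.

```latex
\Delta L(f) = 20 \log_{10}\!\left(\frac{f_\mathrm{ref}}{f}\right)\ \mathrm{dB},
\qquad
\Delta L(100\ \mathrm{Hz}) = 20 \log_{10}\!\left(\frac{10\,000\ \mathrm{Hz}}{100\ \mathrm{Hz}}\right) = 40\ \mathrm{dB},
\qquad
\Delta L(20\ \mathrm{Hz}) = 20 \log_{10}(500) \approx 54\ \mathrm{dB}.
```

Under this assumption, the pulse level at the bottom of the audible range would sit more than 50 dB below its level at 10 kHz, which illustrates why the low-frequency SNR is a concern.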
Problems with low SNR at low frequencies are common to most measurement-based HRTF datasets, and various low-frequency extension (LFE) techniques have been proposed to address them. The LFE is typically accomplished by either an extrapolation of the HRTF magnitude and phase response to low frequencies (e.g. [8,3,4,9]), or by applying a model-based solution, such as a spherical head model [10][11][12]. In both cases, the noisy low-frequency portion of the HRTF is partly replaced by a clean, estimated response, with the aim of increasing the SNR. In the current paper, the spherical head model from Duda et al. [10] was fitted to the measured head mannequin, and combined with the measured responses based on the procedure outlined in [12]. The procedure was modified slightly in order to obtain a smoother magnitude response in the transition region.

Measurement setup
The measurements were carried out in an anechoic chamber at Aalto University, Finland. The free space inside the chamber was cubical, and measured 4.2 m on each side between the tips of the sound absorbing wedges, which were 80 cm long. The chamber provided anechoic conditions above approximately 125 Hz.
The measurement setup consisted of a pulsed laser source (CFR 400, Quantel laser, Les Ulis, France) and a head mannequin with integrated microphones (GRAS 46DE, GRAS Sound and Vibration, Holte, Denmark) at the ears. The laser was configured such that the LIB would occur at 30 cm distance from the lens assembly, producing a peak pressure pulse of approximately 105 dB SPL at 1 m. The laser was mounted on a vertical linear translator, while the head mannequin was attached to a turntable, as well as a horizontal linear translator, thus allowing the LIB to be positioned at various distances and angles in relation to the head. The source position was defined as the distance $r$ to the midpoint between the ears of the mannequin, the azimuth angle $\theta$ with $0^\circ$ in the front and positive sign to the left, and the elevation angle $\varphi$ from the horizontal plane. The setup is illustrated in Fig. 1.
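The source coordinates defined above (distance to the midpoint between the ears, azimuth with 0° in front and positive to the left, elevation from the horizontal plane) can be converted to Cartesian coordinates as in the minimal sketch below. The axis orientation (x to the front, y to the left, z upward) and the function name `source_to_cartesian` are assumptions made for illustration; the text does not specify a Cartesian convention.

```python
import math


def source_to_cartesian(r, azimuth_deg, elevation_deg):
    """Convert (r, azimuth, elevation) to (x, y, z) in metres.

    Assumed axes (not specified in the text): x points to the front,
    y to the left, z upward; the origin is the midpoint between the ears.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = r * math.cos(el) * math.cos(az)
    y = r * math.cos(el) * math.sin(az)
    z = r * math.sin(el)
    return x, y, z


# A source 0.2 m straight ahead, and one 0.2 m directly to the left.
front = source_to_cartesian(0.2, 0.0, 0.0)
left = source_to_cartesian(0.2, 90.0, 0.0)
```

With this convention, positive azimuths map to positive y, matching the "positive sign to the left" definition.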
The head mannequin was 3D printed from an optimized 3D scan using a stiff plastic (PA 12) material. The model was acquired by first 3D scanning the head of a human subject using a blue-light scanner. Then, separate 3D scans were taken of casts of each pinna, to provide greater detail and avoid problems with occlusion. The pinna scans were then fused with the head scan, and the resulting mesh underwent some manual cleanup and optimization to provide the final model (see Sec. III.B in [7]). Fig. 2 shows a side view of the final model.
The positioning, the laser, and the data acquisition were controlled by a computer running a LabVIEW (NI, Austin, TX, USA) project. The microphones were connected to a GRAS 12AQ signal conditioner (GRAS Sound and Vibration, Holte, Denmark), and their signals were acquired using an NI PXI-5922 oscilloscope board (NI, Austin, TX, USA) at a sampling frequency of 4 MHz.
A total of 196 positions were measured, with 49 angular positions repeated at 4 distances to the midpoint between the ears ($r = 0.2, 0.3, 0.4, 0.5$ m). The list of measured source positions is shown in Table 1. Each position was measured as an average of 100 LIB pulses, with a repetition rate of 3 Hz. Free-field reference measurements (measured at the positions of the head mannequin's ears, but without the head) were also collected for each ear at each of the four distances, for a total of 8 measurements. 300 LIB pulse repetitions were used for the reference measurements.
Fig. 1. Illustration of the measurement setup. The laser assembly was mounted on a vertical linear translator ($z$ axis), while the artificial head was placed on a horizontal linear translator ($x$ axis) and a turntable ($\theta$ azimuth angle). The LIB spark was always generated in the reference plane, which passed through the midpoint of the head. The coordinates $(z, x, \theta)$ were adjusted to place the LIB in the desired source position $(r, \theta, \varphi)$ relative to the head.
For further details regarding the measurement setup and the acoustical properties of the LIB pulse, the reader is referred to the study by Prepeliță et al. [7] and their accompanying supplementary material.

Post processing
The recorded reference measurements $p_\mathrm{ref}(r, t)$ were cropped to a length of 120 μs. The peak pressure generated by the LIB pulse was about 105 dB SPL at 10-20 kHz and 1 m. However, due to the decaying magnitude spectrum towards low frequencies at about 20 dB per decade, the level at 100 Hz was roughly 40 dB below the peak. To derive impulse responses, an inverse filter $H_\mathrm{inv}(\omega)$ was constructed from each reference measurement using regularized frequency-domain inversion [13,14], with
$$H_\mathrm{inv}(\omega) = \frac{P_\mathrm{ref}^{*}(\omega)}{P_\mathrm{ref}^{*}(\omega)\, P_\mathrm{ref}(\omega) + \beta(\omega)},$$
where $P_\mathrm{ref}(\omega)$ denotes the normalized complex spectrum computed from $p_\mathrm{ref}$, the superscript $*$ indicates the complex conjugate, and $\beta(\omega)$ is the frequency-dependent regularization coefficient. The coefficient $\beta(\omega)$ was derived from a target response $\alpha(\omega)$, defined as a Butterworth bandpass power transfer characteristic, with an order of $n = 4$, and with low and high cutoff frequencies of $\omega_l / 2\pi = 50$ Hz and $\omega_h / 2\pi = 20$ kHz, respectively. The regularization coefficient was then obtained as $\beta(\omega) = 1 / \alpha(\omega)$, resulting in values close to zero in the pass band (i.e. no regularization), with sharply increasing regularization outside the pass band, thus band-limiting the inversion to the frequency range of interest. An example LIB pulse $p_\mathrm{ref}$ measured at $r = 0.2$ m, its magnitude response, and the derived inverse filters $H_\mathrm{inv}$ are shown for both ears in Fig. 3. The inverse filters had a maximum gain of about 35 dB relative to their value at 10 kHz, which effectively corrected the frequency response of the LIB down to about 200 Hz. Results for the other distances were very similar. Thus, a low-frequency limit of 200 Hz was imposed on the measurements with the specific selection of regularization parameters. These settings were selected based on a visual inspection of the magnitude spectra of the LIB measurements and an estimation of the available SNR. For example, the magnitude spectrum of the left-ear reference pulse (Fig. 3, top, solid line) deviates from the logarithmic amplitude decay towards low frequencies from about 200 Hz and below, indicating that noise dominates in that channel below 200 Hz.
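The effect of regularization on the inversion can be illustrated with a minimal per-bin sketch. It assumes the commonly used regularized form, conj(P) / (|P|^2 + beta), and takes the regularization values as a given input rather than deriving them from a target response; the function name `regularized_inverse` and the three-bin toy spectrum are hypothetical and are not taken from the published implementation.

```python
def regularized_inverse(p_ref, beta):
    """Per-bin regularized inverse filter: conj(P) / (|P|^2 + beta).

    p_ref: complex spectrum values of the reference pulse.
    beta:  non-negative regularization value for each bin
           (assumed given; a bin with P = 0 needs beta > 0).
    """
    if len(p_ref) != len(beta):
        raise ValueError("p_ref and beta must have the same length")
    return [p.conjugate() / (abs(p) ** 2 + b) for p, b in zip(p_ref, beta)]


# Toy spectrum: a weak bin (as at low frequencies) and two strong bins.
p_ref = [0.01 + 0j, 0.5 + 0.5j, 1.0 + 0j]
beta = [1.0, 0.0, 0.0]  # regularize only the weak bin

h_inv = regularized_inverse(p_ref, beta)
corrected = [h * p for h, p in zip(h_inv, p_ref)]
```

In the unregularized bins the product of filter and spectrum is unity, whereas in the regularized bin the filter gain stays near 0.01 instead of rising to 100, so noise in that bin is not amplified.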
To obtain the head-related impulse responses (HRIRs), each measured response at the ears $p_\mathrm{ear}(r, \theta, \varphi, t)$ was truncated to a length of 3 ms. The truncation length was chosen considering that the measured head mannequin lacked shoulders and a torso, and thus any reflections arriving later than 3 ms would be from the measurement setup. Then, the inverse filter $H_\mathrm{inv}$ for the corresponding radius and ear was applied:
$$h_\mathrm{meas}(r, \theta, \varphi, t) = p_\mathrm{ear}(r, \theta, \varphi, t) * h_\mathrm{inv}(r, t),$$
where $*$ represents the convolution operation, and $h_\mathrm{inv}$ is the impulse response of the inverse filter obtained by the inverse Fourier transform of its complex spectrum $H_\mathrm{inv}$. Finally, the HRIRs were downsampled from 4 MHz to 48 kHz, normalized such that the maximum amplitude in the database was at full scale, and exported as a SOFA database [15].
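The bookkeeping behind these steps can be sketched as follows. The helper names `truncate` and `convolve` are hypothetical, the convolution is a plain direct-form implementation chosen for clarity rather than speed, and the actual resampling filter used for the database is not described in the text, so only the rate ratio is computed here.

```python
from fractions import Fraction

FS_MEASURED = 4_000_000  # Hz, acquisition rate stated in the text
FS_OUTPUT = 48_000       # Hz, rate of the exported HRIRs


def truncate(signal, duration_s, fs):
    """Keep only the first duration_s seconds of a sampled signal."""
    return signal[: round(duration_s * fs)]


def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Length of the 3 ms window at both sampling rates.
samples_3ms_measured = round(0.003 * FS_MEASURED)
samples_3ms_output = round(0.003 * FS_OUTPUT)

# Rational resampling ratio from 4 MHz to 48 kHz.
resample_ratio = Fraction(FS_OUTPUT, FS_MEASURED)
```

The ratio reduces to 3/250, so a rational resampler would interpolate by 3 and decimate by 250; the 3 ms window corresponds to 12000 samples at the acquisition rate and 144 samples at 48 kHz.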

Low-frequency extension
Due to the limited energy of the LIB pulse at low frequencies, it was confirmed that the usable frequency range of the original measurements did not cover the entire audible frequency range. Thus, in order to enable the application of the dataset for auralization, a low frequency extension (LFE) procedure was applied to augment the low-frequency portion of the measured responses. The process consisted of calculating the response of a spherical head model for a given distance and source direction corresponding to each measurement point, matching the modeled and measured responses in time and level, and finally combining them using crossover filters. The spherical-model based approach was chosen as opposed to a simpler extrapolation of the low frequency magnitude and phase responses (as in e.g. [8,9]) for two reasons. First, the process described in this paper provides a smoother transition in the magnitude spectrum if the region where the extrapolation is started is not flat. Second, the spherical model was also utilized to ensure that the measured responses were consistent with expectations; i.e. to identify any outliers in the dataset that could indicate a measurement or processing error for a particular datapoint. An overview of the LFE process is shown in Fig. 4, and is described in more detail below. It broadly follows the procedure outlined in [12], but with an updated crossover filter and the addition of a phase alignment stage. For each measurement position, the impulse response at the ear positions of a spherical head model, $h_\mathrm{sim}$, was calculated using the algorithm described by Duda et al. [10]. Based on an initial matching of the modeled vs. measured interaural time differences (ITDs), a head radius of 8.9 cm and ear positions of ±90° azimuth on the equator were selected for the model. The crossover frequency $f_c$ was chosen to be 500 Hz, to be well above the lower limiting frequency of 200 Hz in the measured data.
The modeled and measured HRIRs were then compared to derive a level and time difference, in order to align the modeled HRIRs to the measured responses. The level difference was taken as the average difference in the magnitude spectra in a range of ±200 Hz around $f_c$. The time difference was calculated as the difference in time of arrival (TOA) between the modeled and measured HRIRs, upsampled by a factor of 10 and lowpass filtered at 8 kHz, using an 8th-order Butterworth filter. The TOA itself was obtained as the time lag of the maximum of the normalized cross-correlation between the impulse response and its minimum-phase version, following the method by Nam et al. [16]. The 8 kHz lowpass filter was applied to exclude features introduced by the smaller details of the pinnae, which were not relevant for the sizing and alignment of the spherical model. For each source position, the modeled HRIRs were scaled and time shifted by the mean difference across the left and right-ear channels, in order to preserve the ILDs and ITDs provided by the model. After the initial time shifting, the remaining phase difference at the crossover frequency was calculated, considering again the average difference in phase $\varepsilon$ at $f_c \pm 200$ Hz. This phase difference was compensated for by introducing the corresponding small time delay $\Delta t = -\frac{\varepsilon}{2\pi f_c}$ to the modeled HRIRs. Finally, the modeled and measured responses were combined using crossover filters. The crossover was implemented using 4th-order low and highpass Butterworth filters with $f_c = 500$ Hz, applied using forward-backward (acausal, zero-phase) filtering utilizing the 'filtfilt' function in Matlab. Thus, the effective magnitude responses of the filters were squared, realizing the equivalent of an 8th-order Linkwitz-Riley crossover, but with no phase distortion. Together with the phase alignment described above, this ensured an in-phase addition of the measured and modeled components, eliminating troughs in the magnitude response around the crossover frequency. The processed HRIRs were then also exported as a database in the SOFA format [15].
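The phase compensation and the zero-phase crossover can be illustrated with a short frequency-domain sketch. It is not the published Matlab implementation: instead of running digital filters forward and backward, it evaluates the analog Butterworth prototype analytically, which is an assumption made to keep the example self-contained. The function names are hypothetical.

```python
import math


def phase_delay(phase_error_rad, fc_hz):
    """Time delay equivalent to a phase error at fc: dt = -eps / (2*pi*fc)."""
    return -phase_error_rad / (2.0 * math.pi * fc_hz)


def crossover_gains(f_hz, fc_hz=500.0, order=4):
    """Zero-phase low/high-pass gains after forward-backward filtering.

    One pass of an analog Butterworth low-pass of the given order has the
    squared magnitude 1 / (1 + (f/fc)^(2*order)). Filtering forward and
    backward squares the magnitude and cancels the phase, so the squared
    magnitude itself is the effective real-valued gain.
    """
    x = (f_hz / fc_hz) ** (2 * order)
    low = 1.0 / (1.0 + x)
    high = x / (1.0 + x)
    return low, high


def combine(model_bin, measured_bin, f_hz, fc_hz=500.0):
    """Blend one frequency bin of the modeled and measured responses."""
    low, high = crossover_gains(f_hz, fc_hz)
    return low * model_bin + high * measured_bin


low_fc, high_fc = crossover_gains(500.0)
dt_example = phase_delay(math.pi / 2, 500.0)
```

Because the two gains are real and sum to one at every frequency, two aligned, in-phase components pass through the crossover unchanged; each branch is at 0.5 (about -6 dB) at the crossover frequency, which is the defining property of a Linkwitz-Riley crossover.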

Objective evaluation
In order to verify the measurement, model outputs, and fitting, the HRIRs were examined in the time and frequency domains. ITDs were calculated by taking the difference in TOA between the right and left ear impulses, with a positive ITD indicating that the left ear is leading. The ITD calculations again included the 8 kHz lowpass filter, thus providing a wideband ITD estimate up to 8 kHz. ILDs were computed by subtracting the magnitude spectrum of the right ear responses from that of the left ear, and averaging over frequency between 0 Hz and 8 kHz, in order to provide a common analysis range for both ILDs and ITDs. Both ILD and ITD cues have been shown to be frequency dependent (e.g. [2]), but these dependencies are not further considered here, as the measures are applied primarily to evaluate the measurement and postprocessing methods, rather than to explore detailed features of the dataset. Looking at the magnitude responses in Fig. 5 (bottom panels), the level alignment and crossover provide a smooth transition to a constant magnitude towards low frequencies. It can also be seen that the modeled response lacks the fine notches and peaks present in the measured (and combined) responses above 1 kHz, due to the model lacking pinnae. Fig. 6 displays ITDs (top row) and ILDs (bottom row) computed separately for the measured, modeled, and combined HRIRs, with distance as a parameter, considering source positions in the horizontal plane only. It can be observed that ITDs are not strongly dependent on distance; the ITD curves are virtually identical. There is a slight but consistent increase of the ITD range with decreasing distance, which can also be seen in the dataset by Arend et al. [3] (cf. Fig. 5b in their paper), due to the slight increase in path length to the ears for sources close to the head [2].
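The sign conventions of the two measures can be made concrete with a minimal sketch. It assumes that the TOAs are already available and that the level difference is averaged over dB values per frequency bin; the text does not state whether the averaging is done in dB, so that choice, like the function names, is an assumption.

```python
import math


def itd_seconds(toa_left_s, toa_right_s):
    """ITD as right-ear TOA minus left-ear TOA (positive: left ear leads)."""
    return toa_right_s - toa_left_s


def ild_db(mag_left, mag_right):
    """Mean over frequency bins of the left-minus-right level difference in dB.

    mag_left, mag_right: linear magnitudes (> 0) for the bins inside the
    analysis range.
    """
    diffs = [20.0 * math.log10(l / r) for l, r in zip(mag_left, mag_right)]
    return sum(diffs) / len(diffs)


# Source on the left: sound reaches the left ear first and is louder there.
example_itd = itd_seconds(1.0e-3, 1.3e-3)
example_ild = ild_db([2.0, 4.0], [1.0, 1.0])
```

Both example values are positive, consistent with a source on the left side under the conventions stated above.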

Results and discussion
More prominent is the change in ILDs with distance (Fig. 6, bottom row). As is well known (e.g. [2]), maximum ILDs increase with decreasing source distance, and the rate of increase also becomes greater the closer the source gets to the head. Halving the distance from $r = 0.4$ m to $0.2$ m, the maximum ILD is increased by about 6 dB, to 26 dB for the measured HRIRs. Again, similar behavior can be seen in [3]. It should be noted that the exact ILD values depend on the considered frequency range. Looking at the ILDs computed for the spherical model, the distance dependent ILDs are already reproduced by this simple model, although the range of ILDs is lower than what has been measured for the mannequin. One can also observe a local minimum/maximum at the data points closest to 90° and 270°. These are due to the so-called pressure "bright spot" that appears around the point on the sphere opposite to where the sound impinges on it (e.g. [10]). The irregular shape of the head and the presence of the pinnae seem to alleviate this effect on the mannequin. Fig. 7 shows a direct comparison of the ITDs and ILDs for measured HRIRs and the spherical model, including the error, given by subtracting the ITD/ILD values computed for the model from those computed for the measurement. This comparison can be used to evaluate whether the model parameters (in terms of size and ear positions) provide a good match to the measured mannequin.
The largest absolute ITD error (top row) was 42 μs for the horizontal positions plotted. Fig. 8 visualizes the ITD error considering all directions. The maximum error between the head mannequin and the spherical model overall was 119 μs. While this is larger than the just noticeable difference (JND) for ITDs (reported to be as low as 20 μs in anechoic conditions [17]), it is deemed acceptable as the larger errors appear to be outside the horizontal plane, and are most likely a result of the shape of the measured head deviating from a sphere. Thus, the correspondence could only be improved further by considering a more complicated model shape. Regarding ILDs (Fig. 7, bottom row), errors of up to about 9 dB can be observed between the measured HRIRs and the spherical model. This is expected as the additional contributions of the pinnae are missing in the model. Fig. 9 shows a similar comparison, but now for ITDs and ILDs between the measured, and the final, combined dataset with the LFE processing applied. In the horizontal plane, ITD and ILD errors were very small, less than 16.7 μs and 0.23 dB, respectively. Considering all directions (not shown in the figure), the maximum ITD error was 18.8 μs (at $\theta = 270^\circ$, $\varphi = -26^\circ$, $r = 0.5$ m), whereas the maximum ILD error was 0.37 dB (at $\theta = 240^\circ$, $\varphi = -26^\circ$, $r = 0.5$ m), both of which are below the JNDs reported in [17]. Thus, the applied LFE processing did not alter the wideband ILDs or ITDs in comparison to the measured data to a degree which is expected to be perceptually significant.
Finally, example HRTFs are presented from the database (with the LFE applied) for two directions in Fig. 10. The left panel shows the responses for a source in front ($\theta = 0^\circ$, $\varphi = 0^\circ$), with the right-ear responses shifted by 20 dB. Note the difference in the frequency of the first notch (around 10 kHz for the right ear and 8.5-9 kHz for the left ear), due to the natural asymmetry of the pinnae of the head mannequin. The right panel in Fig. 10 shows a position close to the left ear (at $\theta = 90^\circ$, $\varphi = 26^\circ$). Comparing the left and right ear responses, the increase in ILD with frequency and with distance can be observed. On the side contralateral to the source (the right ear), the magnitude spectrum for the source at $r = 0.2$ m dips below that of the larger distances (between about 4.5 and 8 kHz), demonstrating the increased shadowing of the close source by the head [2,3].

Conclusions
This paper describes a database of near-field HRTFs measured on an artificial head mannequin. The dataset presented here is to the best of the authors' knowledge unique in providing near-field measurements obtained with a massless monopole sound source. The laser-based technique was found to have a low-frequency limit of about 200 Hz. Therefore, a spherical head model was fitted to the measured data, and used to extrapolate the low frequency response below 200 Hz. The low-frequency extension procedure was evaluated objectively, and was deemed to satisfactorily preserve wideband ITDs and ILDs in the measured data.
The original measurements, the processed responses, as well as the implementation of the model-based LFE procedure have been made publicly available [18]. It is hoped that the dataset will provide a useful resource for further analysis of near-field HRTFs, and applications where the auralization of nearby sound sources is desired.

Fig. 2. Side view of the final head-model used to print the head mannequin that was measured.

Fig. 4. Overview of process for combining the output of the spherical head model with the measured HRIRs.

Fig. 5 shows a selected HRIR in the time and frequency domains, illustrating the measured and modeled responses, as well as the

Fig. 8. ITD errors visualized for all measurement points between the mannequin and the fitted spherical model. Circle size corresponds to absolute error magnitude.

Fig. 10. Example HRTFs from the final database (with LFE applied) illustrating distance dependence for two directions; directly in front (left panel, responses for the right ear shifted down by 20 dB), and a left side elevated position (right panel).

Table 1
List of the measured source positions (rounded to the nearest angle).