Sound localization latency in normal hearing and simulated unilateral hearing loss

Directing gaze towards auditory events is a natural behavior. In addition to the well-known accuracy of auditory elicited gaze responses in normal binaural listening, their latency is a measure of possible clinical interest and methodological importance. The aim was to develop a clinically feasible method to assess sound localization latency (SLL), and to study SLL as a function of simulated unilateral hearing loss (SUHL) and its relationship with accuracy. Eight healthy, normal-hearing adults (18-40 years) participated in this study. Horizontal gaze responses, recorded by non-invasive corneal reflection eye-tracking, were obtained during azimuthal shifts (24 trials) of a 3-min continuous auditory stimulus. In each trial, a sigmoid (arctangent) function was fitted to the gaze samples. Latency was estimated as the abscissa corresponding to 50% of the arctangent amplitude. SLL was defined as the mean latency across trials. SLL was measured in normal-hearing and two SUHL conditions (SUHL30 and SUHL43: mean thresholds of 30 dB HL and 43 dB HL across 0.5, 1, 2, and 4 kHz). In the normal-hearing condition, the mean ± SD SLL was 280 ± 40 ms (n = 8) with a test-retest SD = 20 ms. A linear mixed model showed a statistically significant effect of listening condition on SLL. The SUHL30 and SUHL43 conditions revealed mean SLLs of 370 ± 49 ms and 540 ± 120 ms, respectively. Repeated measures correlation analysis showed a clear relationship between SLL and average sound localization accuracy (R2 = 0.94). This rapid and reliable method for obtaining SLL may be an important clinical tool for the evaluation of binaural processing. Future studies in clinical cohorts are needed to assess whether SLL reveals information about binaural processing abilities beyond that afforded by sound localization accuracy.


Introduction
The present study was motivated by the clinical need for rapid and language-independent tests of auditory processing of binaural suprathreshold stimuli for all ages. Towards developing such a test, localization latencies in normal-hearing and healthy adults were measured with an objective, rapid, and non-invasive eye-tracking technique. A mathematical function was used to model auditory elicited gaze patterns following an azimuthal sound shift.
Directing gaze towards a sound is a natural response with high survival and communication value (Grothe et al., 2010; Lachs et al., 2001). To localize sounds in the horizontal plane, e.g. an interesting conversation or an event outside the visual field, differences in time (interaural time difference; ITD) and level (interaural level difference; ILD) of the sound reaching the two ears are processed in the central auditory system (Middlebrooks and Green, 1991). Binaural processing is thus the basis for horizontal sound localization, which shows high acuity in humans in the frontal horizontal plane (one-degree resolution for targets around the midline; Mills, 1958). While previous laboratory-oriented studies have revealed typical latencies for auditory evoked saccades and their relation to accuracy (e.g. Frens and Van Opstal, 1995), the clinical availability of techniques for measuring auditory evoked saccades is limited.
Potentially, latencies could reveal diagnostic information beyond accuracy (see for example Pichora-Fuller et al. (2016) for a discussion of latency and listening effort). It is also possible that adaptation to a condition occurs in one of the measures but not the other. For example, individuals with unilateral hearing loss show an adaptation to their condition, sometimes resulting in high localization accuracy (Kumpik and King, 2019). Latencies for these individuals may still be atypical.
Processing of binaural cues occurs at several levels of the brainstem. The suggested neuronal organization for ITD is auditory afferent, cochlear nucleus, and medial superior olive (MSO) (Laumen et al., 2016; O'Neil et al., 2011), while the neuronal organization for ILD is auditory afferent, cochlear nucleus, lateral superior olive (LSO) (Laumen et al., 2016; O'Neil et al., 2011), and an inhibitory contralateral path via the trapezoid body. The determination of horizontal sound location in primates is suggested to take place in the superior colliculus, where auditory signals are translated into a reference frame roughly similar to that for vision (Lee and Groh, 2012, 2014). For instantaneous spatial shifts, the computation of stimulus location for visual and auditory stimuli in the frontal horizontal plane is performed by the same population of superior colliculus neurons using different spatial codes (Lee and Groh, 2014). The neuronal firing rate resembles a topographical map for visual stimuli, whereas auditory objects are coded according to an overall neural firing rate (Lee and Groh, 2014).
Despite these differences in neural coding, the saccadic response latencies towards auditory and visual targets are similar (Caruso et al., 2016; Zambarbieri, 2002). Auditory saccadic response latencies are typically on the order of 200-400 ms (Frens and Van Opstal, 1995; Populin et al., 2002; Ten Brink et al., 2014; Zahn et al., 1978; Zambarbieri, 2002). Auditory evoked eye saccade latencies do not depend on azimuth above 10-15° from gaze midline (restrained head: Frens and Van Opstal, 1995; Zambarbieri, 2002; unrestrained head: Goldring et al., 1996), unlike auditory saccadic head latencies, which show an eccentricity effect above 15° (Ausili et al., 2019). Stimulus characteristics, in terms of frequency (two-octave wide noise bursts at 0.75, 1.5, 3, and 6 kHz) or intensity (65 and 85 dB SPL), are reported to have limited effect on the eye saccade latency, as demonstrated in a small sample of human subjects (n = 4; Zahn et al., 1978). However, the sequence of target presentation and the measurement paradigm affect saccadic latencies (see Leigh and Kennard (2004) for an overview), which is why it is crucial to use the same task when comparing normative saccadic behavior with saccadic behavior in patients. Furthermore, an accuracy-latency tradeoff, where increased time spent on motor planning increases the accuracy of the movement (see for example Plamondon and Alimi (1997) for an overview), does not seem to occur for saccades towards visual stimuli (Wu et al., 2010).
Methods for the determination of the saccadic response latency towards auditory targets include electro-oculography with skin electrodes (e.g. Goldring et al., 1996; Zambarbieri, 2002) or insertion of scleral coils (e.g. Frens and Van Opstal, 1995; Populin, 2008; Populin et al., 2002; Yao and Peck, 1997), often with the head of the subject restrained (Frens and Van Opstal, 1995; Zahn et al., 1978). In the head-unrestrained condition, the head often starts to turn simultaneously with, or soon after, the eyes, contributing to saccade amplitude and realigning eye and head (Goldring et al., 1996; Goossens and Opstal, 1997). With the introduction of corneal reflection eye-tracking technology, eye movements towards spatially distributed events became measurable without medical procedures. For auditory events, this was initially done with heads restrained (Ten Brink et al., 2014). Modern corneal reflection eye-tracking, however, has potential for clinical use given not only its non-invasiveness but also the freedom of head movements it allows (Asp et al., 2016). While freedom of head movements may dynamically change ITD and ILD cues, thereby affecting experimental control, it allows the study of sound localization behavior in infants and young children, for whom the assessment of auditory development is important in cases of early diagnosed hearing impairment.
Plugging one human ear to "monauralize" a listener is known to decrease sound localization accuracy for various response methods (head-fixed laser pointer: Agterberg et al., 2012; touch screen: Irving and Moore, 2011; electromagnetic head orientation: Slattery and Middlebrooks, 1994), including eye gaze (Asp et al., 2018). To our knowledge, it is unknown how sound localization latency measured by gaze response latency is affected. It is known from experiments with visual targets that latency increases with task difficulty (Warren et al., 2013; Wheeless et al., 1967). For auditory targets, we are only aware of a single study examining effects of task difficulty on saccadic response latencies in humans, demonstrating an approximately 30 ms increase in latency when visual distractors were present (Ten Brink et al., 2014).
Two levels of simulated unilateral hearing loss (SUHL) were induced in this study to investigate the acute effect on localization latency. While an acute peripheral SUHL is not equivalent to a longstanding condition, unilateral hearing loss during development deprives binaural hearing (Kaplan et al., 2016). For example, neurons sprout to the hearing side in the MSO following experimentally induced unilateral hearing loss at birth, as shown in rats (Feng and Rogowski, 1980). This study revealed a large and statistically significant prolongation of sound localization latency (SLL) for azimuthal shifts of a broad-band sound after a SUHL in normal healthy adults, and a distinct repeated measures correlation with sound localization accuracy across SUHL conditions.

Study design
Sound localization latency was estimated in an objective way by fitting a mathematical sigmoid function (arctangent) to eye gaze responses evoked by instantaneous shifts of a broad-band sound in the frontal horizontal plane (Fig. 1). Listening conditions included normal binaural conditions (test and retest; the retest was only used for the reliability analysis) and two levels of simulated unilateral hearing loss. The order of listening conditions was randomized. A linear mixed model was used to study the acute effect of simulated UHL on SLL in a within-subject repeated measures design. This study was part of another study, and the data presented here were obtained from the same measurements and subjects (Asp et al., 2018). It was approved by the regional ethical review board in Stockholm, Sweden (2013/104-31). Written informed consent was obtained and the research complied with the ethical principles stated in the Declaration of Helsinki.

Setup
The setup consisted of an auditory-visual stimulus system and an eye-tracking system (Smart Eye Pro; Smart Eye AB, Gothenburg, Sweden), described in detail elsewhere (Asp et al., 2016). The auditory-visual stimulus system comprised a personal computer (Dell Latitude E5520; Dell Inc., TX) with Windows 7, equipped with a multichannel external soundcard (AudioFire 12; Echo Audio Corporation, CA) and two external multichannel video mixers (VP-108; Kramer Electronics, Israel). The outputs from the soundcard were connected to 12 loudspeakers positioned in 10° increments in a 110-degree arc (±55° azimuth) with 1.2-m radius, and the outputs from the video mixers were connected to 7" TFT-displays mounted below each loudspeaker (loudspeaker/display-pairs; LD-pairs), feeding them with a video graphics array signal from the computer. Subjects were seated 1.2 m from the LD-pairs, which were vertically adjusted to align the center of the loudspeakers with ear level for each subject. Subjects were free to move their head. Gaze alone was registered, which combines head and eye movements. The head angle was not necessarily aligned with eye-ball direction, and not necessarily directed towards the current display at the start of a trial.
Each LD-pair was defined as an Area of Interest (AOI) in a three-dimensional coordinate system in the eye-tracking system (Asp et al., 2016). Objective detection of eye gaze intersection with the AOIs was performed at 20 Hz, resulting in 12 possible gaze direction samples (−55° to +55° in 10° increments). The discrete visual target space motivated the use of gaze intersected AOI rather than raw gaze angle. These samples were transmitted from the eye-tracking system through a low-latency network connection (User Datagram Protocol) to the audio-visual presentation system together with a time stamp. Synchronization between the eye-tracker and the audio-visual presentation system was performed offline by adjusting the minimum difference between the time stamps from the two systems to 1 ms.
The constant system delay of 20 ms was compensated for (the sum of the buffer size of the sound card (5.8 ms) and the delay of the eye-tracking system (14 ms)).
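As an illustration only (the exact procedure is described in Asp et al., 2016), the offline alignment and delay compensation could be sketched as follows; the function names and the assumption of paired timestamp streams are ours, not the authors':

```python
TARGET_OFFSET = 0.001   # minimum timestamp difference after alignment (1 ms)
SYSTEM_DELAY = 0.020    # soundcard buffer (5.8 ms) + eye-tracker delay (14 ms), rounded

def align(eye_ts, av_ts):
    """Shift the eye-tracker timestamps so that their minimum difference to the
    audio-visual timestamps becomes 1 ms (times in seconds, streams paired)."""
    shift = min(e - a for e, a in zip(eye_ts, av_ts)) - TARGET_OFFSET
    return [e - shift for e in eye_ts]

def compensate(latency):
    """Remove the constant 20-ms system delay from a measured latency."""
    return latency - SYSTEM_DELAY
```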
For each sound localization test, the following variables were stored for subsequent offline analysis (see 10.6084/m9.figshare.7856909): presented AOI (−55° to +55° azimuth), gaze intersected AOI (−55° to +55° azimuth), time stamps in the eye-tracking and audio-visual presentation systems, respectively, before offline synchronization, and the delay estimated by the eye-tracking system.

Stimulus
An ongoing auditory-visual stimulus (a colorful cartoon movie with an accompanying melody) was presented at 63 dB(A), as measured at the position of the subject's head. The auditory stimulus was filtered to achieve a long-term frequency spectrum corresponding to the unmodulated noise in the Hagerman sentence recognition test (equal energy in 1/3-octave bands; Hagerman, 1982); the long-term spectrum and an excerpt of the stimulus spectrogram are shown in supplementary Figure S1. The sound was ramped off with a 50 ms raised cosine before each change in azimuth and immediately ramped on again after the shift, without any other alterations to the sound.

Test procedure
The subject watched the 3-min cartoon starting at the loudspeaker/display-pair at −5° azimuth (i.e. just to the left of midline). During these 3 min, 24 azimuthal sound shifts (trials) were elicited, with gaps in the visual part of the stimulus; see Asp et al. (2016) for details. The mean inter-trial gap was 7 s. In each trial, the sound shifted instantaneously from the loudspeaker in the current LD-pair (the target in the previous trial) to another randomly assigned loudspeaker (target). The visual stimulus stopped 170 ms before the azimuthal sound shift and was reintroduced 1.6 s after the azimuthal sound shift at the target loudspeaker/display-pair. The subjects were not asked to perform the task quickly but were instructed to follow the auditory-visual stimulus and told that the visual stimulus would temporarily pause as the auditory stimulus changed azimuth.

Objective determination of sound localization latency using an arctangent function
Visual inspection of gaze intersected AOI samples motivated the fitting of an arctangent function as a model of eye movement during an azimuthal sound shift (Fig. 1). The analysis window was 4.1 s (82 samples), starting 2.5 s (50 samples) before each azimuthal sound shift which allowed enough samples for a subsequent optimization of the fit (see below). The endpoint of the analysis window coincided with the reintroduction of the visual stimulus, i.e. 1.6 s (32 samples) after the azimuthal sound shift.
The period for the transition of eye gaze from one sound source to another coincides with the period for distinctly increased motor neuron firing rate of the eye muscles (cf. Sparks, 2002, Fig. 1). Accordingly, the abscissa corresponding to 50% of the amplitude of the fitted arctangent function was defined as the latency T in each trial (Fig. 1), i.e. halfway through the localization response.
SLL for each subject was defined as the mean of the T-values for each test (≤24 trials per subject and condition).

Optimization algorithm and constraints
The following formula was used to fit the samples in each trial:

gaze(t) = a1 + (a2 − a1) · [1/2 + (1/π) · arctan(c · (t − T))]    (1)

where a1 and a2 (°) are continuous variables (−55° ≤ a1,2 ≤ +55°) corresponding to the gaze intersected AOI before and after the azimuthal sound shift, the slope parameter c (s⁻¹) is a measure of the speed and eccentricity of the trace (0 ≤ c ≤ 130), t (s) is the time, and T (s) is the latency for each trial (T ≥ 0). Non-linear constrained optimization minimized the squared error between Equation (1) and the samples and returned optimal a1, a2, c, and T for each trial (analysis window: 4.1 s/82 samples). The fitting of the function was done with the open-source package SciPy (version 0.16.1, http://www.scipy.org/) in the Python programming language (version 2.7, available at https://www.python.org/, Python Software Foundation), and the optimization was performed with the bound-constrained version of the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm (L-BFGS-B; Byrd et al., 1995).
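Under these constraints, the per-trial fit can be sketched in modern Python/NumPy (the original used SciPy 0.16.1 under Python 2.7; function names and the initial guess are illustrative, not the authors' exact implementation):

```python
import numpy as np
from scipy.optimize import minimize

FS = 20.0                                    # eye-tracker sampling rate (Hz)
# 82-sample analysis window: 2.5 s before to 1.6 s after the shift at t = 0
t_window = np.arange(82) / FS - 2.5

def arctan_model(t, a1, a2, c, T):
    """Equation (1): gaze moves from a1 to a2, reaching 50% amplitude at t = T."""
    return a1 + (a2 - a1) * (0.5 + np.arctan(c * (t - T)) / np.pi)

def fit_trial(t, gaze):
    """Least-squares fit of Equation (1) to one trial's gaze samples using
    bound-constrained L-BFGS-B; returns the optimal (a1, a2, c, T)."""
    sse = lambda p: float(np.sum((arctan_model(t, *p) - gaze) ** 2))
    p0 = [gaze[0], gaze[-1], 10.0, 0.3]                   # crude initial guess
    bounds = [(-55, 55), (-55, 55), (0, 130), (0, None)]  # a1, a2, c, T >= 0
    return minimize(sse, p0, method="L-BFGS-B", bounds=bounds).x
```

In practice the gaze samples are quantized to the twelve AOI azimuths, so the fit is to a step-like trace rather than the smooth curves used here for illustration.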

Exclusion of trials
Trials with T = 0 ms were excluded due to presumed failure of the optimization. Trials with T > 1.6 s, i.e. longer than the sound-only period, were also excluded.
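Combined with the SLL definition above, the exclusion rule amounts to the following sketch (the T-values shown are hypothetical):

```python
SOUND_ONLY = 1.6   # duration of the sound-only period (s)

def include_trial(T):
    """Keep a trial only if 0 < T <= 1.6 s: T == 0 signals a failed
    optimization, and T beyond the sound-only period is discarded."""
    return 0.0 < T <= SOUND_ONLY

latencies = [0.0, 0.28, 0.41, 1.9, 0.33]         # hypothetical per-trial T-values (s)
kept = [T for T in latencies if include_trial(T)]
sll = sum(kept) / len(kept)                      # SLL: mean of the included T-values
```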

Normal and simulated UHL listening conditions
The subjects participated in one normal-hearing (NH) and two simulated UHL conditions. Unilateral attenuation was achieved by an ear plug in the right ear (EAR Classic foam ear plug, 3M, Minneapolis, USA) or a combination of an ear plug and a circum-aural hearing protector (Bilsom 847 NST II) placed over the ear plug (Asp et al., 2018). The left ear plug and hearing protector were removed during sound localization tests. The simulated UHL was estimated by hearing thresholds with bilateral attenuation according to standard (ISO 4869-1, 2018). Hearing thresholds were measured by frequency-modulated tones in sound field in an audiometric test room using a fixed-frequency Békésy technique (Asp et al., 2018; Berninger et al., 2014). The mean ± SD simulated UHL (the average threshold at 0.5, 1, 2, and 4 kHz in dB HL) obtained with earplugs was 30 ± 5.9 dB HL, hence denoted SUHL30. The combination of earplugs and hearing protectors yielded a mean ± SD SUHL of 43 ± 4.7 dB HL, hence denoted SUHL43; see the supplementary material in Asp et al. (2018) for individual hearing thresholds.

Linear mixed modelling and statistical analyses
The statistical software used was R version 3.4.2 (R Foundation for Statistical Computing, Austria). The T-values in the normal-hearing condition were modeled by the effects of separation between loudspeakers (10-110°), azimuth after shift (−55° to +55°), and time elapsed during the test (trial number 1-24). Due to the presence of repeated measures, we used linear mixed modeling (package lme4, version 1.1.21) with subject as random intercept. The fixed effect coefficients were evaluated by p-values with Satterthwaite's method for estimating the degrees of freedom (package lmerTest, version 3.1.0; Kuznetsova et al., 2017) and the model fit by R2 (marginal generalized linear mixed modeling R2 from package MuMIn, version 1.43.6).
Linear mixed modelling was further used to study whether T-values could be predicted by SUHL condition and, further, by the hearing thresholds (0.5, 1, 2, 3, 4, and 6 kHz), with random intercept for the subjects. The relationships between localization accuracy (defined as the absolute difference between the presented AOI and the responded gaze intersected AOI in degrees), latency (T-values), and SUHL threshold were investigated by means of repeated measures correlation (package rmcorr, version 0.3.0). Since SUHL was assumed to be constant throughout a test (as measured before the test), we aggregated the latency (i.e. average latency) and accuracy per test. The accuracy was quantified as an Error Index (EI), a normalized mean absolute error where zero is perfect localization and 1.0 corresponds to random performance; see Asp et al. (2018) for a formal definition. Williams' test (package psych, version 1.8.12) was used to study the difference between two dependent correlations with one parameter in common.
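The Error Index is formally defined in Asp et al. (2018); as a rough sketch only, assuming (our assumption, not the published definition) that the normalizer is the expected absolute error of uniformly random responses over the twelve AOIs:

```python
import itertools
import statistics

AOIS = list(range(-55, 56, 10))   # the 12 loudspeaker/display azimuths (degrees)

# Expected mean absolute error if responses were uniformly random over the AOIs
RANDOM_MAE = statistics.mean(
    abs(p - r) for p, r in itertools.product(AOIS, AOIS))

def error_index(presented, responded):
    """Normalized mean absolute error: 0 = perfect localization, ~1.0 = random
    responding (the normalizer here is an assumption, not the formal definition)."""
    mae = statistics.mean(abs(p - r) for p, r in zip(presented, responded))
    return mae / RANDOM_MAE
```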
Differences in the number of included trials and the goodness of the arctangent fit between the conditions were quantified by means of repeated measures ANOVA or Kruskal-Wallis where appropriate.

Sound localization latency in the normal-hearing condition
The mean ± SD SLL was 280 ± 40 ms (n = 8) (Table 1). No learning effect was observed (test − retest mean ± SD = −10 ± 20 ms). The 95% confidence interval of a single SLL was ±26 ms (SD(test−retest)/√2 × 1.96). The intrasubject variability in T-values (average SD = 91 ms, n = 8 subjects) was more than twice as large as the SD in SLL.

Fitting of arctangent function
Gaze intersected AOI samples and corresponding arctangent functions are depicted for trials 1, 12, and 24 for each subject in Fig. 2. Nearly all trials (23 of 24 on average) could be modeled by an arctangent function (Table 1).

Relationship between sound localization latency and hearing threshold in the experimental ear
SLL across subjects and conditions for each frequency (Fig. 3) revealed a prolongation of SLL as a function of increasing hearing threshold. This was confirmed by adding the hearing thresholds as fixed effects (χ²(6) = 26.4, p < 0.0001), which demonstrated a significant effect of the hearing threshold at 1 kHz on T-values (p < 0.05, Table 2). When the non-significant effects (Table 2) were removed from the model, the threshold at 1 kHz predicted that T-values increased by 7.4 ms/dB (t(563) = 14.6, p < 0.0001), with R2 GLMM(M) = 0.27 (p < 0.0001).

Relationship between sound localization latency and accuracy
There was no correlation between localization accuracy and T-values within a test (repeated measures R2 = 0.002, 0.0004, and 0.003 with p's > 0.5 for the three conditions), suggesting no accuracy/latency tradeoff. However, when taking all conditions into consideration, a statistically significant relationship between T-values and sound localization accuracy across listening conditions existed (repeated measures R2 = 0.13, p < 0.0001), indicating prolonged T-values and decreasing localization accuracy as the SUHL thresholds increased. The relationship between sound localization latency and accuracy was further studied for the overall latency and accuracy measures (SLL versus EI, Fig. 4). The repeated measures R2 for SLL versus EI was 0.94 (p < 0.0001). This value was statistically significantly different from the repeated measures R2 for Error Index versus threshold at 1 kHz (0.77, p < 0.0001), and for SLL versus threshold at 1 kHz (0.82, p < 0.0001) (Williams' test, t(24) = 2.5, p = 0.02 for both comparisons). Fig. 5 shows trials 1, 12, and 24 for each subject in SUHL30 and SUHL43, respectively.

Fitting of arctangent function
The mean number of trials included in the analysis was 23 of 24 in the SUHL conditions, i.e., similar to the NH condition (Table 1).
Visual inspection of all trials indicated that most exclusions were related to eye-blinks during the analysis window, or unstable eye-tracking, e.g. gaze shifting between AOIs faster than is physically possible for saccades; see for example supplementary Figure S3, subject (column) 4, trial 21. No significant difference in the number of included trials existed between the three listening conditions (repeated measures ANOVA, F(2,14) = 0.099, p = 0.9). The variability (IQR) in T increased slightly in comparison to the NH condition (SUHL30: 88-270 ms; SUHL43: 130-330 ms). Arctangent functions for all trials (24 trials × 8 subjects × 2 conditions = 384) are shown in Supplementary Figures S4-S7, available online.

Root mean square error of arctangent fit
The median ± IQR root mean square error (RMSE) of the arctangent fits was 2.8° ± 2.1°, 3.3° ± 2.2°, and 4.0° ± 2.2° in conditions NH, SUHL30, and SUHL43, respectively. There was a statistically significant difference in RMSE between the three conditions (Kruskal-Wallis χ²(2) = 44.8, p < 0.0001).
To investigate whether the RMSE of the arctangent fit in each trial affected the SLL, a reanalysis of a subset of trials based on two additional exclusion criteria was done. Out of the previously analyzed trials (n = 558), trials with less than 50% of gaze intersected AOI samples before or after the loudspeaker shift within the analysis window, and trials where the RMSE of the arctangent fit was larger than 7°, were excluded. The additional exclusion criteria were chosen after visual inspection of the traces and of histograms of the percentage loss of samples and the RMSE. When the gaze moved in the opposite direction of the stimulus and stayed there, the model fitted the actual data points regardless of the true direction of the stimulus; see for example supplementary Figure S6, subject 8, trial 1. When the gaze moved in the opposite direction and then later, but within 1.6 s, moved in the other direction, the model still tried to fit an arctangent, but the trial would then be excluded by the additional RMSE criterion; see for example supplementary Figure S8, subject 8, trial 21. By these exclusion criteria, the tails of the RMSE histograms were cut. Loss of eye-tracking samples accounted for the exclusion of 10% of trials (in all conditions), and the RMSE criterion added an additional exclusion of 0.5%, 5%, and 9% of trials in conditions NH, SUHL30, and SUHL43, respectively. This resulted in a subset containing 483 trials. The additional exclusion criteria did not change the SLL or the variability significantly (mean ± SD SLL in NH, SUHL30, and SUHL43: 280 ± 43 ms, 360 ± 63 ms, and 560 ± 130 ms; paired t-test of SLL in all conditions: t(31) = −0.36, p = 0.72; cf. Table 1). Out of the 24 SLLs (8 subjects × 3 conditions), 7 means were the same in the original analysis as in the subset. The maximum deviation in SLL between the original analysis and the subset was 9% (subject 4 in SUHL43 and subject 8 in the NH condition).
In summary, additional exclusion criteria resulted in similar SLL across subjects and condition.

Discussion
This study demonstrated the feasibility of a rapid and objective eye-tracking method to assess SLL in the frontal horizontal plane in normal-hearing subjects. A sigmoid function was used as a mathematical model to reflect auditory elicited gaze patterns following azimuthal sound shifts. The mathematical approach allowed the fitting of eye gaze data with low errors. The median RMSE ranged from 2.8° in normal binaural conditions to 4.0° in a simulated mild-to-moderate UHL condition. SLL increased as a function of simulated UHL, from 280 ms in the NH condition to 540 ms in the SUHL43 condition.
In studies with subjects' heads restrained, sound localization latencies ranged between 182 and 300 ms (Frens and Van Opstal, 1995; Shafiq et al., 1998; Zahn et al., 1978). Two studies that allowed unrestrained heads reported latencies between approximately 120 ms (n = 2, several hundred trials; Goldring et al., 1996) and 320 ms (n = 5; Populin et al., 2002). In comparison, the present study revealed a mean latency of 280 ms, that is, within the range of latencies reported with heads unrestrained. The lack of effect of separation between loudspeakers on T-values is also consistent with previous studies where no effect of eccentricity was found above 15° (head restrained) (Frens and Van Opstal, 1995; Zambarbieri, 2002). The reaction time of the neck muscles is rarely faster than the eye saccade (Goldring et al., 1996; Goossens and Opstal, 1997), which is why latencies are expected to be similar with the head unrestrained. The current setup might, however, have allowed for longer processing of auditory cues due to the freedom of head movements and the continuous stimulus.
It is known from the visual domain that the otherwise common accuracy/latency tradeoff, where increased time for motion planning increases endpoint accuracy, does not apply to saccadic behavior towards visual target sequences (Wu et al., 2010). This is consistent with our findings considering the performance in each trial within a listening condition, that is, no accuracy/latency tradeoff occurred. However, as the SUHL became larger there was indeed an increase in latency as well as a decrease in accuracy (cf. Fig. 4). The effect of the SUHL on accuracy in the present study was not as drastic as in previous studies (for a review see Kumpik and King, 2019). The main reason for this difference is probably related to the measurement paradigm. Unlike previous studies, where subjects were instructed to face 0° azimuth between presentations of sounds of short duration, the subjects in the present study were allowed to move their heads freely throughout the test while localizing a continuous sound that changed azimuth at random time intervals. The 1.6 s sound-only period likely allowed subjects to listen to the stimulus at more than one head angle with respect to the presented azimuth before the final eye gaze position. As such, monaural head-shadow cues were probably more salient than the cues available for a brief stimulus and could have increased accuracy. Furthermore, the visibility of the twelve possible targets might have increased accuracy and decreased latency compared to other setups. However, the use of gaze intersected AOI rather than raw gaze might have decreased the RMSE but should have limited effect on the SLL measure.
The statistically significant relationship between mean latency and accuracy across conditions suggests that increasing the difficulty of an auditory target task (here achieved by unilateral ear plugging) results in increased uncertainty in the integration of available localization cues. This uncertainty leads to a decrease in accuracy and a corresponding prolongation of the processing in the non-auditory centres involved in the determination of gaze movement. For example, perfect accuracy (EI = 0) corresponded to a latency of 270 ms, whereas poor accuracy (e.g. EI = 0.8) corresponded to a latency of 670 ms. The repeated measures correlation, which accounted for 94% of the variance in SLL, predicted that the mean EI in the SUHL43 condition (EI = 0.54, cf. Asp et al., 2018) corresponded to an SLL of 540 ms, i.e. twice the processing time of the normal condition (280 ms; delay in the motor pathway = 20 ms, Fuchs et al., 1985; Zambarbieri, 2002). In support of this finding, a corresponding statistically significant relationship has been found for the effect of decreased target luminance on visually evoked saccadic latency and accuracy in the frontal horizontal plane (e.g. Warren et al., 2013; Wheeless et al., 1967). Also, head saccades show a substantial increase in latency when one ear receives simulated cochlear implant stimulation (Ausili et al., 2019). The strong relation between latency and accuracy as a function of increasing task difficulty (i.e. degree of SUHL) remains to be studied in clinical cohorts. It has, for example, been demonstrated that some listeners with permanent UHL of various degrees may achieve relatively high localization accuracy (Firszt et al., 2017; Johansson et al., 2019; Slattery and Middlebrooks, 1994), and that daily training with a unilateral ear plug in normal-hearing subjects results in increased accuracy through a reweighting of the available localization cues (e.g. spectral cues, loudness cues, and binaural cues) (Kumpik et al., 2010).
Accuracy and latency of sound localization responses might thus be affected differently than in the acute SUHL conditions in the present study. Hence, the clinical value of SLL remains to be determined, but the importance of a combination of latency and accuracy measures has been pointed out previously (Fitts, 1966;Wickelgren, 1977).
One possible source of increased latency in the SUHL conditions is the processing of the asymmetric binaural input at the level of the brainstem. There is a large disparity between maturation of central auditory pathways at the cortical level (Sharma et al., 2002) and maturation of sound localization abilities (Asp et al., 2011, 2016; Kumpik and King, 2019; van Deun et al., 2009). Sound localization accuracy matures during the first few years of life (Asp et al., 2016) in a quite similar way to the maturation of the auditory brainstem response (ABR) wave I-V latency (Eggermont and Salamy, 1988), whereas latencies of cortical auditory-evoked potentials decrease into adulthood. In addition, it has been shown that the ABR I-V latency is closely related to sound localization accuracy in children with unilateral hearing loss (Johansson et al., 2019). Thus, the auditory brainstem seems to have an important role in sound localization ability (Grothe et al., 2010), but the development of SLL from early childhood to adolescence has not yet been investigated.
From a physiological point of view, it is interesting to observe the similarity in latency, dynamic behavior, and overlap of anatomical structures between the acoustic middle ear reflex and SLL. At reflex threshold, the latency of the acoustic middle ear reflex for a 500 Hz pure tone is 240 ms, decreasing relatively rapidly to 120 ms at 10 dB above threshold, followed by a slower decrease to 100 ms at 25 dB above threshold (abscissa corresponding to 50% amplitude) (see Borg, 1982, Fig. 1). These dynamic characteristics correspond to latency changes of 12 ms/dB for the first 10 dB above threshold, followed by a decrease of 1.3 ms/dB between 10 and 25 dB above threshold. In the present study, the increase in SLL as a function of simulated UHL threshold was 7.4 ms/dB at 1 kHz, that is, within the range of latency changes of the acoustic middle ear reflex in response to varying stimulus level. The similarity in latencies and dynamic characteristics might be due to the overlap of the anatomical structures underlying stapedius reflex activity and horizontal sound localization. The neural organization of the acoustic reflex response is 1) auditory afferents, 2) cells in the ventral cochlear nucleus with axons in the trapezoid body, 3) interneurons in the MSO, leading to 4) stapedius motoneurons in the facial motor nucleus (Borg, 1973). Correspondingly, for the ITD: 1) auditory afferents, 2) cochlear nucleus (spherical bushy cells), and 3) MSO (Laumen et al., 2016); and for the ILD: 1) auditory afferents, 2) cochlear nucleus (spherical bushy cells), 3) LSO, and 4) an inhibitory contralateral path via globular bushy cells and the medial nucleus of the trapezoid body (Laumen et al., 2016). The relation between the ABR wave I-V latency and sound localization accuracy (Johansson et al., 2019), and the similarity in latencies and dynamic behavior between the acoustic middle ear reflex and sound localization responses, indicate that the brainstem has an important role in the processing of auditory spatial information.
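The quoted slopes follow directly from the cited latency values; a simple arithmetic check (values as quoted from Borg, 1982, in the text above):

```python
# Arithmetic check of the stated middle ear reflex latency slopes.
# Latencies (ms) at dB above reflex threshold, as quoted in the text.
reflex_latency_ms = {0: 240.0, 10: 120.0, 25: 100.0}

slope_0_10 = (reflex_latency_ms[0] - reflex_latency_ms[10]) / (10 - 0)
slope_10_25 = (reflex_latency_ms[10] - reflex_latency_ms[25]) / (25 - 10)

print(slope_0_10)               # 12.0 ms/dB for the first 10 dB
print(round(slope_10_25, 1))    # 1.3 ms/dB between 10 and 25 dB
```

Both values match the slopes stated in the text, against which the 7.4 ms/dB SLL change is compared.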
While the present study did not quantify the ITD imposed by the simulated UHL, the ear plug used introduces an ITD (Kumpik et al., 2010). ITDs differing from zero have been shown to increase latencies of electrophysiologically recorded postauricular muscle reflexes (Doubell et al., 2018) as well as of brainstem responses (Laumen et al., 2016). Furthermore, Polyakov and Pratt (2003) have shown that ABR latencies are sensitive to spatial cues. It therefore seems likely that the SLL was affected by the ITD in addition to the large ILD introduced in the simulated UHL conditions, rather than by an overall decrease in audibility of the stimulus. Indeed, Gabriel et al. (2010) showed that a decrease in presentation level from 75 to 55 dB(A) did not affect localization latency, further indicating that overall audibility does not change localization latency.
The corneal reflection eye-tracking technique for the assessment of SLL presented in this study may, though currently costly, be clinically feasible due to the short testing time (~3 min), high test-retest reliability, and objective evaluation of auditory evoked saccadic responses. Horizontal sound localization accuracy has been assessed from 6 months of age with the same technique as in the present study (Asp et al., 2016). Hence, SLL may possibly be determined in infants and young children (Eklöf, Asp, & Berninger, presentation at ARO, Association for Research in Otolaryngology, Mid-Winter Meeting, Baltimore, MD, 2019). It remains to be confirmed in future studies in clinical cohorts whether the SLL measure may add information to the assessment of sound localization accuracy, and thereby enhance the evaluation of binaural hearing at young ages, possibly at the level of the brainstem.

Conclusions
Sound localization latency was successfully determined by fitting an arctangent function to eye-gaze patterns evoked by azimuthal sound shifts in the frontal horizontal plane. The latency increased significantly following simulated UHL, accompanied by a corresponding decrease in sound localization accuracy, resulting in a close relationship between SLL and sound localization accuracy across listening conditions.
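The latency definition used throughout (the abscissa at 50% of the arctangent amplitude) can be sketched as follows. This is a minimal illustration on a noiseless synthetic trajectory; the model shape, parameter values, and the study's exact parameterization and fitting procedure are assumptions here:

```python
import numpy as np

# Illustrative arctangent gaze model: for this curve, the abscissa at
# 50% of the (asymptotic) amplitude equals the midpoint parameter t50.
# All parameter values below are invented for illustration.
def gaze_model(t, baseline, amplitude, t50, slope):
    """Gaze azimuth (deg) as a sigmoid-like arctangent function of time (s)."""
    return baseline + (amplitude / np.pi) * (np.arctan(slope * (t - t50)) + np.pi / 2)

t = np.linspace(0.0, 1.0, 2001)            # time samples (s)
y = gaze_model(t, 0.0, 30.0, 0.28, 40.0)   # 30 deg azimuthal shift, t50 = 280 ms

half = 0.0 + 30.0 / 2.0                    # 50% of the arctangent amplitude
latency_s = float(np.interp(half, y, t))   # abscissa at the 50% point
print(round(latency_s * 1000))             # 280 (ms), i.e. recovers t50
```

In practice the study fitted this kind of function to noisy gaze samples per trial and averaged the per-trial latencies to obtain SLL; the 50%-amplitude criterion makes the latency estimate insensitive to the overall size of the gaze shift.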

Funding
This work was supported by Karolinska University Hospital, the regional agreement on medical training and clinical research (ALF) between Stockholm County Council and Karolinska Institutet, STAF (Swedish Technical Audiological Society), and the Tysta Skolan foundation. The study was presented as a poster during the 40th and 41st Midwinter Research Meetings, Association for Research in Otolaryngology, Baltimore, MD, 2017 and San Diego, CA, 2018.

Declaration of competing interests
The authors do not have any conflicting interests to declare.