Introduction

The dominant low-frequency localization cue in humans is the interaural time difference (ITD), which arises from small differences in acoustic path length between the two ears (Wightman and Kistler 1992). Jeffress (1948) suggested that ITDs are converted into a place representation by delay lines from each ear feeding coincidence-detector neurons that fire maximally when their input spikes arrive simultaneously. Although the complete validity of this model has been questioned, particularly the existence of delay lines (Batra and Yin 2004; Brand et al. 2002; McAlpine and Grothe 2003; McAlpine et al. 2001; Palmer 2004), there is plentiful evidence that neurons in the medial superior olive (MSO; Yin and Chan 1990) do indeed act as coincidence detectors, and that this behavior is reflected in the inferior colliculus (IC; e.g., Batra et al. 1997; Yin et al. 1987).

This coincidence detection is similar to cross-correlation, the dominant psychophysical model of ITD processing and binaural masking (e.g., Colburn 1977, 1996; Shackleton et al. 1992; Stern and Colburn 1978). If l(t) and r(t) are the probabilities that a spike will occur at time t on the inputs to the coincidence detector from the left and right ears, respectively, and the two inputs are independent, then the coincidence detector will fire when both occur simultaneously, with probability p(t) = l(t)r(t). Apart from the exponential term, which represents a temporal integration window, this is the same as the term under the integral in the running cross-correlation function used in most models of binaural hearing (e.g., Shackleton et al. 1992):

$$ \rho(t,\tau) = \int_{-\infty}^{t} l(t')\, r(t' - \tau)\, \mathrm{e}^{-(t - t')/\Omega}\, \mathrm{d}t', $$

where τ is the internal delay and Ω is the time constant of the temporal integration.
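As an illustration only, the running cross-correlation above can be computed in discrete time. This minimal NumPy sketch assumes inputs sampled at a rate `fs`, an internal delay τ rounded to a whole number of samples, and zero input before onset; the function name `running_xcorr` is hypothetical. Because the exponential window decays by the same factor at every step, the integral can be updated recursively: each new sample decays the running sum and adds the current coincidence term l(t')r(t' − τ).

```python
import numpy as np

def running_xcorr(left, right, fs, tau, omega):
    """Running cross-correlation rho(t, tau) with an exponential window.

    left, right : spike probabilities (or signals) sampled at fs Hz
    tau         : internal delay in seconds, applied to the right input and
                  rounded to a whole number of samples
    omega       : time constant of the integration window in seconds
    """
    shift = int(round(tau * fs))
    r_delayed = np.zeros_like(right, dtype=float)   # r(t' - tau), zero before onset
    if shift >= 0:
        r_delayed[shift:] = right[:len(right) - shift]
    else:
        r_delayed[:shift] = right[-shift:]
    coincidence = left * r_delayed          # p(t') = l(t') r(t' - tau)
    decay = np.exp(-1.0 / (omega * fs))     # e^{-(t - t')/omega} for a one-sample step
    rho = np.empty(len(coincidence))
    acc = 0.0
    for i, c in enumerate(coincidence):
        acc = acc * decay + c / fs          # decay the past, add the current dt' term
        rho[i] = acc
    return rho
```

With constant inputs the output settles at roughly the coincidence probability multiplied by Ω, and it is largest when τ matches the delay between the two inputs.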

Although the ITD sensitivity of neurons in the mammalian MSO and IC has been extensively studied, there have been few studies of their sensitivity to interaural correlation, and these have mainly been limited to correlations of −1, 0, and +1, or to large steps in correlation (e.g., Joris 2003; Palmer et al. 1999; Yin et al. 1987; Yin and Chan 1990). The only published reports of sensitivity to fine-scale variation in interaural correlation are from the inferior colliculus and optic tectum of the barn owl (Albeck and Konishi 1995; Saberi et al. 1998).

In this paper we present measurements of the interaural correlation sensitivity of neurons in the IC of guinea pigs. In order to estimate the probability distribution of spike counts, we collected responses to a large number of stimulus repeats. Receiver operating characteristic (ROC) analysis was then used to estimate the neural interaural correlation discrimination thresholds (Bradley et al. 1987; Cohn et al. 1975; Green and Swets 1974; Shackleton et al. 2003).

Methods and stimuli

Physiological preparation

Recordings were made in the right IC of 36 pigmented guinea pigs weighing 342–779 g. The experiments were organized into two groups. In an initial group of 29 animals, data from 54 neurons were analyzed; in most of these experiments, data were also collected for other purposes. In a second group of 7 animals, data from 30 neurons were analyzed.

Animals were anesthetized with urethane (1.3 g/kg i.p., in 20% solution in 0.9% saline) and Hypnorm (Janssen; 0.2 ml i.m., comprising fentanyl citrate 0.315 mg/ml and fluanisone 10 mg/ml). To prevent bronchial secretions, atropine sulfate (0.06 mg/kg s.c.) was administered at the start of the experiment. Anesthesia was supplemented with further doses of Hypnorm (0.2 ml i.m.) when indicated by the pedal withdrawal reflex. A tracheotomy was performed, and core temperature was maintained at 38°C via a heating blanket and rectal probe. The animals were placed in a stereotaxic frame inside a sound-attenuating room, with hollow plastic speculae replacing the ear bars to allow sound presentation and direct visualization of the tympanic membrane. A craniotomy was performed over the position of the IC. The dura was reflected and the surface of the brain covered with a solution of 1.5% agar in 0.9% saline. Heart rate was monitored using a pair of electrodes inserted into the skin on either side of the thorax. In early experiments, respiratory rate was monitored by means of a fine polythene tube inserted into the tracheal cannula and connected to a low-pressure transducer. In later experiments, animals were artificially ventilated with pure oxygen via the tracheal cannula, and the depth of ventilation was adjusted to keep end-tidal CO2 between 24 and 36 mmHg. All experiments were carried out in accordance with the UK Animals (Scientific Procedures) Act 1986.

Recordings were made from single, well-isolated neurons with glass-insulated tungsten electrodes (Bullock et al. 1988), advanced vertically through the intact cortex into the inferior colliculus by a piezoelectric motor (Burleigh Inchworm IW-700/710). Extracellular action potentials were amplified (Axoprobe 1A; Axon Instruments, Foster City, CA, USA), filtered between 300 Hz and 2 kHz, and discriminated using a level-crossing detector (SD1; Tucker-Davis Technologies, Alachua, FL), and their times of occurrence were recorded with a resolution of 1 μs.

Stimulus generation

Stimuli were delivered to each ear through sealed acoustic systems comprising custom-modified RadioShack 40-1377 tweeters joined via a conical section to a damped 2.5-mm-diameter, 34-mm-long tube (M. Ravicz, Eaton Peabody Laboratory, Boston, MA, USA), which fitted into the hollow speculum. The output was calibrated a few millimeters from the tympanic membrane using a Brüel and Kjær 4134 microphone fitted with a calibrated 1-mm probe tube.

All stimuli were digitally synthesized (System II, Tucker-Davis Technologies) at sampling rates between 100 and 200 kHz and output through a waveform reconstruction filter set at one fourth of the sampling rate (135 dB/octave elliptic: Kemo 1608/500/01 modules supported by custom electronics). Unless otherwise stated, stimuli were of 50-ms duration at 20 dB above the threshold for that stimulus, and were switched on and off simultaneously in the two ears with cosine-squared gates of 2-ms rise/fall time (10–90%). Because gating was simultaneous in both ears, the stimuli contained only ongoing interaural phase differences (IPDs) and no onset ITD. The search stimulus was a binaural pure tone of variable frequency and level, presented every 250 ms. An IPD of 0.1 cycles was used for the search stimulus because this is the modal characteristic delay in the IC (McAlpine et al. 2001). When a neuron was isolated, its lowest threshold and the frequency at that threshold [characteristic frequency (CF)] were obtained audiovisually. Frequency response areas, rate-level functions, and peristimulus time histograms (PSTHs) were obtained using pure tones to characterize the neurons and verify their location in the central IC (see Shackleton et al. 2003 for details). Rate-level functions were also obtained using uncorrelated broadband or narrowband noise to determine the threshold for that stimulus. Rate threshold was defined for pure tones as the audiovisual threshold at zero ITD, and for noise as the level at which the rate-level function visually departed from the spontaneous rate.
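A minimal NumPy sketch of the gating, assuming that "2 ms rise/fall (10–90%)" refers to the amplitude envelope and that the onset ramp has the shape sin²(πt/2T) over a full ramp length T; the 500-Hz tone, the sign of the 0.1-cycle IPD, and the function names are hypothetical. Because the envelope of this ramp reaches level L at t/T = (2/π)·arcsin(√L), the 10–90% time is a fixed fraction (about 0.59) of the full ramp, so the full ramp must be longer than 2 ms.

```python
import numpy as np

def ramp_duration_from_rise_time(rise_10_90):
    """Full length of a cos^2 ramp whose envelope goes from 10% to 90% in rise_10_90 s."""
    # Envelope sin^2(pi*t/(2*T)) reaches level L at t/T = (2/pi)*arcsin(sqrt(L))
    frac = (2 / np.pi) * (np.arcsin(np.sqrt(0.9)) - np.arcsin(np.sqrt(0.1)))
    return rise_10_90 / frac

def cos2_gate(n_samples, fs, rise_10_90=0.002):
    """Gating envelope with cos^2-shaped onset and offset ramps."""
    n_ramp = int(round(ramp_duration_from_rise_time(rise_10_90) * fs))
    ramp = np.sin(0.5 * np.pi * np.arange(n_ramp) / n_ramp) ** 2   # 0 -> ~1
    env = np.ones(n_samples)
    env[:n_ramp] = ramp
    env[-n_ramp:] = ramp[::-1]
    return env

fs = 100_000                       # lowest sampling rate quoted above
n = int(round(0.050 * fs))         # 50-ms stimulus
env = cos2_gate(n, fs)
t = np.arange(n) / fs
# One envelope for both ears: the 0.1-cycle IPD is carried by the fine structure only
left = env * np.sin(2 * np.pi * 500 * t)
right = env * np.sin(2 * np.pi * 500 * t + 2 * np.pi * 0.1)
```

Under these assumptions the full ramp lasts about 3.4 ms. Applying the same envelope to both ears is what guarantees the absence of an onset ITD.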

In the first part of the experiment, we recorded from all neurons whose tone delay functions showed any modulation, judged subjectively. In the second part, we recorded only from neurons with well-modulated tone delay functions and a spike signal-to-noise ratio judged likely to be sufficient for a recording lasting two to three hours. Spikes were included in the spike count if they occurred between 0 and 80 ms after stimulus onset. The position and duration of this counting window did not critically affect the results, so it was set wide to avoid missing any stimulus-driven response, at the expense of including a small amount of spontaneous activity.

Special considerations for controlling stimulus interaural correlation

The measured interaural correlation of a stimulus varies between tokens and, over time, within a long token. Standard noise-generation methods create noises with an expected interaural correlation r, but the value actually obtained is randomly distributed around this expected value from trial to trial. Figure 1 plots the standard deviation of the measured token interaural correlation as a function of expected correlation, together with the correlation thresholds obtained by Pollack and Trittipoe (1959a). These thresholds parallel the intrinsic variability of the correlation, which raises the possibility that correlation sensitivity is limited mainly by intrinsic variability in the stimulus. For this reason, we carefully controlled the correlation of the stimuli used in this experiment so that we could isolate the effects of internal noise.
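The trend in Figure 1 can be reproduced qualitatively with a small Monte Carlo simulation. This sketch assumes independent Gaussian samples in place of band-limited noise, a hypothetical token length `n` (for band-limited noise, roughly the number of independent samples, about 2WT), and the two-generator mixing rule described later under "Rate vs. interaural correlation functions". For comparison it prints the standard large-sample approximation (1 − r²)/√n for the standard deviation of a sample correlation of Gaussian data; this is not the empirically fitted function of Figure 1.

```python
import numpy as np

def token_correlation(x, y):
    """Normalized zero-lag interaural correlation of one token (no mean removal)."""
    return np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y))

def correlation_sd(r, n_samples, n_tokens=500, seed=0):
    """SD of measured correlation across tokens with an expected correlation r."""
    rng = np.random.default_rng(seed)
    measured = np.empty(n_tokens)
    for k in range(n_tokens):
        n1 = rng.standard_normal(n_samples)
        n2 = rng.standard_normal(n_samples)
        right = r * n1 + np.sqrt(1 - r ** 2) * n2   # two-generator mixing
        measured[k] = token_correlation(n1, right)
    return measured.std()

n = 200  # hypothetical number of independent samples per token
for r in (0.0, 0.5, 0.9, 1.0):
    print(r, round(correlation_sd(r, n), 4), round((1 - r ** 2) / np.sqrt(n), 4))
```

The variability shrinks toward zero as |r| approaches 1, which is the same qualitative shape as both curves in Figure 1.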

Fig. 1

Standard deviation of the interaural correlation of noise samples (left axis) and human correlation discrimination threshold (right axis) as a function of interaural correlation. Open symbols show the standard deviation of the measured correlation of 500 noise tokens for two different bandwidth (W) × duration (T) products (using the first method of generation described in the section “Rate vs. interaural correlation functions”). The solid line running between these points is the empirically fitted function \( r:\sqrt{1 - r^{2}} \), which follows the data well. A similar plot appears as Figure 6 of Gabriel and Colburn (1981), with a less well-fitting function; they note that an analytical solution is difficult (p. 1397). Human psychophysical interaural correlation discrimination thresholds for 1-s noise samples at 85 dB SPL (Pollack and Trittipoe 1959a) are shown as solid circles joined by lines.

The methods used controlled the variability in the interaural correlation of the stimulus; however, there are three factors affecting the interaural correlation of the internal representation of the stimulus before it reaches the coincidence detector: filtering, temporal variability, and internal delays.

Filtering of a broadband stimulus by the basilar membrane does not alter its expected correlation; however, because filtering effectively resamples the stimulus, it can change the measured interaural correlation of an individual filtered token. To control for this, in one condition we used narrowband stimuli with the same equivalent rectangular bandwidth (ERB) as the guinea pig cochlear filters, to minimize alteration of the stimulus by cochlear filtering.

The short-term correlation of a stimulus varies randomly throughout its duration. This variation can be smoothed out by averaging over a long time, and there is psychophysical evidence that humans do exactly this (Bernstein and Trahiotis 1997). However, it is probable that the integration times at the IC are very short, leading to a variation in firing rate as the short-term correlation varies. We used short, 50-ms noise tokens and controlled for the variability between tokens. It is likely that the integration time at the IC is shorter than 50 ms; however, this seemed a reasonable compromise between reducing stimulus variability and recording steady-state responses beyond the onset response.

The prevailing model of ITD sensitivity consists of internal delays followed by coincidence detection (Jeffress 1948). An internal delay interaurally decorrelates the spike trains before they reach the coincidence detector, so rate vs. interaural correlation functions (rICFs) do not measure the full correlation sensitivity of the coincidence detector itself unless the internal delay is compensated for. The ITD required to achieve this was estimated from the delay function (see below) and was termed the “compensation” ITD. rICFs were collected at both zero and “compensation” ITD.

Delay (ITD) functions

Delay functions were obtained by delaying or advancing the fine structure of the signal to the ipsilateral ear while keeping the signal to the contralateral ear fixed. Positive ITDs correspond to the contralateral signal leading (i.e., the ipsilateral signal delayed). We obtained delay functions over ±1.5 cycles of CF in steps of 0.1 cycle of CF, using 50 repeats at a repetition rate of 5/s. A single repeat consisted of the full range of ITD steps presented in pseudorandom order. In the earlier experiments, delay functions were obtained for both tones and broadband (50–5000 Hz) noise.
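The ITD grid and presentation order described above can be sketched as follows. This is an illustrative NumPy sketch, not the original System II code: the 500-Hz CF and all function names are hypothetical, and the fine-structure delay is implemented here as a circular linear-phase shift in the frequency domain (one common way to impose fractional-sample delays), with the gate applied afterwards so that it stays identical in both ears.

```python
import numpy as np

def itd_grid_us(cf_hz, max_cycles=1.5, step_cycles=0.1):
    """ITDs in microseconds spanning ±max_cycles periods of CF in step_cycles steps."""
    n_steps = int(round(2 * max_cycles / step_cycles)) + 1
    return np.linspace(-max_cycles, max_cycles, n_steps) / cf_hz * 1e6

def delay_fine_structure(x, delay_s, fs):
    """Circularly delay x by delay_s seconds with a linear phase shift.

    Fractional-sample delays are allowed; a negative delay advances the signal.
    A positive ITD would be imposed by delaying the ipsilateral waveform.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return np.fft.irfft(spectrum * np.exp(-2j * np.pi * freqs * delay_s), n=len(x))

def presentation_order(n_conditions, n_repeats=50, seed=0):
    """One pseudorandom permutation of all conditions per repeat."""
    rng = np.random.default_rng(seed)
    return [rng.permutation(n_conditions) for _ in range(n_repeats)]

itds = itd_grid_us(500.0)            # hypothetical 500-Hz CF: -3000 .. +3000 us
schedule = presentation_order(len(itds))
```

For a 500-Hz CF this gives 31 ITDs from −3000 to +3000 μs; at 5 stimuli/s, 50 repeats of the full grid take 310 s.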

In later experiments, delay functions were obtained for tones and for narrowband noise centered on CF, with a bandwidth equal to the ERB measured physiologically and behaviorally in the guinea pig:

$$\text{ERB}(f_{\text{c}}) = 6.477\, f_{\text{c}}^{0.56},$$

where ERB and \( f_{\text{c}} \) are measured in Hz (Evans 2001; Evans et al. 1992). Neurons were classified as peak, trough, or asymmetrical based upon the shape of the noise delay function. Peak neurons had a clear peak with no comparably sized dip below the mean firing rate (Fig. 3Ab). Trough neurons were the converse, with a clear trough and no peak (Fig. 3Bb). Asymmetrical neurons had both a peak above and a trough below the mean rate (Fig. 3Cb). Peak (or trough) position was visually estimated as the position where the peak (or trough) would have occurred if the function had been continuous, and was used as an estimate of the delay necessary to compensate for internal delays (“compensation” delay). In early experiments, a single tone delay function at CF was used to estimate the “compensation” delay, whereas in later experiments the narrowband noise delay function was used. Mean best phase (BP) and vector strength were calculated from the delay functions using a modification of the method of Goldberg and Brown (1969), in which the delay function was treated like a period histogram and the strength of locking to the ITD was measured.
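One way to treat a delay function like a period histogram is to express each ITD as a phase in cycles of CF, weight a unit vector at that phase by the response, and sum. The sketch below assumes exactly that; the details of the authors' modification of Goldberg and Brown (1969) are not given here, so the function name and the choice to drop the +1.5-cycle point (which has the same phase as −1.5 cycles) are hypothetical. The resultant's angle gives the mean best phase and its normalized length gives the vector strength.

```python
import numpy as np

def best_phase_and_vector_strength(itd_cycles, rates):
    """Treat a delay function like a period histogram (cf. Goldberg and Brown 1969).

    itd_cycles : ITDs expressed in cycles of CF
    rates      : mean response (spike count) at each ITD
    Returns (mean best phase in cycles, in [0, 1); vector strength in [0, 1]).
    """
    itd_cycles = np.asarray(itd_cycles, dtype=float)
    rates = np.asarray(rates, dtype=float)
    resultant = np.sum(rates * np.exp(2j * np.pi * itd_cycles))
    best_phase = (np.angle(resultant) / (2 * np.pi)) % 1.0
    vector_strength = np.abs(resultant) / np.sum(rates)
    return best_phase, vector_strength

# Synthetic delay function peaking at 0.14 cycles, sampled over exactly three
# cycles (-1.5 .. +1.4) so that the +1.5-cycle point does not duplicate -1.5
itd_cycles = np.arange(-15, 15) / 10.0
rates = 10 + 5 * np.cos(2 * np.pi * (itd_cycles - 0.14))
bp, vs = best_phase_and_vector_strength(itd_cycles, rates)
print(round(float(bp), 3), round(float(vs), 3))   # 0.14 0.25
```

For a cosine delay function with mean 10 and amplitude 5, the vector strength is 5/(2 × 10) = 0.25, and the best phase recovers the peak position.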

Rate vs. interaural correlation functions

Rate vs. interaural correlation functions (rICFs) were obtained by presenting noise stimuli with interaural correlations between −1 and +1 in 0.1 steps at a repetition rate of 5/s. The noise stimulus was either broadband, from 50 to 5000 Hz, or narrowband, centered on CF with a bandwidth equal to the guinea pig ERB (Evans 2001; Evans et al. 1992) and rectangular cutoffs. Signals were of 50-ms duration and were presented at 20 dB above the uncorrelated noise threshold. A single repeat consisted of the full range of interaural correlation steps presented in pseudorandom order.
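The narrowband stimulus can be sketched by evaluating the guinea pig ERB formula at CF and zeroing all spectral components outside CF ± ERB/2. This minimal NumPy sketch assumes arithmetic centering on CF, FFT synthesis of Gaussian noise, and a hypothetical 500-Hz CF; in a 50-ms token the "rectangular" cutoffs are only resolved to the 20-Hz bin spacing.

```python
import numpy as np

def guinea_pig_erb(fc_hz):
    """Guinea pig ERB in Hz from the formula above: 6.477 * fc**0.56."""
    return 6.477 * fc_hz ** 0.56

def rectangular_band_noise(fc_hz, fs, duration_s, rng):
    """Gaussian noise, flat between fc ± ERB/2 with rectangular cutoffs, unit RMS."""
    n = int(round(duration_s * fs))
    half_bw = guinea_pig_erb(fc_hz) / 2
    freqs = np.fft.rfftfreq(n, 1 / fs)
    spec = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
    spec[(freqs < fc_hz - half_bw) | (freqs > fc_hz + half_bw)] = 0   # brick-wall band
    x = np.fft.irfft(spec, n=n)
    return x / np.sqrt(np.mean(x ** 2))      # level would be set afterwards

rng = np.random.default_rng(1)
print(round(guinea_pig_erb(500.0), 1))       # 210.3
token = rectangular_band_noise(500.0, 100_000, 0.050, rng)
```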

Interaural correlation was controlled using the well-known “two-independent-noise-generator” method (Jeffress and Robinson 1962). Briefly, two independent noise samples were generated, and one of them was presented to the left ear. The signal presented to the right ear was the sum of the left-ear signal and the other independent noise, in the proportion \( r:\sqrt{1 - r^{2}} \) (e.g., Culling et al. 2001, Eq. A1). We wished to exclude stimulus-induced variability from measurements of the interaural correlation function so that we could measure the effect of intrinsic neural variability. This was achieved in two different ways. In the early experiments, a completely new pair of noise samples was generated on each trial for the next value required in the interaural correlation function. The correlation between the resulting left- and right-ear signals was then measured, and the neural response to this stimulus pair was assigned to the 0.1-wide histogram bin containing the measured, rather than the expected, correlation. Nominally, 50 repeats per correlation step were obtained; because of this rebinning, however, the actual number of repeats per correlation step varied around 50.
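The first (rebinning) method can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: white Gaussian samples stand in for the band-limited noise, the token length of 5000 samples is hypothetical, and each 0.1-wide bin is assumed to be centered on one of the nominal correlation steps. Responses are tallied by the measured correlation of each token, so the count per bin only approximates the nominal 50 repeats.

```python
import numpy as np

def two_generator_pair(r, n_samples, rng):
    """Left = n1; right = r*n1 + sqrt(1 - r^2)*n2, giving an expected correlation of r."""
    n1 = rng.standard_normal(n_samples)
    n2 = rng.standard_normal(n_samples)
    return n1, r * n1 + np.sqrt(1.0 - r ** 2) * n2

def measured_correlation(left, right):
    """Normalized zero-lag interaural correlation of one token."""
    return np.dot(left, right) / np.sqrt(np.dot(left, left) * np.dot(right, right))

def rebin_index(measured, step=0.1):
    """Index k of the step-wide bin centered on k*step (assumed bin layout)."""
    k_max = round(1 / step)
    return int(np.clip(np.round(measured / step), -k_max, k_max))

rng = np.random.default_rng(2)
nominal = np.arange(-10, 11) / 10.0                 # -1.0 .. +1.0 in 0.1 steps
counts = {k: 0 for k in range(-10, 11)}             # responses per measured-correlation bin
for _ in range(50):                                 # 50 nominal repeats
    for r in rng.permutation(nominal):              # pseudorandom order within a repeat
        left, right = two_generator_pair(r, 5000, rng)
        counts[rebin_index(measured_correlation(left, right))] += 1
```

In a real experiment the spike count recorded on each trial would be stored under the same bin index rather than merely counted.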

In later experiments the problem of variability in sample correlation was addressed in a different manner. The problem arises because there is usually a small degree of correlation between the original two noise samples that are added together to produce the tokens used in the experiment. We used the Gram–Schmidt procedure (Culling et al. 2001) to remove this correlation so that adding the original noise samples in the proportion \(r:{\sqrt {{\left( {1 - r^{2} } \right)}} }\) produced exactly the expected interaural correlation. For each neuron, 10 tokens of each correlation were generated. Responses to between 20 and 50 repeats of each token were obtained. The data from each noise token were kept separate to allow rICFs to be calculated separately for each token; however, in this paper, the data are pooled together to reduce the variance of the data points. The variability attributable to different tokens will be considered in a future paper.
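The Gram–Schmidt step can be sketched as follows: remove from the second noise its projection onto the first, rescale it to the same power, and only then mix in the proportion r : √(1 − r²). This NumPy sketch assumes white Gaussian samples, a hypothetical token length of 5000 samples, and correlation measured as the normalized zero-lag product; it illustrates the idea in Culling et al. (2001) rather than reproducing their code. Because the two components are orthogonal and of equal power, the measured correlation equals r to rounding error.

```python
import numpy as np

def exact_correlation_pair(r, n_samples, rng):
    """Noise pair whose normalized zero-lag correlation equals r exactly."""
    n1 = rng.standard_normal(n_samples)
    n2 = rng.standard_normal(n_samples)
    n2 = n2 - (np.dot(n2, n1) / np.dot(n1, n1)) * n1   # Gram-Schmidt: orthogonal to n1
    n2 *= np.sqrt(np.dot(n1, n1) / np.dot(n2, n2))     # equal power to n1
    return n1, r * n1 + np.sqrt(1.0 - r ** 2) * n2

def measured_correlation(left, right):
    return np.dot(left, right) / np.sqrt(np.dot(left, left) * np.dot(right, right))

rng = np.random.default_rng(3)
# 10 independent tokens per nominal correlation, as in the later experiments
tokens = {r: [exact_correlation_pair(r, 5000, rng) for _ in range(10)]
          for r in np.arange(-10, 11) / 10.0}
worst = max(abs(measured_correlation(*pair) - r)
            for r, pairs in tokens.items() for pair in pairs)
print(worst < 1e-12)   # True
```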

Receiver operating characteristic analysis

Receiver operating characteristic (ROC) analysis (Bradley et al. 1987; Cohn et al. 1975; Green and Swets 1974; Shackleton et al. 2003) was used to determine the smallest change in interaural correlation that a neuron could correctly signal by a change in its firing rate. The details of, and justification for, this procedure are discussed at length by Shackleton et al. (2003); we give a short summary here. ROC analysis allows threshold estimates to be made without assuming any particular distribution of spike counts. This is especially important at very low spike rates, where the distribution differs substantially from the Gaussian distribution assumed in d′ analysis (Green and Swets 1974). The analysis simulates a two-interval forced-choice psychophysical task: the firing rates in the two intervals are compared, and the target is chosen as the interval with the higher firing rate. As shown in Figure 2A, the mean firing rate clearly changes with increasing correlation, so if there were no variability in the response it would be easy to make fine discriminations in interaural correlation based upon firing rate. However, there is also substantial variability in the firing rate, as shown by the variance (light line in Fig. 2A) and by the distributions of firing rates in Figure 2B. The substantial overlap between the firing-rate distributions of even very widely spaced points (Fig. 2B) demonstrates that discrimination based upon firing rate will be correct only on a proportion of trials, and that this proportion increases as the overlap between the distributions decreases. Using ROC analysis (Shackleton et al. 2003), we can calculate the proportion correct as a function of separation from a reference correlation. Figures 2C–E show proportion correct as a function of interaural correlation for reference correlations of −1, 0, and +1, respectively.
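The simulated two-interval task can be sketched by pairing every reference spike count with every target count and scoring how often the target count is higher. This is a minimal NumPy sketch, assuming ties are scored as 50% correct; the rICF, the Poisson spike counts, and the function name are hypothetical. Scored this way, proportion correct equals the area under the ROC curve built from the two count distributions, so no distributional form is assumed.

```python
import numpy as np

def proportion_correct(ref_counts, target_counts):
    """Simulated 2IFC score: choose the interval with more spikes, split ties 50/50.

    Every reference count is paired with every target count, which is equivalent
    to the area under the ROC curve built from the two count distributions.
    """
    ref = np.asarray(ref_counts)[:, None]
    tgt = np.asarray(target_counts)[None, :]
    return float(np.mean(tgt > ref) + 0.5 * np.mean(tgt == ref))

rng = np.random.default_rng(4)
correlations = np.arange(-10, 11) / 10.0
mean_counts = 4 + 3 * (correlations + 1)              # hypothetical rising rICF
counts = [rng.poisson(m, 500) for m in mean_counts]   # 500 presentations per point
neurometric = np.array([proportion_correct(counts[0], c) for c in counts])  # reference -1
```

The resulting `neurometric` array is the analogue of the functions in Figures 2C–E, starting at exactly 50% at the reference itself.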
The 75% correct threshold is defined as the difference between the reference correlation and the correlation at which 75% correct is first reached (found by interpolation). The upward-pointing triangles in Figure 2F show the 75% correct thresholds as a function of reference interaural correlation. These clearly decrease as the reference approaches +1, because the slope of the rICF increases closer to +1 while the variance is approximately constant. When the firing rate to the target is below that to the reference, choosing the target on the basis of an increased firing rate gives consistently incorrect responses. This is indicated by the 25% thresholds in Figures 2C–E and by the downward-pointing triangles in Figure 2F; a 25% threshold is exactly equivalent to a 75% correct threshold based on judging a decrease in firing rate. In this paper, we summarize the variation in threshold with reference by quoting thresholds at reference correlations of −1, 0, and +1. Discrimination thresholds away from references of +1 and −1 were obtained directly, and the discrimination threshold around 0 was calculated as the mean of the 25% and 75% thresholds (open circles in Fig. 2F).
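The threshold rule can be sketched as a search outward from the reference for the first crossing of the criterion, with linear interpolation between the two bracketing correlation steps. This NumPy sketch assumes that the neurometric values are supplied in order of increasing distance from the reference and that pc equals 0.5 at the reference; the function names and the idealized straight-line neurometric functions are hypothetical. For a falling rICF, the 75% and 25% crossings simply occur on the opposite sides of the reference.

```python
import numpy as np

def threshold_from_neurometric(correlations, pc, reference, criterion=0.75):
    """Distance from the reference to where pc first crosses the criterion.

    correlations and pc must be ordered moving away from the reference.
    criterion > 0.5 looks for a rising crossing (e.g. 75%), criterion < 0.5 for a
    falling one (e.g. 25%). Linear interpolation; NaN if never reached.
    """
    for i in range(1, len(pc)):
        lo, hi = pc[i - 1], pc[i]
        if criterion > 0.5:
            crossed = lo < criterion <= hi
        else:
            crossed = lo > criterion >= hi
        if crossed:
            frac = (criterion - lo) / (hi - lo)
            c = correlations[i - 1] + frac * (correlations[i] - correlations[i - 1])
            return abs(c - reference)
    return np.nan

def threshold_at_zero(t75, t25):
    """Threshold reported at a reference of 0: mean of the 75% and 25% thresholds."""
    return 0.5 * (t75 + t25)

# Idealized neurometric functions measured outward from a reference of 0
up = np.linspace(0, 1, 11)
down = np.linspace(0, -1, 11)
t75 = threshold_from_neurometric(up, 0.5 + 0.5 * up, 0.0, 0.75)
t25 = threshold_from_neurometric(down, 0.5 + 0.5 * down, 0.0, 0.25)
```

Returning NaN when the criterion is never reached mirrors the ceiling on thresholds: a reference of ±1 can yield at most 2, and a reference of 0 at most 1.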

Fig. 2

Illustration of receiver operating characteristic (ROC) analysis. A Rate vs. interaural correlation function (rICF), shown as circles joined by a solid line. The variance of the distribution of spike counts is shown as a light line. The rICF was collected using the second method of generation described in the section “Rate vs. interaural correlation functions.” Fifty repeats of each of 10 tokens were collected and the results pooled across tokens, so each point is the result of 500 measures. B rICF (joined symbols) with the distribution of the number of times each spike count occurred superimposed at the corresponding filled symbols. The reference correlations −1, 0, and +1 used in panels C–E are encircled. C “Neurometric” function showing predicted percentage correct in a simulated 2IFC experiment (see the section “Receiver operating characteristic analysis” for details). The large circle shows the reference correlation of −1 and the large, upward triangle shows the correlation closest to the 75% threshold. D Same as in C, but for a reference correlation of 0. The large downward triangle shows the correlation closest to the 25% threshold. E Same as in D, but for a reference correlation of +1. F Interaural correlation thresholds as a function of reference correlation. Upward-pointing triangles are the 75% correct thresholds and downward-pointing triangles are the 25% correct thresholds. Circles show the threshold values reported in subsequent figures at references of −1, 0, and +1. The reported threshold at 0 is the mean of the 25% and 75% correct thresholds.

Classification of rICF functions

Curves were fitted to the rICFs using the Marquardt–Levenberg algorithm implemented in the SigmaPlot™ 8.0 (SPSS Inc., Chicago, IL) plotting package. This algorithm iteratively refines its parameter estimates to minimize the sum of squared differences between the data and the fitted function. Following Albeck and Konishi (1995), we fitted linear \( (y(x) = a + bx) \), parabolic \( (y(x) = a + b(1 + x)^{2}) \), and ramp \( (y(x) = a + bx \text{ if } a + bx > 0;\ 0 \text{ otherwise}) \) functions to the data. We did not normalize the functions first because this makes no difference to the accuracy of fit. We also fitted

$$ \begin{aligned} &\text{power} && y(x) = a + b\left(\frac{1 + x}{2}\right)^{p} \\ &\text{and negative power} && y(x) = a + b\left(\frac{1 - x}{2}\right)^{p}\ \text{functions.} \end{aligned} $$

The power functions were fitted in both positive and negative forms to allow for functions that were steeper around correlations of +1 and −1, respectively (e.g., Figs. 3Ac and 3Ad, respectively). We also fitted a negative parabolic function for the same reason. Albeck and Konishi (1995) did not find functions that were steeper at −1, whereas we found several. The goodness of fit of the curves was compared using the sum of squared residuals.
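SigmaPlot's Marquardt–Levenberg routine is not reproduced here. As a swapped-in, dependency-light alternative, this NumPy sketch exploits the fact that, for a fixed power p, the best magnitude and offset (a, b) follow from ordinary linear least squares, so only p needs to be searched (here on a logarithmic grid). It assumes the negative power function uses (1 − x)/2 in place of (1 + x)/2; the grid limits, function name, and synthetic rICF are hypothetical. The linear and parabolic fits are special cases with p fixed at 1 and 2, so all candidate shapes can be ranked by the same sum of squared residuals.

```python
import numpy as np

def fit_power(x, y, negative=False, p_grid=None):
    """Least-squares fit of y = a + b*u**p with u = (1 + x)/2, or (1 - x)/2 if negative.

    For a fixed p the best (a, b) is an ordinary linear least-squares problem, so
    only p is searched, on a logarithmic grid. Returns (a, b, p, SSR).
    """
    if p_grid is None:
        p_grid = np.logspace(-2, 2, 2001)           # p from 0.01 to 100
    u = (1 - x) / 2 if negative else (1 + x) / 2
    best = (np.nan, np.nan, np.nan, np.inf)
    for p in p_grid:
        design = np.column_stack([np.ones_like(u), u ** p])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        ssr = float(np.sum((y - design @ coef) ** 2))
        if ssr < best[3]:
            best = (coef[0], coef[1], p, ssr)
    return best

x = np.arange(-10, 11) / 10.0                        # correlation steps of an rICF
y = 2.0 + 10.0 * ((1 + x) / 2) ** 3                  # synthetic rICF, steepest near +1
a, b, p, pos_ssr = fit_power(x, y)
*_, neg_ssr = fit_power(x, y, negative=True)
```

For this synthetic rICF the positive power form recovers p close to 3 and leaves a much smaller residual than the negative form, which is how the better-fitting shape would be chosen.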

Fig. 3

Example analyses for: A a peak neuron with a CF of 127 Hz and a best phase of 0.14 cycles; “compensation” delay was 790 μs (0.1 cycles). B a trough neuron with a CF of 412 Hz and a best phase of 0.35 cycles; “compensation” delay was −850 μs (−0.15 cycles). C an asymmetrical neuron with a CF of 624 Hz and a best phase of 0.18 cycles; “compensation” delay was 160 μs (0.1 cycles). In panels a–f, the error bars show the standard error of the mean. (a, c, d, f) Rate vs. interaural correlation functions (rICFs): the number of spikes elicited in the 80 ms after stimulus onset as a function of interaural correlation. The light line shows the variance of the spike count distribution, and the thick line shows the fitted power function. The large symbol shows the stimulus condition that is equivalent to a condition within the delay function (b, e). The combinations of stimulus bandwidth and noise delay are: (a) broadband, zero noise delay; (c) broadband, “compensation” noise delay; (d) narrowband, zero noise delay; (f) narrowband, “compensation” noise delay. b Broadband noise delay function. The large circle emphasizes the zero-ITD condition, which can be compared with the circled condition in subpanel (a); the large triangle shows the “compensation” delay, which can be compared with the emphasized condition in subpanel (c). e Tone delay function; large symbols as in (b). g Interaural correlation thresholds as a function of reference correlation for the four conditions shown in panels a, c, d, f. Symbols match those in the individual panels: open circles = broadband, zero noise delay; open triangles = broadband, “compensation” noise delay; filled circles = narrowband, zero noise delay; filled triangles = narrowband, “compensation” noise delay. Thresholds shown are the average of the 25% and 75% correct thresholds. h Power and normalized magnitude for the power-function fits shown in panels a, c, d, f (compare with Figure 11); symbols as in (g). The fit in Ad, and the corresponding solid circle in Ah, use the negative power function; all other fits are positive power functions.

Results

Rate vs. interaural correlation functions and rate vs. ITD functions

Examples of the responses of three typical neurons are shown in Figure 3: a peak neuron in Figure 3A, a trough neuron in Figure 3B, and an asymmetrical neuron in Figure 3C. Broadband noise delay functions are shown in subpanels b. These show the expected features and are well damped away from the central feature that defines the neuron type. The large circle marks zero ITD, the ITD used to collect the rate vs. interaural correlation functions (rICFs) on the left. The large triangle marks the “compensation” ITD used to measure the rICFs on the right. For all neuron types, the rICFs are monotonic and generally have either a constant slope or a steeper slope near a correlation of +1. An exception is shown in Figure 3Ad, which has a shallower slope near a correlation of +1.

The rICFs measured with zero delay and with “compensation” delay are very similar in the examples shown in Figure 3, largely because the “compensation” delay is very close to zero anyway. Figure 4 shows an example of an asymmetrical neuron in which the peak of the delay functions is at 0.3 cycles. In this case, the zero-ITD rICFs are reversed in slope and have their steepest slope at a correlation of −1. This can be understood by remembering that a correlation of −1 corresponds to an inversion of the waveform. For pure tones this is exactly equivalent to a delay of 0.5 cycles, and for a sufficiently narrow band of noise (such as the bandwidth of low-CF auditory neurons) it is approximately equivalent to a delay of 0.5 cycles of CF. Because the rICFs are monotonic, with their extremes at correlations of ±1, we expect them to be bounded at a correlation of +1 by the value of the delay function at the delay at which the rICF was measured (i.e., the large symbols in Figs. 3 and 4), and at −1 by the value of the delay function 0.5 cycles away from this. We thus expect “compensation”-delay rICFs always to rise monotonically for neurons where the “compensation” delay is set at the peak, to fall monotonically where it is set at the trough, and to have the maximum possible range of firing rates between extremes. Zero-delay rICFs would be expected to be considerably more variable in slope and to have a smaller range. Although the example shown in Figure 4 is rather extreme, insofar as the slope of the rICF actually changes sign, we recorded from many neurons whose zero-delay rICF was nearly flat even though the delay function and the “compensation”-delay rICF were well modulated, because the responses to signals with zero and 0.5 cycles of CF delay were nearly the same.
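The claim that inverting a narrowband noise is approximately equivalent to delaying it by half a cycle of CF can be checked numerically. This NumPy sketch assumes FFT-synthesized Gaussian noise, a hypothetical 500-Hz CF with a band of one guinea pig ERB (about 210 Hz), and 1 s of noise to keep sampling variability small; it correlates the inverted token with a copy delayed by 0.5 cycle of CF, for the narrowband and for the broadband (50–5000 Hz) stimulus.

```python
import numpy as np

def band_noise(lo_hz, hi_hz, fs, n, rng):
    """Gaussian noise with a flat spectrum between lo_hz and hi_hz (FFT synthesis)."""
    freqs = np.fft.rfftfreq(n, 1 / fs)
    spec = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
    spec[(freqs < lo_hz) | (freqs > hi_hz)] = 0
    return np.fft.irfft(spec, n=n)

def corr(a, b):
    """Normalized zero-lag correlation."""
    return np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))

fs, n, cf = 20_000, 20_000, 500.0            # 1 s of noise; hypothetical 500-Hz CF
half_cycle = int(round(fs / (2 * cf)))       # 0.5 cycle of CF = 1 ms = 20 samples
erb = 6.477 * cf ** 0.56                     # guinea pig ERB at this CF
rng = np.random.default_rng(5)
narrow = band_noise(cf - erb / 2, cf + erb / 2, fs, n, rng)
broad = band_noise(50.0, 5000.0, fs, n, rng)
# Compare the inverted waveform (correlation -1) with a half-cycle delay.
# np.roll is an exact delay because FFT-synthesized noise is periodic over n samples.
print(corr(-narrow, np.roll(narrow, half_cycle)), corr(-broad, np.roll(broad, half_cycle)))
```

For an idealized flat band of this width the narrowband value is about 0.93, whereas the broadband value is close to 0, which is why the half-cycle argument applies only to narrowband (or cochlear-filtered) signals.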

Fig. 4

Example analysis for an asymmetrical neuron with a CF of 354 Hz and a best phase of 0.34 cycles, showing the effect of a large best phase on the slope of the zero-delay rICFs. Symbols and format of subpanels a–f are as in Figure 3.

Figures 3 and 4 reflect the general trend that rICFs measured with broadband and narrowband noise were normally very similar. To avoid needless repetition, we therefore consider mostly the narrowband results in the rest of the paper.

Interaural-correlation discrimination thresholds

Interaural correlation thresholds were estimated from the rICFs using ROC analysis (see Methods and stimuli). Correlation thresholds as a function of reference correlation are shown in subpanel g of Figure 3 for each of the four rICF conditions of subpanels a–f (identified by matching symbols). Generally, threshold decreases with increasing correlation, which can be due to an increase in slope, a decrease in variance relative to firing rate, or both. To simplify presentation, we present thresholds only at reference correlations of −1, 0, and +1 in the rest of this paper.

The correlation discrimination thresholds for all neurons in our sample as a function of characteristic frequency are shown in Figure 5, with different neuron types shown by different symbols. In each panel, there are three subpanels showing correlation discrimination thresholds away from references of −1, 0, and +1. There is a great deal of spread in the data, with thresholds spanning the entire range. The average thresholds (dashed lines) are much worse than the comparable human psychophysical thresholds (thick, solid lines). The average thresholds for reference correlation of −1 are highest, with those for reference correlations of 0 being the lowest, but only just lower than those for a reference of +1. Average thresholds are slightly lower for conditions with “compensation” delay compared with zero delay. The lowest thresholds represent the best performance achievable by individual cells within the population. These are lowest for a reference correlation of +1, with those for a reference of −1 being the highest. The lowest thresholds occur for the “compensation” delay condition with a reference of +1 (Fig. 5B). These thresholds are worse than those of humans, but only by a factor of about 3. There appears to be little difference in thresholds for different neuron types. Thresholds also do not appear to vary as a function of frequency, unlike in human psychophysics (Culling et al. 2001). However, the bottom of the distribution is not very well defined, so it is possible that there might be a trend in the lower limit that we have not revealed.

Fig. 5

Correlation discrimination thresholds measured from the rICFs. A Narrowband zero-delay conditions. B Narrowband “compensation” delay conditions. Within each panel from left to right are shown the thresholds for discriminations away from correlations of −1, 0, and +1, respectively. Within each subpanel the thresholds are plotted as a function of neuron characteristic frequency. Symbols indicate the neuron type: open, upward triangles represent peak neurons (n = 34 and 41 for panels A and B, respectively); gray circles asymmetrical neurons (n = 8, 12); solid, downward triangles trough neurons (n = 13, 10). The dashed lines show the mean threshold averaged across all neuron types. For comparison, thresholds from a recent psychophysical study are shown: the thick solid lines show the human interaural correlation threshold interpolated to a duration of 50 ms with a bandwidth of 100 Hz, centred on 500 Hz, at 70 dB SPL (Bernstein and Trahiotis, 1997); threshold for a reference correlation of −1 was not measured. For reference correlations of ±1, the maximum possible threshold is 2, which corresponds to the first discriminable point being at the opposite end of the function. The maximum threshold for a reference of 0 is only 1, because thresholds were measured from 0 towards either +1 or −1.

Large thresholds could be a result of either a high variance in the spike rate or a small change in spike rate as correlation is changed. To determine which factor is the more important, we plotted thresholds with a reference of +1 as a function of the standard deviation of the normalized spike count at a correlation of +1 (Fig. 6A, B). At all standard deviations both high and low thresholds were found, and there was no systematic trend in the data as illustrated by the very low correlation coefficients. It therefore appears, somewhat surprisingly, that the primary determinant of high thresholds is not high variability.

Fig. 6

A, B Comparison of correlation threshold at a reference correlation of +1 and the standard deviation of normalized rICF at a correlation of +1. C, D Comparison of correlation threshold at a reference correlation of +1 and the inverse of the slope of the rICF at a correlation of +1 derived from the product of the magnitude and power of the power curve fitted to the rICF normalized by the maximum firing rate (see the section “Interaural correlation discrimination thresholds” for further details). A, C Results for zero delay (n = 28, 8, and 9 for peak, asymmetrical, and trough neurons, respectively). B, D Results for “compensation” delay (n = 40, 11, and 8 for peak, asymmetrical, and trough neurons, respectively). The solid lines are linear regression fits and the correlation coefficients are shown in each panel. Symbols are as in Figure 5.

We estimated the slope of the correlation functions at a correlation of +1 from the fitted normalized power functions. The slope of a power function at its maximum is simply the product of the magnitude and power (i.e., bp). We would expect thresholds to be inversely related to this slope because steep slopes produce large changes in firing rate for a small change in correlation and therefore lower thresholds for the same criterion change in firing rate. The thresholds at +1 are plotted as a function of the inverse of the slope of the fitted power function at +1 for each neuron in Figures 6C, D. Although there is a lot of scatter, there is a clear trend for thresholds to decrease as the slope increases (i.e., for 1/slope to decrease). This is especially apparent for the stimulus with a “compensation” delay.
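A quick check of the slope expression, as a sketch with made-up parameters: for y(x′) = a + b x′^p the derivative at x′ = 1 is b·p, which the finite difference below confirms. The function names and parameter values are hypothetical. Because x′ = (1 + x)/2, a step in x′ corresponds to twice that step in correlation; since only relative slopes matter for Figures 6C, D, this constant factor does not change the trend.

```python
import numpy as np


def power_fn(xp, a, b, p):
    """Power function y(x') = a + b * x'**p, with x' = (1 + x) / 2."""
    return a + b * np.power(xp, p)


def slope_at_plus_one(b, p):
    """Analytic slope dy/dx' at x' = 1 (a correlation of +1): b * p."""
    return b * p


def predicted_threshold(delta_rate, b, p):
    """Linear-approximation threshold on the correlation axis.

    delta_rate is a hypothetical criterion change in normalized rate. Because
    x' = (1 + x) / 2, a step dx' corresponds to a correlation step dx = 2 dx'.
    """
    return 2.0 * delta_rate / slope_at_plus_one(b, p)


# Hypothetical normalized fit parameters (not data from this study)
a, b, p = 0.4, 0.6, 2.5
h = 1e-6
# Backward difference, since the function is only defined for x' <= 1
numeric_slope = (power_fn(1.0, a, b, p) - power_fn(1.0 - h, a, b, p)) / h
```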

As pointed out in the previous section, the zero-delay rICFs do not constitute a uniform population. We would expect the discrimination results to be critically influenced by the BP of the neurons, because this affects the overall slope of the rICF. The minimum correlation thresholds for zero delay are clearly lowest for peak and asymmetrical neurons when the BP is approximately 0 to +0.1 cycles (Fig. 7A), whereas the minimum thresholds with a “compensation” delay do not vary as a function of BP. A comparison between the lowest thresholds obtained with zero and “compensation” delays in Figures 5 and 7 is not strictly fair, because the two conditions were tested in partly different sets of neurons, so sampling could be an issue. A comparison of thresholds in the neurons where both measures were obtained is shown in Figure 8 for both narrowband and broadband stimuli. If, as we have argued, the zero-delay rICFs are less highly modulated, then we would also expect them to have shallower slopes at all reference interaural correlations, and therefore higher thresholds (i.e., the neuron would be plotted to the upper left of the diagonal line of equality in Fig. 8). Most neurons do, in fact, lie on or above the line of equality, with a zero-delay threshold equal to or higher than the “compensation” delay threshold. The lower zero-delay thresholds in the remaining neurons may be attributable to random variation, different local slopes around the reference correlations, or lower variance because of lower firing rates.
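Why BP should matter for zero-delay rICFs can be sketched with the linear cross-correlation model from the Introduction. Assume (our assumptions, not measurements) narrowband noise centred on the neuron's characteristic frequency \( f_c \), whose normalized autocorrelation satisfies \( R(\tau) \approx \cos(2\pi f_c \tau) \) over the internal delays of interest, an internal delay \( \tau = \mathrm{BP}/f_c \), and a stimulus with interaural correlation x and no imposed ITD. The expected correlator output is then

$$ \mathrm{E}\left[\rho(t,\tau)\right] \propto x\,R(\tau) \approx x \cos\left(2\pi f_c \tau\right) = x \cos\left(2\pi\,\mathrm{BP}\right), $$

so the slope of the linear-model rICF is scaled by \( \cos(2\pi\,\mathrm{BP}) \): about 0.81 for BP = 0.1 cycles and 0 for BP = 0.25 cycles. Imposing an ITD equal to the internal delay (the “compensation” delay) moves the evaluation point to \( R(0) = 1 \), restoring the full slope. This sketch ignores the nonlinearities that shape the measured rICFs, so it indicates only the direction of the effect.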

Fig. 7

Same as in Figure 5, except that results are plotted as a function of neuron best phase.

Fig. 8

Comparison of correlation thresholds measured in the same neurons for zero-delay conditions (ordinate) and for “compensation” delay conditions (abscissa) for A narrowband and B broadband stimuli. From left to right, subpanels show thresholds for −1, 0, and +1 reference correlations, respectively. Diagonal line represents equality. Symbols are as in Figure 5, except that crosses mark thresholds of neurons where the “compensation” delay was zero. The numbers of neurons for which the “compensation” delay was zero in each panel were: A 11, 8, 12; B 7, 5, 7 (from left to right). The numbers of peak neurons shown are: A 6, 4, 7; B 3, 2, 3. Trough neurons: A 4, 3, 5; B 3, 4, 4. Asymmetrical neurons: A 5, 6, 7; B 4, 2, 6.

In many neurons we were able to obtain correlation thresholds for both broadband and narrowband noise. These thresholds are compared in Figure 9. Thresholds in the two conditions are clearly correlated, with the broadband thresholds tending to be higher, as expected from the greater stimulus-induced variation in correlation.

Fig. 9

Comparison of correlation thresholds measured in the same neurons for broadband stimuli (ordinate) and narrowband stimuli (abscissa) for A zero-delay conditions and B “compensation” delay conditions. From left to right, subpanels show thresholds for −1, 0, and +1 reference correlations, respectively. Diagonal line represents equality. Symbols are as in Figure 5. The numbers of peak neurons shown are: A 13, 21, 15; B 8, 8, 9 (from left to right). Trough neurons: A 5, 7, 6; B 7, 6, 7. Asymmetrical neurons: A 5, 8, 7; B 5, 4, 5.

Shapes of the rate vs. interaural correlation functions

The shape of the rICF curves is of great theoretical importance. If, as Albeck and Konishi (1995) suggest, there are different forms of rICF functions, then it may be necessary to consider that different processes give rise to the different functional forms. However, if all functions can be described by a single function, albeit with varying parameters, then only a single basic mechanism needs to be considered. Following Albeck and Konishi (1995), we fitted linear, parabolic, and ramp functions to the data; however, we were not satisfied that these functions adequately described the data. The classification gave little information about how curved the functions actually were, and we were concerned that some functions that sharply turned up near one end were misclassified as linear when they clearly were not. For these reasons, we fitted power functions to the rICFs and found that they provided the best fit for all rICFs. This might be expected, because the power functions have three parameters, whereas those of Albeck and Konishi (1995) only had two. But even taking account of this extra parameter using an F-test for additional terms (Bevington and Robinson 2003, pp. 207–208), over half of the functions were better fit by the power function. The fit of these functions was generally very good, as can be seen in Figure 3, where the thick solid lines in the rICFs are the best fitting power functions.
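The comparison between a two-parameter linear fit and the three-parameter power function can be sketched with a standard F-test for an additional term in nested least-squares models (the linear function is the power function with p = 1). The synthetic rICF, the unweighted least-squares fitting with SciPy's `curve_fit`, and the parameter bounds are illustrative assumptions, not the fitting procedure used in this study.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import f as f_dist


def power_fn(xp, a, b, p):
    return a + b * np.power(xp, p)


def linear_fn(xp, a, b):
    return a + b * xp


def f_test_extra_term(y, yhat_simple, yhat_full, k_simple, k_full):
    """F = [(RSS_simple - RSS_full) / (k_full - k_simple)] / [RSS_full / (n - k_full)]."""
    n = len(y)
    rss_simple = np.sum((y - yhat_simple) ** 2)
    rss_full = np.sum((y - yhat_full) ** 2)
    df_num, df_den = k_full - k_simple, n - k_full
    F = ((rss_simple - rss_full) / df_num) / (rss_full / df_den)
    return F, f_dist.sf(F, df_num, df_den)  # upper-tail probability


# Synthetic rICF (hypothetical): 11 correlations from -1 to +1, true power 3
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 11)
xp = (1.0 + x) / 2.0
y = power_fn(xp, 10.0, 40.0, 3.0) + rng.normal(0.0, 0.5, x.size)

lin_params, _ = curve_fit(linear_fn, xp, y)
pow_params, _ = curve_fit(
    power_fn, xp, y,
    p0=[y.min(), y.max() - y.min(), 1.0],
    bounds=([-np.inf, -np.inf, 0.05], [np.inf, np.inf, 20.0]),
)
F, p_value = f_test_extra_term(
    y, linear_fn(xp, *lin_params), power_fn(xp, *pow_params), 2, 3
)
print(f"fitted power = {pow_params[2]:.2f}, F = {F:.1f}, p = {p_value:.2g}")
```

With a strongly curved synthetic rICF the F statistic is large and the extra parameter is justified; for a nearly straight rICF, F would be small and the two-parameter function would be retained.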

The fact that all rICFs can be fit by power functions allows a powerful summary of the population. If the curves are normalized, then the shape of every rICF can be summarized by only the two parameters plotted in Figure 10, which also shows histograms of the distribution of powers. To provide a feel for what the parameters mean, subpanel h in Figure 3 shows the parameters for the example neurons in the same form as Figure 10. The majority of powers are less than 2, indicating that most of the functions are less curved than a parabola, but a substantial number of functions have a high power and thus curve very sharply. There are also both high and low powers at all magnitudes, showing that rICFs with large magnitudes vary from straight to very highly curved functions. Highly curved functions are not restricted to minimally modulated functions.

Fig. 10

The power (p) and normalized magnitude (m) of power curves, \( y(x') = a + b\,x'^{p} \), fitted to the unnormalized rICFs for A zero-delay conditions and B “compensation” delay conditions. The distributions to the side of each plot show histograms of the fitted powers. Symbols are as in Figure 5, but with the shading altered: black symbols show curves fitted with the ordinary power function (x′ = (1 + x)/2), which have their steepest section at a correlation of +1, whereas white symbols show curves fitted with negative power functions (x′ = (1 − x)/2), which have their steepest section at a correlation of −1. The functions were normalized by the maximum of the fitted function, which depended on the sign of b. If b is positive, then the maximum of the function occurs at x′ = 1 and is a + b, so \( y_{+}(x') = \frac{a}{a + b} + \frac{b}{a + b}\,x'^{p} = (1 - m) + m\,x'^{p} \), where m = b/(a + b). If b is negative, then the maximum is at x′ = 0 and is a, so \( y_{-}(x') = \frac{a}{a} + \frac{b}{a}\,x'^{p} = 1 + m\,x'^{p} \), where m = b/a. In both equations m represents the magnitude of the function, i.e., the amount by which it changes from correlations of −1 to +1, and the baseline value is (1 − m) if m is positive and 1 if m is negative. If m is positive, then the function increases with increasing x′, whereas if m is negative, then it decreases with increasing x′. In panel A, there are 44 neurons for which a positive power function was the better fit, comprising 25 peak neurons, 7 asymmetrical neurons, and 8 trough neurons. There were 14 neurons for which a negative power function was the better fit, comprising 9 peak neurons, 1 asymmetrical neuron, and 4 trough neurons. In panel B, there are 53 neurons for which a positive power function was the better fit, comprising 34 peak neurons, 9 asymmetrical neurons, and 9 trough neurons. There were 11 neurons for which a negative power function was the better fit, comprising 7 peak neurons, 3 asymmetrical neurons, and 1 trough neuron.
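The normalization described in the Figure 10 caption can be written in a few lines of Python. This is a minimal sketch; the function names are hypothetical and any parameter values passed to them are made up.

```python
def to_xprime(x, negative=False):
    """Map correlation x in [-1, 1] to x' in [0, 1]: (1 + x)/2, or (1 - x)/2
    for the negative power function."""
    return (1.0 - x) / 2.0 if negative else (1.0 + x) / 2.0


def normalized_magnitude(a, b):
    """m = b/(a + b) if b > 0 (maximum a + b at x' = 1);
    m = b/a if b <= 0 (maximum a at x' = 0)."""
    return b / (a + b) if b > 0 else b / a


def normalized_curve(xp, a, b, p):
    """(1 - m) + m x'**p if m > 0, else 1 + m x'**p; the peak value is 1."""
    m = normalized_magnitude(a, b)
    baseline = 1.0 - m if m > 0 else 1.0
    return baseline + m * xp ** p
```

For example, a fit with a = 10 and b = 30 gives m = 0.75, and dividing a + b x′^p by its maximum of 40 gives the same curve as `normalized_curve`.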

Modeling rICFs

If processing at all stages prior to the coincidence detector were linear, then the rICFs would be straight lines because, as discussed in the Introduction, the coincidence detector instantiates the term under the integral sign in cross-correlation. The fact that the rICFs are best fit by power functions indicates that there are nonlinearities prior to the coincidence detector. Some of this nonlinearity is cochlear in origin, and some arises beyond the auditory nerve. We modeled a simple coincidence detector using a Matlab toolbox (Akeroyd 2004). Briefly, for each ear, a narrowband noise was generated at a center frequency of 500 Hz and passed through a gammatone filter (with bandwidth equal to the guinea pig's behavioral and neural bandwidth; Evans 2001; Evans et al. 1992) and a transduction stage to simulate the action of the cochlea. The transduction was either linear, half-wave rectification, or a popular model of the hair cell (Meddis et al. 1990) with either a high spontaneous rate or a low spontaneous rate. There were two similar models of the coincidence detector. In the first, the coincidence detector was modeled as the product of the outputs of the left and right transduction stages, averaged over the duration of the stimulus, simulating a single input from each side. In the second, the coincidence detector was modeled as the product of the squared outputs of the left and right transduction stages, simulating two independent inputs from each side (because the probability of two spikes occurring at the same time on independent inputs is equal to the product of the firing probabilities on the individual inputs). For each condition, an rICF was constructed in the same manner as in the first part of the experiment, and a power function was fitted using the Matlab nlinfit function. Ten rICFs were generated per condition to allow the standard error of the fitted parameters to be determined. The powers of the fitted rICFs are shown in Figure 11.
As expected, the rICFs with linear transduction are fitted by a power of 1. The rICFs with half-wave rectified transduction are fitted by a power of 2, consistent with Albeck and Konishi's (1995) assertion. More interestingly, the rICFs using the hair cell simulation are best fit by a power between 1.1 and 1.8. To obtain a power greater than 2, the inputs to the coincidence detector need to be squared. These results indicate that the nonlinearity of the majority of rICFs is explicable in terms of the monaural transduction stages; however, some of the rICFs with powers greater than 2 may require multiple inputs to the coincidence detector. It should be noted, however, that additional nonlinearities between the cochlea and the coincidence detector, or between the coincidence detector and the IC, could produce similar effects. For example, if the outputs from two similar coincidence detectors were fed into a further coincidence detector (cf. Stern and Trahiotis 1992), then the output would be effectively squared, and the power would be doubled.
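The modeling logic can be sketched in a few lines. This toy version is a simplification under stated assumptions, not the Akeroyd (2004) toolbox model used here: unit-variance Gaussian samples stand in for filtered noise, and there is no gammatone filter, hair-cell model, or spike generation. It is not expected to reproduce the exact powers in Figure 11; it only illustrates that linear transduction yields a power near 1, rectification raises the power, and squaring the inputs (two independent inputs per side) raises it further. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def power_fn(xp, a, b, p):
    return a + b * np.power(xp, p)


def simulated_ricf(transduce, square_inputs=False, n=100_000, seed=0):
    """Mean coincidence-detector output versus interaural correlation."""
    rng = np.random.default_rng(seed)
    n1 = rng.standard_normal(n)
    n2 = rng.standard_normal(n)
    rhos = np.linspace(-1.0, 1.0, 21)
    out = np.empty_like(rhos)
    for i, rho in enumerate(rhos):
        left = transduce(n1)
        right = transduce(rho * n1 + np.sqrt(1.0 - rho ** 2) * n2)
        if square_inputs:
            # Two independent, identical inputs per side: P(both fire) = p**2
            left, right = left ** 2, right ** 2
        out[i] = np.mean(left * right)  # product, averaged over the stimulus
    return rhos, out


def fitted_power(rhos, y):
    """Fit y = a + b x'**p with x' = (1 + rho) / 2 and return p."""
    xp = (1.0 + rhos) / 2.0
    params, _ = curve_fit(
        power_fn, xp, y,
        p0=[y.min(), y.max() - y.min(), 1.0],
        bounds=([-np.inf, -np.inf, 0.05], [np.inf, np.inf, 20.0]),
    )
    return params[2]


def linear(s):
    return s


def half_wave(s):
    return np.maximum(s, 0.0)


p_linear = fitted_power(*simulated_ricf(linear))
p_half = fitted_power(*simulated_ricf(half_wave))
p_half_sq = fitted_power(*simulated_ricf(half_wave, square_inputs=True))
```

The doubling argument in the text is exact only when the baseline a is zero: \( (b\,x'^{p})^{2} = b^{2} x'^{2p} \), whereas \( (a + b\,x'^{p})^{2} = a^{2} + 2ab\,x'^{p} + b^{2} x'^{2p} \) mixes the powers p and 2p.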

Fig. 11

Power of curves fitted to simulated rICFs. The model is described in the section “Shapes of the rate vs. interaural correlation functions.” The abscissa shows the type of peripheral transduction used before input to a coincidence detector (which multiplied together its inputs and summed the result over the duration of the stimulus). Linear: the inputs to the coincidence detector were just the signal filtered by a basilar membrane filtering stage. Half-wave: the same as in linear, except that the signals were half-wave-rectified before input to the coincidence detector. High spont: the same as in linear, except that the stimulus was passed through a simulation of high-spontaneous rate auditory nerve fibers (Meddis et al. 1990). Medium spont: the same as in high spont, except that the simulation was for a medium spontaneous rate fiber. High spont squared: the same as in high spont, except that the auditory nerve output was squared before input to the coincidence detector to simulate two similar inputs from each side. Medium spont squared: the same as in high spont squared, except that the simulation was for a medium spontaneous rate fiber. The error bars show the standard error from 10 repeats.

Discussion

We measured rate vs. interaural correlation functions (rICFs) using narrowband and wideband noise, either with zero delay or with a delay to approximately compensate for internal delay (“compensation” delay). There was little difference in the ensemble results between broadband and narrowband noise, or between zero ITD and “compensation” ITD. Within neurons, the “compensation” delay conditions usually produced a steeper slope than the zero-delay conditions, except for neurons whose best phase was nearly zero. The differences between zero and “compensation” delay are probably not more apparent in the figures because of the wide range of rICF slopes, and because the zero-delay figures include neurons whose best phase is nearly zero, which therefore show the best performance of which those neurons are capable.

The zero delay conditions were included because the human psychophysics has been performed with no ITD imposed upon the stimulus. These functions are not as highly modulated as those measured with the “compensation” delay, and the exact shape of the function and the discrimination threshold depend upon the relationship between the neuron's BP and the shape of the intrinsic rICF. As such, they represent a very mixed bag of results. The lowest thresholds obtained were, predictably, for neurons with BPs near zero. However, because there was such a wide spread in thresholds in all conditions, the average thresholds in the zero delay condition were not, in fact, any worse than those in the “compensation” delay conditions. No effect of BP was found on any of the measurements made using the “compensation” delay. In Figure 7, the lowest thresholds for the zero-delay condition occur for slightly positive best phases. It might be thought that the larger number of neurons with BPs near 0.125 cycles (McAlpine et al. 2001) would increase the chance of finding low thresholds there; however, if that were true, a similar minimum in thresholds would be expected in the “compensation” delay results. If Figure 7 is replotted so that only data from the same neurons are shown, then the overall picture does not change (not shown). Thus the slight offset of minimum thresholds from zero BP in the zero-delay condition cannot be simply a result of sampling. We currently have no explanation for this effect beyond the suggestions that it may be attributable to random variation, different local slopes around the reference correlations, lower variance because of lower firing rates, or changes in neuron responsiveness between recordings.

It would have been preferable to use a measurement of characteristic phase to make the classification into different neuron types (Yin and Kuwada 1983); however, the necessary data to calculate this were only rarely collected, the experimental time being instead used to collect sufficient repetitions in rICFs to permit ROC analysis. In any case, there was no systematic difference found in the sensitivity between the different neuron types.

In estimating the “compensation” delay, direct visual estimates of the peak (or trough) were used rather than BP because the ITD functions were often significantly skewed, resulting in the mean BP, obtained by vector averaging, being somewhat removed from the peak. Although the existence of an actual axonal delay line has recently been disputed (see McAlpine and Grothe 2003; Palmer 2004, for reviews), near CF it should make little difference whether the apparent internal delay is created by an axonal delay or by a phase shift (Brand et al. 2002). The compensation will be most accurate when the neuron is either a pure peak- or a pure trough-neuron; asymmetrical neurons that have a characteristic delay on the slopes of the noise-delay curve (Yin et al. 1986), or composite curve (Yin and Kuwada 1983), are obviously less well matched.
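The difference between the peak and a vector-averaged BP for a skewed function can be illustrated with a made-up phase tuning curve consisting of a main peak plus a shoulder. The curve shape, the rate-weighted circular mean used as the vector average, and the 1000-point phase grid are assumptions for illustration, not the analysis used for these neurons.

```python
import numpy as np

phase = np.arange(1000) / 1000.0  # interaural phase in cycles, [0, 1)


def von_mises(center, kappa=4.0):
    """Periodic bump centred on `center` (cycles)."""
    return np.exp(kappa * np.cos(2 * np.pi * (phase - center)))


# Hypothetical skewed tuning curve: main peak at 0.10 plus a shoulder at 0.25
rate = von_mises(0.10) + 0.5 * von_mises(0.25)

peak_phase = phase[np.argmax(rate)]
# Vector average: rate-weighted mean direction on the unit circle, in cycles
vector = np.sum(rate * np.exp(2j * np.pi * phase))
mean_phase = (np.angle(vector) / (2 * np.pi)) % 1.0
print(f"peak = {peak_phase:.3f} cycles, vector-averaged BP = {mean_phase:.3f} cycles")
```

The shoulder pulls the vector average away from the peak toward the skewed side, which is why a compensation delay based on the vector-averaged BP would miss the peak of such a function.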

Noise stimuli for collecting rICFs were presented at 20 dB above the rate threshold for uncorrelated noise. This level was chosen to be comparable with previous measurements made in this laboratory. This might, at first, seem to be a very low level; however, on average, the overall level for broadband noise was 72 dB SPL and for narrowband noise was 60 dB SPL. The psychophysical results shown in Figure 5 were collected at 70 dB SPL. For the bandwidths we used, psychophysical thresholds are reasonably constant as a function of level (Gabriel and Colburn 1981; Pollack and Trittipoe 1959b), and, in general, binaural psychophysical thresholds do not alter much once the stimulus is more than 10 dB above absolute detection threshold (Durlach and Colburn 1978).

Using a short token of noise gave us control of the correlation as seen by the coincidence detector; however, it does raise some problems. Different narrowband noise tokens can have very different envelopes, giving rise to different firing patterns and rates because, to a first approximation, firing patterns follow the waveform envelope (to be discussed in a future paper). If only a single token of noise were used per point in the rate vs. interaural correlation function (rICF), then variability would be introduced into the rICF because of the token-to-token variability. This could be addressed by using a single pair of noise basis functions and creating all points on the rICF from them; however, unless a procedure such as Gram–Schmidt orthonormalization (Culling et al. 2001) is used, the correlations of the tokens will not be those intended. Even if orthonormalization were used, there would still be a systematic, non-correlation-based shift in the function as the waveform at one ear changed from that at a correlation of +1 to a completely independent one at a correlation of 0 and back to the inverse of the original at a correlation of −1. These difficulties can be overcome by using freshly generated noise on each trial (as in the first part of this experiment) and measuring the actual correlation of the noise tokens, or by using a number of fixed tokens large enough to sample a reasonable variety of waveforms but small enough to keep the experiment duration reasonable. This is the approach we adopted in the second part of the experiment, using 10 tokens.
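The Gram–Schmidt approach mentioned above can be sketched as follows. This is a generic illustration, not the Culling et al. (2001) implementation; the function names, token length, and seeds are arbitrary. Band-limited tokens could be treated the same way, since linear combinations of band-limited basis vectors remain band-limited. As the text notes, exact correlation does not remove the systematic change in the right-ear waveform as the correlation is varied.

```python
import numpy as np


def correlated_pair(n, rho, rng, exact=True):
    """Left/right noise tokens with interaural correlation rho.

    exact=True: a Gram-Schmidt step makes the second basis vector orthogonal
    to the first (both unit-norm) before mixing, so the token correlation is
    exactly rho. exact=False: naive mixing; only the expected correlation is rho.
    """
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    a -= a.mean()
    b -= b.mean()
    if exact:
        a /= np.linalg.norm(a)
        b -= np.dot(b, a) * a  # remove the component of b along a
        b /= np.linalg.norm(b)
    left = a
    right = rho * a + np.sqrt(1.0 - rho ** 2) * b
    return left, right


def token_correlation(left, right):
    """Normalized zero-lag correlation of (zero-mean) tokens."""
    return np.dot(left, right) / np.sqrt(np.dot(left, left) * np.dot(right, right))


rng = np.random.default_rng(0)
naive = token_correlation(*correlated_pair(2048, 0.9, rng, exact=False))
exact = token_correlation(*correlated_pair(2048, 0.9, rng, exact=True))
```

Here `naive` is close to, but generally not equal to, 0.9, whereas `exact` matches 0.9 to rounding error. Measuring `token_correlation` on freshly generated tokens corresponds to the alternative of recording the actual correlation on each trial.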

The shapes of the rICFs were best fitted by power functions, typically with a power between 1 and 2. A simple model with basilar-membrane-like filtering followed by a transduction stage and a cross-correlation stage showed that power functions with powers between 1 and 2 were produced when a popular auditory nerve model (Meddis et al. 1990) was used as the transduction stage. To produce powers greater than 2 required the output of the transduction stage to be squared. This is equivalent to there being two statistically identical, but independent, inputs to each side of the cross-correlation stage. It is unlikely that the exact form of the nonlinearity in the transduction stage is critical: the exponential nonlinearity in the Colburn (1973) auditory nerve-based binaural model produces similarly shaped curves. It is also possible that the effects are produced by other nonlinearities either before or after the MSO.

This paper provides the first systematic measurement of the interaural correlation discrimination thresholds of a large population of IC neurons. Generally, the thresholds are very poor, being much worse than human psychophysical thresholds. Here, as in a previous paper (Skottun et al. 2001), we are drawing conclusions from cross-species comparisons, because it is not possible to do the psychophysics in guinea pigs or the physiology in humans. The most likely discrepancy, consistent with other behavioral data from animals, would be that behavioral thresholds in the guinea pig are worse than in humans. This would not, however, materially affect the conclusions, but would rather suggest that the information carried by the most sensitive neurons is unavailable. Consistent binaural physiology across a wide range of laboratory animals suggests that, if discharge variability were the limiting factor, neuronal interaural correlation thresholds in humans and guinea pigs may be similar. Additionally, the low-frequency parts of the human and guinea pig audiograms are similar, and it seems highly likely that over the relevant low-frequency range the ability of neurons to phase lock, which is largely a determinant of the discharge variability, will not be very different. However, guinea pig auditory filter bandwidths are approximately twice as wide as those of humans at low frequencies, so if stimulus variability were the limiting factor, we would expect guinea pig correlation thresholds to be lower, because stimulus variance is inversely proportional to bandwidth (Fig. 1).
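The statement that stimulus-induced variance in correlation falls with bandwidth can be checked with a small simulation. The idealized rectangular FFT filter, the 500-Hz centre, the 100-ms duration, the sample rate, and the reference correlation of 0.5 are all assumptions chosen for illustration, not values from Figure 1.

```python
import numpy as np

FS = 20_000  # sample rate in Hz (assumed)
DUR = 0.1    # token duration in s (assumed)
CF = 500.0   # centre frequency in Hz (assumed)


def band_limited_pair(rho, bandwidth, rng):
    """Noise pair with expected correlation rho, band-limited by zeroing FFT
    bins outside CF +/- bandwidth/2 (an idealized rectangular filter)."""
    n = int(FS * DUR)
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    right = rho * a + np.sqrt(1.0 - rho ** 2) * b
    keep = np.abs(np.fft.rfftfreq(n, 1.0 / FS) - CF) <= bandwidth / 2.0

    def filt(x):
        spec = np.fft.rfft(x)
        spec[~keep] = 0.0
        return np.fft.irfft(spec, n)

    return filt(a), filt(right)


def correlation_variance(rho, bandwidth, n_tokens=500, seed=0):
    """Across-token variance of the normalized correlation of filtered pairs."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_tokens):
        left, right = band_limited_pair(rho, bandwidth, rng)
        values.append(np.dot(left, right) / np.sqrt(np.dot(left, left) * np.dot(right, right)))
    return float(np.var(values))


v_narrow = correlation_variance(0.5, 100.0)
v_wide = correlation_variance(0.5, 200.0)
```

Doubling the bandwidth roughly doubles the number of independent spectral components in each token, so `v_wide` should come out at roughly half of `v_narrow`.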

We previously measured tone ITD discrimination thresholds in a similar population of neurons (Shackleton et al. 2003; Skottun et al. 2001) and found many ITD thresholds that were much worse than those of humans; however, the best approached the human threshold. In contrast, the interaural correlation thresholds are markedly worse in the condition of zero ITD, the condition under which psychophysics is normally performed (Fig. 8A,C), although the thresholds of the best neurons are only a factor of 3 worse than human measurements using narrowband stimuli at “compensation” delay (Fig. 8D). So, although tone ITD discrimination could be performed on the basis of the few most sensitive neurons (see Shackleton et al. 2003, for a fuller discussion of this issue), correlation discrimination requires some form of population coding, whether by pooling several neurons together or by viewing the pattern across an array of neurons. Some selection of which neurons to use for correlation discrimination is essential, because most of the neurons in our sample provide very little useful information, and many hundreds would need to be pooled together to reduce thresholds. The nature of the stimulus suggests that a population code would be useful. Because noise is stochastic, correlation is a statistic of the stimulus, which varies randomly around its “true” value throughout the duration of a stimulus. To achieve the best estimate of correlation, it is necessary to average over both time and frequency, so it is probable that as many neurons as possible are recruited into the process of correlation discrimination. This is in contrast to ITD discrimination, where, even if a noise stimulus is used, the parameter to be discriminated is constant across time and frequency and the only variability is attributable to intrinsic neural noise.
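The need to pool many neurons can be made concrete under strong simplifying assumptions: statistically independent neurons with identical, locally linear rICFs, so that d′ grows as the square root of the number pooled and the threshold falls in proportion. Real IC neurons are unlikely to meet these assumptions, and the function names below are hypothetical.

```python
import math


def pooled_threshold(single_threshold, n_neurons):
    """Threshold after pooling n independent, identical neurons: d' grows as
    sqrt(n), so the correlation change needed for a criterion d' shrinks by sqrt(n)."""
    return single_threshold / math.sqrt(n_neurons)


def neurons_needed(single_threshold, target_threshold):
    """Smallest n with pooled_threshold(single_threshold, n) <= target_threshold."""
    return math.ceil((single_threshold / target_threshold) ** 2)
```

Under these assumptions, a neuron whose threshold is 3 times the target needs about 9 such neurons pooled, whereas one that is 20 times worse needs about 400, which illustrates how quickly the required population grows for the less sensitive neurons.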