Introduction

Masking by harmonic complex tones with different phase spectra provides one example where masked thresholds cannot be predicted by the power spectra of the masker and signal alone (Fletcher 1940). Changes in a masker’s phase spectrum can lead to changes in its temporal modulation pattern, with highly temporally modulated waveforms typically producing less masking than maskers with relatively “flat” temporal envelopes. The reduced masking produced by highly modulated maskers occurs presumably because listeners are able to make use of the time periods during which the masker level is low and the signal-to-masker ratio is high.

The waveshape and depth of modulation depend on the phase curvature of the harmonic complex tone (\( \partial^2\theta(f)/\partial f^2 \), where θ and f denote the phase and frequency, respectively). Zero phase curvature (such as that found with complexes in sine or cosine phase) produces highly modulated waveforms, whereas a particular constant non-zero phase curvature, known as Schroeder phase (Schroeder 1970), can be shown to produce a maximally “flat” temporal envelope. For the purposes of psychophysical masking, it seems that the temporal waveshape after cochlear filtering determines the amount of masking. Physiological studies in mammals have shown that the phase curvature of the cochlear filters, measured on the basilar membrane (BM) and in the auditory nerve, is approximately constant and negative for frequencies around the characteristic frequency (CF), at least for CFs greater than about 1 kHz (de Boer and Nuttall 1997; Carney et al. 1999; Recio and Rhode 2000). Psychophysical estimates of the phase curvature of the auditory filters have used harmonic complexes with the phases of components set according to a variant of Schroeder’s (1970) equation, provided by Lentz and Leek (2001):

$$ \theta_n = \frac{C\pi n(n-1)}{N}, \tag{1} $$

where n is the component number, N is the total number of components present in the complex, and C is a constant. In general, for a given absolute value of C, harmonic tone complexes with positive phase curvatures (i.e., C > 0) produce less masking than complexes with the corresponding negative phase curvatures in both simultaneous masking (e.g., Smith et al. 1986; Kohlrausch and Sander 1995; Lentz and Leek 2001; Oxenham and Dau 2001a, b) and forward masking (Carlyon and Datta 1997a), in line with expectations, based on the negative phase curvature of the cochlear filters. For stimuli with positive phase curvature, the negative phase curvature of the auditory filters counteracts that of the stimulus to produce a filtered waveform with less phase curvature (i.e., near-constant group delay across frequency) and a highly modulated temporal envelope at the output of the cochlear filter tuned to the CF, as illustrated in Figure 1 and demonstrated physiologically (Recio and Rhode 2000; Summers et al. 2003). In simultaneous masking, temporal modulations in the masker envelope lead to low-energy epochs in the masker, which make the signal more detectable. The link between signal detectability and temporal envelope fluctuations in simultaneous masking has been established psychophysically, by measuring masking period patterns (e.g., Kohlrausch and Sander 1995; Carlyon and Datta 1997b), and through computational modeling (e.g., Oxenham and Dau 2001a).
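The phase rule in Eq. 1 and the filtering idea illustrated in Figure 1 can be sketched numerically. The Python sketch below is a minimal illustration, not the stimulus code used in the studies cited. It assumes an idealized all-pass "filter" defined only at the component frequencies, illustrative harmonics 14 to 26 of a 100-Hz fundamental (the 1,400–2,600-Hz band used by Oxenham and Ewert 2005), and the crest factor (peak/rms) as a stand-in for envelope peakiness. The function names are hypothetical.

```python
import numpy as np


def schroeder_phases(n, C, N):
    """Component phases in radians from Eq. 1: theta_n = C*pi*n*(n - 1)/N."""
    n = np.asarray(n, dtype=float)
    return C * np.pi * n * (n - 1) / N


def harmonic_complex(harmonics, phases, f0=100.0, fs=48000):
    """One envelope period (1/f0) of an equal-amplitude cosine complex."""
    t = np.arange(int(round(fs / f0))) / fs
    return np.cos(2 * np.pi * f0 * np.outer(harmonics, t) + phases[:, None]).sum(axis=0)


def crest_factor(x):
    """Peak magnitude divided by rms: larger values mean a peakier waveform."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))


N = 13
n = np.arange(1, N + 1)            # component numbers in Eq. 1
harmonics = n + 13                 # harmonics 14..26, i.e. 1,400-2,600 Hz
# Idealized all-pass: adds a phase that exactly cancels the C = +1 curvature.
allpass = -schroeder_phases(n, 1.0, N)

for C in (1.0, -1.0):
    phases = schroeder_phases(n, C, N)
    before = crest_factor(harmonic_complex(harmonics, phases))
    after = crest_factor(harmonic_complex(harmonics, phases + allpass))
    print(f"C = {C:+.0f}: crest factor {before:.2f} -> {after:.2f}")
```

For C = +1 the filter output is a cosine-phase complex, whose crest factor equals √(2N); for C = −1 the two curvatures add rather than cancel, and the output stays comparatively flat.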

FIG. 1

Illustration of waveforms of positive and negative Schroeder-phase complexes (left panels) and of the waveforms resulting from the interaction between the phase characteristic of each complex and that of an all-pass filter whose phase response cancels the phase curvature of the positive Schroeder-phase complex (right panels).

Similar (although generally smaller) differences in masking between complexes with negative and positive phase curvature have also been demonstrated in forward masking (Carlyon and Datta 1997a). Because the signal is always presented after the masker, temporal fluctuations during the masker cannot in themselves explain masking differences. Instead, it is generally believed that the combination of temporal fluctuations in the stimulus waveform after cochlear filtering and physiological compression, such as that observed in basilar membrane responses (e.g., Rhode 1971; Ruggero et al. 1997), can account for the effects of masker phase in forward masking (Carlyon and Datta 1997a). Briefly, waveform or envelope compression has the effect of reducing the power or root mean square (rms) level of temporally fluctuating stimuli, relative to unmodulated stimuli with the same rms before compression, making the modulated stimuli less effective maskers than the unmodulated stimuli. In fact, compression is also likely to play an important role in explaining simultaneous masking effects (Oxenham and Dau 2001a) and may account for why masker phase effects are reduced or absent in listeners with cochlear hearing loss (Summers and Leek 1998; Summers 2000; Oxenham and Dau 2004).
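The argument that compression makes a modulated masker less effective can be checked with a toy calculation. The sketch below assumes an instantaneous power-law compression of the waveform with a hypothetical exponent of 0.25 (chosen only for illustration, not a fitted value). It applies this compression to one period of an equal-rms cosine-phase (C = 0) complex and a positive Schroeder-phase (C = 1) complex built from Eq. 1, then compares the rms of the compressed outputs. The helper names and component numbers are illustrative assumptions.

```python
import numpy as np


def complex_one_period(C, N=13, lowest_harmonic=14, f0=100.0, fs=48000):
    """One period of an equal-amplitude harmonic complex with Eq. 1 phases."""
    n = np.arange(1, N + 1)
    theta = C * np.pi * n * (n - 1) / N
    h = lowest_harmonic + n - 1
    t = np.arange(int(round(fs / f0))) / fs
    return np.cos(2 * np.pi * f0 * np.outer(h, t) + theta[:, None]).sum(axis=0)


def rms(x):
    return np.sqrt(np.mean(x ** 2))


def compress(x, exponent=0.25):
    """Instantaneous power-law compression of the waveform magnitude (hypothetical exponent)."""
    return np.sign(x) * np.abs(x) ** exponent


for C in (0.0, 1.0):
    x = complex_one_period(C)
    print(f"C = {C:.0f}: input rms {rms(x):.3f}, compressed rms {rms(compress(x)):.3f}")
```

Both inputs have the same rms, but the peaky C = 0 waveform has the lower rms after compression, which is the sense in which, under this hypothesis, it should be the less effective forward masker.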

In summary, it is generally believed that the nonlinear gain and compression combine with the negative phase curvature in the normal cochlear response to produce masker phase curvature (MPC) effects in both simultaneous and forward masking. Interestingly, both compression and negative phase curvature are only observed in basilar membrane responses for frequencies close to the CF. The response of the BM to frequencies well below CF is linear (at least in the base of the cochlea, where most measurements have been made), and the phase curvature tends towards zero (e.g., Ruggero et al. 1997). The prediction, therefore, is that when the masker frequencies are well below the signal frequency, the envelope of the waveform at the output of the filter should more closely resemble that of the input stimulus. Consequently, an off-frequency masker with all components starting with the same phase (i.e., a masker for which parameter C = 0) would be expected to produce the peakiest waveform at the output of the filter tuned to a signal frequency well above the masker.
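The off-frequency prediction can be illustrated by sweeping C. Assuming, as described above, that the response at a place tuned well above the masker is linear with near-zero phase curvature, such a filter can at most add a linear phase (a pure delay), which shifts the waveform in time without changing its envelope. The sketch below (hypothetical helper names, illustrative harmonics 14 to 26 of 100 Hz) shows that the input crest factor, and therefore the crest factor after such a delay-only filter, peaks at C = 0.

```python
import numpy as np


def complex_one_period(C, N=13, lowest_harmonic=14, f0=100.0, fs=48000):
    """One period of an equal-amplitude harmonic complex with Eq. 1 phases."""
    n = np.arange(1, N + 1)
    theta = C * np.pi * n * (n - 1) / N
    h = lowest_harmonic + n - 1
    t = np.arange(int(round(fs / f0))) / fs
    return np.cos(2 * np.pi * f0 * np.outer(h, t) + theta[:, None]).sum(axis=0)


def crest_factor(x):
    """Peak magnitude divided by rms."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))


C_values = np.linspace(-1.0, 1.0, 9)   # the nine C values used in the experiments
crest_in = np.array([crest_factor(complex_one_period(C)) for C in C_values])
# Pure delay: for a one-period periodic signal, a circular shift by whole samples.
crest_delayed = np.array([crest_factor(np.roll(complex_one_period(C), 37)) for C in C_values])

print("crest factor vs C:", np.round(crest_in, 2))
print("peakiest output at C =", C_values[np.argmax(crest_delayed)])
```

Only C = 0 aligns all component phases at one instant; any non-zero quadratic phase in this range spreads the energy over the period.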

This prediction was tested by Oxenham and Ewert (2005). They presented a 2-kHz pure-tone signal in the presence of a simultaneous Schroeder-phase complex masker that was band-limited between 1,400 and 2,600 Hz (on-frequency condition) or between 200 and 1,400 Hz (off-frequency condition). In line with earlier studies, minimum masking was produced by the on-frequency masker when the phase curvature of the masker was positive. In contrast, for the off-frequency condition, minimum masking was produced when the phase curvature of the masker was zero. This result is consistent with the prediction that the zero phase curvature of the basilar membrane in response to off-frequency stimuli should lead to a minimum in masking when the curvature of the off-frequency masker is zero.

Because Oxenham and Ewert (2005) used simultaneous masking, it was not possible to determine the role of peripheral compression since, for maskers with highly modulated envelopes, detection of the signal could be improved by “listening in the valleys” during low-level epochs in the masker waveform (e.g., Wojtczak et al. 2001), and by suppressive interactions between the masker and signal (e.g., Temchin et al. 1997). Thus, although an off-frequency masker would not be expected to undergo compression at the place with a CF corresponding to the signal frequency, the effects of MPC could still be observed. Forward masking, in which listening in the valleys is not possible, provides a stronger test of the influence of peripheral compression on masker phase effects. If peripheral compression is responsible for the relationship between masker effectiveness and temporal envelope modulations, then forward masking by off-frequency maskers should show little or no dependence on the phase relationships within the masker, so long as the masker maintains the same overall rms level. Ewert and Oxenham (2002) provided a preliminary test of this hypothesis, but were not able to rule out masker phase effects for off-frequency maskers at the 4-kHz signal frequency that they used.

The present study had two main aims. The first aim was to test the hypothesis that peripheral compression can account for MPC effects in forward masking (Carlyon and Datta 1997a). Two predictions can be made from this hypothesis: first, using an off-frequency masker should lead to an elimination of masker phase effects, to the extent that the off-frequency maskers are processed linearly at the basilar membrane place with a CF corresponding to the signal frequency, as described above; second, the MPC effects should be independent of masker duration. This is because basilar membrane compression is believed to act essentially as static (i.e., independent of stimulus duration) compression, operating quasi-instantaneously (Ruggero et al. 1997). The second aim was to extend the measures of auditory filter phase response from simultaneous masking (Oxenham and Dau 2001b) to forward masking. If the earlier studies of simultaneous masking were correct in assuming that minimum masking is produced when the masker has the most modulated envelope after auditory filtering, then estimates of auditory filter phase curvature using forward masking should be the same as when simultaneous masking is used. The earlier study using forward masking (Carlyon and Datta 1997a) tested only positive (C = 1) and negative (C = −1) Schroeder-phase maskers, and so was not able to provide an accurate estimate of auditory filter phase curvature.

Effects of masker phase curvature and duration on forward masking by harmonic complexes

Stimuli and procedure

Forward maskers were Schroeder-phase complexes with equal amplitude components and phase curvatures described by

$$ \frac{\partial^2\theta(f)}{\partial f^2} = C\,\frac{2\pi}{N f_0^{2}}, \tag{2} $$

where \( f_0 \) is the fundamental frequency of the harmonic complex, and N is the number of components. The \( f_0 \) for all maskers was fixed at 100 Hz. Nine values of parameter C from −1 to 1 in steps of 0.25 were used. For C = −1 and 1, the waveforms were negative and positive Schroeder-phase complexes, respectively. For C = 0, all masker components started with the same phase (0 deg), resulting in the most highly modulated, or “peaky”, input waveform. The signal duration was always 10 ms, including 5-ms raised-cosine ramps. Three signal frequencies, 1, 2, and 6 kHz, were tested. For each signal frequency, forward masking was measured as a function of the MPC for on- and off-frequency maskers. For the 2- and 6-kHz signals, the number of masker components was the same for on- and off-frequency maskers, but differed across signal frequencies as shown in Table 1. For the 1-kHz signal, the number of components in the off-frequency masker had to be reduced to keep all components more than half an octave below the signal frequency. Table 1 also shows the frequencies of the lowest and highest components in the on- and off-frequency maskers.
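Eq. 2 follows from Eq. 1 once the component number is written as a function of frequency. Assuming that component n has frequency f = (n + n_0) f_0, where n_0 is a fixed harmonic offset introduced here only for the derivation (so that n = f/f_0 − n_0), differentiating twice gives a curvature that is independent of both frequency and n_0:

$$
\begin{aligned}
\theta(f) &= \frac{C\pi}{N}\left(\frac{f}{f_0}-n_0\right)\left(\frac{f}{f_0}-n_0-1\right),\\
\frac{\partial\theta(f)}{\partial f} &= \frac{C\pi}{N f_0}\left(\frac{2f}{f_0}-2n_0-1\right),\\
\frac{\partial^2\theta(f)}{\partial f^2} &= C\,\frac{2\pi}{N f_0^{2}}.
\end{aligned}
$$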

TABLE 1 The table shows the number of components (N) in the on- and off-frequency maskers and the frequencies of the lowest (\( F_{\mathrm{L}} \)) and highest (\( F_{\mathrm{H}} \)) masker components for the three signal frequencies

For each signal frequency, three levels of the on-frequency masker, 45, 65, and 85 dB sound pressure level (SPL), and one level of the off-frequency masker, 85 dB SPL, were used. First, the masker condition (the level of an on- or off-frequency masker) was chosen randomly. For each masker condition, all nine values of parameter C were tested in random order. Once thresholds were measured for all the C values, another masker condition was selected at random. This procedure was repeated, using a different random order, until all masker conditions had been tested three times. The experiment was run for two masker durations, 200 and 30 ms. The conditions involving the long-duration masker were tested before the short-duration masker conditions. Thus, the experiment consisted of a total of 216 conditions: 3 signal frequencies × 4 masker levels (3 levels of on-frequency masker + 1 level of off-frequency masker) × 9 C values × 2 masker durations.
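The bookkeeping in the preceding paragraph can be made concrete. The Python sketch below counts the factorial design and builds one possible run order for a single signal frequency and masker duration. The condition labels and the blocked shuffling are one reading of the described procedure (masker conditions in a random order, the nine C values in a random order within each, repeated three times), not the authors' code.

```python
import itertools
import random

SIGNAL_FREQS_KHZ = (1, 2, 6)
MASKER_CONDITIONS = ("on-45", "on-65", "on-85", "off-85")  # masker type and level (dB SPL)
C_VALUES = tuple(-1 + 0.25 * i for i in range(9))          # -1.0, -0.75, ..., 1.0
DURATIONS_MS = (200, 30)                                    # 200-ms conditions were run first


def all_conditions():
    """Every combination of signal frequency, masker condition, C and masker duration."""
    return list(itertools.product(SIGNAL_FREQS_KHZ, MASKER_CONDITIONS, C_VALUES, DURATIONS_MS))


def run_order(rng, repetitions=3):
    """Hypothetical run order for one signal frequency and one masker duration.

    Each repetition visits the masker conditions in a fresh random order and,
    within each masker condition, runs all nine C values in a fresh random order.
    """
    order = []
    for _ in range(repetitions):
        maskers = list(MASKER_CONDITIONS)
        rng.shuffle(maskers)
        for masker in maskers:
            c_values = list(C_VALUES)
            rng.shuffle(c_values)
            order.extend((masker, c) for c in c_values)
    return order


print(len(all_conditions()))          # 216 = 3 x 4 x 9 x 2
schedule = run_order(random.Random(1))
print(len(schedule), schedule[:3])    # 108 runs: 36 conditions x 3 repetitions
```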

Each Schroeder-phase masker was generated digitally by adding sinusoidal components with phases described by Eq. 1 for a desired value of C. For each value of parameter C used, a Schroeder-phase complex was generated with a duration longer than that of the masker by one envelope cycle (10 ms). For each trial, the masker with the desired duration was cut from that longer complex, such that the starting point was chosen randomly from a set of 10 points within the first cycle of the complex, selected between 0 and 9 ms in 1-ms steps from the onset. This trial-by-trial randomization of the starting envelope phase was used to prevent local features at the end of the masker (e.g., an envelope peak or a valley) from having a systematic effect on masked threshold. Raised-cosine ramps of 5 ms were applied to the onset and offset of the masker. The signal was presented immediately after the masker (i.e., after a 0-ms offset–onset delay).
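The trial-by-trial construction of the masker can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: stimulus levels and calibration are omitted, the harmonics passed in are illustrative, and the helper names are hypothetical. It follows the steps described above: generate a complex one envelope cycle (10 ms) longer than the masker, cut the masker starting at a random offset of 0 to 9 ms, apply 5-ms raised-cosine ramps, and append the 10-ms signal with no gap.

```python
import numpy as np

FS = 48000    # sampling rate (Hz), as in the experiments
F0 = 100.0    # masker fundamental (Hz); one envelope cycle lasts 10 ms


def schroeder_complex(harmonics, C, dur_s, fs=FS, f0=F0):
    """Equal-amplitude harmonic complex with Eq. 1 phases (n = 1..N)."""
    N = len(harmonics)
    n = np.arange(1, N + 1)
    theta = C * np.pi * n * (n - 1) / N
    t = np.arange(int(round(dur_s * fs))) / fs
    return np.cos(2 * np.pi * f0 * np.outer(harmonics, t) + theta[:, None]).sum(axis=0)


def raised_cosine_ramps(x, ramp_s=0.005, fs=FS):
    """Apply raised-cosine onset and offset ramps of ramp_s seconds."""
    m = int(round(ramp_s * fs))
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(m) / m))  # starts at 0, rises toward 1
    y = np.array(x, dtype=float)
    y[:m] *= ramp
    y[-m:] *= ramp[::-1]
    return y


def make_masker(harmonics, C, dur_s, rng, fs=FS):
    """Cut a dur_s-long masker from a complex one envelope cycle longer,
    starting at a random offset of 0, 1, ..., 9 ms."""
    long_complex = schroeder_complex(harmonics, C, dur_s + 1 / F0, fs)
    start = int(rng.integers(0, 10)) * int(round(0.001 * fs))
    segment = long_complex[start:start + int(round(dur_s * fs))]
    return raised_cosine_ramps(segment, fs=fs)


def make_signal(freq_hz, fs=FS):
    """10-ms tone whose 5-ms onset and offset ramps leave no steady-state portion."""
    t = np.arange(int(round(0.010 * fs))) / fs
    return raised_cosine_ramps(np.sin(2 * np.pi * freq_hz * t), fs=fs)


rng = np.random.default_rng(0)
masker = make_masker(np.arange(14, 27), C=0.25, dur_s=0.200, rng=rng)
trial = np.concatenate([masker, make_signal(2000.0)])   # 0-ms masker-signal gap
print(masker.size, trial.size)                          # 9600 10080
```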

Thresholds were measured using an adaptive three-interval three-alternative forced-choice procedure in conjunction with a 2-down, 1-up tracking technique that tracks the 70.7% correct point on the psychometric function (Levitt 1971). The listener’s task was to indicate which of the three intervals contained the signal. Visual feedback indicating the correct interval was provided immediately after each listener response. A single experimental run consisted of a total of 12 reversals. At the beginning of each run, the signal was set to a level at which it was clearly audible. The level was reduced by 8 dB after two consecutive correct responses and increased by 8 dB after each incorrect response until the second reversal in the direction of the adaptive steps occurred. After that, the step size was reduced to 4 dB for the next two reversals and to 2 dB for the remaining eight reversals. The run terminated after 12 reversals, and the threshold for each run was computed by averaging the levels at the last eight reversals. The final threshold estimate was obtained by taking the mean of three single-run estimates for each condition.
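The adaptive rule can be summarized in a short simulation. The Python sketch below implements the 2-down, 1-up track with 8-, 4- and 2-dB steps, 12 reversals, and a threshold equal to the mean of the last eight reversal levels, as described above. The `respond` callback standing in for the listener, the simulated psychometric function, and the choice to record each reversal at the level where the track turned are assumptions made for illustration.

```python
import math
import random


def step_size(n_reversals):
    """8 dB until the 2nd reversal, 4 dB for the next two, then 2 dB."""
    if n_reversals < 2:
        return 8.0
    if n_reversals < 4:
        return 4.0
    return 2.0


def run_track(respond, start_level, max_reversals=12, n_average=8):
    """2-down, 1-up adaptive track (converges near 70.7% correct).

    respond(level) returns True for a correct response at that signal level (dB).
    Returns the threshold estimate and the list of reversal levels.
    """
    level, n_correct, direction = start_level, 0, None
    reversals = []
    while len(reversals) < max_reversals:
        if respond(level):
            n_correct += 1
            if n_correct < 2:
                continue              # need two consecutive correct responses
            n_correct, move = 0, -1   # two correct in a row: lower the signal
        else:
            n_correct, move = 0, +1   # any error: raise the signal
        if direction is not None and move != direction:
            reversals.append(level)   # the track changed direction at this level
        direction = move
        level += move * step_size(len(reversals))
    return sum(reversals[-n_average:]) / n_average, reversals


# Usage with a simulated listener (hypothetical psychometric function).
rng = random.Random(0)


def simulated_listener(level_db, midpoint_db=40.0, slope_db=2.0):
    """3AFC observer: 1/3 chance performance plus a logistic rise with signal level."""
    p_correct = 1 / 3 + (2 / 3) / (1 + math.exp(-(level_db - midpoint_db) / slope_db))
    return rng.random() < p_correct


threshold, reversal_levels = run_track(simulated_listener, start_level=70.0)
print(round(threshold, 1), len(reversal_levels))
```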

The stimuli were generated digitally on a PC with a sampling rate of 48 kHz and were played out via a 24-bit LynxStudio Lynx22 sound card. All the stimuli were presented to the listener’s left ear via a Sennheiser HD 580 headset. The listeners were tested in a double-walled sound-attenuating booth and responded via a computer keyboard or mouse.

Listeners

Eight listeners with normal hearing, four males and four females, participated in the study. The listeners were recruited from the student population and their ages ranged from 19 to 27. Not all the listeners participated in every condition due to their limited availability. Their hearing thresholds, measured using an ANSI-certified audiometer (Madsen Conera), were below 15 dB HL at the octave frequencies between 0.25 and 8 kHz. The listeners were paid an hourly wage for their participation. They received 2 h of practice before the data collection began. Each experimental session lasted 2 h including short breaks. Listeners who were tested in all conditions completed the experiment in about 20 sessions, which extended over 7–10 weeks, depending on the listener’s availability (two to three sessions per week). The listeners provided written informed consent prior to their participation, and the protocol was approved by the Institutional Review Board at the University of Minnesota.

Results

Although the individual masked thresholds for any given C differed across listeners, the overall threshold patterns as a function of parameter C (i.e., the positions of the minima and the shapes of the masked-threshold functions) were consistent for all listeners and so the mean data are presented in the figures. The upper row of Figure 2 shows data for the 200-ms on-frequency maskers, with masked thresholds for different signal frequencies presented in different panels. Different symbols represent thresholds for different masker levels, as described in the legend. The vertical dashed-dotted line in each panel corresponds to the value of parameter C that would be expected to yield the lowest masked threshold given the fundamental frequency of 100 Hz and the number of components in the maskers. These predicted C values were computed from the normalized phase curvatures estimated by Oxenham and Dau (2001b). The error bars in each plot represent the standard error of the mean. The dashed lines represent predictions obtained from a model described below.
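Eq. 2 also gives the conversion behind predicted C values of this kind. If the auditory filter near CF is approximated as having a constant phase curvature c_filt (in rad/Hz², with the same sign convention as Eq. 2), the masker curvature that cancels it, and so should yield the peakiest filtered waveform, satisfies C·2π/(N f0²) = −c_filt. The sketch below encodes only this algebra; it does not reproduce the normalization used by Oxenham and Dau (2001b), and the function names are hypothetical.

```python
import math


def curvature_from_C(C, N, f0=100.0):
    """Masker phase curvature in rad/Hz^2 from Eq. 2."""
    return C * 2 * math.pi / (N * f0 ** 2)


def predicted_C(filter_curvature, N, f0=100.0):
    """Value of C whose masker curvature cancels a constant filter curvature."""
    return -filter_curvature * N * f0 ** 2 / (2 * math.pi)


# A filter curvature equal and opposite to that of a C = 0.5 masker with N = 13
c_filt = -curvature_from_C(0.5, N=13)
print(predicted_C(c_filt, N=13))     # approximately 0.5
# Masker-curvature change (rad/Hz^2) corresponding to one 0.25 step in C:
print(curvature_from_C(0.25, N=13))
```

A negative filter curvature yields a positive C. Because the curvature in Eq. 2 scales with 1/N, the same filter curvature maps onto different C values for maskers with different numbers of components.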

FIG. 2

Mean masked thresholds for N listeners (in parentheses) plotted as a function of parameter C for on-frequency maskers. Data for 200-ms maskers are shown in the upper panels, and for 30-ms maskers, in the lower panels. Different symbols represent data for different masker levels. Error bars represent the standard error of the mean. The vertical dashed-dotted line corresponds to the value of C for which minimum masking was expected given the phase curvature of the auditory filters estimated by Oxenham and Dau (2001b). Dashed lines show predictions obtained using a model incorporating compression (α) of the squared output of a gammachirp filter with a fitted phase response.

The data were analyzed using one-way repeated-measures analyses of variance (ANOVAs) with the parameter C as the main factor. The analysis was performed separately for each signal frequency and masker condition. The Greenhouse–Geisser correction for violations of sphericity was applied where appropriate. Effects were considered significant for values of p ≤ 0.05.
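For readers who want to reproduce this kind of analysis, the sketch below computes the one-way repeated-measures F statistic and the Greenhouse–Geisser epsilon in plain Python. It assumes a complete data set (every listener has a threshold for every C value). The corrected degrees of freedom are ε(k − 1) and ε(k − 1)(n − 1) for n listeners and k levels, which is the form of the non-integer F(df1, df2) values reported below. The function name and toy numbers are hypothetical, and a p value would additionally require the F distribution (for example, `scipy.stats.f.sf(F, df1, df2)`).

```python
def rm_anova_gg(data):
    """One-way repeated-measures ANOVA with a Greenhouse-Geisser epsilon.

    data: one row per listener, each row holding k thresholds (one per level of
    the within-subject factor, e.g. the nine C values); no missing cells.
    Returns (F, epsilon, corrected df_effect, corrected df_error).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition the sums of squares: conditions, listeners, and residual error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    F = (ss_cond / df_cond) / (ss_error / df_error)

    # Sample covariance matrix of the k conditions across listeners.
    S = [[sum((row[a] - cond_means[a]) * (row[b] - cond_means[b]) for row in data) / (n - 1)
          for b in range(k)] for a in range(k)]
    # Double-centre S (S is symmetric, so row means equal column means).
    row_means = [sum(r) / k for r in S]
    overall = sum(row_means) / k
    D = [[S[a][b] - row_means[a] - row_means[b] + overall for b in range(k)]
         for a in range(k)]
    # epsilon = (trace D)^2 / ((k - 1) * sum of squared entries of D)
    trace = sum(D[a][a] for a in range(k))
    sum_sq = sum(v * v for r in D for v in r)
    eps = trace ** 2 / ((k - 1) * sum_sq)
    return F, eps, eps * df_cond, eps * df_error


# Toy example: three listeners, two conditions (made-up numbers).
F, eps, df1, df2 = rm_anova_gg([[1, 3], [2, 5], [4, 5]])
print(F, eps, df1, df2)   # F is 12 up to rounding (the squared paired t); epsilon is 1 when k = 2
```

Epsilon ranges from 1/(k − 1) (maximal violation of sphericity) to 1 (no correction), so with nine C values the effect degrees of freedom can shrink from 8 to as little as 1.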

For a 1-kHz signal (upper left panel), maskers with C = −1 produced the highest thresholds at all masker levels. The thresholds gradually decreased as C increased from −1 to 1. The minimum threshold was observed for C around 1, consistent with the predicted value of C = 1.04, obtained from the normalized phase curvature of the auditory filter tuned to 1 kHz (Oxenham and Dau 2001b). The effect of the MPC (determined by parameter C) was significant for all three masker levels [45 dB: F(2.1, 6.3) = 6.5, p = 0.029; 65 dB: F(1.7, 5.1) = 25.0, p = 0.002; 85 dB: F(2.0, 5.9) = 9.2, p = 0.015]. For the 2-kHz signal (upper middle panel), the functions relating the on-frequency masked threshold to the value of parameter C were non-monotonic, with a minimum at around 0.25. This value was a little lower than the predicted value of C = 0.33 for the BM filter tuned to 2 kHz, as shown by the vertical line. Again, the effect of the parameter C on thresholds was statistically significant for all masker levels [45 dB: F(2.3, 7.0) = 13.0, p = 0.004; 65 dB: F(2.5, 7.6) = 9.8, p = 0.006; 85 dB: F(1.9, 5.8) = 15.7, p = 0.005]. Visual inspection suggests that the overall effect size, defined as the difference between the highest and the lowest threshold for a given masker level, was greater for the 2-kHz signal than for the 1-kHz signal. This might be due to the larger number of masker components passed by the auditory filter centered at 2 kHz, which has a wider absolute bandwidth than the filter centered at 1 kHz.
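The bandwidth explanation can be made roughly quantitative. Assuming that auditory filter bandwidth is approximated by the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990), ERB = 24.7(4.37F/1000 + 1) Hz, the sketch below estimates how many 100-Hz-spaced masker components fall within one ERB at each signal frequency. It is only an order-of-magnitude illustration; the ERB is not the same as the full range of components that shape the filtered envelope.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Hz) at centre frequency f_hz, Glasberg and Moore (1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def components_per_erb(f_hz, f0=100.0):
    """Approximate number of harmonics of f0 falling within one ERB."""
    return erb_hz(f_hz) / f0


for f in (1000, 2000, 6000):
    print(f, round(components_per_erb(f), 1))   # about 1.3, 2.4 and 6.7
```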

For a 6-kHz signal (upper right panel), the patterns of masking as a function of parameter C show even stronger effects of MPC (i.e., greater differences between the maximum and minimum masked threshold) than those observed for the two lower signal frequencies. The effects were significant for all three levels of the on-frequency masker [45 dB: F(1.9, 9.6) = 9.7, p = 0.005; 65 dB: F(2.4, 12.0) = 25.2, p < 0.0001; 85 dB: F(1.7, 8.4) = 10.7, p = 0.006]. The minimum masking corresponded to C = 0.25, which was a little higher than the value of C = 0.13 predicted from the estimated phase curvature of the 6-kHz filter (Oxenham and Dau 2001b). Since the values of parameter C were selected in steps of 0.25 in both this and the previous studies, it is difficult to specify curvature with a resolution finer than the 0.25 steps used.

The lower row in Figure 2 shows the mean data for the 30-ms on-frequency maskers. For each signal frequency, the patterns of the results as a function of parameter C were generally similar to those for the 200-ms masker, except that the effects of MPC were reduced. For a 1-kHz signal, the effect of MPC was significant for the 85-dB SPL 30-ms on-frequency masker [F(2.1, 6.2) = 7.6, p = 0.021], but not for the 45- or 65-dB SPL maskers [F(2.0, 5.9) = 2.4, p = 0.175, and F(1.6, 4.9) = 3.0, p = 0.146, respectively]. For the 2-kHz signal, the effect of MPC was significant for the 45- [F(2.3, 6.9) = 9.5, p = 0.009] and 65-dB SPL [F(2.5, 7.6) = 9.8, p = 0.006] maskers, but not for the 85-dB SPL masker [F(2.5, 7.5) = 1.0, p = 0.418]. For the 6-kHz signal, the 30-ms 45- and 65-dB SPL on-frequency maskers produced a significant effect [F(1.9, 7.5) = 0.019, p = 0.009, and F(2.5, 10.0) = 7.5, p = 0.008, respectively] while the 85-dB SPL masker did not [F(1.7, 5.0) = 2.4, p = 0.188]. Thus, the effects of MPC were smaller and less consistent for the 30-ms masker than for the 200-ms masker.

The reduction in the effect of MPC at the shorter masker duration was not just a result of the masked thresholds falling into a range of lower levels, where the BM response to the signal could have been less compressive (see Carlyon and Datta 1997a). The effect of the MPC was reduced even when sets of data with comparable signal levels at threshold were compared. For example, for a 2-kHz signal, a 200-ms masker presented at 65 dB SPL (upper middle panel) produced a similar threshold to that for a 30-ms masker presented at 85 dB SPL (lower middle panel), when the phases of the masker components were generated with C = −1. However, the effect of the phase curvature for the 30-ms 85-dB SPL masker was not significant [F(2.5, 7.5) = 1.0, p = 0.418], even though a significant and sizeable effect was observed for the 200-ms 65-dB SPL masker.

The mean data for both the 200- (circles) and 30-ms (squares) off-frequency maskers are shown in Figure 3. For the 1-kHz signal (left panel), the effect of the value of parameter C was not significant for either the 200- or the 30-ms masker (p > 0.2 in both cases). Similarly, for the 2-kHz signal (middle panel), masked thresholds did not significantly depend on C [F(2.6, 7.9) = 2.8 and p = 0.116 for 200 ms, and F(1.6, 4.7) = 1.2 and p = 0.361 for 30 ms]. In contrast, masked thresholds for the 200-ms off-frequency masker and the 6-kHz signal (right panel) showed a systematic and significant dependence on the value of C [F(2.7, 13.3) = 10.6, p = 0.001]. The lowest masked threshold was at C = 0, the value for which the masker (input) waveform was the peakiest. For the 30-ms masker, a minimum threshold was also observed at C = 0, but the effect of parameter C just failed to reach statistical significance [F(2.2, 6.0), p = 0.059].

FIG. 3

Mean masked thresholds for N listeners (in parentheses) plotted as a function of parameter C for off-frequency maskers. Data for 200-ms maskers are shown by circles and for 30-ms maskers, by squares. Dashed lines show predictions obtained as in Figure 2, but without compression of the masker.

Discussion

Effects of compression and filter phase curvature

Results for the 200-ms on-frequency forward maskers generally show patterns of masking as a function of MPC that are similar to those obtained using simultaneous maskers with equivalent bandwidths and fundamental frequencies (e.g., Oxenham and Dau 2001b). This finding suggests that estimates of auditory filter phase curvature are similar for both forward and simultaneous masking, consistent with the prediction that the “peakiest” masker waveform after auditory filtering should produce the lowest masked threshold in both simultaneous and forward masking.

The fact that effects of MPC are observed in nearly all the on-frequency masker conditions is consistent with the hypothesized effects of basilar membrane compression (Carlyon and Datta 1997a). The absence of an MPC effect for the 1- and 2-kHz off-frequency maskers is also consistent with the predicted effects of nonlinear basilar membrane filtering: to the extent that frequencies well below the signal frequency are not compressed at the place along the BM with a CF corresponding to the signal frequency, the off-frequency maskers should produce no MPC effects. Surprisingly, this prediction was not confirmed for the 6-kHz signal, where masked thresholds in the off-frequency condition showed a large and systematic dependence on the value of parameter C, particularly for the longer masker duration.

One possible explanation for the off-frequency results at 6 kHz is that the off-frequency masker components may still have been subjected to some degree of compression at the place along the basilar membrane with a CF around 6 kHz (Lopez-Poveda et al. 2003; Plack and Drga 2003). However, the relative bandwidth around the CF over which BM responses are compressive is thought to decrease with increasing CF (Rhode and Cooper 1996; Lopez-Poveda et al. 2003). Thus, if anything, BM responses to off-frequency maskers are more likely to have been compressed for the two lower signal frequencies, for which the effect of MPC of the off-frequency masker was not observed, than for the 6-kHz signal frequency. It might be argued that the off-frequency maskers produced compressive responses for all three signal frequencies but the masking functions were flat for the 1- and 2-kHz signals because the number of masker components passed through the filters tuned to these signal frequencies was insufficient to produce phase effects. Indeed, the number of off-frequency masker components for the two lower signal frequencies was smaller than for the 6-kHz signal (see Table 1). However, for the 2-kHz signal, the off-frequency masker had the same bandwidth and fundamental frequency as in the simultaneous-masking study by Oxenham and Ewert (2005), except that it was shifted up by 200 Hz toward the signal frequency. Since the data for each listener in that study exhibited strong MPC effects, the number of masker components seems to have been sufficient to yield distinctly different waveforms at the output of the filter tuned to 2 kHz.

It has been suggested that compression originating from stages of processing central to the BM but peripheral to the origin of forward masking may play a role in creating effects of the MPC (Oxenham and Dau 2004). In fact, the inner hair cells exhibit compressive responses due to saturation of the transducer potential (Cheatham and Dallos 2000). Compression of the masker at sites central to the BM should be independent of masker frequency, and thus phase effects would be observed for both on- and off-frequency maskers, with the minimum masking occurring at the C values that produce the peakiest representations of the respective maskers on the BM. To account for the sets of data for all three signal frequencies, it would have to be assumed that central compression is reduced or absent at lower frequencies. At present, there is no evidence to support such an assumption. In fact, the opposite suggestion, of central compression being stronger for lower frequencies, has been made by Zeng and Shannon (1994) to account for loudness growth in cochlear-implant users. Thus, it seems that the effects of off-frequency MPC on the 6-kHz signal thresholds cannot be easily accounted for by static compression in the peripheral (or central) auditory system.

One aspect of the data from the off-frequency masker and the 6-kHz signal that is consistent with predictions based on peripheral auditory physiology is the finding that the minimum of the masking function occurs for a C value of 0, consistent with the idea that the phase curvature of the auditory filter approaches zero for frequencies well below CF (Ruggero et al. 1997; Oxenham and Ewert 2005).

Effects of masker level and duration

For all three signal frequencies and two masker durations, the minima of the masking functions in terms of C seem independent of the level of the on-frequency masker (Fig. 2, upper and lower panels). The level independence observed here is in line with the results of Oxenham and Dau (2001b) using simultaneous masking. It is also consistent with the observation that the frequency glide of auditory filter impulse responses from physiological measurements seems to be level independent (Shera 2001). The claim of level-independent phase curvature should not be interpreted as implying that the overall auditory filter phase responses are not affected by changes in level; indeed, a number of physiological studies show dramatic changes in both amplitude and phase response with level (Ruggero et al. 1997; Rhode and Recio 2001). However, changes in overall phase response (changes to the phase-frequency function or its first derivative) do not necessarily imply changes in phase curvature (the second derivative of the phase-frequency function).

Several interesting aspects of the effect of the MPC become apparent in Figure 4, which shows differences between the maximum (max) and minimum (min) masked thresholds for different masker levels at each signal frequency. The max and min thresholds were estimated from sine-function fits (see Footnote 1) to the data at each masker level in Figures 2 and 3. The bars depict the max–min differences in all conditions tested. The asterisks above the bars indicate conditions for which the effect of MPC was statistically significant, as determined by the repeated-measures ANOVAs described above.
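The exact form of the sine-function fit is given in Footnote 1, which is not reproduced here, so the following is only a hypothetical sketch of how a max–min difference can be read off a fitted sinusoid. It assumes a fixed period (an arbitrary 4 units of C, so that half a cycle spans the tested range from −1 to 1) and fits offset, sine and cosine terms by linear least squares with NumPy; a fit with a free period would need nonlinear optimization instead. The thresholds in the usage example are synthetic, not data.

```python
import numpy as np


def sine_fit_max_min(c_values, thresholds, period=4.0, n_grid=801):
    """Fit thr(C) = a + b1*sin(wC) + b2*cos(wC), w = 2*pi/period (period fixed, hypothetical).

    Returns (max - min) of the fitted curve over the tested range of C, and the coefficients.
    """
    c = np.asarray(c_values, dtype=float)
    y = np.asarray(thresholds, dtype=float)
    w = 2 * np.pi / period
    X = np.column_stack([np.ones_like(c), np.sin(w * c), np.cos(w * c)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    grid = np.linspace(c.min(), c.max(), n_grid)
    fit = coef[0] + coef[1] * np.sin(w * grid) + coef[2] * np.cos(w * grid)
    return fit.max() - fit.min(), coef


C = np.linspace(-1.0, 1.0, 9)
w = 2 * np.pi / 4.0
synthetic = 50 - 6 * np.sin(w * C) + 2 * np.cos(w * C)   # made-up thresholds (dB SPL)
diff, coef = sine_fit_max_min(C, synthetic)
print(round(diff, 2), np.round(coef, 3))
```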

FIG. 4

Differences between the maximum and minimum thresholds for the 200-ms maskers are shown in the left panel, and for the 30-ms maskers, in the right panel. The fine-hatched bars show the threshold differences for the 45-dB SPL masker, the coarse-hatched bars are for the 65-dB SPL masker, and the black bars for the 85-dB SPL masker. The white bars show the differences for the off-frequency maskers. The differences between the maximum and minimum thresholds were estimated from sine-function fits to the data in Figures 2 and 3. Asterisks above the bars indicate significant effects of masker phase curvature.

For the 200-ms maskers (Fig. 4, left panel), the MPC effect increased with increasing signal frequency for all masker levels. For the on-frequency masker, this may be due to the increasing absolute bandwidth of the auditory filter with increasing CF. As filter bandwidth increases, a greater number of masker components are passed through the filter tuned to the signal, leading to greater temporal envelope fluctuations in the peakiest waveforms. The largest effect (about 20 dB) was observed for the 6-kHz signal masked by a 65-dB SPL on-frequency masker. The effect for the off-frequency masker at 6 kHz was about 12 dB. For the 30-ms maskers (right panel), there was no systematic dependence of the size of the MPC effect on the signal frequency. For all signal frequencies, the effect of the phase curvature was smaller than for the same-level 200-ms masker. The effects of masker duration cannot be explained in terms of peripheral auditory phase curvature or static compression alone. Some recent studies have suggested the possibility that BM gain and compression may decrease over the course of stimulation (Strickland 2004). Such a mechanism should lead to stronger phase effects for the 30-ms than for the 200-ms maskers, contrary to the masker duration effect seen in our data. Moreover, two psychophysical methods for estimating BM compression, one measuring growth of masking and effectively estimating compression of a very brief signal (Oxenham and Plack 1997), and the other measuring temporal masking curves and estimating compression of a longer-duration masker (Nelson et al. 2001), lead to similar compression estimates. Thus, no evidence exists in physiological or psychophysical data for a buildup of peripheral compression over time.

Possible explanations and mechanisms

Since static compression and phase interactions alone cannot account for the effects of MPC on off-frequency masking with a 6-kHz signal or the effects of masker duration at all the signal frequencies, additional mechanisms need to be considered. Any candidate mechanism would need to produce MPC effects with an off-frequency masker and would need to have a relatively slow time course to account for the effects of masker duration.

Kubli et al. (2005) examined the effects of Schroeder-phase complexes with negative and positive phase curvature on the middle-ear muscle reflex (MEMR). They hypothesized that a Schroeder-phase complex with negative phase curvature, which should produce a more synchronized response across the BM (e.g., Dau et al. 2000) and is perceived as louder (Carlyon and Datta 1997a; Mauermann and Hohmann 2007), may be a more potent elicitor of the MEMR than a Schroeder-phase complex with positive phase curvature, which should produce a less synchronized response. Note that stimuli with negative phase curvature (i.e., upward frequency glides) produce a more synchronized response across the BM, whereas stimuli with positive phase curvature produce a more compact response within a given BM filter (e.g., Uppenkamp et al. 2001).

The investigation by Kubli et al. (2005) suggests an explanation of our data in terms of a feedback-based mechanism with a long time constant, which would produce stronger effects of MPC by increasing the amount of masking produced by stimuli that generate greater excitation on the BM. However, the MEMR itself is unlikely to account for our results, because it is activated only by high-level stimuli and primarily affects low frequencies (Møller 2000). In contrast, our results in Figure 4 show that the effect of MPC was reduced for all three masker levels as the masker duration decreased from 200 to 30 ms, and that the duration effect increased with increasing signal frequency.

Another candidate mechanism is the medial olivocochlear reflex (MOCR), which also has a relatively long buildup time but can be activated by stimuli presented at levels as low as 40 dB SPL (Backus and Guinan 2006; Guinan 2006). In addition, physiological data suggest that the effects of efferent activation are stronger for medium to high frequencies than for low frequencies (Kawase et al. 1993; Kawase and Liberman 1993). In the section below, a model based on BM filter phase characteristics and compression is first used to predict the data, and an additional stage is then added to explore whether including some known characteristics of the MOCR improves the model predictions. In modeling the MOCR, it is assumed that maskers with flatter internal waveforms elicit a stronger MOCR than maskers with the same rms level that produce peakier internal waveforms.

Model predictions

Predicting masked thresholds using static compression and a level-dependent filter with a constant phase curvature

Predictions for masking by Schroeder-phase complexes were generated by combining the magnitude response of a gammachirp filter (Irino and Patterson 1997) tuned to the signal frequency with a phase response characterized by a constant phase curvature, as estimated by Oxenham and Dau (2001b). A fitted phase response was used because earlier modeling results have indicated that the phase response of the gammachirp filter does not adequately characterize the phase response of the auditory filter (Lentz and Leek 2001; Oxenham and Dau 2001a). Masker waveforms at the output of the filter were half-wave rectified and lowpass filtered using an eighth-order Butterworth filter with a cutoff frequency of 500 Hz. The squared half-wave-rectified and lowpass-filtered waveforms were then compressed with an exponent of 0.2 for the on-frequency maskers; linear processing (after squaring to produce an energy-like quantity) was assumed for the off-frequency maskers. The compression exponent was chosen based on psychophysical masking data, which often yield compression estimates of around 0.2 for stimuli between about 40 and 80 dB SPL (Oxenham and Moore 1995; Oxenham and Plack 1997; Plack et al. 2006) and suggest that compression is approximately constant across CFs (Lopez-Poveda et al. 2003; Plack and Drga 2003). To make threshold predictions, it was assumed that the signal threshold corresponds to a fixed signal-to-masker ratio after compression and integration. Because we were not predicting the decay of forward masking, the shape (or weighting function) of the temporal integrator was not relevant, and it was only necessary to integrate the signal and masker separately over an integer number of stimulus cycles. The criterion signal-to-masker ratio, the model’s only free parameter, was set independently for each masker condition (each curve in Figs. 2 and 3) so that the predicted threshold for C = −1 coincided with the measured data point for that condition.
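The processing chain described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a roex(p) magnitude stands in for the gammachirp, a cascade of one-pole lowpass sections stands in for the eighth-order Butterworth filter, and the masker parameters (100-Hz fundamental, 30 components, 2-kHz filter) and the filter curvature (chosen here to cancel the curvature of the C = +1 masker) are hypothetical. Because a signal level change of dL dB shifts the compressed internal signal level by 0.2 dL dB, the threshold change relative to one reference C equals the change in the masker's internal level divided by the exponent, so the criterion signal-to-masker ratio drops out.

```python
import math


def filtered_masker(c, f0, n_comp, fc, curvature, fs, n_samples):
    """Schroeder-phase complex (Eq. 1) after a filter with a roex(p) magnitude
    (hypothetical stand-in for the gammachirp) and quadratic phase
    0.5 * curvature * (f - fc)**2, i.e., constant curvature in rad/Hz^2."""
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    p = 4.0 * fc / erb
    comps = []
    for n in range(1, n_comp + 1):
        f = n * f0
        g = abs(f - fc) / fc
        amp = math.sqrt((1.0 + p * g) * math.exp(-p * g))  # power weight -> amplitude
        theta = c * math.pi * n * (n - 1) / n_comp            # masker phase, Eq. 1
        phi = 0.5 * curvature * (f - fc) ** 2                 # filter phase
        comps.append((2.0 * math.pi * f / fs, amp, theta + phi))
    return [sum(a * math.cos(w * i + ph) for w, a, ph in comps)
            for i in range(n_samples)]


def lowpass(x, fs, cutoff=500.0, order=4):
    """Cascade of one-pole lowpass sections (unity DC gain): a simplified
    stand-in for the eighth-order Butterworth filter in the text."""
    a = math.exp(-2.0 * math.pi * cutoff / fs)
    y = list(x)
    for _ in range(order):
        state = 0.0
        for i, v in enumerate(y):
            state = (1.0 - a) * v + a * state
            y[i] = state
    return y


def internal_level_db(x, fs, period, exponent=0.2, skip_periods=2):
    """Half-wave rectify, lowpass, square, compress, and average over an
    integer number of periods after discarding the filter onset transient."""
    rect = [max(v, 0.0) for v in x]
    smooth = lowpass(rect, fs)
    comp = [(v * v) ** exponent for v in smooth]
    tail = comp[skip_periods * period:]
    tail = tail[: (len(tail) // period) * period]
    return 10.0 * math.log10(sum(tail) / len(tail))


def threshold_shifts(c_values, f0=100.0, n_comp=30, fc=2000.0, curvature=None,
                     fs=20000, n_periods=6, exponent=0.2):
    """Predicted thresholds (dB) relative to the first C value, assuming a
    fixed signal-to-masker ratio after compression."""
    if curvature is None:
        # Hypothetical filter curvature that cancels the C = +1 masker curvature.
        curvature = -2.0 * math.pi / (n_comp * f0 ** 2)
    period = int(round(fs / f0))
    levels = []
    for c in c_values:
        x = filtered_masker(c, f0, n_comp, fc, curvature, fs, n_periods * period)
        levels.append(internal_level_db(x, fs, period, exponent))
    return [(lev - levels[0]) / exponent for lev in levels]


c_values = [-1.0, 0.0, 1.0]
for c, s in zip(c_values, threshold_shifts(c_values)):
    print(f"C = {c:+.0f}: predicted threshold {s:+.1f} dB re: C = -1")
```

With these assumptions, the lowest predicted threshold occurs at C = +1, where the masker and filter curvatures cancel and the internal waveform is peakiest. The exact dB values depend on the stand-in filters and should not be read as predictions of the measured data.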

Dashed lines show predictions for the 200-ms on-frequency maskers (upper panels in Fig. 2), the 30-ms on-frequency maskers (lower panels in Fig. 2), and both durations of the off-frequency maskers (Fig. 3). For the 30-ms on-frequency maskers, the predictions were in reasonable agreement with the data for signal frequencies of 1 and 6 kHz. Although discrepancies were observed for certain C values with the 85-dB SPL masker at 1 kHz, the effect of MPC, quantified as the difference between the maximum and minimum masked thresholds, was well predicted by the model. For the 2-kHz signal frequency and an on-frequency masker (Fig. 2, lower middle panel), greater discrepancies between the data and model predictions were apparent. The minimum of the predicted thresholds occurred at a larger C value than the minimum of the measured thresholds. This suggests that the average filter phase curvature at 2 kHz in our listeners, estimated in forward masking, may have had a smaller absolute value than that estimated by Oxenham and Dau (2001b) using simultaneous masking. For masker levels of 45 and 85 dB SPL, the effect of MPC was larger for the predicted than for the measured thresholds, whereas the opposite was true for the 65-dB SPL masker. To reconcile the data and the predictions for the 2-kHz signal, a slightly larger compression exponent (less compression) would have to be used at 45 and 85 dB SPL and a smaller exponent at 65 dB SPL, in addition to a smaller absolute value of the phase curvature of the 2-kHz filter. Stronger compression at medium levels has been reported in both physiological and psychophysical estimates of peripheral compression (e.g., Oxenham and Plack 1997; Ruggero et al. 1997).
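The link between the C value at the masking minimum and the filter phase curvature can be made explicit. Writing Eq. 1 as a function of component frequency \( f = n f_0 \), where \( f_0 \) denotes the masker fundamental (a symbol introduced here), gives a constant masker phase curvature:

$$ \theta (f) = \frac{C\pi }{N}\left( {\frac{f^2}{f_0^2} - \frac{f}{f_0}} \right),\qquad \frac{{\partial^2}\theta (f)}{\partial {f^2}} = \frac{2\pi C}{Nf_0^2}. $$

Assuming the filter phase has a constant curvature \( c_f \) over its passband (another symbol introduced here), the filtered masker has curvature \( 2\pi C/(Nf_0^2) + c_f \), and the peakiest internal waveform, and hence the least masking, is expected where this sum vanishes:

$$ {C_{\min }} = - \frac{{c_f}Nf_0^2}{2\pi }. $$

Because \( c_f \) is negative near CF, \( C_{\min} \) is positive, and a masking minimum at a larger C corresponds to a filter curvature of larger absolute value. This is the reasoning behind interpreting the 2-kHz discrepancy as a smaller absolute filter curvature in our listeners.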

The model predictions were poorer for the 200-ms on-frequency maskers (upper panels in Fig. 2) at the 2- and 6-kHz signal frequencies, whereas predictions for the on-frequency maskers at the 1-kHz signal frequency were about as accurate as those for the 30-ms masker. In particular, at the two higher signal frequencies, compression and the fitted phase curvature alone were insufficient to account for the large changes in masked thresholds across C values for masker levels of 65 and 85 dB SPL. Much smaller compression exponents (i.e., greater compression) would be necessary to produce MPC effects consistent with the data, but such exponents would overestimate the size of the effect for the 30-ms maskers. Interestingly, when Carlyon and Datta (1997a) attempted to model the differences in forward masking between their long-duration Schroeder-positive and Schroeder-negative maskers, they found that even raising sound pressure to the power of 0.1 (equivalent to raising intensity to the power of 0.05, a compression ratio of 20:1) was not sufficient to fully account for the observed masking differences. Our attempts to predict our data for a 1-kHz signal (as used by Carlyon and Datta) were more successful with a physiologically realistic compression ratio. It is not clear what accounts for this apparent discrepancy. One possibility is their use of a different filter: they used a roex filter with parameters adjusted using the POLYFIT procedure described by Rosen and Baker (1994), whereas we used a gammachirp filter. However, this seems unlikely to provide a full explanation because, if anything, the filter used by Carlyon and Datta (1997a) is broader than the gammachirp filter and so should have led to larger differences in the envelopes of the filtered waveforms.

To predict data from the off-frequency conditions, the off-frequency maskers were not compressed and were only subjected to half-wave rectification, lowpass filtering, and squaring before integration. The phase curvature of the filter was also set to zero to match physiological (Ruggero et al. 1997) and psychophysical (Oxenham and Ewert 2005) estimates of off-frequency auditory filter phase curvature. The dashed lines in Figure 3 show predictions for the 200-ms (circles) and 30-ms (squares) off-frequency maskers. As expected from an integrator model with no compression, no MPC effects were predicted. Although the predictions are consistent with the pattern of results at 1- and 2-kHz signal frequencies, they are clearly inconsistent with the results at 6 kHz.
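Why an integrator without compression predicts no MPC effect can be seen from a short Parseval-type argument. Ignoring the half-wave rectification and lowpass stages (which make the result only approximate), let component \( n \) have amplitude \( a_n \) and let the filter have magnitude \( |H(f)| \) and phase \( \varphi (f) \) (symbols introduced here). Over one masker period \( T = 1/f_0 \), the harmonics are orthogonal, so the mean power of the filter output depends only on the component magnitudes and not on the phases \( \theta_n \):

$$ y(t) = \sum\limits_n {{a_n}\left| {H({f_n})} \right|\cos \left( {2\pi {f_n}t + {\theta_n} + \varphi ({f_n})} \right)} ,\qquad \frac{1}{T}\int_0^T {{y^2}(t)\,dt} = \frac{1}{2}\sum\limits_n {a_n^2{{\left| {H({f_n})} \right|}^2}} . $$

A compressive nonlinearity applied before integration, such as the 0.2 exponent used for the on-frequency maskers, removes this phase invariance, because the integral then depends on the shape of the envelope and not only on its power.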

In summary, a model using filtering and static compression provided a good account of MPC effects with both on- and off-frequency forward masking using a 1-kHz signal. However, the model was not able to predict the on-frequency data from both masker durations for the 2- and 6-kHz signals, or the off-frequency data using a 6-kHz signal.

Simulating the effects of the MOCR

The working hypothesis considered in this section is that MPC effects for the 30-ms forward maskers were determined by the fast-acting effects of basilar membrane compression and filter phase curvature, whereas the MPC effects for the 200-ms maskers were additionally influenced by MOCR activation. The MOCR activation is assumed to reduce the cochlear gain of the response to the signal, thereby raising the level of the signal needed to achieve the threshold signal-to-masker ratio. A similar approach was used by Strickland (2008) to explain changes in the shape of a function characterizing growth of forward masking with the level of a precursor. Unfortunately, little is known about the MOCR, and the available data exhibit substantial variability across listeners and conditions. The effect of the MOCR in humans has recently been estimated from changes in the amplitude of stimulus-frequency otoacoustic emissions (SFOAEs) observed in the presence of an elicitor of the reflex (for a review, see Guinan 2006). All SFOAE-based measurements of the MOCR in humans have used probe frequencies around 1 kHz. Our working hypothesis is therefore speculative and should be viewed merely as a direction we chose in order to explore potential consistency between the patterns in our data and the MOCR characteristics derived from the available human SFOAE data.

In generating predictions, it was assumed that the MOCR-induced gain reduction raises the signal level required at masked threshold. Because of its relatively slow time course, the MOCR was assumed to enhance MPC effects for the 200-ms maskers more than for the 30-ms maskers.

Another assumption was that MOCR effects are frequency dependent, playing a negligible role at 1 kHz and progressively stronger roles at 2 and 6 kHz. This assumption was based on the observation that, for the 1-kHz signal, a model using a level-dependent filter with realistic (i.e., derived from human data) phase curvature and compression produced accurate predictions for both on- and off-frequency maskers and for both long and short masker durations. The assumption that efferent activation did not contribute to the effects of MPC at 1 kHz may appear inconsistent with studies that have shown strong effects of MOCR activation on SFOAEs evoked by a 1-kHz tone (e.g., Backus and Guinan 2006). This apparent discrepancy can be resolved if it is assumed that the maskers for the 1-kHz signal did activate the MOCR, but that the differences in BM excitation across MPCs, and thus the differences in MOCR effect, were too small to produce additional differences in masking, because relatively few masker components are processed by the cochlear filter tuned to 1 kHz. For the signal frequencies of 2 and 6 kHz, it was assumed that the BM response to the off-frequency masker at the signal CF place was linear, so that an effect of MPC, when present, could be attributed entirely to the MOCR. Differences in efferent activation would result from differences in excitation level primarily in filters with CFs within the frequency range of the off-frequency masker components. The reflex is assumed to reduce gain at the signal CF place, even though all the components of the elicitor are below the signal frequency. This “upward spread” of the MOCR effect is supported by the data of Lilaonitkul and Guinan (2009), which show a reduction in SFOAEs due to the MOCR for elicitors well below the probe frequency. The effect of the MOCR is described in terms of its contribution to the difference between the maximum and minimum masked thresholds in dB, shown in Figure 4.

Based on the time course of the MOCR described by Backus and Guinan (2006), it was assumed that the MOCR effect started with a delay of d = 25 ms from masker onset and that the buildup of the reflex was dominated by the fast time constant of 70 ms. Because the signal followed the masker with a 0-ms delay, the MOCR buildup time until signal offset was:

$$ {t_r} = \left( {{D_m} - d} \right) + {D_s}, $$
(3)

where \( D_m \) and \( D_s \) were the masker and signal durations, respectively. It was assumed that the magnitude of the reflex was directly reflected in the gain reduction, ΔGR, due to the MOCR at the time of the signal presentation, and thus that the gain reduction could be described by the same time course as the MOCR (Backus and Guinan 2006):

$$ \Delta GR\left( {t_r} \right) = G{R_{\max }}\left( {1 - \exp \left( { - \frac{t_r}{\tau }} \right)} \right) $$
(4)

where τ = 70 ms, and \( GR_{\max} \) was the maximum gain reduction caused by the MOCR elicited by a given masker.
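Eqs. 3 and 4 can be sketched in a few lines of Python. The 10-ms signal duration is not stated in this section; it is inferred from the buildup times of 185 and 15 ms that the text uses for the 200- and 30-ms maskers. The function names are illustrative.

```python
import math


def buildup_time_ms(masker_ms, signal_ms, delay_ms=25.0):
    """Eq. 3: MOCR buildup time from reflex onset (delay_ms after masker onset)
    to signal offset, for a signal that immediately follows the masker."""
    return (masker_ms - delay_ms) + signal_ms


def gain_reduction_db(t_r_ms, gr_max_db, tau_ms=70.0):
    """Eq. 4: gain reduction reached after t_r_ms of buildup, assuming a
    single (fast) time constant tau_ms."""
    return gr_max_db * (1.0 - math.exp(-t_r_ms / tau_ms))


SIGNAL_MS = 10.0  # inferred from the t_r values of 185 and 15 ms in the text
for masker_ms in (200.0, 30.0):
    t_r = buildup_time_ms(masker_ms, SIGNAL_MS)
    frac = gain_reduction_db(t_r, 1.0)  # fraction of the maximum reduction
    print(f"{masker_ms:.0f}-ms masker: t_r = {t_r:.0f} ms, "
          f"{100 * frac:.0f}% of maximum gain reduction")
```

For the 30-ms masker, the buildup time is only about one fifth of the time constant, so the reflex reaches less than 20% of its maximum, whereas after 185 ms it reaches about 93%.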

Since the off-frequency MPC effect was significant only for the 6-kHz signal, the data for this signal frequency were considered first to test the working hypothesis. As shown in Figure 4, the difference between the maximum and minimum threshold for the 200-ms off-frequency masker was 12.2 dB. Assuming the signal was subjected to compression with an exponent of 0.2, this change in signal level corresponds to a gain reduction of 12.2 × 0.2 = 2.44 dB. Rearranging Eq. 4 and substituting this estimated gain reduction for the signal, ΔGR(185 ms) = 2.44 dB, gave the maximum gain reduction, \( GR_{\max} \). This value of \( GR_{\max} \) was then used in Eq. 4 to estimate the gain reduction due to the MOCR for the signal following the 30-ms off-frequency masker, ΔGR(15 ms). For a signal compression exponent of 0.2, the predicted difference between the highest and lowest thresholds for the 30-ms masker was 2.5 dB. This value was smaller than the 6.5-dB difference in the data, as shown in Figure 4 (the rightmost bar in the right panel). However, the effect of phase curvature for the 30-ms off-frequency masker was not statistically significant, and thus the difference between the predicted and obtained values may also not be statistically significant.
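The arithmetic in this paragraph can be written out explicitly using Eq. 4 (values rounded):

$$ G{R_{\max }} = \frac{\Delta GR\left( {185} \right)}{1 - \exp \left( { - 185/70} \right)} \approx \frac{2.44}{0.9288} \approx 2.627\ \mathrm{dB}, $$

$$ \Delta GR\left( {15} \right) = G{R_{\max }}\left( {1 - \exp \left( { - 15/70} \right)} \right) \approx 2.627 \times 0.1929 \approx 0.507\ \mathrm{dB}, \qquad \frac{0.507}{0.2} \approx 2.5\ \mathrm{dB}. $$

Equivalently, the predicted 30-ms difference is 12.2 × (1 − e^{−15/70})/(1 − e^{−185/70}) ≈ 12.2 × 0.208 ≈ 2.5 dB, so the assumed compression exponent cancels out of this particular prediction.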

To predict the effect of the MOCR on the difference between the maximum and minimum threshold for each on-frequency masker, it was assumed that the gain reduction caused by the MOCR depended on the masker (i.e., elicitor) level. Backus and Guinan (2006) estimated that the change in SFOAE amplitude due to the MOCR increased by about 2% per 1-dB increase in elicitor level, for the levels between 40 and 60 dB SPL used in their study. Thus, the effects for the 45- and 65-dB SPL on-frequency maskers were estimated as the appropriate proportion of the effect for the 85-dB SPL masker, which was assumed to be equal for the on- and off-frequency maskers (see Footnote 2). The left panel of Figure 5 shows differences between the maximum and minimum threshold for three levels of the on-frequency masker for the 6-kHz signal. The gray and black bars show the maximum–minimum difference for the 30- and 200-ms maskers, respectively. The white bars show the predicted difference obtained by adding the estimated difference between the maximum and minimum threshold due to the MOCR to the difference observed for the 30-ms masker. If the hypothesis that the results for the 200-ms masker can be described as a sum of MOCR effects and the effects of BM compression and filtering is correct, then the black and white bars should be equal in height. For the 6-kHz signal, the predictions are consistent with the hypothesis (left panel of Fig. 5).
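The level scaling and the additive prediction behind the white bars can be sketched as follows. The exact conversion of the 2%-per-dB figure into a proportion is given in Footnote 2; the sketch assumes one simple reading, a linear decrease of 2% of the 85-dB effect per dB below 85 dB SPL, which is an assumption rather than the authors' stated formula. The function names are hypothetical, and any input values passed to them are placeholders, not data from Figure 5.

```python
def level_scale(level_db, ref_db=85.0, pct_per_db=2.0):
    """Hypothetical linear reading of the ~2%/dB growth: the MOCR effect at
    level_db as a fraction of the effect at ref_db, clipped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - pct_per_db / 100.0 * (ref_db - level_db)))


def predicted_long_masker_diff_db(diff_30ms_db, mocr_effect_ref_db, level_db):
    """White-bar prediction: the max-min threshold difference observed for the
    30-ms masker plus the level-scaled MOCR contribution."""
    return diff_30ms_db + level_scale(level_db) * mocr_effect_ref_db


for level in (45.0, 65.0, 85.0):
    print(f"{level:.0f} dB SPL: MOCR effect scaled by {level_scale(level):.1f}")
```

Under this reading, the MOCR contributions for the 45- and 65-dB SPL maskers are 20% and 60% of the 85-dB SPL contribution, respectively.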

FIG. 5

Differences between the maximum and minimum threshold replotted from Figure 4 for the 6-kHz signal (left panel) and the 2-kHz signal (right panel). The hatched bars are for the 30-ms maskers, and the black bars are for the 200-ms masker. The white bars represent differences obtained by adding the difference for the 30-ms masker to an estimated effect of the MOCR.

For the 2-kHz signal, the effect of the MPC was not significant for the off-frequency masker, and thus the predicted effect of the MOCR is underestimated for the on-frequency masker as shown by different heights of the white and black bars in the right panel of Figure 5. The failure to predict accurately some of the data for the 2-kHz signal condition may be due to differences in MOCR activation between the on- and off-frequency maskers, which are not accounted for in our simple model. As discussed above, it seems that it is not necessary to invoke MOCR effects, or anything other than known cochlear processes, to account for the data from the 1-kHz signal frequency, and so these data are not shown in Figure 5.

General discussion

The first question posed in this study was whether the combination of BM compression and negative auditory filter phase curvature is sufficient to account for MPC effects in forward masking. Our results suggest that the answer depends on the frequency region in question. For the 1-kHz signal, the MPC effects with the on-frequency masker, and the lack of MPC effects with the off-frequency masker, could be accounted for by a simple model of the auditory periphery for both long and short masker durations. For the two higher frequencies, the results were more complex. For the 2-kHz signal, there was also no significant off-frequency MPC effect, in line with the expectations based on the lack of BM compression for masker frequencies below CF; however, the greater MPC effects for the longer on-frequency masker were not consistent with predictions based on a static nonlinearity. For the 6-kHz signal, neither the off-frequency MPC effects nor the effects of masker duration were consistent with predictions based simply on BM compression and negative auditory filter phase curvature.

The second question posed in this study was whether estimates of auditory filter phase curvature obtained using forward masking would be similar to earlier estimates derived using simultaneous masking. The masking minima found in the present study were consistent across the two masker durations and were generally in line with earlier estimates obtained using simultaneous masking. This supports the idea that the minimum in the masking function reflects the phase curvature of the auditory filter, which is not influenced by the type or duration of the masker.

To explain the results that were inconsistent with a simple (static) peripheral nonlinearity, we explored the potential effects of a feedback-based gain reduction mechanism with a relatively long time constant, which we associated with the MOCR. Although there is currently insufficient knowledge about the physiological and behavioral effects of the MOCR to allow strong conclusions to be drawn, we showed that at least some of the patterns in the data could in principle be accounted for by such a mechanism. If such an explanation is pursued, our data provide strong and testable predictions. For instance, our hypothesis predicts that elicitors with “flat” temporal envelopes should produce stronger MOCR activation (and hence reduced cochlear gain) than elicitors with the same rms level but more highly modulated temporal envelopes. Further testing of these and other resultant predictions will be necessary to provide a firmer basis from which to accept or reject the hypothesis that the MOCR influences temporal masking patterns.