Introduction

Cochlear mechanics are influenced by efferent nerve fibers originating at the superior olivary complex in the brainstem. Neurons with cell bodies located near the medial superior olivary nucleus have axons that project mostly to the contralateral side and terminate on outer hair cells (OHCs; Warr, 1992). Electrical stimulation of the olivocochlear bundle reduces otoacoustic emission (OAE) amplitudes (Mountain, 1980; Siegel and Kim, 1982). Activation of olivocochlear efferents reduces basilar membrane (BM) vibration in a frequency-dependent manner (Dolan et al., 1997; Murugasu and Russell, 1996). These results suggest that medial olivocochlear (MOC) activation reduces the transduction potentials of OHCs, thereby reducing the gain of the cochlear amplifier and the corresponding BM vibration.

Because of their noninvasive nature, OAEs have emerged as a key measurement tool in studying the MOC reflex in humans. OAEs are thought to be a good assay of the MOC reflex because OAEs arise from the action of OHCs, and the MOC reflex acts through synapses on OHCs to alter BM mechanics. However, care is needed when interpreting the effects of MOC activation on OAEs.

Some possible confounds were discussed by Guinan et al. (2003). First, some OAE probe stimuli themselves may elicit the MOC reflex, making the shift in the OAE smaller than it would otherwise be in the absence of this partial activation of the reflex by the probe. Second, the MOC activator is generally presented to the contralateral ear to avoid acoustic contamination of the OAE (Collet et al., 1990; Veuillet et al., 1991; Maison et al., 1997). Anatomical evidence suggests that the ipsilateral reflex is stronger than the contralateral reflex (Guinan et al., 1983; Robertson and Gummer, 1985; Warr, 1992), where “ipsilateral” refers to the case when activator and probe are presented in the same ear.

Guinan et al. (2003) presented data from human ears suggesting that using stimulus frequency otoacoustic emissions (SFOAEs) as an assay of the MOC reflex may overcome these problems: a low-level [approximately 40 dB sound pressure level (SPL)] SFOAE probe tone does not itself elicit the MOC reflex and allows measurements to be made using ipsilateral noise activators. A disadvantage of ipsilateral noise activators is that SFOAEs are elicited by so-called “two-tone” suppression during the presentation of the activator. It is therefore possible to measure MOC reflex effects only after the activator has been turned off.

The initial report by Guinan et al. (2003) suggested that SFOAEs and ipsilateral activators may be an effective way to measure the MOC reflex; however, their study reported data from a limited group of subjects at a single frequency and over a limited time interval. Part of their subject inclusion criteria was that subjects had easily measurable MOC efferent effects. If SFOAE tests of MOC reflex are to find clinical value, it is important to know the prevalence of measurable MOC effects in the general, normal-hearing population. Auditory nerve fiber thresholds in the cat (Guinan and Gifford, 1988) suggest that the MOC reflex is strongest across a range of frequencies in the lower mid-frequency range of hearing. The efferent innervation patterns in humans are not known, and MOC reflex effects in humans might also vary across a range of frequencies. Guinan et al. (2003) reported MOC shifts in only a 100-ms interval beginning 50 ms after cessation of a noise activator. The sensitivity of detecting MOC efferent shifts may be strongly affected by the probe frequency relative to SFOAE fine structure. Although Guinan et al. noted that their protocol may have produced an alignment with each subject's fine structure, they made no attempt to measure this directly.

Measurement of the MOC reflex may also be easily contaminated by the middle-ear muscle (MEM) reflex. Effective activators of the MOC reflex are also effective activators of the MEM reflex. Elicitation of the MEM reflex results in a stapedius muscle contraction, which alters the source sound pressure in the ear canal, primarily at low frequencies (Moller, 1983). This change in sound pressure may easily be mistaken for a MOC reflex, which can have a similar temporal pattern (Guinan et al., 1983).

To date, the majority of studies using OAEs to study MOC reflex in humans have only indirectly addressed the problem of MEM reflex (Mott et al., 1989; Collet et al., 1990; Veuillet et al., 1991, 1992; Hood et al., 1996, 2003; Maison et al., 1997, 1999). The studies of Burns et al. (1993) and Whitehead et al. (1991) suggest that emissions may be sensitive to small, subclinical MEM contractions, making direct measurement desirable. Two exceptions to the indirect evaluation of MEM reflex contamination exist. Buki et al. (2000) examined phase and level shifts of the 2f1–f2 distortion product otoacoustic emission (DPOAE) as a function of frequency. When high-level noise was presented to activate the MEM reflex, phase leads were seen below 2 kHz, and level decreases were seen below 1 kHz. In contrast, when lower-level noise was used to activate the MOC reflex, no obvious frequency dependence was seen in phase or level changes. Buki et al. (2000) suggested that this frequency dependence could be used to separate MEM and MOC effects. Guinan et al. (2003) evaluated the contributions of the MEM reflex by making several measurements at closely spaced frequencies and then examining the phase gradient of the responses. In this method, a shift caused by an MEM reflex is expected to have a short group delay, corresponding to a shallow phase gradient, compared to the much longer delays of SFOAEs associated with MOC-induced shifts (Shera and Guinan, 1999).

One of the main disadvantages of such methods is the need to make sequential measurements at several closely spaced frequencies, which is time consuming and may make clinical implementation more difficult. In addition, the activation of the MEM reflex can vary across runs even when stimulus levels are fixed (Guinan et al., 2003). Because of this, interpretation of sequential measurements may be difficult when activator levels are near the MEM reflex threshold.

An alternative method, which may overcome these difficulties, is to simultaneously measure the MEM and MOC reflexes using low- and high-frequency probe tones, respectively, presented together. The use of a low-frequency (226 Hz) probe tone to detect an MEM reflex via changes in acoustic admittance is a standard clinical practice. However, nonlinear signal extraction techniques such as those used to measure SFOAE residuals can be used to provide a more sensitive probe of the MEM reflex (Neumann et al., 1996).

Based on these considerations, the current study was designed to further examine the use of SFOAE probes and ipsilaterally presented activators to measure the MOC reflex. A sample of subjects was chosen with normal hearing and middle-ear function, but without regard to MOC reflex strength. Responses were examined near frequencies of 1.5 and 3.5 kHz. The exact probe frequency for each subject was chosen relative to their SFOAE fine structure. A novel method was used to simultaneously measure the MOC and MEM reflexes, and a nonlinear signal extraction technique was used to improve MEM reflex detection capability. Changes in SFOAE amplitude after the offset of a broadband noise activator were examined. It was hypothesized that (1) SFOAE probes would provide a sensitive measure of the ipsilateral MOC reflex in most subjects; (2) probes near 1.5 and 3.5 kHz would both be sensitive indicators of the MOC reflex; and (3) a simultaneously presented low-frequency probe tone would provide a sensitive measure of MEM reflex activation.

Methods

Subjects

Fifteen subjects in the age range 14–34 years [mean = 24.7 years, standard deviation (SD) = 6.5 years] were tested. All subjects had pure-tone air conduction thresholds ≤15 dB HL at octave frequencies from 0.25 to 8 kHz and normal middle-ear function, as assessed by normal 226 Hz tympanograms at the time of testing. The subjects included 11 females and 4 males. During testing, subjects were seated in a comfortable chair within a sound-attenuated booth. The experimental protocol was approved by the Institutional Review Board at Boys Town National Research Hospital, and written informed consent was obtained from all participants.

Measurement paradigm

Stimuli were digitally generated and recorded at a 22,050-Hz sampling rate using a personal computer, a Digital Audio Labs CardDeluxe sound card, and custom software. Stimulus delivery and response acquisition were made via an Etymotic Research ER-10C probe microphone system, using two loudspeakers and a microphone. The system was modified by Etymotic to provide 20 dB of additional gain in the signal driving each loudspeaker. SFOAEs were collected using a double-evoked (2E) method (Keefe, 1998; Keefe and Ling, 1998). In this paradigm, a total of three stimulus intervals were presented. The first stimulus (s 1) was presented through a first loudspeaker, followed by a presentation of the second stimulus (s 2) through a second loudspeaker. Then both stimuli were presented simultaneously (s 12), each through its own loudspeaker. The ear canal sound pressure p 1 was measured in response to stimulus s 1, p 2 to stimulus s 2, and p 12 to stimulus s 12. The nonlinear residual p d was extracted by calculating

$$p_{\text{d}} = p_{1} + p_{2} - p_{12}. \tag{1}$$

By this process, the signals themselves along with any isochannel system distortion were canceled, leaving only the nonlinear components (generally interpreted as OAEs or MEM reflexes) and any intermodulation distortion generated by the system. System distortion was assessed from coupler recordings using an artificial human ear simulator (Brüel and Kjær type 4157) that approximated the impedance of an average adult ear (IEC 711 standard). System distortion was below the level of the noise floor at all levels used in this experiment. Sets of stimulus waveforms (s 1, s 2, and s 12) were interleaved, and online and post hoc artifact rejection were used (see Analysis for details), resulting in stable and repeatable results.
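The cancellation described above can be illustrated with a toy calculation. The following Python sketch is not the software used in this study; the cubic term in `ear_response`, the tone amplitudes, and the 900-Hz second tone are hypothetical choices made only to show that Eq. (1) returns zero for a purely linear system and a nonzero residual when a level-dependent term is present.

```python
import math

def ear_response(stimulus, nonlinear_gain):
    """Toy ear-canal pressure: a linear term plus a compressive cubic term.

    The cubic term is a hypothetical stand-in for any level-dependent
    contribution; it is not a model of the cochlea or the middle ear.
    """
    return [x - nonlinear_gain * x ** 3 for x in stimulus]

def double_evoked_residual(p1, p2, p12):
    """Eq. (1): p_d = p1 + p2 - p12, computed sample by sample."""
    return [a + b - c for a, b, c in zip(p1, p2, p12)]

FS = 22050  # sampling rate in Hz, as stated in the text
N = 2205    # 100 ms of samples (illustrative length)
s1 = [0.5 * math.sin(2 * math.pi * 1500 * i / FS) for i in range(N)]
s2 = [0.3 * math.sin(2 * math.pi * 900 * i / FS) for i in range(N)]
s12 = [a + b for a, b in zip(s1, s2)]  # joint presentation

def peak_residual(nonlinear_gain):
    """Largest absolute value of the 2E residual for the toy system."""
    p1 = ear_response(s1, nonlinear_gain)
    p2 = ear_response(s2, nonlinear_gain)
    p12 = ear_response(s12, nonlinear_gain)
    return max(abs(v) for v in double_evoked_residual(p1, p2, p12))

linear_peak = peak_residual(0.0)     # purely linear system
nonlinear_peak = peak_residual(0.1)  # system with a compressive term
print(linear_peak, nonlinear_peak)
```

In this sketch the linear contributions cancel in the residual, whereas the cubic term leaves cross terms that depend on both stimuli, which is the sense in which the residual isolates nonlinear components.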

Stimuli

A typical stimulus set is shown in Figure 1. The first stimulus (s 1) condition consisted of a higher-frequency probe tone (P H) and a lower-frequency probe tone (P L) presented simultaneously. The overall duration of each s 1 stimulus buffer was 2000 ms, during which the probe tone was gated on for 1350 ms and was silent for 650 ms (Fig. 1, top panel). The probe s 1 was gated on at 50 ms and gated off at 1400 ms with 5-ms gating ramps. The 650-ms silent interval was chosen to allow the auditory system to return to baseline. The second stimulus (s 2) condition was an acoustic MOC activator consisting of a notched broadband noise and had the same overall buffer duration of 2000 ms (Fig. 1, middle panel). The activator was gated on for 500 ms starting at an onset time of 300 ms, which was 250 ms after the onset of the s 1 probe. The activator ended at 800 ms, which provided a 600-ms postactivator window (i.e., between 800 and 1400 ms) in which the probe remained on and possible efferent effects could be observed. The activator used the same 5-ms gating ramps as did the probe. The activator duration of 500 ms was chosen based on data suggesting that MOC onset effects in humans occur with a time constant of about 69 ms (Kim et al., 2001). The postactivator duration of 600 ms was chosen based on data showing that an important class of efferent effects decays with time constants of 50–100 ms (Guinan, 1990; Tavartkiladze et al., 1996). The third stimulus condition (s 12) consisted of the joint presentation of s 1 and s 2 (Fig. 1, bottom panel).

Fig. 1

Set of stimulus waveforms: s 1 (top), s 2 (middle), and s 12 (bottom) were presented in separate stimulus intervals. Stimulus s 1 consisted of a pair of sinusoidal probe tones gated on from 50 to 1350 ms; s 2 was a short-duration (500 ms) activator noise; s 12 was the joint presentation of s 1 and s 2. As indicated by vertical dashed lines in this and subsequent figures, the time period 50–299 ms is referred to as the “preactivator window,” the time period 300–800 ms is called the “activator window,” the time period 850–1350 ms is called the “postactivator window,” and 1351–2000 ms is called the “silent period.” The 50-ms delay between the end of the activator window (at 800 ms) and the beginning of the postactivator window (at 850 ms) allows for filter settling time and for cochlear suppression effects to die away. The vertical dashed line at the beginning of the postactivator period is drawn with a thicker line width in this and subsequent figures.

The probe frequency of P H was near 1.5 or 3.5 kHz, and the probe frequency of P L was near 0.25 kHz. Because the MEM reflex reduces sound transmission at low frequencies, and because the higher noise floor makes SFOAEs extremely difficult to measure near 0.25 kHz, it was assumed that P L would be sensitive only to MEM reflexes. Conversely, it was assumed that P H would be sensitive to SFOAEs and to processes that modify the strength of SFOAEs (e.g., the MOC reflex or other cochlear changes) because SFOAEs are generally measured in this frequency range, where MEM reflex effects are considered small.

Because the probe tones were low-level (40 dB SPL), it is likely that any effects of the activator varied with the total duty cycle of the activator stimulus, based on a single presentation of each of s 1, s 2, and s 12. The three stimulus buffers comprised a 6-s repetition period, during which the activator was on for a total of 1 s, yielding a duty cycle of 1/6.
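The duty-cycle arithmetic can be checked directly from the buffer timing given in the text. The sketch below is only a bookkeeping aid; the dictionary layout and variable names are illustrative.

```python
from fractions import Fraction

BUFFER_MS = 2000  # duration of each stimulus buffer (s1, s2, s12)

# Time the activator is on within each buffer, in ms.
# s1 contains only the probe tones; s2 and s12 each contain the 500-ms noise.
activator_on_ms = {"s1": 0, "s2": 500, "s12": 500}

repetition_ms = BUFFER_MS * len(activator_on_ms)  # one full stimulus set
total_on_ms = sum(activator_on_ms.values())       # activator time per set
duty_cycle = Fraction(total_on_ms, repetition_ms)

# Probe and postactivator durations implied by the gating times.
probe_on_ms = 1400 - 50       # probe gated on at 50 ms, off at 1400 ms
postactivator_ms = 1400 - 800  # activator off at 800 ms, probe off at 1400 ms

print(repetition_ms, total_on_ms, duty_cycle, probe_on_ms, postactivator_ms)
```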

Specific P H frequencies (f H) were chosen individually for each subject to be near their SFOAE fine structure amplitude maxima, but at least 100 Hz away from any synchronous spontaneous OAEs. Two frequency regions were tested for f H: 1.5 and 3.5 kHz. The frequency of P L (f L) was between 0.25 and 0.3 kHz, the exact frequency chosen to ensure that harmonic distortion of P L fell on either side of P H, thereby removing a source of potential artifact. This frequency is in the range of clinical probe frequencies (typically 226 Hz) used to detect the MEM reflex. P H was presented at a level of 40 dB SPL, and P L was presented at a level of 55 dB SPL.
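The constraint on f L can be expressed as a simple check that the harmonics of P L bracketing f H fall outside the analysis passband. The sketch below assumes the 30-Hz filter bandwidth stated in the Analysis section as the clearance criterion; the example frequencies (270 and 250 Hz against 1500 Hz) are hypothetical and are not values reported for any subject.

```python
def bracketing_harmonics(f_l_hz, f_h_hz):
    """Return the harmonics of f_l just at or below, and just above, f_h."""
    k = int(f_h_hz // f_l_hz)
    return k * f_l_hz, (k + 1) * f_l_hz

def harmonics_clear(f_l_hz, f_h_hz, passband_hz=30.0):
    """True if both bracketing harmonics lie outside the passband around f_h.

    The half-passband clearance rule is an assumption for illustration.
    """
    lower, upper = bracketing_harmonics(f_l_hz, f_h_hz)
    nearest = min(f_h_hz - lower, upper - f_h_hz)
    return nearest > passband_hz / 2.0

# Hypothetical candidates for f_L with a probe at 1500 Hz.
print(bracketing_harmonics(270.0, 1500.0), harmonics_clear(270.0, 1500.0))
print(bracketing_harmonics(250.0, 1500.0), harmonics_clear(250.0, 1500.0))
```

With these example values, 270 Hz places its fifth and sixth harmonics on either side of 1500 Hz, whereas 250 Hz places its sixth harmonic exactly on the probe and would be rejected.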

The MOC activator was an electrically flat notched noise with energy from 0.4 to 8 kHz. Broadband noise was chosen because it is an effective elicitor of the MOC reflex (Berlin et al., 1993). One purpose of using a notched noise was to measure an SFOAE signal when the SFOAE probe tone and the notched noise signals were simultaneously present (using a nonlinear SFOAE residual technique described later). A narrowband filter (30-Hz bandwidth) was used to extract an SFOAE signal such that the 30-Hz passband was contained in the notch of the notched noise. In the original experiments, the broadband noise contained a notch logarithmically centered at f H, with a notch width (NW) = 1/6, 1/3, 2/3, or 1 octave. The intent of varying notch width was to examine its effect on the strength of the MOC reflex. Nominal overall levels L A of the activator were 45, 51, 57, 63, 69, and 75 dB SPL, corresponding to spectral levels of 6, 12, 18, 24, 30, and 36 dB SPL (re 1 Hz bandwidth), respectively. Typical frequency spectra of the stimuli and activator, as measured in a subject's ear canal, are shown in Figure 2.
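The relation between overall and spectral level, and the edges of a logarithmically centered notch, can be sketched as follows. The calculation assumes a flat spectrum over the full 0.4 to 8 kHz band and ignores the small amount of energy removed by the notch; function names are illustrative.

```python
import math

def spectral_level_db(overall_db_spl, f_low_hz=400.0, f_high_hz=8000.0):
    """Spectral level (re 1-Hz bandwidth) of a flat noise of given overall level."""
    bandwidth_hz = f_high_hz - f_low_hz
    return overall_db_spl - 10.0 * math.log10(bandwidth_hz)

def notch_edges_hz(center_hz, width_octaves):
    """Lower and upper edges of a notch centered logarithmically at center_hz."""
    half = 2.0 ** (width_octaves / 2.0)
    return center_hz / half, center_hz * half

overall_levels = [45, 51, 57, 63, 69, 75]  # nominal L_A values from the text
spectral_levels = [round(spectral_level_db(level)) for level in overall_levels]

# Narrowest notch (1/6 octave) around a 1.5-kHz probe.
low_edge, high_edge = notch_edges_hz(1500.0, 1.0 / 6.0)
print(spectral_levels, low_edge, high_edge)
```

Under these assumptions the rounded spectral levels reproduce the values listed in the text, and even the narrowest notch is several times wider than the 30-Hz analysis filter.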

Fig. 2

Sample stimulus spectra. Low-frequency probes P L were presented at 55 dB SPL and were located between 250 and 300 Hz. Spectra are shown, in which the high-frequency probe tones P H were presented at 40 dB SPL and were located near 1.5 kHz (top) or 3.5 kHz (bottom). The MOC activator was an electrically flat, notched noise with energy from 0.4 to 8 kHz. The notch was logarithmically centered at f H with varying notch widths and levels. The example shown is for an activator of 45 dB SPL with a notch width of 1/3 octave.

Analysis

Data were analyzed using custom software written in MATLAB. For each measurement, N = 18 independent buffers of data were collected (where one buffer represents a set of p 1, p 2, and p 12 waveforms). Online artifact rejection was used to detect and reject data buffers that were contaminated by intermittent noise. The artifact rejection test is described in detail in Schairer et al. (2003). The 2E residual (p d) was calculated using Eq. (1), and responses to P H and P L were examined separately by filtering each set with a Kaiser window-based finite impulse response (FIR) filter (2463 taps, 55-ms group delay, 30-Hz equivalent rectangular bandwidth) centered at f H and f L. Each set of N p d waveforms was filtered. The Hilbert envelope and instantaneous phase (Oppenheim and Schafer, 1989) were extracted and averaged over the N waveforms to produce estimates of signal SPL and phase. The noise floor was calculated as the standard error of the mean (SEM) of the Hilbert envelopes and converted to a noise SPL (in dB). Signal-to-noise ratio (SNR) was calculated as the mean signal level minus the noise level, both expressed in dB.
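The level and SNR definitions above can be written out for a single time point. The following Python sketch is not the MATLAB code used in the study; it assumes a 20-µPa reference pressure, the sample standard deviation for the SEM, and a symmetric (linear-phase) FIR filter for the group-delay estimate. The envelope values are hypothetical.

```python
import math
import statistics

P_REF_PA = 20e-6  # reference pressure for dB SPL (assumed)

def to_db_spl(pressure_pa):
    """Convert a pressure amplitude in Pa to dB SPL."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)

def level_and_snr_db(envelopes_pa):
    """Signal level, noise level (SEM), and SNR in dB at one time point.

    envelopes_pa holds one Hilbert-envelope value per retained buffer.
    """
    n = len(envelopes_pa)
    mean_pa = statistics.mean(envelopes_pa)
    sem_pa = statistics.stdev(envelopes_pa) / math.sqrt(n)
    signal_db = to_db_spl(mean_pa)
    noise_db = to_db_spl(sem_pa)
    return signal_db, noise_db, signal_db - noise_db

# Group delay of a symmetric FIR filter: (taps - 1) / 2 samples.
TAPS, FS = 2463, 22050
group_delay_ms = (TAPS - 1) / 2 / FS * 1000.0

# Hypothetical envelope amplitudes (Pa) across four buffers.
signal_db, noise_db, snr_db = level_and_snr_db([1.0e-4, 1.2e-4, 0.9e-4, 1.1e-4])
print(round(group_delay_ms, 1), signal_db, noise_db, snr_db)
```

The group-delay estimate comes out slightly under 56 ms, in line with the approximately 55-ms value quoted for the filter.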

The nonlinear residual pressure p d extracted using responses to P H is called the “SFOAE signal” in this report (the absence of a significant residual response in coupler measurements verified that the p d residual was of biological origin). The response to P H did not involve a substantial contribution from the MEM reflex at f H and was thus concluded to be of cochlear origin. This SFOAE signal was a nonlinear SFOAE residual obtained from the SFOAE present in response to a pure tone (s 1) and the modified SFOAE present in response to the simultaneous presentation of the pure tone in a broadband notched noise (s 12). The calculation of the residual identified as the SFOAE signal in Eq. (1) also subtracted out the broadband notched-noise signal component (s 2), which was present at low levels in the narrowband filter output centered within the notch of the notched noise.

The nonlinear residual pressure p d extracted using responses to P L is called the “MEM signal” in this report for the following reasons. The low-frequency P L responses were measured at f L close to 0.25 kHz, where SFOAE responses were weak because of the elevated internal physiologic noise floor, and MEM reflex effects are known from clinical practice to be significant. It was possible to classify a given nonlinear residual p d at f L as having an MEM source if its onset latency was ≥70 ms, or as having an SFOAE origin if its onset latency was shorter, e.g., ≤44 ms. The specific choice of 44 ms as an upper bound to an SFOAE latency is based on the fact that the SFOAE latency at 1 kHz at low levels is approximately 11 ms (Shera and Guinan, 2003), and the assumption of cochlear scaling symmetry (Zweig and Shera, 1995) predicts the latency at other frequencies. If such a symmetry exists, the SFOAE latency at 0.25 kHz is predicted to be four times that at 1 kHz, or 44 ms. If the symmetry is violated at low frequencies, e.g., if the BM is more broadly tuned at low frequencies in human ears, then the prediction is that the SFOAE latency at 0.25 kHz would be less than 44 ms. Thus, 44 ms is an upper bound for the expected round-trip SFOAE latency. The onset latencies of SFOAEs and MEM reflex effects were thus separated by tens of milliseconds, which allowed classification of the origin of each response to P L. Some residuals would possibly have both MEM and SFOAE sources, but such a residual would initially include the short-latency SFOAE response. The results summarized in Discussion showed that only one subject had a short-latency response that could be attributed to an SFOAE. Except for this subject, all other subjects had residuals p d extracted using responses to P L that were of MEM origin. For these reasons, a p d response to P L is called an MEM signal.
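The latency argument reduces to a short calculation and a two-threshold rule. In the sketch below, the inverse-frequency scaling follows from the scaling-symmetry assumption described above; the label "unclassified" for latencies between the two bounds is an illustrative addition and not a category used in this report.

```python
def predicted_sfoae_latency_ms(freq_khz, latency_at_1khz_ms=11.0):
    """Upper-bound SFOAE latency under scaling symmetry (latency scales as 1/f)."""
    return latency_at_1khz_ms / freq_khz

def classify_onset(latency_ms, sfoae_max_ms=44.0, mem_min_ms=70.0):
    """Classify a low-frequency residual by its onset latency."""
    if latency_ms <= sfoae_max_ms:
        return "SFOAE"
    if latency_ms >= mem_min_ms:
        return "MEM"
    return "unclassified"  # hypothetical label for the gap between bounds

bound_ms = predicted_sfoae_latency_ms(0.25)
print(bound_ms, classify_onset(100.0), classify_onset(30.0))
```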

It remained to determine when a nonlinear residual was significantly different from zero. An SNR criterion was adopted for signal detection based on Green and McGill (1970). They showed that a 10-dB SNR corresponds approximately to a 99% correct criterion in a two-alternative forced-choice signal detection task, for a bandwidth-duration product of 1.5 (i.e., a 30-Hz filter and 50-ms window duration). A 50-ms window duration was chosen to allow for filter settling time and to facilitate comparisons with Guinan et al. (2003).

To verify the 10-dB criterion, empirical distribution functions (edfs) of SNR were constructed. An edf is a cumulative distribution function created from actual data. For a random sample x i , i = 1, 2,..., n, the edf equals the fraction of the set that is less than or equal to each x i . Edfs were calculated for three conditions: numerical simulations, coupler recordings, and recordings from a subject's ear (with the tone present and the activator absent). For each condition, 984 independent samples of SNR were used to calculate the edf. Samples were filtered with the FIR filters described above prior to calculating SNR. As shown in Figure 3, there was good agreement between the edfs of the three conditions. Close agreement was also seen between these measures and the prediction of Green and McGill (1970): the average SNR value corresponding to a cumulative probability of 0.99 was 10.0 dB. Based on these results, a signal was considered present in any given 50-ms time window if the SNR was ≥10 dB.
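The edf definition given above translates directly into code. The sketch below is a generic implementation, with ties handled so that each value receives the fraction of samples less than or equal to it; `criterion_from_edf` is a hypothetical helper that reads off the smallest sample value reaching a given cumulative probability, and the sample values shown are arbitrary.

```python
from bisect import bisect_right

def edf(samples):
    """Return (x, F(x)) pairs, where F(x) is the fraction of samples <= x."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, bisect_right(xs, x) / n) for x in xs]

def criterion_from_edf(samples, probability):
    """Smallest sample value whose edf reaches the requested probability."""
    for x, fraction in edf(samples):
        if fraction >= probability:
            return x
    return None  # not reached for probability <= 1 with non-empty input

# Arbitrary illustrative values, not SNR data from the study.
print(edf([3, 1, 2, 2]))
print(criterion_from_edf(list(range(1, 101)), 0.99))
```

Applied to a set of no-signal SNR samples, the value returned for a probability of 0.99 plays the role of the detection criterion described in the text.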

Fig. 3

Empirical distribution functions (edfs) for SNR when no OAE is present. Solid line shows a numerical simulation, dotted line shows data recorded from a coupler, and dashed lines show data from a subject's ear (with the probe tone present but the activator absent).

The presence of a signal was also associated with the mean phase trajectory becoming relatively flat, indicating a quasi-sinusoidal response. A rapidly varying phase for this stimulus set would be consistent with a noise-dominated narrowband signal. Although phase was not used as part of the criteria for determining the presence of a signal, it still yielded additional, useful information.

When the 2E method is used to extract OAEs, care must be taken that baseline shifts in the stimulus sound pressure are not mistaken for OAEs. Such baseline shifts may arise from movements of the acoustic probe assembly in the ear canal or from changes in middle-ear pressure. In the measurement paradigm described here, the effects of such changes were controlled in two ways. First, such changes were likely to be associated with a sudden increase in the overall energy present in the recording buffer. Post hoc artifact rejection was used to remove buffers where changes occurred. After narrowband FIR filtering, the energy in each buffer was calculated. Any buffer with energy more than five times greater than the median was rejected. This method of post hoc artifact rejection resulted in an average of N = 16 buffers (SD = 2 buffers) for each recording. Second, for those buffers that were kept, the method of calculating the noise floor as the SEM of the square of the emission sound pressure amplitude ensured that sudden changes were not interpreted as emissions. Using this method, baseline shifts would change the mean emission level slightly, but they would also increase the SEM (i.e., the noise floor), ensuring that changes as a result of a sudden baseline shift were not interpreted as emissions.
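The post hoc rejection rule can be sketched as a median-based energy test. The code below assumes that "more than five times greater than the median" means an energy exceeding five times the median buffer energy; the buffers shown are short hypothetical lists rather than filtered recordings.

```python
import statistics

def buffer_energy(samples):
    """Sum of squared sample values for one filtered buffer."""
    return sum(x * x for x in samples)

def reject_high_energy(buffers, factor=5.0):
    """Keep buffers whose energy does not exceed factor times the median energy."""
    energies = [buffer_energy(b) for b in buffers]
    median_energy = statistics.median(energies)
    return [b for b, e in zip(buffers, energies) if e <= factor * median_energy]

# Hypothetical buffers; the last one mimics a sudden baseline shift.
buffers = [[1.0, 1.0], [1.0, -1.0], [1.2, 1.0], [10.0, 10.0]]
kept = reject_high_energy(buffers)
print(len(kept), "of", len(buffers), "buffers kept")
```

Using the median rather than the mean as the reference keeps the threshold itself from being inflated by the contaminated buffers.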

Results

Individual results

An example of expected results for P H in the absence of an MEM reflex is shown in Figure 4. In the preactivator window (0–350 ms), the SFOAE level is less than the detection criterion (shaded gray area) and thus not significantly different from the noise (black area), and the phase varies rapidly. The SFOAE SPL shows a fast-rising slope when the activator is turned on, this slope being primarily determined by FIR filter delay. The SFOAE SPL exceeds the detection criterion (shaded gray area) during the activator window, and the corresponding phase is relatively flat, indicative of a quasi-sinusoidal response at the probe frequency. In the absence of MEM reflex during the time in which the activator is on, the SFOAE response is presumably a result of suppression of the tone by the activator on the BM. Fluctuations in SPL and phase over the duration of the activator window may result from the MOC reflex or from other factors, which include the OAE effects caused by temporal fluctuations in the envelope of the frozen-noise activator. Because the characteristic time constants of MOC reflex decays are on the order of 100 ms or more (Guinan, 1990, 1996; Tavartkiladze et al., 1996), the influence of the MOC reflex is expected to extend into the postactivator window. In contrast, the influence of the basilar-membrane suppression mechanism ends 10–20 ms after the activator is gated off. In Figure 4, a response is present in the initial part of the postactivator window (just after 850 ms). This response decays into the noise floor within approximately 200 ms, consistent with reported values of MOC reflex decay (Guinan et al., 1983, 2003).

Fig. 4

SPL envelope (top panel) and phase (bottom panel) responses using P H = 1 kHz. Mean SPL is shown by black line of uniform width. The noise based on the SEM is shown by the black line of nonuniform thickness. The gray shaded area marks the detection criterion, the top of the shaded area being 10 dB above the noise. Vertical dashed lines mark the preactivator, activator, and postactivator windows.

An example of expected results for P L when an MEM reflex is present is shown in Figure 5. Again, there is no response in the preactivator window. A strong amplitude response is seen in the activator window, beginning approximately 100 ms after the onset of the activator. This response continues into the postactivator window, decaying into the noise floor by approximately 200 ms after the offset of the activator. The overall shape and onset and offset latencies associated with this response are consistent with reported values for MEM reflex (Dallos, 1964; Hung and Dallos, 1973). The fact that there is no response present in the first 100 ms of the activator window suggests that the measured response is not an SFOAE, which is expected to occur virtually instantaneously (i.e., on the order of cochlear travel-time latencies, and much shorter than the MEM onset latency) with activator presentation as a result of BM suppression. This allows classification of significant low-frequency shifts as MEM-induced shifts versus those induced by other cochlear changes.

Fig. 5

SPL envelope (top panel) and phase (bottom panel) responses using P L = 250 Hz. Layout is as described in Figure 4.

In contrast to the expected results described above, a wide variety of responses was seen when results for the 15 individual subjects were examined. Several different patterns emerged, and examples of the most common patterns are shown in Figure 6. All five responses shown in this figure were obtained from different subjects but using the same activator level. Figure 6a shows a response that was more common at low activator levels or wide notch widths, but that also sometimes was seen at higher activator levels. No response was present in P L, and a response to P H was seen only during the activator window. This result is consistent with the absence of MEM reflex, an SFOAE present because of cochlear suppression, and no influence of MOC reflex.

Fig. 6

SPL envelope responses as a function of time for individual subjects. Responses to low-frequency probe are shown in the left column; responses to high-frequency probe are shown in the right column. Noise floor (SEM) is shown as a black line. The 99% CI for the noise is shown by the upper edge of the gray shaded area. The f L and f H frequencies are labeled in each panel. All responses shown here are for L A = 63 dB SPL and NW= 1/6 octave.

In Figure 6b, a strong P H response is seen during the activator, and the response continues throughout the duration of the postactivator window. The persistence of this response is longer than expected for the MOC reflex. Note that a shift is also present in the preactivator time window. The simplest explanation is that the effect had not entirely died away by the end of the silent period and had wrapped around to the beginning of the next set of buffers included in the time average. This may suggest the involvement of either longer-duration MOC reflexes or the influence of other, presumably cochlear, processes.

Shifts in P H were sometimes attributable to MEM reflex, as indicated by a corresponding shift in P L. In Figure 6c, a response in P H is seen in the postactivator window; however, examination of P L indicates the presence of MEM reflex with a similar time course in its offset. The interpretation is therefore that the response in the postactivator window is influenced, and perhaps dominated, by MEM reflex. The effect of the MEM reflex on P H can also be seen during the activator window; the level of P H during the activator window decreases as the reflex shift increases for P L.

Shifts in both probes, each having a different time course, were often seen. In Figure 6d, a response in P L during the first 200 ms of the postactivator window suggests that the shift in P H during this time is influenced by MEM reflex. However, the P L response decays into the noise floor by 1050 ms, whereas P H shows a large response until the offset of the probe tone (1400 ms). As in Figure 6b, a shift in P H is also present in the preactivator window, indicating a response lasting longer than 1200 ms. These differing time courses indicate that more than one mechanism is responsible for the shifts.

Figure 6e shows a response in P H only. This response falls into the noise floor shortly after the offset of the activator, but then reappears. The response is somewhat similar to Figure 6d, but a simultaneous shift in P L is absent in this case, except for a possible onset transient near 500 ms. The phase (not shown) differs in the activator window and the postactivator window of P H. This phase pattern was seen in several subjects, and the cancellation-like effect in P H is suggestive of two or more cochlear mechanisms with different time courses.

Time constant of postactivator shifts

When higher activator levels were used in conjunction with an f H of 3.5 kHz, shifts were often seen in the postactivator window without a simultaneous shift in P L (Fig. 6b, e). In many cases, the responses lasted longer than 1250 ms, as evidenced by a response wrapping around into the preactivator window. The time courses of these responses were characterized by fitting a single exponential function to data points within the postactivator window only. No attempt was made to extrapolate across the time gap to the following preactivator window. The single exponential function is a useful assumption for assessing general decay characteristics and was chosen as a first approximation of the data. Exploration of other decay functions was beyond the scope of the current data set. An in-depth exploration of the decay characteristics should be made on data collected using a longer probe tone, so that a large extrapolation is not required between post- and preactivator time windows. Time constants were calculated for shifts that were above the noise floor (28 recordings from 12 subjects). The resulting mean time constant was 516 ms (SD = 186 ms).
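One simple way to estimate a single-exponential time constant is a least-squares line fit to the logarithm of the amplitude. The sketch below uses that approach as an assumption; the fitting procedure actually used in this study is not specified beyond the choice of a single exponential. The synthetic data are noise-free and generated with a 516-ms time constant purely to confirm that the fit recovers a known value.

```python
import math

def fit_time_constant_ms(times_ms, amplitudes):
    """Fit A(t) = A0 * exp(-t / tau) by linear regression on log amplitude.

    Returns tau in ms. Amplitudes must be positive (linear pressure units).
    """
    logs = [math.log(a) for a in amplitudes]
    n = len(times_ms)
    mean_t = sum(times_ms) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_ms, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_ms)
    slope = sxy / sxx  # equals -1 / tau for an exponential decay
    return -1.0 / slope

# Synthetic postactivator data (850 to 1350 ms), hypothetical amplitudes in Pa.
times = list(range(850, 1351, 50))
amps = [2e-5 * math.exp(-(t - 850) / 516.0) for t in times]
tau_ms = fit_time_constant_ms(times, amps)
print(round(tau_ms, 3))
```

A log-domain fit weights low-amplitude points more heavily than a fit in linear units would, so with noisy data near the noise floor a nonlinear least-squares fit may be preferable.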

Group results

Group data characterizing the prevalence of MEM shifts and cochlear shifts are shown in Figure 7 for f H = 1.5 kHz and Figure 8 for f H = 3.5 kHz. The percentage of ears in which a response was present (either to P H or to P L) is plotted as a function of time. The panels represent different overall levels of the activator (L A). The data shown in these figures were obtained using an activator notch width of 1/6 octave. Responses to P L (thin black lines) showed a similar pattern in the 1.5- and 3.5-kHz frequency regions, which confirms the expectation that the MEM reflex is independent of f H. Responses to P L generally started about 100 ms after the onset of the activator. The offset was dependent on the level of the activator, with the time of offset increasing as L A increased. These onset and offset latencies are consistent with the latencies of the MEM reflex (Moller, 1983; Dallos, 1964).

Fig 7
Group results for P H = 1.5 kHz and NW = 1/6 octave. Percentage of ears with a response is shown as a function of time, calculated in 50-ms windows with 1/2 overlap. Panels show increasing activator level from top to bottom. Thick line represents percentage of ears with a response to the 1.5-kHz probe tone. Thin line shows percentage of ears with a response to the low-frequency probe tone. Gray area shows percentage of ears with a response to both probes simultaneously.

Fig 8
Group results for P H = 3.5 kHz and NW = 1/6 octave. The layout is as described in Figure 7.

In contrast to P L, the prevalence of shifts in P H (thick black lines) showed a different pattern in the two frequency regions. The prevalence of shifts in P H was close to 100% during the activator period (350–800 ms) for both the 1.5- and 3.5-kHz frequency regions. Because the notch width was only 1/6 octave, it is likely that the shifts in P H during the presentation of the activator were caused by suppression of the cochlear amplifier (so-called “two-tone suppression”). Regarding responses in the postactivator time window, at the 1.5-kHz region (Fig. 7), approximately 20% of ears showed a postactivator response at low levels (L A of 51, 57, and 63 dB). As L A increased, the proportion of ears showing a response to P H increased, but this was accompanied by an increase in the proportion of ears showing simultaneous responses to P L (filled gray area). The proportion of ears showing responses to P H at 1.5 kHz without a corresponding response to P L remained constant at approximately 20% across all activator levels. Responses to P H at 3.5 kHz were similar to P H at 1.5 kHz at low activator levels, but at high activator levels, approximately 30–60% of ears showed a shift in P H in the absence of a response to P L. Thus, shifts were observed more often using the 3.5-kHz probe than the 1.5-kHz probe.
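The tallies discussed above (percentage of ears responding at P H, at P L, at both, and at P H alone) can be sketched as below, using the 50-ms windows with 1/2 overlap mentioned in the caption of Figure 7. The function names, the 1250-ms record length, and the per-ear detections are hypothetical and chosen only for illustration. The detection criterion itself is not modeled.

```python
def window_starts(total_ms, width_ms=50, overlap=0.5):
    """Start times (ms) of analysis windows of width_ms with fractional overlap."""
    step = int(width_ms * (1 - overlap))
    return list(range(0, total_ms - width_ms + 1, step))

def prevalence(ph_present, pl_present):
    """Percentages of ears responding within one time window.

    ph_present / pl_present hold one boolean per ear, True when the shift
    at that probe met the (unmodeled) detection criterion.
    """
    n = len(ph_present)
    ph = sum(ph_present)
    pl = sum(pl_present)
    both = sum(1 for h, l in zip(ph_present, pl_present) if h and l)
    return {
        "P_H": 100.0 * ph / n,
        "P_L": 100.0 * pl / n,
        "both": 100.0 * both / n,              # gray area in the figures
        "P_H_only": 100.0 * (ph - both) / n,   # P_H shift without a P_L shift
    }

starts = window_starts(1250)  # record length assumed for illustration

# Made-up detections for five ears in a single window.
ph_flags = [True, True, False, True, False]
pl_flags = [True, False, False, False, False]
result = prevalence(ph_flags, pl_flags)
print(starts[:4], result)
```

The `P_H_only` entry corresponds to the quantity of interest in the text: shifts at the SFOAE probe that occur in the absence of a shift at the low-frequency probe.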

Notch width

Only slight differences were seen in SFOAE responses elicited by noise signals with various notch widths. The broadband noise was always an effective activator of the MEM reflex at higher activator levels for the range of notch widths used in these experiments, yielding very few responses that were uncontaminated by MEM reflex. This, along with the wide variability in individual responses, made it difficult to draw any firm conclusions about the effect of notch width on the MOC reflex. Future experiments might use bandpass noise rather than broadband noise and compare pre- and postactivator responses. This may be more effective at activating the MOC reflex or other cochlear changes without simultaneously activating the MEM reflex.

Discussion

Measurements of MOC reflex using OAEs may be easily contaminated by the MEM reflex because effective broadband activators of the MOC reflex (Thornton, 1993; Maison et al., 2000; Velenovsky and Glattke, 2002) are also effective activators of the MEM reflex. An MEM reflex-induced change in ear canal sound pressure may easily be mistaken for a MOC reflex, which can have a similar temporal pattern (Guinan et al., 1983, 2003). This problem was highlighted in experiments with rats by Relkin et al. (2005). They showed that what appeared to be MOC effects were greatly reduced after sectioning of the MEMs. To date, the majority of studies using OAEs to study MOC reflex in humans have only indirectly addressed the problem of MEM reflex. For instance, Veuillet et al. (1991, 1992) inferred the lack of MEM contamination from the observation that similar changes in emissions were found in subjects with elevated or absent acoustic reflex thresholds. The observation that a broadband acoustic-noise activator generates more suppression of OAEs at low than high activator levels has been interpreted to mean that the effect of the MEM is negligible (Hood et al., 1996, 2003; Veuillet et al., 1991). The observation that the effects of the acoustic activator are somewhat frequency specific (Liberman, 1989) has been used to argue against substantial MEM contamination (Maison et al., 1997, 1999; Mott et al., 1989; Collet et al., 1990). Although such arguments provide evidence that the effects being measured are not dominated by MEM reflexes, they do not rule out contamination of the effects by MEM reflex. The studies of Burns et al. (1993) and Whitehead et al. (1991) suggest that emissions may be sensitive to small, subclinical MEM contractions. Therefore, a direct, sensitive measure of MEM reflex obtained simultaneously with probes of MOC reflex may aid in the interpretation of results.

Clinically, the MEM reflex is typically measured using a 226-Hz probe tone to measure changes in acoustic admittance and a higher-level tone or noise to elicit the reflex. Normal MEM reflex thresholds using this method are approximately 85–100 dB SPL for pure tone activators and 65–80 dB SPL for broadband activators (Gelfand, 1997; Moller, 1962; Margolis and Popelka, 1999; Wilson and McBride, 1978). However, nonlinear signal extraction techniques such as those used to measure nonlinear OAE residuals can be used to provide a more sensitive probe of the MEM reflex. Neumann et al. (1996) reported that thresholds obtained using a difference method were 8 dB lower on average than those obtained with a standard clinical measurement system. Recent work with wideband reflectance methods also suggests that the MEM reflex may be activated at levels 10 dB lower than shown on clinical tests (Feeney et al., 2003). Based on these results, it was hypothesized that the nonlinear residual signal extraction technique used in these experiments would provide a sensitive probe of the MEM reflex. An examination of Figures 7 and 8 suggests that the MEM reflex was generally measured at noise activator levels of 63, 69, and 75 dB SPL. A direct comparison of standard clinical and nonlinear extraction techniques was not an aim of this study, and detailed comparisons were not made. However, it appears that the low-frequency probe tone with the nonlinear extraction technique identified MEM reflexes across the expected range of activator levels.

An alternative method for determining the contribution of MEM reflexes to shifts in SFOAE probes was reported by Guinan et al. (2003). Several measurements were made at closely spaced frequencies, and the phase gradient of the responses was examined. In this method, a shift caused by an MEM reflex is expected to have a short group delay, corresponding to a shallow phase gradient, compared to the much longer delays of SFOAEs associated with MOC-induced shifts (Shera and Guinan, 1999). The threshold levels for elicitation of the MEM reflex reported by Muller-Wehlau et al. (2005) and Guinan et al. (2003) using the phase-gradient technique were similar to those obtained in the current study. However, the low-frequency probe method may hold some advantages over the phase-gradient method. First, the low-frequency probe method allows simultaneous measurement of MEM reflex for each MOC measurement. This is important because, as noted by Guinan et al. (2003), the activation of MEM can vary across runs even when stimulus levels are fixed. Because the phase-gradient method does not measure the two effects simultaneously, it does not allow for interpretation of effects when activator levels are near the MEM reflex threshold. Second, simultaneous-probe measurements may be less time consuming because the need to make individual measurements at several closely spaced frequencies is eliminated. Third, with the simultaneous-probe method, the MEM reflex can be designated as present or absent based on a statistical SNR criterion. In contrast, the phase-gradient method has an area of “mixed” effects that are difficult to interpret, i.e., activator levels at which the phase gradient is steeper than the shallow slope expected for an MEM reflex but not as steep as expected for an SFOAE. Fourth, the low-frequency probe method allows examination of the temporal interaction of SFOAE-based and MEM-mediated effects, which may occur in the same measurement. For example, responses like those shown in Figure 6d can be interpreted in terms of an MEM component and other (possibly MOC-mediated) components, and the time course of the MEM reflex can be seen relative to that of the other components.
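The phase-gradient reasoning above rests on the standard relation between group delay and the slope of phase versus frequency, τ = −(1/2π) dφ/df, with φ in radians and f in Hz. A minimal sketch of that calculation follows. The helper name `group_delay_ms`, the probe frequencies, and the 1-ms and 10-ms delays are illustrative assumptions, not values from this study or from Guinan et al. (2003).

```python
import math

def group_delay_ms(freqs_hz, phases_rad):
    """Group delay (ms) from the least-squares slope of phase vs frequency.

    Hypothetical helper: phases must already be unwrapped.
    """
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_p = sum(phases_rad) / n
    sxx = sum((f - mean_f) ** 2 for f in freqs_hz)
    sxy = sum((f - mean_f) * (p - mean_p) for f, p in zip(freqs_hz, phases_rad))
    slope = sxy / sxx                        # radians per Hz
    return -slope / (2 * math.pi) * 1000.0   # seconds converted to ms

# Closely spaced probe frequencies (made-up values).
freqs = [1480.0, 1490.0, 1500.0, 1510.0, 1520.0]

# Synthetic unwrapped phases for a short delay (shallow gradient) and a
# long delay (steep gradient); both delays are assumed for illustration.
shallow_phase = [-2 * math.pi * f * 0.001 for f in freqs]
steep_phase = [-2 * math.pi * f * 0.010 for f in freqs]

short_delay = group_delay_ms(freqs, shallow_phase)
long_delay = group_delay_ms(freqs, steep_phase)
print(round(short_delay, 3), round(long_delay, 3))
```

A measured gradient falling between two such extremes would correspond to the “mixed” region described in the text, which is why a simultaneous, independent probe of the MEM reflex can simplify interpretation.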

The method using a low-frequency probe and nonlinear extraction is limited by the sensitivity of the probe to middle-ear impedance changes. Larger relative changes in impedance may occur at higher values of f L, but then it is more likely that the shift represents an OAE at that frequency. It was for this reason that f L was chosen close to 0.25 kHz. One adjustable parameter in using the low-frequency probe is the choice of level. It must be high enough that the nonlinear residual at f L is above the noise floor, but not so high as to elicit a measurable SFOAE or to itself elicit an MEM or MOC reflex. Previous work indicates that frequency integration occurs for MEM reflexes (Simmons, 1960; Sesterhenn and Breuninger, 1976; Blood and Greenberg, 1981; Stelmachowicz and Gorga, 1983; Kawase et al., 1997), as well as MOC reflexes (Berlin et al., 1993; Maison et al., 1997, 1998, 1999, 2000). Guinan et al. (2003) showed that DPOAE probes can partially activate the MOC reflex, whereas low-level SFOAE probes do not. The levels used in the present study (40 and 55 dB SPL) were lower than those typically used to measure DPOAEs clinically, and the tones were widely separated in frequency, unlike the typical separation between DPOAE primary tones. The activator bandwidth in the present study was already very wide (approximately 6–7 kHz), so that the addition of a single, low-level sinusoid would not be expected to cause an appreciable change in the reflexes.

The P L level of 55 dB SPL used in this experiment was chosen based on pilot data from three subjects. To make sure that the joint presentation of P H and P L did not appreciably affect the results, measurements were made using P H alone and P H + P L. None of the subjects showed significant differences in the amplitudes of the shifts obtained at P H with or without P L. Figure 9 shows representative data from one subject. Three repetitions of each condition (P H alone and P H + P L) are shown for two activator levels. In response to the lower activator level, a shift is seen only during the presentation of the activator (300–800 ms). In response to the higher level activator, a shift is seen during the presentation of the activator, and the shift continues for approximately 700 ms after the offset of the activator. At both activator levels, there appears to be complete overlap in the shifts measured by P H alone and P H + P L.

Fig 9
Representative pilot data from one subject at f H for two noise activator levels. Results obtained using P H alone (dotted lines) and P H + P L (solid lines) are shown. Three repetitions of each condition are shown for two activator levels. f H = 1044 Hz and f L = 298 Hz. P H was presented at a level of 37 dB SPL, and P L was presented at a level of 55 dB SPL. Activators were presented at L A = 60 and 72 dB SPL. Activator NW = 1/6 octave.

Similar measurements were made to examine the effect of P H on P L. As before, none of the subjects showed significant differences in the amplitudes of the shifts obtained at P L with or without P H. Figure 10 shows representative data from one subject. Three repetitions of each condition (P L alone and P H + P L) are shown for one activator level. A shift occurs with a latency of approximately 100 ms after the onset of the activator. The shift decays into the noise floor within 200 ms of activator offset. This time course is consistent with an MEM reflex. There appears to be complete overlap in the shifts measured by P L alone and P H + P L. These overlaps of the measurements suggest that the joint presentation of P L and P H at the levels used in the present study did not appreciably alter the MEM or MOC reflex strength.

Fig 10
Representative pilot data from one subject at f L for one noise activator level. Results obtained using P L alone (dotted lines) and P L + P H (solid lines) are shown. Three repetitions of each condition are shown for L A = 68 dB SPL. Activator NW = 1/6 octave. Probe frequencies and levels were the same as in Figure 9.

It was assumed in devising the measurement paradigm that P L would be sensitive to MEM reflexes and not to SFOAEs, and that P H would be sensitive to SFOAEs and processes that modify the strength of SFOAEs. These assumptions were met most of the time. In one subject with a low noise floor and strong SFOAEs, the residual at P L showed a shift occurring less than 40 ms after the onset of the activator (not illustrated). This delay is too short to be caused by MEM reflex, suggesting an SFOAE. In other subjects, the residual at P H showed level and phase changes over time courses similar to the time course of the MEM reflex (as measured by P L). This occurred at f H = 1.5 kHz (as in Fig. 6c) as well as at f H = 3.5 kHz (not illustrated). Although the MEM reflex creates a shift in the source sound pressure primarily at low frequencies, this does not rule out effects on emissions at higher frequencies. However, such effects can be differentiated from the noise-activator effect on SFOAEs because the onset latency of MEM reflex effects is much longer than SFOAE travel times. Consistent with observations of Burns et al. (1993) and Whitehead et al. (1991), the presence of these shifts measured at 3.5 kHz may indicate MEM reflex effects on SFOAE forward and/or reverse transmission.

Time constant of postactivator shifts

When higher activator levels were used, shifts in P H at f H = 3.5 kHz were often seen in the postactivator window without a simultaneous shift in P L (Fig. 6a, b, e). As described in Results, when the time courses of these shifts were characterized by a single exponential function, the mean time constant was 516 ms (SD = 186 ms, N = 28). This time constant is much too short to be a “slow” efferent effect lasting tens of seconds (Sridhar et al., 1995; Cooper and Guinan, 2003), but is longer than expected for “fast” MOC reflexes, which rise and decay on the order of 100 ms (Wiederhold and Kiang, 1970; Sridhar et al., 1995; Guinan, 1996). A somewhat slower efferent effect for DPOAEs was demonstrated by Liberman et al. (1996) in cats and by Kim et al. (2001) in humans, showing decreases in DPOAE amplitudes during continuous stimulation. Kim et al. used a two-exponential function to fit this decrease and found a short time constant of 69 ms and a long time constant of 1510 ms. The time constants of shifts in the postactivator window found in the current data set fall between these values.

The magnitudes of the shifts measured in the current study for SFOAEs are similar to those reported by Kim et al. (2001) for DPOAEs. Note, however, that the magnitudes appear similar only when signal processing differences are taken into account: Kim et al. reported a median decrease of 0.4 dB in the DPOAE cubic difference tone. A shift of 0.4 dB in a 40-dB SPL SFOAE probe tone would yield a 13.5-dB SPL residual using the nonlinear extraction technique. Most of the shifts observed in the current study were between 10 and 20 dB SPL.
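The arithmetic behind this comparison can be checked with a short calculation. The sketch below assumes that the nonlinear residual equals the magnitude of the difference between the shifted and unshifted probe pressures and that the shift is a pure change in magnitude with no phase change. Both are simplifying assumptions, and the function name `residual_db_spl` is hypothetical.

```python
import math

def residual_db_spl(probe_db_spl, shift_db):
    """Level (dB SPL) of the difference between shifted and unshifted pressure.

    Assumes a magnitude-only shift: the phase of the probe is unchanged.
    """
    ratio = 10 ** (shift_db / 20.0)   # shifted / unshifted pressure magnitude
    return probe_db_spl + 20.0 * math.log10(abs(ratio - 1.0))

increase = residual_db_spl(40.0, 0.4)    # 0.4-dB increase in a 40-dB SPL probe
decrease = residual_db_spl(40.0, -0.4)   # 0.4-dB decrease in the same probe
print(round(increase, 1), round(decrease, 1))
```

Under these assumptions, a 0.4-dB increase gives a residual of about 13.5 dB SPL and a 0.4-dB decrease gives about 13.1 dB SPL, both within the 10 to 20 dB SPL range reported for most shifts. Any accompanying phase change would make the residual larger.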

The results of the present study suggest caution in interpreting the observed shifts as mediated by the MOC reflex. Liberman et al. (1996) showed in cats that cochlear changes (1000-ms time constant) persisted after complete olivocochlear bundle section, suggesting that they were mediated by an internal cochlear process rather than the MOC reflex. It is unclear whether the shifts observed in the present study are related to the slower shifts observed by Liberman et al. (1996) and Kim et al. (2001).

If the shifts in the present study are related to the slower DPOAE adaptation, they may arise from intrinsic cochlear factors, such as changes in local chemical concentrations (e.g., K+, ATP, Ca2+) in the organ of Corti, or the action of nonefferent synapses beneath OHCs (Thiers et al., 2002a,b). Adaptation of stereocilia hair bundles may also play a role in the observed shifts. Liberman et al. (2004) concluded that DPOAEs can be produced in response to high-level stimuli without prestin-based OHC motility. Hair bundles typically display both fast (0.1–5 ms) and slow (tens to hundreds of milliseconds) adaptation components (Ricci, 2003). Ricci et al. (2002) described an additional slow adaptation component in turtle auditory hair cells, which they termed the “sag.” The sag describes hair-bundle adaptation with time constants of 100–300 ms, an opposite polarity to fast adaptation, and an overshoot relative to original position at the end of a depolarizing voltage step. Sags were typically seen after relatively large depolarizations, suggesting that they might exert an effect in response to high-level acoustic stimuli. If such sags are present in mammalian hair bundle adaptation, they could play a role in the SFOAE shifts described in this paper. Further study is required to understand the sources of the shifts measured in the present study.

Group results

The present results differ from those of Guinan et al. (2003), who reported MOC shifts 50 ms after the end of the activator in all ears tested. In a similar frequency region (f H = 1.5 kHz), this study showed shifts in the SFOAE without shifts in the low-frequency MEM probe for only 20% of subjects. Several factors may account for this discrepancy. First, Guinan et al. only reported data from subjects with easily measurable efferent effects, whereas the current study only required that subjects have normal hearing and normal tympanograms. Second, subjects in the Guinan et al. study were kept awake (Guinan, personal communication), whereas in the current study, subject state was not explicitly controlled. It has been observed that the onset of sleep can decrease the contralateral suppression of transient emissions (Froehlich et al., 1993). However, it seems unlikely that sleep was a major cause of the lack of responses to low-level activators seen in the current study because many subjects remained awake throughout the testing. Furthermore, the results shown in this report (activator of 1/6 octave notch width) were obtained at the beginning of recording sessions, when subjects who did eventually sleep would still have been awake. Third, the current study measured SFOAEs from the peak region of each subject's SFOAE microstructure. Guinan et al. did not specifically account for microstructure, but did choose frequencies with large SFOAEs; therefore, this is unlikely to be a major source of difference in the two studies. In both studies, the probe-tone frequencies were selected to be distant from spontaneous emissions. Fourth, a smaller number of averages and a different detection criterion may have made responses more difficult to detect in the current study. Finally, the duration of the activator was much shorter in the current study than that reported by Guinan et al. (2003) (500 versus 2500 ms). It is possible that the MOC reflex did not have time to build up as large an effect in the current study. Of these possibilities, subject selection, number of averages, detection criterion, and activator duration seem the most likely causes of the differences in results.

Conclusions

A low-frequency (approximately 0.25 kHz) probe tone presented simultaneously with a higher frequency SFOAE probe may be an effective method for monitoring MEM reflex while obtaining an SFOAE at a higher probe frequency. Further work should be performed to assess the sensitivity of the low-frequency probe as a function of probe level. The 3.5-kHz region appears to be a more sensitive frequency region than 1.5 kHz for measuring SFOAE shifts that are uncontaminated by the MEM reflex. Shifts were present in only 3 of 15 subjects at activator levels below 60 dB SPL. At higher levels, the majority of subjects showed SFOAE shifts, many of which were unaccompanied by a simultaneous shift in the low-frequency probe. The source of these shifts is unclear, however, and it may be that intrinsic cochlear factors are responsible rather than MOC efferent reflexes. Until further studies elucidate the mechanisms of such shifts, it may be helpful to exercise caution in characterizing them as resulting from the action of the MOC reflex.