INTRODUCTION

Speech recognition with no background noise is often quite good in cochlear implant (CI) users; however, speech recognition deteriorates significantly in the presence of background noise (Zeng 2004) and especially in fluctuating background noise (Nelson et al. 2003; Zeng et al. 2005). Additionally, music perception and localization abilities are much worse in CI users than in normal-hearing listeners (Gfeller et al. 1997, 2002, 2005; Senn et al. 2005; Nimmons et al. 2007), highlighting the need for improvement in CI technology.

Acoustic temporal fine structure (TFS) has been shown to be critical for good performance in difficult listening tasks, such as music perception (Kong et al. 2004), speech perception in fluctuating background noise (Nelson et al. 2003; Qin and Oxenham 2003; Zeng et al. 2005; Füllgrabe et al. 2006), and localization and binaural unmasking (Drennan et al. 2007). Rosen (1992) defined TFS in terms of the speech cues carried by fluctuations between about 0.6 and 10 kHz. The present paper takes a physics-based rather than a speech-based view: TFS is defined as the rapid fluctuations of sound that directly track the fluctuations in acoustic pressure. TFS can have any frequency, but the frequencies relevant to normal hearing range from about 20 Hz to 4–5 kHz, above which auditory nerve fibers cannot phase-lock (Johnson 1980). The envelope of an acoustic wave follows the fluctuations of the TFS peaks over time.
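One common way to make the envelope/TFS distinction concrete is the Hilbert decomposition of a narrow-band signal: the magnitude of the analytic signal is the envelope, and the cosine of its phase is the TFS. The paper does not state that it uses this decomposition, so the Python sketch below is only an illustration; the 1-kHz carrier with a 4-Hz sinusoidal envelope is a hypothetical test signal.

```python
import numpy as np
from scipy.signal import hilbert


def envelope_and_tfs(x):
    """Split a narrow-band signal into its Hilbert envelope and unit-amplitude TFS."""
    analytic = hilbert(x)                  # x + j * (Hilbert transform of x)
    envelope = np.abs(analytic)            # slow amplitude fluctuations
    tfs = np.cos(np.angle(analytic))       # rapid, constant-amplitude fluctuations
    return envelope, tfs


# Hypothetical test signal: 1-kHz carrier with a 4-Hz sinusoidal envelope.
fs = 16000
t = np.arange(fs) / fs                     # 1 s, a whole number of cycles of both components
true_env = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
x = true_env * np.cos(2 * np.pi * 1000 * t)
env, tfs = envelope_and_tfs(x)
```

Multiplying `env` by `tfs` reproduces `x`, and `env` traces the peaks of the 1-kHz fine structure over time, as described above.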

Modern CIs filter incoming sound into 16 or 22 narrow bands, extract the acoustic temporal envelope from each band, and use these envelopes to modulate a biphasic pulsatile carrier of constant rate (Loizou 1998; Wilson 2004; Drennan and Rubinstein 2006). The pulsatile carriers therefore bear no TFS information. However, some acoustic TFS information can still be delivered through the narrow-band electric envelopes: for a broadband stimulus with a flat (nonfluctuating) temporal envelope, filtering into narrow bands converts TFS into recoverable envelope cues (Ghitza 2001; Zeng et al. 2004; Gilbert and Lorenzi 2006). CI users are rarely able to follow temporal envelope modulations above 300 Hz, although there are some “star” exceptions (Townshend et al. 1987; Shannon 1992; Zeng 2002). Thus, low-frequency TFS (<300 Hz) could be transmitted to CI users via narrow-band temporal envelopes. Nevertheless, far less acoustic TFS reaches CI users through electric envelope cues than reaches normal-hearing listeners, hence their difficulty with tasks that require TFS for good performance. Measuring any improvement in the delivery of acoustic TFS requires a good test of the TFS discrimination ability of CI users.
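A simplified, CIS-like sketch of the envelope-only coding described above: a filterbank, a Hilbert envelope per band, and envelope sampling at a fixed pulse rate. This is not the ACE® strategy; there is no n-of-m channel selection or loudness compression, and the 8 log-spaced Butterworth channels and 900-pulse/s rate are assumptions chosen for illustration. The point is that each channel keeps only envelope samples on a fixed-rate carrier, so the carrier timing carries no TFS.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfilt


def cis_like_pulse_amplitudes(x, fs, edges, pulse_rate=900.0):
    """Return a channels-by-pulses matrix of envelope samples (hypothetical CIS-like coder)."""
    hop = int(round(fs / pulse_rate))          # samples between pulses on each channel
    pulse_idx = np.arange(0, len(x), hop)      # constant-rate carrier: timing ignores the input
    amps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, x)                 # narrow-band signal for this electrode
        env = np.abs(hilbert(band))            # narrow-band temporal envelope
        amps.append(env[pulse_idx])            # only envelope samples are kept
    return np.array(amps), pulse_idx


fs = 16000
t = np.arange(fs) / fs
edges = np.geomspace(200.0, 7000.0, 9)         # 8 log-spaced channels (assumed)
amps, pulse_idx = cis_like_pulse_amplitudes(np.sin(2 * np.pi * 1000 * t), fs, edges)
```

For a 1-kHz tone, the largest pulse amplitudes fall on the channel spanning about 760–1180 Hz; the 1-kHz waveform itself is not reproduced in the pulse timing.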

Positive and negative Schroeder-phase stimuli differ only in their TFS. These stimuli are time-reversed sound pairs with identical long-term spectra and minimal envelope modulations (Schroeder 1970). Schroeder-phase stimuli have been used to measure sensitivity to TFS with minimal envelope cues in humans and birds (Dooling et al. 2002), as maskers to explore various phenomena in normal and hearing-impaired listeners (Kohlrausch and Sander 1995; Summers and Leek 1998; Summers 2000), and as a stimulus to explore the effect of phase on basilar membrane motion and the acoustic reflex (Summers et al. 2003; Kubli et al. 2005). Schroeder-phase harmonic complexes also appear promising for testing acoustic TFS delivery in CI users.

Figure 1 shows how the widely used ACE® processing strategy encodes positive and negative Schroeder-phase stimuli with fundamental frequencies of 50 and 400 Hz. The Schroeder-phase stimuli are transformed into repeating sweeps of envelope packets that either rise or fall in frequency. Each sweep repeats once per period of the harmonic complex, so the sweep repetition rate equals the fundamental frequency. By measuring Schroeder-phase discrimination at different fundamental frequencies, we can evaluate the ability of CI users to distinguish a rising sweep from a falling sweep as a function of sweep speed. Such rapid frequency changes occur in consonant–vowel transitions in speech (Kewley-Port et al. 1983), so some correlation between speech understanding and Schroeder-phase discrimination would be expected. Furthermore, if future processing schemes encode TFS explicitly, a Schroeder-phase task could be used to measure the extent to which the processing is successful.
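The rising versus falling sweep can be anticipated from the phase rule given later as Eq. 1, without simulating a processor. A finite-difference group delay, −Δθ/Δω between adjacent harmonics, approximates the time within each period at which energy near that frequency arrives. This Python sketch assumes harmonics from F0 up to 5 kHz, as in the study; it is an acoustic approximation, not a model of ACE® processing.

```python
import numpy as np


def schroeder_group_delay(f0, sign=+1, fmax=5000.0):
    """Approximate arrival time of energy near each frequency within one period.

    sign=+1 / -1 selects the positive / negative Schroeder phase of Eq. 1.
    Returns (centre frequencies in Hz, group delays in s). Delays are not
    wrapped to the period; add one period to negative values to place them in a cycle.
    """
    n_harm = int(fmax // f0)                       # N: harmonics f0, 2*f0, ... <= fmax
    n = np.arange(1, n_harm + 1)
    theta = sign * np.pi * n * (n + 1) / n_harm    # Eq. 1
    delay = -np.diff(theta) / (2 * np.pi * f0)     # -dtheta/domega between harmonics n and n+1
    centre = (n[:-1] + 0.5) * f0
    return centre, delay


freqs, d_pos = schroeder_group_delay(50, +1)
_, d_neg = schroeder_group_delay(50, -1)
```

For the positive phase, higher frequencies arrive earlier in each period (a downward sweep); the negative phase gives the mirror image (an upward sweep). In both cases the sweep spans nearly one period, so it repeats every 20 ms at 50 Hz but every 2.5 ms at 400 Hz.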

FIG. 1

Pairs of 50-Hz (top, left and right) and 400-Hz (bottom, left and right) Schroeder-phase complexes and their resulting electrodograms (immediately under each waveform). Each electrodogram shows, as vertical lines, the biphasic pulses predicted by an ACE® signal processing strategy. Electrode number refers to the depth of insertion into the cochlea (1 = apical, 22 = basal).

The first objective of the present study is to evaluate Schroeder-phase discrimination as a measure of temporal processing capabilities in CI users. The task can be used to compare processing schemes only if test reliability is good and learning is limited; therefore, the tests are repeated on separate days. A second objective is to evaluate the extent to which the task predicts clinically relevant performance that might depend on TFS encoding. Thus, the consonant–nucleus–consonant (CNC) word test (Peterson and Lehiste 1962), a speech reception threshold (SRT) test in steady noise and two-talker babble (Turner et al. 2004; Won et al. 2007), and a clinical assessment of music perception (pitch, timbre, and isochronous melody discrimination) (Nimmons et al. 2007) were also conducted. Finally, because the TFS of the Schroeder-phase stimulus is transformed into a repeating frequency sweep in CI users, the Schroeder-phase test could involve both spectral and temporal discrimination abilities. Therefore, spectral ripple discrimination was also tested, to evaluate the extent to which Schroeder-phase discrimination is independent of a primarily spectral task. Combining the spectral and temporal elements of the two tasks was also expected to predict clinical success more strongly than either task alone.

METHODS

Listeners

A total of 24 postlingually deafened adult CI users volunteered for participation. Relevant clinical and demographic information is summarized in Table 1. CI users ranged in age from 28 to 74 years, with an average of 56 years, and had an average of 4 years of experience with their implants. Additionally, seven normal-hearing listeners (detection thresholds below 20 dB HL at all audiometric frequencies) volunteered for participation in the study. Their ages ranged from 26 to 40 years, with an average of 31.3 years. Each listener participated in the study in accordance with the University of Washington Institutional Review Board.

Table 1 Summary of demographic and clinical parameters for participating CI listeners

Test battery administration

All tests were performed in a sound-attenuating booth at the Virginia Merrill Bloedel Hearing Research Center. A MATLAB (The MathWorks) graphical user interface running on a Mac G5 computer presented sound stimuli via a Crown D45 amplifier and a free-standing speaker (B&W DM303) placed at head level, 1 m in front of the listener. CI users were instructed to turn off any hearing aids and to set the loudness of their implant processor to a comfortable level. Listeners responded using a mouse and a computer screen placed to the right of the speaker. The speaker, a studio monitor, was selected for its good amplitude and phase response: its amplitude response varied by ±2 dB from 100 to 20,000 Hz, exceeding ANSI standards for speech audiometry, and its phase response was smooth, varying by ±30 degrees from 150 to 20,000 Hz and by ±45 degrees below 150 Hz.

Normal-hearing listeners were tested only on the Schroeder-phase discrimination test, whereas CI users were administered the Schroeder-phase discrimination test as part of a battery of psychophysical tasks. In addition to the Schroeder-phase discrimination test, the battery included the CNC monosyllabic word test presented at 62 dBA (Peterson and Lehiste 1962); a test of spectral ripple resolution (Henry and Turner 2003; Henry et al. 2005; Won et al. 2007); an assessment of speech-in-noise perception that measured the spondee reception threshold (SRT) (Turner et al. 2004; Won et al. 2007); and a music test battery including tests of pitch-direction discrimination, isochronous melody identification, and timbre recognition (Nimmons et al. 2007). The order of test administration was varied, but the Schroeder-phase discrimination test was typically administered after 30 to 90 min of testing with other tasks. When time permitted and users were willing to undergo further testing, either on the following day or within a few weeks, they were retested.

The spectral ripple test, similar to the test used by Henry et al. (2005), was administered using methods and stimuli described in Won et al. (2007). The spectral ripple test measures the just noticeable difference in spectral ripple density. The stimuli had a bandwidth of 100–5,000 Hz and sinusoidal spectral ripples with a peak-to-valley ratio of 30 dB. A two-up, one-down adaptive tracking procedure was used in which the density of the spectral ripples was tracked. The dependent variable was ripples per octave. Twenty-two of the 24 CI users completed this test.
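To make the stimulus parameters concrete, here is a Python sketch of a rippled-noise stimulus with the stated 100–5,000 Hz bandwidth and 30-dB peak-to-valley ratio, built from many random-phase, log-spaced tones whose levels vary sinusoidally along a log-frequency (octave) axis. The number of components, the sampling rate, and the `ripple_phase` parameter are assumptions; the exact construction used by Won et al. (2007) may differ. Ripple-discrimination tasks typically compare a standard ripple with a phase-inverted one (`ripple_phase = π`).

```python
import numpy as np


def ripple_levels_db(freqs, density, ripple_phase=0.0, peak_to_valley_db=30.0, f_low=100.0):
    """Component levels (dB) following a sinusoid along log2 frequency."""
    octaves = np.log2(freqs / f_low)
    return 0.5 * peak_to_valley_db * np.sin(2 * np.pi * density * octaves + ripple_phase)


def rippled_noise(density, ripple_phase=0.0, fs=44100, dur_s=0.5,
                  f_low=100.0, f_high=5000.0, n_comp=800, rng=None):
    """Sum of random-phase tones; `density` is in ripples per octave (hypothetical parameters)."""
    rng = np.random.default_rng() if rng is None else rng
    freqs = np.geomspace(f_low, f_high, n_comp)          # log-spaced components
    levels = ripple_levels_db(freqs, density, ripple_phase, f_low=f_low)
    amps = 10.0 ** (levels / 20.0)
    phases = rng.uniform(0.0, 2 * np.pi, n_comp)
    t = np.arange(int(round(dur_s * fs))) / fs
    x = np.zeros_like(t)
    for a, f, p in zip(amps, freqs, phases):
        x += a * np.sin(2 * np.pi * f * t + p)
    return x / np.max(np.abs(x)), freqs, levels           # peak-normalised; level set at playback


standard, freqs, lv_std = rippled_noise(2.0, rng=np.random.default_rng(1))
inverted, _, lv_inv = rippled_noise(2.0, ripple_phase=np.pi, rng=np.random.default_rng(1))
```

In an adaptive track, `density` would be the tracked variable, raised after correct responses until the standard and inverted ripples can no longer be told apart.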

The SRT test was a closed-set task where users were asked to identify one of 12 spondees they heard in the presence of either steady-state, speech-shaped noise, or two-talker babble (Won et al. 2007). The SRT test measures the level of noise required to just mask a closed set of spondees. The level of the noise was tracked using a one-down, one-up procedure and 2-dB steps. The dependent variable was the signal-to-noise ratio. The spondee speaker was female. Twenty-two of the 24 listeners completed this test.
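The adaptive SNR track can be sketched in Python as a one-down, one-up staircase with 2-dB steps, which converges on the SNR that gives about 50% correct. The starting SNR, the stopping rule (eight reversals), and the threshold rule (mean of the last six reversals) are hypothetical choices, not details reported here, as is the deterministic listener in the example, who is correct whenever the SNR exceeds −6 dB.

```python
from typing import Callable, List


def one_down_one_up(is_correct: Callable[[float], bool], start_snr_db: float = 10.0,
                    step_db: float = 2.0, n_reversals: int = 8, n_average: int = 6,
                    max_trials: int = 200) -> float:
    """Track SNR: lower it after a correct response, raise it after an error."""
    snr = start_snr_db
    last_direction = 0
    reversals: List[float] = []
    for _ in range(max_trials):
        direction = -1 if is_correct(snr) else +1   # -1 = harder (more noise relative to the spondee)
        if last_direction and direction != last_direction:
            reversals.append(snr)                   # the track changed direction at this SNR
            if len(reversals) >= n_reversals:
                break
        last_direction = direction
        snr += direction * step_db
    if not reversals:
        raise ValueError("no reversals within max_trials")
    tail = reversals[-n_average:]
    return sum(tail) / len(tail)


threshold = one_down_one_up(lambda snr: snr > -6.0)
```

Here the track descends from 10 dB and then oscillates between −6 and −4 dB, giving a threshold of −5 dB.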

The “clinical assessment of music perception” (CAMP) was used for music testing (Nimmons et al. 2007). The CAMP consists of melody, pitch, and timbre discrimination tasks. The details of stimulus creation, the melodies used, and implementation of all three music tests are described in Nimmons et al. (2007). Twenty-two of the 24 listeners completed the music tests. All stimuli were presented at 65 dBA. Each music test had a short training period in which listeners could hear the melodies or instruments or practice a few pitch discrimination trials with feedback. Subsequently, during testing, no feedback was given. The melody identification test required users to identify isochronous melodies from a closed set of 12 popular melodies. Rhythm cues were removed such that notes of a longer duration were repeated in an eighth-note pattern. The test used synthesized piano-like tones with identical envelopes. A total percent correct score was calculated after 36 melody presentations, including 3 presentations of each melody.

The pitch-direction discrimination task was a two-alternative forced-choice (2AFC) task administered using the same synthesized complex, piano-like tones, and required users to identify which of two complex tones had the higher fundamental frequency. The dependent variable was the just-noticeable-difference threshold in semitones determined using a one-up, one-down tracking procedure converging on 50% correct (Levitt 1971). Complex tones of four fundamental frequencies (185, 262, 330, and 392 Hz) were used in the task, and the primary dependent variable was the threshold in semitones averaged across all four fundamental frequencies. The timbre recognition task required listeners to identify instruments from a closed set of eight instruments that played an identical melodic contour. Listeners heard live recordings of the instruments playing a simple five-note pattern at a moderate, uniform tempo. The instruments were the piano, trumpet, clarinet, saxophone, flute, violin, cello, and guitar. A total percent correct score was calculated after 24 presentations, including 3 presentations of each instrument.

Schroeder-phase discrimination test procedure

A single administration of this task involved one short training block and six test blocks. Each trial used a four-interval 2AFC paradigm. The training block contained only eight trials, two for each fundamental frequency (50, 100, 200, and 400 Hz). Each test block contained 96 randomly ordered trials, 24 for each of the four fundamental frequencies. On each trial, four 500-ms stimuli were presented separated by 100 ms of silence: three presentations of the negative Schroeder-phase stimulus and one presentation of the positive Schroeder-phase stimulus, which occurred randomly in either the second or the third interval. Listeners were asked to identify the interval that was different by choosing either the second or the third interval. Visual feedback indicating the correct answer was given after each trial. As described under Schroeder-phase stimuli below, each of the four successive stimuli had a random starting phase, so that onsets and offsets could not serve as reliable discrimination cues. Stimuli were presented at 65 dBA at head level, 1 m from the speaker, and levels were not roved. Percent correct was calculated for each block and fundamental frequency.
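The trial structure can be sketched in Python as follows. The `Trial` record, the seeded random generator, and the assumption that the 100-ms gaps fall only between the four intervals are illustrative choices, not the study's MATLAB software.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

F0S_HZ = (50, 100, 200, 400)


@dataclass(frozen=True)
class Trial:
    f0_hz: int
    intervals: Tuple[str, ...]   # "neg" or "pos" for intervals 1-4
    target: int                  # 1-based position of the positive-phase interval


def make_block(rng: random.Random, reps_per_f0: int = 24) -> List[Trial]:
    """One block: reps_per_f0 trials per F0, shuffled; the odd interval is 2 or 3."""
    trials = []
    for f0 in F0S_HZ:
        for _ in range(reps_per_f0):
            target = rng.choice((2, 3))          # intervals 1 and 4 are always negative phase
            intervals = tuple("pos" if i == target else "neg" for i in range(1, 5))
            trials.append(Trial(f0, intervals, target))
    rng.shuffle(trials)
    return trials


def trial_duration_ms(n_intervals: int = 4, tone_ms: int = 500, gap_ms: int = 100) -> int:
    """Stimulus time per trial, excluding response and feedback time."""
    return n_intervals * tone_ms + (n_intervals - 1) * gap_ms


rng = random.Random(2008)
training = make_block(rng, reps_per_f0=2)   # 8 training trials
block = make_block(rng)                      # 96 test trials
```

Percent correct for a block and fundamental frequency is then the proportion of that F0's trials on which the listener chose `target`.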

Schroeder-phase stimuli

Positive and negative Schroeder-phase stimulus pairs were created for each of four fundamental frequencies. Fundamental frequencies of 50, 100, 200, and 400 Hz were chosen to facilitate comparison with the results of Dooling et al. (2002) and because preliminary tests suggested that these fundamental frequencies encompassed the performance range of a typical CI user. For each fundamental frequency, equal-amplitude cosine harmonics from the fundamental frequency up to 5 kHz were summed. Each harmonic was given a phase according to the following equation:

$$\theta_n = \pm \frac{\pi n (n+1)}{N} \tag{1}$$

where θ_n is the phase of the nth harmonic, n is the harmonic number, N is the total number of harmonics in the complex (e.g., N = 100 for the 50-Hz complex and N = 12 for the 400-Hz complex), and the positive or negative sign is used to construct the positive or negative Schroeder-phase signal, respectively. On each presentation, the harmonic complex was randomly shifted in global phase and multiplied by a 500-ms constant-amplitude window with 10-ms linear onset and offset ramps. This was done so that the portion of the harmonic complex occurring during the onset and offset could not serve as a reliable cue for discriminating the pair of complexes.
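A Python sketch of the stimulus construction in Eq. 1. It assumes a 20-kHz sampling rate, so that every fundamental frequency has a whole number of samples per period, and interprets the random global phase shift as a random starting point within the period. Level calibration to 65 dBA is not modeled. The example also checks the property noted in the Introduction: the negative-phase complex is the time reversal of the positive-phase one, so the two have the same magnitude spectrum.

```python
import numpy as np


def schroeder_period(f0, sign=+1, fs=20000, fmax=5000.0):
    """One period of an equal-amplitude cosine complex with Schroeder phases (Eq. 1)."""
    n_harm = int(fmax // f0)                         # N: harmonics f0, 2*f0, ... <= fmax
    n = np.arange(1, n_harm + 1)
    theta = sign * np.pi * n * (n + 1) / n_harm      # +1: positive, -1: negative phase
    period = int(round(fs / f0))                     # assumes fs / f0 is an integer
    t = np.arange(period) / fs
    x = np.cos(2 * np.pi * f0 * np.outer(t, n) + theta).sum(axis=1)
    return x, n_harm


def schroeder_token(f0, sign, rng, fs=20000, dur_s=0.5, ramp_s=0.01, fmax=5000.0):
    """500-ms token starting at a random point in the period, with 10-ms linear ramps."""
    one_period, _ = schroeder_period(f0, sign, fs, fmax)
    n_samp = int(round(dur_s * fs))
    start = rng.integers(len(one_period))            # random starting phase (assumed interpretation)
    x = one_period[(start + np.arange(n_samp)) % len(one_period)]
    n_ramp = int(round(ramp_s * fs))
    window = np.ones(n_samp)
    window[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)  # linear onset
    window[-n_ramp:] = np.linspace(1.0, 0.0, n_ramp) # linear offset
    return x * window


pos, n_harm = schroeder_period(50, +1)
neg, _ = schroeder_period(50, -1)
token = schroeder_token(100, +1, np.random.default_rng(0))
```

Each interval in a trial would call `schroeder_token` independently, so successive tokens start at different points in the period.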

Analysis

For each fundamental frequency, the scores from the six blocks were averaged to estimate percent correct at that fundamental frequency (hereafter the 50-, 100-, 200-, and 400-Hz scores). Additionally, an overall average percent correct (AVG) score was computed by averaging the scores across all four fundamental frequencies and all six blocks. Finally, the fundamental frequency at which the fitted psychometric function reached 75% correct (the maximum-likelihood threshold fundamental frequency, hereafter MLTF) was determined using a maximum-likelihood fit, as described below.

The difference between normal-hearing listeners and CI users was analyzed using a 4 (fundamental frequency) × 6 (repetition) repeated-measures ANOVA. In addition, for each of the six candidate measures of performance (50-Hz, 100-Hz, 200-Hz, 400-Hz, AVG, and MLTF), a nonparametric one-way ANOVA (Kruskal–Wallis test) was used to assess differences between the two groups. Learning within a test session was assessed using the same 4 × 6 repeated-measures ANOVA. Test–retest reliability (for the subset of 18 CI users who retook the full six-block test) was assessed using a nonparametric Wilcoxon signed-rank test. Correlations between the Schroeder-phase discrimination test and the other tests in the battery were assessed using both Pearson’s linear correlation coefficient and Spearman’s rank correlation coefficient. Multiple regression and partial correlation were applied to determine whether the combination of Schroeder-phase discrimination scores and spectral ripple scores could predict performance on speech and music perception tests better than either alone. Adjusted R² values were calculated for the multiple regressions, and only regressions with significant coefficients were reported. Partial correlations controlling for the effect of the spectral ripple test were reported if the Pearson correlation coefficients were significant both before and after control.
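With a single control variable, the partial correlation has a closed form in terms of the three pairwise Pearson coefficients; equivalently, it is the correlation between the residuals left after linearly regressing each variable on the control. The Python sketch below implements both forms and checks that they agree. The roles of x (a Schroeder-phase measure), y (an outcome such as CNC score), and z (spectral ripple score) and the synthetic data are hypothetical.

```python
import numpy as np


def partial_corr(x, y, z):
    """r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))."""
    r = np.corrcoef(np.vstack([x, y, z]))
    rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_corr_residuals(x, y, z):
    """Same quantity: correlate residuals of x and y after regressing each on z."""
    design = np.column_stack([np.ones_like(z), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]


# Synthetic, hypothetical scores for 20 listeners (not data from this study).
rng = np.random.default_rng(0)
z = rng.normal(size=20)                        # e.g., spectral ripple threshold
x = 0.3 * z + rng.normal(size=20)              # e.g., a Schroeder-phase score
y = 0.5 * x + 0.4 * z + rng.normal(size=20)    # e.g., CNC word score
```

If z is nearly uncorrelated with both x and y, the partial correlation stays close to the ordinary Pearson r, which is the pattern reported for the spectral ripple control.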

Visual inspection of these and previous results (Dooling et al. 2002) suggests a relationship between fundamental frequency and percent correct discrimination that might follow a monotonic, sigmoid function typical of psychometric functions. Thus, threshold fundamental frequency could be an appropriate measure of performance on the test. A maximum-likelihood fit of a 2AFC psychometric curve was created for the data from each listener according to methods described by Madigan and Williams (1987). Because performance tended to worsen as fundamental frequency increased, and because stimulus frequencies were logarithmically spaced, the negative log of the fundamental frequency was chosen as the independent variable and percent correct discrimination as the dependent variable. A custom MATLAB script found the parameters μ (mean) and σ (standard deviation) of the psychometric curve for each listener by varying these parameters to minimize the negative log likelihood of the data; minimizing the negative log likelihood is equivalent to maximizing the likelihood. Because the 2AFC psychometric function rises from 50% (chance) toward 100%, its midpoint μ corresponds to 75% correct. A profile-likelihood-based 95% confidence interval was then obtained for the estimate of μ (Venzon and Moolgavkar 1988). The mean and its confidence interval were converted back to frequency, giving an estimate of the 75% threshold fundamental frequency and its confidence interval. When the maximum-likelihood fit predicted a threshold outside the 50–400 Hz range, the confidence intervals were prohibitively large, so these thresholds were reported as less than 50 Hz or more than 400 Hz. If, for example, a listener never reached 75% correct at any fundamental frequency, the MLTF was reported as less than 50 Hz. This derived threshold is valid as long as a monotonic relationship exists between fundamental frequency and performance, which was the trend seen in almost all CI users tested.
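A Python sketch of this fitting procedure under stated assumptions: the psychometric function is taken to be a cumulative normal in x = −log10(F0), scaled for 2AFC so that it rises from 50% to 100% and passes 75% at x = μ; the likelihood is binomial over the 6 × 24 = 144 trials per fundamental frequency; and the profile-likelihood interval is found on a grid of μ values. The study's MATLAB script is not reproduced here, and the exact function form used by Madigan and Williams (1987) may differ. The example fits noise-free synthetic counts generated with a hypothetical true threshold of 150 Hz.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.stats import chi2, norm

F0S_HZ = np.array([50.0, 100.0, 200.0, 400.0])


def p_correct(x, mu, sigma):
    """2AFC cumulative-normal psychometric function: 0.5 floor, 0.75 at x == mu."""
    return 0.5 + 0.5 * norm.cdf((x - mu) / sigma)


def neg_log_lik(mu, log_sigma, x, k, n):
    p = np.clip(p_correct(x, mu, np.exp(log_sigma)), 1e-9, 1 - 1e-9)
    return -np.sum(k * np.log(p) + (n - k) * np.log(1 - p))


def fit_mltf(k, n, f0s=F0S_HZ):
    """Return (MLTF in Hz, (lower, upper) 95% profile-likelihood interval in Hz)."""
    x = -np.log10(f0s)                      # performance should rise with x
    best = minimize(lambda th: neg_log_lik(th[0], th[1], x, k, n),
                    x0=[x.mean(), np.log(0.3)], method="Nelder-Mead",
                    options={"xatol": 1e-7, "fatol": 1e-10, "maxiter": 5000})
    mu_hat, nll_min = best.x[0], best.fun
    # Profile likelihood: for each fixed mu, minimise the negative log likelihood over sigma.
    grid = np.linspace(mu_hat - 1.0, mu_hat + 1.0, 401)
    profile = np.array([
        minimize_scalar(lambda ls: neg_log_lik(m, ls, x, k, n),
                        bounds=(np.log(1e-3), np.log(5.0)), method="bounded").fun
        for m in grid])
    inside = grid[profile - nll_min <= chi2.ppf(0.95, df=1) / 2]
    # Larger x means lower F0, so the interval bounds swap on conversion back to Hz.
    return 10 ** -mu_hat, (10 ** -inside.max(), 10 ** -inside.min())


def report(mltf_hz):
    """Apply the reporting rule for thresholds outside the tested range."""
    if mltf_hz < 50:
        return "<50 Hz"
    if mltf_hz > 400:
        return ">400 Hz"
    return f"{mltf_hz:.0f} Hz"


# Noise-free synthetic counts: true threshold 150 Hz, sigma 0.2 (hypothetical).
n = np.full(4, 144.0)
k = n * p_correct(-np.log10(F0S_HZ), -np.log10(150.0), 0.2)
mltf, ci = fit_mltf(k, n)
```

With real data, `k` would be each listener's correct counts per fundamental frequency summed over the six blocks; the interval is truncated at the grid limits if the likelihood is very flat.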

RESULTS

A total of 24 CI users and 7 normal-hearing listeners completed the Schroeder-phase discrimination test. The full test took 40 to 50 min, including breaks between the six blocks. Figure 2 shows how the six measures of performance on the test were obtained. For each fundamental frequency, the scores from the six test blocks were averaged to obtain the 50-, 100-, 200-, and 400-Hz scores. The overall performance on the test, designated the AVG score, was obtained by averaging across all fundamental frequencies. Lastly, the scores for each block and fundamental frequency were used to fit a maximum-likelihood psychometric curve, yielding an estimate of the fundamental frequency at which each listener reached 75% correct discrimination (the MLTF).

FIG. 2

An example graph demonstrating the six candidate measures of performance on the Schroeder-phase discrimination test. Filled circles represent percent correct discrimination in each of the six test blocks. Gray bars represent the mean across all six test blocks for each fundamental frequency, and AVG denotes the overall score across all fundamental frequencies. The dotted line represents the maximum-likelihood fit of a psychometric curve to the data from the four fundamental frequencies. The open square is the maximum-likelihood predicted 75% threshold fundamental frequency (MLTF), and the horizontal line is the 95% confidence interval for the MLTF estimate.

General trends

Visualization of the data for all normal-hearing listeners and CI users in the study reveals some noteworthy trends (Fig. 3). Normal-hearing listeners appear to perform fairly uniformly on this test, with most performing nearly perfectly at all but the highest fundamental frequency. In contrast, there is a broad range of performance among CI users, with some failing to score much above chance (e.g., S0015J and S0034K) and others scoring nearly as well as normal-hearing listeners (e.g., S0000C and S0032A). Performance across the population worsens with increasing fundamental frequency, and the psychometric curve fit appears to approximate the results for many normal-hearing listeners and CI users, but some CI users have results that appear nonmonotonic (e.g., S0029D). The appropriateness of assuming a monotonic decrease in performance with fundamental frequency was assessed using chi-squared tests with correction for multiple comparisons. Only 2 of the 24 CI users had at least one instance in which the score at a higher fundamental frequency was significantly higher (p < 0.01) than the score at a lower fundamental frequency. By contrast, 23 of the 24 had at least one low-fundamental-frequency score that was significantly higher (p < 0.01) than a score at a higher fundamental frequency, showing that, for nearly all CI users, scores tend to decrease with increasing fundamental frequency.
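One way to implement this monotonicity check, sketched in Python: for each listener, compare every pair of fundamental frequencies with a 2 × 2 chi-squared test on correct and incorrect counts out of 144 trials, apply a Bonferroni correction across the six pairs, and flag any significant pair in which the higher fundamental frequency scored better. The Bonferroni choice, the Yates continuity correction (SciPy's default for 2 × 2 tables), and applying the 0.01 criterion after correction are assumptions; the text does not specify them. The counts in the example are hypothetical.

```python
import itertools

import numpy as np
from scipy.stats import chi2_contingency


def monotonicity_checks(correct, n=144, alpha=0.01):
    """correct: counts ordered by increasing F0. Returns (any_reversal, any_decrease)."""
    correct = np.asarray(correct)
    pairs = list(itertools.combinations(range(len(correct)), 2))  # (lower F0, higher F0)
    m = len(pairs)
    reversal = decrease = False
    for i, j in pairs:
        table = [[correct[i], n - correct[i]], [correct[j], n - correct[j]]]
        _, p, _, _ = chi2_contingency(table)
        if p * m < alpha:                       # Bonferroni-corrected significance
            if correct[j] > correct[i]:
                reversal = True                 # higher F0 scored significantly better
            else:
                decrease = True                 # lower F0 scored significantly better
    return reversal, decrease


monotone = monotonicity_checks([140, 130, 90, 75])
nonmonotone = monotonicity_checks([80, 130, 90, 75])
```

A listener would count toward the "2 of 24" if the first flag is set and toward the "23 of 24" if the second is set.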

FIG. 3

A summary of data from both normal-hearing listeners (filled gray axes) and CI users (white axes) showing scores for each of the six test blocks (filled circles) and the maximum-likelihood fit of a psychometric curve (dotted line). The predicted MLTF (squares) and its 95% confidence interval (solid horizontal line) are shown. Triangles indicate that the predicted MLTF for that listener lies below 50 Hz (left triangle) or above 400 Hz (right triangle).

Average CI user performance

Average scores for all CI users are reported in Figure 4 (dark gray bars). The average CI user’s performance for 50, 100, 200, and 400 Hz was 84, 80, 67, and 58%, respectively. A two-way repeated-measures ANOVA (4 × 6, fundamental frequency by test-block repetitions) revealed a strong effect of fundamental frequency [F(3,26) = 47.9, p < 0.001], and post hoc analyses revealed that two fundamental frequency pairs (50 and 100 Hz and 200 and 400 Hz) were not significantly different from each other, but all other fundamental frequency pairings were significantly different (p < 0.01). All of these scores were significantly higher than chance performance on a binomial test with 144 repetitions, suggesting that this fundamental frequency range covers the performance range typical for CI users.
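The chance comparison can be sketched in Python with an exact binomial test, using 6 blocks × 24 trials = 144 trials per fundamental frequency and a 50% guessing rate in the 2AFC task. The one-sided alternative and the 0.05 criterion are assumptions. The example checks the lowest mean CI score (58%, at 400 Hz).

```python
from scipy.stats import binomtest

N_TRIALS = 6 * 24   # six test blocks x 24 trials per fundamental frequency


def above_chance(percent_correct, n=N_TRIALS, alpha=0.05):
    """One-sided exact binomial test against 50% correct (2AFC guessing)."""
    k = round(percent_correct / 100 * n)
    p_value = binomtest(k, n, p=0.5, alternative="greater").pvalue
    return p_value < alpha, p_value


significant, p_value = above_chance(58)
```

At 58% (84 of 144 trials), the one-sided p value falls below 0.05, consistent with the statement that all four mean scores exceed chance.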

FIG. 4

A comparison of scores on the six measures of Schroeder-phase discrimination test performance between seven normal-hearing listeners (NH, light gray bars) and 24 cochlear implant users (CI, dark gray bars). The left axis applies to the 50-Hz, 100-Hz, 200-Hz, 400-Hz, and AVG scores, and the right axis applies to the maximum-likelihood threshold fundamental frequency (MLTF). p values for a Kruskal–Wallis nonparametric test of significant difference appear above each pair of bars.

Comparison of normal-hearing listeners and CI users

A comparison between normal-hearing listeners’ and CI users’ performance is also shown in Figure 4. In contrast to CI users, normal-hearing listeners had 50-, 100-, 200-, and 400-Hz scores of 97, 97, 96, and 66%, respectively, showing a significant decrease in performance only at the highest fundamental frequency tested. Overall performance on the test (the AVG score) was 72% for CI users, ranging from 51 to 91%, whereas normal-hearing listeners had a mean AVG score of 89%, ranging from 80 to 95%, consistent with the observation that normal-hearing listeners' performance was less variable. The average predicted threshold fundamental frequency (MLTF) for 75% correct was 352 Hz in normal-hearing listeners and 116 Hz in CI users. A nonparametric Kruskal–Wallis test showed that normal-hearing listeners performed significantly better (p < 0.03) than CI users on all measures of performance. The individual p values for these differences are reported above each pair of bars in Figure 4.

Learning effects and test–retest reliability

The 4 × 6 (fundamental frequency by test block repetition) two-way repeated-measures ANOVA revealed no significant improvement in scores across the six test blocks [F(5,24) = 0.99, p = 0.44]. The test was also repeated by 18 CI users on a second day. Figure 5 shows the reliability data; no significant learning effects are evident. A nonparametric Wilcoxon signed-rank test revealed no significant difference between scores from the first and second administrations (p > 0.05); thus, test–retest reliability was good. Five of the 18 CI users showed a significant (p < 0.05) improvement, whereas only 1 of 18 performed significantly worse, suggesting that a minority of CI users might improve with repeated testing even though the average CI user showed no significant learning.

FIG. 5

An analysis of test–retest reliability showing the change in score from the first to the second test for all six Schroeder-phase discrimination measures. p values for a Wilcoxon signed-rank test appear above each bar pair.

Correlations with demographics and other tests

The six candidate measures of performance were compared with clinical parameters and with performance on other tests. To detect both linear and monotonic nonlinear relationships between variables, both Pearson’s correlation coefficient and Spearman’s rank correlation coefficient were computed. Because the Spearman coefficients identified the same significant correlations as the Pearson coefficients, only the Pearson coefficients are reported. No trend toward correlation (p < 0.15) was found between any of the six measures of performance on the Schroeder-phase discrimination test and age, duration of deafness, or duration of implantation. Comparisons with other tests were performed for the 20 listeners who completed all of the other tests. When scores from each listener’s first completion of the tests were compared, no significant correlations were seen between the Schroeder-phase test and the timbre test, the spectral ripple test, or the spondee reception threshold in babble. However, several other correlations were seen, as summarized in Table 2, where significant (p < 0.05) correlations are presented in bold and trends toward correlation (p < 0.15) in italics. The CNC monosyllabic word score was significantly (p < 0.05) correlated with the 50-Hz, AVG, and MLTF scores (R = 0.52, 0.47, and 0.49, respectively), and CNC scores showed a trend toward correlation with the 100- and 400-Hz scores. The melody recognition score showed trends toward correlation with the 100-Hz, AVG, and MLTF scores. The 400-Hz score was negatively correlated (p < 0.04) with the pitch-direction threshold at the 262-Hz base frequency and with the average of the four pitch-direction thresholds (R = −0.53 and −0.46, respectively). Finally, the SRT in steady-state speech-shaped noise was significantly correlated with the 200-Hz Schroeder-phase discrimination score (R = 0.48, p < 0.03).

Table 2 Correlations of Schroeder-phase test measures with other psychophysical tests

It has previously been shown that the spectral ripple test correlates significantly with speech understanding in quiet and noise (Henry and Turner 2003; Henry et al. 2005; Won et al. 2007). To determine the extent to which Schroeder-phase discrimination correlates with other tasks independently of spectral resolution, we repeated the above correlation analyses as partial correlations controlling for the contribution of the spectral ripple test. This partial correlation analysis (Table 3) changed the correlations in Table 2 only slightly, modifying R values by no more than 0.07 and p values by no more than 0.05. None of the significant correlations were strongly affected, suggesting that the capacity of the Schroeder-phase measures to predict CNC scores, speech reception threshold in noise, and pitch-direction threshold was independent of performance on the spectral ripple test.

Table 3 Partial correlations controlling for predictive effect of spectral ripple test results

A complementary analysis was performed to determine whether combining the spectral ripple test with any of the Schroeder-phase scores could predict performance on music and speech perception tests better than either alone. To count as a valid joint prediction, the adjusted R² had to exceed the R² of both the Schroeder-phase measure and the spectral ripple test alone, p had to be less than 0.05, and the regression coefficients for both the spectral ripple test and the Schroeder-phase measure had to be nonzero with 95% confidence. The combination of the 400-Hz Schroeder-phase discrimination score and the spectral ripple test score predicted the average pitch-direction threshold across the four base frequencies (adjusted R² = 0.42, compared with R² = 0.29 and 0.21 for the spectral ripple and 400-Hz scores alone, respectively). That is, 42% of the variance in mean pitch discrimination ability was accounted for by the combination of the 400-Hz Schroeder-phase discrimination score and spectral ripple discrimination. For comparison, this adjusted R² corresponds to a Pearson correlation coefficient of R = 0.64.
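For reference, the adjusted R² penalizes the ordinary R² for the number of predictors p given n listeners: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The Python sketch below fits an ordinary-least-squares model with an intercept; the predictor names and the synthetic data are hypothetical, not the study's scores.

```python
import numpy as np


def adjusted_r2(y, predictors):
    """OLS with intercept; returns (R^2, adjusted R^2)."""
    X = np.column_stack(predictors)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2, 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical data: pitch threshold predicted from a 400-Hz score and a ripple score.
rng = np.random.default_rng(3)
sp400 = rng.uniform(50, 70, size=20)
ripple = rng.uniform(0.5, 4.0, size=20)
pitch = 8 - 0.08 * sp400 - 0.6 * ripple + rng.normal(scale=0.8, size=20)
r2_joint, adj_joint = adjusted_r2(pitch, [sp400, ripple])
r2_ripple, _ = adjusted_r2(pitch, [ripple])
```

Because adding a predictor can never lower the ordinary R², comparing the joint model's adjusted R² with each single-predictor R² is the more conservative test of a genuine joint contribution.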

DISCUSSION

CI users discriminated positive and negative Schroeder-phase stimuli above chance but, on average, performed significantly worse than normal-hearing listeners at all fundamental frequencies. As with normal-hearing listeners, CI users’ performance tended to worsen with increasing fundamental frequency. As noted in the Introduction, pulsatile stimulation strategies do not explicitly encode TFS within each channel, so CI users were expected to perform worse than normal-hearing listeners on a task intended to measure TFS discrimination. However, as seen in Figure 1, positive and negative Schroeder-phase stimuli produce different between-channel patterns, even when processed through a common pulsatile stimulation strategy. In each cycle of the positive Schroeder stimulus, the envelope packets sweep downward in frequency across channels over one period; for the negative Schroeder stimulus, they sweep upward over the same period. A correlation analysis of the within- vs. between-channel differences in the ACE® current-level output for the two Schroeder-phase complexes was completed using a MATLAB toolbox from Cochlear. (Seventy-five percent of the listeners used ACE® processing; the others used other envelope-based, pulsatile strategies.) For the within-channel correlation, correlations between the outputs of single channels for the positive- and negative-phase Schroeder complexes were calculated and averaged. For the between-channel correlation, correlations among the outputs for all combinations of different channels were determined. The analysis revealed average within-channel correlations of 0.99 for the 50-, 200-, and 400-Hz complexes, but between-channel correlations of 0.53, 0.85, and 0.99 for the 50-, 200-, and 400-Hz complexes, respectively. Thus, at 50 and 200 Hz the between-channel differences in the pulsatile stimulation are much larger than the within-channel differences, whereas at 400 Hz both correlations are near 1.

If CI users are using these between-channel differences to discriminate Schroeder-phase complexes, variance in clinical stimulation and processing rates along with the variability in sensitivity to between-channel timing differences (Carlyon et al. 2000) could help to explain the wide range of performance seen on this test compared to normal-hearing listeners. At least some CI users are sensitive to phase shifts of pulse trains on pairs of channels when the shifts are on the order of a few milliseconds (Tong and Clark 1986; Carlyon et al. 2000), although it may be necessary that the channels are less than some critical distance apart (McKay and McDermott 1996). This critical distance is likely to vary among listeners, another possible cause for wide variability in the data. Taken as a whole, the correlation analysis and previous results strongly suggest that to discriminate positive and negative Schroeder-phase stimuli, CI listeners use between-channel timing differences in the temporal envelopes.

The results of the present study demonstrate that the Schroeder-phase test is a potentially useful measure of performance for CI users. First, the normal-hearing results are consistent with previous studies such as Dooling et al. (2002), whose change in performance with frequency was similar to the current results, showing a decrease in performance above 200 Hz (see Fig. 5 and Dooling et al. 2002). Although Dooling et al. used a detection task on which performance ranged from 0 to 100% correct, midrange performance occurred at a similar frequency in both studies. Second, the results from CI users typically were above chance and spanned a broad range, demonstrating that the test can evaluate a range of CI users’ capabilities. Additionally, test–retest reliability was good. Finally, no significant learning trend was observed, suggesting that the Schroeder-phase test is a potentially useful measure of performance over time or across different sound-processing strategies, such as in a clinical trial.
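Whether an individual score is above chance can be checked with a one-sided exact binomial test. The sketch below is illustrative only: the function name is hypothetical, and the trial count and chance level (0.5 by default, as for a two-alternative task) are assumptions, since the task format is not restated in this section.

```python
from math import comb


def p_above_chance(n_correct, n_trials, chance=0.5):
    """One-sided exact binomial p-value: the probability of scoring at least
    n_correct out of n_trials by guessing with per-trial success rate `chance`.
    """
    if not 0 <= n_correct <= n_trials:
        raise ValueError("n_correct must lie in [0, n_trials]")
    # Sum the binomial probabilities of every score >= n_correct.
    return sum(comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
               for k in range(n_correct, n_trials + 1))
```

For example, 8 correct out of 10 trials at a chance level of 0.5 gives p = 56/1024 ≈ 0.055, which is not significant at the 0.05 level; short runs can therefore fail to confirm performance that is genuinely above chance.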

Another factor supporting the value of the Schroeder-phase test is that AVG, MLTF, and 50-Hz Schroeder-phase discrimination scores correlate significantly with scores on the established CNC-word identification task (Peterson and Lehiste 1962; Thornton and Raffin 1978). Evidence suggests that vowel and consonant recognition are correlated with temporal modulation sensitivity (Cazals et al. 1994; Fu 2002). If the Schroeder-phase test depends in part on sensitivity to temporal modulations, an association between CNC scores and Schroeder-phase measures would be expected. Consonant–vowel transitions require sensitivity to a dynamic spectral change, much as Schroeder-phase discrimination does. The period of the 50-Hz stimuli is 20 ms, and the duration of a typical consonant–vowel transition is about 40 ms (Kewley-Port et al. 1983), so the two events unfold on comparable time scales. The better the temporal resolution, the better the listener will be able to follow the trajectory of the transition. A resolution of 5 ms, for example, would yield about eight samples of the changing spectral profile across a consonant–vowel transition and four across one period of the 50-Hz Schroeder-phase stimulus. Discriminating a dynamic spectral change might also require some spectral resolution; therefore, spectral resolution was evaluated, and a partial correlation controlling for spectral resolution was computed. With this control, the correlation between the 50-Hz Schroeder-phase score and the CNC-word score was unchanged (r = 0.52), suggesting that the element of Schroeder-phase discrimination accounting for 27% of the variance in CNC-word scores (0.52² ≈ 0.27) was temporal.
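The partial correlation described above can be computed with the standard first-order formula, r_xy·z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)), taking x as the 50-Hz Schroeder-phase score, y as the CNC-word score, and z as the spectral-resolution measure. The sketch below implements only that formula; the function name is hypothetical, and because r_xz and r_yz are not reported in this section, any values passed in for them are assumptions.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
    if denom == 0:
        raise ValueError("undefined when |r_xz| or |r_yz| equals 1")
    # Remove the part of r_xy explained by each variable's relation to z.
    return (r_xy - r_xz * r_yz) / denom
```

The partial correlation equals r_xy exactly when z is uncorrelated with both x and y, and squaring r = 0.52 gives the 27% of variance cited above.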

Modest correlations were also found between the 200-Hz Schroeder-phase scores and speech perception in steady-state noise. We speculate that this relationship might reflect the ability of CI users to segregate the target female speech, whose fundamental frequency (and hence a prominent envelope modulation frequency) is approximately 200 Hz, from the steady-state noise.

Correlations were found between the 400-Hz Schroeder-phase score and pitch-direction discrimination at a 362-Hz fundamental, and between the 400-Hz Schroeder-phase score and the average pitch-direction discrimination threshold across all fundamental frequencies tested. The correlation between the 400-Hz score and pitch-direction threshold might be explained by a between-channel theory. In the pitch-direction test, users were asked to discriminate complex tones. Complex tones of differing fundamental frequency generate distinct spectral profiles, which are more discernible with better spectral resolution. However, harmonics falling within the same frequency band produce temporal modulations that beat at the fundamental frequency. Therefore, perception of the pitch of harmonic complexes might reasonably be correlated with temporal modulation sensitivity and, as a result, with Schroeder-phase discrimination scores. Based on the spectral and temporal theories of pitch perception (Wightman and Green 1974), pitch discrimination could be improved by improving either temporal or spectral resolution. Consistent with these theories, pitch-direction thresholds in CI users were predicted jointly by the Schroeder-phase test and the spectral-ripple tests (adjusted R² = 0.42, equivalent R = 0.64). In the case of CI users, the temporal information would be transmitted via temporal envelope modulations rather than TFS.
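The statement that harmonics sharing one frequency band beat at the fundamental can be illustrated numerically. This toy example sums equal-amplitude harmonics, computes their Hilbert envelope with NumPy's FFT (the same construction `scipy.signal.hilbert` uses), and reports the strongest envelope modulation frequency. The function name, harmonic frequencies, sampling rate, and duration are hypothetical choices, and the sketch does not model CI filtering or pulsatile stimulation.

```python
import numpy as np


def envelope_modulation_frequency(freqs_hz, fs_hz=16000, dur_s=0.5):
    """Dominant modulation frequency (Hz) of the Hilbert envelope of a sum of
    equal-amplitude cosines, e.g. two harmonics falling in one CI channel."""
    n = int(round(fs_hz * dur_s))
    t = np.arange(n) / fs_hz
    x = np.sum([np.cos(2 * np.pi * f * t) for f in freqs_hz], axis=0)

    # Analytic signal via FFT: keep DC (and Nyquist for even n), double the
    # positive frequencies, and zero the negative frequencies.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(np.fft.fft(x) * h))

    # Strongest non-DC component of the envelope spectrum.
    env_spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
    env_freqs = np.fft.rfftfreq(n, d=1 / fs_hz)
    return float(env_freqs[np.argmax(env_spectrum)])
```

Two harmonics of a 200-Hz fundamental (1000 and 1200 Hz) yield an envelope modulated at 200 Hz, and the third and fourth harmonics of 362 Hz (1086 and 1448 Hz) yield a modulation at 362 Hz. These frequencies fall exactly on FFT bins for the default settings, so the envelope estimate is free of spectral leakage.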

Finally, no correlation was found between Schroeder-phase discrimination and understanding speech in a two-speaker babble background. As noted in the introduction, identifying speech in fluctuating backgrounds requires the ability to discern TFS (Qin and Oxenham 2003; Füllgrabe et al. 2006). The mechanisms required to do well on Schroeder-phase discrimination are apparently not the same as those required to understand speech in fluctuating backgrounds. Presumably, the speech-in-babble task requires sensitivity to within-channel TFS cues to help segregate the fundamental frequencies of different voices. We speculate that if new processing schemes delivered within-channel TFS information, performance on Schroeder-phase discrimination would improve and a correlation between speech understanding in fluctuating backgrounds and Schroeder-phase discrimination would emerge.

In conclusion, the present study has demonstrated that, although most CI sound-processing algorithms do not explicitly encode TFS, Schroeder-phase discrimination is still possible, given sufficient between-channel differences in temporal modulations. Schroeder-phase discrimination at 50 Hz and AVG performance across all fundamental frequencies have also been shown to correlate with word recognition, and at 200 Hz, Schroeder-phase discrimination was correlated with the ability to understand a female speaker in steady-state, speech-shaped noise. Furthermore, 200- and 400-Hz Schroeder-phase discrimination ability was correlated with pitch-direction discrimination for complex tones with similar fundamental frequencies. These correlations were independent of spectral-ripple discrimination ability, suggesting that independent spectral and temporal processes underlie complex-tone pitch perception in CI listeners.