Assessing temporal modulation sensitivity using electrically evoked auditory steady state responses

Temporal cues are important for cochlear implant (CI) users when listening to speech. Users with greater sensitivity to temporal modulations show better speech recognition, and modifications to stimulation parameters based on modulation sensitivity have resulted in improved speech understanding. Behavioural measures of temporal sensitivity require cooperative participants and a large amount of time. These limitations have motivated the search for an objective measure with which to appraise temporal sensitivity for CI users. Electrically evoked auditory steady state responses (EASSRs) are neural responses to periodic electrical stimulation that have been used to predict threshold (T) levels. In this study we evaluate the use of EASSRs as a tool for assessing temporal modulation sensitivity. Modulation sensitivity was assessed behaviourally using modulation detection thresholds (MDTs) for a 20 Hz rate. On the same stimulation sites, EASSRs were measured using sinusoidally amplitude modulated pulse trains at 4 and 40 Hz. Measurements were taken using a bipolar configuration on 12 electrode pairs over 5 participants. Results showed that EASSR amplitudes and signal-to-noise ratios (SNRs) were significantly related to the MDTs. Larger EASSRs corresponded with sites of improved modulation sensitivity. This relation was driven by across-subject variation. This result indicates that EASSRs may be used as an objective measure of site-specific temporal sensitivity for CI users.


Introduction
Cochlear implant (CI) recipients often understand speech well in quiet conditions, but in difficult listening environments their performance worsens and becomes variable. Pre-, per- and postoperative factors account for 22% of this variance (Lazard et al., 2012). A proposed cause for some of the remaining variability in performance is perceptual variance along the tonotopic axis caused by the quality of each electrode neuron interface (ENI) (Pfingst et al., 2008; Bierer and Faulkner, 2010). Reducing these perceptual variations, by adjusting the stimulation parameters of individual sites, has been suggested as a means for improving speech performance (Zwolan et al., 1997; Pfingst et al., 2008).
The ENI affects the ability of an implanted electrode to transmit information to the auditory nerve. Ideally the electrode lies close to the modiolus, which should contain a full complement of spiral ganglion cells (SGCs) (Long et al., 2014). Variations in electrode placement, tissue growth and local degeneration of SGCs will cause variations in the ENI and differences in the perception of both spectral and temporal cues.
To account for individual variation along the implanted array, every device is fitted by an audiologist. The fitted parameters for each electrode include the threshold level (T) and comfort level (C). The parameters are stored in the device, and referred to as the MAP.
Commercial CIs transmit both spectral and temporal cues (Xu et al., 2005). Spectral information is predominantly transmitted through the location of stimulated electrodes, and is distorted by current spread and loss of SGCs. Using a focused tripolar mode, stimulation sites with high T levels have been related to broad psychophysical tuning curves which may indicate dead regions in electrical hearing (Bierer and Faulkner, 2010). High variability in T levels across electrodes negatively affects speech performance (Pfingst et al., 2004;Bierer, 2007;Long et al., 2014), possibly due to distortion of the internal representation of the spectrum of the signal.
Compared to normal hearing listeners, CI recipients have reduced access to spectral cues (Friesen et al., 2001). This places increased importance on temporal sensitivity, which is commonly assessed using modulation detection thresholds (MDTs). MDTs of CI users have been related to consonant and vowel recognition (Fu, 2002) and to word recognition and speech reception thresholds (SRTs) (Won et al., 2011).
The quality of the ENI varies uniquely along each implanted array. Altering the MAP based on the performance of each ENI has resulted in improved speech performance. Site-specific adjustments have been made using a variety of selection criteria and adjustment methods. Zwolan et al. (1997) used 200 ms pulse trains to determine which channels along the electrode array could be discriminated from each other. Channels that were indiscriminable from each other were deactivated. With this altered MAP, seven of nine subjects improved in at least one speech recognition measure. Garadat et al. (2012) used masked MDT performance to create two 10-channel MAPs, one with good across-site mean MDT performance and one with poor performance. The MDTs were determined in the presence of an interleaved masker on the adjacent apical site. The array was divided into five sections of four electrodes. In each MAP two electrodes were retained from each section. One MAP retained two electrodes per section that exhibited the best masked MDTs and the second MAP retained the other two electrodes per section. MAPs with better across-site mean MDTs resulted in better speech recognition. Garadat et al. (2013) extended the previous study by creating a MAP for each participant that improved the mean modulation sensitivity while only removing five electrodes. They endeavoured to remove sites in a distributed fashion across the array, but did not always remove electrodes from all regions of the array. The frequency allocation was redistributed across the remaining electrodes. The modified MAP resulted in a mean SRT improvement of 2 dB over the clinical map and led to better performance than the clinical map for consonant recognition but not for vowel recognition. Zhou and Pfingst (2014) increased the T level of the five electrodes with the poorest MDTs. The T level was increased to artificially increase the loudness of the channel, which improves modulation sensitivity. This adjustment resulted in a mean SRT improvement of 2.4 dB.
Psychophysical evaluation of stimulation sites has illustrated the potential benefit of site-specific adjustments, but these behavioural measures are not always clinically feasible due to their extensive testing time and need for a cooperative participant. Objective measures based on evoked potentials offer the possibility of fast automated evaluation of stimulation sites. Electrically evoked auditory brain stem responses (EABRs) have been used to predict high thresholds and thus sites with poor spectral sensitivity (Bierer et al., 2011;Brown et al., 1990). But neither EABRs nor electrically evoked compound action potentials have shown clinically useful correlations with speech perception tasks or temporal sensitivity (Miller et al., 2008). Here we propose electrically evoked auditory steady state responses (EASSRs) as a measure of site specific temporal sensitivity.
EASSRs can be measured for CI recipients. These recordings are distorted by artifacts from radio frequency transmission and electrical stimulation. Removal of these artifacts (Hofmann and Wouters, 2010) has allowed prediction of behavioural thresholds at clinically relevant pulse rates (Hofmann and Wouters, 2012). We hypothesise that stimulation sites with increased neural responses to modulated auditory input will correspond to sites with improved modulation sensitivity. Thus EASSRs will provide an objective method for assessing the temporal sensitivity of cochlear implant stimulation sites.

Participants
Five native Flemish-speaking participants volunteered for this experiment. All participants were CI patients of the ENT Department at the UZ Leuven University Hospitals. The details of each participant are included in Table 1, including their word recognition in sentences as evaluated using the LIST sentences. Testing was approved by the Medical Ethics Committee of the UZ Leuven (approval number B32220072126) and informed consent was obtained.

Experiment
Each participant took part in four sessions, each lasting between two and four hours. The measures in successive sessions were: (1) T and C level measurements and loudness balancing, (2) EASSR measurements, (3) loudness balancing for the modulation detection task and (4) modulation detection task. T and C levels were checked again in sessions 2 to 4.
All stimuli consisted of symmetric biphasic pulse trains with 60 µs phase width and 8 µs inter-phase gap, presented at a rate of 900 pulses per second in bipolar (BP) mode (for further information on the choice of stimulation parameters, see Section 4). All stimuli were delivered using the Cochlear Nucleus Implant Communicator (NIC). Bipolar stimulation was used as it may stimulate a more localised region of the cochlea than monopolar stimulation (Snyder et al., 2008; Kwon & van den Honert, 2006). Pulse polarity is described relative to the more apical electrode. Cathodic-first stimulation is defined as a biphasic pulse with the negative phase first. All psychophysical modulation detection tasks were conducted using cathodic-first stimulation. EASSR recordings were obtained using both polarity configurations. This was done so that a comparison could be made to previously published EASSR results (Hofmann and Wouters, 2012). All stimulus magnitudes are reported in Cochlear clinical current units (cu), which are logarithmically related to current in amperes. For the CIC3 implant, the conversion from cu to current is $i = 10 \times 10^{-6} \times 175^{\mathrm{cu}/255}$ A, and for the CIC4 implant $i = 17.5 \times 10^{-6} \times 100^{\mathrm{cu}/255}$ A.
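The cu-to-current conversion above can be sketched in Python as follows. This is only an illustration: it assumes the formulas give current in amperes (that is, 10 µA and 17.5 µA at 0 cu), and the function names `cu_to_amperes` and `cu_step_db` are hypothetical and not part of NIC or any Cochlear software.

```python
import math

# (current at 0 cu in amperes, growth base over 255 cu), taken from the formulas in the text.
# Assumption: the formulas yield amperes.
IMPLANT_PARAMS = {"CIC3": (10e-6, 175.0), "CIC4": (17.5e-6, 100.0)}


def cu_to_amperes(cu, implant="CIC3"):
    """Convert clinical current units (cu) to current in amperes."""
    base_current, growth = IMPLANT_PARAMS[implant]
    return base_current * growth ** (cu / 255.0)


def cu_step_db(implant="CIC3"):
    """Size of a 1 cu step in dB, i.e. 20*log10 of the current ratio for +1 cu."""
    _, growth = IMPLANT_PARAMS[implant]
    return 20.0 * math.log10(growth) / 255.0


for implant in IMPLANT_PARAMS:
    print(implant, cu_to_amperes(255, implant), cu_step_db(implant))
```

Because the scale is logarithmic, a fixed difference in cu corresponds to a fixed ratio of currents, which is why level differences in cu can be expressed in dB.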
Stimuli were presented on three electrode pairs for each participant. These electrode pairs were spaced along the array to excite basal, middle and apical regions along the cochlea. All participants used BP+2 mode except E9, who was unable to perceive any stimulus in this mode (Table 1). This participant had the mode changed to BP+5, for which T and C levels were reached on the basal and middle electrode pairs. On the apical electrode, participant E9 did not reach a comfortable percept and this site was excluded. The dynamic range (DR) is defined as the difference between C and T levels. Two electrode pairs were excluded (participants E1 and E2) because the local variation in T and C levels of the clinical monopolar MAP was more than half of the mean DR of these electrodes. The T levels of all included sites varied by <40% of mean DR. Both excluded sites were basal pairs and the T level varied by >69% of the mean DR. Greater differences were assumed to be a sign of highly varying ENI conditions, which cannot be unambiguously assessed with the currently available spatially wide stimulation patterns of bipolar stimuli.

Threshold, comfort & stimulation levels
Threshold levels for unmodulated stimuli ($T_u$) were determined for each stimulation site using an adaptive three-alternative forced-choice procedure. In each trial, three intervals of 500 ms separated by 500 ms of silence were presented. During each interval, a corresponding button on the computer screen changed color. One randomly selected interval contained an unmodulated pulse train with current that varied on each trial according to an adaptive two-down one-up rule (Levitt, 1971). No stimulus was presented in the other intervals. Participants were asked to identify the interval in which they heard a sound. After the participant indicated which interval contained the stimulus, the next trial started. Six reversals were obtained for each run. The step size was 4 cu until the first two reversals occurred and was 2 cu thereafter. The staircase was visually checked for convergence and the cu values at the last four reversals were averaged and recorded as the threshold. This procedure was completed three times for each site, and results were averaged for each site to determine $T_u$.
The comfort level was defined as the maximum level the user judged they could listen to continuously for 3 h. Comfort levels ($C_u$) for unmodulated stimuli were determined by increasing the stimulation level from 0 cu until the participant indicated that the percept was uncomfortably loud. Then the level was decreased until the listener indicated that it was at the comfort level, which was recorded as $C_u$. All stimuli were presented using a research speech processor (L34) provided by Cochlear Ltd. Apex3 was used to determine the unmodulated threshold and comfort levels (Francart et al., 2008).
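The staircase logic of the threshold procedure can be sketched as below. This is a simplified illustration and not the Apex3 implementation: the listener is a hypothetical deterministic function that always answers correctly at or above a made-up threshold of 100 cu, the starting level of 120 cu is arbitrary, and guessing in the three-alternative task is ignored.

```python
def run_staircase(respond, start_level, big_step=4.0, small_step=2.0,
                  n_reversals=6, n_average=4):
    """Two-down one-up staircase; returns the mean level at the last reversals."""
    level = start_level
    correct_in_a_row = 0
    last_direction = 0          # -1 = level last moved down, +1 = up, 0 = not moved yet
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue        # two consecutive correct answers are needed to step down
            direction = -1
        else:
            direction = +1      # a single incorrect answer steps the level up
        correct_in_a_row = 0
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)   # the direction changed at this level
        last_direction = direction
        # 4 cu steps until the first two reversals have occurred, 2 cu afterwards
        step = big_step if len(reversals) < 2 else small_step
        level += direction * step
    return sum(reversals[-n_average:]) / n_average


def deterministic_listener(level, true_threshold=100.0):
    """Hypothetical listener: always correct at or above threshold, never below."""
    return level >= true_threshold


estimated_threshold = run_staircase(deterministic_listener, start_level=120.0)
print(estimated_threshold)
```

With a real listener the responses near threshold are probabilistic, so repeating the run and averaging, as described above, reduces the variability of the estimate.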

Stimulus loudness balancing
The level of an unmodulated pulse train on the apical electrode was adjusted to be as loud as stimulation at $C_u$ on the middle electrode. This was done using an adjustment procedure consisting of four trials. Each trial continued until the participant indicated that the two sounds were equally loud. In the first trial the listener was repeatedly presented with the target, 1 s of unmodulated pulses at $C_u$ on the middle electrode; silence for 500 ms; the variable stimulus, 1 s of unmodulated pulses on the apical electrode; and 1 s of silence. The listener was asked to adjust the variable stimulus until the two sounds were equally loud. If no adjustment was made the presentation continued at the same values. The variable stimulus was started below the expected balanced value. When the listener indicated that the loudness was equal, the difference in cu was recorded as the trial result. In the second trial, the target was changed to an unmodulated pulse train on the apical electrode at the level from the previous trial. The variable stimulus was an unmodulated pulse train on the middle electrode, with the initial value set below $C_u$. Again the listener had to adjust the variable stimulus until the loudness was equal. In the last two trials the target and variable stimuli were the same as for the first two trials, except that the variable stimulus was started above the expected value. In all trials the participant was encouraged to continue past the point where they first found the loudness equivalent and then turn back if required. The average difference in the levels of the target and variable stimuli of the four trials was used to determine the current required for loudness balanced unmodulated pulse trains at C level ($C_{u,b}$). The same procedure was used to balance the basal electrode to the middle electrode.
The same four-trial procedure was used at each stimulation site to balance the loudness of a 40-Hz modulated signal to $C_{u,b}$. In the first trial of the procedure, the target was an unmodulated pulse train at $C_{u,b}$, and the variable stimulus was a sinusoidally modulated pulse train with current between $T_u$ and $C_{m,b}$. The listener adjusted the value of $C_{m,b}$.
Loudness differences caused by stimulation polarity were removed by balancing opposing polarity configurations. All previous balancing was with cathodic-first stimuli. Modulated cathodic-first stimulation between $T_u$ and $C_{m,b}$ was used as a target, and modulated anodic-first stimuli were balanced using the procedure described above.
All stimuli were presented via the L34. Loudness balancing was performed with custom scripts written in Python interfacing with NIC.

Stimulation
EASSR stimuli were continuous amplitude-modulated pulse trains with modulation frequencies of 3.906 and 40.039 Hz, called for simplicity 4 and 40 Hz. 40 Hz was chosen, as modulation rates in this range give the largest ASSR SNRs (Picton et al., 2003). 4-Hz ASSRs produce smaller SNRs than those in the 40-Hz range, but can still be reliably detected (Alaerts et al., 2009). Both frequencies are within the range of perceptually relevant envelope modulation frequencies for CI users (Stone et al., 2008; Füllgrabe et al., 2009). Furthermore, 4 Hz was selected as it is a prominent frequency in the average envelope of natural speech (Edwards and Chang, 2013). Pulse amplitudes were sinusoidally amplitude modulated in amperes between $T_u$ and $C_{m,b}$.
Each stimulus consisted of 256 epochs, each with a duration of 1.024 s. Stimuli were presented in both anodic- and cathodic-first configurations, resulting in four recordings per stimulation site. Stimuli were grouped by and randomised across frequencies, stimulation sites and polarities.
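The reported modulation frequencies are consistent with fitting a whole number of modulation cycles into each 1.024 s epoch, which would place the response in a single DFT bin. That reading is an inference from the reported values and is not stated explicitly in the text. A minimal Python sketch of the arithmetic, with the hypothetical helper `epoch_locked_frequency`:

```python
EPOCH_DURATION_S = 1.024   # epoch length reported in the text
N_EPOCHS = 256             # epochs per stimulus reported in the text


def epoch_locked_frequency(target_hz, epoch_s=EPOCH_DURATION_S):
    """Return (cycles per epoch, frequency) nearest to target_hz with whole cycles per epoch."""
    cycles = max(1, round(target_hz * epoch_s))
    return cycles, cycles / epoch_s


cycles_4, f_4 = epoch_locked_frequency(4.0)      # 4 cycles per epoch
cycles_40, f_40 = epoch_locked_frequency(40.0)   # 41 cycles per epoch
recording_s = N_EPOCHS * EPOCH_DURATION_S        # duration of one recording in seconds

print(cycles_4, f_4)
print(cycles_40, f_40)
print(recording_s / 60.0, "minutes per recording")
```

Under this assumption the nominal 4 and 40 Hz rates become 4/1.024 Hz and 41/1.024 Hz, and one recording of 256 epochs lasts a little under four and a half minutes.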
Participants lay on a bed or sat in a comfortable chair in a sound-proof and electromagnetically shielded room. Participants were asked to move as little as possible, and to help them remain awake they watched a silent subtitled movie. Stimuli were generated using custom software as described in Hofmann and Wouters (2010) that interfaced with an L34.

Recording and postprocessing
The EEG was measured using a BioSemi 64-channel system with Ag/AgCl active scalp electrodes. Electrodes were placed according to the standard 10-20 positions with the guidance of a headcap. Incoming signals were amplified and sampled at 8192 Hz. EEG data along with synchronisation signals were recorded using the BioSemi ActiView software, for offline processing in MATLAB. An artificial recording was generated for each modulation frequency that merged the two EASSR polarities by concatenating epochs of alternating polarity. This allowed for analysis of the two conditions jointly as well as independently of each other. The EEG recorded during CI stimulation contains artifacts from the RF transmission and electrical stimulation. These can mask the neural responses and must be removed prior to further analysis. CI artifacts were removed using a method adapted from Hofmann and Wouters (2010), but this experiment used a different amplifier with a lower sampling rate that produced longer artifacts.
To remove artifacts, the signal was sampled at one instant between each RF transmission and linearly interpolated between these points. Recordings to BP+2 stimulation contained artifacts that lasted for 700 µs after each stimulation pulse onset. The BP+5 artifact lasted for 1000 µs. For consistent analysis, all artifacts were removed by interpolation at 1050 µs after stimulus pulse onset. The right-most columns of Fig. 4 in Hofmann and Wouters (2010) graphically illustrate the artifact removal technique.
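The interpolation step can be illustrated with a small NumPy sketch. It is not the authors' MATLAB code: the function name `interpolate_between_pulses`, the synthetic 40 Hz "response" and the rectangular artifact model are all assumptions made for the demonstration, and the delay is taken as 1050 microseconds after each pulse onset (the pulse period at 900 pulses per second is about 1.1 milliseconds).

```python
import numpy as np


def interpolate_between_pulses(eeg, fs_hz, pulse_onsets_s, sample_delay_s=1050e-6):
    """Keep one sample per pulse period and linearly interpolate between the kept samples."""
    t = np.arange(eeg.size) / fs_hz
    # Last sample at or before (pulse onset + delay); chosen to fall after the artifact has decayed.
    keep_idx = np.floor((np.asarray(pulse_onsets_s) + sample_delay_s) * fs_hz).astype(int)
    keep_idx = np.unique(keep_idx[(keep_idx >= 0) & (keep_idx < eeg.size)])
    return np.interp(t, t[keep_idx], eeg[keep_idx])


# Synthetic demonstration (all values hypothetical except the sampling and pulse rates).
fs_hz = 8192.0
pulse_rate_hz = 900.0
t = np.arange(8192) / fs_hz                          # one second of samples
clean = np.sin(2 * np.pi * 40.0 * t)                 # stand-in for a slow neural response
pulse_onsets_s = np.arange(900) / pulse_rate_hz
latest_onset = pulse_onsets_s[np.searchsorted(pulse_onsets_s, t, side="right") - 1]
artifact = 100.0 * ((t - latest_onset) < 700e-6)     # large artifact for 700 microseconds after each pulse
contaminated = clean + artifact
cleaned = interpolate_between_pulses(contaminated, fs_hz, pulse_onsets_s)

interior = slice(20, -20)                            # np.interp holds end values constant at the edges
print(np.max(np.abs(contaminated - clean)), np.max(np.abs(cleaned[interior] - clean[interior])))
```

The sketch also shows the trade-off discussed later in the paper: the kept sample must lie after the artifact has decayed but before the next pulse, so longer artifacts or faster pulse rates leave less room for it.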
After CI artifact rejection, a second-order Butterworth high-pass filter attenuated frequencies below 2 Hz. To remove muscle potentials and other recording artifacts, the 10% of epochs with the highest peak-to-peak amplitude were rejected. To reduce variability and the influence of spontaneous neural activity, measurements from a subset of recording electrodes were averaged to give a single signal. Electrodes that were found to have significant responses as determined by a one-sample Hotelling $T^2$ test (Hofmann and Wouters, 2012) in more than 50% of recordings across subjects and stimulus electrodes were included in the average. Electrodes in the mirror image position of the head to those with 50% significant responses were also included. The selected electrodes were P9, P10, PO7, PO8, PO3, PO4, CP5, CP6, O1, O2, Oz, Iz and Pz. For each participant, any electrode that was touching the CI coil was removed from analysis. In all recordings, the phases of all selected electrodes were within ±30° of each other, ensuring there was no destructive interference. All results were calculated with Cz as the reference electrode.
A discrete Fourier transform was used to compute the complex frequency spectrum for each epoch. The response amplitude and phase were calculated as the mean of the response bin across epochs. Signal power was defined as the amplitude squared. The noise power was defined as the standard deviation of the response bin amplitude across epochs squared. The SNR was defined as the signal power in the frequency bin centred at the modulation frequency divided by the noise power in the same bin. A one-sample Hotelling $T^2$ test, as used by Hofmann and Wouters (2012), was applied to determine whether EASSRs were present in the recorded signal.
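The amplitude, SNR and detection statistics described above can be sketched with NumPy as follows. Several choices here are assumptions rather than the authors' implementation: the data are synthetic, the 1000 Hz sampling rate is hypothetical (chosen so that an epoch of 1.024 s holds 1024 samples), the noise power is taken as the variance of the complex bin values across epochs, and the Hotelling $T^2$ test is applied to the real and imaginary parts of the response bin, using the closed-form tail of the F distribution with 2 numerator degrees of freedom.

```python
import numpy as np


def analyse_response_bin(epochs, bin_index):
    """Amplitude, phase, SNR and Hotelling T^2 p-value for one DFT bin across epochs."""
    n_epochs, n_samples = epochs.shape
    # Complex amplitude per epoch; the 2/N scaling makes a unit sinusoid give magnitude 1.
    bin_values = np.fft.rfft(epochs, axis=1)[:, bin_index] * 2.0 / n_samples
    mean_response = bin_values.mean()
    amplitude = np.abs(mean_response)
    signal_power = amplitude ** 2
    noise_power = np.var(bin_values, ddof=1)      # assumption: variance of the complex bin values
    # One-sample Hotelling T^2 on (real, imag) against a zero mean.
    xy = np.column_stack([bin_values.real, bin_values.imag])
    mean_xy = xy.mean(axis=0)
    cov = np.cov(xy, rowvar=False)
    t2 = n_epochs * (mean_xy @ np.linalg.solve(cov, mean_xy))
    dof = n_epochs - 2
    f_stat = dof / (2.0 * (n_epochs - 1)) * t2    # follows F(2, dof) when no response is present
    p_value = (1.0 + 2.0 * f_stat / dof) ** (-dof / 2.0)   # exact upper tail of F(2, dof)
    return {"amplitude": amplitude, "phase": np.angle(mean_response),
            "snr": signal_power / noise_power, "p_value": p_value}


# Synthetic demonstration: unit-amplitude response with 41 cycles per epoch in white noise.
rng = np.random.default_rng(0)
fs_hz, n_samples, n_epochs, cycles = 1000.0, 1024, 32, 41
t = np.arange(n_samples) / fs_hz
response = np.sin(2 * np.pi * cycles / (n_samples / fs_hz) * t)
epochs = response + rng.normal(0.0, 1.0, size=(n_epochs, n_samples))
result = analyse_response_bin(epochs, bin_index=cycles)
print(result)
```

Because the modulation frequency completes a whole number of cycles per epoch, the response falls in one bin (index 41 here) and no windowing is needed in this sketch.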

Modulation detection
Modulation detection was assessed at 50% DR. Target stimuli were cathodic-first pulses amplitude modulated in cu at 20 Hz. The MDT studies mentioned in section 1 used a variety of different modulation rates including 10 Hz (Garadat et al., 2012;Zhou and Pfingst, 2014), 40 Hz (Pfingst et al., 2008) and 100 Hz (Fu, 2002). 20 Hz was used in this experiment as it was between the two EASSR rates and within the range used in previous CI studies. Modulation depth was defined as the peak-to-valley difference in cu. Measurements were taken at modulation depths of 1, 5, 10, 20 and 30 cu and 100% DR. If additional time was available, extra depths were included. If the peak-to-valley modulation depth was greater than 100% DR, then these depths were excluded. Such a wide range of modulation depths was tested to ensure that the MDT was reached for all participants.
As the loudness of fluctuating sounds in electrical hearing varies with modulation depth and absolute current level (McKay and Henshall, 2010), all modulated stimuli were loudness balanced to an unmodulated pulse train at 50% DR. For each modulation depth, a balancing procedure similar to that described in Section 2.4.1 was used. In the first trial the target was an unmodulated pulse train at 50% of the DR and the variable stimulus was modulated with a 20-Hz sinusoid with constant modulation depth. The listener was able to adjust the level of the entire modulated pulse train, while maintaining a constant modulation depth. Trials two through four followed the process previously described. Additionally, current level jitter of ±4 cu was applied to each stimulus during the MDT task to minimise any residual loudness cues after loudness balancing (Fraser and McKay, 2012).
A three-alternative forced-choice procedure was used to determine the modulation detection performance at each site and modulation depth. Three 500-ms stimuli were presented in each trial, separated by 500 ms of silence. Each stimulus was accompanied by a corresponding button changing color. Two intervals contained unmodulated stimuli, and one randomly selected interval contained the modulated stimulus. Participants were instructed to press the button corresponding to the interval that was different from the other two. 25 trials were performed at each depth and split across two measurement sessions to ensure continued concentration of the participant. Within each session, only a single stimulation site was tested. The order of presented stimulation sites and modulation depths was randomised. A training session of 10 trials at 100% modulation depth was completed for each site with feedback at the start of the experiment. For each stimulation site, psychometric functions were fitted to the percentage of correct responses as a function of modulation depth using the toolbox of Fründ et al. (2011). From the fitted curves the threshold was defined as the mid point between chance (33%) and completely correct (100%). The 66% correct point was extracted as the threshold, converted to ampere and reported in dB.

Statistical analysis
To determine the relation between behavioural MDTs and objective EASSRs, regression analysis was conducted on the twelve paired measurements. MDTs were compared to the SNR and amplitude of the EASSR. Analysis was conducted on the individual modulation frequencies and multiple regression analysis was used to assess the contribution from multiple modulation frequencies. In both the 4- and 40-Hz conditions, all 12 data points were included in the statistical analysis regardless of whether a significant response was detected. Statistical analysis was conducted using the R programming language.

Threshold, comfort and stimulation levels
To elicit an equal loudness percept between unmodulated and modulated stimuli, $C_{m,b}$ was always greater than $C_{u,b}$. The difference varied between 2 cu and 19 cu (mean = 6.5 cu, sd = 4.4 cu). To balance anodic- with cathodic-first modulated stimuli, the additional current units required varied between -4.3 cu and 8.3 cu (mean = 0.5 cu, sd = 3.1 cu). Modulated stimuli at varying depths were loudness balanced to a constant pulse train. Fig. 1 illustrates the additional current units required to elicit an equal loudness percept for each of these conditions.
Fig. 2 shows the estimated EASSR amplitudes for all measurements in bipolar+2 mode when the signal was sampled at different times after stimulation. It can be seen that initially the signal was distorted and the amplitude artificially increased. This artifact peaked around 400 µs after stimulation, and decreased as the sampling time increased to 700 µs. Statistical analysis of the signal sampled before 700 µs incorrectly indicates every recording as containing a significant EASSR response. Similar analysis indicated that the bipolar+5 artifact lasted 1000 µs. All further results are for artifacts removed by interpolating the signal 1050 µs after stimulus onset. To confirm that no residual artifact was present, the false detection rate at neighbouring frequencies was checked to be 5%. Also the response phase was checked, to ensure it was not perfectly aligned with the RF transmission and electrical stimulation.
Fig. 1. Loudness balancing results. Difference in peak cu required to balance: a) unmodulated pulse trains to modulated stimuli with peak-valley modulation depth equal to 100% DR; b) modulated pulse trains at increasing modulation depth to an unmodulated pulse train; c) cathodic-first modulated pulse train to anodic-first modulated pulse train. Symbols indicate the stimulation site. Error bars extend ±1 standard deviation from the mean.

Electrically evoked auditory steady state responses
EASSRs were detected for all participants. For the 10-min measurements, the percentage of significant responses was 83% and 100% for the 4- and 40-Hz measurements, respectively. Measurements longer than 10 min per condition would reduce the noise level and possibly increase the detection rate at 4 Hz. On average, amplitudes were no greater for anodic-first than for cathodic-first stimuli. The difference was not significant for either modulation frequency: 4 Hz, t(11) = 1.05, p = 0.32; 40 Hz, t(11) = -1.11, p = 0.29. As there was no difference in responses for the two polarities, the two signals were merged as described in Section 2.4.3, and analysed as a single recording. All of the following results are for the merged signals.
A summary of the EASSR response and background characteristics is given in Table 2. This information is presented graphically in Fig. 3, which summarises the amplitudes, and Fig. 4, which summarises the phase and SNR for each modulation frequency. Background activity was lower for the higher modulation frequencies, and larger SNRs were found at 40 Hz than at 4 Hz.

Modulation detection
MDTs determined from the fitted psychometric curves (see Fig. 5 for an example) varied between -32 and -12 dB. The accuracy of the MDTs was limited by the number of balanced modulation depths, resulting in a large variation of the 95% confidence intervals.

Discussion
Both objective EASSRs and behavioural MDTs were measured for 12 cochlear implant electrode pairs in five participants. Regression analysis showed that the 40-Hz EASSR was significantly related to the MDT, supporting the hypothesis that EASSRs may be used to assess site-specific temporal sensitivity.
For CI users, temporal sensitivity is of considerable importance due to their reliance on low-frequency modulation cues in the speech envelope. The standard measure of temporal sensitivity is modulation detection, which is related to vowel, consonant and word recognition (Fu, 2002;Won et al., 2011). Furthermore, MAP adjustments based on MDTs, such as site removal or T level adjustment, have resulted in significant speech perception improvements (Garadat et al., 2012;Zhou and Pfingst, 2014).
Participants with a wide range of speech perception performance were recruited for this study in an attempt to characterise the entire breadth of temporal sensitivity outcomes. This approach resulted in a spread of MDTs of 20 dB which is illustrated in Fig. 6. But contrary to expectations (Fu, 2002), users with the best and worst speech performance did not correspond to those with the best and worst MDTs. Each modulated stimulus was loudness balanced to a reference signal to ensure that the task was not performed using loudness cues (Fraser and McKay, 2012). We found that to maintain equal loudness with an unmodulated pulse train, the peak of the modulated signal had to be increased with increasing modulation depth. Future experiments could interpolate the offset required to produce an equal loudness percept, which would allow the use of adaptive procedures and reduce measurement time.
Two primary factors hinder the use of MDTs as a clinical tool: (1) MDTs require an active participant for an extended period of time, which makes testing young children a challenge, and (2) MDTs require long measurement times. These constraints have motivated the search for an objective measure to assess the temporal sensitivity of cochlear stimulation channels. This study measured EASSRs at two modulation frequencies, both related to speech perception (Alaerts et al., 2009; Poelmans et al., 2012). 4 Hz is a prominent modulation frequency in the average envelope of speech (Assmann and Summerfield, 2004; Plomp, 1983; Edwards and Chang, 2013) and approximates the rate of syllables in running speech (Greenberg, 1999). Modulation rates in the range of 40 Hz give the largest ASSR SNRs (Picton et al., 2003). Both modulation rates are within the range of perceptually relevant envelope modulation frequencies for CI users (Stone et al., 2008; Füllgrabe et al., 2009). Given the envelope extraction strategies employed by most of the current generation CIs, and the extra reliance of CI users on temporal cues (Friesen et al., 2001), we hypothesised that the link between speech understanding and these low frequency ASSRs may be more pronounced in CI users than in normal hearing. Consistent EASSR responses were detected for both modulation frequencies. Longer measurement times would decrease the noise level, and most probably increase the detection rate at 4 Hz. In acoustic hearing, the 40-Hz response is known to have a larger SNR than for other modulation frequencies (Picton, 2010). Similarly, we found that EASSR SNRs were higher at 40 than at 4 Hz.
A significant relation was found between objective and behavioural measures; as EASSRs increased the MDTs improved. This relation existed for the EASSR SNRs and amplitudes for the 40-Hz modulation data, and the combined 4- and 40-Hz data. The relation between EASSRs and MDTs was primarily driven by across-subject differences. Further investigation should be conducted of across-electrode variation within individual arrays. A procedure that produces smaller error in the MDT results would help to distinguish across-electrode differences.
The relation between EASSRs and MDTs was predominantly driven by the 40-Hz EASSR measurements. The higher SNRs at 40 than at 4 Hz may have provided a more accurate sampling of the underlying neural mechanisms and this may explain the stronger relationship with MDTs. The 40-Hz response is generated earlier in the auditory pathway than the cortically generated 4-Hz response. The former is generated closer to the auditory nerve and brainstem (Herdman et al., 2002) and it may better reflect the signal transmitted from the electrode-neuron interface. Further investigation of the relation between EASSRs and temporal modulation sensitivity at higher modulation rates may determine if EASSRs generated earlier in the auditory pathway better predict MDTs.
The choice of electrical stimulation parameters was restricted by current EASSR artifact removal techniques. The implemented technique requires a balance between phase width, pulse rate and stimulation mode. Increased phase width and wider stimulation modes produce longer artifacts and faster pulse rates leave less time to sample the artifact free data. In this study we chose to use clinically relevant stimulation rates at the cost of using a nonstandard stimulation mode. Bipolar stimulation mode was chosen as it may activate a more restricted region of the cochlea than monopolar stimulation (Snyder et al., 2008;Kwon & van den Honert, 2006) and act as a useful indicator of localised neural health (Chatterjee and Yu, 2010). Response amplitudes at 40 Hz were comparable to those for the bipolar measurements reported by Hofmann and Wouters (2010). However, there are no published reports of EASSR responses for frequencies below 40 Hz. When compared to acoustic responses, the 4-Hz amplitudes and SNRs were in the expected range (Picton et al., 2003).
Bipolar stimulation was also used in the modulation detection task to ensure that both measures were assessing the same cochlear space. The use of bipolar stimulation limits the immediate applicability of this study to commercial devices, as across-site MDT patterns differ depending on configuration mode (Pfingst, 2011). Improved artifact rejection techniques are required to measure EASSRs in monopolar mode. When these techniques become available, the relationship between EASSRs and MDTs should be re-evaluated with both the stimulation mode and rate matching those of commercial devices.
EASSR measurement for a single stimulation site took approximately five minutes using 40-Hz modulation and a single polarity. Further improvements could be made to reduce measurement time by optimising the stimulation parameters and post processing. ASSR techniques used to reduce measurement time could be adapted to EASSRs, including the multiple auditory steady-state response method (John et al., 1998) or advanced signal processing algorithms for noise removal and signal detection (De Cheveigné and Simon, 2008). Furthermore, other modulation frequencies may elicit larger and more reliable responses than at 4 and 40 Hz. The selection of modulation rates in this study was based on acoustic data of how responses vary with modulation rate (Picton et al., 2003; Alaerts et al., 2009). A study to determine which frequencies elicit the largest and most consistent responses in electrical hearing may reveal other more appropriate frequencies for modulation.
ASSRs are used as an early diagnosis tool for frequency specific hearing loss in hospitals. Their threshold detection reliability has been studied for both normal hearing and hearing impaired listeners (Luts and Wouters, 2005). The repeatability of ASSR amplitude measurements has also been studied (D'haenens et al., 2008). EASSRs are simply the electrical analogue, yet a systematic study of their reliability has not been conducted. Similarly, it should be investigated how long after implantation EASSRs become stable across the array, as more peripherally evoked potentials can take six to eight months to stabilise (Miller et al., 2008).
In conclusion it was found that EASSRs were significantly related to MDTs. As EASSRs increased, modulation detection improved. This indicates that EASSRs may be used as an objective measure of site-specific temporal sensitivity. Combined with measures of spectral sensitivity, this will allow for the practical characterisation of the electrode-neuron interface and provide an objective measure for the optimisation of site-specific stimulation parameters.

Disclosure
The authors declare that they have no conflicts of interest.