Abstract
Frequency-to-place mapping, or tonotopy, is a fundamental organizing principle throughout the auditory system, from the earliest stages of auditory processing in the cochlea to subcortical and cortical regions. Although cortical maps are referred to as tonotopic, it is unclear whether they simply reflect a mapping of physical frequency inherited from the cochlea, a computation of pitch based on the fundamental frequency, or a mixture of these two features. We used high-resolution functional magnetic resonance imaging (fMRI) to measure BOLD responses as male and female human participants listened to pure tones that varied in frequency or complex tones that varied in either spectral content (brightness) or fundamental frequency (pitch). Our results reveal evidence for pitch tuning in bilateral regions that partially overlap with the traditional tonotopic maps of spectral content. In general, primary regions within Heschl's gyri (HGs) exhibited more tuning to spectral content, whereas areas surrounding HGs exhibited more tuning to pitch.
SIGNIFICANCE STATEMENT Tonotopy, an orderly mapping of frequency, is observed throughout the auditory system. However, it is not known whether the tonotopy observed in the cortex simply reflects the frequency spectrum (as in the ear) or instead represents the higher-level feature of fundamental frequency, or pitch. Using carefully controlled stimuli and high-resolution functional magnetic resonance imaging (fMRI), we separated these features to study their cortical representations. Our results suggest that tonotopy in primary cortical regions is driven predominantly by frequency, but also reveal evidence for tuning to pitch in regions that partially overlap with the tonotopic gradients but extend into nonprimary cortical areas. In addition to resolving ambiguities surrounding cortical tonotopy, our findings provide evidence that selectivity for pitch is distributed bilaterally throughout auditory cortex.
Introduction
A key organizing principle of the auditory system is tonotopy, an orderly mapping of sound frequency to place. Tonotopy is established in the cochlea, where different frequencies maximally displace different locations along the basilar membrane, in a high-to-low ordering from the base to the apex (Von Békésy, 1960). This tonotopic organization has been found at numerous stages of the auditory pathways, up to and including auditory cortex (Saenz and Langers, 2014; Thomas et al., 2015). Studies of cortical mapping using functional magnetic resonance imaging (fMRI) have typically employed pure tones or narrowband noises (Formisano et al., 2003; Talavage et al., 2004; Da Costa et al., 2011; Striem-Amit et al., 2011; Saenz and Langers, 2014) in much the same way as has historically been done to establish tonotopy in earlier stages of the auditory processing hierarchy (Von Békésy, 1960; Bourk et al., 1981; Nuttall and Dolan, 1996; Ruggero et al., 1997; Schreiner and Langner, 1997; Narayan et al., 1998; Cooper, 1999). However, pure tones, narrowband stimuli, and even many natural sounds conflate two primary perceptual attributes of sound: pitch height and timbral brightness. In most sounds, pitch (i.e., the property defining melodies in music) is determined by the fundamental frequency (F0), whereas timbre (the perceptual property that distinguishes a trumpet from a clarinet, even when they play the same note at the same loudness) is affected by the spectral centroid (Fc) of the sound's energy distribution, with brightness increasing with increasing Fc (Krumhansl and Iverson, 1992; Marozeau et al., 2003; Allen and Oxenham, 2014).
Because previous studies have used stimuli in which these two dimensions covary, it remains unclear whether the spatial organization observed in cortex simply reflects frequency-to-place mapping, inherited from the cochlear representation of spectral content, or whether some or all portions of the cortical maps instead reflect one or more higher-level features, such as pitch and/or brightness.
Although precisely localizing primary auditory regions in human auditory cortex is an ongoing challenge (Moerel et al., 2014), there is mounting evidence that the primary area A1 [estimated to be within Heschl's gyrus (HG)] shows a preference for relatively simple acoustic features, whereas surrounding nonprimary areas show greater sensitivity to complex stimuli, such as speech and music (Norman-Haignere et al., 2015; de Heer et al., 2017; Kell et al., 2018). Thus, it may be that the multiple gradients identified in previous studies as multiple tonotopic maps (Moerel et al., 2014; Saenz and Langers, 2014) reflect not just different auditory fields, but also maps of different auditory features.
To distinguish between the mapping of Fc (timbral brightness) and F0 (pitch) in cortical representations, we used high-resolution 7T fMRI to measure cortical responses to sequences of pure tones that varied over a range of frequencies, and complex tones that varied in either Fc or F0. We then used computational models to characterize the spatial organization of responses to each of these features. Our results replicate previously found tonotopic maps produced by pure tones and find similar responses to complex tones with a distinct spectral peak, consistent with the organization found in the more peripheral auditory pathways. However, our results also reveal new tuning to pitch in bilateral regions that partially overlap with the tonotopic maps but are located primarily outside HG. Overall, our findings reveal the existence of spatially organized representations of both tonotopy and pitch bilaterally within human auditory cortex.
Materials and Methods
Participants
The Institutional Review Board (IRB) for human participant research at the University of Minnesota approved the experimental procedures. Written informed consent was obtained from each participant before data collection. Ten members of the University of Minnesota community [average (SD) age of 29.3 (4.2) years; six females, four males], all right-handed and having normal hearing, defined as audiometric pure-tone thresholds of 20 dB hearing level (HL) or better at octave frequencies between 250 Hz and 8 kHz, participated in this study. An eleventh participant was excluded after having great difficulty hearing the stimuli; their thresholds were found to have become elevated since their last audiogram, making them no longer eligible for participation.
Stimuli
All stimuli were generated in MATLAB (The MathWorks) and presented using the Psychophysics Toolbox (Kleiner et al., 2007). Stimuli were presented in three conditions: pure tones, complex pitch tones, and complex timbre tones. The 13 pure tones, each a single frequency, spanned six octaves (100–6400 Hz), in half octave steps (Fig. 1A). The complex pitch and timbre tones were bandpass filtered harmonic complex tones (Fig. 1B). For all complex tones, the components started in sine phase and were bandpass filtered with 12 dB per octave slopes around the center frequency (CF), and then lowpass filtered with a 16th order filter and a cutoff frequency of 10 kHz. The nine complex timbre tones had a fixed F0 of 200 Hz and varied in the location of the bandpass filter's CF, spanning four octaves (400–6400 Hz) in half octave steps (Fig. 1C). The nine complex pitch tones had a fixed bandpass filter CF of 2400 Hz and a varying F0, which spanned four octaves (100–1600 Hz) in half octave steps (Fig. 1D). The ranges for the pitch and timbre conditions were chosen to ensure that the Fc was always well above the F0, so that the peak of the spectral envelope (corresponding to the CF of the bandpass filter) was always defined by the amplitudes of the harmonics.
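The complex-tone construction described above can be sketched in Python (the original stimuli were generated in MATLAB). In this illustrative sketch, the 12 dB-per-octave bandpass slopes are approximated by attenuating each harmonic according to its log-frequency distance from the center frequency, and harmonics above 10 kHz are simply omitted rather than applying the 16th-order lowpass filter; the function name and default parameters are hypothetical:

```python
import numpy as np

def complex_tone(f0, cf, dur=0.2, fs=44100, slope_db_oct=12.0, lp_cutoff=10000.0):
    # Illustrative synthesis of a bandpass-filtered harmonic complex tone.
    # The 12 dB/octave bandpass is approximated by attenuating each harmonic
    # by its log2 distance from the center frequency cf (an assumption);
    # harmonics above lp_cutoff are omitted instead of lowpass filtering.
    t = np.arange(int(dur * fs)) / fs
    tone = np.zeros_like(t)
    for h in range(1, int(lp_cutoff // f0) + 1):
        f = h * f0
        atten_db = slope_db_oct * abs(np.log2(f / cf))
        tone += 10.0 ** (-atten_db / 20.0) * np.sin(2 * np.pi * f * t)  # sine phase
    return tone / np.max(np.abs(tone))  # peak-normalize; actual levels were set behaviorally
```

For example, `complex_tone(200.0, 2400.0)` yields a 200-Hz-F0 complex whose spectral envelope peaks at the 2400-Hz harmonic, matching the fixed-Fc pitch condition.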
Experimental design and task
Stimuli were presented via MRI-compatible Sensimetrics S14 foam tip earbuds with custom filters to flatten the frequency response. Sound attenuation with S14s is consistent with standard foam plugs. These earphones, which are commonly used in auditory fMRI studies, have well-documented distortion characteristics that are consistent across different pairs of S14s (Norman-Haignere and McDermott, 2016). Because of the broadband nature of the pitch and timbre stimuli, as well as the broadband masking provided by the continuous scanner noise (with the most prominent peaks falling around 1 kHz), any subtle increases in amplitude around certain harmonics caused by distortion products would have a negligible contribution to the sound percept, which is largely dominated by the stimulus itself.
The stimuli were adjusted to be of equal perceptual loudness. This was done through a multistep process. First, in a separate session from the main experimental session, two participants, while wearing the S14 earphones, listened to repetitions of a single tone type, in blocks lasting 15 s. The scanner was not running during this session. Participants were instructed to adjust the level of the tone by pressing button “1” on the button box to decrease the level and button “2” to increase the level, until it was clearly and comfortably audible. Once they were satisfied with the level, the participants would press “3” to stay at that level for the remainder of the block. If they did not press “3,” they would automatically advance to the next block at the end of the 15-s block. In each subsequent block, they were instructed to make the tone as loud as the tone presented in the previous block. In these blocks, tones were ordered randomly without replacement. The participants performed three repetitions of this task, with the aim of making all the tones equal in loudness. We calculated the median level of the three trials for each tone. These median levels for each tone were then increased by 25–30 dB to account for the presence of the scanner noise. Since equal loudness percepts across frequencies tend to compress at higher levels (International Organization for Standardization, 2003), the same participants then listened to the sounds while the scanner was running and continued to make adjustments until all sounds were again of approximately equal loudness over the scanner noise. These levels were further adjusted and customized for each participant at the beginning of their respective sessions, as needed, until all the tones were reported as being of roughly equal loudness. The final equal loudness contours were similar across participants, with only small offsets in the mean level required for comfortable audibility. 
The mean (SD) level was 83.4 (5.2) dB sound pressure level (SPL) for the pure tones, 80.2 (3.9) dB SPL for the timbre tones, and 75.3 (1.8) dB SPL for the pitch tones.
We incorporated a “Morse code”-like rhythm into the stimuli to enhance their perceptual salience over the sound of the MR pulse sequence, inspired by the stimulus design of Thomas et al. (2015). Each stimulus was presented with an equal number of short (50 ms) and long (200 ms) tone bursts, including 20-ms onset and offset ramps. Every 700 ms consisted of two short and two long tones, each followed by a 50-ms gap, presented in random order. This process was repeated 11 times, with random shuffling of the tones for each repetition, for a total stimulus length of 7.7 s (Fig. 1G). All tones presented within the 7.7 s had an identical F0 and Fc but varied in duration (50 or 200 ms). After a 700-ms gap, a new stimulus was presented (with a new frequency, F0, or Fc, depending on the condition) for 7.7 s, and so on, until all stimuli within a given condition were presented once (i.e., one condition block) in a random order, followed by a 12-s silent gap (Fig. 1F). There were 12 experimental runs (three pure-tone runs and nine complex-tone runs), each about 6 min long (Fig. 1E). The order of the pure-tone and complex-tone runs was counterbalanced across participants. Each pure-tone run consisted of three pure-tone blocks and each complex-tone run consisted of two pitch blocks and two timbre blocks, presented in a random order. To avoid run-specific differences across conditions, both complex-tone conditions were included within each run. Because of the well-established and robust nature of pure-tone tonotopy, most of the scanning session was used to acquire data for the complex-tone conditions. Within a session, there were a total of nine trials for each pure-tone stimulus and 18 trials for each of the pitch and timbre stimuli. Ten seconds of silence was added to the beginning and end of each run. Participants were instructed to keep very still and resist any desire to move to the rhythm of the stimuli. 
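The timing of this "Morse code" rhythm can be sketched as follows (a Python illustration; the 20-ms onset/offset ramps and the audio synthesis itself are omitted, and the function name is hypothetical):

```python
import numpy as np

def morse_sequence(rng, n_cells=11, short_dur=0.05, long_dur=0.2, gap=0.05):
    # Each 700-ms cell: two short (50-ms) and two long (200-ms) bursts in
    # random order, each followed by a 50-ms gap; 11 cells -> 7.7 s total.
    onsets, durs, t = [], [], 0.0
    for _ in range(n_cells):
        for d in rng.permutation([short_dur, short_dur, long_dur, long_dur]):
            onsets.append(t)
            durs.append(float(d))
            t += d + gap
    return onsets, durs, t
```

Each cell sums to 0.05 + 0.05 + 0.2 + 0.2 s of tone plus four 0.05-s gaps, i.e., 0.7 s, so 11 cells give the 7.7-s stimulus described in the text.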
For each 7.7-s stimulus, the participants' task was to indicate, via button box, whether the current stimulus was lower or higher (in either pitch or timbral brightness) than the previous one.
MRI
All data were acquired using Siemens scanners at the Center for Magnetic Resonance Research (CMRR, University of Minnesota). Functional data were acquired at the passively shielded 7T Siemens MAGNETOM scanner using a single transmit 32-channel Nova Medical head coil. The acquisition parameters for the gradient-echo EPI sequence used were: repetition time (TR) = 1400 ms; echo time (TE) = 20 ms, field-of-view (FOV) = 198 mm; matrix size 180 × 180; number of slices = 44; 1.1-mm isotropic voxels; multiband factor = 2; generalized autocalibrating partially parallel acquisition (GRAPPA) acceleration factor = 3. Slices were angled to align with the Sylvian fissure of each participant to fully encapsulate auditory cortices. The sound level of the functional sequence at the center of the bore was 101 dBA before earphone attenuation. Four fieldmaps were also collected throughout each session for distortion correction. The acquisition parameters for the fieldmaps were: TR = 190 ms; first TE = 4.08 ms; second TE = 5.1 ms; 2.2 mm isotropic voxels; 22 slices. The complex-tone runs had 258 volumes, and the pure-tone runs had 267 volumes.
Anatomical (T1 and T2-weighted) data were acquired at the Siemens 3T Prisma scanner with a 32-channel head coil. MPRAGE T1-weighted parameters were: TR = 2400 ms; inversion time (TI) = 1000 ms; TE = 2.22 ms; flip angle = 8°; 0.8-mm isotropic voxels. T2-weighted parameters were: TR = 3200 ms; TE = 563 ms; 0.8-mm isotropic voxels. Six T1s and three T2s were acquired for each participant.
Half of the participants used custom foam Caseforge head cases (https://caseforge.com/). The posterior portion of each head case was used to help stabilize participants' heads during the scans and additional padding was added under the neck and around the ears for further stabilization and comfort. The remaining participants used standard MR-compatible foam padding on the back of the head, along with additional neck and ear padding.
Anatomical and functional preprocessing
The data were preprocessed using a custom pipeline (Kay et al., 2019). Gradient unwarping, which corrects image distortions caused by gradient nonlinearities, was performed on the T1-weighted and T2-weighted anatomical volumes using the gradient coefficient file provided by Siemens. All six T1 volumes for a given participant were then coregistered using rigid-body transformation with six degrees of freedom and cubic interpolation. Once aligned, the volumes were averaged together to improve contrast between the gray and white matter for high-quality segmentation. The same process was used for the three T2 volumes. The averaged T2 volume was then aligned to the averaged T1 volume for each participant.
Cortical reconstruction was performed via FreeSurfer (Fischl, 2012) using the averaged T1 volume. Since the anatomical data had submillimeter resolution, a “hires” flag was added, and an expert file was used to specify a larger number of inflation iterations (50). Segmentation results were then visually inspected in Freeview. The functional data were sampled across the cortical thickness at 25%, 50%, and 75% cortical depths and then averaged together. Note that while the analyses were performed on vertices in cortical surface space, for simplicity, the term “voxel” will be used throughout. For group-level surface maps, individual participant results were mapped to FreeSurfer's fsaverage cortical surface group space via nearest-neighbor interpolation. Fsaverage is an anatomical surface template to which individual participants can be aligned via curvature-based alignment.
Functional data preprocessing included slice time correction, fieldmap-based undistortion, and motion correction. Functional data were aligned to the anatomical data using an affine transformation. As part of the slice time correction step, the data were temporally resampled from the original 1.4-s TR to a 1-s sampling interval. In the motion correction step, the data were sampled onto the FreeSurfer depth-dependent surfaces. No smoothing was applied to the data.
The GLMdenoise technique was used to process the data and obtain a clean estimate of the BOLD response related to the experimental conditions (Kay et al., 2013). The GLMdenoise toolbox is available at http://kendrickkay.net/GLMdenoise/. The β weights were used to specify the amplitude of the BOLD response to each stimulus, and polynomial regressors were used to specify the baseline response in each run. Each 7.7-s stimulus was analyzed as a block, and a canonical hemodynamic response function (HRF) was assumed. Leave-one-run-out cross-validation was performed, and R2 was used to quantify the proportion of the time-series variance that could be explained by the stimuli across all conditions. The three pure-tone runs (each containing three repetitions of each tone, totaling nine repetitions of each pure tone across the scanning session) were used to estimate three β weights per tone. Likewise, the nine complex-tone runs (each containing two repetitions of each tone, totaling 18 repetitions of each complex tone across the scanning session) were used to estimate two β weights for each pitch and timbre tone.
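As a rough illustration of the block design analysis (not the actual GLMdenoise implementation), a run's design matrix could be built with one HRF-convolved boxcar per 7.7-s stimulus block plus polynomial baseline regressors. The double-gamma HRF constants below are a common SPM-style assumption, not necessarily the canonical HRF used in the analysis:

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(t):
    # Double-gamma HRF (SPM-like constants; an assumption for illustration)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def design_matrix(onsets, n_vols, tr, block_dur=7.7, poly_deg=2):
    # One HRF-convolved boxcar regressor per stimulus block, plus
    # polynomial baseline regressors (constant, linear, quadratic)
    t = np.arange(n_vols) * tr
    hrf = canonical_hrf(np.arange(0, 30, tr))
    cols = []
    for onset in onsets:
        box = ((t >= onset) & (t < onset + block_dur)).astype(float)
        cols.append(np.convolve(box, hrf)[:n_vols])
    for d in range(poly_deg + 1):
        cols.append(np.linspace(-1, 1, n_vols) ** d)
    return np.column_stack(cols)
```

The β weights would then follow from ordinary least squares, e.g., `np.linalg.lstsq(X, y)`, applied voxel-wise.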
Statistical analysis
Encoding models
Encoding models were used to explore how similar or dissimilar topographic representations of pure tones were to the representations of complex tones varying either in their F0 or Fc. The first model implemented was the feature tuning model, in which the response of each voxel to a stimulus with feature value x (frequency, Fc, or F0) was modeled as a Gaussian function, response = g · exp[−(x − CF)2/(2σ2)] (Eq. 1), with parameters gain (g), center frequency (CF), and bandwidth (σ).
We assessed model performance using n-fold cross-validation, with pure tones having 3 folds (two β weights per stimulus used for training, one for testing), and pitch and timbre each having 2 folds (one β per stimulus for training, one for testing), because of the number of β estimates that came out of the general linear model (GLM) analysis. For each fold, model R2 values were derived using the held-out data, by computing the proportion of the original variance in the data that was unaccounted for by the model fit and subtracting this quantity from 1 (i.e., R2 = 1 − SSresidual/SStotal).
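The held-out R2 computation described above can be sketched as below; whether the total variance is taken around the mean of the held-out data or around zero is not specified here, so the mean-centered convention is an assumption:

```python
import numpy as np

def cv_r2(y_test, y_pred):
    # Cross-validated R^2: 1 minus the proportion of variance in the
    # held-out beta weights left unexplained by the model predictions.
    # Total variance is computed around the mean (an assumed convention).
    ss_res = np.sum((y_test - y_pred) ** 2)
    ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction yields R2 = 1, and predicting the mean of the held-out data yields R2 = 0.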
The second model implemented was the spectral tuning model, which was inspired by the population receptive field (pRF) method (Dumoulin and Wandell, 2008; Thomas et al., 2015). Instead of characterizing responses to each stimulus on the basis of a single-valued stimulus property, as was done for the feature tuning model, the spectral tuning model took into account the entire frequency spectrum of each stimulus. The form of this model is the same as Equation 1, except that the response was computed by applying the Gaussian filter to the full frequency spectrum of each stimulus rather than to a single feature value.
While this model was the same as the feature tuning model for the pure-tone stimuli, which were characterized as a single frequency in both cases, it changed the input for the pitch and timbre stimuli, which are harmonic complex tones containing many frequencies. Because the input feature for this model was the frequency spectrum of the stimuli, the same model could be simultaneously applied to all conditions. However, to more closely compare the results of the feature tuning model to the spectral tuning model, this model was applied to each condition separately.
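A minimal sketch of the spectral tuning model's response computation, assuming a Gaussian weighting on a log-frequency axis whose output is summed across the stimulus spectrum (the exact parameterization and normalization are assumptions for illustration):

```python
import numpy as np

def spectral_tuning_response(spectrum_f, spectrum_amp, g, cf, sigma):
    # pRF-style spectral tuning sketch: weight each frequency component of
    # the stimulus by a Gaussian on the log2-frequency axis, then sum.
    # The log-frequency axis and linear amplitude weighting are assumptions.
    w = np.exp(-(np.log2(spectrum_f) - np.log2(cf)) ** 2 / (2 * sigma ** 2))
    return g * np.sum(w * spectrum_amp)
```

For a pure tone (a single component), this reduces to the feature tuning model's Gaussian evaluated at that frequency, which is why the two models coincide for the pure-tone condition.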
ROIs
The ROIs for this study were the main tonotopic regions within and around HG. The ROIs for each participant (one per hemisphere) were manually defined based on several criteria: macroanatomical landmarks of the auditory cortices (identifying HG for each participant), myelin density maps, and functional data (i.e., the GLM R2 maps and pure-tone tonotopy results of the feature tuning model). The boundaries of HG were identified by two independent raters with experience locating HG on the cortical surface, and then cross-checked with the Destrieux atlas delineations in FreeSurfer (aparc.a2009s). The myelin density observed in and around that region was used to expand the ROI. These myelin density maps were generated by dividing the averaged T1 by the aligned and averaged T2 of a given participant. Myelin density was sampled across the cortical thickness at 25%, 50%, and 75% cortical depths. These samples were then averaged together for a mapping of density across cortical depths (Fig. 2). For all participants, these maps showed the greatest cortical myelin density in somatosensory, visual, and auditory regions, consistent with earlier studies (Glasser and Van Essen, 2011). Since myelin density maps are gradients lacking clear boundaries, minor adjustments were made using the functional data to ensure that the ROIs were neither too conservative (omitting voxels with high R2 values or parts of the main tonotopy gradients) nor too liberal (including an excessive number of uninformative voxels). These ROIs were then used for all surface plots for a given participant. Group-level ROIs were the intersection of all 10 participants' ROIs in each hemisphere.
For analyses involving the estimation of tuning properties within versus outside HG, the ROIs were divided into HG and non-HG sections. The HG ROI was manually drawn, based on the criteria described in the previous paragraph. The non-HG ROI was created by excluding the HG ROI from the original ROI.
Representational similarity analysis
In order to quantify the similarity of multivariate patterns of voxel activation elicited by different stimuli, representational similarity matrices (RSMs) were computed. For each participant, the average pattern of voxel-wise β estimates for each stimulus was computed within the two ROIs (both hemispheres), and voxels with a GLM R2 of at least 10% (reflecting robust stimulus-driven activation) were selected for use in this analysis. Subsequently, activation patterns for all stimulus pairs were used to compute pairwise Pearson's correlations, generating a matrix whose off-diagonal elements reflect representational similarities between different pairs of stimuli. The group-level RSM was computed by averaging participant-level RSMs across all 10 participants.
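The RSM computation reduces to pairwise Pearson correlations over the retained voxels; a minimal Python sketch, using the 10% GLM R2 voxel-selection threshold described above:

```python
import numpy as np

def rsm(betas, glm_r2, r2_thresh=0.10):
    # betas: (n_stimuli, n_voxels) array of voxel-wise beta estimates.
    # Voxels below the GLM R^2 threshold (10%, per the text) are excluded;
    # np.corrcoef then gives all pairwise Pearson correlations between
    # the stimulus-specific activation patterns.
    keep = glm_r2 >= r2_thresh
    return np.corrcoef(betas[:, keep])
```

The group-level RSM is then simply the element-wise average of the participant-level matrices.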
Results
Behavioral results
Behavioral performance in judging whether the current stimulus was higher or lower than the previous one (Fig. 1) was high across all three conditions for all participants, suggesting that they successfully attended to the stimuli. The average proportion of correct responses was 96.8% (SD = 2.3%) in the pure-tone condition, 93.1% (4.5%) in the timbre condition, and 95.8% (4.4%) in the pitch condition. Because of near-ceiling performance in all conditions, a nonparametric Friedman test was run to detect differences in performance between conditions, which indicated a significant main effect (χ2(2) = 9.6, p = 0.008). Post hoc analysis with a two-tailed Wilcoxon signed rank test was then conducted to compare conditions. After a Bonferroni correction for multiple comparisons, setting α to 0.017 (0.05/3), none of the paired comparisons reached significance (pitch vs timbre: Z = −1.78, p = 0.074; pitch vs pure tones: Z = 0.00, p = 1.00; timbre vs pure tones: Z = −2.35, p = 0.019). Therefore, differences in cortical representations between the three conditions are unlikely to be because of differences in behavioral performance.
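The reported test sequence (Friedman omnibus test followed by Bonferroni-corrected two-tailed Wilcoxon signed-rank tests) can be illustrated with SciPy. The per-participant scores below are simulated from the reported group means and SDs, not the actual data:

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical per-participant proportion-correct scores, simulated from
# the reported group means/SDs (illustrative only; not the real data).
rng = np.random.default_rng(1)
pure = np.clip(rng.normal(0.968, 0.023, 10), 0.0, 1.0)
timbre = np.clip(rng.normal(0.931, 0.045, 10), 0.0, 1.0)
pitch = np.clip(rng.normal(0.958, 0.044, 10), 0.0, 1.0)

# Nonparametric omnibus test across the three conditions
stat, p_omnibus = friedmanchisquare(pure, timbre, pitch)

# Post hoc two-tailed Wilcoxon signed-rank tests, Bonferroni-corrected
alpha = 0.05 / 3
pairs = {'pitch_vs_timbre': wilcoxon(pitch, timbre)[1],
         'pitch_vs_pure': wilcoxon(pitch, pure)[1],
         'timbre_vs_pure': wilcoxon(timbre, pure)[1]}
```

A pairwise comparison would be declared significant only if its p value fell below the corrected α of 0.017.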
To verify that our pitch stimuli produced highly salient pitch percepts, psychophysical tests were conducted (Moore et al., 1984). A two-alternative forced-choice adaptive staircase procedure was used with a two-down one-up adaptive tracking rule that tracks the 70.7% correct point of the psychometric function (Levitt, 1971), consistent with our earlier work (Allen and Oxenham, 2014). F0 difference limens (F0DLs) were measured on four participants (three of whom participated in the main experiment) for six of the complex pitch tones used in the main experiment (F0s: 100, 141, 200, 400, 800, and 1600 Hz). The resulting F0DLs were small (all below 1%). For reference, a semitone difference, the smallest interval used in Western music, is around 6%. These results are in line with our earlier measures of F0DLs for similar broadband harmonic tone complexes (Allen and Oxenham, 2014), suggesting the salience was high across the entire range of our pitch stimuli.
Topographic mapping of both spectral content and fundamental frequency
To assess the patterns of topographic cortical mapping for each of the three conditions (pure tones, timbre, and pitch), we constructed a separate feature tuning model, in which the GLM β estimates for each voxel in the ROI in the auditory cortices of each participant were fit with a Gaussian filter, with parameters gain (g), CF, and σ, applied to the respective stimulus feature (frequency, Fc, or F0). Figure 3 shows the resulting filters' CFs in the pure-tone condition on a cortical surface for one representative participant as well as the group average. Both individual and group levels of analysis show robust high-low-high tonotopic gradient reversals, in line with earlier studies (Formisano et al., 2003; Langers and van Dijk, 2012; Thomas et al., 2015), with a region of lower CFs (warmer colors) being anteriorly and posteriorly flanked by regions of higher CFs (cooler colors), centered roughly on HG. At both the individual and group levels, there are additional smaller clusters of low-CF and high-CF voxels, also as reported in earlier studies (Da Costa et al., 2011; Moerel et al., 2013).
To determine whether the well-established tonotopic organization found with pure tones reflects spectral energy or F0 in more complex sounds, we compared the pure-tone CF maps to the CF maps in the timbre and pitch conditions. Since it can be difficult to visualize the auditory cortices within the Sylvian fissure on an inflated lateral surface, Figure 4 shows these maps on spherical representations of the cortices for several representative individual participants and the group average. As with the individual participant maps, the group-level maps shown are unsmoothed, though smoothed versions of these figures produced very similar results. Although the ranges of CFs (frequency, Fc, and F0) differ between conditions, because Fc must always be higher than F0 (see Materials and Methods, Stimuli), normalized color ranges were chosen to help visualize similarities in the general pattern of tuning (i.e., highs and lows) across conditions. We found the timbre maps to be broadly similar to the pure-tone maps in terms of their high-low-high (blue-red-blue) structure. The topographic organization in the pitch condition seems less well defined, although a similar high-low-high gradient can be identified in both the individual and group-level data. The cortical locations of the high and low CF regions are reasonably similar for timbre and pitch, despite the fact that they are derived from independent acoustic features, the spectral peak and the F0, respectively. Maps for each of the 10 participants are shown in Figure 5. While the pitch CF maps appear noisier than those of the other conditions, the CF maps were highly consistent across model fit estimates, as can be seen in Figure 6. Bandwidths (BWs) of the Gaussian filters were also estimated for each condition (Fig. 7). For both the pure-tone and timbre conditions, the narrowest BWs tend to be clustered centrally, around HG, consistent with earlier reports using just pure tones (Thomas et al., 2015).
The distribution of BWs for the pitch condition is again less clear-cut, although some participants show some indication of a central region with sharper tuning.
To better understand which regions in auditory cortex are driven by each condition, Figure 8 shows the variance accounted for (R2) of the held-out data in the β weights by each voxel's filter in each of the three conditions. As with the model CF and BW parameters, the spatial distribution of the high R2 voxels is similar in the pure-tone and timbre conditions. In the pitch condition, the number of voxels with a substantial amount of variance explained is reduced, with the exception of regions around the border of HG. While there appears to be some interindividual variability in the spatial patterns of high R2 voxels, the group average results show a small cluster of higher R2 values lining the anterolateral side of HG, bilaterally (Fig. 8, lower right). This location is consistent with previous studies' reports of the location of pitch-sensitive regions in both humans (Penagos et al., 2004; Norman-Haignere et al., 2013) and nonhuman primates (Bendor and Wang, 2006). However, because of the anatomical variability across individuals seen both in the present study and reported in earlier work (Rademacher et al., 2001), group-level maps have somewhat limited value and should be considered in conjunction with cortical representations from individual participants. R2 heat maps for each participant and comparisons of data with feature tuning model fits can be seen in Figures 9 and 10, respectively.
Pure-tone cortical tonotopy primarily reflects spectral content
The analysis shown in Figures 3 and 4 for each condition separately indicates strong similarities between the pure-tone and timbre conditions, suggesting that traditional pure-tone cortical tonotopy primarily reflects a sound's spectral content, rather than its pitch. To provide a more direct comparison of the cortical responses for different conditions, we calculated RSMs both within and across conditions (Fig. 11). Each matrix cell shows the correlation coefficient between voxels' responses for a given pair of stimuli (in terms of F, Fc, or F0) within the same ROIs as shown in the surface maps. High correlations in cells near the main diagonal, as seen in the within-condition comparisons for both pure-tone and timbre conditions (top-left and center boxes in each panel), indicate that tones that are closer in frequency (or Fc or F0) produce activation patterns that are more strongly correlated across voxels than tones that are distant in frequency. A similar diagonal correlation pattern can be seen when comparing patterns of activation between the pure-tone and timbre conditions (left-middle box), suggesting that voxels are responding to similar features in both conditions. In contrast, within-condition comparisons for the pitch condition (bottom right box) show higher correlations across all tones, and the RSMs comparing pitch and pure-tone conditions and comparing pitch and timbre conditions (bottom-left and bottom-middle boxes) show similarly high correlations for all higher frequencies (or Fc), independent of F0. This is likely driven by the relatively high Fc (2400 Hz) of all pitch stimuli. Overall, the RSM analysis confirms our initial analysis showing that classic tonotopy likely reflects the spectral content, and not the F0, of complex tones.
Shared and distinct tuning properties
To determine whether tuning to the different dimensions was anatomically distinct, we investigated voxels demonstrating clear tuning to one or more conditions. We did this by categorizing each voxel as being selective along a certain dimension if the fitted Gaussian function for that voxel accounted for at least 30% of the variance in that condition. We did this for each of the three conditions (pure tones, pitch, and timbre), resulting in each voxel being categorized independently as selective (or not) along each of the three dimensions. Figure 12A provides a surface plot for one participant, with voxels color-coded to indicate the condition(s) under which each voxel was categorized as selective.
In general, a large proportion of voxels jointly tuned to pure tones and timbre, as well as many voxels tuned specifically to timbre, are centered on HG. Beyond HG, while all combinations of tuning are represented in regions posterior to HG, there are prominent clusters of voxels in regions anterior to HG in both hemispheres, tuned either to both pure tones and pitch or just to pitch, in line with previously postulated pitch-sensitive regions (Patterson et al., 2002; Penagos et al., 2004; Norman-Haignere et al., 2013). Along with the surface plots in Figure 12A is a Venn diagram showing the proportions of voxels with each type of tuning for the same sample participant. The Venn diagram for the group-average data is shown in Figure 12B, along with examples of the data and model fits from individual voxels that provide examples of selectivity along one, two, or all three dimensions. Surface plots and Venn diagrams for each participant can be found in Figure 13. The relative proportions shown in these Venn diagrams remain similar for a range of R2 thresholds and are not specific to the selected 30% threshold. Visual inspection of the surface maps for each participant for cross-validation fold 1 versus fold 2 showed a high degree of consistency, and paired-sample t tests comparing the proportions within each of the seven sections of the Venn diagram across all 10 participants found no significant differences (at the p < 0.05 level) for any of the across-fold comparisons. A simulation was also run to determine the amount of overlap that would be expected by chance, assuming the tuning for each condition was distributed independently. In all cases, with the exception of the overlap between pitch and timbre, the overlap found in the present study exceeded the amount of overlap that could be ascribed to chance.
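The chance-overlap comparison can be approximated with a simple permutation test under the independence assumption described above. The selectivity masks below are hypothetical:

```python
import numpy as np

def chance_overlap(mask_a, mask_b, n_perm=1000, seed=0):
    """Compare observed joint selectivity against an independence null.

    Shuffles one boolean selectivity mask relative to the other, so each
    permutation preserves the two marginal proportions of tuned voxels
    but destroys any coupling between them.
    """
    rng = np.random.default_rng(seed)
    observed = int(np.sum(mask_a & mask_b))
    null = np.array([np.sum(mask_a & rng.permutation(mask_b))
                     for _ in range(n_perm)])
    p_value = float(np.mean(null >= observed))
    return observed, float(null.mean()), p_value

# Two strongly co-localized hypothetical maps over 200 voxels
mask_a = np.zeros(200, dtype=bool); mask_a[:60] = True
mask_b = np.zeros(200, dtype=bool); mask_b[:50] = True
obs, expected, p = chance_overlap(mask_a, mask_b)
# Here obs = 50, whereas independence predicts about 60 * 50 / 200 = 15
```

When the observed overlap exceeds the null distribution, as for most condition pairs in the study, the co-localization cannot be ascribed to the two maps' sizes alone.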
Overall, the greatest proportion of voxels are tuned to just the timbre of complex tones, followed by voxels jointly tuned to timbre and pure tones. Thus, over 70% of voxels are tuned to some aspect of spectral content (F, Fc, or both), with a relative lack of tuning to F0. The fact that many voxels appear to have selectivity for the spectral content of complex tones but not for the pure tones is consistent with findings from single-unit studies that have reported the existence of many cortical neurons that respond more strongly to spectrally complex sounds than to pure tones (Rauschecker et al., 1995; Bendor and Wang, 2005; Feng and Wang, 2017). Nevertheless, over 20% of voxels appear to exhibit tuning to pure-tone frequency without showing similar selectivity for the overall spectral shape of complex sounds.
Although the population appears to be dominated by voxels with spectral content selectivity, the resulting Venn diagrams are consistent with our other measures (e.g., Fig. 8) in showing a substantial proportion of voxels, approaching 30%, that appear to have F0 tuning, either exclusively or in combination with tuning to other dimensions.
Similarities in cortical tuning properties for pure-tone frequency, timbre, and pitch
Although the timbre response patterns, reflecting spectral content, seem to resemble pure-tone tonotopy (Fig. 11), and show the greatest degree of overlap in the Venn diagrams (Figs. 12, 13, purple section), some similarities were also observed between the pure-tone and pitch conditions, reflecting sensitivity to F0, as well as some overlap in the Venn diagrams (Figs. 12, 13, green section). Here, we provide a quantitative assessment of these similarities by comparing the model's CFs obtained in the different conditions for individual voxels. Figure 14 shows scatterplots of voxels that demonstrated tuning (i.e., selectivity along the dimension being tested) for both pure tones and timbre (Fig. 14A) or for both pure tones and pitch (Fig. 14B). As expected, there was a strong relationship between voxel CFs derived in the pure-tone condition and the CFs for the same voxels derived in the timbre condition (r = 0.89) with an average relationship close to unity.
Interestingly, although there were fewer voxels that were responsive to both pure-tone and pitch conditions, the correlation between the CFs for those voxels was similarly high (r = 0.79). This finding suggests that voxels tuned to both pitch and pure tones often have a best frequency corresponding to the best F0. Finally, the kernel density histograms indicate a broad peak of voxels with CFs (in terms of F, Fc, and F0) around 800 Hz, suggesting a somewhat nonuniform distribution of CFs across all three dimensions.
Pitch tuning in auditory cortex
Although the primary cortical tonotopic gradients seem to be dominated by spectral content, as shown by the close correspondence between responses in the pure-tone and timbre conditions, evidence for tuning to F0 or pitch was also observed in all participants. Voxels from one participant showing tuning to low, medium, and high F0s are shown in Figure 15A. To further explore the spatial layout of F0 tuning, we examined pitch-tuned voxels that were not sensitive to changes in either the pure-tone or timbre conditions (i.e., Figs. 12, 13, yellow section of Venn diagrams). Figure 15B shows voxels pooled across all participants with an R2 threshold of 0% (since model fit is evaluated on held-out data, the variance of the residuals can exceed that of the data, i.e., R2 < 0%), and a second, more stringent, map with an R2 threshold of 30%. These results suggest there may be some organization to these exclusively pitch-tuned voxels that is relatively insensitive to the R2 cutoff used in the analysis. There is a trend for a high-low-high F0 organization around the edges of HG, as denoted by the white and black arrows.
The impression that F0-tuned voxels were more likely to be found in nonprimary auditory cortex than frequency-tuned or Fc-tuned voxels was tested quantitatively by comparing the proportions of tuned voxels within each participant's ROI that were inside HG (the macroanatomical landmark associated with the “core” or primary auditory cortex, based on histologic methods; Wallace et al., 2002), versus outside HG. A two-tailed paired-samples t test revealed a significant increase in the proportion of pitch-tuned voxels (i.e., Figs. 12, 13, yellow section of Venn diagrams) outside of HG compared with inside HG (mean proportion of pitch voxels within HG = 11.59%; mean proportion of pitch voxels outside of HG = 18.36%; p < 0.01), mirrored by a proportional decrease in voxels tuned to spectral content (i.e., all other sections of the Venn diagrams, excluding yellow) outside of HG compared with inside HG (mean proportion of spectral voxels within HG = 88.41%; mean proportion of spectral voxels outside of HG = 81.64%; p < 0.01). Although the precise locus of A1 remains debated and is subject to large interindividual variability (Moerel et al., 2014), the fact that F0-tuning is found more outside than inside HG is consistent with pitch being represented more in higher-level cortical regions relative to spectral processing.
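The inside-versus-outside-HG comparison is a standard paired-samples t test across participants. A minimal numpy version follows; the per-participant percentages are hypothetical, chosen only to roughly match the reported group means, and are not the study's data:

```python
import numpy as np

def paired_t(x, y):
    """Paired-samples t statistic (df = n - 1) for two matched samples."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

# Hypothetical percentage of exclusively F0-tuned voxels per participant
outside_hg = np.array([18.4, 20.1, 15.2, 22.3, 17.8,
                       19.5, 16.0, 21.2, 14.9, 18.2])
inside_hg  = np.array([11.6, 13.0,  9.8, 14.2, 10.5,
                       12.1, 11.0, 13.5,  9.2, 11.0])
t_stat = paired_t(outside_hg, inside_hg)  # positive: more F0 tuning outside HG
```

Because the complementary spectral-content proportions are just 100% minus these values, the same test on them yields the mirrored decrease the paper reports.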
Critically, the voxels demonstrating clear F0 tuning (R2 > 30%) are distributed bilaterally throughout auditory cortex and are not confined to an isolated region or a single hemisphere. Across participants, there was no significant difference between hemispheres in the number of voxels selective for F0 (p = 0.58). This broad distribution of pitch tuning, which has also been found in macaques (Kikuchi et al., 2019), may help explain why it has proved difficult to build a consensus on the presence or location of a “pitch center” in auditory cortex (Hall and Plack, 2009; Bendor, 2012).
In addition, there appears to be a region along the STG that is tuned predominantly to low F0s, as indicated by the rectangles in Figure 15B. This area has been identified as a pitch-sensitive region, in addition to the region anterolateral to HG, which contains a large cluster of high F0-tuned voxels (e.g., Patterson et al., 2002; Penagos et al., 2004; Norman-Haignere et al., 2013). However, it is important to note that pitch sensitivity (i.e., stronger cortical responses to sounds with greater pitch salience), found in previous studies, is distinct from the representations of pitch selectivity (i.e., tuning to specific F0s) that we demonstrate here.
Spectral tuning model
To further support the claim that the topography shown in Figure 15 is a reflection of F0 tuning and cannot be explained by the subtle differences in spectral fine structure that occur with changes in F0, we employed a spectral tuning model. In this model, instead of using the spectral peak as the input for the timbre condition, and F0 as the input for the pitch condition, the Gaussian weighting function was applied to the full sound spectrum and was fitted separately to each of the three conditions. Performance for the pure-tone condition was the same as in the feature tuning model (as the input for both models is the frequency of the pure tone) and performance for the timbre condition was very similar in both models. However, as expected, given the lack of change in spectral envelope across the range of F0 values tested, this model explained virtually no variance for the pitch conditions and predicted essentially a flat line across all stimuli in the pitch condition (Fig. 16). Changing the color range for pitch to be around the Fc of the stimuli did not improve these CF maps. The fact that the spectral tuning model could account for the observed responses in the pure-tone and timbre conditions, but not in the pitch condition, further supports the claim that the tonotopy observed in studies using pure tones is predominantly driven by spectral content.
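The logic of the spectral tuning model can be sketched as follows: a Gaussian filter over log frequency weights the full stimulus spectrum, and the weighted sum is the predicted response. This is an illustrative sketch, not the authors' implementation; the voxel filter parameters and envelope bandwidth are hypothetical, and spectra are normalized to unit summed amplitude as a stand-in for level equalization:

```python
import numpy as np

def harmonic_spectrum(f0, fc, env_bw=0.5, f_max=16000.0):
    """Harmonic complex: components at multiples of f0 under a fixed
    Gaussian spectral envelope centered at fc (log2-frequency units),
    normalized to unit summed amplitude (stand-in for level matching)."""
    freqs = np.arange(f0, f_max, f0)
    amps = np.exp(-0.5 * ((np.log2(freqs) - np.log2(fc)) / env_bw) ** 2)
    return freqs, amps / amps.sum()

def spectral_model_response(freqs, amps, cf, bw):
    """Spectral tuning model: a Gaussian filter (center cf, bandwidth bw,
    log2 units) weights the full spectrum; the weighted sum is the
    predicted voxel response."""
    weights = np.exp(-0.5 * ((np.log2(freqs) - np.log2(cf)) / bw) ** 2)
    return float(np.sum(weights * amps))

voxel_cf, voxel_bw = 2400.0, 0.5  # hypothetical voxel filter

# Pitch condition: F0 varies, envelope fixed at Fc = 2400 Hz
pitch_resp = [spectral_model_response(*harmonic_spectrum(f0, 2400.0),
                                      voxel_cf, voxel_bw)
              for f0 in (100.0, 200.0, 400.0, 800.0)]

# Timbre condition: F0 fixed at 100 Hz, envelope peak (Fc) varies
timbre_resp = [spectral_model_response(*harmonic_spectrum(100.0, fc),
                                       voxel_cf, voxel_bw)
               for fc in (1200.0, 2400.0, 4800.0)]
```

In this sketch, `pitch_resp` is nearly constant across F0 — the flat line the model predicts in the pitch condition — whereas `timbre_resp` peaks when the envelope matches the voxel's filter, which is the asymmetry that lets the model account for the pure-tone and timbre data but not the pitch data.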
Relationship between F0 and spectral density
Although our results are consistent with F0 selectivity, it is important to note that F0 is inversely related to spectral density: as F0 increases, the spacing between neighboring spectral components grows. Thus, it is, in principle, possible that voxels that appear to be tuned to F0 are instead tuned to different degrees of spectral density. We addressed this potential confound by using our pure-tone and timbre data, noting that pure tones are spectrally less dense than complex tones. Specifically, if spectral density were determining the responses in our F0-tuned voxels, voxels tuned to high F0s (i.e., potentially reflecting tuning to lower spectral density) should respond more strongly to pure tones than to timbre tones with the same CF. In contrast, voxels tuned to low F0s (i.e., potentially reflecting tuning to higher spectral density) should then respond more strongly to the timbre tones, which are spectrally dense. To test this prediction, we calculated the difference between the mean pure-tone and mean timbre responses in each F0-tuned voxel (defined as a voxel whose F0 model fit exceeded an R2 of 30%) and plotted this difference as a function of the voxels' best F0 (Fig. 17A). If the apparent F0 tuning instead reflected a tuning to spectral density, then the relative response to pure tones over timbre tones should increase as a function of best F0, leading to a positive slope. In fact, we found no relationship between best F0 and the difference in mean response to pure tones and timbre tones (mean slope = 0.00, 95% confidence interval (CI) = [−0.16,0.06]). This outcome is consistent with responses driven by F0, rather than spectral density. To confirm these voxels' selectivity for F0, we analyzed their responses using the same approach that was used to rule out their selectivity for spectral density. Specifically, we subtracted the mean β response to the lowest pitch stimulus (F0 = 100 Hz) from the mean β response to the highest pitch stimulus (F0 = 1600 Hz).
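The slope test just described can be sketched as follows, using simulated F0-tuned voxels whose pure-tone versus timbre preference is, by construction, unrelated to best F0 (all values hypothetical):

```python
import numpy as np

def density_confound_slope(best_f0, pure_mean, timbre_mean):
    """Regress (mean pure-tone minus mean timbre response) on log2 best F0.

    A spectral-density account predicts a positive slope (high-F0 voxels
    should favor the spectrally sparse pure tones); genuine F0 tuning
    predicts a slope near zero.
    """
    diff = np.asarray(pure_mean) - np.asarray(timbre_mean)
    slope, _intercept = np.polyfit(np.log2(best_f0), diff, 1)
    return float(slope)

rng = np.random.default_rng(1)
best_f0 = np.geomspace(100.0, 1600.0, 100)   # voxels' best F0s
pure = rng.normal(1.0, 0.05, best_f0.size)   # mean pure-tone responses
timbre = rng.normal(1.0, 0.05, best_f0.size) # mean timbre responses
slope = density_confound_slope(best_f0, pure, timbre)  # ~0 by construction
```

The confirmatory analysis in the next step is the same regression with the response difference replaced by the β difference between the highest and lowest pitch stimuli, where a positive slope is the signature of genuine F0 preference.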
The positive slope in Figure 17B (mean slope = 1.13, 95% CI = [0.92,1.19]) indicates an increased preference for higher pitch stimuli as a function of pitch CF preference, thus confirming their selectivity for F0. While more research is needed expressly controlling for effects of spectral density, the present findings are consistent with the findings of Penagos et al. (2004), who reported no difference in activation maps when contrasting responses to spectrally sparse versus spectrally dense harmonic tones, so long as both evoked a similar degree of pitch salience. The argument in favor of F0 tuning, as opposed to spectral-density tuning, is further bolstered by the scatterplot in Figure 14B, which shows a good correspondence between the best frequencies and best F0s for voxels tuned to both pure-tone frequency and F0.
Discussion
This fMRI study used complex tones to dissociate the auditory cortical representations of F0 (which determines pitch) from spectral content (which influences timbre) to determine which of these underlies the well-known tonotopic organization observed with pure tones. Consistent with previous pure-tone studies (Formisano et al., 2003; Da Costa et al., 2011; Striem-Amit et al., 2011; Saenz and Langers, 2014; Thomas et al., 2015), we found bilateral V-shaped high-low-high gradient reversals overlapping HG in all participants, with narrower tuning BWs around HG and broader tuning BWs in surrounding regions. Although the alignment across participants is complicated by individual differences in the size, shape, and number of HGs in each hemisphere (Rademacher et al., 2001), this high-low-high pattern of tonotopy was preserved at the group level (Fig. 3).
A similar high-low-high pattern was found with harmonic complex tones that maintained a constant F0 but varied systematically in their spectral peak or Fc, resulting in changes in timbre along a dull-bright continuum (Fig. 4). The similarity of the pure-tone frequency and complex-tone Fc representations both within and beyond HG suggest that the tonotopy observed in earlier studies was driven primarily by spectral content. However, the fact that many voxels exhibited selectivity for spectral content specifically in complex tones but not pure tones and vice versa (Figs. 12, 13) suggests an organization more complex than the simple filtering found in the cochlea. Most strikingly, we found evidence of an orderly representation of pitch-tuned voxels, particularly in regions surrounding HG, when examining responses to complex stimuli with a fixed spectral peak but varying in F0 (Figs. 4, 5, 15).
Relationship to previous studies
In addition to the pure tones commonly used to reveal robust cortical tonotopy, complex natural sounds, such as speech, musical instruments, and animal vocalizations, have been used to derive feature representations in auditory cortex using fMRI (Moerel et al., 2012; De Angelis et al., 2018). However, as with pure tones, the positive correlation between F0 and spectral energy often found in natural sounds (Assmann and Nearey, 2008; Hillenbrand and Clark, 2009; McAdams, 2013) makes it difficult to conclude whether the derived maps reflect spectral energy distributions, F0, or a combination of the two. The present study resolves this issue by independently varying F0 and Fc to tease apart the cortical topography of these features.
Early MEG studies using complex tones attempted to study the relationship between pitch and sound spectra, but reached differing conclusions: either that cortical tonotopy reflects pitch (Pantev et al., 1989) or that it comprises orthogonal representations of both pitch and spectral distribution (Langner et al., 1997). However, the limited spatial resolution of MEG makes it poorly suited to fine-grained analysis of the topographical organization of cortical representations. The present study used high-field fMRI to explore the topography of these features at a much higher spatial resolution.
Lastly, several studies have used fMRI in an effort to identify regions of human auditory cortex that respond preferentially to pitch-eliciting stimuli (Penagos et al., 2004; Hall and Plack, 2009; Norman-Haignere et al., 2013; De Angelis et al., 2018). While their focus was on finding regions exhibiting categorical pitch sensitivity, the present study extends this work by treating pitch (F0) as a continuous variable and dissociating F0 selectivity from frequency selectivity (tonotopy). Our results reveal voxel-wise tuning to different ranges of F0, in regions surrounding HG, that is distinct from tonotopy. As such, our work provides a unique and complementary perspective on F0 encoding in human auditory cortex.
Voxel tuning across multiple dimensions
A considerable proportion of voxels exhibited tuning in two or more of the conditions tested, particularly in the pure-tone and timbre conditions (Figs. 12, 13). Although a majority of the voxels that exhibited tuning to more than one dimension were selective in the pure-tone and timbre conditions, it may seem surprising that the overall proportion was not greater, given the evidence that tuning in both these conditions is driven by spectral content. This incomplete overlap between the populations may be due in part to the fact that the range of pure-tone frequencies (100–6400 Hz) was greater than the Fc range (400–6400 Hz), but may also reflect genuine differences in selectivity based on higher-level features, such as sound complexity or BW. Indeed, our findings are in line with single- and multi-unit studies in other species that have identified neurons that are sensitive to either pure tones or complex sounds, but not both (Rauschecker et al., 1995; Feng and Wang, 2017; Kikuchi et al., 2019).
Our data suggest the existence of two distinct cortical representations: one based on frequency selectivity (i.e., tonotopy), and the other based on pitch or F0 selectivity. While partially overlapping, responses in HG were predominantly driven by spectral content, as reflected by the strong model fits in both the pure-tone and timbre conditions. Pitch representations, on the other hand, were mostly found in regions surrounding HG. These findings are consistent with the idea that lower-level frequency content is processed predominantly in primary auditory region A1 and higher-level sound features (such as pitch) are processed predominantly in surrounding nonprimary (belt and parabelt) regions.
Pitch tuning in auditory cortex
While many studies have explored responses to pitch in auditory cortex, the approach has generally been to compare sounds with salient pitches to those with weak pitches or to present a variety of pitch-evoking stimuli to identify pitch-sensitive regions (Penagos et al., 2004; Hall and Plack, 2009; Norman-Haignere et al., 2013). In contrast, the present study explored voxel-wise tuning to different pitches while controlling for spectral variations. We were able to identify voxels in all participants that exhibited selectivity along the F0 dimension. The initial maps incorporating all voxels exhibiting F0 selectivity suggested a (somewhat noisy) high-low-high organization of F0. However, when only voxels exclusively selective along the F0 dimension (R2 < 30% in the other two dimensions) were selected (Fig. 15), regions distinctly tuned to low, medium, and high F0s were found bilaterally, with clear clusters of voxels tuned to low F0s around the medial portion of HG (black arrows) and lining STG (black rectangles), and clear clusters of voxels tuned to high F0s in regions anterolateral to HG (white arrows). This distributed pitch coding throughout auditory regions is consistent with findings from cortical surface electrode recordings in humans (Gander et al., 2019).
Bilaterality in cortical representations
There is some disagreement in the literature regarding the laterality of cortical pitch representations. Bilateral sensitivity to pitch has been found in a number of studies (Patterson et al., 2002; Penagos et al., 2004; Warren et al., 2005; Hall and Plack, 2009; Norman-Haignere et al., 2013; Allen et al., 2017, 2018; De Angelis et al., 2018), whereas some others have suggested a right hemisphere lateralization for some forms of pitch processing (Zatorre et al., 2002; Hyde et al., 2008; Albouy et al., 2020). No significant difference in the number of pitch-tuned voxels between hemispheres was found in the present study, suggesting that pitch selectivity is represented bilaterally.
Limitations
Our results are consistent with pitch selectivity across extended regions of auditory cortex. However, as noted above, F0 is inversely proportional to spectral density of the harmonics. Although our additional analyses supported the hypothesis that the responses reflected selectivity to F0 and not spectral density, the use of both harmonic and inharmonic tones could help to better dissociate F0 and pitch from spectral density.
Our design involved varying either the F0 or Fc of a complex tone, while keeping the other dimension fixed. A fully-crossed stimulus design (i.e., pairing all possible F0s with all possible values of Fc) would have required more scanning sessions, but would help determine the generalizability of the F0 and Fc representations, such as whether the tuning to F0 of a given voxel is independent of the stimulus Fc. Now that the existence of F0 tuning has been established, such additional questions could be addressed in a more extensive study.
Finally, an open question is how relative changes are represented in cortex (i.e., contour and interval size, regardless of absolute changes in frequency or F0). Relative pitch processing is essential for both music and speech comprehension, but little is known about its cortical representations. There is some fMRI evidence of lateralization differences for the processing of relative contour changes versus absolute interval sizes (Stewart et al., 2008), and recent studies using subdurally implanted electrodes, recorded while participants listened to variable pitch contours in speech, provide evidence of both absolute and relative pitch encoding in human auditory cortex (Tang et al., 2017; Hamilton et al., 2021). While the present study suggests there may be systematic maps of absolute pitch, follow-up work is needed to explore the contributions of relative pitch in the cortical processing hierarchy and to assess whether there is any systematic representation based on the magnitude and/or direction of change. Since relative changes are based on the relationships between consecutive sounds, this would point toward a higher, more holistic level of processing in the hierarchy compared with absolute pitch representations.
Footnotes
This work was supported by the National Institutes of Health (NIH) Grant R01 DC005216. The Center for Magnetic Resonance Research (CMRR) is supported by NIH Grants P41 EB027061, P30 NS076408, and S10 RR026783 and by the W.M. Keck Foundation. We thank Anahita Mehta, Omer Faruk Gulban, Andrea Grant, Cheryl Olman, and Stephen Engel for helpful assistance, training, and advice.
The authors declare no competing financial interests.
Correspondence should be addressed to Emily J. Allen at prac0010@umn.edu.