
Cortical signatures of auditory object binding in children with autism spectrum disorder are anomalous in concordance with behavior and diagnosis

  • Hari Bharadwaj ,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Writing – original draft

    hbharadwaj@purdue.edu

    Affiliations Department of Speech, Language, & Hearing Sciences, Purdue University, West Lafayette, Indiana, United States of America, Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana, United States of America

  • Fahimeh Mamashli ,

    Contributed equally to this work with: Fahimeh Mamashli, Sheraz Khan

    Roles Data curation, Formal analysis, Writing – review & editing

    Affiliations Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, United States of America, Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, United States of America, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America

  • Sheraz Khan ,

    Contributed equally to this work with: Fahimeh Mamashli, Sheraz Khan

    Roles Data curation, Formal analysis, Methodology, Writing – review & editing

    Affiliations Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, United States of America, Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, United States of America, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America

  • Ravinderjit Singh,

    Roles Investigation, Validation

    Affiliation Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana, United States of America

  • Robert M. Joseph,

    Roles Methodology, Supervision, Writing – review & editing

    Affiliation Boston University School of Medicine, Boston, Massachusetts, United States of America

  • Ainsley Losh,

    Roles Data curation, Formal analysis

    Affiliation Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, United States of America

  • Stephanie Pawlyszyn,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, United States of America

  • Nicole M. McGuiggan,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, United States of America

  • Steven Graham,

    Roles Validation

    Affiliation Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, United States of America

  • Matti S. Hämäläinen,

    Roles Funding acquisition, Methodology, Writing – review & editing

    Affiliations Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, United States of America, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America

  • Tal Kenet

    Roles Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    Affiliations Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, United States of America, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America

Abstract

Organizing sensory information into coherent perceptual objects is fundamental to everyday perception and communication. In the visual domain, indirect evidence from cortical responses suggests that children with autism spectrum disorder (ASD) have anomalous figure–ground segregation. While auditory processing abnormalities are common in ASD, especially in environments with multiple sound sources, to date, the question of scene segregation in ASD has not been directly investigated in audition. Using magnetoencephalography, we measured cortical responses to unattended (passively experienced) auditory stimuli while parametrically manipulating the degree of temporal coherence that facilitates auditory figure–ground segregation. Results from 21 children with ASD (aged 7–17 years) and 26 age- and IQ-matched typically developing children provide evidence that children with ASD show anomalous growth of cortical neural responses with increasing temporal coherence of the auditory figure. The documented neurophysiological abnormalities did not depend on age, and were reflected both in the response evoked by changes in temporal coherence of the auditory scene and in the associated induced gamma rhythms. Furthermore, the individual neural measures were predictive of diagnosis (83% accuracy) and also correlated with behavioral measures of ASD severity and auditory processing abnormalities. These findings offer new insight into the neural mechanisms underlying auditory perceptual deficits and sensory overload in ASD, and suggest that temporal-coherence-based auditory scene analysis and suprathreshold processing of coherent auditory objects may be atypical in ASD.

Introduction

Successful navigation of environments with multiple stimuli fundamentally relies on the brain’s ability to perceptually organize the barrage of sensory information into discrete coherently bound objects on which cognitive processes such as selective attention can act. Failure of this scene-segregation process, where one source stands out as the foreground “figure” and the remaining stimuli form the “background,” can result in an overwhelming sensory experience that makes it difficult to select a source of interest while suppressing the others [1,2]. For both visual and auditory processing, temporal coherence across assemblies of neurons that code for different stimulus features is thought to promote binding of those features into coherent perceptual objects [3,4]. A leading hypothesis about sensory processing abnormalities in autism spectrum disorder (ASD) is that this kind of temporal “synthesis” of sensory information is atypical [5–8]. This hypothesis stems from behavioral data indicating that individuals with ASD often show impaired processing of dynamic stimuli, such as the coherent motion of visual dots [9]. In the auditory domain, where stimuli are naturally dynamic, atypical sensory-stimulation-driven behaviors, particularly the experience of sensory overload and difficulty selectively listening to a foreground sound source of interest, are ubiquitous in ASD. These deficits are most acutely expressed in complex environments with multiple stimuli, where separation of foreground speech from background noise is essential [10–12]. Although many lines of behavioral evidence in ASD are consistent with the impaired temporal synthesis hypothesis, direct neural correlates of such deficits have not been identified. In the visual domain, atypical neural responses to illusory figures and contours have been interpreted as indirect evidence of impaired figure–ground segregation in ASD [13–15]. However, it is not known whether similar processes may also underlie the widespread auditory processing abnormalities in ASD [10–12,16–22], which in turn may contribute to the well-documented speech processing and language impairments associated with the disorder [21,23–31].

Here, we investigated whether auditory temporal coherence processing, and thus auditory figure–ground segregation and/or suprathreshold processing of coherent figures, is impaired in ASD, by employing a novel auditory paradigm. By manipulating temporal coherence in the scene with synthetic sounds, the paradigm was designed such that the acoustic features of the stimulus perceptually bind together into auditory objects with different levels of salience, with the salience of the foreground “figure” object increasing with increasing temporal coherence. More specifically, the stimulus consisted of a mixture of 20 different tones that were spaced in frequency such that they would excite equally spaced sections along the cochlea, following prevailing models of tonotopy in the human auditory system. The individual tones (Fig 1A) were amplitude modulated using a random envelope, with the envelope fluctuations being statistically independent, i.e., temporally incoherent, across the tones at the start of the stimulus. At set intervals through the stimulus time course, a subset of N tones out of the 20 (N = 6, 12, or 18) were switched from being modulated independently to being modulated with temporally coherent envelope fluctuations (Fig 1B). At this point, the stimulus percept would change from a large number of buzzing tones to a broadband irregular click-train-like sound popping out as a coherent auditory object from the background buzzing. A key feature of the stimulus design was that the frequency spacing between any 2 tones was 50% larger than the bandwidth of cochlear filters, so that the different tones would excite mostly distinct tonotopic sections of the ascending auditory pathway. Consequently, robust responses to increases in temporal coherence would need to arise primarily from downstream neural computations that combine information across neurons driven by different tonotopic channels. As the number of tones that switched from being incoherent to coherent increased, the perceptual pop-out would become more salient, and would be expected to elicit increasingly larger neural responses. Sample stimuli for all 3 coherence levels are available in S1–S3 Audios. Note that our stimulus paradigm deliberately maintains fixed modulation statistics within each tonotopic channel throughout the length of the stimulus, thus avoiding discontinuities within any given channel, and deviating from the more classic design pioneered by Teki et al. [32–34]. We chose this design despite the fact that coherent periods might exhibit an increase in overall amplitude, because cochlear tonotopic processing dictates that the central nervous system does not have access to the overall amplitude and is instead driven by individual tonotopic channels. Thus, any neural responses that reflect the overall amplitude would also have to arise from neural computations that use the temporal coherence to combine information across different tonotopic channels. A supplementary behavioral experiment was also conducted to validate this design.

Fig 1. Stimulus properties and design.

(A) Tone mixture used as carrier for the temporal coherence stimuli. (B) Depiction of one trial. Yellow right-pointing arrows on top mark onset of non-coherent modulation periods (1 s), and white down-pointing arrows on top mark onset of coherent modulations of N = 6, 12, or 18 out of 20 tones, randomized. There were two 1-s-long coherent periods per trial. (C) Sample acoustic waveform from one trial.

https://doi.org/10.1371/journal.pbio.3001541.g001

To quantify the cortical processing of auditory temporal coherence, we recorded magnetoencephalography (MEG) signals from 26 typically developing (TD) and 21 ASD participants matched on age (7–17 years), gender, and IQ, using the above paradigm. The stimulus was presented passively while the children watched a silenced video of their choice, without subtitles. The participants were instructed to ignore the sounds they heard, so as to minimize potential differences due to attention across the 2 groups and to focus the comparison on group differences in low-level auditory processing. We hypothesized that children with ASD would show reduced growth of the cortical response with increasing salience of the foreground object, and thus atypical neural signatures of temporal coherence processing.

Results

Source localization and evoked responses

The MEG signals at the sensors were used to estimate neural source currents on the cortex using individual structural magnetic resonance imaging (MRI) data. As expected, in all participants, the averaged evoked responses relative to the overall onset of the stimulus (yellow rightward-pointing arrows in Fig 1B) were localized to the early auditory cortex in both hemispheres. Within these left and right regions of interest (ROIs), there were no significant group differences or tendencies towards group differences in either amplitude or latency of the evoked responses, in either hemisphere, across all identified peaks (Fig 2). All subsequent results are reported based on data from these individually derived ROIs. Because no other brain areas showed a consistent response across all participants, all of our analyses were limited to these ROIs.

Fig 2. Responses evoked by overall stimulus onset.

(A) Averaged left hemisphere evoked responses relative to stimulus onset, in source space, based on individually identified regions of interest (inset), for each group. (B) Same as (A), for the right hemisphere. Shaded areas show standard error per group. Underlying data can be found on Zenodo (doi: 10.5281/zenodo.5823656). ASD, autism spectrum disorder; L, left; R, right; TD, typically developing.

https://doi.org/10.1371/journal.pbio.3001541.g002

The responses evoked by the changes in temporal coherence (white downward-pointing arrows in Fig 1B), rather than the responses evoked by the overall stimulus onsets, are of primary interest to the question of figure–ground segregation. There were no significant differences in the responses evoked by these coherence changes between the right and left hemispheres for either group (S1 Fig), and therefore results from both hemispheres were combined for all subsequent analyses.

Auditory cortex responses evoked by onset of stimulus coherence

At the onset of the coherent portion of the stimulus (white downward-pointing arrows in Fig 1B), a robust auditory evoked response was observed for all 3 levels of stimulus coherence, in both the TD (Fig 3A) and ASD (Fig 3B) groups, with both groups showing an increase in the magnitude of the response as the number of coherent tones increased (6, 12, and 18 out of 20), in line with the increasing perceptual salience of the foreground figure. Note that the baseline relative to which these coherence-related responses were quantified was the incoherent portion of the stimulus (just before the downward-pointing white arrows in Fig 1B) and not the pre-stimulus silent periods. In the TD group, both the M100 (M1, 50–150 ms) and the M200 (M2, 250–450 ms) components of the response were reliably identifiable for most participants, while in the ASD group this was true only for the M1 component. Because the M2 peak was not discernable for most ASD participants (S2B Fig), for the remaining calculations we focused on the response in the combined M1 + M2 time window (50 ms to 450 ms in Fig 3A and 3B). Quantifying the magnitude of the response within this time window showed that, overall, the growth in response magnitude with increasing coherence was significantly more sluggish in the ASD group than in the TD group (Fig 3C). The main effect of group (F[1,45] = 8.55, p = 0.005) and the group × condition interaction (F[2,90] = 3.61, p = 0.03) were both significant. The groups differed most in the N = 18/20 tones condition (t[45] = 3.53, p = 0.001).

Fig 3. Evoked and induced responses following changes in temporal coherence.

(A) Evoked responses in the typically developing (TD) group, where time = 0 represents the onset of a 1-s coherently modulated period within the trial (white arrows in Fig 1B), not the onset of the trial. The responses are plotted for each of the 3 coherent-modulation conditions (N = 6, 12, and 18), each averaged separately. (B) Same as (A), for the autism spectrum disorder (ASD) group. (C) Averaged amplitude of the responses in (A) and (B), over the 50-ms to 450-ms time window, for each condition and group. ***The group difference was significant at the N = 18 condition. (D) Time-frequency plot for the TD group (top) and ASD group (bottom) in response to the N = 18 condition, where time = 0 represents the onset of the 1-s coherently modulated period within the trial (white arrows in Fig 1B), not the onset of the trial. Dotted line outlines the time-frequency window used to compare the two groups. (E) Group difference within the time-frequency window outlined in (D). For both the evoked response and the induced response analysis, the magnetoencephalography (MEG) data during the incoherent stimulus portion that preceded the onset of the coherent figure was used as the baseline. Underlying data can be found on Zenodo (doi: 10.5281/zenodo.5823656).

https://doi.org/10.1371/journal.pbio.3001541.g003

Another component of the cortical response to temporal coherence is the induced gamma band (>30 Hz) activity, which has been associated with perceptual figure–ground segregation in vision [35]; with processing of complex sounds such as speech in audition [36,37], particularly with active listening [38]; and with temporal synthesis of information via predictive coding [39]. Gamma band activity is also thought to be mediated by inhibitory (GABAergic) mechanisms [40], which have been documented to be abnormal in ASD [41–50]. In the N = 18/20 coherent tones condition, the TD participants had increased gamma band power (30–70 Hz), especially during the later period (400 ms onwards) of the response (Fig 3D, top), while no such increase was observed in the ASD group (Fig 3D, bottom). Indeed, the gamma band power was significantly lower in the ASD group (Fig 3E). Thus, not only was the response evoked by the object formation event reduced in ASD, but neural oscillations induced by the salient auditory figure were also reduced.

Classification by diagnosis and correlations with behavioral measures

To assess the relevance of the evoked responses and induced gamma band power to the ASD diagnosis, we used a blind linear support vector machine (SVM) classifier, with the individual z-scores for gamma band power and normalized evoked responses (both for the N = 18 coherent tones condition) as the predictive features. The classifier showed an accuracy of 83% ± 3% for group classification (Fig 4A).

Fig 4. Behavioral correlations and group classification.

(A) Group classification, using the individual evoked responses and induced gamma band activity as the inputs to the model. Yellow background depicts predicted ASD diagnosis. (B) The behaviorally assessed Social Responsiveness Scale–Social Communication and Interaction (SRS-SCI) scores, plotted against the same scores predicted by the model constructed using individual evoked responses and induced gamma band activity. (C) Same as (B), for the age-corrected Sensory Profile Questionnaire–Auditory Profile Subscale (SPQ-APSAC) scores. (D) The averaged evoked responses for the N = 18 tones condition, plotted against the NEPSY-II Inhibition Contrast Scaled Score–Inhibition (ICSS-I) behavioral measure, for each group. No correlation was observed between these measures. Underlying data can be found on Zenodo (doi: 10.5281/zenodo.5823656). ASD, autism spectrum disorder; MEG, magnetoencephalography; TD, typically developing.

https://doi.org/10.1371/journal.pbio.3001541.g004

We then examined the correlation between the cortical response and ASD symptom severity, using 2 behavioral measures: The Social Responsiveness Scale–Social Communication and Interaction (SRS-SCI) subscale, which assesses social-communicative symptoms, and the age-corrected scores (SPQ-APSAC) from the Sensory Profile Questionnaire–Auditory Profile Subscale (SPQ-APS), which assesses auditory processing abnormalities. To that end, we constructed linear models to predict the SRS-SCI and SPQ-APSAC from the individual z-scores for the evoked responses and the gamma band power, and compared the predicted scores to the behaviorally measured individual scores. We found that the predicted scores for both the SRS-SCI (Fig 4B) and the SPQ-APSAC (Fig 4C) were significantly correlated with the behavioral scores. Note that even without a correction for age, the predicted SPQ-APS still correlated with the observed SPQ-APS, just not as strongly (S3 Fig).

To quantify whether the observed effects might be attributable to gross variations in attentional control, we used data from the participants for whom the NEPSY-II Inhibition Contrast Scaled Score–Inhibition (ICSS-I) was collected. During the MEG session, participants were asked to attend to the movie (silenced, no subtitles) they were watching, and to ignore the stimulus. Despite the passive nature of the measurements, the coherent auditory figure can elicit a pop-out effect, where attention is drawn in a bottom-up manner. It is possible that participants less able to ignore the stimulus would also have stronger responses. The ICSS-I measures precisely this ability to inhibit attention. While there was no detectable group difference in ICSS-I, it is nonetheless possible that the ICSS-I would correlate with the magnitude of the response (irrespective of diagnosis), should the response be driven in part by attention being drawn to the stimulus. We found no correlation between the ICSS-I and the magnitude of the MEG evoked responses in either group (Fig 4D), indicating that the group differences are likely due to lower-level auditory processes, as hypothesized, rather than to attention-driven processes. There was also no correlation between the ICSS-I scores and the MEG gamma band activity (S5 Fig).

Lastly, to test whether increases in overall amplitude during the coherent portions of the stimulus, rather than across-channel coherence relationships, could account for the observed group differences, we conducted a small supplementary behavioral experiment in a separate group of 6 adult participants. In this experiment, the across-channel relationships were manipulated by changing the spectral separation between the coherent subset of tones for 2 coherence levels near threshold (either 4 or 6 coherent tones). Crucially, these manipulations do not change the overall amplitude. Yet they had a large effect on figure detection, showing that across-channel relationships, and not overall amplitude changes, were the main factor driving figure–ground segregation (S4 Fig). These results are in line with the constraints imposed by cochlear tonotopic processing, confirming the validity of our stimuli for probing temporal-coherence processing that combines information across neurons responding to different tonotopic channels.

Discussion

This study aimed to test whether the cortical correlates of auditory figure–ground segregation based on temporal coherence may be abnormal in ASD. As hypothesized, we found that children with ASD had significantly attenuated evoked responses to the pop-out of the foreground figure, alongside a lower magnitude of induced gamma band activity. Importantly, the cortical measures were not correlated with the behaviorally assessed ability to suppress attention, suggesting that lower-level auditory processes contribute to the observed abnormalities, rather than overall differences in attention to the pop-out of the auditory figure while being distracted by the movie. The cortical measures did, instead, correlate with both ASD severity, measured behaviorally, and abnormality of auditory sensory processing, also measured behaviorally.

These results suggest that neural processing of the temporal coherence of the acoustic constituents of an auditory stimulus is atypical in ASD. More specifically, the growth of the cortical neural response with increasing levels of temporal coherence was more sluggish in ASD. This is consistent with a scenario where the neurophysiological substrates that support the perception of highly temporally coherent complex sounds as salient foreground objects are impaired in ASD. Given the importance of temporal coherence as a binding and scene segregation cue in natural sounds, the atypical processing of temporal coherence in ASD could contribute to poorer object binding, as has indeed been suggested [51], and demonstrated indirectly in the visual domain in ASD [13–15]. In scenes with multiple sound sources, the reduced growth of the response with temporal coherence could lead to foreground sounds “standing out” less saliently, contributing to impaired auditory selective attention, i.e., “filtering,” and the experience of sensory overload [2]. Such abnormalities would inevitably also impact speech and language processing, which are highly temporally sensitive [36,52], especially in environments with multiple sound sources. Indeed, speech impairments in noisy environments in particular have been documented in ASD [11,53,54].

The pattern of evoked-response differences seen between ASD and TD (i.e., largest differences in the high-coherence condition) raises the possibility that suprathreshold salience of temporally coherent auditory figures, rather than threshold sensitivity to temporal coherence, may be impaired in ASD. In the visual domain, individuals with ASD show elevated thresholds (i.e., reduced sensitivity) for detecting coherently moving dots [55] and global patterns of movement of simple forms [9]. In the auditory domain, individuals with ASD show reduced suprathreshold ability to take advantage of coherent temporal fluctuations to derive masking release [12]. However, to the best of our knowledge, threshold sensitivity to temporal-coherence-based auditory pop-out has not been directly assessed in ASD. Our results showed that a combined M1 + M2 MEG evoked response was measurable at all three coherence levels in both groups, suggesting suprathreshold rather than sensitivity deficits in ASD. However, the M2 peak per se was not discernable for a majority of ASD participants at any of the three coherence levels. It is presently unknown how the M1 and M2 responses each relate to behavioral sensitivity to temporal coherence. Future behavioral studies are needed to explore whether sensitivity or suprathreshold auditory coherence processing is anomalous in ASD.

While it is not possible to rule out the role of attention-based differences in contributing to the reduced growth of evoked responses in ASD, it is unlikely that attentional abnormalities are the primary driver of these differences. Tone-cloud stimuli similar to the sounds employed in this study have been used with functional MRI, MEG, and electroencephalography to probe auditory figure–ground segregation in typical adults. These studies showed that although evidence of temporal coherence processing is measurable in the auditory cortex, attention can significantly enhance the neural representation of the foreground figure, especially in later parietal regions [32,34,56,57]. Indeed, animal models show that temporal coherence sensitivity and neural computations that support auditory scene analysis begin subcortically, as early as the cochlear nucleus [58]. There is also evidence of reduced, rather than increased, attention in ASD to non-speech auditory stimuli, even when perception of these stimuli is identical between the groups [59]. By virtue of using a passive design and focusing the analysis on the auditory cortex, the results of the present study show atypical processing in ASD of even early aspects of temporal-coherence-mediated computations. Of course, this does not preclude the possibility that in an active auditory scene analysis task, impairments in later attentional processing in ASD would further exacerbate the perceptual deficits. Attention networks are also known to be anomalous in ASD [60]. Another factor that makes it less likely that deficits in attentional processing can fully account for the differences reported in this study is that we observed no gross differences in the evoked responses relative to the overall onset of the auditory stimulus; it is well known that attention modulates auditory evoked responses [61,62], and thus the absence of differences in this overall response is consistent with an absence of difference in attentional load for this particular paradigm. Furthermore, across participants in both groups, there was no correlation between ICSS-I score, which measures overall ability to inhibit attention, and MEG evoked responses. Thus, the results of our study suggest that pre-attentive aspects of auditory figure–ground segregation are already impaired in ASD.

In addition to sluggish growth of evoked responses in ASD, we also found that the induced gamma band activity that was observed in the TD group (in the high-coherence condition) was reduced in the ASD group. There are many possible interpretations for the observed gamma band differences. Computational models of gamma rhythms predict that these rhythms may be involved in coherence-dependent selective routing of sensory information to downstream regions [63]. Earlier in vitro studies using voltage-sensitive dyes, and more recent in vivo optogenetic studies, provide evidence that this coherence-dependent gating is mediated by GABAergic processing [40,64], which is known to be atypical in ASD [42–50]. However, unlike the evoked-response differences, these gamma band differences were observed later in time (400 ms and after) relative to the onset of coherent modulations. Thus, although the evoked responses likely reflect pre-attentive processing at least in part, the increase in induced gamma band activity in the TD participants may indicate attention capture. The lack of gamma band activity in ASD may thus indicate failure of attention capture, either as a consequence of the impaired coherence processing and reduced salience with which the foreground figure pops out, or because of differences in attentional processing in general in ASD [65]. Another possibility is that the gamma band activity may encode prediction errors associated with the temporal synthesis of information during predictive coding [39], which also may be impaired in ASD [66]. These results are consistent with our prior studies of ASD that also found evidence of abnormal local functional connectivity, as well as increased feedforward connectivity alongside reduced top-down gain control [67–69]. More targeted future work is needed to delineate the precise mechanisms that contribute to the gamma band differences observed in the present study.

Finally, our stimulus design deviated from the established stochastic figure–ground stimuli used by Teki et al. [32], and as a result, the overall amplitude of the stimulus was allowed to change during coherent periods. As discussed in the Results section, because of the cochlear constraints, it is unlikely that these overall amplitude changes themselves would drive differences in evoked responses. Furthermore, because many children with ASD exhibit hypersensitivity to more intense sounds, it has been hypothesized by many research groups that neural responses in ASD would be larger, not smaller, as stimulus amplitude is increased. Despite great interest in this question, surprisingly, there are no reports to our knowledge describing atypical auditory cortical responses as a function of stimulus intensity in children with ASD and average or above average IQ. Our own unpublished MEG data examined exactly this question, in a group of 11 TD children and 15 ASD children (average age 10 years, all with IQ > 80); to our own surprise, we found no indication of group differences in the cortical evoked-response amplitudes as a function of stimulus intensity (55 dB to 85 dB in 10-dB steps). The lack of published data on this question may indicate that other groups also did not see such differences, and therefore also left the data unpublished.

In summary, we have found that cortical evoked responses to increasingly coherent auditory stimuli show reduced growth of magnitude in children with ASD relative to TD children. The fact that the observed neurophysiological metrics correlated with behavioral measures that tap into individual ASD presentations, specifically the SRS-SCI and auditory processing scores, confirms the relevance of these neural measures to ASD. One key advantage of our novel stimuli and the passive design is that the paradigm is translatable to patient populations that are difficult to work with, where behavioral assessments cannot be performed. Along the same lines, the paradigm is also applicable to animal models of ASD and other neurodevelopmental disorders. In sum, our observations of reduced growth of neural responses with increasing temporal coherence of auditory stimuli in ASD provide novel insight into the mechanisms that ultimately result in the abnormal sensory and social-communicative processing deficits characteristic of ASD.

Materials and methods

Participants

Twenty-six TD school-aged children and 21 children with ASD participated in the study. The study was conducted in accordance with the principles expressed in the Declaration of Helsinki. Parents provided written informed consent according to protocols approved by the Massachusetts General Hospital Institutional Review Board (IRB protocol #2005P001768). Participants provided assent in addition to parent consent. Phenotypic data collected from the participants are summarized in Table 1. The age range in both groups was 7–17 years, with the mean age being 13.8 years and the median age being 14 years. All participants were right-handed, with handedness information collected using the Dean Questionnaire [70]. Hearing was assessed by testing cochlear sensitivity to sounds, and the groups were indistinguishable based on distortion-product otoacoustic emissions (see “Otoacoustic emissions” below). Participants with ASD had a prior clinical diagnosis of ASD and met ASD criteria on the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) [71], administered by trained research personnel who had established inter-rater reliability (with co-author RMJ). The Social Communication Questionnaire, Lifetime Version, was administered to rule out ASD in TD participants, and to further confirm ASD in ASD participants. ASD participants who did not meet a cutoff of ≥15 on the Social Communication Questionnaire, or had a borderline score on the ADOS-2, were then further evaluated by expert clinician co-author RMJ, to confirm the ASD diagnosis. Individuals with autism-related medical conditions (e.g., fragile X syndrome, tuberous sclerosis) and other known risk factors (e.g., gestation < 36 weeks) were excluded from the study. All TD participants were below the threshold on the Social Communication Questionnaire and were confirmed to be free of any neurological or psychiatric conditions, and free of substance use for the past 6 months, via parent reports and self reports.

Verbal IQ (VIQ) and nonverbal IQ (NVIQ) were assessed using the Kaufman Brief Intelligence Test, Second Edition (KBIT-2) [73], for 18 ASD and 17 TD participants, and using the Differential Ability Scales–II (DAS-II) [74] for 3 ASD and 9 TD participants. The 2 groups differed on both NVIQ and VIQ. The group difference in NVIQ was driven by a single ASD participant with a NVIQ of 59. When this participant was removed from the sample, the group difference was no longer significant. We therefore checked all the correlation results excluding this participant, and all of the significant correlations and classification results remained significant to a similar extent with and without this participant. Given these results, we kept the participant for all of the analyses, and did not correct for NVIQ. The group difference in VIQ is in line with the ASD phenotype, and therefore we did not correct for this difference. For the correlations between MEG data and behavioral scores, we focused on 2 ASD-related behavioral assessments. The first was the Social Responsiveness Scale, Second Edition (SRS-2) parent report [75], which was designed as a quantitative measure of autism-related symptoms. The SRS-2 yields separate subscale scores for social communication and interaction (SRS-SCI) and for restricted interests and repetitive behaviors (SRS-RRB). Confirmatory factor analyses demonstrate a 2-factor structure that differentiates the 4 SRS-SCI subscales from the SRS-RRB subscale [76]. Here, we focused on the SRS-2 SRS-SCI composite as a measure of ASD social symptom severity. The second ASD-related behavioral assessment was the Sensory Profile Questionnaire [10], and specifically the score from the auditory section of the sensory profile (SPQ-APS). Note that because we hypothesize that the processes probed here would correlate with sensory sensitivity or avoidance, rather than sensory seeking, we chose to exclude the response to question 8, which probes sensory seeking directly (“Enjoys strange noises/seeks to make noise for noise’s sake”), from the score. We also checked the results with question 8 included, and they remained significant, although the effect was slightly reduced.

SPQ-APS age effects

Because of the wide age range, we checked whether any of the MEG responses or behavioral measures correlated with age. Of all of the considered MEG and behavioral measures, the only measure that was (weakly) correlated with age was SPQ-APS (S3A Fig; p < 0.04 within the ASD group). To account for this, we adjusted the SPQ-APS scores for age, and used the residuals to examine correlations with the MEG measures (Fig 4C). The age-corrected value is referred to as SPQ-APSAC in the text. The correlations between the observed and predicted SPQ-APS values uncorrected for age are shown in S3B Fig.
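
The age adjustment reduces to regressing the score on age and keeping the residuals. A minimal Python sketch follows; the exact regression procedure is not specified in the text, so an ordinary least-squares fit is assumed here, and the variable names are hypothetical.

```python
import numpy as np

def age_corrected(scores, ages):
    """Residualize scores against age with a simple linear fit.

    Illustrative only: an ordinary least-squares fit is assumed,
    since the paper does not specify the adjustment procedure.
    """
    scores, ages = np.asarray(scores, float), np.asarray(ages, float)
    slope, intercept = np.polyfit(ages, scores, deg=1)
    return scores - (slope * ages + intercept)

# spq_apsac = age_corrected(spq_aps, age)  # hypothetical per-participant arrays
```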

Otoacoustic emissions

To obtain an objective correlate of cochlear mechanical function, distortion-product otoacoustic emission (DPOAE) growth functions were measured as a function of the level of the f2 primary tone (f2 = 1, 2, 4, or 8 kHz) using an integrated probe with 2 independent sound sources and a low noise microphone (Etymotic ER-10C, Etymotic Research). The frequency and level of the f1 tone were varied according to the formula provided by [77] to maximize the level of the DPOAE for each level of the f2 tone. The DPOAE level was estimated at the distortion frequency of 2f1–f2. The number of trials was varied such that the noise floor (estimated by subtracting the mean of odd trials from even trials) was −15 dB SPL or better at 1 kHz and 2 kHz, and −25 dB SPL or better at 4 and 8 kHz. All participants had 2f1−f2 DPOAE amplitudes of at least 6 dB above the noise floor for L2 of 40 dB SPL, consistent with normal hearing. The TD and ASD groups were indistinguishable based on the DPOAE growth functions at each of the 4 tested frequencies.

MEG data acquisition and preprocessing

MEG data were acquired inside a magnetically shielded room (IMEDCO) using a 306-channel dc-SQUID Neuromag VectorView MEG system (Elekta-Neuromag, Helsinki, Finland) with 204 planar gradiometers and 102 axial magnetometers. Two bipolar electro-oculogram (EOG) electrode pairs measured horizontal eye movements and blinks. A bipolar chest electrode pair was used to record electrocardiogram (ECG) data. Data were acquired at a sampling rate of 3,000 Hz and subsequently downsampled to 1,000 Hz for analysis. After manual marking of bad channels during acquisition, signal-space separation (SSS) was applied to attenuate contributions from sources physically outside the radius of the helmet [78]. Four head position indicator coils were used to monitor head position. Samples containing artifacts associated with eye movements and blinks were identified by detecting peaks in the vertical EOG channel; samples with cardiac artifacts were similarly identified from the ECG data. These samples were used to define spatial filters to suppress artifacts using the signal-space projection method: 2 filters for blink artifact removal and 2 for cardiac artifact removal. Data were then band-pass filtered between 1 and 90 Hz and segmented into epochs time-locked to stimulus events. Epochs were rejected if the peak-to-peak range over the epoch exceeded either 1,000 fT in any magnetometer channel or 3,000 fT/cm in any planar gradiometer channel. Every stimulus condition retained between 200 and 240 epochs for every participant included in the data analysis.
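
For readers who wish to reproduce a comparable pipeline, the MNE-Python sketch below mirrors the steps described above (SSS, downsampling, SSP projectors for blinks and heartbeats, 1–90 Hz band-pass, epoching with amplitude-based rejection). The study used the vendor tools and the original MNE software, so the file name, event timing, and several defaults below are assumptions rather than the authors' actual settings.

```python
import mne

# Illustrative preprocessing sketch (file name and event codes are hypothetical)
raw = mne.io.read_raw_fif("subject01_raw.fif", preload=True)
raw = mne.preprocessing.maxwell_filter(raw)           # SSS (cf. ref. [78])
raw.resample(1000)                                    # 3,000 Hz -> 1,000 Hz

# Signal-space projection: 2 blink and 2 cardiac projectors (1 mag + 1 grad each)
eog_projs, _ = mne.preprocessing.compute_proj_eog(raw, n_mag=1, n_grad=1)
ecg_projs, _ = mne.preprocessing.compute_proj_ecg(raw, n_mag=1, n_grad=1)
raw.add_proj(eog_projs + ecg_projs)

raw.filter(1.0, 90.0)                                 # band-pass 1-90 Hz
events = mne.find_events(raw)
epochs = mne.Epochs(
    raw, events, tmin=-0.5, tmax=4.0,                 # 0.5-s baseline + 4-s trial
    reject=dict(mag=1000e-15, grad=3000e-13),         # 1,000 fT; 3,000 fT/cm in T/m
)
```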

Source localization

T1-weighted, high-resolution MPRAGE (magnetization-prepared rapid acquisition gradient-echo) structural images were acquired on a 3.0 T Siemens Trio whole-body MRI scanner (Siemens Medical Systems, Erlangen, Germany) using a 32-channel head coil. The in-plane resolution was 1 × 1 mm2, slice thickness was 1.3 mm with no gaps, and the repetition time/inversion time/echo time/flip angle values were 2,530 ms/1,100 ms/3.39 ms/7 degrees. The geometry of each participant’s cortical surface was reconstructed from the 3D structural MRI data using FreeSurfer software (http://surfer.nmr.mgh.harvard.edu). After correcting for topological defects, cortical surfaces were triangulated with dense meshes with about 130,000 vertices in each hemisphere. For visualization, the surfaces were inflated, thereby exposing the sulci [79]. The cortical surface was decimated to a grid of 10,242 dipoles per hemisphere, corresponding to a spacing of about 5 mm between adjacent source locations on the cortical surface. The watershed algorithm was used to generate the inner skull surface triangulations from the T1-weighted MR images of each participant. The MEG forward solution was computed using a single-compartment boundary element model (BEM) [80]. The head position information from the first run was used to estimate the sensor location relative to the source space. Sensor data from subsequent runs were transformed to correspond to the head position of the first run during the signal-space separation preprocessing step [78]. The cortical current distribution was estimated using minimum-norm estimate (MNE) software (http://www.martinos.org/mne); in this solution, we assumed that the orientation of the source was fixed and perpendicular to the cortical mesh. Measurement noise covariance was estimated from the pre-stimulus MEG data pooled across all conditions and runs. To reduce the bias of the MNEs toward superficial source distributions, we used a noise-normalization procedure to obtain dynamic statistical parametric maps (dSPMs) as z-scores [81].
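
In current MNE-Python terms, the forward and inverse computations described above can be sketched as follows. The study used the original MNE software, so subject and file names and some defaults here are assumptions; "ico5" spacing corresponds to the ~10,242 dipoles per hemisphere mentioned in the text, and `epochs` is assumed to come from a preprocessing step like the sketch above.

```python
import mne

subject = "subject01"  # hypothetical FreeSurfer subject name
src = mne.setup_source_space(subject, spacing="ico5")     # ~10,242 dipoles/hemisphere
model = mne.make_bem_model(subject, conductivity=(0.3,))  # single-compartment BEM
bem = mne.make_bem_solution(model)
fwd = mne.make_forward_solution(epochs.info, trans=f"{subject}-trans.fif",
                                src=src, bem=bem)

# Noise covariance from the pre-stimulus baseline, pooled across conditions
noise_cov = mne.compute_covariance(epochs, tmax=0.0)

inv = mne.minimum_norm.make_inverse_operator(epochs.info, fwd, noise_cov,
                                             fixed=True)  # orientation fixed,
                                                          # normal to the cortex
stc = mne.minimum_norm.apply_inverse(epochs.average(), inv,
                                     method="dSPM")       # noise-normalized maps
```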

We used the sensor-level evoked response combined across all conditions to localize the sources on the cortex; ROIs for analysis were defined based on individual-participant source localization results. No whole-brain group analysis was performed for this study. ROIs were defined based on the localization of the onset response evoked by the start of the auditory stimulus, pooled across all conditions. The dSPM scores in the 90- to 140-ms time window following the onset of the auditory stimulus were averaged to yield a single z-score per vertex. This whole-brain z-score map was then thresholded such that 2 contiguous clusters remained, 1 per hemisphere. It was manually verified that these clusters overlapped with the anatomical FreeSurfer label containing Heschl’s gyrus in each participant. Note that the point spread from MEG inverse imaging often yielded activations at adjacent sulci (as in the ROI insets in Fig 2). For each participant, the largest contiguous set of 32 vertices in the left and right hemisphere ROIs was then averaged to extract source time courses for analysis. Because each individual’s ROI-based time courses were self-normalized by z-scoring relative to the baseline fluctuations, a fixed ROI size was used rather than attempting to tailor the ROI size to each individual’s activation pattern. This normalization also made it easier to quantify the difference between the 2 groups and minimized the effects of noise.

Auditory stimulus

The primary goal of the stimulus design was that the temporal coherence between acoustic features should be amenable to parametric manipulation. Twenty tones were equally spaced from 200 Hz to 8 kHz on an “ERB scale” based on psychophysical estimates of cochlear tuning derived from a simultaneous masking paradigm [82], and were synthesized such that each tone had an intensity of 60 dB SPL (thus, the overall level was approximately 73 dB SPL). The tone spacing corresponded to approximately 1.5 ERBs for moderate sound levels. Given that human cochlear tuning bandwidths are likely narrower than those estimated from simultaneous masking [83], the individual tones would interact only minimally at the auditory periphery. Therefore, their temporal coherence or lack thereof would, at least in part, have to be computed downstream by the central auditory system based on the coherence of neural firing patterns. Each tone was modulated by a random envelope, which was a half-wave rectified and smoothed band-limited (4–24 Hz) noise. The envelope band was chosen so as to overlap with typical speech envelopes. Each stimulus iteration was 4 s long. For the first 1 s, the envelopes of all 20 tones were independently realized, and hence temporally incoherent. For the next 1 s, the envelopes of N of the tones were the same noise realization, making them maximally coherent, while the remaining 20 − N remained uncorrelated. This sequence was repeated a second time, yielding an overall stimulus of 4 s duration. Three different temporal coherence conditions were synthesized, with N = 6, 12, or 18 of the possible 20 tones being coherent. The particular subset of tones that were coherently modulated, and their envelope waveforms, were randomized from trial to trial. Greater values of N result in a more salient pop-out of the coherent tone subset. The trials for the 3 conditions were pseudo-randomly interleaved, with 240 trials presented per condition and an interstimulus interval (ISI) uniformly distributed between 1.2 and 1.3 s. Of the 1.2- to 1.3-s silent periods between stimuli, the last 0.5 s before the onset of a new trial was treated as the baseline period for estimating the noise-covariance matrix for source localization. The MEG recording was performed with participants watching a muted movie without subtitles while the stimuli were presented diotically.
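
A minimal synthesis sketch of one trial is given below. It is an approximation of the published stimuli, not a reconstruction: it uses the standard Glasberg–Moore ERB-number formula as a stand-in for the masking-based scale of [82], assumes an audio sampling rate, and omits level calibration and the final envelope smoothing step.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 48_000  # assumed audio sampling rate (not specified in the text)

def erb_spaced_tones(n=20, f_lo=200.0, f_hi=8000.0):
    # Equal spacing on the Glasberg-Moore ERB-number scale, used here as a
    # stand-in for the masking-based scale of ref. [82]
    e = np.linspace(21.4 * np.log10(1 + 0.00437 * f_lo),
                    21.4 * np.log10(1 + 0.00437 * f_hi), n)
    return (10.0 ** (e / 21.4) - 1) / 0.00437

def random_envelope(dur, rng):
    # Band-limited (4-24 Hz) noise, half-wave rectified; the additional
    # smoothing described in the text is omitted for brevity
    noise = rng.standard_normal(int(dur * FS))
    sos = butter(4, [4.0, 24.0], btype="bandpass", fs=FS, output="sos")
    return np.maximum(sosfiltfilt(sos, noise), 0.0)

def make_trial(n_coherent, rng=None):
    # One 4-s trial: 1 s incoherent, 1 s coherent, repeated twice
    rng = np.random.default_rng() if rng is None else rng
    freqs = erb_spaced_tones()
    t = np.arange(4 * FS) / FS
    coherent = set(rng.choice(20, size=n_coherent, replace=False))
    shared = [random_envelope(1.0, rng) for _ in range(2)]  # one per coherent second
    mix = np.zeros_like(t)
    for k, f in enumerate(freqs):
        segs = [shared[seg // 2] if (seg % 2 == 1 and k in coherent)
                else random_envelope(1.0, rng)
                for seg in range(4)]
        mix += np.concatenate(segs) * np.sin(2 * np.pi * f * t)
    return mix / np.abs(mix).max()  # per-tone 60 dB SPL calibration omitted
```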

Although our stimulus design was analogous to the stochastic figure–ground (SFG) stimuli used by Teki et al. [32–34], it deviated from the SFG stimulus in important ways. The SFG stimulus consisted of a cloud of 50-ms-long chords. For each 50-ms window, the set of tone frequencies that formed the chord was either random and uncorrelated with the adjacent chords (when the figure was absent) or chosen such that a subset of tones followed a coherent time-frequency trajectory (when the figure was present). While this SFG design maintains the overall stimulus amplitude at a roughly constant level, the modulation statistics within some of the frequency bands (i.e., within some tonotopic channels) were allowed to change between the figure-absent and the figure-present portions of the stimulus. This leaves open the possibility that when a figure appears in the stimulus, the within-tonotopic-channel discontinuities in the modulation statistics alone can contribute to a neural response. In contrast, the stimuli used in the present study have fixed modulation statistics within each tonotopic channel throughout the length of the stimulus. One consequence of keeping the within-channel modulation statistics constant while simultaneously increasing the temporal coherence across channels is that the overall amplitude of the stimulus can increase during the coherent portions of the stimulus. However, cochlear processing dictates that the central nervous system does not have access to the overall amplitude and is instead driven by individual tonotopic channels. Accordingly, any neural responses that reflect the overall amplitude would also have to arise from neural computations that use the temporal coherence to combine information across different tonotopic channels. A supplementary behavioral study (S4 Fig) was conducted to validate this reasoning and our modified design.

Audio files of the stimuli are provided in S1–S3 Audios.

Data analysis and statistical testing

From the left and right auditory cortex ROIs, time courses of evoked responses (i.e., responses averaged across trials) were extracted by simple polarity alignment and averaging. The evoked responses were expected to show an overall onset response at the beginning of the auditory stimulus, and 2 other event-related responses to the changes in temporal coherence that occurred at peristimulus times of t = 1 s and t = 3 s, marked by white arrows in Fig 1B. The evoked responses for each of the 3 coherence conditions (N = 6, 12, and 18) were obtained by collapsing across the 2 coherence-change events (i.e., by averaging the time course surrounding the event at t = 3 s with the time course surrounding the event at t = 1 s). The evoked responses derived using this procedure were normalized for each individual by z-scoring relative to the 200 ms immediately preceding the coherence-change events. To quantify the M1 component, we averaged the z-scores in the 50- to 150-ms time window. To quantify the M2 component, we averaged the z-scores in the 250- to 450-ms time window.
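
The per-event quantification reduces to a baseline z-score followed by window averages. A minimal sketch, assuming the ROI time course is epoched with a 200-ms pre-event baseline and a 1,000-Hz sampling rate:

```python
import numpy as np

def evoked_metrics(tc, fs=1000, pre=0.2):
    # tc: ROI time course with the coherence-change event at t = `pre` seconds
    n0 = int(pre * fs)
    base = tc[:n0]                        # 200-ms pre-event baseline
    z = (tc - base.mean()) / base.std()   # per-individual normalization
    win = lambda a, b: z[n0 + int(a * fs):n0 + int(b * fs)].mean()
    return {"M1": win(0.050, 0.150),      # 50-150 ms
            "M2": win(0.250, 0.450),      # 250-450 ms
            "M1+M2": win(0.050, 0.450)}   # combined window used in the main analysis
```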

Because for both the M1 and M2 responses the greatest and most significant main effect of group was for the N = 18 coherent tones condition (S2 Fig), and because the M2 component was sometimes difficult to identify in the ASD group, we focused on analyzing the combined M1 + M2 response, computed by averaging the z-scores in the 50- to 450-ms time window. The combined time window also eliminated the need to be sensitive to variations in individual latencies; while there was no notable group difference in M1 or M2 latencies, undetected latency differences could affect the analysis of individual peaks even in the absence of group differences in latencies.

For the analysis of induced oscillations, spectrograms were estimated from the ROI time courses in each trial and then averaged together. Spectrograms were estimated using the multitaper method for 200-ms-long windows and a time-bandwidth product of 4. This allowed for averaging over estimates from 3 orthogonal maximally concentrated tapers to obtain each time-frequency coefficient [84]. The spectrogram power estimates were then averaged over trials and log transformed. The time axis was then collapsed by averaging, to yield an induced power spectrum in the 5–70 Hz range. Because no systematic differences were seen between left and right hemisphere ROIs in the induced spectra, the responses from the 2 sides were averaged. The induced-oscillation power spectra from each condition and participant in the gamma range (30–70 Hz) were then used for the statistical analysis.
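
A sketch of the multitaper estimation step, matching the stated parameters (200-ms windows, time-bandwidth product 4, 3 tapers); the hop size and FFT details are assumptions. Per the text, the resulting power estimates would then be averaged over trials, log transformed, and collapsed over time.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrogram(tc, fs=1000, win_s=0.2, nw=4, n_tapers=3):
    """Sliding-window multitaper power estimate for a single-trial ROI
    time course; 50% overlap between windows is assumed."""
    n_win = int(win_s * fs)
    tapers = dpss(n_win, nw, Kmax=n_tapers)   # (3, n_win) Slepian tapers
    hop = n_win // 2
    starts = range(0, len(tc) - n_win + 1, hop)
    spec = np.array([
        np.mean(np.abs(np.fft.rfft(tapers * tc[s:s + n_win], axis=1)) ** 2,
                axis=0)                        # average over the 3 tapers
        for s in starts])                      # (n_windows, n_freqs)
    freqs = np.fft.rfftfreq(n_win, 1 / fs)
    return freqs, spec
```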

Statistical inference on the MEG ROI data was performed by fitting mixed-effects models to the data and adopting a model comparison approach [85,86]. Two separate sets of models were fit for the evoked-response amplitudes and for the induced spectra. Fixed-effects terms were included for the various experimental factors (i.e., the 3 temporal coherence conditions, the 2 groups, and the group × condition interaction), whereas participant-related effects were treated as random effects. Homoscedasticity of participant-related random effects was not assumed initially, and hence the error terms were allowed to vary and be correlated across the levels of fixed-effects factors. In order to not over-parameterize the random effects, the random terms were pruned by comparing models with and without each term using the Akaike information criterion and log-likelihood ratios [87]. The best-fitting random-effects model turned out to be a single participant-specific random effect that was condition independent (and an overall residual). This random-effect term was used for all subsequent analysis. All model coefficients and covariance parameters were estimated using restricted maximum likelihood as implemented in the lme4 library in R [88]. To make inferences about the experimental fixed effects (i.e., the effect of condition, group, and group × condition interaction), the F approximation for the scaled Wald statistic was employed [89]. This approximation is more conservative in estimating type I error rates than the chi-squared approximation of the log-likelihood ratios and has been shown to perform well even with fairly complex covariance structures and small sample sizes [90]. The p-values and F-statistics based on this approximation are reported.
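
The authors fit these models with the lme4 library in R. Purely as an illustration of the fixed- and random-effects structure (and not the authors' code), an equivalent specification in Python's statsmodels might look like the following, with hypothetical column names:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table: one row per participant x condition,
# with columns "amp" (response), "group", "condition", and "subject"
df = pd.read_csv("evoked_amplitudes.csv")
model = smf.mixedlm("amp ~ group * condition",  # fixed effects + interaction
                    df, groups="subject")       # participant random effect
result = model.fit(reml=True)                   # restricted maximum likelihood
print(result.summary())
```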

Correlations between the neurophysiological measures and behavioral measures

To test whether individual behavioral scores in the ASD group correlated with the neurophysiological measures obtained (evoked responses and induced gamma band activity), we used these neurophysiological measures (from the N = 18 condition) as predictors in simple linear regression models, one predicting each ASD participant’s SRS-SCI score and another predicting the SPQ-APS score. Thus, the best-fitting linear model was used to obtain the “predicted” behavioral scores exclusively from this combination of neurophysiological measurements. Pearson correlation coefficients between the predicted and the observed behavioral scores are reported to indicate the predictability of behavior from neurophysiology (Fig 4B and 4C).
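
In sketch form, and with placeholder variable names for the per-participant measures, the procedure amounts to:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression

# Placeholder arrays: per-participant (ASD group) z-scores from the
# N = 18 condition, plus the observed behavioral scores
X = np.column_stack([evoked_z, gamma_z])
reg = LinearRegression().fit(X, srs_sci)
r, p = pearsonr(reg.predict(X), srs_sci)  # predicted vs. observed correlation
```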

Classification by diagnosis

To further assess the relevance of the evoked responses and induced gamma band power to the ASD diagnosis, we used a linear SVM classifier to test whether these neurophysiological metrics allow for blind classification of individuals into their corresponding groups. Specifically, the individual z-scores for gamma band power and normalized evoked responses (both for the N = 18 coherent tones condition) were used as the predictive features. The full dataset of 47 participants was randomly partitioned into a training sample containing 90% of the participants and a test sample containing the remaining 10%. The accuracy of the optimal linear SVM classifier obtained from the training sample was then assessed by comparing the predicted class labels to the actual class labels in the test sample. To obtain the mean and standard error of the classification accuracy, this 90%–10% train–test split was repeated 50 times with replacement. The classifier had an accuracy of 83% ± 3% for group classification (Fig 4A) across the 50 train (90%)–test (10%) splits.
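
A minimal version of this evaluation loop in scikit-learn, assuming the two per-participant features are already computed (variable names are placeholders), might look like this; `ShuffleSplit` draws each 90%/10% split independently, corresponding to the repeated resampling described above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import ShuffleSplit
from sklearn.metrics import accuracy_score

X = np.column_stack([evoked_z, gamma_z])  # (47, 2) features, N = 18 condition
y = group_labels                           # e.g., 0 = TD, 1 = ASD

accs = []
for train, test in ShuffleSplit(n_splits=50, test_size=0.1,
                                random_state=0).split(X):
    clf = SVC(kernel="linear").fit(X[train], y[train])
    accs.append(accuracy_score(y[test], clf.predict(X[test])))

sem = np.std(accs) / np.sqrt(len(accs))
print(f"accuracy: {np.mean(accs):.2f} +/- {sem:.2f}")
```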

Ethics statement

The study was conducted in accordance with the principles expressed in the Declaration of Helsinki. Parents provided written informed consent according to protocols approved by the Massachusetts General Hospital Institutional Review Board (IRB protocol #2005P001768). Participants provided assent in addition to parent consent.

Supporting information

S1 Audio. Auditory file of stimulus example for the N = 6 coherent tones condition.

The attached auditory file is an example stimulus heard by the participants.

https://doi.org/10.1371/journal.pbio.3001541.s001

(WAV)

S2 Audio. Auditory file of stimulus example for the N = 12 coherent tones condition.

The attached auditory file is an example stimulus heard by the participants.

https://doi.org/10.1371/journal.pbio.3001541.s002

(WAV)

S3 Audio. Auditory file of stimulus example for the N = 18 coherent tones condition.

The attached auditory file is an example stimulus heard by the participants.

https://doi.org/10.1371/journal.pbio.3001541.s003

(WAV)

S1 Fig. Evoked responses by hemisphere.

(A) The averaged amplitude of the evoked responses for each condition (x-axis), over the 50-ms to 450-ms time window, for each hemisphere (left = solid line, right = dashed line), for the TD group. (B) Same as (A), for the ASD group. There were no hemispheric differences in either group (p = 0.21). Underlying data can be found on Zenodo (doi: 10.5281/zenodo.5823656).

https://doi.org/10.1371/journal.pbio.3001541.s004

(EPS)

S2 Fig. M1 and M2 components of the evoked responses.

(A) The M1 component (50–150 ms) of the evoked responses from Fig 3A–3C, by condition, for both groups. The group difference was only significant for the N = 18 coherent tones condition. (B) Same as (A), for the M2 component of the response (250–450 ms). The group difference was significant for both the N = 12 and N = 18 coherent tones conditions, with differences getting larger with coherence. For the main effect of group: F(1,45) = 9.65, p = 0.003; for the group × condition interaction: F(2,90) = 3.4, p = 0.03. Underlying data can be found on Zenodo (doi: 10.5281/zenodo.5823656).

https://doi.org/10.1371/journal.pbio.3001541.s005

(EPS)
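
For clarity, component amplitudes of this kind can be obtained by averaging the evoked time course within each stated window; the sketch below illustrates this with placeholder data (the sampling rate, epoch length, and array contents are assumptions, not the study data or the authors' code).

```python
# A minimal sketch of averaging an evoked time course over the M1 and M2
# windows stated above; all data here are placeholders.
import numpy as np

sfreq = 1000.0                                  # assumed sampling rate (Hz)
times = np.arange(-0.1, 0.6, 1.0 / sfreq)       # epoch time axis (s)
evoked = np.random.default_rng(0).standard_normal(times.size)  # placeholder

def window_mean(signal, times, t_min, t_max):
    """Mean amplitude of `signal` between t_min and t_max seconds."""
    mask = (times >= t_min) & (times <= t_max)
    return signal[mask].mean()

m1 = window_mean(evoked, times, 0.050, 0.150)   # M1: 50-150 ms
m2 = window_mean(evoked, times, 0.250, 0.450)   # M2: 250-450 ms
print(m1, m2)
```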

S3 Fig. SPQ-APS and age.

(A) SPQ-APS plotted versus age, for the ASD group. (B) Same as Fig 4C, but without the age correction: the behaviorally assessed auditory processing subscore, SPQ-APS, uncorrected for age, plotted against the same score predicted using the individual evoked responses and induced gamma band activity. Underlying data can be found on Zenodo (doi: 10.5281/zenodo.5823656).

https://doi.org/10.1371/journal.pbio.3001541.s006

(EPS)

S4 Fig. Data from a behavioral experiment testing the contribution of across-channel relationships versus the contribution of overall stimulus amplitude, for the detection of the temporally coherent figure.

Figure detection data were collected from N = 6 adult participants at 2 coherence levels (4 or 6 coherent tones) near the detection threshold. (A) The acoustic time course of an example stimulus in which the coherently modulated tones were closely spaced in frequency. (B) The acoustic time course of an example stimulus in which the coherently modulated tones were widely spaced in frequency and interspersed with incoherently modulated tones. Because the number of coherently modulated tones is the same in (A) and (B), the amplitude increase is the same for both categories of stimuli. (C and D) The spectrograms of the stimuli in (A) and (B), respectively. (E) Detection accuracy dropped significantly when the frequency separation was large (6.3 or 5.7 equivalent rectangular bandwidths [ERBs]; B and D) compared to when it was small (1.5 ERBs; A and C), despite both stimuli producing the same increase in overall stimulus amplitude during the coherent portions. This demonstrates that, as expected, it is the coherence relationships across different tonotopic channels, and not the overall amplitude, that is most likely the primary driver of figure detection. (A sketch of the ERB scale follows this entry.) Underlying data can be found on Zenodo (doi: 10.5281/zenodo.5823656).

https://doi.org/10.1371/journal.pbio.3001541.s007

(EPS)
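
The frequency separations above are expressed in ERBs, that is, distances on the ERB-number (Cam) scale of Glasberg and Moore; a minimal sketch of that conversion follows (the example tone frequencies are hypothetical).

```python
# A sketch of the ERB-number (Cam) scale of Glasberg and Moore (1990);
# the example tone frequencies below are hypothetical.
import numpy as np

def erb_number(f_hz):
    """ERB-number (in Cams) corresponding to frequency f_hz (in Hz)."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_separation(f1_hz, f2_hz):
    """Separation between two tone frequencies, in ERBs."""
    return abs(erb_number(f2_hz) - erb_number(f1_hz))

print(erb_separation(1000.0, 1200.0))   # closely spaced tones: ~1.4 ERBs
print(erb_separation(1000.0, 2400.0))   # widely spaced tones: ~7.1 ERBs
```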

S5 Fig. ICSS-I scores and MEG gamma band activity.

As with the evoked responses, no significant correlations were observed between ICSS-I score, which measures overall ability to inhibit attention to unwanted stimuli, and induced gamma band activity. Underlying data can be found on Zenodo (doi: 10.5281/zenodo.5823656).

https://doi.org/10.1371/journal.pbio.3001541.s008

(EPS)
