Similar Brain Activation during False Belief Tasks in a Large Sample of Adults with and without Autism

Reading about another person’s beliefs engages ‘Theory of Mind’ processes and elicits highly reliable brain activation across individuals and experimental paradigms. Using functional magnetic resonance imaging, we examined activation during a story task designed to elicit Theory of Mind processing in a very large sample of neurotypical individuals (N = 462) and a group of high-functioning individuals with autism spectrum disorders (N = 31), using both region-of-interest and whole-brain analyses. This large sample allowed us to investigate group differences in brain activation to Theory of Mind tasks with unusually high sensitivity. We found no differences in brain activation between neurotypical participants and those diagnosed with autism spectrum disorder. These results imply that the social cognitive impairments typical of autism spectrum disorder can occur without measurable changes in the size, location or response magnitude of activity during explicit Theory of Mind tasks administered to adults.


Introduction
Theory of Mind ('ToM') is the capacity to represent mental states, such as thoughts, beliefs, desires, feelings, plans, suspicions and doubts [1]. Consideration of others' mental states helps people in many everyday activities: teaching, flirting, coordinating and cooperating, playing games, conducting minor and massive deceptions, making moral judgments and appreciating fiction. Individuals with autism spectrum disorders (ASD) have impaired ToM. For example, children with ASD are disproportionately delayed on tasks that tap inferences about other people's beliefs [2]. The neural mechanism of this impairment remains unknown. However, in neurotypical (NT) adults and children, fMRI studies reveal a remarkably reliable group of brain regions recruited during a ToM task of belief reasoning [3][4][5][6][7][8][9]. These regions include the left and right temporo-parietal junction (LTPJ and RTPJ), the right anterior superior temporal sulcus (RSTS), the medial precuneus (PC), and the medial prefrontal cortex (MPFC).
Previous authors have suggested that ToM impairments in ASD could be caused by impaired function in the brain regions typically involved in ToM [10][11][12][13]. Attempts to characterize the function of these ToM-relevant brain regions in adults with ASD have yielded conflicting results, however. Some studies suggest that activations in ToM regions show no difference between ASD and NT individuals [14,15]. Others find reduced activity (i.e. hypo-activity) [13,16], or the opposite pattern, hyper-activity, in ASD [17,18], while still others find evidence of all three patterns depending on the specific task demands [16,19].
One factor contributing to these conflicting results may be that sample sizes are small, and individual variability is large. Small samples of individuals with ASD are problematic because individuals with ASD may be highly heterogeneous in their neural responses (e.g., [20]). Small samples of NT individuals are equally problematic, because they allow for calculation of only the mean of the typical response, not its distribution. Understanding the typical distribution is critical if neural measures are to be useful in a clinical or diagnostic setting. For most clinical applications, it is less important to describe differences between groups of individuals (e.g. studies of this nature have an average of 14 adults with ASD vs. 14 NT adults [13]), and more important to be able to describe the neural activity pattern of each specific individual, relative to typical and atypical distributions. For example, using fMRI to help diagnose an individual with ASD would require comparing each individual to the typical distribution.
In the current study, we therefore aggregated data collected over 8 years from 462 NT participants. This large sample allowed us to investigate individual differences in neural responses to a belief-reasoning ToM task, and measure any difference between NT participants and high-functioning adults with ASD with unusually high sensitivity. We also tested whether the response of ToM regions in NT individuals is related to basic demographic factors that may be relevant for ASD, including gender, age, and IQ.

Methods
All studies whose data are used in the current paper were reviewed and approved by the MIT IRB, the Committee on the Use of Human Experimental Subjects. Participants provided written, informed consent, in accordance with the guidelines of the MIT Committee on the Use of Human Experimental Subjects, and were compensated monetarily for their time.

ASD Participants
Thirty-one participants with a clinical diagnosis of ASD (mean age = 32.5 years, range 18-66 years; 26 male) were included in this analysis, having volunteered to participate in one of three previous studies [21][22][23]. In addition to a clinical diagnosis of ASD, each participant was assessed with the Autism Diagnostic Observation Schedule (ADOS; communication score: mean = 3.2, SD = 1.3; social score: mean = 5.9, SD = 2.1). Each ASD participant had a combined social and communication score ≥ 7 (the criterion for inclusion in the study). IQ was measured for all but one male participant with ASD (KBIT-2; mean IQ = 116.8, range 69-141, SD = 15.7). In previous studies in our lab, these participants showed significant behavioral deficits in ToM on a moral judgment task [21,22].
For the direct NT vs. ASD comparison, a set of 27 NT participants was chosen by pairwise matching to 27 ASD participants on IQ (±10 points), age (±5 years), and gender. The pairs were also matched on all experimental parameters (e.g. the coil used, the TR and slice thickness, the modality of the stimuli, the number of stimuli per condition, the presentation duration of the stimuli, and the task the participant performed). (Note that several ASD participants were excluded because they could not be matched to a specific NT participant.) These samples each contained 22 males and were matched on age (ASD mean age = 31.0 years, range 18-66, SD = 11.5; NT mean age = 30.6 years, range 19-50, SD = 9.3) and IQ (ASD mean IQ = 117.9, range 90-141, SD = 12.7; NT mean IQ = 115.1, range 83-133, SD = 12.2); these 54 participants are termed the 'matched' sample.
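The text does not say how the pairs were formed, only the tolerances. The following is a minimal greedy sketch of pairwise matching under those tolerances. Several things are assumptions, not the authors' procedure: the greedy order, the tolerance-scaled distance used to choose among admissible partners, and the hypothetical `protocol` field that bundles the experimental parameters (coil, TR, slice thickness, modality, stimuli, task) into one label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    pid: str
    age: float
    iq: float
    gender: str
    protocol: str  # hypothetical label bundling coil, TR, slice thickness, modality, stimuli and task


def is_match(asd, nt, iq_tol=10.0, age_tol=5.0):
    """True if an NT participant is an admissible match for an ASD participant."""
    return (
        asd.gender == nt.gender
        and asd.protocol == nt.protocol          # identical experimental parameters
        and abs(asd.iq - nt.iq) <= iq_tol        # within +/-10 IQ points
        and abs(asd.age - nt.age) <= age_tol     # within +/-5 years
    )


def greedy_match(asd_group, nt_pool, iq_tol=10.0, age_tol=5.0):
    """Pair each ASD participant with the closest unused admissible NT participant.

    ASD participants with no admissible partner are skipped, mirroring the
    exclusions described in the text. The greedy order is an assumption."""
    used, pairs = set(), []
    for asd in asd_group:
        candidates = [nt for nt in nt_pool
                      if nt.pid not in used and is_match(asd, nt, iq_tol, age_tol)]
        if not candidates:
            continue
        # Distance scaled by each tolerance so IQ and age contribute comparably.
        best = min(candidates,
                   key=lambda nt: abs(asd.iq - nt.iq) / iq_tol + abs(asd.age - nt.age) / age_tol)
        used.add(best.pid)
        pairs.append((asd.pid, best.pid))
    return pairs
```

An optimal assignment, for example the Hungarian algorithm, could match more participants than a greedy pass. The greedy version is shown only because it is short.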

fMRI Tasks
All participants were presented with verbal stories in English that described a character who acquired a false belief (Belief condition) or a physical representation that became false, such as an outdated photograph or map (Photo condition). For example, one Belief story was: "The morning of high school dance Sarah placed her high heel shoes under her dress and then went shopping. That afternoon, her sister borrowed the shoes and later put them under Sarah's bed." One sample Photo story was: "Sargent famously painted the south bank of the river in 1885. In 1910 a huge dam was built, flooding out the whole river basin, killing the old forests. Now the whole area is under water." (More example stimuli are available at http://saxelab.mit.edu/superloc.php.)
Across conditions, the stories were matched for length (see Table 2 for more details about the tasks). Each participant read or heard an equal number of stories in the two conditions. The localizers were designed to present between 10 and 16 stories per condition. Due to extenuating circumstances, however, a small number of participants were presented with as few as 5 stories per condition, while others saw as many as 24 (mean = 13.2). The stories were presented either visually as text on a screen (to 420 participants) or aurally through headphones (to 73 participants). In separate blocks of the same experiment, 121 participants also saw stimuli from other conditions (e.g. physical descriptions of objects, lists of unconnected words), but those conditions were not included in the current analyses. The duration of each stimulus block corresponded, on average, to 0.47 seconds per word (SD = 0.06 s), and each block was followed by 10-12 seconds of rest (these values were constant within each variant of the task; see Table 2). After reading or hearing each story, participants performed one of three tasks: true/false (TF, e.g. "In the painting the south bank of the river is wooded. True/False", N = 304), fill in the blank (FITB, e.g. "In the painting the south bank of the river is… Wooded/Flooded", N = 101), or word match-to-sample (MTS, e.g. "In the preceding story, did you read 'Painted'?", N = 88). The task was held constant within participants but varied across participants. These tasks correspond to the functional localizers used in previously published studies [24][25][26][27][28][29][30].

ROI Analyses
Seven functional ROIs from the ToM network were defined in individual participants using the contrast Belief > Photo, consistent with previous literature (e.g. [4,6]): the right and left temporoparietal junction (RTPJ, LTPJ), the precuneus (PC), the dorsal, middle and ventral components of MPFC (DMPFC, MMPFC and VMPFC), and the right superior temporal sulcus (RSTS).
To identify individual functional ROIs, initial "hypothesis spaces" for each of the 7 regions were defined based on a group random effects analysis. These hypothesis spaces were then used as a guide to identify clusters of activation representing the ROI in each participant. To ensure independence, the participants were split randomly into two groups (first half N = 247, second half N = 246), and the hypothesis spaces from one group's random effects analysis were used to define ROIs in the participants belonging to the other group. ASD participants were evenly distributed between the two groups. Each hypothesis space consisted of all voxels in a contiguous cluster of suprathreshold voxels that included the region representative of the ROI. The hypothesis spaces were approximately spherical, except for the RSTS, which was elongated along the sulcus. Averaged across both halves, the DMPFC comprised 1,185 voxels, all with z > 20 mm, centered at (x, y, z) = (-1, 53, 29) mm. The MMPFC comprised 1,094 voxels with 0 < z < 20 mm, centered at (1, 54, 12). The VMPFC comprised 774 voxels, all with z < 0 mm, centered at (1, 50, -12). The RSTS comprised 3,002 voxels, all with z > 6 mm, centered at (55, -10, -16). The RTPJ comprised 2,812 voxels, all with z > 6 mm, centered at (54, -52, 23). The LTPJ similarly comprised 2,444 voxels, all with z > 6 mm, centered at (-52, -58, 25). Finally, the PC hypothesis space consisted of 3,339 voxels centered at (1, -56, 34). Average ROI hypothesis spaces are available as binary images in the NIfTI-1 file format at saxelab.mit.edu/hypothesis_spaces.zip.
Each participant's contrast image (Belief > Photo) was masked in turn with each of the seven hypothesis spaces. After each masking, candidate voxels were identified within the hypothesis space. A voxel was a candidate if it was individually significant at p < 0.001 (uncorrected) and contiguous with at least 10 other voxels significant at p < 0.001. From this set of candidates, the voxel with the peak T value was selected, along with all other candidate voxels that were contiguous with and no more than 9 mm from the peak. Five parameters were extracted from each ROI:

- the size of the ROI (number of voxels included);
- the mean T value across voxels included in the ROI;
- the x-, y-, and z-coordinates of the ROI's "center of mass," defined as the average position of ROI voxels weighted by their T values.

The presence or absence of an identified ROI in each region was used as a sixth parameter.
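The ROI-selection rule above can be sketched on a T map stored as a 3-D NumPy array. This is a minimal illustration, not the lab's implementation, and it makes several assumptions:

- 26-connectivity defines contiguity.
- Contiguity is evaluated only within the hypothesis space.
- Voxels are 2 mm isotropic (a hypothetical value; the voxel size is not stated here).
- The center of mass is returned in voxel indices rather than in mm.

`t_thresh` stands for the T value corresponding to p < 0.001.

```python
import numpy as np
from collections import deque

# 26-connectivity (faces, edges and corners); the connectivity rule is an assumption.
OFFSETS = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
           if (dx, dy, dz) != (0, 0, 0)]


def label_clusters(mask):
    """Label connected clusters of a boolean 3-D array with a breadth-first flood fill."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in OFFSETS:
                nb = (x + dx, y + dy, z + dz)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n
                    queue.append(nb)
    return labels


def define_roi(tmap, hyp_space, t_thresh, min_cluster=11, radius_mm=9.0, voxel_mm=2.0):
    """Return ROI parameters, or None when no ROI is found in this hypothesis space."""
    supra = (tmap > t_thresh) & hyp_space
    labels = label_clusters(supra)
    sizes = np.bincount(labels.ravel())
    # Candidates: supra-threshold voxels in clusters of >= 11 voxels (a voxel plus 10 others).
    big = np.nonzero(sizes >= min_cluster)[0]
    big = big[big > 0]                                   # label 0 is background
    candidates = np.isin(labels, big)
    if not candidates.any():
        return None
    peak = np.unravel_index(np.argmax(np.where(candidates, tmap, -np.inf)), tmap.shape)
    dist_mm = np.sqrt(sum((idx - p) ** 2 for idx, p in zip(np.indices(tmap.shape), peak))) * voxel_mm
    # Keep candidates in the peak's cluster that lie within radius_mm of the peak.
    roi = (labels == labels[peak]) & (dist_mm <= radius_mm)
    coords = np.argwhere(roi)                            # same C order as tmap[roi]
    w = tmap[roi]
    return {
        "found": True,
        "size": int(roi.sum()),
        "mean_t": float(w.mean()),
        "center_of_mass": (coords * w[:, None]).sum(axis=0) / w.sum(),  # T-weighted, voxel units
    }
```

Converting the center of mass to millimetre coordinates would require the image's voxel-to-world affine, which is omitted here.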
The reliability of ROI parameters within participants was assessed by split-half analysis. For each participant, two contrast images were computed, one from the even runs and one from the odd runs. The correlation between even-half and odd-half parameter values was then measured across participants. Significance was established by permuting the even-half data across participants (5,000 permutations) to generate an empirical 'null' distribution. We report individual differences as reliable if the true pairing showed a higher correlation than 90% of the empirical null distribution. (Because these analyses are based on half of the data per participant, they are conservative estimates of the reliability of individual differences measured with the full dataset per individual.)
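The split-half permutation test described above can be sketched in a few lines of NumPy. This sketch assumes one ROI parameter per participant in each half, with participants in which the ROI was not found in either half already dropped. The 90% criterion comes from the text. The function names are illustrative.

```python
import numpy as np


def split_half_reliability(even, odd, n_perm=5000, seed=0):
    """Correlate an ROI parameter across participants between even- and odd-run halves,
    and rank the true correlation within a permutation null distribution."""
    even = np.asarray(even, dtype=float)
    odd = np.asarray(odd, dtype=float)
    r_true = np.corrcoef(even, odd)[0, 1]
    rng = np.random.default_rng(seed)
    # Shuffling the even-half values across participants breaks the within-person pairing.
    null = np.array([np.corrcoef(rng.permutation(even), odd)[0, 1] for _ in range(n_perm)])
    rank = float(np.mean(null < r_true))  # share of the null that the true pairing beats
    return r_true, rank


def is_reliable(rank, criterion=0.90):
    """Reliability criterion used in the text: the true pairing beats 90% of the null."""
    return rank > criterion
```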
Next, we had two aims. The first was to remove variance from the ROI parameters associated with 'nuisance' demographic and experimental variables, to better reveal differences (if any) between ASD and NT groups. The second was to evaluate the effects (if any) of our demographic and experimental variables on ROI parameters. To both these ends, a multivariate Generalized Linear Model (GLM) was constructed for each ROI parameter with a nine-column predictor matrix (age, gender, group, modality, coil, number of stimuli per condition, mean words per stimulus, task type and the intercept term), using data from 493 (462 NT and 31 ASD) participants (see Table 3). For the binary statistic that indicated whether or not the ROI of interest was identified in a given subject, the GLM assumed a binomial distribution and a logit link function. The GLM used a normal distribution and an identity link function for all other ROI statistics. All regressors except the intercept were mean-centered prior to regression. Correction for multiple comparisons was performed with Bonferroni correction for the nuisance predictors, across all predictors (age, gender, etc.) and all dependent measures (mean T, number of voxels, etc.) within each ROI, as detailed below. With the exception of the beta values that relate the predictors to the probability of discovery (which is binomial), these beta values express the size of the effect in parameter units (e.g., mm) per regressor unit (e.g., years).

Table 2. The data analyzed here are aggregated across seven variants of the Theory of Mind task. Although the conceptual contrast was constant (Belief > Photo), across participants there was variation in the length and number of stimuli, the modality (V = visual, A = auditory) and the explicit task (FITB = fill in the blank, MTS = match to sample, TF = true/false). We also report the mean number of words per stimulus.
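The two GLM variants described above can be sketched with NumPy alone:

- a normal distribution with an identity link, which is ordinary least squares;
- a binomial distribution with a logit link, which is logistic regression fitted here by iteratively reweighted least squares (IRLS).

This is an illustration, not the authors' MATLAB code. It assumes that all predictors, including categorical ones such as task type, are already numerically coded, and that the binary "found" outcome is not perfectly separable by the predictors.

```python
import numpy as np


def design_matrix(predictors):
    """Mean-center each predictor column, then append an (uncentered) intercept column."""
    P = np.asarray(predictors, dtype=float)
    P = P - P.mean(axis=0)
    return np.column_stack([P, np.ones(len(P))])


def fit_normal_identity(X, y):
    """Ordinary least squares (normal distribution, identity link): betas, t values, dof."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se, dof


def fit_binomial_logit(X, y, n_iter=100, tol=1e-10):
    """Logistic regression (binomial distribution, logit link) by iteratively
    reweighted least squares. Assumes no perfect separation in the data."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)                  # binomial variance, used as IRLS weights
        z = eta + (y - mu) / w               # working response
        new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta
```

Because the predictors are mean-centered, each slope is read directly in parameter units per regressor unit, and the intercept is the fitted value at the sample mean of every predictor.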
Given the number of predictors, we evaluated the estimability of the predictors of interest, particularly the group predictor, using Belsley collinearity diagnostics [31]. The group predictor never exceeded the standard tolerance used by the MATLAB collintest function (a condition index greater than 30 together with a variance-decomposition proportion greater than 0.5). All predictor/parameter pairs lay well within tolerance across all tests.
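Belsley's diagnostics can be sketched from a singular value decomposition. The procedure has four steps:

1. Scale each column of the design matrix (intercept included) to unit length.
2. Compute one condition index per singular value.
3. Compute, for every predictor, the proportion of its coefficient variance associated with each singular value.
4. Flag a near-dependency when a condition index exceeds 30 and at least two predictors have proportions above 0.5 on it.

This NumPy version is an illustration of that rule under the stated tolerances, not a reimplementation of collintest.

```python
import numpy as np


def belsley_diagnostics(X):
    """Condition indices and variance-decomposition proportions (rows = predictors,
    columns = singular values) for a design matrix that includes the intercept."""
    Xs = X / np.linalg.norm(X, axis=0)            # scale each column to unit length
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_index = s.max() / s
    phi = (Vt.T ** 2) / s ** 2                    # phi[k, j] = v_kj^2 / s_j^2
    vdp = phi / phi.sum(axis=1, keepdims=True)    # each predictor's proportions sum to 1
    return cond_index, vdp


def flag_collinear(X, ci_tol=30.0, vdp_tol=0.5):
    """Indices of predictors involved in a near-dependency under Belsley's rule."""
    ci, vdp = belsley_diagnostics(X)
    flagged = set()
    for j in np.nonzero(ci > ci_tol)[0]:
        high = np.nonzero(vdp[:, j] > vdp_tol)[0]
        if len(high) >= 2:
            flagged.update(int(k) for k in high)
    return sorted(flagged)
```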
In addition to the large-sample GLM, three other samples were analyzed in turn using an identical procedure, the only variation being the subset of participants from which the predictor matrix was constructed (see Table 1, Table 3). These samples were:

1. The 'Matched' sample: 27 ASD participants matched pairwise to 27 NT participants on gender, age, IQ, coil, stimulus modality, and task (both ASD and NT participants were drawn from [21][22][23]).
2. The 'ASD only' sample, used to analyze variability within participants diagnosed with ASD.
3. The 'IQ' sample, including all 91 participants for whom IQ was collected.

In each sample, all non-degenerate predictors were used (i.e., predictors whose values were defined in all participants and varied within the sample; see Table 3). Estimability was assessed in the same way as in the Full sample. Across all tests, only one predictor, for one parameter, exceeded the tolerance: IQ when used to predict the mean T value of the MMPFC in the IQ sample. Thus, our predictors of interest were properly estimable in our models.
Because of the very large number of comparisons, we corrected p values using three different correction factors (m), according to the formula $p_{\text{corrected}} = 1 - (1 - p_{\text{uncorrected}})^{m}$ (the Šidák form of the Bonferroni correction). For our key a priori predictors of interest (ASD vs. NT in the full sample and the 'Matched' sample, and IQ in the 'IQ' sample), we corrected for the 6 dependent variables (i.e. the ROI parameters) per ROI, resulting in m = 6. The effect of ADOS score was measured in the ASD-only sample. Because ADOS yields two measures (a social and a communication score), we used m = 12. All remaining predictors were treated as exploratory, so their effects are reported as significant only after correcting for both the number of dependent variables (6) and the number of nuisance predictors (9), resulting in m = 54 (exploratory predictors are considered only in the full sample). Any relationship significant at p < 0.01 uncorrected is discussed as a 'trend,' though corrected p-values are always reported for consistency.
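The correction formula and the three values of m can be written out directly. The sketch also includes the inverse, the uncorrected threshold that corresponds to a family-wise alpha, which the text does not use but which follows from the same formula. The constant names are illustrative.

```python
def sidak_correct(p_uncorrected: float, m: int) -> float:
    """Family-wise corrected p value: 1 - (1 - p)^m."""
    return 1.0 - (1.0 - p_uncorrected) ** m


def per_test_alpha(family_alpha: float, m: int) -> float:
    """Uncorrected threshold whose corrected value equals family_alpha."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / m)


# Correction factors described in the text.
M_KEY = 6            # group (full and matched samples) or IQ: 6 ROI parameters
M_ADOS = 12          # 2 ADOS scores x 6 ROI parameters
M_EXPLORATORY = 54   # 9 nuisance predictors x 6 ROI parameters
```

With m = 54, an uncorrected p of 0.009 becomes a corrected p of about 0.39. This is slightly less conservative than the multiplicative Bonferroni bound m·p, which would give about 0.49.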
In the matched sample, the significance of the effect of group on the mean value of each ROI parameter was also measured and corrected for multiple comparisons nonparametrically. The objective of these nonparametric tests was to select an alpha from an empirical distribution such that the probability of any parameter within an ROI being a false positive was 0.05. To this end, we randomly permuted the group labels 25,000 times. In each permutation, the significance of the difference in means between the randomly generated groups was measured by a t-test. This yielded 25,000 p values for each parameter within an ROI. The p values for that ROI were pooled and sorted, and the 0.83rd percentile p value (i.e., the 5th percentile divided by 6, the number of comparisons per ROI) was chosen. This p value represents an empirical threshold such that, for a given ROI, the chance of obtaining at least one p value below it for any parameter is 5%. We also tested whether the groups differed in the variance of any ROI parameter using a similar strategy, with two important differences. First, the p value was calculated with an Ansari-Bradley test, a nonparametric two-sample test of equal variances. Second, the "found/not found" parameter was omitted, since the mean and variance of a vector of 1s and 0s are related by a deterministic function.
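The per-ROI empirical threshold described above can be sketched with NumPy and SciPy. The sketch:

1. permutes the group labels;
2. computes one p value per ROI parameter (two-sample t-test for means, Ansari-Bradley for variances);
3. pools all p values for the ROI;
4. takes the quantile at alpha divided by the number of parameters.

The function name and argument layout are assumptions. For the variance test, the caller would pass the parameters without the found/not-found column.

```python
import numpy as np
from scipy import stats


def permutation_alpha(data, is_asd, n_perm=25000, family_alpha=0.05, test="mean", seed=0):
    """Empirical per-ROI alpha from group-label permutations.

    data: (n_subjects, n_params) array of ROI parameters for one ROI.
    test: "mean" uses two-sample t-tests; anything else uses Ansari-Bradley tests."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(is_asd, dtype=bool)
    rng = np.random.default_rng(seed)
    pooled = np.empty((n_perm, data.shape[1]))
    for i in range(n_perm):
        perm = rng.permutation(labels)           # random relabelling of the groups
        a, b = data[perm], data[~perm]
        if test == "mean":
            pooled[i] = stats.ttest_ind(a, b, axis=0).pvalue
        else:
            pooled[i] = [stats.ansari(a[:, k], b[:, k]).pvalue for k in range(data.shape[1])]
    # Pool all p values for the ROI and take the (alpha / n_params) quantile,
    # e.g. 0.05 / 6, i.e. the 0.83rd percentile.
    return float(np.quantile(pooled.ravel(), family_alpha / data.shape[1]))
```

Under the null hypothesis each permutation p value is roughly uniform, so the threshold lands near 0.05 / 6 ≈ 0.008. This is in line with the corrected threshold of p < 0.01 reported for the group-mean tests.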

Whole-brain analyses
Whole-brain analyses were conducted on the main contrast of interest (Belief > Photo). To correct for multiple comparisons, nonparametric whole-brain analysis was performed using SnPM (http://www.sph.umich.edu/ni-stat/SnPM/), which estimates the false-positive rate directly from the data. Each test used 3 mm variance smoothing and 5,000 permutations, with no global normalization, grand mean scaling, or threshold masking. The corrected p-value for filtering was 0.05, with an uncorrected minimum T-value threshold of 3 and a voxel-cluster combining theta value of 0.5. Voxel-cluster combining was performed jointly with the Fisher, Tippett and Mass voxel-cluster combining functions. Permutations were repeated for each predictor of interest. All demographic and experimental predictor variables were included in each model as nuisance regressors, using a modified SnPM plugin designed to support nuisance regressors (see Table 3).
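SnPM is a MATLAB toolbox, and its variance smoothing, cluster combining and nuisance-regressor plugin are not reproduced here. The following is only a minimal NumPy sketch of the max-statistic principle that underlies permutation-based family-wise error (FWE) correction. After each relabelling of the groups, the maximum |t| across all voxels is recorded. Each observed voxel is then compared against that distribution of maxima. The two-group design with a pooled-variance t statistic is an assumption made for illustration.

```python
import numpy as np


def two_sample_t(data, labels):
    """Voxelwise pooled-variance two-sample t statistic (group True minus group False)."""
    a, b = data[labels], data[~labels]
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(axis=0, ddof=1) + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(sp2 * (1.0 / na + 1.0 / nb))


def maxt_fwe(data, labels, n_perm=5000, seed=0):
    """FWE-corrected voxelwise p values from the permutation distribution of max |t|."""
    data = np.asarray(data, dtype=float)          # (n_subjects, n_voxels)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)
    t_obs = two_sample_t(data, labels)
    max_null = np.array([np.abs(two_sample_t(data, rng.permutation(labels))).max()
                         for _ in range(n_perm)])
    # Share of permutation maxima >= each observed |t|; the +1 counts the observed labelling.
    exceed = (max_null[None, :] >= np.abs(t_obs)[:, None]).sum(axis=1)
    return t_obs, (1 + exceed) / (n_perm + 1)
```

Because every voxel is compared against the same distribution of maxima, a voxel with corrected p < 0.05 is a false positive with probability at most 0.05 across the whole brain.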
To look for subtle group differences, we also conducted a second, more sensitive whole-brain analysis. Instead of correcting for multiple comparisons across the whole brain, we used a more lenient voxel-wise threshold (p < 0.001, uncorrected) and then validated the results using a split-half analysis. We used data from each participant's even and odd runs separately to identify clusters showing a group difference (NT > ASD, or ASD > NT) in the response to Belief > Photo stories. We identified clusters in either the even-run or the odd-run random effects analysis and extracted the response in those clusters from the other half of the data. Clusters are reported as significant if the corresponding group difference was observed in the left-out data at p < 0.05, uncorrected.
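The split-half validation step can be sketched as follows. Voxels are selected using only the discovery half. Each participant's mean response over those voxels is then computed in the held-out half. Finally, the two groups are compared with a two-sample t-test at p < 0.05.

For brevity, this sketch treats all supra-threshold voxels as a single region instead of separating them into clusters. The function and argument names are hypothetical.

```python
import numpy as np
from scipy import stats


def split_half_validate(discovery_t, held_out, is_nt, t_thresh):
    """Test a group difference found in one half of the runs on the other half.

    discovery_t: (n_vox,) group-difference t map from one half (e.g. odd runs).
    held_out:    (n_subjects, n_vox) Belief > Photo contrast values from the other half.
    is_nt:       boolean group labels (True = NT).
    Returns (t, p) for NT vs ASD on the held-out mean response, or None if no voxel passes."""
    mask = np.asarray(discovery_t) > t_thresh     # voxels chosen from the discovery half only
    if not mask.any():
        return None
    is_nt = np.asarray(is_nt, dtype=bool)
    roi_mean = np.asarray(held_out, dtype=float)[:, mask].mean(axis=1)
    res = stats.ttest_ind(roi_mean[is_nt], roi_mean[~is_nt])
    return float(res.statistic), float(res.pvalue)
```

Selecting voxels in one half and testing them in the other keeps the validation statistic independent of the selection step, which is what makes the lenient discovery threshold acceptable.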

Results
The goal of this project was to explain individual differences in the size, magnitude and/or position of brain regions involved in ToM. Before testing individual differences, however, it was critical to establish two things: (i) that there was variability in these measures, and (ii) that the differences between participants on these measures were reliable (i.e. that inter-individual differences do not simply reflect measurement noise). All ROI parameters showed reasonable variability. The standard deviation of the mean T value ranged between 0.5 and 1 across ROIs, and the standard deviation of ROI size ranged between 60 and 100 voxels. To test whether this variability reflects stable individual differences, we compared the correlation of ROI parameters from independent halves of each individual's data to an empirical permutation-based 'null' distribution of these correlations. Both mean T and ROI size were reliable within individuals, relative to variability across participants, for all ROIs except VMPFC (mean T: all r > 0.25, rank > 96%; size: all r > 0.13, rank > 90%). Center of mass was somewhat less reliable. The x position was reliable (rank > 90%) for RTPJ, LTPJ and MMPFC; the y position was reliable for RTPJ, PC, DMPFC, MMPFC, and RSTS; and the z position was reliable for RTPJ, DMPFC, VMPFC, and RSTS.
Next we used multivariate GLM analyses to estimate whether any variance in the size, position or response magnitude of ToM brain regions is explained by whether an individual has been diagnosed with ASD.
For the large sample analysis, we compared all participants with ASD (N = 31, 26 male) to all NT participants (N = 462, 197 male). No parameter of any ROI was significantly predicted by the group membership (ASD vs. NT) of the individual (all p > 0.22 for all ROIs; see Figure 1, Table 4). Furthermore, for all regions and all parameters, the odds ratio favoring the null hypothesis (no difference between the distributions) over the alternative hypothesis (a difference between NT and ASD) was greater than 1.8:1 (Bayes factor, [32]). There were two exceptions, for which the odds of the null and alternative hypotheses were approximately equal: the mean T in VMPFC (0.8:1) and the probability of finding RSTS (1.1:1). No ASD participant fell more than 3 standard deviations from the typical mean on any measure for any ROI. The confidence intervals on the coefficient estimates were narrow, indicating a high degree of confidence that if any differences exist, they are very small. For instance, if there is a difference in the mean T value of RTPJ voxels between ASD and NT participants, we are 99% confident that this difference is less than 0.3 T units in either direction.
Next we compared participants with ASD to NT participants in the 'Matched' group. Again, we found no significant difference between participants with ASD and the matched controls on any ROI parameter (all p > 0.24; Table 4). For these comparisons, the odds ratio favored the null hypothesis over the alternative hypothesis (i.e. ratio > 1.1:1) for all regions and all parameters, with one exception: the probability of finding activity in the PC (0.85:1). We also confirmed these results using nonparametric tests of group differences, which do not assume that the measured variables are normally distributed and which use permutation null distributions to establish a corrected alpha. No ROI parameter showed an effect of group (no p value fell below 0.07, against a corrected threshold of p < 0.01). We conducted a similar analysis to test whether the ASD group showed a more heterogeneous response (i.e. some participants showing hypo-activation while others showed hyper-activation). There was no evidence of increased variance in the ASD group for any parameter of any ROI (no p value fell below 0.01, against a corrected threshold of p < 0.002).
In the 'ASD only' group, we next considered the effect of ADOS scores (i.e. social and communicative symptom severity). As an exploratory analysis, we looked for effects of other demographic and experimental parameters. We found that gender and age did not affect any ROI parameter, even at the level of a trend, nor did the modality of the stimuli.
In the full sample, the variable with the greatest effect was the choice of coil. […] Match-to-Sample task, so these effects may be related to stimulus length, task, or the specific stimuli used in this experiment. The number of stimuli per condition did not significantly predict any ROI parameter. At the level of a trend, the VMPFC tended to have fewer voxels as the number of stimuli increased (t(260) = -2.62, p = 0.4, b = -4.13 ± 4.10).

Table 4. For each ROI/parameter pair, the effect of group membership was measured via a GLM. The estimated beta value, the t- and p-values, the 99% confidence intervals, and the number of NT and ASD individuals included in the regression are reported. Also reported is the Bayes factor, which gives the odds ratio of the null hypothesis (group membership has no effect) relative to the alternative hypothesis. doi:10.1371/journal.pone.0075468.t004
In sum, ROI analyses suggest that while individuals differ reliably in the size and response magnitude (and to a lesser extent, position) of brain regions associated with ToM, these neural parameters were not affected by whether an individual was diagnosed with ASD. Within the range of ADOS scores in the current sample, autism severity did not explain variance in these ROI parameters, either. Only experimental parameters, such as the MRI coil used, and demographic variables, such as IQ, explained some of the variance across participants. However, ROI analyses inevitably provide a limited window on the brain, so to look further for differences between groups in ToM brain regions, we conducted whole brain analyses.

Whole brain analysis results
In the whole-brain analyses, the main effect identifies brain regions significantly recruited during Belief compared to Photo stories, controlling for variance explained by any of the nuisance regressors. This analysis identified robust activation in all of the regions previously associated with Theory of Mind, including bilateral TPJ, medial precuneus and posterior cingulate, MPFC, and STS (see Figure 2, Table 5). It also identified activation in other regions, including (bilaterally) the hippocampus, the parahippocampal gyrus, the temporal poles, the amygdala, and the dorsolateral prefrontal cortex.
Next, we compared activation in individuals with ASD vs. NT adults, in both the Matched and Full samples. There were no significant differences in activation when correcting for multiple comparisons. We then repeated the whole-brain analysis in the Full sample using a lenient threshold (p < 0.001 voxel-wise, uncorrected) in half of the data and validated the results in the remaining half (p < 0.05). The interaction contrast (NT > ASD) × (Belief > Photo) identified two clusters:

- one in left anterior IPS (14 voxels, peak at [-32 mm, -40 mm, 40 mm]);
- one in left posterior IPS (38 voxels, peak at [-34 mm, -38 mm, 42 mm]).

The anterior IPS cluster was identified in both odd and even halves of the data (independent validation in even half: t(462,31) = 3.15, p = 0.002). The posterior IPS cluster was found only in the odd half but was validated in the even half (t(462,31) = 2.16, p = 0.03; see Figure 3). In both regions, both groups showed higher responses to the Photo than to the Belief stories, but ASD participants showed greater activation during the Photo stories than NT participants. No regions were reliably recruited more in ASD than in NT individuals for Belief > Photo.
A variety of other experimental covariates yielded clusters of activation, although we treat these as exploratory. These are listed in Table 6.

Discussion
The main question we sought to address in this paper was whether adults diagnosed with ASD show differences in the magnitude or location of activations in ToM-associated brain regions, compared to a large sample of NT participants. To answer this question, we aggregated data across multiple experiments to produce a large sample of NT individuals (N = 462) and a moderately large sample of high-functioning individuals with ASD (N = 31). We tested whether the magnitude of neural responses to stories about people's beliefs, versus stories about physical representations like photographs, differed between groups, either in targeted regions of interest or in whole-brain analyses. These analyses identified no reliable differences between groups in the previously identified ToM brain regions. These results suggest that differences in activation between these groups of participants during explicit Theory of Mind tasks, if they exist, are very small and could not be used to diagnose ASD.

Effects of ASD on ToM activations
We used two complementary analysis strategies. ROI analyses focused on previously identified ToM brain regions are more sensitive, whereas whole-brain analyses look for differences between groups anywhere in the brain and are therefore less restricted. For both kinds of analyses, we conducted two comparisons. First, we compared the ASD group to the whole group of NT individuals, using simultaneous nuisance regressors to control for variance associated with demographic and experimental differences among participants. Second, we compared the ASD group to a smaller sample of NT individuals, one-to-one matched to the ASD group on age, gender, IQ and experimental parameters. For both comparisons, we found no reliable differences between groups in the size, response magnitude, or probability of identifying above-threshold voxels, in any ToM ROI (see Figure 1).
In addition to the absence of mean differences between the groups, we found no evidence that even a subset of individuals with ASD differed significantly from the typical population. The ROI parameters of individuals with ASD fell squarely within the distribution of typical values, rarely straying more than 2 SD from the typical means and never more than 3 SD. We also tested the hypothesis that the ASD group was more heterogeneous than the NT group. For example, similar mean activation could mask differences between the groups if the ASD group had a bimodal distribution, with some individuals showing hypo-activation while others show hyper-activation. We found no evidence for this hypothesis, as the variance of the ROI parameters did not differ significantly across groups in the matched sample.
In the whole-brain analyses, permutation-based correction revealed no significant differences between ASD and NT individuals, in either the full sample or the matched sample. Because our results overall suggest a null result, namely no difference between groups, we also examined the same analyses at a more lenient threshold in half of the data (in case true group differences fell just below the threshold for significance), and then validated them in the left-out half. We found two regions of parietal cortex with reliable effects; however, the group differences in these regions were in an unpredicted direction. In both regions, both groups showed more activation during Photo than Belief stories, but the ASD group showed more activation than the NT group during Photo stories. Furthermore, these regions were not near any of the regions implicated in ToM by the overall contrast of Belief > Photo. While intriguing, differences in these regions therefore do not seem likely to explain the ToM impairments typically observed in ASD. We could not identify any region that both (a) was reliably recruited for Belief more than Photo stories in 462 NT individuals, and (b) showed significantly less, or more, activation in the same contrast in 31 individuals with ASD.

Effects of other experimental parameters on ToM activations
Using a similar analysis strategy, we also found that gender does not affect activity in ToM brain regions; nor does the modality of the stimuli (visual vs. aural) or the experimental task. The absence of an effect of gender is particularly noteworthy, because the full sample contained a large number of male and female participants. Behavioral measures of ToM often reveal an advantage for female individuals [33,34]; apparently this advantage is not due to measurable differences in ToM-associated brain region activity as elicited by the false belief task.
One factor that did have a significant effect on ROI parameters was the coil used. The 32-channel coil has a documented higher signal-to-noise ratio (SNR) [35]; we found that this difference translated into larger ROIs that were more likely to be detected in individual participants. Thus, our results suggest that for analyses based on individually defined ROIs, the increased SNR of the 32-channel coil provides a clear benefit.

Interpreting the current results
With regard to our key null results, the current study has advantages and disadvantages. On the one hand, the large sample size provides more power and sensitivity to detect effects where they exist. In particular, although our ASD sample was only moderately large, the very large NT sample gives us very high confidence in the true mean of the ROI parameters in NT individuals. Finding that the ASD sample mean does not differ from the NT mean is thus strong evidence that the two groups' data do not reflect different population distributions.
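The claim about precision can be made concrete with the standard error of a mean. As a rough worked example, assume a single ROI parameter with the same standard deviation σ in both groups, and use a normal approximation for an 80%-power, two-sided α = .05 test; these are simplifying assumptions, not properties reported for these data.

```latex
\begin{aligned}
\mathrm{SE}(\bar{x}_{\mathrm{NT}}) &= \frac{\sigma}{\sqrt{462}} \approx 0.047\,\sigma \\
\mathrm{SE}(\bar{x}_{\mathrm{ASD}} - \bar{x}_{\mathrm{NT}}) &= \sigma\sqrt{\tfrac{1}{31} + \tfrac{1}{462}} \approx 0.186\,\sigma \\
d_{\min} &\approx (z_{0.975} + z_{0.80})\sqrt{\tfrac{1}{31} + \tfrac{1}{462}} \approx (1.96 + 0.84)(0.186) \approx 0.52
\end{aligned}
```

For comparison, two groups of 31 would give σ√(2/31) ≈ 0.254σ, and even an unlimited NT sample cannot push the standard error of the difference below σ/√31 ≈ 0.180σ. Under these assumptions the large NT sample pins down the NT mean very precisely, while the size of the ASD group sets the floor on sensitivity to a mean difference.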
However, these results cannot be interpreted as ruling out any differences in the neural mechanisms for ToM in individuals with ASD. One qualification of the current results is that the parameters measured here (i.e., response magnitude to Belief vs. Photo stories) provide only a limited measure of a region's function. Other measures include each region's functional connectivity and the within-region spatial pattern of responses [36,37]. Individuals with ASD may differ on these other measures of ToM region function [14,38]. Indeed, using multi-voxel pattern analysis, we found reliable differences between a subset of these same ASD and NT individuals in the pattern of activity in ToM regions [21].
A second qualification is that these results apply to a specific functional task: an explicit, verbal false belief task. It may be that ToM deficits in individuals with ASD disproportionately affect implicit or spontaneous consideration of others' mental states, but not performance on explicit tasks [39]. fMRI studies using tasks that elicit spontaneous or implicit social processing may be more likely to find hypo-activation [23,40,41,42], whereas those using tasks that demand explicit social judgments find normal activation or hyper-activation [19]. For example, spontaneous processing of irony may produce hypo-activation in ASD [43], whereas explicit instructions eliminate the hypo-activation and may even cause hyper-activation [19].
Finally, a third qualification is that the ASD participants in the current sample are very high-functioning. Although they meet clinical diagnostic criteria for ASD (and showed behavioral deficits on ToM tasks in a previous study [22]), these individuals are highly verbal and pass first-order false belief tasks. Thus, our results do not rule out gross differences in the ToM regions of lower-functioning individuals with ASD. On the other hand, the individuals in our sample are diagnosed with ASD because of disproportionate difficulties with social interaction and communication. We also found no evidence that, within our participants, increasing ASD severity had any effect on the measured ROI parameters. So the current results imply that social cognitive impairments can occur without measurable changes in the magnitude or position of ToM brain regions. Collectively, the current results provide strong evidence that the neural differences between high-functioning adults with ASD and NT participants are not due to gross changes in ToM brain regions.
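This section does not specify how the severity analysis was carried out. One simple way to ask whether severity relates to an ROI parameter is a Pearson correlation with a permutation p-value, sketched below. The sketch assumes one severity score and one ROI value per participant; the scores, function names, and data are hypothetical and synthetic.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_permutation_p(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for r: shuffling y breaks any x-y pairing."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic example only: 31 arbitrary integer "severity" scores and ROI values
# drawn independently, so no true relationship is built in.
rng = random.Random(11)
severity = [rng.randint(7, 22) for _ in range(31)]
roi_magnitude = [rng.gauss(1.0, 0.5) for _ in range(31)]
r = pearson_r(severity, roi_magnitude)
p = correlation_permutation_p(severity, roi_magnitude)
print(f"r = {r:.2f}, permutation p = {p:.3f}")
```

A permutation p-value avoids assuming normally distributed severity scores, which is convenient when clinical scales are coarse integers.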
A common hypothesis is that the lack of performance differences between NT and high-functioning ASD individuals reflects compensatory processes that develop in the ASD individuals. Our findings provide evidence against this hypothesis. Compensation predicts that successful performance on explicit ToM tasks would be supported by activity in regions other than (or in addition to) ToM regions. For example, one might predict that individuals with ASD pass false belief tasks by recruiting the mechanisms that NT individuals use to solve the logically similar 'False Photograph' tasks, such as the fronto-parietal network [44,45]. Contrary to these predictions, we found no sign of increased compensatory activation during Belief stories in ASD compared to NT individuals, in any region.
These results leave open a number of key questions. First, it will be important to identify the neural differences between adults with ASD and NT individuals that do account for behavioral differences in ToM. One possibility is that individuals with ASD are highly heterogeneous, so that different neural sources explain the behavioral delays in different individuals. As noted above, though, we see no evidence for this possibility in the current data. Another possibility, also discussed above, is that the difficulties in ToM processing are related to the online use of these regions in real-world social interactions. Second, it will be important to determine which social contexts lead to atypical as well as typical recruitment of these brain regions in ASD. Perhaps ToM brain regions can be recruited during explicit tasks, but atypical interaction with other brain regions and networks results in hypo-activation during implicit tasks. Third, the current study focused on adults. Future research should test whether the developmental trajectory of ToM brain regions differs in children with ASD compared to NT children, even if the mature states of the system are reasonably similar. Finally, it would be useful to extend these analyses to lower-functioning individuals with ASD.
Nevertheless, the implications of this study are that (i) social cognitive impairments can occur without large differences in the activation of ToM brain regions; and (ii) hypo-activation during explicit Theory of Mind tasks will not be useful for diagnosing ASD.