Empirical Failures of the Claim That Autistic People Lack a Theory of Mind

The claim that autistic people lack a theory of mind, that is, that they fail to understand that other people have a mind or that they themselves have a mind, pervades psychology. This article (a) reviews empirical evidence that fails to support the claim that autistic people are uniquely impaired, much less that all autistic people are universally impaired, on theory-of-mind tasks; (b) highlights original findings that have failed to replicate; (c) documents multiple instances in which the various theory-of-mind tasks fail to relate to each other and fail to account for autistic traits, social interaction, and empathy; (d) summarizes a large body of data, collected by researchers working outside the theory-of-mind rubric, that fails to support assertions made by researchers working inside the theory-of-mind rubric; and (e) concludes that the claim that autistic people lack a theory of mind is empirically questionable and societally harmful.


Failures of Specificity
For nearly two decades, Simon Baron-Cohen and his colleagues claimed that poor performance on theory-of-mind tasks uniquely characterized autistic people (see Table 1). The initial claim was staked on autistic children's performance on a theory-of-mind task called False Belief. In a False Belief task, a child might be introduced to two puppets, one named Sally and the other Anne. The child watches as the Sally puppet places a possession, such as a marble, inside a basket. Then, the Sally puppet is taken away, and the Anne puppet moves the marble from its previous location to another location, such as inside a box. When the Sally puppet is re-presented, the child is asked orally, "Where will Sally look for her marble?" If the child answers with the location where the marble actually is, rather than the location where the first puppet placed the marble, the child is considered to have failed the False Belief task and to lack a theory of mind.
Other tasks have been used to assess theory of mind; some of the more popular ones appear in Table 2. But it was autistic children's performance on False Belief tasks that propelled Baron-Cohen and his colleagues' claim that autistic people uniquely lack a theory of mind.

Failures of Universality
A lack of a theory of mind is often assumed to be not only a unique characteristic of autistic people, but also a universal characteristic of all autistic people. Repeatedly, Baron-Cohen has claimed that "mindblindness … is universal in applying to all individuals on the autistic spectrum" (Baron-Cohen, 2008a, p. 61; Baron-Cohen, 2008b, p. 113; Baron-Cohen, 2009, p. 70; Baron-Cohen, 2010, p. 169; Baron-Cohen, 2011a, p. 40; Baron-Cohen, 2011b, p. 629; see also Table 3). This assumed universality has been widely promoted across psychology, as the opening quote of our article illustrates. However, as other authors note, many autistic children and adults pass theory-of-mind tasks; therefore, these other authors rightly argue that "mindblindness" cannot be a universal characteristic of autism (e.g., Bailey, Phillips, & Rutter, 1996; Bauminger & Kasari, 1999; Beversdorf et al., 1998; Boucher, 2012; Buitelaar, van der Wees, Swaab-Barneveld, & van der Gaag, 1999b; Charman, 2000).
Why do some autistic participants pass theory-of-mind tasks while others do not? Numerous researchers have aptly noted that theory-of-mind tasks rely heavily on spoken language (see Frymiare, 2005, and Pripas-Kapit, 2012, for reviews). For example, nearly half the variance in participants' performance on False Belief tasks can be predicted by their spoken language comprehension (Capage & Watson, 2001); nearly three fourths can be predicted by their facility with vocabulary (Steele, Joseph, & Tager-Flusberg, 2003) and appreciation of grammar (Peterson, Wellman, & Slaughter, 2012). In longitudinal studies, vocabulary predicts False Belief performance more powerfully than age (Steele et al., 2003); in studies comparing autistic to nonautistic participants, vocabulary predicts False Belief performance more powerfully than whether the participants are autistic (Loukusa et al., 2014; Norbury, 2005; see also Milligan, Astington, & Dack's, 2007, meta-analysis with over 100 studies of typically developing children; Yirmiya, Erel, Shaked, and Solomonica-Levi's [1998] meta-analysis with 40 studies of autistic children; and Gernsbacher, 2018a, for studies published after these meta-analyses).

Failures of Replication
Reproducibility is the cornerstone of science, as psychology's current focus on replication illustrates (Gernsbacher, 2018b, 2018c, 2018d; Spellman, 2015; Tackett et al., 2017). However, when tests of reproducibility are applied to claims about autism and theory of mind, the seminal findings frequently fail.
For example, cognizant of the heavy reliance on language by most theory-of-mind tasks, Baron-Cohen, Leslie, and Frith (1986) designed a nonverbal task. Children were given a scrambled set of four pictures and told to arrange the pictures in a coherent order. One set of pictures displayed a boy standing at the top of a hill with a basketball-sized rock next to his foot; another picture displayed the boy with his foot close to the rock, as though ready to kick it; another picture displayed the rock halfway down the hill; and another picture displayed the rock at the bottom of the hill. Baron-Cohen et al. (1986) deemed this type of picture sequence "mechanical," and autistic children were almost perfect in sequencing such pictures. Oddly, typically developing children performed below 50% correct on these "mechanical" pictures, which most likely was unexpected because Baron-Cohen et al. (1986, p. 116) deemed these "mechanical" pictures "the simplest." Another set of pictures displayed a boy sitting on the ground holding an ice cream cone to his mouth with a girl standing nearby; in another picture, the ice-cream-holding boy is looking at the girl who, in this picture, is also sitting on the ground; in another, the girl is reaching for the boy's ice cream cone while he stretches his arm as far as possible away from the girl's reach; in the final picture, the girl holds the ice cream cone to her mouth, while the boy rubs his eyes. Autistic and typically developing children were equally adept at arranging this type of picture sequence, which Baron-Cohen et al. (1986, p. 115) deemed "behavioral" and, quite curiously, not an assay of the characters' intentions or requiring an understanding of "mental states."
An example of the last type of picture sequence displayed a girl holding a teddy bear in her arms, while a flower extends from the ground beside her; in another picture, the girl is turned completely to one side and is holding the flower's stem, while the teddy bear is on the ground behind her; in another, the girl is holding the flower to her nose, while a boy, standing behind the girl, reaches for the teddy bear on the ground; in the final picture, the girl is turned around, there's no boy or teddy bear, and the girl's mouth is wide open. Baron-Cohen et al. (1986, p. 116, 224) deemed this picture sequence "intentional," and the typically developing children, who performed so shockingly poorly on the "simplest" mechanical pictures, performed nearly perfectly on these pictures, whereas the autistic children performed poorly. Baron-Cohen et al. (1986, p. 113) used these data to claim that "a specific cognitive deficit … prevents the development of a 'theory of mind' in the autistic child."
Four research teams, of whom we are aware, have published attempts to directly replicate these results, and none could do so. Using the same stimuli, procedures, and analyses, no other research team has replicated the finding that autistic participants perform significantly worse than typically developing participants on the "intentional" picture sequences ("there were no group differences on the intentional subtest of the picture sequencing measure," , p. 1093; "contrary to … previous findings, [the intentional condition of the Picture Sequence Task] … failed to reveal significant differences," Oswald & Ollendick, 1989, p. 122; "no two groups were significantly different [on the Intentional picture sequence]," Buitelaar, van der Wees, Swaab-Barneveld, & van der Gaag, 1999a, p. 46; "The [autistic] participants were close to ceiling … on the intentional Picture Sequencing items," Brent, Rios, Happé, & Charman, 2004, p. 286).
Likewise, Baron-Cohen's (1989b) report that autistic participants are prone to fail second-order False Belief tasks (see Table 2) is also prone to fail replication (e.g., "No group differences were found in performance on the control or test questions," Tager-Flusberg & Sullivan, 1994, p. 577; "was no difference between normal and autistic children's performance," Leekam & Prior, 1994, p. 907; "no significant association between group membership and proportion of items passed," Bowler, 1992, p. 885; "our findings are inconsistent with early studies of False Belief abilities in autism," Bauminger & Kasari, 1999, p. 85; "The present findings contradict the claims of proponents of … the theory of mind … hypothesis of autism," Buitelaar et al., 1999a, p. 53).
Despite these seminal studies' precariously small sample sizes and their lack of replication, their grander claims continue to rebound through textbooks and scholarly literature, within and outside of psychology, and they ricochet through public vernacular. The robustness of these claims, if not the robustness of their supporting evidence, could well have deterred other researchers from publishing conflicting results (Franco, Malhotra, & Simonovits, 2014).

Failures of Convergent Validity
Several tasks have been proposed to assess theory of mind, as Table 2 illustrates. However, in more recent studies, many with quite large samples of autistic and nonautistic participants, these tasks fail to converge. These repeated failures of convergence seriously question the tasks' validity. For example, the Strange Stories task fails to correlate significantly with the Reading-the-Mind-in-the-Eyes task (e.g., Adler, Nadler, Eviatar, & Shamay-Tsoory, 2010; Brent et al., 2004; Dziobek et al., 2006; Farrant et al., 2005; Kaland, Callesen, Møller-Nielsen, Mortensen, & Smith, 2008; Kristen, Rossmann, & Sodian, 2014; Roeyers et al., 2001).⁴ In fact, the average correlation between performance on the Strange Stories task and the Reading-the-Mind-in-the-Eyes task, weighted across 27 systematically reviewed samples (Gernsbacher, 2018a), is only 0.089, with a CI that overlaps zero (i.e., 99.9% CI [−.001, .178]).⁵ Similarly, the Strange Stories task fails to correlate significantly with the Animated Triangles task (N = 100 autistic children, Lukito et al., 2017; N = 90 autistic adolescents, Hollocks et al., 2014; N = 89 autistic and 89 nonautistic adults, Wilson et al., 2014; N = 80 nonautistic adults, Brewer, Young, & Barnett, 2017; see also Clemmensen et al., 2016). The Strange Stories task also fails to correlate significantly with the Faux Pas task (N = 123 nonautistic adults, Ahmed & Miller, 2011; N = 61 autistic and 32 nonautistic participants, Spek et al., 2010), particularly when language comprehension is controlled.
Even False Belief tasks can fail to correlate significantly with each other (e.g., Charman & Campbell, 1997; Duval et al., 2011; Hughes, 1998). The lack of convergent validity among theory-of-mind tasks undermines the core construct validity of theory of mind.
⁴ Only sample sizes greater than 50 will be specified here; all other sample sizes are specified in Gernsbacher (2018a).
⁵ Baron-Cohen et al. (1997) agreed that the correlation between the Strange Stories task and the Reading-the-Mind-in-the-Eyes task "warrants direct testing" and promised that their article would provide that test ("to validate the Eyes Task as a theory of mind task, subjects in the two clinical groups were also tested on Happé's [1994a] Strange Stories. In the case of the subjects with autism and Asperger Syndrome, this was part of a separate study [Jolliffe, 1997]"; pp. 815-816). Unfortunately, for neither the autistic nor the nonautistic participants is the correlation between Reading-the-Mind-in-the-Eyes and Strange Stories reported, in either Baron-Cohen et al.'s (1997) original article or Jolliffe's (1997) "separate study." Similarly, Baron-Cohen and colleagues (Vellante et al., 2013) claimed that "studies have found the [Reading-the-Mind-in-the-]Eyes test to be highly correlated with the Strange Stories test (Baron-Cohen, Wheelwright, Hill, et al., 2001)" (p. 329). Unfortunately, the article cited by Baron-Cohen and colleagues to support this claim (viz., Baron-Cohen et al., 2001) does not include the Strange Stories task (and Jolliffe & Baron-Cohen's, 1999, article, which does include the Strange Stories task, does not include the Reading-the-Mind-in-the-Eyes task).
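As an illustration of how a weighted average correlation and its confidence interval, such as the one reported above for the Strange Stories and Reading-the-Mind-in-the-Eyes tasks, can be computed, the following is a minimal sketch in Python. It assumes a fixed-effect Fisher z approach with weights of n − 3; the source does not state which weighting scheme was used, so this is one common option and not necessarily the method of Gernsbacher (2018a). The function name and the (r, n) pairs are hypothetical and are not data from any cited study.

```python
import math
from statistics import NormalDist


def weighted_mean_correlation(samples, confidence=0.999):
    """Pool correlations with a fixed-effect Fisher z approach (an assumption).

    samples: list of (r, n) pairs, each with n > 3.
    Returns (pooled_r, ci_low, ci_high) on the correlation scale.
    """
    # Inverse-variance weights for Fisher z are n - 3.
    weights = [n - 3 for _, n in samples]
    z_values = [math.atanh(r) for r, _ in samples]
    total_weight = sum(weights)
    mean_z = sum(w * z for w, z in zip(weights, z_values)) / total_weight
    standard_error = 1.0 / math.sqrt(total_weight)
    # Two-sided critical value for the requested confidence level.
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    low = math.tanh(mean_z - critical * standard_error)
    high = math.tanh(mean_z + critical * standard_error)
    return math.tanh(mean_z), low, high


# Hypothetical (r, n) pairs for illustration only.
hypothetical_samples = [(0.15, 40), (-0.05, 60), (0.10, 120), (0.02, 80)]
pooled_r, ci_low, ci_high = weighted_mean_correlation(hypothetical_samples)
print(f"pooled r = {pooled_r:.3f}, 99.9% CI [{ci_low:.3f}, {ci_high:.3f}]")
print("CI overlaps zero:", ci_low < 0 < ci_high)
```

When the interval contains zero, as it does for these hypothetical values, the pooled correlation cannot be distinguished from no association at the chosen confidence level, which is the sense in which the tasks are said to fail to converge.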

Failures of Predictive Validity
If theory-of-mind tasks assay "the basic machinery for social engagement" (Baron-Cohen, 2009, p. 73), then performance on theory-of-mind tasks should predict socioemotional function. But numerous studies document failures of prediction. For example, performance on theory-of-mind tasks fails to significantly predict socioemotional function (with smaller samples of autistic and nonautistic children, adolescents, and adults in Begeer, Malle, Nieuwland, & Keysar, 2010; Campbell et al., 2011; Lalonde & Chandler, 1995; Travis, Sigman, & Ruskin, 2001).
Indeed, when Baron-Cohen and his colleagues applied machine learning to categorize a large sample (N = 395) of autistic adults into those who perform better versus worse on a theory-of-mind task, the researchers were unable to identify any variable that patterned with theory-of-mind performance "including sex/gender, age, depression or anxiety symptoms, autistic traits, trait empathy, and autism symptom severity" (Lombardo et al., 2015, p. 2). The only characteristic that reliably patterned with theory-of-mind performance was language dexterity.
Finally, if theory-of-mind tasks truly assay the ability to infer other people's "intentions, goals and desires" (Baron-Cohen et al., 1995, p. 381), and if autistic people lack a theory of mind, then autistic people should fare poorly at inferring other people's intentions, goals, and desires. But, as Table 4 illustrates, autistic people of all ages skillfully understand other persons' intentions, goals, and desires. This large body of data, collected by researchers working outside the theory-of-mind rubric, demonstrates another failure of the claim that autistic people lack a theory of mind.
The claim that autistic people lack a theory of mind is so entrenched that when existing measures fail to support the claim, researchers create new measures. For example, Baron-Cohen and his colleagues motivated the need for a new theory-of-mind task by claiming that autistic adults must "have a selective theory of mind … deficit," even though existing theory-of-mind tests "are not subtle enough to detect [that] deficit" (Rutherford, Baron-Cohen, & Wheelwright, 2002, p. 189). Rajendran and Mitchell (2007) suggest, as do we, that "the development of advanced tests [is] a post hoc response in finding data anomalous to the theory of mind hypothesis" (p. 229; i.e., data that do not support the claim that autistic people lack a theory of mind).
The development of more and more theory-of-mind tests resembles a methodological arms race. The deployment of first-order False Belief tasks escalates to second-order False Belief tasks, which escalate to the so-called advanced theory-of-mind tasks (Strange Stories, Reading-the-Mind-in-the-Eyes, and Animated Triangles) and then to the Strange Stories Film task (Murray et al., 2017), the Comic Strip task (Sivaratnam, Cornish, Gray, Howlin, & Rinehart, 2012), and the Beauty Contest task (Pantelis & Kennedy, 2017), all in pursuit of finding a task to support the claim that autistic people lack a theory of mind, when previous tasks fail to support the claim.
Most recently, "implicit" theory-of-mind tasks have been developed (Schneider et al., 2013; Schuwerk et al., 2015; Senju et al., 2009; but see Schuwerk, Priewasser, Sodian, & Perner, 2018, and Kulke, von Duhn, Schneider, & Rakoczy, 2018, for difficulties replicating measures of implicit theory of mind). As Rajendran and Mitchell (2007) note, researchers and their deployment of increasingly "advanced tests have turned … logic on its head." The drive to create more and more theory-of-mind tasks "seem to be premised on the assumption" that autistic people lack a theory of mind; therefore, "tests which do not reveal this must be insensitive or unsuitable" (p. 229).
There has even been a move toward asking nonautistic parents to gauge their autistic offspring's theory of mind (Hutchins, Prelock, & Bonazinga, 2012), which is problematic for at least two reasons. First, as autistic scholars have explained (e.g., Sinclair, 1993) and as empirical data demonstrate (e.g., Gernsbacher, Stevenson, & Dern, 2017), nonautistic people are often as disadvantaged when trying to understand autistic people as vice versa. Milton (2012) refers to this dilemma as the "double empathy problem" (see also Gernsbacher, 2006), which Loftis (2015, p. 10) illustrates with the following conundrum: "If autistics truly have a deficit in [theory of mind], then why is it that neurotypicals find it so difficult to intuit the intentions of autistic people"?
Second, most everyone misjudges their own theory-of-mind performance (Ames & Kammrath, 2004; Realo et al., 2003; Zaki, Bolger, & Ochsner, 2008). For example, an improbable eight out of 10 U.S. college students rate their own theory-of-mind ability as better than average (in contrast, a more probable half rate, more logically, as average their public speaking ability, social self-confidence, computer skills, physical health, emotional health, creativity, and propensity for risk taking; Higher Education Research Institute, 2017). Thus, it is unlikely that nonautistic parents can accurately assess their own, let alone their autistic offspring's, theory-of-mind abilities. As even the creators of a child's version of the Reading-the-Mind-in-the-Eyes task admit, "it is unknown what the child [in the stimulus photographs] was actually feeling" because the stimulus photographs "were all derived from naturalistic settings (e.g., taken by parents) rather than being posed specifically for an experiment" (Pino et al., 2017, p. 2746). Some researchers willingly admit that we do not know what theory of mind is (Schaafsma, Pfaff, Spunt, & Adolphs, 2015), much less how to measure it. Despite this uncertainty, other researchers claim with certainty that "autism is a clear illustration of what human life would be like if one lacked a theory of mind" (Baron-Cohen, 2000a, p. 266).
For example, philosopher David Livingstone Smith (2007, p. 172) claims that autistic people "live in a world in which nothing has a mind" and "perceive [other] people as hunks of flesh moving mindlessly through space." Developmental psychologist Alison Gopnik ventures even further, graphically describing how she envisions autistic people perceive other people:
Around me bags of skin are draped over chairs, and stuffed into pieces of cloth, they shift and protrude in unexpected ways. … Two dark spots near the top of them swivel restlessly back and forth. A hole beneath the spots fills with food and from it comes a stream of noises. Imagine that the noisy skin-bags suddenly moved toward you, and their noises grew loud, and you had no idea why, no way of explaining them or predicting what they would do next. (Gopnik as quoted in Baron-Cohen, 1995, pp. 4-5; Gerrans, 2002, pp. 312-313; and Smith, 2007, p. 172)
Along with the stigma promulgated by such renditions, the claim that autistic people lack a theory of mind causes societal harm (Dinishak & Akhtar, 2013). Because a lack of theory of mind is believed to impair autistic people's understanding of their selves, in addition to their understanding of others, the claim disputes autistic people's autonomy, devalues their self-determination, and discredits their credibility (Yergeau, 2018). Consequently, numerous autistic authors have decried the claim, reporting that it "perpetuates stereotypes and oversimplifications [with] the potential for tremendous harm" (Cohen-Rottenberg, 2011); that it has already "harmed … countless autistic individuals" (VisualVox, 2017); and that "its continued perpetuation will continue to be damaging to autistic people" (Nicholson, 2013). We, therefore, call for considerably greater caution before endorsing the claim that autistic people lack a theory of mind.

Examples of Popular Theory-of-Mind Tasks

Task | Example
False Belief task (first-order) | Participant is shown a container with which they'd be familiar, for example, a closed bag of M&M candies. Participant is asked to predict what's inside. The bag is opened, and the participant is shown that their belief about the contents was false: The bag doesn't contain M&M candies; instead, it contains erasers. Participant is asked "What did you think would be inside the bag before I opened it?" If participant answers with the name of the bag's actual content (e.g., erasers) rather than the name of the bag's expected content (e.g., candy), the participant fails the False Belief task.
False Belief task (second-order) | Similar to a first-order False Belief task (as illustrated above), except that the participant is asked, "What do you think another person would think would be inside the box before I opened it?"
Strange Stories task | Participant listens to a spoken story that contains a spoken deception (e.g., a lie, white lie, pretense, or double-bluff), a figure of speech (e.g., a metaphor or irony), a misunderstanding, persuasion, or the like. Participant is required to orally explain why the person said what they said and what they were thinking when they said it.
Faux Pas task | Participant listens to a spoken story that contains a social interaction, such as a person showing newly bought curtains to a friend, who says they don't like the curtains. Participant is required to identify whether "someone said something that they shouldn't have" and, if so, to orally explain why the person said something that they shouldn't have, what they should have said instead, and what the person and their friend must have been thinking when the person said what they said.
Animated Triangles task | Participant views a series of animations with geometric triangles. After each animation, the participant is asked to orally explain "What happened in the animation?" Unknown to the participant, their oral answers are scored according to how likely they are to interpret the animated triangles as humans interacting and the number of emotional terms they provide in their oral explanation (e.g., if they say that one triangle was bullying another triangle).
Reading-the-Mind-in-the-Eyes task | Participant views only the eye region of numerous black and white photographs and for each photograph is required to select one emotional expression from a set of four emotion terms (e.g., terrified, upset, annoyed, or arrogant).