Theory of mind and executive functioning: A brief introduction

Two important capacities that show substantial development during the preschool years are theory of mind (ToM) and executive functioning (EF; Garon, Bryson, & Smith, 2008; Wellman, Cross, & Watson, 2001). ToM is the social–cognitive ability to understand human actions in terms of the psychological states that motivate behavior, such as beliefs, emotions, desires, and intentions. EF refers to the cognitive processes that facilitate goal-directed action and problem solving, such as working memory, cognitive flexibility, inhibitory control, and self-monitoring (Anderson, 2002; Carlson, 2005). EF skills are important for the conscious, effortful control of thoughts and behavior (Oh & Lewis, 2008). Impairments in ToM and EF are associated with a range of neurodevelopmental and psychiatric conditions across the lifespan (Geurts, Verté, Oosterlaan, Roeyers, & Sergeant, 2004; Moritz et al., 2002; Pilowsky, Yirmiya, Arbelle, & Mozes, 2000; Schenkel, Marlow-O’Connor, Moss, Sweeney, & Pavuluri, 2008). Thus, uncovering the neurological basis of these cognitive abilities has implications for understanding both typical and atypical development, including how particular brain structures or functions may forecast the emergence of neurodevelopmental problems characterized by ToM and EF deficits (see Emerson et al., 2017, for a recent empirical example).

The purpose of this review is to apply a cognitive neuroscience lens to the study of ToM and EF co-development by outlining the shared and distinct neural correlates of each, with the intention of better accounting for their behavioral overlap. Our primary interest is childhood, since this is the epoch during which advances in these skills are most dramatic and that portends significant growth in these and other functional outcomes later in life (Apperly, Warren, Andrews, Grant, & Todd, 2011; Casey et al., 2011; Eigsti et al., 2006). Notwithstanding the suggestion that mental state inference may be achievable to some degree by infants (Baillargeon, Scott, & He, 2010), our focus is on ToM and EF beyond this early period, given the paucity of literature examining infant brain activity during tasks assessing these competencies (see Grossmann, 2015, for a review). In reference to ToM, we narrowly concentrate on traditional measures that probe the understanding of desire, emotion, and belief (i.e., false belief), as opposed to broader conceptualizations that entail abilities such as deception, lying, humor, sarcasm, and so forth. With respect to EF, we concentrate on “top-down” cognitive operations that can be deployed in the service of an internal goal, similar to notions of “cognitive control” or “effortful control” (see Nigg, 2017, for a review); thus, we tend not to include processes such as planning, organization, and strategy formation. Although these time-bound and definitional parameters confine our review and its conclusions, our aim is to provide a starting point from which more elaborate proposals may advance.

The behavioral dilemma: Predictive relations between ToM and EF

Developmental psychologists have long been interested in the association between ToM and EF, in part because they appear to develop in concert with one another. ToM and EF share a well-established behavioral link during the preschool period, with countless observational studies showing cross-sectional associations between these abilities (Carlson, Moses, & Breton, 2002; Hughes, 1998). The exact nature of this relationship remains elusive. Nearly two decades ago, Perner and Lang (1999) reviewed five theoretical accounts on the relation between ToM and EF. Three of these theories remain ongoing areas of debate and comprise the foundation of this review: (1) ToM depends on EF (EF→ToM), (2) EF depends on ToM (ToM→EF), and (3) ToM and EF are reciprocally related owing to shared brain regions or neural networks (ToM↔EF).

The idea behind Proposal 1 (EF→ToM) is that EF skills such as self-monitoring and inhibitory control are necessary to understand the mental states of oneself and others (Carlson & Moses, 2001). More specifically, Russell (1996) argues that self-monitoring—a facet of executive control—is required for self-awareness, and self-awareness is a prerequisite for ToM. Furthermore, the ability to inhibit and shift perspectives seems necessary to understand the mental states of others. Following the general idea that EF supports the development of ToM, ample longitudinal evidence now exists that early EF is a robust predictor of later ToM reasoning in childhood (Carlson, Mandell, & Williams, 2004; Hughes & Ensor, 2007; Marcovitch et al., 2015; Müller, Liebermann-Finestone, Carpendale, Hammond, & Bibok, 2012).

The rationale behind Proposal 2 (ToM→EF) is that representing the mental states of oneself and others is required in order to strategically control thoughts and behavior. In other words, EF tasks such as those requiring inhibitory control necessitate an awareness that the mental blueprints for actions are causally effectual. With the understanding that mental states are causally linked to behavior, children acquire the ability to exert executive control over interfering or unwanted action tendencies (Lang & Perner, 2002; Perner & Lang, 1999; Perner, Stummer, & Lang, 1999). This first requires the capacity to differentiate self from other, followed by a basic understanding of the relation between mental states and behavior (meta-representation). This awareness that mental states are the drivers of behavior provides the basis on which those states can be manipulated to control thoughts and actions. Although the preponderance of behavioral evidence favors the proposition that EF is directionally linked to later ToM, several longitudinal studies suggest that early ToM, or its developmental precursors, also predicts later EF (Hughes & Ensor, 2007; McAlister & Peterson, 2013; Müller et al., 2012; Wade, Browne, Plamondon, Daniel, & Jenkins, 2016).

Finally, Proposal 3 of a “common neural basis” of ToM and EF (ToM↔EF) was initially forwarded by researchers examining the etiology of autism and its associated neurocognitive dysfunctions. Observations of pronounced concurrent impairments in both ToM and EF in these children led various researchers to propose that a common neural architecture, particularly involving prefrontal cortical regions, supported these cognitive abilities (Ellis & Gunter, 1999; Ozonoff, Pennington, & Rogers, 1991). On this basis, we would expect ToM and EF deficits to be interdependent, and by extension, we would expect these abilities to be reciprocally predictive of one another. Empirical support for a bidirectional relation between ToM and EF also exists (Austin, Groppe, & Elsner, 2014; Hughes & Ensor, 2007; Müller et al., 2012).

To date, developmental psychologists have relied extensively on behavioral evidence to draw inferences about the relative plausibility of these three proposals. However, even with sophisticated longitudinal designs using cross-lagged models and robust confound-controls, the research continues to yield mixed results. Moreover, since Perner and Lang’s (1999) seminal article, there has been an eruption of neuroscientific inquiry into the nature of both ToM and EF development, which has been paced by advances in neuroimaging and other tools for assessing brain structure and function. Indeed, an entire field centered on “social cognitive neuroscience” has emerged, with dedicated publication outlets (e.g., Social Neuroscience, Social Cognitive and Affective Neuroscience), including those focused on childhood and adolescence (e.g., Developmental Cognitive Neuroscience) (see Lieberman, 2007). As of this writing, however, there has been no comprehensive review on the developmental overlap between ToM and EF from a cognitive neuroscience perspective. Aside from crude and generalized suppositions concerning the neural underpinnings of ToM and EF, developmentalists have rarely aimed to understand the relationship of these capacities in terms of shared and/or distinct neural substrates. This is an important endeavor, since psychologists aiming to uncover the etiological basis of conditions characterized by ToM and EF deficits often wish to elucidate the mechanisms underlying these problems. For instance, determining whether psychosocial stress (Noble, Norman, & Farah, 2005; Wade et al., 2016) or biomedical risk factors (Wade, Browne, Madigan, Plamondon, & Jenkins, 2014; Wade & Jenkins, 2016) impinge on ToM and EF through similar or disparate neural mechanisms has implications for understanding the nature of neurocognitive impairment and its relation to more complex developmental and psychiatric outcomes. 
Similarly, identifying the directionality of these associations has meaningful clinical implications with respect to targeting treatment toward fundamental cognitive skills that support development across domains. Thus, in this review we draw on evidence from cognitive neuroscience to clarify which proposals regarding the ToM–EF relationship appear most credible on the basis of existing evidence. In doing so, we address gaps in the literature that require future research and offer suggestions for how apparent inconsistencies might be resolved through biobehavioral investigations.

Typical brain development during early childhood and links to ToM and EF

Studies of normative brain development can help explicate the nature of the ToM–EF relationship by highlighting the ontogenetic primacy of brain regions or networks known to support these abilities. These studies could therefore help to determine whether ToM or EF is necessary for acquiring the other ability. Addressing the directionality of influence could be accomplished by showing either that (a) the neural structures/circuits supporting one ability precede the other developmentally, and are perhaps structurally or functionally necessary to develop the other skill (i.e., either the EF→ToM or ToM→EF proposal); (b) the maturation of common brain regions/circuits sufficiently drives both abilities (i.e., the strict ToM↔EF proposal); or (c) common neural structures are necessary but not sufficient for both ToM and EF development, which also recruit domain-specific regions (the lean ToM↔EF proposal). Option (c) does not rule out a directional EF→ToM [or ToM→EF] explanation if, for instance, the regions involved in an EF [or ToM] network feed into a ToM [or EF] network (without reciprocal effects). As we shall see, the existing literature on normal brain growth is not sufficiently advanced to disentangle these options directly. Nevertheless, a discussion of normal brain development is important to characterize the temporal onset and trajectories of neural maturation in regions linked to ToM and EF, and serves to highlight their interconnectedness across childhood.

The protracted maturation of the human brain is among the most intriguing aspects of human development. During fetal development, cell proliferation, migration, synaptic growth, and dendritic arborization lead to a brain that is about one-third the size of that of a human adult by the time of birth. Through genetic influence and environmental modification, brain maturation then continues throughout childhood and into early adulthood (Toga, Thompson, & Sowell, 2006). However, different tissues, structures, and circuits mature at different rates and follow different patterns (Giedd & Rapoport, 2010). For instance, the seminal work of Huttenlocher (1979) showed that visual cortex has maximal synapse production by 4 months of age, whereas this peak occurs at 3 to 4 years in the medial prefrontal cortex, an area implicated in both ToM and EF. In general, the first 2 years of life are characterized by a dynamic interplay of progressive and regressive neural events, with brain growth to about 80% of adult size (Lenroot & Giedd, 2006). Total brain volume doubles in the first year, with an additional 15% in the second year (Knickmeyer et al., 2008). By age 6, the brain is nearly 95% of its peak size, but continued growth in prefrontal and occipital regions is observed from ages 5 to 11 (Sowell et al., 2004). Gray matter volumes (i.e., an indirect measure of glia, vasculature, and neurons with dendritic and synaptic processes) develop regionally, following an inverted U-shaped trajectory (i.e., a preadolescent increase followed by a postadolescent decrease; Giedd et al., 1999). Spatiotemporal developmental patterns are evident in white matter (a proxy for axonal myelination), as well (Deoni et al., 2011), though white matter growth tends to follow a more linear trajectory (Giedd et al., 1999).

Evidence suggests that the development of brain structures tends to follow a pattern wherein the areas responsible for the most basic functions develop first, followed by those responsible for more complex functions. In a longitudinal study of gray matter development, Gogtay et al. (2004) found that sensorimotor cortices, as well as frontal and occipital poles, matured first, followed by areas involved in spatial orientation and language (i.e., parietal regions), and then those related to more advanced executive/mental reasoning abilities (i.e., frontal lobe). Within the frontal lobe, the prefrontal cortex was the last to develop. Thus, the brain areas linked to motor and sensory function matured first, followed by areas involved in spatial orientation, speech and language development, and attention. Last to mature were areas involved in executive function and motor coordination. Notably, the superior temporal cortex, which is similar to the prefrontal and inferior parietal cortices in its role of integrating primary functions, was the last to mature. This is interesting given the importance of superior temporal regions in ToM (Apperly, Samson, Chiavarino, & Humphreys, 2004). Furthermore, studies using tensor maps of neural growth have shown that the fastest growth rates from 3–6 years of age occur in frontal networks supporting mental vigilance and goal-directed behavior (Thompson et al., 2000), consistent with PET evidence showing that rates of glucose metabolism in frontal cortex double between the ages of 2–4 years (Chugani, Phelps, & Mazziotta, 1987). These results are fascinating given that both ToM and EF undergo rapid development during this time (Wellman, 2002; Zelazo et al., 2003) and both are widely linked to prefrontal functioning (discussed in detail below).

Despite the coincident onset of increased proficiency in ToM and EF alongside neural maturation, a strict mapping of cortical development onto cognitive functions remains highly tenuous (Crone & Ridderinkhof, 2011). That is, even if it is shown that the structures typically associated with EF precede those related to ToM (or vice versa), our conclusions would still be limited without a precise mapping of brain development onto cognition or behavior within the same sample. Without measures of neural activity and EF and ToM, we must rely on cross-study comparisons and reverse inference (Poldrack, 2011) to explore the developmental primacy of EF or ToM. An ideal example of a within-subjects design that mapped brain activity onto cognition is by Shaw et al. (2006), who showed that synaptogenic trajectories of cortical development in prefrontal-specific regions predicted IQ functioning in childhood. That both ToM (Baker, Peterson, Pulos, & Kirkland, 2014) and EF (Brydges, Reid, Fox, & Anderson, 2012) are associated with IQ provides some indirect evidence that dynamic neural events in prefrontal regions over early childhood may support the development of these cognitive skills. However, the relationship between ToM and EF cannot be explained strictly by individual differences in IQ (Carlson et al., 2002). Thus, in addition to domain-general cognitive functions and their associated brain regions, discrete prefrontal and nonprefrontal regions may be involved in the networks supporting ToM and EF development.

It is now widely accepted that the brain is organized into functional networks, with coordinated activity from multiple regions that support the execution of cognition that drives behavior (Grayson & Fair, 2017). Examining functional networks at rest—“resting-state networks” (RSNs)—is useful because this does not require explicit cognitive or behavioral tasks, which for young children are performatively challenging and frequently fraught with error. Rather, RSNs make use of the concept that “neurons that fire together wire together,” and thus index the temporal correlation of blood-oxygen-level dependent (BOLD) signals across brain regions. These patterns of synchronized activity during the resting state are considered intrinsic features of brain function, given that they closely map onto network architecture during task performance (Cole, Bassett, Power, Braver, & Petersen, 2014; but see Davis, Stanley, Moscovitch, & Cabeza, 2017). The modularity of the human brain suggests that it is organized into “communities” or modules that are highly reproducible (Yeo et al., 2011). In children, these networks include primary sensorimotor and visual networks that are segregated and nondistributed; the “default-mode network” (DMN), which comprises the medial prefrontal cortex (mPFC), posterior cingulate (PCC)/precuneus, inferior parietal lobule (IPL; including the angular gyrus), and medial temporal gyrus (MTG); the “dorsal attention network” which involves the intraparietal sulcus (IPS) and middle temporal cortex; the “salience network” which includes the dorsal anterior cingulate cortex (dACC) and anterior insula; the “frontoparietal network” which includes the dorsolateral PFC (dlPFC), ventrolateral PFC (vlPFC), and dorsomedial PFC (dmPFC); and the language/auditory network.
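The idea that resting-state connectivity indexes the temporal correlation of BOLD signals across regions can be illustrated with a minimal sketch. It assumes functional connectivity is summarized as a pairwise Pearson correlation matrix, which is one common choice rather than the method of any study cited here. The function name `functional_connectivity` and the synthetic time series are hypothetical and serve only as illustration.

```python
import numpy as np

def functional_connectivity(bold):
    """Pearson correlation between every pair of regional time series.

    bold: array of shape (n_timepoints, n_regions).
    Returns an (n_regions, n_regions) correlation matrix.
    """
    return np.corrcoef(bold, rowvar=False)

# Synthetic data (illustrative only, not real BOLD recordings):
# regions 0 and 1 share a common signal, region 2 is independent noise.
rng = np.random.default_rng(0)
n_timepoints = 300
shared = rng.standard_normal(n_timepoints)
bold = np.column_stack([
    shared + 0.5 * rng.standard_normal(n_timepoints),
    shared + 0.5 * rng.standard_normal(n_timepoints),
    rng.standard_normal(n_timepoints),
])

fc = functional_connectivity(bold)
print(np.round(fc, 2))
```

In this toy example the two regions that share a signal are strongly correlated, whereas the independent region is not. Grouping regions by such correlations is the intuition behind describing them as belonging to the same network; real pipelines add preprocessing steps (motion correction, filtering, parcellation) that are omitted here.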

Critical to the discussion of ToM and EF, the DMN has been consistently linked to ToM in adults (Schilbach, Eickhoff, Rotarska-Jagiela, Fink, & Vogeley, 2008; Spreng & Grady, 2010). It has also been suggested that maturation of both structural and functional connectivity between DMN regions (especially mPFC and PCC) may support social–cognitive development over childhood (Supekar et al., 2010; see also Mak et al., 2017). Distinct developmental trajectories of specific DMN nodes have also been reported in 3- to 5-year-old children, with stronger interactions between nodes over this period (Xiao, Zhai, Friederici, & Jia, 2016). This is congruent with the rapid evolution of ToM during this period. Moreover, perturbations in intrinsic connectivity have been observed in conditions such as autism, for which ToM is grossly impaired (see Hull, Jacokes, Torgerson, Irimia, & Van Horn, 2016). Aberrations in connectivity between DMN regions are also demonstrable in children with conduct disorders and attention-deficit hyperactivity disorder (ADHD) (Broulidakis et al., 2016), suggesting that alterations in DMN function may be one source of risk for neurocognitive and neurodevelopmental problems early in life.

In contrast to the DMN, the salience network has sometimes been referred to as the “executive network” using common parceling techniques (Kemmer, Guo, Wang, & Pagnoni, 2015), whereas others have indicated that the frontoparietal network is the center of executive control (Seeley et al., 2007; Vincent, Kahn, Snyder, Raichle, & Buckner, 2008). Both of these networks may exercise distinct “control” functions during goal-directed behavior (Elton & Gao, 2014). Indeed, the development of these networks from childhood to adulthood has been posited to entail both decreased short-range connectivity and increased long-range connectivity of the nodes that compose them (Fair et al., 2007). As with the DMN, structural and functional maturation of these networks over the life course has been suggested to underlie the capacity for attentional and cognitive control (Uddin, Supekar, Ryali, & Menon, 2011), with deficits in functional connectivity apparent in children, adolescents, and adults with ADHD (Castellanos & Proal, 2012; Konrad & Eickhoff, 2010). Interestingly, specific brain regions within the frontoparietal and DMN networks show increased local and global functional features from ages 2–6, with researchers positing that this maturation supports abilities such as ToM, EF, and language (Long, Benischek, Dewey, & Lebel, 2017). These developments continue over the transition to adolescence (Sherman et al., 2014) and may explain the magnitude of individual differences in ToM and EF beyond childhood (Blakemore & Choudhury, 2006; Dumontheil, Apperly, & Blakemore, 2010).

Interrogating functional network maturation very early in development (i.e., within the first 2 years of life) gives clues as to the ontogenetic primacy of ToM and EF, which informs inferences about the nature of the ToM–EF relationship. Gao et al. (2009) have provided remarkable evidence for the establishment and synchronization of brain regions related to ToM and EF in infants from age 2 weeks to 2 years. They demonstrated that, whereas 2-week-old infants show a rudimentary and incomplete DMN, significant connections are established over the first 2 years, with a near-adult-like DMN observed by age 2. Growth in long-range neural connections and global efficiency over the first 2 years of life across DMN areas has been suggested to underpin the development of social–cognitive proficiencies (Gao et al., 2011). In a separate investigation, similar patterns of network synchronization over the first 2 years were observed for the dorsal attention network (Gao et al., 2013). Recently, the growth trajectories of nine functional networks were examined (Gao, Alcauter, Smith, Gilmore, & Lin, 2015). Among these were the DMN, salience, frontoparietal, dorsal attention (lateral visual/parietal), and language/auditory networks. The results showed that, across the various networks, the language/auditory and dorsal attention networks matured faster than the DMN, which matured faster than both the frontoparietal and salience networks. Integration between the frontoparietal and DMN networks increased over the first 2 years, while connectivity between the DMN and salience networks decreased. These results point to discrete, early-emerging cortical networks that govern internal mentalization and executive control, with the possibility of cross-talk between specific networks. The progressive interaction between DMN and frontoparietal networks over the first 2 years of life may therefore lay the foundation for the robust associations observed between ToM and EF in early childhood. 
Perhaps most striking, these findings support the notion that the ToM-related functions associated with the DMN may serve as a foundation on which other higher-order processes develop, providing the strongest support yet for the ToM→EF hypothesis using RSN methods. The enduring positive correlation between frontoparietal and DMN networks in 6-year-old children (Emerson, Short, Lin, Gilmore, & Gao, 2015) is consistent with behavioral evidence of significant correlations between ToM and EF in middle childhood (Devine, White, Ensor, & Hughes, 2016). Moreover, increased connectivity of both the auditory/language and dorsal attention networks with the frontoparietal network over the first 2 years (Gao et al., 2015) suggests that these early-emerging networks may developmentally precede and functionally modulate activity in the later-maturing networks that support higher-order cognition (a possibility we explore in more detail below).

Unfortunately, comprehensive longitudinal studies that simultaneously assess ToM and EF and their associated brain regions (including regionally specific activation, trajectories of white and gray matter change, and functional connectivity across regions) have not yet been conducted. Notwithstanding the immense progress made in understanding both cognitive and neural development, imaging studies have yet to unambiguously elucidate the temporal emergence of ToM and EF with respect to representation in the brain. That is, despite the observed developmental precedence of ToM-related networks (DMN) over EF-related networks (frontoparietal and salience) using RSN methods, existing studies of typical brain development cannot definitively speak to the directionality of the ToM–EF relationship on the basis of structural/functional primacy (i.e., ToM→EF or EF→ToM) or their reliance on common-developing structures/networks (i.e., ToM↔EF), owing to a lack of longitudinal designs that explicitly map cognition onto neural development. A recent study by Eggebrecht et al. (2017) is a notable exemplar of how measures of resting-state functional connectivity can be augmented to include measures of behavior/cognition in order to explain the emergence of particular abilities. In their study of infant joint attention from 12 to 24 months, brain–behavior associations emerged between the visual and dorsal attention networks and between the visual and default mode networks (in particular the PCC), suggesting that interactions between these networks may support the development of joint attention. This is fascinating given the importance of early joint attention for later ToM ability (Adolphs, 2003).
Future studies that leverage this approach by integrating measures of brain and ToM/EF across childhood will be critical to improving our understanding of how such networks support these abilities and how intra- and internetwork dynamics explain the unfolding relationship between ToM and EF over development.

Neurological disorders as clues to the ToM–EF association

Studies of children with neurodevelopmental disorders may further help clarify which theoretical account of the relation between ToM and EF is most plausible. The proposal that EF is a prerequisite for ToM (EF→ToM) demands that, in the presence of EF deficits, there necessarily are ToM deficits as well. However, children with Prader–Willi and Williams syndromes readily pass ToM tasks, even when EF performance is variable or impaired (Perner & Lang, 2000; Tager-Flusberg, Sullivan, & Boshart, 1997). Similar results have been found in children with ADHD (Perner, Kain, & Barchfeld, 2002). Furthermore, ToM deficits (difficulties understanding unrealized goals, false belief, pretense, and intention) in neurofibromatosis type 1 (NF1) cannot be explained by parent-rated executive dysfunction alone (Payne, Porter, Pride, & North, 2016). This appears to contradict the directional EF→ToM argument. In contrast, the proposal that ToM is required for EF (ToM→EF) prohibits the existence of intact EF in the presence of impaired ToM. However, children with autism have been shown to have intact EF (planning, set-shifting, and inhibition) while exhibiting pronounced deficits in false belief understanding (Pellicano, 2007). Also, intervention studies of children with autism show that training EF has the effect of improving ToM at follow-up, whereas training ToM does not improve EF (Fisher & Happé, 2005). These studies discount the plausibility of the ToM→EF argument. More commonly, concomitant deficits in both ToM and EF are observed for neurodevelopmental disorders such as autism (Gökçen, Frederickson, & Petrides, 2016; Joseph, 2004; Ozonoff et al., 1991) and ADHD (Fahie & Symons, 2003; Uekermann et al., 2010). Although such studies support the ToM↔EF argument, they fail to rule out a directional association, because the assessment of ToM and EF is usually concurrent, with no prediction of cognitive decline over time. 
Thus, contemporaneous ToM and EF impairments could be explained by impairment in one domain that exerts a downstream effect on the other, or by a theory that posits a shared neural network that mutually supports both abilities. It is also plausible that the EF deficits previously observed in children with autism are attributable to difficulties understanding the experimenter’s implicit expectations for the task (White, 2013). In such circumstances, purported problems with EF may actually be born of mentalization problems rather than primary executive dysfunction, thereby highlighting the need to closely consider sources of task and instructional variance to ToM and EF performance in children with neurodevelopmental disorders.
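The dissociation logic used in this paragraph can be made explicit with a short sketch. It encodes only the two prohibitions stated above: a strict EF→ToM account is contradicted by impaired EF with intact ToM, and a strict ToM→EF account is contradicted by impaired ToM with intact EF. The function name `ruled_out` is hypothetical, and the sketch deliberately ignores measurement issues such as the task and instructional variance discussed above.

```python
def ruled_out(ef_impaired, tom_impaired):
    """Return the strict directional proposals contradicted by one profile.

    ef_impaired, tom_impaired: booleans describing a single clinical profile.
    """
    contradicted = []
    # If EF were a prerequisite for ToM, impaired EF should entail impaired ToM.
    if ef_impaired and not tom_impaired:
        contradicted.append("EF→ToM")
    # If ToM were a prerequisite for EF, impaired ToM should entail impaired EF.
    if tom_impaired and not ef_impaired:
        contradicted.append("ToM→EF")
    return contradicted

# Profiles described in the text (labels are shorthand, not diagnoses).
print(ruled_out(ef_impaired=True, tom_impaired=False))   # impaired EF, intact ToM
print(ruled_out(ef_impaired=False, tom_impaired=True))   # intact EF, impaired ToM
print(ruled_out(ef_impaired=True, tom_impaired=True))    # concurrent deficits
```

Concurrent deficits return an empty list, which mirrors the point made above: joint impairment is compatible with every account and therefore cannot, on its own, settle the question of directionality.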

Similar discrepancies have been observed in neurodegenerative diseases. For instance, low ToM at baseline, as measured by the Reading the Mind in the Eyes Test (RMET), may be a risk factor for impaired prefrontal/executive functioning and frontotemporal dementia 2 years later, supporting the ToM→EF link (Pardini et al., 2013). Alternatively, in patients with Alzheimer’s disease, EF impairments contribute to the deterioration of second-order false belief understanding (Laisney et al., 2013). However, the latter results were correlational and therefore do not offer strong support for a directional EF→ToM claim. The relationship between ToM and EF has also been demonstrated in multiple sclerosis (Henry et al., 2009; Kraemer et al., 2013) and Parkinson’s disease (R. J. Anderson, Simpson, Channon, Samuel, & Brown, 2013; Costa et al., 2013). These studies have used a variety of methods to assess ToM, including the RMET, interpreting human action from written scenarios, or inferring mental states such as the false beliefs of characters in videos. EF has also been assessed using a variety of tasks indexing response inhibition, working memory, verbal fluency, and cognitive flexibility. Thus, broadband EF deficits appear to correlate with broadband ToM deficits in these populations, though it is unclear whether particular aspects of EF are more strongly related to specific ToM-related abilities. Finally, evidence from studies of adults with acquired neurological pathology (subcortical pathologies, cortical degenerative disorders, frontal focal lesions, traumatic brain injury, and epilepsy) shows that, across a host of classical ToM tasks and EF domains (updating, shifting, inhibition, and access), congruent impairments in ToM and EF are present in 64% of cases (see Aboulafia-Brakha, Christe, Martory, & Annoni, 2011, for a review). In a small fraction of samples, ToM and EF were jointly preserved. 
Perhaps most interestingly, in 16% of cases, EF was impaired while ToM was intact, whereas in 13% of cases, ToM was impaired while EF was preserved. The fact that these patients could demonstrate, on the one hand, mutually impaired or preserved functioning in ToM and EF or, on the other, relative sparing of one ability and deficiency in the other, allows for any of the three proposals: ToM→EF, EF→ToM, or ToM↔EF. Indeed, a more nuanced approach that identifies the precise neural mechanisms underlying these pathologies might prove useful in revealing the conditions under which directional or reciprocal relations between ToM and EF emerge.

Studies of children with neurological disorders or acquired brain injury are also informative. For instance, children with traumatic brain injury (TBI) show both broad ToM and EF deficits (Robinson et al., 2014). Moreover, Dennis et al. (2013) examined how injury to specific brain networks was associated with different aspects of ToM. Damage to the central executive network (involving the dlPFC, posterior parietal cortices, and subcortical regions) was related to conative ToM (i.e., the ability to understand how acts like empathic praise or ironic criticism can influence another person’s thoughts or feelings), suggesting that ToM expression may depend on aspects of EF such as cognitive inhibition (see also Dennis, Agostino, Roncadin, & Levin, 2009). However, disruptions to other networks, such as the DMN and mirror neuron/empathy network, were also related to ToM, suggesting that EF deficits do not fully account for all ToM impairments in childhood TBI. More recently, Ryan et al. (2017) examined the contribution of large-scale neural networks to cognitive, affective, and conative ToM deficits in a large sample (N = 137) of typically developing children with mild to severe TBI. They demonstrated volumetric reductions in several large-scale networks that support ToM, including the DMN, salience, and central executive networks, as well as two additional networks specific to social cognition: the cerebro-cerebellar mentalizing network and the mirror-neuron empathy network. Of note, the parcellation approach used in these studies differed from that of the RSN studies noted above, making comparisons difficult. However, when Ryan et al. (2017) examined specific nodes within these networks, they found that poor cognitive ToM was associated with gray matter reductions in the temporo-parietal junction (TPJ), superior temporal sulcus (STS), and cerebellum; poor conative ToM was associated with reduced volume of the premotor area, IPL, and inferior frontal gyrus (IFG); and poor emotional ToM was associated with reduced amygdala volume. As we will see below, these findings overlap with those from task-based neuroimaging studies and converge on the conclusion that key neural nodes that participate in complex networks may provide a substrate for the co-development of ToM and EF.

Concurrent impairments in ToM and EF have also been observed in children with cardiac malformations (transposition of the great arteries [TGA]; Calderon et al., 2010), cerebral palsy (Caillies, Hody, & Calmus, 2012), and focal epilepsy (Giovagnoli et al., 2011). Importantly, although many of these studies suggested that deficits in ToM may be the result of executive dysfunction, such conclusions are often predicated on assumptions concerning the role of EF in online social reasoning (i.e., a presumed EF→ToM relation). Although not implausible, these proposals require an explicit examination of the predictive roles of ToM and EF in each other’s development. A minimal requirement for making directional assertions is to test both reciprocal effects after controlling for key confounds, and a more stringent requirement is to assess both ToM and EF at various time points and examine their cross-lagged longitudinal effects. Including measures of ToM and EF at the time of or shortly after the brain insult and having premorbid levels of functioning would further enable causal inferences to be drawn. Currently, such approaches are rarely if ever feasible in clinical studies of this kind, and usually for sensible reasons: lack of appropriate measurements, unpredictable illness onset, and priority of providing acute care. The result is a lack of clarity regarding the directionality of the ToM–EF relationship in either normative or patient samples. However, the advent of reliable and valid neuropsychological assessment tools capable of being delivered at bedside in settings such as the emergency department (Khetani, Brooks, Mikrogianakis, & Barlow, 2016) offers promise for baseline assessment of functioning and outcome monitoring in several pediatric conditions. Moreover, recent calls for regular screening of ToM in children and adults with varied neurological impairments (Adenzato & Poletti, 2013; Ryan et al., 2015) may promote routine assessment of ToM and EF simultaneously. 
In turn, tracking how these abilities unfold in the aftermath of injury/diagnosis may facilitate an examination of how changes in these abilities predict decline, stagnancy, or recovery in one or the other’s development, thereby offering an avenue for expounding the directionality of the ToM–EF relationship.
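The cross-lagged logic described above can be illustrated with a minimal sketch. It assumes a two-wave design and ordinary least squares, and it uses simulated scores in which an EF→ToM effect is deliberately built in. The function name, variable names, and effect sizes are hypothetical and are not drawn from any study reviewed here; a real analysis would also include the confounds mentioned above and would typically use structural equation modeling.

```python
import numpy as np


def cross_lagged(tom_t1, ef_t1, tom_t2, ef_t2):
    """Estimate the four paths of a two-wave cross-lagged panel model by OLS.

    Each time-2 score is regressed on both time-1 scores, so a cross-lagged
    path is the effect of one ability on later levels of the other ability,
    controlling for earlier levels of the outcome itself.
    """
    design = np.column_stack([np.ones(len(tom_t1)), tom_t1, ef_t1])
    b_tom, *_ = np.linalg.lstsq(design, tom_t2, rcond=None)
    b_ef, *_ = np.linalg.lstsq(design, ef_t2, rcond=None)
    return {
        "tom_stability": b_tom[1],  # ToM(t1) -> ToM(t2)
        "ef_to_tom": b_tom[2],      # EF(t1)  -> ToM(t2), cross-lagged
        "tom_to_ef": b_ef[1],       # ToM(t1) -> EF(t2), cross-lagged
        "ef_stability": b_ef[2],    # EF(t1)  -> EF(t2)
    }


# Simulated (hypothetical) data: EF at time 1 influences ToM at time 2,
# but ToM at time 1 has no influence on EF at time 2.
rng = np.random.default_rng(0)
n = 2000
ef_t1 = rng.normal(size=n)
tom_t1 = 0.3 * ef_t1 + rng.normal(size=n)
tom_t2 = 0.5 * tom_t1 + 0.4 * ef_t1 + rng.normal(size=n)
ef_t2 = 0.6 * ef_t1 + rng.normal(size=n)

paths = cross_lagged(tom_t1, ef_t1, tom_t2, ef_t2)
for name, value in paths.items():
    print(f"{name}: {value:.2f}")
```

In this simulated case the EF→ToM path should be recovered as clearly nonzero and the ToM→EF path as close to zero; an asymmetry of this kind is what a directional claim would rest on.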

In sum, the neurological evidence from pediatric, adult, and geriatric patients has not yielded consistent findings, in large part because of the vast heterogeneity in the quality, degree of deficit, and functional relations between ToM and EF across conditions. These studies do not enable unambiguous conclusions regarding the functional dependence of ToM and EF, and all three theories (ToM→EF, EF→ToM, or ToM↔EF) remain tenable on the basis of research in these areas.

Brain lesion studies

Studies from patients with brain lesions may shed further light on the nature of the ToM–EF relation. In these studies, the presence of a particular lesion (X) and associated cognitive deficit in one domain (Y) but sparing in another domain (Z) suggests that: (1) area X is necessary for Y, (2) area X is neither necessary nor sufficient for Z, and (3) function Y is not important for function Z. If region X is associated with impairment in Y and Z, then X may subserve both functions, or it may subserve one function that in turn impacts the other in a directional manner. Studies documenting mutual impairments in Y and Z as a function of lesion X may therefore support either directional or shared accounts of association. As we shall see below, this lack of clarity hampers conclusions regarding the nature of the ToM–EF relationship in neuropsychological and brain lesion studies.
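The inferential logic of this paragraph can be written out as a small decision rule. The sketch below is only a restatement of that reasoning under simplifying assumptions (a single lesion site X and binary impaired/spared outcomes); the function name and the wording of the returned labels are hypothetical, and the case in which both functions are spared is not discussed in the text and is added here only for completeness.

```python
def interpret_lesion_pattern(y_impaired: bool, z_impaired: bool) -> str:
    """Return the inference licensed by a deficit profile after a lesion to region X."""
    if y_impaired and not z_impaired:
        # Single dissociation: Y is lost while Z survives.
        return "dissociation: X is needed for Y; Z depends on neither X nor Y"
    if z_impaired and not y_impaired:
        # The mirror-image dissociation.
        return "dissociation: X is needed for Z; Y depends on neither X nor Z"
    if y_impaired and z_impaired:
        # Joint impairment cannot separate shared from directional accounts.
        return "ambiguous: X may serve both functions, or one function may drive the other"
    # Both spared (not covered in the text): no evidence about X or the Y-Z relation.
    return "uninformative: X is needed for neither function"


print(interpret_lesion_pattern(y_impaired=True, z_impaired=False))
print(interpret_lesion_pattern(y_impaired=True, z_impaired=True))
```

Only the two dissociation branches constrain the direction of the relation; the joint-impairment branch is the one that leaves directional and shared accounts equally open.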

In an early study of 31 patients with right or left frontal lesions, deficits in both ToM (first- and second-order false belief understanding) and a broad set of EFs were observed, relative to healthy controls (Rowe, Bullock, Polkey, & Morris, 2001). However, the results indicated that ToM deficits could not be explained by executive dysfunction. The fact that both ToM and EF were impaired, without a clear EF→ToM link, best fits a ToM↔EF model, but it does not rule out a ToM→EF effect. Unfortunately, the researchers did not localize the specific frontal regions involved in mental state attribution, making it difficult to understand which areas were dedicated to ToM and/or EF. Another early study by Fine, Lumsden, and Blair (2001) showed that patient B.M., suffering from congenital left amygdala damage, had profoundly impaired mental state representation in adulthood but showed relative sparing of EF (inhibition, goal-directed behavior, and sequencing). Damage to the amygdala either congenitally or in early childhood may be particularly detrimental to ToM, as damage in adulthood (e.g., as part of an anterior temporal lobectomy) is less predictive of ToM problems (Shaw et al., 2004). Similar findings of separable neural contributions to ToM and EF were found in a case of frontotemporal dementia, where patient J.M. showed severely impaired ToM (first- and second-order false beliefs, and faux pas) but intact EF on the Wisconsin Card Sorting Test, which is generally considered a test of set shifting (Lough, Gregory, & Hodges, 2001). These results support a lean version of the EF→ToM proposal (i.e., EF may facilitate ToM but is not sufficient for it; otherwise, ToM would also be intact). Perhaps more conclusively, these results suggest that discrete rather than completely overlapping brain regions are involved in ToM and EF, with the sparing of one ability but not the other being at odds with a strict ToM↔EF relational model.

Many neuropsychological patient studies now suggest more specific neural contributions to ToM and EF. For instance, patients with bilateral vlPFC/orbitofrontal (OFC) lesions demonstrate subtle impairments in social reasoning (i.e., in understanding faux pas, but not first- or second-order false beliefs) akin to those found in high-functioning autism, whereas damage to left dlPFC is associated with working memory but not ToM deficits (Stone, Baron-Cohen, & Knight, 1998). Recent evidence has suggested that affective components of ToM may be subserved by ventromedial prefrontal cortex (vmPFC), whereas dorsolateral regions may support cognitive aspects of ToM (Kalbe et al., 2010; Leopold et al., 2012; Shamay-Tsoory & Aharon-Peretz, 2007; Shamay-Tsoory, Tomer, Berger, Goldsher, & Aharon-Peretz, 2005; Stuss, Gallup, & Alexander, 2001). Interestingly, the dlPFC is widely implicated in EF (Alvarez & Emory, 2006; Stuss, 2011). Moreover, ToM deficits in patients with prefrontal cortical lesions have been shown to be largely attributable to problems with EF, such as working memory, shifting, and inhibition, while deficits in false belief reasoning among patients with damage to the TPJ are independent of EF (Apperly et al., 2004). Involvement of the TPJ in spontaneous belief inference has recently been demonstrated in two patients with highly localized lesions in the left posterior part of the TPJ (Biervoye, Dricot, Ivanoiu, & Samson, 2016). With regard to the relationship between ToM and EF, it may be that ventromedial or posterior temporo-parietal regions are specifically involved in the representation of mental states, whereas dorsolateral regions support the domain-general executive processes needed to successfully manage and functionally apply those representations (i.e., inhibiting self-perspectives, holding and manipulating in memory, shifting reference frames, etc.). 
This is consistent with a lean version of the EF→ToM proposal in which EF may be necessary but not sufficient for ToM processing. However, these results have been challenged by findings showing that a patient suffering a rare bilateral anterior cerebral artery infarction that damaged the mPFC had marked problems with set shifting and planning but not with ToM (Bird, Castelli, Malik, Frith, & Husain, 2004; see also Bach, Happé, Fleminger, & Powell, 2000). This finding contradicts the EF→ToM proposal, while allowing the possibility that ToM is necessary, but not sufficient, for EF (the lean ToM→EF proposal).

On balance, the fact that patients with comparable prefrontal lesions show variable levels of ToM and/or EF performance may be most consistent with the ToM↔EF proposal of a shared neural network supporting both skills, with other discrete regions separately underlying specific aspects of ToM and EF. The discrepancies across studies, then, may be due to a variety of moderating factors, such as the recruitment of patients with differing brain pathologies/etiologies, diffuse brain lesions that fail to localize and may involve neighboring regions, and variability in the selection of ToM and EF tasks. For instance, it may be that the unspecified effects of particular lesions on ToM and EF reflect a more subtle phenomenon that is not adequately captured in all studies, such as the possibility that EF accounts for deficits in the cognitive but not the emotional dimension of ToM reasoning (see Yeh, Tsai, Tsai, Lo, & Wang, 2017, for an example).

Several issues remain to be addressed in future research. Issues with lesion localization and sample heterogeneity (e.g., patients with either congenital or acquired neurological pathology) clearly have a bearing on the reliability of the findings across studies. Another obstacle in addressing questions regarding the directionality of the ToM–EF relationship is the temporal cascading of effects. For instance, preservation of EF with impaired ToM allows for the possibility of EF→ToM, just as preservation of ToM with impaired EF allows for the ToM→EF proposal. However, studies that compare the preservation of EF and ToM do not directly address whether ToM or EF is dependent on the other. Indeed, such dissociations are necessary, but not sufficient, in order to make inferences about causality. Future studies might address these questions more directly by longitudinally tracking patients with lesions to specific neural regions/networks involved in ToM and EF and examining the resultant impact (deterioration or preservation) on the other ability over time.

Task-based neuroimaging studies of ToM and EF

Above we described several findings on RSNs, their maturation, and their putative relation to ToM and EF. It has been argued, however, that a full appreciation of the brain basis of cognition requires an examination of the networks that are activated while performing cognitive tasks (rather than at rest). Indeed, it is not always the case that RSNs neatly dovetail with task-based network activity (see Campbell & Schacter, 2017). In contrast to RSNs, the neural networks activated during tasks have been called “cognitive function networks” (CFNs; Davis et al., 2017). Like RSNs, CFNs comprise discrete brain regions that form integrated networks supporting particular cognitive abilities. Davis and colleagues described several challenges with relying on RSNs alone to understand the neural basis of cognition: (1) the brain regions connected within RSNs may be dissociable in a CFN (the dissociation problem); (2) regions belonging to different RSNs may co-activate in a CFN (the association problem); (3) a single brain region in an RSN may ostensibly participate in more than one CFN (the versatility problem); and (4) the extent to which a brain region is connected to other structures within an RSN does not presage its importance in a CFN (the performance specificity problem). These points underscore the importance of examining the neural structures that are specifically activated during ToM and EF tasks, which provides a complementary approach to RSN methods of elucidating the nature of the ToM–EF relation.

Rapid improvements in the quality and accessibility of tools for assessing brain structure and function over the last 20 years have led to an unprecedented number of studies on the neural foundations of ToM and EF. Common approaches include the use of electroencephalography (EEG; and its associated event-related potentials, ERPs), structural magnetic resonance imaging (MRI; and its functional subtype, fMRI), diffusion imaging (dMRI or DTI), and functional near-infrared spectroscopy (fNIRS). In this section, we review the extensive evidence on CFNs and their participant regions in relation to ToM and EF.

Theory of mind findings

In the adult literature, neuroimaging studies have revealed a remarkably consistent set of brain regions recruited during ToM tasks: the bilateral TPJ (the inferior parietal lobule at the junction with the posterior temporal cortex), the mPFC (including the anterior paracingulate cortex), the precuneus/PCC, and the STS/MTG (Amodio & Frith, 2006; Decety & Sommerville, 2003; Gallagher & Frith, 2003; Gallagher et al., 2000; Ruby & Decety, 2003; Saxe & Kanwisher, 2003; Saxe, Whitfield-Gabrieli, Scholz, & Pelphrey, 2009; Völlm et al., 2006). Meta-analytic studies have corroborated the notion of a distributed fronto-temporo-parietal CFN that supports the ability to impute others’ goals, desires, intentions, and beliefs (Molenberghs, Johnson, Henry, & Mattingley, 2016; Schurz, Radua, Aichhorn, Richlan, & Perner, 2014; Van Overwalle, 2009).

In children, early EEG studies demonstrated that resting EEG alpha activity localized to the dmPFC and TPJ was positively associated with 4-year-olds’ representational ToM, even after controlling for EF (e.g., inhibitory control and shifting; Sabbagh, Bowman, Evraire, & Ito, 2009). These results indicate that EF cannot account for the pattern of brain activity that accompanies ToM reasoning, and instead suggest a dedicated neural circuit for ToM. ERP studies comparing adults and 4- to 6-year-old children have revealed comparable prefrontal activity in those who can effectively reason about mental states (Liu, Sabbagh, Gehring, & Wellman, 2009), signifying a critical role of the PFC in ToM deployment and development (see also Meinhardt, Kühn-Popp, Sommer, & Sodian, 2012). Using ERP methods, it has also been shown that both belief and desire reasoning are associated with midfrontal scalp activations, whereas belief reasoning shows an additional selective right-posterior scalp distribution (Bowman, Liu, Meltzoff, & Wellman, 2012). These results align with the idea that early desire reasoning may help to scaffold later belief understanding, consistent with behavioral studies that emphasize a developmental progression in ToM-based skills (see Wellman, Fang, & Peterson, 2011). On the basis of these EEG/ERP studies, it does not appear that the strict ToM↔EF proposal of a wholly overlapping neural basis for ToM and EF is tenable. However, definitive conclusions are difficult to draw, given the limited spatial resolution of EEG/ERP measures. As a result, neuroimaging methods with improved spatial resolution offer an important window into the neural basis of ToM in children.

Imaging studies in children, though rare compared with those in adults, are supportive of the CFN described above involving the TPJ, mPFC, PCC/precuneus, and STS/MTG from age 6 to 12 (Kobayashi, Glover, & Temple, 2007; Saxe et al., 2009; Sommer et al., 2010). Using fMRI, Gweon, Dodell-Feder, Bedny, and Saxe (2012) showed increasing activation in the bilateral TPJ from age 5 to 11, suggesting progressive selectivity of this region in reasoning about mental states. Moreover, Wiesmann, Schreiber, Singer, Steinbeis, and Friederici (2017) recently used diffusion-weighted MRI to show that breakthroughs in false belief understanding from age 3 to 4 are associated with age-related changes in local white matter structure in the TPJ, mPFC, precuneus, and MTG. As with prior EEG findings, these changes were independent of EF and language abilities, suggesting a dedicated neural circuit underlying advances in ToM from age 3 to 4. Wiesmann and colleagues also demonstrated increased connectivity between temporoparietal and IFG regions in relation to ToM performance, suggesting both white matter maturation and increased connectivity strength across these classic ToM regions in preschool-aged children. Moreover, the progression from desire to belief reasoning being supported by the TPJ has recently been replicated using fNIRS methods (Bowman, Kovelman, Hu, & Wellman, 2015). In aggregate, these findings converge with electrophysiological evidence in dispelling the strict version of the ToM↔EF proposal of a completely overlapping neural network. Importantly, such findings do not rule out the possibility of shared hubs that participate in networks for both ToM and EF. Evaluating that possibility first requires a delineation of the neural regions involved in EF.

Executive functioning findings

As we described above, EF reflects a constellation of cognitive abilities—inhibition, working memory, cognitive flexibility, self-monitoring, and so forth—and any one of these components could be functionally related to ToM. Many tasks assess EF, but most of them do not effectively separate these components. For instance, in older teenagers and adults, the Wisconsin Card Sorting Test (WCST), which has been suggested to involve many EFs (Miyake et al., 2000), has been shown to activate the dlPFC, vmPFC, and IPL (Alvarez & Emory, 2006), which parallels results from patient lesion studies (Stuss et al., 2000). Response inhibition is strongly associated with activity in the ACC, dlPFC, superior parietal lobule, and vlPFC (Aron, Robbins, & Poldrack, 2004; Wager et al., 2005). Similar areas have been shown to support working memory, with increased co-activation in ACC and IFG in individuals with larger working memory spans (Osaka et al., 2004). In general, working memory involves dorsolateral, ventrolateral, and anterior prefrontal regions along with parietal areas (Veltman, Rombouts, & Dolan, 2003). Response selection under conditions of high attentional load is further associated with activity in bilateral frontal eye fields and intraparietal sulcus (Culham, Cavanagh, & Kanwisher, 2001). Though the degree of activation for each EF dimension is not reviewed here due to space restrictions, it is clear that a number of regions appear to be associated with broadband EF abilities. Studies using transcranial magnetic stimulation (TMS) have confirmed these associations (Osaka et al., 2007).

In children, neuroimaging studies are largely in agreement with the adult literature. For example, in an MRI study of 5- to 10-year-old children, Kharitonova, Martin, Gabrieli, and Sheridan (2013) showed that age-related improvements in cognitive control were associated with gray matter thinning in the ACC and right IFG, whereas changes in working memory were associated with thinning in the superior parietal cortex, possibly reflecting a process of selective pruning and increased myelination over childhood. Developmental changes in mPFC have also been associated with cognitive control in this age group (Sheridan, Kharitonova, Martin, Chatterjee, & Gabrieli, 2014). In a study of 339 participants ages 8 to 89, using both MRI and DTI, it was shown that intracortical myelination is associated with intra-individual variability in a speeded inhibition task across the human lifespan, with EF growth being observed through the fourth decade of life (Grydeland, Walhovd, Tamnes, Westlye, & Fjell, 2013). Multimodal imaging techniques of this sort have further revealed that cortical surface area of the ACC is a critical predictor of cognitive control, with the strongest effects being observed in young children (Fjell et al., 2012; Velanova, Wheeler, & Luna, 2008; Walhovd et al., 2012). There is also evidence for progressive age-related increases in activation in lateral and medial fronto-striatal regions, as well as for the strength of interregional connectivity during cognitive control tasks (including inhibition and flexibility; see Rubia, 2013, for a review). Together, these findings underscore the importance of dlPFC, ACC, and temporoparietal maturation in the development of common EFs in childhood.

Relation between ToM and EF

Meta-analytic findings of brain regions implicated in EF in both children and adults (Alvarez & Emory, 2006; Houdé, Rossi, Lubin, & Joliot, 2010; Wager, Jonides, & Reading, 2004) have identified a number of areas that are frequently associated with ToM; these include the vmPFC (Shamay-Tsoory, Tibi-Elhanany, & Aharon-Peretz, 2006), IPL (Decety & Sommerville, 2003; Uddin, Molnar-Szakacs, Zaidel, & Iacoboni, 2006), anterior insula (especially in understanding feeling states; see Lamm & Singer, 2010), and temporoparietal regions (Saxe et al., 2009). The dmPFC has been linked to both ToM and inhibitory control in separate studies (Dodell-Feder, Koster-Hale, Bedny, & Saxe, 2011; Simmonds, Pekar, & Mostofsky, 2008). Moreover, in addition to its role in ToM, the TPJ has been characterized as an attentional “circuit breaker,” enabling the disengagement and reorienting of attention (Corbetta, Patel, & Shulman, 2008). On the surface, these findings may intimate domain-general properties of these regions that best fit with the ToM↔EF proposal of a shared neural architecture. However, this is complicated by the fact that these comparisons are made across studies, so it is plausible that the observed activity in response to a given task could be explained by the other (unmeasured) cognitive function being enlisted to perform that task. Thus, within-subjects designs that compare neural activity during ToM and EF tasks are critical. In one example, van der Meer, Groenewold, Nolen, Pijnenborg, and Aleman (2011) showed that a high- versus low-inhibition belief reasoning task and a basic stop-signal inhibition task both recruited IFG, whereas ToM alone recruited the STG, MTG, TPJ, and precuneus. These results suggest a common mechanism for self-perspective inhibition and basic response inhibition, as well as a non-EF-mediated, ToM-specific neural mechanism. 
Such findings are consistent with a lean EF→ToM relational link in which EF is enlisted to support ToM activities that require perspective inhibition. These results also align with behavioral studies that have demonstrated that ToM performance varies as a function of working memory demands (Bull, Phillips, & Conway, 2008; McKinnon & Moscovitch, 2007) and that the depletion of EF during delay of gratification stymies otherwise capable 4- and 5-year-olds’ ability to reason about mental states (Powell & Carey, 2017). The latter study, while convincingly showing that EF depletion reduces ToM performance, cannot rule out the possibility that ToM depletion also reduces EF performance—indeed, no such study has been conducted, to our knowledge. Additionally, such a finding may be explained by the idea that something common to both ToM and EF is depleted during delay of gratification, such as the attentional resources used to compute reward benefits or response costs, a theme we will return to at the end of this review.

The finding that EF may modulate ToM could be explained by (1) the “performance account,” in which immature EF places limits on children’s ability to effectively represent and reason about mental states during actual task performance, or (2) the “emergence account,” in which EF is required in order to form conceptual and mentalistic representations of others. Saxe, Schulz, and Jiang (2006b) directly addressed this question using fMRI in a sample of adults by conducting a series of experiments using false belief (ToM) and algorithm (EF) tasks, which both required response selection and application to the same stimulus materials (thus matching the task demands). They found common neural activity in mPFC, bilateral intraparietal sulcus, ACC, and left TPJ. The right TPJ was only associated with false belief performance. These results suggest that ToM reasoning recruits brain regions associated with response selection, inhibition, and attention, as well as regions that are specifically required in order to represent the contents of others’ thoughts, including the TPJ and, perhaps more weakly, the mPFC and PCC. Since EF was important for ToM in this sample of adults, the results favored the “performance” over the “emergence” account of the EF→ToM relationship. That is, even if ToM tasks involve executive demands, mental-state reasoning is subserved by a distinct neural substrate. The authors further suggested that their results stood in opposition to a strict ToM↔EF account, in which ToM and EF are associated because they rely on the same neural network. Similar results were subsequently demonstrated in 6- to 11-year-old children (Saxe et al., 2009).

Another study examining ToM and EF in the same sample used nonverbal visual tasks for false belief understanding and inhibitory control that were virtually identical (Rothmayr et al., 2011). The ToM and EF tasks showed substantial overlap of activity not only in right TPJ, but also in dmPFC, the dorsal part of the left TPJ, and lateral prefrontal regions (areas also linked to memory; Cabeza & Nyberg, 2000). These findings suggest that the mPFC and TPJ are not specific to either cognitive ability. Furthermore, the authors suggested that these regions may support domain-general processes involved in both ToM and EF, explicitly favoring the ToM↔EF proposal. However, the recruitment of independent regions for both ToM and EF suggests that, if there is a shared network for ToM and EF, the areas recruited by the different capacities are not completely overlapping (lean ToM↔EF relation).

Common hubs for ToM and EF?

The notion that ToM and EF recruit common neural regions underscores the versatility problem described by Davis et al. (2017) above—that is, specific brain regions may participate in more than one CFN in the service of different cognitive tasks. One explanation for the seeming overlap between the brain regions involved in ToM and EF is that these regions operate as hubs within distributed neural networks that support the development of both abilities. This may partially explain why these abilities are modestly but robustly correlated across childhood. Indeed, data from both structural and functional brain analyses have revealed several cortical hubs that are both densely anatomically connected and dynamically interactive (see M. P. van den Heuvel & Sporns, 2013, for a review). Structural hubs (derived from diffusion imaging) include the precuneus/PCC, ACC, dlPFC, and insular cortex, as well as MTG/STG. Similarly, functional hubs (derived from resting-state fMRI) include the precuneus/PCC, ACC, vmPFC, and IPL. These hubs traditionally localize to the DMN or executive control network (see Cole, Pathak, & Schneider, 2010). Perhaps more interesting, certain hub regions may participate in more than one network. The ventral part of the PCC, for instance, shows strong functional connectivity to the DMN, whereas regions in the dorsal PCC show high connectivity to the frontoparietal network (Leech, Braga, & Sharp, 2012). Such fine-grained distinctions have also been established in the TPJ. For example, Scholz, Triantafyllou, Whitfield-Gabrieli, Brown, and Saxe (2009) used high-resolution imaging to show that, within the right TPJ, there is a 6- to 10-mm spatial displacement between the activations for representational ToM and attentional reorienting. Mars et al. (2012) then showed that the TPJ can be structurally divided into three subregions, each demonstrating differential functional connectivity to brain regions in the default mode, frontoparietal, and salience networks. 
This degree of distinction may be less pronounced once key task confounds are controlled, with an overarching function of the TPJ in addition to some specialization (Özdem, Brass, Van der Cruyssen, & Van Overwalle, 2017). Recent reviews have also pointed to the IPL as a hub that participates in a broad range of cognitive functions, from bottom-up perception to higher-order abilities such as ToM and EF (Igelström & Graziano, 2017). Others have suggested that some hubs have a heterogeneous quality, participating in different functional networks (Tomasi & Volkow, 2011).

These studies on neural hubs offer a unique opportunity for future research to expound the nature of the ToM–EF relationship. First, continued use of high-resolution imaging techniques is critical to fleshing out the degrees of structural/functional specificity for ToM and EF. Paired use of diffusion imaging, to delineate structural connectivity, and fMRI/fNIRS, to describe functional properties, should lead to increased precision in differentiating among the regions/networks dedicated to ToM and EF. Failure to interrogate these regions at a detailed level of analysis may hamper efforts to uncover the structural and functional specificity of brain regions and their attendant networks in supporting ToM and EF.

Moreover, quantifying the extent to which certain neural networks/hubs are activated in relation to ToM and EF tasks may help inform the directionality of the ToM–EF relationship. One approach is to map the neural co-activation patterns during EF and ToM tasks separately within the same participants (i.e., to map their CFNs), and then determine whether the covariance pattern elicited from one cognitive ability more strongly predicts performance in the other ability. Using functional connectivity methods, a related approach could be applied in which RSNs are first established using standard resting-state methods and then used to predict both ToM and EF performance outside the scanner (e.g., Kanske, Böckler, Trautwein, & Singer, 2015; Reineberg, Andrews-Hanna, Depue, Friedman, & Banich, 2015). Not only would this method validate the differential involvement of RSNs in particular cognitive tasks, but it would also enable an assessment of whether the RSNs conventionally assigned to one set of abilities (executive control vs. internal mentalization) more strongly predict performance in the other cognitive domain. For instance, if both DMN and frontoparietal activity independently predicted ToM, but only frontoparietal activity predicted EF, this would provide RSN evidence of a directional EF→ToM relationship. In addition, this method would complement the literature examining the interaction between RSNs in supporting particular abilities; that is, it is possible that network interactions are stronger for one ability than for the other. Perhaps the DMN and the frontoparietal network (or hubs within them) interact more strongly during ToM than during EF. Such a finding would suggest that EF modules are required for ToM to a greater extent than the reverse, again underlining an EF→ToM directional relationship. 
As an example, it has been shown that increased cooperation between the DMN and frontoparietal regions is associated with more rapid memory recollection (Fornito, Harrison, Zalesky, & Simons, 2012), with the PCC acting as a connectivity hub facilitating efficient processing. This sort of analysis suggests that the traditionally prescribed, and sometimes competing, brain regions/networks may have heterogeneous and integrative functions (Allen et al., 2014; Cocchi, Zalesky, Fornito, & Mattingley, 2013; Power, Schlaggar, Lessov-Schlaggar, & Petersen, 2013). At a whole-brain level, examining how connected frontoparietal, salience, and DMN regions are to the rest of the brain may indicate the extent to which mentalization versus executive control modulates other cognitive functions. Reineberg and Banich (2016) recently showed that nodes both within and outside the frontoparietal network are associated with individual differences in common EFs. They also identified potential hubs, including the precuneus, inferior temporal gyrus, and frontal pole, the latter of which was proposed to support cognitive representation of actions and plans. Should this cortical hub, which involves a gradient for representing cognitive and emotional states, emerge in future studies as being associated with EF, this might provide evidence that representing mental states supports EF (ToM→EF). More generally, discovering how connected particular nodes/hubs are to the rest of the brain will help determine how accessible other regions are to higher-level modulation by either representation/reflection (ToM) or control (EF), thereby expounding the relative influences of ToM and EF on each other, and on other cognitive abilities. 
Complementary use of methods to assess structural connectivity and white matter tract integrity across regions will also improve our understanding of how particular brain regions/hubs work together in supporting ToM and EF, and the degree to which their association is due to common versus disparate mechanisms, or reciprocal versus directional influence.
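
To make the logic of the cross-domain prediction approach concrete, the following is a minimal sketch in Python; it is not an analysis from any of the studies cited. It assumes that each participant already has a summary connectivity score for the DMN and for the frontoparietal network, along with behavioral ToM and EF scores. The variable names (dmn, fpn, tom, ef), the helper standardized_betas, and the simulated effect sizes are all hypothetical and were chosen only to mirror the hypothetical pattern raised in the discussion of RSN-based prediction.

```python
import numpy as np


def standardized_betas(X, y):
    """Return OLS slopes after z-scoring the predictors and the outcome."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    design = np.column_stack([np.ones(len(yz)), Xz])  # add an intercept column
    coef, *_ = np.linalg.lstsq(design, yz, rcond=None)
    return coef[1:]  # drop the intercept


# Synthetic data only: these effect sizes are assumptions, not empirical values.
rng = np.random.default_rng(0)
n = 1000
dmn = rng.normal(size=n)  # hypothetical DMN connectivity score
fpn = rng.normal(size=n)  # hypothetical frontoparietal connectivity score
tom = 0.5 * dmn + 0.5 * fpn + rng.normal(scale=0.5, size=n)
ef = 0.7 * fpn + rng.normal(scale=0.5, size=n)  # no DMN contribution by design

X = np.column_stack([dmn, fpn])
beta_tom = standardized_betas(X, tom)  # [DMN weight, frontoparietal weight]
beta_ef = standardized_betas(X, ef)

print("ToM ~ DMN, FPN:", np.round(beta_tom, 2))
print("EF  ~ DMN, FPN:", np.round(beta_ef, 2))
```

In this simulated case both networks carry weight for ToM, whereas only the frontoparietal score carries weight for EF. With real data, such a comparison would additionally call for out-of-sample validation and interval estimates before any directional interpretation is drawn.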

Summary, limitations, and future directions

The present review drew on evidence from cognitive neuroscience to examine and help explain the developmental relation between ToM and EF. Three theories to explain this association are that (1) ToM relies on EF (EF→ToM), (2) EF relies on ToM (ToM→EF), and (3) ToM and EF are related through a shared neural system that supports both abilities (ToM↔EF). The key findings from this review are that (1) behavioral measures of ToM and EF are indeed robustly interrelated during childhood, both cross-sectionally and longitudinally, with the preponderance of research favoring the EF→ToM relationship over the reverse directional link; (2) evidence from neurodevelopmental, neurodegenerative, and neuropsychological lesion studies provides support for the notion that ToM and EF are at least partially separable in the brain, but also demonstrates considerable overlap; (3) studies of normative brain development support the coincident and rapid maturation of ToM and EF over the first 5 years of life, though few studies have explicitly mapped measures of both ToM and EF onto measures of structural or functional neural development within the same sample; (4) functional connectivity analysis suggests the possibility that the intrinsic, resting-state networks traditionally associated with ToM may developmentally precede those supporting EF, with increased integration between networks over time; (5) studies examining cognitive function networks generally converge on the conclusion that ToM and EF rely on distinct but also shared neural circuits, particularly in the mPFC, IPL, TPJ, and IFG; and (6) certain brain regions may serve as hubs that are both structurally and functionally linked to other regions that “assemble” together to support both ToM and EF, and such hubs may support network interactions that explain the correlation between ToM and EF over childhood. 
Future use of RSN, CFN, and structural connectivity methods will help illuminate exactly why ToM and EF are associated and the degree to which these modulate one another and other cognitive proficiencies over the life course.

As of this writing, it appears that only the strict version of the ToM↔EF hypothesis, in which ToM and EF rely on completely overlapping neural structures, can be ruled out with confidence. However, stringent versions of the directional EF→ToM and ToM→EF proposals, in which one of these abilities is sufficient for the other, also seem unlikely. Rather, it appears that ToM and EF share some overlapping neural structures, while also recruiting regions that are specially designed for mental representation and executive control, respectively. Shared regions, which may operate as neural hubs, could partially explain the robust correlation between ToM and EF across childhood. Moreover, the extent to which these hubs and other nodes within ToM and EF networks are structurally or functionally connected, or the degree to which the networks cooperate across tasks, might help explain the existence of directional relations. Future research will be needed to disentangle these possibilities, with continued development and use of high-resolution imaging techniques bringing this enterprise within reach.

The highly replicable behavioral finding of an EF→ToM link suggests that performance on several ToM tasks involves neural mechanisms for response selection, inhibition, attention, and working memory (Saxe, Moran, Scholz, & Gabrieli, 2006a; Saxe et al., 2006b). However, the relative profusion of evidence in favor of this directional link may be due to a lack of empirical inquiry about competing models. Specifically, the ToM→EF proposal has been largely ignored in the adult literature, perhaps owing to the difficulty of systematically varying the level of mental state reasoning in order to examine the effects on EF. Two methodological approaches that may be useful in this context have been pioneered by Apperly and colleagues. The first is to manipulate the degree of congruence between self and other perspectives in a visual perspective-taking paradigm (see Qureshi, Apperly, & Samson, 2010), and the second is to manipulate psychologically relevant ToM parameters (e.g., true vs. false belief; Hartwright, Apperly, & Hansen, 2012). Using the latter approach, Hartwright et al. showed that variation in the valence of belief and desire recruits neural regions regularly implicated in ToM (TPJ and mPFC), but also recruits nodes involved in EF (dmPFC, including dorsal ACC). This finding is commonly interpreted as EF (e.g., perspective inhibition) supporting ToM (EF→ToM). Indeed, it may be the case that increased coupling of DMN and frontoparietal regions is required for goal-directed social cognition (see Igelström & Graziano, 2017). Using these methods, directional conclusions could be strengthened by measuring ToM and EF in the same study and examining their mediational effects on each other. For instance, inhibition (measured behaviorally) may mediate the link between observed neural activity in regions such as ACC and false belief reasoning (EF→ToM). 
Alternatively, the heightened activity in the ACC during false belief reasoning may reflect a brain state that is primed for executive processing. In this case, ToM may mediate the effect of ACC activity on EF (ToM→EF). Future studies that systematically vary EF- and ToM-based parameters and examine the effects on performance in the other domain, and that investigate these effects at the level of neurobiology, are ripe avenues for developmental and cognitive psychologists hoping to further delineate the influences of EF and ToM on one another’s development.
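
The mediational logic described here can be illustrated with a minimal sketch in Python, offered under stated assumptions rather than as a method used in any of the studies cited. It estimates a simple indirect effect (the product of the ACC→inhibition path and the inhibition→false belief path, controlling for ACC activity) with a percentile bootstrap interval. The variable names (acc, inhibition, false_belief), the helper functions, and the simulated path strengths are hypothetical; the competing ToM→EF model would be examined by exchanging the roles of the mediator and the outcome.

```python
import numpy as np


def ols_slopes(X, y):
    """OLS slopes of y on the columns of X (an intercept is added, then dropped)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate of the x -> m -> y indirect effect."""
    a = ols_slopes(x, m)[0]                         # path a: x -> m
    b = ols_slopes(np.column_stack([m, x]), y)[0]   # path b: m -> y, controlling x
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample participants with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    return np.percentile(estimates, [2.5, 97.5])


# Synthetic data only: path strengths are assumptions, not empirical values.
rng = np.random.default_rng(1)
n = 1000
acc = rng.normal(size=n)                                   # hypothetical ACC activity
inhibition = 0.6 * acc + rng.normal(size=n)                # hypothetical EF mediator
false_belief = 0.5 * inhibition + 0.1 * acc + rng.normal(size=n)

ab = indirect_effect(acc, inhibition, false_belief)
ci_low, ci_high = bootstrap_ci(acc, inhibition, false_belief)
print(f"indirect effect = {ab:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

An interval that excludes zero is consistent with mediation in this simulated example, but with cross-sectional data a significant indirect effect under one ordering does not rule out the reverse ordering, which is why both models would need to be fit and compared.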

Although the discrete brain regions implicated in ToM and EF have been fairly well described, we still know relatively little about the structural and functional connectivity and interregional/network dynamics supporting these abilities. The extent to which ToM and EF nodes feed into networks supporting the other ability, or the degree to which relatively separable networks interact during ToM and EF tasks, is a burgeoning field that will benefit from continued multimodal research that integrates across RSN, CFN, and structural connectivity approaches. The use of complementary methods such as TMS may also be informative. As an example, TMS applied over the dlPFC has been shown to induce a selective effect on cognitive aspects of ToM (Costa, Torriero, Oliveri, & Caltagirone, 2008; Kalbe et al., 2010), and has been shown to impair EF in separate studies (Ko et al., 2008; O. A. van den Heuvel, Van Gorsel, Veltman, & Van Der Werf, 2013). Given the purported role of dlPFC in both ToM and EF, disruption of this nodal structure may simultaneously interfere with the functioning of the networks subserving both ToM and EF. Is dlPFC a common neural structure that supports both ToM and EF? Is it a hub that connects various regions underlying both abilities? Is the effect on ToM explained by disruption in executive skills such as perspective inhibition, or is interference with mental state representation driving impairments in EF? Future use of within-subjects designs that simultaneously measure ToM and EF using TMS approaches may shed light on the specific role of this and other structures in the development of these cognitive abilities.

Another area for future research is to apply sound theoretical models to understand how componential processes may explain the shared neural basis of ToM and EF. These may be lower-level perceptual and sensory functions (e.g., see Stone & Gerrans, 2006), but may also involve intermediate cognitive (endo)phenotypes. Candidate processes that may account for such an overlap are the allocation of attention, processes for comparing internal predictions to external stimuli, and response suppression/selection. As we alluded to above regarding network synchronization during the first 2 years of life, the allocation of attention may be one process that underlies both EF and ToM (see Corbetta et al., 2008). For EF, the allocation of attention may be particularly relevant due to the need for monitoring and interrupting automatic responses in order to provide a behaviorally appropriate but conflicting response. In that sense, attention to salient external stimuli that dictate response selection should be considered a key componential process. For ToM, tracking key events (e.g., type of object, location transfer, or the presence or absence of a protagonist) or monitoring multiple simultaneous beliefs (e.g., true vs. false beliefs) would also place demands on attentional resources. The seminal role of attention in both ToM and EF is plausible on the basis of the abovementioned evidence that the dorsal attention network develops faster than the networks typically recognized to underlie ToM and EF, with connectivity changes between certain networks over the first 2 years of life (Gao et al., 2015). Interestingly, Santiesteban, Kaur, Bird, and Catmur (2017) recently used disruptive TMS over the right TPJ to show that domain-general attentional processes may mediate the ability to take another’s visual perspective during self-perspective judgments. 
Future use of TMS during ToM and EF tasks that vary in their attentional demands may illuminate the degree to which such attentional processes are important in facilitating ToM and EF performance.

Furthermore, mental processes for comparing internal predictions to external stimuli may be needed for both EF and ToM (Decety & Lamm, 2007; Spengler, von Cramon, & Brass, 2009). For executive inhibition, this involves a comparison between generated expectations and actual external stimuli that may require internal models to be inhibited, shifted, or otherwise manipulated in order to successfully complete an EF task. For ToM, one must compare an internal mental model of what one actually knows to that which is externally (or subjectively) known by another; that is, one’s own beliefs, desires, emotions, or intentions need to be compared to another person’s in order to successfully complete ToM tasks. Thus, a shared neural network for ToM and EF might be partially explained by the recruitment of common brain regions that support constituent processes needed for both abilities.

Finally, theoretical models posit that three other cognitive faculties may underlie ToM and EF, and thus may potentially explain their functional overlap. The first is language ability. For ToM, communication systems may foster the internalization of multiple perspectives that are borne out in social interactions, thereby facilitating the representation of others’ mental states (Fernyhough, 2008). For EF, language may scaffold children’s ability to control thoughts and behavior by internalizing words, gestures, or other semiotic cues that have been used to regulate the child’s behavior, or that the child has used to influence others’ behavior (Fernyhough, 2010). From this perspective, language may facilitate the verbal representation and reasoning about mental states required for ToM, as well as the capacity for verbal self-regulation and self-monitoring that typifies EF (Müller, Jacques, Brocki, & Zelazo, 2009). Interestingly, language ability is known to be supported by a neural mechanism involving the TPJ (Binder et al., 1997; Perner & Aichhorn, 2008). Thus, the link between language and EF/ToM may not be simply task-driven (i.e., high language demands). Instead, language may play a functional role in ToM and EF development, and accordingly, the relatively circumscribed neural circuits supporting language ability may feed into the networks that support ToM and EF.

Another basic process that may be shared by ToM and EF, and thus explain their overlap, is self-consciousness or self-awareness. With regard to ToM, an awareness of the objectivity of one’s own body, thoughts, and experiences is a necessary precondition to differentiate self from other, thus enabling the partitioning and ascription of mental states to self and other. For EF, the capacity to control or inhibit thoughts and behavior relies on the ability to understand that one is capable of exerting this kind of control, thus demanding self-awareness (Lang & Perner, 2002). Indeed, a recent meta-analysis of fMRI studies revealed that self-recognition and false belief understanding share overlapping regions in the mPFC (van Veluw & Chance, 2014). As we suggested above, mPFC may be a nonspecific region supporting both ToM and EF. Thus, self-conscious awareness may be another componential process supporting ToM and EF, and may partially explain their neural and behavioral overlap.

A third and related process that we propose may explain the ToM–EF correspondence is secondary representation. This process precedes the capacity for meta-representation, which is believed to underlie belief reasoning and ToM, but follows primary representation, in which children represent the world in strictly literal terms (Perner, 1991; Whiten & Suddendorf, 2001). Secondary representation emerges in the second year of life, at which point children are believed to be able to consider multiple mental models simultaneously. In turn, this capacity to hold in mind two (possibly conflicting) representations of the world supports a suite of abilities, such as pretend play, cooperation, empathy, and joint attention (Moore, 2007). The capacity to represent the intentions of oneself and others may be of considerable importance here (Moore, 2007; Tomasello, 2001). Neuroimaging studies examining secondary representation are scarce and are often conflated with the notion of “meta-representation.” However, Critchley, Wiens, Rotshtein, Öhman, and Dolan (2004) suggested that secondary representation is supported within PFC and the cingulate cortices, with a particularly prominent role of mPFC (see also Amodio & Frith, 2006). Self-consciousness and secondary representation also relate to the concept of agency—the experience of oneself as the generator of thought and action. With this level of understanding, children may begin to successfully differentiate self from other and, by extension, begin to represent multiple mental representations simultaneously, while also understanding the causal effects of mental states on behavior. This has clear implications for the reasoning about mental states that underlies ToM and the controlling/inhibiting of cognition that contributes to EF. Notably, the mPFC is crucially involved in agency (David et al., 2006; David, Newen, & Vogeley, 2008), as is the right TPJ (Decety & Lamm, 2007).

To understand the contributions of these componential processes to ToM and EF development, these domains themselves require a more nuanced theoretical analysis. For example, although models of attention are abundant, it is not clear which dimension(s) of attention may be especially involved in ToM and EF (selective attention, divided attention, attentional (re)orienting and disengagement, or attentional shifting). Indeed, these dimensions of attention are not completely overlapping, either behaviorally or at the level of the brain (see Posner & Rothbart, 2007, for a review). It is also plausible that integration across attention-regulating systems underlies the capacity for incipient social–cognitive abilities such as joint attention, which have been shown to predict later mental state reasoning and global measures of EF tapping inhibition, working memory, and cognitive flexibility (Wade et al., 2016). Attention to internal representations and coordination of attention systems may therefore facilitate children’s capacity to monitor representations of their own and others’ goal-directed behavior, thus laying a foundation on which more sophisticated cognitive abilities can mature (Mundy & Newell, 2007). Similarly, for response suppression/selection, theories are needed that predict how certain responses are selected over others and what implications this has for ToM and EF (and indeed for other cognitive systems). Here, it may be worth differentiating response selection into perceptual (i.e., selecting from the external world) and conceptual (i.e., selecting from internal representations) domains (Kan & Thompson-Schill, 2004). 
Developmental models of how such a system biases the selection of responses based on competing perceptual or representational stimuli—for instance, by representing behavioral goals (Yantis, 2008) or action values based on reward benefits and response costs (Rushworth, 2008)—will be helpful in determining how such processes relate to more complex executive and mentalization abilities.

Finally, if mental processes for comparing internal predictions to external events support ToM and EF, what exactly is the character of this mismatch detection mechanism? How are internal predictions generated (on the basis of prior beliefs or experience; means–end or teleological reasoning; or some other generator)? And how do external stimuli become integrated into existing mental models to prompt the type of conceptual change that allows a child to know that others may possess information about the world that is different from their own, and that this information may be false (as in ToM)? Thus, although a constellation of domain-general cognitive functions may be recruited in the service of both ToM and EF and may help explain their behavioral and neural overlap, it is also true that these foundational abilities deserve further theoretical and empirical attention in their own right.

Conclusion

Converging evidence across subfields of developmental and cognitive neuroscience suggests that ToM and EF share some underlying neuroanatomical mechanisms, supporting the ToM↔EF proposal. These neural networks and their attendant nodes are not completely overlapping, however, which effectively rules out a strict version of this proposal. Moreover, brain regions generally involved in EF also appear to be recruited during mental state reasoning, and systematic manipulation of EF has consequences for ToM performance in both children and adults, supporting the EF→ToM directional relationship. Although support for the ToM→EF link is comparatively limited, this may reflect a paucity of empirical studies as opposed to null findings. Furthermore, there is some evidence using RSN methods that networks generally implicated in ToM may ontogenetically precede and increasingly interact with the networks supporting EF over the first 2 years of life, and other evidence pointing to critical nodes within traditional ToM networks that may serve as hubs that integrate and process information across several brain regions. Thus, the ToM→EF proposal deserves further scrutiny. New imaging techniques and experimental methodologies, especially those adapted for use with children, will be essential in elucidating which of these models best explains the developmental link between ToM and EF. Indeed, consistent with intervention studies in children that have shown robust bidirectional facilitatory effects of ToM and EF on each other (Kloo & Perner, 2003), both common and discrete neurological mechanisms may be operative in the maturation of ToM and EF, and these mechanisms may scaffold and reinforce each other over the course of development. 
More nuanced theories about how basic cognitive processes such as attention, language, self-consciousness, secondary representation, and agency explain the functional overlap between ToM and EF will help integrate the fields of cognitive neuroscience and developmental psychology and shed light on how shared and unique neural mechanisms contribute to these indispensable cognitive faculties across the lifespan.