1 Introduction

One central debate in cognitive science concerns whether mental representations are symbolic or perceptual. Many terms have been used to refer to the two alternatives of this distinction. In recent times, much attention has been given to the opposition between modal and amodal representations (Kaup et al., 2022). In this paper, I will focus on concepts, which are usually conceived by cognitive scientists as one such kind of representation (Margolis & Laurence, 2007), stored in long-term memory and employed in higher cognition, where they stand for categories and linked bodies of knowledge (Machery, 2009, p. 12). The notion of a concept itself is not devoid of controversy. Philosophers, for example, often adopt a different definition, which sees concepts not as cognitive contents encoded by the brain, but as mind-independent entities (Shapiro, 2019, chap. 4). This difference springs from the fact that cognitive scientists are interested in the role concepts play in our cognitive processes, while philosophers are usually interested in them as components of propositional attitudes (beliefs, desires, and so on), and adopting one definition or the other leads to different questions and research programs (McCaffrey & Machery, 2012). As it is arguably the most useful definition for cognitive scientists, in my use of the term “concept” I will stick to Machery’s definition and point out when other authors use definitions which are significantly different. [Footnote 1]

The focus of this paper is on a debate between a position according to which concepts are modal representations and the opposing idea that concepts are amodal representations. According to the former, which is also dubbed “concept empiricism” (Prinz, 2002) or “neo-empiricism” (Machery, 2007), concepts are couched in a perceptuomotor modality. This has been the commonsense view in Western thought from at least the times of the British empiricists, having links also with behaviorism and logical positivism (Prinz, 2002, chap. 5). In contrast with this, concept amodalism, the idea that concepts are symbolic representations endowed with language-like properties, has been the default view in standard cognitive science and is deeply linked with Fodor’s (1975) Language of Thought hypothesis. In the following, I will refer to the two theories respectively as concept modalism and concept amodalism (modalism and amodalism for short) and to their theorists respectively as modalists and amodalists. I will avoid referring to modalism as a kind of empiricism, as this point has proven to be highly controversial given the profound differences between modern modalism and empiricism (Barsalou, 2016). In any event, modalism in its current form is viewed as being consistent with the embodied and situated cognition paradigms (see Clark, 1998), and has been criticized by authors who identify reasons to believe that concepts are in fact amodal.

Both modalists and amodalists have collected evidence in favor of their respective models, but controversy persists about how to evaluate this evidence. My intent is to show how the lack of agreement in the debate on the format of concepts derives from theoretical weaknesses in the criteria employed to distinguish modal from amodal representations in the first place, weaknesses which ultimately stem from a lack of clarity about the definitions of key terms. In doing so, I aspire to lay out some preconditions which a theory on the format of concepts should satisfy. I will argue that when these requirements are not met, the theory is unable to properly assess whether the empirical evidence favors modalism or amodalism. Prior to answering the question of whether concepts are modally or amodally represented, one should define what modal representations are. A host of criteria have been proposed in the existing literature. My review of them does not aspire to be exhaustive. However, I follow authors like Haimovici (2018) in selecting the isomorphism criterion, the neural location criterion, and the input specificity criterion as the main ones. The discussion of the isomorphism criterion is attached to the presentation of modalism and amodalism in §2. After a review of the empirical evidence in §3, I will discuss the neural location criterion (§4) and the input specificity criterion (§5).

In particular, I will show how an ambiguity about what can be considered to be “real” conceptual processing makes it harder to evaluate the neuroscientific evidence, as it is always possible to reinterpret either the activation in modal areas or the activation in arguably amodal areas as secondary. Furthermore, an evaluation of the brain activations during conceptual processing should be preceded by a clear definition of what counts as a perceptual area in the first place, a definition which is problematic. One can in fact define as perceptual what responds specifically to one modality and as amodal what responds to different modalities. The problem with this approach is that many representations which are commonsensically considered to be perceptual would end up being amodal, dissolving the debate in an unproductive manner. Finally, in §6, I offer some suggestions on how the debate on the format of concepts can advance beyond the recent impasse. In particular, I suggest adopting a broad definition of “concept” and a more graded view on the multimodal/amodal distinction.

2 A classification of theories on the embodiment of concepts

The debate on embodiment is not an either/or question, as current theories are placed on a spectrum (Chatterjee, 2010). One of the most influential classification attempts, and the one that will be adopted in this paper, was made by Meteyard et al. (2012). They employ a four-term classification in which theories are categorized as unembodied or belonging to three forms of embodiment: secondary, weak, and strong. In relation to concepts, we can note how unembodied theories and theories belonging to secondary embodiment mostly subscribe to amodalism, while forms of concept modalism naturally follow from weak and strong embodiment.

2.1 Amodal theories

Concept amodalists hold that we possess a conceptual representational system which is distinct from our perceptual representational systems. Concepts are encoded in this specific representational system, which unites sensory and non-sensory information about any given category into a single distinct language-like representation (Machery, 2006). Amodalists posit that the arbitrariness of cognitive symbols is necessary to account for the systematicity and productivity of thought, as these properties seem to require symbols which can be freely recombined.

According to unembodied theories, sensorimotor information plays no role in conceptual representation, to the point that these theories predict that no impairment in semantic processing would result from a disruption of sensorimotor systems (Meteyard et al., 2012). Concepts according to these theories are amodal representations, that is, representations with a structure which is different from the structure of the things they represent (Kaup et al., 2022). Although amodal representations need not be linguistic, propositional representations are usually held to constitute the quintessential kind of amodality. This is because, while words present themselves to our cognition in a perceptual modality (we read them with our visual system if they are written or we hear them with our auditory system when they are spoken), the contents are not presented in any sensory modality. At least apparently, the word “cat” leads to cats without passing through sensory intermediaries (Shapiro, 2019, p. 85). Consequently, linguistic representations are also arbitrary, as there is no necessary link or isomorphism between words and their meanings (as was historically noted by Saussure, 1966/1916). In much the same way, while amodal representations in general can encode information which is derived through one or more sensory modalities, they represent meaning irrespective of the sensory modality that was involved in the perception of the stimulus (hence, they are amodal) and they contain no information which would tell us what they refer to (hence, they are arbitrary) (Louwerse, 2018).

According to theories classified by Meteyard and colleagues as belonging to secondary embodiment, concepts are amodal, but there is no hard boundary between them and sensorimotor representations. In fact, there is a necessary relationship between concepts and sensorimotor content: either concepts are derived from sensorimotor inputs (as held by Patterson et al., 2007) or they are instantiated through sensorimotor information (as proposed by Mahon & Caramazza, 2008). Secondary embodiment theories hold that concepts are amodal but connected to sensorimotor information, so that proper semantic activation does not overlap with the sensorimotor areas of the brain.

Both unembodied and secondary embodiment theories are compatible with the ideas of classical cognitive science, according to which cognition consists of computations over amodal and arbitrary symbols and perceptual organs are just peripheral devices, whose connection to central cognition is what allows the mind to have access to the world of objects to which its symbols refer (for an early exposition, see Craik, 1943; for an in-depth analysis, see Pylyshyn, 1985). According to the standard view, which has been dubbed “the sandwich model” (Hurley, 1998), the relation between the mind and behavior can be described as a process of translation from external processes into the internal symbolic code (perception) and from the symbolic code to external processes (action), while the “meat” cognitive scientists should be interested in is what happens in between perception and action, that is, the transformation, based on explicit rules, of this symbolic code into another set of symbols, according to which the organism’s behavior must be organized. Perceptual organs and the corresponding perceptual areas of the brain, with their modality-specific information, are just input–output systems, while actual cognition takes place over a code which has the language-like properties of arbitrariness and amodality and which has thus been called the “language of thought” (Fodor, 1975).

2.2 Modal theories

The traditional view of cognition has been criticized on the ground that it is not clear how the symbols which constitute cognition can refer to anything in the outside world (Harnad, 1990; Searle, 1980) and it has also been accused of positing processes which would be too slow and cumbersome for the time-pressured and ecologically bounded environment the human mind operates in (Clark, 1999). According to embodied cognitive science, cognition is not so much about symbolic representation as it is about real-world action, and it crucially depends upon perceptual and motor systems. Although some authors have brought these ideas to an anti-representational extreme (as, for example, Varela et al., 1991), most scholars agree that representations are still needed to account for the capacity of humans to think of things which are not present in the here-and-now. Embodied cognitive science thus faces the problem of reconciling its deflationary ideals with the need for stable representations; one of the main ways in which this problem has been addressed is through the notion of simulation. One of the interpretations given to embodied cognition is thus that sensory-motor systems help the mind in representing information and drawing inferences by simulating the relevant aspects of the physical world (Wilson, 2002).

Contemporary concept modalism crucially adopts this idea of simulation as the basis of representation and inferential reasoning. According to the theory, concepts are perceptual representations in the sense that each concept is composed of off-line simulations of the perceptual experiences one has had with the tokens of the category represented by the concept. The concept HAMMER, for example, consists of a set of perceptual representations (how hammers look, which movements to perform when using them and so on), so that to entertain the concept means to simulate the perceptual interaction with hammers (seeing them, using them et cetera). These simulations involve the reactivation of the same sensorimotor brain areas which are active during online perception and action, but they happen “subthreshold”, not leading to any form of action in the external world (hence, they are off-line). When hearing or using the word “hammer” we are employing the same areas which are responsible for our perception and interaction with hammers, although we are doing it in a subthreshold and unconscious manner. So, the world presents its content to us in a modal way even if we are not conscious of it. In this, contemporary modalism differs from traditional empiricism, as according to modalism concepts need not be conscious, nor do modal representations have to be consciously held mental images. Also, a modalist need not agree with the inherent anti-nativism of empiricism: it is one thing to debate the format of concepts, another to discuss whether they are innate or acquired. Furthermore, in stating that all concepts have a perceptual format, a concept modalist adopts a broader conception of what “perceptual” means, which includes sensory, motor, affective, and proprioceptive states (on these differences, see Barsalou, 2008).
Meteyard et al. (2012) distinguish weak and strong versions of modalism: according to the former, sensorimotor information, which can be partially abstract, at least in part constitutes conceptual representations; while according to the latter, conceptual processing is entirely dependent upon sensory and motor systems. Strong and weak versions of embodiment also yield different predictions. Strongly embodied theories predict activation of the primary sensory and motor cortices across all semantic tasks. Weakly embodied theories predict that not every tokening of a concept implies detailed simulations in the lowest sensorimotor levels. One theoretical advantage which modalists traditionally claimed their model held over amodal views of cognition is its parsimony. In the modalist paradigm, sensorimotor systems (which are seen as primary) account for cognition without the need to postulate a further set of amodal symbols. Modalists have argued that if all cognition can be concisely explained without recourse to amodal symbols, then, in the absence of positive evidence of their existence, we should not postulate them at all. In one of the early attempts to defend concept modalism, Barsalou (1999) remarks that “there is little empirical evidence that amodal symbols exist”.

Central to the criticism made by embodied cognition theorists against the traditional approaches is the idea that it is unclear how amodal symbols can refer to outside entities (Harnad, 1990). So, to define modal representations, supporters of modalism have often employed a criterion according to which a representation is modal if it presents an isomorphism between its format and its content, while amodal representations have an arbitrary relation to what they represent. Although the problem of the grounding of representations is indeed central to cognitive science, it should be noted that amodal symbols may be related to their referents by systematic mappings without the need for an isomorphism to their contents. On the other hand, modal representations do not make the problem of the detached nature of symbols less severe, as perceptual representations, even in the earliest stages of sensory processing and in the latest stages of motor planning, present information which is highly abstracted (Mahon & Hickok, 2016). So, the idea of modalists like Barsalou (1999) that modal representations present an isomorphism to their referents in the outside world may be insufficient to criticize the classical approach to cognition based on amodal symbols. Moreover, it might also fail to achieve the goal of identifying representations which are more grounded. [Footnote 2]

3 Discussion of the empirical evidence

Embodied theorists have furnished a wealth of experimental data in support of the perceptual character of concepts. As good reviews of this evidence already exist (see, for example, Barsalou, 2008; Fischer & Zwaan, 2008; Chatterjee, 2010; Kiefer & Pulvermüller, 2012; Pecher, 2013; and Dove, 2016), the following doesn’t aspire to be a fully-fledged review, but just to illustrate the main experimental designs that have been employed to support a modalist conception and the rationale(s) underlying them.

3.1 Behavioral and neuroscientific evidence and its interpretations

Behavioral studies (like Richardson et al., 2003; Meteyard et al., 2007; and Zwaan & Taylor, 2006) provide evidence that the processing of verbs referring to motion affects the capacity of people to recognize images in specific ways: verbs describing vertical motions (lift) affect the capacity to discern shapes at the top/bottom of the screen, verbs describing horizontal motions (push) have the same effect on shapes at the left/right, and motion verbs affect the ability to detect visual motion. Other studies (like Borghi et al., 2004) have found that simulated actions are part of the meaning of words: participants were faster to say that, for example, roof is part of car if they had to press an upper button first and were faster to say that wheel is part of car if they had to press a lower button (for similar effects see Glenberg & Kaschak, 2002). As one can understand the meaning of sentences in the absence of any movement, it has been argued that the time differences may have nothing to do with the conceptualization of actions and may be epiphenomenal (Chatterjee, 2010). It should also be noted that Kaschak et al. (2005) found that watching upward or downward motion slows semantic processing of sentences describing motion in the same direction, so congruency can lead to facilitation in some tasks and to interference in others (Willems & Francken, 2012).

Neuroscientific evidence for neural base reuse, that is, the activation of sensorimotor areas of the brain during the processing of sensory- and motor-related concepts (whether they are expressed by natural language or not), has been used to argue that the same brain areas which give us our percepts also give us our concepts, seemingly supporting the view that the latter are constituted by the former. Such reuse has been found in relation to the olfactory (González et al., 2006), the auditory (Kiefer et al., 2008), and the visual (Pulvermüller & Hauk, 2006) systems, as well as in the processing of words with a connection to affective states, like pain (Reuter et al., 2017), or rewarding processes, like eating (Goldberg et al., 2006; Simmons et al., 2005). The same effect has also been noticed concerning motor areas: conceptual processing of action-related words has been found to activate motor areas of the brain in an effector-specific fashion (Hauk & Pulvermüller, 2004; Hauk et al., 2004; Tettamanti et al., 2005; Willems et al., 2010). The effect also seems to go the other way around, from concepts to the processing of sensory data, which seems to be evidence for modalist theories insofar as they predict that a representation of a category (the concept) and the perception of that same category should rely on (partly) overlapping perceptual features, meaning that the recall of the concept of a certain category should facilitate the perception of objects of that class. There are studies which point exactly in this direction (Stanfield & Zwaan, 2001; Zwaan et al., 2002, 2004), showing that the early presentation of sentences that imply the orientation, shape, or motion of an object (e.g., “He hammered the nail into the floor” vs. “He hammered the nail into the wall”) affects the speed at which subjects process pictures in which the relevant dimension either matched or mismatched the sentence (e.g., the picture of a nail horizontally or vertically disposed).
Pecher et al. (2009) have shown that this facilitation effect persists even after a long (45 min) delay between the presentation of the sentences and that of the pictures. This seems to imply that concepts contain perceptual features and that they do so in a context-dependent manner (the precise features evoked by the presented sentences are those in which the picture processing facilitation is more evident).

It should be noted that many studies have found that areas related to conceptual processing are not isomorphic to those used in direct experience, but activation in conceptual tasks is often shifted to areas anterior to perceptual ones (for a review, see Chatterjee, 2010). These results are compatible with weakly embodied theories but are harder to reconcile with strongly embodied ones, as the former allow for concepts to be the result of an abstraction from direct sensorimotor information that can happen in areas located adjacent to the modality-specific cortical areas involved in direct experience (Meteyard et al., 2012). Also, Mahon and Caramazza (2008) have argued that neuroscientific evidence collected in favor of embodied cognition is compatible with the hypothesis that such activation cascades from disembodied concepts to the sensory and motor systems. Modalists have replied to the claim on the possible epiphenomenal nature of the sensorimotor activation by citing the speed of the activation of motor areas in the brain after the appearance of the word/sentence (Pulvermüller, 2005). They also noted how modulating brain activity in the motor areas has an effect on the causal sequence of processes which underlies language (Buccino et al., 2005; Pulvermüller et al., 2005), suggesting (Barsalou, 2008) that the motor activation is not epiphenomenal.

Another phenomenon which has been taken as supporting a modalist interpretation is the evidence that there exists a modality-switching cost associated with a property identification task. Building on a previous study by Spence et al. (2000), Pecher et al. (2003) presented the subjects with a series of pairs of nouns and predicates and had them judge whether the predicate was true of the object denoted by the noun; in each pair the first property was followed by a property that belonged either to the same sensory modality or to a different one. The results showed that participants were faster at making judgments about the second member of a pair when it involved the same modality as the previous one compared to the situation where the two predicates belonged to different modalities. The modalist interpretation of these results has been that if switching perceptual modalities imposes a cost on conceptual processing in terms of processing speed, then this implies that conceptual processing depends upon perception. Pecher and colleagues also argued that amodal theories are wrong as they do not predict such a switching cost. Amodal theorists have replied that the switching cost may arise from connections inside a set of amodal symbols (Louwerse & Connell, 2011). Under this interpretation, the fact that a predicate like “loud” enhances the speed of processing of the predicate “rustling” if compared to the predicate “tart” doesn’t depend on the fact that “loud” and “rustling” belong to the same sensory modality but on the fact that “loud” evokes a series of connections which favor the processing of subsequent related concepts compared to unrelated ones.
The connections evoked are not modal per se (an abstract concept would favor the processing of closely related concepts if compared to concepts which are distant from it): it is the closeness two concepts enjoy inside the linguistic context and not their closeness in terms of sensory modality which explains the modality switching cost. This clearly does not amount to a positive reason against modalism, and it does not establish that the switching cost depends on linguistic structure rather than on modality. It just shows that an alternative interpretation of the data on the switching cost is possible, so that the existence of this effect is not unequivocal evidence in favor of modalism.

3.2 Evidence from neuropathology

Neuropathological data on the embodiment of concepts is equivocal. There have been studies showing how patients with a range of conditions damaging their motor areas in the brain have problems with the processing of action words. These conditions include amyotrophic lateral sclerosis (Grossman et al., 2008), apraxia (Buxbaum & Saffran, 2002), motor neuron disease (Bak et al., 2001), Parkinson’s disease (Boulenger et al., 2008; Fernandino et al., 2012, 2013) and general lesions in the motor areas (Moro et al., 2008; Serino et al., 2010). The same seems to hold true for perception, with patients with damage to the auditory cortex having problems with the processing of sound-related concepts (Trumpp et al., 2013). However, classical models of apraxia distinguish production and recognition of actions (Rothi et al., 1991), and they have been confirmed by a series of recent studies (Johnson-Frey, 2004; Negri et al., 2007). Binder and Desai (2011) note that, contrary to the idea, endorsed by some versions of modalism, that conceptual processing is largely dependent on sensory-motor areas, conceptual deficits following impairments in sensory-motor areas tend to be subtle.

Moreover, there are cases in which neuropathology seems to provide evidence which supports amodal theories, calling into question Barsalou’s claim that there is no evidence for the existence of amodal symbols. One piece of such evidence comes from a neuropathological condition called semantic dementia (SD). The assumption made by amodalists is that if conceptual processing depends on perceptual processing, in the sense that there is a reuse of the same modality-specific brain areas that are active during perception, then we should assume that conditions which affect conceptual processing do so in a modality-specific manner, as they damage the perceptual areas which give rise to concepts in the first place. However, SD seems to behave differently from how modalists should predict. The condition is a rare variant of frontotemporal dementia caused by bilateral neurodegeneration of the anterior temporal lobe (ATL). The main symptom of SD is a decline in expressive vocabulary accompanied by deficits in naming and recognizing objects. The discussion about SD is particularly relevant in the format debate as this condition presents itself as specific damage to the concept processing system: in SD, damage to a circumscribed brain region causes a selective symptomatology as, until the very last stages, only semantic processing is affected (McCaffrey, 2015).

The amodalists argue that the idea that conceptual processing depends on perceptual processing would imply the prediction that SD patients present a modality-specific conceptual loss, while this is not the case. The modalists in fact should see conceptual knowledge as distributed across the perceptual systems, localized in different brain regions, and thus when such a modality-specific brain region is impaired by SD, only the modality-specific knowledge stored in that region should be affected. This would mean that the patients suffering from SD should lose access to a given perceptual modality across all concepts, while preserving the capacity to represent these same categories from other modalities: for example, they should lose the capacity to reenact visual experiences of dogs (and all other visible objects), while preserving the capacity to reenact auditory or olfactory experiences of dogs (and other objects), being thus able to represent the concept DOG from all but one impaired modality (Machery, 2016).

SD, however, does not seem to impair the conceptual system in the way predicted by modalists. First, SD patients lose conceptual knowledge across the various categories (so they can lose knowledge about some tools but not all of them, some animals but not all of them). Second, when they lose a concept, they lose access to it from all the modalities; they lose all feature knowledge (visual, auditory, tactile and so on) for the compromised concepts. Third, the knowledge of those same features for related concepts is spared. Simply put, the conceptual deficits caused by SD seem to be broadly cross-modal (Machery, 2016) and they have been called (McCaffrey & Machery, 2012) “modality-general, item-specific” deficits. SD thus seems to show that conceptual knowledge is not distributed in the brain; rather, it is the product of specific brain areas, and such knowledge is not organized according to perceptual features. Amodalists have used the features of SD to argue that the ATL is a center specific for amodal conceptual processing. In their model, the activation of perceptual areas in conceptual tasks does not constitute actual concept processing; rather, processing happening in the ATL does. It should be noted that SD symptomatology seems only to be a problem for strong accounts of embodiment: weakly embodied theories do not propose that a concept must be necessarily accompanied by lower-level sensorimotor information, as they envision the possibility of concepts resulting from processes of abstraction and convolution (see, for example, Michel, 2021). Whether these processes can account for the formation of all concepts is a complex question we will only partially touch on in Section 5. If the recent modalist accounts hold true, a concept may be recruited from higher-level representations without the necessary coactivation of lower-level sensorimotor data. The overall effect in pathology may be that of cross-modal deficits such as those observed in SD.
Note that such deficits are also predicted by hybrid accounts that will be reviewed in Section 6 of this paper.

4 The neural location criterion

4.1 Amodalist and modalist reinterpretations of the empirical evidence

As the empirical evidence seems to lend support to both modalist and amodalist interpretations, scholars supporting one framework or the other have tried to reinterpret the data which seemingly favors the opposing point of view.

One argument offered by amodalists such as Mahon and Caramazza (2008) to show that the neural base reuse evidence does not necessarily lead to a modalist model calls into question the embodiment theorists’ belief that an early somatotopic activation of the motor system implies that the amodalist view of concepts is wrong. Their opposing model, called the grounding by interaction hypothesis, concerns the activation of motor areas specifically, but the reasoning can be easily extended to data related to the activation of sensory areas in the processing of sensory-related concepts. They propose that the activation of motor areas cascades from disembodied concepts to the sensorimotor systems that interface with the conceptual systems. This may seem an ad hoc reinterpretation, but the authors argue that the phenomenon of activation cascading between qualitatively distinct levels of processing is documented elsewhere in cognition, as for example in speech production. Early theories suggested that each step in speech production (conceptual processing, lexical retrieval, phonological encoding) could not start if the preceding one had not finished (Levelt, 1989). However, further studies (Morsella & Miozzo, 2002; Navarrete & Costa, 2005) have suggested that, even though everyone agrees that the phonology of a word is distinct from its meaning, phonological processing is intertwined with lexical processing. Mahon and Caramazza thus warn that we should resist the temptation to take the fact that motor processing is intertwined with conceptual processing as evidence that the former is constitutive of the latter, as the activation of brain perceptual areas in the processing of perception-related concepts does not automatically show that the concepts themselves are modal. In fact, this activation may be the by-product of activation elsewhere in the brain.
They have furthermore posited that the activation of perceptual areas which goes along with conceptual processing does not constitute the concept; rather, it serves to ground “abstract” representations in the sensory and motor content which mediates our interaction with the world. Differently from a completely disembodied view of concepts, the grounding by interaction model suggests that the instantiation of a concept includes the retrieval of perceptual information, as such information “enriches” the concept; differently from a completely embodied view of concepts, perceptual information does not constitute the concept itself, as removing it does not lead to a loss of the concepts but only to concepts which are impoverished. The authors make an analogy between this view of concepts and syntactic structures. The syntactic structure of a sentence is not tied to the specific words through which the expression of the structure is realized, but the structure “wears” words, just like concepts “wear” perceptual information. I cannot have syntactic structures independent of words (“naked” syntax), but the same structure can be applied to different sentences, showing that it is independent of specific words. So, a “naked” concept, deprived of the accompanying perceptual information, is not useful to interact with the world, but the fact that each concept can be accompanied by differing perceptual information shows that concepts are independent of it. The sentences “the dog jumps over the chair” and “the dog walks under the chair” are accompanied by differing perceptual simulations, but the concept DOG, which is evoked in both, is the same.

A similar reinterpretation of the neural base reuse evidence comes in the form of Machery’s (2016) offloading hypothesis, according to which concepts are amodal, but we often manipulate perceptual and motor representations to solve tasks, offloading them from the amodal conceptual system to the perceptual ones. Perceptual representations are not constitutive of concepts but may be used when the conceptual system does not contain the information needed to solve a certain task or in the case of tasks which are solved more easily through offloading than through the sole involvement of the conceptual system (the perceptual representations are a heuristic). We possess two representational systems, one conceptual (and amodal) and the other perceptual (and modality-specific), and we employ the latter during tasks in which the former wouldn’t be as useful alone. Interestingly, in the attempt to evade the problem posed by the peculiar nature of SD deficits, modalists give an interpretation which has the same structure as the offloading hypothesis reinterpretation of the neural base reuse evidence. According to Kiefer and Pulvermüller (2012), conceptual knowledge is grounded in sensory and motor areas, while the ATL just serves to “facilitate” conceptual processing, without storing conceptual knowledge per se. The authors go on to argue that one may conceive the ATL as a “convergence zone” which integrates distributed modality-specific conceptual features in a common semantic space. Such an integration may be achieved by a “supramodal higher-level representation” which does not store content per se but guides the retrieval of stored information by “stabilizing” the activity of perceptual areas.

4.2 The circularity and arbitrariness of the neural location criterion

Mahon and Caramazza’s grounding by interaction model, Machery’s offloading hypothesis, and Kiefer and Pulvermüller’s reinterpretation of the activity in the ATL share a common structure: faced with evidence supporting a modal or an amodal interpretation, respectively, they state that the brain areas responsible for such processing are not actually involved in conceptual processing. These reinterpretations all spring from the idea that the concept debate can be solved by looking at the neural reuse evidence, as modal representations are distinguished from amodal ones by the fact that they are processed by the same areas of the brain which are involved in direct interaction with the world. One problem with this approach is that establishing what counts as a sensorimotor area of the brain is far from a simple task. This is not just an empirical problem: to know which areas are perceptual and which are amodal, one should clarify first what being perceptual means, but then the entire criterion of determining what is modal through neural location can be accused of being circular, as it does not provide us with a distinction between perceptual and amodal representations, but rather it presupposes one.Footnote 3 A further problem, highlighted, for example, by Haimovici (2018), is that these reinterpretations of the empirical evidence work with a definition of concept as a representation which is useful to solve certain high-level cognitive tasks (like categorizing and understanding language), but these tasks aren’t solved by employing concepts exclusively and no criterion is provided to distinguish systems that implement the conceptual repertoire from auxiliary systems. We currently seem to possess data suggesting the activation, during conceptual processing, of both areas usually conceived as perceptual and areas (like the ATL) which are taken to produce “supramodal” representations.
Even ignoring the problem of circularity, it is arbitrary to believe that the activation of perceptual areas is just an offloading (as Machery, 2016, does) just as it is arbitrary to think that the activation of the supramodal areas is just an auxiliary activation (as Kiefer & Pulvermüller, 2012, do). Any theory which aspires to make the neural location criterion work should avoid circularity by first proposing a theoretically and empirically sound definition of what “perceptual” means, in order to allow us to properly distinguish perceptual and amodal areas of the brain; while, in order to avoid Haimovici’s problem, it should provide an equally sound definition of “concept” such that what counts as properly conceptual processing and what is ancillary processing can be precisely determined prior to looking into the neural reuse evidence. To my knowledge no currently available theory defines these notions with the required precision, making the neural location criterion both circular in its theoretical foundation and empirically weak under its own premises. We see activation, during conceptual processing, of both areas which are traditionally considered to be modal and of areas which are traditionally considered to be amodal, making it arbitrary to exclude one or the other from constituting proper conceptual processing.

Both Mahon and Caramazza’s grounding by interaction theory and Machery’s offloading hypothesis lead to the somewhat arbitrary exclusion of sensorimotor areas from being a part of proper conceptual processing. In regard to the former, Michel (2021) notes how it presupposes a particularly anemic view of concepts: if perceptual information is what makes a concept interact with reality, why should one suppose that the “actual” concept is separate from it? What does a “naked” concept amount to? The risk is making cognitive contents in general irrelevant in the determination of concepts, relegating them to a secondary, non-conceptual role. If amodal representations are often accompanied by modal ones, and these do most of the cognitive work, the statement that they are not constitutive of the concepts seems arbitrary. The definition of concepts as something distinct from cognitive representations is perfectly valid but it is not useful in the context of cognitive science.

In their cascaded processing reinterpretation of the neural reuse data, Mahon and Caramazza (2008) draw an analogy between the relation between phonological processing and lexical processing on the one hand and that between motor processing and conceptual processing on the other. The analogy is motivated by the evidence we have that phonological and lexical processing intermingle. In the studies cited (Morsella & Miozzo, 2002; Navarrete & Costa, 2005), subjects had to name a picture of a target object (like “hammock”) while ignoring a distractor picture which was either phonologically related (“hammer”) or unrelated (“button”) to the target and were found to be faster in their task in the phonologically related condition. However, from the evidence of the relation between phonological and lexical processing, we shouldn’t conclude that the meaning of a word is constituted by its phonology. And so, if the analogy is correct, from the evidence that motor and conceptual processing intermingle, we should not conclude that motor aspects are constitutive of the concepts they are related to.

However, there is a difference between these two cases (Meteyard et al., 2012): in the case of the relation between phonological processing and lexical processing, we can suppose that the activation of the phonology of “hammer” facilitates the production of “hammock” by simple phonological similarity. That the similarity between the phonology of two words should help in the production of the phonology of one of the two is somewhat expected. But the production of the word “hammer” does not involve hammer use: there is no reason to believe that motor areas would activate during a picture naming task, as they seem not to be involved in it. The finding of such activation can lead us to two hypotheses: one is that there is some link between phonological and motor processing, while the other is that the motor system is always activated in an effector-specific fashion regardless of the task requirements. By adopting this latter hypothesis, we would explain away any evidence of the involvement of the motor system in any cognitive process unrelated to a motor output. Given that no reason is provided for why the motor system should be implicated in these tasks in the first place, any theory about the involvement of the motor system in semantic processing is compatible with such a hypothesis. Pulvermüller et al. (2005) have provided evidence that the application of TMS to the motor cortex disrupts semantic processing. This finding is compatible with Mahon and Caramazza’s hypothesis, as they proposed that the cascade of processing is reversed back from motor areas to lexical areas involved in semantic tasks. This tells us nothing about the nature of the representations which mediate such interactions, so the hypothesis that they might be amodal representations still stands.
However, in the absence of any explanation of why the motor system should interact with the lexical system, such a hypothesis is also perfectly compatible with a modalist interpretation, which also has the advantage of providing a hypothesis of why the motor system should be activated. So, the hypothesis of Mahon and Caramazza is compatible with both interpretations and with whatever empirical evidence is collected.

A criticism similar to that raised against Mahon and Caramazza’s grounding by interaction model can be brought against Machery’s offloading hypothesis. As noted by Haimovici (2018), if perceptual representations can be used to solve conceptual tasks which the conceptual system isn’t able to perform alone, why shouldn’t such representations be considered conceptual? Machery (2009, p. 12) defines concepts as “bodies of knowledge that are used by default in the processes underlying the higher cognitive competences”. Using this same definition and his conception of perceptual representations under the offloading hypothesis, we should conclude that such representations indeed qualify as concepts. Machery excludes the processing happening in brain areas he considers to be perceptual from being part of proper conceptual processing. Given that such activations are necessary, according to Machery himself, for conceptual processing, this exclusion seems arbitrary.Footnote 4 The same can be said in the case of Kiefer and Pulvermüller’s (2012) modalist reinterpretation of SD’s symptomatology. They arbitrarily exclude “supramodal” representations from constituting proper concepts and hold that areas they consider to be amodal are not producing concepts, while conceptual processing just happens in the modal areas. Even ignoring the circularity of the neural location criterion, their statements are problematic as they suffer from the same arbitrariness in relation to empirical evidence as Machery’s reinterpretation, but applied to the opposite set of data.

5 The input specificity criterion

A further criterion which has been employed to distinguish modal representations from amodal ones is the input specificity criterion, according to which modal representations are defined based on the inputs they can receive. Under this interpretation, each modal system responds to a specific class of inputs, while amodal systems can respond uniformly to stimuli of different modalities (Haimovici, 2018). This criterion is problematic, as it would make all concepts trivially amodal, dissolving the debate in an unproductive manner. No modalist author argues in fact that concepts should be reached from one specific modality, and it can be easily shown that even the simplest of concepts involve a certain degree of abstraction and convolution of modalities.

5.1 The approximate number system and interpretations of the process of abstraction

Take for example the debate surrounding the modal or amodal nature of our approximate number system (ANS). The ANS is the capacity we humans possess to intuitively and automatically assess the number of objects in a given scene (visual numerosity estimation) or the number of sounds in an auditory sequence (auditory numerosity estimation). The ANS is at the basis of more refined concepts of numbers, which, in turn, enter our mathematical skills. What has been noted is that both visual and auditory numerosity estimations obey Weber’s law: the level of inaccuracy in such estimations increases logarithmically with respect to the stimuli’s magnitude (Dehaene et al., 1998). Another important aspect of the ANS is that there seems to be no intermodal transfer cost involved in it: for example, if one is asked to count together percepts of differing modalities (adding together the sounds one hears in a sequence and the objects one sees in a scene) she will not be significantly slower than if she were adding together just sounds or visual elements (Barth et al., 2006; Izard et al., 2009). These phenomena are held as evidence that the ANS employs amodal representations, since its activity seems unaffected by the specific sensory modality of the incoming stimuli (Machery, 2016).

The claim that the evidence so far collected shows that the ANS uses amodal concepts has been disputed. Jones (2016) argues that this system shares many of the properties identified by Fodor (1983) as characterizing modules. Fodor himself believed that modular organization was the defining property of perceptual, rather than high-level, processing. So, if we were to agree with Fodor’s interpretation of modularity, we would ascribe the ANS to perceptual, rather than conceptual, mechanisms. Jones is aware that this reasoning isn’t really satisfying: scholars adhering to the massive modularity paradigm (Carruthers, 2006) believe that central cognition too can be characterized as a modular system, while the embodied mind theorists mostly believe that even perceptual systems are not modular. Even Fodor himself believed that there were some modular processes which were not perceptual, like syntactic parsing. However, there is also empirical evidence that Jones brings in support of his thesis: the macaque homologue of the human horizontal intra-parietal sulcus, which has been found to support numerosity estimates in our species, contains neurons which respond selectively to specific numerosities. The hypothesis is that these neurons function in a similar way to selective neurons in the visual system, such as edge-detectors or face-detectors. Furthermore, obedience to Weber’s law is usually taken as a property of perceptual systems rather than conceptual ones.

Jones argues that the ANS may be a specialized perceptual system with modular properties, dedicated to detecting the number of entities in a collection. As not all modular systems are necessarily perceptual ones, the apparent modularity of the ANS is not evidence that it employs modal representations. Furthermore, amodalists can object that the system described by Jones is not actually perceptual as it receives data from a variety of different modalities. Jones takes the fact that the system can receive inputs from different modalities as evidence that it is a multimodal system, while amodalists take the same evidence as evidence that it is an amodal one. It should be noted that the same holds true for concepts in general. The modalists necessarily need to posit some process of schematization and convolution of modalities in order to account for concepts which can be retrieved from different modalities, and which present a significant amount of abstraction. If these processes of schematization and convolution are considered as making the resulting concepts amodal, then the amodalists would be right in arguing that a properly modal system cannot explain abstractness. Haimovici (2018) comments that if the input criterion is correct, then the amodalist objection that the numerosity system, as described by Jones, is amodal would also be correct.

Machery (2016) has argued that the modalist idea that number estimation may be performed by a multimodal perceptual system rather than by an amodal one weakens the modalists’ position, as the parsimony argument can then be turned against them:

Neo-empiricists may respond that the system involved in numerosity estimation is a perceptual system, just not a modality-specific one; rather, numerosity estimation involves a multimodal perceptual system. The data reviewed above do not distinguish between this hypothesis and the claim that numerosity estimation is amodal, but this hypothesis comes with a theoretical cost for neo-empiricists: it considerably undermines the parsimonious nature of their approach – which was supposed to be a virtue of their approach – because neo-empiricists now need to appeal to multimodal systems that mimic amodal systems. This neo-empiricist response would be more compelling if neo-empiricists explained how multimodal systems are to be distinguished empirically from amodal systems, but this difficult theoretical challenge has not been met.

Machery objects to the modalist idea that whatever is explained by amodal systems can be explained by multimodal systems by saying that such an opinion weakens the parsimony argument – their strongest theoretical point. In fact, postulating the existence of multimodal concepts adds further complexity, making the modalists’ explanations just as costly as the amodal ones. It should be noted, however, that such criticism is not per se a sufficient reason to adopt an amodalist stance. Amodal systems are not by default the simplest option between them and multimodal ones, so there is nothing contradictory or problematic in the opinion that multimodal systems are inherently more parsimonious and that, consequently, amodal systems “mimicking” multimodal ones are unnecessary. Multimodal theories, in fact, do not presuppose more assumptions about the workings of a cognitive system than amodal ones do, so Occam’s razor per se is not a reason to affirm that an amodal model which accounts for the same data is automatically to be preferred – nor vice versa. In order to make the input specificity criterion work, one should clarify the notions of “multimodal” and “amodal”: as things stand now, under this criterion, every piece of empirical evidence which can be explained by an amodal system can also be accounted for (with an equal level of basic assumptions) by a multimodal system, so there is no way of determining which system is inherently more parsimonious theory-wise. Things are not going to change with further experimental investigations as the ambiguity of the terms “multimodal” and “amodal” will still make them apt for further re-interpretations of the empirical evidence. 
Machery challenges modalists to find an empirical criterion to distinguish multimodal systems from amodal ones, but I think that this challenge is based on a way to distinguish between modality and amodality which is in itself problematic, and that, consequently, modalists should not accept Machery’s challenge.

Note that this is not to say that what is multimodal and what is amodal is just a matter of stipulation. One important difference between the two notions is that amodality is categorical while multimodality is a graded notion (we can ask how many modalities are convolved, how much they are abstracted away from simple resonance with the world and so on). Asking whether cognition is embodied or not is no longer an interesting problem, as graded questions are more fruitful. One example of this may be the ANS itself. It has some properties which are usually linked with perceptual systems (obedience to Weber’s law and a largely modular organization) and properties which are linked with amodal systems (the lack of a modality switching cost). Given a shared definition of what perceptual means and a precise way to tell apart multimodal from amodal representations (definitions we currently lack), we would be able to assess how many perceptual properties the ANS shares. Currently, the ambiguity of the notion of “perceptual” makes it difficult to establish which kind of data would be decisive to conclude that the system is perceptual or not (or how many properties of the perceptual systems it shares). As we will see below, the input specificity criterion is not a good way to define perceptual systems, as it would make even systems which are usually considered to be perceptual fall under the category of amodal systems.

5.2 The definition of modal

As pointed out by Haimovici (2018), the definitions given to the notion of “modal” are either too permissive (everything becomes trivially modal) or too restrictive (everything becomes trivially amodal). Under the too permissive definition of “modal”, modal is all that responds to perceptual inputs. Multimodal and convolved representations would then still be modal representations, for the simple fact that they derive from sensation. However, if we adopt such a permissive definition of modal, modalism would be reduced to a modern restatement of the old empiricist motto nihil est in intellectu quod non fuerit in sensu (nothing is in the intellect that is not first in the senses). But that all information we receive must be somehow derived from perception is just a truism, so, under this definition, modalism would be completely uninformative. On the other hand, adopting the input specificity criterion, and thus defining “modal” as what responds specifically to single modalities and “amodal” as what responds uniformly to many different ones, leads to equally trivial consequences. Most representations, including sensations, can in fact be expressed by more than one modality, as most of them are tied at least to some linguistic expression. No modalist denies, for example, that reading the number “3” can lead to the same neural activation as seeing three apples. This is a clear case of multimodality of a neural activation related to number processing. If we adopt the input specificity criterion, the fact that this activation is the same in both cases should be enough to conclude that the concept THREE is amodal. If even this case of multimodality is to be considered by default as an instance of amodality, then modalism would be trivially false.

However, the fact that different modalities (whether linguistically expressed or not) can lead to the same neural activations seems to be a point in favor of, rather than against, modalism. Let us consider some evidence about our number sense. We now possess good empirical evidence that at the basis of our number sense there is the ANS (Jones, 2016). Studies also confirm an activation of areas related to finger movements when performing arithmetic tasks even if the fingers are kept still, suggesting that the mind is simulating counting on the fingers (Andres et al., 2007; Sato et al., 2007; Tschentscher et al., 2012). Furthermore, we have evidence that the cognitive system conceptualizes numbers in terms of space (Fias & Fischer, 2005). When modalists claim that the number system of the human mind is perceptual, they are not stating that it must possess dedicated channels for each sensory modality which presents numerical regularities; rather, they are arguing that apparently more abstract faculties like counting can be conceived in more concrete terms (through simulations of actual sensations and actions). These data can thus be taken as evidence of the employment of modal representations during the performance of number-related tasks.

The absence of the modality switching cost in the ANS shows only that this part of the conceptual system responds to different modalities. This can only be taken (as Machery, 2016, does) as evidence against modalism if one conceives of modalism as a theory which holds that each part of the conceptual system is a perceptual system, where perceptual systems are defined as systems which respond exclusively to one specific modality. Then the fact that the ANS responds to different modalities would make it an amodal part of the conceptual system, thus falsifying modalism. But modalism so defined would be trivially false, as most (if not all) parts of our conceptual system respond to multiple modalities. They are not perceptual systems if perceptual systems are taken to be those which respond specifically to one modality. A more interesting definition of modalism would be that offered in §2.1: modalism is the theory which holds that conceptual representations are the result of reenactments of the perceptual experiences one has had with the tokens of the category represented by the concept. Modalism isn’t falsified if such reenactments are multimodal and if, consequently, parts of the conceptual system respond to different modalities. The fact that we do not possess a way to distinguish multimodal from amodal representations is not evidence in favor of the existence of amodal concepts; rather, it is a sign that the input specificity criterion does not constitute a good method to establish such a difference and is theoretically weak. If we discard the criterion and allow modal systems to respond uniformly to different modalities, then the multimodality of the ANS (including the fact that it does not present a modality switching cost) does not represent a problem for modalists anymore.

6 The need for a graded approach

My main claim here is that in order to adjudicate the debate on the format of concepts by means of empirical research, a conceptual clarification of the main theoretical ambiguities is needed. Recall Louwerse and Connell’s (2011) reinterpretation of the modality switching cost data we saw in §3.1. McCaffrey and Machery (2012) generalize their point, stating that statistical relations inside a structure of amodal representations may resemble a semantic system organized by modality. My claims in the previous section may be seen as a further generalization of their point: due to the way in which the debate has been framed, it is hard to find a conceptual or empirical way to distinguish between “multimodal” and “amodal” symbols. In this last section, I want to offer some hints at what a theory which overcomes the theoretical difficulties I have underlined may look like.

Take the ambiguities surrounding what is “real” conceptual processing and what is ancillary processing. We have ample evidence of the activation, during conceptual processing, of both brain areas which are involved in the processing of outside world information and of areas (like the ATL) in which, as suggested by the symptoms of pathologies like SD, information is more abstractly represented. As we have seen, it is always possible to reinterpret the data by proposing that either activation is part of ancillary processing, but this often comes at the cost of weakening the notion of concept. Considering both the activation in the ATL and that in more immediate sensorimotor areas as relevant for conceptual processing might provide a way of dissolving the debate on the interpretation of the empirical evidence, avoiding the difficulties related to the need to reinterpret such evidence. Rather than asking if conceptual processing can be considered as belonging to the range of perceptual activities or not (a question whose answer hangs on the difficult definition of what is perceptual), we can consider all brain activations which seem relevant to conceptual processing as central to it.

I am aware that empirical research can continue to produce interesting results even in the absence of stringent definitions; what I am proposing here is that, lacking a precise definition of “perceptual”, one should have at least a sufficiently broad definition of what a “concept” is, so as to neutralize the possibility of a reinterpretation in the first place. This would make it possible to prevent an a priori exclusion of the outputs of different brain areas involved in conceptual processing from being a proper part of concepts themselves. I am not excluding here the possibility that there could be coherent theories which posit a stricter restriction than that, but such a restriction should be legitimized on empirical grounds: if the most reasonable thing to do to coherently account for empirical evidence is to posit that concepts are produced only by a part of the brain areas which are shown to be necessarily involved in conceptual processing, then it would be legitimate to put limitations on one’s definition of a “concept”. Otherwise, a more encompassing definition would put less arbitrary constraints on the findings produced by empirical research. Such an attitude would dissolve the debate in its present form, but it would open the way to the more interesting (and more graded) question of how much and in what ways different areas of the brain contribute to forming and stabilizing concepts. A similar attitude can be taken in approaching the amodal/multimodal distinction. As we have seen with the case of the ANS, there are systems whose output has properties of both modal and amodal representations.
Either we assume a “dissolving” attitude and we say that the differences between a multimodal representation and an amodal one are negligible (or that the entire question is ill-posed), or we maintain the distinction insofar as it has any practical usefulness by asking ourselves the more interesting question of which systems present the characteristic traits of modal and amodal systems, and to what degree.

The debate on the format of concepts is part of a larger discussion on embodiment, which concerns all fields of cognition (Kaup et al., 2022). There is a growing consensus that the theme of embodiment requires graded questions and graded answers. The underlying problem of the criteria I have reviewed is that they tend to frame the debate on the embodiment of concepts as an either/or question, where the “real” concept is either produced by perceptual areas or by areas like the ATL and where a representation is either amodal or multimodal. My argument is that such questions cannot be resolved by means of empirical research, as they allow for constant reinterpretations of the data; and in facing such questions one also risks overlooking more interesting problems, such as to what degree and in what form each area of the brain contributes to the formation and stabilization of concepts.

7 Conclusion

The main question underlying this paper has been how modal representations should be defined.

As briefly discussed in §2.2, the isomorphism criterion does not achieve the objective it was created for, i.e., to allow modal representations to have a direct relation to what they refer to. Furthermore, the criterion suffers from many theoretical problems, and is unable to properly distinguish modal representations from other types (see Machery, 2007, 2016). Given that all representations undergo processes of significant abstraction and convolution, a more interesting question is how they are created and how much of a resemblance they maintain with their referents. Attempting to reduce this complexity to a dichotomy between purely isomorphic and purely arbitrary representations is of dubious usefulness. I agree with Searle (1980) and Harnad (1990) that the grounding of representations is a significant problem for cognitive science, but resolving it by presupposing a simple isomorphism between representations and their referents has been shown to be, from an empirical point of view, too cheap a strategy.

A similar conclusion can be reached for the neural location criterion, which has been discussed in §4: more interesting questions lurk behind the either/or question of whether the “bulk” of concepts is produced by sensorimotor areas or by arguably amodal areas. A problem exists regarding how to define what perceptual areas are in the first place. Moreover, it is now clear that activation during conceptual processing spreads both through areas directly involved in sensation and motion and through areas which are more detached from them. My argument is that empirical research on the contribution of different areas of the brain to the retrieval and stabilization of concepts would benefit from discarding the question of whether concepts are modal or amodal.

Finally, I have argued that the input specificity criterion is weak, and that it should be abandoned, for the reasons reported in §5. The examples I discussed in relation to the ATL can be extended to many further functions: they present traits of both modal and amodal systems.

In light of the foregoing analysis, it is apparent that the question of whether concepts are modal or amodal has outlived its empirical usefulness.