A predictive processing framework of tool use

In this paper I introduce the theory of predictive processing as a unifying conceptual framework to account for the human ability to use and innovate tools. I explain the basic concepts of predictive processing and illustrate how this framework accounts for the development of tool use in young infants and for findings in the neuropsychological and neuroscientific literature. Then, I argue that the predictive processing model needs to be complemented with a functional-evolutionary perspective, according to which the developmental and neurocognitive mechanisms should be understood in relation to the adaptive function that tools subserve. I discuss cross-cultural and comparative studies on tool use to illustrate how tools could facilitate a process of cumulative cultural and technological evolution. Furthermore, I illustrate how central premises of the predictive processing framework, such as the notion of Bayesian inference as a general principle and the role of prediction-error-updating, speak to central debates in evolutionary psychology, such as the massive modularity hypothesis and the trade-off between exploitation and innovation. Throughout the paper I make several concrete suggestions for future studies that could be used to put the predictive processing model of tool use to the test.


Introduction
Without any apparent effort, we interact with computers, drink coffee, drive cars and use our smartphones on a daily basis. These tools have dramatically altered our environment and our way of living. Over the past decades neuropsychological studies and neuroscientific research have elucidated the neurocognitive mechanisms that enable us to use tools in a goal-directed and flexible fashion. Specifically, the study of apraxia has been at the heart of traditional neuropsychology for more than a century. Selective impairments in the ability to imitate gestures, to mimic tool use or to correctly use tools have provided intriguing insight into the neural basis of our motor system (Park, 2017) and into the neural organization of conceptual knowledge for action (van Elk, van Schie, & Bekkering, 2014). Recent review papers have provided a state-of-the-art overview and have identified key debates in research on tool use and apraxia (e.g., whether tool use relies on manipulation knowledge vs reasoning; the role of affordances vs semantics in tool use; etc., see: Lesourd et al., 2018; Martel, Cardinali, Roy, & Farne, 2016; Osiurak & Badets, 2016; Reynaud, Lesourd, Navarro, & Osiurak, 2016). In this paper I propose to integrate basic insights from predictive processing theory with an evolutionary-psychological framework to provide a novel perspective on both the ultimate and the proximate factors underlying human tool use. The computational framework of predictive processing is emerging as an integrative and unifying tool in psychology and cognitive neuroscience (Clark, 2013; Friston, 2018; Friston & Kiebel, 2009). For instance, the theory of predictive processing has been successfully applied to explain the emergence of hallucinations (Griffin & Fletcher, 2017), self-recognition (Apps & Tsakiris, 2014), the bodily self (Blanke, Slater, & Serino, 2015), placebo effects (Buchel, Geuter, Sprenger, & Eippert, 2014), automobile driving (Engström et al., 2018) and action observation (Kilner, Friston, & Frith, 2007). Predictive processing has also been applied to account for religious and spiritual experiences (van Elk & Aleman, 2017) and for the working of psychedelics, such as LSD and psilocybin (Carhart-Harris & Friston, 2019). Although predictive processing is closely related to embodied and extended views of cognition (e.g., Allen & Friston, 2018; Clark, 2015), so far the study of tool use has not been framed in terms of predictive processing. In this theoretical paper I review how predictive processing can provide a unifying framework to account for the proximate mechanisms underlying tool use. I start by briefly reviewing the basic tenets of the predictive processing framework, and I then show how predictive processing helps us to understand the development of tool use and the neurocognitive mechanisms underlying tool use.
At the same time, I argue that the predictive processing framework needs to be complemented with an evolutionary perspective. Following the four questions postulated by Tinbergen (1963), predictive processing provides insight into the proximate mechanisms underlying tool use (i.e., the ontogeny and mechanism). In contrast, evolutionary psychological accounts help us to better understand the phylogeny and ultimate function of tool use, for instance, the role these abilities subserved in a process of cumulative cultural evolution and as a way to shape and extend the human mind. Throughout the paper I make several concrete suggestions for future studies that could be used to put the predictive processing model of tool use to the test.

General principles of predictive processing
A basic premise of predictive processing is that our brain functions like a prediction machine, continuously aiming to 'explain away' the incoming sensory input (Clark, 2013). Based on prior generative models our brain generates top-down predictions regarding the expected state of the world. In case of a mismatch between our predictions and the sensory information, a prediction error signal is generated, resulting in an updating of our prior models. The brain engages in a process of Bayesian inference, whereby priors are used to yield a generative model, which is updated based on incoming sensory information.
Belief updating occurs in a hierarchical fashion, such that high-level beliefs are used to generate lower-level sensory predictions (see Fig. 1). In case of a mismatch between expected and observed sensory input, a prediction error signal is generated, which is passed up the hierarchy. This notion of a generative model is also reflected at a brain level: the sensory regions of the brain are organized in a hierarchical fashion, and higher-level brain regions send predictive signals to lower-level (sensory) regions, which in turn can pass on bottom-up prediction errors to higher-level regions. Specifically, whereas the superficial pyramidal cells are thought to encode feedforward prediction error signals, the deeper infragranular layers represent prior expectations (Bastos et al., 2012). In addition to using exteroceptive sensory signals for belief updating, the brain also relies on interoceptive signals related to one's bodily states for making inferences (Seth, 2013). For instance, inferring one's emotional state (e.g., 'I am angry') relies on the integration of feedback-related signals from one's heart rate and sweating response, typically in association with context-specific information about one's environment.
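The hierarchical exchange of predictions and prediction errors described above can be illustrated with a minimal sketch. It assumes, purely for illustration, a two-level hierarchy with scalar beliefs, identity mappings between levels, and simple gradient descent on precision-weighted squared prediction errors; the function name `settle_hierarchy` and all numerical values are hypothetical and are not taken from the literature cited here.

```python
def settle_hierarchy(sensory_input, prior_mean, precisions=(1.0, 1.0, 1.0),
                     learning_rate=0.1, n_steps=500):
    """Toy two-level predictive hierarchy (illustrative assumption only).

    The high-level belief predicts the low-level belief, and the low-level
    belief predicts the sensory input. Each belief is nudged by the error
    arriving from below and by the error relative to the prediction from above.
    """
    p_sensory, p_mid, p_prior = precisions
    low = high = prior_mean  # both beliefs start at the prior expectation
    for _ in range(n_steps):
        err_sensory = sensory_input - low   # bottom-up error at the sensory level
        err_mid = low - high                # error between the two levels
        err_prior = high - prior_mean       # error relative to the top-level prior
        # Each belief moves to reduce the precision-weighted errors it takes part in
        low += learning_rate * (p_sensory * err_sensory - p_mid * err_mid)
        high += learning_rate * (p_mid * err_mid - p_prior * err_prior)
    return low, high


low_belief, high_belief = settle_hierarchy(sensory_input=3.0, prior_mean=0.0)
print(round(low_belief, 3), round(high_belief, 3))
```

With equal precisions, the low-level belief settles closer to the sensory input and the high-level belief stays closer to the prior, which is one simple way to picture how a prediction error is only partly propagated up the hierarchy.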
On the predictive processing view, some predictions are more precise than others, and some sensory information is more reliable than other information. Accordingly, precision-weighting is applied to our predictions and sensory input, such that more reliable signals have a stronger impact on prediction error updating. Strong and precise priors yield a stronger top-down effect on sensory perception than weak and imprecise priors. This is specifically the case when sensory information is ambiguous or unreliable (e.g., in the case of sensory deprivation or in a noisy environment). In this way, predictive processing can account, for instance, for placebo effects, whereby prior expectations exert a top-down effect on the processing of sensory signals (e.g., the expectation that an expensive cream will provide more pain relief results in the subjective experience that the stimulus is less painful; cf., Buchel et al., 2014). Imprecise coding of predictions can also lead to altered experiences of agency, as observed for instance in schizophrenia, in which auditory-verbal hallucinations may be related to an imprecise coding of self-generated speech (Griffin & Fletcher, 2017).
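Precision-weighting can be made concrete with a minimal sketch, assuming (as an illustrative simplification, not a claim about the brain's actual computations) that both the prior and the sensory evidence are Gaussian. Under that assumption the updated belief is the prior plus the prediction error scaled by the relative precision of the input. The function name and the numbers, loosely inspired by the placebo example, are hypothetical.

```python
def precision_weighted_update(prior_mean, prior_precision, obs, obs_precision):
    """Combine a Gaussian prior with a Gaussian observation.

    Precision is the inverse of variance.
    Returns (posterior_mean, posterior_precision).
    """
    posterior_precision = prior_precision + obs_precision
    prediction_error = obs - prior_mean
    # The more precise the input relative to the prior, the larger the update
    gain = obs_precision / posterior_precision
    posterior_mean = prior_mean + gain * prediction_error
    return posterior_mean, posterior_precision


# Hypothetical values: expected pain intensity 2.0, sensed intensity 6.0
strong_prior = precision_weighted_update(2.0, 9.0, 6.0, 1.0)  # precise expectation
weak_prior = precision_weighted_update(2.0, 1.0, 6.0, 9.0)    # precise sensory input
print(strong_prior)
print(weak_prior)
```

In this sketch the same sensory input barely shifts the belief when the prior is precise, whereas it dominates the belief when the prior is imprecise.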
In the case of 'stubborn predictions', priors may be resistant to prediction error updating, because they are assigned an extremely high precision. This could be the case, for instance, when priors were acquired phylogenetically or through an extensive process of learning (Yon, de Lange, & Press, 2019). For instance, the perception of our body is (typically) constrained by the prior that we only have one body, and our body temperature is constrained to stay within certain limits to ensure survival. Also, based on our past experiences we have acquired a strong prior that light comes from above (Sun & Perona, 1998) or that faces are convex (Hill & Johnston, 2007), which explains our proneness to certain types of illusions (e.g., the hollow face illusion). It has been proposed that certain psychopathological disorders, such as schizophrenia or depression, can be understood in terms of maladaptive priors guiding the patient's behavior and experiences (Kube, Schwarting, Rozenkrantz, Glombiewski, & Rief, 2020). In the case of stubborn priors, prediction error minimization may be achieved through a process called 'active inference', which is reflected in making changes in the environment such that the sensory input matches prior predictions (Friston, 2010). For instance, depressive patients may avoid social contacts and prefer situations of social isolation, because that situation confirms their prior belief that they are lonely or that people don't like their company.
As a specific instance of predictive processing, the free energy principle states that the brain tries to minimize the amount of free energy, by reducing the overall state of surprise (Friston, 2010). This can be achieved through a process of prediction error minimization and model optimization (e.g., reducing the complexity of existing models and selecting models that minimize overall 'surprise'). A premise of this view is that prediction error minimization occurs over a longer time-scale and that, accordingly, the basic processes of Bayesian inference and belief updating should be considered from the perspective of the survival of the organism.
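The relation between free energy and surprise can be written out in one common textbook formulation. The notation is an assumption introduced here for illustration and does not appear in the original text: o denotes sensory observations, s hidden states of the world, p(o, s) the generative model, and q(s) the brain's approximate belief about the hidden states.

```latex
\[
F \;=\; \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
  \;=\; D_{\mathrm{KL}}\!\left[\,q(s)\,\|\,p(s \mid o)\,\right] - \ln p(o)
  \;\ge\; -\ln p(o)
\]
```

Because the Kullback-Leibler term is non-negative, F is an upper bound on the surprise -ln p(o). On this reading, reducing F can proceed either by bringing q(s) closer to the true posterior (belief updating) or by sampling observations that are less surprising under the model.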
An example may help to illustrate the theory of predictive processing and how it could account for the use of tools. Suppose you would like to drink coffee from your favorite coffee mug, which typically stands next to your computer on your desk. Almost without any effort you grasp the cup and you bring it towards your mouth to take a sip. In this case you apply a prior model of the shape of the cup to guide your hand towards the cup (typically referred to as grip-related or manipulation knowledge in the tool-use literature) and a subsequent predictive motor plan, guiding your arm and the cup towards the location of your mouth. This simple action can be thought of as a process of prediction error minimization, such that the expected end-state of the motor system is comparable to the predicted end state. So far, this account fits well with earlier computational and Bayesian theories of motor control (Wolpert, 1997). Note, however, that predictive processing differs from classical top-down models of motor control in several ways, for instance through the notion of a hierarchically organized generative model, the notion of precision-weighting and the process of active inference.
Fig. 1. The central table lists the distinction between the different levels and objects of explanation, resulting in four different questions that can be asked. The predictive processing framework (upper panel) acts primarily at a proximate level of explanation, specifying the mechanisms and the development of tool use. According to predictive processing, tool use knowledge is represented in a generative model that sends top-down predictive signals to lower-level sensorimotor cortices and that is updated through prediction error signaling. These mechanisms can be placed in a broader perspective, by taking into account the phylogeny and the adaptive value of tool use. Tools contribute to a process of cumulative cultural evolution (lower panel) and the two central features of this process (imitation vs innovation) may be instantiated through the process of top-down predictions and prediction error signaling.
Going back to the example, now suppose you find yourself in a different culture in which drinks are served in a different type of cup (e.g., small wooden cups without handles). In this case, your hand reaching for and grasping the cup will yield a prediction error signal: you will mis-reach for the handle and apply too much force. Your grasping movement will result in the updating of your prior prediction and the activation of a different action plan required for drinking (e.g., a whole-hand grip with adjusted movement dynamics). Or suppose that while at home, you are intently staring at your screen while trying to drink. In this case your action relies more strongly on proprioceptive information, while peripheral visual information is down-weighted. The notion of active inference can be illustrated by going to the barkeeper in a coffee shop and asking for a different type of cup that better matches your prior model of what a cup looks like. In that case there is no need to adjust your model, because you make active changes in the environment such that the sensory input still matches your predictions.
As such, this example illustrates how the basic principles of predictive processing can account for performing and observing tool use actions. Below I elaborate on how predictive processing applies to some central observations from the tool use literature. Because the framework offers a useful perspective on the proximate mechanisms of tool use, I discuss relevant studies from developmental psychology as well as research on the neurocognitive mechanisms underlying tool use.

The development of tool use
Developmental psychologists have long acknowledged young children's ability for Bayesian inference and reasoning (Gopnik & Bonawitz, 2015), and recently a predictive processing account of cognitive development has been proposed (Koster, Kayhan, Langeloh, & Hoehl, 2020). This predictive processing account explains findings obtained with habituation and violation-of-expectation paradigms, statistical learning principles and children's increasing understanding of their physical and social environment. Regarding the learning of tool use, three central observations stand out, which fit well with a predictive processing account of the development of object use. First, action experience appears to be a driving factor for infants acquiring a basic sense of mastery of their body, their environment and their ability to use tools (for review, see: van Elk et al., 2014). Classical developmental studies show that infants readily learn the associations between making body movements and observing the effects in the environment (Rovee-Collier & Cuevas, 2009). During the first year of life, infants acquire the basic sensorimotor skills for interaction with objects, and they learn to pre-shape their hands, for instance in anticipation of using an object. By the end of the first year, infants are well able to pantomime the use of objects and show a basic ability to interact with everyday objects (e.g., drinking from a bottle; grasping food and bringing it towards the mouth). The acquisition of tool use knowledge through action experience can be framed in a predictive processing framework, according to which efference copies are used to predict the sensory consequences of one's actions. A comparator model allows the child to infer a feeling of agency and control over its actions (van Elk, Rutjens, & van der Pligt, 2015), and prediction error signaling between predicted and observed sensory consequences (e.g., moving the cup too low or too high) allows a refinement of the prior motor programs guiding the child's actions (Bays & Wolpert, 2007).
Experience with performing specific actions, in turn, enables infants to better understand and anticipate the actions performed by others, e.g., predicting that a mobile phone will be moved towards the ear (Monroy, Gerson, & Hunnius, 2017). The role of action experience for action understanding has been framed in terms of predictive processing (Kilner et al., 2007). On this account the child uses its own prior motor representations to constrain and predict the incoming sensory information. Second, and conversely, observing and imitating the actions of others is another driving factor underlying tool use learning in infants. On the predictive processing model, action and perception are two sides of the same coin, both aimed at minimizing prediction errors between predicted and observed sensory input (Kilner et al., 2007). Thus, prior models specifying how tools should be used can be acquired through observation, action execution and imitation alike. For instance, it has been shown that infants can already anticipate the end goal of object-directed actions (e.g., a cup moving towards the mouth), well before they are able to perform these actions themselves. Observing incorrect or unusual actions in turn can yield a prediction error signal, which is used to update prior action predictions (Langeloh et al., 2018). Observing unexpected actions with an object can also trigger a process of 'active inference', reflected in the subsequent exploration and testing of that object (Stahl & Feigenson, 2015). Thus, learning through observation is an important developmental mechanism, which can be understood from the perspective of a shared predictive model of tool use for action observation and execution.
Third, within the first year of life infants acquire the ability to apply means-ends reasoning, which allows them to use objects in a functional fashion (e.g., pulling a cloth to obtain a toy; cf., van Elk et al., 2014). This ability relies on being able to make a distinction between the means and the goals of an action and to apply technological or means-ends reasoning to both performed and observed tool use actions (Osiurak & Reynaud, 2020). Developmental studies have indicated that infants as young as 3 months old can already distinguish between the goals and the means of an observed action (Sommerville, Woodward, & Needham, 2005). From a Bayesian perspective the emergence of means-end reasoning can be understood in terms of the increased use and refinement of a generative model, which allows one to translate a high-level action intention into the fine-grained motor skills required for actually implementing the action (Ridderinkhof, 2014). Conversely, based on observed kinematics one can 'reverse-engineer' the goal or intention of the action (Kilner et al., 2007). For instance, when observing grasping movements, 12-month-old infants showed predictive eye movements towards the end location of the action (Falck-Ytter, Gredeback, & von Hofsten, 2006). With increased age, children become more proficient in applying the basic principles of Bayesian inference to the use of objects. For instance, when presented with a blicket detector that could be activated by objects that had the 'blicket property', children of 4 years of age were well able to reason about the causal power of objects, but even the 2-year-old children showed a basic ability for applying the principles of Bayesian inference (Gopnik & Sobel, 2000).
Cortex 139 (2021) 211-221
In sum, developmental studies are in line with the view that from the moment they are born, infants rely on predictive models that enable them to learn to use basic objects and to apply means-end reasoning, directly based on their own experiences, as well as through action observation.

Neural mechanisms underlying tool use
A central discussion in the neuroscience of tool use has focused on the question of how conceptual knowledge is represented in the brain. Based on studies with neuropsychological patients it has been suggested that conceptual knowledge is represented in modality-specific brain regions, such as visual, tactile and motor-related regions (Barsalou, Kyle Simmons, Barbey, & Wilson, 2003). Information from different modalities is integrated in so-called semantic hub regions, such as the anterior temporal lobes, that are involved in representing conceptual knowledge at a high level of abstraction. The modality-specific view in turn has been criticized based on the observation that functional impairments in conceptual knowledge do not always consistently relate to damage to specific brain regions (Caramazza & Mahon, 2003). Instead of the modality-specific view, a network approach has been proposed (Mahon & Caramazza, 2011), according to which conceptual knowledge is represented along different functional brain networks that are constrained by principles of innate connectivity. The predictive processing framework can be conceived of as occupying the middle ground between modality-specific views and network theories of conceptual knowledge, and as such offers the potential to provide a unifying account of the neural organization of tool use knowledge (see also: van Elk & Bekkering, 2018). That is, the predictive processing model implies that tool use knowledge is instantiated in prior models that are hierarchically organized and that these models are updated through a process of prediction error monitoring. Indeed, it is commonly observed that the brain is divided into different functional networks that display strong within-network connectivity, and that these networks in turn are organized in a hierarchical fashion along a cortical gradient from peripheral sensorimotor cortices on the one hand to highly connected association cortices on the other hand (Margulies et al., 2016).
This hierarchical organization is also reflected within cortical networks, such as the somatosensory and the visual cortex. The motor system, including the primary and supplementary motor cortex, as well as the premotor regions, is organized in a hierarchical fashion, such that more anterior areas are involved in representing high-level properties of planned actions, while more posterior regions are involved in coding for the low-level properties of actions (Grafton & Hamilton, 2007; Koechlin & Jubault, 2006). Also at a behavioral level, actions are planned and organized hierarchically such that lower-level action features are selected based on high-level action intentions (Rosenbaum, 2009). Furthermore, conceptual knowledge for conducting actions with tools is also organized hierarchically, around the goal-location and function associated with using objects (van Elk et al., 2014). These observations fit well with a central assumption of the predictive processing framework that the brain is a hierarchically organized system, involved in a continuous process of prediction error updating based on high-level predictive signals and bottom-up input from lower-level regions. The predictive processing framework is also compatible with early Bayesian models of motor control (Bays & Wolpert, 2007) that postulate a central role for optimal estimation of expected outputs and monitoring the sensory consequences of planned actions.
In the neuropsychological literature, a classical distinction has been made between ideomotor and ideational apraxia (Buxbaum, 2001; Buxbaum & Saffran, 2002), which are characterized respectively by a loss of manipulation knowledge and a loss of semantic knowledge for object use (Heilman, Rothi, & Valenstein, 1982; Johnson-Frey, 2004). In terms of predictive processing, these deficits could be understood as selective impairments at different levels of the cortical hierarchy involved in representing stored object knowledge. Ideomotor apraxia has been associated with damage in the premotor and supplementary motor cortex, which have been implicated in the planning and execution of basic motor movements (Halsband et al., 2001; Wheaton & Hallett, 2007). In contrast, ideational apraxia may be caused by more widespread damage in the cortical hierarchy or by damage to higher-level associative brain regions, such as the supramarginal gyrus (Gross & Grossman, 2008). In line with this suggestion, recent studies have demonstrated that different brain regions are indeed involved in representing observed actions at different levels of abstraction. For instance, whereas the premotor cortex was found to encode actions at a concrete level (e.g., opening a bottle vs a corkscrew; cf., Wurm & Lingnau, 2015), more posterior regions were involved in coding actions at a higher level of abstraction (e.g., opening a bottle or a box; actions involving transitivity; cf., Wurm, Caramazza, & Lingnau, 2017). Similarly, it has been found that the lateral occipitotemporal cortex (LOTC) represents objects in terms of the actions that they afford (e.g., grasping; manipulating), but not in terms of the shape of the object (e.g., round vs rectangular shape; cf., Wu, Wang, Wei, He, & Bi, 2020), also supporting the notion that action features for objects are represented in the brain at different levels of complexity. Thus, on the predictive processing model, selective impairments in different aspects of tool use knowledge can be understood as a selective deficit at different levels of the generative model that underlies our action planning with tools.
The predictive processing framework also provides an account of the mechanisms involved in observing correct or incorrect actions with tools. It has been found, for instance, that observing incorrect actions yields a stronger activation of motor-related brain regions (Manthey, Schubotz, & von Cramon, 2003; Stapel, Hunnius, van Elk, & Bekkering, 2010; van Elk, Bousardt, Bekkering, & van Schie, 2012), which might well reflect prediction errors associated with the violation of a prior prediction regarding the use of an object. An obvious implication of this view is that neural responses to different types of action errors (e.g., observing someone applying an incorrect grip to an object vs observing someone using an object in a functionally incorrect way) could well reflect the coding of actions at different levels of abstractness in the brain's predictive hierarchy (de Lange, Spronk, Willems, Toni, & Bekkering, 2008).
Besides providing an integrative account of different findings in the literature, the predictive processing framework also allows formulating testable predictions to be addressed in future studies. For instance, the notion of 'precision weighting' can be applied to study the relevance of different types of modality-specific information (e.g., visual, auditory, tactile, proprioceptive) for the successful interaction with objects. Based on our prior experiences with using objects in different contexts, it could be expected that information from some modalities is assigned a higher precision for using specific objects (e.g., using dumbbells relies more on proprioceptive and tactile feedback, whereas visual information is crucial for playing tennis). Systematically manipulating the uncertainty of the information provided (e.g., by using different levels of visibility) and measuring EEG beta-power as a proxy for precision-weighting (Palmer, Auksztulewicz, Ondobaka, & Kilner, 2019) could provide insight into the relative importance of the different sensory modalities for tool use knowledge.
In sum, the predictive processing framework proposes that the brain is organized in dynamic and hierarchically structured networks that are each involved in representing tool use actions at different levels of specificity. The framework also accounts for the selective deficits observed in neuropsychological patients, as well as the effects of observing incorrect actions.

A functional-evolutionary perspective
Most research in psychology and neuroscience on tool use, including the developmental and neurobiological studies discussed above, tends to focus on the proximate mechanisms that help us to understand how our human ability to use tools in a flexible and goal-directed fashion comes about (van Elk et al., 2014). While highly valuable, this approach could potentially miss out on complementary perspectives asking the 'bigger questions' of why humans use tools at all and how this ability developed in our remote ancestors. This somewhat one-sided focus on mechanisms related to tool use is especially interesting in light of evolutionary psychological accounts. For instance, it has been proposed that a mechanism of high-fidelity imitation and innovation underlies the cumulative cultural evolutionary processes that provided a major advancement in the development of complex material culture (Richerson & Boyd, 2008). On this account, the developmental and neural mechanisms underlying tool use provide insight into the causal chain enabling a process of cumulative cultural and technological evolution (for review, see: Osiurak & Reynaud, 2020). A useful theoretical approach to place the study of tool use into context may be found in the four famous questions asked by Tinbergen (1963) and his distinction between different explanatory levels (see Fig. 1). At a proximate level, one can specify the mechanisms through which specific behavior comes about, as well as the development (ontogeny) of that behavior, i.e., how the behavior changes throughout the lifespan. The developmental and neurobiological studies that were reviewed above can be placed at this explanatory level. However, at an ultimate level, behavior can be described according to its adaptive value (i.e., how does it increase one's fitness?), as well as the evolutionary process through which the behavior came about (i.e., which selective pressures have shaped the behavior?). These questions can be addressed from a contemporary perspective (i.e., how does it work today?), as well as from a historical perspective (i.e., how did it evolve in the past?).
Typically, different scientific disciplines have focused on different types of questions. For instance, neuroscience and neuropsychology, including the predictive processing framework, are primarily concerned with specifying mechanisms; developmental psychology focuses on the ontogeny of behavior; evolutionary psychology aims to understand human behavior in terms of its adaptive value; evolutionary biology and anthropology focus on the evolutionary process that might have shaped behavior in the past. Each of these disciplines of course also has its own preferred method for answering these questions, ranging from using brain scanners to comparative studies between species and cultures. However, in order to obtain a full understanding of the phenomenon in question, a complementary perspective is needed, taking into account each of the four questions. Thus, as argued elsewhere, a multidisciplinary perspective is needed, where basic cognitive science approaches are complemented by cross-cultural, developmental and cross-species studies (Liebal & Haun, 2018).
Tinbergen's questions and the different explanatory levels can be used to place the predictive processing framework in perspective, by relating the causal mechanisms involved in tool use to their adaptive value (function) and to the way in which these mechanisms came about (phylogeny). The predictive processing model starts from a Bayesian perspective, whereby incoming information is evaluated in the light of prior expectations. As outlined above, this view entails the possibility that priors are based on earlier learning experiences, but also that 'evolved priors' are 'hardwired' in the brain and were acquired phylogenetically because they provided adaptive significance in our ancestral past (for theoretical integration between evolutionary psychological and predictive processing perspectives, see for instance: Barrett, 2014). Cross-cultural developmental and comparative studies can help us to obtain insight into the relative contribution of innate motor capacities and the role of learning in tool use. Discussing these studies sheds light on the phylogeny and the adaptive value of tool use and thereby speaks to a central debate within evolutionary psychology, related to the concept of 'cumulative cultural evolution'.

6. Predictive processing and cumulative cultural evolution
The phenomenon of cumulative cultural evolution (CCE), also called cumulative technological culture (CCT; Osiurak & Reynaud, 2020), refers to the process whereby we continuously build on and expand on inventions made by others. For instance, even an object as simple as a pencil represents a high amount of integrated knowledge and innovation: one needs wood, rubber, charcoal and metal, and has to be able to integrate these materials in an effective design. Each of these subprocesses relies on a high degree of prior expertise, learning, cultural transmission and mechanical innovation (e.g., woodcarving machines). Two key ingredients for cumulative cultural evolution to occur are high-fidelity imitation and innovation (see Fig. 1). First, cultural knowledge has to be reliably transmitted such that others are able to use this knowledge as well. For instance, in the case of producing a pencil one has to learn basic procedures of woodcarving and which materials lend themselves best to making pencils. Second, in order to enable innovation, new features need to be added to already existing designs; subsequently, through a process of cultural selection, more adaptive designs will 'survive' while maladaptive innovations will be lost. Take the pencil again: due to a shortage of wood, one could turn to a different material such as plastic for making pencils. Soon, however, it would turn out that plastic pencils are not functional as they cannot be sharpened. Other innovations, such as an eraser integrated at the end of the pencil, are more successful and will be retained and copied by others as well.
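The interplay between the two ingredients described above (faithful transmission, and innovation followed by cultural selection) can be made concrete with a toy simulation. The following is a minimal sketch under strong simplifying assumptions and is not a model proposed in the literature cited here: the function `run_chain`, its parameters, and the abstract 'quality' score of a design are all hypothetical and introduced only for illustration.

```python
import random


def run_chain(generations, copy_fidelity, innovation_rate, seed=0):
    """Simulate one transmission chain; return design quality per generation.

    All quantities are hypothetical. 'quality' is an abstract score of how
    functional a tool design is; 1.0 is an assumed baseline design.
    """
    rng = random.Random(seed)
    quality = 1.0
    history = [quality]
    for _ in range(generations):
        # Transmission: with probability (1 - copy_fidelity) copying fails,
        # accumulated improvements are lost and the learner falls back on
        # the baseline design.
        if rng.random() > copy_fidelity:
            quality = 1.0
        # Innovation: occasionally a modification of the design is tried.
        if rng.random() < innovation_rate:
            candidate = quality + rng.gauss(0.0, 1.0)
            # Cultural selection: only modifications that improve the
            # design are retained; maladaptive ones are dropped.
            if candidate > quality:
                quality = candidate
        history.append(quality)
    return history


# Compare a high-fidelity and a low-fidelity chain with the same seed.
faithful = run_chain(generations=50, copy_fidelity=1.0, innovation_rate=0.3, seed=1)
unfaithful = run_chain(generations=50, copy_fidelity=0.0, innovation_rate=0.3, seed=1)
print(f"final quality, faithful copying:   {faithful[-1]:.2f}")
print(f"final quality, unfaithful copying: {unfaithful[-1]:.2f}")
```

In this sketch improvements only accumulate across generations when copying is reliable; without faithful transmission each generation starts again from the baseline, so innovation alone does not produce a cumulative trajectory.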
Thus, the first process underlying cumulative cultural evolution relies on high-fidelity imitation and copying the behavior observed in others. Over the last decades, an ongoing discussion has focused on the question whether CCE is a uniquely human ability, or whether it is shared by other species as well. Some have argued that our ability for diversifying tool design, cumulative change and high-fidelity social transmission and imitation is unique to humans (Vaesen, 2012). This mechanism in turn explains the rapid proliferation of the human species and the vast expansion of human culture and technology (Tennie, Call, & Tomasello, 2009). Others have pointed out that a similar basic capacity for complex tool design and tool use can also be observed in other species, such as crows or chimpanzees (Hunt & Gray, 2003; Vale, Davis, Lambeth, Schapiro, & Whiten, 2017). This debate strongly hinges on how CCE is defined and which criteria are applied for operationalizing CCE (Mesoudi & Thornton, 2018). Core features such as social learning and the improvement of fitness may be shared with other species, but the ability to recombine and diversify tools and the propensity for niche-construction may be uniquely human (Barrett, 2014).
The developmental studies outlined above already indicated that human infants have a deeply ingrained tendency to attend to social information and to learn through action observation. They rapidly learn the use of novel tools, either through direct experience, or through observing and imitating the actions of others. Cross-cultural and comparative studies elucidate to what extent this ability reflects a general human propensity and whether it is shared with other species. A review of cross-cultural research on tool use in children shows that in traditional societies children often learn to use novel tools spontaneously and unsupervised, indicating that this ability is in their basic behavioral repertoire (Lancy, 2017). A recent cross-cultural developmental study also indicates that young children, aged between 2 and 5 years old, spontaneously invented tool use behaviors similar to those observed in great ape populations, indicating that such behavior may be in the 'zone of latent solutions', i.e., within the general cognitive and physical abilities of humans and great apes (Neldner et al., 2020). At the same time, it was found that children from a Western society more often found new solutions and successfully completed the tool use tasks, compared to children from a hunter-gatherer community. These studies thus indicate that a process of cultural scaffolding can refine and expand basic tool use skills (Riede, Johannsen, Hogberg, Nowell, & Lombard, 2018). This process may well rely on a general and uniquely human propensity for social learning and imitation, as other developmental psychological studies have shown that children from a young age onwards tend to over-imitate observed actions (Whiten, McGuigan, Marshall-Pescini, & Hopper, 2009) and have a natural tendency to take a pedagogical stance when attending to information from other people (Csibra & Gergely, 2011).
Comparative studies support the idea that basic tool use abilities rely on different mechanisms than the use of more complex technological tools. The capability to interact with simple tools, such as using rakes and flaking stones, and to spontaneously infer their use might be phylogenetically old and shared with other species, suggesting a common neural homolog (Proffitt et al., 2016). Other species, including wild chimpanzees and capuchin monkeys, are well able to use stones as hammers, to apply tools in an efficient fashion (i.e., planning economical motor actions) and to use the same tool for different purposes (Osiurak & Reynaud, 2020). However, more complex tool use that relies on culturally transmitted knowledge, such as weaving or bow-and-arrow technology, in turn depends more strongly on refined motor skills and dedicated neural structures related to action observation, imitation (Stout & Hecht, 2017) and technological reasoning (Osiurak & Reynaud, 2020).
Thus, the available developmental, cross-cultural and comparative studies indicate that human infants and other species can learn to use basic tools through trial-and-error learning, while complex tool use likely relies on specialized neurobiological and psychological mechanisms involved in social learning, imitation and means-end reasoning. From a predictive processing perspective, these mechanisms reflect the ability to rely on more complex and elaborate predictive models that can be updated based on both action execution and action observation. Thus, predictive processing accounts well for the notion of shared action representations as underlying the faithful transmission and imitation of tool use knowledge (see Fig. 1).
Whereas the imitation and exploitation of existing tool use knowledge relies on the acquisition of prior models, an intriguing possibility that could be investigated in future studies is whether the second central process associated with CCE, namely the innovation of tools and technologies (also called 'exploration'), could also be accounted for in terms of predictive processing (see Fig. 1). An obvious candidate mechanism underlying innovation is the central role of prediction error signaling in updating and revising one's mental models. As outlined above, occasionally the observation and execution of tool use actions will be accompanied by a mismatch between predicted and observed sensory consequences. These prediction error signals in turn could result in a refinement of an existing action representation (e.g., a child learning that a hammer needs to be held with a full grip), or in the acquisition of a new action representation (e.g., a child learning that a hammer can also be used to remove rather than to hammer nails). On this view, erroneous object use or observing incorrect actions may be accompanied by a state of 'surprise' and a subsequent process to reduce the overall amount of entropy, either by updating one's action representations or through active inference to adjust the sensory input.
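The updating of an action representation by prediction error can be illustrated with a standard Gaussian belief update, in which the mismatch between predicted and observed sensory consequences is weighted by the relative uncertainty of the prediction. This is a minimal sketch, not an implementation of any specific model from the predictive processing literature; the names `Belief` and `update`, and the numerical values, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Belief:
    """Gaussian belief about one sensory consequence of a tool action."""
    mean: float      # predicted value (e.g., expected displacement of a nail)
    variance: float  # uncertainty of the prediction (inverse precision)


def update(belief: Belief, observation: float, obs_variance: float) -> Tuple[Belief, float]:
    """Return the revised belief and the prediction error that drove it."""
    prediction_error = observation - belief.mean
    # Weight given to the error: large when the prediction is uncertain
    # relative to the sensory input, small when the prediction is precise.
    gain = belief.variance / (belief.variance + obs_variance)
    new_mean = belief.mean + gain * prediction_error
    new_variance = (1.0 - gain) * belief.variance
    return Belief(new_mean, new_variance), prediction_error


# Hypothetical example: an initially vague prediction is revised over
# repeated observations of the outcome of the same tool action.
belief = Belief(mean=0.0, variance=4.0)
for obs in [2.0, 2.2, 1.9, 2.1]:
    belief, err = update(belief, obs, obs_variance=1.0)
    print(f"error={err:+.2f}  mean={belief.mean:.2f}  variance={belief.variance:.2f}")
```

In this sketch the belief is only revised, which corresponds to updating one's action representation; the alternative route mentioned above, active inference, would instead change the action so that the sensory input comes to match the prediction, and is not modeled here.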

Cognitive gadgets and the extended mind
The predictive processing model is compatible with the view that the brain functions like a domain-general problem-solving mechanism, as our ancestors had to cope with continuously changing environments and challenges (for discussion of the massive modularity hypothesis, see: Chiappe & Gardner, 2012). The human propensity to attend to social information and to imitate observed actions could rely on deeply ingrained priors or alternatively might have been shaped through an extensive process of social interaction itself (see for instance: Heyes, 2019). That is, next to being a driving factor enabling cumulative cultural evolution, we need to acknowledge the possibility that cultural processes themselves gave rise to the 'cognitive gadgets' that are typically considered as being innate (e.g., our ability to use language, to engage in mind-reading, to imitate, etc.). Training and learning studies are needed to settle the debate to what extent the cognitive machinery typically thought to enable cultural evolution is itself shaped by cultural learning processes (see for instance: Santiesteban et al., 2012). Relatedly, building on the predictive processing framework one could conceive of tools and objects as affording possibilities for action in association with a dynamic 'brain-body-environment' system (Bruineberg & Rietveld, 2014). This integrated perspective takes the classic idea that cognition can be 'offloaded' to the environment (e.g., through the use of smart objects) one step further by specifying the mechanisms through which the brain selects the relevant affordances. It has been well established that tool use extends one's body schema (Maravita & Iriki, 2004). Based on input from different sensory modalities our brain continuously generates a predictive model of our body (Apps & Tsakiris, 2014), which is dynamically adjusted when we use tools for a prolonged period of time.
Next to extending our body schema, in some situations the generative model itself may actually extend to the tools we use, as proposed by 4E accounts of cognition (i.e., embodied, embedded, extended and enactive cognition; cf., Newen, De Bruin, & Gallagher, 2018). For instance, when navigating through a new city with our smartphone, this device can be conceived of as an external predictive memory representation that allows us to plan our actions (Clark, 2017). On this account, the process of cumulative cultural and technological evolution also ultimately transforms the human mind itself. Or in terms of predictive processing: tools extend the brain's prediction machinery, by allowing more precise and elaborate predictions to guide our interactions with the world.

Conclusions
In this paper I have highlighted how the theory of predictive processing provides a domain-general account of the development and the neural basis of tool use, thereby making it possible to synthesize existing findings and to generate new and testable predictions. Evolutionary approaches in turn place insights into the basic neurocognitive mechanisms underlying tool use in a broader perspective, by focusing on the phylogeny as well as the adaptive functions that tool use subserves. Whereas basic tool use may be shared with other species, the ability to use complex technological tools appears to be uniquely human and reflects our ability to build on and extend existing action representations. Ultimately the faithful transmission of tool use knowledge through shared action representations enables a process of cumulative cultural evolution. Tools, in turn, also extend and shape the human mind, and prediction error signaling could underlie this process of cultural innovation.
Cortex 139 (2021) 211–221