Human skill learning: expansion, exploration, selection, and refinement

Learning, or the process of acquiring knowledge and skill, allows humans to shape and adapt to their environments during development. Researchers have long theorized that the principal brain processes behind learning resemble a recruitment process. The brain initially explores an expanded pool of candidate neural circuits. Based on outcomes, the most promising candidate circuit is selected for refinement. Partly fuelled by new methods, the last decade of research on learning-related functional and structural changes in rodents has supported this theory, and, more recently, related evidence has started to emerge from human studies. We emphasize the need for formal theories and neurocomputational modelling of cortical plasticity to guide work on open issues, such as the link between functional and structural changes.


Introduction
An astonishing ability to acquire skills, such as reading and writing, playing a musical instrument or flying an airplane, is the basis of humankind's most impressive achievements. This ability also plays a key role in development. Learning, that is, the process of acquiring knowledge and skill, tailors humans to their environments, and vice versa. In modern societies, skill learning, for example during schooling, is crucially important for matching individuals to the needs of the society and the labour market [1] and it affects lifelong well-being and health [2]. At the same time, skill acquisition exerts a transformative force on the environment, both in evolutionary and historical time [3,4]. Understanding and enhancing learning is thus of great individual and societal importance.
Researchers have long theorized that the brain mechanisms behind learning and learning-influenced brain development resemble a job recruitment process [5-13]. Faced with a mismatch between its goal and capacity [14], such as when the fingers simply refuse to nicely form the piano chord that a music piece requires, the brain initially sets out to test an expanded pool of candidate neural circuits for performing the job (expansion and exploration). Based on the outcomes of these tests, the most promising candidate circuit is then chosen (selection) for further training (refinement). Partly fuelled by new methods, such as two-photon microscopy, the last decade of research in rodents has consolidated this theory, and in the last couple of years related evidence has started to emerge from human studies. Here we review this work, focusing in particular on motor learning. We end with a discussion of open issues for this expansion, exploration, selection, and refinement theory of learning. We also note that understanding the neural signals and triggers of plasticity and stability during learning may turn out to be highly informative of mechanisms regulating states of heightened plasticity during development, such as in sensitive periods (see also [5,15]).

Changes in Behaviour during Motor Learning
Several aspects of motor learning make it a good model of skill learning. Motor paradigms are well suited for studies on many species, resulting in complementary information from many methods. Experience-dependent and repetition-mediated improvements in complex motor tasks (e.g., playing an instrument) have origins in many types of learning. Declarative and implicit learning support goal and action selection, and interact with learning at the level of action execution, to improve speed, precision, and consistency of movements [16-21]. Manifestations of motor skills, such as a beautifully executed pass of a ball during a football match, are in this sense not only about the smooth, precise, and reliable execution of movements, but also about the timely selection of the appropriate target and movement from a range of options. Performance on complex motor tasks typically shows rapid improvements and high variability early in practice, followed by a protracted period of more slowly developing refinements towards task execution with little variability. This pattern is well characterized and qualitatively quite consistent across individuals and tasks, although the rate and exact shape of learning may of course vary [18,22-24].

Changes in Brain Activity During Motor Learning
Given the multiple processes involved in the learning of motor skills, it is not surprising that a large network of brain regions is involved. These regions include, for example, the prefrontal cortex, supplementary motor area, pre-motor cortex, somatosensory cortex, cerebellum, basal ganglia, hippocampus, and posterior parietal cortex [16,19]. Rodent studies also show primary motor cortex involvement in motor skill learning [25], but this is a less common finding in humans [26,27], probably because many human learning paradigms tap more into action selection than execution [16]. That is, the typical human paradigm, such as learning to rapidly press a short sequence of keys with the fingers of the dominant hand, requires very little in terms of improved quality of execution of novel motor coordination.
Studies of rodents show that cortical representations of limbs and movements initially expand [28,29] and then renormalize during learning [30]. For example, rats have been reported to show expanded cortical maps after three days of skilled reaching training, but after eight days of training the expansions waned without any accompanying reductions in performance [30]. Related studies of sensory learning show that the expansion is beneficial for learning but not necessary for maintaining skill. For example, Reed and colleagues [31] reported that nucleus basalis stimulation-tone pairing in the rat was accompanied by cortical map extension in the auditory cortex. The rats were then trained in an auditory discrimination task, and improved discrimination learning was observed in animals with an expanded cortical map. Importantly, the map expansion faded over the following weeks although discrimination performance was unaffected. Thus, the expansion of the maps was related to learning but it was not the substrate of memory [8]. Although behavioural paradigms often differ, a few functional Magnetic Resonance Imaging (MRI) and brain stimulation studies of humans also show increases of activity [32,33] in primary areas that are followed by decreases during learning [34-37]. The dominant finding from human studies is, nevertheless, a learning-related decrease of activity outside primary regions [26,33,38]. Related findings suggest a general migration of execution-related activity from cortical regions to striatal regions, and migration within striatal regions, possibly signalling more automatic and less controlled execution [39,40].
Importantly, studies of rodents show larger trial-to-trial variability of local brain activity patterns earlier than later in learning. This work suggests that many different circuits of excitatory neurons within the motor cortex are activated early in learning, but that stable use of a devoted neural circuit characterizes performance later in learning [41]. The nature of the association between changes in trial-to-trial variability of activity patterns and cortical map expansion is elusive [25,41]. It may be that increases in trial-to-trial variability underlie the increases in the extent of activity that are sometimes seen at the aggregate level, but it may also be that expansion allows for a larger area and thus more neural ensembles to be activated. Findings suggesting that reductions in regional inhibitory activity may play a role in these processes might support the latter option [9,25,42]. This cascade of local processes in the primary cortices may be initiated when the system encounters a large mismatch between its goal and capacity [14]. One possibility is that this mismatch is signalled by dopamine prediction errors from striatum and ventral tegmental area and opens a window for exploration in more primary regions [9,43-45]. The early trial-to-trial variability of activity patterns has been proposed to signify exploration of possible network states [23,46], with the interpretation that initial variability may provide a pool of circuits from which the optimal one can be selected through system-level feedback mechanisms, such as striatum-mediated reinforcement learning or cerebellum-based sensory prediction errors [6,8,9,23]. This notion shares much of its potential and limits with the exploration-exploitation dynamics discussed in the reinforcement learning literature [47].
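The exploration-exploitation idea can be made concrete with a toy simulation. The sketch below is not a model from the reviewed literature; it is a minimal epsilon-greedy bandit in which each "arm" stands in for a hypothetical candidate circuit, and all parameter values (number of circuits, noise level, decay schedule) are arbitrary assumptions chosen for illustration. It shows only the qualitative pattern described above: many different candidates are used early, and one is used consistently late.

```python
import random


def simulate_exploration(n_circuits=8, n_trials=400, seed=0):
    """Toy epsilon-greedy bandit; arms stand in for candidate circuits."""
    rng = random.Random(seed)
    # Hypothetical expected outcome of each circuit, evenly spaced 0.2..0.9.
    true_value = [0.2 + 0.7 * i / (n_circuits - 1) for i in range(n_circuits)]
    estimate = [0.0] * n_circuits  # running mean outcome per circuit
    counts = [0] * n_circuits      # number of times each circuit was used
    choices = []
    for t in range(n_trials):
        # Exploration probability decays linearly to a small floor (assumed).
        epsilon = max(0.02, 1.0 - t / (0.5 * n_trials))
        if rng.random() < epsilon:
            arm = rng.randrange(n_circuits)  # explore a random circuit
        else:
            arm = max(range(n_circuits), key=lambda i: estimate[i])  # exploit
        outcome = true_value[arm] + rng.gauss(0.0, 0.05)  # noisy outcome
        counts[arm] += 1
        # Incremental mean update driven by the prediction error.
        estimate[arm] += (outcome - estimate[arm]) / counts[arm]
        choices.append(arm)
    return choices, true_value


if __name__ == "__main__":
    choices, values = simulate_exploration()
    print("distinct circuits used, first 100 trials:", len(set(choices[:100])))
    print("distinct circuits used, last 100 trials:", len(set(choices[-100:])))
```

The number of distinct circuits used per window serves here as a crude proxy for trial-to-trial variability of activity patterns; a physiologically grounded model would need far richer dynamics.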

Changes in Brain Structure during Motor Learning
Learning-related changes in brain activity are accompanied by changes in structure. For example, synaptic density in the rodent motor cortex initially increases and then decreases during learning. Novel synapses rapidly form in the motor cortex of rodents during motor learning [48-50], but with continued training the growth of dendritic spines (a proxy for synapses) is followed by stabilization of the new spines and removal of old spines, and overall spine density almost reverts to pre-training levels [51-53]. Synaptic remodelling occurs both in deep [52] and superficial [41] layers of the motor cortex. The probabilities of deletion of old synapses and formation of new ones are typically thought of as locally governed by the rules of Hebbian and homeostatic plasticity [9]. Clearly, synaptic structural remodelling coincides with changes in variability of activity patterns [41], but how this local process relates to system-level learning mechanisms (e.g., reinforcement learning) remains unclear. It is likely that outcome-mediated exploration and selection of neural circuits interact with these local processes [9].
More recent studies of learning-related changes in human brain structure also show increases followed by renormalization. Using primarily T1-weighted Magnetic Resonance Imaging (MRI), several researchers have observed experience-dependent increases in regional estimates of human brain volume and cortical thickness in adulthood [7,54,55]. More recently, Wenger and colleagues [56] acquired 18 T1-weighted structural magnetic resonance images over a seven-week period, during which 15 right-handed adult human participants practiced left-handed writing and drawing. This behavioural paradigm was selected to tap into those dexterity-requiring fine-motor continuous-sequence movements that are likely to tax the primary motor cortex and thus be closer to the animal paradigm than many other typical paradigms used for human motor learning. The images were analysed with voxel-based morphometry (VBM), which results in estimates of local grey matter probability (a mixed measure of cortical area and thickness together with local tissue composition). After four weeks, increases of grey matter probability were observed in both left and right primary motor cortices relative to a control group; three weeks later, however, these differences were no longer reliable. Time-series analyses showed that the estimates of grey matter probability in the primary motor cortices increased during the first four weeks of learning to write and draw with the left hand, and then partially renormalized during continued practice [56]. The microstructural alterations underlying these changes are unknown and likely to be of many types [57]. Learning-related changes in myelination have, for example, recently been shown to play key roles in motor learning [58-61]. Yet, synaptic remodelling has been demonstrated to be one possible candidate [62,63], providing an empirically untested but entirely possible link between the recent human findings and those in the rodent.

The Expansion, Exploration, Selection, and Refinement theory
The findings reviewed above have been previously synthesised in related ways by several researchers [8,9,10,12,13] including ourselves [5,6,7]. Driven by a large mismatch between the expected goal behaviour and its actual execution, a task-relevant cortical area expands. In this area, noise and strategic behavioural exploration result in trial-to-trial variability in the activation of different neural circuits that can approximate the goal behaviour. Different actions are probed and different motor patterns to achieve the same goal occur. Trial-to-trial behavioural variability (Figure 1A) and variability of neural activity patterns (Figure 1B) are therefore large. This broad activity in turn induces structural brain changes, such as formation of synapses (schematically illustrated in Figure 1C). Via outcome-mediated trial-and-error learning (e.g., reinforcement learning; Figure 1B) the best-performing circuit is then selected. After circuit selection, neural activity as well as neural and behavioural variability decrease (Figure 1A and B). Synaptic remodelling in the selected neural circuit continues to occur in a subsequent repetition-based refinement of task execution, but novel and pre-existing structure in unselected circuits retracts (Figure 1C). The initial expansion of the ensemble is thus beneficial for learning because it provides a large pool of circuits from which to make an optimal selection, but memory of skill is consolidated in the selected circuitry. At the aggregate level of measurements of human brain structure (e.g., volume or synaptic density), this process is reflected in growth followed by retraction (Figure 1D). The exploration process is enabled by activity-dependent growth of neural structure, most of which retracts after the best circuit for the job has been selected.

Figure 1 (Lövdén, Garzón and Lindenberger; Current Opinion in Behavioral Sciences). Illustration of the expansion, exploration, selection, and refinement theory of learning.
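As a rough illustration of the aggregate pattern of growth followed by retraction, the theory can be caricatured in a few lines of code. This is a hedged sketch under strong simplifying assumptions, not the authors' model: the function name `simulate_eesr`, the growth and retraction rates, and the unit "size" of a circuit are all hypothetical, and the removal of pre-existing structure is ignored so that the baseline stays constant.

```python
import random


def simulate_eesr(baseline=4.0, n_candidates=12, explore_trials=120,
                  refine_trials=120, growth=0.25, retraction_rate=0.05,
                  seed=1):
    """Toy sketch: aggregate structure across exploration and refinement."""
    rng = random.Random(seed)
    # Expansion: a pool of hypothetical candidate circuits with unknown quality.
    quality = [rng.random() for _ in range(n_candidates)]
    size = [0.0] * n_candidates   # structure grown for each candidate
    score = [0.0] * n_candidates  # running mean outcome per candidate
    tries = [0] * n_candidates
    structure = []                # aggregate structure after each trial

    # Exploration: candidates are tried at random; use grows their structure.
    for _ in range(explore_trials):
        i = rng.randrange(n_candidates)
        outcome = quality[i] + rng.gauss(0.0, 0.1)
        tries[i] += 1
        score[i] += (outcome - score[i]) / tries[i]
        size[i] = min(1.0, size[i] + growth)  # activity-dependent growth
        structure.append(baseline + sum(size))

    # Selection: keep the tried candidate with the best mean outcome.
    tried = [i for i in range(n_candidates) if tries[i] > 0]
    selected = max(tried, key=lambda i: score[i])

    # Refinement: unselected structure retracts; the selected circuit stays.
    for _ in range(refine_trials):
        for i in range(n_candidates):
            if i != selected:
                size[i] *= 1.0 - retraction_rate
        structure.append(baseline + sum(size))
    return structure, selected


if __name__ == "__main__":
    structure, selected = simulate_eesr()
    print("start:", structure[0], "peak:", max(structure),
          "end:", round(structure[-1], 3))
```

In this caricature the aggregate measure rises during exploration and then falls back to near its starting level, while the selected circuit persists. Whether real volume or spine-density time courses follow such dynamics is exactly the kind of question that formal modelling would need to address.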

Future Directions
In its current form, the expansion, exploration, selection, and refinement theory is a first step toward a more mechanistic understanding of experience-dependent adaptation of brain circuits. A pressing task for future research is to endow this theory with the computational machinery that is needed to arrive at physiologically grounded and formally tractable quantitative predictions. Computational simulations, which have been successful for instance in guiding working memory research [64], will be pivotal to link the multiple levels involved [65], from neuron to macroscopic imaging and behavioural measurements. While plasticity has been an active field of research with in silico models of spiking neurons [66], and also at a more abstract level within the machine learning and artificial intelligence domains [67], more comprehensive models that relate behavioural and neuroimaging empirical data to neuroplastic changes in brain circuitry are still lacking, in particular for human learning. Notably, little is known about how experience-dependent alterations predict behavioural improvements. In contrast to the information-rich imaging methods used to measure changes in the brain, which are bound to become even more sophisticated with wider availability of 7 T MRI scanners, more reliable acquisition sequences [68], and a broader repertoire of positron emission tomography tracers, most studies have resorted to relatively simple, unidimensional measures of performance (e.g., movement time). Human studies of structural and functional plasticity would benefit from a greater emphasis on developing carefully controlled behavioural paradigms [69] that simultaneously capture multiple facets of behavioural change at different timescales.
Many questions remain. Which signals trigger the expansion reflected in structural and functional neuroimaging measurements? Besides the aforementioned dopaminergic modulatory signalling mediating the reinforcement of actions, γ-aminobutyric acid (GABA) signalling is likely to play an important role in the initial stages of neuroplastic transformation, as evidenced by observed reductions in GABA concentration within primary sensorimotor cortex in motor sequence tasks, with higher GABA concentrations in early learning stages being associated with poorer learning [70]. This suggests a role for the balance between excitation and inhibition in promoting a plastic state that favours initial expansion and subsequent exploration, and is reminiscent of the regulation of critical periods by maturing GABAergic parvalbumin-positive (PV) inhibitory neurons in early childhood [71,72]. Likewise, there must exist signals triggering the end of exploration and stabilization of representations (refinement). In the case of development, we know that perineuronal nets are important for halting plasticity and closing critical periods [72], but it is less clear which factors may activate stabilization in skill acquisition, with the ensuing retraction of structure and decreases in neural activity. Overall, much remains to be elucidated concerning how the tension between stability and plasticity is regulated and how it relates to mechanisms in place to prevent catastrophic interference (the erasure of previously learned patterns when new ones are acquired to support novel movements [73]), the cornerstone of continual learning.
A final task will be to translate our conclusions about motor skill acquisition to more general principles of learning. In any case, if we aspire to influence human learning, developing a detailed model of neuroplasticity processes is a sine qua non.

Conclusions
The last decade of research in rodents has supported the expansion, exploration, selection, and refinement theory of motor execution learning. Related evidence has started to emerge from human studies, but such data remains scarce. Many more studies are needed to consolidate this theory of human learning. Open issues also remain for the core theoretical processes that are assumed. The link between system-level learning processes and the local learning-related changes is elusive, the link between functional and structural changes remains to be detailed, and the processes linking changes in the variability of activity patterns with changes at the aggregate level have not been unveiled yet. Nevertheless, in the presence of ever more detailed data at neural and behavioural levels of analyses, we propose that new insights into mechanisms of skill acquisition will require a greater reliance on formal theory and neurocomputational modelling.

Declarations of interest
None.

References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as of special interest or of outstanding interest.