Time-dependent competition between habitual and goal-directed response preparation

Converging evidence indicates that separate goal-directed and habitual systems compete to control behavior 1 . However, it has proven difficult to reliably induce habitual behavior in human participants 2–4 . We reasoned that habits may be present in the form of habitually prepared responses, but are overridden by goal-directed processes, preventing their overt expression. Here we show that latent habits can be unmasked by limiting the time participants have to respond to a stimulus. Participants trained for 4 days on a visuomotor association task. By continuously varying the time allowed to prepare responses, we found that the probability of expressing a learned habit followed a stereotyped time course, peaking 300-600 ms after stimulus presentation. This time course was captured by a computational model of response preparation in which habitual responses are automatically prepared at short latency, but are replaced by goal-directed responses at longer latency. A more extensive period of practice (20 days) led to increased habit expression by reducing the average time of movement initiation. These findings refine our understanding of habits, and show that practice can influence habitual behavior in distinct ways: by promoting habit formation, and by modulating the likelihood of habit expression.


Introduction
An essential aspect of many skills is the ability to quickly and accurately select an appropriate movement 1 . For instance, table tennis players must not only be able to execute shots with good technique, they must also be able to judge the flight of the ball and select which shot to play, all in less than a quarter of a second. Rapid action selection is critical to many everyday activities such as typing, driving, and playing sports. Selection speed, measured through the reaction time, is also used as the primary measure of performance in many prominent paradigms used to study motor learning, including sequence learning 1,2 and learning arbitrary visuomotor associations 3 .
How might it be possible to improve the speed at which actions can be selected? Action selection typically depends on time-consuming computations to determine the appropriate response, but it is not always necessary to perform these costly computations every time a stimulus is encountered. By storing the outcome of common computations, the selection problem can be reduced to a direct, pre-computed stimulus-response relationship. A downside, however, of such pre-computation is inflexibility. If a change in task goals requires selection of a different action, the pre-computed stimulus-response policy will persist, leading to habitual selection of outdated responses 4,5 . The idea of storing a pre-computed policy therefore suggests a potential link between improved skill, and the tendency to become habitual following practice.
Many parallels have previously been drawn between skills and habits: both are thought to involve a qualitative change in the underlying representation of behavior 4–8 , both appear to recruit the basal ganglia 9,10 , and the acquisition of both is associated with dopaminergic, reward-based learning mechanisms 11,12 . Habit learning is even often studied in rodents as a model of skill acquisition 13–15 .
Despite these parallels, selection skill could also improve through alternative means; for instance, by learning to execute necessary computations more efficiently 16,17 , or through more rapid perceptual processing of stimuli. If so, improvements in the speed of response selection could occur independently of whether a response becomes habitually selected.
Here, we examined the effects of practice on the speed and habitualness of action selection in a visuomotor association task. Participants were trained to press specific buttons as quickly as possible in response to arbitrary visual stimuli. These associations were practiced for various durations ranging from a minimal amount to 20 days. We assessed improvements in the latency of action selection through changes in the reaction time required to respond to a stimulus. To determine whether action selection had become habitual, we switched the stimulus-response contingencies for a subset of stimuli; if response selection were habitual, one would expect participants to persist with the initially learned mapping 5,18 . However, assessing habitual response selection is complicated by the fact that behavior is generated through an evolving competition between goal-directed and habitual processes 4,19 . A habitually selected response might be only transiently prepared, and later replaced by a more deliberately determined response. Indeed, limiting preparation time has proven to be an effective means of prohibiting deliberate, goal-directed processes from influencing behavior 17,20–22 . We therefore predicted that imposing limited preparation time would unmask such latent habitually selected responses.
In order to more precisely quantify the effects of practice, we devised a computational model that related the speed and potentially habitual nature of selecting each action to the time-varying likelihood of expressing each potential response. Fitting this model to data allowed us to identify the effects of practice on speed of response selection and the extent to which responses were selected habitually. Consequently, we were able to determine that the speed of response selection improved independently from the development of habitual response selection.

Response selection improved with practice
In Experiment 1, participants (n=22) completed a visuomotor association task in which arbitrary stimuli instructed them to press specific buttons on a keyboard (Figure 1). To assess the effects of practice, we contrasted behavior in two conditions, a 4-Day Practice condition and a Minimal Practice condition. In the 4-Day Practice condition, participants first trained on a previously unseen stimulus-response mapping, completing 4,000 reaction-time trials (10 x 100 trial blocks for four consecutive days) in which they responded as quickly as possible to stimuli presented on the screen in rapid succession (Figure 1d). Performance, averaged over the first and last day of practice, improved (Figure 1e)

Limiting reaction times revealed habitual selection following practice
In order to unmask potential habitual selection of the original response, we forced participants to act at different response times, ranging from 0-1200 ms, using a forced-response paradigm 17,23,24 (Figure 3a). Four tones were played, each separated by 400 ms, and participants were instructed that they must make a response synchronously with the final tone. The time of stimulus presentation was varied from trial to trial relative to this fixed response time, effectively controlling the allowed preparation time in each trial.
We first assessed whether practice improved the speed of response selection for symbols that were consistently mapped. Figure 3 shows the speed-accuracy trade-off (the probability of generating a correct response as a function of allowed preparation time; purple curve) for consistently mapped stimuli for an example participant (b) and averaged across all participants (d). This speed-accuracy trade-off began at chance (0.25) for preparation times less than ~300 ms, indicating that participants did not have sufficient time to process the stimulus and select the appropriate response in this range, and instead had to

A computational model distinguished between goal-directed and habitual responding
The distribution of response times imposed on participants revealed a stereotyped time course of habitual responding. We developed a computational model to account for this behavior and better understand how it varied across participants (Figure 4). Our model extends that proposed in our previous work 24 , assuming that participants select a response at some time $T_A$, which varies randomly from trial to trial (here, according to a normal distribution; Figure 4a). The speed-accuracy trade-off reflects the probability that the correct action had been selected by the time of responding. Improvements in selection speed are accounted for in the model through a shift and narrowing in the distribution of $T_A$ (Figure 4b).
To account for potentially habitual selection following revision of the map, we assumed that participants might habitually prepare the initially practiced response at a random time T
We examined whether this reflected difficulty in acquiring the revised mapping, or could be attributed to participants habitually persisting with short reaction times that had been successful during extensive practice 25 . When attempting to learn the revised mapping, participants that
As expected, the 20-Day Practice condition also led to habitual response selection. The likelihood of expressing the previously practiced response was at chance for times before participants could process the stimulus (300-0 ms before $t_{min}$, t-test against chance, $t_{13}$ = 1.0, p = 0.36), then rose above chance (0-300 ms after $t_{min}$, $t_{13}$ = 6.0, p < 0.001), before falling below chance for responses at longer response times (300-600 ms after $t_{min}$, $t_{13}$ = 3.6, p < 0.01).
When forced to respond at low latencies, participants that trained for 20 days were more likely to produce habitual responses than participants that trained for 4 days (t-test on 20-Day practice vs 4-Day practice groups for $t_{min}$ to $t_{min}$ + 300 ms, $t_{34}$ = 2.98, p < 0.01). Our computational model again accounted for the observed behavior extremely well (Figure 7e), and demonstrated that this increased likelihood of habitual responses was attributable to the fact that practice allowed responses that were already selected habitually after 4 days of practice to be selected more rapidly. All participants in Experiment 2 exhibited habitual selection (mean difference in AIC = 19.72). Furthermore, as in the 4-Day practice group, extending the model to allow for partial habits (0 ≤ α ≤ 1) did not provide a better description of the data (likelihood ratio test; p = 1.00).

Discussion
Our data and model show that practice led to both more rapid response selection and habitual response selection. However, these developments followed different time courses. Response selection became habitual in most participants within four days of practice. By contrast, response speed improved over up to twenty days of practice. Furthermore, while the speed of response selection varied continuously with practice, being subject to habitual action selection appeared to be all or nothing. Variations in the likelihood of expressing the original response as a function of preparation time could be fully accounted for by continuous variations in the speed of response selection, without having to assume any continuum of habit strength. In other words, being habitual was a discrete state, whereas skill level could vary continuously.

Limiting reaction times unmasks habitual behavior
Our paradigm and results clearly illustrate the time-varying competition between goal-directed and habitual control processes. Varying the allowed preparation time modulated which response was expressed. This implies that both mappings were represented during each trial, demonstrating the existence of multiple components of learning. The relative expression of different components of learning has previously been shown to be influenced by limiting cognitive resources 26,27 , including available preparation time 17,20,22,28,29 . However, previous research has manipulated preparation time in a relatively simple 'high-or-low' manner 17,20 , or based on spontaneous variations in 'voluntarily' selected reaction times 29 . The forced-response paradigm used here allowed us to measure the temporal dynamics of these effects at far greater resolution; by assessing responses across a continuum, we were able to track the dynamically evolving competition between habitual and goal-directed selection processes.
The behavior we observed was consistent with a model which assumed that responses were selected at a random time following stimulus onset. Practice reduced the mean time at which a response could be selected. Practice also led responses to be habitually selected.
Importantly, we suggest that selection of a response does not necessarily imply immediate expression of that response; rather, a response must be prepared and initiated separately. We have previously argued that the reaction time at which a movement is initiated is independent of preparation or selection of the required movement 24 . Participants have longer reaction times than appear necessary based on the speed-accuracy trade-off, yet also commit 'fast errors' in which they seemingly initiate movement before selecting the correct response. A similar separation between selection and initiation was particularly apparent when participants attempted to learn the revised stimulus-response relationship during the criterion training block in Experiment 2. Having practiced for 20 days, participants tended to respond rapidly, perhaps through a habitual tendency to respond at short reaction times 25 . Notably, these participants were more likely to express the previously practiced response, due to it having been habitually selected.

Skills, Habits, and Automaticity
Both skills and habits are related to the notion of automaticity. Definitions of automaticity vary, but it is typically thought to involve improvements in skill, the obligatory enactment of a skill, and the ability to perform a skill with little or no conscious deliberation. Our results support links between habitual selection and automatic behavior; participants habitually chose the previously selected response, despite consciously attempting to select the revised response.
There is a long-standing debate regarding whether automaticity is a continuous 30 or discrete 31 process. Our finding that habitual selection is all-or-nothing supports the idea that automaticity might be discrete. However, we suggest that the speed of response selection and whether or not selection is habitual both contribute to common measures of automaticity.
Although the notion of pre-computation, or caching, of stimulus-response associations seemed to provide a plausible link between skills and habits, our data did not support this idea.
Participants did not achieve more rapid response selection by becoming more habitual. The exact relationship between skills, habits and automaticity remains uncertain. However, other recent findings support the idea that skill can vary independently of habit and automaticity.
Deliberate (model-based) control can end up leading to habitual behavior 32 , while goal-directed behavior can become expressible rapidly and automatically through practice 33 . The computational basis of faster response selection, habitual response selection, and automaticity remains to be precisely determined.
Recognizing that skill acquisition and habit formation are distinct processes has significant implications for studying the neural substrates of skills, habits and automaticity.
Previous research has failed to achieve any clear consensus on the neural basis of automaticity, proposing that automaticity arises either through increases in network efficiency 3,34 , or through discrete shifts in the brain regions that control behavior 5,35 , either within the basal ganglia 5 , within cortex 36 , or from the cortex to the cerebellum 3 . We propose these differing conclusions arise because the tasks they employ all involve practice, but their behavioral assays focus on only a single measure of performance. Separately measuring skill level and the extent to which behavior is habitual could therefore considerably enhance our understanding of the neural basis of performance improvement through practice.

General Procedures
The task involved responding to the appearance of one of four stimuli (letters of the Phoenician alphabet) by pushing a specific key on a computer keyboard with the index, middle, ring, or little finger of the dominant hand. The stimulus corresponding to each response was counterbalanced across participants, controlling for potential effects whereby participants would find some stimuli easier to recognize and learn to respond to than others. As Experiment 1 comprised two conditions and used a within-subjects design, we employed two sets of distinct stimuli (see Supplementary Figure 2), and counterbalanced the condition to which they corresponded across participants. Participants in Experiment 1 also completed the two conditions in a counterbalanced order.
Participants attempted to respond to stimuli in training, criterion test, or forced response trial blocks:

Training blocks
During training, participants completed a gamified task in which they attempted to complete blocks of 100 reaction-time-based trials as quickly as possible (see Figure 1c). In each trial a stimulus appeared in the center of the screen, and a tone played to signal to the participant that a trial had started. On correct responses a pleasant auditory tone sounded, and after a 300 ms delay the task advanced to the next trial. Errors were punished with an auditory buzzer sound and an enforced delay of 1000 ms, after which the participant could once again respond to the same stimulus; this process repeated until the correct response was provided, at which point the task progressed to the next trial. At the end of each block participants received feedback on the time taken to complete the block, and how this compared to their 'personal best' block completion time. Participants were encouraged to improve their performance by aiming to beat their personal best time each time they completed the task.

Criterion test blocks
We assessed the ability of participants to learn new, established, or revised stimulus-response associations using criterion test blocks. Participants were instructed that reaction time constraints were removed, that their goal was to learn the correct set of stimulus-response associations, and that the block would end once they had made enough correct responses in a row. These blocks ended once participants had made five consecutive correct responses to each stimulus (minimum of 20 trials), and the number of trials required to reach this steady, high-accuracy criterion was recorded.
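The stopping rule for criterion test blocks can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' task code: the function name `trials_to_criterion` and the trial format are hypothetical, and the sketch assumes that streaks are tracked separately for each stimulus, so that an error resets only the streak of the stimulus on which it occurred.

```python
def trials_to_criterion(trials, stimuli, n_consecutive=5):
    """Return the 1-based trial count at which the criterion is met, or None.

    trials  : iterable of (stimulus, correct) pairs in presentation order
    stimuli : the set of stimuli that must each reach the criterion
    """
    # Assumption: one running streak of correct responses per stimulus.
    streak = {s: 0 for s in stimuli}
    for count, (stimulus, correct) in enumerate(trials, start=1):
        streak[stimulus] = streak[stimulus] + 1 if correct else 0
        if all(run >= n_consecutive for run in streak.values()):
            return count
    return None  # criterion never reached in the supplied trials


# A participant who never errs reaches the criterion in the minimum 20 trials.
perfect = [(s, True) for _ in range(5) for s in "abcd"]
print(trials_to_criterion(perfect, "abcd"))
```

Under this reading, four stimuli with five consecutive correct responses each give the stated minimum of 20 trials.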

Forced response blocks
We used forced-response trials to probe the speed of response selection and to assess whether participants habitually selected their responses. Each block comprised 100 trials. In each trial the participant heard a series of four tones, spaced 400 ms apart, and was instructed to synchronize their response with the onset of the fourth and final tone. The stimulus appeared at a random time during the series of tones, effectively controlling the time in which participants could prepare their response. As such, in cases in which participants did not have a chance to process the stimulus (e.g. when it appeared less than ~300 ms before the deadline of the fourth tone), they were essentially forced to guess the correct response (and thus had a 1 in 4 chance of selecting the correct answer).
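The timing of a forced-response trial can be illustrated with a short Python sketch. It assumes that stimulus onset is measured from the first tone, so that the final tone falls at 3 x 400 = 1200 ms; the names `allowed_preparation_time` and `is_effectively_guess` are hypothetical, and the ~300 ms processing threshold is taken from the approximate figure in the text. Note that the analysis used the measured time between stimulus and response, so the value computed here is only the nominal preparation time.

```python
TONE_INTERVAL_MS = 400
N_TONES = 4
# The final tone falls three intervals after the first tone (1200 ms).
RESPONSE_DEADLINE_MS = TONE_INTERVAL_MS * (N_TONES - 1)


def allowed_preparation_time(stimulus_onset_ms):
    """Nominal time (ms) between stimulus onset and the final tone."""
    if not 0 <= stimulus_onset_ms <= RESPONSE_DEADLINE_MS:
        raise ValueError("stimulus must appear during the tone sequence")
    return RESPONSE_DEADLINE_MS - stimulus_onset_ms


def is_effectively_guess(prep_time_ms, min_processing_ms=300.0):
    """True if the preparation time is too short to process the stimulus.

    The 300 ms default is the approximate threshold quoted in the text.
    """
    return prep_time_ms < min_processing_ms


# A stimulus shown 1000 ms after the first tone leaves 200 ms to prepare.
prep = allowed_preparation_time(1000)
print(prep, is_effectively_guess(prep))
```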

Experiment 1
In Experiment 1 participants completed a counterbalanced, crossover design comprising two conditions. Both conditions began with a warm up/familiarization task. Participants On a separate day after all training sessions were complete, participants were exposed to the same assessment as in Experiment 1; they learned a revised set of stimulus-response associations in a criterion test block, and their performance on this new mapping was then probed in 5x100 trial blocks of forced-response trials.

Reaction time trials
Performance for each block was measured by taking the median reaction time (measured from stimulus onset to response onset) for correct trials, the median absolute deviation of the reaction time (a measure of variability analogous to the variance, but computed using medians rather than means, and thus more appropriate for reaction time data), and by calculating the error rate for each block (i.e. number of erroneous responses provided in each block; note that it was possible for participants to make multiple errors in the same trial, as the trial did not advance until the participant provided the correct answer).
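The three block-level measures can be computed as in the following Python sketch. The function name `summarize_block` and the data layout are hypothetical. The sketch assumes that the median absolute deviation, like the median, is computed over correct responses only, which the text does not state explicitly.

```python
from statistics import median


def summarize_block(responses):
    """Summarize one block of reaction-time trials.

    responses : list of (reaction_time_ms, correct) pairs, one per response
                (a trial may contribute several responses if errors occurred)
    Returns (median RT, median absolute deviation, number of errors).
    """
    correct_rts = [rt for rt, correct in responses if correct]
    median_rt = median(correct_rts)
    # Median absolute deviation: median distance of each RT from the median.
    mad_rt = median([abs(rt - median_rt) for rt in correct_rts])
    n_errors = sum(1 for _, correct in responses if not correct)
    return median_rt, mad_rt, n_errors


# Illustrative (made-up) responses: four correct and one error.
block = [(400, True), (500, True), (450, True), (300, False), (600, True)]
print(summarize_block(block))
```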

Criterion test trials
Criterion test trials were primarily analyzed by counting the number of trials required for a participant to make five consecutive correct responses to each stimulus. The reaction time for each response was recorded (although participants were made aware that there were no reaction time requirements for these trials).

Forced response trials
Preparation times were calculated as the time between the presentation of the stimulus and the first response that the participant made to it. Data were used to examine the likelihood of three types of response: correct responses to consistently mapped stimuli (i.e. stimuli for which the same key press was required throughout the experiment), correct responses to the revised associations, and responses consistent with the original mapping. We employed a sliding window approach to visualize the time-varying likelihood for each of these trial types and response types; responses were binned over 100 ms windows, and the proportion of correct vs incorrect responses was calculated and recorded for the center of each window.
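A sliding-window estimate of this kind can be sketched as follows. The 100 ms window width comes from the text; the step between window centers, the 0-1200 ms range, and the name `sliding_window_proportion` are assumptions made for illustration.

```python
def sliding_window_proportion(prep_times_ms, is_target, width_ms=100.0,
                              step_ms=10.0, t_min=0.0, t_max=1200.0):
    """Proportion of target responses within windows of preparation time.

    prep_times_ms : preparation time of each trial
    is_target     : True where the response was of the type being counted
    Returns a list of (window_center_ms, proportion); proportion is None
    for windows that contain no trials.
    """
    half = width_ms / 2.0
    n_windows = int((t_max - t_min - width_ms) / step_ms) + 1
    result = []
    for i in range(n_windows):
        center = t_min + half + i * step_ms
        in_window = [hit for t, hit in zip(prep_times_ms, is_target)
                     if abs(t - center) <= half]
        proportion = sum(in_window) / len(in_window) if in_window else None
        result.append((center, proportion))
    return result


# Illustrative (made-up) data with two non-overlapping 100 ms windows.
times = [10, 40, 90, 150]
hits = [True, False, True, True]
print(sliding_window_proportion(times, hits, step_ms=100.0, t_max=200.0))
```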

Response Selection Model
We developed a simple model to quantify participants' performance and assess the relative effects of practice on the speed of response selection and whether or not response selection became habitual. We assumed that, for a given mapping A, the correct response would be selected at a random time $T_A \sim \mathcal{N}(\mu_A, \sigma_A^2)$. Responses generated prior to $T_A$ would be random, while responses generated later than $T_A$ would be generated correctly with probability $q_A$. The probability of observing a correct response, $r = r_A$, given that the response was generated at time $t$ is then given by

$$p(r = r_A \mid t) = \tfrac{1}{4}\,\bigl[1 - \Phi_A(t)\bigr] + q_A\,\Phi_A(t),$$

where $\Phi_A(t)$ is the cumulative distribution of $T_A$. Likewise, the probability of generating any other response is given by

$$p(r = r' \mid t) = \tfrac{1}{4}\,\bigl[1 - \Phi_A(t)\bigr] + \frac{1 - q_A}{3}\,\Phi_A(t), \qquad r' \neq r_A,$$

assuming that all errors after $T_A$ would be uniformly distributed across other responses.
The speed of response selection, which gives rise to the observed speed-accuracy trade-off, is therefore represented by the parameters $\mu_A$ and $\sigma_A$.
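The speed-accuracy trade-off implied by this model can be sketched numerically. The snippet below assumes a normally distributed selection time and a uniform 1-in-4 chance of a correct response before selection, as described in the text; the function names and the parameter values in the example are hypothetical and are not fitted values from the study.

```python
import math


def normal_cdf(t, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def p_correct(t, mu_a, sigma_a, q_a, n_responses=4):
    """Probability of a correct response generated at preparation time t.

    Before selection (t < T_A) the response is a uniform guess; after
    selection it is correct with probability q_a.
    """
    selected = normal_cdf(t, mu_a, sigma_a)  # P(T_A <= t)
    return (1.0 - selected) / n_responses + selected * q_a


# Hypothetical parameters: mean selection time 400 ms, s.d. 50 ms, q_A = 0.95.
for t in (0, 300, 400, 500, 1200):
    print(t, round(p_correct(t, 400.0, 50.0, 0.95), 3))
```

In this sketch, a reduction in `mu_a` shifts the curve toward shorter preparation times and a reduction in `sigma_a` steepens it, which corresponds to the shift and narrowing by which the model accounts for improvements in selection speed.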
To model the impact of habitual selection when exposed to remapped stimuli, we modeled each learned response, A and B, through analogous processes, i.e. we assumed that response A could be selected at some random time $T_A$, and that response B became available at some stochastic time $T_B$ after stimulus presentation. The probability of a given response being generated depended on which events (selection of A; selection of B) had occurred by the time of response initiation:

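$$\begin{aligned}
p(r \mid t) ={} & p(r \mid t < T_A,\, t < T_B)\; p(t < T_A,\, t < T_B) \\
& + p(r \mid t \geq T_A,\, t < T_B)\; p(t \geq T_A,\, t < T_B) \\
& + p(r \mid t < T_A,\, t \geq T_B)\; p(t < T_A,\, t \geq T_B) \\
& + p(r \mid t \geq T_A,\, t \geq T_B)\; p(t \geq T_A,\, t \geq T_B)
\end{aligned}$$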
Since participants were instructed to act according to mapping B, we assumed that if the response associated with mapping B was available, then participants would generate it (with probability $q_B$). If, however, response A was available but not response B, then response A would be generated. If neither response was available, participants would generate a random response. We captured the fact that random responses (before selection of A or B) might not have been selected uniformly through a parameter $q_I$. Note that since responses were pooled across both of the two remapped stimuli, and across the two non-remapped stimuli, we only needed to include a single parameter that specified the relative baseline likelihood of selecting remapped versus non-remapped responses.
The conditional probabilities were therefore given by: Note that the bottom two rows of the matrix are the same, reflecting the fact that the response probabilities after B is prepared are independent of whether or not A has been prepared.
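The competition between the two selection processes can be illustrated with a simplified Python sketch. It is not the fitted model: it assumes that $T_A$ and $T_B$ are independent and normally distributed, that guesses before either selection are uniform across the four responses (ignoring the baseline parameter $q_I$), that response A is produced with probability `q_a` when only A has been selected, and that errors are spread evenly over the remaining responses. All function names and parameter values are hypothetical.

```python
import math


def normal_cdf(t, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def response_probabilities(t, mu_a, sigma_a, q_a, mu_b, sigma_b, q_b):
    """Probability of each response type at preparation time t.

    'original' is the habitual response (mapping A), 'revised' is the
    goal-directed response (mapping B), 'other' pools the remaining two keys.
    """
    phi_a = normal_cdf(t, mu_a, sigma_a)  # P(A selected by time t)
    phi_b = normal_cdf(t, mu_b, sigma_b)  # P(B selected by time t)

    # Event probabilities, assuming independent selection times.
    p_neither = (1.0 - phi_a) * (1.0 - phi_b)
    p_a_only = phi_a * (1.0 - phi_b)
    p_b = phi_b  # once B is selected, the state of A no longer matters

    original = p_neither / 4.0 + p_a_only * q_a + p_b * (1.0 - q_b) / 3.0
    revised = p_neither / 4.0 + p_a_only * (1.0 - q_a) / 3.0 + p_b * q_b
    return {"original": original, "revised": revised,
            "other": 1.0 - original - revised}


# Hypothetical parameters: habit available at ~350 ms, revised at ~600 ms.
for t in (0, 450, 1200):
    probs = response_probabilities(t, 350.0, 50.0, 0.95, 600.0, 80.0, 0.95)
    print(t, {k: round(v, 3) for k, v in probs.items()})
```

With these illustrative values the probability of the original response starts at chance, rises at intermediate preparation times, and falls below chance at long preparation times, which is the qualitative time course described in the Results.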
We similarly fit the model by finding parameter values ($\mu_B$, $\sigma_B$, $q_B$, $q_I$) that minimized the penalized negative log-likelihood. We compared these two models by computing the Akaike information criterion, which takes into account the relative (unpenalized) likelihood of each model while also including a term which accounts for the number of parameters in the model.
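For reference, the Akaike information criterion is AIC = 2k - 2 ln L, where k is the number of free parameters and L is the maximized likelihood. The Python sketch below shows the comparison in terms of negative log-likelihoods. The function names, the parameter counts, and the likelihood values in the example are hypothetical, and the sign convention (a positive difference favoring the habit model) is an assumption about how the reported differences were computed.

```python
def aic(neg_log_likelihood, n_params):
    """Akaike information criterion from an unpenalized negative log-likelihood."""
    return 2.0 * n_params + 2.0 * neg_log_likelihood


def delta_aic(nll_no_habit, k_no_habit, nll_habit, k_habit):
    """AIC(no-habit) - AIC(habit); positive values favor the habit model."""
    return aic(nll_no_habit, k_no_habit) - aic(nll_habit, k_habit)


# Made-up values: the habit model fits better but uses two more parameters.
print(delta_aic(nll_no_habit=110.0, k_no_habit=4, nll_habit=98.0, k_habit=6))
```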
In order to describe the possibility of habitual selection that may have been only partial, we introduced a further parameter ρ which modulated the probability that A would be Note that the habit and no-habit models described above are special cases of this more general