Configural Learning depends on Task Complexity and Temporal Structure

This paper describes a set of associative learning experiments in which the appropriate response depends on multiple relevant stimuli. We vary both the complexity of the stimulus-response mapping (task) and the temporal structure of the stimuli that are presented. We find that both of these manipulations affect the accuracy with which the task can be learnt, and that task complexity affects the proportion of subjects who correctly provide declarative knowledge of the underlying association. Computational modelling of subjects’ behaviour, based on Dynamic Logistic Regression models, allowed us to probe the strategies that subjects employed during learning. We found that the majority of subjects employed a configural learning strategy during the complex task and a mixed configural/rule-based strategy during the simpler task. Computational modelling also provided an entropy-based index of strategy exploration, with greater exploration observed during the complex task.


Introduction
Associative learning of stimulus-response mappings can proceed using a variety of cognitive strategies, neuronal representations, and decision making systems (Domenech & Koechlin, 2015). This paper considers a learning context in which there are multiple relevant stimuli and the appropriate response depends on the pattern of stimuli that are presented. This type of learning has previously been studied, for example, using the Weather Prediction Task (WPT) (Knowlton, Squire, & Gluck, 1994). The multiple systems that are engaged during this task include a medial temporal lobe representational system and a habitual feedback-based fronto-striatal decision system (Poldrack et al., 2001).
In a recent study Duncan, Doll, Daw, and Shohamy (2018) created two WPT-based learning tasks. The first required an 'elemental' strategy. Here, one learns the log-odds with which the constituent elements of a pattern determine the outcome. The log-odds of the outcome are then given by the sum of the log-odds of the constituent elements (the "whole being the sum of its parts"). The second task required a 'configural' strategy where the log-odds of the outcome were specific to each pattern (or 'configuration'). They found that most subjects used a configural strategy when required (and that it relied on conjunctive representations in anterior hippocampus).
Interestingly, approximately half of subjects used a configural strategy when the simpler elemental strategy would have sufficed. This suggests that configural learning is automatically engaged and rather effortless.
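The distinction between the two strategies can be illustrated with a minimal sketch. The cue names and log-odds values below are hypothetical and are not taken from Duncan et al. (2018); the sketch only shows that an elemental learner sums per-cue log-odds, whereas a configural learner stores one value per pattern.

```python
import math

def sigmoid(z):
    """Convert log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical per-cue log-odds of "Sun" (elemental representation).
cue_log_odds = {"A": 1.0, "B": -0.5, "C": 0.25}

def elemental_log_odds(pattern):
    # The whole is the sum of its parts.
    return sum(cue_log_odds[cue] for cue in pattern)

# Hypothetical per-pattern log-odds of "Sun" (configural representation).
pattern_log_odds = {("A", "B"): -2.0, ("A", "C"): 1.5, ("B", "C"): 0.0}

def configural_log_odds(pattern):
    # Each configuration has its own value, unrelated to its parts.
    return pattern_log_odds[tuple(pattern)]

p_elem = sigmoid(elemental_log_odds(("A", "B")))   # log-odds 1.0 - 0.5 = 0.5
p_conf = sigmoid(configural_log_odds(("A", "B")))  # log-odds -2.0
```

With these illustrative numbers the two representations disagree about pattern ("A", "B"): the elemental learner favours Sun and the configural learner favours Rain.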
In this paper we describe a new set of WPT-based learning tasks which are designed to probe the characteristics of this configural learning system. This work is motivated by the concern that both configural and elemental strategies are rather limited as general computational strategies. Firstly, elemental decision making makes the limiting assumption that the log-odds of the outcome is a linear function of the stimuli. A fully configural learning system, on the other hand, allows for mappings with arbitrarily complex non-linearities, but this does not scale up because the number of configurations grows exponentially with the number of stimuli.
We therefore designed our WPT tasks to have true stimulus-response mappings that were nonlinear but had intermediate levels of complexity (between elemental and configural). Additionally, although the tasks were probabilistic, each could be described using a (simple or complex) verbal rule (Ballard, Miller, Piantadosi, & Goodman, 2017). We hypothesized that subjects would abandon a configural strategy in favour of a rule-based strategy, especially for the simpler task. We also predicted that the rule subjects implicitly learnt would at some stage become explicit so that subjects would be able to declare the rule they were using.
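The scaling concern can be made concrete with a short sketch. It assumes, purely for illustration, that each stimulus position can take one of five values and that an elemental learner keeps one weight per position and value; neither assumption is a claim about how any particular model in the literature is parameterised.

```python
def n_configurations(n_stimuli, n_values=5):
    """Number of distinct patterns a fully configural learner must store."""
    return n_values ** n_stimuli

def n_elemental_weights(n_stimuli, n_values=5):
    """Number of weights under an assumed one-weight-per-position-and-value coding."""
    return n_stimuli * n_values

for n in range(1, 6):
    print(n, n_elemental_weights(n), n_configurations(n))
```

With two stimuli of five values each, the configural count is 25, and each additional stimulus multiplies it by five while the elemental count grows only linearly.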
The experiments in Duncan et al. (2018) used only 6 configurations. We scale this up to 25 in this paper so as to push the limits of the configural learning system. Another unrealistic feature of WPT concerns the temporal structure of the stimuli. In the standard WPT the probability that any particular pattern is presented on any given trial follows a uniform distribution. The patterns that appear to our sensory cortices, however, have a good deal of temporal regularity and we hypothesized that this could affect both the learning strategy implemented and the accuracy with which the task could be performed. We therefore manipulated the temporal structure of the stimuli in our experiments and hypothesized this would also affect declarative knowledge.

Methods
60 participants performed a WPT-based learning task in which they used feedback to learn which weather outcome was associated with which pattern of visual cues. On each trial two geometrical shapes, each having three, four, five, six or seven sides, were presented. There are thus K = 25 unique patterns, indexed k = 1, ..., 25. Participants had to classify the pattern with a weather outcome, Sun or Rain (procedure in Figure 1 panel A). We assessed whether participants had learnt the task implicitly or explicitly by asking them about their knowledge after each task had been completed.

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0
We manipulated task difficulty (within-subjects factor), and the temporal structure of the cues (between-subjects factor). Task difficulty varied over two levels such that a "simple task" could be described using a single logical clause whereas a "complex task" could only be described using multiple logical clauses. The probabilistic structure of these two tasks was specified by making the log-odds of the outcome a quadratic function of stimulus characteristics. Flipping the sign of a single parameter in this mapping produced either the simple or complex rule as shown in Figure 1 panel B by the signed dependency structure.
Cues were presented with three different temporal structures. The first was generated as is standard in the WPT such that the probability of the kth cue pattern occurring was uniform over trials, $p(u_t = k) = 1/T$, where $T = 250$ is the number of trials per task. The second structure was generated to create a "blocky" design such that the probability of the kth cue pattern occurring was concentrated within an interval. We refer to this as an 'Interval' structure by analogy with interval training in physical exercise. The third temporal structure was a mixture of the first two. The first 175 trials were generated from an interval distribution, the last 75 from a uniform one. We conceived of this last section of trials as a revision period in which knowledge could be created or consolidated.
The 60 subjects were split into 20 who were exposed to a uniform structure, 20 to an interval structure and 20 to a mixed structure. Each subject learnt both simple and complex tasks. For the simple task the subject should decide "Sun" if the number of sides of the two geometrical shapes was the same. For the complex task the subject should decide "Sun" if the total number of sides was equal to ten.
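The three temporal structures can be sketched as follows. The exact interval distribution is not given in the text, so the blocking scheme here (hypothetical groups of five patterns, each confined to one contiguous block of trials) is an assumption made only for illustration, as are the function names.

```python
import random

K, T = 25, 250  # number of patterns and trials per task (from the text)

def uniform_sequence(n_trials, rng):
    # Every pattern is equally likely on every trial.
    return [rng.randrange(K) for _ in range(n_trials)]

def interval_sequence(n_trials, rng, group_size=5):
    # ASSUMPTION: patterns are split into groups and each group is
    # presented only within its own contiguous block of trials.
    order = list(range(K))
    rng.shuffle(order)
    groups = [order[i:i + group_size] for i in range(0, K, group_size)]
    block_len = n_trials // len(groups)
    seq = []
    for group in groups:
        seq.extend(rng.choice(group) for _ in range(block_len))
    while len(seq) < n_trials:  # pad any remainder from the last group
        seq.append(rng.choice(groups[-1]))
    return seq

def mixed_sequence(rng, n_interval=175, n_uniform=75):
    # Interval trials followed by a uniform "revision" period.
    return interval_sequence(n_interval, rng) + uniform_sequence(n_uniform, rng)

seq = mixed_sequence(random.Random(0))
```

Under this assumed scheme, the first 35 trials of the mixed sequence contain at most five distinct patterns, whereas the final 75 trials can contain any of the 25.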
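The two rules can be checked against the full set of patterns with a short sketch. It assumes that the two shapes are treated as an ordered pair, which is consistent with there being 25 patterns; the function names are illustrative. Note that the actual tasks are probabilistic, so the rules give the response that should be chosen rather than a guaranteed outcome.

```python
from itertools import product

SIDES = (3, 4, 5, 6, 7)
patterns = list(product(SIDES, repeat=2))  # 25 ordered pairs of shapes

def simple_rule(a, b):
    """Simple task: 'Sun' if both shapes have the same number of sides."""
    return a == b

def complex_rule(a, b):
    """Complex task: 'Sun' if the total number of sides equals ten."""
    return a + b == 10

sun_simple = [p for p in patterns if simple_rule(*p)]
sun_complex = [p for p in patterns if complex_rule(*p)]
```

Under this enumeration each rule is satisfied by 5 of the 25 patterns, and the only pattern satisfying both is (5, 5).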

Computational Model
Subjects' behaviour was modelled using a Dynamic Logistic Regression (DLR) framework (Speekenbrink, Channon, & Shanks, 2008) in which the outcome on each trial, $y_t \in \{1, 0\}$ for {Sun, Rain}, was modelled as a logistic function of trial-wise activations, where λ is a subject-specific decision noise parameter. The activations are a linear combination of the regressors $x_t$, weighted by the regression coefficients. The regression coefficients were estimated online using gradient ascent, where α is a subject-specific learning rate.
A Configural model (m = 1) is specified by choosing $x_t$ to be a [K×1] binary vector with kth entry equal to 1 if the kth pattern was presented on trial t. This model therefore has 25 regression coefficients.
A Rule model (m = 2) is specified by choosing $x_t$ to be a two-element binary vector with entries [1, 0] if condition "C" is met, otherwise $x_t$ = [0, 1]. Here "C" is either "The two shapes have the same number of sides" (Task 1) or "The sum of the sides of the two shapes is 10" (Task 2). This model therefore has 2 regression coefficients.
A Mixed model (m = 3) is a mix between the rule and the configural model. If the rule condition is met, $x_t$ = [1; zeros(20, 1)] (in Matlab notation). If the rule condition is not met, $x_t$ is set to the all-zeros vector but with a 1 in position k + 1 if pattern k was presented, where k here indexes the 20 patterns that do not satisfy the rule. This model therefore has 21 regression coefficients.
Subject-specific parameters, θ = {λ, α}, were estimated using Bayesian inference, where the log likelihood of subject decisions, $A = \{a_1, ..., a_t, ..., a_T\}$, is given by the sum over trials of the log probability of each decision under the model, and Gaussian priors were used over logit functions of parameters (Mathys, Daunizeau, Friston, & Stephan, 2011) to define prior densities, p(θ|m). A Laplace approximation was used to compute the evidence for each model, p(A|m), which facilitated Bayesian Model Comparison.
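Because the model equations are not reproduced in this text, the following is only a sketch of one common form of online logistic regression. It assumes that λ scales the activation inside the logistic function and that the update is the gradient of the trial-wise log likelihood multiplied by α; the function name `dlr_step` and this parameterisation are assumptions, not the authors' specification.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dlr_step(w, x, y, lam, alpha):
    """One trial of an assumed online logistic regression.

    w: current regression coefficients, x: regressors for this trial,
    y: outcome (1 = Sun, 0 = Rain), lam: decision noise parameter,
    alpha: learning rate. Returns the predicted P(Sun) and updated weights.
    """
    v = sum(wi * xi for wi, xi in zip(w, x))       # activation (linear in x)
    p = sigmoid(lam * v)                           # predicted P(y = 1)
    grad = [lam * (y - p) * xi for xi in x]        # gradient of log likelihood
    w_new = [wi + alpha * gi for wi, gi in zip(w, grad)]
    return p, w_new

# Usage: a two-coefficient model starting from zero weights.
p, w = dlr_step([0.0, 0.0], [1, 0], y=1, lam=1.0, alpha=0.5)
```

In this example the initial prediction is 0.5, and after observing Sun the weight on the active regressor moves from 0 to 0.25 while the inactive one is unchanged.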
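The regressor vectors for the three models can be sketched as below. The ordering of patterns and the assumption that, in the mixed model, the non-rule patterns are indexed 1 to 20 in enumeration order are illustrative choices; only the vector lengths (25, 2 and 21) come from the text.

```python
from itertools import product

SIDES = (3, 4, 5, 6, 7)
PATTERNS = list(product(SIDES, repeat=2))  # assumed ordering of the 25 patterns

def same_sides(a, b):
    return a == b

def configural_x(pattern):
    x = [0] * 25
    x[PATTERNS.index(pattern)] = 1          # one coefficient per pattern
    return x

def rule_x(pattern, rule):
    return [1, 0] if rule(*pattern) else [0, 1]

def mixed_x(pattern, rule):
    x = [0] * 21
    if rule(*pattern):
        x[0] = 1                            # shared coefficient for rule patterns
    else:
        # ASSUMPTION: non-rule patterns are indexed in enumeration order.
        non_rule = [p for p in PATTERNS if not rule(*p)]
        x[1 + non_rule.index(pattern)] = 1  # own coefficient otherwise
    return x
```

For example, under the simple-task rule, `mixed_x((3, 3), same_sides)` activates the first entry, whereas `mixed_x((3, 4), same_sides)` activates the second.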
We also computed a running estimate of the model probabilities as each task progressed. This assumed a flat prior over models at t = 0, that is $p(m_0 = i) = 1/3$, indicating subjects had no initial preference for the configural, rule or mixed models, which was then updated recursively using Bayes' rule

$$p(m_t = i) = \frac{p(a_t \mid m_{t-1} = i)\, p(m_{t-1} = i)}{\sum_j p(a_t \mid m_{t-1} = j)\, p(m_{t-1} = j)}$$
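The recursive update can be sketched in a few lines. The per-trial likelihood values used here are made up for illustration; in practice they would be the probabilities each fitted model assigns to the subject's decision on that trial.

```python
def update_model_probs(prior, likelihoods):
    """One step of Bayes' rule over models.

    prior: p(m_{t-1} = i) for each model i.
    likelihoods: p(a_t | m_{t-1} = i) for each model i.
    """
    joint = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(joint)
    return [j / total for j in joint]

# Flat prior over the configural, rule and mixed models.
probs = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical likelihoods of the observed decisions on two trials.
for lik in [(0.6, 0.5, 0.7), (0.4, 0.5, 0.8)]:
    probs = update_model_probs(probs, lik)
```

After these two illustrative trials the third model has the highest probability, because it assigned the highest likelihood to both decisions.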

Finally, we computed the entropy over this distribution, $h_t = -\sum_i p(m_t = i) \log p(m_t = i)$ and $H = \frac{1}{T} \sum_t h_t$, where $h_t$ is a trial-by-trial measure and $H$ is the average entropy over a task, which we use as an index of strategy exploration.
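A minimal sketch of the entropy index follows, assuming the standard Shannon entropy with natural logarithms (the base is not stated in the text) and treating terms with zero probability as contributing zero.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a distribution over models."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mean_entropy(prob_history):
    """Average of the trial-by-trial entropies over a task."""
    return sum(entropy(p) for p in prob_history) / len(prob_history)

# Hypothetical trajectory: from a flat prior towards certainty in one model.
history = [[1 / 3, 1 / 3, 1 / 3], [0.2, 0.2, 0.6], [0.0, 0.0, 1.0]]
H = mean_entropy(history)
```

Entropy is maximal (log 3) when all three models are equally probable and zero when a single model has probability one, so a higher average indicates that no single strategy dominated for long.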

Declarative Knowledge
For the simple task 28 out of 60 subjects declared the correct strategy whereas only 1 out of 60 did for the complex task. This significant difference (χ² = 33.14, p < 0.001) confirms that our difficulty manipulation was successful. There was no significant variation in number of declarations as a function of Temporal Structure (see Table 1).
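As an arithmetic check, the reported statistic can be approximately reproduced with a Pearson chi-square on the 2×2 table of declared versus not-declared counts, without continuity correction. Whether this is exactly the test that was run is an assumption; note also that this calculation treats the two tasks as independent samples even though the same subjects performed both.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Simple task: 28 declared, 32 did not. Complex task: 1 declared, 59 did not.
chi2 = chi_square_2x2(28, 32, 1, 59)
print(round(chi2, 2))
```

This gives a value of roughly 33.15, in line with the reported statistic.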

Overall Accuracy
We define overall accuracy as the correct rate averaged over all 250 trials. We ran a two-way mixed design ANOVA with dependent variable overall accuracy and factors of (i) difficulty and (ii) temporal structure. This revealed significant main effects of both factors, with no significant interaction (Difficulty, F(1) = 21.576, p < 0.001; Temporal structure, F(2) = 3.908, p = 0.025). See Table 2 for full results. Participants performed better in the simple task than in the complex one (see Figure 2 A). Also, participants performed differently based on the temporal structure they were presented with (see Figure 2 B). These significant differences show that our manipulations were successful. Tests of simple effects showed that accuracy was higher in the interval than uniform condition for the complex task (t(19) = 3.076, p = 0.006) but not for the simple task. Participants who explicitly declared the strategy performed better (t(58) = 2.282, p = 0.026).

Inferred Strategy
Bayesian model comparison revealed that during the simple task, most of the participants used the mixed strategy (34/60), only 10/60 used the configural strategy and 16/60 the rule strategy. Conversely, during the complex task 48 participants used the configural strategy, 11 the rule strategy and just a single participant the mixed strategy. Tables 3 and 4 give a breakdown of the inferred strategy as a function of temporal structure.

Table 2: Accuracy. Mean accuracy per task difficulty and temporal structure. Standard error of the mean in brackets.

Strategy Exploration
A two-way mixed ANOVA with temporal structure as a between-subjects factor and task difficulty as a within-subjects factor revealed significant main effects of both factors, but no interaction (Temporal structure, F(2) = 11.422, p < 0.001; Task Difficulty, F(1) = 4.239, p = 0.044). Entropy, H, is higher in the complex task than in the simple task, indicating more exploration. Furthermore, entropy is higher with the uniform temporal structure than with both the interval (t(38) = 4.777, p < 0.001) and the mixed structures (t(38) = 2.337, p = 0.024). Entropy is also higher with the mixed structure than with the interval structure (t(38) = 2.485, p = 0.017), as shown in Figure 2 C.

Discussion
We designed this study to investigate why and in which circumstances participants are pushed to drop a configural strategy and to explore and engage with alternative strategies. We designed a task where participants were not directly instructed to find a rule, but to learn the association between stimuli and outcomes. Participants had the possibility to explore different alternative strategies and use the one they preferred. Given the number of combinations among our stimuli (twenty-five), a configural strategy was cognitively expensive, so the search for another strategy was implicitly incentivized. Strategy exploration comes with a cost though, so the trade-off was between a cognitively expensive configural strategy and cognitively expensive strategy exploration.
We found that a high proportion of subjects correctly declared the underlying rule for the simple task (28/60) but not for the complex task (1/60), thus validating our task design. Participants who explicitly declared the strategy performed significantly better than those who did not.
As expected, a larger number of configurations than in Duncan et al. (twenty-five versus six) pushed participants beyond the configural strategy. For the simple task the majority of subjects used the mixed configural/rule-based strategy. For the complex task, however, subjects continued to use the configural strategy. Strategy exploration was higher for the complex task, presumably because no explicit rule could be found.
We found main effects of temporal structure and task complexity on accuracy, but no interaction. This implies that the effect of temporal structure does not depend on difficulty. This may be a power issue, however, as tests for simple effects (interval versus uniform) were significant for the complex task but not for the simple task.
We had hypothesized that temporal structure would have an effect on declarative knowledge. Specifically, we expected that a mixed temporal structure (with focussed intervals followed by a revision period) would result in more subjects developing explicit knowledge. However, this turned out not to be the case.
Our last set of findings was that strategy exploration was higher for the complex versus simple task and higher for the uniform temporal structure that is standard in WPT. This suggests that the more naturalistic temporal structures are less confusing for subjects.
Our results complement those found by Duncan et al. (2018). They showed how humans naturally learn relationships between outcomes and configurations of stimuli, and we have shown the contextual constraints on these behaviors. Our findings provide a foundation for further behavioural and neuroimaging studies. For example, we will be able to use the Simple Task to study how explicit knowledge is derived from configural and mixed configural/rule-based strategies. Similarly, we can use the Complex Task to study purely implicit learning and use contrasts with the Simple Task to identify creation and use of explicit knowledge.