Children perform extensive information gathering when it is not costly

Humans often face decisions where little is known about the choice options. Gathering information prior to making a choice is an important strategy to improve decision making under uncertainty. This is of particular importance during childhood and adolescence, when knowledge about the world is still limited. To examine how much information youths gather, we asked 107 participants (children: 8–9 years, N = 30; early adolescents: 12–13 years, N = 41; late adolescents: 16–17 years, N = 36) to perform an information sampling task. We find that children gather significantly more information before making a decision compared to adolescents, but only if it does not come with explicit costs. Using computational modelling, we find that this is because children have reduced subjective costs for gathering information. Our findings thus demonstrate how children overcome their limited knowledge and neurocognitive constraints by deploying excessive information gathering, a developmental feature that could inform aberrant information gathering in psychiatric disorders.


Computational modelling
To understand the neurocomputational mechanisms underlying information gathering, we used a model previously developed for this task. Here, we present its key features; a detailed description of the model can be found in Hauser et al. (2017b) and subsequent papers (Hauser et al., 2017a, 2018).
The computational model assumes that agents make inference about which colour is more likely to form the majority of cards, P(MY | n_y, N), with MY being a majority of yellow, given the current number of yellow cards (n_y) out of a total of N sampled cards. P(MY) is calculated using the current number of cards, thereby inferring the generative probability likely to have caused this proportion of yellow cards (for details cf. Hauser et al., 2017b). The actual generative probabilities in this task were close to 50% in 2/3 of trials (probability across all draws of one trial), and close to 70% in 1/3 of trials, similar to our previous study (Hauser et al., 2018).
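This inference can be sketched as follows. Note this is an illustration assuming a uniform Beta(1, 1) prior over the generative proportion of yellow cards; the exact prior and deck structure used in Hauser et al. (2017b) may differ.

```python
from scipy import stats

def p_majority_yellow(n_y: int, N: int) -> float:
    """P(MY | n_y, N): posterior probability that yellow forms the
    majority, assuming a uniform Beta(1, 1) prior over the unknown
    generative proportion q of yellow cards. The posterior over q is
    then Beta(1 + n_y, 1 + N - n_y), and a yellow majority
    corresponds to q > 0.5."""
    return stats.beta(1 + n_y, 1 + (N - n_y)).sf(0.5)
```

For example, with no cards opened the belief is at chance, and it sharpens as more same-coloured cards are revealed.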
This belief is then used to calculate the value of taking the decision to declare 'yellow' or 'blue' (Q-value), weighting both potential gains and losses by the inferred probability of them taking place, with Rcor and Rinc being the potential wins and losses (set to +100 / -100 here; for discussion cf. Hauser et al., 2018).
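Written out (a reconstruction from the description above; Hauser et al., 2017b give the original formulation), the decision values are:

```latex
Q(\mathrm{yellow}) = P(MY \mid n_y, N)\, R_{\mathrm{cor}} + \bigl(1 - P(MY \mid n_y, N)\bigr)\, R_{\mathrm{inc}}
Q(\mathrm{blue})   = \bigl(1 - P(MY \mid n_y, N)\bigr)\, R_{\mathrm{cor}} + P(MY \mid n_y, N)\, R_{\mathrm{inc}}
```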
A more challenging computation is the estimation of the action value for not deciding (Q(ND)). This is computed as the sum of the values of the future states (V(s'), a weighted sum of the Q values in that state), weighted by how likely they will materialise (P(s'|n_y,N); i.e. how likely I am to end up in that state given the cards I have opened so far). In addition, a cost per step c is incurred, which accounts for the subjective costs of sampling more information and thus spending more effort, time, and points (in the decreasing condition) on gathering information.
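The backward induction over future states can be sketched as follows. This is a simplified illustration, not the exact implementation of Hauser et al. (2017b): it assumes a Beta(1, 1) prior for the inference, a softmax-weighted state value V(s'), and a constant per-step cost c.

```python
import numpy as np
from scipy import stats

R_COR, R_INC = 100.0, -100.0  # win / loss for correct / incorrect guess

def p_majority_yellow(n_y, N):
    # P(q > 0.5) under a Beta(1 + n_y, 1 + N - n_y) posterior
    return stats.beta(1 + n_y, 1 + (N - n_y)).sf(0.5)

def values(n_y, N, max_samples, cost, tau):
    """Backward-inducted action values at state (n_y, N).

    Returns (Q_yellow, Q_blue, Q_nd); Q_nd is -inf at the sampling
    horizon, where deciding is the only remaining option."""
    p_my = p_majority_yellow(n_y, N)
    q_yellow = p_my * R_COR + (1 - p_my) * R_INC
    q_blue = (1 - p_my) * R_COR + p_my * R_INC
    if N == max_samples:
        return q_yellow, q_blue, -np.inf
    # posterior predictive probability that the next card is yellow
    p_next_yellow = (1 + n_y) / (2 + N)
    v_if_yellow = state_value(n_y + 1, N + 1, max_samples, cost, tau)
    v_if_blue = state_value(n_y, N + 1, max_samples, cost, tau)
    # value of sampling once more: cost plus expected future value
    q_nd = -cost + p_next_yellow * v_if_yellow + (1 - p_next_yellow) * v_if_blue
    return q_yellow, q_blue, q_nd

def state_value(n_y, N, max_samples, cost, tau):
    """V(s): softmax-weighted sum of the Q values in state s."""
    qs = np.array(values(n_y, N, max_samples, cost, tau))
    finite = qs[np.isfinite(qs)]
    w = np.exp((finite - finite.max()) / tau)
    w /= w.sum()
    return float(np.sum(w * finite))
```

With zero cost, sampling has positive value at an uninformative state (information is free), whereas a large cost makes Q(ND) negative, pushing the agent to decide early.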
In accordance with our previous studies (Hauser et al., 2017a, 2017b, 2018), we found that these subjective costs did not follow the explicit costs (i.e. 0 in the fixed, -10 points in the decreasing condition; the 'objective' model), nor did they follow a linear increase (the 'linear' model). Rather, the costs followed a non-linear form (modelled as a sigmoid) in which the costs increase during sampling (Fig. S2). The best-performing model had two free parameters: a scaling parameter cs that determined how large the maximal costs could be, and an intercept p, which determined after how many samples (n) these costs started to escalate (also cf. Drugowitsch et al., 2012; Murphy et al., 2016).
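A minimal sketch of such a sigmoidal cost function, with the scaling parameter cs and intercept p as described above (the exact parameterisation in Hauser et al., 2017b may differ; this illustrates the shape):

```python
import numpy as np

def subjective_cost(n: int, cs: float, p: float) -> float:
    """Sigmoidal subjective sampling cost after n samples.

    cs scales the maximal cost that can be incurred, and the
    intercept p shifts the point at which costs start to escalate
    (the cost is cs/2 at n = p and saturates towards cs)."""
    return cs / (1.0 + np.exp(-(n - p)))
```

In contrast to the 'objective' (constant) and 'linear' alternatives, this form lets costs stay near zero early in sampling and escalate later.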
Lastly, the choice policy was determined using a softmax rule with decision temperature τ and an additional epsilon-greedy element (ξ) that captures choices not adequately captured by the model. Because our model was an exact partially observable Markov decision process solved by dynamic programming, the policy was used not only for choice arbitration, but also in the planning process (cf. Hauser et al., 2017b for details).
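The policy can be sketched as a softmax mixed with a uniform lapse component (an illustration of the rule described above, not the original implementation):

```python
import numpy as np

def choice_probabilities(q_values, tau: float, xi: float):
    """Softmax choice rule with decision temperature tau, mixed with
    an epsilon-greedy lapse component xi: with probability xi the
    agent chooses uniformly at random among the available actions."""
    q = np.asarray(q_values, dtype=float)
    w = np.exp((q - q.max()) / tau)  # max-subtraction for stability
    softmax = w / w.sum()
    return (1 - xi) * softmax + xi / len(q)
```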
τ is thereby not only important in arbitrating between different choice values, but also critical in the planning stages, where this parameter portrays the precision of the inference process.
A low τ means that an agent believes it makes highly precise decisions, which in turn means that it primarily evaluates the choice trajectories that it considers likely. A high τ, on the other hand, means that planning takes place in a less precise manner (cf. Hauser et al., 2016; Schwartenbeck et al., 2019). Model comparison revealed that models with separate decision temperature parameters performed better than ones with shared parameters (data not shown).

Model fitting and comparison
To determine the best-fitting model, we compared models with different forms of cost structures (cf. above). To compare their model fit, we used out-of-sample prediction with 5-fold cross-validation, assessing the predictive likelihood. This is a measure that finds an optimal balance between complexity and accuracy using held-out data (cf. Dubois et al., 2020). We compared the models across all subjects, which allows us to determine which model is best across all subjects. Model parameters were optimised using fmincon with multiple starting points to overcome local minima. In line with our previous studies, we found that a model with nonlinear costs outperformed all other models (Fig. S2a). We additionally performed model selection within each group to assess whether there were fundamental differences in how subjects performed the task (as captured by the models). We found that the same model that won across all subjects ('nonlinear w scaling') also won in each age group (Table S1).
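The cross-validation scheme can be sketched as follows. The `fit_params` and `loglik` callables are hypothetical placeholders standing in for the fmincon optimisation and the model likelihood; only the fold logic is shown.

```python
import numpy as np

def cv_predictive_loglik(trials, fit_params, loglik, n_folds=5, seed=0):
    """K-fold cross-validated predictive log-likelihood.

    trials     : list of per-trial data
    fit_params : callable(training_trials) -> fitted parameters
    loglik     : callable(params, trials) -> summed log-likelihood
    Parameters are fitted on the training folds only and evaluated
    on the held-out fold, so complexity is penalised implicitly."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(trials)), n_folds)
    total = 0.0
    for k in range(n_folds):
        test_idx = set(folds[k].tolist())
        train = [t for i, t in enumerate(trials) if i not in test_idx]
        test = [t for i, t in enumerate(trials) if i in test_idx]
        params = fit_params(train)     # fit on training folds only
        total += loglik(params, test)  # predict the held-out fold
    return total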
However, it should be noted that the differences between the models increased with age, with only very small differences between the winning and second-best model in the children. It should also be noted that model performance cannot be directly compared between subjects. This is because, in this information gathering task, the number and type of choices depend on the information gathering behaviour itself. Concretely, the more information a subject gathers, the more non-decision choices are fitted by the model. It is thus only valid to compare model performance on a within-subject basis. Lastly, we conducted a parameter recovery analysis that confirmed that all parameters could be identified well (Fig. S2b).

Analysis of linear age effects
To investigate how information gathering changes during development, we gathered data from three distinct age groups, and in our main analyses we used (repeated-measures) ANOVAs to detect differences between groups. This approach is agnostic to the exact developmental trajectories and simply tests whether there is a difference between groups. It therefore does not constrain us to a specific developmental pattern. An alternative approach is to assume a linear change with chronological age (rather than age group), which can be analysed using linear regressions. To additionally test for such linear effects, we here report the key analyses as linear regressions with the subjects' chronological age as predictor.
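A minimal sketch of this regression approach, using simulated data for illustration (the ages and outcome values are made up; the standardised slope corresponds to the β coefficients reported below):

```python
import numpy as np
from scipy import stats

# Simulated example: regress an outcome (e.g. number of draws)
# on chronological age; the true effect here is made up.
rng = np.random.default_rng(1)
age = rng.uniform(8, 17, size=107)                   # chronological age
draws = 20 - 0.8 * age + rng.normal(0, 2, size=107)  # simulated outcome

# standardise both variables so the slope is a standardised beta
z = lambda x: (x - x.mean()) / x.std(ddof=1)
res = stats.linregress(z(age), z(draws))
beta, p_value = res.slope, res.pvalue
```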

Age effects on task performance
Investigating the key behavioural signature in our task, the number of draws before a decision, we replicate the effects showing that there is a significant linear decrease in information gathering with age for the fixed (β=-.458, p<.001), but not for the decreasing condition (β=-.059, p=.545).
In addition, we found an age effect on the accuracy in both conditions, similar to the main analysis. This was stronger in the decreasing (β=.302, p=.002) than in the fixed condition (β=.211, p=.029).

Age effects on model parameters
Next, we analysed how linear age affects the model parameters. Investigating parameter p, we found a significant effect in the fixed condition (β=-.404, p<.001), but no effect in the decreasing condition (β=-.087, p=.375). When analysing the subjective cost of sampling parameter cs, we again found a significant age effect (β=-.357, p<.001). These findings are thus in line with the results reported in the main analysis, showing multiple age-related effects in subjective sampling costs.
We then analysed the model parameter τ and found a significant effect only in the decreasing condition (β=-.374, p<.001). This parameter in the fixed condition, however, did not reach significance (β=.170, p=.081). Lastly, we again did not observe any age effects on the noise floor parameter ξ (β=-.021, p=.831). These findings again confirm the results of the main analysis.

Figure S1. Children earn marginally more points in the fixed condition. We investigated overall performance as measured by the number of points subjects earned in this task. This is a rough measure of how well subjects played the task. We found, similar to our previous studies, that all subjects earned more points in the decreasing condition (F(1,104)=83.08, p<.001, η2=.44). Whilst there was no difference between the groups (F(2,104)=.50, p=.607, η2=.01), there was a trend for a condition-by-group interaction (F(2,104)=2.68, p=.073, η2=.049). This was mainly driven by a slight (uncorrected) trend of children winning more points than the young adolescents in the fixed condition (t(69)=1.96, p=.054, d=.48), where they also opened more cards and thus made better-informed decisions. This indicates that children benefitted mildly from their increased information gathering in the fixed condition by winning marginally more points. t: p<.10.

Figure S2. Model comparison. (A) Model comparison predicting cross-validated hold-out data revealed that a nonlinear increase in subjective sampling costs fits subjects' performance best. (B) Confusion matrices of the winning model show that the parameters could be recovered using 50,000 simulated agents. 1: fixed condition, 2: decreasing condition.