Childhood unpredictability and the development of exploration

Significance
Exploration, or sampling for new information, facilitates discovery and socio-emotional learning that sets the stage for adaptation and well-being later in life, yet it is costly in the short run. In this experiment, children who experienced their lives as less predictable explored less for information because of a preference for familiarity and a tendency to repeat their previous responses, even when those choices yielded lower rewards. Thus, this study revealed that exploration could be a pivotal learning parameter that influences developmental trajectories and that environmental unpredictability, though currently understudied, constitutes an important dimension of the childhood environment and warrants greater attention in understanding human development.

between left and right options; +1 if left is more informative, i.e., played less in forced trials; -1 if right is more informative), A (information bonus), B (side bias), and σ (decision noise).

\[
p(\text{left}) = \frac{1}{1 + \exp\left(\dfrac{\Delta R + A\,\Delta I + B}{\sigma}\right)}
\]
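The choice rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code (which is available from the OSF repository). It evaluates the expression exactly as printed, so the sign convention for ΔR, whose definition is cut off in the text above, is taken at face value. The function name `p_left` and its argument names are hypothetical.

```python
import math


def p_left(delta_r, delta_i, info_bonus, side_bias, noise):
    """Probability of choosing the left option, as printed in the equation above.

    delta_r    -- reward difference between the options (sign convention as in the source)
    delta_i    -- +1 if left is more informative, -1 if right is, 0 if equal
    info_bonus -- A, the information bonus
    side_bias  -- B, the side bias
    noise      -- sigma, the decision noise (must be positive)
    """
    if noise <= 0:
        raise ValueError("decision noise must be positive")
    z = (delta_r + info_bonus * delta_i + side_bias) / noise
    # Numerically stable evaluation of 1 / (1 + exp(z)) for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# With no reward difference, no information difference, and no bias,
# the two options are equally likely.
print(p_left(0.0, 0, 0.0, 0.0, 1.0))
```

Larger values of the decision noise pull the choice probability toward 0.5 regardless of the other terms, which is why this parameter is read as an index of random rather than directed exploration.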
Results. We replicated the finding that childhood unpredictability was associated with reduced directed exploration using the same linear regression model reported in the main text, replacing only the empirical percentage of exploration with the information bonus parameter specified above. Results were consistent in the primary study (B = -2.1, SE = .90, t(73) = -2.34, p = .02) and the replication study (B = -3.92, SE = .90, t(73) = -4.41, p < .001). The computational modeling approach did not allow us to use trial-level data to examine strategic exploration or the mediating mechanisms, because it generates two parameters for each participant, averaged across all short- and long-horizon games. Thus, the number of observations was reduced from 80 to 2 for those analyses, which removed between-trial variance and reduced statistical power. Analysis code, data, and stimuli are available at https://osf.io/5ba43/.

Additional Mediating Mechanisms for Reduced Exploration in the Horizon Task
We investigated two additional mediating mechanisms beyond temporal discounting and habitual responding using empirical trial-by-trial behavioral data. The first possibility was that unpredictability may lead to more impulsive choices. We regressed the reaction time of the first free choice on childhood unpredictability and task conditions (i.e., horizon, reward-information conflict) in a linear mixed-effects model. Results did not support this hypothesis, as there was no significant change in reaction time associated with increasing unpredictability regardless of the horizon (Primary Study: B = -.03, SE = .08, t = -.37, p = .71; Replication Study: B = .17, SE = .12, t(73) = 1.41, p = .16). Next, we tested whether unpredictability hindered reward-maximization motivation or ability, which was not supported either. We regressed the percentage of participants choosing the high-payout option at the last choice of the long-horizon games on childhood unpredictability in a generalized mixed-effects model. Unpredictability had no significant effect on the rate of landing at the "correct" option when children had accumulated more information to disambiguate the context (Primary Study: Odds Ratio = .96, p = .68, 95% CI: [0.79, 1.16]; Replication Study: Odds Ratio = .92, p = .35, 95% CI: [0.78, 1.09]), controlling for reward differential. Thus, there was no evidence that the reduced directed exploration was associated with impulsive actions or deficits in reward maximization.

Orchard Task Additional Information
In all orchards, the average initial supply on each tree was randomly drawn from a Gaussian distribution with a mean of 10 and an SD of 1. It then gradually dwindled with repeated harvests, with a depletion rate randomly drawn from a Beta distribution (alpha = 14.9, beta = 2.0). This setup yielded a mean depletion rate of 0.88, meaning that each repeated harvest would, on average, yield 12% fewer apples than the previous harvest. Participants needed to press a key to make a decision within a 1-second response window right after the brown dot under each tree turned white (see Main Text Fig. 1b); otherwise they received a brief warning message, during which they could not harvest or travel to the next tree. On average, participants received .01 (SD = .02) warnings, indicating that their attention was well sustained during the task. In all orchards, the "harvest" time was 3 seconds. Reaction time was not counted toward the task duration, to ensure that the stay-or-switch decision was the only factor influencing the number of apples obtained. At the end of each orchard, participants saw the number of points they had earned in that orchard.
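The stated mean depletion rate follows from the Beta parameters: alpha / (alpha + beta) = 14.9 / 16.9 ≈ 0.88. The sketch below simulates the yields of a single tree under these distributions. It is an illustration only, not the task code. It assumes one depletion rate is drawn per tree and applied multiplicatively at every harvest, and that yields are not rounded. Neither detail is specified above, and the function and variable names are hypothetical.

```python
import random

INITIAL_MEAN, INITIAL_SD = 10.0, 1.0   # Gaussian initial supply (from the text)
BETA_ALPHA, BETA_BETA = 14.9, 2.0      # Beta parameters for the depletion rate

# Analytic mean of a Beta(alpha, beta) distribution.
MEAN_DEPLETION = BETA_ALPHA / (BETA_ALPHA + BETA_BETA)


def simulate_tree(n_harvests, rng):
    """Return the yield of each successive harvest at one tree.

    Assumption: a single depletion rate is drawn per tree and multiplies
    the yield after every harvest.
    """
    current = rng.gauss(INITIAL_MEAN, INITIAL_SD)
    rate = rng.betavariate(BETA_ALPHA, BETA_BETA)
    yields = []
    for _ in range(n_harvests):
        yields.append(current)
        current *= rate
    return yields


rng = random.Random(0)  # fixed seed so the example is reproducible
print(round(MEAN_DEPLETION, 2))
print([round(y, 2) for y in simulate_tree(5, rng)])
```

Because each harvest returns less than the one before, staying at a tree becomes progressively less worthwhile relative to traveling to a fresh one, which is the trade-off the stay-or-switch decision probes.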

Instructions
The same instructions were presented to participants in the Primary and the Replication study.
Welcome! Thank you for volunteering for this experiment.
In this game we would like you to choose between two stacks of boxes.
The stacks of boxes look like this.
Every time you choose one stack, the lever will be pulled like this.
And the points you earn will be shown like this. For example, in this case, you chose the left stack and this box is giving you 77 points.
Most of the time you will find about the same number of points for all the boxes in a stack, but some may give you a few more or a few less points than others.
For example, the average reward for the stack on the right might be 50 points, but the first box might give us 52 points.
On the second box we might get 56 points.
If we open a third box on the right we might get 45 points this time.
… and so on, such that if we were to play the right bandit 10 times in a row we might see these rewards.
Both bandits will have the same kind of variability and this variability will stay constant throughout the experiment.
On each game, either the left or the right stack will give you more points on average and is the better option to choose for that game.
To make your choice: Press '<' to play the left stack, and Press '>' to play the right stack.
On each game you can tell how many choices you will have by the height of the stacks. For example, when the bandits are 10 boxes high, there are 10 trials in each game.
When the stacks are 5 boxes high there are only 5 trials in the game.
Finally, for the first 4 choices in each game we will tell you which option to play.
There will be a green square inside the box we want you to open and you must press the button to choose this option so you can collect your points and move on to the next trial. For example, if you are instructed to choose the left box on the first trial, you will see this.
If you are instructed to choose the right box on the second trial, you will see this.
For the last few choices in each game, you will see two green squares, which means you can pick either stack. Each time you see 2 new stacks of boxes on the screen, a new game is starting.
So, to be sure that everything makes sense, let's go through some things.
[Comprehension questions will be administered here.] Press space when you are ready to begin. Good luck!

Comprehension questions
In the Primary study, these questions were administered verbally to participants. Experimenters corrected and re-instructed participants who did not provide a reasonable answer to an item. In the Replication study, these questions were presented to participants on the computer screen. Conditional feedback was provided for wrong choices, and the question was then repeated to re-check understanding.
- Reference to goals of task: to get as many points as possible, and/or decide based on the information on each side's boxes.

Orchard task instructions
Before participants started the task, they received task instructions and performed a short practice session where they were presented with both types of foraging environments and practiced pressing the keys for different decisions. This pre-exposure allowed participants to familiarize themselves with the task setup and strategy selection.
Thank you for participating! In this game, you will make choices that can earn you points.
You will be in a series of orchards where you visit trees to harvest apples. You have a limited amount of time in each orchard and you need to decide whether to spend this time harvesting apples at a tree or moving to a brand new tree. Some trees produce more apples than others and each apple is worth points. Your goal is to earn as many apples as possible.
On each trial you will see a tree. You can decide to harvest apples by pressing the Down arrow key or move to a new tree by pressing the Right arrow key.
Harvesting apples takes up some time but earns you points. However, you will tend to find that the more