Dynamic Integration of Forward Planning and Heuristic Preferences in a Sequential Two-Goal Task

To decide which goal to prioritize at what point in time, one has to evaluate the consequences of future actions by forward planning. However, when the goal is still temporally distant, detailed forward planning can be prohibitively costly. One way to select actions at minimal computational cost is to use heuristics. It is an open question how humans mix heuristics with forward planning to balance computational costs against goal-reaching performance. To test a hypothesis about dynamic mixing of heuristics with forward planning, we used a novel stochastic sequential two-goal task. We found that participants' decisions deviated substantially from those of an optimal full planning agent at the early stages of goal-reaching sequences. Only towards the end of the sequence did participants' behavior converge to near-optimal performance. Subsequent model-based analyses showed that participants used heuristic preferences when the goal was temporally distant and switched to forward planning when the goal was close.

Deciding which goal to select at what point in time is central to everyday life (Mansouri, Koechlin, Rosa, & Buckley, 2017; Neal, Ballard, & Vancouver, 2017). How do we decide between the many possible future goals, whose realization might require multiple actions in a dynamic and uncertain environment? One possibility is to evaluate the goals with forward planning and select the one that maximizes expected reward. Dynamic programming is a common way to calculate such optimal policies towards a goal and has been used in cognitive science to model human behavior (Ballard, Yeo, Neal, & Farrell, 2016; Juechems et al., 2019; Korn & Bach, 2018). However, when goals are multiple and temporally distant, exhaustive forward planning is prohibitively costly. In order to make decisions within reasonable time, one has to adjust the degree of control with respect to its costs and benefits (Boureau, Sokol-Hessner, & Daw, 2015). In this study, we investigated how participants integrate forward planning with heuristic decision making during goal pursuit.
To do this, we used a novel sequential two-goal task. In miniblocks of 15 trials, participants had to accept or reject offers to reach either one or two goals. The difficulty of the task lay in adaptively deciding whether to pursue both goals in parallel or sequentially.
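The forward-planning baseline mentioned above can be illustrated with finite-horizon backward induction. The sketch below is a toy and not the task used in the study: only the 15-trial horizon and the accept/reject choice come from the text, while the offer set, the point target, the terminal reward and all names (`OFFERS`, `TARGET`, `value`, `q_values`) are assumptions made for illustration.

```python
from functools import lru_cache

T = 15          # trials per miniblock (from the text)
TARGET = 4      # hypothetical number of points needed to reach a goal
# Hypothetical offers: (change in goal-A points, change in goal-B points),
# drawn uniformly at random on every trial.
OFFERS = [(1, 0), (0, 1), (2, -1), (-1, 2)]


def terminal_reward(a, b):
    """Hypothetical payoff: one unit of reward per goal reached."""
    return int(a >= TARGET) + int(b >= TARGET)


def step(a, b, offer):
    """State after accepting an offer; points are clipped to [0, TARGET]."""
    da, db = offer
    return (max(0, min(TARGET, a + da)), max(0, min(TARGET, b + db)))


def q_values(t, a, b, offer):
    """Expected reward of accepting vs. rejecting the current offer."""
    q_accept = value(t + 1, *step(a, b, offer))
    q_reject = value(t + 1, a, b)
    return q_accept, q_reject


@lru_cache(maxsize=None)
def value(t, a, b):
    """Optimal expected reward at trial t, before the offer is revealed."""
    if t == T:
        return float(terminal_reward(a, b))
    # Average over offers of the better of the two actions.
    return sum(max(q_values(t, a, b, o)) for o in OFFERS) / len(OFFERS)
```

Calling `q_values(0, 0, 0, (1, 0))` returns the planned values of accepting and rejecting the first offer. Even in this tiny toy the recursion touches every reachable state at every remaining trial, which is the cost that grows quickly when goals are multiple and distant.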
To model the choice data and delineate participants' use of forward planning and heuristic preference, we used a computational model with four free parameters. The precision parameter (β) weighted optimal choice values derived by full forward planning. Optimal choice values were subjective in the sense that they were modulated by two additional parameters, discount (γ) and reward ratio (κ). A γ parameter smaller than one means that the participant undervalued choice values when the goal was still far away. The κ parameter captured potential distortions in the participants' valuation of goal reward. Finally, strategy preference (θ) was an additive term modelling a participant's heuristic bias towards either a sequential or a parallel goal strategy.
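One plausible reading of this description can be written as a logistic choice rule. This is a sketch under stated assumptions: the abstract does not give the model equations, so the logistic form, the exponential discounting and the function names are assumptions. κ is left out because the text does not specify how it enters the values.

```python
import math


def discounted_value(reward, trials_to_goal, gamma):
    """Assumed exponential discounting: gamma < 1 shrinks distant rewards."""
    return reward * gamma ** trials_to_goal


def p_parallel(q_parallel, q_sequential, beta, theta):
    """Assumed choice rule: probability of the parallel-strategy choice.

    beta weights the planned value difference; theta is the additive
    strategy preference (positive favours parallel, negative sequential).
    """
    logit = beta * (q_parallel - q_sequential) + theta
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up numbers: planning slightly favours the sequential choice,
# while the heuristic preference favours the parallel one.
beta, theta, gamma = 5.0, 0.5, 0.8
p_far = p_parallel(discounted_value(0.8, 10, gamma),
                   discounted_value(1.0, 10, gamma), beta, theta)
p_near = p_parallel(discounted_value(0.8, 1, gamma),
                    discounted_value(1.0, 1, gamma), beta, theta)
```

With these illustrative values, `p_far` lies above 0.5 and `p_near` below it: when the planned value difference is small, the additive θ term dominates, and when it is large, planning does. Under this rule θ = 0 and equal values give a choice probability of exactly 0.5.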
If we find that participants' strategy preference θ is smaller or larger than zero, we can conclude that participants indeed used a heuristic component to complement any forward planning. Indeed, when inferring the four parameters for all 89 participants using hierarchical Bayesian inference, we found that participants' choices were influenced by a heuristic strategy preference (Figure 1A) in addition to a forward planning component (Figure 1B). For 65 out of 89 participants, we found that the 95% credibility interval of the posterior over strategy preference did not include zero, i.e. participants with a positive θ were biased towards a parallel strategy and participants with a negative θ towards a sequential strategy. There was only weak evidence for an effect of γ and κ (Figure 1C, D). To test whether participants relied more on strategy preference when the goal was still distant than when it was close, we conducted a multiple regression analysis, regressing the proportion of parallel strategy choices on the model parameters (Figure 2). We found a significant interaction between strategy preference and miniblock-half (p < 0.001), demonstrating that strategy preference is more predictive of the proportion of parallel strategy choices early in the miniblock than later, i.e. close to the goals.

Figure 2: Linear regression of the proportion of parallel strategy choices against fitted model parameters, "miniblock-half" and an interaction between "miniblock-half" and strategy preference.

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0
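The logic of the interaction term in the regression can be illustrated on synthetic data. This is a reduced sketch, not the reported analysis: it keeps only strategy preference, miniblock-half and their interaction (β, γ and κ are omitted), the data and effect sizes are invented, and no p-values are computed. In this reduced model the interaction coefficient equals the difference between the two within-half slopes, which is what the code computes.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (one predictor plus intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def interaction_coefficient(theta, half, y):
    """Slopes of y on theta in each half and their difference.

    half is 0 for the first and 1 for the second half of a miniblock.
    """
    early = [(t, v) for t, h, v in zip(theta, half, y) if h == 0]
    late = [(t, v) for t, h, v in zip(theta, half, y) if h == 1]
    slope_early = ols_slope(*zip(*early))
    slope_late = ols_slope(*zip(*late))
    return slope_early, slope_late, slope_late - slope_early


# Synthetic data: 89 simulated participants, two observations each.
rng = random.Random(0)
theta, half, y = [], [], []
for _ in range(89):
    t = rng.uniform(-2.0, 2.0)            # hypothetical strategy preference
    for h in (0, 1):
        theta.append(t)
        half.append(h)
        # Invented effect: theta matters more early (0.10) than late (0.03).
        y.append(0.5 + (0.10 - 0.07 * h) * t + rng.gauss(0.0, 0.01))

slope_early, slope_late, interaction = interaction_coefficient(theta, half, y)
```

A negative `interaction` here means that strategy preference predicts the proportion of parallel choices less strongly in the second half of the miniblock, which is the direction of the effect described in the text.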

Summary
The present research shows that over prolonged goal-reaching periods, individuals tend to behave in a way that approaches the behavior of an optimal full planning agent, with noticeable differences early in the goal-reaching period, but approximately optimal behavior when the goal is close. It also highlights the potential of computational modelling to infer the decision parameters individuals use during different stages of sequential decision-making. Such models may be a promising means to further elucidate the dynamics of decision-making in the pursuit of both laboratory and everyday life goals.