The impact of choice discriminability and outcome valence on visual decision making under risk

Much of human activity involves perceptual or perceptuo-motor choice between options with uncertain outcomes. Previous research suggests that decisions in these contexts can be near-optimal in some circumstances but can also be significantly biased. Here we investigate how biases might depend on: i) discriminability of available choice outcomes, adjusted by manipulating the Expected Value (EV) function curvature; ii) outcome valence, which changes the tendency for risk seeking/aversive behaviour in cognitive decision making. In three experiments, participants set the size of a catcher in order to catch a dot moving on a random walk (with varying levels of predictability) after it emerged from behind an occluder. Catching and missing the dot were associated with scoring a variable number of outcome points depending on catcher size. In experiment 1 outcomes were most discriminable (high EV curvature) and catcher size settings were near-optimal. In experiments 2 and 3 outcomes were harder to discriminate (low EV curvature) and there was a significant bias to set the catcher size too small. Unlike cognitive decision making, the valence manipulation had little effect. Subsequent analyses suggest observed biases might reflect participants moving settings towards the region with highest EV curvature, where feedback is most informative. These data suggest that: i) unlike cognitive decisions, in this task choices are largely insensitive to outcome valence; ii) EV curvature is potentially an important factor when interpreting performance in such tasks; iii) choice may be biased towards high EV curvature regions, consistent with value being placed on exploration to increase information return.


Introduction
Much of human activity involves choice between uncertain or risky options involving different potential outcomes with associated probabilities of occurrence. Typically people associate the term decision making (DM) in this context with higher level cognitive choices, for example between different investments, whether or not to take out home insurance or which available medical treatment to choose. Decades of research have investigated such decision making under uncertainty and risk (e.g. see Johnson & Busemeyer, 2010 for review). In a very common experimental paradigm in this field, participants are asked to choose between hypothetical lotteries L1 and L2, e.g. would you prefer L1: a 50/50 chance of winning £100 vs. winning nothing, summarised as (0.5, £100; 0.5, £0), or L2: a guaranteed win of £49 (1.0, £49). Interestingly, a significant majority of people prefer L2 in this task, even though the expected (i.e. the average) value (EV) of L2 is lower than that of L1. Such behaviour can be characterised as risk averse: choosing the less risky option even though it is less valuable on average. Perhaps even more interesting is behaviour when the choice is transported to the loss domain, i.e. would you prefer L1: a 50/50 chance of losing £100 vs. losing nothing (0.5, -£100; 0.5, £0) or L2: a guaranteed loss of £49 (1.0, -£49). Now a significant majority of people prefer L1 over L2 even though the EV of L2 is higher. This tendency can be characterised as risk seeking: choosing the risky option even though it is less valuable on average. Such behaviour is typically characterised as biased, since it departs from that of a theoretical agent that chooses consistently in line with the maximum expected utility of lotteries. A full description of human decision-making behaviour using such paradigms was at the heart of the development of Prospect Theory (Kahneman & Tversky, 1979), which is now a dominant descriptive model of human DM used in economic theory.
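The expected values behind the two lottery examples can be checked with a few lines of code. This is only an illustrative sketch: the function name `expected_value` and the list-of-pairs representation are choices made here, not part of the original paradigm.

```python
def expected_value(lottery):
    """Expected value of a lottery given as (probability, outcome) pairs."""
    total_probability = sum(p for p, _ in lottery)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * outcome for p, outcome in lottery)

# Gain domain: risky L1 versus sure L2 (amounts in pounds).
gain_risky = [(0.5, 100.0), (0.5, 0.0)]
gain_sure = [(1.0, 49.0)]

# Loss domain: the same lotteries with the sign of every outcome flipped.
loss_risky = [(0.5, -100.0), (0.5, 0.0)]
loss_sure = [(1.0, -49.0)]

print(expected_value(gain_risky), expected_value(gain_sure))  # 50.0 49.0
print(expected_value(loss_risky), expected_value(loss_sure))  # -50.0 -49.0
```

In the gain domain the sure option has the lower EV (49 vs. 50), so preferring it is risk averse; in the loss domain the sure option has the higher EV (-49 vs. -50), so preferring the gamble is risk seeking.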
Although humans do face many important cognitive decisions of the kind described above, in fact, we make many more decisions every day at a much lower level (Wolpert & Landy, 2012). Perceptual and/or perceptuo-motor decisions are choices made to support action, for example using visual information about the speed or distance of an approaching vehicle to decide whether or not to overtake the car in front, or deciding where to place your foot on the next step along a rocky path. Clearly, we must constantly engage in such perceptual and perceptuo-motor choice processes so that we can successfully move around in, and interact with, our environment.
In the last 20 years numerous studies have investigated perceptual and perceptuo-motor decisions in the context of statistical decision theory (Maloney, 2002), using tasks which are formally equivalent to lottery selection paradigms from higher level cognitive decision making (e.g. Trommershäuser, Maloney & Landy, 2003; Tassinari, Hudson & Landy, 2006; Wu, Trommershäuser, Maloney & Landy, 2006; Gepshtein, Seydell, & Trommershäuser, 2007; Landy, Goutcher, Trommershäuser & Mamassian, 2007; Maloney, Trommershäuser & Landy, 2007; Hudson, Maloney & Landy, 2008; Wu, Delgado & Maloney, 2009; Warren, Graf, Champion & Maloney, 2012; Jarvstad, Rushton, Warren & Hahn, 2012; Jarvstad, Hahn, Rushton & Warren, 2013; Ota, Shinya & Kudo, 2015; Jarvstad, Hahn, Warren & Rushton, 2014; Farmer, El-Deredy, Howes & Warren, 2015). Across many of these tasks human performance is good and, in some circumstances, close to the Maximum Expected Value (MEV) performance of an optimal agent. Moreover, there is evidence that when tasks are appropriately designed to be equivalent, performance is both good (close to optimal) and commensurate for cognitive, perceptual and perceptuo-motor choices (Jarvstad et al., 2013). However, in certain contexts performance breaks down. Perhaps unsurprisingly, this is particularly evident when the task requires information processing beyond the limits of human capabilities (e.g. Wu et al., 2006; Jarvstad et al., 2014; Ota et al., 2015). Of particular importance here is the issue of judging performance when the difficulty associated with finding the optimal solution is not fixed across tasks. Jarvstad et al. (2014) point out that when the difference in EV between lotteries is small (and, in particular, below the threshold imposed by human perception), then finding the optimal solution is clearly more difficult and it is therefore unsurprising that humans cannot locate it.
However, they also point out that, given the small differences in EV, such sub-optimality is unlikely to be materially important to performance. Taken together, this body of work suggests that performance in such tasks is variable across paradigms: it is often good and can be near-optimal, but this is certainly not always the case. Moreover, following Jarvstad et al. (2014), when characterising behaviour as sub-optimal care should be taken to consider both the difficulty in finding the optimal solution and the range of behaviours over which the material consequences of bias are small.
In an early, influential example of this work in perceptuo-motor choice, participants were asked to make rapid pointing movements to a touch screen to hit within the boundary of a circular target reward region while avoiding a (possibly overlapping) penalty region (Trommershäuser et al., 2003). Points were awarded/taken away for pointing in the target/penalty region. It can be shown that this task is formally equivalent to a risky choice between motor lotteries, with one lottery available for each aim point. The risk in the choice comes from the intrinsic uncertainty associated with making a rapid pointing movement to a target point: if one repeated this many times, intrinsic neural and motor variability would lead to a 2D cloud of touch points. Using this task it was shown that motor lottery choice (i.e. the mean aim point chosen) was close to that of an optimal agent (in the sense of maximising EV) that had access to an individual participant's parameters for the bivariate motor noise process given an aim point. Participants in this task can be considered about as good as possible under constraints imposed by the performance limits of the motor system. Although these constraints prevent the human from being formally optimal (which would require the human to have no constraints), such behaviour has been described as computationally rational, i.e. optimal when unavoidable constraints on the agent are taken into account (see Lewis, Howes, & Singh, 2014; Howes, Warren, Farmer, El-Deredy & Lewis, 2016; and see the related concept of resource rationality in Griffiths, Lieder & Goodman, 2015).
Given unavoidable variability in movement planning, humans must make many choices like the one from Trommershäuser et al. (2003), where intrinsic noise is dominant, e.g. where to aim the hand when picking up a wine glass so as not to knock it over. However, there are also many (and arguably more) circumstances where action choice is required and in which the major source of noise is external to the participant, e.g. deciding where to place the foot on a rocky/slippery path, deciding where to aim the racket in badminton, or deciding on the right time to try to swat a fly. In such scenarios the probability structure on which the outcome depends is determined by sources which are external to the decision maker. Consequently, an optimal agent would first need to be able to estimate this external uncertainty before evaluating the expectation of each possible lottery and selecting the best one. Graf, Warren & Maloney (2005) have developed a task that assesses ability to recover an estimate of external uncertainty in the trajectory of a moving dot. Participants first observed a dot following a random walk trajectory before it disappeared behind an occluder and were then asked to position and adjust the size of a catcher region on the other side of the occluder to the extent they were confident it would catch the dot. This was repeated for a range of dot trajectory reliabilities and occluder widths. This study provided compelling evidence that participants were able to estimate a parameter that was tightly coupled to the variability in the dot path. Not only did they make the catcher smaller when the path was more reliable or the occluder width was smaller, they made it smaller by an appropriate amount. Following on from this work, Warren et al. (2012) extended the motion extrapolation paradigm from Graf et al. (2005) to incorporate decision-making under risk (Fig. 1).
Similar to Graf et al. (2005), a dot moved on a random walk (with varying levels of reliability) until it disappeared behind an annular occluder. Participants then set the location and the angular size of a catcher on the other side of the occluder in order to try and catch the dot when it emerged on the other side (Fig. 1A,B). However, participants were also awarded points for catching the dot such that the number of points awarded decreased as the catcher size increased (Fig. 1C).
Awarding points for different outcomes makes this task formally equivalent to the perceptuo-motor pointing task from Trommershäuser et al. (2003), but instead of choosing between motor trajectory lotteries the choice is between different catcher size lotteries, where the uncertainty is external to the participant and determined by the variability in the random walk of the target dot. Using Monte Carlo simulation it was possible to obtain a numerical estimate of the probability of catching a dot for a range of levels of uncertainty in the dot path and occluder widths across the range of possible catcher sizes (see general methods). Taken together with the points awarded for catching a dot, the EV of each catcher size could then be calculated and the optimal (MEV) catcher size, $\theta_C^{MEV}$, was estimated from the peak of this EV curve (Fig. 1C). Similar to Trommershäuser et al. (2003), in this very different task where uncertainty is largely external to the observer, Warren et al. (2012) found that participant catcher size settings were close to those of a computationally rational agent, i.e. they were able to reliably find the peak of the EV curve (the MEV solution) across a range of conditions (in which occluder width and dot trajectory reliability were varied).
In the present study, across three experiments, the motion extrapolation DM task from Warren et al. (2012) described in detail above is used to address two research questions. The first question is how bias depends on the discriminability of the potential outcomes/behaviours. This is motivated by Jarvstad et al. (2014) (see above), which emphasised the importance of considering the magnitude of differences in expected value of different decision alternatives. It is anticipated that when differences are small (so that the MEV solution is less easily discriminated from other solutions) bias will be larger. To address this issue we manipulated the curvature of the EV function (Fig. 1C) both within and across experiments. High curvatures are associated with big differences between EVs for a unit change in catcher size and so discriminating between outcomes is easier. Conversely, low curvatures arise when differences in EV are small for a unit change in catcher angle and so discriminating between outcomes becomes harder. Note that, in spite of the predicted increase in bias, such a finding might still be consistent with the idea of computational rationality since ability to discriminate the different outcomes would be another unavoidable constraint on performance, i.e. performance might still be as good as possible under such constraints. Nonetheless, if biases are observed then it will be interesting to measure both the magnitude and direction of such biases and how they depend upon EV function curvature.
Our second research question relates to the robust finding from studies of cognitive DM under risk that the direction of the biases observed depends on the valence of the options available. As noted above, risk averse behaviour is commonly observed for choices between lotteries in the gain domain, whereas the opposite bias, i.e. risk seeking behaviour, often accompanies choices between lotteries involving loss. This raises the question of whether similar tendencies will be observed in the perceptual decision making under risk task of Warren et al. (2012). Finding that similar effects are present in both cognitive and perceptual decision-making contexts would suggest that the underlying neural processing is similar for both higher- and lower-level decision making. In each of our 3 experiments we will therefore also contrast decision behaviour when outcome valence is manipulated, i.e. when outcomes involve losing vs gaining points for catching the dot. Based on the equivalent cognitive task it might be anticipated that behaviour will be more risk seeking when choices are in the loss domain relative to the gain domain. However, in previous research using the motion extrapolation paradigm, performance is near to that of a computationally rational agent, so it will be interesting to see if any evidence of such effects is observed at all in this perceptual decision-making context. With this in mind it is likely to be particularly informative to look for the effects of outcome valence when EV curvature is lower. If, as anticipated, bias is higher when curvature is low, then the effects of outcome valence might be more evident there (i.e. when behaviour is more obviously biased).

Participants
110 undergraduate students (Exp 1: N = 30; Exp 2: N = 30; Exp 3: N = 50) from the University of Manchester were recruited using the SONA recruitment system and received course credits for taking part in the experiment. Ethical approval was granted by the ethics committee of the Division of Neuroscience and Experimental Psychology at the University of Manchester. All participants gave informed consent before taking part in the experiment. Data collection in each experiment was undertaken over two (experiment 3) or three (experiments 1 and 2) sessions of no longer than 30 min each.

Procedure
Methods were very similar to those used in Warren et al. (2012). Participants were seated with the head on a chin rest at a distance of approximately 57 cm from a 15in LCD display (resolution 1280 × 1024, 60 Hz). On each trial participants first watched a dot following a random walk trajectory at 3 cm/s from the centre of the screen to the inner edge of a ring shaped (annular) occluder (distance from centre to the inner edge of occluder was 6.5 deg and occluder width was 2.5 deg) (Fig. 1A). For a detailed treatment of the random walk process see Warren et al. (2012). Briefly, dot direction on each frame was sampled from a von Mises distribution centred on the current direction with concentration parameter K. Higher values of this parameter are associated with lower spread in the distribution and, accordingly, more reliable (predictable) trajectories. On hitting the occluder the dot disappeared and the participant was instructed to set the angular location (using the mouse) and angular size (using the up and down keys on the keyboard to increase catcher size in 1 degree steps) of a catcher arc on the other side of the occluder that they thought would catch the dot when it re-emerged (note that this was not a real time task, i.e. participants had as much time as they liked to make their catcher settings). Participants could set the size of the catcher to be anywhere in the range 0 to 100 degrees. Points were awarded for catching the dot but decreased as a function of catcher size and missing the dot led to a penalty (see specific methods for each experiment for more details on the reward/penalty structure). When happy with their setting participants pressed a mouse button and received feedback on the outcome of that trial, i.e. they saw their setting, the point where the dot entered the occluder and the point where it exited (which was also simulated by carrying on the trajectory under the occluder). 
They were also given feedback about whether the catcher contained the dot and how many points they had scored (or been penalised) on that trial, and a running total was presented. To increase motivation across experiments participants were told that the person scoring the highest points total (in that experiment) would receive a £20 prize.
Fig. 1. A: Two examples are shown for lower (K = 25) and higher (K = 100) reliability trajectories. When the dot hit the inner edge of an annular occluder it disappeared and participants had to set the angular location and size of a catcher arc on the outer edge of the occluder. B: Key variables in this paradigm are $\theta_C$, the angular size of the catcher region, and $\theta_{in}$, the angular location at which the dot hits the inner edge of the occluder. C: Using Monte Carlo methods it is possible to estimate the probability $p(\theta_C, K)$ of catching the dot for any given catcher size and reliability parameter K (see general methods). When points are awarded for catching/missing the dot as a function of catcher size then it is possible to calculate the expected value associated with any catcher size using the formula $EV(\theta_C, K) = p(\theta_C, K)\,V(\theta_C, \mathrm{catch}) + [1 - p(\theta_C, K)]\,V(\theta_C, \mathrm{miss})$. This paradigm is then formally equivalent to a choice between lotteries (one for each of the possible catcher sizes) and the maximum expected value decision maker should choose the size $\theta_C^{MEV}$ that maximises $EV(\theta_C, K)$.
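The random walk described in the procedure (heading resampled on every frame from a von Mises distribution centred on the current heading) can be sketched as follows. This is a minimal illustration, not the authors' Lazarus implementation: the step size of 0.05 deg per frame assumes that 3 cm/s at 60 Hz and a 57 cm viewing distance translate to roughly 1 deg per cm, the uniformly random starting heading is an assumption, and the names `simulate_walk` and `resultant_length` are hypothetical.

```python
import math
import numpy as np

def simulate_walk(kappa, rng, inner_radius=6.5, step=0.05, max_steps=200_000):
    """Walk from the centre until the dot reaches the inner edge of the occluder.

    Returns (theta_in, start_heading) in radians, where theta_in is the angular
    location at which the dot reaches the occluder. Step size and starting
    heading are illustrative assumptions.
    """
    x = y = 0.0
    start_heading = heading = rng.uniform(-math.pi, math.pi)
    for _ in range(max_steps):
        # New heading is drawn from a von Mises centred on the current heading.
        heading = rng.vonmises(heading, kappa)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        if math.hypot(x, y) >= inner_radius:
            return math.atan2(y, x), start_heading
    raise RuntimeError("walk did not reach the occluder")

def resultant_length(angles):
    """Mean resultant length of a set of angles (1 = identical, 0 = dispersed)."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(angles)))))

rng = np.random.default_rng(0)
for kappa in (25, 100, 400):
    deviations = []
    for _ in range(20):
        theta_in, start = simulate_walk(kappa, rng)
        deviations.append(theta_in - start)
    print(kappa, round(resultant_length(deviations), 2))
```

Higher concentration values should give entry points that stay closer to the starting heading, which is the sense in which larger K produces more predictable trajectories.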

Software
Experimental code was written in Lazarus (https://www.lazarus-ide.org/), an open source Free Pascal IDE, using the JEDI-SDL libraries (http://www.delphi-jedi.org). All data analyses were undertaken using MATLAB (2020) and we also used the circular statistics toolbox described in Berens (2009).

Monte Carlo methods to estimate $p(\theta_C, K)$ and $EV(\theta_C, K)$
Similar to Warren et al. (2012) we used a Monte Carlo (MC) approach to estimate the probability of catching a dot for each possible catcher size given the variability of the dot trajectory, $p(\theta_C, K)$. To estimate $p(\theta_C, K)$, for each value of the parameter K, we simulated 200 trajectories from the centre to the inner edge of the occluder. For each of these trajectories we then simulated a further 5000 follow-on trajectories underneath the occluder until the dot emerged on the other side. This resulted in a total of 1,000,000 trajectories. In order to work out the probability of catching the dot over these simulations for each catcher size we need to consider the catcher location settings. In Warren et al. (2012) we provided strong evidence that catcher location settings in this task do not depend on the angle at which the dot entered the occluder and were predicted simply by the angular location $\theta_{in}$ (Fig. 1A), which specifies the line between the centre of the arena and the point at which the dot entered the occluder. We assumed the same was also true in the present data, collected under very similar conditions. In line with this assumption, in the present study (see below) and consistent with Warren et al. (2012), catcher location settings were close to being normally distributed and centred on the line between the centre of the display (start point) and the point at which the dot entered the occluder. In other words, on a given simulated trial the catcher location setting $\theta_{loc}$ was modelled as $\theta_{loc} \sim N(\theta_{in}, \sigma_{loc})$, where $\theta_{in}$ is the angle of the line between the centre and the point at which the dot entered the occluder (Fig. 1A) and $\sigma_{loc}$ is the spread of this setting over participants across all conditions (there was no relationship between this parameter and any of the experimental parameters manipulated). In line with Warren et al. (2012), we assume that variability in setting about $\theta_{in}$ simply reflects constraints on the ability to i) hold the location at which the dot entered the occluder in short term memory; ii) recover the trajectory direction on the last time step. In practice we used a value for $\sigma_{loc}$ of 11.96 degrees based on the average circular standard deviation from $\theta_{in}$ across all conditions in Warren et al. (2012), together with Experiments 1, 2 and 3 in the present study (see Results for Experiment 1 for more details).
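The final step of the Monte Carlo procedure above (combining simulated exit angles with the Gaussian model of catcher location to obtain a catch probability for each catcher size) can be sketched as below. This is a simplified illustration rather than the authors' code. The exit-angle offsets relative to $\theta_{in}$ are drawn here from a placeholder normal distribution with a hypothetical 15 degree spread, standing in for the simulated under-occluder trajectories. The function names are also hypothetical. Only the 11.96 degree location spread comes from the text.

```python
import math
import numpy as np

SIGMA_LOC_DEG = 11.96  # spread of catcher location settings reported in the text

def wrap_deg(angle):
    """Wrap angles in degrees to the interval [-180, 180)."""
    return (np.asarray(angle, dtype=float) + 180.0) % 360.0 - 180.0

def catch_probability(exit_offsets_deg, catcher_sizes_deg, rng, sigma_loc=SIGMA_LOC_DEG):
    """Monte Carlo estimate of the probability of catching the dot.

    exit_offsets_deg: simulated exit angles relative to theta_in (degrees).
    catcher_sizes_deg: catcher angular sizes to evaluate (degrees).
    A trial counts as a catch when the exit point lies within half a catcher
    size of the (noisy) catcher centre.
    """
    offsets = np.asarray(exit_offsets_deg, dtype=float)
    # Catcher centre modelled as theta_in plus zero-mean Gaussian noise.
    location = rng.normal(0.0, sigma_loc, size=offsets.shape)
    distance = np.sort(np.abs(wrap_deg(offsets - location)))
    half_sizes = np.asarray(catcher_sizes_deg, dtype=float) / 2.0
    # Fraction of simulated trials whose exit point falls inside the catcher.
    return np.searchsorted(distance, half_sizes, side="right") / distance.size

rng = np.random.default_rng(0)
sizes = np.arange(0, 101)  # catcher sizes from 0 to 100 degrees in 1 degree steps
# Placeholder for the simulated trajectories: hypothetical 15 degree spread.
exit_offsets = rng.normal(0.0, 15.0, size=200_000)
p_catch = catch_probability(exit_offsets, sizes, rng)
print(p_catch[[0, 30, 60, 100]])
```

Sorting the angular distances once and counting with `searchsorted` avoids building a trials-by-sizes matrix, which matters when the number of simulated trajectories is of the order of a million.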
Given this model of catcher location settings and the 1,000,000 simulated trajectories we could then estimate the probability of catching the dot across the catcher size range. In order to facilitate subsequent calculations (in particular to enable us to develop a closed form estimate of EV curve curvature; see discussion and Appendix A) we then fit the probability curves using a 2nd order logistic polynomial function (Eq. 1). Note that this form has the appropriate properties of tending to 1 as $\theta_C$ increases and equalling 0 when $\theta_C = 0$. Parameters for these fits are given in Table 1 and the associated probability curves in Fig. 2. Note that experiments 1 and 2 used the same values for parameter K and so the probability curves are identical for these experiments.
Once $p(\theta_C, K)$ is estimated, for a given reward/penalty structure (i.e. a method for assigning points for catching vs missing the dot), we can calculate the associated EV curve for each possible catcher size, $EV(\theta_C, K)$:
$$EV(\theta_C, K) = p(\theta_C, K)\,V(\theta_C, \mathrm{catch}) + [1 - p(\theta_C, K)]\,V(\theta_C, \mathrm{miss}) \qquad (2)$$
Across all three of our experiments the points assigned for catching the dot decreased linearly with the size of the catcher, whereas the points deducted for missing the dot were constant, i.e.
$$V(\theta_C, \mathrm{catch}) = \beta_1 + \beta_2 \theta_C \qquad (3)$$
$$V(\theta_C, \mathrm{miss}) = -M \qquad (4)$$
with specific values of parameters $\beta_1$, $\beta_2$, and $M$ varying across experiments (see below). While the penalty for missing the dot, $V(\theta_C, \mathrm{miss})$, did not vary with catcher size it did vary across experiments and conditions (see specific methods sections for further details). If the participant caught the dot, then the value of $\beta_1 + \beta_2 \theta_C$ points was added to the running total and if the participant missed the dot then the value of $M$ points was deducted from the running total.
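The EV calculation and the MEV catcher size can be illustrated with a short sketch. Everything numerical here is hypothetical: the catch-probability curve is a smooth placeholder rather than the fitted logistic polynomial, and the parameter values are chosen only for illustration (the paper's values are in its Table 2). The sketch also assumes the sign convention that a miss deducts `miss_penalty` points. It additionally shows one way, not necessarily the authors', in which valence could be changed without moving the peak: adding the same constant to both the catch and the miss value shifts the whole EV curve vertically.

```python
import math
import numpy as np

def ev_curve(p_catch, sizes, beta1, beta2, miss_penalty):
    """Expected value for each catcher size.

    Catching scores beta1 + beta2 * size points; missing deducts miss_penalty
    points (assumed sign convention).
    """
    value_catch = beta1 + beta2 * sizes
    value_miss = -miss_penalty
    return p_catch * value_catch + (1.0 - p_catch) * value_miss

sizes = np.arange(0, 101, dtype=float)
# Placeholder catch-probability curve (NOT the fitted curve from the paper):
# equals 0 at size 0 and rises monotonically towards 1.
p_catch = np.array([math.erf(s / (2.0 * math.sqrt(2.0) * 20.0)) for s in sizes])

# Hypothetical "gain" framing: points won for a catch, nothing lost for a miss.
ev_gain = ev_curve(p_catch, sizes, beta1=100.0, beta2=-1.0, miss_penalty=0.0)
# Hypothetical "loss" framing: every outcome is shifted down by 100 points.
ev_loss = ev_curve(p_catch, sizes, beta1=0.0, beta2=-1.0, miss_penalty=100.0)

mev_gain = sizes[np.argmax(ev_gain)]
mev_loss = sizes[np.argmax(ev_loss)]
print(mev_gain, mev_loss)
```

Because the two curves differ only by a constant, an EV-maximising agent should choose the same catcher size under both framings. Any difference in human settings between such conditions would therefore reflect a valence effect rather than a change in the optimal solution.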

Experiments 1 and 2
In experiments 1 and 2 we examined the effect of outcome valence on catcher size choices as well as the influence of dot trajectory variability. Both within and between experiments we considered the impact of changing the difficulty of finding the MEV solution by manipulating the curvature of the EV function. This can be done in two ways: either by changing the reliability of the dot trajectory or by changing the fall off in reward for catching the dot as catcher size increases. Within experiments we varied the path reliability: higher reliability leads to higher curvature of the EV function (see Fig. 3 and supplementary materials). Between experiments 1 and 2 we varied the parameters of the function relating catcher size to $V(\theta_C, \mathrm{catch})$, i.e. parameters $\beta_1$ and $\beta_2$ in Eq. 3. In particular, a smaller negative slope ($\beta_2$) parameter in experiment 2 leads to lower curvature of the EV curve (compare columns in Figure S1.1 in supplementary materials). Note that this change in parameters means that the penalty per additional degree of catcher size in experiment 1 is three times that in experiment 2.
Table 1. Fitted parameters for $p(\theta_C, K)$ using the functional form outlined in Eq. (1).
Fig. 2. Note that as the concentration parameter K (which controls the reliability of the dot's random walk) increases, the probability of catching also increases for any given catcher size.
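To see how the slope of the reward function enters the curvature of the EV curve, the EV can be differentiated twice with respect to catcher size. The following is a sketch under stated assumptions: a linear catch value $\beta_1 + \beta_2\theta_C$, a constant deduction of $M$ points for a miss, and a smooth catch probability $p$ with derivatives $p'$ and $p''$ taken with respect to $\theta_C$. It is not the closed form estimate from the paper's Appendix A.

```latex
\begin{align*}
EV(\theta_C, K) &= p\,(\beta_1 + \beta_2\theta_C) - (1 - p)\,M
                 = p\,(\beta_1 + \beta_2\theta_C + M) - M \\
\frac{\partial EV}{\partial \theta_C} &= p'\,(\beta_1 + \beta_2\theta_C + M) + \beta_2\,p \\
\frac{\partial^2 EV}{\partial \theta_C^2} &= p''\,(\beta_1 + \beta_2\theta_C + M) + 2\,\beta_2\,p'
\end{align*}
```

Since $p' > 0$ (a larger catcher is more likely to contain the dot) and $\beta_2 < 0$, the term $2\beta_2 p'$ is negative and grows in magnitude as the reward slope becomes steeper. Scaling $\beta_1$, $\beta_2$ and $M$ by a common factor scales the whole EV curve, and hence its curvature, by that factor without moving the peak.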

Participants
Sixty undergraduate students (Exp 1, N = 30; Exp 2, N = 30) took part in these experiments. Two participants in Exp 1 did not complete all sessions of the experiment and so their data were excluded (i.e. Exp 1, N = 28; Exp 2, N = 30).

Design
In both experiments two independent variables were manipulated. The first independent variable (random walk reliability) was the same across both experiments and is characterised by the concentration parameter, K. This parameter controlled the spread of the von Mises distribution from which direction samples were drawn on each step (see Warren et al., 2012 for more details) and had 3 levels: (25, 100, 400). Higher values of K were associated with more reliable paths (at K = 400 the trajectory still deviated from a straight-line path, but the deviation was much less marked than at K = 25). The second independent variable (outcome valence) had three levels: (gain, intermediate, loss). Table 2 summarises the $(\beta_1, \beta_2, M)$ parameters from Eqs. 3 and 4 for the different levels of the outcome valence for experiments 1 and 2.
The corresponding EV curves associated with the value functions outlined in Eqs. 3 and 4 and Table 2 and the probability curves defined above (Fig. 2) are illustrated in Fig. 3 and in the supplementary materials in Figure S1.1. The MEV solution catcher size associated with the computationally rational agent is also shown.
Note that parameters in Table 2 were chosen such that the shape of the EV curve, and hence the location of its peak (the MEV catcher size), was fixed across outcome valence conditions for each value of K. However, relative to experiment 1, the curvatures of the EV curves near their peak (MEV) solutions in experiment 2 are considerably lower. Consequently, in experiment 2 it is harder to find the MEV solution. Note also from Fig. 3 that in the gain condition the EV curves are fully in the gain (+ve expected points) region whereas in the intermediate and loss conditions the EV curves lie, to a greater or lesser degree, in the loss (-ve expected points) region.

Procedure
Over three sessions of approximately 20 mins each, participants made 216 catcher settings (24 repetitions for each of the 9 possible pairings of outcome valence × random walk reliability). In each session participants completed a block each of gain, intermediate and loss outcome valence trials. Order of outcome valence blocks was randomised across participants.
Fig. 3. Catcher angle settings for Experiment 1. Each black dot corresponds to a single participant, whereas the larger yellow dot is the average participant. Unbroken vertical line indicates the MEV catcher setting whereas the broken vertical line indicates the average of participant catcher settings. Across all outcome valence and trajectory reliability conditions average catcher size over observers is close to that of the computationally rational agent.

Table 2. Parameters for $V(\theta_C, \mathrm{catch})$ and $V(\theta_C, \mathrm{miss})$ (see Eqs. 3 and 4) used in experiments 1 and 2.
Figure S2.1 in the supplementary materials shows 9 (outcome valence × random walk reliability) histograms of catcher position relative to the straight line between the centre of the display and the point at which the dot first entered the occluder (at an angle $\theta_{in}$) over all participants in Experiment 1. Note that histograms are approximately normal and centred on zero, suggesting that on average participants made a setting centred on $\theta_{in}$ corrupted by zero mean Gaussian noise (likely due to constraints on the ability to remember the point at which the dot disappeared behind the occluder). Note also that there is no obvious systematic effect of condition on the histograms. Similar distributions were obtained for Experiment 2, Experiment 3 and when using the same occluder width in our previous experiment (Warren et al., 2012). The average circular standard deviation of catcher location settings over all conditions and current and previous experiments (matched for occluder width) was calculated as 11.96°. Catcher location settings, $\theta_{loc}$, were therefore modelled as an unbiased Gaussian noise process centred on $\theta_{in}$, i.e. $\theta_{loc} \sim N(\theta_{in}, 11.96)$ (see general methods above for how we used this to recover the probability of catching the dot for a given value of K and catcher width).
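A circular standard deviation of the kind reported above can be computed from the mean resultant length of the angular deviations. The sketch below uses the definition $\sqrt{-2 \ln R}$. The text does not say which of the available definitions the authors used, so this choice is an assumption, as is the function name `circular_std_deg`. The data are simulated rather than taken from the experiments.

```python
import numpy as np

def circular_std_deg(angles_deg):
    """Circular standard deviation (degrees) from the mean resultant length R.

    Uses sqrt(-2 ln R), which equals the ordinary standard deviation for a
    wrapped normal distribution.
    """
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    resultant_length = np.abs(np.mean(np.exp(1j * angles)))
    return float(np.rad2deg(np.sqrt(-2.0 * np.log(resultant_length))))

# Simulated location errors relative to theta_in (not experimental data).
rng = np.random.default_rng(0)
simulated_errors = rng.normal(0.0, 11.96, size=200_000)
print(round(circular_std_deg(simulated_errors), 2))
```

For spreads as small as about 12 degrees the circular and linear standard deviations are nearly identical, which is consistent with treating the location settings as Gaussian noise around $\theta_{in}$.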

Catcher size settings
Figs. 3 and 4 plot mean points scored in each condition against the catcher size settings for all participants (smaller black dots) in experiments 1 and 2 respectively, together with the average catcher size setting over participants (larger lighter dot and dotted vertical line). The unbroken curves and vertical lines in each panel represent the EV curve for that condition and the associated MEV catcher angle respectively. In both experiments settings appear to be largely insensitive to outcome valence. However, there appears to be an effect of path reliability consistent with tracking the MEV solution. Specifically, as increases in path reliability move the MEV solution to smaller catcher angles, human settings track this shift to the left (note that the dotted vertical lines denoting the average catcher size settings in each column move to the left for increasing values of reliability parameter K). Note also that in experiment 2 (where EV function curvature was smaller) there appears to be a consistent bias such that participants set the catcher to be significantly smaller than the size that would maximise points scored. The impact of the experimental manipulations, together with the tendency to underestimate the optimal catcher angle in experiment 2, can be seen more clearly in Fig. 5, which summarises the catcher angle setting in each condition as a function of the reliability of the dot trajectory along with the MEV catcher angle for each condition. Note, in particular, that settings are close to the MEV solution across reliability conditions in experiment 1 (Fig. 5A) whereas they are much lower than required in experiment 2 (Fig. 5B), where the curvature of the EV curve was considerably lower.
Bayesian statistical analyses (see supplementary materials, Tables S3.1, S3.2) suggest that there is some (albeit weak) evidence for the null hypothesis (no difference between the MEV solution and the observed catcher angle settings) in experiment 1 but very strong evidence for the alternative hypothesis (a difference between the MEV solution and the observed catcher angle settings) in experiment 2. The alternative models presented (fixP and maxCrv) are described, and their consistency with the data is considered, in the discussion.
Fig. 4. Catcher angle settings for Experiment 2. Across all outcome valence and trajectory reliability conditions average catcher size over observers underestimates that of the computationally rational agent.
To formally test for effects of path reliability and outcome valence manipulations in each experiment we conducted two 3 × 3 repeated measures ANOVAs on the catcher angle settings (one for each experiment). In experiments 1 and 2 we found a significant main effect of reliability (Exp 1: F(2, 54) = 5.17, p = .009; Exp 2: F(2, 58) = 22.0, p < .001). However, we failed to find evidence for a main effect of outcome valence (Exp 1: F(2, 54) = 0.22, p = .80; Exp 2: F(2, 58) = 1.81, p = .17). In experiment 1 we also failed to find clear evidence for an interaction between outcome valence and path reliability (F(4, 108) = 2.22, p = .07). However, in experiment 2 we did find evidence for an interaction between these factors (F(4, 116) = 3.06, p = .02). Closer examination of Fig. 5B suggests that this interaction is driven by the presence of larger differences in settings at the lowest level of path reliability relative to higher reliability levels. In particular there is a suggestion in the data that catcher settings are highest (most risk averse) when K = 25 and the outcome valence is in the gain domain. This suggests that there may be a greater impact of the valence manipulation when we are more likely to observe biased settings, i.e. in experiment 2 when the task is harder because curvature is low and also when the path is most unpredictable. In experiment 3 we consider the impact of valence when focusing on a smaller range of low reliability trajectories while maintaining the difficulty associated with finding the MEV via a low curvature EV curve.

Efficiency
While there is a clear bias in the catcher settings away from the MEV solution in experiment 2, note that performance is still good with respect to the number of points scored by each participant. To see this, note that each black dot in Fig. 4 is relatively close to the expected points that would be scored on average by the MEV observer. This can also be seen in Fig. 6 where we have plotted the average efficiency (i.e. the difference in points scored between observer and the MEV model as a percentage of the MEV score; see footnote 1) together with the 95% CI over participants. In both experiments 1 and 2 efficiency is close to 100% across dot trajectory reliability conditions and outcome valence conditions (although performance improves to a certain extent with random walk reliability). Bayesian statistical analyses (see supplementary materials, Tables S3.4, S3.5) provide some (differing in strength over reliability conditions) evidence for the null hypothesis, suggesting that performance is not different from that of the MEV observer.

Experiment 3
In the third experiment we focus on a narrow range of low values for the dot trajectory reliability parameter (K) so that trajectories are generally more unreliable (relative to Experiments 1 and 2). Also, lower K is associated with lower EV curvature since, as can be seen in Fig. 2, lower K means that probability of catching increases less quickly with catcher size. Consequently, in this experiment we examine how the valence manipulation affects behaviour in the situation where finding the MEV solution is most difficult and, consequently, performance is most likely to be biased.

Participants
Undergraduate students (N = 50) took part in this experiment and all participants completed both sessions.

Design
Similar to experiments 1 and 2, we manipulated random walk reliability (again characterised by the concentration parameter, K) and outcome valence. In this experiment, however, we focused on a narrower range of low K values: (25, 50, 75, 100). These values are in between the two lowest values of K used in experiment 2.
In this experiment we considered only two outcome valence levels: (gain, loss) and the parameters (β1, β2, M) from Eqs. 3 and 4 for the different levels of outcome valence are reported in Table 3. The corresponding EV curves associated with i) these value functions and ii) the probability curves defined in Fig. 2 are illustrated in Fig. 7 together with the MEV solution catcher size associated with the computationally rational agent (vertical lines). Note that the range for the y axis is chosen to facilitate comparison with the EV curves from experiments 1 and 2, and that the curvature in the region of the MEV solution is low relative to experiment 1 (and consistent with the appropriate K conditions from experiment 2), which should lead to more biased behaviour and potentially make the impact of outcome valence clearer.
Footnote 1: Note that in order to make the efficiency measure commensurate across different valence conditions all differences were normalised with respect to MEV scores in the gain condition. This is equivalent to expressing the difference from the MEV score as a percentage of the distance between the maximum and minimum expected scores across catcher angles.

Procedure
Over two sessions of approximately 20 mins each, participants made 128 catcher settings (16 repetitions for each of the 8 possible pairings of outcome valence × random walk reliability). In each session participants completed one block of gain outcome valence and one block of loss outcome valence. Order of gain and loss blocks was counterbalanced across participants. Fig. 7 illustrates mean points scored in each condition against the catcher size settings for all participants (smaller black dots) in experiment 3, together with the average catcher size setting over participants (larger lighter dot and dotted vertical line). The unbroken curves and vertical lines in each panel represent the EV curve for that condition and the associated MEV catcher angle respectively.

Catcher size settings
Note that similar to experiments 1 and 2 the bias appears to reduce as the dot trajectory reliability increases (dotted vertical lines approach the unbroken line as K increases). Note also that, similar to experiment 2 in which the EV curve curvature was also relatively low, there is a clear bias in settings, with catcher angles systematically smaller than the MEV solution across conditions. However, now there is also some evidence that this bias is larger for loss valence conditions (dotted lines are shifted further to the left relative to MEV solution in the loss domain), suggesting that catcher angles were set smaller (more risky) for loss valence conditions.
The impact of the experimental manipulations can be seen more clearly in Fig. 8 which summarises the average catcher angle setting (over participants) in each condition as a function of the reliability of the dot trajectory. Note that the catcher is too small (relative to the MEV solution) in both valence conditions but is set to be even smaller across all of the Loss valence conditions, suggesting behaviour was more risk seeking for choices in the loss domain. Bayesian statistical analyses (see Supplementary Materials, Table S3.3) suggest that there is strong evidence for the alternative hypothesis (a difference between the MEV solution and the observed catcher angle settings).

Efficiency
Again, similar to experiment 2, while there is a clear bias in the catcher settings to underestimate the MEV solution, note that performance is still good with respect to the number of points scored by each participant. To see this, note that each black dot in Fig. 7 is relatively close to the expected points that would be scored on average by the MEV observer. This can also be seen in Fig. 9 which illustrates the average efficiency (i.e. the points scored as a percentage of the expected MEV score) together with the 95% CI over participants. Efficiency is close to 100% across dot trajectory reliability conditions and outcome valence conditions. Bayesian statistical analyses (see Supplementary Materials, Table S3.6) provide some (albeit mixed over reliability conditions) evidence for the null hypothesis, suggesting that performance is not different from that of the MEV observer.
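The efficiency measure can be illustrated with a minimal Python sketch. The first function follows the definition given in this section (points scored as a percentage of the expected MEV score). The second is one possible reading of the range normalisation described in footnote 1, and is an assumption rather than the authors' analysis code. All function names and numbers are hypothetical.

```python
def efficiency_pct(observed_points, mev_points):
    """Points scored as a percentage of the expected MEV score."""
    return 100.0 * observed_points / mev_points


def range_normalised_efficiency_pct(observed, mev, ev_max, ev_min):
    """Assumed reading of footnote 1: the shortfall from the MEV score is
    expressed as a fraction of the range of expected scores across catcher
    angles, so that gain and loss conditions share a common scale."""
    shortfall = (mev - observed) / (ev_max - ev_min)
    return 100.0 * (1.0 - shortfall)


# Hypothetical gain condition: observer scores 95 points, MEV agent 100.
gain_eff = efficiency_pct(95.0, 100.0)

# Hypothetical loss condition: scores are negative, so the raw ratio is
# not meaningful and the range-normalised form is used instead.
loss_eff = range_normalised_efficiency_pct(
    observed=-105.0, mev=-100.0, ev_max=-100.0, ev_min=-200.0
)
print(gain_eff, loss_eff)
```

Under this reading, an observer who falls 5 points short of the MEV agent on a 100 point range has the same efficiency in the gain and loss domains.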

Summary
Using the perceptual decision-making paradigm from Warren et al. (2012), across three experiments we have investigated the impact of manipulating: i) the difficulty in finding the peak of the EV function and ii) the outcome valence, on catcher size choices for a range of dot trajectory reliabilities. In experiment 1, where EV curvature was highest (i.e. unit changes in catcher size were associated with larger changes in EV), we found that performance was close to that of a computationally rational agent across all outcome valence and reliability conditions. This finding is consistent with that from Warren et al. (2012) and suggests that when differences in EV associated with different choices are above some threshold value, observer performance can be close to computationally rational, with little effect of outcome valence.
In experiment 2, the curvature of the EV curve was considerably reduced, meaning that the difficulty of finding the MEV solution is higher (and conversely the impact of not finding the MEV solution is considerably lower). Accordingly, there was evidence for a much larger bias in performance, with participants setting catcher sizes 10-15 degrees smaller (more risk seeking) than those of the MEV observer. Although participants were still sensitive to the dot reliability manipulation (making settings smaller when the path was more reliable) there was little evidence of an effect of outcome valence (although there was some suggestion of an effect at the lowest level of dot trajectory reliability).
Figure caption (fragment): Error bars are 95% CIs over participant efficiencies. Note that across both experiments efficiency is high.
In experiment 3 the curvature of the EV curve was again lower than that in experiment 1 but we focused on a smaller range of low dot trajectory reliabilities. Similar to experiment 2 we again found evidence for a marked bias in performance, with participants setting catcher sizes 8-13 degrees smaller (more risk seeking) than those of the MEV observer. We also again found a similar effect of dot trajectory reliability (with smaller catcher size settings when the path was more reliable). However, unlike experiment 2 there was now a small (on the order of 4-5 degrees) effect of outcome valence, with catcher size settings in all loss outcome conditions smaller (i.e. more risky) than those in gain outcome conditions.
Taken together these data suggest that when the consequences of making choices that depart from the MEV solution are sufficiently large, performance in this task is approximately computationally rational. However, when these consequences are minimal, behaviour is biased towards setting the catcher too small, i.e. more risky than optimal. Moreover, for conditions in which there is most uncertainty (low values of dot trajectory reliability) there is evidence that loss domain choices do result in slightly riskier behaviour than gain domain choices, which is at least partially consistent with behaviour observed in cognitive decision making.
Fig. 7. Catcher angle settings for experiment 3. Across all outcome valence and trajectory reliability conditions average catcher size over observers underestimates that of the MEV agent.

The dependence of bias on choice difficulty and EV curvature
As noted above (and in Jarvstad et al., 2014), when the difference in EV between lotteries is around or below the threshold of human perception, finding the optimal solution is, of course, difficult. But also given the small difference, making a sub-optimal choice is not particularly important. For example, consider an example from Kahneman & Tversky (1979) in which participants are asked to choose between lotteries A and B such that: A: {(0.33, £2500); (0.66, £2400); (0.01, £0)}. B: {(1.0, £2400)}.
In this example the EV of A is £2409 whereas the EV of B is £2400. Around 82% of participants chose option B in this scenario. Clearly this is only formally sub-optimal relative to an EV maximising decision maker with the ability to discriminate very small differences in EV (the differences here are around 0.4%). Jarvstad et al. (2014) go on to point out that care needs to be taken when comparing behaviour across conditions in which task difficulty is not taken into account and that in experiments where difficulty is equated across tasks, performance is actually rather similar, even across different decision-making contexts (Jarvstad et al., 2012; Jarvstad et al. 2013, 201; Wu et al., 2009).
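The arithmetic behind this example can be checked with a short Python sketch. The lottery probabilities and outcomes are taken from the text; the function and variable names are illustrative only.

```python
def expected_value(lottery):
    """Expected value of a lottery given as (probability, outcome) pairs."""
    total_p = sum(p for p, _ in lottery)
    assert abs(total_p - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * x for p, x in lottery)


# Lotteries A and B from the Kahneman & Tversky (1979) example (in pounds).
lottery_a = [(0.33, 2500), (0.66, 2400), (0.01, 0)]
lottery_b = [(1.0, 2400)]

ev_a = expected_value(lottery_a)
ev_b = expected_value(lottery_b)

# Relative advantage of A over B, as a percentage of the EV of B.
rel_diff_pct = 100 * (ev_a - ev_b) / ev_b
print(round(ev_a), round(ev_b), round(rel_diff_pct, 3))
```

The £9 advantage of A amounts to well under half a percent of the value at stake, which is the sense in which the two options are hard to discriminate.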
To assess formally how choice difficulty (and EV curvature) is related to behaviour in our task we derived the curvature of the EV curve in closed form, choosing the curvature at the MEV solution as our metric of difficulty. This metric can characterise task difficulty in this context where choices are spread over a continuous range of catcher sizes. We calculated EV curvature at the MEV solution for each of the three experiments reported here. To do this note that the curvature, C, of a function g(x) at point x is based upon the first and second derivatives of that function, which in its standard form is:

$$C = \frac{|g''(x)|}{\left(1 + g'(x)^{2}\right)^{3/2}} \qquad (5)$$

As a consequence, to calculate curvature for our EV curve we need only the first and second derivatives of EV as a function of catcher size, which can be derived from equations (2), (3) and (4). These derivatives in turn depend upon the first and second derivatives of p(θ_C, K) which can be recovered from equation (1) (see appendix A). Fig. 10A shows the relationship between EV curvature at the MEV catcher size across conditions and experiments together with the associated average bias over participants. Note that there is a clear negative correlation between these variables (r(26) = -0.826, p < .001). This effect can be seen both within experiment (note lower curvatures are associated with lower value K conditions) and across experiments (e.g. compare biases for experiment 1 with those for experiments 2 and 3).
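As a rough numerical counterpart to the closed-form derivation, the sketch below estimates curvature at the peak of an EV curve using central finite differences. It assumes the standard unsigned curvature of a plane curve, C = |g''| / (1 + g'^2)^(3/2). The quadratic EV curves are hypothetical stand-ins chosen only to contrast a sharply peaked curve with a flat one; they are not the EV functions defined by equations (1) to (4).

```python
def curvature(g, x, h=1e-3):
    """Unsigned curvature of g at x from central finite differences."""
    d1 = (g(x + h) - g(x - h)) / (2 * h)             # first derivative
    d2 = (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)  # second derivative
    return abs(d2) / (1 + d1 * d1) ** 1.5


def make_toy_ev(a, peak=30.0, top=40.0):
    """Hypothetical concave EV curve with its maximum at `peak` degrees."""
    return lambda theta: top - a * (theta - peak) ** 2


# Candidate catcher angles from 0 to 90 degrees in 0.1 degree steps.
thetas = [i * 0.1 for i in range(0, 901)]

ev_sharp = make_toy_ev(a=0.05)    # high curvature, easy to find the peak
ev_flat = make_toy_ev(a=0.005)    # low curvature, hard to find the peak

mev_sharp = max(thetas, key=ev_sharp)   # grid estimate of the MEV angle
mev_flat = max(thetas, key=ev_flat)

c_sharp = curvature(ev_sharp, mev_sharp)
c_flat = curvature(ev_flat, mev_flat)
print(c_sharp, c_flat)
```

At the peak the first derivative vanishes, so the curvature reduces to the magnitude of the second derivative; the flatter curve therefore yields a curvature ten times smaller at the same MEV angle.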
Based on these results we suggest that participant behaviour is sensitive to choice difficulty. Consequently, this further cements the idea that care needs to be taken when interpreting behaviour as biased without considering choice discriminability/difficulty. Unavoidable constraints on the ability to detect differences in outcome value, together with the associated limited consequences of departing from the optimal solution, must be factored into such interpretations.

A systematic bias towards risk seeking - Non-MEV models of behaviour
The majority of participants in experiments 2 and 3 (and the average participant) exhibited a marked bias towards setting the catcher size too small (being more risky than the MEV agent). As noted above, we suggest that bias occurs partly because EV curvature is low in these experiments and there is no strong incentive to find the optimal catcher size. However, this does not explain why the catcher size was set systematically too small.
One possibility is that participants exhibit a bias to underestimate the uncertainty in the dot trajectory and thus set the catcher to be too small. Alternatively, participant settings might reflect underlying biases in probability estimates similar to those seen in the 'decision from experience' literature (e.g. see Hertwig & Erev, 2009;Rakow & Newell, 2010). The choices made in the present study are closer in nature to those seen in previous 'decision from experience' (as opposed to 'decision from description') tasks because probability information is obtained experientially. Previous research suggests that low probability events are underweighted in 'decisions from experience' and if (for example) our participants underestimated the relatively small probability of not catching the dot around the MEV solution they might then set the catcher to be too small. However, note that while they might provide appealing redescriptions of the data, neither of these possibilities provides an account of why such behaviour is observed. Perhaps more importantly the question arises of why behaviour consistent with such biases is not seen across all experiments (note that performance is considerably closer to the MEV agent in experiment 1).
While we do not claim to have a definitive explanation for the behaviour exhibited across our experiments, here we explore two alternative accounts. The first is motivated by the fact that catcher angle settings in experiments 1 and 2 are rather similar. It is possible therefore that participants are ignoring the value information (which differs markedly between experiments 1 and 2) and simply act to maintain a fixed probability of catching across different trajectory reliability and valence conditions. Note that any fixed probability of catching will naturally lead to differences in catcher angles for different values of reliability (see Fig. 2). Consequently, it is possible to find the fixed probability value providing the best fit (using the MATLAB fminsearch function) to the observed catcher angle settings across conditions. The best fitting fixed probabilities for experiments 1 and 2 were very similar (around 0.58 in both cases) and, consequently, the associated best fitting catcher angles across K values are also similar. The output from this analysis is presented as the fixP model in Figs. 5 and 8. Note that although this model appears to have the appropriate overall tendency to set catcher angles too small, it departs from the human data markedly in both experiments, having too large catcher angles at low trajectory reliability and otherwise too small catcher angles. In addition, in both experiments it is clear that catcher angle settings from this model vary much more strongly with the reliability parameter than the human data, which are much closer in this respect to the MEV model.
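The fitting step for the fixP account can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the catch probability function is a hypothetical stand-in for equation (1), the "observed" settings are synthetic, and a one-dimensional golden-section search is swapped in for the MATLAB fminsearch routine mentioned in the text.

```python
import math


def theta_for_p(p, k, scale=150.0):
    """Catcher angle giving catch probability p at reliability k, under the
    hypothetical model p = 1 - exp(-theta * sqrt(k) / scale). This is an
    assumed stand-in for equation (1), not the paper's function."""
    return -scale * math.log(1.0 - p) / math.sqrt(k)


def sse(p, ks, observed):
    """Sum of squared errors between fixP predictions and settings."""
    return sum((theta_for_p(p, k) - o) ** 2 for k, o in zip(ks, observed))


def fit_fixed_p(ks, observed, lo=0.01, hi=0.99, tol=1e-8):
    """Golden-section search for the fixed probability minimising SSE."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c, ks, observed) < sse(d, ks, observed):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return (a + b) / 2


# Reliability levels as in experiment 3; settings are synthetic, generated
# from a known fixed probability so that the fit can be checked.
ks = [25, 50, 75, 100]
synthetic_settings = [theta_for_p(0.58, k) for k in ks]

best_p = fit_fixed_p(ks, synthetic_settings)
predicted = [theta_for_p(best_p, k) for k in ks]
print(round(best_p, 4))
```

Because a single probability is fitted across all reliability levels, the predicted catcher angles are forced to vary with K in the way the probability function dictates, which is the property that makes the fixP model over-sensitive to reliability relative to the human data.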
The second account is motivated by the analysis in the previous section that identified the potential sensitivity of participants to EV curvature. More specifically, we suggest that the participant might recognise that there is more information to be gained from setting smaller catcher angles than the MEV solution in some of our experiments. To see this note from Figures 4, 7 and S1.1 in the Supplementary Materials that in experiments 2 and 3 (but less so in experiment 1) the region of highest EV curvature is shifted to the left of the MEV solution where curvature is actually rather low. Consequently, in experiments 2 and 3 changes in setting size near the MEV have limited impact on outcome. Accordingly, if participants were 'zoning in' on the part of the curve where changes in settings lead to largest impact on outcome then we would indeed predict smaller catcher sizes in experiments 2 and 3 relative to experiment 1. Finding the part of the curve with highest curvature is equivalent to finding the region in which there is most information to be gained by changing settings (since small changes in setting lead to larger differences in outcome). Recent research has similarly suggested that exploratory behaviour can be intrinsically rewarding, and, in particular, that information gain from exploration might contribute to the value function in certain tasks (Clark & Gilchrist, 2018). Exploration of EV space in the region of maximal curvature is in line with this suggestion. In Fig. 10B we show that in accordance with this idea, the bias observed across conditions and experiments is correlated with the distance from the MEV solution to the point of maximum EV curvature (r(26) = 0.845, p < .001). Again, although behaviour is indeed biased away from the MEV catcher size perhaps this is in fact a result of making settings in the region that lead to the biggest changes in outcome (i.e. that were more informative), as opposed to reflecting a tendency for greater risk seeking.
Based on this analysis we define the maxCrv model as the agent that sets catcher angles at the point of maximum EV curvature. Settings from this model are also illustrated in Figs. 5 and 8. Note that the model is biased to make smaller settings than the MEV solutions and is closer to the human data than either of the other models considered across all three experiments, providing a relatively good fit for both experiments 1 and 3.
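The maxCrv agent can be illustrated with a small Python sketch that locates the point of maximum curvature on a grid and compares it with the MEV solution. The EV curve used here is hypothetical (a saturating catch probability multiplied by a payoff that falls linearly with catcher size) and its parameters are arbitrary; it stands in for, but is not, the EV function defined by equations (1) to (4). Curvature is estimated numerically rather than in closed form.

```python
import math


def curvature(g, x, h=1e-3):
    """Unsigned curvature of g at x from central finite differences."""
    d1 = (g(x + h) - g(x - h)) / (2 * h)
    d2 = (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)
    return abs(d2) / (1 + d1 * d1) ** 1.5


def toy_ev(theta, s=10.0, m=100.0, beta=1.0):
    """Hypothetical EV curve: catch probability rises and saturates with
    catcher angle while the payoff for catching falls linearly."""
    p_catch = 1.0 - math.exp(-theta / s)
    return p_catch * (m - beta * theta)


# Candidate catcher angles from 0.1 to 89.9 degrees in 0.1 degree steps.
grid = [i * 0.1 for i in range(1, 900)]

# MEV agent: the angle that maximises expected value.
mev_theta = max(grid, key=toy_ev)

# maxCrv agent: the angle at which the EV curve bends most sharply.
max_crv_theta = max(grid, key=lambda t: curvature(toy_ev, t))

bias = max_crv_theta - mev_theta  # negative means smaller than MEV
print(mev_theta, max_crv_theta, bias)
```

For this toy curve the point of maximum curvature falls to the left of the EV peak, so a maxCrv agent would set the catcher smaller than the MEV agent, in the same direction as the bias described above.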
To formally compare models we calculated ΔAIC values for the MEV, fixP and maxCrv models in each of the three experiments, based on the residual sum of squares (RSS) associated with each model using the standard least-squares formula:

$$\mathrm{AIC} = n \ln\left(\frac{\mathrm{RSS}}{n}\right) + 2p$$

where p is the number of parameters associated with the model and n is the number of data points over which the RSS is calculated. Note that the MEV and maxCrv models have no free parameters whereas the fixP model has 1 free parameter in each experiment (i.e. the fitted fixed probability for each experiment). Table 4 presents the outcome of this analysis and in all three experiments the AIC value is lowest for the maxCrv model, suggesting it provides the best account of the data.
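The model comparison can be sketched in Python as below. This assumes the common least-squares form AIC = n ln(RSS/n) + 2p, which matches the description of p and n in the text but may differ in detail from the authors' calculation. The RSS values and the number of data points are invented for illustration and are not the values reported in Table 4.

```python
import math


def aic_from_rss(rss, n, p):
    """AIC for a least-squares fit: n * ln(RSS / n) + 2p."""
    return n * math.log(rss / n) + 2 * p


def delta_aic(aics):
    """AIC of each model relative to the best (lowest AIC) model."""
    best = min(aics.values())
    return {name: value - best for name, value in aics.items()}


# Hypothetical residual sums of squares over n = 9 condition means.
n_points = 9
models = {
    "MEV": {"rss": 900.0, "p": 0},     # no free parameters
    "fixP": {"rss": 400.0, "p": 1},    # one fitted fixed probability
    "maxCrv": {"rss": 150.0, "p": 0},  # no free parameters
}

aics = {
    name: aic_from_rss(m["rss"], n_points, m["p"])
    for name, m in models.items()
}
deltas = delta_aic(aics)
best_model = min(aics, key=aics.get)
print(best_model, {k: round(v, 2) for k, v in deltas.items()})
```

The 2p term penalises the fixP model for its fitted probability, so it must reduce the RSS by more than a parameter-free model would need to in order to be preferred.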

Dependence of choice behaviour on outcome valence
In cognitive decision-making paradigms the effect of outcome valence on choice is robust, with participants exhibiting consistently risk seeking behaviour when facing choices between (hypothetical) outcomes in the loss domain as opposed to risk averse behaviour for equivalent gain domain outcomes (e.g. Kahneman & Tversky 1979). In contrast, in the present perceptual choice task we observed relatively little impact of the outcome valence manipulation. It is tempting to conclude that this is because behaviour is somehow fundamentally less biased in the perceptual domain, especially when the discrepancy between choice values is easily detected. However, note that previous research suggests that, in fact, there is no discrepancy between performance in appropriately matched cognitive, perceptual and perceptuomotor tasks (Jarvstad et al. 2013). Note also that we did actually observe an effect of outcome valence (in the expected direction) in experiment 3 where we focused on a smaller range of low dot trajectory reliabilities. But while this effect was statistically significant, it was small (on the order of 4-6 degrees) and so considerably smaller than the more noticeable bias towards smaller than optimal catcher sizes observed in all conditions in experiments 2 and 3.
There is evidence from previous research that similar biases can emerge in both perceptual and cognitive decision-making paradigms. For example, a recent paper by Jarbo, Colaco & Verstynen (2020) considered the impact of the framing bias on motor execution behaviour. In the classical framing effect (Tversky & Kahneman, 1981) the fundamental pattern of choices is altered by the way in which a decision is framed. For example, describing a food item as containing 10% fat vs being 90% fat free can change subsequent preferences. Jarbo et al. (2020) showed that a similar framing manipulation, in which participants had to decide on the spatial location to deliver either a precision drone strike (harm context) or an ammunition delivery (help context), led to systematic biases in the selected location.
With these issues in mind we suggest that, in fact, although present, a similar effect of valence to that observed in the cognitive paradigm is greatly reduced and only pronounced in specific circumstances where bias is emphasised. It should be noted however that while the perceptual task here is formally equivalent to the classical lottery selection task used in many cognitive experiments, there are, nonetheless, significant differences. For example, a primary difference is that in the perceptual task the participant selects between many lotteries varying over a continuous range of probabilities of success (and which trade off against outcome value). In contrast, in cognitive DM paradigms the participant is typically asked to choose between only two lotteries. We suggest that such task differences could potentially obscure similar effects of outcome valence from being seen across higher vs lower-level decision making domains. Nonetheless it is still interesting that in experiment 3 a clear effect of outcome valence with a more pronounced bias towards seeking risk is observed in the loss domain relative to gain.

Potential effects of context on catcher size settings
Note that catcher size settings in corresponding conditions in experiments 2 vs 3 differ; specifically, settings in experiment 3 are approximately 5 degrees higher. This suggests that the context in which the participant is placed during the experiment (i.e. the range of uncertainties experienced) impacts upon subsequent settings. One potential account of this behaviour is that participants integrate observations from previous trials to form a local prior on K for the current trial. Given the differences in the ranges of K values in Experiments 2 vs 3 this might lead to such priors being different across experiments. While a formal analysis of this idea is beyond the scope of the present study, this raises an interesting potential direction for future work involving Bayesian modelling of settings with priors adjusted in line with local context across experiments.

Potential effects of learning on catcher size settings
In our experiments participants completed many trials over several sessions. It is possible therefore that settings were not stable over sessions and some learning occurred. To address this question we looked at data in experiments 1 and 2 in which each valence condition was blocked and repeated three times (sessions 1-3). Figure S4.1 in the Supplementary Materials shows how settings changed across the three sessions. It is clear from this analysis that there is evidence for a small difference (tendency to increase catcher angle by a few degrees) between settings in sessions 1 and 2 but that there is no further difference in settings in session 3. We also conducted Bayesian paired t-tests to look for evidence of differences in settings between sessions 1 and 3. The results suggest that there was only limited evidence for the alternative hypothesis that there was a difference in settings between sessions. We suggest, in light of these results, that there is limited evidence of learning across our experiments and performance was relatively stable from the outset.

Conclusion
We suggest that the impact of valence manipulations in perceptual DM tasks like ours is potentially present but considerably smaller than that observed in cognitive DM and only arises when sub-optimal performance is more likely (i.e. when difficulty in finding the MEV solution is high). More importantly we propose that studies of decision making (perceptual, perceptuomotor and cognitive) and associated appraisals of the quality of human choice under risk should take into account the difficulty in finding the MEV solution. Moreover we suggest that participants are sensitive to EV function curvature and studying this sensitivity could yield important insights on drivers of human behaviour in such tasks.
Access to data and analysis code. Anonymous raw data and analysis code are available here: https://figshare.manchester.ac.uk/articles/dataset/PDM_choice_difficulty_gain_v_loss_/14199761.

Acknowledgements
Thanks to Sonia Mansouri, Isobel White and Isaac Bryan who helped with data collection. Thanks to Laurence T. Maloney, Cristina de la Malla, George Farmer and two anonymous reviewers who provided helpful feedback on earlier versions of the manuscript.

Appendix A. Closed form derivation of EV function curvature
Using equation (5), we see that the curvature, C, of the EV function is given by.