Adaptive decision making depends on pupil-linked arousal in rats performing tactile discrimination tasks

Perceptual decision making is a dynamic cognitive process and is shaped by many factors, including behavioral state, reward contingency, and sensory environment. To understand the extent to which adaptive behavior in decision making is dependent upon pupil-linked arousal, we trained head-fixed rats to perform perceptual decision making tasks and systematically manipulated the probability of Go and No-go stimuli while simultaneously measuring their pupil size in the tasks. Our data demonstrated that the animals adaptively modified their behavior in response to the changes in the sensory environment. The response probability to both Go and No-go stimuli decreased as the probability of the Go stimulus being presented decreased. Analyses within the signal detection theory framework showed that while the animals’ perceptual sensitivity was invariant, their decision criterion increased as the probability of the Go stimulus decreased. Simulation results indicated that the adaptive increase in the decision criterion will increase possible water rewards during the task. Moreover, the adaptive decision making is dependent upon pupil-linked arousal as the increase in the decision criterion was the largest during low pupil-linked arousal periods. Taken together, our results demonstrated that the rats were able to adjust their decision making to maximize rewards in the tasks, and that adaptive behavior in perceptual decision making is dependent upon pupil-linked arousal.


INTRODUCTION
Adaptive behavior is essential for animals to survive in an ever-varying environment. In perceptual decision making tasks, sensory information is accumulated over time in the central nervous system, eventually leading to a decision to choose one of the alternatives and generating motor commands to indicate the animal's choice (Smith and Ratcliff, 2004
Multiple neuromodulatory systems may exert heavy influences on the cognitive control of adaptive decision making (Doya, 2008). Previous work has suggested that tonic activation of the locus coeruleus-norepinephrine (LC-NE) system plays a critical role in regulating decision making processes with regard to exploring alternatives or exploiting current resources based on the uncertainty of available information (Aston-Jones and Cohen, 2005; Yu and Dayan, 2005). In contrast, phasic activation of the LC-NE system is thought to reset functional networks, and therefore facilitate their reorganization to enable behavioral adaptation in decision making (Bouret and Sara, 2005). It has been postulated that the activity of the cholinergic system is related to expected uncertainty (Yu and Dayan, 2005).
Precise tactile stimuli were delivered via a multilayer piezoelectric bending actuator (PL140; Physik Instrumente, Germany) driven by a high-voltage amplifier (OPA452; Texas Instruments, Dallas, TX). Whiskers were placed in a short glass capillary pipette approximately 15 mm long with an outer diameter of 1 mm and an inner diameter of 0.5 mm (AM Systems, Carlsborg, WA). The pipette was bonded to the end of the piezo actuator and placed 8 mm away from the right snout. The whisker that received tactile stimuli was chosen among C2, C3 and D2 based on its thickness, and the same whisker was used in all behavioral sessions for each animal. The chosen whisker was slightly trimmed to facilitate its insertion into the pipette.
To mask possible auditory cues, a buzzer (bandwidth: 16 Hz to 10 kHz) delivering white-noise masking sound was placed next to the whisker stimulator. Onset tone (6 kHz), reward tone (3 kHz), and timeout tones (16.5 kHz) were delivered by a speaker installed in the chamber. Animals were remotely monitored with a CCD camera, and an infrared LED was placed in the chamber for illumination during the task. Control of the behavioral task and sampling of the animals' behavioral responses were performed by custom-programmed software running on a MATLAB xPC target real-time system (Mathworks, Natick, MA). All behavioral data were sampled at 1 kHz and logged for offline analyses.
Tactile stimulus. Whisker stimuli were sinusoidal waveforms of 8 Hz and 4 Hz (0.5 s, 1 mm amplitude), with the 8 Hz stimulus randomly assigned as the Go stimulus and the 4 Hz stimulus as the No-go stimulus. The probability of the Go stimulus being presented was designated as either 80%, 50%, or 20% for each session.
Pupillometry recording. The pupil contralateral to the whisker deflection was recorded using a custom-made pupillometry system (Liu et al., 2017), which was triggered at 20 Hz by the xPC target real-time system (Mathworks, MA) that controlled the behavioral task. Pupil images were streamed to a high-speed solid-state drive for offline analysis. To extract pupil size, the pupil contour was segmented using the DeepLabCut toolbox (Mathis et al., 2018). A training set consisting of two hundred frames recorded across different sessions was selected. In each frame, 12 evenly distributed points surrounding the pupil were labeled manually, and the images were cropped to enable a higher training accuracy. After training, the ResNet50 deep network was used to analyze the video clips from all sessions.
The automatically labelled points were fit with circular regression and the pupil size was computed as the area bounded by the contour. Five percent of all images were randomly selected for inspection to validate the accuracy of the software. Pupil
during the tasks. However, during the behavioral task, correct responses to a Go stimulus were rewarded with ~60 µL aliquots of water. Because the number of possible rewarding trials (i.e. Go stimulus trials) was different across the three paradigms, supplemental water was given before returning the animals to the animal facility to ensure their daily water intake was the same across all training days. The weight of the animals was measured and logged immediately after the task.
The onset of each trial was indicated by a brief "trial onset tone" (300 ms, 6 kHz), followed by a random delay (1 to 3.5 s uniform distribution) (Figure 1B). To discourage the animal from impulsively licking, the last 1 s of the waiting period was a designated "no lick" period, during which any premature lick resulted in an additional delay in stimulus presentation drawn from a 1 to 2.5 s uniform distribution (Stuttgen and Schwarz, 2008; Ollerenshaw et al., 2014). The stimulus for each trial could be either a Go stimulus or a No-go stimulus, but the fraction of Go stimulus trials was randomly selected from 0.8, 0.5 and 0.2 for each session, resulting in three behavioral paradigms. Licking within a window of opportunity (1.3 s) following a Go stimulus resulted in a brief "reward tone" (300 ms, 3 kHz) accompanied by a water reward, whereas licking within the window of opportunity following a No-go stimulus triggered a "timeout tone" (5 s, 16.5 kHz) which began a 10 s timeout period. Correct rejection (CR) and miss outcomes were neither rewarded nor penalized.
A 6 s inter-trial period followed the end of the window of opportunity for CR and miss trials, the water reward for hit trials, and the timeout period for false alarm (FA) trials. Across all 5 animals, 260 sessions were performed and 70,516 trials were recorded. Pupillometry was recorded in 165 sessions.
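The trial logic described above can be summarized in a minimal sketch. The function name `classify_trial` and its arguments are hypothetical and are not taken from the original task software; only the 1.3 s window of opportunity comes from the text.

```python
from typing import Optional

# Window of opportunity after stimulus onset, as stated in the text.
RESPONSE_WINDOW_S = 1.3


def classify_trial(is_go: bool, first_lick_s: Optional[float]) -> str:
    """Label one trial as 'hit', 'miss', 'FA', or 'CR'.

    is_go        -- True for a Go (S+) trial, False for a No-go (S-) trial.
    first_lick_s -- time of the first lick relative to stimulus onset in
                    seconds, or None if the animal did not lick (hypothetical
                    representation of the logged lick data).
    """
    responded = (
        first_lick_s is not None and 0.0 <= first_lick_s <= RESPONSE_WINDOW_S
    )
    if is_go:
        # Licking to a Go stimulus is rewarded; withholding is a miss.
        return "hit" if responded else "miss"
    # Licking to a No-go stimulus triggers the timeout; withholding is a CR.
    return "FA" if responded else "CR"
```

In this sketch a lick that arrives after the window closes is treated the same as no lick, which matches the description that only licks within the window of opportunity count as responses.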

Data Analysis
All data analyses were first conducted on individual sessions. Grand averages and standard errors of means were then calculated across sessions for analysis and presentation. For each session, the first 20 trials were excluded due to the time required to adjust the pupillometry camera.
Behavioral Performance. Response probabilities for each session were calculated as the hit rate (HR, i.e. number of hit trials/number of S+ trials) and FA rate (FAR, i.e. number of FA trials/number of S- trials). These were used to calculate perceptual sensitivity (d') and decision criterion as
For analyzing response probabilities, perceptual sensitivity, and decision criterion versus percent of maximum baseline, each session's baseline range was first computed and then evenly divided into 20 bins. Each trial was sorted into one of the bins, and HR, FAR, d', and criterion were calculated for each bin. The loglinear approach was used to allow d' and criterion to be calculated in bins where HR or FAR equaled 1 or 0: 0.5 was added to the number of hits and FAs, and 1 was added to the number of S+ and S- presentations, prior to calculating HR and FAR (Stanislaw and Todorov, 1999).
Reaction times were computed as the time from stimulus onset, which is when the window of opportunity began, until the first lick response within the window of opportunity. Reaction times were only computed when a response was logged within the window of opportunity, i.e. for hit and FA trials, but not miss or CR trials.
Pupil dynamics. Pupil sizes were first Z-scored for each session prior to further analyses. Pupil
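The loglinear correction described for the behavioral measures can be sketched as follows. The correction itself (adding 0.5 to the counts and 1 to the number of presentations) follows the text; the expressions for d' and the criterion are assumed here to be the standard signal detection theory definitions given by Stanislaw and Todorov (1999), d' = Z(HR) - Z(FAR) and c = -[Z(HR) + Z(FAR)]/2. The function name `sdt_measures` is hypothetical.

```python
from statistics import NormalDist


def sdt_measures(n_hit: int, n_go: int, n_fa: int, n_nogo: int):
    """Return (HR, FAR, d', criterion) with the loglinear correction.

    n_hit / n_go  -- number of hits and of S+ presentations.
    n_fa / n_nogo -- number of false alarms and of S- presentations.
    """
    # Loglinear correction: keeps the rates strictly between 0 and 1.
    hr = (n_hit + 0.5) / (n_go + 1)
    far = (n_fa + 0.5) / (n_nogo + 1)

    # Inverse of the standard normal cumulative distribution function.
    z = NormalDist().inv_cdf

    # ASSUMPTION: standard SDT definitions, not confirmed by this text.
    d_prime = z(hr) - z(far)
    criterion = -0.5 * (z(hr) + z(far))
    return hr, far, d_prime, criterion
```

For example, `sdt_measures(10, 10, 0, 10)` remains finite even though the uncorrected HR and FAR would be exactly 1 and 0.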

Simulation to determine optimal decision criterion
To determine the optimal decision criterion, we simulated the behavior of rats with different decision criteria in the three paradigms based on signal detection theory and computed the water reward per unit time for each decision criterion in each paradigm. For each simulated session, the probability of S+ trials was set at either 20%, 50%, or 80%. For a given decision criterion, on an S+ trial, a random number was drawn from a normal distribution with mean = 0.52, which is the mean perceptual sensitivity across the three paradigms, and variance = 1. If the random number was greater than the decision criterion, a hit was logged. Otherwise, a miss was logged. For a No-go trial, a random number was drawn from a normal distribution with a mean of 0 and a variance of 1. Either a false alarm or a correct rejection was logged, depending upon whether the random number was greater than the decision criterion. The duration of a hit trial was composed of a random waiting period (from a 1 to 3.5 s uniform distribution), a mean response time, and a 6 s inter-trial interval, while the duration of a miss or correct rejection trial was composed of a random waiting period (from a 1 to 3.5 s uniform distribution), a 1.3 s window of opportunity and a 6 s inter-trial interval. The duration of a false alarm trial was the sum of a random waiting period (from a 1 to 3.5 s uniform distribution), a mean response time, a 10 s timeout, and a 6 s inter-trial interval. For each paradigm, we simulated 15,000 trials for each decision criterion, and the decision criterion leading to the maximal water reward per unit time was considered the optimal decision criterion. Note that water rewards resulted only from hit responses. We repeated the simulation 20 times to estimate the variance of the optimal decision criterion for each paradigm.
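The simulation procedure above can be sketched in a few lines. This is an illustration under stated assumptions rather than the original simulation code: the mean response time (`mean_rt_s = 0.5`) is a hypothetical placeholder because its value is not given here, reward is counted as the number of hits per second rather than in microliters, and the function names and the criterion grid are illustrative.

```python
import random


def reward_rate(criterion, p_go, n_trials=15_000, d_prime=0.52,
                mean_rt_s=0.5, seed=0):
    """Simulated number of rewarded (hit) trials per second of task time."""
    rng = random.Random(seed)
    hits = 0
    total_time_s = 0.0
    for _ in range(n_trials):
        is_go = rng.random() < p_go
        # Internal evidence: N(d', 1) on S+ trials, N(0, 1) on S- trials.
        evidence = rng.gauss(d_prime if is_go else 0.0, 1.0)
        responded = evidence > criterion

        # Every trial has a 1-3.5 s waiting period and a 6 s inter-trial interval.
        duration_s = rng.uniform(1.0, 3.5) + 6.0
        if responded:
            duration_s += mean_rt_s       # ASSUMPTION: placeholder value
            if is_go:
                hits += 1                 # only hits are rewarded
            else:
                duration_s += 10.0        # timeout after a false alarm
        else:
            duration_s += 1.3             # full window of opportunity elapses
        total_time_s += duration_s
    return hits / total_time_s


def optimal_criterion(p_go, criteria, **kwargs):
    """Criterion on the given grid that maximizes the simulated reward rate."""
    return max(criteria, key=lambda c: reward_rate(c, p_go, **kwargs))
```

A possible use is `optimal_criterion(0.8, [x / 10 for x in range(-40, 21)])`, repeated with different `seed` values to estimate the variability of the optimum, analogous to the 20 repetitions described above.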

RESULTS
To test how animals adaptively change their behavior in perceptual decision making tasks in response to changes in sensory environment, we trained head-fixed rats to perform tactile decision making tasks using a Go/No-go discrimination paradigm (Figure 1A) (Schriver et al., 2018; Liu et al., 2021). In these tasks, the rats were required to decide whether to respond or withhold a response after a tactile stimulus was presented (Figure 1B). In the initial training sessions, the Go stimulus (S+ stimulus, 0.5 s 8 Hz whisker stimulation), to which the animal was trained to respond for rewards, was randomly presented in 50% of trials, while the No-go stimulus (S- stimulus, 0.5 s 4 Hz whisker stimulation), to which the animal was trained to withhold its response to avoid a time-out, was presented in the rest of the trials. Animals had a significantly higher response probability to the S+ than to the S- stimulus in these sessions (0.72±0.019 vs 0.58±0.02, p<7.6e-07, paired t-test, mean ± SEM unless otherwise noted, Figure 1C), indicating that the animals understood the task requirement. Moreover, consistent with our previous work, the pupil size of the rats fluctuated throughout the sessions. The pupil dynamics around stimulus presentation were different across the four possible behavioral outcomes (i.e., hit, correct rejection, false alarm, miss) (Figure 1D&E). Baseline pupil size before stimulus onset was higher for hit and false alarm
To test whether the animals adaptively changed their behavior in response to changes in sensory environment, we systematically manipulated the statistics of sensory signals. In our experiments, we used three paradigms in which the fractions of S+ trials, i.e. trials on which the S+ stimulus was presented, were set at 20%, 50%, and 80%. Each session was randomly assigned one paradigm and its corresponding fraction of S+ trials.
We found that animals adaptively changed their response rate in response to changes in the fraction of S+ trials for each session (Figure 2A). In general, both hit rate and false alarm rate decreased as the fraction of S+ trials
We reasoned that if animals were thirstier in sessions where 20% of the trials were S+ trials, they would tend to lick more impulsively, leading to a higher impulsive licking rate. However, our data suggested this was not the case. The fraction of impulsive licking trials was significantly smaller for sessions where 20% of the trials were S+ trials, compared to the other two paradigms (0.174±0.019 vs 0.134±0.0106 vs 0.062±0.0055, p<1.06e-8, ANOVA test) (Figure 3A), suggesting that the changes in response probability resulted from cognitive processing in response to changes in the probability of S+ trials, rather than the level of thirst. To further support this notion, we found that the reaction time monotonically increased with the decrease in fraction
Figure 3C). As we expected, our data showed that decision criterion was negatively correlated with the percent of impulsive licking trials (p<2.23e-15) (Figure 3D). Taken together, these results suggest that the adaptive behavior of the animals that we observed in our experiments was due to higher-level cognitive processing of the statistics of the sensory environment, rather than low-level physiological needs such as thirst.
We have previously shown that perceptual decision making depends on pupil-linked arousal (Schriver et al., 2020). We then examined whether pupil dynamics were different during adaptive decision making across the three paradigms. Indeed, the pupil dynamics around stimulus presentation were significantly different between the three paradigms for the four behavioral outcomes (Figure 4A).
We found there was a dramatic difference in task-evoked pupil dilation in paradigms with the fraction of S+ trials being 0.8 and 0.5, there is a profound inverted-U or U shaped relationship between baseline pupil size and hit/false alarm rates, decision criterion, and perceptual sensitivity (Figure 4D&E). However, this inverted-U or U shaped relationship between baseline pupil size and hit/false alarm rates, decision criterion, and perceptual sensitivity is less conspicuous for the paradigm in which the fraction of S+ trials is 0.2 (Figure 4F).
How were the pupil dynamics related to the adaptive behavior? Our data demonstrated that task-evoked pupil dilations were different across the three paradigms, and that the animals mostly adjusted their decision criterion while maintaining the same perceptual sensitivity across the three paradigms. To determine the optimal decision criterion for each paradigm, we simulated the water reward per unit time with different decision criteria for each paradigm using our experimental parameters. If the decision criterion is too negative, the animal will be liberal in making Go decisions. Consequently, it will commit many false alarms, and thus a substantial portion of the task will be spent in the time-out period. In contrast, if the animal is too conservative in making Go decisions and sets the decision criterion to a large positive value, it will falsely reject many S+ stimuli, resulting in a low hit rate and less water intake throughout the task period. Our simulation results indicate that the optimal decision criterion was significantly smaller than the ones that we observed experimentally. For the paradigm with the fraction of S+ trials being 0.8, the optimal decision criterion and observed decision criterion were -3.215±0.107 vs -1.137±0.084 (p<2.35e-20, t-test).
Similarly, the optimal and observed decision criteria were -1.535±0.033 vs -0.479±0.065 (p<1.43e-10, t-test) and -0.93±0.0275 vs 0.0636±0.086 (p<6.5e-08, t-test) for the two other paradigms, respectively (Figure 5A). We further examined whether the decision criterion is dependent upon pupil size within each paradigm. We found that in the paradigm where 80% of trials were S+ trials, the decision criterion increased monotonically with baseline pupil size (p<0.0025). However, this trend did not hold for the other two paradigms (Figure 5B).
Although these results indicated that the animals were sub-optimal in terms of their adaptive behavior, the adaptive change in decision criterion observed in our experiments was in line with the optimal adjustment of decision criteria. We therefore compared changes in decision criteria across the three paradigms, measured as the slope of decision criterion across the three paradigms, between the optimal decision making case (i.e. simulation) and the real case (i.e. experiments). We found that the change in optimal decision criteria in response to changes in paradigms, i.e. the slope of the optimal decision criterion across the three paradigms, was much steeper than the experimentally observed slope (1.14 vs 0.355) (Figure 5C). We further examined whether the adjustment of decision criterion in response to changes in sensory environment depended on pupil-linked arousal indexed by pupil size. To this end, we grouped the trials of each session into three groups based on the baseline pupil size, and calculated decision criteria for the trials within each group (Figure 5D). We found that there was a systematic change in the slope along the baseline pupil size, with the slope being largest during small baseline pupil size (p<5.4e-28, ANOVA test).
However, the slopes for all pupil size dependent groups were significantly smaller than the slope of optimal decision criteria (p<4.4e-138, ANOVA test) (Figure 5E).
We further used the drift diffusion model (DDM) to quantify the extent to which the other parameters of decision making, including non-decision time, decision boundary, drift rate and initial bias, were affected by the changes in sensory environment (Figure 6A). To this end, we used a Bayesian approach to estimate the distributions of decision making parameters at the group level for each paradigm (Wiecki et al., 2013). Pp|D refers to the proportion of the posteriors from Bayesian inference that supports the working hypothesis that there is a difference between the paradigms at the group level. We first calculated the Deviance Information Criterion (DIC) value of the four variants of the hierarchical DDM (HDDM) for our behavioral data (see Methods). Since DIC balances the goodness-of-fit of the model against additional free model parameters, we used the model with the lowest DIC value (Figure 6B). This model generated a distribution of reaction times similar to that measured experimentally (Figure 6C). HDDM results suggested a significant difference in non-decision time and initial bias among the three paradigms (Pp|D≈1) (Figure 6D&E). However, for the decision boundary, there was a significant difference only between the paradigm with 80% S+ trials and the paradigms with 50% and 20% S+ trials (Pp|D≈1), and there was no difference between the paradigm with 50% S+ trials and the paradigm with 20% S+ trials (Pp|D=0.2) (Figure 6F). We failed to find significant differences in drift rate across the paradigms (Pp|D>0.05) (Figure 6G).

DISCUSSION
Our previous work investigated how pupil-linked arousal modulates behavioral performance (Schriver et al., 2018)
several novel findings. First, we showed that the animals became more liberal in making a Go decision when the probability of the S+ stimulus increased. This adaptation is in line with the optimal adaptation to maximize rewards during the task (Figures 2&5). Second, task-evoked phasic pupil-linked arousal is higher when the probability of S+ is low (Figure 4)
paradigms. But less engagement and poor attention usually lead to poor performance in perceptual tasks. Moreover, the task-evoked pupil dilation is largest in the paradigm where the probability of S+ trials was 20%, but previous work suggested larger task-evoked pupil dilation during more task-engaged or attentive periods (Hoeks and Levelt, 1993; Cardoso et al., 2019). Taken together, the behavioral adaptation during perceptual decision making in our experiments is unlikely to be primarily due to changes in behavioral states such as attention or engagement in the task. One possibility is that this behavioral adaptation is driven by different internal models involving probabilistic inference and expectation (Bouret and Sara, 2005; Yu and Dayan, 2005; Tervo et al., 2014). Future work with electrophysiological recordings in higher-order brain regions and neuromodulatory systems will help answer this intriguing question.
Our findings provide new evidence that adaptive decision making is dependent upon pupil-linked arousal. Previous work suggested that pupil size is able to reliably index the activation of the LC-NE system, as microstimulation of the LC evoked dramatic dilation of the pupil in rats and
processing with neural interfaces.