So what'cha want? The impact of individualised rewards on associative learning in psychopathic offenders

Psychopathic individuals typically present with associative learning impairments under explicit learning conditions. The present study aimed to investigate whether the formation of stimulus-outcome associations, as well as updating of these associations after changed contingencies, could be improved by using rewards with sufficiently high subjective values. To this end, 20 psychopathic offenders, 17 non-psychopathic offenders and 18 healthy controls performed a passive avoidance task with a reversal phase under three motivational conditions, using naturalistic rewards. The subjective values of the rewards were assessed a priori for each individual participant using a visual analogue scale. The correspondence of these values to their internal representation was confirmed by analyses of brain potentials. Analyses using both signal detection theory and classical approaches indicated that psychopathic offenders performed worse than the other groups during passive avoidance learning. However, using the signal detection approach, we found this deficiency to be present only when a hypothetical reward was used ('neutral reward' condition), whereas psychopathic offenders performed similarly to the other groups when naturalistic rewards could be obtained ('low reward' and 'high reward' conditions). Furthermore, traditional analyses suggested that psychopathic offenders had more hits than the other groups during reversal learning, but the signal detection approach indicated that no effects of group or condition were present. Analysis of win-stay and lose-shift behaviour showed that psychopathic offenders were less likely to stay with a rewarded response during passive avoidance learning in the neutral reward condition. In addition, regardless of experimental phase or condition, psychopathic offenders were less likely to stop responding to a particular stimulus after receiving negative feedback.
Although the approaches employed did not lead to unequivocal results, our findings suggest that psychopathic offenders do have the ability to adapt their behaviour to environmental contingencies when positive reinforcers with sufficiently high subjective values are used.



Introduction
Healthy social functioning relies on having a well-developed set of cognitive skills and functions that allows us to flexibly adapt our behaviour. Such functions include the capacity to learn from feedback, behavioural inhibition, the anticipation of behavioural consequences and the evaluation of punishment and reward (Morgan & Lilienfeld, 2000; Ogilvie, Stewart, Chan, & Shum, 2011). Individuals who show impairments in these cognitive functions are more likely to engage in antisocial behaviour, which might eventually result in entering the criminal justice system. For these individuals, it may be particularly difficult to benefit from the approach used in many correctional systems, in which offenders are expected to reflect on and learn from the negative outcomes of their choices, and to acquire new behavioural repertoires associated with (more) positive consequences.
As offenders are likely to show psychiatric symptoms (Gottfried & Christopher, 2017; James & Glaze, 2006), those afflicted would likely benefit more from psychiatric interventions to achieve this behavioural change. However, highly antisocial offenders, particularly those with psychopathy, are known to respond poorly to traditional therapeutic interventions (see Brazil, van Dongen, Maes, Mars, & Baskin-Sommers, 2018). Psychopathy is a personality disorder characterized by emotional disturbances in combination with severe antisociality. Psychopathic individuals are known for their reckless and impulsive behaviour as well as a callous disregard for others, presenting with a lack of empathy and remorse (Hare, 2003). Consequently, psychopathic offenders constitute one of the most difficult groups to treat and show high rates of recidivism, while they are often involved in extremely violent offences (Barbaree, 2005; Hare, Clark, Grann, & Thornton, 2000; Hildebrand, de Ruiter, & de Vogel, 2004; Salekin, Worley, & Grimes, 2010). These high reoffending rates suggest a relatively low sensitivity to corrective experiences, or an impaired ability to adapt and correct previously acquired antisocial behaviour.
The reduced ability to learn from negative behavioural consequences, like previous incarceration, has often been linked to a number of cognitive processing deficits that are typically observed in psychopathy. A well-established finding is that psychopathy is associated with maladaptive behaviour following negative feedback (e.g., Von Borries et al., 2010), as well as impaired learning of positive and negative contingencies (e.g., Budhani, Richell, & Blair, 2006; Newman & Kosson, 1986). Earlier studies on associative learning in psychopathy were focused on passive avoidance learning, during which individuals learn by trial and error which stimuli they should respond to in order to obtain positive reinforcement (e.g., winning points, referred to as 'reward' in those studies), and which stimuli require a response to be withheld as they do not yield any positive outcomes or are even associated with aversive consequences (e.g., losing points, labelled as 'punishment'). Psychopathic individuals made more commission errors than healthy individuals during passive avoidance learning, but the groups had a similar number of omission errors (Newman & Kosson, 1986; Newman, Widom, & Nathan, 1985; Thornquist & Zuckerman, 1995). The findings indicated that, in conditions including both positive and negative outcomes, the acquisition of stimulus-outcome associations based on aversive contingencies was compromised.
Later studies also identified an impairment in reversal learning in psychopathy (e.g., Brazil et al., 2013; Budhani et al., 2006; Mitchell, Colledge, Leonard, & Blair, 2002). During reversal learning, participants first learn which stimuli are associated with positive outcomes, and which stimuli are associated with negative outcomes. After a predefined learning criterion has been reached, contingencies are reversed, so that stimuli that previously led to positive outcomes should be avoided, and stimuli that led to negative outcomes now yield positive outcomes. The findings indicated that psychopathic individuals performed worse than controls in the reversal phase, suggesting that they had trouble updating the stimulus-outcome mappings after the contingencies had changed (e.g., Baskin-Sommers, Curtin, & Newman, 2015; Brazil et al., 2013; Budhani et al., 2006; Mitchell et al., 2002).
It is likely that the learning impairments associated with psychopathy contribute to the limited capacity for behavioural change that is often seen during treatment of psychopathic individuals. However, facilitating adaptive behaviour also requires the use of reinforcers with sufficiently high motivational value (Bissonette & Roesch, 2016; Miendlarzewska, Bavelier, & Schwartz, 2016; Schultz, 1998). An individual can assign a value to a reinforcer based on how motivationally relevant (i.e., attractive or unattractive) the reinforcer is for the individual, and individuals will vary in the value assigned to the same reinforcer. For example, an apple can have a relatively high value when a person is hungry, but the same apple will have a lower value if the person is already satiated. As the physical properties (e.g., caloric content) of the apple are equal in both situations, the assignment of value to the apple is subjective.
Human and animal research has indeed shown that the subjective values assigned to positive reinforcers can guide a variety of cognitive processes, including associative learning and decision-making (Gallagher, McMahan, & Schoenbaum, 1999; Medic et al., 2014; Padoa-Schioppa & Cai, 2011). However, the majority of studies on reinforcement learning in psychopathy used mere 'points' that could be earned or lost (e.g., Blair et al., 2004; Budhani et al., 2006; Dargis, Wolf, & Koenigs, 2016; Mitchell et al., 2002), without considering the role of subjective valuation. One important note is that points were earned on a trial-by-trial basis, serving as immediate reinforcers that help guide learning throughout the task. Only a handful of studies considered other types of reinforcers that may be seen as being more 'naturalistic' than points, such as money, cigarettes, or snacks that could be obtained only after task completion instead of on each trial (Newman & Kosson, 1986; Newman, Patterson, Howland, & Nichols, 1990; Newman & Schmitt, 1998). However, prior studies in psychopathy have used the term 'reward' interchangeably to refer to both immediate and long-term reinforcement, even though these may not involve the exact same mechanisms (Woolley & Fishbach, 2016). To avoid further confusion, we will use the term 'reward' to refer to long-term reinforcement, using the definition formulated by Berridge and Robinson (2003), which includes three central elements of reward: (a) it must be attractive (affective aspect), (b) it must motivate the individual for action (motivational aspect), and (c) the thought of obtaining the reward must lead to an expectation of an enhanced positive state (cognitive aspect). This definition underlines the multi-faceted nature of the experience of reward, as well as the subjectivity and interindividual variability associated with what is experienced as reward.
The studies employing 'naturalistic' rewards in psychopathy research indicated that using such rewards did not influence learning any differently than points did. However, these studies did not provide any information on the extent to which the included rewards were experienced as being relevant and of high value for their participants. As such, experimental research focusing on reinforcement learning in psychopathy may have used rewards with relatively low subjective value. Importantly, evaluating and comparing different types of rewards require the values assigned to each reward to be placed on a common scale, and animal research has shown that the reward with the highest subjective value on this common scale is often chosen (Lak, Stauffer, & Schultz, 2014). There is compelling evidence that subjective reward values are represented by a common neural currency in (pre)frontal brain areas in humans as well (Levy & Glimcher, 2012; Peters & Büchel, 2010). These representations seem to be generated in the ventromedial prefrontal cortex (vmPFC), a region for which various neuroimaging studies in psychopathic individuals have reported reduced volume (Boccardi et al., 2011; de Oliveira-Souza et al., 2008; Tiihonen et al., 2008; Yang, Raine, Colletti, Toga, & Narr, 2010) and activity during task performance (Birbaumer et al., 2005; Finger et al., 2011; Rilling et al., 2007). Such findings support one of the predictions generated by the Integrated Emotion Systems (IES) model of psychopathy (Blair, 2004). This model assumes that impaired learning in psychopathy is partly driven by dysfunctions in generating representations of reinforcement (including reward) expectancies in the vmPFC (e.g., Blair, 2007), in addition to disturbances in establishing stimulus-outcome associations in the amygdala.
Taken together, there are grounds to believe that the computation and representation of reward values might be compromised in individuals with psychopathy, which in turn could (partly) underlie their maladaptive tendencies and poor decision-making.
However, while there is evidence for disturbances in the computation of reward value in psychopathy (see also Hosking et al., 2017), it is unknown how such an impairment ultimately affects the learning process. The fact that each individual will subjectively assign a different value to a particular reward makes it even more challenging to unravel this relationship. One approach to studying the impact of individual differences in value assignment would be to study reinforcement learning using rewards that are matched on their subjective values across individuals. Thus, the attractiveness of the rewards can be manipulated and controlled for by letting each individual indicate which rewards they value most, and subsequently incorporating these rewards in the experimental paradigm. In addition, collecting neural responses indexing outcome processing can provide further insights into how value assignment affects learning in psychopathy.
The overarching aim of the present study was to explore how subjective valuation of rewards affects associative learning in psychopathic offenders. Specifically, we investigated to what extent self-reported, subjective reward values influence both initial learning and contingency updating, and whether these processes were different in psychopathic offenders compared to non-psychopathic offenders and healthy controls. In line with previous research, we expected both passive avoidance learning and reversal learning to be impaired in psychopathic offenders compared to non-psychopathic offenders and controls under low motivational conditions (i.e., using rewards with low subjective values), but we expected their performance to improve when rewards with high subjective values were incorporated in the task. The rationale behind this hypothesis was that higher subjective reward values would increase sensitivity to cues (i.e., positive and negative feedback on a trial-by-trial basis) that guide behaviour towards obtaining the reward at the end of the task; we expected higher rewards to increase the salience of those immediate reinforcers and also increase the readiness to invest effort. This approach is consistent with research showing that short-term, immediate rewards predict persistence in working towards longer-term goals or rewards (Woolley & Fishbach, 2016). It also has many parallels with clinical practice, in which working towards longer-term goals also involves using immediate feedback to stay motivated and guide behaviour. Win-stay and lose-shift percentages, reflecting the tendency to stay with a response yielding positive feedback (i.e., immediate positive outcomes) and to shift away from a response yielding negative feedback (i.e., immediate negative outcomes), were also calculated for both acquisition and reversal to understand the relation between immediate feedback processing and general task performance.
Previous investigations of contingency processing during associative learning in populations presenting with reversal learning deficits (e.g., individuals with psychopathy and patients with OFC-lesions) have focused on win-stay and lose-shift behaviour during reversal learning only (Berlin, Rolls, & Kischka, 2004; Budhani et al., 2006). However, we were also interested in studying win-stay and lose-shift behaviour during acquisition, as this would allow for a more thorough investigation of whether impaired passive avoidance learning in psychopathy is mainly attributable to reduced processing of information related to positive outcomes, reduced processing of information signalling negative outcomes, or both. Since processing of positive outcomes often seems to come at the expense of processing negative outcomes in psychopathy, at least during passive avoidance learning (Newman et al., 1990), we expected psychopathic offenders to be less prone to shift away from a response yielding negative feedback but to be equally likely to stay with a response yielding positive feedback compared to the two other groups. In addition, similar to our general expectations regarding passive avoidance and reversal learning, we expected their win-stay and lose-shift behaviour to increase under conditions with high subjective reward values.
Finally, electrophysiological responses to feedback (as indexed by the feedback-related negativity, FRN) were collected to examine brain responses to rewards as a function of their subjective values. The FRN is an event-related potential (ERP) commonly used to study outcome processing in reinforcement learning tasks. The FRN has been found to be particularly sensitive to the valence and magnitude of external feedback representing performance-based outcome (i.e., positive or negative outcomes), and is generated in the anterior cingulate cortex (ACC) and other prefrontal areas associated with outcome evaluation (Gehring & Willoughby, 2002; Holroyd & Coles, 2002; San Martín, 2012; Yeung & Sanfey, 2004). The amplitude of the FRN is generally larger following negative relative to positive outcomes, but it is also modulated by motivational significance. For example, one study found that the amplitude of the FRN increased as the level of personal relevance of the outcomes increased during joint action (Loehr, Kourtis, & Brazil, 2015). As such, the FRN could provide information on whether associative learning deficits in psychopathy stem from deficient coding of outcome value. We reasoned that higher subjective reward values would evoke stronger neural responses to negative feedback, and that the groups would not differ in this regard given prior findings indicating intact feedback processing in psychopathic offenders (Von Borries et al., 2010).

Participants
Participants in the offender groups were recruited from the inpatient population of a maximum-security forensic psychiatric institute in The Netherlands. Offenders were initially selected based on available information about clinical status and history obtained from their head therapists and patient files. Healthy controls were recruited via advertisements on social media, via research participant pools, and among employees in the facility. Subsequently, trained psychologists screened potential participants using the Dutch version of the Mini-International Neuropsychiatric Interview (MINI; Van Vliet, Leroy, & Van Megen, 2000) and the Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II; Weertman, Arntz, & Kerkhofs, 1996). Exclusion criteria were all major current Axis I and Axis II disorders (with exception of cluster B personality disorders), chronic use of intoxicating substances, and the use of psychotropic medication at the time of testing. All participants received written information about the experiment, gave written informed consent and received financial compensation. The experimental protocol was approved by the local ethics committee of the Radboud University Nijmegen (ECSW2016-2501-373). Each participant's IQ was estimated using a combination of two core subtests (Information and Coding) of the Dutch Wechsler Adult Intelligence Scale-IV (WAIS-IV; Girard, Axelrod, Patel, & Crawford, 2015; Wechsler, 2012a, 2012b). For assignment to the group with psychopathic offenders, a cut-off score of 26 (Rasmussen, Storsaeter, & Levander, 1999) on the Hare Psychopathy Checklist-Revised (PCL-R; Hare, 2003) was used, an instrument combining file information and a semi-structured interview to assess the core interpersonal, affective and behavioural attributes of psychopathy. The offender groups and healthy controls were matched for age and IQ. No PCL-R scores were available for the control group, as these participants did not have criminal records.
Demographics of the three participant groups are presented in Table 1.
Note (Table 1). Group data are mean (SD) unless stated otherwise. FSIQ = full-scale intelligence quotient. PCL-R = Psychopathy Checklist-Revised. N/A = not applicable. *For one participant in the psychopathic offenders group, no exact PCL-R score was available. However, multiple legal documents in this participant's patient file stated that the PCL-R had been assessed, and that the resulting score indicated that this offender had (very) high levels of psychopathic traits. The mean PCL-R total score was therefore calculated with data from 19 psychopathic offenders. In addition, for one participant in the psychopathic offenders group, the IQ score that was obtained in the screening session could not be traced at the time of data analysis. Therefore, calculation of the mean FSIQ score, as well as further analyses involving IQ, was performed with data from 19 psychopathic offenders.

The sample size was similar to that of prior studies focused on feedback-based learning and/or error processing in offenders with psychopathy (e.g., Brazil et al., 2009; Budhani et al., 2006; Von Borries et al., 2010), and an a priori power analysis (G*Power 3.1; Faul, Erdfelder, Buchner, & Lang, 2009) confirmed that our sample size would generate sufficient statistical power (>.80) to investigate our effects of interest (Cohen's f = .25, α = .05, two-tailed). Legal copyright restrictions prevent public archiving of the various diagnostic instruments and tests described in Section 2.1, which can be obtained from the copyright holders in the cited references.

Task and design
First, participants sorted different items representing potential rewards based on their attractiveness using a Visual Analogue Scale (VAS), on which ratings could range from -25 (extremely unattractive) to +25 (extremely attractive). The reward items were selected from a larger pool of items we identified in a previous study on reward preferences in forensic samples (Glimmerveen, Brazil, Bulten, & Maes, 2018), and could be roughly divided into three categories: material rewards, food-related rewards, and rewards related to personal development (e.g., attending workshops; see Appendix A, for an overview). Participants in the offender samples sorted 20 different items, and healthy controls were presented a subset of 10 items that were identical or comparable to the items presented to the offender groups. One reason for this difference was that a number of items related to personal development were difficult to arrange outside the clinical setting. In addition, the descriptions of some rewards differed in specificity between groups, resulting in multiple specific items in the offender groups versus a single more inclusive item for the healthy control group. For each participant, the two rewards with the lowest positive ratings and the two rewards with the highest positive ratings were selected. The negatively valued items were discarded, as they could not be considered as rewarding items. One of the two selected low-value rewards and one of the two selected high-value rewards were used in another task (not part of the present study) that was performed in the same session; allocation of the highest and lowest rated high-value and low-value rewards to each task occurred in counterbalanced order. In addition, task order was counterbalanced across participants.
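The selection rule described above — discard negatively rated items, then take the two lowest- and the two highest-rated remaining rewards — can be sketched as follows (this is an illustrative reconstruction, not the authors' code; the item names in the example are hypothetical):

```python
def select_rewards(ratings):
    """Select low- and high-value rewards from VAS ratings (-25..+25).

    Negatively rated items are discarded; the two remaining items with
    the lowest positive ratings serve as 'low-value' rewards and the
    two with the highest ratings as 'high-value' rewards.
    """
    # Keep only positively rated items, sorted from least to most attractive.
    positive = sorted((item for item in ratings.items() if item[1] > 0),
                      key=lambda kv: kv[1])
    if len(positive) < 4:
        raise ValueError("need at least four positively rated items")
    low_value = [name for name, _ in positive[:2]]
    high_value = [name for name, _ in positive[-2:]]
    return low_value, high_value
```

One of each pair would then be assigned to the present task and the other to the second task performed in the same session, in counterbalanced order.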
The experiment consisted of an adapted version of the passive avoidance task developed by Newman and Kosson (1986), in which a reversal component was added based on the reinforcement contingency scheme used by Budhani et al. (2006). The experiment was run with OpenSesame v3.0 (Mathôt, Schreij, & Theeuwes, 2012). Participants were seated in front of a 100 Hz computer screen on which events were presented against a black background (see Fig. 1). A trial started with the presentation of a white fixation cross in the middle of the screen, followed by a go or a no-go stimulus. Stimulus presentation was terminated by a response, or by time-out. Participants made their responses by pressing the spacebar of a keyboard. After presentation of the stimulus, a blank screen was presented, followed by visual and auditory feedback on the response. Feedback consisted of the Dutch words goed (correct) or fout (incorrect), presented in green (correct) or red (incorrect) capitals, accompanied by a high or a low tone, respectively. When no response had been made (and stimulus presentation had timed out), no feedback was presented. Note that responses were followed only by feedback and not by additional rewards or punishments during the task. This distinction is important, as it implies that the task does not involve (subjective) punishment, but only subjective rewards to be obtained at the end of the task. Research has shown that negative feedback, although aversive, is not processed as punishment per se (Li et al., 2019). Hence, we designed the positive and negative feedback presented during the task to be 'immediate reinforcers' to guide behaviour towards obtaining the subjective rewards in the long term. We avoid using the terms 'reward' and 'punishment' when referring to feedback and use the term 'reward' only to refer to the subjective rewards that participants could obtain at the end of the task.
In line with Newman and Kosson (1986), participants were presented with eight different two-digit numbers of which four were go stimuli and the other four were no-go stimuli. However, half of the go and no-go stimuli had probabilistic reinforcement contingencies, such that a response on the two probabilistic go stimuli would yield positive feedback on 80% of the trials and negative feedback on 20% of the trials, and, likewise, a response on the two probabilistic no-go stimuli would yield negative feedback on 80% of the trials and positive feedback on 20% of the trials. In addition, there was a go/no-go reversal for half of the stimuli, such that responses to previously correct stimuli (i.e., go stimuli) were followed by negative feedback indicating a wrong response, and responses to previously wrong stimuli (i.e., no-go stimuli) were followed by feedback indicating a correct response. This subset included each of the four stimulus types (non-probabilistic go, non-probabilistic no-go, probabilistic go, probabilistic no-go) and was gradually introduced throughout the experiment following a predefined schedule (see Appendix B). Participants were not informed when these reversals would occur. For both non-reversing and reversing stimuli, there were 20 initial presentations, after which reversing stimuli had 20 additional presentations with reversed contingencies. One experimental run comprised 240 trials, divided into five blocks of 40, 60, 40, 60, and 40 trials, respectively. Participants were unaware of this segmentation, but were offered self-paced pauses in the middle of the second and fourth blocks. Each participant performed the task under three reward conditions, in counterbalanced order using a Latin Square: 'neutral reward', involving a hypothetical reward (further explained below), 'low reward', involving the selected low-value reward, and 'high reward', involving the selected high-value reward.
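The contingency scheme above can be summarised as a simple feedback rule. The sketch below is an illustrative model under stated assumptions (responses only; time-outs carried no feedback; reversing stimuli flip after their 20th presentation; probabilistic stimuli deliver invalid feedback on 20% of responded trials), not the authors' implementation:

```python
import random

def feedback(is_go, responded, probabilistic, reversing, presentation,
             rng=random.random):
    """Return True (positive feedback), False (negative), or None (no feedback).

    is_go         : stimulus is a go stimulus under the *initial* mapping
    responded     : participant pressed the spacebar
    probabilistic : feedback is valid on only 80% of responded trials
    reversing     : the go/no-go contingency flips after 20 presentations
    presentation  : 1-based presentation count for this stimulus
    """
    if not responded:
        return None                            # time-outs received no feedback
    # Flip the go/no-go status for reversing stimuli past their 20th presentation.
    effective_go = is_go != (reversing and presentation > 20)
    positive = effective_go                    # responding to a go stimulus is correct
    if probabilistic and rng() < 0.2:
        positive = not positive                # invalid feedback on 20% of trials
    return positive
```

The `rng` argument exists only to make the 80/20 contingency testable with a deterministic stub.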
The neutral reward was considered as having a subjective value equalling zero. A horizontal bar representing the participant's performance during the task was displayed continuously on the top of the screen (see Fig. 1). The bar increased and decreased in steps of 25 points, simultaneously with the presentation of positive and negative feedback, respectively. Since we wanted participants to be focused on the reward, and not on the number of points needed to gain the reward, they were not informed about this underlying point system. A cartoon figure (i.e., Homer Simpson) was placed at the beginning of the bar, and the upper right corner of the screen showed a picture of either the neutral, low, or high reward. The pictures representing low and high rewards were similar to the pictures that were used to rate the rewards on the VAS at the start of the experiment, and thus depended on the participants' individual choices. In the neutral (i.e., hypothetical) reward condition, the reward was represented by a donut. It was explained to the participants that they would not gain this donut themselves, but that Homer would be very grateful if they could help him to get it. Participants (or the cartoon figure, in the neutral reward condition) gained the reward when the horizontal bar reached (or crossed) the reward picture at the end of the task. To account for learning effects, the numbers of points to be earned for obtaining the reward were determined separately for each experimental run (first, second, or last), based on the median numbers of points participants earned in an (unpublished) pilot study (1410, 1550, or 1750 points, respectively). Material and food-related rewards (except the supervised dinner-preparing session to take place on the ward) were handed over immediately at the end of the session; for the rewards related to personal development, appointments were arranged.
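The hidden point system can be illustrated as a simple accumulator: each positive feedback event adds 25 points, each negative one subtracts 25, and the reward is obtained when the run-specific threshold (1410, 1550, or 1750 points) is reached. A minimal sketch (assumption: points are the only determinant of reward attainment, as the text implies):

```python
def run_outcome(feedback_sequence, threshold):
    """Accumulate +/-25 points per feedback event and test the threshold.

    feedback_sequence : iterable of booleans (True = positive feedback)
    threshold         : points needed for this run (1410, 1550, or 1750)
    Returns (total points, reward obtained).
    """
    points = sum(25 if positive else -25 for positive in feedback_sequence)
    return points, points >= threshold
```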

ERP acquisition and data processing
Electroencephalography (EEG) was recorded using 32 active electrodes (Acticap, Brain Products GmbH, Germany) arranged according to an extended version of the 10-20 system. All electrodes were referenced to the left earlobe. Vertical and horizontal eye movements were monitored using bipolar electrooculography (EOG) electrodes positioned above and beneath the right eye and at the outer canthi of both eyes. Impedance was kept below 10 kΩ and all signals were acquired with a sampling frequency of 500 Hz. EEG data processing was performed offline using Brain Vision Analyzer software (V2.01.3931, Brain Products GmbH, Germany). The data were re-referenced to the mean of both earlobe electrodes. Ocular artefacts were removed using Independent Component Analysis (Jung et al., 2000). The data were filtered using high- and low-pass filters of .05 Hz (24 dB/oct) and 30 Hz (24 dB/oct), respectively. Next, the EEG data were segmented into epochs ranging from 200 msec before to 900 msec after feedback onset. The FRN was identified as the most negative peak relative to a 200 msec pre-feedback baseline period measured on the FCz electrode in the 150-400 msec interval after feedback onset. The choice for this interval was based on the time windows that are most commonly reported in the literature (Hajcak, Moser, Holroyd, & Simons, 2006; Holroyd, Nieuwenhuis, Yeung, & Cohen, 2003; Von Borries, Verkes, Bulten, Cools, & De Bruijn, 2013; Yeung & Sanfey, 2004). Finally, difference waves were calculated by subtracting FRN amplitude after win feedback from FRN amplitude following loss feedback. This subtraction procedure should isolate the components specifically related to differences in the processing of win and loss feedback (Hajcak, Moser, Holroyd, & Simons, 2007; Holroyd & Coles, 2002). Hence, larger difference waves are thought to reflect stronger neural responses to losses.
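Under the parameters above (epochs from -200 to +900 msec around feedback onset, 500 Hz sampling, -200-0 msec baseline, 150-400 msec FRN window at FCz), peak extraction and the loss-minus-win difference can be sketched with NumPy. This is a simplified single-channel reconstruction of the described procedure, not the Brain Vision Analyzer pipeline:

```python
import numpy as np

FS = 500            # sampling rate (Hz), as reported
EPOCH_START = -0.2  # epoch onset relative to feedback (s)

def frn_amplitude(epoch, win=(0.150, 0.400)):
    """Most negative value in the FRN window, relative to the pre-feedback baseline.

    epoch : 1-D array of FCz voltages spanning -200 to +900 msec around feedback.
    """
    t = EPOCH_START + np.arange(epoch.size) / FS
    baseline = epoch[t < 0].mean()                   # -200..0 msec baseline
    window = epoch[(t >= win[0]) & (t <= win[1])]    # 150..400 msec search window
    return window.min() - baseline

def frn_difference(loss_epochs, win_epochs):
    """Loss-minus-win difference of mean FRN amplitudes (larger = stronger loss response)."""
    loss = np.mean([frn_amplitude(e) for e in loss_epochs])
    win = np.mean([frn_amplitude(e) for e in win_epochs])
    return loss - win
```

Note one simplification: this subtracts per-condition peak amplitudes directly, which matches the subtraction the text describes but skips intermediate steps (e.g., averaging waveforms before peak picking) that the original software may have performed.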

Data analyses
We used two different (but related) approaches to interpret response accuracy. First, we analysed hit rates and false alarm rates. However, one drawback of this approach is that it does not account for inter-individual differences in response tendencies (e.g., participants that respond on any trial type vs overly careful responders). Therefore, we used a signal detection framework to obtain less biased measures of participants' ability to discriminate between targets and non-targets. Using this approach, we obtained 'cleaner' performance measures that better reflect sensitivity to detect relevant cues, which is relevant to the learning process. Prior to analysis, hit rates and false alarm rates were calculated for each phase and contingency probability (both separately and combined), as well as for the total experiment (hit rate = number of hits/number of go trials; false alarm rate = number of false alarms/number of no-go trials). Next, hit rates and false alarm rates were converted into d′ discriminability values using a signal detection framework: d′ = Z(hit rate) − Z(false alarm rate).

Fig. 1 – Sequence of events and their timing (msec) during the experiment. ITI = inter-trial interval. The figure depicts a trial in the neutral reward condition, with the cartoon character in the upper left corner of the screen and a picture of the reward (i.e., a donut in the neutral reward condition) in the upper right corner. In the low and high reward conditions, pictures of the corresponding low-valued and high-valued rewards were displayed in the upper right corner. The yellow bar between the cartoon character and the reward reflects the accumulated number of points and thus the progress towards obtaining the reward at the end of the current experimental run. In the figure, about 40% of the required number of points has been accumulated, as the yellow bar is almost halfway between the cartoon figure and the reward.
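The d′ conversion above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code; the function name and the clamping of extreme rates (one common remedy among several discussed by Stanislaw & Todorov, 1999) are our own assumptions:

```python
from statistics import NormalDist

def d_prime(hits, n_go, false_alarms, n_nogo):
    """Compute d' = Z(hit rate) - Z(false alarm rate).

    Rates of exactly 0 or 1 make the inverse-normal transform
    undefined, so extreme rates are nudged inward by 1/(2N)
    (an assumed correction; several alternatives exist).
    """
    def clamp(p, n):
        return min(max(p, 1 / (2 * n)), 1 - 1 / (2 * n))

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = clamp(hits / n_go, n_go)
    fa_rate = clamp(false_alarms / n_nogo, n_nogo)
    return z(hit_rate) - z(fa_rate)

# A participant with 45/50 hits and 10/50 false alarms:
# d_prime(45, 50, 10, 50) -> approximately 2.12
```

Because both rates pass through the same transform, two participants with very different response biases (e.g., a liberal responder with many hits and many false alarms vs a conservative one with few of each) can receive the same d′, which is exactly the property the signal detection framework is used for here.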
In order to examine participants' behaviour after receiving positive or negative feedback, win-stay and lose-shift percentages were calculated based on trial-by-trial data. Win-stay percentage reflects the proportion of instances a participant repeats a response that yielded positive feedback on the subsequent encounter with a specific stimulus, instead of (incorrectly) withholding a response to this stimulus (win-stay percentage = number of win-stay trials/(number of win-stay trials + number of win-shift trials) × 100). Lose-shift percentage reflects the proportion of instances a participant withholds a response when confronted with a stimulus that yielded negative feedback during the previous encounter, instead of incorrectly responding to this stimulus again (lose-shift percentage = number of lose-shift trials/(number of lose-shift trials + number of lose-stay trials) × 100).
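The trial-by-trial bookkeeping behind these two percentages can be sketched as follows. This is an illustrative sketch only: the tuple encoding of trials, the assumption that feedback follows responses, and the function name are ours, not the study's data format:

```python
def win_stay_lose_shift(trials):
    """Compute win-stay and lose-shift percentages from trial-by-trial data.

    `trials` is an ordered list of (stimulus, responded, feedback) tuples,
    where `responded` is a bool and `feedback` is 'win', 'loss', or None
    (assumed encoding). Each trial is compared with the previous encounter
    of the same stimulus.
    """
    last = {}  # stimulus -> (responded, feedback) on its last encounter
    win_stay = win_shift = lose_shift = lose_stay = 0
    for stim, responded, feedback in trials:
        if stim in last:
            prev_resp, prev_fb = last[stim]
            if prev_resp and prev_fb == 'win':
                win_stay += responded        # repeated the rewarded response
                win_shift += not responded   # (incorrectly) withheld it
            elif prev_resp and prev_fb == 'loss':
                lose_shift += not responded  # withheld after negative feedback
                lose_stay += responded       # (incorrectly) responded again
        last[stim] = (responded, feedback)
    ws = 100 * win_stay / (win_stay + win_shift) if win_stay + win_shift else None
    ls = 100 * lose_shift / (lose_shift + lose_stay) if lose_shift + lose_stay else None
    return ws, ls
```

Note that the denominators match the formulas in the text: win-stay is computed over all post-win encounters (stay + shift), and lose-shift over all post-loss encounters (shift + stay).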
ERP data were analysed using a 3 × 3 repeated measures analysis of variance (ANOVA) with Condition (neutral reward, low reward, high reward) as within-subject factor and Group (psychopathic offenders, non-psychopathic offenders, controls) as between-subjects factor.
We analysed hit rates and false alarm rates for acquisition and reversal separately using 3 × 2 × 3 repeated measures ANOVAs with Condition (neutral reward, low reward, high reward) and Probability (100–0, 80–20) as within-subject factors, and Group (psychopathic offenders, non-psychopathic offenders, controls) as between-subjects factor. Similar analyses were performed using the d′ values as outcome measure. Win-stay and lose-shift percentages were analysed using a 3 × 2 × 3 repeated measures ANOVA with Condition (neutral reward, low reward, high reward) and Phase (acquisition, reversal) as within-subject factors and Group (psychopathic offenders, non-psychopathic offenders, controls) as between-subjects factor.
Since IQ was found to differ significantly between groups, and IQ was expected to contribute to inter-individual learning variability, it was added as a covariate in all behavioural analyses involving the group factor (i.e., analyses of covariance; ANCOVAs). Effect sizes are reported as partial eta-squared [η²p; small ≥ .01, medium ≥ .06, large ≥ .14 (Cohen, 1988)]. Statistical analyses were performed in SPSS 23.0 (IBM Corporation, Armonk, NY). An overview of all tested effects in primary analyses and post-hoc tests is presented in Appendix C and Appendix D.
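For reference, partial eta-squared relates an effect's sum of squares to that effect plus its associated error term. SPSS reports this value directly; the one-line function and example numbers below are our own illustration:

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared = SS_effect / (SS_effect + SS_error).

    Unlike plain eta-squared, the denominator excludes variance
    explained by other effects in the model.
    """
    return ss_effect / (ss_effect + ss_error)

# e.g., SS_effect = 12.0 and SS_error = 88.0 give .12,
# a medium effect by the benchmarks above (.06 <= .12 < .14)
```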

ERP results
First, the mean peak amplitude of the difference waves at FCz was compared between reward conditions. Mauchly's test indicated that the assumption of sphericity had been violated, χ²(2) = 6.29, p = .043; therefore, degrees of freedom were corrected using Huynh–Feldt estimates of sphericity (ε = .88). The analysis revealed a significant effect of reward condition (see Fig. 2), indicating that feedback processing was positively related to (subjective) reward value at a neural level. The analysis revealed no other statistically significant main or interaction effects.

Standard accuracy analyses
Standard accuracy analyses did not reveal a significant contribution of IQ to hit rates or false alarm rates. However, given the theoretical significance of intelligence with respect to (adaptive) learning ability (Sternberg, 1997), we decided to keep IQ as a covariate in our accuracy analyses involving the group factor. Table 3 shows mean d′ values for all groups in each phase and condition, pooled over the probability factor and corrected for IQ. The Group × Condition × Probability repeated measures ANCOVA on d′ from the acquisition phase revealed that IQ was significantly related to d′ values [F(1, 50) = 5.66, p = .021]. The Group × Condition × Probability ANOVA on the d′ values obtained from the reversal phase did not reveal any significant main or interaction effects (see also Appendix C.4 and E.2).
Regarding lose-shift percentages, there was an interaction effect of group and phase, Wilks' Lambda = .88, F(2, 50) = 3.35, p = .043, η²p = .118 (see also Appendix C.5). In order to examine this interaction effect, ANCOVAs were performed for acquisition and reversal separately (see Appendix D.4, and Fig. 4a and b, respectively). This revealed a main effect of group during acquisition, F(2, 50) = 5.31, p = .008, η²p = .175, but not during reversal. Pairwise comparisons showed that, during acquisition, psychopathic offenders were less likely to shift away from a response yielding negative feedback compared to non-psychopathic offenders [t(51) = −2.68, p = .010] and controls [t(51) = −3.02, p = .004], but no differences were observed between non-psychopathic offenders and controls.

Discussion
The aim of the present study was to explore to what extent the subjective valuation of (long-term) rewards influenced associative learning based on (immediate) trial-by-trial feedback in psychopathic offenders. Analyses of brain potentials showed that rewards with relatively high subjective values evoked stronger neural responses than neutral and low-value rewards, providing evidence that the subjective values assigned a priori to the rewards were consistent with the internal representation of these reward values. In addition, the ERP analyses did not indicate any group differences, suggesting that the mechanisms involved in the computation of value and the processing of feedback were unaffected in the offender groups.

Passive avoidance
Using the traditional analytical approach, we found that, at least in the acquisition phase of the task, psychopathic offenders made more false alarms but a similar number of hits compared to non-psychopathic offenders and healthy controls. This is in line with previous research showing that psychopathy is associated with making more errors of commission but equal errors of omission during passive avoidance learning (e.g., Newman & Kosson, 1986; Newman et al., 1990). There were no effects of reward condition on hits and false alarms during passive avoidance learning. However, using a signal detection framework, we found that psychopathic offenders performed worse during acquisition than the two comparison groups in the neutral reward condition. In other words, with no reward with a sufficiently high subjective value to look forward to, psychopathic offenders learned less about the contingencies compared to non-psychopathic offenders and controls. The discrepancy between the results obtained with these two different approaches can be explained by considering that d′ does not depend on the response bias of individual participants. Hence, compared to hits and false alarms, it is a more sensitive measure of how well an individual is able to discriminate between targets and non-targets (see Stanislaw & Todorov, 1999) and, consequently, to learn which stimuli to respond to.

Fig. 3 – Win-stay behaviour for each group in each condition across phases. Percentages reflect the proportion of instances participants repeated a response that yielded positive feedback on the subsequent encounter with a specific stimulus.
The findings obtained using the signal detection framework correspond with previous research reporting passive avoidance learning deficits in psychopathy when reinforcers with relatively neutral subjective value are used (Newman & Kosson, 1986; Newman et al., 1990; Newman & Schmitt, 1998). Interestingly, however, we found this deficit to be absent in conditions where rewards with sufficient subjective values could be earned. The interpretation of this finding is complicated by the fact that there was no (clear) linear effect of reward value condition on d′ for any of the groups. This suggests that the presence of a group difference in the neutral but not the low and high reward conditions is not merely due to the psychopathic offenders improving their performance on the task with increasing subjective reward value. At least numerically, for the psychopathic offenders d′ tended to be higher during the low and high reward conditions than during the neutral condition, whereas this pattern was reversed for the non-psychopathic offenders and controls. In other words, the similar d′ values for the groups in the low and high subjective reward conditions were not only due to a (non-significant) tendency for the psychopathic offenders to perform better under these conditions, but also to a simultaneous (non-significant) tendency for the other two groups to perform worse under these conditions, relative to the neutral reward condition. The reason for the latter effect is unclear but may be related to complex interactions between (reward-induced) motivation, arousal level, and inhibitory control (e.g., Yee & Braver, 2018), which for the non-psychopathic and control participants may have resulted in reduced inhibitory control in the non-neutral reward conditions. This pattern of findings is in line with the notion of the IES model that disturbances in reinforcement expectancies and stimulus-outcome associations can be modulated by saliency, such as reward level (Blair, 2007; Blair et al., 2004). Blair et al. (2004) indeed found performance of psychopathic offenders during passive avoidance learning to be positively related to the reward level of specific stimuli. Although, again, our results do not reveal a clear improvement (or decrement) across conditions for any group, they do show that the passive avoidance deficit that our psychopathic participants displayed in the neutral reward condition, relative to the performance of the other two groups under this condition, was absent in the low and high reward conditions.

Fig. 4 – a. Lose-shift behaviour for each group in each condition during acquisition. b. Lose-shift behaviour for each group in each condition during reversal. In both panels, percentages reflect the proportion of instances participants withheld a response when confronted with a stimulus on which responding yielded negative feedback during the previous encounter.
In addition, our results cannot be explained by the Response Modulation (RM) hypothesis (Gorenstein & Newman, 1980; Newman & Kosson, 1986; Patterson & Newman, 1993), which is another influential framework used to explain impaired associative learning in psychopathy. The RM hypothesis states that the impaired passive avoidance learning displayed by individuals with psychopathy is caused by an attentional deficit that directs too many attentional resources to information related to positive outcomes in conditions involving both positive and negative behavioural contingencies. According to the RM hypothesis, this would result in a loss of attention for other contextual information, such as negative outcomes, leading to impaired learning from negative feedback. Importantly, without additional assumptions, high reward conditions should, if anything, engender even greater attention to reward information at the expense of attention to punishment information, thereby (further) decreasing rather than increasing task performance.

Reversal learning
Although standard accuracy analyses indicated that psychopathic offenders had more hits during reversal compared to the other groups, this group difference disappeared using a signal detection framework, which showed that all groups performed similarly in the reversal phase of each condition. This finding is contrary to expectations, as reversal learning impairments in psychopathy have been shown before, at least under explicit learning conditions (e.g., Brazil et al., 2013; Budhani et al., 2006; Mitchell et al., 2002). However, there are indications that reversal learning impairments in offenders with psychopathy are less robust than previously believed (De Brito, Viding, Kumari, Blackwood, & Hodgins, 2013; Mitchell et al., 2006). The presence of reversal learning deficits seems to depend on the learning context provided by the experimental task used (Brazil, 2015; Brazil et al., 2013). In addition, sample-specific cognitive and clinical variables, such as the level of processing of predictive information and childhood maltreatment history, also appear to play important roles in the severity of the reversal learning impairment in psychopathy (Dargis et al., 2016; Gregory et al., 2015). We cannot rule out that such factors may have affected task performance during reversal learning in our particular samples. Another explanation might be that there is a relation between task complexity and the locus of the learning impairments in psychopathy (Estrada, Tillem, Stuppy-Sullivan, & Baskin-Sommers, 2019). In relatively simple tasks including few stimuli, psychopathic offenders show intact acquisition and reversal (Brazil et al., 2013). However, in tasks including a more intermediate number of stimuli, they are still able to perform acquisition, but show impaired reversal (Budhani et al., 2006). Furthermore, in very complex tasks, including a large number of stimuli, they also show impaired acquisition (Von Borries et al., 2010).
Importantly, in these studies, psychopathic offenders and comparison individuals were matched for (estimated) IQ or educational level, controlling for possible differences in intellectual abilities. Together, these data suggest that increased task complexity elicits a shift in the locus of impairment, but to date no studies have examined this hypothesis systematically. Since our study included eight (i.e., an intermediate number of) different stimuli, it would be expected that, as in the study of Budhani et al. (2006), psychopathic offenders would at least present with impaired reversal. The fact that they showed intact reversal compared with healthy controls and non-psychopathic offenders highlights the need for future studies to further specify how task complexity may affect associative learning in psychopathy.

Win-stay behaviour
To investigate the immediate behavioural effects of positive and negative feedback, we also examined participants' behaviour on each subsequent encounter with each particular stimulus. Compared to non-psychopathic offenders, psychopathic offenders tended to be less likely to stay with a response yielding positive feedback in the neutral reward condition. Although win-stay behaviour of controls did not differ from that of either offender group, this suggests that, when explicit rewards are lacking, the motivational drive of offenders with psychopathy to use positive feedback information to guide future decisions is lower than that observed in other violent offenders. Hence, their ability to use positive feedback depends on whether the feedback is linked to an explicit and attractive reward with a subjective value (e.g., the low- and high-value rewards used in our experiment) that is higher than that of neutral rewards (e.g., points). If confirmed in future studies, a practical implication could be that feedback should be explicitly linked to clear and attractive rewards that psychopathic offenders are willing to pursue, in order to encourage them to use positive feedback to develop more adaptive behavioural repertoires.
On the other hand, as previously noted, the IES model states that the formation and updating of expectancy representations may be compromised in psychopathic individuals because of abnormalities in (the connectivity between) the amygdala and the vmPFC. Indeed, Budhani et al. (2006) found reduced win-stay behaviour in psychopathic offenders during reversal (the authors did not report on win-stay and lose-shift behaviour during acquisition). However, the IES model does not explain how differences in subjective reward value would influence feedback processing. Perhaps the neural processes involved in the formation and representation of stimulus-outcome associations and expectancies become more efficient when subjective reward values exceed a critical threshold. Interestingly, the findings obtained by Gregory et al. (2015) suggest that subjective value promotes associative learning in psychopathic offenders, although they found the neural processing of subjective value information to be highly atypical compared to non-psychopathic offenders and controls.

Lose-shift behaviour
Compared to non-psychopathic offenders and healthy controls, psychopathic offenders were less likely to shift away from a response yielding negative feedback during acquisition. In other words, psychopathic offenders seemed less likely to use negative feedback during passive avoidance learning than the two comparison groups, which is in agreement with the results obtained by Von Borries et al. (2010). Interestingly, in both phases, there was no effect of reward condition on this immediate measure of negative feedback processing, which suggests that a more general deficit underlies these impairments. This finding might be in line with the RM hypothesis when assuming limited attentional resources for processing negative feedback information in tasks involving both positive and negative feedback, regardless of the overall motivational level induced by subjective reward. When evaluated from the neurocognitive perspective of the IES model, it suggests that the formation and updating of expectancy representations in the amygdala and vmPFC is especially compromised when negative feedback is being processed.

Limitations
One potential caveat of our study is that we were not able to test for effects of PCL-R factor scores, since factor scores were not available for a number of participants. We were dependent on file information and had neither permission nor resources to administer the PCL-R ourselves in the present study. A second limitation of our study design is that we focused exclusively on subjective reward, without examining subjective punishment. Investigating how varying levels of subjective punishment affect associative learning would be particularly relevant for understanding how behavioural change can be achieved in settings where it is difficult to implement a wide array of subjective rewards (e.g., prison). Importantly, the ethical aspects of research into subjective punishment in populations from the criminal justice system should be well considered, both with respect to study design and the practical implications of its findings.

Conclusions
Our findings suggest that the use of naturalistic rewards, as opposed to more 'artificial' rewards, results in initial learning of new information in psychopathic offenders that is more comparable to that displayed by non-psychopathic offenders and controls. Moreover, such rewards facilitate their ability to use positive feedback information to guide future decisions. Importantly, although the attractiveness of each reward was tailored to the subjective preferences of each individual participant, the observed effects were independent of the magnitude of the associated subjective reward values. Another important finding is that traditional, standard accuracy analyses may lack sufficient sensitivity to detect these effects. Contrary to expectations, we did not find any group differences or effects of reward condition on reversal learning performance. However, in the acquisition phase, and irrespective of reward condition, psychopathic offenders were impaired in adapting their behaviour following negative feedback. Our findings suggest that, despite a more general deficit in negative feedback processing, psychopathic offenders can adapt to environmental contingencies and reach performance levels equal to those of non-psychopathic offenders and controls when positive reinforcers with sufficiently high subjective values are used. These findings are the result of a novel approach to associative learning in psychopathy and stress the importance of personalised methodologies when using reinforcement techniques in forensic treatment.

Transparency
We report how we determined our sample size, all data exclusions, all inclusion/exclusion criteria, whether inclusion/ exclusion criteria were established prior to data analysis, all manipulations, and all measures in the study. No part of the study procedures and analyses was pre-registered prior to the research being conducted. Raw and pre-processed study data, analysis code, as well as digital study materials can be accessed at https://doi.org/10.34973/qqr5-9s48.

Open practices
The study in this article earned an Open Data badge for transparent practices. Data for this study can be found at: https://doi.org/10.34973/qqr5-9s48.