Feasibility of unconscious instrumental conditioning: A registered replication

The extent to which high-level, complex functions can proceed unconsciously has been a topic of considerable debate. While unconscious processing has been demonstrated for a range of low-level processes, from feature integration to simple forms of conditioning and learning, theoretical contributions suggest that increasing complexity requires conscious access. Here, we focus our attention on instrumental conditioning, which has been previously shown to proceed without stimulus awareness. Yet, instrumental conditioning also involves integrating information over a large temporal scale and distinct modalities in order to deploy selective action, constituting a process of substantial complexity. With this in mind, we revisit the question of feasibility of instrumental conditioning in the unconscious domain. Firstly, we address the theoretical and practical considerations relevant to unconscious learning in general. Secondly, we aim to replicate the first study to show instrumental conditioning in the absence of stimulus awareness (Pessiglione et al., 2008), following the original design and supplementing the original crucial analyses with a Bayesian approach (Experiment 1). We found that apparent unconscious learning took place when replicating the original methods directly and according to the tests of awareness used. However, we could not establish that the full sample was unaware in a separate awareness check. We therefore attempted to replicate the effect yet again with improved methods to address the issues related to sensitivity and immediacy (Experiment 2), including an individual threshold-setting task and a trial-by-trial awareness check permitting exclusion of individual aware trials. Here, we found evidence for absence of unconscious learning. This result provides evidence that instrumental conditioning did not occur without stimulus awareness in this paradigm, supporting the view that complex forms of learning may rely on conscious access. 
Our results provide support for the proposal that perceptual consciousness may be necessary for complex, flexible processes, especially where selective action and behavioural adaptation are required.


Keywords: Unconscious processing; Unconscious learning; Instrumental conditioning; Consciousness

Introduction
Ever since the earliest demonstration of subliminal perception (Peirce & Jastrow, 1886), the extent to which information can be processed in the brain without conscious awareness has been a widely studied question. Unconscious processing has been demonstrated for many low-level processes such as feature detection and integration (e.g. integrating colour, shape and texture of an object into one coherent percept; Blake & Fox, 1974), as well as simple forms of learning, for instance emotional (Olsson & Phelps, 2004), visuospatial (Rosenthal, Kennard, & Soto, 2010) or associative (Scott, Samaha, Chrisley, & Dienes, 2018). However, the extent to which unconscious processing is possible for higher-level, more complex functions remains a topic of debate (Axelrod & Rees, 2014; Sterzer et al., 2014). One such example is learning the contingencies between stimuli and outcomes, especially in instrumental scenarios, where the agent must learn from multiple temporally separated events: the stimulus itself, their action, and its consequence. This kind of learning has apparently been shown to be feasible in the absence of stimulus awareness (Mastropasqua & Turatto, 2015; Pessiglione et al., 2008). However, following recent evidence to the contrary (Reber, Samimizad, & Mormann, 2018) and discussions about the minimal conditions for unconscious processing (e.g. Mudrik et al., 2014), as well as developments in methods used to assess conscious awareness (Dienes, 2015b; Rothkirch & Hesselmann, 2017; Shanks, 2017), we revisit the finding that instrumental learning can occur unconsciously.
Here, we attempt to replicate the original result of Pessiglione et al. (2008), leveraging developments in the field of unconscious processing to apply a more robust statistical approach (Experiment 1) and a more rigorous methodology (Experiment 2). While there is no agreed theory demarcating what conscious versus unconscious mental states may represent (Breitmeyer, 2015; Dupoux, Gardelle, & Kouider, 2008; Kouider & Dehaene, 2007; Seth & Bayne, 2022), there have been attempts at outlining the conditions under which unconscious processing can take place. A number of theoretical contributions consider consciousness a necessary component of higher-level processing, including (but not limited to) semantic knowledge, complex visual processing, problem solving and decision-making (Baars, 2002; Treisman, 2003). In those views, consciousness enables information to be integrated across distinct brain regions through long-range feedback and feed-forward connections (Baars, 2002; Dehaene & Changeux, 2011; Dehaene, Sergent, & Changeux, 2003). In contrast, unconscious processing appears to be confined to separate areas and does not result in a global spread of activity (Baars, Ramsøy, & Laureys, 2003; Melloni et al., 2007).
In support of this view, previous research in the fields of associative learning and priming suggests that low-level or short-range spatiotemporal (Lin & He, 2009; Van Den Bussche, Van Den Noortgate, & Reynvoet, 2009) and multisensory (Faivre, Mudrik, Schwartz, & Koch, 2014; Scott et al., 2018) information integration can proceed without conscious awareness of the stimuli (typically achieved with subliminal presentation methods such as masking or continuous flash suppression). Conversely, higher-level or longer-range spatial and temporal processing (e.g. for tasks requiring longer-term information maintenance or selective, flexible decision-making) should require conscious access (Dehaene et al., 2003; Kouider & Dupoux, 2001). Previously reported instances of unconscious learning are in line with those assumptions. For example, classical conditioning can be achieved without awareness in delay scenarios (where the stimuli to be integrated overlap temporally), but not in trace scenarios (where the stimuli are temporally separated; R. E. Clark & Squire, 1998), with similar results in other associative learning tasks (Knight, Nguyen, & Bandettini, 2003; Raio, Carmel, Carrasco, & Phelps, 2012; Seitz, Kim, & Watanabe, 2009).
As such, the idea that instrumental learning can proceed with unconsciously perceived stimuli is an interesting case. In the first experiment to demonstrate it (Pessiglione et al., 2008), participants learned to adjust their behaviour, through Go/NoGo decisions, in line with subliminally presented rewarding and punishing cues, learning to approach the rewarding and avoid the punishing stimulus without ever consciously perceiving them (constituting an example of trace instrumental learning). In order to successfully learn when to act and when to refrain from acting, participants had to learn from two temporally separated events across the length of the trial (up to 4 s): the subliminal stimulus itself, and its consequences, presented supraliminally as monetary reinforcement. Such a form of learning thus involves a fairly complex process of integrating information over a large temporal scale and distinct modalities, necessary to process the visual input, deploy selective action in response to the predictive cue, and process the reinforcement. The task is therefore considerably more complex than the aforementioned classical conditioning or associative learning scenarios, where there are fewer events, often in closer temporal proximity. Assuming that a subliminally presented cue is not capable of evoking large-scale activity to be integrated with subsequent processes, the case of unconscious instrumental conditioning might appear at odds with the theory and previous experimental evidence covered above. Yet, instrumental conditioning is also one of the earliest and most fundamental forms of adaptive behaviour, both phylogenetically and ontogenetically. As such, the extent to which it requires conscious access is a question of considerable theoretical value.
Cortex 159 (2023) 101–117
A key challenge in any research into unconscious influences on behaviour lies in reliably establishing that processing is genuinely unconscious (Newell & Shanks, 2013; Rebuschat, 2013; Timmermans & Cleeremans, 2015). It is frequent practice in this line of research to infer unconscious processing when a behavioural measure (e.g. conditioning, priming, etc.) is above chance while a separate measure of awareness does not differ significantly from chance performance (e.g. a non-significant result in a discrimination task). This approach has been heavily criticised (Dienes, 2015b; Vadillo, Konstantinidis, & Shanks, 2016). A non-significant result alone cannot disambiguate between no evidence for an effect (i.e. insensitive data, e.g. due to a small sample size) and evidence for the absence of an effect (i.e. support for the null hypothesis). As such, finding that performance on an awareness check does not significantly differ from chance is not enough to establish a true absence of awareness. Yet this must be established before any inferences can be drawn about the effect of interest, such as the presence of unconscious conditioning in the original study by Pessiglione and colleagues (Dienes, 2015b; Shanks, 2017). This fallacy can be rectified in two ways: 1) by ensuring that the methods are relevant and sufficiently sensitive (Berry & Dienes, 1993, p. 38; Shanks & St. John, 1994), and 2) by using statistical methods, most prominently the Bayes factor, which enables stronger inferences about whether a null result indicates support for the null (e.g. awareness absent) over the alternative hypothesis (e.g. awareness present), or whether the data are insensitive (Dienes, 2014, 2016; Sand & Nilsson, 2016).
With these considerations in mind, we revisit the suggestion that instrumental learning can proceed without stimulus awareness. Experiment 1 will attempt to replicate the effect found by Pessiglione et al. (2008), following the original design and supplementing the original analyses with a Bayesian approach geared to determine a genuine absence of awareness, at least as measured by their test of awareness (whether this measure is a justified measure of awareness is an issue we will return to). Should the replication be successful, Experiment 2 will attempt to replicate the effect once again, this time with improved methods, to address the methodological issues related to the criteria of sensitivity and relevance in the original study.
In order to test whether stimuli that produce a certain level of learning are subliminal, one needs to know how much conscious perception would be needed to produce that level of learning (Dienes, 2015b). Thus, we first ran a pilot study in which stimuli were presented moderately above the objective threshold. Its aim was to norm the relationship between the level of awareness (as measured by the test of awareness used by Pessiglione et al., 2008) and the level of learning when learning is based on conscious perception.

2. Pilot: relationship between level of awareness and learning above the objective threshold
The pilot study aimed to assess two things. The first was perceptual discrimination accuracy in a same/different discrimination task when awareness is present (as ensured with supraliminal stimulus presentation). The second was the corresponding level of learning subsequently achieved in a Go/NoGo task with the same stimulus exposure duration. This will be assessed using a methodology identical to that of the replication study, Experiment 1. The observed relationship between awareness and learning will be used to identify a rough effect size appropriate for the Bayes factor calculation in the corresponding task conducted without awareness. The pilot was pre-registered at https://osf.io/rwnt7.
(Faul, Erdfelder, Lang, & Buchner, 2007), using a Cohen's d of .7 (a large effect size is justified given the supraliminal nature of the stimuli and the simplicity of the task), with 95% power. One participant was excluded after reporting during debrief that they had misunderstood the learning task, yielding a final sample of 25 participants.
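The excerpt does not state which test family or tails were entered into the cited power-analysis tool (Faul et al., 2007). As a rough cross-check only, the sketch below assumes a one-sample t-test on d' and uses a normal approximation with a common small-sample correction instead of that tool's exact noncentral-t computation, so its numbers may differ slightly. The function name `approx_n_one_sample` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_n_one_sample(d: float, alpha: float = 0.05, power: float = 0.95,
                        one_tailed: bool = True) -> int:
    """Approximate N for a one-sample t-test with effect size d (assumption).

    Normal approximation ((z_alpha + z_beta) / d)**2 plus a z_alpha**2 / 2
    correction for using a t rather than a z test.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_tailed else z(1 - alpha / 2)
    z_beta = z(power)
    n = ((z_alpha + z_beta) / d) ** 2 + z_alpha ** 2 / 2
    return ceil(n)


print(approx_n_one_sample(0.7))                    # one-tailed: 24
print(approx_n_one_sample(0.7, one_tailed=False))  # two-tailed: 29
```

The one-tailed/two-tailed contrast shows how much the unstated tail choice alone moves the required sample.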

Stimuli and materials
The stimuli included 9 randomly selected characters from the Agathodaimon font, presented in a white typeface on a black background, with a size of 70 × 70 pixels. For each participant, 3 were randomly assigned to the first perceptual discrimination task (PDT1; threshold setting), 3 to PDT2 (awareness check), and 3 to the main learning task (1 to be associated with a rewarding, 1 with a punishing, and 1 with a neutral outcome). Two black-and-white visual noise masks of the same size as the stimuli were generated by scrambling one character image into 8.75 × 8.75 pixel squares. The same two masks were used for all participants in the same fashion (one preceding and one following the target stimulus). The outcome images were a circled £1 coin image for reward, a crossed-out £1 image for punishment, and a greyed-out coin for neutral. The task was programmed using Matlab 2018b (MathWorks, 2018), running Psychophysics Toolbox (Brainard, 1997), and presented on a Samsung 2233RZ LCD monitor with a 120 Hz refresh rate (following recommendations for precise visual presentation; Wang & Nikolić, 2011). Responses were collected with a standard keyboard.
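The per-participant stimulus allocation described above can be sketched as a shuffle-and-split. The single-letter labels stand in for Agathodaimon glyphs, and the function name `assign_pilot_stimuli` is hypothetical. This is a minimal illustration of the allocation rule, not the original Matlab code.

```python
import random
from typing import Dict, List, Optional, Union


def assign_pilot_stimuli(characters: List[str],
                         seed: Optional[int] = None) -> Dict[str, Union[List[str], Dict[str, str]]]:
    """Randomly split 9 characters into PDT1, PDT2 and main-task sets."""
    if len(characters) != 9:
        raise ValueError("the pilot uses exactly 9 characters")
    pool = list(characters)
    random.Random(seed).shuffle(pool)      # fresh random allocation per participant
    return {
        "PDT1": pool[0:3],                 # threshold setting
        "PDT2": pool[3:6],                 # awareness check
        "main": {                          # one cue per outcome type
            "reward": pool[6],
            "punishment": pool[7],
            "neutral": pool[8],
        },
    }


print(assign_pilot_stimuli(list("ABCDEFGHI"), seed=3))
```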
2.1.3. Procedure
2.1.3.1. PERCEPTUAL DISCRIMINATION TASK 1: THRESHOLD FINDING. This task aimed to establish a cue display duration that permitted conscious discrimination at above-chance levels without reaching ceiling. Participants were seated 50 cm from the screen (ensured with a chinrest). Each trial began with a fixation cross (500 ms). Two cues then followed, each forward-backward masked (67 ms) and separated by a 3 s interval indicated by a fixation cross, following the method of Pessiglione et al. (2008). Following the displays, participants were asked to indicate whether the cues presented were the same or different, and to judge their confidence in that decision (on a binary scale between "some confidence" and "total guess"). Both responses were made using the arrow keys. Cue display duration started at 600 ms and dropped by 50 ms with every correct and confident discrimination. Once participants reached 100 ms or indicated guess for the first time, the display duration was increased by one increment (+50 ms). It then decreased in smaller increments of 8 ms, corresponding to a single screen refresh duration on a 120 Hz monitor. Once participants responded guess 6 consecutive times (irrespective of accuracy), the corresponding display duration was taken to be their threshold of conscious perception.
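The staircase above can be sketched as follows. The text leaves several details open, so this sketch adds assumptions, marked in the comments:

- a confident but incorrect response leaves the duration unchanged;
- "reached 100 ms" means the duration was decremented to 100 ms;
- the run of 6 guesses is counted only in the fine-step phase.

The observer callback `respond` and the function name `find_threshold` are hypothetical.

```python
from typing import Callable, Tuple

# Hypothetical observer: duration (ms) -> (correct, confident) for one trial.
Observer = Callable[[float], Tuple[bool, bool]]


def find_threshold(respond: Observer, start_ms: float = 600.0, coarse_ms: float = 50.0,
                   fine_ms: float = 8.0, floor_ms: float = 100.0,
                   guesses_needed: int = 6, max_trials: int = 2000) -> float:
    """Return the duration at which `guesses_needed` consecutive guesses occur."""
    duration = start_ms
    fine_phase = False
    guesses_in_a_row = 0
    for _ in range(max_trials):
        correct, confident = respond(duration)
        if not fine_phase:
            if not confident:
                # First "guess": step back up one coarse increment, switch to fine steps.
                duration += coarse_ms
                fine_phase = True
            elif correct:
                duration -= coarse_ms
                if duration <= floor_ms:
                    # Assumption: reaching the 100 ms floor triggers the same step-up.
                    duration += coarse_ms
                    fine_phase = True
            # Assumption: a confident but incorrect response changes nothing.
            continue
        if confident:
            guesses_in_a_row = 0
            if correct:
                duration -= fine_ms        # one refresh at 120 Hz (approx. 8 ms)
        else:
            guesses_in_a_row += 1          # counted irrespective of accuracy
            if guesses_in_a_row == guesses_needed:
                return duration
    raise RuntimeError("threshold not reached within max_trials")


# Toy observer that is always correct but only confident at 200 ms or longer.
print(find_threshold(lambda d: (True, d >= 200)))  # 192.0
```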
The display duration was then set to 16 ms above the identified threshold. Participants were required to continue making the same 'same'/'different' and confidence judgments for a minimum of one block of 10 further trials. If objective discrimination accuracy for those 10 trials was between 70% and 90%, the task terminated and the duration was recorded as the display duration to be used in the main task. This range is above chance, indicating that participants could reliably discriminate the cues, but not at ceiling. Note that confidence was disregarded in this measure, and only objective accuracy was taken into account. If discrimination accuracy for these 10 trials was greater than 90%, the display duration was reduced by 8 ms and the process was repeated for another 10 trials until discrimination accuracy fell into the desired range (70–90%). Similarly, if discrimination accuracy for the 10 trials was below 70%, the display duration was increased by 8 ms and a further block of 10 trials was completed until the desired discrimination accuracy was achieved.
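The block-wise adjustment above is a simple bounded loop. In this sketch, `block_accuracy` is a hypothetical callback returning the proportion correct for one 10-trial block. The `max_blocks` guard is an added assumption, since the text does not say what happens if accuracy oscillates around the range.

```python
from typing import Callable


def refine_duration(block_accuracy: Callable[[float], float], threshold_ms: float,
                    offset_ms: float = 16.0, step_ms: float = 8.0,
                    lower: float = 0.70, upper: float = 0.90,
                    max_blocks: int = 50) -> float:
    """Adjust the duration in 10-trial blocks until accuracy lies in [lower, upper]."""
    duration = threshold_ms + offset_ms          # start 16 ms above the threshold
    for _ in range(max_blocks):
        accuracy = block_accuracy(duration)      # confidence is ignored here
        if lower <= accuracy <= upper:
            return duration                      # main-task display duration
        # Too accurate -> shorten the cue; too inaccurate -> lengthen it.
        duration += -step_ms if accuracy > upper else step_ms
    raise RuntimeError("accuracy did not settle within max_blocks")


# Toy accuracy profile: ceiling above 200 ms, 80% above 180 ms, chance otherwise.
toy = lambda d: 1.0 if d > 200 else (0.8 if d > 180 else 0.5)
print(refine_duration(toy, 192.0))  # 200.0
```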
2.1.3.2. MAIN CONDITIONING TASK. In keeping with the original protocol (Pessiglione et al., 2008), participants were asked to choose between making a response by pressing a spacebar (Go), or refraining from a response (NoGo), to masked cues. In each block, one cue was paired with reward, one with punishment, and one with the neutral outcome. Hence, participants could choose either to take a "risky" action (where they might win £1, lose £1, or have a neutral outcome depending on the preceding cue) or to refrain from acting and thus ensure a neutral outcome.
Each trial began with a fixation cross (500 ms), followed by a forward mask (67 ms), one of the target cues (presented for the supra-threshold display duration determined in PDT1), and a backward mask (67 ms). Subsequently, a question mark appeared on the screen, indicating that the response could be made. Regardless of the response (Go or NoGo), the response window remained open for 3000 ms. The choice made (Go! or No!) was then displayed (500 ms), followed by the outcome (reward, punishment, or neutral; 2000 ms). There was one block of 90 trials, with 30 rewarding, 30 punishing and 30 neutral trials in a randomised order.
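The trial timeline and block structure above can be written down as data. The sketch below lists event durations in milliseconds and builds one shuffled block with equal numbers of each cue type. The constant and function names are hypothetical, and the cue duration is passed in because it is set per participant in PDT1.

```python
import random
from typing import List, Optional, Tuple

# Event durations (ms) from the trial description above.
FIXATION_MS, MASK_MS, RESPONSE_WINDOW_MS, CHOICE_MS, OUTCOME_MS = 500, 67, 3000, 500, 2000


def trial_events(cue_ms: int) -> List[Tuple[str, int]]:
    """Ordered (event, duration in ms) pairs for one conditioning trial."""
    return [
        ("fixation", FIXATION_MS),
        ("forward mask", MASK_MS),
        ("cue", cue_ms),
        ("backward mask", MASK_MS),
        ("response window", RESPONSE_WINDOW_MS),  # open for 3000 ms whatever the response
        ("choice display", CHOICE_MS),
        ("outcome", OUTCOME_MS),
    ]


def make_block(n_per_type: int = 30, seed: Optional[int] = None) -> List[str]:
    """Randomised trial order with equal numbers of each cue type."""
    trials = ["reward", "punishment", "neutral"] * n_per_type
    random.Random(seed).shuffle(trials)
    return trials


print(sum(ms for _, ms in trial_events(100)))  # 6234 ms for a 100 ms cue
```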
2.1.3.3. PERCEPTUAL DISCRIMINATION TASK 2: AWARENESS CHECK. The second and final discrimination task was used to assess the objective level of cue awareness, as indexed by same/different discrimination accuracy. No further adjustments were made to the display duration, which remained at the level determined in PDT1. There was one block of 100 trials, with 50 same and 50 different trials in a randomised order.

Analysis and Results
Bayes factors (B) were used to assess the strength of evidence for the alternative hypothesis, H1, over the null, H0 (Wagenmakers et al., 2017). All Bayes factors reported here represent the evidence for H1 relative to H0; to find the evidence for H0 relative to H1, take 1/B. Here, B_H(0, x) refers to a Bayes factor in which the predictions of H1 were modelled as a half-normal distribution with an SD of x (see Dienes & McLatchie, 2017). The half-normal can be used when a theory makes a directional prediction, with x scaling the size of effect that could be expected. With the assumptions we used for modelling H1, it happened that wherever an effect yielded a p value below .02, the Bayes factor was above 6. There is, however, no guarantee of such a correspondence between B and p values (Lindley, 1957). To indicate the robustness of Bayesian conclusions, a robustness region will be reported for each B. It gives the range of scales that qualitatively support the same conclusion (i.e. evidence as insensitive, as supporting H0, or as supporting H1), notated as RR [x1, x2], where x1 is the smallest SD that gives the same conclusion and x2 is the largest (see Dienes, 2019).
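The B_H(0, x) computation and robustness region described above can be sketched in a few lines. This sketch assumes a normal likelihood for the observed mean with standard error SE, and integrates the half-normal prior numerically with Simpson's rule. Published calculators may use other likelihoods (e.g. t-distributed), and rounded summary statistics are used in the example, so outputs need not equal the B values reported in this paper. All function names are hypothetical, and the SD grid for the robustness region is a choice of the user.

```python
from math import exp, pi, sqrt


def normal_pdf(x: float, mean: float, sd: float) -> float:
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def bf_half_normal(mean: float, se: float, h1_sd: float, steps: int = 4000) -> float:
    """B_H(0, h1_sd): P(data | half-normal H1) / P(data | effect = 0)."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    upper = abs(mean) + 10 * (se + h1_sd)      # integrand is negligible beyond this
    h = upper / steps
    area = 0.0
    for i in range(steps + 1):
        delta = i * h                          # candidate population effect (>= 0)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        prior = 2 * normal_pdf(delta, 0.0, h1_sd)          # half-normal density
        area += weight * prior * normal_pdf(mean, delta, se)
    return (area * h / 3) / normal_pdf(mean, 0.0, se)


def conclusion(b: float) -> str:
    """Classify B with the 6 and 1/6 thresholds used in this paper."""
    if b > 6:
        return "H1"
    if b < 1 / 6:
        return "H0"
    return "insensitive"


def robustness_region(mean, se, h1_sd, grid):
    """Conclusion at h1_sd, plus the smallest and largest grid SDs agreeing with it."""
    sds = sorted(set(grid) | {h1_sd})
    labels = [conclusion(bf_half_normal(mean, se, sd)) for sd in sds]
    i = sds.index(h1_sd)
    lo = hi = i
    while lo > 0 and labels[lo - 1] == labels[i]:
        lo -= 1
    while hi < len(sds) - 1 and labels[hi + 1] == labels[i]:
        hi += 1
    return labels[i], sds[lo], sds[hi]


b = bf_half_normal(0.37, 0.11, 0.7)
print(round(b, 1), conclusion(b))
```

Keeping the likelihood and prior as separate terms inside the integral makes it easy to swap in a t likelihood or a different prior shape. A coarser SD grid trades resolution of the robustness-region bounds for speed.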

Data pre-processing
In order to account for potential response bias, type I d' (a Signal Detection Theoretic measure of sensitivity to signal versus noise; Stanislaw & Todorov, 1999) was computed for both PDT2 (awareness check) and the main conditioning task. Type I d' can be used to index awareness level corresponding to the objective threshold, where chance performance corresponds to lack of awareness, regardless of confidence or subjective awareness reports. Note that this measure is used here following the procedure of Pessiglione et al. (2008). For the PDTs, correct same/different responses were treated as hits, and incorrect responses as false alarms. In the conditioning task, Go responses to rewarding cues were treated as hits, and Go responses to punishing cues as false alarms. Go responses to neutral cues were discounted, as participants are expected to respond arbitrarily to them due to their null outcome.
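For the conditioning task, the hit/false-alarm mapping above can be sketched as below. The log-linear correction (add 0.5 to each count and 1 to each total), which keeps rates of 0 or 1 finite, is an assumption. The text does not say how extreme rates were handled. Function names are hypothetical.

```python
from statistics import NormalDist
from typing import Iterable, Tuple

_z = NormalDist().inv_cdf


def dprime_from_rates(hit_rate: float, fa_rate: float) -> float:
    """Type I d' = z(H) - z(FA)."""
    return _z(hit_rate) - _z(fa_rate)


def dprime_from_counts(hits: int, n_signal: int, false_alarms: int, n_noise: int) -> float:
    """d' with a log-linear correction (assumption) so extreme rates stay finite."""
    h = (hits + 0.5) / (n_signal + 1)
    fa = (false_alarms + 0.5) / (n_noise + 1)
    return dprime_from_rates(h, fa)


def conditioning_dprime(responses: Iterable[Tuple[str, bool]]) -> float:
    """responses: (cue, went) pairs. Go to reward = hit, Go to punishment = false alarm."""
    responses = list(responses)
    reward = [went for cue, went in responses if cue == "reward"]
    punish = [went for cue, went in responses if cue == "punishment"]
    # Neutral trials are ignored, as their outcome carries no information.
    return dprime_from_counts(sum(reward), len(reward), sum(punish), len(punish))


trials = ([("reward", True)] * 21 + [("reward", False)] * 9
          + [("punishment", True)] * 9 + [("punishment", False)] * 21
          + [("neutral", True)] * 30)
print(round(conditioning_dprime(trials), 2))
```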

Awareness check
At the group level, d' scores for the PDT2 were entered into a one-tailed one-sample t-test against 0. A d' of 0 indicates no ability to discriminate the stimuli (no sensitivity to signal versus noise, akin to chance performance). A Bayes factor (B) was computed for the difference, with the predictions of H1 (awareness is present) modelled as a half-normal distribution centred on 0, with an SD equal to a d' of 1. This is the average expected effect size corresponding to a 70% hit rate (accuracy) and 30% false alarms, an estimate of above-chance and below-ceiling performance.
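As a worked check of the SD chosen above, type I d' is the difference between the z-transformed hit and false-alarm rates. The 70%/30% pair therefore gives a value close to 1:

```latex
d' = z(H) - z(FA) = z(0.70) - z(0.30) \approx 0.524 - (-0.524) \approx 1.05
```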

Main conditioning task
The d' scores for the conditioning task were entered into a one-tailed one-sample t-test against 0, where a d' of 0 indicates no discrimination between the cues and, consequently, no learning. B was computed for the difference, with the predictions of H1 (learning is present) modelled as a half-normal distribution centred on 0, with an SD equal to .7 (the expected effect size if learning is present, derived from Pessiglione et al., 2008).
The results indicate that participants were able to successfully learn, with the average d' significantly greater than

Pilot: discussion
The purpose of the pilot was to establish a rough relation between the level of awareness, as measured by the awareness measure of Pessiglione et al. (2008), and the level of learning it can support. In the pilot, the mean awareness was d' = .9, and the mean learning was d' = 1.793. These are the crucial facts we need. If both measures result from the influence of the same knowledge base, namely conscious perception, then as conscious perception goes to zero, both should also go to zero (Dienes, 2015b). Thus, on a plot of awareness against learning (Fig. 1), a line from the point given by the two means to (0, 0) gives a rough estimate of the relation that should be obtained between awareness and learning, assuming it is linear. While there are uncertainties in both the estimates and their linearity, we only need a rough estimate, as we will model uncertainty around this estimate, and robustness regions will be provided. Now we are in a position to proceed with the replication. The theory that the Pessiglione et al. (2008) method produces unconscious learning involves two predictions: 1) participants will perform at chance on the awareness measure; and 2) participants will show conditioning. These are the crucial tests we will consider below. Prediction 1) might be regarded as an outcome-neutral test, required for the paradigm to be relevant for showing unconscious learning. From the point of view of a replication, however, it constitutes a crucial test of whether the procedure does result in stimuli being subliminal.
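The line through (0, 0) and the pilot means amounts to a single ratio. The sketch below encodes it in both directions. How the paper turned this line into the specific H1 scales used later follows Dienes (2015b) and is not reproduced here. The function names are hypothetical.

```python
PILOT_AWARENESS_D = 0.9    # mean PDT2 d' in the pilot
PILOT_LEARNING_D = 1.793   # mean conditioning d' in the pilot


def expected_awareness(learning_d: float,
                       pilot_awareness: float = PILOT_AWARENESS_D,
                       pilot_learning: float = PILOT_LEARNING_D) -> float:
    """Awareness d' implied by a learning d' if conscious perception drove both."""
    return learning_d * pilot_awareness / pilot_learning


def expected_learning(awareness_d: float,
                      pilot_awareness: float = PILOT_AWARENESS_D,
                      pilot_learning: float = PILOT_LEARNING_D) -> float:
    """Learning d' implied by an awareness d' on the same line through (0, 0)."""
    return awareness_d * pilot_learning / pilot_awareness


print(round(PILOT_AWARENESS_D / PILOT_LEARNING_D, 3))  # 0.502 awareness per unit of learning
```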

Experiment 1: direct replication
Experiment 1 aims to directly replicate the unconscious instrumental conditioning found by Pessiglione et al. (2008). For this reason, all methods are in keeping with those employed in the original study. The original frequentist analyses are supplemented with Bayes factors, in order to disambiguate potentially non-significant results as either indicating support for the null hypothesis or indicating insensitive data. The Stage 1 registration can be found at https://osf.io/gf8jp/. The in-principle accepted Stage 1 manuscript can be found at https://osf.io/cmdfs/. The task code, materials, timestamped raw data files, data processing script, and summarised data files are available at https://osf.io/ke6yj/. In keeping with the original study, participants were told they would be reimbursed with their earnings from the task, but at the end this was rounded to a fixed amount of £6. Ethical approval was granted by the School of Psychology ethics committee at the University of Sussex, and the study was conducted in accordance with the Declaration of Helsinki. Data for three participants were partially missing due to a technical glitch, yielding a usable sample of 59 participants.

Stimuli and materials
All stimuli and materials used were equivalent to those reported in the original study. The stimuli were 15 randomly chosen characters from the Agathodaimon font, presented in a white typeface on a black screen, with a size of 240 by 180 pixels. For each participant, 3 were randomly assigned to PDT1, 3 to PDT2, and 9 to the main task, with 1 rewarding, 1 punishing, and 1 neutral cue in each of the 3 blocks. Two black-and-white visual noise masks of the same size as the stimuli were generated by scrambling one character image into 30 by 30 pixel squares. The same two masks were used for all participants in the same fashion (one preceding and one following the target stimulus). The outcome images were a circled £1 coin image for reward, a crossed-out £1 image for punishment, and a greyed-out coin for neutral. The task was programmed using Matlab 2018b (MathWorks, 2018), running Psychophysics Toolbox (Brainard, 1997), and presented on a Dell LCD monitor with a 60 Hz refresh rate (manufactured in 2006 to approximate the screen technology used in the original experiment by Pessiglione et al.). Responses were collected with a standard keyboard.
Fig. 2. Trial sequence in Experiment 1 (top) and 2 (bottom). In both experiments, participants were presented with a stimulus (predictive of a rewarding, punishing, or neutral outcome if Go is executed) between two visual masks. Following each presentation, they were asked to make a Go or NoGo response. If a Go response was chosen, the outcome was presented, depending on the type of cue. If a NoGo response was chosen, the outcome was a greyed-out coin. In Experiment 2, each trial ended with an immediate awareness check, composed of a judgment of cue symmetry and confidence.
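The mask construction described above (cutting one character image into 30 × 30 pixel squares and reassembling them in random order) can be sketched as a tile shuffle. This assumes a 240-wide by 180-high greyscale image stored as a list of rows. It handles only integer tile sizes, so the pilot's 8.75-pixel squares would need resampling first. The function name `scramble_tiles` is hypothetical.

```python
import random
from typing import List, Optional

Image = List[List[int]]   # rows of greyscale pixel values (0-255)


def scramble_tiles(image: Image, tile: int, seed: Optional[int] = None) -> Image:
    """Cut `image` into tile x tile squares and reassemble them in random order."""
    height, width = len(image), len(image[0])
    if height % tile or width % tile:
        raise ValueError("image dimensions must be multiples of the tile size")
    cols = width // tile
    tiles = [
        [row[c * tile:(c + 1) * tile] for row in image[r * tile:(r + 1) * tile]]
        for r in range(height // tile)
        for c in range(cols)
    ]
    random.Random(seed).shuffle(tiles)          # new spatial order, same pixels
    scrambled: Image = [[0] * width for _ in range(height)]
    for k, block in enumerate(tiles):
        r, c = divmod(k, cols)
        for dy, pixels in enumerate(block):
            scrambled[r * tile + dy][c * tile:(c + 1) * tile] = pixels
    return scrambled


img = [[(y * 240 + x) % 256 for x in range(240)] for y in range(180)]
mask = scramble_tiles(img, 30, seed=1)          # 8 x 6 tiles of 30 x 30 pixels
```

Because only tile positions change, the mask keeps the character's overall luminance and local texture while destroying its shape.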
3.1.3. Procedure
3.1.3.1. PERCEPTUAL DISCRIMINATION TASK 1. Participants were seated with their chin on a chin rest placed 50 cm from the screen. Each session commenced with PDT1, used to determine the individual cue display duration. The duration was either 33 ms or 50 ms: the longer of the two at which the participant showed chance-level (≈50%) discrimination performance. In the task, participants were shown two cues, each forward-backward masked (67 ms), separated by a 3 s interval indicated with a fixation cross, following the method of Pessiglione et al. (2008). Following the display, they were asked to report whether the cues presented were the same or different, using the arrow keys. The task consisted of up to 2 blocks of 120 trials (with 60 same and 60 different trials in each, in a randomised order). The first block was conducted with a 50 ms display duration for each cue. If discrimination accuracy at this stage was at chance (assessed with a chi-squared test for each participant), the task was ended and a 50 ms display duration was adopted in the main task. If performance in the first block was above chance, the duration was decreased to 33 ms for the second block, and performance was assessed again. If it was at chance at the end of the second block, the duration of 33 ms was adopted in the main task. Two participants who remained above chance at 33 ms were not able to take part, yielding a final sample of 56.
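The per-participant chance test and the 50 ms / 33 ms decision rule above can be sketched with the standard library. With 1 degree of freedom, the chi-squared upper tail equals erfc(sqrt(chi2 / 2)). Treating only significant above-chance accuracy as disqualifying (p < .05) is an assumption, as is every function name here. The same style of test is the one used for the individual exclusions after PDT2.

```python
from math import erfc, sqrt
from typing import Optional, Tuple


def chance_test(n_correct: int, n_trials: int) -> Tuple[float, float]:
    """Chi-squared goodness-of-fit test of accuracy against 50% (1 df)."""
    expected = n_trials / 2
    n_wrong = n_trials - n_correct
    chi2 = ((n_correct - expected) ** 2 + (n_wrong - expected) ** 2) / expected
    p = erfc(sqrt(chi2 / 2))       # upper tail of chi-squared with 1 df
    return chi2, p


def above_chance(n_correct: int, n_trials: int, alpha: float = 0.05) -> bool:
    """Significant deviation from 50% in the above-chance direction (assumption)."""
    _, p = chance_test(n_correct, n_trials)
    return p < alpha and n_correct > n_trials / 2


def choose_duration(correct_50ms: int, correct_33ms: Optional[int] = None,
                    n_trials: int = 120) -> Optional[int]:
    """Return 50 or 33 (ms) for the main task, or None if above chance at 33 ms."""
    if not above_chance(correct_50ms, n_trials):
        return 50
    if correct_33ms is None:
        raise ValueError("a 33 ms block is required when 50 ms is above chance")
    if not above_chance(correct_33ms, n_trials):
        return 33
    return None                    # participant cannot take part


print(choose_duration(62), choose_duration(80, 58), choose_duration(80, 80))  # 50 33 None
```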
3.1.3.2. MAIN CONDITIONING TASK. The task was identical to the original protocol and the pilot study, with the exception that the cues were presented subliminally, for the duration determined in PDT1 (33 or 50 ms). Participants were asked to choose between making a response by pressing a spacebar (Go), or refraining from a response (NoGo), to masked cues (see Fig. 2). In each block, one cue was paired with reward, one with punishment, and one with the neutral outcome. Hence, participants could choose either to take a "risky" action (where they might win £1, lose £1, or have a neutral outcome depending on the preceding cue) or to refrain from acting and thus ensure a neutral outcome.
Each trial began with a fixation cross (500 ms), followed by a forward mask (67 ms), one of the target cues (presented for the subliminal display duration determined in PDT1), and a backward mask (67 ms). Subsequently, a question mark appeared on the screen, indicating that the response could be made. Regardless of the response (Go or NoGo), the response window remained open for 3000 ms, and the participant's response was collected at its end: Go if the spacebar was being pressed, and NoGo if it was released. Finally, the choice made (Go! or No!) was displayed (500 ms), followed by the outcome (reward, punishment, or neutral; 2000 ms).
In order to counterbalance motor conditions, the 'risky' response was pseudo-randomised to be Go for half the participants, and NoGo for the other half. There were 3 blocks of 120 trials, with 40 trials of each type (rewarding, punishing, neutral). Within each block, the order of the trial types was randomised without constraints.
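The counterbalancing and outcome rule above can be sketched as follows. The response is read from the spacebar state at the end of the 3000 ms window. The risky response yields the cue's outcome, and the other response is always neutral. Implementing "pseudo-randomised" as a shuffled, balanced list is an assumption, and all names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

OUTCOME_GBP = {"reward": 1, "punishment": -1, "neutral": 0}


def assign_risky_responses(n_participants: int, seed: Optional[int] = None) -> List[str]:
    """Half the participants get Go as the 'risky' response, half NoGo (shuffled)."""
    labels = ["Go", "NoGo"] * (n_participants // 2) + ["Go"] * (n_participants % 2)
    random.Random(seed).shuffle(labels)
    return labels


def trial_outcome(cue: str, spacebar_held_at_end: bool, risky: str) -> Tuple[str, int]:
    """Response read at the end of the window, and the resulting outcome in pounds."""
    response = "Go" if spacebar_held_at_end else "NoGo"
    if response == risky:
        return response, OUTCOME_GBP[cue]   # risky choice: outcome depends on the cue
    return response, 0                      # safe choice: always neutral


print(trial_outcome("punishment", False, risky="NoGo"))  # ('NoGo', -1)
```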
3.1.3.3. PREFERENCE TASK. Following each of the three conditioning blocks, the three cues used were shown on the screen side by side, unmasked, in a randomised order. Participants were asked to rate them in order of preference, from most (3) to least (1) liked.
3.1.3.4. PERCEPTUAL DISCRIMINATION TASK 2. A PDT2 with 120 trials (60 same, 60 different) and 3 new stimuli was repeated at the end of the testing session. There were no adjustments to the display duration, which was kept at the level determined in PDT1. The task allows us to determine whether or not participants' cue awareness remained at chance level.

Data pre-processing
Identical to the pilot study.

First crucial test: awareness check
Absence of awareness was determined by assessing discrimination performance on the second perceptual discrimination task, indexed by type I d' scores (corresponding to the objective threshold of awareness). At the group level, d' scores were entered into a one-tailed one-sample t-test against 0, where a d' of 0 indicates no ability to discriminate between the stimuli (no sensitivity to signal versus noise). B was computed on the obtained mean d'. H1 (awareness present) was modelled as a half-normal distribution with a mean of 0 and an SD equal to the value derived from the pilot study, following the regression method outlined by Dienes (2015b, pp. 211–213); see Fig. 3. This result indicates that the full sample was aware in the PDT2, an outcome contrary to that found at this stage in Pessiglione et al. (2008).
Following the original method, each participant's performance was compared to chance (50% accuracy) with a chi-square test. Participants who performed significantly above chance (N = 16) were excluded from further analysis, as were those who explicitly reported seeing the stimuli on-screen (N = 0).
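One way to read the individual test is as a chi-square goodness-of-fit test of correct versus incorrect PDT2 responses against the 50/50 split expected by chance. The sketch assumes that reading and 120 trials per participant. With one degree of freedom, the chi-square p-value can be obtained from the complementary error function, so no statistics package is needed. Only performance significantly above chance leads to exclusion.

```python
import math
from typing import Tuple


def above_chance(n_correct: int, n_trials: int = 120, alpha: float = 0.05) -> Tuple[float, float, bool]:
    """Chi-square goodness-of-fit test of correct/incorrect counts against a 50/50 split.

    Returns (chi2, p, exclude); exclude is True only for significant
    performance above chance.
    """
    expected = n_trials / 2
    n_incorrect = n_trials - n_correct
    chi2 = ((n_correct - expected) ** 2 + (n_incorrect - expected) ** 2) / expected
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p, p < alpha and n_correct > expected
```

With 120 trials, 72 correct responses give χ²(1) = 4.8, p ≈ .028, so the participant would be excluded. With 70 correct, p ≈ .068 and the participant would be retained.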

Second crucial test: main conditioning task
Presence of learning in the conditioning task was assessed with d' scores, which were entered into a one-sample t-test against 0 (a d' of 0 indicates no discrimination between the cues and, consequently, no learning). B was computed with H1 modelled as a half-normal distribution with a mean of 0 and an SD equal to .7 (the expected effect size if learning is present, derived from Pessiglione et al., 2008). A resulting B_H(0, 0.7) > 6 can be taken as evidence of learning, and B_H(0, 0.7) < 1/6 as evidence for the absence of learning. In the event of an insensitive result, data collection will cease at 170 participants (upper cap estimated in the same way as in section 3.2.2., using a learning d' of .7 as the expected effect size and the learning SE of .3 obtained in the pilot). A robustness region will be reported, as described in the pilot.
[Figure caption (partial): Participants were able to learn to refrain from making a Go response over the course of the block. Asterisks indicate significance at * p < .05, ** p < .005, *** p < .001. Tilde indicates a sensitive B supporting H1.]
Cortex 159 (2023) 101-117
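The Bayes factor B_H(0, SD) compares a point null at 0 with a half-normal H1 (mode 0, scale SD, positive effects only). It uses a normal likelihood centred on the observed mean, with the observed SE, following the logic of Dienes' approach. The sketch below evaluates the integral in closed form rather than numerically as the online calculator does, so its values may differ from those reported. The function names are illustrative.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def normal_cdf(z: float) -> float:
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def bf_half_normal(obs_mean: float, se: float, sd_h1: float) -> float:
    """B_H(0, sd_h1): half-normal H1 (mode 0, scale sd_h1, effects > 0) vs point null at 0.

    The likelihood is normal: N(obs_mean | effect, se).
    """
    like_h0 = normal_pdf(obs_mean, 0.0, se)
    # Closed form of the integral of 2 * N(effect | 0, sd_h1) * N(obs_mean | effect, se) over effect > 0.
    total_var = sd_h1 ** 2 + se ** 2
    post_mean = obs_mean * sd_h1 ** 2 / total_var
    post_sd = math.sqrt(sd_h1 ** 2 * se ** 2 / total_var)
    like_h1 = 2 * normal_pdf(obs_mean, 0.0, math.sqrt(total_var)) * normal_cdf(post_mean / post_sd)
    return like_h1 / like_h0


def interpret(b: float) -> str:
    """Apply the 6 and 1/6 sensitivity thresholds used in the text."""
    if b > 6:
        return "evidence for H1"
    if b < 1 / 6:
        return "evidence for H0"
    return "insensitive"


print(interpret(bf_half_normal(0.37, 0.11, 0.7)))  # Experiment 1 learning d'
```

For example, `bf_half_normal(0.37, 0.11, 0.7)` is well above 6, in line with the Experiment 1 result. The analytic value does not necessarily match the reported 93.65 exactly.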
As pre-registered, following the original analysis steps, after excluding individual participants who showed significant above-chance perception, learning d' was above 0 (M = .37, SE = .11; t(39) = 3.24, p < .001; B_H(0, 0.7) = 93.65, RR B > 6 [.06, 12 d' units]; see Fig. 4). This demonstrates that the participants deemed unaware on the PDT2, according to the criteria of the original paper, were able to learn the stimulus-action-outcome associations for the masked cues in the conditioning task.
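A robustness region gives the range of H1 scales over which the conclusion holds (here, B > 6). Below is a simple grid search over the SD of the half-normal. The grid limits and step are arbitrary choices. Taking the minimum and maximum of the passing values assumes the region is contiguous. The helper functions repeat the closed-form B_H calculation so that the block runs on its own.

```python
import math
from typing import Optional, Tuple


def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def bf_half_normal(obs_mean: float, se: float, sd_h1: float) -> float:
    """B for a half-normal H1 (mode 0, scale sd_h1) against a point null, normal likelihood."""
    total_var = sd_h1 ** 2 + se ** 2
    post_mean = obs_mean * sd_h1 ** 2 / total_var
    post_sd = math.sqrt(sd_h1 ** 2 * se ** 2 / total_var)
    p_positive = 0.5 * (1 + math.erf(post_mean / post_sd / math.sqrt(2)))
    like_h1 = 2 * normal_pdf(obs_mean, 0.0, math.sqrt(total_var)) * p_positive
    return like_h1 / normal_pdf(obs_mean, 0.0, se)


def robustness_region(obs_mean: float, se: float, threshold: float = 6.0,
                      lo: float = 0.01, hi: float = 20.0,
                      step: float = 0.01) -> Optional[Tuple[float, float]]:
    """Smallest and largest H1 scale on the grid for which B exceeds `threshold`."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    passing = [sd for sd in grid if bf_half_normal(obs_mean, se, sd) > threshold]
    return (min(passing), max(passing)) if passing else None


print(robustness_region(0.37, 0.11))
```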

Exploratory analysis: awareness check
We also assessed the awareness level in the sample remaining after the individual awareness exclusions (N = 40); this was not part of the registered analysis. In this sample, awareness d' was not significantly different from 0 (M = .05, SE = .03; t(39) = 1.55, p = .064; B_H(0, 0.333) = .57, RR B < 1/6 [0, 1.2]; see Fig. 5). Thus, after the individual awareness-related exclusions, the awareness data were insensitive. This result is comparable to the findings of Pessiglione et al. (2008). See Fig. 6 for the levels of learning plotted against awareness before and after the awareness-related exclusions.

Exploratory analysis: preference ratings
The preference ratings (from 1 [least liked] to 3 [most liked]) were entered into a repeated-measures ANOVA with cue type (rewarding, punishing, neutral) as a factor. There was no main effect of cue type on preference ratings (M_REW = 2.33, M_PUN = 1.93, M_NEU = 2.05 ranking units; F(2,342) = 1.408, p = .246). Note that preference ratings were lost for the first two participants due to a programming error.
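A one-way repeated-measures ANOVA splits the total sum of squares into cue-type, participant, and residual components. The sketch below returns F and its degrees of freedom only. It applies no sphericity correction and leaves out the p-value, which needs the F-distribution CDF. The data layout is an assumption: one row per participant, with columns ordered rewarding, punishing, neutral.

```python
from typing import Sequence, Tuple


def rm_anova_oneway(data: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA.

    data: one row per participant, one column per condition
    (assumed order: rewarding, punishing, neutral). Returns (F, df_effect, df_error).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # condition x participant residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_error / df_error), df_cond, df_error


f, df1, df2 = rm_anova_oneway([[3, 1, 2], [3, 2, 1], [2, 1, 3]])
```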

3.3. Conclusions of Experiment 1
Experiment 1 aimed to directly replicate the unconscious instrumental conditioning found in Pessiglione et al. (2008), following the exact methods and analysis steps, supplemented with Bayes factors. Using type I d' as a proxy for learning (i.e. being able to discriminate between the stimuli), we found sensitive evidence in favour of learning in the sample deemed unaware according to the original criteria (i.e. the participants who remained after removing those showing individual above-chance performance on the PDT2). This result replicates the effect found in the original paper. However, we were not able to replicate the first crucial test, namely that awareness at the group level, assessed with the PDT2, was absent. In the original paper, absence of awareness on the PDT2 was asserted through a non-significant difference of the obtained d' (M = .08, SD = .2, SE = .04) from 0, with no participants excluded due to individual above-chance performance. In the present study, the full sample was found to be significantly above chance, with a Bayes factor providing strong support for the presence of awareness as indicated by this test. After individual exclusions of above-chance participants in an exploratory analysis (section 3.2.4.), awareness in the remaining sample was not significantly different from 0, with an insensitive Bayes factor (note that the aforementioned sensitive evidence for learning was obtained on this sample). We postulate that the assertion of absence of awareness from a non-significant t-test against 0 in the original paper resulted from a similar situation. Indeed, computing a Bayes factor on the awareness d' statistics reported in that paper also yields an insensitive result (B_H(0, …)).
We would like to note the caveat that our registered stopping rule (sensitive evidence for H0, or 200 participants if insensitive in the full sample) did not explicitly allow for the effect to go in the other direction, i.e. sensitive evidence for H1. While this was not an expected outcome, logically the requirement for sensitivity should have been bi-directional and, as such, the very strong support for H1 in our data warranted stopping data collection.
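The bidirectional stopping rule argued for above can be written as a simple decision function: stop when B is sensitive in either direction, or when the maximum sample size is reached. This is an illustrative sketch that uses the 6 and 1/6 thresholds from the text. The function name and the way N is tracked are hypothetical.

```python
def stopping_decision(b: float, n: int, n_max: int) -> str:
    """Bidirectional Bayesian stopping rule: stop on sensitive evidence either way."""
    if b > 6:
        return "stop: evidence for H1"
    if b < 1 / 6:
        return "stop: evidence for H0"
    if n >= n_max:
        return "stop: maximum N reached (insensitive)"
    return "continue"
```

Under a rule that looks only for evidence for H0, a very large B would never trigger a stop. The first branch closes that gap.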
To conclude, while we were able to replicate the presence of learning in the conditioning task, we were not able to assert absence of awareness at the group level in the PDT2, a necessary precondition for claiming that unconscious instrumental conditioning took place. Consequently, in Experiment 2 we propose a series of methodological improvements to reassess this claim.

Experiment 2
Experiment 2 was designed to be conducted only in the event of Experiment 1 replicating the effect found in Pessiglione et al. (2008). In light of the methodological and theoretical advances and debates in the field of unconscious learning (e.g. Dienes, 2015a; Mudrik et al., 2014; Newell & Shanks, 2014), Experiment 2 aims to replicate the result, introducing changes to the paradigm targeted at increasing its methodological rigour. The Stage 1 registration can be found at https://osf.io/gf8jp/. The in-principle accepted Stage 1 manuscript can be found at https://psyarxiv.com/p9dgn/. The task code, materials, timestamped raw data files, data processing script, and summarised data files are available at https://osf.io/t23by/.
Firstly, in the original study, the measures in the awareness check and the learning task pertain to two different aspects of decision-making. The perceptual discrimination tasks (serving as threshold-setting and as awareness check) required a same/different perceptual judgment of 2 cues separated by a 3 s interval. In contrast, the main conditioning task required an approach/avoid response after a single stimulus. Hence, the measure used in the perceptual discrimination task reflected a different decision process from the one required in the conditioning task, violating the relevance and sensitivity criteria (Berry & Dienes, 1993; Newell & Shanks, 2014). The threshold-setting task was therefore amended to match the conditioning task more closely. To further enhance the sensitivity of the task and ensure that it reliably settles on sub-threshold conditions for the largest possible number of participants, stimulus contrast was reduced and the exposure time range was extended beyond the original limit of either 50 or 33 ms. Stimulus exposure was individually titrated for each participant in a stepwise manner, based on both accuracy and reported confidence in visual perception.[2]
[2] This procedure has been used extensively in previous research to identify sub-threshold conditions effectively (Scott et al., 2018; Skora, Yeomans, Crombag, & Scott, 2021; Skora, Livermore, Nisini, & Scott, 2022).
Secondly, the separate awareness check was replaced with a trial-by-trial measure, giving more immediate access to information about participants' awareness (Berry & Dienes, 1993; Newell & Shanks, 2014). Thirdly, the original task design leaves open the possibility that participants might occasionally experience awareness of the stimuli in the learning task that does not become apparent in the final PDT2. This might occur either when the same brief moments of awareness do not recur in the PDT2, or when they are too infrequent to significantly influence the overall objective accuracy measure. Reliably excluding individual trials is impossible when only objective discrimination measures are collected. With this in mind, the trial-by-trial awareness check also included confidence ratings, allowing us to exclude trials on which participants were subjectively aware. Because the initial staircase should sensitively settle on subthreshold conditions, a trial-by-trial check should elicit only a small number of aware trials. While post-hoc exclusion of conscious trials can lead to regression to the mean (Shanks, 2017), its effect is negligible if the majority of trials are unconscious. We examined this assumption by modelling worst-case scenarios for the presence of conscious trials at different proportions of observed unconscious trials and at different error rates (see Supplementary material). This allows us to determine the maximum percentage of conscious trials that could inadvertently contribute to what is classed as unconscious knowledge. Thus, if we observe 80% unconscious trials (leaving room for error in the remaining 0-20%), the maximum proportion of conscious trials possibly contained within our observed unconscious trials is 1.59%. This is the maximum proportion of conscious knowledge that could account for the learning effect.
Using the d' of 1.8 observed in the pilot study, we find that the maximum influence of conscious knowledge, when 80% of responses are attributed to unconscious responding, is d' = .03. We consider this negligible and will therefore adopt the following strategy: provided that the proportion of responses attributed to conscious responding does not exceed 20%, our exclusion criteria will be applied (see section 4.2.2.).
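The .03 figure can be reproduced if we assume that the d' contributed by hidden conscious trials scales linearly with their proportion. This linearity is our assumption; the text does not state it. Under it, 1.8 × 0.0159 ≈ 0.029, which rounds to .03. The sketch below encodes that assumption along with the 20% criterion. The 1.59% figure comes from the Supplementary modelling and is taken as given. The function names are illustrative.

```python
def max_conscious_dprime(aware_dprime: float, max_hidden_fraction: float) -> float:
    """Upper bound on d' attributable to conscious trials hidden among 'unconscious' ones.

    Assumes (our assumption) that d' scales linearly with the hidden fraction.
    """
    return aware_dprime * max_hidden_fraction


def within_criterion(n_conscious: int, n_trials: int, limit: float = 0.20) -> bool:
    """True if the proportion of trials attributed to conscious responding is at most `limit`."""
    return n_conscious / n_trials <= limit


bound = max_conscious_dprime(aware_dprime=1.8, max_hidden_fraction=0.0159)
print(round(bound, 2))
```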
Finally, the forward and backward masks were generated afresh on each trial by randomly scrambling a black-and-white noise image. Using different masks on each trial reduces the likelihood that participants build erroneous associations from possibly salient repetitive features of the masks (or of some stimulus-mask combinations). Mask duration was also extended from 67 to 300 ms to ensure robust masking across the wider range of possible stimulus display durations.
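A per-trial mask of this kind can be produced by cutting a noise image into small square blocks and shuffling their positions. The sketch below uses the parameters given in the stimuli description (240 by 180 pixels, 3 × 3 pixel blocks), but it is not the original task code. Two choices are our assumptions: images are stored as nested lists of 0/255 values, and the base noise image is generated randomly.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # rows of pixel values, 0 = black, 255 = white


def noise_image(width: int = 240, height: int = 180, rng: Optional[random.Random] = None) -> Image:
    """Stand-in for the base black-and-white noise image (assumed, not the original file)."""
    rng = rng if rng is not None else random.Random()
    return [[rng.choice((0, 255)) for _ in range(width)] for _ in range(height)]


def scramble_blocks(image: Image, block: int = 3, rng: Optional[random.Random] = None) -> Image:
    """Cut `image` into block x block tiles and reassemble them in random order."""
    rng = rng if rng is not None else random.Random()
    height, width = len(image), len(image[0])
    assert height % block == 0 and width % block == 0, "size must be a multiple of block"
    tiles = [
        [row[x:x + block] for row in image[y:y + block]]
        for y in range(0, height, block)
        for x in range(0, width, block)
    ]
    rng.shuffle(tiles)  # new tile order on every call
    cols = width // block
    out = [[0] * width for _ in range(height)]
    for i, tile in enumerate(tiles):
        y0, x0 = (i // cols) * block, (i % cols) * block  # row-major placement
        for dy, tile_row in enumerate(tile):
            out[y0 + dy][x0:x0 + block] = tile_row
    return out


rng = random.Random(0)
base = noise_image(rng=rng)
mask = scramble_blocks(base, block=3, rng=rng)  # a fresh mask for one trial
```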
As in Experiment 1, the original frequentist analyses were supplemented with Bayes factors.

4.1. Method
4.1.1. Participants
Forty-five participants (33 females) were recruited at the University of Sussex (M_AGE = 20.59, SD_AGE = 1.52, range = 18-25 years; 8 participants did not report their age). Sample size was determined with the Bayesian stopping rule, using previously obtained effect sizes as empirical priors, with data collection set to cease at 170 participants should the result remain insensitive (see 4.2.2. Planned Analyses for details). In keeping with the original study, participants were told they would be reimbursed with their earnings from the task, but at the end this was rounded up to a fixed amount of £6. Ethical approval was granted by the School of Psychology ethics committee at the University of Sussex, and the study was conducted in accordance with the Declaration of Helsinki.

Stimuli and materials
The stimuli were 11 characters from the Agathodaimon font, chosen pseudo-randomly to ensure six symmetrical and seven asymmetrical characters. All were presented in a light grey typeface (RGB: 141, 141, 141) on a darker grey background (RGB: 115, 115, 115), with a size of 240 by 180 pixels. For each participant, two characters (one symmetrical and one asymmetrical) were randomly assigned to PDT1. The nine remaining stimuli were pseudo-randomly assigned to the main task, with one rewarding, one punishing, and one neutral cue in each of the three blocks, such that each block contained both symmetrical and asymmetrical cues. Both the forward and backward masks were generated afresh on each trial by randomly scrambling a 240 by 180 pixel black-and-white noise image in blocks of 3 × 3 pixels.
4.1.3. Procedure
4.1.3.1. PERCEPTUAL DISCRIMINATION TASK 1. Each session commenced with a PDT, which determined the individual cue display duration. In this task (in contrast to Experiment 1), participants were shown a fixation cross (500 ms), followed by a single cue (display duration starting at 600 ms), forward- and backward-masked (300 ms). After each sequence, participants were asked to report whether the cue presented was symmetrical or asymmetrical, using the arrow keys. Next, also using the arrow keys, they were asked to report whether they had any confidence in their judgment or were guessing. They were instructed to report 'some confidence' if they had any degree of confidence, and 'total guess' only if they felt they did not perceive the cue and were responding randomly. With every correct response made with confidence (taken as aware), the display duration of the cue was reduced by 50 ms on the subsequent trial.
When a duration of 100 ms was reached, or the first guess was made, the display duration returned to the previous level (+50 ms) and was subsequently decreased in 16 ms steps on the following trials (corresponding to a single refresh of the 60 Hz screen used here). This reduction continued until participants reported guessing again, at which point the display duration remained the same until a guess was reported on six consecutive trials, regardless of response accuracy. The cue display duration on those trials was applied as the individual threshold of conscious awareness in the main conditioning task. Participants who reached the minimum possible display duration (16 ms) without guessing were not able to take part. The average established display duration was 80 ms (SD = .044 ms; mode = 84 ms).
4.1.3.2. MAIN CONDITIONING TASK. The conditioning task was identical to that of Experiment 1, except that an awareness check was added at the end of each trial. Following feedback presentation (reward/punishment), participants were asked to report whether the masked cue was symmetrical or asymmetrical, using the arrow keys. Next, they were asked to report their confidence in that judgment on a binary scale ('some confidence' or 'total guess'). There were 3 blocks of 120 trials, with 40 trials of each type (rewarding, punishing, neutral). Within each block, the order of the trial types was randomised without constraints. Before beginning, participants were explicitly instructed that the symmetry judgments were not related to the rewarding/punishing outcomes. They were also shown a different pair of example cues to illustrate what is meant by symmetry.
4.1.3.3. PREFERENCE TASK. Identical to Experiment 1.
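The PDT1 staircase described in section 4.1.3.1 can be sketched as a small state machine that counts durations in frames of the 60 Hz screen (600 ms = 36 frames, 50 ms = 3 frames, 100 ms = 6 frames, one frame ≈ 16.7 ms). Several details are not given in the text and are our assumptions:
- During the 50 ms phase, incorrect but confident responses leave the duration unchanged.
- The 1-frame reduction applies on every non-guess trial.
- The guess that ends the reduction phase does not count among the six consecutive guesses.
- Any confident response during the holding phase resets the count.

The class and the simulated observer are hypothetical.

```python
from typing import Callable, Optional

FRAME_MS = 1000 / 60  # one refresh of the 60 Hz screen


class SymmetryStaircase:
    """Sketch of the PDT1 staircase; durations are counted in screen frames."""

    def __init__(self, start_frames: int = 36, coarse_step: int = 3,
                 coarse_floor: int = 6, min_frames: int = 1, guesses_needed: int = 6):
        self.frames = start_frames        # 36 frames = 600 ms
        self.coarse_step = coarse_step    # 3 frames = 50 ms
        self.coarse_floor = coarse_floor  # 6 frames = 100 ms
        self.min_frames = min_frames      # 1 frame, about 16.7 ms
        self.guesses_needed = guesses_needed
        self.phase = "coarse"
        self.consecutive_guesses = 0
        self.threshold_frames: Optional[int] = None
        self.excluded = False

    @property
    def done(self) -> bool:
        return self.threshold_frames is not None or self.excluded

    def update(self, correct: bool, confident: bool) -> None:
        """Adjust the duration after a trial shown for `self.frames` frames."""
        if self.phase == "coarse":
            if not confident or self.frames <= self.coarse_floor:
                # First guess, or 100 ms reached: go back up 50 ms, then use 1-frame steps.
                self.frames += self.coarse_step
                self.phase = "fine"
            elif correct:
                self.frames -= self.coarse_step  # correct and confident: 50 ms shorter
            # Incorrect but confident: duration unchanged (assumption).
        elif self.phase == "fine":
            if not confident:
                self.phase = "hold"  # guess reported: keep this duration
            elif self.frames <= self.min_frames:
                self.excluded = True  # minimum reached without a guess
            else:
                self.frames -= 1  # one frame shorter, regardless of accuracy
        else:  # "hold": wait for six consecutive guesses at this duration
            self.consecutive_guesses = 0 if confident else self.consecutive_guesses + 1
            if self.consecutive_guesses >= self.guesses_needed:
                self.threshold_frames = self.frames


def run(staircase: SymmetryStaircase, sees_cue: Callable[[int], bool],
        max_trials: int = 500) -> SymmetryStaircase:
    """Simulate a hypothetical observer who is correct and confident iff sees_cue(frames)."""
    for _ in range(max_trials):
        if staircase.done:
            break
        seen = sees_cue(staircase.frames)
        staircase.update(correct=seen, confident=seen)
    return staircase


s = run(SymmetryStaircase(), lambda frames: frames >= 5)
print(s.threshold_frames, round(s.threshold_frames * FRAME_MS))
```

A simulated observer who responds correctly and confidently only when the cue lasts at least 5 frames ends with a threshold of 4 frames (about 67 ms). An observer who never guesses reaches the 1-frame minimum and is excluded.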

Data pre-processing
Identical to Experiment 1.

Crucial test: main conditioning task
Individual trials on which participants made a correct symmetry judgment with confidence were marked as 'aware' trials and excluded (M = 23.76 trials, mode = 1 trial).[3] Where exclusions exceeded 20% of trials (72 trials), the participant was excluded from analysis entirely (N excluded participants = 4, final N = 41). The remaining trials were analysed with type I d' in a manner identical to Experiment 1. B was computed with H1 modelled as a half-normal distribution with a mean of 0 and an SD equal to .7 d' units (the expected effect size if learning is present, derived from the original study). A resulting B_H(0, 0.7) > 6 can be taken as evidence of learning, and B_H(0, 0.7) < 1/6 as evidence for the absence of learning. In the event of an insensitive result, data collection was set to cease at 170 participants (upper cap estimated in the same way as in section 3.2.3., using a learning d' of .7 as the expected effect size and the learning SE of .3 d' units obtained in the pilot). A robustness region is reported. The d' scores from the conditioning task after excluding aware trials were entered into a one-sample t-test against 0 (a d' of 0 indicates no discrimination between the cues). The results indicate that participants were not able to discriminate between the subliminally presented stimuli (M = …).
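The trial- and participant-level exclusions can be expressed as a short pipeline. The sketch assumes per-trial records with hypothetical field names (`cue`, `response`, `sym_correct`, `confident`). It operationalises learning d' as risky responses to rewarding cues (hits) versus risky responses to punishing cues (false alarms), with a log-linear correction. The text does not spell out this mapping, so treat it as an assumption. The `criterion` argument also covers the exploratory analysis that classes every confident trial as aware.

```python
from statistics import NormalDist
from typing import Dict, List, Optional

Trial = Dict[str, object]  # hypothetical fields: cue, response, sym_correct, confident
MAX_AWARE_PROP = 0.20      # more than 20% aware trials (over 72 of 360) excludes a participant


def is_aware(trial: Trial, criterion: str = "correct_confident") -> bool:
    if criterion == "correct_confident":  # crucial analysis
        return bool(trial["sym_correct"] and trial["confident"])
    if criterion == "confident":          # exploratory: any confident trial
        return bool(trial["confident"])
    raise ValueError(f"unknown criterion: {criterion}")


def learning_dprime(trials: List[Trial], risky: str = "Go") -> float:
    """Risky responses to rewarding cues (hits) vs to punishing cues (false alarms).

    Neutral trials are ignored; a log-linear correction keeps rates inside (0, 1).
    """
    rew = [t for t in trials if t["cue"] == "rewarding"]
    pun = [t for t in trials if t["cue"] == "punishing"]
    hits = sum(t["response"] == risky for t in rew)
    fas = sum(t["response"] == risky for t in pun)
    z = NormalDist().inv_cdf
    return z((hits + 0.5) / (len(rew) + 1)) - z((fas + 0.5) / (len(pun) + 1))


def analyse_participant(trials: List[Trial], criterion: str = "correct_confident",
                        risky: str = "Go") -> Optional[float]:
    """Return learning d' on unaware trials, or None if the participant is excluded."""
    n_aware = sum(is_aware(t, criterion) for t in trials)
    if n_aware / len(trials) > MAX_AWARE_PROP:
        return None
    unaware = [t for t in trials if not is_aware(t, criterion)]
    return learning_dprime(unaware, risky)
```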

Exploratory analysis: main conditioning task, guess trials
We also assessed the learning effect on the unaware trials, this time classing all confident trials as aware (M = 38 trials, mode = 9 trials) regardless of their accuracy (in contrast to the crucial analysis above, where only correct and confident trials were classed as aware). Those confident trials were excluded. The same participant exclusions as in the crucial analysis applied. The results indicate that participants were not able to discriminate between the subliminally presented stimuli (M = .02, SE = …).

Exploratory analysis: main conditioning task, all trials
We also assessed the learning effect without applying the awareness exclusion criteria. Without participant-wise exclusions (i.e. all participants, all unaware trials), the learning d' was not significantly different from 0, with an insensitive B according to our criteria (M = .09, SE = …).

Exploratory analysis: main conditioning task, aware trials
We also assessed the learning effect on the trials classified as aware (correct and confident) on the trial-by-trial awareness check. However, because of the small proportion of aware trials (M = 23.76, mode = 1 trial), d' was computable or meaningful for only a small number of participants. Without applying any further exclusions (i.e. all aware trials, all participants with a computable d'), the aware learning d' was numerically higher than for unaware trials, but not significantly different from 0, with an insensitive B, likely due to the small resulting sample size (M = .23, SE = .31 d' units; t(25) …, RR B > 6 [.00018, 3.42 d' units]). B was computed for the raw regression slope using the ratio-of-means heuristic (Dienes, 2019): the expected slope for the relation between the number of aware trials and learning d' was computed from the ratio of their means (M aware trials = 23.76, M learning d' = .14). The resulting ratio (.006) was entered into the B calculation as the scaling factor.
[Figure caption: Proportions of Go responses to rewarding (red) vs punishing (blue) cues. Neutral cues are ignored since they provided no outcome. Participants were not able to learn to respond Go more often to rewarding than punishing cues (unaware trials only). Cross indicates a sensitive B supporting H0.]
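The ratio-of-means heuristic sets the H1 scale for a raw slope to the ratio of the outcome mean to the predictor mean. With the reported means, .14 / 23.76 ≈ 0.0059, which rounds to the .006 used as the scaling factor. The function name is illustrative. The resulting scale would then serve as the SD of the half-normal H1 for the observed slope and its SE.

```python
def ratio_of_means_scale(mean_predictor: float, mean_outcome: float) -> float:
    """Expected raw slope (outcome units per predictor unit), used as the H1 scale."""
    return mean_outcome / mean_predictor


scale = ratio_of_means_scale(mean_predictor=23.76, mean_outcome=0.14)  # aware trials, learning d'
print(round(scale, 3))
```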
While we classified only correct and confident trials as aware in the above analysis, for completeness we also assessed the learning effect on all confident trials (whether correct or incorrect). Without applying any further exclusions (i.e. all confident trials, all participants with a computable d'), learning d' was again numerically higher than for unaware trials (M = .28, SE = .20 d' units), but not significantly different from 0, with an insensitive B, again likely due to the small sample size (t(35) …).

4.3. Conclusions of Experiment 2
Experiment 2 aimed to replicate the result of Experiment 1, introducing changes to the paradigm targeted at increasing its methodological rigour and satisfying the relevance, sensitivity, and immediacy criteria. Those changes included amending the type of decision required in PDT1 to match that required in the conditioning task, introducing an individually titrated threshold of conscious awareness, and replacing the separate awareness check in PDT2 with a trial-by-trial check, which allowed robust exclusion of all subjectively aware trials from analysis. Again using type I d' as a proxy for learning (i.e. being able to discriminate between the stimuli), we found sensitive evidence for the absence of learning when all subjectively aware trials were excluded. This result challenges the finding of learning alongside insensitive awareness data in the original paper (Pessiglione et al., 2008) and, in the same way, clarifies our direct replication (Experiment 1), showing in both cases that the apparent evidence for learning was likely due to inadequate exclusion of aware trials.
In the exploratory analysis of the preference ratings, the rewarding stimulus was rated significantly higher than the neutral cue, suggesting that participants may have acquired some knowledge of the stimulus values over the course of learning. However, successful instrumental learning should result in the largest difference between ratings of rewarding and punishing stimuli (in addition to the difference between the valenced and neutral cues), a comparison which failed to reach sensitivity, preventing any clear conclusions. This result could also be attributed to the fact that while participants who were aware on over 20% of trials were excluded, the remaining sample still included incidental moments of stimulus awareness, which may have affected the overall preference ratings.

General discussion
The present study revisited the question of unconscious instrumental conditioning by attempting to replicate the seminal study by Pessiglione et al. (2008) both directly (Experiment 1) and with amendments targeted at increasing the methodological and analytical rigour (Experiment 2). We found that learning took place when replicating the original methods directly (although we could not establish that the full sample was unaware in the separate awareness check, PDT2). However, following the enhancement of the sensitivity of the threshold-setting perceptual discrimination task, and the introduction of a trial-by-trial awareness check permitting exclusion of individual trials, we found evidence for the absence of learning. All conclusions were based on informed Bayes factors. This difference can be attributed chiefly to the ability to detect individual trials on which participants showed stimulus awareness by discriminating stimulus symmetry correctly and with confidence, a measure introduced in Experiment 2. The absence of such a granular method of assessing awareness in Experiment 1 meant that occasional trials with subjective awareness of the stimulus would have remained undetected. This, in turn, increased the chances that learning was not fully unconscious, such that the observed effect could have arisen through a mixture of conscious and unconscious trials. As is evident in Experiment 2, even with a sensitive threshold-setting task, most participants exhibited some degree of awareness on occasion. This may reflect visual adaptation and normal fluctuations in visual sensitivity. Identifying those trials in an immediate (i.e. trial-by-trial) manner and excluding them ensured that only genuinely unaware trials were analysed, uncontaminated by individual aware trials. Critically, when only unaware trials were analysed, learning did not take place.
This result supports the theoretical perspectives converging on the view that flexible, goal-oriented behaviour, supported by long-range, recurrent information processing patterns, requires conscious perceptual access to relevant stimuli (Baars, 2002; Dehaene & Changeux, 2011; Dehaene, Charles, King, & Marti, 2014; Lamme, 2006; van Gaal, de Lange, & Cohen, 2012). While simpler forms of associative learning (including classical conditioning) may be feasible in the absence of conscious perception of relevant stimuli, instrumental conditioning is considerably more complex. It requires integrating information across separate modalities and processes: visual processing of the stimulus, extracting the expected stimulus value, deploying a selective action or refraining from action altogether, relating the reinforcement to the stimulus-dependent action, and comparing the actual outcome to the expected outcome, in order to update the expected stimulus value and store it in memory for subsequent interactions. It may therefore not be so surprising that conscious perception appears to be crucial for the flexible and long-lasting information processing strategies underpinning instrumental conditioning. An unconsciously presented stimulus might simply not have the capacity to evoke such a broad range of activity (although it remains an open, and worthwhile, question at which stage the process breaks down in the unconscious case). Our result also adds to the growing body of work demonstrating that complex forms of learning, including instrumental conditioning, cannot operate without conscious perception of the stimulus (Mertens & Engelhard, 2020; Reber et al., 2018; Travers, Frith, & Shea, 2018).
Since Stage 1 acceptance of this registered report, we have also produced corroborating evidence that, in our paradigms, instrumental conditioning in both trace and delay scenarios occurs only with conscious perception (Skora et al., 2021; Skora, Livermore, et al., 2022; Skora, Scott, & Jocham, 2022). Unconscious stimuli are also less likely than consciously perceived stimuli to drive long-term behavioural adaptations (e.g. in conflict adaptation or post-error slowing; de Lange, van Gaal, Lamme, & Dehaene, 2011; Kunde, Reuss, & Kiesel, 2012; van Gaal et al., 2012). Together, this evidence supports the notion that learning instrumental associations requires conscious perceptual access. However, learning complex structures, including those involving valence, may occur implicitly when the stimuli are consciously perceived (Jurchiș, 2022; Jurchiș, Costea, Dienes, Miclea, & Opre, 2020; Waroquier, Abadie, & Dienes, 2020).
Such a conclusion has implications for the debate about the function(s) of consciousness. It supports the position that adaptive behaviour, even simple, entirely deterministic instrumental behaviour of the kind presented here, requires perceptual consciousness in order to operate successfully. This is in line with theoretical proposals that consciousness is closely linked to action, providing a frame of reference for agents' interactions with the world in an embodied fashion (Clark, 2016; Land, 2012; Merker, 2005; Seth et al., 2016). On some theories, consciousness may be necessary to support flexible, longer-term decision-making that goes beyond simple stimulus-stimulus or stimulus-response associations, thus permitting the construction of complex, counterfactual models of the agent in its world. Complex learning, labelled unlimited associative learning (UAL), has also been considered an evolutionary marker of consciousness (Birch, Ginsburg, & Jablonka, 2020; Ginsburg & Jablonka, 2019). UAL is defined as the capacity of agents to learn about themselves and their environments in an open-ended, unlimited (within a lifetime) fashion. UAL includes learning novel responses, trace conditioning, learning from compound features, second-order conditioning, and flexible re-learning. It is proposed to rely on conscious access and to constitute a positive marker of consciousness. The general claim may need to be constrained to perceptual consciousness, given evidence for implicit learning (see e.g. Dienes & Seth, 2022). Within that scope, the results of the present experiment support such adaptive approaches to understanding the function of consciousness (Cleeremans & Tallon-Baudry, 2022).
One might worry that d' is a limited proxy for assessing learning, since it collapses performance across an entire block and therefore discounts potential within-block incremental improvements in the learning process (as would be expected during conditioning). However, inspection of the learning curves and the proportion of Go responses in response to rewarding and punishing stimuli strongly suggests that participants failed to improve their responses as the block progressed (in Experiment 2, where awareness was appropriately controlled, Fig. 6 B, C). Due to this, and considering also the absence of discrimination between the cues overall, we deemed it redundant to conduct finer-grained analyses of the learning process, such as reinforcement learning modelling as conducted in the original Pessiglione et al. (2008) paper.
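The within-block check described here amounts to computing, in successive bins of a block, the proportion of risky (e.g. Go) responses to rewarding and to punishing cues. Learning would show up as the two proportions diverging across bins. The sketch below assumes hypothetical per-trial records with `cue` and `response` fields and uses an arbitrary choice of four bins. It is not the analysis used to produce Fig. 6.

```python
import math
from typing import Dict, List


def learning_curve(trials: List[Dict[str, str]], n_bins: int = 4,
                   risky: str = "Go") -> List[Dict[str, float]]:
    """Proportion of risky responses to rewarding and punishing cues in successive bins."""
    size = len(trials) / n_bins
    curve = []
    for b in range(n_bins):
        chunk = trials[round(b * size):round((b + 1) * size)]
        row = {}
        for cue in ("rewarding", "punishing"):
            responses = [t["response"] for t in chunk if t["cue"] == cue]
            # NaN marks a bin with no trials of this cue type.
            row[cue] = (sum(r == risky for r in responses) / len(responses)
                        if responses else math.nan)
        curve.append(row)
    return curve
```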
The conditioning paradigm investigated here constitutes a variant of trace conditioning: the stimulus, response, and outcome are temporally separated from each other (although the ongoing response bridged the gap between the stimulus and the outcome). Could this temporal separation have impaired the integration of stimulus information needed for the stimulus to become predictive of a specific action-dependent outcome? Perhaps bringing the events (stimulus, response, outcome) to the point of overlap, as in delay conditioning, would maximise the likelihood of successful instrumental learning. However, we recently demonstrated that participants were not able to learn in a comparable task even in a delay conditioning scenario with primary (appetitive and aversive) outcomes (Skora et al., 2021).
It is of course impossible to entirely exclude the possibility that modifications to a task will inadvertently disrupt learning, or make any learning more difficult to detect. In our Experiment 2, the symmetry and confidence judgments needed for the trial-by-trial awareness check introduced longer spacing between the trials and may have increased the cognitive load during learning. It is conceivable that both of these factors (increased temporal spacing, increased cognitive load) could have impeded learning.
The way masking is implemented can also affect stimulus processing and, consequently, learning. Using the same masks on every trial (as in Experiment 1) can introduce systematic mask-cue interactions. For instance, a mask can bring out the features of one stimulus more than those of another, making it more distinguishable, or it can prompt participants to build associations from repetitive combinations. To avoid these potential confounds, we used a different mask on each trial in Experiment 2. However, it is also conceivable that the new visual features of each mask could blur cue-outcome associations, or cause participants to search for links between masks and outcomes (despite being explicitly instructed not to). We consider these issues unlikely, since the same masking method has been used successfully to show simpler associative learning (e.g. Scott et al., 2018).
Finally, while the trial-by-trial awareness check in Experiment 2 constituted a methodological improvement, it still leaves open a possibility that the knowledge gained from the conscious (albeit excluded) trials affected behaviour on the unconscious trials. We guarded against this by excluding participants with a high proportion (over 20%) of aware trials. For those with fewer than 20% of such trials, we demonstrate that the potential effect of conscious knowledge is negligible (see the registered Supplementary material).
Altogether, even though any task modification may carry the possibility of unintended consequences, the modifications we made were guided solely by the goal of improving detection of (un)awareness. Still, it remains worthwhile to investigate unconscious instrumental learning in different kinds of paradigms, for instance with types of decisions other than Go/NoGo, different stimulus suppression methods, or different stimuli. This would be necessary to make general claims about the feasibility of instrumental learning in the absence of awareness, or at varying degrees of awareness.
To conclude, our study revisited the question of the feasibility of unconscious instrumental conditioning by attempting to replicate a previous study by Pessiglione et al. (2008) both directly and, in a second experiment, with improvements to methodology and analysis. We found evidence for learning when replicating the original methods directly. However, following the enhancement of the sensitivity of the threshold-setting task, and the introduction of immediate and relevant trial-by-trial awareness checks allowing for exclusion of individual trials, we found evidence for the absence of learning in response to subliminally presented stimuli. This result provides robust evidence that instrumental conditioning cannot be achieved without stimulus awareness in a simple and therefore potentially highly general paradigm, in line with other emerging evidence that complex forms of learning may rely on conscious perception of the relevant stimuli. Altogether, our results support the theoretical view that perceptual consciousness may be necessary for complex, flexible processes, especially where selective action and behavioural adaptation are required, and they contribute to mapping out the role of consciousness in adaptive behaviour.

Open practices
The study in this article earned Open Data, Open Materials and Preregistered badges for transparent practices. Materials for the study are available at: https://osf.io/ke6yj/ and https://osf.io/t23by/.

Declarations of competing interest
None reported.