Stimulus awareness is necessary for both instrumental learning and instrumental responding to previously learned stimuli

Instrumental conditioning is a crucial part of adaptive behaviour, allowing agents to selectively interact with stimuli in their environment. Recent evidence suggests that instrumental conditioning cannot proceed without stimulus awareness. However, whether accurate unconscious instrumental responding can emerge from consciously acquired knowledge of the stimulus-action-outcome contingencies is unknown. We studied this question using instrumental trace conditioning, where participants learned to make approach/avoid decisions in two within-subject modes: conscious (stimuli in plain view) and unconscious (visually masked). Both tasks were followed by an unconscious-only instrumental performance task. We show that even when the contingencies are reliably learned in the conscious mode, participants fail to act upon them in the unconscious responding task. We also replicate the previous results that no instrumental learning occurs in the unconscious mode. Consequently, the absence of stimulus awareness not only precludes instrumental conditioning, but also precludes any kind of instrumental responding to already known stimuli. This suggests that instrumental behaviour is entirely supported by conscious awareness of the world, and corroborates the proposals that consciousness may be necessary for adaptive behaviours requiring selective action.


Introduction
Instrumental behaviour, the execution of appropriate action in order to achieve reward or avoid punishment, is fundamental to flexible, goal-oriented interaction with the world. Control of instrumental behaviour is a complex process, involving not only learning of the associations between stimuli, actions, and their outcomes, but also assigning motivational value to the outcomes to invigorate future behaviour, selective deployment of action, and adaptation in the face of environmental volatility. Mechanistically, this involves integrating information across distinct modalities and long temporal scales, including processing the stimulus and extracting its predictive value (e.g. positive or negative), deploying a selective response (e.g. approach or avoid), and comparing the expected outcome with the actual outcome in order to update the expectations of stimulus values (Balleine, 2011; Balleine and O'Doherty, 2010; Balleine and Ostlund, 2007; Sutton and Barto, 1998).
Recent evidence demonstrates that instrumental learning cannot proceed when the reward- or punishment-predictive stimuli are prevented from entering awareness (Reber et al., 2018; Skora et al., 2021a; Skora et al., 2021b; Skora et al., 2022; Skora et al., 2023). Comparable evidence is emerging for other complex forms of learning, such as fear conditioning (Mertens and Engelhard, 2020) and contingency learning (Travers et al., 2018). In contrast, simpler forms of learning that do not require selective action, such as classical conditioning or simple associative learning (e.g. learning the association between two unrelated stimuli), appear to be feasible even when the stimuli are not consciously perceived (Clark and Squire, 1998; Kim et al., 2015; Knight et al., 2003; Lin and He, 2009; Rosenthal et al., 2016; Scott et al., 2018).
This disparity supports the theoretical perspectives suggesting that processes requiring information integration across time and distinct cognitive modules should require conscious access (Dehaene et al., 2014; Dehaene and Changeux, 2011; Dehaene and Naccache, 2001; Lamme, 2006; Mudrik et al., 2014). Conscious access, under this perspective, reflects complex processing patterns necessary for long-range, long-lasting, feedback and feedforward connections between distinct brain regions. While low-level or shorter-range information integration, supporting simple forms of learning, may be possible without conscious awareness, increased complexity of processing, such as that observed during problem-solving or decision-making, should require consciousness (Baars, 2002; Melloni et al., 2007; van Gaal et al., 2012). Indeed, rendering stimuli inaccessible to consciousness with visual masking could prevent learning from taking place by disabling information integration across the long-range recurrent network supporting such learning (Skora et al., 2021a; Skora et al., 2021b; Skora et al., 2022, 2023).
Nonetheless, while unconscious instrumental conditioning might not be feasible, it is conceivable that instrumental behaviour can proceed unconsciously once the necessary knowledge of stimulus-action-outcome associations has been acquired consciously. Such an effect has been observed in a multisensory associative learning paradigm, where conscious acquisition of the contingencies proved necessary for accurate performance when stimulus awareness was prohibited (Faivre et al., 2014). Here, we test whether a similar dependency exists for instrumental behaviour. Specifically, we test if instrumental responding to stimuli that are not consciously perceived can occur after conscious learning of the instrumental contingencies. We test this possibility with a two-phase task, conducted in two within-subjects conditions. In the first condition, subjects perform a conscious instrumental conditioning phase, learning to approach a rewarding stimulus and avoid a punishing stimulus (100% deterministic), followed by a performance stage, where they continue to make approach/avoid responses to the same stimuli, now rendered unconscious through forward-backward visual masking. In the second condition, both the learning and the performance stages are unconscious.
Investigating this effect in an instrumental conditioning scenario can shed light on the functional boundaries of consciousness from the perspective of adaptive value. If accurate unconscious instrumental performance is observed following conscious acquisition of the stimulus-action-outcome contingencies, we may conclude that consciousness is vital for acquisition of adaptive behaviours, but not necessarily for their successful execution. A failure to observe successful unconscious performance following conscious conditioning would suggest that instrumental behaviour, both learning and execution, is entirely dependent on conscious access.

Participants
38 participants (9 males) with a mean age of 22.42 years (SD = 2.84, range = 19-31, two participants failed to report their age) were recruited at the [university hidden], through word of mouth and campus advertisement. All had normal or corrected-to-normal vision, no current or past neurological illness, and participated in exchange for course credit or 6 EUR payment. Ethical approval was granted by the Psychology Ethics committee at the [hidden], and the study was conducted in accordance with the Declaration of Helsinki.
Sample size was determined using the Bayesian Stopping Rule (see Results-Learning), whereby data collection continued until a sensitive result was obtained in both conditioning tasks, either in support of the H0 (absence of learning; conventionally indicated by a Bayes Factor smaller than 0.3, expected for the unconscious conditioning phase), or the H1 (evidence of learning; indicated by a Bayes Factor larger than 3, expected for the conscious conditioning phase).
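The stopping rule can be sketched as a check applied after each batch of participants. This is a minimal illustration rather than the authors' analysis code: the function names are hypothetical, and the default thresholds (0.3 and 3) are simply those stated in the text.

```python
def classify_evidence(b, lower=0.3, upper=3.0):
    """Label a Bayes factor as supporting H0, supporting H1, or insensitive."""
    if b < lower:
        return "H0"
    if b > upper:
        return "H1"
    return "insensitive"


def should_stop(bayes_factors, lower=0.3, upper=3.0):
    """Return True once every monitored test has reached a sensitive result."""
    return all(
        classify_evidence(b, lower, upper) != "insensitive"
        for b in bayes_factors
    )
```

For example, `should_stop([0.2, 150.0])` would end data collection, whereas `should_stop([0.8, 150.0])` would not, because the first test is still insensitive.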

Stimuli and materials
The experiment was run in Matlab 2019a (MathWorks, 2019) with Psychophysics Toolbox (Brainard, 1997). The stimuli were presented on a 24 in. Asus PG248Q gaming monitor (1920 by 1080 pixels) with a refresh rate of 120 Hz.
The stimuli in the main task included 6 neutral symbols (3 vertically symmetrical, 3 asymmetrical) taken from the Agathodaimon font, and two circular shapes in the perceptual discrimination task used for threshold determination. All stimuli were 240 × 240 pixels in size, and presented in light grey (RGB: 217, 217, 217) on a white background. The stimuli were forward-backward masked with a 240 × 240-pixel black-and-white noise image, generated by randomly re-scrambling the 3 × 3 pixel blocks in the image at the onset of every trial. Low contrast cues and the type of mask were intended to maximise the exposure duration without awareness.

Procedure
Each experimental session was composed of two conditions. Each condition consisted of an initial learning phase, either conscious (C, with stimuli presented in plain view, rendering participants fully aware of them), or unconscious (UC, with the stimuli visually masked and thus presented subliminally, without conscious awareness), followed by a subsequent performance phase (always unconscious, UC). Participants were seated with their chin on a chin rest at a 55 cm distance from the screen, and allowed to choose between English and German as their preferred language of the session. Each session began with the threshold of visual awareness determined individually for each participant using a masked perceptual discrimination task (see Threshold setting). The established threshold (M = 284 ms, SD = 108 ms) was then used as display duration in a brief, 8-trial practice round, and in the first condition (UC learning + UC performance, or C learning + UC performance; conducted in a randomised order). Following the first condition and a self-paced break, participants completed the threshold-finding task again, and its result was used as the display duration in the remaining condition (M = 239 ms, SD = 125 ms). The second threshold-setting task was applied to counteract the expected visual adaptation to the stimuli over the course of the first condition, especially if C learning, where the stimuli were unmasked and presented in plain view, occurred first.

Threshold setting
Each trial consisted of a fixation cross (500 ms), followed by a forward mask (300 ms), a cue (either a symmetrical or an asymmetrical circular shape, starting at 600 ms), and a backward mask (300 ms). Next, participants were asked to judge whether the cue was symmetrical or asymmetrical by pressing the corresponding arrow keys, and instructed to guess if they failed to see the stimulus. They were next asked whether they had any confidence in the judgment, or if they were guessing, also using the arrow keys (following Skora et al., 2021b). They were instructed to indicate 'some confidence' if they had any degree of confidence (even an intuition), and 'total guess' only if they felt they did not see the cue and responded at random. This distinction was applied as any feelings, including hunches or intuitions, were considered conscious experience (Dienes and Seth, 2010). This awareness check, combining an objective discrimination with a subjective confidence report, ensures that conscious experience of the stimulus is directly relevant to the first-order perception of relevant features of this stimulus, which can then guide choices. In other words, the confidence report reflects the conscious status of the relevant perceptual content (Dienes et al., 2010; Dienes and Seth, 2010). Thus, if participants can discriminate the relevant property of the stimulus (here, symmetry) with any degree of confidence, even just an intuition, we classify them as having conscious experience of the stimulus on that trial. Any objective discrimination, regardless of whether it is correct, made without a corresponding conscious experience reflected in any degree of confidence, would then be classified as an unconscious trial. Every time a correct symmetry response was made with confidence, the presentation duration was reduced by 50 ms on the subsequent trial. Upon reaching a duration of 100 ms, or upon the first guess response, the presentation duration returned to the previous level (+50 ms), and was reduced in 8.35 ms (single screen refresh) steps on the subsequent trials. The reduction continued until the next guess, at which point the presentation duration remained the same if the participant continued guessing. The duration maintained over six consecutive guesses (regardless of the accuracy of responses) was set as the individual threshold of conscious awareness.
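The staircase described above can be sketched as follows. This is an illustrative reading of the procedure, not the original Matlab implementation. All names are hypothetical, and several details are assumptions because the text does not specify them: the first guess (which triggers the step back up) is not counted towards the six consecutive guesses, a confident response resets the guess count, and a confident but incorrect response leaves the duration unchanged.

```python
COARSE_STEP_MS = 50.0   # step size of the initial descent (from the text)
FINE_STEP_MS = 8.35     # one screen refresh, as reported in the text
START_MS = 600.0        # starting cue duration (from the text)
FLOOR_MS = 100.0        # duration at which the coarse descent ends
GUESSES_NEEDED = 6      # consecutive guesses that define the threshold


def find_threshold(responses):
    """Return the threshold duration in ms, or None if the run ends first.

    `responses` is an iterable of (correct, confident) booleans, one per trial.
    """
    duration = START_MS
    fine_phase = False
    guess_streak = 0
    for correct, confident in responses:
        if not confident:
            if not fine_phase:
                # First guess: return to the previous level, switch to fine steps.
                duration += COARSE_STEP_MS
                fine_phase = True
            else:
                # Guesses in the fine phase keep the duration unchanged.
                guess_streak += 1
                if guess_streak == GUESSES_NEEDED:
                    return duration
            continue
        guess_streak = 0  # assumption: a confident response breaks the streak
        if not correct:
            continue      # assumption: confident errors do not change duration
        if fine_phase:
            duration -= FINE_STEP_MS
        else:
            duration -= COARSE_STEP_MS
            if duration <= FLOOR_MS:
                # Floor reached: return to the previous level, use fine steps.
                duration += COARSE_STEP_MS
                fine_phase = True
    return None
```

Under these assumptions, three confident correct responses, one guess, two further confident correct responses, and six guesses would yield a threshold of about 483.3 ms.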

Conditioning / performance tasks
The conditioning / performance tasks were adapted from a subliminal deterministic instrumental learning scenario used previously (Pessiglione et al., 2008; Skora et al., 2021b). In the task, participants try to learn to approach (Go response) or avoid (NoGo) the presented stimuli, where one stimulus is deterministically associated with a positive outcome if approached, and another with a negative outcome. If the participants approached the stimulus associated with a win, or successfully avoided a stimulus associated with a loss, they were rewarded with 1 token (golden token displayed on the screen as feedback). Conversely, if they approached the stimulus associated with a loss, or avoided a stimulus associated with a win, they were punished with −1 token (a red cross over the golden token displayed). Note that in contrast to the previous paradigms using this task, the 'avoid' (NoGo) response was not neutral, and also led to stimulus-action-contingent outcomes, thus maximising the chances of learning. Here, the task proceeded in two conditions. In the UC learning + UC performance condition, the learning task consisted of 70 trials of approach/avoid responses to stimuli rendered subliminal using forward-backward masking, with participants' performance tested in a subsequent 70 trials. In the C learning + UC performance condition, the learning task consisted of 70 trials of approach/avoid responses to stimuli presented in plain view, with their performance tested for those stimuli rendered subliminal in the subsequent 70 trials. Each condition (and the practice task) used a different pair of stimuli, chosen from the pool of 6, randomly assigned to be either rewarding or punishing, without replacement, in order to ensure that each condition contained a novel pair. The stimulus-outcome contingencies remained the same between the learning and performance tasks of each condition.
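The deterministic feedback rule described above reduces to a single comparison. The sketch below is illustrative only; the function and argument names are hypothetical.

```python
def outcome(stimulus_rewarding: bool, go: bool) -> int:
    """Return +1 (token won) or -1 (token lost) for a single trial.

    Approaching (Go) a rewarding stimulus or avoiding (NoGo) a punishing
    stimulus is correct; the other two combinations are punished.
    """
    correct = (go == stimulus_rewarding)
    return 1 if correct else -1
```

Because NoGo responses also produce feedback, every trial is informative about the stimulus-action-outcome contingency.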
In all unconscious trials (within the UC learning and UC performance phases), each trial consisted of a fixation cross (500 ms), mask (300 ms), target stimulus (duration determined in the perceptual discrimination task), and mask (300 ms), followed by a decision prompt in the form of a question mark, during which participants had two seconds to make a response (Fig. 1). While the question mark was on-screen, participants decided between 'approaching' the stimulus by pressing the spacebar (Go) and 'avoiding' the stimulus by refraining from pressing (NoGo). Participants were instructed that they should follow their instincts in making the decisions, as they should not expect to be consciously aware of the stimulus shown.
Following feedback, participants were asked to report whether the stimulus was vertically symmetrical or asymmetrical, and their confidence in that judgment on a binary scale (some confidence or total guess). As in the threshold task, they were instructed to indicate 'some confidence' if they had any degree of confidence in the symmetry of the stimulus (including instincts), and 'total guess' only if they responded randomly. This allowed us to label all correct and confident trials (including correct hunches) as aware, providing a strict criterion of awareness. The responses were made with the arrow keys before proceeding to the next trial. If three correct and confident responses were made in a row (indicating awareness), duration was reduced by another single screen refresh (8.35 ms). Participants were shown examples of the stimuli (from a different sample) prior to beginning, and informed that stimulus symmetry was unrelated to its rewarding or punishing outcome.
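The within-task adjustment of display duration can be sketched as a streak counter over trial-wise awareness labels (a trial is labelled aware when the symmetry judgment is both correct and confident). The names are hypothetical, and restarting the count after each reduction is an assumption, as the text does not state it.

```python
def adjust_duration(start_ms, aware_flags, step_ms=8.35, streak_needed=3):
    """Reduce the display duration by one refresh after each run of aware trials.

    `aware_flags` holds one boolean per trial (True = correct and confident).
    """
    duration = start_ms
    streak = 0
    for aware in aware_flags:
        streak = streak + 1 if aware else 0
        if streak == streak_needed:
            duration -= step_ms
            streak = 0  # assumption: the count restarts after a reduction
    return duration
```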
In the C learning task, the trial sequence was identical, with the exception of the masks, which were omitted in order to present the stimulus in plain view (Fig. 1B). Following feedback presentation, participants were also asked to judge the symmetry of the symbol and their confidence in their decision, with the expectation that they should be largely correct and confident in their judgments. No reduction in display duration occurred following three correct and confident responses.

Exclusion criteria
In the UC learning/performance tasks, all trials where participants made a correct symmetry judgment with confidence were marked as aware and excluded (17% of all trials), in order to ensure that analyses are conducted only on truly unconscious trials. All participants showed some awareness during the task (M aware trials = 48, range: 1-198), which resulted in a drop in duration of exposure (M = 40 ms, M = 30 ms for phases 1 and 2, respectively). Of those who showed awareness, participants (31%) who were aware on >25% (52) of all UC trials were excluded from further analysis.
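The two-level exclusion (aware trials removed, then participants with too many aware trials removed) can be sketched as below. This is a minimal illustration with hypothetical names and a hypothetical trial format (a dict with `correct` and `confident` booleans); only the 25% cut-off is taken from the text.

```python
def screen_participant(trials, max_aware_fraction=0.25):
    """Return (keep_participant, unaware_trials) for one participant.

    A trial counts as aware when the symmetry judgment was both correct
    and confident. Participants aware on more than `max_aware_fraction`
    of their unconscious-phase trials are flagged for exclusion.
    """
    unaware = [t for t in trials if not (t["correct"] and t["confident"])]
    n_aware = len(trials) - len(unaware)
    keep = n_aware <= max_aware_fraction * len(trials)
    return keep, unaware
```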
In the C learning task, one participant who was not correct and confident over 25% of the time was excluded, on the assumption that they were not paying attention to the task or the symbols when they were clearly visible.

Learning
On average (across all unconscious phases), participants executed more Go responses (56%) than NoGo responses, regardless of stimulus type. To account for this response bias, type I d' (a Signal Detection Theoretic measure of sensitivity to signal versus noise; Stanislaw and Todorov, 1999) was computed, with Go responses to rewarding cues treated as Hits, and Go responses to punishing cues as False Alarms. The resulting measure of sensitivity can be taken as evidence of successful discrimination between the stimuli, approximating learning, if it is significantly above 0.
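For illustration, d' can be computed from Go counts as the difference between the z-transformed hit rate and false-alarm rate. The sketch below uses only the Python standard library. The function name is hypothetical, and the log-linear correction (adding 0.5 to counts and 1 to totals, which keeps rates of 0 or 1 finite) is an assumption, since the text does not state how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hits, n_rewarding, false_alarms, n_punishing):
    """Type I d' with Go-to-rewarding as hits and Go-to-punishing as false alarms."""
    # Log-linear correction (assumption, not stated in the text).
    hit_rate = (hits + 0.5) / (n_rewarding + 1)
    fa_rate = (false_alarms + 0.5) / (n_punishing + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A participant who presses Go equally often for both stimulus types obtains d' = 0 regardless of their overall Go bias, which is why the measure is preferred here over raw accuracy.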
Group-level d' scores for both the conscious and unconscious learning tasks were entered into one-tailed one-sample t-tests against 0 (no ability to discriminate the stimuli; chance performance). A Bayes Factor (B; Dienes, 2015a, 2016) was computed for the difference, with a half-normal distribution, mean specified as 0, and d' of 1.79 as the SD of the mean (corresponding to the expected effect size should learning take place, obtained in the supraliminal task of Skora et al., 2023).¹ A robustness region (RR) is reported for each B, giving the range of scales that qualitatively support the same conclusion (i.e. evidence as insensitive, or as supporting H0, or as supporting H1), notated as: RR 1/3<B<3 [x1, x2], RR B<1/3 [x1, x2], and RR B>3 [x1, x2], where x1 is the smallest SD that gives the same conclusion and x2 is the largest (Dienes, 2019). In line with the Bayesian Stopping Rule (Dienes, 2015b), data collection continued until a sensitive result was found in support of either H0 (absence of learning; by convention indicated by a B smaller than 0.3) or H1 (presence of learning; indicated by a B larger than 3).
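A Bayes factor of this kind compares how well the observed mean is predicted by H1 (a half-normal prior on the true effect) versus H0 (a true effect of exactly 0). The sketch below is a simplified illustration, not the calculator used by the authors: it assumes a normal likelihood for the sample mean and integrates numerically with the midpoint rule. The reported values may have been obtained with a different likelihood (for example a t-distribution), so this code is not expected to reproduce them exactly. All names are hypothetical.

```python
from statistics import NormalDist


def bayes_factor_half_normal(mean_obs, se, prior_sd, n_steps=20000):
    """Bayes factor for H1 (half-normal prior, mode 0) against H0 (effect = 0)."""
    likelihood = NormalDist(0.0, se)    # density of (observed mean - true effect)
    prior = NormalDist(0.0, prior_sd)   # folded at 0 to give the half-normal
    upper = max(10.0 * prior_sd, mean_obs + 10.0 * se)
    step = upper / n_steps
    marginal_h1 = 0.0
    for i in range(n_steps):
        theta = (i + 0.5) * step        # midpoint of the i-th slice
        # Half-normal density is twice the normal density for theta >= 0.
        marginal_h1 += likelihood.pdf(mean_obs - theta) * 2.0 * prior.pdf(theta) * step
    marginal_h0 = likelihood.pdf(mean_obs)
    return marginal_h1 / marginal_h0
```

Scanning `prior_sd` over a grid and recording where the qualitative conclusion changes would give an approximation of the robustness region for a given result.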
In the C learning condition, d' was significantly different from 0 (M = 2.90, SE = 0.28; t(24) = 10.50, p < 0.001, Cohen's d = 2.10; B H(0,1.79) = 146,536,301.23, RR B>3 [0.14, 988]²; Fig. 2), indicating that participants were able to learn the associations between the stimuli, their actions, and the action-dependent outcomes. In the UC learning condition, [...]
¹ [...] from the pre-registered value of 0.7, estimated from Pessiglione et al. (2008). The reason for this deviation is twofold: 1) the learning effect obtained in Pessiglione et al. failed to replicate in independent studies (Skora et al., 2021, 2022, 2023), suggesting that the value of 0.7 as representing unconscious learning is unreliable; 2) the value of 1.79 corresponds to the expected just-above-threshold accuracy, where learning was reliable. Robustness regions provided for Bs allow one to assess the range for which the same qualitative conclusion holds, independently of the value chosen as the expected effect size. The robustness regions indeed include the initial value of 0.7, showing that the conclusion would not have changed had that value been used.
The magnitude of learning in both conditioning stages (conscious and unconscious) was then compared with a paired t-test. Due to the lack of a precise numeric prediction of the expected difference between the conditions, Bayes factors for the paired t-tests were computed with a normal distribution, mean specified as 0, and the SD of the distribution specified as half of the maximum expected difference (previously used d' of 1.79/2 = 0.9; following Dienes, 2019). As expected, C learning gave rise to a significantly higher d' than UC learning (M diff = 2.87, SE diff = 0.31; t(24) = −9.18, p < 0.001, Cohen's d = −1.84; B H(0,0.9) = 1,062,733.22, RR B>3 [0.2, 955]; Fig. 2).
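As an arithmetic aside, for one-sample and paired t-tests Cohen's d can be recovered from the t statistic as d = t / sqrt(n), with n = df + 1. The reported effect sizes are consistent with this identity (for example, 10.50 / sqrt(25) = 2.10), although the text does not state which formula was used. The helper name below is hypothetical.

```python
import math


def cohens_d_from_t(t_value, df):
    """Cohen's d for a one-sample or paired t-test, given t and its df."""
    n = df + 1
    return t_value / math.sqrt(n)
```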

Instrumental performance following conscious vs unconscious learning
The statistical approach applied to performance was identical to that applied to the learning blocks. In the UC performance task following C learning, d' was not significantly different from 0 (M = −0.26, SE = 0.27; t(22 [...] Within the conditions, there was a significant decrease in accurate [...]). From the end of the UC learning stage (M = 0.09) to the end of the corresponding UC performance stage (M = −0.30), there was no change in accurate choices (M diff = 0.39, SE diff = 0.25; t(24) = 1.49, p = 0.147, Cohen's d = 0.30; B N(0,0.9) = 0.82, RR 1/3<B<3 [0, 2.8]). See Fig. 3 for a depiction of proportions of correct choices across the spans of the entire learning and performance stages.

Exploratory analysis: Win-stay/lose-switch behaviour
Given participants' chance-level behaviour in the unconscious blocks, in an exploratory analysis, we computed the proportion of win-stay and lose-switch behaviour (WSLS). WSLS is a heuristic whereby a choice leading to a win is repeated, and a choice resulting in a loss provokes a switch to an alternative choice (Estes, 1950). Such a strategy corresponds to setting the learning rate to one in reinforcement learning algorithms, indicating that choices are only guided by the most recent outcome, rather than by an integrated choice-outcome history. Note that here, as participants showed no signs of learning, we analysed stay or switch behaviour on the trial immediately following the win or loss outcome (rather than the next trial on which the same stimulus will be presented again). We investigated these performance metrics to obtain exploratory evidence that participants, despite random performance in the absence of knowledge about the stimulus presented to them, were still trying to solve the task by defaulting to a simpler strategy. We calculated win-stay (and lose-switch, respectively) proportions as the percentage of win (loss) trials that were followed by a choice repetition (a choice switch) on the next trial. Above-chance WS and LS values would indicate that participants relied on WSLS as a strategy to guide their decisions in the absence of stimulus awareness providing clear stimulus-outcome mappings. One-sample t-tests against the chance value of 0.5 were computed for each learning and performance stage (see Table 1). In the absence of a reasonable expected effect size, B was computed for each stage using a Cauchy distribution and a default prior in JASP (JASP Team, 2023). Note that we refrained from computing WSLS values in the conscious learning condition, as participants rapidly attained near-ceiling accuracy, which precludes interpretability of WSLS scores from this block.
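The win-stay and lose-switch proportions defined above can be sketched as follows. This is a minimal illustration with hypothetical names; it treats consecutive entries of the input sequences as consecutive trials and does not model how excluded (aware) trials were handled, which the text does not specify.

```python
def wsls_proportions(choices, outcomes):
    """Return (win_stay, lose_switch) proportions for one block.

    `choices` is a sequence of responses (e.g. "go" / "nogo") and
    `outcomes` the matching sequence of +1 / -1 feedback values.
    A proportion is None when the block contains no such trials.
    """
    win_total = win_stay = loss_total = lose_switch = 0
    # The last trial has no successor, so it cannot be scored.
    for i in range(len(choices) - 1):
        repeated = choices[i + 1] == choices[i]
        if outcomes[i] > 0:
            win_total += 1
            win_stay += repeated
        else:
            loss_total += 1
            lose_switch += not repeated
    ws = win_stay / win_total if win_total else None
    ls = lose_switch / loss_total if loss_total else None
    return ws, ls
```

Both proportions are then compared against the chance value of 0.5 at the group level.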
Participants showed a tendency to engage in a WS strategy in all three unconscious phases (UC performance following C learning, UC learning and the corresponding UC performance), repeating a rewarded choice on the next trial (see Fig. 4). However, strong support for this hypothesis from B was only obtained in the UC performance stage following UC learning. In contrast, participants did not engage in the LS strategy more often than expected from chance in any of the learning or performance stages (see Table 1 for full test statistics).

Exploratory analysis: reaction times
Reaction times (RT) >2 SD from the mean for each subject or shorter than 100 ms were removed from analysis. Subsequently, an RT difference index was computed by subtracting RTs to rewarding cues from RTs to punishing cues. The resulting positive values indicate that participants took a longer time to respond to punishing cues than to rewarding cues, in line with RT-oriented indicators of reward learning (Atas et al., 2014). Zero indicates that there was no difference between the two. The index was entered into a one-sample t-test against 0, for learning in both conscious and unconscious conditions. B was computed with a half-normal distribution, mean specified as 0, and a value of 34 ms as the SD of the mean (obtained from a past study which found an RT difference in the absence of performance effects in a comparable task (Atas et al., 2014)). Note that aware trials were excluded, and that trials where no response was made (correctly or incorrectly) yielded no RTs.
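The RT difference index for a single participant can be sketched as below. The names are hypothetical, and two details are assumptions because the text does not specify them: the mean and SD are computed over the participant's pooled Go RTs before any trimming, and the 2 SD criterion is applied in both directions.

```python
from statistics import mean, stdev


def rt_difference_index(rewarding_rts, punishing_rts, min_rt=100.0, sd_cutoff=2.0):
    """Mean RT to punishing cues minus mean RT to rewarding cues, in ms."""
    all_rts = list(rewarding_rts) + list(punishing_rts)
    centre = mean(all_rts)
    spread = stdev(all_rts)

    def keep(rt):
        # Drop anticipatory responses and outliers beyond the SD cut-off.
        return rt >= min_rt and abs(rt - centre) <= sd_cutoff * spread

    rewarding = [rt for rt in rewarding_rts if keep(rt)]
    punishing = [rt for rt in punishing_rts if keep(rt)]
    return mean(punishing) - mean(rewarding)
```

A positive index indicates slower responses to punishing than to rewarding cues. Only Go trials contribute, since NoGo responses yield no RT.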
In the C learning block, the RT difference was significantly different from 0 (M = 257.04 ms, SE = 60.54, t(19 [...] [0, 430]). The Bayes factors suggest that the data were insensitive.

Discussion
In the present study, we investigated whether instrumental responding can proceed unconsciously once the necessary knowledge of stimulus-action-outcome association has been acquired consciously. Subjects performed a two-stimulus, deterministic instrumental trace conditioning task in a conscious and unconscious mode (within-subjects), in both cases followed by an unconscious instrumental performance stage using the same stimuli.
Firstly, the results indicate that while subjects were easily able to learn the associations in the conscious mode, there was no evidence of learning in the unconscious mode, reflected in participants' inability to discriminate between the positive and negative stimuli, and approach or avoid accordingly. This replicates previous evidence that both trace and delay instrumental conditioning require conscious access to the stimuli (Reber et al., 2018; Skora et al., 2021a, b, 2022, 2023). Secondly, there was no evidence of successful instrumental responding to unconsciously presented cues, even after the associations were reliably learned in the conscious conditioning mode. This demonstrates that it is not only the acquisition of new instrumental associations that requires conscious access; rather, successful instrumental behaviour in general appears to rely on consciousness.
[...] consciously, re-activating the relevant processing patterns could proceed as an unconscious process. This has been demonstrated for multisensory integration and simpler kinds of stimulus-stimulus associative learning (Faivre et al., 2014; Mudrik et al., 2014). However, it does not appear to be the case for instrumental associations. Even reliable, conscious knowledge of stimulus-action-outcome associations, trained over the course of 70 trials (35 trials per stimulus), failed to produce above-chance instrumental performance when the same stimuli were blocked from entering awareness with visual masking. One interpretation of this effect could be that the components of the process are too widely distributed, requiring sequential processing across distinct brain regions, retrieval of the representations of expected stimulus values from memory, and selective action deployment. Processing of this complexity, involving long-range, feedback and feedforward connections, is largely considered to require conscious access (Baars, 2002; Dehaene et al., 2014; Dehaene and Changeux, 2011; van Gaal et al., 2012). A subliminally presented stimulus might not be able to evoke this broad range of activity across regions and time. If this is true, then the unconsciously presented stimulus fails to be integrated with subsequent events, even though it is processed sufficiently in the visual cortex and perhaps beyond (note that similar masking methods produce successful simple associative audio-visual learning; Scott et al., 2018). Possibly, subliminally presented stimuli are not represented in higher-order visual areas that provide input to lateral orbitofrontal regions that have been shown to be crucial for correctly assigning credit for an outcome to the choice that caused it (Jocham et al., 2016; Klein-Flügge et al., 2013; Walton et al., 2010). During learning, even if a correct response is made, the outcome may not be able to be successfully linked with the subliminally presented stimulus and the executed action. Consequently, the stimulus never gets to carry any intrinsic value in order to drive subsequent choices. As such, a failure during this process would explain the absence of unconscious instrumental conditioning in the first place. However, after successful conscious learning, the stimuli should already have acquired action-contingent outcome values. Failure to respond appropriately to an already known but subliminally presented stimulus implies that blocking it from conscious access might prevent it from activating the regions coding for its value. Alternatively, the regions coding for value may be activated, but fail to facilitate selective choices. This scenario, however, is unlikely for a few reasons. Firstly, activating expected value might activate appropriate action purely through automatic, instinctive processes (e.g. through a Pavlovian bias, the tendency to approach rewarding stimuli). Secondly, one would expect to observe indirect markers of stimulus values being activated that are not under volitional control, such as autonomic readouts of performance monitoring or a reaction time difference. Those have not been observed in this context (Skora et al., 2021b; Skora et al., 2022). Nonetheless, neuroimaging and connectivity analyses might be fruitfully applied to investigate the entire process in order to pinpoint where it breaks down.
The result that instrumental performance still requires conscious access even after conscious training invites the question of the need for consciousness in habitual, automatic behaviour. Instrumental behaviour can become habitual (driven by highly automated stimulus-response associations, as opposed to outcome values) when sufficiently deeply ingrained (Dezfouli and Balleine, 2012). It is plausible that extensive conscious instrumental experience with a stimulus could reduce or eliminate the need for conscious access through automatic mapping of stimulus to responses. Still, instrumental behaviour is characterised by high flexibility and selectiveness. As such, it is likely that it should require conscious access in order to allow for rapid and flexible responding or behavioural adjustment if the contingencies change. Once instrumental behaviour becomes habitual, the flexibility is diminished. It is conceivable that consciousness facilitates the flexibility of instrumental behaviour, which becomes rigid and resistant to adjustment without conscious access.
The analysis of WSLS strategy suggests that participants partially relied on decision heuristics to guide their decisions in the absence of stimulus awareness providing clear stimulus-action-outcome mappings. In the unconscious learning and performance tasks, participants had a tendency to repeat rewarded choices (although this result achieved strong support only in the unconscious performance following unconscious learning task). However, there was no evidence for a LS strategy being adopted in any of the unconscious tasks. Interestingly, the unconscious performance task following unconscious learning showed a clear and sensitive tendency for WS, perhaps due to having learned that there is no clear strategy to follow over the course of the task besides WS. This, however, remains a speculation.
Pertaining to reaction times, we were not able to find sensitive evidence against the absence of a difference in response speed to rewarding versus punishing stimuli (see also Skora et al., 2021b). This likely stemmed from an insufficient number of trials (notably, as a feature of the design, avoid responses do not yield any RT). This outcome differs from the previously reported result, where participants were faster to respond to rewarding stimuli than to punishing stimuli in the absence of any difference in choices, suggesting that some unconscious knowledge may have been acquired (Atas et al., 2014). However, the difference in methods between the Atas et al. study and ours may account for the discrepancy. For example, conscious access is considered to be graded and different across masking methods (Breitmeyer, 2015; Kouider and Dehaene, 2007; Windey et al., 2014), with crowding (as used by Atas et al.) potentially permitting more conscious access. Importantly, Atas et al. did not control awareness in a trial-wise fashion.
Interestingly, previous research has shown that subliminal (masked) No-Go cues can activate prefrontal regions associated with inhibitory control and trigger behavioural inhibition, resulting in slower RTs to No-Go cues or even complete abortion of a response (Lau and Passingham, 2007; van Gaal et al., 2008; van Gaal et al., 2010). Those results imply that in the performance stage (after successful learning), the subliminal punishing stimuli should be able to trigger some degree of inhibition. This was not observed in our task. One possible reason for this difference could be that the above papers interleaved the subliminal cues with supraliminal cues, potentially priming the subliminal trials with knowledge from the supraliminal trials. Another possible reason stems from the type of behaviour. In the above studies, participants' default action (Go) is meant to be stopped by a No-Go signal (whether masked or not). In our task, each trial requires an active choice between approach and avoid, rather than inhibition of a default response. This key difference might engage different mechanisms: inhibitory control versus active choice. It is possible that inhibitory control of default actions can be triggered without awareness of the cue, but once selective action is needed, cue awareness is necessary.
This evidence, and the above considerations, have implications for understanding the function of consciousness, supporting the notion that adaptive behaviours involving selective and flexible decision-making require consciousness for successful operation. Complex forms of learning (including trace conditioning, second-order conditioning, or flexible re-learning) have been considered to share overlapping markers with the hallmarks of consciousness, suggesting that the two may be evolutionarily intertwined (Birch et al., 2020; Birch et al., 2021; Ginsburg and Jablonka, 2019). Elsewhere, consciousness has been closely tied to action, providing a frame of reference for individuals' interactions with the world (Clark, 2016; Land, 2012; Merker, 2005; Seth et al., 2016). Consequently, consciousness may enable complex, flexible and longer-term decision-making strategies, going beyond the simple and rigid stimulus-stimulus or stimulus-response associations. Indeed, such forms of learning have been demonstrated without conscious access (Bayley et al., 2005; Clark and Squire, 1998; Knight et al., 2003; Lin and He, 2009; Rosenthal et al., 2016; Scott et al., 2018; Seitz et al., 2009).
Nonetheless, it is noteworthy that d' is a limited means of assessing learning, averaging across the entire block and discounting potential improvements in the learning process. However, inspecting the learning curves suggests that participants failed to improve their responses over the course of the unconscious blocks (learning and performance), in contrast to the conscious learning condition. Secondly, while the trial-by-trial awareness check constitutes an immediate, relevant, and sensitive measure of awareness (Berry and Dienes, 1993; Newell and Shanks, 2013), there is still a possibility that the conscious (albeit excluded) trials affected behaviour on the unconscious trials. We took steps to guard against this possibility by excluding participants with a large number of conscious trials (over 25%). For those with a smaller number of conscious trials, the potential effect of conscious knowledge is negligible (Skora et al., 2023).
Finally, it could be argued that symmetry, used as a first-order property of the stimulus in the objective part of the awareness check, is too high-level a feature to adequately capture any partial perception of the stimulus. The motivation for choosing symmetry and the answer to this concern lie in the awareness check. Symmetry was the key characteristic predictive of the nature of the stimuli (in each block, one stimulus was always symmetrical and the other asymmetrical). Hence, we assessed participants' confidence reflecting the conscious status of symmetry as the relevant perceptual content: if they could perceive the stimulus sufficiently to determine if it was symmetrical or not with any confidence, they should also be able to approach or avoid it correctly. This is, of course, imperfect, as it may fail to capture partial perception based on another telling property of the stimuli. However, if such uncaptured partial perception was driving choices, it could only have elevated the d'. Since we failed to obtain evidence of learning, we deem uncaptured partial perception unlikely.
To conclude, the present study investigated whether instrumental responding can proceed unconsciously once the necessary knowledge of stimulus-action-outcome associations has been acquired consciously. We demonstrate that even when the instrumental contingencies are reliably learned consciously, participants fail to successfully respond to the same stimuli when they are not consciously perceived. We also replicate the previous evidence that no instrumental conditioning occurs in the unconscious mode. Consequently, the absence of conscious awareness of the stimuli in the environment not only precludes instrumental conditioning, but also precludes any kind of instrumental responding to already known stimuli. This suggests that instrumental behaviour is entirely supported by conscious awareness of the world, and corroborates the proposals that consciousness may be necessary for adaptive behaviours requiring selective action.

Fig. 1. A: Illustration of a single trial in the UC mode of learning and performance (English version). Following a fixation cross, the target stimulus (either rewarding or punishing if approached) was presented between two visual masks. Participants had two seconds to make an approach (Go) or avoid (NoGo) decision, and were rewarded or punished accordingly. In this example, a participant made a correct avoid decision and was rewarded with a golden token. The trial ended with a binary symmetry and confidence judgment. B: Illustration of a single trial in the C mode of learning, where a participant made a correct approach decision. The trial sequence was identical to the UC mode, but the stimuli were presented in plain view, without visual masks.

Fig. 2. Top: Type I d' for the learning blocks, comparing unconscious instrumental learning (UC learn) with conscious instrumental learning (C learn). Bottom: Type I d' for the performance blocks, comparing unconscious instrumental performance following unconscious learning (UC -> UC perf) with unconscious instrumental performance following conscious learning (C -> UC perf). Asterisks indicate significance: *** = p < 0.001. For the Bayes factor B: ~ indicates a sensitive B supporting H1; + indicates a sensitive B supporting H0.
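For readers less familiar with the measure plotted in Fig. 2, the following is a minimal sketch of how a Type I d' can be computed from approach/avoid counts. It assumes that a "hit" is an approach response to the rewarding stimulus and a "false alarm" is an approach response to the punishing stimulus, and it applies a log-linear correction to avoid infinite z-scores. Both the mapping and the correction are illustrative assumptions, and the function name `type1_dprime` is hypothetical; this is not necessarily the procedure used in the study.

```python
from statistics import NormalDist


def type1_dprime(hits, misses, false_alarms, correct_rejections):
    """Type I d' = z(hit rate) - z(false-alarm rate).

    Assumed mapping (illustrative only):
      hit               = approach to the rewarding stimulus
      miss              = avoid on the rewarding stimulus
      false alarm       = approach to the punishing stimulus
      correct rejection = avoid on the punishing stimulus
    """
    # Log-linear correction: add 0.5 to each cell so that rates of exactly
    # 0 or 1 do not produce infinite z-scores (an assumed choice).
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Chance-level responding gives d' of zero; accurate responding gives d' > 0.
chance = type1_dprime(10, 10, 10, 10)
accurate = type1_dprime(18, 2, 2, 18)
```

Because d' is computed from counts pooled over a whole block, it cannot show whether accuracy changed within the block, which is why the learning curves in Fig. 3 are inspected alongside it.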

Fig. 3. Top: Learning curves for the conscious learning (all trials) -> unconscious performance condition (unaware trials only). Bottom: Learning curves for the unconscious learning -> unconscious performance condition (unaware trials only). Grey lines represent the mean (across participants) of correct choices (approach and avoid responses to the rewarding and punishing options, respectively). Coloured lines represent smoothed values (locally weighted regression; ribbon represents ±1 SEM).

Fig. 4. Boxplots presenting the proportions of win-stay/lose-switch responses during the unconscious stages of the task: unconscious performance task following conscious learning (left), and unconscious learning and the corresponding unconscious performance task (right). The conscious learning task is omitted due to poorly computable WSLS proportions caused by participants learning the correct choices quickly with near-ceiling accuracy, thus eliminating the need for relying on WSLS as a decision heuristic (see Fig. 3 for learning curves).

Table 1. WSLS analysis across the learning and performance stages of both conditions (conscious, unconscious learning), presenting t- and p-values obtained from one-sample t-tests against 0.5 (chance level), Cohen's d, and default-prior Bayes factor B.
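
The quantities summarised in Table 1 can be illustrated with a minimal sketch. It assumes that each trial is coded as a binary choice (1 = approach, 0 = avoid) with a binary outcome (win or loss), that win-stay means repeating the previous choice after a win, and that lose-switch means changing it after a loss. The function names and the coding scheme are hypothetical, and the sketch computes only the t statistic and Cohen's d; the p-value and the default-prior Bayes factor are left out because they depend on choices not specified here.

```python
from math import sqrt
from statistics import mean, stdev


def wsls_proportions(choices, rewarded):
    """Return (win-stay, lose-switch) proportions for one participant.

    choices  : sequence of 0/1 (avoid/approach), one per trial (assumed coding)
    rewarded : sequence of bools, True if that trial ended in a win
    """
    ws_count = ws_total = ls_count = ls_total = 0
    # Pair each trial's choice and outcome with the choice on the next trial.
    for prev, cur, won in zip(choices, choices[1:], rewarded):
        if won:
            ws_total += 1
            ws_count += cur == prev  # stayed after a win
        else:
            ls_total += 1
            ls_count += cur != prev  # switched after a loss
    ws = ws_count / ws_total if ws_total else float("nan")
    ls = ls_count / ls_total if ls_total else float("nan")
    return ws, ls


def one_sample_t_and_d(values, mu=0.5):
    """One-sample t statistic and Cohen's d against a reference value mu."""
    n = len(values)
    d = (mean(values) - mu) / stdev(values)  # Cohen's d (sample SD)
    t = d * sqrt(n)                          # t statistic with n - 1 df
    return t, d


# Hypothetical participant: stays once and switches once after wins,
# and switches after the single loss.
ws, ls = wsls_proportions([1, 0, 0, 1], [True, True, False, True])

# Hypothetical group of per-participant win-stay proportions tested against 0.5.
t_stat, cohen_d = one_sample_t_and_d([0.6, 0.7, 0.8])
```

Testing against 0.5 treats staying and switching as equally likely under chance, which matches the chance level stated in the table caption.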