Unraveling the influence of Pavlovian cues on decision-making: a pre-registered meta-analysis on Pavlovian-to-instrumental transfer

Amidst the replicability crisis, promoting transparency and rigor in research becomes imperative. The Pavlovian-to-Instrumental Transfer (PIT) paradigm is increasingly used in human studies to offer insights into how Pavlovian cues, by anticipating rewards or punishments, influence decision-making and potentially contribute to the development of clinical conditions. However, research on this topic faces challenges, including methodological variability and the need for standardized approaches, which can undermine the quality and robustness of experimental findings. Hence, we conducted a meta-analysis to unravel the methodological, task-related, individual, training, and learning factors that can modulate PIT. By scrutinizing these factors, the present meta-analysis reviews the current literature on human PIT, provides practical guidelines for future research to enhance study outcomes and refine methodologies, and identifies knowledge gaps that can serve as a direction for future studies aiming to advance the comprehension of how Pavlovian cues shape decision-making.


Rationale
In many daily life scenarios, environmental cues can guide our decision-making (Dayan and Balleine, 2002; Doya, 2008). For instance, merely spotting the logo of a familiar chocolate brand can prompt us to purchase that particular product or a similar one, even if it wasn't initially on our shopping list. By anticipating upcoming rewards or punishments, these environmental cues become motivationally relevant (i.e., Pavlovian cues) and can bias behavior by guiding our actions either toward our goals or veering us into undesired conduct (Corbit, 2005; Corbit et al., 2007; Everitt and Robbins, 2005, 2016; Lewis et al., 2013). Their impact can even extend beyond momentary decisions, potentially influencing the development and persistence of clinical conditions like addiction and compulsive behavior (Garbusow et al., 2022; Krypotos and Engelhard, 2020; Peng et al., 2022; Robinson and Berridge, 2001). Thanks to the insights it offers into the intricate interplay of the learning processes underpinning the influence of Pavlovian cues over decision-making (sometimes also referred to as Pavlovian bias or cue-guided decision-making), the Pavlovian-Instrumental Transfer (PIT) paradigm has increasingly found its way into human studies over the past two decades (Campese and Laurent, 2023; Cartoni et al., 2016).

Objectives
Based on these premises, a systematic quantitative analysis of the factors that can modulate the influence of Pavlovian cues on decision-making appears crucial to guide future PIT studies.
To this aim, we conducted a meta-analysis of the studies published until 2023 that investigated human Pavlovian-Instrumental Transfer (PIT). Methodological and task-related factors (e.g., extinction, outcome), individual factors (e.g., age, population), and training and learning factors (e.g., amount of learning, contingency awareness) were analyzed as possible modulators (for more details, see "2.8 Data items"). The results will be discussed with three goals in mind: (1) provide an overview of the literature on human Pavlovian-to-Instrumental transfer, with a specific focus on outcome-specific and general transfer, intended as tools to test the influence of Pavlovian cues on decision-making; (2) provide guidelines for the design of future studies (i.e., what determines a stronger effect and more consistent results, estimation of effect size for power analysis), aiming to clarify currently mixed evidence and promote more robust and replicable findings; (3) identify knowledge gaps on which future research should focus to advance the current understanding of the influence of Pavlovian cues over decision-making.

Methods
Journal Pre-proof

The Pavlovian-to-Instrumental Transfer (PIT) task
The Pavlovian-Instrumental Transfer (PIT) task generally consists of three phases. The first one is a Pavlovian conditioning phase, during which participants learn stimulus-outcome associations. Thus, they are presented with initially neutral stimuli (e.g., an image or a color) that become conditioned stimuli (CSs) after repeated pairing with the delivery of a reinforcer, either a reward (e.g., food) or a punishment (e.g., shock). Stimuli that predict no reinforcer are also usually present in this phase (CS-) as a control condition. The second phase consists of an instrumental conditioning phase, in which participants learn the association between one or more instrumental actions (e.g., pressing a button) and their related outcome (the same or a similar reinforcer as in the Pavlovian conditioning phase). The third, so-called "transfer" phase is the critical phase, in which the influence of the CSs on instrumental actions (i.e., choice) is tested. The participant performs the same instrumental actions previously learned, in the presence of the Pavlovian cues, which are actually task-irrelevant. Of note, this phase is usually performed under either true extinction (i.e., no rewards or punishments are presented following the instrumental action) or nominal extinction (i.e., no rewards or punishments are presented following the instrumental action, but participants are told that rewards and punishments are nevertheless being earned) (Hogarth and Chase, 2011). Depending on task structure, two related but distinct types of transfer may be tested, namely outcome-specific transfer and general transfer. Outcome-specific transfer tests the influence of the CSs on instrumental responses directed toward the same outcome as the one earned during the previous Pavlovian and instrumental conditioning phases (e.g., the pizzeria logo induces us to get some pizza). In contrast, general transfer tests the influence of the CSs on instrumental responses directed towards a different but motivationally similar outcome (e.g., the pizzeria logo induces us to get any kind of food).
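As an illustration of the transfer-phase logic, the outcome-specific effect can be scored as the difference between congruent and incongruent choice rates. The sketch below is purely hypothetical (the trial structure and field names are illustrative, not taken from any specific study):

```python
# Hypothetical sketch of scoring outcome-specific transfer from transfer-phase
# choices. Trial structure and field names are illustrative, not from the paper.
def outcome_specific_score(trials):
    """Difference between the proportions of congruent and incongruent choices.

    A choice is congruent when the outcome associated with the chosen action
    matches the outcome predicted by the presented Pavlovian cue.
    """
    congruent = sum(t["cue_outcome"] == t["chosen_outcome"] for t in trials)
    incongruent = len(trials) - congruent
    return (congruent - incongruent) / len(trials)

trials = [
    {"cue_outcome": "pizza", "chosen_outcome": "pizza"},  # congruent
    {"cue_outcome": "pizza", "chosen_outcome": "chips"},  # incongruent
    {"cue_outcome": "chips", "chosen_outcome": "chips"},  # congruent
    {"cue_outcome": "chips", "chosen_outcome": "chips"},  # congruent
]
score = outcome_specific_score(trials)  # (3 - 1) / 4 = 0.5
```

A positive score indicates that cues bias choices toward the cue-associated outcome, which is what the transfer phase is designed to detect.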

Protocol, registration, and open materials
The present meta-analysis was conducted following the PRISMA guidelines (Page et al., 2021) and was pre-registered on the Open Science Framework (OSF, https://osf.io/3yk5h) before data acquisition, on December 1st, 2021. The datasets and code used in the paper are also freely available on OSF according to the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles (Wilkinson et al., 2016). Pre-registration, data, and materials are available at: https://osf.io/j8sq5/.

Eligibility criteria
All peer-reviewed and unpublished studies (provided by the authors, see below) available in English that met the following PICOS (Participants, Intervention, Comparator group, Outcome, and Study design) criteria were included: 1) Participants: all samples were included regardless of sex, age, education, and clinical or subclinical characteristics of the participants. 2) Intervention: does not apply.
3) Comparator: for outcome-specific transfer, we considered the contrast between congruent (e.g., choosing pizza when the pizza-associated cue was presented) and incongruent (e.g., choosing pizza when the chips-associated cue was presented) responses; for general transfer, we considered the contrast between responses performed during presentation of a conditioned stimulus associated with an outcome (CS+, appetitive or aversive) and a conditioned stimulus associated with no outcome (CS-).
4) Outcomes: means and standard errors of participants' behavioral performance during the transfer phase, or the t- or F-value of the contrasts reflecting either the outcome-specific or the general transfer effect, were used to calculate Hedges' g as the standardized effect size measure. 5) Study design: studies that used a Pavlovian-to-Instrumental transfer task in which it was possible to distinguish between outcome-specific and general transfer effects. Studies that used a single-response PIT paradigm were excluded due to the inability to disentangle the two types of transfer effects.

Information sources
Studies were identified by searching electronic databases, scanning reference lists of articles, and consulting experts in the field. Electronic databases included PubMed and PsycINFO. The ten most active researchers resulting from this initial search were considered experts in the field and were contacted to revise the final list of included papers. These authors were also asked to share unpublished data, if any. Whenever relevant information was not available in a paper, the corresponding author was contacted and asked to provide the missing data. Studies for which it was impossible to obtain the information reported above were excluded.

Search
Electronic databases (PubMed and PsycINFO) were searched using the following keywords: "Pavlovian-to-instrumental transfer" OR "appetitive Pavlovian-to-instrumental transfer" OR "aversive Pavlovian-to-instrumental" OR "cue-triggered decision making" OR "cue-guided decision making", to find publications up until August 2023.

Study selection
Eligibility assessment was performed by 3 reviewers (M.B., L.A.E.D., and D.D.) independently but in a coordinated manner, so that every disagreement could be resolved by consensus. To check the inclusion criteria, an initial screening was performed based on title and abstract inspection. Articles classified as potentially eligible were then fully screened to confirm compatibility with the pre-specified inclusion criteria. Studies that used multiple experimental groups were divided and classified separately. See the flow chart in Fig. 1 for further information.

Data collection process
Data were collected using information provided in the full text or supplementary material for most papers. When data were missing, requests were made to corresponding authors via e-mail. Requests for which we received no or only partial information resulted in the exclusion of the paper from the present meta-analysis. Issues regarding duplicated or partially overlapping samples (i.e., in more than one paper) were resolved by including only the data from the paper with the largest sample size. All data were collected separately by M.B., L.A.E.D., and D.D. and double-checked at the end of data collection. Data were categorized and included in two different databases, based on the type of transfer effect, either outcome-specific or general.

Data items
For each paper, we extracted identifying information (authors, title) together with the moderators analyzed in this meta-analysis. All moderators were decided a priori (see pre-registration), except for the number of trials, the mean sample age, the type of extinction, contingency awareness, and the instructions, which were added following paper inspection and/or the peer-review process.
If studies presented more than one transfer phase, only data related to the first one (or the one with no additional experimental manipulation) were included.

Risk of bias within studies
Studies characterized by biased methods or results may also bias the results of the meta-analysis (Sterne et al., 2001; Thornton, 2000). To avoid this, the risk of bias was evaluated both at the study and at the results level, via a qualitative and a quantitative assessment. The qualitative assessment of the experimental methods and procedures ensured that no study with atypical characteristics was included in the final sample. More specifically, we examined the presence of confounding variables, such as significant task-related modulations, sample characteristics, or atypical analyses or results that may introduce a bias. The quantitative assessment consisted of excluding studies with an effect size above the third interquartile range of the overall effect size distribution.
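The exact cutoff implied by the interquartile-range rule above is ambiguous as stated, so the sketch below assumes a Tukey-style upper fence (Q3 + 1.5 × IQR) as one plausible reading, purely for illustration:

```python
import statistics

# Sketch of a quartile-based exclusion rule for extreme effect sizes. The
# exact cutoff used in the paper is ambiguous, so a Tukey-style upper fence
# (Q3 + 1.5 * IQR) is assumed here purely for illustration.
def upper_fence(effect_sizes, k=1.5):
    q1, _, q3 = statistics.quantiles(effect_sizes, n=4)  # quartile cut points
    return q3 + k * (q3 - q1)

g_values = [0.3, 0.5, 0.6, 0.7, 0.8, 2.5]  # made-up effect sizes
cutoff = upper_fence(g_values)
kept = [g for g in g_values if g <= cutoff]  # 2.5 is flagged as extreme
```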

Summary measures
For outcome-specific transfer, we considered the contrast between congruent (e.g., choosing pizza when the pizza-associated conditioned stimulus was presented) and incongruent (e.g., choosing pizza when the chips-associated conditioned stimulus was presented) responses. For general transfer, we considered the contrast between responses performed during the presentation of a conditioned stimulus associated with an outcome (CS+, either rewarding or aversive) and a conditioned stimulus associated with no outcome (CS-).

Means and standard errors of participants' behavioral performance during the transfer phase, or the t- or F-value of the contrasts reflecting either the outcome-specific or the general transfer effect, were used to calculate Hedges' g as the standardized effect size measure for each study. As compared to the more commonly used Cohen's d, Hedges' g allowed us to avoid possible biases due to low sample sizes (Finke et al., 2021; Lakens, 2013). Hedges' g was interpreted as small (0.2), medium (0.5), or large (0.8) (Cohen, 1988).
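For instance, converting a reported paired-samples t-value into Hedges' g might look like the following sketch. It assumes the within-subject standardization d = t/√n and the common small-sample correction J = 1 − 3/(4·df − 1); the paper's exact conversion formulas are not reproduced here.

```python
import math

# Sketch: Hedges' g from a reported paired-samples t-value, under the
# assumptions stated in the text above.
def hedges_g_from_t(t, n):
    d = t / math.sqrt(n)            # Cohen's d for a within-subject contrast
    j = 1 - 3 / (4 * (n - 1) - 1)   # small-sample correction factor
    return d * j

g = hedges_g_from_t(t=3.0, n=36)  # d = 0.5, slightly shrunk toward zero by J
```

The correction matters most for small samples, which is exactly the scenario the text flags as a source of bias for Cohen's d.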

Risk of bias across studies (publication bias)
Publication bias indicates the presence of skewed results in a meta-analysis due to the absence of non-statistically significant results in the literature or because of biased studies (Sterne et al., 2001; Thornton, 2000). In the present meta-analysis, publication bias was assessed by visual inspection of a funnel plot (Sterne and Egger, 2001), where an asymmetrical distribution would indicate a risk of publication bias. A contour-enhanced method was implemented to provide information about the statistical significance of the studies in the funnel plot (Peters et al., 2008). Egger's regression test (Egger et al., 1997) was also used for a quantitative assessment of the asymmetry in the funnel plot, where a p-value < 0.05 would indicate a statistically significant risk of publication bias.
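The core idea of Egger's test can be sketched as follows: regress the standardized effect (g/SE) on precision (1/SE); an intercept far from zero suggests funnel-plot asymmetry. This is a minimal plain-least-squares sketch on made-up numbers; real analyses also report the intercept's standard error and p-value.

```python
# Minimal sketch of Egger's regression test via ordinary least squares.
def egger_intercept(effects, ses):
    ys = [g / se for g, se in zip(effects, ses)]  # standardized effects
    xs = [1 / se for se in ses]                   # precisions
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx  # intercept of y = a + b * x

# Perfectly symmetric toy case: every study estimates the same effect, so
# the intercept is (numerically) zero.
intercept = egger_intercept([0.6, 0.6, 0.6, 0.6], [0.1, 0.2, 0.3, 0.4])
```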

Multimodel inference
Multimodel inference was used to test the overall impact of several categorical moderators on the effect size and to obtain a relative importance index for each moderator in terms of its contribution across the studies. This computation considered the weight of each moderator in all possible models arising from the combination of the moderators (Giam and Olden, 2016). The Akaike weight (wi) indicated the probability that a model is the best one for the data. Model weights were summed and grouped for each moderator, yielding a probability ranging from 0 to 1 for each moderator. Moderators with wi ≥ 0.5 were considered important. The Akaike information criterion (AIC) was used as an index of fit for the models selected for the multimodel inference; hence, a lower AIC denoted a better model. ΔAIC indicated the difference in AIC between a given model and the best model (i.e., the one with the lowest AIC). Models with ΔAIC < 2 were considered relatively important, whereas models with ΔAIC higher than 2 were considered less important (Burnham et al., 2011; Burnham and Anderson, 2004).
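The Akaike-weight computation described above follows the standard formula w_i = exp(−Δ_i/2) normalized over all candidate models. A minimal sketch (the AIC values are made up):

```python
import math

# Standard Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
# where delta_i = AIC_i - min(AIC).
def akaike_weights(aics):
    best = min(aics)
    rel = [math.exp(-(a - best) / 2) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

weights = akaike_weights([100.0, 102.0, 110.0])
# Per-moderator importance would then be the sum of the weights of every
# model containing that moderator.
```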

Subgroup analysis
Subgroup analysis was used to provide a more detailed explanation of each categorical moderator considered, through a direct comparison of all the subgroups it contained. Subgroup analyses were performed and reported via forest plots using all qualitative moderators (see "2.8 Data items") considered for outcome-specific PIT and general PIT. Qualitative moderators with fewer than 3 observations/samples per subgroup were not considered in the analysis (i.e., phase order, conditioned stimulus, and baseline correction for general PIT). Since some studies reported more than one type of measure for a single sample (i.e., number of responses, reaction times, and/or grip force), the type of measure associated with the highest effect size was selected for all subgroup analyses (except the "type of measure" subgroup analysis) to avoid biases due to duplicated study samples. The subgroup analysis on the type of measure was implemented separately, to retain all samples. Contingency awareness of Pavlovian learning (aware/unaware) was added as an additional subgroup analysis following the assessment of the risk of bias within studies (for more details, see "3.2.5.6 Contingency awareness" in the results section).

Meta-regression
Multiple linear meta-regression was used to investigate the impact of the quantitative moderators (see "2.8 Data items") on the effect size. Only moderators with more than 10 observations were considered. R² was used as an index of the between-studies heterogeneity explained by the moderators. Bubble plots depicting the results of all moderators examined were reported.
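A single-moderator meta-regression can be sketched as weighted least squares with random-effects style weights 1/(SE² + τ²). The paper presumably used dedicated meta-analysis software; this only illustrates the mechanics on made-up data:

```python
# Sketch of a single-moderator meta-regression via weighted least squares.
def weighted_metareg(xs, gs, ses, tau2=0.0):
    ws = [1 / (se ** 2 + tau2) for se in ses]   # random-effects style weights
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    mg = sum(w * g for w, g in zip(ws, gs)) / sw
    slope = sum(w * (x - mx) * (g - mg) for w, x, g in zip(ws, xs, gs)) / sum(
        w * (x - mx) ** 2 for w, x in zip(ws, xs)
    )
    return mg - slope * mx, slope  # intercept, slope

# Made-up data lying exactly on g = 0.2 + 0.01 * x:
intercept, slope = weighted_metareg(
    [20, 30, 40], [0.4, 0.5, 0.6], [0.1, 0.1, 0.1]
)
```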
Results

Study selection
A total of 201 papers were initially selected: 198 were published studies identified through a comprehensive database search; 3 were unpublished studies obtained after sending a request to the most active authors in the field (see above). Inspection by title and abstract led to the exclusion of 133 studies. After full-text inspection, 5 of the 68 remaining studies were excluded because they did not contain the necessary data (a request was sent to all corresponding authors, see above). Out of the 63 remaining papers, 156 samples were identified (5765 subjects). Of these, 121 samples concerned the outcome-specific transfer effect (4529 subjects), and 36 concerned the general transfer effect (1236 subjects). A summary of the study selection process is shown in Figure 1.

Risk of bias within studies
The quantitative assessment suggested the exclusion of two samples (Hogarth et al., 2007; Gamez et al., 2007) due to extreme effect sizes (Figure S1). The qualitative assessment revealed that a small number of studies that included in the analysis only participants categorized as "aware" of the Pavlovian contingencies were systematically associated with higher effect sizes (Hogarth et al., 2007; Jeffs & Duka, 2017; Steins-Loeber et al., 2020; Vogel et al., 2018, 2020). This observation suggested the inclusion of contingency awareness as a moderator for a separate subgroup analysis including only those studies that used this manipulation. Hogarth et al. (2007) could not be included because the data from the unaware group were not available. The average Cohen's d (dav) was used as the effect size measure (Lakens, 2013) for this analysis. For all other subgroup analyses, if data were available, aware and unaware groups were merged to obtain a unique sample comparable to those of other studies that did not adopt such rigorous exclusion criteria.

Results of individual studies
A random effects model was used to analyze the effect size of all samples (Figure 2). To avoid over-representation of a single sample, when more than one type of measure was available for the same sample (N = 12), only the one with the highest effect size was included in this analysis (see "2.14 Subgroup analysis"). We found that the outcome-specific transfer effect was robust across studies, as indicated by a medium pooled effect size (gz = 0.75, 95% CI = [0.68; 0.83]). However, we also found substantial differences in effect size between the studies, as indicated by a medium between-study heterogeneity (τ² = 0.0891, 95% CI = [0.0640; 0.1563], with I² = 70.9%, 95% CI = [64.3%; 76%], p < 0.001). The prediction interval (95% CI = [0.16; 1.35]) indicated that future studies might find high variability in the effect size, from low to very large positive effects. Taken together, the results show medium effect sizes and medium heterogeneity. Such heterogeneity could be primarily due to differences across studies, which we assess in the next sections.
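For reference, a random-effects pooling of this kind (pooled effect, τ², I²) can be sketched with the DerSimonian-Laird recipe. This is a standard estimator; whether the authors used exactly this one is not stated in this excerpt.

```python
# Sketch of a DerSimonian-Laird random-effects meta-analysis.
def dersimonian_laird(gs, ses):
    ws = [1 / se ** 2 for se in ses]                         # fixed-effect weights
    sw = sum(ws)
    g_fixed = sum(w * g for w, g in zip(ws, gs)) / sw
    q = sum(w * (g - g_fixed) ** 2 for w, g in zip(ws, gs))  # Cochran's Q
    df = len(gs) - 1
    c = sw - sum(w ** 2 for w in ws) / sw
    tau2 = max(0.0, (q - df) / c)                            # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0            # heterogeneity share
    ws_re = [1 / (se ** 2 + tau2) for se in ses]             # random-effects weights
    g_pooled = sum(w * g for w, g in zip(ws_re, gs)) / sum(ws_re)
    return g_pooled, tau2, i2

# Perfectly homogeneous toy data: tau^2 and I^2 collapse to zero.
g_pooled, tau2, i2 = dersimonian_laird([0.5, 0.5, 0.5, 0.5], [0.1, 0.2, 0.1, 0.2])
```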

Risk of bias across studies
The funnel plot in Figure 3 shows the relationship between the effect size (Hedges' g) and the relative standard error of the samples included in the meta-analysis. A visual inspection of the funnel plot suggested a possible asymmetry of the sample distribution. This observation is supported by Egger's regression test reporting between-study asymmetry (t(96) = 4.03, p = 0.001). Indeed, some studies, particularly those characterized by higher effect sizes, appear to exert a notable influence in skewing the distribution towards more positive effects. The asymmetry of the sample distribution can be explained by the absence of published studies reporting non-statistically significant results, and by the presence of the small-study effect, observable when smaller studies report a larger effect size than larger ones (Kühberger et al., 2014; Sterne and Egger, 2001; Sutton, 2000).

Multimodel inference
Multimodel inference analysis was performed to test which categorical moderators influence the strength of outcome-specific transfer. The ten models reported in Table 1 showed a good index of fit (ΔAIC < 2). The best model included the moderators coherently associated with the highest wi (Figure 4), namely Phase order (wi = 1.00), Response measure (wi = 0.65), Extinction (wi = 0.63), and Baseline correction (wi = 0.57). Based on these results, we report the subgroup analysis on the four strongest moderators in the main manuscript, while the results of all other moderators are reported in the supplementary materials (Figures S2 to S6).

Subgroups analysis
A summary of all moderators and the relative subgroups is reported in Figure 5.

Extinction
Figure 8 shows the forest plot including all studies sub-grouped based on the type of extinction, namely Nominal (N = 78) and True (N = 12). The Nominal subgroup reported a higher effect size (gz = 0.78, 95% CI [0.69; 0.86]) compared to the True subgroup (gz = 0.57, 95% CI [0.33; 0.80]). The Nominal subgroup reported medium heterogeneity (τ² = 0.0880, I² = 69%, p = 0.25), while the True subgroup (τ² = 0.1045, I² = 78%, p < 0.01) reported high heterogeneity. Cochran's Q test showed a non-statistically significant difference between subgroups (Q(1) = 3.31, p = 0.07). Of note, as evidenced in Figure 5, the Nominal subgroup exhibited higher effect sizes compared to the True subgroup, despite the presence of overlapping CIs.
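To illustrate, a between-subgroup Cochran's Q can be approximated from the two subgroup summaries, with SEs back-derived from the reported 95% CIs as (upper − lower)/3.92. This rough reconstruction is only illustrative and will not exactly reproduce the Q value reported above (3.31), which is computed from the full data:

```python
# Illustrative between-subgroup Q: sum of w_k * (g_k - g_overall)^2 over
# subgroup pooled effects, w_k = 1 / SE_k^2, df = K - 1.
def q_between(pooled, ses):
    ws = [1 / se ** 2 for se in ses]
    overall = sum(w * g for w, g in zip(ws, pooled)) / sum(ws)
    return sum(w * (g - overall) ** 2 for w, g in zip(ws, pooled))

# SEs back-derived from the reported 95% CIs (approximation: width / 3.92).
se_nominal = (0.86 - 0.69) / 3.92
se_true = (0.80 - 0.33) / 3.92
q = q_between([0.78, 0.57], [se_nominal, se_true])  # roughly 2.7 here
```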

Meta-regression
Meta-regression was used to investigate the impact of the quantitative moderators on the effect size. The multiple meta-regression model including the amount of Pavlovian training, the amount of instrumental training, the number of trials during transfer (per CS), the sample size, the mean age, and the year explained less than 1% of the overall heterogeneity (R² = 0.9%), suggesting that the moderators considered in the model did not significantly influence the effect size (all moderators ps > 0.05; see Table S1 and Figure S7).

Risk of bias within studies
The quantitative assessment did not reveal studies with excessively high or low effect sizes. Thus, we assumed that the results were free of evident risk of bias within studies (Figure S8). The qualitative assessment did not reveal any further confounding variables.

Results of individual studies
A random effects model was used to analyze the effect size of all samples (Figure 11). To avoid over-representation of a single sample, when more than one type of measure was available for the same sample (N = 6), only the one with the highest effect size was included in this analysis (see "2.14 Subgroup analysis").

We found a medium effect size (gz = 0.57, 95% CI = [0.46; 0.68]) indicating the presence of a general transfer effect across studies. Between-study heterogeneity was medium (τ² = 0.0545, 95% CI = [0.0198; 0.1199], with I² = 63%, 95% CI = [45.1%; 75%], p < 0.001), indicating the presence of substantial differences in effect size between the studies. The prediction interval (gz = [0.08; 1.06]) indicated that future studies might find high variability in the effect size, from nearly zero to a large positive effect. Taken together, the results show a medium effect size and medium heterogeneity. Such heterogeneity could be primarily due to differences across studies, which will be assessed in the next sections with multimodel inference, subgroup, and meta-regression analyses.

Risk of bias across studies
The funnel plot in Figure 12 shows the relationship between the effect size (Hedges' g) and the relative standard error of the samples included in the meta-analysis. A visual inspection of the funnel plot suggests a possible asymmetry of the sample distribution. This observation is supported by Egger's regression test showing between-study asymmetry (t(28) = 4.84, p < 0.001). Indeed, studies with higher effect sizes appear to be associated with higher standard errors, while lower effect sizes appear to be associated with lower standard errors. The asymmetry of the sample distribution can be explained by the absence of published studies reporting non-statistically significant results, and by the presence of the small-study effect, observable when smaller studies report a larger effect size than larger ones (Kühberger et al., 2014; Sterne and Egger, 2001; Sutton, 2000).

Multimodel inference
Multimodel inference analysis was performed to test which categorical moderators influence the strength of general transfer. The three models reported in Table 2 showed a good index of fit (ΔAIC < 2). The best model included two moderators, but only the Instructions moderator was included in all three models, reporting the highest wi (wi = 0.92, Figure 13). Based on these results, we report the subgroup analysis on the Instructions moderator only in the main manuscript, while detailed results on each moderator are reported in the supplementary materials (Figures S9 to S13).

Subgroups analysis
A summary of all moderators and the relative subgroups is reported in Figure 14.

Instructions
Figure 16 shows the forest plot including all studies sub-grouped based on the type of instructions, namely Naive (N = 18) and CS-relevant (N = 8). The CS-relevant subgroup reported a higher effect size (gz = 0.80, 95% CI [0.64; 0.97]) compared to the Naive subgroup (gz = 0.47, 95% CI [0.34; 0.61]). The CS-relevant subgroup reported low heterogeneity (τ² = 0.0004, I² < 1%, p = 0.55), while the Naive subgroup (τ² = 0.0421, I² = 56.1%, p < 0.01) reported medium heterogeneity. Cochran's Q test showed a statistically significant difference between subgroups (Q(1) = 11.83, p < 0.001). Of note, as evidenced in Figure 15, both subgroups exhibited fairly precise effect sizes, with a higher effect size for the CS-relevant than for the Naïve subgroup. Overall, these results suggest that the general transfer effect is associated with a higher effect size when explicit instructions about the relevance of the CS are provided, compared to when no such instructions are given.

Meta-regression
A multiple meta-regression model including the amount of Pavlovian training, the number of trials during transfer (per CS), the sample size, the mean age, and the year explained about 69% of the overall heterogeneity (R² = 69.21%). Specifically, results showed a negative relationship between the sample size and the effect size (t(17) = -3.52, p = 0.003, r = -0.65, see Figure 16A), a positive relationship between the sample mean age and the effect size (t(17) = 2.48, p = 0.02, r = 0.52, see Figure 16B), and a negative relationship between the number of trials during the transfer phase and the effect size (t(17) = -2.51, p = 0.02, r = -0.52, see Figure 16C). All other moderators did not show a statistically significant relationship with the effect size (all ps > 0.05; see Table S2 and Figure S14).

Discussion
The present meta-analysis provides a quantification of the results obtained in the literature investigating the influence of Pavlovian cues on decision-making, as tested via the Pavlovian-instrumental transfer (PIT) paradigm. The results are discussed here based on four main parameters emerging from the analyses, namely the strength of the effect (effect size; the higher, the stronger), the uncertainty (or precision) of the estimated effect size (95% CI; the narrower, the less uncertain or more precise), the extent to which two or more moderators differ (overlap between 95% CIs; the smaller the overlap, the stronger the difference), and the consistency of the results amongst studies (heterogeneity; the lower, the more consistent) (Borenstein et al., 2017; Hespanhol et al., 2019; Higgins, 2003; Huedo-Medina et al., 2006). Overall, the results confirmed that human behavior can be strongly modulated by environmental stimuli associated with rewards or punishments (i.e., Pavlovian cues), as both outcome-specific and general transfer effects were confirmed as robust phenomena, being associated with medium-to-large effect sizes and reasonably narrow 95% CIs. More precisely, outcome-specific transfer reported a higher effect size than general transfer, and both effects were associated with medium heterogeneity and large prediction interval ranges, indicating that future studies may find from low to very large effect sizes. To determine which features can explain such heterogeneity, we further investigated whether and how the effect size of PIT studies is modulated by factors related to methodology, task, training, learning, and individual aspects. Among these, results showed that methodological and task-related factors played the biggest role, especially in outcome-specific transfer; however, many other moderators also influenced the results, and thus deserve careful consideration to detect strengths and weaknesses of current studies and provide useful guidelines for future ones (see Table 3).

Phase order
The order in which the Pavlovian and instrumental conditioning phases are presented modulated outcome-specific transfer. Of note, comparable average strength and precision of the effect sizes were found between studies having either the Pavlovian or the instrumental as the first phase (Figure 5; overlapping strong effect sizes and reasonably narrow CIs). Nevertheless, studies presenting the Pavlovian phase first reported consistently strong results as compared to studies presenting the instrumental phase first, in which higher heterogeneity was found (Figure 6). Based on this result, we recommend that future studies use a Pavlovian-Instrumental order of phases to study outcome-specific transfer (Table 3). For general transfer, no comparison was possible, as only two studies used a Pavlovian-Instrumental sequence, while all others adopted an Instrumental-Pavlovian sequence.

These results clarify previous conflicting evidence (Cartoni et al., 2016; Holmes et al., 2010). While reviews on human studies considered the phase order to be irrelevant (Cartoni et al., 2016), a meta-analysis of 30 studies on rats found a difficult-to-interpret interaction between the length and the order of the two phases, such that prolonged instrumental training facilitated the transfer effect when Pavlovian conditioning preceded instrumental training but disrupted it when the order of the two phases was reversed (Holmes et al., 2010). The reasons for such an effect remain unclear, but one may venture that either (a) a consolidation of the Pavlovian associations may take place during the subsequent instrumental phase (Van Zessen et al., 2021; Wise, 2004); and/or (b) the continuity between the instrumental phase (no cues, outcomes available) and the transfer phase (cues, no outcomes) may enhance the idea that the decision-making process should now follow a new strategy, as cues are the sole information available to the participant. While both hypotheses may at least partially explain the enhanced stability of the influence of Pavlovian cues on instrumental choices when the Pavlovian learning phase is experienced first, they remain purely speculative. Future studies should address this issue more directly to clarify this effect.

Response measure
In human studies, the transfer effect has been quantified via different types of response measures, such as the number of responses (i.e., the total number or a percentage) or the vigor of the response (i.e., reaction time or grip force). For outcome-specific transfer, two clearly distinct and fairly precise effects were observed (Figure 5): the number of responses was associated with a much stronger, although less consistent, effect size than the vigor of the response, which was instead associated with a consistently low effect size (Figure 7). Based on this result, we recommend that future studies use the number of responses (total or percentage) as the preferred type of measure for studying outcome-specific transfer (Table 3). For general transfer, the presence of overlapping effects and CIs between the two response measures (Figure 14) suggests that there is not enough evidence to conclude whether this factor plays an important role. However, since vigor was associated with a slightly lower effect size but largely more consistent results than the number of responses, we recommend that future studies use vigor (reaction time or grip force) as the preferred type of measure for studying general transfer (Table 3). Taken together, these results resonate with a growing body of research hypothesizing a direct relationship between the response measure (number or vigor) and the different nature of outcome-specific and general transfer. More specifically, outcome-specific transfer is typically associated with the use of a cognitive top-down strategy and is thus better expressed in terms of the number or percentage of responses, which reflects the Pavlovian bias on decision-making in terms of the direction of the response (i.e., choosing more congruent than incongruent responses), whereas general transfer is interpreted as a prevailing motivational effect and is thus better captured by measures of the vigor of the response, such as reaction time or grip force (Degni et al., 2022; Finotti et al., in prep.; Garofalo et al., 2021, 2020, 2019; Garofalo and Robbins, 2017; Hinojosa-Aguayo and González, 2020; Marzuki et al., preprint; Seabrooke et al., 2019; Sommer et al., 2022). The present meta-analysis provides strong support for this interpretation of outcome-specific transfer but only partial support for general transfer. Indeed, although weak differences between the number and vigor of responses were found for general transfer, vigor measures were associated with an effect size twice as large for general transfer (Figure S12) as for outcome-specific transfer (Figure 7), still indexing the possibility of a different mapping of the transfer effects onto specific types of response measures.
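To relate these pooled estimates back to individual studies, a standardized between-condition effect size can be derived from the means and standard deviations that studies typically report (the same quantities listed in our data-extraction protocol). The following is a minimal sketch, not the exact computation used in the present meta-analysis; for fully within-subject PIT designs, the correlation between conditions would also be needed.

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Small-sample-corrected standardized mean difference (Hedges' g)
    between two conditions (e.g., congruent vs. incongruent responses)."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two conditions
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    d = (m1 - m2) / sp          # Cohen's d
    j = 1 - 3 / (4 * df - 1)    # Hedges' small-sample correction factor
    return d * j

# Hypothetical study: 20 participants per condition
print(hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20))  # ≈ 0.98
```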

Extinction
Especially in the non-human literature, the transfer effect has classically been tested under extinction (i.e., no outcome is delivered during the transfer phase) to avoid the confounding effect of the reward on instrumental actions, which could lead to the formation of new associations and/or to response competition (Corbit and Balleine, 2015; Tricomi et al., 2009). In the human literature, this form of "true" extinction has sometimes been replaced by so-called "nominal" extinction, in which participants are led to believe that, although not visible, the outcomes are still being earned. This procedure is supposed to prevent participants from rapidly extinguishing responses across the transfer phase due to the devaluation of the response-outcome association (Hogarth and Chase, 2011). For outcome-specific transfer, the present meta-analysis reported a stronger effect when nominal extinction was used, as compared to true extinction, with partially overlapping CIs and comparable consistency across studies in the two subgroups (Figure 5; Figure 8). For general transfer, the two extinction procedures yielded largely comparable effect sizes and consistency, as well as largely overlapping CIs (Figure 14; Figure S11). Of note, for both outcome-specific and general transfer, results showed a slightly higher precision of the estimate for nominal as compared to true extinction. Taken together, these results suggest that future studies could adopt a nominal extinction procedure to increase either the strength (for outcome-specific) or the precision (for outcome-specific and general) of the transfer effect (Table 3). Nonetheless, a broader reflection on the nature of the effect tested under true as opposed to nominal extinction may be crucial. Under true extinction, one may argue that the cost-effective strategy is simply to stop responding after a while, since no outcome is being collected and the response-outcome contingency is thus gradually degraded. Under nominal extinction, on the other hand, continuing to respond may be worthwhile because, in the absence of any other feedback, the cost-effective strategy could be to follow the only information available, i.e., the CS. This result again speaks in favor of the different nature of the two transfer effects, as it explains how, by indirectly changing the strategy, changing the type of extinction affects outcome-specific transfer, thought of as more strategic, more than general transfer, thought of as more motivationally driven (Degni et al., 2022; Finotti et al., in prep.; Garofalo et al., 2021, 2020, 2019; Garofalo and Robbins, 2017; Hinojosa-Aguayo and González, 2020; Marzuki et al., preprint; Seabrooke et al., 2019; Sommer et al., 2022).

Baseline correction
As is customary in many experimental settings aiming to control for state-dependent factors, some studies in the PIT literature applied a baseline correction, consisting of subtracting the chosen behavioral measure (for example, the number of responses) registered in the absence of a CS (baseline phase) from that registered following CS presentation (transfer phase). For outcome-specific transfer, two clearly distinct and fairly precise effects were observed (Figure 5): not using a baseline correction was associated with a stronger, although less consistent, effect size, whereas using a baseline correction was associated with a consistently lower but still medium-to-large effect size (Figure 9). Based on this result, future studies could adopt two different approaches to the analysis of responses: either using a baseline correction to obtain more consistent results, or not using it to maximize the effect size of the outcome-specific transfer (Table 3). For general transfer, no comparison was possible, as only two studies included in this meta-analysis used a baseline correction. This result may be explained either as an overestimation of the effect in the absence of a baseline correction (i.e., without correcting for an individual baseline rate of responding, the modulation due to CSs is exaggerated) or by a possible ceiling effect occurring during the baseline phase (i.e., if participants' responses reach their maximum capacity during baseline, the observable variability between conditions of the subsequent responses is necessarily limited when subtracting that amount). Nevertheless, the present results do not allow us to disentangle these two possibilities, and future studies may try to answer this question directly.
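As an illustration, the correction amounts to a simple per-CS subtraction. The sketch below uses hypothetical response counts; the function and condition names are ours, not taken from any specific study.

```python
def baseline_correct(transfer_counts, baseline_count):
    """Subtract the response count registered with no CS on screen
    (baseline phase) from the count observed under each CS."""
    return {cs: count - baseline_count for cs, count in transfer_counts.items()}

# Hypothetical data: responses per block during the transfer phase
raw = {"CS_same_outcome": 42, "CS_different_outcome": 25, "CS_neutral": 31}
corrected = baseline_correct(raw, baseline_count=30)
print(corrected)
# {'CS_same_outcome': 12, 'CS_different_outcome': -5, 'CS_neutral': 1}
```

Note that if `baseline_count` sits near a participant's maximum response capacity, the corrected scores are compressed toward zero, which is the ceiling-effect scenario discussed above.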

Type of reinforcer
Another aspect that may affect the transfer effect is the type of reinforcer used. While primary reinforcers have innate (not learned) reinforcing qualities, as they can fulfill a specific biological need (e.g., water, food, sleep, shelter, or pleasure), secondary reinforcers have no inherent value but gain a reinforcing quality via a learned association with a primary reinforcer (e.g., money, grades, or praise) (Schultz, 2006; Sescousse et al., 2013). Our meta-analysis found stronger outcome-specific transfer when primary reinforcers were used, as compared to secondary reinforcers, with clearly separated CIs and good, comparable precision and consistency across both subgroups (Figure 5; Figure S2). For general transfer, overlapping effect sizes and CIs were found between primary and secondary reinforcers (Figure 14; Figure S10), indicating that the type of reinforcer does not modulate this effect, although results were more consistent when secondary reinforcers were used. Based on this result, we recommend that future studies use primary reinforcers if outcome-specific and general transfer are investigated in the same task (using different types of reinforcers for the two effects could alter the results) but prefer secondary reinforcers if general transfer is studied in isolation (Table 3). Taken together, these results support the existence of a dissociation between the mechanisms sustaining outcome-specific and general transfer. While stimuli paired with reinforcers directly associated with a biological need can more effectively induce outcome-specific transfer, this aspect does not seem to play a role in general transfer. This can be explained by the fact that primary reinforcers are strongly characterized by unique sensory-specific properties upon which outcome-specific transfer is known to depend (Cartoni et al., 2016; Holmes et al., 2010; Mahlberg et al., 2021). Humans have clearly distinct sensory representations for chocolate and chips, but less so for two different currencies (e.g., euro/dollar), making it easier to observe outcome-specific transfer in the presence of reinforcers that are openly distinguishable in terms of their sensory properties. General transfer, on the other hand, is classically understood as more motivationally related (Degni et al., 2022; Finotti et al., in prep.; Garofalo et al., 2019, 2020, 2021; Garofalo and Robbins, 2017; Hinojosa-Aguayo and González, 2020; Marzuki et al., 2023; Seabrooke et al., 2019; Sommer et al., 2022) and, as such, less affected by the type of reinforcer, as both primary and secondary reinforcers can equally motivate behavior (Lehner et al., 2017).

Value of the reinforcer
The transfer effect can be influenced by the appetitive (e.g., food, money, drinks) or aversive (e.g., loud noise, electric shock) value of the reinforcer (or outcome) toward which the responses are directed (Hebart and Gläscher, 2015; Nadler et al., 2011). The present meta-analysis revealed that outcome-specific transfer was stronger, although less consistent, when tested with appetitive than with aversive reinforcers (Figure 5; Figure S4), while general transfer was associated with a stronger and more consistent effect size when tested with aversive reinforcers (Figure 14; Figure S9). Nevertheless, in both cases, the presence of wide (especially for aversive studies) and largely overlapping CIs between the two subgroups advises careful consideration of these results and signals the need for additional studies to reach a conclusive interpretation. In line with this, the value of the reinforcer was reported as a weak modulator of both outcome-specific and general transfer according to the multimodel inference analysis.

Conditioned stimulus
While most studies used a Pavlovian conditioning procedure in which an initially neutral visual stimulus (e.g., a fractal, a shape, or a color) becomes a conditioned stimulus (CS) after repeated pairing with a reinforcer, other studies directly used intrinsically motivational stimuli (e.g., pictures of a cigarette or a beer). Of note, in most such cases (21 out of 24 studies), these stimuli were not even paired with a reinforcer during a Pavlovian phase; instead, their motivational value was used "as is". These studies mainly recruited social smokers or drinkers, who are easily triggered by tobacco or alcohol cues, resulting in heightened motivation. The present meta-analysis revealed that outcome-specific transfer was stronger when tested with intrinsically motivational than with conditioned stimuli (Figure 5; Figure S3), but the presence of wide and largely overlapping CIs and comparable consistency between the two subgroups advises careful consideration of these results and signals the need for additional studies to reach a conclusive interpretation. In line with this, the type of CS was reported as a weak modulator of both outcome-specific and general transfer according to the multimodel inference analysis.

Instructions
In the literature, a few pieces of evidence showed that participants' performance during the transfer phase can be influenced by the instructions provided (Hogarth et al., 2014; Mahlberg et al., 2021; Watson et al., 2018). While some studies do not provide specific instructions about the role of the CS in response selection (Naïve, usually adopting the same wording used for the instrumental phase), others emphasize (CS-relevant) or downplay (CS-irrelevant) the importance of the CS for task performance by explicitly instructing participants to remember what they have learned before (e.g., "Based on everything you've learned so far, you should be able to earn points to get the final reward") or not to pay attention to the CS (e.g., "Which picture is presented on the screen is however unimportant and it does not influence your gain"), respectively. More specifically, Hogarth and colleagues (2014) found that highlighting the irrelevance of the CS for reward delivery reduced outcome-specific transfer if participants explicitly abandoned their hierarchical beliefs concerning stimulus-outcome and response-outcome contingencies. This meta-analysis revealed that outcome-specific transfer was similarly stronger when either CS-relevant or CS-irrelevant instructions were used, as compared to Naïve instructions. Nevertheless, the presence of partially overlapping CIs between the three subgroups advises careful interpretation. In line with this, instructions were reported as a weak modulator of outcome-specific transfer by the multimodel inference analysis. For general transfer, on the other hand, instructions were found to be the strongest modulator: a much stronger and more consistent effect was observed when CS-relevant instructions were given, as compared to Naïve instructions. Importantly, clearly distinct and fairly precise effects were observed for the two conditions. Unfortunately, CS-irrelevant instructions were not included in this analysis because of a lack of studies. Taken together, these results indicate that explicitly referring to the role of the CS in the instructions for the transfer phase can increase the observed effect, especially in the context of general transfer. However, whether this is true only for the rather obvious modulation induced by CS-relevant instructions or for both CS-relevant and CS-irrelevant instructions could not be disentangled by the present study. Future studies should address this issue more directly, because the two scenarios have very different but crucial implications for understanding this phenomenon. If only CS-relevant instructions increase the transfer effect, this can simply be attributed to the fact that participants are explicitly nudged to use the CS to inform their response selection. In this scenario, the transfer effect is the mere result of an instructed strategy. If, on the other hand, the same increase in effect size is observed with both CS-relevant and CS-irrelevant instructions, a more general shift of attention toward the CS may be the mechanism at stake.

Sample size
A wide range of sample sizes has been used across the PIT literature, for both outcome-specific (from 8 to 121) and general (from 11 to 100) transfer. Importantly, while the strength of the outcome-specific transfer remained stable across different sample sizes (Figure S7), general transfer was characterized by a strong negative correlation between effect size and sample size, where smaller samples were associated with larger effect sizes and vice versa (Figure 16). The presence of a negative correlation between effect size and sample size can be attributed to a number of different reasons: more controlled laboratory settings, more heterogeneous samples, a lower number of mediating factors, the use of power analysis (i.e., if a large effect size is expected, the power analysis indicates that a small sample size is sufficient to detect it), the use of repeated trials or multiple items (which can decrease the pooled standard deviation and thus inflate the effect size), and/or the malpractice of adaptive sampling (i.e., collecting data until results are statistically significant) (Kühberger et al., 2014; Sterne and Egger, 2001; Sutton, 2000). According to Kühberger and colleagues (2014), the common ground among all these explanations is that a negative correlation between effect size and sample size is indicative of publication bias, that is, the selective presence in the literature of statistically significant results (Sterne et al., 2001; Thornton, 2000). Since results have a better chance of being published if statistically significant, researchers may analyze the data in ways that minimize the danger of obtaining a non-statistically significant result (Simmons et al., 2011). Overall, these results suggest a stable and unbiased estimation of the strength of the outcome-specific transfer, even with small samples, but invite caution in the analysis and interpretation of the general transfer, especially if the sample size is small.
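The sign of this relationship is straightforward to check on extracted study-level data. A sketch with made-up numbers follows (Pearson correlation implemented directly, so no external dependencies; a rank-based correlation or Egger's regression would be the more formal choice):

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical study-level data: sample sizes and observed effect sizes
ns = [12, 20, 30, 50, 80]
gs = [0.90, 0.70, 0.50, 0.35, 0.30]
print(pearson_r(ns, gs))  # strongly negative: smaller studies report larger effects
```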

Pavlovian and Instrumental training
Previous evidence, mainly originating from non-human studies, reported mixed results concerning the influence of the number of Pavlovian and instrumental trials on the transfer effect. More specifically, such an influence was reported only in interaction with other factors that complicated the interpretation of these results, like the reinforcement schedule (Cartoni et al., 2016) or the order of the training phases (Cartoni et al., 2016). The reason why such a relationship has been investigated in the literature is usually related to the idea that long sequences of repetition of an associative learning task can alter the nature of the learned response itself. For example, overtraining an instrumental response can turn a goal-directed action into a habitual one (Everitt and Robbins, 2016, 2005), such that the performance of the instrumental behavior becomes more automatic and less dependent on the fact that performing the instrumental action will indeed lead to the goal (i.e., action-goal contingency) and on the motivation to achieve the goal. The present meta-analysis found important differences across studies in the number of trials implemented. For Pavlovian conditioning, these ranged from 2 to 26 per CS for outcome-specific transfer and from 8 to 40 per CS for general transfer. For instrumental conditioning, these ranged from 4 to 75 per response for outcome-specific transfer and from 4 to 30 per response for general transfer. Despite such wide ranges, the number of trials performed during the Pavlovian (Figures S7 and S14) and instrumental (Figures S7 and S14) conditioning phases did not show a significant relationship with the effect size of either transfer effect.

Pavlovian and Instrumental learning
In the present meta-analysis, important insight into the role of learning factors in the transfer effect came from assessing the risk of bias within studies, which revealed that a small number of studies, systematically associated with larger effect sizes, selected participants based on their level of awareness of the Pavlovian contingencies (Hogarth et al., 2007; Jeffs and Duka, 2017; Steins-Loeber et al., 2020; Vogel et al., 2020, 2018). These studies applied a rigorous contingency awareness criterion, including in the final analyses only participants showing correct learning in the final block of Pavlovian conditioning (defined as "aware" participants), while excluding all others (defined as "unaware" participants). Surprisingly (given the simplicity of the task, as usually only a very low number of contingencies has to be learned), this exclusion criterion resulted in the dismissal of up to half of the original sample from the final analyses reported in these papers. Given this peculiarity relative to the rest of the literature, these studies were analyzed in a separate meta-analysis aiming at a direct comparison between aware and unaware participants. The results revealed a very strong difference between aware and unaware participants (Figure 10), with aware participants showing a transfer effect about five times stronger, although largely inconsistent, than unaware participants, with largely separated CIs. This result suggests that awareness of the Pavlovian contingencies can have a strong impact on the transfer effect and is in line with previous studies reporting that contingency awareness during Pavlovian training is necessary for the transfer effect to manifest (Hogarth et al., 2006; Hogarth and Duka, 2006). Nevertheless, a potential bias in participant selection based on such stringent inclusion criteria cannot be excluded. Future studies should clarify this issue and shed new light on the role of learning in the transfer effect by collecting and reporting more precise measures of Pavlovian and instrumental learning, ideally on a trial-by-trial basis and related to both explicit (direct questions that test the acquisition or the expectancy of stimulus-outcome and response-outcome contingencies) and implicit (psychophysiological measures of anticipation and arousal, such as skin conductance or pupillometry) indexes of learning. Unfortunately, despite our best efforts, we could not find these measures in the papers included in the present meta-analysis, as they are rarely reported or even collected in the PIT literature.

Transfer
The number of trials performed during the transfer phase showed no relationship with outcome-specific transfer (min 2, max 80) and a negative correlation with general transfer (min 4, max 30), such that the general transfer effect decreased as the number of trials increased. This result argues again in favor of the separate nature of the two transfer effects. While outcome-specific transfer, being more strategic, is not affected by the length of the transfer phase even under extinction, these same circumstances can instead decrease the motivation necessary to sustain the general transfer effect (Degni et al., 2022; Garofalo et al., 2021, 2020, 2019; Garofalo and Robbins, 2017; Hinojosa-Aguayo and González, 2020; Seabrooke et al., 2019; Sommer et al., 2022). Of course, further studies are needed to directly address this hypothesis.

Population
The impact of psychological and psychiatric disorders on the transfer effect is still debated (Garbusow et al., 2022). While it has been widely recognized that exposure and sensitivity to substance-related stimuli represent a major cause of relapse and maintenance of addiction, driven by a "wanting" dominance over "liking" (Robinson and Berridge, 2001), some theories also highlight how the interplay between Pavlovian and instrumental learning processes can determine a transition from goal-directed to habitual behavior, and from habitual to compulsive conduct, which may induce the enaction of behaviors even when we are aware of their detrimental consequences (Everitt et al., 2001; Everitt and Robbins, 2016, 2005). Moving from these premises, most of the PIT literature on clinical populations has focused on substance addiction (i.e., drug, tobacco, or alcohol; 14 samples included, investigating outcome-specific transfer only), with fewer studies on obsessive-compulsive disorder (OCD, 3 samples included), eating disorders (Lehner et al., 2017; Vogel et al., 2020), or schizophrenia (Morris et al., 2015). These last two populations, as well as addiction for general transfer (van Timmeren et al., 2020), were not included in the present meta-analysis because of the small amount of data available. For outcome-specific transfer, the results showed that while addiction was associated with a stronger transfer effect than the healthy population (with comparable consistency), studies on OCD were consistently associated with a weaker transfer effect than the healthy population (Figure 5; Figure S2). Nevertheless, the lower precision of the estimated effect size in the two clinical subgroups and the partially overlapping CIs between the subgroups suggest interpreting these findings with caution. For general transfer, largely overlapping effect sizes and CIs were found between OCD and healthy people, with comparable moderate consistency but very low precision for the OCD studies (Figure 15; Figure S9), thus generally constituting inconclusive evidence. Of note, the extremely high variability of the effect in OCD is probably also due to the small number of samples available (2 studies, 3 samples). In line with this, population was reported as a weak modulator of both outcome-specific and general transfer according to the multimodel inference analysis. Overall, these results indicate the need for additional data on all clinical populations to reach more conclusive evidence, especially regarding the OCD population.

Age
The relationship between age and the transfer effect has been poorly investigated in the PIT literature. While some studies (Alarcón and Bonardi, 2020) investigated the transfer effect with a specific focus on children (7-11 years old), reporting the presence of both outcome-specific and general transfer in this population, to the best of our knowledge, only one preprint study (Marzuki et al., preprint) performed a direct comparison between age groups, reporting that adolescents (16 years old, on average) show reduced outcome-specific but comparable general transfer relative to a group of older adults (42 years old, on average). The present meta-analysis tried to shed new light on this issue by testing the relationship between the average age of each study sample and the transfer effect. Results revealed a positive relationship between age and general transfer, with older age predicting a stronger transfer effect (Figure 16), but no relationship between age and outcome-specific transfer (Figure S5). However, this result does not constitute definitive evidence, for several reasons: first, the results reported here contrast with the only study currently present in the literature (Marzuki et al., preprint); second, the relationship between age and general transfer seems to be mainly driven by four samples (Marzuki et al., preprint; van Timmeren et al., 2020), two of which are from clinical populations (obsessive-compulsive disorder and alcohol use disorder); third, the mean age was not reported in many studies (30 out of 112 samples for outcome-specific transfer and 6 out of 36 samples for general transfer); fourth, in general, younger and older age groups were scarcely represented, since the average age in most studies was between 20 and 25 years. Thus, evidence about the relationship between age and the transfer effect is still scarce, and future studies should investigate this relationship directly to provide more solid evidence.

Limitations
The first limitation of the present study is the exclusion of the literature concerning single-response PIT tasks (usually implemented as go/no-go paradigms). Nevertheless, after careful consideration, we deemed these beyond the purpose of the present meta-analysis for several reasons, some of which were reported in the introduction of this paper. Additional reasons include: a) the impossibility of distinguishing between outcome-specific and general transfer; b) the absence of agreement in the literature on whether and how single-response and multiple-response PIT studies are comparable (Cartoni et al., 2016; Holland, 2004; Xia et al., 2019; but see also Watson and Mahlberg, 2023); and c) the presence of several technical differences that make the two experimental setups hard, or even wrong, to compare (Xia et al., 2019), like the simultaneous presence of wins/losses and of approach/avoidance responses (which ends up involving inhibitory processes; see Mirabella et al., 2008), the absence of selection between alternatives, and so on. Given the presence of so many differences between the two paradigms, not including single-response studies is the most sensible choice to avoid significant biases in the results due to insufficient homogeneity among the samples, which is a common and critical issue in the meta-analytic approach (Eysenck, 1994). Another limitation is that studies were not weighted based on their quality. High quality would mean that a study likely underwent a more thorough manuscript inspection and faced higher publication requirements. However, while this is usually accounted for by including the journal as a moderator, high-impact-factor journals do not necessarily ensure more accurate results (Fang and Casadevall, 2011); therefore, we decided not to include it. Lastly, considering the potential presence of a publication bias in both outcome-specific and general transfer effects, the results may be influenced by small-study effects or by unpublished studies that were not included (Sterne et al., 2001; Thornton, 2000).

Conclusions and guidelines for future studies
In essence, meta-analysis serves as a powerful tool for advancing scientific knowledge and informing evidence-based research and practice. By leveraging the collective insights gathered from a body of research, meta-analytic findings offer invaluable guidance for optimizing experimental design, enhancing result consistency, and informing sample size determination. Moving forward, adherence to the methodological recommendations derived from meta-analyses will facilitate the generation of robust and reproducible research findings, ultimately contributing to the advancement of scientific inquiry and the improvement of clinical practice. By synthesizing a multitude of studies, the present meta-analysis not only provides a comprehensive overview of the literature on human Pavlovian-to-Instrumental transfer but also offers valuable insights to improve the quality and robustness of future experimental designs. Future studies can indeed optimize the strength of the effect and/or the consistency of their results by adhering to the methodological recommendations extensively discussed above and summarized in Table 3. These include standardizing experimental procedures, enhancing control over confounding variables, and implementing rigorous statistical analyses. These results also highlight the importance of transparently reporting the instructions and extinction procedures used during the experiment, and of not underestimating the influence they may have on the nature and strength of the transfer effect. Moreover, the present results estimate the effect size of both outcome-specific and general transfer across a diverse array of studies, thus providing researchers with a benchmark for evaluating the magnitude of the effects and for conducting power analyses to determine the appropriate sample size for future investigations (Table 4).
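As one concrete use of such benchmarks, the sample size required for a one-sample or paired design can be approximated from a target effect size with the standard normal-approximation formula. This is a rough sketch, not a replacement for exact methods: it slightly underestimates the n given by noncentral-t procedures such as those in G*Power.

```python
import math
from statistics import NormalDist

def required_n(effect_size, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample/paired test via
    n ≈ ((z_(1-alpha/2) + z_power) / d)^2 (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)

# E.g., for a hypothetical benchmark effect of d = 0.5:
print(required_n(0.5))  # 32 (exact t-based methods give ~34)
```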
The left columns depict the first author, the year of publication, the sample size, and the weight for each study. The right columns display the effect size, the corresponding 95% confidence interval, and the standard error for each study. The central plot illustrates the effect size and the standard error for each study. Below, the effect size (with the relative 95% confidence interval) of the random-effects model, the prediction interval, and the heterogeneity indexes are indicated. The letters following the year of publication indicate different samples from the same study.

- Year
- Means and standard deviation (for each condition considered, see PICOS in the "2.3 eligibility criteria" section)
- F-value, t-value, degrees of freedom, p-value (for each contrast considered, see PICOS in the "2.3 eligibility criteria" section)
- Methodological and task-related factors
  o Phase order (Pav-Inst/Inst-Pav)
  o Response measure (number/vigor of responses during transfer phase)
  o Extinction (nominal/true)
  o Baseline correction (correction/no correction)
  o Type of reinforcer (primary/secondary)
  o Value of reinforcer (appetitive/aversive)
  o Conditioned stimulus (neutral/motivational)
  o Instructions (Naïve/CS-relevant/CS-irrelevant)
  o Sample size
- Training and learning factors
  o Amount of Pavlovian training (i.e., number of trials per CS)
  o Amount of Instrumental training (i.e., number of trials per outcome)
  o Number of transfer trials (per CS)
  o Contingency awareness (aware/unaware)
- Individual factors
  o Population (healthy/addiction/OCD)
  o Age (average age of the sample)

Figure captions

Figure 1:

Figure 2: Forest plot for the outcome-specific transfer. The left columns depict the first author, the year of publication, the sample size, and the weight for each study. The right columns display the effect size, the corresponding 95% confidence interval, and the standard error for each study. The central plot illustrates the effect size and the standard error for each study. Below, the effect size (with the relative 95% confidence interval) of the random-effects model, the prediction interval, and the heterogeneity indexes are provided. The letters following the year of publication indicate different samples from the same study.
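The pooled estimate and heterogeneity indexes reported in these forest plots come from a random-effects model. A minimal DerSimonian-Laird sketch (with invented effect sizes and sampling variances, not data from the included studies) illustrates how the pooled effect, its 95% CI, tau², and I² are obtained.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooling: returns (pooled effect, 95% CI, tau^2, I^2)."""
    w = [1 / v for v in variances]                       # fixed-effect weights
    ybar = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - ybar) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                        # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # heterogeneity (%)
    w_re = [1 / (v + tau2) for v in variances]           # random-effects weights
    mu = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return mu, (mu - 1.96 * se, mu + 1.96 * se), tau2, i2

# Four hypothetical studies (standardized effects and sampling variances)
mu, ci, tau2, i2 = dersimonian_laird([0.5, 0.3, 0.8, 0.1], [0.04, 0.09, 0.05, 0.02])
print(round(mu, 3), round(i2, 1))
```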

Figure 3: Contour-enhanced funnel plot for the outcome-specific transfer. The standard error is reported on the y-axis and the effect size on the x-axis. Purple dots indicate each sample included in the analysis. The vertical line indicates the average effect size estimated by the random-effects model, and the oblique line indicates the predicted distribution based on effect size and standard error. Background colors indicate the area of statistical significance, as shown in the upper-right legend.

Figure 4: Akaike weights for the moderators considered for the outcome-specific transfer. The y-axis represents the moderators considered. The x-axis represents the w_i of each moderator. The vertical red line indicates the cutoff used to define the importance of moderators (w_i > 0.5).

Figure 5: Summary of moderators' effect size and 95% CI for the outcome-specific transfer. The x-axis indicates the mean effect size with 95% confidence intervals (CI). The y-axis indicates moderators and subgroups. The continuous line indicates the overall average effect size. The dotted line indicates zero.

Figure 6: Forest plot of the "Phase order" moderator. "Pav-Inst" and "Inst-Pav" are reported as separate subgroups. The left columns depict the first author, the year of publication, the sample size, and the weight of each study. The right columns display the effect size, the corresponding 95% confidence interval, and the standard error for each study. The central plot illustrates the effect size and the standard error for each study. Below, the effect size (with the corresponding 95% confidence interval) of the random-effects model, the prediction interval, and the heterogeneity indexes are indicated. The letters following the year of publication indicate different samples from the same study.

Figure 7: Forest plot for the "Response measure" moderator. "Number" and "Vigor" are reported as separate subgroups. The left columns depict the first author, the year of publication, the sample size, and the weight for each study. The right columns display the effect size, the corresponding 95% confidence interval, and the standard error for each study. The central plot illustrates the effect size and the standard error for each study. Below, the effect size (with the corresponding 95% confidence interval) of the random-effects model, the prediction interval, and the heterogeneity indexes are indicated. The letters following the year of publication indicate different samples from the same study.

Figure 8: Forest plot for the "Extinction" moderator. "Nominal" and "True" are reported as separate subgroups. The left columns depict the first author, the year of publication, the sample size, and the weight for each study. The right columns display the effect size, the corresponding 95% confidence interval, and the standard error for each study. The central plot illustrates the effect size and the standard error for each study. Below, the effect size (with the corresponding 95% confidence interval) of the random-effects model, the prediction interval, and the heterogeneity indexes are indicated. The letters following the year of publication indicate different samples from the same study.

Figure 9: Forest plot for the "Baseline correction" moderator. "No correction" and "Correction" are reported as separate subgroups. The left columns depict the first author, the year of publication, the sample size, and the weight for each study. The right columns display the effect size, the corresponding 95% confidence interval, and the standard error for each study. The central plot illustrates the effect size and the standard error for each study. Below, the effect size (with the corresponding 95% confidence interval) of the random-effects model, the prediction interval, and the heterogeneity indexes are indicated. The letters following the year of publication indicate different samples from the same study.

Figure 10: Forest plot for contingency awareness, with the subgroup analysis divided into "Aware" and "Unaware" groups. The left columns depict the first author, the year of publication, the sample size, and the weight for each study. The right columns display the effect size, the corresponding 95% confidence interval, and the standard error for each study. The central plot illustrates the effect size and the standard error for each study. Below, the effect size (with the corresponding 95% confidence interval) of the random-effects model, the prediction interval, and the heterogeneity indexes are indicated. The letters following the year of publication indicate different samples from the same study.

Figure 11: Forest plot for the general transfer. The left column contains the label of the first author of the study and the number of subjects for the samples included in the meta-analysis. The central and right columns contain the standardized mean difference (Hedges' g), the corresponding 95% confidence interval, and the corresponding standard error.

Figure 12: Contour-enhanced funnel plot for the general transfer. The standard error is reported on the y-axis and the effect size on the x-axis. Purple dots indicate each sample included in the analysis. The vertical line indicates the average effect size estimated by the random-effects model and the oblique line indicates the predicted distribution based on effect size and standard error. Background colors indicate the areas of statistical significance, as shown in the upper-right legend.

Figure 13: Akaike weights for the moderators considered for the general transfer. The y-axis represents the moderators considered. The x-axis represents the w_i of each moderator. The vertical red line indicates the cutoff used to define the importance of moderators (w_i > 0.5).

Figure 14: Summary of moderators' effect size and 95% CI for the general transfer. The x-axis indicates the mean effect size with the 95% CI of the studies for each moderator. The y-axis indicates moderators and subgroups. The continuous line indicates the overall average effect size. The dotted line indicates zero.

Figure 15: Forest plot for the "Instructions" moderator. "Naïve", "CS-relevant", and "CS-irrelevant" are reported as separate subgroups. The left columns depict the first author, the year of publication, the sample size, and the weight for each study. The right columns display the effect size, the corresponding 95% confidence interval, and the standard error for each study. The central plot illustrates the effect size and the standard error for each study. Below, the effect size (with the corresponding 95% confidence interval) of the random-effects model, the prediction interval, and the heterogeneity indexes are indicated. The letters following the year of publication indicate different samples from the same study.

Figure 16: Meta-regression between moderators and effect size in the general transfer. The bubble plots show A) the sample size, B) the mean age, and C) the number of transfer trials (per CS) on the x-axes, with the effect size of the studies on the y-axes. The size of the dots indicates the weight of the studies, where larger dots represent larger weights, as shown in the legend on the right.