Implementing practices focused on workplace health and psychological wellbeing: A systematic review

Rationale: Workplace health and wellbeing practices (WHWPs) often fail to improve psychological health or wellbeing because of implementation failure. Thus, implementation should be evaluated to improve the effectiveness of WHWPs. Objective: We conducted a systematic review to identify critical success factors for WHWP implementation and gaps in the evidence. Doing so provides a platform for future theoretical development. Methods: We reviewed 74 separate studies that assessed the implementation of WHWPs and their effects on psychological health or psychological wellbeing. Most studies were from advanced industrial Western democracies (71). Intervention types included primary (e.g., work redesign, 37 studies; and health behavior change, 8 studies), secondary (e.g., mindfulness training, 11 studies), tertiary (e.g., focused on rehabilitation, 9 studies), and multifocal (e.g., including components of primary and secondary, 9 studies). Results: Tangible changes preceded improvements in health and wellbeing, indicating intervention success cannot be attributed to non-specific factors. Some interventions had beneficial effects through mechanisms not planned as part of the intervention. Three factors were associated with successful WHWP implementation: continuation, learning, and effective governance. Conclusions: The review indicates future research could focus on how organizations manage conflict between WHWP implementation and existing organizational processes, and the dynamic nature of organizational contexts that affect and are affected by WHWP implementation. This systematic review is registered [PROSPERO: the International Prospective Register of Systematic Reviews ID: CRD42019119656].


Introduction
Workplace health and wellbeing practices (WHWPs) are classified (LaMontagne et al., 2007; Richardson and Rothstein, 2008) according to whether their target is preventing ill-health or poor wellbeing (primary prevention; e.g., work redesign or health promotion), providing skills for healthy individuals to manage exposure to risk (secondary prevention; e.g., resilience training), or rehabilitation (tertiary intervention; e.g., talking therapies). Although WHWPs can be effective (LaMontagne et al., 2007), implementation factors influence their effectiveness (Egan et al., 2009).
Implementation is "the dynamic process of adapting the program to the context of action while maintaining the intervention's core principles" (Herrera-Sánchez et al., 2017:4). No systematic review has yet integrated research on WHWP implementation across all forms of WHWP and related implementation to intervention outcomes. Prior systematic reviews have focused on variables used in research (Havermans et al., 2016; Wierenga et al., 2013), on specific kinds of WHWP (e.g., return to work interventions, Hoefsmit et al., 2012; see also Moran et al., 2014; Murta et al., 2007; Rojatz et al., 2016), on a specific implementation issue (managers' support for interventions, Passey et al., 2018), and on the rigor of WHWP intervention studies (Burgess et al., 2020). A scoping review focused on identifying gaps between research and practice (Rasmussen et al., 2018).
Conceptual and narrative reviews of WHWP implementation have developed frameworks to guide researchers and practitioners. In Table 1, we propose a typology of these frameworks. We identified five types, which can be divided into frameworks to evaluate factors that influence intervention effectiveness (implementation, appraisal, and realist frameworks) and models of best practice (best practice models and a sub-set focused on regulatory compliance). Implementation frameworks focus on providing guidance on implementation, what should go into a successful intervention, and segmentation of interventions into planned phases. Appraisal frameworks focus on the design of evaluation studies and include checklists of factors that support intervention effectiveness. Realist evaluation, specifically Pawson's notion of Context, Mechanisms, and Outcome (CMO) configurations (Pawson and Manzano-Santaella, 2012), represents a methodology for describing how complex interventions work (Greenhalgh, 2014). Best practice and regulatory compliance models prescribe that WHWPs should consist of planned stages of activities.
A limitation in the literature is the lack of theoretical or conceptual bases for research on WHWP implementation (Biron & Karanika-Murray, 2014; Burgess et al., 2020; Martin et al., 2016; Nielsen, 2013). Without a comprehensive mapping of research on how WHWP implementation affects WHWP outcomes, it is not possible to know the empirical regularities that can provide a basis for theoretical development, unknowns requiring empirical investigation, and ambiguities requiring theoretical resolution. The objectives of this systematic review are to identify critical success factors for WHWP implementation and gaps in the evidence. Doing so provides a platform for future theoretical development.
We reviewed studies that assessed components of psychological wellbeing (e.g., affective and eudaimonic wellbeing, Waterman, 1993). The focus on psychological wellbeing reflects that many WHWPs target and have benefits for psychological wellbeing (LaMontagne et al., 2007), and that improvements in physical health provide psychological benefits (Steptoe et al., 2015). Focusing on psychological wellbeing enables inclusive and comprehensive coverage of WHWPs, compared to focusing on interventions for specific health conditions. Therefore, our review is focused on studies that report on the implementation and effects on psychological wellbeing of the full range of WHWPs (primary, secondary, and tertiary), regardless of the intended focus of the intervention. We included interventions focused on improving health and wellbeing directly (e.g., health promotion) or indirectly through changes to the work environment (e.g., enhancing managerial skills).

Table 1
Typology of frameworks (only the following cell text survives extraction).
Requires accumulation of body of empirical evidence of theorized configurations in order to make theoretical progress.
Refinement and development required of staged models, mapping and assessment techniques, in practice settings.
Call for application of guidance by organizations.
Citations: Biron and Karanika-Murray, 2014; Fridrich et al., 2015; [...], 2017; Nielsen & Randall, 2013; Havermans et al., 2016; Egan et al., 2009; Hoefsmith et al., 2012; Moran et al., 2014; Murta et al., 2007; Passey et al., 2018; Rojatz et al., 2016; Wierenga et al., 2013

Methods
The review protocol followed the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P, Shamseer et al., 2015).

Criteria for inclusion and exclusion
The PICOS framework guided the development of search terms and inclusion/exclusion criteria (population, intervention, comparison, outcomes, and study design, Shamseer et al., 2015, see protocol for search terms). Fig. 1 shows the bibliographic databases searched.
Population. Studies of working adults or sick-listed workers, including employees and the self-employed. We placed no restrictions on occupational sector or country.
Intervention. Factors involved in WHWP implementation. We took a broad approach, including primary interventions focused on work redesign, primary interventions focused on health behavior, secondary interventions, tertiary interventions, and multifocal interventions combining features of other intervention types.
Comparison. Studies assessing markers of psychological health and wellbeing, enabling comparisons between interventions that improved indicators, those with no effects and those with adverse effects. Where studies used other health indicators, these were considered (e.g., health behaviors).
Outcomes. Primary outcomes were factors influencing WHWP implementation. Formal process evaluations and other studies were included that provided data, for example, on how interventions were adapted and/or stakeholder actions involved in implementing or resisting the intervention. Studies that just reported on the effectiveness of an intervention without considering its implementation were excluded. Secondary outcomes were changes in psychological wellbeing indicators (as defined above). Studies needed to include both primary and secondary outcomes.
Study design. Qualitative, quantitative, and mixed-methods studies with a longitudinal element were included (randomized controlled trial, non-equivalent control group design, and pre-test/post-test only).
Other. We only included empirical studies published in peer-reviewed journals. We focused on peer-reviewed research because there are sufficient data within the peer-reviewed literature to answer the research questions and peer review provides assurance of quality and rigor. We searched English language databases only, but did include articles published in other languages. We included studies from 2009 onwards, because such studies tend to use more rigorous methodologies and incorporate findings from previous research.

Study selection
Searches identified 18,011 titles. At least two independent reviewers coded the papers at every stage. At initial title-sifting, a paper moved to abstract sifting if at least one reviewer thought it met the inclusion criteria. Abstracts moved to full-text screening and then to data synthesis, if both reviewers thought the paper met the inclusion criteria. Disagreements were resolved through discussion. Average agreement between reviewers exceeded 77% (Cohen's kappa ≥.30) at each stage of sifting, figures that justify using two reviewers for each title, the inclusive approach to sifting adopted, and resolving disagreements through discussion. Seventy-four unique interventions were included in the review, represented in 86 separate papers. Fig. 1 summarizes the sifting process.
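The gap between percentage agreement and Cohen's kappa reported above can be illustrated with a minimal sketch. The reviewer decisions below are hypothetical and are not taken from the review's sifting data; the function name and the handling of the degenerate case are likewise assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return (percent agreement, Cohen's kappa) for two raters' labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / n**2
    if expected == 1.0:
        # Assumption: kappa is treated as undefined when both raters
        # used a single identical label throughout.
        return observed, float("nan")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical sifting decisions for 100 titles, most of them excluded.
reviewer_1 = ["include"] * 10 + ["exclude"] * 90
reviewer_2 = (["include"] * 4 + ["exclude"] * 6
              + ["include"] * 6 + ["exclude"] * 84)

agreement, kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

When most items fall into one category (here, exclusion), chance agreement is high, so a high raw agreement can coexist with a modest kappa. This is one reason both figures are worth reporting at the sifting stage.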

Data extraction
Prior to full sifting, we piloted and modified data extraction sheets.
Two review team members independently extracted data from each study to ensure comprehensive coverage of relevant data. We undertook additional searches to find papers that contained data on intervention effectiveness if such data were not included in the papers reviewed. We extracted data from 31 additional sources, leading to a total of 117 papers that described the 74 studies.

Synthesis
We developed a coding frame from prior systematic reviews and frameworks that list factors associated with facilitating or impeding the implementation of WHWPs (Table 1 and Introduction provide exemplary citations). We refined the coding frame by reading the papers included in the review and through interviews (N = 42) with various organizational stakeholders (occupational health and human resources practitioners, senior managers, and front-line workers). We double-coded a random sample of ten papers and modified the coding frame for consistent application prior to interpretation and synthesis. A random sample of a further ten papers was double-coded with the revised frame, revealing consistency in classifying intervention type (κ = 1), effectiveness (κ = 0.78, 90% agreement), making of changes (κ = 1), and coding of contextual features (κ = 0.82, 89% agreement). Discrepancies were discussed, and the first coder's interpretation was deemed credible. To further ensure robustness of data synthesis, all authors checked the synthesis of the data and its interpretation across multiple iterations. Table 2 summarizes the coding frame.
Table 2
Coding structure.

Code: Description

Intervention: Primary work redesign focused; Primary health behavior focused; Multifocal (e.g., secondary + primary work redesign); Secondary; Tertiary.

Benefits
Beneficial: Demonstrable effectiveness on at least one health/wellbeing marker (and no adverse effects) between control and intervention conditions (direct effects shown).
Contingently beneficial: No demonstrated effectiveness on any health/wellbeing marker (and no adverse effects) between control and intervention conditions, but changes in at least one health/wellbeing marker for sub-groups (moderation) or in conditions where the intervention was implemented (effects transmitted through a mediator that is a marker of intervention implementation or intervention mechanisms).
Non-beneficial: Null or adverse effects. One adverse effect in the presence of other improvements in health/wellbeing is classified as non-beneficial, although such cases should be flagged in the analyses.

Changes made (mechanisms activated): Changes made, not made or not made as intended (e.g., wellbeing related roles, wellbeing related Human Resources, wellbeing related education, job quality, physical environment, tangible wellbeing resources) to activate mechanisms (or not) that explain changes in wellbeing. Mechanisms can be intended: the intervention worked according to the theoretical principles of intervention (e.g., a work redesign intervention evidences changes in job quality linked to changes in wellbeing). Mechanisms can be unintended: evidence the mechanisms worked according to some process not anticipated (e.g., a health promotion intervention evidences changes in social relationships linked to wellbeing, rather than changes in health behaviors). Negative mechanisms: unintended mechanisms producing adverse effects (e.g., a health promotion intervention encourages competition between work teams, leading to deteriorating social relationships).

Omnibus context
External omnibus context: External shocks (e.g., financial crash) or a range of other external facilitators/inhibitors (e.g., labor market conditions).
Internal omnibus context: Factors internal to the organization not directly related to the intervention, including shocks (e.g., takeovers), competing priorities/logics, organizational capability/capacity (e.g., availability of resources).

Discrete context
Organizational culture/political factors: Evidence of changing rituals and routines for symbolic purposes (e.g., middle manager stress management training, which may serve as a signal to others); evidence of narratives relating wellbeing to organizational values; evidence of symbolic involvement of senior managers and decisions to invest effort/funds; evidence of use of power to influence the intervention.
Governance/delivery structures: Co-ordination and management of intervention activities, including factors such as presence of a steering committee, assigned responsibility for wellbeing and intervention implementation, who is represented in the governance structures, level of planning and program theory guiding the intervention, use of evidence-based practice, embedding wellbeing initiatives in a strategy.
Sequencing: Planned order of events/activities (e.g., prescribed order of assessment, decision, intervention, evaluation).
Continuity: Perseverance in implementation efforts, local adaptations, embedding practices into everyday activities.
Learning structures: Procedures for capturing learning from implementation for adaptation and/or capacity/capability building.
Service/service provider characteristics: Features of the intervention (e.g., novelty) or the people implementing aspects of the intervention at an operational level (e.g., training delivery). Relates to perceptions/attitudes/expectations and behaviors including commitment, value placed on health/wellbeing, beliefs on responsibility for health/wellbeing, denial/withdrawal from intervention, diffidence about health/wellbeing, passive and active resistance to intervention, competence/capacity/capability for implementation, passive or proactive engagement in intervention.
Worker dispositions: Dispositions of recipients of the intervention. Examples the same for service provider characteristics.
Line/middle manager dispositions: Dispositions of immediate managers of the recipients or other managers whose day-to-day work may affect the intervention implementation. Examples the same for service provider characteristics.
Senior manager dispositions: Dispositions of senior organizational leaders (CEO and other C-suite executives). Examples the same for service provider characteristics.
Expert/strategic implementers dispositions: Specialist functional roles with relevant expertise for implementation at a strategic/program level rather than operational level, mainly related to dispositions of human resources or occupational health functions. Examples the same for service provider characteristics.

First, data were coded according to intervention type following classifications used in previous reviews (LaMontagne et al., 2007; Richardson and Rothstein, 2008). The classifications were primary work redesign, primary health behavior, secondary, tertiary, and a category for multifocal interventions that combined elements of other types of intervention (e.g., primary work redesign and secondary). We classified intervention effectiveness according to whether the intervention had any benefits or not. Given the number of variables collected in studies varied, we considered the minimal benefit to be a demonstrable change in at least one health or wellbeing indicator, accompanied by no adverse effects. We differentiated those interventions that had benefits for the entire sample from those interventions where the benefits were contingent on another factor (i.e., moderation) or where indirect effects were transmitted through intervention implementation (i.e., mediation) with inconsistent, but no negative, effects across the sample. Ineffective interventions were classified as those with null or adverse effects (including studies where there was one adverse effect on health/wellbeing indicators, irrespective of other benefits).
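The three-way effectiveness classification described above can be written as a short decision rule. The following is a minimal sketch under stated assumptions: the `StudyFindings` fields and the function name are hypothetical simplifications introduced here, and the review's coders applied judgement to each study rather than Boolean flags.

```python
from dataclasses import dataclass


@dataclass
class StudyFindings:
    """Hypothetical summary flags for one study (illustration only)."""
    direct_effect: bool     # improvement on at least one marker, intervention vs control
    adverse_effect: bool    # any adverse effect on a health/wellbeing marker
    moderated_effect: bool  # improvement only for sub-groups (moderation)
    mediated_effect: bool   # improvement transmitted via implementation (mediation)


def classify_benefit(findings: StudyFindings) -> str:
    """Apply the coding rule: any adverse effect overrides other benefits."""
    if findings.adverse_effect:
        return "non-beneficial"
    if findings.direct_effect:
        return "beneficial"
    if findings.moderated_effect or findings.mediated_effect:
        return "contingently beneficial"
    return "non-beneficial"  # null effects


# A study with a direct benefit but one adverse effect is still non-beneficial.
example = StudyFindings(direct_effect=True, adverse_effect=True,
                        moderated_effect=False, mediated_effect=False)
print(classify_benefit(example))
```

Checking for adverse effects first reflects the conservative choice in the coding frame: a single adverse effect places a study in the non-beneficial category irrespective of other improvements.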
Using the CMO framework for realist evaluation (Pawson and Manzano-Santaella, 2012), we coded data for factors related to changes leading to the activation of mechanisms and a range of contextual features. We differentiated context according to whether it referred to the omnibus context of factors in the wider organizational environment (e.g., prevailing labor market conditions) or the discrete context of intervention implementation (i.e., contextual factors around the intervention or stakeholders' attitudes to WHWPs) (Johns, 2006).
We used Snape et al.'s (2019) quality rating scale, which integrates guidance on research quality for quantitative (GRADE, Early Intervention Foundation) and qualitative research (CERQual, CASP). Snape et al. recommend providing a strength of evidence rating for each review finding, summarized as an evidence statement. Snape et al.'s four-point scale ranges from: 'strong evidence', in which there is confidence a finding is robust; 'promising evidence,' which suggests the finding is robust, but requires further investigation; 'initial evidence,' where there is less confidence than for 'promising evidence' and further investigation is required; and 'no evidence/evidence not yet strong enough for conclusions,' where there is insufficient evidence to draw conclusions. We rated the strength of each evidence statement by examining reviewers' judgements of the quality of the studies underpinning each evidence summary and the consistency of the evidence underpinning each evidence statement. Data extraction sheets contained information and a summary statement on the quality of each study. Each strength of evidence grading was accompanied by an explicit rationale. Evidence ratings were developed through consensus within the review team and consultation with three external experts (see acknowledgements).

Results and discussion
Table 3 shows the studies reviewed. Numbers signify the studies in the tables because multiple papers sometimes described the same study. The review included data from 16,319 workers participating in interventions and 6,685 workers in control groups. Forty-eight of the 74 studies were from Northern Europe, 23 from other advanced Western democracies (e.g., Canada), one from another advanced democracy (Korea), and one each from Turkey and China. A range of sectors was represented, including construction, manufacturing, and utilities. Twenty-seven studies were conducted in health or social care organizations and 15 in public service organizations (e.g., education).
Thirty-seven studies were evaluations of primary work redesign interventions (e.g., psychosocial risk assessment followed by team meetings to develop action plans, 2, Biron et al., 2010); eight were evaluations of preventive health behavior change interventions (e.g., physical activity promotion through peer encouragement, information provision, subsidized gym membership, and pedometer provision, 37, Edmunds et al., 2013); nine were evaluations of multifocal interventions (e.g., psychosocial risk assessment, team-led changes to work environments, leadership development, stress management training, and health information, 8, Fridrich et al., 2016; Jenny et al., 2011, 2015); eleven were evaluations of secondary interventions (e.g., mindfulness training, 12, Braganza et al., 2018); and nine were evaluations of tertiary interventions (e.g., physician guided problem-solving to support return to work for workers with minor mental health problems, 49, Arends et al., 2014a, 2014b). The eight preventive health behavior change interventions were entirely or largely focused on physical health (e.g., physical activity). All except nine of the remaining interventions were focused on psychological wellbeing and health. Of these nine, five had a dual focus on physical and psychological health (20, Lundmark et al., 2017; 21, Jensen, 2013; 63, Mabry et al., 2018; Olson et al., 2016; 64, Lee et al., 2014; and 66, Brisson et al., 2006; Oude Hengel et al., 2011). The others focused on reducing muscular-skeletal problems or ergonomic risk (62, Sorensen et al., 2011, 2016); safety (70, Tregaskis et al., 2013); and sedentary behaviors (50, Hadgraft et al., 2017; Healy et al., 2017; 51, Brakenridge et al., 2016, 2018).

Table 3
Studies in the review, sample sizes, and intervention types.

Twenty-eight interventions were classified as beneficial (N = 6845 for treatment conditions, N = 4333 for control conditions), 17 as contingently beneficial (N = 6223 for treatment conditions, N = 600 for control conditions), and 29 as conferring no benefits or as harmful (N = 3251 for treatment conditions, N = 1652 for control conditions). Randomized controlled or non-equivalent control group designs were used to evaluate 14 of the beneficial interventions, five of the contingently beneficial interventions, and 17 of the non-beneficial interventions. There is, therefore, no indication that stronger research designs (randomized or non-equivalent control group designs) were associated with intervention effectiveness. Table 4 summarizes the evidence on whether changes were made and/or mechanisms activated, alongside overall sample sizes for intervention and control groups.
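The claim that stronger designs were not associated with effectiveness can be checked with simple arithmetic on the counts reported in the text. The sketch below uses only those reported figures; the dictionary layout and variable names are illustrative assumptions.

```python
# Counts transcribed from the text: number of studies per outcome class,
# how many used a randomized or non-equivalent control group design,
# and the treatment-group sample size.
groups = {
    "beneficial": {"studies": 28, "controlled": 14, "n_treatment": 6845},
    "contingently beneficial": {"studies": 17, "controlled": 5, "n_treatment": 6223},
    "non-beneficial": {"studies": 29, "controlled": 17, "n_treatment": 3251},
}

# Studies per intervention type, also as reported in the text.
types = {"primary work redesign": 37, "primary health behavior": 8,
         "multifocal": 9, "secondary": 11, "tertiary": 9}

total_studies = sum(g["studies"] for g in groups.values())
total_treatment = sum(g["n_treatment"] for g in groups.values())
share_controlled = {name: g["controlled"] / g["studies"]
                    for name, g in groups.items()}

print(total_studies, sum(types.values()), total_treatment)  # 74 74 16319
for name, share in share_controlled.items():
    print(f"{name}: {share:.0%} evaluated with a control group design")
# beneficial: 50% evaluated with a control group design
# contingently beneficial: 29% evaluated with a control group design
# non-beneficial: 59% evaluated with a control group design
```

The share of controlled designs is, if anything, highest among the non-beneficial interventions, which is consistent with the statement that design strength does not track reported effectiveness.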

Changes and mechanisms
In all the beneficial interventions, across all intervention types, changes were made and some mechanisms activated. The mechanisms activated were not always those mechanisms intended (e.g., workplace health promotion leading to behavior change). In all cases where unintended mechanisms were activated, intervention effectiveness was attributed to improvements in the social aspects of workplaces brought about by social activities underpinning intervention implementation (e.g., workshops, group exercises). In two cases, the unintended mechanisms were also attributed to changes in aspects of workplace cultures, specifically health behavior norms (37, Edmunds et al., 2013; 50, Hadgraft et al., 2017). In another, changes in workplace behavioral norms were the intended mechanism of change (48, Byron et al., 2015).
For contingently beneficial interventions, no studies reported the activation of intended mechanisms. In three studies, where changes were implemented at least partially, the interventions' mechanisms were through unintended effects on workplace cultures. In four studies, some participants were exposed to contextual factors that may have affected intervention implementation. In four studies, changes were not implemented for some participants and, in one study (38, de Visser, 2017, 2018), some participants had access to a restricted range of intervention components.
For non-beneficial interventions, no studies provided evidence that mechanisms were activated. Changes were not implemented at all or as intended, contextual factors may have hindered the implementation of changes or activation of mechanisms, or changes were implemented but no mechanisms activated. In one study where changes were made but mechanisms were not activated (51, Brakenridge et al., 2016, 2018), a secondary intervention was focused on mitigating muscular-skeletal risks from poor sitting positions through supported use of an activity tracker. Although the intervention group improved on movement (step count), there was no improvement in wellbeing outcomes. In this case, it may be that the mechanisms activated were insufficient to have an impact on health/wellbeing outcomes, at least during the evaluation period.
In summary:
Evidence statement 1: To produce benefits for wellbeing, a necessary but not sufficient condition is for the WHWP to activate intended mechanisms or mechanisms emergent from intervention implementation. (Rated strong evidence, Table 5).
Table 5 summarizes the evidence statements, ranked by the strength of evidence with a rationale for the grade given to each evidence statement.

Omnibus context
Table 6 summarizes the evidence on various aspects of omnibus and discrete intervention contexts, categorized according to intervention outcome (beneficial, contingently beneficial, and non-beneficial), overall sample sizes for those exposed to the intervention (treatment group) and whether the contextual feature was considered a negative or positive context for implementation. Examples of negative contextual features include recessionary pressures, negative middle manager attitudes to health/wellbeing initiatives, and omitting key stakeholders from intervention governance. Examples of positive contextual features include structures for effectively capturing learning from implementation, problem-solving to overcome barriers to implementation, and appropriately resourced professional implementers. Table 6 shows beneficial outcomes tend to be associated with positive, internal omnibus contexts. Adequate financial resources were the most frequently mentioned positive feature of the omnibus context. Positive internal omnibus contexts seemed not to guarantee intervention effectiveness. Moreover, mention was made of lack of resources in studies of two beneficial interventions (12, Braganza et al., 2018; 50, Hadgraft et al., 2017) and a contingently beneficial intervention (73, Hasson et al., 2014).
Negative internal contexts tend to be associated with less beneficial interventions, although this is not always the case (Table 6). The most frequently mentioned negative feature was competing priorities (e.g., workload, time constraints, other organizational changes). In one study (48, Byron et al., 2015), the intervention was modified to prevent intervention sessions clashing with work commitments. Other organizational changes appeared to differentiate many contingently and non-beneficial interventions from beneficial interventions. Nevertheless, two studies (6, Abildgaard et al., 2016; 26, Nielsen and Randall, 2012; Nielsen et al., 2010, 2017; Randall et al., 2009) indicated that concurrent changes may not always affect the implementation and/or effectiveness of an intervention. Study 6 reported a wider cultural shift in the organization, of which the intervention was just one part. Study 26 reported a negative impact on job satisfaction, but positive effects on other wellbeing markers. Both studies 6 and 26 reported on other factors supporting the intervention (e.g., learning structures) and that initially skeptical workers developed positive attitudes towards the intervention over time. Therefore, features of the discrete context may overcome negative features of the internal omnibus context.
Evidence statement 2: Although adverse internal omnibus contexts can affect the implementation and effectiveness of WHWPs, overall there is mixed evidence on the relationship between the favorability of a range of internal contextual factors and WHWP implementation. (No strength of evidence grading, Table 5).
Contextual factors external to the organization were not associated with beneficial interventions. Adverse external environments appear to have detrimental effects on WHWP implementation and effectiveness. In a study of a contingently beneficial intervention (68, van Wingerden et al., 2013), workers were trained to make improvements to their working conditions. Those workers who did not implement the intervention felt external political factors constrained individual choices or resources. Studies 5 (Hoefsmit et al., 2016a) and 9 (Andersen et al., 2014; Martin et al., 2012, 2013, 2015) were work/rehabilitation interventions: Poor labor market conditions were blamed for lack of success due to restricted opportunities to place participants back into work. For study 66 (Brisson et al., 2006; Oude Hengel et al., 2011), recessionary pressures were blamed for impaired intervention implementation, although it is unclear whether the recessionary pressures operated by, for example, constraining resources or influencing internal organizational change (both features of the internal omnibus context). Research is therefore required on how internal contexts change in response to changes in external contexts because properly managed internal responses may not affect WHWP implementation, as noted above.
Evidence statement 3: Adverse external environments detrimentally affect WHWP implementation and effectiveness. (Initial evidence, Table 5).

Organizational cultural and political factors and their role in delivery of WHWPs
The favorability of internal organizational political and cultural factors tends to be associated with more beneficial interventions (Table 6). It is possible to differentiate between situations where cultural or political factors were used to aid the intervention and situations where cultural and political factors hindered implementation.
There were examples of cultural and political factors aiding implementation in beneficial interventions, contingently beneficial interventions, and one non-beneficial intervention. Examples include union involvement in the intervention to build trust with workers (political, 70, Tregaskis et al., 2013), using elements of the intervention to create shared understandings about the intervention (cultural, 3, Augustsson et al., 2015; Tafvelin et al., 2018; von Thiele Schwarz et al., 2015, 2017), taking into account existing social norms when developing interventions (cultural, 74, Sørensen and Holman, 2014), senior managers signaling strategic support for the intervention (cultural, symbolic, 12, Braganza et al., 2018; 66, Brisson et al., 2006; Oude Hengel et al., 2011; 48, Byron et al., 2015; 74, Sørensen and Holman, 2014), and mandating participation in the intervention (political, 8, Fridrich et al., 2016; Jenny et al., 2011, 2015). There appears to be an increased probability of intervention effectiveness from power associated with formal positions of authority or representation (e.g., unions) and/or organizational cultural norms that enable stakeholder sense-making. However, the presence of one non-beneficial intervention (66, Brisson et al., 2006; Oude Hengel et al., 2011) suggests engaging with political and cultural factors does not guarantee success. Moreover, there are some questions over how political and cultural factors have effects, either through aiding implementation (e.g., taking existing norms into account, 74, Sørensen and Holman, 2014) Albertsen et al., 2014; Garde et al., 2012). Study 34 (Zhang et al., 2015, 2016) was an unusual case in which the implementation team exercised its expert power by withdrawing the intervention from an unreceptive context.
The presence of a beneficial intervention amongst cases of adverse political and cultural contexts suggests such contexts can be overcome (50, Hadgraft et al., 2017; Healy et al., 2017).
Evidence statement 4: Overt use of power and/or cultural factors aids WHWP effectiveness, and adverse political and/or cultural factors hinder WHWP effectiveness. (Initial evidence, Table 5).
Table 6 indicates that (dys)functional governance and delivery structures tend to be associated with intervention (in)effectiveness.
Evidence statement 5: Effective governance and clear delivery structures appear to be a necessary but not sufficient condition to facilitate WHWP implementation. (Promising evidence, Table 5).

Planned sequencing of activities
A planned sequence of activities is not clearly related to intervention effectiveness (Table 6). Examples of sequencing from beneficial interventions include a staged sequence of intervention workshops or modules (23, Goldberg et al., 2015), a staged approach to design, development and implementation (16, Menzel et al., 2015), and forward planning of activities (24, Edwards and Higuchi, 2018).
Needs/risk assessment is specified as an early activity in many best practice models and regulatory compliance guidelines (Table 1). Where needs/risk assessment was mentioned as an early activity, it was associated with three beneficial, three contingently beneficial, and six non-beneficial interventions. In non-beneficial interventions, reasons for problems with needs/risk assessment include: managers reacted badly to the results of assessments leading to implementation problems (25, Coffey et al., 2009; 45, Schelvis et al., 2016, 2017); issues with decision-makers' understanding of results from assessments (2, Biron et al., 2010; 5, Andersen et al., 2014; Martin et al., 2012, 2013, 2015); and assessments causing participants to experience psychological discomfort (5, Andersen et al., 2014; Martin et al., 2012, 2013, 2015). The presentation of evidence from needs/risk assessments may be an important factor. For example, a contingently beneficial intervention included a needs/risk assessment that was tailored to a specific context (6, Abildgaard et al., 2016, 2018; Nielsen et al., 2014; von Thiele Schwarz et al., 2017).
Evidence statement 6: The relationship between the sequencing of specific activities and WHWP implementation is unclear. (No strength of evidence grading, Table 5).
Two non-beneficial interventions evidenced attempts at continuity. In one (25, Coffey et al., 2009), although there were no improvements in health/wellbeing markers, there were improvements in health literacy, changes in organizational policies and practices, and staff empowerment. In the other (62, Sorensen et al., 2011, 2016), coherent communication about the intervention appeared to be lacking. Eight non-beneficial interventions reported on why no attempts were made at continuity in implementing, adapting, or sustaining the intervention. The reasons include the time-limited nature of the intervention (e.g., 27, McGilton et al., 2013), abandonment of the governance structure (53, Anderson and Sice, 2016), and minimal or no participant engagement with the intervention (e.g., 51, Brakenridge et al., 2016, 2018).
In summary, WHWP effectiveness appears to be associated with effort in ensuring continuity of implementation, including adaptation. There is a qualifying condition that such efforts at continuity require regular communication about WHWPs (6, Abildgaard et al., 2016; 57, Mikkelsen et al., 2011; 62, Sorensen et al., 2011, 2016).
Evidence statement 7a: A critical success factor for WHWPs is continuity in efforts at implementing, adapting, or otherwise sustaining the intervention. (Strong evidence, Table 5).
Evidence statement 7b: Frequent communication about the intervention assists continuity of efforts. (Initial evidence, Table 5).

Learning structures
We focused on studies of interventions in which learning structures supported intervention implementation, rather than studies in which learning was the planned mechanism.
Two beneficial interventions and three contingently beneficial interventions reported on learning structures to support implementation. Examples of learning structures from beneficial interventions include use of Kaizen principles, coaching, problem-solving approaches, workshops (all from 3, Augustsson et al., 2015; Tafvelin et al., 2018; von Thiele Schwarz et al., 2015, 2017), and training (7, Mejías Herrera and Huaccho Huatuco, 2011). Learning structures may build continuity, as continuity in efforts at implementing or adapting the intervention co-occurred with learning structures in three cases (3, Augustsson et al., 2015; von Thiele Schwarz et al., 2015, 2017; 6, Abildgaard et al., 2016; 8, Fridrich et al., 2016; Jenny et al., 2011) and dysfunctional learning structures co-occurred with lack of continuity in one non-beneficial intervention (2, Biron et al., 2010). Where functional learning structures were present in both beneficial interventions and two contingently beneficial interventions (6, Abildgaard et al., 2016; 8, Fridrich et al., 2016; Jenny et al., 2011), governance structures were present. Where functional learning structures were reported in one contingently beneficial intervention (74, Sørensen and Holman, 2014) and all non-beneficial interventions, no evidence of governance structures was provided. Dysfunctional governance and dysfunctional learning structures were present in the non-beneficial intervention (2, Biron et al., 2010). Therefore, functional governance structures may promote functional learning structures, in turn facilitating adaptation of interventions during implementation (von Thiele Schwarz et al., 2016).
Evidence statement 8: Learning structures, coupled with effective governance structures, help adaptation and continuity in WHWP implementation. (Initial evidence, Table 5).

Service or service delivery characteristics
Thirty-seven studies reported on the service or service delivery. Examples of positive features of interventions include fit with participants and/or context (56, Moll et al., 2018a, 2018b); similarity of service delivery professionals to participants (48, Byron et al., 2015); and novelty (29, Kinser et al., 2016). Examples of negative features include incompatibility with working patterns/spaces (55, Havermans et al., 2018a); negative evaluations of intervention content (19, Russell et al., 2016); lack of clarity/communication about the intervention (1, Pålsson et al., 2018); negative evaluations of service delivery professionals (43, van Oostrom, 2009, 2010); and problems with supporting technologies (32, Foureur et al., 2013). Table 6 indicates a trend for beneficial interventions to have positive service/service delivery features relative to less beneficial interventions. Regardless, seven beneficial interventions and seven contingently beneficial interventions had negative service delivery features. There is a trend for non-beneficial interventions to have more negative features relative to beneficial interventions, although removing preventive work redesign studies from consideration removes this trend. Therefore, while positive service/service delivery features may enhance implementation of effective interventions, negative features do not necessarily undermine implementation or WHWP effectiveness. Overcoming negative features may be especially problematic for primary work redesign interventions.
Evidence statement 9: Positive service/service delivery features enhance WHWP implementation; negative service/service delivery features can be overcome. (Promising evidence, Table 5).

Key stakeholders: workers, managers, and professional implementers
Examples of worker dispositions to WHWPs include: levels of mistrust or confidence in management (e.g., 27, McGilton et al., 2013); worker skepticism about the intervention (24, Edwards and Higuchi, 2018); and fear of, readiness, or capability to change (e.g., 10, Chau et al., 2014, 2016; including health as a barrier in tertiary interventions, e.g., 38, de Visser, 2017, 2018). Table 6 indicates that positive/negative worker dispositions tend to be associated with more/less beneficial interventions. Nevertheless, some interventions conferred benefits in the presence of negative worker dispositions. Worker attitudes improved over time in four studies. Union involvement overcame mistrust in a beneficial intervention (70, Tregaskis et al., 2013). In a contingently beneficial intervention (28, Chapleau et al., 2011), adaptations were made to the intervention in response to negative worker attitudes, after which attitudes changed and wellbeing improved. One intervention was labelled non-beneficial because of an adverse effect on job satisfaction, although there were positive effects on other wellbeing markers (26, Nielsen and Randall, 2012; Nielsen et al., 2010, 2017; Randall et al., 2009). In another non-beneficial intervention (53, Anderson and Sice, 2016), although worker attitudes were changing, senior managers abandoned the intervention.
Evidence statement 10: Positive worker dispositions towards WHWPs and WHWP implementation are associated with beneficial outcomes; negative dispositions can be overcome. (Promising evidence, Table 5).
Beneficial interventions tend to be associated with senior manager positivity and less beneficial interventions with senior manager negativity (Table 6). Senior management support was present but not seen as critical to implementation (3, Augustsson et al., 2015; Tafvelin et al., 2018; von Thiele Schwarz et al., 2015, 2017). These instances indicate there may be some circumstances where senior manager dispositions are not critical to WHWP implementation or effectiveness. In most other cases, it seems that senior managers prevent or hinder implementation rather than hinder the activation of mechanisms, because there were only two cases of non-beneficial interventions where changes were made despite negative senior manager dispositions (60, Cummings et al., 2013; 71, Greasley and Edwards, 2015; Greasley et al., 2012).
Evidence statement 12: The relationship between senior manager dispositions towards WHWPs and WHWP implementation is unclear, although senior managers can block or hinder implementation of changes, or less frequently, undermine the effectiveness of changes that are made. (Promising evidence, Table 5).

Strengths and limitations
One strength of this review is its inclusivity compared to previous reviews, synthesizing evidence from a wide range of intervention types and engaging with complex features of organizational contexts. One question is whether the implementation factors associated with effective interventions vary by intervention type. In initial syntheses of data, we did separate analyses for each intervention type, and found no appreciable differences between intervention types, except where noted above. Consistency of findings across intervention types mitigates concerns over the number of work redesign interventions in the review (35). Notwithstanding, future research could redress the balance of interventions studied.
A limitation concerns the locations and sectors where studies were conducted. Forty-eight studies were from Northern Europe and 71 from advanced Western democracies. Twenty-seven studies were conducted in health or social care organizations, and a further 15 in public service organizations (e.g., education). The geographical and sectoral spread of the studies does indicate a need for research from a wider range of contexts.
The present review complies with many features of good practice guidelines for systematic reviews (Johnson and Hennessy, 2019). Although two reviewers independently extracted data for each study, coding and synthesis were conducted by one reviewer (lead author). This was to accommodate the qualitative and nuanced nature of the data, as well as the breadth of the codes in the coding frame. Notwithstanding, data synthesis was checked by review team members and double-coding a sample of papers indicated the credibility of the coding. In comprehensively reviewing the literature on WHWP implementation, we hope future research is able to develop fine-grained definitions of facets within each broad code used here.

Conclusions
We build on prior reviews and conceptual frameworks by studying the full range of interventions, and synthesizing evidence on how a comprehensive range of implementation factors are linked to intervention outcomes. The review's contributions are threefold. First, we identify areas requiring targeted empirical investigation. Gaps in research are associated with evidence statements that were rated as promising or initial evidence, or where no strength of evidence rating was given (Table 5).
Second, the review summarizes empirical regularities that can become a basis for further theoretical development. An important finding is that there is strong evidence that WHWPs have their effects on psychological wellbeing through activating mechanisms whether intended in the planning of the WHWP or emergent from its implementation. Mechanisms emergent from implementation tended to be associated with social factors, a finding consistent with ideas that social mechanisms provide paths to WHWP effectiveness (Karanika-Murray and Biron, 2013). Non-effective interventions were either not implemented or contextual factors inhibited activation of mechanisms.
We found that a critical success factor for WHWP implementation is continuity of effort and adaptation of interventions, supported by functional learning and governance structures. Learning structures and consultative and inclusive governance structures may provide means to capture local adaptations during implementation, to disseminate adaptations across the organization, and to communicate regularly with stakeholders to establish a coherent narrative around the WHWP.
Governance structures that include senior managers and are well resourced may act as signals of the importance of worker health and wellbeing, and encourage positive worker and line manager attitudes and behaviors towards WHWPs. Findings therefore suggest that further conceptual development could focus on the role of continuity of effort and supporting learning and governance structures in activating intended and emergent mechanisms.
Third, we have identified ambiguities requiring theoretical resolution. Our review indicates that a range of adverse contextual factors can influence WHWP implementation, although they do not do so predictably. Abstracting across all areas of omnibus and discrete context, research on WHWP implementation has left largely unexplored the inherent conflicts between existing organizational processes (political, cultural, and sociotechnical) and WHWP implementation. Therefore, conceptual work is needed on how organizations resolve conflicts between WHWP implementation and other organizational processes.
Another area for conceptual development is to consider context in a dynamic and multilayered way (19, Russell et al., 2016). Studies in our review that reported changes in workplace social relationships, cultures, and norms indicate WHWPs can change omnibus contexts, potentially making the context conducive for implementing more WHWPs (Hall et al., 2010). The connections between WHWPs in the same workplace have been ignored in the implementation literature, although comprehensive approaches may be more effective than single interventions (cf. LaMontagne et al., 2007). Therefore, context could be further differentiated into the discrete micro-context of implementing a single WHWP, the omnibus macro-context of the organization, and a meso-context concerned with the introduction and management of multiple WHWPs over an extended period of time.

Declaration of competing interest
None.