A systematic review of non-randomised evaluations of strategies to improve participant recruitment to randomised controlled trials

Background: Recruitment to trials can be challenging. Currently, non-randomised evaluations of trial recruitment interventions are rejected due to poor methodological quality, but systematic assessment of this substantial body of work may inform trialists’ decision-making about recruitment methods. Our objective was to quantify the effects of strategies to improve participant recruitment to randomised trials evaluated using non-randomised study designs.
Methods: We searched relevant databases for non-randomised studies that included two or more interventions evaluating recruitment to trials. Two reviewers screened abstracts and full texts for eligible studies, then extracted data on recruitment intervention, setting, participant characteristics, and number of participants in intervention and comparator groups. The ROBINS-I tool was used to assess risk of bias. The primary outcome was the number of recruits to a trial.
Results: We identified 92 studies for inclusion; 90 studies aimed to improve the recruitment of participants, one aimed to improve the recruitment of GP practices, and one aimed to improve recruitment of GPs. Of the 92 included studies, 20 were at high risk of bias due to confounding; the remaining 72 were at high risk of bias due to confounding and at least one other category of the ROBINS-I tool. The 20 studies at least risk of bias were synthesised narratively based on seven broad categories: face to face recruitment initiatives, postal invitations and responses, language adaptations, randomisation methods, trial awareness strategies aimed at the recruitee, trial awareness strategies aimed at the recruiter, and use of networks and databases. The utility of included studies is substantially limited due to small sample sizes, inadequate reporting, and a lack of coordination around deciding what to evaluate and how.
Conclusions: Careful thought around planning, conduct, and reporting of non-randomised evaluations of recruitment interventions is required to prevent future non-randomised studies contributing to research waste. Registration: PROSPERO CRD42016037718


Introduction
Randomised controlled trials (RCTs) are at the core of evidence-based healthcare. They use random assignment to allocate participants to treatment groups, whether these involve medicinal products, devices or services, and therefore guard against selection bias 1 . Recruiting participants can be difficult, as can the process of recruiting clinicians to work on the trial with and on behalf of the trial team 2 .
One important source of evidence for trialists looking for rigorously evaluated evidence on how to effectively recruit participants to trials is the 2018 Cochrane systematic review of interventions to improve trial recruitment 3 . Despite having no date or language restrictions and including 72 recruitment comparisons, just three are supported by high-certainty evidence 3 .
The systematic review reported here uses a similar process to the 2018 Cochrane systematic review of interventions to improve trial recruitment 3 , but with one substantial difference: it focusses only on recruitment interventions that are evaluated using non-randomised methods. Until now, systematic reviews of non-randomised studies of recruitment interventions have rarely been undertaken because of the perception that non-randomised studies are, individually, of low methodological quality. However, the systematic evaluation of a substantial amount of research activity is necessary and worthwhile; without collation, this body of evidence is being ignored, and may hold substantial or promising undiscovered effects. Whether or not evidence of benefit is found for one or more interventions, the trials community will benefit from knowing the outcome of this review. Moreover, aggregating data from non-randomised studies using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach 4 may raise confidence in the overall body of evidence, and supplement the evidence base from randomised studies.

Objective
We conducted a systematic review of non-randomised studies that evaluated the effects of strategies to improve recruitment of participants to RCTs.

Methods
The full protocol for this review has been previously published 5 and registered with PROSPERO (CRD42016037718). No amendments have been made to the protocol since its publication. A brief summary of methods is given below.

Types of studies
Non-randomised studies of two or more interventions to improve recruitment to a randomised trial. 'Non-randomised studies' are defined as any quantitative assessment of a recruitment intervention that did not randomly allocate participants to intervention or comparison groups. No additional eligibility criteria (e.g. publication year, status, language or journal) were applied.

Types of participants
Individuals enrolled in a trial. The context of the trial is likely to be healthcare but may not be, because interventions that are effective in other fields may also be applicable to healthcare settings.

Types of intervention
Any intervention or approach aimed at improving or supporting recruitment of participants nested within studies performed for purposes unrelated to recruitment.

Types of outcome measures
Primary: Number of individuals or centres recruited into a trial.
Secondary: Cost of using the recruitment intervention per trial participant.

Search methods for identification of studies
We searched the following electronic databases without language restriction for eligible studies: Cochrane Methodology Register (CMR), Medical Literature Analysis and Retrieval System Online (MEDLINE), MEDLINE In-Process, Excerpta Medica dataBASE (EMBASE), Cumulative Index to Nursing and Allied Health Literature (CINAHL), and PsycINFO. The full search strategy is published and freely accessible 5 . Reference lists of relevant systematic reviews (e.g. 3) and included studies were hand-searched.
The literature searches were carried out between 16th October and 11th November 2015. On 2nd August 2018 an updated search was made in all databases, and a further 2,521 abstracts were found. 460 abstracts from 2018 were screened in duplicate, which led to ten full texts being checked for inclusion. The ten full texts detailed ten studies, none of which provided sufficient detail about the design or implementation of interventions to allow us to pool data. Adding these studies into the review would not strengthen or disprove the conclusions we had already drawn. For this reason, we have chosen not to carry out a full updated search, and all data presented in this paper reflect the full literature searches carried out in 2015.

Selection of studies
Two reviewers (HRG and one other) independently screened the abstracts of all search records. Full texts of potentially eligible abstracts were then independently reviewed by HRG and one other to determine inclusion. Disagreements were resolved through discussion.

Data management and extraction
Search results were merged, duplicate records removed, and a master spreadsheet was used to track all inclusions/exclusions to allow us to create a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram (Figure 1). Data were extracted by two reviewers independently (HRG and one other) and collected on specially designed forms (Extended Data File 1 Blank data extraction form 6 ). Disparities were resolved through discussion.

Assessment of risk of bias in individual studies
Two members of the project team used the ROBINS-I tool 7 to assess studies for aspects of methodological quality such as confounding, participant selection, intervention measurement, departures from the intended intervention, missing data, outcome measurement and selection of the reported result. As per ROBINS-I guidance, studies at critical risk of bias were excluded from any synthesis.

Analysis
Studies were analysed according to the type of intervention used; interventions were grouped when their form or content was deemed sufficiently alike. We planned to further categorise studies by participant if we found the same intervention applied to more than one type of participant (e.g. patients, staff at recruiting centres).

Dealing with missing data
Attempts were made to contact study authors to obtain missing data. Analyses were conducted on an intention-to-treat basis where possible; alternatively, data were analysed as reported.

Assessment of heterogeneity
The nature of the included studies meant that much of the analysis was anticipated to be narrative. Where population, intervention and outcome were sufficiently similar to allow for a meta-analysis, we planned to look for visual evidence of heterogeneity in forest plots, and statistical evidence of heterogeneity using the chi-square test for heterogeneity and the degree of heterogeneity quantified using the I² statistic 9 . Where substantial heterogeneity was detected (I² ≥ 50%), we planned to investigate possible explanations informally and summarise data using a random-effects analysis where appropriate.
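As a minimal sketch of the planned heterogeneity assessment, the Python below computes Cochran's Q (the chi-square statistic for heterogeneity) and I² from per-study effect estimates and their variances, assuming inverse-variance weighting. The function name and the input values are hypothetical illustrations, not data from this review; no meta-analysis was ultimately carried out.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I^2 (as a percentage)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies, each with a variance")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 = max(0, (Q - df) / Q), expressed as a percentage
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# HYPOTHETICAL inputs: three log risk ratios with equal variances
example_effects = [0.10, 0.30, 0.50]
example_variances = [0.01, 0.01, 0.01]

q, df, i2 = heterogeneity(example_effects, example_variances)
substantial = i2 >= 50.0  # threshold stated in the Methods above
```

With these hypothetical inputs, Q is 8 on 2 degrees of freedom, giving an I² of 75%, which would count as substantial heterogeneity under the threshold above.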

Assessment of reporting bias
We planned to investigate reporting (publication) bias for the primary outcome using a funnel plot where 10 or more studies of the same population, intervention and outcome were available.

Screening and identification of studies
We screened a total of 9,642 abstracts identified by the database search, and 231 articles found through hand searching of review article reference lists. Of the screened abstracts, 256 were suitable to assess for inclusion at full-text stage. We were unable to obtain the full text of 33 of the 256 articles (details in Extended Data File 2 References to studies awaiting assessment 6 ). Of the 223 full-text articles assessed, 124 were excluded; this includes seven articles which required additional data to allow for inclusion (details of excluded studies in Extended Data File 3 Characteristics of excluded studies 6 ). A total of 99 full texts were included, which comprised 102 individual studies; 92 of these were considered to be at serious risk of bias while ten were considered to be at critical risk of bias. The latter group were excluded from the study as per ROBINS-I guidance (see risk of bias assessments for these studies in Extended Data File 4 Studies that were at a critical risk of bias and therefore excluded from this review 6 ).
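The counts reported above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this section (the dictionary keys and function name are hypothetical) and confirms that each stage of the flow follows from the previous one.

```python
# Counts as reported in the screening paragraph above
flow = {
    "full_texts_sought": 256,
    "full_texts_unobtainable": 33,
    "full_texts_assessed": 223,
    "full_texts_excluded": 124,
    "full_texts_included": 99,
    "studies_in_included_texts": 102,
    "studies_serious_risk": 92,
    "studies_critical_risk": 10,
}


def check_flow(counts):
    """Return a list describing any stage whose counts do not add up."""
    problems = []
    if counts["full_texts_sought"] - counts["full_texts_unobtainable"] != counts["full_texts_assessed"]:
        problems.append("sought minus unobtainable does not equal assessed")
    if counts["full_texts_assessed"] - counts["full_texts_excluded"] != counts["full_texts_included"]:
        problems.append("assessed minus excluded does not equal included")
    if counts["studies_serious_risk"] + counts["studies_critical_risk"] != counts["studies_in_included_texts"]:
        problems.append("serious plus critical does not equal total studies")
    return problems


flow_problems = check_flow(flow)  # an empty list means the counts are consistent
```

The same kind of check is what revealed the two-participant discrepancy in one included study discussed later in this review.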

Description of studies
Of the 92 included studies, 90 studies assessed interventions that aimed to improve the recruitment of participants to trials (55123 individuals and 172 couples), one assessed an intervention that aimed to improve the recruitment of GP practices to trials (54 practices), and one assessed an intervention that aimed to improve recruitment of GPs (150 GPs). 23 studies reported data on cost per recruit. Study size ranged between 14 and 5887 participants.
The design of included studies varied substantially and did not always fit into conventional design categories. Most studies (82/92) were what we describe as 'yield' studies. These types of studies appear not to have been planned as a method aiming to rigorously evaluate a recruitment intervention or interventions; rather, authors retrospectively report what methods have been used to recruit participants into the trial. Reporting of yield studies tends to rely on self-report by participants, although where online methods were used, the calculation of participant yield was recorded by the software or website; e.g. via number of clicks recorded by Facebook.
The remaining studies used cohort (7/92) and before and after designs (3/92).

Using risk of bias to select studies
Of the 92 included studies, 72 were classified as at a 'moderate' or 'serious' risk of bias in one or more domains as well as the bias due to confounding domain. The remaining 20 studies were at 'serious' risk of bias in the confounding domain but were deemed to be at 'low' risk of bias across all other domains (participant selection, intervention classification, deviations from intervention, missing data, outcome measurement, selection of reported result).
We made the decision to focus on the 20 studies at least risk of bias in the Results and Discussion sections of this review. Primarily, this reflects our confidence in the evidence presented and follows a similar approach taken in the 2018 Cochrane recruitment review 3 . Data from the 72 studies that we have chosen not to focus on are presented in Extended Data File 5 Characteristics of included studies 6 . The 20 studies that we focus on here are organised into seven broad intervention categories (full references to these studies are presented in Extended Data File 6 References for the 20 studies included in the results section of this review 6 ). Letters, e.g. 'A', refer to more than one intervention tested within the same intervention category. Numbers, e.g. 'B1', refer to instances where the same intervention was tested more than once within a study.
seven categories for the 20 studies. This table should be viewed with caution because the general lack of denominators means that direct and meaningful comparison within and across categories is not possible, as described below.
Comparing data within categories. The details of the interventions evaluated in these studies are limited, so to bring order to the variety of interventions, we have assigned them to broad categories. Each of these categories includes a range of interventions, the majority of which we are unable to thoroughly describe. For this reason, we urge you not to compare data within categories. By this we mean looking at two studies, e.g. Andersen 2010 and Bell-Syer 2000, seeing that 87 out of 187 participants and 104 out of 187 participants were recruited using face to face recruitment initiatives respectively, and assuming that these values demonstrate the success or failure of specific face to face recruitment initiatives.
We have simply used categories to bring order to the variety of interventions included in this review; each category includes a diverse range of interventions. This diversity in interventions means that none of the data presented have been pooled, and it is important that caution is exercised when interpreting data to ensure that we do not give studies more weight than they deserve.
Comparing data across categories. Similarly, we urge you not to compare data across categories. By this, we mean looking at a study, e.g. Andersen 2010, seeing that 87 participants were recruited using face to face recruitment initiatives, and 100 participants were recruited using postal invitations and responses, and making a judgement about the success (or failure) of either of the interventions used. It is important to bear in mind that these data do not provide denominators; there is no way for us to know how many people were exposed to either of these interventions, or over what time period, in order to recruit 187 participants.
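To illustrate why numerators alone cannot be compared, the Python sketch below takes the two recruit counts reported for Andersen 2010 (87 and 100) and pairs them with purely hypothetical denominators, which the source study does not report. The function names and scenarios are assumptions made for illustration only.

```python
def recruitment_rate(recruited, approached):
    """Proportion of approached people who were recruited."""
    if approached <= 0 or recruited > approached:
        raise ValueError("approached must be positive and not smaller than recruited")
    return recruited / approached


# Numerators reported for Andersen 2010 in this review
FACE_TO_FACE_RECRUITED = 87
POSTAL_RECRUITED = 100

# HYPOTHETICAL denominators (not reported in the source study):
# scenario name -> (people approached face to face, people sent a postal invitation)
scenarios = {
    "A": (200, 2000),
    "B": (1000, 250),
}


def more_efficient(face_approached, postal_approached):
    """Name the method with the higher recruitment rate, or 'tie'."""
    face_rate = recruitment_rate(FACE_TO_FACE_RECRUITED, face_approached)
    postal_rate = recruitment_rate(POSTAL_RECRUITED, postal_approached)
    if face_rate == postal_rate:
        return "tie"
    return "face to face" if face_rate > postal_rate else "postal"


results = {name: more_efficient(*denominators) for name, denominators in scenarios.items()}
```

The same numerators favour face to face recruitment in scenario A and postal invitations in scenario B, so without the real denominators no judgement about relative effectiveness is possible.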

Face to face recruitment initiatives
Ten studies (totalling 3853 participants and 150 GPs) evaluated face to face recruitment initiatives, two of which used cohort studies, and eight used yield studies (see Table 1 and Table 2).
Face to face recruitment initiatives varied across the ten studies in this category. Most focussed on recruitment of participants who were attending appointments with their primary care physician or GP; other studies looked at recruiting participants who were in the waiting room of their care provider before an appointment took place. In most cases, waiting room recruitment was facilitated by a research nurse. Other methods included referral of participants from different parts of their own clinical care pathway, though most were targeted around an existing appointment made by the potential participant. These care pathways included outpatient appointments, and appointments at community institutions, academic institutions, and veterans' health administration centres.
Despite the superficial similarity of the interventions used within this category, both the diversity of comparators, settings and populations, and the poor reporting of the specifics of the interventions, made pooling data unfeasible.

Language adaptations
One study (2575 participants) evaluated language adaptations; Barrera 2014 compared translations of Google AdWords in Spanish or English language using a yield study (see Table 1 and Table 2). The trial was based online within the USA and aimed to recruit pregnant women to a trial of an internet intervention for postpartum depression. The embedded recruitment study did not account for variations in how common postpartum depression is in Spanish-speaking populations in comparison to English-speaking populations.

Postal invitations and responses
Nine studies (totalling 1614 participants) evaluated postal invitations and responses, two of which used cohort studies, and the other seven used yield studies (see Table 1 and Table 2).
Postal invitations and responses were used widely within the studies included in this review. Interventions within this category were largely based on patient lists held by caregivers; letters were sent out and then the number of responses from potential participants monitored. In most cases, these studies reported the number of responses from people who ultimately went on to be recruited into the study. As mentioned in the 'face to face recruitment initiatives' section, many of the postal interventions used a face to face method as their comparator. Despite the superficial similarity of the interventions used within this category, both the diversity of their comparators, settings and populations, and the poor reporting of the specifics of the interventions, made pooling data unfeasible. In only one case (Funk 2010) did comparators vary from this trend. In this study, the method of response to a mailed brochure was monitored; potential participants were given the option of responding to the mailing by telephone or a website. These comparators were unusual within the literature, and draw attention to the two-dimensional nature of many of the other studies within this category; largely, researchers looking at postal methods are focussing on the method used to contact potential participants, rather than the ways that these individuals may respond.

Randomisation methods
One study (553 participants) evaluated randomisation methods; Brealey 2007 compared use of telephone and postal randomisation methods using a yield study (see Table 1 and Table 2). Initially, the general practices involved used a telephone service to randomise patients to the host trial. Delays in the start of recruitment at some sites led the team to modify the randomisation procedure to include postal randomisation. Following this, new sites were given the option to use either postal or telephone randomisation methods.

Trial awareness strategies aimed at the recruitee
Four studies (totalling 407 participants) evaluated trial awareness strategies aimed at the recruitee, one of which used a before and after study, and the remaining three used yield studies (see Table 1 and Table 2). This category is diverse; the four studies include four distinct interventions. The reporting of these interventions is ambiguous; for example, Carr 2010 describes a community outreach event, Johnson 2015 describes a non-targeted flyer, and Sawhney 2014 describes increased awareness of the trial via use of a telephone reminder prior to participants' clinic appointments. It is feasible that all of these interventions could come under the umbrella of 'trial awareness strategies aimed at the recruitee', which is what is described by Carter 2015. The text states that Carter 2015's interventions included distribution of leaflets and posters at clinics, therapy centres and regional multiple sclerosis societies, presentations and attendance at regional multiple sclerosis events and to local physiotherapy teams, and referral from other professionals such as multiple sclerosis nurses and word of mouth.

Trial awareness strategies aimed at the recruiter
Five studies (totalling 188 participants and 54 practices) evaluated trial awareness strategies aimed at the recruiter, one of which used a cohort study, two used before and after studies, and two used yield studies (see Table 1 and Table 2).
Again, the interventions evaluated within this category are diverse: Carr 2010 looked at a medical education event; Embi 2005 and Treweek 2010 looked at methods of clinical trial alert software set to trigger during clinic appointments; Beauharnais 2012 assessed effectiveness of an automated pre-screening algorithm to identify potential participants; and Colwell 2012 evaluated the use of viral marketing techniques in the form of postcards, invitation letters and flyers. The diversity of these interventions means that data could not be pooled.

Use of networks and databases
Two studies (totalling 486 participants) evaluated the use of networks and databases, both of which used yield studies (see Table 1 and Table 2).
Park 2007 compared centralised recruitment efforts with de-centralised approaches that were tailored to the study and sites specifically. Weng 2010 evaluated effectiveness of existing lists of potentially eligible participants; comparing a clinical patient registry with a clinical data warehouse. The interventions are sufficiently different that data could not be pooled.

Discussion
This review identified 92 studies, 20 of which were included in a narrative synthesis; those 20 studies evaluated the effect of seven categories of interventions to improve recruitment to randomised trials.
The interventions evaluated in these studies varied significantly; even those that had an intervention category in common were sufficiently dissimilar to prevent pooling of data, rendering subgroup analyses unfeasible. That said, what limits the utility of these studies is not necessarily the interventions evaluated; it is the abundance of small study sample sizes, inadequate reporting, and a lack of coordination when it comes to deciding what to evaluate and how.
Does this mean that non-randomised evaluations of recruitment should be stopped in favour of implementing randomised approaches? This review does not show ground-breaking evidence that will change the global landscape of how trialists recruit participants into trials. However, the 2018 Cochrane recruitment review 3 of randomised evaluations of recruitment interventions was not able to provide clear evidence of benefit for the majority of interventions either. Like this review, the randomised review also experienced challenges with small, methodologically flawed studies, a diverse range of interventions, and a lack of detailed reporting. This fact may not be comforting for trialists, but it demonstrates that the utility of non-randomised studies is not always vastly different from their randomised counterparts.
Non-randomised evaluations have acquired a bad reputation, but they do have their merits. Randomised evaluations are not always possible because of logistics, financial resources, or ethical reasons 10 , and non-randomised studies could allow researchers to gather useful data to complement or replace data generated by randomised trials 11 .
It is clear that non-randomised evaluations of recruitment interventions will continue. In their current form, however, we found their usefulness to others to be extremely limited. What we need to focus on now is improving the way that these non-randomised evaluations are planned, conducted and reported.

Planning non-randomised evaluations of recruitment
The non-randomised studies that are included in this review largely take the form of what we refer to as 'yield' studies. As described earlier, these types of studies appear not to have been planned as a means to rigorously evaluate a recruitment method; instead, they represent the work of authors retrospectively reporting what they have done, and subsequently what they have seen.
This practice limits the utility of these studies in two ways:
1. The studies are not designed in such a way as to lend themselves to straightforward analysis, which means that interventions and their comparator are not always introduced at the same time or used for the same length of time. A lack of planning also results in the collection of data that are incomplete and lack context; this is a problem that features in most studies included in this review. Data are presented in terms of numerators; they provide numbers of participants/GPs/practices recruited into a trial, but do not provide a denominator, meaning that comparing interventions to assess effectiveness is impossible.
2. As is clear from the larger intervention categories such as face to face recruitment initiatives, and postal invitations and responses, the trials community is currently lacking a consistent approach to the non-randomised evaluations that they are publishing.
Rather than reporting what has been done retrospectively, we would encourage trialists to prospectively plan to embed recruitment evaluations, specifically using a study within a trial (SWAT) protocol 12 that already exists on the SWAT repository 13 , into their trials from the very beginning of the process of planning the host trial. The Medical Research Council Systematic Techniques for Assisting Recruitment to Trials (START) project is a remarkable example of the effectiveness of a well-planned, organised and cohesive approach to SWATs 14 ; the project ran between 2009 and 2015 and answered its research question regarding optimised participant information sheets within the space of six years. The follow-on PROMETHEUS project is coordinating over 30 recruitment and retention SWATs and will substantially increase global evidence for trial recruitment and retention in the space of around four years. Without coordination of high-quality evaluations, it is entirely possible for a decade to pass without materially increasing the evidence base available to trialists, as a comparison of the 2007 15 and 2018 3 Cochrane recruitment reviews demonstrates.

Conducting non-randomised evaluations of recruitment
The process of conducting non-randomised evaluations of recruitment lacks structure; limited planning means that many of the studies included in this review were penalised as a result of poor conduct.
74% of included studies were judged to be at moderate risk of bias in the 'bias in classification of interventions' domain of the ROBINS-I tool. These studies were most often penalised as a result of blurred lines between interventions and their comparators. For example, Adams 1997 (Extended Data File 5 Characteristics of included studies 6 ) compares the effectiveness of professional referrals, cold calling by the research team, presentations at senior centres, media outreach, mailings sent to personal care home managers, and flyers; a total of six interventions. Participants could conceivably have been drawn to take part in the trial as a result of more than one of these six interventions; someone could have seen the media outreach campaign, received a flyer, and attended a presentation at a senior centre. This, combined with self-report of one method by participants, makes meaningful interpretation of the results extremely difficult.

Reporting non-randomised evaluations of recruitment
Currently, trialists are focussing on the mode of delivery of the interventions that they are working to evaluate; they omit key details regarding the content of the intervention, as well as the specific timescales that interventions were in place for. We highly encourage the use of the Guidelines for Reporting Non-Randomised Studies 16 , and the Template for Intervention Description and Replication (TIDieR) checklist and guide 17 when reporting these types of studies.
Missing data were another aspect of reporting where detail was lacking. Of the 92 included studies, 13% were deemed to be at serious risk of bias due to missing data; a glaring example of research waste. Pieces of data that were missing were not entire data categories or a reflection of participants being lost to follow-up; in some cases, the data simply did not add up. One example is Blackwell 2011; this paper reported recruitment of 301 participants, but when we manually calculated how many participants had been recruited across each of the seven methods used in the study, and also included the participants that were reported as 'don't know/refused/other', the total was 303 participants. The size of the discrepancy may appear trivial, but it undermines confidence in the data presented and the study generally. This was not a unique occurrence; missing data were also found in

Conclusions
Implications for systematic reviews and evaluations of healthcare
Some interventions to increase recruitment described in this review do show promise, but methodological and reporting problems mean that our confidence in these results is not substantial enough to recommend changes to current recruitment practice. Currently the literature is oversaturated with a diversity of interventions tested in non-randomised evaluations that fail to drill down deep into the effects of each specific recruitment strategy. Their usefulness to other trialists is therefore extremely limited.
What is needed now is a move away from retrospective descriptions of what happened, to carefully planned prospective evaluations of well-described recruitment interventions and their comparators. Without this change, authors of non-randomised evaluations of recruitment interventions are simply contributing to research waste.

Data availability
Underlying data
All data underlying the results are available as part of the article and no additional source data are required.

effectively there is no narrative synthesis (though a summary of some study characteristics is given).
I appreciate that it is not easy to do a systematic review on this topic area due to the considerable heterogeneity and weaknesses of the studies, so I sympathise with the authors. But I am surprised that some of the problems with this evidence base were not identified during the protocol development, scoping, and pilot testing of the methods. The upshot is that very weak (and seemingly useless) studies appear to have made it all the way through to the data synthesis step but then all of them were rejected from narrative synthesis. Could the evidence synthesis planning process have been done differently perhaps, to identify some of the methodological problems (e.g. limited sample sizes and ambiguous descriptions) earlier, or would this have just resulted in all studies being excluded? It feels to me that all studies have, effectively, been excluded anyway since none of them provide any comparative results to support the review objective. Are there any lessons that review teams can learn here, to improve future evidence syntheses in this difficult area?
A comprehensive risk of bias assessment was conducted, but this is not reported consistently. For example, the 7 individual domains of bias are reported for the 10 studies that were excluded because they were at critical risk of bias (Extended data file 4) but only an overall risk of bias judgement is reported for the 72 studies that were included in the review (Extended data file 5). Given that a total of 92 studies were eligible for inclusion in the review, the risk of bias assessments for 10 studies appear to be missing (Extended data files 4 and 5 report 82 studies in total).
The review report is not fully clear about which of the methodological issues discussed were assessed as risks of bias, and which were identified additionally. Missing data are described separately in the Discussion (e.g. for the Blackwell 2011 study) but I presume that these missing data had already been captured in the "Missing data" domain of the risk of bias assessment? It would be helpful if each of the risk of bias domains could be explained so that their interpretation in relation to non-randomised recruitment studies is clear.
In summary, this review has failed to meet its stated objective and it is difficult to see how it advances knowledge in this scientific area (we already knew that non-randomised studies are methodologically weak, and the rationale behind this review was to make best use of these studies despite their limitations... which hasn't been achieved).
But I believe the authors could update this manuscript to make it into a useful scientific contribution. I have suggested two possibilities that the authors could consider. I am not sure how much effort these would take but they would utilise the studies already included in the review and may require little if any amendment to the current protocol:
1. Check the included studies and evaluate systematically whether there are any data that can be obtained from these non-randomised studies that would constructively inform a data synthesis (i.e. go back and attempt to fulfill the stated objective). Currently, the review considers only limitations, whereas it would be helpful to look for the strengths (if any) that can be taken from these non-randomised studies. Would GRADE or other approaches (e.g. sensitivity analyses) enable any of the included studies to contribute to data synthesis? Could you make best-/worst-case assumptions in sensitivity analyses to address some of the uncertainties? (see specific comments below). I would be surprised if all 92 identified studies are so poor that none of them at all can be descriptively reported in the narrative synthesis, but this should be systematically checked and transparently reported so that if it is true that the entire evidence base is useless then at least the conclusion would be defensible.
2. If the evidence-based conclusion is that data synthesis is not feasible due to all studies being too weak, then pragmatically the review could direct its effort to systematically analysing the limitations of the studies as a basis to support evidence-based recommendations on how to improve non-randomised studies in future. In order to make the critique of the studies' methods more systematic, consistent, and transparent, perhaps a table could be provided in the results section for all the included studies indicating exactly which methodological limitations each of the studies was susceptible to? This could then support evidence-based specific recommendations about exactly what needs to be improved in the non-randomised studies for research practice to be improved. I think this could make the paper valuable, provided that the recommendations arising are evidence-based and are communicated and disseminated carefully to achieve impact.

Specific comments:
Methods: Types of intervention: I don't understand the meaning of the following sentence (NB minor typo needs correcting): "Any intervention or approach aimed at improving or supporting recruitment of participants nested within studies performed for purposed unrelated to recruitment."
Methods: Search methods for identification of studies: "Adding these studies into the review would not strengthen or disprove the conclusions we had already drawn." This statement seems to imply that the 10 studies being referred to were somehow tested for their influence on the overall results. I suspect this is not the intended meaning. Do you mean that, by virtue of their lack of clarity, these studies could not usefully inform the review?
Methods: Assessment of risk of bias in individual studies: Very little description or discussion of the risk of bias assessment is provided. As ROBINS-I is a relatively new tool I wonder how many readers will be familiar with it? Could a supporting explanation be provided of how decisions were reached? If the review is re-framed to focus on the studies' weaknesses (see my general comments above), then a more detailed discussion of risks of bias would be appropriate. How good was reviewer agreement on rating the risks of bias? Were any specific bias domains particularly difficult to interpret or agree on?
Figure 1: PRISMA chart: What were the reasons that so many (33) full-text articles were not available? Is there anything practical that could be recommended to reduce this problem in future research?
Results: Screening and identification of studies: "this includes seven articles which required additional data to allow for inclusion". The meaning of this statement is not clear. Do you mean that insufficient information was reported in these studies to make an eligibility decision and since the authors did not provide any further information these studies had to be excluded?
Results: Using risk of bias to select studies: The meanings of the seven bullets listed here are not explicitly defined anywhere, although to me they seem intuitive and self-evident. An exception is "Language adaptations" which I didn't understand the meaning of until I reached Table 2, where I noticed that it is explained (implicitly). Perhaps cross-refer the reader to Table 2 to clarify what these bullets mean?
Results: Trial awareness strategies aimed at the recruitee: Minor typo: "professional".
Results: Trial awareness strategies aimed at the recruiter: Minor typo: "fives".
Discussion: "Non-randomised evaluations have acquired a bad reputation, but they do have their merits. Randomised evaluations are not always possible because of logistics, financial resources, or ethical reasons" This statement seems to be referring to non-randomised studies in general. Are there specific reasons why some (or indeed most) types of recruitment study can only be non-randomised?
Discussion: Some information about the Adams 1997 study is revealed here but it feels like this information should have been reported in the results section, along with equivalent information from the other included studies, for consistency and transparency.
Discussion: "Currently, trialists are focussing on the mode of delivery of the interventions that they are working to evaluate; they omit key details regarding the content of the intervention, as well as the specific timescales that interventions were in place for." Is this statement evidence-based? It doesn't really link to the results section.
Discussion: The issue of missing data has not been consistently covered for each study in the results section, so it is unclear how representative the discussion of missing data in the Blackwell 2011 study is of the studies in general. It feels harsh to say that 2 missing data points (<1%) would be sufficient to cause mistrust in the entire Blackwell 2011 study. Were such strict criteria applied across all studies? Could you do a (quantitative or descriptive) sensitivity analysis, i.e. explore whether changing the sample size by n=2 would make any tangible difference to the outcomes? I think even Cochrane Review guidance is pragmatic about not being overly strict if the amount of missing data is trivial?
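To illustrate the kind of check I have in mind, here is a minimal sketch of a best-/worst-case sensitivity analysis for a two-arm recruitment comparison. All numbers are hypothetical (they are not taken from Blackwell 2011 or any included study), and the function names are my own; the only assumption is that each arm reports how many people were approached, how many were recruited, and how many have an unknown outcome.

```python
def proportion_bounds(recruited, approached, missing):
    """Return (worst, complete_case, best) recruitment proportions for one arm.

    worst:         every participant with a missing outcome was NOT recruited
    complete_case: participants with missing outcomes are dropped
    best:          every participant with a missing outcome WAS recruited
    """
    if recruited < 0 or missing < 0 or recruited + missing > approached:
        raise ValueError("counts are inconsistent")
    if missing >= approached:
        raise ValueError("no observed outcomes in this arm")
    worst = recruited / approached
    complete_case = recruited / (approached - missing)
    best = (recruited + missing) / approached
    return worst, complete_case, best


def difference_bounds(arm_a, arm_b):
    """Range of the risk difference (A minus B) across missing-data assumptions."""
    worst_a, _, best_a = proportion_bounds(*arm_a)
    worst_b, _, best_b = proportion_bounds(*arm_b)
    # Least favourable to A: worst case in A, best case in B (and vice versa).
    return worst_a - best_b, best_a - worst_b


# Hypothetical arms: (recruited, approached, missing)
intervention = (60, 300, 2)
comparator = (45, 300, 0)

low, high = difference_bounds(intervention, comparator)
print(f"Risk difference ranges from {low:.3f} to {high:.3f}")
```

If the lower and upper bounds lead to the same conclusion, as they would with only 2 missing outcomes in these made-up numbers, then the missing data are unlikely to justify discounting the study.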
Discussion: The review included several study designs of which Yield studies were the most frequent and, apparently, least useful. Could the remaining study types that would be more reliable tell us anything? No results from any of these studies have been provided.
Conclusions: The statement "Some interventions to increase recruitment described in this review do show promise" is not evidence-based, since no evaluation of intervention effectiveness has been provided.
References: There is no information provided for reference 15.

Are the rationale for, and objectives of, the Systematic Review clearly stated? Yes
Are sufficient details of the methods and analysis provided to allow replication by others? Partly

Is the statistical analysis and its interpretation appropriate?
I am not sure what is meant by the statement that 7 articles were excluded at the full-text stage because they needed additional data to allow for inclusion. Was it not possible to determine from the information in the study whether it was eligible? Or is it that some of the data you wanted to extract were missing? If it is the latter, I would include the study and state that there were missing data.

Using risk of bias to select studies:
Of the 72 studies excluded, what was their risk of bias in the confounding domain? The text just states 'the bias due to confounding domain'.
Comparing data within categories: I think the following statement needs to be rephrased or removed. I am not clear why you are unable to describe the intervention. 'Each of these categories includes a range of interventions, the majority of which we are unable to thoroughly describe.'

Planning non-randomised studies:
The acronym SWAT is used for the first time without being spelt out; it is only spelt out in full later in the paragraph.
It would be useful in this section to explain what a SWAT is. As it stands, the section focuses on what is limiting the utility of evaluations of recruitment; it would be more useful if more focus were given to what could be done to increase the utility of these studies.

Reporting non-randomised studies:
It would be helpful to give an overview of the type of data that was missing. At present this section is focusing on the fact that the number of participants didn't add up in some of the studies. Was this the main example of missingness?
Are the rationale for, and objectives of, the Systematic Review clearly stated? Yes

Is the statistical analysis and its interpretation appropriate?
Not applicable

Are the conclusions drawn adequately supported by the results presented in the review? Yes
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Severe mental illness, smoking cessation, RCTs

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

This systematic review includes non-randomised studies of strategies to improve participant recruitment to randomised controlled trials. The justification for inclusion of non-randomised studies is that the previous Cochrane systematic review of RCTs found very little high-certainty evidence to guide trialists. This is a well presented, large scale, robust systematic review of a topic of great importance to trialists seeking strategies to improve their trial's recruitment. The inclusion of non-randomised studies resulted in a diverse range of evaluations, and the narrative synthesis is as a result quite broad and high-level. The recommendations for use of TIDieR in intervention reporting and SWATs are practical and feasible. There are a number of clarifications that I feel would improve the manuscript overall. These are as follows:
In the introduction, could you please give some detail as to what strategies were identified as effective from the Cochrane review?
In the discussion, can you compare and contrast the findings of this current review with those of the Cochrane review? It would be helpful to get an understanding of what the addition of these non-randomised studies adds overall to the evidence base, given the substantial amount of work in this review. Included studies were of insufficient quality to allow for the pooling of data; however, the authors could present narratively a summary of strategies that demonstrated success, to guide trialists, and those designing SWATs, as to what strategies may be worth exploring further.
The literature search was carried out in 2015. The authors have justified this by including details of how a search in 2018 identified only an additional 10 studies, the addition of which would not strengthen or disprove the conclusions. However, this search is 5 years out of date and I think the authors should contextualise their conclusions in light of this limitation. The addition of more 'yield studies' would be unlikely to change the conclusions (as these would likely be at high risk of bias and not included in the analysis); however, the addition of a few more cohort studies could have a bigger impact.
Were all non-yield studies pre-planned studies?
The review identified 92 studies, but only 20 were considered low risk of bias and included in the narrative review. Could the authors comment on the 20 included studies in more detail in the manuscript? For example, from my reading of the table, these comprise 13 yield studies and 7 studies of cohort or other designs.
Could the authors comment on the definition of non-randomised study? I wonder whether, if a more detailed definition had been applied, this would have reduced the number of yield studies that were identified.
In the introduction, the authors comment that aggregating data from non-randomised studies using