The Society for Implementation Research Collaboration Instrument Review Project: A methodology to promote rigorous evaluation

Background Identification of psychometrically strong instruments for the field of implementation science is a high priority, as underscored at a recent National Institutes of Health working meeting (October 2013). Existing instrument reviews are limited in scope, methods, and findings. The Society for Implementation Research Collaboration Instrument Review Project's objectives address these limitations by applying a unique methodology to conduct a systematic and comprehensive review of quantitative instruments assessing constructs delineated in two of the field's most widely used frameworks, adopting a systematic search process (using standard search strings), and engaging an international team of experts to assess the full range of psychometric criteria (reliability, construct validity, and criterion validity). Although this work focuses on implementation of psychosocial interventions in mental health and health-care settings, the methodology and results will likely be useful across a broad spectrum of settings. This effort has culminated in a centralized, online, open-access repository of instruments depicting graphical head-to-head comparisons of their psychometric properties. This article describes the methodology and preliminary outcomes. Methods The seven stages of the review, synthesis, and evaluation methodology are (1) setting the scope for the review, (2) identifying frameworks to organize and complete the review, (3) generating a search protocol for the literature review of constructs, (4) conducting a literature review of specific instruments, (5) developing evidence-based assessment rating criteria, (6) extracting data and rating instrument quality by a task force of implementation experts to inform knowledge synthesis, and (7) creating a website repository.
Results To date, this multi-faceted and collaborative search and synthesis methodology has identified over 420 instruments related to 34 constructs (48 in total, including subconstructs) that are relevant to implementation science. Although numerous constructs have more than 20 available instruments, which implies saturation, preliminary results suggest that few instruments stem from gold-standard development procedures. We anticipate identifying few high-quality, psychometrically sound instruments once our evidence-based assessment rating criteria have been applied. Conclusions The results of this methodology may enhance the rigor of implementation science evaluations by systematically facilitating access to psychometrically validated instruments and identifying where further instrument development is needed. Electronic supplementary material The online version of this article (doi:10.1186/s13012-014-0193-x) contains supplementary material, which is available to authorized users.


Background
Identification of psychometrically strong instruments for the field of implementation science is a high priority in the United States, as underscored at a recent National Institutes of Health working meeting (October 2013; Rabin et al., unpublished). Reliable and valid instruments are critical to scientific advancement because they allow for careful collection, expression, and comparison of the results of observation and experimentation [1]. Unfortunately, poor-quality instruments have slowed the discovery and application of evidence-based implementation strategies for supporting widespread delivery of evidence-based care. Many new fields face instrumentation challenges until consensus builds around high-quality measures of key constructs. Without such consensus, the development of informative, applicable instrumentation will remain slow and hindered by duplicative efforts and incommensurable results. For an in-depth discussion of instrumentation issues in implementation science, see Martinez et al. [2].
Existing instrument review efforts within the field of Dissemination and Implementation Science (DIS) focus on individual constructs such as readiness for change (e.g., [3]), on constructs that predict specific implementation outcomes such as adoption [4], and on broader reviews of multi-level domains [5]. Other instrument review efforts, such as the Grid-Enabled Measures Project (GEM; [6]; https://www.gem-beta.org/public/wsoverview.aspx?cat=8&wid=11&aid=0), engage researchers and stakeholders in populating and evaluating an online repository of measures. Thus far, review efforts reveal that few instruments have undergone systematic development and are psychometrically strong. These instrument review efforts represent important contributions because they inform the state of measurement quality in the field and underscore a significant need for additional research in this area.
Despite these instrument review efforts, three important gaps remain. First, no existing instrument reviews include a comprehensive array of constructs relevant for DIS. A comprehensive review of constructs is important to guide instrument selection and development and then to facilitate identification of constructs that are implicated in successful implementation. Second, existing methodologies for instrument reviews are narrowly focused and only provide limited psychometric assessments of the instruments. Specifically, Chaudoir et al.'s instrument review focused only on predictive validity [5]. Although predictive validity is critical to the identification of key constructs, in the absence of also establishing reliability and/or content and construct validity, predictive validity is only marginally informative. Further, Chor et al.'s work provided dichotomous (yes/no) conclusions about the psychometric validation of instruments without providing an indication of the process for this determination [4]. These limitations of existing instrument review methodologies must be addressed to support quality measurement in this field. Third, no protocol exists to systematically develop a compendium or repository of instruments for widespread use. An open-source resource would facilitate simultaneous access to instruments and comparison between instruments with respect to their psychometric strength. A centralized online database that is searchable and provides head-to-head comparisons of instrument psychometric properties would be a significant step forward for the field.

The current project: aims and objectives
The Society for Implementation Research Collaboration (SIRC; formerly known as the Seattle Implementation Research Collaborative) Instrument Review Project (IRP) has established a methodology for instrument review that addresses these gaps by a) conducting a systematic and comprehensive review of quantitative instruments assessing constructs delineated in two of the field's most highly cited frameworks, the Consolidated Framework for Implementation Research (CFIR; [7]) and the Implementation Outcomes Framework (IOF; [8]); b) adopting and applying a systematic search process (using standard search strings); c) engaging an international team of experts to assess the full range of psychometric criteria (reliability, construct validity, and criterion validity); and d) building a centralized, online, open-access, evolving repository of instruments depicting graphical head-to-head comparisons of their psychometric properties. Existing instrument review and repository efforts are summarized and compared in a separate manuscript that highlights their unique contributions and the gaps in the field that the SIRC IRP seeks to fill (see Rabin et al., unpublished). In this article, we describe the SIRC IRP methodology and summarize preliminary results for the 420+ instruments identified so far according to: the number of instruments identified for each of the 48 DIS constructs (including the 13 subconstructs; CFIR and IOF), the rigor underlying instrument development, whether the construct was explicitly defined in the original article, the year and field in which the instrument was created, the stakeholder targeted by the instrument, the settings in which the instrument has been used, and the number of published studies reporting use of the instrument (bibliometric data).
The findings from this methodology will inform a pressing research agenda by identifying priorities for measurement development. Moreover, the online repository will position those invested in advancing the field of implementation science (e.g., researchers and stakeholders: agency leaders, purveyors, decision makers in service provider organizations) to engage in rigorous evaluation of their implementation initiatives by providing online access to instruments, associated peer-reviewed articles, and information regarding their psychometric properties. Although, consistent with the focus of SIRC, the resulting repository is geared towards implementation of psychosocial interventions in mental health and health-care settings, it is designed to promote the use of instruments across disciplines and to be useful to researchers and stakeholders implementing evidence-based practices across a broad spectrum of settings.

Methods
Step 1: defining the scope of the project
The instrument review protocol and the development of the repository focus on quantitative instruments used in the implementation of evidence-based practices or innovations in mental health, health care, and school settings. To adhere to this scope, we developed the following two criteria for identifying relevant instruments: a) an instrument is regarded as relevant if it assesses some aspect of implementation science with regard to settings where mental health interventions are used; and b) an instrument is deemed relevant if it can be easily adapted to make its subject pertinent to the mental health field (e.g., only the name of the intervention, population, or setting would need to be changed within the instrument).
Step 2: selecting theoretical frameworks to guide the review
Our team prioritized identifying a theoretical framework that could guide identification and organization of the instruments according to key DIS constructs. Although there are over 60 guiding frameworks for DIS ([9]; e.g., PARiHS [10], DoI [11], PRISM [12]), there is little agreement and little empirical evidence on which constructs are most important for planning and evaluation [13]. Few theoretical frameworks come close to comprehensively outlining the diverse array of constructs and domains implicated in DIS. We therefore selected two of the most highly cited frameworks to categorize and organize instruments: (1) the CFIR [7] and (2) the IOF [8].
The CFIR was an obvious first choice because it fits our goal of being as comprehensive as possible. Specifically, the CFIR is a meta-theoretical framework, generated to address the lack of uniformity in the DIS theory landscape, that minimizes overlap and redundancy among available frameworks, separates ideas that had formerly been seen as inextricable, and creates a uniform language for the domains and constructs of DIS. Our team conceptualizes the CFIR constructs as potential predictors, moderators, and mediators, or "drivers," of DIS outcomes. Despite its fairly comprehensive nature, the CFIR is limited in that it lacks clearly defined outcomes for DIS. DIS outcomes are distinct from clinical treatment and service system outcomes. Implementation outcomes are typically measured in implementation activities; they can advance understanding of implementation processes, enhance efficiency in implementation research, and pave the way for studies of the comparative effectiveness of implementation strategies [8]. To address this limitation, our team identified a second framework: Proctor et al.'s work delineating "implementation outcomes" [8]. The isolation and concrete operationalization of implementation outcomes, separate from service and client outcomes, was a unique and important addition to the literature (Table 1). This added focus may be critical in future research seeking to understand the temporal relations between constructs. Our team conceptualizes implementation outcomes, such as penetration and sustainability, as dependent variables in a DIS process and, therefore, as integral constructs warranting inclusion in a comprehensive review of DIS instruments. A detailed review of the theories and frameworks summarized here can be found elsewhere [9].
In sum, by combining the two frameworks, the resulting repository would include instruments based on a comprehensive listing of constructs implicated at the inception of an implementation project, throughout the early stages of an implementation, as well as those thought to contribute to the success of an implementation initiative. Constructs are defined here as factors inside domains that predict, moderate, or mediate DIS as well as implementation outcomes. The following domains guide review of the DIS instrument literature: characteristics of the intervention, outer setting, inner setting, characteristics of the individuals involved in implementation, process, implementation outcomes, and client outcomes (see Table 1).
Step 3: generating a search protocol for the literature review of constructs
Using the CFIR and IOF, we conducted a broad scoping review of the DIS literature in search of instruments and related articles that purportedly measured each of the 48 constructs (including subconstructs). Scoping reviews are a useful first step to inform the parameters of subsequent systematic reviews [14]. In our scoping review process, we searched PsycINFO and Web of Science to explore the landscape of DIS instruments and identify those relevant to mental health. This first pass of the literature on DIS constructs identified 105 instruments.
This exploratory stage was integral to setting search parameters to guide the subsequent review and was undertaken with the help of a trained information specialist. From this scoping review, a publication date parameter was set to include only those articles published after 1985 to maximize the relevance of instruments identified, given how recently the science of dissemination and implementation has emerged. Drawing upon the work of Straus et al. [15], McKibbon et al. [16], and Powell et al. [17], who published helpful search strings for DIS literature reviews, we identified a core set of search strings reflecting the parameters of the project (see core search strings in Additional file 1). Titles and abstracts were examined to exclude obviously irrelevant articles. Articles that survived the title and abstract review were then reviewed more thoroughly, with special attention paid to the articles' method sections. In addition, the articles' references were reviewed, and articles that appeared likely to yield new instruments were accessed.
Table 1 Listing of included and excluded constructs from the organizing frameworks (columns: Construct; Included; Excluded)
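The date restriction and title/abstract screen described above can be sketched as a Python pre-screening pass. This is a minimal illustration under stated assumptions, not the project's actual tooling: the `Record` schema, the `screen` function, DOI-based de-duplication of hits returned by both databases, and the keyword exclusion list are all hypothetical, and a keyword pass could only support, never replace, the human title and abstract review.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Record:
    """One bibliographic hit from PsycINFO or Web of Science (hypothetical schema)."""
    title: str
    abstract: str
    year: int
    doi: Optional[str] = None


def screen(records: Iterable[Record],
           published_after: int = 1985,
           exclusion_terms: Iterable[str] = ()) -> List[Record]:
    """Drop duplicates, apply the publication-date parameter, and remove hits
    whose title or abstract contains an exclusion keyword (an assumed aid)."""
    terms = [t.lower() for t in exclusion_terms]
    seen = set()
    kept = []
    for rec in records:
        # The same article can be returned by both databases; key on DOI, else title.
        key = (rec.doi or rec.title).strip().lower()
        if key in seen:
            continue
        seen.add(key)
        # Keep only articles published strictly after the cutoff year.
        if rec.year <= published_after:
            continue
        text = f"{rec.title} {rec.abstract}".lower()
        if any(term in text for term in terms):
            continue
        kept.append(rec)
    return kept
```

The default `published_after=1985` follows the wording "published after 1985"; passing 1984 would include 1985 itself, which matches the later phrasing "beginning in 1985".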
Once an instrument was identified as relevant, it was sent to the project leads (i.e., C.C.L., C.S., R.G.M., and B.J.W.) for verification. Disagreements were resolved through careful review and consensus among our core workgroup. Disagreements were most often a result of issues of homonymy and synonymy as described in Martinez et al. [2], failure of the author to define the construct of interest, and misalignment (or multiple alignment) of the targeted construct with the constructs delineated by the organizing frameworks. In each case, at least two core workgroup members reviewed all available material and took one of the following actions: place the instrument within its most relevant construct, place the instrument within multiple constructs for ease of access, or exclude the instrument altogether.
The initial construct reviews were replicated by a team of research assistants (RAs) at a second site. Each instrument's author was contacted to obtain the full-length instrument if it was not included in the original article and to request permission to post the instrument, under the password protection of the SIRC website, for members to access. This process sought to improve the yield of available instruments to populate the developing repository.
Concurrent with the review of published literature, a snowball sampling email procedure was used to locate instruments in preparation and other unpublished instruments. This was particularly important for preventing the creation of redundant instruments, and it extends this methodology beyond that of a typical systematic review. The snowball sampling technique reached DIS stakeholders through relevant email LISTSERVs (e.g., SIRC membership; Association for Behavioral and Cognitive Therapies Dissemination and Implementation Science Special Interest Group) and personal contacts. DIS-related websites across disciplines, with a particular focus on mental health and health care, were also reviewed for instruments or related papers, and authors were subsequently contacted. Stakeholders who received emails from our group were encouraged to share the email request for DIS instruments with colleagues in the field.
Step 4: the literature review of specific instruments, extending beyond a systematic review
In the instrument review phase, we systematically compiled all information regarding each identified instrument, particularly with respect to its development and psychometric properties and any data relevant to the evidence-based assessment (EBA) criteria described below in step 5. This step is a significant deviation from a typical systematic review protocol, but a necessary and effective innovation for our methodology to evaluate and synthesize the literature and produce a decision aid for researchers and stakeholders. As with the construct reviews, PsycINFO and Web of Science served as the primary databases for the instrument review. The instrument name written in quotation marks (e.g., "Treatment Acceptability Rating Form") served as the primary search string; the search was then limited by drawing upon the core set of search terms outlined in Additional file 1. Specific instrument reviews were replicated by a second RA. When completed, all documents pertaining to a single instrument were combined into a single PDF (henceforth referred to as a packet) in preparation for the quality assessment phase in step 6: data extraction and rating.
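The instrument-level search string described above (quoted instrument name, limited by the core terms) can be assembled programmatically. The sketch below is hypothetical: `build_instrument_query` is an illustrative name, the generic `AND`/`OR` syntax is an assumption (each database has its own field tags and operators), and the core terms in the example are placeholders, since the actual strings are listed in Additional file 1.

```python
from typing import Sequence


def build_instrument_query(instrument_name: str, core_terms: Sequence[str]) -> str:
    """Combine the quoted instrument name with an OR-block of core DIS terms.

    Generic Boolean syntax only; adapt to PsycINFO or Web of Science conventions.
    """
    # Exact-phrase search on the instrument name (stray quote characters removed).
    quoted_name = '"' + instrument_name.replace('"', "") + '"'
    if not core_terms:
        return quoted_name
    # Multi-word terms are quoted as phrases; single words are left bare.
    core_block = " OR ".join(f'"{t}"' if " " in t else t for t in core_terms)
    return f"{quoted_name} AND ({core_block})"
```

For example, `build_instrument_query("Treatment Acceptability Rating Form", ["implementation", "dissemination", "evidence-based practice"])` returns `"Treatment Acceptability Rating Form" AND (implementation OR dissemination OR "evidence-based practice")`.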
Step 5: development of the evidence-based assessment rating criteria
To ensure that all identified instruments are evaluated for their psychometric qualities using a standardized system that is relevant and amenable to a large-scale collaborative effort, we developed a set of evidence-based assessment rating criteria. These criteria were derived from Hunsley and Mash's earlier EBA criteria, which focused on standardized patient outcome measures [18], and from the work of Terwee et al. [19]. To reduce rater subjectivity and enhance inter-rater reliability, the criterion anchors needed to be especially concrete. The main modifications included increasing the number of anchors (from three to five) to promote variability in the ratings. To maximize the utility and relevance of the EBA criteria for DIS, the first draft was sent to 106 expert DIS scientists, members of the SIRC Network of Expertise. We obtained 60 responses containing rich conceptual (e.g., how to include DIS-specific criteria) and practical (e.g., how to improve the likelihood that anchors would be selected reliably) feedback. All 60 responses were reviewed and integrated by the project's core workgroup. The second draft of the EBA rating criteria was then sent to local experts in classical test theory and test development, and a third version emerged from further revising the anchors in accordance with their feedback. This final version of the EBA rating system included six criteria: norms, reliability information, criterion (predictive) validity information, construct (structural) validity information, responsiveness (sensitivity to change), and usability (assessed by length).
Each criterion was rated on a five-point anchoring system ranging from "0" ("no evidence") to "4" ("excellent evidence") (see Table 2 for the final version of the EBA criteria).
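To make the structure of a single rating concrete, the six EBA criteria and their 0 to 4 anchor range can be represented as a small validated record. This is a hypothetical sketch: the `Criterion` enum and `validate_rating` function are illustrative names, and the intermediate anchor labels (1 to 3) defined in Table 2 are not reproduced here.

```python
from enum import Enum
from typing import Dict


class Criterion(Enum):
    """The six EBA criteria listed in step 5."""
    NORMS = "norms"
    RELIABILITY = "reliability"
    CRITERION_VALIDITY = "criterion (predictive) validity"
    CONSTRUCT_VALIDITY = "construct (structural) validity"
    RESPONSIVENESS = "responsiveness"
    USABILITY = "usability"


MIN_SCORE, MAX_SCORE = 0, 4  # "0" = no evidence ... "4" = excellent evidence


def validate_rating(rating: Dict[Criterion, int]) -> Dict[Criterion, int]:
    """Check that a rating covers exactly the six criteria with integer anchors 0-4."""
    unknown = set(rating) - set(Criterion)
    if unknown:
        raise ValueError(f"unknown criteria: {unknown}")
    missing = set(Criterion) - set(rating)
    if missing:
        raise ValueError(f"missing criteria: {sorted(c.value for c in missing)}")
    for criterion, score in rating.items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, int):
            raise ValueError(f"{criterion.value}: score must be an integer")
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"{criterion.value}: {score} is outside {MIN_SCORE}-{MAX_SCORE}")
    return rating
```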
Step 6: data extraction and rating instruments
The data extraction phase is ongoing to capture the most up-to-date public information on the instruments included in the repository. In this phase, data are extracted by independent reviewers (RAs) using a standardized, piloted extraction procedure. Specifically, EBA-relevant information is highlighted and labeled by an RA for each article in every packet (which contains the instrument, the source article, and all associated peer-reviewed publications in which the instrument is used). The purpose is to have well-trained RAs systematically complete the data extraction to ease rating by the volunteer task force member (i.e., an expert implementation scientist). Each packet is randomly assigned to an in-house advanced RA (often a PhD-level research scientist) plus one task force member to be rated for its psychometric strength and usability using the EBA criteria. Modeled after the work of Terwee et al. [19], we employed a "worst score counts" methodology. This is an intentionally conservative approach that also facilitates reliability in the rating process. Cohen's kappa is computed to assess inter-rater reliability, and rating discrepancies are resolved through consensus among the core workgroup. Figure 1 illustrates the application of the EBA criteria and the resulting graphical displays of criterion scores. In this figure, two measures of evidence-based practice acceptability were evaluated according to the EBA rating process. As depicted in Figure 1, the Evidence-based Practice Attitudes Scale (EBPAS), a 15-item self-report measure that assesses "mental health provider attitudes toward adoption of evidence-based practice" [20], is directly compared with Addis and Krasnow's 17-item self-report measure that assesses practitioners' attitudes towards treatment manuals [21].
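The two computations named in step 6 can be sketched in Python. `cohens_kappa` uses the standard formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from each rater's marginal proportions. `worst_score_counts` is an assumed reading of the rule as applied here: for each criterion, the packet's score is the lowest score supported by any study, so mixed evidence yields the weaker rating. The project's exact aggregation rule follows Terwee et al. and may differ in detail; both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def worst_score_counts(study_ratings: List[Dict[str, int]]) -> Dict[str, int]:
    """Collapse per-study ratings into one rating per criterion by keeping the
    lowest (worst) score observed for that criterion."""
    combined: Dict[str, int] = {}
    for ratings in study_ratings:
        for criterion, score in ratings.items():
            combined[criterion] = min(score, combined.get(criterion, score))
    return combined


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning categorical scores to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in categories) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical category; kappa is undefined
    return (p_o - p_e) / (1 - p_e)
```

For instance, if one study supports a predictive validity score of 4 and another supports only 1, `worst_score_counts` assigns 1, mirroring the conservative intent described above.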
Using the worst score counts methodology and available data, the ratings reveal that the EBPAS is of high psychometric quality overall. Both instruments appear to have strong psychometric properties, including established structural validity (i.e., EFA/PCA analyses have accounted for more than 50% of variance), available norms, and fewer than 50 items. However, readers can determine for themselves which qualities are most important (e.g., responsiveness versus predictive validity). The EBPAS has demonstrated stronger internal consistency and is more responsive (i.e., sensitive) to change. Conversely, Addis and Krasnow's [21] measure appears to have more consistently predicted criterion measures. Notably, the EBPAS has demonstrated predictive validity in some previous studies (e.g., [22]) but not in all. This is a prime example of how the worst score counts methodology yields a conservative rating when the evidence is mixed.
Table 2 Evidence-based assessment criteria
Step 7: population of the website repository
Once both sets of ratings are obtained, the data are converted into a head-to-head graphical comparison that depicts the absolute psychometric strength of an instrument and its strength relative to other instruments for that construct (see Figure 1). This information is contained in the website repository alongside the instrument and links to all relevant literature. This step is integral for researchers and other stakeholders to efficiently judge the state of instrumentation for each construct.
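The head-to-head comparison in step 7 is graphical on the website; a plain-text analogue helps show what is being compared. The sketch below is hypothetical (`head_to_head` is an illustrative name) and only renders two instruments' 0 to 4 criterion scores side by side as bars; any scores passed to it in examples are illustrative, not the ratings shown in Figure 1.

```python
from typing import Dict


def head_to_head(scores_a: Dict[str, int], scores_b: Dict[str, int],
                 name_a: str = "A", name_b: str = "B", max_score: int = 4) -> str:
    """Render two instruments' EBA criterion scores (0-4) as side-by-side text bars."""
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both instruments must be rated on the same criteria")
    width = max(len(c) for c in scores_a)
    rows = []
    for criterion in scores_a:
        a, b = scores_a[criterion], scores_b[criterion]
        # Pad each bar to the maximum score so the columns line up.
        bar_a = ("#" * a).ljust(max_score)
        bar_b = ("#" * b).ljust(max_score)
        rows.append(f"{criterion.ljust(width)}  {name_a}: {bar_a} {a}  {name_b}: {bar_b} {b}")
    return "\n".join(rows)
```

Calling `print(head_to_head({"norms": 3, "reliability": 4}, {"norms": 3, "reliability": 1}, "Instrument A", "Instrument B"))` prints one aligned line per criterion, which is the textual equivalent of reading across one row of the graphical display.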

Preliminary results and discussion
Preliminary results
Despite identifying over 420 instruments across the 48 DIS constructs (including subconstructs), we uncovered critical gaps in DIS instrumentation. Preliminary results highlight constructs for which few or no instruments exist (see Table 3). Specifically, our review methodology revealed no instruments for the following constructs, many of which fall within CFIR's outer setting domain: complexity of the intervention, intervention design quality and packaging, intervention source, external policies and incentives, peer pressure, tension for change, goals and feedback, formally appointed internal implementation leaders, and engaging champions. Many other constructs appear to have only one or two instruments available (e.g., compatibility, relative priority). These preliminary results suggest a great need for instrument development to advance DIS, particularly in the critical domain of outer setting. In the absence of outer setting measures, the field will be challenged to identify the role that these constructs play in successful implementation across different contexts. Interestingly, despite the recently renewed NIH program announcement explicitly highlighting interest in instrument-related proposals, NIH has received few proposals centered on instrument development (David Chambers DPhil, personal communication, October 24, 2013). Numerous constructs have 20 or more available instruments (e.g., acceptability, adoption, organizational context, culture, implementation climate, knowledge and beliefs about the intervention, other personal attributes, planning, reflecting, and evaluating), suggesting saturation.
However, without readily available information on which instruments exist, their psychometric properties, and associated decision-making tools, DIS researchers and stakeholders may continue to develop instruments in these seemingly saturated areas or select poorly constructed instruments that will hinder scientific progress. It is important for researchers and stakeholders to carefully consider the applicability of available instruments to promote cross-study comparisons, a necessary process for building the DIS knowledge base. Figure 2 depicts the timeline across which identified instruments were developed ("year developed" is based on the year in which the original article was published). Given our search parameter (i.e., beginning in 1985), less than one quarter (23.17%) of all identified instruments were developed prior to 1999 (a 14-year period), whereas one quarter (25.61%) have been developed since 2009 (a 4-year period), reflecting the growth of DIS in recent years. Notably, and perhaps not surprisingly, over one third (34.90%) of instruments for implementation outcomes have been developed since the seminal paper by Proctor et al. was published [8]; that paper articulated a research agenda for DIS outcome evaluations that appears to have positively influenced instrument development. Table 4 summarizes the six discrete fields from which the instruments emerged. The majority of instruments tapping implementation outcomes emerged from subfields of Psychology. Instruments tapping intervention characteristics stem from Psychology and Public Health or Government research. Inner setting instruments emerged from the previously mentioned fields but more substantially from the Organizational, Workplace, and Business literatures. Instruments tapping characteristics of individuals, process, and client outcomes were generated from a range of fields, including those listed previously as well as Medicine and Education.
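Because the two periods differ in length, the growth claim is easier to see as a per-year rate. The short calculation below uses only the percentages and period lengths stated above (23.17% over 14 years; 25.61% over 4 years) and assumes, as a simplification, that instruments are spread evenly within each period. It gives roughly 1.7 versus 6.4 percentage points per year, nearly a fourfold difference.

```python
def share_per_year(percent_of_instruments: float, period_years: int) -> float:
    """Average percentage of all identified instruments developed per year of a period."""
    return percent_of_instruments / period_years


early = share_per_year(23.17, 14)   # instruments developed before 1999 (14-year period)
recent = share_per_year(25.61, 4)   # instruments developed since 2009 (4-year period)
print(f"early: {early:.2f}% per year, recent: {recent:.2f}% per year, "
      f"ratio: {recent / early:.1f}x")
```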
The breakdown of fields from which the identified instruments were generated suggests that Psychology and its subfields have contributed immensely to the evaluation of DIS, representing a higher average number of instruments than any other field across constructs (M = 3.91). Notably, the discipline from which the instruments emerged was consistent with the strengths of each field. Tables 5 and 6 reflect the stakeholders targeted by each instrument and the contexts in which the instruments have been used, respectively. Across domains, the majority of instruments were developed to target the service provider rather than the service director, supervisor, or consumer. However, measures of intervention characteristics and process targeted stakeholders in the "other" category, encompassing a range of general staff as well as researchers. In line with the field from which the instruments originated and the scope of the review, the majority of instruments have since been used in mental health settings.
Table note: "Stages of dev." refers to the stages of development through which the instrument passed, based on the eight-stage coding system described in the text. These stages are not necessarily linear; an instrument need not pass through stage one to enter stage two, and so forth. Rather, instruments received a point for each stage they passed through. These ratings reflect the instruments' quality at inception (i.e., based on the source article) and are not necessarily indicative of their current psychometric strength.
Bibliometric data available for each of the identified instruments (see Table 3) make it possible to infer, via publication counts, which instruments have been perceived favorably by researchers conducting DIS. This information is, of course, confounded by the year in which each instrument was developed and thus should be interpreted with caution. To date, instruments tapping the inner setting are the most frequently used and published. Notably, compatibility instruments have an average of 11 publications, external change agent instruments an average of 10, and combined instruments (e.g., culture and climate) an average of 9.22. Implementation outcomes are receiving growing attention in the literature; although instruments in this domain have far fewer publications, their publication counts show steady growth in recent years.
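One way to partially address the confound noted above is to divide each instrument's publication count by the number of years it has been available before averaging by construct. The sketch below is a hypothetical illustration of that adjustment, not an analysis reported in this article: `mean_publications`, the record fields, and the reference year of 2013 (the approximate end of the search period) are all assumptions, and the example data are invented.

```python
from statistics import mean
from typing import Dict, List


def mean_publications(instruments: List[dict], per_year: bool = False,
                      reference_year: int = 2013) -> Dict[str, float]:
    """Mean publication count per construct; optionally divide each count by the
    number of years (inclusive) the instrument has been available."""
    by_construct: Dict[str, List[float]] = {}
    for inst in instruments:
        count = float(inst["publications"])
        if per_year:
            years_available = max(1, reference_year - inst["year_developed"] + 1)
            count /= years_available
        by_construct.setdefault(inst["construct"], []).append(count)
    return {construct: mean(counts) for construct, counts in by_construct.items()}


# Hypothetical data: an older and a newer instrument for the same construct.
data = [
    {"construct": "construct_x", "publications": 12, "year_developed": 2002},
    {"construct": "construct_x", "publications": 6, "year_developed": 2010},
]
```

With these invented records, the raw mean is 9 publications per instrument, while the per-year mean is 1.25, showing how the newer instrument's slower-looking record changes once exposure time is considered.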
Although data extraction and psychometric ratings are ongoing (step 6), we can nevertheless provide a preliminary account of the quality of the identified instruments. Across the 48 constructs (including subconstructs), an average of 71% of instruments included explicit construct definitions. This suggests that the construct validity of nearly 30% of the instruments is questionable, because construct validity rests on careful operationalization of constructs according to their theoretical underpinnings. In the absence of explicit construct definitions, other teams using the identified instruments must make assumptions about an instrument's construct validity based on its items, which may be challenging given the potentially overlapping nature of constructs within domains (e.g., the construct of appropriateness is often used synonymously with perceived fit, relevance, compatibility, suitability, usefulness, and practicability; [8]). Until consensus on constructs and terms is achieved [23], this practice may compromise the generalizability of study findings.

Table fragment (Process domain): Engaging; Engaging: opinion leaders; and Engaging: formally appointed internal implementation leaders each show 0 (0.00%) in all six columns.
We also identified the stages of development through which each instrument should progress. Eight stages were identified based on the seminal work of Walsh and Betz [24]: (1) construct is defined, (2) initial items are generated by a group of experts, (3) pilot test of items with representative sample, (4) validity and reliability tests conducted based on pilot testing, (5) instrument is refined based on pilot results, (6) refined instrument is administered to the targeted sample, (7) validity and reliability tests are performed, and (8) psychometric properties are reported. Each instrument was coded such that 1 point was assigned for each of these stages through which the instrument progressed, as reported in the original article. Table 3 indicates that, on average, the identified instruments did not pass through even three (of a possible eight) full stages of "proper" instrument development based on our coding system. These preliminary results suggest that the systematic development and psychometric characteristics of the body of instruments available in DIS are weak at best. However, these findings need to be substantiated by our rigorous psychometric evaluation, which is currently underway, before confidence can be placed in these observations.
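The eight-stage coding can be expressed as a simple scoring function: one point per distinct stage reported, with no requirement that stages occur in order (as the table note specifies). The sketch below is hypothetical; `development_score` and `mean_development_score` are illustrative names, and stages are identified by their numbers in the list above.

```python
from statistics import mean
from typing import Iterable, List, Set

# The eight stages based on Walsh and Betz, numbered as in the text.
STAGES = {
    1: "construct is defined",
    2: "initial items generated by a group of experts",
    3: "pilot test of items with a representative sample",
    4: "validity and reliability tests based on pilot testing",
    5: "instrument refined based on pilot results",
    6: "refined instrument administered to the targeted sample",
    7: "validity and reliability tests performed",
    8: "psychometric properties reported",
}


def development_score(stages_reported: Iterable[int]) -> int:
    """One point per distinct stage reported in the source article.
    Order is irrelevant: stages need not be passed through sequentially."""
    stages = set(stages_reported)
    unknown = stages - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stage numbers: {sorted(unknown)}")
    return len(stages)


def mean_development_score(instruments: List[Set[int]]) -> float:
    """Average development score across a set of coded instruments."""
    return mean(development_score(s) for s in instruments)
```

Under this coding, an instrument whose source article only defines the construct and reports psychometric properties (stages 1 and 8) scores 2, which is the kind of profile that pulls the average below three.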
A comparison of SIRC's methodology to existing reviews and repositories
To date, using this multi-faceted and collaborative search, synthesis, and evaluation methodology, SIRC's IRP has identified over 420 instruments tapping 48 constructs (including subconstructs) relevant to DIS. This methodology combines systematic review techniques with email snowball sampling (to identify instruments in progress) and ongoing review of the latest publications. It has produced a more comprehensive DIS instrument database than previous efforts. Specifically, although Chaudoir et al. [5] conducted a systematic review of key DIS domains (i.e., structural, organizational, provider, patient, and innovation, as opposed to constructs such as intervention adaptability, external policy, and incentives), they identified only 62 instruments. This is substantially fewer than the 420+ instruments revealed by the SIRC methodology. We posit that Chaudoir et al. identified fewer instruments for two reasons. First, they excluded instruments that assess implementation outcomes, arguably the most critical domain of DIS constructs to date. Second, their review covered fewer domains. Moreover, our review methodology is unique because, unlike previous reviews, it identifies all literature pertaining to each instrument to enable accurate conclusions about individual instrument quality. Previous collaborative instrument review efforts, notably the GEM [6], do not systematically locate all available literature before rating instrument quality. Instead, the GEM approach encourages website users to provide their own ratings regardless of their knowledge of the extant literature.

Implications
This multi-faceted methodology has several potential implications for DIS. Once the repository is created, researchers and stakeholders will have a relevant and useful resource for identifying available and psychometrically sound DIS instruments. This will reduce the need to create "homegrown" instruments (i.e., instruments intended for one-time use; [8]) to evaluate their DIS efforts. We anticipate that access to the repository will encourage repeated use of the same high-quality instruments to measure similar constructs across settings, reduce instrument redundancy, and help the DIS field evolve more rapidly. The repository may also stimulate new areas of research and instrument development, because some constructs are saturated whereas others lack instrumentation. Our preliminary results also signal a need for new instruments targeting non-provider stakeholders such as leaders and external change agents (e.g., implementation practitioners or intermediaries), particularly in light of research identifying the role they play in implementation success (e.g., [25]). Given the ongoing application of our evidence-based assessment rating criteria, we anticipate a dearth of high-quality, psychometrically sound instruments, which will signal a need for higher-quality instrument development. These suppositions are short-term implications. The long-term implications of this review are at least twofold. First, applying the EBA rating criteria described in step 6 will help identify psychometrically strong instruments. It may also yield a consensus battery of high-quality, essential DIS instruments that researchers and stakeholders can use as a basic resource for cross-study comparisons. Second, we intend the SIRC repository to be a dynamic resource. That is, the repository will grow with the evidence base to incorporate newly developed and/or tested instruments, as well as instruments identified via the methodologies of colleagues conducting relevant research (e.g., crowdsourcing methods). We believe this dynamic process will improve the efficiency and rigor of implementation science evaluations as a whole.

Note: Fifty-eight instruments did not have an identifiable field of origin. "Psychology" includes clinical, counseling, community, school, sports, social, developmental, and forensic psychology. "Medicine" includes psychiatry, VA, nursing, and pediatrics. "Organizational" includes workplace and business. "Public health" also includes government agencies.

Limitations
This methodology has several noteworthy limitations. To ensure the rigor and quality of the resulting repository, each step is meticulous and necessarily time-consuming, and each must be replicated by a second party. As a result, the time, resources, and personnel required by this comprehensive, multi-faceted methodology may be a limitation. Specifically, (a) initial literature reviews to identify instruments for targeted constructs take approximately 1.5-3 h, (b) cross-checking these reviews takes an additional 45 min-1 h, (c) instrument-specific literature reviews take an average of 2.5-4 h, (d) cross-checking instrument-specific literature reviews adds 1-3 h, and (e) rating requires an average of 50 min to complete. Because of limited funding, these preliminary results have taken roughly 2 years to achieve. It is highly encouraging, however, that carefully developed project protocols and the international support this project has received have allowed us to engage multiple core worksites and a large task force committed to realizing the goals of the SIRC IRP. Moreover, the lead authors (CCL, CS, and BJW) anticipate receiving grant funding from the National Institute of Mental Health to extend this work to include pragmatic ratings of instruments, a critical domain for advancing the practice of implementation in real-world settings [26]. A second potential limitation concerns the specific frameworks used to guide construct selection. Basing our work on the CFIR [7] and the Implementation Outcomes Framework [8] provides a comprehensive conceptual foundation. However, DIS investigators clearly employ diverse frameworks that delineate unique constructs not included in the SIRC IRP [2].
Nonetheless, we are hopeful that our selection of these comprehensive and complementary frameworks will allow us to identify and make accessible a range of high-quality instruments relevant to most interested researchers and stakeholders.
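The task times listed under Limitations can be combined into a rough effort range. This sketch rests on an interpretation that the source does not state. It assumes steps (a) and (b) are performed once per construct and steps (c)-(e) once per instrument, with a single rating per instrument. The resulting totals are back-of-the-envelope bounds under those assumptions, not figures reported by the project.

```python
from fractions import Fraction as F

# (low, high) hours per task, from the ranges given in the text. Rating is
# reported only as an average (50 min), so its range is a single point.
# ASSUMPTION: steps (a)-(b) occur once per construct, (c)-(e) once per instrument.
PER_CONSTRUCT = {
    "(a) initial literature review": (F(3, 2), F(3)),
    "(b) cross-check of initial review": (F(3, 4), F(1)),
}
PER_INSTRUMENT = {
    "(c) instrument-specific literature review": (F(5, 2), F(4)),
    "(d) cross-check of instrument review": (F(1), F(3)),
    "(e) rating": (F(50, 60), F(50, 60)),
}


def total_range(tasks):
    """Sum the low and the high bounds of a dict of (low, high) hour ranges."""
    low = sum(lo for lo, _ in tasks.values())
    high = sum(hi for _, hi in tasks.values())
    return low, high


def project_hours(n_constructs, n_instruments):
    """Rough total effort range, assuming per-unit times apply uniformly."""
    c_lo, c_hi = total_range(PER_CONSTRUCT)
    i_lo, i_hi = total_range(PER_INSTRUMENT)
    return (n_constructs * c_lo + n_instruments * i_lo,
            n_constructs * c_hi + n_instruments * i_hi)


for label, tasks in (("per construct", PER_CONSTRUCT), ("per instrument", PER_INSTRUMENT)):
    lo, hi = total_range(tasks)
    print(f"{label}: {float(lo):.2f}-{float(hi):.2f} h")  # 2.25-4.00 h, then 4.33-7.83 h

lo, hi = project_hours(48, 420)
print(f"48 constructs, 420 instruments: {float(lo):,.0f}-{float(hi):,.0f} h")  # 1,928-3,482 h
```

Exact fractions keep the 45- and 50-minute entries free of rounding error. If each instrument were rated by more than one task force member, the rating term would scale with the number of raters and raise both bounds.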

Conclusions and future directions
This multi-faceted and collaborative methodology is perhaps the most comprehensive attempt to date to identify, evaluate, and synthesize DIS instruments. Moving forward, we will review the literature as it is published so that the repository evolves with developing research, hence the need for a website platform. We have assigned a research assistant to review the Implementation Network monthly e-newsletter for additional relevant instruments. In addition, we will set Google Scholar alerts based on our search strings so that newly published research can be reviewed weekly and relevant instruments and literature can be added to our database.
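The weekly alert review described above amounts to screening new records against the existing database. The following is a minimal sketch that assumes hypothetical `Record` objects with a title and an optional DOI. A plain substring match on construct terms stands in for the project's actual search strings, and duplicates are detected by DOI or normalized title. How alert records are retrieved is outside the scope of the sketch.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record from an alert or from the database."""
    title: str
    doi: Optional[str] = None


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def screen_new_records(incoming, existing, search_terms):
    """Return incoming records that mention a search term and are not already stored.

    A record is treated as a duplicate if its DOI (case-insensitive) or its
    normalized title already appears in `existing` or earlier in `incoming`.
    """
    known_dois = {r.doi.lower() for r in existing if r.doi}
    known_titles = {normalize_title(r.title) for r in existing}
    terms = [t.lower() for t in search_terms]
    kept = []
    for rec in incoming:
        title_key = normalize_title(rec.title)
        doi_key = rec.doi.lower() if rec.doi else None
        if (doi_key and doi_key in known_dois) or title_key in known_titles:
            continue  # already in the database (or earlier in this batch)
        if not any(term in rec.title.lower() for term in terms):
            continue  # does not match any construct term
        kept.append(rec)
        known_titles.add(title_key)
        if doi_key:
            known_dois.add(doi_key)
    return kept


# Illustrative usage with made-up records.
existing = [Record("Measuring acceptability of interventions", "10.1000/abc")]
incoming = [
    Record("Measuring Acceptability of Interventions!"),    # duplicate title
    Record("A new appropriateness scale", "10.1000/XYZ"),   # kept
    Record("Unrelated cardiology trial", "10.1000/q1"),     # no construct term
    Record("Another appropriateness scale", "10.1000/ABC"), # duplicate DOI
]
new = screen_new_records(incoming, existing, ["acceptability", "appropriateness"])
print([r.title for r in new])
```

Checking DOIs first and falling back to normalized titles catches most re-announced articles. Records without a DOI whose titles changed between preprint and publication would still need manual review.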
Endnotes
a. Implementation science refers to the scientific study of strategies used to integrate evidence into real-world settings [27]. Implementation practice is the act of integrating evidence into real-world settings [28]. In this project, instrument refers to quantitative tools, surveys, or measures that can be administered to individuals to obtain perspectives or information regarding their experience. Psychometric properties refer to outcomes of psychological testing of an instrument that reflect how well it measures a construct of interest with respect to reliability and validity.