An open toolkit for tracking open science partnership implementation and impact

Serious concerns about the way research is organized collectively are increasingly being raised. They include the escalating costs of research and lower research productivity, low public trust in researchers to report the truth, lack of diversity, poor community engagement, ethical concerns over research practices, and irreproducibility. Open science (OS) collaborations comprise a subset of open practices, including open access publication, open data sharing and the absence of restrictive intellectual property rights, with which institutions, firms, governments and communities are experimenting in order to overcome these concerns. We gathered two groups of international representatives from a large variety of stakeholders to construct a toolkit to guide and facilitate data collection about OS and non-OS collaborations. Ultimately, the toolkit will be used to assess and study the impact of OS collaborations on research and innovation. The toolkit contains the following four elements: 1) an annual report form of quantitative data to be completed by OS partnership administrators; 2) a series of semi-structured interview guides for stakeholders; 3) a survey form for participants in OS collaborations; and 4) a set of other quantitative measures best collected by other organizations, such as research foundations and governmental or intergovernmental agencies. We opened our toolkit to community comment and input. We present the resulting toolkit for use by government and philanthropic grantors, institutions, researchers and community organizations with the aim of measuring the implementation and impact of OS partnerships across these organizations. We invite these and other stakeholders not only to measure, but also to share the resulting data so that social scientists and policy makers can analyse the data across projects.


Introduction
For the most part, people live in the safest, healthiest, richest and most democratic period in history (Roser, 2018), partly due to the ability to secure clean water, deliver vaccines, institute the rule of law, and develop ideas of equality and democracy. Despite this, there are rising concerns about the way research is collectively organized, ranging from its escalating cost and lower research productivity (DiMasi et al., 2016; Munos, 2009; Pammolli et al., 2011), to low public trust in researchers to report the truth even if against the interests of sponsors (American Academy of Arts & Sciences, 2017), a lack of diversity among the players involved in the research enterprise and poor community engagement (Puritty et al., 2017; Valantine & Collins, 2015), and a research culture that, among other things, gives researchers incentives to publish over producing quality research, leading to questionable research practices and irreproducibility (Begley & Ellis, 2012; Nosek et al., 2012; Open Science Collaboration, 2015). Researchers, public research organizations, firms, governments, funders and society more broadly are adopting or supporting open science (OS) practices and OS partnerships to address these concerns (Ali-Khan et al., 2018b; Dai et al., 2018). These practices include open access publication, the sharing of data, open software and infrastructure, preregistration, and the avoidance of restrictive intellectual property. Informed by shared principles and values, these practices aim to reduce transaction costs, promote data re-use, increase rigor and reproducibility, decrease redundant research, better involve patients, consumers and others, facilitate researcher transparency in sharing processes and results, and improve connections with a larger variety of actors to produce more innovative approaches and solutions over the medium to long term (Gold, 2016; McKiernan et al., 2016). Nevertheless, there exists no single standard for OS, with the result that different organizations, governments, and firms apply OS as a label to their own favored set of practices.
This article contributes to the OS discussion by proposing the creation of an open toolkit and data set, based on internationally developed and open measures, to provide an evidence base through which we can collectively determine if, how, when, and where partnerships based on OS principles and practices can contribute to social and economic welfare in general and research and innovation (R&I) in particular. We derived the toolkit principally from our knowledge of the life sciences, but with input from other fields such as information technology and artificial intelligence. Already, the Structural Genomics Consortium (SGC) and the Montreal Neurological Institute (MNI) have agreed to use the toolkit to collect and share data. Acknowledging the different definitions of OS, we set out to measure participation in particular practices rather than determine which set of practices constitutes OS.

OS Partnerships
While there are different ways of implementing OS, we focus on partnerships (OS partnerships) in which all partners agree to comply with OS practices in conducting their joint work. Public entities, either with other public institutions or jointly with private firms, can create these partnerships by using and combining the policies, contracts, and infrastructure of institutions to increase knowledge flow and reduce redundancy (Fecher & Friesike, 2014). Relevant public institutional policies include conditions for tenure and promotion, research grant practices, sharing by default, preregistration of studies and analysis plans, the avoidance of intellectual property rights that prevent sharing, patient consent, continuing education and training, and publication and data release (Australian National Data Service, 2017). Contracts relate to standardized forms for material transfer, sponsorship, partnership agreements and subject participation. Institutional infrastructure comprises personnel and the physical and electronic infrastructure that support the immediate, free and usable sharing of data, software, policies, and practices (Gold, 2016).
Through these policies, contracts, and infrastructure, those pursuing OS partnerships aim to increase efficiency and reproducibility, and inspire discovery and innovation (Ali-Khan et al., 2017). Two Canadian institutions are prominent exemplars of OS private-public biomedical partnerships: the SGC and the MNI (Dolgin, 2014; Edwards et al., 2009; Poupon et al., 2017). These build on years of open source, open access, and open data partnerships in projects such as Linux, the Apache HTTP Server Project, the Human Genome Project, the SNP Consortium (Thorisson & Stein, 2003), and the Open Source Malaria Project, all of which have delivered significant advances in technology and knowledge.

Amendments from Version 1
This revision responds to the comments of both reviewers. Mindful that, given the method of bringing together a large group of stakeholders, it would not be possible to make substantive changes requiring a new consensus, we have focused on responding to points where there is a lack of clarity or there is an omission in the text.
Changes to the Introduction respond to some of the specific questions raised by Reviewer 1 and include an additional citation to other work on open science. While we agree that Reviewer 1 raises interesting and important issues, we cannot resolve them at this stage as they would require a new consensus step. As Reviewer 1 points out, however, this is a living document and we expect these issues to feed into future revisions of the toolkit.
We have revised the article to make clear in the Introduction and Conclusion, as Reviewer 2 requested, that the toolkit is based on our experience chiefly in the life sciences but with input from other fields. We have also added to paragraph 2 of the Introduction an acknowledgement that open science includes open software and infrastructure. We have added a better explanation of why Toolkit D cannot be integrated into Toolkit A or B in the Results section. We did add, as suggested, cross-references from Toolkit D to Toolkits A and B. We added a question (Question 1) to Toolkit A regarding open science mandates. As the purpose of proposing the toolkit is to provide evidence of the outcome of following an open science approach over non-open approaches, we fully agree with Reviewer 2's penultimate point.

Despite these successful partnerships, many public research organizations, government policy-makers, researchers, and firms remain uncertain about the costs and benefits of OS and their distribution among stakeholders (Dai et al., 2018). The lack of evidence concerning costs and benefits, as well as attitudes and experience, hinders the experimentation with OS partnerships upon which to build theory around OS and R&I systems (Ali-Khan et al., 2018b).
To overcome this lack of evidence, we propose here a measurement toolkit to spur understanding of OS partnerships, their effects and characteristics. The toolkit consists of measures through which to collect data to be reported annually, interview guides for semi-structured interviews, sample surveys to assess implementation of OS practices, and other measures that can be collected by or for OS and non-OS partnerships. These shared quantitative and qualitative data are based on a common coding framework (see the Measurement Toolkit below). The policies covered include communication, patient and public involvement and engagement, intellectual property management, promotion and peer review criteria, skill development and training, sharing, and commercialization models. We propose that the toolkit and the resulting data resource be adopted as a community-managed, open resource around the globe.
A critical contribution of this article is to propose that prospective data on OS partnerships be collected and shared. A prospective approach will strengthen the quality of the data and move us beyond the more common retrospectively created data sets that inevitably leave theoretical holes, rely on surrogate measures, lack historical context, and result in incomplete data sets (Kemp & Prasad, 2017;Schwartz & Sichelman, 2017). The measurement toolkit will enable prospective collection and sharing of data on OS partnerships. As such, this measurement toolkit will provide richer, more in-depth and harmonized data to better study OS partnerships. With greater knowledge of how these partnerships contribute to R&I, we envision that policymakers and researchers will devise better indicators of success for particular projects or funding programs.
The measurement toolkit was created with quantitative measures and qualitative approaches that research organizations participating in OS and non-OS partnerships could implement for collecting data about their collaborations. Here, we describe how we created these measures through a collaborative process drawing on the expertise of various stakeholders, including researchers, publishers, and funders. We begin with a literature review outlining the rationale for our methodology and our conceptual approach. We then describe the development of the measures. We end with a call to the larger community to comment upon and improve the proposed measures and to begin implementing them.

Literature review
Previous studies have focused more on the practice and implementation of OS and less on the measurable effects that OS may have on better engagement, research efficiency, communications, and priority setting, as well as new delivery mechanisms and new products and services (Jones et al., 2014; National Academies of Sciences, Engineering, and Medicine, 2018; Tripp & Grueber, 2011). For example, some initiatives present both quantitative and qualitative indicators to track openness and transparency in publication and data sharing (Smith, 2017; Smith et al., 2016) and stakeholder understanding and engagement with OS (Ali-Khan et al., 2017; Tuomi, 2016). Other studies developed indicators to investigate how organizations implement OS (Lampert et al., 2017; Nosek et al., 2015; Smith, 2017; Smith et al., 2016; Tuomi, 2016), and a few studies have evaluated the implementation or impact of specific OS policies or practices (Hardwicke & Ioannidis, 2018; Kidwell et al., 2016). Our project differs from these studies by developing more comprehensive measures of social and economic influence, research outcomes, diversity and inclusion, trust, and opportunities for youth and early career researchers. Our measures aim to facilitate researchers' understanding of the nature and extent of the impact of OS.
In addition to earlier studies on OS, other studies have proposed measures of innovation in general, such as the OECD's Oslo Manual (OECD/Eurostat, 2005). These measures, however, do not evaluate the relationship between OS partnerships and outcomes. Further, many of these measures are ad hoc to the specific studies and based on retrospectively created data sets, limiting their use in more generic contexts. Finally, these measures tend to focus on firms using proprietary models, such as open innovation and closed/semi-closed partnerships (the Community Innovation Surveys (Mairesse & Mohnen, 2010); OECD and World Bank innovation indicators; OECD innovation scoreboards (OECD, 2010; OECD, 2017); and the Global Innovation Index (Cornell University et al., 2017)).
Our aim in this article is to propose measures that enable hypothesis-driven research on the influence and impact of OS partnerships on a variety of social and economic outcomes, as well as on research culture, rigor, diversity, social capital and patient and consumer voice. The set of measures we propose establishes a global basis for collecting and sharing data and will not only accelerate our collective understanding of OS, but also provide support and evidence to those contemplating, implementing or monitoring the effects of OS partnerships.

Methods
We draw on existing methodologies, with the modifications that we discuss below, to develop the set of measures in the proposed measurement toolkit. In particular, we examine the literatures on evaluation of projects, programs, and knowledge transfer. We adopted a three-stage knowledge exchange process to facilitate our development of the toolkit.
The first body of literature assesses whether projects or programs have achieved their anticipated outcomes. This literature relies on logic models to track whether those partnerships deliver outputs that, over the medium and long terms, produce the outcomes promised by those who established the partnership. There are two reasons why logic models are inappropriate for the creation of the measurement toolkit and the set of measures we propose here. First, as noted, logic models are rigid in that they focus on anticipated outcomes within a model rather than exploring foundational questions (Cooksy et al., 2001;Treasury Board of Canada Secretariat, 2012). This narrow focus on anticipated outcomes leaves aside effects that "can be realized by paths other than those presumed by program theory" (Weiss, 1997). Second, we aim for the toolkit to aid in developing theory rather than applying an established theory. As Weiss notes, "if theory is taken to mean a set of highly general, logically interrelated propositions that claim to explain the phenomenon of interest, theory-based evaluation [i.e., a logic model] is presumptuous in its appropriation of the word." Weiss writes that logic models derive from an established theory to evaluate whether anticipated outputs actually result from undertaken activities, but not to develop the theory itself (Weiss, 1997).
Although we do not use formal logic models, we nevertheless acknowledge the importance of developing measures that correspond to potential influences and impact of OS partnerships on R&I systems, diversity, social capital and other critical outcomes. We thus constructed a set of potential hypotheses concerning the influence of OS partnerships, without attempting to eliminate contradictions or alternative pathways. We employed a method of knowledge exchange through which stakeholders come together to identify research questions, jointly construct the measures, collect data and share and analyse that data. In such a method, stakeholders collectively refine knowledge (hypotheses and measures) iteratively until "only the most valid and useful knowledge is left" (Graham et al., 2006). By ensuring a diversity of perspectives in co-creating the set of hypotheses, this process also increases communication and the likelihood of research uptake (Kothari et al., 2011).
We are aware that previously developed measures to describe certain environments have become prescriptive rather than descriptive, often without sufficient analysis of how metrics can establish perverse incentives and perverse side effects (Cain et al., 2005). For example, the use of patent counts and promised licensing revenues from university technology transfer changed from a useful means of comparison to an output measure of performance (Kim et al., 2008). Such practices often lead universities to over-patent and engage in poor licensing practices (Ryan & Frye, 2017). Using descriptive measures, such as the number of patents held, as targets rather than as a snapshot of current activities also raises significant ethical concerns over the use and dissemination of measures. These concerns can be partially countered by proposing a large enough set of measures to make it difficult to cherry-pick only a handful of measures that can be gamed. Further, combining quantitative and qualitative measures also reduces the risk of gaming.
We recognize that it is difficult to track causal links between phenomena and ultimate impact (Council of Canadian Academies, 2013). Beyond the difficulties in establishing causation, OS practice varies based on the setting, problem, available resources and stakeholders. Additionally, internal and environmental features can also lead to multiple pathways and interactions between measures and impacts. Some of these features are difficult to capture, including informal knowledge transfer, relationship building, trust and education of new trainees and expert personnel (Nicol, 2008). Instead, we expect relationships between OS practices and outcomes to take the form of a contribution chain that acknowledges influence, but shies away from claiming causation.

A three-stage process
We adopted a three-stage process to implement the knowledge exchange. First, we developed a working definition of OS partnerships based on a review of the literature and of partnerships that consider themselves to be open science. Second, we convened global stakeholders in Washington, DC in October 2017, to map out the ways in which OS partnerships might influence innovation and social and economic outcomes. Third, drawing on these influences and potential outcomes, we brought together experts in measurement, evaluation and empirical studies from a variety of disciplines and countries to develop a prospective set of measures that we propose OS partnerships around the world use to construct data sets.

Stage 2
The global stakeholders we convened in the second stage in Washington, DC in October 2017 included thought-leaders from developed and developing nations, intergovernmental organizations, researchers, governments, science agencies, funders, members of the philanthropic sector, patient organizers, and members of the biotechnology, pharmaceutical, and artificial intelligence industries (see extended data, Supplementary File 4 (Gold, 2019) for a list of participants). After presenting our definition of open science and discussing the example of the MNI, stakeholders engaged in a series of facilitated discussions asking what success of OS means from the point of view of researchers, governments, industry, philanthropies and patients. The organizers then summarized these discussions and presented them back to the group for further discussion and elaboration. Ali-Khan et al. (2018a) summarized those discussions, obtained feedback from participants, and published the results. Through these iterative discussions, stakeholders collectively mapped out the different ways that OS partnerships might contribute to innovation and to desired or feared social and economic outcomes. Examples of the jointly-created hypotheses included the following: 1) that OS partnerships would simplify and thus increase exchanges of students and postdoctoral fellows between university and industrial labs; 2) that students practicing OS would be hindered in the transition to tenure-track positions by lacking a private data set with which to found their own labs or, alternatively, would benefit from exposure to a larger network of investigators; and 3) that OS partnerships would increase the quality of data by encouraging researchers to place more emphasis on data quality and reproducibility prior to public exposure or, alternatively, would decrease the quality of data due to the desire and facility of quickly publishing work to establish priority.
As these examples illustrate, stakeholders understood the relationship between OS, research, innovation, communities and the public to be complex, and explored different, sometimes contradictory, hypotheses in order to generate, in the third stage, a set of prospective measures that would allow researchers and stakeholders to investigate that relationship. We published the results of that meeting and proposed seven overarching themes for further exploration as follows: 1) Increased quality and efficiency of scientific outputs; 2) Accelerated innovation and impact; 3) Increased trust and accountability of the research enterprise; 4) Increased equity in research; 5) Better opportunities and recognition of early career researchers and youth; 6) Positive economic impact; and 7) Implementation success (Ali-Khan et al., 2018b).

Stage 3
At the third stage, we assembled a group of global experts across diverse fields, including innovation measurement and policy, law, public engagement, bibliometrics, economics, business and sociology, in London, UK in May-June 2018 to develop a set of measures to underpin the development of the prospective measurement toolkit (see extended data, Supplementary File 5 (Gold, 2019) for a list of participants). To provide continuity, we included some participants from the Washington Forum in this workshop. Most participants, however, were new, selected to bring in individuals with different expertise as well as those involved in other major OS measurement and standard-setting initiatives. The latter included individuals who had worked on the European Commission (EC) OS Monitor, the RAND SGC analysis (Jones et al., 2014), the EC Expert Groups on Indicators and FAIR Data, the TOP Guidelines and the Metric Tide (Wilsdon et al., 2015). We included these individuals to promote alignment and complementarity between our proposed measures and toolkit and other global OS measurement initiatives.
The goal of this third-stage workshop was to generate prospective measures based on the seven themes produced at the first workshop (Graham et al., 2006). Matching the hypotheses generated in the first workshop to measures enables the testing of hypotheses about the influence of OS partnerships (Canadian Academies of Health Sciences, 2009; Tracz & Lawrence, 2016). Accordingly, we organized participants into groups corresponding to the seven themes identified in the first workshop. These groups developed working documents with a mixture of quantitative measures (e.g., counts, revenues, patents, students, survey results) and qualitative instruments (principally semi-structured interview guides) to provide a nuanced set of data through which to study OS partnerships (see extended data, Supplementary File 6 (Gold, 2019)).
Following the third-stage workshop, we reviewed and organized the proposed measures. We eliminated duplicate measures and put aside for future work those that were missing critical information (e.g., a data source, coding frame, or clear connection to a hypothesis). We sorted (and in some cases adapted to fit a partnership context rather than a country or region) those measures that could be implemented in the study of individual OS partnerships from those that related to general environmental conditions, such as overall government funding or education levels. We also recorded measures proposed at the workshops that were specific to countries or to particular databases (e.g., databases of academic articles such as PubMed or Web of Science), or that would require the state to compel information disclosure (e.g., by governmental statistical agencies). Finally, we pre-published the measures on the Gates Open Research platform as a document (Gold et al., 2018) and solicited comments on them from the general community for several months. We revised the measures in light of those comments.
We leave these to others to expand and potentially implement in other contexts. We present our outcomes below.

Results
The outcome is a set of measures that can be collected about OS and non-OS partnerships, and potentially about individual institutions or projects that agree to participate, with the resulting data shared openly. These data will not only create a baseline for analysis but will provide insight into the evolution of research and innovation practices. We divided the measures into separate instruments based on the nature of the measures (quantitative or qualitative) and the source of the data (participants in the partnership, a social science group observing the partnership, or another entity). The seven themes we identified cut across these categories, making them less relevant as an organizing framework for these instruments; nevertheless, we preserved the underlying hypotheses, themes and working group information as metadata to document their origin (see extended data, Supplementary File 3 (Gold, 2019)).
The measures include the following components:

Toolkit A: A form of annual report of quantitative data related to the partnership, such as publications and data sets (including their persistent unique identifiers such as DOIs), number of students, student employment post-graduation, authorship, investments, etc.

Toolkit B: A series of semi-structured interview guides to better understand norms, attitudes and understanding across the spectrum of stakeholders involved in the partnership (e.g., do you feel that you derive benefit from your participation in the OS collaboration? What challenges and opportunities does OS present for your business?).

Toolkit C: A form of survey to identify implementation of OS practices within the partnership.

Toolkit D: A select number of other quantitative measures that require expertise in advanced social-science methods and so cannot reasonably be included as part of the annual report in Toolkit A. These include, for example, measures that require linking publications with citations in the academic, grey or patent literatures. We expect teams external to the collaboration (or a distinct unit of the collaboration) to collect these data and share them.
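To make the intended data flow concrete, the sketch below shows one way a Toolkit A record might be represented in machine-readable form, with persistent identifiers on each output and provenance metadata linking back to the seven themes. This is a minimal illustration in Python; the field names and structure are our own assumptions, not a schema prescribed by the toolkit.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class OutputRecord:
    """One research output reported under Toolkit A (illustrative fields)."""
    output_type: str    # e.g. "publication" or "dataset"
    persistent_id: str  # persistent unique identifier, e.g. a DOI
    title: str
    open_access: bool   # released under open terms?

@dataclass
class AnnualReport:
    """A Toolkit A annual report record (illustrative, not prescribed)."""
    partnership: str
    reporting_period: str
    outputs: List[OutputRecord] = field(default_factory=list)
    # Provenance metadata, echoing how the underlying themes and hypotheses
    # were preserved as metadata (see Supplementary File 3).
    themes: List[str] = field(default_factory=list)

report = AnnualReport("Example OS Partnership", "2018",
                      themes=["Accelerated innovation and impact"])
report.outputs.append(OutputRecord(
    output_type="dataset",
    persistent_id="10.5281/zenodo.0000000",  # placeholder DOI
    title="Example shared data set",
    open_access=True,
))
print(len(report.outputs))
```

Structuring the annual report this way would let data from different partnerships be pooled and compared without manual re-coding, which is the point of the common coding framework described above.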
Beyond this set, we identified a non-exhaustive set of measures that can be best implemented by governments, intergovernmental organizations, research funders, agencies, or database owners that are not specific to any one OS partnership (see extended data, Supplementary File 1 (Gold, 2019)). Finally, we recorded incomplete and rejected measures so that the community may draw on these in the future (see Supplementary File 2 (Gold, 2019)).
The measures we propose are in plain language and are user-friendly, in conformity with best knowledge dissemination practice, thus encouraging user uptake (Kothari et al., 2011). We include definitions, data sources and coding rules, in addition to tracing how we developed each measure and the underlying hypotheses that led to it.
In accordance with good practice, the measures we propose are designed to be transparent and clear in their coding. We also aimed for the necessary data to be cost-effective and easy to collect across a spectrum of OS partnerships. As noted in the Methods section, we combined qualitative assessments with quantitative evaluations. By publishing these measures, definitions and instruments on an open platform that allows comment, transparent updating and review, we have created the opportunity to continuously update the measures, introduce new ones and retire those that prove difficult to collect or share in practice (Wilsdon et al., 2015).

Discussion
We developed the set of measures proposed in this article as a necessary step towards the construction of a global measurement toolkit on OS partnerships, which we see as key to understanding changing research and innovation environments and the role and impact of OS in particular. We anticipate that partnerships around the world will collect and share data on OS practice and outcomes by drawing on our measures. The resulting measurement toolkit will provide researchers with the ability to validate data and improve the toolkit, and to test hypotheses so as to develop a grounded theoretical understanding of the contributions, positive and negative, of OS partnerships to research, innovation and social and economic life. Stakeholders can also draw on the data to better understand their own organizations and operations. Decision-makers in government, industry, universities and community groups will be able to draw on this learning to structure future OS partnerships and to eventually develop logic models through which to assess particular partnerships.
The economic and social influence of OS partnerships may take years to materialize and may be subject to a plethora of diverse influences. While we recognize that OS successes do not happen in a vacuum, careful empirical analysis of OS will nevertheless help researchers identify key determinants of values and benefits of OS. This will allow the community to propose mechanisms to enable OS practice and to define the contribution chain between OS activity and outcomes.
We acknowledge certain limitations to the measures we propose and call on other researchers to investigate and propose improvements. First, while our stakeholders included individuals and institutions from developing countries, data for some of the measures will be easier to collect and most relevant to partnerships in industrialized countries. This is because data sources will likely be more available in industrialized countries and sharing mechanisms, motivations, and barriers to implementation may differ across countries. Specifically, we recognize that data collection in lower-income countries is constrained by lack of resources, weaknesses in institutional organization, and inability of governments and organizations to collect reliable and appropriate data (Elahi, 2008). Further research is needed to determine the suitability of our proposed measures, to propose additional measures and to investigate ways to access data sources. Second, we derived the indicators predominantly (but not exclusively) from experience with the life sciences, with a particular focus on biomedical science. Whether these indicators are as suitable to other fields such as nanotechnology, information technology, health system analysis, environmental sustainability, arts (digital, visual or performance), agriculture, or history, for example, needs to be investigated.
Finally, to mitigate the dangers of misuse of the measures and their associated data, we encourage those who are using the measures to use them openly and transparently. By doing so, the community can better monitor use of the measures and quickly respond with any concerns arising from their use.

Conclusion
Measuring the influence of OS partnerships is important to improving R&I systems because a deeper understanding of OS influence will reduce uncertainty about the relative benefits, positive impacts, and negative impacts of OS partnerships. This uncertainty manifests itself in several ways: in a lack of trust in open and public scientific knowledge generation, in a lack of policy frameworks in some countries, in inertia within public research organizations, and in a failure of researchers, public research organizations, communities, or firms to experiment with OS partnerships.
Implementing the set of proposed measures will lead to a data resource that aids in understanding the role of OS partnerships in R&I systems. This data resource might encourage the establishment of OS partnerships by mitigating the uncertainty surrounding them, contributing to a better theoretical understanding of OS, and encouraging a shift towards more openness and inclusivity in science. To fully realize this understanding, diverse communities will need to investigate the benefits and drawbacks of OS approaches using such evidence-based metrics. By doing so, communities can generate an evidence base regarding the beneficial impacts and drawbacks of OS, and share the data openly as research data. The data therefore should be FAIR (findable, accessible, interoperable and reusable), and "as open as possible but as closed as necessary" (European Commission, 2016). In order to build a comprehensive data set, it would be advantageous for OS partners to share annual reports, conduct semi-structured interviews, and administer the proposed survey at least once every two years. Ideally, we envision that stakeholders will develop an OS partnership that will act as a repository for the data, curate that data, share it, and periodically revisit and update the measures we propose here. Both the SGC and the MNI have agreed to do so; we invite and welcome other stakeholders to share their data sets should they be willing.
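As a rough indication of what FAIR sharing could look like in practice, a deposited data set might carry descriptive metadata along the following lines. This is a sketch under our own assumptions; the toolkit does not mandate a particular metadata schema, and fields such as the license are illustrative choices.

```python
# Illustrative FAIR-style metadata for a shared toolkit data set.
# The schema and field values are our own sketch, not prescribed by the toolkit.
deposit_metadata = {
    "identifier": "10.5281/zenodo.0000000",  # findable: a placeholder DOI
    "title": "OS partnership annual report data, 2018",
    "access": "open",                        # accessible: open by default
    "format": "text/csv",                    # interoperable: a standard format
    "license": "CC-BY-4.0",                  # reusable: explicit reuse terms
    # "as closed as necessary": consent-limited material stays restricted
    "restricted_parts": ["raw interview transcripts"],
}
print(deposit_metadata["identifier"])
```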

Measurement Toolkit
Foreword

This document sets out the measurement toolkit developed in An Open Toolkit for Tracking Open Science Partnership Implementation and Impact in order to build a data resource through which to study and, with that knowledge, build assessment tools for open science collaborations. We recommend that partnerships complete and share the results of the Annual Report (Part A) on a periodic basis, which we suggest be once per year. A group independent from the collaboration's management, to ensure confidentiality of results, ought to administer the semi-structured interviews (Part B) to a representative sample of stakeholders each period. We suggest that the collaboration administer the survey (Part C) at the beginning of the collaboration and periodically thereafter. Finally, we suggest that either the collaboration's administration or an independent group develop the measures in Part D during the same period as the annual report and after having been given access to its results.
We envision that this toolkit will be implemented through information technology, rather than through manual data entry, with standard nomenclature (e.g., for department and institution names). Two OS organizations, the Structural Genomics Consortium and the Montreal Neurological Institute, have agreed to draw upon the toolkit to collect and share data.
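A minimal sketch of what such machine-assisted entry might involve: free-text institution names are normalized against a controlled vocabulary before being stored, so that reports from different partnerships remain comparable. The lookup table below, and the idea of attaching registry identifiers to each canonical name, are our own illustrative assumptions rather than requirements of the toolkit.

```python
# Normalize free-text institution names to a controlled vocabulary.
# The lookup table and identifiers below are illustrative placeholders.
CANONICAL_INSTITUTIONS = {
    "structural genomics consortium": ("Structural Genomics Consortium",
                                       "ROR-ID-PLACEHOLDER-1"),
    "montreal neurological institute": ("Montreal Neurological Institute",
                                        "ROR-ID-PLACEHOLDER-2"),
}

def normalize_institution(raw_name: str):
    """Return (canonical name, identifier), or raise if unrecognized."""
    key = raw_name.strip().lower()
    if key not in CANONICAL_INSTITUTIONS:
        raise ValueError(f"Unknown institution: {raw_name!r}; "
                         "add it to the controlled vocabulary")
    return CANONICAL_INSTITUTIONS[key]

print(normalize_institution("  Structural Genomics Consortium "))
```

Validating names at entry time, rather than cleaning them afterwards, is what makes pooled analysis across partnerships feasible.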

Toolkit A: Open Science Collaboration Annual Report
Section One: Identity of Partners

1. List the principal academic, community, industrial and governmental partners of the collaboration for the reporting period. For each partner, provide the following details:

6. List any project in the reporting period from question (2) which did not yet result in a publication or in a published data set listed in questions (4) or (5).
Section Three: Measure of Scale

7. List all external awards, prizes and grants that recognize or directly support OS that were awarded or granted to researchers in the collaboration during the reporting period. For each of these awards, prizes or grants, provide the following details:

17. List all current financial or in-kind contributions to the collaboration by industry or philanthropy during the reporting period other than those listed in (7).

23.5. Non-academic researchers, community scientists.

Toolkit B: Semi-Structured Interview Guides
Description

This is a semi-structured interview guide meant to be administered annually by open science (OS) collaborations. The purpose of the interview guide is to gather substantive qualitative measures of the benefits and costs of OS. The guide is designed to cover a wide set of OS stakeholders, including full-time academic staff, early career researchers, individuals from the private sector, research participants, and ethics review board members and/or administrators. The interview results will be used for a variety of purposes, including, at an aggregated level, assessing the OS partnership, studying OS partnerships in general, and assessing quantitative measures of OS impact.

General Instructions about Consent and Meeting Research Ethics Requirements
Please ensure that, in addition to obtaining consent for use of the raw data by those administering the survey and for sharing anonymized or aggregated data generally, the raw data can be shared with other groups who are operating under a similar protocol and who have obtained ethics approval, even if these other groups are in a different jurisdiction. Also ensure that the nature of the ethics approval and the process that led to it are documented as openly as possible.

Toolkit C: Survey for Measurement of Open Science Engagement

Description
Open science (OS) collaborations aim to reduce transaction costs, increase sharing, and build better connections with communities. This survey is designed to identify best practices for these collaborations and to assess the ways in which the collaboration is open.

General Instructions for Selecting Survey Participants
Administer the survey to a representative sample of individuals at stakeholder organizations within the collaboration.
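One simple way to draw such a sample, sketched below, is to stratify by stakeholder group so that every group is represented and then sample within each stratum. The group names, roster and sample size are placeholders; the toolkit does not prescribe a sampling procedure.

```python
import random

# Illustrative roster; in practice this comes from the partnership's records.
participants = {
    "academic staff": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "early career researchers": ["e1", "e2", "e3", "e4"],
    "private sector": ["p1", "p2", "p3"],
    "research participants": ["r1", "r2", "r3", "r4", "r5"],
}

def stratified_sample(groups, per_group=2, seed=42):
    """Sample up to `per_group` individuals from each stakeholder group."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    return {group: rng.sample(members, min(per_group, len(members)))
            for group, members in groups.items()}

print(stratified_sample(participants))
```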
1. Do you believe these things are beneficial? Click all that apply.

Response scale: Always / Partly / Never

Items include: Open Research Grant Application

Toolkit D: Additional Measures of Open Science
We list here measures that require further analysis, such as identifying the citations (including in patents) to outputs. As explained below, the list that follows requires expansion.
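As a hedged illustration of the kind of external analysis Toolkit D envisions, the sketch below retrieves the citation count that Crossref records for a given DOI via its public REST API. This is only one possible starting point: Crossref counts cover only Crossref-indexed citations, and linking outputs to the grey or patent literatures would require other data sources. The toolkit does not prescribe a particular database.

```python
import requests  # third-party; pip install requests

def crossref_citation_count(doi: str) -> int:
    """Return the citation count Crossref records for a DOI.

    Uses the public Crossref REST API; counts reflect only
    Crossref-indexed citations, not the grey or patent literatures.
    """
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    return resp.json()["message"]["is-referenced-by-count"]

# Example with a placeholder DOI (requires network access):
# print(crossref_citation_count("10.xxxx/example"))
```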

Paper summary:
The authors claim that Open Science practices have been introduced as a way to address concerns of trust in research. However, OS implementation lacks standards or even best practices, with the result that practitioners apply their own definition of OS. As a step forward, the authors present a toolkit to structurally collect and track over time information about OS partnerships, with the aim of assessing and studying the impact of such partnerships on research, society and innovation. The toolkit is the result of a three-phase process involving experts in the domain and collects: (i) quantitative measures collected by OS partnership admins every year; (ii) a semi-structured set of interviews for stakeholders; (iii) a survey of participants in the partnerships; and (iv) quantitative measures for Open Science. The set of measures proposed by the toolkit is the result of merging different understandings of OS, allowing participation in particular practices to be tracked and measured rather than defining a one-size-fits-all interpretation.
The exercise focuses on OS partnerships, understood as collaborations in which all participants adhere to OS practices. Contracts, policies and infrastructure must therefore find a "way through" to make science reproducible, increase efficiency, and foster innovation. Such virtuous exemplars are invited to use the toolkit to track quantitative and qualitative evidence of their OS practices, providing evidence of the advantages of OS to the organizations, policy-makers, funders and firms that are today skeptical about its cost-benefit trade-off.
Quite importantly, the acquisition of such data will allow funders, policy-makers, and researchers to define appropriate and better indicators of success in terms of OS practices and their scientific, social and economic impact.

Feedback:
Definition of Open Science: The definition adopted in this paper focuses on open access and sharing of outcomes such as literature and data. According to other interpretations, Open Science includes software as a distinct first-class citizen, beyond literature and data. Recent developments also show how "thematic services/infrastructures" (aka digital laboratories) are key for reproducibility and should be part of the "research outcome package". Semantic links between scientific products (article-data, article-software) are also extremely important and not always provided; the absence or obsolescence of such links compromises the reproducibility and discoverability of science. Moreover, I have also come to appreciate definitions of Open Science that include the notion of "open collaboration", which again implies a discussion on sharing and access rights, but also implies the adoption of methodologies and tools (e.g. Virtual Research Environments) that open the research life-cycle to collaboration while experiments are still being performed, and not just before or afterwards.
These observations do not represent reservations about the article, but they of course affect the kind of measures that are proposed and could/should be included in the toolkit. I see the toolkit as a very good milestone on the Open Science roadmap; I just fear it would be a missed opportunity not to include some key aspects in the current measures.

About measures
It is not very clear why the measures in Toolkit D about patents do not find a sub-section in Toolkit A about patent measures. For the aforementioned reasons, I would also have appreciated a section on open source and software publishing practices and related measures.

If Toolkit D is intended as a list of functions to be used to calculate indicators, then I would suggest defining an explicit link between the indicators in Toolkit D and the measures in Toolkits A, B and C. For example, social and economic indicators are key to "depict" the measures in Toolkit B, which are otherwise only expressed in heterogeneous narrative forms (still very important, but hard to process, evaluate and compare). Ideally, around the toolkits, scientists should initiate an iterative process of analysis of responses and identification of indicators to be added to Toolkit D.
The fact that the proposed measures were defined in the context of the life sciences is mentioned only as an aside in the Discussion section. I would say this is quite a key piece of information that should emerge sooner in the text (if not in the title/sub-title). It is also very important to acknowledge that Open Science cannot have ONE definition or ONE interpretation, as different disciplines have different research life-cycles and practices (the discussion is very similar to the one undertaken under the FAIR initiatives). In fact, the Toolkits may well introduce the notion of a "Community profile", intended as the specific set of measures that are of interest to given communities, to be identified and fine-tuned over time by means of the Toolkits. Such a community perspective could be a good incentive for using the Toolkits, which could become the means by which policies, standards, and best practices are collaboratively defined by research communities.

Others
To foster Open Science it is also important to provide evidence that its implementation leads to better, or at least equivalent, results when compared to non-Open Science collaborations, especially from the researcher's point of view. Are we aware of any similar toolkits, adopted in more traditional scientific settings, whose data can be used to compare the outcomes of OS and non-OS partnerships?
The impact of Open Access and Open Science mandates on OS partnerships is another important piece of evidence that could be collected but does not seem to be directly addressed by Toolkits A and B.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.

Author Response

Thank you very much for this excellent review and suggestions. On behalf of my co-authors, I am pleased that you find the article a useful step in the progress of understanding the effects of open science. As you note, we see our proposed toolkit as a first step that will need testing and elucidation over time. From my perspective, the toolkit represents a snapshot of thinking on open science that reflects a consensus among a particular group of actors. We anticipate that others will build on and refine it over time. In the meantime, we seek partnerships to begin collecting data using it. As other researchers add measures to the toolkit, we anticipate that those partnerships will collect data on those measures as well. It is nevertheless important that collaborations start collecting data immediately so that the entire community has access to the data.
Given this and the nature of the process we used, the article and this version of the toolkit are now fixed artifacts. We encourage other researchers to build on them and add additional measures and critique the measures we proposed. Thus, I will respond to your very helpful comments but will not alter the toolkit itself.
Here are my specific responses to your comments.

About measures:
Comment: It is not very clear why the measures in Toolkit D about patents do not find a sub-section in Toolkit A about patent measures.

Response: We could not reach consensus on the sources of data to use nor on whether the measures ought to be restricted to patents. We encourage others to further develop these measures so that they can be added at a later time. We agree that, once refined, they could be added to Toolkit A.

Comment:
For the aforementioned reasons, I would have appreciated a section on open source and software publishing practices and related measures.
Response: The article was focused on open science partnerships rather than publishing practices overall. I believe that the comments in your review complement the article well and encourage readers of the article to also read and consider your comments.

Comment:
If Toolkit D is intended as a list of functions to be used to calculate indicators, then I would suggest defining an explicit link between the indicators in Toolkit D and the measures in Toolkits A, B and C. For example, social and economic indicators are key to "depict" the measures in Toolkit B, which are otherwise only expressed in heterogeneous narrative forms (still very important, but hard to process, evaluate and compare). Ideally, around the toolkits, scientists should initiate an iterative process of analysis of responses and identification of indicators to be added to Toolkit D.

Response: As noted above, we were not able to reach consensus on the measures in Toolkit D. As they all need more work and could change significantly in the process, I suggest it is premature to link them to Toolkits A, B and C at this time.

Comment:
The fact that the proposed measures were defined in the context of the life sciences is mentioned only as an aside in the Discussion section. I would say this is quite a key piece of information that should emerge sooner in the text (if not in the title/sub-title). It is also very important to acknowledge that Open Science cannot have ONE definition or ONE interpretation, as different disciplines have different research life-cycles and practices (the discussion is very similar to the one undertaken under the FAIR initiatives). In fact, the Toolkits may well introduce the notion of a "Community profile", intended as the specific set of measures that are of interest to given communities, to be identified and fine-tuned over time by means of the Toolkits. Such a community perspective could be a good incentive for using the Toolkits, which could become the means by which policies, standards, and best practices are collaboratively defined by research communities.
Response: Perhaps we ought to have highlighted our focus on life sciences earlier, but our intention is to create a more general toolkit that can be used to compare effects across fields. We anticipate that different fields and communities will draw on the toolkit to develop community-specific measures. However, we urge all communities to collect data on as many of the measures as possible so that there exists a common set of measures that allows for comparison.

Others:
Comment: To foster Open Science it is also important to provide evidence that its implementation leads to better, or at least equivalent, results when compared to non-Open Science collaborations, especially from the researcher's point of view. Are we aware of any similar toolkits, adopted in more traditional scientific settings, whose data can be used to compare the outcomes of OS and non-OS partnerships?

Response: We conducted an extensive literature search prior to preparing the article, drawing on the expertise of all those who attended the workshops. We identified some measures, as noted in the article, that we brought into the toolkit, but none of the existing sets of measures was sufficient for our purposes in its existing form.

Comment:
The impact of Open Access and Open Science mandates on OS partnerships is another important piece of evidence that could be collected but does not seem to be directly addressed by Toolkits A and B.
Response: We looked at actual practices within the partnerships rather than the reasons that motivated adoption of those practices. We sought to encourage the collection of data on those actual practices. To the extent that a mandate was adopted formally or informally within a partnership, it is captured by the detailed questions in Toolkit C.

Competing Interests: No competing interests were disclosed.
Reviewer Report, 04 September 2019
https://doi.org/10.21956/gatesopenres.14064.r27758

... 'developing world' that is similar; perhaps this is even worth commenting on and highlighting as something to be actively aware of in the future.
Was there any discussion too about potential sources of bias in the datasets? For example, Web of Science is mentioned, but is again known to be heavily biased against research from the developing world. If this toolkit is meant to be a global one, then I feel some mention of this is needed.

Results:
Generally, I think the 4 different toolkits are very well thought out and designed.
Is the toolkit only available in English? I think you know where I'm going with this.
Discussion: I'm glad to see that the authors want to see the toolkit revised. I think this is important, especially given the dynamic nature of open science at the present (and presumably the future).
The mention here that this is geared towards the life sciences needs to be made much clearer and earlier on, I feel. It popped out as a surprise here!

Conclusion: Do you think that maybe government funders, publishers, researchers and librarians should all come together to build something like a shared open data infrastructure for open science? Not one that is closed, proprietary, and based on heavily biased data sources like some other services. I think this would be vastly superior and key, as such data could have potentially great uses beyond simply evaluation.
Other general thoughts: Measure all the things! Reading through this, and looking at the four main elements, pretty much all of this is about data gathering and surveying open science collaborations. I wonder, do the authors feel this might have any unintended consequences? Administrators love metrics, and I wonder whether, for example, such data could be used in potentially deleterious ways to impact researchers, their work, or careers. Alternatively, if Goodhart's Law comes into practice, such that researchers start 'gaming' metrics in a way that is beneficial to themselves but also to (open) science, do you think this is a good thing? I know this is mentioned in the methods section briefly, but it is dismissed a little too easily. I think it would be a good idea to be more critically reflective on such potential issues.
Some of the keywords are in the title, and so probably redundant for SEO things.
For the toolkit, do you see there being any associated costs with managing it?
Who is actually supposed to gather the data for the toolkit? I can imagine many researchers would be thrilled to have such an additional administrative burden!

Overall, I think this is a valuable resource to have created, and I congratulate the authors on such a large effort. I hope the comments here are useful and help to improve the MS a bit.

References

1. ... française des sciences de l'information et de la communication. Publisher Full Text
2. McKiernan E, Bourne P, Brown C, Buck S, Kenall A, Lin J, McDougall D, Nosek B, Ram K, Soderberg C, Spies J, Thaney K, Updegrove A, Woo K, Yarkoni T: How open science helps researchers succeed. eLife. 2016; 5: e16800. Publisher Full Text

Is the work clearly and accurately presented and does it cite the current literature? Yes

Is the study design appropriate and is the work technically sound? Yes

Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Partly

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: As mentioned in the review report, I know several of the authors here personally.

Reviewer Expertise: Palaeontology, Open Science, Peer Review

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.