Social responsibility in research and innovation practice and policy across global regions, institutional types, and fields: Interview data and qualitative content analysis outputs revealing the perspectives and experiences of professionals

The European Commission-funded RRING (Responsible Research and Innovation Networked Globally) Horizon 2020 project aimed to deliver activities that promoted a global understanding of Socially Responsible Research and Innovation (RRI). A necessary first step in this process was to understand how researchers (working across Global North and Global South contexts) implicitly understand and operationalise ideas relating to social responsibility within their day-to-day work. Here, we describe an empirical dataset that was gathered as part of the RRING project to investigate this topic. This Data Note explains the design and implementation of 113 structured qualitative interviews with a geographically diverse set of researchers (across 17 countries) focusing on their perspectives and experiences. Sample selection was aimed at maximising diversity. As well as spanning all five UNESCO world regions, these interview participants were drawn from a range of research fields (including energy; waste management; ICT/digital; bioeconomy) and institutional contexts (including research performing organisations; research funding organisations; industry and business; civil society organisations; policy bodies). This Data Note also indicates how and why a qualitative content analysis was implemented with this interview dataset, resulting in category counts available with the anonymised interview transcripts for public access.


Introduction
The European Commission-funded RRING (Responsible Research and Innovation Networked Globally) Horizon 2020 project aimed to deliver activities that promote a global understanding of responsible research and innovation (RRI), including launching a global network around such ideas. However, a necessary first step was to understand how research and innovation professionals (working across Global North and Global South contexts) implicitly understand and operationalise ideas of responsibility within their day-to-day work.
RRING therefore undertook structured qualitative interviews to gain a 'bottom-up' perspective on RRI, revealing existing local practices and policies that underpin research and innovation globally, in addition to how such practices and policies may need to change to better align with societal needs and values. Fundamentally, this required insights on perspectives, processes and practices from professionals employed around the world, to ensure that the organisation of research and innovation is ethical, socially inclusive and suitably addresses public concerns. The qualitative data presented here was analysed and reported alongside a large quantitative survey dataset in a major RRING project deliverable (Jensen et al., 2021). This empirical research is connected with the project's policy agenda, which leveraged and bolstered UNESCO's global policy instrument on RRI, called the Recommendation on Science and Scientific Researchers (Jensen, 2022a; Jensen, 2022b).
Our intention in publishing this Data Note is two-fold. Firstly, the move to considering RRI beyond its Eurocentric roots is in its infancy, and thus we are expecting a potential ballooning of research in this area. We hope that our data, resources and the detailed procedures outlined in this Note are therefore of future (re)use. Secondly, given our interest in research responsibility, it is only appropriate that we ourselves follow high standards in transparency and open access.

Methods
Research instrument: structured interviews
Structured interviews were selected as the most consistent method for collecting additional in-depth data on RRI practices across the 17 countries. Consistency in the lines of questioning (including allowable follow-up questions) across the countries was considered particularly important given the range of interviewer experience. Each interview involved nine sets of questions, and specific interview protocol guidelines were provided to interviewers on how the interview was to be conducted.
Interviews were conducted either face-to-face or through virtual calls. Although face-to-face interviews allow for more personal contact and clarity in communication, virtual interviews were allowed where physical/financial limitations prevented face-to-face communication. The structured interviews generated reliable, focused, and uniform data relevant to producing a more comprehensive overview of current RRI practices globally.

Research instrument design
Country selection. To attain an in-depth understanding of the ground-level experiences of research and innovation (from those who are actually performing those research and innovation roles), it was necessary to focus our efforts on specific countries. The purpose of doing this was not to provide a representative sample of the world (or indeed its constituent regions), from which we could draw context-free conclusions of how (responsible) research and innovation is, or should be, done. Instead, the purpose was to tease out and qualitatively illustrate the range of research and innovation experiences across cultural contexts. This sub-section details our approach to country selection, through which this range was investigated.
The boundaries of country selection were steered by the search for sufficient spread across global regions, namely via the UNESCO world regions classification (UNESCO, 2022): Europe and North America; Sub-Saharan Africa; Asia; Latin America and the Caribbean; and the Arab world. Due to the heterogeneity of the regions and countries around the world (and the many different aspects that RRI concepts involve, added to the local accessibility of data and local partners), country selection was based on multiple criteria. Hence, in each region, the selection was based on the following five stages in the selection process:
Stage 1. Application of objective criteria to make initial selection:
• All countries were evaluated on the basis of their Gross Domestic Product (GDP) per capita in USD (World Bank, 2019) and Gross Expenditure on Research and Development (GERD) (UNESCO, 2019) to maximise sample diversity. One high and one low ranked country was selected for both GDP and GERD, with an alternative high/low country also identified for each GDP/GERD variable. The reasoning was to ensure that a range existed across the region in terms of domestic spend on research and innovation.
• A minimum population size of two million was set for country selection, as a proxy for ensuring that the respective country was large enough to have its own defined sector(s) conducting research and innovation.
• Only countries with a Travel Advisory Level of 1 or 2 were selected (except for Turkey), as per US State Department Travel advice. The rationale here was to ensure that our interviewers were safe and that the political situation was stable enough in the respective country, so as to be able to draw conclusions on the support structures that are in place for research and innovation.
• The main exception is countries that were specified to be in the sample in the Grant Agreement (e.g. US, India). These were indicated as 'must select' countries and further contributed to establishing a diverse sample.
Stage 2. Capacity of partners to collect data in initially selected countries
• Partners were canvassed to identify which selected countries they would be capable of helping with and/or leading on. If no partner was available in the primary selected country, partner availability was determined for the alternative country, and a decision made accordingly.
Stage 3. Subcontracting alternatives for countries that project partners could not cover
• In countries where partners were not available to conduct interviews for both the primary selected and the alternative country, University College Cork (as coordinator) and Anglia Ruskin University (as data collection lead) investigated options for a subcontract, with active input sought from partners and their networks.
Stage 4. Revisiting options if specific countries were too difficult to access
Based on this selection process, we selected four countries for each of the five world regions: one high and one low country for both GDP and GERD, for each region. These were locked in and pursued in earnest. However, a small number of these countries could not be included in the final dataset, either due to unforeseen difficulties in undertaking the interviews (e.g. one country's central government would not formally allow the interviews to happen), or because the data submitted did not meet the project's quality thresholds required for analysis. Table 1 thus details the final list of countries (including their GDP and GERD information) that were included in the final interview dataset.
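The Stage 1 screening can be illustrated with a minimal sketch. It assumes a hypothetical `Country` record and made-up figures; the names, fields and numbers below are illustrative only and are not taken from the project's data. The 'must select' countries, the Turkey exception and the alternative high/low picks are deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Country:
    """Hypothetical record; field names are assumptions for illustration."""
    name: str
    population_m: float    # population in millions
    gdp_per_capita: float  # USD
    gerd: float            # GERD, in whatever unit the source reports
    advisory_level: int    # US State Department Travel Advisory Level


def eligible(c, min_population_m=2.0, max_advisory=2):
    """Apply the population and travel-advisory thresholds from Stage 1."""
    return c.population_m >= min_population_m and c.advisory_level <= max_advisory


def select_for_region(countries):
    """Pick one high and one low ranked country for GDP and for GERD."""
    pool = [c for c in countries if eligible(c)]
    if not pool:
        return {}
    by_gdp = sorted(pool, key=lambda c: c.gdp_per_capita)
    by_gerd = sorted(pool, key=lambda c: c.gerd)
    return {
        "gdp_high": by_gdp[-1].name,
        "gdp_low": by_gdp[0].name,
        "gerd_high": by_gerd[-1].name,
        "gerd_low": by_gerd[0].name,
    }


# Entirely fictional region used to exercise the sketch.
region = [
    Country("A", 10.0, 50000.0, 3.0, 1),
    Country("B", 5.0, 8000.0, 0.2, 2),
    Country("C", 1.0, 90000.0, 4.0, 1),   # excluded: population below 2 million
    Country("D", 20.0, 20000.0, 1.0, 3),  # excluded: advisory level above 2
    Country("E", 8.0, 30000.0, 3.5, 1),
]
selection = select_for_region(region)
```

In this sketch the same country can fill more than one slot (here the fictional country B is lowest on both measures); the later stages described above would then settle the final list.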
Participant sampling. The selection of participants from each country was based on standardised selection criteria, which each country's interviewer team used as targets for participant recruitment:
• Number of interviews: A minimum of five interviews conducted per country.
• Gender: A 50-50 target split between men on the one hand and women and/or other gender identities on the other, with an acceptable minimum of 40% representation of women and/or other gender identities, per country.
Protocol design and requirements. The priority for the interviews was to probe participants' personal interpretations, perceptions and understandings of RRI-like practices, as part of their day-to-day professional work. In particular, the interview protocol's questions focused on generating insights on the participants' in-situ experiences and indeed their social construction of such RRI-like practices. In this way, the interview protocol was not designed to specifically target 'factual' evidence and information.
The interview protocol included nine sets of questions, organised across: types of research and innovation activities performed; public engagement; aligning with ethical values; open access and open data; meeting societal needs; anticipation; diversity and gender equality; responsibility; and closing reflections. We also insisted that interviewers did not mention "Responsible Research and Innovation", "RRI" or even "responsibility" in the framing of the interview and/or in the phrasing of most questions, as we wanted to ensure participants maintained their focus on their own experiences, as opposed to e.g. being distracted by performative ideas of what they thought interviewers believed responsible practice entailed.
The interview protocol was peer-reviewed by three RRING colleagues who were not involved in the interview planning, implementation or analysis. The review focused on academic standards (e.g. rigour, consistency, novelty). The protocol was additionally peer-reviewed by RRING's Gender Sub-Committee, to ensure intersectional issues were adequately accounted for, both in terms of question content and interviewer guidance. This protocol is available in Foulds and Sule (2019).
Alongside the protocol, a fieldnotes form was circulated to all interviewers (acting as interview memos). This short interviewer survey asked for brief reflections on participant familiarity with the terms used; the atmosphere during the interview; moments where the interviewer particularly influenced participant responses; reflections on the method used, in particular the structured nature of the interaction; etc. These fieldnotes were completed as soon as possible after the interview by the interviewer(s) themselves. The fieldnotes had the dual purpose of being actual data, as well as providing context for the data analysis (all analysts were instructed to read the respective fieldnotes entry before coding each interview).
The fieldnotes form template is available in Foulds and Sule (2019).
The interview protocol and fieldnotes template were piloted twice, in English. The two pilot interviews were conducted in the UK and in South Africa, both of which also formed part of the final dataset. These pilots led to improvements to the protocol and interviewer guidance relating to, for example: question phrasing; precisely when and how one could deviate from the structured lines of questioning; and transcription requirements.
All interviews were audio recorded, transcribed, and then translated from local languages into English. Transcripts were written up, alongside the fieldnotes, as soon as possible post-interview and then submitted centrally, again as soon as possible, for quality assurance and consistency checks.
Anonymised versions of the 29 interview transcripts for which permission was granted for public sharing are available in RRING Project (2021).

Data analysis and validation procedures
Analysis approach: qualitative content analysis. Qualitative content analysis was used as the primary data analysis method. It focused on forming thematic categories through a consistent set of codes applied to textual data (in this case, transcripts and fieldnotes) (Morgan, 1993; also see Jensen & Laurie, 2016). Content was analysed both descriptively and interpretatively, with the spotlight put on the thematic categories and codes with the highest prevalence across all the interviews. The analysis was led by the second author of this paper.
The coding and analysis of the interviews took place across five phases, which are now detailed in turn. Analysis of the qualitative data was done using NVivo 12.
Analysis Phase 1: Inductive coding and preparation of the codebook. Inductive coding was conducted using a grounded theory approach. Following guidelines in Bazeley and Jackson (2013), Miles et al. (2014) and Saldana (2016), various stages of coding and recoding were done for progressive refinement of the codes generated, divided into two cycles. In the first cycle of coding, an eclectic combination of attribute, structural, descriptive, in vivo, value, versus and holistic/lumper codes was used. Coding was led by the objective of identifying best practices in research and innovation and determining the participants' perspectives on the various structured interview themes. In the second cycle of coding, thematic codes were used to categorise the sub-level codes into higher-level themes identified within the context of the research objective. In this stage, although the interview structure guided the formation of themes, it was not used deductively to generate categories. As a result, other cross-cutting themes like 'Conflicts in theory and practice' and 'Collaboration' also emerged.
Initially, pilot coding was carried out for two interviews. Based on this analysis, a pilot version of the codebook was prepared. This was then peer-reviewed and subsequent revisions were made. After this, a preliminary codebook was prepared based on the qualitative analysis of 30 interviews. This codebook contained 257 codes under 13 categories. The coding was done to account for both cross-cutting themes, i.e. across all the interview questions and all the geographies/fields/etc. (e.g. enablers, constraints, conflicts, etc.), as well as context- and question section-specific subject matter based on the structured interview-based themes (e.g. public engagement, open science, etc.). After subsequent peer reviews (by the first and third authors of this paper), revisions were made to the codebook, including tackling boundary issues, complexity, the sheer number of codes, coding instructions, etc. This revised version, which contained 117 codes under 12 categories, was then used in the coder training phase. The codebook's 30 interviews were selected from 11 countries to ensure a good distribution of country representation and, within each country, at least one interview from each gender was selected. Of the 30 interviews analysed, approximately 40% were with women. In addition, all research and innovation fields and institutional types were covered in a fairly even distribution. An anonymised version of the final codebook is available in Foulds et al. (2019).
Analysis Phase 2: Coder training. Coding of the remaining 84 transcripts was done deductively by a team of three coders (in addition to the Lead Coder, also this paper's second author), using the codebook from Phase 1. For this, the coders were provided with extensive training in two practice rounds. In the first round, a full-day training workshop was held that included the methodological lead (paper first author), all four coders, and an observer from one of the partner organisations responsible for coding quality assurance. The coders were given sufficient time to go through the codebook and familiarise themselves with all the codes in advance. In the first part of the workshop, the codebook and the coding process were further explained to all the coders, giving them the opportunity to discuss and ask questions wherever necessary. In the second part, the coders were given a pre-prepared practice transcript with coded text highlighted and bracketed in different colours with blank spaces for inserting codes. This was done in accordance with the method proposed by Campbell et al. (2013) to determine inter-coder agreement.
In the last part of the workshop, the coders submitted their coded transcripts, which were then compared to determine inter-coder agreements. The coders discussed their common experiences and compared notes to better understand the codes and how to use the codebook deductively moving forward. Based on these discussions, further improvements were made to the codebook, relating to guidance on e.g. simultaneous coding, length of coding, repetition of text, making inferences, boundaries between codes, and coding gaps.
In the second practice round, each of the four coders was given a separate second practice transcript to be coded independently. Coding was then compared with the Lead Coder over virtual calibration meetings, and inter-coder agreement determined and reached. It was found that percentage agreement with one coder was below the minimum standard of 61%. Additional training was therefore carried out with that one coder (Coder 3). A new practice transcript was provided to Coder 3, and inter-coder agreement was again determined with the Lead Coder. The two coders then discussed their coding through another virtual calibration meeting, reaching an agreement of 82%. This second practice round also led to some minimal revisions to the codebook, mainly concerning the ambiguity of certain definitions.
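As an illustration of the percentage agreement check used in the practice rounds, the following minimal sketch assumes that each pre-bracketed text segment receives exactly one code from each coder, so that two coders' outputs can be compared position by position. The function names and the example code labels are hypothetical; only the 61% threshold comes from the text above.

```python
MIN_AGREEMENT = 61.0  # minimum standard stated in the text, in percent


def percent_agreement(coder_a, coder_b):
    """Share of pre-bracketed segments (in percent) given the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must code the same set of segments")
    if not coder_a:
        raise ValueError("At least one segment is required")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


def needs_more_training(trainee, lead):
    """Flag a coder whose agreement with the Lead Coder is below the minimum."""
    return percent_agreement(trainee, lead) < MIN_AGREEMENT


# Hypothetical code labels for five bracketed segments.
lead = ["enabler", "constraint", "open data", "gender", "ethics"]
trainee_1 = ["enabler", "constraint", "open data", "gender", "conflict"]
trainee_2 = ["enabler", "constraint", "anticipation", "gender", "conflict"]
```

Simple percentage agreement does not correct for chance agreement, which is one reason a chance-corrected statistic is typically used for a final reliability assessment.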
Analysis Phase 3: Deductive coding. The finalised codebook from Phase 2 was used by the three additional coders to deductively code the remaining 84 interview transcripts. An NVivo shell file was provided to ensure consistency in the deployment of the coding scheme. Regular review and feedback sessions were also held during the deductive coding phase, between each individual coder and the Lead Coder.
While coding for the remaining 84 interviews was mainly to be done deductively, the coders were expected to flag any critical new codes and reach a satisfactory inter-coder agreement. Interview transcripts were distributed through a fair process. Initially, 10 interview transcripts were allocated to each coder, distributed numerically based on interview code. After the initial distribution, subsequent transcript allocations were made on a first-come-first-served basis: coders who completed their coding task faster were consequently allotted a higher number of interviews.
Analysis Phase 4: Inter-coder reliability checks. The final statistical assessment of inter-coder reliability was conducted on about 21% of interviews (18 interviews) using Krippendorff's Alpha (also called Krippendorff's Coefficient) (Krippendorff, 2011). The values for this inter-coder reliability analysis were calculated using the Krippendorff's Alpha Python implementation 'fast-krippendorff' (Pln-Fing-Udelar, 2019, no pagination). Since values were given as the frequency of a variable's occurrence, the interval metric for Krippendorff's Alpha was used. This accounts for the interval scale that is being used, meaning that the difference between one and 10 is weighted more severely than the difference between nine and 10 in the application of the statistical test.
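To make the interval metric concrete, the following is a from-scratch sketch of Krippendorff's Alpha with the squared-difference distance, written in plain Python. It is not the code of the 'fast-krippendorff' package used in the study, and the function names and example values are assumptions for illustration. Each unit is taken to be one interview, and each value the number of times a coder applied a given code in that interview.

```python
from itertools import permutations


def interval_distance(a, b):
    """Interval metric: squared difference between two values."""
    return (a - b) ** 2


def krippendorff_alpha_interval(units):
    """Alpha = 1 - D_o / D_e for interval data.

    `units` is a list of lists; each inner list holds the values that the
    coders assigned to one unit (None marks a missing value). Returns None
    when alpha is undefined, e.g. when all values are identical.
    """
    # Keep only units with at least two values, since only those are pairable.
    pairable = [[v for v in u if v is not None] for u in units]
    pairable = [u for u in pairable if len(u) >= 2]
    n = sum(len(u) for u in pairable)
    if n < 2:
        return None
    # Observed disagreement: ordered pairs of values within each unit.
    d_o = sum(
        sum(interval_distance(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n
    # Expected disagreement: ordered pairs across all pairable values.
    values = [v for u in pairable for v in u]
    d_e = sum(
        interval_distance(a, b) for a, b in permutations(values, 2)
    ) / (n * (n - 1))
    if d_e == 0:
        return None  # no variation in the data, so alpha cannot be computed
    return 1.0 - d_o / d_e


# Hypothetical code counts from two coders across a few interviews.
perfect = krippendorff_alpha_interval([[0, 0], [3, 3], [10, 10]])
partial = krippendorff_alpha_interval([[1, 2], [3, 3]])
never_used = krippendorff_alpha_interval([[0, 0], [0, 0]])
```

The squared distance is what makes a disagreement of one versus 10 (distance 81) count far more heavily than nine versus 10 (distance 1). The `None` return for data without any variation corresponds to the situation where a code was never applied by either coder.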
Initially, nine interviews (about 11% of interviews coded) were selected from the 84 deductively coded interviews for inter-coder reliability testing. These nine interviews were chosen through random sampling, ensuring a proportional distribution of interviews from each coder based on the total number of interviews coded. Table 3 presents an overview of this distribution and selection process. Excel's random number generator was used to randomly generate the number of the interview to be tested. The results of the inter-coder reliability test for each code are presented in Foulds et al. (2019).
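The proportional random selection can be sketched as follows. This is an illustration only: the study used Excel's random number generator, whereas the sketch uses Python's `random` module, and the per-coder workloads below are hypothetical. The largest-remainder rounding rule is also an assumption, chosen simply so that the quotas add up to the sample size.

```python
import random


def proportional_quota(coded_per_coder, sample_size):
    """Split the sample across coders in proportion to interviews coded.

    Uses largest-remainder rounding (an assumed rule) so that the quotas
    sum exactly to `sample_size`.
    """
    total = sum(coded_per_coder.values())
    exact = {c: sample_size * k / total for c, k in coded_per_coder.items()}
    quota = {c: int(x) for c, x in exact.items()}
    leftover = sample_size - sum(quota.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for coder in by_remainder[:leftover]:
        quota[coder] += 1
    return quota


def draw_sample(interviews_by_coder, sample_size, seed=None):
    """Randomly draw each coder's quota of interview IDs without replacement."""
    rng = random.Random(seed)
    sizes = {c: len(ids) for c, ids in interviews_by_coder.items()}
    quota = proportional_quota(sizes, sample_size)
    return {c: sorted(rng.sample(ids, quota[c])) for c, ids in interviews_by_coder.items()}


# Hypothetical workloads summing to the 84 deductively coded interviews.
workload = {
    "coder_1": list(range(1, 41)),   # 40 interviews
    "coder_2": list(range(41, 65)),  # 24 interviews
    "coder_3": list(range(65, 85)),  # 20 interviews
}
quota = proportional_quota({c: len(ids) for c, ids in workload.items()}, 9)
sample = draw_sample(workload, 9, seed=2019)
```

Passing a fixed `seed` makes the draw repeatable, which can help when a selection needs to be documented or audited later.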
The initial inter-coder reliability analysis found that only four of 117 codes had a Krippendorff's Alpha value below the commonly accepted threshold of 0.8. These were "71: Personal responsibility and morality" (0.79), "88: Anticipation" (0.77), "91: Responsive approach" (0.78) and "117: Difficulties in collaboration and engagement" (0.74). For 15 codes, an alpha score could not be successfully calculated, as these were not used during coding, and a code count greater than 0 is required to calculate Krippendorff's Alpha. Arguably, this also represents a perfect agreement, as both coders decided not to code the variable ever. However, it is usually advisable to either extend the sample size for the inter-coder reliability analysis to increase the probability of encompassing these codes, or to decide not to use the variable for further analysis, should it be detected, as the reliability test does not evidence the coders' ability to independently detect its presence, but only its absence.
Hence, another nine interviews were randomly selected from the 84 interviews, to get a total test sample of about 21%. The inter-coder reliability analysis was then repeated using a test sample of 18 interviews. This final test showed that only seven of 117 variables yielded an alpha value below the commonly accepted reliability threshold of 0.8. These were "40: Empowerment tools" (0.78), "42: Campaigning-Lobbying" (0.75), "53: Diversity and inclusion" (0.76), "56: Gender diversity" (0.72), "57: Ethnic and religious diversity" (0.77) and "67: Discrimination-a non-issue" (0.71). We still proceeded with these seven codes, as they were in the zone of acceptability.
Additionally, the code "113: Ecosystem of support" (0.41) yielded an unacceptably low alpha value, leading to its rejection for further analysis due to its poor reliability. As this is the only code with a drastically lower alpha value, it can be considered as an outlier and does not impede the reliability of other codes. This is the only code that we excluded from further analysis.
For six codes, an alpha score could not be successfully calculated, as a code count greater than 0 was required to calculate Krippendorff's Alpha, meaning the variable was never present in the coded data, according to both coders. Given that the recommended sample proportion for inter-coder reliability had already been exceeded with an inclusion of about 21% of all data, it was deemed reasonable to proceed with these categories, which simply have relatively low prevalence in the sample. Therefore, these six codes were retained for full coding and analysis.
On average, coders achieved a Krippendorff's Alpha value of 0.95, and a reliability of over 0.8 for 89% of variables. These are good results overall, indicating a robust coding process.
In each of these seven themes, code counting was done at the level of UNESCO global regions. Code counts for these major themes provided the first step for analysis; an efficient review and comparison of code counts highlighted the most prevalent themes during the interview conversations, while also pointing to issues that had been underplayed or had received less attention (if raised at all). This helped in targeting key codes for further qualitative interrogation and interpretation. For this, four codes with the greatest prevalence across all 113 interviews (as indicated by having the highest counts) were selected for a more in-depth analysis of each theme. All code counting results are presented in Foulds et al. (2019).
Within these codes that were identified based on their prevalence, the associated coded interview text was then interrogated and analysed more deeply. Preliminary findings are included in the RRING project deliverable report that this work fed into (Jensen et al., 2021).
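The prevalence-based selection can be sketched as a simple counting exercise. The sketch below assumes the coded data can be exported as (theme, code) pairs, one per coded text segment; this export format, the function name and the example labels are hypothetical. A region field could be added to the grouping key in the same way to obtain counts per UNESCO region.

```python
from collections import Counter, defaultdict


def top_codes_by_theme(coded_segments, k=4):
    """Return the k most frequently applied codes within each theme.

    `coded_segments` is an iterable of (theme, code) pairs, one pair per
    coded text segment (an assumed export format).
    """
    counts = defaultdict(Counter)
    for theme, code in coded_segments:
        counts[theme][code] += 1
    return {
        theme: [code for code, _ in counter.most_common(k)]
        for theme, counter in counts.items()
    }


# Hypothetical coded segments for two themes.
segments = (
    [("public engagement", "outreach events")] * 3
    + [("public engagement", "citizen panels")] * 2
    + [("public engagement", "social media")] * 1
    + [("open science", "open access publishing")] * 4
    + [("open science", "data sharing")] * 1
)
top_two = top_codes_by_theme(segments, k=2)
```

The default of `k=4` mirrors the four codes per theme that were taken forward for in-depth analysis.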

Ethics policies and informed consent
The data collection and analysis methods were approved by the Departmental Research Ethics Panel located within Anglia Ruskin University's Global Sustainability Institute (reference number of GSIDREP-1819-003; approval date of 28 November 2018). This process ensured ethics experts signed off on, for example, the Participant Information Sheets and Informed Consent forms.
Informed consent was obtained for all participants, prior to their data being included in the analysis and any subsequent publication, as per the EU General Data Protection Regulation requirements. Consent was obtained pre-interview through an email correspondence, where the participant was asked to print, sign and scan the consent form, and email it back to the interviewer after reading the Participant Information Sheet. If this approach was not viable, the Participant Information Sheet was read before the interview and consent was instead audio recorded. Whilst all participants (n=113) consented to the data being anonymously used in analyses and in our final publications, only 26% (29 participants) consented to the anonymised transcripts being published in an open access data portal.
• Code counts covering all codes across all 113 interviews, organised by both theme and world region.
• Inter-coder reliability data and test results.
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
We are grateful to numerous colleagues for conducting the interviews, including those from the following organisations:

Poonam Pandey
Post-Growth Innovation Lab, Universidade de Vigo, Vigo, Galicia, Spain
This data note on 'Social responsibility....experiences of professionals' is a methodologically sound description of the data collection and data analysis protocol that the authors followed. In my view such scholarly products can act as useful guides for researchers who are undertaking similar scholarly exercises. I recommend that the note should be indexed with minor revisions along the following lines:
1. It would be very useful if the authors could add a separate section on the limitations and challenges of using a qualitative data collection approach that is too focused on harmonization and standardization. For example, it would be nice to reflect on the use of methodological tools such as Krippendorff's alpha value, and on what negotiations are made while making the data more 'inter-coder reliable'. Is there a compromise, in terms of the depth and richness of interpretation, when one chooses to deal with qualitative data in large quantity? It would also be useful to reflect a little on the biases embedded in the software, and the ethical challenges implied when individual coders analyse the data of a context for which they do not have much understanding of its socio-economic, cultural, historical, and political settings.
2. The choice of countries based on GDP and GERD in each region is not symmetrical. It is not clear why some countries are chosen because they have high/low GDP and why some countries are chosen because they have high/low GERD. A lot goes on between different permutations and combinations of GDP and GERD that needs to be explained by political, historical, and socio-cultural factors. A clear reasoning should follow the choice of these countries and their GERD and GDP.
3. I looked at the additional interview data for India (two interviews). It appears that the same interview is uploaded twice! I request the authors to double check any similar discrepancies.
Is the rationale for creating the dataset(s) clearly described? Yes
The paper presents, in great detail, the procedures undertaken to collect, analyse and interpret the data related to responsible research and innovation. The authors provide publicly available materials from their work.
My suggestions are more related to clarity of the text presented.
○ First, although this is a data note, I would argue that including a reporting checklist (e.g. COREQ) would give readers an easier way to navigate the text. It would be better to specify which sentences refer to each element.
○ Moreover, although it is not usual in qualitative research, I think that a flowchart of the process would help greatly in understanding the analysis phases.
○ Table 2 could be moved to the end of the text, or placed in a Supplementary file.
○ Also, instead of referring to the author order, maybe adding initials of researchers would be preferred, as well as a description of their experience and background.
○ Also, there are several typos in the text which could be corrected (repetitive etc. on p11).
Once again, thank you for the opportunity to review this manuscript.

Interview transcripts, such as those shared in the data set described in this data note, are notoriously difficult to use productively outside the context and process of their original collection or construction. It is therefore interesting to see how the data note describes this context and process in great detail. At first this description reads as an attempt at purging the interviews and the subsequent analyses of any subjectivity on the part of the investigators. This would pave the road for future replications. It might also give the categories (codes) developed inductively a near context-free character, as implicitly required in most quantitative analyses. At the very least, however, I think it is fair to read this process as an attempt at securing that the variations in the data stem from the interviewees and not from the interviewers and the analysts, who themselves span a great variety of contexts and approaches. The data note, then, can be read as a description of both the methodology of constructing the data and of the data itself.

For my review, I will take a slightly different approach.The data set merits a brief structural description, adapted to the perspective of this review.In my perspective, it has four layers of data, which we may call construction data, interview data (transcripts), codes (or categories), and code counts (or category counts).
Construction data corresponds to the description of procedures for generating the collection of interview data. These procedures are meticulously described in such a way that they are suitable for replication by other research teams. They also have a data character because they give a basis for deconstructing the protocols for sampling and interviews, should someone wish to really test the robustness of a study that aims to broaden the cultural basis for the very important (and very fluid) concept of RRI. For example, in order to study how the processual aspects of the EU commission's six RRI policy pillars are balanced against (or integrated with) RRI as a matter of goals and outcomes (which also occurs in the interviews). Or, in the words that the authors themselves use when describing the variation between interviewees, to understand how the authors "implicitly understand and operationalise" their own research in this project.
Interview data is usually considered the primary data of such a study and could also be called the data proper. The full data set contains transcripts of 113 structured qualitative interviews, 29 of which are made available in the publicly available data set.
The codes are a form of analytically based thematization of the interview data, generated in this project through a grounded-theory approach, guided by the analytical purpose of the project (i.e., an investigation of the global variety of "best RRI practices" in actual R&I). The inductive source of the codes in the interviews themselves makes them data in their own right. In line with requirements of a qualitative content analysis, 116 codes grouped under 12 headings are developed in this way and then applied to all 113 interviews as a way of data reduction. The detailed coding of sections of the text is not shown in the 29 transcripts.
Counts of the occurrence of these 116 codes are then generated across all 113 interviews and their contexts (i.e. gender, country (and UNESCO region), the four sampled R&I fields, and institutional type or role in R&I system). This makes the code counts an appropriately reduced dataset for a qualitative content analysis, i.e. in principle without having to consider the interview transcripts themselves. This data reduction is described in the data note and performed in a different paper. These counts are made available in a spreadsheet with a 113x116 count table and a number of tables that are aggregated over two or three levels across country (and region), R&I fields (renamed as themes), and institutional type (renamed as stakeholders) and also across the 12 groupings of codes.
Is the rationale for creating the dataset(s) clearly described?
The rationale for collecting the full dataset in the project is clearly described: To empirically understand how researchers (and other relevant actors) in a wide array of contexts "implicitly understand and operationalise ideas relating to social responsibility within their day-to-day work".
The rationale for making the subset available (the protocols, the 29 transcripts where the interviewee allowed anonymized publication, the 116 codes and their counts across the 113 interviews) is to enable future (re)use of the data, resources and detailed procedures for generating the data and the codes in a potentially ballooning field of research. Also, the authors wish to practice social responsibility in their own work by following high standards of transparency and open access.

Are the protocols appropriate and is the work technically sound?
Absolutely, given the purpose of the project and its chosen strategies.

Are sufficient details of methods and materials provided to allow replication by others?
Data generating methods (including procedures) are meticulous, and meticulously described.
Because the multi-stage sampling is strategically honed for variation in significant contexts and not for creating a random sample, replications as grounds for a quantitative meta-analysis should hardly be expected. Rather, the value of the procedural documentation is that it may make it easier not to lose the connection between the code counts and the actual interviews. In theory, this should open the way for alternative analyses despite the unavailability of the major part of the interview transcripts.

Are the datasets clearly presented in a useable and accessible format?
The construction data are as clearly presented as any.
The interview transcripts are useable and accessible. The only problem is the sample self-selection bias (willingness to release the transcripts), but 29 is definitely more useful than 0.
The codes (the what) are meaningful and relevant for the project's purpose and for related research interests.
The code counts (the what and how many) should be read as data for a qualitative content analysis in line with Morgan's (1993) recommendation, i.e., to be used as a basis for investigating why and how these counts appear in the data. Because the sample is not random, other uses should be carefully considered, as the data note also indicates (at least indirectly). This may limit its appropriate usability. To work with the aggregated tables rather than with the full 113x116 dataset is cumbersome. This is illustrated by the 1,200-page project report (Jensen et al., 2021), where hundreds of pages are dedicated to discussing said tables and, significantly, are enhanced by quotations from interviews not available in the public datasets. Many analysts would say that this demonstrates the difficulty of using the code counts alone as a reduced data representation of the 113 interviews. However, this would depend on the purpose of the analysis. Many analysts would simply prefer the full code count dataset (113x116 counts) to include the background variables, making it 113x129 (with binary variables for gender, R&I field, and stakeholder, because some interviews had more than one interviewee and some interviewees represented multiple contexts). Then they would also be free to use the aggregation and statistics software of their choice to look for patterns in the data, including co-occurrence, which is not readily visible in the aggregated tables in the provided spreadsheet.
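The suggested interview-level layout can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the interview identifiers, code names, counts and background values are invented toy data and are not taken from the project spreadsheet, and the two helper functions are hypothetical. The sketch only shows how binary background indicators can sit alongside code counts in one row per interview, and how code co-occurrence can then be read off directly.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (invented for illustration, NOT from the project):
# code counts per interview, and background values held as sets because an
# interview may have several interviewees or represent several contexts.
code_counts = {
    "INT001": {"code_a": 2, "code_b": 0, "code_c": 1},
    "INT002": {"code_a": 0, "code_b": 3, "code_c": 1},
    "INT003": {"code_a": 1, "code_b": 1, "code_c": 0},
}
background = {
    "INT001": {"gender": {"female"}, "field": {"energy"},
               "stakeholder": {"industry"}},
    "INT002": {"gender": {"female", "male"}, "field": {"ICT/digital"},
               "stakeholder": {"policy body"}},
    "INT003": {"gender": {"male"}, "field": {"energy", "bioeconomy"},
               "stakeholder": {"civil society"}},
}


def build_interview_table(code_counts, background):
    """Return (columns, rows): one row per interview, holding the code
    counts followed by 0/1 indicators for each background value."""
    codes = sorted({c for counts in code_counts.values() for c in counts})
    indicators = sorted(
        {f"{var}={val}"
         for b in background.values()
         for var, vals in b.items()
         for val in vals}
    )
    rows = {}
    for interview_id, counts in code_counts.items():
        present = {f"{var}={val}"
                   for var, vals in background[interview_id].items()
                   for val in vals}
        rows[interview_id] = (
            [counts.get(c, 0) for c in codes]
            + [int(i in present) for i in indicators]
        )
    return codes + indicators, rows


def code_cooccurrence(code_counts):
    """Count, for each pair of codes, the number of interviews in which
    both codes occur at least once."""
    pairs = Counter()
    for counts in code_counts.values():
        present = sorted(c for c, n in counts.items() if n > 0)
        pairs.update(combinations(present, 2))
    return pairs


columns, rows = build_interview_table(code_counts, background)
cooc = code_cooccurrence(code_counts)
```

With the real data, the same layout would yield one row per interview (113) and one column per code plus one per background indicator, which analysts could then load into the aggregation or statistics software of their choice.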
Finally, I have some minor comments on the actual text of the data note, all intended to enhance readability.
○ The UNESCO region that comprises Asia also includes the Pacific, but it is named only Asia in the text. Using the full official terms in the text will enhance clarity.
○ Participant sampling, relevance: It is unclear whether "experience of RRI-like activities" excludes individuals who are mainly researching RRI as a phenomenon. This is clarified in the following section ("RRI practitioners"), but a slight rewording might improve the flow of the argument.
○ A fieldnotes form was "circulated to all interviews" (p.5). I expect this is a typo for "interviewers".
○ Transcripts were "submitted for quality assurance and consistency checks centrally" (p.11). I read this to be a check with the interviewees until the word "centrally" appeared. Moving "centrally" to between "submitted" and "for" would steer away from the alternative interpretation of interviewees reviewing what they said in their interviews.
○ Code counts pointed to issues which "had been undermined" (p.13). At first I wondered how interviews that were so carefully planned could have sabotaged the surfacing of important issues. Then I understood that it might be a question of digging less into those issues than the occasion would have permitted (sub-optimally mined). A different word might clarify the meaning.

Conclusion
In conclusion, I believe the main re-use of the data (at all four levels) will be in interpreting the analytical and published output from the project and the generation of new ideas for related research, which would be no mean feat. I would recommend indexing of the data note and a slight re-organization of the code count data to include the background variables in the dataset at interview level as described above (i.e., to be a 113x129 table).

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: My relevant areas of research include studies of innovation processes and systems, the functions of research and innovation in society, policies and practices of sustainable industry development (including smart specialization), and worker participation in technological development. Relevant methodological approaches include quantitative and qualitative studies, participatory action research, and impact and process evaluation of public programs.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

I have very little to add to what the authors already stated in their work.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Psychology; Research integrity: Research methodology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Introduction
The data note provides metadata for 113 structured qualitative interviews collected across 17 countries around how Responsible Research and Innovation (RRI) is conceptualized and practiced. The data derive from the RRING project (rring.eu) to promote a global understanding of RRI.

Table 3. Transcript distribution and selection of interviews for inter-coder reliability testing.
American Association for the Advancement of Science, US; Academy of Scientific Research and Technology, Egypt; Bintel Analytics, Malawi; University of Amsterdam, The Netherlands; Centro de Estudios y Proyectos, Bolivia; National Research Council, Italy; Center for the Promotion of Science, Serbia; De Montfort University, UK; Israel Institute of Technology, Israel; Meiji University, Japan; National Research Foundation, South Africa; Participatory Research in Asia, India; R&D Maroc, Morocco; Royal Scientific Society, Jordan; and United Nations Educational, Scientific and Cultural Organization, France.