Development of consensus on essential virtues for ethics and research integrity training using a modified Delphi approach

ABSTRACT Most ethics and research integrity (ERI) training approaches are based on teaching moral rules, duties or responsibilities, and often do not sufficiently address virtue-based ethics. This study aimed to obtain a consensus among relevant experts on the importance of essential virtues for ERI training and on how these virtues are acquired. A modified Delphi consensus process was conducted in three rounds; 31 ERI experts participated in Round 1 and 23 in Round 2 and Round 3. Based on findings generated from qualitative data in Round 1, a structured questionnaire with 90 different statements grouped under five domains was developed for Round 2 and Round 3. After the final round, a consensus was achieved on two-thirds of the statements included in this study. The experts agreed that virtues are based on learned and reflected attitudes and that the appropriate way to acquire research virtues is through continuing education using case studies and discussions based on real-life scenarios. Furthermore, consensus was obtained on 35 scientific virtues that should be stimulated in ERI training, prioritizing honesty, integrity, accountability, criticism and fairness as the most essential scientific virtues for good research practice. These results should be considered in developing or adjusting ERI training programs and materials.


Introduction
It has been well known for decades that scientific malpractices, ranging from questionable research practices to severe scientific misconduct, produce misleading results, waste public funds, undermine trust in the research process, increase regulation by external policy institutions, and slow scientific progress in general (Chubin 1985; Mojon-Azzi and Mojon 2004; Ioannidis et al. 2014). Therefore, strengthening ethics and research integrity (ERI) in the research process is a crucial part of science education and training. However, what constitutes proper or effective ERI training is still a matter of debate since various studies have shown different results. Although several studies showed no positive impact of ERI education on ethical behavior (Plemmons, Brody, and Kalichman 2006; Funk, Barrett, and Macrina 2007; Antes et al. 2010; May and Luth 2013) or that effects are uncertain due to the low quality of evidence (Marusic et al. 2016), there is also evidence of substantial benefits to trainees and considerable improvement of such training in recent years (Watts et al. 2017).
Most authors distinguish two main educational approaches to preventing scientific misconduct and promoting integrity in research: one based on norms, rules or principles, and another on values and virtues (Godecharle, Nemery, and Dierickx 2014; Horbach and Halffman 2017; Steneck 2006). The principle-based approach portrays ethical conduct as consisting of adherence to ethical rules, duties, or responsibilities (Resnik 2012). In contrast, the virtue-based approach focuses on character development cultivated through habit and attention to the details of each particular situation, allowing scientists to strive for excellence in their practices (Chen 2015; Pennock and O'Rourke 2017). Critics of the principle-based approaches argue that they are not very useful in ERI education because their narrow focus on compliance and their neglect of intrinsic values development mean that they do not provide adequate guidance for ethical decision-making in real-life situations not covered by a given set of rules (Macfarlane 2009; Pennock and O'Rourke 2017; Steele et al. 2016). Although some theorists favor virtues over principles (Macfarlane 2009), most of them agree that standard ERI training should be augmented by the inclusion of moral virtues (Resnik 2012). Principle-based and virtue-based approaches are complementary because they focus on different aspects of ethical conduct: one stresses the importance of following rules, while the other emphasizes character development (Resnik 2012).
However, the majority of ERI training approaches today are principle-based, as they focus on teaching moral rules, duties, or responsibilities rather than on developing moral virtues for scientists (Resnik 2012; Zandvoort et al. 2013). This neglect of the virtue-based approach can be explained by the controversy surrounding virtue education in general, since efforts to develop virtues through moral education usually provoke skeptical responses, which are often based not on empirical evidence but on moral, historical, conceptual, political, psychological and epistemological misunderstandings or misinterpretations of character and virtue (Kristjánsson 2013). That this skepticism is not empirically grounded is supported by the results of numerous virtue-based educational interventions tested in recent years. For instance, assessment of a curriculum intervention on students' understanding and practice of virtue showed rapid improvement in the experimental group (Pike et al. 2021), as did the evaluation of a curriculum intervention designed to enhance students' virtue perception and virtue reasoning (Harrison, Burn, and Moller 2020). In their study on nurturing virtues of the medical profession, Schweller et al. (2017) concluded that medical students' empathy might be amenable to early curricular interventions designed to promote values inherent to medical professional identity. Furthermore, the study on the virtue of tolerance conducted by Van Fossen et al. (2021) revealed that students increased their understanding of the concepts even when virtues were taught virtually.
Regardless of these controversies surrounding virtue education, the virtue-based approach is slowly gaining a place in ERI training. Results of several studies showed that scientists believe that scientific virtues can be learned, and that they value and even prefer a virtue-based approach over traditional ERI training ("Character traits: Scientific virtue" 2016; Berling et al. 2019; Palmer and Forrester-Jones 2018; Tomić, Buljan, and Marušić 2022). Moreover, recent studies demonstrate increasing implementation of approaches based on nurturing values and virtues in ERI courses (Mejlgaard et al. 2019), and virtues are mentioned in about one-third of ERI educational resources (Pizzolato, Abdi, and Dierickx 2020).
The majority of previous studies regarding science and virtue ethics addressed its moral-philosophical aspect but did not seriously consider its educational aspect (Sarafis et al. 2014). Nevertheless, some studies have begun to look into how a virtue-based approach can contribute to scientific training (DuBois 2004; Chen 2015; Horbach and Halffman 2017; Paternotte and Ivanova 2017), as well as which virtues are most important in the area of research integrity (Berling et al. 2019; Pennock and Miller 2019). As this approach to ERI training is still very new, there are almost no standard classroom materials (Pennock and O'Rourke 2017). In addition, there is no consensus on the most important scientific virtues to teach because different authors and academic associations have proposed various lists of important scientific virtues (ALLEA 2017; Macfarlane 2009; Macrina 2014; National Academies of Sciences, Engineering and Medicine 2017; Paternotte and Ivanova 2017; Pellegrino and Thomasma 1993; Pennock and Miller 2019; Pring 2001), with limited agreement on the essential ones. Although they all contributed considerably to the discussion on scientific virtues and the virtue-based approach to ERI, the lack of basic agreement on this matter seems impractical for training and for developing educational materials. Considering this knowledge gap, it is necessary to further develop the evidence base regarding virtue-based training for research integrity. Since previous studies indicate that scientific virtues can be ranked by their importance for conducting research (Pennock and Miller 2019; Tomić, Buljan, and Marušić 2022), it is also important to provide more evidence on which virtues should be addressed in ERI training. Thus, this study aimed to involve a broad range of experts in the field of ERI to build a consensus on which virtues should be stimulated and prioritized in training for good research practice and how they should be learned.

Study design
As a part of the Horizon 2020 VIRT2UE project (CORDIS 2021), which aimed to develop a sustainable train-the-trainer blended learning program enabling contextualized ERI teaching across Europe, a modified Delphi consensus process was conducted to achieve consensus among relevant experts about which virtues should be addressed in an ERI training program. The Delphi consensus process is an iterative survey research method for consensus building, which uses a series of questionnaires or "rounds" to gather information from a panel of selected experts. After each round, the experts are provided with an anonymized summary of their previous responses and encouraged to revise earlier answers. The process is repeated until a certain degree of group consensus on a specific topic is reached (Goodman 1987; Hsu and Sandford 2007; Keeney, Hasson, and McKenna 2006; Powell 2003). The Delphi method was initially developed by Dalkey and Helmer (1963), but many variants of this method have been proposed over time. While these variants share some fundamental characteristics, such as feedback and iterative process, no universal guidelines on using the Delphi method exist, and there is no standardization of methodology (Hasson, Keeney, and McKenna 2000; Keeney, Hasson, and McKenna 2006; Pare et al. 2013). Therefore, we applied a modified version of this method based on different Delphi studies to answer our research question more adequately. The protocol for this Delphi consensus process was pre-registered on OSF (https://osf.io/pmxaf). We followed CREDES reporting guidelines in writing this manuscript (Jünger et al. 2017).
Theoretically, the Delphi consensus process can be iterated until a consensus is achieved, but three rounds of questionnaires sent to a preselected expert panel are typically sufficient to reach a consensus (Hsu and Sandford 2007; Powell 2003). For that reason, we conducted a Delphi consensus process of three rounds approximately one week apart (Figure 1). Each round lasted three weeks: although two weeks were planned, we extended each round by one week to achieve a higher response rate. Data were collected from September to November 2019 through an online survey sent via SurveyMonkey (SurveyMonkey Inc., San Mateo, California, USA).
In order to identify preliminary topics and develop a set of questions for the Delphi consensus process, a scoping review of virtues addressed in ERI training (Marušić et al. 2019) and face-to-face focus groups with key stakeholders (Tomić et al. 2022) were also conducted as a part of the same project. Based on the findings of the focus group discussions, an open-ended questionnaire for Round 1 was developed (Appendix 1) to allow and encourage participants to generate new ideas on scientific virtues (Hsu and Sandford 2007; Powell 2003). The structured questionnaire (Appendix 1) for Round 2 and Round 3 was developed based on Round 1 results with input from our previous studies (Marušić et al. 2019; Tomić et al. 2022). The experts were asked to rate their agreement with 90 different statements on a slider rating scale from 0 (strongly disagree) to 100 (strongly agree). The 0-100 scale was used since some of the participants in focus group discussions (Tomić et al. 2022) emphasized that virtues represent an abstract idea that may be difficult to define and precisely rate. Reminder e-mails were sent to encourage participants to complete each round of the survey. Only the experts who participated in the previous round(s) received a structured questionnaire for the next round. In the last round, they also received an anonymized summary of the previous round responses so that they could revise their earlier answers in light of all other Delphi panel members' replies.
In the Delphi consensus process, decision rules and criteria to define and determine consensus must be established to assemble and organize the judgments and opinions provided by involved experts. In most studies, consensus is achieved if a certain percentage of experts' votes falls within a prescribed range or through a median score based on a Likert-type scale (Diamond et al. 2014; Hsu and Sandford 2007). For Round 2 and Round 3 in our study, the consensus was expressed as the percentage of experts who rated an individual statement as 61 or more on the scale from 0 to 100, since that range means agreement or strong agreement with the statement. We defined that a consensus was reached for a statement when >70% of the experts rated it between 61 and 100 on the sliding scale; this level of agreement has been considered appropriate in previous Delphi studies (Downar and Hawryluck 2010; Slade et al. 2014; van Hecke et al. 2015; Vogel et al. 2019). To keep Round 3 as brief as possible, we excluded statements that had already obtained strong consensus in Round 2. The statements were excluded from Round 3 based on two criteria. The first was whether consensus was achieved, again based on the threshold defined a priori as greater than 70% agreement among the experts. The second criterion was a strong level of agreement based on the median of the experts' scores for an individual statement, since measures of central tendency are also used in Delphi studies (Diamond et al. 2014; Keeney, Hasson, and McKenna 2006). Strong disagreement was defined as a median between 0 and 19, and strong agreement as a median between 81 and 100. According to those two criteria, we did not include statements in Round 3 (Appendix 1) if they achieved strong agreement or strong disagreement based on the median and consensus defined as >70% agreement among the experts on ratings 61-100 for each statement. The use of two different criteria additionally ensured that only statements without a strong consensus were included in Round 3. The final draft of the results was reviewed and approved by the VIRT2UE project consortium and the European Commission before publication and dissemination.
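The two decision rules described above can be sketched in a few lines of Python. This is only an illustrative reading of the stated thresholds, not the authors' analysis script: the function names and the example ratings are hypothetical, and the exclusion rule is implemented literally as "consensus and a strong median."

```python
from statistics import median

AGREEMENT_CUTOFF = 61   # ratings of 61-100 count as agreement
CONSENSUS_PERCENT = 70  # consensus requires MORE than 70% of experts


def has_consensus(ratings):
    """True when more than 70% of the experts rated the statement 61 or higher."""
    agreeing = sum(1 for r in ratings if r >= AGREEMENT_CUTOFF)
    # Integer arithmetic avoids floating-point surprises at exactly 70%.
    return agreeing * 100 > CONSENSUS_PERCENT * len(ratings)


def median_strength(ratings):
    """Classify the median: 81-100 strong agreement, 0-19 strong disagreement."""
    m = median(ratings)
    if m >= 81:
        return "strong agreement"
    if m <= 19:
        return "strong disagreement"
    return "neither"


def excluded_from_round3(ratings):
    """Literal reading of the two criteria: consensus AND a strong median."""
    return has_consensus(ratings) and median_strength(ratings) != "neither"


# Hypothetical ratings from ten experts (not data from the study).
clear_agreement = [90, 85, 95, 100, 82, 88, 70, 65, 50, 92]
weak_agreement = [75, 70, 65, 80, 62, 90, 40, 30, 78, 66]

print(excluded_from_round3(clear_agreement))  # consensus with a strong median
print(excluded_from_round3(weak_agreement))   # consensus, but median below 81
```

Under this reading, a statement with a median of 0-19 can never also have more than 70% of ratings at 61 or above, so in practice only statements with consensus and a median of 81 or higher would be dropped before Round 3.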

Participants' recruitment
The participants of the Delphi consensus process are traditionally referred to as "experts," but the level of expertise most often varies between studies according to the needs of each research topic (Vernon 2009). To avoid the potentially misleading meanings of that title, the participants of the Delphi consensus process should be understood as informed individuals with knowledge of the topic under study that members of the general public lack (Goodman 1987; Fink-Hafner et al. 2019). In our study, a total of 74 invitations for participation in the Delphi consensus process were sent out to potential participants who met the criteria according to which we could consider them experts in the field of ERI. These criteria included participation in international ERI projects, publication of scientific articles in the field of ERI, participation in conferences on ERI, and engagement or work experience in offices or government agencies that focus on research integrity. The participants' list was drawn from publicly available sources, personal research contacts and recommendations from researchers involved in various EU projects on ERI. Each potential research participant was contacted via e-mail, and those who wanted to participate were asked to sign an informed consent form by clicking on the "I agree to participate" button, which was provided via a questionnaire in Round 1 (Appendix 1). Since Delphi studies do not seek to be fully representative but rather to include a broad representation of people and disciplines, we used a heterogeneous stratified purposive sample to include all relevant stakeholders involved in the research process: academics, ERI committees, policymakers, funding and process organizations, students, industry and small and medium-sized enterprises. In addition to being a member of one of those stakeholder categories, all participants were required to be over 18 years, fluent English speakers, active in research, and not involved in other VIRT2UE project studies.

Ethical considerations
Within the Delphi study, participants do not meet with each other face-to-face, and therefore they can present and react to ideas unbiased by others' identities and pressures (Goodman 1987). The Delphi consensus process was performed after obtaining approval from the Ethics Committee of the University of Split School of Medicine (Reg. No.: 2181-198-03-04-18-0044). Ethical standards and guidelines of Horizon 2020 were rigorously applied. All participants received information about the study in advance and agreed in writing to participate. Only anonymized data was used for analysis. All collected data will be stored for a period of five years after the publication.

Data analysis
Data analysis involved the management of qualitative and quantitative data. Qualitative data was generated from the open-ended questionnaire in Round 1 of the Delphi consensus process and analyzed using the computer software NVivo 12 Plus for Windows (QSR International, London, UK). The reflexive thematic analysis approach developed by Braun and Clarke was used to analyze qualitative data (Braun and Clarke 2006; Braun et al. 2019) since the theoretical freedom of this approach allows great flexibility. VT coded qualitative data from Round 1 with an inductive approach, developed the themes at a semantic level, and discussed them with all authors. Following the familiarization with the qualitative data through reading and re-reading, initial codes were generated and gathered into potential themes. After reviewing themes across the entire data set, the themes were refined and finalized in the form of statements for questionnaires in Round 2 and Round 3 (Appendix 1). Quantitative data was generated from the rating-scale questionnaires in Round 2 and Round 3 of the Delphi consensus process. All quantitative data analysis was performed using the computer software IBM SPSS Statistics 26 for Windows (IBM Corp., Armonk, NY). Descriptive statistics were used to describe the experts' demographic characteristics and group responses to each statement in Round 2 and Round 3.
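As an illustration of the descriptive statistics mentioned above (medians with interquartile ranges), the following minimal sketch uses Python's standard statistics module. It is not the SPSS procedure used in the study: the function name and the values are hypothetical, and quartile conventions differ between packages, so results may not match SPSS output exactly.

```python
from statistics import median, quantiles


def median_and_iqr(values):
    """Return (median, interquartile range) for a list of numbers."""
    # quantiles(..., n=4) returns the three quartile cut points; the default
    # "exclusive" method is an assumption and may differ from SPSS.
    q1, _, q3 = quantiles(values, n=4)
    return median(values), q3 - q1


# Hypothetical ages, not data from the study.
hypothetical_ages = [30, 38, 42, 45, 51, 56, 63]
med, iqr = median_and_iqr(hypothetical_ages)
print(f"median = {med}, IQR = {iqr}")
```

The same helper could be applied to each statement's ratings to summarize group responses per round.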

Participants
A total of 31 participants completed a questionnaire for Round 1 (31/74; response rate 42%) and 23 for Round 2 and Round 3 of the Delphi consensus process (23/31; response rate 74%) (Table 1). All rounds had participants of both genders. Their median age was 47 years (interquartile range 12) in Round 1 and 49 (interquartile range 13) in Round 2 and Round 3. The participants came from around the world, and the majority had a PhD or MD level of education. The vast majority of them considered themselves experienced or very experienced in ERI issues (87%). They were active in different types of research activities, with academic researchers most strongly represented. The median number of years of their participation in research or research-related activity was 18 (interquartile range 12) in Round 1 and 19 (interquartile range 13) in Round 2 and Round 3. The most common research disciplines amongst them were biomedicine and social sciences.

Round 1
We conducted a thematic analysis of the experts' answers to open-ended questions in Round 1 and developed a list of themes presented as statements grouped under the following five domains: 1. Meanings and understandings of virtues in research, 2. Virtues important in research, 3. Overarching goals of virtue-based training in research integrity, 4. Acquisition of virtues in research, and 5. Possible improvements in training methods for virtues in research. The details of the thematic analysis are presented in Appendix 2. The list of statements generated from Round 1 was then expanded with the findings from the focus group discussions (Tomić et al. 2022) and the scoping review (Marušić et al. 2019) from the VIRT2UE project to create the finalized questionnaire with 90 statements for Round 2 (Appendix 1).

Round 2
In Round 2, we presented 90 different statements grouped under five domains, developed during the thematic analysis. Overall, 40 out of 90 statements reached consensus according to both criteria and were excluded from the next round, leaving 50 statements in Round 3.
Under the domain "Meanings and understandings of virtues in research," 3 out of 6 statements achieved consensus on both criteria, so they were excluded from the final round, leaving 3 statements for inclusion in Round 3 (Appendix 3). From the list of 54 virtues in the domain on the importance of virtues in research, 21 achieved consensus on both criteria and were excluded from the next round, leaving 33 virtues for inclusion in Round 3 (Appendix 4). Honesty and integrity achieved 100% agreement, followed by accountability, criticism and fairness, which reached an agreement of more than nine-tenths (96%) of the experts. Objectivity, open-mindedness, reliability, rigorousness, transparency and truthfulness reached 91% agreement.
Out of 9 statements from the domain on the goals of virtue-based training for good research practice, 3 statements did not reach consensus and were included in Round 3 (Appendix 5). From the domain "Acquisition of virtues in research," 2 out of 7 statements did not reach consensus and were included in the final round (Appendix 6). The fifth domain comprised a list of 14 methods or techniques relevant to ERI training, from which 8 were included in Round 3 (Appendix 7). Teaching methods that achieved the highest consensus were case studies and discussions (91% agreement among the experts), followed by individual mentoring and workshops (87%). The lowest agreement was found for boot camps and formal lectures (44%).

Round 3
In Round 3, we presented 50 different statements that did not obtain consensus in Round 2, grouped under five domains. Consensus, defined a priori as greater than 70% agreement among the experts, was reached on 22 statements. Under the domain "Meanings and understandings of virtues in research," only one of three presented statements achieved consensus (Appendix 3). Therefore, the overall consensus was achieved after the final round on 4 out of 6 original statements (Appendix 8).
Based on the results from Round 2, a list of 33 virtues relevant for research integrity (domain 2) was presented to the experts in Round 3 so that they could give their opinion on how important it is to include those virtues in ERI training. A total of 29 virtues achieved consensus, with meticulousness having full agreement (100%), followed by carefulness, competency, perseverance and being skeptical (96% agreement among experts), and reflexivity (91%). The lowest level of agreement was for temperance (13%), followed by altruism, compassion, loyalty and positivity (22%). The final list included 35 out of 54 presented virtues in research that are important in ERI training (Table 2).
Two of the three statements on overarching goals of virtue-based training for good research practice (domain 3) achieved consensus in Round 3, so the final list included 8 of 9 initial statements (Appendix 9). Under the domain "Acquisition of virtues in research," both statements achieved consensus, so this domain was the only one in which all statements achieved agreement among the experts (Appendix 10). From the domain addressing teaching methods or techniques for virtue-based ERI training, all three statements from Round 3 achieved consensus. Overall, the experts reached a consensus on 8 of 14 initially presented teaching methods or techniques in ERI training (Appendix 11).

Discussion
This Delphi consensus process on scientific virtues reached consensus among a panel of experts on two-thirds (69%) of the statements included in the consultations. We presented 90 different statements grouped under five domains to the experts and obtained a consensus on 62 of them. In order to inform future efforts in the development of scientific virtue training, the primary aim of our study was to obtain consensus on virtues that should be stimulated and prioritized in training for good research practice. Since there is a difference between virtues that contribute to being a flourishing human being and those that shape an exemplary scientific researcher (Pennock and O'Rourke 2017), several authors have tried to identify the most important scientific virtues in the last few decades. Pellegrino and Thomasma (1993) emphasized the importance of trust, compassion, practical wisdom, justice, courage, temperance, integrity and self-effacement in medicine. To prevent temptations in the search for the truth, Pring (2001) argued that virtuous researchers require moral and intellectual virtues such as courage, honesty, modesty, humility, kindness, generosity of spirit, openness to criticism or concern for justice. Macfarlane (2009) proposed an alternative approach to research integrity, which focuses on developing researchers' personal understandings of values that underpin their academic practice. This approach includes six essential virtues: courage, respectfulness, resoluteness, sincerity, humility and reflexivity. Macrina (2014)  Although almost all authors emphasized particular virtues as the most important, it is evident that there is no broader agreement among them on the crucial ones, which creates practical difficulties during the development of virtue-based training for good scientific practice. In addition, the importance of virtues is mostly considered theoretically or from a broad perspective based on a general sample of scientists who are not experts in the area of ERI. Even though such input is significant for expanding our knowledge of scientific virtues, we believe that it is equally important to examine the views of ERI experts to adjust the educational materials and practice of ERI training. To provide evidence for the future development of virtue-based education, we obtained a consensus among a panel of experts in the field of ERI on 35 out of 54 presented scientific virtues that should be considered for inclusion in ERI training (see Table 2). Although the experts agreed that all of these virtues could be included and stimulated in ERI training, some of them achieved greater consensus than others, so we can conclude that they should be prioritized. Furthermore, the five virtues that achieved the greatest consensus among the experts in Round 2 could be considered the most essential virtues for ERI training.
Two of those five virtues, honesty and integrity, could even be understood as central scientific virtues for good research practice since all experts rated them as most important already in Round 2. Honesty is a central virtue because it applies to all aspects of the research process, including developing, proposing, performing, reviewing, reporting and communicating research. It requires that all these aspects of research are conducted in a transparent, fair, full and unbiased way without any fraud or deception. Honest scientists are obligated to meet all commitments to the research process and to others by making the totality of their results available to the scientific community (ALLEA 2017; Macrina 2014; Paternotte and Ivanova 2017). Integrity is also a kind of master virtue since it defines the nature of the individual who integrates all of the virtues that are constitutive in successfully taking one's life seriously (Cox, La Caze, and Levine 2014; Pellegrino and Thomasma 1993). This virtue leads scientists to make decisions independently of their expected rewards since it implies that personal gain and conflicts of interest cannot influence the scientist's choices (Paternotte and Ivanova 2017).
Although the other three most essential virtues for ERI training in our research did not achieve complete consensus among experts, there are good reasons to acknowledge them as central scientific virtues as well, aside from being rated as second best by the experts in our study. Accountability is often understood as a fundamental value that is central to the functioning of the research enterprise since scientists are required to stand behind their work and be accountable for their actions, statements, and roles in proposing, performing, reporting, and reviewing research (Macrina 2014; National Academies of Sciences, Engineering and Medicine 2017). At its core, accountability means an obligation to explain, demonstrate or justify one's research work from idea to publication, its management and organization, training, supervision and mentoring, and its broader impacts (ALLEA 2017; National Academies of Sciences, Engineering and Medicine 2017). Since science is a systematic endeavor that aims to build and organize knowledge, the virtue of criticism also has a central role in improving science as a whole because it addresses problems within science through the constant search for the elimination of error. This type of thinking requires developing skills in thought organization and exchanging ideas because accepting such criticism goes against one's natural inclination. Therefore, it is important to nurture in scientists the spirit of critical thinking, self-criticism and openness to the criticism of others (Falcó-Pegueroles et al. 2021; Pring 2001). The scientific enterprise also includes research collaborations and partnerships, which challenge the achievement of fair professional relationships since they involve judging others' work for purposes of funding, publication, authorship or deciding who is hired or promoted (Lavery and Ijsselmuiden 2018; National Academies of Sciences, Engineering and Medicine 2017). For that reason, the value of fairness is particularly important since it reflects impartial judgment and appropriate behavior toward colleagues, as well as toward humans and animals as research subjects. Being "fair as a scientist includes providing appropriate credit to the work of others, citing the literature accurately and responsibly, providing appropriate recommendations, conducting objective peer review, and sharing data" (Macrina 2014).
It is important to emphasize that prioritization of these five virtues, as well as stimulation of other scientific virtues that achieved consensus among the experts in our study, applies primarily to the development and adjustment of ERI education. Our findings do not imply that scientists should not pursue nurturing other scientific virtues or that other virtues are not necessary for achieving the aims of science. We can try to explain this by comparing our findings with the results of the study on scientific virtues conducted by a group of researchers involved in the Scientific Virtues Project ("Character traits: Scientific virtue" 2016; Pennock and Miller 2019). In order to find out the most important scientific virtues, they asked scientists to identify traits that they value most in one another, and according to them, honesty and curiosity are by far the most important traits underlying excellent science. On the other hand, in our study, the experts in the field of ERI were asked to identify virtues that are the most important for good research practice. Although the experts agreed that curiosity should be stimulated in ERI training, it did not achieve a similar consensus. In our opinion, the reason for such discrepancy lies precisely in the different contexts in which scientific virtues have been observed. Curiosity is undoubtedly the crucial trait for expanding our knowledge and discovering empirical truths about every aspect of our universe, but from the perspective of ERI it does not play the same role when considering the ethical aspects of justifying research or of doing research. A closer examination of any case of scientific misconduct or questionable research practice would hardly discover curiosity as the underlying cause for violating professional standards and ethical behavior because scientists do not falsify or fabricate results to satisfy their curiosity. As an illustration, for his infamous fraudulent paper that advocated a non-existent connection between autism and the MMR vaccine, former surgeon Andrew Wakefield was found guilty not because of a lack of curiosity but because of a lack of honesty. A similar explanation can be applied to other lower-ranked virtues in our research, such as humility and trust. In that sense, it is crucial to distinguish between essential scientific virtues in general and scientific virtues that are essential for ERI education.
Although the primary aim of our study was to obtain consensus on virtues essential for ERI training, we also aimed to gain broader insights into scientific virtues that are necessary for the development or adjustment of the holistic virtue-based training program and materials. However, to address our findings on these broader insights more efficiently, only the main points are presented here. An extended discussion with more details on this topic is available to interested readers in Appendix 12.
The experts agreed that virtues in research could be understood as a compass because they provide guidelines for "doing the right thing" in unknown situations that are not covered by rules and codes. That is not surprising because it makes sense to link a virtuous person with doing the right things to others, which is primarily a matter of being moral (Slote 2005). Since Aristotle saw virtues as character traits that contribute to human flourishing (Aristotle 2014), scientific virtues can be understood as those traits that underpin intentions, motivations, resulting decisions and actions of an exemplary scientific researcher (Chen 2015; Pennock and O'Rourke 2017). This aligns with the understanding of research virtues by the experts in our study, who saw them as traits that enable researchers to make decisions that benefit the whole research process and all involved stakeholders. Experts also agreed that virtues for good research practice should be stimulated equally in every research sector or discipline since they are universal.
The expert panel in our study agreed that virtues are based on learned and reflected attitudes, which means that a person can meaningfully shape and develop them over time. Many contemporary virtue ethicists argue that virtues are learned through practice, habituation and reflection on how we behave in the community (Athanassoulis 2014; Carr and Steutel 2005; MacIntyre 2007). Even though previous studies showed that virtuous character education clearly affects students' moral and academic development (Baehr 2017; Berkowitz and Bier 2004, 2007; Hershberg et al. 2016), the proper way to learn virtues is still a matter of debate. To inform future efforts in scientific virtue education, we sought consensus on the acquisition of virtues in research and on overarching ERI training goals. Brief or once-in-a-lifetime virtue-based training has been recognized by the experts as ineffective because research virtues can be acquired only through continuing education, which is in line with findings of previous studies that also endorsed periodic over one-time ERI training (McGee et al. 2008; Palmer and Forrester-Jones 2018; Goddiksen and Gjerris 2022).
Since we acquire virtues through experience and not through theory, ERI training should be based on real-life scientific practice and cases rather than on memorizing facts, according to the experts. This view reflects the observation that individual experience and active learning methods are more effective than passive learning (Kalichman 2007), as well as the necessity of developing scientific virtues through repetition and practice since they cannot be taught through verbal instruction alone (Curren 2005; Pennock 2019). According to the experts, there are teaching methods or techniques that could help acquire virtues in ERI training (Appendix 11) and, based on the agreement reached, case studies, discussions, and individual mentoring should be prioritized to achieve the best results. Previous findings also revealed that more successful ERI education programs preferred case-based activities, discussions, interactive participation, mentoring, and practice of ethical decision-making skills (Antes et al. 2009; Todd et al. 2017; Tomić, Buljan, and Marušić 2022). Since previous studies on the goals of existing ERI education and training programs identified a lack of agreement about those goals and their uncertain effects (Kalichman and Plemmons 2007; Chen 2016), we sought a consensus among the experts on overarching goals of virtue-based training for good research practice. Besides improving critical analysis of questionable situations, compliance with research codes and guidelines, and self-reflection on the research practice, the experts saw raising awareness of the importance of virtues and identifying the most important virtues as the key goals of virtue-based training.

Strengths and limitations
This is the first Delphi study aimed at developing a consensus among ERI experts on the importance of scientific virtues in training for good research practice. The Delphi consensus process has been established as a valuable research method in healthcare development, but it has also been used in other areas, including science and technology, communication improvement, policy analysis, education and planning. The Delphi method can be beneficial to the non-positivist researcher since it is particularly useful in the absence of a complete theoretical framework or when rapid understandings are required (Vernon 2009). Considering the already discussed lack of knowledge on virtue-based ERI training and the need to develop or adjust educational materials, we consider this method appropriate for our study.
Like any other method, Delphi has strengths and weaknesses that have already been discussed in detail (Fink-Hafner et al. 2019; Murry and Hammons 1995; Vernon 2009). The main strength of our Delphi study was a heterogeneous sample, which included participants of both genders at different stages of their careers from different countries, institutions, scientific disciplines, and stakeholder groups. This diversity of experts leads to better performance as it may allow for the consideration of different perspectives and a wider variety of alternatives (Murphy et al. 1998). Also, in each of the three rounds, we included more than 15-20 participants, which is considered appropriate in a Delphi study (Ludwig 1997; Mitchell 1991). Since adequate expert panel selection is crucial for the Delphi approach, we included only those participants with specific knowledge and experience in the area of ERI. Self-reported data from participants also confirmed their expertise since most of them considered themselves experienced in ERI issues, and no participant included in our study reported being not experienced in these issues. In addition, the majority of participants had long careers as researchers, as well as the highest level of education. Another strength of this study is a preparatory phase in which we conducted focus group discussions and a review of the literature to develop more appropriate questionnaires to be sent to the experts. We also conducted the first round with the open-ended questionnaire to allow participants to express their own views and generate new ideas on scientific virtues beyond what was found during focus group discussion and literature review. Furthermore, we used Internet-based research tools to conduct our study since this approach maximizes the advantages and limits the disadvantages of the traditional version of the Delphi method (Fink-Hafner et al. 2019). Finally, our findings provided future developers of virtue-based education with the five essential scientific virtues that should be prioritized in ERI training, but we also offered a broader list of 30 additional virtues that could be stimulated in such training. We believe that this approach enabled the achievement of necessary consensus on essential scientific virtues for good research practice but also provided flexibility to better adapt ERI training programs to specific situations and populations they serve.
There are several limitations of our study. First, methodological limitations of any Delphi study include the difficulty of generalizing the results to a broader population due to the sample size (Fink-Hafner et al. 2019). However, since the Delphi method aims to address questions for which traditional scientific approaches are less suitable (Vernon 2009), these results can be very helpful for developing or adjusting the ERI training program and materials. Second, since we purposively sampled participants from different stakeholder groups to capture diverse perspectives on scientific virtues, certain groups may have been overrepresented. The same applies to geographical diversity since participants from Europe were the majority. Third, the list of virtues included in the Round 2 questionnaire resulted from the literature review and specific individuals' opinions in our studies. Since our previous findings on this topic showed that researchers differed in their definitions and understandings of the concept of virtue (Tomić et al. 2022), it is possible that some readers would not consider all included items in the Round 2 questionnaire (Appendix 1) as virtues. Fourth, to achieve consensus on essential virtues for good research practice, we included experts in the field of ERI, but their expertise on virtue ethics pedagogy remained unknown. Unfortunately, reaching an appropriate sample of participants who are at the same time experts in ERI and virtue ethics pedagogy was unrealistic. Fifth, since we did not reflect on implementation from a feasibility perspective, additional research from this perspective could be beneficial, as could further ranking of scientific virtues using different approaches. Finally, we did not include all the previous statements in Round 3, which prevented their final ranking by importance. Delphi is well known to be vulnerable to dropouts because it is quite time-consuming and laborious for participants (Fink-Hafner et al. 2019), which is why we chose this approach to maintain a higher response rate among the invited experts. However, we modified the Delphi approach with additional criteria based on the median of the experts' scores for an individual statement to ensure that the statements without a strong consensus were not excluded from Round 3.

Conclusions
Several conclusions may be drawn from the results of this study. The essential virtues for good research practice are different from essential scientific virtues in general, so it is crucial to distinguish them during the development or adjustment of the ERI training program and materials. For that reason, developers of future virtue-based ERI training programs and materials should consider scientific virtues that have achieved consensus among the experts in our study, prioritizing honesty, integrity, accountability, criticism and fairness as the most essential virtues for good research practice. We can also conclude that virtues are based on learned and reflected attitudes, which is why a person can meaningfully shape them and develop them over time. Since the experts also achieved consensus on how virtues should be learned, we can conclude that it could be useful to adjust virtue-based teachings according to those findings. Since brief or once-in-a-lifetime virtue-based training has been recognized as less effective, the more appropriate direction to acquire research virtues is through continuing education. This can be achieved by a combination of formal and informal education delivered through individual mentoring and daily practice. Aside from that, the best learning techniques for the acquisition of research virtues are case studies based on real-life scenarios and discussions focused on gray-area issues. Based on the results of our study, we can also conclude that the most essential virtues for good research practice (honesty, integrity, accountability, criticism and fairness) are universal, so they should be stimulated in every research sector or discipline. Due to the methodological limitations, the generalization of these findings is limited, so further research that validates the Delphi results through triangulation with another approach is recommended. However, despite its limitations, we believe that our findings may guide the development or adjustment of ERI training programs and improve their quality since they reflect the consensus of individuals with considerable expertise on ERI issues.

Figure 1. Flow chart of Delphi consensus process.

Table 1. Demographic characteristics of the panel members. All percentages in the tables have been rounded to an integer number and may not add up to 100%. † Eastern Europe included Hungary; Northern Europe included Denmark, Norway, United Kingdom; Southern Europe included Bosnia and Herzegovina, Croatia and Spain; Western Europe included Austria, Belgium, France, Germany, Luxembourg, Netherlands and Switzerland; Outside of Europe included Australia, Bahrain, Brazil, Canada, Israel and Iran. ‡ The sum of the roles represented exceeds the number of participants because participants could select multiple answers.

Table 2. Achieved consensus on the importance of different virtues in ERI training*.