A marathon, not a sprint – neuroimaging, Open Science and ethics

Open Science is calling for a radical re-thinking of existing scientific practices. Within the neuroimaging community, Open Science practices are taking the form of open data repositories and open lab notebooks. The broad sharing of data that accompanies Open Science, however, raises some difficult ethical and legal issues. With neuroethics as a focusing lens, we explore eight central concerns posed by open data with regard to human brain imaging studies: respect for individuals and communities, concern for marginalized communities, consent, privacy protections, participatory research designs, contextual integrity, fusions of clinical and research goals, and incidental findings. Each consideration assists in bringing nuance to the potential benefits for open data sharing against associated challenges. We combine current understandings with forward-looking solutions to key issues. We conclude by underscoring the need for new policy tools to enhance the potential for responsible open data.


Introduction
Fueled by advances in information-communication technologies that are bringing imagers together in a proximity never seen before, the research, translational, and clinical domains of neuroimaging have profoundly internationalized. Coincident with this movement are the broadening calls for Open Science to go mainstream, favoring the benefits of free sharing of imaging research inputs and outputs over protections of data and intellectual property. The impetus is the desire to maximize resources and ensure reproducibility of results (e.g., Alzheimer's Disease Neuroimaging Initiative (ADNI) 1 ; NICHD HEALthy Brain and Child Development Study (HBCD) 2 ). Initiatives such as the Human Brain Project (HBP) 3 , the International Brain Initiative (IBI) 4 , Euro-BioImaging 5 , and other large-scale national initiatives including the Canadian Brain Research Strategy (CBRS) 6 and the Canadian Open Neuroscience Platform (CONP) 7 have further underscored the importance of exploring the potential benefits of open collaboration and sharing data beyond institutional walls and even across borders ( Grillner et al., 2016 ;Illes et al., 2019 ).
Open Science has the potential to flatten hierarchies in the production and dissemination of scientific knowledge, and further the goals

Box 1
Examples of ethical and legal considerations in the protection of human subjects for Open Science and neuroimaging.

Topic: Summary
Concern for individuals and communities: Respect for the autonomy and well-being of individual participants must be balanced with the interests of communities that may be implicated by neuroimaging research results.
Concern for marginalized communities: Community consultation can beneficially inform the conceptualization and design of neuroimaging research. A communications strategy for disseminating results is vital to the effort.
Consent: Broad consent with ongoing governance is the most Open-Science-friendly option. Neuroimaging research with indigenous communities and other isolated or historically marginalized populations may require specific consent with clear limitations on secondary use.
Privacy protections: The broadest possibility for data sharing in neuroimaging is contingent upon robust de-identification methods: de-facing MRIs, scrubbing DICOM headers of direct identifiers, and data re-structuring are some methods. While these may be technical or complex methods, communication about concepts and approaches to privacy is essential.
Participatory research designs: Working with participants and the wider public in the design of non-technical aspects of neuroimaging research will promote meaningful research questions and results, community and public trust, and effective dissemination of new knowledge.
Contextual integrity: Data access agreements and de-identification can assist in ensuring the ethical integrity of neuroimaging data sets, which is largely determined by the context in which data are generated.

( Poupon et al., 2017 ). The frame of mind that Open Science creates is that scientific outputs should be accessible by default, and that restricting access to them requires justification; for example, when the validity of results or participant privacy may be endangered ( Azoulay, 2020 ). Open data is thus not all-or-nothing. Sacrificing the validity of results in the name of openness makes little sense; we would have openness but no scientific knowledge. The protection of participants and communities is similar in this regard; openness must be reconciled with the concerns of ethics and law. Indeed, the potential benefits of Open Science bring obligations that fall largely on the shoulders of scientists and engineers who carry out imaging research. From addressing classical issues of consent to more novel issues of preventing stigmatization of communities from which imaging subjects are drawn, neuroimagers should not have to bear these responsibilities alone.
We explore these responsibilities here using neuroethics as a lens through which to understand the multifaceted ethical and legal issues Open Science poses. The focus of neuroethics is on the alignment of neuroscience discovery with human values, and pragmatic solutions to associated challenges that arise in trying to achieve that goal. Many of the responsibilities, challenges and solutions apply across disciplines that involve human participants; others are specific to the brain. Many of the definitions, norms, and principles for Open Science also apply across disciplines that involve human participants. Indeed, throughout the data life cycle, fundamental rights and interests such as human dignity and privacy must be safeguarded ( Mortier et al., 2014 ;Thinyane, 2019 ;Yotova and Knoppers, 2020 ). Other challenges for Open Science are especially salient to brain imaging, such as the potential for stigmatization following research on certain mental health conditions. We focus on eight key ethical challenges for Open Science and open data in particular: concern for individuals and communities, marginalized communities, consent, privacy protections, participatory research designs, contextual integrity, fusions of clinical and research goals, and incidental findings. We recognize that this is only a partial list among many we could assemble for this review, but these are the concerns that have been most prevalent in the broad discourse surrounding data sharing and big data ethics ( Vayena and Gasser, 2016 ) and relevant to the context of neuroimaging. Other Open Science challenges such as intellectual property and authorship are beyond the scope of this focus on the protection of human participants, and have been addressed elsewhere ( Ali-Khan et al., 2018 ;Brand et al., 2015 ;David, 2004 ). We conclude with a discussion of policy changes that draw upon the concept of solidarity that we believe are central to delivering upon the promises of Open Science ( Box 1 ).

Overarching concern for individuals and communities
Western clinical and research bioethics has the individual as its focus ( The Nuremberg Code, 1949 ; World Medical Association, 2013 ), but a focus that is too narrowly placed on the individual can hide important considerations for the communities in which the individual is situated ( Weijer, 1999 ;Emanuel and Weijer, 2005 ). Contextualizing these implications is particularly important in the generation of neuroimaging data and knowledge where cultural constructs about the mind are ever present ( Amadio et al., 2018 ;Harding et al., 2021 ), and the potential for social stigma is substantial when cultural associations are made between mental health and social status ( Dodell-Feder et al., 2020 ;McLaughlin et al., 2011 ).
Respect for communities in the context of big neuroimaging data also dovetails with discourses surrounding data justice. At its core, data justice emphasizes accountability such that individuals and communities should not bear the burdens of potential misuses ( Taylor, 2017 ). Justice in this sense is rooted in fairness in the way individuals and communities are rendered in data; in areas of geographic disparities of economic development, data justice also takes on new dimensions. Access to neuroimaging facilities for clinical care and research follows typical North-South inequalities, which has downstream effects on access to, participation in, and representation in data sets ( Heeks and Renken, 2018 ). In this context, the ethical imperative to share imaging data is grounded in the leverage of resources and cost savings.

Concern for marginalized communities
Imaging and the role of neuro data in the construction of social categories and concepts incur distinctive risks of stigmatization, and justify the consideration of marginalized groups as its own category of concern. That is, even if individuals cannot be identified, risks may continue to exist for the identifiable groups. For already marginalized communities, there is the risk of further marginalization. Research regarding schizophrenia among specific communities, for example, may result in unjustified discrimination ( Drabiak-Syed, 2010 ). These social effects are unique in that their dispersed nature means that the individuals who are affected are not necessarily the ones who underwent an imaging procedure.
We see three main avenues by which risks of stigmatization may be mitigated. One way is at the conceptualization and design phase of research projects. Open Science already encourages community involvement in the scientific process. Where research that may generate stigmatizing results for groups is contemplated, researchers should work with those groups to ensure that the research does not unnecessarily risk stigmatizing them. Some vulnerable groups already have developed guidelines to assist researchers in engaging with and designing research with marginalized communities (e.g., International Transgender Health Forum's Transgender Informed Consent (TRICON) Disclosure Policy (2019) ; Principles of Ownership, Control, Access, and Possession (OCAP) ( First Nations Information Governance Centre, 2018 )). While such community guidelines are useful, concerns regarding who speaks for whom make such documents only a starting point in terms of research design and participant engagement. The outcome of community engagement processes should also be communicated within the data sets, helping structure acceptable use conditions. Read-me files are an obvious candidate and could supplement metadata tagging, albeit more work is needed to decide on how community preferences should be communicated within structured metadata.
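As one purely illustrative possibility, community use conditions could be kept once in a structured form and rendered both as machine-readable metadata and as a read-me section. The sketch below assumes a hypothetical set of field names (community_consulted, permitted_uses, excluded_uses, contact) and a placeholder contact address; none of these is an existing metadata standard, and an actual schema would need to be agreed with the communities concerned.

```python
import json

# Hypothetical use conditions; the field names and values are illustrative only
# and do not correspond to any established metadata standard.
use_conditions = {
    "community_consulted": True,
    "permitted_uses": ["methods development", "aging research"],
    "excluded_uses": ["research on stigmatizing conditions without community review"],
    "contact": "data-access-committee@example.org",
}


def render_readme_section(conditions):
    """Render the structured use conditions as plain text for a read-me file."""
    consulted = "yes" if conditions["community_consulted"] else "no"
    lines = [
        "Community use conditions",
        "",
        "Community consulted: " + consulted,
        "Permitted uses: " + "; ".join(conditions["permitted_uses"]),
        "Excluded uses: " + "; ".join(conditions["excluded_uses"]),
        "Contact: " + conditions["contact"],
    ]
    return "\n".join(lines)


# The same conditions serialized for metadata tagging and for human readers.
sidecar_json = json.dumps(use_conditions, indent=2)
readme_text = render_readme_section(use_conditions)
```

Keeping a single source for both renderings is a design choice meant to prevent the read-me and the metadata from drifting apart; it does not settle the harder question of what the conditions should say.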
It is important to note that stigmatization typically arises from outside of the scientific community. Laypeople may hear about certain findings second-hand and incorrectly interpret them. While not foolproof, a way to mitigate risk of stigmatization is to have a plan in place to communicate results to the wider public. Partnering with trusted science journalists, for example, increases the likelihood that study results are properly contextualized ( Illes et al., 2010 ) and that concerns relating to essentialism are reduced. Where open sharing of de-identified data is foreseen, care should be exercised in deciding whether or not to even include information relating to marginalized communities in the data set. This type of group anonymity may protect such communities from risks posed by unforeseen research that may not adequately respect their dignity. As always, however, decisions require context. Not including such data may also present concerns as to equity, as well as the robustness and generalizability of knowledge generated from such data sets.

Models, methods, and meanings of consent
Consent is primordial and familiar to human subjects researchers, but we would be remiss not to review its multifaceted nature in the context of neuroimaging and Open Science.
Specific consent is the longest standing model of consent. It is yoked to the ethical principle of autonomy (respect for persons), and has the benefit of simplicity. It is arguably relatively simple to inform a participant or legal representative about the goals of a single research study, seek institutional ethics approval, and even establish context-specific community engagement and approval grounded in communitarian ethics, where that is needed. Specific consent is further advantageous in the context of neuroimaging where small or remote communities are involved, participatory research designs desirable (please also see below), and for which the risk of disclosure or identification of persons or the community through secondary data uses is high. Canadian Indigenous Peoples, Native Americans, and Māori are all examples of this context. As technologies such as portable, cloud-enabled MRI and other advanced neurotechnologies are being developed and contemplated for deployment to such communities for research where it could not possibly have been done before ( Grill et al., 2020 ;O'Reilly et al., 2021 ;Tran et al., 2021 ;Turpin et al., 2020 ), specific consent may play an essential, if not a comeback role even against the backdrop of Open Science. The downside today to specific consent, however, is in the de facto limitation to the reuse of data and whether sharing hijacks simplicity in the cost-benefit equation.
Dynamic consent, whereby research participants decide on a case-by-case basis how their neuroimaging data may be used, is an option that permits the use of data in a continuous flow of research ( Kaye et al., 2011 ). This form of self-managed privacy offers participants the greatest control over their data, consistent with the principle of respect for autonomy. However, participant fatigue and loss of contact with participants are substantial risks. Dynamic consent is untenable for people in communities with limited capacity for research in terms of human resources, even with portable technology on the horizon, and in the face of more potentially profound daily challenges to health, food, water and other threats to personal security. Even those with ample resources may be unable to appreciate the risks that inhere to a particular decision for their data to be used sequentially over time ( Solove, 2012 ).
Broad consent consists of individual consent to an area of research that is coupled with ongoing governance. Under this model, the limitations of a narrow consent to a specific research project are removed, and the fundamental concerns for individual autonomy and non-maleficence are secured through governance mechanisms that protect the data for bona fide uses only. Where relevant, bodies such as data access committees can ensure representative membership from certain epistemologically diverse communities (e.g., Indigenous groups) to ensure that uses cohere with community norms and expectations.
Notably, however, the governance mechanisms for broad consent form part of larger repositories of data and samples. The UK Biobank, for example, has been conducting brain imaging of its participants ( Alfaro-Almagro et al., 2018 ), who have given broad consent to the use of their data that are overseen by multitiered governance structures ( Laurie, 2011 ). In a similar vein, the Canadian Alliance for Healthy Hearts and Minds conducts MRI exams coupled with cognitive evaluations and blood sampling in partnership with CARTaGENE and the Montreal Heart Institute Biobank, which ensures that biobanks evolve as research infrastructures ( Anand et al., 2016 ).
Laws such as the European Union's General Data Protection Regulation (2016) allow, in principle, for broad consent. Yet, insofar as personal data are processed, consent as a legal basis is likely not possible in many Open Science contexts. Data processing will be done by different researchers who may not be able to rely on the initial consent to data processing ( Peloquin et al., 2020 ). Not having had prior interactions with data subjects, secondary use researchers are unlikely to be able to obtain new consent to data processing. Consequently, some other legal basis must be found, which may pose challenges even when research presents much potential to further public interest ( Becker et al., 2020 ).

Privacy protections
Privacy is among the most vexing issues for ethics and law in the Open Science paradigm. The fundamental issue is how autonomy can be respected in conjunction with the obligation to reduce risks while also aspiring to the broadest possible sharing and reuse of data. Data must typically be personal, i.e., relate to an identifiable individual, before legal and ethical limitations are put on its use. Still, as the preceding section underscored, the generation and re-use of data derived from marginalized communities may impose additional responsibility in setting use conditions, even if data are not considered to be personal. Key privacy issues include transfers to other jurisdictions that may offer less robust privacy protections, and the appropriate level of de-identification.
Overall, if the data are not personal, they can flow freely. Failing a sweeping change to privacy law, de-identifying data such that they are no longer considered personal is a key pillar in contemporary Open Science practice. DICOM headers with direct identifiers should be scrubbed; indirect identifiers are trickier as the analysis must be contextual ( Tremblay-Mercier et al., 2020 ). Certain fields may need to be aggregated; for example, details about occupation can be grouped into broad categories. De-identification of scans through de-facing algorithms and similar tools can reduce the risks data sharing poses to individuals ( Bischoff-Grethe et al., 2007 ). These techniques must be communicated to participants as part of the informed consent process, such as with the Open Brain Consent ( Bannier et al., 2020 ).
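The two header-level steps described above, removing direct identifiers and aggregating an indirect one, can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dictionary stands in for a parsed DICOM header, the list of direct identifiers is deliberately short and not exhaustive, and the occupation groupings are hypothetical. De-facing of the image data itself is not covered, and which indirect identifiers matter remains a contextual judgment.

```python
# Illustrative only: a plain dict stands in for a parsed DICOM header.
# The set below is a short, non-exhaustive example of direct identifiers.
DIRECT_IDENTIFIERS = {"PatientName", "PatientID", "PatientBirthDate", "PatientAddress"}

# Hypothetical mapping from detailed occupations to broad categories.
OCCUPATION_GROUPS = {
    "neurosurgeon": "health care",
    "nurse": "health care",
    "high school teacher": "education",
    "professor": "education",
}


def scrub_direct_identifiers(header):
    """Return a copy of the header without the listed direct identifiers."""
    return {key: value for key, value in header.items() if key not in DIRECT_IDENTIFIERS}


def generalize_occupation(occupation):
    """Map a detailed occupation to a broad category, defaulting to 'other'."""
    return OCCUPATION_GROUPS.get(occupation.strip().lower(), "other")


def deidentify(header):
    """Scrub direct identifiers, then aggregate the occupation field if present."""
    cleaned = scrub_direct_identifiers(header)
    if "Occupation" in cleaned:
        cleaned["Occupation"] = generalize_occupation(cleaned["Occupation"])
    return cleaned
```

For example, a header containing a name, an identifier, the modality, and the occupation "Neurosurgeon" would come back with only the modality and the category "health care". The function returns a new dictionary, so the original record stays untouched for the data generator's own controlled records.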

Participatory research designs
Whether the approach to neuroimaging involves specific, dynamic or broad consent, or focuses on the individual or community, delivering on the promises of Open Science involves an entire ecosystem that surrounds the production of scientific knowledge. Where Open Science meets citizen science, for example, members of the public engage with experts in setting research priorities and project design ( Wyler and Haklay, 2018 ). While technical aspects of a neuroimaging study will remain with the scientists and engineers, this approach, with its historical basis in community-based participatory research, democratizes knowledge through the democratization of the research process itself ( Israel, 2013 ). It may seem cumbersome if not exceedingly challenging to make such a shift, but concerted efforts to engage patients or persons from historically marginalized populations in identifying and prioritizing research questions, goals and governance ( Stevenson et al., 2013 ;Woodbury et al., 2019 ) can ensure good use of often precious resources, meaningful results, and effective strategies for dissemination of new knowledge. A relatively passive, take-it-or-leave-it notion of autonomy in traditional consent processes is thus transformed into an active one.

Contextual integrity
To maintain the contextual integrity of neuroimaging data with a sharing pathway, safeguards are needed to ensure the integrity of functions, purposes, and values ( Nissenbaum, 2019 ). At the most fundamental level, this may mean assessing how conceptually far away a secondary use may be from the initial one. Data initially generated for research into the development of brain imaging techniques poses profoundly different issues than using that same data for the study of stigmatizing mental conditions ( Heinrichs, 2012 ). Consequently, even if there are no formal barriers to secondary use, the contextual limitations of data must be recognized.
The open data emphasis of Open Science does not mean that everyone should have access to data. Rather, it implicitly asserts that data are used for bona fide research purposes, whether by researchers at traditional institutions such as universities and institutes or by citizen scientists. Law enforcement agencies, insurance companies, political parties, and any agents of these groups are not the intended beneficiaries of lowered barriers to data access. Registered and controlled access models, data access agreements, prohibitions on re-identification, and other such safeguards are available for this purpose ( Sarwate et al., 2014 ). While effective at ensuring the contextual integrity of data, these safeguards put up barriers to data access. The modalities of sharing within the Open Science paradigm are relatively static and consist of either de-identifying the data or having a registered or controlled access model overseen by a data access committee. New technologies such as federated search, training and analysis are developing quickly. Such federated infrastructure can permit the processing of personal data where sharing would not be otherwise possible, e.g., training deep learning algorithms on large fMRI data sets ( Li et al., 2020 ).
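The general idea behind federated training can be illustrated with a toy sketch: each site updates a shared model on its own data and returns only model parameters, which a coordinator averages, weighted by sample count. The example below is an assumption-laden simplification, not the method of Li et al. (2020): it fits a one-variable linear model rather than a deep network, the function names are hypothetical, and exchanging parameters does not by itself guarantee privacy.

```python
def local_step(w, b, xs, ys, lr):
    """One gradient step on a site's own data for the model y = w * x + b.

    Only the updated parameters and the sample count leave the site.
    """
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b, n


def federated_average(updates):
    """Average site parameters, weighting each site by its sample count."""
    total = sum(n for _, _, n in updates)
    w = sum(w_i * n for w_i, _, n in updates) / total
    b = sum(b_i * n for _, b_i, n in updates) / total
    return w, b


def train(sites, rounds=300, lr=0.1):
    """Alternate local steps and averaging; raw data never reach the coordinator."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [local_step(w, b, xs, ys, lr) for xs, ys in sites]
        w, b = federated_average(updates)
    return w, b


# Two hypothetical sites whose data both follow y = 2x.
sites = [([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), ([0.0, 1.0], [0.0, 2.0])]
w, b = train(sites)
```

With one local step per round, the weighted average matches a gradient step on the pooled data, so in this toy case the fitted slope approaches 2 and the intercept approaches 0 even though the coordinator never sees individual-level values.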

Fusion of clinical and research goals
Where Open Science research meets the clinic, ensuring the contextual integrity of neuroimaging data in a sharing pathway is particularly complex. Participating in research must not impede clinical care; international biomedical ethics insists on this ( Council for International Organizations of Medical Sciences, 2016 ;World Medical Association, 2013 ). Yet real-life boundaries are never as crisp as on the page. Consider the progressive realization of learning health systems that aim to use individual data to improve clinical practice, including efficiency and quality in an integrative fashion ( Institute of Medicine (US) Roundtable on Evidence-Based Medicine, 2007 ). Examples of such systems initially emerged in the context of artificial intelligence for rare diseases and cancers ( Graaf et al., 2018 ), but brain imaging data of individuals today are showing similar powerful benefits in the context of multiple sclerosis ( Mowry et al., 2020 ) among other neurologic disorders such as stroke and epilepsy, disorders of aging, and major psychiatric disorders. Research and clinical ethics neuroimaging paradigms need to anticipate and attend to new issues in this context ( Faden et al., 2013 ), especially as they pertain to internally facing considerations of quality assurance, reproducibility, and bias, and outwardly facing considerations of transparency in data use, privacy, and public trust.

Incidental findings
Discussion of incidental findings has been a robust topic for neuroimagers for two decades and, some argue, is the sine qua non example of the blurring of research and clinical lines in neuroimaging. With a direct connection to participants, researchers involved in the primary generation of images should have a management plan in place that involves transparency about pathways to identifying and disclosing unexpected findings ( Illes et al., 2004 ;Illes and Racine, 2005 ). In secondary uses of neuroimaging data, a new discovery of an anomaly of brain structure, or even potentially brain function ( Scott et al., 2012 ), may occur if the primary research excluded attention to, and reporting of, such findings. Wolf et al. (2008) explored this situation in the context of biobanks (using the term biobank to refer both to collections of samples and collections of data) and suggest that biobanks shoulder the responsibility to manage incidental findings and individual research results of potential health, reproductive, or personal importance to individual contributors. When re-identification of individual contributors is possible, and the consent permissions allow, the biobank should work to enable the biobank research system to discharge four core responsibilities: to clarify the criteria for returnable findings, analyze a particular finding for actionability, re-identify the contributor, and offer the finding to the contributor through recontact. This framework has been successfully implemented for neuroimaging by Anand et al. (2016) .
Where data are in a public, open-access portal, the return of incidental findings is less likely due to difficulties in identifying participants. Data would have gone through an intensive de-identification process and re-identifying subjects may, depending on the data management practices of the data generator, require unreasonable efforts. There are hypothetical re-identification methods ( Ravindra and Grama, 2019 ), but they do not exist for every de-identification process. Where subjects may have distinctive brain features that allow for identification with effort (e.g., arachnoid cysts, pilocytic astrocytomas), however, secondary-use researchers should see if the original research team foresaw the return of results. As part of the informed consent process, clarity that data sharing will not necessarily entail more individual results being returned is essential.

Solidarity: uniting infrastructures to support a new paradigm
Solutions to the challenges posed by Open Science for neuroscience and neuroimaging must recognize that different harms require different protections. What is needed is a principle, like solidarity, that bridges the right to benefit from scientific advancement in neuroimaging and the right to be protected from unjustified harms ( Buyx, 2017 , 2012 ). We explore three potential options for realizing this notion of solidarity: legal prohibitions on re-identification, disallowing the use of datasets for non-scientific purposes, and a harm mitigation fund. Crucially, these are protections that neither research ethics committees nor even the most carefully crafted data access agreements can offer. Instead, they speak to the highest levels of research regulation. If Open Science is to flourish, creativity at this level is essential. Governments and the funders of research, be they public or private, have a large role to play in bringing about these enabling conditions.

Legal prohibitions on re-identification
If the potential harms are informational and discriminatory, then a general prohibition on attempting to re-identify individuals from any data set without a reason in law for doing so is a start. This approach has been taken by the United Kingdom ( Data Protection Act, 2018 , sec. 171), for example. Such a law is more effective than a data access agreement because of its general applicability (there is no agreement needed) and because it presents the possibility of government-backed sanctions.

Disallowing the use of datasets for non-scientific purposes
Some harms may go beyond the individual and may affect public trust in the Open Science endeavor. Legally disallowing the use of scientific datasets for insurance or political purposes would offer additional protections. In the case of law enforcement, more nuanced solutions are required to strike a proportionate balance in the interests at stake. Regulating the processing of data can act as an upstream, ex ante protection, compared to the focus of discrimination law on remedying ex post harms ( Cofone, 2019 ). Participants rightly expect that their data will be used for bona fide scientific research, not to impede their ability to obtain insurance or have criminal justice implications for them.

Harm mitigation funds
In recognition of both the contribution of participants and of the fallibility of safeguards, a harm mitigation fund has been proposed for the rare cases of harms that are due to data misuse ( Prainsack and Buyx, 2016 ). Such an approach mirrors universal healthcare systems or other types of collective insurance schemes. At its core, it recognizes that anyone may suffer a harm and leaving the burdens up to pure chance is ethically questionable, or even indefensible. A harm mitigation fund may moreover present greater flexibility than insurance contracts that are commonly required for research projects. Contributors to the funds could include funding agencies, researchers' institutions, owners of intellectual property derived from Open Science products, and large publishers. With regard to management, models can be found in pension funds, charitable trusts, and other common legal vehicles that permit the management of capital for specific purposes and for defined classes of beneficiaries.

Conclusion
All told, robust scientific, ethical and political debates are needed to ensure that Open Science can achieve its full potential. Reaching that potential can be envisioned, metaphorically, as a marathon, not a sprint. Open Science is a philosophy and an approach to the creation of generalizable knowledge that is multidimensional and, simultaneously, exhilarating and daunting. The potential to leverage data in neuroimaging is enormous, but technical considerations and responses to them must go hand in hand with ethical, legal and social ones. While neuroimagers shoulder the challenges and the benefits of solutions to them for the brain imaging landscape, intersectoral partnerships across the life sciences, law and humanities that are deeply integrated with any research plan, and solidarity through unified infrastructures, will mitigate the burdens that come with any innovation.