Towards a Risk Catalogue for Data Management Plans

Although data management and its careful planning are not new topics, there is little literature on risk mitigation in data management plans (DMPs). We consider it a problem that DMPs do not include a structured approach for the identification or mitigation of risks, because such an approach would instil confidence and trust in the data and its stewards.


Introduction
"A data management plan (DMP) is a document that describes how you will treat your data during a project and what happens with the data after the project ends" (Michener, 2015, p. 1). DMPs "serve to mitigate risks and help instil confidence and trust in the data and its stewards" (Donnelly, 2012, p. 83). "Planning for the effective creation, management and sharing of your data enables you to get the most out of your research" (Jones, 2011, p. 2). Therefore, a DMP should be created not only to obtain a grant but also to conduct the proposed project successfully.
According to ISO 31000 (International Organization for Standardization, 2009, p. 1), a risk is "an effect of uncertainty on objectives". Data management plans should help to decrease the effects of uncertainty on project objectives. We consider it a problem that neither DMPs nor funders' DMP evaluation schemes include a structured approach for the identification or mitigation of risks, since such an approach would foster the successful conduct of data-generating projects, which are often funded research projects. We believe our approach will help funders evaluate the risks of proposed projects and hence the risks of their investment options.
Data management maturity models like the Data Management Maturity (DMM)℠ Model (Capability Maturity Model Integration) (CMMI Institute, 2019) or the Enterprise Information Management (EIM) maturity model (Newman and Loga, 2008) are primarily designed for enterprises and may not be feasible for higher education institutions (HEIs). A rigid model for HEIs to coordinate support of data management and sharing across a diverse range of actors and processes to deliver the necessary technological and human infrastructures "cannot be prescribed since individual organisations and cultures occupy a spectrum of differences" (Jones, Pryor and Whyte, 2013, p. 4). Also, there is a potential conflict between organisational demands and scientific freedom. The Charter of Fundamental Rights of the EU enshrines scientific freedom as a constitutional right, and researchers may view the imposition of specific data management processes as a restriction of their scientific freedom. On an even more international level, UNESCO recommends that "Each Member State should institute procedures adapted to its needs for ensuring that, in the performance of research and development, scientific researchers respect public accountability while at the same time enjoying the degree of autonomy appropriate to their task and to the advancement of science and technology" (UNESCO, 2018, p. 119).
We consider it important that researchers commit themselves to data management practices such as ISO 31000. However, ISO 31000 (International Organization for Standardization, 2009, p. 14) defines the risk management process as a feedback loop to be conducted in organisations. Projects tend to have a much more limited scope with regard to funding and duration than organisations. Therefore, we regard the ISO 31000 risk management process as too time-consuming and of limited suitability for funded research and similar projects.
In this paper, we propose a lightweight approach for the identification of general risks in DMPs. We introduce an initial version of a generic risk catalogue for funded research and similar projects. By analysing a selection of 13 DMPs for projects from multiple disciplines¹ published by the Research Ideas and Outcomes (RIO) journal, we demonstrate that our approach is applicable and transferable to multiple institutional constellations. As a result, the effort for integrating risk management in data management planning can be reduced.

¹ Anderson and Fey, 2016; Canhos, 2017; Fisher and Nading, 2016; Gatto, 2017; McWhorter, Thomas and Wright, 2016; Neylon, 2017; Stolze and Nichols, 2016; Pannell, 2016; Traynor, 2017; Wael, 2017; White, 2016; Woolfrey, 2017; Xu, Ishida and Wang, 2016

IJDC | Conference Pre-print
Weng and Thoben

Related Work
Jones, Pryor and Whyte (2013, p. 2) developed a guide for HEIs "to help institutions understand the key aims and issues associated with planning and implementing research data management (RDM) services". In this guide, the authors mention data management risks for HEIs. While the upfront costs for cheap storage of active data "may be only a fraction of those quoted by central services, the risks of data loss and security breaches are significantly higher, potentially leading to far greater costs in the long term" (Jones et al., 2013, p. 13). There are "potential legal risks from using third-party services" (Jones et al., 2013, p. 14). Data selection counters the risks of "reputational damage from exposing dirty, confidential or undocumented data that has been retained long after the researchers who created it have left" (Jones et al., 2013, p. 15). The OSCRP (Open Science Cyber Risk Profile) working group developed the OSCRP, which "is designed to help Principal Investigators (PI) and their supporting Information Technology (IT) professionals assess cybersecurity risks related to Open Science projects" (Peisert et al., 2017, p. 2). The OSCRP working group proposes that principal investigators examine risks, consequences and avenues of attack for each mission-critical science asset on an inventory list, where assets include devices, systems, data, personnel, workflows, and other kinds of resources (Peisert et al., 2017). We regard this as a very detailed alternative to our approach, but the FAIR guiding principles (Wilkinson et al., 2016, p. 8) and long-term preservation need to be added.
Ferreira et al. (2014, p. 41) "propose an analysis process for eScience projects using a Data Management Plan and ISO 31000 in order to create a Risk Management Plan that can complement the Data Management Plan". The authors describe an analytical process for creating a risk management plan and "present the previous process' validation, based on the MetaGen-FRAME project" (Ferreira et al., 2014, p. 42). Within this validation, Ferreira et al. (2014, p. 50) identify project-task-specific risks like "R6: Loss of metadata, denying the representation of the output information to the user via Taverna". This risk is tailored to the use of Taverna and hence may not be relevant for the majority of funded research and similar projects. There may be projects for which analysing specific risks for all resources is crucial. However, a detailed risk analysis may require a considerable amount of work.

Methods
We propose a lightweight approach that can serve as a starting point to include risk management in research data management planning. It does not preclude detailed approaches like OSCRP (Peisert et al., 2017) or ISO 31000 (International Organization for Standardization, 2009). Instead, we propose an approach which tries to reduce and possibly avoid the burden of a full risk management process such as ISO 31000. Our approach is based on a pre-tailored and extensible general risk catalogue (Table 1) to lessen the effort required for risk management. We derived part of this risk catalogue from 29 interviews with researchers from multiple disciplines², which we conducted as part of project SynFo - Synergy Creation on the operational Level of Research Data Management. One goal of project SynFo was the development of a transferable approach to improve research data management in multiple organisational constellations. In generalised content from the interviews, we identified risks entailed by interfaces of information, e.g. between researchers and data subjects or between researchers and external service providers. For the development of our approach, we also consulted the catalogues for threats and measures from the supplement of the "IT-Grundschutz" catalogues (Federal Office for Information Security (BSI), 2016) by the German Federal Office for Information Security (BSI), the FAIR guiding principles (Wilkinson et al., 2016) as well as the report and action plan from the European Commission expert group on FAIR data (Collins et al., 2018).

Our risk identification includes risks, their possible risk sources, mitigation approaches, and consequences. By analysing occurrences and mitigations of risks from our catalogue within a selection of 13 DMPs from multiple disciplines³, published by the RIO journal, we demonstrate that our lightweight approach is applicable to DMPs and transferable to multiple institutional constellations. We evaluate the occurrences of the 15 risks in our catalogue by identifying possible risk sources in each of the selected DMPs and analyse the risk mitigations in accordance with what the authors wrote.

Legal Risks
A breach of a regulation like the GDPR or the Nagoya Protocol can result in high fines. At worst, compliance breaches can lead to reputational damage, legal disputes and enormous costs.

Penalty for conducting unreported notifiable practices
Research may include reportable research practices such as the collection of physical samples regulated by the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization, which was transposed into EU law by Regulation (EU) No 511/2014. Under this regulation, there is a reporting obligation if the research on genetic resources is financially supported (Regulation (EU) No 511/2014, Art. 7, Sec. 1) and at the final stage of development of a product that is based on the utilisation of genetic resources (Regulation (EU) No 511/2014, Art. 7, Sec. 2). Article 11 states that "Member States shall lay down the rules on penalties applicable to infringements of Articles 4 and 7 and shall take all the measures necessary to ensure that they are applied" (Regulation (EU) No 511/2014). The Nagoya Protocol "and EU documents themselves give no guidance on penalties, each country has the liberty to determine these" (van Vegchel, 2018). Consequences may be fines of up to EUR 810,000 or even imprisonment (van Vegchel, 2018). To avoid penalties, the parties should comply strictly with the rules. The Convention on Biological Diversity publishes a detailed list of parties to the Nagoya Protocol⁴.

Penalty for unpermitted usage of external data
In many countries, data by themselves do not have inherent legal protection. Licence contracts can reach various agreements concerning terms of use. Free licences make (data) objects available for utilisation to everyone, but usage can be restricted or conditioned. Creative Commons (CC) licences and the GNU General Public License (GPL), which is specialised for free software, are widely used. Nonetheless, using CC licences can lead to conflicting rights of third parties. Publicity, personality, and privacy rights "not held by the licensor are not affected and may still affect your desired use of a licensed work" (Creative Commons, 2019). "If there are any third parties who may have publicity, privacy, or personality rights that apply, those rights are not affected by your application of a CC licence, and a reuser must seek permission for relevant uses" (Creative Commons, 2019). This holds, for example, for pictures of persons. Also, the GNU GPL imposes transitive obligations, e.g. "derivative programmes must also be subject to the same initial GPL conditions of ability to copy, modify, or redistribute" (Lipinski, 2012, p. 312). To mitigate the risk of unpermitted usage of external data, it is recommended to abide by the licence terms. In general, an overview of the data and the related licences can be developed in the DMP or within the framework of a data policy.

Penalty for unpermitted usage of personal data
In the EU, the General Data Protection Regulation (GDPR) governs the processing of personal data. Articles 6 and 7 of the GDPR regulate the lawfulness of processing and the conditions of consent. On an international level, the European Commission can conduct an assessment to "ensure that the level of data protection in a third country or international organization is essentially equivalent to the one established by the EU legislation" (Article 29 Data Protection Working Party, 2018, p. 5). Canada (commercial organisations), Israel, Switzerland, Japan and the USA (limited to the Privacy Shield Framework) offer an adequate level of data protection (European Commission, 2019). To avoid penalties, it is advisable to obtain written consent from data subjects, including information about the purpose and procedures of data processing.

Penalty for conducting inadequate data protection practices
Article 5 of the GDPR enumerates principles related to processing of personal data: the principle of lawfulness, fairness and transparency, purpose limitation, data minimisation, accuracy, storage limitation, integrity and confidentiality as well as accountability. According to Article 45 of the GDPR, "A transfer of personal data to a third country or an international organisation may take place where the Commission has decided that the third country, a territory or one or more specified sectors within that third country, or the international organisation in question ensures an adequate level of protection. Such a transfer shall not require any specific authorisation" (Council of the European Union and European Parliament, 2016, p. 61). Countries without an adequacy decision, which are not classified as safe third countries, can guarantee protection in other ways, for example by appropriate safeguards (Art. 46, GDPR) or binding corporate rules (Art. 47, GDPR). To avoid penalties, it is advisable to abide by the applicable laws. In case of doubt, researchers can contact the (data protection) authorities.

Privacy Risks
A loss of confidentiality can have adverse effects on an organisation, such as financial effects (Federal Office for Information Security (BSI), 2016, p. 396). These effects may also apply to a researcher who additionally may want to keep research data confidential before scientific output is published, so that research data will not be subject to theft of work.

Loss of confidentiality through sending data to an unintended recipient
Correspondence has the intrinsic potential that a researcher transmits data to an unintended recipient. This may happen accidentally or as the result of a fraudulent attack like social engineering and leads to loss of confidentiality. "Social engineering is a method used to gain unauthorised access to information or IT systems by social action" (Federal Office for Information Security (BSI), 2016, p. 419). Researchers should take extra care when sending confidential information and be aware of fraudulent attacks.

Loss of confidentiality through interception or eavesdropping of information
In the supplement of the IT-Grundschutz catalogues, the BSI specifies the threats of interception or eavesdropping of information, which entail the risk of loss of confidentiality (Federal Office for Information Security (BSI), 2016, p. 396). "Since data is sent using unforeseeable routes and nodes on the internet, the sent data should only be transmitted in an encrypted form, as far as possible" (Federal Office for Information Security (BSI), 2016, p. 3105).

Loss of confidentiality through loss or theft of portable storage media or devices
"Portable terminal devices and mobile data media in particular can be lost easily" (Federal Office for Information Security (BSI), 2016, p. 394) or even be stolen. "Whenever possible, mobile data media such as USB sticks and laptops should always be encrypted completely even if they are only occasionally used for confidential information" (Federal Office for Information Security (BSI), 2016, p. 3877).

Loss of confidentiality through careless data handling by an external party
We regard the event that researchers share data with an external party without the purpose of publication as entailing the risk of loss of confidentiality. The external party may handle confidential data carelessly. "It can frequently be observed that there are a number of organisational or technical security procedures available in organisations, but they are then undermined through careless handling of the specifications and the technology" (Federal Office for Information Security (BSI), 2016, p. 767). We recommend that researchers who share their research data always grant specific usage rights in written form to the external party or check whether the external party applies appropriate security measures.

Technical Risks
Data can lose their integrity or be lost (Federal Office for Information Security (BSI), 2016, pp. 422-423), leading to the major risk of unavailability of data. Unavailability of the correct data through silent corruption can lead to usage of incorrect data and hence to the production of incorrect results. If data are unavailable, either the project may fail or researchers need to repeat their data collection and the project will be behind schedule.

Unavailability through data corruption
"The integrity of information may be impaired due to different causes, e.g. manipulations, errors caused by people, incorrect use of applications, malfunctions of software or transmission errors" (Federal Office for Information Security (BSI), 2016, p. 423). "If only accidental changes need to be detected, then checksum procedures (e.g. cyclic redundancy checks) or error-correcting codes can be used" (Federal Office for Information Security (BSI), 2016, p. 2991). Nonetheless, there may be other scenarios where these verification techniques are insufficient.

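The checksum procedure quoted in the preceding paragraph can be illustrated with a minimal Python sketch. It uses the standard-library functions `zlib.crc32` and `hashlib.sha256`; the helper names and the sample data are hypothetical and serve only as an illustration. The SHA-256 digest is an addition that is not taken from the BSI catalogues: it is one common choice where a stronger fingerprint than a CRC is wanted, although by itself it does not protect against deliberate manipulation unless the recorded digest is kept somewhere an attacker cannot alter it.

```python
import hashlib
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC-32 checksum of a byte string (detects accidental changes)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest as a hex string (assumption: not named in the source)."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_crc: int) -> bool:
    """Compare the current checksum with the one recorded when the data were stored."""
    return crc32_of(data) == recorded_crc


# Hypothetical sample data: record the checksum at storage time ...
original = b"sample_id,temperature\n1,20.5\n2,21.0\n"
recorded = crc32_of(original)

# ... and re-check later; a silently altered value changes the checksum.
corrupted = original.replace(b"20.5", b"20.6")
print(is_unchanged(original, recorded), is_unchanged(corrupted, recorded))
```

In a DMP, such a check would typically be described as a periodic fixity check on stored files rather than on in-memory byte strings.
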
Unavailability through data loss
Data may "be lost when devices or data media are damaged, lost or stolen" (Federal Office for Information Security (BSI), 2016, p. 422) and hence become unavailable. Approaches to mitigate irretrievable losses of data are, for example, regular backups (Federal Office for Information Security (BSI), 2016, p. 4432) or keeping copies in multiple storage locations (Reich and Rosenthal, 2000).
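As a rough illustration of keeping copies in multiple storage locations, the following Python sketch copies a file to several directories and later checks that every copy still matches the source. All function names and paths are hypothetical, and local directories merely stand in for what would in practice be separate devices or sites; this is a sketch under those assumptions, not the procedure described by the cited authors.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def replicate(source: Path, locations: list[Path]) -> list[Path]:
    """Copy the source file into each storage location and return the copies."""
    copies = []
    for location in locations:
        location.mkdir(parents=True, exist_ok=True)
        target = location / source.name
        shutil.copy2(source, target)
        copies.append(target)
    return copies


def all_copies_intact(source: Path, copies: list[Path]) -> bool:
    """Return True only if every copy exists and matches the source digest."""
    expected = file_digest(source)
    return all(c.exists() and file_digest(c) == expected for c in copies)


# Hypothetical usage with temporary directories standing in for storage sites.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    dataset = root / "measurements.csv"
    dataset.write_bytes(b"id,value\n1,3.14\n")
    backups = replicate(dataset, [root / "site_a", root / "site_b"])
    print(all_copies_intact(dataset, backups))
```
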

Science Risks
Consequences of poor discoverability and reusability of data are that researchers may unnecessarily repeat work and that scientific outputs derived from the data may fail to be comprehensible, reproducible, or traceable. Problems with reproducibility and replication "can cause permanent damage to the credibility of science" (Peng, 2015, p. 32). For this reason, we named this category "Science risks".

Poor knowledge discovery or reusability: stakeholders cannot find, access, integrate or reuse the data
Making data findable, accessible, interoperable and reusable for human and computational stakeholders is a best-practice approach described in The FAIR Guiding Principles for scientific data management and stewardship (Wilkinson et al., 2016). Therefore, we include the risks that stakeholders cannot find, access, process or reuse data in our risk catalogue. Authors of DMPs can mitigate these risks as described by Wilkinson et al. (2016). We abbreviated the risk names under this risk category using the term 'poor knowledge discovery or reusability' but refer to all FAIR principles by Wilkinson et al. (2016).

Preservation Risk
If data are not suitably preserved, scientific outputs derived from them may fail to be comprehensible, reproducible, or traceable in the long run. Data should be stored in a trusted and sustainable digital repository (Collins et al., 2018, p. 22).

Unsustainability in the long term through unavailability or discontinuity of financial support
A digital preservation location has the intrinsic technical risk that data become unavailable through data loss or corruption. However, preservation locations also entail the risk of becoming unavailable when their funding ends. For example, Canhos states that discontinuity of financial support is a threat to Brazil's Virtual Herbarium and its data sources (Canhos, 2017, p. 5). Authors of DMPs should consider these risks when selecting a preservation location. They can mitigate the risk that data are not preserved long-term by reviewing the external preservation location's longevity, certificates, and funding. We also suggest paying attention to possible migration and exit strategies like exporting and handing over data to a national data archive. This may be particularly important when the preservation location is not external.

Evaluation
When applying the risk catalogue (Table 1) to the sample of 13 DMPs, we distinguish between risk occurrences themselves and risk occurrences with at least one mitigation, as shown in Table 2.

[Table fragment; column headings and remaining rows not recoverable]
Loss of confidentiality through sending data to an unintended recipient [RPRIR]: .00
Loss of confidentiality through loss or theft of portable storage media or devices [RPRIS]: .00
Penalty for conducting unreported notifiable practices [RLEGU]: .00

Because risk sources and mitigations were not always explicitly mentioned in the 13 sample DMPs, we needed to make interpretations. Appendix A shows our interpretation notes. According to these interpretations, we found the mitigations shown in Appendix B.

Evaluation Results
Each of the 15 risks of our catalogue occurred in at least two of the selected 13 DMPs. Table 3 summarises our evaluation results.
Within the small sample of 13 DMPs, we found 34 distinct strategies to mitigate ten of the 15 risks of our proposed catalogue. Hence, we also found that for five of the 15 risks from our catalogue the authors did not describe any mitigation in the corresponding DMP. These risks are legal and privacy risks, and they have possible consequences like loss of reputation or project failure through theft of work. Overall, the authors of the selected DMPs attach the highest importance to mitigating data unavailability through data loss, to making data findable, accessible, interoperable and reusable, and to long-term digital preservation. We found that the authors mitigated two risks from our catalogue in all of the selected DMPs. These risks are unavailability through data loss (RTECL) and poor knowledge discovery or reusability because stakeholders cannot access the data (RSCIA).

Conclusion
Since we identified each risk of our catalogue in at least two of the selected DMPs, we conclude that our risk catalogue is applicable to DMPs from multiple areas of research. In the selected DMPs, we find that overall 53 of 125 (42.4%) risk occurrences are not mitigated and hence see the necessity of DMP quality improvement through risk identification and mitigation planning in the data management planning phase.
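The share of unmitigated risk occurrences reported above follows directly from the two counts. A small Python sketch makes the arithmetic explicit; the function name is hypothetical, and the mitigated share is simply the complement derived from the same two numbers.

```python
def share_percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)


total_occurrences = 125  # risk occurrences found in the 13 sample DMPs
unmitigated = 53         # occurrences without any described mitigation
mitigated = total_occurrences - unmitigated

print(f"{share_percent(unmitigated, total_occurrences)}% not mitigated")  # 42.4% not mitigated
print(f"{share_percent(mitigated, total_occurrences)}% mitigated")        # 57.6% mitigated
```
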
We consider our approach useful to identify general risks in DMPs. We propose that after filling out a funder's DMP template, authors of DMPs refer to a risk catalogue to identify possible risk sources and hence risks. Next, the authors should add mitigations to their DMP in the corresponding paragraph, if their DMP does not already contain one. For example, in a DMP's paragraph in which authors write about the usage of external hard disks they should add a sentence indicating that these external hard disks will be encrypted to mitigate the risk of loss of confidentiality through loss or theft of storage media, if their DMP does not yet contain any measures mitigating this risk.
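The proposed workflow (look up risk sources in the catalogue, then check whether a mitigation is described) could be supported by a simple script. The following Python sketch is only an illustration under strong assumptions: the risk codes and names are taken from this paper, but the keyword lists are hypothetical, and naive keyword matching is far cruder than the manual interpretation used in our evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    code: str
    name: str
    source_keywords: tuple[str, ...]      # hypothetical indicators of a risk source
    mitigation_keywords: tuple[str, ...]  # hypothetical indicators of a mitigation


# Excerpt of the catalogue; keyword lists are illustrative assumptions.
CATALOGUE = (
    Risk("RPRIS",
         "Loss of confidentiality through loss or theft of portable storage media or devices",
         ("external hard disk", "usb stick", "laptop"),
         ("encrypt",)),
    Risk("RPRII",
         "Loss of confidentiality through interception or eavesdropping of information",
         ("upload", "transfer", "email"),
         ("encrypt",)),
    Risk("RTECL",
         "Unavailability through data loss",
         ("hard disk", "server"),
         ("backup", "copies")),
)


def unmitigated_risks(paragraph: str) -> list[str]:
    """Return codes of risks whose source is mentioned without any mitigation."""
    text = paragraph.lower()
    found = []
    for risk in CATALOGUE:
        has_source = any(k in text for k in risk.source_keywords)
        has_mitigation = any(k in text for k in risk.mitigation_keywords)
        if has_source and not has_mitigation:
            found.append(risk.code)
    return found


print(unmitigated_risks("Raw data will be stored on external hard disks."))
print(unmitigated_risks("Raw data will be stored on encrypted external hard disks with weekly backups."))
```

For the first example paragraph the sketch flags RPRIS and RTECL; once encryption and backups are mentioned, nothing is flagged. A flagged code is only a prompt for the DMP author to review the paragraph, not a finding in itself.
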
The risk catalogue may also be useful to funders, since it makes it possible for them to evaluate basic investment risks of proposed projects.
Many of the legal assertions in this article hold within the EU. Applicability to non-EU countries may vary.
We think further research on suitable risk management approaches concerning the data management of funded research and similar projects needs to be conducted.

Table 1. General risk catalogue (surviving rows only)

Risk [code] | Risk source
Loss of confidentiality through sending data to an unintended recipient [RPRIR] | Correspondence
Loss of confidentiality through interception or eavesdropping of information [RPRII] | Online data transmission
Loss of confidentiality through loss or theft of portable storage media or devices [RPRIS] | Portable storage media or devices
Loss of confidentiality through careless data handling by an external party [RPRIE] | …

Table 2. Risk occurrences (+) and risk occurrences with at least one mitigation (-) in the sample

Table 3. Summary of risk evaluation results