A FAIR-Decide framework for pharmaceutical R&D: FAIR data cost–benefit assessment

The FAIR (findable, accessible, interoperable and reusable) principles are data management and stewardship guidelines aimed at increasing the effective use of scientific research data. Adherence to these principles in managing data assets in pharmaceutical research and development (R&D) offers pharmaceutical companies the potential to maximise the value of such assets, but the endeavour is costly and challenging. We describe the 'FAIR-Decide' framework, which aims to guide decision-making on the retrospective FAIRification of existing datasets by using business analysis techniques to estimate costs and expected benefits. This framework supports decision-making on FAIRification in the pharmaceutical R&D industry and can be integrated into a company's data management strategy.


Introduction
The FAIR guiding principles for scientific data management and stewardship enhance the usability of scientific research data at scale by humans and machines [1]. The potential of these principles as a cooperative data strategy for managing pharmaceutical R&D data assets has been highlighted since 2019 [2]. In the pharmaceutical industry, the benefits expected from FAIR implementation include: accelerated innovation through the availability of numerous data sources [3,4]; reduced time frames of drug discovery through the ready accessibility of data [2,5]; data silo elimination to foster internal and external collaboration [6,7]; and the facilitated use of sophisticated analytical methods (artificial intelligence and beyond) [8,9].
The promise exhibited by the FAIR principles has motivated researchers to investigate their implementation in the pharmaceutical industry, providing evidence to support the growing requirement to adhere to these principles in managing the data assets of large, complex multinational pharmaceutical enterprises [2]. Case studies on FAIR implementation by two pharmaceutical companies reported the successful reuse of data and associated metadata, and highlighted the importance of data quality assurance alongside the implementation of the FAIR principles [10]. Nevertheless, despite the potential value of applying the FAIR data principles in pharmaceutical R&D, the effective execution of retrospective FAIRification is hindered by several issues [11-14], which we have previously classified into legal, technical, organisational and financial challenges [15]. Harrow et al. [10] highlighted the need to cherry-pick pre-existing datasets in order to select data that are of value for clearly articulated use cases. This need arises because FAIRification demands extensive internal effort as well as support from other knowledge service and technology providers. A recent study also pointed to the use of lessons learned to devise pathways for prospective FAIR data transformation (that is, the conception of datasets that are FAIR by design) [16].
The alignment of R&D datasets with FAIR principles requires significant investment of time, effort and resources, as well as collaboration between multiple pharmaceutical stakeholders [17]. Stakeholders, from data holders and researchers to senior management, would benefit from the use of decision-support frameworks that can balance potential benefits and expected costs to guide the selection of legacy datasets for FAIR implementation [15]. Previous studies have tended to focus on technical reviews, with limited, if any, inclusion of business analysis [2,6,10]. Using business analysis could help pharmaceutical stakeholders to prioritise their data assets and to understand the associated costs and expected benefits of FAIRification.
Business analysis is commonly and extensively used in economics, transport and health to guide decision makers with respect to the outcomes of projects or policies [18,19]. Such an analysis comprises a variety of methodological techniques, which can be broadly classified into two main types: the single-criterion method, which is a monetary approach, and the multi-criteria method, which is a non-monetary technique [20]. The former is represented by cost-benefit analysis (CBA), which examines the costs and benefits of an intervention, expressed in monetary units, to allow a quantitative comparison [21]. The latter is represented by multi-criteria analysis (MCA), which is a non-monetary alternative intended to assess outcomes in accordance with several criteria or to serve as a complement to CBA [22]. Applying these techniques to support decisions on data FAIRification can play a crucial role in facilitating and fostering FAIR implementation in pharmaceutical R&D.
In the FAIR context, CBA has been put forward as a way to estimate the expected costs and potential benefits of introducing the FAIR data principles into data management. The European Commission's early report on the costs and benefits arising from the absence of FAIR data in the European context and the national report for Denmark both use CBA, but their generalisability is restricted to certain sectors (public and private) [23,24]. An assistance framework that combines CBA and MCA in the prioritisation of data assets for FAIRification and that helps pharmaceutical stakeholders to make FAIRification decisions is therefore desirable.
We introduce the FAIR-Decide framework as a decision-support framework for assessing the potential of pre-existing datasets to be FAIRified and for prioritising R&D datasets using business analysis techniques (CBA and MCA). Implemented within a question-based FAIR-Decide tool, this framework provides a systematic approach to assessing costs and benefits and to involving multiple stakeholders in the FAIRification process.

The FAIR-Decide framework
The FAIR-Decide framework is an assistive framework that can help a range of pharmaceutical stakeholders and decision makers to determine, assess and explain the cost-benefit of investing in retrospective FAIRification. The framework supports: (i) the systematic justification of decision making; and (ii) the translation of experts' tacit or internal knowledge on making FAIRification decisions into external insights that can be easily assessed and used by other stakeholders, who have various roles and different positions in pharmaceutical enterprises.
The framework was developed following a workshop with 11 pharmaceutical professionals who participated in the FAIRplus project [25-27]. It also incorporates our earlier work on identifying the expected costs and benefits of FAIRification [15], and the establishment of a conceptual model of FAIRification decisions [17]. The framework presents a conceptual model for the FAIRification decision-making process (Figure 1): business use cases for the datasets are prioritised; the effort required to FAIRify the data and the expected benefits are assessed; and FAIRification is approved and resourced based on the assessment of both the scientific and the business case.
In Figure 1, we identify two categories of stakeholders: (i) the management team, which may include the middle manager, lab head, IT leader and so on; and (ii) the data team, which includes data providers and producers (e.g., researchers), data consumers (e.g., data scientists), and data stewards [17]. To cover the range of stakeholders for each dataset more explicitly, the FAIR-Decide framework organises them into four groups that are focussed on different areas: Business, Legal, Data, and Technical (Table 1).
The FAIR-Decide framework has three components: input, analysis, and output (Figure 2). At the input stage, each participant, working independently or collectively with others, completes a series of questions about their role, the roles of other stakeholders, the data of interest and their type, and the goals and scope of FAIRification. Next, the participant answers questions regarding cost and benefit factors related to the FAIRification of a single dataset using the FAIR-Decide survey tool. The majority of these factors can be scored individually on a five-point scale to reflect their importance in decision making (from 1, not important, to 5, very important). The tool produces an assessment report and a score for each factor on a five-point scale for that dataset. The participants then review their weightings, reports and scores for each dataset to come to a joint decision. The strength of the framework lies less in the actual scores and more in the reflection on the factors, questions and relative weightings or scores given by each participant.
In the analysis part, the cost-benefit factors are a set of indicators associated with FAIRification that were identified in our previous qualitative interview study [17]. Cost factors cover the legal right to access data and ethical compliance when carrying out FAIRification retrospectively, as well as costs related to the resolution of legal issues and to the current state of the dataset with regard to metadata, quality and volume. Resource factors include human resources (e.g., employees who do the work, and the allocation of a certain amount of time), the skills required to implement FAIRification, the knowledge and expertise necessary to perform FAIRification, and technical resources such as internal IT applications or the external tools necessary for FAIRification.
Benefit factors provide the value proposition (attainable value) of performing FAIRification. The first of these factors, the reusability of data assets at scale, is the main perceived advantage arising from the implementation of FAIR principles in pharmaceutical R&D. This benefit factor generates value from data assets by enabling their data to be used in pathways that derive novel scientific insights. The second benefit factor is cost savings: aligning data with the FAIR principles enables companies to get the most out of their data assets, reducing experiment duplication and thus lowering costs and shortening timeframes across the R&D pipeline.
The questions that are used to gather input for the framework can be found on Zenodo [28]. They are primarily single selections from a set of five choices and have "get-out clauses" covering "other" or "don't know" options. Some questions have a free-text "justification" box that does not contribute to the score but does contribute to the report. Only the "volume and dimensionality" sub-factor requires information other than option selection. The questions are not balanced between the major areas: cost has 29 questions and benefits only 9, with only 2 questions about cost savings. This is because cost factors are often easier to express and assess than benefits, as noted in the economic literature [29]. It should be noted that the unbalanced number of questions does not affect the weighting, as each cost and benefit factor has its own weight.
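The point that question counts need not influence factor weights can be illustrated with a minimal sketch. It assumes, purely for illustration, that the answers within a factor are averaged before any weight is applied and that "other" or "don't know" responses are recorded as None; the factor names and answer values are hypothetical and are not taken from the FAIR-Decide tool.

```python
from statistics import mean

def factor_score(answers):
    """Average the answered questions (1-5) for one factor.

    None stands for an "other" or "don't know" response (an assumption of
    this sketch) and is skipped. Returns None if nothing was answered.
    """
    answered = [a for a in answers if a is not None]
    return mean(answered) if answered else None

# Hypothetical responses: one factor has many questions, the other only two.
responses = {
    "legal_cost": [4, 5, None, 3],
    "cost_savings": [2, 4],
}

# Each factor collapses to a single score, so a factor with more questions
# does not carry more influence once per-factor weights are applied.
factor_scores = {name: factor_score(a) for name, a in responses.items()}
```

Under this assumption the number of questions only affects how finely a factor is measured, not how much it counts in the final assessment.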

FIGURE 1
Conceptual model for the FAIRification decision-making process. There are multiple starting points [17]. The FAIRification process comprises a number of variables and a variety of stakeholders.

TABLE 1
Intended users of the FAIR-Decide framework.

The inputs are analysed using the weighted sum method (WSM), a popular scoring approach grounded in MCA [30,31], which was chosen to handle qualitative data in a flexible way. The score $A_i$ is calculated by adding the scores of each decision factor ($a$) multiplied by its assigned weight ($w$). The process is expressed in the following equation:

$$A_i = \sum_{j=1}^{n} w_j \, a_{ij}$$

where a decision problem has $n$ factors, $a_{ij}$ is the score of the $j$-th decision factor contributing to $A_i$, and $w_j$ is the weight of that factor. Each decision maker can assign a value (score) to each factor and can denote its importance as a factor weight ($w$), with the weights adding up to 1. This means that the stakeholders who are involved in FAIRification can complete assessments using their own weightings, after which their results can be combined and compared with those of other stakeholders to explore differing perspectives.
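The weighted sum calculation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's implementation: the function name, the factor names, the scores and the weights are all hypothetical, and the checks (scores between 1 and 5, weights summing to 1) simply mirror the constraints stated in the text.

```python
from math import isclose

def weighted_sum_score(scores, weights):
    """Return the weighted sum of factor scores for one assessment.

    scores  -- dict mapping factor name to a score on the 1-5 scale
    weights -- dict mapping the same factor names to weights that sum to 1
    """
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same factors")
    if not isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"score for {name!r} is outside the 1-5 scale")
    return sum(weights[name] * scores[name] for name in scores)

# Hypothetical cost assessment by one stakeholder (names and values assumed).
cost_scores = {"legal": 4, "data_state": 2, "resources": 3}
cost_weights = {"legal": 0.5, "data_state": 0.2, "resources": 0.3}

cost_total = weighted_sum_score(cost_scores, cost_weights)
```

Because each stakeholder supplies their own weights, the same set of scores can yield different totals for different stakeholders, which is what makes the comparison of perspectives informative.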
The output of the FAIR-Decide tool is an assessment report in three parts, initialised by data type and time of assessment. The first part shows users the questions and their answers, and is available in a comprehensive view should users want to discuss it with their colleagues or line managers. The second part presents a summary of the cost-benefit assessment, encompassing the overall score for each cost and benefit factor in the form of a gauge chart and a weighted sum matrix for visualisation. The third part of the report displays the scores from the benefit assessment, including the selection of a given benefit and any justification for it, and the scores obtained in the cost assessment. The report can be exported as a PDF document or printed out directly, or participants can choose to receive an email that includes the full assessment report.
The tool was piloted using Qualtrics XM [32], the web-based questionnaire service approved by the University of Manchester's Information Governance Office [28,33]. This service was selected over several others because it allows the FAIR-Decide tool to be installed and used as a standalone tool, without the need for other software packages, making its testing more accessible to pharmaceutical practitioners. In addition, Qualtrics XM has numerous external libraries for mathematical and algorithmic assistance, which are available online under open licences. These features are important because they are needed to calculate the weighted sums of cost and benefit factors. The tool (version 1) is available on the University of Manchester's servers [34].
To evaluate the utility of this pilot tool, we conducted focus group discussions [35,36]. Eleven pharmaceutical professionals participated by making FAIRification decisions on pharmaceutical datasets (Table 2), each with their own use case. Prior to this workshop, we invited six participants who were members of the FAIRplus project [25] and from outside the pharmaceutical industry to evaluate this tool based on their experience in the FAIRification process. Their involvement had a significant influence on shaping the tool. More details of the evaluation are available on Zenodo [37].
Most of the participants appraised the tool as valuable, were satisfied with the way in which the cost and benefit factors are captured, and considered the tool to be a useful starting point for assessing the cost of applying the FAIR principles to legacy datasets. They viewed it as a promising step towards the implementation of the FAIR principles in their companies, as it enables easy adoption in real-world settings. The incorporation of handling or raising legal issues, which is one of the challenges confronting the conversion of datasets into FAIR materials, especially in pharmaceutical organisations, was particularly highlighted, as was the efficient simplicity of the tool.

FIGURE 2
Overview of the FAIR-Decide framework. The FAIR-Decide framework involves three core components: input, analysis and output. The input tool is an instrument for capturing stakeholders' assessments regarding cost and benefit factors. The input is analysed using the weighted sum method (WSM). The output provides a summary of the assessment of cost-benefit factors that informs final decision-making.

The participants praised the visualisation of decisions as one of the most important strengths of the FAIR-Decide tool. They highlighted the decision matrix, which they deemed suitable for team discussions. They likewise recognised the flexibility of the tool, in which the assessment of most cost and benefit factors is optional. Certain factors may not always be applicable to all types of datasets and assessor roles. A weakness raised by a few participants is the lack of specificity regarding the minimal knowledge required to do the assessment (prior to actual use). In addition, the participants emphasised the need for background material as a guide for users as they initiate assessments.
We received several comments on ways to enhance the FAIR-Decide tool.These suggestions centre on improvements to the layout of the user interface, adjustments to the wording of some of the questions, and supplements to the information provided in the assessment report.Participants also commented on the need for a feature to combine their reports in order to allow an automatic comparison instead of a manual one, thereby facilitating the decision-making process.
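The automatic comparison that participants asked for is not part of the current tool, but a minimal sketch shows what such a feature might compute. Everything here is hypothetical: the function name, the participant roles, the factor names, the scores and the disagreement threshold of 2 points are assumptions made for illustration only.

```python
def compare_reports(reports):
    """Summarise per-factor scores across participants.

    reports -- dict mapping participant name to a dict of factor scores (1-5).
    Returns a dict mapping each factor to (mean score, max - min spread),
    using only the participants who scored that factor.
    """
    factors = set().union(*(r.keys() for r in reports.values()))
    summary = {}
    for factor in sorted(factors):
        values = [r[factor] for r in reports.values() if factor in r]
        summary[factor] = (sum(values) / len(values), max(values) - min(values))
    return summary

# Hypothetical reports; the third participant skipped the optional legal factor.
reports = {
    "business_lead": {"legal": 2, "resources": 4, "reusability": 5},
    "data_steward": {"legal": 4, "resources": 3, "reusability": 5},
    "it_lead": {"resources": 5, "reusability": 4},
}

summary = compare_reports(reports)

# Flag factors where participants differ by 2 or more points (assumed threshold).
disputed = [f for f, (_, spread) in summary.items() if spread >= 2]
```

A summary of this kind would point a team towards the factors where views diverge most, which is where discussion is likely to be most useful.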
The framework accommodates multiple datasets and is intended to support collaborative decision making. However, the tool currently only asks questions about one dataset at a time and only supports answers to questions by one participant, although numerous participants can directly compare their evaluations for the same dataset. Future work should extend the tool to support many datasets rather than just a single dataset, and should provide features for handling collaborative decision-making by teams. The FAIR-Decide tool uses a proprietary survey system as a convenience. However, the questions and analysis methodology are open and can be implemented in any survey tool. Thus, the tool can be made available to a wide range of pharmaceutical professionals, either directly or through organisations such as the Pistoia Alliance.
It should be noted that none of the participants fell into the category of stakeholders who have a legal focus, and access to representatives of this group proved challenging.This limitation could have an impact on the development of the FAIR-Decide tool as the views of these stakeholders might differ from those of stakeholders within the other categories.

Conclusion and remarks
This paper introduced the FAIR-Decide framework and its helper tool, which is anchored in business analysis techniques, to advance FAIR implementation in pharmaceutical R&D. To the best of our knowledge, the FAIR-Decide framework is the first of its kind to use business analysis for FAIRification in the pharmaceutical industry. This framework can accelerate decision making on FAIRification by enabling the assessment of expected costs and potential benefits, which play distinct roles in balancing investments. It provides practical help for pharmaceutical stakeholders and complements their internal techniques for making decisions on FAIRification, which are mainly ad hoc in nature.
Looking ahead, the current research can be extended in three ways. First, researchers could develop a comprehensive business decision tool that not only rates benefits and costs on a scale (e.g., low benefit or high cost) but also enables the analysis of quantitative data (describing in monetary terms how much FAIRification costs or what benefits are derived from it) and of how these factors are linked to FAIRification. This may also address the unbalanced numbers of questions relating to cost and benefit factors. Second, the FAIR-Decide tool can be converted into an open access tool to reinforce its availability to a wide range of pharmaceutical professionals. The tool and methods can be made publicly available to, and collaborative for, an extended number of users in relevant communities. Third, the tool could incorporate a function for handling

TABLE 2
Summary of focus group participants.