Creating personas for exposome research: the experience from the HEAP project

The exposome, defined as the composite of every exposure to which an individual is subjected from conception to death, is a complex scientific field that has enjoyed consistent growth over the last two decades. The study of the exposome requires consideration of both the nature of those exposures and their changes over time, and as such necessitates high-quality data and software solutions. As the exposome is both a broad and a recent concept, it is challenging to define or to introduce in a structured way. Thus, an approach to assist with clear definitions and a structured framework is needed for wider scientific and public communication. Results: A set of 14 personas was developed through three focus groups and a series of 14 semi-structured interviews. The focus groups defined the broad themes specific to exposome research, while the sub-themes emerged to saturation via the interview process. Personas are imaginary individuals that represent segments/groups of real people within a population. Within the context of the HEAP project, the created personas represented both exposome data generators and users. Conclusion: Personas have been implemented successfully in computer science, improving the understanding of human-computer interaction. The creation of personas specific to exposome research adds a useful tool supporting education and outreach activities for a complex scientific field.


Introduction
The rapid advance of technology means that complex scientific issues become an inevitable part of modern research and society in general. A basic understanding of these complex issues is needed if such technological advancement is to gain wide adoption and eventual societal implementation. One of the essential assumptions pursuant to the greater understanding of scientific outcomes is that greater access to information will eventually lead to more knowledge, moving beyond the technical systems in which that knowledge originated, and becoming a wider scientific and public commodity. This is one of the main pillars supporting the 'Open Science' principles as articulated by the European Commission over the last decade [1][2][3][4]. At the same time, new concepts of the "understanding of science" have emerged, emphasising the need for "better science communication" directly relating to many fields where past technological advancement has been rapid, for example nanotechnology, molecular biology and -omics technologies, such as genomics [5][6][7]. These needs are expressed equally strongly whether they relate to narrow-focus applications of new technologies (e.g., molecular genetic tests) [8], fields of activity (e.g., infectious diseases) [9] or even entire facilities and research infrastructures (e.g., biobanks) [10][11][12].
One such complex scientific field that has enjoyed consistent growth over the last two decades is the exposome, which is defined as the composite of every exposure to which an individual is subjected from conception to death. Study of the exposome requires consideration of both the nature of those exposures and their changes over time [13]. The concept was originally developed by Dr Chris Wild as a means of drawing attention to the critical need for more complete environmental exposure assessment in epidemiological studies ('environmental' defined in this context in the broad sense of non-genetic). The exposome, therefore, complements the genomic technologies by providing a comprehensive description of lifelong exposure history, and is linked to epidemiological tools so that any outcomes can be utilized for better delineating the causes and prevention of human disease [14]. However, the exposome is a very broad as well as a recent concept, which makes it challenging to define or to introduce in a structured way. Therefore, an approach to assist with both clear definitions and a structured framework is needed for wider scientific and public communication, driven through specific implementation cases [15,16]. A similar experience was acquired through the B3 Africa project, where the engagement of clinicians and policy makers with biobanking was achieved through well-delineated definitions and communication activities [17].
To facilitate engagement with the scientific community, to gain a deeper insight into the exposome-derived set of tools, and to establish a clearer context and structure for what the "exposome" means in practice, a series of personas was created within the Human Exposome Assessment Platform (HEAP) project [18]. Personas are imaginary individuals of any gender that represent segments/groups of real people within a population [19]. The population represented by such a persona can be specific groups of users of content, a tool or a wider system. Personas have been implemented successfully in computer science, improving the understanding of human-computer interaction often through use cases [20], as well as in direct-to-consumer marketing case studies [21]. Hence, the utilization of personas in the exposome field was a logical extension of such previous communication activities.
The HEAP project is a five-year project funded by the European Union (EU) Horizon 2020 Research and Innovation programme. It aims to provide an informatics platform, populated with research data from cohort studies, national registries, wearable sensors and consumer receipts. The ultimate goal is to make pseudonymized data from large-scale population cohorts (including data on biological samples, from health registries, and from research) safely interoperable and reusable.
A longer-term aim is to create a legacy for HEAP as an exposome research resource by providing training for the wider scientific community, and to clearly communicate the benefits of using HEAP to the target audiences. To achieve the latter effectively and lay the grounds for future adoption, a training and communication strategy must be developed to provide enhanced insights into the process for current and future end-users. This manuscript describes the creation and implementation of end-user personas, specific to the context of the current project, but with the intention that this communication approach can be adapted for wider use in the exposome research field in the future.

Methods
The two main data sources for the current work were a series of virtual focus group meetings and a number of virtual, individual, semi-structured interviews. More specifically, the personas were generated from collected qualitative data as follows: i) Information was gathered from three meetings of a small focus group with five participants, during which the stakeholder groups and their characteristics were initially identified and defined. These persona classifications enabled the selection of the common questions that formed the basis of the subsequent interviews. The virtual focus group participants were professionals with more than five years of experience.
ii) Semi-structured interviews then took place with individuals working on the project during three days of a HEAP workshop. Focus group participants were not involved in the face-to-face interviews, except for one individual due to the specificity of their scientific expertise. The sample size (n=14) of the individuals interviewed in this step is similar to other qualitative studies utilizing virtual focus groups for data collection [22].
The participants were aged from 20 to 60 years old and a slight majority identified as male (eight out of 14).When asked about their professional designation there were several titles provided; the final professional titles used for the creation of the personas were mutually agreed during the interview.All participants had some prior knowledge of the exposome concept and of the HEAP proposal, though not necessarily exhaustive.
The following questions were asked of all interview participants: 40-50) The rationale behind asking about age was to reflect the duration of professional experience and level of seniority in the participants' professional field.

"Tell us a bit more about yourself and your professional background." This information was listed under the heading "Personal information" and was intended to provide further context and depth to the persona. For example, it allowed us to record that, prior to becoming an epidemiologist, one of our interviewees completed medical school and qualified as a doctor. Another interviewee, participating as a persona in the "Public" stakeholder category, shared details about children and family life as well as educational background.
6. Interests: "What are your motivations for using the HEAP informatics platform or the scientific insights emerging from the HEAP informatics platform?"
The interview data were analysed using an inductive, bottom-up approach, which meant making use of the meanings and themes formed and verbalized by the participants [23]. This method was considered more appropriate for the creation of believable and empathic personas. The data were grouped under appropriately relevant themes. These were then presented to the second author, a leading expert at the same faculty, who audited the categories and titles of the themes to ensure their accuracy and depth of specificity. A further member check with the focus group participants also contributed to the trustworthiness of the findings. This research was conducted under IARC ethical approval No. 22-37; anonymous participation in the interview was considered an indication of consent. The raw data from the persona interviews are available at the Open Science Framework, https://doi.org/10.17605/OSF.IO/VXM3Z.

Results
The stakeholder groups identified through the initial focus group sessions are listed in Table 1. These were based on the current broad end-user groups of the scientific information of exposome research, and aligned with the categories outlined in the HEAP proposal.
Four themes were expected to emerge from the experiential accounts of the virtual interviewees. Each of these will be described along with the subthemes that were identified through the analysis process. The four themes were: (a) the scientific interests in general and of exposome research, (b) the scientific expertise deployed in the use of the exposome platform, (c) the ethical considerations in conducting such research activities and (d) the needs/wants from both the HEAP project and exposome research in general. Each of these included subthemes, which are shown in Table 2.
Two such personas are shown in Figure 1 below.

Discussion
The themes identified represent broad categories reflective of scientific endeavour in general. As such, the underlying commonality of the personas makes them potentially transferable to other scientific fields and/or technological implementations. While there are general sub-themes on the impact of research on health and well-being, the creation, use and integration of data tools feature strongly throughout all sub-themes. This is expected due to the background of many of the HEAP partners, as well as the emergence of this scientific interest in general. The innovations in healthcare, diagnostics, sensors, and data analysis with advanced methods offer opportunities for improved personalized healthcare, lower costs and benefits to the medical industry [24]. However, due to the rapid development of technology, the implementation and integration of such tools and platforms is heterogeneous, leading to calls for their greater understanding, and the creation of a better-defined process for such utilizations [25][26][27]. This need is more pronounced in the case of a relatively recent concept such as the exposome [28][29][30][31]. Importantly, the sub-theme of understanding the limitations of exposome research has also emerged, in line with previously published work [32,33].
As the goal of exposome research is to better study the complexity of healthcare realities, it inevitably requires a close connection with many disciplines, such as epidemiology, data analysis and bioinformatics, as is the case in the personas.
Having said that, the handling of large and diverse datasets, generated by these different disciplines, raises an entire group of questions relating to the ethical and legal aspects of their use [34,35]. While there does not seem to be a consensus approach in responding to those challenges, there are several high-profile projects that have addressed them within the context of their specific research aims [36,37]. The last set of sub-themes relates to the forward-looking application of the exposome: understanding its ability to provide further insights and to apply advanced analytical tools (such as AI and ML), while doing so in a manner that will be 'simplified', so that wider adoption can be enhanced.
Moreover, two separate sub-themes emerged: educational support towards understanding the exposome and the tools applied therein; and the role of the citizen scientist. The educational need is self-evident, as for any scientific or technological innovation, and can range from the educational needs of students [38] to those of professionals [39,40]. On a separate note, the role of the citizen scientist has been strongly encouraged by the European Commission, both in strategic statements [41] and in practice by the creation of the open science environments on the cloud [42,43]. As such, it is not a surprise that this parameter would emerge for an EU-based project; however, it also highlights the underlying educational need for the wider public.
The personas created as part of HEAP are designed as a gender-independent communications and training tool to enable the consortium participants to understand experiences and backgrounds that differ from their own. This is particularly important for cross-disciplinary teams (as there are on most exposome studies) that may benefit in the long term, as personas may represent a unifying compass for goal alignment [44].
In the context of the HEAP project, the information gathered in the form of personas was used by the topic experts (in ethical/legal, technical, data management, and bioinformatics) to develop Learning Needs Assessments for future users of the HEAP platform. The personas enabled the topic experts to identify the strengths, knowledge/skills gaps and motivations of the HEAP target audiences through an engaging presentation of their "Needs", "Interests", "Ethical Considerations" and "Powers/expertise".
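For readers who wish to handle such personas programmatically, the four headings above, together with the personal information collected in the interviews, could be held in a simple record. The following is only a minimal sketch under stated assumptions: the class name `Persona`, its field names and the helper `missing_sections` are hypothetical illustrations, not part of the HEAP platform or of the published persona data.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """Hypothetical record for one persona; all names here are illustrative."""

    stakeholder_group: str       # e.g. the "Public" stakeholder category
    professional_title: str      # title agreed with the interviewee
    personal_information: str = ""
    interests: list[str] = field(default_factory=list)
    powers_expertise: list[str] = field(default_factory=list)
    ethical_considerations: list[str] = field(default_factory=list)
    needs: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the headings of the four themed sections that are still empty."""
        sections = {
            "Interests": self.interests,
            "Powers/expertise": self.powers_expertise,
            "Ethical Considerations": self.ethical_considerations,
            "Needs": self.needs,
        }
        return [name for name, entries in sections.items() if not entries]


# Placeholder content only; this is not data from the HEAP interviews.
draft = Persona(
    stakeholder_group="Public",
    professional_title="Citizen",
    interests=["placeholder interest"],
)
print(draft.missing_sections())
```

A check of this kind could help a topic expert see which headings of a draft persona still lack content before it is used in a Learning Needs Assessment.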
However, there are certain limitations to the current work. Firstly, the data were manually gathered and then manually analysed to develop the personas, resulting in a relatively low number produced, and a low rate for potential further customization. Furthermore, the personas are not necessarily representative of all the types of stakeholders originally identified. Having said that, this number is sufficient for the needs of the HEAP project, and the methodology can be replicated so that further personas are produced from a larger data set in the future. A further limitation is the impact of this work through established communication channels. While the HEAP project is active on several social platforms, these are not exhaustive, and as such the use of the personas at present would remain targeted.
Having reflected on the questions asked during the persona interviews, the following lessons learnt were identified and will be applied to future iterations of the questionnaire:
• Age range: This question will be adapted to focus not on age, but on years of professional experience and seniority. This is because age is not always an accurate reflection of years of experience in a professional field, due to the increasing tendency towards mid-life career changes, such as starting a research degree in later life.
• Gender: To ensure that the HEAP platform is designed to meet the needs and expectations of all genders, a question about gender identity will be included in the future. This might help users to identify with the developed persona.
As the HEAP informatics platform nears completion, the focus will move beyond the academic research community to the "Personas" who will be concerned with the research insights and outputs of HEAP, such as Policy Makers and the Public. Future persona interviews will be conducted with these stakeholder groups to tailor dissemination materials and the project communication strategy to maximise the impact of the project. The end result will provide a comprehensive overview, using Personas, of all the main stakeholder groups of the project.

Conclusion
Rapid technological advancement has been able to support new scientific concepts, such as the exposome. However, in doing so, the scale and complexity of the data required have also increased, often exponentially. It is anticipated that a greater understanding of complex issues will in turn lead to enhanced ability on the part of individuals and communities to deal with these issues when they encounter them. To achieve the latter, appropriate tools are required, and the creation of personas is one such tool. Here we describe the creation of personas for the EU-funded exposome project HEAP. The conceptual capabilities of the personas as an interface to analytical systems present considerable promise, both for enhancing the understanding of the methods themselves and for data-driven interpretations within healthcare. Indeed, they can potentially offer considerable impact for researchers and organizations that desire to understand their end-user needs and vice versa.
The article highlights the importance of clear definitions and structured frameworks in scientific and public discussions related to exposome research. In addition, the personas developed in this study can be applied to other scientific fields and technological implementations, promoting better collaboration and understanding.
The current manuscript is well written, but there are some flaws in statistical analysis and data interpretation. Some additional rationale of the approach taken, and potentially additional analyses, may be warranted.

Eleni Fthenou
Qatar Biobank for Medical Research, Qatar Foundation for Education, Science, and Community, Doha, Qatar
The article describes the creation of 14 end-user personas: imaginary, gender-independent individuals representing groups of real people within a population. The personas represent both exposome data generators and users in the context of the HEAP project, and will be used in communication and training tools to engage the scientific community, more particularly the HEAP consortium.
The personas were generated from qualitative data collected from virtual focus groups and semi-structured interviews. Four themes reflecting general scientific endeavor in exposome research, the ethical aspect of such research and the HEAP needs and requirements were identified. The conceptual capabilities of the end-user personas identified in this study are expected to help individuals and communities understand and deal with complex issues when they encounter them, enhancing both the procedures themselves and data-driven interpretations.
The current manuscript presents the work done in a clear and accurate way by providing adequate access to supplementary data and citing current literature. The study design is clearly presented along with its strengths and weaknesses. The scientific merit is clearly stated, as the outcomes identified represent broad categories of scientific endeavor, and the commonalities of the personas make them potentially transferable to other scientific fields.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and does the work have academic merit? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Two of the 14 personas created regarding exposome research: a) the Epidemiologist/Medical Geneticist, and b) the Public/Citizen.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and does the work have academic merit? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Reviewer Report 22 February 2023
https://doi.org/10.21956/openreseurope.16730.r30826
© 2023 Fthenou E. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.