Data Curator in the Middle: Curating Data for a Diverse Community of Stakeholders

The Prevention and Early Intervention Research Initiative is an archiving project to preserve the data and reports that were generated by twelve years of philanthropic and state investment into prevention and early intervention approaches in the children and youth sector in Ireland and Northern Ireland. The investment resulted in an extensive collection of evaluation data and reports, which collectively provide an evidence base for continued investment into PEI programmes that are shown to be effective. In 2016, the Prevention and Early Intervention Research Initiative (PEI-RI) was established to preserve the outputs from these evaluations in the national data archives, as a publicly available evidence base. The political and social significance of this collection is manifest in the range of stakeholder groups that the project is engaging with, including the community and not-for-profit organisations that operated the PEI programmes, the research teams from academic institutions that evaluated these programmes, and representatives from government departments that co-funded many of these programmes with Atlantic. This paper tells the story of the PEI-RI archiving project, describing the steps we have taken since 2016 to preserve and promote the PEI data. During the course of the project we realised that it would not be enough to provide access to the data alone, as "[g]enerating and collating the evidence is of no use if it never reaches the commissioners and professionals who need it" (What Works Network, 2014, pp. 6). In the second phase of our project we are creating a range of resources for practitioner and decision maker audiences which provide a pathway to the data using the archival infrastructure. The project provides a case study of curating a digital collection that is intended for multiple stakeholders with different expectations of the archived material.
The PEI-RI data curator is located in the middle of a triad of data creators, data consumers and data archives, and is tasked with balancing the interests, expectations and limitations of each.

Submitted 15 December 2019 ~ Accepted 19 February 2020
Correspondence should be addressed to Ruth Geraghty, CES Dublin Office, 9 Harcourt Street, Dublin 2. Email: rgeraghty@effectiveservices.org
This paper was presented at International Digital Curation Conference IDCC20, Dublin, 17-19 February 2020.
The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. The IJDC is published by the University of Edinburgh on behalf of the Digital Curation Centre. ISSN: 1746-8256. URL: http://www.ijdc.net/
Copyright rests with the authors. This work is released under a Creative Commons Attribution Licence, version 4.0. For details please see https://creativecommons.org/licenses/by/4.0/
International Journal of Digital Curation 2020, Vol. 15, Iss. 1, 12 pp. http://dx.doi.org/10.2218/ijdc.v15i1.706 DOI: 10.2218/ijdc.v15i1.706


Introduction
There is a long tradition of data reuse in quantitative social science, and while reuse of qualitative data has been slower to catch on, in the past decade there has been a growing acceptance of qualitative secondary analysis as an established method for research (Bishop and Kuula-Luumi, 2017, pp. 2). While mandates for openness in the social sciences have done much to encourage the archiving of research material, it is also worth considering the factors that are helping to foster a culture of data reuse. There is a nascent literature examining the enablers and barriers to the reuse of social science data, usually involving surveys and interviews with researchers on their experiences of reusing archived data. Gonçalves Curty (2016) found that the degree of effort required to locate the appropriate data (data discovery) and to fully make sense of its origins could influence whether a researcher proceeded with re-using it. Yoon (2016) interviewed researchers about their failed attempts to reuse archival data and found that incorrect or incomplete data documentation was a major obstacle for reuse. Consequently, researchers resorted to contacting the creator of the data (the principal investigator of the study) or the third-party provider of the data to check whether the data files contained the variables or measures of interest, before formally requesting a copy. Faniel, Kriesberg and Yakel (2015) surveyed social scientists who had successfully reused data deposited in the Inter-university Consortium for Political and Social Research, and found that the quality of the data documentation was significantly related to data re-users' level of satisfaction with the reuse experience, as it "facilitated an in-depth understanding of the data collection procedures and, subsequently, increased trust in the data" (pp. 1412).
The experience of qualitative data re-use is dependent on the availability of rich contextual documentation, such as information about the fieldwork and data collection methods, and background or demographic information about the research participants, but less so information about "the primary project itself, why it was done, and so on" (Bishop and Kuula-Luumi, 2017, pp. 9). This suggests that social science data is more commonly used for answering new research questions, for providing a comparative sample for new research, or for methodological purposes such as teaching research methods, rather than for replication of the original research that produced the data.
Given the amount of effort that goes into preparing data for sharing, what can a creator or depositor of social science data do to ensure their collection is a good candidate for reuse? Useful lessons can be gleaned from exploring the archived collections that are requested most often. Data that has been produced with reuse in mind is more likely to include good quality 'provenance information' (Goodman et al., 2014) and, following on from the points made above, is likely to be associated with success stories of reuse. Reuse also appears to increase where a data collection is actively promoted, in different ways to different user audiences. For example, in her exploration of the most frequently requested collections in the UK Data Archive, Bishop concluded that the "active promotion" of a study dramatically increased the use of its associated data in the archive (2014, pp. 168). This is certainly true for the data collections in the Irish Social Science Data Archive, where the most requested data are from the Growing Up in Ireland study and The Irish Longitudinal Study on Ageing (TILDA), which are both nationally representative, longitudinal studies that receive significant media coverage. The data from both are reused in a wide array of new research, primarily in the fields of social science and health science.
This paper tells the story of the Prevention and Early Intervention Research Initiative (PEI-RI), which is a significant archiving project to preserve a series of evaluation datasets that were generated by twelve years of funding from The Atlantic Philanthropies (hereafter, Atlantic) in the children and youth sector across the island of Ireland. The project provides a case study of curating a digital collection that is intended for multiple stakeholders with different expectations of the archived material. The PEI-RI data curator is located in the middle of a triad of data creators, data consumers and data archives, and is tasked with balancing the interests, expectations and limitations of each. Now that the project is in its final phase, effort has concentrated on building tools to create pathways to the data that was archived between 2016 and 2018. This signposting work has been undertaken by the data curator, who has an overview of both the contents of the PEI-RI archived data, and the needs and interests of the various stakeholder groups, based on consultations with various user audiences.

Origins of the Data
In 2004 Atlantic launched its Prevention and Early Intervention Initiative in Ireland and Northern Ireland, which was a significant funding commitment to transform the way that children and young people receive services on the island (The Atlantic Philanthropies, 2015). 'Prevention' is defined as "providing a protective layer of support to stop problems from arising in the first place or from getting worse"; and 'early intervention' is defined as "providing support at the earliest possible stages when problems occur" (Prevention & Early Intervention Network, 2018). A prevention and early intervention (PEI) approach to service delivery can be less expensive, less punitive and have a greater chance of success than intervening at a later point in a problem cycle. Under this initiative, Atlantic funded a series of interventions and evidence-based services across the island of Ireland, sometimes in conjunction with government departments and other organisations. This extensive investment into PEI in Ireland and Northern Ireland was consistent with international trends towards prevention strategies in health, education and social care, and is estimated to have reached 90,000 children and young people, 23,000 parents and caregivers, and 4,000 professionals (Rochford, Doherty and Owens, 2014).
In addition to promoting PEI as a methodology, Atlantic were instrumental in embedding evidence-based practice in the children and youth (C&Y) sector in Ireland and Northern Ireland. Evidence-based practice involves the implementation of programmes and interventions "that have been consistently shown to produce positive results by quality, independent scientific research" (Hickey et al., 2018, pp. vi). Up to that point Ireland "did not have a strong tradition of using research evidence to make policy decisions or conducting rigorous evaluations of its programs" (Paulsell, Del Grosso and Dynarski, 2009, pp. 6). This was a major shift for community and not-for-profit organisations, whereby those in receipt of an Atlantic grant began gathering and using data to inform how their programmes were designed and delivered. Atlantic actively promoted the use of data as 'evidence' for social change through the following stages: 1. a community organisation commissioned a 'baseline' research study to gather data about the specific needs in their local community; and following this, 2. the community organisation chose an appropriate PEI programme with a strong scientific evidence base for its effectiveness, sometimes in consultation with the academic who created the programme; and once the programme was implemented, 3. the community organisation commissioned a rigorous evaluation of the effectiveness of the programme in the community they serve. Effectiveness was usually measured in terms of improvements in the outcomes of the target population.
The resulting Atlantic-funded programmes included home visiting interventions, parenting programmes, high-quality early childhood education, and youth mentoring strategies. Programmes that addressed the legacy of conflict and sectarian division for children and young people in Northern Ireland were also funded.
Community organisations in receipt of Atlantic funding were required to commission an independent evaluator to conduct the evaluation of their programme (stage 3 above) using a scientifically robust methodology, most often an experimental design. Evaluation data were collected through a variety of means, depending on the objectives of the research, and a single evaluation might generate data using a combination of methods such as direct assessments, self-completed questionnaires, face-to-face interviews, analysis of administrative records, focus groups and field observations. Some of the data files were born digital, such as word-processed interview transcripts or data collected using computer-assisted personal interviewing; while others were manually transcribed from paper surveys into statistical programmes such as SPSS. The evaluations were conducted by social science and health science academics from third-level institutions in Ireland and the UK. In general, an evaluation was commissioned through open competition, and the contract awardees usually had some disciplinary expertise in the intended outcomes of the intervention or programme, for example a programme to establish high-quality early childhood education was evaluated by academics with expertise in early childhood development and education theory. Given the range of social problems targeted by different interventions, plus the range of disciplinary backgrounds for each research team, there is much variation in the type of evaluation data collected (see Geraghty, 2017 for a more detailed description of this variation). What bound all of these studies together was the development of a much-needed, locally grown evidence base to leverage future support for PEI.

The Data Preservation Project
In 2016, Atlantic completed its grant giving in Ireland, and in the same year established the Prevention and Early Intervention Research Initiative (PEI-RI) to preserve the outputs from the PEI programme evaluations in the national data archives, as a publicly available evidence base. The PEI-RI was an ambitious project given the scale of Atlantic's investment across a range of geographic sites, and across a range of interventions that targeted different communities in different ways. At the outset of this project, more than fifty evaluations were identified as a relevant source of data for this evidence base. However, there were a number of bumps along the road to archiving the PEI data (see Geraghty, 2017 for a detailed discussion on the challenges of archiving this legacy data). The most significant bump was the omission of a 'permission to archive' clause in the consent process with research participants, and this omission excluded almost three quarters of the evaluations from being archived. In most cases this clause was omitted because, during the commissioning and design phase of these evaluations, neither the commissioners nor the researchers considered the future potential for sharing their data with others and did not view the 'raw data' as the evidence base for PEI, but rather the reports from their analysis of this data. During the commissioning phase of the research, discussions around permission and copyright usually focused on ownership of published reports, and rarely on what would become of the data after the evaluation. Consequently, fourteen PEI data collections were deemed suitable for sharing in the public data archives. The list of archived collections is available in Table 1 (see Appendix).
The political and social significance of the data is manifest in the range of stakeholder groups that the curator has engaged with in the course of the PEI-RI project, including the community and not-for-profit organisations that operated the PEI programmes, the research teams from academic institutions that evaluated these programmes, and representatives from government departments that co-funded many of these programmes with Atlantic. It was Atlantic's intention that the archived data would be of value for further exploration and development of the PEI method, but in fact the data is of value far beyond PEI. The extended period of investment into social science research was unprecedented, allowing for population-level data to be gathered about communities that were experiencing deprivation during a period of rapid economic and political change. For example, population baseline studies involved the collection of large amounts of demographic and descriptive information about these communities. In its totality, the research spans fourteen years, beginning in the Celtic Tiger and post-Good Friday Agreement era of the early 2000s, through the global Great Recession era of the late 2000s, and concluding during the post-recession period up to 2015. It is therefore likely the collection will "prove to be a significant part of our cultural heritage and become resources for historical as well as contemporary research" (Corti, 2007, pp. 37). The following section describes the steps that were taken by the Data Curator to access, prepare and publish the fourteen collections of PEI data in the Irish public data archives.

Steps to Preserve the PEI Data (2016-2018)
Step 1: Negotiation with the Data Owner
During 2016 the PEI-RI data curator approached the copyright owner of each collection of evaluation data, to obtain their permission to review its suitability for inclusion in the archives. In general, copyright belongs to each community and not-for-profit organisation that operated each PEI programme. These organisations were mostly enthusiastic about the potential to share their evaluation data in the archives, as it is in keeping with their orientation towards publicising and sharing their knowledge on PEI within their peer networks, particularly where a programme has been found to be effective. In the majority of cases, the commissioning organisation did not hold any copies of the evaluation data, sometimes as a measure to preserve respondent confidentiality and sometimes because they did not have an in-house researcher or data expert to manage it. We were mostly referred to the evaluator for access to the data.
Step 2: Collaborative Work with the Data Creator
During the data processing phase in 2016-17, the curator supported eight research teams across four universities in Ireland and the UK to locate the evaluation data and prepare it for the archive. In nearly all cases the main evaluation was conducted by researchers based in third-level institutions, although in a small number of cases, an independent social research agency was commissioned to conduct the evaluation, or part of it. Many of the principal investigators had evaluated several different PEI programmes, but only a handful had previous experience of archiving research data. The curator trained post-graduate staff at three universities in methods for appraisal, data cleaning and quality control, and data de-identification, which were carried out on-site at the university before data were passed over to the curator for the curation activities. The PEI-RI project provided grants to support internal staff to work on the data for approximately four to six months, depending on the size of the collection. Two guiding documents (the CRN-PEI Protocols for preparing and archiving evaluation data, and the CRN-PEI Guiding Principles) were created by the curator to ensure data were prepared in a consistent manner across these sites. Both documents are based upon best practice guidance in social science data archiving from the UK Data Archive (Van den Eynden et al., 2011), Inter-university Consortium for Political and Social Research (2012), Irish Qualitative Data Archive (2010) and the Digital Curation Centre (Whyte and Wilson, 2010). The PEI-RI guidance documents set out disclosure limitation protocols, including response category aggregation and top and bottom coding to remove extreme values, and in a small number of cases, the removal of variables with a high potential for disclosure harm. For example, one evaluation collected extensive data on family characteristics including a question about whether a social worker was involved with the family (PFL evaluation team, 2013). This variable had the potential to identify a family as experiencing social and emotional problems, but also more serious issues such as domestic violence and child abuse. In this instance, the variable about social worker involvement was removed from the dataset, to allow us to retain most of the descriptive data about each family, which would be of value to a wide audience of new users. The Irish Qualitative Data Archive's guidance on assessing the sensitivity of social science data (2010, pp. 6-7) and the Anonymisation Decision-making Framework by Elliot et al. (2016) were pivotal documents for this work.

Step 3: Confirmation of Copyright Permission for Measures
A key step in data processing was to confirm whether the results generated by standardised measures could be included in a publicly available data collection. Some of the standardised measures that were used by these evaluations have very specific copyright conditions which prohibit the replication of their materials, including the content of their questionnaires and scoring instructions. The curator contacted the copyright owner of each standardised measure to confirm what material could be reproduced in the public data archives. In most instances we were permitted to include individual items (variables) provided that the item label did not contain any copyrighted information. For example, the variable that was generated by question 1 on the communication subscale of the Ages & Stages Questionnaire (ASQ) was re-labelled 'ASQ communication Item 1'. This gives the new user enough information to re-use the data, even at the subscale level. One risk with preparing the data in this way is that a new user cannot confirm how well the items in the archived file match the questions in the survey and must trust that the data were prepared accurately. Also, a new user must ensure they are using the same version of the standardised measure that was used in the original study. The publication information for each scale, such as version number and year of publication, is captured in the data documentation.
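The data preparation described in Steps 2 and 3 (top and bottom coding, response category aggregation, removal of a high-risk variable, and neutral re-labelling of items from copyrighted measures) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names, income bounds and age bands are hypothetical and are not taken from the PEI-RI protocols, and the original work was carried out in statistical packages such as SPSS rather than in Python.

```python
# Illustrative sketch only: variable names, bounds and bands are hypothetical
# and do not come from the PEI-RI guidance documents.

def top_bottom_code(value, lower, upper):
    """Clamp extreme values to the chosen lower and upper bounds."""
    return max(lower, min(upper, value))

def aggregate_category(value, bands):
    """Replace an exact value with the broader response category it falls in."""
    for label, (low, high) in bands.items():
        if low <= value <= high:
            return label
    return "other"

def neutral_label(measure, subscale, item_number):
    """Build an item label that carries no copyrighted question wording."""
    return f"{measure} {subscale} Item {item_number}"

def prepare_record(record, drop_variables, income_bounds, age_bands):
    """Apply the three disclosure limitation steps to one respondent record."""
    # 1. Remove variables judged to carry a high potential for disclosure harm.
    prepared = {k: v for k, v in record.items() if k not in drop_variables}
    # 2. Top and bottom code a continuous variable to remove extreme values.
    prepared["household_income"] = top_bottom_code(
        prepared["household_income"], *income_bounds
    )
    # 3. Aggregate an exact age into a broader response category.
    prepared["parent_age_band"] = aggregate_category(
        prepared.pop("parent_age"), age_bands
    )
    return prepared

AGE_BANDS = {"under 25": (0, 24), "25-34": (25, 34), "35 and over": (35, 120)}

raw = {
    "family_id": 17,
    "household_income": 250000,
    "parent_age": 29,
    "social_worker_involved": 1,
}
clean = prepare_record(raw, {"social_worker_involved"}, (5000, 100000), AGE_BANDS)
print(clean)
print(neutral_label("ASQ", "communication", 1))
```

The design choice mirrored here is the one reported in the text: dropping a single sensitive variable and coarsening a small number of others allows most of the descriptive data about each family to be retained.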
Step 4: Creation of Contextual Documents
Curation activities involved the creation of documentation for each archived collection, including a detailed codebook for the quantitative data. A codebook can be automatically generated by a statistical programme, such as SPSS, and this file lists all the variables within a data file along with the coding responses per variable. The curator included information on where and how variables had been anonymised and provided a citation for each standardised measure. Each codebook concludes with a list of variables per data file, and the thematic domain to which they belong. Although it is not standard practice to include this, the variable list was added to the codebook because evaluation data tends to have a large quantity of variables and can be unwieldy for a new user to navigate. The list provides a snapshot of the data and allows new users to quickly identify data generated by standardised measures across different evaluations, and therefore supports cross-dataset analysis. The curator also created a 'user guide' for each evaluation which provides technical information in a standardised and easy to navigate format. The guide also describes how the data were prepared for the archive, such as the method for de-identifying participants and managing missing data. The user guide contains templates of the information and consent material that were given to participants during the research. All the contextual documents are openly available for download from the archives and can be reviewed before proceeding with a request to access a restricted data collection.
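To illustrate how a variable list organised by thematic domain can support cross-dataset analysis, the following Python sketch groups the variables of each evaluation by domain and then reports which standardised measures appear in more than one evaluation. The evaluation names, variable names and domains are invented for the example, and the structure is an assumption about how such a list might be represented, not a description of the PEI-RI codebooks themselves.

```python
from collections import defaultdict

# Hypothetical variable lists, one per evaluation. Each entry is
# (variable name, thematic domain, standardised measure or None).
VARIABLE_LISTS = {
    "evaluation_a": [
        ("asq_comm_1", "child development", "ASQ"),
        ("parent_age_band", "demographics", None),
    ],
    "evaluation_b": [
        ("asq_comm_1", "child development", "ASQ"),
        ("asbi_total", "social development", "ASBI"),
    ],
}

def variables_by_domain(variable_list):
    """Group the variables of one data file under their thematic domain."""
    grouped = defaultdict(list)
    for name, domain, _measure in variable_list:
        grouped[domain].append(name)
    return dict(grouped)

def shared_measures(variable_lists):
    """Return the measures used in more than one evaluation, with their users."""
    usage = defaultdict(set)
    for evaluation, variables in variable_lists.items():
        for _name, _domain, measure in variables:
            if measure is not None:
                usage[measure].add(evaluation)
    return {m: sorted(evals) for m, evals in usage.items() if len(evals) > 1}

for evaluation, variables in VARIABLE_LISTS.items():
    print(evaluation, variables_by_domain(variables))
print(shared_measures(VARIABLE_LISTS))
```

A new user scanning the output of `shared_measures` can see at a glance where comparable samples exist, which is the purpose the variable list serves in the codebook.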
Step 5: Promoting Re-use of the Data
In total, thirteen quantitative data collections were deposited with the Irish Social Science Data Archive (ISSDA), and one qualitative collection with the Irish Qualitative Data Archive (IQDA). In 2017-18, the PEI-RI awarded a series of research grants to support secondary analysis of the PEI data, the results of which are reported in a special edition of the open access Children's Research Digest (Guerin and Geraghty, 2018) to promote the use of the PEI data amongst specialists in the C&Y sector. However, the PEI-RI project is not only about preserving the data for further scientific research, but also about supporting the ongoing work to embed PEI knowledge into regular service provision. We realised that an alternative approach was needed to promote the data amongst user groups who typically do not work with variable-level data, so that they could get value from the archived materials. In 2019 we consulted with a range of stakeholders in the C&Y sector in Ireland and Northern Ireland, including service providers, policy makers and researchers working with and for children and young people. The key finding from this consultation phase was the value in building resources based on the archived material that would support the work of these stakeholders in commissioning and delivering services for children, young people and families. The following section describes the two resources that these stakeholder groups were most enthusiastic about, both of which are in production in 2020.

Creating Pathways to the Data
The first resource is a searchable library of over 100 PEI evaluation reports from Atlantic's PEI investment in Ireland and Northern Ireland. These are public-facing evaluation reports and summaries that will be openly accessible via the Digital Repository of Ireland (DRI), which is Ireland's national digital repository for humanities, social sciences, and cultural heritage data. The DRI provides access to related qualitative material from the Irish Qualitative Data Archive, including data from Preparing for Life, Growing Up in Ireland, and other research on the Irish family. Also, the DRI hosts a curated collection of business records and ephemera from Atlantic's grant giving activities across the island of Ireland, which links to a larger legacy project at Cornell University. Using the archival metadata, the PEI evaluation reports will be linked to these international curation projects and will be exposed to new audiences. At present, many of the reports from the PEI investment are openly available through the websites of copyright holders, but in a temporary and disjointed manner across multiple locations. Once ingested into the DRI, each report will be richly described using Dublin Core metadata and will be minted as a digital object with a persistent identifier. Where the report has associated data in the data archives, a data citation will be provided, including a persistent identifier for the data collection. The library of reports is one significant part of our work to preserve the Irish evidence base for PEI, and it provides a context to the genesis of the archived data.
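As a rough illustration of what describing a report with Dublin Core metadata and linking it to its archived data might look like, the sketch below builds a simple record using elements from the Dublin Core element set (title, creator, date, type, identifier, relation). The wrapper element, the report details, the identifiers and the citation string are all placeholders invented for the example, and the choice of `dc:relation` to carry the data citation is an assumption; the actual DRI metadata profile may differ.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def build_report_record(title, creator, date, identifier, data_citation=None):
    """Describe one evaluation report with a handful of Dublin Core elements."""
    record = ET.Element("record")  # hypothetical wrapper element
    fields = [
        ("title", title),
        ("creator", creator),
        ("date", date),
        ("type", "Text"),
        ("identifier", identifier),  # persistent identifier for the report
    ]
    if data_citation:
        # Assumption: the citation for the archived data is held in dc:relation.
        fields.append(("relation", data_citation))
    for name, value in fields:
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record

# All values below are placeholders, not real reports or identifiers.
record = build_report_record(
    title="Example Programme: Final Evaluation Report",
    creator="Example Evaluation Team",
    date="2013",
    identifier="https://example.org/id/report-001",
    data_citation="Example Evaluation Team (2013). Example Programme "
                  "evaluation data [dataset]. https://example.org/id/dataset-001",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the data citation inside the report's own record means that anyone who discovers the report also discovers the route to the underlying data collection.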
The second resource is the Index of Standardised Measures, which is a database of over 200 measures that were used across the PEI evaluations, and also the Area Based Childhood Programme evaluation and the Growing Up in Ireland study. All of these studies share a central theme of measuring Irish children's outcomes and have used many of the same standardised measures. A standardised measure is a research instrument (usually in the form of a questionnaire) which is used to assess the characteristics of an individual or group, for example the Adaptive Social Behaviour Inventory is used to assess the social development of a preschool child. Measures can also be used to assess the quality of a provision or setting, for example the Environmental Ratings Scales (ERS) are used to assess process quality in early childhood group care. These measures are standardised, meaning the results are scored in a "standard" or consistent manner, which makes it possible to compare the relative performance of individuals or groups. They are generally considered to have good validity and reliability, which is an indication of the degree to which the scale can measure what it claims to measure.
The Index of Standardised Measures provides a detailed description of each measure along with links to where the measure can be downloaded or purchased. Information per measure is provided using the Dublin Core Metadata Initiative element set plus additional fields. The database will enable the user to search for a standardised measure using a range of search criteria, and to assess its suitability for their research needs. They can also compare measures that are used to assess similar characteristics or outcomes. The primary audience is the practitioner group, for example, a teacher who wants to measure student outcomes, or a service organisation conducting in-house evaluation of their programme. The Index will also be of use to the research community for finding and comparing measures, and to commissioners of research who are assessing the suitability of a proposed methodology. Similar measure databases already exist; however, a key strength of the Index of Measures is the localised context it provides. Standardised measures are typically created and tested with populations outside of Ireland. The Index will provide a long-lasting link to the associated PEI evaluation report in the DRI (as described above), allowing the user to assess how the measure performed with an Irish population. Where there are associated data from a measure in the public data archives, the Index will provide a hyperlink to their location, and will therefore drive traffic towards the archived datasets and enhance discoverability.
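The kind of search that the Index is intended to support can be sketched as a filter over measure records. The example below is a minimal illustration only: the record fields, the measure names, the age ranges and the link identifiers are all hypothetical, and the real Index may organise and expose its search criteria quite differently.

```python
from dataclasses import dataclass, field

@dataclass
class Measure:
    """One Index entry: Dublin Core style fields plus illustrative extras."""
    title: str
    subject: str          # characteristic or outcome assessed
    age_range: tuple      # hypothetical additional field: (min, max) in years
    report_links: list = field(default_factory=list)  # evaluation reports
    data_links: list = field(default_factory=list)    # archived datasets

def search(index, subject=None, age=None, has_archived_data=None):
    """Return the measures that satisfy every criterion that was supplied."""
    results = []
    for measure in index:
        if subject is not None and subject.lower() not in measure.subject.lower():
            continue
        if age is not None and not (measure.age_range[0] <= age <= measure.age_range[1]):
            continue
        if has_archived_data is not None and bool(measure.data_links) != has_archived_data:
            continue
        results.append(measure)
    return results

# Placeholder entries, not real measures or identifiers.
INDEX = [
    Measure("Example Social Development Inventory", "social development",
            (3, 5), ["report-001"], ["dataset-001"]),
    Measure("Example Language Checklist", "language and communication",
            (0, 5), ["report-002"], []),
    Measure("Example Setting Quality Scale", "quality of early childhood settings",
            (0, 6), [], []),
]

for match in search(INDEX, subject="social", age=4):
    print(match.title, match.report_links, match.data_links)
```

Filtering on `has_archived_data=True` corresponds to the use case described above, in which the Index points a user to the location of comparative samples in the public data archives.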

Conclusion
By the time Atlantic completed its grant giving in 2016, an evidence-based approach to mainstream service design and delivery in Ireland and Northern Ireland had been firmly established. Their legacy in the C&Y sector is evident in recent developments such as the Irish government's What Works initiative, which aims to facilitate practitioners, service providers and policymakers to access data for service planning, design and delivery. The data that was archived by the PEI-RI project is the bedrock upon which current mainstreaming of PEI approaches on the island is based, and it is of value beyond Ireland to a growing international evidence base for PEI. We realised that it would not be enough to provide access to raw data files alone, as "[g]enerating and collating the evidence is of no use if it never reaches the commissioners and professionals who need it" (What Works Network, 2014, pp. 6). In the latter part of the project our challenge was finding ways to maximise the value of the archived data to the widest range of end users, and this was mostly about providing pathways to the data or parts of it. Because the objective of this project was to archive a large quantity of data, we were fortunate to have the time and resources to improve and experiment with data signposting. Curation is generally not a priority for researchers, and given the current under-resourcing of the public data archives in Ireland and Northern Ireland, this project provided an interesting case study for what can be achieved when archival staff have sufficient opportunity to become well acquainted with the data and their potential.
The latter part of the PEI-RI project involved an ongoing and active consultation with key stakeholders, alongside drafting of various digital resources. During this active consultation, the data curator has been able both to test the ground for these resources and to create a level of anticipation amongst the stakeholders. Our experience is echoed in the assertion by Bishop that "archives need not be passive agents, trying to fathom what users want. They can actively shape those needs and wants, ideally in an interactive and collaborative manner with re-users" (2014, pp. 168). Two of the more popular resources in development have been described here. The library of evaluation reports exposes the research to new audiences through linkage with projects in other disciplinary areas, while also preserving the context of the archived research data. The Index of Measures is a tool to drive user traffic to the archive, and to improve data discoverability by pointing to the exact location where comparative samples exist. When these two resources are launched in 2020 we will investigate their impact on the number and type of interactions with the archived data.