Research Data Management Initiatives at University of Edinburgh

During the last decade, national and international attention has increasingly focused on issues of research data management and access to publicly funded research data. The pressure on researchers to improve their data management and data sharing practice has come from research funders seeking to add value to expensive research and to solve cross-disciplinary grand challenges; from publishers seeking to respond to calls for transparency and reproducibility of the scientific record; and from the public seeking to gain and re-use knowledge for their own purposes using new online tools. Meanwhile, higher education institutions have been rather reluctant to assert their role in either incentivising or supporting their academic staff in meeting these more demanding requirements for research practice, partly due to a lack of knowledge about how to provide suitable assistance or facilities for data storage and curation/preservation. This paper discusses the activities and drivers behind one institution's recent attempts to address this gap, with reflection on lessons learned and future direction.

1 This paper is based on the paper given by the authors at the 6th International Digital Curation Conference, December 2010; received December 2010, published July 2011.

The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. ISSN: 1746-8256. The IJDC is published by UKOLN at the University of Bath and is a publication of the Digital Curation Centre.

Robin Rice and Jeff Haywood


Background
The University of Edinburgh is a research-led Higher Education Institution (HEI) with nearly 27,000 students, whose mission is "the creation, dissemination and curation of knowledge." It was ranked in the top five UK universities by volume of 4-star "world-leading" research in the UK's 2008 Research Assessment Exercise, and twentieth in the world in the 2009 Times Higher Education-QS World University Rankings. Within the University, Information Services (IS) supports the research and education activities of schools across three colleges, and consists of the following divisions: Library User Services, Library & Collections, IT User Services, IT Infrastructure, Applications, EDINA and Data Library, and the Digital Curation Centre (DCC), plus the Vice Principal's Office. Although EDINA and the DCC provide national rather than locally facing services, each of these divisions has a unique perspective on the research, computing and data requirements of the University.
A number of recent and current activities inform this paper. These include research and development projects funded by the UK Joint Information Systems Committee (JISC) and led by the Data Library with the aim of enhancing the service:

- DISC-UK DataShare, 2007-2009;
- Data Audit Framework Edinburgh Implementation, 2008;
- Research Data MANTRA (Management Training), 2010-11.

They also include University activities led by Information Services top management:

From Data Use to Data Sharing
The Edinburgh University Data Library celebrated its 25th anniversary in 2008 with a Symposium on Institutional Data Services. One objective was to identify appropriate future directions for the service. The Data Library was started to meet a demand for support of 1981 machine-readable population census data and other large-scale datasets, such as government surveys, used by social scientists and others.
Data Library staff continue to provide direct support to University of Edinburgh staff and students through personal appointments and answering email enquiries. We also maintain a library of datasets for local use, and extensive web pages on finding data sources on the internet. We provide documentation and training for both local resources and national data services, such as EDINA Digimap.
Support for finding, accessing and using data is the traditional role of academic data librarians (where such traditions exist). But so much has changed in the computing environment in the last 25-30 years that data librarians, like reference librarians faced with the reality of Google, find themselves in need of inspiration to reinvent their roles in support of modern forms of research and computing. As the principle of scarcity (of information) is replaced by information overload in mainstream academic librarianship, so an emphasis on support for finding and using data now needs to be balanced with support for managing and sharing data.
There therefore appeared to be a need to assist our researchers in sharing their data over the internet, and repository platforms seemed to offer a solution.
…And from Data Sharing to Data Management
DISC-UK (Data Information Specialists Committee - United Kingdom) is made up of data professionals working in UK Higher Education who specialise in supporting their institutions' staff and students in the use of numeric and geospatial data. Together with repository managers at the Universities of Edinburgh, Oxford and Southampton, the group initiated the DataShare project to pilot the establishment of institutional data repository services, as an addition to these institutions' existing publications repositories. Out of the project, the Data Library established Edinburgh DataShare as an institutional data repository service to complement the Edinburgh Research Archive, an open access publications repository operated by the Library.
The lessons learned included the benefits of, and barriers to, deposit, especially on open access terms (Gibbs, 2007). As one of the key deliverables for the broader community, project staff authored a guide to assist other institutions with policy-related decision-making about accepting datasets into repositories (Green, Macdonald & Rice, 2009). This output was converted into a training workshop, facilitated by DISC-UK members at two events in 2009: the International Association for Social Science Information Services and Technology (IASSIST) conference in Tampere, Finland, and Beyond the Repository Fringe in Edinburgh, United Kingdom.

The International Journal of Digital Curation Issue 2, Volume 6 | 2011
To identify datasets being created in different parts of their universities, and to begin to address some of the concerns about sharing data in an open access repository, partners used the Data Audit Framework (DAF) to engage with researchers at all stages of the research process. The DAF methodology, developed by DCC staff in Glasgow, was conceived in response to recommendations made in the JISC-commissioned report Dealing with Data: "A framework must be conceived to enable all universities and colleges to carry out an audit of departmental data collections, awareness, policies and practice for data curation and preservation" (Lyon, 2007).
The Edinburgh DAF Implementation project, one of four JISC-funded projects to test the framework, produced five case studies. Some of the concerns raised pointed to the need for improvement in data management practice (Ekmekcioglu & Rice, 2009). The case studies found:

- storage provision is often insufficient;
- data value is perceived as high, and long retention periods are needed;
- formal data management plans are lacking, and practices are ad hoc;
- guidelines and standardised procedures for creating and storing data are lacking;
- metadata is minimal, and much effort is expended in finding extant data on servers.
The other DAF pilot projects showed that our institutions were not alone in facing these issues (Jones, Ball & Ekmekcioglu, 2008).
These results, along with the digital curation community's vocal consensus about the need for planned curation from the very start of the research or data lifecycle, made it clear that to help our researchers share or publish their data openly, ultimately to support the re-use of locally created data, we could not escape the imperative of supporting them to better manage their data.

Raise the Game through Support and Training
The Edinburgh Data Audit Framework Steering Committee outlived the DAF Implementation Project by a year or so. Its membership was broad-based, including, for example, the University Archivist, an official from Edinburgh Research and Innovation (as the research office is called), data 'champions' in academic departments, and IS managers. Whilst the project was completed in about eight months, the committee voted to continue meeting until an agreed set of outcomes could be delineated. This was seen as key to embedding the lessons learned from the case studies. Put another way, now that we knew how unsatisfactory the general situation was for academic staff, it did not seem right simply to finish the project and move on.
In the end, the committee agreed on three desirable outcomes, while noting a fourth:

1. Guidelines for research staff in planning and carrying out research data management, which they could consult on the University website.
2. Training for postgraduates and early career researchers, to embed good practice early and contribute to culture change.
3. Policy development to clarify expectations and responsibilities between the institution and its research staff (discussed below).
4. Service gap analysis: this was attempted, to determine which existing services already meet related requirements and which services are missing, but was abandoned because the committee did not feel well-positioned to complete it without more Information Services managers present.

The Data Library staff, in conjunction with the DAF project manager based in Research Computing, wrote a set of web pages to meet the goal of creating general guidelines for research staff. These were launched as part of the new-look Information Services website in September 2009 and gained some attention amongst practitioners in the digital curation community. Pages were kept short and covered definitions and funder policies; reasons to manage research data well; a checklist for planning; documentation and metadata; and storage, security and encryption (with cross-references to existing information on the IS website). Another group of pages covered basic digital preservation concepts, data sharing, and depositing data in a repository, as well as key contacts for getting help within the institution. These guidelines need more promotion amongst our university researchers, many of whom are still unaware they exist.
In the autumn of 2009, a partnership with Transkills (Postgraduate Transferable Skills Unit) was formed to consider training methods for PhD students and early career researchers across the disciplines, and a workshop was piloted in the School of GeoSciences with detailed feedback sought from students. Evaluation of the workshop paved the way for the development of a proposal in response to the JISC Research Data Management Programme in the spring of 2010 to develop online training materials for Transkills, now part of the Institute for Academic Development.
Research Data MANTRA (Management Training) was funded to run from August 2010 to July 2011. The project aims to develop online learning materials reflecting best practice in research data management, grounded in three disciplinary contexts (social science, clinical psychology and geoscience) that reflect partnerships with three postgraduate programmes. In addition to web-based 'chapters' that students can work through at their own pace, the course will include video interviews with leading academics about data management challenges, and practical exercises in handling data in four software analysis environments: PASW (also known as SPSS), NVivo, R and ArcGIS. The resulting materials will also be deposited under an open licence in JorumOpen, a national repository for open educational resources.
The video-based anecdotes aim to make the somewhat dry topic of data management relevant to postgraduates who are immersed in very specific research topics and may not appreciate the value of developing good practice in research data management. The data handling exercises aim to bridge the gap between a course covering broad data management concepts and the discipline-specific data analysis skills taught in core courses. Data handling covers the software-based preparatory steps that allow analysis to be conducted, and leads the way to proper documentation of data transformations.

While it is hoped the training will contribute to long-term culture change and raise awareness in the three graduate programmes and beyond, the Data Library also plans to increase its outreach to academic staff over the next year to gain voluntary deposits in the Edinburgh DataShare repository, and to reach its target of establishing six new research collections over the academic year. More online guidance contextualising the deposit process will be written (for example, explaining open data licensing), and face-to-face meetings will be offered to principal investigators and research units who have data to share. New engagement with schools and Scottish Research Pools may take the form of workshops or focus groups to discuss research data management planning and data sharing solutions for collaborative research across institutions. The Edinburgh-based ERIS project (Enhancing Repository Infrastructure in Scotland), the DCC and other parts of IS may be collaborators in some of this activity.

Generally, it is hoped that through these efforts, research projects in the University at any stage of the Digital Curation Continuum, a visual aid created for DISC-UK DataShare, will move one or two notches further up the diagram.

At the start of 2010, the Vice Principal for Knowledge Management set up two linked Working Groups to help find a realistic way forward in dealing with the complex challenge of digital research data. One group explored research data storage (RDS) and the other research data management (RDM), as two parts of a whole. In one sense, the RDS group addressed the 'bottom-up' needs expressed by researchers in the research computing survey, while the RDM group explored the top-down drivers for an institution to take responsibility for the management of its research data assets. Formally, the RDM group reported through the Library Committee and the RDS group through the Information Technology Committee.

Top-down Meets Bottom-up in the Middle
In October 2010, both groups' draft outputs were circulated to the university community and the college committees for consultation. The documents indicate an important direction of travel for the University of Edinburgh. The outcome of the RDM document is a draft university policy for managing research data, taking account of increasing funding agency demands for compliance, the open access agenda, and the shared responsibilities of the university and principal investigators. The RDS document identifies the types of data that need storage and the features that an effective University-wide storage service would need to offer. The staff effort and hardware/software implications of the documents are substantial, and a joint implementation plan will be key to their effectiveness.

Cross-Platform File Store
It is recommended that a globally accessible cross-platform file store (GACPFS) be established centrally. The computing requirements of different research groups are too varied for a single-platform solution, and data need to be accessible from outside the university domain. Digital data of all kinds need to be backed up. The research computing survey had found that 80% of researchers' needs would be met with a file store of 100 gigabytes. A simple but flexible allocation is required that can meet increasing and variable demand.

Authentication, Authorisation and Access
Access control is important, as is secure storage for sensitive data with automated encryption and decryption where needed. A standards-based solution is needed for controlling access to the file store by authorised researchers from other institutions using a federated identity system. However, the group found that it is by no means clear that such a solution exists (Shibboleth, OpenID and Eduroam are limited to Web or WiFi access). For databases, access control commonly needs to be fine-tuned to set Create-Read-Update-Delete (CRUD) permissions on particular tables or even individual elements (fields). Other university researchers will require more open access, to benefit from peer improvements to the data or possibly even 'crowdsourcing' techniques. To be efficient, the model put forward will need to identify a fixed set of variants or 'flavours' of data that must be accommodated in terms of access control and use.
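The idea of a fixed set of access 'flavours', combined with fine-grained CRUD permissions, can be sketched in Python as a small table of role-to-permission mappings, with optional per-table or per-field overrides. This is a minimal illustrative sketch under stated assumptions: the flavour names (`private`, `project`, `open`), the roles, the `Perm` flags and the `allowed` helper are all hypothetical, and are not drawn from the RDS group's paper or from any particular storage or database product.

```python
from enum import Flag


class Perm(Flag):
    """CRUD permissions as combinable bit flags."""
    NONE = 0
    CREATE = 1
    READ = 2
    UPDATE = 4
    DELETE = 8
    ALL = 15  # CREATE | READ | UPDATE | DELETE


# Hypothetical fixed set of access 'flavours': each maps a user role
# to the CRUD permissions that role holds on a dataset.
FLAVOURS = {
    "private": {"owner": Perm.ALL},
    "project": {
        "owner": Perm.ALL,
        "collaborator": Perm.CREATE | Perm.READ | Perm.UPDATE,
    },
    "open": {
        "owner": Perm.ALL,
        "collaborator": Perm.CREATE | Perm.READ | Perm.UPDATE,
        # Outside users may read and improve the data (crowdsourcing).
        "public": Perm.READ | Perm.UPDATE,
    },
}


def allowed(flavour, role, wanted, overrides=None, target=None):
    """Return True if `role` holds every permission in `wanted`.

    `overrides` optionally maps a table name or "table.field" name to a
    replacement role-to-permission mapping, so access to an individual
    table or field can differ from the dataset's overall flavour.
    Roles absent from the applicable mapping get no permissions.
    """
    rules = FLAVOURS[flavour]
    if overrides and target in overrides:
        rules = overrides[target]
    granted = rules.get(role, Perm.NONE)
    return (granted & wanted) == wanted


if __name__ == "__main__":
    # A sensitive field inside an otherwise project-shared database.
    overrides = {"participants.postcode": {"owner": Perm.READ | Perm.UPDATE}}
    print(allowed("project", "collaborator", Perm.READ))
    print(allowed("project", "collaborator", Perm.READ,
                  overrides, "participants.postcode"))
    print(allowed("open", "public", Perm.DELETE))
```

Keeping the set of flavours small and fixed means that most datasets would need only a flavour label; per-table or per-field overrides are reserved for the rarer fine-grained cases, such as a single sensitive field in a shared database.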

Backup and Syncing
The ability for the server to back up laptops automatically and to sync with other mobile devices needs to be in scope. In essence, this will allow Information Services to be the cloud provider for its users. Whether commercial cloud solutions are used to perform this role in the future is incidental, although there are strong indications that doing so would help with the green agenda of reducing the university's carbon footprint, as articulated in the University Sustainability Policy. 'Dropbox'-type functionality would be the minimum requirement; fully automated background syncing for certain mobile devices is desirable.


Centralisation and Trust
In addition to technical design issues, IS needs to tackle issues of trust for such a system to be effective, including the access and security issues mentioned earlier. Trust becomes a more acute issue if and when storage is outsourced to commercial cloud operators. The Edinburgh Compute and Data Facility (ECDF), introduced in 2007 as a project and launched formally as a service in October 2010, is a success story for IS in establishing a centralised high performance computing (HPC) solution. Its launch marks the end of an era in which HPC was carried out by researchers forced to buy and attempt to maintain their own kit. Those who use ECDF contribute to its operation from their research funding, a model that is highly efficient and sustainable. The RDS recommendations aim to provide data storage solutions that similarly eliminate the need for 'under the desk' solutions by university researchers.

Network of Support
Developing a network of support was the final issue raised by the RDS group, which identified the three IS college consultancy teams (made up of both library and computing specialists) as a potential locus for developing face-to-face support for research data management planning. Training for support staff was not covered in the paper, but the DCC's Digital Curation 101 course may prove useful in this regard.

Towards University Policy
The RDM group began by discussing a range of potential activities that would improve research data management practice across the university, but quickly found its focus in the university policy arena. This was partly because some other activities were already under way (e.g., web guidelines, learning materials) or could be provided nationally (e.g., the DCC's data management planning tool). It was also because any recommendation for the creation of new services within IS raised the question of what drivers, including demand, existed to justify such changes of priority, especially in an environment of increasing financial constraints. On the other hand, putting policy first inevitably raised the question of what services would be needed to support such a policy. In the end, the group seemed to decide that by nurturing the egg first (policy), healthy chickens (research data management best practice and related support services) could then hatch.

Concerns
Some trepidation about pursuing the policy route has been apparent within the group. The Library and Collections Director, who chaired the RDM group, had recently succeeded in getting the Research Publications Policy through the University Senate. This policy requires academics to deposit their research outputs in a publications repository (as open access, where appropriate) from January 2010. Questions arose about the extent to which a research data management policy implied a 'mandate' that would be resisted by hard-pressed staff, who might see it as just another bureaucratic noose around their necks.
It was also noted that the Incremental project at Cambridge and Glasgow had completed a survey which found that researchers were often unaware of departmental policies on RDM, or found such policies dense and unfriendly (Freiman, Ward, Jones, Molloy & Snow, 2010). Some researchers in the group initially felt that such policies were better placed coming from the research funders, and that the HEI's claim on the data assets produced in the course of research was tenuous. At worst, asserting a policy could be seen as interfering with academic freedom. Finally, when it was noted that the University of Edinburgh had an opportunity to be the first in the UK to establish a research data management policy, it was argued that being first could be a risk if we got it wrong, and that it would be better to wait and see what others did.

Drivers
A prime driver for the group's eventual consensus in favour of a university policy was the recent adoption of the Code of Practice for Research (UK Research Integrity Office, 2009) by the university's research office. The Code sets out rules for the retention of and access to data related to research publications, as well as the obligations of institutions to provide support for doing so. A second driver was the negative publicity for one HEI after approximately 1,000 email messages from the University of East Anglia's Climatic Research Unit (CRU) were hacked and published online in November 2009, shortly before the Copenhagen Summit. Two independent reviews of the incident have been published, one of them investigating allegations relating to aspects of the behaviour of the CRU scientists, including their handling and release of data. A number of lessons were outlined for the University of East Anglia in particular and HEIs in general, not least the reputational risk and legal accountability associated with staff not being forthcoming in response to Freedom of Information (FOI) requests from the public (Russell, Boulton, Clarke, Eyton, & Norton, 2010).

A Draft Policy
The Vice Principal and the RDM Group Convenor initiated action towards policy development by hiring the recently retired director of the DCC to write a number of draft policy iterations, taking into account feedback from the group, over the summer of 2010. This has culminated in the draft policy being made available on a closed university wiki and circulated to university committees, along with the RDS paper, for consultation from October through to November.
One starting point for the policy was the DAF steering committee's short paper outlining policy recommendations. These covered the need to clarify ownership and intellectual property rights for research data assets; the need for data management plans and procedures at various levels, including the research unit; the need for adequate retention of data to allow sufficient reference following publication of results; the need for guidance on retention periods and for support and advice on curation; and a formal procedure for data transfer (e.g., right to a copy) when staff and students leave the institution (Rice et al., 2010).
The draft policy is made up of ten principal points and is contextualised within an 11-page paper.
The current policy paper omits a definition of research data, but an earlier version adopted the definition of the Organisation for Economic Co-operation and Development (OECD). This definition is useful for distinguishing research data from analogue data (such as specimens), corporate data, or learning and teaching data, not to mention any other digital files that accumulate on staff members' hard drives:

"In the context of these Principles and Guidelines, 'research data' are defined as factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings. A research data set constitutes a systematic, partial representation of the subject being investigated." (OECD, 2007)

The policy paper deals with the issue of open data, outlining open data licences and referring to the Panton Principles without being prescriptive about researchers' decisions. It also covers potential constraints on the policy, including the Data Protection Act, Department of Health guidelines and requirements from Ethics Committees.
The success of any policy that may be approved through this initiative will be somewhat dependent on other institutions adopting compatible policies, in no small part because of the enormous importance of collaborative research.

Conclusion
We have discussed the issues involved in exploring university obligations in the area of research data management, while conveying the current direction of travel at one institution, the University of Edinburgh. The issues are fairly static (from data ownership and rights to retention and sustainability), but the solutions are a moving target as the research environment and its technologies continue to change, subtly altering what is perceived as possible, feasible and desirable.
Prior to the DAF work, the Vice Principal's Office of IS had commissioned a survey of staff to help develop its Research Computing Strategy: "All staff and research students in all three Colleges need data services at some level to underpin and secure their research. Data services should include regular, effective backup at a secure location with effective recovery, and curation and preservation of data in such a way as to ensure availability for re-use by the creators or others. At present, although there are some examples of good practice, there is a general insufficiency of both facilities and support, centrally and locally, resulting in varied practices across the University. Development of appropriately scaled data services and training in their use are therefore essential if the University wishes to maintain a high quality research environment." (University of Edinburgh Knowledge Strategy Committee, 2008)

3. All new research proposals [from date of adoption] must include research data management plans or protocols that explicitly address data capture, management, integrity, confidentiality, retention, sharing and publication.
4. The University will provide training, support, advice and where appropriate guidelines and templates for the research data management and research data management plans.
5. The University will provide mechanisms and services for storage, backup, registration, deposit and retention of research data assets in support of current and future access, during and after completion of research projects.
6. Any data which is retained elsewhere, for example in an international data service or domain repository should be registered with the University.
7. Research data management plans must ensure that research data are available for access and re-use where appropriate and under appropriate safeguards.
8. The legitimate interests of the subjects of research data must be protected.
9. Research data of future historical interest, and all research data that represent records of the University, including data that substantiate research findings, will be offered and assessed for deposit and retention in an appropriate national or international data service or domain repository, or a University repository.
10. Exclusive rights to reuse or publish research data should not be handed over to commercial publishers or agents without retaining the rights to make the data openly available for re-use, unless this is a condition of funding.

(University of Edinburgh Research Data Management Group, 2010)