Research Data Management Education for Future Curators

Science has progressed by “standing on the shoulders of giants”, and for centuries research and knowledge have been shared through the publication and dissemination of books, papers and scholarly communications. Moving forward, much of our understanding builds on (large-scale) datasets, which have been collected or generated as part of the scientific process of discovery. How will these be made available for future generations? How will we ensure that, once collected or generated, others can stand on the shoulders of the data we produce? Educating students about the challenges and opportunities of data management is a key part of the solution and helps the researchers of the future to start thinking about the problems early in their careers. We have compiled a set of case studies to show the similarities and differences in data between disciplines, and produced a booklet for students containing the case studies and an introduction to the data life cycle and other data management practices. The booklet has already been used at the University of Southampton within the Faculty of Engineering and is now being adopted centrally for use in other faculties. In this paper, we provide an overview of the case studies and the guide, and reflect on the reception the guide has had to date.


Introduction
estimated that the worldwide capacity for storing digital information in 2007 was 276 exabytes. A similar estimate from private sector research company International Data Corporation (IDC) put the figure at 264 exabytes and calculated that all the data created and replicated in the "digital universe" was 281 exabytes (Gantz et al., 2008). In fact, since 2007, more data is produced than can actually be stored, as much of the data is transient. Some data is deleted when it is no longer required, some might be transformed into other formats, and some might be processed and the raw data discarded.
As a large proportion of data is not kept, looking after your data is becoming much more important, and teaching the value of this to students early in their careers helps them to recognise its importance and start thinking about ways of managing data.
We looked at five researchers' work from medicine, materials engineering, aerodynamics, chemistry and archaeology, and produced case studies showing the similarities and differences between the data types they produce. We created a guide containing the case studies and an introduction to research data management.

Student Guide
The case studies were written up into a glossy, introductory guide. The guide was broken down into three parts. Part I helped to set the scene by introducing types of data and the research that the data represented. The case studies in Part II were written using this framework.

Guide Part II: Case Studies
The aim of the case studies was to show the similarities and differences in research data between disciplines and how these data were managed. Each case study was written using terminology from the discipline, whilst remaining accessible to non-experts.
Each case study began with a table summarising the data categories used by the researcher, grouped using the framework introduced in the previous section. Each case study then included a discussion of the researcher's practices when producing and using the data, broken down into three sections:
1. Obtaining the data
2. Using the data
3. Managing the data
These sections are taken from one of the data life cycles in section 1 of the guide, shown in Figure 1.
Figure 1. A simplified data life cycle, used as section headings for the case studies.
Finally, each researcher was asked to provide some images that showed their research or data in use.
Each case study was formatted to fit on three to four pages. An example of one of the case studies is shown in Figures 2-4. As can be seen, the guide was laid out clearly and used colour extensively to make it more approachable and easier to digest.
The final part of the guide provided general advice on how to manage data. Topics included file naming, data preservation, file tracking, file formats, backups and file versioning. Figure 5 shows an example of how this was presented.
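To make the file naming, file versioning and file tracking topics more concrete, a minimal Python sketch is given here. It is an illustration only and does not reproduce the guide's own recommendations: the naming pattern (ISO date, project, description, zero-padded version), the function names and the use of SHA-256 checksums to check a backup copy are all assumptions made for this example.

```python
import hashlib
import re
from datetime import date
from pathlib import Path


def versioned_name(project: str, description: str, day: date,
                   version: int, ext: str) -> str:
    """Build a sortable file name (hypothetical pattern, not from the guide).

    Pattern: <ISO date>_<project>_<description>_v<NN>.<ext>
    """
    def slug(text: str) -> str:
        # Lower-case the text and collapse anything that is not a letter
        # or digit into a single hyphen, so names are safe on most systems.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (f"{day.isoformat()}_{slug(project)}_{slug(description)}"
            f"_v{version:02d}.{ext.lstrip('.')}")


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """Check that a backup copy is byte-identical to the original file."""
    return sha256_of(original) == sha256_of(backup)
```

Under these assumptions, `versioned_name("Wing Test", "Raw pressure", date(2013, 1, 15), 2, ".csv")` gives `2013-01-15_wing-test_raw-pressure_v02.csv`. Putting the date first keeps files in chronological order when sorted by name, and recording a checksum alongside each file gives a simple way to confirm that a backup has not been corrupted.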

Lessons Learned
The guide has been presented to students twice as part of a training lecture given to first-year postgraduates. To gauge its reception, students were asked three questions at the end of the lecture, as shown in Table 1.

1. Five Ways To Think About Research Data: providing an introduction to data categorisation and data life cycles;
2. Case Studies from medicine, materials engineering, aerodynamics, chemistry and archaeology;
3. Data Management Practices: giving general tips on managing data.

Guide Part I: Five Ways To Think About Research Data
Combining some recognised definitions of research data, we introduced research data by showing the following five ways of considering it:
1. How research data is collected (Research Information Network, 2008)
2. The forms of research
3. Electronic storage of the research data

Figure 2. Case study data usage summary.

Figure 3. Case study text description.

Figure 4. Case study figures and summary.

Figure 5. An extract from Part III, giving advice on data management best practices.

Table 1. Feedback from research data management introduction lectures.
The International Journal of Digital Curation, Volume 8, Issue 1 | 2013