The International Journal of Digital Curation

What are the roles necessary for effective data management, and what kinds of expertise are needed by the researchers and data specialists who are filling those roles? These questions were posed at a workshop of data creators and curators whose delegates challenged the DCC and RIN to identify the training needs and career opportunities for the broad cohort that finds itself working in data management, sometimes by design but more often by accident. This paper revisits previous investigations into the roles and responsibilities required by a "data workforce", presents a representative spectrum of informed opinion from the DCC Research Data Management Forum


Introduction
Co-sponsored by the Digital Curation Centre (DCC) and the Research Information Network (RIN), the Research Data Management Forum (RDMF)1 is a community composed mainly of data producers and custodians, which meets twice a year to address pressing questions in the data world.
At its November 2008 meeting, which took an analysis of roles and responsibilities for effective data management as its theme, the RDMF concluded by agreeing on three objectives for further work:
1. That the DCC should produce a white paper that identifies the training needs and career opportunities for those working in data management;
2. That appropriate measures for certifying the skills acquired by practitioners already at work in data management should be explored and recommendations made;
3. That, on behalf of the RDMF, the DCC should investigate options for the creation of a national data management education and training forum.
Although they were regarded as three distinct outcomes of the meeting, all three are elements of a single aspiration: to recognise, enable and sustain those working with research data at various points throughout the curation lifecycle2. Consequently, whilst addressing the first objective, it makes sense to give due consideration to all three.
The RDMF event was structured to allow contributions from the broad range of individuals working in the data management field: from data creators (the researchers in our universities) to librarians (the traditional custodians of knowledge), and from data scientists to the data managers working in our institutional or national data centres. Yet naming these four groups does not adequately describe the variety or the extent of activities and skills to be found amongst "data practitioners". It was therefore agreed that it would be misleading to focus on fixed designations such as data manager, since a large proportion of individuals in the broad research community occupy a position somewhere on a continuum that includes them all.
This white paper provides an overview of the roles and responsibilities for data management as currently perceived within the community, their predicted evolution, and the present opportunities for providing and acquiring the necessary skills: from the basic toolkit that will equip researchers to plan for the longevity of their data, to the more sophisticated suite of tools required by a data management professional.
Career options are explored in a similar manner, spanning the range from researchers who work with data and, out of necessity or expediency, develop a data management brief, to professionally committed data managers employed specifically to undertake a set of data tasks. To achieve effective data management, the package of capabilities and responsibilities that may need to be assimilated can apply to a diverse range of employment characteristics and roles, presenting a significant challenge both to those delivering and to those seeking a data practitioner's "toolkit".

Key Perspectives
In their July 2008 report to the Joint Information Systems Committee (JISC), Swan and Brown3 immediately identified the heterogeneity of existing data-related roles. Although they attempted to "distinguish four roles: data creator, data scientist, data manager and data librarian", they acknowledged that "in practice, there is not yet an exact use of such terms in the data community, and the demarcation between roles may be blurred". In their definitions of these four roles, the crucial words 'training' and 'formal qualification' are for the most part absent. Data creators are described typically as researchers who have acquired a high level of expertise in handling and manipulating data; data scientists appear to work closely with the data creators and may be involved in creative enquiry and analysis; and data managers tend to be computer scientists, information technologists or information scientists who have taken responsibility for the facilities necessary for the storage, access and preservation of data. Whilst one may infer that this third group has acquired recognised qualifications in disciplines associated with data, only the fourth group, data librarians, may claim to have been formally trained in the curation, preservation and archiving of data.
We carried out a review of the academic and non-academic programmes in the UK and selected other countries that contain data-related components (Table 1).
The number of data librarians currently employed in the UK is extremely small (Swan and Brown put their estimate at five!), and there is no recognised career path in place to attract individuals already well qualified in specific domain expertise and knowledge. They, too, have more often arrived in their careers by accident than by design4. In an interview by CILIP5, Macdonald and Martinez-Uribe describe some data librarians as having "research skills, some library skills, some are statisticians and some are subject experts. Each of the UK's main data libraries addresses different academic audiences and institutional biases, and local staff have developed different skillsets accordingly"6. This "accidental" career choice is not restricted to data librarians. Speaking at the November 2008 RDMF workshop, Sam Pepler described his team of data managers at the British Atmospheric Data Centre (BADC) as comprising experimental researchers, computer scientists, information technologists and information scientists. The BADC group was reported7 to be heavily IT-based, including system administrators, systems developers, infrastructure managers, Web experts and data modellers, all having followed typically random development paths, but generally within IT. Helen Parkinson of the European Bioinformatics Institute described her team of biocurators as life scientists who manage life sciences data8, most of whom are active computational research scientists, adding that it has proved quicker to turn bioscientists into data curators than it would be to turn computer scientists into data curators with any fluency in the biosciences9. Not surprisingly, when the data aspect of anyone's role is not core to their career aspirations, the retention of skilled staff has proved to be a problem.
From the observations of Macdonald and Martinez-Uribe6, it might be imagined that the greatest opportunities to establish an identifiable career path for data practitioners could be found amongst university information services, particularly in institutions where information and technology functions are organisationally combined and the barriers between discrete groups of experts are less rigid.

Roles and Skills
A SHERPA document10, published in August 2008, which sought to assist the writing of job descriptions, focused on staff managing data in the context of institutional repositories; but even in that structured context it was found that staffing requirements for a repository vary greatly among institutions, depending on the remit of the repository and on existing and available resources. The skill set identified by SHERPA does not describe a particular repository service function, but rather a spectrum of skills and knowledge that could be supplied from within the established IT and library community. It does not include the depth of domain knowledge that researchers insist is crucial to informed data management, although it does refer to the ability to manage user expectations.
Delegates to the RDMF workshop identified a set of core skills required in the management of data, which were subsequently mapped to each of the four roles described by Swan and Brown (Figure 1). While recognising the risk of speaking in terms of fixed designations, this exercise framed the portfolio of skills needed for each role. The diagram depicts not only the skills that are apparently unique to each role but also their intersecting boundaries; the number of intersections made by ostensibly critical activities re-emphasises the blurring of boundaries that everyone seemed to acknowledge.

Qualifications and Certifications
So does the above diagram tell us only that the skill requirements for data workers are multi-faceted? If so, it is not an especially practical message to inform the construction of a career path or the design of a training programme.
Tweeting from the DigCCurr conference on April 3, 2009, Kevin Ashley of the University of London Computer Centre noted an audience member's comment "that mid-career people need training that doesn't involve time and travel; we're working on that, folks!" But an analysis of precisely why such training is needed would be welcome. One of the factors to consider is job mobility. At the European Bioinformatics Institute, for example, researcher-curators are obliged to move on once they have completed three years in post. This has obvious advantages and disadvantages: it spreads skills and experience across the sector, but it also repeatedly weakens the organisation's collective skillset and sense of internal continuity.
There is also the question of employee recognition: gaining appropriate reward (in terms of both salary and prestige) for a task that can take up a significant part of one's working life. There is the question of career progression and advancement. And finally, there is the question of quality management: being assured that the person responsible for looking after precious, sometimes irreplaceable, data has some form of external validation for holding such a position of responsibility.
Sheila Corrall's presentation12 to the RDMF workshop considered the protagonists in the data-centric environment according to their principal specialism. She categorised IT experts as conduit specialists, library or information scientists as content specialists, and academics or professionals as context specialists. Applying that analysis, the activities of data scientists would traverse all three fields, although the demands for handling data in each field are constantly evolving, producing continual pressure on everyone to up-skill. Whilst she saw a role for repository and subject librarians, who will need to add data to their portfolio, her perception of hybrid information specialists with boundary-spanning roles, ambiguous status and a variety of generic and specific titles for the same job confirms the generally received analysis of the data practitioner. Her approach nonetheless provides some clues to the potential shape of a training regime, which will have to address both the breadth and depth of competency requirements, combining technical expertise with contextual understanding (i.e., significant domain knowledge) and interpersonal skills. (For example, King's College London's MA in Digital Asset Management focuses on cultural heritage rather than research data, but the programme culminates in a dissertation project involving the application of computing in a discipline area in which the student has appropriate intellectual training; this could be an opening for prospective data librarians.) Given, too, that knowledge and skills tend to be acquired on the job, and that there is an established practice of supplementing one's knowledge by attending specialist workshops and courses, any programme of formal training will have to be closely associated with real-life situations and practice, will need to offer options for flexible delivery, and must be affordable to both employees and employers, bearing in mind that it could be difficult to source funding for delivery as a traditional taught course.
Professor Corrall's main proposition was that data skills should be made a core academic competency, a concept that reflects the UK e-Science Envoy's work on a national e-Science curriculum and the ICEAGE (2008) group's Curricula for Undergraduate and Masters Level Courses, in which data handling is embedded in the curriculum.13 Under this proposal, credit-bearing data management units would also be added to research training modules, online data skills tutorials would be provided, and practising stakeholders would be involved in the planning and delivery of training. There is a need to go beyond the workshop and the short training course, and to embed preparation for a professional (and personal) lifetime of digital data curation within the academic curriculum. Such an approach would see data skilling evolve incrementally, from short courses to individual modules to accredited modules to specialist qualifications. As she concluded, if information studies departments in universities are to move in the direction of training data practitioners, they will need to know that there is a market before they will invest. That market could become visible as a consequence of an internal refocusing of university curricula. The big question is whether the importance of data skills can be sold to universities as a critical driver for restructuring the teaching programme. An alternative (or perhaps complementary) solution would be to integrate data management within postgraduate training programmes, whether embedded within the MRes curriculum run by Graduate Research Schools within the Faculties, via subject-specific Doctoral Training Centres (DTCs), or via separate, Human Resources-run training and development programmes (e.g., Aberdeen Skills for Postgraduate Innovation, Research and Employability (ASPIRE)14).
The current situation is depicted in Figure 2 below, together with suggestions on how we might tackle the problem within the existing framework. Most universities will award some proportionate credit for prior experiential learning through Accreditation of Prior and Experiential Learning (APEL). Of course, in order to award APEL, there must be an appropriate qualification already in existence, and APEL alone is not the answer: the Quality Assurance Agency (QAA) and individual institutions set upper limits (typically around a quarter or a third of the total credit per level) on the credit that may legitimately be awarded on the basis of prior learning. To introduce data management into the curriculum, there is a need to lobby and seek to influence certain key players.

The Researcher Perspective
What is largely missing from this debate is evidence from the community of active researchers regarding their own needs and aspirations within the research lifecycle. The National Centre for e-Social Science's Data Management through e-Social Sciences (DAMES) Project takes as its starting point the premise that "while data management tasks are a major part of most social science projects, researchers often don't have good fluency in these tasks, and consequently don't take advantage of their own data resources."15 The recent RIN-funded Case Studies in the Life Sciences project16 investigated seven disciplinary groups and found that data curation is only a minor element of the research lifecycle in the majority of research programmes, but that where there has been direct contact with data management professionals it is possible to identify examples of significant, if isolated, change, most typically as an unintended impact or outcome of the participation of data experts in a research project. For example, the value of interaction with data professionals was illustrated by evidence from the study of the neuroscience group. In 2008 this group had already been the subject of an immersive study by the DCC's SCARP Project17 into the curation of neuroimaging data for sharing and re-use (Whyte, 2008). One of the outcomes of this relationship was a declaration by the neuroscience team that "now we manage our data, whereas before we didn't." The key observation made by the original SCARP investigator was that effective curation needs human infrastructure, and that researchers' and investigators' heedful attention to each other's data will, in effect, underpin the curation activity within a team. The dividend from this "think local" approach should be taken into account when assessing the supply of data curation capabilities, and it could provide a working rationale for institutional service organisations. However, while the researchers who engaged with the Case Studies project responded warmly to the need for informed data management, their demands would be difficult to satisfy. More than one research group expressed a desire both for centralised (institutional) informaticians who could be "parachuted in" for a short period to set up data management systems, and for local, readily accessible informaticians constantly on hand with knowledge of the discipline, the project and the work done by the group, providing ongoing support and solutions to problems as they arose. For any institution these demands would represent a formidable challenge in terms of providing and managing such a highly skilled and flexible resource. The explicit requirement for a substantial level of subject knowledge also militates against the development of a central cohort that might be useful in multiple situations.

Whose Responsibility?
So whom do we need to influence or engage in order to get data-related issues embedded within academic programmes? The list is potentially quite lengthy, including: the QAA's subject benchmark review panels; leaders of academic programmes and their external examiners; heads of graduate schools with responsibility for researcher training mechanisms and programmes, and the Human Resources services that support them; groupings of vice-principals with responsibility for Research and/or Learning and Teaching; and the Higher Education Academy (HEA), which aims to inform policy and support best practice with the aim of improving the student learning experience. Other stakeholder groupings with a contribution to make to this debate may include research librarians, IT directors or vice-principals with responsibility for information management, and professional bodies.
Lastly, there is the question of the awards themselves: who would, could or should certify, accredit or otherwise endorse these changes? In the United States, the Data Management Association International (DAMA) offers a Certified Data Management Professional credential, "awarded to those who qualify based on a combination of criteria including education, experience and test-based examination of professional level knowledge. This credential is offered at the Mastery or Practitioner level. To maintain certified status and continued use of the credential, an annual recertification fee along with a 3-year cycle of continuing education and professional activity is required."18 This certification is offered in partnership with the Institute for Certification of Computing Professionals (ICCP), which administers testing and recertification.19
While a small number of recent workshops have begun to engage with this issue (e.g., Education for digital stewardship: librarians, archivists or curators? (2008)20, and the International Data Curation Education Action (IDEA) Working Group (Hank & Davidson, 2009)), given the multiplicity of options that has emerged in the paragraphs above, it can be argued that there is a need to convene a national, and possibly even a global, debate to address the demand for data management education and training. Indeed, the DCC and the School of Information and Library Science at the University of North Carolina at Chapel Hill have recently received funding from the UK's Joint Information Systems Committee (JISC)21 and the US Institute of Museum and Library Services (IMLS)22 for a project entitled Closing the Digital Curation Gap: International Collaboration to Integrate Practice, Research, and Teaching in Digital Curation, an international study which aims to establish and support a network of educators through face-to-face meetings and the provision of various IT tools. The initiative's overarching goals are to establish a baseline of digital curation practice and knowledge

Figure 1. Core skills for data management, Chris Rusbridge and Martin Donnelly.11

11 Research Data Management Forum: RDMF2: Core Skills Diagram http://dataforum.blogspot.com/2008/12/rdmf2-core-skills-diagram.html

Figure 2. Training opportunities for data professionals.

Table 1. Data-related educational programmes.
Institution | Programme(s) | Level | Mode | URL
 |  |  |  | http://www.shef.ac.uk/is/prospectivepg/courses/edlm
University of Strathclyde | Information and Library Studies MSc/PgDip | PG | Attendance | http://www.strath.ac.uk/cis/courses/mscpgdipinformationandlibrarystudiespostgraduate/
University College London | Information Management BSc; Library & Information Studies MA; Archives & Records Management MA; Records & Archives Management International MA; Electronic Communication & Publishing MA; Information Science MSc; Library, Archive & Information Studies MRes. The Department of Information Studies also offers a number of short courses related to digital information management. | UG/PG | Attendance | http://www.ucl.ac.uk/infostudies/teaching/
 | Graduate Certificate in Digital Information Management | PG | Distance | http://digin.arizona.edu/
University of California at Berkeley | Master of Information Management and Systems | PG | Attendance | http://www.ischool.berkeley.edu/programs/masters
University of Illinois at Urbana-Champaign | Library and Information Science MSc, offering a specialism in Data Curation. The School also hosts a Summer Institute for Humanities Data Curation. |  |  | 

Table 2 (below) offers a quick overview of potential accrediting or certifying bodies, and the strengths and weaknesses of each option.

Table 2. Options for accreditation or certification.