Use and Impact of UK Research Data Centres

UK data centres are an important part of efforts to gain maximum value from research data. However, if they are to operate effectively, the services that they provide must be based upon an understanding of researchers’ practices and needs. Furthermore, in order to build a case for ongoing funding, data centres must be able to demonstrate their value to researchers’ work and, increasingly, their contribution to wider political “impact” agendas. This paper presents the findings of a survey of users of five UK data centres. It suggests that research data centres are highly valued by their users. Benefits appear to be particularly strong around improving research efficiency, especially access to data. Data centres are less important in terms of stimulating novel research questions. Despite a few interesting cases of observable impact, in the main it remains difficult to understand the wider reach of research which draws upon data centre resources.

1 This paper is based on the paper given by the authors at the 6th International Digital Curation Conference, December 2010; received December 2010, published March 2011.

The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. ISSN: 1746-8256. The IJDC is published by UKOLN at the University of Bath and is a publication of the Digital Curation Centre.
The International Journal of Digital Curation, Issue 1, Volume 6 | 2011


Introduction
In recent years, there has been a growing understanding of the importance of data as a primary research output. This is demonstrated, for example, by the increasing interest in, and policies for, data preservation and management on the part of research councils. Such attention is due in part to recognition of the data deluge: the enormous quantities of data created as a result of changing research processes and, in particular, the growth of e-science (Hey & Trefethen, 2003). The production of large datasets is expected to continue to expand, and data outputs from so-called "small science" are also being recognised as important resources for preservation and reuse (Lyon, 2007).
The proliferation of data creates both opportunities and challenges for researchers. As the US National Science Board (2005) has pointed out, large amounts of data can be aggregated to permit new forms of scientific research, and data can be reused and aggregated to answer research questions beyond those for which it was originally gathered. However, such work depends upon data being readily discoverable and in a format which easily allows reuse. In practice, publication methods and locations are varied, which has tended to limit the ways in which datasets can be used by other researchers (UKRDS, 2008).
It has long been recognised, by both policy bodies and researchers themselves, that a suitable structure for the collection and management of, and access to, research data is crucial if that data is to be useful for other researchers (Lievesley & Jones, 1998; Research Information Network, 2008a). Data centres represent one important attempt to ensure that the potential of research data is fully exploited. They do this in two main ways. First, they attempt to ensure that data is discoverable by providing a single location where researchers can deposit their work, together with tools which allow other researchers to find and access it. However, it is important to note that most have not yet achieved comprehensive coverage within their discipline: Beagrie et al. (2009) found that only 18% of researchers deposit their work in a data centre, although 43% use one to access data. Furthermore, even data which is deposited may be subject to restrictions on access or reuse, either for ethical reasons or to enable the original researchers to gain the maximum publication benefit from it before opening it up to the wider research community (Research Information Network, 2008b).
The second important role played by research data centres is to provide a support structure for researchers who need to get their data (and metadata) into shape prior to deposit. Most researchers are not accustomed to preparing their data for use by others, and few have the time to do so, or to learn how (Research Information Network, 2008b). Indeed, this was identified as a major barrier to fully open research data by the Australian National Data Service Technical Working Group (2007). Repositories are an important resource for these researchers, providing advice, guidance and structures to ensure that data is ready for reuse.
The shape, size and number of datasets will continue to change over coming years. New research techniques and technologies may also allow novel uses of existing datasets (National Science Board, 2005). Thus, it is necessary to evaluate the usage of research data centres periodically, to ensure that they remain relevant and continue to meet the needs of researchers, both as depositors and as users of data. This can also help to inform future service development and ensure that the limited available funding is spent upon changes that will be valued by users. Furthermore, it is important that data centres can prove their value to funding bodies to ensure that their work receives ongoing financial support (Research Information Network, 2008a). In recent times, such bodies have focused increasingly upon the "impact" of academic research on wider society as a key criterion for success (Grant et al., 2009). Thus, it becomes important to trace not only the role of data centres in researchers' work, but also the role of that work within public and private endeavour.
This paper draws on work commissioned by the Research Information Network, which examined how researchers use data centres, and the benefits which accrue to them, and to wider society, by doing so. The research suggests that data centres are well regarded by users and considered important to the research process. In some areas, there is considerable diversity of opinion between users of the five centres surveyed, suggesting that any developments to a data centre would need to take account of the specific needs and interests of its users. Finally, the research concludes that it is extremely difficult to identify causal links between usage of data centres and wider societal impact. This is not to suggest that such links do not exist; simply that the methodologies employed in this project were not able to identify them.

Methodology
The project on the use and benefits of research data centres comprised desk reviews of eight UK data centres and a survey of users of five of those eight, which generated 1,388 usable responses. Since around half of those responses came from users of a single data centre, data have not been aggregated in this paper to show combined totals for all data centres. Furthermore, it is important to note that the sample is self-selecting and may therefore be vulnerable to response bias. In particular, most of the respondents to the National Geoscience Data Centre (NGDC) survey worked for a single public sector organisation, the British Geological Survey (BGS). It seems likely that this response bias stems from the way in which this particular survey was publicised.
The data centres which participated in the survey are shown in Figure 1, along with a brief summary of their remit and their budgets for the financial year 2008-9. Also shown is the overall number of responses to the user survey for each data centre.

Patterns in Research Data Centre Usage
The survey began by gathering some basic information about data centre users. Figure 2 shows the sectors in which survey respondents work. Sectors which account for at least 10% of respondents have been highlighted for ease of reference. The distribution of users varies considerably between data centres. The atypical result for the NGDC is probably due to the large number of respondents who work for the BGS, which is a public research organisation. Overall, while the majority of users appear to come from academic backgrounds, there is a relatively strong showing for public research organisations for the BADC, and for central and local government and community and charity organisations for the ADS. Use by researchers outside academia remains relatively small in scale compared with use within academia.
Figure 3 shows how survey respondents use research data: respondents could tick all categories which applied. The most common response in each category has been highlighted. The findings suggest that most data centres are used in one primary way by a large number of researchers, with other uses being less widespread. For the ADS and CDS, the data is primarily accessed for reference purposes, while most users of the BADC and the ESDS use data for their own research work. These differences may be due in part to the nature of each data centre's holdings; the CDS, for example, holds primarily experimental data which academic researchers reference routinely and frequently, while the ESDS holdings are much better suited to forming the basis of a user's own research. The NGDC appears to be well used for a range of purposes, although this may again reflect the small and homogeneous group of survey respondents.

Figure 4 shows trends in use over time for each data centre. Again, the most common response in each category has been highlighted. For most centres, usage has broadly stayed the same over time, in some cases with fluctuations. Data centres' own research shows that overall usage has gone up over time. When combined with the results presented here, this suggests that data centres are gathering new users rather than seeing more intense usage from existing ones. That said, the ADS and NGDC have seen many users increase the frequency of their usage over time, while the BADC has seen over twice as many users decrease their usage over time as increase it. Where data centre usage has changed, the survey indicated that this is often due to changes in the researcher's circumstances rather than changes made by the data centre. Increases in usage were attributed to new research questions (28%) and changes in role or position (21%), while decreases were attributed to research questions being addressed or shifting emphasis (42%) or changes in role or position, including retirement (33%). However, 26% of respondents said that they increased their usage due to improvements in the range and quality of data available, suggesting that developments made by the data centre can have a positive impact upon usage.

[Figure: Frequency of use]
Use is not limited, of course, to downloading data; data centres also provide a service for researchers who wish to share data that they have created. Accordingly, survey respondents were asked about their data sharing habits, and about their perception of the impact that data centres have had upon data sharing and reuse within their disciplinary field. Figure 5 shows the proportion of data-creating researchers who submit content to data centres; the most common response for each data centre has been highlighted. For the BADC, this question was phrased slightly differently, so results are not comparable. The "N" figures for this question are themselves interesting: for most data centres, roughly half of the survey respondents created new data, whereas for the ESDS it was roughly a third. Users of both the ADS and the CDS were most likely to submit some, but not all, of their data to a data centre. The low levels of submission (and low overall data creation) among ESDS users are probably related to the nature of research in the social sciences, where the high long-term value of data and ethical concerns about human subjects can inhibit sharing. The high submission levels by users of the NGDC may well be explained by the large proportion of respondents who are based at the BGS and who are therefore bound by that organisation's data sharing policies.

[Figure: Submission of new data]
Figure 6 shows the extent to which survey respondents perceive data centres to have improved the culture of data sharing and reuse within their own research communities. The most common response for each data centre has been highlighted. Users of all data centres seem to see a strong improvement in data sharing and reuse, which they consider attributable to the existence of the data centre. The slightly less enthusiastic response from users of the ESDS may be because this data centre has been in operation for more than 40 years, with a web presence for the last 15, meaning that any behavioural changes have already filtered into the mainstream.

Impact of Data Centres on Researchers and their Work
Beyond understanding how research data centres are used, the survey also sought to establish the impact of data centres on researchers and their work. Figure 7 shows how important researchers consider data accessed via data centres to be to their research. For each centre, the most common response has been highlighted. For most data centres, researchers consider the data to be either "very important" or "essential". The NGDC represents a particularly extreme case, with 85% of researchers considering the data to be "essential"; this may again be due to the homogeneous nature of respondents to the survey for that particular data centre.
Researchers were asked to gauge their level of agreement with several statements about the benefits of being able to access data from a data centre. These can be broadly grouped into four main areas: research efficiency, research practice and quality, research novelty, and researcher training. These areas are examined in greater detail, by data centre, in Figures 8, 9, 10 and 11. Overall, however, the most widely cited benefits fall into the research efficiency category, as did most of the free text responses about data centre benefits. Benefits relating to research novelty achieved the lowest overall rate of agreement. However, even the least widely supported statement, about new intellectual opportunities, was rejected by only 22% of researchers overall, while 36% agreed with it "to a large extent".
Figure 8 shows the percentage of researchers, by data centre, who agreed "to a large extent" with the statements about research efficiency. The most common response has been highlighted for each data centre. The most widely agreed benefit for all data centres was saving time in data acquisition and processing, which was a primary aim of many data centres when they were established.
Figure 9 shows the percentage of researchers, by data centre, who agreed "to a large extent" with statements about research practice and quality. Again, the most common response has been highlighted for each data centre. The rankings here are less clear than those for research efficiency; for most data centres, however, the most widely agreed benefit appears to be an improvement in the evidence base for research. For every data centre other than the NGDC, few respondents agree that the centre has increased the use of data in their own research. This suggests that data centre benefits are concentrated around giving researchers access to core data, rather than encouraging them to undertake research which is more heavily data-focused.
Figure 10 shows the percentage of researchers, by data centre, who agreed "to a large extent" with statements about research novelty. Again, the most common response has been highlighted for each data centre. Benefits here appear to be concentrated around enabling research that might not otherwise have happened. It is not entirely clear whether the research would not have happened because the techniques used are only possible due to the aggregation of data through the data centre, or because the data itself would have been inaccessible. Given the overall character of responses to this set of questions, the latter seems more likely: for most data centres there was relatively limited agreement with the statements about new types of research and new intellectual opportunities.
Finally, a single question was asked about researcher training, the results of which are presented in Figure 11. This area revealed the greatest differences between data centres. The free text comments suggest that most of these benefits stem from the availability of a resource that can introduce researchers to the important datasets within their field and demonstrate best practice in collecting and handling data. This relates closely to the two founding aims of many data centres: to widen access to datasets and to improve researcher practices around the curation and storage of data.

Wider Impact of Data Centres
As set out in the introduction, it is important that data centres are able to demonstrate their impact and value beyond the academic community. Figure 2 above indicated the broad reach of some data centres, and some of the researcher benefits outlined in the previous section therefore relate to researchers working in non-academic settings. However, the aim of "impact" is to reach beyond the research community, and the project therefore sought evidence that this was happening.
Figure 12 shows the intended audiences for research produced using data acquired from data centres. Of course, an intended audience is by no means the same thing as an actual audience; surveying the supply side gives little concrete information about demand or usage. Nonetheless, this table gives an indication of the potential reach of research based upon data centres' holdings. Furthermore, it seems unlikely that researchers would continue to produce work for a certain audience if that audience did not find it useful.
For all data centres except the NGDC, the primary audience is academics, while for the NGDC the primary audience is other individuals within the user's organisation. This is consistent with the profile of survey respondents which, as noted above, was heavily weighted towards BGS staff members for the NGDC survey. Each data centre presents some interesting trends. NGDC users appear to be quite outward-facing, with high response rates for a number of different audiences, and few users suggesting that their research was for their own use only. Again, this probably reflects the fact that respondents to this survey were, for the most part, BGS staff members and therefore part of an organisation with a strong cross-sectoral brief. CDS users, on the other hand, have a much more homogeneous target audience, with most of the attention focused on academics and very little on audiences outside the respondent's own organisation. This may well be due to the highly technical nature of the data within the CDS, which may require several stages of analysis before it can be translated for non-specialist audiences.
Policy makers achieved a reasonably strong showing across all data centres except the CDS and, to a lesser extent, the ADS, while business was an important end audience for users of the ADS and NGDC. The low level of response to the "unknown" category suggests that researchers take a strong interest in producing work which is useful for specific audiences. The "other" category, which scored particularly highly for the ADS and NGDC, in many cases contained very specific sub-groups which could be considered part of "business" or "policy makers". However, "students" and "the general public" came up relatively frequently for all data centres; had these been offered as prompted answers, they might have scored more highly still.
As mentioned above, although these intended audiences give a useful perspective on the reach that data centres may have, they are not themselves indicative of wider impact. We attempted to capture such impact through short case studies of some of the research projects mentioned by survey respondents. This was not an easy task. Beyond the well-rehearsed problems of connecting "impact" directly to academic activity, the great majority of survey respondents were professional researchers with no direct view of the ultimate use and outcomes of their data-centre-enabled research. This difficulty is confirmed by the free text responses to the survey question about impact, most of which related to the impact of a data centre on a researcher's own work, rather than the impact of that work on society. Some respondents were overtly hostile to the notion that research impact is worth measuring, while many others suggested areas where their research could add value, but offered no evidence to indicate that it had already done so. However, a small number of researchers were able to provide examples of instances where their work had influenced practice and policy outside academia.
For the most part, the subject matter of the data centre determines where research is likely to have impact. Thus, most of the impacts identified by researchers using BADC data were in environmental fields, while ESDS researchers influenced areas of social policy. Broken down by type of impact, most responses described new models or tools which helped to support decision making by public or private bodies, new policies and regulatory controls, or the development of new commercial materials, particularly drugs. Effects can be observed in the public, private and voluntary sectors, although given the small size of the sample for this question it would be unwise to attempt any estimation of the distribution between these three groups.

Conclusions
This research suggests that UK data centres are playing a valuable role in the research community.They are making it easier and cheaper to access data, are supporting new ways of doing research and are helping researchers to manage and curate their own data more effectively.Overall, they are fulfilling many of the needs identified within the literature.However, it is important to emphasise that this survey only contacted existing users of research data centres: it is possible that there are researchers in these fields who are not accessing the benefits brought by data centres because they are not yet using them.Further study should focus on these non-users, and in particular any barriers which might prevent their use of data centres.
Most data centres have a fairly homogeneous user group consisting of researchers from academia or from public research institutions. ADS users represent the widest range of backgrounds, but overall there is relatively little usage from private researchers or business. This may be related to the nature of this research: it is possible that non-academic researchers have less time for, or interest in, completing a user survey. However, data centre funders should consider whether they can encourage use from a more diverse community.
Most users reported that they have maintained their level of usage (with some fluctuations) over their period of data centre use. Data centres themselves report increasing overall levels of usage, suggesting that they are attracting new users rather than encouraging existing ones to increase their intensity or frequency of usage by, for example, exploring new kinds of research question.
Indeed, there was a strong sense overall that, while data centres may have an effect on some elements of researcher behaviour, such as the propensity to share data, they are having a relatively weak effect on the types of research that are undertaken. Research novelty was the least widely supported benefit of data centres; benefits to do with research efficiency and cost achieved much higher levels of agreement. It may be that research novelty is an important function of data centres for some researchers; for the majority, however, it is less important than the ability to access data quickly and cheaply. In developing further services for researchers, data centres should take into account the relative value of these benefits to researchers.

When considering possible future service developments, it is also important to note that on some issues the views of users were not homogeneous across data centres. For example, the ways in which researchers use the data they acquire vary by data centre, as do their views of the value of the data centre to researcher training. In many cases this will be determined by the content of the centre as well as the needs of the researchers. However, it raises interesting questions for the UKRDS, particularly in terms of researcher training and development, and the extent to which a national framework can be sensitive to the specific needs of researchers in different disciplines.
This research also confirmed the difficulty of tracing the impact of academic research. Several researchers suggested that their work could have an impact, in some cases suggesting very specific ways in which this could happen, but were not able to show that it had actually occurred. However, a few researchers were able to cite specific instances where their work had supported developments in public, private or voluntary sector organisations. The fact that researchers cannot always see the impact of their work suggests that such impact may be more widespread than this survey reveals. Future research could address this problem in more depth by contacting the eventual end users of the research, although this is bound to present new problems around traceability and access.
There are some other important questions that this research was not able to address.The value of data centres to small science was not covered explicitly within the survey and in the context of developments such as Dryad UK it would be useful to understand whether and how an established data centre might support researchers in these fields.The research also highlights the number of researchers that produce new data but do not submit it to a data centre.It is likely that in some cases this is because they do not think that the data has any value to other researchers; in others, however, potentially useful data will be going unshared.Funders should consider how they can encourage researchers to submit data to data centres, and in particular whether stronger guidelines about data citation might help.
[Figure: Extent of Improvement in Data Sharing and Reuse Due to Data Centre, by Data Centre.]