The Product and System Specificities of Measuring Curation Impact

Using three datasets archived at the National Center for Atmospheric Research (NCAR), we describe the creation of a ‘data usage index’ for curation-specific impact assessments. Our work is focused on quantitatively evaluating climate and weather data used in earth and space science research, but we also discuss the application of this approach to other research data contexts. We conclude with some proposed future directions for metric-based work in data curation.

International Journal of Digital Curation (2013), 8(2), 223–234. http://dx.doi.org/10.2218/ijdc.v8i2.286

The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. The IJDC is published by UKOLN at the University of Bath and is a publication of the Digital Curation Centre. ISSN: 1746-8256. URL: http://www.ijdc.net/


Introduction
A quantitative evaluation or assessment of any phenomenon will try to answer two basic questions:
1. What should be counted?
2. How much should it count?
In scholarly communications, citations are typically what is counted, and their position, prevalence and popularity determine how much a citation should count as evidence of research impact. To measure research impact, many statistical techniques, such as co-citation analysis (Small, 1973) and the h-index (Hirsch, 2005), have been developed to show how an individual can be evaluated via the citations made to their publications.
More recently, scholarly communications has started to innovate with these methods of analysis by questioning what is counted. In particular, alternative metrics (hereafter referred to as altmetrics) are beginning to leverage the various traces of activity on the social web in recalculating research impact. Microblogging (aka "tweeting") (Priem and Costello, 2010), the prevalence of journal articles on social bookmarking sites (Haustein and Siebenlist, 2011), research blogging (Shema, Bar-Ilan, and Thelwall, 2012) and website page-views (Thelwall, 2012) have all been explored as potential new, alternative impact indicators. These altmetric analyses do not so much critique existing citation-based metrics as offer a complementary means of impact assessment: one that provides a broader, more complete view of knowledge production in contemporary science (Priem, Piwowar and Hemminger, 2012). So, we might say that altmetric studies question not just what is counted, but also how much, and even why, these new media traces count as evidence of research impact.
To date, most altmetric studies have been aimed at quantifying an individual's impact on their community of practice (a notable exception is Bollen et al., 2005). Our work here aims to expand that horizon by asking whether we can develop assessment techniques that successfully quantify the impact of curation services developed by large groups of people and of infrastructures funded by entire institutions. In a sense, we are promoting the same reconsideration of research impact as previous altmetric studies, but at a different level of granularity. We want to reconsider what it means for a service or a system to have research impact.
In this paper, then, we have three ambitions:

1. To innovate with existing quantitative impact assessment techniques;
2. To explicitly and openly discuss the process of developing new research impact metrics; and
3. To lay a baseline for future curation impact assessments, citation or otherwise.

By developing new indicators of how, when and under what circumstances research data are accessed, the curation community might also engage in a broader discussion about how these calculations can gauge the impact of services and infrastructures supporting data-intensive research. We also believe these metrics are an important step in making

Setting: The RDA at NCAR
The Research Data Archive (RDA) is a repository of atmospheric and oceanographic observational data, weather prediction model output, gridded analyses and reanalyses, climate model output, and satellite-derived data that has been curated by staff in the Computational and Information Systems Laboratory at NCAR for over 40 years (Jacobs and Worley, 2009). The holdings of the RDA are dynamic: many datasets are routinely updated, new datasets are added each year, and total holdings currently exceed 1.3 petabytes.
One of the motivations for this study is to find ways to assess the performance of the RDA beyond a generic "total number of users served" statistic. In particular, we want to highlight and make visible the nuanced, craft-like work that goes into curating large-scale, heterogeneous datasets in this environment. Software engineers working in the RDA have a more complex set of responsibilities than their title implies, including at least two activities not mentioned in the digital curation life-cycle model:
1. The creation of data services, which spans a wide range of activities, from creating customized sub-setting and format conversions for multi-terabyte datasets to "data rescue" for content stored on outdated magnetic tapes; and
2. Archival content development (Jacobs and Worley, 2009).
These two curation activities are currently assessed quite differently. The effectiveness of data services is usually measured through user satisfaction surveys administered on an annual or semi-annual basis, while archival content development is typically evaluated through system log analysis, or through web analytics that attempt to directly correlate the volume of data downloaded with the quality of the data being served.
Separately, these two techniques are effective for exploring when and how often data hosted by an archive are consumed, but they are also exceptionally labour-intensive, and it is often difficult to generalize about "impact" from survey or log-analysis data alone (Bollen et al., 2008; Henneken et al., 2009). These techniques also struggle to capture the nuanced work of data curators, including how shifts or changes in services affect end-user consumption. This leaves the services and infrastructures developed by staff at the RDA invisible in promotion or tenure decisions at an individual level, and often ignored or overlooked by federal funding at an institutional level.

A Data Usage Index
One previous attempt at making curation work more visible is Ingwersen and Chavan's (2012) Data Usage Index (DUI). Using a combination of web analytics and log analysis, this index consists of 14 quantitative indicators that capture different ways that data are discovered and accessed in an archival setting (see Table 1 for a full description). The DUI was originally developed to measure the use of species occurrence records from the Global Biodiversity Information Facility (GBIF) database, and was effective in showing how changes within that infrastructure affected user activity over time.
Ingwersen and Chavan state that the DUI should be adaptable to a new research domain, but that in doing so '…one needs to take into account the fundamental characteristics of datasets and their usage patterns' within that domain (2011). In adapting the DUI from a biodiversity setting, we found differences not only in the ways that users performed searches, but also in their very orientation to "using" climate and weather data.
To return to our original discussion of what counts and what is counted in any impact assessment: in the DUI, what is counted are data access events (e.g. downloads, searches), but what should count will be unique to the system and the type of data products being analysed. We refer to the differences between what is counted and what counts in a DUI assessment as the system and product specificities of measuring research impact.

Specificities
Oliver Williamson (1981) originally used "asset specificity" to describe economic transactions in which one firm acted irrationally, or unexpectedly, when trading goods with another firm. Williamson observed that some firms required specific assets, such as a particular material, tool, or type of human expertise, in order to achieve a desired outcome. A firm requiring these assets had a specificity that locked it into certain transactions, and into ways of doing business that seemed completely irrational to an unknowing marketplace.
As an example, consider an architectural firm that designs a skyscraper to be made entirely of white marble. Any construction company hired to build the skyscraper will be beholden to a few specific sites in Tuscany, Italy, where Carrara marble (the only kind of white marble strong enough for this scale of construction) is quarried. Carrara marble, then, is an asset specificity of this building's design: it locks a construction company into certain ways of working, and limits its ability to obtain a competitive price on the materials it needs to accomplish the task. Without a nuanced understanding of the larger context in which both firms are operating, their actions seem irrational. However, we can understand and begin to better accommodate these types of behaviour if we can find ways to account for and record the specificities that constrain marketplace actions.
Malone et al. (1987), and more recently Haythornthwaite (2006), refined Williamson's concept of asset specificity and added new applications of the term, such as institutional, knowledge, structure and system specificities. These types of specificity more explicitly account for the external factors that shape the way groups, teams and organizations produce new knowledge, and for the ways in which specificities introduced through networked information technologies limit how those groups organize and collaborate.
When evaluating usage patterns and the characteristics of datasets served by the RDA, we noted two types of specificities that constrained scientists accessing these materials: system and product specificities.
System specificities include the architecture and organization of data hosted by an archive. These specificities limit the way a user can search, browse or access a dataset. Whether data are accessible through a graphical user interface or through a command-line tool like 'curl' is an example of a system specificity. These externalities shape the way a user can interact with an archive's content, and consequently they are manifest in the user logs that record how often a scientist accesses data in the RDA, and how much.
Product specificities are the properties of a dataset (its file structure, format, and size) that affect the way a user can discover and consume data in an archive. An example from the RDA is a dataset containing observations made at a NOAA weather station. This dataset will likely contain variables like precipitation or wind speed that are recorded at a sub-daily rate, with a file corresponding to each sub-daily recording. To retrieve a meaningful or complete set of records, an end user often has to download thousands of files in a single session. Judged by file count alone, downloading an entire weather station's record would suggest that a user had consumed a massive amount of data; in reality, these files might total only a single gigabyte. These externalities make file size, download counts, and even download frequency product specificities for impact assessments.
Both product and system specificities shape the way that users interact with, or access, the content of a data archive. In turn, metrics that are developed based on user-archive interactions will necessarily reflect these specificities. In the next section we explore the creation of a DUI unique to the Research Data Archive at NCAR, and note some generalizations that can be gleaned from this process. We then operationalize the DUI to study three datasets hosted by the RDA and discuss the limitations of this work in light of the product and system specificities described above.
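The file-count versus volume distinction in the weather-station example can be illustrated with a small Python sketch. The `Session` class and every file count and size below are hypothetical, chosen only to mirror that example; they are not drawn from RDA logs. Under these assumptions, the station session dominates on file count while the reanalysis session dominates on volume, so the two indicators rank the same sessions in opposite order.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One hypothetical download session: a label plus the size of each file fetched."""
    label: str
    file_sizes_bytes: list[int]

    @property
    def file_count(self) -> int:
        return len(self.file_sizes_bytes)

    @property
    def volume_gb(self) -> float:
        # Decimal gigabytes (1 GB = 10**9 bytes).
        return sum(self.file_sizes_bytes) / 1e9


# Illustrative numbers only (not RDA data): about 7,300 sub-daily station files
# of roughly 140 KB each, versus 4 reanalysis files of 25 GB each.
station = Session("weather station", [140_000] * 7_300)
reanalysis = Session("reanalysis", [25_000_000_000] * 4)

for s in (station, reanalysis):
    print(f"{s.label}: {s.file_count} files, {s.volume_gb:.2f} GB")
```

An indicator built on file counts alone would treat the station session as the heavier use; one built on volume would say the opposite.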

Towards a DUI for the RDA
Building on Ingwersen and Chavan's previous work, the first step in adapting a DUI to a new research environment is to define a unit of analysis. This unit must determine:
1. An appropriate level of granularity at which there is a meaningful group of data, or a 'dataset'; and
2. An appropriate time window in which to capture robust user-system interactions.
For the RDA's DUI, we chose to define a dataset as any data product that was issued a unique identifier.Since many datasets in the RDA are dynamic, and will have new content added at regular intervals, we believed that a monthly time window would yield a high enough volume of user-archive interactions for the purposes of our case study.
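As a sketch of this unit of analysis, the Python below keys each log event by its dataset identifier and calendar month. The event layout (a tuple of dataset identifier, timestamp and user identifier) and the sample events are hypothetical; the RDA's actual log format is not described here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event layout: (dataset_id, timestamp, user_id).
Event = tuple[str, datetime, str]


def unit_key(dataset_id: str, timestamp: datetime) -> tuple[str, str]:
    """Return the unit of analysis for an event: (dataset identifier, 'YYYY-MM')."""
    return dataset_id, timestamp.strftime("%Y-%m")


def group_by_unit(events: list[Event]) -> dict[tuple[str, str], list[Event]]:
    """Bucket log events by dataset identifier and calendar month."""
    buckets: dict[tuple[str, str], list[Event]] = defaultdict(list)
    for event in events:
        dataset_id, timestamp, _user = event
        buckets[unit_key(dataset_id, timestamp)].append(event)
    return dict(buckets)


# Hypothetical sample events, not taken from RDA logs.
events = [
    ("ds083.2", datetime(2011, 3, 2, 14, 5), "u1"),
    ("ds083.2", datetime(2011, 3, 28, 9, 0), "u2"),
    ("ds083.2", datetime(2011, 4, 1, 0, 10), "u1"),
    ("ds540", datetime(2011, 3, 15, 12, 0), "u3"),
]
units = group_by_unit(events)
for key, bucket in sorted(units.items()):
    print(key, len(bucket))
```

Changing the window (for example, to weeks) would only require changing the key returned by `unit_key`.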

What to Count? Usage-Based Indicators
Ingwersen and Chavan's DUI was made up of a series of indicators that were derived from events recorded in a system's user-log data, such as the number of files a user downloaded or the number of unique user queries performed in a given month. We similarly rely on these traces of data access to calculate the groups of indicators that make up the RDA's DUI (see Table 2).
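A minimal sketch of deriving raw counts for one dataset-month from such log events follows. The `LogEvent` fields, the action names and the indicator names are assumptions made for illustration; they cover only a few of the counts mentioned above and are not the full indicator set in Table 2. Here "unique queries" is read as distinct query strings, which is only one possible reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    # Hypothetical schema; the RDA's actual log fields are not described here.
    user_id: str
    action: str       # assumed values: "download", "query", "homepage"
    nbytes: int = 0   # bytes transferred (downloads only)
    query: str = ""   # query text (queries only)


def count_indicators(events: list[LogEvent]) -> dict[str, int]:
    """Raw counts for the events of one dataset-month unit of analysis."""
    downloads = [e for e in events if e.action == "download"]
    return {
        "unique_users": len({e.user_id for e in events}),
        "files_downloaded": len(downloads),
        "bytes_downloaded": sum(e.nbytes for e in downloads),
        # Distinct query strings, regardless of which user issued them.
        "unique_queries": len({e.query for e in events if e.action == "query"}),
        "homepage_hits": sum(1 for e in events if e.action == "homepage"),
    }


# Hypothetical month of activity for a single dataset.
month = [
    LogEvent("u1", "homepage"),
    LogEvent("u1", "query", query="precipitation 1990"),
    LogEvent("u1", "download", nbytes=2_000_000),
    LogEvent("u2", "query", query="precipitation 1990"),
    LogEvent("u2", "download", nbytes=3_000_000),
    LogEvent("u2", "download", nbytes=1_000_000),
]
print(count_indicators(month))
```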

What Counts? A Case Study of Three Datasets
We selected three different datasets from the RDA to test our proposed DUI indicators. These three datasets are a representative sample of the RDA's diverse holdings: one is a set of global observational data (ds540); the next is a popularly analysed dataset derived from a numerical weather prediction centre (ds083); and the last is a complete and exceptionally large global atmosphere and ocean reanalysis dataset (ds093.0-6). In Table 3, we have normalized the user log data and fitted the scores to a complete set of indicators for two separate one-month time windows. We chose two time periods 16 months apart in order to emphasize the stability of certain indicators, such as unique users, download frequency and homepage hits.
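The normalization applied to the log data is not specified here. One common option, shown below purely as an assumption and not as the method behind Table 3, is min-max scaling of each indicator across the datasets compared within a window. The dataset labels and counts are hypothetical.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one indicator's scores across datasets to the range [0, 1].

    If every dataset has the same score, all are mapped to 0.0.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {name: (value - lo) / span if span else 0.0 for name, value in scores.items()}


# Hypothetical unique-user counts for one monthly window (not RDA figures).
unique_users = {"dataset_a": 600, "dataset_b": 150, "dataset_c": 20}
print(min_max_normalize(unique_users))
```

With only three datasets, min-max scaling always maps the lowest scorer to 0 and the highest to 1, which can exaggerate small differences; dividing by the archive-wide mean would be a gentler alternative.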

Ds083.2 is undoubtedly one of the most popular datasets in the RDA, as reflected by the number of unique users it attracted in both time intervals. Interestingly, this dataset's importance in the RDA is also reflected in the usage impact and usage balance indicators. Users of this dataset have a much higher download density than users of either of the other two datasets, suggesting that users of ds083.2 show a high level of interest in the dataset and access a large portion of it per time window. This latter point is an important one: ds083.2 is exceptionally popular because it includes observations from the Global Forecast System (GFS), and its users are likely to systematically download new additions to the dataset on a regular basis.
Interestingly, ds093 increased greatly in popularity over the 16-month period of our observation (a trend that has continued). Although its size increased only nominally, the number of unique users and the usage balance tripled, and the download density more than doubled. This points to a secondary, but nonetheless compelling, value of the DUI indicators: they can be used both to compare impact across an archive and to track fluctuations in the use, popularity and impact of an individual dataset over time. For ds093, this has important implications for the amount of staff time devoted to curating the dataset, as it appears to have an expanding user base.
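Tracking a dataset across windows amounts to comparing each indicator's score in a later window with its score in an earlier one. The sketch below computes those ratios; the indicator names follow the text, but the scores are made-up values chosen only to echo the proportions described for ds093, not the figures in Table 3.

```python
def change_ratios(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Ratio of each indicator's later score to its earlier score.

    Indicators missing from either window, or with an earlier score of zero,
    are skipped because the ratio is undefined.
    """
    return {
        name: after[name] / before[name]
        for name in before.keys() & after.keys()
        if before[name] != 0
    }


# Hypothetical scores for one dataset in two monthly windows 16 months apart.
window_a = {"unique_users": 40, "usage_balance": 0.5, "download_density": 1.2}
window_b = {"unique_users": 120, "usage_balance": 1.5, "download_density": 2.7}

for name, ratio in sorted(change_ratios(window_a, window_b).items()):
    print(f"{name}: x{ratio:.2f}")
```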
Ds540 is much smaller than the other two datasets in our case study, and consequently its indicator scores were lower, which seems to indicate that it receives less attention in the archive as a whole. However, the secondary interest impact score of ds540 is quite high, indicating that it is of very high interest to a small number of repeat users. One explanation for this very high score is that the community of users for this dataset (which consists of historical observational data) is likely to be climate-model developers. Although there are very few climate-model development projects in the world, their work has an enormous impact on the field of climate science overall. Thus, in the case of ds540, the secondary interest impact score indicates an additional value of this dataset that is not well represented by the index as a whole. We will return to the issue of size as a function of attention later in this paper, but we recognize the need for a weighting scheme that can smooth the effect of size on metrics developed for archives hosting datasets that vary in volume.
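One way such a weighting scheme might work, sketched here as a hypothetical rather than anything the RDA DUI implements, is to divide a raw score by dataset size raised to an exponent `alpha` between 0 and 1. With `alpha = 0` the raw score is kept, with `alpha = 1` the score becomes a per-gigabyte rate, and intermediate values only partly offset the advantage of large datasets. The scores and sizes are invented for illustration.

```python
def size_adjusted(score: float, size_gb: float, alpha: float = 0.5) -> float:
    """Scale a raw indicator score by dataset size raised to alpha.

    alpha = 0 leaves the score unchanged, alpha = 1 gives a per-gigabyte rate,
    and values in between only partly offset the advantage of large datasets.
    """
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return score / size_gb ** alpha


# Hypothetical (raw score, size in GB) pairs, not taken from Table 3.
datasets = {"large": (900.0, 10_000.0), "small": (30.0, 25.0)}
for alpha in (0.0, 0.5, 1.0):
    adjusted = {name: size_adjusted(s, g, alpha) for name, (s, g) in datasets.items()}
    print(alpha, {k: round(v, 2) for k, v in adjusted.items()})
```

With these invented numbers the larger dataset leads at `alpha` of 0 and 0.5 but the smaller one leads at 1, which shows how strongly the choice of exponent can reorder datasets that differ in volume.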

Discussion
Data usage indicators typify how data are discovered or accessed, and we believe that the DUI as a whole can give curators valuable insight regarding the impact of data on a community of users.Over time, we also believe these indicators can be useful tools for understanding which datasets within an archive would most benefit from additional curation efforts.
Creating new impact assessments can also provide the opportunity to make curation work more visible to two particular stakeholders. Firstly, indicators that signal the impact of a dataset can be used to illustrate the value of a repository to research funding agencies on behalf of a data producer. Secondly, and of equal importance, impact indicators can be used internally by repository staff to assess the effectiveness and value of their own services, systems and workflows. In combination, these indicators can inform how a particular piece of architecture should be redesigned, which datasets should receive curation attention, or even when an additional service, like sub-setting, should be offered to end users.
However, metric development is much like other activities in curation: it is undoubtedly a craft process that requires much practice in order to achieve compelling or even generalizable results. In that vein, there are a number of limitations in the study we have presented here, including the very short time windows used for our assessment, the highly skewed weighting that occurs when a dataset has a small number of users and, of course, the relatively immature development of the DUI as a tool for impact assessment.

Final Thoughts
In conclusion, we return to the three ambitions of this paper:

To innovate with existing quantitative impact assessment techniques
We have shown that many of the DUI's indicators are capable of illustrating which datasets are accessed most often, and how the frequency of access changes over time. Some of these indicators also provide a way to assess how datasets that are smaller or less frequently accessed might still be extremely valuable to particular communities, or during particular time windows. In this sense, different indicators can be used to illustrate both the immediate value of a repository (e.g. how a repository is providing resources for new users) and the ongoing, long-term value of curation work (e.g. maintaining access to less widely known or used resources).

To explicitly and openly discuss the process of developing new research impact metrics
Much of the work described here was beholden to two types of specificities that constrained and shaped the work of developing new research impact metrics: product specificities, which were unique to the digital object being analysed (in our case climate and weather data), and system specificities, which were unique to the architecture of the system being analysed (in our case, the RDA).There are likely to be many other externalities and specificities that constrain the development of new impact assessment techniques, and we hope that future work in this area will continue to openly address these constraints.

To lay a baseline for future curation impact assessments, citation or otherwise
Studies that have compared different assessment techniques often find that combining usage and citation statistics is the best way to make reliable statements about research impact (Bollen et al., 2009). Currently, data citations are being promoted as a way to track data use and provide professional credit for data collection and management (Uhlir, 2012). As these initiatives grow, there will be a significant need to measure their success through citation-based analyses. We believe further developing the DUI can help establish a baseline of reliable usage statistics to complement those efforts. When data citation initiatives fully mature, these two types of statistics can then be combined and more reliably used for curation impact assessments.


Table 1. The indicators from Ingwersen and Chavan's Data Usage Index.

Table 3. Indicator scores for three datasets from the RDA.