Towards a Symbiotic Relationship Between Academic Libraries and Disciplinary Data Repositories: A Dryad and University of Michigan Case Study

Katherine Akers and Jennifer Green

In addition to encouraging the deposit of research data into institutional data repositories, academic librarians can further support research data sharing by facilitating the deposit of data into external disciplinary data repositories. In this paper, we focus on the University of Michigan Library and Dryad, a repository for scientific and medical data, as a case study to explore possible forms of partnership between academic libraries and disciplinary data repositories. We found that although few University of Michigan researchers have submitted data to Dryad, many have recently published articles in Dryad-integrated journals, suggesting significant opportunities for Dryad use on our campus. We suggest that academic libraries could promote the sharing and preservation of science and medical data by becoming Dryad members, purchasing vouchers to cover researchers’ data submission costs, and hosting local curators who could directly work with campus researchers to improve the accuracy and completeness of data packages and thereby increase their potential for reuse. By enabling the use of both institutional and disciplinary data repositories, we argue that academic librarians can achieve greater success in capturing the vast amounts of data that presently fail to depart researchers’ hands and making that data visible to relevant communities of interest.

Received 13 January 2014 | Accepted 26 February 2014
Correspondence should be addressed to Jennifer Green, University of Michigan, Ann Arbor, MI. Email: greenjen@umich.edu
An earlier version of this paper was presented at the 9th International Digital Curation Conference.
The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. The IJDC is published by the University of Edinburgh on behalf of the Digital Curation Centre. ISSN: 1746-8256. URL: http://www.ijdc.net/
Copyright rests with the authors. This work is released under a Creative Commons Attribution (UK) Licence, version 2.0. For details please see http://creativecommons.org/licenses/by/2.0/uk/
International Journal of Digital Curation 2014, Vol. 9, Iss. 1, 119–131. DOI: 10.2218/ijdc.v9i1.306 (http://dx.doi.org/10.2218/ijdc.v9i1.306)


Introduction
Academic libraries are increasingly providing support for the management of research data generated on their campuses (ACRL Research Planning and Review Committee, 2012; Heidorn, 2011; Fearon et al., 2013). One cornerstone of this support is the institutional repository, which provides long-term storage of and access to many types of scholarly outputs, including research data, from a particular university or research institution (ARL Digital Repository Issues Task Force, 2009; Walters, 2007). However, institutional repositories are not the only potential home for research data. International and national organizations, academic societies, and multi-institutional collaborations also host data repositories that tend to be centered around particular disciplines or data types (for example, the Worldwide Protein Data Bank, the British Atmospheric Data Centre, the Inter-university Consortium for Political and Social Research, and the Ecological Society of America Data Registry).
From some perspectives, institutional and disciplinary data repositories could be considered as being in competition. That is, academic librarians who are motivated to capture and preserve the scholarly record of their specific institution may choose to promote the deposit of research data into their institutional repository instead of disciplinary repositories. However, where to deposit research data to ensure its preservation and future access need not be a binary decision. Rather, the relationship between institutional and disciplinary repositories can be mutually beneficial, or symbiotic, to both individual universities and larger research communities (Lynch, 2003). For instance, institutional, national, and international data repositories can be considered as different and ascending tiers of a 'Data Pyramid' (The Royal Society, 2012), with institutional repositories initially collecting a large swath of datasets that might otherwise be discarded or lost, and national or international repositories then committing to preserving and ensuring access to those datasets with the highest value, thereby increasing the visibility of the data to relevant communities of interest (Hodson, 2012). This positive outcome would depend on active partnerships between academic institutions and disciplinary data repositories. For instance, by virtue of being close to the source of research data, academic librarians or other local data curators could work directly with researchers to process and review data, create metadata and provide contextual information, address data sensitivity concerns, and ingest data into institutional repositories, after which 'archive-ready' data packages could be pushed into disciplinary repositories for long-term preservation (Green & Gutmann, 2007; Steinhart, 2013).
Due to the complexity of the research data ecosystem (Borgman, 2012; National Science and Technology Council, 2009) and some perceived inadequacies of current institutional repository systems (Murray-Rust, 2011; Salo, 2008; Davis & Connolly, 2007), institutional repositories alone are unlikely to meet all data preservation and sharing needs of individual researchers or larger research communities. Instead, greater success in capturing the vast amounts of data that presently fail to depart the hands of the original researchers might be achieved by enabling the use of both institutional and disciplinary data repositories. Here, using the Dryad data repository and the University of Michigan Library as a case study, we suggest how academic libraries could forge and foster partnerships with disciplinary data repositories to more effectively support researchers on their campuses and to benefit the greater scholarly community.

Overview of Dryad
Having grown out of a collaboration among the University of North Carolina at Chapel Hill, North Carolina State University, and Duke University, Dryad is a repository that holds the research data underlying articles published in peer-reviewed journals or other scholarly documents, such as books or dissertations. Although Dryad was initially developed as a repository for ecology and evolutionary biology data, it has since transformed into a general repository for scientific and medical data. Built using open source DSpace software, Dryad accommodates all digital data formats (such as text, spreadsheets, video, images, and software code). Depending on the journal in which the associated article is published, researchers can deposit data either before peer review (allowing restricted access by journal editors and peer reviewers) or after article acceptance (allowing unrestricted access after any necessary embargo). Deposited data are free to download with no legal barriers to re-use via a Creative Commons Zero (CC0) waiver. Deposited data are stored locally at North Carolina State University, and a commitment is made to preserve data in perpetuity via the CLOCKSS network. At Dryad's discretion, the formats of files may be migrated to improve the accessibility of their contents or preservation potential. Dryad is a non-profit organization that is currently supported by funding from the National Science Foundation, although its main source of financial sustainability is expected to come from data submission fees, which were put into place in 2013.
Submission of data packages to Dryad is primarily a researcher-driven process supported by a light amount of behind-the-scenes curation. After registering for a Dryad account, researchers describe the journal article associated with the data package (i.e., title, authors, journal name, abstract, DOI, volume, year, keywords, taxonomic names, geographic areas, geological timespan) and then upload and describe each file in the data package (i.e., title, description of file contents, author, embargo length, keywords, taxonomic names, geographic areas, geological timespan). The number of metadata fields is deliberately kept to a minimum so as not to overburden researchers, with submission taking no longer than 15 minutes. After researchers submit their data, Dryad curators check that files open properly, are named appropriately and not duplicated, contain what look to be the correct data, and appear to have reasonable metadata at both data file and package levels. They do not check the scientific validity, veracity, accuracy, or completeness of file contents, leaving this responsibility to the researchers themselves. Curators then register digital object identifiers (DOIs) for data packages using the California Digital Library's EZID and send acceptance emails to the researchers and journal contacts. Their goal is to fully ingest data files within two business days.
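As a rough illustration, the article-level and file-level metadata fields listed above could be modelled as two simple records. This is a sketch only: the class names, field names, and the helper functions `empty_fields` and `duplicate_file_titles` are hypothetical and do not reflect Dryad's actual schema, software, or curation tooling.

```python
from collections import Counter
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class FileRecord:
    """File-level metadata, using the fields named in the text (hypothetical names)."""
    title: str = ""
    description: str = ""
    author: str = ""
    embargo_length: str = ""
    keywords: list[str] = field(default_factory=list)
    taxonomic_names: list[str] = field(default_factory=list)
    geographic_areas: list[str] = field(default_factory=list)
    geological_timespan: str = ""


@dataclass
class DataPackage:
    """Article-level metadata plus the files in the package (hypothetical names)."""
    article_title: str = ""
    authors: list[str] = field(default_factory=list)
    journal_name: str = ""
    abstract: str = ""
    doi: str = ""
    volume: str = ""
    year: Optional[int] = None
    keywords: list[str] = field(default_factory=list)
    taxonomic_names: list[str] = field(default_factory=list)
    geographic_areas: list[str] = field(default_factory=list)
    geological_timespan: str = ""
    files: list[FileRecord] = field(default_factory=list)


def empty_fields(record) -> list[str]:
    """Return the names of fields left blank, as a crude 'reasonable metadata' check."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


def duplicate_file_titles(package: DataPackage) -> list[str]:
    """Return file titles that appear more than once in the package."""
    counts = Counter(f.title for f in package.files)
    return [title for title, n in counts.items() if n > 1]
```

A check of this kind covers only the surface-level review described above (presence of metadata, duplicated files); it says nothing about the scientific validity of file contents, which remains the researchers' responsibility.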
The use of Dryad confers several benefits to researchers. Deposit of data into Dryad ensures that data are openly accessible, enabling compliance with journal publisher or funding agency requirements. The assignment of DOIs to data packages permits the persistent indexing and formal citation of data, allowing researchers to share permanent links to their datasets and to get credit when other researchers refer to or re-use those datasets. Data packages in Dryad are indexed by Thomson Reuters' Data Citation Index, which keeps citation records and provides a single point of access to research data housed in a growing number of data repositories. Dryad also provides researchers with data package-level metrics, including number of page views and number of downloads. In late 2013, each data package housed in Dryad had been downloaded a median of 38 times, providing evidence that Dryad gives visibility to the data underlying scientific publications that would otherwise be hidden or lost.

Organizational Partnerships with Dryad
Dryad invites the formation of partnerships with a broad array of stakeholders, including journal publishers, academic societies, libraries, universities and research institutions, and funding agencies. These partnerships can take three different forms:
- Organizations that publish journals can coordinate their article submission process with Dryad's data submission process;
- Organizations can become members and participate in the governance of Dryad; and/or
- Organizations can choose a pricing plan to cover the cost of data submission for affiliated researchers.

Journal Integration
Publishers of journals can coordinate their article submission process with Dryad's data submission process, which allows researchers to submit their data faster and more easily. This 'integrated submission' is a free service provided by Dryad. Publishers first complete a questionnaire about their article submission process, then discuss possible integration options with a Dryad representative, and finally undergo simple testing before implementation. As of late 2013, there were a total of 39 Dryad-integrated journals, with many more at some stage of completing the integration process.

Membership
Any type of scholarly organization can become a Dryad member. As of late 2013, Dryad had 28 member organizations, consisting primarily of scientific journal publishers (e.g., PLoS, Oxford University Press, Wiley Blackwell) but also including professional associations/societies (e.g., AAAS, British Ecological Society, American Society of Naturalists), one government agency (US Fish and Wildlife Service), and one national library (German National Library of Medicine). A Dryad membership provides organizations with two main benefits: the standing to nominate and elect Dryad Board of Director members and vote on amendments to bylaws, and a discount on pricing plans for data submission fees.

Data Submission Pricing Plans
Any type of scholarly organization can elect to cover the cost of data submission fees (starting at $80 USD per data package) for affiliated researchers. Dryad offers three different pricing plans, with member organizations receiving discounted pricing.
- Vouchers: Any organization can purchase and distribute vouchers to individual researchers to cover the cost of future data package submissions.
- Deferred payment: Organizations that publish journals can be invoiced for the submission of data packages associated with articles published in their journal(s) during the previous quarter.
- Subscription: Organizations that publish journals can fund an unlimited number of data package submissions by paying a fixed fee based on the total number of articles published in their journal(s) in the prior year.

Opportunities for Academic Libraries
Academic libraries seeking to provide more comprehensive support for the data preservation and sharing needs of researchers on their campuses could consider entering into formal relationships with disciplinary data repositories. The growing number of prominent organizations choosing to partner with Dryad serves as evidence that Dryad is evolving into a trustworthy repository for scientific and medical research data. Moreover, as Dryad explicitly invites scholarly organizations, including libraries, to become members, partnering with Dryad may be a particularly promising endeavor for academic libraries. Indeed, Dryad recently welcomed its first library member, the German National Library of Medicine, which will be covering the costs of Dryad data package submissions for researchers publishing in its open access journals GMS German Medical Science and Medizinische Informatik, Biometrie und Epidemiologie.
Here, we highlight some of the existing forms of Dryad partnerships that might be most relevant to academic libraries. Furthermore, we suggest the possibility of a new type of partnership between academic libraries and Dryad that could be realized in the future.

Dryad Membership
Dryad member organizations choose a representative to attend the annual Dryad membership meeting and vote on Dryad Board of Director members, bylaws, and budgets. At this meeting, attendees also learn about the progress of Dryad, discuss its future direction, and hear about emerging issues from Board of Director members and other leaders in scholarly publishing and data preservation. Becoming a Dryad member would thus enable academic libraries to establish a relationship with this emerging data repository and maintain awareness of its advances in data preservation and publication. Furthermore, academic libraries could potentially shape Dryad's development through voting and conversations with administrators. In this regard, it is notable that the majority of current member organizations are journal publishers, whose interests and values around data and other scholarly materials may at times conflict with those of researchers, libraries, and the greater academic community. The inclusion of academic libraries into the circle of Dryad member organizations would therefore widen the pool of stakeholder perspectives that guide the future of Dryad and shape the larger research data ecosystem. Currently, the annual Dryad membership fee for organizations is $1,000 if annual gross income is below $10M and $5,000 if annual gross income is above $10M. For organizations such as academic libraries, the annual membership fee is based on annual budget instead of gross income (L. Wendell, personal communication, September 19, 2013). The chance for academic libraries to enter into conversations with Dryad administrators, journal publishers, and academic societies and to potentially influence the development of Dryad as a repository and outlet for scientific research data may be worth the annual membership cost.

Purchase of Dryad Vouchers
Scholarly organizations can choose to cover the cost of Dryad data package submissions for affiliated researchers via one of three pricing plans, with Dryad member organizations receiving discounted pricing. Two pricing plans, the deferred payment and subscription plans, are linked to particular journals and thus are designed for publishers or other organizations that publish journals. However, the voucher plan is tied to individual researchers rather than particular journals and therefore would be well suited to academic libraries. Libraries could purchase vouchers (25 voucher minimum: $65/voucher for members or $70/voucher for non-members) that would cover the cost of future Dryad data package submissions. The vouchers are codes that libraries could distribute to individual researchers, who then enter the codes during the data submission process. The voucher plan is completely flexible; vouchers can be used by any researcher who publishes in any journal (i.e., they are not restricted to researchers who publish in Dryad-integrated journals). Libraries would be free to set their own priorities and systems for voucher distribution. For instance, vouchers could be preferentially given to graduate students, post-docs, or assistant professors to foster data sharing among younger scientists or to researchers who publish articles in open access journals to further promote a culture of open science. Dryad sends organizations monthly statements containing information on which vouchers were used and the names of the researchers who used them.
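The voucher prices and membership fees quoted in this paper allow a simple cost comparison. The sketch below is illustrative only: the function names are hypothetical, and the membership fee passed to `break_even_vouchers` is an assumption, since the fee for academic libraries is based on annual budget and may differ from the $1,000 and $5,000 tiers.

```python
import math

# Figures quoted in the text (2013 pricing).
MIN_VOUCHERS = 25
MEMBER_PRICE = 65       # USD per voucher for member organizations
NON_MEMBER_PRICE = 70   # USD per voucher for non-members


def voucher_cost(n_vouchers: int, member: bool) -> int:
    """Total cost in USD of a voucher purchase, enforcing the 25-voucher minimum."""
    if n_vouchers < MIN_VOUCHERS:
        raise ValueError(f"minimum purchase is {MIN_VOUCHERS} vouchers")
    price = MEMBER_PRICE if member else NON_MEMBER_PRICE
    return n_vouchers * price


def break_even_vouchers(membership_fee: int) -> int:
    """Number of vouchers at which the member discount offsets the membership fee."""
    saving_per_voucher = NON_MEMBER_PRICE - MEMBER_PRICE
    return math.ceil(membership_fee / saving_per_voucher)


print(voucher_cost(25, member=True))    # 1625
print(voucher_cost(25, member=False))   # 1750
print(break_even_vouchers(1000))        # 200
```

Under these figures, a minimum purchase costs $1,625 for a member and $1,750 for a non-member, and the $5 per-voucher discount alone would recover an assumed $1,000 membership fee only after 200 vouchers. For a small pilot, then, the case for membership rests on governance participation more than on the discount.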

Local Dryad Curator
In addition to the existing forms of partnership, a more significant relationship between academic libraries and the Dryad data repository could take the form of local, library-based Dryad curators. The Dryad curation team is currently based at the University of North Carolina (UNC) Metadata Research Center and consists of one full-time senior curator and three UNC School of Information and Library Science students who work as assistant curators. These curators oversee the backend of the data submission process and communicate with researchers and journal contacts. As the volume of data submissions to Dryad increases, the possibility emerges of having library-based assistant curators who can remotely ingest data into Dryad. These library-based assistant curators could be graduate students in library and information science programs, who would take local courses on data curation, metadata, and digital preservation, and travel to the UNC Metadata Research Center to receive training on the Dryad data curation workflow. After returning to their home institutions, assistant curators could market Dryad to relevant populations of researchers, find individual researchers to serve as Dryad adopters, identify datasets that could be submitted to Dryad, remotely assist in the ingest of data packages into Dryad, and serve as liaisons among library and information schools, academic libraries, and relevant science and medical departments. Moreover, because local assistant curators could directly interact with researchers, they could play a key role in adding value to data and thereby increase the likelihood that data housed in Dryad will be meaningful and reusable by others in the future. That is, local assistant curators could help gather contextual information describing the purpose and process of data collection, identify and properly deal with missing or incorrect values, check that summary statistics match those reported in the associated journal article, ensure that individual data items (i.e., spreadsheet rows or columns) are adequately described in codebooks or 'readme' files, verify that associated computer code runs properly, and convert data files into nonproprietary formats when possible (A. Green, personal communication, October 25, 2013).
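Two of the value-adding checks described above, flagging missing values and confirming that a summary statistic matches the published figure, can be sketched in a few lines. This is a minimal example under stated assumptions: the function names, the set of missing-value markers, and the rounding tolerance are all hypothetical choices, not part of any Dryad workflow.

```python
from statistics import mean

# Assumed markers for missing values; real datasets may use others.
MISSING_MARKERS = ("", "NA", None)


def count_missing(values) -> int:
    """Count entries in a data column that are recorded as missing."""
    return sum(1 for v in values if v in MISSING_MARKERS)


def mean_matches_report(values, reported_mean: float, decimals: int = 2) -> bool:
    """Check that the column mean agrees with the published mean.

    Agreement is judged to the number of decimal places printed in the
    article, i.e. within half a unit of the last reported digit.
    """
    observed = [float(v) for v in values if v not in MISSING_MARKERS]
    tolerance = 0.5 * 10 ** -decimals
    return abs(mean(observed) - reported_mean) <= tolerance


column = ["1.0", "2.0", "NA", "4.0"]
print(count_missing(column))               # 1
print(mean_matches_report(column, 2.33))   # True
print(mean_matches_report(column, 2.50))   # False
```

A mismatch of this kind would not by itself show that the data are wrong; it would simply prompt the curator to ask the researcher which file, subset, or treatment of missing values produced the published figure.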
At present, however, the idea of local Dryad assistant curators may be premature. Although Dryad is seeing a rapid increase in the number of partnering journals and data submissions, Dryad administrators point out that the present workforce is sufficient to handle the current stream of submissions and that there has not been a large enough volume of data submissions from any single institution to warrant remote assistant curators (J. Greenberg, personal communication, August 21, 2013). Furthermore, the data curation workflow at Dryad is still in flux, with much time and effort being devoted to dealing with exceptions to the general workflow. As such, at this point in time, the behind-the-scenes data curation must occur centrally at the UNC Metadata Research Center. However, as Dryad continues to mature and establish routine data curation workflows, the addition of remote assistant curators could be a practical and valuable option. Not only would local assistant Dryad curators specifically serve researchers at their institutions and thus expand their libraries' support of research data management, they could also provide a deeper layer of data curation than that currently offered by Dryad by ensuring that submitted data and metadata are sufficiently accurate and complete to permit future re-use. Moreover, datasets submitted to Dryad could also be ingested by institutional data repositories, thereby enhancing the institutions' scholarly records, and insights gained by assistant curators from their hands-on experience with Dryad could be applied to the continued development of institutional data repositories.

Case Study: The University of Michigan
The University of Michigan Library is developing a network of services, called Research Data Services (RDS), to support the management of data throughout all phases of the research lifecycle. One component of RDS that is currently under consideration is the provision of infrastructure to support the long-term storage and sharing of research data created on our campus. Although Deep Blue, the library-hosted institutional repository, currently contains some research datasets, we recognize that this is not the ideal system to house research data, as it does not provide sufficient data visibility or discovery to be appealing to most researchers. To close this gap, a library task force is currently investigating other solutions for the medium- to long-term storage of and access to data generated by researchers at the University of Michigan. However, recognizing the value of disciplinary data repositories, we acknowledge that the most effective approach to supporting our researchers' data management needs may be to provide internal means for data preservation and sharing and to facilitate the use of external, disciplinary data repositories.
Our library already has formal relationships, which vary in depth, with repositories for social science data. For instance, as an institutional member of the Roper Center for Public Opinion Research, we provide University of Michigan researchers with access to their data collections and receive periodic reports of usage statistics. A deeper relationship exists between our library and the Inter-University Consortium for Political and Social Research (ICPSR). As an institutional member, we provide campus researchers with access to ICPSR data collections and designate librarians to serve as ICPSR representatives, who assist researchers with accessing and working with ICPSR data, attend an ICPSR biennial meeting, and vote on ICPSR Council members. As we develop a more complex service model around research data, we have recently been considering new types of relationships with external data repositories, such as Dryad, not only to help researchers locate and use existing data but also to help researchers disseminate the products of their research by depositing their data in appropriate places.
To estimate the potential uptake of Dryad use on our campus, we first determined the number of articles authored by University of Michigan researchers that were recently published in Dryad-integrated journals. A Web of Science search revealed that there were 91 such articles in 2012 and 2013 (Table 1). Next, we determined whether University of Michigan researchers have deposited data into Dryad. As researchers are not asked to provide institutional affiliations for themselves or their co-authors during the Dryad data submission process, neither Dryad nor Thomson Reuters' Data Citation Index can be searched for data from particular institutions. However, direct communication with Dryad administrators revealed that as of late 2013, University of Michigan researchers had deposited eight data packages into Dryad (L. Wendell, personal communication, October 29, 2013). Six of these data packages were associated with articles in Dryad-integrated journals (Table 1), and two were associated with articles in non-Dryad-integrated journals (Journal of Biogeography and Molecular Biology and Evolution). Four other data packages had been submitted to Dryad and were in the process of review. Finally, as most data packages currently housed in Dryad pertain to ecology and evolutionary biology, we scanned through the personal websites of faculty within the Ecology and Evolutionary Biology Department at the University of Michigan. We found that several researchers who have published in Dryad-integrated journals have also publicly posted research data or computer code on their websites. Also, one faculty member is a co-editor-in-chief of a new open access journal that recently integrated with Dryad. Therefore, we speculate that although submission of data to Dryad may not currently be standard practice among University of Michigan researchers, its potential for adoption could be significant.
By becoming a Dryad member, our library could take advantage of a discounted pricing plan to financially assist University of Michigan science or medical researchers with submitting their data to Dryad. In recent years, the library participated in the Compact for Open-Access Publishing Equity by covering or subsidizing researchers' costs of publishing articles in open access journals. This venture was a success by all measures, with dedicated funding quickly allocated to the publication of nearly 40 articles in open access journals. In a similar fashion, the library could purchase vouchers to cover researchers' costs of submitting data to Dryad. To pilot this program, the library could purchase the minimum number of vouchers and establish a system for their dispersal to University of Michigan researchers, such as directly marketing the program to relevant departments or requesting that Dryad place a note on their website instructing University of Michigan researchers to contact the library for financial assistance.
Furthermore, our library system encompasses Michigan Publishing, the university press, which publishes several scientific and medical journals, such as The Michigan Botanist, Archive for Organic Chemistry (Arkivoc), Journal of Anthropological Research, and Journal of Muslim Mental Health. Therefore, another route of expanding our partnership with Dryad would be to integrate Michigan Publishing's journal article submission process with Dryad's data submission process. Vouchers could also be made available to researchers who publish in Michigan Publishing journals, whether they are based at the University of Michigan or at other institutions.
Finally, should the idea of local Dryad assistant curators come to fruition in the future, our library could host a University of Michigan/Dryad fellow or intern. This individual, potentially a graduate student in the University of Michigan's School of Information or a Council on Library and Information Resources (CLIR) postdoctoral fellow, could liaise among the library, relevant departments, Michigan Publishing, and Dryad to provide focused research data management support tailored to specific populations of University of Michigan researchers, such as faculty in the Ecology and Evolutionary Biology Department. Furthermore, this fellow or intern could learn best practices in data curation from the neighboring ICPSR, which could guide efforts to add more value to datasets prior to their submission to Dryad and any institutional data repository. Regardless of the existence of this specialized position, representatives of the library, such as science or data librarians, could actively reach out to scientists and medical researchers at the University of Michigan and promote the use of the Dryad repository to disseminate the research data underlying their journal articles.

Conclusion
Managing data across all phases of the research lifecycle, including ensuring its long-term accessibility, is a complicated and challenging task that can be aided by a more robust network of institutional and disciplinary data repositories. Our intent here is not to suggest that institutional data repositories are unimportant. Rather, institutional data repositories can play a vital role in bridging the gap between the vast amounts of research data that are currently hidden in personal hard drives and university servers, and the small amounts of research data that are placed in national and international disciplinary data repositories (Hodson, 2012; Lynch, 2003). However, researchers may tend to personally align with their disciplinary communities more than their institutions (Davis & Connolly, 2007; Foster & Gibbons, 2005; Erway, 2012), and disciplinary repositories may be more likely to enhance the visibility of data to specific communities of interest. Therefore, to provide the most effective and meaningful support for research data management, academic libraries must go beyond promoting the deposit of data into institutional repositories and actively seek to partner with major disciplinary data repositories. We suggest ways that academic libraries could form relationships with the Dryad repository for scientific and medical data, such as becoming a Dryad member organization, providing financial assistance to campus researchers who wish to submit their data to Dryad, promoting the use of Dryad to relevant local departments and research groups, and directly working with researchers and their data to increase the likelihood that data will be understandable and useable by others in the future. Apart from Dryad, academic libraries could further promote the preservation and sharing of research data by partnering with other data repositories and/or other stakeholders in the research data ecosystem to facilitate the transfer of research data between institutional and disciplinary repositories, including pushing data into disciplinary repositories to increase their visibility and harvesting research data from disciplinary repositories to deepen the holdings of particular institutions.