Disciplinary Differences in Faculty Research Data Management Practices and Perspectives

Academic librarians are increasingly engaging in data curation by providing infrastructure (e.g., institutional repositories) and offering services (e.g., data management plan consultation) to support the management of research data on their campuses. Efforts to develop these resources may benefit from a greater understanding of disciplinary differences in research data management needs. After conducting a survey of data management practices and perspectives at our research university, we categorized faculty members into four research domains – arts and humanities, social sciences, medical sciences, and basic sciences – and analyzed variations in their patterns of survey responses. We found statistically significant differences among the four research domains for nearly every survey item, revealing important disciplinary distinctions in data management actions, attitudes and interest in support services. Serious consideration of both the similarities and dissimilarities among disciplines will help guide academic librarians and other data curation professionals in developing a range of data management services that can be tailored to the unique needs of different scholarly researchers.


Introduction
Research data need not merely serve as material underlying conference papers, journal articles, and books. Rather, if documented, preserved and made accessible, datasets can stand alone as scholarly products with the potential to impact future research (Williford and Henry, 2012). Academic librarians are increasingly becoming engaged in data curation by providing infrastructure and developing services to support the management of research data on their campuses (ACRL Research Planning and Review Committee, 2012; Heidorn, 2011; Monastersky, 2013; Olendorf & Koch, 2012; Reznik-Zellen, Adamick and McGinty, 2012; Soehner, Steeves and Ward, 2010; Starr, Willett, Federer, Horning and Bergstrom, 2012; Tenopir, Birch and Allard, 2012). However, the data management needs of researchers vary substantially across disciplines. Not only do researchers in the humanities, social sciences and natural sciences create datasets that differ in size and content, they are also enmeshed in diverse research cultures and communities of interest with different attitudes toward and expectancies for data sharing and archiving (Digital Curation Centre, 2012; Borgman, 2012; Palmer and Cragin, 2008). Effective data curation, therefore, requires an understanding of these disciplinary distinctions and the development of services that are tailored to different populations of academic researchers (Cragin, Palmer, Carlson and Witt, 2010).
The Electronic Data Center at the Emory University Libraries has a long history of helping researchers locate, acquire and prepare data for analysis during the early stages of data collection. Recently, we have begun to expand this support to encompass data management planning as well as the documentation, sharing and preservation of research data. To help guide the development of new services to support the management of data across all phases of the research lifecycle, we followed the lead of other academic libraries (e.g., Bardyn, Resnick and Camina, 2012; Gu, Averkamp, Walton, Saylor and Soderdahl, 2012; Parsons, Grimshaw and Williamson, 2013; Raggett, 2012; Scaramozzino, Ramirez and McGaughey, 2012; Wells Parham, Bodnar and Fuchs, 2010; Westra, 2010) and conducted a campus-wide survey of research data management practices and perspectives. As Emory University comprises several colleges and professional schools spanning a wide range of disciplines, we were particularly interested in whether faculty members in different fields of study have varying approaches to research data management or preferences for particular data management-related services. Therefore, we categorized faculty members into four broad research domains – the arts and humanities, social sciences, medical sciences, and basic sciences – and analyzed differences in their patterns of survey responses.
email invitation to all employees with faculty status, as classified by Human Resources, to voluntarily complete a brief online survey. The survey contained 13 possible questions (see Table 1 in the Appendix) and was administered using Qualtrics software. It was open for four weeks, and three email reminders were sent at one-week intervals. Survey respondents were allowed to move forward in the survey without answering each question. For multiple choice questions, respondents were given the choice of selecting 'other' and writing in a text response.

Survey Data Cleaning
In some cases (25 out of 1,650 possible cases, < 2%), survey respondents did not select a pre-defined answer to multiple choice questions but instead selected 'other' and provided a text response that clearly matched one or more of the pre-defined answers. In these cases, survey responses were overwritten to reflect the information provided in the text response.
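The recoding step described above can be sketched in a few lines of Python. This is only an illustration: the option labels, keyword cues and the `recode_other` helper are hypothetical, and the source does not say whether the matching was done by hand or by script (careful manual judgment seems likely).

```python
# Hypothetical pre-defined answer options with keyword cues (not from the survey).
PREDEFINED = {
    "external hard drive": ("external", "usb", "thumb drive"),
    "internet-based storage": ("dropbox", "google drive", "cloud"),
}


def recode_other(selected, other_text):
    """Fold an 'other' write-in into pre-defined options when it clearly matches."""
    if "other" not in selected or not other_text:
        return set(selected)
    text = other_text.lower()
    matches = {
        option
        for option, cues in PREDEFINED.items()
        if any(cue in text for cue in cues)
    }
    if not matches:
        return set(selected)  # no clear match: the 'other' response is kept
    return (set(selected) - {"other"}) | matches


# Share of recoded cases reported in the text: 25 of 1,650 possible cases.
recoded_share = 25 / 1650  # about 0.015 as a proportion

print(recode_other({"other"}, "I keep everything on Dropbox"))
print(f"recoded share: {recoded_share:.3f}")
```

Keyword matching is a stand-in here; any rule that overwrites a response only when the write-in clearly corresponds to a pre-defined answer would serve the same purpose.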

Categorization of Survey Respondents
To evaluate the differences in patterns of survey responses among researchers in different fields of study, we placed each survey respondent into one of four different categories: arts and humanities, social science, medical science or basic science. In some cases, respondents were categorized solely based upon their primary departmental affiliation (e.g., all respondents from the Art History department were assigned to arts and humanities). In other cases, respondents were categorized based on their specific research topics and methodology (e.g., some respondents from the Psychology department were assigned to social science and others to basic science).
Arts and humanities respondents were faculty members in area studies, art history, classics, creative writing, history, languages, liberal arts, literature, music, philosophy, religion, theater studies and theology. Social science respondents were faculty members in anthropology, business, economics, educational studies, journalism, law, linguistics, medicine, political science, psychology, public health, sociology and women's studies. Medical science respondents were faculty members in biostatistics and medicine. Basic science respondents were faculty members in biology, biostatistics, chemistry, environmental studies, medicine, physics and psychology.
We considered medical science as research conducted in a clinical setting or otherwise 'applied' in nature, whereas basic science is research conducted in a laboratory or field setting or otherwise 'experimental' or 'observational' in nature.

Statistical Analysis
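The representativeness of our samples was calculated for each Emory University liberal arts college (Emory College of Arts and Sciences, Oxford College) and professional school (Candler School of Theology, Goizueta Business School, Nell Hodgson Woodruff School of Nursing, Rollins School of Public Health, School of Law, School of Medicine) using Z-tests. Differences in survey responses among different fields of study (arts and humanities, social science, medical science, and basic science) were evaluated using chi-square (χ²) tests. Statistical significance was set at p < 0.05. In the figures, overall responses regardless of field of study are shown as gray bars. When statistically significant differences among groups were observed, the bars were overlaid by markers corresponding to the different fields of study. Complete results of all statistical analyses are provided in the Appendix (Tables 1 and 2). Survey data are available at the Emory University Dataverse (Akers and Doty, 2013).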
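As a rough illustration of a chi-square test of independence across the four fields of study, the following Python sketch computes the statistic for a single yes/no survey item. The counts are invented for illustration (only the row totals match the group sizes reported in this paper), the `chi_square_statistic` helper is hypothetical, and the source does not say which statistical software was actually used.

```python
# Hypothetical (yes, no) counts for one survey item, by field of study.
# These numbers are invented for illustration; they are not survey results.
observed = {
    "arts and humanities": (10, 44),
    "social sciences": (25, 53),
    "medical sciences": (40, 84),
    "basic sciences": (45, 29),
}


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for count, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Standard critical values of the chi-square distribution at p = 0.05.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815}

stat, dof = chi_square_statistic(observed)
significant = stat > CRITICAL_0_05[dof]
print(f"chi-square = {stat:.2f}, df = {dof}, significant at 0.05: {significant}")
```

Comparing the statistic with a tabulated critical value avoids any dependency beyond the standard library; a statistics package would report an exact p-value instead.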

Survey Respondents
From a total of 5,590 Emory University faculty members, 456 initiated the survey (~8% response rate). Of these, 330 answered 'yes' to an initial question of whether they conduct research that generates some type of data and provided answers to at least one subsequent survey question. All analyses focused only on these 330 faculty members, who represented all of Emory University's major schools and colleges. Statistically representative samples of faculty members were obtained from the Candler School of Theology, Goizueta Business School, Oxford College, Nell Hodgson Woodruff School of Nursing and Rollins School of Public Health; an over-representative sample was obtained from the College of Arts and Sciences; and under-representative samples were obtained from the School of Law and School of Medicine (see Table 2 in the Appendix). The overall margin of error was ± 5%.
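The response rate and margin of error can be checked with a short calculation. The sketch below assumes the conventional formula for a proportion at 95% confidence (z = 1.96, p = 0.5) with a finite population correction, applied to the 330 analyzed respondents; the source does not state which formula or sample size was used, so both choices are assumptions.

```python
import math

population = 5590  # total faculty members
initiated = 456    # faculty who initiated the survey
analyzed = 330     # respondents included in the analyses

response_rate = initiated / population


def margin_of_error(n, N, p=0.5, z=1.96):
    """Margin of error for a proportion, with finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((N - n) / (N - 1))  # shrinks the error for a finite population
    return z * standard_error * fpc


moe = margin_of_error(analyzed, population)
print(f"response rate: {response_rate:.1%}")
print(f"margin of error: +/- {moe:.1%}")
```

Under these assumptions the results round to the reported figures of roughly 8% and ± 5%.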
Depending on their research topics and methodology, faculty members were categorized as conducting research in arts and humanities (n = 54), social sciences (n = 78), medical sciences (n = 124), or basic sciences (n = 74).

Data Storage and Back-Up
Overall, the amount of digital research data currently stored by individual faculty researchers at Emory University mostly falls within the gigabyte range (Figure 1A). However, compared to researchers in other fields of study, basic science researchers are more likely to have larger quantities (i.e., terabytes) of data, and arts and humanities researchers are more likely to state that they do not know how much data they are storing. The most common methods of storing or backing up data are via desktop or laptop computer hard drives, external hard drives (including USB drives), and university- or department-based servers (Figure 1B). Basic science researchers are more likely to rely on external hard drives, university-based servers, the hard drives of the instruments used to collect data, and lab books, field notes, or other printed/handwritten materials. By contrast, arts and humanities researchers are more likely to rely on computer hard drives and internet-based storage services, such as Dropbox and Google Drive. There were no significant differences among fields of study in use of CDs, DVDs, tapes or 'other' methods of data storage and back-up.

Data Management Planning
Overall, most (~82%) faculty researchers are only somewhat or not at all familiar with requirements for data management or data sharing plans as components of grant applications for federal funding agencies, such as the National Science Foundation (NSF), National Institutes of Health (NIH) and the National Endowment for the Humanities (NEH) (Figure 2). Furthermore, arts and humanities researchers are most likely to be completely unfamiliar with these funding agency requirements for data management plans.

Data Sharing
Most faculty researchers at Emory University do not currently share their research data with people outside of their research group (Figure 3A), although researchers in basic sciences were more likely to share their data than researchers in other fields of study.
Considering researchers who do share their data (i.e., those who answered 'yes' to the question shown in Figure 3A), emailing data upon request is the most common method of sharing data (Figure 3B). However, social science researchers are least likely to share data via email, and medical science researchers are least likely to post data on personal websites. Basic science researchers are most likely to share data via supplementary material linked to journal articles (e.g., supplementary datasets hosted by the Public Library of Science (PLoS) journals or links to datasets deposited in the Dryad repository) or posted on department or university websites. No arts and humanities researchers share their data via data repositories or databanks. There were no significant differences among researchers in 'other' ways of sharing data, the most frequently noted of which were internet storage services (e.g., Dropbox, Microsoft SkyDrive, Amazon Web Services) and sponsored accounts or restricted access to university-based servers.
Considering all faculty (i.e., those who answered 'yes' or 'no' to the question shown in Figure 3A), the vast majority are willing to share their data with other researchers (e.g., principal investigators, students, staff) working on the same projects (Figure 3C), although arts and humanities researchers are least willing to do so. However, fewer faculty researchers are willing to share their data with a wider audience. Most medical science researchers are not willing to share data with researchers outside of their projects or with instructors interested in using the data as a teaching tool. Arts and humanities researchers, however, are more willing to share their data with the general public than researchers in other fields of study. Interestingly, nearly half of all faculty members are not willing to share their data with project funders. There were also no significant differences among fields of study in the proportion of researchers willing to share data with 'other' individuals or not willing to share data with anyone. For those who selected 'other', the most common write-in response was that the individuals with whom they are willing to share their data depend on whether manuscripts related to the data have been published and whether the data contain sensitive information. A few faculty members stated they are willing to share their data with the Institutional Review Board (IRB) or university administrators.

The International Journal of Digital Curation, Volume 8, Issue 2 | 2013
We next asked what reasons might prevent faculty researchers from sharing data with people outside of their research group (Figure 3D). We found that the top three reasons for not sharing data more widely are: (1) the data contain personal or sensitive information; (2) researchers might not get credit for their data in terms of acknowledgement, citation, or authorship; and (3) the data might be misinterpreted or misused.
Medical and social science researchers are most likely to not share data because they contain personal or sensitive information, or require secure and/or restricted access, whereas researchers in the arts and humanities are least likely to share these concerns. Researchers in basic sciences and arts and humanities are more concerned that they might not get credit for their data than researchers in medical and social sciences. Basic and medical science researchers are more likely to withhold data because the outputs of their research could be patentable or commercialized. There were no differences among fields of study in concerns that data might be misinterpreted or misused, sharing would require too much time and effort, data may be of little value to others, researchers are not licensed to share their data, or 'other' reasons preventing data sharing. Those who selected 'other' stated that data sharing is prevented by the IRB or the Health Insurance Portability and Accountability Act (HIPAA), that data would not be shared if still in the collection or analysis phases or if manuscripts related to the data had not yet been published, or that data stored on university servers are not easily accessible by others.

Data Preservation
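The vast majority of faculty researchers do not deposit their data in data repositories or databanks (Figure 4A). However, researchers in basic sciences are more likely to do so than researchers in other fields of study. The most commonly used data repositories/databanks (reported by those who answered 'yes' to the question shown in Figure 4A) are those provided by the National Center for Biotechnology Information – including GenBank, Sequence Read Archive (SRA), Gene Expression Omnibus (GEO), and Database of Genotypes and Phenotypes (dbGaP) – and Protein Data Bank (PDB). Less commonly used data repositories/databanks include Yale University's NeuronDB and ModelDB, the Cambridge Crystallographic Data Centre (CCDC), Mouse Genome Informatics (MGI), Alzheimer's Disease Neuroimaging Initiative (ADNI), HIV Drug Resistance Database (HIVdb), Rutgers University's Cell and DNA Repository, Collaborative Initiative on Fetal Alcohol Spectrum Disorders (CIFASD), the Dataverse Network, and the Inter-university Consortium for Political and Social Research (ICPSR).
Considering those faculty who do not currently deposit their data in a data repository/databank (i.e., those who answered 'no' to the question shown in Figure 4A), slightly over half (~55%) stated they are somewhat or very interested in starting to deposit their data (Figure 4B). In particular, medical science researchers are most interested in starting to deposit their data in a data repository/databank.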

Data Documentation
As expected, most faculty researchers are not at all familiar with documenting and/or creating metadata for their data so the contents of their datasets can be understood by others and/or computer-readable (Figure 5), with no significant differences among fields of study.

Interest in Data Management-Related Services
Finally, we asked faculty members to indicate whether they would use a range of data management-related services if they were offered by Emory University (Figure 6). Regardless of field of study, the two services that garnered the most interest are (1) faculty workshops on data management practices and (2) assistance preparing data management plans for grant applications. Faculty expressed less interest in data citation services (e.g., assignment of permanent digital object identifiers). Researchers who selected 'other' wrote in that they are interested in receiving support for setting up their own servers, reliably storing terabytes of data, creating and managing databases, designing data collection tools, and more easily sharing data with other researchers or research groups. A couple of faculty members explicitly stated they are not interested in any services. Compared with researchers in other fields of study, researchers in medical science are more interested in faculty workshops on data management, assistance with data-related confidentiality/legal issues, and identifying appropriate data repositories. Also, arts and humanities researchers are most interested in digitization of research materials in physical formats.

Discussion
Different disciplines vary widely in their research funding, technical infrastructures, collaboration networks, source materials, subject populations, methodologies, ethical considerations and types of research outputs. Therefore, to be most effective, data curation requires discipline-specific approaches. Although disciplinary differences in data management issues have been noted by previous reports (Connaway and Dickey, 2009; Digital Curation Centre, 2010; Jahnke and Asher, 2012; Pryor, 2009; Witt, Carlson, Brandt and Cragin, 2009), there has been little quantitative analysis of these variations. By categorizing our survey respondents into arts and humanities, social sciences, medical sciences, and basic sciences, we revealed important disciplinary distinctions in data management actions, attitudes, and interest in support services. Notably, our results complement those obtained by another survey recently conducted by the University of Oxford, which had a similar sample size and equivalent breakdown of respondents into four research domains (Wilson, Jeffreys, Patrick, Rumsey and Jefferies, 2013). Studies such as these advance our understanding of the data management needs of different populations of academic researchers.
Compared to data from other fields of study, scientific data receive a disproportionate amount of curatorial concern, in part due to the large quantities of data produced and amounts of federal funding dedicated to this area of research. As such, the data management practices and perceptions of scientists have been documented by several previous studies (Borgman, Wallis and Enyedy, 2006; Piwowar, 2011; Karasti, Baker and Halkola, 2006; Tenopir et al., 2011; Westra, 2010; Whyte and Pryor, 2011; Williams, 2013). The results of our survey are largely consistent with this existing body of information. As expected, basic science researchers at our university tend to store more digital data than other researchers, with approximately one-third of these researchers storing terabytes of data. For storage and back-up, basic science researchers use university-based servers more than other researchers, which may reflect either the preferential allocation of campus resources to basic scientists or their greater familiarity with or need for these resources. As the basic sciences rely heavily on specialized instruments for data collection, the hard drives of these instruments are also key data storage locations. Further, basic scientists often use laboratory notebooks or other handwritten documents to record data or metadata, which presents the challenge of integrating physical and digital records to preserve the meaningfulness of data over time (Briney, 2012). Not surprisingly, basic scientists are the most familiar with funding agency requirements for data management plans. They are also the most likely to share data with people outside of their research groups and to deposit some or all of their data in data repositories or databanks. As noted by another study (Tenopir et al., 2011), the primary reason why basic scientists choose not to share their data is that they might not get credit for their work. However, at the same time we did not find a high degree of interest in services related to data citation, perhaps because researchers are relatively unfamiliar with this concept. Therefore, basic science researchers in particular may benefit from increased awareness of methods to receive more credit for their data via the publication of data papers (Chavan and Penev, 2011), the archival of datasets in data repositories, the assignment of DOIs to datasets, and the consideration of datasets as products of scholarly research in NSF funding applications (National Science Foundation, 2013).
A growing data-intensiveness of scholarly research applies not only to the basic sciences but also to the social sciences. Although social scientists are increasingly generating larger amounts of data, much of this data is not shared or archived, largely due to concerns about confidentiality or privacy of human research subjects (Digital Curation Centre, 2010; Gutmann et al., 2009; Jahnke and Asher, 2012; King, 2011). We found that only around a third of social science researchers at our university share their data with people outside of their research group. Compared to other types of researchers, social scientists are least likely to share their data via email, perhaps because their data contain private or confidential information that requires a more secure method of transmission. Consistent with this possibility, two of the most pressing reasons that might prevent social science researchers from sharing their datasets are that they contain personal or sensitive information or require restricted modes of access. To support the specific needs of social science researchers, academic librarians could assist in the careful de-identification of data (DeWolf, 2002), champion new tools allowing protected storage of and controlled access to sensitive data (Jahnke and Asher, 2012), or create a 'cold room' for secure computing within the library, such as that provided by the University of Wisconsin-Madison Libraries or the Johns Hopkins University Population Center.
Many of the issues that prevent data sharing among social science researchers are also relevant to medical science researchers. Although federal funding agencies, such as the NIH, strongly encourage data sharing, only a fraction of clinical research data are shared beyond the original research teams to avoid potentially violating the Health Insurance Portability and Accountability Act (HIPAA) for the protection of patient privacy (Freymann, Kirby, Perry, Clunie and Jaffe, 2012). Compared to researchers in other fields of study, medical researchers at our university are least willing to share their data with researchers not working on the same projects, instructors interested in using data as teaching tools, or the general public. Similar to social scientists, medical researchers also state that the most critical reasons for not sharing datasets are that they contain personal/sensitive information or require restricted access. However, researchers in the medical sciences expressed high levels of interest in taking advantage of data management-related services on campus, particularly attending workshops on data management practices and receiving assistance with data-related confidentiality and legal issues. Furthermore, although most medical researchers do not currently deposit their data in data repositories or databanks, approximately two-thirds are interested in doing so, with assistance identifying or using appropriate data repositories being a highly desired service. Therefore, complementary to a previous study that identified a need for easier discovery and use of existing datasets that are relevant to medical research (Bardyn, Resnick and Camina, 2012), we found that medical researchers could also benefit from assistance finding appropriate places to archive the datasets that they produce.
Interestingly, researchers in the arts and humanities were most likely to state that they do not know how much data they are storing, perhaps due to uncertainty regarding whether their research material meets the commonly accepted definition of 'data'. Therefore, when considering the needs of researchers in the arts and humanities, academic librarians may want to adopt a view of data that is more expansive than that typically applied to the natural and social sciences (Muñoz and Renear, 2011; Williford and Henry, 2012). At our university, arts and humanities researchers tend not to store and back up their data using university-based servers but instead rely heavily on computer/external hard drives and internet-based storage. This may reflect a need to easily access data from off-campus locations during field or archival work. Compared to researchers in other fields of study, arts and humanities researchers are much less familiar with funding agency requirements for data management plans, which is not surprising considering that these researchers are typically less dependent on sources of federal funding that are largely skewed to the natural sciences. Practically no arts and humanities researchers who responded to our survey use data repositories to archive or share their data, most likely because fewer repositories exist for this type of data, such as the UK Data Archive or the Cultural Policy and the Arts National Data Archive (CPANDA). Although this population of researchers is least likely to share their data with other researchers working on their projects, perhaps because their studies are often performed without collaborators, they are the most interested in sharing their data with the general public. Of all of the potential data management-related services, arts and humanities researchers were most interested in the digitization of physical research materials. These survey responses highlight some of the ways in which academic librarians can be involved in developing new approaches for the curation of arts and humanities data (Muñoz and Renear, 2011; Svensson, 2011), including advocating for better data storage and computing facilities on campus, creating institutional or subject-specific data repositories to support data preservation and sharing with a wide audience, and providing more options for high quality digitization of a variety of physical research materials.

Conclusions
A primary objective of conducting this survey was to gather information on researchers' practices and perspectives to guide our development of library services to support the management of research data at Emory University. By obtaining survey responses from representative samples of faculty researcher populations, we were able to identify some of the most appropriate research data services for our campus and recognize areas in which stronger partnerships and collaborations between the library and other campus units are essential for the successful implementation of such services.
As our survey was in progress, we worked to set up Shibboleth authentication access for Emory University researchers to the DMPTool, a free online tool created by the University of California Curation Center and California Digital Library that walks researchers through creating data management plans for grant proposals. Soon afterward, we also held a workshop that introduced the DMPTool and general concepts in research data management to faculty and graduate students. When our survey was complete, we were happy to find that the two potential services receiving the most interest were faculty workshops on data management practices and assistance with preparing data management plans, meaning that our initial efforts to provide wider support for research data management were in line with the needs of our faculty researchers. To further this line of support for data management plan preparation, we are now working to customize the DMPTool to provide sample language and links to resources that are relevant and specific to our university.
Our survey results have also helped us determine appropriate solutions to research data storage at our university. Our institutional repository, OpenEmory, currently accepts only peer-reviewed journal articles authored by faculty members and not research data. However, the somewhat low level of interest by faculty members in an institutional repository for research data suggests that expanding OpenEmory to accept research data, which would require development costs, more storage space, and additional staff, might not be warranted. Instead, we are exploring other ways of supporting the preservation and sharing of research data that might be more useful to our faculty members, such as facilitating deposit of data into disciplinary repositories or setting up an instance of the Dataverse Network for our university. These alternative options could allow us to address researchers' concerns with sharing confidential or sensitive information, while also increasing awareness of the benefits of releasing datasets to a wider audience and making them citable.
By categorizing our survey respondents into the arts and humanities, social sciences, medical sciences, and basic sciences, we were able to reveal key distinctions among different populations of researchers in their research data management needs. For example, we discovered that arts and humanities researchers (as well as librarians) may benefit from participating in conversations about definitions of arts and humanities research data and how they may differ from those for other research domains. Arts and humanities faculty may also benefit from a greater awareness of campus resources aimed at helping researchers manage their digital files. Therefore, we are now targeting arts and humanities scholars on campus through widely advertised workshops that address methods for managing one's digital assets. Furthermore, we are engaged in formal efforts to develop greater support for managing the data used in ongoing and emerging digital humanities projects as part of the newly formed Emory Center for Digital Scholarship. It is important to mention, however, that although we considered only differences in data management actions and attitudes among four broad research domains, further distinctions are certain to exist among specific disciplines (e.g., astronomy vs. ecology, psychology vs. economics, art history vs. literature). Nonetheless, serious consideration of both the similarities and dissimilarities among disciplines will help guide academic librarians in developing a range of data management-related services that can be tailored to the unique needs of different researchers, thereby resulting in more effective and comprehensive approaches to research data curation.