Building an Online Data Management Plan Tool

Following the 2011 announcement by the National Science Foundation (NSF) that it would begin requiring Data Management Plans with every funding application, the University of Houston Libraries explored ways to support our campus researchers in meeting this requirement. A small team of librarians built an online tool using a Drupal module. The tool includes informational content, an interactive questionnaire, and an extensive FAQ to meet diverse researcher needs. This easily accessible and locally maintained tool allows us to provide a high level of personalized service to our researchers.

© 2013 Reilly & Dryden. This open access article is distributed under a Creative Commons Attribution 3.0 Unported License, which allows unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PRACTICE | Journal of Librarianship and Scholarly Communication | jlsc-pub.org | Received: 01/22/2013 | Accepted: 04/29/2013


INTRODUCTION
On January 18, 2011, the National Science Foundation (NSF) began requiring supplemental data management plans (DMPs) as part of every funding application. This new requirement created a great deal of confusion and anxiety in the research community, including among researchers at the University of Houston. At the University of Houston Libraries, the first call for help came on January 15th of that year from a faculty researcher requesting assistance with her DMP. She had been uncertain as to what to include in her DMP and was seeking guidance from other sources. She solicited help from her colleagues but wasn't confident in the results, which consisted of a three-sentence example that a fellow researcher had given her. We (the researcher and librarians) didn't feel that this was adequate and weren't sure that it would satisfy the NSF. After many phone calls, emails, and further online research, we developed a DMP that we thought would better fulfill the requirement.
For this researcher and others, the new DMP requirement was initially irritating and confusing because it requires researchers to think differently about the data they create. It compels researchers to think beyond what their data means to them and consider what it might mean to someone else. It introduces the concepts of the fragility of data, the interoperability of data, and the sharing of data. Conveying these concepts to our faculty researcher helped her understand the need and benefit of a DMP, not only to meet the NSF requirements, but to guide her in effectively managing her data for future use.
That first faculty member request spurred the University of Houston Libraries' Head of Digital Services to explore simple ways to help our researchers develop a DMP through the most expedient and user-friendly approach possible. Our guiding question was one that is shared by many libraries: What can the library provide that will help researchers develop data management plans when they don't even know what those plans should contain? This paper describes how the University of Houston Libraries answered that question through the creation of a concise, interactive online tool.

LITERATURE REVIEW
Although many researchers (and librarians) were surprised by the new data management requirements, libraries have been preparing for the new landscape of research data curation for quite some time. Much of the early literature comes from the United Kingdom, where government mandates for research data sharing gained traction a few years earlier than in the United States, but a number of individuals and institutions in the U.S. were actively pondering librarian roles in data management early in the millennium. For example, in 2006 the Association of Research Libraries issued a call for the creation of long-term data stewardship frameworks (ARL 2006).
Beyond organizations like ARL, libraries that were early leaders in exploring data management roles included Purdue University (Brandt 2007, Witt 2008, Witt et al. 2009, Cragin et al. 2010), Johns Hopkins University (Choudhury 2008, 2010), and Cornell University (Steinhart 2007). More recently, a number of American universities have undertaken local studies of the needs of researchers on their campuses in order to better understand opportunities for library involvement in data management. These case studies, including one at the University of Houston (Peters & Dryden, 2011), have shown a wide range of attitudes and levels of preparedness among faculty researchers (Bracke 2011, Latham & Poe 2012, Parham et al. 2012). Most of these case studies involve surveying or interviewing faculty members at the institution, but Lage et al. (2011) offer a novel approach towards the creation of "personas" for researchers with different needs and attitudes related to data management.
As is evident from the early leaders (e.g. Purdue), much of the conversation around research data management has come from large research institutions, but other types of universities are also beginning to research their institutional needs. Scaramozzino et al. (2012) have identified strong roles for librarian involvement in research data curation even at traditionally teaching-centered universities, particularly in terms of educating researchers about data management practices and requirements. Similarly, Shorish (2012) provides suggestions for small steps that librarians at master's- and baccalaureate-granting institutions can take to engage their faculty around data management issues. Even some very small liberal arts colleges have begun to explore data curation opportunities, notably Mount Holyoke and the other members of the "five colleges" consortium (Goldstein & Oelker, 2011).
As a necessary adjunct to exploring the role of libraries, attention has also begun to be paid to librarian preparation for data management roles. Harris-Pierce and Liu (2012) question the adequacy of North American LIS degree programs for preparing librarians to enter these emerging roles, and find that while many new courses have been created to respond to this demand, there are still far too few. Cox et al. (2012) discuss potential curricula for training existing science librarians in the competencies needed to support research data curation and management. They suggest an eight-part curriculum based on previously identified roles for librarians involved in data management. Steinhart and Qin (2012) discuss their mentoring collaboration with a newly developed certificate program in eScience Librarianship at Syracuse University.
Of course, the preparation of libraries and librarians for data management roles only addresses one side of the issue; studies such as Johnston and Jeffryes (2013) have also explored the need to prepare students in STEM disciplines to meet new data requirements and expectations in their current and future research. This is especially important because, although the concepts of public access to federally funded research data and of data management plans generally have been around for some time, researchers seem to have been largely unprepared to meet the 2011 NSF mandate and other similar requirements. In fact, according to a Cornell University data management survey, "53% [of respondents] would be interested in any sort of guidance, including consultation," for writing a data management plan in support of an NSF grant application. In the same study, Cornell also found that "considerable confusion exists as to what 'counts' as data, even among researchers who are likely among their discipline's experts." Much of this confusion may stem from a lack of clarity about funding agency requirements for data management planning. Another Cornell study conducted in the summer of 2012 (Dietrich et al.) found that no single agency's policy addressed all elements of what should be included in a Data Management Plan. Four of the evaluated funders appeared to have no policy at all, though the study did find some common requirements that pertained to general data management activities.

Cornell's findings illustrate the excellent opportunity that librarians have to educate researchers about the organization of their data and the role that data management planning plays in supporting long-term access to their data, the possibility of reuse of their data, and the potential for an increase in citations to their research. These data management benefits can lead to better science, enhanced returns from publicly funded research, and improved linking between datasets. The opportunity both to provide a valuable and needed service to researchers and to improve the overall accessibility and impact of research data is one that librarians should continue to explore and embrace.

INSTITUTIONAL CONTEXT
As illustrated earlier, at the University of Houston Libraries we have found that creating a DMP is daunting for some researchers. Junior faculty and experienced researchers alike are often confused by what data management is, why they need to plan for it, and what should be included in a plan. Exacerbating this confusion, and making DMP assistance even more vital, is the fact that researchers have been seeking grant opportunities far more vigorously than they have in the past due to the University of Houston's drive to become a Tier 1 research institution (University of Houston, 2010).
As grant activity expanded, the number of phone calls and emails we received quickly made it clear that researchers needed assistance in developing their DMPs. Many researchers were leaving their DMPs to the last minute, which made it critically important to be able to provide guidance that was understandable, easily navigable, and thorough. Initially, the Head of Digital Services considered designing a template, but discovered that there were templates available online already. Based on the calls received from researchers, it was apparent that they had questions that no available template was able to answer. Furthermore, templates did not seem able to provide the desired level of interactive assistance. An interactive web form that could help generate DMPs was identified as the best solution to the problem.
We were aware that there were other, more comprehensive tools in development (see Tool Review), but none were widely available at the time, and there was no indication of when these tools would become available. Because we wanted to offer better guidance to University of Houston researchers before the next grant cycle began, we decided that waiting for these tools would be unproductive and that it would be more efficient to develop an in-house solution.
It was determined that our online form would be researched, developed, and executed by a very small group. Using a small working group made it easier to be agile and responsive to changes and technical issues, which ultimately helped the project go swiftly and smoothly. The working group was composed of the Head of Digital Services and a librarian fellow from the Libraries' Systems department; the systems librarian provided the technical expertise required to build the web form while the Head of Digital Services provided the content. Within three months of the NSF announcement the team had designed a web form that would email researchers a draft DMP based on their answers to a questionnaire. This draft plan is in a format that is concise, easily readable, and includes what we determined the granting agencies wanted to see in a data management plan.

Tool Review
Launched in October 2011, the DMPTool is designed to help researchers meet the new funder requirements by allowing them to create funding agency specific data management plans. It offers a step-by-step guide and instructions for generating a DMP. Features include:
1. Account creation. By creating an account the researcher can save a plan and come back to it later. This is very useful if the researcher has unanswered questions at first login.
2. Funding agency specificity. When the researcher creates a new plan, he/she chooses the appropriate foundation/agency/directorate for the grant.
The DMPTool takes the researcher through a five-step process for creating a plan. Different information is required at each step: roles and responsibilities, expected data, period of data retention, data format and dissemination, and data storage and preservation of access. After filling out this information, the researcher is given the opportunity to export the DMP to plain text or rich text (Microsoft Word). The output is cleanly formatted in a manner that the NSF prefers. The researcher can also copy and paste the information from the web page.
This tool is very user friendly, especially if the researcher knows what metadata is, where data will be stored, and how data will be shared. The funder requirement link pulls up the link to the funder and a funder requirements template and lets the researcher know whether the DMPTool supports that particular funder. The help section offers a step-by-step guide to using the tool, and there is also an interesting demo video.
Including information about local resources and services for researchers could be problematic. The only way to modify the form to make it institution specific is to have the institution become a contributing member. Some institutions may not be willing or may not have the resources to manage or maintain an institution-specific instance.

DMP Online (Figure 2) helps research teams develop the DMPs required by European funding agencies, but it also works for U.S. funders. The researcher is asked a series of questions, for example regarding the stage in the application process the research is currently in and the type of funder to which the researcher has applied. If U.S. Funders is selected, the tool will ask the researcher to select the relevant funder, and the only option is NSF-generic. The questions after this are all based on the researcher's answers, which is helpful since DMP Online then ensures that funder-specific requirements are addressed.
For example, if a researcher is applying to one of the funders that make specific data-related conditions at the application stage, he/she will see the funder's requirements in the left-hand column. These have been mapped to the appropriate clauses in the DCC Checklist for a Data Management Plan (Digital Curation Centre, 2011). By answering the questions presented, the researcher should meet the funder's requirements.
At present, funders that do not have generic data-related conditions at the application stage are not mapped. The DCC hopes to work with all of the major funders to create accurate mappings that the funders can approve. Once the researcher indicates where his/her funding is coming from, a much longer series of questions is presented that covers all aspects of data management. These questions are, again, taken from the Checklist for a Data Management Plan (Digital Curation Centre 2011). The researcher has the option of answering as many or as few of the questions as desired within the section and can add or remove questions. For many of the questions, additional information is available by clicking the information button on the right-hand side of the page.
The completed DMP can be exported in PDF, HTML, or CSV formats. It is at this stage in the process that the researcher can reorder sections and can include or exclude sections. The DMP Online tool is very versatile and has some helpful features. At this time the only "US funders" template is for the National Science Foundation (generic). One drawback is that there isn't additional explanatory information for each question, and the authors could find no example text. Additionally, this tool does not allow the inclusion of information about local resources and services that could be useful for researchers.

The IEDA DMP Tool (Figure 3) is a simple one-page web form. The researcher answers the questions, clicks on 'create PDF', verifies the information, and the process is done (producing a tidy PDF). The form offered no help, explanations, or definitions, only links to the NSF requirements and a helpful list of suggested repositories.
In April 2012, Version 2 was launched with enhanced features: login, re-use of previously made DMPs, associating DMPs with funded NSF awards, an IEDA DMP Tool FAQ, and a preview of the Tool. The login is tied to the GeoPass (Integrated Earth Data Applications 2013b) system, which users must register for if they are not existing members of the organization.

FORM OVERVIEW
The University of Houston Libraries DMP form (http://info.lib.uh.edu/p/dmpform) consists of three separate web pages: an introduction/instruction page, the questionnaire, and an extensive FAQ. The first page of the form is informational and explains the purpose and the elements of a DMP and provides links to further information on this topic (Figure 4). It goes on to discuss how the form is assembled, what researchers can expect to generate as a result of completing the form, and the best way to finalize their plans. This introduction/instruction page was essential in providing guidance and context to the researcher on how to best answer the questions to ensure the least amount of editing in the end.
From this information/instruction page the researchers are able to preview and print the questions. This allows them to formulate their answers, look up terminology they may not be familiar with, and ask thoughtful questions about designing their DMP. Printing the questions in advance can guide the researchers in their writing and also allows them to create their DMPs without using the web form if they choose.
The questionnaire asks a series of questions and provides space for the researchers to fill in their answers (Figure 5). It is organized into five sections of thematically similar questions. Each section begins with introductory text about the questions in that section as well as links to the FAQ and contact information for the Head of Digital Services. The sections are:
• Types of Data
• Data and Metadata Standards
• Policies for access/sharing and provisions for appropriate protection/privacy
• Policies and provisions for reuse and redistribution
• Plans for archiving and preservation of access
Upon submitting the form, the researcher receives an email containing the answers to the questionnaire with the question text redacted. If the researcher has filled out the form in short narrative sentences, it is only necessary to then copy and paste the body of the email into a Word document. Any needed edits can be made at that time. The average email response is approximately one and a half pages in length, and after any edits the resulting DMP comes in under the two-page limit that many agencies request.
We created a companion to the form, which we called the DMP FAQ (Figure 6). This is a separate web page that provides extensive detail for each question on the form, including sample answers and definitions of terms that might be unknown to researchers. An extensive amount of research was put into the FAQ to provide researchers with the most current information possible. This FAQ is frequently audited and updated to ensure accuracy.
An increasing variety of funding sources are requesting DMPs from researchers. For this reason, we chose to make our form granting agency and directorate agnostic. We felt that offering a general form would help the greatest number of researchers. On those questions where directorates have made specific requirements, the researcher is directed to the FAQ.

FORM TECHNICAL NOTES
Our DMP web form was built in-house on our Drupal-based library website. We explored other options for creating the form, including third-party hosted solutions, but decided that building the form within the main library website would provide us with the greatest ability to customize the form itself. It also provides a better user experience through a look and feel that is consistent with the rest of the library website.
The form was built using the "Webform" module of the Drupal content management system (Figure 7). This is a fairly straightforward tool for building web forms with an easy-to-understand user interface. The Webform module allowed us to break up each question into three different components: question text, "helper" text to further clarify the question, and example text.
We used these text components in order to keep the questions as straightforward as possible. However, limitations of space and design made it impossible to include every example, suggestion, and explanation within the form itself. We linked frequently within the web form to points in the FAQ addressing specific questions and thematic areas.
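For readers who want to prototype a similar questionnaire outside Drupal, the three-part question structure described above might be modeled roughly as follows. This is a minimal Python sketch, not the Webform module's own data model: the class name, the field names, the `faq_anchor` value, and the sample wording are all hypothetical and are not taken from the actual form.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FormQuestion:
    """One questionnaire item, split into the three text components."""
    key: str            # hypothetical machine name for the field
    question_text: str  # the question itself
    helper_text: str    # short clarification shown with the question
    example_text: str   # sample answer shown to the researcher
    faq_anchor: str     # hypothetical anchor pointing into the FAQ page


def render_for_print(section: str, questions: list[FormQuestion]) -> str:
    """Render one section as plain text, e.g. for a printable preview."""
    lines = [section, "=" * len(section)]
    for number, q in enumerate(questions, start=1):
        lines.append(f"{number}. {q.question_text}")
        lines.append(f"   Hint: {q.helper_text}")
        lines.append(f"   Example: {q.example_text}")
        lines.append(f"   More detail: FAQ #{q.faq_anchor}")
    return "\n".join(lines)


# Illustrative content only; the wording is not taken from the real form.
TYPES_OF_DATA = [
    FormQuestion(
        key="data_types",
        question_text="What types of data will the project produce?",
        helper_text="List formats and approximate volume.",
        example_text="The project will produce about 2 GB of CSV sensor logs.",
        faq_anchor="types-of-data",
    ),
]

preview = render_for_print("Types of Data", TYPES_OF_DATA)
print(preview)
```

Keeping the helper and example text separate from the question text is what makes it possible to show them on the form and in a printable preview while leaving them out of the emailed draft.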
The greatest difficulty that we encountered was in formatting the response emails correctly. The Webform module for Drupal does not easily allow for making changes to the output of form submission responses. At this point we required assistance from one of our library web developers. He was able to write a custom script to strip out the question text and format the responses in the way that we wished researchers to receive them. He also made some changes to the style sheet of the questionnaire itself, which we felt improved the look and feel.
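The article does not reproduce the custom script, so the following is only an illustrative Python sketch of the general idea: take a submission, drop the question text, and return the answers as short paragraphs ready to paste into a document. The function name, the dictionary keys, the sample answers, and the choice to keep section titles as headings are assumptions, not details of the actual Drupal implementation.

```python
from __future__ import annotations

# Section order as listed in the form overview.
SECTIONS = [
    "Types of Data",
    "Data and Metadata Standards",
    "Policies for access/sharing and provisions for appropriate protection/privacy",
    "Policies and provisions for reuse and redistribution",
    "Plans for archiving and preservation of access",
]


def format_draft_dmp(submission: list[dict[str, str]]) -> str:
    """Build an email body: answers only, grouped by section, no question text."""
    paragraphs = []
    for section in SECTIONS:
        answers = [
            item["answer"].strip()
            for item in submission
            if item["section"] == section and item["answer"].strip()
        ]
        if answers:
            # Join a section's answers into one narrative paragraph.
            paragraphs.append(f"{section}\n" + " ".join(answers))
    return "\n\n".join(paragraphs)


# Hypothetical submission; unanswered questions are simply skipped.
submission = [
    {"section": "Types of Data",
     "question": "What types of data will the project produce?",
     "answer": "The project will produce CSV sensor logs. "},
    {"section": "Types of Data",
     "question": "How much data do you expect?",
     "answer": "We expect about 2 GB in total."},
    {"section": "Data and Metadata Standards",
     "question": "Which metadata standard will you use?",
     "answer": ""},
]

body = format_draft_dmp(submission)
print(body)
```

Because the question text never reaches the output, a researcher who answers in short narrative sentences gets a draft that needs little editing, which matches the workflow described in the form overview.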
Since the initial implementation, we have made several changes to each component of our DMP online tool to make it more effective. We regularly go back through the informational/instruction page, the questionnaire, and the FAQ to ensure that language/terminology is consistent in all areas. We clarified several questions that had proven confusing to early users of the questionnaire. One example of these changes was to indicate questions that may not be relevant to all researchers, such as HIPAA concerns. As different directorates have provided clarity about their expectations related to DMPs, we have edited the FAQ to reflect this information.
One of our early users also requested the ability to save and return to an in-progress DMP at a later date. This is a complicated process that would require us to hook into the University's authentication system, which we felt was beyond the scope of our project and the abilities of our content management system. Our University's authentication system would also have required that the researcher be officially affiliated with the university (it is not possible to create guest network accounts for unaffiliated researchers). This could create difficulties for our researchers working on a collaborative project with colleagues outside of the University of Houston. In its current form, any researcher from any institution can use the form without a login.

Figure 7. Drupal Webform Module

Local Development and Promotion
Although the multi-institution DMP tools discussed earlier (see the Tool Review sidebar) are now widely available, we plan to continue to support and update our local tool. While these other forms are undoubtedly useful and have some valuable features that ours does not (e.g. the ability to save work in progress and link to previous DMPs), we feel that our ability to insert extremely targeted and specific local instructions (e.g. University retention requirements) for our researchers outweighs the absence of those features.
Beyond the ability to provide institution-specific instructions, there are two additional important reasons why we continue to update and provide more context to our form instead of using and promoting one of the multi-institution tools. First, we retain local administration of the form, making it possible for us to follow up with researchers who have used the form and to make decisions about local needs. Second, we can notify researchers if the University of Houston Division of Research (DOR) or a funding agency makes changes that impact their projects.
We are able to do this because our DMP online form keeps a record of every researcher who uses the form. This allows us to offer an expanded level of service and maintain a closer level of involvement with researchers. There may come a time when we move to another form, but for now we feel strongly that our existing form is most beneficial for our researchers. Of course, we certainly do not dissuade our researchers from using other tools if they wish, but we do not actively promote them.
We do, on the other hand, want to further promote the use of our form. Although usage data shows that use is up from the first year of the form's existence, an ongoing initiative in the evolution of our DMP online form is to encourage its use in the wider University community. The DOR guide for researchers links to the form, but finding the link is not easy. The DOR is in the process of redesigning its researcher checklist webpage, and the DMP online web form will have a more prominent place in the new design. We also hope that our investigation of research support needs (discussed in the following section) will highlight additional opportunities to promote the tool.

Building Out from the DMP Tool
While researchers who have used the tool have reported that it is extremely helpful, the Head of Digital Services still frequently receives phone calls and emails from researchers. These more personal interactions allow her to provide further explanation of the requirements or to walk the researcher through writing the DMP. Some of the most common questions she receives are: What is metadata? How can I share my data? How long should I preserve my data? What data should I be preserving, working data or completed data? The biggest question of all seems to be: Where can I preserve my data (preferably for free)? These questions have become more complex over time, indicating that our researchers are interested and actively involved in data management planning.
Though our online tool is proving useful for researchers who need to create a DMP, the form (and its instructions and FAQ) is not intended to answer every question about data management. Building on the success of the tool, we have also put together a team of librarians who work in collaboration to provide researchers on campus with the best and most current information about data curation and management. One of the team's first projects was to construct a research guide (University of Houston Libraries, 2013) that explains what data management is, offers best practices, provides metadata definitions and standards, and describes data storage options available on- or off-site. The research guide and the DMP online web form link to each other, which provides better information for researchers completing the online form as well as better visibility for the online form among researchers who visit the guide first. A small group, composed of a new librarian dedicated to science research and the DMP form project team, is also currently conducting an investigation of research support needs on campus. This is a follow-up to a previous study evaluating researchers' readiness to comply with agency DMP requirements (Peters & Dryden 2011), and it should further inform our efforts to develop services like the DMP tool for researchers.

RECOMMENDATIONS
A few years have passed since the mandates for submitting DMPs were put into place. There are now several widely available tools for creating data management plans, and many institutions may prefer to implement an existing tool or otherwise encourage their researchers to use one on their own. However, creating a local tool is still a valuable and achievable option for institutions that would like more local administration without joining or subscribing to an existing tool. The authors recommend building your own tool as an option for institutions that meet the above criteria. A similar tool can be created using most webform utilities.
Building a tool in-house affords the institution the ability to include very specific local information related to its researchers' funding activities. This could include resources and services available on campus as well as institution-specific policies and requirements, such as Institutional Review Board (IRB) guidelines. An in-house tool allows librarians to monitor form usage and to respond to researcher help requests. Our tool serves an outreach function as well as providing a valuable service to campus researchers.