Assessment, Usability, and Sociocultural Impacts of DataONE: A Global Research Data Cyberinfrastructure Initiative

DataONE, funded from 2009-2019 by the U.S. National Science Foundation, is an early example of a large-scale project that built both a cyberinfrastructure and culture of data discovery, sharing, and reuse. DataONE used a Working Group model, where a diverse group of participants collaborated on targeted research and development activities to achieve broader project goals. This article summarizes the work carried out by two of DataONE’s working groups: Usability & Assessment (2009-2019) and Sociocultural Issues (2009-2014). The activities of these working groups provide a unique longitudinal look at how scientists, librarians, and other key stakeholders engaged in convergence research to identify and analyze practices around research data management through the development of boundary objects, an iterative assessment program, and reflection. Members of the working groups disseminated their findings widely in papers, presentations, and datasets, reaching international audiences through publications in 25 different journals and presentations to over 5,000 people at interdisciplinary venues. The working groups helped inform the DataONE cyberinfrastructure and influenced the evolving data management landscape. By studying working groups over time, the paper also presents lessons learned about the working group model for global large-scale projects that bring together participants from multiple disciplines and communities in convergence research.


Introduction

Background
The DataONE team formed in late 2007 to design a project that would comprehensively address NSF's goals for its DataNet program with the intent of being selected as one of the "small set of full-scale exemplars" envisioned by NSF. The DataNet program goals were to: '(1) combine expertise in library and archival sciences, computer, computational, and information sciences, cyberinfrastructure, and domain sciences and engineering; (2) develop models for economic and technological sustainability over multiple decades; (3) engage at the frontiers of science and engineering research and education as an information resource, an object of research, and a research entity; and (4) work cooperatively and in coordination to create a functional data network with revolutionary new capabilities for information access, use, and integration without regard to conventional barriers such as data type and format, discipline or subject area, and time and place' (NSF, 2007).
From the beginning, the working group structure was fundamental to the DataONE project and its goal to engage broadly with stakeholders and thus effect cultural change in researchers, institutions, data management education, and research and academic libraries. The DataONE project created a small executive team consisting of the principal investigator, executive director, and directors of (1) cyberinfrastructure and (2) community engagement and outreach. Coinvestigators were named as co-chairs of ten working groups or as members of the Core Cyberinfrastructure Team, which was responsible for the detailed technical design and infrastructure development (see Figure 1). Working group members, other than the coinvestigators supported by the project, were volunteers who were solicited and screened by the DataONE leadership to ensure that a diverse range of skills and backgrounds were represented in each working group.
Five working groups were designed to address technology issues and five to address community engagement and outreach topics. 1 Each working group developed its own charter and worked independently on tasks as well as working together towards achieving the overall DataONE goals. The Sociocultural Issues (SC) WG, active from 2009 through 2014, expressed its purpose, scope, and mission in its charter:
'This working group is responsible for informing the efforts of DataONE from a set of diverse perspectives: sociocultural, international and interdisciplinary. The working group engages in identifying, promoting, assessing and developing models, frameworks, definitions, theories, policies, practices and products that can be used within DataONE as well as in the broader scientific community… This working group researches the social and cultural context of the scientific data lifecycle to devise strategies that maximize the impact of DataONE. This working group thinks and visualizes from large-scale, long-term perspectives, considering the sociocultural aspects of data management, data use, data sharing, data access and preservation. The working group succeeds by inspiring innovations in the data practices of scientists and other stakeholders to ensure preservation and access to multi-scale, multi-discipline and multi-national environmental science data.' 3
The Usability & Assessment (U&A) WG, active from 2009 through 2019, had the following purpose, scope, and mission:
'This working group will focus on the research, development, and implementation of the necessary processes, systems, and methods to ensure DataONE products and services meet network goals, include appropriate community involvement, and demonstrate progress and achievements of DataONE. The scope of the Usability & Assessment Working Group is defined as activities necessary to establish program performance indicators, measure usage and impact, and adopt usability analysis principles and methods to ensure that high quality, community-driven products and services result from DataONE activities. This includes periodic testing of versions of the system and tools as they are being developed. The Working Group also establishes and implements appropriate methods, tools, and instruments for usability and assessment of all DataONE stakeholders.' 4
The SC and U&A WGs held joint face-to-face meetings twice per year during the first five-year cooperative agreement (Phase 1 of the project) and there was cross-participation between the groups and with other working groups on many activities during this period. At the start of the second NSF cooperative agreement (Phase 2), the project was reorganized and streamlined (Figure 2) to focus on growing the number of participating data repositories and providing new cyberinfrastructure services. Several members and leaders of the SC WG joined the U&A WG, ensuring that the sociocultural perspective was not lost.

Literature Review
The work of the SC and U&A WGs was influenced by scholarship in the following areas: infrastructure, science and technology studies, cyberinfrastructure, computer-supported cooperative work, collaboratories and virtual organizations, digital libraries, human-computer interaction, and free/libre/open source software. Lee and Schmidt (2018) provide a thorough review and critique of the literature covering the first five of these areas. The earliest critical and analytical work on infrastructure conceived the interrelationships between computing machinery, software, people, and organizations as a sociotechnical web (Kling and Scacchi, 1982). Throughout the 1980s and the early 1990s, social scientists and some computer scientists continued to develop methods and theory based on empirical research illustrating the social aspects of computing (Gasser, 1986; Suchman, 1987; Star and Griesemer, 1989; Bowker, Star, Turner, and Gasser, 1997). Among their interests were: determining the factors that support or stand in the way of the adoption of computing technologies; studying how organizational routines adapt to the introduction of new technology; and exploring the social impacts of technology use (Gasser, 1986; Kling, 1987).
According to Lee and Schmidt (2018), "...'infrastructure' refers to a technical facility that provides a service to the wider world." Infrastructure can be primarily physical, like transportation systems, or virtual, like the Internet. Infrastructure grounded in computation, data, and networks is sometimes referred to as cyberinfrastructure (NSF, 2003). The term cyberinfrastructure was coined in a 2003 NSF report (Atkins et al., 2003) to refer to infrastructure such as DataONE's that is "based upon distributed computer, information, and communication technology" (Atkins et al., 2003). Lee and Schmidt (2018) urge researchers to carefully consider how they define the term infrastructure: "The point ... is that [infrastructure] refers to a system that under some description supports another: an infrastructure in its relation to a superstructure." For example, multi-modal transportation systems composed of ships, trains, and trucks are the infrastructure that supports the supply chains required to produce products in the globalized economy (the superstructure). DataONE's computational layer of repositories and software is the infrastructure that supports emerging practices (the superstructure) of data science, synthetic research, data citation, and data reuse.
Scientific collaboratories were described and their feasibility discussed in an influential 1993 report from the U.S. National Research Council (NRC, 1993). By the mid-2000s many workshop reports about a range of disciplines had been published by NSF (see, for example, NSF, 2003). Many independent researchers had published about the challenges and successes of collaboratories (Finholt, 2003; Jirotka, Lee, and Olson, 2013). Star and Ruhleder (1996) studied the complexities of large-scale infrastructure for multi-disciplinary distributed collaboration and described multiple levels of technical, social, and structural challenges, characterizing the challenges as differences among users based in their disciplinary practices, cultures, domain knowledge, and understanding of the infrastructure itself.
The NSF has recognized the crucial role of secure, scalable data cyberinfrastructure in multi-disciplinary scientific collaborations. 6 These collaborations can involve deep integration, or convergence, between two or more disciplines. According to the NSF (2017), convergence research is defined by two characteristics: first, it is driven by a specific and compelling problem, and second, it involves deep integration between disciplines. This integration of disciplinary expertise leads to novel research approaches, new research paradigms, and the formation of new research communities (Pollock, Yan, Parker, and Allard, 2019b). Multiple sources have noted the success of such approaches in addressing complex challenges in areas including computation, engineering, the environment, and human health (Bainbridge, 2004; MIT, 2011; NRC, 2014; Sharp, Hockfield, and Jacks, 2016). DataONE may be examined as both a facilitator of and an example of convergence research.
Boundary objects, as defined by Star and Griesemer (1989), are "objects which are both plastic enough to adapt to local needs and the constraints of the several parties employing them, yet robust enough to maintain a common identity across sites." Boundary objects, including repositories and repository networks, can help facilitate communication and cooperation between different individuals or communities with differing viewpoints and areas of expertise. Bowker and Star (1999) introduced the concept of "boundary infrastructure," serving multiple communities of practice and maintaining a consistent structure while allowing heterogeneity in information types and information practices among the communities it serves. As further detailed below, multiple WG products function as boundary objects, while DataONE itself can be interpreted as boundary infrastructure.
Many of the concepts and approaches described above were integrated and applied to the study and development of open, web-based research infrastructure through the NSF/ARPA/NASA-funded Digital Library Initiative (1994-1998). The six DLI projects intentionally included social scientists, library and information science researchers, and human factors experts as well as computer scientists, domain experts, and engineers. The research approach known as social informatics emerged during this period, as Rob Kling and other colleagues from the University of California, Irvine participated in some of the six DLI projects, providing exemplars for numerous future digital library and cyberinfrastructure projects (Bishop, Neumann, Star, Merkel, Ignacio and Sandusky, 2000; Borgman, 2003; Hill, Carver, Larsgaard, Dolin, Smith, Frew, and Rae, 2000; Marchionini, Plaisant and Komlodi, 2003; Van House, 2003).

DataONE Working Groups: What We Did and How We Did It
Projects undertaken by the SC and U&A WGs were designed to help the entire DataONE team understand the practices of the broader research community as they related to the management of research data. The projects proposed by the SC and U&A WGs were discussed with the entire DataONE team and benefited from these discussions through a process of iterative feedback and refinement. The products resulting from these projects helped inform and guide the work of the entire DataONE project. The SC and U&A WG members also participated in the framing and review of a wide range of foundational artifacts, documentation, and resources created by other WGs. These artifacts not only established a framework for the SC and U&A WGs as they embarked on their work, but were also a central resource that helped other DataONE working groups come to a common understanding of our stakeholders, their needs, and their overall research workflows. For example, the contextual and socio-cultural information derived from various studies of DataONE stakeholders conducted by the SC and U&A WGs fed into the design and development work performed by the Cyberinfrastructure Team. These artifacts also provided a baseline structure that informed the work of the Community Engagement and Education Group (Phase 1), as well as enriched the strategic and sustainability planning spearheaded by the Leadership Team.
Three artifacts (a stakeholder matrix, the data life cycle model, and personas) served as boundary objects that facilitated the connections between different groups and researchers with different expertise. Together they established a foundation and shared language that were critical for future knowledge development and achieving convergence.

Stakeholder Matrix
The SC and U&A working groups realized early on that a guiding structure would help identify and prioritize stakeholders to study. A stakeholder matrix was developed by the SC WG to identify all possible groups that could be a part of or benefit from the DataONE community (Figure 3). In Phase 1, the SC WG explained: "To facilitate understanding of stakeholders' education and training needs, system specifications, socio-economic and political contexts and to facilitate measuring DataONE's progress the Sociocultural Working Group has created a stakeholder matrix. The matrix includes five key stakeholder sectors (private industry, academia, community, government, non-profit) and numerous stakeholder employment settings. It can be used to understand the kinds of questions various stakeholders address in their work, their information needs and the ways in which DataONE can positively impact their work." 7 DataONE project participants subsequently prioritized and identified those groups who would most likely contribute to or benefit from the changing culture of data stewardship and reuse by employment setting (academia, private industry, etc.) rather than workplace (libraries, publishers, institutions, etc.). These priority stakeholders included scientists (both as researchers and educators in any employment setting), libraries and librarians in federal and academic settings, and science data managers in government and academic settings.

Data Lifecycle Model
The SC WG "review[ed] … numerous models developed to describe and depict the Data or Information Life Cycle…" and created a concise model for use by the DataONE project. 8 The life cycle includes eight steps and describes a generally sequential process. The model was unique at the time because it visualized data as the focus of the lifecycle with stakeholders entering and exiting the cycle at various points as they interacted with data. The first iteration of the life cycle included these steps: Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, and Analyze. The current version adds the step "Plan" and subsumes the discrete step "Deposit" within "Preserve" (Figure 4).
The DataONE Data Life Cycle Model established a simple, easy-to-understand, action-oriented visual model of the research data workflow that has been widely disseminated across disciplines. The model has been used by internal DataONE teams as they build and develop services for researchers who conduct work at any phase within the life cycle model. Additionally, the model has been widely used by the broader research and information community, as illustrated by website usage statistics. The model has been regularly accessed since its publication on the DataONE website in December 2015 (an average of more than 700 unique page views per month of the webpage describing the model). A data management primer expanding upon the model was created and linked to this page and deposited in the eScholarship Publishing and Repository Platform, hosted by the California Digital Library (CDL). 10 CDL metrics on this document report an average of 35 hits and 11 downloads per month. Additionally, Google Scholar reports 32 citations to this primer.
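The revision of the life cycle described above can be illustrated with a minimal sketch. It is not part of any DataONE software; the function names are hypothetical, and the placement of "Plan" at the start of the cycle is an assumption, since the text states only that the step was added.

```python
# Steps of the first iteration, in the order listed in the text.
FIRST_ITERATION = [
    "Collect", "Assure", "Describe", "Deposit",
    "Preserve", "Discover", "Integrate", "Analyze",
]


def revise(steps):
    """Return the revised cycle: drop "Deposit" (subsumed by "Preserve") and add "Plan"."""
    without_deposit = [s for s in steps if s != "Deposit"]
    # Assumption: "Plan" opens the cycle; the text does not give its position.
    return ["Plan"] + without_deposit


CURRENT_VERSION = revise(FIRST_ITERATION)


def next_step(step, steps=CURRENT_VERSION):
    """Return the step after `step`, wrapping around because the model is a cycle."""
    i = steps.index(step)
    return steps[(i + 1) % len(steps)]
```

Under these assumptions, both versions contain eight steps, and calling next_step on the final step returns the first one, which reflects the generally sequential but cyclical process the model describes.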

Personas
The concept of personas was introduced by Alan Cooper (1999) in the context of user interaction design. Within the systems development discipline, user stories and scenarios that make up a persona are valuable tools to help a community develop a shared understanding and perspective on users and stakeholders within their systems' community (Crowston, Bissell, Grant, Manoff, and Davis, 2015a).
Personas were created by members of the SC and U&A WGs to describe the DataONE community of users: five types of scientists and a science data librarian as primary users, as well as five secondary roles. An example is shown in Figure 5. Each persona description includes background, reasons for using DataONE, needs for and expectations of the tools, skills that could be applied, technical support available, personal biases about data sharing and reuse, and associated DataONE use cases. Persona descriptions also include a name, picture, personal background, and life and career goals, hopes, and fears to make the user more real and thus salient to users. Personas were based on interviews and practical experience in each of the roles described. The collection of personas was published on the DataONE website 11 and the development process was described in a journal article (Crowston, 2015).
The personas were developed to help the entire DataONE team understand the stakeholders for whom they were building their tools, educational resources, and communication efforts. The development team used the personas to group together related use cases supported in a particular release, for planning future releases, and to identify which kinds of users should be involved in system testing. The community engagement team found them useful as a way to engage potential new users by showing that the system was designed for people like them. Personas can also illustrate how users might benefit from DataONE tools and services to augment their data creation, use, management, and reuse. As such, the personas were an important tool to help the DataONE external community understand their own role within the data life cycle and help them understand how they contribute to the overall data landscape.
The personas, used in conjunction with the data life cycle, serve to explain how different stakeholders participate in data management, illustrating their involvement in different stages of the life cycle. Website usage shows that these persona resources have been regularly accessed since their publication on the DataONE website in fall 2015 (70 unique page views per month across all the personas).
In addition to informing DataONE project participants, the stakeholder matrix, the data life cycle, and the personas continue to help external community users understand the interactions of stakeholders at different stages of the data life cycle.

Framing and Refining Internal Resources
The SC and U&A WGs both reviewed and contributed to framing and refining internal and external documentation and resources created by other project WGs including, but not limited to: DataONE policies, best practices for data citation, DataONE member node guidelines, DataONE cyberinfrastructure and governance, documentation for DataONE tools, DataONE executive summary, DataONE terms and conditions for use, network analysis of DataONE working group structure and membership, DataONE FAQs, education modules, user metrics, and building the DataONE usability analysis strategy. These reviews helped the broader DataONE team and informed the work of other DataONE working groups, in particular the Community Engagement and Outreach Working Group and the Core Cyberinfrastructure Team.
12 DataONE: https://www.dataone.org/personas/sun-early-career-herpetologist

IJDC | Research Paper
Robert J. Sandusky et al.

Tests, Methods, and Approaches
The U&A WG used a variety of methods to discover current data practices, attitudes, and opinions from many of the stakeholder groups initially identified (Figure 3), to measure the impact of DataONE publications and presentations, and to plan the future work of both the SC and U&A WGs. Figure 6 shows the methods used over time for each group, including surveys, usability tests, interviews, persona development, and environmental/website scans. Taken together, these methods provided input for improving DataONE products and services and demonstrated some of the impacts of DataONE on the broader community.

Usability/UX tests
User Experience (UX) testing was integrated into the design, development, and refinement of the DataONE technical infrastructure. UX testing measures the usability, efficiency, and effectiveness of a product or a system by capturing the users' experiences to identify problems. 13 The goals of UX testing in DataONE were to improve DataONE products and to help understand community needs and expectations. Iterative UX testing and evaluation using a variety of usability testing methods occurred throughout the project. During the design and development phases, heuristic evaluations, prototype testing, and eye tracking studies were completed to identify any problems before the product was released. Following product release, iterative UX testing was performed to ensure the product continued to meet users' needs. Some of the products evaluated include:
• The current DataONE Search and the former ONEMercury Search
• Specific parts of the DataONE Search (e.g., provenance display, semantics display, sign-in features, member node profiles, metadata display)
• DataONE website
• Data Tools (e.g., MatLab, DMPTool, metadata editor, ONEDrive)
13 Usability.gov: https://www.usability.gov/how-to-and-tools/methods/usability-testing.html
To reach users from across a range of DataONE stakeholder groups, UX testing was conducted at conferences, scientists' work places, the University of Tennessee's state-of-the-art User-eXperience Lab, by telephone, and online. Approximately 50 UX studies were conducted throughout the project.
In addition to helping improve its products and services, UX testing strengthened DataONE's relationship with its users. Users felt a sense of pride and connection with the project because they were able to be a part of developing and refining the products and services.

Surveys
When the project began in 2009, published reports of empirical research into the data management, data sharing, and data reuse practices and attitudes of DataONE's key stakeholder groups were limited or non-existent. Surveys were designed to gather data to guide the development of products and services, and to understand where more education and training was needed. Understanding the user (and potential user) communities was not the sole responsibility of the U&A WG. The Community and Engagement WG also worked on understanding user practices and attitudes in order to identify training opportunities and create shared materials and webinars. Additionally, DataONE learned from other projects investigating changing stakeholder perceptions and practices (see for example, Wallis, Rolando, and Borgman, 2013; Faniel, Kriesberg, and Yakel, 2016; Van Den Eynden et al., 2016; Yoon, 2017; Yoon and Schultz, 2017; Bezuidenhout and Chakauya, 2018).
The DataONE U&A WG first prioritized which of the many potential stakeholders were key to changing the culture of data practice and then made a recurring plan to study these key stakeholders over time. The two primary stakeholder groups were 1) scientists in all workplaces and 2) academic libraries and librarians. Knowing stakeholder attitudes and practices and how they may be changing helped to understand how the culture for open data could be improved. It was decided to survey those groups every three years for a total of three cycles. Scientists and librarians also often serve a dual role as educators or data managers, so those additional stakeholder groups were reached by surveying scientists and libraries.
Surveys of scientists were published in 2015a; 2019a, in press). Surveys of academic libraries in North America were published in 2012 and 2015 (Tenopir et al., 2015b) and 2019 (Tenopir, Allard, Kaufman, Sandusky, and Pollock, 2019a, in press) and of European academic libraries in 2017. The unit of analysis for the library surveys was the library as an organization, measuring the policies, practices, and services of the library as a whole. To capture the attitudes of the librarians who work in those libraries, we also surveyed individual academic librarians (Tenopir, Sandusky, Allard, and Birch, 2013; Tenopir et al., 2019).
Interviews were conducted both to explore new areas of stakeholder research and interests and to triangulate quantitative survey results. Supplemental interviews with members of key stakeholder groups were conducted to augment survey data, which allowed a more nuanced picture and probed issues that needed clarification from survey responses. For example, the second libraries survey showed less progress in offering research data services (RDS) than was expected based on answers about future planning in the first survey. Interviews with five directors of academic libraries revealed that implementing RDS was more time-consuming than they originally thought or that other priorities had emerged in the meantime (Tenopir et al., 2015b).
The goal of another project was to understand the role of environmental data in emerging research communities, here defined as those that have begun to converge around new areas of science and new scientific challenges. In 2016, interviews were conducted with domain scientists who had participated in convergence research in the area of environmental health to understand the role of data in their research teams (Pollock et al., 2019b). Participants were selected based on their co-authorship of an environmental health journal article that made use of open environmental data held by a DataONE member repository.
Participants described challenges when sharing data within these multidisciplinary convergence teams as interpersonal rather than technical, related to things such as making sure data are understood even by non-domain experts. Participants described team members filling a role that can be described as a data mediator, a trusted member of the team skilled at communicating across disciplines, who is often relied upon to interpret the raw data for others. Additional interviews with four directors of synthesis centers that have helped facilitate environmental health research also point to the interpersonal challenges of sharing data among convergence research teams (Pollock, Allard, Yan, and Parker, 2019a). Here again, respondents primarily described interpersonal and disciplinary-level data challenges and noted the need for expert personnel able to listen and communicate across different domains, particularly as disciplinary divides between environmental and health science remain. Additional interviews are recommended to examine the role of environmental data in other convergence research communities (Parker, Pollock, and Allard, 2018).

Environmental scans
Surveys and interviews monitor attitudes and behaviors of individuals or institutions, but do not give a big picture explanation of the data management landscape. Organizations have come to recognize the need for gauging their place in the broader environment by assessing resources and relationships and making adaptations based on their findings (Bolman and Deal, 2009). Environmental scans were conducted in both Fall 2013 and Fall 2018. These were multifaceted analyses of projects and initiatives in the DataONE mission space to help DataONE leadership better understand the existing competitive ecosystem. Assessing DataONE's place in this broader environment provided valuable information and insight to inform the transition from a project to a sustainable program.
In 2018, 21 organizations were identified whose missions aligned loosely with the DataONE mission. 14 Four were comparable to DataONE in that they held metadata, but not data (Forrester, Allard, Cannon, Pollock, and Specht, 2019). One organization was characterized as a data search tool and the rest represented either data support services or data repositories.
Results from the scan indicated that DataONE is well positioned to concentrate on the following service areas that currently exist or are extensions of on-going work: usage reporting, data replication, and data quality. Additionally, DataONE's proximity to the data distinguishes it from training-only or data support services and gives DataONE a competitive advantage in the area of data science training.
14 DataONE Mission: Enable access and use of data about life and the environment.

Summary of Foundational Work and Assessments
As shown in Figure 6, multiple methods were used to study the major stakeholder communities, in particular focusing on scientists and on libraries and librarians. The studies assessed both the stakeholders' ability to use the DataONE cyberinfrastructure through usability testing and their attitudes toward and practices regarding research data management or the data landscape in general. Reflection on these studies paints a picture of change, sometimes slower than expected, and a growing realization of the importance of sound data practices by some segments and also of barriers that inhibit data sharing. The results of the studies (and often the datasets associated with them) have mostly been published and widely disseminated (Appendix B provides a bibliography of the WG publications). The purpose of this paper is not to repeat detailed findings from those publications, but instead, to highlight some of the important findings above and show the reach and impact of the DataONE assessments.

Broader Impacts -Outcomes and Evidence
There are a few broad areas that categorize the impacts of the SC and U&A WGs: understanding research data management practices; developing practitioner and professional research data management communities; and improving the usability of research data management resources. The specific outcomes and work that was undertaken provide evidence of DataONE's impact on research data management and specifically the work of these two working groups.

Impact of the Working Groups on the Awareness, Learning, and Understanding of Research Data Management
As the previously described literature on working groups and research data management shows, a dedicated group of interdisciplinary members can collectively make important contributions to the issues they are working on together. By participating in the scholarly discourse, the WGs promoted the importance of sound data practices and conveyed results and insights from thousands of scientists and librarians. This can be demonstrated by the number of scholarly works (Appendix B) and presentations (Appendix C) produced by all WG members and participants. Between 2009 and 2019, 48 papers were published across 25 different journals and eight conference proceedings. Based on subject classification in Ulrichsweb.com™, the disciplinary audience reach of the publication titles present in the database is shown in Figure 7. Sciences include environmental studies, earth sciences, biology, agriculture, and astronomy. For the ten-year period, and as of writing, members of the SC and U&A WGs made at least 170 presentations (talks, papers, and posters) in 22 different countries (Figure 8). 15 The reach is actually greater, because there were also at least 13 virtual presentations with audiences in multiple locations (Table 1). Some were recorded for viewing later. While it is unknown exactly how many people were present at all these face-to-face, virtual, and recorded presentations, an estimated minimum of 5,000 people heard a DataONE related presentation from SC or U&A WG members.
The WGs disseminated their understanding of sound data practices, as well as barriers to data sharing and data reuse, through scholarly publication. One measure of evidence of the influence of DataONE WG activities on other researchers is citation analysis. To measure this impact, the U&A WG compiled a list of publications (Appendix B) resulting from the various activities and searched for citations in the Clarivate Web of Science Database (WoS), Elsevier Scopus database, and Google Scholar. Altmetrics scores were also collected. Results are presented in Table 2. 16 Survey data from several of these publications were placed in data repositories to make them accessible and citable. Additional metrics were gathered to illustrate the discoverability and potential reuse of the individual data sets through views and downloads recorded by the repositories (Table 3).
16 Citation counts collected July 1, 2019

WG participation
The large and varied composition of the WGs catalyzed new partnerships. Over the period 2010-2019, there were a total of 27 WG members from a range of organization types and professional roles and more than 79 volunteer affiliates (e.g., students, post-docs, visiting scholars, interns, practitioners) who attended WG meetings. In total, 88 individuals came from eight countries (Figure 9) and represented 40 organizational affiliations. More than half of all WG members and affiliates are characterized as working in academic organizations (Figure 10).
The SC and U&A WG members conducted internal assessments of satisfaction, perceived communication issues, and perceived effectiveness of the WG model. These assessments included members of all DataONE working groups, not just the SC and U&A WGs (Crowston, Specht, Hoover, Chudoba, and Watson-Manheim, 2015b). Results from the Phase I analysis indicated that working groups can be effective when they are structured well (Crowston et al., 2015b). While team problems are likely to arise, shared routines and mental models, such as openness to diverse opinions, shared communication practices, and active participation of bridge builders such as librarians, can lead to success (Crowston et al., 2015b). Crowston et al. (2015b) found that the DataONE working groups generally functioned well, with a commitment to share information and keep group members informed. Overall, working group participants felt their WG was successful and above average in comparison to other groups, regardless of which group the respondent belonged to. Participants "felt the work of 'their' group was innovative, had produced valuable outcomes, and the team had worked effectively together. In summary, group members respected their fellow member's contributions and felt the work of their group was of value, the great majority expressing a longterm commitment to the project" (Crowston et al., 2015b).

Workforce development
Many of the volunteer affiliates in the SC and U&A WGs were graduate students at the time of their involvement with DataONE (Figure 11). These students have gone on to a variety of jobs, and four continued participating after transitioning from student status to being members of the workforce. As of April 2019, the employment of 34 of the 44 students (2010-2019) could be identified. The majority are employed in academia as faculty, librarians, and staff (Table 4) and hold data-related positions with titles such as: Data Curation Librarian, Data Scientist, Metadata Content Editor, Data Ingest, Engineering Project Manager, Information Specialist, Business Analyst, Lead Digital Analytics Manager. They hold positions in the USA, Brazil, India, and Turkey.
Although not specifically analysed, student participation in working group activities likely contributed to their successful employment. Student involvement in real-world experiences (e.g., practical internships) makes students more competitive on the job market, and employers recognize the benefits of participation in these types of activities (Ferrer-Vinent and Sobel, 2011; Pymm and Juznic, 2014). The DataONE WGs provided an opportunity for students to be involved in solving complex real-world problems and to further develop transferable skills, such as communication, teamwork, and professionalism, that are critical to success in any job environment.
In addition, the interdisciplinary nature of the working groups helped bring a variety of perspectives. There were a number of librarians and other information professionals in the groups, who led studies on how librarians learn about research data management and what libraries are doing to help scientists at their institutions (Tenopir et al., 2013; Tenopir et al., 2015b).
Many of these students were supported by grants from other agencies that resulted from their institutions' involvement in DataONE. The Institute of Museum and Library Services (IMLS) provided almost $2 million for several capacity-building grants to the University of Tennessee, Knoxville and the University of Illinois at Urbana-Champaign. Over a period of nine years, these grants helped the universities educate almost 30 graduate students in the areas of team science, data management, and usability and assessment.

Website scans
Academic librarians play an important role in disseminating information about DataONE and its resources. The U&A WG conducted website scans of academic library members of the Association of Research Libraries (ARL) to measure mentions of DataONE (a mention is defined as an occurrence of the exact term "dataone"). ARL websites were defined to include resource lists as well as library and research guides. Of the 116 academic library members of the ARL, 67% mentioned DataONE at least once, and a total of 357 mentions were found on these pages. Of the total mentions, 80% link to best practices or other informational tools and 20% point to the DataONE search and discovery system (Cannon, 2018).
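The mention-counting step of such a scan can be illustrated with a short sketch. This is not the procedure used by the U&A WG, whose tooling is not described here; it only shows one way to count occurrences of the term "dataone" in already-collected page text and to summarize the share of libraries with at least one mention. Case-insensitive matching, the function names, and the input structure (library name mapped to a list of page texts) are all assumptions made for illustration.

```python
import re

# Assumption: a "mention" is any case-insensitive occurrence of the
# literal term "dataone" (this also matches "DataONE" and "dataone.org").
MENTION_PATTERN = re.compile(r"dataone", re.IGNORECASE)


def count_mentions(page_text: str) -> int:
    """Count occurrences of the term in the text of one page."""
    return len(MENTION_PATTERN.findall(page_text))


def summarize_scan(pages_by_library: dict[str, list[str]]) -> dict[str, float]:
    """Summarize mentions across libraries.

    pages_by_library is a hypothetical structure: library name -> list of
    page texts (e.g., resource lists and research guides) gathered beforehand.
    """
    per_library = {
        library: sum(count_mentions(text) for text in pages)
        for library, pages in pages_by_library.items()
    }
    libraries = len(per_library)
    with_mention = sum(1 for n in per_library.values() if n > 0)
    percent = 100.0 * with_mention / libraries if libraries else 0.0
    return {
        "libraries": libraries,
        "libraries_with_mention": with_mention,
        "percent_with_mention": percent,
        "total_mentions": sum(per_library.values()),
    }


if __name__ == "__main__":
    # Illustrative input only; these are not data from the study.
    example = {
        "Library A": ["DataONE best practices; see dataone.org"],
        "Library B": ["Guide to citation managers"],
    }
    print(summarize_scan(example))
```

Classifying each mention by its link target (informational tools versus the search and discovery system) would require parsing the page markup and is left out of this sketch.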

Impact of the Working Groups on Improving the Usability of Research Data Management Products
Here too, an interdisciplinary and dedicated group of participants can collectively contribute in important ways to overall design and usability. The U&A WG applied usability analysis principles and methods to ensure that high quality, community-driven products and services were available to the community. This included conducting heuristic analysis and iterative usability testing of DataONE websites and products to ensure they followed general usability principles. Each instance of usability testing resulted in a report to the WG or team that had requested the testing. The work of the U&A WG improved the functionality and appearance of the DataONE website and search and enhanced the user experience on the interfaces, increasing the website's value as a data and data management resource. Many visual changes to the DataONE website and search occurred over the ten years in response to reports and recommendations provided to DataONE (Figure 12).
The U&A WG scope also expanded beyond DataONE-specific products and had an impact on improving the usability of products and services in the broader research data management community. An ICPSR-Sloan challenge grant to Syracuse University built upon the DataONE best practices and developed a Capability Maturity Model (CMM) for Research Data Management. The CMM provides a rubric to help projects or organizations assess their data management practices as a set of capability levels, from no data management practices (level 0) to institutionalized practices (level 3) (Qin, Crowston, and Kirkland, 2014).
Usability work has been conducted with several DataONE partners, including the Atmospheric Radiation Measurement Climate Research Facility (ARM), the United States Geological Survey (USGS), and the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC). This experience led to the incorporation of usability work into other DataONE-related projects, including Whole Tale, Make Data Count, and DMPTool, as well as community research projects. In one example, researchers from the University of Sao Paulo conducted usability tests of non-native English speakers' interaction with DataONE. Results indicated that non-native English speakers rely on the website search function rather than menus. These findings are important if an objective of a data repository search interface is to draw more users who are not fluent in English.

Conclusions
Opportunities for the broad, iterative types of assessment performed by the DataONE community on global-scale information infrastructure are rare. This paper provides a summative description and reflection on the activities and broader impacts of the work performed by two of the working groups established at the beginning of the DataONE cooperative agreement. The SC and U&A WG members and affiliates established close working relationships with each other and often met together, with participants from both groups contributing to many sociocultural, usability, and assessment projects. The working groups provided analysis that informed a wide range of DataONE activities, both to improve the functioning of the research project and in its interaction with the broader community.
Due to the interdisciplinary nature of the DataONE SC and U&A WGs and the extensive interaction of a broad range of members throughout the project, a new vocabulary and set of frameworks emerged for thinking about research data management, indicating that the working groups were conducting convergence research. For example, in envisioning the research data life cycle, the group decided on a simplified, action-oriented model that was relevant across disciplines and functions. Furthermore, this vocabulary and these frameworks have influenced thinking in the wider community, thus promoting broader convergence around data management, as documented above.
The position of WGs in the DataONE structure was deliberately pragmatic (Figure 13). Using the evaluation and advice of a wide group of experts organized into working groups, the project was able to refine and realize the project goals. The research community had a voice through the WG members, providing some quality assurance and reality-testing along the way. The WG members themselves were DataONE interpreters to their respective communities, acting as agents of outreach.
The WG members have expanded the research data community through many follow-on grants and projects. Many of these relationships cannot be quantified, but some are evidenced by grants or projects that brought together WG participants and others into new relationships. The interdisciplinary, transorganizational, and transnational reach of DataONE and its collaborative working group model has created and strengthened networks across the data ecosystem science space. An example of this was a successful application from one of the WG members with colleagues, largely met through DataONE, to the Belmont Forum, for Science-Driven e-Infrastructures Innovation (SEI) for the "enhancement of transnational, interdisciplinary, and transdisciplinary data use in environmental change research." This application (Building New Tools for Data Sharing and Re-use through a Transnational Investigation of the Socioeconomic Impacts of Protected Areas (PARSEC)) has partners from Brazil, France, Japan, and the United States and collaborators from Earth Science Information Partners (ESIP), ORCID, Research Data Alliance (RDA), DataCite, National Computational Infrastructure (NCI) Australia, and the British Geological Survey (BGS UK).
Figure 13. Application of the multi-level organisational innovation system as employed by DataONE based on Jantsch (1970).
DataONE is an early example of a large-scale project that built both a cyberinfrastructure and culture of data discovery, sharing, and reuse during its first two phases. In the decade from 2009 to 2019 the open data landscape has evolved, with increased awareness by scientists, mandates from government and other funding agencies, and requirements from publishers. The work done by the SC and U&A WGs provided a unique longitudinal look at how scientists, librarians, and other key stakeholders progressed in their thinking and practices around research data management. While the WG model is not part of the long-term sustainability of DataONE, the work of the SC and U&A WGs informs ongoing operations as DataONE transitions to its third phase. The new DataONE Governance Model is intended to ensure a community-driven organization composed of four primary groups: Management Team, Advisory Board, DataONE Community, and the DataONE Community Board. User Experience (UX) testing will continue to be an important tool to ensure the needs of the broad range of stakeholders are continually met.

Future Directions and Lessons Learned
The work and reach of these working groups will not stop after ten years of NSF funding. As shown, the studies continue to be cited and still influence the work of others beyond the immediate DataONE community. The DataONE cyberinfrastructure and community engagement activities will continue, headquartered at the University of California, Santa Barbara, National Center for Ecological Analysis and Synthesis (NCEAS).
Throughout the first ten years of DataONE, the members of the SC and U&A WGs have learned lessons about what works in large-scale projects that bring together global interdisciplinary communities (Crowston et al., 2015b) and cautionary tales of what can be done better. The working group model can be incredibly powerful, productive, and impactful. Some important lessons to increase working group effectiveness include:
• working groups need to have effective leadership that is responsive to the participants and project objectives;
• diversity (e.g., different countries, disciplines, career stages, demographics) enables flexible responses to challenges;
• face-to-face meetings (with adequate travel budget) are essential to establishing effective working groups; virtual communication can facilitate group cohesiveness between meetings;
• development and acceptance of convergence boundary objects (e.g., the data life cycle model) facilitates communication across a multidisciplinary project;
• foundational work performed by the working groups influenced and shaped the nature, operations, and success of the project as a whole;
• feedback is important to members of the working groups to acknowledge the importance of their contributions to the project as a whole;
• iterative usability and assessment provide quality assurance mechanisms at all stages of a project.
Although this working group model is successful in building community and achieving the goals of a large-scale project, sustaining it after ten years of funding will be a challenge. Not all working groups or members will continue, but the strong sense of a shared purpose should help ensure that research on topics related to data sharing and data reuse continues. One option for maintaining momentum is the toolkit approach proposed by Gold et al. (2019).
While not all projects will be on the scale of DataONE, some lessons are important for projects of any size. Articulating and understanding the needs, attitudes, behaviors, and expectations of external and internal stakeholders is key to success. This needs to be a continuous process throughout the life of the project as the landscape changes. Organizations often fail to adapt as innovations introduced at the onset of a project become standard practices. It is vital for a project to institutionalize learning in order to advance. As part of this, multiple types of assessment are useful, including surveys, interviews, landscape analysis, and others. Engaging stakeholders in iterative usability testing at all stages of any development project is crucial. Diverse and well-functioning working groups contribute to the larger organization's cycle of innovation, assessment, and integration. However, if a project lacks a formal working group structure such as that described in this paper, the project will benefit from seeking feedback and including diverse points of view throughout the life of the project.