ExPaNDS: Laying the Foundations for Achieving Open Science for Everyone

Diamond Light Source, the UK's national synchrotron, has been leading work package 6 (WP6) responsible for dissemination and outreach, working alongside European light source and neutron source partners to deliver the foundations required for their collective science to be open to everyone.
The collaboration of 10 national Photon and Neutron Research Infrastructures (PaN RIs) (Figure 1) from across Europe, based on the European Open Science Cloud (EOSC) services in partnership with EGI [1], has worked closely with PaNOSC, a European project gathering six European RIs.
The European Open Science Cloud (EOSC) Photon and Neutron Data Service grant (ExPaNDS) has worked to deliver a shift in policy so that the FAIR principles (Findable, Accessible, Interoperable, Reusable) are considered and, in some cases, applied according to the users' needs. One aspect of the work has looked to harmonize efforts to migrate a facility's data analysis workflows to EOSC platforms, enabling them to be shared in a uniform way through the development of search Application Programming Interfaces (APIs) wherever these are technically feasible to implement.
This article has been corrected with minor changes. These changes do not impact the academic content of the article.
The ExPaNDS project plan was designed around six work packages: management and sustainability (WP1), enabling FAIR data (WP2), EOSC data catalogue services (WP3), EOSC data analysis services (WP4), training activities through EOSC platforms (WP5), and dissemination and outreach (WP6).
A series of facility use cases [2] has been produced as part of the delivery of the grant.
A consultation was carried out that reached over 14,000 researchers and asked about their needs for better data management. The consultation, which received feedback from almost 200 respondents, revealed the following six key findings:
• 82% of respondents declared making at least some of their data open and 71% declared making at least some of their data FAIR.
• Almost 70% of respondents declared that "documenting the datasets (auxiliary and main) so that the results can be replicated and understood" was a challenge to making their data FAIR and open.
• Almost 50% of respondents declared that "data are too big to share" was a challenge to making their data FAIR and open.
• Over 60% of respondents declared that their institution does not provide a platform to publish their research data.
• Out of the respondents using a data sharing platform (n = 71), either provided by their institution or a third party, nearly 40% (n = 27) declared that the platform did not provide sufficient information for a fresh analysis to be carried out.
• More than 70% of respondents declared that they did not use a digital notebook.
Reflecting on these findings, the next steps will require a focused effort to digitalize laboratory notebooks and provide a range of technology to users.
During the grant, search APIs that can be implemented in any current or forthcoming data management architecture were developed, in work led by the Paul Scherrer Institute (PSI).
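To illustrate what a uniform search interface makes possible, the following minimal sketch builds one query and addresses it to several facility catalogues. It is not the API developed under the grant: the base URLs, the `/Datasets` endpoint, the JSON `filter` query parameter, and the `techniques.name` field are all assumptions made for this example.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical facility endpoints; real deployments will differ.
FACILITY_BASE_URLS = {
    "facility-a": "https://catalogue.facility-a.example/api",
    "facility-b": "https://catalogue.facility-b.example/api",
}


def build_dataset_query(base_url, technique, limit=10):
    """Return a search URL for datasets recorded with a given technique.

    Assumes a JSON `filter` query parameter on a `/Datasets` endpoint;
    both are assumptions made for this sketch.
    """
    search_filter = {
        "where": {"techniques.name": technique},  # hypothetical field name
        "limit": limit,
    }
    query = urlencode({"filter": json.dumps(search_filter, sort_keys=True)})
    return f"{base_url.rstrip('/')}/Datasets?{query}"


def build_federated_queries(technique, limit=10):
    """Build the same query for every configured facility."""
    return {
        name: build_dataset_query(url, technique, limit)
        for name, url in FACILITY_BASE_URLS.items()
    }


def decode_filter(url):
    """Parse the JSON filter back out of a query URL (useful for logging)."""
    return json.loads(parse_qs(urlsplit(url).query)["filter"][0])


if __name__ == "__main__":
    for name, url in build_federated_queries("small-angle scattering").items():
        print(name, decode_filter(url), url)
```

Because every catalogue is assumed to accept the same filter, a portal only has to build the query once and vary the base URL, which is the practical benefit of harmonizing the search interface.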
ExPaNDS also enabled extensive advocacy work: attendance, presentations, and poster sessions at facility user meetings (such as those of SOLEIL, DESY, HZB, and Solaris, to name a few), a joint annual symposium with PaNOSC to disseminate the grant's progress, and webinars. The grant also issued a call for involvement and attracted over 50 ambassadors willing to be involved throughout the grant and answer direct user questions.
Through a number of online events, ExPaNDS succeeded in facilitating important discussions with key audiences. For example, the Librarian Symposium [3] was aimed at librarians and data managers who work with and support PaN science facilities. In particular, the symposium focused on the interface between publications and data. Questions posed included: How can we bring these together better for the benefit of all? Where are the gaps? What do we need to bridge them?

Conclusions
The PaN community is extremely diverse, covering virtually all fields of research. There is huge diversity in data management approaches, and harmonization remains a challenge when developing data pipelines and making them meaningful in catalogues. It has become clear that data require a lot of contextualization for data mining to be successful. The most successful communities are those that already have bespoke data catalogues into which facilities can feed their data pipelines, so that the data are displayed within a wider context. However, the partners' work has been extremely valuable in advocating for, and rooting activities in, PaN user needs. The grant has allowed detailed mapping of the data pipelines to understand the extent to which the FAIR principles have been deployed and how the chosen communities make their data open.
Professor Dr. Volker Gülzow, IT Director at DESY, explains, "The grant provided time and resources to do in-depth analysis of the current state of data plans. One key conclusion is that the level of implementation of the FAIR principles within these diverse communities is very disparate. The detailed mapping does allow for options to be considered when making data available, from facilities being ready to implement data catalogue APIs to those where resources are extremely limited having the option to deposit data onto third party platforms like Zenodo or university-based data catalogues." ExPaNDS exposed the technical challenges for facilities to implement the data catalogue APIs developed under the PaNOSC grant. Some facilities across Europe are clearly more advanced on this than others, and a mixed approach is required for progress to be made.
Collaborations such as Lightsources.org, LEAPS, Neutronsources.org, and LENS, all of which help to foster relationships between PaN facilities and communities, have an important role to play when it comes to sustaining the progress made by the ExPaNDS grant towards standardized, interoperable, and integrated data sources and data analysis services for PaN facilities. Planned and sustained efforts in this area will undoubtedly help to advance future science and global research for the benefit of society as a whole.
In the following, we'll take a more focused look at work packages 1-5.

WP1: Management and sustainability
WP1 has been responsible for the project coordination and its finances. It ensured the successful achievement of the project's objectives, the timely delivery of deliverables and milestones, and has guaranteed that the overall architecture of products is in line with the description of work and is interoperable with the requirements and structure of the EOSC core services and the marketplace portal. Examples of those services are the data catalogues, the data portal, the training platform, various data analysis services and templates for data policies and DMSs. In the context of the EOSC, WP1 has supported our partners in registering themselves as service providers and in offering their services within the EOSC marketplace.
Furthermore, WP1 has kept a close connection to projects with similar objectives but, in particular, with our sister project PaNOSC. The tight coordination with PaNOSC in terms of common technical and dissemination activities has helped us build a strong network inside the EOSC for the PaN community.
Another essential objective of WP1 has been to guarantee the sustainability of our products and our work on common policies in the context of open data. To that end, we have supported our project partners in disseminating our achievements to their senior-level management and in offering opportunities to participate in future projects that pick up on those achievements. Finally, WP1 has ensured that processes and achievements of the project are properly documented and are available as FAIR data in the future.

WP2: Enabling FAIR data for PaN national RIs
The increasing size and complexity of data from experiments means that it is becoming harder for users to manage, while the opportunities for new science from analyzing and reusing that data become greater. The notion of Findable, Accessible, Interoperable, and Reusable (FAIR) data has been proposed, providing guidelines on managing data to support sharing and reuse, and FAIR data is a founding principle of the EOSC. However, how these guidelines are reflected in practice within photon and neutron facilities remains to be explored in detail.
ExPaNDS has produced guidelines and recommendations providing a toolkit that facilities can use to ensure that data generated from experiments is FAIR, and thus suitable for sharing and reuse, as well as easier for the experimenters themselves to use. Firstly, the organizational context is considered in guidance on setting a data policy that supports data sharing, with a commitment to support the publication of FAIR data, while protecting the user's priority of conducting science. A second recommendation discusses establishing a FAIR experimental process, including coordinating the tools and information systems of the facility to support the collection of rich metadata about data, so researchers can discover and understand the data sufficiently to allow reuse. A third recommendation discusses Persistent Identifiers (PIDs), which uniquely label resources such as data, papers, and even people so that they can be unambiguously found and used, while allowing credit to be given to the experimental team. Data Management Plans (DMPs) are discussed in the fourth recommendation, bringing "FAIRness" to a particular experiment, specifying additional metadata describing that instance. DMPs are time-consuming for users to produce, and the guidance considers how this burden can be significantly reduced by integrating DMPs into the experimental process.
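As a concrete illustration of how a PID can be checked before it is stored in a catalogue, the sketch below validates an ORCID iD, a widely used identifier for people. The recommendations described above do not prescribe this scheme; ORCID is used here only as a familiar example. Its final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm, which catches most typing errors.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string is a well-formed ORCID iD with a valid check digit.

    Accepts the usual hyphenated form, e.g. 0000-0002-1825-0097.
    """
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdecimal():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    for candidate in ("0000-0002-1825-0097", "0000-0002-1825-0098"):
        print(candidate, is_valid_orcid(candidate))
```

A check of this kind only confirms that the identifier is well formed; confirming that it resolves to the intended person or dataset still requires a lookup against the identifier's registry.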
Finally, we need to understand how well facilities are doing in supporting FAIR data. ExPaNDS has developed a self-assessment technique to allow a facility to review how its processes have been set up to ensure that FAIR data is generated from experiments. This technique has been tested in a self-assessment exercise across ExPaNDS facilities, giving them insight into their journey towards FAIR data.

WP3: EOSC data catalogue services for PaN national RIs
Metadata catalogues are a pivotal tool for EU photon and neutron national RIs and their users to implement open research data policies according to FAIR principles. WP3 aims to coordinate the adoption and enhancement of existing solutions. Furthermore, it links them to the domain federated metadata catalogues (e.g., PaNOSC) and so makes them available in the domain portals and via services of the European Open Science Cloud for all users of the EU national photon and neutron RIs, in conformance with the rules of participation of the EOSC.
WP3 aims to enable the implementation of harmonized policies, practices, and guidelines coordinated in WP2 (enabling FAIR data) and provide services for WP4 (data analysis services). It facilitates federated access to metadata, experimental data, derived data, and related information. Exploiting existing links with communities such as PaNOSC, services such as OpenAIRE, and standards such as NeXus, WP3 will enable and coordinate the adoption of agreed standards at a national science community level.
Thanks to contributions from all partners, WP3 has been able to map the current status of the RIs involved and consequently set targets and areas of focus. This has led to prioritizing the adoption of data catalogues, in particular ICAT and SciCat, and enabled almost all facilities to integrate one into their existing infrastructure. Over half were able to exploit links with the PaN portal and the EOSC services. Towards FAIR, WP3 has participated, along with PaNOSC, in delineating the aforementioned links, defining the baseline standards for search APIs, and linking data catalogues to data search portals such as B2FIND, OpenAIRE, Google Dataset Search, and the PaNOSC portal.
Data curation is of central importance in achieving FAIR data. WP3 has defined, in conjunction with WP2, new PaN metadata standards known as the PaNET ontology. This outlines a hierarchical controlled vocabulary for PaN techniques, which can be accessed through a dedicated RESTful service.
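The value of a hierarchical controlled vocabulary for search can be sketched as follows: a query for a broad technique should also return datasets tagged with any narrower technique beneath it. The parent links and labels below are illustrative placeholders, not actual PaNET terms or identifiers, and the sketch uses an in-memory table rather than the RESTful service.

```python
# Illustrative child -> parent links; NOT actual PaNET terms or identifiers.
PARENT = {
    "x-ray absorption spectroscopy": "spectroscopy",
    "EXAFS": "x-ray absorption spectroscopy",
    "small-angle scattering": "scattering",
    "SAXS": "small-angle scattering",
}


def ancestors(term):
    """Return the chain of broader terms above `term`, nearest first."""
    chain = []
    seen = {term}
    while term in PARENT:
        term = PARENT[term]
        if term in seen:  # guard against a malformed, cyclic vocabulary
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def matches(dataset_technique, query_term):
    """True if the dataset's technique is the query term or narrower than it."""
    return dataset_technique == query_term or query_term in ancestors(dataset_technique)


def search(datasets, query_term):
    """Return names of datasets whose technique falls under `query_term`."""
    return [name for name, technique in datasets if matches(technique, query_term)]


if __name__ == "__main__":
    catalogue = [("d1", "EXAFS"), ("d2", "SAXS"), ("d3", "spectroscopy")]
    print(search(catalogue, "spectroscopy"))
```

With a flat list of free-text technique names, the query for "spectroscopy" would miss the dataset tagged "EXAFS"; walking the hierarchy is what lets a catalogue tagged at one level of detail be searched at another.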

WP4: EOSC data analysis services for PaN national RIs
The purpose of WP4 has been to provide photon and neutron users with the ability to find and run analysis workflows against the EOSC-aligned data services. This consolidates the reusability of the workflows, which is key to fully developing the FAIR principles for photon and neutron data. Integrating these existing data analysis services with EOSC services such as browser-driven remote desktops and Jupyter analysis services helps to ensure that they are accessible and interoperable. Finally, standardizing against these EOSC analysis services greatly increases the reusability and reproducibility of the underlying algorithms used to understand the FAIR data. For reproducibility, and as a basis for further continuous development, WP4 selected reference photon and neutron data sets. Deliverable D4.4 fully described the implementation of five challenging data analysis pipelines as remote data analysis services, taking into account the needs of representative scientific communities while supporting the diversity of the institutions' existing computing infrastructures. Each of these five use cases has been developed by a different facility and, in order to achieve sustainable solutions, all project partners based their implementations on concepts and environments that are typical of state-of-the-art open-source projects, such as GitHub, GitLab, and CI/CD, or on core services already available in the EOSC Marketplace.
The coordination of efforts by ExPaNDS helped its partner facilities to provide PaN scientists in Europe with standardized data analysis pipelines available through the EOSC portal and to conceive an overall modular architecture around generic APIs [4]. It also opened the way for more facilities to use the VISA portal.
Deliverable D4.5 provides an explicit description of the user relevance of these activities, the strategy for integration of existing analysis services into EOSC, and deployment details of selected reference analysis services as specific use cases, with reference to linked training material coordinated with WP5.

WP5: Training activities through EOSC platforms
Training is an essential part of the ExPaNDS project: providing training material accessible to staff and users from photon and neutron (PaN) research infrastructures is one of the key objectives of the project. ExPaNDS partners aim to educate and train users to exploit the services developed during the project by organizing a series of events and delivering training materials on a dedicated platform (pan-training.eu).
The pan-training.eu portal is a result of the two H2020 projects ExPaNDS and PaNOSC. Our portal consists firstly of a catalogue for the registration of third-party training materials such as tutorials, videos, repositories, or training workflows to show learning steps and dependencies between materials. The portal also offers a tightly coupled e-learning system with PaN-specific learning courses with Jupyter notebook support. The catalogue is based on an established solution developed by the Elixir community, which is also available within EOSC. With the catalogue, we are strengthening the role of EOSC services for the PaN scientific field, with a focus on providing a sustainable training, learning, and documentation platform for the PaN community, including initiatives such as LEAPS, LENS, and Laserlab Europe. Currently, the platform gathers 181 materials including almost 50 courses, as well as 303 events and 10 workflows.
The organization of several events has also enabled the promotion of the outcomes of ExPaNDS. Several workshops have been organized, covering a wide range of topics such as FAIR principles, data stewardship, data management, catalogues, and data analysis services. These actions, dedicated to service providers and users, fostered faster adoption of best practices by a larger part of the PaN user community. All the material and recordings of the workshops can also be found on the training platform: https://pan-training.eu/.