EasyScienceGateway: A new framework for providing reproducible user environments on science gateways

Science gateways have become a core part of the cyberinfrastructure ecosystem by increasing access to computational resources and providing community platforms for sharing and publishing education and research materials. While science gateways represent a promising solution for computational reproducibility, common methods for providing users with their user environments on gateways present challenges which are difficult to overcome. This article presents EasyScienceGateway: a new framework for providing user environments on science gateways to resolve these challenges, provides the technical details on implementing the framework on a science gateway based on Jupyter Notebook, and discusses our experience applying the framework to the CyberGIS‐Jupyter and CyberGIS‐Jupyter for Water gateways.

user environment, which quickly becomes infeasible. Approaches that attempt to bundle specific repositories of code 13 or workflows 14 represent powerful tools for reproducibility, but they shift the burden to users and attempt to re-create the user environment.
We set out to find a solution for providing multiple user environments that avoids these pitfalls and trade-offs. In approaching the problem, we held many brainstorming sessions with researchers and developers of our science gateway to determine the criteria we would need to meet to provide long-term reproducibility. Through those discussions, we came up with five characteristics of an optimal solution: multiplicity, feasibility, persistence, flexibility, and simplicity. Multiplicity allows a gateway to maintain previous versions of user environments while continually releasing updates, which is critical for maintaining reproducibility. Feasibility requires that any solution must be computationally achievable. Persistence requires that our user environment need not be recreated, avoiding issues with changing dependencies and software deprecation or deletion. Flexibility means that advanced users still have the necessary tools to customize and mix-and-match their environments as needed. Last but not least, simplicity is a key criterion that ensures any solution can be used by the gateways' user base and remains accessible to non-technical users.
In this article, we present EasyScienceGateway, a new framework for managing user environments on science gateways that addresses these five criteria. Section 2 discusses the current solutions for providing user environments. We describe the new framework in Section 3, which integrates a variety of existing tools (Anaconda, 15 Easybuild, 16 Lmod, 17 and Jupyter 10 ) to create a powerful, generic framework for providing reproducible user environments on science gateways. Going beyond the technical details, Section 4 discusses how we have applied this framework to two existing science gateways in the geosciences, CyberGIS-Jupyter 18 and CyberGIS-Jupyter for Water (CJW). 7

RELATED WORK
We review two main approaches to providing reproducible user environments on science gateways. Section 2.1 discusses science gateways which provide a pre-built, self-contained user environment. Section 2.2 concerns "package tools" 14 which allow users to package their code and workflows into their own self-contained environment, such as the functionality provided by Binder. 13 While both of these approaches are useful and have their benefits, as we will discuss in their respective sections, they fail to meet our five criteria, as summarized in Table 1.

Science gateways
Although not all publications on science gateways provide the technical details on how they provide their user environments, the typical deployment is detailed by Zonca and Sinkovits. 19 Science gateways deployed using JupyterHub most often rely on Docker, 20 and Docker images provide a complete and stable snapshot of a user environment. Gateways often rely on a containerized user environment such as those hosted by the Jupyter team,* with additional software, kernels, and features added as necessary. 19 In this framework, users are provided pre-built, containerized user environments. Figure 1 illustrates a typical container for a science gateway, based on Jupyter's docker images: the entire user environment is completely within the container. The combination is theoretically perfect for reproducibility, but to maintain that perfect reproducibility, the container can never be changed ("containerized gateway with static image" in Table 1). This clearly introduces major issues for science gateways trying to provide cutting-edge software and respond to the needs of their communities.
This gives gateway managers a trade-off between reproducibility and functionality: updating software decreases reproducibility, but locking in a user environment on launch and keeping it consistent forever means that the gateway cannot provide cutting-edge software or respond to the needs of its community of users. One may be tempted to provide a user environment with the same set of software that is continually updated ("containerized gateway with continually updated image" in Table 1) as a way to provide a mostly consistent user environment without being tied to a singular release, simply ignoring the criteria of multiplicity and persistence; this is the approach taken by previous iterations of CyberGIS-Jupyter, 5,18 but it ultimately causes issues. Updating software runs into issues of changing dependencies, software packages being removed from package managers, and software packages restructuring or changing the names and arguments of functions. This creates a challenging upgrade process and, worse yet, breaks the promise of reproducibility made to users of the science gateway. Additionally, many gateways (including CyberGIS-Jupyter, CJW, 7 and CUAHSI JupyterHub 8 ) provide example applications or notebooks for educational and research purposes which have to be checked for every release to ensure that minor upgrades did not cause errors.
A solution which is possible using JupyterHub 10 is to continue updating images while providing users with all previous versions of the container to choose from when starting up their container ("containerized gateway with multiple images" in Table 1). However, this presents a major technical barrier to inexperienced users, who are confronted with a list of container versions they know nothing about. Switching between versions of the environment in this approach requires shutting down one's Jupyter server and spinning it back up with the desired version. This means that two versions of the environment cannot coexist, making mixing-and-matching versions of software impossible for advanced users who may desire that functionality.
Another possible solution would be to keep adding new kernels and software to the original container while minimizing changes to existing user environments ("containerized gateway with continually extended image" in Table 1). An example would be installing new versions of software in an Anaconda or virtual environment alongside the old ones, then providing an image with both the new and old kernels. Unfortunately, this approach is infeasible for a few reasons. For one, the image would quickly bloat to untenable sizes. Even without keeping multiple versions of software in the container, the CyberGIS-Jupyter image grew to 22.4 GB uncompressed. Second, continuously building on top of an image (adding layers) is bad for performance, and many containerization technologies, such as Docker, 20 place limits on the number of layers. Third, keeping the same base image in perpetuity would mean foregoing crucial security updates.

Packaging tools
Recent work has produced tools for creating user environments from a repository 13 and capturing the execution of experiments. 14 Binder 13 offers an excellent open source tool for computational reproducibility, but unfortunately it cannot resolve all of the issues mentioned ("Binder/ReproZip" in Table 1). A core element of Binder is its repo2docker tool, which is capable of converting code repositories into Docker images for code execution. 13,21 The team behind Binder has also created infrastructure for publishing and running code, † which is an excellent service to the scientific community. ReproZip 14 offers more advanced functionality by packing up required software, files, and environmental variables to replicate compute environments for reproducing scientific workflows run on Linux. These are powerful tools for achieving desirable levels of computational reproducibility, but such solutions come with two major disadvantages: (a) software is built at the time it is needed, increasing startup time and introducing the possibility that some of the dependencies of the environment are no longer available or have changed; and (b) the onus of specifying and managing software installations is passed to users, many of whom are domain scientists not comfortable with such responsibilities. This creates a technical barrier for users (simplicity) and requires re-creation of a user environment (persistence).

FRAMEWORK
To meet the criteria for reproducibility and usability laid out in Section 1, we constructed the EasyScienceGateway framework for providing and managing user environments on science gateways. Our new solution puts the software responsible for user environments outside of the container, as shown in Figure 2. In this approach, the scientific software and Anaconda environments that provide users with kernels are stored on a network file system (NFS) which is mounted within a user's container. Multiple user environments can be held in this directory, addressing multiplicity. Furthermore, this immediately eliminates the feasibility concern of containers growing larger with every release, while allowing science gateway managers to retain the exact files required for older releases of the user environment rather than attempting to rebuild them as Binder does, 13 addressing our persistence criterion. Additionally, we have managed to provide users with the same compute experience without adding technical barriers for users who lack in-depth technical knowledge about cyberinfrastructure (simplicity), while giving advanced users greater flexibility to customize their user environments (flexibility). This is necessary because, as shown in Table 1, none of the approaches with software in the container were able to satisfy all five of our criteria.
Designing a new framework for user environments in a way that meets our criteria relied on a variety of software tools and techniques.
Section 3.1 discusses how we moved key non-Python software out of the container with Easybuild 16 and Lmod. 17
FIGURE 2 A diagram of a user container in our new framework for CyberGIS-Jupyter. Much of the software that provides users their user environment is moved to the shared directory, while some essential packages and Jupyter remain in the container.

Software management with Easybuild and Lmod
Providing end-users with a user environment that utilizes software from an external shared file system is achieved using two software packages that form a layered approach. The bottom layer employs Easybuild 16 to install and manage software. Easybuild couples with a module system, in our case Lmod, 17 that allows end users to specify which pieces of software, and which versions, need to be loaded at a particular time.
Easybuild describes itself as "a software build and installation framework that allows users to manage (scientific) software on high performance computing (HPC) systems in an efficient way." ‡ The Easybuild project is designed to make computational research easier and provides recipes, called Easyconfigs, for managing thousands of pieces of software.
Easybuild was designed to work with Lmod 17 by providing modulefiles for the software that it builds, which makes managing software installed through Easybuild quite simple. Modulefiles provide Lmod with a record of the software and instructions for Lmod to make the software available in a user environment. Both Easybuild and Lmod are designed with multiple versions of software in mind, meeting our multiplicity criterion. This modular approach adds flexibility for advanced users, and the complexity is hidden from other users by loading a certain set of software for each kernel in the background. Additionally, the ubiquity of tools like Lmod and environment modules on advanced cyberinfrastructure means that many users with in-depth cyberinfrastructure knowledge are already familiar with the provided interface.
While the Easybuild community already provides a wide variety of software, unfortunately it does not contain all software. Despite this, we found that integrating software with the Easybuild ecosystem by writing Easyconfigs is relatively easy for technical users such as those managing science gateways. Easyconfigs rely on Easyblocks, essentially templates designed around generic software installation frameworks or specific software. The generic Easyblocks § include the most common installation paradigms such as installing from tarballs, installing from binary, and using make. These Easyblocks mean that designing a recipe, or Easyconfig, for new software in the Easybuild ecosystem is often as simple as providing a link to the software's source and letting the Easyblock take care of the details. For the CyberGIS-Jupyter 5 and CyberGIS-Jupyter for Water (CJW) 7 gateways, we had to implement a few Easyconfigs, and it is our aim that by working within the Easybuild ecosystem and contributing Easyconfigs for geospatial software packages, we can grow the Easybuild community and provide the geospatial community with a simple tool for installation and management of geospatial software.
To simplify managing software through Lmod, we have chosen to write "metamodules" for our user environments. These are designed to aggregate all of the software installed through Easybuild that comprises a user environment, with specific versions, into a single module to be loaded. These metamodules are modulefiles which load a collection of software by "depending" on them. This gives users a collection of software that is tested together to ensure compatibility and gives advanced users a baseline user environment that they are free to customize as they wish with the module commands provided by Lmod. The metamodules for CyberGIS-Jupyter and CJW are named "cybergisx" and "cjw" respectively and are versioned based on releases. However, metamodules for specific kernels and use-cases can be used if the complexity requires such an approach.
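As a concrete illustration, a metamodule can be written as an Lmod Lua modulefile that does nothing but pin a set of Easybuild-installed dependencies. The fragment below is a hypothetical sketch: the file path and every version number are illustrative assumptions, not the actual contents of a CyberGISX release.

```lua
-- Hypothetical metamodule: metamodules/cybergisx/0.9.0.lua
-- (versions below are illustrative, not actual CyberGISX releases)
help([[CyberGISX user environment, release 0.9.0]])
whatis("Name: cybergisx")
whatis("Version: 0.9.0")

-- depends_on() loads each pinned module and unloads it again
-- when the metamodule itself is unloaded.
depends_on("GDAL/3.2.1")
depends_on("PROJ/7.2.1")
depends_on("libspatialite/5.0.1")
```

A user (or a kernel launch script) then gets the entire tested stack with a single `module load cybergisx/0.9.0`.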

Anaconda environments
User environments are provided to users via Jupyter's kernels. In the typical deployment described in Section 2.1, the kernels were provided to users via Anaconda 15 environments installed within the user's container alongside the rest of the system-level software. However, this design reintroduces the issues of persistence and flexibility discussed in Section 2.1. Under the previous paradigm, new releases and updates to the gateway are accompanied by new containers which hold the updated user environment, making reproducibility difficult and reducing the flexibility of the environment.
While the EasyScienceGateway design of storing Easybuild-installed software outside of the user's container resolved many issues, it created some new ones to tackle. In particular, how do we install Python packages that depend on our Easybuild-installed software in the new framework?
A possible solution would be to take a simple base Docker image, ¶ create a container from it, mount the drive with Easybuild-installed software, load the software with Lmod, install the conda environments using this setup, and then commit # the changes to the container to create an image. However, this approach seemed overly complicated and prone to issues. Rather, we decided to simply move the Anaconda environments to the NFS alongside the Easybuild-installed software, as shown in Figure 2. In this model, we load the Easybuild-installed software while installing the Anaconda environments, and the entire user environment lies on the NFS. This bundling is illustrated in Figure 3. This solution eliminates the feasibility concerns of an ever-growing image as well as the reproducibility concerns, because the software does not take up space in the image and does not have to be re-built in a new image every release.
FIGURE 3 Illustrating how we bundle our Anaconda environments with Easybuild modules. Each conda environment is bundled with a set of modules from Easybuild which are loaded when the conda environment is installed or used. An example would be the "Conda A" environment circled in red bundled with the modules circled in red.

Kernels
While the new software installation paradigm discussed in Sections 3.1 and 3.2.1 gives us a lot of flexibility with software management, we need to ensure that users are still provided a simple and easy-to-use compute experience. This is important for all science gateways, but especially for CyberGIS-Jupyter because it is used extensively for educational purposes. 5 While we discuss user experience more extensively in Section 4, here we discuss the technical details of providing a seamless kernel with our new framework. Jupyter's kernels provide enough flexibility that we were able to tie together each kernel's Anaconda environment with the correct set of Easybuild-installed software (i.e., metamodules), providing the exact same user experience as before.
Jupyter 10 stores the key information on its kernel specifications in a file called kernel.json, || which details the display name, language, and other key features of the kernel. The argv section details a list of commands to execute when starting a kernel. In this list of commands, we have the kernel run a bash script, called prepend_and_launch.sh, which allows us to tie together the Easybuild-installed software with the Anaconda environment. In prepend_and_launch.sh, each kernel loads its associated metamodule, ensuring that this software is available not only for the Python packages in the Anaconda environment but also for users. Examples of the kernel.json and prepend_and_launch.sh scripts for CyberGISX's Python3 kernel are in Figures A3 and A4 in the Appendix, respectively. Because our Anaconda environments are persistent and our metamodules are versioned, this provides users with a persistent and reproducible user environment.
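The wiring can be sketched as follows; this is not the actual CyberGISX kernel spec (see Figures A3 and A4 for that), and every path, the metamodule name, and the version string are illustrative assumptions. The key idea is that argv routes the launch through a wrapper script before handing off to the environment's interpreter.

```python
import json

# Hypothetical kernel spec in the shape of a Jupyter kernel.json.
# Instead of launching the interpreter directly, argv first runs a
# wrapper (prepend_and_launch.sh) that loads the kernel's metamodule.
kernel_spec = {
    "argv": [
        "/software/scripts/cybergisx/prepend_and_launch.sh",  # wrapper: loads metamodule (path illustrative)
        "cybergisx/0.9.0",                                    # metamodule to load (name/version illustrative)
        "/software/conda/python3-0.9.0/bin/python",           # interpreter from the versioned Anaconda env
        "-m", "ipykernel_launcher",
        "-f", "{connection_file}",                            # substituted by Jupyter at launch time
    ],
    "display_name": "Python3-0.9.0",
    "language": "python",
}

spec_json = json.dumps(kernel_spec, indent=2)
print(spec_json)
```

Because both the metamodule and the Anaconda environment in argv are pinned to a version, starting this kernel always reproduces the same environment.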

Integration
Moving the software responsible for user environments outside of the container introduces a few small technical hurdles to providing an integrated experience. In this section, we discuss how we have integrated the software and solutions discussed in Sections 3.1 and 3.2 into a coherent solution for science gateways. The directory structure for the software directory used in the EasyScienceGateway framework, as illustrated in Figure 2, is given in Figure 4. This NFS hosts the Anaconda environments that provide our users with kernels (conda/), the software installed by Easybuild (easybuild/), our metamodules (metamodules/), and various scripts that tie everything together for a seamless experience (scripts/).
When users start up their JupyterHub-provided container, we augment the container's start-notebook.sh to also run our own pre-start script (scripts/cybergisx/pre-start-notebook.sh in Figure 4; script in Appendix A as Figure A2). This pre-start script starts by calling our set_environment.sh script, which sets environment variables including paths to our Anaconda environments and metamodules (available as Figure A1 in the Appendix). Using these environment variables, our pre-start script performs multiple tasks to prepare the user's container to work with our setup, including installing a list of default kernels. The files that specify each kernel are held in the NFS (scripts/cybergisx/kernels/ in Figure 4), and our pre-start script copies the files for each of the default kernels to the appropriate path in the container. On the CyberGIS-Jupyter 18 and CJW 7 gateways, we specify a list of default kernels rather than installing all of the available kernels to avoid inundating users with too many choices, but the rest of the available kernels are made available to users via a command line interface (CLI) detailed in Section 4.1.
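The kernel-installation portion of this pre-start logic can be sketched in Python for clarity (the real implementation is the bash script shown in Figure A2); the directory layout and kernel names below are hypothetical stand-ins for the NFS and container paths.

```python
import shutil
import tempfile
from pathlib import Path

def install_default_kernels(nfs_kernels, user_kernels, defaults):
    """Copy each default kernel's spec directory from the read-only NFS
    into the path where Jupyter discovers kernels inside the container."""
    user_kernels.mkdir(parents=True, exist_ok=True)
    installed = []
    for name in defaults:
        src = nfs_kernels / name
        if src.is_dir():
            shutil.copytree(src, user_kernels / name, dirs_exist_ok=True)
            installed.append(name)
    return installed

# Demonstration with temporary directories standing in for the NFS share
# and the user's home directory (names are illustrative).
nfs = Path(tempfile.mkdtemp()) / "scripts" / "cybergisx" / "kernels"
for kernel in ["python3", "python3-0.9.0", "R"]:
    (nfs / kernel).mkdir(parents=True)
    (nfs / kernel / "kernel.json").write_text("{}")

home = Path(tempfile.mkdtemp()) / ".local" / "share" / "jupyter" / "kernels"
installed = install_default_kernels(nfs, home, defaults=["python3", "R"])
print(installed)  # → ['python3', 'R']
```

Because only the names in `defaults` are copied, the remaining versioned kernels stay on the NFS until a user requests them through the CLI.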

FIGURE 4 The directory structure of the network file system that hosts the Easybuild-installed software, Anaconda environments, and scripts of the framework.
FIGURE 5 An example of how the upgrade process works with the EasyScienceGateway framework. New packages are installed with Easybuild and new conda environments are installed with Anaconda in the shared directory without affecting the old packages or conda environments.

Upgrading the user environment
A key advantage of the framework is that our process for upgrading the environment is easier, more reproducible, and requires less downtime compared to the conventional approach. Whereas upgrading the user environment in the conventional approach meant rolling out a new image, causing a disruption to users and changing the user environment, the new framework can roll out changes without interrupting users on the system, and our changes can exist alongside the old versions of kernels rather than replacing them, as illustrated in Figure 5. We can install newer versions of software with Easybuild (e.g., PROJ/j, libspatialite/k, and GDAL/l on the right side of Figure 5) without affecting the old versions. This allows our redesign to meet all of the criteria discussed in Section 1. The ability to keep previous versions of Easybuild-installed software and Anaconda kernels means that we can provide users with multiple versions of our user environment (multiplicity). Our software being hosted on a NFS rather than being contained in a Docker image means our image size and number of layers are no longer an immediate concern (feasibility).
Additionally, none of the files/software are recreated or reinstalled, but rather persist on the NFS ensuring persistence.The integration of our metamodules with kernels (Section 3.2.2) ensures that the experience is simple for users, while providing flexibility for advanced users and use cases.We will discuss the simplicity and flexibility criteria in more depth in Section 4.

USER EXPERIENCE
We have implemented the framework on two science gateways aimed at the geosciences, CyberGIS-Jupyter 5 and CJW, 7 with minimal problems.
This seamless roll-out was facilitated by the design of Jupyter's kernels, 10 which allowed us to merge our two layers of software to provide integrated user environments without the need for user intervention. While one function of these gateways is education, where technical barriers need to be minimized, our gateways are also used for reproducible computational research in geospatial science. These users tend to be more technically capable, and their work often requires customization of our provided user environments. In this section, we discuss some of the new functionalities provided to advanced users and how we have provided a simple default experience for beginners.

Versioned kernels and the cybergisx command line interface
The major motivation behind the new framework is to increase the reproducibility of workflows on science gateways. However, we were cognizant that we must provide a simple default experience for non-technical users that does not drastically differ from the experience in the old design. This trade-off informed how we approach kernels and versioning of our user environment in the new framework. As a compromise, we have decided to provide both versioned and unversioned kernels. Versioned kernels are unchanging: they will always use the same Anaconda environment and metamodule, meaning that users do not need to know specific software versions or paths, just the kernel name. Jupyter notebooks save the kernel name in their metadata, so once a notebook is tested and saved with a versioned kernel, it should automatically open the kernel and run without issue. Our unversioned kernels always point to the latest version of the environment, preserving how kernels worked in the old design. For example, we provide a user environment called python3 which is used for general-purpose geospatial computation; thus we have a python3 kernel which always points to the latest release of the python3 environment, and kernels named "python3-<version>" which only ever point to the same metamodule and Anaconda environment. This makes it easy for users who start with the unversioned kernel to identify and move to a versioned kernel to ensure reproducibility of their work. This gives advanced users the ability to tie their notebooks to an exact and unchanging version of the user environment without forcing all users to choose an exact version for every notebook they use.
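The resolution of an unversioned kernel name to its latest release can be sketched as follows; the version strings are made up for illustration and this is not the gateway's actual implementation, just a minimal model of the naming scheme.

```python
# Versioned kernels follow the "python3-<version>" naming scheme; the
# unversioned "python3" kernel tracks whichever release is newest.
available = ["python3-0.7.0", "python3-0.8.0", "python3-0.9.0"]

def latest(kernels, base):
    """Resolve an unversioned kernel name to its newest versioned release."""
    versions = [k.split("-", 1)[1] for k in kernels if k.startswith(base + "-")]
    # Compare versions numerically, not lexically, so "0.10.0" > "0.9.0".
    newest = max(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
    return f"{base}-{newest}"

print(latest(available, "python3"))  # → python3-0.9.0
```

A notebook saved with "python3-0.9.0" in its metadata is pinned forever, while one saved with "python3" silently follows new releases.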
While this compromise allows us to avoid confusing new or non-technical users by giving them an unversioned user environment, we are faced with the hurdle of providing all releases of kernels to advanced users without overwhelming our user base. As we release new versions of our kernels, the list of supported kernels will continue to grow, and providing all of these options is sure to confuse new and non-technical users. Therefore, we have decided to limit the default set of kernels to the unversioned ones, to reduce confusion, plus the latest release of the versioned python3 kernel, to ensure that users are aware of the functionality. This means that we need to provide users some way to access our catalog of versioned kernels, which inspired the cybergisx CLI.
The cybergisx CLI (cjw on CyberGIS-Jupyter for Water 7 ) provides a simple interface for advanced users to browse and install from our catalog of available kernels. The functionality of this tool is illustrated in Figure 6, where users can see the help text, the listing of available kernels, the installation of a versioned kernel, and the listing of personal kernels. The personal kernels functionality of the cybergisx CLI ensures that once a kernel is installed through the CLI, the kernel will be reinstalled by the pre-start script every time the user's container is started, until it is removed from the list of personal kernels. This means kernels installed through the CLI persist and do not have to be re-installed every time the container is restarted.
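The personal-kernels bookkeeping described above can be sketched like this; the file name, location, and JSON format are assumptions for illustration, not the actual cybergisx CLI internals. The essential point is that the record lives on persistent user storage, so the pre-start script can replay it on every container start.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical record of the user's personal kernels, kept on their
# persistent home storage (path and format are illustrative).
PERSONAL = Path(tempfile.mkdtemp()) / ".cybergisx" / "personal_kernels.json"

def add_personal_kernel(name):
    """Record a kernel so the pre-start script reinstalls it on every start."""
    PERSONAL.parent.mkdir(parents=True, exist_ok=True)
    kernels = json.loads(PERSONAL.read_text()) if PERSONAL.exists() else []
    if name not in kernels:  # idempotent: adding twice records it once
        kernels.append(name)
        PERSONAL.write_text(json.dumps(kernels))
    return kernels

add_personal_kernel("python3-0.8.0")
print(add_personal_kernel("python3-0.8.0"))  # → ['python3-0.8.0']
```

On container start, the pre-start script would read this list and copy each named kernel spec from the NFS, exactly as it does for the default kernels.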
The result of these changes is that users are provided with the same experience as before, but have access to a greater number of environments and have the flexibility to peg their notebooks to specific versions of our user environment to ensure reproducibility. We have made the source code for the CLI publicly available on GitHub for those who wish to utilize the CLI in their own gateways.**

Lmod Python interface
Our software being installed by Easybuild 16 and managed by Lmod 17 allows advanced users to take advantage of the env_modules_python package provided by Lmod to customize their user environment within the notebook.
FIGURE 6 Advanced users can utilize our command line interface (CLI) to select the kernels they would like to use.
FIGURE 7 Advanced users can utilize Lmod's 17 Python interface to interact with their user environment within the notebook.
This functionality is illustrated in Figure 7, which shows how a user can add the package to their path and load it in Cell 1 and then use the package to list the loaded modules in Cell 2. The Python package's usage mirrors the CLI's usage, meaning that those with experience with Lmod or environment modules from HPC settings will pick up the Python package quickly. For example, the command line syntax to list loaded packages is module list, while the Python equivalent is simply module("list"), as shown in Cell 2 of Figure 7. This gives users the flexibility to customize the software packages loaded through Lmod should the configurations provided by our metamodules not meet their needs.

DEPLOYING AN EASYSCIENCEGATEWAY
Sections 3 and 4 have discussed the benefits of the framework and the user experience, but the experience of those deploying and maintaining the science gateway is also an important consideration. Section 5.1 provides a high-level overview of the steps, independent of infrastructure and software, needed to deploy your own EasyScienceGateway. Section 5.2 discusses our experience using the EasyScienceGateway framework on multiple gateways.

Steps to deploy an EasyScienceGateway
The EasyScienceGateway framework can be utilized for any JupyterHub-based gateway. As shown in Figure 8, the architecture of an EasyScienceGateway is strikingly similar to the typical gateway deployment described by Zonca and Sinkovits 19 and our previous iteration of CyberGIS-Jupyter, 18 with the main difference being the additional software NFS.
FIGURE 8 Example of the architecture for an EasyScienceGateway deployed using Docker Swarm. 20
Much of the complexity of our framework is in installing software on the NFS and configuring the user containers to utilize the software seamlessly, as described in Sections 3.2.2 and 3.3.1. Deploying a gateway using the EasyScienceGateway framework will require a bit more work upfront, but as we have discussed in Section 3, the rewards are well worth the cost. The steps for deploying an EasyScienceGateway installation are: (1) set up a NFS, (2) install the necessary software on the file system, (3) deploy your JupyterHub using the typical workflow, (4) add the NFS to the user containers, and (5) configure the user containers to use the software in the mounted file system. In this section, we will discuss these steps in full detail.
Our first step is to set up a NFS, which is already part of the typical JupyterHub deployment workflow to provide persistent user storage. 19 For the EasyScienceGateway deployment, configure both a writable network share, which is used for user storage, and a read-only share, which will contain the software for users. The software share is read-only to ensure that users cannot accidentally break or change the software, which would destroy reproducibility. The software file system could alternatively be deployed as a separate NFS server or using another file system like the CernVM File System (CVMFS) 22 if desired. When setting up the NFS, we recommend putting it on the same infrastructure as the gateway to minimize latency.
The second step is to install the user software. As discussed in Section 3.1, Easybuild provides a massive catalog of software † † and it is relatively straightforward to develop new Easyconfigs. For those hoping to learn to use Easybuild, comprehensive documentation ‡ ‡ and tutorials are available online. § § It is not required to use Easybuild, 16 Lmod, 17 and Anaconda, 15 so long as you can install software and make it available to users in the container, but from experience we found this trio to be the simplest approach. Alternatives to Easybuild include Spack 23 and Nix. 24 Environment modules ¶ ¶ can be substituted for Lmod, and an appropriate Anaconda replacement will depend on the programming languages the gateway hopes to support. Once the necessary software is installed, it is convenient, although not necessary, to create "metamodules" which load a set of software for various environments.
With our software file system created, our third step is to deploy JupyterHub. The specifics of this step will vary depending on the infrastructure and environment, but the two most common approaches are to deploy through Docker Swarm 20 or Kubernetes, 25 both of which we have used to deploy an EasyScienceGateway. Deploying JupyterHub with Docker Swarm is, in our opinion, the simpler of the two approaches and utilizes the JupyterHub DockerSpawner. ## The "Zero to JupyterHub with Kubernetes" guide |||| walks through deploying JupyterHub with Kubernetes on a variety of commercial clouds, while Zonca et al. 19 provide resources and guidance for deploying on Jetstream's Openstack cloud.
Once you have the basic JupyterHub deployed, the fourth step is to ensure that user containers can access the software installed on our software NFS. Those using NFS for their user data will need to complete this step for the user data file system anyway. With Docker Swarm deployments, we can mount the file systems on the virtual machines in the swarm and then simply mount the software directory into the container as a volume. 20 For those using Kubernetes, you need to deploy a provisioner that will provide user pods access to your NFS share. *** With the software file system mounted, you can use the terminal within the user containers to generate Jupyter kernel configurations for your software environments ††† and then make any necessary modifications to these files, such as loading metamodules, as discussed in Section 3.2.2.
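A kernel configuration is just a small directory containing a `kernel.json`. The runnable sketch below writes one by hand whose `argv` points at a wrapper script on the software share, mirroring the `prepend_and_launch.sh` pattern described in Figure A4; all paths here are illustrative, and in practice the base spec can be generated by the kernel's own install command and then edited.

```shell
# Hypothetical kernel spec pointing Jupyter at an NFS-hosted environment.
# /tmp/demo-kernels stands in for the real kernels directory.
KERNEL_DIR=/tmp/demo-kernels/python3-nfs
mkdir -p "$KERNEL_DIR"
cat > "$KERNEL_DIR/kernel.json" <<'EOF'
{
  "display_name": "Python 3 (NFS)",
  "language": "python",
  "argv": ["/software/kernels/python3/prepend_and_launch.sh",
           "-f", "{connection_file}"]
}
EOF
```

Jupyter substitutes `{connection_file}` at launch time, so the wrapper script can set up the environment and then hand those arguments to the real kernel.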
Our fifth and final step to deploying an EasyScienceGateway is ensuring that user containers are configured to use the software on our NFS as discussed in Section 3.3.1. To accomplish this, we have a script called "pre-start-notebook.sh" on the NFS that sets environmental variables and copies the Jupyter kernel configurations into the appropriate path for Jupyter to use them (/home/jovyan/.local/share/jupyter/kernels) (see Figure A2 in Appendix). For a Docker Swarm deployment, we can ensure that this script is invoked by adding a line calling the script to the DockerSpawner's "start-notebook.sh" script. For Kubernetes deployments, we added the script as a post-start lifecycle hook. ‡‡‡ Once your JupyterHub is able to run this configuration step, your EasyScienceGateway is complete!

FIGURE A2 Our "pre-start-notebook.sh" script, currently on CyberGISX at /cvmfs/cybergis.illinois.edu/software/scripts/cybergisx/pre-start-notebook.sh. This script copies the kernel configurations from the network file system so that Jupyter can use them, puts key environmental variables into the user's environment, and adds the cybergisx CLI to the user's PATH. We have added line breaks and removed some commands that are not specific to EasyScienceGateway, like configuring Git, but the full script is available on CyberGISX.
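The core of such a pre-start script can be sketched in a few lines. To keep this demo self-contained and runnable it stages a fake software share under /tmp; in a real deployment the source would be the read-only NFS mount and the target would live under /home/jovyan.

```shell
# Stage a fake software share so the sketch runs anywhere (demo only).
SOFTWARE_SHARE=/tmp/demo-software
mkdir -p "$SOFTWARE_SHARE/kernels/python3"
echo '{"display_name": "Python 3 (NFS)"}' \
  > "$SOFTWARE_SHARE/kernels/python3/kernel.json"

# Core pre-start logic: copy kernel specs into the user's Jupyter kernel
# path and expose the NFS-hosted modulefiles to Lmod via MODULEPATH.
KERNEL_TARGET=/tmp/demo-home/.local/share/jupyter/kernels
mkdir -p "$KERNEL_TARGET"
cp -r "$SOFTWARE_SHARE"/kernels/* "$KERNEL_TARGET"/
export MODULEPATH="$SOFTWARE_SHARE/modules:${MODULEPATH:-}"
```

Running this at container start means the container image itself stays generic: which kernels a user sees is decided entirely by what is on the software share.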

FIGURE A3 The kernel JSON file for our Python3 kernel on CyberGISX, currently found at /cvmfs/cybergis.illinois.edu/software/scripts/cybergisx/kernels/python3/kernel.json. The JSON declares key environmental variables like the metamodule and ensures that our python3-0.9.0 Python environment is used in the kernel.
FIGURE A4 The prepend_and_launch.sh script for our Python3 kernel on CyberGISX, currently found at /cvmfs/cybergis.illinois.edu/software/scripts/cybergisx/kernels/python3/prepend_and_launch.sh. The script sets environmental variables, loads the appropriate metamodule, and configures the folder for user-installed pip packages before launching the kernel.
Section 3.2 elaborates on how we moved Anaconda 15 environments out of the container and used Jupyter kernels 10 to provide users with cohesive software environments. Lastly, Section 3.3 describes how we integrated software on an NFS into the typical JupyterHub-based gateway deployment and discusses the advantages of our framework for meeting the five criteria in further detail. Here, we focus on the framework itself, using examples from CyberGIS-Jupyter and CJW to clarify abstract concepts, and in Section 5 we discuss the process and experiences of deployment.
3.2) and create new Anaconda environments (e.g., Python3-b, Hydro-b, and GeoAI-b on the left side of Section 3.3.2) on the NFS without disturbing users on the gateway or changing/destroying the current user environment. When we are ready to deploy the changes, we simply change our pre-start scripts discussed in Section 3.3.1 to use the latest versions of the kernels. While we will still roll out new versions of our containers, these containers only hold the necessary software for Jupyter and for interacting with the software on the NFS (such as Lmod), and should not affect any user environments.
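One way to stage such side-by-side versions, sketched below under assumed paths (this symlink mechanism is an illustration, not necessarily our exact release procedure), is to keep every environment version on the NFS and repoint a "current" link when the pre-start scripts should pick up the new release:

```shell
# Versioned environment directories living side by side on the NFS.
BASE=/tmp/demo-software/kernel-releases
mkdir -p "$BASE/python3-0.9.0" "$BASE/python3-0.10.0"
ln -sfn "$BASE/python3-0.9.0" "$BASE/current"

# Releasing the new environment is an atomic symlink swap; the old
# version stays on disk, preserving reproducibility for existing work.
ln -sfn "$BASE/python3-0.10.0" "$BASE/current"
```

Because nothing is deleted or rebuilt, notebooks pinned to an older kernel keep running against the exact environment they were created with.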
Comparison of each approach's ability to satisfy the five criteria we have identified.