Collaborative Groundwater Modeling: Open-Source, Cloud-Based, Applied Science at a Small-Island Water Utility Scale.

Recent advances in cloud-computing and social-networking are influencing how we communicate professionally, work collaboratively, and approach data-science tasks. Here we show how the groundwater modeling field is well positioned to benefit from these advances. We present a case study detailing a vertically-integrated, collaborative modeling framework jointly developed by participants at the American Samoa Power Authority and at the University of Hawaii Water Resources Research Center. The framework components include direct collection and analysis of climatic and streamflow data, development of a water budget model, and initiation of a dynamic groundwater modeling process. The framework is entirely open-source and applies newly available data-science infrastructure using Python-based tools compiled with Jupyter Notebooks and cloud computing services such as GitHub. These resources allow for seamless integration of multiple computational components into a dynamic cloud-based workflow that is immediately accessible to stakeholders, resource managers, or anyone with an internet connection.


Introduction
Over the last half-century, computational modeling has become a principal tool in the water resource manager's toolbox. Groundwater models have become indispensable, industry-standard methods for estimating the availability and sustainability of groundwater resources (e.g. Young & Bredehoeft, 1972; Cummings & McFarland, 1974; Willis & Yeh, 1987). However, because of the inherent complexity of numerical models and the significant time, effort, and expertise needed for their development, it is often challenging for stakeholders and water resources managers to access models that are appropriate for their needs (Essawy et al., 2018). Within the traditional model development paradigm, water management agencies usually take one of two approaches to obtaining hydrologic models: 1) dedicate significant resources to building internal modeling capacity, or 2) contract with outside 'experts' to deliver models that typically cannot be interacted with once completed. Drawbacks to the former approach include the high cost of training, software, and salary required for agencies to retain personnel with sufficient skills to assess the validity, conceptualization, calibration, and usefulness of existing models or to create and maintain effective modeling programs. This level of resource dedication is often only possible for larger utility companies or management agencies, leaving small, remotely located agencies with few to no options for accessing quality modeling tools. On the other hand, the latter approach of hiring contractors generally results in production of static models that may lose relevance quickly and are often delivered in a format that does not allow end-users to modify parameters or address new questions.
This approach also suffers from the inherent temporariness of typical funding mechanisms, whereby the calibration and validation process lasts only as long as the project account is solvent, after which a final report is delivered and the latest iteration of model files is archived for long-term storage on a server in the back of someone's office. Compounding this issue is the fact that many groundwater models are developed with proprietary software or within specialized computational environments, making it prohibitively challenging for end-users to open and interact with the finished product.
To circumvent these problems, yet retain the benefits of each of the aforementioned approaches, we here propose a new approach to the hydrologic modeling process, in which model developers and end-users enter into a long-term collaborative working relationship, facilitated by advances in open-source, cloud-computing capabilities. To demonstrate this approach, we present as a case study the ongoing development of a collaborative modeling framework. The primary objective of this manuscript is to present an example of a collaborative hydrologic modeling framework that takes advantage of recent advances in cloud-computing and open-source modeling tools. The framework is vertically integrated in that it is intended to handle all components in the process that ultimately leads to the development of groundwater models used for drinking water management. These components include: (1) the direct collection and processing of basic hydrologic parameters through an island-wide hydrologic monitoring network, (2) development of automated data processing applications to integrate updated data into subsequent model components, (3) creation of a dynamic water budget model to predict island-wide groundwater recharge, and (4) development and application of a set of open-source groundwater modeling tools that can be directly modified by stakeholders to address local water resources management questions.
The framework is intended to be entirely open-source, and all raw data, model code, and processed outputs are made publicly available online, so anyone with skills and interest can modify inputs, test scenarios, and continue model development. This philosophy promotes transparency, reproducibility, and accessibility through both the development and implementation process, thereby facilitating interaction with interested stakeholders or other modelers. The framework is designed to adhere to the best practices of reproducibility for digital research objects, which include (1) being physically accessible by using open-source codes, (2) being conceptually accessible by facilitating sharing of required core skills for data management, workflow efficiency, and visualization, and (3) being reusable by applying modularity, and by providing opportunities to dynamically change inputs as needed.
The framework is also intended to be portable, flexible, use small file sizes, and only include models with short run times, which are attributes that have been shown to enhance model adoption rates amongst managers (Argent and Grayson, 2003). By presenting this framework, we hope to demonstrate the ease of use and the applicability of modern code sharing and cloud-computing tools in a scientific modeling setting involving participants at remotely located institutions. These tools have allowed us to connect researchers and stakeholders through ready-built data science infrastructure and to share advanced modeling capacity across our network of participants. While the case study in its current form demonstrates these ideas, we also view model development as a process, not necessarily an end-goal. Therefore, the framework continues to evolve and change as we and our stakeholders continue to participate in discussion, raise concerns, and contribute new ideas.

Case Study Setting
The island of Tutuila is the main population center of the U.S. territory of American Samoa. It is located near 14° S and 170° W, and at 142 km² is the third largest island in the Samoan hotspot island chain. Geologically, Tutuila contains two distinct provinces. The bulk of the island is composed of an older, highly eroded basaltic shield edifice (1.5 to 1.0 Ma), and recent (Holocene age) rejuvenation-stage volcanism on the southwestern flank of the older shields has created the younger Tafuna-Leone Plain (Stearns, 1944; McDougall, 1985). The young pahoehoe flows of the Tafuna-Leone Plain give it a higher hydraulic conductivity (K) than the Older-Volcanic Unit, which is composed of a heterogeneous mixture of a'a lava flows, pyroclastic materials, and trachyte domes (Stearns, 1944; Eyre and Walker, 1991). Geological subdivisions within each of these units exist, and may be used as the basis for further refinement into zones with different hydrogeologic properties (Izuka et al., 2007). Tutuila's climate is warm and humid with abundant, year-round rainfall due to its position within the South Pacific Convergence Zone. The island experiences a wetter season with increased precipitation amounts from October to May, and a drier season with less, though still significant, precipitation from June to September. Rainfall varies considerably with location and elevation, and ranges from 1,800 mm/yr near the Tafuna Airport to more than 5,000 mm/yr along the crest of the highest mountains (Daly et al., 2006). The region is also influenced by tropical storms and hurricanes, and an average of 25 to 30 significant thunderstorms affect the island annually (Kennedy et al., 1987).
In American Samoa, groundwater resources supply over 90% of domestic, and nearly 100% of industrial water use. However, these resources are afflicted by multiple threats to their long-term sustainability. Since 2009, portions of the public water supply system have been unsafe to drink, necessitating one of the longest standing boil-water-advisories in U.S. history. This is partly caused by the vulnerability of Tutuila's young and highly-permeable aquifers to anthropogenic and surface water contamination (Shuler et al., 2017; Shuler et al., 2018). Other aquifers on Tutuila produce high salinity water, presumably caused by salt-water intrusion (Izuka, 1999). In some cases, the island's wells produce water with Cl⁻ concentrations exceeding the U.S. Environmental Protection Agency drinking water standards by four to five times. Multiple local stakeholders see groundwater models as a tool that will greatly facilitate management of these issues (ASPA, 2013; Anderson-Taggarino, personal communication, Oct. 2018). As of this writing, there have been four known groundwater models developed for portions of Tutuila (Izuka et al., 2007; ASPA, 2013; Shuler et al., 2014; Shuler et al., 2017). While each of these models addressed a specific question, ranging from defining well-capture zones to modeling nutrient transport, none have satisfied the requirements to fully address ASPA's water management needs. The static nature of these models also restricts their ability to be modified, and ASPA, like most small-scale water utilities, does not have the time or resources to support building and maintaining an active hydrologic modeling program on its own.

Collaborative Groundwork and Stakeholder Needs
In American Samoa, ASPA is the only water utility and the agency is also responsible for all municipal power, wastewater, and solid waste services. American Samoa is a unique environment as it is small (population of approximately 60,000), geographically isolated (4,000 km to the nearest continent), and a sovereign society still retaining much of its indigenous culture and tradition. Therefore, ASPA is particularly invested in not only meeting customer needs, but also in conservation and responsible stewardship of the island's limited natural resources. The Water Resources Research Center is a technical research unit at the University of Hawaii, and its stated mission is, "To promote understanding of critical state and regional [including the U.S. Affiliated Pacific Islands] water resource management and policy issues through research, community outreach, and public education." To fulfil this mission in American Samoa, UHWRRC has been working with ASPA and other agencies since 2013 to develop an integrated water resources research program in the territory that strives to incorporate on-island stakeholder concerns into research priorities.
In 2015, we, a group of researchers and staff at ASPA and UHWRRC, formally initiated the collaborative modeling effort through a memorandum of understanding originally intended to 1) develop infrastructure for collection of hydrologic and climatic data, and 2) apply these data in support of ASPA's water resources management priorities. At the time, we also recruited a diverse group of representatives from local management agencies to form the American Samoa Water Resources Stakeholders Committee. The committee was tasked with documenting and communicating American Samoa's water resource needs, and since its formation, hydrologic data collection and groundwater model development have been consistently identified as top priorities. Over the following four years, our collaborative efforts have been directed towards hydrologic data collection, development of water budget estimates, development and use of groundwater and hydraulic system models, and capacity building within both institutions. During a recent training workshop conducted in American Samoa for ASPA and UHWRRC staff, the following modeling-focused management priorities were identified:
- Assessing resource sustainability, through water budgets or sustainable yield estimates
- Identifying new well drilling locations based on freshwater lens thickness
- Simulating contaminant plumes from sources including piggeries and industry
- Identifying low-pressure zones and examining hypothetical stresses in the water distribution system
These stakeholder-driven objectives currently guide the focus of our conceptual and numerical groundwater modeling activities, as well as the continued development of hydrological monitoring operations.

Cyberinfrastructure Framework
To make hydrologic monitoring data publicly accessible, and also to store projects and code in a [...] The diverse and rapidly developing array of existing Python packages allows the language to be used reliably for every task across our collaborative modeling framework. Maintaining this methodological consistency streamlines execution of the modeling process by routing computational outputs from one module as inputs to others. While computing languages such as Python, R, and MATLAB are commonly used for scientific model development (e.g. Borah and Bhattacharjya, 2013; Bakker et al., 2016; Yin et al., 2017), code-based tools have nonetheless been historically difficult for end-users to access, due to steep learning curves and sometimes costly licenses. Jupyter Notebooks help to solve this issue by bridging the gap between "coders" and the uninitiated, integrating live code, equations, visualizations, web links, and explanatory documentation into notebooks that are easier to understand and interact with (Perez and Granger, 2015; Kluyver et al., 2016). Because of this accessibility, Jupyter Notebooks are becoming increasingly popular amongst modelers and across many scientific disciplines (e.g. Subramanian et al., 2015; White et al., 2016; Somers, 2018). Though these advances may seem unimportant to those more familiar with coding, in a collaborative framework where team members with variable degrees of expertise wish to be involved in the modeling process, simplicity and ease of access are paramount for everyone's engagement.
Additionally, inclusion of numerous participants and integration of multiple components into a single workflow necessitate a significant amount of data organization and project management work. While this would be time consuming to do manually, the web-based project management platform GitHub provides free cloud-based repository hosting, version control, and workflow organization to facilitate collaborative contributions from multiple participants. GitHub maintains organization in the modeling framework by storing all data and code within individual "repositories" that preserve the file-folder structures needed for consistent connectivity between model inputs and outputs. Open-access repositories can be directly downloaded from the web by anyone, and authorized team members can make changes and upload them back to GitHub. Automated version control features track all changes and allow users to review and accept, or reject, them.
GitHub is becoming increasingly popular as a collaborative coding and data-driven project management tool (Dabbish et al., 2012), and at present, it is the computer-science industry-standard application for storing, managing, and tracking changes to code (Stack Overflow, 2018). A key feature for facilitating input from stakeholders is GitHub's browser-friendly graphical user interface (GUI), which allows anyone to view and explore files, datasets, and results without needing any specialized software or computing resources.
While GitHub provides online file storage and organization, it does not provide computational resources for running models. To avoid issues with local dependencies (software) and computational resource limitations on individual users' computers, we are currently exploring the utility of a number of open-source, cloud-computing resources dedicated to addressing this concern. Such services operate by opening a cloud-based Python environment on a remote server and installing all needed software at the time of use. There are numerous existing resources including Binder and Google Colab, amongst others, that provide this service through seamless integration with Jupyter Notebooks and GitHub. At present, we have found these services to be most useful for conducting demonstrations of framework components during workshops or teleconferences. Finally, although a seemingly simple task, the importance of teleconferencing and specifically screen-sharing services cannot be overlooked for projects of this nature. For this project, Skype and Google Hangouts have been a significant boon for facilitating communication to and from American Samoa, which is still serviced by international telephone calling rates. These internet-based services allow participants to share screens and see visual output directly, which is especially helpful when working within a fairly complicated modeling framework.

Modeling Framework
The Tutuila collaborative modeling framework is intended to include tools that address the island's most pressing water management questions through three distinct, yet integrated components. These include (1) collection and automated processing of weather station, stream gauge, and monitoring well data from a hydrologic monitoring network, (2) development and application of a dynamic water budget model that produces and automatically updates an island-wide groundwater recharge coverage based on monitoring network data, and (3) ongoing collaborative development of dynamic regional and local scale numerical groundwater models that automatically intake the most recent recharge coverage and monitoring well data.
Tasks performed in the first framework component include collecting raw hydrologic data, uploading data to the GitHub repository (Shuler and Mariner, 2019), checking quality, and processing data into the necessary format to be used as input to the subsequent modeling steps. We developed Python-based data processing scripts as Jupyter Notebooks, which are archived along with raw and processed hydrologic data on GitHub. Processed hydrologic data from the monitoring component is output to the input data folder accessed by the water budget model script. We developed the Tutuila water budget model with the Soil-Water Balance 2 (SWB2) code developed by the USGS (Westenbroek et al., 2018). [...] then, ideally, become more accurate every time new monitoring data is processed. Figure 1 shows a schematic of the data processing and modeling workflow for the ASPA-UHWRRC cooperative modeling framework. Datasets or geospatial layer components are shown in quadrilaterals, code-based processes are contained in ovals, and external model executables are contained in triangles, which are themselves within ovals since they are run as Python sub-processes.

Hydrologic Monitoring Network
Beginning in the 1950s, the U.S. Geological Survey (USGS) monitored rainfall and streamflow at multiple sites throughout American Samoa. However, in 2008 all USGS monitoring operations in the territory were halted, leaving a 7-year data gap until 2015, when we began installing weather stations. Since 2015 we at ASPA and UHWRRC have worked to develop a monitoring network consisting of eight stream gauging stations, six weather monitoring stations, and three preliminary water level monitoring sites (Fig. 2). Hydrologic data from these instruments are essential for estimating groundwater recharge, which is an important spatially distributed input variable in groundwater models, especially in island settings with very steep rainfall gradients.
Both ASPA and UHWRRC contributed to instrument installation and we continue to work together to maintain the network. To ensure continuity with data downloads and maintenance of physical infrastructure, ASPA created a full-time position for a hydrologic technician, and UHWRRC continues to develop and maintain systems for data processing, quality assurance / quality control (QA/QC) procedures, and archiving and distribution of data.
All weather stations have the capability to record precipitation, temperature, relative humidity (RH), wind speed and direction, and solar radiation (SR). The network was initially developed

Tutuila SWB2 Model Development
The USGS developed the SWB2 Water Budget Modeling Code (Westenbroek et al., 2018) to allow users to easily calculate water budget components, and specifically groundwater recharge. Some functionality based on the Hawaii Water Balance Code (Engott et al., 2017) was incorporated into SWB2 making it one of the better suited options for modeling tropical basaltic islands in the Hawaiian or Samoan chains (Westenbroek et al., 2018). For this study, the SWB2 code was applied to develop a Tutuila Island groundwater recharge coverage that then served as an input to the FloPy groundwater model. The SWB2 code is based on a modified Thornthwaite-Mather (1955) soil-water balance approach, which in a simplified form is represented by the following:

Recharge = Rainfall - Runoff - Actual Evapotranspiration

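As a rough illustration of the simplified balance above, the following sketch applies it to a single grid cell. The function name and the annual totals are hypothetical and chosen only for the arithmetic; the SWB2 code itself works at a daily resolution and uses further inputs (for example canopy evaporation and direct infiltration) that are not represented here.

```python
def recharge_mm(rainfall_mm, runoff_mm, actual_et_mm):
    """Simplified soil-water balance: recharge = rainfall - runoff - actual ET."""
    return rainfall_mm - runoff_mm - actual_et_mm


# Hypothetical annual totals for one grid cell (mm/yr); illustration only,
# not values from the Tutuila model.
rainfall = 3000.0
runoff = 600.0
actual_et = 900.0

cell_recharge = recharge_mm(rainfall, runoff, actual_et)
print(f"Recharge: {cell_recharge:.0f} mm/yr")
```
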
For this study, runoff-to-rainfall ratios and temporal rainfall distributions derived from our monitoring network data were used as key input variables to the Tutuila SWB2 Water Budget Model. All other input datasets used for the SWB2 model were obtained from existing publications or databases, with each being described in the respective documentation as cited below. All SWB2 inputs are either in the form of tabular lookup data or spatially-distributed datasets in the ESRI ASCII grid format. Input files for the Tutuila SWB2 model included:
- Gridded monthly precipitation data (Daly et al., 2006)
- Precipitation gauge data used to represent temporal rainfall distributions (this study)
- Land use data (Meyer et al., 2016)
- Impervious surface ratios (Meyer et al., 2016)
- Canopy coverage ratios (Meyer et al., 2016)
- Soil-type data consistent with the NRCS SSURGO database (Nakamura, 1984)
- Direct infiltration data from municipal water line leaks (ASPA, personal communication)
- Direct infiltration data from OSDS effluent discharge (AS-DOC, 2009)
- Runoff-to-rainfall ratios (this study; Perrault, 2010; Wong, 1996)
- Potential evapotranspiration data in monthly gridded format (Izuka et al., 2005)
- Canopy evaporation data (Engott et al., 2015; AWS Truepower, 2014)
- Gridded monthly maximum and minimum temperature data (Daly et al., 2006)
The SWB2 code calculates all water balance components at a daily resolution, and output files are produced in NetCDF format. The Tutuila Water Budget Model, like the other routines and models used in the framework, was designed to be dynamic: once newly collected runoff and rainfall data are uploaded to GitHub, the workflow automatically incorporates them into all subsequent calculations. Although a dynamic version of the Tutuila Water Budget Model is integrated as a module of the modeling framework, a static version of the water budget model was published as a stand-alone report for documentation purposes (Shuler and El-Kadi, 2018b). This report, along with all of the input data, code, and output data for the static version of the Tutuila Water Budget Model, is publicly available at https://github.com/UH-WRRC-SWB-model/SWB2-Tutuila.

FloPy Groundwater Model Development
In 1984 the USGS released the first version of MODFLOW (McDonald and Harbaugh, 2003). Over three decades later, MODFLOW remains one of the most widely used groundwater modeling applications. The model's large user base results from its simplicity and continuing evolution, and presently, the Python-based module FloPy lies at one of the forefronts of USGS model development efforts (Bakker et al., 2016). FloPy is an open-source Python package that provides functionality to simplify pre- and post-processing tasks for the MODFLOW family of models, such as MODFLOW (Harbaugh et al., 2000), MT3DMS (Zheng and Wang, 1999), and SEAWAT (Guo and Langevin, 2002). Following the development philosophy of most Python packages, FloPy is constantly in development, so new functionality continues to be added by the software's developers and users. The FloPy package is relatively new, but it is rapidly gaining in popularity due to its modularity, open-source availability, and support by USGS modelers (e.g. Rotzoll et al., 2016; Feo et al., 2018; Foglia et al., 2018). Benefits of using FloPy for developing groundwater models include: 1) model building and pre-processing steps are quick to execute, 2) specific inputs, such as cell size, are easy to modify, and 3) the modeling process is transparent and simple to share with team members, as well as with end-users, other researchers, or reviewers. The ability to modify model inputs quickly and easily makes FloPy well suited to our process-based paradigm, simplifying model evolution as new stakeholder needs, new procedures, and updated data become available.
At present, we have initiated FloPy groundwater model development for regional-scale models covering the whole island, and because this component of the framework continues to be driven by direct stakeholder needs, development is intended to continue into the foreseeable future, pending continued support from both UHWRRC and ASPA. Presently the scope of the groundwater modeling component of the framework is focused on (1) [...]
2. Model steps are broken into manageable units in order to make the process easy to understand. Each step is annotated with text-based explanations in markdown cells separating code blocks, and input data and important parameters are clearly defined, typically in a dedicated cell.
3. Whenever possible, output from each step is automatically visualized using in-line plots. Plots are also saved as images and as geographic data files, using '.kml' format, which open directly in end-user mapping platforms.
4. All input datasets are simple, cleanly organized, and well annotated. Participants can directly modify inputs if needed or when other updated datasets become available.
5. Calibration routines automatically incorporate updated observation and input data, e.g. updated water levels and revised recharge coverages from the SWB2 model following collection from the hydrologic monitoring network.
6. Cell size resolution is simple to modify so experimentation can be performed at low resolution with short model run times. Resolution can then be increased for sensitivity testing or creating finalized results.
7. All model files are kept small enough to be hosted on GitHub (under 100 MB each).
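To illustrate why adjustable cell size matters for run time, the sketch below computes the number of rows, columns, and cells needed to cover a rectangular extent at two resolutions. The helper `grid_shape` and the extent values are hypothetical and are not taken from the Tutuila model; the point is only that reducing the cell size by a factor of five increases the cell count by a factor of twenty-five.

```python
import math


def grid_shape(x_extent_m, y_extent_m, cell_size_m):
    """Return (nrow, ncol) needed to cover a rectangular extent with square cells."""
    ncol = math.ceil(x_extent_m / cell_size_m)
    nrow = math.ceil(y_extent_m / cell_size_m)
    return nrow, ncol


# Hypothetical rectangular model extent (not the actual Tutuila domain).
X_EXTENT_M = 30_000.0
Y_EXTENT_M = 10_000.0

# Coarse grid for quick experimentation, finer grid for final runs.
for cell_size in (500.0, 100.0):
    nrow, ncol = grid_shape(X_EXTENT_M, Y_EXTENT_M, cell_size)
    print(f"{cell_size:.0f} m cells: {nrow} rows x {ncol} cols = {nrow * ncol} cells")
```

In a notebook, keeping the cell size in a single clearly labeled parameter cell means every downstream array can be rebuilt from it without editing other code.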

Monitoring Network Implementation
Data from weather stations and stream gauges are downloaded on a quarterly basis, and the raw data require QA/QC processing and integration with previous data to create long-term station records. We accomplish these tasks with Python-based processing routines designed to produce output data that are formatted for use as input to the water budget model, with the primary weather station output accessed by the SWB2 model being a daily rainfall time series from each station. Once data are downloaded, the hydrologic technician at ASPA uploads raw data files directly to the cloud-based repository on GitHub (Shuler and Mariner, 2019), and new datasets are automatically incorporated with previous datasets once the processing routine is run.
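The consolidation step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the processing routine archived in the project repository: the function names, the record layout (timestamp, rainfall in mm), and the flagged malfunction period are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def consolidate(downloads, bad_periods=()):
    """Merge download batches into one record keyed by timestamp.

    Overlapping timestamps are kept once (later batches overwrite earlier
    ones), and readings inside any flagged (start, end) period are dropped.
    """
    merged = {}
    for batch in downloads:
        for timestamp, rain_mm in batch:
            merged[timestamp] = rain_mm
    return {
        t: v
        for t, v in sorted(merged.items())
        if not any(start <= t <= end for start, end in bad_periods)
    }


def daily_totals(record):
    """Sum sub-daily rainfall readings into a daily rainfall series."""
    totals = defaultdict(float)
    for timestamp, rain_mm in record.items():
        totals[timestamp.date()] += rain_mm
    return dict(totals)


# Hypothetical quarterly downloads with one overlapping reading.
q1 = [(datetime(2018, 1, 1, 0), 1.0), (datetime(2018, 1, 1, 12), 2.5)]
q2 = [
    (datetime(2018, 1, 1, 12), 2.5),
    (datetime(2018, 1, 2, 6), 4.0),
    (datetime(2018, 1, 3, 6), 99.0),  # falls inside the flagged period below
]
# Hypothetical period of known station malfunction.
bad = [(datetime(2018, 1, 3), datetime(2018, 1, 4))]

daily = daily_totals(consolidate([q1, q2], bad_periods=bad))
for day, total in daily.items():
    print(day, total)
```
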
The weather station data processing routine includes:
- Consolidation and organization of raw data files into a single time series
- Performing QA/QC checks
- Removing previously identified sequences of corrupted data from known station malfunctions
- Graphing datasets to allow users to inspect data validity (Fig. 4)
- Summarizing data at different time resolutions and creating output files both for distribution and to be used as input in the Tutuila SWB2 Water Budget Model
The streamflow datasets are processed with a script similar to the weather station routine. The streamflow routine processes updated stage (i.e. water height) data from each gauge site as well as discrete streamflow measurements that are used for automated rating curve development. Stage measurements from pressure transducers (PTs) at each site are post-processed by comparing them with flow measurements to convert stage into discharge. The stage-discharge relationship is unique to each site and is dynamic: alluvial processes are constantly reshaping channel morphology, thereby necessitating continual updates to rating curves. Although rating curves can take many different mathematical forms, we have achieved the lowest error by applying a second-degree polynomial relationship between stage and discharge at the Tutuila gauging sites. Data from land-based barometers are also uploaded with stream stage data and used to automatically correct for transient changes in barometric pressure.
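A polynomial rating curve of the kind described above can be fit by least squares. The sketch below assumes a second-degree polynomial and uses NumPy's `polyfit`; the stage values and the "true" curve used to generate synthetic discharge measurements are invented for illustration and do not come from any Tutuila gauge.

```python
import numpy as np

# Synthetic paired measurements at one hypothetical gauge: discharge is
# generated from a known quadratic so the fit can be checked.
true_curve = np.poly1d([1.5, 0.2, 0.0])  # Q = 1.5*h^2 + 0.2*h (assumed)
stage_m = np.array([0.10, 0.20, 0.35, 0.50, 0.80, 1.10])
discharge_m3s = true_curve(stage_m)

# Fit Q = a*h^2 + b*h + c to the paired stage/discharge measurements.
coeffs = np.polyfit(stage_m, discharge_m3s, deg=2)
rating_curve = np.poly1d(coeffs)

# Apply the fitted curve to a continuous stage record to obtain discharge.
stage_record_m = np.array([0.15, 0.40, 0.95])
discharge_record = rating_curve(stage_record_m)

# Root-mean-square error of the fit against the measurements.
rmse = np.sqrt(np.mean((rating_curve(stage_m) - discharge_m3s) ** 2))
print(coeffs, discharge_record, rmse)
```

Refitting whenever new discharge measurements are added is one simple way to keep a rating curve current as the channel changes; a fitted polynomial should not be trusted far outside the range of measured stages.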
The data processing routine for stream gauge data includes:
- Automated barometric compensation
- Removal of false readings
- Corrections for physical changes at gauging sites
- Automated rating curve development
- Baseflow and surface runoff separation (Wahl and Wahl, 1995)
- Summarizing data into daily time series, monthly averages, and annual averages (Fig. 5)
The routine also generates monthly volumetric surface runoff rates for the basin above each gauge, which are the primary stream gauging output accessed by the SWB2 model to calculate runoff-to-rainfall ratios. An updated file is automatically generated each time new data is loaded and the routine is run, and this file is saved to a location directly accessed by the SWB2 model as it compiles input data.
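The runoff-to-rainfall ratio mentioned above can be understood as runoff depth divided by rainfall depth over the same basin and period. The following sketch shows that unit conversion for a single month; the function name, basin area, runoff volume, and rainfall depth are hypothetical, and how SWB2 applies the resulting ratios internally is not shown.

```python
def runoff_to_rainfall_ratio(runoff_volume_m3, basin_area_m2, rainfall_mm):
    """Ratio of basin-averaged runoff depth to rainfall depth for one period."""
    # Convert runoff volume to an equivalent depth over the basin, in mm.
    runoff_depth_mm = runoff_volume_m3 / basin_area_m2 * 1000.0
    return runoff_depth_mm / rainfall_mm


# Hypothetical month: 120,000 m^3 of surface runoff from a 2 km^2 basin
# that received 300 mm of rain (a runoff depth of about 60 mm).
ratio = runoff_to_rainfall_ratio(120_000.0, 2.0e6, 300.0)
print(f"Runoff-to-rainfall ratio: {ratio:.2f}")
```
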
Monitoring well data is processed with a routine similar to the streamflow routine, where data is consolidated, barometrically compensated, and QA/QC procedures are implemented (Fig. 6). At present, the groundwater modeling component is set up as a steady state model, therefore the transient water level data is averaged to obtain a single water level value for each site, which is then consolidated with the other static water levels and used for calibrating the MODFLOW model.
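The barometric compensation and averaging steps can be sketched as below, assuming the well sensors record total (water plus atmospheric) pressure. The site names, pressure values, and freshwater density are assumptions for illustration; converting the resulting water column height to a water level elevation would additionally require each sensor's surveyed elevation, which is omitted here.

```python
from statistics import mean

RHO_WATER = 1000.0  # kg/m^3, assumed freshwater density
GRAVITY = 9.80665   # m/s^2, standard gravity


def water_column_m(total_pressure_kpa, baro_pressure_kpa):
    """Height of water above the sensor after removing atmospheric pressure."""
    water_pressure_pa = (total_pressure_kpa - baro_pressure_kpa) * 1000.0
    return water_pressure_pa / (RHO_WATER * GRAVITY)


# Hypothetical transient records: site -> [(total kPa, barometric kPa), ...].
records = {
    "well_A": [(121.0, 101.0), (121.5, 101.3), (120.8, 100.9)],
    "well_B": [(111.2, 101.0), (111.4, 101.3)],
}

# Average the compensated series to one value per site, as would be needed
# for a steady-state calibration target.
mean_water_column = {
    site: mean(water_column_m(total, baro) for total, baro in pairs)
    for site, pairs in records.items()
}
for site, value in mean_water_column.items():
    print(f"{site}: {value:.2f} m of water above sensor")
```
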

SWB2 Model Implementation
The SWB2 (Fig. 7). The annual groundwater recharge layer produced by SWB2 is then directly integrated into the FloPy pre-processing routine to supply the MODFLOW model with recharge rates at the desired spatial resolution. Because the framework is dynamic, results adjust as updated streamflow and weather station data are produced. Thus it should be noted that the model outputs reported here only represent the latest model iteration and are subject to change as new data are gathered.
The primary management utility of the water budget model lies in assessing different scenarios to show how possible future conditions may affect groundwater recharge. To develop recharge predictions in consideration of likely future climate scenarios, we modified the rainfall and temperature variables in the SWB2 model using output from gridded dynamically-downscaled climate projections for American Samoa (Wang and Zhang, 2016). The gridded climate model data covered three specific scenarios: 1) present-day climate for the years 1990 to 2009, 2) future climate during the years 2080-2099 reflecting a lower-carbon emissions scenario (RCP4.5), and 3) 2080-2099 climate reflecting a higher emissions scenario (RCP8.5). The Wang and Zhang (2016) projections for both emissions scenarios predict significant increases in both precipitation and temperature, and when integrated into the Tutuila SWB2 model, this translated into overall increases in all water budget components as calculated by the modified SWB2 runs. Most notably, the 11% to 18% increase in precipitation predicted by the RCP8.5 and RCP4.5 scenarios, respectively, drove increases in groundwater recharge rates of 17% to 27%, respectively.

FloPy Model Implementation
One of the primary motivations behind developing the modeling framework within a cooperative paradigm was to build direct technical capacity at ASPA, the organization tasked with managing the island's water resources. Therefore, instead of focusing the groundwater model component on creating a single extensively calibrated static model realization, this component is designed to build modeling capacity and share knowledge between participants through the collaborative development of a set of modular FloPy-centric tools. By developing this portion of the framework as a dynamic toolbox we can continually adapt the models and data collection efforts to target specific water management questions as they arise. For example, ASPA is currently drilling a number of new production wells in the village of Malaeimi on Tutuila. As additional pump test and drawdown data are collected during well development, these tools can be directly applied by ASPA to help set sustainable pumping rates at the wells.
We developed the groundwater modeling toolbox as a series of Jupyter Notebooks that use the whole island of Tutuila as the model active area. When investigation of a more localized area is desired, the active area can be changed by substituting different shapefiles and the scripts can be reused with only a small amount of modification. The implementation presented here covers the whole-island model, which is intended to demonstrate the application of these tools.
However, the validity of the regional scale realization is severely limited by the spatial distribution of available observation data, which is a common issue when modeling steep and challenging terrains, such as high volcanic islands, where wells are only located in coastal areas. Therefore, results and plots based on this realization should be considered to be for demonstration purposes only and should not be directly used for management purposes at this time.
In general, the Tutuila FloPy Model follows a workflow that includes the following steps: (1) defining cell size and model geometry, including active and inactive areas, layer elevations, and boundary conditions, (2) importing input data such as observed water levels, groundwater recharge from the SWB2 model, and starting values for hydraulic conductivity, (3) setting starting conditions for model variables, including water levels and the elevation of the salt-water interface, (4) compiling inputs into a FloPy model object, (5) running the MODFLOW executable as a sub-process in Python, (6) optimizing calibration parameters, and (7) reading output files, visualizing results, and assessing model performance.

Model Initialization
The first cells of each notebook consolidate explanations, variable definitions, and settings (grid size, time steps, etc.) to make it easy for users to understand and interact with the model. All modules were written to automatically adjust for changes in grid resolution so users can reduce model run time when experimenting. Input datasets are clearly defined in a single cell so users know what data go into each model. The subsequent Jupyter Notebook cells each contain a separate module that handles a specific model development task. These tasks generally correspond to specific MODFLOW packages; for example, geolocation and model grid boundaries are defined in a single cell that creates the MODFLOW ".dis" package. In this cell the grid is defined by importing a projected shapefile, determining its extent, and extending the grid edges beyond the shapefile edges by a defined percentage. For this case study, the grid covers an area 5% larger than the extent of the -50 m bathymetric contour around Tutuila (Fig. 8a). The ".bas" package contains model active areas and boundary conditions such as specified head or general head boundaries. We define these by overlaying shapefiles onto the model grid and creating grid-indexed arrays to assign conditions to individual cells. For the Tutuila model, the area inside of the -50 m bathymetric contour, which geologically represents the former maximum areal extent of the island prior to erosion and subsidence (Stearns, 1944), is defined as the model active area, and the submarine area between this contour and the island's coastline is set as a general head boundary to simulate the freshwater head (set to 0.01 m) exerted on the ocean bottom by seawater (Fig. 8b). The model top and bottom elevations are defined with a digital elevation model (Fig. 8c) and a constant depth value of 1000 ft, respectively. Model stress periods and time steps are also defined for the ".bas" package.
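The grid-definition step can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the notebook's actual cell: the extent is passed in directly instead of being read from a shapefile, the padding is interpreted as stretching each side length by the given fraction (split evenly between opposite edges), and the function name, dictionary keys, and example coordinates are hypothetical.

```python
import math


def padded_grid(xmin, ymin, xmax, ymax, cell_size, pad_fraction=0.05):
    """Grid origin and dimensions for an extent padded on every side.

    Assumption: each side length grows by pad_fraction in total, half on
    each edge. The grid is built from whole cells, so it may extend slightly
    past the padded extent on its upper and right edges.
    """
    pad_x = 0.5 * pad_fraction * (xmax - xmin)
    pad_y = 0.5 * pad_fraction * (ymax - ymin)
    x0, y0 = xmin - pad_x, ymin - pad_y
    x1, y1 = xmax + pad_x, ymax + pad_y
    return {
        "xll": x0,                                # lower-left corner, x
        "yll": y0,                                # lower-left corner, y
        "ncol": math.ceil((x1 - x0) / cell_size),
        "nrow": math.ceil((y1 - y0) / cell_size),
        "delr": cell_size,                        # cell width along rows
        "delc": cell_size,                        # cell width along columns
    }


# Hypothetical projected extent in metres: 10 km by 4 km, with 400 m cells.
grid = padded_grid(0.0, 0.0, 10000.0, 4000.0, cell_size=400.0)
```

Because the row and column counts are derived from the cell size, changing `cell_size` alone rescales the whole grid, which is the behavior the text describes for experimenting with coarser, faster models. In FloPy, values such as these would typically be passed to `flopy.modflow.ModflowDis` through its `nrow`, `ncol`, `delr`, and `delc` arguments.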
Spatially distributed groundwater recharge is obtained from the output of the SWB2 water budget component and is spatially resampled to match the groundwater model grid size for implementation in the MODFLOW ".rch" package (Fig. 8d). To simulate the interaction between salt and freshwater, we use the FloPy SWI2 package (Bakker et al., 2013) to obtain a first-order approximation of the position of the 50% freshwater-seawater interface (Fig. 9). The more computationally expensive density-dependent package SEAWAT (Langevin et al., 2008) can also be implemented with FloPy and should provide more reliable salinity information in local-scale adaptations of the regional model.
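One simple way to resample a recharge layer onto a coarser model grid is block averaging, sketched below. The text does not state which resampling method the framework uses, so this choice is an assumption. The sketch also assumes that the fine grid dimensions are exact multiples of the coarsening factor and that both grids share the same origin; the example values are hypothetical.

```python
def block_average(fine, factor):
    """Coarsen a 2-D grid by averaging non-overlapping factor x factor blocks."""
    nrow, ncol = len(fine), len(fine[0])
    if nrow % factor or ncol % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, nrow, factor):
        row = []
        for c in range(0, ncol, factor):
            block = [
                fine[i][j]
                for i in range(r, r + factor)
                for j in range(c, c + factor)
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Hypothetical 4 x 4 recharge layer coarsened to a 2 x 2 model grid.
fine_recharge = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
    [3.0, 3.0, 4.0, 4.0],
]
model_recharge = block_average(fine_recharge, factor=2)
```

Averaging, rather than sampling a single cell from each block, keeps the mean recharge rate the same on both grids, which matters when the recharge layer drives the water balance of the groundwater model.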

Model Execution and Calibration
With the FloPy package, model inputs can be formatted and the MODFLOW executable can be run directly from a single Python interface. This functionality allows the model to be wrapped into a loop, function, or class for model calibration and sensitivity testing. Calibration can be handled by a variety of widely used optimization packages that can be called from Python, such as SciPy (https://www.scipy.org/), PEST (http://www.pesthomepage.org/), or custom-built optimization packages such as pyPCGA (Lee et al., 2016). To optimize the Tutuila model, we designed two workflows to calibrate spatially distributed hydraulic conductivity (K) values, a zone-based approach and a grid-based approach, both of which minimize the error between observed and simulated water levels. The zone-based approach divides the island into hydrogeologic zones based on a geologic map (Stearns, 1944), and a single K value for each zone is parameterized for optimization using the scipy.optimize.minimize function (Figs. 10a and 10c).
The grid-based approach applies the principal component geostatistical approach (Kitanidis and Lee, 2015) using the open-source pyPCGA Python package. This package provides a ready-built, computationally inexpensive inversion method to solve for large parameter sets, in this case one K value per model cell, with low numbers of observations. Each method has different benefits and costs, and the same model can be run with both to give managers a sense of how initial assumptions and methods may affect the model output (Figs. 10b and 10d).
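The structure of the zone-based calibration can be illustrated with a small, self-contained example. This is a sketch, not the Tutuila workflow. A toy one-dimensional analytic model with two K zones stands in for the MODFLOW run; in the framework, the forward function would instead write inputs, run MODFLOW through FloPy, and read back simulated heads. The flux, thickness, zone boundary, well positions, and "true" K values are all hypothetical, and the choices to optimize log10(K), to use a sum-of-squares objective, and to use the Nelder-Mead method are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

Q = 2.0           # flow toward the coast per unit width (m2/d), hypothetical
B = 50.0          # saturated thickness (m), hypothetical
X_BREAK = 1000.0  # distance from the coast to the zone boundary (m)


def simulate_heads(log10_k, x_obs):
    """Toy forward model standing in for a MODFLOW run.

    Steady one-dimensional flow through two K zones in series, with head
    fixed at zero at the coast (x = 0).
    """
    k1, k2 = 10.0 ** np.asarray(log10_k, dtype=float)
    x = np.asarray(x_obs, dtype=float)
    length_in_zone1 = np.minimum(x, X_BREAK)
    length_in_zone2 = np.maximum(x - X_BREAK, 0.0)
    return Q / B * (length_in_zone1 / k1 + length_in_zone2 / k2)


def sum_squared_error(log10_k, x_obs, h_obs):
    """Objective: squared misfit between simulated and observed heads."""
    residuals = simulate_heads(log10_k, x_obs) - h_obs
    return float(np.sum(residuals ** 2))


# Synthetic observations generated from known "true" K values (m/d).
x_obs = np.array([250.0, 750.0, 1250.0, 1750.0])
k_true = np.array([50.0, 5.0])
h_obs = simulate_heads(np.log10(k_true), x_obs)

# Start from K = 10 m/d in both zones and let the optimizer adjust log10(K).
result = minimize(
    sum_squared_error,
    x0=[1.0, 1.0],
    args=(x_obs, h_obs),
    method="Nelder-Mead",
    options={"xatol": 1e-6, "fatol": 1e-10, "maxiter": 2000},
)
k_fit = 10.0 ** result.x
```

Working in log10(K) keeps the parameters positive and comparably scaled, which is a common choice because hydraulic conductivity spans orders of magnitude. A derivative-free method is used here because each objective evaluation in a real application is a full model run with no analytic gradient available.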
The static water level observations used for calibration were collected from historical records and from wells in the monitoring network. Historical observations are taken from all known drillers' logs and from production well pump tests conducted at the time of drilling. These "predevelopment" water levels represent the most reliable, spatially distributed static water level information for Tutuila, since no dedicated monitoring wells have been drilled on the island.
Historical water levels are lumped together with average water levels from the monitoring network, and these water levels are automatically updated as monitoring well instruments are downloaded and the data are uploaded to GitHub. As new wells are drilled by ASPA, or as production wells are taken offline and converted to monitoring wells, these data will be included as well. Please note that the model results shown in the figures below are example visualizations and do not represent appropriately calibrated model outputs.
Figure 10: Examples of calibrated hydraulic conductivity (K) distributions and resulting water table elevation contours. A) shows zone-based K calibration based on simplified geologic units from Stearns (1944). B) shows grid-based K calibration developed using the pyPCGA optimization technique. C and D) show water table elevation contours computed using the zone-based and the pyPCGA methods, respectively. Note that results are shown only for demonstration purposes as the model calibration remains, as of this writing, in an oversimplified state.

Discussion and Conclusions
The traditional approach to groundwater modeling has a number of significant drawbacks. It is expensive, it produces products with limited longevity, and it is technologically dated. Recent advances in social networking are spilling over into how we communicate professionally, how we work collaboratively, and how we approach data science. Scientific endeavors, and especially computational tasks such as groundwater modeling, are well poised to take advantage of these new developments. Improvement in the shareability of information is revolutionizing how we work with each other, and this allows for a new process-based paradigm that promotes the maintenance of long-standing project partnerships. The collaborative, process-based approach is especially well suited to development of groundwater models on small islands such as Tutuila, where there is a critical management need for environmental models but limited resources to develop and maintain the scientific capacity to use them.
Groundwater modeling is a complex process and, within the traditional paradigm, often takes multiple years of project development to obtain results. During this period, the original research questions may become outdated, and a model designed for older objectives may not be suited to answering newer, more relevant questions.
To create a low-cost and functional groundwater modeling solution in light of these challenges, we developed a collaborative hydrologic modeling framework that integrates monitoring network data, water budget modeling, and groundwater modeling into a seamless data-to-model

Limitations of the Framework
The primary limitation of the collaborative modeling framework we propose here is the significant investment needed from the participants. Process-based approaches by nature simply require more time and commitment than static product-based approaches. While the existing funding and resource allocation setting at both ASPA and UH have been conducive to this model, we recognize that elements are often not aligned in this way. Nonetheless, we contend that new cloud-based collaborative tools can simplify numerous aspects of this challenge and facilitate the maintenance of relationships between agencies in different locations. Another limitation of this paradigm is that it depends heavily on the abilities and interests of both modelers and the stakeholders whom the work is intended to benefit. In a relationship where neither party is obligated to maintain engagement, the strength of collaboration relies on the cost-to-benefit ratio of maintaining the program for each organization. Therefore, in our program, we strive to ensure that the benefits of working together, such as improved access to resources, maintenance of dynamic models, and improved understanding of results and uncertainties, outweigh the costs.
From a technical standpoint, generalization of the code is another limitation of the framework. Since our backgrounds are in hydrology and not programming, the code used in this project is primarily focused on meeting our and the stakeholders' specific needs, rather than producing a software product intended to generate similar results in different locations. Therefore, applying these workflows to other datasets will require varying degrees of modification, depending on how closely those datasets match the formatting and scale of the inputs used here. To compensate for the limited generality of the code itself, we have focused on making our workbooks and workflows well annotated and easy to understand, so other potential users can learn from these methods and adapt them to their own needs.

Current and Future Management Applications
This case study demonstrates a long-term, process-based groundwater modeling approach that, as of this writing, continues to evolve. As stakeholders continue to develop uses for data and model results, and as our experience using these tools grows, we plan to continue developing the framework to meet the management needs in American Samoa. Planned additions to the monitoring network include additional streamflow stations and continued upgrades to weather station infrastructure. Data will continue to be downloaded on a quarterly basis, and streamflow measurements for rating curve updates remain ongoing. The water budget model will automatically incorporate new monitoring data as it is updated quarterly, and we are currently working directly with numerous stakeholders at agencies throughout American Samoa to develop future land-use scenarios for the water budget model. These scenarios will be combined with the future climate scenarios to provide multi-faceted predictions of impacts to groundwater recharge under different possible futures.
The groundwater modeling component is, as stated before, ongoing, with continued model calibration and validation and with groundwater pumping scenarios being run to assess potential rise in the transition zone. Collection of high-resolution salinity data is currently planned as part of ASPA's system-wide operational-SCADA system upgrades, and as these data become available we intend to improve the FloPy model by applying our existing calibration and validation approaches, e.g. the scipy.optimize method and pyPCGA (Lee et al., 2016). In addition, we intend to explore and apply other open-source Python-based modules such as SALib (Herman & Usher, 2017) and SPOTPY (Houska et al., 2015) to refine our existing methods of sensitivity testing and model calibration. The ability to plug new tools into our analysis pipeline as they are developed and as we become aware of them is one of the great advantages of maintaining the workflow in the Python ecosystem. The groundwater model has already exposed, and will likely continue to expose, data gaps that can be prioritized in the future. These include developing additional monitoring well capacity and placing additional constraints on mountain front recharge behavior in the Tafuna-Leone Plain area. Ultimately, addressing the issues of sustainable yield and salt-water intrusion remains the primary goal of the groundwater modeling component, and we anticipate that the tools developed here will be an important part of laying the foundations for these efforts to improve water resources sustainability in American Samoa.