User-friendly workflows for catchment modelling: Towards reproducible SWAT+ model studies

A Graphical User Interface (GUI) is regularly used to support model applications in catchment hydrological modelling software. A GUI is generally user-friendly for novice users but introduces sources of irreproducibility. We illustrate that none of the 10 Soil and Water Assessment Tool (SWAT) models over the Upper Blue Nile can easily be reproduced. Scripted workflows provide the ability to reproduce model set-ups, but they may be less user-friendly, especially to novice users. We present software (SWAT+ AW) that promotes reproducible SWAT+ model studies while remaining user-friendly for both novice and expert users. SWAT+ AW uses a configuration file to create models that are compatible with the GUI. We applied the workflow to the Blue Nile catchment and show that it yields the same results as the SWAT+ GUI. We conclude that such user-friendly scripted workflows enhance reproducibility, transparency and reusability of hydrological models. The software is publicly available at https://github.com/VUB-HYDR/SWATPlus-AW.


Introduction
The scientific method relies on the ability of scientists to reproduce each other's published results so they can build upon prior knowledge (Begley and Ioannidis, 2015; Marcus, 2015). Recently, the reproducibility of science has come under scrutiny (Marcus, 2015). It has been discovered that a large proportion of scientific research is not reproducible (Vasilevsky et al., 2013; McNutt, 2014). In their survey involving researchers from biology, chemistry, earth and environment, medicine, physics and engineering, and other fields, Baker and Penny (2016) reported that more than 70% of the researchers who tried to reproduce another scientist's experiment failed to do so. Research groups at Amgen Corporation were able to reproduce only 11% of the academic research on haematology and oncology (Begley and Ellis, 2012; Vasilevsky et al., 2013). In many other cases, major conclusions have not been confirmed, even when repeated by the same investigators (Begley and Ioannidis, 2015). A more troubling source of irreproducible research is scientific fraud and misconduct. Scientific fraud occurs when researchers tamper with the model to get the results that they are looking for.
Just like other scientific fields, the catchment modelling community suffers from reproducibility issues (Hutton et al., 2016). In response, Crout et al. (2008) have detailed 'Good modelling practice' which adds on 'Ten iterative steps in development and evaluation of environmental models' proposed by Jakeman et al. (2006). The guidelines presented in these papers help boost provenance and reproducibility in modelling studies. As such, current publication practices in catchment studies should enable reproducibility (Shu et al., 2012), but there are a number of sources of irreproducibility in the model creation process. This is mainly because there are many adjustments made during a catchment model setup to tailor the model to a specific context. Furthermore, some catchment modelling exercises also use some bits of code during the model setup phase which, in many cases, are not made available (Hutton et al., 2016) and are too specific for that exercise. While some catchment modellers take time to report specific details of their model setup, other technical information is still not included. The above causes of irreproducible research point to lack of transparency in reporting as the overall cause of irreproducible research in catchment modelling. Crout et al. (2008) also list the lack of transparency as one of the barriers to reproducible studies. Refsgaard and Henriksen (2004) point out that without transparency, "modelling projects can be difficult to audit, and without a considerable effort it is hardly possible to reconstruct, repeat and reproduce the modelling process and its results".
The problem is illustrated when evaluating the reporting in peer reviewed journals of several studies using SWAT for catchment modelling of the Upper Blue Nile catchment. For each of the studies, the availability of the information that is necessary to reproduce the study is shown in Table 1. Lacking any of this information makes it very difficult if not impossible to reproduce the SWAT model results. Leonard and Duffy (2016) emphasise the value of workflows in capturing all steps for reproducibility and provenance. Furthermore, Crout et al. (2008) encourage modellers to embrace the use of automated methodologies to support transparency and understanding of results. Automating the model creation workflow stores all settings and data which, if made available, solves the lack of transparency on model creation. Catchment modelling can, thus, benefit from using fully automated workflows for the entire model setup process. However, one needs at least basic programming skills to use automated workflows. In addition, introducing automated workflows to catchment modelling typically replaces a Graphical User Interface (GUI) which often allows direct manipulation of the models and is also more intuitive for a novice user. Nielsen et al. (2017) pointed out that GUIs assist in bridging the expertise gap experienced by potential new model users. However, when Chen and Zhang (2007) compared the advantages and disadvantages of both GUIs and Text-based User Interfaces (TUIs), which most workflows use, they concluded that the TUI offers more benefits to expert users than to novice users.
Table 1. Availability of information for successful SWAT+ model reproducibility for past studies in the Blue Nile (Ayele, 2017; Bayissa, 2018; Betrie, 2011; Gashaw, 2018; Lemann, 2016; Roth and Lemann, 2016; Tegegne and Kim, 2018; Woldesenbet, 2017; Worku et al., 2017; Worqlul, 2018).
C.J. Chawanda et al. Environmental Modelling and Software 134 (2020) 3
We argue that an ideal workflow for catchment model setup should care for both the novice and the expert.
This study aims at promoting reproducibility and transparency of SWAT+ for both novice and expert users by presenting software, the SWAT+ Automatic Workflow (SWAT+ AW), for setting up the Soil and Water Assessment Tool+ (SWAT+). The innovative element is that the software is interoperable with several GUIs used to set up SWAT+ models. A user can start in the SWAT+ GUIs, which give more support to novice users, and transfer the model to SWAT+ AW. We elaborate on how the software improves reproducibility of SWAT+ models by summarising model configuration into a single configuration file (config.py). We show that the configuration file alone is sufficient to allow reproducible SWAT+ models as it transparently stores all model settings.

Present modelling procedure for SWAT+
The Soil and Water Assessment Tool (SWAT) (Arnold et al., 2012) is one of the most widely used catchment modelling software packages. It is a deterministic, time-continuous, semi-distributed hydrological model developed for application at catchment scale. SWAT has been applied to many case studies in different parts of the world (e.g. Ricci et al. (2018); Perra et al. (2018); Aouissi et al. (2018); Shen et al. (2015); Bieger et al. (2013)). SWAT is based on the concept of Hydrologic Response Units (HRUs). HRUs are areas with a unique combination of soil, land use, slope and sub-catchment at which calculations take place. Results are routed to the outlet through a stream network that is often derived from the DEM. SWAT+ (Arnold et al., 2018) is a completely restructured version of SWAT. SWAT+ uses the same equations as SWAT while offering more flexibility in model configuration to the user. SWAT+ introduces Landscape Units (LSUs) to discretise subbasins further and allow separation of upland processes from wetlands. During model setup, subbasins are delineated based on stream thresholds while LSUs are delineated based on channel thresholds as described in the user manual (https://swatplus.gitbook.io/docs/user/qswat+). Decision tables have also been introduced in SWAT+ to schedule activities that are only carried out when specified conditions are met. Decision tables in SWAT+ can be used to specify land use management and reservoir management. SWAT+ also introduces connection files which allow users to connect objects within the model easily.
Currently, the only interfaces for setting up a SWAT + model are QSWAT+ (Quantum GIS (QGIS) based interface) and SWAT + Editor. Apart from input data preparation, the model application procedure can be described based on the steps presented in the GUI (Fig. 1). Steps 1 through 3 are performed using the QSWAT + software in QGIS, whereas steps 4 and 5 are performed using SWAT + Editor.

New workflow for SWAT+: The SWAT+ Automatic Workflow (SWAT+ AW)
We created software that automates the procedure to set up a SWAT+ model from a given dataset and from settings selected by the user in a single file that we refer to as the config file. We implemented the software using a Python-based scripting environment. Python is a simple programming language with extensive library support for programming in general (Snyder, 2007). Because Python is an interpreted, high-level programming language, users can easily add functions that they need to the software. A list of dependencies for SWAT+ AW is specified in the user manual available from the SWAT+ AW repository (https://github.com/VUB-HYDR/SWATPlus-AW).
SWAT+ AW takes the directory where it is executed as an argument and uses the config file available in that directory to set up the SWAT+ model, which is saved in the same directory as the config file. The model can be opened in the GUIs that are available to set up SWAT+ models. SWAT+ AW can also retrieve data from an already existing model and create a config file that reproduces the model from which the config file is created. The software is described in detail below.

Software requirements
The workflow has been tested on Windows 10, where it uses Python 3.7, and on Linux (Ubuntu 20.04), where it uses Python 3.8. Users need to install the 64-bit version of QGIS v3.10 and SWAT+ AW using the installer from the repository.

Input data
The input data is prepared in the same way as input that is used in the GUI. However, it is placed in a predefined directory structure within the directory named 'data' within the directory where SWAT+ AW is run (Fig. 2). The organization of the input data files is further described in the user manual available from the repository.

The config file (config.py)
The workflow requires user options which are specified through the config file (Appendix 8.1). The config file is a Python file where the user enters model configuration settings using text editing programs such as Notepad. It is placed in the same directory as the data directory. The following information must be entered into the config file to set up a new model: If there is information on reservoir management and land use management for the model being set up, this information can be entered in the config file, and SWAT + AW will use this information during the model setup. Otherwise default management practices will be applied.
The workflow allows the user to retrieve input data and a config file from an existing model setup that was created in QSWAT+ and SWAT + Editor or using SWAT + AW. Data and config file are retrieved if the Model_2_config option is set to True. In this case, the model should already be present in the directory where the config file is located. SWAT + AW also has an option to keep a log for troubleshooting later or just for keeping a record during model setup.
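To make the idea of a Python config file concrete, the fragment below is a minimal sketch of what such a file could look like. Only Model_2_config, Calibrate and Make_Figures are option names taken from the text; every other variable name and every value is a hypothetical placeholder chosen for illustration, and the real option names should be taken from the user manual in the repository.

```python
# config.py (illustrative sketch only)
# Option names other than Model_2_config, Calibrate and Make_Figures are
# hypothetical placeholders, not the names used by SWAT+ AW.

# Switches named in the text
Model_2_config = False   # True: derive data and config from an existing model
Calibrate = False        # True: run the built-in calibration
Make_Figures = True      # True: create maps from the model results

# Hypothetical settings of the kind the text describes
Project_Name = "blue_nile"
Stream_Threshold = 500               # assumed to apply to flow accumulation
Channel_Threshold = 50               # must not exceed Stream_Threshold
Slope_Classes = [0, 5, 15]           # class boundaries, assumed in percent
Outlets_Shapefile = "outlets.shp"    # additional outlets
Keep_Log = True                      # keep a log for troubleshooting
```

Because the file is plain Python, it can be kept under version control and attached to a publication, which is what makes the settings auditable.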

Software code structure
Before starting SWAT+ AW, the user should populate the config file and the data directory. The user can start the software by opening Command Prompt or PowerShell in the current directory, typing swatplus_aw and pressing enter. Running this command sets up the Python environment for working with the QGIS Application Programming Interface (API; PyQGIS), which is necessary to run the main steps, 1 through 6 (Fig. 3). It is also possible to run only a specific step without running all the steps. The swatplus_api command is used with a step option as an argument to run only one step, e.g. swatplus_api prepare_project. The available step options include: prepare_project, delineate_watershed, create_hrus, setup_editor_project, import_weather, configure_model_options, setup_management, write_files, run_swatplus, make_figures and calibrate.
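For users who want to drive single steps from a script, the sketch below builds the swatplus_api command line for a chosen step and optionally executes it. The command name and the step names come from the text; the helper functions step_command and run_step are hypothetical, and the dry-run default is an assumption made so that the example does not require an installed workflow.

```python
import subprocess

# Step options as listed in the text
STEPS = (
    "prepare_project", "delineate_watershed", "create_hrus",
    "setup_editor_project", "import_weather", "configure_model_options",
    "setup_management", "write_files", "run_swatplus", "make_figures",
    "calibrate",
)

def step_command(step):
    """Return the command line that runs one workflow step."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step!r}")
    return ["swatplus_api", step]

def run_step(step, workdir, dry_run=True):
    """Run one step in workdir; with dry_run=True only return the command."""
    command = step_command(step)
    if not dry_run:
        # workdir is the directory holding config.py and the data directory
        subprocess.run(command, cwd=workdir, check=True)
    return command

print(" ".join(run_step("prepare_project", workdir=".")))
```

Validating the step name before calling the command gives an early, readable error for typos instead of a failure inside the workflow.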
Details for each step are discussed below.

Prepare project
The project preparation stage is performed depending on whether the Model_2_config option in the config file is set to True or False. If Model_2_config is set to True, project preparation is not carried out. Otherwise, SWAT+ AW will prepare the QGIS project structure for the SWAT+ model application. This stage includes the following steps:
(i) Creation of the default directory structure for a SWAT+ application: The name of the parent directory bears the name of the project specified in the config file (Fig. 4).
(ii) Set up of databases: Two databases are set up. One is swatplus_datasets.sqlite, which contains default parameters for the SWAT+ model. SWAT+ AW also creates the SQLite project database bearing the project name. The project database at a later stage stores HRU data. The scripted workflow also imports landuse lookup, soil lookup and usersoil tables into the database. The names of the tables are acquired from the config file.
(iii) Transfer of GIS input data: SWAT+ AW integrates input raster and shapefiles from the data directory into the project directory structure under Watershed (Fig. 4).
(iv) Creation of a QGIS project file: The file bears the project name, which is acquired from the config file. SWAT+ AW also stores information on watershed delineation and HRU creation into the QGIS project file so that QSWAT+ can read it later when the user opens the project in the GUI. This file is saved in the Project Name directory (Fig. 4).
After the project preparation stage, all files are in place, but there is still the need to delineate the watershed and create HRUs before SWAT + Editor functions can be performed on the project. Watershed delineation and HRU creation are done in step 3.

Create config file
This step is only performed if the Model_2_config option in the config file is set to True, as seen in Fig. 3. If this step is performed, it means step 1 was not performed. SWAT+ AW will look for SWAT+ models in the directory where it is launched. If any of the models found is listed in the config file, it will extract (a) raster files and shapefiles from the Project Name/Watershed directory within the project directory structure (Fig. 4), (b) lookup tables, the usersoil table, and model options from the project database, and (c) weather data from the model. If the model listed in the config file is not found, or if there is no config file in the directory, the user is asked to select from the list of available models.
At the end of this stage, a data directory is created and populated by the data retrieved from the model, while model configuration information is saved in the config file.

Create HRUs
The aim of this stage is to generate unique combinations of four layers, namely: landscape units (LSUs; sub-divisions of sub-basins), soils, land use and slope classes. However, by the time SWAT+ AW reaches this stage, the landscape units layer and the slope classes layer do not exist yet. Therefore, the first step at this stage is Watershed Delineation in order to create sub-basins and landscape units.
Watershed Delineation involves the use of the DEM to calculate a flow direction and a flow accumulation map. Stream and channel thresholds are applied to the flow accumulation map to derive Stream and Channel Networks respectively. Stream outlets are created in the stream network wherever two streams meet, and additional outlets are acquired from the outlets shapefile listed in the config file. The watershed is delineated by drawing a water divide around the pixels that contribute to the main outlet. In the same way, sub-basins are delineated by drawing a water divide around pixels that contribute to each stream outlet while LSUs are delineated using channel outlets. It is important to note that the channel threshold must be equal to or smaller than the stream threshold. In addition, the complexity of the generated networks depends on the magnitudes of the stream and channel thresholds; the smaller the thresholds, the higher the complexity.
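The role of the two thresholds can be illustrated with a toy flow accumulation grid. The sketch below is not the delineation code used by SWAT+ AW or QSWAT+; it only shows, under the assumption that thresholds are expressed in numbers of contributing cells, why a smaller channel threshold gives a denser network that contains the stream network. The function names and grid values are made up for illustration.

```python
def extract_network(flow_acc, threshold):
    """Mark cells whose flow accumulation reaches the threshold."""
    return [[cell >= threshold for cell in row] for row in flow_acc]

def derive_networks(flow_acc, stream_threshold, channel_threshold):
    """Return (stream cells, channel cells) as boolean grids."""
    if channel_threshold > stream_threshold:
        raise ValueError("channel threshold must not exceed stream threshold")
    streams = extract_network(flow_acc, stream_threshold)
    channels = extract_network(flow_acc, channel_threshold)
    return streams, channels

# Toy flow accumulation map (contributing cells per pixel, made-up values)
flow_acc = [
    [1, 1, 2, 1],
    [1, 3, 5, 1],
    [1, 2, 9, 2],
    [1, 1, 14, 1],
]

streams, channels = derive_networks(flow_acc, stream_threshold=9,
                                    channel_threshold=3)
n_stream = sum(map(sum, streams))    # number of stream cells
n_channel = sum(map(sum, channels))  # number of channel cells
print(n_stream, n_channel)
```

Lowering either threshold marks more cells, which corresponds to the higher network complexity described above.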
Next, the slope class map is created. This map is derived from the DEM using the slope classes specified in the config file. Once the slope map has been created, all four layers necessary for creating HRUs are available. The four layers are overlaid, and unique combinations of the layers are identified and referred to as potential HRUs. These potential HRUs are filtered using methods and thresholds specified in the config file to form the final HRUs. The final HRUs are then saved in the project database.
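The overlay and filtering logic can be sketched on small grids. The example below counts cells for each unique (LSU, soil, land use, slope class) combination and then applies one possible filter, a minimum area fraction within each LSU. The filter is an assumption chosen for illustration; the actual filtering methods are those specified in the config file, and this sketch simply drops small combinations rather than redistributing their area.

```python
from collections import Counter

def potential_hrus(lsu, soil, landuse, slope):
    """Count cells per unique (LSU, soil, land use, slope class) combination."""
    combos = Counter()
    for rows in zip(lsu, soil, landuse, slope):
        for key in zip(*rows):  # one tuple per cell
            combos[key] += 1
    return combos

def filter_by_area(combos, min_fraction):
    """Keep combinations covering at least min_fraction of their LSU."""
    lsu_cells = Counter()
    for key, n_cells in combos.items():
        lsu_cells[key[0]] += n_cells
    return {key: n for key, n in combos.items()
            if n / lsu_cells[key[0]] >= min_fraction}

# Toy 2 x 4 layers (made-up values)
lsu = [[1, 1, 2, 2], [1, 1, 2, 2]]
soil = [["a", "a", "b", "b"], ["a", "a", "b", "b"]]
landuse = [["crop", "crop", "forest", "crop"],
           ["crop", "forest", "forest", "forest"]]
slope = [[1, 1, 1, 1], [1, 1, 1, 1]]

potential = potential_hrus(lsu, soil, landuse, slope)
final = filter_by_area(potential, min_fraction=0.3)
print(len(potential), len(final))
```

In this toy case each LSU holds one dominant and one minor combination, so the filter halves the number of HRUs.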

Edit inputs and run
At this stage, SWAT + AW creates and saves a SWAT + Editor project for compatibility with the SWAT + Editor GUI. Weather information is then imported from the data directory, and model options are configured in the project database.
The SWAT+ executable does not read directly from the databases. ASCII files containing all information about the model setup are written to the TxtInOut directory in the directory structure specified in Fig. 4, from where the executable runs. If calibrated parameters are provided, SWAT+ AW will apply them before attempting to run the model. The workflow finally runs the model using the executable type specified in the config file:
1. Do not run. If this option is used, the model is not run.
2. Release. The release executable runs faster, but it does not provide information for troubleshooting errors.
3. Debug. The debug version runs slowly, but it provides information for troubleshooting errors if they occur.

Run calibration and make figures
Calibration is performed only if the Calibrate option in the config file is set to True. Calibration is simply based on the best performing parameter set from a sample generated using Latin-Hypercube sampling described by McKay et al. (1979). The user can specify the sample size and also the number of processes in the config file to run calibration in parallel. A calibration configuration file is also specified in the config file, and it stores settings for performing the calibration including calibration periods, parameter ranges and observation file names. Observation data is obtained from the observations directory within the data directory (Fig. 2).
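A minimal sketch of this kind of calibration is given below: a Latin-Hypercube sample is drawn over the parameter ranges and the best-scoring parameter set is kept. It is not the SWAT+ AW implementation. The parameter ranges are hypothetical, and the score function is a stand-in for running the model and computing a performance metric against observations.

```python
import random

def latin_hypercube(ranges, n_samples, seed=None):
    """Draw n_samples parameter sets with one value per stratum and parameter."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        # One random point inside each of n_samples equal-width strata
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)  # decouple the strata of different parameters
        columns[name] = [low + p * (high - low) for p in points]
    return [{name: columns[name][i] for name in ranges}
            for i in range(n_samples)]

def best_parameter_set(samples, score):
    """Return the sample with the highest score."""
    return max(samples, key=score)

# Hypothetical parameter ranges, for illustration only
ranges = {"cn2": (35.0, 95.0), "esco": (0.0, 1.0)}
samples = latin_hypercube(ranges, n_samples=5, seed=42)

def toy_score(params):
    # Stand-in objective; a real score would come from a model run
    return -abs(params["cn2"] - 70.0) - abs(params["esco"] - 0.5)

best = best_parameter_set(samples, toy_score)
print(best)
```

Since each model run is independent of the others, the evaluation of the samples can be distributed over several processes, which is what the number-of-processes setting in the config file controls.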
Finally, SWAT+ AW creates maps for different variables extracted from the model results if the Make_Figures option is set to True. These maps are saved in the Figures directory within the Default directory (Fig. 4).

Study area
We tested the workflow by applying it to the Blue Nile catchment. The Blue Nile River is one of two major tributaries of the Nile River. It begins from Lake Tana in Ethiopia and joins the White Nile at Khartoum, Sudan (Fig. 5), and covers a total area of 330,000 km² (Ali et al., 2014). Elevation above sea level varies from about 3000 m in the Ethiopian highlands to approximately 400 m at Khartoum.
The annual rainfall ranges from 900 mm to 2200 mm in the upper part (Ali et al., 2014) while the lower part receives about 135 mm on average (Mahmoud et al., 2014). The catchment's large portions of natural forests have rapidly been transformed to farmland due to population increase (Ali et al., 2014). Agriculture heavily depends on irrigation utilizing water from shallow wells close to rivers or surface water owing to low rainfall, especially on the Sudanese part of the catchment.
Groundwater recharge is limited in the catchment, and most of the water in the catchment is lost through evapotranspiration which amounts to 955 mm on average annually according to WaPOR (FAO, 2018). MacAlister et al. (2013) reported annual groundwater recharge below 50 mm/year in the arid plain of Sudan and up to 400 mm/year in the highland areas of Ethiopia.

Available data
The scripts used to process data used for the application of the workflow can be found at https://github.com/VUB-HYDR/2020_Chawanda_etal_EMS and the prepared data is available from https://doi.org/10.4211/hs.0890b3a954bf423db7d5b08f122b5436.
Digital Elevation Model: A 300 × 300 m² resolution DEM was obtained by re-sampling the DEM downloaded from the Shuttle Radar Topography Mission (Farr et al., 2007). This data can be downloaded from http://srtm.csi.cgiar.org.
Soil Map: A 300 × 300 m² resolution soil map of Africa was prepared from the Food and Agriculture Organization (FAO) Digital Soil Map of the World (DSMW) presented in FAO GeoNetwork (FAO et al., 2009). Usersoil (soil properties) and soil lookup tables for the DSMW dataset were obtained from the MapWindow-SWAT database. The DSMW dataset can be downloaded from http://www.fao.org/geonetwork/srv/en/resources.get?id=14116&fname=DSMW.zip&access=private, and MapWindow-SWAT is freely available from the SWAT website (https://swat.tamu.edu/software/mwswat/).
Land Use Map: The land use map was obtained from the European Space Agency (ESA) Climate Change Initiative - Land Cover (Defourny et al., 2017) for 2009 and had a resolution of 300 × 300 m². The land cover map can be downloaded from http://maps.elie.ucl.ac.be/CCI/viewer/download.php.
Weather Data: The EartH2Observe, WFDEI and ERA-Interim data Merged and Bias-corrected for ISIMIP (EWEMBI) (Lange, 2016) dataset was used in the study. The original dataset includes records of precipitation (kg m⁻² s⁻¹), minimum and maximum temperatures (K), solar radiation (W m⁻²), wind speed and relative humidity (%) at a daily time-step and with a spatial resolution of 0.5°. This data was prepared in the format for use with SWAT. The units were converted from kg m⁻² s⁻¹ to mm/day for precipitation, from K to °C for minimum and maximum temperatures, and from W m⁻² to MJ m⁻² d⁻¹ for solar radiation. Units for wind speed (m s⁻¹) and relative humidity (%) did not require conversion.
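The unit conversions described above follow from standard physical relations and can be sketched as below. The function names are hypothetical and are not taken from the authors' data preparation scripts. The precipitation conversion relies on 1 kg of water spread over 1 m² forming a 1 mm layer.

```python
SECONDS_PER_DAY = 86_400

def precip_to_mm_per_day(kg_m2_s):
    """kg m-2 s-1 to mm/day (1 kg of water over 1 m2 is a 1 mm layer)."""
    return kg_m2_s * SECONDS_PER_DAY

def kelvin_to_celsius(kelvin):
    """K to degrees Celsius."""
    return kelvin - 273.15

def radiation_to_mj_m2_day(w_m2):
    """Daily mean W m-2 to MJ m-2 d-1 (1 W = 1 J/s, 1 MJ = 1e6 J)."""
    return w_m2 * SECONDS_PER_DAY / 1e6

print(precip_to_mm_per_day(1e-4))    # a daily mean rate of 1e-4 kg m-2 s-1
print(kelvin_to_celsius(293.15))
print(radiation_to_mj_m2_day(100.0))
```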

Methodology
To test the program, we first built the Blue Nile catchment model using the QSWAT+ and SWAT + Editor GUIs (Fig. 6). The settings and options for the model which were picked just for testing purposes are shown in Table 2.
We built another model using SWAT+ AW (Fig. 6 b) with the same settings as in Table 2. We organised input data into the directory structure shown in Fig. 2. The settings for the model setup were entered into a config file, and we ran SWAT+ AW to set up the SWAT+ model. This model was compared with the one created using the GUIs. Both models are accessible through HydroShare.
We then analysed the mass balance in the catchment for each model setup. We also compared flow results from two model setups against each other using Nash-Sutcliffe Efficiency (NSE) and Root Mean Square Error (RMSE) (Moriasi et al., 2007). Lastly, we generated the config file from the model made using the GUIs and compared it to the config file used to set up the SWAT + AW model (Fig. 7).
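For reference, the two metrics can be computed as in the sketch below, which uses the standard definitions of NSE and RMSE and treats one series as the reference. The flow values are made-up numbers, not results from the study. Two identical series give an NSE of 1 and an RMSE of 0.

```python
import math

def nse(reference, simulated):
    """Nash-Sutcliffe Efficiency of simulated against reference."""
    mean_ref = sum(reference) / len(reference)
    residual = sum((r - s) ** 2 for r, s in zip(reference, simulated))
    variance = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - residual / variance

def rmse(reference, simulated):
    """Root Mean Square Error between the two series."""
    squared = sum((r - s) ** 2 for r, s in zip(reference, simulated))
    return math.sqrt(squared / len(reference))

# Made-up flow series standing in for the GUI and workflow hydrographs
gui_flow = [12.0, 30.5, 18.2, 9.7]
aw_flow = list(gui_flow)

print(nse(gui_flow, aw_flow), rmse(gui_flow, aw_flow))
```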

Results
Both models had 209 LSUs and 1329 HRUs. Comparing flow timeseries from the two models yielded an NSE value of 1.00 and an RMSE of 0.00. These values indicate that there is a perfect fit between the hydrograph from the SWAT + AW and the hydrograph from the GUI.
We also compared the water balance components from both model setups. The resulting water balance components were identical, and either method in setting up the model yields exactly the same results.
The config file used for the SWAT + AW setup (appendix 8.2) had the same settings as the one derived from the GUI model setup (appendix 8.3), demonstrating that indeed SWAT + users can derive config files for their model setups and share them to promote transparency and reproducibility of their work. Furthermore, SWAT + AW prepares and organises the input that was used to create the model into the folder structure shown in Fig. 2 so that the data can be used without further processing. Sharing the data makes sure that others can reproduce models without worrying about the preparation of SWAT + input dataset.
It is also possible to open the project created by the workflow in the GUI, which allows visualisation and adaptation if necessary. However, it is easier to change settings in the workflow as the user then only needs to edit the config file to change the desired setting, and run swatplus_aw. If the user changes the config file, they can choose to run SWAT + AW from a specific point without having to start from the beginning. On the other hand, using the GUI requires the user to go through the Delineate Watershed and Create HRU Windows. Afterwards, SWAT + Editor would still have to be used again to input weather data and write the ASCII files from which the SWAT + executable reads.

Discussion
In this study, we present a new Python-based software to manage workflows for setting up SWAT + models keeping all model configuration settings in one file, the config file. Users can share the config file to allow others to run the workflow to recreate models. We have shown that the results obtained through the workflow are identical to those obtained through the GUIs. Thus, users are assured that if they use the workflow to create config files and share data, their models will fully be reproducible.
SWAT+ AW allows the retrieval of a config file and data from a model created in the GUIs. This provides beginners with an opportunity to set up their model in the GUIs and obtain a config file and prepared dataset that reproduces their model. Thus, both beginners and expert users can ensure reproducible models using the workflow and sharing the config file. Note that sharing the config file is not the same as sharing the QGIS project file. This is because in the SWAT+ AW software, the QGIS project file is a derivative of the config file, and apart from the user-unfriendly format for text in the XML file, it does not have all configuration information contained in the config file such as routing methods, infiltration method and ET calculation method.
Not only does SWAT + AW promote reproducibility for SWAT + models, but also transparency. The configuration of each SWAT + model needs to be communicated clearly for people who need to build upon previous works. However, Table 1 indicates that researchers cannot build on previous modelling studies in the area due to incomplete model configuration information. While models may be configured based on the purpose of the study or the data that is available, the number of models in Table 1 built for the same area may also indicate lack of confidence in the existing models upon which to build new work.
With the emergence of model sharing platforms such as SWATShare (Rajib et al., 2016) and HydroShare (Horsburgh et al., 2016;Morsy et al., 2017), config files and datasets can be published with an assigned unique persistent identifier such as DOI. The unique identifier can be used for reference in published papers which then provides an easy and transparent way of sharing modelling settings and options used in the publication. We encourage the SWAT + users to generate config files and publish to HydroShare to boost provenance and make it easy for others to audit and reproduce their work. This will enhance progress in model studies as researchers gain trust in previous studies in the area.
One of the major advantages of SWAT+ AW is that users will be able to script runs. Users can script model runs using different config files and create models with varying configurations. For example, using this approach, users can vary parameters in a given range or distribution and generate stochastic results for quantification of hydrological model uncertainty. At present, model configurations for different hydrological models cannot be changed as easily as changing values in a config file since the user would have to build each model setup in the GUI. Thus, we need scripted workflows across different hydrological modelling software to save time and effort when testing different configurations of a model.
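As a rough illustration of scripted runs, the sketch below writes one config file per combination of stream and channel thresholds, each in its own run directory. The variable names inside the generated files are hypothetical placeholders rather than the real SWAT+ AW option names, and a temporary directory is used so that the example leaves no files behind.

```python
import tempfile
from pathlib import Path

# Template with hypothetical option names, for illustration only
CONFIG_TEMPLATE = """\
Project_Name = "{name}"
Stream_Threshold = {stream}
Channel_Threshold = {channel}
"""

def write_variants(base_dir, thresholds):
    """Write one config.py per (stream, channel) threshold pair."""
    paths = []
    for stream, channel in thresholds:
        if channel > stream:
            raise ValueError("channel threshold must not exceed stream threshold")
        name = f"run_s{stream}_c{channel}"
        run_dir = Path(base_dir) / name
        run_dir.mkdir(parents=True, exist_ok=True)
        config_path = run_dir / "config.py"
        config_path.write_text(CONFIG_TEMPLATE.format(
            name=name, stream=stream, channel=channel))
        paths.append(config_path)
    return paths

with tempfile.TemporaryDirectory() as tmp:
    configs = write_variants(tmp, [(500, 50), (300, 30)])
    names = [path.parent.name for path in configs]
    first_text = configs[0].read_text()

print(names)
```

Each run directory would also need the data directory before the workflow is started in it, since SWAT+ AW reads the config file and data from the directory where it is executed.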
SWAT+ AW allows simulations to be reproduced when the simple calibration mechanism that is incorporated in the workflow is used. Nevertheless, SWAT+ AW can be useful even if another algorithm outside the software, such as IPEAT+ (Yen et al., 2019), has been used for calibration. In that case, users must include the calibrated parameters in the workflow to reproduce the final calibrated model results.
Scripted workflows also provide many opportunities in enhancing the modelling process if deployed on Cloud Computing (CC) systems and High Performance Computing (HPC) facilities. These facilities overcome storage and computational power limitations that many users face during modelling, especially users working on large scale hydrological modelling. Taking advantage of CC and HPC, datasets can be kept on the facilities and users around the world would just provide a config file (and data if not available online) to create new or reproduce old models. User-friendly workflows such as the one presented in this paper enable both novice users and experts to take advantage of the CC and HPC facilities without losing compatibility with GUIs which may be used for visualising model structure and results.
There are features that can improve the workflow; for instance, better calibration mechanisms can be added to improve calibration results. However, because the software is open-source and written in Python 3.7, users can easily adapt modules to include the features they need. It is worth pointing out that while it may not be practical to adapt SWAT+ AW to other hydrological modelling software, SWAT+ AW highlights the key elements that similar software should have, namely interoperability with GUIs and a comprehensive config file, and provides open-source building blocks to set up similar environments for other models.

Conclusions
Lack of reproducibility of catchment models as reported in scientific research is an important issue which is largely due to the lack of information in reporting, as demonstrated in Table 1. This paper tackles this issue by providing software to manage workflows that summarizes all user settings in a single config file. Attaching this file as supplementary information provides enough information to allow for reproducible model results. Hence, the tool will support good modelling practices for model-based research. Furthermore, sharing data along with the config file makes the model accessible to other researchers, which enables further research in areas where data is difficult to find. In addition, SWAT+ AW allows reusing the same model with slightly different settings, for instance when a user scripts runs using a range of parameters for obtaining stochastic model results. The innovative element is that the workflow is fully interoperable with the GUI: a model produced by the workflow can be opened and adapted in the GUI, while a model built in a GUI can be converted to a data set and config file that reproduce the model built in the GUI. Our software can thus be used by SWAT+ users with different backgrounds and scripting skills.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.