Web-based access, aggregation, and visualization of future climate projections with emphasis on agricultural assessments

Access to climate and spatial datasets by non-specialists is restricted by technical barriers involving hardware, software and data formats. We discuss an open-source online tool that facilitates downloading the climate data from the global circulation models used by the Inter-Sectoral Impacts Model Intercomparison Project. The tool also offers temporal and spatial aggregation capabilities for incorporating future climate scenarios in applications where spatial aggregation is important. We hope that streamlined access to these data facilitates analysis of climate related issues while considering the uncertainties derived from future climate projections and temporal aggregation choices. © 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Motivation and significance
Studies of the effects of climate change on agriculture typically involve using observational data to determine the parameters connecting climate variables to agricultural productivity and then using future climate projections from global circulation models barriers include software, hardware, and the need for specialized skills to handle non-standard formats [3]. In addition to access to the data, the spatial processing is not trivial as it requires expertise in geographic information systems (GIS) methods to process both the climate data and the auxiliary datasets [4].

Scientific rationale
The tool discussed in this article is designed to reduce the technical barriers to accessing climate model outputs. For this, we have built a web-based tool that facilitates downloading and aggregating global grids (0.5 degree) of bias-corrected, historical and future monthly mean temperature and precipitation from the five General Circulation Models (GCMs) used by the Inter-Sectoral Impacts Model Intercomparison Project (ISI-MIP) [5,6]. (See Table 1 for the included models). The scientific problem the tool contributes to solving is how to facilitate the analysis of future climate scenarios in applications where spatial aggregation is important. This includes a wide range of economic analyses focused on either impact assessment [7][8][9] or policy analysis [10].
Access to the data produced by GCMs is far from trivial. For instance, bulk downloads from the Earth System Grid Federation, an open data repository for the CMIP5, require using Linux in order to run the bash script provided by the system. In addition, the bandwidth and storage capacity for the climate data are often limiting factors, both because of the size of the climate datasets and because of the number of existing models and emission scenarios. Moreover, manipulation of the spatio-temporal grids of the climate data requires considerable dexterity in using GIS software. This is even more so when the data need to be merged with other datasets for aggregation over time and/or space. Procedures performed in the user-friendly, point-and-click interfaces of available GIS software are difficult to document, reproduce, and share. Best practices for handling spatial data involve using processing scripts, which requires knowledge of a general-purpose programming language.
In our experience, none of these barriers is necessarily a significant issue for users who have advanced training in climatology or computer science. However, for cross-disciplinary work, a lack of skills for accessing and processing climate data can be a major obstacle. In a widely read blog specializing in interactions between the environment and society, Auffhammer [16] addresses the difficulties economists face in trying to access the climate data from the CMIP5 archive. Hertel et al. [17] identify barriers to accessing geo-referenced data as a main factor impeding a better understanding of how global environmental changes affect the sustainability of the global food system. Online geoprocessing tools have been identified as an effective way to foster multidisciplinary work [18]. Such tools present a number of advantages, including reducing software and hardware costs [19], leveraging shared cyberinfrastructure via web services [20], and combining elements of various workflows across different studies [21].

Target users
The tool targets mainly, but not exclusively, researchers who are interested in the effects of climate change on agriculture but who lack the training and/or resources to obtain climate data projections. The target users of the Climate Scenario Aggregator (CSA) were identified in the course of a multi-year pilot effort originating in a request from the UK Foresight Programme to review the adequacy of the global database infrastructure for analyzing issues related to agriculture and the environment [17]. The need for online tools to deliver large and complex geo-referenced datasets arose from an in-depth diagnosis of the availability of geospatial data for analyzing the impacts of global environmental change [21]. Moreover, the need for these tools was validated through three international workshops with researchers and policy analysts working on climate change issues in both developed and developing countries. We also drew from our experience training graduate students in agricultural economics and computer science to work on multidisciplinary teams on issues related to climate change and global food security.
At the most general level, the CSA tool can be used as a downloading platform for the original GCM data in the ISI-MIP archive. The target user of this functionality is expected to be skilled in NetCDF formats, have a relatively powerful computer with reasonable bandwidth, and be comfortable with the scripting and/or programming languages needed for manipulating and processing spatially-explicit data. A second target user may need some assistance with basic preprocessing of the data, such as temporal and spatial aggregation. This user will benefit from the aggregation programs as well as the preprocessed datasets for temporal aggregation (crop calendars) and spatial aggregation (e.g., from grid cells to countries). Finally, a third target user may be interested in the download and aggregation capabilities of the tool while employing alternative spatial aggregation schemes (e.g., gridded population).
The CSA tool is related to other tools that seek to simplify access to (and spatial geoprocessing of) climate data while leveraging shared resources and expertise. For example, Wang et al. [3] developed user-friendly software applications for downscaling climate data for ecological modeling applications. Meanwhile, Villoria et al. [22] built an aggregation tool that facilitates access to the gridded projections of yield changes produced by the Agricultural Model Intercomparison and Improvement Project (AgMIP) [23]. That tool is widely used, as documented in usage logs and peer-reviewed articles [e.g., 10,24]. The CSA documented in this article has had a rapid uptake by the research community, and its range of applications is much more general than that of the AgMIP tool, so we expect it to have a larger impact.

Software description
The CSA tool is available at the GEOSHARE web site and can be accessed using any standard Internet browser. This tool allows users to calculate, for each half-degree land pixel, a crop-specific growing season average value of temperature and precipitation using the global crop calendars from [25] (see Table 1 for crop coverage). The tool also permits aggregating the pixels to large geographic units using crop harvested area and production from [26]. All of the source code, a Java graphical user interface (GUI) and a set of R functions, can be freely downloaded from the tool's landing page. The documentation and support for users include a User's Manual as well as a set of default regional maps and weighting schemes.

About HUBzero
The GEOSHARE web site was developed based on HUBzero [20], an open source software platform specializing in disseminating scientific data and simulation tools via the World Wide Web. Originating in the nanotechnology community, HUBzero has evolved into a flexible environment for online collaboration, education, and outreach. HUBzero brings a unique and important feature to scientific collaboration: users such as non-experts, domain scientists, and students can rapidly develop online applications and tools, publish and share them with others, and launch computations on national cyberinfrastructure such as XSEDE and the Open Science Grid from their web browser without having to download and install any software. HUBzero also bundles social networking features that specifically support scientific collaboration (tagging, reviews, citations, Q&A, forums, project groups, etc.).
HUBzero has a set of predefined steps that guide tool developers in developing and contributing tools online. The process starts with the developer filling out an online form with basic tool information. The hub manager is then notified to create a new project area for the tool. After that, the developer develops and tests the code using the hub's workspace and Subversion source code repository. When the code development is completed, the hub manager installs the tool, which then becomes available for the development team to test. Upon approval, the tool is officially published and available to other hub users.
Hub tools are desktop tools executed securely in a remote virtual container. Users interact with the tool's graphical user interface in their web browsers, enabled by virtual network computing (VNC) technology. The end user connects to the VNC session using either a Java applet or an HTML5-based client, both of which are supported by all modern browsers. Therefore, beyond a web browser, users do not need to download or install any software locally. The VNC technology compresses and transfers the pixels of the screen shared between the remote server and the end user as the user interacts with the tool. Because the tool's interface is relatively simple and all the interactions are through button clicks, the amount of data that needs to be streamed between the remote server and the local machine is relatively small, which helps with the user experience under restricted network conditions.
HUBzero provides the RAPPTURE Toolkit to aid rapid tool development. RAPPTURE essentially web-enables desktop applications without web programming, hence allowing scientists (mostly non-expert web developers) to put graphical user interfaces into their scientific applications and make them accessible on the web, accelerating the deployment of new tools. Recent efforts in developing software building blocks for geospatial data management and processing have given rise to new mapping libraries and capabilities for hub tool development and a series of user communities and shared tools.

Software design
As shown in Fig. 1, the CSA tool consists of four major software components: the graphical user interface (GUI), a file browser, a data processing module, and a Globus Online data transfer library. The GUI was mainly developed using the Java programming language. It is integrated with a generic file browser that was implemented in Python and allows users to move data between the tool's online storage and their local desktops. It also allows users to publish results to a hub data management system called iData, with metadata automatically associated. The ISI-MIP archive is accessed through Globus Online [27,28], a service that facilitates transfer of large datasets. In order to enable large data transfers between the CSA tool and the remote ISI-MIP data archive, a Globus end point was created on the hub server and configured to authenticate with the ISI-MIP Globus end point. Furthermore, a service wrapper of Globus client commands called isimiptransfer was created in Python; it is executed using the Hub's submit library to query and fetch data from the remote data repository via the Globus Online protocol. Finally, a set of R scripts runs at the backend to support the data processing functions selected by the user in the interface.
In order to retrieve the data, the user selects a unique combination of variables, climate models and scenarios, which are all presented in the tool's user front-end (Fig. 3). The user's selections create a character string that matches the file names stored in the ISI-MIP archive. This character string is used to retrieve all the available years for the selected scenario (in most cases, each file stores 10 years' worth of data). Once in GEOSHARE's Hub, the files are stored in a common server workspace. Before each data request, the tool checks whether the data have already been downloaded, and if so, indicates this to the user. This feature avoids downloading the same data more than once. At this point, users can either download the raw NetCDF files for custom processing on their desktop or proceed to aggregate the data through the GUI implementation in Fig. 4.
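The retrieval logic described above (build a string from the user's selections, match it against archive file names, and skip files already in the shared workspace) can be sketched in a few lines of Python. This is an illustration only: the file naming pattern, the model label `modelA`, and the function names are hypothetical and are not taken from the ISI-MIP archive or from the CSA source code.

```python
# Hypothetical naming pattern: <variable>_<model>_<scenario>_<years>.nc
# The real ISI-MIP file names may follow a different convention.
def file_prefix(variable, model, scenario):
    """Build the string that identifies one variable/model/scenario choice."""
    return f"{variable}_{model}_{scenario}_"


def select_files(archive_listing, variable, model, scenario):
    """Return every archive file (one per block of years) matching the choice."""
    prefix = file_prefix(variable, model, scenario)
    return sorted(name for name in archive_listing if name.startswith(prefix))


def plan_transfer(wanted, workspace_listing):
    """Split the wanted files into those to fetch and those already cached."""
    cached = set(workspace_listing)
    to_fetch = [name for name in wanted if name not in cached]
    already_there = [name for name in wanted if name in cached]
    return to_fetch, already_there


# Made-up listings for illustration.
archive = [
    "tas_modelA_rcp2p6_2011-2020.nc",
    "tas_modelA_rcp2p6_2021-2030.nc",
    "tas_modelA_rcp8p5_2011-2020.nc",
    "pr_modelA_rcp2p6_2011-2020.nc",
]
workspace = ["tas_modelA_rcp2p6_2011-2020.nc"]

wanted = select_files(archive, "tas", "modelA", "rcp2p6")
to_fetch, already_there = plan_transfer(wanted, workspace)
print("fetch:", to_fetch)
print("cached:", already_there)
```

Checking the workspace before requesting a transfer is what keeps repeated requests for the same scenario from moving the same large files twice.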
Aggregation is performed by three R functions. The first function reads the data using the R NetCDF package [30]. The second function estimates pixel and crop specific growing-season averages of the chosen climate variable. Planting and harvesting months for each pixel are from [25]. In many cases, the harvesting month is in a different year from the planting month. For example, corn planting in most of Argentina occurs in October and the crop is harvested in April of the following year. Meanwhile, corn planting in the U.S. starts in May, with the crops harvested in September. In order to avoid ambiguities, we assign the average value of the variables (e.g., temperature) over the growing season for the month in which the harvest occurs. So, the value of the average growing season temperature for the year 2000 corresponds to the Argentinean harvest of April 2000 and the US harvest of September 2000 (see Fig. 5).
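The harvest-year convention can be illustrated with a short Python sketch. The tool's own functions are written in R; the function names below are hypothetical, the monthly values are synthetic, and treating both the planting and the harvesting month as part of the growing season is an assumption made for this example.

```python
def growing_season_months(plant_month, harvest_month, harvest_year):
    """List the (year, month) pairs from planting through harvest, inclusive.

    If planting falls later in the calendar than harvest, the season is
    assumed to start in the year before the harvest year.
    """
    if plant_month <= harvest_month:
        return [(harvest_year, m) for m in range(plant_month, harvest_month + 1)]
    previous = [(harvest_year - 1, m) for m in range(plant_month, 13)]
    current = [(harvest_year, m) for m in range(1, harvest_month + 1)]
    return previous + current


def growing_season_mean(monthly, plant_month, harvest_month, harvest_year):
    """Average a monthly series over the season, labeled by harvest year."""
    months = growing_season_months(plant_month, harvest_month, harvest_year)
    values = [monthly[key] for key in months]
    return sum(values) / len(values)


# Synthetic monthly series: the value simply equals the month number.
monthly = {(y, m): float(m) for y in (1999, 2000) for m in range(1, 13)}

# Argentina-like calendar: planted in October, harvested in April.
argentina_2000 = growing_season_mean(monthly, 10, 4, 2000)
# U.S.-like calendar: planted in May, harvested in September.
us_2000 = growing_season_mean(monthly, 5, 9, 2000)
print(argentina_2000, us_2000)
```

Under this convention, the value labeled 2000 for the Argentina-like calendar draws on October through December of 1999 and January through April of 2000, whereas the value for the U.S.-like calendar uses only months in 2000.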
Fig. 5. The average temperature/precipitation over the growing season is assigned to the calendar year in which the harvest season occurs. In the example, for Argentina, the average temperature/precipitation in calendar year 1 is taken over October of year 0 through April of year 1, while in the U.S. (Midwest region), the average is taken over May through September of year 1. The planting and harvesting dates for each country are from [25].
A third R function performs the aggregation from grid cells to larger geographic units. Users have the opportunity to select different aggregation schemes or upload their own. For example, aggregation from the grid cells to country level requires a mapping that associates each latitude and longitude pair with a unique country name. The mapping schemes are simple comma-separated value files. By default, we have included regional mappings from grid cells to countries, country-AEZ regions, and global. Simple guidelines for preparing these data files are in the User's Manual, which can be retrieved from either the description page or the tool. In addition, the tool allows for weighted and unweighted aggregations. Files are provided for weighted aggregations using harvested areas and production, based on the gridded crop harvested area and yield statistics from Monfreda et al. [26].
The CSA tool also keeps a record of the user's choices, producing a text file that indicates the chosen combination of GCM, RCP, and variables, which can be obtained by clicking on "Data description" in the Download tab (Fig. 3). For users performing an aggregation in the Aggregation tab, the documentation includes aggregation choices as well as the source of the aggregation weights (see Fig. 4).
Fig. 6 displays four plots that illustrate the versatility of the tool in terms of spatial and temporal aggregation of the GCM outputs. Fig. 6A compares growing-season temperatures for wheat in a single grid cell near Manhattan, Kansas, in the U.S. Fig. 6B displays historical and projected average temperatures during the growing season of maize for the U.S. using projections for RCP 2.6 for the five GCMs included in the ISI-MIP archive. In this case, the individual grid cells have been weighted by their contribution to total US maize production using production weights. A noteworthy feature of Fig. 6B is that it conveys the uncertainty embedded in the models, so that this uncertainty can eventually be included in modeling exercises or impact analyses. Figs. 6C and 6D display temperature and precipitation aggregated from individual grid cells to the global level using three different aggregation modalities: weighted averages using harvested area weights, weighted averages using production weights, and unweighted averages. These two figures exemplify the usefulness of the tool for evaluating different empirical choices of aggregation at different spatial scales.
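As an illustration of the grid-cell-to-region aggregation described in this section, the following Python sketch reads a comma-separated mapping and computes weighted and unweighted regional averages. The tool itself performs this step in R. The column names (`lat`, `lon`, `region`), the coordinates, and all values and weights here are hypothetical; the actual file layout is specified in the User's Manual.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping file: one row per grid cell.
MAPPING_CSV = """lat,lon,region
-34.25,-60.25,Argentina
-33.75,-60.25,Argentina
40.25,-96.75,USA
"""


def read_mapping(text):
    """Parse the CSV text into {(lat, lon): region}."""
    reader = csv.DictReader(io.StringIO(text))
    return {(float(r["lat"]), float(r["lon"])): r["region"] for r in reader}


def aggregate(values, mapping, weights=None):
    """Average cell values by region; unweighted when weights is None."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for cell, value in values.items():
        region = mapping.get(cell)
        if region is None:
            continue  # cell not covered by this mapping scheme
        w = 1.0 if weights is None else weights.get(cell, 0.0)
        numerator[region] += w * value
        denominator[region] += w
    return {r: numerator[r] / denominator[r] for r in numerator if denominator[r] > 0}


# Synthetic growing-season temperatures and harvested-area-style weights.
values = {(-34.25, -60.25): 20.0, (-33.75, -60.25): 24.0, (40.25, -96.75): 18.0}
weights = {(-34.25, -60.25): 1.0, (-33.75, -60.25): 3.0, (40.25, -96.75): 2.0}

mapping = read_mapping(MAPPING_CSV)
unweighted = aggregate(values, mapping)
weighted = aggregate(values, mapping, weights)
print(unweighted)
print(weighted)
```

In this synthetic case the weighted Argentina average is pulled toward the cell with the larger weight, which is the kind of difference between aggregation modalities that Figs. 6C and 6D are meant to expose.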

Impact
Our software makes three contributions. First, it provides straightforward access to a widely used set of bias-corrected climate model projections from the CMIP5 archive. Second, it provides important GIS functionality for data aggregation. Finally, all the downloading and processing are performed on remote servers. It is likely that these contributions have varying degrees of appeal for different users; nevertheless, by expanding access and lowering entry barriers to use, we expect that this tool will advance the study of the impacts of climate change on world agriculture across several geographic scales. The potential research questions that benefit from streamlined access to climate data include statistical analysis of future climate patterns, modeling the human and ecological impacts of climate change, and the evaluation of adaptation and mitigation policies. The tool also facilitates streamlined descriptions of climate patterns at different spatial scales as well as exploring the effects of different aggregation mechanisms.
Fig. 6 (caption, panels B to D). B: Area-weighted average temperature during the maize growing season in the U.S., RCP 2.6 for the five available climate models; C: Global weighted (using harvested area and production weights) and unweighted average temperature over the maize growing season; D: Global weighted (using harvested area and production weights) and unweighted average precipitation over the maize growing season.

Limitations
An important consideration to keep in mind is that these models are a subset of the approximately 36 models that contributed to the CMIP5 data archive. These five models were selected because they were the first to supply data that met the minimum data requirements of the ISI-MIP project [5, p. 221]. It is also important to keep in mind that for many regions, these models are likely to underestimate the uncertainty in future climate projections [31]. In particular, McSweeney and Jones [31] find that "the fraction of the full range of future projections captured across different regions and seasons by the ISI-MIP subset varies from 0.5 to 0.9 for temperature (median 0.75) and 0.3 to 0.8 for precipitation (median 0.55)." This is a general problem in climate scenario selection. Even if dry, wet, cool, or hot climate projections can be specifically selected for particular regions, including the global aggregation, these characteristics do not necessarily hold for other regions. As such, a climate projection that is specifically dry and hot compared to other projections in one region may be cool and wet in other regions. The same authors find that at least 13 climate model projections are needed to cover a substantial range of the uncertainty in all regions. This tool cannot be easily extended to all climate projections from the CMIP5 archive, as these are not available in the bias-corrected form produced by [5], but we encourage users to note the limited representation of scenario selection in the interpretation of their applications.
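One plausible reading of the "fraction of the full range" statistic quoted above is the ratio of the spread across the model subset to the spread across the full ensemble. The Python sketch below illustrates that reading for a single region and season. It is not the exact procedure of McSweeney and Jones [31], the function name is hypothetical, and the projected changes are invented for illustration.

```python
def fraction_of_range(full_ensemble, subset):
    """Share of the full ensemble spread that is covered by the subset.

    Assumes the subset values also appear in the full ensemble, so the
    result lies between 0 and 1.
    """
    full_span = max(full_ensemble) - min(full_ensemble)
    if full_span == 0:
        raise ValueError("full ensemble has no spread")
    return (max(subset) - min(subset)) / full_span


# Invented projected temperature changes (degrees C) for one region and season.
full_ensemble = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
five_model_subset = [1.5, 2.0, 2.5, 3.5, 4.0]

fraction = fraction_of_range(full_ensemble, five_model_subset)
print(fraction)
```

A value well below one for a given region signals that analyses based only on the subset may understate the uncertainty in projections for that region.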

Conclusions
Access to climate and spatial datasets by non-specialists is hindered by technical difficulties involving software and data formats as well as the need for strong Internet bandwidth and storage capacity. This article discusses a GEOSHARE HUBzero tool that expands access to the climate data that underlie the AgMIP Global Gridded Crop Model Intercomparison (GGCMI) Project to members of the broader scientific community who can benefit from these data but who may lack the resources to gain access to them. We hope that this software tool enables researchers who are facing technical limitations to overcome these barriers.