FluxEngine: A Flexible Processing System for Calculating Atmosphere–Ocean Carbon Dioxide Gas Fluxes and Climatologies

The air–sea flux of greenhouse gases [e.g., carbon dioxide (CO2)] is a critical part of the climate system and a major factor in the biogeochemical development of the oceans. More accurate and higher-resolution calculations of these gas fluxes are required if researchers are to fully understand and predict future climate. Satellite Earth observation is able to provide large spatial-scale datasets that can be used to study gas fluxes. However, the large storage requirements needed to host such data can restrict their use by the scientific community. Fortunately, the development of cloud computing can provide a solution. This paper describes an open-source air–sea CO2 flux processing toolbox called the "FluxEngine," designed for use on a cloud-computing infrastructure. The toolbox allows users to easily generate global and regional air–sea CO2 flux data from model, in situ, and Earth observation data, and its air–sea gas flux calculation is user configurable. Its current installation on the Nephelae Cloud allows users to easily exploit more than 8 TB of climate-quality Earth observation data for the derivation of gas fluxes. The resultant netCDF output files contain more than 20 data layers holding the various stages of the flux calculation, along with process indicator layers to aid interpretation of the data. This paper describes the toolbox design, verifies the air–sea CO2 flux calculations, demonstrates the use of the tools for studying global and shelf sea air–sea fluxes, and describes future developments.


Introduction
The climate of Earth is sensitive to the radiative impact of a number of gases and different types of particles in the atmosphere. The atmospheric concentrations of many important gases and particles are sensitive to the air-sea transfer of volatile compounds. These gases can also play a substantial role in the biogeochemistry of the oceans. It is therefore important to quantify contemporary air-sea gas fluxes and to provide the understanding necessary to project possible future changes in these fluxes. Air-sea gas fluxes can in some cases be inferred indirectly, but most flux estimates depend on a calculation using a standard bulk air-sea gas transfer formulation (e.g., as defined in Takahashi et al. 2009, hereafter T09). For each gas, this calculation depends upon measurements of the gas concentration in the surface ocean and the lower atmosphere and upon "transfer coefficients" that describe the "rate constants" for transfer across the sea surface. The simplest calculation requires only a single transfer coefficient, the gas transfer velocity.
Greenhouse gases are those that can absorb and emit infrared radiation. Of these gases, CO2 is one of the most studied and systematically observed. Increasing levels of atmospheric CO2, caused by the burning of fossil fuels and biomass, are of growing concern due to their impact on the global climate system. Understanding the pathways, sources, sinks, and impact of CO2 on Earth's climate system is essential for monitoring climate and predicting future scenarios. The global ocean is thought to absorb ~25% of anthropogenic CO2 emissions annually (Le Quéré et al. 2014, 2015), and it has constituted the only true net sink for anthropogenic CO2 over the last 200 years (Sabine et al. 2004). The North Atlantic sink in particular has been shown to be highly variable (Watson et al. 2009), and the mechanisms driving this variability are not well understood. Therefore, isolating and reducing the uncertainties in estimates of the oceanic sink of CO2 is a crucial goal of climate science (Le Quéré et al. 2009).
In the last decade there has been an explosion in the availability of large (>1 TB), high-quality, well-characterized, and often multisensor cross-calibrated Earth observation (EO) datasets. For example, the European Space Agency (ESA) GlobWave project (http://globwave.ifremer.fr/) produced a 20-plus-year time series of global-coverage multisensor cross-calibrated wave and wind data. Similar efforts in the United States resulted in the National Aeronautics and Space Administration (NASA) project Making Earth System Data Records for Use in Research Environments (MEaSUREs), which produced a 13-plus-year time series of global-coverage multisensor surface biology datasets (Maritorena et al. 2010). These successes, and the classification by the Group on Earth Observations (GEO) of a number of parameters discernible from space as essential climate variables, prompted ESA to start its Climate Change Initiative (CCI) projects. There are currently 14 different CCI projects. For those interested in air-sea gas fluxes, arguably the most relevant of these is the Sea Surface Temperature CCI project, which has provided a 20-plus-year time series of global-coverage multisensor sea surface skin and subskin temperature data (Merchant et al. 2012). Unfortunately, the resources required to download and exploit these large global and decadal datasets can limit their use by the scientific community. The development of cloud technologies and storage provides a solution. Cloud computing can be defined as interconnected computing resources that can be easily scaled up (grown) or down (shrunk) while maintaining their capability or function, rather like a "cloud" in the atmosphere. These systems have a number of key features, such as a high level of redundancy (e.g., servers can be removed or upgraded without users noticing) and scalability (e.g., they use standard hardware and software, allowing low-cost expansion of a cloud).
Through cloud computing, users can remotely access and process large volumes of data and then simply download the results to their local desktop or laptop computers.

a. The OceanFlux Greenhouse Gases project
The OceanFlux Greenhouse Gases project was funded by the European Space Agency in 2011 to encourage the use of satellite Earth observation data for studying air-sea gas fluxes. To achieve this, the objectives of the project included the development and validation of novel gas flux Earth observation algorithms (Goddijn-Murphy et al. 2013) and scientific analyses (Land et al. 2013). Another objective was to provide datasets and processing tools that can be used by the scientific community. These gas flux data processing tools, collectively named the "FluxEngine," are described in this paper. The FluxEngine allows users to configure the flux parameterization and select their input data, and it then generates the resulting monthly global flux datasets. A wide range of climate-quality EO, in situ, and model data are available as input to the toolbox and to aid interpretation of the resultant flux data (see Table 1). The outputs from the system are standard netCDF datasets that can be easily read into a number of third-party scientific software packages. The primary gas of interest for the OceanFlux Greenhouse Gases project was CO2. The FluxEngine has therefore been developed to aid the study of the air-sea flux of CO2, although, as described later in the paper, the toolbox can also be used to support the study of other gases, such as N2O and dimethyl sulfide (DMS).

b. Air-sea flux calculations
The flux of CO2 between the atmosphere and the ocean (air-sea) is controlled by wind speed; sea state; sea surface temperature; surface processes, including any biological activity; and the difference in CO2 fugacity between the ocean and the atmosphere. The air-to-sea flux of CO2 (F, g m⁻² s⁻¹) is calculated using the gas transfer velocity k (m s⁻¹) and the difference in CO2 concentration (g m⁻³) between the base ([CO2]AQW) and the top ([CO2]AQ0) of a thin (~10–250 μm) mass boundary layer at the sea surface:

$$F = k\left([\mathrm{CO_2}]_{\mathrm{AQW}} - [\mathrm{CO_2}]_{\mathrm{AQ0}}\right). \qquad (1)$$

The concentration of CO2 in seawater is the product of its solubility α (g m⁻³ μatm⁻¹) and its fugacity fCO2 (μatm). As gas solubility is a function of salinity and temperature, it varies across the aqueous boundary layer. Hence, Eq. (1) becomes

$$F = k\left(\alpha_W\, f\mathrm{CO}_{2W} - \alpha_S\, f\mathrm{CO}_{2A}\right), \qquad (2)$$

where the subscripts denote values in water (W), at the sea-air interface (S), and in air (A). The CO2 concentration (and thus fugacity) is normally measured a few meters below the sea surface rather than at the surface. Variations in temperature at the sea surface (such as diurnal warming) will affect the fugacity via the carbonate system. For simplicity, partial pressure can be substituted for fugacity because their values differ by <0.5% over the temperature range of interest (McGillis and Wanninkhof 2006). Equation (2) can therefore alternatively be written as

$$F = k\left(\alpha_W\, p\mathrm{CO}_{2W} - \alpha_S\, p\mathrm{CO}_{2A}\right). \qquad (3)$$

It is also common for Eqs. (2) and (3) to be collapsed into formulations that ignore the difference between the two solubilities and use the waterside solubility αW for both halves of the equation, resulting in

$$F = k\,\alpha_W\left(f\mathrm{CO}_{2W} - f\mathrm{CO}_{2A}\right) \qquad (4)$$

and

$$F = k\,\alpha_W\left(p\mathrm{CO}_{2W} - p\mathrm{CO}_{2A}\right); \qquad (5)$$

this formulation is often referred to as the "bulk" parameterization. The airside partial pressure of CO2, pCO2A (μatm), can be calculated from the concentration of CO2 in dry air and the air pressure using

$$p\mathrm{CO}_{2A} = X_{[\mathrm{CO_2}]}\left(P - p\mathrm{H_2O}\right), \qquad (6)$$

where X[CO2] is the molar fraction of CO2 in dry air (expressed as the zonal mean in T09 and the FluxEngine), P is the air pressure (mb, expressed as a daily mean in the FluxEngine), and pH2O is the saturation vapor pressure (mb), which is defined in terms of sea surface temperature and salinity (Weiss and Price 1980) by

$$p\mathrm{H_2O} = 1013.25\,\exp\left[24.4543 - 67.4509\left(\frac{100}{SST_K}\right) - 4.8489\,\ln\left(\frac{SST_K}{100}\right) - 0.000544\,S\right], \qquad (7)$$

where SSTK is the sea surface temperature dataset of interest (K) and S is the salinity. For pCO2W, the FluxEngine relies upon in situ pCO2W measurements (e.g., data from a buoy or ship) or an in situ-derived climatology of pCO2W data (e.g., T09).
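As a worked illustration of the bulk calculation, the sketch below computes three quantities for a single grid cell: the Weiss and Price (1980) saturation vapor pressure, the dry-air correction pCO2A = X[CO2](P − pH2O), and the bulk flux F = kαW(pCO2W − pCO2A). It is a minimal Python sketch under stated assumptions, not the FluxEngine source. The function names are hypothetical, and X[CO2] is taken in ppm (μmol mol⁻¹). The mb pressures are converted to atm (dividing by 1013.25) so that pCO2A comes out in μatm. The solubility αW is passed in rather than computed. With this sign convention, positive F indicates outgassing (a source of CO2), matching the convention used for the cruise fluxes later in the paper.

```python
import math

MB_PER_ATM = 1013.25  # 1 atm expressed in millibars


def vapour_pressure_mb(sst_k: float, salinity: float) -> float:
    """Saturation water vapour pressure over seawater (mb), after Weiss and Price (1980)."""
    ln_p_atm = (24.4543
                - 67.4509 * (100.0 / sst_k)
                - 4.8489 * math.log(sst_k / 100.0)
                - 0.000544 * salinity)
    return MB_PER_ATM * math.exp(ln_p_atm)


def pco2_air_uatm(xco2_ppm: float, pressure_mb: float, sst_k: float, salinity: float) -> float:
    """pCO2A = X[CO2] (P - pH2O); pressures converted from mb to atm (assumption)."""
    dry_pressure_atm = (pressure_mb - vapour_pressure_mb(sst_k, salinity)) / MB_PER_ATM
    return xco2_ppm * dry_pressure_atm  # ppm x atm -> uatm


def bulk_flux(k: float, alpha_w: float, pco2_w_uatm: float, pco2_a_uatm: float) -> float:
    """Bulk flux F = k alpha_W (pCO2W - pCO2A); positive values mean ocean outgassing."""
    return k * alpha_w * (pco2_w_uatm - pco2_a_uatm)
```

For example, pco2_air_uatm(370.0, 1013.25, 293.15, 35.0) gives roughly 362 μatm. At 20 °C the water vapor correction removes about 2% of the total pressure.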

The development of the FluxEngine
The following sections describe the design of the FluxEngine, the methods used for calculating the air-sea fluxes, the input datasets that are available, and the methods used to verify the implementation.
a. The FluxEngine design, input data, and implementation
Figure 1 shows a diagram of the main parts of the FluxEngine, conceptually showing the input data and the contents of the output files. The input data are on the left of the diagram, the calculation is in the middle, and an overview of the contents of each output file is shown on the right. The air-sea fluxes are calculated using monthly composite data, and the generation of these input data is described below. The output file is a netCDF4 file compliant with the Climate and Forecast (CF) 1.6 conventions and contains >20 data layers. The data layers within each output file include the different components of the gas flux calculation, statistics of the input datasets (e.g., variance of the wind speed), and process indicator layers to aid interpretation of the fluxes. The process indicator layers include fixed masks (e.g., land, ocean basins, open ocean, and coastal classification), climatological data (e.g., persistent SST fronts), and other modeling or Earth observation datasets useful for interpreting the fluxes (e.g., chlorophyll-a concentrations and model-generated estimates of wave whitecapping). All output from the toolbox consists of monthly, global-coverage data at 1° × 1° spatial resolution (360 × 180 arrays). Therefore, generating gas fluxes for a complete year (12 months) requires 12 sets of data inputs (one set per month), and the FluxEngine then produces 12 output files. The air-sea flux calculation utility contains internal integrity checks for all of the output data layers. These checks flag any data layer that contains values outside a predefined expected range. If a value falls outside its layer's specified valid range, then a count is added at the corresponding grid position in one of the process indicator data layers.
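The range-based integrity check described above can be sketched with NumPy as follows. This is an illustrative sketch, not the FluxEngine implementation. The function name and the valid range used in the example are hypothetical. Missing values (NaN) are not counted, because comparisons with NaN evaluate to False.

```python
import numpy as np


def count_out_of_range(layer: np.ndarray, valid_min: float, valid_max: float,
                       indicator: np.ndarray) -> np.ndarray:
    """Add 1 to `indicator` at every cell where `layer` lies outside [valid_min, valid_max].

    NaN (missing) cells compare False on both sides, so they are not counted.
    """
    out_of_range = (layer < valid_min) | (layer > valid_max)
    indicator += out_of_range.astype(indicator.dtype)
    return indicator


# One global 1 deg x 1 deg layer: 180 latitude rows x 360 longitude columns.
indicator = np.zeros((180, 360), dtype=np.int32)
sst_k = np.full((180, 360), 290.0)
sst_k[0, 0] = 150.0      # implausibly cold value, should be flagged
sst_k[10, 20] = np.nan   # missing value, should not be flagged
count_out_of_range(sst_k, valid_min=270.0, valid_max=310.0, indicator=indicator)
```

Because the counts accumulate, running several checks (or several layers) against the same indicator array records how many tests each grid cell failed.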
The inputs to the flux calculation (box 2 in Fig. 1) are monthly composite data. Table 1 shows all of the monthly datasets that are currently available for the FluxEngine to use as input. Where required, these monthly netCDF composite data were generated using the original daily data. For those datasets that were generated, each monthly composite file contains the mean (first-order moment), median, standard deviation, and the second-, third-, and fourth-order moments as calculated using one calendar month of data.
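The monthly composite statistics listed above can be computed from a stack of daily grids as in the following NumPy sketch. It assumes that the daily fields are stacked as (day, lat, lon), with NaN marking missing data. It also takes the second-, third-, and fourth-order moments to be central moments about the monthly mean. The paper does not state which definition is used, so treat that choice as an assumption. The function name is hypothetical.

```python
import numpy as np


def monthly_composite(daily: np.ndarray) -> dict:
    """Summarise a (n_days, n_lat, n_lon) stack of daily data into monthly statistics.

    NaN marks missing data and is ignored; cells with no valid days become NaN
    (NumPy emits a RuntimeWarning for them).
    """
    mean = np.nanmean(daily, axis=0)
    anomaly = daily - mean  # broadcast the monthly mean over the day axis
    return {
        "mean": mean,                                    # first-order moment
        "median": np.nanmedian(daily, axis=0),
        "stddev": np.nanstd(daily, axis=0),
        "moment2": np.nanmean(anomaly ** 2, axis=0),     # central moments (assumption)
        "moment3": np.nanmean(anomaly ** 3, axis=0),
        "moment4": np.nanmean(anomaly ** 4, axis=0),
        "count": np.sum(~np.isnan(daily), axis=0),       # number of valid days per cell
    }
```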

b. The computing platform
The FluxEngine is currently installed on the Centre ERS d'Archivage et de Traitement (CERSAT) Nephelae Cloud. This Linux-based cloud-computing platform provides petascale storage capacity and distributed processing over more than 600 computing nodes. It was specifically designed by CERSAT for massive archive processing, including data mining and multidecadal Earth observation data reprocessing. As with cloud computing generally, it offers the following facilities:
- simple backup and restoration;
- high-speed data processing, because the processing nodes are close to the data, which avoids input/output bottlenecks;
- tailoring to a specific job, because it uses virtual servers and so does not rely on specific physical hardware;
- ease of use, with no specialist skills required;
- efficient use of resources through dynamic reallocation.

Data processing runs submitted by the user are scheduled and run on the cloud-computing nodes using a system developed by CERSAT called "GoGo list." This is simply a wrapper that enables processing jobs to be executed and monitored on the cloud, and its use is completely invisible to the user. Note, however, that the main reason for using the Nephelae Cloud here is to provide potential users with access to a large and continually growing satellite Earth observation dataset. The FluxEngine can also be installed and used on a desktop or laptop computer with no loss of performance.
FIG. 1. Conceptual overview of the FluxEngine air-sea gas flux processing approach. Description of (left) the input data and (right) the netCDF output content. The user-configurable air-sea CO2 flux calculation is represented by (middle) the main component.

c. Configurable options
The main flux calculation within the FluxEngine is user configurable through a plain-text ASCII configuration file. Within the configuration file, a user can choose the input datasets, the flux calculation model [choose between Eqs. (4) and (9)], and the gas transfer velocity parameterization. A range of different gas transfer velocity parameterizations is available (e.g., McGillis et al. 2001; Nightingale et al. 2000; Wanninkhof 1992), including those based on sea state and surface roughness (Fangohr and Woolf 2007; Goddijn-Murphy et al. 2012). Through a generic formulation (see the appendix), users can also supply their own wind speed-based parameterization. Through the configuration file, the user can choose to inject random noise (normally distributed with a specified mean and standard deviation) into any of the main input datasets (SST, U10, Hs, pCO2, or fCO2). In the same way, the user can choose to add a bias offset to any of the main input datasets. This functionality allows known input data uncertainties (i.e., root-mean-square error and bias) to be propagated through to the final flux datasets, so that their impact on the gas fluxes can be quantified. Example configuration files are included with the open-source software.
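The noise- and bias-injection option amounts to a Monte Carlo propagation of input uncertainties. The sketch below illustrates that idea. It is not the FluxEngine code. The function names, the (mean, standard deviation, bias) tuple layout, and the toy flux function in the test are all hypothetical.

```python
import numpy as np


def perturb(field, noise_mean=0.0, noise_std=0.0, bias=0.0, rng=None):
    """Return field + N(noise_mean, noise_std) noise + a constant bias offset."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(noise_mean, noise_std, size=np.shape(field))
    return np.asarray(field, dtype=float) + noise + bias


def monte_carlo(flux_fn, inputs, uncertainties, n_runs=500, seed=0):
    """Run flux_fn repeatedly on perturbed inputs; return the mean and spread of the outputs.

    `uncertainties` maps an input name to (noise_mean, noise_std, bias); inputs
    without an entry get zero noise and zero bias, so they pass through unchanged.
    """
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_runs):
        perturbed = {
            name: perturb(value, *uncertainties.get(name, (0.0, 0.0, 0.0)), rng=rng)
            for name, value in inputs.items()
        }
        outputs.append(flux_fn(**perturbed))
    outputs = np.asarray(outputs)
    return outputs.mean(axis=0), outputs.std(axis=0)
```

Consider a toy flux F = 2 × SST with an SST noise standard deviation of 0.5. The returned spread is close to 1.0, as linear error propagation predicts. A bias entry shifts the mean instead of widening the spread.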

d. Additional software tools
Three additional tools exist within the FluxEngine toolbox. These enable the calculation of integrated net fluxes, the regridding of output data, and the use of cruise or buoy in situ data as inputs. The integrated net flux tool can provide gross and net fluxes, mean values for each dataset within the output data, estimates of the open ocean fluxes, and the flux contribution from any missing data (such as that from coastal, shelf, and enclosed seas). The regridding tool converts the 1° × 1° data to the 5° × 4° grid used by the T09 climatological dataset, allowing users to easily compare the FluxEngine output with the T09 climatology. The in situ to netCDF conversion tool converts sparse in situ data into a spatially and temporally binned or gridded (1° × 1° grid) format. For example, it allows in situ pCO2 data to be used as input to the FluxEngine. Further details on all of the toolbox software components can be found in the appendix of this paper.
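A regridding step of this kind can be sketched as a block average over 4 latitude × 5 longitude cells of the 1° × 1° grid. This sketch assumes that the 4° × 5° boxes align with the edges of the 1° grid. It uses a simple, unweighted mean that ignores NaN values. The actual T09 grid registration and any area weighting applied by the FluxEngine tool are not described here, so treat both as assumptions. The function name is hypothetical.

```python
import numpy as np


def regrid_1deg_to_t09(field: np.ndarray, lat_block: int = 4, lon_block: int = 5) -> np.ndarray:
    """Block-average a (180, 360) 1 deg grid to a (45, 72) 4 deg lat x 5 deg lon grid.

    NaN cells (e.g., land) are ignored; boxes with no valid cells become NaN
    (NumPy emits a RuntimeWarning for them).
    """
    n_lat, n_lon = field.shape
    if n_lat % lat_block or n_lon % lon_block:
        raise ValueError("grid size must be divisible by the block size")
    # Element [i, a, j, b] of the reshaped array is field[i*lat_block + a, j*lon_block + b].
    blocks = field.reshape(n_lat // lat_block, lat_block, n_lon // lon_block, lon_block)
    return np.nanmean(blocks, axis=(1, 3))
```

An unweighted mean is a reasonable approximation within a 4° latitude band. Near the poles, a cos(latitude) weighting would be slightly more accurate.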

Data quality and verification of the calculations
The widely used T09 climatology dataset provides an ideal benchmark for verifying the operation and output of the FluxEngine, as it contains both the air-sea flux estimates and the input values used to calculate these fluxes. Therefore, the T09 air-sea flux data were first used to verify the FluxEngine integrated net flux tool, and then the main T09 input fields were used to verify the combination of the FluxEngine flux calculation and the integrated net flux tool.
To verify the integrated net flux tool, the year 2000 T09 climatology data were linearly interpolated to a 1° × 1° grid and then provided as the input to the tool. The integrated net flux tool used the T09 ice normalization (see the appendix). The resultant net air-sea CO2 flux for the open ocean region was −1.39 GtC yr⁻¹, with an additional −0.17 GtC yr⁻¹ attributed to missing data, large lakes, the Mediterranean Sea, and coastal and shelf seas, giving a total of −1.56 GtC yr⁻¹. The difference between this global open ocean result (as calculated using the linearly interpolated data and the net flux tool) and that stated in the original publication is <1%.
The T09 climatology data of SST, XCO2, U10, pCO2W, air pressure, and percentage ice cover (linearly interpolated to a 1° × 1° grid) were then used as input to the FluxEngine. The main flux utility was configured to use Eq. (5) to calculate the air-sea fluxes. Figure 2a shows the resultant daily mean air-sea flux map. The projection and scale bar have been chosen to allow easy comparison with Fig. 13 in T09. The resultant flux outputs were then compared with the corresponding linearly interpolated T09 fluxes. The following is true for all monthly outputs:
- the pCO2W data were identical to six decimal places;
- 2% of the pCO2A data elements differed by more than ±0.01%;
- 2% of the α data elements differed by more than ±3%;
- 2% of the k data elements differed by more than ±2%.

These small differences collectively result in 5%–12% (depending on the month; see Fig. 3) of the air-sea flux F data elements differing by more than ±5%. The differences between the output and the original linearly interpolated T09 dataset are likely to be a combination of (i) minor rounding differences in the flux calculations (e.g., due to differences in numerical precision between the FluxEngine calculations and those of the original T09 publication) and (ii) interpolation issues at boundaries. Figure 3 shows that the majority of the differences between the two air-sea flux datasets correspond to the sea ice boundary at high latitudes. Because we are comparing a linearly interpolated air-sea flux (originally on a 4° × 5° grid) with the output of a 1° × 1° calculation (where all inputs are also 1° × 1°), differences at these boundaries are expected. Using the integrated net flux utility on the output, the annual net integrated flux for 2000 was −1.33 GtC yr⁻¹, with an additional −0.16 GtC yr⁻¹ contribution from missing data, large lakes, the Mediterranean Sea, and coastal and shelf seas, giving a total of −1.49 GtC yr⁻¹.
This final result is within 5% of that derived from the original 1° × 1° linearly interpolated T09 data. Figure 2b shows the annual mean air-sea CO2 flux and the monthly net integrated fluxes for the four main oceanic basins.
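The element-by-element comparison statistics quoted above (the percentage of data elements that differ by more than a given relative tolerance) can be computed as in the sketch below. The function name is hypothetical. The sketch compares only cells that are finite in both datasets and that have a nonzero reference value.

```python
import numpy as np


def percent_exceeding(test: np.ndarray, reference: np.ndarray, tolerance_percent: float) -> float:
    """Percentage of valid elements whose relative difference from `reference` exceeds the tolerance."""
    valid = np.isfinite(test) & np.isfinite(reference) & (reference != 0)
    if not np.any(valid):
        return float("nan")
    relative = 100.0 * np.abs(test[valid] - reference[valid]) / np.abs(reference[valid])
    return 100.0 * np.count_nonzero(relative > tolerance_percent) / relative.size
```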

Scientific application
This section illustrates how the FluxEngine and its resultant data can be used to study global and regional air-sea gas fluxes.

a. Global analyses and the Southern Ocean
A global time series analysis was performed to demonstrate the use of the FluxEngine. The following data were used as the input to the FluxEngine (see Table 1 for dataset specifics):
- the T09 climatology X[CO2] and pCO2W (and associated SST) data, linearly interpolated to 1° × 1°;
- Earth observation SST (SSTskin) and U10 data;
- NCEP Climate Forecast System Reanalysis (CFSR) air pressure;
- SSM/I percentage ice cover.
The FluxEngine may be configured to use the T09 pCO2W data together with an SST dataset that is not from T09. This happens either because a year other than the original T09 reference year of 2000 is being studied or because a different SST dataset has been selected. In that case, the pCO2W data need to be reanalyzed to be consistent with the new SST dataset. Therefore, following previous studies (Fangohr et al. 2008; Kettle and Merchant 2005; Land et al. 2013), the FluxEngine reanalyzes the T09 pCO2W data to correct them to the chosen SST dataset using the relationship provided by T09:

$$p\mathrm{CO}_{2W,SST_C} = p\mathrm{CO}_{2W,T_C}\,\exp\left[0.0423\left(SST_C - T_C\right)\right], \qquad (8)$$

where TC is the original temperature dataset (°C) and the subscript SSTC denotes the chosen sea surface temperature (°C). The FluxEngine was run for the years 1995–2009 using Eq. (5) for the flux calculations and the gas transfer velocity parameterization of Nightingale et al. (2000). To estimate the flux uncertainties arising from the known uncertainties in the U10, pCO2W, and SST input data, the time series processing was repeated with random noise injection enabled. The system assumes a 1.5 μatm yr⁻¹ change in both the atmospheric and oceanic partial pressures of CO2 relative to the year 2000 of the original T09 data. The partial pressure difference therefore has, by construction, no interannual variation or trend, so differences between years are due to other factors. Integrated net fluxes were then calculated using the T09 ice normalization. As an example output, Fig. 4 shows the February and August monthly air-sea CO2 flux maps for year 2000 and
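The sketch below expresses three steps in Python: the T09 temperature correction pCO2W(SSTC) = pCO2W(TC) exp[0.0423(SSTC − TC)], the fixed 1.5 μatm yr⁻¹ adjustment, and the Nightingale et al. (2000) wind speed relation used for these runs. The k600 coefficients (0.222 U10² + 0.333 U10, in cm h⁻¹) are the published Nightingale et al. (2000) values. The CO2 Schmidt number polynomial is that of Wanninkhof (1992). The −1/2 Schmidt number scaling exponent and all function names are assumptions of this sketch rather than details taken from the FluxEngine source.

```python
import math


def correct_pco2w_to_sst(pco2w_uatm: float, t_orig_c: float, sst_c: float) -> float:
    """T09 temperature correction of pCO2W from the original temperature to the chosen SST (deg C)."""
    return pco2w_uatm * math.exp(0.0423 * (sst_c - t_orig_c))


def shift_to_year(pco2_uatm: float, year: int, reference_year: int = 2000,
                  rate_uatm_per_yr: float = 1.5) -> float:
    """Apply the fixed yearly pCO2 change (used for both air and water values)."""
    return pco2_uatm + rate_uatm_per_yr * (year - reference_year)


def schmidt_co2(t_c: float) -> float:
    """Schmidt number of CO2 in seawater (Wanninkhof 1992 polynomial, t in deg C)."""
    return 2073.1 - 125.62 * t_c + 3.6276 * t_c ** 2 - 0.043219 * t_c ** 3


def k_nightingale2000_cm_per_h(u10: float, t_c: float) -> float:
    """Gas transfer velocity (cm/h): k600 = 0.222 U10^2 + 0.333 U10, scaled to Sc of CO2."""
    k600 = 0.222 * u10 ** 2 + 0.333 * u10
    return k600 * (schmidt_co2(t_c) / 600.0) ** -0.5
```

The same 1.5 μatm yr⁻¹ shift is added to both pCO2W and pCO2A, so their difference is unchanged. To convert k from cm h⁻¹ to m s⁻¹, divide by 360 000.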

b. European shelf seas
Despite their relatively small area, accounting for just ~5% of the World Ocean's surface, shelf seas play an important part in the global carbon cycle and in buffering human impacts on marine systems. These regions have a disproportionately large role in primary and new production, remineralization, and the sedimentation of organic matter [Chen et al. (2013) and the references therein]. The high biological activity in these regions can result in considerable drawdown of atmospheric CO2, with the potential for the carbon to be exported to the deep ocean. A recent study (Chen and Borges 2009) has estimated that 29% of the global air-to-sea CO2 flux occurs in shelf seas. The North Sea is considered to be a sink for atmospheric CO2 (e.g., Frigstad et al. 2011; Thomas et al. 2004), but the ability of the entire northwest European shelf to act as a sink of CO2, and the variability of this sink, are areas of active research. Assuming negligible net burial rates of carbon in shelf sediments (de Haas et al. 2002), the net off-shelf carbon export will equal the region's net air-sea CO2 exchange. Therefore, estimating the net air-sea CO2 exchange in the European shelf seas can help quantify the carbon export from this shelf sea.
The global time series data described in section 4a were used to study the air-sea CO2 fluxes in the European shelf seas. Four different bathymetry-based definitions of the northwestern European shelf seas [<1000, <500, <200, and <200 m plus the Norwegian Trench (NT)] were generated using Python and the General Bathymetric Chart of the Oceans (GEBCO_08) grid, and these were used for calculating the net flux. Figure 7 shows the estimated net sink for each year and each region definition. The northwest European shelf sea integrated net flux across all years and definitions falls in the range of −10.1 to −23.7 TgC yr⁻¹ (Table 2). The limits of this range are set by the 200-m (−10.1 TgC yr⁻¹) and 1000-m (−23.7 TgC yr⁻¹) bathymetry masks. The air-sea fluxes generated using the 200- and 1000-m masks differ by 13%–14%, taking the 200-m flux as the reference. Despite the relatively coarse spatial resolution of these data, these estimates are comparable to those of previous in situ-based studies. A recent review and synthesis of the published literature on shelf sea and coastal air-sea fluxes from in situ data (Chen et al. 2013) estimates the northwest European shelf net air-sea CO2 flux to be −13.88 TgC yr⁻¹ (for an unspecified year), which they estimate to be ~4% of the global net flux due to estuaries and shelves. A recent modeling study (Wakelin et al. 2012) estimates the European shelf average long-term net air-sea CO2 flux to be −39.6 TgC yr⁻¹, based on a 16-yr average from a hydrodynamic-ecosystem model. The definition of the northwestern European shelf differs among all of these studies, and the results in Table 2 illustrate that a precise definition is required. One likely reason for the differences between the model estimates and those presented here is the relatively coarse near-surface vertical resolution of the model. 
Each model surface grid cell will represent a volume of water that is typically between 0.1 and 2 m deep, depending on the underlying bathymetry (Shutler et al. 2011). This means that the model is unlikely to be able to resolve any near-surface temperature gradients. In contrast, the FluxEngine-derived fluxes presented here use satellite SSTskin measurements, which are observations of the temperature within a thin layer (500 μm) at the waterside of the air-sea interface (Donlon et al. 2002). The concentration of CO2 is highly temperature dependent, so the lack of near-surface vertical resolution within the model could have a large impact on the calculated CO2 air-sea gas fluxes.
FIG. 6. Time series of global and regional mean air-sea CO2 fluxes (PgC yr⁻¹). Regions are defined by the IHO oceanic basin descriptions as shown in Fig. 5c. The gray-shaded area represents the uncertainty in the global air-sea fluxes due to the known uncertainties in the input data.
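Bathymetry-threshold shelf definitions like those used for the northwest European shelf can be applied to gridded output as in this sketch. It assumes two input grids on the same layout. The first holds per-cell integrated fluxes, already multiplied by cell area and time, in TgC yr⁻¹. The second holds positive water depths. The regional bounding mask and the optional extra mask (standing in for the Norwegian Trench) are hypothetical inputs, not the definitions used to produce Table 2.

```python
from typing import Optional

import numpy as np


def shelf_mask(depth_m: np.ndarray, max_depth_m: float, region: np.ndarray,
               extra: Optional[np.ndarray] = None) -> np.ndarray:
    """Cells inside `region` shallower than `max_depth_m`, optionally united with `extra`.

    depth_m holds positive water depths (m); NaN (land/missing) is never selected
    because comparisons with NaN are False.
    """
    mask = region & (depth_m < max_depth_m)
    if extra is not None:
        mask = mask | (region & extra)
    return mask


def net_flux_tgc(cell_flux_tgc: np.ndarray, mask: np.ndarray) -> float:
    """Sum per-cell integrated fluxes (TgC/yr) inside the mask, ignoring NaN cells."""
    return float(np.nansum(np.where(mask, cell_flux_tgc, np.nan)))


def compare_definitions(cell_flux_tgc, depth_m, region, thresholds=(1000.0, 500.0, 200.0)):
    """Net flux for each depth-threshold definition of the shelf."""
    return {t: net_flux_tgc(cell_flux_tgc, shelf_mask(depth_m, t, region)) for t in thresholds}
```

Applying every definition to the same flux grid in this way isolates the effect of the region definition from any other processing differences.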

c. Underway in situ data
To demonstrate the flexibility of the system, the air-sea fluxes for a research cruise in the central equatorial Atlantic were calculated using the FluxEngine. In situ fCO2 and associated SST data were downloaded from the community Surface Ocean CO2 Atlas (SOCAT) website (v1.5, tropical Atlantic group). They were then preprocessed into the format required by the FluxEngine using the in situ tool (see the appendix). The FluxEngine was then used to calculate the air-sea CO2 fluxes using the in situ fCO2 data and the temporally and spatially corresponding EO data (the EO data sources are the same as those used in sections 4a and 4b). Figure 8a shows the gridded fCO2 data for all of the SOCAT v1.5 tropical Atlantic in situ data, and Fig. 8b shows the gridded in situ fCO2 data from a single cruise (Bakker 2014) in the tropical Atlantic. Figure 8c shows the Earth observation-derived mean gas transfer velocity for October 2000, and Fig. 8d shows the resultant air-sea CO2 fluxes obtained using the data in Fig. 8b as the input fCO2. The fluxes vary along the cruise track between a source (positive) and a sink (negative) of CO2. The gaps in the gas transfer velocity data (which cause gaps in the air-sea flux estimates) are due to missing data in the Earth observation sea surface temperature dataset.
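The in situ conversion step, in which sparse cruise or buoy measurements are binned onto the 1° × 1° grid, can be sketched as below for a single month. The sketch makes four assumptions: latitudes lie in [−90, 90], longitudes lie in [−180, 180), row 0 is the southernmost band, and each cell takes a simple arithmetic mean. It is illustrative only, not the FluxEngine in situ tool, and the function name is hypothetical.

```python
import numpy as np


def bin_to_1deg(lat, lon, values, n_lat=180, n_lon=360):
    """Average scattered observations into a (n_lat, n_lon) 1 deg grid; empty cells are NaN."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    values = np.asarray(values, dtype=float)
    keep = np.isfinite(lat) & np.isfinite(lon) & np.isfinite(values)
    lat, lon, values = lat[keep], lon[keep], values[keep]

    rows = np.clip(np.floor(lat + 90.0).astype(int), 0, n_lat - 1)  # lat = 90 goes in top row
    cols = np.floor(lon + 180.0).astype(int) % n_lon                # lon = 180 wraps to -180

    sums = np.zeros((n_lat, n_lon))
    counts = np.zeros((n_lat, n_lon))
    np.add.at(sums, (rows, cols), values)  # unbuffered, so repeated cells accumulate correctly
    np.add.at(counts, (rows, cols), 1.0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)
```

The sketch uses np.add.at rather than fancy-indexed +=, because the latter would silently keep only one of several observations falling in the same cell.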

d. Benchmarking
The FluxEngine was run on a single cloud node (2.4-GHz Intel Xeon, 3 GB of memory), and the time taken to process a single year of data was measured. All input datasets were quality filtered and converted to a suitable netCDF format in advance of this analysis. A 1-yr climatology of fluxes at 1° × 1° spatial resolution took 40 min (total process time) to complete. Disabling the process indicator layer output reduced this time to 25 min. Running a 5-yr time series (with the indicator layers off) on the single cloud node took ~2 h [i.e., (25 min yr⁻¹ × 5 yr)/(60 min h⁻¹) total process time]. Repeating the same 5-yr time series processing across five nodes using GoGo list took 25 min, because each year was executed on an independent node and so the years ran simultaneously. Calculating the integrated net fluxes from the 1-yr climatology netCDF files took 10 min (total process time). The total end-to-end time for calculating a 1-yr climatology and then its integrated net fluxes on one node (with all indicator layers turned on) was therefore 50 min.
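The benchmark timings follow from the fact that each year is an independent job. The sketch below models this with two pieces: a simple wall-time estimate, and a hypothetical per-year function run through a thread pool. The thread pool stands in for the GoGo list scheduler, whose interface is not described here. The per-year timings are taken from the text. The function names and output file names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def estimated_wall_time_min(n_years: int, minutes_per_year: float, n_nodes: int) -> float:
    """Independent years run in ceil(n_years / n_nodes) sequential waves."""
    return math.ceil(n_years / n_nodes) * minutes_per_year


def process_year(year: int) -> tuple:
    """Hypothetical placeholder for one FluxEngine run: returns 12 monthly output names."""
    return year, [f"flux_{year}{month:02d}.nc" for month in range(1, 13)]


def run_years(years, n_workers: int) -> dict:
    """Dispatch one independent job per year; results are keyed by year."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return dict(pool.map(process_year, years))
```

With the timings above, estimated_wall_time_min(5, 25, 1) gives 125 min (about 2 h) and estimated_wall_time_min(5, 25, 5) gives 25 min. A real deployment would submit each year as a separate process or node job. Threads are used here only to keep the sketch self-contained.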

Future developments
The FluxEngine has been developed to allow the study of CO2, and the current version uses the T09 climatology pCO2W data for the waterside component of the calculation. To increase the versatility of the system, work is ongoing to extend the toolbox to use the community SOCAT datasets (e.g., Pfeil et al. 2013; Bakker et al. 2014). There are many other climatically important gases, including nitrous oxide (N2O) and methane (CH4). Partial support for these gases exists within the toolbox. N2O, CH4, and CO2 are all poorly soluble gases, so their gas transfer velocity parameterizations can be considered interchangeable with that for CO2. The toolbox can therefore be used to generate maps of gas transfer velocity that enable air-sea fluxes of N2O and CH4 to be studied. Similarly, k for DMS can be considered to be the direct (i.e., nonbubble) component of a kCO2 parameterization, and the toolbox already includes two methods for deriving this direct component. The generic nature of the gas flux calculation and parameterization lends itself to being extended to other gases. Each additional gas requires three things:
- a gas transfer velocity parameterization;
- an in situ dataset or climatology of in-water concentrations, partial pressures, or fugacities;
- suitable solubility equations.

After CO2, the next largest gas climatology (in terms of the number of in situ data points) is the Lana et al. (2010) DMS climatology. Future extensions of the FluxEngine are therefore likely to include a DMS capability that exploits published work (e.g., Goddijn-Murphy et al. 2012; Johnson 2010; Lana et al. 2011). This will involve exploiting (or linking in) the open-source code of Johnson (2010), which can be used to calculate gas transfer velocities. 
Fully calculating air-sea fluxes for N2O, CH4, and other gases is not yet possible because the in-water data collections for these gases are still in their infancy. Efforts have begun to collate such datasets, for example, the Marine Methane and Nitrous Oxide (MEMENTO) database (Bange 2006). The toolbox currently uses climatological salinity data from the T09 climatology. Recent advances in satellite Earth observation have seen the launch of two sensors that can measure surface salinity from space: NASA's Aquarius and ESA's Soil Moisture and Ocean Salinity (SMOS) missions. Future work will enable SMOS salinity data to be used within the toolbox. A web interface is also being developed that will enable users to create configuration files, execute processing, and download the resulting output.
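A common way to reuse a CO2 transfer velocity parameterization for another poorly soluble gas is to rescale k with the Schmidt number (Sc) of the target gas, as in the sketch below. The −1/2 exponent is a common assumption rather than something specified in this paper. The Sc values must be supplied by the user for the gas and temperature of interest. The function names are hypothetical.

```python
def scale_k_to_gas(k_co2: float, sc_co2: float, sc_gas: float, exponent: float = -0.5) -> float:
    """Rescale a CO2 gas transfer velocity to another gas: k_gas = k_CO2 (Sc_gas / Sc_CO2)^n."""
    return k_co2 * (sc_gas / sc_co2) ** exponent


def concentration_flux(k: float, c_water: float, solubility: float, p_air: float) -> float:
    """Generic bulk flux F = k (C_water - solubility * p_air); positive means outgassing."""
    return k * (c_water - solubility * p_air)
```

The second function shows the generic form of the flux calculation. It needs exactly the ingredients listed above for each additional gas: a transfer velocity, an in-water concentration, and a solubility.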

Conclusions
A flexible air-sea CO2 data processing toolbox called the FluxEngine has been developed and presented. The flux calculation itself is user configurable, and the outputs have been extensively evaluated and compared with reference datasets. No specialist knowledge is required to use the toolbox, and it is based on standard software tools and packages that require no licenses. It is currently installed and running on the Nephelae Cloud at the Institut Français de Recherche pour l'Exploitation de la Mer (Ifremer), where >8 TB of climate-quality data can be used as input to the flux calculations. The use of cloud-computing approaches means that the data processing is scalable, and this scalability is completely transparent to the user. Here we have used the toolbox to estimate the 15-yr-average net air-sea flux of CO2 for the global oceans (including shelf seas and coastal zones), the four main oceanic basins, and the European shelf seas. We have shown how subtle differences in the definition of the European shelf seas can cause differences of >10% in the calculated annual net fluxes. Similarly, differences in the Southern Ocean definition can affect the calculated air-sea flux. We therefore urge the scientific community to use a common set of oceanic region definitions so that the outputs from different studies can be easily compared and contrasted. The FluxEngine provides a mechanism for this, and its open-source nature allows the scientific community to freely exploit the toolbox. It is hoped that the FluxEngine will help to improve the transparency and traceability of results from air-sea gas flux studies. The FluxEngine was originally developed for the ESA OceanFlux Greenhouse Gases project, and it is currently being used to produce air-sea gas flux climatologies and to study method and data uncertainties. 
Users can access the version of the toolbox installed on the Nephelae Cloud via the OceanFlux Greenhouse Gases project website (http://www.oceanflux-ghg.org); alternatively, the open-source software is available on GitHub (https://github.com/oceanflux-ghg/FluxEngine).
Acknowledgments. This work was funded by the European Space Agency (ESA) Support to Science Element (STSE) through the OceanFlux Greenhouse Gases project (Contract 4000104762/11/I-AM) and the U.K. NERC Carbon and Nutrient Dynamics and Fluxes over Shelf Systems (CANDYFLOSS) project (Contract NE/K002058/1). The Surface Ocean CO2 Atlas (SOCAT) is an international effort, supported by the International Ocean Carbon Coordination Project (IOCCP), the Surface Ocean Lower Atmosphere Study (SOLAS), and the Integrated Marine Biogeochemistry and Ecosystem Research program (IMBER), to deliver a uniformly quality-controlled surface ocean CO2 database. The many researchers and funding agencies responsible for the collection of data and quality control are thanked for their contributions to SOCAT.

APPENDIX
The FluxEngine Tools, Data, and Generic Gas Transfer Parameterization

a. Tools
Table A1 lists the software utilities that are available in the FluxEngine toolbox. Specific details on some of the tools are given in the following sections.

1) OFLUXGHG-FLUX-BUDGETS.PY
The utility calculates integrated net fluxes (F_IN) over a given region from the monthly mean flux and ice cover data. The input netCDF files are assumed to contain the monthly mean net flux F, ice cover, gas transfer velocity, in-water CO2 concentration, and interfacial CO2 concentration. A high-spatial-resolution land mask is required, as is a region definition file if nonglobal regions are to be analyzed. The four main oceanic regions (Atlantic, Pacific, Indian, and Southern Oceans), as defined by the IHO, are provided within the monthly netCDF output (within the process indicator data layers), so these can be used as the region definitions for this utility if desired. A high-spatial-resolution land mask is also provided within the toolbox, and its use is described below.
The method used by the integrated net flux tool is as follows. At each pixel or data element, the integrated net flux is first calculated from F and the pixel's total area, which is computed assuming Earth to be an oblate spheroid. Next, the effect of ice must be accounted for, and two methods of ice normalization are available. The first is from T09, which specifies that if ice cover within a data element is <10%, it has a negligible effect on the integrated net flux, so the ice cover value is set to 0%. Data elements with ice cover >90% are set to 90% to account for leads, polynyas, etc. The net flux for each data element where the ice cover is >0% is then scaled linearly by the ice-free fraction. The second ice normalization method (Loose et al. 2009) posits that the integrated net flux from partially ice-covered regions is greater than would be expected from proportionality with the ice-free fraction. Here, the integrated net flux is proportional to (ice-free fraction)^0.4. This implies that a pixel with 90% ice cover (the maximum assumed by T09) actually contributes 40% of its ice-free integrated net flux, rather than the 10% implied by linear scaling. This results in a quadrupling of the estimated net flux from regions of fast ice over that assumed by the T09 method. Users can choose which method they prefer to use.
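The two ice normalization options can be sketched in Python as follows. This sketch is written from the description above, not from the FluxEngine source. The function name `ice_corrected_flux` is hypothetical, and so is the use of fractions (0 to 1) instead of percentages. The text does not say whether the 90% cap also applies under the Loose et al. (2009) scheme, so this sketch does not apply it there.

```python
def ice_corrected_flux(flux, ice_fraction, method="T09"):
    """Scale one pixel's integrated net flux for sea-ice cover (hypothetical helper).

    ice_fraction is the ice-covered fraction of the pixel, from 0 to 1.
    method="T09":   ice < 10% is treated as 0, ice > 90% is capped at 90%,
                    and the flux is scaled linearly by the ice-free fraction.
    method="Loose": the flux is scaled by (ice-free fraction) ** 0.4
                    (assumption: no 90% cap is applied in this branch).
    """
    if method == "T09":
        if ice_fraction < 0.10:
            ice_fraction = 0.0
        elif ice_fraction > 0.90:
            ice_fraction = 0.90
        return flux * (1.0 - ice_fraction)
    if method == "Loose":
        return flux * (1.0 - ice_fraction) ** 0.4
    raise ValueError(f"unknown ice normalization method: {method!r}")
```

For a pixel with 90% ice cover, the T09 option keeps 10% of the ice-free flux. The Loose et al. option keeps 0.1 ** 0.4, which is about 0.398. That is the roughly fourfold difference noted in the text.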
The data are at a relatively coarse spatial resolution of 1° × 1° (which at the equator is ~111 km × ~111 km). Therefore, to calculate the contribution of each data element to the regional or global integrated net flux, we need to know the proportion of ocean (whether ice covered or not) actually contained within each data element. For most elements this will be 1 (open ocean or sea ice) or 0 (land), but it may be intermediate in data elements that straddle region boundaries (e.g., oceanic boundaries or coastlines). The oceanic (nonland) proportion of the pixel is multiplied by the ice-corrected integrated net flux to give the pixel's contribution to the regional or global integrated net flux.
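The pixel-area and ocean-fraction weighting can be sketched as follows. The text says only that Earth is treated as an oblate spheroid. The WGS84 semi-major axis and flattening used here are therefore assumptions, and the helper names `cell_area_m2` and `pixel_contribution` are hypothetical. The area of a latitude band on an ellipsoid has a closed form, so adjacent 1° × 1° cells tile the surface exactly.

```python
import math

# Assumed ellipsoid: WGS84 (the source only states "oblate spheroid").
SEMI_MAJOR_M = 6378137.0
FLATTENING = 1.0 / 298.257223563
SEMI_MINOR_M = SEMI_MAJOR_M * (1.0 - FLATTENING)
ECC = math.sqrt(FLATTENING * (2.0 - FLATTENING))  # first eccentricity


def _zonal_q(lat_deg):
    """Closed-form term such that the ellipsoid area from the equator to
    lat_deg, per radian of longitude, equals 0.5 * b**2 * q(lat)."""
    s = math.sin(math.radians(lat_deg))
    return (s / (1.0 - (ECC * s) ** 2)
            + math.log((1.0 + ECC * s) / (1.0 - ECC * s)) / (2.0 * ECC))


def cell_area_m2(lat_south, lat_north, dlon_deg):
    """Area (m^2) of a latitude-longitude cell on the assumed ellipsoid."""
    return 0.5 * SEMI_MINOR_M ** 2 * math.radians(dlon_deg) * (
        _zonal_q(lat_north) - _zonal_q(lat_south))


def pixel_contribution(flux_density, lat_south, lat_north, dlon_deg,
                       ice_scale, ocean_fraction):
    """Contribution of one data element to a regional integrated net flux.

    flux_density:   F per unit area for the pixel
    ice_scale:      ice-normalization factor for the pixel (0 to 1)
    ocean_fraction: nonland proportion of the pixel (0 to 1)
    """
    integrated = flux_density * cell_area_m2(lat_south, lat_north, dlon_deg)
    return integrated * ice_scale * ocean_fraction
```

For example, `cell_area_m2(0.0, 1.0, 1.0)` is about 1.23 × 10^10 m^2. That is consistent with the ~111 km × ~111 km equatorial cell quoted above.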
Where F data are missing, a first-order correction is made so as not to underestimate the regional net flux; the resulting estimate is termed the missing integrated net flux. Because F is calculated from remotely sensed SST and wind speed, areas with significant ice cover or persistent cloud cover, as well as some coastal regions, are likely to have missing F. We sum the ice-corrected ocean area of such data elements within the region of interest and use the areas with measured F to calculate a regional average F. This average is then multiplied by the missing area to estimate the integrated net flux from the missing regions. The estimate is added to the integrated net flux to give the total regional integrated net flux. Note that, with this method, the fluxes from individual regions may not sum exactly to the global flux. For each individual region the missing flux is estimated from that region's mean F, whereas for the global region it is estimated from the global mean F.
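The first-order missing-flux correction can be sketched as below. Two points are assumptions rather than statements from the text. First, the regional average F is taken to be area weighted. Second, each pixel is summarized by a single ice-corrected ocean area (its area scaled by its ocean fraction and ice-normalization factor). The function name is hypothetical.

```python
def regional_integrated_flux(pixels):
    """Integrated net flux for one region with a first-order gap fill.

    pixels: iterable of (flux_density, effective_area) pairs, where
    flux_density is None if F is missing and effective_area is the
    ice-corrected ocean area of the pixel.
    Returns (total_integrated_flux, missing_integrated_flux).
    """
    measured_flux = 0.0
    measured_area = 0.0
    missing_area = 0.0
    for flux_density, area in pixels:
        if flux_density is None:
            missing_area += area
        else:
            measured_flux += flux_density * area
            measured_area += area
    if measured_area == 0.0:
        raise ValueError("no measured F in region; cannot estimate missing flux")
    regional_mean = measured_flux / measured_area  # area-weighted mean F
    missing_flux = regional_mean * missing_area
    return measured_flux + missing_flux, missing_flux
```

Because the gap fill uses each region's own mean, applying this function to two subregions and summing the results need not equal applying it once to their union. This is the non-additivity noted above.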
The net flux tool also provides gross fluxes and the average values for all spatially varying variables within the netCDF files. The gas transfer velocity k, the in-water CO2 concentration ([CO2_AQW]), and the interfacial CO2 concentration ([CO2_AQ0]) are used to calculate the upward and downward integrated gross fluxes. The upward gross flux, F_UG, is defined as k[CO2_AQW], and the downward gross flux, F_DG, is defined as k[CO2_AQ0]. Missing data are treated in the same way for these calculations as for the integrated net flux.
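A sketch of the regional upward and downward gross fluxes is given below. It applies the same regional-mean gap fill as the net flux calculation. The tuple layout and the function name are hypothetical, and k, the concentrations, and the areas must be in mutually consistent units.

```python
def integrated_gross_fluxes(pixels):
    """Regional integrated gross fluxes (F_UG, F_DG) with a gap fill.

    pixels: iterable of (k, conc_water, conc_interface, area) tuples;
    k or either concentration may be None where data are missing.
    """
    up = 0.0
    down = 0.0
    measured_area = 0.0
    missing_area = 0.0
    for k, c_water, c_interface, area in pixels:
        if k is None or c_water is None or c_interface is None:
            missing_area += area
            continue
        up += k * c_water * area        # F_UG = k [CO2_AQW]
        down += k * c_interface * area  # F_DG = k [CO2_AQ0]
        measured_area += area
    if measured_area == 0.0:
        raise ValueError("no measured data in region")
    # Filling the gap with the regional mean is equivalent to scaling the
    # measured totals by (measured + missing area) / measured area.
    scale = 1.0 + missing_area / measured_area
    return up * scale, down * scale
```

Suppose the net flux takes the usual bulk form k([CO2_AQW] - [CO2_AQ0]). Then the difference between the two returned values equals the gap-filled integrated net flux.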
When using the net flux utility, each selected region has its own output data file. Data are output for each month of each year for which input data are supplied, and annual totals are provided for each year. Net flux outputs include F_IN (integrated net flux based on calendar days), the missing integrated net flux, and the integrated net flux assuming a 30.5-day month, along with the corresponding values for F_UG and F_DG.
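The two monthly integrations in the output differ only in the number of days used. The sketch below assumes that F is stored as a per-day rate (for example, per square metre per day). The function names are hypothetical.

```python
import calendar


def monthly_integrated_flux(daily_flux, year, month):
    """Integrate a per-day flux over one month in two ways.

    Returns (calendar-day total, 30.5-day-month total).
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return daily_flux * days_in_month, daily_flux * 30.5


def annual_totals(monthly_daily_fluxes, year):
    """Annual totals from 12 monthly mean per-day fluxes (January to December)."""
    calendar_total = 0.0
    fixed_total = 0.0
    for month, flux in enumerate(monthly_daily_fluxes, start=1):
        cal, fixed = monthly_integrated_flux(flux, year, month)
        calendar_total += cal
        fixed_total += fixed
    return calendar_total, fixed_total
```

For a constant flux, the two annual totals coincide in a leap year, because 12 × 30.5 = 366 days. In a nonleap year, the 30.5-day convention adds one day.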

2) RESAMPLE_NETCDF.PY
This tool calculates the mean for each 5° × 4° grid cell from the corresponding 1° × 1° grid cells in the input data. No correction is made for differences in area between the 5° × 4° and 1° × 1° grids (e.g., where cells intersect land), and any missing (masked) 1° × 1° data are excluded from the calculation. If all of the 1° × 1° cells within an output cell are masked (missing values), the output 5° × 4° cell is also masked. The tool can output the regridded fields as netCDF or as a single ASCII comma-separated values (CSV) file. The netCDF output mirrors the contents of the input netCDF, with each data array (2D dataset) replaced by its 5° × 4° equivalent. The optional CSV file contains columns for latitude, longitude, and every 2D dataset found in the input file.
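The block-averaging rule can be sketched in plain Python, without netCDF input or output. The sketch makes three assumptions: the 1° × 1° input has latitude along rows and longitude along columns, 5° × 4° means 5° of longitude by 4° of latitude, and missing values are stored as NaN. `resample_mean` is a hypothetical name, not the tool's interface.

```python
import math


def resample_mean(grid, lat_block=4, lon_block=5):
    """Block-average a 2D list of 1-degree values (rows = latitude).

    NaN marks missing (masked) cells. These are ignored in the mean, and an
    output cell is NaN only if every input cell in its block is missing.
    No area weighting is applied, matching the description in the text.
    """
    n_lat, n_lon = len(grid), len(grid[0])
    if n_lat % lat_block or n_lon % lon_block:
        raise ValueError("grid shape must be divisible by the block size")
    out = []
    for i0 in range(0, n_lat, lat_block):
        row = []
        for j0 in range(0, n_lon, lon_block):
            vals = [grid[i][j]
                    for i in range(i0, i0 + lat_block)
                    for j in range(j0, j0 + lon_block)
                    if not math.isnan(grid[i][j])]
            row.append(sum(vals) / len(vals) if vals else math.nan)
        out.append(row)
    return out
```

Under these assumptions, a global 180 × 360 input yields a 45 × 72 output.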

3) TXT2NCDF.PY
This utility assumes that the input data are in CSV format. The first line holds headings that include "latitude," "longitude," and optionally "date" or "date/time," and the subsequent lines hold the corresponding values. Latitude and longitude must be in decimal degrees. Date or date/time must follow the format DD/MM/(YY)YY [hh:mm(:ss)], where parentheses and square brackets mark optional components, DD is the two-digit day of the month, MM is the two-digit month number (starting at 01), and (YY)YY is the two- or four-digit year. The output is in netCDF format, with latitude and longitude limits copied from a reference global netCDF file that is passed as one of the inputs. The output netCDF includes a dataset "count" containing the number of observations found in each grid cell. The user can optionally specify a startTime and/or an endTime (in the same range of formats as date). The tool then counts only observations on or after startTime and/or before endTime. If the end time is a date with no time specified, it is treated as inclusive; that is, the end time is the end of the specified day. Hence, the options startTime "01/01/08" and endTime "31/01/08" would count only observations made in January 2008. Observations are simply binned into grid cells; no interpolation of the data is performed. The "time" value in the netCDF is set as follows: if any data contain date/time information, the time is set to the midpoint of the …

b. Input data currently available
Table 1 gives an overview of the available input datasets.

c. Generic gas transfer parameterization
Several published gas transfer velocity parameterizations (e.g., as used in T09) are of the form

$$k_w = a\,U^2\left(\frac{Sc}{660}\right)^{-1/2},$$

where U is the wind speed and Sc is the Schmidt number of the gas. The toolbox provides a more generic polynomial expression that enables a large range of different wind speed-based gas transfer velocity parameterizations to be used (Nightingale et al. 2000; Wanninkhof et al. 2009):

$$k_w = \left(\frac{Sc}{660}\right)^{-1/2}\left(a_0 + a_1 U + a_2 U^2 + a_3 U^3 + a_4 U^4\right). \tag{A2}$$
This parameterization allows users to exploit their own wind speed-based gas transfer relationship.
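Equation (A2) can be evaluated with Horner's rule, as in the sketch below. The function name and argument order are hypothetical. k_w takes whatever units the coefficients a0 to a4 were fitted in. Setting a2 = a and all other coefficients to zero recovers the single-coefficient quadratic form given above.

```python
def transfer_velocity(wind_speed, schmidt, coeffs):
    """Generic polynomial gas transfer velocity (hypothetical helper).

    k_w = (Sc/660)^(-1/2) * (a0 + a1*U + a2*U^2 + a3*U^3 + a4*U^4)
    coeffs: (a0, a1, a2, a3, a4), in the units the user's fit was made in.
    """
    a0, a1, a2, a3, a4 = coeffs
    u = wind_speed
    # Horner's rule: a0 + u*(a1 + u*(a2 + u*(a3 + u*a4)))
    polynomial = a0 + u * (a1 + u * (a2 + u * (a3 + u * a4)))
    return (schmidt / 660.0) ** -0.5 * polynomial
```

For example, with a hypothetical a = 0.3, U = 10, and Sc = 660, k_w is 30. At Sc = 2640 the Schmidt-number factor halves it to 15.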