Impact and detectability of hypothetical CCS offshore seep scenarios as an aid to storage assurance and risk assessment

Carbon Capture and Storage has the potential to make a significant contribution to the mitigation of climate change, however there is a regulatory and societal obligation to demonstrate storage robustness and minimal local environmental impact. This requires an understanding of environmental impact potential and detectability of a range of hypothetical leak scenarios. In the absence of a significant body of real-world release experiments this study collates the results of 86 modelled scenarios of offshore marine releases derived from five different model systems. This synthesis demonstrates a consistent generalised relationship between leak rate, detectability and impact potential of a wide range of hypothetical releases from CO 2 storage, which can be described by a power law. For example a leak of the order of 1T per day should be detectable at, at least, 60m distance with an environmental impact restricted to less than a 15m radius of the release point. Small releases are likely to require bottom mounted (lander) monitoring to ensure detection. In summary this work, when coupled with a quantification of leakage risk can deliver a first order environmental impact assessment as an aid to the con- senting process. Further this work demonstrates that non-catastrophic release events can be detected at thresholds well below levels which would undermine storage performance or significantly impact the en- vironment, given an appropriate monitoring strategy.


Introduction
Many commentators regard Carbon Capture and Storage (CCS) as an essential component in limiting climate change to 2°C or lower (e.g. IPCC, 2014), as legislated by the Paris Agreement (United Nations, 2016). Successful deployment of CCS depends on demonstrable storage integrity from both a legal (Dixon et al., 2015) and societal point of view (Bradbury, 2012) and hence a reliable monitoring strategy. In addition, understanding and quantifying the impact of leakage, alongside an appreciation of the minimal risk of leakage, can serve to assure or at the very least underpin a cost-benefit analysis (Jenkins et al., 2015).
Whilst international regulatory standards are not harmonised (e.g. Bielicki et al., 2015;EU, 2009), there are reasonably consistent CCS performance regulations that, in general, minimise the acceptable loss from storage, require any loss from the storage complex to be quantified, necessitate an environmental impact assessment and stipulate a fit-for-purpose monitoring strategy (Dixon et al., 2015).
Primary monitoring of storage integrity will involve seismic techniques that are capable of imaging storage formations and overburden, but these can be costly to deploy and have limited sensitivity and accuracy (Jenkins et al., 2015). Monitoring at or near the surface (or sea floor) has the potential to be highly sensitive, could be cheaper to deploy and can contribute to quantification of release if required. In particular, monitoring at the sediment surface, whether onshore or offshore would be necessary to identify and quantify environmental impact or the lack thereof. Given that adequately planned storage should have near-zero likelihood of leakage, it is unlikely that all-embracing monitoring programmes are economically justifiable, nevertheless a minimum monitoring provision is necessary for assurance. As leakage points are a-priori unpredictable, monitoring must be able to cover relatively large areas (of the order of 100 km 2 allowing for lateral https://doi.org/10.1016/j.ijggc.2019.102949 Received 10 June 2019; Received in revised form 28 November 2019; Accepted 22 December 2019 spread of CO 2 through the overburden), be able to detect chemical anomalies at a distance (sensitivity) whilst not generating false-positives (accuracy). The latter is problematic as the natural concentration of CO 2 in marine systems varies heterogeneously, modified by a variety of physical and biological processes. Marine CO 2 concentrations are operationally determined by either pH or pCO 2 measurements, as both can be measured by commercially available in-situ sensors at high frequency. Typically shelf seas that overlay offshore storage regions will experience an annual range of 0.2-0.4 pH units, with a mean between 8.0 and 8.1 or 400 μatm +/-50-150 μatm in terms of pCO 2 (Thomas et al., 2005;Artioli et al., 2012). As a result of these challenges various studies have identified potential anomaly criteria based on covariance relations, stoichiometric methods (Romanak et al., 2012;Botnen et al., 2015;Uchimoto et al., 2018) or very short term variability (Blackford et al., 2017) which would enable the detection of a small leakage signal against a background of significant noise. Other studies have developed strategies for deployment of sensors either on autonomous underway vehicles (AUVs) or fixed landers aimed at minimising costs whilst delivering assurance (Greenwood et al., 2015;Hvidevold et al., 2015;Alendal, 2017). It is envisaged that optimising the combination of anomaly criteria and deployment strategy would deliver the coverage, accuracy and sensitivity required of a monitoring system.
An environmental impact assessment requires consideration of the spatial and temporal extent of harmful geochemical changes that may occur in the event of a release, along with an understanding of the sensitivity of the proximate ecosystem. Quantifying ecosystem sensitivity to raised levels of CO 2 is complex (Jones et al., 2015), as impact depends on taxon, species and life stage (Kroeker et al., 2013), the nutritional status of individuals (Thomsen et al., 2013) and the length of exposure  as well as being magnified in the presence of other stressors (Ellis et al., 2015). As such it is difficult to establish a threshold of excess CO 2 that can be reliably defined as signifying the beginning of harmful impact. Gattuso et al., 2015 identifies combined changes in of +1.5 to +2.0°C and -0.15 to -0.25 pH units since pre-industrial as a threshold where severe impact to sensitive species and ecosystems is likely. Due to the absorption of increasing atmospheric carbon dioxide the oceans have already experienced an average pH decrease of approximately 0.1 since pre-industrial times. In laboratory experiments short term exposures to lowered pH tend to indicate impacts only where pH change exceeds -0.2 pH units (e.g. Azevedo et al., 2015). As marine systems exhibit pH variability of 0.2-0.4 pH units over a seasonal cycle, the timing of any perturbation with respect to natural cycles is an important consideration when assessing impact potential. Assessments from the small number of release experiments (Jones et al., 2015;Roberts and Stalker, 2017) and other analogues of leakage have generally shown spatially limited environmental impact. For example a short term release experiment of 4.2 Tonnes CO 2 over a five week period resulted in a measurable biological response only within 20 m of the release point, which persisted for no longer than four weeks . Analysis of a long-term natural volcanic CO 2 vent system demonstrated significant alteration in marine community structure, however constrained to the region, approximately 100 m in length, with active vents and a measurable pH change (Hall-Spencer et al., 2008).
Given that working offshore to establish impact and monitoring strategies from experimentation and direct observation is complex, risky and expensive; maximising in-silico techniques to constrain impact and derive monitoring approaches is a cost effective methodology. Marine system models that reproduce 3D time-evolving hydrodynamics coupled with representations of system biogeochemistry are internationally ubiquitous, often with established skill and with minimal modification are applicable to exploring and quantifying scenarios of CO 2 leakage into the marine environment. This paper synthesises existing modelled scenarios of leakage with a view to establishing general relationships between leakage rate, potential environmental impact and potential detectability of leakage, using chemical sensors. Although presenting a simplistic collation of heterogeneous simulation outcomes, the resulting quantification of impact potential and detectability has the ability to inform the wider debate about the societal benefits of CCS.

Generic modelling approach
Coupled marine hydrodynamic -biogeochemical models are a long established technology with many applications (e.g. Moll and Radach, 2003;Holt et al., 2016). For the purposes of investigating CCS leakage, monitoring and impact potential the primary requirement is a sufficiently skilful representation of hydrodynamic processes which are responsible for the advection and dispersion of CO 2 released at the sea floor, including currents, wind-wave effects, tides and turbulent mixing.
A key consideration for CCS applications is resolution; large scale events can be reasonably treated using meso-scale model resolutions (e.g. Drange et al., 2001;Blackford et al., 2008;Phelps et al., 2015) whilst consideration of small scale release requires a high resolution system in order to resolve the gradients of chemical change (e.g. Dewar et al., 2013a;Blackford et al., 2013). A first order estimation of CO 2 concentrations resulting from a release can be achieved by using hydrodynamic models including dissolved tracers as a proxy for CO 2 , given that CO 2 dissolution has a minimal effect on density (Greenwood et al., 2015;Ali et al., 2016), i.e. away from the bubble plume which will require multiphase flow models Chen, B. et al., 2003;Kano et al., 2010;Dewar et al., 2013aDewar et al., , 2015. More sophisticated systems have explicit coupled CO 2 chemistry (the carbonate system) which includes the impact of temperature on CO 2 solubility, the exchange of CO 2 across the air sea interface and the ability to calculate resultant changes in pH or pCO 2 (Blackford and Gilbert, 2007;Phelps et al., 2015;Dewar et al., 2013a). pH and pCO 2 are important to resolve as these are the operationally measurable indicators used to derive CO 2 concentration. Fully coupled hydrodynamic-biogeochemical or ecosystem models have the advantage of combining hydrodynamics, carbonate chemistry and the biological processes that affect marine CO 2 concentrations such as respiration and photosynthesis (e.g. Blackford and Gilbert, 2007;Artioli et al., 2012Artioli et al., , 2014. These allow simulation of leakage scenarios in the context of natural variability and potential inclusion of formulations that encode for impacts on the ecosystem and resulting feedbacks . Such fully coupled models are common place in mesoscale applications. Coupled full biogeochemical -ecosystem models with high resolution hydrodynamics have less of a pedigree but are becoming more established (e.g. Xue et al., 2014).
In this work the results of five different model applications which explore a total of 86 leakage scenarios, ranging from 5 × 10 −4 to 10 4 tonnes per day (Appendix , Table A1) are collated. Whilst each model system conforms to the basic requirements for application to CCS R&D outlined above, they all differ in terms of resolution, forcing and sophistication. The initial question is therefore, do these varied model systems deliver consistent results? From each individual model simulation the sea floor area and water column volume which are acidified by -0.01 and -0.1 pH units are derived. The reduction of 0.1 pH unit is used as an indicator of maximum impact potential, thus the area/volume acidified by -0.1 pH defines the impact footprint arising for that particular scenario. The threshold of a 0.01 pH unit reduction is chosen as both an established accuracy and sensitivity for marine deployable pH sensors and a threshold which has been demonstrated as a potential indicator of anomalous CO 2 , based on high frequency sampling, (Blackford et al., 2017). Thus the -0.01 pH change threshold defines a limit of detectability, which can then be used to inform the deployment strategy of fixed or mobile sensors. particular features which are pertinent to the simulation of CO 2 releases. Included are results from five model systems, the FVCOM and Eulerian models deliver horizontal resolution at the metre scale, the NEMO system at approximately 100 m and the POLCOMS and Bergen Ocean Model at approximately 1 km. The Eulerian model system describes the detailed dynamics of gas phase bubble plumes, including buoyancy and solubility terms, the other systems assume CO 2 is injected to the water column in the dissolved phase, the NEMO model parameterising the impact of bubble rise on the initial distribution of dissolved CO 2 with height above sea-bed, with the CO 2 added to several of the lower model layers. Full descriptions of the model systems can be found in the literature cited.

FVCOM unstructured grid system
The Finite Volume Community Ocean Model (FVCOM) is a prognostic, unstructured-grid, 3-D ocean circulation model (Chen et al., 2003a). FVCOM enables the resolution of the model grid to vary across the study domain, such that high resolution can be applied in regions where strong gradients are expected, without imposing the computational cost of high resolution throughout the domain as in the case of regular rectilinear grid systems. Hence for simulating releases of dissolved substances into the marine system, resolution can be concentrated at the epicentre of the release, decreasing towards the periphery of the domain (Blackford et al., 2013). For the simulations presented here, FVCOM has been coupled to a carbonate system model  that predicts the chemical changes associated with additional CO 2 , including outgassing at the sea surface and changes to the pH of the system as described in section 2.2.6.
The domain used is centred on 58.00 N, 0.25 W, or Goldeneye, a station in the North Sea identified as a possible storage site (Pale Blue Dot, 2016). Bathymetry is sourced from the EMODnet Digital Terrain Model (DTM) (www.emodnet-bathymetry.eu) whilst within the storage complex, high resolution multi-beam bathymetry collected during the STEMM-CCS project is used. The bathymetry is interpolated from the raw data onto the model grid with a linear interpolation. Simulated water depth is of the order of 120 m meters and the system experiences seasonal thermal stratification. Boundary forcing is provided by an FVCOM model of the north-west European continental shelf itself driven at its boundaries by tides predicted from 11 tidal constituents. Grid resolution in the larger domain ranges from 15 km at the boundaries to 500 m at the boundary with the smaller nested domain. Within the smaller nested domain in which the leaks are simulated, grid resolution varies from 3 km along the boundaries to 2.5 m at the leak site. Vertical resolution is 25 layers using terrain following sigma coordinates. Surface heating is supplied by a custom Weather Research and Forecasting (WRF, Skamarock et al., 2008). Sea surface temperatures are assimilated from the GHTSST data (NASA/JPL) to constrain the modelled values within observed values. The model and results are fully described in Cazenave et al. (2019).

Two phase Eulerian plume models
A purpose built, Eulerian-Eulerian, finite volume numerical model of multi-phase plume dynamics in a small scale turbulent ocean was developed in particular to investigate the fine scale behaviour of bubble plumes, their displacement and aqueous dissolution; predicting the near field physicochemical impacts and biological risk to the marine ecosystem from CO 2 leakage from potential carbon storage locations around the North Sea and surrounding shallow waters (Dewar et al., 2013a).
The model considers leaked bubbles or droplets forming as a dispersed phase, rising as a buoyant plume with interactions and dissolution into the surrounding turbulent waters, modelled through large eddy simulation theories (Chen et al., 2003b). Sub-models are designed based on the fluid properties (in phases of CO 2, gas, liquid, and hydrate) and CO 2 leakage parameters to predict the bubble-seawater plume interactions, dynamics and exchange rates. These sub-models include physicochemical properties, drag coefficient, Sherwood number, bubble generation and size distributions along with both bubble-bubble and bubble-seawater interactions through bubble breakup and coalescence.
The model is developed, calibrated and validated from a wide range of both laboratory and in-situ experiment and observation data, including that of the QICS experiment Sellami et al., 2015) and North Sea turbulent energy measurements. pH is calculated from total dissolved inorganic carbon concentration using of carbonic acid dissociation constants obtained from Saruhashi (1970), and the ion content of the water suggested by Someya et al. (2005). Further details of the model development, calibration and verification may be found in Dewar et al. (2015Dewar et al. ( , 2013aDewar et al. ( , 2013b. Model resolution is varies between 0.5-2 m in the horizontal, concentrated around the leakage area and 0.04-2.67 m in the vertical, dependent on water column depth. Simulations are purposefully chosen to cover a wide range of potential leakage scenarios through the use of varied CO 2 leakage depths (10−200 m), and bubble size distributions (0.1-> 10 mm), leakage rates and fluxes (appendix Table A1); along with varied ocean currents, temperatures and seawater compositions (salinity etc.) and as such are not specific to a particular location. Changes to the fluid chemistry and pH are calculated from the dissolved mass concentrations of the CO 2 to measure the effect on the marine ecosystem.

NEMO 90 m nested domain
Simulations use the Nucleus for a European Model of the Ocean (NEMO; Madec, 2008) model which provides the model for Operational Oceanography across Europe and the ocean model for climate simulations in UK, France and Italy. NEMO is a 3D prognostic finite-difference hydrodynamic model operating on a quadrilateral horizontal mesh.
The work reported here utilises two NEMO configurations. Both are rectangular model domains of 0.5°x0.5°surrounding a "Northern" site (centred on 57.75°N, 1.0°E) and "Southern" site (centred on 54.0°N, 1.0°E) in the North Sea, representing the Forties and Viking fields respectively. Horizontal resolution is 1/1200°(∼90 m) and 40 levels are used in the vertical. These follow the bathymetry with an equal number of uniformly spaced levels at each grid cell. The model uses a time split approach with depth averaged and depth varying equations being integrated forwards in time with time steps of 0.6 s and 6 s respectively. Bathymetry is taken from the NOOS partnership NW European shelf data set (http://www.noos.cc/), with a resolution of 1/60°, so is smoothly varying over the region considered. Tides are taken from the High Resolution Continental Shelf Model . In the study only the dominant M 2 constituent was considered, and so represent mean tidal conditions rather than the 14 day spring-neap tidal cycle. These forcing data have been extensively validated and forms the basis of the POLPRED tidal products (e.g. as used by UK Renewable energy Atlas, http://www.renewables-atlas.info/).
The effects of a seafloor leakage of CO 2 are simulated through the continuous, vertical (lateral movement of CO 2 bubbles is considered negligible) injection of dissolved inorganic carbon into the centre of the model domain, over a predetermined number of layers above the seafloor, defined by a simple bubble parameterisation embracing the dissolution rate and the rise velocity. In effect the release is equally partitioned over the bottom 3 m of the water column (equivalent to two vertical layers in the 98 m deep domain and 4 layers in the 43 m deep domain). The carbonate chemistry is modelled by the iterative speciation model outlined in 2.2.6.
In this study, model experiments are designed to consider a range of 'idealised' environmental scenarios including geographical location (defining water depth and tidal current patterns), the presence/absence of stratification, and/or the presence/absence of synthetic wind forcing. dimensional horizontal quadrilateral grid hydrodynamic model with terrain following S-coordinates in the vertical, well suited to maintaining sharp temperature and salinity gradients that are abundant in shelf seas. The model is forced by 6 -hourly European Centre for Medium Range Weather Forecasting atmospheric model data whilst boundary conditions consist of hourly elevations and depth mean currents, and daily depth-varying currents, temperature and salinity. These and 15 tidal harmonics are taken from a wider area POLCOMS model (Wakelin et al., 2009). Phelps et al. (2015) used a 1.8 km horizontal resolution, 32 layer implementation to address a number of high-rate scenarios. For simplicity it is assumed that CO 2 enters the marine system in the dissolved phase within the bottom layer of the model and dissolved CO 2 is assumed to modify the density of seawater (and hence mixing strength) according to Song et al. (2002). The carbonate chemistry is modelled by the iterative speciation model outlined in 2.2.6.
The scenarios presented address a combination of two sites, two leak rates, two seasons and two contrasting years (1998 and 1999), as defined by the North Atlantic Oscillation indices, which in turn are an indicator of wind driven mixing strength. In effect the 1999 simulations deliver stronger and more prolonged stratification compared with 1998. July represents a seasonal peak in stratification and temperature, whilst January simulations demonstrate completely mixed water columns and in general, stronger wind driven mixing. The northern site (57.75 N, 1.0E) coincides with the Forties Oil field with an epi-centre depth of 98 m, whilst the southern site (54.00 N, 1.0E) coincides with the Viking Oil field and a water depth of 43 m, in common with the NEMO simulations. Given the relatively coarse grid, the model system was used to investigate two very high end scenarios, 1000 T/d and 10,000 T/d.
The model set up and results presented here are fully detailed in Phelps et al. (2015).

Bergen Ocean Model
The Bergen Ocean Model (BOM) is a three-dimensional terrain-following ocean model with capabilities of resolving mesoscale to largescale processes (http://www.mi.uib.no/BOM/, Berntsen, 2004). The model domain covers the entire North Sea with a horizontal grid resolution of 800 m and 41 sigma-coordinate layers in the vertical, with a vertical resolution of less than 1 m in the shallow areas, and up to tens of meters in the middle of the water column in deeper areas. The model forcing data consist of wind, atmospheric pressure, harmonic tides, rivers, and initial fields for salinity and temperature. Water elevation and velocities are spun up from zero. Initial values and lateral boundary conditions of temperature and salinity are taken from the UK Metoffice FOAM 7 km model published by MyOCEAN service at http://www. myocean.eu.org/, and interpolated into the model grid. The atmospheric forcing data are collected and interpolated from The European Centre for Medium-Range Weather Forecasts (ECMWF), ERA Interim reanalysis data set. The wind forcing is updated at intervals of 6 simulated hours. The tidal forcing applied on the open boundaries is taken from harmonic analysis and includes four tidal constituents; M2, S2, K1, and N2. Nodal factors and equilibrium arguments needed to get the phase and amplitude right for a given date are provided by the ADCIRC model (http://www.adcirc.org/). CO 2 injections are treated a s a passive tracer where the horizontal diffusivity is computed following Smagorinsky (1963), while the vertical diffusivity coefficient is estimated using a turbulent closure described by Mellor and Yamada (1982). All nine scenarios used a seep rate equivalent to 150 tonnes CO 2 per day with the initial vertical distribution of CO 2 dissolution following Dewar et al., 2013aDewar et al., , 2015. Nine scenarios differ only in their geographical position, one centred on the Sleipner storage field (58.36°N, 1.94°E.) with other sites arranged 8 and 16 km to the NW, NE, SE, and SW. A full description is given in Ali et al., 2016.

Carbonate system modelling
The FVCOM, POLCOMS and NEMO based model systems use the same iterative carbonate (CO 2 chemistry) model, described in Blackford and Gilbert, 2007;Artioli et al., 2012 andButenschön et al., 2016 which calculates changes in pH associated with particular concentrations of CO 2 and various environmental conditions. The carbonate system model performs to the same accuracy and precision as the industry standard scheme MOCSY2.0 (Orr and Epitalon, 2015).
The CO 2 content of the system is modelled as fully dynamic concentrations of dissolved inorganic carbon (DIC, the sum of dissolved CO 2 and its dissociation products) with total alkalinity (TA) parameterised from salinity. Together these master variables, coupled with information of the primary physical properties (temperature, salinity and pressure) are used to derive pH, consistent with internationally agreed protocols (e.g. Dickson et al., 2007). DIC is comprised of two components, baseline or natural DIC which is parameterised from long term runs of a coupled biogeochemical-hydrodynamic model (POL-COMS-ERSEM, Wakelin et al., 2012), or set at a contemporary climatological mean and "leak" DIC, representing the additional release flux. The combination of both allows for a realistic treatment of CO 2 outgassing at the sea surface, which is parameterised from Weiss (1974) and Nightingale et al. (2000) and driven by surface wind speed.
The Eulerian model determines DIC via the dissolution of a modelled gas bubble/drop plume whilst the Bergen model uses a passive tracer as a proxy for DIC, neither of these systems consider air-sea gas exchange.

Scenarios
In the absence of real world leakage events from storage reservoirs and minimal evidence with which to constrain rates, the choice of leak rate has been very variable over the studies published. Some studies are guided by theoretical exercises attempting to derive plausible leak rates based on various geological scenarios (e.g. Paulley et al., 2013). Estimates of CO 2 leak rates from abandoned well bores range from 0.1 kg yr −1 to 52 t yr −1 (3 × 10 -6 -0.14 t d −1 ) depending on the physical integrity of the well casing and plug (Crow et al., 2010;Tao and Bryant, 2014;Jordan et al., 2015;Vielstädte et al., 2015Vielstädte et al., , 2017Vielstädte et al., , 2019. Estimates of leakage from an enhanced oil recovery project using CO 2 have been reported in the range 170−3800 t yr −1 (0.47-10.4 t d −1 , Klusman, 2003a, b). Other studies take a less constrained approach of investigating the range of possible events ranging from "what is the smallest leak that could be detectable or have some measurable impact" to "what is the largest conceivable leak, however unlikely that may be". Maximum leakage scenarios can be constrained by planned or achieved injection rates; for example the Sleipner field operated at about 1 Mt yr -1 equating to approximately 3 Kt d -1 .
The morphology of leakage is also unknown, for example an abandoned well bore scenario is likely to result in a quasi-point source, whilst a geological feature could enable leakage over the area of a chimney structure or along a fault line (Paulley et al., 2013). The model simulations are constrained to releasing CO 2 homogenously into each of one or more model grid cell(s), hence the initial injection concentration (and often the maximum chemical change) is determined by model resolution. In these studies simulated leaks have been treated as quasi point sources, either defined by a single grid cell for the FVCOM, NEMO, POLCOMS and BOM models or a 15 m x 15 m area (30 × 15 grid cells) for the Eulerian model. Equivalent fluxes for each scenario can be derived by dividing the release rate by the area of release, namely 3 m 2 (FVCOM), 225 m 2 (Eulerian), 8100 m 2 (NEMO), 0.64 km 2 (BOM) and 3.24 km 2 (POLCOMS). The model systems therefore approximate a range of release modes from point source to a large chimney feature.
No inference on the likelihood of any particular leak rate is presented, within any of the cited studies or in this work. The purpose here is to examine the hypothetical range, which covers in excess of seven J. Blackford, et al. International Journal of Greenhouse Gas Control 95 (2020) 102949 orders of magnitude from < 10 −3 to 10 4 tonnes CO 2 per day. A full list of the scenarios presented in this meta-analysis is given in Table A1 of the appendix. Apart from leak rate the reported scenarios test the effect of hydrodynamic conditions as a function of geographical position, season, tidal, thermal and wind driven mixing, and water depth. Model simulation periods are variable, but sufficiently long for a quasi-steady state to establish in all cases.

Results and discussion
In this work pH is used as the parameter to describe the carbonate system and degree of perturbation. Practically this is the parameter with the most complete record from the suite of model simulations analysed. Results are presented using four metrics • Volume of seawater experiencing a pH drop exceeding 0.01 • Volume of seawater experiencing a pH drop exceeding 0.1 • Area of sea experiencing a pH drop exceeding 0.01 • Area of sea experiencing a pH drop exceeding 0.1 There are several reasons for wanting to understand the footprint and chemical signature of CO 2 released into the marine system. The first relates to detection, and for that purpose pH is the ideal parameter as it is the primary parameter recorded by operational sensors. The threshold of a 0.01 pH change equates to the current sensitivity and accuracy of commercially available pH sensors, and is used here as the metric relating to detection. Whilst discriminating a change of 0.01 pH unit from background noise is challenging there is evidence that it can be an achievable anomaly detection criteria (Blackford et al., 2017).
A second purpose is understanding impact potential. For this pH is also a good parameter -much of the high CO 2 impact literature is presented as a function of pH, although not all. Physiologically pH is a causative proxy of impact as it references the ionic gradients affecting membrane transport (e.g. Flynn et al., 2012). Other impacts are more directly identified by [DIC], pCO 2 or saturation state, in particular the rubisco enzymatic functions and their effect on photosynthesis rates or calcification. However the intent in this manuscript is only to establish a basic relationship between release rate and impact potential. We use a change of 0.1 pH unit as an indicator of the maximum extent of impact, on the basis that experimentally measurable impacts of increased CO 2 tend to require a larger shift in the carbonate system than represented by 0.1 pH change, or equivalent changes in the other carbonate system parameters. A third monitoring challenge is quantification, where indeed DIC is the key variable. However this is not the topic of this manuscript.
Area metrics indicate the sea floor or benthic footprint affected, and are relevant to the placement of fixed sea floor monitoring platforms. Volume metrics address impact potential for water column or pelagic systems and are relevant to the planning of mobile monitoring deployments using autonomous underway vehicles. Fig. 1 illustrates the collated simulation results for each metric. The available data is presented in Table A1 of the appendix. Data gaps arise either where the original model output is no longer accessible and only a subset of metrics were derived at the time of analysis or where model resolution is insufficient, e.g. for the BOM model a 0.1 drop in pH would implicate an area much smaller than a grid box.
Considering the combined data from all of the models, the best descriptor of the relationship between leak rate (varying over seven orders of magnitude) and impact metric is given by power-law relationships, with R 2 > 0.9 in all cases. The exponents vary within the range 1.41-1.70. Considering data from individual models, whilst a power law generally describes the correlation well, in some cases a linear model also gives as good a fit. The power law tends to overestimate impacts from larger leaks, whilst the linear regression tends to overestimate impacts from small leaks. This empirical result is certainly worthy of further analysis to better understand the mechanistic basis, but the key result here is that in a range of physical environments, represented by the diversity of models, a consistent relationship emerges. Vielstädte et al. (2019) report an experimental release of 40 kg CO 2 over a 11.5 h period (0.08 t. d −1 ), using this to parameterise a Gaussian based plume model and testing scenarios of 0.027, 0.054 and 0.151 t d −1 (10, 20 and 55 t yr −1 ). They report a detectable seafloor footprint based on a detection threshold of approximately Δ0.03 pH units of 25-136, 50-292 and 250-1346 m 2 respectively, depending on tidal state (lower bounds estimated from Vielstädte et al., 2019). The respective detection thresholds derived from this study, based on a Δ0.01 pH threshold for the same release rates are 73, 193 and 802 m 2 . Impact estimates from the same study use a threshold of Δ0.15 pH units and report a potential impact area of 29, 54 and 271 m 2 respectively, approximately 3x the radii, but within the variability of the model scenarios considered here.

Scenario variability
Whilst the regression analysis demonstrates reasonably consistent results across the range of release rates, there is still significant variability between individual scenarios addressing similar releases. Are there any factors which explain this heterogeneity?
Examining the influence of different model systems (Fig. A1, Appendix), the high resolution FVCOM model tends to produce lower estimates of all metrics, whilst the coarsest resolution model system, POLCOMS, produces higher estimates. This follows from the higher numerical (artificial) diffusion associated with coarser grid models. The Bergen model agrees closely with the mean, although the scenario variability within the Bergen Model data is very low compared with the other systems, such that the Bergen data is highly clustered and more strongly influences the regression analysis. The only clear trend is seen in the very high resolution Eulerian model, which tends to produce lower estimates in all metrics for larger release scenarios. This is due to the limited domain size of the Eulerian model system, in these particular scenarios, where the larger releases plumes interact with the domain boundaries. The outcomes of the NEMO model display the greatest variability in metrics for similar release rates, this is a result of the idealised mixing scenarios used to force the scenarios, which do not include natural heterogeneity. In the absence of sufficient field data it is not possible to assess which if any model gives the most accurate results.
Horizontal and vertical velocity are likely to be key determinants of plume properties. For the Eulerian model, which applies fixed velocity fields there is a positive correlation between current velocity and the magnitude of both area and volume metrics. In the other models tidal forcing adds an additional complexity of periodically varying and reversing flow velocities. The model outcomes were analysed to assess if the spread in metrics can be related to environmental factors such as depth and an estimation of the degree of mixing (Fig. 2, Appendix). In neither case is there any apparently consistent relationship, although the lack of quantitative data describing velocities or mixing within each model simulation hampers a robust assessment.
The presence/absence of the air-sea exchange process could plausibly impact model comparison, however this would imply that the models omitting this process (Eulerian and Bergen) would generate larger values for the volume/area metrics, as a carbon sink is missing. This is not borne out by the bias analysis (Fig. A1 of the appendix). Challengingly Phelps et al. (2015) describing the POLCOMS scenarios, present an explicit analysis of sea to air fluxes showing that a significant proportion of the released CO 2 is lost to the atmosphere (or lessens atmospheric drawdown, depending on other processes.), but the POL-COMS simulations show a bias towards higher values of the metrics. For smaller releases, where the plume is restricted to the lower layers of the water column air-sea exchange is likely to be a slow and less significant process as far as monitoring and impact is concerned. For larger leaks or shallower water columns where plumes are more likely to reach the surface layer, air-sea exchange would be expected to moderate the value of the metrics. The available model data does not clarify this hypothesis which is worthy of further analysis.

Detection length scales
An important first order consideration for developing monitoring strategies, as well as risk assessments, is the length scale of detectability for a given leak rate -how close does a sensor have to be to the source to ensure detection, both horizontally and vertically?
By comparing area and volume impacted to the 0.01 threshold the approximate rise height above sea floor can be derived (Fig. 2). For leak rates below 1 T/d, rise height does not exceed 10 m and is predominantly limited to 2−3 m above sea bed, consistent with the results reported in Vielstädte et al., 2019. Given constraints on safe operational heights of marine autonomous underway vehicles (AUVs), which are generally limited to a height above sea bed of 5 m or more, these results suggest that maximum success in detecting small leakages will require sensors deployed on sea floor landers, enabling a sensor height of less than 1 m. For large leak rates, the rise height in these scenarios generally exceeds 5 m, sometimes considerably so, implying that AUV based sensor deployments would be suitable. Although not tested here, release events in shallow, well mixed systems may also be expected not to be confined to the proximity of the sea bed.
Making the highly simplistic assumption that the area of CO 2 plumes are approximately circular, the radii describing the average distance of the detection threshold from the CO 2 source can be derived  J. Blackford, et al. International Journal of Greenhouse Gas Control 95 (2020) 102949 from the regression in Fig. 1, thereby quantifying detectability distance as a function of leak rate (Fig. 3). This indicates that a leak of 1Kg/d would require a sensor within a 1 m radius, a 1 T/d event could be detectable at around 60 m distant and a 1 M T/d event at 7.8 km distance. Taking the criteria that no more than 1 % of stored CO 2 can be released to maintain an acceptable degree of storage integrity (IPCC, 2005) and using the Sleipner injection rate as an example, then a release of 30 T/d would be detectable at approximately 660 m distance.
The appropriate resolution of a model system for investigation of a particular leak rate can also be informed by the analysis, indicating that operational models likely require a resolution of 10 m or less. Of course, due to complex hydrodynamics of marine systems, and in particular the oscillatory water movement caused by tidal mixing, plumes of enriched CO 2 water will be anything but hemi-spherical/ circular. In the horizontal plane the model simulations predict a range of footprint shapes ranging from annular to elongated and fragmented (Blackford et al., 2008(Blackford et al., , 2013Phelps et al., 2015;Ali et al., 2016) whilst vertical movement will additionally be affected by bubble plume rise height and thermal stratification of the system. These model findings are confirmed by the complex plume geometry described by a small scale sea floor release (Vielstädte et al., 2019). A-priori knowledge of tidal flow direction and in-situ mixing regimes may enable bespoke planning of sensor deployment which could significantly increase the detection length scales reported here. Improved sensitivity of sensors may also increase the detection length scale, however the primary challenge may be to accurately identify changes in pH of less than 0.01 as anomalous, rather than the product of natural processes in the marine system.

Impact analysis
Addressing the question, "what degree of leakage is environmentally problematic?" is challenging and involves a consideration of global benefit contrasted with local impact, moderated by an understanding of risk. In this study a reduction in pH of 0.1 unit is taken as an indicator of maximum impact potential. In reality most ecosystems are robust to slightly larger reductions, especially if short-lived. A further complication arises from the intermittent exposure generated by oscillating tidal cycles. Some organisms may be able to tolerate short term exposure by down-regulating processes; however there is also evidence that rapidly fluctuating conditions can impart additional stress on individuals.
On average a small 1 T/d seep would produce a volume at risk of impact of 1470m 3 (equivalent to a sphere of radius 7 m) or a sea floor area at risk of 630m 2 (equivalent to a circle of radius 14 m), whilst a 1 M T seep could impact a volume of 0.18 km3 (sphere of 350 m radius) or an area of 49 km 2 (circle of radius 4 km). High end scenarios would have more likelihood of early detection, allowing prompt mitigation and reduction of impact. Where reported, return to non-impactful conditions after the cessation of a leak is rapid, varying between hours and days depending on leak rate (Phelps et al., 2015).
Comparing the scale of a leakage event against other marine pressures may at least contribute to a more informed comprehension, allowing a qualitative assessment of potential impact in the context of other marine features as well as comparison to socially relevant analogues. Fig. 4 ranks estimates of the footprint of selected marine features, alongside a high end CO 2 release, approximating to a total failure of the scale of the injection rate and a "small" release approximating to a leaky well type event. The former compares with the size of the (current) largest offshore windfarm, and is two orders of magnitude less than the area in the North Sea considered eutrophic or the area in the Central/Southern UK North Sea area impacted by bottom trawl fishing. The small scale leak impacts an area an order of magnitude less than a regulation football/soccer pitch. Whilst this doesn't provide a rigorous comparison of impact in any sense, it can provide some context to inform opinion, especially in a societal context.

Conclusions
The synthesis of model simulations presented here shows a consistent generalised relationship between leak rate, detectability and impact potential of hypothetical releases from CO 2 storage, which can be described by a power law. Different model systems, each with different complexities and process omissions deliver reasonably consistent results, so long as the resolution and domain of the model is appropriate for the scale of the release event. Given the lack of real-world release events the accuracy of model simulations is hard to assess, however many of the models have been extensively evaluated in terms of their general hydrodynamic skill. Local observations of hydrodynamic parameters will be key to improve site-specific estimates. There is also space for a more mechanistic analysis which elucidates the key drivers controlling footprints of detectible or impactful change, be they horizontal mixing and export or vertical mixing and air-sea exchange. Although it is important to recognise that any particular event will be uniquely configured by geological and hydrodynamic factors, comparisons between the impact of an unlikely failure of CO 2 storage and other marine environmental events can be drawn. This work also goes some way to quantifying the potential detectability length scale of a release, based on chemical anomalies. For example a leak of the order of 1 T per day should be detectable at, at least, 60 m distance with an impact restricted to an approximate 15 m radius of the release point. Very small releases are likely to require bottom mounted (lander) monitoring to ensure detection and careful planning to optimise monitoring strategies will be important in controlling cost whilst retaining assurance value. This study demonstrates the value of an in-silico approach to planning for offshore CCS, informing both risk assessments and monitoring strategies. Specific models of individual storage sites will enable bespoke planning of monitoring strategies as well as impact assessments within the specific hydrodynamic and biogeochemical context. Especially if based on pre-existing model systems, such studies are likely to be cost effective. Finally small leaks do not have the capacity to cause a significant environmental impact, especially when put in the context of other anthropogenic pressures on the marine system and set against the potential for CCS to ameliorate CO 2 driven climate change.

Declaration of Competing Interest
None. (continued on next page) J. Blackford, et al. International Journal of Greenhouse Gas Control 95 (2020) 102949  J. Blackford, et al. International Journal of Greenhouse Gas Control 95 (2020)