Reviews and syntheses: guiding the evolution of the observing system for the carbon cycle through quantitative network design

Various observational data streams have been shown to provide valuable constraints on the state and evolution of the global carbon cycle. These observations have the potential to reduce uncertainties in past, current, and predicted natural and anthropogenic surface fluxes. In particular such observations provide independent information for verification of actions as requested by the Paris Agreement. It is, however, difficult to decide which variables to sample, and how, where, and when to sample them, in order to achieve an optimal use of the observational capabilities. Quantitative network design (QND) assesses the impact of a given set of existing or hypothetical observations in a modelling framework. QND has been used to optimise in situ networks and assess the benefit to be expected from planned space missions. This paper describes recent progress and highlights aspects that are not yet sufficiently addressed. It demonstrates the advantage of an integrated QND system that can simultaneously evaluate a multitude of observational data streams and assess their complementarity and redundancy.


Introduction
There is an increasing number of observational data streams that can constrain the global carbon cycle (Scholze et al., 2017). A theoretical framework for integrating such observations into models of the carbon cycle is available (Rayner et al., 2016). Implementations of this framework, carbon cycle data assimilation systems (CCDASs; Rayner et al., 2005), are in operation (see e.g. Kaminski et al., 2013) and attempt to derive a consistent picture of the global carbon cycle.
In this context, an obvious challenge is the selection of observational sampling strategies that allow us to extract a maximum of information on a selected aspect of the carbon cycle. A typical question is as follows: funding is available for n additional flux towers and m additional continuous atmospheric sampling sites. Where should they be placed in order to maximise complementarity with the existing observational capabilities? Another question concerns the layout of space missions to sample, for example, the column-integrated atmospheric carbon dioxide concentration (XCO2) or the fraction of absorbed photosynthetically active radiation (FAPAR) by the land surface. In both examples, the in situ sampling and the space mission, the optimal sampling strategy will typically depend on the objective, i.e. on the question to be answered. The verification of anthropogenic CO2 emissions at the scale of a megacity or country over some period in the past may require a sampling strategy that is very different from a sampling strategy devised to constrain the carbon cycle–climate feedback in 2100. The optimal sampling strategy will also depend on the "background" of other observations to which we add the new observations. And it will depend on the level of redundancy we wish to ensure in the observational information, in order to hedge against incidents such as instrumental failure or loss of a satellite.
The above two examples already illustrate the complexity of the task and the need for a systematic, quantitative approach; purely relying on ad hoc choices guided by intuition is too dangerous. This contribution describes a formalism, called quantitative network design (QND), that addresses the evaluation (or even optimisation) of sampling strategies in a modelling framework. QND evaluates a network, which is defined as a set of observations of specified variables at specified times and locations (or their integrals) that can be simulated by a modelling system. The approach uses formal uncertainty propagation of the observational information to selected target quantities that are also simulated by the modelling system. The definition of a set of target quantities formalises the purpose of the network, i.e. the questions the network is supposed to answer, and the uncertainty in the target quantity is the specific metric used to assess the performance of the network. In the above example the target quantities would be regional and temporal integrals of the net carbon flux or its fossil emissions or land-use change components. Typically, a network is compared with a simpler reference network. This reference network can be a network without any observations or a network with standard background observations. The reduction in uncertainty with respect to the reference network quantifies the added value or impact of the additional observations. Section 2 formalises these definitions and explains how QND differs from observing system simulation experiments (OSSEs) and observing system experiments (OSEs).
Published by Copernicus Publications on behalf of the European Geosciences Union.
T. Kaminski and P. J. Rayner: QND
Almost a decade ago Kaminski and Rayner (2008) summarised the state of QND in the context of the global carbon cycle and concluded that "there are hardly any CCDAS applications to network design". This has since changed, and Sect. 3 summarises the progress and shows a series of successful applications.
Modelling systems that simultaneously simulate the components of the carbon cycle as a coupled system are computationally heavy, and embedding them into a QND framework further amplifies the computational burden. Hence, it appears appealing to apply QND to component models for the separate evaluation of sub-networks that provide observations of the respective components. Section 4 illustrates the consequences of such a simplified approach, first in a highly simplified example and then in a more complex one. Finally, Sect. 5 recommends aspects of QND that need to be addressed by future work.

Methodology
The presentation of the methodology follows Kaminski and Rayner (2008) and Kaminski et al. (2012b) using the notation introduced to this special issue by Rayner et al. (2016). The underlying algebra is provided by Tarantola (2005) and Rayner et al. (2016). As mentioned, the QND formalism performs a formal uncertainty propagation from the observations to a target quantity of interest through a dedicated modelling chain. Hence, it is worth recalling the four influence factors which produce uncertainty in a model simulation:
1. uncertainty caused by the formulation of individual process representations and their numerical implementation (structural uncertainty);
2. uncertain constants (process parameters) in the formulation of these processes (parametric uncertainty);
3. uncertainty in external forcing/boundary values (such as solar insolation or temperature) driving the relevant processes;
4. uncertainty about the state of the system at the beginning of the simulation (initial state).
The first factor reflects the implementation of the model (code) while the others can be understood as input quantities controlling the behaviour of a simulation using the given model implementation. The QND procedure formalises these input quantities through the definition of a control vector, x.
The choice of the control vector is a subjective element in the QND procedure. A good choice covers all input factors with high uncertainty and high impact on simulated observations y_mod or target quantities f (Kaminski et al., 2012b; Rayner et al., 2016).
The target quantity may be any quantity that can be extracted from a simulation with the underlying model, i.e. any potential model output, for example terrestrial net primary production (NPP) integrated over an area and time interval, but also any component of the control vector (for example a process parameter such as Q10 expressing the temperature dependency of the decomposition of organic material). In the general case, where the target quantity is not part of the control vector, the QND procedure operates in two steps (Fig. 1). The first step (inversion step) uses the observational information to reduce the uncertainty in the control vector, i.e. from a prior to a posterior state of information, and the second step (prognostic step) propagates the posterior uncertainty forward to the target quantity.
In this procedure we take uncertainty into account by representing all variables, i.e. the prior and posterior control vectors as well as the observations, their equivalents simulated by the model, and the simulated target quantity, by probability density functions (PDFs). We typically assume a Gaussian form for the prior control vector and the observations, if necessary after a suitable transformation. For example, instead of the above Q10 we could use the transformed variable ln(Q10 − 1) in our Gaussian control vector, which changes the PDF of Q10 such that values below 1 have zero probability. The Gaussian PDFs' covariance matrices express the uncertainty in the respective quantities, i.e. U(x_0) and U(y_obs) for the prior control vector and the observations.
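The effect of such a transformation can be illustrated with a minimal sketch. It assumes a Gaussian PDF for the transformed variable z = ln(Q10 − 1); the mean and standard deviation used below are hypothetical values chosen only for illustration and are not taken from any of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Gaussian PDF for the transformed variable z = ln(Q10 - 1).
z_mean = np.log(1.5 - 1.0)   # corresponds to a median Q10 of 1.5 (assumed)
z_sd = 0.5                   # assumed standard deviation in z
z = rng.normal(z_mean, z_sd, size=10_000)

# Back-transformation: exp(z) is positive, so every Q10 sample lies above 1.
q10 = 1.0 + np.exp(z)
print(q10.min(), np.median(q10))
```

The Gaussian assumption is thus kept in the control space, while the physical parameter follows a shifted log-normal PDF.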
For the first QND step we use the model M as a mapping from control variables onto equivalents of the observations. In our notation the observation operators that map the model state onto the individual data streams (see Kaminski and Mathieu, 2017) are absorbed in M. Let us first consider the case of a linear model, for which we denote the Jacobian matrix by M'. In this case the posterior control vector is described by a Gaussian PDF with covariance U(x), i.e. the uncertainty is given by

$$U(x)^{-1} = {M'}^{T}\, U(y)^{-1}\, M' + U(x_0)^{-1}, \tag{1}$$

where the data uncertainty U(y) combines U(y_obs) with the uncertainty U(y_mod) in the simulated equivalents of the observations M(x):

$$U(y) = U(y_{obs}) + U(y_{mod}). \tag{2}$$

The first term in Eq. (1) expresses the observational constraint and the second term the prior information content. In the non-linear case we use Eq. (1) as an approximation of U(x).
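The inversion step can be sketched in a few lines of NumPy. This is a minimal illustration under the linear Gaussian assumptions stated above, not the implementation of any particular CCDAS; the function name, the variable names, and the toy numbers are assumptions made for this example.

```python
import numpy as np

def posterior_covariance(M_jac, U_x0, U_yobs, U_ymod):
    """Posterior control covariance U(x) for a linear(ised) model, Eqs. (1)-(2)."""
    U_y = U_yobs + U_ymod                               # Eq. (2): data uncertainty
    obs_term = M_jac.T @ np.linalg.solve(U_y, M_jac)    # observational constraint
    prior_term = np.linalg.inv(U_x0)                    # prior information content
    return np.linalg.inv(obs_term + prior_term)         # Eq. (1)

# Hypothetical toy problem: 2 control variables, 3 observations.
M_jac = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 2.0]])
U_x0 = np.diag([4.0, 9.0])      # prior control covariance
U_yobs = 0.25 * np.eye(3)       # observational uncertainty
U_ymod = 0.25 * np.eye(3)       # uncertainty in the simulated equivalents

U_x = posterior_covariance(M_jac, U_x0, U_yobs, U_ymod)
print(np.sqrt(np.diag(U_x)))    # posterior standard deviations
```

Note that no observed values enter the calculation; only the Jacobian and the uncertainties are required.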
In the second step, the Jacobian matrix N' of the model (now used as a mapping from the control vector onto target quantities and denoted by N) is employed to propagate the posterior uncertainty in the control vector U(x) forward to the uncertainty in a target quantity σ(f):

$$\sigma(f)^2 = N'\, U(x)\, {N'}^{T} + \sigma(f_{mod})^2. \tag{3}$$

If the model was perfect, σ(f_mod) would be zero. In contrast, if the control variables were perfectly known, the first term on the right-hand side would be zero. The terms U(y_mod) in Eq. (2) and σ(f_mod) in Eq. (3) capture the structural uncertainty as well as the uncertainty in those process parameters, boundary and initial values that are not included in the control vector.
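The prognostic step of Eq. (3) amounts to a single quadratic form. The following sketch assumes a scalar target quantity whose Jacobian is a row vector; the function name and the numbers are hypothetical.

```python
import numpy as np

def target_uncertainty(N_jac, U_x, sigma_f_mod=0.0):
    """Standard deviation of a scalar target quantity, Eq. (3)."""
    N_jac = np.atleast_2d(N_jac)                       # 1 x n row vector
    variance = (N_jac @ U_x @ N_jac.T).item() + sigma_f_mod**2
    return np.sqrt(variance)

# Hypothetical posterior control covariance and target Jacobian.
U_x = np.array([[0.5, -0.1],
                [-0.1, 0.2]])
N_jac = np.array([1.0, 1.0])    # target = sum of the two control variables

sigma_f = target_uncertainty(N_jac, U_x, sigma_f_mod=0.3)
print(sigma_f)
```

In this toy case the negative posterior correlation between the two control variables lowers the uncertainty of their sum below what the two variances alone would suggest.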
To conduct a correct QND assessment, the requirement of the model is not that it simulates the target quantities and observations under investigation correctly, but that it provides a realistic sensitivity of the target quantities and observations under investigation with respect to a change in the control vector. If these sensitivities, i.e. the Jacobians, are realistic, but the simulation of target quantities and observations is incorrect, we can always make a correct QND assessment with appropriately large model uncertainty. The result of the assessment may then be that a particular data stream is not useful in constraining a particular target quantity given current modelling capabilities. In this situation we could operate the QND system with reduced model uncertainty to explore which accuracy of the model is required for a data stream to be a useful constraint on a given target quantity. As an example of incorrect simulation but correct sensitivity we can think of a regional transport model that simulates the small-scale variability very well but cannot match the absolute concentration because it runs with an incorrect large-scale background. In particular, when it comes to new data streams and target quantities, the accuracy of both the simulation and the sensitivities is hard to assess. In the case of a model that misses relevant processes we may expect errors in both the simulation and the sensitivities, and consequently also in the QND assessment.
Our performance metric is the (relative) reduction in posterior target uncertainty σ(f), with respect to a reference. To compare against the case without any observations we use, as the reference, the prior target uncertainty

$$\sigma(f_0)^2 = N'\, U(x_0)\, {N'}^{T} + \sigma(f_{mod})^2. \tag{4}$$

The uncertainty reduction with respect to the prior,

$$\frac{\sigma(f_0) - \sigma(f)}{\sigma(f_0)} = 1 - \frac{\sigma(f)}{\sigma(f_0)}, \tag{5}$$

quantifies the impact of the entire network. If we seek an extension of a background network by additional observations, we may want to use the posterior uncertainty for the background network as reference. The uncertainty reduction against this reference then quantifies the impact or added value of the additional observations. We note that (through Eqs. 1 and 3) the posterior target uncertainty solely depends on the prior and data uncertainties, the contribution of the model error to the uncertainty in the simulated target quantity, σ(f_mod), as well as the linearised model responses of the simulated observation equivalent and of the target quantities. The QND formalism does not require real observations and can thus be employed to evaluate hypothetical candidate networks. Candidate networks are defined by a set of observations characterised by observational data type, location, time, and data uncertainty. Here, we define a network as the complete set of observations, y, used to constrain the model. The term network is not meant to imply that the observations are of the same type or that their sampling is coordinated. For example, a network can combine in situ and satellite observations.
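The two choices of reference can be contrasted in a small sketch. It assumes a linear model, uncorrelated unit data uncertainties, and a relative reduction defined as 1 − σ(f)/σ_ref; the two-variable setup and all names are hypothetical.

```python
import numpy as np

def posterior_cov(M_jac, U_x0, U_y):
    """Eq. (1) for a linear model with Jacobian M_jac."""
    return np.linalg.inv(M_jac.T @ np.linalg.solve(U_y, M_jac) + np.linalg.inv(U_x0))

def sigma_target(N_jac, U_x, sigma_f_mod=0.0):
    """Eq. (3): target uncertainty for a given control covariance."""
    return float(np.sqrt(N_jac @ U_x @ N_jac + sigma_f_mod**2))

def uncertainty_reduction(sigma, sigma_ref):
    """Relative reduction of sigma with respect to a reference uncertainty."""
    return 1.0 - sigma / sigma_ref

U_x0 = np.eye(2)                                  # prior control covariance
N_jac = np.array([1.0, 1.0])                      # target Jacobian
M_background = np.array([[1.0, 0.0]])             # background network: 1 observation
M_extended = np.array([[1.0, 0.0], [0.0, 1.0]])   # background plus 1 new observation

sigma_prior = sigma_target(N_jac, U_x0)
sigma_background = sigma_target(N_jac, posterior_cov(M_background, U_x0, np.eye(1)))
sigma_extended = sigma_target(N_jac, posterior_cov(M_extended, U_x0, np.eye(2)))

impact_network = uncertainty_reduction(sigma_extended, sigma_prior)       # vs. prior
added_value = uncertainty_reduction(sigma_extended, sigma_background)     # vs. background
print(impact_network, added_value)
```

The same extended network therefore receives two different scores, depending on whether the question is the impact of the entire network or the added value of the new observation.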
In practice, for pre-defined target quantities and observations, model responses can be pre-computed and stored. A network composed of these pre-defined observations can then be evaluated in terms of the pre-defined target quantities without any further model runs. Only matrix algebra is required to combine the pre-computed sensitivities with the data uncertainties. This allows the setup of QND systems that interactively respond to user specifications of networks.
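A sketch of such an interactive evaluation is given below. It assumes that one Jacobian row per candidate observation and one row for a scalar target quantity have been stored beforehand, and that data uncertainties are uncorrelated; the arrays and the function `evaluate_network` are hypothetical stand-ins, not the interface of an existing QND tool.

```python
import numpy as np

# Assumed pre-computed sensitivities (hypothetical numbers): one Jacobian row
# per candidate observation, and one row for the target quantity.
M_all = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.5, 0.5, 1.0],
                  [0.0, 0.0, 2.0]])
N_target = np.array([1.0, 1.0, 1.0])
U_x0 = np.diag([1.0, 4.0, 9.0])

def evaluate_network(selected, sigma_y, sigma_f_mod=0.0):
    """Posterior target uncertainty for a user-specified network.

    selected: indices of the candidate observations that form the network
    sigma_y:  data uncertainty (standard deviation) of each selected observation
    """
    M_sel = M_all[np.asarray(selected, dtype=int)]
    w = 1.0 / np.asarray(sigma_y, dtype=float) ** 2      # uncorrelated data errors
    U_x_inv = M_sel.T @ (w[:, None] * M_sel) + np.linalg.inv(U_x0)   # Eq. (1)
    var_f = N_target @ np.linalg.solve(U_x_inv, N_target) + sigma_f_mod**2  # Eq. (3)
    return float(np.sqrt(var_f))

print(evaluate_network([], []))                      # no observations: prior uncertainty
print(evaluate_network([0, 1], [1.0, 1.0]))
print(evaluate_network([0, 1, 2, 3], [1.0, 1.0, 1.0, 1.0]))
```

Each call involves only small matrix operations, which is what makes an interactive response to user-specified networks feasible.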
www.biogeosciences.net/14/4755/2017/ Biogeosciences, 14, 4755-4766, 2017
For the interpretation of QND results it is useful to develop a qualitative understanding of the sensitivity of the result to the inputs of the QND system. For example, the impact of an observation on the target quantity, i.e. the uncertainty reduction compared to the prior, increases when the Jacobian M' increases: through Eq. (1) an increase in M' will translate into an increase in U(x)^−1, i.e. a reduced posterior uncertainty. In contrast, if M' was 0, the observation would have no effect, i.e. it would be irrelevant. In the same way, the impact of an observation increases when the data uncertainty U(y) is reduced. By contrast, an observation with very high data uncertainty would have only a small effect. Possible reasons for high data uncertainty are high uncertainty in the observation or little confidence in our capability to simulate the observation, as expressed by Eq. (2). An increase in the prior uncertainty U(x_0) reduces the second term in Eq. (1). This, in turn, increases the prior and posterior control uncertainties and, thus, also the prior and posterior target uncertainties. But for any relevant observation the increase of the posterior uncertainty is lower than that of the prior uncertainty, because in Eq. (1) the increase in the prior uncertainty increases the weight of the constraint by the data, which is expressed by the first term. As a consequence the increase in the prior uncertainty yields a higher uncertainty reduction. We note that from Eq. (3) a given target quantity is linked by N' to a one-dimensional sub-space of the control space. The observation must achieve an uncertainty reduction in that sub-space to yield an uncertainty reduction in the target quantity. The contribution of the model error, σ(f_mod), has the effect of decreasing the uncertainty reduction; σ(f_mod) will always remain the lower bound on the posterior target uncertainty, no matter how relevant the observations are. When comparing the performance of two networks, we can accentuate their difference in uncertainty reduction by neglecting σ(f_mod).
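These qualitative statements can be checked in the simplest possible setting. The following worked case assumes a single control variable, a single observation, and a single target quantity, with scalar Jacobians m and n, prior uncertainty σ_0, and data uncertainty σ_y; these scalar symbols are introduced here for illustration only.

```latex
% Scalar case of Eqs. (1) and (3): one control variable, one observation,
% one target quantity (m, n, \sigma_0, \sigma_y are illustrative symbols).
\begin{align}
U(x) &= \left( \frac{m^2}{\sigma_y^2} + \frac{1}{\sigma_0^2} \right)^{-1}
      = \frac{\sigma_0^2}{1 + m^2 \sigma_0^2 / \sigma_y^2}, \\
\sigma(f)^2 &= n^2\, U(x) + \sigma(f_{mod})^2 \;\ge\; \sigma(f_{mod})^2, \\
1 - \frac{\sigma(f)}{\sigma(f_0)} &= 1 - \frac{1}{\sqrt{1 + m^2 \sigma_0^2 / \sigma_y^2}}
      \quad \text{for } \sigma(f_{mod}) = 0 .
\end{align}
```

In this scalar case the uncertainty reduction depends only on the ratio m σ_0 / σ_y: it grows with the Jacobian and with the prior uncertainty, shrinks with the data uncertainty, and vanishes for m = 0. The second line shows that σ(f_mod) bounds the posterior target uncertainty from below.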
We will see in the following sections that for many QND applications, it is sufficient to evaluate the performance of a small number of candidate networks and compare their performance for a range of reasonable target quantities. For applications with many candidate networks it is often impractical to test every candidate network, and a formal minimisation algorithm is used to identify the network with the lowest posterior uncertainty in the target quantity. In the case of multiple target quantities, we can minimise a suitable scalar function of their posterior uncertainties, e.g. their sum of squares. An example of the mathematically rigorous analysis of the complexity of a network optimisation problem is provided by Krause et al. (2008). Often the posterior uncertainty calculation for a single candidate network is so computationally demanding that applications are only tractable with more pragmatic and efficient minimisation approaches that may yield sub-optimal results (see Sect. 3).
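One pragmatic approach of this kind is an incremental search that adds one observation at a time. The sketch below is a generic greedy selection under the linear Gaussian assumptions of Eqs. (1) and (3), with uncorrelated data uncertainties and σ(f_mod) neglected; it is not the algorithm of any specific study, and the candidate Jacobians are hypothetical.

```python
import numpy as np

def greedy_network(M_all, sigma_y, U_x0, N_target, n_sites):
    """Add one candidate observation at a time, each time picking the one that
    yields the lowest posterior target variance (requires n_sites >= 1)."""
    precision = np.linalg.inv(U_x0)              # prior term of Eq. (1)
    chosen = []
    for _ in range(n_sites):
        best_idx, best_var = None, np.inf
        for i in range(len(M_all)):
            if i in chosen:
                continue
            row = M_all[i]
            # Eq. (1) with candidate i added to the observations chosen so far.
            trial = precision + np.outer(row, row) / sigma_y[i] ** 2
            var_f = N_target @ np.linalg.solve(trial, N_target)   # Eq. (3)
            if var_f < best_var:
                best_idx, best_var = i, var_f
        chosen.append(best_idx)
        row = M_all[best_idx]
        precision = precision + np.outer(row, row) / sigma_y[best_idx] ** 2
    return chosen, float(np.sqrt(best_var))

# Hypothetical candidate observations (one Jacobian row each).
M_all = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.5, 0.5, 1.0],
                  [0.0, 0.0, 2.0]])
sigma_y = np.ones(4)
U_x0 = np.diag([1.0, 4.0, 9.0])
N_target = np.array([1.0, 1.0, 1.0])

chosen, sigma_f = greedy_network(M_all, sigma_y, U_x0, N_target, n_sites=2)
print(chosen, sigma_f)
```

A greedy search evaluates far fewer networks than an exhaustive one, at the price that the selected network is not guaranteed to be the global optimum.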
The QND approach relies on the capability to propagate data uncertainty to target uncertainty. This requirement is met by CCDASs and transport inversion systems that use an explicit representation of M' (or alternatively the entire right-hand side of Eq. 1) and of N'. The combination of high-dimensional control and data spaces yields a large M', which may render its computation and the solution of Eq. (1) difficult or even impossible. As a consequence, the control space is often reduced from the full space-time grid of the model to, e.g., scalar coefficients of large flux patterns (big region approach). To reduce the dimension of the data space, Eq. (1) can also be solved in a sequential procedure, where each step uses only sub-sets of the observations and the posterior control uncertainty from the previous step as prior (see e.g. Kaminski and Rayner, 2008). In contrast to a fixed lag (ensemble) Kalman filter approach, it is then essential not to change the control space from one step to the next (Feng et al., 2009).
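The sequential procedure can be illustrated with a short sketch. It assumes a linear model, a control space that is kept fixed between steps, and data uncertainties that are uncorrelated between the two sub-sets of observations; the random Jacobian and all names are hypothetical.

```python
import numpy as np

def update(U_prior, M_jac, U_y):
    """One application of Eq. (1) with U_prior in the role of the prior."""
    return np.linalg.inv(M_jac.T @ np.linalg.solve(U_y, M_jac) + np.linalg.inv(U_prior))

rng = np.random.default_rng(0)
M_jac = rng.normal(size=(6, 3))     # hypothetical Jacobian: 6 observations, 3 controls
U_x0 = np.diag([1.0, 2.0, 3.0])
U_y = 0.5 * np.eye(6)               # no correlation between the two sub-sets

# All observations at once.
U_batch = update(U_x0, M_jac, U_y)

# Two steps: the posterior of step 1 serves as the prior of step 2.
U_step1 = update(U_x0, M_jac[:3], U_y[:3, :3])
U_sequential = update(U_step1, M_jac[3:], U_y[3:, 3:])

print(np.allclose(U_batch, U_sequential))
```

Under these assumptions both routes give the same posterior covariance, while each sequential step only handles a sub-set of the rows of the Jacobian.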
There are approaches other than QND that employ data assimilation/inverse modelling systems for the design of observational networks but do not rely on the availability of posterior uncertainty. As such techniques have not been applied in a CCDAS context, we only give brief summaries of the approaches that are most popular in the numerical weather prediction (Masutani et al., 2010) and chemical data assimilation (Timmermans et al., 2015) communities. OSSEs are conducted as follows: first, a "true" control vector is selected. Second, a model (with suitable observation operators) is used to generate pseudo-observations (in a so-called nature run). Third, prior (see e.g. Chevallier et al., 2007) and data are perturbed according to their respective uncertainties. Fourth, the inverse modelling system is used to retrieve a control vector. As an indication of the combined performance of the network and the inverse modelling system, one can use the difference between true and retrieved control vectors or between simulations of some target quantity from the true and retrieved control vectors. For linear Gaussian problems the difference between retrieved and true control vectors is a realisation of the posterior covariance. The above procedure is termed identical twin experiment if the nature run and the inverse modelling system employ the same model, which means the experiment is conducted under the assumption of a model that perfectly represents the real world. OSEs use a network of real observations. They assess the added value of a data stream by excluding it from the network. Unlike QND, which requires only the data uncertainty, OSSEs and OSEs require, in addition, pseudo (OSSEs) or real (OSEs) observations. Further, they typically use metrics other than uncertainty reduction. OSSEs and identical twin experiments can be employed to assess the impact of biases in the observations, the prior, or the model (see e.g. Engelen et al., 2002). Further approaches to network design rely on the analysis of the patterns of variability in real (see e.g. Mahecha et al., 2017) or pseudo-observations (see e.g. Shiga et al., 2013).
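The four OSSE steps can be sketched for a linear identical twin experiment. The example below assumes a linear model, Gaussian perturbations, and a least-squares retrieval; the "true" control vector, the Jacobian, and the uncertainties are hypothetical. Repeating the experiment many times shows that the retrieval errors scatter according to the posterior covariance of Eq. (1).

```python
import numpy as np

rng = np.random.default_rng(42)

M_jac = np.array([[1.0, 0.5],
                  [0.0, 1.0],
                  [1.0, -1.0]])      # hypothetical linear model (Jacobian)
U_x0 = np.diag([1.0, 4.0])           # prior uncertainty
U_y = 0.25 * np.eye(3)               # data uncertainty
x_true = np.array([2.0, -1.0])       # step 1: "true" control vector
y_true = M_jac @ x_true              # step 2: nature run with the same model

# Posterior covariance from Eq. (1) and the corresponding gain matrix.
U_x = np.linalg.inv(M_jac.T @ np.linalg.inv(U_y) @ M_jac + np.linalg.inv(U_x0))
gain = U_x @ M_jac.T @ np.linalg.inv(U_y)

n_experiments = 20_000
# Step 3: perturb prior and data according to their uncertainties.
x_prior = x_true + rng.multivariate_normal(np.zeros(2), U_x0, size=n_experiments)
y_obs = y_true + rng.multivariate_normal(np.zeros(3), U_y, size=n_experiments)
# Step 4: retrieve the control vector (one row per experiment).
x_post = x_prior + (y_obs - x_prior @ M_jac.T) @ gain.T

errors = x_post - x_true
sample_cov = np.cov(errors, rowvar=False)
print(sample_cov)
print(U_x)
```

A single OSSE corresponds to one row of `errors`; the QND calculation delivers U(x) directly, without generating or perturbing any pseudo-observations.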

Evolution of the field
The QND approach is based on work by Hardt and Scherbaum (1994) who optimised the station locations for a seismographic network. QND was introduced to biogeosciences by Rayner et al. (1996), who optimised the spatial distribution of the atmospheric network for sampling CO2 and the δ13C isotope in terms of their capability to constrain, in an atmospheric transport inversion, the global ocean uptake. Surprisingly, the optimal location for an additional site was over the Amazon rainforest, the region with the highest prior flux uncertainty. In their QND system a site at this location would minimise the uncertainty in the global terrestrial flux, which through the atmospheric budget would indirectly provide the best possible constraint on the total ocean flux. This mechanism no longer worked when they changed their target quantity from the globally integrated ocean flux to a set of regionally integrated ocean fluxes.
This groundbreaking study established the QND approach in the carbon cycle community and already illustrated the need for a careful formulation of the target quantity. It paved the way for three lines of QND applications: the first continues the optimisation of the atmospheric in situ sampling network for use in atmospheric transport inversions. The second optimises the design of missions sensing XCO2 from space for use in transport inversions. The third line applies the QND approach to terrestrial biosphere models. As our focus is on QND applications in a CCDAS, we only briefly point to the most relevant QND applications with atmospheric transport inversions; more detail on this topic can be found in Kaminski and Rayner (2008).
Pure atmospheric applications of QND include the studies by Patra and Maksyutov (2002), Patra et al. (2003a), Law et al. (2004), and Rayner (2004), which explored the dependency of the optimised networks on several aspects of the problem formulation as well as the optimisation approach. While Rayner et al. (1996) used the simulated annealing approach to determine optimal station locations, Patra and Maksyutov (2002) demonstrated that their incremental optimisation approach of iteratively finding one optimal station location at a time combined comparable performance with higher computational efficiency. Rayner (2004) introduced the use of genetic algorithms to tackle the optimisation problem. The study addressed the specification of the model uncertainty contribution U(y_mod) to the data uncertainty U(y) (which he derived from the spread of a multi-model ensemble) and demonstrated its impact on the optimal network. The study by Law et al. (2004) explored several aspects of the QND problem, including higher temporal resolution of the data space and higher space-time resolution of the flux space. They employed a global model but their target region was Australia. The study optimised locations for high-frequency sampling (4 hourly) in addition to a global background network that mainly consisted of flask sampling sites. In order to avoid so-called aggregation errors (Kaminski et al., 2001) induced by prescribed flux patterns over coarse regions (typically of the size of a continent) they divided their target region into 12 sub-regions. For the same reason, rather than solving for a monthly flux field per region they split the flux into a constant and a daytime component. To assess the magnitude of their aggregation error, they performed, in parallel to the QND assessments, identical twin experiments. The study also assessed the impact of data uncertainty or prior uncertainty on the optimal networks. In contrast to the above studies, Lauvaux et al. (2012) used real atmospheric observations and a regional model: high-frequency samples were provided by a network of up to eight sites, and the study tested the effect of removing sites from the network. Due to the limited domain, fluxes on the boundary had to be included in the control vector. Recent examples for QND studies addressing rather practical design questions with a regional model are provided by Ziehn et al. (2014) for 2014). Most of the QND assessments with transport inversion systems addressed, however, only a single data stream.
We note that techniques other than QND were also applied for the assessment of space missions. Identical twin experiments performed with variational transport inversion systems to assess the performance of OCO include studies by Chevallier et al. (2007), Chevallier (2007), and Baker et al. (2010). Chevallier et al. (2007) used an ensemble generated by five inversions to approximate the uncertainty reduction in control space. Baker et al. (2010) studied the effect of transport error, incorrect uncertainty specifications, and systematic errors in the observations. Chevallier (2007) demonstrated the effect of correlated data uncertainty.
Before addressing QND applications with CCDASs, we recall the impact of prior information. Within a given QND system, it is manifested in the sensitivity of the posterior target uncertainty with respect to the prior control uncertainty. We need to keep in mind, however, that prior information has already entered the construction of the QND system. This is through the selection of the suite of models and observation operators (including their implementation) used in the QND system, and then through the definition of the control vector. This includes the above-mentioned selection of the uncertain process parameters and initial and boundary conditions as well as their spatial differentiation. For example, we can specify a process parameter globally or as specific to a plant functional type (PFT) or a region. In a transport inversion, the control vector may consist of fluxes on the space-time grid of the model, or multipliers of prescribed patterns. In a CCDAS the model achieves a coupling between the fluxes in space and time, which reduces the dimension of the control space.
An initial QND application with a CCDAS was performed in the system based on the simple diagnostic biosphere model (SDBM; Knorr and Heimann, 1995). The study (Kaminski et al., 2002) assessed the effect of adding a hypothetical direct flux observation over the model's broadleaf evergreen biome to the atmospheric flask sampling network as reference network. The study did not calculate the corresponding uncertainty reduction in flux space, i.e. the target quantities were the model's control variables, a vector of two process parameters per biome. Compared to the reference network, the additional observation achieved substantial uncertainty reduction for the biome's temperature dependency of the heterotrophic respiration. This was the first quantification of the complementary nature of atmospheric and ecosystem (i.e. direct flux) measurements as constraints in a CCDAS.
A more systematic assessment of the complementarity of atmospheric and ecosystem measurements was performed by Kaminski et al. (2012b). The study employed the prognostic Biosphere Energy-Transfer Hydrology (BETHY; Knorr, 2000) model, which composes the global vegetation out of 13 PFTs. The control vector consisted of the initial atmospheric condition and the model's process parameters, some of which were differentiated by PFT and some of which were globally uniform. Target quantities were 20-year averages of net ecosystem production (NEP) and NPP integrated over a number of regions and over each of the model's 2° by 2° grid cells. They used pre-computed Jacobians for direct flux measurements over any land point on the globe, for 15 selected sites for continuous sampling of atmospheric CO2, for 41 selected sites for flask sampling of atmospheric CO2, and for all target quantities. Thanks to these pre-computed Jacobians they could construct an interactive tool for assessments of user-specified networks. The study showed that a network with one flux site over each of the model's PFTs populating Europe is sufficient to infer the terrestrial carbon budget of that continent. With only one of these PFTs unsampled (incomplete flux network), the posterior flux uncertainty drastically increases. In the model study we can, of course, avoid such incomplete networks, as we know the number and distribution of the PFTs. Since this is not the case for the real world, such incompleteness is likely. The study also showed that the addition of an atmospheric network (in this case the flask network) provides a protection against the risk of missing a PFT or failure of a flux site. Through a set of experiments with an increased number of PFTs (up to 325) the robustness of the above findings against the dimension of the PFT space was shown. The study demonstrated the above-mentioned difference between QND with atmospheric transport inversions and a CCDAS: through the model equations the constraint of an observation taken at a particular point in time and space can act as a constraint to fluxes at another point in time and space.
The complementarity of flux and atmospheric networks was confirmed by Koffi et al. (2013). They employed the same model (BETHY) with two different atmospheric transport models, with combinations of two flux networks, two flask sampling networks (with 62 and 77 sites, respectively), and one network of continuous atmospheric sampling (27 sites). Atmospheric sampling frequencies varied between monthly and 3-hourly. Their target quantities were the process parameters in the control vector. They found that their atmospheric networks perform well in constraining parameters that impact NEP but are not well suited to constrain parameters that impact gross primary production (GPP).
The study of Szolgayová et al. (2016) builds on the flux and flask network definitions of Kaminski et al. (2012b). It employed the CCDAS QND system to assess the uncertainty reduction in CO2 fluxes through the combined network. The study then used a real options model to quantify the economic value of this uncertainty reduction and contrasted it with the cost of the network. They found a positive net value of the network that, in sensitivity tests, proved robust for a range of assumptions entering the model.
The first QND assessment of a space mission with a CCDAS evaluated several design options for the above-mentioned A-SCOPE mission (Kaminski et al., 2010). These design options were the wave band and the observational uncertainty, and the target quantities were 20-year averages of NEP and NPP. Owing to the active instrument's high sampling frequency, the constraint from A-SCOPE observations outperformed the constraint from the flask samples despite higher data uncertainty. The atmospheric transport was represented by a pre-computed Jacobian mapping fluxes onto concentration changes. To reduce the dimension of that Jacobian and the associated computational burden, the sensitivity of XCO2 samples with respect to fluxes within the same latitude band and more than 2 months prior to the observations was assumed to be uniform. Switching to monthly-mean observations had little impact on the posterior uncertainty.
A further CCDAS study (Kaminski et al., 2012a) assessed the constraint provided by an optical mission. Target quantities were regional NEP and NPP as well as two hydrological quantities, namely the plant available soil moisture and the evapotranspiration. The optical mission was represented by a product of the FAPAR similar to that derived from the Medium Resolution Imaging Spectrometer (MERIS) by Gobron et al. (1997). Details on this data stream are provided by the contribution of Scholze et al. (2017). The observation operator for FAPAR is a newly developed phenology scheme (Knorr et al., 2010) in which simulated FAPAR depends smoothly on the process parameters. The inclusion of this observation operator added further uncertain process parameters and, thus, extended the control vector. Atmospheric flask samples were included as a further data stream. Again, the required observational and target Jacobians were pre-computed and exploited to set up an interactive QND system. The system can evaluate both data streams, flask samples of CO2 and FAPAR, individually and in combination. For the FAPAR data stream it allowed changes to aspects of the mission such as the accuracy of the product and the length of the mission. The study demonstrated a moderate added value of FAPAR in constraining carbon fluxes and a high added value in constraining hydrological quantities as well as the complementarity of FAPAR to atmospheric CO2.
Solar-induced fluorescence (SIF) is a further observational constraint from space and is also presented in the contribution of Scholze et al. (2017). Its assessment in a CCDAS requires a dedicated observation operator such as the SCOPE model by van der Tol et al. (2009). Koffi et al. (2015) coupled SCOPE with BETHY and provided a set of sensitivity tests. Norton et al. (2017) present a QND assessment for the SIF product retrieved from GOSAT. The target quantity is the GPP at grid scale. The study adds to the control vector, as an extra component, a scalar multiplier of the incoming solar radiation, an external forcing term of BETHY.
Rayner et al. (2010) focused on the anthropogenic component of the carbon cycle and constructed a Fossil Fuel Data Assimilation System (FFDAS) that assimilates statistics of national emissions, modelled population density, and remote sensing observations (nightlights) into a model of the fossil fuel emissions. Posterior uncertainty is approximated by a 25-member ensemble of inversions for perturbed prior and observations. The system was employed to quantify (through the uncertainty reduction in fossil fuel emissions) the impact of hypothetical measurements of the annual mean ¹⁴CO₂ concentration collected by a network of 194 atmospheric sites.
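The idea of approximating posterior uncertainty by an ensemble of inversions with perturbed prior and observations can be sketched for a toy linear problem. This is not the FFDAS implementation: the Jacobian, the uncertainties, and the function names are hypothetical, and the linear-Gaussian update is an assumption made so that the ensemble spread can be compared with the analytic posterior uncertainty.

```python
import numpy as np

def ensemble_posterior_std(J, prior_std, obs_std, n_members=25, seed=0):
    """Spread of an ensemble of linear inversions with perturbed prior and observations."""
    rng = np.random.default_rng(seed)
    prior_std = np.asarray(prior_std, dtype=float)
    obs_std = np.asarray(obs_std, dtype=float)
    B = np.diag(prior_std ** 2)
    R = np.diag(obs_std ** 2)
    K = B @ J.T @ np.linalg.inv(J @ B @ J.T + R)    # gain of the linear inversion
    members = np.empty((n_members, prior_std.size))
    for i in range(n_members):
        x_prior = rng.normal(0.0, prior_std)        # perturbed prior (truth taken as zero)
        y = rng.normal(0.0, obs_std)                # perturbed observations
        members[i] = x_prior + K @ (y - J @ x_prior)
    return members.std(axis=0, ddof=1)

def analytic_posterior_std(J, prior_std, obs_std):
    """Exact posterior uncertainty for the same linear-Gaussian problem."""
    B_inv = np.diag(1.0 / np.asarray(prior_std, dtype=float) ** 2)
    R_inv = np.diag(1.0 / np.asarray(obs_std, dtype=float) ** 2)
    return np.sqrt(np.diag(np.linalg.inv(J.T @ R_inv @ J + B_inv)))

# Hypothetical two-parameter, two-observation problem.
J = np.array([[1.0, 0.5],
              [0.0, 1.0]])
PRIOR_STD = np.array([1.0, 2.0])
OBS_STD = np.array([0.5, 0.5])

print("25 members:  ", ensemble_posterior_std(J, PRIOR_STD, OBS_STD))
print("analytic:    ", analytic_posterior_std(J, PRIOR_STD, OBS_STD))
```

With only 25 members the estimate carries noticeable sampling noise; in this toy setting it approaches the analytic value as the ensemble grows.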
We note that all of the above CCDAS-based QND studies explored a set of candidate networks or mission concepts. None of them applied a formal optimisation algorithm.

Separate and integrated QND
Modelling systems that simultaneously simulate the components of the carbon cycle as a coupled system are computationally heavy, and a QND framework amplifies the computational burden. For example, the QND system by Kaminski et al. (2012b) used a terrestrial biosphere model coupled to two atmospheric transport models to evaluate the combination of one terrestrial and two atmospheric networks with surface flux integrals as target quantities. To reduce the computational demands one may think of a strategy that treats the QND problem separately per model and network component and then integrates the results (in the following termed "separate QND"). In the example of Kaminski et al. (2012b) this would mean computing three posterior uncertainty estimates for the surface flux, one by evaluating the terrestrial network in a QND system around the terrestrial model, and each of the two others by using a QND system for each of the atmospheric networks with the corresponding transport model. As the three component networks are independent, one could argue that the respective posterior uncertainties are uncorrelated, and hence the square root of the sum of the squares of the three posterior flux uncertainties would yield the posterior uncertainty that could be achieved by the combination of the three component networks. In the following we contrast this separate QND approach with the integrated QND for the coupled model. We do this first in a highly simplified example and then in the system of Kaminski et al. (2012b).

Simplified model
Let us first consider a highly simplified model, in which our target quantity, the net flux $f$, directly depends on two parameters $p_1$ and $p_2$, each representing a component model:
$$f = p_1 + p_2. \tag{6}$$
For simplicity, assuming both parameters have the same uncorrelated prior uncertainty $\sigma(p_0)$, the prior uncertainty $\sigma(f_0)$ of the flux is
$$\sigma(f_0) = \sqrt{2}\,\sigma(p_0). \tag{7}$$
Now assume we have two component networks: one can only constrain $p_1$ and reduces its uncertainty by a factor $k$, i.e. to $k\,\sigma(p_0)$ with $0 \le k \le 1$, and the other network can only constrain $p_2$; for simplicity it reduces the uncertainty by the same factor $k$. If we construct a QND system around both component models that can evaluate both networks simultaneously, we have (red PDF in Fig. 2)
$$\sigma(f) = \sqrt{2}\,k\,\sigma(p_0) = k\,\sigma(f_0). \tag{8}$$
If we use only one of the two sub-networks, we reduce the uncertainty for only one of the parameters and have (black PDF in Fig. 2)
$$\sigma(f_1) = \sigma(f_2) = \sqrt{1 + k^2}\,\sigma(p_0). \tag{9}$$
To combine the flux estimates provided by the two sub-networks we could use their (evenly weighted for simplicity) average:
$$f = \tfrac{1}{2}\,(f_1 + f_2). \tag{10}$$
If we ignore for a moment that they are based on the same parameter prior, $f_1$ and $f_2$ are independent and we get the following for the uncertainty in $f$:
$$\sigma(f) = \tfrac{1}{2}\sqrt{\sigma(f_1)^2 + \sigma(f_2)^2}. \tag{11}$$
Applying Eq. (9) to both estimates we have the following:
$$\sigma(f) = \sqrt{\frac{1 + k^2}{2}}\,\sigma(p_0). \tag{12}$$
The double use of the prior produces correlated uncertainty and increases $\sigma(f)$. For a small $k$, i.e. efficient networks or large prior uncertainty, this effect is small. The lower limit (for $k$ approaching 0) in the separate QND case is
$$\sigma(f) = \frac{\sigma(p_0)}{\sqrt{2}} = \frac{\sigma(f_0)}{2}, \tag{13}$$
while in the integrated QND case (Eq. 8) it is zero. This means that the separate QND approach drastically underestimates the network performance.
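The algebra of the simplified model can be checked numerically with a short sketch. It assumes the model described above: a flux that is the sum of two parameters with equal, uncorrelated prior uncertainty, each network scaling the uncertainty of "its" parameter by k, and the separate estimates combined as an evenly weighted average of independent values. The function names are hypothetical.

```python
import math

def prior_flux_std(sigma_p0):
    """Prior uncertainty of f = p1 + p2 for two uncorrelated parameters."""
    return math.sqrt(2.0) * sigma_p0

def integrated_std(k, sigma_p0):
    """Both networks evaluated jointly: both parameter uncertainties scaled by k."""
    return math.sqrt(2.0) * k * sigma_p0

def single_network_std(k, sigma_p0):
    """One network only: one parameter constrained, the other left at its prior."""
    return math.sqrt(1.0 + k ** 2) * sigma_p0

def separate_std(k, sigma_p0):
    """Evenly weighted average of the two single-network estimates, treated as independent."""
    s = single_network_std(k, sigma_p0)
    return 0.5 * math.sqrt(s ** 2 + s ** 2)

sigma_p0 = 1.0
for k in (0.5, 0.3, 0.1, 0.0):
    print(f"k = {k:.1f}: integrated {integrated_std(k, sigma_p0):.3f}, "
          f"separate {separate_std(k, sigma_p0):.3f}")
```

As k approaches zero the integrated value vanishes while the separate value levels off at half the prior flux uncertainty.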

Complex example
The above example is very much simplified, and before generalising the finding we need to consider the consequences of the simplifications. The assumption of only two parameters is not a serious limitation: for the case of two larger sets of parameters, with each set only "seen" by one of the component networks, the example would work similarly. The assumption of full complementarity of the two sub-networks is more important. If there were parameters that neither system could observe, not even the integrated QND could bring the posterior flux uncertainty to zero. Typically, however, a given data stream tends to be good on one subset of the parameter space and weaker on another one. If there is at least some complementarity, the integrated model can take advantage of this complementarity, while in the separate QND approach the badly observed parts of the parameter space have the potential to spoil the performance.
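A case of partial complementarity can be sketched with a small matrix example. It assumes a linear, Gaussian setting in which the posterior parameter covariance is the inverse of JᵀR⁻¹J + B⁻¹, and it combines the separate estimates as an evenly weighted average of independent values. The two Jacobians, in which each network observes one parameter strongly and the other weakly, are hypothetical, as are all names and numbers.

```python
import numpy as np

def posterior_flux_std(J_obs, obs_std, prior_std, flux_weights):
    """Posterior uncertainty of a flux that is a linear combination of the parameters."""
    R_inv = np.diag(1.0 / np.asarray(obs_std, dtype=float) ** 2)
    B_inv = np.diag(1.0 / np.asarray(prior_std, dtype=float) ** 2)
    C_post = np.linalg.inv(J_obs.T @ R_inv @ J_obs + B_inv)  # needs a matrix inversion
    return float(np.sqrt(flux_weights @ C_post @ flux_weights))

PRIOR_STD = np.array([1.0, 1.0])
FLUX_WEIGHTS = np.array([1.0, 1.0])   # flux = p1 + p2
J_A = np.array([[1.0, 0.1]])          # network A: strong on p1, weak on p2
J_B = np.array([[0.1, 1.0]])          # network B: weak on p1, strong on p2
OBS_STD = 0.1

std_a = posterior_flux_std(J_A, [OBS_STD], PRIOR_STD, FLUX_WEIGHTS)
std_b = posterior_flux_std(J_B, [OBS_STD], PRIOR_STD, FLUX_WEIGHTS)

# Separate QND: average of the two single-network estimates, treated as independent.
separate = 0.5 * np.sqrt(std_a ** 2 + std_b ** 2)

# Integrated QND: both networks constrain the same control vector simultaneously.
integrated = posterior_flux_std(np.vstack([J_A, J_B]), [OBS_STD, OBS_STD],
                                PRIOR_STD, FLUX_WEIGHTS)

print(f"network A alone: {std_a:.3f}")
print(f"network B alone: {std_b:.3f}")
print(f"separate QND:    {separate:.3f}")
print(f"integrated QND:  {integrated:.3f}")
```

In this toy configuration each single network leaves the weakly observed parameter close to its prior, which dominates the separate estimate, whereas the joint evaluation constrains both directions.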
To adapt the above algebra to such a case is a bit cumbersome, because constraining two or more parameters simultaneously would involve matrix inversion. It is easier to run an example (with two sub-networks) in the system of Kaminski et al. (2012b): we define a simple flask observing network composed of the two sites (MLO and SPO) and a simple flux network consisting of one site in Europe (0° longitude and 50° N latitude) with the tool's default PFT fractions at that site. For both networks we use the tool's default uncertainty, i.e. 1 ppm for flask and 10 gC m⁻² day⁻¹ for flux observations. We first set the model error to zero, which yields very low posterior uncertainty but makes the contrast between the networks more drastic. The resulting posterior NEP uncertainties for Europe are 0.29 GtC yr⁻¹ for the flux network and 0.21 GtC yr⁻¹ for the flask network. Using Eq. (11) this yields a posterior uncertainty of the combined estimate of $\tfrac{1}{2}\sqrt{0.29^2 + 0.21^2} \approx 0.18$ GtC yr⁻¹. By contrast the integrated QND yields a posterior uncertainty for Europe of 0.06 GtC yr⁻¹, a factor of three lower.
The uncertainty component reflecting model error clearly depends on the quality of the model used. For example, a model that achieves a 20 % uncertainty in the NEP simulated over Europe would (based on the 20-year posterior NEP average of 0.39 GtC yr⁻¹ inferred by Scholze et al., 2007) have a $\sigma(f_\mathrm{mod})$ of 0.08 GtC yr⁻¹. Using this value in the evaluation of Eq. (3) would increase the posterior NEP uncertainties for Europe in the separate QND to 0.30 GtC yr⁻¹ for the flux network and to 0.22 GtC yr⁻¹ for the flask network, i.e. according to Eq. (11) a combined posterior estimate of 0.19 GtC yr⁻¹, while the integrated QND would yield a posterior uncertainty of 0.10 GtC yr⁻¹, i.e. a factor of 2 less.
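The numbers quoted for the complex example can be retraced with a few lines of arithmetic. The sketch rests on two assumptions: that Eq. (11) is half the root sum of squares of the two single-network uncertainties, and that Eq. (3) adds the model error to the posterior uncertainty in quadrature. Both assumptions reproduce the rounded values given in the text. The function names are hypothetical.

```python
import math

def combine_separate(std_1, std_2):
    """Evenly weighted average of two independent estimates (assumed form of Eq. 11)."""
    return 0.5 * math.sqrt(std_1 ** 2 + std_2 ** 2)

def add_model_error(std, model_std):
    """Add the model error in quadrature (assumed form of Eq. 3)."""
    return math.hypot(std, model_std)

# Posterior NEP uncertainties for Europe in GtC/yr, as quoted in the text.
FLUX_STD, FLASK_STD, INTEGRATED_STD = 0.29, 0.21, 0.06
MODEL_STD = 0.2 * 0.39   # 20 % of the 0.39 GtC/yr posterior NEP average

# Without model error.
print(f"separate, no model error:   {combine_separate(FLUX_STD, FLASK_STD):.2f}")
print(f"integrated, no model error: {INTEGRATED_STD:.2f}")

# With model error, using the rounded value of 0.08 GtC/yr quoted in the text.
model_std = round(MODEL_STD, 2)
flux_m = add_model_error(FLUX_STD, model_std)
flask_m = add_model_error(FLASK_STD, model_std)
print(f"flux network with model error:  {flux_m:.2f}")
print(f"flask network with model error: {flask_m:.2f}")
print(f"separate with model error:      {combine_separate(flux_m, flask_m):.2f}")
print(f"integrated with model error:    {add_model_error(INTEGRATED_STD, model_std):.2f}")
```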

Suggested next steps
The study by Szolgayová et al. (2016) indicated the role of QND in assessments of the economic value of a carbon observing system. Such assessments are an important and obvious next step as they can provide an objective quantitative basis to decision makers.
We demonstrated the need for an integrated QND approach, i.e. a joint assessment of all relevant data streams in an integrated model that includes all components required to simulate these data streams. In the last decade there were several demonstrations of the QND approach in a CCDAS, for atmospheric data streams (CO₂ and XCO₂) and for land data streams (direct flux measurements, FAPAR, SIF). The list of (potential) further direct (e.g. biomass) or indirect (e.g. soil moisture) observational constraints on the carbon cycle is much longer (see e.g. Raupach et al., 2005; Ciais et al., 2014; Dolman et al., 2016; Scholze et al., 2017). Our examples also demonstrate that QND can assess the complementarity of in situ and satellite observations as well as real and hypothetical data streams for a range of suitable target quantities. This is exactly what is needed to guide the evolution of an observing system that can reduce uncertainties in estimated natural and anthropogenic fluxes, as requested by the Paris Agreement.
To cover a particular data stream or target quantity, the model in the core of the QND system needs to be capable of simulating in a realistic manner the sensitivity of the data stream (observational Jacobian) and target quantity (target Jacobian) with respect to changes in the control vector. With regard to natural fluxes, a suitable QND system should also include an ocean component, to allow the evaluation of oceanic data streams and target quantities, e.g. acidification. The same holds for the inclusion of a methane emissions component (see e.g. Houweling et al., 2017). With regard to anthropogenic fluxes, fossil emissions and land management modules are needed. The above-mentioned FFDAS is an obvious candidate for coupling into a CCDAS. The first demonstration of the inclusion of a fossil emissions module into a CCDAS was provided by Hooker-Stroud (2008).
We explained that the setup of a QND system also relies on subjective choices. Hence, it is advisable to have multiple QND systems in operation; relying on a single one appears risky. It may be useful to also operate a "light" variant of such a system, which relies on pre-computed Jacobians and can rapidly test design questions. A "heavier" system could then be used for a subsequent in-depth analysis of the most promising configurations. It is also necessary to better understand the effect of such subjective choices, in order to minimise their impact on the assessment. This includes the selection of component models and the specification of the control vector, including its resolution or discretisation in space, time, and other dimensions of the model, for example the spaces of plant functional types or of fossil emission sectors. This also concerns approximations we make to reduce the size of the Jacobians, e.g. pre-aggregation of observations. At the technical level, formal optimisation algorithms have so far only been used in QND with transport inversions, not in a CCDAS. Progress at this level would be useful, especially for the design of the in situ network.

Figure 2. Schematic illustration of PDFs in parameter space (upper section) and flux space (lower section). Prior parameter PDF in blue. Posterior PDFs for separate (integrated) QND in black (red). Projections onto posterior flux uncertainty with QND for either component network (black) or integrated QND (red).