Various observational data streams have been shown to provide valuable constraints on the state and evolution of the global carbon cycle. These observations have the potential to reduce uncertainties in past, current, and predicted natural and anthropogenic surface fluxes. In particular, such observations provide independent information for verification of actions as requested by the Paris Agreement. It is, however, difficult to decide which variables to sample, and how, where, and when to sample them, in order to make optimal use of the observational capabilities. Quantitative network design (QND) assesses the impact of a given set of existing or hypothetical observations in a modelling framework. QND has been used to optimise in situ networks and assess the benefit to be expected from planned space missions. This paper describes recent progress and highlights aspects that are not yet sufficiently addressed. It demonstrates the advantage of an integrated QND system that can simultaneously evaluate a multitude of observational data streams and assess their complementarity and redundancy.

There is an increasing number of observational data streams that can
constrain the global carbon cycle

In this context, an obvious challenge is the selection of observational
sampling strategies that allow us to extract a maximum of information on a
selected aspect of the carbon cycle. Typical questions are as follows: there
is funding for

The above two examples already illustrate the complexity of the task and the
need for a systematic, quantitative approach; purely relying on ad hoc
choices guided by intuition is too dangerous. This contribution describes a
formalism, called quantitative network design (QND), that addresses the
evaluation (or even optimisation) of sampling strategies in a modelling
framework. QND evaluates a network, which is defined as a set of observations
of specified variables at specified times and locations (or their integrals)
that can be simulated by a modelling system. The approach uses formal
uncertainty propagation of the observational information to selected target
quantities that are also simulated by the modelling system. The definition of a set of
target quantities formalises the purpose of the network, i.e. the questions
the network is supposed to answer, and the uncertainty in the target quantity
is the specific metric used to assess the performance of the network. In the
above example the target quantities would be regional and temporal integrals
of the net carbon flux or its fossil emissions or land-use change components.
Typically, a network is compared with a simpler reference network. This
reference network can be a network without any observations or a network with
standard background observations. The reduction in uncertainty with respect
to the reference network quantifies the added value or impact of the
additional observations. Section

Almost a decade ago

Modelling systems that simultaneously simulate the components of the carbon
cycle as a coupled system are computationally heavy, and embedding them into
a QND framework further amplifies the computational burden. Hence, it is
tempting to apply QND to component models for the separate evaluation of
sub-networks that provide observations of the respective components.
Section

The presentation of the methodology follows

uncertainty caused by the formulation of individual process representations and their numerical implementation (structural uncertainty);

uncertain constants (process parameters) in the formulation of these processes (parametric uncertainty);

uncertainty in external forcing/boundary values (such as solar insolation or temperature) driving the relevant processes;

uncertainty about the state of the system at the beginning of the simulation (initial state).

Two-step procedure of QND formalism. Ovals denote data, rectangles denote processing.

The target quantity may be any quantity that can be extracted from a
simulation with the underlying model, i.e. any potential model output, for
example terrestrial net primary production (NPP) integrated over an area and
time interval, but also any component of the control vector (for example a
process parameter such as

In this procedure we take uncertainty into account by representing all
variables, i.e. the prior and posterior control vectors as well as the
observations, their equivalents simulated by the model, and the simulated
target quantity by probability density functions (PDFs). We typically assume
a Gaussian form for the prior control vector and the observations, if
necessary after a suitable transformation. For example, instead of the above

For the first QND step we use the model

In the second step, the Jacobian matrix

A correct QND assessment does not require the model to simulate the target
quantities and observations under investigation correctly; rather, it
requires that the model provide a realistic

Our performance metric is the (relative) reduction in posterior target
uncertainty

We note that (through Eqs.

In practice, for pre-defined target quantities and observations, model responses can be pre-computed and stored. A network composed of these pre-defined observations can then be evaluated in terms of the pre-defined target quantities without any further model runs. Only matrix algebra is required to combine the pre-computed sensitivities with the data uncertainties. This allows the setup of QND systems that interactively respond to user specifications of networks.
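The evaluation of a network from pre-computed sensitivities can be sketched as follows. This is a minimal illustration, assuming the standard linear-Gaussian form of the two QND steps (posterior control covariance from the observational Jacobian, then propagation through the target Jacobian). The function names, the diagonal observational covariance, and the toy numbers are assumptions made for illustration and are not taken from any particular QND system.

```python
import numpy as np

def posterior_covariance(J, obs_sigma, prior_cov):
    """Step 1: posterior control covariance (linear-Gaussian assumption).

    J is the observational Jacobian (n_obs x n_control); obs_sigma holds the
    data uncertainties, assumed uncorrelated (diagonal covariance).
    """
    R_inv = np.diag(1.0 / np.asarray(obs_sigma, dtype=float) ** 2)
    hessian = J.T @ R_inv @ J + np.linalg.inv(prior_cov)
    return np.linalg.inv(hessian)

def target_sigma(N, control_cov):
    """Step 2: propagate control covariance to the targets via the target Jacobian N."""
    return np.sqrt(np.diag(N @ control_cov @ N.T))

def evaluate_network(J_all, sigma_all, selected, prior_cov, N):
    """Relative uncertainty reduction for the network given by row indices `selected`.

    J_all and N are pre-computed once; no further model runs are needed here.
    """
    idx = np.asarray(selected, dtype=int)
    post_cov = posterior_covariance(J_all[idx], sigma_all[idx], prior_cov)
    return 1.0 - target_sigma(N, post_cov) / target_sigma(N, prior_cov)

# Toy setup (hypothetical numbers): two control variables, three candidate
# observations, one target quantity equal to the sum of the controls.
prior_cov = np.eye(2)
J_all = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sigma_all = np.array([0.5, 0.5, 1.0])
N = np.array([[1.0, 1.0]])

reduction = evaluate_network(J_all, sigma_all, [0, 1], prior_cov, N)
```

Because `J_all` and `N` are stored, changing the network only changes the row selection and the data uncertainties, which is what makes an interactive response feasible in such a setup.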

For the interpretation of QND results it is useful to develop a qualitative
understanding of the sensitivity of the result to the inputs of the QND
system. For example, the impact of an observation on the target quantity,
i.e. the uncertainty reduction compared to the prior, increases when the
Jacobian

We will see in the following sections that for many QND applications, it is
sufficient to evaluate the performance of a small number of candidate
networks and compare their performance for a range of reasonable target
quantities. For applications with many candidate networks it is often
impractical to test every candidate network, and a formal minimisation
algorithm is used to identify the network with the lowest posterior
uncertainty in the target quantity. In the case of multiple target quantities, we can minimise a
suitable scalar function of their posterior uncertainties, e.g. their sum of
squares. An example for the mathematically rigorous analysis of the
complexity of a network optimisation problem is provided by
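As an illustration of minimising a scalar function of the posterior target uncertainties, the sketch below selects a network of fixed size by exhaustive search over all candidate subsets. Exhaustive search stands in here for a formal minimisation algorithm and is only feasible for a small number of candidates. The linear-Gaussian uncertainty propagation, the function names, and the toy numbers are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np

def posterior_target_variances(J, obs_sigma, prior_cov, N):
    """Posterior variances of the targets (linear-Gaussian assumption)."""
    R_inv = np.diag(1.0 / obs_sigma ** 2)
    post_cov = np.linalg.inv(J.T @ R_inv @ J + np.linalg.inv(prior_cov))
    return np.diag(N @ post_cov @ N.T)

def best_network(J_all, sigma_all, prior_cov, N, size):
    """Return the subset of `size` observations with the lowest cost.

    The cost is the sum of squared posterior target uncertainties, i.e. the
    sum of the posterior target variances.
    """
    best, best_cost = None, np.inf
    for subset in combinations(range(J_all.shape[0]), size):
        idx = list(subset)
        cost = posterior_target_variances(
            J_all[idx], sigma_all[idx], prior_cov, N
        ).sum()
        if cost < best_cost:
            best, best_cost = subset, cost
    return best, best_cost

# Toy setup (hypothetical numbers): three candidate observations, of which
# the third observes the target (the sum of the two controls) directly.
prior_cov = np.eye(2)
J_all = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sigma_all = np.array([0.5, 0.5, 0.5])
N = np.array([[1.0, 1.0]])

network, cost = best_network(J_all, sigma_all, prior_cov, N, size=1)
```

In this toy case the search picks the observation that samples the target combination directly, because the other two each leave one control variable at its prior uncertainty.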

The QND approach relies on the capability to propagate data uncertainty to
target uncertainty. This requirement is met by CCDASs and transport inversion
systems that use an explicit representation of

There are approaches other than QND that employ data assimilation/inverse modelling systems for the design of observational networks
but do not rely on the availability of posterior uncertainty. As such
techniques have not been applied in a CCDAS context, we only give brief
summaries of the approaches that are most popular in the numerical weather
prediction

The QND approach is based on work by

This groundbreaking study established the QND approach in the carbon cycle
community and already illustrated the need for a careful formulation of the
target quantity. It paved the way for three lines of QND applications: the
first continues the optimisation of the atmospheric in situ sampling network
for use in atmospheric transport inversions. The second optimises the
design of missions sensing XCO

Pure atmospheric applications of QND include the studies by

The XCO

These mission assessment studies typically explore a low number of prescribed
design options, i.e. an optimisation algorithm is not used, and their target
quantity is typically the net CO

We note that techniques other than QND were also applied for the assessment
of space missions. Identical twin experiments performed with variational
transport inversion systems to assess the performance of OCO include studies
by

Before addressing QND applications with CCDASs, we recall the impact of prior information. Within a given QND system, it is manifested in the sensitivity of the posterior target uncertainty with respect to the prior control uncertainty. We need to keep in mind, however, that prior information has already entered the construction of the QND system. This is through the selection of the suite of models and observation operators (including their implementation) used in the QND system, and then through the definition of the control vector. This includes the above-mentioned selection of the uncertain process parameters and initial and boundary conditions as well as their spatial differentiation. For example, we can specify a process parameter globally or as specific to a plant functional type (PFT) or a region. In a transport inversion, the control vector may consist of fluxes on the space–time grid of the model, or multipliers of prescribed patterns. In a CCDAS the model achieves a coupling between the fluxes in space and time, which reduces the dimension of the control space.

An initial QND application with a CCDAS was performed in the system based on
the simple diagnostic biosphere model

A more systematic assessment of the complementarity of atmospheric and
ecosystem measurements was performed by

The complementarity of flux and atmospheric networks was confirmed by

The study of

The first QND assessment of a space mission with a CCDAS evaluated several
design options for the above-mentioned A-SCOPE mission

A further CCDAS study

Solar-induced fluorescence (SIF) is a further observational constraint from
space and is also presented in the contribution of

We note that all of the above CCDAS-based QND studies explored a set of candidate networks or mission concepts. None of them applied a formal optimisation algorithm.

Modelling systems that simultaneously simulate the components of the carbon
cycle as a coupled system are computationally heavy, and a QND framework
amplifies the computational burden. For example, the QND systems by

Schematic illustration of PDFs in parameter space (upper section) and flux space (lower section). Prior parameter PDF in blue. Posterior PDFs for separate (integrated) QND in black (red). Projections onto posterior flux uncertainty with QND for either component network (black) or integrated QND (red).

Let us first consider a highly simplified model, in which our target
quantity, the net flux

For simplicity, assuming both parameters have the same uncorrelated prior
uncertainty

Now, assuming we have two component networks, one can only constrain

If we use only one of the two sub-networks, we reduce the uncertainty for
only one of the parameters (black PDF in Fig.

To combine the flux estimates provided by the two sub-networks we could use
their (evenly weighted for simplicity) average:

If we ignore for a moment that they are based on the same parameter prior,

Applying Eq. (

The double use of the prior produces correlated uncertainty and increases

The above example is very much simplified, and before generalising the finding we need to consider the consequences of the simplifications. The assumption of only two parameters is not a serious limitation: for the case of two larger sets of parameters, with each set only “seen” by one of the component networks, the example would work similarly. The assumption of full complementarity of the two sub-networks is more important. If there were parameters that neither system could observe, not even the integrated QND could bring the posterior flux uncertainty to zero. Typically, however, a given data stream tends to constrain one subset of the parameter space well and another subset more weakly. If there is at least some complementarity, the integrated model can take advantage of it, while in the separate QND approach the badly observed parts of the parameter space have the potential to spoil the performance.
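The contrast between separate and integrated QND in the two-parameter case can be reproduced numerically. The sketch below is illustrative only: it assumes that the net flux is the sum of the two parameters, that both parameters have the same uncorrelated prior uncertainty, and that each sub-network observes exactly one parameter with a small (here 0.1) data uncertainty. The function name and all numbers are hypothetical.

```python
import numpy as np

def posterior_flux_sigma(J, obs_sigma, prior_sigma=1.0):
    """Posterior uncertainty of the flux f = p1 + p2 (assumed form).

    J is the observational Jacobian with respect to (p1, p2); observations
    are assumed uncorrelated with a common uncertainty obs_sigma.
    """
    J = np.atleast_2d(np.asarray(J, dtype=float))
    prior_inv = np.eye(2) / prior_sigma ** 2
    post_cov = np.linalg.inv(J.T @ J / obs_sigma ** 2 + prior_inv)
    n = np.array([1.0, 1.0])  # target Jacobian of the assumed flux f = p1 + p2
    return float(np.sqrt(n @ post_cov @ n))

# Separate QND: each sub-network constrains only one parameter.
sigma_a = posterior_flux_sigma([[1.0, 0.0]], obs_sigma=0.1)
sigma_b = posterior_flux_sigma([[0.0, 1.0]], obs_sigma=0.1)

# Integrated QND: both sub-networks are evaluated jointly.
sigma_joint = posterior_flux_sigma([[1.0, 0.0], [0.0, 1.0]], obs_sigma=0.1)
```

Under these assumptions, each separate assessment leaves the flux uncertainty slightly above the prior uncertainty of a single parameter, because the unobserved parameter stays at its prior. The joint assessment is limited only by the data uncertainty. Replacing the rows of `J` with partially overlapping sensitivities gives the case of incomplete complementarity.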

Adapting the above algebra to such a case is somewhat cumbersome, because
constraining two or more parameters simultaneously would involve matrix
inversion. It is easier to run an example (with two sub-networks) in the
system of

By contrast the integrated QND yields a posterior uncertainty for Europe of
0.06 GtC yr

The uncertainty component reflecting model error clearly depends on the
quality of the model used. For example, a model that achieves a 20 %
uncertainty in the NEP simulated over Europe would (based on the 20-year
posterior NEP average of 0.39 GtC yr

The study by

We demonstrated the need for an integrated QND approach, i.e. a joint
assessment of all relevant data streams in an integrated model that includes
all components required to simulate these data streams. In the last decade
there were several demonstrations of the QND approach in a CCDAS, for
atmospheric data streams (CO

To cover a particular data stream or target quantity, the model in the core
of the QND system needs to be capable of simulating in a realistic manner the
sensitivity of the data stream (observational Jacobian) and target quantity
(target Jacobian) with respect to changes in the control vector. With regard
to natural fluxes, a suitable QND system should also include an ocean
component, to allow the evaluation of oceanic data streams and target
quantities, e.g. acidification. The same holds for the inclusion of a methane
emissions component

We explained that the setup of a QND system also relies on subjective choices. Hence, it is advisable to have multiple QND systems in operation; relying on a single one appears risky. It may be useful to also operate a “light” variant of such a system, which relies on pre-computed Jacobians and can rapidly test design questions. A “heavier” system could then be used for a subsequent in-depth analysis of the most promising configurations. It is also necessary to better understand the effect of such subjective choices, in order to minimise their impact on the assessment. This includes the selection of component models and the specification of the control vector, including its resolution or discretisation in space, time, and other dimensions of the model, for example the spaces of plant functional types or of fossil emission sectors. This also concerns approximations we make to reduce the size of the Jacobians, e.g. pre-aggregation of observations.

At the technical level, formal optimisation algorithms have so far only been used in QND with transport inversions, not in a CCDAS. Progress at this level would be useful, especially for the design of the in situ network.

No data sets were used in this article.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Data assimilation in carbon/biogeochemical cycles: consistent assimilation of multiple data streams (BG/ACP/GMD inter-journal SI)”. It is not associated with a conference.

We acknowledge the support from the International Space Science Institute (ISSI). This publication is an outcome of the ISSI's Working Group on “Carbon Cycle Data Assimilation: How to consistently assimilate multiple data streams”. Peter Julian Rayner's participation was supported by an Australian Professorial Fellowship (DP1096309).

Edited by: Marko Scholze

Reviewed by: Albertus J. Dolman and one anonymous referee