Introduction
Earth system models (ESMs) are complex software that capture our knowledge of
how the ocean, atmosphere, land, and ice operate and interact. ESMs provide
scientists with powerful tools to better understand our global environment,
its evolution, and the potential impact of human activities (e.g. analyses of
relevant processes, and their interaction and feedback mechanisms). ESM
applications range from numerical weather prediction (NWP), to seasonal or
decadal forecasting see e.g., to climate
projections on centennial scales or even longer
.
Before being used for predictions, ESMs and their components should be
confronted with observations in order to assure their realism (validation).
Such validation procedures can be extended to standardized assessments of
model performance in so-called benchmarking systems by evaluation of a set of
observation-based metrics see e.g.. This
involves the definition of metrics that quantify the model performance
through the fit to observations. A further step towards the rigorous use of
the observations is their ingestion in formal data assimilation procedures,
e.g. to constrain the model's initial state (initialization) or tunable
parameters in the model's process representations (calibration).
Such confrontation with observations is hampered by the fact that observed
and modelled quantities typically differ in nature or scale (in space and
time). For example, a flask sample of the atmospheric carbon dioxide
concentration provides a value at a specific point in space and time, whereas
an atmospheric tracer model operates in a discretized representation of space
and time, i.e. on values that refer to a box in the atmosphere and a
particular period of time. Any comparison of the two quantities (modelled and
observed) must hence take the uncertainty arising from this representation
error into account see e.g.. Another example
is a vertical profile of the ocean temperature and salinity provided by a
floating buoy (see http://www.argo.ucsd.edu). Again the spatial
scales of the observation and the model do not match (in the horizontal
dimension). In addition, ocean models are formulated in terms of potential
temperature rather than temperature. Since we can only compare quantities of
the same nature, some form of transformation is required before any
comparison can take place. Such a difference in nature is intrinsic to
observations from space, where the raw quantities measured by satellites,
i.e. photon counts , are by nature only indirectly
(through radiative transfer processes) related to the model quantities of
interest.
The link from the model to the observations is provided through a set of
relationships expressed in terms of an observation operator. We can
think of an observation operator as an arm that enables the ESM to access a
particular type of observation (see Fig. ). We stress that
the usage of the term operator is not meant to imply the linearity
of the observation operator. In fact, observation operators range in
complexity from a simple interpolation or integral scheme up to a chain of
sophisticated non-linear radiative transfer models.
Schematic of an ESM assessing several data types via observation
operators H1,…,Hn.
The layout of the remainder of this paper is as follows. Section
introduces the concept of an observation operator and presents examples. The
role of observation operators in applications is presented in
Sect. . Section highlights the use
of derivatives of observation operators and introduces automatic differentiation, a technique to provide these derivatives. Finally,
Sect. draws conclusions.
Model–data comparison at the sensor level (level 1, solid arrows)
and at the level of geophysical variables (level 2, dotted arrows). Ovals
denote data, and rectangles denote some form of processing.
Observation operators
Definition
Mathematically, the observation operator is defined as mapping H from
the vector of state variables z (of the
model) onto the vector of observations y:
H:z↦y.
The observation vector can include, for example, observed radiances, radar
backscatter, or in situ observations. The vector of the model's state
variables (state vector) defines the simulated system for a given
time step at all points in space, and the evolution of the system is
described by a sequence of state vectors forming a trajectory through the
state space. We note that Eq. () may be generalized in the sense
that the simulation of observations of temporal averages or integrals (e.g.
an increment in above-ground biomass over several years or an albedo covering
several weeks) may require not only a single state, but also
the trajectory over the averaging/integration interval. The state
variables of a dynamical model are also called prognostic variables, to
contrast them with diagnostic variables, which are computed from the state
and evolve only indirectly through the evolution of the state. For example,
the albedo of the land surface is diagnosed from the state of the
vegetation–soil system. Hence, if we achieve a change in the model state at
any given point in time, the model will then propagate this change of state
forward in time, and we achieve a change of the model trajectory (e.g. to
improve the fit to observations). This means that to bring observational
information into the model, we must link the observations to the state; in
other words, the model's state vector constitutes the interface between the
model and the observation operator.
The solid path in Fig. sketches how an observation operator
(H1) enables the comparison of simulated and observed values at the
sensor level, i.e. at the level of spectral radiances, typically referred to
as level 1 data products . Another way to make Earth
observation (EO) data accessible to dynamical models is by deriving, by means
of a retrieval algorithm, a biogeophysical variable (in the following
denoted as a geophysical variable) from the satellite observations. Such EO
products are usually called level 2 data products (and level 3 refers to
products on a space–time grid). Internally, the retrieval algorithm also
relies on a functional relationship that maps the geophysical variable(s) of
interest onto the spectral radiance. This mapping is similar, if not
identical to, the observation operator H1, although the term used by the
EO community is forward model. The retrieval can be regarded as an
inversion of H1. As the examples below will illustrate, the retrieved
level 2 product will typically not exactly coincide with a component of the
model state vector. Hence, the confrontation of level 2 data with the model
(the dotted path in Fig. ) also requires an observation operator
(denoted by H2).
Examples
Figure attempts to sketch a generic observation operator
H1, which links a model's state vector to observed spectral radiance.
For the sake of clarity, the figure focuses on processing steps that map one
variable onto another and omits further important steps that involve
transformations in space and time, i.e. interpolation, averaging, or orbit
simulation.
The simulation of spectral radiances at the sensor level requires information
from the atmosphere and the land/ocean surface, including the description of
ice or snow covers. Hence, the observation operator typically consists of
various modules. First, from the model state, the relevant electromagnetic
signatures are simulated. For example, for a passive optical sensor observing
the terrestrial vegetation, this would be the reflected sunlight, and it
would be computed by a model of the radiative transfer within the canopy; for
examples, see or . For a passive
microwave sensor that observes sea ice and snow, this would be the thermal
emission, and it would involve a model of the radiative transfer within the
sea ice and snowpack see e.g..
In the atmosphere, this could be a model for the emissivity of clouds as a
function of the atmospheric state. The next step covers the path through the
atmosphere from the observed components to the sensor and requires a model of
the radiative transfer through the atmosphere. Prime examples are the
Radiative Transfer for the TIROS Operational Vertical Sounder
RTTOV; for the microwave and infrared domain,
the Second Simulation of a Satellite Signal in the Solar Spectrum, 6S , for the solar domain, or the optimal spectral
sampling (OSS) method . The output of the radiative
transfer model can be compared with a level 1 product.
Each type of observation requires its own observation operator in order to be
accessible to a model. The complexity of the observation operator typically
reflects a compromise between the accuracy required for the application at
hand and the available computational resources. In a space mission, the
observation operator depends on characteristics such as the geometry of the
observation (as a function of the orbit of the platform) or the measuring
principle and therefore the spectral sensitivity of the sensor. The observation
operator also depends on the formulation of the dynamical model. One aspect
is the state space, which depends on the model formulation. For example, an
atmospheric model can either diagnose clouds or include them in the state
space . In the former case, the diagnostic cloud
model is part of the observation operator; in the latter, it is not. Even
though parts of an observation operator are usually model-dependent, it is
desirable to implement the observation operator in a modular form with
carefully designed interfaces. This modularity maximizes the flexible use and
reuse for assimilation and retrievals and the adaptation to new models or
observations; i.e. it ensures multifunctionality.
Generic scheme of an observation operator for spectral radiance.
Oval boxes denote data, and rectangular boxes denote processing.
The crucial role of observation operators is reflected in comparison
exercises, such as the radiation transfer model intercomparison (RAMI)
initiative for the transfer of radiation in plant canopies and over soil
surfaces .
A similar activity for the atmosphere is the international project
Intercomparison of 3-D Radiation Codes (I3RC) . The
I3RC focuses on the interaction of solar and thermal radiation with cloudy
atmospheres. Another activity in this domain is the Cloud Feedback Model
Intercomparison Project (CFMIP), which has set up the CFMIP Observation
Simulator Package (COSP) . The modular package includes
a set of observation operators that map model output consisting of “vertical
profiles of temperature, humidity, hydrometeor (clouds and precipitation)
mixing ratios, cloud optical thickness and emissivity, along with surface
temperature and emissivity” onto a set of level 2 products retrieved from
“the following instruments: CloudSat radar, CALIPSO lidar, ISCCP, the MISR,
and the Moderate Resolution Imaging Spectroradiometer (MODIS)”
. The above-mentioned “fast radiative transfer
code RTTOV can also be linked to COSP to produce clear-sky brightness
temperatures for many different channels of past and current infrared and
passive microwave radiometers” . Not only does COSP
greatly simplify the comparison of model output with EO products, but using
standardized interfaces also allows the comparison of multiple models through
the same observation operators with EO data from various sources; this
facilitates the attribution of a model–data mismatch to aspects of the
model, the observation operator, or the observations. The Community Microwave
Emission Modelling Platform CMEM; takes a similar role
for the modelling of the emissivity of the canopy–soil system in the
spectral domain from 1 to 20 GHz. For example, use 12
different configurations of the modular system in their Land Surface Models
Intercomparison Project (ALMIP).
Applications of observation operators
This section starts with an introduction of the formalism behind advanced
data assimilation and retrieval schemes. The details of the formalism are
useful to understand the application examples in this section (and the
commonalities between assimilation and retrievals) and the need for
derivative information that is discussed in Sect. .
Formalism of data assimilation and retrieval
Data assimilation is a procedure to combine the information from observations
with the information in a dynamical model. There is a range of data
assimilation techniques with varying degrees of sophistication. The simplest
techniques try to replace a component of the model state vector with an
observation or, more precisely, some average of the two. More advanced
approaches can assimilate observations y that are linked to the
state through an observation operator H, which can be an observation operator
for in situ data or for EO data; for example, the operators H1 and H2
introduced in Sect. (see Eq. and
Fig. ). The assimilation problem is typically formalized as a
minimization problem for a misfit function
J(x)=12H(x)-yTUy-1H(x)-y+12x-xprTCxpr-1x-xpr,
where x denotes the vector of unknowns. Even though in some
applications this vector of unknowns may coincide with the model state,
z, this is not generally the case (as will be discussed below), and
we need to make a clear distinction between the both objects.
The function J(x) is composed of two terms. The first term quantifies
the misfit between the observations and their simulated counterpart
(observational term). Uy has to account for the uncertainty in
the observations and the uncertainty imposed by the imperfection of the
model, including the above-mentioned representativeness in space and time.
For diagonal Uy (uncorrelated uncertainty), it reduces to a least
squares fit to observations. The second term quantifies the deviation of the
model state from the prior information xpr (prior term, also
called background). This term provides a means to include information in
addition to the observational information into the assimilation procedure,
and it ensures the existence of a minimum in cases where the observational
information is not sufficient to constrain the unknowns. Both terms,
observation misfit and prior, are weighted in inverse proportion to the
respective uncertainties, i.e. the combined uncertainty in the observations
and observation operator, Uy, and the uncertainty in the prior
information, Uxpr. The superscript T denotes
transposition. Note that the equation does not require the observations to be
provided on the space–time grid of the model. The observations can come in
any spatio-temporal distribution, e.g. the above-mentioned point measurements
or orbits, as long as we can formulate the appropriate observation operator.
Equation () formalizes what in numerical weather prediction is
called three-dimensional variational assimilation
3D-Var;; more precisely, it formalizes its analysis step, which is
then followed by a forecast step. Operationally, the assimilation scheme is
run in cyclic mode through these two steps. In such a cyclic scheme, the
prior information is provided by the previous forecast; i.e. it is consistent
with the dynamical information from the model, and at the same time it suffers
from the errors in the model. In this setup, the specification of the prior
uncertainty is particularly challenging see e.g..
The model dynamics are even more emphasized when the scheme of
Eq. () is extended to contain observations yi at
different time steps (i=,1…,n) to constrain the initial state
z0=x through
J(x)=12∑i=1,nH(zi(x))-yiTUyi-1H(zi(x))-yi+12x-xprTUxpr-1x-xpr.
This is the setup of the analysis step in four-dimensional variational
assimilation (4D-Var) schemes, where the dynamical model M is used as a
constraint that links the states at all observation times via
zi+1=M(zi)
to the initial state z0=x. For convenience, the notation suppresses
the time-dependent nature of H and M, and it also assumes that the
data uncertainties at different time steps are uncorrelated. While 4D-Var
solves a single minimization problem to find a (dynamically consistent) model
trajectory, 3D-Var is a sequential approach, i.e. it solves a sequence of
minimization problems, which that yield a dynamically inconsistent sequence of
model states. In operational NWP, as mentioned above, the application of
4D-Var is in a cyclic procedure, i.e. also in a sequential manner.
Section will describe applications where this is not the case.
In the 4D-Var approach, the vector of unknowns x can be extended from
the initial state to boundary values and process parameters (model
calibration). Since these are external controls to the dynamical system,
x is also called control vector, a term taken from control theory
. We usually try to select the control vector such that it
comprises the fundamental unknowns of the system at hand, i.e. those with the
highest uncertainty . An extension of the 4D-Var
approach (weak constraint 4D-Var) allows deviations from the model
trajectory, which are included as additional components into the control
vector see e.g..
The Kalman filter is another sequential approach. Its analysis step
solves a slightly simplified form of Eq. (), in which H is
replaced by its linearization H (Jacobian matrix) around the
prior. This allows an analytic solution xpo of
Eq. ()
xpo=xpr-UxpoHTUy-1Hxpr-yi,
the evaluation of which involves the inversion of the matrix
Uxpo=(HTUy-1H+Uxpr-1)-1,
which is typically of high dimension (e.g. 107 in NWP) and expresses the
uncertainty range in xpo that is consistent with uncertainty
ranges in the data and the prior values.
In the case of linear H and Gaussian probability densities for the prior and
the data, the solution of Eq. () is Gaussian as well and
therefore
completely described by its mean (Eq. ) and covariance
(Eq. ). This formalism is, for example, applied in inverse
modelling of the atmospheric transport of carbon dioxide ,
where an atmospheric transport model takes the role of H and the
space–time distribution of the surface fluxes takes the role of x.
Note that the cost function's second derivative (Hessian matrix)
J′′ is related to Uxpo through
Uxpo=J′′(x)-1.
In the non-linear case (i.e. H or M are non-linear), we cannot solve
Eq. () or Eq. () analytically, but via the cost
function's Hessian we can use Eq. () to approximate
Uxpo. Via a linearization N of the model that
links the control variables to model outputs of interest f, we can
approximate the uncertainty range of these model outputs Uf
by
Uf=NUxpoNT.
The alternative to the above assimilation approaches (which are based on
linearizations) are ensemble methods, such as Markov chain Monte Carlo
see e.g., ensemble Kalman filter
techniques, or particle filters see e.g., which rely on forward simulations to sample the
control space. The feasible ensemble size is limited by the computational
demands, which are essentially determined by the complexity of the underlying
model. We also note the challenge of filter degeneracy that limits the
applicability of particle filters to high-dimensional problems see e.g..
We used Eq. () to introduce the formalism of data assimilation.
The same equation also plays a central role in retrievals. The minimization
of Eq. () describes a retrieval algorithm for the entire state.
The prior term regularizes what is otherwise an underdetermined inverse
problem: several of the unknown variables that influence the observed signal
vary continuously with altitude (continuous vertical profiles). Even though
we formulate our observation operators on a vertical grid, there are
typically “fewer” measurements than unknowns. Consequently, there are many
sets of unknown variables that yield an equal fit to observations; i.e. we
have to deal with a non-zero null space . The word fewer
was put in quotation marks to indicate that, more precisely, it is not only the ratio of the numbers of
observations to unknowns that matters here, but it is also the capability of
the observations to constrain the unknowns. The null space is the subspace of
the control space that is not constrained by the observations, and including
more observations of the same type often does not help to reduce the
dimension of the null space. The prior term provides additional information
on every unknown and helps the retrieval algorithm to find a unique solution.
Further, Eqs. () or
() are used to furnish the retrievals with uncertainty ranges.
Another perspective on the assimilation of level 1 data is to regard it as an
advanced form of retrieval, and to regard the assimilation system as an advanced
retrieval algorithm that optimally combines the information from remote
sensing, radiative transfer, and dynamical model. The other point to note is
that H is usually not constant in space and time. For example, the
radiative transfer in the optical domain is affected by atmospheric water
vapour and aerosols. A retrieval of a land surface variable, for example, requires
information on clouds and aerosols. In a coupled atmosphere–land model,
these data
are available in a form that is dynamically consistent with the state of the
land surface; on the other hand, they are also affected by errors in the model.
Data assimilation examples
The prime example of an atmospheric 4D-Var system is the one
operated at the European Centre for Medium-Range
Weather Forecasts (ECMWF) in their Integrated Forecasting System
IFS;. The 4D-Var system has been in operation since 2003;
meanwhile, most of the assimilated observations are remotely sensed radiances.
Observations are provided by about 50 different sensors and used with
appropriate observation operators. The system is used with a 12 h
assimilation window to initialize the operational forecast. Several other
weather services (including those of Canada, France, and the UK) are running
similar 4D-Var systems. A recent development at NWP centres are hybrid
approaches that combine ensemble and variational approaches. Such a hybrid
approach is operational at ECMWF
and the NWP centres of the UK
and Canada .
A prominent example of a variational ocean assimilation system was set up around the MIT General Circulation Model (MITgcm; ) by
the consortium ECCO: Estimating the Circulation and Climate of the Ocean (see http://www.ecco-group.org). The system uses a
combination of in situ observations and level 2 or 3 remote-sensing products
(including sea surface height, sea surface temperature, wind stress, and
geoid) for ocean state estimation over decadal-scale assimilation
windows . Owing to these long assimilation
windows, the prescribed exchange fluxes with the atmosphere are a major source
of uncertainty in their model trajectory. Hence, this boundary condition is
included in the control vector along with the initial state. Various
applications of the assimilation product require closed property budgets over
the entire assimilation window, which are achieved via variational approaches
in contrast to sequential approaches. Examples are mechanistic and diagnostic
studies of climate variability or oceanic tracer transport problems
.
A recent example of a regional variational assimilation system for the
coupled ocean–sea-ice system in the northern latitudes was developed by
. This system is operated for assimilation windows from a
few weeks to a few years. Their control vector combines (depending on the
application) the initial state, boundary conditions, and process parameters.
The system is constrained by hydrographic in situ observations and level
2 and 3 products of sea surface temperature, sea ice concentration, thickness,
and displacement.
An example of the global terrestrial vegetation is provided by the Carbon
Cycle Data Assimilation System (CCDAS). Initially set up for the assimilation
of in situ observations of the atmospheric CO2 concentration
, the system was extended step by step with observation
operators for several level 2 or 3 products, namely fraction of absorbed
photosynthetically active radiation (FAPAR) products , the column-integrated atmospheric carbon dioxide concentration
(XCO2) , and the surface layer soil
moisture . The observation operator for FAPAR was a
considerable extension of the previous system because it required modules
for the simulation of vegetation phenology and hydrology, which were
previously provided by an offline calculation. The observation operators for
CO2 and XCO2 include models of the atmospheric transport that solve
the continuity equation for carbon dioxide
. The observation operator for surface
layer soil moisture was derived by modification of the initial bucket
formulation of the soil hydrology model (which had no equivalent to the thin
surface layer that is observed from space). The assimilation window ranges
from years to decades. Considering uncertain values of the parameters
(constants) in process formulations as the major source of uncertainty in the
model trajectory, the control vector is composed of (depending on the setup)
50–100 (in extreme cases up to 1000) process parameter
values. This type of application is called parameter estimation or model
calibration.
Retrieval examples
The integrated retrieval of or solves
simultaneously for geophysical variables (level 2 data) of the atmosphere
(wind speed, total water vapour, cloud liquid water), the ocean (sea surface
temperature), and the sea ice (ice surface temperature, total sea ice
concentration, multi-year ice fraction). Technically, as H is non-linear,
and use Eq. () in an iterative procedure (xpo
from one step is provided as xpr to the subsequent step),
which recomputes H by linearization around the current
xpr. Upon convergence, they deliver posterior uncertainties
via Eq. (). Their input values are radiances (brightness
temperatures) observed by the Advanced Microwave Scanning Radiometer for EOS
(AMSR-E). Their prior values are taken from a range of sources, including
analysis data provided by an NWP assimilation system, or separate univariate
retrievals. This integrated retrieval is performed individually for each
observed point in space and time at 12.5 km horizontal resolution. Although
the use of the same level 1 data in an assimilation system (ensuring
dynamical consistency between the atmosphere, ocean, and sea ice components)
appears desirable, it is highly challenging in various respects: from a
software development perspective, because it would require an assimilation
system built around a coupled atmosphere, ocean, and sea ice model; from a
computational perspective, because a single run of the coupled model at
12.5 km resolution is already computationally expensive, and an
iterative assimilation scheme would be more so.
An example of the land surface is the Joint Research Centre Two-stream
Inversion Package (JRC-TIP) , which solves
Eq. () for model parameters controlling the radiation transfer
regime in vegetation canopies, namely the effective leaf area index (LAI) and
the spectral scattering properties of the vegetation and the soil. The latter
information is then used to compute the spectral fluxes scattered by,
absorbed in, and transmitted through the vegetation layer as well the fluxes
absorbed in the background (radiant fluxes). Further, the system uses
Eq. () to infer the uncertainty in the retrieved parameters and
Eq. () to propagate these forward to uncertainties in the
simulated radiant fluxes. In its typical setup, the system uses observed
albedos in two broad wavebands (visible and near infrared)
. The JRC-TIP is constructed
around a one-dimensional two-stream model, which takes three-dimensional
radiative transport effects into account . As a
consequence, the retrieved vegetation parameters are effective parameters
(i.e. their values are only meaningful within this model) and are determined
such that the radiant fluxes are simulated as accurately as possible. This
illustrates a crucial point when confronting retrieved level 2 variables with
their ESM counterparts. It is essential that the variables have the same
definition in the forward model that is used for the retrieval and in the
observation operator that is used for its assimilation. For the JRC-TIP
products, this is the case for the radiant fluxes and soil parameters; for the
effective vegetation parameters, it requires the use of the same two-stream
model in the observation operator. An in-depth description of JRC-TIP, its
use for the generation of EO products, and the validation of these products is
provided by a dedicated contribution to this special issue
.
The Earth Observation Land Data Assimilation System
EO-LDAS, is a retrieval and data assimilation
framework for various types of EO data. Primarily, it uses a weak-constraint
variational approach, i.e. the weak constraint form of Eq. (),
in combination with a simple model of the land surface dynamics (e.g.
persistence), which acts as a regularization in the spatial and temporal
domains. The system can, however, also perform single-pixel retrievals or be
operated in a sequential manner. EO-LDAS development started for the optical
domain with the semi-discrete model of the vegetation canopy
as observation operator. The system is being extended with
further observation operators, including the above-mentioned CMEM for passive
microwave observations. This means it has the capability to simultaneously
use EO over a range of spectral domains and exploit their complementarity. To
achieve the computational performance necessary for global-scale
processing of high-resolution EO data, the system is also operated
with fast approximations of radiative transfer (RT) models (including
the above-mentioned atmospheric RT code 6S) by so-called emulators. A similar
regularization strategy in the temporal domain was also presented by
, who operated a variational inversion scheme around a
chain of coupled RT models, i.e. PROSPECT for the
leaf optical properties, Scattering by Arbitrarily Inclined Leaves, SAIL , for the canopy RT, and the Simplified Method for
Atmospheric Correction, SMAC
, for the atmospheric RT.
A serious practical difficulty in data assimilation is the specification of
Uy. This is particulary relevant in multi-data stream
assimilation because Uy defines the respective weights of the
data streams. In the case of level 2 data, Uy is the posterior
uncertainty of the retrieval. Since the retrieval is typically carried out
point by point, uncertainty correlations in space and time are difficult to
assess. Another issue is the data volume required by the uncertainty
information. For a product of a retrieved geophysical variable at n points
in space and time, Uy contains (taking its symmetry into account)
n⋅(n+1)/2 different
values, a volume that is usually prohibitive in real-world applications. The
challenge is to develop ways of providing (approximations of) Uy
that retain the essential information in a minimal data volume. One approach
is through appropriate variable transformations based on a singular value
decomposition of the observation operator
. Another approach is a parametric model
of Uy, for example, as provided by for the
XCO2 product retrieved from Scanning
Imaging Absorption Spectrometer for Atmospheric CHartographY (SCIAMACHY) observations through their Bremen
Optimal Estimation DOAS BESD; algorithm. Generally, the
contribution by the observational uncertainty Uy is certainly
easier to specify in the case of level 1 data. Their direct assimilation
automatically propagates, through the observation operator H1, the full
information content of Uy into the model.
A related topic is the consistency of the prior information
(xpr and Uxpr in Eq. ) used
in the retrieval scheme with the scheme that subsequently uses these
retrievals in a dynamical model, e.g. for assimilation.
demonstrate how the xpr used in the
retrieval can be replaced with that simulated by the dynamical model,
provided that the following are available: xpr and the so-called resolution operator
UxpoHTUy-1H of the retrieval (also
called the averaging kernel), i.e. Eq. ().
goes one step further and addresses the remaining
inconsistencies in Uxpr and highlights their effect in his
atmospheric transport inversion using retrievals of XCO2. More generally,
derives requirements on the retrieval such that the
assimilation of the retrieved level 2 products is equivalent to the direct
assimilation of level 1 products.
Observing system simulation experiments and quantitative network design
Observing system simulation experiments (OSSEs) and quantitative network
design (QND) are two methodologies that evaluate observation impact on
assimilation systems. Through an observing system or observational network,
we understand the superset of all observations that are made available to an
assimilation system. We only give a brief introduction to the topic, as QND
for the carbon cycle is addressed by another contribution to this special
issue ().
An OSSE see e.g. uses a model
plus observation operators to simulate (in a model) analogues of observations
that would be collected by a potential observing system (often the current
observing system extended by a potential new data stream). The model is also
used to simulate, in a so-called “nature run”, a surrogate of reality, i.e.
a reference trajectory over the period of investigation. Then an
assimilation–forecast system (preferably built around a different model) is
used to evaluate some measure of the performance of the potential observing
system and its subsystems. In NWP, the performance of an observing system is
usually quantified by the quality (skill) of a forecast from the initial
value that was constrained by the observation system. Via this procedure one
can, for example, assess the added value of a planned mission in terms of an
increment in forecast skill. We also note a related approach, observing
system experiments (OSEs), which assess observation impact by removing one or
several existing data streams from the list of observations used in
a data assimilation system.
QND (for overviews, see ) relies on
the ability of an assimilation system to evaluate posterior uncertainties on
target quantities of interest via Eqs. () and ().
For a linear model, this propagation of uncertainty is independent of the
observational value; it only depends (via Eqs. and
) on the data and prior uncertainties, the sensitivity of the
observations with respect to the control variables, and (via
Eq. ) the sensitivity of the target quantity f to the control
variables. A first application to mission design was presented by
, who ran an atmospheric transport inversion system
built around a linear model of the atmospheric transport of carbon dioxide in
QND mode. They assessed the utility of remotely sensed carbon dioxide in
constraining its CO2 fluxes. Their benchmark was the in situ flask
sampling network. generalized the method to the
above-mentioned CCDAS and assessed the utility of XCO2 observations with
an active lidar instrument. The performance of the observing system is
quantified by posterior uncertainty of surface fluxes and compared to the
performance of the in situ network. use CCDAS to assess
the performance of potential optical sensor configurations in constraining
the vegetation's carbon and water fluxes. Their benchmark was the MEdium
Resolution Imaging Spectrometer (MERIS) sensor. Another QND application
evaluated airborne sampling strategies for sea ice
thickness and snow depth in the Arctic using simultaneous laser altimeter and
snow radar observations.
For both approaches, OSSE and QND, the importance of suitable observation
operators is obvious. A disadvantage is that the result depends on the model.
Both techniques require the specification of data uncertainties for the
hypothetical data streams to be evaluated.
Derivatives of observation operators
This section first summarizes how the capability to evaluate derivatives of
the observation operator is used in efficient schemes for retrieval,
assimilation, or QND. It then introduces a technique for providing derivative
information.
In variational assimilation, Eqs. () or () are
typically minimized in an iterative procedure that varies x. To do
this efficiently, even for high-dimensional control spaces, so-called gradient
algorithms are employed. They rely on the capability of evaluating the
gradient of J with respect to x to define a search direction in the
space of unknowns. The gradient is useful because it yields the direction of
steepest ascent. For J(x) of Eq. (), straightforward
differentiation with respect to x yields (see Sect. 3.4.4 of
) the gradient
∇J(x)=H(x)TUy-1H(x)-y+Uxpr-1x-xpr,
and we see that its evaluation requires the capability to multiply the
transpose of H with a vector. The uncertainty estimation via
Eq. () based on J′′ additionally requires second
derivative information on H. This second derivative expresses the
curvature of (the components) of H, i.e. the change of its linearization
corresponding to a unit change of x.
Likewise, the Kalman filter requires derivatives of H: in
Eq. (), it multiplies the matrix H and its
transpose with vectors, and for the evaluation of Eq. () it
needs to invert a matrix that contains H and its transpose. One
can do this inversion by pre-computing H or by so-called
matrix-free methods that repeatedly multiply H and its transpose
with vectors.
As mentioned, advanced retrieval algorithms are based on the same equations;
i.e. they typically solve Eq. () either via gradient methods or
the (possibly repeated) application of Eq. () and use either
Eq. () or its approximation Eq. () to estimate
the posterior uncertainty. Hence, they benefit in the same way from
derivatives of H as data assimilation systems. The same holds for QND
schemes, which are based on the computation of uncertainties via
Eq. () or its approximation, Eq. ().
Traditionally, derivatives were approximated by multiple forward runs (finite
difference approximation) see e.g.. This discretized
procedure has two disadvantages. The first is the limited accuracy of this
gradient approximation (providing only the linear term of the Taylor series),
which degrades the performance of the above-listed algorithms. For example,
incorrect gradient information will slow down or prematurely stop the
iterative minimization of J, because gradient-based minimization algorithms
rely on the consistency of evaluations of J and its gradient. The other
disadvantage is that the computational cost of this approximation grows
linearly with the length of the control vector.
Both disadvantages can be avoided by automatic differentiation
AD;. AD is a procedure that generates source code for
the evaluation of derivatives from the code of the underlying function. In the
current case, this function is the observation operator mapping the state
variables onto remote-sensing products. The function code is decomposed into
elementary functions (such as +, -, sin(⋅)), for which the
derivative (local Jacobian) is straightforward to derive. The derivative of
the composite function is then constructed via the chain rule as the product
of all local Jacobians. According to the associative law, this multiple
matrix product can be evaluated in arbitrary order without changing the
result. The tangent linear code (or tangent code) does this
evaluation in the same order as the function is evaluated, which is called
the
forward mode of automatic differentiation. The adjoint code uses
exactly the opposite order, which is called the reverse mode of automatic
differentiation. Even though both modes yield the same derivative, depending
on the dimensions of the function to be differentiated, there may be large
differences in their computational efficiency. The CPU time required by
the tangent code is proportional to the number of the function's input variables
but independent of the number of output variables. By contrast, the CPU time
required by the adjoint code is proportional to the number of output
variables and independent of the number of input variables. Both the tangent
and adjoint codes use values from the function evaluation (required values) see e.g.. Providing required values to the
adjoint code is more complicated than to the tangent code. As an
application of the chain rule, AD provides derivatives that are accurate up
to rounding error.
For variational assimilation, we require the derivative of the scalar-valued
cost function J(x) of Eqs. () and () with respect
to a usually high-dimensional vector x. For a state-of-the-art model,
only the adjoint can provide this derivative with sufficient efficiency. A
product Hv of H with a vector v yields the
directional derivative of H in the direction defined by v,
i.e. the derivative of the function H(x+tv) of a scalar independent variable t. Hence,
this type of product is evaluated most efficiently in forward mode, i.e. by
the tangent linear code of H. By contrast, a product of the form
HTv is the (transpose of the) derivative of the
scalar-valued function vTH(x), which is evaluated most efficiently
in reverse mode, i.e. by the adjoint of H. The scalar forward and reverse
modes required for efficient evaluation of the above Jacobian vector products
are the standard forms of the derivative code. The scalar mode is contrasted by
the vector mode. In forward mode, the vector mode simultaneously computes the
sensitivities with respect to multiple input quantities, and in reverse mode
it
simultaneously computes the sensitivity of multiple output quantities. Experience
shows that the vector mode is considerably more efficient than multiple runs
in scalar mode see e.g.. We use the vector mode for
applications that require the entire Jacobian, H. Here the
sensible choice between forward and reverse modes depends on the relative
dimensions of state and observation spaces.
A particular advantage of AD is that it can guarantee readability and
locality ; i.e. every statement in the derivative code
belongs to a particular statement in the function code. As a consequence, if
the function code is modular, the same modularity is preserved in the
derivative code. Another important advantage of the AD approach is that it
simplifies the maintenance of the derivative code, because it can be quickly
updated after any modification of the function code.
Since an AD tool operates at the code level, it is restricted to a particular
programming language. For the most frequently used programming languages in
Earth system science, namely Fortran and C, AD tools are, however, available.
It is a considerable effort to develop and maintain an AD tool at a level
robust enough for relevant scientific applications. Over the last decade,
tool development has made good progress, and there is a long list of
successful AD applications to component models of the Earth system. A prime
example is the above-mentioned MITgcm, which is compliant with multiple AD
tools . Typically, an initial effort is required to achieve
the compliance of a model with an AD tool. From this basis, maintaining
compliance for incremental updates of the model or the AD tool is less
demanding. This is, on the one hand, because AD tool developers apply regression tests before each new release against a set of benchmarking codes to
preserve compliance and; on the other hand, the incremental model
updates typically respect the AD tool's coding requirements. Examples are
limitations in the handling of pointers or memory allocation/deallocation
sequences. It should also be noted that some AD tools allow the insertion of
directives that support the analysis of the model code. These directives are
helpful (and sometimes necessary) to enhance the efficiency of the generated
code.
In some cases, analytical formulations of the derivative can be derived and
implemented with the observation operator .
Alternatively, the AD process can be mimicked by hand see e.g.;
i.e. a human transforms the function code line by line into derivative code
following the same recipes that are implemented
in AD tools. The advantage of hand-coding derivatives is that a human can be
more flexible than a software tool. On the other hand, the hand-coding
approach is tedious and error prone. As a consequence, this approach requires
considerable development and maintenance effort and is restricted to first
derivatives. The large assimilation systems in the above list
were set up before AD tools were
mature enough to handle the respective function codes.
Whether coded by hand or by an AD tool, the differentiation process typically
reveals issues in the function code that are not apparent otherwise. A
standard example is the square root e.g. used in the computation
of the norm, the derivative of which tends to infinity as the argument tends
to 0. Infinite sensitivities were typically not intended when the model code
was designed, and we can regard a differentiable reformulation as model
improvement. A further example is the introduction of a floor value
of 0 to avoid negative values of the simulated ice-covered area. An obvious
implementation of this floor value as the maximum of the simulated area and 0 produces a step in
the derivative at 0. Another example (now for the implementation as a
minimum) is the formulation of co-limitation in biogeochemical models, in
particular for carbon fixation in the photosynthesis model of
. and
describe the replacement of non-differentiabilities with smooth alternatives
(including a lookup table) in a model of the terrestrial biosphere. The
handling of similar non-differentiabilities in a crop model is described by
. In this context, it is helpful that AD tools see, for
example,for C and Fortran support the provision of derivative
code for external routines.
discusses treatment non-differentiabilities in NWP models
through convection, precipitation, or clouds, and
highlight the use of a statistical variational concept of 4D-Var at
a convective scale. explore the smoothing of non-differentiabilities, in particular of the atmospheric component of an Earth
system model.
Typical formulations of leaf phenology rely on a number of on–off switches
that yield non-differentiable behaviour and hamper the performance in a
CCDAS. This problem was addressed by through the design of
an alternative phenology model that is based on a probabilistic approach and
takes spatial sub-grid variability into account. As a consequence, the model
yields (more realistic) smooth transitions instead of on–off behaviour and
performs well in CCDAS applications
. Our recommendation is to
consider the availability of the sensitivity information as an additional
perspective on the model and its implementation. One can benefit from this
sensitivity information and improve the modelling concept, as demonstrated
for leaf phenology by , or remove errors in the
implementation as described by .