Assigning proper prior uncertainties for inverse modelling of CO₂

Atmospheric inversions are widely used to infer surface CO₂

The Bayesian formulation of the inverse problem balances the a priori information against the observational constraints. It is therefore crucial to introduce a suitable prior flux field and to assign it proper uncertainties: combining prior information with inappropriate prior uncertainties can lead to poorly retrieved fluxes (Wu et al., 2011). Here, we are interested in biosphere–atmosphere exchange fluxes and their uncertainties, and we make the usual assumption that uncertainties in anthropogenic emission fluxes do not strongly affect the atmospheric observations at the rural sites used in regional inversions of biosphere–atmosphere fluxes.

Inversions typically assume that prior uncertainties follow an unbiased normal distribution and can thus be represented by a covariance matrix. This covariance matrix expresses our confidence in the prior estimates: the prior error covariance determines to what extent the posterior flux estimates are constrained by the prior fluxes. Ideally, the prior uncertainty should reflect the mismatch between the prior guess and the actual (true) biosphere–atmosphere exchange fluxes. In this sense, it also needs to have the corresponding error structure, including its spatial and temporal correlations.
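
To make the role of the prior error covariance concrete, the sketch below applies the standard linear Gaussian Bayesian update, x_post = x_prior + B Hᵀ (H B Hᵀ + R)⁻¹ (y − H x_prior), to a toy problem. It is a generic textbook illustration assuming a linear observation operator H and known prior (B) and observation (R) covariances, not the inversion system used in this study; the function name `posterior_fluxes` is hypothetical. Shrinking B pins the posterior to the prior, while inflating B lets the observations dominate.

```python
import numpy as np


def posterior_fluxes(x_prior, B, H, y, R):
    """Standard linear Gaussian Bayesian update (illustrative sketch).

    x_prior : prior flux vector, shape (n,)
    B       : prior error covariance, shape (n, n)
    H       : linear observation operator, shape (m, n)
    y       : observations, shape (m,)
    R       : observation error covariance, shape (m, m)
    Returns the posterior fluxes and the posterior error covariance.
    """
    S = H @ B @ H.T + R                 # innovation covariance
    K = np.linalg.solve(S, H @ B).T     # gain B H^T S^-1 (B and S symmetric)
    x_post = x_prior + K @ (y - H @ x_prior)
    A = B - K @ H @ B                   # posterior error covariance
    return x_post, A


# Toy example: two flux components observed only through their sum.
x0 = np.array([1.0, 1.0])
H = np.array([[1.0, 1.0]])
y = np.array([4.0])
R = np.array([[0.01]])
for b in (1e-6, 1.0, 100.0):
    x_post, _ = posterior_fluxes(x0, b * np.eye(2), H, y, R)
    print(b, x_post)
```

With a tiny B the posterior stays at the prior (1, 1); with a large B the posterior sum approaches the observed value 4.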

A number of different assumptions of the error structure have been considered by atmospheric CO₂

A recent study by Broquet et al. (2013) obtained good agreement between the statistical uncertainties derived from the inversion system and the actual misfits calculated by comparing the posterior fluxes to local flux measurements at the European and 1-month scales. This agreement relied largely on their definition of the prior uncertainties, which was based on statistics derived objectively from the model–data mismatch by Chevallier et al. (2006, 2012). In these studies, modelled daily fluxes from a site-scale configuration of the ORCHIDEE model are compared with eddy covariance flux observations (Baldocchi et al., 2001) from the global FLUXNET site network, and a statistical upscaling technique is used to derive estimates of the uncertainties in ORCHIDEE simulations at lower resolutions. While typical inversion systems have a resolution ranging from tens of kilometres up to several degrees (hundreds of kilometres), with the true resolution of the inverse flux estimates being even coarser, the flux observations are typically representative of an area with a radius of around 1 km. Given also the scarcity of observing sites in the flux network, the spatial information they provide is limited without upscaling methods such as the one applied by Chevallier et al. (2012). Typical approaches to upscale site-level fluxes use, for example, model tree algorithms, i.e. machine learning algorithms trained to predict carbon fluxes from meteorological data, vegetation properties and vegetation types (Jung et al., 2009; Xiao et al., 2008), or neural networks (Papale and Valentini, 2003). Nevertheless, eddy covariance measurements provide a unique opportunity to infer estimates of the prior uncertainties by examining model–data misfits for spatial and temporal autocorrelation structures.

Hilton et al. (2012) also studied the spatial structure of model–data residual errors using a geostatistical method. Their study focused on the seasonal scale, i.e. it investigated residual errors of seasonally aggregated fluxes. However, the state space (the variables to be optimised, including their temporal resolution) of current inversion systems often has a high temporal resolution (daily or even 3-hourly optimisations). Furthermore, statistical consistency between the error covariance and the state space is crucial. The error structure at the daily timescale is therefore of interest here, as it can be used in atmospheric inversions with the same temporal resolution. As in Hilton's study, we select an exponentially decaying model to fit the spatial residual autocorrelation.

In this study, we extend the approach of Chevallier et al. (2006, 2012) to a multi-model–data comparison, investigating, among other things, whether the error statistics can be generalised so that they are suitable for inversions using different biosphere models as priors. The expectation that such a generalisation is possible derives from the observation that biosphere models, despite their differences, typically share much information, such as driving meteorological fields, land use maps, remotely sensed vegetation properties, and sometimes even process descriptions. We evaluate model–model mismatches to (I) investigate intra-model autocorrelation patterns and (II) explore whether these are consistent with the spatial and temporal e-folding correlation lengths from the model–data mismatch comparisons. Model comparisons have been used in the past to infer the structure of the prior uncertainties. For example, Rödenbeck et al. (2003b) used prior correlation lengths based on statistical analyses of the variations within an ensemble of biospheric models. This approach is questionable to a certain degree, as it is unclear to what extent differences within a model ensemble are representative of differences between modelled and true fluxes. However, if a relationship between model–data and model–model statistics can be established for a region with a dense network of flux observations, it could also be used to derive the prior error structure for regions with a sparser observational network.

Moreover, to improve the knowledge of spatial flux error patterns, we make use of a unique set of aircraft fluxes measured over 2 km spatial windows along intensively sampled transects of several tens of kilometres. These data ideally resolve the spatial and temporal variability of ecosystem fluxes across the landscape, without the spatial gaps between measurement locations that limit the flux network. Lauvaux et al. (2009) compared the results of a regional inversion with flux measurements from aircraft and towers, but this is the first attempt to use aircraft flux measurements to assess spatial and temporal error correlation structures.

This study focuses on the European domain for 2007 (tower data) and 2005 (aircraft data) and uses output from high-resolution biosphere models that have been used for regional inversions. Eddy covariance tower fluxes were derived from the FLUXNET ecosystem network (Baldocchi et al., 2001), while aircraft fluxes were acquired within the CarboEurope Regional Experiment (CERES) in southern France. The methods and basic information about the models are summarised in Sect. 2. The results of the model–data and model–model comparisons are detailed in Sect. 3. Discussion and conclusions follow in Sect. 4.

Appropriate error statistics for the prior error covariance matrix are derived by comparing the output of three biosphere models, which are used as priors for regional-scale inversions, with flux data from the ecosystem network and from aircraft. We investigate spatial and temporal autocorrelation structures of the model–data residuals. The temporal autocorrelation measures the similarity between residuals at the same location but at different times, as a function of the time difference. The spatial autocorrelation refers to the correlation, at a given time, between model–data residuals at different locations, as a function of spatial distance. From this analysis, we can formulate and fit an error model, such as an exponentially decaying model, which can be used directly in the mesoscale inversion system to describe the prior error covariance.
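
As an illustration of how such a fitted error model could enter an inversion, the sketch below builds a prior error covariance matrix from exponential correlations in space and time. The separable product form (spatial correlation times temporal correlation), the uniform standard deviation `sigma`, and the planar coordinates in km are simplifying assumptions made for this example, and all names are hypothetical. The final check confirms that the resulting matrix is positive definite, as a covariance must be.

```python
import numpy as np


def exp_correlation(separation, e_folding):
    """Exponential correlation model: exp(-|separation| / e_folding)."""
    return np.exp(-np.abs(separation) / e_folding)


def prior_error_covariance(x_km, y_km, t_days, sigma, L_km, tau_days):
    """Separable space-time prior error covariance (illustrative assumption).

    x_km, y_km : planar coordinates of the flux grid cells in km
    t_days     : time stamps of the state vector in days
    sigma      : prior error standard deviation (uniform here for simplicity)
    L_km       : spatial e-folding correlation length
    tau_days   : temporal e-folding correlation time
    The state vector is ordered time-major: all cells for t0, then all for t1, ...
    """
    x_km, y_km, t_days = (np.asarray(a, float) for a in (x_km, y_km, t_days))
    dist = np.hypot(x_km[:, None] - x_km[None, :], y_km[:, None] - y_km[None, :])
    dt = t_days[:, None] - t_days[None, :]
    C_space = exp_correlation(dist, L_km)
    C_time = exp_correlation(dt, tau_days)
    # The Kronecker product of two positive definite matrices is positive definite
    return sigma**2 * np.kron(C_time, C_space)


x = [0.0, 50.0, 120.0, 300.0, 310.0]
y = [0.0, 20.0, 80.0, 10.0, 400.0]
B = prior_error_covariance(x, y, t_days=np.arange(4), sigma=2.0, L_km=40.0, tau_days=30.0)
print(B.shape, np.linalg.eigvalsh(B).min() > 0)
```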

A number of tower sites within the European domain, roughly expanding from

Eddy covariance sites measuring

Eddy covariance sites used in the study. The dashed line delimits the exact domain used to calculate the aggregated fluxes.

Additionally, we use aircraft fluxes obtained with an eddy covariance system installed on board a SkyArrow ERA aircraft (Gioli et al., 2006). Flights were made in southern France during CERES (CarboEurope Regional Experiment) from 17 May to 22 June 2005. Eddy covariance fluxes were computed over 2 km spatial windows along transects of 69 km above forest and 78 km above agricultural land, which were flown 52 and 54 times, respectively, covering the diurnal cycle. Exact routes are reported in Dolman et al. (2006).

We simulate CO₂

The Organising Carbon and Hydrology In Dynamic Ecosystems (ORCHIDEE)
model (Krinner et al., 2005) is a process-based site scale to global land
surface model that simulates the water and carbon cycle using meteorological
forcing (temperature, precipitation, humidity, wind, radiation, pressure).
The water balance is solved at a half-hourly time step while the main carbon
processes (computation of a prognostic leaf area index (LAI), allocation, respiration,
turnover) are called on a daily basis. It uses a tiled approach, with
fractional coverage for 13 plant functional types (PFTs). It has been
extensively used as prior information in regional- and global-scale inversions
(Piao et al., 2009; Broquet et al., 2013). For the present simulation, we use
a global configuration of version 1.9.6 of ORCHIDEE, in which no parameters
have been optimized against eddy covariance data. The model is forced with
0.5

The “5 parameter model” (5PM) (Groenendijk et al., 2011), also used in
atmospheric inversions (Tolk et al., 2011; Meesters et al., 2012), is a
physiological model describing transpiration, photosynthesis, and
respiration. It uses MODIS LAI at 10 km resolution,
meteorological data (temperature, moisture, and downward shortwave radiative
flux, presently from ECMWF at 0.25

Modelled fluxes for all the above-mentioned sites were provided by each model by extracting the vegetation-type-specific simulated fluxes from the grid cell that encompasses the EC station, i.e. using the vegetation type within that grid cell for which the eddy covariance site is assumed to be representative. For most sites, the site's own vegetation type was used for the extraction, provided that this vegetation type is represented within the grid cell. As VPRM uses a tile approach, in two cases (IT-Amp and IT-MBo) the represented vegetation type (crop) differs from the actual one (grass); for these cases, the crop fluxes were extracted. Fluxes were aggregated to daily values in three steps: first, fluxes from VPRM and 5PM as well as the observed fluxes were temporally aggregated to match the 3-hourly resolution of ORCHIDEE; second, gaps were created in the modelled fluxes wherever no observations were available; third, fluxes were aggregated to daily resolution provided that (a) the gaps covered less than 50 % of the day and (b) the numbers of gaps (individual missing 3-hourly values) during day and night were similar (differing by no more than a factor of two), to avoid biasing the daily means.
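
The gap-handling and daily-aggregation rules described above can be sketched as follows. This is a minimal illustration, not the processing code used in the study: it assumes the fluxes are already on a common 3-hourly grid (the first step), treats the slots from 06:00 to 18:00 as daytime (a hypothetical choice), and applies the factor-of-two rule literally, so a day whose gaps fall only in daytime or only at night is rejected; how zero gap counts were handled is not stated in the text. The function name `daily_means` is hypothetical.

```python
import numpy as np

SLOTS_PER_DAY = 8  # 3-hourly values per day


def daily_means(obs_3h, mod_3h, day_slots=(2, 3, 4, 5)):
    """Aggregate 3-hourly observed and modelled fluxes to daily means.

    obs_3h, mod_3h : 1-D arrays of length n_days * 8 on the same 3-hourly grid;
                     NaN marks a missing observation.
    day_slots      : indices of the slots treated as daytime (hypothetical
                     choice: 06-18 h, assuming the first slot starts at 00 h).
    Returns (obs_daily, mod_daily); rejected days are NaN.
    """
    obs = np.asarray(obs_3h, float).reshape(-1, SLOTS_PER_DAY)
    mod = np.asarray(mod_3h, float).reshape(-1, SLOTS_PER_DAY).copy()
    gaps = np.isnan(obs)
    mod[gaps] = np.nan  # impose the observation gaps on the model

    is_day = np.zeros(SLOTS_PER_DAY, dtype=bool)
    is_day[list(day_slots)] = True
    n_day = gaps[:, is_day].sum(axis=1)
    n_night = gaps[:, ~is_day].sum(axis=1)

    # (a) gaps cover less than 50 % of the day
    ok = gaps.sum(axis=1) < 0.5 * SLOTS_PER_DAY
    # (b) day and night gap counts differ by at most a factor of two
    ok &= np.maximum(n_day, n_night) <= 2 * np.minimum(n_day, n_night)

    obs_daily = np.full(obs.shape[0], np.nan)
    mod_daily = np.full(obs.shape[0], np.nan)
    obs_daily[ok] = np.nanmean(obs[ok], axis=1)
    mod_daily[ok] = np.nanmean(mod[ok], axis=1)
    return obs_daily, mod_daily
```

Because the model values are masked wherever the observations are missing, the observed and modelled daily means are always computed over the same 3-hourly slots.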

Spatial and temporal correlation structures and the standard deviation of
flux residuals (model–observations) were examined for daily fluxes over the
year 2007. Simulated fluxes from the different models have different
spatial resolutions, which makes comparisons difficult to interpret. For the
model–data residual analysis, the models VPRM1, VPRM10, ORCHIDEE, and 5PM were
used. We note that VPRM1, with 1 km resolution, is considered compatible with
local measurements. For the model–model analysis, we use VPRM50 (50 km
resolution) when comparing with ORCHIDEE fluxes, as both models share the
same resolution. VPRM10 is likewise considered appropriate for comparisons
with the 5PM model, as both share the same resolution (MODIS radiation
resolution of 1 km aggregated to 10 km and meteorological resolution at
0.25

For the aircraft analysis, only VPRM was used, since it is the only model
with a spatial resolution (10 km) comparable to the aircraft flux footprint and
capable of resolving spatial variability over relatively short flight
distances. Aircraft NEE data, natively at 2 km resolution along the track,
were aggregated into 10 km segments to maximise the overlap with the
VPRM grid, yielding six grid points in the forest transects and eight in the
agricultural land transects. Footprint areas of the aircraft fluxes were computed
with the analytical model of Hsieh et al. (2000), yielding an average width of
3.9 km for the footprint containing 90 % of the flux. Averaging also over the
different wind directions (perpendicular or parallel to the flight direction) and
taking into account the 10 km length of the segments, the area represented by
the aircraft flux data is around 23.5 km²

Observed and modelled fluxes can each be represented as the sum of the true flux and an error term (the observation error and the model error, respectively). When we compare modelled to observed data, the mismatch is therefore a combination of the model error (the prior uncertainty we are interested in) and the observation error. The two cannot be separated in the statistical analysis of the model–observation mismatch; the estimated e-folding correlation lengths therefore include the observation error term. Nevertheless, in the analysis of model–model differences we assess the impact of the observation error on the estimated e-folding correlation lengths.
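
The consequence of this mixing can be made explicit with a short derivation, under the assumptions (introduced here for illustration) that model and observation errors are independent of each other and that the observation error is white noise, i.e. uncorrelated between distinct sites and times. The notation ($\epsilon$ for errors, $r$ for residuals, $i \neq j$ for two distinct sites or times) is ours:

```latex
\begin{aligned}
F^{\mathrm{obs}} &= F^{\mathrm{true}} + \epsilon^{\mathrm{obs}}, \qquad
F^{\mathrm{mod}} = F^{\mathrm{true}} + \epsilon^{\mathrm{mod}},\\
r &= F^{\mathrm{mod}} - F^{\mathrm{obs}} = \epsilon^{\mathrm{mod}} - \epsilon^{\mathrm{obs}},\\
\operatorname{Var}(r_i) &= \sigma_{\mathrm{mod},i}^2 + \sigma_{\mathrm{obs},i}^2,\\
\operatorname{Cov}(r_i, r_j) &= \operatorname{Cov}(\epsilon^{\mathrm{mod}}_i, \epsilon^{\mathrm{mod}}_j), \qquad i \neq j,\\
\rho_r(i,j) &= \rho_{\mathrm{mod}}(i,j)\,
\frac{\sigma_{\mathrm{mod},i}\,\sigma_{\mathrm{mod},j}}
{\sqrt{(\sigma_{\mathrm{mod},i}^2 + \sigma_{\mathrm{obs},i}^2)\,(\sigma_{\mathrm{mod},j}^2 + \sigma_{\mathrm{obs},j}^2)}}.
\end{aligned}
```

Under these assumptions, white observation noise does not create spurious correlation between residuals, but it scales all correlations at non-zero separation down by a common factor (for equal variances, $\sigma_{\mathrm{mod}}^2/(\sigma_{\mathrm{mod}}^2 + \sigma_{\mathrm{obs}}^2)$). This acts like a nugget effect and leaves the e-folding scale of an exponential fit that includes a nugget term unchanged.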

The tower temporal autocorrelation is computed between the time series of
model–observations differences

In the current analysis we introduce the all-site temporal autocorrelation
by simultaneously computing the autocorrelation for all the observation
sites, where

The aircraft temporal autocorrelation was computed similarly according to Eq. (1) using VPRM, and the same exponentially decaying model (Eq. 3) was fitted to the individual flight flux data. The maximum time lag was limited to 36 days by the duration of the experiment.
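
A minimal Python sketch of the temporal part of this procedure is given below. Because Eqs. (1) and (3) are not reproduced in this section, two assumptions are made: the all-site autocorrelation is computed by pooling the lagged residual pairs of all sites into a single Pearson correlation, and the decay model with nugget is taken as $(1 - a)\,e^{-\Delta t/\tau}$ for $\Delta t > 0$. The function names are hypothetical; the fit uses `scipy.optimize.curve_fit`.

```python
import numpy as np
from scipy.optimize import curve_fit


def pooled_lag_correlation(residual_series, lag):
    """Pearson correlation between r(t) and r(t + lag), pooled over sites.

    residual_series : list of 1-D arrays of daily residuals (one per site),
                      NaN marking gaps. A single-element list gives the
                      site-level autocorrelation.
    Pooling all sites into one correlation is an assumed reading of the
    "all-site" autocorrelation.
    """
    lead, lagged = [], []
    for r in residual_series:
        r = np.asarray(r, float)
        a, b = (r, r) if lag == 0 else (r[:-lag], r[lag:])
        valid = ~np.isnan(a) & ~np.isnan(b)  # keep pairs where both days exist
        lead.append(a[valid])
        lagged.append(b[valid])
    return np.corrcoef(np.concatenate(lead), np.concatenate(lagged))[0, 1]


def decay_with_nugget(dt, tau, nugget):
    """Assumed form of the decay model: (1 - nugget) * exp(-dt / tau)."""
    return (1.0 - nugget) * np.exp(-dt / tau)


def fit_efolding_time(lags, corr, tau0=30.0, nugget0=0.1):
    """Least-squares fit of (tau, nugget); lag 0 is excluded since r(0) = 1."""
    lags = np.asarray(lags, float)
    corr = np.asarray(corr, float)
    keep = lags > 0
    popt, _ = curve_fit(decay_with_nugget, lags[keep], corr[keep],
                        p0=(tau0, nugget0))
    return popt  # array([tau, nugget])
```

In use, one would evaluate `pooled_lag_correlation` for each lag (e.g. 0 to 90 days for towers, or up to 36 days for the aircraft data) and pass the resulting curve to `fit_efolding_time`. Excluding lag 0 from the fit lets the nugget term absorb the drop from 1 at zero separation.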

For the spatial analysis, the correlation between model–observation residuals at two different locations (i.e. sites or aircraft grid points) separated by a specific distance was computed in a way similar to the temporal correlation, using all possible pairs of sites and all possible pairs of aircraft grid points. Additional data treatment was applied in the spatial analysis to reduce the impact of tower data gaps, as the time series of two sites may have missing data at different times. To obtain more robust results, we therefore also examined spatial structures after setting a minimum threshold of 150 days of overlapping observations for each site pair. Furthermore, the spatial correlation was investigated for seasonal dependence, with seasons defined as summer (JJA), autumn (SON), winter (DJF of the same year), and spring (MAM). In this case, a lower threshold of 20 days of overlapping observations was applied. We note that we do not intend to investigate the errors at the seasonal scale but rather to study whether different seasons give rise to different error correlation structures.
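
The pairwise step, including the minimum-overlap threshold, could look as follows. The haversine great-circle distance and the requirement of at least three common days are assumptions made for this sketch (the text does not specify how distances were computed), and the function names are hypothetical.

```python
import itertools

import numpy as np

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between points given in decimal degrees."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlam = np.radians(lon2 - lon1)
    h = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))


def pairwise_residual_correlations(residuals, lats, lons, min_overlap=0):
    """Correlation of daily residual series for every pair of sites.

    residuals   : array (n_sites, n_days); NaN marks missing days
    min_overlap : minimum number of days on which both sites have data
                  (150 for the annual robustness test, 20 per season)
    Returns a list of (distance_km, correlation) tuples.
    """
    residuals = np.asarray(residuals, float)
    pairs = []
    for i, j in itertools.combinations(range(residuals.shape[0]), 2):
        both = ~np.isnan(residuals[i]) & ~np.isnan(residuals[j])
        if both.sum() < max(min_overlap, 3):  # need a few common days at least
            continue
        r = np.corrcoef(residuals[i, both], residuals[j, both])[0, 1]
        d = great_circle_km(lats[i], lons[i], lats[j], lons[j])
        pairs.append((float(d), float(r)))
    return pairs
```

For the seasonal analysis, the same function would be applied to the residual columns of each season separately with `min_overlap=20`.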

To estimate the spatial correlation scales, the pairwise correlations were
grouped into distance bins (dist) of 100 km for towers and 10 km for aircraft
data. The median of each bin was then calculated, and a model similar to
Eq. (3), but omitting the nugget effect term, was fitted:

Confidence intervals for the estimated model parameters were computed from the profile likelihood (Venzon and Moolgavkar, 1988), as implemented in the “confint” function of the MASS package for the R statistical language.
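
A sketch of the binning and fitting step is shown below, assuming that the nugget-free model is the pure exponential $r(d) = e^{-d/L}$, that each bin is represented by its midpoint, and that empty bins are skipped. Note that it reports an asymptotic standard error from `scipy.optimize.curve_fit` rather than the profile-likelihood intervals of the `confint` function used in the study; these are different techniques and need not agree. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def binned_medians(distances_km, correlations, bin_width_km=100.0):
    """Median correlation per distance bin; bins are represented by midpoints."""
    d = np.asarray(distances_km, float)
    r = np.asarray(correlations, float)
    idx = np.floor(d / bin_width_km).astype(int)
    bins = np.unique(idx)  # only non-empty bins
    centres = (bins + 0.5) * bin_width_km
    medians = np.array([np.median(r[idx == k]) for k in bins])
    return centres, medians


def exp_decay(d, L):
    """Nugget-free exponential model r(d) = exp(-d / L) (assumed form)."""
    return np.exp(-d / L)


def fit_efolding_length(centres_km, medians, L0=50.0):
    """Fit L to the binned medians; returns (L, asymptotic standard error)."""
    popt, pcov = curve_fit(exp_decay, centres_km, medians, p0=(L0,))
    return popt[0], np.sqrt(pcov[0, 0])
```

For the aircraft data, the same functions would be called with `bin_width_km=10.0`.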

Since aircraft fluxes obviously cannot be measured simultaneously at different locations, and given the relatively short flight duration (about 1 h), we treated each aircraft flux transect as an instantaneous “snapshot” of the spatial flux pattern across the landscape, neglecting any temporal variability that may have occurred during the flight.

We evaluate both model–data flux residuals and model–model differences in the
form of pairwise model comparisons, in order to assess whether model–model
differences can be used as a proxy for the prior uncertainty, assuming that
models have independent prior errors. To minimise the potential influence of
differing spatial resolutions on the estimated correlation lengths, we compare
pairs that have comparable spatial resolution, namely VPRM50–ORCHIDEE and
VPRM10–5PM. We consider VPRM10 more representative for comparison with 5PM,
as 5PM fluxes also have a resolution of 10 km (the main driver, MODIS
radiation, has the same 10 km resolution for both models). As in the
model–observation analysis, the statistical analysis yields the combined
effect of both models' errors. We assess how the observation error affects the
difference in error structure between model–observation and model–model
comparisons by adding a random measurement error to each model–model
comparison. This error has the same characteristics as the observation error
typically associated with eddy covariance observations; the error
characteristics were derived from the paired observation approach
(Richardson et al., 2008). Specifically, we implement the flux observation
error as a random process (white noise) with a double-exponential probability
density function. This can be achieved
by selecting a random variable
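
Because the sentence describing the sampling procedure is incomplete here, the snippet below is only an assumed implementation of the idea: white noise drawn from a Laplace (double-exponential) distribution, whose scale parameter $b$ relates to the standard deviation by $\sigma = \sqrt{2}\,b$. The dependence of $\sigma$ on the flux is passed in as a user-supplied function (its form and coefficients are not given in this section), and the `factor` argument mimics the inflation of the error by a factor between 1 and 15 used in the sensitivity test. The function name is hypothetical.

```python
import numpy as np


def add_laplace_obs_error(flux, sigma_fn, factor=1.0, rng=None):
    """Return flux plus white Laplace (double-exponential) noise.

    flux     : array of modelled reference fluxes; NaN gaps stay NaN
    sigma_fn : callable mapping fluxes to the noise standard deviation
               (user-supplied, hypothetical)
    factor   : multiplier on the standard deviation (1 = nominal error)
    """
    rng = np.random.default_rng() if rng is None else rng
    flux = np.asarray(flux, float)
    # Zero spread where the flux is missing; NaN + noise stays NaN anyway
    sigma = factor * np.nan_to_num(sigma_fn(flux), nan=0.0)
    # A Laplace variable with scale b has standard deviation sqrt(2) * b
    noise = rng.laplace(loc=0.0, scale=sigma / np.sqrt(2.0))
    return flux + noise
```

Drawing a fresh, independent sample for every value keeps the added error uncorrelated in space and time, which is what makes it white noise.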

Observed daily averaged NEE fluxes, for all ground sites and the full
time series, yield a standard deviation of
3.01

The residual distribution of the models, defined as the difference between
simulated and observed daily flux averages for the full year 2007, was found
to have a standard deviation of 2.47, 2.49, 2.7, and
2.25

Box and whisker plot for site-specific correlation coefficients
between modelled and observed daily fluxes as a function of the vegetation
type. The numbers beneath the

The distribution is biased by

Box and whisker plot for the annual site-specific biases of the
models differentiated by vegetation type. Units at

Temporal lagged autocorrelation from model–data daily averaged NEE
residuals for all models. Thin red lines correspond to different sites,
while the thin blue lines indicate the sites with a bias larger than

The temporal autocorrelation of the model–data residuals was calculated for each of the flux sites (“site data” in Fig. 4) and for the full data set (“all-site” in Fig. 4). The all-site temporal autocorrelation structure of the residuals shows the same pattern for all models. It decays smoothly for time lags up to 3 months and then remains near 0 or at small negative values. The temporal autocorrelation increases again for time lags > 10 months, owing to the seasonal cycle. These temporal autocorrelation results agree with the findings of Chevallier et al. (2012).

The exponentially decaying model in Eq. (3) was used to fit the data. At 0
separation time (

Annual temporal autocorrelation times in days, from model–data and model–model residuals. Numbers in brackets give the correlation times when sites with large model–data bias are excluded from the analysis.

For a number of sites, a large model–data bias was found. To assess how
strongly the result depends on individual sites with more strongly biased
model–data residuals, the analysis was repeated excluding sites with
an annual mean model–data flux residual larger than
2.5
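
The site-exclusion criterion can be expressed compactly as below. We assume that the threshold applies to the absolute value of the annual mean residual, so that large negative biases are also excluded; the units of the 2.5 threshold are truncated in this copy of the text, so the residuals simply need to be in the same units as the threshold. The function name is hypothetical.

```python
import numpy as np


def sites_without_large_bias(residuals, threshold=2.5):
    """Indices of sites whose annual mean residual magnitude is <= threshold.

    residuals : array (n_sites, n_days) of daily model-data residuals, NaN = gap
    threshold : bias threshold, in the same units as the residuals
    """
    bias = np.nanmean(np.asarray(residuals, float), axis=1)  # annual mean per site
    return np.flatnonzero(np.abs(bias) <= threshold)
```

The temporal and spatial analyses can then be repeated on the rows selected by the returned indices.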

The temporal correlation of the differences between VPRM10 and aircraft flux measurements could be computed for time intervals of up to 36 days (Fig. 5), corresponding to the duration of the campaign. The correlation decreases exponentially and levels off after about 25 days, with an e-folding correlation time of 13 days (10–16 days within the 95 % confidence interval). While the general behaviour is consistent with the results obtained for VPRM–observation residuals at the flux sites, the correlation time is roughly half as long.

Temporal autocorrelation for VPRM10–aircraft NEE residuals. Black dots represent individual pairs of flux transects sampled at different times, as a function of time separation. Black circles represent daily-scale binned data.

Regarding spatial error correlations, results for all models show a
dependence on the distance between pairs of sites: the median correlation
drops sharply within very short distances (Fig. 6). Fitting the simple exponentially
decaying model (Eq. 4) to the correlation as a function of distance we find
an e-folding correlation length

Distance correlogram for the daily net ecosystem exchange (NEE) residuals using all sites. Black dots represent the different site pairs; the blue line represents the median value of the points per 100 km bin and the green line an exponential fit. Results are shown for residuals of VPRM at resolutions of 10 km (top left) and 1 km (top right), ORCHIDEE (bottom left), and 5PM (bottom right).

Annual and seasonal e-folding correlation length of the daily
averaged model–data NEE residuals for VPRM at 10 and 1 km resolution,
ORCHIDEE and 5PM. S refers to the standard case where all pairs were used,
D refers to the case where only pairs with different vegetation types
were used, I denotes the case in which only pairs with identical vegetation
type were considered, and

Interestingly, if we restrict the analysis to site pairs with at least 150
overlapping days, larger correlation scales are found
(case S

The seasonal dependence of the e-folding correlation lengths, using all site pairs with at least 20 overlapping days per season, is also shown in Fig. 7. VPRM showed somewhat longer correlation lengths in spring and summer, ORCHIDEE had its largest lengths in summer and autumn, and the 5PM e-folding correlation lengths were slightly enhanced in spring and summer. However, none of these seasonal differences is significant with respect to the 95 % confidence interval.

The spatial error correlation between VPRM10 and aircraft fluxes
measured during May–June along continuous transects over forest and
agricultural land (Fig. 8) shows an exponential decay up to the maximum
distance covered during the flights (i.e. 70 km). Note that
only two measurement pairs were available at 60 km distance and none at larger
distances, which makes it difficult to identify where the asymptote lies.
Nevertheless, fitting the decay model (Eq. 4) leads to

Distance correlogram between VPRM10 and aircraft NEE measurements. Black dots represent the different pairs of aircraft grid points; black circles represent 10 km scale binned data.

We investigate the model–model error structure of NEE estimates by replacing
the observed fluxes, which were used as reference, with simulated fluxes from
each of the biosphere models. Note that, for consistency with the model–data
analysis, the simulated fluxes contained the same gaps as the observed flux
time series. For most cases, the e-folding correlation time is slightly larger
than the model–data correlation times. An exception is the 5PM–VPRM10 pair,
which produced a remarkably larger correlation time (Table 2). Specifically,
VPRM50–ORCHIDEE and VPRM10–5PM residuals show correlation times of 28 days
(range 24–32 days within the 95 % confidence interval) and 131 days
(range 128–137 days within the 95 % confidence interval), respectively.
A significantly different e-folding correlation time is found for VPRM50–5PM
than for VPRM10–5PM, namely 52 days (range 49–56 days within the 95 %
confidence interval). Repeating the analysis excluding sites with residual
bias larger than 2.5

Although the e-folding correlation times show only minor differences compared to the model–data residuals, this is not the case for the spatial correlation lengths (Fig. 9). The standard case (S) was applied for the annual analysis, with no minimum number of days of overlapping non-missing data for the sites within each pair. Taking VPRM50 as reference, much larger e-folding correlation lengths were found: 371 km (286–462 km within the 95 % confidence interval) for the VPRM50–ORCHIDEE comparison and 1066 km for VPRM50–5PM. However, the VPRM10–5PM analysis, which, unlike the VPRM50–5PM pair, is considered appropriate in terms of spatial resolution compatibility, is in good agreement with the VPRM50–ORCHIDEE spatial scale (best fit 335 km, 230–440 km within the 95 % confidence interval). With ORCHIDEE as reference, the e-folding correlation length for the ORCHIDEE–5PM comparison is 276 km (183–360 km within the 95 % confidence interval). However, the latter correlation length might be affected by the different spatial resolutions, as the difference between VPRM10 and VPRM50 against 5PM suggests. Seasonal e-folding correlation lengths, using a minimum overlap of 20 days per site pair and season (Fig. 9), are also significantly larger than those from the model–data analysis.

Annual and seasonal e-folding correlation length for an ensemble of
daily averaged NEE differences between two models without (filled circle) and
with random measurement errors added to the modelled fluxes used as reference
(crosses). The symbols represent the best-fit value when fitting the
exponential model, and the upper and lower edge of the error bars show the
2.5 and 97.5 percentiles of the correlation length. The first acronym at the
legend represents the model used as reference and the second the model which
was compared with. Note that for the VPRM10/VPRM1 case during spring (with
and without random error), the 97.5 percentile of the length value exceeds
the

When we add the random measurement error to the modelled fluxes used as reference (crosses in Fig. 9), we observe only slight changes in the annual e-folding correlation lengths, without a clear pattern: the correlation lengths increase or decrease by at most 6 %. Interestingly, for most cases the seasonal e-folding correlation lengths show a clearer decrease. For example, the correlation length of the VPRM10–5PM residuals decreases by 22 % in winter and by even more in spring. Despite this decrease, the seasonal e-folding correlation lengths remain significantly larger than those from the model–data analysis. Overall, all models used as reference show the same behaviour, with large e-folding correlation lengths that mostly decrease slightly when the random measurement error is included. Although the random measurement error was added to the modelled fluxes as the “missing part” to better mimic actual flux observations, it did not lead to correlation lengths similar to those from the model–data residual analysis. To investigate whether a larger random measurement error could bring the spatial correlation scales of the model–model differences close to those of the model–data residuals, we repeated the analysis with an artificially increased random measurement error (multiplied by a factor between 1 and 15). Only for very large random measurement errors did the model–model e-folding correlation lengths begin to coincide with those of the model–data residuals (Fig. 10).

Annual e-folding correlation lengths as a function of the factor used for scaling the random measurement error, for all model–model combinations. The black dot-dash lines indicate the range of the spatial correlation lengths obtained from the model–data comparisons.

We analysed the error structure of a priori NEE uncertainties derived from a
multi-model–data comparison by comparing fluxes simulated by three
different vegetation models to daily averages of observed fluxes from 53
sites across Europe, categorised into seven land cover classes. The different
models showed comparable performance with respect to reproducing the observed
fluxes; we found mostly insignificant differences in the mean of the
residuals (bias) and in the variance. Site-specific correlations between
simulated and observed fluxes are significantly higher than overall
correlations for all models, which suggests that the models struggle to
reproduce observed spatial flux differences between sites. Furthermore, the
site-specific correlations show a large spread even within the same
vegetation class, especially for crops (Fig. 2). This is likely because
none of the models uses a dedicated crop model that differentiates
between the different crop types and their phenology. The models using
remotely sensed vegetation indices (VPRM and 5PM) better capture the
phenology; ORCHIDEE is the only model that differentiates between

Model–data flux residual correlations were investigated to gain insight into the prior error temporal scales that can be adopted by atmospheric inversion systems. While ORCHIDEE fluxes are at a much coarser resolution than the area represented by the flux measurements, VPRM1 fluxes (1 km resolution, with only the meteorology at 25 km) are considered appropriate for the comparisons. Despite the scale mismatch, the results are in good agreement across all model–data pairs.

Exponentially decaying correlation models are the dominant technique in
atmospheric inverse studies for representing temporal and spatial flux
autocorrelations (Rödenbeck et al., 2009; Broquet et al., 2011, 2013).
However, regarding the temporal error structure, we note that this model
cannot capture the slightly negative values at 2–10 month lags
and, more importantly, the increase in correlations for lag times larger than
about 10 months. Error correlations were parameterized differently by
Chevallier et al. (2012), who investigated the prior error without
implementing it in atmospheric inversions; they used polynomial and hyperbolic
equations to fit the temporal and spatial correlations, respectively.
Nevertheless, we use e-folding lengths here, not only because they describe
the temporal correlation structure simply, with a single number, but also
because this error model ensures a positive definite covariance matrix (as
required for a covariance). This is crucial for atmospheric inversions, as
otherwise negative spatially and temporally integrated uncertainties may be
introduced. In addition, it keeps computational costs low, because the
hyperbolic equation retains significant correlations at larger distances: for
the case of the VPRM1 model, at 200 km distance the correlation according to
the hyperbolic equation of Chevallier et al. (2012) is 0.16, compared to 0.004
for the exponential model. As a consequence, more non-zero elements are
introduced into the covariance matrix, which increases computational costs in the inversion
systems. Using the same hyperbolic equation for the
spatial correlation,
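
As a consistency check on the numbers quoted above, and assuming the exponential model has the form $r(d) = e^{-d/L}$, a correlation of 0.004 at 200 km implies the e-folding length

```latex
e^{-200\,\mathrm{km}/L} = 0.004
\;\Longrightarrow\;
L = \frac{200\,\mathrm{km}}{\ln 250} \approx \frac{200\,\mathrm{km}}{5.52} \approx 36\ \mathrm{km},
```

which lies within the 31–40 km range reported for the model–data residuals. At the same distance, the hyperbolic fit retains a correlation of 0.16, i.e. 40 times larger, which is why it populates many more off-diagonal elements of the covariance matrix.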

The autocorrelation times were found to be in line with the findings of Chevallier et al. (2012). The model–data residuals have e-folding times of 32 and 26 days for VPRM and ORCHIDEE, respectively, and 70 days for 5PM. This significant difference appears to depend strongly on the set of sites used in the analysis: when nine sites with large residual bias are excluded, the autocorrelation time of the 5PM–data residuals decreases drastically and becomes coherent with those of the other biosphere models. The all-model, all-site autocorrelation time was found to be 39 days, which reduces to 30 days (28–31 days within the 95 % confidence interval) when the sites with large residual bias are excluded, coherent with the single-model times. In the model–model residual correlation analysis, the correlation times appear to be consistent with the above-mentioned results and lie between 28 and 46 days for most of the ensemble members. However, model–model pairs consisting of the VPRM and 5PM models produced longer times of up to 131 days; omitting sites with large residual biases reduces this to 100 days (99–105 days within the 95 % confidence interval). This finding could be attributed to the fact that, despite their conceptual differences, these two models have some properties in common. Both models were optimized against eddy covariance data, although for different years (2005 and 2007, respectively), while no eddy covariance data were used for the optimisation of ORCHIDEE. In addition, VPRM and 5PM both use MODIS data, although they estimate photosynthetic fluxes using different reflectance indices. Summarising the temporal correlation structure, it appears reasonable to (a) use the same error correlation in atmospheric inversions regardless of which biospheric model is used as prior and (b) use an e-folding correlation time of around 30 days.

Only weak spatial correlations were found for model–data residuals, limited to short lengths of up to 40 km without any significant difference between the biospheric models (31–40 km), comparable to those identified by Chevallier et al. (2012). In contrast, Hilton et al. (2012) estimated spatial correlation lengths of around 400 km. However, there are significant differences between our study and Hilton et al. (2012), both in the methods used and in the landscape heterogeneity of the domain of interest. Regarding the methods, Hilton et al. (2012) used a much coarser time resolution (seasonally averaged flux residuals) than the daily averaged residuals used here. Furthermore, they used spatial bins of 300 km for the autocorrelation analysis, far larger than the approximately 100 km bin width used in our study. Regarding landscape heterogeneity, North America has a more homogeneous landscape than the European domain: the spatial scales of individual ecosystem types (e.g. forests, agricultural land) are much larger than those in Europe, as can be seen from MODIS retrievals (Friedl et al., 2002).
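A spatial correlation length of this kind can be obtained by correlating the daily residuals of every site pair, averaging the pair correlations in distance bins (100 km wide here, matching the bin width mentioned above) and fitting an exponential decay exp(−d/L) to the bin means. The sketch below makes several assumptions: the helper names are hypothetical, distances are great-circle distances on a spherical Earth, bins with non-positive mean correlation are dropped, and the simple exponential fit is not the hyperbolic form referred to elsewhere in the paper.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlam = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def pair_correlations(lats, lons, residuals, min_overlap=3):
    """Distance and Pearson correlation of daily residuals for every site pair.

    residuals has shape (n_sites, n_days); NaN marks a data gap.
    """
    dists, corrs = [], []
    n_sites = len(lats)
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            ok = ~np.isnan(residuals[i]) & ~np.isnan(residuals[j])
            if np.count_nonzero(ok) < min_overlap:
                continue
            dists.append(haversine_km(lats[i], lons[i], lats[j], lons[j]))
            corrs.append(np.corrcoef(residuals[i, ok], residuals[j, ok])[0, 1])
    return np.array(dists), np.array(corrs)


def binned_correlogram(dists, corrs, bin_km=100.0, max_km=1000.0):
    """Mean pair correlation per distance bin; returns (bin centres, bin means)."""
    edges = np.arange(0.0, max_km + bin_km, bin_km)
    centres, means = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (dists >= lo) & (dists < hi)
        if in_bin.any():
            centres.append(0.5 * (lo + hi))
            means.append(corrs[in_bin].mean())
    return np.array(centres), np.array(means)


def efolding_length(centres, means):
    """Fit r(d) = exp(-d / L) to bins with positive mean correlation; return L in km."""
    ok = means > 0
    slope = np.sum(centres[ok] * np.log(means[ok])) / np.sum(centres[ok] ** 2)
    return -1.0 / slope
```

Typical use chains the three steps: `efolding_length(*binned_correlogram(*pair_correlations(lats, lons, residuals)))`, where `residuals` is a sites × days array of model–data differences.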

Although the estimated spatial scales are shorter than the spatial resolution that we are solving for (100 km bins), the autocorrelation analysis of aircraft measurements made during CERES supports the short-scale correlations. These measurements have the advantage of providing continuous spatial flux transects along specific tracks that were sampled routinely (in this case over a period of 36 days at various times of the day), thus also resolving flux spatial variability at small scales, where pairs of eddy covariance sites may not be sufficiently close. However, aircraft surveys are necessarily sporadic in time. Note also that the eddy covariance observation error has no significant impact on the error structure, as adding an observation error to the analysis of model–model differences had only a minor influence on it. Finally, the current analysis focuses on daily timescales, and therefore the error statistics, in particular the estimated spatial and temporal e-folding correlation lengths, are valid for such scales.

Model–data residual e-folding correlation lengths show a clear difference
between the cases in which only pairs with different (D) or only pairs with
identical (I) PFTs were considered: the latter yields longer correlation
lengths, but only for the VPRM model (at both resolutions). The D case gives
slightly shorter lengths than the standard case (S) for all models. One could
argue that, as VPRM uses PFT-specific parameters that were optimized against
2005 observations, the resulting PFT-specific bias could lead to longer
spatial correlations. However, ORCHIDEE and 5PM show comparable biases
(Fig. 3), yet no long correlation scales were found for them. Moreover, we
repeated the spatial analysis after subtracting the PFT-specific bias from the
fluxes, and the resulting correlation lengths did not change significantly.
The impact of data gaps was also investigated by imposing a minimum number of
overlapping observations between site pairs. Setting this threshold to 150
days increases the S-case correlation length to up to 60 km, but only for the
VPRM model.
For the D and I cases when setting the same threshold value (D
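The pair-selection cases (S, D and I), the minimum-overlap threshold and the PFT-specific bias removal described above can be expressed as simple filters applied before the spatial correlation analysis. In the sketch below, `select_pairs` and `remove_pft_bias` are hypothetical helpers; the PFT labels are placeholders, and the PFT-specific bias is assumed to be the mean residual over all sites and days sharing a PFT.

```python
import numpy as np


def select_pairs(pfts, residuals, case="S", min_overlap=0):
    """Site pairs (i, j) allowed under case S (all pairs), D (different PFT)
    or I (identical PFT), keeping only pairs with at least `min_overlap`
    days observed at both sites. residuals: (n_sites, n_days), NaN = gap."""
    if case not in ("S", "D", "I"):
        raise ValueError(f"unknown case {case!r}")
    valid = ~np.isnan(residuals)
    pairs = []
    n_sites = len(pfts)
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            same = pfts[i] == pfts[j]
            if (case == "D" and same) or (case == "I" and not same):
                continue
            if np.count_nonzero(valid[i] & valid[j]) < min_overlap:
                continue
            pairs.append((i, j))
    return pairs


def remove_pft_bias(pfts, residuals):
    """Subtract from each site the mean residual of all sites sharing its PFT."""
    pfts = np.asarray(pfts)
    out = np.array(residuals, dtype=float)
    for pft in np.unique(pfts):
        rows = pfts == pft
        out[rows] -= np.nanmean(residuals[rows])
    return out
```

For example, `select_pairs(pfts, residuals, case="S", min_overlap=150)` would mimic the 150-day overlap threshold for the standard case.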

The analysis of model–model differences did not reproduce the spatial scales
obtained from the model–data differences; instead, the spatial e-folding
correlation lengths were dramatically larger. Adding a random measurement
error to the modelled fluxes used as reference slightly reduced the spatial
correlation lengths, to values ranging from 278 to 1058 km. Even when the
measurement error was strongly inflated, the resulting spatial correlation
lengths (Fig. 10) still did not approach those derived from model–data
residuals.
model–data residuals. Only when the measurement error is scaled up by a
factor of 8 or larger (which is quite unrealistic as this corresponds to a
mean error of 1.46

The large e-folding correlation lengths obtained from this model–model residual analysis suggest that the models are more similar to each other than to the observed terrestrial fluxes, at least on spatial scales up to a few hundred kilometres, regardless of their conceptual differences. This is to some extent expected, because the models share several elements. Respiration and photosynthetic fluxes are strongly driven by temperature and downward radiation, respectively, and these meteorological fields are largely common to the different models: VPRM and 5PM both use temperature and radiation from ECMWF analyses and short-term forecasts, while the WFDEI temperature and radiation fields used in ORCHIDEE are essentially derived from the ERA-Interim reanalysis, which is also based on the Integrated Forecasting System (IFS) used at ECMWF (Dee et al., 2011). Regarding vegetation classification, all models use the site-specific vegetation type and therefore the same PFT for each corresponding grid cell. Finally, both VPRM and 5PM derive photosynthetic fluxes from MODIS products (EVI and LSWI in VPRM, LAI and albedo in 5PM).

Using full flux fields from the model ensemble (rather than fluxes only at the locations of observation sites) to assess spatial correlations in model–model differences is not expected to give significantly different results, as the sites are representative of a wide range of geographic locations and vegetation types within the domain investigated here.
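The model–model test discussed in this section treats one model's fluxes as pseudo-observations and optionally perturbs them with random measurement error scaled by a chosen factor. The sketch below assumes a hypothetical error model of independent Gaussian noise with standard deviation `scale * obs_sigma`; the function name and parameters are illustrative and do not reproduce the study's actual error specification.

```python
import numpy as np


def model_model_residuals(flux_ref, flux_other, obs_sigma, scale=1.0, rng=None):
    """Residuals of one model against another model used as pseudo-observations.

    Independent Gaussian noise with standard deviation scale * obs_sigma is added
    to the reference fluxes to mimic eddy covariance measurement error
    (hypothetical error model). Arrays have shape (n_sites, n_days).
    """
    rng = np.random.default_rng() if rng is None else rng
    flux_ref = np.asarray(flux_ref, dtype=float)
    noise = rng.normal(0.0, scale * obs_sigma, size=flux_ref.shape)
    return np.asarray(flux_other, dtype=float) - (flux_ref + noise)


if __name__ == "__main__":
    # Two sites whose model-model differences share a common daily signal:
    # inflating the added noise weakens the correlation between their residuals.
    rng = np.random.default_rng(0)
    common = rng.standard_normal(5000)
    ref = np.zeros((2, common.size))
    other = np.vstack([common, common])
    for k in (0.0, 1.0, 8.0):
        r = model_model_residuals(ref, other, obs_sigma=1.0, scale=k, rng=rng)
        print(k, np.corrcoef(r[0], r[1])[0, 1])
```

Uncorrelated noise can only dilute a shared signal, which is why inflating it lowers the apparent spatial correlation without changing the underlying model–model similarity.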

The current study aimed to provide insight into the error structure that can
be used for atmospheric inversions. Typically, inversion systems have a pixel
size ranging from 10 to 100 km for regional and continental inversions, and
as large as several degrees (hundreds of kilometres) for global inversions. If a
higher-resolution system assumes small-scale correlations in the prior error
covariance matrix, such as those found in the current analysis, the prior
uncertainty becomes very small when aggregated over large areas and long time
periods, because nearly independent errors in many grid cells partially cancel
when summed. To aggregate the uncertainty to large temporal and spatial scales,
we used the following equation (after Rodgers, 2000):

Whilst the temporal scales found in this study have already been used in inversion studies, to the best of our knowledge this is not the case for the short spatial scales. The impact of the prior error structure derived from this analysis on posterior flux estimates and uncertainties will be assessed in a subsequent paper. For that purpose, the findings from this study are currently being implemented in three different regional inversion systems, with a focus on network design for the ICOS atmospheric network.
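The aggregation argument made earlier in this section can be illustrated numerically. The sketch below relies on the standard result that the variance of a sum of errors equals the sum of all entries of their covariance matrix. It assumes, hypothetically, a uniform prior error σ per grid cell and day with separable exponential correlations, exp(−d/L) in space and exp(−Δt/τ) in time, and it is not the equation used in the study. Because the covariance is then σ² times the Kronecker product of a temporal and a spatial correlation matrix, the sum over all its entries equals σ² times the product of the two matrix sums, so the full matrix never has to be formed.

```python
import numpy as np


def aggregated_sigma(sigma, coords_km, days, corr_len_km, corr_time_days):
    """Standard deviation of the summed flux error over all cells and days.

    sigma_agg^2 = sum_ij B_ij with B = sigma^2 * kron(C_time, C_space),
    using assumed exponential correlations exp(-d / L) and exp(-dt / tau).
    """
    coords = np.asarray(coords_km, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    c_space = np.exp(-d / corr_len_km)
    t = np.arange(days, dtype=float)
    c_time = np.exp(-np.abs(t[:, None] - t[None, :]) / corr_time_days)
    # The sum of all entries of a Kronecker product is the product of the sums.
    return sigma * np.sqrt(c_space.sum() * c_time.sum())


if __name__ == "__main__":
    # Hypothetical 10 x 10 grid of 100 km cells, one year, tau = 30 days.
    xs, ys = np.meshgrid(np.arange(10) * 100.0, np.arange(10) * 100.0)
    grid = np.column_stack([xs.ravel(), ys.ravel()])
    for length in (40.0, 400.0):
        print(length, aggregated_sigma(1.0, grid, 365, length, 30.0))
```

Comparing L = 40 km with L = 400 km in this toy setting shows how much smaller the aggregated prior uncertainty becomes when only short spatial correlations are assumed.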

The research leading to these results has received funding from the European Community's Seventh Framework Program ([FP7/2007–2013]) under grant agreement no. 313169 (ICOS-INWIRE) and 283080 (GEOCARBON). The EC data used in this study were funded by the European Community's Sixth and Seventh Framework programs and by national funding agencies. The following sites are listed, sorted by project/funding agency: BE-Bra, BE-Lon, BE-Vie, CH-Lae, CH-Oe1, CH-Oe2, CZ-BK1, DE-Geb, DE-Gri, DE-Hai, DE-Kli, DE-Tha, DK-Lva, ES-ES2, ES-LMa, FI-Hyy, FR-Fon, FR-Hes, FR-LBr, FR-Lq1, FR-Lq2, FR-Pue, IT-SRo, NL-Dij, NL-Loo, PT-Esp, PT-Mi2, SE-Kno, SE-Nor, SE-Sk1, SK-Tat, UK-AMo, and UK-EBu are funded by CarboEuropeIP (grant agreement no. GOCE-CT-2003-505572); UK-AMo and UK-EBu are also co-funded by EU FP7 ECLAIRE project and NERC-CEH; IT-BCi, IT-Cas, IT-Lav, and IT-LMa are funded by CarboItaly (IT-FISR); IT-Amp, IT-Col, IT-Cpz, IT-MBo, IT-Ren, and IT-Ro2 are co-funded by CarboEuropeIP and by CarboItaly; CH-Cha, CH-Dav, and CH-Fru are co-funded by CarboExtreme (grant agreement no. 226701) and GHG-Europe (grant agreement no. 244122); ES-Agu is funded by GHG-Europe; FR-Mau is co-funded by CNRM/GAME (METEO-FRANCE, CNRS), CNES, and ONERA. We also acknowledge funding agencies for the sites FR-Aur, FR-Avi, and HU-Mat.

The authors would like to thank Anna Michalak and the anonymous reviewer whose insightful comments helped the development of the manuscript. The article processing charges for this open-access publication were covered by the Max Planck Society.

Edited by: T. Keenan