Bayesian Inversions of a Dynamic Vegetation Model at Four European Grassland Sites

Eddy covariance data from four European grassland sites are used to probabilistically invert the CARAIB (CARbon Assimilation In the Biosphere) dynamic vegetation model (DVM) with 10 unknown parameters, using the DREAM (ZS) (DiffeRential Evolution Adaptive Metropolis) Markov chain Monte Carlo (MCMC) sampler. We focus on comparing model inversions, considering both homoscedas-tic and heteroscedastic eddy covariance residual errors, with variances either fixed a priori or jointly inferred together with the model parameters. Agreements between measured and simulated data during calibration are comparable with previous studies, with root mean square errors (RMSEs) of simulated daily gross primary productivity (GPP), ecosystem respiration (RECO) and evapotranspiration (ET) ranging from 1.73 to 2.19, 1.04 to 1.56 g C m −2 day −1 and 0.50 to 1.28 mm day −1 , respectively. For the calibration period, using a homoscedastic eddy covariance residual error model resulted in a better agreement between measured and modelled data than using a heteroscedastic residual error model. However , a model validation experiment showed that CARAIB models calibrated considering heteroscedastic residual errors perform better. Posterior parameter distributions derived from using a heteroscedastic model of the residuals thus appear to be more robust. This is the case even though the classical linear heteroscedastic error model assumed herein did not fully remove heteroscedasticity of the GPP residuals. Despite the fact that the calibrated model is generally capable of fitting the data within measurement errors, systematic bias in the model simulations are observed. These are likely due to model inadequacies such as shortcomings in the pho-tosynthesis modelling. Besides the residual error treatment, differences between model parameter posterior distributions among the four grassland sites are also investigated. It is shown that the marginal distributions of the specific leaf area and characteristic mortality time parameters can be explained by site-specific ecophysiological characteristics.


Introduction
Covering about 38 % of the European agricultural area and 8 % of the land surface (FAO, 2011), grassland is an important land cover class in Europe, which shows a wide range of different ecological characteristics.By stocking carbon, temperate grassland might play an important role in climate change mitigation in Europe (Soussana et al., 2004) and on the world scale (O'Mara, 2012).Large uncertainties, however, remain in the estimation of the (source or sink) carbon fluxes since those largely depend on farming management options.
In environmental modelling, grassland growth models have received less attention than the long-standing and highly developed crop models.Since grasslands are agroecosystems that can be considered either as agricultural or semi-natural lands, grassland models were designed for two main purposes: the simulation of forage and dairy or meat production, and the simulation of the carbon fluxes at the landatmosphere interface.Several crop models were adapted for grassland growth modelling (e.g., STICS;Ruget, 2009;Dumont et al., 2014, EPIC;Williams et al., 2008), especially when the management of the grassland remained similar to crop management, i.e., when the grassland was used for tem-porary forage production and was cut rather than grazed by animals.Some other models were specifically developed for grasslands (e.g., SPACSYS; Wu et al., 2007), sometimes coupled with animal production models (e.g., PaSim; Graux et al., 2013), whereas grassland models were also developed from dynamic vegetation models (DVMs) such as LPJmL (Bondeau et al., 2007), adapted from the LPJ model (Sitch et al., 2003).Being process-based models, DVMs are well suited for large-scale spatial simulations and can account for a wide range of current and projected climatic conditions.
To be used for simulation-based decision making, a DVM must be properly parametrized.Model parameter values can be derived from (1) laboratory experiments as, e.g., the stomatal conductance described by the Ball-Berry model (Ball et al., 1987), (2) in situ field measurements, or (3) model inversion using calibration data measurements or (4) spatialized databases (e.g., from remote sensing).Model inversion (also referred to as calibration) consists of automatically finding those model parameters that allow the model to adequately reproduce the available observed data.The collection of representative and high-quality data is thus of paramount importance for inversion, as DVMs require an adequate parametrization that is sufficiently representative of the range of conditions over the spatial extent of the simulation.Typically, DVMs use different sets of parameters that are assigned to specific vegetation classes that grow together over the same area or in geographically distinct biomes.Dynamic vegetation model inversion needs a sufficient number of sites with varying ecophysiological conditions that are supposed to be representative of the considered vegetation classes or biomes, but still well-delimited (Knorr and Kattge, 2005).Model inversion using continuous, gridded data (e.g., from remote sensing; Patenaude et al., 2008) could also help in determining optimal parameters for large areas, but computation time can be a limiting factor for such application.
Given the high number of eddy covariance experimental sites across the world, eddy covariance measurements are particularly appealing for inversion of the DVMs (Friend et al., 2007).Furthermore, the long-standing rise in computational resources not only increased modelling capabilities in terms of temporal and spatial resolution but also opened new avenues for quantifying the uncertainty associated with the estimated model parameters and its effect on model simulations.In particular, the Bayesian framework for inverse modelling is increasingly used in the DVM community (e.g., Hartig et al., 2012).Bayesian methods such as Markov chain Monte Carlo (MCMC) sampling aim to derive a representative set of all parameter combinations that are consistent with the observed data and available prior information.This set of parameters is referred to as the posterior distribution.
Nevertheless, eddy covariance data are known to be associated with relatively large measurement uncertainties, implying both systematic and random errors (see Aubinet et al. (2012), chapter 7, for a comprehensive description of all sources of eddy covariance uncertainties).As eddy covari-ance data are the result of a long process chain, they can be affected by instrumental measurement error (e.g., calibration and design errors), sampling errors due to the variability of the fluxes in time and space and data treatment error (e.g., due to the gap filling of missing data).Uncertainties in eddy covariance data are also strongly dependent on the time resolution of the fluxes, tending to diminish with time aggregation (Richardson and Hollinger, 2005).It is crucial to account for these random data uncertainties in the inversion since an improper statistical treatment can cause the parameter posterior distribution to be strongly biased (e.g., Fox et al., 2009).Quantifying random eddy covariance data errors is not straightforward (Hollinger and Richardson, 2005;Lasslop et al., 2008), but these errors are typically characterized by a variance that is proportional to the magnitude of the data, i.e., they show heteroscedasticity (e.g., Lasslop et al., 2008).Therefore, it has been suggested (Richardson et al., 2008) that the measurement error variance can be modelled as a linear function of the magnitude of the flux with a nonnull intercept, as random errors are non-null even when the flux equals 0. However, while the random error can be taken into account in the inversion, systematic measurement errors can only be removed by instrument calibration.
In this study, data from eddy covariance stations over four grassland sites are inverted for the CARAIB (CARbon Assimilation In the Biosphere) dynamic vegetation model parameters within a Bayesian framework.This is both the first automatic calibration of the CARAIB model and its first application to managed grassland modelling, which required the adaptation of the model to grass cutting and grazing.The main objective is to compare the modelling of the carbon and water fluxes over the four grassland sites using four different ways of treating the eddy covariance data errors during the inversion.Both homoscedastic and heteroscedastic residual error models are considered, either fixed beforehand or sampled along with the model parameters.A second objective is then to compare the site-specific posterior parameter distributions obtained for the four grasslands, given their climatic, ecological and management characteristics.

Experimental sites and data
In this study, we focus on four long-term experimental sites (see Table 1) that are semi-natural permanent grasslands: Grillenburg, Germany (Prescher et al., 2010); Oensingen (intensively managed), Switzerland (Ammann et al., 2007); Monte Bondone, Italy, (Wohlfahrt et al., 2008) and Laqueuille (extensively managed), France, (Klumpp et al., 2011).The four sites pertain to the global FLUXNET network and, as such, a large number of studies were conducted using eddy covariance data from these sites.The FLUXNET website (http://fluxnet.ornl.gov/)provides lists of references per site.
The four sites are located in western and central Europe and experience different climate, altitude, soil and management conditions.They can be classified according to the De Martonne-Gottman aridity index, which is inversely related with the site aridity.Oensingen is the most intensively managed site and the only one that is fertilized (about 200 kg N ha −1 yr −1 ).The other three sites are extensively managed, with no organic or mineral fertilization.The last two sites are mid-mountainous grassland, while the first two sites are situated at a lower altitude.Only the grassland in Laqueuille is grazed by animals during the growing season, while the other three are hay meadows that are cut once or several times a year.Note that, although grass cutting occurred on the 13 June 2005 in Grillenburg according to the given management data, it was not observed in the measured eddy covariance fluxes because of gap filling of missing data.As a result, this cut was neglected in the modelling.
The four grasslands are equipped with eddy covariance stations for measuring ecosystem fluxes.Flux measurements and field data sets were made available through a coordinated task of the FACCE/MACSUR (Food Agriculture Climate Change/Modeling European Agriculture with Climate Change for food Security) knowledge hub, which aims at performing an intercomparison of grassland models (Ma et al., 2014) by running several grassland models with the same field data sets collected under various climatic and management conditions.Field data sets hold the necessary information for feeding the grassland model: hourly meteorological records of climatic variables, soil physical parameters, management information such as cutting dates or grazing charges, and initial conditions.Daily eddy covariance data included net ecosystem exchange (NEE; g C m −2 day −1 ), gross primary productivity (GPP; g C m −2 day −1 ), ecosystem respiration (RECO; g C m −2 day −1 ) and evapotranspiration (ET; mm day −1 ).It is worth noting that only the NEE and ET are directly measured by the eddy covariance station (i.e., fluxes of CO 2 and H 2 O, respectively) and that GPP and RECO are derived from these measurements.
In this study, only GPP, RECO and ET measurements were used in the inverse modelling.Adding NEE measurements would be ineffective as they are directly dependent on GPP and RECO.GPP and RECO were used since they are directly linked with the photosynthesis and respiration processes, respectively, while the influence of these two processes is mixed in the NEE measurements.Other combinations including the NEE were first tested but resulted in poorer agreement between measured and modelled data.The full data range including gap-filled data was inverted, since these data are gap-filled according to specific protocols that are standards in the eddy covariance community.

Description of the model
CARAIB is a physically based dynamic vegetation model that was developed for the simulation of the carbon cycle on the global scale (Warnant et al., 1994;Nemry et al., 1996;Otto et al., 2002).It calculates the carbon fluxes through the soil-vegetation-atmosphere continuum by simulating ecophysiological processes: photosynthesis, carbon allocation to plant pools, and autotrophic and heterotrophic respiration.The CARAIB model has been used in numerous paleoclimatology, vegetation and crop modelling studies.The reader is referred to the aforementioned references for a full model description.
For C 3 plants, photosynthesis is computed according to the model of Farquhar et al. (1980).The stomatal conductance governing the flux of CO 2 through the stomata is described on the leaf scale with the Ball-Berry approach (Ball et al., 1987), using the model of Leuning (1995) with further adaptations from Van Wijk et al. (2000) to account for soil water stress affecting the stomatal conductance.Photosynthesis and respiration processes are computed at 2-hour time steps on a half-day basis, and the model assumes a symmetry with respect to solar noon time; that is, computation of these processes is made for half the day and further aggregated using a daily time step.Other processes, e.g., related to soil hydrology or carbon allocation, are computed on a daily basis.
In this study, a single plant functional type (PFT) is considered (BAG 22 as defined in Laurent et al., 2004Laurent et al., , 2008) ) corresponding to the flora that can be encountered in European grasslands, i.e., species of Poaceae and Asteraceae.The model was adapted for simulating the grassland sites by adding management functions for grass cutting and grazing.Grass cutting is modelled by the removal of a part of the plant carbon mass so that the model matches given values of leaf area index after cutting.Grazing is modelled such that a given fraction of the plant carbon mass is removed every day according to the grazing charge.The dates of the grass cutting and the duration of the grazing periods were known and fixed in the simulations.Daily meteorological data recorded at the experimental sites were used in the model, i.e., minimal and maximal temperature, precipitation, solar radiation, relative air humidity, and wind velocity.Although they can affect vegetation modelling (Gottschalk et al., 2007;Rivington et al., 2006;Zhao et al., 2012), uncertainties in the meteorological data were not considered in this study.
Thirty-three parameters per PFT are set in CARAIB.These parameters govern photosynthesis, plant physiology process (e.g., specific leaf area, carbon-to-nitrogen ratio), allocation of carbon and residence times in the different pools of carbon, including plants and soil pools, land surface-atmosphere interactions (albedo, roughness length), and tolerance to extreme conditions (thresholds and response times).CARAIB were mainly taken from the literature (Warnant, 1999) and further compared with observed values (remote sensing, field data and paleorecords).So far, no model inversions were performed with the CARAIB model.

Choice of parameters
In this study, 10 model parameters were sampled (Table 2).They were chosen according to their presupposed importance -that is, the model sensitivity to these parameters -and because some parameter values were already known in the measured data from the experimental sites.Default values that were defined during the model development and used in previous research are given in Table 2.These parameters govern the main processes of the model, namely, the photosynthesis, the respiration and carbon transfer between carbon pools: -The slope g1 and the intercept g0 (µmol m −2 s −1 ) of the stomatal conductance as described in Leuning (1995) are directly related to the photosynthesis since they govern the stomatal conductance.They are thus related to the gross primary productivity (GPP) and evapotranspiration (ET) with respect to the meteorological conditions.While most of ecological models, including CARAIB, use an empirical approach for stomatal conductance, derived from the Ball-Berry model, Medlyn et al. (2011) recently reconciled the empirical approach with the theoretical background based on the optimal stomatal behaviour (Farquhar et al., 1980), which states that there is a trade-off for stomata between maximizing carbon gain (photosynthesis) and minimizing water loss (transpiration).These new developments in the theoretical understanding of the empirical relationship push forward the necessity to measure or calibrate the stomatal conductance parameters under different environmental conditions.Although single values of these parameters are used for regional or global modelling of C 3 plant photosynthesis (e.g., Sitch et al., 2008), it is known that stomatal conductance parameters actually vary through time and space according to the environmental conditions and plant species.
-The specific leaf area (SLA; m 2 g C −1 ) is defined in CARAIB as the leaf area per unit of carbon mass of the plants.It is used in the model to convert the assimilated mass of carbon into leaf area index.Besides its role in the model, SLA is often studied as a plant trait that is used for predicting the plant resource use strategy or for clustering plants species into functional groups.
Maximizing the photosynthesis while minimizing leaf respiration, high-SLA leaves (thin leaves) are productive but also more vulnerable and short-lived (Wilson et al., 1999).They are thus better adapted to resourcerich environment, where leaves can be quickly reconstructed (Poorter and De Jong, 1999).On the other hand, low-SLA leaves (thick leaves) are often encountered in drought-adapted (Marcelis et al., 1998) or shadetolerant species (Evans and Poorter, 2001) and for the lower, self-shaded leaves of a plant.SLA is also known to vary over the course of the season and according to the leaf age (Wilson et al., 1999).Nevertheless, the concept of SLA is sometimes problematic for some plant species with complex plant geometry (Vile et al., 2005), e.g., highly folded leaves or with a non-negligible part of the photosynthetic tissues located on the stem, as is the case among the Poaceae species.In these simulations, SLA is defined for the PFT that is supposed to represent European grasslands, and, therefore, SLA should actually be considered as an effective parameter among the grassland species and for the whole plant body.
-The characteristic mortality time (year) of the plant in normal τ and in stress conditions τ s is, respectively, the characteristic time for the renewal of the plant (τ ) and the time it takes for the plant to die in stress conditions (τ s ).The stress conditions occur when temperatures reach either low or high extreme values, for soil water content below a certain threshold or for low irradiance values.The default values were 0.667 year for τ , meaning a renewal of the plant within 8 months, and 0.083 year for τ s , meaning a characteristic mortality time in stress conditions of 1 month.
-Two carbon-to-nitrogen ratios are defined for the photosynthetic active carbon pool of the plant (C / N1) and for the remainder of the plant (C / N2).The nitrogen content of the leaves play a crucial role in the photosynthesis, and increasing nitrogen content (decreasing C / N) fosters photosynthetic activity.A low C / N ratio in plant usually occurs together with high nitrogen content in soils, that is, a resource-rich environment.
-Three parameters govern the rates of the soil heterotrophic respiration: γ 1 for the respiration of the "green litter", γ 2 for the respiration of the "non-green litter" and γ 3 for the respiration of the soil organic carbon.

Inverse problem
To acknowledge that measurements and modelling errors are inevitable, the inverse problem is commonly represented by the stochastic relationship where F is a deterministic, error-free forward model that expresses the relation between the uncertain parameters z and the measurement data d and where the noise term e lumps measurement and model errors.Inversions were performed within a Bayesian framework, which treats the unknown model parameters z as random variables with the posterior probability density function (pdf) p (z|d) given by where p (z) denotes the prior distribution of z and L (z|d) ≡ p (d|z) signifies the likelihood function of z.The normalization factor p (d) = p (z) p (d|z) dz is obtained from numerical integration over the parameter space so that p (z|d) scales to unity.The quantity p (d) is generally difficult to estimate in practice but is not required for parameter inference.
In the remainder of this study, we will focus on the unnormalized posterior p (z|d) ∝ p (z) L (z|d).For numerical stability, it is often preferable to work with the log-likelihood function, (z|d), instead of L (z|d).If we assume the error e to be normally distributed, uncorrelated and with an unknown constant variance, σ 2 , the log-likelihood function can be written as where σ can be fixed beforehand or sampled jointly with the other model parameters z.
The homoscedasticity (i.e., constant variance) assumption for e may be excessively strong in many cases.Considering the residual errors, e, to be heteroscedastic, Eq. ( 3) becomes where the σ i represents the individual residual error standard deviations that can be gathered into a vector σ .Here also, σ can either be fixed beforehand or sampled along with z (see Sect. 2.3.4).

Multi-objective likelihood function
In this work, we chose three types of eddy covariance data for the calibration: d 1 (GPP), d 2 (RECO) and d 3 (ET).We further assume that the corresponding residual errors, e 1 , e 2 and e 3 , are uncorrelated, leading to the following multi-objective log-likelihood function: The weighting between the three components of z|d 1,2,3 is an important issue.The constant (σ ) and non-constant (σ i ) standard deviations in Eqs. ( 3) and ( 4), respectively, basically weight the respective influences of e 1 , e 2 and e 3 on the log likelihood defined by Eq. ( 5).Distinct homoscedastic or heteroscedastic residual error models must be specified for e 1 , e 2 and e 3 .This was done for both the homoscedastic and heteroscedastic cases either by specifying the residual error standard deviations beforehand or by jointly inferring these standard deviations along with the model parameters.

Homoscedastic and heteroscedastic error models
Based on prior knowledge of the measurement errors, the homoscedasticity assumption simply reduces to assigning values to σ 1 , σ 2 and σ 3 in Eqs. ( 3) and ( 5).These values were fixed to 3 g C m −2 day −1 for the GPP measurements, 1.5 g C m −2 day −1 for the RECO measurements and 1 mm day −1 for the ET measurements.As stated earlier, measurement errors associated with eddy covariance fluxes are, however, typically found to be heteroscedastic, with a variance that is assumed to be linearly related to the magnitude of the measured data (Richardson et al., 2008): where the variable d denotes either GPP, RECO, or ET measurements, i = 1, • • •, N are measurement times and σ 0,d is equivalent to σ 1 , σ 2 or σ 3 in the homoscedastic case.We refer to the inversions based on these homoscedastic and heteroscedastic error models as HO1 and HE1, respectively.It is worth noting that by fixing the standard deviations to known measurement errors, one implicitly assumes that the model is able to describe the observed system up to the observation errors.This might not be realistic in environmental modelling, where models are always fairly simplified descriptions of a much more complex reality.

Joint inference of the homoscedastic and heteroscedastic error model parameters
Still under the Gaussianity assumption, a more advanced treatment of the residual error models considers the simultaneous inference of the standard deviations with the model parameters, i.e., it considers the standard deviation of the residual errors as unknowns.Doing so assumes that residual errors are expected to be a mixture of both model (equations and inputs) and observational errors.For the homoscedastic case, this simply consists of jointly sampling σ 1 , σ 2 and σ 3 along with the model parameters, z.
The heteroscedastic error model then becomes where the a and b coefficients are to be jointly inferred with z from the measurement data.Using Eq. ( 7) thus leads to the addition of six variables to the sampling problem: a 1 , a 2 , a 3 , b 1 , b 2 and b 3 .We refer to the joint inversions of these homoscedastic and heteroscedastic error models as HO2 and HE2, respectively.In these inversions, a total predictive uncertainty around the model values can be computed by adding to the modelled data a random noise drawn from a normal distribution with mean 0 and standard deviation σ sampled from its posterior distribution (HO2) or computed by Eq. (7; HE2).
The simultaneous inference of model parameters with homoscedastic or heteroscedastic error model parameters requires the definition of their prior probability distributions.
Based on the available prior information, uniform (flat) priors are used for the 10 model parameters contained in z (see Table 2).We follow two guidelines for specifying the prior densities of the error model parameters.First, we would like to obtain posterior standard deviations that are as small as possible within the range permitted by the model and measurement data errors in order to get the lowest possible data misfits.Second, the magnitudes of the different prior distributions should reflect the desired weights of the different data types within the multi-objective inference.These weights translate the modeller's relative preferences among the three modelling objectives in Eq. ( 5).We therefore use normal distributions with mean 0 truncated at 0 to avoid negative values.The prescribed weights then correspond to the different standard deviations of these normal distributions: where the X variable is either σ j , a j or b j for j = 1, 2, 3; where the value of σ X expresses the modeller's preference for objective j compared to the other objectives (the smaller σ X , the larger the relative weight of objective j ); where φ (•) signifies the probability density function of the standard normal distribution; where µ X is set to 0 for maximizing the prior density of X towards small values; and where the constant B depends on the lower (v) and upper (w) limits of the truncation interval in which (•) denotes the cumulative distribution function of the standard normal distribution.
This treatment of multi-objective Bayesian inference is in line with the work of Reichert and Schuwirth (2012), who further considered different statistical models for model and observation errors.Overall, this resulted in four different ways of treating the eddy covariance data uncertainties: fixed homoscedastic (HO1) and heteroscedastic (HE1) error models and jointly inferred homoscedastic (HO2) and heteroscedastic (HE2) error models.Using the HO1 and HE1 models led to a total of 10 inferred parameters, whereas using the HO2 and HE2 models resulted into a total of 13 and 16 inferred parameters, respectively.Table 2 lists the marginal prior distributions used for all sampled parameters.The upper and lower bounds of these distributions were either set to their maximal and minimal physically possible values or determined on the basis of expert knowledge.

Markov chain Monte Carlo sampling
The goal of the inference is to estimate the posterior distribution p (z|d) where the 10-, 13-or 16-dimensional z vector contains all sampled parameters and d signifies the conditioning data: d = d 1,2,3 herein.As an exact analytical solution of p (z|d) is not available, we resort to Markov chain Monte Carlo (MCMC) simulation to generate samples from this distribution.The basis of this technique is a Markov chain that generates a random walk through the search space and iteratively finds parameter sets with stable frequencies stemming from the posterior pdf of the model parameters (see, e.g., Robert and Casella, 2004, for a comprehensive overview of MCMC simulation).
The MCMC sampling efficiency strongly depends on the assumed proposal distribution used to generate transitions in the Markov chain.In this work, the state-of-theart DREAM (ZS) (ter Braak and Vrugt, 2008;Vrugt et al., 2009;Laloy and Vrugt, 2012) (DiffeRential Evolution Adaptive Metropolis) algorithm is used to generate posterior samples.A detailed description of this sampling scheme including convergence proof can be found in the literature cited and is thus not reproduced herein.
Convergence of the MCMC sampling to the posterior distribution is monitored by means of the potential scale reduction factor of Gelman and Rubin (1992), R. For each parameter of interest, this statistic compares the average withinchain variance to the variance of all the chains mixed together.The smaller the difference between these two variances, the closer to 1 the value of the R diagnostic.Values of R smaller than 1.2 are commonly deemed to indicate convergence to a stationary distribution.In this study, posterior distributions of the parameters were drawn from the point where all parameters achieved R < 1.2.This is more conservative than the conventional practice of stopping the inference when R < 1.2 for every parameter.The mean acceptance rate of the proposed samples, AR (%), is an important sampling property and is thus also reported.An excessively small fraction of accepted candidate points indicates poor Heteroscedastic error model parameters (for HE2 inversions only) mixing of the chains due to too wide a proposal distribution.
In contrast, a very large acceptance rate signals too narrow a proposal distribution, causing the chains to remain in the close vicinity of their current locations.The optimal value for AR depends on the proposal and target distributions, but a range of 10-30 % generally indicates good performance of DREAM (ZS) .

Parameter samplings and convergence of the algorithm
The DREAM (ZS) algorithm was run with four parallel chains, initialized by sampling the prior parameter distribution (Table 2).As an example, Fig. 1 shows sampling trajectories of DREAM (ZS) parametrized with four chains for the SLA parameter and inversion HO1 at the Oensingen site.The R convergence statistic becomes < 1.2 for each parameter after about 20 000 forward model runs, and the AR over the last 50 % model evaluations is about 18 %.Overall, convergence was achieved for all MCMC trials after some 15 000-30 000 forward runs with AR values in the range of 10-30 %, except for the inversions associated with the Laqueuille site that showed AR values as low as 5 %.

Posterior parameter distributions
Figure 2 presents marginal posterior histograms of the 10 model parameters for all experimental sites, considering the inferred homoscedastic error model (inversion HO2).In the remainder of this document, results are mainly detailed for this inversion scenario, since it generally led to the lowest data misfit statistics in calibration.For some parameters (e.g., SLA and C / N1), the marginal posterior distributions are narrow compared to the prior parameter range.This indicates a large sensitivity of the model to the considered parameter.In contrast, some other parameters such as γ 2 are poorly resolved, demonstrating a relative insensitivity.Asymmetric edge-hitting distributions are also observed such as for C / N1 and C / N2 in Monte Bondone.In a Bayesian inversion of eddy covariance data obtained from a forest site, Braswell et al. (2005) found that 7 out of 26 marginal parameter distributions were edge-hitting.Extending the prior parameter ranges would lead to unphysical or implausible parameter values.Edge-hitting distributions reveal model inadequacies and/or large systematic measurements errors.For some parameters, posterior distributions were fairly distinct from the default values that were used in previous studies (Table 2), such as high g1 values.Values of the characteristic mortality time τ also generally increased compared to the default value.
Table 3 shows the most likely parameter values for the four experimental sites; these parameter values resulted in the highest values of the log-likelihood function.Some of the parameters present contrasting values between inversion scenarios and/or experimental sites, which may be related to the different ecological characteristics of the sites as discussed in section 4.3.Depending on the width of the posterior distributions, the most likely parameter values are well resolved or largely uncertain.As a result, a comparison between the experimental sites must account for the posterior distributions of the parameters.

Measured and modelled carbon and water fluxes
with calibration data

Measured and modelled data in Monte Bondone
As the parameter sampling resulted in posterior distributions of the parameters instead of single values, ensembles of posterior modelled signals can be represented as a graph.In Fig. 3, measured and modelled eddy covariance data are depicted for the experimental site of Monte Bondone for inversions with the inferred homoscedastic error model (inversion HO2).The posterior ranges of the modelled signals are represented by the dark grey shaded areas for the prediction uncertainty due to parameter uncertainties and by the light grey shaded areas for the total predictive uncertainty (at 95 % confidence level).This total prediction uncertainty is computed using the standard deviation of the residual errors σ as sampled by the inversions and, therefore, cannot be computed for the NEE.The site of Monte Bondone was chosen here since there is one single cut a year (indicated by the vertical arrows in Fig. 3) that is clearly identifiable, which facilitates the interpretation of the fluxes.The dates of cutting corresponded to a sudden drop in the GPP in the middle of the year, which was followed by a gradual increase.They were also observed in the NEE graphs, with a sudden increase in the NEE.
There was overall good agreement between measured and modelled signals.It is worth noting that the posterior ranges of modelled data were not constant over time and were not related to the magnitude of the signals.The ranges due to parameter uncertainties were relatively small and did not encompass the measured data.Overall, it could be observed that measured eddy covariance data have stronger dynamics than the modelled signals, meaning that the CARAIB model cannot follow the fast fluctuations of the GPP (and other signals) over time.In particular, the model could not simulate the highest peaks in GPP well.

Measured and modelled data across sites
Considering the other three experimental sites (Fig. 4), there was a similar agreement between measured and modelled signals, although the sites displayed different behaviour in terms of GPP as their management varies: there are several cuts per year in Grillenburg and Oensingen, while Laqueuille is a grazed meadow.In general, the peaks in GPP cannot be simulated well by the model.The modelled GPP seemed averaged out when compared to the measured signals, as observed before in Monte Bondone (Fig. 3a).
All the graphical comparisons between measured and modelled signals could not be shown but are summarized in Table 4 for the homoscedastic and heteroscedastic cases, and with the fixed and inferred error models, using the root mean square error (RMSE), the R 2 and the Nash and Sutcliffe (1970) model efficiency criterion (E) between measured and modelled signals.The latter criterion takes values from −∞ to 1.A value of 1 means a perfect match between measurements and model simulations, a value of 0 indicates that the mean of the observed data is as accurate as the modelled values, and an efficiency less than 0 occurs when the mean of the observed data reproduces the observations better than the modelled values.The maximum log-likelihood value "ml" that was obtained by the algorithm is also indicated.Note that performance criteria were also computed for the NEE, although these data were not used in the model inversions.Overall, the best agreement was found for the Monte Bondone site and the worst for the Laqueuille site.The low-   2) are depicted with a cross and the most likely values with a star.The x axes cover the whole prior ranges.est model efficiencies E were found for the NEE, which is not surprising since these data were not accounted for in the model inversions.While the ml values were generally the highest for the heteroscedastic inversions HE2, RMSE appeared larger for these inversions.

Homoscedastic and heteroscedastic eddy covariance residual errors
Considering homoscedastic or heteroscedastic eddy covariance residual errors resulted in different sampling of parameter posterior distributions and, therefore, different posterior modelled signals.As an example, Fig. 5 shows the measured and modelled GPP with their posterior ranges for the site of Monte Bondone in 2004, for both homoscedastic (a,     c) and heteroscedastic (b, d) cases.For the HO2 and HE2 inversions, the 95 % total predictive uncertainty is depicted using the light grey shaded areas.The measurement uncertainty is depicted only for fixed eddy covariance residual error inversions (a, b) for clarity.The measurement uncertainty is thus constant for the homoscedastic case (namely, ±3 g C m −2 day −1 for HO1), while it varies linearly according to the GPP for the heteroscedastic case (HE1).These two options led to different behaviours of the modelled GPP using the posterior distributions, which better approached the high values of the measured data (in summer) in the homoscedastic cases and better fit the low values (in winter) in the heteroscedastic cases.Overall, in calibration, modelled signals with parameter values from the homoscedastic inversions were in a better agreement with the measured data than with the parameters from the heteroscedastic inversions.The same observation was also made for the other sites (not shown), as can also be observed in Table 4.However, the total predictive uncertainty range derived from the HE2 inversions was more consistent, as, e.g., it avoids unrealistic negative values of GPP.The standardized residuals, which were computed as the difference between measured and modelled data divided by the standard deviation of the residual error, are depicted in Fig. 5 to the right of the GPP graphs.Heteroscedasticity of the GPP residual errors was fairly reduced but not fully removed by using the HE1 and HE2 heteroscedastic residual error models.Indeed, the standardized residuals still showed some small but complex heteroscedastic patterns.Partial autocorrelation of the residuals of the GPP was also depicted, and independence of the days of simulation from one another was reached after a few days.

Sampling of the standard deviation of the residual errors
Inversions with the sampling of the standard deviations of the residual errors resulted in posterior distributions of the standard deviation of the residual errors (HO2) and parameters of Eq. (7; HE2).Most likely values of these distributions (Table 5) depended on the experimental sites, being larger for Laqueuille and Oensingen, which can be related to the poorer agreement between measured and modelled data at these sites.Although the sampled standard deviations of the residual errors were lower than in the fixed inversions, there were no large differences between the inversions with fixed  4) nor in the parameter posterior distributions (Table 3).

Model validation
Parameter values from the posterior distributions were tested for validation using eddy covariance data over different periods (for validation data sets, see Table 1).Figure 6 shows measured and modelled GPP values over the periods of calibration and validation in Monte Bondone.Not surprisingly, worse agreement between measured and modelled data is observed than in the calibration period.However, it is observed that the modelled GPP in validation in the HE2 inversions follows better the measured signal than in the HO2 inversions.Strikingly, at all the sites, the posterior parameter distributions derived from using the HE1 and HE2 heteroscedastic models are found to induce a better model performance in validation compared to the posterior distributions associated with the use of the homoscedastic models (Table 6).
The difference between calibration and validation thus appeared smaller when using most likely parameter values from heteroscedastic inversions as compared to homoscedastic inversions.Among the different grassland sites, a similar performance pattern as for the calibration experiment is observed.Indeed, the Laqueuille site shows the worst performance statistics for each type of measurement data, whereas the Monte Bondone site overall presents the best fits to the data (Table 6).

Measured and modelled signals
Bayesian inversions over the four grassland sites resulted in posterior distributions of parameters and posterior ranges of modelled signals (GPP, RECO, ET and NEE).Consider-ing the inversion scenario HO2, there was, in general, good agreement between measured and modelled signals, with RMSEs ranging from 1.73 to 2.19 g C m −2 day −1 and R 2 being between 0.74 and 0.84 in terms of GPP.Using a dedicated model for soil organic carbon dynamics, De Bruijn et al. ( 2012) found an R 2 of 0.68 for the modelling of the NEE at the Oensingen site over the same years.Comparing three large-scale lands surface models in simulating carbon fluxes over different ecosystems, Balzarolo et al. (2014) noticed that grassland and crop sites were more difficult to model compared to forest sites.Using data from 13 grassland sites over Europe, including Laqueuille and Grillenburg, they found average RMSEs between measured and modelled GPP ranging from 2.45 to 3.57 g C m −2 day −1 and R 2 from 0.37 to 0.56.These larger discrepancies compared to our study are mainly to be related to the fact that the large-scale models were used without site calibrations.Modelling of carbon fluxes was also performed at the Oensingen site over the same years in Calanca et al. (2007), using a dedicated grassland model, PaSim.In that study, no numerical comparison between measured and modelled data was computed at a daily resolution, but the relative departures between measured (eddy covariance) and modelled data were given by year of simulation and ranged from −11 to −21 % in terms of the annual sum of GPP.In our study, the annual relative departures in the annual sum of GPP in Oensingen ranged from 0.7 to 9 % with the calibration data set and up to 63 % with the validation data set.In a similar experiment involving the inversion of eddy covariance data from forest sites, Fox et al. (2009) found RMSEs between measured and modelled NEE of 0.7 and 1.3 g C m −2 day −1 for two different sites in calibration and of 1.5 g C m −2 day −1 in validation.These values are lower than in our study, but the measured NEE data were not used in the model inversion here, unlike in the inversions in Fox et al. (2009).It could be observed that measured eddy covariance data have stronger dynamics than the modelled signals, that is, modelled signals could not follow the fast fluctuations of the measured signals and, in particular, simulate high GPP values.This could be related to the different time resolutions between the model and data.The CARAIB model is based on meteorological data averaged daily.However, photosynthesis and respiration processes are computed at a 2-hour time step before being aggregated to a daily resolution, and the model assumes a symmetry with respect to solar noon time (Otto et al., 2002) to save computation resource.Moreover, in the CARAIB model, solar fluxes are calculated assuming a constant cloudiness over the day and temperature is varied using a sinusoidal function between the minimal and maximal temperatures, which were fixed at midnight and noon, respectively.These shortcomings were necessary to save computation resources and to account for data scarcity in global vegetation modelling.Eddy covariance data, however, are typically acquired at a time frequency of 5 or 10 Hz (Aubinet et al., 2012) and can thus capture high-frequency fluxes.Even though eddy covariance data were aggregated over time to a daily time resolution, the high-frequency acquisition rate ensures that effects of abrupt meteorological events are recorded.Increasing the time resolution of the CARAIB model would help to better simulate ecophysiological processes at a high frequency.Alternatively, a simple workaround to deal with the different time dynamics would be to apply a filter based on a moving window of some days in order to smooth measured (and modelled) eddy covariance data before computing the statistical indicators, as done in Calanca et al. (2007).
Another modelling limitation is that model parameters are assumed as constant along the season, although plants traits are known to evolve throughout the season and plants acclimate to specific climate conditions.As a result, the effect of similar climatic conditions does not necessary result in similar eddy covariance measurements.
In general, there was poorer agreement between measured and modelled signals (GPP, RECO, ET and NEE) in Laqueuille than at the other experimental sites.This poorer agreement can probably be related to the grazing instead of the cutting that occurs in Laqueuille.Grazing was more difficult to simulate because of the expert-knowledge conversion between the given cattle charge and the biomass removal.As a result, grass cutting is better constrained in the model compared to grazing, as was already shown in the Laqueuille ex-perimental site by Calanca et al. (2007), who, however, used the grassland model PaSim.
All the same, besides the average statistical indicators between measured and modelled signals, the performance of the calibration might be also evaluated against specific scientific or operational objectives.For instance, the accurate modelling of the grass cutting or the computation of annual budgets of carbon in grassland (e.g., Soussana et al., 2007) might show different performances, depending on the timescale on which the processes are analysed.

Homoscedastic and heteroscedastic eddy covariance residual errors
Bayesian inversions were conducted considering homoscedasticity and heteroscedasticity in the eddy covariance residual errors.Figure 5 showed that accounting for heteroscedasticity in eddy covariance residual errors permitted a better simulation of low-magnitude signals (winter), but at the same time, it penalized the modelling of highmagnitude signals (summer).Actually, it is worth remarking that, in carrying out inversions considering heteroscedastic measurement errors, we do not attempt to produce smaller RMSEs between measured and modelled data compared to homoscedastic scenarios since larger errors are considered for high peaks in the signals.However, in validation, the posterior parameter distributions derived from using the heteroscedastic residual error models outperform their counterparts derived from using the homoscedastic residual error models.This important finding reveals that, despite inducing larger RMSE values in calibration, the use of a heteroscedastic residual error model leads to a more robust parameter estimation.
Since eddy covariance data are known to show heteroscedasticity, accounting for a heteroscedastic model of the residual errors in the inversions is more conceptually sound for ensuring unbiased parameter posterior distributions.However, we showed that considering a linear heteroscedastic model of the residual errors only partly removed heteroscedasticity in the standardized residual values (Fig. 5b  and d).Other kinds of heteroscedastic models (i.e., nonlinear) might be tested, but the residual distributions did not show any clear trend for all sites.
It is also worth noting that a substantial fraction of the large residual errors is caused by the tendency of the CARAIB model to underestimate the observed GPP summer peaks.As discussed above, this is related to a slower temporal resolution of the model compared to that of the measured data.To overcome this model inadequacy, further model modifications are necessary to increase the time reso-lution of the model.Another model improvement would be to simulate varying model parameter values as a function of the time of the year, since plant traits actually evolve over the course of the seasons.However, this would come at the cost of a large increase in model complexity.

Sampling of the standard deviation of residual errors
Sampling the standard deviation of the residual errors, i.e., the inversions HO2 and HE2, resulted in similar parameter samplings and modelling as the inversions HO1 and HE1, respectively.Some performance criteria were better with the sampling of the residual standard deviations, while others were not.As expected, the most likely standard deviations of the residuals errors were close to the RMSE obtained in the inversions HO2.The benefit of these values is that they inform us about the level of the uncertainties in the eddy covariance data with respect to the model used to invert the data, e.g., uncertainties in GPP ranged from 1.79 to 2.29 g C m −2 day −1 , in RECO from 1.09 to 1.63 g C m −2 day −1 and in ET from 0.52 to 1.31 mm day −1 .They could be used to weight different eddy covariance data in multi-objective inverse modelling.

Parameter values across sites
Posterior distributions of parameters showed contrasting values that could be linked to the characteristics of the experimental sites.For instance, the specific leaf area (SLA) is known to depend on many factors (Marcelis et al., 1998), such as leaf age, temperature, light intensity, aridity and soil nutrient content.Thick leaves (low SLA) are more adapted to dry ecosystems due to their greater capacity to retain water.
Although none of the four grassland sites are strictly characterized by a dry climate, it is interesting to note that the posterior parameter distributions for SLA were negatively correlated with the aridity, inversely expressed by the De Martonne-Gottman index (Fig. 7), that is, SLA decreases with increasing aridity.The largest SLA (thin leaves) were found for Laqueuille, which can be related to the permanent grazing that constantly regenerates young leaves, since young leaves are characterized by high SLA.The large SLA values in Oensingen can be related to more intensive management conditions (fertilization, more frequent cuts).Contrarily to SLA, the characteristic mortality time in stress conditions τ s appeared to be positively correlated with the site aridity (Fig. 7).A larger τ s value means a larger water stress resistance for the plants in Grillenburg and Monte Bondone.
The values of g1 were drastically different between Oensingen and the three other sites (Table 3).In addition, for these three sites, the values appeared much higher compared to the default values (g1 = 9) and other values com-monly encountered in the literature (Van Wijk et al., 2000;Medlyn et al., 2011).It is known that g1 should increase with humid conditions and temperature (Medlyn et al., 2011), as it is positively related to the marginal water cost of carbon gain.However, the high values of g1 here could not really be related to a warmer or wetter climate as compared to Oensingen.A possible explanation could be related to the different dynamics of the model and the measurements, as already explained herein before.As the model cannot simulate the high GPP values that are observed in the eddy covariance data, the Bayesian algorithm could have compensated for this by sampling high values of g1 that increase stomatal conductance.
More broadly, ecophysiological differences between the grassland sites resulted in parameter posterior distributions that can be either drastically different or common between the sites (Fig. 2).If it appears that site-specific parameter values are needed, it means that the model has to be refined by accounting for the ecophysiological dependence of the parameters.If not, generalized parameter values could be used, meaning that they are independent of the site on which they were determined or even independent of the plant species, as recently claimed by Yuan et al. (2014).Determining a common set of the parameter distributions among the four sites could be done either by (1) merging the four posterior distributions after independent inversions of the data of each site or (2) merging the eddy covariance data of the four sites in one single MCMC sampling, as discussed in Kuppel et al. (2012).

Conclusions
Bayesian inversions of the CARAIB dynamic vegetation model were performed using eddy covariance data (GPP, RECO, ET) at four experimental grassland sites.A specific version of the CARAIB model was developed for this application, with functions related to grassland management, i.e., grass cutting and grazing.Posterior parameter and predictive distributions were compared for different statistical models of the eddy covariance residual errors: (1) assuming homoscedasticity or heteroscedasticity of the residual errors and (2) fixing beforehand or jointly inferring the variances of the residual errors.There was, in general, good agreement between measured and modelled signals for the calibration data sets with RMSEs of daily gross primary productivity (GPP), ecosystem respiration (RECO) and evapotranspiration (ET) ranging from 1.73 to 2.19, 1.04 to 1.56 g C m −2 day −1 and 0.50 to 1.28 mm day −1 , respectively.Since the four sites belong to a long-standing network of eddy covariance data measurements, comparisons with previous studies could be made.
Although the eddy covariance measurement errors are known to be heteroscedastic, the use of a homoscedastic error model led to a better model performance in calibration compared to using a heteroscedastic error model.Nevertheless, a model validation experiment revealed that CARAIB models calibrated by means of a heteroscedastic error model outperform those calibrated assuming homoscedastic residual errors.Posterior parameter distributions derived from using a heteroscedastic model of the residuals are therefore more sound and robust, even though heteroscedasticity could not be fully removed.Therefore, our results support the use of a heteroscedastic residual error model for inverting eddy covariance data and inferring posterior parameter distributions.
Systematic model-data discrepancies were also found for the largest observed GPP values.This can be attributed to the low temporal resolution of the photosynthetic processes in the CARAIB model, among other model inadequacies.Modelling performance varied among the four sites, with poorer performances at Laqueuille because of the greater difficulty of modelling grazing compared to grass cutting.Lastly, sitespecific posterior parameter distributions obtained for the four grasslands were compared and discussed with respect to grassland characteristics.Specific leaf area and characteristic mortality time parameters appeared to be related to site aridity.

Figure 1 .
Figure 1.Sampled values of the specific leaf area (SLA) by DREAM (ZS) parametrized with four chains for the Oensingen site and the fixed homoscedastic error model (inversion HO1).The vertical dashed line indicates when convergence has been reached according to the R statistic.

Figure 2 .
Figure 2. Posterior distributions of the CARAIB model parameters sampled by the DREAM (ZS) algorithm with the inferred homoscedastic error model (HO2 inversions) for all sites.The default values (see Table2) are depicted with a cross and the most likely values with a star.The x axes cover the whole prior ranges.

Figure 3 .Figure 4 .
Figure 3. Measured and modelled GPP (g C m −2 day −1 ) (a), RECO (g C m −2 day −1 ) (b), ET (mm day −1 ) (c) and NEE (g C m −2 day −1 ) (d) at the Monte Bondone site for the inferred homoscedastic error model (inversion HO2).The ranges of the prediction uncertainty due to parameter uncertainty and the 95 % total predictive uncertainty (only for GPP, RECO and ET) are depicted by the dark and light grey shaded areas, respectively.Vertical arrows indicate the dates of the grass cutting.

Figure 5 .
Figure 5. Measured and modelled GPP (g C m −2 day −1 ) at the Monte Bondone site in 2004 for the fixed homoscedastic HO1 (a) and heteroscedastic HE1 (b), inferred homoscedastic HO2 (c) and heteroscedastic HE2 (d) inversions.The measured GPP is depicted with a constant (a) and variable (b) uncertainty range.For the HO2 and HE2 inversions, the 95 % total predictive uncertainty interval is depicted using the light grey shaded areas.Standardized residuals and partial autocorrelation of residuals of GPP over the full simulation period are depicted to the right of each graph.

Figure 6 .
Figure 6.Measured and modelled GPP (g C m −2 day −1 ) at the Monte Bondone site in calibration (2003-2005) and validation (2006-2007) for the inferred homoscedastic HO2 (a) and heteroscedastic HE2 (b) inversions.The 95 % confidence interval of total predictive uncertainty is depicted using the light grey shaded areas.

Figure 7 .
Figure 7. Posterior distributions of the specific leaf area (SLA, dashed line) and characteristic mortality time in stress conditions τ s (plain line) for the four sites (HO2 inversions values), classified as a function of increasing aridity by the De Martonne-Gottman index (grey bars).The mean of the posterior distributions and the most likely parameter values are depicted with a circle and a star, respectively.The error bars stand for one standard deviation around the mean.

Table 1 .
Grassland sites and periods of simulations.

Table 2 .
Default values and prior distributions of the 10 model parameters and prior distributions of the statistical parameters of the homoscedastic and heteroscedastic error models.The label U means a uniform distribution, TG signifies a zero-mean Gaussian distribution truncated at 0 to avoid negative values and SD denotes the prescribed standard deviation of a TG distribution.

Table 3 .
Most likely CARAIB model parameter values for all inversion scenarios.

Table 4 .
Comparison between measured and modelled signals using most likely parameter values.The "ml" variable is the maximum value of the log-likelihood function.

Table 6 .
Validation of the calibrated model using most likely parameter values from the inversions.