A Bayesian ensemble data assimilation to constrain model parameters and land-use carbon emissions

. A dynamic global vegetation model (DGVM) is applied in a probabilistic framework and benchmarking system to constrain uncertain model parameters by observations and to quantify carbon emissions from land-use and land-cover change (LULCC). Processes featured in DGVMs include parameters which are prone to substantial uncertainty. To cope with these uncertainties Latin hypercube sampling (LHS) is used to create a 1000-member perturbed parameter ensemble, which is then evaluated with a diverse set of global and spatiotemporally resolved observational constraints. We discuss the performance of the constrained ensemble and use it to formulate a new best-guess version of the model (LPX-Bern v1.4). The observationally constrained ensemble is used to investigate historical emissions due to LULCC ( E LUC ) and their sensitivity to model parametrization. We ﬁnd a global E LUC estimate of 158 (108, 211) PgC (median and 90 % conﬁdence interval) between 1800 and 2016. We compare E LUC to other estimates both globally and regionally. Spatial patterns are investigated and estimates of E LUC of the 10 countries with the largest contribution to the ﬂux over the historical period are reported. We consider model versions with and without additional land-use processes


Introduction
Due to constraining atmospheric CO 2 concentrations and the relatively well known CO 2 sink in the ocean it follows that about a fifth of anthropogenic CO 2 emissions is stored in the terrestrial biosphere (Ciais et al., 2013). However, the partitioning of this land-atmosphere flux to effects from humaninduced land-use and land-cover change (LULCC) and the transient change in the residual terrestrial sink remains highly debated (Schimel et al., 2015). It is estimated that approximately a third of the cumulative anthropogenic CO 2 emissions in the industrial period stem from the effects of LULCC Brovkin et al., 2013;Houghton and Nassikas, 2017;McGuire et al., 2001;Mahowald et al., 2017;Pongratz and Caldeira, 2012;Sitch et al., 2015;Strassmann et al., 2008;Stocker et al., 2017Stocker et al., , 2014Peng et al., 2017). A better understanding of the mechanisms of the historical terrestrial carbon cycle is vital for more accurate future projections of the global carbon cycle and climate. In addition, a better understanding of the residual terrestrial sink can also help to improve our understanding of the terrestrial carbon cycle of the past, unperturbed by human influence.
Amongst others, dynamic global vegetation models (DGVMs) are used to quantify the contribution of LULCC to the terrestrial carbon budget (Le Quéré et al., 2016). The assessment of the performance of a given model using observational benchmarks is actively discussed in the literature Peng et al., 2014;Kelley et al., 2013;Luo et al., 2012;Blyth et al., 2011;Randerson et al., 2009) and different frameworks have been proposed. In addition to uncertainties in the prescribed LULCC forcings and the representation of LULCC and other processes in DGVMs, the Published by Copernicus Publications on behalf of the European Geosciences Union. values of the applied parameters are subject to substantial uncertainties. We use a Monte-Carlo-like data assimilation approach Steinacher and Joos, 2016;Battaglia and Joos, 2018) to sample 15 key model parameters and construct a 1000-member model ensemble to investigate this parameter-related uncertainty in the DGVM LPX-Bern and establish a new reference version of the model. A total of 14 data products are used as observational constraints. These range from global inventories of carbon (Ciais et al., 2013) to spatially resolved satellite estimates of photosynthetically absorbed radiation (Gobron et al., 2006). The goal of the data set selection process is to have observations capturing the magnitudes of fluxes and inventories in the carbon cycle, as well as its transient response to the anthropogenic perturbation.
The assimilation of observations should be an integral part of model development. Various approaches to incorporate constraining data exist, such as variational approaches minimizing a cost function using the adjoint of the model Kaminski et al., 2013) or the use of ensemble Kalman filters (Lorenc, 2003;Gerber and Joos, 2013;Stöckli et al., 2011;Ma et al., 2017). A drawback of these methods is that the sampling process is dependent on the choice of the cost function, the design of which is not trivial when assimilating multiple observations simultaneously. Other approaches have also been investigated, such as using generalized likelihood function for model calibration and uncertainty estimation (Beven and Binley, 1992). Here we employ the Latin hypercube sampling (LHS) (McKay et al., 1979) approach, as used successfully in previous studies Battaglia et al., 2016;Steinacher and Joos, 2016;Battaglia and Joos, 2018;Zaehle et al., 2005). It allows simultaneous stratified sampling of a range of parameters, given an appropriate prior parameter distribution, while offering the opportunity to change evaluation metrics a posteriori, thus enabling a sensible incorporation of multiple observational constraints. By improving the prior distribution iteratively it is possible to reasonably capture observations while considering trade-offs between the different targets. Additionally, this approach not only yields a best-guess of parameter values but also contains information about the associated uncertainties. A drawback of this technique is that it is not possible to increase the size of the ensemble after the initial sampling and, if the range of the prior distribution is too large, the algorithm has decreased computational efficiency.
While the land-atmosphere carbon flux can to some extent be constrained by the other components of the global carbon cycle, the contribution of LULCC and, in turn, the implied residual terrestrial carbon sink are highly uncertain. Efforts to fill this knowledge gap have been made using bookkeeping approaches (Houghton et al., 2012;Hansis et al., 2015;Houghton and Nassikas, 2017) and bottom-up modeling approaches using DGVMs (McGuire et al., 2001;Stocker et al., 2014;Wilkenskjeld et al., 2014;Sitch et al., 2015). Book-keeping models can offer valuable information on the magnitude of regional and global LULCC emissions (E LUC ), but they typically rely on time-invariant estimates of carbon densities and thus neglect the direct impact of climate change on vegetation. Observational data on carbon densities and response of the vegetation to LULCC effects can be directly incorporated in bookkeeping models. In contrast, DGVM model studies are able to produce highly resolved spatial results and consider changes to vegetation structure due to anthropogenic perturbance, but DGVMs have large uncertainties due to differences in process modeling and parametrization. Additionally, a number of LULCC processes are often neglected, such as the effect of gross land-cover transitions (shifting cultivation), management (wood harvest) or erosion. Studies investigating these processes generally have found that the inclusion of those processes leads to an increase in E LUC Wilkenskjeld et al., 2014;Stocker et al., 2014). On the other hand, neglected processes such as human-induced erosion can have the opposite effect and reduce net E LUC (Kosmas et al., 2007;Billings et al., 2010;Hoffmann et al., 2013;Wang et al., 2017). The effect of parameter uncertainty on these estimates is often only considered indirectly in the intercomparison of models. Here we investigate a parameter ensemble of a single DGVM, constrained by observation, and provide direct estimates of parameter-induced uncertainties in LULCC estimates. These uncertainties are put into context by investigating the effect of additional LULCC processes, such as shifting cultivation and wood harvest, as already investigated in previous studies (Stocker et al., 2014;Wilkenskjeld et al., 2014;Shevliakova et al., 2009). We rely here on the LUH2 v2h (Hurtt et al., 2018) land-cover data to force the DGVM LPX-Bern v1.4.

LPX-Bern
The Land surface Processes and eXchanges (LPX-Bern) model Stocker et al., 2013;Keller et al., 2017) is a dynamic global vegetation model based on the Lund-Potsdam-Jena (LPJ) model (Sitch et al., 2003). It features coupled nitrogen, water and carbon cycles and distinguishes between different types of prescribed land-use classes: natural vegetation, peatland, cropland, pasture and urban land. The vegetation composition for a given land-use class is determined dynamically. Different plant functional types (PFTs), with given bioclimatic limits, compete for resources. Here, eight tree PFTs and two herbaceous PFTs are used to describe natural vegetation; the same two generic herbaceous PFTs grow on pasture and cropland, and two moss PFTs, two flood-tolerant tropical PFTs and a floodtolerant herbaceous PFT grow on peatlands.
Two different configurations are used to treat the transition between different classes of land use. The simpler implementation adjusts the fractional land-use cover at the end of each year such that the prescribed area fractions are achieved; this computationally efficient configuration is referred to as net land use. The more advanced gross land-use implementation also includes effects of shifting cultivation and wood management by prescribing all the transitions between different land-use classes and harvested wood (Stocker et al., 2014;Strassmann et al., 2008). Furthermore, it includes an additional land-use class, the so-called secondary forestnatural vegetation growing on abandoned pasture or cropland. A major drawback of this scheme is the significantly increased computational cost. Additionally, the implementation of gross land use in LPX-Bern in the current version does not allow for the simultaneous simulation of peatlands. For both schemes a fraction ox frac of the crops above-ground biomass is directly oxidized to the atmosphere, simulating crop harvest. A total of 75 % of heartwood and sapwood biomass production from forest conversion is assigned to decaying product pools; the remaining 25 % are respired directly to the atmosphere as assumed harvest losses. Associated root and leaf mass are transferred to a below-and aboveground litter pool respectively. The biomass in the product pools is evenly split in a long-lived (mean lifetime 20 years) and a short-lived (mean lifetime 2 years) pool. In the gross LULCC setup, biomass is harvested according to the prescribed forcing and the resulting heartwood is assigned to product pools using the same allocation rules as before.

Model setup and spin-up
The model is run on a 1 • × 1 • global grid and forced with CRU TS3.25 climate data (Harris et al., 2014) and global atmospheric CO 2 concentration from ice core reconstructions (Meure et al., 2006;Joos and Spahni, 2007) and direct atmospheric measurements after 1958 (Tans andKeeling, 2017). The land-use harmonization LUH2 v2h (Hurtt et al., 2018) estimates for land-use patterns and transitions are prescribed to the model. Additionally, nitrogen deposition (Lamarque et al., 2013) and fertilization (Zaehle et al., 2011) and the extent of Northern Hemisphere peatlands (Tarnocai et al., 2009) are prescribed. As described in Sect. 2.3 we use an ensemble approach featuring 1000 simulations with different parameters. All ensemble members share a 1500-year spinup run to pre-industrial conditions, using the median parameter values. To ensure the equilibration of each member an additional 300-year individual spin-up run, featuring an analytical equilibration of the soil carbon pools after 100 years, is performed. The model is then run transiently from 1800 to 2016 with recycled climate data (years 1901-1930) in the 19th century.

Sampling and constraining
The model parameter space is sampled using Latin hypercube sampling (McKay et al., 1979) to create an ensemble of model configurations and assess model uncertainty. LHS is a stratified sampling method using chosen prior parameter distribution to generate an uncorrelated parameter ensemble of a given size. In contrast to most Monte Carlo data assimilation techniques, the sampling is independent of the metrics used to assess model performance, allowing us to modify the metrics after the sampling without substantial computational effort. A drawback of this sampling strategy is that it is not possible to increase the size of the ensemble after the initial sampling. The generated ensemble is then constrained using an hierarchical weighting scheme of deviations to observational data sets to obtain a global skill score, rating each model member. Table 1 lists the selected sampling parameters as well as their old values in LPX v1.2 and new best-guess values (LPX v1.4). The parameters were selected for their importance in various aspects of the model; 10 of the 15 parameters were also used by Steinacher et al. (2013). The fraction of photosynthetically active radiation assimilated at ecosystem level relative to leaf level, α a ; the intrinsic quantum efficiency of CO 2 uptake in C 3 plants, α C 3 ; and θ the rubisco co-limitation shape parameter are of primary importance for the photosynthetic carbon assimilation. g m and α m are parameters in the empiric water demand calculation and have a direct impact on the hydrological cycle and consequently also the carbon assimilation. The sapwood-heartwood turnover time, τ sapwood ; the maximum mortality parameter, mort max ; and the ratio between leaf area and sapwood area, k la : sa , are vital for the allocation of the carbon to the different vegetation pools and thus also the overall vegetation carbon pool size. The fractions of the flux leaving the litter pools that is respired to the atmosphere directly and entering the slow soil pool, f atm and f slow , influence soil and litter carbon inventories. These pools are further controlled by the temperature sensitivity of heterotrophic respiration E 0,hr , which is of special significance under changing climate, and a scaling factor for soil decomposition k soil,tune , affecting the residence time of both the fast and the slow soil carbon pool. By using factorial simulations two important parameters for the nitrogen cycle were identified: the maximum nitrification rate, nitr max , and the fraction governing immobilization of mineral nitrogen in the soil, f imob,soil . Finally, the oxidation rate of crops ox crop , representing the harvest of biomass on croplands, is directly linked to emissions from human land use.

Selection of the prior distribution
The prior distribution used for LHS was derived in multiple steps following partly an explorative approach. An initial version of the ensemble with 1000 members was run using the 10 LPX parameters and distribution used by Steinacher et al. (2013) and four additional parameters relating to the nitrogen cycle and oxidation rates in areas with anthropogenic land use. The ensemble is sampled using normal and log-normal distributions with distribution parameters chosen such that the median matches the parameter value of LPX-Bern v1.2 and the 90 % confidence interval matches plausible ranges or literature-based ranges where available. Normal distributions are used by default; log-normal distributions are used for parameters with asymmetric parameter ranges and parameters with values close to zero. This initial ensemble was evaluated against a subset of the observational constraints presented in Sect. 2.4 and it was found that ensemble performance is poor, especially with respect to global atmosphere-land fluxes. Sensitivity of model outcomes to individual parameter values was explored by 76 factorial simulations where additional parameters were varied. The information from these sensitivity simulations and the results on parameter sensitivity of an earlier study (Zaehle et al., 2005) are used to identify key model parameters. In addition, six ensembles of reduced size (four 200 members and two 300 members), featuring slightly different parameter combinations, were used to refine the median parameter values and their ranges. By evaluating these simulations the final set of parameters presented in Sect. 2.3.1 was selected. The final iteration included the sequential computation of three observation-constrained ensembles with 1000 members each. The first of these three ensembles was calculated with prior distributions based on the refined median parameter values. The median and 95 % con-fidence of the posterior distribution after observation assimilation as described in Sect. 2.3.3 is then used as the prior distributions of a new 1000-member parameter ensemble. This procedure is repeated one more time to arrive at the prior distributions used in the final ensemble and displayed in Fig. 1.
No formal convergence criterion is employed, since the computation and evaluation of a single ensemble represents a considerable computational and analytical effort. The near convergence of the posterior (Sect. 2.3.3) and prior distribution of the final ensemble ( Fig. 1) indicates a near-optimal solution for the parameter distribution in the context of the observational constraints and the associated skill score metric (Sect. 2.3.3). In addition, this convergence of prior and posterior distribution also indicates that the final prior distribution is suited to adequately sample the parameter space for our selection of observational constraints. The differences between the parameter value used in the older LPX-Bern v1.2 and the best-guess parameter values of the final ensemble ( Fig. 1; see also Table 1) provide a measure of how much individual parameters were revised during our iterative data assimilation. For completeness, we report that individual forcing data sets, such as the land-use data, were updated and the set of observational constraints expanded during the course of the work.

Skill scores and the posterior distribution
The performance of the final 1000-member model ensemble is evaluated using the set of observational constraints listed in Table 2. The model-data discrepancy for a given observational data set i and model run is estimated by the relative mean squared error (MSE i rel )  where w j are the normalized weights of the data points j , which in the case of gridded data sets correspond to the grid cell area. X mod,i j and X sim,i j correspond to the modeled and observed data points for constraint i respectively. In accordance with Schmittner et al. (2009) and Steinacher et al. (2013) the combined error σ 2 is approximated by the modeldata variance of the model member with the smallest MSE i rel of the ensemble. As a consequence, the smallest possible MSE i rel using this approximation is 1. If the observational error is known and larger than the variance, it is instead used as an estimate for the combined error, allowing a minimum MSE i rel of zero. The MSE i rel of all individual observational constraints is aggregated to a total error MSE tot rel with a hierarchical weighting scheme shown in Fig. 2 and translated to a skill score S m for each ensemble member m. We require that MSE rel is lower than 5 for each of the individual observational data sets; otherwise S m is set to 0.
The size of the ensemble is further reduced by excluding runs with low skill scores, such that the remaining 667 runs have 99 % of the cumulative skill score m S m of all runs, which we term the constrained ensemble. The maximum achievable skill score is not 1 for spatially resolved data since it would correspond to a MSE tot rel of 0, which is not achievable due to the approximation for the combined error, used in the spatially resolved constraints. We did not renormalize skill score to a scale between 0 and 1.
The so-called posterior distribution of a parameter or quantity of interest is obtained by using the skill score weighted normalized histogram, which can be interpreted as a probability density function, of the constrained ensemble. The skill weighted median and confidence interval of a given quantity is then determined by transforming the histogram to a discrete cumulative density function using a cumulative sum and approximating the desired quantiles by a firstorder interpolation. Throughout this paper we report the skill weighted median of numerical results along with the 5 and 95 % quantiles, corresponding to the 90 % confidence interval, in parentheses.

Observational constraints
The calculation of the MSE rel requires the model and observational data to conform to the same structure. In the following, the required pre-processing will be outlined briefly. The seasonality of the fraction of absorbed photosynthetically active radiation (FAPAR) as simulated in the model is compared to a satellite-derived product (Gobron et al., 2006) 1980-1989 1990-1999 2000-2009 2002-2011 1750-2012 FLUXNET ≤45° latitude >45° latitude Keith Luyssaert Evapotranspiration Growth of seasonal atm. CO 2 amplitude Model-data discrepancy for each observation calculated as follows: Individual constraints aggregated to total skill score with hierarchical weighting: Net land carbon uptake (deconv.) Figure 2. Hierarchical weighting scheme to aggregate the relative mean squared error of individual observational constraints to a total error which is then mapped to a total skill score.  (2013) Transient Land uptake (deconvolution) Global land uptake from atmospheric deconvolution (this study) Transient Land uptake (IPCC) Global land uptake in five periods Ciais et al. (2013) which was regridded to the model grid, and the MSE is calculated from the averaged monthly fields in the measurement period. The modeled total and soil carbon distribution between 1982 and 2005 are compared to a data set based on observations (Carvalhais et al., 2014), regridded to the model resolution. The soil carbon map is divided into low-and highlatitude regions in order to avoid potential biases from peat areas with very high soil carbon content.
For site level observed NPP (multi-biome NPP; Olson et al., 2013, and FLUXNET v3.1;Luyssaert et al., 2009Luyssaert et al., , 2007, the site measurements are compared to the averaged modeled NPP of natural vegetation between 1931 and 1997 of the corresponding model grid cell. If multiple measure-ments are contained in the same grid cell they are averaged. Similarly, the site level measurements of biomass carbon (Keith et al., 2009;Luyssaert et al., 2009Luyssaert et al., , 2007 are compared to the modeled natural vegetation carbon, averaged between the periods 1950-2000 and 1931-1997 respectively. The biomass carbon of Luyssaert et al. (2009) is obtained by using a carbon to organic matter conversion factor of 0.475.
The TM2 (Kaminski et al., 1999), a global atmospheric tracer model, was used to translate the gridded landatmosphere flux to local anomalies in atmospheric CO 2 . This method does not include the interannual variability of the transport. Nine sites from the GLOBALVIEW-CO2 database (GLOBALVIEW-CO2, 2013) were selected and the annual offset-corrected seasonality of CO 2 in the period of 1980-2013 was compared. The influence of sea-air carbon exchange on the seasonal cycle and trend in atmospheric CO 2 are taken into account. This is done by prescribing net sea-toair fluxes as simulated by the Bern3D model (standard setup) (Battaglia and Joos, 2018;Roth et al., 2014;Ritz et al., 2011). The growth of the seasonal amplitude at a subset of four sites with high seasonality was used as a further constraint.
The modeled mean annual evapotranspiration between 1989 and 2005 was compared to the LandFlux-EVAL evapotranspiration data product (Mueller et al., 2013).
The global terrestrial carbon flux is constrained by a deconvolution, for which the global atmospheric CO 2 concentration; the median of an ensemble of simulated oceanatmosphere fluxes (Battaglia and Joos, 2018), consistent with other estimates (Khatiwala et al., 2013;DeVries, 2014); and an inventory of anthropogenic CO 2 emissions (Boden et al., 2017) were used. The combined error in Eq. (1) is estimated by propagating the 90 % confidence interval of oceanatmosphere fluxes and assuming a 5 % uncertainty for the anthropogenic emissions (Ballantyne et al., 2015).
The estimates of global soil and vegetation carbon as given by IPCC (Ciais et al., 2013) are used as a global constraint. The observation-based estimates are compared to the average soil and vegetation carbon over the whole industrial period. Additionally, the estimates for the global land-atmosphere flux in the periods 1970-1979, 1980-1989, 1990-1999, 2000-2009 and 2002-2011 are compared to the simulated land-atmosphere fluxes over the same period. Since the model simulation starts only in the year 1800, the estimated land-atmosphere flux over the industrial period from 1750-2011 is compared with the model by approximating the flux of the period 1750-1800 with 1801-1850. For all global constraints, the uncertainties reported by IPCC are used as an estimate for the combined error in Eq. (1).

Definition of land-use emissions and the setup of the model ensembles
To quantify emissions from LULCC, a second simulation featuring a time-invariant pre-industrial land-cover distribution and nitrogen fertilization is run for every ensemble member. In accordance with the TRENDY model intercomparison (Sitch et al., 2015), we define the emissions from LULCC as the difference of the change in carbon in the reference and fixed LULCC simulation. The change in carbon in the land system is calculated from the cumulative net biome production (NBP), including emissions from product pools. Since the additional simulations with fixed LULCC feature transient CO 2 and climate forcing, the direct impact of climate change and increasing CO 2 on E LUC are considered; however, unlike in coupled models (Strassmann et al., 2008) physical and biogeochemical feedbacks of LULCC on the climate are neglected. We refer to the literature (Strassmann et al., 2008;Pongratz et al., 2014;Stocker and Joos, 2015) for further discussion of differences in the definition of land-use fluxes.
For each of the parameter sets four transient simulations over the industrial period are performed: (i) a simulation with prescribed net transitions (M net,net and M gross,net ), (ii) a simulation with prescribed gross transitions (M gross,net and M gross,gross ), (iii) a run with land-use area fixed at preindustrial levels and (iv) a run with land use including shifting cultivation held at preindustrial levels. The last two simulations are used purely diagnostically to determine E LUC . E LUC is investigated using three different ensemble configurations. M net,net labels the standard model version featuring only net LULCC transitions. M gross,net and M gross,gross feature modules for shifting cultivation and wood harvest (gross land use) but lack northern peatlands due to technical limitations. M gross,net reuses the skill scores calculated for M net,net and M gross,gross features skill scores calculated on the basis of the gross land-use configuration.
For the M net,net ensemble and the M gross,net ensemble, the prior distributions of the model parameters were improved iteratively during the development of our benchmark system. Consequently, the solutions for the model parameters and associated model outcomes converge. For example, the prior and the posterior probability distribution of the sampled parameters are nearly identical (Fig. 1). This provides strong support that an optimal solution for the sampled parameters has been found for the applied model structure and observational constraints. In contrast, the parameters of the M gross,gross ensemble were not improved iteratively, given the computational cost, and prior and posterior solutions do not converge.

Land-use emissions
The use of the ensemble framework allows us to quantify both the magnitude and the uncertainty of land-use emissions in a model due to parameter spread. Following the procedure outlined in the method section, E LUC is computed for every ensemble member. In this section, we first present E LUC , total land-atmosphere fluxes and the residual land carbon sink on a global scale for the three ensemble configurations and then further analyze spatial patterns and regionally aggregated estimates.

Global fluxes
Global aggregates of skill weighted median NBP, E LUC , residual terrestrial sink and their respective cumulative fluxes, including a 90 % confidence interval as an estimate for model parameter uncertainty, are shown in Fig. 3. For the standard model configuration M net,net , featuring net land use, the total change in land carbon (i.e., cumulative NBP) is a release of 24.4 (4.5, 44.0) PgC from 1860 to 1960 and an up- In addition to the standard model configuration a second ensemble of a model configuration M gross,net featuring modules for shifting cultivation and wood harvest (gross land use) is employed. By using the skill scores M net,net , the parametrization remains identical, allowing us to compare the pure mechanistic difference between the two versions. The difference in median E LUC between the net and gross land-use configuration is most pronounced in the second half of the 20th century and amounts to 44.5 PgC between 1860 and 2016. The gross land-use ensemble simulates on average 0.40 PgC yr −1 more emissions due to LULCC between 1950 and 2016. This result is compatible with the earlier study by Stocker et al. (2014), which investigated land-use change using an earlier version of LPX-Bern with a single parameter configuration. The residual terrestrial sink shows as expected a near identical behavior in the two versions. The resulting total change in land carbon is negative, with a slight uptake of carbon at the end of the century, amounting to 9.3 (−0.9, 22.2) PgC between 1990 and 2016.
A third model configuration M gross,gross is obtained by recalculating the skill scores from the gross land-use results. As described in Sect. 2.5, the prior distributions of the M gross,gross were not improved iteratively to yield convergence between prior and posterior solutions. This leaves only 200 runs in M gross,gross in contrast to the 667 runs in M net,net and consequently M gross,net . In addition, several important benchmarks such as vegetation carbon density are not simulated as well in M gross,gross compared to M net,net and M gross,net . Since NBP is constrained by observations, median cumulative NBP from 1860 to 2016 is only 18.6 PgC lower in the M gross,gross than in the M net,net ensemble. Surprisingly E LUC is only 21.4 PgC higher over the period from 1860 to Table 3. Comparison of the skill weighted median emissions due to land-use change in the two constrained LPX parameter ensembles (90 % confidence intervals in brackets) to the bookkeeping method and DGVM model ensemble of Le Quéré et al. (2016). The uncertainty in the DGVM multi-model ensemble is given by the standard deviation across model members; for the bookkeeping method a best-value judgement on the uncertainty of ±0.5 PgC yr −1 is provided.
Mean E LUC (PgC yr −1 ) 1960-1969 1970-1979 1980-1989 1990-1999 2000-2009 ) is reduced from 90 % in the standard M net,net ensemble to 83 % in the M gross,gross ensemble. As a result of these two adjustments, E LUC is smaller in the M gross,gross than in the M gross,net ensemble. If the relative importance of the land-atmosphere observational constraints is increased, the difference in E LUC of M gross,gross and M net,net is decreased even further. E LUC as simulated by LPX-Bern is compared in Table 3 to a bookkeeping method and a DGVM model ensemble average from the Global Carbon Project (GCP, Le Quéré et al., 2016). E LUC in the net land-use configuration M net,net is considerably smaller than the estimates of the GCP with an average annual emission of 0.64 (0.29, 0.95) PgC yr −1 between 1960 and 2009, compared to the 1.4 PgC yr −1 of the bookkeeping approach and the 1.2 PgC yr −1 of the multimodel DGVM approach. The emissions of the gross landuse configuration with gross skill scores are higher but still fairly low at 0.88 (0.51, 1.17) PgC yr −1 . The version featuring gross land use with net skill scores yields higher landuse emissions at 1.07 (0.66, 1.46) PgC yr −1 , which is within the uncertainties of both estimates. The largest discrepancy between LPX and GCP is found in the 1990s and 2000s. The uncertainty in the parameter ensembles is comparable to the uncertainty in the multi-model ensemble of the GCP. The tendency towards low emissions is a consequence of the ensemble favoring low emissions to match the observational total land-atmosphere flux, combined with a relatively weak residual terrestrial sink in LPX-Bern.
In the following the ensemble version with gross land use and skill scores from the net land-use ensemble M gross,net is used to investigate the spatial structure of E LUC . This is motivated by the much better representation of the vegetation carbon benchmark in the M gross,net ensemble than in the M gross,gross ensemble and a higher confidence in the overall benchmark performance of the M net,net ensemble. A caveat of this choice is that the net land-atmosphere flux is underestimated in M gross,net because the residual land sink only responds to the lower E LUC of M net,net . However, if only considering E LUC we expect the magnitude of the residual land-sink and net land-atmosphere flux to be less important than model performance with respect to vegetation carbon  and other benchmarks.

Spatial patterns and regional aggregates
The land-atmosphere fluxes show large regional differences (Fig. 4). The most pronounced feature of net atmosphereland fluxes is the release of carbon due to deforestation in the Amazon rainforest and the regions close to the equator and a tendency towards a net uptake of carbon at higher latitudes, such as central Europe. The calculated land-use emissions E LUC are positive everywhere except central Europe and the west coast of North America, resulting in the expected overall emission of carbon due to land-use change. The residual carbon uptake, that is the total atmosphere-land flux minus the contribution of land-use change, shows a consistent uptake of carbon between 1901 and 2016, with the exception of some areas with high ensemble uncertainty. There are large regions where the 90 % confidence interval in the ensemble does not agree on the sign; however, most of these areas feature low NBP.  The E LUC estimates of M gross,net are aggregated to regions and compared to estimates of Houghton and Nassikas (2017) (Fig. 5). Since spatial output in LPX is only available after 1901 in LPX, the period 1850 to 2015 in Houghton and Nassikas (2017) is approximated by the interval 1901 to 2015. The global skill weighted median E LUC from 1850 to 1900 amounts to 24.5 (16.9, 33.6) PgC. Overall, the global median emissions between 1850 and 2015 in LPX amount to 144.5 (97.5, 192.7) PgC, which is very close to the estimate in Houghton and Nassikas (2017) of 145.5 ± 16.0 PgC. The largest discrepancy in the individual regions is found in South and Southeast Asia, where LPX yields lower emission estimates, which might be a consequence of the lack of tropical peatlands in the ensemble. In the recent decade from 2005 to 2015, the agreement is less pronounced. While the global annual flux simulated by LPX of 866 (552, 1181) TgC yr −1 is within the uncertainty of the independent estimate of 1113 ± 345 TgC yr −1 , the distribution of this flux to the regions shows some divergence. In LPX the tropical regions yield lower emissions, which is somewhat offset by a weaker sink effect in the temperate regions of North America, Europe, China and the former Soviet Union.
By using the Natural Earth Data administrative borders the E LUC estimates of M gross,net are aggregated to individual countries. The E LUC estimates of the 10 countries with the largest contribution to total E LUC from 1901 to 2016 are shown in Fig. 6. Brazil emitted the most carbon due to landuse change, because of the size of the country combined with the high emissions per unit area. The United States of America, China and Russia have moderate per-unit-area emissions but are a large contributor due to their sheer size. These three countries show a decrease in emissions in the 21st century, with the USA and Russia having negative emissions in the 2000s. Indonesia shows the largest per-area emissions of the considered countries and emissions increase in the 2000s. The emissions in Indonesia are likely underestimated due to a lack of tropical peatlands in the ensemble.

Evaluation of ensemble performance with respect to observational targets
In this section, the performance of the net land-use ensemble members (M net,net ; M gross,net performance is nearly identical) in the different observational metrics is discussed. In Fig. 7   higher than the minimum skill criterion. With the exception of the biomass measurements by Keith et al. (2009) and the FAPAR benchmark, the maximum skill in the constrained ensemble is identical to the full ensemble. The reduced maximum skill in those benchmarks is due to an exclusion of singular runs excelling at this benchmark but performing badly in others. LPX v1.4, indicative of the M net,net ensemble performance, is compared to the observational targets in more detail in Figs. S1-S14 in the Supplement.
As an illustration of the observational constraints, we consider the seasonal cycle of atmospheric CO 2 and the growth in the amplitude of the seasonal cycle of atmospheric CO 2 . In Fig. 8 the median simulated values, as well as the 90 % confidence interval, of the M net,net ensemble are compared to the atmospheric measurements (GLOBALVIEW-CO2, 2013) for a subset of two measurement sites, Alert (Nunavut, Canada) and Terceira Island (Azores, Portugal). The model ensemble is able to reproduce the seasonality pattern, as well as the increase in seasonal amplitude. As expected, the interannual variability in seasonal amplitude of CO 2 is not captured as the atmospheric transport model TM2 does not represent interannual variability in mass transport.
For the scalar targets, the median values and range of the full and constrained ensemble are compared in Fig. 9. The constrained ensemble shows a consistently improved performance for the uptake targets. In general, the targets are matched well for the 20th century but net land carbon uptake is underestimated in the model ensemble compared to the observational estimates in the beginning of the 21st century. Soil carbon and vegetation carbon inventory are matched well in the model, with a considerable decrease in model spread in the constrained ensemble. The median vegetation carbon of the constrained ensemble is lower than the full ensemble. This is due to a trade-off in the skill of land carbon uptake; increased vegetation carbon leads to a higher release of carbon due to deforestation.
Vegetation carbon inventory and spatial distribution are highly relevant for E LUC estimates . The sum of the vegetation carbon estimate and soil carbon estimate by Carvalhais et al. (2014) is used as a constraint for the total carbon; however, the individual vegetation carbon data is not used as a constraint. Nevertheless, the global vegetation carbon inventories of the two products are compatible with 422 (328, 523) PgC for the vegetation carbon as simulated by LPX and 449 PgC for the Carvalhais et al. (2014) estimate. The spatial patterns (Fig. 10) between simulated vegetation and the Carvalhais et al. (2014) estimates are fairly consistent with a correlation between the two products of r 2 = 0.83. LPX simulates somewhat more carbon in vegetation in the high latitude. The extent of areas with high vegetation density in tropical Africa is larger in LPX, but peak vegetation density in this area is lower than in the observational product. The vegetation carbon density in the model is somewhat lower in Southeast Asia.
We compare the total land-atmosphere exchange flux to the results of the atmospheric CO 2 deconvolution in Fig. 11. In panels (c, d) the growth in the amplitude of atmospheric CO 2 for the same two measurement sites (GLOBALVIEW-CO2, 2013) (blue) is compared to the median of the model ensemble, with the 90 % confidence interval shaded in red. A linear fit indicated by dashed lines is included. The CO 2 concentration at a given site and time is computed with the TM2 transport model using simulated net land-atmosphere fluxes for each ensemble member and oceanatmosphere fluxes from the Bern3D ocean model (Battaglia and Joos, 2018). The seasonal cycle of CO 2 is dominated by fluxes from the land -in particular, the Northern Hemisphere.
The model ensemble shows lower emissions in the early 20th century and slightly underestimates NBP in the latter half of the 20th century compared to the deconvolution. The overall exchange of carbon over the industrial period is within the uncertainty of the estimate.
We investigate the dependency of the constrained ensemble on the choice of the observational constraints by reevaluating the ensemble for a subset of observations. We created 19 weighting schemes, each missing one of the individual observational constraints ( Fig. 2 and Table 2) and otherwise identical to the default scheme. Then the median skill weighted parameter values of these ensembles are compared to the best-guess values of M net,net (Sect. 3.3). The relative change in parametrization is less than 1 % for 15 out of the 19 considered alternative weighting schemes. Leaving away the global vegetation and soil carbon constraints lead to moderate changes, notably to a change in the parameter for mortality (mort max ) of 4 and 2 % respectively. Not including the soil carbon distribution in high latitudes lead to an increase in the parameter for the dependency of soil respiration on temperature (E 0,hr ) of 2 %. The largest changes in parametrization were observed when not considering the atmospheric deconvolution; most notably the sapwood-heartwood turnover time τ sapwood decreased by 5 %. When omitting entire categories in the benchmarking scheme, the changes in parametrization are larger than for omitting individual constraints, with parameter changes of up to 1 % for the fluxes, 5 % for the inventory and 6 % for the transient category. This shows that the final parametrization is not overly sensitive to the inclusion or omission of a single observational product.
The unweighted kernel density estimates of the prior (full ensemble) and posterior (constrained ensemble) parameter distributions are shown in Fig. 1. The iterative procedure discussed in Sect. 2.3.2 results in only slight changes in the posterior distribution with respect to the prior distribution. The median of the distributions is, however, substantially different from the initial parameter value used in LPX v1.2, the version used as a starting point for this study.

Parameters of the new reference model version
We use the constrained ensemble to establish a new reference model version, featuring a set of optimized parameters. The reference version will be used for model simulations where the use of an ensemble is not appropriate or required.
The skill weighted median parameter values of the constrained ensemble are used as a reference model and its parameter values are shown in Table 1. In Fig. 11 cumulative NBP is displayed for an older model version, the mean values of constrained and unconstrained model ensemble and a run with the new best-guess parameters. The best-guess version is similar to the mean behavior of the constrained ensemble, showing a net uptake of carbon in the latter half of 1 8 0 0 -2 0 1 1 1 9 8 0 -1 9 8 9 1 9 9 0 -1 9 9 9 2 0 0 0 -2 0 0 9 2 0 0 2 - the 20th century, consistent with observations (Ciais et al., 2013). We note that the intermediate version 1.3 used in Keller et al. (2017) features similar parameter settings as determined here. This version simulated 20th century changes in carbon isotope discrimination and intrinsic water use efficiency in good agreement with tree-ring data. The severe underestimation of the land-carbon sink in older versions of LPX-Bern was a consequence of the introduction of new features and improvements in the code of LPX-Bern, without subsequent retuning of the parametrization. The parameter changes are most pronounced in the temperature dependence of heterotrophic respiration E 0,hr and α m , a parameter associated with plant water demand. Both of these changes are not unexpected, as they increase the land carbon sink. In the case of heterotrophic respiration, less carbon is lost due to increasing surface temperature and the increased water demand amplifies the CO 2 fertilization effect.
Overall, the updated parametrization shows a wellbalanced performance in the spatial benchmarks shown in Fig. 7. The older LPX version excels at singular metrics, namely the amplitude growth of CO 2 and the FLUXNET measurements, but breaks down at others, such as the spatial distribution of carbon and evapotranspiration. Furthermore, it also performs considerably worse in the scalar and deconvolution targets.
The choice of using the skill weighted median parameters of the constrained ensemble instead of simply using the bestperforming parameter set for the reference version is motivated by its robustness and representativeness of the ensemble. While the best-performing model member certainly possesses a higher skill score, its parameter values can depend strongly on the choice and weighting of the observational tar- The simultaneous assimilation of multiple observational constraints yields soil and vegetation stocks and distributions which are consistent with observations. The total landatmosphere carbon flux is reproduced relatively well in the model configuration using net land-use M net,net . Comparing the land-atmosphere carbon flux to the independent flux estimates by Schimel et al. (2015) in the period 1990-2007, the tropical and southern fluxes are in good agreement to the atmospheric deconvolution results with airborne constraint with a flux of 0.24 (−0.02, 0.57) in LPX-Bern. The flux in the northern extratropical areas of 0.50 (0.37, 0.63) is on the lower end but easily fulfills the mass balance. The observed uncertainties of E LUC due to parameter uncertainty in the DGVM LPX are on the same order of magnitude as structural uncertainties, such as including or not including modules for shifting cultivation and wood harvest. The effect of the inclusion of additional land-use processes can even be compensated by a change in parametrization, while still conforming to the observational benchmarks, indicating that it might be possible to capture the magnitude of E LUC , while neglecting second-order processes. The compensation of E LUC occurs because the residual sink is less sensitive to parametrization changes than the E LUC in LPX-Bern. This behavior has also lead to an E LUC that is on the lower end of independent estimates (Le Quéré et al., 2016). A lack of large difference in E LUC from model setups featuring gross and net land use might seem in contrast with the result of other studies investigating these processes Wilkenskjeld et al., 2014;Stocker et al., 2014;Shevliakova et al., 2009); however, if we keep parametrization constant (M gross,net ) we find the expected lower E LUC for net land use.
We investigated the magnitude and spatial distribution of E LUC in the model configuration using the skill scores and parametrization from the standard net land-use configuration with additional processes of shifting cultivation and wood harvest (M gross,net ). This choice is motivated by the good performance of the net land-use ensemble in the observational benchmarks (Sect. 3.2 and Figs. S1-S14).
A good correspondence between simulated fluxes and the estimates of Houghton and Nassikas (2017) in 10 regions during the industrial period is found. When comparing recent decades, LPX-Bern generally seems to simulate lower E LUC than both the bookkeeping-approach-based estimate and the aggregated estimates in the GCP. The biggest disparity is comparatively low fluxes in the South and Southeast Asia regions in LPX-Bern, which are at least partially explained by the lack of tropical peatlands in this model configuration. The burning and draining of tropical peatlands is an important contribution to E LUC in tropical regions (Roman-Cuesta et al., 2016;Koh et al., 2011;Hooijer et al., 2010). The annual emissions estimate from draining peatlands used in Houghton and Nassikas (2017) increase from almost no emissions in 1980 to roughly 0.2 PgC yr −1 in 2015. The lack of tropical peatlands is also consistent with the underestimated soil carbon density in these regions when compared 2924 S. Lienert and F. Joos: Data assimilation: model parameters and land-use emissions to Carvalhais et al. (2014). Other studies suggest higher historical E LUC , such as the bookkeeping approach by Hansis et al. (2015), including shifting cultivation, with an estimate of 261 PgC between 1850 and 2005. Some of the difference between DGVM model results and bookkeeping approaches can be attributed to different definitions of LULCC emission Stocker and Joos, 2015).
A recent study by Li et al. (2017) constrained E LUC by using biomass observations. They derived a relationship between E LUC and biomass in nine regions using the nine DGVMs in the TRENDY v2 model intercomparison (Sitch et al., 2015) and applied empirical estimates for biomass carbon to arrive at a constrained E LUC of 155 ± 50 PgC between 1901 and 2012. The result of 116 (77, 156) PgC as in this study is compatible, albeit somewhat lower. By neglecting all other constraints and exclusively using the global vegetation carbon by IPCC (Ciais et al., 2013) and the biomass map by Carvalhais et al. (2014) (also used as one of the constraints in Li et al., 2017) as constraints, we arrive at a higher E LUC of 130 (87, 179) PgC. This illustrates the importance of the biomass inventory for the magnitude of E LUC .
E LUC is not only influenced by uncertain model processes and parametrizations but also the underlying LULCC forcings (Goll et al., 2015). Peng et al. (2017) have shown that the choice of transition rules, governing how new land-use areas are allocated from previous areas, has a considerable effect on E LUC . The effects of these uncertainties are not accounted for in this study since we only use one land-cover forcing product and one set of transition rules is used.
Overall, the ensemble approach produces E LUC estimates consistent with other independent estimates, albeit somewhat on the lower end of the range of estimates. This is a consequence of the constraining process favoring parametrization with low E LUC over a high residual sink, which is discussed further in the next section.

Benchmark performance and best-guess version
A hierarchical weighting scheme to compare a diverse set of constraints was employed, following earlier work . A set of 14 data sets (Fig. 2, Table 2) was selected to constrain the model's performance with regard to steady state carbon and water fluxes and carbon inventories as well as with regard to transient changes. Globally aggregated as well as spatially resolved information is used to constrain simulated spatial patterns and to robustly model global mean properties. The temporal focus is on the decadal-to-century timescales most relevant for projections of anthropogenic climate-carbon cycle changes and on the seasonal cycle of photosynthesis and the decadal amplification of the seasonal cycle in land-atmosphere fluxes (McGuire et al., 2001;Graven et al., 2013) which provide information on underlying processes. The iterative procedure for choosing the prior parameter distribution yielded an en-semble which performs well with respect to the selected metrics.
In addition to the weighting of model results with the global skill score, we employed a minimum skill criterion, discarding runs with very bad performance in a singular metric. This approach is somewhat comparable to pre-calibration methods, where implausible parameter spaces are also ruled out (Williamson et al., 2017;Holden et al., 2010;Edwards et al., 2011), and aims to sensibly reduce the size of the parameter space.
While the uptake of carbon by the terrestrial biosphere in the model ensemble is significantly larger than earlier versions of LPX, it is still in the lower range of estimates. A direct way of increasing the magnitude of change in land carbon is to change pool sizes, which is here restricted by other observational constraints. The inclusion of more processes, such as natural and human-induced erosion , could also increase the strength of the terrestrial sink; however, other processes such as shifting cultivation lead to a decrease in the land carbon sink. A further possibility is the revising of established processes in the model. The climatic dependence of the auto-and heterotrophic respiration is an important component, mitigating the CO 2 fertilization effect. The implementation of a more refined module might decrease this negative feedback, thus increasing carbon storage and sink sensitivity. The sink strength could potentially also be enhanced by including so far not included parameters and including additional constraints that discriminate between the different components of the land sink.
Fossil carbon emissions and thus the net biome production and the carbon sink inferred from the deconvolution may be biased high for the most recent decades. The fossil emissions are estimated from fossil-fuel production data, which include the fraction used for non-combustion purposes such as the production of plastics and asphalt. Boden et al. (2017) assume non-fuel uses equal to zero (Andres et al., 2012) since the products will eventually be oxidized as well. Geyer et al. (2017) estimate that 8.3 Pg of plastics were produced between 1950 and 2015, of which 2.6 Pg were in use in 2015, 0.8 Gt incinerated and 4.9 Gt discarded. This implies that between 2.6 and 7.5 Pg plastic may still be left unoxidized. This is relatively small compared to the residual terrestrial sink, estimated to be around 69 (51, 93) PgC for the period from 1950 to 2015 (M net,net in Fig. 3). However, about half of the plastic was produced since 2000 and estimated production is about 0.4 Pg yr −1 in 2015. In addition, about 0.1 Pg of bitumen asphalt is produced annually. Considering that most of the molecular weight of plastics is from carbon, fossil CO 2 emissions and in turn the terrestrial sink are biased high by up to 0.5 PgC yr −1 in 2015. This potential bias may be compared to the residual terrestrial sink flux of 1.2 (0.8, 1.7) PgC yr −1 during 2005 to 2015. Interestingly the deconvolution of the atmospheric and fossil CO 2 records suggests a recent acceleration in the trend of the net biome production (Fig. 11); this acceleration may also be biased high. In conclusion, con-sidering plastic and asphalt products brings the most recent trends in the net biome production from the deconvolution versus the LPX model in better agreement, while estimates of net biome production and the terrestrial sinks are hardly affected before 2000 CE.
The release of both spatially and temporally resolved carbon flux observations by using remote sensing, such as the Carbon Monitoring System Flux Pilot (CMS) project, featuring not only net fluxes but also gross production and respiration, which is a very promising candidate for constraining the parameter space further. The spatial structure might restrict the apparent degree of freedom in partitioning the terrestrial sink in E LUC and residual land carbon sink. δ 13 C isotope measurements in vegetation also have the potential to be a useful additional constraint in land biosphere models (Keller et al., 2017).
Another avenue of increasing model performance is to introduce spatially explicit parametrization, as used in multimodel averaging studies (Exbrayat et al., 2018;Schwalm et al., 2015). A caveat of using this approach with a single model is a potential overfitting of the parameters.
The simultaneous assimilation of multiple observational constraints allowed us to formulate a well-rounded bestguess version of the model. While this parameter version does not necessarily excel at every single benchmark, it shows a consistent performance amongst all different targets. This behavior leads us to believe that the best-guess version is well suited for simulations spanning long time spans, both for paleo-and future research questions, where the use of a full parameter ensemble is not feasible. Furthermore, it can also be used in model intercomparison studies, where single realizations of different models are compared.

Conclusions
We successfully applied a multi-purpose model benchmark to a perturbed parameter ensemble of a dynamic global vegetation model (DGVM). Specifically, we developed a bestguess model version and constrained the residual carbon sink flux and carbon emissions from anthropogenic land use (E LUC ) over the industrial period. The general characteristics of the framework are as follows. (i) The framework permits a standardized model benchmarking Kelley et al., 2013;Luo et al., 2012;Blyth et al., 2011) by comparing different models or model versions graphically and using statistical metrics (Stow et al., 2009) to a broad and diverse range of observations. (ii) The efficient Latin hypercube sampling method (McKay et al., 1979) is used to explore the model parameter space and to set up and run perturbed parameter ensembles for a large set of model parameters. The advantage of the Latin hypercube sampling is the representative sampling of different parameter combinations, whereas a shortcoming is that the sampling size has to be determined in advance. (iii) A hierarchical model weight-ing scheme is used to assimilate diverse observations. These may differ with respect to spatial and temporal resolution and quality and include observations from the local scale, such as data from individual biomass measurements or the seasonal CO 2 cycle at individual atmospheric sampling sites, up to global-scale gridded data products such as satellite measurements of absorbed radiation by plants. A major advantage of this scheme compared to sequential assimilation techniques such as ensemble Kalman filters is that the influence of necessarily subjective choices (Rougier, 2007) on the results can be investigated a posteriori -in other words without performing costly additional simulations. The subjective choices may be of scientific nature such as whether an observational data set is considered or not or of a more technical nature such as whether gridded data values are weighted by grid cell area or not. (iv) The applied modular framework is easily extendable to incorporate different or more observational constraints and to different mechanistic models including other DGVMs, ocean models (Battaglia et al., 2016) or Earth system models Steinacher and Joos, 2016)). (v) The Bayesian, skill score weighted ensemble is able to constrain the median and uncertainty ranges of unknown or uncertain quantities such as carbon emissions from anthropogenic land use, marine nitrous oxide production (Battaglia and Joos, 2018) or climate sensitivity metrics . (vi) Finally, the skill score weighted ensemble is suitable for probabilistic projections including both likely and less likely model configurations and assumptions.
A new reference version of the LPX-Bern (v1.4) DGVM was established. We were able to show that the constrained ensemble, as well as a resulting best-guess version, performs consistently well under a range of benchmarks (Table 2) while satisfying a minimum skill criterion in every single benchmark. The new model version LPX-Bern v1.4 successfully simulates observation-based estimates of the cumulative net land uptake and release over the industrial period.
Many previous studies have investigated inherent uncertainties in E LUC estimates (Houghton et al., 2012;Goll et al., 2015;Peng et al., 2017). Our study aims to contribute to this ongoing discussion by providing DGVM E LUC uncertainty estimates purely due to parameter uncertainty in an observationally constrained model ensemble using the LUH2 v2h (Hurtt et al., 2018) product. Overall, the benchmarking scheme favors runs with low emissions due to a relatively low residual sink sensitivity in the model and constraining total land-atmosphere fluxes. We consider model ensembles with and without additional land-use processes (shifting cultivation and wood harvest) and find that the difference in global E LUC is on the same order of magnitude as parameterinduced uncertainty. The inclusion of shifting cultivation and wood harvesting increases emissions similar in magnitude to earlier studies (Stocker et al., 2014;Shevliakova et al., 2009) when applying the same model parameters, while in some cases these additional emissions could potentially even be offset with appropriate parameter choice. We attributed the fluxes to different countries and more closely investigated the 10 countries with the most emissions in the industrial period due to land-use and land-use change. Our land-use carbon emission estimates are similar to those of Houghton and Nassikas (2017) on the country level and overall consistent with other independent estimates on regional to global levels Le Quéré et al., 2016).
The observation-constrained DGVM ensemble and bestguess version established in this work are ready for use in model intercomparison studies (Tian et al., 2018;Sitch et al., 2015) and longer time span paleo-simulations. It may also be applied to quantify future terrestrial carbon fluxes and E LUC for different shared socio-economic pathways. Additional new observational data streams may be implemented in our modular framework to further refine results.
Data availability. Model output is available upon request to the corresponding author (lienert@climate.unibe.ch).