The Climate Dependence of the Terrestrial Carbon Cycle, including Parameter and Structural Uncertainties

The feedback between climate and the terrestrial carbon cycle will be a key determinant of the dynamics of the Earth System (the thin layer that contains and supports life) over the coming decades and centuries. However, Earth System Model projections of the terrestrial carbon-balance vary widely over these timescales. This is largely due to differences in their terrestrial carbon cycle models. A major goal in biogeosciences is therefore to improve understanding of the terrestrial carbon cycle to enable better constrained projections. Utilising empirical data to constrain and assess component processes in terrestrial carbon cycle models will be essential to achieving this goal. We used a new model construction method to data-constrain all parameters of all component processes within a global terrestrial carbon model, employing as data constraints a collection of 12 empirical data sets characterising global patterns of carbon stocks and flows. Our goals were to assess the climate dependencies inferred for all component processes, assess whether these were consistent with current knowledge and understanding, assess the importance of different data sets and the model structure for inferring those dependencies, assess the pre-dictive accuracy of the model and ultimately to identify a methodology by which alternative component models could be compared within the same framework in the future. Although formulated as differential equations describing carbon fluxes through plant and soil pools, the model was fitted assuming the carbon pools were in states of dynamic equilibrium (input rates equal output rates). Thus, the parame-terised model is of the equilibrium terrestrial carbon cycle. All but 2 of the 12 component processes to the model were inferred to have strong climate dependencies, although it was not possible to data-constrain all parameters, indicating some potentially redundant details. Similar climate dependencies were obtained for most processes, whether inferred individually from their corresponding data sets or using the full terrestrial carbon model and all available data sets, indicating a strong overall consistency in the information provided by different data sets under the assumed model formulation. A notable exception was plant mortality, in which qualitatively different climate dependencies were inferred depending on the model formulation and data sets used, highlighting this component as the major structural uncertainty in the model. All but two component processes predicted empirical data better than a null model in which no climate dependency was assumed. Equilibrium plant carbon was predicted especially well (explaining around 70 % of the variation in the withheld …


Introduction
Whilst models of the Earth System (the thin layer that contains and supports life) have evolved in response to improvements in our understanding of different processes (Randall et al., 2007), wide differences in the predictions of different models still greatly limit decision making about how best to adapt to climate change (Cox and Stephenson, 2007;Kerr, 2011;Maslin and Austin, 2012).Improvements in characterising the consistency of models with empirical data and with each other, in terms of predictive accuracy and in terms of inbuilt assumptions, would improve clarity about why model predictions differ and hopefully enable critical improvements to be made that improve confidence in predictions.
Models of the terrestrial carbon cycle are one of the Earth System Model components most critically in need of improvement.The terrestrial carbon cycle has major effects on the dynamics of the Earth System over decadal or longer timescales (Denman et al., 2007), and, whilst this means that terrestrial vegetation currently accounts for approximately 60 % of the total annual flux in atmospheric carbon dioxide and absorbs around a quarter of anthropogenic carbon dioxide emissions (Denman et al., 2007), there is great uncertainty about how this balance will change in the future (Cramer et al., 2001;Friedlingstein et al., 2006;Denman et al., 2007;Sitch et al., 2008).This uncertainty is largely because models exhibit wide differences in their predictive accuracy (Keenan et al., 2012) and lead to widely diverging and inconsistent projections (Friedlingstein et al., 2006).
Resolving the problem of the differences in carbon model predictions is a major research challenge.Traditionally, carbon models have not been developed in a way that enables detailed intercomparisons to assess why their predictions differ.Their component processes and their parameterisations have been based on contemporary understanding but have not explicitly (quantitatively) incorporated confidence in how that understanding is based on empirical data.Further research has typically added more details to these models but has rarely gone back and characterised the consistency of the initial assumptions, or the overall model, with empirical data.Recent work has shown that explicitly constraining parameters of terrestrial carbon models with empirical data can lead to better understanding of uncertainty in their parameterisations and of the importance of that uncertainty for predictions (Knorr and Heimann, 2001;Scholze et al., 2007;Zhou and Luo, 2008;Rayner et al., 2011;Ricciuto et al., 2011).Recent systematic comparisons of alternative carbon models or their components have also shown how differences and inconsistencies between different models can be identified more precisely (Keenan et al., 2012;van Oijen et al., 2011;Randerson et al., 2009;Kloster et al., 2010).These analyses have been facilitated by the increased availability of more varied and detailed data sets on terrestrial carbon stocks and fluxes from around the globe.
Delivering better constrained projections of terrestrial carbon cycle dynamics could soon be achieved in light of these recent advances.However, delivering that goal is going to require improved methodologies for the construction, parameterisation and evaluation of terrestrial carbon cycle models, which enable the detailed analyses of the consistency of different model components and their parameterisations with empirical data.Here we develop such a methodology and implement it to fully decompose all component processes of a global terrestrial carbon cycle model in terms of their parameter uncertainty and the accuracy of their predictions with respect to different empirical data sets.Specifically, our goals were to (i) assess the degree of empirical support for simple functional representations of component processes of the carbon cycle, when assessed within a model of how the overall system is connected; (ii) to assess whether the inferred relationships are consistent with current understanding; and (iii) to define a methodology by which we can build on from this model to identify the appropriate balance of details for making better constrained probabilistic projections of the carbon cycle into the future.

Carbon stocks and fluxes
Our empirical data primarily came from field-data collation initiatives that targeted an individual terrestrial carbon stock or flow.These are summarised in Table 1.These data sets were selected on the basis that they (i) were informative about the stocks and fluxes of carbon in natural terrestrial vegetation, (ii) contained at least some data that were representative of vegetation in a state of dynamical equilibrium (selected via a filtering process; see Appendix A) (iii) could be used as information to constrain parameters in our model, (iv) had approximately global coverage, (v) could have single latitude and longitude coordinates assigned each a site-based estimate to enable cross-referencing to spatial climate data, and (vi) could be easily accessed and shared alongside our study to enable reproducibility, investigations of data processing steps, investigations into the importance of the selected data, and controlled comparisons of alternative models.Full details of how all of the empirical data sets were processed are given in Appendix A. Some of these data sets have been superseded by more recent data sets (e.g.our fire data set could now be replaced by the Global Fire Emissions Database (GFED) of Giglio et al., 2010).We hope to incorporate these improved data sets in the future.

Environmental data
All model components incorporated information from sitespecific environmental variables to make predictions, either directly (e.g. the effect of temperature on net primary productivity) or indirectly by requiring input from another component model that itself required environmental data (e.g. the allocation to woody plant parts depends on net primary productivity).All environmental variables were calculated using data contained in either or both of the New et al. (2002) gridded monthly climate data set and the Global Soil Data Task Group (2000) "Global Data Set of Derived Soil Properties" data set.These have spatial resolutions of 10 arcmin and 0.5 decimal degrees, respectively.Full details of how the different environmental variables were calculated are given in Appendix B. Again, we hope these data sets will be upgraded in the future.

Full structure
We developed a terrestrial carbon model as a set of six carbon pools, connected by various flows (illustrated in Fig. 1).Mathematically, this corresponds to a series of six ordinary differential equations (one for each pool), each with three general components: an input rate (e.g.carbon fixation rate); an output rate (e.g.leaf mortality); and a carbon content (e.g.leaf carbon).The chosen level of complexity was biased by our ability to identify global data on at least two out of these three for each carbon pool, so that it would be possible to infer properties of the third.The full terrestrial carbon model is then expressed as follows: where C l , C r , C s , C m , C a , and C b are the amounts of organic carbon stored (kg m −2 ) in leaves, fine roots, structural plant parts, metabolic fraction of the soil, structural fraction of the soil and recalcitrant fraction of the soil (these are defined below); t is time in years (yr); symbols marked with a box are functions that are described below and fully defined in Appendix C; f max is the maximum fraction of net primary productivity allocated to structural plant parts; S f scales the fire-induced mortality rate of structural plant parts relative to that of leaves and fine roots (inferred in this study); k m , k a and k b are the maximum loss rates of the metabolic, structural and recalcitrant soil fractions (yr −1 ; all inferred); and F 1 is the fraction of the structural carbon pool that does not decompose directly to carbon dioxide but enters the recalcitrant pool (unitless; inferred).The terms in boxes in Eq. ( 1) are functions that apply at a given location and given time, which depend on the physical environment at that location and time.Our principal aim was to infer the environmental dependence for each of these functions.However, we also considered, for each component, a null model with no climate dependency (i.e. a constant applying to all locations and times: see below).We do not suggest that these are the "best" representations of the model component functions (boxes in Eq. 1).Rather, we identified reasonable functional forms from the literature, which could be used simply to infer climate-dependent relationships.G determines the plant carbon fixation rate (kg m −2 yr −1 ), based on the MIAMI model of Leith (1975).f s scales the fraction of carbon allocated to structural plant parts over leaves and roots (unitless) and is new to the literature.The mortality rates of leaves (yr −1 ), µ l , is predicted from a new model based on the recent analysis by van Ommen Kloeke et al. (2011) of global patterns of leaf lifespan.The mortality rates of fine roots, µ r , whole plants µ s (yr −1 ), and the fraction of organic carbon in dead leaves and fine roots that enters the metabolic soil organic carbon pool f m (unitless) are based on simple linear functions.Carbon loss rate due to fire µ f (yr −1 ) is based on the models of Thonicke et al. (2001), Kloster et al. (2010) and Arora and Boer (2005).A scales the decomposition rate of organic soil carbon based on classic climate-dependent relationships (Ise and Moorcroft, 2008;Schimel et al., 1996;Bolker et al., 1998;Adair et al., 2008).
When inferring the parameters, we further assume that all carbon pools (and thus all stocks) at the specific locations and times for the empirical data have reached equilibrium (the empirical data were filtered as described in Appendix A).Equation (1) then reduces to simple expressions for the equilibrium carbon contents of plant and soil carbon pools (omitted for brevity).

Computational framework
We built a computational framework to enable the assembly, parameterisation and assessment of multi-component models of arbitrary complexity to enable our study to be conducted (illustrated in Fig. 2).The framework, model, and derivative data necessary to reproduce the results of this paper, as well as a user's guide, are available from research.microsoft.com/en-us/downloads/.The data resulting from conducting the analyses described in this study are available from research.microsoft.com/en-us/downloads/.

Data partitioning into training, evaluation and final test sets
The data sets were partitioned into training, evaluation and final test sets to avoid including parameters only because they help explain fluctuations specific to a particular data set, instead of the general phenomenon being inferred from the data   set (over-fitting).Over-fitting can still occur when adopting this approach if model refinement goes through many iterations and the same training and evaluation sets are used.We therefore first removed a fraction of each data set to be used as a final step to assess the performance of our models (the "final test data").This allows us to test our models against data that played absolutely no role in the model refinement process.We constructed a land surface mask by randomly positioning 0.5 degree squares over the terrestrial land surface until approximately 25 % of the terrestrial land surface had been covered.Any data that fell under this mask were removed permanently as final test data.We performed 10fold cross-validation within our model parameter inference experiments on the data remaining after the removal of the final test data.

Parameter inference
We used a Bayesian approach to infer the probability distributions for the model parameters given our empirical data sets.For every model we used flat (or "uninformative") prior probability distributions for the parameter values.We assume that the probability of observing the data under all possible hypotheses is 1 (the marginal probability).We also made no attempt in this study to incorporate or infer errors and biases associated with the observational data.This is potentially an important assumption, and accounting for these errors will be enabled in the future for empirical data sets that contain estimates of observational uncertainty Under these assumptions, estimating the probability distributions of the parameters reduces to requiring the estimation of their likelihood, given the observed carbon data and environmental conditions.This required us to specify a likelihood function for each model component, which defines the probability of the data, given any combination of parameters.With this function defined, the posterior probability of each parameter could be estimated.Formally, we assume that where bold text denotes a vector.In words, the likelihood L of the predictions Pred of the parameterised model given the observations, Obs, is proportional to the probability P , of the observations given the particular model predictions.The predictions arise from a particular model with parameters Pars and set of environmental conditions Env.We used the Filzbach set of code libraries to find the posterior distributions for the parameters of a given model, given the observed carbon data (Obs) and environmental conditions (Env; http://research.microsoft.com/en-us/um/cambridge/groups/science/tools/filzbach/filzbach.htm).Filzbach implements Markov chain Monte Carlo sampling of parameter space given a set of parameters to be varied (Pars) and likelihood function (L; Gilks et al., 1996).It uses the Metropolis-Hastings algorithm to accept or reject sets of parameter values when compared to the likelihood associated with the parameter values of the previous iteration of the Markov chain (Gilks et al., 1996).In our study, the likelihood function used by Filzbach may depend on the likelihoods associated with several sub-component models, depending on the model parameter inference experiment being run (outlined below, and see Fig. 3 for details).The specific likelihood function chosen to assess each model component against its corresponding empirical data set is detailed in Table 1.
The data sets used contain different relative frequencies of data for different climate regions of the world.We downweighted the log-likelihoods assigned to data points in direct proportion to the relative frequency of data in their respective Holdridge life zones (Holdridge, 1967).This avoids biasing the model parameter inference procedures towards parameter values that predict well those regions of the world that are most frequently represented in the data.

Parameter inference experiments
We investigated the sensitivity of the inferred model functional forms and model performance metrics to using different combinations of model components and data sets by conducting three different parameter inference experimental protocols described below.

Build-up experiments
We inferred parameters to each of the component models indicated in Fig. 3 alongside those of the models on which they depend to make predictions.We refer to these experiments as "build-up", because we started by inferring the parameters to the group 1 models individually and then inferred those to the group 2 models (alongside those of the NPP model), before inferring parameters to each of the group 3 models (and those of the sub-components on which they depend), incrementally working towards inferring all the parameters in all components of the terrestrial carbon model simultaneously (the 12th experiment).
Page 47 of 54 do not require predictions from other models to predict their accompanying data sets.
Group 2 models require predictions from the net primary productivity model.Group 3 models require input from several model components.

Fig. 3.
The full terrestrial carbon model can be represented as a factor graph.All boxes represent model components with accompanying data.Arrows connect a model that acts as a subcomponent (tail of arrow) to another model (head of arrow).Models within group 1 do not require predictions from other models to predict their accompanying data sets.Group 2 models require predictions from the net primary productivity model.Group 3 models require input from several model components.

Omit-data experiments
We sequentially omitted an entire data set associated with each model component in Fig. 3 prior to inferring the parameters of the full model.This enables investigation of how important the information contained in a given data set is for the inferred parameter probability distributions.Each evaluation step also included a fold of the omitted data sets.However, this approach does not allow us to estimate process error associated with a given model in the absence of its associated empirical data.We therefore used the posterior parameter estimates of process error obtained from inferring the parameters of the full model with all empirical data sets when performing model evaluation.

Replace-null experiments
We sequentially replaced a model component in Fig. 3 with a null model consisting of an inferred constant and associated error.This allows us to investigate how important the climate dependency of a particular model is for the predictive performance of other components and their inferred parameter values.

Assessing predictive performance
In general, we calculated performance metrics for each sample of parameter values from the Markov chain after the burnin procedure.The burn-in period was always 20 000 iterations, and the Markov chain length used for parameter sampling was always 120 000 iterations and was subsampled every 100 iterations.This enabled us to also take mean, median and 95th percentiles of various performance metrics over the set of sampled parameter values.

Full model
Overall we infer climatically varying functions for all component processes to our terrestrial carbon cycle model when fitted using all available data sets (red functional relationships in Fig. 1).The inferred climate dependencies of net primary productivity (NPP), increasing but saturating functions of temperature and precipitation (Fig. 1a, b), are consistent with what was established for the classic MIAMI model (Leith, 1975).The proportion of fixed carbon allocated to wood (versus and leaves and fine roots) varies continuously as a sigmoid function of NPP and increases to around 0.35 for the most productive locations (Fig. 1c).The four processes then determining carbon loss rates have contrasting climate dependencies (Fig. 1d-h).Fire increases with dry season length (combustibility) and NPP (fuel), as expected (Fig. 1i, j; Kloster at al., 2010).Contrasting dependencies of evergreen and deciduous leaf mortality (Fig. 1e, f) are inferred, with the relative frequency of evergreen versus deciduous leaves being U-shaped against annual frost frequency (Fig. 1d).This highlights the relatively complex climate dependence of leaf lifespan globally (van Ommen Kloeke et al., 2011).Fine root mortality rate is inferred to increase with mean annual temperature as expected (Fig. 1g; Gill and Jackson, 2000).A relatively flat relationship for the climate dependency of plant mortality is inferred (Fig. 1h); however, this actually results from contradictory information from different data sets under our assumed model formulation (details below).We infer no strong climate dependency in the fraction of dead leaves and roots initially allocated between the different soil pools (Fig. 1k).For the soil component, we infer temperature and moisture dependencies of the classical three-pool soil model that are consistent with previous findings (Fig. 1l, m;Ise and Moorcroft, 2008).
The lack of a relationship for plant mortality was unexpected, because a previous study (Stephenson and van Mantgem, 2005), using the same empirical data on plant mortality rates, identified a positive relationship between mortality rates and productivity -a close correlate of evapotranspiration rates (Leith, 1975).Further analysis reveals this inconsistency to be due to differences in the information implied model (Gelfand and Day, 1994).Separate data subsets were used for parameter inference by different empirical data sets under our assumed model formulation.We infer qualitatively different climate dependence from the plant mortality data alone (a positive relationship, Fig. 1h (grey), as found by Stephenson and van Mantgem, 2005), from all model components and empirical data sets together (a flat relationship, Fig. 1h (red)), or using all model components but omitting individual data sets (omitting plant mortality data gives a negative relationship, Fig. 1h (blue), omitting the plant carbon data gives a positive relationship, similar to Fig. 1h (grey)).These results indicate a clear discrepancy between the information on the climate dependency of plant mortality implied by the mortality data itself and that implied by the other plant carbon data sets under the assumed model formulation.On this basis we identify global plant mortality rates as a major "structural uncertainty" in our terrestrial carbon model.Other than the non-climatically dependent functions, the climate dependencies inferred for the full terrestrial carbon model tend to make predictions that both are significantly positively correlated with the evaluation data sets (Fig. 4a), and tend to explain a positive fraction of the variation within each data set (Fig. 4b).The plant carbon data set is predicted particularly well (final test data median Pearson's r = 0.84 (5 % and 95 % confidence intervals = 0.81, 0.86), coefficient of determination = 0.70 (0.63, 0.77)), as are data on litter production rate, plant carbon fixation rate and the fraction of carbon allocated to structural plant parts, for all of which the model always explains a positive fraction of the varia-tion in the data at a 95 % confidence level (Fig. 4b).Comparing the performance of the full model to one in which the relevant model component is replaced with a null model supports choosing the climate-dependent model for all processes (Fig. 4c) except for the two component processes for which we inferred no climate dependencies.
When applied at global scales, the terrestrial carbon model predicts global patterns of equilibrium plant and soil carbon that match the known patterns well (Fig. 5a, b; calculated using the New et al., 2002;and Batjes, 2000 gridded data sets for environmental variables).We estimate that the absolute uncertainty is positively related to the median (Fig. 5c, d) for both carbon pools.For plant carbon, relative uncertainty (absolute uncertainty/median prediction) tends to be higher in areas in which the model predicts vegetation that is intermediate between being maximally woody and completely non-woody (Fig. 5e), owing to the contrasting mortality rates of these different carbon pools (Fig. 1e-h).Relative uncertainty is also higher over Greenland, and we expect this is because of the inflation of uncertainty under extrapolation: the extreme environments in that region are out of the ranges represented in our data (Fig. 5e).We believe a similar phenomenon explains the relatively higher uncertainty in predictions of soil carbon in extremely dry or cold environments (Fig. 5f).
The inferred climate dependencies in the terrestrial carbon model, other than the plant mortality function, generally support those that have been found in previous studies, indicating their consistency when considered as part of the overall system.Although the qualitative nature of many of these dependencies is consistent with previous knowledge, our estimates of parameter values for these functions (and the associated uncertainty in those parameter estimates) are new.Our analysis also highlighted a number of new insights into the performance of these climate-dependent functions as detailed below.
Although the inferred climate dependencies of NPP (Fig. 1a, b) are consistent with the MIAMI model (Leith, 1975), one parameter is not-well constrained by the data: the parameter controlling NPP at zero degrees Celcius (t 1 in Eq.C1b) does not converge.Instead, the sigmoid response of NPP to temperature is constrained by the parameter controlling the gradient of the temperature dependency, t 2 .This implies that the temperature-dependent function may actually be over-complex for our purposes.Further investigations (omitted for brevity) involving removing this parameter and refitting the model reveal that this and other underconstrained parameters discussed below have little effect on model predictive accuracy.Plots of predictions versus observations also reveal a noisy positive relationship, with the model underestimating NPP for sites with NPP greater than around 1.0 kg m −2 yr −1 -a property that has been noted previously for the MIAMI model (Fig. 6a; Friedlingstein et al., 1992;Dai and Fung, 1993).The shape of the function controlling the proportion of fixed carbon allocated to structural components implies that structural tissue only makes up around 10 % of vegetation carbon at NPP values of around 0.5 kg m −2 yr −1 (Fig. 1c).We expect this probably underestimates structural carbon in vegetation types dominated by low-productive woody vegetation such as some boreal forests (Kicklighter et al., 1999), although we have not verified this.
The wide confidence intervals in the function predicting the fraction of vegetation leaves that are evergreen imply relatively high uncertainty, and inspection of the relationship between predictions and observations makes clear why this is the case, with several observations lying far from the 1 : 1 line (Fig. 6b).Despite this, the correlation between observations and predictions is relatively high (Fig. 4a), probably due to the dominance of sites in the data set that are either entirely evergreen (44 %) or entirely deciduous (14 %).We anticipate that the low quantity of data in the data sets on leaf characteristics (the most sparse data in our collection; Table 1) strongly influences the variation in model predictive performance for those climate dependencies.
Our inference of the climate dependencies of fire frequency globally again highlights some redundant model parameters: the scaling constant c f and the two half saturation constants, lfs halfsat and NPP halfsat , are poorly constrained.This implies that the model could be reformulated with fewer parameters and still predict the data with the same accuracy.Although the correlation between predictions and observations is relatively strong for this data set (Fig. 4a, b), visual inspection of observational data versus predictions implies a lower predictive performance at low fire return intervals (fraction of burned per year) and a tendency to underestimate the fire return interval overall (Fig. 6g).
Although plant carbon is predicted well, the plots of predictions versus observations indicate some notable outliers at low carbon contents, where carbon is predicted to be much higher that observed for some sites (Fig. 6h).These sites appear to be associated with tropical vegetation that has been classified as grasslands and shrublands in the global land cover map and have been assigned low carbon content (data omitted for brevity), in contrast to the predictions of the model.
For the soil model we could, again, not fully constrain all model parameters.The parameter downscaling soil decomposition rates as a function of extremely high temperatures (ts c in Eq.C8a) and the parameters controlling the optimum moisture content for decomposition and downscaling parameter of decomposition rates in extremely wet conditions (ms c and m thresh , in Eq.C8c) are probably all poorly constrained, a result of lacking sufficient data representing such extreme environments.

Build-up experiments
The build-up experiments show that the performance of some model components changes as they become part of larger model structures (Fig. 7).The major result from these experiments is the qualitative change in the inferred climate dependency of plant mortality upon being connected to the full model, as mentioned above (Fig. 1k).The consequent change in model predictions is clearly seen in the plots of predictions versus observed data in which a noisy positive relationship between predictions and observations is apparent when the plant mortality model is fitted to the plant mortality data alone, whereas a relatively flat relationship is observed for the model fitted as part of the full model structure (Fig. 6k, l).
The net primary productivity model (NPP) improves in predictive performance when it is connected to the full model, showing higher correlation coefficients and an improved fit to the training data (lower deviance information criterion values; Fig. 7).Lower uncertainties in NPP functional forms are also visible for the full model compared to when the model is fitted to the NPP data alone (grey versus red in Fig. 1a, b).This implies that information from other model components helps to further constrain the climate dependence of NPP.The component predicting the fraction of plant material that is woody exhibits a poorer fit to the training data as it is connected to other model components but still slightly increases in predictive performance (Fig. 7).Connecting this component to plant carbon also enables the parameter controlling the maximum carbon allocation fractions to wood to be inferred (grey versus red in Fig. 1c).Connecting the model predicting litter production to the full model structure improves its fit to the training data, whereas the opposite is the case for the model predicting fraction of land area burned (Fig. 7).However, none of these effects significantly alter the correlation between predictions and observations in the evaluation data, and they only cause minor reductions in the confidence intervals of the functional forms of the fire model (grey versus red in Fig. 1i, j).

Omit-data experiments
The most notable effects of omitting data sets when inferring parameters to the full models were on the parameters of the plant mortality climate dependency, as reported above.Otherwise, omitting individual data sets when fitting the full model does not dramatically influence the performance of the model at predicting the data sets that had not been removed (we omit details for brevity).Constraining the parameters of some components is entirely dependent on the presence of their corresponding data set.This is the case for the fraction of leaves that are evergreen, the mortality rates of evergreen and deciduous leaves, fine root mortality rates and soil decomposition rates and indicates that the prediction of other connected components, such as plant carbon for example, is not dependent on the predictive accuracy of those components.In contrast, omitting the data for the NPP model still results in the inference of very similar parameter values, indicating a strong dependency on those parameter values for predicting other empirical data sets.

Replace-null experiments
In general, and as expected, replacing a component with a model predicting no climate dependency strongly influences the predictive performance of that component in the cases where evidence exists for a climate dependency (Fig. 8).For some components this results in negligible effects on the predictive performance of the rest of the model.However, for the NPP model, replacement strongly influences the predictive performance of other components.This is not surprising, given that it is the initial carbon input term for the plant carbon pools.

The degree of empirical support for the component processes
One of our key aims was to establish a baseline terrestrial carbon model to support future work in which we had assessed the empirical support for every component process, including clearly characterising and incorporating parameter uncertainties.Our new methodology has clearly achieved that, and it also illustrates how the adoption of such a methodology in the future development of terrestrial carbon models could ac-celerate the identification of where their key inconsistencies lie, both with each other and with empirical data.We adopted a relatively simple model here, and we hypothesise that it is probably too simple and uncertain to provide informative projections of future terrestrial carbon dynamics.For example, while we adopted the MIAMI model as a convenient and well-recognised, simple model of net primary productivity, it has several well-known limitations (e.g.Bonan, 1993)  Skill at predicting the data on the terrestrial carbon cycle at equilibrium, as we have shown here, is no guarantee that the model can accurately capture the temporal dynamics of the carbon cycle that other data assimilation studies have focussed on (Scholze et al., 2007;Ricciuto et al., 2011).For example, our model does not account for carbon dioxide fertilisation effects that are likely to have had, and could continue to have, a major influence on vegetation carbon fixation potential into the future (Friedlingstein et al., 2003).We therefore see a pressing need to build on our rigorous approach and begin the process of comparing alternative formulations of component models, model structures, and different data sets.Such studies would obviously consider alternative representations of canopy photosynthesis (Purves and Pacala, 2008), including not only carbon dioxide fertilisation effects (Friedlingstein et al., 2003), but also the dynamics of other resources such as water and nitrogen (Goll et al., 2012), the importance of different plant functional types, vegetation traits, successional processes and permafrost carbon thawing, to name a few (see Arneth et al., 2012; for a detailed discussion).Although we developed our model within a generalized framework so that it can become a foundation upon which future intercomparisons can be conducted (see Sect. 5.1), it will require further improvements to truly enable detailed intercomparisons of state-of-the-art terrestrial carbon models.Enabling the inference of the parameters of dynamical models using time-varying data, and the quantification, assessment and propagation of errors in the observational data are two key necessary refinements to enable such studies.
Future studies should also investigate the effects of incorporating more and improved data sets for data assimilation and model testing than we did here.We deliberately selected data sets that were relatively easy to obtain, process and share.It is reassuring that these were sufficient to infer most of the known climate dependencies.However, further constraining and testing the more detailed terrestrial carbon models will require new data sets that document temporal dynamics and, more generally, whichever data have the greatest potential to minimise uncertainty in model predictions.Here we made no use of the valuable FLUXNET (http://fluxnet.ornl.gov/)data sets on water and carbon fluxes nor the FPAR data (http://modis.gsfc.nasa.gov/data/dataprod/dataproducts.php?MOD NUMBER=15).We expect that incorporating such data could assist greatly in constraining and comparing any dynamical models developed in the future.There have already been developments in this direction, especially in constraining terrestrial carbon models with carbon flux data (Knorr and Heimann, 2001;Kaminski et al., 2012;Scholze et al., 2007;Ricciuto et al., 2011).Future developments should build from these studies to enable more detailed processbased models to be data-constrained within our framework, although it will be important to take a balanced approach to refining components and not, for example, focus on refining the representation of photosynthesis if it occurs at the expense of refining other components, such as mortality, which has received much less attention to date.
While we demonstrated here how multiple component processes can be data-constrained using multiple data sets, there is obviously a question about how to assess models when they are projected under future scenarios, for which no observational data can exist.This problem can never be solved fully, and in principle vegetation could exhibit changes, or suffer perturbations, that have never been observed, either naturally or under artificial perturbations.Our approach offers the promise of being able to identify model components for which improved understanding would be most important, by decomposing changes in the output variables of interest (e.g.soil carbon) to changes in different processes (e.g. the effects of extreme temperatures on soil decomposition rates), and then finding out which data sets are most important for improving those components.Under our framework the results of key experiments, such as manipulative experiments, or of other passive observations could then be included as additional data constraints.

Inferred relationships in relation to current understanding
The climate dependencies inferred here mostly confirm those that have been identified previously.However, the identification of qualitatively different climate dependencies for plant mortality depending on the model formulation and empirical data used, as well as some other more subtle adjustments to other climate dependencies, highlights the value of the systemic approach: it enables us to assess how consistent our model of how the overall system functions is with empirical evidence and identify where discrepancies lie.
At present, we do not understand the reason for the structural uncertainty in the climate dependency of plant mortality rates.As a group, the plant carbon data sets imply that mortality rates decrease with actual evapotranspiration under our assumed overall model formulation, whereas the plant mortality data imply the opposite.Perhaps the high plant mortality rates observed in highly productive sites are inconsistent with the high carbon values recorded for those sites under the inferred climate dependencies.This could have caused the systematic underestimation of net primary productivity in highly productive sites (Fig 6a), leading to lower than expected predictions for plant carbon, which in turn could cause the inferred plant mortality rates for those sites to be higher than observed.There are however other possibilities, such as issues to do with the quality of the plant mortality data or the other plant carbon data sets, the possibility of plant mortality patterns obeying more complex relationships than represented by our mortality component, or missing processes in the model (such as the separation of structural carbon losses between whole plant mortality and the loss of other structural pars such as branches).These possibilities add to recent calls to increase understanding of the role of plant mortality in the global carbon cycle (Stephenson and van Matgem, 2005;van Mantgem et al., 2009;Allen et al., 2010).

Building a model for future predictions
We do not know how accurate predictions of temporal dynamics of the carbon cycle from our terrestrial carbon model might be, because we have done no assessments of our model in predicting carbon dynamics.Nonetheless, we can use it to illustrate the concept of using a data-constrained model to inspect the relative importance of parameter and structural uncertainty in projections.For example, it could be that, despite the uncertainties identified, the model makes relatively wellconstrained projections for some features of the carbon cycle, or it could be that the uncertainties tend to combine to give extremely uncertain dynamics.To illustrate this we set the plant and soil carbon pools across the terrestrial land surface to equilibrium in the year 2000 (as in Fig. 5 but at 0.5 decimal degree global resolution), and simulated climate change under pessimistic (A1F1) and optimistic (B1) anthropogenic emissions scenarios using our model to explore the plausible importance of these uncertainties (The IPCC Data Distri-bution Centre; Lowe, 2005; http://www.mad.zmaw.de/IPCCDDC/html/SRES AR4/index.html;see Appendix D for details).We decided to set the land surface to equilibrium carbon levels to ensure that any changes in the carbon balance were solely due to climate changes, and the climate scenarios were chosen to create two extremes for comparison.
The resulting projected changes in the total terrestrial carbon balance are similar under both climate forcing scenarios, both predicting a net carbon sink up to about 2150 before becoming a carbon source (Fig. 9).However, the time  courses of uncertainty in the projections are quite different, with uncertainty higher under the A1F1 scenario and showing that the terrestrial carbon cycle could be either a carbon source or a sink over the simulated time period (note that these are not intended as reliable projections).Substituting the alternative models of plant mortality inferred under the different model fitting experiments indicates that this structural uncertainty has only minor quantitative effects on the projected change in the terrestrial carbon balance.Remarkably, however, this structural uncertainty leads to qualitatively different predictions for the projected changes in total vegetation carbon up to 2050 (Fig. 9), with positive, relatively flat, or negative changes in global vegetation carbon, depending on the mortality model parameterisation chosen.Despite these differences, all simulations predict that vegetation becomes a net source by 2020.Although these are exploratory simulations, they do emphasise the potential importance and value of considering parameter and structural uncertainties in terrestrial carbon models when attributing confidence to projections of the terrestrial carbon cycle in Earth System Models.Such uncertainty could be decomposed further into the contributions from different component processes and even individual parameters and data.We could also begin such simulations with vegetation out of equilibrium.Such diagnoses are likely to help identify the different sources of uncertainty in predictions and projections, enabling the most important sources of uncertainty to be targeted for reduction.
Overall, our results complement the progress that has been made in data-constraining terrestrial carbon models (Rayner et al, 2011;Knorr and Heimann, 2001;Scholze et al., 2007;Ricciuto et al., 2011;Zahele et al., 2011) and the development of frameworks to enable such studies to be performed in a repeatable fashion (Scholze et al., 2007; http: //pecanproject.org/).We hope that, by combining these insights, the biogeosciences community can rapidly move towards identifying the best models for specific purposes, that is, those models which have been shown to be most consistent with the available empirical evidence, and which incorporate uncertainties in model structure and parameter values into their predictions.

Appendix A Processing of empirical data on carbon stocks and fluxes
We did not actively seek the "best data" pertaining to individual stocks and fluxes, because we prioritised enabling reproducibility and transparency over obtaining the most accurate results.A number of the data sets were only available as gridded land surface data, constructed through analyses of site-based data but for which the original site data were not available (Table 1).For these we generated pseudo-site based data by stratified random sampling.We anticipate that some of our data sets only coarsely represent true carbon stocks or flows at global scales.Moreover, although not all of the data in all data sets represent vegetation in a state of dynamical equilibrium, we screened the data sets that incorporate disturbance and anthropogenic effects using the Global Land Cover 2000 vegetation classification map (Bartholome and Belward, 2005) to only select data representative of vegetation at equilibrium.

A1 Vegetation carbon
Global site-based data sets of carbon held in natural terrestrial vegetation have been compiled previously and used to produce global gridded maps.Unfortunately, the original source data have not been made generally available.Recently, Ruesch and Gibbs (2008) produced a global biomass carbon map for the year 2000 for the entire global land surface at 1 km resolution (approximately).This was derived from unique estimates of vegetation carbon values for 124 different carbon zones that were defined by considering the www.biogeosciences.net/10/583/2013/Biogeosciences, 10, 583-606, 2013 continent, ecoregion, vegetation type and degree of human disturbance.We used this map to generate pseudo-site based data.We performed stratified random sampling of the Global Land Cover 2000 vegetation classification map (Bartholome and Belward, 2005; which has the same 1 km resolution as the vegetation carbon map) and then for each randomly selected point sampled the corresponding vegetation carbon value, conditional on whether the vegetation classification indicated specific vegetation types.Specifically, this included classification types 1-9 (generally "tree cover"), excluded type 10 ("tree Cover, burnt"), included types 11-15 (generally shrub or herbaceous cover), and excluded the remaining types 16-22 (not natural or not vegetation).

A2 Litter carbon production
Different studies use terms like "litter production" to mean different aspects of vegetation biomass loss (Matthews, 2003).Here, we define "litter carbon production" as the net mass of carbon lost through natural leaf and fine root mortality (not including fire-induced mortality) per unit area of land surface (m 2 ) per year (yr).This does not include woody litter production through the loss of branches, stems or coarse roots.
Leaf litter production is commonly estimated in the field using litter traps (Matthews, 2003).Studies employing this method rarely estimate root litter production.This led us to consider modelling leaf and fine root litter production separately.However, studies estimating leaf litter production sometimes use it to approximate net primary productivity by assuming that the gains and losses of carbon in the vegetation are balanced (Matthews, 2003).This implies that source data on leaf litter production might not be independent of some of our net primary production data (although we did not check to confirm this).These challenges led us to consider alternative methods for estimating litter carbon production.Matthews (2003) used multiple methods to estimate litter production.One of these infers litter production from data on soil respiration rates and root respiration rates, by assuming that carbon stocks and flows are at equilibrium.Neither of these data sets was used in our model, so we used data generated from this method.A further complication however was that Matthews (2003) never made the source data available.Instead the author presented averages for 30 different vegetation types according to the UNESCO vegetation classification system (Matthews, 1999).We therefore generated pseudo-site based data.To do this we performed stratified random sampling of the UNESCO vegetation classification map (which comes in 1 degree resolution), and for each randomly selected point we associated the vegetation type with the litter production value given in column 3D of Table 5 in Matthews (2003).This gives litter dry matter production figures, so we multiplied by 0.5 to approximately convert from dry matter to carbon.

A3 Soil carbon
Global site-based estimates of carbon held in soils under natural terrestrial vegetation have been compiled for decades.We initially considered using the NDP018 data set analysed by Post et al. (1982Post et al. ( , 1985) ) in their studies of global patterns of plant carbon and nitrogen.However, preliminary investigations revealed significant differences in the Holdridge climate classifications (Holdridge, 1967) associated with the sites in the NDP018 data and the classifications we obtained by using georeferenced climate data from the New et al. (2002) data set.This led us to suspect that the GPS coordinates associated with the NDP018 are not sufficiently accurate to enable a sufficiently accurate estimate of the climatic conditions associated with the site.Instead we chose the "Global Gridded Surfaces of Selected Soil Characteristics" (IGBP-DIS) data set produced through the "Global Soil Data Task" project, which was designed specifically to assemble a "reliable and accessible data set on pedosphere properties on a global scale" (Batjes, 2000).We generated pseudo-site based data by performing stratified random sampling of the map of soil-carbon density at depth interval of 0-100 cm (which comes in 5 arcmin resolution).

A4 Net primary productivity
Net primary productivity is estimated as the net mass (kg) of carbon fixed by living vegetation per unit area of land surface (m 2 ) per year (yr).We selected NPP data compiled for the Ecosystem Model Data Intercomparison project (EMDI, Olson et al., 2001).We selected the Class B "intermediate quality" data set, because it provided a relatively high quantity of data (933 unique sites) and represented all of the major vegetation zones of the world.

A5 Mortality rates of deciduous and evergreen leaves, and the fraction of vegetation that is evergreen
We know of no global data sets containing site-based estimates of leaf turnover rate at the whole vegetation stand level.Data from satellite observations are likely to fill this gap in the future.The lead authors of Wright et al. (2004) provided the GLOPNET database, which, according to Wright et al. (2004), "spans 2548 species from 219 families at 175 sites" (approximately 1 % of the extant vascular plant species).This database contains georeferenced data for a variety of species-specific leaf traits, for multiple species at a given site, including an estimate of leaf lifespan and whether the leaf is classified as deciduous or evergreen.In a recent analysis of global patterns of leaf mortality, using the GLOPNET database (Wright et al., 2004) supplemented with additional data, van Ommen Kloeke et al. ( 2011) found clear trends with environmental variables when the deciduous and evergreen leaves were treated separately.As a result we calculated the geometric average mortality rate at each site, separately for deciduous and evergreen leaves.We also calculated the fraction of the records at each site that were of evergreen plants and used this as a coarse approximation of the faction of vegetation at each site that is evergreen.

A6 Fine root mortality rate
Data on root turnover rates are available for most vegetation types, even though estimating root mortality rates is notoriously difficult and prone to error.As is conventional, we distinguished between "coarse roots" and "fine roots", but only explicitly modelled the mortality rate of fine roots (structural roots are implicitly represented in plant structural carbon).We define "fine roots" as those roots whose primary function is to acquire water and nutrients, with little role in structural support or resource storage (Eissenstat and Yanai, 1997).We obtained data from the Appendix of Gill and Jackson (2000) who studied global patterns in root turnover.We included all data from their table with the exception of any entries that had "root type" classified as "coarse".We also corrected three obvious errors in their data set: "Adiopodoum, Ivory Coast" should have the longitude 4 • 30 W; "Portugal" should have longitude 9 • 24 W; and the two entries for "Macquarie Island, Subantarctic" were corrected to have longitude 158 • 57 E.

A7 Plant mortality rate
Any plant matter that is not "leaves" or "fine roots" in our model is classified as "structural".We equated the turnover rate of structural plant parts in the absence of fire to the rate of plant mortality.We could find no data on plant mortality rates at global scales.Instead we used the data compiled by Stephenson and van Mantgem (2005) on forest turnover rates at global scales.This omits data on non-forested vegetation.

A8 Fractional area burned
A number of projects have sought to obtain accurate estimates of patterns of fire frequency at global scales.Mouillot and Field (2005) generated a global fire map of "fractional area burned per year" at 1 • resolution for the terrestrial land surface by synthesizing available data and extrapolating when data were absent.We used this data set to generate pseudo-site based data on the fractional area burned per year by performing stratified random sampling of their map.We then took averages over the same time period as in the New et al. (2002) data set , including only samples from sites that were classified as natural vegetation (codes < 16) in the Global Land Cover 2000 map (Bartholome and Belward, 2005).In our predictive model we equate the fraction of land area burned per year with the rate of plant carbon losses due to fire, although we allow for this to be downscaled for structural plant parts.(2009).Predicted AET (mm) was calculated using the methodology described in Sect.3.2.6 for land grid squares at 0.5 degree resolution.Points are partially transparent to help emphasise differences in data density.

A9 Fraction of carbon in leaves and fine roots in "metabolic" carbon pool
It is common in dynamic soil modelling to distinguish between the pool of dead plant carbon that decomposes relatively slowly (cellulose and lignin) and that which decomposes relatively rapidly (nucleic acids and cytoplasmic constituents; Ise and Moorcroft, 2006;Schimel et al., 1996;Bolker et al., 1998).However, we have not found any actual compilations of site-based estimates of the "metabolic fraction" to use to infer global patterns, nor of the two metrics typically used to calculate it: the ratio of carbon to nitrogen in plant tissues or the lignin fraction of plant tissue mass.What does exist are "representative" figures for different vegetation types: for example, the litter fall of boreal evergreen forest trees can be calculated to have a metabolic fraction of 0.49, whereas grassland litterfall has a metabolic fraction of 0.76, a typically high value.We therefore generated pseudo-site based data.To do this we performed stratified random sampling of the IBIS vegetation classification map (Ramankutty and Foley, 1999; http://www.sage.wisc.edu/atlas/data.php?incdataset=PotentialVegetation, which comes in 0.1 degree resolution), and for each randomly selected point we associated the vegetation type with the metabolic fraction value given in Table 1 of Ise and Moorcroft (2006).

A10 Fraction of plants that is "structural": everything but leaves and fine roots
The dominant vegetation types across the globe exhibit highly contrasting patterns in their ratios of leaves and fine roots to woody plant parts such as stems and woody roots.In this study we treated all of the biomass that is not leaves and fine roots as "structural".Previous global vegetation modelling assumed different allocation patterns between leaves, roots and structural components for different plant functional types, such as boreal forests (high fraction "structural") versus grasses (low fraction "structural").In our study we inferred the most likely allocation patterns from data.However, we know of no global data explicitly documenting leaf : fine root : structural allocation ratios at the vegetation stand scale (although such data do exist for individual species).Instead, we assumed that the fraction of net primary productivity allocated to structural plant parts was zero in grasslands and some maximum in evergreen tropical rainforests.To acquire the data we performed stratified random sampling of the Global Land Cover 2000 map (Bartholome and Belward, 2005) and only recorded samples if they happened to be classified as these vegetation types.We recorded an integer "1" alongside the sample GPS coordinates if the vegetation was recorded as evergreen tropical rain forest (codes 1) or a "0" if the vegetation was recorded as grassland (codes 13 and 14).

Appendix B Calculation of environmental variables B1 Mean annual temperature
Mean annual temperature, used in several of the model components, was calculated as the arithmetic mean of monthly temperatures in the New et al. (2002)

B2 Mean annual precipitation
Mean annual precipitation, used in several of the model components, was calculated as the sum of the mean monthly precipitation values in the New et al. (2002) gridded climate data set.

B3 Mean annual biotemperature
Mean annual biotemperature, used to calculate Holdridge life zones (Holdridge, 1967), was calculated as the arithmetic mean of monthly temperatures in the New et al. (2002) grid-ded climate data set after setting monthly temperatures less than 0 • C or greater than 30 • C equal to zero.

B4 Fraction of the year experiencing frost
The fraction of the year experiencing frost (FYF), used to calculate the fraction of leaves that are deciduous or evergreen, was calculated using the method employed by van Ommen Kloeke et al. (2011) (P. van Bodegom, personal communication, 2011).The algorithm uses the number of frost days per month from the New et al. (2002) gridded climate data set.If the number of frost days was greater than 15, then the whole month was classified as being a "frost month".If the number was less than 15, then it was classified as being a "non-frost month".However, if a non-frost month followed a frost month, or if a non-frost month was to be followed by a frost month, then the fraction of the month in the non-frost month was calculated as 1/15 the number of frost days.The fraction of the year experiencing frost was then the sum of the number of frost months divided by 12.

B5 Mean annual potential evapotranspiration
Mean annual potential evapotranspiration rate was calculated using the Penman-Monteith algorithm as specified in Allen et al. (1998) for calculating monthly evapotranspiration rates.The algorithm is rather lengthy and we omit details here for brevity, although all details of the algorithm were checked using the test data provided in Allen et al. (1998).All environmental variables for the algorithm are available in the New et al. (2002) gridded climate data set.The sum of the monthly evapotranspiration rates gave PET.This calculation requires making an assumption about the vapour pressure deficit and stomatal resistance, both of which can be influenced by vegetation type.We simplified the calculation by calculating the PET according to grassland (the "reference evapotranspiration"; Allen et al., 1998).A natural area for future work will be to explore the importance of alternative formulations for calculating PET.

B6 Mean annual actual evapotranspiration
Soil moisture content was simulated on a daily time step to obtain estimates of the actual evapotranspiration rate and the length of the "fire season".It was calculated using a modified version of the algorithm reported by Prentice et al. (1993).
Climate variables for the algorithm came from the New et al. (2002) gridded climate data set, and soil maximum water capacity (field capacity) came from the Global Soil Data Task Group (2000) "Global Data Set of Derived Soil Properties" data set.Daily changes in soil water content were calculated using the balance equation specified in Prentice et al. (1993): where ω i is soil water content (mm), ω max is soil field capacity, i is time in days, P i is daily precipitation and E i is actual daily evapotranspiration.Actual daily losses due to evaporation are calculated as where ET i is daily potential evapotranspiration.Our method is a modification of the algorithm used by Prentice et al. (1993).Here the supply of water is taken to be proportional to maximum evaporative demand (potential evapotranspiration) scaled by the relative soil wetness.Prentice et al. (1993) calculated E i as the minimum of daily supply and demand, where supply was calculated according to Eq. (B1) but using a maximum evapotranspiration rate constant instead of ET i , and using a slightly different algorithm to calculate potential evapotranspiration rate (we used Allen et al., 1998).A natural extension in the future will be to use the methodology for model building that we report here to also assess the modelling of evapotranspiration rates.
For each site we initialised the soil water content at field capacity and simulated Eqs.(B1) and (B2) for 10 yr, which was long enough for the annual dynamics to converge to an equilibrium annual cycle.Like Prentice et al. (1993) we used a daily time step, because we found that adopting a coarser time step led to extreme numerical artefacts in the time series of soil water balance.Values for P i and ET i were obtained by linear interpolation of the monthly precipitation and potential evapotranspiration values, respectively.Monthly and annual actual evapotranspiration was then calculated by summation of the E i values.
We checked that our actual evapotranspiration rate calculation yielded sensible predictions by comparing our estimates with model-derived estimates of global actual evapotranspiration rates (Willmott and Matsuura, 2001) averaged between 1961 and 1990, the same period as our New et al. (2002) climate data.We obtained a good agreement with their calculations (Fig. A1; r 2 = 0.88, n = 44 225).However, a natural area for future work will be to explore the importance of alternative methods for calculating AET and soil water balance.

B7 Length of the fire season
Daily soil water content predictions (detailed above) were used to estimate the length of the fire season -the fraction of days of the year over which fire is likely to occur.We based our algorithm on that specified in Thonicke et al. (2001), which calculates the length of the fire season as a function of the daily soil moisture status throughout the year.However, unlike Thonicke et al. (2001) we did not impose any constraints on the amount of biomass present for the daily probability of fire to be greater than zero (instead, this is part of the fire model), and we added the constraint that daily temperature (interpolated from monthly temperature) had to be greater than zero for the daily probability of fire to be nonzero (as in Kloster et al., 2010).The algorithm was therefore LFS = 360 i=1 fp(ω i , T i ), where (B3a) fp = exp −π ω i ω e , T > 0, ω i < ω e 0, T ≤ 0, ω i ≥ ω e (B3b) where ω i is the daily soil water content on day i, ω e = 0.3 is the soil moisture content at which fires become impossible (moisture of fire extinction) and T i is the daily temperature, which was linearly interpolated from monthly values.Note that we assume a 360-day year.The parameter ω e = 0.3, used by Thonicke et al. (2001), is clearly one that could be inferred from data in future studies.
plants.The mean leaf mortality rate µ l (yr −1 ) is calculated as a weighted average according to C3 Fine root mortality rate Gill and Jackson (2000) analysed a database on root turnover rates from all major terrestrial vegetation types and found clear log-linear positive relationships between turnover rates and site mean annual temperatures for fine roots in forests, and for roots in shrublands and grasslands.Reflecting their findings we predicted fine root turnover rates according to µ r = exp (m frm MAT + c frm ) yr −1 , (C3) where µ r is root-turnover rate (yr −1 ), and c frm and m frm are unknown (assumed constant) parameters scaling the response of fine root mortality rate to MAT.

C4 Plant mortality rate
Stephenson and van Mantgem (2005) analysed patterns of tree mortality rates across temperate and tropical forests worldwide and revealed a tendency for mortality rates to be higher in higher productivity areas.We therefore modelled plant mortality rates as where µ s is plant mortality rate (yr −1 ), and p 1 and p 2 are unknown constants (inferred parameters) that scale the plant mortality rate as a function of annual actual evapotranspiration, AET.

C5 Mortality rate due to fire
We developed a fire model based on the models of Thonicke et al. (2001), Kloster et al. (2010) and Arora and Boer (2005).We predicted the per capita vegetation mortality rate due to fire as f 1 (LFS) = 1 1+ exp (−lfs scalar (LFS-lfs halfsat )) and ( C5b) Here, µ f is the mortality rate due to fire (yr −1 ), and c f , lfs scalar , lfs halfsat , NPP scalar , and NPP halfsat are unknown constants (inferred parameters).The constant c f scales the overall mortality rate due to fire, lfs scalar and lfs halfsat scale the logistic response of this mortality rate to the length of the fire season LFS, and NPP scalar and NPP halfsat scale the logistic response of fire return interval to G .To infer the parameters to this model, we assume that the mortality rate due to fire is equivalent to the fractional area burned per year.

C6 Metabolic fraction
The fraction of leaf and fine root carbon allocated to components that decompose relatively rapidly (nucleic acids and cytoplasmic constituents) notably varies between different plant functional types, with gymnosperms, for example, tending to have a relatively low "metabolic fraction".Metabolic fraction also tends to be positively associated with environmental variables such as actual evapotranspiration rate, even when controlling for changes in plant functional types (Aerts, 1997).Rather than introduce plant functional types, we chose to model metabolic fraction as a simple linear function: where f s is metabolic fraction and c fm and m fm are unknown constants (inferred parameters) that scale the response of metabolic fraction to AET.

C7 Fraction of carbon allocated to structural components
We developed a simple model that predicts the fraction of carbon to woody plant parts as a logistic function of the net primary productivity of the vegetation.This model has the following form:

Fig. 1 |Fig. 1 .
Fig. 1 | Summary of the inferred climate dependence of the terrestrial carbon cycle within

Fig. 3 |
Fig. 3 | The full terrestrial carbon model can be represented as a factor graph.All boxes represent model components with accompanying data.Arrows connect a model that acts as a subcomponent (tail of arrow) to another model (head of arrow).Models within Group 1 Fig. 4| Performance assessments of the terrestrial carbon model.(A) Pearson's correlation,

Fig. 4 .
Fig. 4. Performance assessments of the terrestrial carbon model: (a) Pearson's correlation, (b) the coefficient of determination and (c) the probability of the model relative to a null model(Gelfand and Day, 1994).Separate data subsets were used for parameter inference and model evaluation to avoid over-fitting.A final test data subset was reserved (never used during model development) to provide an independent estimate of the likely predictive ability when the model is applied to locations that have not been observed.Assessments against evaluation and final test data are in grey and red, respectively, with n being the respective mean or absolute number of data points per assessment.Dots and error bars are average medians, 5th and 95th percentiles in (a) and (b) and medians, maxima and minima in (c) using parameter distributions inferred from 10 training data subsets.Insufficient evaluation data existed to calculate (a) and (b) for the deciduous leaf mortality model.

1239Fig.Fig. 5 .
Fig. 5| Predicted global distributions of equilibrium plant and soil carbon.These match the

Fig. 6 |
Fig. 6 | Model predictions versus observed empirical data (omitting the 25% final test data).Predictions were made using all 10 posterior parameter probability distributions from 10fold model fitting.Points show the average median prediction and error bars show the average upper and lower 95% confidence intervals.a)-k) are predictions from the full model trained to all empirical data sets and l) are predictions from the plant mortality model trained to the plant mortality data alone.

Fig. 6 .
Fig. 6.Model predictions versus observed empirical data (omitting the 25 % final test data).Predictions were made using all 10 posterior parameter probability distributions from 10-fold model fitting.Points show the average median prediction, and error bars show the average upper and lower 95 % confidence intervals.(a-k) show predictions from the full model trained to all empirical data sets, and (l) shows predictions from the plant mortality model trained to the plant mortality data alone.

Page
53 of 54 obtained for the full model fitted to all data sets.

Fig. 9 |
Fig. 9 | Projected changes in terrestrial carbon under two climate change scenarios.These highlight the potential importance of parameter and structural uncertainty.Lines and shading represent the average median, 5th and 95th percentile projected changes (representing parameter uncertainty) from the terrestrial carbon model using parameter probability distributions inferred from 10 training data subsets.Red, grey and blue correspond to different climate dependences for plant mortality, representing the major structural uncertainty in the model (see Fig. 1).Details of how we simulated the model is given in Appendix C. The additional code necessary to run these simulations is available at http://research.microsoft.com/en-us/downloads/49ad471e-7411-4f65-910a-2a541f946575/default.aspx and the resulting simulation data is available from http://download.microsoft.com/download/1/F/D/1FD1F550-69C4-4503-B2FE-B47F94607A7F/MSRTCMSIMData.zip

Page 54 of 54 1286 Fig. A1 |
Fig. A1 | Comparison between our model derived estimates of annual actual 1287

µ
l = exp (f e ln(µ e )+ (1−f e ) ln(µ d )) yr −1 , (C2a) where f e =a fe FYF 2 +b fe FYF + c fe , (C2b) µ e = exp (m e MAT − c e ) , (C2c) µ d = exp (−(m d MAT + c d )) .(C2d) a fe , b fe , c fe , c e , m e , c d , and m d are unknown constants (inferred parameters); 0 <f e < 1 is the fraction of the vegetation that has evergreen leaves with parameters a fe , b fe and c fe scaling the quadratic Eq. (C2b); µ e is the mortality rate of evergreen leaves with parameters c e and m e scaling that exponential function; µ d is the mortality rate of deciduous leaves with parameters c d and m d scaling that function; and FYF is the fraction of the year that experiences frost (calculated using the method of van Ommen Kloeke et al., 2011).
point (in units of kg C m −2 ) by the area of the grid cell.

Table 1 .
Source data on carbon stocks and flows used in our study for model training and evaluation.
New et al. (2002)ata set.If elevation figures were present in georeferenced data, then the difference between this and the associated elevation figure in theNew et al. (2002)data set was used to calculate a corrected temperature, assuming a lapse rate of 6.49 degrees Celsius per km elevation.