Articles | Volume 18, issue 8
Research article
30 Apr 2021
Research article |  | 30 Apr 2021

Optimal model complexity for terrestrial carbon cycle prediction

Caroline A. Famiglietti, T. Luke Smallman, Paul A. Levine, Sophie Flack-Prain, Gregory R. Quetin, Victoria Meyer, Nicholas C. Parazoo, Stephanie G. Stettz, Yan Yang, Damien Bonal, A. Anthony Bloom, Mathew Williams, and Alexandra G. Konings

The terrestrial carbon cycle plays a critical role in modulating the interactions of climate with the Earth system, but different models often make vastly different predictions of its behavior. Efforts to reduce model uncertainty have commonly focused on model structure, namely by introducing additional processes and increasing structural complexity. However, the extent to which increased structural complexity can directly improve predictive skill is unclear. While adding processes may improve realism, the resulting models are often encumbered by a greater number of poorly determined or over-generalized parameters. To guide efficient model development, here we map the theoretical relationship between model complexity and predictive skill. To do so, we developed 16 structurally distinct carbon cycle models spanning an axis of complexity and incorporated them into a model–data fusion system. We calibrated each model at six globally distributed eddy covariance sites with long observation time series and under 42 data scenarios that resulted in different degrees of parameter uncertainty. For each combination of site, data scenario, and model, we then predicted net ecosystem exchange (NEE) and leaf area index (LAI) for validation against independent local site data. Though the maximum model complexity we evaluated is lower than most traditional terrestrial biosphere models, the complexity range we explored provides universal insight into the inter-relationship between structural uncertainty, parametric uncertainty, and model forecast skill. Specifically, increased complexity only improves forecast skill if parameters are adequately informed (e.g., when NEE observations are used for calibration). Otherwise, increased complexity can degrade skill and an intermediate-complexity model is optimal. This finding remains consistent regardless of whether NEE or LAI is predicted. Our COMPLexity EXperiment (COMPLEX) highlights the importance of robust observation-based parameterization for land surface modeling and suggests that data characterizing net carbon fluxes will be key to improving decadal predictions of high-dimensional terrestrial biosphere models.

1 Introduction

The role of the terrestrial biosphere in the global carbon cycle is challenging to model (Friedlingstein et al., 2013) due to the diverse processes, forcings, and feedbacks driving variability of gross fluxes (Heimann and Reichstein, 2008; Luo et al., 2015). Many attempts to reduce model uncertainty have focused on matching models to nature by representing an increasing number of processes known to influence different parts of the carbon cycle (e.g., vegetation demography, Fisher et al., 2018, or plant hydraulics, Kennedy et al., 2019). In this way, models of the terrestrial biosphere have become more complex over time (Fisher et al., 2014; Bonan, 2019; Fisher and Koven, 2020). Despite such advancements, the spread in terrestrial carbon cycle predictions remains large (Arora et al., 2020) and is dominated more so by model uncertainty than by either internal variability of the climate system or emission scenario uncertainty (Lovenduski and Bonan, 2017; Bonan and Doney, 2018). Because the behavior of the terrestrial biosphere feeds back directly on the rate of CO2 accumulation in the atmosphere, understanding the most effective ways of reducing this model uncertainty is crucial. Progress can benefit not only long-term predictions of global change, but also near-term, regional-scale ecological forecasts aimed at informing sustainable decision-making (Dietze et al., 2018; Thomas et al., 2018; White et al., 2019) and modeling studies focused on understanding the recent past (Schwalm et al., 2020).

While ecological models are becoming more and more detailed, the extent to which predictive skill scales with model complexity is not clear. The logic behind enhancing model realism with increased complexity is intuitive: a highly simplistic model may be structurally unable to capture key relationships defining the system (it underfits), which would naturally imply that greater detail is needed to improve model performance. However, excessively complex models have their own limitations. Because they often contain more parameters than can be robustly determined with the available data (e.g., Prentice et al., 2015; Shi et al., 2018; Feng, 2020), they are prone to learning “noise” instead of true interactions (also called overfitting; Ginzburg and Jensen, 2004; Hawkins, 2004; Keenan et al., 2013). Equifinality – the case in which vastly different parameter sets can yield similar model performance (Beven, 1993; Beven and Freer, 2001) – also becomes more likely as model complexity increases. This dichotomy between model complexity and model performance is known in the statistics and machine learning communities as the bias–variance tradeoff. According to this theory, a model that balances the costs of under- and overfitting can minimize forecast error (Lever et al., 2016). It is therefore possible that other approaches to reducing carbon cycle model uncertainty (e.g., improving model parameterization) may be more effective than increasing structural realism in some circumstances, as also noted by Shiklomanov et al. (2020) and Wu et al. (2020a).

Here, we explicitly map the relationship between model complexity and predictive performance across a spectrum of model structures and parameterizations, hypothesizing that an intermediate-complexity carbon cycle model can outperform a low- or high-complexity one. Our approach can inform ecological models that operate on a spectrum of scales, from localized at the level of individual stands to highly generalizable across the global land surface. This study is particularly relevant for global ecological models, which often function as the land surface component of large-scale Earth system models and have been employed in contexts that carry significant policy relevance (e.g., Intergovernmental Panel on Climate Change (IPCC) reports; Stocker et al., 2014). Hereafter we refer to global ecological models as terrestrial biosphere models, or TBMs.

We note a distinction between conceptualizing complexity as a straightforward count of a model's parameters, equations, or processes, versus as an emergent property of its solution space. When locations or data constraints do not allow certain model parameter values or modeled states, this reduces the effective complexity of the remaining set of possible solutions. That is, one can consider what we term the “effective complexity” of a model as a function of the actual parameter combinations that are possible for that model, or equivalently, the volume of space occupied by these parameter combinations. Two models with the same number of parameters may have very different effective complexities, for example, because correlations between parameters (e.g., allocation fraction to foliage and turnover rate of foliage; Fox et al., 2009) or the extent to which they are constrained (i.e., many more states are possible in the absence of assimilated data than in the presence of it; Keenan et al., 2013), or when the assimilated data have high uncertainty, can influence the models' effective degrees of freedom. As a simple analogy, consider the difference between a sphere and a disc in three-dimensional space (Fig. 1). Although both exist within the space determined by three unconstrained parameters (axes), they are not identical because the volumes they occupy – and the relationships between their parameters – are drastically different. The same can be true between models: one model's equations or assimilated observations may constrain the dimensionality of its potential parameter space to “resemble” a disc, while that occupied by another, less constrained model may look more like a sphere.

Figure 1Conceptual diagram of effective complexity in three-parameter space. A sphere (a) has three unique dimensions spanning the three axes of variability (analogous to a larger solution space for a given model). In the region defined by the same three axes, a disc (b) has only two unique dimensions (analogous to a smaller solution space, perhaps due to two parameters being highly correlated).


Model–data fusion (MDF) systems (also known as data assimilation systems) provide an effective way of isolating and evaluating different model structures by using observations to derive optimized model parameters with uncertainty. An increasingly common tool for carbon cycle science, MDF has been leveraged to provide insight into long-term trends of carbon fluxes (e.g., Rayner et al., 2005), to reconcile the roles of specific datasets in constraining parametric uncertainty (e.g., Keenan et al., 2013), and more (Scholze et al., 2017). Here we use an MDF system called the CARbon DAta MOdel fraMework, or CARDAMOM (Bloom and Williams, 2015; Bloom et al., 2016), chosen because of its high customizability. The structure of its underlying ecosystem carbon model, DALEC (Williams et al., 2005; Bloom and Williams, 2015), can be easily adjusted to become more simple or detailed (e.g., by changing the number of carbon pools or by modifying the functional representations of certain carbon fluxes). Various combinations of observational and functional constraints can also be tested in the assimilation process, along with different assumptions on the amount of error inherent to each assimilated dataset (the characterization of which is an ongoing challenge for the modeling community; Keenan et al., 2011). Taken together, this flexibility allows for experimentation with the different levers that control effective model complexity.

In this paper, we demonstrate the extent to which the prediction accuracy of two key carbon cycle variables can theoretically scale with model complexity. Net ecosystem exchange (NEE) and leaf area index (LAI) were chosen for the analysis because they represent integrated effects of different parts of the carbon cycle (NEE is the balance of photosynthesis and ecosystem respiration fluxes, while LAI strongly controls canopy photosynthesis; Bonan, 1993). Additionally, both are commonly measured and modeled. To explore the complexity–skill relationship, we developed 16 structurally distinct carbon cycle models (i.e., variants of the DALEC model) spanning a range of complexity and calibrated them using the CARDAMOM framework. Several recent studies have demonstrated the utility of CARDAMOM for understanding multiple aspects of the carbon cycle (e.g., López-Blanco et al., 2019; Konings et al., 2019; Yin et al., 2020; Bloom et al., 2020; Quetin et al., 2020), lending confidence for its use here. We calibrated each DALEC variant within CARDAMOM under 42 different data scenarios (i.e., combinations of data constraints and assumptions about observational error) representing different degrees of certainty with which parameters are determined. Each model was calibrated and validated at six globally distributed eddy covariance sites covering a range of biomes and vegetation types, with data collected over multiple years. To quantify complexity, we computed the effective complexity of each model calibration using a principal component analysis (PCA) that reduced the parameter space to its primary axes of variance. Forecast skill was determined using an overlap metric that takes account of uncertainty both in the model forecast and the validation data. Though the range of complexity we evaluated here is lower than that populated by large-scale TBMs, this experiment reveals universal modeling elements that control performance. Specifically, here our COMPLexity EXperiment (COMPLEX) aims to answer the following questions. (a) What controls a given model run's effective complexity? (b) Under what conditions does increasing model complexity improve forecast skill?

2 Methods

2.1 Suite of carbon cycle models (DALEC variants)

The Data Assimilation Linked Ecosystem Carbon (DALEC) model suite includes 16 related intermediate-complexity models of the terrestrial carbon cycle. Each model variant tracks the state and dynamics of both live and dead carbon pools, their interactions, and responses to meteorology and disturbance such as fire or biomass removals. From an initial DALEC model (Williams et al., 2005), we produced alternate structures that either aimed to reduce complexity by focusing on core variables/processes and removing others or aimed to increase complexity by including hypothesized missing carbon pools or improving on over-simplified processes.

Accordingly, the DALEC suite spans a range of model structures (i.e., number of carbon pools, carbon pool connectivity) and process representations (component sub-models of varying complexity) related to different simulations of photosynthesis, plant respiration, decomposition, and water cycle feedbacks. These representations are listed in Table 1 and described in further detail in Appendix A. To facilitate disentanglement of the impacts of specific alternate process representations, the different sub-models can be related to a common baseline structure of the carbon cycle (Fig. 2a). Specific variants of this general structure for the least and most detailed models in this analysis are presented in Fig. 2b–c, while additional diagrams for the remaining models are shown in Appendix B (Figs. B1–B7). Across models, carbon enters the system via gross primary productivity (GPP), which is allocated to autotrophic respiration (Ra) and non-canopy live tissues based on fixed fractions. Canopy growth and mortality is determined by a phenology sub-model which is sensitive to day of year (sub-model scheme CDEA), environmental factors (GSI), or a combination of environmental factors and estimated net canopy carbon export (NCCE). Mortality of wood and fine roots follows continuous turnover based on first-order kinetics. Decomposition of dead organic matter and associated heterotrophic respiration (Rh) follows first-order kinetics with an exponential temperature sensitivity (and, in models C2–C5, a linear soil moisture sensitivity).

Table 1Summary of the DALEC sub-model combinations assessed in COMPLEX. For a detailed description see the Supplement. ID is model identifier. CDEA: Combined Deciduous Evergreen Analytical model; CDEA+: CDEA with variable labile release fraction; GSI: growing season index; NCCE: net canopy carbon export; ACM: aggregated canopy model; T: temperature; M: soil moisture; CUE: carbon use efficiency. fNPP : GPP indicates a fixed fractional allocation of gross primary production (GPP) to foliage net primary production (NPP). DOM is dead organic matter. Models are grouped according to common characteristics, as follows: C models all share the Combined Deciduous Evergreen Analytical (CDEA or CDEA+) phenology sub-model; G models use the growing season index (GSI) phenology sub-model; E models use the evergreen (constant allocation) phenology sub-model; and S models are simple, reduced-complexity variants of other models.

* Includes cold weather GPP limitation. a Includes surface runoff parameterization (assumes constant runoff to infiltration ratio at surface). b Includes two water storage pools (plant-available and plant-unavailable water).

Download Print Version | Download XLSX

Figure 2Overview of the carbon pools (filled boxes) and fluxes (arrows, with names in open boxes) represented in the DALEC model suite. (a) Broad structure of the DALEC model maintained across all variants in the suite; (b) carbon cycle structure of the simplest model; (c) carbon cycle structure of the most detailed model.


2.2 Site selection

COMPLEX uses information from six globally distributed eddy covariance sites participating in FLUXNET (Pastorello et al., 2020) (Table 2). Our site selection procedure aimed to maximize biogeographical spread and diversity of natural ecosystems while fulfilling specific data requirements. These constraints collectively yielded a series of site selection criteria that are described in detail in Appendix C. As an example, the sites must not be dominated by the C4 photosynthetic pathway, nor arable agriculture nor intensively grazed grassland. Additionally, we required that the range of time series observations to be used for model calibration and validation spanned at least a decade. Data collated at each site are described below (see Sect. 2.3).

Table 2Summary of sites, showing their location, FLUXNET code, observational time period, mean climate information, and ecosystem type. Latitude is given in -90/90 and longitude is -180/180. Ecosystem type is denoted using the International Geosphere-Biosphere Programme (IGBP) classification. DBF: deciduous broadleaf forest; EBF: evergreen broadleaf forest; ENF: evergreen needleleaf forest; WSA: woody savanna.

Download Print Version | Download XLSX

2.3 Model–data fusion

We used the CARDAMOM model–data fusion system (Bloom and Williams, 2015; Bloom et al., 2016) to parameterize the DALEC model suite with available observations of the carbon cycle. Specifically, we employed Bayesian inference to retrieve time-invariant, site-specific, optimized parameters and initial conditions for a given DALEC model (y) as informed by observations (O), where p(y|O)p(y)p(O|y). Here, p(y|O) is the posterior parameter probability distribution, p(y) is the prior parameter probability distribution, and p(O|y) is proportional to the likelihood of parameters y given observations O.

For each model, p(y) is derived as the product of (i) the prior probability density functions for each model parameter and (ii) ecological and dynamical constraints (EDCs, i.e., functional constraints). EDCs are simple mathematical functions that impose conditions on inter-relationships between model parameters based on known ecological theory. They are used to inform parameter prior information with broader ecological knowledge and tend to reduce bias and equifinality (Bloom and Williams, 2015). One example of an EDC in CARDAMOM is the imposed constraint that litter turnover times are faster than soil organic matter turnover times (e.g., Gaudinski et al., 2000). In this analysis, each model includes some or all of the EDCs documented in Bloom et al. (2016).

The likelihood p(O|y) is derived as a function of the mismatch between observations O and the model realization M corresponding to y, such that pO|yexp-12n=1NOn-Mnσn2, where σn is the error for the nth observation. This formulation requires no assumptions on the normality of prior or posterior parameter distributions and is robust to missing data. In our analysis, monthly-averaged eddy covariance NEE measurements from FLUXNET, monthly-averaged leaf area index (LAI) estimates from the Copernicus Global Land Service (Verger et al., 2014; Fuster et al., 2020), and in situ wood stock surveys were made available for ingestion into the model (see Appendix C). NEE uncertainty was assumed to be 0.58 gC m−2 d−1 based on estimates of random errors in eddy covariance measurements from Hill et al. (2012). A time-varying uncertainty estimate was included with the Copernicus LAI product, and site-specific, locally derived biomass uncertainties were provided by the site PI or drawn from relevant publications when necessary. Model drivers included monthly average site meteorology (air temperature, shortwave radiation, atmospheric CO2 concentration, vapor pressure deficit, precipitation, and wind speed). Here models were run at the monthly time step.

To sample the distribution p(y|O) (namely the product of p(O|y) and p(y)), we used an adaptive proposal Metropolis–Hastings Markov chain Monte Carlo (MCMC) approach (Haario et al., 2001). We performed 108 iterations for each of four chains, which were checked for convergence using the Gelman–Rubin criterion (<1.2). A subset of 100 samples of y was selected from the latter half of each chain for our analysis. For additional details on the implementation of this algorithm within CARDAMOM, see Bloom and Williams (2015).

2.4 Experimental design

We performed a factorial experiment such that each of the 16 structurally distinct carbon cycle models was run within CARDAMOM under all possible combinations of sites, observational and functional constraints, and assumptions on data uncertainties. These scenarios represented differing degrees of certainty with which parameter distributions were determined. Specifically, we considered (a) six sites; (b) six options for assimilated data, including one for which no data were ingested into the model; (c) four options for the magnitude of error assumed on the assimilated datasets (represented by scalar multipliers on the prescribed nominal uncertainties); and (d) two options for EDC state (either present or absent) (Table 3). In total, this factorial approach yielded 4032 unique model runs (16 models × 6 sites × 21 data scenarios × 2 EDC states). Using a high number of factorial model runs both added robustness to our interpretation and allowed for consideration of each factor's influence across a range of background conditions.

Table 3Model specifications varied in the factorial experiment. Each of the 16 model versions was run with every combination of scenarios across each variable. Note that observational error scalars were not applied when no data were assimilated into the model.

Download Print Version | Download XLSX

Figure 3 shows examples of three model analyses at the FR-LBr site, highlighting the range in NEE prediction performance across different model structures and data scenarios. Each model run contains a calibration period (the first 5 years of the site record; shown in white) during which optimized parameters were derived and a forecast period (the remaining years of the record, which always spanned at least 5 years because no site contained fewer than 10 years of data; shown in gray) during which fluxes and pools were predicted with the optimally parameterized model. In the scenario presented, model S2 is highly constrained by multiple datasets (Fig. 3a). By contrast, model C2 is moderately constrained (Fig. 3b), and model G4 is poorly constrained (Fig. 3c), which is evident by comparing the relative uncertainty of the NEE forecasts (blue shading) for each model. To highlight the effectiveness of the assimilation system, corresponding time series based only on prior parameter distributions are presented in Fig. S1.

Accounting for prediction uncertainty – as well as data uncertainty (red shading) – is a key goal of our model skill evaluation approach. Forecast skill for each model run was computed by comparing predictions and observations drawn strictly from the forecast period, using the histogram intersection algorithm (see Sect. 2.5.1). The complexity of each run was quantified based on its effective complexity (see Sect. 2.5.2).

Figure 3Example model runs (title of each subplot) at the FR-LBr site. The calibration window – the first 5 years of the record – is shown in white, and the forecast window is shaded gray. The ensemble spread (blue shading) encapsulates the 5th–95th percentiles of runs. (a) Forecast skill = 0.15; effective complexity = 7; (b) forecast skill = 0.44; effective complexity = 24; (c) forecast skill = 0.22; effective complexity = 39.


2.5 Analysis

2.5.1 Skill metric

We chose the histogram intersection as a skill metric because it captures accuracy along with both prediction uncertainty (i.e., the ensemble spread for a given model output) and observational uncertainty (i.e., the mean value and error for a given observation). This approach contrasts with more familiar metrics such as the coefficient of determination (R2) or root-mean-square error (RMSE), which do not account for uncertainties surrounding individual data points or predictions.

The histogram intersection is a simple algorithm that calculates the similarity of two discretized distributions p and q and is commonly used in the machine learning community (e.g., for image classification; Jia et al., 2006; Maji et al., 2008). Specifically, the histogram intersection of p and q is computed as i=1nminpi,qi, where n is the number of bins in the two histograms (here, n was set to 50). In our case, p was the histogram of predicted NEE or LAI ensembles for a given time step, and q was a discretized Gaussian distribution with mean and standard deviation equivalent to the observed NEE or LAI value and its error, respectively. We normalize the metric by i=1npi so that it is bounded between 0 (no overlap) and 1 (identical distributions). Because histograms p and q correspond to individual months in the forecast period, the metric used for analysis was the average histogram intersection over all such months.

We note that results for NEE predictions are presented in the main figures of this paper, while those for LAI predictions are included in the supporting information.

2.5.2 Complexity metric

The effective complexity of each model run links model structure (i.e., process representation) and number of parameters to the information content of assimilated data. It was computed using a principal component analysis (PCA) on the posterior parameter space. When applied to CARDAMOM output, the PCA reduces the posterior parameter space (n ensembles of m parameters) to a set of at most m uncorrelated variables that successively maximize variance. As such, this approach finds the smallest number of unique dimensions necessary to explain the most variability in the posterior parameter space of each model analysis. Specifically, we defined effective complexity as the number of principal components for which 95 % of variance in the posterior parameter space was explained. Note that in our experiment, a given DALEC model variant has a distribution of effective complexities corresponding to the different specifications for each run (i.e., data scenario, site; Table 3).

3 Results

3.1 Behavior of effective complexity metric

Effective complexity – defined as the number of principal components for which 95 % of the variance in the posterior parameter space is explained (see Sect. 2.5.2) – is primarily determined by model structure (Fig. 4a, inset). Specifically, over all runs included in the experiment, effective complexity varies far more between different models than between the other tested factors (assimilated data, observational error scalar, site, and EDC presence or absence). This link to model structure provides insight into the metric's interpretability and justifies its use as a measure of model complexity.

While predominantly determined by the choice of model, effective complexity also varies according to the degree to which parameters are constrained (Fig. 4a). It therefore captures the inter-relationship between model structure and parameterization. Within a given model structure, each of the experimentally varied factors yields a range of distinct complexities that follows a predictable pattern: effective complexity is higher for runs with weaker constraints on parameters than it is for runs with stronger constraints on parameters. This is easily interpretable in the case of assimilated data, which is the dominant within-model control on effective complexity (Fig. 4b). Runs for which no observations are ingested into the model have consistently higher effective complexities than runs for which NEE, LAI, and biomass observations are all ingested (compare yellow and purple circles in Fig. 4b), since the observational constraints reduce the possible model solution space. Similar behavior is also observed across the different error scalars tested in the experiment (larger observational error assumptions correspond to higher effective complexities (Fig. S2)) and between the presence versus absence of EDCs (the absence of non-observational realism constraints yields higher effective complexities (Fig. S3)). Conceptually, this pattern can be understood in the following way. Parameters in a given model's high-complexity runs were sampled from wider posterior distributions (due to weak or absent constraints) than in its low-complexity runs. This implies greater variance between parameter sets selected in high-complexity runs – and thus more distinct dimensions of variability in the posterior parameter space – than in low-complexity runs for the same model.

Figure 4Influence of the experimentally varied factors on effective complexity. (a) Range of effective complexity attributable to sites, error scalars, assimilated data, and EDCs for each model (row). Inset: range of attributed effective complexity across all model runs. (b) Average effect of assimilated data combination on effective complexity for each model. Colored circles are means of corresponding runs. Models are ordered from fewest (S1) to greatest (G4) number of parameters. See Table 1 for definition of model IDs.


3.2 Relationship between effective complexity and skill

Across all runs performed in the experiment, the hypothesis that an intermediate-complexity carbon cycle model can outperform a low- or high-complexity model is confirmed, both when NEE is predicted (Fig. 5a) and when LAI is predicted (Fig. S4a). Runs on both extremes of the complexity axis perform poorly, due to overfitting in the low-complexity case (parameters are over-determined, leading to accurate predictions in the training period but poor ones in forecast) and underfitting in the high-complexity case (parameters are under-determined, yielding poor predictions in both training and forecast). Figure 3a and 3c demonstrate this contrasting behavior at the FR-LBr site.

When runs for which no data were assimilated – that is, runs with the least informed parameters – are withheld from the analysis, increasing complexity no longer degrades skill (Fig. 5b). More specifically, the relationship between effective complexity and skill increases monotonically when all runs have some baseline constraint on parameters. This result also holds regardless of which variable is predicted (Fig. S4b) as well as when the number of runs within each complexity bin is standardized via bootstrapping (Fig. S5). The decline in performance attributable to the most extreme effective complexity scenarios is also preserved across RMSE and R2 metrics (not shown; further comparison between different metrics is beyond the scope of this paper). This finding implies that increasing complexity by introducing suitable data-constrained parameters can improve performance, but that doing so by adding unconstrained dimensions can degrade it. That is, the processes and parameters introduced in the most detailed models (such as G1–G4) can lead to improvements in predictive skill over simpler models only when they are sufficiently well-characterized (i.e., adequately informed by data). Importantly, larger observational uncertainty assumptions reduce the effectiveness of assimilated data at constraining parameters in high-complexity models. The monotonically increasing relationship between complexity and skill is strongest when observational error is assumed to be relatively small (Fig. S6).

Figure 5Relationship between effective complexity and NEE forecast skill for (a) all model runs in the experiment and (b) the subset of runs in panel (a) for which data were assimilated. Dark gray shading spans the 25th to 75th percentiles of runs; light gray shading spans the 5th to 95th percentiles; blue points are medians of effective complexity bins. Average forecast skill is computed using the histogram intersection metric.


Assimilated data determine the shape of the overall complexity–skill relationship in COMPLEX. Not only does the presence of any assimilated observations control the response of skill to increasing complexity, but the specific choice of assimilated observations also matters. In particular, assimilating monthly NEE observations improves both NEE (Fig. 6a–c) and LAI predictions (Fig. S7a–c) by complex models over simple models: note the positive/increasing trends between complexity and skill in these cases. However, such improvements in predictive performance are not consistently observed across the complexity axis when other data, but not NEE, are ingested. The ingestion of LAI data and biomass estimates yields a small positive trend (Fig. 6d) – although this relationship is clearly weaker than when NEE is also assimilated (Fig. 6a) – and simple models informed only by LAI perform just as well as complex models when predicting NEE. Indeed, these runs show a constant skill level across the complexity axis (Fig. 6e). When predicting LAI, though, complex models outperform simple models with only the assimilation of LAI (Fig. S7e). All such combinations contrast with the case in which no data are assimilated: forecast skill for those runs declines with complexity, regardless of target variable (Figs. 6f, S7f).

Figure 6Complexity–skill relationship for NEE predictions, split by combination of assimilated data (title of each subplot). Average forecast skill is computed using the histogram intersection metric. Ordering of subplots reflects the strongest (a) to weakest (f) data constraint.


Recall that the magnitude of skill – the degree of overlap between model predictions and observations (see Sect. 2.5.1) – reflects the ability of the model to capture the data along with its uncertainty. Particularly in scenarios corresponding to low effective complexities, models tend to overfit when NEE is assimilated (as demonstrated in Fig. 3a). Overfitting is a key factor causing the discrepancy in performance between low-complexity runs that do (e.g., Fig. 6c) and do not assimilate NEE (e.g., Fig. 6e).

Regardless of which data are assimilated, site-specific characteristics also introduce additional variability into the form of the relationship between effective complexity and skill (Fig. 7). To better understand and isolate site-specific dynamics, here we only interpret runs for which at least one data type is assimilated. Most sites show high-complexity performance optima, consistent with Fig. 5b. However, several are characterized by a threshold effect for which performance increases significantly once a certain effective complexity is attained and remains stagnant thereafter (e.g., a low-complexity threshold around 10 for FI-Hyy and FR-Pue). This “diminishing returns” effect suggests that the performance benefit of added structural detail has the potential to stabilize for all but the simplest models. The two tropical sites included in our analysis demonstrate additional unique dynamics. GF-Guy is the only site for which the performance of the most complex models appears to slightly degrade, even when all observations including NEE are assimilated, and no threshold is apparent at AU-How. Overall, the site analysis demonstrates the large variability in model performance across space, including between sites sharing biome classifications (e.g., FI-Hyy and FR-LBr) or broadly similar climate types (e.g., GF-Guy and AU-How).

4 Discussion

4.1 Effective complexity and the inter-relationship between model structure and parameterization

We defined a concept of effective complexity that is linked to model structure and number of parameters as well as to the information content of calibration data (Fig. 4). This metric can inform future studies seeking to investigate the role of model complexity by providing a simple and comparable quantification of parameter posteriors. Conventional complexity measures (e.g., counts of observable model attributes) can serve as reasonable approximations of the more nuanced definition specific to ensemble methods that we present here. Still, effective complexity is rarely identical to the number of model parameters: it is generally lower. Correlations between model parameters can and do occur whether the model is poorly or well-constrained (Keenan et al., 2013) and whether it is simple or complex, implying that all carbon cycle models have “constrainable” dimensions. Importantly, though, none of the high-parameter models in our experiment have so much redundancy that their average effective complexity across runs is equivalent to that of any low-parameter model (Fig. 4). Whether this is also true for large-scale TBMs remains an open question.

Overall, the behavior of the effective complexity metric highlights that the best-performing analyses (i.e., runs with the highest forecast skill) in the COMPLEX maximize model structural breadth and minimize parametric uncertainty. Models built with high numbers of processes but without effective parameter constraints (i.e., runs that maximize structural breadth but do not attempt to minimize parametric uncertainty) are not sufficient to optimize performance (Fig. 5). Additionally, models of the carbon cycle can overfit if they are calibrated in too narrow a subset of conditions and underfit if they are improperly parameterized and therefore biased, as shown in Fig. 3.

Figure 7Complexity–skill relationship for NEE predictions, split by site (title of each subplot). Only runs for which data were assimilated are plotted. Average forecast skill is computed using the histogram intersection metric.


4.2 Influence of data constraints and site on complexity–skill relationship

The main factors controlling the observed complexity–skill relationship are (a) whether, and which, data are assimilated into the model and (b) the geographical location at which the analysis is undertaken. One way to interpret the role of data in the relationship is explicit: models with the ability to assimilate monthly observations of NEE, which uniquely represent the integrated behavior of terrestrial carbon cycling and its internal dynamics, are more likely to experience gains in skill with increased complexity than those that cannot. This result is consistent with the prominent role of NEE observations in reducing model projection uncertainty identified by Keenan et al. (2013). The effects of LAI and biomass observations in COMPLEX are somewhat more nuanced. All models in the DALEC suite are able to extract information from the LAI data and produce reasonably skilled NEE predictions (Fig. 6e), though such data do not improve the skill of complex models over simple ones. The ingestion of LAI data most directly constrains specific features relating to growth or carbon allocation, potentially informing the seasonality of NEE. Finally, the impacts of biomass observations on forecast skill were relatively muted in our experiments. Given that biomass data are particularly useful for informing the carbon cycling of slow pools (Williams et al., 2005), the relatively short calibration (5 years) and forecast periods (≥5 years) tested here, along with the temporal sparsity of these data in COMPLEX (i.e., a few measurements per site instead of continuous time series for LAI or NEE), may have obscured their utility.

Several recent TBM efforts have sought to enable the assimilation of eddy covariance or remote sensing observations (e.g., Bacour et al., 2015; Raoult et al., 2016; Schürmann et al., 2016; Peylin et al., 2016; MacBean et al., 2018; Norton et al., 2019) as well as measurements of functional traits (e.g., LeBauer et al., 2013). Our results underscore the value of such efforts to reduce parameter uncertainty, despite the fact that the computational costs associated with data assimilation are relatively high (e.g., MacBean et al., 2016). Increased use of emulators may help reduce this computational cost (Fer et al., 2018).

Given the demonstrated value of data constraints and the specification of their uncertainty (Fig. S6), the need to characterize and quantify this uncertainty (Keenan et al., 2011) remains particularly critical for model–data fusion studies. In this analysis, NEE uncertainty was assumed to remain constant both in time (i.e., for all observations regardless of season or year) and in space (i.e., across sites), which likely over-generalizes the specifications of individual sensors and the possibility of systematic or increasing biases. These assumptions become even more important to account for when assimilating global datasets, for which retrieval accuracy can vary across land cover types or with atmospheric conditions such as clouds or snow (e.g., Fang et al., 2013). One benefit of the Copernicus LAI product used here is its explicit, spatially variable quantification of uncertainty, which is still relatively rare for remote sensing datasets. Though the robustness of these uncertainties has been challenged with independent observations in some locations (e.g., Zhao et al., 2020), this approach represents a level of detail well-suited to the coupling of data to large-scale or global models.

The observed variability in the complexity–skill relationship across sites (Fig. 7) suggests that predictability itself is spatially heterogeneous. Further, it implies that the benefit to model performance accrued by the addition of a given process should not be expected to affect all locations uniformly, even when site-specific parameter uncertainty is minimized through calibration or optimization. Models not tuned locally likely smooth this spatial variability in predictability drastically (van Bodegom et al., 2012; Berzaghi et al., 2020), and thus model development and calibration must include locations spanning a wide range of vegetation, climate, soil characteristics, and disturbance histories.

4.3 Recommendations for selecting appropriate model complexity

Overall, our results suggest that the benefits of increased model complexity (e.g., gains in skill attributable to the introduction of specific processes or to additional detail applied to existing mechanisms) are attainable only when parameters are sufficiently well characterized. Here, this benefit is achieved when high complexity is balanced by data-assisted parameter optimization (in particular, when NEE observations are assimilated). More broadly, the relationship between complexity and skill is dynamic and extends beyond model structural choices. As a result, it is difficult to quantify whether model parameters corresponding to any specific model implementation – including outside the DALEC suite – are adequately informed such that increased model complexity is beneficial to performance. To assist in this endeavor, we present the following recommendations for model development and evaluation.

  1. Assimilate well-characterized, repeat-observation datasets to constrain model parameters at the scale of model application.

  2. Use long time series to undertake independent forecast evaluation studies, and factor observational uncertainty into model evaluation (e.g., using overlap metrics).

  3. Test whether model updates that add complexity lead to forecast improvements (not only calibration improvements), and test for possible model simplification improvements also.

  4. Seek to calibrate or optimize model parameters even when data assimilation is not possible (e.g., using optimality-based approaches; Walker et al., 2017; Jiang et al., 2020).

Finally, while beyond the scope of this study, future work will investigate the linkage between specific processes or process representations (e.g., the inclusion or exclusion of water cycling) and predictive performance to better parse ecological controls on the complexity–skill relationship.

4.4 Transferability to large-scale models (TBMs)

This analysis tested a spectrum of structurally distinct representations of the carbon cycle based on the intermediate-complexity ecosystem model DALEC, which allowed for coupling with the CARDAMOM model–data fusion system in a computationally tractable manner. Because our findings are not explicitly linked to the roles of specific processes or model features, however, their implications extend beyond the use of DALEC-like models to a wide variety of ecological models, including TBMs.

Traditional (based on plant functional type, or PFT) parameter determination in TBMs is far from random. It is informed by data – for example, by hypotheses or generalizations derived from prior literature (e.g., Oleson et al., 2010; Lawrence et al., 2011) or by model calibration at specific locations (e.g., Williams et al., 1997) – and therefore endowed with ecological knowledge. Accordingly, TBM parameters are likely more informed than the least constrained parameters retrieved in our analysis, which were freely sampled from wide uniform distributions and caused the high-complexity decline in performance (Fig. 5). However, while this may be true locally, the common assumption on uniformity of parameters within PFTs casts doubt on their precision across the regional or global scales at which TBMs typically make predictions (van Bodegom et al., 2012). Indeed, using a suite of global TBMs participating in the Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP; Huntzinger et al., 2013), Schwalm et al. (2019) showed that increases in model performance were more often linked to the omission rather than inclusion of various processes, suggesting a tradeoff between complexity and skill similar to that observed here. This conclusion calls into question the conventional paradigm that greater complexity significantly and consistently improves skill across current TBMs.

Earth observation (EO) is one key approach that can provide the high-spatial- and high-temporal-resolution data on carbon cycling needed for more localized calibrations (Exbrayat et al., 2019). In COMPLEX, we used Copernicus LAI data, though there are also opportunities to ingest biomass maps from space lidar or radar, estimates of photosynthesis from solar-induced fluorescence (SIF), and satellite-based atmospheric inversions of regional NEE, among others, in future studies. If supplied with appropriate error estimates, these datasets can over time provide powerful constraints for high-resolution carbon cycle analyses with TBMs or DALEC-like models. A key research goal is to determine the appropriate model complexity for maximizing the information content of these EO data for robust forecasts and analyses.

Alternative methodologies for deriving ecosystem parameters outside the realm of PFTs are also becoming increasingly common (van Bodegom et al., 2012; Bloom et al., 2016; Exbrayat et al., 2018; Berzaghi et al., 2020; Fisher and Koven, 2020) and may represent a way forward in addressing the tradeoff between structural and parametric uncertainty. Recent work has focused on upscaling in situ trait data (e.g., from the TRY database; Kattge et al., 2020) to yield spatially variable maps of key ecosystem parameters, using modeled relationships with climate or canopy properties (often referred to as environmental filtering relationships, since the environment “filters” the possible distribution of parameters at a given location; e.g., Verheijen et al., 2013; van Bodegom et al., 2014; Butler et al., 2017), leaf economics (Sakschewski et al., 2015), or optimality theory (e.g., Smith et al., 2019). Other studies have investigated how TBM parameters optimized at eddy covariance sites covary with climate (e.g., Peaucelle et al., 2019; Wu et al., 2020b). These efforts are not without their challenges, however. The spatial coverage of in situ trait data as well as eddy covariance sites is sparse relative to the large diversity of ecosystem behavior (Schimel et al., 2015), and such datasets also comprise a non-representative sample of species and disturbance histories (Sandel et al., 2015). These biases may limit the representativeness of the modeled relationships. Taking a different approach, a small subset of models has also been developed to operate altogether independently from the paradigm of PFTs (e.g., using trait-based approaches; Scheiter et al., 2013; Pavlick et al., 2013; Fyllas et al., 2014). Our results imply that these and future developments to improve the flexibility of model parameters will play critical roles in enabling the trend of increasing model complexity and may be a more fruitful avenue towards reducing the uncertainty of TBM prediction than model structural changes and additions.

5 Conclusions

Our approach to understanding the relationship between model complexity and model predictive performance is novel in its focus on sampling the spectrum of possible parameter uncertainty states for a variety of model structures and calibration data. Taken together, lessons learned from the behavior of the effective complexity metric as well as the data and site effects discussed here represent a comprehensive pattern: improving the robustness of parameter calibration is a prerequisite for effectively increasing structural complexity. Specifically, we found that increasing model complexity actively degrades predictive skill in the most extreme cases of parameter uncertainty. Assimilating data – particularly monthly observations of net ecosystem exchange – considerably improve the performance of complex models relative to simple models, though the magnitude and persistence of this improvement vary across space. Overall, the growing focus on understanding and reducing parametric uncertainties within large-scale models (such as via direct data assimilation, the development and implementation of alternatives to PFTs, and parameter sensitivity analyses; e.g., Fisher et al., 2019, and more) is both a necessary direction and a significant opportunity for improving the predictability of the terrestrial biosphere. Our conclusion for model construction and usage matches those from other scientific fields, as stated by Albert Einstein: “to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience” (Caprice, 2013).

Appendix A: DALEC model descriptions

The Data Assimilation Linked Ecosystem Carbon (DALEC) model suite includes a range of related intermediate-complexity models of the terrestrial carbon cycle. Each model version is comprised of sub-models related to different simulations of photosynthesis, plant and heterotrophic respiration, canopy phenology, stomatal conductance, and the inclusion of water cycling (Table 1). The sub-models are described in detail in the following sections (Sect. A.1–A.5). Each section contains a table highlighting the key features of each sub-model (Tables A1–A5).

A1 Photosynthesis and stomatal conductance

A1.1 Aggregated Canopy Model Version 1 (ACM1)

The Aggregated Canopy Model Version 1 (ACM1) estimates canopy gross primary productivity (i.e., photosynthesis) as a function of temperature, shortwave radiation, day length, atmospheric CO2 concentration, leaf area, and mean foliar nitrogen content (Williams et al., 1997; Fox et al., 2009). ACM1 was designed and calibrated to emulate a state-of-the-art process-orientated ecosystem model SPA (Williams et al., 1996, 2001; Smallman et al., 2013). As such, ACM1 contains 10 parameters which implicitly capture the more complex process representations (e.g., temperature sensitivity, radiative transfer) found within SPA. An 11th parameter represents the canopy photosynthetic efficiency (the product of nitrogen use efficiency and foliar nitrogen), which is estimated by CARDAMOM as a location-specific, optimized value.

ACM1 has no explicit capacity to simulate drought or direct overheating stress on canopy processes. Canopy photosynthesis is connected to the wider carbon cycle through the leaf area, although the role of the roots in water supply is neglected as is its interplay with CO2 supply via stomatal conductance.

A1.2 Aggregated Canopy Version 1 + cold weather GPP

The GPP module also includes an empirical cold-weather GPP limitation sensitivity function. The cold temperature limitation factor (denoted as g) is used as a multiplier on the DALEC GPP function output, to act as a thermostat that regulates evergreen needleleaf carbon uptake. The cold-weather factor g is calculated using added model parameters (Tminmin and Tminmax) and temperature observations (Tmin), such that g=0 if Tmin< Tminmin, g=1 if Tmin> Tminmax, and g=(Tmin-Tminmin)/(Tminmax-Tminmin) otherwise.

A1.3 Aggregated Canopy Version 2 (ACM2)

The aggregated canopy model for gross primary productivity and evapotranspiration is the successor version to ACM1, hereafter known as ACM2 (Smallman and Williams, 2019). ACM2 builds on the ACM1 outline creating a model of ecosystem water cycling to facilitate the implementation of a mechanistic stomatal conductance model linking the canopy to soil water via fine roots. ACM2 also optimizes the stomatal intrinsic water use efficiency (for details see Williams et al., 1996; Bonan et al., 2014). ACM2 simulates shortwave and longwave isothermal radiation balances, canopy interception of rainfall, and soil infiltration. ACM2 is therefore capable of simulating canopy transpiration, soil evaporation, evaporation of canopy intercepted rainfall, soil water runoff and drainage.

A1.4 Analytical Ball–Berry

For the analytical Ball-Berry GPP module of CARDAMOM, leaf-level GPP and stomatal conductance are calculated using the coupled leaf photosynthesis–stomatal conductance developed by Ball–Berry (Ball et al., 1987) and an analytical solution to the system of equations developed by Baldocchi (Baldocchi, 1994). This new module serves to calculate both GPP and evapotranspiration coupled through the stomatal behavior. This formulation added the maximum rate of carboxylation (Vcmax), the maximum rate of electron transport (Jmax), stomatal slope and intercept, and boundary layer conductance to the set of parameters that were optimized through data assimilation, while removing the explicit water use efficiency (where there is a water cycle in CARDAMOM) and canopy efficiency parameters. We scaled the leaf level results of GPP and stomatal conductance to the canopy as a “big leaf” with an exponential decay function of LAI (Sellers et al., 1992).

Table A1Summary of the key features for each photosynthesis sub-model.

Download Print Version | Download XLSX

A2 Autotrophic respiration (Ra)

Autotrophic (plant) respiration (Ra) is a key ecosystem carbon flux returning approximately half of GPP back to the atmosphere (Waring et al., 1998). While this overall proportionality remains true, subsequent studies have identified variation in the Ra : GPP fraction linked, among others, to climate, nutrient status, and plant age (e.g., Collalti and Prentice, 2019). Furthermore, there are multiple competing hypotheses for how to explain the broad proportionality and site-specific variations (e.g., Collalti and Prentice, 2019; Collalti et al., 2020), requiring an investigation of multiple approaches.

Table A2Summary of key features for each respiration sub-model.

Download Print Version | Download XLSX

A2.1 Fixed Ra : GPP fraction

Autotrophic respiration (Ra) is assumed to be a fixed (time-invariant) fraction of GPP (Ra : GPP) such that

(A1) R a = GPP × R a : GPP .

It varies in space as a retrieved location-specific parameter. A prior value (0.46±0.12) for the Ra : GPP fraction is drawn from Waring et al. (1998) and (Collalti and Prentice, 2019).

A2.2 Fixed Rm : GPP fraction Rg : NPP

Ra can be divided between respiration associated with tissue growth (Rg) and maintenance (Rm). Rg has a robust mechanistic understanding, allowing it to be estimated as a fixed fraction of carbon allocated to plant tissues (Calloc; gC m−2 d−1) independently of ecosystem type and climatic conditions (0.22; Waring and Schlesinger, 1985):

(A2) R g = C alloc × 0.22 .

We continue to retrieve a location-specific fixed fraction of GPP respired as Rm (Rm : GPP):

(A3) R m = GPP × R m : GPP .

This formulation allows for variation between the proportion of Ra attributed to either Rg or Rm, as they have independent drivers. Note that this model structure implicitly assumes that maintenance respiration is fully coupled to GPP and growth activity, neglecting any distinct temperature sensitivity of respiration versus photosynthesis.

Table A3Summary of key features for each decomposition sub-model.

Download Print Version | Download XLSX

A2.3 Canopy cost respiration model

The sensitivity of Rm to tissue temperature and nitrogen content is well established (e.g., Ryan, 1991; Reich et al., 2008; Atkin et al., 2017); however the exact formulation of the relationship remains poorly understood (Thomas et al., 2019). We implemented the canopy maintenance respiration model proposed by Reich et al. (2008), which has been extensively evaluated in comparison with alternate approaches (Thomas et al., 2019). Wood and fine root maintenance respiration continue to be represented using a fixed fraction as described in Sect. A.2.2. Estimation of growth respiration continues to be a fixed fraction of NPP.

Following Reich et al. (2008), the estimation of canopy maintenance respiration occurs in two stages: (i) estimation of the canopy maintenance respiration per gram of leaf carbon at 20 C (Rm-leaf20; gC (m−2 leaf)−1 d−1) and (ii) daily temperature adjustment. Rm-leaf20 is estimated as a function of the leaf nitrogen concentration ([Nleaf]; mmol N (g leaf)−1) and two retrieved parameters. Parameter α represents the reference maintenance respiration at 20 C and [Nleaf] = 1, while β is the exponential [Nleaf] sensitivity parameter. Both α and β are retrieved by CARDAMOM as DALEC model parameters. The Reich et al. (2008) model estimates maintenance respiration in units of nmol C (g leaf)−1 s−1, which is adjusted to gC (gC leaf)−1 d−1 by the remaining terms: 1×10-9 scales from nmolC to molC, 12 is the atomic mass of carbon adjusting molC to gC, the factor 2 adjusts gC (g leaf)−1 s−1 to gC (gC leaf)−1 assuming 50 % of leaf biomass is carbon, and 86 400 is the number of seconds in a day giving gC (g leaf)−1 d−1:

(A4) R m - leaf 20 = 10 α × N leaf β × 1 × 10 - 9 × 12 × 2 × 86 400 .

[Nleaf] is determined from existing DALEC parameters representing the mean foliar nitrogen content (avN; gN m−2) and leaf mass per unit area (LMA; g m−2):

(A5) N leaf = avN / LMA 14 × 1000 .

The factor of 14 is the atomic weight of nitrogen and 1000 scales the result from molN g−1 to mmolN g−1.

Temperature strongly impacts metabolic activity and thus maintenance respiration. The canopy maintenance respiration (Rm−leaf) at the current temperature (T) is estimated following a Q10 function (=2; widely used) and scaled by the size of the canopy carbon pool (Cfol; gC m−2):

(A6) R m - leaf = R m - leaf 20 × 2 0.1 T - 20 × C fol .

The instantaneous temperature response is well captured by existing models. However, the impact of long-term temperature changes and associated acclimation of both photosynthetic and respiratory pathways are not accounted for. Therefore, simulations over longer timescales may overestimate negative feedbacks of increased canopy maintenance respiration due to warming (Atkin et al., 2015; Wang et al., 2020).

Table A4Summary of key features for each phenology sub-model.

Download Print Version | Download XLSX

Table A5Summary of key features for each water cycle sub-model.

Download Print Version | Download XLSX

A3 Decomposition and heterotrophic respiration

Heterotrophic respiration results from decomposition and mineralization processes in carbon pools containing dead organic matter. Depending on the model structure, these can include a fine litter pool (Rh−lit composed of foliar and fine root inputs), a wood litter (Rh−woodlit both fine and coarse woody debris), and soil organic matter (Rh−som). In all cases, decomposition and mineralization follow a first-order kinetic approach with environmental modifiers. When litter and wood litter pools turn over, a fraction of their carbon is released as heterotrophically respired C while the remainder passes to the soil organic matter pool (Dlit, Dlitwood; gC m−2 d−1). All decomposition of soil organic matter is heterotrophically respired. All models assume heterotrophic C respiration is respired as CO2.

A3.1 Temperature sensitivity

All dead organic matter pools follow a common basic form of a pool-specific turnover parameter (Θpool; fraction per day at 0 C) combined with an exponential response linked to temperature (Tmax; C) and a sensitivity parameter (γ):

(A7) R h - pool = C pool × Θ pool × e γ T max .

A3.2 Temperature and soil moisture sensitivity

Heterotrophic respiration regulated by both temperature (as in Sect. A.3.1) and a linear function of the ratio of current precipitation to the site mean (as proxy for near-surface soil moisture). The functional form allows for varying linear sensitivity, such that

(A8) R h - pool = C pool × Θ pool × f T × P / P - 1 s p + 1 ,

where P is the monthly precipitation, P is the average precipitation, and sp is the precipitation sensitivity parameter. Note that sensitivity is positive-definite (i.e., no heterotrophic limitations induced for high-moisture events). See Quetin et al. (2020) and Bloom et al. (2020) for further details.

A4 Canopy phenology

A4.1 Combined Deciduous-Evergreen Analytical (CDEA) model

The CDEA phenology model is based primarily on a day of year approach to simulate the turnover of a labile pool to support canopy growth and subsequent canopy turnover (Bloom and Williams, 2015). Each time step, a fixed fraction of GPP is allocated to the canopy and a labile pool which supplies the canopy with new growth based on the CDEA model. The CDEA model uses parameterized values for the peak day of year for labile turnover (i.e., supplying leaf growth) and leaf turnover plus two further parameters which define the standard deviation of a Gaussian distribution specifying the period of time over which canopy phenology occurs. The fraction of the canopy which is turned over each year is defined by a leaf lifespan parameter, while the labile pool is assumed to fully turnover each year.

The CDEA model provides an easy-to-calibrate diagnostic model of mean canopy phenology. However, it does not vary phenology in response to changing environmental conditions limiting simulation of inter-annual variability. As a result, the CDEA model has a limited capacity to inform on the meteorological drivers of canopy phenology.

A4.2 CDEA+

Phenology is the same as Sect. A.4.1; labile C release to foliar C is optimizable (annually  15 %–100 % of labile C allocated to foliar C).

A4.3 Growing season index (GSI) + GPP return

Canopy phenology is sensitive to environmental conditions (e.g., Jolly et al., 2005; Forkel et al., 2015) and plant carbon economic constraints (e.g., Flack-Prain et al., 2021) driving interannual variation in leaf area dynamics. The growing season index (GSI) is a piecewise model linking canopy phenology to linear functions of day length, temperature, and vapor pressure deficit scaled 0–1 (GSI; Jolly et al., 2005). The GSI model was implemented in Smallman et al. (2017) and augmented to include a requirement for new leaf area to lead to an increase in GPP greater than a critical threshold retrieved as part of CARDAMOM.

However, we note that recent plant economic theory indicates that canopies are optimizing net canopy carbon export (NCCE; e.g., Thomas et al., 2019; Flack-Prain et al., 2021) – that is, photosynthesis minus respiratory and construction costs, rather than photosynthesis alone. To investigate this level of process complexity, in Sect. A.4.4 we include a canopy maintenance respiration model to assess the NCCE.

A4.4 Growing season index (GSI) + net canopy carbon export (NCCE)

Optimality theory is increasingly being used to explain canopy phenology based on maximizing some metric of the carbon economy. One approach which is gaining support is optimizing net canopy carbon export (NCCE): that is, ensuring photosynthetic gains are greater than costs associated with leaf growth and maintenance respiration (e.g., Thomas and Williams, 2014; Flack-Prain et al., 2021). While further research is needed to refine these theoretical models, we implement a model consistent with existing literature.

The GSI model proposes an amount of new leaf area. Whether this grows or not is determined by quantifying whether the increase in GPP averaged over the expected life span of the leaf is greater than the increased maintenance respiration costs and the carbon required to construct the new leaf and the associated growth respiration.

A5 Water cycling

A5.1 Empirical bucket

The bucket approach extends the DALEC baseline structure to include a plant-available water pool, where the hydrological balance is defined as the sum of precipitation inputs (P) and evapotranspiration (ET) and runoff (R) outputs. The total plant-available water W at time t+1 is determined in the following way:

(A9) W t + 1 = W t + P t - ET t - R t Δ t ,

where Δt is the time period. Runoff is calculated as

(A10) R t = α W t 2 ,

where α is a second-order decay constant. Evapotranspiration is derived as

(A11) ET t = GPP t VPD t ν e ,

where νe is the inherent use efficiency. The plant-available water limits GPP such that

(A12) GPP t = GPP max t max 1 , W t ω ,

where ω is the plant-available water stress threshold. Note that the parameters α, νe, ω, and W0 are optimized in CARDAMOM. For further details, see Quetin et al. (2020) and Bloom et al. (2020).

A5.2 ACM2: multi-layer root model

The ACM2 model includes a multi-layer representation of the soil and root access (Smallman and Williams, 2019). There are five soil layers, three of which are accessible to roots to supply the canopy with water. The top two layers have a fixed thickness of 10 and 20 cm, respectively, with a third layer which is expandable based on root penetration. Soil-layer-specific field capacity, porosity, and hydraulic conductances are calculated using soil texture. Using these data, infiltration of precipitation, drainage between soil layers, soil hydraulic resistance to root uptake of water, and soil surface evaporation are estimated. Soil surface evaporation occurs from the top soil layer only. For a complete description, see Smallman and Williams (2019).

Appendix B: Carbon cycle structure for DALEC variants

Figure B1Carbon cycle structure for models C1–C8.


Figure B2Carbon cycle structure for model E1.


Figure B3Carbon cycle structure for models G1–G4.


Figure B4Carbon cycle structure for model S1.


Figure B5Carbon cycle structure for model S2.


Figure B6Carbon cycle structure for model S3.


Figure B7Carbon cycle structure for model S4.


Appendix C: Data requirements and site selection

COMPLEX uses information from six sites across the globe (Fig. C1). The selection aimed to maximize their biogeographical spread and diversity of natural ecosystems while fulfilling specific data requirements. A key DALEC model criterion requires that the sites must not be dominated by C4 photosynthetic pathway, be arable agriculture or intensively grazed grassland. COMPLEX makes use of a range of time series observations, including LAI, NEE, and wood stock inventory. Furthermore, the experiment uses temporally distinct calibration and prediction periods requiring observational constraints to span both periods. Collectively both scientific and data availability created a series of site selection criteria which are described below.

Time series information on leaf area are drawn from the (EO) Copernicus 1 km product derived from Earth observations which provides estimates of LAI magnitude at fine temporal resolution and concurrent location-specific estimates on uncertainty. Using this EO product and the abovementioned calibration/prediction period constraints requires site data collection periods to be post-1998.

Simulation of NEE is a key focus of COMPLEX, making the availability of long-term, temporally consistent, high-quality NEE estimated derived from eddy covariance essential (e.g., FLUXNET2015; Pastorello et al., 2020). The FLUXNET2015 database provides consistent information on data quality (e.g., observation uncertainty and proportion of model–data gap-filling) that underpins the site selection process. Here, to avoid comparing DALEC-simulated NEE with largely statistically gap-filled observations, only sites with <20 % gap-filled data are used.

Hill et al. (2012) demonstrated that assimilation of NEE observations provides substantial new information up to at least 5 years in duration. To create a balanced experimental design, COMPLEX sites are required to have a minimum of 10 years of observations (i.e., 5 years calibration and remainder evaluation). Building on existing analyses with DALEC (e.g., Smallman et al., 2017), COMPLEX quantifies the role of woody biomass information on constraining the DALEC models' predictive capacity of NEE. Therefore, multiple wood stock estimates are required spanning both the calibration and prediction periods. As determining the amount and access of inventory data often requires direct contact with site managers, this stage occurs later in the selection process.

Figure C1Map of FLUXNET sites used in the experiment.

Collectively, the abovementioned and model process representations formed the basis of a site selection procedure to filter the FLUXNET2015 database. This process ultimately led to the selection of six sites (Table 2).

  • a.

    Sites must represent a natural ecosystem (i.e., remove arable agriculture and intensively grazed sites) dominated by C3 photosynthesis species.

  • b.

    Sites have observations spanning >10 years after 1998.

  • c.

    Sites have <20 % gap-filled observations: threshold varied to ensure that at least one site representative is available for boreal, temperate, and tropical ecosystems spanning, where appropriate, canopy phenological types (i.e., needle versus broadleaf, evergreen versus deciduous).

  • d.

    Contact site managers to determine availability of wood stock observations.

Code and data availability

Data generated in COMPLEX (performance and complexity metrics corresponding to each model run) are publicly available at (Famiglietti, 2020). We thank FLUXNET site PIs Jean-Marc Ourcival and Serge Rambal (FR-Pue), Lindsay Hutley and Jason Beringer (AU-How), Bill Munger and Steve Wofsy (US-Ha1), Denis Loustau (FR-LBr), and Timo Vesala (FI-Hyy) for providing much of the data used in our analysis. We thank Yuan Zhao, Rong Ge, and Penghui Zhu for their assistance in preparing the data. Analysis code is available at (, Famiglietti, 2021).


The supplement related to this article is available online at:

Author contributions

AAB, MW, TLS, AGK, GRQ, SFP, VM, and CAF planned the analysis. CAF, TLS, PAL, GRQ, SFP, VM, NCP, SGS, YY, AAB, MW, and AGK contributed to model development. TLS and MW developed site selection criteria, contacted site PIs, and gathered input data. TLS and PAL executed model runs. CAF performed analysis on model outputs. CAF wrote the manuscript with contributions from TLS, PAL, GRQ, AAB, MW, and AGK. All authors reviewed drafts of the manuscript.

Competing interests

The authors declare that they have no conflict of interest.


This work has made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF) (, last access: 20 May 2020). Part of this work was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (NASA).

Financial support

Caroline A. Famiglietti, Gregory R. Quetin, and Alexandra G. Konings were supported by NSF DEB-1942133 and NASA NNH16ZDA001N-IDS. Operation of the US-Ha1 site is funded by the US Department of Energy's Office of Science (DE-AC02-05CH11231) and National Science Foundation LTER funding (DEB-1832210). The Howard Springs site is funded by the Australian Research Council FT1110602, DP160101497, and Australian Terrestrial Ecosystem Research Network – Ecosystems Process platform. Mathew Williams received funding from NERC (NE/P018920/1), the UK Space Agency, Newton Fund CSSP Brazil, and the Royal Society.

Review statement

This paper was edited by Sönke Zaehle and reviewed by Enqing Hou and one anonymous referee.


Aguilos, M., Herault, B., Burban, B., Wagner, F., and Bonal, D.: What drives long-term variations in carbon flux and balance in a tropical rainforest in French Guiana?, Agr. Forest Meteorol., 253–254, 114–123,, 2018. 

Arora, V. K., Katavouta, A., Williams, R. G., Jones, C. D., Brovkin, V., Friedlingstein, P., Schwinger, J., Bopp, L., Boucher, O., Cadule, P., Chamberlain, M. A., Christian, J. R., Delire, C., Fisher, R. A., Hajima, T., Ilyina, T., Joetzjer, E., Kawamiya, M., Koven, C. D., Krasting, J. P., Law, R. M., Lawrence, D. M., Lenton, A., Lindsay, K., Pongratz, J., Raddatz, T., Séférian, R., Tachiiri, K., Tjiputra, J. F., Wiltshire, A., Wu, T., and Ziehn, T.: Carbon–concentration and carbon–climate feedbacks in CMIP6 models and their comparison to CMIP5 models, Biogeosciences, 17, 4173–4222,, 2020. 

Atkin, O. K., Bloomfield, K. J., Reich, P. B., Tjoelker, M. G., Asner, G. P., Bonal, D., Bönisch, G., Bradford, M. G., Cernusak, L. A., Cosio, E. G., Creek, D., Crous, K. Y., Domingues, T. F., Dukes, J. S., Egerton, J. J. G., Evans, J. R., Farquhar, G. D., Fyllas, N. M., Gauthier, P. P. G., Gloor, E., Gimeno, T. E., Griffin, K. L., Guerrieri, R., Heskel, M. A., Huntingford, C., Ishida, F. Y., Kattge, J., Lambers, H., Liddell, M. J., Lloyd, J., Lusk, C. H., Martin, R. E., Maksimov, A. P., Maximov, T. C., Malhi, Y., Medlyn, B. E., Meir, P., Mercado, L. M., Mirotchnick, N., Ng, D., Niinemets, Ü., O'Sullivan, O. S., Phillips, O. L., Poorter, L., Poot, P., Prentice, I. C., Salinas, N., Rowland, L. M., Ryan, M. G., Sitch, S., Slot, M., Smith, N. G., Turnbull, M. H., VanderWel, M. C., Valladares, F., Veneklaas, E. J., Weerasinghe, L. K., Wirth, C., Wright, I. J., Wythers, K. R., Xiang, J., Xiang, S., and Zaragoza-Castells, J.: Global variability in leaf respiration in relation to climate, plant functional types and leaf traits, New Phytol., 206, 614–636,, 2015. 

Atkin, O. K., Bahar, N., Bloomfield, K., Griffin, K. L., Heskel, M. A., Huntingford, C., and de la Torre, A. M.: Plant Respiration: Metabolic Fluxes and Carbon Balance, edited by: Tcherkez, G. and Ghashghaie, J., Springer, Cham, Switzerland, 302 pp., 2017. 

Bacour, C., Peylin, P., MacBean, N., Rayner, P. J., Delage, F., Chevallier, F., Weiss, M., Demarty, J., Santaren, D., Baret, F., Berveiller, D., Dufrêne, E., and Prunet, P.: Joint assimilation of eddy covariance flux measurements and FAPAR products over temperate forests within a process-oriented biosphere model, J. Geophys. Res.-Biogeo., 120, 1839–1857,, 2015. 

Baldocchi, D.: An analytical solution for coupled leaf photosynthesis and stomatal conductance models, Tree Physiol., 14, 1069–1079,, 1994. 

Ball, J. T., Woodrow, I. E., and Berry, J. A.: A model predicting stomatal conductance and its contribution to the control of photosynthesis under different environmental conditions, in: Progress in Photosynthesis Research, Springer, Dordrecht, the Netherlands, 221–224, 1987. 

Berbigier, P., Bonnefond, J., and Mellmann, P.: CO2 and water vapour fluxes for 2 years above Euroflux forest site, Agr. Forest Meteorol., 108, 183–197,, 2001. 

Beringer, J., Hutley, L. B., Tapper, N. J., and Cernusak, L. A.: Savanna fires and their impact on net ecosystem productivity in North Australia, Global Change Biol., 13, 990–1004,, 2007. 

Berzaghi, F., Wright, I. J., Kramer, K., Oddou-Muratorio, S., Bohn, F. J., Reyer, C. P. O., Sabaté, S., Sanders, T. G. M., and Hartig, F.: Towards a New Generation of Trait-Flexible Vegetation Models, Trends Ecol. Evol., 35, 191–205,, 2020. 

Beven, K.: Prophecy, reality and uncertainty in distributed hydrological modelling, Adv. Water Resour., 16, 41–51,, 1993. 

Beven, K. and Freer, J.: Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology, J. Hydrol., 249, 11–29,, 2001. 

Bloom, A. A. and Williams, M.: Constraining ecosystem carbon dynamics in a data-limited world: integrating ecological “common sense” in a model–data fusion framework, Biogeosciences, 12, 1299–1315,, 2015. 

Bloom, A. A., Exbrayat, J.-F., van der Velde, I. R., Feng, L., and Williams, M.: The decadal state of the terrestrial carbon cycle: Global retrievals of terrestrial carbon allocation, pools, and residence times, P. Natl. Acad. Sci. USA, 113, 1285–1290,, 2016. 

Bloom, A. A., Bowman, K. W., Liu, J., Konings, A. G., Worden, J. R., Parazoo, N. C., Meyer, V., Reager, J. T., Worden, H. M., Jiang, Z., Quetin, G. R., Smallman, T. L., Exbrayat, J.-F., Yin, Y., Saatchi, S. S., Williams, M., and Schimel, D. S.: Lagged effects regulate the inter-annual variability of the tropical carbon balance, Biogeosciences, 17, 6393–6422,, 2020. 

Bonan, G. B.: Importance of leaf area index and forest type when estimating photosynthesis in boreal forests, Remote Sens. Environ., 43, 303–314, 1993. 

Bonan, G. B. (Ed.): Terrestrial Biosphere Models, in: Climate Change and Terrestrial Ecosystem Modeling, Cambridge University Press, Cambridge, UK, 1–24,, 2019. 

Bonan, G. B. and Doney, S. C.: Climate, ecosystems, and planetary futures: The challenge to predict life in Earth system models, Science, 359, eaam8328,, 2018. 

Bonan, G. B., Williams, M., Fisher, R. A., and Oleson, K. W.: Modeling stomatal conductance in the earth system: linking leaf water-use efficiency and water transport along the soil–plant–atmosphere continuum, Geosci. Model Dev., 7, 2193–2222,, 2014. 

Butler, E. E., Datta, A., Flores-Moreno, H., Chen, M., Wythers, K. R., Fazayeli, F., Banerjee, A., Atkin, O. K., Kattge, J., Amiaud, B., Blonder, B., Boenisch, G., Bond-Lamberty, B., Brown, K. A., Byun, C., Campetella, G., Cerabolini, B. E. L., Cornelissen, J. H. C., Craine, J. M., Craven, D., de Vries, F. T., Díaz, S., Domingues, T. F., Forey, E., González-Melo, A., Gross, N., Han, W., Hattingh, W. N., Hickler, T., Jansen, S., Kramer, K., Kraft, N. J. B., Kurokawa, H., Laughlin, D. C., Meir, P., Minden, V., Niinemets, Ü., Onoda, Y., Peñuelas, J., Read, Q., Sack, L., Schamp, B., Soudzilovskaia, N. A., Spasojevic, M. J., Sosinski, E., Thornton, P. E., Valladares, F., van Bodegom, P. M., Williams, M., Wirth, C., and Reich, P. B.: Mapping local and global variability in plant trait distributions, P. Natl. Acad. Sci. USA, 114, 10937–10946,, 2017. 

Caprice, A. (Ed.): The Ultimate Quotable Einstein, Princeton University Press, Princeton, NJ, 608 pp., 2013. 

Collalti, A. and Prentice, I. C.: Is NPP proportional to GPP? Waring's hypothesis 20 years on, Tree Physiol., 39, 1473–1483,, 2019. 

Collalti, A., Ibrom, A., Stockmarr, A., Cescatti, A., Alkama, R., Fernández-Martínez, M., Matteucci, G., Sitch, S., Friedlingstein, P., Ciais, P., Goll, D. S., Nabel, J. E. M. S., Pongratz, J., Arneth, A., Haverd, V., and Prentice, I. C.: Forest production efficiency increases with growth temperature, Nat. Commun., 11, 5322,, 2020. 

Dietze, M. C., Fox, A., Beck-Johnson, L. M., Betancourt, J. L., Hooten, M. B., Jarnevich, C. S., Keitt, T. H., Kenney, M. A., Laney, C. M., Larsen, L. G., Loescher, H. W., Lunch, C. K., Pijanowski, B. C., Randerson, J. T., Read, E. K., Tredennick, A. T., Vargas, R., Weathers, K. C., and White, E. P.: Iterative near-term ecological forecasting: Needs, opportunities, and challenges, P. Natl. Acad. Sci. USA, 115, 1424–1432,, 2018. 

Exbrayat, J.-F., Smallman, T. L., Bloom, A. A., Hutley, L. B., and Williams, M.: Inverse Determination of the Influence of Fire on Vegetation Carbon Turnover in the Pantropics, Global Biogeochem. Cy., 32, 1776–1789,, 2018. 

Exbrayat, J.-F., Bloom, A. A., Carvalhais, N., Fischer, R., Huth, A., MacBean, N., and Williams, M.: Understanding the Land Carbon Cycle with Space Data: Current Status and Prospects, Surv. Geophys., 40, 735–755,, 2019. 

Famiglietti, C.: NEE and LAI prediction metrics for DALEC model suite (COMPLEX experiment), (last access: 23 April 2021) [Dataset], 2020. 

Famiglietti, C.: COMPLEX Analysis Code, [Dataset], last access: 23 April 2021. 

Fang, H., Jiang, C., Li, W., Wei, S., Baret, F., Chen, J. M., Garcia-Haro, J., Liang, S., Liu, R., Myneni, R. B., Pinty, B., Xiao, Z., and Zhu, Z.: Characterization and intercomparison of global moderate resolution leaf area index (LAI) products: Analysis of climatologies and theoretical uncertainties, J. Geophys. Res.-Biogeo., 118, 529–548,, 2013. 

Feng, X.: Marching in step: The importance of matching model complexity to data availability in terrestrial biosphere models, Global Change Biol., 26, 3190–3192,, 2020. 

Fer, I., Kelly, R., Moorcroft, P. R., Richardson, A. D., Cowdery, E. M., and Dietze, M. C.: Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation, Biogeosciences, 15, 5801–5830,, 2018. 

Fisher, J. B., Huntzinger, D. N., Schwalm, C. R., and Sitch, S.: Modeling the Terrestrial Biosphere, Annu. Rev. Env. Resour., 39, 91–123,, 2014. 

Fisher, R. A. and Koven, C. D.: Perspectives on the future of Land Surface Models and the challenges of representing complex terrestrial systems, J. Adv. Model. Earth Sy., 12, e2018MS001453,, 2020. 

Fisher, R. A., Koven, C. D., Anderegg, W. R. L., Christoffersen, B. O., Dietze, M. C., Farrior, C. E., Holm, J. A., Hurtt, G. C., Knox, R. G., Lawrence, P. J., Lichstein, J. W., Longo, M., Matheny, A. M., Medvigy, D., Muller-Landau, H. C., Powell, T. L., Serbin, S. P., Sato, H., Shuman, J. K., Smith, B., Trugman, A. T., Viskari, T., Verbeeck, H., Weng, E., Xu, C., Xu, X., Zhang, T., and Moorcroft, P. R.: Vegetation demographics in Earth System Models: A review of progress and priorities, Global Change Biol., 24, 35–54,, 2018. 

Fisher, R. A., Wieder, W. R., Sanderson, B. M., Koven, C. D., Oleson, K. W., Xu, C., Fisher, J. B., Shi, M., Walker, A. P., and Lawrence, D. M.: Parametric Controls on Vegetation Responses to Biogeochemical Forcing in the CLM5, J. Adv. Model. Earth Sy., 11, 2879–2895,, 2019. 

Flack-Prain, S., Meir, P., Malhi, Y., Smallman, T. L., and Williams, M.: Does economic optimisation explain LAI and leaf trait distributions across an Amazon soil moisture gradient?, Global Change Biol., 27, 587–605,, 2021. 

Forkel, M., Migliavacca, M., Thonicke, K., Reichstein, M., Schaphoff, S., Weber, U., and Carvalhais, N.: Codominant water control on global interannual variability and trends in land surface phenology and greenness, Global Change Biol., 21, 3414–3435,, 2015. 

Fox, A., Williams, M., Richardson, A. D., Cameron, D., Gove, J. H., Quaife, T., Ricciuto, D., Reichstein, M., Tomelleri, E., and Trudinger, C. M.: The REFLEX project: Comparing different algorithms and implementations for the inversion of a terrestrial ecosystem model against eddy covariance data, Agr. Forest Meteorol., 149, 1597–1615, 2009. 

Friedlingstein, P., Meinshausen, M., Arora, V. K., Jones, C. D., Anav, A., Liddicoat, S. K., and Knutti, R.: Uncertainties in CMIP5 Climate Projections due to Carbon Cycle Feedbacks, J. Climate, 27, 511–526,, 2013. 

Fuster, B., Sánchez-Zapero, J., Camacho, F., García-Santos, V., Verger, A., Lacaze, R., Weiss, M., Baret, F., and Smets, B.: Quality Assessment of PROBA-V LAI, fAPAR and fCOVER Collection 300 m Products of Copernicus Global Land Service, Remote Sens., 12, 1017,, 2020. 

Fyllas, N. M., Gloor, E., Mercado, L. M., Sitch, S., Quesada, C. A., Domingues, T. F., Galbraith, D. R., Torre-Lezama, A., Vilanova, E., Ramírez-Angulo, H., Higuchi, N., Neill, D. A., Silveira, M., Ferreira, L., Aymard C., G. A., Malhi, Y., Phillips, O. L., and Lloyd, J.: Analysing Amazonian forest productivity using a new individual and trait-based model (TFS v.1), Geosci. Model Dev., 7, 1251–1269,, 2014. 

Gaudinski, J. B., Trumbore, S. E., Davidson, E. A., and Zheng, S.: Soil carbon cycling in a temperate forest: radiocarbon-based estimates of residence times, sequestration rates and partitioning of fluxes, Biogeochemistry, 51, 33–69,, 2000. 

Ginzburg, L. R. and Jensen, C. X. J.: Rules of thumb for judging ecological theories, Trends Ecol. Evol., 19, 121–126, 2004. 

Haario, H., Saksman, E., and Tamminen, J.: An adaptive Metropolis algorithm, Bernoulli, 7, 223–242,, 2001. 

Hawkins, D. M.: The problem of overfitting, J. Chem. Inf. Comp. Sci., 44, 1–12, 2004. 

Heimann, M. and Reichstein, M.: Terrestrial ecosystem carbon dynamics and climate feedbacks, Nature, 451, 289–292,, 2008. 

Hill, T. C., Ryan, E., and Williams, M.: The use of CO2 flux time series for parameter and carbon stock estimation in carbon cycle research, Global Change Biol., 18, 179–193,, 2012. 

Huntzinger, D. N., Schwalm, C., Michalak, A. M., Schaefer, K., King, A. W., Wei, Y., Jacobson, A., Liu, S., Cook, R. B., Post, W. M., Berthier, G., Hayes, D., Huang, M., Ito, A., Lei, H., Lu, C., Mao, J., Peng, C. H., Peng, S., Poulter, B., Riccuito, D., Shi, X., Tian, H., Wang, W., Zeng, N., Zhao, F., and Zhu, Q.: The North American Carbon Program Multi-Scale Synthesis and Terrestrial Model Intercomparison Project – Part 1: Overview and experimental design, Geosci. Model Dev., 6, 2121–2133,, 2013. 

Jia, W., Zhang, H., He, X., and Wu, Q.: Gaussian Weighted Histogram Intersection for License Plate Classification, in: 18th International Conference on Pattern Recognition (ICPR'06), 20–24 August 2006, Hong Kong, China, 574–577,, 2006. 

Jiang, C., Ryu, Y., Wang, H., and Keenan, T. F.: An optimality-based model explains seasonal variation in C3 plant photosynthetic capacity, Global Change Biol., 26, 6493–6510,, 2020. 

Jolly, W. M., Graham, J. M., Michaelis, A., Nemani, R., and Running, S. W.: A flexible, integrated system for generating meteorological surfaces derived from point sources across multiple geographic scales, Environ. Modell. Softw., 20, 873–882,, 2005. 

Kattge, J., Bönisch, G., Díaz, S., et al.: TRY plant trait database – enhanced coverage and open access, Global Change Biol., 26, 119–188,, 2020. 

Keenan, T. F., Carbone, M. S., Reichstein, M., and Richardson, A. D.: The model-data fusion pitfall: assuming certainty in an uncertain world, Oecologia, 167, 587,, 2011. 

Keenan, T. F., Davidson, E. A., Munger, J. W., and Richardson, A. D.: Rate my data: quantifying the value of ecological data for the development of models of the terrestrial carbon cycle, Ecol. Appl., 23, 273–286,, 2013. 

Kennedy, D., Swenson, S., Oleson, K. W., Lawrence, D. M., Fisher, R., Lola da Costa, A. C., and Gentine, P.: Implementing Plant Hydraulics in the Community Land Model, Version 5, J. Adv. Model. Earth Sy., 11, 485–513,, 2019. 

Konings, A. G., Bloom, A. A., Liu, J., Parazoo, N. C., Schimel, D. S., and Bowman, K. W.: Global satellite-driven estimates of heterotrophic respiration, Biogeosciences, 16, 2269–2284,, 2019. 

Lawrence, D. M., Oleson, K. W., Flanner, M. G., Thornton, P. E., Swenson, S. C., Lawrence, P. J., Zeng, X., Yang, Z.-L., Levis, S., Sakaguchi, K., Bonan, G. B., and Slater, A. G.: Parameterization improvements and functional and structural advances in Version 4 of the Community Land Model, J. Adv. Model. Earth Sy., 3, M03001,, 2011. 

LeBauer, D. S., Wang, D., Richter, K. T., Davidson, C. C., and Dietze, M. C.: Facilitating feedbacks between field measurements and ecosystem models, Ecol. Monogr., 83, 133–154,, 2013. 

Lever, J., Krzywinski, M., and Altman, N.: Model selection and overfitting, Nat. Methods, 13, 703–704,, 2016. 

López-Blanco, E., Exbrayat, J.-F., Lund, M., Christensen, T. R., Tamstorf, M. P., Slevin, D., Hugelius, G., Bloom, A. A., and Williams, M.: Evaluation of terrestrial pan-Arctic carbon cycling using a data-assimilation system, Earth Syst. Dynam., 10, 233–255,, 2019. 

Lovenduski, N. S. and Bonan, G. B.: Reducing uncertainty in projections of terrestrial carbon uptake, Environ. Res. Lett., 12, 44020,, 2017. 

Luo, Y., Keenan, T. F., and Smith, M.: Predictability of the terrestrial carbon cycle, Global Change Biol., 21, 1737–1751,, 2015. 

MacBean, N., Peylin, P., Chevallier, F., Scholze, M., and Schürmann, G.: Consistent assimilation of multiple data streams in a carbon cycle data assimilation system, Geosci. Model Dev., 9, 3569–3588,, 2016. 

MacBean, N., Maignan, F., Bacour, C., Lewis, P., Peylin, P., Guanter, L., Köhler, P., Gómez-Dans, J., and Disney, M.: Strong constraint on modelled global carbon uptake using solar-induced chlorophyll fluorescence data, Sci. Rep.-UK, 8, 1973,, 2018. 

Maji, S., Berg, A. C., and Malik, J.: Classification using intersection kernel support vector machines is efficient, in: 2008 IEEE Conference on Computer Vision and Pattern Recognition, 23–28 June 2008, Anchorage, AK, USA, 1–8,, 2008. 

Munger, W. and Wofsy, S.: Biomass Inventories at Harvard Forest EMS Tower since 1993 (version 33), Environmental Data Initiative,, 2020a. 

Munger, W. and Wofsy, S.: Canopy-Atmosphere Exchange of Carbon, Water and Energy at Harvard Forest EMS Tower since 1991 (version 31),Environmental Data Initiative,, 2020b. 

Norton, A. J., Rayner, P. J., Koffi, E. N., Scholze, M., Silver, J. D., and Wang, Y.-P.: Estimating global gross primary productivity using chlorophyll fluorescence and a data assimilation system with the BETHY-SCOPE model, Biogeosciences, 16, 3069–3093,, 2019. 

Oleson, K. W., Lawrence, D. M., Bonan, G. B., Flanner, M. G., Kluzek, E., Lawrence, P. J., Levis, S., Swenson, S. C., Thornton, P. E., and Dai, A.: Technical description of version 4.5 of the Community Land Model (CLM), NCAR Tech., Notes (NCAR/TN-478+ STR),, 2010. 

Pastorello, G., Trotta, C., Canfora, E., et al.: The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data, Sci. Data, 7, 225,, 2020. 

Pavlick, R., Drewry, D. T., Bohn, K., Reu, B., and Kleidon, A.: The Jena Diversity-Dynamic Global Vegetation Model (JeDi-DGVM): a diverse approach to representing terrestrial biogeography and biogeochemistry based on plant functional trade-offs, Biogeosciences, 10, 4137–4177,, 2013. 

Peaucelle, M., Bacour, C., Ciais, P., Vuichard, N., Kuppel, S., Peñuelas, J., Belelli Marchesini, L., Blanken, P. D., Buchmann, N., Chen, J., Delpierre, N., Desai, A. R., Dufrene, E., Gianelle, D., Gimeno-Colera, C., Gruening, C., Helfter, C., Hörtnagl, L., Ibrom, A., Joffre, R., Kato, T., Kolb, T. E., Law, B., Lindroth, A., Mammarella, I., Merbold, L., Minerbi, S., Montagnani, L., Šigut, L., Sutton, M., Varlagin, A., Vesala, T., Wohlfahrt, G., Wolf, S., Yakir, D., and Viovy, N.: Covariations between plant functional traits emerge from constraining parameterization of a terrestrial biosphere model, Global Ecol. Biogeogr., 28, 1351–1365,, 2019. 

Peylin, P., Bacour, C., MacBean, N., Leonard, S., Rayner, P., Kuppel, S., Koffi, E., Kane, A., Maignan, F., Chevallier, F., Ciais, P., and Prunet, P.: A new stepwise carbon cycle data assimilation system using multiple data streams to constrain the simulated land surface carbon cycle, Geosci. Model Dev., 9, 3321–3346,, 2016. 

Prentice, I. C., Liang, X., Medlyn, B. E., and Wang, Y.-P.: Reliable, robust and realistic: the three R's of next-generation land-surface modelling, Atmos. Chem. Phys., 15, 5987–6005,, 2015. 

Quetin, G. R., Bloom, A. A., Bowman, K. W., and Konings, A. G.: Carbon Flux Variability From a Relatively Simple Ecosystem Model With Assimilated Data Is Consistent With Terrestrial Biosphere Model Estimates, J. Adv. Model. Earth Sy., 12, e2019MS001889,, 2020. 

Rambal, S., Joffre, R., Ourcival, J. M., Cavender-Bares, J., and Rocheteau, A.: The growth respiration component in eddy CO2 flux from a Quercus ilex mediterranean forest, Global Change Biol., 10, 1460–1469,, 2004. 

Raoult, N. M., Jupp, T. E., Cox, P. M., and Luke, C. M.: Land-surface parameter optimisation using data assimilation techniques: the adJULES system V1.0, Geosci. Model Dev., 9, 2833–2852,, 2016. 

Rayner, P. J., Scholze, M., Knorr, W., Kaminski, T., Giering, R., and Widmann, H.: Two decades of terrestrial carbon fluxes from a carbon cycle data assimilation system (CCDAS), Global Biogeochem. Cy., 19, GB2026,, 2005. 

Reich, P. B., Tjoelker, M. G., Pregitzer, K. S., Wright, I. J., Oleksyn, J., and Machado, J.-L.: Scaling of respiration to nitrogen in leaves, stems and roots of higher land plants, Ecol. Lett., 11, 793–801,, 2008. 

Ryan, M. G.: Effects of Climate Change on Plant Respiration, Ecol. Appl., 1, 157–167,, 1991. 

Sakschewski, B., von Bloh, W., Boit, A., Rammig, A., Kattge, J., Poorter, L., Peñuelas, J., and Thonicke, K.: Leaf and stem economics spectra drive diversity of functional plant traits in a dynamic global vegetation model, Global Change Biol., 21, 2711–2725,, 2015. 

Sandel, B., Gutiérrez, A. G., Reich, P. B., Schrodt, F., Dickie, J., and Kattge, J.: Estimating themissing species bias in plant trait measurements, J. Veg. Sci., 26, 828–838,, 2015. 

Scheiter, S., Langan, L., and Higgins, S. I.: Next-generation dynamic global vegetation models: learning from community ecology, New Phytol., 198, 957–969,, 2013. 

Schimel, D. S., Pavlick, R., Fisher, J. B., Asner, G. P., Saatchi, S. S., Townsend, P., Miller, C., Frankenberg, C., Hibbard, K., and Cox, P.: Observing terrestrial ecosystems and the carbon cycle from space, Global Change Biol., 21, 1762,, 2015. 

Scholze, M., Buchwitz, M., Dorigo, W., Guanter, L., and Quegan, S.: Reviews and syntheses: Systematic Earth observations for use in terrestrial carbon cycle data assimilation systems, Biogeosciences, 14, 3401–3429,, 2017. 

Schürmann, G. J., Kaminski, T., Köstler, C., Carvalhais, N., Voßbeck, M., Kattge, J., Giering, R., Rödenbeck, C., Heimann, M., and Zaehle, S.: Constraining a land-surface model with multiple observations by application of the MPI-Carbon Cycle Data Assimilation System V1.0, Geosci. Model Dev., 9, 2999–3026,, 2016. 

Schwalm, C. R., Schaefer, K., Fisher, J. B., Huntzinger, D., Elshorbany, Y., Fang, Y., Hayes, D., Jafarov, E., Michalak, A. M., Piper, M., Stofferahn, E., Wang, K., and Wei, Y.: Divergence in land surface modeling: linking spread to structure, Environ. Res. Commun., 1, 111004,, 2019. 

Schwalm, C. R., Huntzinger, D. N., Michalak, A. M., Schaefer, K., Fisher, J. B., Fang, Y., and Wei, Y.: Modeling suggests fossil fuel emissions have been driving increased land carbon uptake since the turn of the 20th Century, Sci. Rep.-UK, 10, 9059,, 2020. 

Sellers, P. J., Berry, J. A., Collatz, G. J., Field, C. B., and Hall, F. G.: Canopy reflectance, photosynthesis, and transpiration – III. A reanalysis using improved leaf models and a new canopy integration scheme, Remote Sens. Environ., 42, 187–216,, 1992. 

Shi, Z., Crowell, S., Luo, Y., and Moore, B.: Model structures amplify uncertainty in predicted soil carbon responses to climate change, Nat. Commun., 9, 2171,, 2018. 

Shiklomanov, A. N., Bond-Lamberty, B., Atkins, J. W., and Gough, C. M.: Structure and parameter uncertainty in centennial projections of forest community structure and carbon cycling, Global Change Biol., 26, 6080–6096,, 2020. 

Smallman, T. L. and Williams, M.: Description and validation of an intermediate complexity model for ecosystem photosynthesis and evapotranspiration: ACM-GPP-ETv1, Geosci. Model Dev., 12, 2227–2253,, 2019. 

Smallman, T. L., Moncrieff, J. B., and Williams, M.: WRFv3.2-SPAv2: development and validation of a coupled ecosystem–atmosphere model, scaling from surface fluxes of CO2 and energy to atmospheric profiles, Geosci. Model Dev., 6, 1079–1093,, 2013. 

Smallman, T. L., Exbrayat, J.-F., Mencuccini, M., Bloom, A. A., and Williams, M.: Assimilation of repeated woody biomass observations constrains decadal ecosystem carbon cycle uncertainty in aggrading forests, J. Geophys. Res.-Biogeo., 122, 528–545,, 2017. 

Smith, N. G., Keenan, T. F., Colin Prentice, I., Wang, H., Wright, I. J., Niinemets, Ü., Crous, K. Y., Domingues, T. F., Guerrieri, R., Yoko Ishida, F., Kattge, J., Kruger, E. L., Maire, V., Rogers, A., Serbin, S. P., Tarvainen, L., Togashi, H. F., Townsend, P. A., Wang, M., Weerasinghe, L. K., and Zhou, S.-X.: Global photosynthetic capacity is optimized to the environment, Ecol. Lett., 22, 506–517,, 2019. 

Stocker, T. F., Qin, D., Plattner, G.-K., Tignor, M. M. B., Allen, S. K., Boschung, J., Nauels, A., Xia, Y., Bex, V., and Midgley, P. M.: Climate Change 2013: The physical science basis, in: Contribution of Working Group I to the fifth assessment report of IPCC the intergovernmental panel on climate change, Cambridge University Press, Cambridge, UK and New York, NY, 1535 pp.,, 2014. 

Suni, T., Rinne, J., Reissell, A., Altimir, N., Keronen, P., Rannik, Ü., Maso, M. D., Kulmala, M., and Vesala, T.: Long-term measurements of surface fluxes above a Scots pine forest in Hyytiälä, southern Finland, 1996–2001, Boreal Environ. Res., 8, 287–301, 2003. 

Thomas, R. Q. and Williams, M.: A model using marginal efficiency of investment to analyze carbon and nitrogen interactions in terrestrial ecosystems (ACONITE Version 1), Geosci. Model Dev., 7, 2015–2037,, 2014. 

Thomas, R. Q., Jersild, A. L., Brooks, E. B., Thomas, V. A., and Wynne, R. H.: A mid-century ecological forecast with partitioned uncertainty predicts increases in loblolly pine forest productivity, Ecol. Appl., 28, 1503–1519,, 2018. 

Thomas, R. Q., Williams, M., Cavaleri, M. A., Exbrayat, J.-F., Smallman, T. L., and Street, L. E.: Alternate Trait-Based Leaf Respiration Schemes Evaluated at Ecosystem-Scale Through Carbon Optimization Modeling and Canopy Property Data, J. Adv. Model. Earth Sy., 11, 4629–4644,, 2019. 

van Bodegom, P. M., Douma, J. C., Witte, J. P. M., Ordoñez, J. C., Bartholomeus, R. P., and Aerts, R.: Going beyond limitations of plant functional types when predicting global ecosystem–atmosphere fluxes: exploring the merits of traits-based approaches, Global Ecol. Biogeogr., 21, 625–636,, 2012. 

van Bodegom, P. M., Douma, J. C., and Verheijen, L. M.: A fully traits-based approach to modeling global vegetation distribution, P. Natl. Acad. Sci. USA, 111, 13733–13738,, 2014. 

Verger, A., Baret, F., and Weiss, M.: Near Real-Time Vegetation Monitoring at Global Scale, IEEE J. Sel. Top. Appl., 7, 3473–3481,, 2014. 

Verheijen, L. M., Brovkin, V., Aerts, R., Bönisch, G., Cornelissen, J. H. C., Kattge, J., Reich, P. B., Wright, I. J., and van Bodegom, P. M.: Impacts of trait variation through observed trait–climate relationships on performance of an Earth system model: a conceptual analysis, Biogeosciences, 10, 5497–5515,, 2013. 

Walker, A. P., Quaife, T., van Bodegom, P. M., De Kauwe, M. G., Keenan, T. F., Joiner, J., Lomas, M. R., MacBean, N., Xu, C., Yang, X., and Woodward, F. I.: The impact of alternative trait-scaling hypotheses for the maximum photosynthetic carboxylation rate (Vcmax) on global gross primary production, New Phytol., 215, 1370–1386,, 2017. 

Wang, H., Atkin, O. K., Keenan, T. F., Smith, N. G., Wright, I. J., Bloomfield, K. J., Kattge, J., Reich, P. B., and Prentice, I. C.: Acclimation of leaf respiration consistent with optimal photosynthetic capacity, Global Change Biol., 26, 2573–2583,, 2020. 

Waring, R. H. and Schlesinger, W. H.: Forest ecosystems, Concepts and Management, Academic Press, Orlando, Florida, USA, 340 pp., 1985. 

Waring, R. H., Landsberg, J. J., and Williams, M.: Net primary production of forests: a constant fraction of gross primary production?, Tree Physiol., 18, 129–134,, 1998. 

White, E. P., Yenni, G. M., Taylor, S. D., Christensen, E. M., Bledsoe, E. K., Simonis, J. L., and Ernest, S. K. M.: Developing an automated iterative near-term forecasting system for an ecological study, Methods Ecol. Evol., 10, 332–344,, 2019. 

Williams, M., Rastetter, E. B., Fernandes, D. N., Goulden, M. L., Wofsy, S. C., Shaver, G. R., Melillo, J. M., Munger, J. W., Fan, S.-M., and Nadelhoffer, K. J.: Modelling the soil-plant-atmosphere continuum in a Quercus–Acer stand at Harvard Forest: the regulation of stomatal conductance by light, nitrogen and soil/plant hydraulic properties, Plant Cell Environ., 19, 911–927,, 1996. 

Williams, M., Rastetter, E. B., Fernandes, D. N., Goulden, M. L., Shaver, G. R., and Johnson, L. C.: Predicting gross primary productivity in terrestrial ecosystems, Ecol. Appl., 7, 882–894,[0882:PGPPIT]2.0.CO;2, 1997.  

Williams, M., Law, B. E., Anthoni, P. M., and Unsworth, M. H.: Use of a simulation model and ecosystem flux data to examine carbon–water interactions in ponderosa pine, Tree Physiol., 21, 287–298,, 2001. 

Williams, M., Schwarz, P. A., Law, B. E., Irvine, J., and Kurpius, M. R.: An improved analysis of forest carbon dynamics using data assimilation, Global Change Biol., 11, 89–105,, 2005. 

Wu, G., Cai, X., Keenan, T. F., Li, S., Luo, X., Fisher, J. B., Cao, R., Li, F., Purdy, A. J., Zhao, W., Sun, X., and Hu, Z.: Evaluating three evapotranspiration estimates from model of different complexity over China using the ILAMB benchmarking system, J. Hydrol., 125553,, 2020a. 

Wu, G., Hu, Z., Keenan, T. F., Li, S., Zhao, W., Cao, R. C., Li, Y., Guo, Q., and Sun, X.: Incorporating spatial variations in parameters for improvements of an evapotranspiration model, J. Geophys. Res.-Biogeo., 125, e2019JG005504,, 2020b. 

Yin, Y., Bloom, A. A., Worden, J., Saatchi, S., Yang, Y., Williams, M., Liu, J., Jiang, Z., Worden, H., Bowman, K., Frankenberg, C., and Schimel, D.: Fire decline in dry tropical ecosystems enhances decadal land carbon sink, Nat. Commun., 11, 1900,, 2020. 

Zhao, Y., Chen, X., Smallman, T. L., Flack-Prain, S., Milodowski, D. T., and Williams, M.: Characterizing the Error and Bias of Remotely Sensed LAI Products: An Example for Tropical and Subtropical Evergreen Forests in South China, 12, 3122,, 2020. 

Short summary
Model uncertainty dominates the spread in terrestrial carbon cycle predictions. Efforts to reduce it typically involve adding processes, thereby increasing model complexity. However, if and how model performance scales with complexity is unclear. Using a suite of 16 structurally distinct carbon cycle models, we find that increased complexity only improves skill if parameters are adequately informed. Otherwise, it can degrade skill, and an intermediate-complexity model is optimal.
Final-revised paper