The terrestrial carbon cycle plays a critical role in modulating the interactions of climate with the Earth system, but different models often make vastly different predictions of its behavior. Efforts to reduce model uncertainty have commonly focused on model structure, in particular by introducing additional processes and increasing structural complexity. However, the extent to which increased structural complexity can directly improve predictive skill is unclear. While adding processes may improve realism, the resulting models are often encumbered by a greater number of poorly determined or over-generalized parameters. To guide efficient model development, here we map the theoretical relationship between model complexity and predictive skill. To do so, we developed 16 structurally distinct carbon cycle models spanning an axis of complexity and incorporated them into a model–data fusion system. We calibrated each model at six globally distributed eddy covariance sites with long observation time series and under 42 data scenarios that resulted in different degrees of parameter uncertainty. For each combination of site, data scenario, and model, we then predicted net ecosystem exchange (NEE) and leaf area index (LAI) for validation against independent local site data. Though the maximum model complexity we evaluated is lower than that of most traditional terrestrial biosphere models, the complexity range we explored provides universal insight into the inter-relationship between structural uncertainty, parametric uncertainty, and model forecast skill. Specifically, increased complexity only improves forecast skill if parameters are adequately informed (e.g., when NEE observations are used for calibration). Otherwise, increased complexity can degrade skill and an intermediate-complexity model is optimal. This finding remains consistent regardless of whether NEE or LAI is predicted.
Our COMPLexity EXperiment (COMPLEX) highlights the importance of robust observation-based parameterization for land surface modeling and suggests that data characterizing net carbon fluxes will be key to improving decadal predictions of high-dimensional terrestrial biosphere models.
The role of the terrestrial biosphere in the global carbon cycle is
challenging to model
(Friedlingstein et al., 2013) due to the diverse processes, forcings, and feedbacks driving variability
of gross fluxes (Heimann and Reichstein, 2008; Luo et al., 2015). Many attempts to reduce model uncertainty have focused on
matching models to nature by representing an increasing number of processes
known to influence different parts of the carbon cycle (e.g., vegetation
demography, Fisher et al., 2018, or plant hydraulics, Kennedy et al., 2019). In this way, models of the terrestrial
biosphere have become more complex over time
(Fisher et al., 2014; Bonan, 2019; Fisher and Koven, 2020). Despite such advancements, the spread in terrestrial carbon cycle
predictions remains large (Arora et al., 2020) and is dominated more by model uncertainty
than by either internal variability of the climate system or emission
scenario uncertainty (Lovenduski and Bonan, 2017; Bonan and Doney, 2018). Because the behavior of the terrestrial biosphere
feeds back directly on the rate of CO2 accumulation in the atmosphere, reducing this uncertainty is a key priority for the carbon cycle modeling community.
While ecological models are becoming more and more detailed, the extent to which predictive skill scales with model complexity is not clear. The logic behind enhancing model realism with increased complexity is intuitive: a highly simplistic model may be structurally unable to capture key relationships defining the system (it underfits), which would naturally imply that greater detail is needed to improve model performance. However, excessively complex models have their own limitations. Because they often contain more parameters than can be robustly determined with the available data (e.g., Prentice et al., 2015; Shi et al., 2018; Feng, 2020), they are prone to learning “noise” instead of true interactions (also called overfitting; Ginzburg and Jensen, 2004; Hawkins, 2004; Keenan et al., 2013). Equifinality – the case in which vastly different parameter sets can yield similar model performance (Beven, 1993; Beven and Freer, 2001) – also becomes more likely as model complexity increases. This dichotomy between model complexity and model performance is known in the statistics and machine learning communities as the bias–variance tradeoff. According to this theory, a model that balances the costs of under- and overfitting can minimize forecast error (Lever et al., 2016). It is therefore possible that other approaches to reducing carbon cycle model uncertainty (e.g., improving model parameterization) may be more effective than increasing structural realism in some circumstances, as also noted by Shiklomanov et al. (2020) and Wu et al. (2020a).
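The bias–variance tradeoff described above can be illustrated with a toy example outside the carbon cycle context: fitting polynomials of increasing degree to noisy samples of a smooth signal. This is an illustrative sketch only; the signal, noise level, and polynomial degrees are arbitrary choices, not quantities from this study.

```python
import numpy as np

rng = np.random.default_rng(42)

def truth(x):
    return np.sin(3 * x)  # smooth underlying signal

# Small noisy training sample; larger independent validation sample
x_tr = rng.uniform(-1, 1, 20)
y_tr = truth(x_tr) + rng.normal(0, 0.4, x_tr.size)
x_te = rng.uniform(-1, 1, 500)
y_te = truth(x_te) + rng.normal(0, 0.4, x_te.size)

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

results = {}
for degree in (1, 4, 15):  # underfit, balanced, overfit
    coeffs = np.polyfit(x_tr, y_tr, degree)
    results[degree] = (rmse(np.polyval(coeffs, x_tr), y_tr),
                       rmse(np.polyval(coeffs, x_te), y_te))

# Training error always shrinks with added flexibility; validation error does not
for degree, (e_tr, e_te) in results.items():
    print(f"degree {degree:2d}: train RMSE {e_tr:.2f}, test RMSE {e_te:.2f}")
```

The degree-1 model is structurally unable to track the signal (high bias), while the degree-15 model chases the noise in its 20 training points (high variance); an intermediate degree balances the two, mirroring the under- and overfitting regimes discussed above.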
Here, we explicitly map the relationship between model complexity and predictive performance across a spectrum of model structures and parameterizations, hypothesizing that an intermediate-complexity carbon cycle model can outperform a low- or high-complexity one. Our approach can inform ecological models that operate on a spectrum of scales, from localized at the level of individual stands to highly generalizable across the global land surface. This study is particularly relevant for global ecological models, which often function as the land surface component of large-scale Earth system models and have been employed in contexts that carry significant policy relevance (e.g., Intergovernmental Panel on Climate Change (IPCC) reports; Stocker et al., 2014). Hereafter we refer to global ecological models as terrestrial biosphere models, or TBMs.
We note a distinction between conceptualizing complexity as a straightforward count of a model's parameters, equations, or processes, versus as an emergent property of its solution space. When locations or data constraints do not allow certain model parameter values or modeled states, this reduces the effective complexity of the remaining set of possible solutions. That is, one can consider what we term the “effective complexity” of a model as a function of the actual parameter combinations that are possible for that model, or equivalently, the volume of space occupied by these parameter combinations. Two models with the same number of parameters may have very different effective complexities, for example because correlations between parameters (e.g., allocation fraction to foliage and turnover rate of foliage; Fox et al., 2009), the extent to which the parameters are constrained (i.e., many more states are possible in the absence of assimilated data than in the presence of it; Keenan et al., 2013), and the uncertainty of the assimilated data can all influence a model's effective degrees of freedom. As a simple analogy, consider the difference between a sphere and a disc in three-dimensional space (Fig. 1). Although both exist within the space determined by three unconstrained parameters (axes), they are not identical because the volumes they occupy – and the relationships between their parameters – are drastically different. The same can be true between models: one model's equations or assimilated observations may constrain the dimensionality of its potential parameter space to “resemble” a disc, while that occupied by another, less constrained model may look more like a sphere.
Conceptual diagram of effective complexity in three-parameter
space. A sphere and a disc can both be defined within the same three parameter axes, yet the volumes they occupy – and hence their effective complexities – differ drastically.
Model–data fusion (MDF) systems (also known as data assimilation systems) provide an effective way of isolating and evaluating different model structures by using observations to derive optimized model parameters with uncertainty. An increasingly common tool for carbon cycle science, MDF has been leveraged to provide insight into long-term trends of carbon fluxes (e.g., Rayner et al., 2005), to reconcile the roles of specific datasets in constraining parametric uncertainty (e.g., Keenan et al., 2013), and more (Scholze et al., 2017). Here we use an MDF system called the CARbon DAta MOdel fraMework, or CARDAMOM (Bloom and Williams, 2015; Bloom et al., 2016), chosen because of its high customizability. The structure of its underlying ecosystem carbon model, DALEC (Williams et al., 2005; Bloom and Williams, 2015), can be easily adjusted to become more simple or detailed (e.g., by changing the number of carbon pools or by modifying the functional representations of certain carbon fluxes). Various combinations of observational and functional constraints can also be tested in the assimilation process, along with different assumptions on the amount of error inherent to each assimilated dataset (the characterization of which is an ongoing challenge for the modeling community; Keenan et al., 2011). Taken together, this flexibility allows for experimentation with the different levers that control effective model complexity.
In this paper, we demonstrate the extent to which the prediction accuracy of two key carbon cycle variables can theoretically scale with model complexity. Net ecosystem exchange (NEE) and leaf area index (LAI) were chosen for the analysis because they represent integrated effects of different parts of the carbon cycle (NEE is the balance of photosynthesis and ecosystem respiration fluxes, while LAI strongly controls canopy photosynthesis; Bonan, 1993). Additionally, both are commonly measured and modeled. To explore the complexity–skill relationship, we developed 16 structurally distinct carbon cycle models (i.e., variants of the DALEC model) spanning a range of complexity and calibrated them using the CARDAMOM framework. Several recent studies have demonstrated the utility of CARDAMOM for understanding multiple aspects of the carbon cycle (e.g., López-Blanco et al., 2019; Konings et al., 2019; Yin et al., 2020; Bloom et al., 2020; Quetin et al., 2020), lending confidence for its use here. We calibrated each DALEC variant within CARDAMOM under 42 different data scenarios (i.e., combinations of data constraints and assumptions about observational error) representing different degrees of certainty with which parameters are determined. Each model was calibrated and validated at six globally distributed eddy covariance sites covering a range of biomes and vegetation types, with data collected over multiple years. To quantify complexity, we computed the effective complexity of each model calibration using a principal component analysis (PCA) that reduced the parameter space to its primary axes of variance. Forecast skill was determined using an overlap metric that takes account of uncertainty both in the model forecast and the validation data. Though the range of complexity we evaluated here is lower than that populated by large-scale TBMs, this experiment reveals universal modeling elements that control performance. 
Specifically, here our COMPLexity EXperiment (COMPLEX) aims to answer the following questions. (a) What controls a given model run's effective complexity? (b) Under what conditions does increasing model complexity improve forecast skill?
The Data Assimilation Linked Ecosystem Carbon (DALEC) model suite includes 16 related intermediate-complexity models of the terrestrial carbon cycle. Each model variant tracks the state and dynamics of both live and dead carbon pools, their interactions, and responses to meteorology and disturbance such as fire or biomass removals. From an initial DALEC model (Williams et al., 2005), we produced alternate structures that either aimed to reduce complexity by focusing on core variables/processes and removing others or aimed to increase complexity by including hypothesized missing carbon pools or improving on over-simplified processes.
Accordingly, the DALEC suite spans a range of model structures (i.e., number of
carbon pools, carbon pool connectivity) and process representations
(component sub-models of varying complexity) related to different
simulations of photosynthesis, plant respiration, decomposition, and water
cycle feedbacks. These representations are listed in Table 1 and described in
further detail in Appendix A. To help disentangle the impacts of specific
alternate process representations, the different sub-models can be related
to a common baseline structure of the carbon cycle (Fig. 2a). Specific variants of
this general structure for the least and most detailed models in this
analysis are presented in Fig. 2b–c, while additional diagrams for the remaining
models are shown in Appendix B (Figs. B1–B7). Across models, carbon enters the system via gross
primary productivity (GPP), which is allocated to autotrophic respiration
(Ra) and to the live carbon pools (e.g., foliage, roots, and wood).
Summary of the DALEC sub-model combinations assessed in
COMPLEX. For a detailed description see the Supplement. ID is model
identifier. CDEA: Combined Deciduous Evergreen Analytical model.
Overview of the carbon pools (filled boxes) and fluxes
(arrows, with names in open boxes) represented in the DALEC model suite.
COMPLEX uses information from six globally distributed eddy
covariance sites participating in FLUXNET (Pastorello et al., 2020) (Table 2). Our site selection procedure
aimed to maximize biogeographical spread and diversity of natural ecosystems
while fulfilling specific data requirements. These constraints collectively
yielded a series of site selection criteria that are described in detail in
Appendix C. As an example, the sites must not be dominated by vegetation using the C4 photosynthetic pathway.
Summary of sites, showing their location, FLUXNET code,
observational time period, mean climate information, and ecosystem type.
Latitude is given in degrees north.
We used the CARDAMOM model–data fusion system
(Bloom and Williams, 2015; Bloom et al., 2016) to parameterize the DALEC model suite with available observations of the
carbon cycle. Specifically, we employed Bayesian inference to retrieve
time-invariant, site-specific, optimized parameters and initial conditions
for a given DALEC model. For each model, Bayes' theorem relates the posterior probability of a parameter set, given the observations, to the product of its prior probability and the likelihood of the observations given that parameter set. The likelihood was evaluated by comparing model outputs against the assimilated observations, weighted by their prescribed uncertainties. To sample the posterior distribution, CARDAMOM uses a Markov chain Monte Carlo (MCMC) approach (Bloom and Williams, 2015).
We performed a factorial experiment such that each of the 16 structurally
distinct carbon cycle models was run within CARDAMOM under all possible
combinations of sites, observational and functional constraints, and
assumptions on data uncertainties. These scenarios represented differing
degrees of certainty with which parameter distributions were determined.
Specifically, we considered (a) six sites; (b) six options for assimilated data,
including one for which no data were ingested into the model; (c) four options for
the magnitude of error assumed on the assimilated datasets (represented by
scalar multipliers on the prescribed nominal uncertainties); and (d) two options
for EDC state (either present or absent) (Table 3). In total, this factorial approach
yielded 4032 unique model runs (16 models × 6 sites × 42 data scenarios).
Model specifications varied in the factorial experiment. Each of the 16 model versions was run with every combination of scenarios across each variable. Note that observational error scalars were not applied when no data were assimilated into the model.
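The run count of the factorial design can be made concrete with a short enumeration sketch. The model, site, data, and scalar names below are placeholders (the actual identifiers and data combinations are those of Tables 1–3), but the counting logic follows the text: error scalars apply only when data are assimilated.

```python
from itertools import product

models = [f"model_{i}" for i in range(16)]          # placeholder model IDs
sites = [f"site_{i}" for i in range(6)]             # placeholder site codes
data_options = ["none"] + [f"data_{i}" for i in range(5)]  # 1 no-data + 5 combos
error_scalars = [f"scalar_{i}" for i in range(4)]   # placeholder multipliers
edc_states = ["EDCs_on", "EDCs_off"]

runs = []
for m, s, d, e in product(models, sites, data_options, edc_states):
    if d == "none":
        runs.append((m, s, d, None, e))             # no error scalar without data
    else:
        for k in error_scalars:
            runs.append((m, s, d, k, e))

# 16 x 6 x (1 + 5 x 4) x 2 = 4032 unique runs
print(len(runs))
```

The 42 data scenarios quoted in the text correspond to (1 no-data option + 5 data options × 4 error scalars) × 2 EDC states.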
Figure 3 shows examples of three model analyses at the FR-LBr site, highlighting the range in NEE prediction performance across different model structures and data scenarios. Each model run contains a calibration period (the first 5 years of the site record; shown in white) during which optimized parameters were derived and a forecast period (the remaining years of the record, which always spanned at least 5 years because no site contained fewer than 10 years of data; shown in gray) during which fluxes and pools were predicted with the optimally parameterized model. In the scenario presented, model S2 is highly constrained by multiple datasets (Fig. 3a). By contrast, model C2 is moderately constrained (Fig. 3b), and model G4 is poorly constrained (Fig. 3c), which is evident by comparing the relative uncertainty of the NEE forecasts (blue shading) for each model. To highlight the effectiveness of the assimilation system, corresponding time series based only on prior parameter distributions are presented in Fig. S1.
Accounting for prediction uncertainty – as well as data uncertainty (red shading) – is a key goal of our model skill evaluation approach. Forecast skill for each model run was computed by comparing predictions and observations drawn strictly from the forecast period, using the histogram intersection algorithm (see Sect. 2.5.1). The complexity of each run was quantified based on its effective complexity (see Sect. 2.5.2).
Example model runs (title of each subplot) at the FR-LBr
site. The calibration window – the first 5 years of the record – is shown in
white, and the forecast window is shaded gray. The ensemble spread (blue
shading) encapsulates the 5th–95th percentiles of runs.
We chose the histogram intersection as a skill metric because it captures
accuracy along with both prediction uncertainty (i.e., the ensemble spread for a
given model output) and observational uncertainty (i.e., the mean value and error
for a given observation). This approach contrasts with more familiar metrics
such as the coefficient of determination (R2) or the root-mean-square error (RMSE), which compare only central estimates and neglect these uncertainties.
The histogram intersection is a simple algorithm that calculates the
similarity of two discretized distributions (histograms): the normalized frequencies of the two distributions are compared bin by bin, and the minimum in each bin is summed, yielding a score between 0 (no overlap) and 1 (identical distributions).
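As a minimal illustration of the metric (not the exact CARDAMOM implementation; the bin count and shared-binning strategy are assumptions here), the histogram intersection of two samples can be sketched as:

```python
import numpy as np

def histogram_intersection(sample_a, sample_b, n_bins=50):
    """Overlap between the normalized histograms of two samples,
    computed on a shared set of bins; ranges from 0 to 1."""
    sample_a = np.asarray(sample_a, dtype=float)
    sample_b = np.asarray(sample_b, dtype=float)
    # Shared bin edges spanning both samples
    lo = min(sample_a.min(), sample_b.min())
    hi = max(sample_a.max(), sample_b.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    h_a, _ = np.histogram(sample_a, bins=edges)
    h_b, _ = np.histogram(sample_b, bins=edges)
    # Normalize to relative frequencies, then sum the bin-wise minima
    h_a = h_a / h_a.sum()
    h_b = h_b / h_b.sum()
    return float(np.minimum(h_a, h_b).sum())
```

Identical samples score 1, fully disjoint samples score 0, and a wide, uncertain forecast that still brackets the observations retains partial credit, which is the property motivating its use here.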
We note that results for NEE predictions are presented in the main figures of this paper, while those for LAI predictions are included in the supporting information.
The effective complexity of each model run links model structure (i.e., process
representation) and number of parameters to the information content of
assimilated data. It was computed using a principal component analysis (PCA)
on the posterior parameter space. When applied to CARDAMOM output, the PCA
reduces the posterior parameter space (i.e., the ensemble of accepted parameter vectors) to its primary axes of variance; effective complexity is then defined as the number of principal components required to explain 95 % of that variance.
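The PCA-based computation described above can be sketched minimally in Python, assuming the posterior is available as an ensemble of accepted parameter vectors (function and variable names are illustrative, not CARDAMOM's):

```python
import numpy as np

def effective_complexity(posterior_samples, var_explained=0.95):
    """Number of principal components needed to explain `var_explained`
    of the total variance in a posterior parameter ensemble.

    posterior_samples: array of shape (n_draws, n_parameters).
    """
    X = np.asarray(posterior_samples, dtype=float)
    X = X - X.mean(axis=0)                      # center each parameter
    s = np.linalg.svd(X, compute_uv=False)      # PCA via SVD of the ensemble
    var = s ** 2                                # variance along each component
    frac = np.cumsum(var) / var.sum()           # cumulative variance fraction
    # Smallest k such that the first k components explain the target fraction
    return int(np.searchsorted(frac, var_explained) + 1)
```

For example, an ensemble in which two parameters are perfectly correlated occupies a lower-dimensional region of parameter space than its nominal parameter count suggests, so its effective complexity is reduced, consistent with the sphere-versus-disc analogy of Fig. 1.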
Effective complexity – defined as the number of principal components for which 95 % of the variance in the posterior parameter space is explained (see Sect. 2.5.2) – is primarily determined by model structure (Fig. 4a, inset). Specifically, over all runs included in the experiment, effective complexity varies far more between different models than between the other tested factors (assimilated data, observational error scalar, site, and EDC presence or absence). This link to model structure provides insight into the metric's interpretability and justifies its use as a measure of model complexity.
While predominantly determined by the choice of model, effective complexity also varies according to the degree to which parameters are constrained (Fig. 4a). It therefore captures the inter-relationship between model structure and parameterization. Within a given model structure, each of the experimentally varied factors yields a range of distinct complexities that follows a predictable pattern: effective complexity is higher for runs with weaker constraints on parameters than it is for runs with stronger constraints on parameters. This is easily interpretable in the case of assimilated data, which is the dominant within-model control on effective complexity (Fig. 4b). Runs for which no observations are ingested into the model have consistently higher effective complexities than runs for which NEE, LAI, and biomass observations are all ingested (compare yellow and purple circles in Fig. 4b), since the observational constraints reduce the possible model solution space. Similar behavior is also observed across the different error scalars tested in the experiment (larger observational error assumptions correspond to higher effective complexities (Fig. S2)) and between the presence versus absence of EDCs (the absence of non-observational realism constraints yields higher effective complexities (Fig. S3)). Conceptually, this pattern can be understood in the following way. Parameters in a given model's high-complexity runs were sampled from wider posterior distributions (due to weak or absent constraints) than in its low-complexity runs. This implies greater variance between parameter sets selected in high-complexity runs – and thus more distinct dimensions of variability in the posterior parameter space – than in low-complexity runs for the same model.
Influence of the experimentally varied factors on
effective complexity.
Across all runs performed in the experiment, the hypothesis that an intermediate-complexity carbon cycle model can outperform a low- or high-complexity model is confirmed, both when NEE is predicted (Fig. 5a) and when LAI is predicted (Fig. S4a). Runs on both extremes of the complexity axis perform poorly, due to overfitting in the low-complexity case (parameters are over-determined, leading to accurate predictions in the training period but poor ones in forecast) and underfitting in the high-complexity case (parameters are under-determined, yielding poor predictions in both training and forecast). Figure 3a and 3c demonstrate this contrasting behavior at the FR-LBr site.
When runs for which no data were assimilated – that is, runs with the least
informed parameters – are withheld from the analysis, increasing complexity
no longer degrades skill (Fig. 5b). More specifically, the relationship between
effective complexity and skill increases monotonically when all runs have
some baseline constraint on parameters. This result also holds regardless of
which variable is predicted (Fig. S4b) as well as when the number of runs within each
complexity bin is standardized via bootstrapping (Fig. S5). The decline in
performance attributable to the most extreme effective complexity scenarios
is also preserved across alternative skill metrics such as RMSE.
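The bin standardization mentioned above can be sketched generically as follows (the actual bin count and resample size used for Fig. S5 are not specified here; this is an illustrative implementation of per-bin bootstrap resampling):

```python
import numpy as np

def bootstrapped_bin_means(complexity, skill, n_bins=10, n_resample=200, seed=0):
    """Mean forecast skill per effective-complexity bin, with each bin's
    sample size standardized by bootstrap resampling (with replacement)."""
    rng = np.random.default_rng(seed)
    complexity = np.asarray(complexity, dtype=float)
    skill = np.asarray(skill, dtype=float)
    edges = np.linspace(complexity.min(), complexity.max(), n_bins + 1)
    # Map each run to a bin index in 0..n_bins-1 (max value goes to last bin)
    bin_idx = np.clip(np.digitize(complexity, edges) - 1, 0, n_bins - 1)
    means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        vals = skill[bin_idx == b]
        if vals.size:
            # Draw an equal-sized resample from every populated bin
            means[b] = rng.choice(vals, size=n_resample, replace=True).mean()
    return means
```

Equalizing the number of samples per bin guards against apparent trends that merely reflect uneven run counts along the complexity axis.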
Relationship between effective complexity and NEE
forecast skill for all runs performed in the experiment.
Assimilated data determine the shape of the overall complexity–skill relationship in COMPLEX. Not only does the presence of any assimilated observations control the response of skill to increasing complexity, but the specific choice of assimilated observations also matters. In particular, assimilating monthly NEE observations improves both NEE (Fig. 6a–c) and LAI predictions (Fig. S7a–c) by complex models over simple models: note the positive/increasing trends between complexity and skill in these cases. However, such improvements in predictive performance are not consistently observed across the complexity axis when other data, but not NEE, are ingested. The ingestion of LAI data and biomass estimates yields a small positive trend (Fig. 6d) – although this relationship is clearly weaker than when NEE is also assimilated (Fig. 6a) – and simple models informed only by LAI perform just as well as complex models when predicting NEE. Indeed, these runs show a constant skill level across the complexity axis (Fig. 6e). When predicting LAI, though, complex models outperform simple models with only the assimilation of LAI (Fig. S7e). All such combinations contrast with the case in which no data are assimilated: forecast skill for those runs declines with complexity, regardless of target variable (Figs. 6f, S7f).
Complexity–skill relationship for NEE predictions, split
by combination of assimilated data (title of each subplot). Average forecast
skill is computed using the histogram intersection metric. Ordering of
subplots reflects the strongest to weakest data constraints.
Recall that the magnitude of skill – the degree of overlap between model predictions and observations (see Sect. 2.5.1) – reflects the ability of the model to capture the data along with its uncertainty. Particularly in scenarios corresponding to low effective complexities, models tend to overfit when NEE is assimilated (as demonstrated in Fig. 3a). Overfitting is a key factor causing the discrepancy in performance between low-complexity runs that do (e.g., Fig. 6c) and do not assimilate NEE (e.g., Fig. 6e).
Regardless of which data are assimilated, site-specific characteristics also introduce additional variability into the form of the relationship between effective complexity and skill (Fig. 7). To better understand and isolate site-specific dynamics, here we only interpret runs for which at least one data type is assimilated. Most sites show high-complexity performance optima, consistent with Fig. 5b. However, several are characterized by a threshold effect for which performance increases significantly once a certain effective complexity is attained and remains stagnant thereafter (e.g., a low-complexity threshold around 10 for FI-Hyy and FR-Pue). This “diminishing returns” effect suggests that the performance benefit of added structural detail has the potential to stabilize for all but the simplest models. The two tropical sites included in our analysis demonstrate additional unique dynamics. GF-Guy is the only site for which the performance of the most complex models appears to slightly degrade, even when all observations including NEE are assimilated, and no threshold is apparent at AU-How. Overall, the site analysis demonstrates the large variability in model performance across space, including between sites sharing biome classifications (e.g., FI-Hyy and FR-LBr) or broadly similar climate types (e.g., GF-Guy and AU-How).
We defined a concept of effective complexity that is linked to model structure and number of parameters as well as to the information content of calibration data (Fig. 4). This metric can inform future studies seeking to investigate the role of model complexity by providing a simple and comparable quantification of parameter posteriors. Conventional complexity measures (e.g., counts of observable model attributes) can serve as reasonable approximations of the more nuanced definition specific to ensemble methods that we present here. Still, effective complexity is rarely identical to the number of model parameters: it is generally lower. Correlations between model parameters can and do occur whether the model is poorly or well-constrained (Keenan et al., 2013) and whether it is simple or complex, implying that all carbon cycle models have “constrainable” dimensions. Importantly, though, none of the high-parameter models in our experiment have so much redundancy that their average effective complexity across runs is equivalent to that of any low-parameter model (Fig. 4). Whether this is also true for large-scale TBMs remains an open question.
Overall, the behavior of the effective complexity metric highlights that the best-performing analyses (i.e., runs with the highest forecast skill) in COMPLEX maximize model structural breadth and minimize parametric uncertainty. Building models with a large number of processes but without effective parameter constraints (i.e., maximizing structural breadth without attempting to minimize parametric uncertainty) is not sufficient to optimize performance (Fig. 5). Additionally, models of the carbon cycle can overfit if they are calibrated in too narrow a subset of conditions and underfit if they are improperly parameterized and therefore biased, as shown in Fig. 3.
Complexity–skill relationship for NEE predictions, split by site (title of each subplot). Only runs for which data were assimilated are plotted. Average forecast skill is computed using the histogram intersection metric.
The main factors controlling the observed complexity–skill relationship are
(a) whether, and which, data are assimilated into the model and (b) the geographical
location at which the analysis is undertaken. One way to interpret the role
of data in the relationship is explicit: models with the ability to
assimilate monthly observations of NEE, which uniquely represent the
integrated behavior of terrestrial carbon cycling and its internal dynamics,
are more likely to experience gains in skill with increased complexity than
those that cannot. This result is consistent with the prominent role of NEE
observations in reducing model projection uncertainty identified by
Keenan et al. (2013). The effects of LAI and biomass observations in COMPLEX are
somewhat more nuanced. All models in the DALEC suite are able to extract
information from the LAI data and produce reasonably skilled NEE predictions
(Fig. 6e), though such data do not improve the skill of complex models over simple ones. The
ingestion of LAI data most directly constrains specific features relating to
growth or carbon allocation, potentially informing the seasonality of NEE.
Finally, the impacts of biomass observations on forecast skill were
relatively muted in our experiments. Given that biomass data are
particularly useful for informing the carbon cycling of slow pools
(Williams et al., 2005), the relatively short calibration (5 years) and forecast periods (at least 5 years) likely limited their influence here.
Several recent TBM efforts have sought to enable the assimilation of eddy covariance or remote sensing observations (e.g., Bacour et al., 2015; Raoult et al., 2016; Schürmann et al., 2016; Peylin et al., 2016; MacBean et al., 2018; Norton et al., 2019) as well as measurements of functional traits (e.g., LeBauer et al., 2013). Our results underscore the value of such efforts to reduce parameter uncertainty, despite the fact that the computational costs associated with data assimilation are relatively high (e.g., MacBean et al., 2016). Increased use of emulators may help reduce this computational cost (Fer et al., 2018).
Given the demonstrated value of data constraints and the specification of their uncertainty (Fig. S6), the need to characterize and quantify this uncertainty (Keenan et al., 2011) remains particularly critical for model–data fusion studies. In this analysis, NEE uncertainty was assumed to remain constant both in time (i.e., for all observations regardless of season or year) and in space (i.e., across sites), which likely over-generalizes the specifications of individual sensors and the possibility of systematic or increasing biases. These assumptions become even more important to account for when assimilating global datasets, for which retrieval accuracy can vary across land cover types or with atmospheric conditions such as clouds or snow (e.g., Fang et al., 2013). One benefit of the Copernicus LAI product used here is its explicit, spatially variable quantification of uncertainty, which is still relatively rare for remote sensing datasets. Though the robustness of these uncertainties has been challenged with independent observations in some locations (e.g., Zhao et al., 2020), this approach represents a level of detail well-suited to the coupling of data to large-scale or global models.
The observed variability in the complexity–skill relationship across sites (Fig. 7) suggests that predictability itself is spatially heterogeneous. Further, it implies that the benefit to model performance accrued by the addition of a given process should not be expected to affect all locations uniformly, even when site-specific parameter uncertainty is minimized through calibration or optimization. Models not tuned locally likely smooth this spatial variability in predictability drastically (van Bodegom et al., 2012; Berzaghi et al., 2020), and thus model development and calibration must include locations spanning a wide range of vegetation, climate, soil characteristics, and disturbance histories.
Overall, our results suggest that the benefits of increased model complexity
(e.g., gains in skill attributable to the introduction of specific processes or to
additional detail applied to existing mechanisms) are attainable only when
parameters are sufficiently well characterized. Here, this benefit is
achieved when high complexity is balanced by data-assisted parameter
optimization (in particular, when NEE observations are assimilated). More
broadly, the relationship between complexity and skill is dynamic and
extends beyond model structural choices. As a result, it is difficult to
quantify whether model parameters corresponding to any specific model
implementation – including outside the DALEC suite – are adequately
informed such that increased model complexity is beneficial to performance.
To assist in this endeavor, we present the following recommendations for
model development and evaluation.
1. Assimilate well-characterized, repeat-observation datasets to constrain model parameters at the scale of model application.
2. Use long time series to undertake independent forecast evaluation studies, and factor observational uncertainty into model evaluation (e.g., using overlap metrics).
3. Test whether model updates that add complexity lead to forecast improvements (not only calibration improvements), and likewise test whether model simplification improves forecasts.
4. Seek to calibrate or optimize model parameters even when data assimilation is not possible (e.g., using optimality-based approaches; Walker et al., 2017; Jiang et al., 2020).
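As one concrete example of an overlap-style evaluation metric that factors in observational uncertainty, the sketch below scores a forecast ensemble against observations with stated 1-sigma errors. The function name and the k-sigma interval test are illustrative choices, not the specific metric used in COMPLEX:

```python
import numpy as np

def interval_overlap_skill(forecast_ens, obs, obs_sigma, k=1.0):
    """Fraction of time steps at which the observation interval
    (obs +/- k * obs_sigma) overlaps the forecast ensemble range.

    forecast_ens : (n_members, n_times) array of forecast values
    obs, obs_sigma : (n_times,) arrays of observations and 1-sigma errors
    Returns a score in [0, 1]; 1 means every observation is consistent
    with the forecast spread.
    """
    f_lo = forecast_ens.min(axis=0)
    f_hi = forecast_ens.max(axis=0)
    o_lo = obs - k * obs_sigma
    o_hi = obs + k * obs_sigma
    # Two intervals overlap iff each one starts before the other ends.
    return float(((f_lo <= o_hi) & (o_lo <= f_hi)).mean())
```

A richer variant would measure the overlapping area of the forecast and observation probability densities rather than applying a binary interval test, but the principle is the same: observational uncertainty enters the score explicitly.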
Finally, while beyond the scope of this study, future work will investigate the linkage between specific processes or process representations (e.g., the inclusion or exclusion of water cycling) and predictive performance to better parse ecological controls on the complexity–skill relationship.
This analysis tested a spectrum of structurally distinct representations of the carbon cycle based on the intermediate-complexity ecosystem model DALEC, which allowed for coupling with the CARDAMOM model–data fusion system in a computationally tractable manner. Because our findings are not explicitly linked to the roles of specific processes or model features, however, their implications extend beyond the use of DALEC-like models to a wide variety of ecological models, including TBMs.
Traditional parameter determination in TBMs (based on plant functional types, or PFTs) is far from random. It is informed by data – for example, by hypotheses or generalizations derived from prior literature (e.g., Oleson et al., 2010; Lawrence et al., 2011) or by model calibration at specific locations (e.g., Williams et al., 1997) – and is therefore endowed with ecological knowledge. Accordingly, TBM parameters are likely more informed than the least constrained parameters retrieved in our analysis, which were freely sampled from wide uniform distributions and caused the high-complexity decline in performance (Fig. 5). However, while this may be true locally, the common assumption that parameters are uniform within PFTs casts doubt on their precision across the regional or global scales at which TBMs typically make predictions (van Bodegom et al., 2012). Indeed, using a suite of global TBMs participating in the Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP; Huntzinger et al., 2013), Schwalm et al. (2019) showed that increases in model performance were more often linked to the omission rather than the inclusion of various processes, suggesting a tradeoff between complexity and skill similar to that observed here. This conclusion calls into question the conventional paradigm that greater complexity significantly and consistently improves skill across current TBMs.
Earth observation (EO) is one key approach that can provide the high-spatial- and high-temporal-resolution data on carbon cycling needed for more localized calibrations (Exbrayat et al., 2019). In COMPLEX, we used Copernicus LAI data, though there are also opportunities to ingest biomass maps from space lidar or radar, estimates of photosynthesis from solar-induced fluorescence (SIF), and satellite-based atmospheric inversions of regional NEE, among others, in future studies. If supplied with appropriate error estimates, these datasets can over time provide powerful constraints for high-resolution carbon cycle analyses with TBMs or DALEC-like models. A key research goal is to determine the appropriate model complexity for maximizing the information content of these EO data for robust forecasts and analyses.
Alternative methodologies for deriving ecosystem parameters outside the realm of PFTs are also becoming increasingly common (van Bodegom et al., 2012; Bloom et al., 2016; Exbrayat et al., 2018; Berzaghi et al., 2020; Fisher and Koven, 2020) and may represent a way forward in addressing the tradeoff between structural and parametric uncertainty. Recent work has focused on upscaling in situ trait data (e.g., from the TRY database; Kattge et al., 2020) to yield spatially variable maps of key ecosystem parameters, using modeled relationships with climate or canopy properties (often referred to as environmental filtering relationships, since the environment “filters” the possible distribution of parameters at a given location; e.g., Verheijen et al., 2013; van Bodegom et al., 2014; Butler et al., 2017), leaf economics (Sakschewski et al., 2015), or optimality theory (e.g., Smith et al., 2019). Other studies have investigated how TBM parameters optimized at eddy covariance sites covary with climate (e.g., Peaucelle et al., 2019; Wu et al., 2020b). These efforts are not without their challenges, however. The spatial coverage of in situ trait data as well as eddy covariance sites is sparse relative to the large diversity of ecosystem behavior (Schimel et al., 2015), and such datasets also comprise a non-representative sample of species and disturbance histories (Sandel et al., 2015). These biases may limit the representativeness of the modeled relationships. Taking a different approach, a small subset of models has also been developed to operate altogether independently from the paradigm of PFTs (e.g., using trait-based approaches; Scheiter et al., 2013; Pavlick et al., 2013; Fyllas et al., 2014). 
Our results imply that these and future developments to improve the flexibility of model parameters will play critical roles in enabling the trend of increasing model complexity and may be a more fruitful avenue towards reducing the uncertainty of TBM prediction than model structural changes and additions.
Our approach to understanding the relationship between model complexity and model predictive performance is novel in its focus on sampling the spectrum of possible parameter uncertainty states for a variety of model structures and calibration data. Taken together, lessons learned from the behavior of the effective complexity metric as well as the data and site effects discussed here represent a comprehensive pattern: improving the robustness of parameter calibration is a prerequisite for effectively increasing structural complexity. Specifically, we found that increasing model complexity actively degrades predictive skill in the most extreme cases of parameter uncertainty. Assimilating data – particularly monthly observations of net ecosystem exchange – considerably improves the performance of complex models relative to simple models, though the magnitude and persistence of this improvement vary across space. Overall, the growing focus on understanding and reducing parametric uncertainties within large-scale models (such as via direct data assimilation, the development and implementation of alternatives to PFTs, and parameter sensitivity analyses; e.g., Fisher et al., 2019) is both a necessary direction and a significant opportunity for improving the predictability of the terrestrial biosphere. Our conclusion for model construction and usage echoes those from other scientific fields, as stated by Albert Einstein: "to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience" (Caprice, 2013).
The Data Assimilation Linked Ecosystem Carbon (DALEC) model suite includes a range of related intermediate-complexity models of the terrestrial carbon cycle. Each model version is comprised of sub-models related to different simulations of photosynthesis, plant and heterotrophic respiration, canopy phenology, stomatal conductance, and the inclusion of water cycling (Table 1). The sub-models are described in detail in the following sections (Sect. A.1–A.5). Each section contains a table highlighting the key features of each sub-model (Tables A1–A5).
The Aggregated Canopy Model Version 1 (ACM1) estimates canopy gross primary productivity (i.e., photosynthesis) as a function of temperature, shortwave radiation, day length, atmospheric CO2 concentration, and leaf area index.
ACM1 has no explicit capacity to simulate drought or direct overheating stress on canopy processes. Canopy photosynthesis is connected to the wider carbon cycle through the leaf area, although the role of the roots in water supply is neglected, as is its interplay with atmospheric CO2.
The GPP module also includes an empirical cold-weather GPP limitation sensitivity function, in which a cold temperature limitation factor scales photosynthesis downward at low temperatures.
The aggregated canopy model for gross primary productivity and evapotranspiration (hereafter ACM2) is the successor to ACM1 (Smallman and Williams, 2019). ACM2 builds on the ACM1 framework by adding a model of ecosystem water cycling, facilitating the implementation of a mechanistic stomatal conductance scheme that links the canopy to soil water via fine roots. ACM2 also optimizes the stomatal intrinsic water use efficiency (for details see Williams et al., 1996; Bonan et al., 2014). ACM2 simulates shortwave and longwave isothermal radiation balances, canopy interception of rainfall, and soil infiltration. It is therefore capable of simulating canopy transpiration, soil evaporation, evaporation of canopy-intercepted rainfall, and soil water runoff and drainage.
For the analytical Ball–Berry GPP module of CARDAMOM, leaf-level GPP and stomatal conductance are calculated using the coupled leaf photosynthesis–stomatal conductance model of Ball and Berry (Ball et al., 1987) and the analytical solution to the resulting system of equations developed by Baldocchi (1994). This new module serves to calculate both GPP and evapotranspiration, coupled through the stomatal behavior. This formulation introduces the maximum rate of carboxylation (Vcmax) into the model.
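For reference, the coupled system solved analytically by Baldocchi (1994) can be sketched as follows; the notation here is generic and not necessarily that of the CARDAMOM code:

```latex
% Ball--Berry stomatal conductance (Ball et al., 1987)
g_s = g_0 + g_1 \, \frac{A_n \, h_s}{c_s}
% CO2 diffusion through stomata (1.6 converts water-vapor to CO2 conductance)
A_n = \frac{g_s}{1.6}\left(c_s - c_i\right)
```

where A_n is net assimilation, h_s and c_s are the relative humidity and CO2 mole fraction at the leaf surface, c_i is the internal CO2 mole fraction, and g_0 and g_1 are empirical parameters. Substituting a Farquhar-type dependence of A_n on c_i into these two relations yields a polynomial in A_n that Baldocchi (1994) solved in closed form.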
Summary of the key features for each photosynthesis sub-model.
Autotrophic (plant) respiration (Ra) is simulated using one of the sub-models described below.
Summary of key features for each respiration sub-model.
Autotrophic respiration (Ra) is simulated as a fixed fraction of GPP. This fraction varies in space as a retrieved location-specific parameter, informed by a literature-based prior value.
We continue to retrieve a location-specific fixed fraction of GPP respired as growth respiration, while canopy maintenance respiration is simulated explicitly (described below). This formulation allows for variation between the proportions of growth and maintenance respiration.
Summary of key features for each decomposition sub-model.
The sensitivity of
Following Reich et al. (2008), the estimation of canopy maintenance respiration occurs in two stages: (i) estimation of the canopy maintenance respiration per gram of leaf carbon at 20 °C from the foliar nitrogen concentration [N], and (ii) upscaling to the whole canopy with an adjustment for temperature. In the underlying unit conversion, the factor of 14 is the atomic weight of nitrogen and the factor of 1000 scales the result from mol N g-1 to mmol N g-1.
Temperature strongly impacts metabolic activity and thus maintenance respiration. The canopy maintenance respiration estimated at the 20 °C reference is therefore adjusted by an instantaneous temperature response function.
The instantaneous temperature response is well captured by existing models. However, the impact of long-term temperature changes and associated acclimation of both photosynthetic and respiratory pathways are not accounted for. Therefore, simulations over longer timescales may overestimate negative feedbacks of increased canopy maintenance respiration due to warming (Atkin et al., 2015; Wang et al., 2020).
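The exact functional form of the temperature adjustment is not reproduced here; a common choice, shown for illustration only, is a Q10 response referenced to the 20 °C base rate:

```latex
R_m(T) \;=\; R_m(20\,^{\circ}\mathrm{C}) \cdot Q_{10}^{\,(T - 20)/10}
```

where T is air temperature in °C and Q10 is the proportional increase in respiration per 10 °C of warming (often near 2). Acclimation would appear as a slow adjustment of the base rate or Q10, which this instantaneous form omits.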
Summary of key features for each phenology sub-model.
Summary of key features for each water cycle sub-model.
Heterotrophic respiration results from decomposition and mineralization processes in carbon pools containing dead organic matter. Depending on the model structure, these can include a fine litter pool and a soil organic matter pool.
All dead organic matter pools follow a common basic form, in which the loss flux is the product of a pool-specific turnover parameter, a temperature-dependent rate modifier, and the pool stock.
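As a sketch (the symbols here are generic, not the paper's notation), the heterotrophic flux from dead organic matter pool i can be written:

```latex
R_{h,i} \;=\; \theta_i \, f(T) \, C_i , \qquad f(T) = e^{\Theta T}
```

where C_i is the pool stock, \theta_i the pool-specific turnover rate, and f(T) an exponential temperature rate modifier with sensitivity \Theta shared across pools.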
Heterotrophic respiration is regulated by both temperature (as in Sect. A.3.1) and a linear function of the ratio of current precipitation to the site mean (as a proxy for near-surface soil moisture). The functional form allows for a varying linear sensitivity, such that the strength and sign of the moisture response are retrieved during calibration.
The CDEA phenology model primarily uses a day-of-year approach to simulate the turnover of a labile pool to support canopy growth and subsequent canopy turnover (Bloom and Williams, 2015). Each time step, a fixed fraction of GPP is allocated to the canopy and to a labile pool that supplies the canopy with new growth. The CDEA model uses parameterized values for the peak day of year of labile turnover (i.e., supplying leaf growth) and of leaf turnover, plus two further parameters defining the standard deviation of a Gaussian distribution specifying the period of time over which canopy phenology occurs. The fraction of the canopy turned over each year is defined by a leaf lifespan parameter, while the labile pool is assumed to turn over fully each year.
The CDEA model provides an easy-to-calibrate diagnostic model of mean canopy phenology. However, it does not vary phenology in response to changing environmental conditions, limiting simulation of inter-annual variability. As a result, the CDEA model has a limited capacity to inform on the meteorological drivers of canopy phenology.
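The Gaussian day-of-year spread of phenological activity described above can be sketched as follows; this is an illustrative reconstruction, not the DALEC source code, and the function name and calendar wrap-around handling are assumptions:

```python
import numpy as np

def gaussian_turnover_weight(doy, peak_doy, width):
    """Relative turnover intensity on day-of-year `doy`, from a Gaussian
    centered on `peak_doy` with standard deviation `width` (days).
    Calendar wrap-around is handled via the shortest circular distance,
    so a peak near year end spills correctly into January."""
    d = np.abs(doy - peak_doy)
    d = np.minimum(d, 365.25 - d)  # circular distance in days
    return np.exp(-0.5 * (d / width) ** 2)
```

Multiplying such a weight by the annual turnover fraction (set by the leaf lifespan parameter) distributes leaf fall over the window defined by the width parameter.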
Phenology is the same as in Sect. A.4.1, but the release of labile C to foliar C is optimizable on an annual basis.
Canopy phenology is sensitive to environmental conditions (e.g., Jolly et al., 2005; Forkel et al., 2015) and plant carbon economic constraints (e.g., Flack-Prain et al., 2021), driving interannual variation in leaf area dynamics. The growing season index (GSI) is a piecewise model linking canopy phenology to linear functions of day length, temperature, and vapor pressure deficit, each scaled to 0–1 (Jolly et al., 2005). The GSI model was implemented in Smallman et al. (2017) and augmented to require that new leaf area increase GPP by more than a critical threshold retrieved as part of CARDAMOM.
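A minimal sketch of the multiplicative GSI follows, assuming the ramp limits published by Jolly et al. (2005) (photoperiod 10–11 h, minimum temperature -2 to 5 °C, VPD 0.9–4.1 kPa). CARDAMOM retrieves its own limits, so these defaults are illustrative only:

```python
import numpy as np

def ramp(x, x_min, x_max):
    """Linear 0-1 scaling of x between x_min and x_max, clipped outside."""
    return np.clip((x - x_min) / (x_max - x_min), 0.0, 1.0)

def growing_season_index(day_length_h, t_min_c, vpd_kpa,
                         dl_lim=(10.0, 11.0), t_lim=(-2.0, 5.0),
                         vpd_lim=(0.9, 4.1)):
    """Multiplicative GSI in [0, 1] after Jolly et al. (2005).
    Note VPD limits the index from above: high VPD drives GSI to 0."""
    f_dl = ramp(day_length_h, *dl_lim)
    f_t = ramp(t_min_c, *t_lim)
    f_vpd = 1.0 - ramp(vpd_kpa, *vpd_lim)
    return f_dl * f_t * f_vpd
```

Each limiting factor is piecewise linear, so the product responds smoothly to whichever driver is most limiting at a given time of year.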
However, we note that recent plant economic theory indicates that canopies are optimizing net canopy carbon export (NCCE; e.g., Thomas et al., 2019; Flack-Prain et al., 2021) – that is, photosynthesis minus respiratory and construction costs, rather than photosynthesis alone. To investigate this level of process complexity, in Sect. A.4.4 we include a canopy maintenance respiration model to assess the NCCE.
Optimality theory is increasingly being used to explain canopy phenology based on maximizing some metric of the carbon economy. One approach which is gaining support is optimizing net canopy carbon export (NCCE): that is, ensuring photosynthetic gains are greater than costs associated with leaf growth and maintenance respiration (e.g., Thomas and Williams, 2014; Flack-Prain et al., 2021). While further research is needed to refine these theoretical models, we implement a model consistent with existing literature.
The GSI model proposes an amount of new leaf area. Whether this leaf area is grown is determined by testing whether the increase in GPP, averaged over the expected leaf lifespan, exceeds the increased maintenance respiration costs plus the carbon required to construct the new leaves and the associated growth respiration.
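The marginal leaf-growth test above can be sketched as follows; the names and the growth respiration fraction (0.28, a commonly assumed value) are illustrative, not the paper's exact parameterization:

```python
def grow_new_leaves(delta_gpp_daily, leaf_lifespan_days,
                    leaf_carbon_cost, rm_daily, growth_resp_frac=0.28):
    """Decide whether proposed new leaf area pays for itself over its
    lifetime. The lifetime GPP gain must exceed the lifetime maintenance
    respiration plus the construction carbon (including growth
    respiration). Fluxes in gC m-2 d-1; stocks in gC m-2."""
    gain = delta_gpp_daily * leaf_lifespan_days
    cost = (rm_daily * leaf_lifespan_days
            + leaf_carbon_cost * (1.0 + growth_resp_frac))
    return gain > cost
```

Because both the GPP gain and the respiration cost scale with leaf lifespan while the construction cost is paid once, longer-lived leaves can justify a smaller marginal GPP increase.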
The bucket approach extends the DALEC baseline structure to include a plant-available water pool, where the hydrological balance is defined as the sum of precipitation inputs minus losses from evapotranspiration and from runoff and drainage.
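A minimal single-bucket update illustrating this balance is sketched below; the variable names and the simple overflow formulation are illustrative rather than the DALEC implementation:

```python
def bucket_step(paw, precip, et, capacity):
    """One time-step update of a plant-available water (PAW) bucket.
    All quantities in mm: precipitation adds water, evapotranspiration
    removes it, and any excess above `capacity` is lost as
    runoff/drainage. The bucket cannot go negative."""
    paw = paw + precip - et
    runoff = max(0.0, paw - capacity)
    paw = min(max(paw, 0.0), capacity)
    return paw, runoff
```

In practice the evapotranspiration demand would itself be limited by the current bucket state, coupling canopy gas exchange to water availability.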
The ACM2 model includes a multi-layer representation of the soil and root access (Smallman and Williams, 2019). There are five soil layers, three of which are accessible to roots to supply the canopy with water. The top two layers have a fixed thickness of 10 and 20 cm, respectively, with a third layer which is expandable based on root penetration. Soil-layer-specific field capacity, porosity, and hydraulic conductances are calculated using soil texture. Using these data, infiltration of precipitation, drainage between soil layers, soil hydraulic resistance to root uptake of water, and soil surface evaporation are estimated. Soil surface evaporation occurs from the top soil layer only. For a complete description, see Smallman and Williams (2019).
Carbon cycle structure for models C1–C8.
Carbon cycle structure for model E1.
Carbon cycle structure for models G1–G4.
Carbon cycle structure for model S1.
Carbon cycle structure for model S2.
Carbon cycle structure for model S3.
Carbon cycle structure for model S4.
COMPLEX uses information from six sites across the globe (Fig. C1). The selection aimed to maximize their biogeographical spread and diversity of natural ecosystems while fulfilling specific data requirements. A key DALEC model criterion requires that the sites must not be dominated by C4 vegetation.
Time series information on leaf area is drawn from the Copernicus 1 km Earth observation (EO) product, which provides estimates of LAI magnitude at fine temporal resolution together with concurrent location-specific uncertainty estimates. Using this EO product, combined with the abovementioned calibration/prediction period constraints, requires site data collection periods to be post-1998.
Simulation of NEE is a key focus of COMPLEX, making the availability of long-term, temporally consistent, high-quality NEE estimates derived from eddy covariance essential (e.g., FLUXNET2015; Pastorello et al., 2020). The FLUXNET2015 database provides consistent information on data quality (e.g., observation uncertainty and proportion of model–data gap-filling) that underpins the site selection process. Here, to avoid comparing DALEC-simulated NEE with largely statistically gap-filled observations, only sites with a sufficiently low proportion of gap-filled data are retained.
Hill et al. (2012) demonstrated that assimilation of NEE observations provides substantial new information up to at least 5 years in duration. To create a balanced experimental design, COMPLEX sites are required to have a minimum of 10 years of observations (i.e., 5 years calibration and remainder evaluation). Building on existing analyses with DALEC (e.g., Smallman et al., 2017), COMPLEX quantifies the role of woody biomass information on constraining the DALEC models' predictive capacity of NEE. Therefore, multiple wood stock estimates are required spanning both the calibration and prediction periods. As determining the amount and access of inventory data often requires direct contact with site managers, this stage occurs later in the selection process.
Map of FLUXNET sites used in the experiment.
Collectively, the abovementioned data requirements and model process representations formed the basis of a site selection procedure used to filter the FLUXNET2015 database. This process ultimately led to the selection of six sites (Table 2).
The selection criteria were as follows:
1. Sites must represent a natural ecosystem (i.e., excluding arable agriculture and intensively grazed sites) dominated by C3 vegetation.
2. Sites must have observations spanning at least 10 years.
3. Sites must meet the data quality and gap-filling requirements described above.
4. Contact site managers to determine the availability of wood stock observations.
Data generated in COMPLEX (performance and complexity
metrics corresponding to each model run) are publicly available at
The supplement related to this article is available online at:
AAB, MW, TLS, AGK, GRQ, SFP, VM, and CAF planned the analysis. CAF, TLS, PAL, GRQ, SFP, VM, NCP, SGS, YY, AAB, MW, and AGK contributed to model development. TLS and MW developed site selection criteria, contacted site PIs, and gathered input data. TLS and PAL executed model runs. CAF performed analysis on model outputs. CAF wrote the manuscript with contributions from TLS, PAL, GRQ, AAB, MW, and AGK. All authors reviewed drafts of the manuscript.
The authors declare that they have no conflict of interest.
This work has made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF).
Caroline A. Famiglietti, Gregory R. Quetin, and Alexandra G. Konings were supported by NSF DEB-1942133 and NASA NNH16ZDA001N-IDS. Operation of the US-Ha1 site is funded by the US Department of Energy's Office of Science (DE-AC02-05CH11231) and National Science Foundation LTER funding (DEB-1832210). The Howard Springs site is funded by the Australian Research Council FT1110602, DP160101497, and Australian Terrestrial Ecosystem Research Network – Ecosystems Process platform. Mathew Williams received funding from NERC (NE/P018920/1), the UK Space Agency, Newton Fund CSSP Brazil, and the Royal Society.
This paper was edited by Sönke Zaehle and reviewed by Enqing Hou and one anonymous referee.