Scale-variance in the carbon dynamics of fragmented, mixed-use landscapes estimated using Model-Data Fusion

. Many terrestrial landscapes are heterogeneous. Mixed land cover and land-use generate a complex mosaic of fragmented ecosystems at fine spatial resolutions with contrasting ecosystem stocks, traits and processes, each differently sensitive to environmental and human factors. Representing spatial complexity within terrestrial ecosystem models is a key challenge for understanding regional carbon dynamics, their sensitivity to environmental gradients, and their resilience in the face of climate change. Heterogeneity underpins this challenge due to the trade-off between the fidelity of ecosystem representation within 5 modelling frameworks and the computational capacity required for fine-scale model calibration and simulation. We directly address this challenge by quantifying the sensitivity of simulated carbon fluxes in a mixed-use landscape in the UK to the spatial resolution of the model analysis. We test two different approaches for combining EO data into the CARDAMOM Model-Data Fusion (MDF) framework, assimilating time series of satellite-based Earth Observation (EO) derived estimates of ecosystem leaf area and biomass stocks to constrain estimates of model parameters and their uncertainty for an intermediate complexity 10 model of the terrestrial C cycle. In the first approach, ecosystems are calibrated and simulated at pixel-level, representing a "community average" of the encompassed land cover and management. This represents our baseline approach. In the second, we stratify each pixel based on land-cover (e.g. coniferous forest, arable/pasture etc.), and calibrate the model independently using EO data specific to each stratum. We test the scale-dependence of these approaches for grid resolutions spanning 1 ◦ to 0.05 ◦ over a mixed land-use region of the UK. Our analyses indicate that spatial resolution matters for MDF. Under the 15 "community-average" baseline approach biological C fluxes (GPP, R eco ) simulated by CARDAMOM are relatively

Abstract. Many terrestrial landscapes are heterogeneous. Mixed land cover and land use generate a complex mosaic of fragmented ecosystems at fine spatial resolutions with contrasting ecosystem stocks, traits, and processes, each differently sensitive to environmental and human factors. Representing spatial complexity within terrestrial ecosystem models is a key challenge for understanding regional carbon dynamics, their sensitivity to environmental gradients, and their resilience in the face of climate change. Heterogeneity underpins this challenge due to the trade-off between the fidelity of ecosystem representation within modelling frameworks and the computational capacity required for fine-scale model calibration and simulation. We directly address this challenge by quantifying the sensitivity of simulated carbon fluxes in a mixed-use landscape in the UK to the spatial resolution of the model analysis. We test two different approaches for combining Earth observation (EO) data into the CARDAMOM model-data fusion (MDF) framework, assimilating time series of satellite-based EO-derived estimates of ecosystem leaf area and biomass stocks to constrain estimates of model parameters and their uncertainty for an intermediate complexity model of the terrestrial C cycle. In the first approach, ecosystems are calibrated and simulated at pixel level, representing a "community average" of the encompassed land cover and management. This represents our baseline approach. In the second, we stratify each pixel based on land cover (e.g. coniferous forest, arable/pasture) and calibrate the model independently using EO data specific to each stratum. We test the scale dependence of these approaches for grid resolutions spanning 1 to 0.05 • over a mixed-land-use region of the UK. Our analyses indicate that spatial resolution matters for MDF. Under the community average base-line approach biological C fluxes (gross primary productivity, R eco ) simulated by CARDAMOM are relatively insensitive to resolution. However, disturbance fluxes exhibit scale variance that increases with greater landscape fragmentation and for coarser model domains. In contrast, stratification of assimilated data based on fine-resolution land use distributions resolved the resolution dependence, leading to disturbance fluxes that were 40 %-100 % higher than the baseline experiments. The differences in the simulated disturbance fluxes result in estimates of the terrestrial carbon balance in the stratified experiment that suggest a weaker C sink compared to the baseline experiment. We also find that stratifying the model domain based on land use leads to differences in the retrieved parameters that reflect variations in ecosystem function between neighbouring areas of contrasting land use. The emergent differences in model parameters between land use strata give rise to divergent responses to future climate change. Accounting for fine-scale structure in heterogeneous landscapes (e.g. stratification) is therefore vital for ensuring the ecological fidelity of large-scale MDF frameworks. The need for stratification arises because land use places strong controls on the spatial distribution of carbon stocks and plant functional traits and on the ecological processes controlling the fluxes of C through landscapes, particularly those related to management and disturbance. Given the importance of disturbance to global terrestrial C fluxes, together with the widespread increase in fragmentation of forest landscapes, these results carry broader significance for the application of MDF frameworks to constrain the terrestrial C balance at regional and national scales.

Introduction
Over the past decade, terrestrial ecosystems have provided a global net carbon (C) sink sequestering ∼ 3.4 ± 0.9 PgC yr −1 , ∼ 30 % of anthropogenic CO 2 emissions, despite estimated emissions of ∼ 1.6 ± 0.7 PgC yr −1 associated with land use and land cover change (Friedlingstein et al., 2020). The future trajectory of the terrestrial carbon sink will therefore have a significant impact on global efforts to achieve the goal of the UN Framework Convention on Climate Change to avoid dangerous climate change, reaffirmed in the Glasgow Climate Pact (UNFCCC, 2021). Quantification of spatial and temporal variations in exchange magnitude, alongside their associated uncertainties, is therefore essential to understanding the stability of the terrestrial carbon sink in the face of rapid environmental change (Hurlbert et al., 2019), and a prerequisite to robust national reporting of land-based CO 2 emissions and their attribution to different sectors (Grassi et al., 2017;Jones and Friedlingstein, 2020;McGlynn et al., 2022). Terrestrial biosphere models provide a means of quantifying the land carbon balance in a systemic, ecologically coherent way (Bonan and Doney, 2018). However, the current and future dynamics of terrestrial C exchange are highly uncertain, largely due to uncertainties in the structure and parameter constraints of the biosphere models themselves (Lovenduski and Bonan , 2017;Smallman et al., 2021).
Global land use and land cover change has increased the fragmentation of ecosystems, creating highly heterogeneous landscapes that host a mosaic of land cover and uses (Lindenmayer and Fischer, 2013;Brink et al., 2017;Matricardi et al., 2020). This heterogeneity juxtaposes ecosystems with contrasting C stocks, traits and ecological processes, management, and environmental sensitivity, at length scales of 10-100 m. For example, within the UK, landscapes comprise a patchwork of managed arable land and pasture, semi-natural and plantation woodland, heath, and settlements ( Fig. 1). Insight into the dynamics of this patchwork of ecosystems has been greatly accelerated by the proliferation of Earth observation (EO) data from satellites that monitor ecosystems with ever-increasing spatial and temporal resolution . A major challenge is to synthesise this expanding range of EO data to generate systemic understanding of the terrestrial C cycle, thus transforming ecosystem observation into ecological understanding that can inform policy development and facilitate land management .
Model-data fusion (MDF) frameworks provide the means to integrate EO observations with spatially explicit processbased ecosystem models that encapsulate our understanding of how C flows through ecosystems (Luo et al., 2011), thereby providing key, mass-balanced, constraints on the fluxes of C between the atmosphere and land surface alongside their associated uncertainties (Niu et al., 2014;Bloom et al., 2016;Peylin et al., 2016;MacBean et al., 2018;Small-man et al., 2021). MDF frameworks that exploit intermediate complexity models of the terrestrial C cycle, such as CARDAMOM (Bloom et al., 2016;Exbrayat et al., 2018;Lopez-Blanco et al., 2019;Smallman et al., 2021), are able to generate "local" calibrations based on pixel-level inversions of EO and auxiliary data streams. Calibrating ecosystem models to local data is important, because the functional traits of ecosystems vary in space (Smith et al., 2013;Reich et al., 2014;Butler et al., 2017;Exbrayat et al., 2018;Lopez-Blanco et al., 2019;Smallman et al., 2021), with trait differences within biomes often exceeding differences between biomes (Van Bodegom et al., 2012;Butler et al., 2017); failure to account for such variations may lead to biases in the estimated dynamics (Scheiter et al., 2013;Exbrayat et al., 2018). However, the computational intensity of largescale MDF frameworks limits their spatial resolutions to 10-100 km, several orders of magnitude greater than the length scales relevant to differentiating the ecosystems within landscape mosaics (e.g. Kaminski et al., 2012;Smith et al., 2013;Kuppel et al., 2014;Bloom et al., 2016;Peylin et al., 2016;Yin et al., 2020;Smallman et al., 2021).
The scale disparity between model domains and the ecological fabric those domains represent poses a major challenge to large-scale modelling of terrestrial C dynamics in heterogeneous landscapes (Stoy et al., 2009;Fisher and Koven, 2020;Levy et al., 2022). In typical spatially distributed MDF applications, available observations are aggregated to pixel-level "community averages" prior to inversion. There are usually sufficient degrees of freedom in ecological process models to fit the observed temporal changes in aggregated stocks and fluxes, based on available observation constraints (Beven, 2006;Famiglietti et al., 2021). Nevertheless their ecological fidelity may be limited in heterogeneous landscapes, for which the parameters retrieved by these community average models provide intermediate representations of the distinct ecosystems present. This limitation compromises efforts to attribute fluxes to specific land uses and raises potential for significant sources of bias when estimating the terrestrial carbon balance and its environmental sensitivity. Firstly, the C cycle represents the interplay of a number of nonlinear ecological processes, and therefore upscaling raises the familiar foe of Jensen's inequality (Jensen, 1906;Levy et al., 2022), whereby for a set of input variables, X, the expectation value for a nonlinear function f (i.e. E[f (X)]), will not yield the same estimate as the same nonlinear function applied to the average values of those variables (f (E[X])), leading to scale variance. In the case of terrestrial C fluxes, land use places strong controls both on the distribution of carbon stocks and plant functional traits within the landscape and on the processes controlling the fluxes of C through landscapes, particularly those related to exogenous processes such as management and disturbance. The inversion of the pixel-average environmental signals may provide biased diagnostics, in particular where pixels comprise a mixture of different ecosystem types and man- Figure 1. Perspective view of Sentinel-2 imagery over a typical landscape sampled from the study area illustrating the fine-scale mosaic of land use characteristic of this region. The spatial extent of the displayed domain is 10 km × 10 km and comprises part of the North River Tyne catchment in Northumberland, with the spatially extensive coniferous woodlands of Kielder Forest encroaching into the NW of the scene (top). Contains modified Copernicus Sentinel data (2021).
agement. As the model domains become coarser, the length scales over which the environmental signal is averaged increases. Failing to account for the co-location of stocks and process imposed by land use in mixed-use landscapes (e.g. concentration of C stocks in woodland, where timber harvest is focused) provides a clear source of potential systematic scale-variant bias in derived flux estimates across large scales. Additionally, community average models may miss or poorly represent processes specific to certain land uses (White et al., 2019;Kondo et al., 2020). To ensure the ecological fidelity of large-scale ecosystem C-cycle models, it is therefore vital to adequately capture the essential processes controlling the fluxes of C through these different ecosystems and their potentially divergent temporal dynamics and environmental sensitivities (Levy et al., 2022).
In this study, we specifically address the impact of the resolution trade-off in spatially explicit MDF frameworks between ecological fidelity and computational intensity by investigating how simulated carbon cycling in a mixed-use landscape in the UK responds to the spatial resolution of the model grid. We test two different MDF approaches that assimilate EO information of ecosystem characteristics to constrain model parameters and uncertainty for an intermediate complexity model of the terrestrial C cycle, DALEC (Williams et al., 2005;Bloom et al., 2016;Smallman et al., 2017Smallman et al., , 2021. In the first approach, ecosystems are calibrated and modelled at the pixel level, representing a community average of the encompassed land cover and management. This corresponds to the approach commonly employed in large-scale ecosystem MDF frameworks (e.g. Smith et al., 2013;Bloom et al., 2016;Yin et al., 2020;Smallman et al., 2021). In the second, we stratify each pixel based on land cover and calibrate the model independently using remotely sensed data specific to each stratum, aligning more closely with the tiled plant functional type (PFT) approach employed in many terrestrial biosphere models (e.g. Sitch et al., 2008;Kaminski et al., 2012;Kuppel et al., 2014). The novelty of introducing stratification within a MDF context is that we use fine-scale ecosystem information contained within EO data to retrieve locally calibrated parameter ensembles for the ecosystems represented by each stratum and therefore retain that key advantage of MDF systems, which enables calibrated traits to vary across environmental gradients, within the constraints of the available observations and ecological knowledge . However, by stratifying pixels we attempt to minimise the extent to which the environmental signals being inverted are averaged across functionally distinct ecosystems, thus ameliorating one source of scale-variant error in the estimated C fluxes. Furthermore, the model parameters retrieved through MDF synthesise the ecological information relayed by the assimilated data streams, within the constraints imposed by model structure and data quality. At coarser scales, aggregating observation streams results in the loss of ecological information, which we expect to be particularly marked in heterogeneous landscapes. Stratification provides one mechanism through which this ecological information loss may be reduced. Of course, the ecological fidelity of the model calibrations may still be limited where the model structure cannot adequately represent important processes, or where there are systematic errors and biases in the assimilated data; stratification by itself does not resolve these components of ecological fidelity, but it does open avenues through which they may be addressed.
We test the two MDF approaches -the novel subpixel stratification approach and the traditional pixel average (baseline) approach -on grid resolutions spanning 1 to 0.05 • . Specifically we address the following hypotheses: -H1. Estimated C fluxes will be scale variant, with stronger resolution sensitivity exhibited by exogenous fluxes (i.e. disturbance) compared to biogenic fluxes (e.g. gross primary productivity, GPP).
-H2. C fluxes will be more consistent across grid resolutions when the framework explicitly accounts for sub-pixel heterogeneity in land use; estimates from the baseline (unstratified) experiments will converge on the stratified estimates at finer spatial resolutions.
-H3. Aggregating data to coarser spatial resolutions will result in parameter estimates that increasingly fail to capture functional variations between land cover types, but stratifying the landscape prior to aggregation reduces this functional information loss.
Using MDF to combine models and data at local scale offers huge potential for rigorous quantification of the state 3304 D. T. Milodowski et al.: Scale variance in the carbon dynamics of fragmented landscapes and dynamics of the terrestrial C cycle across large spatial scales, with propagation of uncertainty through analyses (Bloom et al., 2016;Smallman et al., 2022). In testing the hypotheses outlined above, we seek to address a key challenge relating to the mismatch between the scales of ecological processes and of large-scale MDF frameworks through the development of a novel stratified MDF framework. Our approach retains the core advantages of MDF, namely local calibration with local information, while also capturing the fine-scale ecosystem heterogeneity common to fragmented or mixed-use landscapes.

Study area
The study area for this site covers northern England and the Scottish borders, spanning approximately 30 000 km 2 across 3 • of longitude and 1 • of latitude (Fig. 2). The region comprises a mosaic of land cover types, including coniferous plantation forest (including the nationally significant forestry estates of Kielder Forest, Eskdalemuir Forest, and Galloway Forest), fragments of broadleaf woodland, upland heath, arable agriculture, and pasture. The longitudinal extent stretches from coast to coast, from the Firth of Clyde in the west to the North Sea in the east. Elevation varies from sea level to a high of 978 m on Scafell Pike in the Lake District. These gradients in longitude and elevation are associated with gradients in both precipitation and temperature (Jenkins et al., 2009). Precipitation decreases from west to east in response to the prevailing westerly wind direction and orographic enhancement of rainfall in areas of high topography; temperature gradients are broadly controlled by elevation.
2.2 Model-data fusion with CARDAMOM (CARbon DAta MOdel fraMework)

DALEC
At the core of our model-data fusion framework sits DALEC, an intermediate complexity model of the terrestrial C cycle (Williams et al., 2005;Bloom and Williams, 2015;Smallman et al., 2017;Famiglietti et al., 2021). DALEC is a mass balance model of the C cycle with carbon moving through different pools based on parameterised fluxes (Fig. 3). A number of variants of DALEC have been created representing ecosystem carbon dynamics with varying degrees of complexity (Famiglietti et al., 2021;Smallman et al., 2021). The specific version of DALEC used here corresponds to the C6 model outlined in Famiglietti et al. (2021), which combines the C-cycle structure from Bloom and Williams (2015) with the revised photosynthesis model from Smallman and Williams (2019). There are four live biomass pools, specifically relating to carbon stored in fo-liage, labile carbon, fine roots and wood, and two dead organic carbon pools: litter and soil organic carbon. Carbon enters the system through GPP, modelled using the photosynthesis model ACM2 , wherein GPP is simulated as a function of modelled leaf area, estimated canopy photosynthetic efficiency, absorbed solar radiation, atmospheric CO 2 concentration, air temperature, and a stomatal conductance model that balances potential water supply from the soil (assumed to be at field capacity) through the roots with atmospheric demand, determined by absorbed solar radiation and vapour pressure deficit (VPD). Carbon is lost from the system via autotrophic and heterotrophic respiration. Net primary production (NPP) is allocated between autotrophic respiration and the live pools based on fixed fractions. Canopy growth is driven by a combination of direct allocation from GPP and transfer of carbon from the labile pool. The flux of carbon from the labile pool to foliage and canopy senescence, driving litterfall, are controlled by a simple day-of-year phenology model with a parameterised leaf life span (Bloom and Williams, 2015). Carbon flows from the roots and wood into the litter and soil organic carbon pools, respectively, based on first-order turnover rates. Heterotrophic respiration fluxes also follow first-order kinetics, but with an exponential temperature sensitivity. A full list of model parameters is provided in the Appendix (Table A1). The relative simplicity compared to other terrestrial biosphere models makes DALEC amenable to calibration in model-data fusion frameworks and allows propagation of uncertainties through large ensemble simulations (Bloom et al., 2016;Exbrayat et al., 2018;Famiglietti et al., 2021;Smallman et al., 2021).

Model-data fusion
Our model-data fusion framework, CARDAMOM (Bloom et al., 2016), uses a Bayesian approach within an adaptive proposal Markov chain Monte Carlo (AP-MCMC) framework (Haario et al., 2001) that can assimilate a range of information (Bloom et al., 2016), including remotely sensed leaf area index (LAI) and aboveground biomass (see Sect. 2.3). The premise of the approach is to take driving data describing the meteorology and disturbances such as forest clearance and fire and search the model parameter space to find parameter combinations that provide simulated dynamics that are consistent with the available data. Specifically, given a set of observations, O, with uncertainty σ , the probability of a given parameter set x, P (x|O), is calculated as a function of the likelihood of the observations given the current parameters, P (O|x), and any prior knowledge on the parameter distributions, P (x): The likelihood P (O|x) is calculated based on the misfit between the N available observations and the equivalent simu- (2) We use the Gelman-Rubin convergence criterion to determine whether multiple chains at each pixel have converged. The AP-MCMC (Haario et al., 2001) does not stipulate or target an acceptance rate; the emergent acceptance rate typically varies between 5 % and 25 %. The covariance matrix used in adapting the parameter sampling is generated from an initial phase of the MCMC. No hyperparameters are estimated as part of the process.
To facilitate the calibration process, we employ a series of ecological dynamic constraints, EDCs (Bloom and Williams, 2015;Smallman et al., 2017). EDCs comprise a series of mathematical rules and functions that impose conditions on the inter-relationships between model parameters to ensure ecological "realism" in the accepted parameter sets and are based on ecological theory (Bloom and Williams, 2015). For example, the turnover of the wood carbon pool must be slower than foliage turnover. Where EDCs are not satisfied, the likelihood is set to zero. By restricting the acceptable parameter space, the EDCs therefore reduce the effective model complexity (Famiglietti et al., 2021) and tend to reduce bias and equifinality in the calibrated ensembles (Bloom and Williams, 2015). The resulting ensemble of parameter sets encapsulates the uncertainty in the calibration within the available observational constraints.

Stratification approach for mosaic landscapes
Our approach to handling fine-scale heterogeneity during the model-data fusion process is based on sub-pixel stratification based on land use (Fig. 3). Stratification is achieved by sampling the spatially gridded EO data products at their native spatial resolution based on a reference land cover map, resampled to the same resolution using the modal category.
The specific land cover product used is the LCM2015 land cover map produced by the UK Centre for Ecology and Hydrology (CEH) (Rowland et al., 2017), which we aggre-gate to four classes: coniferous woodland, broadleaf woodland, arable/pasture, and heathland, which includes seminatural grasslands and widespread areas of non-wooded upland heath ( Fig. A1 in the Appendix). Urban and coastal areas are masked from all analyses. For each pixel, separate ensembles are calibrated independently, yielding a suite of ensembles that maintain the ecological fidelity of the calibrated parameters. While aggregation of observations to coarser resolution unavoidably results in information loss, stratification preserves the distinctions between functionally distinct ecosystems; therefore in this sense, the ecological fidelity of the resultant suite of ensembles is maintained within the limitations of the model structure and data quality. This is a contrast to the "traditional" model-data fusion approach, which aggregates the data constraints into pixel-level community averages prior to calibration, yielding calibrated parameter combinations that may be attempting to account for a multitude of distinct ecological processes. However, when considering the ecological fidelity of the calibrated models, it is important to note that CARDAMOM will attempt to find parameters that satisfy the observation constraints and their uncertainties, and therefore systematic errors associated with particular data streams will propagate to lead to parameter estimates that do not provide a good representation of the ecology. We discuss this in relation to the impact of overly seasonal LAI observations for conifer woodland canopies and the resultant impact on canopy parameters in Sect. 4.3.
The stratification approach is very flexible. The number of categories can be refined as necessary, within the data constraints. Regarding aggregation of uncertainties, we do not have constraints on the extent to which pixel uncertainties are correlated in space. Therefore for each stratum, spatial aggregation of uncertainty conservatively assumes correlated uncertainties (Exbrayat et al., 2018). However, we assume that individual strata, representing different ecosystems, are uncorrelated with other strata when aggregating sub-pixel ensembles to pixel level. For simplicity of comparison across the experiments in this study against the baseline experiments (i.e. no stratification), we use only one model structure across all strata and pre-process the assimilated data streams in the same way. For strata where woody tissues are not part of the dominant vegetation types, for example in areas covered by crop and pasture, the C Wood pool also provides a reservoir for non-woody structural tissue, with the differential allocation patterns and turnover rates reflected in the retrieved parameters. Importantly, different ecosystems could in the future be modelled with distinct, ecosystem-specific models that better capture their functional process dynamics. Relevant ecosystem-specific model variants have previously been integrated within the CARDAMOM framework, for example woodlands (Smallman et al., 2017), pasture (Myrgiotis et al., 2021), and arable agriculture (Revill et al., 2021). Given the computational limitations on the resolution of the model domain, stratification would be prerequisite to the inclusion of ecosystem-specific models within regional CAR-DAMOM applications.

Copernicus LAI 300 m
LAI data are obtained from the 300 m Copernicus LAI product v1.0 (Fuster et al., 2020) for the period 2014-2019. The LAI estimates in the Copernicus 300 m product represent 10 d composites from daily estimates of LAI that are generated from to daily top-of-atmosphere input reflectances detected by the PROBA-V satellite by applying a neural network. These 10 d LAI estimates were aggregated to monthly averages prior to assimilation. Pixel-wise uncertainty estimates are also provided with this product, calculated as the root-mean-square difference between the individual daily neural network estimates and the 10 d average. Previous work has indicated that these uncertainty estimates underestimate the true uncertainties associated with this product (Zhao et al., 2020). We therefore used a more conservative temporal aggregation approach based on the maximum uncertainty within the aggregation period.

ESA Biomass CCI aboveground biomass 2017 and 2018
Aboveground biomass (AGB) estimates and associated uncertainty were extracted the global maps published within the ESA Biomass Climate Change Initiative (CCI) collection (Version 2), comprising two estimates for the years 2017 and 2018 with a spatial resolution of 100 m (Santoro et al., 2021). The ESA CCI Biomass data are derived from synthetic aper-ture radar (SAR) backscatter data, specifically ALOS PAL-SAR L-band SAR backscatter combined with Sentinel-1 Cband SAR backscatter. Uncertainty estimates are provided with this product, calculated as the standard deviation associated with the AGB estimate after propagating errors through the SAR measurement, SAR-AGB modelling framework, and merging of L-band and C-band estimates into an overall AGB estimate (Santoro et al., 2021). The DALEC wood carbon pool represents the combination of above-and below-ground carbon (i.e. including the coarse root component). The contribution from belowground biomass (BGB) to the woody biomass pool, alongside the associated uncertainty, is modelled using an allometric relationship following Saatchi et al. (2011): (3)

SoilGrids2 soil organic carbon (SOC)
Soil organic carbon estimates and associated uncertainties were obtained from SoilGrids2, which provides 250 m resolution spatial maps of depth profiles for various soil properties (Poggio et al., 2021). These maps were produced using EO and auxiliary spatial data within a machine learning framework trained on over 230 000 individual soil profile observations. The extracted SOC estimates are used to set a prior constraint on the initial SOC stock. As there is no date associated with the SoilGrids2 dataset, we use these estimates, with their uncertainty, to provide a prior constraint on the initial SOC stocks. This contrasts with our treatment of the LAI and AGB data, which are associated with specific time periods and therefore used as observational constraints on the simulated time series. However, the aggregation of the original SoilGrids2 data layers for the baseline and stratified experiments follows the same procedure.

Disturbance
Disturbance is imposed on DALEC based on satellite observations of tree cover loss and burned area. Disturbances related to tree cover loss are driven by observations from the Global Forest Watch (GFW) dataset (Hansen et al., 2013), which provides annual constraints on tree cover loss at 30 m resolution based on Landsat data. Note that other mechanisms of disturbance, such as agricultural harvests and pasture management, are not considered in the current analysis.
To convert area estimates of tree cover loss into changes in C stocks, we use a simple clearance model in which a fraction of the C stored in C wood , C foliage , and C labile is removed based on the pixel fraction (or stratum-specific sub-pixel fraction) identified within the GFW dataset as experiencing tree cover loss. In practice, most tree cover loss occurs in the conifer woodlands and is therefore concentrated in these woodlands in the stratified analysis, compared to the baseline experiments, in which we do not consider the sub-pixel distribution of land cover. Fire is imposed based on monthly aggre- Figure 3. Schematic flow diagrams illustrating the different model-data fusion approaches employed in this study: the "traditional" modeldata fusion approach whereby the input data are aggregated to pixel-level "community averages" and stratification based on the land use leading to calibration of a suite of land-use-specific ensembles. At the heart of both MDF approaches sits DALEC, an intermediate complexity model of the terrestrial C cycle (Bloom and Williams, 2015). While the presented time series show specifically the change in live biomass, dC bio , it is important to note that similar information is retrieved for all fluxes and stocks within DALEC, alongside pixel-specific parameter ensembles.
gated burnt area fractions in the MODIS MCD64A1 product (Giglio et al., 2018), which maps fire-affected areas at 500 m resolution based on changes in surface reflectance, although the occurrence of MODIS-detected fires throughout the model domain was very low. Emissions from fire are estimated by assuming a fraction of simulated biomass either undergoes combustion, therefore immediately released to the atmosphere, or is transferred to the litter pool, based on tissue specific combustion-completeness factors (Exbrayat et al., 2018).

Experimental setup
To test how simulated C fluxes varied with grid resolution, we calibrated DALEC across the target domain at four different grid resolutions: 0.05, 0.25, 0.50, and 1.00 • , at a monthly time step spanning the period 2014-2019. We compared the retrieved parameters and simulated C fluxes for two MDF approaches: the proposed stratified CARDAMOM calibration that explicitly accounts for sub-pixel heterogeneity in land use and the traditional pixel aggregate CARDAMOM calibration. The latter serves as a baseline. In the baseline experiments, the observation streams were aggregated to the domain resolutions, and these community average environmental signals were assimilated into a single ensemble. In the stratified experiments, the individual observation streams were stratified at their native resolutions based on the dominant category from the high-resolution land cover map, and then these strata-specific subsets of observations were aggregated to the resolution of the model domains before assimilation. In all cases we use the same underlying DALEC model structure within the MDF framework (Fig. 3). Emergent differences in the retrieved parameters, stocks, and fluxes between experimental runs are therefore a consequence of the resolution at the observations are aggregated, rather than ecosystem-specific differences in model structure. The resolution of the meteorological data in all cases is 0.5 • ; therefore, our analysis does not allow us to test the extent to which resolving fine-scale variations in meteorological forcing impacts on the overall C balance. We characterise the calibration performance for each ensemble based on the RMSE and the bias with respect to the N assimilated observations: In both cases, we calculate pixel-level metrics and then weight the contributions from individual pixels when aggregating across the domain based on the fractional coverage contributed by each stratum. As the observations are also associated with significant uncertainty, we also consider the ratio of the RMSE and bias to the product uncertainty as a measure of agreement within the uncertainty constraints provided by the assimilated data. Values > 1 would indicate situations where the model was not able to fit the observations to within their associated uncertainty. Different data streams may provide inconsistent and/or incompatible information, for example due to data biases or incorrect specification of uncertainty (Zhao et al., 2020). This could lead to larger RMSE/σ ratios as CARDAMOM attempts to balance inconsistent information. Larger model-data mismatch could also indicate model structural error. Values < 1 may indicate improved constraints based on the combination of assimilating complementary data streams and the ecological knowledge embedded in the model and EDCs.
We are able to address our first two hypotheses (H1, H2), relating to the impact of resolution and sub-pixel stratification on diagnostic analyses of C cycle dynamics, by comparing the changes in C stocks and fluxes over the data assimilation period. H3 is addressed by comparing the retrieved parameters for each run, including the individual land use classes in the stratified analysis and how these distributions shift depending on the land use and on the spatial resolution of the analysis. To understand the potential impact of any emergent differences in the retrieved parameters on future trajectories, we then ran forward simulations of our DALEC ensembles to 2100 under the SSP2-4.5 W m −2 scenario extracted from the UK Earth System Model (UKESM; Sellar et al., 2019) contribution to CMIP6 , which corresponds to a middle-of-the-road scenario with a projected mean global warming of 2.7 • C (O'Neill et al., 2016). We do not impose future disturbance fluxes, so emergent differences in C dynamics will be driven by the interactions between climate and the retrieved parameters for each ensemble. To avoid step changes in meteorology between the historical meteorology (from observations) and future meteorology (simulated by UKESM), we apply the future trajectories for each scenario based on the anomaly in the UKESM forecast relative to 2019 (following Smallman et al., 2021).

Calibration performance
The two MDF approaches tested provided comparable fits to the calibration data. Both the baseline and stratified CAR-DAMOM calibrations were able to fit the assimilated C Wood and LAI observations to well within the levels of observation uncertainty, with the RMSE between simulated and observed variables less than 50 % of the uncertainties attached to the assimilated observations (Table 1, Figs. A2, A3). In general the RMSE values were comparable between the stratified and baseline experiments for LAI (mean RMSE for C Wood across spatial resolutions: 14.0 % for the baseline experiment; 13.6 % for the stratified experiment), while the RMSE for C Wood was slightly lower for the stratified experiment (mean RMSE for C Wood across spatial resolutions: 14.0 % for the baseline experiment; 13.6 % for the stratified experiment). For both LAI and C Wood , the RMSE tended to increase at finer spatial resolutions in both the baseline experiments and the aggregated stratified experiment (Table 1), although the resolution-dependent trend was not consistent between individual strata, Table A2). An increase in RMSE at finer spatial resolutions can be rationalised by the smoothing effect of aggregating the remote sensing products over larger spatial scales. This not only removes the impact of high frequency random noise in the assimilated signal but also removes variability generated by local processes (e.g. management) that are not accounted for in the relatively simple treatment of canopy dynamics encoded in the model. In the stratified experiment, the bias in C Wood was dominated by the contributions from the woodland strata (Table A2), corresponding to their much greater C Wood stocks, which were over 4 times higher in coniferous woodland than arable/pasture and 6 times higher than in the heathland class (Table A2, Fig. A3). Notably, all strata, including the coniferous woodland class, contained a strong seasonal cycle of monthly LAI in both the assimilated observations and the simulations (Fig. A2).

Terrestrial C budget and impact of spatial resolution on C flux estimates
Both the baseline and stratified CARDAMOM analyses estimated the net ecosystem exchange of C to be most likely a net sink of C over the calibration period. C uptake from the atmosphere via GPP was ∼ 5 gC m −2 d −1 (Table 2; for stratified experiment, 0.05 • , GPP = 4.87 (3.75-5.95) gC m −2 d −1 ), while C returned to the atmosphere via autotrophic and heterotrophic respiration (R eco ) was ∼ 4.5 gC m −2 d −1 (Table 2; for stratified experiment, 0.05 • , R eco = 4.57 (3.33-6.07) gC m −2 d −1 ). At increasingly fine spatial resolutions the model reveals greater spatial variability, reflecting the impact of landscape features and topography that are only adequately resolved at the finest grid scale (Fig. A4). However, median estimates of GPP and R eco were relatively insensitive to the spatial resolution once aggregated across the spatial domain (Fig. 4), varying by ≤ 0.05 gC m −2 d −1 ( Table 2). The simulated uncertainties were smaller in the stratified experiment compared to the baseline for both GPP and R eco (Fig. 7). The reduced uncertainty with stratification is a result of assuming independence between strata. If the uncertainty in these fluxes is assumed fully correlated across strata, then the combined uncertainty in the stratified experiment is comparable to that of the base- line experiment. The degree to which stratification reduces uncertainty is therefore determined by the extent to which strata are considered independent. In contrast to GPP and R eco , the disturbance flux exhibited a stronger resolution dependence in the baseline experiment (1.00 • : 0.027 (0.013-0.047) gC m −2 d −1 ; 0.05 • : 0.038 (0.016-0.073) gC m −2 d −1 ), while the disturbance flux is insensitive to resolution in the stratified experiment (0.054 (0.019-0.112) gC m −2 d −1 ; see Table 2). In this set of ex-periments, the disturbance flux is driven by forest harvest (i.e. tree cover loss), as fire was negligible, affecting only 16 pixels across the finest-resolution domain across the entire period of analysis. The baseline experiments simulate a tendency towards increasing C bio stocks for all four spatial resolutions, with the median simulated sink strength increasing at coarser grid resolutions. This emergent scale-dependent sensitivity of dC bio is not shared by the stratified experiment, for which median dC bio is more consistent across the range of spatial resolutions. The disagreement in C accumulation between the two approaches is reduced at finer-resolution model domains. Considering the median cumulative dC bio , the resolution-dependent bias for the baseline experiment compared to the stratified experiment declines by ∼ 50 % moving from the 1.00 • to the 0.05 • domain (Table 2). However, the two approaches do not reach convergence even at 0.05 • resolution (Fig. 4).
Harvest fluxes in the stratified ensemble setup were between 2 and 1.4 times higher than the baseline ensemble, with the difference increasing systematically at coarser grid resolutions (Fig. 5). Areas declining in dC bio are generally focused around the primary commercial forestry regions, where timber harvest is most abundant, for both the baseline and stratified experiments (Fig. 6). Across these regions hosting significant areas of conifer woodland, the magnitudes of simulated net losses in the live carbon pools are generally greater for the stratified ensemble. Comparing the differences in dC bio estimated by the stratified and baseline approaches on a pixel-wise basis (Figs. 6, A6), it is apparent that the stronger relative biases in dC bio correspond to parts of the domain with higher harvest fluxes. In turn, differences in harvest flux between the stratified and baseline experiments are  . A comparison of changes in the C stocks aggregated across the live C pools for the baseline and stratified ensembles for the 0.05 • domain (a, c); the relationship between the differences in simulated changes in live C stocks (dC bio ) and the difference in simulated harvest fluxes for the two approaches (b); and the difference in the simulated harvest fluxes, i.e. (stratified − baseline), normalised by the area of the pixel harvested, as a function of the fraction of the pixel covered by coniferous woodland (d). Large differences in simulated stock changes were generally associated with corresponding differences in the simulated timber harvest fluxes. The difference between the stratified and baseline harvest fluxes are greater in pixels with a mix of coniferous woodland and other land cover classes and diminish in more homogeneous pixels. A similar figure for the 0.25 • domain is provided in the Appendix (Fig. A6). strongly influenced by the level of sub-pixel heterogeneity (Figs. 6, A6), with the difference in the harvest flux between the two experiments declining as the fraction of pixels covered by the coniferous woodland class (where both harvest and live C stocks are concentrated) approaches full cover- age. In other words, there is little difference between the two experiments where pixels have homogeneous land use.

Impact of stratification on calibrated parameters and future C dynamics
The stratified data assimilation scheme reveals emergent differences between ecosystems, while traits retrieved for the baseline experiments characterised intermediate values (Figs. 8, A7, A8, A9). In the baseline experiments, comparing across domain resolutions, it is apparent that aggregation to coarser spatial resolutions reduces the range of retrieved traits. The reduction in the widths of the retrieved parameter distributions highlights the loss of information relating to variations of fundamental aspects of ecosystem function in the baseline experiments, as the resolution of ecological gradients is lost. In contrast, it is evident that stratification leads to a reduction in the ecological information loss when aggregating to coarser resolutions, as the widths of the aggregated parameter distributions are maintained. The most pronounced differences in the calibrated parameters are associated with parameters closely related to the dynamics of the C Wood pool (Fig. 8). Within the study landscape, live C stocks are concentrated in the woodland classes, particularly the conifer woodlands. Higher C stocks are reflected in greater woody productivity (Fig. 8), higher mean residence times for the wood pool (MRT Wood ; Fig. A7), and higher allocation fractions of NPP to the woody pool (Fig. A9). In the baseline experiment, it is notable that large MRT Wood estimates are not represented within the distribution of median parameter estimates; this effect is magnified in the coarser domains, where the range of the simulated MRT Wood distribution contracts (Fig. 8). In contrast, the longer-lived C Wood pools in the woodland classes are retained across the different-resolution experiments in the stratified case (e.g. Fig. 8). The longest soil C residence times (MRT Soil ) were retrieved for the heathland class (Fig. A8), which includes upland areas underlain by peat deposits (Tanneberger et al., 2017). Again, the longer MRT Soil estimates of the heathland stratum are not well represented within the baseline ensemble, and as with MRT Wood , this loss of ecological information is exacerbated at coarser resolutions. Notably, there was little variation in leaf lifespan between classes (Fig. A7), reflecting the strong seasonal cycle in the assimilated LAI observations for all classes, including coniferous woodland (Fig. A2). Differences in the parameter ensembles calibrated for each stratum lead to contrasting C dynamics in response to future climate change (Fig. 9), although the projection uncertainties are large (Fig. A10), reflecting the significant role of parameter uncertainty in forecasts . The forecast simulations do not include any impacts of future disturbance, so the evolution of dC bio post-2020 was driven only by biogenic processes, which were not strongly scale-dependent in our experiments. Both the stratified and baseline ensembles simulated increasing live C stocks up to 2100, assuming no disturbance (i.e. responding to climate and CO 2 only), with the rate of growth tailing off in the latter half of the forecast period. While the trajectories of the baseline and stratified ensembles were similar, the individual strata evolved along divergent paths (Fig. 9). The increase, and rate of increase, of live C density (dC bio ) tracked differences in the retrieved MRT Wood (Fig. 8). Accumulation was greatest for the coniferous and broadleaf forest strata, for which the median dC bio increased throughout the simulation period. In contrast, the heathland class and arable/pasture class accrued C in live biomass more slowly and had stopped accumulating C by the end of the forecast scenarios. Of the two forest strata, the median rate of increase in C density was greatest for coniferous forest (Fig. 9). The future trajectories were comparable across the range of spatial resolutions (Figs. A11, A12, A13).  of mixed land use in northern England and southern Scotland, we found contrasting resolution dependence for spatially distributed biogenic fluxes (GPP, R eco ) and disturbance fluxes, typically restricted to specific land uses (e.g. timber harvest). In our baseline experiments, where we aggregated the assimilated data to the domain resolution without considering the underlying land use, GPP and R eco were insensitive to the grid resolution for model domains spanning a resolution range of 1 to 0.05 • (Table 2, Fig. 4). While we expected the biogenic fluxes to be less sensitive than exoge-nous fluxes, the relative invariance in biogenic fluxes with respect to both resolution and method was surprising. Within the version of DALEC used, GPP is estimated for each pixel as a function of leaf area and meteorology drivers, modulated by the retrieved canopy photosynthetic efficiency parameter. R a is estimated as a parameterised fixed fraction of GPP, while R h is proportional to the C stocks in the litter and soil and exponentially related to temperature. The resolution of the meteorological driving data was 0.5 • resolution, so it did not resolve differences in the local meteorological con-ditions across the range of grid resolutions. Additionally, the time series of assimilated LAI did not exhibit strong variations between strata (Fig. A2). Together these might explain the relative insensitivity of biogenic fluxes to resolution observed in our experiments. A previous attempt to estimate the impact of meteorological drivers failing to account for local conditions suggested a corresponding error on net ecosystem exchange estimates of ∼ 10 % within 100 km of a met station, compared to an estimated parameter uncertainty of ∼ 50 % (Spadavecchia et al., 2020). Therefore, while fine-scale meteorology may drive fine-scale variability in biogenic C fluxes, we anticipate that the impacts are likely to be secondary to the differences between land cover types within a heterogeneous landscape. However, the influence of fine-scale meteorological variation is an aspect worthy of future research. Our model also ignored changes to litter pools associated with harvest; a more complex treatment incorporating coarse and fine residues might lead to greater sensitivity in R h . Conversely, disturbance fluxes, dominated in this case by timber harvest, showed a clear resolution dependence, with flux estimates approaching the stratified estimates at the finest grid resolutions (Fig. 5), although the baseline estimates did not reach convergence even at 0.05 • resolution (Table 2, Fig. 4). The emergent resolution dependence is a consequence of the fact that AGB is not distributed evenly within the landscape but concentrated in woodlands, areas which unsurprisingly correlate strongly with the distribution of timber harvest. Failing to account for this localisation of disturbance within distinct ecosystems when aggregating to coarse spatial domains therefore results in a significant negative bias in the simulated C losses from the live C pools. In contrast, in the stratified framework, disturbance fluxes were insensitive to resolution, giving a consistent estimate of the carbon balance across the resolution ranges considered.

Impact of heterogeneity on model parameters and ecosystem response to future climate H3
We found that by stratifying the landscape prior to MDF, the variability in ecosystem function exhibited between ecosystems, manifest in their retrieved parameters, was retained across the range of spatial resolutions considered (Fig. 8).
Conversely this ecological information is degraded if data are aggregated without considering a priori the underlying distribution of land use and land cover, demonstrated by the contraction of the distribution of median parameters across the model domains in the baseline experiments compared to the stratified experiments. Critically, this misrepresentation of ecosystem function was exacerbated at the coarser grid resolutions commonly employed in large-scale MDF applications, demonstrated by the contraction of the distribution of median parameters across the model domains. The degradation of ecological information is exemplified by our attempts to constrain mean residence times in the long-lived wood and soil pools, which are critical for understanding the potential carbon sink of terrestrial ecosystems (Luo et al., 2015;Smallman et al., 2021): in the baseline experiment, where data were the relationship between stocks and land use was ignored, the longer residence times specific to woodland ecosystems (MRT Wood ; Fig. 8) and heathland areas supporting C-rich peat deposits (MRT Soil ; Fig. A8) were not wellrepresented by the posterior parameter estimates, particularly in the coarser model domains. In this sense, stratification led to the retention of greater ecological fidelity in the model ensemble when aggregating to coarser spatial resolution domains. The overall ecological fidelity of the model representation will also be limited by the process representation embedded in the model structure and in the fidelity of the observation data to the relevant characteristics of the actual ecosystem. Prior research has demonstrated that CARDAMOM can retrieve trait differences across biomes (Bloom et al., 2016;Smallman et al., 2021). We demonstrate that CARDAMOM can also retrieve ecosystem-specific traits in mosaic landscapes when the assimilated data are stratified based on prior knowledge of land use, as long as the observations available for assimilation faithfully convey the ecosystem characteristics. Given the relative importance of parameter uncertainty on future trajectories of the terrestrial C cycle , stratification also presents significant opportunities to take advantage of the Bayesian framework embedded within CARDAMOM by taking advantage of the prior information on land cover and land use, for example using global trait databases (e.g. Kattge et al., 2020), to inform parameter prior estimates. In our prognostic experiments, we explored the potential future response of our calibrated ecosystem model under different levels of climate forcing (Fig. 9), highlighting divergent future C accumulation, which we suggest is largely related to differences in MRT between strata (Luo et al., 2015;Smallman et al., 2021). Ecosystem responses to climate and disturbance are modulated by their functional traits (e.g. Greenwood et al., 2017). In our experiments, the differences in simulated future C accumulation reflect differences in the calibrated parameters between strata, characterising the differing functional traits of the represented ecosystems. In our study landscape, differences in forecast C accumulation between strata counteracted to give overall trajectories that were similar to the baseline experiments. However, in practical terms, understanding the differential dynamics between ecosystems is likely to be important for the utility of forecasts for land managers and policy makers operating in heterogeneous landscapes . Ecosystem-specific calibrations may also be particularly important for understanding the future trajectories of heterogeneous landscapes in regions close to thresholds of abrupt ecological change (Turner et al., 2020).

Limitations of current approach and future work
A key advantage of spatially distributed MDF approaches, such as CARDAMOM, is that model parameters are calibrated locally based on the available observations of the ecosystem. Nevertheless, deficiencies in the assimilated observation streams and/or their uncertainty estimates will propagate to affect the calibrated parameters (Zhao et al., 2020). Excessive seasonality in satellite-derived estimates of LAI has been documented previously in coniferous forests. For example, Heiskanen et al. (2012) found that for a conifer forest in southern Finland, the seasonal course of satellite LAI estimates systematically underestimated LAI in the winter months, resulting in exaggerated seasonality compared to local site observations. Likewise, in our stratified experiments, it was notable that the satellite-derived LAI for coniferous woodlands stratum exhibited strong seasonality. As a consequence of the propagation of this systematic error in the assimilated observations, the calibrated leaf lifespans were indistinguishable from deciduous systems (Fig. A7). Secondly, repeated estimates of AGB have been demonstrated to significantly reduce uncertainties by helping to constrain the residence times of C within the ecosystem (Smallman et al., 2017). In our experiments we used observations of AGB for two time points but spaced only a year apart (i.e. 2017, 2018; Santoro et al., 2021). The spacing of these estimates is short compared to the residence times of aboveground C in forests and woodlands (Fig. A7. Improved constraints on the residence times and trajectories of the long-lived C pools may potentially be possible with additional, comparable, AGB estimates with greater temporal separation and reduced uncertainty. Stratification provides flexibility to improve the process representation for specific ecosystems when applying model-data fusion in heterogeneous landscapes. In the context of the UK, top-down estimates of the terrestrial C balance suggest a pulse of emissions late in the summer, coincident with the main harvest season that is not observed in bottom-up CARDAMOM simulations based on a similarly simple DALEC model structure employed here (White et al., 2019). Stratification by itself does not resolve this discrepancy. In contrast, we find that the temporal patterns of simulated R eco were indistinguishable between the baseline and stratified experiments across all resolutions (Fig. 4). However, stratification provides the basic framework within which to add ecosystem sub-models that explicitly model land use in agricultural settings, for which there are already candidate variants of DALEC. For arable systems, DALEC-Crop simulates the C dynamics associated with the growth, development, and harvest of crops (Sus et al., 2010); DALEC-Grass models the impact of grazing and mowing in managed pasture (Myrgiotis et al., 2020(Myrgiotis et al., , 2021. A next step is to integrate these two sub-models into the stratified CARDAMOM framework. While both DALEC-Crop and DALEC-Grass have been validated at the field scale (Revill et al., 2021;Myrgiotis et al., 2020), validation of the nationalscale C balance, where individual pixels contain multiple fields, is challenging. One approach would be to return to the atmospheric inversion estimates (White et al., 2019) and test the extent to which adding this additional process representation resolves the temporal discrepancies during the harvest period.
Finally, in our stratified framework, CARDAMOM retrieves model ensembles that best represent the observations within the calibration period based on a specified set of land use classes extracted from the LCM2015 land cover map (Rowland et al., 2017). However, land use is not static. The terrestrial biosphere is a dynamic environment, with environmental and anthropogenic change driving temporal shifts in land use and cover that we need to be able to account for. The UK has a relatively stable land use configuration; however, landscapes with significant land use change present a challenge, as the current implementation of the MCMC within CARDAMOM assimilates the full time series of calibration data to calibrate a set of time-invariant parameters. This carries the advantage of constraining the model parameters using as many observations as possible. Other frameworks employing sequential approaches to data assimilation provide scope for parameters to shift through time as new data are assimilated, such as four-dimensional ensemble variational data assimilation (e.g. Pinnington et al., 2020) and particle filters (e.g. Montzka et al., 2011). Adapting CARDAMOM to deal with land use change is therefore an important target for future development, although such efforts will be reliant on reliable, frequently repeated information on shifting land use (e.g. Souza et al., 2020). The static parameterisation also presents a limitation when forecasting ecosystem responses to climate change, as we do not account for either adaptive management strategies or shifts in species composition that may drive functional shifts in the terrestrial C cycle. Consequently, forecasts become increasingly uncertain as the environmental conditions deviate from the calibration period. Moreover, capturing these complex functional responses to future climate within ecosystem models remains a major challenge (Fisher and Koven, 2020).

Broader implications for constraining the terrestrial C balance
Landscapes frequently host a mosaic of contrasting land uses and land cover types, with heterogeneity exhibited at scales of tens to hundreds of metres. Modelling the land surface carbon balance across large scales requires the aggregation of ecosystems to resolutions that are computationally feasible. Adequately handling this scaling challenge is critical to avoid the introduction of biases into the simulated C balance, particularly when dealing with heterogeneous landscapes.
Our results indicate that localised, ecosystem-specific fluxes, such as those related to disturbance, are particularly sensitive to resolution in mixed-use landscapes (Fig. 6). Disturbances through logging, clearance for agriculture, or fire are important determinants of the carbon balance in many forest landscapes, globally (Gatti et al., 2021;Harris et al., 2021). Deforestation and degradation are often concentrated at forest edges and coincident with forest fragmentation (Brink et al., 2017;Matricardi et al., 2020). These hotspots of environmental change are therefore characterised by the juxtaposition of contrasting ecosystems in fragmented landscapes, highlighting the importance of stratification within MDF frameworks constraining the carbon balance in forested regions globally. The C pools within the comparatively simple Ccycle model structure of DALEC (Bloom and Williams, 2015;Smallman et al., 2017) map well onto the IPCCrecommended pools for reporting greenhouse gas emissions in the agriculture, forestry, and other land use (AFOLU) sector (IPCC, 2019). By combining multiple, spatially explicit observation streams with ecological theory embedded in models, MDF approaches such as CARDAMOM ensure conservation of mass balance, ecological "common sense" in the retrieved parameter sets and simulated temporal dynamics, and transparent propagation of uncertainties when quantifying ecosystem C fluxes . In addition, sub-pixel stratification within CARDAMOM enables sector-specific interrogation of the terrestrial C budget even in mosaic landscapes. For example, in the stratified experiment it is evident that C losses driven by timber harvest over the simulation period result in coniferous woodlands tending to lose C over time. The magnitude and spatial extent of these losses is only readily apparent when we stratified CARDAMOM based on land use. CARDAMOM, facilitated by land use stratification, therefore has clear potential to feed into Tier-3 greenhouse gas emissions reporting to the UN-FCCC (IPCC, 2019).

Conclusions
Quantifying the current and future terrestrial C balance is essential to understand the stability of terrestrial ecosystems facing rapid environmental change and to support robust national reporting of land-based CO 2 emissions. Bayesian MDF frameworks like CARDAMOM integrate the ecological knowledge embedded within models with constraints provided from a range of observation sources and their associated uncertainties, thus providing self-consistent, massbalanced estimates of systemic C cycling (Luo et al., 2011;Bloom et al., 2016;Smallman et al., 2017). However, when applying MDF across large regions, we are confronted by the challenge of capturing the innate complexity of terrestrial ecosystem with spatial and temporal resolutions that are computationally feasible. This challenge is particularly severe in heterogeneous landscapes with a mosaic of land uses. Failure to account for sub-pixel ecosystem heterogeneity within MDF inversions leads to bias in flux estimates and degradation of the ecological information embedded within the calibrated model ensembles. We explored the carbon balance for a region of ∼ 30 000 km 2 in the UK using a range of spatial resolutions (0.05-1.0 • ). In our baseline experiment (ignoring sub-pixel heterogeneity), disturbance fluxes in particular exhibited a resolution-dependent negative bias that was exacerbated both at coarser grid resolutions and as landscape fragmentation increased. Accounting for finescale structure of land use through stratification resolved this scale dependence and yielded higher disturbance fluxes. Stratification also enabled CARDAMOM to retrieve parameter ensembles that preserved the differences in ecological function between different land uses.
Stratification within CARDAMOM therefore provides three key benefits: (i) stratification reduces the scale dependence of flux estimates, facilitating scaling of CARDAMOM applications across larger spatial domains; (ii) stratification provides transparency for sector-level estimates of the terrestrial carbon balance that could be integrated into Tier-3 national emissions reporting frameworks; and (iii) by separately analysing distinct ecosystems within fragmented landscapes, the loss of ecological information associated with aggregation to coarse resolutions is limited. Where the observations are accurate and model structure appropriate, this should improve the ecological fidelity of the calibrated models, enable more robust ecological forecasting, and raise the prospect of mapping spatial variations of ecosystem functional traits based on a diverse range of EO data. Future work will build on this stratification framework to build in more detailed process representation to better account for C fluxes in managed arable (Revill et al., 2021) and pasture landscapes (Myrgiotis et al., 2021). Finally, landscape fragmentation and disturbance, whether driven by logging, agriculture, or fire, are important determinants of the carbon balance globally. Therefore, while the focus of this study is a temperate landscape within the UK, these results carry broader significance for the application of MDF frameworks to constrain the terrestrial C balance at regional and national scales.   Table 1 and for individual strata in Table A1. Figure A3. Simulated C Wood time series and assimilated observations, presented with uncertainty, aggregated from the 0.05 • domain. Shaded bands represent the 50 %, 90 %, and 95 % confidence intervals, assuming full spatial correlation of uncertainties across the domain. Calibration performance statistics for all resolution domains are summarised in Table 1 and for individual strata in Table A1. Figure A4. Temporally averaged GPP for the baseline and stratified ensembles, for domains of 0.50, 0.25, and 0.05 • resolution. Note the 1.00 • domain is not shown. The bottom row shows the distributions of median GPP across the domain for the baseline and stratified ensembles and demonstrates the diminishing variability in simulated GPP when data are aggregated to spatial domains with coarser resolution. Figure A5. Spatially aggregated time series for GPP and ecosystem respiration (R eco = R a + R het ) and the cumulative change in carbon stocks in the live (dC bio ) and soil (dC soil ) pools, shown for the baseline and stratified ensembles for the 0.05 and 1.00 • resolution domains. The median estimates are plotted with the shaded regions representing the 50 %, 90 %, and 95 % confidence intervals. The 1.00 • simulation results are plotted in grey scale, but for the most part, the ensemble ranges overlap. Figure A6. A comparison of changes in the C stocks aggregated across the live C pools for the baseline and stratified ensembles for the 0.25 • domain (a, c); the relationship between the differences in simulated changes in live C stocks (dC bio ) and the difference in simulated harvest fluxes for the two approaches (b); and the difference in the simulated harvest fluxes, i.e. (stratified − baseline), normalised by the area of the pixel harvested, as a function of the fraction of the pixel covered by coniferous woodland (d). Large differences in simulated stock changes were generally associated with corresponding differences in the simulated timber harvest fluxes. The differences between the stratified and baseline harvest fluxes are greater in pixels with a mix of coniferous woodland and other land cover classes and diminish in more homogeneous pixels. Figure A7. Comparison of the retrieved residence times for the live carbon pools, for the individual strata and the baseline retrieval for the different resolution domains. Distributions represent the pixel-level median parameter estimates weighted by the pixel fraction estimated associated with each stratum. Figure A8. Comparison of the retrieved residence times for the dead organic matter carbon pools, for the individual strata and the baseline retrieval for the different resolution domains. Distributions represent the pixel-level median parameter estimates weighted by the pixel fraction estimated associated with each stratum. Figure A9. Comparison of the retrieved allocation fractions the partitioning of NPP between the live organic carbon pools, for the individual strata and the baseline retrieval for the different resolution domains. Distributions represent the pixel-level median parameter estimates weighted by the pixel fraction estimated associated with each stratum. Note that labile carbon is subsequently allocated to foliage. Figure A10. Ensemble distributions for the spatially aggregated forecasts of dC bio projected by 2100 under SSP2-4.5 W m −2 scenario extracted from the UK Earth System Model (UKESM; Sellar et al., 2019) contribution to CMIP6 . Stock changes represent cumulative differences from 2016-2100. Uncertainties are assumed to be fully correlated in space but independent across the different land cover strata in the stratified ensemble. Figure A11. Evolution of live C pools aggregated across the baseline and stratified domains, alongside the individual strata, under three future trajectories of climate change. Climate trajectories were extracted from the UK Earth System Model (UKESM; Sellar et al., 2019) contribution to CMIP6 . No disturbance was simulated after the end of the calibration period. The presented trajectories were taken from the 0.05 • domain and represent the median estimates of the ensemble forecasts. Figure A12. Evolution of live C pools aggregated across the baseline and stratified domains, alongside the individual strata, under three future trajectories of climate change. Climate trajectories were extracted from the UK Earth System Model (UKESM; Sellar et al., 2019) contribution to CMIP6 . No disturbance was simulated after the end of the calibration period. The presented trajectories were taken from the 0.25 • domain and represent the median estimates of the ensemble forecasts. Figure A13. Evolution of live C pools aggregated across the baseline and stratified domains, alongside the individual strata, under three future trajectories of climate change. Climate trajectories were extracted from the UK Earth System Model (UKESM; Sellar et al., 2019) contribution to CMIP6 . No disturbance was simulated after the end of the calibration period. The presented trajectories were taken from the 1.00 • domain and represent the median estimates of the ensemble forecasts. Table A1. Description of parameters estimated for the DALEC model used in this study (Bloom and Williams, 2015). Each parameter is given a name, unit, and description. Gross primary productivity -GPP, autotrophic respiration -R a , autotrophic maintenance respiration -R m , heterotrophic respiration -R h . Litter is assumed to be the combined foliage and fine root litter pools. Note that GPP allocation fractions are applied sequentially such that GPP allocation to C wood = GPP − (GPP · R a / GPP) − (GPP · GPP lab ) − (GPP · GPP root ).

Name
Units Description Leaf mass per unit leaf area Ceff g(C) m −2 d −1 Potential photosynthetic activity per unit leaf area Initial labile g(C) m −2 Size of the labile C pool at time step 1 Initial foliage g(C) m −2 Size of the foliar C pool at time step 1 Initial fine root g(C)m −2 Size of the fine root C pool at time step 1 Initial wood g(C)m −2 Size of the wood C pool at time step 1 Initial litter g(C)m −2 Size of the litter C pool at time step 1 Initial soil g(C)m −2 Size of the soil C pool at time step 1  Table A2. Summary of calibration performance for the individual strata in the stratified CARDAMOM experiment, aggregated across the domains. σ represents the uncertainty of the assimilated observation data; thus RMSE/σ provides the ratio of the RMSE relative to the uncertainty attached to the observation constraint. The values in parentheses following the RMSE and bias estimates indicate the percentage relative to the mean of the observations across the domain. Code and data availability. The driving data and selected carbon cycle outputs have been archived (Milodowski et al., 2023a) at Edinburgh DataShare: https://doi.org/10.7488/ds/3843. The code used to generate the analysis and figures presented in this article is archived (Milodowski et al., 2023b) on Zenodo: https://doi.org/10.5281/zenodo.8146687. To request access to the repository, please contact either David T. Milodowski or Mathew Williams.
Author contributions. DTM, TLS, and MW created the experimental design. DTM ran CARDAMOM and analysed the outputs. DTM led the writing with inputs from TLS and MW.
Competing interests. The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements. This work has made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF) (https://www.ecdf.ed.ac.uk/, last access: 4 July 2023). LAI information was generated by the Global Land Service of Copernicus, the Earth Observation programme of the European Commission. Aboveground biomass maps for 2017 and 2018 were provided by the European Space Agency through their Climate Change Initiative (CCI) Biomass project. Soilgrids2 soil organic carbon maps were made available by the International Soil Reference and Information Centre (ISRIC). The University of Edinburgh CARDAMOM framework is available on GitHub (https://github. com/GCEL/CARDAMOM, last access: 4 July 2023) with permissions provided on request to the corresponding author. The Jet Propulsion Laboratory CARDAMOM framework is also available on GitHub (https://github.com/CARDAMOM-framework/, last access: 4 July 2023 -contact abloom@jpl.nasa.gov for access).
Financial support. David T. Milodowski and Mathew Williams were funded primarily by the NERC DARE-UK project (NE/S003819/1). T. Luke Smallman and Mathew Williams were additionally supported by the UK's National Centre for Earth Observation. Mathew Williams also received funding from the Royal Society.
Review statement. This paper was edited by David Medvigy and reviewed by Zachary Robbins and two anonymous referees.