Examining the role of environmental memory in the predictability of carbon and water ﬂuxes across Australian ecosystems

. The vegetation’s response to climate change is a signiﬁcant source of uncertainty in future terrestrial biosphere model projections. Constraining climate–carbon cycle feedbacks requires improving our understanding of both the immediate and long-term plant physiological responses to climate. In particular, the timescales and strength of memory effects arising from both extreme Our the importance in models we demonstrate a considerable to

Abstract. The vegetation's response to climate change is a significant source of uncertainty in future terrestrial biosphere model projections. Constraining climate-carbon cycle feedbacks requires improving our understanding of both the immediate and long-term plant physiological responses to climate. In particular, the timescales and strength of memory effects arising from both extreme events (i.e. droughts and heatwaves) and structural lags in the systems (such as delays between rainfall and peak plant water content or between a precipitation deficit and down-regulation of productivity) have largely been overlooked in the development of terrestrial biosphere models. This is despite the knowledge that plant responses to climatic drivers occur across multiple timescales (seconds to decades), with the impact of climate extremes resonating for many years.
Using data from 12 eddy covariance sites, covering two rainfall gradients (256 to 1491 mm yr −1 ) in Australia, in combination with a hierarchical Bayesian model, we characterised the timescales and magnitude of influence of antecedent drivers on daily net ecosystem exchange (NEE) and latent heat flux (λE). By focussing our analysis on a single continent (and predominately on a single genus), we reduced the degrees of variation between each site, providing a novel chance to explore the unique characteristics that might drive the importance of memory. Model fit varied considerably across sites when modelling NEE, with R 2 values of between 0.30 and 0.83. λE was considerably more predictable across sites, with R 2 values ranging from 0.56 to 0.93. When considered at a continental scale, both fluxes were more predictable when memory effects (expressed as lagged climate predictors) were included in the model. These memory effects accounted for an average of 17 % of the NEE predictability and 15 % for λE. Consistent with prior studies, the importance of environmental memory in predicting fluxes increased as site water availability declined (ρ = −0.73, p < 0.01 for NEE, ρ = −0.67, p < 0.05 for λE). However, these relationships did not necessarily hold when sites were grouped by vegetation type. We also tested a model of k-means clustering plus regression to confirm the suitability of the Bayesian model for modelling these sites. The k-means approach performed similarly to the Bayesian model in terms of model fit, demonstrating the robustness of the Bayesian framework for exploring the role of environmental memory. Our results underline the importance of capturing memory effects in models used to project future responses to climate change, especially in water-limited ecosystems. Finally, we demonstrate a considerable variation in individual-site predictability, driven to a notable degree by environmental memory, and this should be considered when evaluating model performance across ecosystems.

Introduction
Ecosystems respond to climate at a wide range of timescales: stomata respond to changes in humidity within seconds (Fanjul and Jones, 1982), while extreme droughts can impact plant growth for up to 5 or more years after the event (Anderegg et al., 2015;Vanoni et al., 2016). There is growing interest in better understanding these timescales over which vegetation responds to environmental conditions, particularly when considering the projected rate of future climate change and threats this poses to ecosystem stability (Mottl et al., 2021). An increasing number of studies have demonstrated the importance of past events such as fires (Sun et al., 2020), land management (Seabloom et al., 2020), and droughts (Anderegg et al., 2015;Kannenberg et al., 2020) on current ecosystem behaviour, in many cases, over time spans of years.
It is important to differentiate between what shall be referred to as "legacy" and "lag" effects. The ongoing impacts of climate extremes are an example of the former -they leave a persistent yet diminishing "legacy" from their single occurrence. A "lagged" effect differs in that it is an ongoing, constant delay in reaction to current conditions. The differing response times of vegetation to stimuli mean that lags exist associated with climate and the exchange of heat, energy, and carbon fluxes between ecosystems and the atmosphere. For instance, current grassland soil respiration can be strongly influenced by antecedent moisture from the prior 2 weeks, with the importance of longer lags decreasing sharply despite cumulative lag effects showing for up to 6 weeks (Cable et al., 2013). These response timescales are affected by ecosystem characteristics, such as vegetation structure. Cable et al. (2013) also found that, relative to grasslands, shrublands had a much longer cumulative lag effect in soil respiration of up to 10 weeks, with the first 4 weeks being most important.
The timescale of responses also differs among ecosystem processes. Feldman et al. (2021) found a lag of 5 d between a pulse of rainfall and peak plant water content in semi-arid grasslands, which in turn is likely to affect plant water status and hence carbon uptake. In arid and semi-arid ecosystems, soil respiration can respond immediately to a rainfall pulse and remain elevated for up to 2 d, while net ecosystem exchange (NEE) rates have a lagged response of up to a week (Huxman et al., 2004;Cleverly et al., 2013). Cleverly et al. (2016) found a variety of lags for phenological and photosynthetic responses in central Australia, ranging from immediate to 6 weeks. Antecedent climate is yet another factor that influences ecosystem response timescales. Repeated droughts in one growing season can negatively af-fect a plant's investment in biomass in comparison to a single event (Lemoine et al., 2018). In semi-arid and arid regions, NDVI (normalised difference vegetation index, a measure of vegetation greenness) responds to precipitation with lags of up to 7 months, while less arid areas tend to have shorter response times (Liu et al., 2018). From a study over the entire state of Kansas, Wang et al. (2003) found shorter lags between NDVI and precipitation, with the strongest correlation being at a 4-week lag. In fact, antecedent conditions could even be more important than concurrent conditions when measuring ecosystem productivity (Sala et al., 2012). Clearly, there is a diverse range of lagged responses to climate found within terrestrial ecosystems. Confounding factors, such as differing vegetation characteristics (including the proportion of woody vegetation, rooting depth, and varying allocation strategies), prevailing climate, interacting processes, and prior extreme events, all influence the magnitude and timescale of these lags and their impact on ecosystem fluxes.
Despite increasing attention on the impact of these lags within ecosystems, they are poorly captured within terrestrial biosphere models (TBMs) (Anderegg et al., 2015;Frank et al., 2015;Ogle and Barber, 2016;Kannenberg et al., 2020). For example, TBMs usually have an instantaneous coupling between photosynthetic uptake and growth in plants, while in reality carbon is first allocated to non-structural carbohydrates, allowing it to sustain growth and respiratory demands during periods of lower photosynthetic activity (Fatichi et al., 2014;Smith and Dukes, 2013;Jones et al., 2020). Anderegg et al. (2015) similarly found that earth system models are generally weak at predicting the impact of droughts on productivity during the drought recovery period. Models generally overestimate the immediate impact and underestimate the recovery times for extreme events (Kolus et al., 2019;Huang et al., 2016;Ukkola et al., 2016a). Recent satellite observations have also indicated that models fail to capture the impact of water stored in reservoirs with longer response times to climate (such as deeper soil moisture, groundwater, and surface water), resulting in models being more dominated by anomalies at shorter timescales than observations (Humphrey et al., 2018). Such model failures may in part be due to incorrect rooting depths, poor soil profile characterisation, or a lack of representation of these long-term storage pools such as groundwater or wetlands. Implementing these "lagged effects" into the models used to predict the carbon cycle is key to improving their performance (Keenan et al., 2012).
To better incorporate the role of lags in model process representations, we first need to determine which physical processes are affected and then quantify the timescales at which lags persist. Frequently, autoregressive methods are used in studies looking at specific pre-determined timescales, such as the current year's productivity and/or the previous year's rainfall and productivity (Sala et al., 2012) or the prior 6 months of climate (Zhang et al., 2015). Many studies have however indicated that intra-annual rainfall patterns are more important to productivity than total annual rainfall (Hovenden et al., 2014. Overcoming these subjective constraints and still providing freedom to explore sensitivity to shorter (sub-yearly) lags seems key to revealing any unexpected behaviour and detailing the full extent of memory effects, including any potential interaction between climate drivers.
To address this issue of strictly prescribed lags, new statistical approaches have been developed that allow for more flexible estimation of timescales of influences and magnitudes of lag effects . Such methods have identified a key role of memory effects in many ecosystem processes, including soil and ecosystem respiration (Ryan et al., 2015;Cable et al., 2013;Barron-Gafford et al., 2014) and gross primary production (Ryan et al., 2017). NEE and latent heat flux (λE) have also been shown to be influenced by these memory effects (Samuels-Crow et al., 2020), which are often stronger at drier sites (Liu et al., 2019).
Here we use the stochastic antecedent modelling (SAM) framework of Liu et al. (2019) to probe lagged flux responses in individual ecosystems across two environmental gradients in Australia. We explore whether a relationship between site aridity and importance of antecedent conditions holds when viewed at a smaller spatial scale or whether such a relationship is confounded by other site characteristics, such as vegetation structure or extreme weather conditions. By focusing on a number of intensively studied sites within a single continent and examining the relative contributions of various predictors in each environment, we aim to reduce these potential confounding factors. We include the sites of the North Australian Tropical Transect (NATT) as an explicit case study since this "living laboratory" covers a steep rainfall gradient without a correspondingly strong change in vegetation (Hutley et al., 2011). By applying the SAM framework to λE in addition to NEE, we explore how the timescales of response vary between these coupled fluxes, which can improve our understanding of the processes involved in environmental memory. Our second aim is to explore whether assessing the importance of environmental memory at a site can offer insights into TBM evaluation. If sites are more predictable or have a greater dependence on antecedent conditions (and therefore are traditionally less predictable), this should be taken into account when critically analysing TBM performance across sites. Finally, we examine an alternative statistical approach for antecedent modelling to test whether structural assumptions in the SAM framework notably affect inferences about environmental lags. In addition to highlighting sites with more complex lagged environmental behaviour (a tougher test for TBMs), explicit identification of individual lagged site mechanisms can highlight key processes missing from TBMs more broadly.

Flux data
Meteorological and flux data were taken from the OzFlux data repository for 12 eddy covariance towers (see Table 1 for site details). The sites were selected to cover a variety of vegetation types and fall into two overarching groups. Firstly, sites comprising the North Australia Tropical Transect (NATT) were included. These vary from tropical grasslands to semi-arid shrublands and savannahs (International Geosphere-Biosphere Programme -IGBP -classifications of grassland, savannah, and woody savannah) along a steep rainfall gradient (321 to 1486 mm annual precipitation) running from north to south in the Northern Territory, Australia. Secondly, we grouped the southern Australian woodland sites (SAWSs). These were selected as sites with a greater proportion of woody vegetation than the NATT sites (IGBP classifications of savannah, woody savannah, and evergreen broadleaf forest) while still covering a broad range of climate types (256 to 1491 mm annual precipitation). While the vegetation differences between the groups are less distinguished at the drier sites, the NATT sites are considered "savannah" sites, and the SAWS sites are referred to as "woodland" sites.
In this analysis we used observations of daily NEE and λE fluxes. NEE is a direct measurement of carbon exchange which represents the balance of carbon uptake and losses and is favoured over flux-derived gross primary productivity due to issues with respiration partitioning (Renchon et al., 2021). λE measures all evaporation, including contributions from soil and vegetation. Although carbon (uptake) and water fluxes are non-linearly coupled (De Kauwe et al., 2015a), NEE and λE are expected to exhibit a degree of independence in their responses to environmental conditions (including differing timescales) since they contain contributions from different components of the system (e.g. soil or understorey vs. canopy).
The fluxes were modelled using meteorological observations, which included mean downward short-wave radiation, mean air temperature, mean vapour pressure deficit (VPD), and precipitation. All OzFlux data were extracted at a halfhourly time step, aggregated to daily data, screened to only include complete calendar years, and then mean-centred. Sites were also screened to ensure that at least 5 years of good-quality data were available, which excluded five sites that have previously been included in the NATT (Adelaide River, Daly River Pasture, Fogg Dam, Litchfield, and Ti Tree East). Details on the data processing and quality control of OzFlux data can be found in Isaac et al. (2017). Here, we used the L6 data, which are quality-checked and gap-filled.
Normalised difference vegetation index (NDVI), a proxy for vegetation greenness, was obtained for each site from the global MODIS daily albedo data product at a 500 m res- Table 1. Site information for all sites included in the analysis. MAP is the mean annual precipitation at the site. CVP is the coefficient of variation in precipitation. WI is a wetness index of MAP over mean annual reference evapotranspiration. MAP, CVP, and WI are calculated from the WorldClim dataset, which covers 1970-2000 (Fick and Hijmans, 2017;Trabucco and Zomer, 2018). "Years analysed" corresponds to the full years of data available for analysis, inclusive. "World ecoregion" is the biome classification of each site, which is based on climatic regime and ecological structure, among other criteria (Olson et al., 2001;Beringer et al., 2016). Asterisks indicate sites that are part of the NATT. Litchfield was initially included in the analysis but was discarded due to the short time series available. All other sites are grouped as the SAWS. Data were obtained from the OzFlux Data Portal (http://data.ozflux.org.au, last access: 21 September 2020).  Arndt et al. (2013) olution via the Google Earth Engine interface (Schaaf and Gorelick et al., 2017). This dataset is calculated at a daily time step based on a 16 d composite of observations. This is used in the Bayesian model as a measure of whether the sites were in a "growing" (higher greenness) or "dormant" (lower greenness) state (see below). General site characteristics including the mean annual precipitation (MAP) and the coefficient of variation in precipitation (CVP) were obtained from the WorldClim dataset (Fick and Hijmans, 2017), while the wetness index (WI) was calculated from these data as MAP divided by mean annual potential evapotranspiration (Trabucco and Zomer, 2018;Zomer et al., 2007Zomer et al., , 2008. This WI assumes that lower values indicate more water-limited locations, while higher values are locations with a greater proportion of precipitation to potential evaporation. These data were used in preference to meteoro-logical data from individual sites as they cover a significantly long time period.

Statistical analysis of memory and lags
Following previous work by Liu et al. (2019) in the application of the SAM framework, we separately model NEE and λE using a non-linear mixed-effect Bayesian model at each site. The same formulation was used to model both fluxes at the 12 sites, and so the following description of the NEE model is also applicable to λE modelling.
While the exact distribution of flux errors is site-dependent and can vary between superimposed Gaussian distributions (Lasslop et al., 2008) or Student's t distribution , daily NEE is assumed to be Laplace-distributed (Richardson et al., 2006) with mean µNEE and variance σ 2 in line with Liu et al. (2019); µNEE at time t is modelled as per Eq. (1).
(1) G n and D n represent two sets of coefficients, corresponding to "growing" and "dormant" behaviours, respectively. These two behaviours are a function of NDVI, which we used to capture a site's growing seasons. Growing season differences were accounted for in our statistical modelling because at many of the sites growth is restricted by water availability (e.g. following rain), which in Australia is not strictly related to a specific growth period during the year; φ(t) is a function that partitions between these two behaviours based on NDVI such that D n is 1, and G n is 0 at minimum NDVI and vice versa at maximum NDVI: (2) A uniform prior on the interval [−1, 1] is assigned to φ * . This ensures that φ(t) increases monotonically with increasing NDVI. The φ * value modifies this function, allowing either a one-to-one linear relationship (when φ * is equal to 0) or varying non-linear relationships. In this manner, the proportion of time steps assigned to predominately "growing" or "dormant" behaviour (when φ(t) takes a value greater than 0.5 or less than 0.5, respectively) is modified during the model fitting.
CLIMATE n (t) is a weighted sum of daily climate measurements with various lags, where n indicates various variables as follows. When n = 1, CLIMATE n (t) = 1 at all time steps t and is therefore an intercept term. Where n = 2, 3, 4, or 5, CLIMATE n (t) represents the contribution of each of the short-term climate predictors (e.g. short-wave radiation, air temperature); n = 6, . . . , 9 are the quadratic interactions of these predictors, and n = 10, . . . ,15 are the remaining pairwise interactions; n = 16 is the contribution of long-term precipitation. Equation (3) is the generic form for a shortterm climate predictor, where ω CLIM n lag represents the weights assigned to the lagged climate observations CLIM n (t − lag). The CLIMATE n (t) terms for each n are summarised in Table 2.
Long-term precipitation (mean rainfall calculated over varying periods, up to 365 d prior; Table 3) is included as it is possible for the vegetation at these sites to be drawing on water reservoirs from deeper within the soil profile than would be recharged by short-term precipitation. This is especially true for the monsoonal sites along the NATT which experience prolonged dry seasons. In comparison, radiation, temperature, and VPD are assumed not to influence the fluxes at time spans longer than 14 d and so are restricted to the shorter timescale (Ryan et al., 2017).
Each CLIMATE n (t) term has a unique set of weights, which were assigned a Dirichlet prior. This ensures that each individual weight is constrained between 0 and 1, and the sum of the weights within each CLIMATE n (t) term is equal to 1. As such, each weight is indicative of the relative importance of the specific climate variable at the corresponding lag period.
To reduce the number of parameters estimated in the model, the lagged time steps in the short-term CLIMATE n (t) sums were grouped into blocks, with each lag in a block assigned the same weight, as shown in Table 3. For instance, the same weight is assigned to observations from 7 and 8 d prior. The decreasing resolution further into the past is due to the expectation that the importance of the driving variable on consecutive days becomes increasingly difficult to distinguish as the lag time increases. This reduction in the number of individual weights being estimated improves the model computation time and convergence.
Significant changes between Liu et al. (2019) and our model include the removal of soil moisture, the introduction of shorter precipitation lags (< 14 d), and min-max normalisation of NDVI on a per-site basis. We excluded soil moisture from our analysis due to inconsistencies in the data record at some sites as well as varying measurement depths across sites. Additionally, the soil moisture data are typically collected from relatively shallow depths of the soil profile and, as such, may not accurately reflect the root zone soil moisture in woody ecosystems. For instance, Eucalyptus species are known to have dimorphic rooting profiles with deep tap roots able to access water from below the shallow root zone (Knight, 1999). To account for short-term impacts of water availability following the removal of soil moisture, precipitation lags at the shorter timescales were introduced. Finally, NDVI is normalised to vary from 0 at minimum NDVI to 1 at maximum observed NDVI at each site for two reasons. Firstly, this ensures that each site spans the full range of φ values between "growing" (1) and "dormant" (0) periods. The similarity between the individual G n and D n coefficients was then able to indicate any differences between these two behaviours. Secondly, it was hypothesised that totally excluding coefficients of one behaviour at minimum NDVI (and the other at maximum NDVI) would improve convergence towards well-defined parameter values during model fitting.
The Bayesian model was implemented in JAGS via R, using the r2jags package and Markov chain Monte Carlo (MCMC) simulations (Plummer, 2003;R Core Team, 2020;Su and Yajima, 2020). Convergence of the MCMC chains was confirmed visually with trace plots and analytically with Table 2. Formulas for the CLIMATE n (t) term in the model for µNEE(t) for each n. TAS is mean air temperature, SWR is incoming shortwave radiation, VPD is vapour pressure deficit, and PPT is rainfall. The lag term in the sums is in days; i.e. at lag = 3, TAS(t−lag) is the mean air temperature from 3 d prior. Note that the lags in PPT long ant are larger periods than the daily lags for the short-term predictors, and PPT is here taken as the mean daily rainfall in these lagged periods; ω CLIM lag is the weight assigned to each lag period and is different for each of the five climate variables.

Climate predictor Formulation
(n) (CLIMATE n (t)) the Gelman-Rubin and Geweke diagnostic values as calculated with the coda package (Plummer et al., 2006). The MCMC iterations were set high enough to minimise the number of parameters with an "effective sample size" of less than 10 000, which ensured the parameters' posterior distributions were sufficiently sampled (Kruschke, 2015;Harms and Roebroeck, 2018). Once converged, model performance was assessed using five metrics: coefficient of determination (R 2 ), correlation coefficient (CCO), standard deviation difference (SDD), mean bias error (MBE), and normalised mean error (NME) (Haughton et al., 2016). These were calculated between the daily time series of the observed and modelled flux. Together, these metrics capture a broad range of potential model performance measures relative to the observations. The model was first run in a "current climate only" configuration, referred to as the "CC model", where the weights were set to 0 for all but the current-day climate (which were therefore assigned a weight of 1). This restriction was removed for a second set of model runs to utilise the full SAM framework. This second set of runs introduces "environmental memory" to the flux predictions, and so this model configuration is referred to as the "EM model". Finally, in an attempt to capture any remaining predictability in the observations, the residuals from the EM models were themselves modelled using a standard autoregressive process with a time lag of 1 d, which is an AR(1) model. This effectively allows a correction of the predicted flux based on the prior error in the prediction. While previously this has been referred to as capturing a "biological memory" component, there are various Table 3. Assignment of weights to lagged periods. The same weight is applied to multiple lag periods for the short-term predictors. Grouping of lags is necessary to reduce the number of parameters being estimated by the Bayesian framework, which improves model computing performance and reduces the risk of overfitting.
plausible effects that this AR(1) model can be considered to represent (Liu et al., 2019). As such we do not claim it represents a specific memory process but rather a lower bound on site predictability, and it is simply referred to as the "AR model".

Additional modelling
For comparison and confirmation of the SAM method, site NEE and λE were also modelled using an approach of kmeans clustering plus regression (Abramowitz, 2012;Best et al., 2015). The k-means approach is an alternative insample empirical model, providing a direct comparison to the SAM approach. The clustering is performed on the environmental predictor variables, and the time steps that belong within each cluster are determined. For each cluster, a linear regression between the climate predictors and the flux at the time steps within the cluster is performed. This allows assessment of the degree to which SAM results are indicative of site behaviour as opposed to resulting from the inherent structure of the SAM model. Seven different cluster-plus-regression models were implemented. Firstly, fluxes were modelled as a linear combination of concurrent-only climate variables. Five additional models were then run, where each model included concurrent climate but was further expanded with one individual climate predictor including potential lags (see Table 3 for lag timescales for each of the predictors). Finally, clustering and regression were performed for a model including all concurrent and lagged climate variables. As such, this final model contains the same information as that of the EM SAM model. The NbClust package (Charrad et al., 2014) indicated that for the majority of sites and models, four or fewer clusters were preferred for model parsimony. While increasing the number of clusters would increase the R 2 values reported, we found that clusters began to contain fewer than a reasonable number of observations for the linear regression (fewer than ∼ 260 observations, which is 4 times the number of parameters in the k-means model containing every lag). As such, we repeated the k-means clustering for each site and model, with the number of clusters ranging from two to eight, and the median R 2 value taken as the measure of performance.
For further comparison, we also consider the performance of a TBM in simulating site NEE. The TBM used was the CSIRO Atmosphere Biosphere Land Exchange (CABLE) model (Kowalczyk et al., 2006), a land surface scheme that can be run offline with prescribed meteorological forcing (De Kauwe et al., 2015b;Decker et al., 2017;Haverd et al., 2018;Ukkola et al., 2016b;Wang et al., 2011) or fully coupled (Lorenz et al., 2014;Pitman et al., 2011) within the Australian Community Climate Earth System Simulator (AC-CESS; Kowalczyk et al., 2013). CABLE models the exchange of carbon, energy, and water fluxes at the land surface, representing the vegetation with a single-layer, twoleaf (sunlit or shaded) canopy model (Wang and Leuning, 1998) and a detailed treatment of within-canopy turbulence (Raupach, 1994;Raupach et al., 1997). Soil water and heat conduction are numerically integrated over six soil layers (to 4.6 m depth) following the Richards equation. CABLE can be run with interactive biogeochemistry  and vegetation demography , but both were switched off as leaf area index was prescribed on a persite basis. CABLE is a state-of-the-art TBM that performs similarly to other TBMs used in global coupled modelling (Best et al., 2015). At each site, we applied CABLE with the parameterisation taken from the assumed dominant plant functional type (PFT) at the flux tower location. CABLE's reported performance at the 12 sites in this study is then essentially the performance one might expect if CABLE were run in a global coupled model; unlike the empirical models it is being compared to, it is not calibrated with site data, so in some sense this is not a fair comparison. Nevertheless, there are strong indicators that local calibration of TBMs offers relatively minor performance increases (i.e. that structural inadequacies remain) and that empirical approaches benefit to a much greater degree by the inclusion of local calibration information (Abramowitz et al., 2007). There is also compelling evidence that TBMs share biases (Haughton et al., 2016). We suggest therefore that this comparison should highlight how much more appropriate empirical approaches are for investigating ecosystem memory effects than TBMs with additional parameterisations, where existing structural inadequacies in TBMs could cloud the interpretation of the inclusion of lagged effects. Effectively, these CABLE model runs represent a lower bound on the possible performance of TBMs at each of these sites, and so the comparison between the statistical approaches and CABLE provides insight into the role of underlying site predictability (including environmental memory) in model-observation evaluations.

Model performance
The ability of the CC ("current climate"), EM ("environmental memory"), and AR ("autoregressive") models to capture the temporal variability in NEE varied considerably among the 12 OzFlux sites (Fig. 1). In general, as sites became drier, their NEE fluxes became more predictable for the EM and AR models (p values < 0.05). The introduction of lagged memory effects (EM model) consistently improved model performance across sites. The smallest improvement was at AU-Tum, where the R 2 increased from 0.23 to 0.25. AU-GWW experienced the greatest improvement, with memory effects increasing R 2 from 0.37 to 0.53. The improvement in model performance when introducing memory effects was true across all model performance metrics considered, apart from MBE, which saw small increases at some sites (see Fig. S1 in the Supplement). The increase in site predictability associated with environmental memory was also correlated with the MAP at the sites (Spearman's ρ = −0.73, p < 0.01). However this relationship between memory and MAP was only significant when all sites were considered together and was not apparent when either site grouping (NATT or SAWS) was considered in isolation, albeit the sample size is smaller when considering transects separately (n = 12 vs. n = 5 and 7). Among the savannah sites, the role of lag effects increased as the precipitation regime became more seasonal (relative improvement in EM compared to CC, correlated with the coefficient of variation in precipitation (CVP) at the sites, ρ = 0.98, p < 0.01). By contrast, at the woodland sites, the importance of memory was instead correlated with mean annual temperature, with hotter sites exhibiting a greater lag influence (relative improvement in EM compared to CC, ρ = 0.83, p < 0.05).
The remaining predictability captured by the AR model was correlated with the seasonality of precipitation, as expressed by CVP. Introducing the AR1 process had a greater influence on absolute improvement in model performance at sites with greater precipitation variability (purple bar in Fig. 1, ρ = 0.70, p < 0.05). The relative improvement from the AR model compared to the EM model (cf. purple bar relative to the combined yellow and turquoise bars) was also correlated with CVP, but to a lesser extent (ρ = 0.65, p < 0.05). When the NATT sites were considered in isolation, this correlation was no longer significant. Instead, there was a strong correlation between the relative improvement from the EM to the AR models and measures of site winter rainfall (ρ = 0.90, p < 0.05). There were no significant relationships between the AR model performance at SAWS sites and climate metrics.
By contrast to NEE, the λE flux was more predictable, with total R 2 values once all memory effects were included ranging from 0.56-0.93, compared to R 2 = 0.3-0.83 for NEE (Fig. 1b). As with the NEE fluxes, the improvement in λE model performance as additional memory components were introduced was maintained across all model performance metrics and sites (see Fig. S2 in the Supplement). The addition of environmental memory improved R 2 values at all sites but by varying amounts. AU-DaS saw R 2 improve from 0.79 to 0.81, while R 2 values for AU-GWW increased from 0.55 to 0.73. The improvement in model performance from the introduction of lagged effects (turquoise bar, Fig. 1) was correlated to the wetness index of the site (ρ = −0.62, p < 0.05). The four wettest sites (MAP > 1000 mm yr −1 ) had an R 2 improvement of between 0.02 and 0.05, while the four driest sites varied from 0.09 to 0.19. Interestingly, the results did not show any clear difference in memory importance between the savannah and woodland sites.
The improvement in performance of the AR1 process for λE fluxes was not correlated with any of the climate metrics under consideration. This was true both when all sites were considered at once and when sites were partitioned into the savannah or woodland subsets.

Sensitivity to climate predictors
To explore the impact of individual climate predictors on NEE, the G n and D n coefficients from the EM model were summed and normalised by the standard deviation, as shown in Fig. 2 (see also Fig. S3 in the Supplement). For NEE (Fig. 2a), most climate variables significantly impacted the flux. Air temperature, short-term precipitation, and VPD each did not significantly affect NEE at two sites. AU-Dry was the only site where more than one climate variable was not significant. The sensitivity magnitude was generally greater for the short-term variables (past 14 d) than for long-term precipitation. Short-wave radiation had a mean absolute sensitivity coefficient of 0.74 µmol m −2 s −1 , followed by air temperature (0.58 µmol m −2 s −1 ), short-term precipitation (0.45 µmol m −2 s −1 ), and VPD (0.37 µmol m −2 s −1 ). In comparison, mean NEE sensitivity to long-term precipitation was just 0.11 µmol m −2 s −1 . For the savannah sites, the NEE flux appeared to display greater sensitivity to environmental conditions as sites became wetter, which was most clear for short-term precipitation. The woodland sites had both a wider range and larger magnitude of sensitivity to the environmental drivers, with short-wave radiation, air temperature, and VPD having a greater impact on NEE than at the savannah sites.
λE fluxes were sensitive to more climate variables than NEE fluxes. Across all sites and variables, only two climate drivers were not significant in their effect on λE at a site: air temperature at AU-Dry and long-term precipitation at AU-Tum. The lack of significance in long-term precipitation at AU-Tum may be due to this site being the wettest, with a MAP of over 1400 mm yr −1 , and one of the sites with the most even rainfall distributions (CVP = 32). The driest year in the AU-Tum data is 2006, with 421 mm of precipitation, which is still greater than the MAP at the four driest sites  Sites are on the x axis, ordered by ascending mean annual precipitation. The y axis is the sum of all G n and D n coefficients from Eq. (1), where n includes the climate variable, divided by the mean standard deviation of the corresponding weighted sums from Table 2, such that sensitivity values are comparable between variables. Where the range of sensitivity includes 0, those variables are considered to non-significantly affect the flux and are coloured green in the plots.
in this study. Another result of interest is that the response of the λE flux to air temperature and VPD at AU-Wom was the opposite to the relationship at all other sites. AU-Wom exhibited an increase in λE with an increase in VPD and a decrease in the flux as temperature increased.
For both NEE and λE fluxes, there was a correlation between the sites' sensitivity to short-wave radiation and their wetness index (ρ = −0.69, p < 0.05 for NEE; ρ = 0.78, p < 0.01 for λE). However when this relationship was explored by splitting between the two vegetation groups, it was only significant for the SAWS sites. The correlation was in fact stronger when only the woodland sites are considered (ρ = −0.86 and 0.86 for NEE and λE, respectively; p < 0.05). When the NATT sites were taken in isolation, this relationship between site aridity and sensitivity to short-wave radiation was not apparent.

Timescales of memory influence
The cumulative weights from the EM model can provide evidence of the relevant timescales at which the significant climate variables affected NEE. These are shown for (a) air temperature and (b) long-term precipitation in Fig. 3. Following Liu et al. (2019), we assumed that the critical lag timescale is when the cumulative weight reached 0.5, indicated by the dashed line. For air temperature, at sites where this climate metric was significant, all but two sites had a lagged response of > 2 d. NEE at AU-Stp and AU-Gin had dependence on prior air temperature at longer timescales, around 4-5 d. The timescales at which precipitation affected NEE were much less consistent across sites. AU-DaS had a very short lag of only 21 d, while AU-Tum required over 270 d of prior rainfall to reach a weight of 0.5. The remaining sites had lags to longterm precipitation falling between these extremes but with no obvious correlation between MAP and response timescale. Similarly, there was no clear relationship between lagged response timescales and the prevalence of woody vegetation at the sites; both NATT and SAWS sites exhibited a range of critical timescales. Figure 4 shows that the timescale at which the λE flux responded to air temperature is generally longer than that for NEE. This may reflect the contribution of deep soil moisture (and so longer timescales) to transpiration fluxes (driven by VPD associated with higher temperature), whereas the impact of temperature on NEE, via for example heterotrophic respiration, would be controlled by shallower soil moisture (Parton et al., 1988). We found that 7 of the 12 sites responded to the air temperature from 3 to 7 d prior. Notably AU-Gin, which had a relatively long response timeframe for NEE to air temperature, had a strong immediate response for λE, with the critical timescale occurring at no lag. The shapes of the cumulative weight plots were more similar across sites for λE than for NEE, with a consistent increase across each lagged period and a much smaller range of initial weights calculated for the current air temperature.
For long-term precipitation, critical time periods for λE ranged from 60 to 270 d. No obvious relationship existed between site aridity and the timescales at which long-term precipitation affected evapotranspiration, indicating that other site characteristics (for example rooting depth) were influencing the lagged effects of rainfall. While the overall range of lagged responses to long-term precipitation was very similar between NEE and λE, the critical timescales for each flux differed at most sites (see Figs. 3b and 4b). For instance, at AU-Dry and AU-Stp, the critical lagged timescale for λE was 60 d, but for NEE a total lag of 270 d was required for a cumulative weight of over 0.5. Conversely, AU-DaS had a short lagged NEE response to long-term precipitation of 20 d, but the critical weight was reached at 180 d for λE.

Comparison to alternative modelling approaches
Finally, NEE fluxes were modelled using both an approach of k-means clustering plus regression and the CABLE TBM parameterised by PFT, the results of which are shown in Fig. 5. At every site, the EM SAM model performed better than a current-climate-only k-means model, which is to be expected due to the far greater degrees of freedom in the SAM models. The difference between the median k-means R 2 and EM R 2 ranges from 0.02 to 0.12. However, the introduction of the lags for individual climate variables to the kmeans model generally improved performance, although the EM SAM model outperformed these models in most cases. When all lags were introduced to the k-means model, performance improved at all sites. Performance was comparable to or better than that of the EM SAM model, confirming the applicability of SAM for predicting terrestrial ecosystem fluxes.
The sites were also modelled using the Australian TBM, the Community Atmosphere Biosphere Land Exchange model (CABLE). In these model runs, CABLE was parameterised by assumed PFT, with no site-specific calibration, and thus indicates a lower bound on TBM performance at the sites. As shown in Fig. 5, this consistently underperformed relative to the predictability expected from the empirical methods. R 2 values for the TBM predictions against observed NEE ranged from 0 to 0.33, with a mean of 0.16. Performance was generally better at the SAWS sites than at the NATT sites. CABLE performed better as sites became drier within the SAWS group, but this behaviour was not seen for the NATT sites.

Discussion
Prior to the introduction of the stochastic antecedent modelling framework , little work had attempted to explicitly quantify the role and behaviour of these lagged effects. This paper provides further evidence that ecosystem fluxes exhibit complicated responses to antecedent conditions and that these responses are important components of ecosystem functioning and as such need to be further explored if these ecosystems are to be properly understood.

To what extent does environmental memory matter?
This study shows that understanding the lagged component in ecosystem responses to climate, that is the environmental memory, can significantly improve the ability to model site fluxes empirically. Introducing antecedent climate to the NEE model increased the R 2 by an average of 0.08, which is a mean relative improvement of 22 % in model performance, with the relative improvement at individual sites ranging from 7 % to 55 %. For λE, the improvement ranged from 2 % to 49 %, with a mean of 20 %. This is reflective of the generally higher predictability of the λE flux compared to NEE, with more of the variance in the flux explained by current climate only. This improvement in model performance for both fluxes indicates that exploring the role of environmental memory could further substantially improve our understanding of site functioning. While the EM model results provide a direct indication of the role of antecedent climate, we further modelled the flux residuals with an autoregressive lag-1 model (the AR model). The ensuing R 2 value obtained was interpreted as a lower bound on overall site predictability based on prior conditions, contingent on the structural assumption of a lag of one time step. This is because the AR model captures remaining predictability from the previous day's fluxes without identifying the source of this influence. For instance, this dependence on prior-day flux may be affected by seasonal leaf area, delays between photosynthesis and respiration (Mencuccini and Hölttä, 2010), or potential site disturbances. It could also be representing environmental drivers that we have not ex-plicitly accounted for, lagged allocation, or more unique impacts such as insect infestation.
Although eddy covariance analysis is a mature field, relatively little work has been done on quantifying environmental memory. Instead, the focus has been on immediate site responses to disturbance events and meteorological extremes (Ciais et al., 2005;von Buttlar et al., 2018;Teuling et al., 2010;Flach et al., 2018). At the site level, further examination of other controls on carbon and water fluxes (e.g. variability in leaf area, root zone soil moisture, the contribution of non-transpiration components, and the role of nonstructural carbohydrates) may unlock explanations for why environmental memory varies both across the precipitation gradients and amongst similar geographic sites. For example, we found that our capacity to simulate NEE fluxes at AU-Cum was seemingly poorly explained by current climate. It is notable that Peters et al. (2021) recently highlighted the high tolerance of drought stress (xylem embolism resistance; p 5 0: −4.07 to −5.82 MPa) of species at AU-Cum, which may explain an apparent decoupling between carbon fluxes and current meteorological conditions. Pinning down the exact mechanistic explanation will remain for future work, but our results motivate the search for hypotheses to explain differences in site behaviour. By characterising the extent of individual-site memory statistically, we hope to stimulate fu- Figure 4. Cumulative mean weights from the λE EM model for (a) air temperature and (b) long-term precipitation. Sites are split into the (i) NATT and (ii) SAWS groups for each climate predictor. Within each group, sites are ordered by mean annual precipitation, with darker colours indicating higher MAP. Only sites where the climate variable is significant in Fig. 2b are included. The dashed line at a cumulative weight of 0.5 indicates the threshold for the critical lag period. Where the cumulative weights cross this line is considered the timescale of the environmental memory effect for each site for the climatic driver in question. ture site measurement campaigns and hypothesis development that examines what drives memory variability and ultimately guide model development. As for which TBM modules would need to be adjusted to fully capture environmental memory, this approach needs to be applied at many more individual sites. This would allow us to identify functional relationships to a greater extent. However, such application needs to be carefully pursued, using not just SAM but other machine learning approaches (such as the k-means clustering plus regression as we have demonstrated) to ensure that any results are process-based and not just structural assumptions from the use of a single modelling approach. By combining multiple empirical studies of environmental memory, we can understand the key lags that aid prediction of ecosystem fluxes and how these vary across site characteristics.

The benefits of stochastic antecedent modelling
Through the application of stochastic antecedent modelling, we have been able to identify the importance of environmental memory in ecosystem fluxes at a diverse range of Australian sites. Unlike traditional methods of exploring antecedent effects, SAM makes few prior assumptions about critical lag lengths beyond a prescribed maximum lag of interest (i.e. the 14 d window we assigned to the short-term variables). The Bayesian framework also yields the full (joint) posterior distribution for the parameters of interest, from which we can compute summaries such as credible intervals, allowing a critical assessment of the relative significance of various memory drivers. However, the SAM method is computationally intensive and requires long data records, which may reduce its potential applications. In this paper, we also perform similar analysis using the basic machine learning approach of k-means clustering plus regression. The results from the k-means modelling were broadly consistent with those from the SAM approach. Both modelling methodologies produced predictions with similar R 2 values (Fig. 5) and saw improved performance when lagged climate effects were introduced. The k-means modelling in this study has provided a novel, independent check on the suitability and performance of the SAM approach. The consistent results increase our confidence in the findings from the SAM model and reduce the likelihood that our findings are influenced by the structural assumptions of the SAM model. Importantly, while computation of k means was significantly faster than SAM, the k-means approach lacks the inherent interpretability of SAM. While SAM is explicitly designed to infer the sensitivity to predictors and the timescales at which lags exist, the approach of k-means clustering plus regression would require further work and modelling to fully explore these as- Figure 5. R 2 values for seven different models of k-means clustering plus regression predicting NEE at the flux sites, split by the two vegetation groups. "Current climate" is the k-means model that only considers concurrent environmental observations for predicting NEE. Each of the five lagged environmental predictors are then introduced separately; "+ all climate lags" is the model where every lagged environmental variable is included and hence utilises exactly the same predictors as the EM SAM model. Boxplots indicate the distribution of the R 2 values when these models are run for between two and eight clusters each. The EM SAM model R 2 value is indicated by a red cross, and the R 2 for a PFT-parameterised TBM (CABLE) is shown with a blue cross. Note that AU-Wom had no available CABLE output. Sites are ordered from left to right by ascending mean annual precipitation within each vegetation group.
pects of flux responses, which is beyond the scope of this study. Since the k-means clustering plus regression often outperformed the SAM model, we have identified that the SAM approach does not provide an upper bound on the information available from the flux data. As such, our results highlight the need to explore the role of environmental memory using different approaches, including use of alternative machine learning techniques.
This study also showed how the relative influence of different drivers of NEE and λE fluxes can be discerned. By normalising the sensitivity parameters for each environmental driver, we further the work of Liu et al. (2019), allowing the individual climate predictors to be compared between variables and across sites. This increases the information provided by the SAM approach regarding the site behaviour and is a further argument for its use to explore the timescales of ecosystem response. Liu et al. (2019) provided a strong argument that including the influence of environmental memory is key for predicting ecosystem fluxes. Our results suggest that more careful site evaluation is likely to be required to truly understand the impact and source of these memory effects. Fortunately, the SAM approach, together with other machine learning techniques, is well positioned to provide the necessary insights to shine a new light on site dynamics and memory influence.

The importance of across-site heterogeneity
One of the key conclusions from Liu et al. (2019) was that as sites become more arid, the importance of antecedent effects increases. However, Liu et al. (2019) considered 42 sites from across the globe, incorporating a wide range of biomes and species. As such, there is potential for confounding factors to be influencing the importance of environmental memory at each site. This study reduces some of this uncertainty by limiting its scope to 12 sites, all located within Australia. This means that, as well as limiting the diversity of species and climates studied, a greater understanding of each individual site is possible. Similarly to Liu et al. (2019), these sites were grouped by biome, although we only had two groups: savannahs and grasslands within the NATT and woodlands in the SAWS group. Each biome group contains sites with a range of MAP and WI values. When these sites are viewed together, the importance of memory is strongly correlated with site aridity (improvement in R 2 between CC and EM models; ρ = −0.85, p value < 0.01), consistent with the conclusions of Liu et al. (2019). However, when the sites are split by our vegetation groupings, this significant correlation is only seen for the SAWS group (ρ = −0.86, p value < 0.05), again noting that total sample size is reduced when we split by vegetation group. The NATT sites had no correlation between site memory and aridity (ρ = −0.7, p value = 0.23), despite having a very strong rainfall gradient. Note that both groups have ranges of aridity that include sites spanning from "arid" to "humid", with the WI at NATT sites between 0.12 and 0.75 and at SAWS sites between 0.11 and 1.2 (Trabucco and Zomer, 2018). This result indicates that grouping many sites together to explore relationships based on a single metric can obscure more nuanced understanding of the processes involved or the key site characteristics driving such relationships. For instance, at the savannah sites, we found that NEE sensitivity to short-term precipitation increases as site MAP increases. We hypothesise that this increased sensitivity is driven by the monsoonal nature of rainfall at the NATT sites with greater MAP. Our results also show that the relationship between short-wave radiation and site aridity is only seen in the SAWS grouping, not at the NATT sites. This is potentially due to the greater proportion of woody vegetation at SAWS sites, resulting in a greater rooting depth and less frequent water stress. As such, these sites are likely to be energy-limited, and hence transpiration is linked to days of high photosynthetically active radiation.
We have also found an inverse response of λE to temperature and VPD at AU-Wom to all other sites. It is not clear what the driver is here, and it is unlikely to be related to the correlation between temperature and VPD as this is higher at the other sites which did not exhibit this behaviour (ρ = 0.85 at AU-Wom; five other sites have higher ρ values, up to 0.91). One possible cause is that AU-Wom generally has the lowest VPD observations across all sites, with only AU-Tum showing similar but greater values (median VPD = 0.20 kPa; other sites range from 0.31 to 1.94 kPa). While other sites may tend to close their stomata, limiting transpiration as VPD increases, AU-Wom is potentially below the VPD threshold at which stomata begin to limit transpiration. Similarly, Griebel et al. (2020) found that AU-Wom does not limit transpiration during hot temperatures and heatwaves, potentially due to access to deep water reserves. However, the inverse response could also be due to the vegetation composition at AU-Wom, with the potential for this effect to have been caused by photosynthetic inhibition at high temperatures (and high VPD), which would further explain why this opposite response is not seen in the NEE flux. The prevailing wind direction, which is seasonally dependent, heavily influences the flux tower footprint and affects the climate conditions at AU-Wom . This could potentially contribute to the inverse behaviour seen if changes in VPD are correlated with significant changes in the vegetated area being measured by the tower. Additional exploration of the lags experienced at the AU-Wom site is probably necessary if this response is to be more precisely attributed.
Many studies have also highlighted how global relationships do not hold at regional, local, or even site level (Knapp and Smith, 2001;Knapp et al., 2017;Lauenroth and Sala, 1992;Ukkola et al., 2021;Wilcox et al., 2016). Such nontransferability is related to the issue of spatial versus temporal relationships and particularly the "vegetation structure constraint" (Lauenroth and Sala, 1992). This is where, due to the slow timescales at which species composition and plant function respond to changes in climate, individual sites are unable to fully utilise any inter-annual variability in climate conditions (Lauenroth and Sala, 1992). Here, we have shown that, within the range of Australian ecosystems analysed, environmental memory is not clearly related to site aridity for savannah sites. In comparison, among woodland sites, the link between environmental memory and site aridity appears to be stronger. Our results point to a need to better understand the role of individual-site characteristics (i.e. root zone water access) in determining predictability of carbon and water fluxes.

Implications for TBM evaluations
Flux data are routinely used to benchmark and improve TBM performance (Abramowitz, 2012;Abramowitz et al., 2008;Best et al., 2015;Haughton et al., 2016Haughton et al., , 2018bNearing et al., 2018). In spite of this, relatively few studies have proposed that assessments of TBM performance at flux sites should also consider the underlying site predictability (but see Haughton et al., 2018a). Here, we argue that the confounding effect of baseline predictability is essential when comparing models that may have been tested at different sites and also in determining which ecosystems a model can best represent. Our results indicate that process-based TBMs tested at sites that exhibit greater predictability from simple empirical models, such as SAM or k means, as used in this paper, might be expected to perform better than TBMs tested at those sites which exhibit a lower baseline predictability of fluxes. For instance, Whitley et al. (2016) found that, when modelling gross primary productivity (GPP) and λE fluxes, the performance of a suite of TBMs improved slightly across the NATT sites as the MAP decreased. This is consistent with our findings of higher predictability at more arid sites. Importantly, this performance could be reinterpreted given the baseline predictability calculated using the SAM approach. Across the sites included in both this paper and Whitley et al. (2016), they found that model-averaged correlation coefficients for λE predictions had a maximum difference between sites of 0.08, ranging from 0.56 (AU-Dry) to 0.64 (AU-Stp). Our analysis found a similar difference between correlation coefficients of 0.06 (0.85 at AU-How and 0.91 at AU-Stp). This might show that both empirical models and TBMs perform better at drier sites. However, when we consider the TBM performance relative to the empirical SAM performance, AU-How actually performs better (TBMs capture 70 % of the expected predictability assumed from the SAM model) than the drier AU-Stp site (where TBMs capture 66 %).
Similarly, Barraza et al. (2017) modelled λE across the NATT using various indices of surface conductance and found that the sites with greater Eucalyptus occurrence (i.e. the wetter sites: AU-How, AU-DaS, and AU-Dry) were better represented by their model. However, if the reported R 2 values from this study are considered relative to the values found in this study, then their models perform even better at the wet sites compared to the dry. The Eucalyptus-dominated sites have R 2 values very similar to those from this study, while at AU-Stp and AU-ASM (the drier sites), the R 2 values found in Barraza et al. (2017) are substantially lower.
Additionally, the CABLE model results in Fig. 5 illustrate how the results from empirical models can be used as a baseline for TBM performance. At AU-Cum, AU-Dry, AU-Gin, and AU-Whr, the R 2 from CABLE is close to 0.1. However, from the SAM and k-means approaches, it is apparent that AU-Whr is more predictable than the other three sites. The performance of CABLE at AU-Whr is only around 18 % of the empirical models. At AU-Cum, the R 2 of CABLE is approximately 25 % and 37 % of the R 2 of k means and SAM, respectively. Hence, while initially CABLE appears to perform similarly well at both sites, it can be seen that the TBM is capturing more of the expected predictability at AU-Cum than at AU-Whr. As such, the baseline predictability of sites as calculated from detailed empirical models such as ours clearly provides a framework by which we can reinterpret the predictability of ecosystems and hence how well TBMs are performing.
Our results add to a growing body of research (for example, see Bastos et al., 2020;Ciais et al., 2005;Feldman et al., 2021;Liu et al., 2018;and Ogle et al., 2015) that identifies an important role of ecosystem "memory" in the terrestrial fluxes. This first step, including the characterisation of the timescale of influence, the processes affected (e.g. λE vs. NEE), the controlling environmental driver and site-tosite variability, is critical to improving TBMs. It is widely acknowledged that capturing legacy processes in TBMs is important (e.g. acclimation, recovery from climate extremes, link between carbon uptake and growth, canopy defoliation), but to develop the theory, we first need a strong evidence base against which we can probe model predictions. The challenge now is to link the statistical findings to mechanisms and then demonstrate that capturing these processes in models leads to improvements in site predictions. This second step will require applications of our approach (or similar) to both field and targeted experimental data, with progress likely to be made by linking directly to model-hypothesis testing (e.g. Katul et al., 2001;Mahecha et al., 2010).

Conclusions
Accurate prediction of carbon and water fluxes is key to understanding the role that terrestrial ecosystems will play in a changing climate. This study builds on previous work utilising stochastic antecedent modelling to provide further evidence that environmental memory is a key component of both net ecosystem exchange and latent heat fluxes. In general, the role of this memory effect increases as sites become more arid, yet we have shown that this relationship is confounded by individual-site characteristics and behaviour. By separating the influence of various predictors on NEE and λE fluxes, it becomes clear that despite this broad-scale relationship with aridity, very different mechanisms are at play across sites. The differences we report in site behaviour should motivate a range of new hypotheses in future research to understand the controls on variability in predictability of site fluxes. Finally, we argue that a consideration of both site predictability and environmental memory should form a key part of terrestrial biosphere model evaluation and future process development.
Author contributions. JCP performed all model runs and code analysis, with input from MGDK, GA, YL, and KO. JCP wrote the manuscript with substantial input from MGDK, GA, MJH, JC, YL, and KO. AJP and NHN provided further input on the manuscript.
Competing interests. At least one of the (co-)authors is a member of the editorial board of Biogeosciences. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Review statement. This paper was edited by Alexandra Konings and reviewed by two anonymous referees.