Assessing the representation of the Australian carbon cycle in global vegetation models

. Australia plays an important role in the global terrestrial carbon cycle on inter-annual timescales. While the Australian continent is included in global assessments of the carbon cycle such as the global carbon budget, the performance of dynamic global vegetation models (DGVMs) over Australia has rarely been evaluated. We assessed simulations of net biome production (NBP) and the carbon stored in vegetation between 1901 to 2018 from 13 DGVMs (TRENDY v8 ensemble). We focused our analysis on both Australia’s short-term (inter-annual) and long-term (decadal to centennial) terrestrial carbon 5 dynamics. The TRENDY models simulated differing magnitudes of NBP on inter-annual timescales, and these differences contributed to carbon accumulation in the vegetation on decadal to centennial timescales resulted in signiﬁcant differences in long-term carbon (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) accumulation (cid:58) (-4.7–9.5 PgC). We compared the TRENDY ensemble to several satellite-derived datasets and showed that the spread in the models’ simulated carbon storage resulted from varying changes in carbon residence model responses to land-use The DGVMs also simulated different sensitivities to atmospheric carbon dioxide ( CO 2 ) concentration, although notably, the models with nutrient cycles did not simulate the smallest NBP response to CO 2 . Our results suggest that a change in the climate forcing did not have a large impact on the carbon cycle on long timescales. However, the inter-annual variability in precipitation drives the year-to-year variability in NBP. We analysed the impact of key modes of climate variability, including the El Niño Southern Oscillation (ENSO) and the Indian Ocean Dipole 15 (IOD) on NBP. While the DGVMs agreed on sign of the response of NBP to El Niño and La Niña, and to positive and negative IOD events, the magnitude of inter-annual variability in NBP differed strongly between models. In addition, we identiﬁed that differences in the timing of simulated phenology and ﬁre dynamics are associated with differences in simulated/prescribed vegetation composition cover and process representation. Model We disagreement in simulated vegetation carbon, phenology and apparent carbon residence time, the models have different types of vegetation cover (cid:58)(cid:58)(cid:58) and (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) coverage (cid:58)(cid:58) of (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) vegetation across Australia (whether prescribed or emergent). Our study highlights the need to evaluate parameter assumptions and the key processes that drive vegetation dynamics, such as phenology, mortality and ﬁre, in an Australian context to reduce uncertainty across models.


Introduction
Decadal variability in the growth rate of atmospheric carbon dioxide (CO 2 ) is strongly influenced by variability in the uptake 25 and release of carbon by the oceans and the terrestrial biosphere (Ballantyne et al., 2012;Raupach et al., 2008). The interannual variability (IAV) in the CO 2 growth rate is dominated by terrestrial processes (e.g. vegetation productivity, respiration and fire emissions) and their responses to both temperature and precipitation, driven by modes of climate variability (Ahlström et al., 2015;Zhang et al., 2018;Poulter et al., 2014). The El Niño Southern Oscillation (ENSO) is the dominant mode of global variability (e.g. Ahlström et al., 2015) and contributes around 26% to IAV in global gross primary production (GPP; Zhang 30 et al., 2019). Globally, ENSO has also been shown to explain more than 40% of satellite-derived net primary production (NPP) variability, mainly driven by the response of Southern Hemisphere ecosystems (Bastos et al., 2013) and in particular, semi-arid ecosystems .
Around 70% of the land in Australia is classified as either arid or semi-arid (Brown et al., 2008). Oceania has been found to contribute significantly to the uncertainty in global and regional carbon budgets (Bastos et al., 2020) and the important role 35 played by semi-arid ecosystems in explaining the variability of the global carbon cycle was highlighted by the 2011 La Niña event. While on average, ∼ 17% of IAV in net biome production (NBP) was attributable to Australia over the historical period, during the 2011 La Niña event, Australia contributed around 60% to the global carbon sink (e.g. Poulter et al., 2014). A recent study suggested that the Australian terrestrial carbon sink may be enhanced due to more extreme wet-events projected for future decades . This carbon uptake enhancement has been associated with the asymmetric response of GPP 40 to precipitation, i.e. positive GPP anomalies tend to be larger than negative ones (based on upscaled global flux tower data; Haverd et al., 2017), in combination with vegetation expansion linked to rainfall (based on a single dynamic vegetation model; Poulter et al., 2014). At the same time, a series of studies have also identified evidence of rising CO 2 leading to a marked greening of the Australian continent (Donohue et al., 2009(Donohue et al., , 2013Ukkola et al., 2016b;Trancoso et al., 2017).
Dynamic global vegetation models (DGVMs) are commonly used to explore large-scale responses of the carbon cycle 45 to climate variability and climate change (Friedlingstein et al., 2019;Le Quéré et al., 2018). However, comparing NBP (the change in carbon stocks including carbon losses due to disturbance) between different DGVMs shows a large model spread with substantial variations of annual global NBP of up to 3 PgC yr −1 when forced with the same meteorological drivers (Le Quéré et al., 2018). This demonstrates large uncertainties in the representation of the terrestrial carbon cycle and the associated response to climate and land-use change in DGVMs. Roxburgh et al. (2004) concluded from an Australian based evaluation 50 of NPP that DGVMs were unable to capture the interactions between terrestrial biosphere and atmosphere (models simulated a fivefold variation in annual NPP), citing weaknesses related to the role of water and nutrients in limiting productivity. A number of more recent studies have improved our understanding of the Australian terrestrial carbon cycle, employing flux tower data (e.g Tarin et al., 2020), satellite derived carbon fluxes (e.g. Cleverly et al., 2016) and regional biospheric modelling (e.g. Haverd et al., 2013Haverd et al., , 2016b. 55 Nevertheless, while the Australian continent is included in global assessments of the carbon cycle, DGVM performance over Australia has rarely been evaluated. The need for such an evaluation is emphasized by recent studies that have used DGVM simulations to underline the importance of Australia's ecosystems to the inter-annual variability of the global carbon cycle (Poulter et al., 2014;Ahlström et al., 2015). DGVM studies have also previously highlighted vegetation responses to climate as critical to future projected changes in water resources (Ukkola et al., 2016a) and an enhanced future carbon sink (Kelley 60 and Harrison, 2014) across Australia. Thus, evaluating DGVM skill is also equally important for assessing projected changes in Australia's carbon and water cycles.
In this study, we assessed the terrestrial carbon cycle for Australia simulated by 13 DGVMs that are part of the TRENDY v8 ensemble. We examined drivers of both the long-term (decadal to centennial timescales) and short-term (inter-annual timescales) model responses of the carbon cycle. With these timescales in mind, we expected that model differences result et al., 2015). Liu et al. (2015) used an empirical approach to convert a harmonised time series of VOD to above-ground biomass carbon.

Fire-related CO 2 fluxes
We used two different satellite estimates of fire CO 2 emissions: the Global Fire Emissions Database version 4 (GFED4s) 110 described in van der Werf et al. (2017) as well as the Copernicus Atmosphere Monitoring Service Global Fire Assimilation System (CMAS-GFAS; Kaiser et al., 2012). GFED4s has a spatial resolution of 0.25 • with a monthly time step and provides data from 1997-2020. We note that starting from 2017, the GFED4s data are flagged as a beta version, i.e. starting 2017, fire emissions are derived from active fire detections instead of burned area. CAMS-GFAS covers the years 2003-2020 on a 0.1 • horizontal grid. The source for both estimates is the MODIS satellite but different variables and methods are used to derive the 115 fire CO 2 emissions (see e.g. van der Werf et al., 2017;Kaiser et al., 2012;Li et al., 2019;Pan et al., 2020, for more detail).

Landcover fraction
We used the MODIS/Terra Vegetation Continuous Fields Yearly L3 Global 250m SIN Grid V006 dataset for a comparison of satellite derived and landcover in DGVMs. This product estimated the global landcover fraction using surface reflectance, brightness temperature and the MODIS Global 250m Land/Water Map (DiMiceli et al., 2017).

Fluxsites/ North Australian Tropical Transect sites
We analysed the TRENDY models in context of the North Australian Tropical Transect (NATT ) ::::: NATT : gradient, which encompasses a precipitation gradient ranging from ∼720 to 1650 mm yr −1 . The NATT was established to understand the function of savanna ecosystems, the predominant landscape in north Australia. We used the four sites Howard Springs (AU-How), Daly Uncleared (AU-DaS), Dry River (AU-Dry) and Sturt Plains (AU-Stp; see appendix table B1). We chose the simulations with 125 transient CO 2 concentration, climate and land-use change. In order to compare the models to the flux sites, we isolated the corresponding gridcells based on the site coordinates and calculated the net ecosystem exchange (NEE) as a balance between GPP and terrestrial ecosystem respiration (TER). We further divided the data into the wet (November-April) and dry (May-October) season.
Although we showed flux tower observations and simulations of NEE, GPP and TER together, we note that the spatial scales 130 of observed ecosystem fluxes (eddy covariance) and simulations by a DGVM are not directly comparable given the flux tower footprint is ∼ 1 km 2 vs. ∼ 3000 km 2 for a gridcell in a 0.5 • grid. Consequently, DGVMs cannot represent local features such as heterogeneous environmental conditions and land cover, or biogeographical feedbacks with atmospheric conditions specific to the site (Piao et al., 2013;Luo, 2007). However, given the meteorology recorded at the sites and the meteorology used to drive the TRENDY simulations were highly correlated (see appendix figure B3), we assumed the TRENDY models simulated 135 the vegetation in a similar climate as the observed. We also note that due to data loss, some of the observed data contained a high proportion of gap-filled data. This was particularly the case at Howard Springs site and for the TER flux across sites.

Vegetation classes
To group the analysis by regions, we classified Australia into six vegetation classes (tropics, savanna, warm temperate, cool temperate, mediterranean and desert; see appendix figure B4) following Haverd et al. (2012). These regions have distinct 140 climate and biophysical characteristics and are based on the agro-climatic classification by Hutchinson et al. (2005).

Identification of modes of variability
Numerous studies highlighted the impact of different modes of climate variability on Australian weather patterns (e.g. Ummenhofer et al., 2011). The El Niño Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD) drive variability in precipitation, which in turn is the main driver of variability in the Australian carbon cycle (e.g Haverd 145 et al., 2013). We identified years corresponding to the phase of ENSO and IOD events (see appendix table B2). We used these different climate modes to group our analysis of modelled carbon fluxes.

Apparent carbon residence time in vegetation
To understand model differences that are related to carbon stored in vegetation (C Veg ), we analysed the carbon residence time in vegetation (τ ). We followed Friend et al. (2014) and Pugh et al. (2020) who defined the change in C Veg over time as Consequently, we calculated the change in the carbon residence time using annual timesteps according to (2)

PFT groups
Each of the TRENDY models has its own vegetation classification. To compare simulated/prescribed vegetation cover between 155 models, we grouped the model-specific plant functional types (PFTs) into ten PFT groups: evergreen trees ('EVG'), deciduous, summergreen and raingreen trees ('DCD/ SMG/ RNG'), shrubs, savanna, C3 grass, C4 grass, C3 agriculture and C4 agriculture (i.e. crops and pasture PFTs), bare ground and a group comprising the remaining landcover types, such as city, urban, or lakes.
For each model, we assigned the model-specific PFTs to these groups and finally compared the fraction of land covered by each PFT group. Note that not all models account for all the vegetation groups, i.e. only four out of the ten PFT groups included 160 input from all of the models (EVG, DCD/ SMG/ RNG, C3 Grass, and C4 Grass).

Net biome production and carbon stored in vegetation
Summed over Australia, the TRENDY model ensemble simulated large inter-annual variability in NBP (see fig. 1a) ranging from -0.9 PgC yr −1 to 1.4 PgC yr −1 . Figure 1a also shows a considerable model spread: the TRENDY models varied on 165 average by 0.5 PgC yr −1 with a difference of up to 1.26 PgC yr −1 between the most extreme models. Years with extremely high or low NBP, such as in 1973 and 2010, were associated with particularly large uncertainty amongst the models. These large differences on inter-annual timescales led to differences in cumulative NBP over 1901NBP over -2018.7 PgC (OCN) and 9.5 PgC (LPX-Bern; see fig. 1b). Two of the DGVMs simulated a net carbon source over the Australian continent from 1901-2018 while the remaining models simulated a net sink ranging from negligible through to 9.5 PgC.

170
Across the 20th Century, the TRENDY models simulated very different amounts of C Veg . For example, JSBACH simulated ∼2.5 PgC while JULES-ES and ORCHIDEE-CNP simulated more than ∼16 PgC (see fig. 1c). We used an estimate derived from vegetation optical depth (VOD; see fig. 1c) to put the model simulations into context. Nine of the thirteen models predicted a higher amount of C Veg than aboveground biomass estimated by the satellite. Uncertainty in the VOD data was not available, but if we conservatively assume ± 30%, at least 7 of the 12 models were outside this range. However, we note that higher 175 C Veg in comparison to aboveground biomass might also result from the fact that C Veg includes above-and belowground biomass. The models also showed very different long term behaviour. For example, some DGVMs simulated a steady state C Veg (e.g. JSBACH and LPX-Bern) through 1901-2018, while other models displayed a strong decrease in C Veg until 1970 and subsequently increased slightly (e.g. OCN and ISBA-CTRIP). VISIT simulated a sudden jump in C Veg around 1975. Similarly, C Soil varied strongly between the models from 11.9 PgC (JSBACH) up to 64.4 PgC (JULES-ES). Most of the models did not 180 show a change over time while three models increased in C Soil (CABLE-POP, ISAM and LPX-Bern). In short, figure 1 shows that the 13 TRENDY models simulated different inter-annual variability, contrasting cumulative NBP and inconsistent short and long term C Veg and C Soil .
3.2 Response to atmospheric CO 2 concentration, climate and land-use change We used three sets of TRENDY simulations (see appendix figure B1) to separate the effect of three drivers -atmospheric  Liu et al., 2015). Note that we define the first month of the year as July and the last month of the year as June for NBP and follow calendar years for the carbon pools.
both in sign and magnitude . Eight out 13 models simulated a decrease in cumulative NBP compared to the CO 2 195 + CLIM run for all regions. Imposed land-use turned positive cumulative NBP in the CO 2 + CLIM simulations into negative cumulative NBP in the CO 2 + CLIM + LUC simulation for OCN and ORCHIDEE-CNP. Only ISAM accumulated more NBP in the CO 2 + CLIM + LUC simulation compared to the CO 2 + CLIM run in most regions.

Response to modes of climate variability
The climate in Australia, especially the precipitation patterns, is strongly influenced by different modes of variability such as   figure B4; Haverd et al., 2012). We define the first month of the year as July and the last month of the year is defined as June. Note that the y-axis limits differ between the panels.   with the ten driest years, and 'Wettest years' show NBP values associated with the ten wettest years (based on precipitation anomalies averaged over Australia). We define the first month of the year as July and the last month of the year is defined as June. Note that the y-axis limits differ between the panels.

Seasonal productivity and phenology
To examine differences in simulated carbon uptake, we evaluated the simulated seasonal cycle. Figure 4 shows that the 215 TRENDY models varied in the timing and the magnitude of peak productivity for the different vegetation regions. We compared the TRENDY models to the GPP estimate derived from solar-induced chlorophyll fluorescence ('GOSIF-GPP'; Li and Xiao, 2019). We note that because direct measurements of GPP at continental scales do not exist (i.e. GOSIF-GPP is not directly observed GPP), we constrained our comparison to an indicative evaluation of differences. Most models simulated higher productivity for the tropics, savanna and warm temperate regions compared to GOSIF-GPP. For all regions except the desert, 220 most models were not within the uncertainty band of the GOSIF-GPP satellite estimate (grey band). The timing of peak productivity varied among the TRENDY models and GOSIF-GPP: most models matched peak productivity for the tropics, Savanna and desert. In contrast, for the cool temperate and Mediterranean regions, most models lagged peak productivity estimated by GOSIF-GPP by one to three months. As the SIF-GPP relationship was derived using a fixed relationship between eddy covariance and satellite SIF, it is possible that the model data mismatch implies a missing species sensitivity to water stress in 225 the SIF data. However, the consistent timing in peak SIF and satellite LAI (see appendix figure B2) tends to imply that the lags in phenology relate to differences in assumed/emergent vegetation cover (see below).

Apparent carbon residence time
Relative to the average over 1901-1930, all models simulated a similar rate in increase in NPP (see fig. 5a). The apparent carbon residence time (τ ), i.e. the balance between growth of plant tissues (leaves, wood, roots) and the turnover of these 230 tissues, varied between 2 and 14 years averaged over Australia. Models that simulated high C Veg (compare fig. 1) also had high values for τ (e.g. ORCHIDEE-CNP and JULES-ES) and vice-versa (e.g. JSBACH and LPX-Bern). Some models did not simulate changes in τ over time (e.g. CLM5.0, JSBACH, JULES-ES, LPX-Bern) while other models simulated decreases. In particular, the ISBA-CTRIP and OCN models showed a strong decline: from ∼six years (1900) to four years (1960) before leveling off (see fig. 5c). Even though NPP increased for ISBA-CTRIP and OCN, C Veg significantly declined for these two 235 models as well due to the decrease in τ . The remaining models either balanced increasing NPP and decreasing τ so that C Veg did not show a strong trend (e.g. CLM5.0, JSBACH, JULES-ES, LPX-Bern) or increased in C Veg because the effect of change in NPP was greater than the decline in τ (e.g. CLASS-CTEM, ORCHIDEE, VISIT).
Carbon stored in vegetation

Northern Australian Tropical Transect (NATT)
To better understand differences in modelled carbon fluxes, we examined simulations along a rainfall gradient: the Northern (November-April) and dry (May-October) seasons across the NATT. During the dry season, as the sites become drier, the mean position of the GPP and TER distributions shifted closer to zero and the distribution spread narrows. By contrast, the shape of the distributions varied strongly among the models for all variables: some models (e.g ISAM; LPX-Bern for dry season NEE and GPP) simulated high peak probability densities and narrow distributions while others (e.g. JULES-ES and OCN) displayed 245 a flatter distribution. During the wet season, some models were more productive than observed for all sites (CABLE-POP, JSBACH, JULES-ES and SDGVM). Overall, there was no clear systematic pattern in the arrangement of the models across the NATT, even when models simulated similar distributions for NEE, they often did so for different reasons (i.e. contrasting trade-offs in carbon uptake/respiration). terrestrial ecosystem respiration (TER) for the for four fluxsites: Howard Springs (AU-How), Daly Uncleared (AU-DaS), Dry River (AU-Dry) and Sturt Plains (AU-Stp). These sites reflect a strong mean annual precipitation (MAP) gradient (provided for each row). We show the monthly distribution of each variable for the wet (November-April) and dry (May-October) seasons. The observed variables originate from eddy covariance data collected by the TERN-OzFlux facility. The probability density function is based on kernel density estimates using Gaussian kernels with bandwidth selection following Scott (1992).

Land cover fraction
270 Figure 8 shows the vegetation fraction, either prescribed or simulated dynamically, from the TRENDY models averaged over Australia. We grouped the model-specific PFTs into ten groups (see methods). In general, the average landcover varied quite strongly among the models. The average fraction of Australia covered by natural vegetation differed between 24.9% (LPX-Bern) up to 93.7% (VISIT). The vegetation composition also displayed large differences with some models simulating high tree fractions around 40% (ORCHIDEE and ORCHIDEE-CNP) while other models had low fractions around 2% (e.g. ISAM, 275 and SDGVM). Models accounting for shrubs populated 3.5% (ISBA-CTRIP) up to 55% (VISIT) of Australia with shrubs.
Similarly, land populated by grasses covered a large range between 4% up to 50% (VISIT and OCN, respectively) and the fraction of C3 grasses either exceeded land covered by C4 grasses or vice versa. Most models showed a relatively small proportion of agricultural landcover except for LPX-Bern. Lastly, models accounting for bare landcover fraction (i.e. bare ground or desert) defined 0.  In contrast, the grass cover fraction for ORCHIDEE generally increased moving from the coastal areas to the interior of the continent. The tree fraction also varied strongly across the models. Most models had high fractions along the East Coast and in the tropics and no significant tree growth in other places, similar to the MODIS satellite derived tree cover fraction. In contrast, OCN had a low tree cover fractions along the coastline but this fraction increased towards the interior of Australia, with a higher distribution in the South

290
West and South East (see appendix figure B5). On average, Australia's simulated contribution to the global NBP anomaly was 17%, with greater contributions when the global sink (excluding Australia) was small, or in years with above average precipitation in Australia (e.g. in a La Niña years ;Poulter et al., 2014). Depending on the model, Australia's share of global cumulative NBP could be as high as ∼53% (SDGVM).

295
Given Australia's significant contribution to the global terrestrial carbon sink on inter-annual timescales, our goal was to assess the skill of state-of-the-art DGVMs applied to Australia.
Despite being forced with identical meteorology, the 13 DGVMs from the TRENDY v8 ensemble simulated markedly different representations of Australia's terrestrial carbon cycle. The maximum difference in annual NBP among the models reached up to 1.3 PgC yr −1 (see fig. 1a). Importantly, uncertainties associated with IAV in simulated NBP accumulated over

Landcover
We found that the models either prescribed or simulated very different fractions of woody and herbaceous cover (see fig. 9 310 and appendix figure B5), and that these discrepancies likely explained a large proportion of the model divergence in simulated Land-use and land-cover change is one of the major drivers of the terrestrial carbon sink on global and regional scales Whilst DGVMs were developed to understand the interactions between natural vegetationand the atmosphere, the insight that land-use is a major driver of the terrestrial carbon sink has driven efforts to incorporate land use change into DGVMs (Marquer et al., 2018;Pongratz et al., 2018). Differences in the models' response to LUC results from process inclusions, 325 parametrisations and the model-specific interpretation of the land-use forcing (e. g. change in vegetation composition, as well as wood/crop harvest). Although LUC information was taken from the Land-Use Harmonization 2 (LUH2) guidelines, the modelers had to decide how to implement it. For example, the LUH2 suggests that all natural vegetation should be cleared for the conversion to managed pasture while the conversion to rangeland only requires the transformation of forest to (managed) grassland. Other natural vegetation, such as grass-or shrubland, were used for animal browsing without any transformation 330 of the land cover type (Ma et al., 2020). Furthermore, the fraction of croplands reconstructed by LUH2 increased markedly in South West Australia and South East Australia. This increase in crop fraction was associated with increased harvests and consequently expected to reduce NBP. Some models exhibited this feature to varying degrees (JULES-ES, OCN, and ORCHIDEE-CNP) while other models barely increased the crop fraction and impact on NBP (CABLE-POP, ORCHIDEE and VISIT; see appendix figure B6). These different representations of LUC led to emergent differences (e.g. in physiology as the 335 models transition between PFTs and agricultural management) when harvesting was imposed on models simulations.
Atmospheric CO 2 concentration All the models simulated an increase in cumulative NBP in response to increasing CO 2 but with very different magnitudes (+2. 46-10.13 PgC from 190146-10.13 PgC from -2018. This was in line with studies that have identified increasing CO 2 as the main driver of change in the global terrestrial carbon sink and find that model disagreement was largest due to differing sensitivities to CO 2 340 in cumulative NBP (e.g. Huntzinger et al., 2017;Arora et al., 2020;Walker et al., 2020). The incorporation of nutrient cycles in models and DGVMs in particular, tends to result in a reduced sensitivity to CO 2 (e.g. Smith et al., 2014;Zaehle, 2013;Thomas et al., 2013;Meiyappan et al., 2015;Meyerholt et al., 2020). In a global analysis, Huntzinger et al. (2017) found that models incorporating an : a nitrogen cycle tended to have a lower net carbon sink than models without nitrogen constraints. Similarly, two model based analyses in the context of CO 2 manipulation experiments, found lower simulated net primary productivity 345 responses in models that incorporated nitrogen and phosphorus cycles (Medlyn et al., 2016;Fleischer et al., 2019). Interestingly, this finding does not hold for our study: the three models with the highest cumulative NBP all included a nitrogen cycle. A possible explanation may be that Australian soils are considered to be phosphorus, rather than nitrogen limited (Du et al., 2020;Lambers et al., 2015) due to weathering processes and high phosphorus sorption capacities (Beadle, 1966;Wild, 1958). Results from an ecosystem-scale CO 2 manipulation experiment (Ellsworth et al., 2017) showed that phosphorus availability limited 350 biomass growth in an Australian woodland, although leaf-level photosynthesis was observed to have consistently increased (Jiang et al., 2020;Ellsworth et al., 2017;Yang et al., 2020). Despite the importance of phosphorus availability in Australia, only two of the TRENDY ensemble incorporated an interactive phosphorus cycle and these models (CABLE-POP and ORCHIDEE-CNP) did not show consistent behaviour. It is likely that improved modelling of the phosphorus cycle and in particular, the relative 'openness' of this cycle and the flexibility of the plant tissue stoichiometry, will remain key to accurately simulating 355 the time evolution of Australia's carbon cycle (Medlyn et al., 2016).

Apparent carbon residence time in vegetation
The capacity of the terrestrial vegetation to store carbon depends on the magnitude of the input carbon flux and the carbon residence time in plant tissues (τ ; Luo et al., 2003). All models simulated increases in NPP at similar rates compared to their initial NPP  average; see fig. 5). As the other models simulated a similar change in the rate of carbon uptake 360 throughout the period, divergence in the change of C Veg instead resulted from differences in the change of τ . τ depends on the turnover of plant tissues, the simulated mortality rates and mortality induced by competition. Each of these processes can be affected indirectly through shifts in vegetation composition (Friend et al., 2014). Previous studies have found that τ was a key source of model uncertainty shared amongst DGVMs (Friend et al., 2014;Pugh et al., 2020). In an analysis of CMIP5 models, Carvalhais et al. (2014) found that less than a quarter of the models were within the range of of observed τ in tropical 365 Australia and parts of the Northeastern Savannas. Similarly, we found that τ in the TRENDY models was associated with large uncertainty: between 2 and up to 14 years over Australia. Models that simulated high vegetation carbon storage also had comparably long τ and vice versa. These differences in modelled τ imply that the TRENDY models assume different parametrisations associated with their PFTs, mortality and/ or simulated different vegetation compositions and this is an area requiring future evaluation.

Climate and modes of climate variability
Variability in Australia's climate can influence the terrestrial carbon sink directly (e.g. fire, droughts or increased water availability; Keenan and Williams, 2018;Ma et al., 2016) or indirectly via interactions with nutrient availability or fire. For example, models that include interactive fire modules might directly (e.g. wind speed driving fire spread; temperature threshold to allow for fire) and indirectly (fuel availability and moisture) depend on climate variables, and altered fire patterns could 375 further affect the simulated terrestrial carbon sink. Overall, we found that the effect of climate on inter-annual NBP was relatively small, with the exception of the Mediterranean region (see fig. 2). This is in general agreement with Huntzinger et al.
(2017) who found that long term trends in climate were too small to significantly alter the simulated terrestrial carbon sink on long timescales and was : a less important factor than CO 2 fertilisation, nitrogen deposition (depending on whether the models include an interactive nitrogen cycle) and land-use change. While climate variability might not influence Australian carbon 380 cycle trends on long timescales, inter-annual weather variability, especially the amount and timing of precipitation, is a strong control on inter-annual variability in NBP.
Australia's weather is influenced by different modes of variability with El Niño and pIOD events tending to result in below average precipitation in south eastern Australia (e.g. Ummenhofer et al., 2011). The reduced water availability can be expected to reduce NBP or even turn Australia from a carbon sink to a carbon source (e.g. Ma et al., 2016). In contrast, nIOD and La influence the Australian weather patterns and consequently the terrestrial vegetation (e.g. Cleverly et al., 2016). We did not include an analysis of SAM because these events are short lived (one-two weeks), but future work may examine the resulting impacts. We note that selecting years based on ENSO-and IOD-events does not completely isolate the effect of ENSO-or IOD-events on the terrestrial carbon cycle as these modes of variability are not independent and could have legacy impacts that last more than one year. Further, years where both an IOD and an ENSO event occur were accounted for in both the IOD and 400 ENSO categories and therefore, double counted. An additional limitation of our approach was that we compiled the occurrence of ENSO and IOD events from different sources based on the sea surface temperature. Observations of sea surface temperature prior to 1960 however are associated with large uncertainty given they were mostly based on ship data (e.g. Deser et al., 2010).
In addition, the Southern Hemisphere tends to be less well observed and spurious trends in reanalyses do occur (e.g. Hines et al., 2000), leading to an overall increased uncertainty in the sampling of data based on modes of variability.

405
Overall, dry and wet extremes in precipitation led to the largest differences among the models (see fig. 3 'Driest years' and 'Wettest years'), suggesting that the models' sensitivities to wet and dry extremes vary strongly. About half of the TRENDY ensemble had a stronger response to wet than to dry extremes. This is in line with studies that have noted an increased carbon uptake resulting from the asymmetry in the inter-annual distribution of rainfall as well as the asymmetry in the response of GPP to precipitation (e.g. Haverd et al., 2017). The ecosystem in Australia largely consists of vegetation that is adapted to drought 410 (e.g. Li et al., 2018;De Kauwe et al., 2020). About half of the TRENDY ensemble displayed greater divergence during dry extremes, implying that models did not accurately capture process responses as water becomes limiting.
This result is consistent with an a model intercomparison at an Australian woodland site, which showed disagreement was greatest in low-rainfall years (Medlyn et al., 2016). The TRENDY models not only simulated different carbon uptake sensitivities to precipitation but also simulated different seasonal phenology (see fig. 4), suggesting that differing model sensitivities to 415 rainfall may result as much from the underlying simulated vegetation as the different mechanisms at play in wet/dry extremes.
We further looked at the TRENDY output in context of the NATT, defined by a strong rainfall gradient (∼ 720 to 1650 mm yr −1 ). We found that for dry season GPP, the mean position of the distributions shifts closer to zero and the distribution narrows as the sites become more arid, implying that vegetation is only productive following rain (Whitley et al., 2016). By contrast, there was no clear systematic pattern in the responsiveness of simulated productivity during the wet and dry seasons.
Fire-related CO 2 fluxes 435 The seven TRENDY models that included fire outputs consistently underestimated satellite-derived peak monthly emissions and did not accurately capture the timing of peak fire CO 2 emissions. Importantly, we found that DGVMs did not :::::: always accurately capture fire dynamics in Australia, even when aggregated annually (which should reduce timing biases). While land-use change has been identified to be a significant driver of global burned area (Teckentrup et al., 2019), we found that the effect of land-use change on fire CO 2 emissions in Australia was small (see appendix figure B7). Models either increase or 440 decrease in fire CO 2 emissions due to land-use change but these changes were mostly smaller than 1%. All models simulated fire with varying degrees of complexity and derive fire CO 2 emissions based on model-specific burned area using emission factors based on Andreae and Merlet (2001) and Akagi et al. (2011). Consequently, model disagreements must originate from differences in simulated burned area.

Future directions
Our analysis highlights considerable work is still required to improve our capacity to simulate the dynamics of Australian veg-490 etation and the resulting terrestrial carbon cycle. This finding is consistent with previous model evaluations of NPP (Roxburgh et al., 2004), carbon flux responses to rainfall (Whitley et al., 2016(Whitley et al., , 2017 and responses to elevated CO 2 (Medlyn et al., 2016).
When DGVMs are applied to Australia, the assumed plant traits are not adjusted to reflect differences in observed physiology (e.g. Bloomfield et al., 2019;Cernusak et al., 2011), investment in plant tissues (e.g. Togashi et al., 2015), how moisture limits plant function (e.g. De Kauwe et al., 2020;Li et al., 2018) or responses of photosynthesis and growth to temperature (Drake 495 et al., 2015). For example, models typically assume much lower values for the key model parameter V cmax (the maximum rate of carboxylation by the enzyme Rubisco; Rogers, 2013) than is commonly observed in Australia (e.g. Bloomfield et al., 2019;Cernusak et al., 2011), implying that DGVMs miss key carbon and water trade-offs. Whitley et al. (2017) previously stressed the importance of improving process representation of phenology, root-water uptake, rooting depth and fire to improve the representation of Savanna ecosystem dynamics in DGVMs. These Savanna ecosystem cover around 20% of the Australian 500 landscape, but as this study implies (see fig. 6), progress has been limited, although recent model developments incorporating competing rooting strategies and dynamic root growth (Sakschewski et al., 2020) may help this process. Australia has a high fraction of endemic vegetation (>90%; Chapman, 2006), structurally distinct vegetation including open canopy forests and woodlands (predominately Eucalyptus) and hummock grasslands and a dominance of Sclerophyll leaves (Box, 1996). Thus, there remains a significant opportunity in future work to directly test the current hypothesis implicitly embedded in DGVMs, 505 i.e. that Australia's vegetation is not distinct from other continents. The forthcoming AusTraits (https://ardc.edu.au/project/ austraits-a-curated-plant-trait-database-for-the-australian-flora/) database presents one possibility to examine how the use of Australian specific plant traits may change DGVM simulations.
One core area in which model development should be focused relates to the processes that govern the vegetation cover.
In these models vegetation cover is either dynamically simulated ( Our study has highlighted major differences across models under historic forcing (see fig. 8 and 9), implying that future simulations need to be interpreted cautiously. Haxeltine et al. (1996) demonstrated a realistic simulation of Australia's potential vegetation cover with the BIOME2 model and yet, 25 years later, it is not clear that DGVMs have made significant progress.
One explanation for this apparent discrepancy may simply relate to focus of the modellers. In the Haxeltine et al. (1996) 515 study, the model explicitly aimed to reproduce Australia's potential vegetation cover, whereas the current TRENDY models are instead focused on global applications.
Nevertheless, our results imply that a one-size fits all approach does not appear to work for Australia. Australia's National Vegetation Information System (https://www.environment.gov.au/land/native-vegetation/national-vegetation-information-system) generates maps of both vegetation distribution as well as structure, which could be used to evaluate and develop DGVM pre-520 dictions of potential vegetation. However, a direct comparison to observed vegetation distributions still requires model-specific rules to transform simulated foliar projected cover and height to vegetation classes. Our results also showed that imposing land-use change adds further uncertainty to the vegetation distribution. Despite the importance of improving simulated vegetation type, as Fisher et al. (2015) highlighted, it is not necessarily a straight forward task but over-reliance on bioclimatic boundaries is not preferable moving forwards. Emergent differences in cover type may also originate from parameter assump-525 tions, which determine carbon uptake/loss, additional hypotheses that govern demography and responses to fire. Haverd et al. (2016a) demonstrated promise in accurately predicting tree-grass partitioning in Australian savannas using optimality theory to determine daily carbon allocation. However, this approach has yet to be incorporated into any of the Australian land models fractions, by incorporating PFT-specific fire (adaptive bark thickness) resistance and post-fire epicormic resprouting into a DGVM. However, these features have yet to be incorporated into any current DGVMs when applied to Australia.
In conclusion, there is a major opportunity in evaluating DGVMs using Australia as a laboratory. The strong inter-annual variability in precipitation offers an opportunity to evaluate the simulated response of the Australian ecosystems to both extreme wet events and droughts. Similarly the repeated frequency of heat extremes (van der Horst et al., 2019;van Gorsel et al., 2016;535 Mitchell et al., 2014), offers important tests of underlying physiology assumptions. It is likely that insights gained from model evaluation in Australia's environment of longer duration and high intensity climate extremes, including droughts, heatwaves and floods, will help understand how Northern Hemisphere systems will respond to extremes, and in particular events that may intensify in the future (IPCC, 2012). Finally, our study focused on Australia's carbon cycle, but the carbon and water cycle are intimately linked and divergence between model simulations of the carbon cycle implies further work may also be needed to 540 evaluate DGVM simulation of the hydrological cycle.

CO 2 concentration
The atmospheric CO 2 concentration is derived from ice core CO 2 data merged with NOAA data from 1958 onwards. The values. Data prior to March 1958 are estimated with a cubic spline fit to ice core data from Joos and Spahni (2008).

Land-use
The land-use datasets are based on updated data from HYDE for the years 1960-2019, as well as the latest wood harvest data provided by the Food and Agriculture Organization (FAO). The land-use states and transitions are identical to the LUH2 v2h dataset (Hurtt et al., 2020) for the years from 1700 to 1949 (i.e. consistent with the input for CMIP6). Starting 1950, the land-use forcing is based on new inputs from HYDE, and new FAO data for the national wood harvest demands. This leads to 570 differences in comparison to LUH2 v2h, primarily in Brazil.
In order to convert natural vegetation to managed pasture, the LUH2 guidelines suggested all natural vegetation is cleared.
The conversion of natural vegetation to rangeland only requires the clearance of forests. However, each modelling group developed model-specific land-use forcings leading to inter-model differences. We show the change in agricultural landcover in appendix figure B6.

Spin-up
For the spin-up, the atmospheric CO 2 concentration and landcover were set to the pre-industrial values of the year 1700.
During the spin-up, the years from 1901 to 1920 from the climate forcing were recycled.

595
After the spin-up, the atmospheric CO 2 concentration remained time-invariant for the years 1701-2018. The simulation continued to recycle the 1901-1920 climate forcing and the land-use was prescribed to the distribution of the year 1700. The TRENDY protocol required all participating models to be in equilibrium for the CTRL simulation after the spin-up.
Further, all models had to simulate the net annual land flux as a carbon sink over 1990s and/or 2000s for the CO 2 + CLIM + 610 LUC run as constrained by global atmospheric and oceanic observations (Keeling and Manning, 2014). Lastly, the global net annual land use flux (ELUC) had to be a carbon source over the 1990s (based on the CO 2 + CLIM and CO 2 + CLIM + LUC simulation).  El Niño La Niña pIOD nIOD ENSO and IOD neutral 1902197219031964190219631906197419011951      Competing interests. The authors declare no conflict of interests. Correspondence and requests for materials should be addressed to LT 620 (l.teckentrup@unsw.edu.au)