Simulation of soil carbon dynamics in Australia under a framework that better connects spatially explicit data with ROTH C

We simulated soil organic carbon (C) dynamics across Australia with the Rothamsted carbon model (ROTH C) under a framework that connects new spatially-explicit soil measurements and data with the model. Doing so helped to bridge the disconnection that exists between datasets used to inform the model and the processes that it depicts. Under this framework, we compiled continental-scale datasets and pre-processed, standardised and configured them to the required spatial and temporal resolutions. We then calibrated ROTH C and run simulations to predict the baseline soil organic C stocks and composition in 5 the 0–0.3 m layer at 4,043 sites in cropping, modified grazing, native grazing, and natural environments across Australia. The ROTH C model uses measured C fractions, the particulate, humus, and resistant organic C (POC, HOC and ROC, respectively) to represent the three main C pools in its structure. The model explained 97–98% of the variation in measured total organic C in soils under cropping and grazing, and 65% in soils under natural environments. We optimised the model at each site and experimented with different amounts of C inputs to predict the potential for C accumulation in a 100-year simulation. With 10 an annual increase of 1 Mg C ha−1 in C inputs, the model predicted a potential soil C increase of 13.58 (interquartile range 12.19–15.80), 14.21 (12.38–16.03), and 15.57 (12.07–17.82) Mg C ha−1 under cropping, modified grazing and native grazing, and 3.52 (3.15–4.09) Mg C ha−1 under natural environments. Soils under native grazing were the most potentially vulnerable to C decomposition and loss, while soils under natural environments were the least vulnerable. An empirical assessment of the controls on the C change showed that climate, pH, total N, the C:N ratio, and cropping were the most important controls 15 on POC change. Clay content and climate were dominant controls on HOC change. Consistent and explicit soil organic C simulations improve confidence in the model’s predictions, contributing to the development of sustainable soil management under global change.

and Porporato, 2009), leading to inconsistent model calibrations (Conant et al., 2011;Seidel et al., 2018) and inaccurate model predictions (Shi et al., 2018). In this context, the development of frameworks for soil organic C modelling and simulation, to synthesise and integrate measurements and datasets with models are critical (Harden et al., 2018;Ogle et al., 2010;Paustian et al., 1997;Smith et al., 2020). The frameworks should allow for their efficient updating, with new measurements, data and models, as they become available (England and Viscarra Rossel, 2018;Smith et al., 2020). They should also enable a more systematic approach for calibration and validation, making simulations more reliable and reproducible. 65 Our aims here were to: (i) simulate soil organic C across Australia with the ROTH C under a framework that enables the synthesis, processing and standardisation of measurements and data, and predictions at a correct scale, and (ii) to predict the changes in C stocks and composition under cropping, modified grazing, native grazing, and natural environments, in a 100-year simulation. We describe the optimisation of the simulations to derive baseline estimates of the total organic C stock and that of its pools and use a plausible range of C inputs to make the predictions. Finally, we identified the soil and environmental 70 controls on the predicted changes.
2 Materials and methods

The Rothamsted carbon model (ROTH C)
ROTH C is a soil process model for the turnover of organic C in non-flooded soils (Jenkinson, 1990;Coleman and Jenkinson, 1996). The model partitions total organic C (TOC) into pools that represent decomposable plant material (DPM), resistant 75 plant material (RPM), microbial biomass (BIO), humified organic matter (HUM), and inert organic matter (IOM) (Coleman and Jenkinson, 1996). The model simulates on a monthly time step changes in its active pools, in response to climate, soil type, land use and management. Annual C inputs from crops and manure represent different land use and management regimes. We used the ROTH C model version 26.3, which is the version that was re-calibrated for a range of Australian soils (Skjemstad et al., 2004). The decomposition rate constants for the DPM, RPM, BIO, and HUM pools are 10, 0.3, 0.15, and 0.02 year −1 , 80 respectively (Skjemstad et al., 2004). The decomposition of each active pool is assumed to increase, following first-order kinetics, with air temperature, but reduced by soil water deficits and the presence of vegetated soil cover. Temperature effects on soil organic matter decomposition increase following a sigmoid function, while the topsoil moisture deficit reduces it by a factor of 0.2 to 1 (no moisture stress). The soil cover factor is 1.0 for bare soil and 0.6 when soil is vegetated, to slow organic matter decomposition. The main conceptual pools RPM, HUM and IOM are replaced with the measured particulate, humus 85 and resistant organic C fractions (POC, HOC and ROC, respectively) (Skjemstad et al., 2004). The POC fraction includes any DPM available in the soil at the time of measurement.

Soil C simulations under a framework
The framework that we used to simulate C dynamics across Australia (Figure 1), enabled us to efficiently standardise and then integrate measurements and data on soil properties and environmental controls with the ROTH C model for simulation  2.3 Implementation of the framework for soil C simulations 2.3.1 Data compilation and synthesis 95 ROTH C requires site coordinates, POC, HOC, ROC, clay content, and sampling depth (in our case 0-0.3 m). The available water capacity (AWC) of the soil to a depth of 1 m is needed to estimate the soil water balance in a rooting depth. We used a total of 4,431 sites across Australia (Figure 2). The soil properties were estimated with visible-near infrared spectra (Viscarra Rossel and Webster, 2012;Viscarra Rossel et al., 2015). Maximum air temperature, minimum air temperature, precipitation and pan evaporation are also required to run the model. We obtained gridded daily climate data (approximately 5-km resolution) from model include an estimate of the decomposability of incoming biomass, soil cover, and monthly inputs of plant C and farmyard manure. These C input variables, if not measured, must be estimated at each site (see below).

Data pre-processing and standardisation
The datasets were pre-processed and configured to provide standard and consistent values and units of measurement. Daily weather was extracted at each of the 4,431 sites for the 20 years from 1991 to 2010. The mean of the minimum and maximum 110 daily temperatures derived the average daily temperatures. Aggregation of the daily weather data produced monthly average temperature, precipitation, and pan evaporation.
We used the Australian land use map to re-classify each site into the following broad land uses: cropping, modified grazing, Data on agricultural practices at the SA2 level obtained from Unkovich et al. (2017) were used to select a crop or grass to represent typical management regimes in the sites under cropping and modified grazing.
2.3.3 Configuration of land management regimes and initial estimation of C inputs to soil ROTH C does not calculate plant growth or the quantity of soil C inputs. Therefore, we estimated monthly plant C returns and farmyard manure added to the soil (e.g. managed or deposited by animals grazing on pasture) using the following approach.
The initial estimate was made to set the starting values of the C inputs and to match the timing of C inputs to the crop or grass grown.
We assumed that crops grow in rotations, but at sites under modified pastures, only a single grass species was considered. We 125 used the activity data from Unkovich et al. (2017) to determine crop rotations and a representative grass species for each site during the baseline period between 1991 through 2010. We based the selection on the probabilities of change in a regime. The data suggest that these probabilities were different for different periods (particularly for 1990-1994, 1995-1999, 2000-2004, 2005-2009, 2010-2014). For each period, we calculated the cumulative frequency by regime. We used it to randomly select the crop or grass species (both annual and perennial) planted each year in the particular period. The crops grown in all years 130 were selected and then used to determine the most dominant crop species. For the sites under native grazing, we considered a native perennial grass only.
For annual plant species, we used a crop model (Unkovich et al., 2018) that uses the amount of water available to the plant (derived from the precipitation data) to calculate a potential dry matter increment that is water-limited (WLDM) in kg ha −1 : where ET is the evapotranspiration (mm) from pan evaporation, DD is any deep water drainage (mm) that occurs during the fallow season, T s is a fraction of ET that goes through the transpiration, T d is a fraction of deep water drainage that goes through the transpiration, and T E is the transpiration efficiency that is the amount of biomass produced per unit of water transpired (kg mm −1 ) of a cropping or grazing system. The maximum dry matter production (DM max ) is the sum of dry matter increments over the growing season. This model then back calculates dry matter accumulation (kg ha −1 ) over the 140 season (DM acc ): where Day is the current day as the season progresses, Days max is the number of total growing days, Day sow is the day of planting, and a and b are growth coefficients specific for the plant. For a perennial system, daily growth (G) in kg ha −1 is calculated as: where W LT is the amount of water-limited transpiration (mm) that is evapotranspiration multiplied by vegetation cover, T I is the temperature index function (Nix, 1981), and T E is the transpiration efficiency of a perennial system (kg mm −1 ). The perennial plant growth is used to calculate dry matter accumulation over the season. The model estimates root biomass using a plant-specific shoot-to-root ratio. For both modified and native pastures, we assumed grazing to occur if the grass accumulated 1.2 Mg ha −1 of shoot drymatter, with no grazing effect on plant growth. Specifically, grazing animals consumed 50% of daily shoot growth, returned 50% of the consumption to the soil as dung, and shed 50% of daily root growth. When the available soil water fell to < 15% of water holding capacity, 1% and 0.5% of the shoot dry matter and the root dry matter were assumed to die daily. We assumed the C content of above-ground and below-ground residues to be 42% by mass.

155
For the sites under natural environments, however, we did not use the plant model because we had no data on plants in this region. Instead, we assumed small but consistent C inputs from plant residues only, which was set to be 0.049 Mg ha −1 /month.
We assumed no soil cover because in these regions, vegetation cover is typically sparse.

Simulation: optimisation of C inputs to the baseline soil organic C
We assumed an equilibrium condition for the simulation of the baseline soil organic C stocks. Our assumption is based on 160 temporal soil organic C changes from 73 sites across Australian agricultural regions, recorded from 1911 to 2000 (Skjemstad and Spouncer, 2003). The DPM/RPM ratio determines the decomposability of incoming biomass. By default, the recommended DPM/RPM ratio is 1.44 for most crops and improved pastures and 0.67 for unimproved grasslands (Coleman and Jenkinson, 1996). The DPM/RPM ratio depends on the quality of C in plant residues and manure. It is site-specific, differs by landuse (Post and Kwon, 2000), and is unknown for Australian native grazing or natural environments. We tested six different 165 DPM/RPM ratios (0.67, 0.96, 1.17, 1.44, 1.78 and 2.23) to assess the sensitivity of the simulated TOC, POC and HOC to this parameter. These ratios correspond to allocations of incoming plant material to DPM in the range 40-69%, and proportionally, to RPM in the range 60-31%. We then performed the simulation iteratively (up to 1000 times) at each site until the POC and HOC pools matched the estimates of the measured C fractions. We ran the model for 100 years using weather data, repeated from a baseline period between 1991-2010, until equilibrium conditions occurred, with no temporal change in both POC and 170 HOC. At each iteration of the simulations, the monthly input of plant residues and farmyard manure changed from their initial values by a fraction of 1/100. We considered only monthly C inputs in the simulations. Equilibrium condition occurred when 1) both POC and HOC did not significantly change over time (P > 0.05), or 2) we observed an absolute change of < 0.0025 Mg C ha −1 in both POC and HOC. We used a time series linear model with a trend and seasonality to fit the change in POC and HOC over time. An equilibrium condition was also assumed if the direction of a trend (positive or negative) in any pool 175 shifted. This condition prevented unrealistic simulations because both POC and HOC showed the same trend in response to C inputs. Depending on the DPM/RPM, at 12 to 14 out of the 4,431 sites, the model was not able to simulate the equilibrium condition. We note that, for the sites that failed, changing C inputs only is insufficient for making both the POC and HOC pools reach equilibrium simultaneously.
We report the stocks of TOC, POC and HOC at the end of the 100-year simulation. We selected 100 years because it is the 180 permanence period for soil C sequestration of the Australian ERF. The difference between the measured and the simulated TOC stock provided an estimate of the model deviation. We also calculated the range of monthly variation in simulated TOC stocks.
For each site, we selected the DPM/RPM ratio based on the minimum deviation of TOC. We excluded 388 sites with the model deviation and range of monthly change in TOC stock ≥ 10 Mg C ha −1 . Large TOC stocks characterised the sites, median 75.04 Mg C ha −1 (range 52.58-111.44 Mg C ha −1 ) and fell mostly under a modified grazing (data not shown). Finally, we 185 optimised the amount of monthly C input and the DPM/RPM ratio at 4,043 sites and used them as the baseline. We determined the dominant values of the DPM/RPM ratio for each land-use across Australia, based on their relative frequency.

Prediction: potential for C sequestration under changing C inputs
Using the calibrated model, we predicted potential changes in soil organic C over 100 years, in response to changes in C inputs.
We selected different rates of C input to the soil by multiplying the optimised baseline with the factors 0, 0.25, 0.5, 0.75, 1.25, 190 1.5, 2, 4 and 6. These rates were selected to represent a wide range of C inputs that would be either physically achievable or manageable. Because we already calculated the timing of C inputs and the sensitivity to the DPM/RPM ratio, we did not consider scenarios that varied the timing or quality of C inputs. We chose 100 years so that we could detect changes in TOC, POC and HOC and predict their long-term response.
We calculated 11-year moving averages of the stocks of TOC, POC and HOC and the potential vulnerability of soil C to 195 decomposition over the 100-years. The vulnerability of soil C was derived using POC/(HOC + ROC) (Viscarra Rossel et al., 2019). We calculated changes in soil organic C by changing C inputs. We reported the median stocks for the last 11 years of the simulation when it reached a new equilibrium. We also approximated the lower and upper 95% confidence intervals for the median to quantify the variation of their responses to different C inputs (Conover, 1998 There are soil and environmental controls on organic C that are not accounted for by ROTH C. To gain a better understanding of the controls on the predicted change in soil organic C under changing C inputs, we modelled the change in TOC, POC and HOC as a function of the four land-use classes and a set of environmental variables. The environmental variables included i) soil properties, such as total nitrogen (N), total phosphorous (P), and C:N (Viscarra Rossel et al., 2015), ii) clay minerals (illite, kaolinite, and smectite) (Viscarra Rossel, 2011) and iii) potassium (K), thorium (Th) and uranium (U) from gamma 205 radiometrics, which also represent mineralogy and parent material (Minty et al., 2009). For the modelling, we used the machine learning algorithm CUBIST (Quinlan, 1992). To build precise and stable models, we tested combinations of committees (1, 2, 5, 10, and 20) and the number of neighbours (0, 2, 5, and 9) using 10-repeated cross-validation (Hastie et al., 2009). We used the minimum root mean squared error (RMSE) to select the best model. We then assessed the relative importance of each variable based on the usage of each variable in the conditions and the linear models.   Table S1). However, the amount of annual C input necessary to maintain soil organic C stocks was sensitive to 220 the varying quality of incoming plant material. The C inputs increased from 1.47 to 1.83 Mg C ha −1 when the DPM/RPM ratio increased from 0.67 (low decomposability) to 2.23 (high decomposability). With those changes, the rate of C inputs into DPM rose from 0.59 to 1.26 Mg C ha −1 yr −1 , while the rate into RPM decreased from 0.88 to 0.57 Mg C ha −1 yr −1 . The addition of biomass C with different qualities affected the levels of POC and HOC at equilibrium (Supplement Table S1).
With an optimised DPM/RPM ratio at each site, the model was able to explain 97-98% of the measured variation in TOC   * The Australian-wide estimates were the area weighted averages of the medians for the four land-use classes. The area of cropping, modified grazing, native grazing, and arid natural environments occupy 3.8%, 9.2%, 44.8%, and 19.6% of Australia (total area 7673138 km 2 ), respectively.

Effect of changing C inputs on soil organic C
The TOC, POC and HOC stocks at equilibrium were positively related to the level of C inputs (Figure 4). Annual C inputs to the soil under natural environments, native grazing, modified grazing and cropping were 2.38, 0.77, 1.86 and 1.60 Mg C ha −1 , 235 respectively. Therefore, the model estimated the largest amount of C inputs required to maintain soil organic C under natural environments compared to the other land uses. The corresponding interquartile range was 1.11-3.57 Mg C ha −1 for natural environments. In comparison, there was a wider range of C inputs for native grazing (0.57-1.13 Mg C ha −1 ), modified grazing (1.37-3.01 Mg C ha −1 ), and cropping (1.20-2.18 Mg C ha −1 ) (Figure 4). For the agricultural soils, clay affected the relationship between soil organic C stocks and C inputs as soil with more clay (predominantly in eastern Australia) could hold more 240 11 https://doi.org/10.5194/bg-2020-150 Preprint. Discussion started: 21 July 2020 c Author(s) 2020. CC BY 4.0 License. organic C (Figure 4). In particular, HOC and TOC under modified grazing show distinct responses to C inputs, depending on clay content. This pattern was not evident for POC as this pool is not directly associated with clay in the model. The model explained 78, 80, and 50% of the variation in TOC, HOC, and POC by increasing C input under cropping ( Figure   4). The relationship was poorer under native and modified grazing (r 2 = 0.54-0.69) (Figure 4). There was a relatively weak and divergent relationship between soil organic C stocks and C inputs to the soil under natural environments (r 2 = 0.35-0.40), mostly due to differences in precipitation. We found that soil organic C was more responsive to C inputs at the sites with little annual precipitation (approximately 170 mm).
The TOC, POC and HOC stocks at a new equilibrium responded linearly to changing soil C inputs from the baseline ( Figure   5 increased under all four land uses ( Figure 5). The soil under native grazing was the most vulnerable, as the increase in POC (35%) was proportionally higher than the increase in HOC (59%). Soil under natural environments was the least vulnerable to change because of the smaller increase in POC (26%) relative to HOC (70%).

The controls on the simulated soil organic C change
Climatic variables, particularly temperature and potential evaporation, controlled changes in TOC, POC and HOC ( Figure 6).

265
Clay content had a dominant effect on the changes in HOC because in ROTH C, clay determines the ratio of CO 2 released to HOC formed, during decomposition. Total N, the C:N ratio and pH were important controls for the changes in POC (Figure 6), and might be related to a capacity of the soil to form POC. Cropping affected the changes in POC, possibly because of the cropspecific distribution of C inputs. The controls on POC were similar to those on TOC because their changes were proportional.
The land use in natural environments affected the changes in HOC (Figure 6), suggesting that we need a greater understanding 270 of the potential for C sequestration in low clay content soils in arid climates.

Cropping
Grazing modified Grazing native

Change in C vulnerability
Change in C input (Mg C ha −1 ) Figure 5. Simulated potential changes in total organic C (TOC) in the topsoil (0-0.3 m) following the changes in C input (n = 4,043), consisting of particulate and humus organic C (POC and HOC) (left) and corresponding changes in C vulnerability (right). At each site, baseline C input was multiplied by the factor 0, 0.25, 0.5, 0.75, 1.25, 1.5, 2, 4 and 6 to derive different C input levels. A positive change in the C vulnerability shows increased vulnerability to decomposition, while a negative change indicates more resistance to loss. The error bar represents the model variation within the interquartile range from the median. Figure 6. Importance of the environmental variables that contribute to potential changes in total, particulate and humus C organic C (TOC, POC and HOC) by changing C inputs. Climatic variables, i.e. mean annual temperature (MAT), mean annual total precipitation (MAP) and potential evapotranspiration (PET), are averaged over a period of 1991-2010. CEC is the cation exchange capacity of a soil.

The performance of ROTH C for simulating C change in Australia
We used ROTH C because it requires few parameters, it initialises its main pools with measured C fractions, and it was adjusted to suit Australian conditions (Janik et al., 2002;Skjemstad et al., 2004). Further, the model is in Australia's National

275
Greenhouse Gas Inventory System and the ERF, and so we thought it useful to comply. The climatic and soil property inputs needed to run ROTH C are readily available from public datasets or are relatively easily measured, for instance, with proximal sensors (England and Viscarra Rossel, 2018).
The main soil C pools of ROTH C can be initialised with measured C fractions, there is no need for spinup simulations (i.e. simulations until the model reaches equilibrium), making it possible to run the model at each of the sites across Australia, 280 without additional effort. Further, using measured C fractions in the model allows for correct assignment of the primary pool 15 https://doi.org/10.5194/bg-2020-150 Preprint. Discussion started: 21 July 2020 c Author(s) 2020. CC BY 4.0 License.
structure and the measurements serve as internal verification of the model predictions. In our case, we could empirically assess how well the baseline simulations matched the model's corresponding dynamic pools, which gave us additional confidence that the performance of the was representative of the Australian soils used. Such data-driven model initialisation helps with the selection and site-specific estimation of other 'unknown' model parameters, such as the amount and quality of C inputs.

285
Importantly, it also helps with a more consistent calibration of the model (Aber, 1997;Seidel et al., 2018). The simulations successfully optimised both the amount and the quality of C inputs to maintain the current baseline soil organic C stocks.
The model performed well in soils that are under agriculture (both cropping and grazing), but the simulation under natural environments in semi-arid and arid climates need improving. Together with the relatively large C inputs, required to maintain baseline TOC (9.61 to 17.05 Mg C ha −1 ), this poor performance suggests that the model did not represent well the complex There are few quantitative assessments of soil C dynamics in Australia. Primarily they are for cropping regions (Luo et al., 2014;Lam et al., 2013;Wang et al., 2016), some present local case studies with few data (Hoyle et al., 2013). and some 310 report estimates that are uncertain because of the lack of comprehensive surveys and scarcity in data (Gifford, 2010). Here, we simulated soil organic C at 4,043 sites across Australia to predict changes in C stocks and C composition under four land uses, from a range of plausible changes in C inputs to the soil. With an annual increase of 1 Mg C ha −1 in C inputs, the model predicted the largest potential soil C increase in soil under native grazing (12.07-17.82 Mg C ha −1 ), modified grazing (12.38-16.03 Mg C ha −1 ), cropping (12.19-15.80 Mg C ha −1 ) and natural environments (3.15-4.09 Mg C ha −1 ), respectively.
would become more vulnerable to losses with increases in soil organic C. This indicates the need for a better understanding of C stabilisation mechanisms and management to maintain the proportion of new C sequestered in the soil.
We did not use net primary productivity (NPP) as a proxy for C inputs to the soil. Although large-scale estimates of NPP might be a good proxy for the C inputs in natural environments, they would be inadequate for managed land uses (Haverd et al.,320 2013). To derive estimates of NPP for managed land uses, such as croplands, one needs fine spatial resolution land cover data with crop-specific information (Li et al., 2014;Turner et al., 2006), which are not readily available continentally. Large-scale (global, continental or regional) estimates of NPP, such as those available from coarser resolution remote sensing, would not be suitable for agricultural environments, also because depending on the method used to derive NPP, the estimates would be largely uncertain (Roxburgh et al., 2005;Ciais et al., 2010). Therefore, using NPP as an estimate of C inputs for all four land 325 uses would have resulted in more uncertainties in our simulation. We thought it important to maintain a consistent approach for deriving the C inputs, so we used a plausible range of values to represent the C inputs under all four land use classes. The range of C inputs that we used are representative of values that might be expected from management practices that enhance rates of primary production and C input to the soil, including manure addition (Lal, 2016;Paustian et al., 2019). Our results suggest that the baseline rate of C inputs into the active POC and HOC pools is site-specific, and managing its rate locally is 330 needed to avoid soil C loss from land-use change. Importantly, these estimates of C inputs are useful to locate soils where C capture is possible under limited availability of water resources and nutrients (Baldock et al., 2012).
The long-term changes in organic C are primarily determined by the C inputs into the soil, and the sensitivity of the change can be affected by local conditions. The results from the empirical modelling suggest that predictions might improve if we can modify the environmental effects on decomposition, separately for each of the pools. For example, clay content did not 335 importantly affect the changes in POC, but it did affect changes in HOC. In contrast, other studies have shown that clay has a direct effect on both C inputs and the C pools in Australian soils (Krull et al., 2003;Luo et al., 2017). Of course, this might be due to the inability of the model to simulate textural controls on POC. Total N and the C:N ratio contribute more to the changes in POC than in HOC and POC seems to be also affected by pH and more under cropping. These results suggest the difficulty that ROTH C has in simulating the more labile POC dynamics and the need to represent such additional environmental factors 340 to better explain the changes.

Simulating soil C dynamics under a framework
There is a functional disconnection between measurements, data and biogeochemical models , but by simulating under a framework, like we did here, we can bridge that disconnect. A framework provides a standardised and consistent approach for organising and processing input datasets from different sources, to facilitate calibration, verification and 345 prediction at an appropriate scale and resolution, depending on the study. The input data may originate from field or laboratory measurements, remote sensing, digital soil maps or other data from various sources. Under a framework soil C simulations can be more versatile. They can be performed on points, areas or pixels, even when few, or no site-specific data are available. In the latter case, by using fine spatial resolution information (Viscarra Rossel et al., 2014, or like we have shown here, one can use measurements together with continental-scale datasets and processes them consistently for the simulations.

350
aggregated over space and time, depending on the data and the need. For example, if the need is to run the simulations over a large-scale and over grids, finer resolution data e.g. soil property data, will require aggregation to match the coarser resolution of the simulation unit. Similarly, re-classification of categorical data, e.g. land-use data, may be performed, like we have done here, to set the spatial extent of the simulations. We used the model ROTH C, however, by simulating under a framework one could accommodate other soil C models, with only small changes to the workflow (Figure 1). This versatility is essential for 360 extending our theoretical understanding of C cycling and its response to human-induced and environmental change at appropriate scales (Grunwald et al., 2011;Metting et al., 2001). Of course, with other multi-pool C models, it will be important to explore further the initialisation requirements and the baseline state for the simulations. The reason is that each soil C pool could be at a different state. Other models may also drive decomposition based on different assumptions, e.g. soil enzyme kinetics or microbial growth (Smith et al., 2020).

Future needs
Plant biomass production and subsequent C inputs to the soil are critical determinants of the quantity of organic matter in soil C models. The simulations that we presented predicted the potential of soil C sequestration in response to changing C inputs under the main land uses in Australia. However, we will need data on plant growth properties, seasonal biomass data, and residue and grazing management, to better represent management practices under these land uses. Without such datasets, it is 370 difficult to verify the balance between C inputs and the stocks and composition of soil organic C under different land-use and management combinations, except for a few cropping systems (Wang et al., 2016). For the soils under native grazing, we need new research on the specific growing conditions of plants (e.g. nutrient availability) and how they affect the amount and timing of C inputs.
The machine learning modelling could identify some other contributing factors for changes in soil organic C and determine 375 their relative importance. Although there is no clear mechanistic understanding gained from those analyses, some of those variables are important predictors of soil C sequestration, and they might need accounting in future model development. In practice, statistical modelling can be included in the framework to help identify the balance of C flows between the soil, plant, and atmosphere at the scale of interest. However, research to combine mechanistic and statistical modelling is still at an early stage, and more research is needed to connect data with models (O'Rourke et al., 2015;Vereecken et al., 2016), in 380 a consistent manner and across scales, for example, Viscarra Rossel et al. (2019). With new measurements and subsequently growing datasets, we expect to identify new processes and controls from statistical modelling and to further account for these in the models within the framework.