Evaluation of soil carbon simulation in CMIP6 Earth System Models

. The response of soil carbon represents one of the key uncertainties in future climate change. The ability of Earth System Models (ESMs) to simulate present day soil carbon is therefore vital for reliable projections. In this study the most up-to-date CMIP6 ESMs are evaluated against empirical datasets to assess the ability of each model to simulate soil carbon and related controls: Net Primary Productivity (NPP) and soil carbon turnover time ( τ s ). Comparing CMIP6 with CMIP5, uncertainties in modelled soil carbon remain, particularly the underestimation of northern high latitude soil carbon stocks. 5 There is a robust improvement in the simulation of NPP in CMIP6 compared with CMIP5, however the same improvements are not seen in the simulation of τ s . These results suggest a greater emphasis is required on improving the representation of below-ground soil processes in future developments of models. These improvements would help reduce the uncertainty of projected carbon release from global soils under climate change and to increase confidence in the carbon budgets associated with different levels of global warming.


Introduction
Soil carbon is the Earth's largest terrestrial carbon store, with a magnitude of two to three times the amount of carbon contained within the atmosphere (Jackson et al., 2017). The response of soil carbon to CO 2 induced global warming has the potential to provide a significant feedback on climate change, but this feedback is currently poorly known (Friedlingstein et al., 2006;Gregory et al., 2009;Arora et al., 2013;Friedlingstein et al., 2014;Arora et al., 2020;Song et al., 2021). by the Arrhenius equation and then 'hill' functions. Similarly, the most common representation of the sensitivity of soil to moisture in CMIP6 is a monotonically increasing function, followed by 'hill' functions of various sorts.

Defining soil carbon variables
CMIP defines common output variables (Meehl et al., 2000), which allows for consistent comparison between the models, and 95 for cleaner evaluation of models to observational data. These common output variables also allow for consistent comparison between model generations, in this case between CMIP6 and CMIP5. This study focuses on evaluation of near present day soil carbon and related controls. Therefore the results presented in this study use the CMIP standard historical simulation (CMIP scenario historical), for both the CMIP6 and CMIP5 analysis. The historical simulation runs from 1850-2015 in CMIP6 and from 1850-2005 in CMIP5, where the selected dates for each variable (stated below) were chosen to allow for consistent 100 comparison between CMIP5 and CMIP6, and to best match the modelled data to the observational data.
To evaluate soil carbon, this study uses 'Soil Carbon' (CMIP variable cSoil) which represents the carbon stored in soils, and where applicable 'Litter Carbon' (CMIP variable cLitter) which represents carbon stored in the vegetation litter. Total soil carbon, C s , is defined to be the sum of these soil carbon and litter carbon variables (cSoil + cLitter), where applicable. This allows for a more consistent comparison between the models and between the models and empirical data, due to differences 105 in how soil carbon and litter carbon are simulated (Todd-Brown et al., 2013;Arora et al., 2020). Where for models that do not report a separate litter carbon pool, the total soil carbon is taken to be simply the cSoil variable. Modelled C s is time averaged between the years 1950 to 2000 of the historical simulation, and is considered spatially (units of kg m 2 ), and as global totals (units of PgC), where global totals are calculated as an area weighted sum (using the model land surface fraction, CMIP variable sftlf ) and divided by 1×10 12 ). To calculate northern latitude totals, a sum between the latitudes 60 • N and 90 • 110 N is considered.
The CMIP6 models CESM2 and NorESM2-LM have two different variables to represent soil carbon: CMIP variable cSoil, which represents the full vertical soil profile, and CMIP variable cSoilAbove1m, which represents soil carbon in the top 1m of soil. This is due to the representation of vertically resolved soil carbon in these models, which means there are separate carbon pools in the model that represent different soil depths (Lawrence et al., 2019). The CMIP variable cSoilAbove1m is 115 used throughout this study to represent soil carbon for the models CESM2 and NorESM2-LM, unless otherwise stated. The use of this variable is to enable a more consistent comparison with both the other CMIP6 models and the CMIP5 models.
Therefore, an assumption of a 1m depth of soil for modelled soil carbon allows for the fairest evaluation, and evaluation is considered against empirical datasets down to a depth of 1m (see below). However, comparisons with the cSoil variable for both CESM2 and NorESM2-LM are included in Tables 4 and 6 of the Results. 120 In order to obtain a clean separation between above-ground and below-ground drivers of soil carbon variations, a quasiequilibrium approximation is made. We begin with the definition of the effective soil carbon turnover time (τ s ) (Varney et al., 2020;Koven et al., 2017;Carvalhais et al., 2014), which represents the average time carbon resides in the soil: 4 https://doi.org/10.5194/bg-2022-13 Preprint. Discussion started: 24 January 2022 c Author(s) 2022. CC BY 4.0 License.
where R h is the output flux of carbon from the soil, known as the heterotrophic respiration. This definition of the turnover time implicitly neglects other processes that may release soil carbon, but which are not yet routinely included in ESMs (e.g. peat fires or dissolved organic carbon fluxes).
In an unperturbed steady-state (i.e. neglecting disturbances from land-use change, fires, insect outbreaks etc.), there is no net exchange of carbon between land and atmosphere, and R h is equal to the Net Primary Productivity (NPP, Π N ). In the contemporary period, the difference between Π N and R h represents the Net Ecosystem Productivity (NEP, ≈ 3 PgC yr −1 ), 130 which is small compared to the Π N and R h fluxes (≈ 60 PgC yr −1 ). Therefore the present day soil carbon can be approximated by: which gives a clean separation of soil carbon variation into the above (NPP) and below (τ s ) ground drivers of soil carbon spatial patterns.

135
To evaluate these soil carbon controls on C s,eq , NPP and τ s are evaluated separately. This study uses modelled 'Net Primary Productivity' (CMIP variable npp), which is defined as the mass flux of carbon out of the atmosphere due to NPP on land. NPP is also considered spatially (kg m 2 yr −1 ), and as an area weighted global total flux (PgC yr −1 ). By definition τ s is defined by Eq. 1, so τ s is calculated by soil carbon (as defined above) divided by R h . R h , 'Heterotrophic Respiration' (CMIP variable rh), is defined as the mass flux of carbon into the atmosphere due to heterotrophic respiration on land, primarily due to the microbial 140 respiration that occurs in the soil, and where the units of R h are the same as that of NPP. The carbon fluxes (NPP and R h ) are time averaged over the period 1995-2005 for consistency between the CMIP generations and to match the empirical datasets. τ s can be considered on a spatial level, or as an effective global τ s , which is defined as average τ s =mean(C s )/mean(R h ) (where the mean represents an area weighted global average). The advantage of defining an effective global τ s is that it is not dominated by large spatial outlying values. Using either method, the units for τ s are in years by definition.

145
The relationships of soil carbon, C s , NPP and τ s , with both temperature and soil moisture are also considered. For temper-  Koven et al. (2017), and Todd-Brown et al. (2013). This dataset is referred to here as the 'Benchmark dataset'.
We use the 95% confidence intervals given by Todd-Brown et al. (2013), to derive standard deviations about the global mean soil carbon. To do this, the constructed 95% confidence intervals where used to calculate upper and lower bounds around 165 the mean value. Then assuming the data is normally distributed, these derived 95% confidence intervals were halved to obtain confidence intervals equivalent to a standard deviation error on the mean (1412 ± 215 PgC). The uncertainty analysis completed in Todd-Brown et al. (2013) is used for the benchmark soil carbon dataset as no quantitative uncertainty has been previously or since defined for the HWSD and NCSCD datasets (Anav et al., 2013).
Additionally, the benchmark dataset was compared with empirical estimates found in the literature to improve the robustness 170 and reliability of the evaluation. Todd-Brown et al. (2013) find that this derived uncertainty is consistent with other empirical estimates of global soil carbon; for example, 1576 PgC in Eswaran et al. (1993), 1220 PgC in Sombroek et al. (1993), and 1502 PgC in Jobbágy and Jackson (2000). This study further compares with empirical estimates of 1395 PgC in Post et al. (1982) and 1515 PgC in Raich and Schlesinger (1992). These empirical estimates are with one standard deviations of the global mean soil carbon given by the benchmark dataset (Table 3).

175
These additional datasets include: (1) the World Inventory of Soil property Estimates (WISE30sec) dataset down to a depth of 2m (Batjes, 2016), which includes a given standard deviation on the global total soil carbon consistent with our derived benchmark uncertainty, (2) the named 'S2017' from Sanderman et al. (2017) (Harper et al., 2018), models can be further evaluated against using a benchmark dataset for Heterotrophic respiration (R h ), where R h is estimated using the CARDAMOM (2001-2010) heterotrophic respiration dataset (Bloom et al., 2015). The empirical CARDAMOM R h has associated estimates of error, which were used to derive a standard deviation 200 uncertainty on the empirical average R h (51.7 ± 21.8 PgC yr −1 ). This study includes map plots comparing the two empirical datasets, which is shown in Fig. A2. Global totals for R h are also considered for comparison against NPP, where the CMIP6 and CMIP5 values are also shown in Appendix Tables A1 and A2, respectively.

Soil carbon turnover time
To estimate a benchmark soil carbon turnover time (τ s ), the estimates of observational soil carbon are divided by an estimate 205 of heterotrophic respiration (R h ) (see above). To estimate an uncertainty on effective global τ s , this study derived upper (τ + s ) and lower (τ − s ) bounds based on the derived C s and R h uncertainty estimates. The upper bound was calculated using the following: τ + s = C + s / R − h , where C + s is equal to the mean soil carbon plus one standard deviation and R − h is equal to the mean heterotrophic respiration minus one standard deviation. The lower bound was calculated using the following: τ − s = C − s / R + h , where similarly C − s is equal to the mean soil carbon minus one standard deviation and R + h is equal to the mean 210 heterotrophic respiration plus one standard deviation. This method gives a large uncertainty bound around the derived mean estimate (27.0 +27 −11 yr), so the benchmark data is further compared to empirical estimates. Raich and Schlesinger (1992) derive an estimate of mean soil carbon turnover of 32 yr, using estimates for mean soil carbon pools and mean soil respiration rates.
More recently, Carvalhais et al. (2014) derive an estimate for the mean global ecosystem carbon turnover time of 23 +7 −4 , which is a spatially explicit and observation based estimate. Ito et al. (2020) derived an observational uncertainty range on soil carbon 215 turnover time of 18.5 to 45.8 years, which was derived using similar empirical estimates found in the literature. These estimates give more certainty on the values closer to the derived empirical mean value for τ s . Change Initiative soil moisture, and estimates global surface soil moisture from a large set of satellite sensors (Copernicus Climate Change Service, 2021; Liu et al., 2011Liu et al., , 2012Wagner et al., 2012;Gruber et al., 2017;Dorigo et al., 2017). The WFDEI Meteorological Forcing dataset is used to represent observational air temperatures (1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005) (Weedon et al., 2014), where dates are choosen to allow for consistency between CMIP generations. This study includes no uncertainty analysis on the soil moisture and air temperature empirical datasets as these datasets are only used to evaluate spatial correlations with 225 modelled data and not to evaluate soil moisture and air temperature in the models.

Regridding
To allow direct comparisons between the empirical data and model output data, the model data was regridded to match the observational grid. In this case, the observational grid is a 0.5 • by 0.5 • resolution, 720 longitude and 360 latitude grid. This was done using Iris -the community-driven Python package for analysing and visualising Earth science data (Met Office, 2010 230 -2013). The regidding method assumed conservation of mass and used linear extrapolation, where extrapolation points will be calculated by extending the gradient of the closest two points. Moreover, model land masks are used to calculate the fraction of land in each coastal grid cell.

Statistical analysis
It is difficult to evaluate the spatial distributions of modelled soil carbon and related spatial controls against empirical data 235 with a single metric, so the evaluation for both CMIP6 and CMIP5 involves multiple methods. These include: coefficients of variation, spatial standard deviations, spatial Pearson correlation coefficients and Root Mean Square Errors (RMSEs). These methods can be combined to give a more thorough evaluation of spatial soil carbon and associated controls in the CMIP6 models compared to the previous generation of CMIP5 models.
The coefficient of variation is defined as the ratio of the ensemble standard deviation (std) to the ensemble mean in each 240 grid cell. This is used to show the amount of variability amongst the models in the ensemble scaled to the size of the ensemble mean, so represents the variability spatially in the ensemble and shows how much variation is present across the ensemble in specific regions. It is presented as hatching on map figures, where shaded 'hatched' regions show regions of high variability within the ensemble. These regions show areas where there is disagreement in the ensemble as there is large spread compared with the mean, and is defined as where std/mean > 0.75. The regions where spatial C s < 5 kg m 2 are discounted as the low 245 values of soil carbon discounts the significance of disagreement in these regions.
The spatial standard deviation is a measure of the spread in the data across the globe compared to the mean value. Pearson correlation coefficients (r-values) are used as a spatial measure of the linear correlation between the empirical and modelled data, where a high r-value (near 1 or -1) represents a high correlation in the data and a low r-value (near 0) represents a negligible correlation. Root Mean Square Error (RMSE) is used as an absolute measure of the difference between the modelled 250 data and empirical data, where the lower the value the lower the difference error. The RMSE can be considered as the standard deviation of the difference, and it is a measure to show the deviation of the modelled data in relation to the empirical data.
This statistical data: spatial standard deviations, Pearson correlation coefficients, and RMSEs, can be presented using a Taylor diagram. A Taylor diagram is a mathematical graph used to indicate the performance of a model compared with a benchmark, which in this case is the empirical datasets (Taylor, 2001).  Table 4). The global total soil carbon for two (CanESM5 and MIROC-ES2L) 260 out of the eleven CMIP6 models falls within the benchmark soil carbon uncertainty range, 1197 -1627 PgC (mean ± stand deviation). The models with the largest global total soil carbon are CNRM-ESM2-1 (1810 PgC), BCC-CSM2-MR (1770 PgC), and UKESM1-0-LL (1760 PgC), values greater than the benchmark dataset but not the additional empirical datasets ( Table 3).
The models GFDL-ESM4 (516 PgC) and IPSL-CM6A-LR (639 PgC) have the lowest global total soil carbon values in the ensemble, with global totals significantly lower (approximately 50% less) than the global totals seen in empirical data. It is 265 noted that if the full soil carbon profile is considered for CESM2 and NorESM2-LM opposed to a depth of 1m, the global total soil carbon values are increased to 1870 PgC from 991 PgC in CESM2, and to 2430 PgC from 969 PgC in NorESM2-LM.
The ensemble mean global total soil carbon is found to have reduced in CMIP6 from CMIP5. Table 4 includes the CMIP6 ensemble mean global total soil carbon, where a total of 1206 ± 445 PgC is deduced, using regridded model resolutions (see methods). It is noted that Ito (2011) state a CMIP6 ensemble of 1553 ± 672 PgC, however the full soil carbon profile is consid-270 ered for CESM2 and NorESM2-LM, opposed to a depth of 1m considered in this study. Table 5  Despite a reduction in ensemble mean global total soil carbon in CMIP6 compared with CMIP5, the CMIP6 ensemble value remains within the benchmark uncertainty range. However, a significant reduction is seen in the associated standard deviation of the ensemble mean global totals (± 445 PgC in CMIP6 from ± 810 PgC in CMIP5). Moreover, a reduced range of global 280 total values is seen in CMIP6 compared with CMIP5, where a range of 1294 PgC is seen in CMIP6 opposed to 2493 PgC in CMIP5. This suggests that although a significant range in global soil carbon still exists amongst the CMIP6 ESMs, there is an improved consistency between the models seen in CMIP6 compared with the models in CMIP5. It is found from comparing the previous generation models in CMIP5 with the updated CMIP6 equivalent, that multiple models in CMIP6 have lower quantities of soil carbon than in CMIP5, such as: GFDL-ESM4 from GFDL-ESM2G, IPSL-CM6A-LR from IPSL-CM5A-LR, is reported to have the largest soil carbon magnitude amongst the CMIP5 models, with a global total of 3000 PgC (Table 5), whereas the updated CMIP6 model MPI-ESM1.2-LR has a reduced global total soil carbon value of 970 PgC, amongst the lowest values reported in CMIP6 and below observational derived range (Table 4). Conversely, these reductions are negated in the ensemble mean by the remaining models which have greater quantities of soil carbon in CMIP6 compared to their CMIP5 290 equivalent, such as CanESM5 from CanESM2, CESM2 from CCSM4, NorESM2-LM from NorESM1-M and UKESM1-0-LL from HadGEM2-ES. For example, the CMIP5 model NorESM1-M is amongst the lowest soil carbon values presented in this ensemble at 538 PgC (Table 5), whereas the updated CMIP6 model NorESM2-LM has an increased global total of 969 PgC (down to 1m) ( Table 1).

Northern latitude total evaluation
295 Northern latitude soil carbon (down to a depth of 1m, and where northern latitudes defined as 60 • N -90 • N) is found to be underestimated in CMIP6, with eight out of the eleven CMIP6 models having lower northern latitude soil carbon values than the derived observational range (Table 4). Two out of eleven CMIP6 models (CNRM-ESM2-1 and MIROC-ES2L) have northern latitude totals that fall within the uncertainty range derived from the benchmark data, 318 -484 PgC (mean ± stand deviation).
The CMIP6 models with the greatest northern latitude total soil carbon are BCC-CSM2-MR (575 PgC The northern latitude soil carbon total was also underestimated in CMIP5, with six out of the ten CMIP5 models estimating northern latitude totals lower then the empirical estimates ( Table 5). The ensemble mean total northern latitude soil carbon 305 is lower in CMIP6 (266 ± 139 PgC seen in Table 4) than in CMIP5 (318 ± 246 PgC seen in Table 5), which is consistent with the global total results, however both the CMIP5 and CMIP6 mean values fall below the benchmark range. Similarly, as with global soil carbon, a smaller standard deviation on the mean is found for CMIP6 compared with CMIP5. Moreover, there is a reduced range in simulated northern latitude total values amongst the CMIP6 models, where despite a large range seen (66 to 575 PgC), an even greater range is seen in CMIP5 (28.1 to 742 PgC). Moreover, improvements are seen amongst 310 models from CMIP5 to CMIP6. For example, the CMIP5 model NorESM1-M had a northern latitude total soil carbon value of 31.0 PgC, which is significantly lower than what is expected based on the benchmark dataset (Table 5). However, the updated CMIP6 version of this model, NorESM2-LM, has a northern latitude total soil carbon value of 300 PgC, which is much more in line with the expected observational values (Table 4). An improved representation of northern latitude soil carbon is also seen CESM2 (compared with CCSM4), which has the same land surface model as NorESM2-LM (CLM5 (Lawrence et al.,315 2019)).
The CMIP6 models with the lowest global total values for soil carbon do not always correspond with the lowest northern latitude values for soil carbon. For example, UKESM1-0-LL global total soil carbon is amongst the highest global totals seen in CMIP6, however low quantities of soil carbon are seen in the northern latitudes (approximately 10% of the global total). in the northern latitude region, which is consistent with the ratio seen in the benchmark dataset. This result suggests that representing global total soil carbon stocks consistent with the benchmark soil carbon, does not imply the consistency in the representation of northern latitude soil carbon stocks, and these should be evaluated separately. However, the large uncertainties associated with the empirical datasets for the northern latitudes are noted (Table 3).

Spatial evaluation 325
A lack of consistency in the simulation of soil carbon was found amongst the CMIP5 models, which can be seen in Fig.   1(a), where differences between the empirical and modelled data is shown. Northern latitude soil carbon was found to be underestimated in CMIP5, where areas of blue can be seen in the northern latitudes of the CMIP5 soil carbon map in Fig.   1(a). This underestimation of CMIP5 northern latitude soil carbon is accompanied by significant overestimations seen in midlatitude soil carbon. Specifically, large quantities of soil carbon which are inconsistent with our benchmark dataset can be 330 seen in the mid-latitude regions in the following CMIP5 models: CanESM2, GFDL-ESM2G, GISS-E2-R, MIROC-ESM, and MPI-ESM-LR, and less significant overestimations are seen in HadGEM2-ES and IPSL-CM5A-LR (Fig. A3). Systematic errors remain in the CMIP6 models, however there are some improvements seen in the spatial simulation of soil carbon from CMIP5. Soil carbon is still underestimated in the northern latitudes, where the areas of blue still remain the northern latitudes of the CMIP6 soil carbon map in Fig. 1(a). However, regions of overestimations in the northern latitudes are also seen 335 amongst the CMIP6 models in BCC-CSM2-MR, CESM2, CNRM-ESM2-1, and NorESM2-LM (Fig. 2), but it is noted that this representation might be more consistent with observations if a dataset including deeper soil carbon stocks was considered.
CMIP6 shows improvements in the representation of mid-latitude soil carbon, where less of an overestimation is seen in CMIP6 compared with CMIP5 ( Fig. 1(a)). This overestimation can still be seen in four of the eleven CMIP6 models: ACCESS-ESM1.5, CanESM2, MIROC-ES2L and UKESM1-0-LL, however the overestimations in CMIP6 are less inconsistent than 340 when compared with CMIP5 and the number of models showing this limitation in CMIP6 has been reduced (Fig. 2).
Despite the differences seen in the spatial representation of soil carbon between the individual models in CMIP6, the ensemble mean has more areas of agreement within the ensemble compared to the ensemble mean in CMIP5. This can be seen in Fig.   3(a), where there is less hatching (where hatched shaded areas represent regions of low agreement amongst the models in the ensemble, see methods) in the CMIP6 map compared with the CMIP5 map. Specifically, ensemble mean soil carbon in CMIP6 345 has more areas of agreement in the mid-latitude region compared with the CMIP5 ensemble mean, where significant areas of disagreement are seen. This disagreement is likely due to the overestimation which exists in some of the CMIP5 models ( Fig. A3). Also, a reduction in the area of disagreement is seen in the northern latitudes in CMIP6 compared with CMIP5, however this remains the region where the most disagreement exists across the generations. It is noted that this is a measure of agreement within the ensemble, and not between the models and empirical data.

350
The inconsistency of the simulation of spatial soil carbon in CMIP6 is further evaluated using the spatial standard deviations, the spatial Pearson correlation coefficients and Root Mean Square Errors (RMSEs) (see Methods). The Taylor Diagram ( Fig.   4(a)) presents all three statistical assessments. The spatial standard deviation for soil carbon is shown on the radial axis between standard x and y axes in Fig. 4(a). The range of spatial standard deviations amongst the CMIP6 models sees a slight reduction from the range amongst the CMIP5 models, though significant differences remain. The CMIP6 models CNRM-ESM2-1, 355 MIROC-ES2L and UKESM1-0-LL best match the spatial standard deviation derived from the benchmark dataset (Tables 4   and 5). It is found that the spatial representation of modelled soil carbon in CMIP6 is poorly correlated to the empirical soil carbon, where the CMIP6 ensemble spatial correlation coefficient with the empirical data is found to be 0.250. The spatial correlation coefficients between the individual CMIP6 and CMIP5 models with the empirical data can also be seen in Fig.   4(a), where the low spatial correlation coefficients are shown by the curved correlation axis. The lowest spatial correlation 360 coefficients amongst the CMIP6 models were r-values of 0.104 in IPSL-CM6A-LR and 0.115 in UKESM1-0-LL. The CMIP6 model that was the most spatially consistent with the empirical data is CNRM-ESM2-1, with an r-value of 0.630. The CMIP6 ensemble sees a slight reduction in the RMSE compared to the CMIP5 ensemble, suggesting a slight improvement ( Fig. 5(a)).
Significant improvements in the RMSE are seen in MIROC-ES2L from MIROC-ESM and MPI-ESM1.2-LR from MPI-ESM-LR. These results suggest small improvements in the simulation of soil carbon across this CMIP generation, however the 365 low spatial correlation coefficients and variable RMSEs seen across the models in CMIP6 suggest inconsistencies with the benchmark data remains.
3.2 Net Primary Productivity: improved in CMIP6 relative to CMIP5

Global total evaluation
Global total NPP amongst the CMIP6 models is consistent with the benchmark dataset (Table 6), where the CMIP6 ensemble 370 mean for NPP is approximately 95% of the benchmark mean. The CMIP6 ensemble mean global total NPP (53.0 ± 9.39 PgC yr −1 ) is found to be slightly lower than the derived mean benchmark value, however it is comfortably within the observational uncertainty range (56.6 ± 14.3 PgC yr −1 ). The equivalent values for the CMIP5 models can be seen in Table 7, where the CMIP5 ensemble total is also found to be within the observational uncertainty range (56.3 ± 15.4 PgC yr −1 ). The standard deviation surrounding the CMIP5 ensemble mean is greater than in CMIP6. This reduced uncertainty in CMIP6 is because 375 several of the models have a simulated global total NPP that more closely matches the benchmark NPP global total value compared with the previous CMIP5 generation. For example, GFDL-ESM4 from GFDL-ESM2G, IPSL-CM6A-LR from IPSL-CM5A-LR, MIROC-ES2L from MIROC-ESM, MPI-ESM1.2-LR from MPI-ESM1-M, and UKESM1-0-LL from HadGEM2-ES. The majority of CMIP6 models see a reduction in NPP from the CMIP5 equivalent model, which in general reduces the overestimation of NPP that was seen in the CMIP5 models (Table 7 and 6). However, is was not the case for CanESM5 from 380 CanESM2 which sees an increase in the magnitude of NPP from CMIP5 to CMIP6, resulting in a consequent overestimation compared to the benchmark data. A reduced range of modelled global total NPP values is also seen in CMIP6 from CMIP5, where the range is reduced from 48.5 PgC yr −1 in CMIP5 to 32.7 PgC yr −1 in CMIP6. These results suggest that overall the representation of carbon fluxes in CMIP6 ESMs is more consistent than in CMIP5.

Spatial evaluation
Modelled NPP in CMIP6 is spatially more consistent with empirical data than CMIP5. This can be seen in Fig. 1(b), where the difference between the modelled and benchmark NPP is shown for both CMIP5 and CMIP6. It can be seen in the CMIP5 map that NPP is overestimated in the tropical regions, specifically in Africa and South East Asia, and the equivalent CMIP6 difference map shows a clear reduction in this overestimation. This tropical overestimation of NPP prominent in CMIP5 (Fig.   A4), is still seen in the CMIP6 models CanESM5, MPI-ESM1.2-LR and UKESM1-0-LL. However, this is not seen in the 390 CMIP6 ensemble mean as it is likley negated by underestimations seen in CESM2, CNRM-ESM2-1, and NorESM2-LM (Fig.   6). CMIP6 also sees more consistency with the benchmark dataset in the northern and mid-latitudes compared with CMIP5, where more white areas are seen in the CMIP6 map in Fig. 1(b). An underestimation of NPP is seen in both CMIP5 and CMIP6 on the west side of South America, though unusually high NPP is seen in this region in the MODIS NPP dataset (Fig. A2).
Moreover, greater agreement amongst the models within CMIP6 is seen compared the models in CMIP5. This can be seen in 395 Fig. 3(b), where less hatching representing areas of disagreement within the ensemble is seen in the CMIP6 compared with CMIP5. Specifically, CMIP6 sees less hatching in the northern latitudes, the Middle East and South East Europe, as well as regions in South America, South Africa and Australia.
The improved empirical consistency of modelled NPP in CMIP6 is also found when further evaluated using the same spatial metrics as with soil carbon. Despite a small range remaining in the spatial standard deviations amongst the CMIP6 400 models (shown by the radial axis in Fig. 4(b)), robust improvements in the spatial correlation coefficients (shown by the curved axis in Fig. 4(b)) and RMSEs are seen across the ensemble compared with CMIP5 ( Fig. 5(b)). Notable improvements in the representation of NPP are seen in GFDL-ESM4 compared with GFDL-ESM2G, IPSL-CM6A-LR compared with IPSL-CM5A-LR, and UKESM1-0-LL compared with HadGEM2-ES, with reduced RMSEs seen in each updated model. A general improvement in the spatial correlation coefficients is seen across all the CMIP6 models, where the circle markers (CMIP6 405 models) in Fig. 4(b), have higher correlation values than the cross markers (CMIP5 models). The general improvement has resulted in the CMIP6 ensemble correlation coefficient (0.836) being greater compared with the equivalent CMIP5 value (0.711). The lowest correlations between modelled and observed NPP amongst the CMIP5 models are GISS-E2-R (0.274) and CanESM2 (0.469). The updated version CanESM5 remains the lowest correlation seen in CMIP6 (0.655), however an improvement in the correlation is seen. The updated version of the GISS model is not included in the CMIP6 ensemble 410 considered in this study, which could be a reason for the increased ensemble mean correlation. However, this effect does not take away from the improvements seen across the CMIP6 models. HadGEM2-ES (0.764) and MPI-ESM-LR (0.764) were the CMIP5 models with the highest correlation to the benchmark NPP, and the updated CMIP6 equivalents of these models remain the models with the greatest correlations, but again improvements in the correlations are seen (0.816 in UKESM1-0-LL and 0.785 MPI-ESM1.2-LR). 3.3 Soil carbon turnover time: no major improvements in CMIP6 compared to CMIP5

Global evaluation
There are minor improvements seen in the simulated effective global τ s amongst select CMIP6 models (Table 6) compared with CMIP5 ( Table 7). The ensemble mean effective global τ s was overestimated in CMIP5 (37.8 ± 19.7 yr) when compared with the derived mean τ s using the benchmark datasets (27.0 +27 −11 yr), which is reduced to a less significant underestimation 420 in CMIP6 (23.3 ± 8.59 yr). However, both the CMIP5 and CMIP6 estimates fall within the observational uncertainty range.
The associated ensemble uncertainty on effective mean τ s is less in CMIP6 compared with CMIP5, with a ensemble standard deviation of approximately 50% less. A significant range is seen in the effective global τ s values amongst the CMIP5 models, with 5 fold difference between the lowest and the highest values (Table 7). This range is mostly due to large overestimations seen amongst the CMIP5 models, for example in CanESM2, GFDL-ESM2G and MIROC-ESM. A reduced range is seen in 425 amongst the models in CMIP6, however a 4 fold range still exists between the lowest and the highest values (Table 6). This where the slowest effective turnover time was seen. Moreover, the range is also reduced due to improvements seen in models 430 which underestimated τ s in CMIP5, such as UKESM1-0-LL from HadGEM2-ES and CESM2 from CCSM4.

Spatial evaluation
The comparison of spatial soil carbon turnover times (τ s ) in CMIP6 with CMIP5 has more varied results than comparing simulated NPP. The CMIP5 ensemble showed an underestimation of τ s in the northern latitudes, which is replaced with an overestimation of τ s in CMIP6 when compared to the benchmark data ( Fig. 1(c)). This northern latitude overestimation in 435 the CMIP6 ensemble is a result of the overestimations of τ s in CESM2 and NorESM2-LM (Fig. 7), which dominate in the CMIP6 ensemble mean. It is noted that this result may differ if deeper soil carbon stocks were considered. The northern latitude underestimation of τ s is still seen within the CMIP6 models, such as CanESM5, CNRM-ESM2-1, GFDL-ESM4, IPSL-CM6A-LR, MIROC-ES2L, MPI-ESM1.2-LR, and UKESM1-0-LL (Fig. 7). An overestimation of mid-latitude τ s was seen in the CMIP5 models MIROC-ESM and MPI-ESM-LR (Fig. A5), which is no longer seen in the updated CMIP6 models 440 MIROC-ES2L and MPI-ESM1-2-LR, respectively. However, an overestimation of mid-latitude τ s is seen in CMIP6 models BCC-CSM2-MR, CNRM-ESM2-1 and UKESM1-0-LL (Fig. 7). The uncertainty in simulated northern latitude τ s is also apparent in Fig. 3(c), where the hatching shows the lack of agreement within the CMIP6 ensemble in this region. However, more agreement within the CMIP6 ensemble is seen in the same figure in the mid-latitudes and tropical regions compared with CMIP5.

445
The simulation of spatial τ s in CMIP6 is further evaluated against the empirical data with the additional statistical metrics.
Modelled τ s is found to be poorly spatially correlated to empirical τ s in both the CMIP5 and CMIP6 models (shown by the curved axis in Fig. 4(c)). A slight increase in the ensemble mean spatial correlations is seen from CMIP5 (0.188) to CMIP6 (0.267), due to increases seen amongst individual models between CMIP5 and CMIP6, such as CESM2 from CCSM4, MPI-ESM1.2-LR from MPI-ESM-LR, and NorESM2-LM from NorESM1-M. However, the consistency of modelled τ s with the 450 benchmark datasets remains low. A particularly large range is seen in the spatial standard deviations of τ s amongst the CMIP6 models, which is an increased range from CMIP5 (shown by the radial axis in Fig. 4(c)). The CMIP6 models with the most extreme overestimations of the spatial standard deviations compared to the derived benchmark value (NorESM2-LM, CESM2, and ACCESS-ESM1.5), are also found to have large RMSEs (Fig. 5(c)). Amongst the remaining CMIP6 models, the RMSEs for modelled τ s remain relatively consistent between CMIP5 and CMIP6.

455
3.4 Drivers of soil carbon spatial patterns: Soil carbon spuriously highly correlated with NPP in CMIP5 and CMIP6

Global drivers
A negligible correlation (≈ 0) is found between the benchmark estimates of soil carbon and NPP, suggesting that soil carbon is not spatially correlated to NPP in the real world. On the other hand, soil carbon and NPP (C s -NPP) were found to be significantly correlated in the models in both CMIP5 and CMIP6. The C s -NPP spatial correlation was found to be greater 460 than 0.5 for six out of the ten CMIP5 ESMs and eight out of the eleven models in CMIP6 (Fig. 8(a)). However, a low spatial correlation is found in the CMIP6 models CESM2 (0.134), NorESM2-LM (0.261), and BCC-CSM2-MR (0.214), values most consistent with the benchmark datasets. The C s -τ s spatial correlations found in the CMIP6 models tend to underestimate the positive correlation seen in the benchmark datasets ( Fig. 8(a)). The majority of CMIP6 models see a negligible or slightly negative C s -τ s spatial correlation, despite a low positive correlation produced by the benchmark datasets. The models BCC-465 CSM2-MR, MIROC-ES2L, and NorESM2-LM are most consistent with the benchmark C s -τ s correlation.
The modelled NPP to temperature (NPP-T) spatial correlations in CMIP6 are consistent with the positive relationship seen in the benchmark datasets, however the magnitude of this positive correlation varies amongst the models ( Fig. 8(b)). The magnitude of the positive NPP-T correlation is underestimated in CanESM5, GFDL-ESM4, and NorESM2-LM, but otherwise relatively consistent amongst the CMIP6 models. Nonetheless, a much greater range in the modelled NPP-T correlations 470 was seen amongst the CMIP5 models, suggesting an improved representation of this relationship in CMIP6. The variation in modelled NPP-θ correlations remains in CMIP6, with models disagreeing in the sign and magnitude of the correlation of NPP to soil moisture. The modelled NPP-θ correlation is the most consistent with the benchmark correlations in GFDL-ESM4, MPI-ESM1.2-LR and UKESM1-0-LL ( Fig. 8(b)).
It is generally agreed across the models in CMIP6 and CMIP5 that τ s and temperature (T) are negatively correlated, with 475 the exception of MPI-ESM1.2-LR where a slight positive correlation is seen (Fig. 8(c)). This is consistent with the negative τ s -T correlation derived in the benchmark dataset. There is variation amongst the models in the magnitude of the negative correlation, with a significant overestimation seen in CanESM5. A negative correlation is also seen in the τ s -θ correlation derived with the benchmark datasets. Inconsistencies with this empirical relationship are seen amongst the models in both CMIP5 and CMIP6, with many negligible and positive correlations deduced (Fig. 8(c)). The exception is again MPI-ESM1.2-

Regional drivers
The spatial correlations of modelled C s -NPP are shown to be overestimated at every latitude in both CMIP6 and CMIP5, compared to the equivalent correlations derived from the empirical datasets. It can be seen that the CMIP6 ensemble mean C s -NPP correlation has an even larger positive bias compared to the benchmark correlation than in CMIP5. The empirical data 485 sees a reduced C s -NPP correlation in the northern latitudes, whereas a slight but less significant reduction is seen in the models ( Fig. 9(a)). The spatial correlation between C s -τ s is shown to vary against latitude in the empirical datasets, where a greater correlation is seen in the tropical and northern latitude regions, and a negligible correlation is seen in the mid-latitudes ( Fig.   9(b)). The CMIP6 models simulate the negligible C s -τ s seen in the mid-latitudes relatively consistently with the benchmark data, where an improved consistency is seen from CMIP5. However, the CMIP6 models do not simulate the tropical and 490 northern latitude positive C s -τ s correlations, where a negligible modelled correlation remains in these regions. CMIP5 is more consistent with the benchmark correlations than in CMIP6, where a positive modelled correlation C s -τ s is seen ( Fig. 9(b)).
The spatial correlation between modelled soil carbon and soil moisture (C s -θ) is consistent with the correlations seen in the benchmark datasets at every latitude, with an improvement seen in the tropical correlation patterns in CMIP6 compared with CMIP5 ( Fig. 9(c)). Both the CMIP5 and CMIP6 ensembles span the benchmark C s -θ correlation, though large model ranges 495 in the C s -θ sensitivity are seen across all latitudes. However, there is a reduced ensemble uncertainty in the C s -θ correlation from CMIP5 to CMIP6 in low and mid latitudes. An overestimation of the negative C s -T correlation seen in the benchmark datasets is present in both the CMIP5 and CMIP6 models, except the high latitudes ( Fig. 9(d)). This modelled C s -T correlation is particularly underestimated in the lower tropical latitudes, where a greater positive correlation is seen here in the benchmark datasets. Fig. 9(d) suggests a slight improvement in the modelled tropical C s -T correlation in CMIP6, and a worsening of 500 modelled C s -T in the high latitudes than in CMIP5, when compared to the C s -T correlations in the benchmark datasets.

Global total soil carbon
Simulating global soil carbon stocks that are consistent with empirical data is required to predict reliable projections of future 505 soil carbon storage and emission (Todd-Brown et al., 2013). Despite a reduced spread in model estimates of global total soil carbon within CMIP6 relative to CMIP5, discrepancies remain in the consistency of these estimates with the observations between the two CMIP generations. This together with the uncertainty associated with empirical datasets has resulted in no robust conclusion being drawn on the improvement of soil carbon simulation in CMIP6 compared to CMIP5. Due to the potential significant feedback that exists between soil carbon and global climate, this lack of consistency may reduce 510 our confidence in future projections of climate change (Friedlingstein et al., 2006;Gregory et al., 2009;Arora et al., 2013;Friedlingstein et al., 2014).

Spatial soil carbon patterns
Spatially, the simulation of soil carbon stocks sees some improvement between the CMIP5 and CMIP6 generations. Modelled soil carbon was found to be poorly spatially correlated with the empirical data amongst models in both CMIP5 and CMIP6 (Fig.   515   4(a)). An improvement is seen on the spatial patterns across the mid-latitudes, which were generally overestimated in CMIP5.
However, significant underestimations of modelled soil carbon in the northern latitudes still remains which has a significant impact on model predictions of global total soil carbon stocks ( Fig. 1(a)). This systematic underestimation was previously reported in the literature as a limitation of the CMIP5 models, where Todd-Brown et al. (2013) found northern latitude soil carbon to be less consistent with the empirical data than on a global scale. This limitation remains amongst models in the 520 CMIP6 generation, where it was found that the majority of CMIP6 models underestimate northern latitude soil carbon stocks regardless of whether or not the global soil carbon stocks are underestimated.
An exception to this northern latitude underestimation is seen within CMIP6 in the models CESM2 and NorESM2-LM.
These ESMs include the Land Surface Model (LSM) CLM5 (Lawrence et al., 2019), which is the first LSM to include the representation of vertically resolved soil carbon in their CMIP simulations. This representation enables the inclusion of separate 525 carbon pools at varying depths in the soil, and allows for an improved simulation of soil carbon stocks (Koven et al., 2013). This is of particular importance in the northern latitudes, where carbon stocks are expected to exist at much greater depths than the 1m considered in this study (Tarnocai et al., 2009;Ran et al., 2021). This can be seen in Table 3, where increased magnitudes of soil carbon stocks are shown when increased depths are considered using the empirical datasets. A more thorough evaluation of soil carbon in both CESM2 and NorESM2-LM is suggested for future research, with a particular focus on this improved 530 northern latitude soil carbon stocks simulation, however this evaluation of deeper soil carbon stocks (below 1m) is beyond the scope of this study.
Accurately simulating soil carbon in the northern latitude regions is of particular importance as it is a major part of the total global soil carbon pool (Jackson et al., 2017). Additionally, much of the carbon stored in these soil is held within permafrost, which is known to be particularly sensitive to climate change. Permafrost thaw under climate change has the potential to release 535 significant amounts of carbon into the atmosphere over a short period of time with increased warming (Schuur et al., 2015;Zimov et al., 2006;Burke et al., 2017;Hugelius et al., 2020), representing a significant feedback within the climate system.

Drivers of soil carbon change
To allow for a more in-depth understanding of the inconsistencies found between modelled and empirical soil carbon, the simulation of above and below ground controls of soil carbon were also evaluated. Simulations of contemporary soil carbon can be disaggregated into the effects of litter fall, which is well approximated by plant Net Primary Productivity (NPP), and 545 effective soil carbon turnover time (τ s ), which is affected by both temperature and moisture of the soil (Koven et al., 2015).
If models are to reliably simulate soil carbon in a way that is consistent with empirical data, the spatial drivers of soil carbon, NPP and τ s , must also be simulated consistently with empirical data. Isolating the effects of NPP and τ s on soil carbon helps us to breakdown the simulation of soil carbon to help understand the limitations and inconsistencies seen amongst the models.

550
A robust improvement in the simulation of NPP is seen in the CMIP6 models compared with the CMIP5 models. This conclusion is deduced by: an increased number of models in CMIP6 have global total NPP values consistent with empirical data (Table 6), the overestimation of tropical NPP amongst CMIP5 models is seen to be reduced amongst the CMIP6 models ( Fig.   1(b)), and more agreement is seen within CMIP6 relative to CMIP5 in the simulation of mid and northern latitude NPP ( Fig.   3(b)). Modelled NPP was found to be robustly more consistent with the empirical data in CMIP6 compared with CMIP5 in 555 all statistical evaluation metrics. Since CMIP5, multiple models have seen an addition of a dynamic nitrogen cycle (Davies- Barnard et al., 2020), where the models with nitrogen cycles are highlighted in Fig. 5 by the shaded bars. The results suggest an improvement in the simulation of NPP with the addition of dynamic nitrogen in models. However, CMIP6 models that do not represent a nitrogen cycle also mostly see improvements in the simulation of NPP, suggesting NPP is more constrained by observations in the most up to date generation of models. CanESM5 is the only model within CMIP6 to not see an overall 560 improvement in the simulation of NPP, where NPP is found to be overestimated compared with the benchmark dataset. It is likely that the inclusion of a nitrogen cycle in this model would limit this overestimated NPP and improve consistency with the observations Exbrayat et al., 2013).
Despite an improved simulation of NPP in CMIP6, the spatial correlation between modelled soil carbon and NPP was found to be inconsistent with the equivalent empirically derived relationship. This result was previously shown for the CMIP5 models 565 (Todd-Brown et al., 2013), and has been more recently shown for the CMIP6 models (Georgiou et al., 2021), both agreeing with the results found here. The majority of CMIP6 models were found to have positive C s -NPP spatial correlations, opposed to a negligible spatial correlation found in the observations ( Fig. 8(a)). Despite NPP driving the spatial pattern of soil carbon stocks, a positive correlation is not expected in the real world due to regions with high soil carbon not correlating with regions of high NPP. For example, in the observational derived data soil carbon stocks are greatest in the northern latitudes due to long 570 turnover times in these regions, whereas NPP is lower due to cold temperatures in these regions limiting vegetation growth.
The three CMIP6 models which did not significantly overestimate this correlation (CESM2, NorESM2-LM, and BCC-CSM2-MR), are three of the models with the most empirically consistent proportion of soil carbon stocks in the northern latitudes.
Conversely the tropical regions see high NPP values, but warmer temperatures result in faster turnover times and lower soil carbon stocks. NPP is expected to increase in the future under climate change (Kimball et al., 1993;Friedlingstein et al., 575 1995;Amthor, 1995), which means an overly positive correlation in models could result in a subsequent increase in modelled projections of soil carbon stocks. An overestimation of future soil carbon storage could result in an overestimation of the future carbon sink and an inaccurate global carbon budget (Todd-Brown et al., 2013;Friedlingstein et al., 2020).

Soil carbon turnover time
The systematic improvements seen in the simulation of NPP in CMIP6 are not seen in the simulation of soil carbon turnover 580 time (τ s ), where the simulation of τ s is found to remain inconsistent with the empirical data in CMIP6. Improvements are seen within CMIP6 relative to CMIP5, such as more agreement within the ensemble in the mid-latitudes and tropical regions, however less agreement is seen in the northern latitudes (Fig. 3(c)). Northern latitude τ s is generally underestimated in models, which corresponds to the underestimation of soil carbon seen in these regions. This has been previously identified in ESMs, where it was found that the underestimation of global τ s amongst the CMIP5 models is primarily due to low values in the 585 northern latitudes (Wu et al., 2018). The reduced agreement in CMIP6 is due to long τ s values existing in the northern latitudes of CESM2 and NorESM2-LM, alongside the general ensemble underestimations (Fig. 7). The increased northern latitude τ s values in CESM2 and NorESM2-LM is likely to be due to the improved representation of soil carbon pools, where vertically resolved soil carbon allows for differential τ s values for pools at varying depths. Despite these individual improvements since CMIP5, large discrepancies exist within the CMIP6 ensemble between modelled and empirical τ s .

590
To simulate τ s consistently with observations, the relationship of τ s to both temperature (T) and moisture (θ) must also be simulated in a way that is consistent with observations. Generally, the τ s -T relationship is consistently simulated, however there is variation in the modelled temperature sensitivity of τ s across the ensemble. The τ s -θ relationship is less consistently represented, where the majority of CMIP6 models do not match the empirically derived relationship. Despite a positive dependence of soil respiration on soil moisture, many of the CMIP6 models display a contradictory positive τ s -θ correlation (Fig. 8). This 595 lack of consistency between the modelled and empirical relationships involving τ s , is likely to be due to key soil processes not being represented. Particularly, a limitation of the τ s -θ relationship in ESMs is the representation of peat not being simulated.
Peat forms in wet areas globally, so to simulate the τ s relationship to soil moisture consistently with empirical data, models must simulate increased, longer turnover times in regions where peat exists. Moreover, to accurately simulate the accumulation of peat in models, the soil column must be vertically resolved to allow for the soil column to grow (Chadburn et al., 2021).

600
These results suggest much of the uncertainty associated with modelled soil carbon stocks can be attributed to the simulation of below ground processes. The improved consistency of NPP with empirical data suggests considerable efforts have been made to achieve an improved representation of above ground processes in CMIP6 ESMs since the release of the CMIP5 ensemble.
However, the same improvements are not seen in the simulation of τ s as systemic limitations remain in the new generation of models, suggesting the same progress on the model development of below ground processes has not been achieved between 605 CMIP5 and CMIP6. Moreover, focus on above ground processes without consideration of below ground processes can result in inconsistencies of soil carbon stocks. For example, the inclusion of a nitrogen cycle has been shown to lead to a reduction in soil carbon in the model, see Fig. 6 in Wiltshire et al. (2021), so tuning of the baseline turnover rates is required to keep soil carbon stocks consistent with observed values.
The required improvement of soil carbon pool turnover rates has previously been identified for the CMIP5 ensemble (Nishina 610 et al., 2014), and more recently, Ito et al. (2020) find that the difference in turnover times amongst the CMIP6 models is responsible for approximately 88% of the variation seen in global soil carbon stocks amongst the models, and state that constraining key parameters which control soil carbon turnover processes is a key area for future model development.
A key development seen in CMIP6 since CMIP5 is the representation of vertically resolved soil carbon. Models which simulate nonvertically resolved soil carbon typically turn over all the carbon based on the temperature near the soil surface. This could lead 615 to reduced quantities of soil carbon and an underestimation of northern latitude soil carbon stocks, due to near surface soil being warmer than the deeper soil, and as turnover is known to respond exponentially to temperature (Davidson and Janssens, 2006). Overall, further improvements in the representation of soil carbon turnover time, with a particular focus on the northern latitudes, is identified as a key area for future model development. 1. The spatial patterns of soil carbon in CMIP6 models are more in agreement with each other than they were in CMIP5, and are more consistent with observations in the mid-latitudes. However, soil carbon is still heavily underestimated in high northern latitudes (with the exception of two CMIP6 models that represent deep soil carbon).

630
2. Overall no significant improvements are seen in the simulation of the observed spatial pattern of soil carbon across the globe from the CMIP5 to the CMIP6 generation.
3. There is good evidence that spatial patterns of contemporary NPP are better simulated in CMIP6 than in CMIP5 generation models, when compared to satellite-derived estimates.
4. However, spatial patterns of τ s continue to be poorly represented in CMIP6 models, in comparison to estimates derived 635 from observational datasets of soil carbon and heterotrophic respiration (R h ).
5. Importantly, soil carbon simulations in both the CMIP5 and CMIP6 ESM generations seem to be spuriously highlycorrelated with NPP, which may make soil carbon in these models over responsive to future projected changes in NPP.
Taken together, these conclusions point to a need for a much greater emphasis on improving the representation of below-ground soil processes in next generation (CMIP7) (Arora et al., 2013;Anav et al., 2013;Friedlingstein et al., 2014). Including temperature and moisture functions presented in Todd-Brown et al.