Evaluating two soil carbon models within the global land surface model JSBACH using surface and spaceborne observations of atmospheric CO 2

. The trajectories of soil carbon in our changing climate are of utmost importance, as soil is a substantial carbon reser-voir with a large potential to impact the atmospheric carbon dioxide ( CO 2 ) burden. Atmospheric CO 2 observations integrate all processes affecting carbon exchange between the surface and the atmosphere and therefore are suitable for carbon cycle model evaluation. In this study, we present a framework for how to use atmospheric CO 2 observations to evaluate two distinct soil carbon models (CBALANCE and YASSO) that are implemented in a global land surface model (JSBACH). We transported 5 the biospheric carbon ﬂuxes obtained by JSBACH using the atmospheric transport model TM5 to obtain atmospheric CO 2 . We then compared these results with surface observations from Global Atmosphere Watch stations as well as with column XCO 2 retrievals from the GOSAT satellite. The seasonal cycles of atmospheric CO 2 estimated by the two different soil models differed. The estimates from the CBALANCE soil model were more in line with the surface observations at low latitudes (0 ◦ N-45 ◦ N) with only 1% bias in the seasonal cycle amplitude, whereas YASSO underestimated the seasonal cycle amplitude 10 in this region by 32%. YASSO, soil temperature. The results underline the importance of using sub-annual data in the development of soil carbon models when they are used in shorter than annual time scales. study we used space-born XCO 2 observations in addition to the surface observations of CO 2 . They were providing a larger-scale conﬁrmation for the results obtained from the surface observations and thus provided complimentary information. The number of satellite observations of column XCO 2 are increasing at a fast pace for example OCO-2 observations started in 560 2014, and they possess high potential for carbon cycle studies (Miller and Michalak, 2020).


Introduction
The terrestrial carbon cycle consists of uptake of CO 2 by vegetation for photosynthesis and release of carbon by plants' autotrophic respiration, soil decomposition by heterotrophic organisms and natural disturbances (Bond-Lamberty et al., 2016). 25 Soils store twice as much carbon as the atmosphere (Scharlemann et al., 2014) and its fate in changing climate remains uncertain (Crowther et al., 2016). For example, while Crowther et al. (2016) concluded from a data-based analysis that large carbon stocks will lose more carbon due to warming conditions, van Gestel et al. (2018) questioned this view with an analysis based on a more comprehensive dataset. To have reliable predictions of future carbon stocks, process-based understanding of the below ground carbon cycle is needed . 30 One way to evaluate soil carbon models has been to use observations of soil carbon stocks (Todd-Brown et al., 2013).
At small scales, rates of gas exchange measured in chambers have also been used (Ťupek et al., 2019), but separation of heterotrophic and autotrophic respiration is laborious (Chemidlin Prévost-Bouré et al., 2010). It is anyhow challenging to find reasons for differences in heterotrophic respiration between large scale models, as the litter input to the soil influences heterotophic respiration and this litter input varies between the models. One way forward is to use a testbed for these models, 35 as done by Wieder et al. (2018).
An alternative, regionally integrated approach is using observations of atmospheric CO 2 , which integrate all processes involved in global surface-atmosphere carbon exchange. The surface observation network of atmospheric CO 2 has been used in benchmarking global carbon cycle models (Cadule et al., 2010;Dalmonech and Zaehle, 2013;Peng et al., 2015). Recent advances of satellite technology have enabled retrievals of space-born dry-air total column-averaged CO 2 mole fraction (XCO 2 ), 40 quantifying CO 2 in the entire atmospheric column between the land surface and the top of the atmosphere. These observations reveal a more spatially integrated CO 2 signal compared to surface site observations and together they provide a complementary dataset. These two data sources have been used together to study the carbon cycle with "top-down" inversion modelling (Crowell et al., 2019). This kind of modelling framework uses atmospheric CO 2 observations to constrain a priori biospheric and ocean fluxes, based on the Bayesian inversion technique, which results in optimized estimates (a posteriori) of the fluxes 45 Rödenbeck et al., 2003;van der Laan-Luijkx et al., 2017;Wang et al., 2019). Estimates for fossil emissions are often assumed as known, i.e., not optimized in the inversion.
In this study we present a framework of how to use atmospheric CO 2 observations to evaluate soil carbon models implemented in a land surface model. We apply this to two state-of-the-art soil carbon models as a "proof-of-concept" for a more universal application. Basile et al. (2020) did similar work within a biogeochemical testbed and concluded that heterotrophic 50 respiration can be a valuable benchmark in carbon cycle studies. They emphasized that the seasonal phasing of heterotrophic respiration relative to the net primary production influences the net ecosystem exchange and therefore potentially introduces bias to atmospheric CO 2 that hampers its use as a benchmark.
To obtain the atmospheric CO 2 profiles from our simulations with the land surface model we applied an atmospheric transport model. In this work we used a three-dimensional atmospheric chemistry transport model TM5 (Krol et al., 2005;Huijnen 55 et al., 2010). Generally, transport models, such as TM5, contain errors caused by, for example, poorly resolved advection and heavily parameterized transport schemes (Gaubert et al., 2019). With TM5 we calculated the column averaged CO 2 that can be used to evaluate model results versus the satellite observations. Also satellite observations can include errors. The uncertainty for GOSAT observations has been estimated to be around 1 to 2 ppm Reuter et al., 2013). Contributors to uncertainties in the retrieval algorithms originate, for example, from the solar radiation database and handling of aerosol 60 scattering . Last, also the column XCO 2 profiles have influences from, for example, advection and global scale gradients driven by weather systems (Keppel-Aleks et al., 2011). A model evaluation performed with the column XCO 2 observations enabled a more thorough study of fluxes and atmospheric physics of a modelling system (Keppel-Aleks et al., 2011).
We use in this work JSBACH, the land surface model of the Max Planck Institute's Earth System Model, one of the models participating in CMIP6. The JSBACH model has two distinct soil models implemented in it (CBALANCE and YASSO). We are interested in seeing if the two soil carbon models lead to markedly different CO 2 signals and to explore which conclusions on model performance and process representation can be drawn that could help to improve this land surface model (and potentially other similar models) and our understanding of the land carbon cycle. The two model versions only differ with respect to the underlying soil processes and do not include major feedbacks between soil and vegetation (apart from a small 70 effect of litter accumulation on fire emissions). Thus, the difference in the release of carbon to the atmosphere originates only from the soil carbon models. The two soil carbon models are both first-order decay models. However, they have different pool structures as well al environmental drivers and have differing response functions. CBALANCE uses soil moisture and soil temperature as driving variables and YASSO precipitation and air temperature. In the analysis we also use a simple box model calculation to further understand the main causes in the different outcomes of the models. Our framework combining a land 75 surface model with a transport model allows us to investigate how these above-mentioned differences in soil carbon models influence atmospheric CO 2 . Specifically, we aim to answer the following questions: -How can we use a land surface model together with a transport model to evaluate soil carbon models and what problems do we face when doing that?
-What is the role of soil carbon stocks, the variables driving their decomposition and the functional dependencies of those 80 variables on modelled heterotrophic respiration at global scale and how does this lead to differences in the atmospheric CO 2 signal?
We used the land surface model JSBACH (Giorgetta et al., 2013) to obtain net land-atmosphere CO 2 exchange and fed that, together with ocean, fossil and land use fluxes, into a transport model, TM5, which simulates resulting atmospheric CO 2 at 85 selected surface sites as well as column integrated values for comparison to satellite derived column CO 2 .
2.1 Model simulations: JSBACH with two soil carbon models JSBACH is the global land surface model of the Max Planck Institute's Earth System Model (Giorgetta et al., 2013), simulating terrestrial carbon, energy and water cycles (Reick et al., 2013). In this study JSBACH was run with two different soil carbon sub-models that are described below. The older model, CBALANCE, has been used in CMIP5 simulations of JSBACH 90 (Giorgetta et al., 2013). The newer model, YASSO, has been used in simulations for the annual global carbon budget (Le Quéré et al., 2015;Le Quéré et al., 2018) and is used in CMIP6 simulations of JSBACH (Mauritsen et al., 2019). It is also used in JSBACH4, a re-implementation of JSBACH for the ICOsahedral Non-hydrostatic Earth system model (ICON-ESM) (Giorgetta et al., 2018;Nabel et al., 2019).
Independent of the sub-model used for soil carbon, JSBACH uses three carbon pools for living vegetation: a wood pool, 95 containing woody parts of plants, and green and reserve pools that contain the non-woody parts. JSBACH simulates different processes that lead to losses from the vegetation pools, such as grazing, shedding of leaves and natural or anthropogenic disturbances. Depending on the process, some of the vegetation carbon is lost as CO 2 into the atmosphere, while the remaining part is transferred as dead vegetation into the litter and soil pools of the sub-model for soil carbon, where it is then subject to the internal processes of the soil carbon sub-model. The only process outside of the soil carbon sub-model that influences dead 100 material is fire, burning parts of above ground litter carbon.
2.1.1 The soil carbon model CBALANCE CBALANCE (CBA) is the original soil carbon sub-model of JSBACH (Raddatz et al., 2007), which has been used in CMIP5.
The environmental drivers for decomposition in CBA are soil temperature (at soil depth of 30 to 120 cm below the surface) and relative soil moisture (α) of the upper-most soil layer, which is 5 cm thick. α varies between zero and one.

105
The function for soil temperature dependence, f CBA,T soil of decomposition follows a Q 10 formulation as

10
(1) with a Q 10 value of 1.8 and T soil as soil temperature in • C (shown in Fig. S1a) (Raddatz et al., 2007). The dependency on relative soil moisture α is linear (Fig. S1b) and it is calculated as where α crit is 0.35 and α min is 0.1 (Knorr, 2000).
Together these functions are modulating the rate of decomposition, so that the heterotrophic respiration (R h ) from each pool where C i is the carbon content of each pool and τ i is the turnover time of each pool in days. CBA uses five different carbon 115 pools having different turnover times: -Two green litter pools: one above-and one below-ground in which the non-woody plant parts decompose with turnover times between 1.8 and 2.5 years (Goll et al., 2015) -Two woody litter pools: one above-and one below-ground in which the woody plant parts are decomposed with turnover times of several decades 120 -One slow pool receiving its input from the four litter pools, having a turnover time in the order of a century.

The soil carbon model YASSO
The original soil carbon model of JSBACH was replaced by YASSO (YAS) (Thum et al., 2011;Goll et al., 2012). JSBACH's YAS implementation is based on the Yasso07 model (Tuomi et al., 2009). Development of Yasso07 relied heavily on litter bag and other observational data sets that were used to estimate model parameters (Tuomi et al., 2009. Owing to its strong 125 connection to experiments, its environmental drivers are quasi-monthly air temperature and precipitation.
The decomposition depends on precipitation P a [m] as where γ = -1.21 m −1 (Fig. S1d). The environmental drivers for YAS (precipitation and air temperature) are averaged for 30-day periods.
Similar to CBA, YAS has slowly and rapidly decomposing pools, but its pool dynamics are more structured. First, all the pools are divided into woody and non-woody materials. The difference in the calculation of the decomposition rates between 135 non-woody and woody pools is an additional parameter that increases the turnover rates of the woody litter, dependent on its size parameter , which is plant functional type (PFT)-dependent.
YAS takes the chemical composition of the incoming litter into account. The incoming litter is divided to different chemical pools according to the PFT-dependent factors. Information for the PFT-dependent factors for the litter decomposition has been derived from observations (Berg, 1991;Berg et al., 1991;Gholz et al., 2000;Trofymow, 1998). YAS uses four chemically 140 distinct pools: acid soluble, water soluble, ethanol soluble and non-soluble. For each of these four chemical compositions one above-and one below-ground pool is used. In addition there is one humus pool (divided to woody and non-woody pools as all the other pools). Dynamics of the YAS carbon pools are described in Tuomi et al. (2009) with decomposition fluxes causing redistributions among the pools or losses to the atmosphere. Each of the pools has a decay constant, which is modified by the environmental dependencies in Eqs. (4) and (5).
2.2 The model simulations: The JSBACH set-up JSBACH model simulations followed the TRENDY v4 protocol in terms of JSBACH version, simulation protocol and forcing data (Le Quéré et al., 2015;Sitch et al., 2015). Climate forcing was based on CRUNCEP v6 (Viovy, 2010) and global atmospheric CO 2 was obtained from ice core and National Oceanic and Atmospheric Administration (NOAA) measurements (Sitch et al., 2015). For each set-up, the model was run to equilibrium, i.e. until the soil carbon pools of the applied carbon 150 sub-model were at steady-state. The two different transient simulations were then done for 1860 to 2014. Anthropogenic land cover change was forced by the LUHv1 dataset (Hurtt et al., 2011) and was simulated as described in Reick et al. (2013). While fire and windthrow were simulated, natural land cover changes and the nitrogen cycle were not activated. Simulations were done at T63 spatial resolution (approximately 1.9 • or 200 km). For further details on the spin-up and the model version please refer to the SI.

The model simulations: TM5
To estimate atmospheric CO 2 , we used the global Eulerian atmospheric transport model TM5 (Krol et al., 2005;Huijnen et al., 2010) in an available pre-existing set-up. TM5 was run globally at 6 • x 4 • (latitude x longitude) resolution with two-way zoom over Europe, where the European domain was run at 1 • x 1 • resolution. The 3-hourly meteorological fields from ECMWF ERA-Interim (Dee et al., 2011) were used as forcing to run TM5. Linear interpolation was done to obtain CO 2 estimates at the 160 exact locations and times of the observations.
We fed TM5 daily biospheric as well as weekly ocean and annual fossil fuel fluxes to obtain realistic atmospheric CO 2 .
Values of gross primary production (GPP) and total ecosystem respiration were taken from the JSBACH simulations for the two different soil model formulations. Also, carbon release from vegetation and soil owing to land-use change, fires and herbivores were taken from the JSBACH model results as part of terrestrial biospheric carbon fluxes. In addition, we used the 165 posterior biospheric flux estimates from CarbonTracker Europe (CTE2016, later referred to as CTE; van der Laan-Luijkx et al. (2017)) to provide some guidance on the ability of TM5 to represent the individual site observations. The ocean fluxes were the a posterior estimates from the same study.  Table S1. Simulations with TM5 were done for 2000-2014.

The surface observations
Surface observations of atmospheric CO 2 from NOAA weekly discrete air samples (ObsPack product: GLOBALVIEWplus 175 v2.1; ObsPack (2016)) were used to evaluate the effect of different soil carbon models on tropospheric CO 2 seasonal cycles at sites around the globe. The sites used in the evaluation are shown in Fig. 1. The uncertainties of NOAA flask air measurements for the period of this study are ±0.07 ppm (with 68% confidence interval). From the data, samples reflective of well-mixed background air were selected (based on flag criteria) similar to van der Laan-Luijkx et al. (2017) to minimize the influence of transport model errors in our analysis.  (Kuze et al., 2009). These data were used to evaluate the different simulations and to assess model performance at larger spatial scale. XCO 2 from the TM5 simulation was calculated using global 4 • x 6 • x 25 (latitude x longitude x vertical levels) daily average 3-dimensional (3-D) atmospheric CO 2 fields. For each 185 satellite retrieval, the global 3-D daily mean gridded atmospheric CO 2 estimates were horizontally interpolated to the location of the retrievals to create the vertical profile of simulated CO 2 . Averaging kernels (AKs) (Rodgers and Connor, 2003) were applied to model estimates to ensure reliable comparison with GOSAT retrievals: whereĈ is XCO 2 , scalar c a is the prior XCO 2 of each retrieval, h is a vertical summation vector, a is an absorber-weighted 190 AK of each retrieval, x is a model profile and x a is the prior profile of the retrieval . The retrievals for different terrestrial TransCom (TC) areas ( Fig. 1) were compared with those calculated from the two model simulations. For comparison with GOSAT XCO 2 , the estimates of 3D fields at 6 • x 4 • resolution were used, but not those from the zoom grids due to technical reasons. Differences in XCO 2 due to model resolution were not significant within the context of this study. In this work GOSAT observations (NIES retrieval V02.21 and V02.31) between July 2009 and the end of 2014 were used. Since the two different model formulations differ only in their soil carbon module formulation, the incoming flux to the ecosystem from photosynthesis is the same in both cases. We analyzed results for 2000-2014, and we show here averaged values for that period. The main target variable of our analysis is understanding the role of heterotrophic respiration, but to better elucidate how it influences the atmospheric CO 2 , we also show net primary production (NPP) and net ecosystem 210 exchange (NEE). NPP is obtained from the gross primary production (GPP) by subtracting autotrophic respiration. NEE is obtained by subtracting from GPP total ecosystem respiration, direct land cover change, fire, harvest and herbivory fluxes, as shown in Table 2.
Even though annual total global values of heterotrophic respiration are close between the different model formulations (Table   2) and September. In the region 10 • S -30 • S YAS has 54% larger amplitude in R h than CBA. The differences in heterotrophic respiration lead to pronounced differences in the NEE within the tropics (Fig. 3e, f).
The variation in R h seasonal dynamics of these two model formulations can be linked to the differences in their environmental drivers and functions. In Table 3  and a high correlation with soil temperature T soil in the northern high latitudes (30 • N-90 • N) and lower, but significant, correlation in southern high latitudes (30 • S-60 • S). For other combinations of regions and drivers the r values are low for CBA and in three regions the dependency between α and R h is negative. In two of these regions with a negative relationship between α and R h (located in high latitudes), the variability of α is quite small and the plot shows high scatter (Fig. S3). The shape of 245 the T soil dependency on the CBA decomposition is exponential, and the relationship is significant, when the range of the T soil values is over 15 degrees, which is larger than what is occurring in the tropics (Fig. S4).
For the YAS model, on the other hand, R h shows strong correlation to its environmental drivers (Table 3) (Table 3 and Fig. S7). This is not unexpected 255 as such, since α is not the driver of the YAS model. In the tropical region soil moisture for CBA and precipitation for YAS are more important drivers compared to soil and air temperatures. At high latitudes temperature has a larger effect on R h in the results of both models, even though in the Northern Hemisphere precipitation also has a significant role for YAS.
We also investigated, whether the seasonal cycle of the heterotrophic respiration is correlated with litter fall. The only significant correlation occurred at 30 • N-60 • N for both model versions. This was caused because both have similar annual 260 cycles of R h and litter fall, but the seasonal cycle of R h precedes litter fall (Fig. S8).
Global simulated GPP of 167 PgCyr −1 (Table 2)  The annual net CO 2 flux shows a slightly larger land sink for YAS than CBA (Table 2). Owing to the larger litter pool, fire fluxes are larger in the YAS model formulation by 0.50 PgCyr −1 , however they have similar spatial patterns (Fig. S10). This caused the heterotrophic respiration of YAS to be 0.56 PgCyr −1 smaller than by CBA, since the model was spun-up to steady state in 1860 and thus leads to a small discrepancy in net CO 2 fluxes between the two model formulations.

Carbon stocks
The soil carbon stocks simulated by the two models differed in magnitude and also their latitudinal distributions differed. The global estimate for total soil carbon by CBA was 4.5-fold larger than by YAS (Table 1). The global estimate for litter simulated by the YAS model was larger than that simulated by CBA. Vegetation carbon biomass was similar in both model formulations (Table 1).

275
The global distribution of soil carbon is very different between the model formulations ( Fig. S11c, d, Fig. S12).   (Fig. 4 a). The YAS model shows a large spread of turnover times at warmer temperatures but below 0 • C the range is narrower (Fig. 4 b). Both models predict the fastest turnover rates in moist and warm conditions. The

Box model 295
To assess whether the larger seasonal cycle amplitude in R h by YAS is caused by the larger litter pool or the environmental response functions, a simple box model calculation was performed (detailed description is given in Appendix). When global respiration was calculated with the turnover times and soil carbon pools of the YAS model, but using the environmental responses and drivers of the CBA model, the annual magnitude decreased by 29% compared to the original YAS model (Table A1). However, the yearly maximum value did not change much. When the opposite was done, and the turnover time 300 and soil carbon pools of CBA were used with the environmental responses and inputs of the YAS model, the magnitude of global heterotrophic respiration increased by approximately 1.4-fold (Fig. A1). The increase in the amplitude was 83% (Table   A1). Therefore, this simple analysis suggests that the environmental variables and their response functions cause the larger global amplitude of R h in the YAS model formulation. To further disentangle whether this change was caused by the different environmental drivers or their functional dependencies, we made additional tests.

305
The amplitudes of the seasonal cycle of R h (difference between the maximum and minimum values) are shown in Table A1.
For the YAS model, there happens a strong decrease in the amplitude when both driver variables and the response functions are changed. When only driver variables are changed, only a slight decrease occurs. When the response functions are changed, the decrease in the amplitude is more pronounced with 21%. The amplitude predicted by the CBA model increases, when the driving variables and response functions are changed (Table A1). This increase occurs when either driving variables or 310 response functions are changed individually. However, with the change of the response functions the change in amplitude is larger (74%). In summary, the response functions have a more pronounced role in the changes than the driving variables alone, and this was true for both models.

Evaluation against surface observations
Seasonal cycle amplitudes of atmospheric CO 2 are successfully simulated by the modeling framework across different latitudes 315 (Fig. 6a). The r 2 values of the observed seasonal cycle and the model estimates are high across latitudes, despite some lower values in mid-latitudes of the Northern Hemisphere (Fig. 6b). Averaged over all latitudes the r 2 value, calculated as linear correlation of simulated and observed averaged annual cycles, was 0.93 for CTE, 0.90 for CBA and 0.87 for YAS.
The capability of the model formulations to simulate the amplitude of the seasonal cycle differs within latitudinal regions (Fig. 6). The CBA model is able to capture the timing of the seasonal cycle in northern latitudes, but has a tendency to Four surface observation sites in the Northern Hemisphere illustrate similar behaviour of the seasonal cycle and its amplitudes as described above (Fig. 7 and Table S3). To confirm the general quality of the TM5 model used for both YAS and  (Table S3), whereas YAS underestimates them strongly. YAS is also having difficulty capturing the seasonal pattern at Niwot Ridge. This was happening generally in the temperate region, as is also seen in the When comparing the overall bias in atmospheric CO 2 at these four sites between the observations and the model simu-  and Australia (TC=10). For completeness we show the analysis also for these regions in  Thoning et al. (1989)), but for regions with missing data or no clear seasonal cycle, we averaged over all years of data.
To further illustrate the results from this comparison, we show data for two regions having a clear seasonal cycle. In TC region 2, the southern part of North America, CBA is more successful in capturing the observed seasonal cycle amplitude 355 than YAS (Fig. 8a), even though CBA reaches the minimum XCO 2 later than observations. YAS underestimates the seasonal cycle amplitude by 56% and has a different seasonal pattern than observations, so the minimum is reached earlier than in the observations and also the shape during the summer period differs from the observations. In Europe, TC region 11, both models capture the seasonal cycle amplitude (Fig. 8b, Table S4) and the seasonal cycle in the first part of the year. The increase of CO 2 in autumn is not captured so well by the simulations.

360
Overall, observed and simulated XCO 2 differ from each other in ways similar to the surface site observations. Estimates of seasonal cycle amplitude by YAS are too small in mid-latitudes (Fig. 8a) and in TCs 2, 5 and 8 compared to the observations, and CBA is better at capturing the observed annual cycles. At TC=1 (the northern part of North America), CBA overestimates the seasonal cycle amplitude, while YAS better captures it. However, the seasonal cycle pattern is better captured with CBA (Table S4) than with YAS. Generally YAS had smaller seasonal cycle amplitudes than the observations and CBA was more 365 consistent with the observations in most TC regions. CBA is also better than YAS in capturing the seasonal pattern of XCO 2 in all TC regions (Table S4).
There is bias in absolute XCO 2 between the GOSAT retrievals and the model simulations. When averaged over the time period used and the TC regions, CBA overestimates the GOSAT observations by 3.37 ppm and YAS by 2.33 ppm. These values were in line with bias in absolute CO 2 estimates at the four surface sites.
In this work our aim was to use atmospheric observations to assess whether soil carbon models of a land surface model can be evaluated with this kind of framework. Our main finding was that the two models predicted different annual cycles of global R h , with the YAS model having a larger amplitude. This in turn leads to clear differences in the model predictions of seasonal cycles of the atmospheric CO 2 concentration at surface stations and TC regions. To attribute the differences between the two 375 models to a specific cause, we need to compare their results from their different aspects and to also judge whether our model simulations are reasonable in the light of previous research.

Evaluation of carbon fluxes
Annual heterotrophic respiration was 66.1 PgCyr −1 for CBA and 65.5 PgCyr −1 for YAS (Table 2) showed that the response functions themselves lead to pronounced differences between soil carbon models (Wieder et al., 2018).
When heterotrophic respiration is compared by latitudinal zones, differences between the model formulations are visible in the variability and timing of the seasonal cycles in many regions (Fig. 3). R h correlates strongly with the environmental drivers of the models in different latitudinal zones (Table 3). Both models are largely influenced by their moisture dependency 395 in the tropical region (Table 3). CBA is driven by soil moisture with a linear dependence and YAS is driven by precipitation with an exponential relationship. Since the ranges of precipitation are larger than the variability in soil moisture and due to the exponential relationship between precipitation and decomposition in YAS, YAS is more tightly coupled to moisture than CBA.
At annual timescales, at which the YAS model was originally developed, precipitation and soil moisture behave similarly.
However, the seasonal cycles of the two variables are different. Precipitation begins earlier in the season in the tropical region, 400 and it causes YAS to reach yearly maximum heterotrophic respiration earlier than CBA, which is driven by soil moisture in this region. Similarly, air and soil temperatures are more similar on the long term as for short periods. Particularly in the temperate region, where the temperature has a larger role, the air temperature has larger variability than soil temperature and this leads to different kind of seasonal pattern of the R h predictions by the two different soil models.
The observations show that litterfall has strong influences on heterotrophic respiration (Chemidlin Prévost-Bouré et al., 405 2010). At seasonal timescales in the different latitudinal zones, there is no clear influence of litterfall driving the heterotrophic respiration seen in the models, which primarily results from the pre-defined turnover times of the fast litter pools, which smooth out individual litter fall events. Changes in the chemical composition of litterfall are considered to be one potential reason for changes in the amplitude of atmospheric CO 2 (Randerson et al., 1997) and this is something we could study with the YAS model.

410
Different moisture dependencies of R h have earlier been found to be important (Exbrayat et al., 2013). At the global level  (Welp et al., 2011). Fig. S9 shows that the bias relative to FLUXCOM exists 425 throughout most of the Northern Hemisphere and the tropics, but has only minor influence on the seasonal cycle of GPP. The high estimate of GPP will propagate into larger NPP, litter input and therefore also simulated heterotrophic respiration and soil carbon stocks. While this may contribute to a slightly larger simulated seasonal cycle of atmospheric CO 2 at northern stations, it is unlikely that this will affect our conclusions on the impact of the different soil formulations on the ability of JSBACH to simulate the seasonal cycle of heterotrophic respiration and the residence time of carbon in soil, and as a consequence, its 430 ability to reproduce observed seasonal cycle of atmospheric CO 2 or its longterm trend. Nevertheless, this comparison shows that in order to further improve JSBACH's performance against these data, GPP biases should be reduced. Furthermore, the high GPP values resulting from the simulations would likely be lower, if the nutrient cycles of nitrogen and phosphorus were included in the used version of JSBACH (Goll et al., 2012). Beside using a JSBACH version with nutrient cycles, further development work in the phenological cycle could improve the estimated GPP. The difference of the modelled GPP to the

Evaluation of carbon stocks and turnover times
The two soil models predicted different global soil carbon stocks (Table 1) with different latitudinal distributions (Fig. S12).
Similar to earlier studies (Goll et al., 2015;Thum et al., 2011), in our results the YAS model was more successful than CBA 440 in estimating global soil carbon stocks similar to estimates from observations, approximately 1500 PgC including large uncertainties (from 504 to 3000 PgC) (Scharlemann et al., 2014), as can be seen in the different estimates from HSWD (1578 PgC) and SoilGrids (2870 PgC) (see also Tifafi et al. (2018)). The YAS model is widely used in different applications at smaller scale and its performance to estimate soil carbon stocks has been found to be good (Hernández et al., 2017). Comparability between the model-calculated and the observed carbon stocks is relevant for any analyses of carbon fluxes because in The distribution of soil carbon stocks was also more realistic in YAS than in CBA (Fig. S12, Table S2). The large soil carbon stocks in the mid-latitudes predicted by CBA (Figs. S11c, S12) are unrealistic compared to current data-based estimates of the global soil carbon distribution (Fig. S12). The large carbon stocks at high latitudes predicted by the YAS model (Figs. S11d, The environmental responses of the turnover times have quite different forms for the two soil carbon models (Fig. 4). The CBA model shows a wide distribution of turnover times across the whole temperature range, whereas the YAS model shows 460 a larger spread in the tropical temperature range. This large spread in warm conditions is also observed (Koven et al., 2017) and is caused by the saturating temperature function of the YAS model, as shown in Fig. S1c. The large spread in turnover times as predicted by the CBA model might be caused by the fact that CBA is driven by soil temperature in one soil layer.
The environmental responses of the turnover times at annual time scales behave similarly as at monthly time scales, so that moisture is a more important driver in warm regions and temperature in cold regions, as was seen in Table 3.

465
The study by Koven et al. (2017) provided an empirically based turnover time as a function of temperature. At 20 • C this turnover time was approximately 11 ± 2 years, being closer to the estimate for the YAS model (calculated for values 19.5 -20.5 • C, and their standard deviation), being 22 ± 21 years • C and much lower compared to the CBA estimate of 64 ± 37 years.
In lower temperatures, at -15 • C, the empirically based turnover time is 200 ± 100 years, and YAS underestimates this with 82 ± 41 years (calculated for values -15.5 -(-14.5) • C), whereas the prediction by CBA is closer (150 ± 80 years). Therefore, 470 the turnover times simulated with the YAS model are closer to the observations in warm temperatures, but the turnover times are too low in cold temperatures. CBA estimated too high turnover times in warm temperatures, but turnover times in colder temperatures were in the same order as the observations.
The global turnover time of soil carbon by CBA was somewhat larger than in an earlier study, where it was estimated to be 40.8 years (Todd-Brown et al., 2014). This value was in the higher end of the CMIP5 models. The global turnover time 475 from YAS, which was 14.8 years, is more in the range of the other CMIP5 models (Todd-Brown et al., 2014). The spatial distribution of the turnover time anomalies show differences caused by the environmental drivers and their dependencies at annual timescales. When comparing these overall turnover times of total soil carbon, it is important to keep in mind that both models consisted of carbon pools that had widely varying turnover times. For example, despite the higher overall turnover time, the turnover time of the most recalcitrant carbon pool of YAS was an order of magnitude smaller than that of CBA. The differences between the two models in the seasonal cycle of atmospheric CO 2 were strong. CBA better reproduced the seasonal cycle amplitudes capturing the shape of the seasonal cycle both for surface sites and comparisons in the TC regions, even though its soil carbon distribution had worse performance compared to YAS. CBA exaggerated the seasonal cycle amplitudes at high northern latitudes, as has been found earlier (Dalmonech and Zaehle, 2013). It is important to keep in mind that 485 this study was done within a land surface model and modelled GPP was biased. The simulated GPP had a larger magnitude and some bias in its seasonal cycle, and therefore its evaluation against atmospheric CO 2 observations is influenced by it.
Even though the atmospheric observations provide a valuable and informative comparison for the model results, their use as a benchmark metric needs careful consideration.
The differences in absolute CO 2 and XCO 2 levels against the surface observations and the satellite retrievals, respectively, 490 with modelled CO 2 are caused by the modelling system, but this bias does not influence the analysis performed. We obtained the land surface fluxes (GPP, respiration, fire, herbivory fluxes, land-use change emissions) from JSBACH and together with the rest of fluxes from CarbonTracker Europe2016 (CTE), we used TM5 to obtain atmospheric CO 2 values. Fossil fuel emissions have not been optimized in CTE. Therefore we obtained ocean fluxes that had been optimized with the land carbon cycle of CTE, that differ from the JSBACH estimate. The land carbon cycle of CTE is modelled by the SiBCASA-GFED4 model 495 (van der Velde et al., 2014) and fire emission that were estimated from satellite observed burned area (Giglio et al., 2013).
The net global a posteriori land sink of CTE is approximately -2.0 (± 1.1) PgCyr −1 for 2001-2014. On the other hand, the JSBACH estimate for the net land sink is approximately -1.7 PgCyr −1 (Table 2) and is therefore smaller than the land sink by CTE. The fire flux of JSBACH is modelled, whereas the estimate in CTE is based on data. As shown in Fig. S13 for Mauna Loa, the bias in the CO 2 develops during the study period and the plot shows consistency so that YAS, which predicts a net 500 land sink closer to CTE than CBA, has smaller bias at the end of the time period. We concentrated the analysis on the averaged seasonal cycles, that are not influenced by this linear increase.
The space-borne observations give a similar message as the surface observations in TransCom regions, which showed clear seasonal cycle. Niwot Ridge is located in TransCom region 2 (southern part of North America) and also there YAS showed too low amplitude and CBA performed better, similarly as seen in the Fig. 8. The Pallas site is located in TransCom region 505 observations at Pallas and TransCom region 11, the models both perform acceptably. Using large TransCom regions helped to interpret the signal despite the larger variability than in the surface observations (comparing grey shaded regions in Figs. 7 and 8) and it has been recommended to use the information content of the satellites on continental scales (Miller et al., 2018).
The transport model itself also brings uncertainty to the result. Modelling of atmospheric transport is a challenging task as 510 open scientific questions in the field remain (Crotwell and Steinbacher, 2018) and the models contain biases (Gurney et al., 2004). The errors in atmospheric transport models cause a substantial difference in the inverse CO 2 model flux estimates . However, in this study we only used one atmospheric transport model. It is expected that the biases, as only one transport model was used, are similar between the two soil model runs and are not the cause for the large differences seen in the two simulations.

Conclusions
We demonstrated how atmospheric CO 2 observations can be used to evaluate two soil carbon models within the same land surface model and the different viewpoints offered by several variables considered. We used two different soil carbon models within one land surface model and used a three-dimensional transport model to obtain atmospheric CO 2 , while obtaining the anthropogenic and ocean fluxes from CarbonTracker Europe framework. We evaluated the carbon stocks of the soil models 520 and compared seasonal cycles calculated with soil carbon fluxes from the soil models to atmospheric CO 2 results from both surface and space-born observations. This work highlighted how the changes in the heterotrophic respiration transfer to the net ecosystem exchange estimates and further to the atmospheric CO 2 signal. We also discussed the importance of the model drivers and their functional dependencies, which differed for the two soil carbon models we studied. When considering both surface-and space-based observations, it is not straightforward to say which of the two soil carbon models performed better.

525
The comparison of the two soil carbon models revealed large differences in their estimates. The YAS model better captured the magnitude and spatial distribution of soil carbon stocks globally. However, it was biased in its atmospheric CO 2 cycle at temperate latitudes in the Northern Hemisphere. The CBA model, on the other hand, showed better performance in capturing the seasonal cycle pattern of atmospheric CO 2 , but it is biased at high latitudes in the Northern Hemisphere. CO 2 amplitude only in the northern high latitudes. The linear moisture dependence therefore seems justified, however it likely causes the Central Asian region to have too large carbon stocks. Whether this is caused by too high drought sensitivity or problems in the predicted soil moisture by JSBACH is difficult to judge. The too high amplitude in the northern high regions might be a result from the biases in the gross fluxes of the modeling system.

540
The evaluation was done within a land surface model that overestimates GPP in comparison to an upscaled GPP product and this hampers doing benchmarking using this modeling system. Since the model is run to a steady-state during the spinup procedure, it also leads to other biases in the modelling system (influencing e.g. autotrophic respiration). Overestimated GPP leads to an enhanced litter input to the soil. This causes comparing the magnitudes of the soil carbon pools to the actual observations cumbersome, as the overestimated litter fall causes biases in the model estimates. In this study the magnitudes 545 of simulated soil carbon are therefore not as good as the spatial patterns as an indicator for the model performance (such as latitudinal gradient). The other downside of the GPP biases is their influence on the estimated NEE. Due to the biases in the timing and magnitude of the other carbon fluxes, it is challenging to use CO 2 as a benchmark to heterotrophic respiration.
However, in our study the two soil models lead to pronounced differences in the atmospheric CO 2 and we were also able to locate latitudinal regions, where the models had most issues. Therefore, this approach provides a method to evaluate how the 550 changes in the heterotrophic fluxes further influence the atmospheric signal and helps to track which geographical areas are contributing to the questionable model performance.
Soil carbon models have several development needs van Groenigen et al., 2017) that are now partly being answered with next generation models including more mechanistic representation of several below ground processes (Wieder et al., 2015;Yu et al., 2019). The development of moisture dependency from simple empirical relationships is moving 555 towards mechanistic approaches, which may yield more reliable results in the long term (Yan et al., 2018). Our results confirm that the moisture dependency of heterotrophic respiration plays on important role in the whole global carbon cycle.
In this study we used space-born XCO 2 observations in addition to the surface observations of CO 2 . They were providing a larger-scale confirmation for the results obtained from the surface observations and thus provided complimentary information.
The number of satellite observations of column XCO 2 are increasing at a fast pace for example OCO-2 observations started in 560 2014, and they possess high potential for carbon cycle studies (Miller and Michalak, 2020).
Code and data availability.

575
where R h,Y AS is the heterotrophic respiration of model YAS, b is a scalar that takes into account the different magnitudes of the response functions, T air is air temperature, P a is annual precipitation, C soil,Y AS are the total soil carbon pools and τ Y AS is the turnover time of the total soil carbon pools. T soil is soil temperature and α is the relative soil moisture. This formulation in A1 refers to the YAS model. The response functions are as shown in Section 2.1.2. For the CBA model the formulation is as These responses were introduced in Section 2.  Table A1). Additionally we changed the driving variables, but kept the original response functions (lines C in Table A1). Then we changed only the response functions of the original model while keeping the original driving variables (lines D in Table A1).
Since the driving variables of soil moisture and annual precipitation differed in magnitude by approximately 4-fold, soil moisture was multiplied by four when using the function for annual precipitation (f Y AS,Pa ) and when annual precipitation was 590 used in the function for soil moisture (f CBA,α ) it was divided by four. The annual cycles of R h are shown in Fig. A1 and the amplitudes in Table A1.