Multi-year observations reveal a larger than expected autumn respiration signal across northeast Eurasia

. Site-level observations have shown pervasive cold season CO 2 release across Arctic and boreal ecosystems, impacting annual carbon budgets. Still, the seasonality of CO 2 emissions are poorly quantified across much of the high latitudes due to the sparse coverage of site-level observations. Space-based observations provide the opportunity to fill some observational gaps for studying these high latitude ecosystems, particularly across poorly sampled regions of Eurasia. Here, we show that 5 data-driven net ecosystem exchange (NEE) from atmospheric CO 2 observations implies strong summer uptake followed by strong autumn release of CO 2 over the entire cold northeastern region of Eurasia during the 2015–2019 study period. Combining data-driven NEE with satellite-based estimates of gross primary production (GPP), we show that this seasonality implies less summer heterotrophic respiration ( R h ) and greater autumn R h than would be expected given an exponential relationship between respiration and surface temperature. Furthermore, we show that this seasonality of NEE and R h over northeastern

during the shoulder seasons at high-latitudes? And if so, (2) what are the underlying mechanisms driving this behaviour?
We first examine the seasonality of net ecosystem exchange (NEE) constrained by atmospheric inversions of retrieved column-averaged dry-air mole fractions of CO 2 (X CO2 ) from the Orbiting Carbon Observatory 2 (OCO-2) (Crisp et al., 2017;Eldering et al., 2017) and by flask and in situ CO 2 measurements (Sec. 2.2). Monthly NEE is obtained from version 9 of the OCO-2 Model Inter-comparison Project (OCO-2 MIPv9) (Peiro et al., 2021). In addition, we perform a set of three higher tem-60 poral resolution inversions using the CAMS, TM5-4DVar and CMS-Flux inversion systems to examine sub-monthly variability in CO 2 fluxes.
We then look to decompose NEE into component fluxes to better understand the processes driving the seasonality of NEE.
In particular, we decompose the data-driven NEE fluxes into net primary production (NPP) and heterotrophic respiration (R h ): (1) To do this, we utilize four data-driven gross primary production (GPP) products: FLUXCOM (Jung et al., 2020), FluxSat (Joiner and Yoshida, 2020), the Vegetation Photosynthesis Model (VPM) (Zhang et al., 2017), and the Global OCO-2-based SIF product (GOSIF) (Li and Xiao, 2019). These datasets utilize MODIS reflectances, OCO-2 solar induced fluorescence and reanalysis data to infer GPP, and thus provide an ensemble of global estimates of GPP to inform its uncertainty. NPP is 70 estimated from GPP using the carbon use efficiency (CUE) from the TRENDY models (Sec. 2.4) using the relationship: We then combine the data-driven estimates of NEE and NPP to recover a data-driven seasonal cycle of R h (Sec. 2.5).
We perform this analysis at two temporal resolutions. First, we leverage the large ensembles from TRENDY and the  MIPv9 that provide fluxes at monthly temporal resolution (Sec. 3.1). However, because phenological changes can be significant 75 on shorter timescales (e.g., weekly, Parazoo et al. (2018a)), we perform a second analysis at 14 day temporal resolution using https://doi.org/10.5194/bg-2022-40 Preprint. Discussion started: 2 March 2022 c Author(s) 2022. CC BY 4.0 License.
where BB is biomass burning, across a range of inversion systems. Version 9 of the OCO-2 MIP (MIPv9, (Peiro et al., 2021)), provides ensembles of nine inversion systems that assimilated a standardized set of in situ and flask CO 2 measurements for one 110 experiment (referred to as "IS") and OCO-2 ACOS b9 land nadir and land glint retrievals for a second experiment (referred to as "LNLG"). We estimate NEE fluxes from MIPv9 NBE fluxes by removing biomass burning emissions from the Global Fire Emissions Database version 4 (GFED4.1s) (van der Werf et al., 2017). GFED4.1s provides estimates of biomass burning using MODIS burned area (Giglio et al., 2013), thermal anomalies, and surface reflectance observations (Randerson et al., 2012).
Note that biomass burning is a relatively small contribution to NBE over the regions examined here during the study period 115 (2015-2019) (Fig. S1). The NEE fluxes produced by each ensemble member over northern Eurasia are shown in Fig. S2.
To examine variability in fluxes at the sub-monthly time step, we examine three other inversion NEE estimates that optimize sub-monthly NEE fluxes: TM5-4DVAR 14day LNLGIS, CAMS 14day LNLGIS, and CMS-Flux 14day LNLGIS. These inversions assimilated both in situ and flask CO 2 in addition to OCO-2 ACOS b10 land nadir and land glint retrievals (we refer to this experiment as LNLGIS). Note that the ACOS b10 retrievals are updated from the b9 retrievals employed in MIPv9. The prior 120 and posterior NEE fluxes produced by each ensemble member are shown in Fig. S3, and the inversion set-ups are described below.
TM5-4DVAR is a variational inversion framework based on the TM5 atmospheric tracer transport model (Meirink et al., 2008;Basu et al., 2013). The TM5-4DVAR 14day LNLGIS inversion assimilated 10 s averages of OCO-2 ACOS b10 land nadir and land glint measurements concurrently with in situ measurements to optimize weekly NEE and ocean fluxes. The OCO-2 125 10 s averages were constructed analogous to the b9 10 s averages assimilated by models in MIPv9 (Peiro et al., 2021). The in situ measurements assimilated were updated from Peiro et al. (2021), specifically ObsPack NRT 5.0 was replaced by NRT 5.2. The flux inversion set-up was identical to the set-up of "TM5-4DVAR" in Peiro et al. (2021), except (i) the inversion was run from 2014-06-01 to 2021-02-01 (instead of 2014-09-01 to 2019-06-01 in Peiro et al. (2021)), (ii) ECMWF ERA5 meteorology was used to drive the model instead of ERA Interim, (iii) a 1 • × 1 • transport grid over North America was nested 130 inside the global 3 • × 2 • grid to take advantage of the higher in situ data density, and (iv) prior CO 2 fluxes were constructed following Weir et al. (2021).
The CAMS 14day LNLGIS inversion utilizes the CAMS greenhouse gases inversion system (Chevallier et al., 2005(Chevallier et al., , 2010Chevallier, 2013), and assimilates OCO-2 land nadir and land glint X CO2 10 s averages and in situ CO 2 measurements concur-  et al., 2007;Liu et al., 2014). These flux inversions optimize 14-day NEE and ocean fluxes by assimilating OCO-2 ACOS b10 land nadir and land glint "buddy" super-obs concurrently with in situ and flask measurements from version 6.0 of the GLOBALVIEW plus package (Masarie et al., 2014;Schuldt et al., 2020). OCO-2 "buddy" super-obs are obtained by averaging individual soundings into super-obs at 0.5 • × 0.5 • spatial resolution (within the 145 same orbit) following . We assimilate surface-based in situ and flask measurements between 11 AM and 4 PM local time. These data were also pre-filtered to remove observations that were not well simulated by the model (based on a posterior data-model χ 2 mismatches greater than three for a preliminary flux inversion). For these inversions, ODIAC fossil fuel emissions (Oda and Maksyutov, 2011;Oda et al., 2018) and GFED 4.1s biomass burning emissions (van der Werf et al., 2017), including small fires (Randerson et al., 2012), are prescribed but not optimized.

Dynamic global vegetation models (DGVMs)
We use CO 2 flux estimates from an ensemble of 15 dynamic global vegetation models (DGVMs) from TRENDY v8 (Sitch et al., 2015). We utilize fluxes simulated by the CABLE-POP, CLASS-CTEM, CLM5.0, DLEM, ISAM, ISBA-CTRIP, JS-BACH, JULES, LPJ, LPX-Bern, OCN, ORCHIDEE, ORCHIDEE-CNP, SDGVM and VISIT DGVMs. We exclude LPJ-GUESS because monthly output on R h was not available. We utilize monthly GPP, autotrophic respiration (R a ) and R h fluxes 155 from the "S3" experiment that employs time-varying CO 2 , climate and land use forcing data. We further calculate NPP from the simulated GPP and R a data (NPP = GPP − R a ) at the models native resolution. The NEE, NPP, and R h fluxes produced by each ensemble member are shown in Fig. S4 for the same three year period as the data-driven estimates (2015, 2016, and 2018).
We also utilize TRENDY v8 model output to estimate an ensemble of Carbon Use Efficiency (CUE = NPP/GPP) from 160 each DGVM. CUE can become negative during the winter and spring, when GPP is approximately zero but R a is non-zero.
However, we limit CUE values to a range between zero and one. These CUE estimates are then employed to estimate datadriven NPP estimates from the data-driven GPP data (see Sec. 2.4). Figure S5 shows the CUE estimates derived from the TRENDY v8 models.

165
We utilize four data-driven GPP estimates in this analysis: FluxSat, FLUXCOM, VPM, and GOSIF. These datasets differ in inputs and approach.
FLUXCOM upscales CO 2 fluxes from flux tower observations using a variety of machine learning methods and forcing datasets (Jung et al., 2020). We examine the ensemble mean of the nine remote sensing (RS) learning algorithms.
VPM is a light use efficiency model that estimates GPP globally using MODIS surface reflectances and NCEP Reanalysis-2 PAR and temperature data (Xiao et al., 2004;Zhang et al., 2017). The native spatiotemporal resolution of the dataset is 500-m and 8-days. VPM has been shown to agree well with FLUXNET eddy covariance site-level data (Zhang et al., 2017) and with TROPOMI SIF at the global scale (Doughty et al., 2021).
The GOSIF GPP product estimates GPP based on OCO-2 SIF, MODIS EVI, and reanalysis data from MERRA-2 (Li and Xiao, 2019). To generate GPP estimates, first, 8 day globally gridded 0.05 • × 0.05 • SIF is estimated from the input data using machine-learning algorithms. GOSIF GPP is then estimated from the GOSIF SIF estimates using eight SIF-GPP relationships 180 with different forms (universal and biome-specific, with and without intercept). In this analysis we utilize the mean GPP estimate across the eight SIF-GPP estimates.
These four data-driven GPP estimates are shown in Fig. S6. For this analysis, we estimate NPP from these data using the CUE from the TRENDY models. We perform this calculation differently for the monthly analysis and biweekly analysis. For the monthly analysis, we calculate 60 NPP seasonal cycles for each possible combination of the four GPP and 15 CUE seasonal 185 cycles. We then calculate the median as our best estimate and interquartile range as a metric of uncertainty. For the biweekly analysis, we calculate the best estimate using the median GPP and CUE seasonal cycles, and calculate the uncertainty using the full range of GPP estimates and interquartile range of CUE estimates. This is done differently to match the NEE analysis, which leverages the larger ensemble from the MIPv9 to examine the median and interquartile spread for the monthly analysis, but employs the full range for the smaller biweekly ensemble of three models.

Data-driven R h estimates
We calculate the seasonal cycle of R h by combining the data-driven estimates of NPP and NEE, We perform this calculation differently for the monthly analysis and biweekly analysis. For the monthly OCO-2 MIPv9 IS-and LNLG-based estimates, we calculate 540 R h seasonal cycles by combining the nine data-driven IS or LNLG NEE estimates 195 with the 60 NPP estimates. We then take the median and interquartile spread as the best estimate and uncertainty. For the biweekly analysis, we calculate the best estimate of R h from the best (median) estimates of NPP and NEE. We then take the uncertainty to be the full range of R h estimates calculated from the three biweekly NEE estimates and the NPP range.

Soil carbon decomposition model
We use the soil carbon decomposition model developed in Yi et al. (2015Yi et al. ( , 2020 to simulate the contribution of soil at different 200 depths to total R h and NEE fluxes. The soil decomposition model uses multiple litter and SOC pools to characterize the progressive decomposition of fresh litter to more recalcitrant materials, which include three litterfall pools, three SOC pools with relatively fast turnover rates, and a deep SOC pool with slow turnover rates. The litterfall carbon inputs were first allocated to the three litterfall pools depending on the substrate quality of litterfall component and then transferred to the SOC pools through progressive decomposition. We then model the profile of the carbon pools through accounting for the vertical carbon transport 205 (Yi et al., 2020). A constant diffusivity rate was assigned to permafrost (5.0 cm 2 yr −1 ) and non-permafrost (2.0 cm 2 yr −1 ) regions within the top 1 m soil, and then linearly decreased to 0 at the 3 m below surface (Koven et al., 2013). The boundary conditions at the soil surface were set as the carbon input rate to the three surface litterfall pools. A zero-flux was assigned at the bottom of the soil carbon pool, which was set as 3 m. This accounts for the upper permafrost layer, while carbon in deeper layers (e.g., 3-10 m) is largely insulated from climate variability and ignored in this study. The decomposition rate (day −1 ) 210 for each carbon pool was derived as the product of a theoretical maximum rate constant and dimensionless multipliers for soil temperature and liquid water content constraints (Yi et al., 2015) to decomposition indicated by the MERRA2 soil temperature data. For simplicity, the soil saturation was assumed as 1.0 when soil temperature is above 0 • C, while the maximum liquid soil water fraction was used for below freezing (Schaefer and Jafarov, 2016).

215
We examine 15 high latitude FLUXNET2015 sites to confirm the seasonality of carbon fluxes inferred from the atmospheric CO 2 and remote sensing datasets. These sites are listed in table S1. For this, we utilize monthly data with the quality flag greater than 0.75. We calculate NPP and R h for each site from the NEE and GPP datasets by applying the CUE from the TRENDY DGVMs at the gridcell containing the FLUXNET site.

Differences between data-driven and DGVM carbon fluxes
We first examine the mean seasonal cycle of monthly NEE from the MIPv9 inversions and TRENDY v8 DGVMs over the three north Eurasian regions (mean over 2015, 2016, and 2018; 2017 is excluded due to an OCO-2 data gap). The objective of this initial analysis is to identify the seasonal features of NEE over northern Eurasia, and identify how the data-driven and simulated estimates differ. The spread among the TRENDY v8 models is large and encompasses the data-driven estimates 225 ( Fig. S4). Thus, to identify data-model differences, we focus on differences in the ensemble median estimates and adopt the interquartile spread across MIPv9 and TRENDY to quantify uncertainty in this estimate.   the Cold region, the TRENDY models produce weaker carbon uptake during June-July (7.17-11.21 TgC day −1 ) but stronger uptake (or reduced efflux) during Aug-Oct (4.86-5.33 TgC day −1 ). This large amplitude of the NEE seasonal cycle contributes 245 to the large seasonality in X CO2 observed over eastern Eurasia (Jacobs et al., 2021).
To further investigate the causes of differences in NEE between the TRENDY and MIPv9 ensembles, we separately examine NPP and R h . The data-driven NPP and TRENDY NPP are shown in Fig. 2 (Rogers et al., , 2019, we find that TRENDY NPP largely capture the data-driven seasonality and do not drive NEE differences against the data-driven seasonal cycle. Finally, we compare TRENDY R h to data-driven R h (Fig. 2(g-i)). In the Warm region, the TRENDY model median R h is 255 lower than the data-driven estimates during May-Sep (2.28-2.47 TgC day −1 ), but the seasonality is similar. In the Mid region, the data-driven and TRENDY R h seasonal cycles show good agreement throughout the growing season. The largest differences between data-driven and TRENDY R h seasonal cycles are found for the Cold region. The TRENDY model median shows increased R h during May-Jul (4.05-7.07 TgC day −1 ) but show reduced R h during the rest of the year (1.37-1.69 TgC day −1 ).
As a result, the seasonality of data-driven R h is shifted later in the year relative to TRENDY ensemble. This can be seen in 260 the cumulative fraction of annual R h , which quantifies the fraction of total R h released as the season progresses ( Fig. 2(j-l)).
The percentage of total annual Rh released during May-Jul is 48% for the TRENDY ensemble median but 36% (30%) for the LNLG (IS) data-driven R h ensemble median.
We independently confirm a shift in the seasonality of data-driven R h relative to TRENDY for 15 high latitude FLUXNET sites (Fig. S12). Due to the sparsity of FLUXNET sites over northeastern Eurasia, we include sites outside of the "Cold" domain 265 but that have early zero-crossing dates (estimated by a mean October air temperature less than 2 • C). The observed median R h peaks across these sites during September, in agreement with the data-driven R h seasonality. In contrast, the TRENDY median R h peak occurs during July (consistent with the regional scale analysis). This phase shift is also evident in the cumulative fraction of annual R h , which shows that the percentage of total annual Rh released during May-Jul is 46% for the TRENDY ensemble median but 35% for the FLUXNET-based ensemble median.

270
Overall, these results indicate good agreement between the TRENDY ensemble and data-driven estimates for the Warm and Mid regions, but show marked differences over the Cold region. In particular, we find that the data-driven estimates suggest a seasonal redistribution of R h with a reduction during May-Jul but an increase for the remainder of the year. Further, these results show that differences in R h largely account for the differences between the data-driven and TRENDY NEE fluxes over the Cold region, except in Aug-Sep when NPP differences are large. In the remaining sections, we will characterize the data-275 driven seasonal cycle of NEE, NPP, and R h at a higher (biweekly) temporal resolution and investigate mechanistic explanations for the data-model differences found over the Cold region.

Data-driven biweekly CO 2 fluxes
We now investigate the data-driven seasonal cycle of NEE, NPP, and R h with biweekly (14 day) temporal resolution. This higher resolution better resolves temporal changes in CO 2 fluxes throughout the growing season, particularly during the shoul-280 der seasons, when week-to-week changes in CO 2 fluxes are large (Parazoo et al., 2018a). For this analysis, we utilize a set of three flux inversions that assimilate both in situ and OCO-2 land nadir and glint data (ACOS v10) to estimate sub-monthly CO 2 fluxes (TM5-4DVAR 14day , CAMS 14day , and CMS-Flux 14day ; individual model fluxes shown in Fig. S3). These inversions give similar NEE seasonality to the MIPv9 monthly fluxes (e.g., Fig S10) and have seasonality similar to the OCO-2 MIPv9 LNLG inversions for the Cold region. For NPP, we utilize the same datasets as Sec. 3.1 but at 14-day temporal resolution. We examine 285 the ensemble medians for a best estimate and take the full range of model estimates as an illustration of the uncertainty. although peak NPP is slightly delayed relative to peak drawdown in NEE (by 0-2 weeks). Both NEE and NPP generally follow the seasonal cycle of insolation, but are somewhat delayed in the Mid and Cold regions, likely due to temperature limitation 290 . Peak R h is found to be delayed relative to peak NPP by 0-8 weeks. For the Warm and Mid regions, R h follows the seasonal cycle of surface temperature, with 48% and 51% of the annual total R h occurring after the peak in surface temperature, respectively. In contrast, the Cold region shows a substantial delay relative to surface temperature, with 63% of the total R h occurring after the peak in surface temperature. The median R h seasonal cycle is also found to have a double peak in this Cold region: a smaller peak of 9.8 TgC day −1 occurs during late May followed by a larger peak of 21.5 TgC day −1 at 295 the beginning of September. This May peak roughly aligns with the spring thaw and positive zero-crossing at the beginning of May. A potential mechanistic explanation for a spring pulse of R h could be due to thawing soils that release CO 2 that has been trapped within subsurface soil layers over the winter (see Sec. 4). However, the signal from this first peak is small relative to the uncertainties.

Mechanistic drivers of late-season R h 300
Data-driven R h for the Cold region indicates a delayed peak relative to surface temperature and the TRENDY model median.
Here we examine possible mechanistic explanation for this late season peak in R h using models. We investigate two factors that could potentially contribute to the delay in R h : (1) Seasonal variations in the labile carbon pool. Leaf and fine root litter carbon pools tend to increase over the growing season as carbon is sequestered through photosynthesis (Randerson et al., 1996). Thus, increased substrate availability in the autumn relative to the spring will act to shift the seasonal cycle of R h later in the year.

305
(2) R h from subsurface soil layers that have a delayed seasonal cycle driven by a lag in soil temperature. Heating and cooling at the surface slowly diffuses through the soil column resulting in a lagged seasonal cycle of temperature with depth (Parazoo et al., 2018b). Figure 4(a-c) shows the seasonal cycle in soil temperature from the MERRA-2 Land dataset. The phase shift in soil temperature seasonality with depth can be up to several months, and is largest for the colder regions. Note that we verify the fidelity of the MERRA-2 Land soil temperature against borehole measurements and against simulated soil temperature 310 from ERA-5 reanalysis and the CMIP6 models (see Text. S2, Fig. S13-S14).
To test the impact of these factors, we consider a single layer model that represents R h using a exponential relationship with temperature: where α represents the labile carbon pool size, β is a constant, and T is the temperature of the carbon pool. We further investigate these mechanisms using a soil carbon decomposition model that can resolve seasonal and vertical variations in carbon pools (Sec. 2.6). We examine the seasonality of the R h simulated down to a depth of 300 cm using a 335 constant carbon pool (C 300cm ), simulated within the top 10 cm of soil using a dynamic carbon pool (D 10cm ), and simulated to a depth of 300 cm using a dynamic carbon pool (D 300cm ). We compare these seasonal cycles after normalizing by the annual total R h . Figure 4(g-i) shows that incorporating seasonal and vertical variations in the carbon pool results in a phase shift in R h to later in the year, consistent with the one-layer model results. The simulated impact of these factors is found to be quite small, possibly due to underestimation of the impact of seasonal and vertical variations in the carbon pools on R h in the model.

340
Still, these model simulations can inform the R h tendencies of these carbon pool variations. Comparing the regions, the impact of seasonal variations in the labile carbon pool are quite similar, with reduced R h in the summer and increased R h during the autumn relative to a constant carbon pool. In contrast, the impact of vertically resolved R h shows differences between the regions, with a small impact for the Warm region but a comparatively large impact for the Cold region. The larger impact over the Cold region is likely due to larger carbon pools at depth (Fig. S16), with a possible contribution from regional differences 345 in the thermal gradient with depth (Fig. 4). Similarly, we find that the fractional contribution of subsurface soils to total R h has larger seasonal variation over the Cold region (Fig. S17). Thus, these results support a substantial contribution of subsurface soil R h , and suggests that an underestimation of this quantity by the DGVMs could explain the data-model differences.

350
Over the cold northeastern region of Eurasia, our data-driven R h seasonal cycle allocates 64-70% of annual CO 2 emissions to outside of the summer (August -April) compared to only 52% of annual R h emissions allocated by the TRENDY DVGMs to this period. The reason that the TRENDY models do not capture this seasonality is unclear. A plausible explanation is that the TRENDY models do not capture the contribution of subsurface layers to R h , especially during the zero curtain period. This is clearly the case for the subset of TRENDY models that drive R h with air temperature. However, it is unclear if this is an important factor for models with more sophisticated soil modules. Surprisingly, a preliminary analysis did not find a relationship between model complexity and agreement with the data-driven estimate. The drivers of differences from the datadriven estimate may differ between models, and be impacted by the interplay of litterfall phenology, R h formulation (Peylin et al., 2005), and number of soil layers, among other factors. Some potential areas of focus for improving models may be gleaned from recent studies. Seiler et al. (2021) suggest that the TRENDY models may systematically underestimate soil 360 organic carbon at high latitudes, which could contribute to an underestimate of subsurface R h across the models. Endsley et al.
(2021) found a similarly phased bias in simulated R h by the Terrestrial Carbon Flux (TCF) model against flux tower R h to that reported here. They show that this bias could be largely mitigated by adding seasonally varying litterfall phenology, an O 2 diffusion limitation on R h and a vertically resolved soil decomposition model, suggesting these may be foci for model improvements.

365
Differences between the data-driven and TRENDY R h seasonal cycles suggest that DGVMs may be deficient in simulating the response of permafrost rich ecosystems to climate change, particularly in terms of subsurface R h . Improving DGVM skill in these ecosystems is critical given the rapid northern high latitude warming and lengthening of the zero curtain period (Euskirchen et al., 2017;Parazoo et al., 2018b;Chen et al., 2021). The rapid changes in northern Eurasia are illustrated in Fig. 5, which shows the number of months per year that soil temperatures are greater than 0 • C as simulated by a set of 370 CMIP6 models. Soils in the permafrost-rich Cold region are undergoing the most dramatic lengthening of the unfrozen period, particularly at depth (50-200 cm). Under scenario ssp585 (highest emission scenario), these soils are predicted to go from ∼5 months per year with a monthly mean soil temperature above 0 • C during the 20th century to ∼11 months per year by 2100. The impact is largest for the Cold region at depth because of the reduced seasonality relative to the surface, such that a warming of ∼7 • C shifts nearly the entire seasonal cycle above 0 • C at a depth of 50-200 cm (Fig. S18). Such warming would 375 drive the widespread formation of talik, a subsurface layer of perennial thawed soil (Parazoo et al., 2018b), and further enhance R h at depth.

Limitations
Atmospheric CO 2 measurements are relatively sparse over Northern Eurasia (Byrne et al., 2017). In situ and flask CO 2 mea-385 surements are spatially sparse over Mid and Cold regions (Fig. S9), with only a handful of sites assimilated over Russia as part of Japan-Russia Siberian Tall Tower Inland Observation Network (JR-STATION) of nine tower sites (Sasakawa et al., 2010(Sasakawa et al., , 2013. The OCO-2 coverage is seasonally variable (Fig. S8). Due to the fact that X CO2 retrievals are performed on re- flected sunlight, the coverage across Eurasia is quite good during the growing season (May-Sep). However, low signal and the inability to perform retrievals over snow limits the data coverage during the shoulder seasons and winter, resulting in few X CO2 390 retrievals across the Mid and Cold regions during Nov-Feb. Ongoing research to both improve X CO2 quality control filtering at high latitudes (Jacobs et al., 2020;Mendonca et al., 2021) and in retrievals X CO2 over snow and ice surfaces (Mikkonen et al., 2021) may reduce these data gaps in the future. Despite this sparsity of measurements, we find that the LNLG and IS flux inversions show consistent differences from the TRENDY and prior fluxes. Furthermore, these data show good agreement with withheld in site data (Peiro et al., 2021) and independent aircraft measurements over Alaska ( Fig S11). Thus, we We also note that there are challenges in estimating data-driven GPP during the shoulder season due to reduced reflected radiance and snow cover, which impacts the spectral features of the vegetation canopy. Poor quality data, such as snowy and noisy samples, contributes to uncertainty in the timing of shoulder seasons (Wang et al., 2017;Zhang, 2015). In this analysis, 405 we attempted to mitigate this issue though the use of an ensemble of data-driven GPP estimates, but we acknowledge that remaining biases may be present. There are also remaining challenges in relating the inferred fluxes to underlying processes. Space-based flux constraints do not discriminate between biological and physical processes driving carbon cycle fluxes. It is currently unclear whether the substantial cold season CO 2 effluxes across permafrost regions are driven primarily by biological activity or physical processes 420 (Natali et al., 2019; Arndt et al., 2020;Raz-Yaseef et al., 2017). Yet, isolating the primary driver of these fluxes is critical for inferring the sensitivity of R h to climate change. If the cold season R h comes from the metabolism of old permafrost carbon, then 14 CO 2 measurements could help differentiate biological from physical CO 2 production.

Conclusions
Space-based and in situ atmospheric CO 2 measurements revealed strong summer uptake and early cold season release of CO 2 425 over the cold northeastern Eurasia region, implying a late summer peak in R h with substantial early cold season respiration.
Based on model simulations of R h , we suggested that this seasonality is driven by a large contribution of subsurface soils to the total R h . These results are consistent with site-level observations identifying substantial CO 2 release in permafrost regions outside the growing season (Natali et al., 2019), and in particular, reported spikes in early cold season respiration associated with the zero curtain period in Arctic ecosystems (Commane et al., 2017;Jeong et al., 2018).

430
The data-driven seasonality of R h over the Cold region was generally not captured by the TRENDY DGVMs, which showed greater R h during May-Jul and lower R h during the rest of the year. The underlying cause of this discrepancy is unclear, but may be linked to an underestimate of the contribution of sub-surface soils to total R h . Given the rapid warming of permafrost soils (Euskirchen et al., 2017;Chen et al., 2021), talik formation (Parazoo et al., 2018b), and increasing early cold season CO This analysis demonstrates the utility of space-based observations for studying carbon cycle dynamics at high latitudes, where in situ measurements are sparse. Although currently limited by a short observing record (2014 -present), the estimates of NEE inferred from the OCO-2 X CO2 retrievals suggest that these data will provide a powerful tool for detecting change in seasonal cycle of NEE across northern Eurasia.

Appendix A: Appendix 1
We estimate seasonal variations in labile carbon by estimating a litterfall flux of carbon. Litterfall is assumed to be a fraction of NPP, following Randerson et al. (1996) (Fig. S15), and in steady state on annual timescales: where t is the day of the year and f NPP is the frac NPP that is converted to litterfall. The seasonal variation in the labile carbon pool (∆C pool ) is defined as the difference in flux between litterfall and Rh: Finally, we assume a fractional variation in the total carbon pool amount, γ, and calculate α(t): 455 α(t) = C pool (t) max(|C pool (t)|) γ + 1 α 0 , where α 0 is the mean carbon pool size, and is optimized in the regression in Sec. 3.3. Changes in growing season duration and productivity of northern vegetation inferred from long-term remote sensing data, Environmental Research Letters, 11, 084 001, 2016.