Global atmospheric CO2 inverse models converging on neutral tropical land exchange, but disagreeing on fossil fuel and atmospheric growth rate

We have compared a suite of recent global CO2 atmospheric inversion results to independent airborne observations and to each other, to assess their dependence on differences in northern extratropical (NET) vertical transport and to identify some of the drivers of model spread. We evaluate posterior CO2 concentration profiles against observations from the High-Performance Instrumented Airborne Platform for Environmental Research (HIAPER) Pole-to-Pole Observations (HIPPO) aircraft campaigns over the mid-Pacific in 2009–2011. Although the models differ in inverse approaches, assimilated observations, prior fluxes, and transport models, their broad latitudinal separation of land fluxes has converged significantly since the Atmospheric Carbon Cycle Inversion Intercomparison (TransCom 3) and the REgional Carbon Cycle Assessment and Processes (RECCAP) projects, with model spread reduced by 80% since TransCom 3 and 70% since RECCAP. Most modeled CO2 fields agree reasonably well with the HIPPO observations, specifically for the annual mean vertical gradients in the Northern Hemisphere. Northern Hemisphere vertical mixing no longer appears to be a dominant driver of northern versus tropical (T) annual flux differences. Our newer suite of models still gives northern extratropical land uptake that is modest relative to previous estimates (Gurney et al., 2002; Peylin et al., 2013) and near-neutral tropical land uptake for 2009–2011. Given estimates of emissions from deforestation, this implies a continued uptake in intact tropical forests that is strong relative to historical estimates (Gurney et al., 2002; Peylin et al., 2013). The results from these models for other time periods (2004–2014, 2001–2004, 1992–1996) and reevaluation of the TransCom 3 Level 2 and RECCAP results confirm that tropical land carbon fluxes including deforestation have been near neutral for several decades. However, models still have large disagreements on ocean–land partitioning. The fossil fuel (FF) and the atmospheric growth rate terms have been thought to be the best-known terms in the global carbon budget, but we show that they currently limit our ability to assess regional-scale terrestrial fluxes and ocean–land partitioning from the model ensemble.


Supplementary material 1. Calculation of vertical gradients
For all model simulations, we interpolate the models along the HIPPO flight track and mask the output according to the CO2.X availability (2075 out of 28857 10-sec samples missing for our selected flights and regions). Measurements made in the continental boundary layer (Fairbanks missed approach or Anchorage take-off or landing) or in stratospheric air (N 2 O < 318 pbb + (year-2009) are excluded. We subtract a deseasonalized smoothed trend component from a seasonal trend decomposition using Loess (STL, Cleveland et al., 1990) fit of the Mauna Loa Observatory in-situ measurement time series to provide a common reference for both observations and models. For the observations and the models, this trend comes from filtering the MLO record with 10-year and 2-year windows and adding these 2 components back together, removing the seasonal and residual terms (see e.g. Stephens et al., 2013). Then all remaining data are binned in boxes of 5 degrees latitude and 100 hPa altitude. For each box, a 2-harmonic plus annual mean offset fit of the full available time period is performed. An example of this is shown in Fig. S1. For combining individual fit components into larger box means, a weight using the cosine of latitude is applied for each box to more closely match the zonally integrated impact of fluxes. The northern extratropical vertical gradient is calculated as the difference between the average from the 1000 hPa to 800 hPa for the lower troposphere (LT), and the average from 800 hPa to 400 hPa for the upper troposphere (UT), spanning the latitude from 20 N to 90 N. The statistics of the fits for all boxes and for all models and observations are presented in Fig. S2 and S3. Fig. S2 shows the annual mean and Fig. S3 shows the seasonal amplitude. These Figures show that the latitudinal distribution is reasonably well represented in the models while more differences are seen in the vertical. Figure S2: Annual mean CO2 offsets resulting from the 2-harmonic plus offset fits of the HIPPO data and correspondingly sampled model outputs for every box of 5 degrees latitude and 100 mbar altitude. Note all the models and observations had a deseasonalized smoothed trend component from an STL fit of the Mauna Loa Observatory time series subtracted to provide a common reference for both observations and models. Figure S3: Seasonal amplitude resulting from the 2-harmonic plus offset fits of the HIPPO data and correspondingly sampled model outputs for every box of 5 degrees latitude and 100 hPa altitude.

Estimation of measurement uncertainty in vertical gradient
The short-term (10 sec) precision for the QCLS and OMS sensors is on the order of 0.1 ppm and on the order of 0.5 ppm for AO2, while the precision of the flask measurements is also on the order of 0.1 ppm. Thus, the statistical imprecision on vertical gradients defined by many hours of measurements or hundreds of flasks is miniscule. However, airborne measurements may experience altitude-dependent biases for several reasons. Inlet humidity and pressure dependent surface effects can influence both in situ sensors and whole air samplers, cabin pressure changes can affect in situ sensors, and with inadequate drying altitude dependent humidity changes can influence flask measurements.
We assess the uncertainty in the vertical gradient defined by the HIPPO flight tracks, and as calculated in this work, by examining vertical gradients in the differences between the multiple CO 2 sensors and samplers. Because of the wide range of experimental choices on inlet designpumps, dryers, tubing materials, flow rates, and more -this approach provides a reasonable estimate of our uncertainty in using any single sensor, or combination of sensors, such as the CO2.X variable, which is primarily data from CO2_QCLS with calibration periods gap-filled with CO2_OMS. Model-data mismatch uncertainties specific to the sampling times and coverage of HIPPO are addressed in the next sections. Table S1 shows the vertical gradients, calculated in several different ways and for different time periods, in the differences between co-located measurements by the different systems. With the 3 in situ sensors, we can calculate vertical gradients identically to how we do so for the CO2.X variable by fitting harmonic functions, as described above. The data density for the whole air samplers is not sufficient to calculate gradients from the same harmonic fitting method as the insitu data. The in-situ comparisons are shown on the left side of Table S1. As expected, we find close agreement between CO2_QCLS and CO2.X (CO2_QCLS -CO2.X = -0.02 ppm for Ann, not shown in table), but disagreements in the same direction of up to -0.15 ppm for CO2.X compared to CO2_AO2 and CO2_OMS. Recognizing that other common sources of systematic bias might still not be included, we use this full range of disagreement as a conservative bidirectional 1 σ uncertainty estimate of ± 0.15 ppm in the annual and seasonal vertical gradients and use this as the width of the pink bar in the annual plots in Fig. 2 and S4. If we used CO2_OMS instead of CO2.X our best estimate of the annual mean vertical gradient would be shifted 0.15 ppm to the left in Fig. 1A. For the seasonal plots we use the maximum absolute differences of 0.07 ppm for JFM and 0.17 for JAS for the width of the pink bar.
Using both the in-situ sensors and the whole air samplers, we also calculate the differences in the vertical gradients by simple bin averaging of the sensor differences for each of the 9 transects. For this, we filter for tropospheric N 2 O and not continental boundary layer, bin average the CO 2 differences by 10 degrees latitude and 100 hPa bins, and then average these bins with cos(latitude) weighting into larger > 20 N and > 800 hPa, and > 20 N between 400 and 800 hPa bins, to be consistent with our other calculations. We show the vertical differences between these 2 bins on the right side of Table S1. The CO2.X variable is defined at 10 sec resolution but not 1 Hz, and because matching to the Medusa flask sampling kernels requires 1 Hz resolution, we show the difference between CO2_MED and CO2_QCLS rather than CO2.X. We also filter for comparisons with > 75 % overlap between QCLS and Medusa in terms of the kernel weighting. Because CO2.X is defined by CO2_QCLS anywhere that CO2_QCLS exists, these differences are all exactly zero and we do not show them in the table. We use the maximum absolute difference for each transect as conservative bidirectional 1 σ uncertainty estimates and use these to define the size of the vertical error bars for each transect in Fig. 1, with the exception of transect #1 (HIPPO 1 southbound) for which the CCG flask sampler had inadequate drying when we have used the maximum of the other 3 comparisons. The average differences across all 9 transects also support the values determined after harmonic fitting.  Table S1. Vertical gradients in the CO 2 (ppm) differences between independent sensors and samplers. Calculated as described in the text. The independent transects are shown in seasonal order to match Fig. 1. SB = southbound, NB = northbound.

Estimation of potential synoptic model bias
Although the models are all driven by reanalyzed meteorological products, we recognize that there are likely still errors in the positioning of synoptic systems when compared to the real world. To assess how much of the model disagreement we see might be a result of synoptic reanalysis error instead of biased vertical mixing, we have evaluated a single model output over a range of 10 days surrounding the actual HIPPO flights. This represents a worst-case scenario as it simulates a model in which the synoptic systems are completely in the wrong place. We interpolated the Jena s04_v4.1 simulation, on the HIPPO flight track locations but using different time lags, from 5 days before the actual flight and to 5 days after with one day increment. Then, we calculated vertical gradients identically to how we do so for all other models, by binning and fitting harmonic functions as described above. The result is shown in Fig. S4. The standard deviation of all simulations is 0.06 ppm for the annual mean, 0.14 for JFM, 0.15 for JAS. We do not include this worst-case potential model error on our figure, but note that for the annual mean gradient it is considerably smaller than the observational uncertainty, and similar in magnitude to the seasonal uncertainties. This small sensitivity to the positioning of synoptic systems is a result of averaging data over a large latitudinal range. Because synoptic systems typically stir latitudinally, for example in springtime moving high CO 2 air to the south while at the same time moving low CO 2 air to the north, their effect appears to largely cancel out when averaging from 20 to 90 N.

Characterization of spatial and temporal representativeness of HIPPO observations
The HIPPO observations provide good vertical coverage but have latitudinal gaps, are mostly over the Pacific Ocean approximately along 150 degrees W, and are for specific months (Fig. S1). This raises the question of the spatial and temporal representativeness of those observations. On Fig.  S5, we show the annual mean vertical gradients (2009)(2010)(2011) for the models at the time and location of the observations on the x axis of panels A and B. From the gridded model outputs, we also calculated the average for the full 150 W transect (Fig. S5A) and for the model zonal mean (Fig.  S5B). There is an offset, caused by larger vertical gradients for the zonal mean, which contains more land influence than over 150 W. Overall, despite a larger vertical gradient over land the modelled vertical gradients between the zonal mean, 150 W and sampled along the HIPPO flight track correlate well for the annual means. We show the 150 W vertical gradient on the x axis of panels C and D. The HIPPO vertical gradient is to a certain degree representative of the zonal mean vertical gradients. To evaluate the impact of the time representativeness (Fig. S5D), we look at the zonal mean and 150 W vertical gradients for each individual year (2009, 2010 and 2011). The models show little interannual variability and the relation between zonal means and Pacific data holds. Then, we investigate these spatial relationships, but at monthly and seasonal time scales. Fig. S6A shows all the modelled monthly values over the 3 years between 150 W and the zonal mean.
Overall, using all months, the modelled value for the NET 150 W transect correlate with the zonal means. Despite stronger variation than annual means (note the different scales from Fig. S5), the linear relationships in vertical gradients still holds. We also compare the 3-year average retrieved seasonal gradients versus the HIPPO-sampled seasonal gradients. We look at 150 W (Fig. S6B), the zonal means (Fig. S6C) and the zonal means using only land values (Fig. S6D). Significant relationships exist for JFM, AMJ, and OND for 150 W; for JFM and AMJ for zonal means; and for JAS for zonal means over land. To assess if we could be missing a flux dependency on transport by only sampling over the ocean, we ask if the vertical gradients were measured over land, would they be correlated with retrieved fluxes? For both seasons and annual means, we compare the NET land and the T+SET fluxes with the vertical gradients, for HIPPO measurements only, over the zonal means, and the zonal means over land only (Fig. S7).
Despite vertical gradients up to 3 times larger, there are still no relationships for all space and time for the JFM season and annual means. However, there is a modest linear correlation between the T+SET land fluxes and the vertical gradients for the HIPPO flight tracks and for the zonal mean averaged over land. This annual and seasonal characteristic is consistent with the findings above, illustrated on the Fig. S5 and S6. Finally, we reproduce Fig. 3 from Stephens et al. (2007) substituting T+SET for just T, dividing fluxes at 20 N and 20 S instead of using TransCom regions, and using the 2 newer models that span the 1992-1996 period, CAMS (v16r1) and Jena (S85_v4.1). The CO 2 mole fractions are interpolated at all sites used by Stephens et al. (2007), at 1 km and 4 km at the highest temporal resolution, 3-hourly for CAMS and 6-hourly for Jena. The resulting T+SET fluxes and vertical gradients are presented in Fig. 2 for the annual means and in Fig. S8 for the annual means plus the 2 seasons (JAS and JFM) for 2009-2011. Having only 2 newer models prevents detailed statistical analysis. It is however worth noting that the JFM vertical gradients are well represented in the newer models and that the annual mean gradient bias appears to arise in summer.  Transport model name: TM5 model  Meteorological fields: ERA-Interim (ECMWF, Reanalysis-Interim).

Description of inversions
Time period ( Observations: 66 sites are assimilated as monthly means.