Interactive comment on “ North American CO 2 exchange : intercomparison of modeled estimates with results from a fine-scale atmospheric inversion ” by S

Abstract. Atmospheric inversion models have the potential to quantify CO2 fluxes at regional, sub-continental scales by taking advantage of near-surface CO2 mixing ratio observations collected in areas with high flux variability. This study presents results from a series of regional geostatistical inverse models (GIM) over North America for 2004, and uses them as the basis for an inter-comparison to other inversion studies and estimates from biospheric models collected through the North American Carbon Program Regional and Continental Interim Synthesis. Because the GIM approach does not require explicit prior flux estimates and resolves fluxes at fine spatiotemporal scales (i.e. 1° × 1°, 3-hourly in this study), it avoids temporal and spatial aggregation errors and allows for the recovery of realistic spatial patterns from the atmospheric data relative to previous inversion studies. Results from a GIM inversion using only available atmospheric observations and a fine-scale fossil fuel inventory were used to confirm the quality of the inventory and inversion setup. An inversion additionally including auxiliary variables from the North American Regional Reanalysis found inferred relationships with flux consistent with physiological understanding of the biospheric carbon cycle. Comparison of GIM results with bottom-up biospheric models showed stronger agreement during the growing relative to the dormant season, in part because most of the biospheric models do not fully represent agricultural land-management practices and the fate of both residual biomass and harvested products. Comparison to earlier inversion studies pointed to aggregation errors as a likely source of bias in previous sub-continental scale flux estimates, particularly for inversions that adjust fluxes at the coarsest scales and use atmospheric observations averaged over long periods. Finally, whereas the continental CO2 boundary conditions used in the GIM inversions have a minor impact on spatial patterns, they have a substantial impact on the continental carbon budget, with a difference of 0.8 PgC yr−1 in the total continental flux resulting from the use of two plausible sets of boundary CO2 mixing ratios. Overall, this inter-comparison study helps to assess the state of the science in estimating regional-scale CO2 fluxes, while pointing towards the path forward for improvements in future top-down and bottom-up modeling efforts.


Introduction
Carbon cycle scientists are increasingly called upon to provide information in support of efforts to monitor anthropogenic CO 2 emissions, and to provide predictions of future changes to the carbon cycle within the context of a changing climate and land-use choices (CCSP, 2007).Atmospheric inverse models can contribute towards these goals by taking advantage of the information contained in atmospheric CO 2 mixing ratio measurements regarding upwind surface CO 2 exchange.Using these measurements, together with an atmospheric transport model and within a robust statistical framework (e.g.Enting, 2002), inverse models are used to infer the spatiotemporal distribution and magnitude of surface CO 2 fluxes.In addition, inverse-modeling derived CO 2 flux estimates are potentially useful for evaluating the processbased formulations of terrestrial ecosystem models.In fact, atmospheric measurements of CO 2 provide one of the key means of evaluating mechanistic models (e.g.Randerson et al., 2009;Cadule et al., 2010), given the lack of direct flux observations at regional scales.
Inversions that can take advantage of spatial and temporal atmospheric CO 2 gradients measured in areas with high flux variability provide the potential to resolve sub-continental scale fluxes, thereby informing carbon management efforts and evaluations of mechanistic models of the carbon cycle.An expanding in situ continuous measurement network across the North American and European continents (e.g.NOAA-ESRL, 2011;CGGMN, 2011;CEAD, 2011) is making this possible, but optimally extracting the flux signal from these data is complicated by the combined influence on atmospheric CO 2 mixing ratios of the diurnal cycle of the terrestrial biosphere, heterogeneous land cover, point source fossil fuel emissions, and complex atmospheric transport (Bakwin et al., 1998).Therefore, simultaneous improvements in inversion setups (e.g.Law et al., 2002;Schuh et al., 2009;Gourdji et al., 2010) and in the quality of atmospheric transport models (e.g.Geels et al., 2007) have been necessary.
By limiting the domain size in regional inversions, fluxes can be estimated at relatively fine spatial scales (e.g. 1 • × 1 • ), thereby reducing aggregation errors (e.g.Kaminski et al., 2001;Engelen et al., 2002) associated with estimating coarse-scale fluxes using highly variable CO 2 measurement data, while simultaneously keeping the computational cost of inversions manageable.In addition, with a limited domain, it is possible to take advantage of high-resolution meteorological information and Lagrangian transport models that can better resolve atmospheric dynamics in the near-field of measurement locations (e.g.Lin et al., 2003;Gerbig et al., 2008).However, it has proven difficult to reconcile observed differences in estimates across regional inverse modeling studies due to a number of potential factors: atmospheric transport, flux spatial and temporal resolution, boundary conditions, errors in the bottom-up models used as priors, treatment of fossil fuels, spatiotemporal flux covariance assumptions, and the processing and filtering of observations, among other possible causes.
Similarly, large discrepancies have also been seen in intercomparisons of mechanistic models of biospheric CO 2 flux (Huntzinger et al., 2012a;Hayes et al., 2012).Given that atmospheric CO 2 mixing ratio observations are a key data constraint on biospheric CO 2 fluxes, a reconciliation of topdown flux estimates across inversion studies could be especially useful for evaluating biospheric model estimates and their process-based assumptions.
As a step towards reconciling CO 2 flux estimates from regional inversions and biospheric models, this study presents a regional grid-scale geostatistical inversion (Michalak et al., 2004;Mueller et al., 2008;Gourdji et al., 2008Gourdji et al., , 2010) ) for North America in 2004, using the sampling network of 9 towers collecting continuous CO 2 measurements at that time, as well as available surface flask and aircraft data.The geostatistical inverse modeling (GIM) approach implemented here uses the optimized setup from Gourdji et al. (2010), which resolves fluxes at finer spatial and temporal scales than other published inversion studies for the same domain (e.g.Peters et al., 2007;Deng et al., 2007;Schuh et al., 2010;Butler et al., 2010), and relies strongly on the atmospheric measurements and spatiotemporal flux covariance assumptions to help constrain the problem.Also, by eliminating the requirement for explicit prior flux estimates and optimizing covariance parameters directly with the atmospheric data, the inversions presented here reduce potential biases associated with these setup choices.Four sets of geostatistical results are presented, with and without process-based auxiliary datasets that can help to explain biospheric fluxes, and using two plausible sets of continental boundary conditions.
Geostatistical inversion results are compared at various spatial and temporal scales to estimates from other inversion studies over North America to help illuminate potential causes of their differences and associated strengths and weaknesses of the various approaches.Geostatistical inversion estimates are also used to interpret the spread seen across a collection of biospheric models participating in the North American Carbon Program (NACP) Regional and Continental Interim Synthesis study (RCIS; Huntzinger et al., 2012b).

Data and methods
The presented inversions use the setup described as part of a synthetic data study for June 2004 (i.e.Gourdji et al., 2010) and further evaluated using synthetic data for a full year.A brief overview of the setup and methods is provided below, with methodological details available in Gourdji et al. (2010).

Flux domain and resolution
Fluxes are estimated at a 1 • × 1 • , 3-hourly resolution for the North American continent in 2004.Positive values indicate a flux from the biosphere to the atmosphere, and conversely, negative values refer to net uptake.The estimation domain spans from 10 • N to 70 • N and 50 • W to 170 • W, yielding 2635 land grid-cells (Fig. 1).Altogether, 8 million fluxes are estimated for the year (2635 regions × 366 days × 8 flux periods per day).Resolving fluxes for a 4-day-averaged diurnal cycle temporal resolution, as described in Gourdji et al. (2010), was also explored to reduce computational expense, but the penalty in terms of the quality of the flux estimates was found to be unacceptable.
Flux estimates are not well-constrained at this fine spatiotemporal estimation resolution by the limited atmospheric network; however, biases due to estimating fluxes directly at large scales, termed aggregation errors (Kaminski et al., 2001;Engelen et al., 2002), are minimized by estimating fluxes first at fine scales and then aggregating to betterconstrained resolutions a posteriori.Spatial aggregation errors have been shown to be particularly problematic when making use of continental, continuous CO 2 measurements in inversions (Gerbig et al., 2003b;Schuh et al., 2009).Temporal aggregation errors are also a concern when the shape of the diurnal cycle is fixed from biospheric models (Gourdji et al., 2010;Huntzinger et al., 2012b) or assumed flat, rather than estimated directly as in the current study.

Geostatistical inversions
GIM (e.g.Hoeksema and Kitanidis, 1984;Zimmerman et al., 1998) has been used to identify atmospheric trace gas sources and sinks (e.g.Michalak et al., 2004;Mueller et al., 2008;Gourdji et al., 2008Gourdji et al., , 2010;;Miller et al., 2012).Although GIM is Bayesian, it differs from synthesis Bayesian inversions (e.g.Baker et al., 2006;Peters et al., 2007;Butler et al., 2010) in a few key ways.GIM does not rely on a set of explicit prior flux estimates derived from biospheric models, fossil fuel inventories, fire emissions estimates, and/or ocean flux estimates.Traditional Bayesian inversion studies (e.g.Peters et al., 2007;Butler et al., 2010) typically use such explicit priors both to define a first estimate of flux magnitudes and to fix the fine-scale variability in the inversion, while the atmospheric data are used to adjust fluxes only at larger scales.Instead, GIM estimates fluxes directly at fine scales, and relies on a priori information on the spatiotemporal covariance between estimated fluxes to help constrain the problem.The parameters describing this covariance structure are optimized directly using the atmospheric measurements.Some recent synthesis Bayesian inversion studies also estimate fluxes at relatively fine spatial scales and rely on spatial covariance assumptions (e.g.Rödenbeck et al., 2003;Carouge et al., 2010a, b;Chevallier et al., 2010;Schuh et al., 2010), with the covariance parameters based on analyses of variability in biospheric models, or sensitivity testing to assess their impact on flux estimates.The GIM estimates presented here also estimate, rather than prescribe the diurnal cycle based on results from Gourdji et al. (2010) and Huntzinger et al. (2012b).We are not aware of any other regional inversions that have taken this step, which is aimed at minimizing temporal aggregation errors that can result from fluxes being adjusted at coarser (e.g.weekly to monthly) scales (e.g.Peters et al., 2007;Butler et al., 2010;Schuh et al., 2010).
The objective function L s,β for GIM is: where z represents the atmospheric CO The structure of Q and R remain the same as in Gourdji et al. (2010), with a few minor modifications.Separate modeldata mismatch values were optimized in R for each continuous measurement location, as well as for flask and aircraft data (with these latter two representing averages across sites).Additionally, covariance parameters are allowed to vary by month.In particular, spatial flux covariance has been found to have a strong seasonal cycle (Huntzinger et al., 2011) that should be appropriately accounted for in order to yield realistic estimates from the inversion.Monthly model-data mismatch values also help to account for seasonal variations in the quality of the transport model and inversion setup.All covariance parameters are estimated simultaneously, per month, using Restricted Maximum Likelihood (RML; e.g.Kitanidis, 1995, Michalak et al., 2004) with the atmospheric measurements, as described further in Gourdji et al. (2010).
The solution method for estimating fluxes ( s), drift coefficients ( β) and their a posteriori covariances (V s and V β ) is described in Gourdji et al. (2010).

Atmospheric CO 2 mixing ratio measurements
This study uses continuous, high-precision, well-calibrated CO 2 mixing ratio measurements from 9 observational locations unevenly spaced across the North American continent in 2004 (Fig. 1).These include two tall towers with a height of 457 m (Moody, Texas) and 396 m (Park Falls, Wisconsin), two coastal towers less than 25 m in height (Sable Island, Nova Scotia and Barrow, Alaska), and five other inland, continental towers ranging in height from 30 to 107 m (Norman, Oklahoma; Harvard Forest, Massachusetts; Argyle, Maine; Fraserdale, Ontario; Candle Lake, Saskatchewan).In addition, all available flask and aircraft measurements for this year are included, with the exception of flask samples at coincident tower locations and some coastal sites where the atmospheric transport model was deemed unreliable.Supplement A provides additional details on CO 2 data processing and filtering.
Although the North American measurement network has now expanded to more than 40 sites collecting continuous CO 2 mixing ratio measurements, many new sites are in complex terrain, urban areas, or are very short, and the optimal use of these data in inversions remains a topic of active research (Mueller, 2011;Manning, 2011).This study therefore provides a baseline for improving inversions taking advantage of a more data-rich environment.

Continental boundary conditions
Regional inversions necessitate the use of boundary conditions that represent the CO 2 concentrations of air flowing into the domain of interest (i.e. the North American land mass here).The impact of these boundary CO 2 mixing ratios on the observations used in the inversion must be pre-subtracted before inferring CO 2 fluxes.Two plausible sets of CO 2 boundary conditions are used in this study: one optimized as part of the CarbonTracker (termed "CT" in this work) global CO 2 data assimilation system (Peters et al., 2007), and the other derived more empirically (termed "EMP" in this work) from marine boundary layer and aircraft observations taken from the GLOBALVIEW-CO 2 (2010) database.Supplement A provides additional details on these datasets.Only results using the EMP boundary conditions are presented at the monthly timescale, but results with both sets of boundary conditions are discussed at the annual timescale, where they were seen to have a larger impact on the conclusions.

Atmospheric transport model
Surface influence functions ("footprints", or adjoint sensitivities) express the sensitivity of individual CO 2 measurements at specific points in space and time to surface fluxes in the upwind source regions.The Stochastic Time-Inverted Lagrangian Transport (STILT) model (Lin et al., 2003), driven by meteorological fields from the Weather Research and Forecasting (WRF) model (Skamarock and Klemp, 2008), customized for STILT (Nehrkorn et al., 2010), was used to derive these footprints.
The WRF-STILT framework is well-suited for this application, because: (1) WRF meteorology is available at higher resolution than that used in most global models, and therefore has the potential to be more realistic (Mass et al., 2002); (2) the Lagrangian approach minimizes numerical diffusion present in Eulerian models (Odman, 1997) and is thus better able to represent plumes in the near-field of the measurement locations (Lin et al., 2003;Wen et al., 2011); (3) the WRF-STILT coupling has been specifically designed to achieve good mass conservation characteristics by using time-averaged winds from WRF within STILT (Nehrkorn et al., 2010); and (4) the Lagrangian approach offers the most efficient way to compute the grid-scale footprints, by running transport backwards in time (Lin et al., 2003).The computational aspects of the footprint calculations are described in more detail in Supplement A.
The footprints can also be used to assess which portions of the continent are typically constrained by the measurements included in the inversion.Given the limited network in 2004, not all portions of North America are equally wellconstrained, as can be seen in the annual average footprint shown in Fig. 2a.Not surprisingly, the best-constrained areas are upwind of the measurement locations, in the Central and Northeastern continental United States (US) and a large portion of central Canada.Figure 2b shows three ecoregions across North America with similar land-cover and/or climatic characteristics (modified from Olson et al., 2001) that are relatively well-constrained by the network (although the Eastern Temperate Forests lack sensitivity in the Southeast US) Inversion results are aggregated a posteriori to these three ecoregions for comparison across all presented models.

Auxiliary variables and variable selection
Geostatistical inversions can estimate fluxes with varying levels of complexity in the flux covariate matrix (X).In ad-dition to potentially improving flux estimates, introducing auxiliary environmental variables as covariates in X makes it possible to identify significant flux drivers.
As in multi-linear regression, adding all possible covariates can help to improve the fit of the model to the data, although at the risk of introducing spurious relationships that could potentially bias flux estimates in under-constrained regions and time periods.Therefore, for this study, the Bayes Information Criterion (BIC) (Schwarz, 1978) is used to choose covariates that optimally explain the biospheric flux signal in the atmospheric data.The BIC was combined with the Branch-and-Bound algorithm (Yadav et al., 2012) in order to reduce computational expense.Supplement A presents the details of the implementation of these techniques.
Two main sets of inversion results are presented in this paper.For the first inversion, we include only a fossil fuel inventory dataset in the flux covariate matrix (X), with no additional information regarding biospheric processes.The fossil fuel inventory is included to help account for the spatial patterns of the fossil fuel emissions, which have a very different structure from the biospheric fluxes.This setup is referred to as the "Simple" inversion, given that the atmospheric observations, fossil fuel inventory and covariance assumptions provide the only constraint on biospheric fluxes in this setup.This inversion is conceptually similar to using a prior of zero in a synthesis Bayesian inversion setup, and provides an independent comparison to process-based, biospheric model output.For the second inversion, we additionally incorporate auxiliary variables from the North American Regional Reanalysis (NARR; Mesinger et al., 2006) to help explain the biospheric signal, which are selected using the combined BIC and Branch-and-Bound algorithm.This case is termed the "NARR" inversion.
All included flux covariates are defined at the resolution at which fluxes are estimated, i.e. 1 • × 1 • , 3-hourly, with each variable defined in a single column for all flux locations and time periods.Therefore, the inferred regression coefficients ( β) represent average relationships over the entire continent and year, with these averages reflecting the portions of the continent within the measurement footprints.

Fossil fuel inventory
The fossil fuel inventory used in this study is a merged data product providing full coverage for the continent.In the continental United States, we take advantage of diurnally and seasonally varying estimates from version 2.0 of the Vulcan database (Gurney et al., 2009).These estimates for 2002 are scaled up to 2004 total emissions for the region, but without any changes in the underlying spatial and temporal patterns.In Central America, Mexico and Canada, emission estimates are taken from a monthly-varying dataset specifically for 2004, which merges information from British Petroleum fuel statistics, remotely-sensed night lights, and the existing  (Oda and Maksyutov, 2011).
In the NARR inversion, the impact of fossil fuels is removed from the atmospheric observations a priori, by multiplying the inventory dataset by the footprints and subtracting the resulting signal from the atmospheric measurements.By pre-subtracting the fossil fuel influence from the measurements, we reduce potential covariance in the inferred regression coefficients between the fossil fuel inventory and biospheric datasets due to covariance in the underlying processes, e.g.re-growing forests and high emissions in the Eastern continental United States, or reduced populations and industrial activity in arid and snow-covered areas.Such covariance would confound flux interpretation by making it more difficult to separate the biospheric and fossil fuel signals a posteriori.In both inversions, errors in the fossil fuel inventories will become aliased onto the inferred biospheric fluxes, although these errors are thought to be small relative to the uncertainties in terrestrial biosphere models (Marland et al., 2009).

NARR variables
The NARR data products are a state-of-the-science meteorological reanalysis for North America, and have significantly improved the representation of the hydrological cycle relative to previous datasets (Bukovsky and Karoly, 2007).Eleven datasets with diurnal variability were considered for inclusion in the inversion, as well as two derived precipitation variables (average precipitation over the previous 16 and 30 day intervals, Table 2).This superset of NARR variables primarily relates to water fluxes and availability, although shortwave radiation, evapotranspiration and canopy conductance have a direct physiological relationship with photosynthetic CO 2 fluxes.Vegetative indices from remote-sensing datasets, such as Leaf Area Index or Fraction of Photosynthetically Active Radiation from the MODIS instruments (e.g.Yang et al., 2006), could also provide useful information regarding the seasonal cycle and spatial distribution of CO 2 flux (e.g.Gourdji et al., 2008).However, given that such datasets have a diurnally-varying relationship to CO 2 flux, this would have necessitated the use of diurnally-varying drift coefficients ( β), adding substantial complexity to the trend.Sensitivity tests showed that this setup did not help to improve flux estimates, and thus, these datasets were not included.Also, the evapotranspiration and canopy conductance variables from NARR are calculated with the Noah Land Surface Model (Ek et al., 2003), and therefore implicitly include a measure of vegetative biomass.

Comparison of inferred fluxes to estimates from other models
Biospheric a posteriori flux estimates from the GIM inversions are compared with those from previous inversion stud-ies (CarbonTracker 2009, i.e. Peters et al., 2007;Schuh et al., 2010;Butler et al., 2010) for the same domain, and also with a suite of bottom-up flux estimates from 16 terrestrial biosphere models that participated in the NACP RCIS (Huntzinger et al., 2012a).Table 1 compares the features of the GIM inversions conducted here and the synthesis Bayesian inversions included in the inter-comparison.In addition to the details presented in Table 1, each inversion has its own particularities in terms of data processing and selection, identification of covariance parameters, and numerical approaches for implementing the inversion scheme.Despite these differences, the available atmospheric measurements for this year are the same across studies, and the wellconstrained areas of the continent should therefore also be similar.The biospheric models from the NACP RCIS included in the inter-comparison are also described in more detail in Huntzinger et al. (2012a).
Model estimates are compared at monthly and annual timescales, and at grid and aggregated ecoregion and continental spatial scales.The analysis of grid-scale patterns helps to visually assess model output, despite high uncertainties at this fine scale, while the comparison at aggregated spatial scales helps to evaluate the quality of the biospheric models within the inferred 95 % uncertainty bounds from the GIM inversions.Net annual flux estimates from all models are less reliable, given that they represent a small residual on a strongly-varying seasonal cycle, although they are particularly important for understanding the carbon budget of North America, and the locations of net sources and sinks within the continent.Overall, this model inter-comparison gives insight into the CO 2 fluxes at various spatiotemporal scales and the strengths and weaknesses of various model formulations.However, an important caveat is that models may still agree with one another for the wrong reasons, e.g.due to systematic errors in the transport models across inversion studies.

Optimized covariance parameters
The monthly flux covariance parameters in Q provide insights into the underlying variability of the flux field, and how this variability changes throughout the year, while the inferred monthly model-data mismatch parameters in R quantify the ability of the inversion to take advantage of information contained in the CO 2 mixing ratios.
The spatial flux covariance parameters in Q have a strong seasonal cycle (Fig. 3), consistent with that observed in biospheric model estimates (Huntzinger et al., 2011), with the highest variance in July and August and the lowest in the dormant season from November through April.The spatial correlation lengths are shortest during the height of the growing season (850 and ∼600 km respectively for the Simple and NARR inversions in July), with average values across the year of ∼1700 and ∼1400 km respectively.The temporal correlation range varied from ∼5 days in October, a time of sharp seasonal transitions across the continent, to ∼50 days in January, with an average value of 22 days throughout the year for both inversions.In the NARR inversion, the variances and correlation lengths of the portion of the flux not explained by the trend Xβ are reduced relative to the Simple inversion, indicating that the NARR variables explain a portion of the variability in the flux distribution.
The yearly weighted average of the monthly model-data mismatch variances for each measurement site (or rather, the square root of this weighted average, σ R , with the weights defined by the number of data points per month) is shown for the Simple inversion in Fig. 3.The tower with the highest average model-data mismatch is Harvard Forest, which is located in a highly-productive forest (Urbanski et al., 2007) about 100 km west of Boston, and even closer to Worcester and Springfield, Massachusetts, while the 1 • × 1 • gridcell containing this site includes several other small towns and developed areas.The difficulty in matching the data at this tower is likely due to spatial aggregation and representation errors associated with fossil fuel emissions and heterogeneous land cover in the surrounding region.The two towers with the lowest model-data mismatch are BRW and CDL, the two northernmost sites in the domain, where the 1 • longitudinal grid-cell size is smaller (see Fig. 1).This could point to potential improvements in inversion performance that would be obtained by resolving fluxes at finer spatial scales, perhaps the resolution of the driving winds in the transport model (in this case 40-km or finer for all of North America).
Model-data mismatch can also have significant seasonal variations.For example, the optimized model-data mismatch standard deviation at ARM is 4.7 ppm in April, 0.8 ppm in July, and 0.0004 ppm in September.The high model-data mismatch in April for ARM implies that the full strength of local uptake evident in the measurement data in this month (see Fig. A1 in the Supplement), most likely due to the winter wheat crop planted upwind of the tower (McPherson et al., 2004), is not fully represented by the inversion.

Fossil fuel emissions inventory
If the fossil fuel emissions inventory and atmospheric transport model used in this study were perfect, the associated drift coefficient should be approximately one, whereas other values could imply problems with the inversion setup, systematic transport model errors, a lack of atmospheric constraint, covariance between the fossil fuel and biospheric flux signals, and/or errors in the spatiotemporal patterns and magnitudes of emissions in the inventory dataset.The drift coefficients for the Simple inversions were not significantly different from one ( β = 0.91 and 0.95 for the EMP and CT boundary conditions, respectively, and σ β = 0.06 for both cases), providing support both for the quality of the inventory and the inversion setup.In addition, this result makes it relatively easy to separate the biospheric and fossil fuel contributions to the total flux a posteriori.

Selected NARR variables
The variables selected by the BIC approach (Sect.2.4 and Supplement A), and their inferred relationships to CO 2 flux (Table 2) are consistent with process-based understanding of the biospheric carbon cycle.For example, evapotranspiration explains the largest portion of the uptake signal (as indicated by a large negative β), which is consistent with the physiological relationship between plant photosynthetic and transpiration fluxes (Bonan, 2008), and similar relationships between evapotranspiration and CO 2 flux have been found in other regression studies using eddy-covariance measurements (e.g.Mueller et al., 2010;Yadav et al., 2010).Combining evapotranspiration estimates with basin-wide estimates of water-use efficiency has also proven to be a robust method for estimating gross primary production (GPP) at watershed scales (Beer et al., 2007).Canopy conductance, which has a similar mechanistic correlation with photosynthesis, was not selected as a significant variable, potentially due to the difficulty in up-scaling this value to landscape scales (Anderson et al., 2003).
The positive β value associated with specific humidity is consistent with known drivers of heterotrophic respiration (Lloyd and Taylor, 1994;Ise and Moorcroft, 2006), given that this variable, or the mass of water vapor per unit mass of air, scales with both air temperature and surface moisture.The NARR specific humidity and air temperature variables have a correlation coefficient of 0.83 for 2004, and, consequently the β uncertainties (derived from V β ) on these two variables have a strong a posteriori anti-correlation (see Table B1 in the Supplement).This implies that the air temperature variable is primarily helping to correct the signal associated with specific humidity.The positive β, or source associated with precipitation rate at a 3-hourly timescale is consistent with flux tower studies showing pulses of respiration following rain events (Baldocchi, 2008).
While the selected NARR variables and inferred β's are consistent with process-based understanding of biospheric CO 2 flux, not all processes can be easily included in a statistical regression model like the one used here.For example, disturbance (e.g.storms, flooding, insect infestation) may not be fully reflected in the auxiliary variables available for analysis.In addition, this study only chose to examine diurnallyvarying variables from the NARR, and excluded other possible datasets that could help to explain fluxes at daily or weekly timescales (e.g. a fire emission inventory).Third, while evapotranspiration and canopy conductance implicitly incorporate a measure of live biomass, there are no variables within the NARR superset that can directly represent dead substrate availability for heterotrophic respiration, e.g.crop residues after agricultural harvesting or downed woody debris following a storm.Finally, the NARR variables themselves have known biases and limitations (e.g.Bukovsky and Karoly, 2007;West et al., 2007;Markovic et al., 2009).By design, however, any portion of the flux variability that is visible in the atmospheric observations, but that cannot be represented using the available auxiliary environmental datasets, can still be included in the best flux estimates in GIM through the spatiotemporally-correlated stochastic component.(See Eq. ( 5) in Gourdji et al. (2010), and Fig. B1 in the Supplement for an example of the contribution to net grid-scale fluxes in April by each component of the best estimate).

Seasonal cycle of biospheric fluxes
This section compares monthly biospheric fluxes from the GIM Simple and NARR inversions, aggregated a posteriori from the fluxes estimated at the 3-hourly timescale, to results from other inversion studies and from biospheric models participating in the NACP RCIS (Huntzinger et al., 2012a).The median of the biospheric models is used to represent the current consensus of the terrestrial ecosystem modeling community, although it may still be subject to systematic errors across models.Although the mean has also been shown to have high skill relative to individual models in previous studies (Schwalm et al., 2010), the median is used here to reduce the impact of outliers.Both the mean and median showed better agreement with GIM estimates than any individual biospheric model included in the inter-comparison.

Grid-scale estimates
Grid-scale spatial patterns are similar between the Simple and NARR inversion monthly fluxes (Fig. 4), implying that these patterns are primarily informed by the atmospheric observations and spatiotemporal flux correlations, and can therefore be used to evaluate biospheric model estimates.
The NARR auxiliary variables add some realistic heterogeneity to the grid-scale flux estimates relative to the Simple inversion, as seen by a slightly stronger correspondence with the biospheric model median, particularly in underconstrained regions like the Pacific coast.
The grid-scale GIM estimates have a stronger seasonality relative to the biospheric model median, with larger sources in the winter and stronger sinks in the growing season, but the locations with net uptake and release are generally consistent, especially during months with net uptake.The stronger seasonality of the GIM fluxes may be due to errors in the biospheric model median caused by ignoring models that show the strongest sources or sinks in each month.In addition, GIM inversions that account for both the temporal and spatial covariance of fluxes perform better in terms of capturing grid-scale spatial patterns, but in some cases lead to stronger flux variability relative to inversions that account only for spatial covariance.
The GIM estimates exhibit spatial patterns that are more consistent with the biospheric models during the growing, relative to the dormant season.The spatial correlation coefficient between the NARR inversion estimates and the biospheric model median is 0.57 and 0.48 respectively in April and July, compared to only 0.27 and −0.14 in January and October, respectively.Similarly, results for the dormant season show a larger number of biospheric models being significantly different from the GIM results in more areas of the continent (Fig. 4, last column), where the significance is defined as a biospheric model being outside the 95 % confidence intervals of the GIM NARR estimates.
The greater similarity during the growing season is likely due to, first, both the inversions and biospheric models having greater skill during periods with net uptake.The increased skill during the growing season has previously been documented for biospheric models in model inter-comparison studies using eddy-covariance data (e.g.Schwalm et al., 2010).For inversions, this greater skill is due to the stronger atmospheric signal during this season, thereby making it easier for inversions to constrain fluxes.
Second, errors in the magnitude or spatial patterns of the fossil fuel inventories, or an inability to properly interpret fossil fuel plumes within the inversion, impact the GIM biospheric fluxes.These errors would be most evident in the dormant season when fossil fuel emissions represent a larger proportion of the total CO 2 flux.For example, some of the strong sources in the GIM results in the south-central portion of the continent in January may be due to representation errors associated with emission plumes at the Moody, Texas tower.
Third, most of the biospheric models in the NACP RCIS do not specifically account for managed agricultural processes (i.e.fertilizer, irrigation, crop harvest and lateral transport, etc.).While this would reduce model skill throughout the year in agricultural areas of the continent (Lokupitiya et al., 2009;Corbin et al., 2010;Huntzinger et al., 2012a)  areas with a large consumption of agricultural products (e.g. by human or livestock populations), it would be particularly evident in the presented October fluxes.The strong sources seen in the inversions in this month reflect a strong build-up of CO 2 relative to background air at three towers in heavily agricultural areas (Candle Lake, Saskatchewan, Park Falls, Wisconsin and Norman, Oklahoma; see Fig. A1 in the Supplement), and are most likely associated with the decay of residual biomass after crop harvesting (Johnson et al., 2006).The stronger GIM sinks northwest of the Norman, Oklahoma tower in April and in the agricultural Midwest during July, are also supported by process-based studies showing stronger productivity in well-irrigated and fertilized agricultural croplands relative to unmanaged grasslands, or the native vegetation of these areas as parameterized in many of the biospheric models (Lokupitiya et al., 2009;Corbin et al., 2010;Smith et al., 2010).

Ecoregion-scale estimates
Test inversions using synthetic data showed that monthly inversion results at aggregated spatial scales should be reliable in well-constrained areas of the continent (barring substan-tial transport errors).Therefore, spatially-aggregated GIM results are compared to other inversions and the biospheric models (Fig. 5) for the three ecoregions presented in Fig. 2b, as well as for the full continent.
At the ecoregion scale, similarly to the grid-scale, inclusion of NARR auxiliary variables is seen to have no significant impact on GIM flux estimates, consistent with results from Gourdji et al. (2008).Yet, the across-model spread for both inversion studies and biospheric models at these aggregated scales is quite large, and generally wider than the GIM confidence intervals throughout the year, particularly in the Eastern Temperate Forest and the Temperate, Grass, Savannah, Shrub ecoregions.More generally, the spread across inversion studies is smaller than that seen in the biospheric models, thereby supporting the use of the atmospheric data constraint to evaluate biospheric models.
Compared to the other inversion studies, the seasonal cycle of the GIM estimates is within the spread of the other estimates.The spread across inversions is narrower in the better-constrained ecoregions (i.e.Temperate Grass/Savannah/Shrub, and Boreal Forest), pointing to the value of an expanded measurement network for reducing the impact of inversion setup choices and assumptions on flux estimates.The spread between inversions is also narrowest for the two largest examined regions (Boreal Forest and the full continent), indicating a better constraint on flux estimates at progressively larger spatial scales, at least with the limited 2004 measurement network.
In terms of individual inversions, GIM flux estimates are relatively similar to those from CarbonTracker, particularly in the Eastern Temperate and Boreal Forests, despite significantly different estimation resolutions and transport models (Table 1).The strong growing season uptake in the Butler et al. (2010) study relative to the other inversions, particularly in the Boreal and Eastern Temperate Forests, may be due to aggregation errors associated with using measurements from highly productive areas, and extrapolating this signal too strongly across that study's coarser estimation regions (see Table 1).The monthly concentration-averaging intervals and flux estimation timescale used in the Butler et al. (2010) study also support this conclusion.Aggregation errors could also be affecting the CarbonTracker results in the Temperate Grass/Savannah/Shrub ecoregion, where this inversion shows stronger peak uptake in July and August relative to the other studies.The Schuh et al. (2010) results show a weaker seasonal cycle and an earlier peak uptake relative to the other inversions in the Eastern Temperate Forests, and Temperate Grass/Savannah/Shrub.Among other potential factors, this result could be due to a strong adherence to the seasonal cycle of the prior (SiB3), which also has relatively early peak uptake in the continental US relative to other biospheric models.This would imply tighter prior flux uncertainties in Schuh et al. (2010) relative to the Butler et al. (2010) study, where the use of CASA vs. SiB3 as a prior changes flux estimates only slightly at the ecoregion scale.
The spread among the biospheric models is generally larger than that seen across inversion studies, especially in the Temperate Grass/Savannah/Shrub ecoregion, which has a relatively high proportion of agricultural land-cover.In this ecoregion, the GIM inversions show a stronger, later peak uptake relative to the majority of the biospheric models, as well as stronger sources in the dormant season, particularly in October.The lack of agreement in this region may again reflect the lack of skill in modeling agricultural processes in the biospheric models (Lokupitiya et al., 2009;Corbin et al., 2010).In the Eastern Temperate Forests, the GIM results fall within the wide biospheric model spread.EC-MOD and MOD17+, the two biospheric models with the strongest growing season uptake in the Eastern Temperate Forests, appear unrealistic when compared to GIM estimates and uncertainties.In the Boreal Forests and at the continental scale, the spread in the biospheric models is narrower and also more consistent with the GIM results.In the Boreal Forests, all models agree reasonably well on the timing of the seasonal cycle, and the biospheric model median falls within the GIM 95 % confidence intervals for all months.At the continental scale, the spread in both the biospheric model estimates and the inversion studies is the narrowest, pointing to errors that cancel at large spatial scales in both bottom-up and top-down models for the seasonal cycle.

Net annual sources and sinks
Net annual fluxes represent the small residual of a strong seasonal cycle.Therefore, relatively small monthly uncertainties represent a larger proportion of the annual totals.Also, any biases in the inversions or biospheric models may accumulate at longer timescales.In addition, many biospheric models are specifically not designed to capture net annual sources and sinks at all.For these reasons, the annual flux estimates presented here are considered to be less robust than the monthly estimates.

Grid-scale estimates
Although the annual grid-scale GIM flux estimates (Fig. 6) show relatively few areas with statistically significant sources or sinks, the spatial patterns are quite consistent between the two sets of boundary conditions and also with inventorybased estimates of CO 2 flux.The GIM inversions show net annual uptake in the Eastern US, the midwestern agricultural areas in the US and Canada, and parts of the continental western US, which is consistent with bottom-up inventory estimates from the State of the Carbon Cycle Report (SOCCR;CCSP, 2007) and a recent inventory study by Hayes et al. (2012).Results for these regions are also consistent with Crevoisier et al. (2010), who used an independent carbon budgeting method for North America based on vertical profiles of CO 2 mixing ratios collected from aircraft observations.
The GIM estimates show net annual sources in the southwest US, along the Canadian Pacific Coast, and in the Yucatan peninsula of Mexico.The existence of net sources in the Southwest is consistent with Hayes et al. (2012), who hypothesized that livestock and human consumption of agricultural products grown elsewhere could be contributing to sources in these heavily-populated but arid areas, as did Crevoisier et al. (2010).The lack of annual sources in these areas in the majority of the biospheric models (results not shown) is consistent with the fact that many do not explicitly account for the lateral transport of agricultural products.The net sources from the Mexican tropical forests in GIM are also consistent with inventory-based estimates of emissions from land-use change in this region (Cairns et al., 2000;de Jong et al., 2010), while the GIM sources in the Pacific Northwest most likely represent an artifact of the NARR auxiliary variables in this highly under-constrained region.
While inversions with the two boundary condition datasets return similar spatial patterns, magnitudes differ for net annual flux estimates, with stronger sinks and weaker sources evident in almost all regions with the CT relative to the EMP dataset.Most of these differences at the annual timescale are due to a difference in net uptake from March through August, consistent with the seasonally-varying offset between the two sets of boundary conditions (see Fig. A2 in the Supplement).

Ecoregion-scale estimates
At the ecoregion scale (Fig. 7), all inversions and biospheric models show a net biospheric uptake in the Eastern Temperate Forests.In this region, the CarbonTracker and Schuh et al. (2010) results fall within the 95 % confidence intervals for 3 of the 4 geostatistical inversions, while the majority of the biospheric models fall within the confidence intervals of the GIM Simple inversion with EMP boundary conditions.The strong uptake of approximately 1 PgC yr −1 seen here in the Butler et al. (2010) results is inconsistent with the other inversions and all but two of the biospheric models, and may reflect the impact of aggregation errors, as discussed previously.In the Boreal Forests, all inversion studies show net uptake, while the biospheric models span both net sources and sinks.CarbonTracker and the Butler et al. (2010) estimates fall within the confidence intervals of the GIM inversions with EMP and CT boundary conditions, respectively, while the Schuh et al. (2010) inversion does not cover this region.In the Temperate Grass/Savannah/Shrub ecoregion, the GIM inversions show a more neutral flux relative to the other inversion models and the biospheric models.It is possible that the biospheric models are under-estimating sources in this region due to their lack of skill in modeling residual biomass and the fate of harvested products, as discussed previously, and the other inversions would then also be biased by their use of these same biospheric models as prior flux estimates.As described previously, the GIM sources in these regions in spring and fall are supported by CO 2 mixing ratio Biogeosciences, 9, 457-475, 2012 www.biogeosciences.net/9/457/2012/time series at the tower locations, as shown in Fig. A1 in the Supplement.Differences across models in net annual fluxes are most evident at the continental scale (Fig. 7).The spread in the continental budget seen here is larger than that for the continental seasonal cycle (Fig. 5), although this partially reflects a difference in units between Fig. 7 (PgC yr −1 ), which is not normalized by area, and Fig. 5 (µmol m −2 s).
At this coarsest scale of comparison, the GIM results are strongly affected by the choice of boundary conditions (as in Göckede et al., 2010a, which estimated fluxes for the state of Oregon), where the consistent offset between these two datasets leads to an additive effect on flux estimates in both space and time.With the EMP boundary conditions, GIM returns a biospheric flux that is not significantly different from zero, and reduces the net North American sink by approximately 0.7 to 0.9 PgC yr −1 relative to the GIM inversion with CT boundary conditions.The CarbonTracker (Peters et al., 2007) estimates are statistically consistent with the GIM inversions with CT boundary conditions at the annual continental scale, supporting the conclusion that boundary conditions are the primary control on fluxes at this coarsest scale, whereas flux resolution, priors, transport model and data choices, have a stronger impact at sub-domain scales.An inventory-based estimate of the North American carbon balance from Hayes et al. (2012) found a net biospheric uptake of 0.33 PgC yr −1 , which is statistically consistent with the GIM Simple inversion using the EMP boundary conditions, suggesting that these boundary conditions may be more realistic relative to those from CT.The main conclusion, however, is that the uncertainty associated with boundary conditions must be reduced if regional inversions are to be able to accurately pinpoint net biospheric uptake for the entire spatial domain of interest.

Conclusions
The primary goal of this study was to perform an intercomparison of North American CO 2 flux estimates for a single year (2004) across inverse modeling studies and biospheric models, using results from a geostatistical inversion approach as a benchmark.Fluxes were quantified in the GIM inversions at a 1 • × 1 • spatial and 3-hourly temporal resolution, and covariance parameters and process-based auxiliary information were also optimized using the atmospheric data.By estimating fluxes at substantially finer scales relative to previous inversions, this study reduces spatial and temporal aggregation errors associated with using continental measurements sited in areas with high flux variability, and allows for the recovery of sub-continental spatial patterns from the atmospheric CO 2 measurements.In addition, avoiding the use of prior flux estimates from biospheric models avoids any biases caused by shortcomings in process-based representa-tions, and allows for a more independent comparison to flux estimates from terrestrial ecosystem models.
The GIM Simple inversion, that included only a fossil fuel inventory as ancillary data, yielded results that support the quality of the inventory as well as the GIM setup and assumptions.Significant regional-scale spatial and temporal variability was recovered, indicating that the information content of atmospheric observations is greater than implied in past studies.The introduction of NARR auxiliary variables into the second, NARR inversion confirmed that flux patterns in the majority of the continent are constrained by atmospheric observations rather than ancillary data, although these did help to extrapolate the signal to particularly poorlyconstrained regions.NARR inversion results are also consistent with process-based understanding of the drivers of CO 2 fluxes from photosynthesis and respiration, with evapotranspiration explaining a substantial portion of the net uptake signal.
For the grid-scale seasonal cycle, the GIM inversions were found to have more consistent spatial patterns with biospheric models during the growing relative to the dormant season.This could be due to errors in the fossil fuel inventories that are aliased onto the inferred fluxes, which would be more evident in the dormant season when emissions dominate the total CO 2 flux, or errors in the biospheric models themselves which may have less skill outside of the growing season (e.g.Schwalm et al., 2010).Finally, the fluxes during the growing season are stronger and more variable, and therefore perhaps easier to identify from the atmospheric signal.
At the ecoregion-scale, comparisons of inferred seasonal cycles across inversion studies pointed to the strong impact of setup on flux estimates, particularly adherence to the bottom-up prior flux estimates and aggregation errors associated with estimating fluxes at coarse-scales relative to the fine-scale variability embedded in the priors.A strong atmospheric data constraint reduced this impact somewhat, with inversions that estimate fluxes at different spatial scales showing more consistent results in well-constrained regions, i.e. the Boreal Forests and Temperate Grass/Savannah/Shrub.This suggests that an expanded measurement network will further help to reduce the sensitivity of inversion results to setup assumptions, although systematic transport model errors will still remain a concern.
The comparison of GIM results to the biospheric models at the ecoregion-scale pointed to the need for the biospheric models to better account for crop parameterizations, landmanagement practices, and the fate of harvested products in the agricultural areas of the continent.For example, the geostatistical inversions showed strong sources in the center of the continent in March and October, which were not seen in the majority of the biospheric models.The fact that these sources were also absent in some of the examined inversion studies may also point to the impact of these biospheric At the annual timescale, the GIM inversions showed net uptake in the Eastern Temperate and Boreal Forests, but more of a neutral flux in the Temperate Grass/Savannah/Shrub.This latter result is inconsistent with other inversion studies and bottom-up models, which may not have properly accounted for the fate of harvested agricultural products and residual biomass.Net annual sources inferred in the Southwest are also consistent with inventory-based estimates of livestock and human consumption of agricultural products in these sparsely-vegetated areas.
While the boundary conditions used as input into regional inversions were found to have only minor impact on recovered grid-scale spatial patterns, they were found to lead to large differences in the GIM flux estimates at the coarsest spatial and temporal scale of comparison, i.e. the annual continental scale.The use of an empirically-derived boundary condition dataset eliminated the North American carbon sink for this year relative to an inversion relying on model output from CarbonTracker for boundary CO 2 concentrations.This result points to the importance of accurately pinpointing the CO 2 concentrations of incoming air for appropriate carbon budgeting at the scale of countries in North America.
Overall, the North American geostatistical inversion for 2004 presented here provides a robust inversion framework suitable for ingesting the large data volumes associated with the recent expansion of the in situ CO 2 monitoring network over North America (Mueller, 2011).In particular, the presented GIM approach estimates fluxes at unprecedented spatiotemporal scales that help to optimally take advantage of the information contained in highly variable continental measurement data.The GIM approach also provides an independent comparison to bottom-up model estimates, thereby helping to provide insight into the currently large spread of biospheric model estimates of regional CO 2 flux, while providing a path forward for improving their formulation of processes at regional scales in future work.

Fig. 1 .
Fig. 1.Domains of nested WRF winds, flux estimation grid, and the locations of towers, flask and aircraft measurements used in the inversions.See TableA1in the Supplement for a key to the tower names.

Fig. 2 .
Fig. 2. (a) Sensitivities of all observations used in the inversion (as shown in Table A1 in the Supplement) to flux estimates, averaged across all 3-hourly flux time periods for the year for each grid-cell location.Black stars represent the nine continuous measurement locations in 2004.(b) Three ecoregions (modified from Olson et al., 2001) used for spatial aggregation of flux estimates.These three ecoregions, along with the grey grid-cells, form the domain for the aggregated North American totals in Figs. 5 and 7.

Fig. 3 .
Fig. 3. (a, b) Optimized covariance parameters using the RML algorithm with the atmospheric measurements, for the GIM inversions with the EMP boundary conditions.(a) Monthly flux standard deviations (σ Q ) for the Simple and NARR inversions.(b) Square root of the yearly average model-data mismatch variances (σ R ) (weighted by the number of data-points in each month) for 9 towers, flask and aircraft data from the Simple inversion.Location information for each measurement code is included in Table A1 in the Supplement.

Fig. 4 .
Fig. 4. Monthly-averaged grid-scale biospheric fluxes for January, April, July and October from the Simple and NARR inversions using the EMP boundary conditions, and the median of biospheric models participating in the NACP RCIS.Also shown are the grid-scale spatial correlation coefficients (ρ) and Root Mean Squared Difference (RMSD; µmol m −2 s −1 ) between the inversion and the biospheric model median fluxes.Please note the different scales for each month.The plots in the right-most column show the percent of individual biospheric models that are outside the 95 % confidence intervals of the NARR inversion for each month and location.

Fig. 5 .
Fig. 5. Seasonal cycle of monthly-averaged fluxes aggregated to the three well-constrained ecoregions, as well as to the full continent (Fig. 2b).GIM fluxes using the EMP boundary conditions are compared to other inversions (top row) and individual biospheric models (bottom row) that have coverage in at least 85 % of the given regions.Inversions with the same line color use similar biospheric model output for their prior flux estimates.

Fig. 6 .
Fig. 6.Annually-averaged grid-scale biospheric fluxes from the GIM NARR inversions using the EMP and CT boundary conditions.Also shown are significant sources and sinks (at 67 % and 95 % confidence levels) for each inversion.

Fig. 7 .
Fig. 7. Annual biospheric flux estimates spatially aggregated to the three ecoregions and North America (Fig. 2b).Results from the Simple and NARR inversions using the EMP and CT boundary conditions are compared to results from other inversions and the biospheric models participating in the NACP RCIS.GIM results are shown within their 95 % confidence intervals.
2 measurements [ppm], and s are the unknown surface CO 2 fluxes [µmol (m −2 s)].H describes the sensitivity of CO 2 measurements to fluxes, as quantified from an atmospheric transport model, with units of [ppm/(µmol (m −2 s))].X

Table 1 .
Comparison of setup and input data across published inversion studies for North America in 2004.

Table 2 .
Candidate NARR environmental covariates, with associated β values for those variables selected using BIC for inclusion in the NARR inversion with the EMP boundary conditions.Auxiliary variables were normalized to zero mean and unit variance, such that β values are directly comparable.Variables with dashes ("-") were considered but not selected by the BIC/ Branch & Bound algorithm.Also shown are the coefficients of variation (CV) for the selected variables (i.e.σ β / β ).