Biogeosciences A systematic approach for comparing modeled biospheric carbon fluxes across regional scales

Given the large differences between biospheric model estimates of regional carbon exchange, there is a need to understand and reconcile the predicted spatial variability of fluxes across models. This paper presents a set of quantitative tools that can be applied to systematically compare flux estimates despite the inherent differences in model formulation. The presented methods include variogram analysis, variable selection, and geostatistical regression. These methods are evaluated in terms of their ability to assess and identify differences in spatial variability in flux estimates across North America among a small subset of models, as well as differences in the environmental drivers that best explain the spatial variability of predicted fluxes. The examined models are the Simple Biosphere (SiB 3.0), Carnegie Ames Stanford Approach (CASA), and CASA coupled with the Global Fire Emissions Database (CASA GFEDv2), and the analyses are performed on model-predicted net ecosystem exchange, gross primary production, and ecosystem respiration. Variogram analysis reveals consistent seasonal differences in spatial variability among modeled fluxes at a 1 ◦ × 1 spatial resolution. However, significant differences are observed in the overall magnitude of the carbon flux spatial variability across models, in both net ecosystem exchange and component fluxes. Results of the variable selection and geostatistical regression analyses suggest fundamental differences between the models in terms of the factors that explain the spatial variability of predicted flux. For example, carbon flux is more strongly correlated with percent land cover in CASA GFEDv2 than in SiB or CASA. Some of the differences in spatial patterns of estimated flux can be linked back to differences in model formulation, and would have been difficult Correspondence to: D. N. Huntzinger (deborah.huntzinger@nau.edu) to identify simply by comparing net fluxes between models. Overall, the systematic approach presented here provides a set of tools for comparing predicted grid-scale fluxes across models, a task that has historically been difficult unless standardized forcing data were prescribed, or a detailed sensitivity analysis performed.


Introduction
The quantification of net ecosystem exchange (NEE) and its regional variability involves a great deal of uncertainty, due to the considerable spatial complexity of, and interactions among, its individual controlling processes (Melillo et al., 1995;House et al., 2003).This uncertainty is compounded by the fact that, at global and regional scales, carbon flux cannot be measured directly (Cramer et al., 1999).Yet, climate change predictions depend on our ability to appropriately assess and model the current (and future) behavior of carbon uptake and release across various spatial scales.
In response to this need, a number of biospheric or process-based models have been developed to estimate the magnitude of carbon sources and sinks across regional and continental scales.These models are based on the current mechanistic understanding of how carbon is exchanged within ecosystems.Models are typically driven by climate data, and constrained with environmental parameters and information about soil properties, nutrient cycling, vegetative cover, and other parameters that influence carbon fixation and respiration (Wulder et al., 2007).Models differ, however, in terms of the factors considered (e.g., land-use change, disturbance events, nutrient cycling), how processes are formulated within the model, or the type of environmental driver data used (Knorr, 2000).This is due, in part, to the various D. N. Huntzinger et al.: Systematic approach for comparing carbon fluxes purposes for which the models were created (e.g., tracking carbon stocks, estimating energy and/or carbon fluxes, forecasting future carbon cycling behavior or vegetative cover migration).As a result, each model generates a unique, and sometimes substantially different spatial distribution of fluxes for a given time period.
There are several approaches for evaluating biospheric models.One way to assess model performance is through validation with observational data.For example, some models are parameterized (and validated) with site-level observations from eddy-covariance flux towers (Baldocchi et al., 2001).However, because direct observations of carbon flux are not available at regional or continental scales, models used over large regions cannot be validated directly using field observations.Continental-to global-scale forward models can be compared with estimates from atmospheric inversion models that couple measurements of CO 2 concentrations with transport models to infer surface flux distributions (e.g., Rödenbeck et al., 2003;Gurney et al., 2004;Peters et al., 2007;Gourdji et al., 2008).However, many of these inversions use biospheric model output as prior estimates of carbon flux, and, therefore, the resultant fluxes are not independent of biospheric model assumptions.In the absence of direct observations, forward model inter-comparisons are a necessary first-step in assessing model performance, and evaluating our current understanding of the terrestrial carbon cycle system.
Model assumptions, formulation, and the environmental driver data used all have an impact on the magnitude and spatial distribution of model-estimated fluxes.There is a great deal of uncertainty, therefore, when comparing fluxes among models and making inferences about the root of their similarities and differences.In order to reduce some of this uncertainty, several studies (Melillo et al., 1995;Heimann et al., 1998;Cramer and Field, 1999;Cramer et al., 1999;Knorr, 2000) have compared model results after imposing a standardized set of input driver data, thereby providing a uniform basis for comparison.Significant effort is required, however, for organizing and conducting a formal model intercomparison with standardized environmental driving data, spin up, and land-use history.In addition, such an approach may not be feasible without sufficient resources and access to model code and input variables.Thus, as a complement to detailed sensitivity analyses, there is a need for quantitative tools that can be applied to compare existing, in-hand model results.
In the absence of standardized model simulations and/or detailed sensitivity analyses, quantitative methods employed in model intercomparisons have traditionally relied on the direct comparison of estimated fluxes, or on relatively simple statistics.For example, the use of linear correlation statistics such as the Pearson correlation coefficient (e.g., Bondeau et al., 1999) or Spearman Rank correlations (e.g., Schloss et al., 1999) have been used to examine the correlation of estimated fluxes to individual environmental or climatic parameters such as precipitation, temperature, water availabil-ity, or vegetation indices.Ruimy et al. (1999) applied linear regression to examine the direct dependence of net primary productivity (NPP) on light use efficiency (LUE) in several biospheric models.Similarly, Pan et al. (1998) studied the relationship of NPP to mean annual temperature and annual precipitation within individual biomes using multiple linear regression.
In these types of approaches, the influence, or importance, of selected environmental factors on flux is examined individually and independently, and often without taking into account the spatial autocorrelation of fluxes.Not accounting for the spatial correlation of model estimates of NEE, however, can lead to misrepresentations of inferred relationships and their associated uncertainty (Hoeting et al., 2006).In addition, linear regression of fluxes to a single environmental variable may produce erroneous results.For example, many environmental variables exhibit a seasonal cycle similar to NEE.When only one environmental variable is regressed against NEE, the derived relationship may be a result of correlation in their seasonal cycles rather than a true explanatory relationship.Therefore, the resultant regression represents a scaling parameter (e.g., how to scale the variable to look like NEE), rather than that variable's true relationship to flux.
Recently, some studies have applied more sophisticated statistical methods for analyzing model predictions.For example, Wulder et al. (2007) used local spatial autocorrelation to compare outputs from multiple runs of a forest-growth model, to identify areas where there was a systematic sensitivity of the model to underlying changes in soil water and fertility inputs.Geostatistical regression and model selection approaches have been used to assess the influence of various environmental factors on carbon flux observed at eddycovariance flux tower sites (Mueller et al., 2010;Yadav et al., 2010), and wavelet analysis has been used at flux tower sites to decompose NEE into its component fluxes of gross primary production (GPP) and ecosystem respiration (R E ) (e.g., Stoy et al., 2005Stoy et al., , 2009)).Although both of these approaches account for autocorrelation in NEE, they deal with autocorrelation of fluxes in time, rather than space.
In this paper, statistical tools that account for the spatial autocorrelation in land-atmosphere carbon fluxes are applied in order to compare flux estimates across models, in light of inherent differences among the models, and without the need for new model simulations.The presented systematic approach is especially useful when detailed sensitivity analyses are not possible.The applied tools are used to quantify the degree of spatial autocorrelation of the estimated fluxes, to statistically select environmental variables that best explain the predicted fluxes, and, through geostatistical regression, to evaluate the relationship between the selected variables and a model's estimate of land-atmosphere surface flux.The methods are evaluated in terms of their usefulness in elucidating model differences using a small set of biospheric models: the Simple Biosphere Model (SiB 3.0, e.g., Baker et al., 2008); the Carnegie Ames Stanford Approach (CASA, e.g., Potter  (2003, 2008) et al., 1993); and CASA coupled with the Global Fire Emissions Database (CASA GFEDv2, e.g., van der Werf et al., 2006).

Methods
Geostatistical methods can be used to describe the spatial continuity of natural phenomena, and their relationship to independent variables (Isaaks and Srivashtava, 1989).Geostatistical approaches have been applied to many fields in the Earth and environmental sciences, including geology, ecology, and hydrology to study spatial phenomena (e.g., Chiles and Delfiner, 1999;Webster and Oliver, 2007).In studying land-atmosphere carbon dynamics, geostatistical inversion modeling has been used to estimate the spatial distribution of carbon sources and sinks both globally, and regionally over North America (e.g., Michalak et al., 2004;Gourdji et al., 2008Gourdji et al., , 2010;;Mueller et al., 2008).Geostatistical regression approaches, such as the one described here, have been used regionally to evaluate the relationship between environmental variables (e.g., Erickson et al., 2005), and have recently been applied to evaluate the potential controls on carbon uptake and release at flux tower sites in North America (NA) (e.g., Mueller et al., 2010;Yadav et al., 2010).The approach presented here applies geostatistical regression modeling tools as a means to compare the spatial patterns of estimated flux from biospheric models.Thus, the analysis includes variogram analysis of modeled NEE and its component fluxes (GPP and R E ), variable selection methods, and geostatistical regression.

Models
The three terrestrial biospheric models chosen for this analysis are the Simple Biosphere model (SiB 3.0), the Carnegie Ames Stanford Approach (CASA) model, and CASA coupled with the Global Fire Emissions Database (CASA GFEDv2) (Table 1).The models chosen for this study differ in how they represent the processes controlling carbon exchange between the land and atmosphere, and were selected because (Eq. 1) they have been widely applied across a variety of regions and (Eq.2) are frequently used as prior estimates in inverse modeling studies (e.g., Gurney et al., 2004;Peters et al., 2007;Wang et al., 2007).There is growing awareness of the strong influence of prior estimates in inversion results (Rödenbeck et al., 2003;Michalak et al., 2004;Mueller et al., 2008).Therefore, understanding how forward model estimates of NEE, GPP, and R E vary spatially and what drives this variability is of great importance, not only for forward model intercomparisons, but also for atmospheric inversion studies.
Information about the driving environmental variables used in each model is provided in Table 1.Monthly NEE, GPP, and R E are compared among the models for 2002 over the domain of 10 • to 70 • N and 50 • to 170 • W, at a spatial resolution of 1 • × 1 • (approximately 110 km by 50 km; the distance per degree longitude varies across the domain).

Simple Biosphere Model (SiB 3.0)
SiB (Sellers et al., 1986(Sellers et al., , 1996a, b, b) is a biophysically-based, land-surface model that is formulated based on the theory that the physiological controls operating at the plant level seek to maximize photosynthesis while minimizing water loss (Baker et al., 2003(Baker et al., , 2008)).Thus, photosynthetic carbon fixation is based on enzyme kinetics (Farquhar et al., 1980), and linked to stomatal conductance through the Ball-Berry equation (Collatz et al., 1991).Soil respiration is calculated based on temperature and soil moisture in each soil layer.Ecosystem respiration is forced within the model to balance annually with net uptake (Denning et al., 1996;Baker et al., Fig. 1. 2002 Winter (December, January, February), Spring (March, April, May), Summer (June, July, August), and Fall (September, October, November) net ecosystem exchange (NEE) for examined models.
2003).SiB 3.0, hereafter referred to as SiB, specifies spatial heterogeneity in some of its canopy properties through the use of satellite-derived vegetative indices such as leaf area index (LAI) and absorbed fraction of photosynthetically active radiation (PAR), rather than using published values of vegetation and soil parameters tied solely to vegetation type (Baker et al., 2003).SiB has been coupled with the Regional Atmospheric Modeling System (e.g., SiB-RAMS) in regional inversion and error estimation studies (Denning et al., 2003;Nicholls et al., 2004;Zupanski et al., 2007;Lokupitiya et al., 2008).

Carnegie Ames Stanford Approach (CASA)
CASA is a biogeochemical model that uses a system of first-order linear differential equations to represent the flow of carbon between various pools, and track the long-term change in terrestrial carbon stocks on a monthly time-step (Potter et al., 1993;Randerson et al., 1997;Schaefer et al., 2008).CASA tracks photosynthesis using a simple lightuse efficiency model, where maximum light-use efficiency is modified by factors such as temperature, water availability, and litter substrate quality.Heterotrophic (i.e., soil) respiration is simulated within a number of soil-organic carbon pools; whereas autotrophic respiration is assumed to be a constant fraction of GPP.The model's soil moisture submodel is used to regulate both plant production and respiration.This version of CASA is used as a priori information in, among other studies, the TransCom3 inversion intercomparison study (e.g.Gurney et al., 2004;Baker et al., 2006).No component fluxes (e.g., GPP and R E ) were available for this version of CASA.

CASA coupled with the Global Fire Emissions
Database (CASA GFEDv2) The CASA GFEDv2 model (van der Werf et al., 2004(van der Werf et al., , 2006) ) is based on the CASA model described above, although fPAR is estimated from Advanced Very High Resolution Radiometer (AVHRR) NDVI from GIMMSg (Pinzon et al., 2005;Tucker et al., 2005) instead from NOAA/NASA Pathfinder NDVI (as in CASA).In addition, CASA GFEDv2 accounts for the indirect (e.g.mortality and subsequent decay and regrowth) effects of forest fires on carbon stocks in CASA's above-ground carbon pools (leaf, wood, and litter).In both CASA and CASA GFEDv2, the light utilization efficiency term is set uniformly to a value derived from calibration of predicted annual net primary production (NPP) to previous field estimates (Potter et al., 1993).

Spatial covariance of modeled fluxes
We apply variogram analysis to quantify monthly spatial autocorrelation of predicted NEE, and to examine how that variability changes with season.Here, autocorrelation is defined as the expected similarity between estimated fluxes as a function of their separation distances for a given model, and is used as means to quantify spatial variability.This approach enables not only the estimation of spatial variability in each of the models, but, more importantly, the comparison of spatial variability in predicted NEE across models.In addition, the spatial variability of NEE's component fluxes (GPP and R E ) is quantified for the three summer months of June, July, and August.
Quantifying spatial autocorrelation is needed to inform the statistical analysis (e.g., Sect.2.3) geared towards identifying potential drivers of observed spatial patterns of flux among the models.The comparison of spatial variability among the models also provides a quantitative counterpart to the visual examination of variability depicted in maps of the spatial pattern of fluxes (e.g., Fig. 1).Quantitative comparisons help to identify significant differences between the models, and can help to inform the comparison of model results.For example, if two models yield fluxes with very different degrees of spatial variability, then the models that generated these fluxes are likely very different.
A variogram is a function used to describe the spatial correlation of observations, and is based on the degree of dissimilarity between two points, y(x) and y(x+h): where y represents the grid-scale flux (i.e., NEE, GPP, or R E ) predicted from the models, and h is the separation distance between two grid-scale estimates of flux.The raw variogram obtained from Eq. ( 1) is typically represented using a theoretical variogram function.Here, we use an exponential model to examine the spatial correlation of a particular biospheric model's estimate of carbon flux.This choice is based on an examination of the raw variograms of flux estimates constructed from the different models, and is consistent with the work of Michalak et al. (2004).The exponential variogram is defined as: where σ 2 represents the expected variance of carbon flux at large separation distances, and l is the correlation range parameter.The practical correlation length (L) is approximately 3 l, beyond which the covariance between points is negligible (e.g.Kitanidis, 1997).
To obtain an overall measure of spatial variability for each of the models, the spatial covariance parameters (σ 2 , l) of grid-scale estimates of flux for land cells within NA are optimized by fitting the theoretical variogram in Eq. ( 2) to the raw variogram (derived from Eq. 1) using ordinary least squares (OLS).Conceptually, a higher variance is indicative of greater overall spatial variability, while a shorter correlation length indicates greater spatial variability at smaller scales.
In order to better compare the spatial variability in carbon flux predicted by the models, the parameter h 0 (Alkhaled et al., 2008) is used to merge information from the fitted variance and correlation range parameters, and to provide an overall measure of spatial variability: where l and σ 2 are the fitted variogram parameters for each model from equation (2), and h 0 represents the distance at which the variogram reaches a certain preset value (γ max ).Therefore, h 0 provides a single diagnostic metric with which to compare the overall spatial variability of the models.Both a higher regional variance (σ 2 ) and a shorter correlation range parameter (l) results in a shorter overall h 0 , indicating a greater degree of spatial variability of the modeled flux over smaller spatial scales.The value of γ max used in the analysis of model estimates of NEE and component fluxes (GPP and R E ) was chosen to be 0.05 and 1.0 (µmol m −2 s −1 ) 2 , respectively.The choice of this value is somewhat flexible, as long as it is below the lowest observed σ 2 .

Variable selection and regression analysis
The geostatistical regression (GR) approach presented here includes a combination of variable selection (e.g., Burnham and Anderson, 2002) and geostatistical regression (i.e.universal kriging) (e.g., Kitanidis, 1997) in order to identify the factors explaining the spatial variability in monthly predicted NEE, GPP, and R E among biospheric models.Several factors influence the response of estimated fluxes to environmental variables, including model assumptions, the manner in which mechanistic processes are formulated within the model, and the chosen driver or input data.Variable selection methods provide a measure of the overall linear sensitivity of estimated fluxes to environmental variables, or other derived quantities such as Q 10 , without having to recreate the complex relationships or formulations within the models.Even if these formulations could be recreated, such an approach would be undesirable because it would not offer a common metric of comparison across models.Variable selection also provides a means to compare models in light of their differences (e.g., formulations, environmental drivers), and to evaluate whether the same environmental variables appear as significant in explaining the spatial variability in flux across models.
Similarly to multiple linear regression, the GR approach expresses the dependent variable, in this case the estimated NEE (or GPP or R E ) from the models, y, as the sum of a deterministic component, (Xβ), and a stochastic term, (ε) (Kitanidis, 1997): The deterministic component represents the portion of the predicted flux that can be explained using a set of grid-scale environmental variables or covariates, while the stochastic component describes the spatial variability in flux that could not be explained by the deterministic component.The deterministic component takes the form of Xβ, where X is a (n × p) matrix containing columns of p environmental variables that are scaled by a p × 1 vector of unknown drift coefficients or weights (β).The simplest model is a spatially constant mean, where p = 1, X is a vector of ones, and β is the constant mean.More complex models include both the constant mean and a linear combination of environmental variables that best describe or represent the physical process being modeled (e.g., Erickson et al., 2005;Gourdji et al., 2008;Mueller et al., 2010;Yadav et al., 2010).Even though the columns in X are linearly related to y, the columns themselves can contain derived quantities or transformations of one or more environmental variables (e.g., Erickson et al., 2005;Mueller et al., 2010).In this analysis, however, no derived quantities or transformations were used.The variable selection step provides a means of objectively selecting the environmental variables to be included in X, while the geostatistical regression is used to obtain the best estimate of the drift coefficients ( β) which represent the relationship of model-predicted NEE, GPP, and R E to the selected environmental variables, along with their associated uncertainties (σ 2 β ).Together, these two approaches have several advantages over previous model intercomparison approaches.First, the application of a variable selection method reduces the risk of identifying spurious relationships between environmental variables and modeled fluxes as significant, because only variables that jointly provide significant explanatory power can be selected.Second, the combination of variable

Bayes Information Criterion (BIC)
One of the most widely used variable or model selection techniques is the Bayes Information Criterion (BIC) (Schwarz, 1978).The term "model" here refers to X, or a collection of environmental variables that, either individually or collectively, have the most explanatory power in terms of the spatial variability of biospheric-model-derived monthly NEE, GPP, and R E .
Criterion-based approaches, such as the BIC, have been applied in studies focused on model selection for geospatial data (Hoeting et al., 2006) and, more recently, at elucidating processes controlling carbon exchange at various temporal scales at eddy covariance towers sites (Mueller et al., 2010;Yadav et al., 2010).BIC aims to select the combination of environmental variables that is most probable (Forster, 2000) in terms of explaining the variability of fluxes.
The BIC value of a particular combination of environmental variables is a function of the unknown drift coefficients, β, and the covariance of the regression residuals, ε, assuming that the regression residuals (Sect.2.3.2) follow a Gaussian distribution (Schwarz, 1978;Mueller et al., 2010).The model with the lowest BIC is identified as the best model.The covariance of the residuals is assumed to decay exponentially with separation distance: where Q is an n×n covariance matrix of the regression residuals and h is an n × n matrix of separation distances between flux estimate locations (i.e., points on the model grid).The spatial covariance parameters (σ 2 and l) in Q are optimized in a similar manner as described in Eq. ( 2) and Sect.2.2 except that the covariance is quantified for the residuals, ε, (Eq.4) from the regression (see Gourdji et al., 2008) rather than the fluxes themselves.Accounting for the fact that the drift coefficients β are unknown, the BIC equation becomes (Mueller et al., 2010): The reader is referred to Burnham and Anderson (2002) for a more detailed description of BIC, and to Mueller et al. (2010) and Yadav et al. (2010) for additional information on how the BIC approach has been applied in the context of NEE and component flux modeling.
The BIC and GR analysis is conducted on 2002 monthly fluxes covering the three summer months of June, July and August.A small subset of monthly-averaged environmental variables is considered in the BIC analysis (Table 2).These variables were chosen because they have full spatial coverage over North America and have a known association with biospheric fluxes.The variables in Table 2 were mean-deviated and normalized by their standard deviation prior to the regression analysis, in order to make estimated regression coefficients, β, comparable across variables.

Geostatistical regression analysis
Estimates of the drift coefficients, β, and the covariance matrix describing their uncertainties, V β , are obtained as (e.g., Cressie, 1993): using the coefficient of determination: where E[y] is the mean of the estimated flux from a given biospheric model.

Spatial covariance
The spatial covariance parameters obtained from the variogram analysis provide a measure of the scale at which model estimated carbon fluxes vary, or the scale of spatial variability (e.g., correlation length and h 0 ).The models are driven by different data sets and represent the biogeochemical processes controlling carbon flux in different ways, both of which influence the spatial patterns and variability of a model's estimated fluxes.Non-growing season NEE was found to consistently exhibit lower variances (σ 2 ), reaching a minimum in September (∼0.2 to 0.5 (µmol m −2 s −1 ) 2 ), and remaining relatively constant until bud-burst and leaf-out in April and May (Fig. 2).The variance during the growing season, on the other hand, is quite different between the models, with the greatest variance in NEE exhibited by SiB (Figs. 2 and 3).Similarly, correlation lengths (L) are generally longer dur- ing the dormant months, which is consistent with the lower spatial variability observed in the variance parameter.
The h 0 parameter (Eq. 3) shows that NEE from SiB is more variable relative to the other models for most months, and especially during the dormant months, followed by CASA and CASA GFEDv2 (Fig. 3).SiB's forced cellby-cell, long-term balance in photosynthesis and respiration likely drives its greater spatial variability during the winter or dormant months, because the underlying spatial structure or variability is retained during the dormant season within SiB, much more so than for CASA or CASA GFEDv2.
Both SiB 3.0 and CASA GFEDv2's predictions of GPP have significantly greater spatial variability over smaller scales than that of NEE, and slightly greater spatial variability than R E .Component fluxes for CASA were not available.The average summertime variance in GPP and R E predicted by CASA GFEDv2 is greater than that of SiB 3.0 (see Fig. 4).The correlation lengths, however, of SiB's summertime GPP and R E (1100 km and 1300 km, respectively) are slightly shorter than for GFED (1200 km and 1400 km, respectively).Thus, using a γ max of 1.0 (µmol m −2 s −1 ) 2 , we see that the overall spatial variability of GPP for SiB (h 0 = 130 km) and CASA GFEDv2 (h 0 = 110 km) are comparable.However, CASA GFED (h 0 = 190 km) exhibits slightly greater overall spatial variability in R E during June, July, and August relative to SiB (h 0 = 270 km).Recall that the smaller h 0 , the more variable the modeled flux over smaller spatial scales.The differing degrees of variability in R E among the models, combined with somewhat comparable degrees of variability in GPP, leads to the overall spatial variability observed in NEE.Conceptually, NEE is the small difference between two large fluxes: gross primary productivity and ecosystem respiration.Large differences in the variability of GPP and R E , as seen in SiB 3.0, can translate into more variability in resultant NEE fluxes.Both models assume autotrophic respiration to be a fraction of GPP; however, they compute heterotrophic respiration in different ways (Table 1).CASA GFEDv2 treats soil respiration uniformly as a Q 10 response.However, it scales respiration with the same soil moisture submodel that regulates productivity (Table 1).Therefore, soil carbon flux is controlled by nondimensional scalars related to air temperature, soil moisture, litter substrate quality, and soil texture (Zhou et al., 2009).Conversely, SiB 3.0 calculates soil respiration from temperature and soil moisture in each layer, then scales respiration to achieve carbon balance over an annual scale (Baker et al., 2003).No such annual balance is enforced in CASA GFEDv2, which allows carbon to accumulate and move through various carbon pools.In the summer months, this prescribed annual balance in SiB may dampen the soil respiration variability, and therefore ecosystem respiration, causing it to be less spatially variable (compared to CASA GFEDv2) at certain times of the year.
The spatial variability of soil respiration can be affected by other factors, in addition to the soil respiration algorithm used in the model.For example, indirect effects, such as the model's ability to model soil temperature and/or soil water content can also significantly impact how soil respiration is represented.
Additional analyses were conducted to determine whether the scales of variability in the modeled fluxes could be explained simply by the scale of variability in typical environmental driver data used by these models.Using the same approach as that described in Sect.2.2, we estimated the summertime correlation lengths of many of the environmental variables in Table 2.The correlation lengths of these variables ranged from 3000 to 6000 km, lengths significantly longer than those of model estimates of flux, indicating that the scales of spatial variability of the environmental variables alone cannot explain the spatial variability of modeled fluxes.Therefore, although the variability in environmental variables (such as those that reflect phenology in particular) certainly explain some of the spatial variability seen in the models, model choice, assumptions, and structure were found to be a primary influence on the scales of flux variability.
Overall, quantifying spatial variability allows for the comparison of flux estimates across models, in terms of how differences in the degree of spatial variability in component fluxes translate into each model's grid-scale estimates of NEE.The variogram analysis quantifies spatial variability among the models, which, combined with knowledge about the model structure and formulations, is seen here to be a valuable tool for model intercomparisons.In addition, this type of information helps inform the geostatistical analyses that is used to correlate modeled fluxes (and their spatial variability) to climatic variables and other environmental driving variables.This is shown in the next section, where the spatial correlation structure described in Eq. ( 2) and shown in Fig. 2 is used in assessing the relationship of flux to environmental drivers and other ancillary variables.

Variable selection and regression analysis
Geostatistical regression is used to identify environmental variables that best explain the spatial variability in modelestimated fluxes.Whereas it is difficult to isolate the explanatory power of environmental variables by looking at them individually, the approach implemented here examines the influence of multiple variables concurrently.Thus, the geostatistical regression provides information about which variables significantly explain the variability in modeled flux, whether these variables are consistent between models, and how much of the variability remains unexplained and is therefore attributable to factors such as model assumptions, model formulation, scaling choices, and environmental variables not considered in the analysis.
Drift coefficients and their associated uncertainties were estimated for those variables selected using the BIC as outlined in Sects.2.3.1 and 2.3.2 (Table 3).A positive sign on βi indicates the variable is associated with an increase in R E or a decrease in GPP, while a negative sign indicates that a variable is correlated with a net increase in GPP or decrease in R E .

Explanatory variables for modeled net ecosystem exchange
Only evapotranspiration and the percent land cover by mixed and deciduous forests (MDBF) were found to have a significant relationship with estimated NEE across all models (Table 3).This is consistent with the expected strong relationship between transpiration and photosynthesis.The connection between land cover types, such as deciduous forest, and NEE is intuitive as well.Highly productive regions tend to have large gross fluxes, and small changes in these fluxes (relative to mean uptake/respiration) can have a large impact on net flux.Thus, the BIC successfully identifies variables with known strong correlations to NEE.
The magnitude and sign of the drift coefficient ( β) for evapotranspiration in relation to NEE is consistent among the three biospheric models (Table 3) indicating that evapotranspiration has a similar importance in each of the models or at least the spatial distribution of evapotranspiration is correlated similarly to the spatial distribution of NEE.In contrast, even though the percent cover of MDBF was selected as a significant variable explaining NEE across models, its weight (i.e., drift coefficient) varies, with the greatest weight observed for CASA GFEDv2 and the lowest weight in SiB 3.0.The sign on the drift coefficients are consistent, however, with MDBF being associated with an overall uptake of CO 2 across models (Table 3).
In the analysis conducted with SiB fluxes, the GR approach selected the fraction of photosynthetically active radiation (fPAR), leaf area index (LAI), downward short-wave radiation, soil moisture, and Q 10 , in addition to evapotranspiration and percent cover by MDBF, as having the greatest explanatory power for modeled NEE (Table 3).This is in contrast to the variables selected as having a significant relationship to CASA's NEE fluxes, which include the enhanced vegetation index (EVI), normalized difference vegetation index (NDVI), precipitation, and the percent land cover by croplands.While precipitation and percent cover by cropland were also selected for CASA GFEDv2, temperature, Q 10 , and land cover type appear to be more highly correlated with modeled NEE than in the other two models.Some of the differences in variables selected for a given model may be less important (e.g., LAI and fPAR, versus EVI and NDVI) because these variables are similar.What is interesting is that very few variables were commonly selected in CASA and CASA GFEDv2, even though the same base model is used for each (Table 3).In the modifications of CASA to account for fire, the mortality of woody vegetation in CASA GFEDv2 is scaled with the amount of tree cover.Therefore, a cell with high percentage tree cover will have higher mortality than, for example, an open grassland (van der Werf et al., 2003).Mortality rates due to fire have an indirect effect on respiration and productivity, and thus, might explain the higher apparent correlation of CASA GFEDv2 fluxes to percent land cover relative to CASA or SiB 3.0.
Q 10 was selected for both SiB 3.0 and CASA GFEDv2 as having a significant relationship to NEE.However, in SiB, Q 10 is correlated with uptake or sink in CO 2 in SiB, while in CASA GFEDv2 it is linked to an overall release (Table 3).For CASA GFEDv2, the drift coefficients of Q 10 and evapotranspiration are highly correlated, and therefore their impact cannot be assessed independently.In examining the net apparent impact (X β) of temperature on NEE in CASA GFEDv2 (not shown), temperature accounts for the greatest net uptake of CO 2 , more than twice the magnitude of any other variable.Conversely, in SiB, evapotranspiration and Q 10 are correlated with the greatest overall uptake and downward shortwave radiation with the largest release of CO 2 from the land to the atmosphere.
The amount of NEE's spatial variability explained by the selected environmental variables for each model is shown in Fig. 5 using experimental variograms (e.g., a smoothed variogram generated by averaging the raw variogram over consecutive separation distance intervals, similarly as for a histogram).Included in Fig. 5 is the experimental variogram resulting if only those variables commonly selected across models (i.e., evapotranspiration and MDBF) are used in the trend, as well as when each model is detrended using all of the variables selected for that model (X β).Evapotranspiration and percent cover of MDBF explain only a small portion of spatial variability in estimated NEE, relative to that explained by the complete model-specific environmental variables selected through BIC.This implies that the modelspecific variables selected for each model are more important for interpreting the predicted carbon flux than are the variables commonly selected for all models.This suggests fundamental differences between the models in terms of the factors that explain their predicted flux.For a given scale and time period, this, in turn, provides an opportunity for comparing the variables selected for each model with those found to be important in explaining real flux observations (e.g., Law et al., 2002;Urbanski, 2007;Mueller et al., 2010;Yadav et al., 2010), thereby providing a potential new avenue for model evaluation and improvement.
To further examine these relationships and to compare the models, a similar analysis was conducted on NEE's component fluxes of GPP and respiration (R E ).Because environmental variables such as water availability and temperature may have complex relationships with each other, as well as photosynthesis and respiration, isolating the GPP and R E estimates from the models may allow for a more robust comparison of modeled output.
Table 3. Selected variables and their associated drift coefficients ( β, µmol m −2 s −1 ) as estimated from geostatistical regression for the summer months of June, July, and August.The values inside the parentheses indicate the uncertainty in drift coefficients (V β ).Values of β and V β are based on normalized variables * .The variance explained (R 2 ) by the trends is provided for each model.Dashes indicate environmental variables that were not selected for a particular model.Note that component fluxes were only available for SiB 3.0 and CASA GFEDv2.

Explanatory variables for modeled gross primary production and respiration
In general, a greater number of environmental variables are consistently identified as being correlated with GPP and R E across models, than were observed in the analysis conducted with NEE.In addition, for a given biospheric model, many of the same environmental variables are correlated to both GPP and R E (Table 3).However, with most variables, the sign of the recovered drift coefficients on GPP and R E is reversed.In the analysis of component fluxes, uptake of CO 2 (i.e., GPP) is denoted with a negative sign, while a positive sign represents release of CO 2 from the land to the atmosphere (i.e., R E ).Those environmental variables that are correlated with an uptake of CO 2 from the atmosphere in the analysis with GPP, are correlated with a source of CO 2 when examined against R E .In CASA, CASA GFEDv2, and SiB 3.0, autotrophic respiration is formulated as being an instantaneous fraction of GPP.Therefore, the similar but opposite relationship of selected environmental variables to GPP and R E may be a result of model formulation rather than a real association with the physiological processes driving R E .However, the heterotrophic contribution to R E could be 50 % or more, and the controls on RH may vary among the models.For ex-ample, in a study by Mueller et al. (2010), which examined the relationship of environmental variables to measured flux from an eddy-covariance flux tower, different variables were found to be correlated with GPP than to R E .Thus, the degree to which R E predictions from biospheric model can be interpreted here, may need to be re-evaluated.Because of this inverse relationship in some environmental variables, their influence cancels out when the GR approach is used directly with model estimates of NEE.This can be seen in Table 3 when the results of the NEE analysis are compared to those of the component fluxes.
From the component flux analysis, temperature is seen to have a stronger correlation to R E in CASA GFEDv2 compared to SiB, which is the reverse of what was observed in the NEE analysis, where temperature was correlated with a net uptake of CO 2 (Table 3).Temperature was not selected in the GPP analysis with CASA GFEDv2.In terms of NEE, temperature may be acting as a proxy for another variable not included in the analysis, or the interaction of variables whose spatial coverage can be captured (in part) by temperature.
Vegetative indices appear to be similarly correlated to GPP and R E in SiB and CASA GFEDv2.While land cover classification appears to have a significant impact in both models, as also seen with NEE, the relationship to flux is stronger in CASA GFEDv2.The incremental impact of land cover is largest for deciduous and mixed broadleaf forests, evergreen and needleleaf forests, and croplands.This relationship between flux and land cover makes intuitive sense.In areas where CO 2 uptake is large (e.g., deciduous and mixed forests or croplands), respiration will likely be large as well.Furthermore, in regions where there are large gross fluxes, there is also the possibility for the largest variability in flux.The converse would be true for relatively unproductive regions, such as shrublands.This, combined with the potential impact of how mortality rates are parameterized in CASA GFEDv2, explains the apparent strong relationship between percent land cover and CASA GFEDv2's prediction of GPP, R E , and subsequently NEE.
The linear combination of selected variables (i.e., X associated with carbon uptake (GPP) and release (R E ) explains approximately 70 % of CASA GFEDv2 flux variability during the summer months and approximately half that amount for SiB 3.0 (Table 3).This can also be seen in Fig. 6, where the variability at smaller scales is only marginally re-duced for SiB when GPP and R E are detrended using the selected variables, whereas the variability for CASA GFEDv2 is reduced at all separation distances.In addition, a significant proportion of the variability explained in both SiB and CASA GFED's GPP and R E by their model-specific GR trends is from those variables commonly selected across both models (Fig. 6).The magnitude of the drift coefficients on these common variables differs between the two biospheric models, however, with larger drift coefficients estimated for CASA GFEDv2 fluxes versus SiB 3.0.
The greater overall ability of the GR models to explain component flux spatial variability (and NEE) in CASA GFEDv2, compared to SiB, likely results for the types of candidate environmental variables used in this study (Table 2).CASA is a diagnostic model that uses remote sensing data as input.Thus, it makes sense that the flux estimates from CASA would be more sensitive to datasets derived from remote sensing information (e.g., vegetative indices, land cover).While SiB fluxes are also sensitive to similar ancillary environmental variables, these parameters explain far less of SiB's spatial variability.Such a finding has several possible implications.If the true flux variability does not significantly correlate with remote-sensing-based products, model estimates that strongly correlate with, and depend on, such datasets (such as those from CASA) could potentially have significant biases.Conversely, more complex models, such as SiB, that derive their own measures of plant phenology based on environmental conditions, depend heavily on their model formulation rather than the variability in observed environmental drivers.Self-derived quantities within such models could also introduce a bias if the process parameterizations do not emulate the behavior of the real processes driving phenological variability.While this type of diversity in models makes intercomparisons difficult, the GR approach helps to highlight the impact of differences caused by alternative model formulations, which can be used to guide further model enhancements.

Conclusions
This study applies a set of quantitative methods, including variogram analysis, variable selection, and geostatistical regression, for comparing biospheric model estimates of NEE and its component fluxes.Using a small subset of biospheric models, these methods are evaluated in terms of their ability to identify overall similarities and differences in modeled fluxes, quantify and compare the scales of variability among the models, and determine the environmental factors explaining the observed variability in fluxes.
Both the variogram analysis and the related h 0 parameter provide information about the degree of flux spatial variability in the different models, beyond what is evident from visual examination and comparison of flux patterns.than both CASA and CASA GFEDv2, particularly during the dormant and transition months of the year, the overall small scale variability in SiB's component fluxes is much less than that of CASA GFEDv2.Moreover, it is these differences in the amount of variability in SiB's R E and GPP that drive the overall variability observed in NEE.In addition, as seen with GPP and R E , h 0 is a single diagnostic that allows for the comparison of complex variability, where the influence of variance and correlation length are merged to provide a more informative, and comparable, measure of flux variability over smaller scales.Finally, information about flux autocorrelation helps to inform the regression analysis used to correlate modeled NEE to common climatic and environmental parameters.
The GR approach, which combines variable selection and geostatistical regression, provides a means to identify and compare the environmental variables that are most significant in explaining modeled flux spatial variability.The approach highlights those environmental variables that correlate to flux as predicted by each examined model, and provides a means for identifying the strength of the relationship between these variables and predicted flux.This, in turn, provides an opportunity for comparing the variables selected for each model with those found to be important in explaining true flux observations (e.g., eddy-covariance flux tower measurements or inventory-based estimates).For example, in a study conducted by Mueller at al. (2010) examining the factors explaining temporal flux variability at the University of Michigan Biological Station eddy covariance site, different environmental variables were found to be correlated with GPP than to R E .This is contrary to the results of this analysis, which found the biospheric model estimates of GPP and R E to be sensitive to similar sets of environmental variables.This inconsistency between the sensitivity of modeled and observed fluxes indicates that, for the examined models, predictions of R E appear to reflect a scaling of GPP, rather than sensitivity to environmental variables that control respiration directly.Whether this type of relationship is reasonable requires further investigation with other models and flux measurements, and provides a new avenue for model-data intercomparisons.
Furthermore, the GR analysis can be used to draw inferences about how differences in model formulation translate into observed differences in flux variability across models.For example, CASA GFEDv2's method of scaling mortality with percent tree cover appears to strongly influence its flux distribution.As a result, the spatial variability of carbon exchange across North America in CASA GFEDv2 is strongly correlated to land cover.In addition, SiB's model formulation appears to be far more important in explaining the spatial variability of fluxes than environmental factors such as temperature, precipitation, or land cover.
While the methods presented here can point to key differences between models, they cannot tell which model is more correct.For example, the regression approach presented cannot identify exactly which processes within the models are formulated realistically versus those that are modeled incorrectly.However, using the types of quantitative model evaluation methods presented here, in conjunction with existing scientific and modeling understanding provided by model developers, can help to improve process-based biospheric models through the comparison of flux across a range of biospheric models.These methods provide a systematic way to evaluate how models differ in their spatial representation of surface flux and how flux correlates with common environmental drivers.
Statistical methods, such as the ones applied here, represent powerful tools for comparing biospheric model estimates of flux.Care must be taken in implementing and interpreting results from statistical analyses, however, because results can be impacted by the environmental variables considered and the spatiotemporal scale of the analysis (e.g., 1 • by 1 • spatial and monthly temporal resolution).Therefore the analysis should be set up to ensure that key variables or derived parameters needed to explain modeled flux variability are included in the analysis.The effect of missing environmental variables could be aliased onto other variables, thereby acting as a proxy for the true correlation.In addition, the conclusions have to be interpreted at the scale at which the analysis was performed, both in terms of the time period covered (e.g., summer months), as well as the native spatial resolution of the models.
Overall, the set of tools presented here can be used to elucidate differences between existing model estimates of carbon flux, a task that has historically been difficult unless sensitivity tests were performed with standardized simulations and prescribed forcing data.The results show that both variogram analysis and GR help to improve our understanding of model differences caused by alternative model formulations, which can be used as a guide for model enhancement, as well as for reconciling difference in modeled estimates of landatmosphere carbon exchange.As such, tools such as those presented here provide an opportunity to improve large-scale biospheric model intercomparison studies.

Fig. 4 .
Fig. 4. 2002 (Summer June, July, and August) experimental variograms for net ecosystem exchange (NEE), gross primary production (GPP), and ecosystem respiration (R E ).Component fluxes were not available for CASA.Note the different range on the yaxes.

Fig. 5 .
Fig. 5. 2002 Summer (June, July, August) experimental variograms for net ecosystem exchange (NEE) of original fluxes as provided by the biospheric models (solid-line), as well as for the residual (unexplained portion of NEE) using only those variables commonly selected across models (dashed-line); and all of variables selected that best explained predicted NEE for a specific biospheric model (dotted-line).

Fig. 6 .
Fig.6.2002 Summer (June, July, August) experimental variograms for gross primary production (GPP) and ecosystem respiration (R E ) of original fluxes as provided by the biospheric models (solid-line), as well as for the residual (unexplained portion of flux) using only those variables commonly selected across models (dashed-line); and all of variables selected that best explained predicted flux for a specific biospheric model (dotted-line).Component fluxes were not available for CASA.

Table 1 .
Comparison of environmental drivers, vegetation and soil distribution, phenology, compartments and photosynthetic and soil carbon decomposition formulations among models.

Table 2 .
Environmental variables considered in variable selection.
Variables are normalized by subtracting the mean and dividing by the standard deviation, making the normalized variable unitless.Therefore, the units on the drift coefficients become µmol m −2 s −1 to correspond with flux. * The results show that while SiB exhibits greater variability in NEE www.biogeosciences.net/8/1579/2011/Biogeosciences, 8, 1579-1593, 2011