Improving North American terrestrial CO 2 flux diagnosis using spatial structure in land surface model residuals

We evaluate spatial structure in North American CO2 flux observations using a simple diagnostic land surface model. The vegetation photosynthesis respiration model (VPRM) calculates net ecosystem exchange (NEE) using locally observed temperature and photosynthetically active radiation (PAR) along with satellite-derived phenology and moisture. We use observed NEE from a group of 65 North American eddy covariance tower sites spanning North America to estimate VPRM parameters for these sites. We investigate spatial coherence in regional CO 2 fluxes at several different time scales by using geostatistical methods to examine the spatial structure of model–data residuals. We find that persistent spatial structure does exist in the model–data residuals at a length scale of approximately 400 km (median 402 km, mean 712 km, standard deviation 931 km). This spatial structure defines a flux-tower-based VPRM residual covariance matrix. The residual covariance matrix is useful in constructing prior fluxes for atmospheric CO 2 concentration inversion calculations, as well as for constructing a VPRM North American CO2 flux map optimized to eddy covariance observations. Finally (and secondarily), the estimated VPRM parameter values do not separate clearly by plant functional type (PFT). This calls into question whether PFTs can successfully partition ecosystems’ fundamental ecological drivers when the viewing lens is a simple model.


Introduction
The rapid carbon dioxide (CO 2 ) accumulation in Earth's atmosphere in the second half of the 20th century (Conway et al., 2009) has been partially offset by natural biogeochemical processes.Without these buffers, atmospheric CO 2 could accumulate twice as fast: of the roughly 9 Pg of carbon humans release each year by burning fossil fuels (Boden et al., 2012), only roughly half remains in the atmosphere as carbon dioxide (Denman et al., 2007).The rest is absorbed by oceans through air-sea gas exchange or by terrestrial and marine flora through net primary production (NPP; Denman et al., 2007).Terrestrial biological fluxes of CO 2 through photosynthesis and respiration constituted a net sink from the atmosphere of 2 to 3 Pg C per year during the 1990s (Le Quéré et al., 2009), and they exhibit higher interannual variability than oceanic fluxes (Bousquet et al., 2000;Le Quéré et al., 2009).Understanding terrestrial fluxes is crucial to understanding and predicting the increase in atmospheric CO 2 caused by anthropogenic emissions.
Though the net global flux of CO 2 to the atmosphere is well constrained (Tans and Conway, 2005), continental biological CO 2 fluxes are not well characterized, and their drivers are, so far, poorly understood.Diagnostic skill at interannual time scales is poor: land surface models consistently fail to capture observed CO 2 flux interannual variability (e. g., Friend et al., 2007;Ricciuto et al., 2008;Prentice et al., 2000).Predictive skill is also poor: a sampling of terrestrial flux models project terrestrial sink strengths for the year 2100 that vary widely in magnitude and sign (Friedlingstein et al., 2006).

T. W. Hilton et al.: Spatial structure in land surface model residuals
One way to improve model performance is to collect additional data to constrain either model representations of ecological processes (that is, model structure) or model parameters toward the ultimate goal of constraining terrestrial flux estimates.These data sources include direct eddy covariance flux observations, observed atmospheric CO 2 concentrations coupled with atmospheric transport models, and land surface models.Model-observation residuals are another such data source.Here we seek to use spatial structure in these residuals to derive new constraints on surface fluxes.
Land surface models integrate ecological and meteorological drivers into a quantitative biological carbon flux estimate for some land region.They are useful because they can be used to extrapolate over large scales.Direct observation footprints of even the most spatially dense CO 2 flux observation networks cover only a tiny fraction of the land areas they span.For example, even with seven eddy covariance (EC) towers in a roughly 50 km × 75 km area, Goulden et al. (2006) estimate that they directly observe less than 0.01 % of that space.Land surface models estimate fluxes where direct observations do not exist.Improving model diagnoses of the magnitudes and drivers of terrestrial fluxes is a necessary step toward improving overall predictive skill.
Atmospheric inversion calculations (e.g., Rayner et al., 1999;Gurney et al., 2002;Rödenbeck et al., 2003;Peters et al., 2005Peters et al., , 2007) ) offer one approach to use observed atmospheric CO 2 concentrations coupled to an atmospheric transport model to further constrain terrestrial CO 2 flux diagnoses.This approach usually divides the planet into regions, and within each region typically solves for a correction to a prior flux estimate from a land surface model (e.g., Gurney et al., 2002).In regions of the world where CO 2 concentration observations are scarce, there is little information with which to correct the prior flux, and the resulting flux estimations are therefore heavily dependent on the prior.Rödenbeck et al. (2003) introduced a prescribed isotropic spatial covariance to the prior, choosing a spatial correlation scale of 1275 km based on the average scale of autocorrelation among four different land surface models examined by McGuire et al. (2001).Peters et al. (2005Peters et al. ( , 2007) ) propagated inferred surface fluxes forward through time instead of using a prior flux estimate calculated offline for each time step before beginning the inversion calculations.They also used an ensemble Kalman filter to estimate a surface flux spatial covariance matrix.This relies on the assumption that flux errors are independent at weekly time scales and at spatial regions of 25 to 50 % of each continent (Peters et al., 2005), an assumption that is conventional, though most likely not strictly accurate (Peters et al., 2005).Jacobson et al. (2007a,b) describe an inversion approach that does not rely on prior fluxes on the grounds that modeled regional prior fluxes must either be assumed to be independent or treated as spatially correlated with an explicit spatial structure.In reality they are often correlated, though the quantitative correlations are unknown (Jacobson et al., 2007b, auxiliary materials).Jacobson et al. (2007b) show that this assumption of independence results in overconfident flux estimates.By eschewing prior flux estimates the Jacobson et al. (2007a,b) study avoids this pitfall, but at the cost of ignoring the knowledge of ecosystem behavior encapsulated in the flux model: the resulting posterior flux uncertainties are much larger than when modeled prior fluxes are included.That is, removing the information provided by a land surface model removes a significant constraint from the estimation.Michalak et al. (2004) propose an inversion method that uses an assumed spatial correlation structure among surface fluxes, rather than estimates of prior fluxes themselves, to constrain an inversion. Mueller et al. (2008) present an implementation of that geostatistical method, imposing a spatial covariance structure derived from net ecosystem production (NEP) estimates obtained from the CASA model (Randerson et al., 1997), while Gourdji et al. (2008) extend the analysis to include meteorological and economic ancillary data sources as constraints.Gourdji et al. (2010) present results for the geostatistical inversion method applied to North America for the year 2004, examining the impact of spatially dense data within a relatively small region on the method's ability to recover continental model-generated CO 2 fluxes.Like Jacobson et al. (2007a) these studies also do not include prior surface flux estimates, again avoiding the errors introduced by incorrect assumptions regarding the priors but at the cost of ignoring the information a land surface model can provide.
Ideally, a flux diagnosis method would integrate all available sources of information.Here we focus on extracting information from a land surface model, while minimizing the overconfidence-producing assumptions demonstrated by Jacobson et al. (2007a,b).Model-data residuals are the combined effects of flux observation errors, model structural error, and natural variability uncaptured by the flux model.The spatial behavior of model residuals sheds light on the processes that drive fluxes.In the following analyses we will use that information to produce a data-derived quantification of the model-data residuals and their spatial structure that may be used to constrain the fluxes.
Terrestrial flux drivers (meteorological and ecological) do not appreciably vary at scales of, say, one centimeter; therefore model residuals should be correlated within some distance, however small.That distance is an upper bound on the area that a model result for a single spatial point can illuminate.Without a quantitative method for determining a correlation structure for model residuals, it is convenient to assume model residuals are independent and identically distributed (i.i.d.) in space and time.For example, Pacala et al. (2001), Peylin et al. (2002), Gurney et al. (2002), andPeters et al. (2005) adopt this assumption in their inversions.In fact, inversions that solve for corrections to regional prior fluxes intrinsically assume that prior flux residuals are correlated within the time scale and spatial scale of the inversion (Rödenbeck et al., 2003;Michalak et al., 2004).If they are not, the inversion applies a uniform correction to a group of uncorrelated residuals, creating a source of error (Chevallier et al., 2006).As noted, Rödenbeck et al. (2003) impose a prior flux uncertainty spatial correlation length scale of 1275 km.They base that distance on the autocorrelation length scales of the four models used by McGuire et al. (2001).This depends on the assumption that the NEE range among those four models is representative of flux model uncertainty (Rödenbeck et al., 2003).Furthermore, Michalak et al. (2004) point out that spatial structure, if existent, contains information that constrains fluxes.A spatial covariance matrix also identifies regions whose errors are strongly correlated.These correlated areas may then be assigned lesser weights than uncorrelated areas to quantitatively acknowledge that their results are, to some degree, redundant.Michalak et al. (2005) present a maximum likelihood (ML) approach for quantitatively estimating spatial covariances for prior flux error from model output, noting in reference to prior error covariances that "it is recognized that these parameters are crucial to the inversion".
We can improve on existing flux diagnoses by deriving directly from observed fluxes a residual covariance matrix to characterize the spatial behavior of flux model residual correlation.A necessary (and independently useful) prerequisite for estimating a model's residual covariance matrix is an estimation of the spatial scale at which the model's residuals are correlated.Here we present an analysis of the residual covariance matrix of VPRM (Mahadevan et al., 2008), a simple land surface model.
We test the hypothesis that VPRM model residuals are spatially correlated at length scales smaller than the North American continent but larger than an individual EC tower footprint.Analyzing the spatial scale of VPRM residual correlation will provide that length scale.The AmeriFlux and Fluxnet Canada networks of EC towers provide observations that allow us to directly analyze the spatial behavior of VPRM residuals.If that correlation length scale proves larger than the tower footprints, it will prove that the network of EC flux towers in North America has sufficient spatial span and density and has collected enough data across time to empirically define a flux model residual covariance matrix.

Land surface model
The vegetation photosynthesis and respiration model (VPRM) of Mahadevan et al. (2008) is a simple diagnostic terrestrial flux model.In spite of its simplicity, VPRM captures daily and annual cycles in CO 2 fluxes reasonably well (Mahadevan et al., 2008).VPRM structure and skill are described in great detail by Mahadevan et al. (2008).Here we provide a brief overview of the model structure.
VPRM models net ecosystem exchange (NEE) as the sum of a photosynthetic component (gross ecosystem exchange, GEE) and an ecosystem respiration component.GEE is modeled via the equation PAR is observed photosynthetically active radiation and EVI is the satellite-derived enhanced vegetation index (Huete et al., 2002).PAR 0 and λ are user-supplied parameters.P scale , W scale , and T scale are dimensionless scaling terms.They each take values between 0.0 and 1.0 and are defined as follows.
P scale is satellite derived and describes the impacts of leaf expansion and senescence on canopy-scale photosynthesis.P scale is defined as 0.0 outside of the growing season, 1.0 during growing season peak greenness, and is a linear function of the satellite-derived land surface water index (LSWI) (Xiao et al., 2004a) during leaf expansion and senescence: W scale is satellite derived and describes canopy moisture, and is defined as with LSWI max the growing-season maximum LSWI value.
The value of the third dimensionless scaling term T scale is taken from literature and describes the relationship between photosynthesis and temperature.T scale is defined as with T air temperature, and T min , T max , and T opt the minimum, maximum, and optimum temperatures for photosynthesis, respectively.T scale is calculated from literature values of T min , T max , and T opt because the strong correlation between surface temperature and PAR results in numerical instability when both T scale and PAR 0 are estimated from flux observations (Mahadevan et al., 2008).P scale , W scale , and T scale , by definition, vary in both time and space (Mahadevan et al., 2008, Eqs. 6, 7, 8).
Respiration (R) is modeled as a linear function of observed air temperature (T ): with user-supplied parameters α and β.NEE is the difference between the photosynthetic flux and the respiration flux: (1), ( 5), and (6), λ governs the slope of the light-response curve (the relationship between photosynthetic CO 2 flux and PAR).α defines the slope of the respiration response to temperature.PAR 0 defines a half-saturation value for photosynthesis.That is, it specifies a PAR value at which further increases in PAR no longer enhance photosynthesis, as other limiting factors become dominant.VPRM places a PFT-specific floor T low , 1 • C ≤ T low ≤ 5 • C, on surface temperatures.Temperatures below T low are raised to T low when calculating respiration.β thus specifies a minimal level of respiration that occurs regardless of air temperature.
In its simplicity, VPRM offers two important advantages over more complex models.First, it has only four userdefined parameters and is computationally inexpensive.This makes parameter estimation via data assimilation methods that do not require parametric assumptions computationally tractable.Second, as inputs, VPRM requires only air temperature, photosynthetically active radiation (PAR), and satellite-derived vegetation and moisture indices.Temperature and PAR can come from site-level observations when modeling a point or, if the model is to be run globally, from gridded reanalysis products.VPRM can thus be run globally, with no need to compile temporally filled meteorological driver data.These advantages make VPRM a useful tool both for producing diagnostic regional flux maps as well as for evaluating spatial scales of model residuals in the manner of Chevallier et al. (2006).

Data
To constrain VPRM parameter values and examine NEE residuals, we use data from 65 North American eddy covariance flux towers.Figure 1 shows the sites on a map, and Table 1 lists the sites and dominant plant functional type (PFT).These data are part of the 2007 Fluxnet Synthesis Dataset (http://www.fluxdata.org).For each site, this dataset contains CO 2 flux, air temperature, and PAR observations at 30 min intervals, as well as many other quantities not needed for VPRM.
The Fluxnet dataset contains gap-filled NEE, as well as non-filled NEE.Structurally, VPRM does not consider driver data or flux results from previous time steps (Eq.6).VPRM simply does not report an NEE at timesteps where the required driver data are not available.In light of this, and to reduce potential residuals due to gap filling, we use the non-filled data.
The 65 observation sites cover 9 of the 17 PFTs of the International Geosphere-Biosphere Programme (IGBP) land cover classification scheme (Loveland and Belward, 1997)  savannas, water, cropland/natural vegetation mosaic, urban and built-up, snow and ice, and barren or sparsely vegetated.
Site phenology, land surface water, enhanced vegetation index (Huete et al., 2002), and land surface cover type are calculated from reflectances measured by the NOAA MODIS instrument, orbiting with the NASA Terra satellite since 2000 and the NASA Aqua satellite since 2002.Oak Ridge National Laboratory extracts MODIS data for many Fluxnet tower sites and makes them available on the World Wide Web (ORNL DAAC, 2007).
Table 1.65 North American eddy covariance sites used to parameterize VPRM and calculate VPRM flux errors.PFTs are taken from the International Geosphere-Biosphere Programme (IGBP) land cover classification scheme (Loveland and Belward, 1997) We examine the time period 2000 to 2006, bounded in 2000 by the MODIS instrument launch and in 2006 by eddy covariance flux availability.

VPRM parameter estimation
VPRM has four user-estimated parameters that may depend on the location being simulated: λ, PAR 0 , α, and β.In this section we describe how we estimated these parameter values.
We seek the parameter values that cause VPRM NEE to match observed NEE as closely as possible.We chose to minimize the sum of squared errors (SSE; we define VPRM residuals as NEE VPRM minus NEE observed ).If the residuals are normally distributed with a constant standard deviation (i.e., homoscedastic), minimizing SSE is equivalent to a maximum likelihood estimate (Hilborn and Mangel, 1997).
In reality, flux model residuals are neither independent nor identically distributed.A double exponential distribution describes EC observation error better than the normal distribution (Richardson et al., 2006).EC observation error is also proportional to NEE magnitude and wind speed (Richardson et al., 2006;Lasslop et al., 2008).Thus the strong daily and seasonal cycles of NEE cause EC observation errors to be temporally autocorrelated as well as heteroscedastic (Lasslop et al., 2008).Lasslop et al. (2008) suggest a maximum likelihood approach for estimating flux model parameters; however, the method focuses on eddy covariance observation error.The combined impact of land surface model structural error, incorrect parameter values, and natural variability -microscale variations in climate, ecosystem behavior, etc. -may also exhibit statistically significant autocorrelation (Ricciuto et al., 2008).We can approximate distributions for each of those error sources from published literature; therefore the full likelihood function may be written out as the integrated product of likelihood functions for several different statistical distributions.Reducing that integral to an analytical solution, however, is beyond the scope of this study.
NEE varies on a number of different time scales (e.g., daily, annual) and space scales (e.g., local land use and PFT heterogeneity, larger regions that experience similar climate patterns).An ideal land surface model parameter estimation method would allow parameter values to vary at space and time scales matching the ecological variations in NEE.Optimizing parameter values in short time intervals and small spatial windows would run the risk of overfitting as well as incur unnecessary computational cost.We chose to examine three temporal and three spatial windows for SSE minimization -in time: monthly, annual, and all available data; and, in space: individual sites, sites grouped by PFT, and all sites together.This approach yields nine different parameter sets, ranging from 4 to more than 21 000 parameter values.Table 2 summarizes the nine parameter sets.There is not a clear consensus in the literature regarding the "optimal" space-time grouping of observations for parameter estimation; this almost certainly varies according to the modeling goal.This study takes no position on the fitness of the various parameterizations considered for a particular modeling task.For our main goal of determining whether model residuals are correlated at spatial scales longer than a few kilometers, it is sufficient to demonstrate spatial correlation, or lack thereof, across several different model parameterizations.
To search for parameter values that minimize SSE we used differential evolution (DE) (Price et al., 2006).DE is a genetic optimization algorithm that is both fast and more reliable in identifying a global optimum compared to gradient-based minimization algorithms.We used the DEoptim package (Ardia and Mullen, 2009) for the R language and platform for statistical computing (R Development Core Team, 2007).
The Markov chain Monte Carlo (MCMC) optimization approach (e. g., Metropolis et al., 1953) offers the advantage of delivering probability density functions (PDFs), and thus parameter uncertainties, rather than the single optimal value estimates provided by DE.Because of the substantially increased computational expense and the lack of a statistically robust residual likelihood function, we chose to employ DE to obtain the point estimates presented here.
Among the calibration sites used by Mahadevan et al. (2008), alpha varies between 0.051 for a permanent wetland to 0.234 for a spruce forest in Canada.Conducting an optimization of another model based on light-use efficiency (LUE), Lasslop et al. (2010) allowed the LUE parameter to vary between 0.0 and 0.22.With a similar variety of biome types represented by our sites, we expect similar values for λ.Schaefer et al. (2012) suggest that among LUE-based models the LUE parameter (λ, for VPRM) is more influential than other parameters on the photosynthesis side of the ledger in determining GPP; we might therefore place more scrutiny on optimized λ values than on PAR 0 values.Some of our empirically derived λ estimates (reported later) are somewhat outside the physically realistic bounds determined by field observations; we chose to allow these values due to their importance in simulating GPP and the relative simplicity of VPRM structure.

Quantifying spatial structure
The spatial covariance structure quantifies the spatial structure (or lack thereof) for an arbitrary function of space.The semivariogram offers a concise visual summary of the covariance structure.The spatial functions of interest here are VPRM NEE residuals, VPRM NEE, and observed NEE.This section defines the semivariogram and describes its typical behavior for geophysical quantities.
The semivariogram (γ ) is generically defined (Cressie, 1993) as where s i and s j are two locations in space, h is the distance between s i and s j , var denotes variance, and Z is some function of location-air temperature, VPRM NEE residual, etc.
If s i and s j are near one another, one might expect Z(s i ) and Z(s j ) to have similar values, causing γ to be correspondingly small.As h increases, Z(s i ) and Z(s j ) typically diverge, and the value of γ increases.At some sufficiently large h, Z(s i ) and Z(s j ) can become independent, causing γ to level off.
The value of h where the leveling-off occurs is known as the range, and the value of γ at this leveling-off is known as the sill.The range estimates the length scale of spatial correlation in Z.These easily visualized semivariogram features are formal parameters (range, φ; sill or variance, σ 2 ) of the covariance function.
In the same way that the population mean provides a statistical estimator for a population's expected value, there is a statistical estimator to calculate an empirical semivariogram ( γ ) from a set of observed data (Cressie, 1993): where N (h) is the number of location pairs separated by distance h, and the γ notation distinguishes the estimated semivariogram from the theoretical definition of Eq. ( 7); other terms are defined above.The separation distance h may be a precise distance for a single pair of locations, or may be an aggregated separation distance for a number of pairs of locations.
In this study we use the "robust" semivariogram estimator of Cressie and Hawkins (1980).This estimator includes a correction term for nonnormally distributed data, and also reduces the impact of outlying data: Purely mathematically, Eq. ( 7) requires the semivariogram to equal zero at h = 0 because Z(s i )−Z(s i ) = 0.In practice, measurement errors cause repeated measurements at a single location to differ.Moreover, measurements are not made at infinitesimally small separation distances.There is no information about γ (h) at distances below the minimum separation distance h present in the data.This unknown behavior at small h is sometimes called microscale variation.When microscale variation or measurement error are present, γ (h) does not approach zero as h approaches zero.The value of γ (h = 0) is known as the semivariogram nugget, denoted by τ 2 .Together, the sill, range, and nugget characterize the semivariogram and yield much information about the spatial structure of Z.
In addition to providing the length scale of spatial correlation for Z, the semivariogram also specifies the spatial covariance of Z. Specifically, with cov denoting covariance, which expresses the spatial covariance of VPRM residuals in terms of available quantities: the first two terms on the right-hand side of Eq. ( 10) are the variance within individual sites and the last term is the semivariogram.
Covariance parameters φ, σ 2 , and τ 2 may be estimated directly from spatial data via maximum likelihood estimation (MLE) by maximizing the log-likelihood function (Diggle and Ribeiro Jr., 2007): with the covariance matrix σ 2 R(φ) + τ 2 I expressed in terms of φ (range), σ 2 (sill), and τ 2 (nugget).The notation R(φ) allows for parametric covariance families that have parameters in addition to sill, range, and nugget (e.g., the Matérn family).The identity matrix I specifies that the nugget τ is defined only at a point, not point to point (i.e., the off-diagonal terms of the matrix are zero).The residual matrix (y −Dβ) is the difference between observations y and a model structure given by Dβ with model explanatory variables D and model parameters β.MLE is the preferred approach for formal covariance parameter estimation (Diggle and Ribeiro Jr., 2007) in large part because it considers the full set of available data rather than relying on the summary provided by an empirical semivariogram.
Fitting a parametric covariance function to observed VPRM residuals provides three key outcomes: (i) the range for VPRM NEE residuals; (ii) covariance of VPRM NEE residuals at arbitrary separation distances -that is, a residual covariance matrix; and (iii), via kriging, a VPRM NEE map that explicitly considers VPRM NEE residuals (Cressie, 1993).Figure 2  pure nugget covariance function.The exponential covariance function is one example of a model describing a spatial field that is correlated in space to a certain distance and uncorrelated beyond that distance.
For the nine VPRM parameterizations of Sect.2.3 we calculated seasonal mean VPRM residuals.We defined seasons as December-January-February (DJF), March-April-May (MAM), June-July-August (JJA), and September-October-November (SON).Within each season we maximized the negative log likelihood (Eq.11) to estimate covariance parameters for both the pure nugget as well as exponential covariance functions (Fig. 2).We then compared the pure nugget and exponential fits using Akaike's information criterion (AIC) (Akaike, 1976).This experiment determines whether the observed VPRM NEE residuals are better described as covarying in space at some length scale (the exponential covariance model) or as spatially independent even at minimal distances (the pure nugget model).We follow this experiment with two pseudodata experiments to assess the tendency of 65 observation locations spread across North America and our AIC test to choose the exponential covariance function when no spatial covariance is present, or to choose the pure nugget covariance model when the underlying field was generated from an exponential covariance model.The rest of this paper describes our parameterization of VPRM and our analysis of VPRM NEE residual spatial structure.

VPRM parameterization
As described in Sect. 2.3 and Table 2, we calculated VPRM parameter values for nine different groupings of those sites (three in space: individual sites, plant functional types, and all sites together; three in time: monthly, annual, and all available data), conditioned on observations from 65 North American eddy covariance sites (Table 1, Fig. 1).
Figure 3 shows the distribution of VPRM parameter values when estimated monthly within each PFT.Parameter distributions across PFTs for the other eight parameter sets in Table 2 are nearly identical to Fig. 3.The parameter distributions are similar to those of Mahadevan et al. (2008).Most striking in Fig. 3 is the failure of the parameterization to distinguish among plant functional types.
It is perhaps unexpected that VPRM parameters do not cluster by plant functional type.For example, one might expect that the model parameter estimates of a boreal needleleaf forest should be different from a cropland, for example.Turner et al. (2003) and Ruimy et al. (1994) both found that light-use efficiency, calculated as observed GPP divided by observed PAR, varied across different biomes.This suggests that VPRM's λ parameter should vary similarly across PFTs.There is, however, also evidence that light-use efficiency (LUE) is not consistent within PFTs (Ruimy et al., 1994;Schwalm et al., 2006), particularly at daily time scales (Schwalm et al., 2006).Another recent study assumes that maximum LUE is constant across PFTs (Yuan et al., 2007).Schwalm et al. (2006) also suggest intra-PFT LUE varies less at annual time-scales than at daily scales.The values of λ (the VPRM LUE parameter) in Fig. 3, relatively invariant across different PFTs, differ from the results of Schwalm et al. (2006).More recently, Groenendijk et al. (2011) estimated light-use efficiency parameter values by PFTs for a similar photosynthesis model and found similar values among croplands, and multiple forest types (evergreen broadleaf, evergreen needleleaf, deciduous broadleaf, and mixed), while the values for savannas and grasslands stood out; these parameterization results are similar to those presented here (Fig. 3).Kuppel et al. (2012) also found that multi-site parameterizations were able to reproduce site-level photosynthesis and respiration roughly as well as single-site parameterizations.The VPRM respiration parameters α and β presented here also do not vary much across PFTs; this is consistent with previous studies indicating that PFTs are not predictive of soil respiration (Raich and Tufekciogul, 2000;Bond-Lamberty et al., 2004).
The similar parameter values in Fig. 3 could be a consequence of VPRM's simplicity; perhaps a two-equation model that takes climatology and phenology from satellite observations is only able to separate landscapes into "greenphotosynthesizing" and "brown -not photosynthesizing."These results offer hints; investigating the question rigorously would require parameter PDFs to ascertain whether the differences in Fig. 3 are significant.That investigation should also compare model fluxes across different parameterizations.It is possible, for example, that the remote sensing data that drive VPRM are sufficient to separate the NEE of different plant functional types without large parameter differences.The question is intriguing, however.If PFTs truly are not important for NEE diagnosis and prediction, the task of estimating model parameters becomes much simpler: land surfaces may then be simply classified as "green" and "not green."

VPRM NEE residual spatial structure
Qualitatively inspecting the shape of an empirical semivariogram gives an intuitive sense for a function's spatial covariance.Figure 4 plots binned semivariograms for JJA mean VPRM NEE residuals.There is one curve for each of the nine VPRM parameter sets considered (Table 2); each point shows the mean semivariance within a 300 km bin.
The nugget is small for the site-specific parameter sets (black curves), and varies from 1 to 3 (µmol CO 2 m −2 s −1 ) 2 , for the other six parameter sets.In units of standard deviation, these six nuggets equal roughly 2.0 µmol CO 2 m −2 s −1 .The nugget represents the combined influence of variations at spatial scales smaller than the minimum separation distance as well as the contributors to VPRM residuals (EC observation error, VPRM structural error, natural variability; see Sect. 2.3).
In general, the semivariances for each parameter set increase from separation distances of 0 to roughly 800 km, and level off or decrease thereafter.This suggests that VPRM NEE residuals are correlated at distances up to 800 km.We quantify this in the following results.
We are interested primarily in the parameters (range, sill, nugget) of the covariance function that best describes the VPRM residuals.To estimate these parameters, we employ maximum likelihood estimation (MLE, described by Eq. 11).
We fit both pure nugget and exponential covariance functions (whose characteristic semivariograms are shown in Fig. 2) to each of the nine sets of VPRM residuals summarized by the binned semivariograms in Fig. 4. Within each VPRM parameter set we selected either the best-fit pure  2. The left vertical axis shows units of semivariance ( γ ), and the right vertical axis shows units of standard deviation (σ ).σ is related to γ by σ = (2 γ ) 1/2 .The parameter sets with fewer parameters show larger semivariances, reflecting their poorer fit to the data.Visual inspection shows all parameter sets produce strongly increasing semivariance at separation distances between 0 and roughly 750 km with a relative leveling-off at larger separation distances.
nugget covariance function or the best-fit exponential covariance function using AIC (Akaike, 1974).AIC balances goodness of fit (more parameters) against parsimony (fewer parameters).To interpret this result, we must test the adequacy of 65 observation locations across North America (Fig. 1) to detect spatial correlations across hundreds of kilometers.The maximum distance between towers in this group of 65 is 6557 km (US-Atq -US-KS1).To quantitatively test the detection capacity of the dataset we generated 1000 Gaussian random fields (GRFs) on a 6500 × 6500 grid  an imposed exponential covariance structure with a specified range of 402 km (equal to the median VPRM NEE error seasonal covariance range reported in Table 3.) We sampled each GRF at 65 randomly generated locations and estimated exponential and pure nugget covariance function parameters for each sample set using MLE in the same manner that we estimated range values for the VPRM NEE residuals (Table 3).Of the 1000 GRFs, AIC chose the exponential covariance function for only 74.Of those 74, the median estimated covariance range is 936 km; the median estimated covariance range across all 1000 GRFs is 313 km.This dis-tribution of estimated range values is similar to the distribution estimated from the real VPRM NEE residual observations; Fig. 5 plots the two distributions side by side.These results suggest that the estimated range values for VPRM NEE residuals (Table 3) are consistent with a scenario where VPRM NEE residuals have an exponential covariance structure with a range of roughly 400 km, and that 65 observation locations in the United States and Canada are minimally adequate for detecting that structure.That minimal sufficiency is manifested in the imprecision of the median range of 402 km (i.e., the spread around 402 km seen in Table 3) and Fig. 5.  3) along with values from 1000 Gaussian random fields (GRFs) with similar (and known) covariance structure.The left panel shows the range distribution for only the pseudodata range estimations where the exponential covariance model fit more optimally than the pure nugget; the right panel shows the distribution for all 1000 GRFs and demonstrates that the spread of best-fit ranges from the pseudodata experiment is quite similar to the spread among real VPRM residuals.
The similarity in the spreads of best-fit ranges between the observed VPRM residuals and the pseudodata experiment (seen in Fig. 5, left panel) supports this interpretation.The Fluxnet 2007 Synthesis Dataset contains data from more sites in 2003 and 2004 than the other years of the study period.In this scenario of minimal network sufficiency for detection, it makes sense that 2003 and 2004, the years with the most data, are also the years with the most coherent residual spatial structure (Table 3).
We also must consider the possibility of spurious MLE results: that the observed VPRM NEE residual realization may occasionally be better fit by an exponential covariance structure when the complete spatial field has no true structure.We generated another set of 1000 GRFs, each containing 65 points within a 6500 by 6500 grid, and each specified to have a pure nugget covariance structure.We calculated MLE covariance function parameters for these 1000 fields.AIC chose the exponential function over the pure nugget for only 25 of the 1000 fields, suggesting that we might expect a dataset like our VPRM NEE residuals to produce a spurious exponential covariance structure in only a small minority of realizations considered.
The detection rate of spatial correlation among real VPRM residuals (92/252 = 36%) greatly exceeds both the detection rate among similarly structured pseudodata (74/1000 = 7.4%) as well as the spurious detection rate among pseudodata with no structure (25/1000 = 2.5%).This result suggests that the structure detected in the VPRM residuals is unlikely to be specious; that is, it is unlikely that the true spatial correlation length scale of VPRM residuals is 0 km.These results suggest quantitatively that JJA mean VPRM NEE residuals are spatially correlated at a length scale on the order of 400 km.
Anomalies from the 2000 to 2006 means for annual cumulative VPRM NEE, annual cumulative observed NEE, and annual cumulative VPRM NEE residuals displayed similar spatial scales (Fig. 6).This analysis tests the hypothesis that while NEE itself varies significantly at spatial scales on the order of 10 km (e.g., Desai et al., 2008), NEE interannual variability (IAV) is driven by more climatic phenomena such as temperature and moisture that operate at much larger scales (Law et al., 2002;Desai et al., 2008;Ricciuto et al., 2008).If so, then we should see spatial correlation in annual cumulative NEE obs anomalies (calculated by subtracting annually integrated fluxes from their 2000 to 2006 mean values).If VPRM is able to capture that large-scale variation, then annual cumulative NEE VPRM anomalies will show similar spatial correlation.Any spatial structure that exists in NEE obs anomalies that VPRM fails to capture should appear in VPRM NEE residual anomalies.
Integrated annually, observed VPRM residuals follow roughly a normal distribution (not shown), with a mean of 1.6 gC m −2 yr −1 and a standard deviation of 168.5 gC m −2 yr −1 .This mean annual residual is small relative to the 100 to 300 gC m −2 yr −1 sink typical of a productive North American ecosystem.These results suggest that VPRM performs well at the annual scale.This seems reasonable given that VPRM www.biogeosciences.net/10/4607/2013/Biogeosciences, 10, 4607-4625, 2013 sill [(μmol is driven largely by climate-driven quantities such as temperature, EVI, and radiation, and suggests that VPRM performs sufficiently well at annual time scales to provide insight into spatial correlations of NEE interannual variability. As with the VPRM NEE residual semivariograms, we chose optimal anomaly covariance structures by AIC.Of the seven years examined, NEE obs anomalies show correlation at scales of roughly 1000 km only for 2006 (Fig. 6).AIC chose the pure nugget model for the other years considered.This rate of detection is consistent with the rate of detection of spatial structure among VPRM NEE anomalies (Fig. 6).It is also consistent with the detection rate of the pseudodata experiment (Fig. 5), in which we were able to detect a known exponential covariance structure in only 74 of 1000 attempts.This could indicate that large-scale structure does not consistently exist.It could also suggest that NEE interannual variability could be shaped by larger-scale drivers than is NEE itself, and that our flux tower spatial density is insufficient to consistently detect it in a noisy NEE signal.This seems reasonable; land use, which influences NEE, is markedly diverse throughout the study area.Also, disturbance events that heavily influence NEE (e.g., fire, insects, tree harvest) usually do not impact 500 km stretches of land surface.VPRM is strongly driven by climate variables (Eqs. 1, 5), so spatial structure in VPRM NEE interannual variability could simply reflect large-scale spatial structure in climatic interannual variability.Though VPRM no doubt contains structural error, it is an attempt to combine climatic terms as ecological research suggests they influence NEE.Therefore, we believe it makes sense to investigate this combined effect of several climate terms (that is, VPRM NEE) rather than attempt to explain NEE interannual variability by searching for spatial coherence in a number of climate variables individually.
Because VPRM NEE residuals are simply the difference between NEE obs and NEE VPRM , the spatial behaviors of these three quantities are interrelated.Where spatial structure exists in observations, we expect it to be partitioned among NEE VPRM and VPRM NEE residuals.Results in Fig. 6 from all nine VPRM parameter sets show strong spatial structure in VPRM NEE residuals.This is particularly true among parameter sets with fewer parameters (PFT and all sites in space; annual and all data in time).This structure occurs at length scales similar to the length scale exhibited by NEE obs .Sill and nugget values for NEE VPRM and VPRM NEE residuals are also of similar magnitude to the sill and nugget for NEE obs .VPRM NEE residuals are the combination of NEE observation error, VPRM structural error, and natural variability.Because of its correlation to NEE magnitude (Richardson et al., 2006), we expect the NEE observation error component of VPRM residuals to reflect whatever spatial structure is present in NEE itself.It therefore makes sense that the spatial structure present in NEE obs is not partitioned exclusively into NEE VPRM or VPRM NEE residuals but appears in both.
The covariance sill value provides an estimate of variance.The sill values (Fig. 7) for the annual anomalies of annual cumulative VPRM NEE, annual cumulative observed NEE, and annual cumulative VPRM NEE residuals display standard deviations (Fig. 7, right-hand axis) on the order of the annual cumulative NEE typically observed by an eddy covariance site.This suggests that annual VPRM errors at a single location in space are on the order of the flux at that point.If annual VPRM errors are indeed spatially correlated at length scales of 500 to 1000 km, as suggested by Fig. 6, then spatially aggregating VPRM NEE at that length scale should provide a method to reduce the VPRM error variance.

Discussion
Our findings are relevant to both land surface model upscaling as well as atmospheric inversion studies, though several important uncertainties should guide consideration of our results.

Caveats
Several caveats accompany these implications.The structural simplicity of VPRM allows us to conduct parameter estimations that use many thousands of model evaluations.The designers of VPRM achieve that simplicity by abstracting the broadest drivers of NEE out of what is in reality a complex ecology and by considering only short-term drivers of NEE.Longer-term drivers, such as carbon pools (e.g., Curtis et al., 2002) and disturbance histories (e.g., Thornton et al., 2002), are known to be first-order drivers as well.VPRM is able to credibly partition the contributions of photosynthesis and respiration to observed NEE (Mahadevan et al., 2008).
However, these simplifications caution us against attempting site-specific ecological interpretation of short-term fluctuations in VPRM parameter values, fluxes, and residuals.
In addition, the carbon cycle community's understanding of the statistical properties of land surface model NEE residuals remains rudimentary.Several studies have explored the distribution of NEE observation error (e.g., Richardson et al., 2008Richardson et al., , 2006;;Lasslop et al., 2008).Richardson et al. (2006) find that the observational error to exhibit a double exponential distribution observation error, however, is but one component of NEE model residuals.In the absence of a rigorous likelihood function that integrates all of the sources of uncertainty that contribute to NEE model residuals, we have used the mathematically simple sum of squared NEE residuals to estimate VPRM parameters.Implementing a statistically proper likelihood function is nontrivial and is the subject of ongoing research.
It is possible that VPRM residuals covary differently in the east-west direction than north-south or in different regions of the world, or that plant functional types, site disturbance history, or some other land surface descriptor is of first-order importance.The present spatial density of eddy covariance observations limits our ability to test these ideas.The residual spatial covariance of a more complex model structure may also be different.Computational limitations at this time preclude the rigorous optimization of more than a handful of parameters, so we have chosen to focus our attention on VPRM, whose relatively small number of parameters may be rigorously estimated in their entirety.

Implications
It is critically important to quantitatively tailor NEE model parameter estimates to the domain in which the model is to be run; generic parameter values can reproduce observed NEE poorly (Ricciuto, 2006).Good NEE simulation is crucial to calculating accurate model errors, which are in turn crucial to detecting model error spatial structure.
The spatial length scale of land surface model NEE residual covariance bears directly on atmospheric inversion calculations.Inversions seek to use observed atmospheric CO 2 concentrations to refine estimated biological CO 2 fluxes within a region of defined boundaries, with the estimated fluxes typically coming from models.Intrinsic to the method is the assumption that prior flux errors are correlated within each region treated as a separate unknown (Rödenbeck et al., 2003;Michalak et al., 2004).Moreover, this correlation must be assumed to exist at both the time scale of the inversion as well as the spatial scale of the inversion regions.
Our results indicate strongly that this implicit assumption is valid at seasonal time scales (Table 3) and, for annual anomalies, for annual time scales.The relevant spatial scale is approximately 400 km.Kaminski et al. (2001) and Engelen et al. (2002) describe in detail the aggregation error an inversion incurs when this assumption is violated, and recommend including an error covariance matrix in the inversion to quantitatively de-emphasize areas of large uncertainty and observations in close proximity to one another.Schuh et al. (2009) confirm the issue for a regional inversion, and suggest that post-aggregating a high-resolution inversion is preferable to a coarser scale inversion.The methods developed here derive this covariance structure quantitatively from eddy covariance observations.
The estimated length scale of any model's residual spatial covariance is a function of both the model structure's capacity to estimate NEE as well as the capacity of the North American flux tower network to constrain modeled fluxes.A more complex model structure should reduce model structural error.Each source of model NEE error should have a characteristic length scale, and reducing or changing a particular error source should therefore change the residual covariance length scale accordingly.We therefore expect the magnitude of residual spatial covariance length scale to vary somewhat between different models.The methodology presented here provides a method to estimate error covariance length scale for any land surface model.
The length scale derived here (order 400 km) is smaller than the scale of 1275 km estimated by Rödenbeck et al. (2003), and is based on eddy covariance flux measurements rather than land surface model comparison.Our length scale also contrasts starkly with the conclusion of minimal spatial covariance presented by Chevallier et al. (2006).Potentially incorrect prior flux error covariance assumptions are but one source of error that an inversion must consider.Scarcity of well-calibrated CO 2 concentration observations, for example, pushes inversion calculations toward regions larger than 1000 km (e.g., Butler et al., 2010).The North American Carbon Program's Mid-Continental Intensive (MCI) region is a notable exception to this scarcity of CO 2 concentration observations (Lauvaux et al., 2011), and presents an opportunity to investigate the impacts of prior flux error covariance assumptions more deeply.Solving for too many regions in an inversion (that is, too many unknowns) risks overfitting the data, and solving for too few risks oversimplifying the inversion and producing over-confident results.We suggest that for the purposes of minimizing spatial aggregation error, inversion calculations should optimally use regions with spatial scales on the order of 400 km.
Our finding that VPRM does not resolve different PFTs through its parameter values was mildly surprising to us.Though not a primary focus of this study, this results can be viewed in at least two different lights.First, studies wishing to provide first-order regional NEE estimates via a lowcomplexity land surface model may not need to distinguish among PFTs for parameterization on pure statistical grounds.This could lead to considerable savings in computation time and CPU resources.Second, PFTs are commonly assumed to partition land into sections with functionally different participations in the carbon cycle (Beer et al., 2010;Ciais et Xiao et al., 2008).Our results suggest that PFTs may not be the most useful predictor of a land area's carbon cycle dynamics, and that alternative partitioning schemes may be more skillful.Stand age and disturbance history are interesting "land surface NEE descriptor" alternatives to PFTs. Thornton et al. (2002) used the BiomeBGC model to explore the impacts of disturbance history, PFT, site climate, atmospheric CO 2 concentration, and nitrogen deposition on NEE variability among seven evergreen sites spanning North America, and concluded that of those, disturbance history dominated.Goulden et al. (2006) examined seven eddy covariance sites within 50 km of each other that were recovering from burn disturbances that occurred 0, 5, 14, 22, 39, ∼ 73, and ∼ 153 yr previously.They found that mid-growing season EVI and CO 2 fluxes took roughly 50 yr following a burn disturbance to become approximately interannually constant.That 50 yr period included transition from primarily deciduous species to primarily black spruce.These results and others suggest that disturbance history could be at least important as climate and plant functional type to understanding NEE for large areas.Although VPRM does not directly include disturbance history in its structure, its impacts ripple through a VPRM diagnosis.When parameters are estimated for specific sites by optimizing model NEE to observed NEE the parameter values themselves must contain some information about all of the drivers of NEE, including disturbance history.Plant functional type, though not included directly in VPRM structure, is also determined in part by species succession following a disturbance, and EVI is also impacted by disturbances (Goulden et al., 2006).Thus, VPRM NEE diagnoses and estimated parameters must contain some information about disturbance history.That said, through the lens of VPRM that information is convoluted with other drivers of NEE.This makes it difficult to assess disturbance directly through VPRM.
The results of Goulden et al. (2006) suggest, at least for boreal evergreen forests, a satellite record on the order of 50 to 100 yr or longer could be necessary before stand age and recovery from disturbance can be widely and directly described by remote sensing.Recent Landsat products have begun to assemble landscape disturbance records beginning in the 1980s (Huang et al., 2010), offering an opportunity to assess these influences at larger scales

Conclusions
Using observed NEE from 65 North American eddy covariance sites for the years 2000 through 2006, we make point estimates of parameter values for VPRM, a simple land surface model.We then estimate and analyze covariance structures of VPRM NEE residuals in the interest of quantifying spatial structure in the residuals.
PFTs demonstrate little skill as land surface classifications for model parameter estimation.This may allow large-region model studies to partition land surfaces into a "photosynthetically active or not" dichotomy, thereby simplifying model parameterization.
The semivariogram analyses presented here demonstrate that VPRM NEE residuals are spatially correlated at length scales well beyond individual tower footprints but well short of continental scales.Depending on the model parameterization, that length scale lies somewhere between 100 and 900 km, with a median value of roughly 400 km.This result is consistent at both seasonal and interannual time scales, and demonstrates that the North American EC tower network is minimally sufficient to define a VPRM residual covariance matrix.This information will allow us to construct a map of VPRM North American CO 2 fluxes, optimized to eddy covariance observations.
Our estimated covariance functions for model NEE residuals prove that the North American flux tower observation network is adequate for determining a land surface model residual covariance matrix.
A quantitative land surface model error covariance matrix can help to improve atmospheric inversion-derived ecosystem-atmosphere CO 2 flux estimates as well as estimate accompanying uncertainties more accurately.This, in turn, can help improve mechanistic understanding of the terrestrial carbon cycle, furthering the goal of increasing the predictive skill of land surface models.

Fig. 2 .
Fig. 2. Two examples of generic parametric variogram models.The parameter symbols correspond to Sect.2.4 and Eq.(11).Because these are purely illustrative, units for semivariance and distance are irrelevant.

Fig. 4 .
Fig. 4. June-July-August mean VPRM NEE residual empirical semivariograms.Each point represents the mean semivariance and mean separation distance from grouping pairs of towers into 300 km bins.VPRM parameterizations are described in Table2.The left vertical axis shows units of semivariance ( γ ), and the right vertical axis shows units of standard deviation (σ ).σ is related to γ by σ = (2 γ ) 1/2 .The parameter sets with fewer parameters show larger semivariances, reflecting their poorer fit to the data.Visual inspection shows all parameter sets produce strongly increasing semivariance at separation distances between 0 and roughly 750 km with a relative leveling-off at larger separation distances.

Fig. 5 .
Fig. 5. Cumulative density functions for covariance range parameter, estimated by maximum likelihood estimation (MLE).Each panel shows range values for seasonal VPRM NEE residuals (values in Table3) along with values from 1000 Gaussian random fields (GRFs) with similar (and known) covariance structure.The left panel shows the range distribution for only the pseudodata range estimations where the exponential covariance model fit more optimally than the pure nugget; the right panel shows the distribution for all 1000 GRFs and demonstrates that the spread of best-fit ranges from the pseudodata experiment is quite similar to the spread among real VPRM residuals.

Fig. 6 .Fig. 6 .
Fig. 6.Best-fit range values (km) for cumulative annual anomalies of observed NEE, VPRM NEE, and VPRM NEE residuals.Best-fit values were determined by AIC as described in section 2.4.The number y plotted denotes the year 200y.Years where the pure nugget covariance function fit more optimally than the exponential are shown in the shaded box.Anomalies were calculated as the departure from the mean value of 2000 to 2006 annual mean cumulative observed values.
al., T. W. Hilton et al.: Spatial structure in land surface model residuals 2005;

Table 2 .
. The PFT classifications are taken from literature citations or investigator descriptions where available, and otherwise derived from MODIS 1 km land surface classifications.Data are from the 2007 Fluxnet Synthesis Dataset.Total number of parameters resulting from the nine different schemes used to group observation sites for VPRM parameter estimation.

Table 3 .
Range parameter values (km)for VPRM flux error best-fit parametric variogram models.VPRM parameterizations are described in Table2.Where a value is present, Akaike's information criterion (AIC) analysis concludes the exponential variogram model fit more parsimoniously than the pure nugget model.Where the range is blank the pure nugget model fit most parsimoniously, indicating no spatial correlation is present.