Evaluating terrestrial CO 2 flux diagnoses and uncertainties from a simple land surface model and its residuals

Global terrestrial atmosphere–ecosystem carbon dioxide fluxes are well constrained by the concentration and isotopic composition of atmospheric carbon dioxide. In contrast, considerable uncertainty persists surrounding regional contributions to the net global flux as well as the impacts of atmospheric and biological processes that drive the net flux. These uncertainties severely limit our ability to make confident predictions of future terrestrial biological carbon fluxes. Here we use a simple light-use efficiency land surface model (the Vegetation Photosynthesis Respiration Model, VPRM) driven by remotely sensed temperature, moisture, and phenology to diagnose North American gross ecosystem exchange (GEE), ecosystem respiration, and net ecosystem exchange (NEE) for the period 2001 to 2006. We optimize VPRM parameters to eddy covariance (EC) NEE observations from 65 North American FluxNet sites. We use a separate set of 27 cross-validation FluxNet sites to evaluate a range of spatial and temporal resolutions for parameter estimation. With these results we demonstrate that different spatial and temporal groupings of EC sites for parameter estimation achieve similar sum of squared residuals values through radically different spatial patterns of NEE. We also derive a regression model to estimate observed VPRM errors as a function of VPRM NEE, temperature, and precipitation. Because this estimate is based on model-observation residuals it is comprehensive of all the error sources present in modeled fluxes. We find that 1 km interannual variability in VPRM NEE is of similar magnitude to estimated 1 km VPRM NEE errors.


Introduction
Terrestrial ecosystems remove roughly 25 percent of annual anthropogenic fossil fuel carbon dioxide (CO 2 ) via gross primary production (GPP) in excess of respiration (Keeling et al., 1996).More completely, net ecosystem exchange (NEE) -the balance between photosynthesis and heterotrophic respiration -controls the magnitude of atmosphere to ecosystem CO 2 uptake.Diagnosing terrestrial biological carbon dioxide fluxes with confidence is a necessary step toward understanding biological and climatological drivers of these fluxes.Because these fluxes are first-order influences on the accumulation of carbon dioxide in the atmosphere (Denman et al., 2007), understanding their mechanics is necessary to forecast impacts of past and future fossil fuel emissions.In spite of this, atmosphere-based methods to estimate global NEE (e.g., Peters et al., 2007;Janssens et al., 2003) and ground-based approaches (e.g., Potter et al., 2007;Janssens et al., 2003) have produced conflicting estimates, demonstrating substantial uncertainty surrounding these efforts to describe terrestrial carbon cycle mechanics.
Several recent studies demonstrate the discrepancy in diagnosed NEE between "top-down" approaches (i.e., atmosphere-based) and "bottom-up" approaches (i.e., ground-based methods rooted in eddy covariance (EC) flux measurements combined with ecosystem models).Janssens et al. (2003) assembled top-down and bottom-up estimates for late 20th Century annual cumulative European NEE and found that the top-down estimates are larger by roughly 100 Tg CO 2 per year.This 100 Tg difference is between 30 % and 100 % of their best-estimate annual total of 100 Published by Copernicus Publications on behalf of the European Geosciences Union.
to 300 Tg per year.Peters et al. (2007) estimated a North American uptake for 2001 to 2005 of roughly 700 Tg CO 2 per year using a top-down approach, while Potter et al. (2007) estimated uptake of 100 to 200 Pg CO 2 per year over the same period using a bottom-up method.Schwalm et al. (2011) found conflicting estimates of global NEE anomalies estimated from upscaled FluxNet observations when compared against top-down inversion anomaly estimates.
A number of studies have explored approaches to estimate regional NEE using some combination of land surface models and eddy covariance fluxes (bottom-up methods).Potter et al. (2007) chose a set of four North American EC towers to represent characteristic North American ecosystems and used them to evaluate the performance of the NASA-CASA ecosystem model run with a global set of previously published parameter values.They then used the NASA-CASA model to estimate North American annual cumulative NEE.This approach requires no computationally intensive data assimilation (e.g.parameter estimation), but achieves such savings at the cost of considering only a small portion of the NEE observations that are now available.Xiao et al. (2008) used a modified regression tree to create a model suite to explain observed NEE as a function of a variety of satellite-derived ecological measures.A regression tree is a method to empirically derive a best-fit statistical model based on a set of linear models.They derived the models using data from 42 AmeriFlux EC sites in the coterminous United States, producing a set of empirical models capable of upscaling the tower observations to the continental scale.The model that best explained the observed NEE used a combination of MODIS surface reflectances, enhanced vegetation index (EVI), land surface temperature, and normalized difference water index (NDWI).Though statistical, this model structure is quite similar to light-use efficiency (LUE) based models such as the Vegetation Photosynthesis Respiration Model (VPRM) of (Mahadevan et al., 2008).Yuan et al. (2010) develops the EC-LUE model and uses it to produce global estimates of gross primary production (GPP) and evapotranspiration (ET), while Yang et al. (2007) use a machine learning approach coupled with remote sensing data to extrapolate eddy-covariance-estimated GPP to the coterminous USA, and also use their method to estimate maximum light-use efficiency.Jung et al. (2010) describe a method to upscale FluxNet eddy covariance ET observations to the global scale using machine learning algorithms.This method could in concept be applied to NEE as well, though the machine learning approach is purely empirical and does not attempt to incorporate any ecological understanding.Schwalm et al. (2010) and Schwalm et al. (2011) present methods to globally upscale FluxNet observations by plant functional type.These methods depend on the commonly employed assumption that plant functional types are good predictors of landcape NEP.Beer et al. (2010) compared five different diagnostic GPP models with sharply contrasting structures, including two machine learning approaches, an NEE-biome region lookup table, and a LUE model.Each of these approaches was then used to estimate regional NEE values and uncertainties.The Beer et al. study explicitly considered many sources of uncertainty in the models considered.For the light-use efficiency model, the authors estimated site-specific parameter probability density functions (PDFs) at a number of FluxNet eddy covariance sites around the globe.The Bayesian framework of the study allowed the authors to consider PDFs from multiple uncertainty sources: parameter value uncertainty as well as driver data uncertainty.By taking random draws from these parameter PDFs, Beer et al. (2010) constructed a population of global GPP estimates driven by their distribution of parameter values.This population then provided confidence intervals for their global GPP estimates.
Each study outlined above presents a framework to use ecosystem modeling to combine the information in eddy covariance flux tower observations with the information contained in an ecosystem model structure and allows estimation of regional biological CO 2 fluxes.These studies exhibit many ways to treat uncertainty sources, ranging in complexity from not including uncertainty to Bayesian consideration of multiple uncertainty sources.The set of available eddy covariance NEE observations has increased dramatically in recent years (http://www.fluxdata.org);none of the above studies, however, take advantage of the wide spatial coverage of these observations except to perform site-specific calibration of model parameters.Hilton et al. (2013) used North American eddy covariance NEE observations from 65 different locations from the FluxNet project (http://www.fluxdata.org)to optimize parameter values for a simple land surface model (VPRM, Mahadevan et al., 2008).The 65 tower locations span North America in both space and plant functional type.That study presents extensive experiments varying the temporal and spatial periods for parameter estimation to determine an optimal strategy, producing nine different spatial and temporal resolutions.
Here we use VPRM and the assimilated data from this extensive tower network to diagnose annual integrated gross ecosystem exchange (GEE), ecosystem respiration (R), and NEE for the coterminous United States of America, Alaska, and Canada for the period 2002 to 2006.We use eddy covariance observations from a further 27 FluxNet tower locations to quantitatively cross-validate the parameter optimizations.This rigorous cross-validation analysis is a crucial step in a model-based carbon flux upscaling; without such an exercise it is difficult to measure the spatial accuracy, or lack thereof, of estimating unobserved fluxes using a model and data assimilation.Cross-validation would not be possible without the recent growth of eddy covariance observation networks: if only a handful of observation locations exist, as in the recent past, we cannot afford to withhold data from parameter estimation.
We also extend the analysis to derive empirical confidence intervals for NEE diagnoses based on observed NEE residuals and NEE drivers.Because this uncertainty is derived from eddy-covariance-observation-model NEE residuals, it considers all of the uncertainty sources that are present in model-based upscaling: eddy covariance observation error, model structural error, model parameterization error, and random natural ecosystem variability.This comprehensive and quantitative uncertainty analysis is also, to our knowledge, unique.

Land surface model
The Vegetation Photosynthesis and Respiration Model (VPRM, Mahadevan et al., 2008) is a light-use efficiency (LUE)-based land surface model.Ecosystem respiration is treated as a linear function of surface air temperature: with slope α and intercept β; β determines the basal rate of respiration that occurs at near-freezing temperatures.Gross Ecosystem Exchange (GEE) is modeled as with PAR denoting photosynthetically active radiation and EVI the satellite derived enhanced vegetation index (Huete et al., 2002).P scale (satellite derived), W scale (satellite derived), and T scale (literature derived) are scaling terms that take values between 0.0 and 1.0 and attenuate GEE according to phenology, moisture conditions, and temperature, respectively.Parameter λ encodes light-use efficiency, and parameter PAR 0 encodes the LUE curve half-saturation value.Mahadevan et al. (2008) provide detailed description of VPRM structure and performance.As described more fully in Hilton et al. (2013), the relatively simple structure of VPRM and its small number of parameters make it computationally inexpensive.This makes relatively sophisticated parameter estimation methods possible and makes VPRM a useful tool for diagnosing carbon fluxes, estimating flux uncertainty, and exploring the impacts of model parameterization and model error spatial covariance.Hilton et al. (2013) presented estimated values for λ, PAR 0 , α, and β using data from 65 North American eddy covariance towers (Fig. 1, Table 1).For parameter estimation, the eddy covariance data were partitioned in three different ways in space (individual sites, plant functional types (PFTs), and all sites together), and three different ways in time (monthly, annual, and all available data).This produced nine unique VPRM parameter sets with differing spatial and temporal optimization "resolutions": single sites-monthly, PFT-annual, etc.

Land surface model parameterization
As argued by Hilton et al. (2013), an ideal model parameter estimation scheme should permit parameter values to vary at space and timescales matching variations in NEE.Because NEE varies on numerous space and timescales, the space and timescales in which NEE variations are deemed "of interest" (as opposed to "noise") will vary with the modeling goals and spatial domain to be modeled.The range of temporal parameterization windows considered (annual, seasonal, monthly, and 10 day) allow variation consistent with a number of first order NEE drivers: annual climate variation, seasonal ecological cycles, and synoptic weather.Evaluating PFT-specific parameters as well as parameters estimated across many sites helps to evaluate the performance of PFTs as a land surface classification method for NEE diagnosis.
Upscaling tower measurements intrinsically requires model parameters that are applicable in spatial locations without tower observations, making the single-site parameters not useful for the task.This leaves the six parameter sets from the PFT and all-sites-together spatial groupings available for upscaling.In addition to those six, PFT-10-day Table 1.65 North American eddy covariance sites used to parameterize VPRM and calculate VPRM flux errors.PFTs are taken from the International Geosphere-Biosphere Programme (IGBP) land cover classification scheme (Loveland and Belward, 1997) Cook et al. (2004) parameters were calculated as well.The sum of squared errors (SSE) were computed for the seven VPRM parameter sets that are useful for upscaling, using observations from 27 "cross-validation" eddy covariance sites (Table 2; Fig. 2; Sect.2.3) that were not used for parameter estimation.
To evaluate model performance we use SSE at the 27 cross-validation sites and penalized sum of squared errors (PSSE, e.g.Hilborn and Mangel, 1997) at all 92 sites (the 27 cross-validation sites plus the 65 parameterization sites).PSSE is given by PSSE with SSE the sum of squared errors, n pars the number of unique model parameter values, and n obs the number of data points available.Among cross-validation sites withheld from parameter estimation, we can detect overfitting when the SSE begins to increase with the number of parameters.Because model parameterization, by definition, fits the model to observed data, SSE among parameterization sites should only decrease with additional parameters.PSSE provides a method to detect overfitting among parameterization sites.
We chose SSE because a statistically proper likelihood function would require integrating likelihood functions for all of the sources of error that contribute to model error.These include model structural error, model parameterization error, eddy covariance observation error, and "natural variability" (microscale flucutations in the atmosphere, climate, and ecosystem behavior).Distributions, and therefore likelihood functions, may be approximated for these error sources.Reducing their integrated product to a computationally tractable form is difficult and beyond the scope of this study.In the absence of a statistically proper likelihood function, we chose to use the mathematically simple SSE.This is equivalent to a maximum likelihood approach if the model errors may be assumed to be independent and identically distributed (i.i.d).Model errors are probably not i.i.d.(Ricciuto et al., 2008), but we have made this simplification in light of the points mentioned above.

Data
The 2007 FluxNet synthesis dataset (http://www.fluxdata.org) assembled eddy covariance observations from field sites around the world.The data were gap-filled and assigned quality scores using published methods (Papale et al., 2006;Moffat et al., 2007).The present study uses non-gap-filled NEE from 92 eddy covariance sites from the United States and Canada: 65 flux towers to estimate VPRM parameters (Table 1, Fig. 1), plus 27 "cross-validation" sites (Table 2, Fig. 2).The cross-validation sites used in this study were not used for VPRM parameterization because of data availability difficulties.After all necessary VPRM input data were eventually obtained for these sites, they were used to evaluate the performance of the optimized VPRM.VPRM uses temperature and photosynthetically active radiation (PAR) to drive GEE and respiration.To run VPRM at the continental scale, air temperature and downward surface radiation values were obtained from the reanalysis products of Sheffield et al. (2006).The Sheffield et al. (2006) products attempt to correct known biases (Brotzge, 2004) to the NCEP-NCAR reanalysis products (Kalnay et al., 1996).VPRM was driven with the three-hourly, 1 • × 1 • product for temperature and PAR.
VPRM is also driven by satellite-derived moisture and phenology.MODIS products MOD13A2 (enhanced vegetation index (EVI); Huete et al., 2002Huete et al., , 1999)), MCD12Q1 (land cover; Friedl et al., 2002;Strahler et al., 1999), MCD12Q2 (vegetation dynamics; Zhang et al., 2003), and MCD43B4 (Bidirectional Reflectance Distribution Function (BRDF) reflectances; Schaaf et al., 2002) provided these drivers.EVI data reported with quality ratings of "lowest quality" and "not useful" (VI quality bits 2-3 equal to 11) were discarded.Gaps from discarded MODIS data were not filled; VPRM fluxes were not calculated in these instances.EVI and BRDF data are reported at one-kilometer, 16-day resolution; land cover and vegetation dynamics are reported at 500 m, annual resolution and were processed from 500 m resolution to 1000 m resolution using software tools provided by the MODIS Land quality assessment group (Roy et al., 2002)

NEE residual spread estimation
This study seeks upscaled NEE diagnoses accompanied by uncertainty estimates.There are several methods of varying complexity available to quantify this uncertainty.A joint Bayesian inversion of VPRM parameters and VPRM NEE variance against eddy covariance NEE observations using a joint likelihood function would extract information from the available data with maximum mathematical rigor (though is still vulnerable to aggregation errors in grouping scheme, e.g.plant functional types, as well as errors in observations and driver data).Statistically rigorous likelihood functions for model NEE error, however, remain an ongoing research topic, and the calculation itself is computationally expensive.The method employed here uses an empirically derived statistical model to characterize VPRM NEE residual spread -a middle ground between the joint Bayesian inversion with MCMC and the simple interpolation.
It is known that eddy covariance observation error is proportional to NEE magnitude itself (Richardson et al., 2006).Typical magnitude for this random EC observation error is roughly 20 to 30 g C m −2 yr −1 (Richardson et al., 2006;Goulden et al., 1996), roughly an order of magnitude smaller than the VPRM annual NEE residuals.Random eddy covariance observation error is a component of VPRM NEE error, making it reasonable to posit that VPRM NEE residual magnitude is correlated to VPRM NEE magnitude.Furthermore, the structure of VPRM (Eqs. 1 and 2) assumes that temperature, water availability, and greenness (EVI) are primary drivers of NEE.It seems reasonable, then, that VPRM residuals would be affected by these influences as well.
Systematic bias in eddy covariance observations (such as underestimation of NEE under low-turbulence conditions) is also an important contributor to land surface model uncertainty (Williams et al., 2009).Typically a friction velocity (u * ) threshold is used to identify and remove conditions of low turbulence (e.g.Baldocchi, 2003); uncertainty in u * was recently estimated to contribute as much as 150 g C m −2 yr −1 to overall land surface model uncertainty (Reichstein et al, unpublished data, cited in Williams et al., 2009) Using non-gap-filled observations from the 65 eddy covariance sites used to estimate VPRM parameter values, annual integrated NEE residuals were calculated as the integrated sum of non-gap-filled NEE observations minus VPRM NEE, both at EC site-specific native reporting resolution (generally 30 min, 60 min at a few sites).Residuals were calculated only at time stamps where both quantities were available.With these integrated residuals squared differences were calculated for each site year: NEE denotes annually integrated VPRM residual, and NEE denotes the mean of NEE across all site years.NEE sq diff is closely related to statistical variance σ 2 (σ 2 ≡ N i=1 (x i − x) 2 /(N −1)).Estimating NEE sq diff in terms of known quantities provides a method to estimate the spread of VPRM NEE errors that can be upscaled along with the flux diagnoses.
Regression models for NEE sq diff were derived from subsets of these candidate explanatory variables: VPRM annual integrated NEE, total annual precipitation, annual mean surface air temperature, annual mean EVI, PFT, and year.The set of models consisting of all combinations within the categorical variables, the linear numerical terms, and the quadratic numerical terms was searched exhaustively using the glmulti package (Calcagno, 2011) for R (R Development Core Team, 2007) and the results ranked by Akaike's Information Criterion (AIC, Akaike, 1976).Annual mean EVI was calculated as the mean of 16-day MODIS EVI values (see Sect. 2.3).Annual total precipitation and annual mean temperature were calculated as the sum and mean, respectively, of the monthly mean 1

Land surface model parameter set ranking
As described in Sect.2.2, we ranked the parameter sets that are useful for upscaling (this excludes the three individualsite-based parameter sets) by sum of squared errors (SSE).Figure 3 presents these SSE values, plotted against the number of unique parameter values.The solid curve plots the SSE for the 27 cross-validation sites not used for VPRM parameter estimation, and the dashed curve plots the penalized sum of squared errors (PSSE; defined in Sect.2.2) for all 92 sites (the 27 cross-validation sites and the 65 sites used to parameterize VPRM).Figure 3 suggests that the monthly and 10day VPRM parameter sets overfit the data.The PFT-all-data VPRM parameters achieved the lowest cross-validation SSE as well as a PSSE only slightly above the lowest PSSE; there-  1); right vertical axis shows penalized sum of squared errors (PSSE) for the 27 cross-validation sites combined with the 65 sites used to parameterize VPRM (Fig. 1, Table 1).Note the log-scale on the horizontal axis.
fore, those parameters are used for most of the analyses presented here.

VPRM NEE residual evaluation
To evaluate the quality of VPRM NEE diagnoses, Fig. 4 presents the histogram of VPRM NEE residuals calculated at eddy covariance site reporting intervals (30 min at most sites; 60 min at a few sites), calculated using PFT-all-data VPRM parameters.The mean residual of 4.66 × 10 −4 µmol CO 2 m −2 s −1 corresponds to an annually integrated flux of 0.18 g C m −2 yr −1 , small compared to a typical observed EC annual NEE between 100 and 300 g C m −2 yr −1 .This suggests that the parameter optimization achieved its task of optimizing VPRM to observed fluxes at hourly timescales.A normal distribution with the same mean and standard deviation is overlaid; the observed residuals show a higher peak around their mean but otherwise correspond closely to the normal distribution.
Having demonstrated that the parameter optimization performs well at hourly intervals we turn now to the annual timescale.Figure 5  performed well at the annual scale.This observed residual distribution also follows a normal distribution (overlaid) reasonably well.Satisfied now that VPRM residuals are small relative to EC-observed fluxes, we investigate whether 1 km VPRM diagnosed annual NEE for North America seems reasonable when compared to EC-observed annual NEE. Figure 6 compares the distribution of annually integrated VPRM NEE diagnoses for the modeling domain (Sect.2.4) with the distribution of annually integrated NEE observations from the 2007 FluxNet synthesis dataset for the sites in Table 1.To obtain a meaningful comparison to model diagnoses, we use the FluxNet synthesis dataset gap-filled NEE observations in this case.The gapfilling used the methods of Papale et al. (2006) and Moffat et al. (2007).VPRM reproduces well the mode of the observed distribution as well as the right-hand tail (sources of CO 2 to the atmosphere).The left tail of the observed NEE distribution contains more density than the VPRM diagnoses, suggesting that VPRM estimates lower sinks of atmospheric CO 2 than the gapfilled FluxNet 2007 synthesis dataset in some cases.The FluxNet synthesis dataset contains a handful of site years with sinks of atmospheric CO 2 approaching or even exceed- ing 1000 g C m −2 yr −1 .VPRM was optimized to these data, and the left-side tail of the VPRM diagnosed NEE distributions does contain more mass than the right side.
Overall, the VPRM performance summarized by Figs. 4, 5, and 6 are encouraging for the ability of the parameter estimation process to optimize VPRM to eddy covariance observations at both hourly and annual timescales.

VPRM fluxes
Figures 7, 8, and 9 show annually integrated VPRM GEE, R, and NEE, respectively, for 2002.The larger-scale (order > 100 km) spatial patterns are representative of the integrated fluxes for 2003 to 2006 (not shown).NEE is the difference between GEE and R, both much larger in magnitude.This raises detectability issues for NEE: this difference between two larger and roughly equal quantities is easily polluted by errors from GEE and R estimation.Therefore, rather than focus on integrated annual NEE values or aggregated continental NEE, this study instead focuses on year-to-year NEE differences and NEE differences across different VPRM parameter sets.
Year to year flux differences are reported here as annual anomalies, calculated as integrated annual flux minus mean

Estimated spread of VPRM fluxes
Determining whether these flux diagnoses are able to detect meaningful interannual variability (IAV) requires a measure of the variance of annual integrated NEE.Section 2.5 describes the empirical derivation of a statistical model to predict the squared difference between the annual integrated VPRM residual and its mean (NEE sq diff , Eq. 4) across site years.Of the candidate models, the best-fitting model (lowest AIC) was NEE VPRM is annual integrated VPRM NEE, T is annual mean temperature ( • C), and pcp is annual total precipitation (mm).The fit achieved a multiple R squared of 0.289, with the coefficients significant at p < 0.001 (NEE 2 VPRM ), p < 0.05 (pcp 2 ), p < 0.1 (T ), and no significance for NEE VPRM (p = 0.12).
The regression model in Eq. 5 was tested at the crossvalidation EC sites (Table 2, Fig. 2). Figure 13 (top panel) shows observed vs. predicted NEE sq diff with the 95 %   prediction interval.The prediction intervals at each point are calculated from the regression slope and intercept variances, which are estimated from the residuals of the regression fit.
Of 56 site years in the cross-validation data set, one observation is outside of the 95% prediction interval.The bottom panel of Fig. 13 shows histograms of the observed and predicted values.The distributions are similar, except for predicted values around zero.This highlights a shortcoming of the regression model approach: negative predicted NEE sq diff values are possible.This should emphasize that, as with any regression model, predictions are only valid when the explanatory variables take values within the ranges used to fit the model.Figure 14 shows the square root of estimated NEE sq diff for the modeling area for 2002, providing an estimate of VPRM error magnitude.The spatial patterns for 2003 to 2006 (not shown) are similar.The estimated VPRM errors are broadly of similar magnitude to the VPRM NEE differences between years (Fig. 12).

Modeled flux equifinality
Figure 15 presents the June-July-August VPRM NEE for the southeastern USA for two different parameter sets: allsites-all-data (top panel) and PFT-all-data (bottom panel).These two parameter sets result in similar cross-validation SSE (Fig. 3).The starkly different spatial patterns of growing season NEE from the two parameter sets with similar cross-validation SSE demonstrate the problem of equifinality in land surface model results.The NEE differences between the two parameterizations are of similar magnitude to the estimated VPRM NEE errors of Fig. 14.

Model parameterization
The five lowest cross-validation SSE values in Fig. 3 are not drastically different from one another, though the penalized SSE values for the two most parsimonious parameter sets (all-all and all-annual) are significantly higher.In combination with the parameter distributions presented in Hilton et al. (2013), this result might suggest that order 100 parameters are optimal for flux upscaling.The two parameter sets considered in Fig. 3 that use parameterization temporal windows shorter than annual (monthly and 10 day) produced notably higher cross-validation SSE values and higher penalized SSE values than the other five parameter sets, suggesting these parameterizations overfit the observations.Considered in conjunction with the differing spatial behaviors in Fig. 15, the similar PSSE values among the betterperforming parameter sets in Fig. 3 suggest an instance of   equifinality: PFT-all-data parameters and all-sites-all-data parameters produce comparable sums of 30 min squared residuals via strongly divergent spatial outcomes.

Spatial behavior of modeled fluxes
The broad spatial patterns in the NEE, GPP, and R results presented here (Figs. 7, 8, and 9) largely agree with other analyses (e.g., Beer et al., 2010;Xiao et al., 2011;Running et al., 2004).As we would expect given the prominence of the vegetation index in VPRM structure (Eq.2), the patterns of strong GPP reflect areas of relatively dense vegetation as measured by vegetation index (Huete et al., 2002) or biomass (Myneni et al., 2001).The relatively large respiration diagnoses for the southeastern USA in Fig. 8 is also present in the 2003 to 2006 diagnoses.This area, roughly covering the US states of Louisiana, Mississippi, Alabama, Georgia, and South Carolina, is dominated by the mixed forest PFT in the MODIS land cover classification (Sect.2.3).The three mixed forest eddy covariance sites used for VPRM parameterization are in Wisconsin, USA and Ontario, Canada.Rather than conclude that the mixed forests of the U.S. Gulf Coast are much stronger sources of biological CO 2 than other classes of southern forests or more northerly mixed forests, several alternative explanations seem more likely.First, perhaps the carbon cycle mechanics of northern mixed forests do not describe well the behavior of southerly mixed forests and diagnose erroneously strong respiration when applied in southerly regions.Second, three eddy covariance sites may provide insufficient data to characterize this (or any) PFT.Lastly, stand age is an important driver of NEE (Litvak et al., 2003), and is ignored by the modeling methods employed here.
The region of positive NEE in Fig. 15, bottom panel corresponds to the region of large diagnosed respiration discussed above.Once again, instead of concluding that respiration is causing the mixed forests of the southeastern USA to release on the order of 150 g C m −2 yr −1 to the atmosphere, the explanations discussed previously (exclusively northerly mixed forest parameterization sites; insufficient quantity of mixed forest parameterization sites; potentially important ecological drivers not included in VPRM) seem more plausible.
The differences in growing season NEE in the southeastern USA between the two parameter sets shown in Fig. 15 highlights a question: what are the most appropriate parameterization time and space windows for a land surface model?In contrast, other results of this work might suggest de-emphasizing this line of inquiry.For example, the total 30 min cross-validation SSE values (Fig. 3) across the five most optimally fitting VPRM parameter sets are nearly equal, suggesting that the choice of parameter optimization spatial and temporal windows is perhaps of secondary importance.In that case, the drastically lesser computational cost makes coarser spatial and temporal windows preferable.
Notable in the annual anomaly diagnoses (Figs. 10 (GEE), 11 (R), and 12 (NEE)) is the much larger variability of GEE as compared to R. That GEE variability is reflected in NEE variability as well.This could be a consequence of VPRM's structural treatment of respiration as a linear function of temperature (Eq.1).In contrast VPRM GEE (Eq.2) considers a number of other variables in addition to temperature.Increased interannual variability (IAV) in GEE may simply reflect that there are more constituent quantities to vary.
Much of the stronger GEE IAV (Fig. 10) occurs in the upper midwestern USA.The 2006, for example, showed a particularly strong VPRM GEE diagnosis centered around the US state of Indiana.This area is dominated by agriculture -the cropland PFT in the MODIS IGBP landcover classification.Within the cropland PFT different agricultural products are known to vary in the strength of their carbon uptake.Corn, for example, has particularly strong atmospheric CO 2 uptake (Lokupitiya et al., 2009).Without parameterizations specific to particular crops, model NEE diagnosis can be poor (Lokupitiya et al., 2009).There are only five agricultural EC sites in the group used to parameterize VPRM (Table 1).This makes it possible that the model parameterization suffers from the same representativeness problem that may cause potentially spurious VPRM respiration spatial structure in the southeastern USA.Many farms rotate crops from one season to the next; for example, corn in year y followed by soybeans in year y + 1.If reflected in remotely sensed ecosystem variables (e.g.vegetation indices or moisture) this sort of rotation could itself cause the GEE interannual variation seen in the VPRM annual anomalies.Similarly, if the cropland VPRM parameter estimation EC sites (US-Ne1, US-Ne2, US-Ne3, US-Bo1, and US-Bo2) were consistently planted with a particular crop during the periods used for parameter estimation, VPRM should not be expected to perform well for different crops.Looking to other potential causes for large year to year changes in the upper midwestern USA, VPRM R diagnoses in that region show little year to year variation, removing temperature anomalies as a driver of GEE IAV.From the structure of VPRM GEE (Eq.2) this leaves moisture availability, PAR, and vegetation index as primary candidates for driving GEE variability.

VPRM uncertainty estimation
In spite of its r 2 of 0.289, Eq. 5 performed well across 27 cross-validation sites on two performance measures: First, 55 of 56 predicted errors (98 %) fall within the 95 % prediction confidence interval (Fig. 13, top panel).Second -and crucially -the distribution of predicted errors matches the distribution of observed errors (Fig. 13, bottom panel) at the cross-validation sites.This suggests that the distribution of diagnosed VPRM NEE error magnitudes is consistent with observations.
The multiple r 2 value of 0.289 achieved by Eq. 5 may at a glance appear relatively low.However, our ultimate goal in this exercise is a spatial estimate of VPRM NEE uncertainty.In this context it is more important to successfully diagnose the distribution of error magnitudes than to accurately capture every local rise and fall of the error magnitude as a function of its drivers.This is because spatial aggregation of high-resolution VPRM error diagnoses will smooth out the high-resolution inaccuracies without sacrificing the more important regional accuracy.Hilton et al. (2013) provides the spatial error covariances needed to perform this aggregation.
Though the regression model estimation methods developed here are applied to estimate VPRM NEE error magnitude, the approach is equally applicable to estimating errors in an ecosystem model diagnosis of GEE or R; this change would be subject only to quality of the partitioning of EC NEE observations into GEE and R.
As noted in Sect. 1, several recent studies have attempted continent-scale carbon flux diagnoses; those diagnoses generally do not report uncertainty.Beer et al. (2010) reported spatial estimates of GPP accompanied by globally aggregated uncertainties.The work presented here reports spatial GPP, R, and NEE diagnoses, and further extends the literature by estimating annual continental NEE uncertainty in space.Beer et al. (2010) estimate GPP uncertainty for their LUE model by randomly resampling from within their population of parameters; these parameters are optimized to eddy covariance observations at each observation site.Because the parameters are optimized to flux observations, these uncertainty estimates include observation errors and model parameterization errors.Driver data uncertainty is quantified by analyzing uncertainty separately for three different reanalysis products.
The difference between EC-observed NEE and model NEE includes contributions from the error sources described in Sect.

NEE error covariance nugget
The estimated nugget values from the VPRM NEE error spatial covariance (Hilton et al., 2013) quantify combined eddy covariance observation error and "microscale variation", that is, the behavior of the difference in VPRM NEE error between two locations that are closer to one another than the closest pairs of towers among the 65 used for covariance parameter estimation.The median estimated seasonal nugget values range from 5.42×10 −5 (individual-sitemonthly VPRM parameters) through 0.775 (PFT-all-data parameters) to 0.884 (all-sites-all-data parameters), with units of flux squared: (µmol CO 2 m −2 s −1 ) 2 .Converted to standard deviation and integrated annually (g C m −2 yr −1 ) these nuggets are 21.0, 586, and 603.
In units of standard deviation the annual integrated NEE error nugget of 21.0 g C m −2 yr −1 from the individual-sitemonthly VPRM parameters is essentially equal to the annual total eddy covariance random observation error of ±20 g C m −2 yr −1 estimated by Richardson and Hollinger (2005).The 65 eddy covariance sites used for fitting include 26 pairs that are within 10 km of each other, so there are many data points at small separation distances to quantify the nugget.
At coarser spatial and temporal parameter estimation resolutions (PFT-all-data, all-sites-all-data, etc.) the NEE error standard deviations of roughly 600 g C m −2 yr −1 calculated from the nuggets are of similar magnitude to the errors estimated from VPRM NEE and climate drivers (Fig. 13) for high-productivity PFTs (e.g.forests, croplands).
These results suggest that when VPRM is optimized to NEE observations at short temporal scales (order one month) the VPRM NEE nugget is dominated by eddy covariance observation error -that is, under these conditions VPRM per-forms quite well in close proximity (order one kilometer) to an optimization location.Eddy covariance observation error is independent of VPRM optimization spatial and temporal windows, so its contribution to either nugget should remain constant across these windows.The much larger nugget when the temporal optimization window is all available observations therefore suggests that microscale VPRM NEE error increases dramatically from its value when VPRM is optimized monthly for individual sites.This means that VPRM can perform quite poorly even in close proximity to an optimization location in these cases.
Light-use efficiency models such as VPRM make climatedriven diagnoses of NEE (Eqs. 1 and 2).Widespread VPRM annual NEE error magnitudes (Fig. 14, this section) on the order of VPRM NEE interannual variability (Fig. 12) imply that climate (or, at least, climate viewed through the prism of VPRM) cannot reliably explain NEE interannual variability.

Caveats
The results reported here were compiled using VPRM, a simple LUE-based land surface model, and are therefore most directly informative toward similar models.Questions of spatial and temporal resolution for model parameterization arise for more complicated mechanistic ecosystem models as well.Whether optimizing more complex model structures would result in similar total PSSE values for strongly contrasting spatial and temporal optimization windows (as reported here in Fig. 3) is a question for further analysis.Regardless, this work suggests that applying model parameterizations outside of the climate and ecosystem conditions where the parameter values were optimized can produce suspicious spatial structures such as the widespread flux of CO 2 to the atmosphere across the southeastern USA in Fig. 15 (bottom panel) and Fig. 8.
The 27 cross-validation sites (Fig. 2, Table 2) generally have shorter observational records than the sites used for VPRM parameterization.Repeating the cross-validation experiment with different, perhaps randomly selected subsets might be a useful exercise.
In addition, forest stand age since disturbance is a first order determinant of NEE magnitude (Litvak et al., 2003).Structurally, VPRM does not consider stand age (Eqs. 2 and 1), and the work presented here does not attempt to assess disturbance history.For this reason, this work does not emphasize integrated NEE magnitudes or attempt regional NEE aggregation.
Likewise, the NEE residual magnitude statistical model derived here (Sect.2.5) was fit using observed NEE residuals and observed climatic drivers.While estimating uncertainty directly from observed residuals is a strength of the approach, as with any regression these results cannot produce meaningful estimates where the driver variables depart the range of values used for fitting.

Conclusions
This work presents high-resolution diagnoses of North American NEE and NEE interannual variability accompanied by NEE error estimates, all derived from a simple lightuse efficiency-based land surface model (VPRM).Several different model optimization spatial and temporal resolutions achieve similar fits when evaluated by total sum of squared errors at cross-validation sites and penalized sum of squared errors at the model parameterization sites.Cross-validation is useful for identifying parameterizations that overfit assimilated data, however.Cross-validation SSE eliminated two of seven parameterizations we considered for upscaling.Penalized SSE from the parameterization sites eliminated another three.This sort of method to evaluate model parameter sets is computationally inexpensive and would make a welcome addition to future flux diagnoses.
Two of our model parameterizations achieved similar cross-validation SSE, but reached their NEE diagnoses through starkly contrasting spatial distributions of NEE.Modeling efforts that do not consider multiple spatial and temporal parameterization resolutions risk missing the structural uncertainty that this equifinality reveals, and that radically different fluxes across space may not be readily distinguishable when viewed through the lens of aggregated model errors.
The results here demonstrate that modeled annual integrated flux magnitude, annual mean surface temperature, and annual total precipitation provide reasonable (and computationally inexpensive) empirical predictors of NEE error magnitude.Estimated NEE errors are of equal magnitude to diagnosed NEE interannual variability.That a climate-driven land surface model cannot reliably separate year-to-year differences in model NEE from model error suggests that NEE interannual variability has important drivers outside of largescale climate, or, alternatively, that the present network of North American eddy covariance NEE observation sites provide insufficient constraints on NEE and NEE error to reveal a strong climate-NEE interannual variability connection.

Fig. 3 .
Fig. 3. VPRM sum of squared errors vs. number of unique parameter values.Shown are the seven VPRM parameter sets available for upscaling.Parameter sets are labeled as (space grouping)−(time grouping) used for VPRM parameter estimation.Left vertical axis shows sum of squared errors (SSE) for the 27 cross-validation sites not used to estimate VPRM parameters (Fig. 2, Table1); right vertical axis shows penalized sum of squared errors (PSSE) for the 27 cross-validation sites combined with the 65 sites used to parameterize VPRM (Fig.1, Table1).Note the log-scale on the horizontal axis.

Fig. 4 .
Fig. 4. Histogram of VPRM NEE residuals at eddy covariance site reporting temporal resolution (30 min or 60 min), PFT-all-data VPRM parameters.NEE residuals are calculated as observed NEE (non-gap-filled) minus VPRM NEE.A normal distribution probability density function with the same mean and standard deviation is overlaid.

Fig. 13 .
Fig. 13.Results from an empirical regression model (Eq.5) for VPRM NEE residual spread.Units are (g C m −2 yr −1 ) 2 .Top panel shows observed values vs. predicted values at 27 cross-validation sites (Table 2).Observed values are outside of the 95 % prediction interval where the solid line falls outside of the dashed lines.One of 56 predicted values (2 %) is outside the 95 % prediction interval.The bottom panel shows histograms for observed values and model-predicted values.

Fig. 14 .
Fig. 14.The 2002 estimated square root of NEE residual squared difference (Eq.4), calculated using PFT-all-data VPRM parameters.Units are g C m −2 yr −1 .Estimates are calculated by a statistical model (Eq.5, Sect.2.5) with explanatory variables annual integrated VPRM NEE, annual total precipitation, and annual mean surface temperature.The 2003-2006 estimated annual errors (not shown) show similar spatial patterns.

Fig. 15 .
Fig. 15.The 2004 VPRM June-July-August integrated NEE.Top panel shows all-sites-all-data parameters, bottom panel shows PFTall-data parameters.Units are g C m −2 yr −1 .Black areas were not calculated for this figure.
. The PFT classifications are taken from literature citations or investigator descriptions where available, and otherwise derived from MODIS 1 km land surface classifications.Data are from the 2007 FluxNet synthesis dataset.

www.biogeosciences.net/11/217/2014/ Biogeosciences, 11, 217-235, 2014 all
Beer et al. (2010)ce observation error, VPRM parameterization error, driver data error, VPRM structural errors, and natural variability.Because the uncertainty estimates presented here are derived from model-data residuals, land surface model error sources other than systematic eddy covariance observation bias are considered implicitly by this approach.This makes these estimates inclusive of a broader range of error sources relative to most approaches that focus on propagating specific errors through a model calculation.The method also avoids the need to quantify distributions of specific error sources, but sacrifices the possibility of partitioning the estimated error into contributions from constituent sources.Because it is based on model-data residuals, the regression model method presented here represents a different approach to flux diagnosis uncertainty from the method ofBeer et al. (2010), which individually samples different model error contributors such as parameterization uncertainty and meteorological driver data uncertainty.Direct comparison is difficult because the uncertainty estimates presented here quantify NEE errors for North America, while Beer et al. estimate globally aggregated GPP uncertainty.It is simple in concept, however, to extend the methods shown here to GPP uncertainty and to the global scale.