Remote sensing of annual terrestrial gross primary productivity from MODIS : an assessment using the FLUXNET La Thuile data set

Gross primary productivity (GPP) is the largest and most variable component of the global terrestrial carbon cycle. Repeatable and accurate monitoring of terrestrial GPP is therefore critical for quantifying dynamics in regional-to-global carbon budgets. Remote sensing provides high frequency observations of terrestrial ecosystems and is widely used to monitor and model spatiotemporal variability in ecosystem properties and processes that affect terrestrial GPP. We used data from the Moderate Resolution Imaging Spectroradiometer (MODIS) and FLUXNET to assess Published by Copernicus Publications on behalf of the European Geosciences Union. 2186 M. Verma et al.: Remote sensing of annual terrestrial gross primary productivity from MODIS how well four metrics derived from remotely sensed vegetation indices (hereafter referred to as proxies) and six remote sensing-based models capture spatial and temporal variations in annual GPP. Specifically, we used the FLUXNET La Thuile data set, which includes several times more sites (144) and site years (422) than previous studies have used. Our results show that remotely sensed proxies and modeled GPP are able to capture significant spatial variation in mean annual GPP in every biome except croplands, but that the percentage of explained variance differed substantially across biomes (10–80 %). The ability of remotely sensed proxies and models to explain interannual variability in GPP was even more limited. Remotely sensed proxies explained 40– 60 % of interannual variance in annual GPP in moisturelimited biomes, including grasslands and shrublands. However, none of the models or remotely sensed proxies explained statistically significant amounts of interannual variation in GPP in croplands, evergreen needleleaf forests, or deciduous broadleaf forests. Robust and repeatable characterization of spatiotemporal variability in carbon budgets is critically important and the carbon cycle science community is increasingly relying on remotely sensing data. Our analyses highlight the power of remote sensing-based models, but also provide bounds on the uncertainties associated with these models. Uncertainty in flux tower GPP, and difference between the footprints of MODIS pixels and flux tower measurements are acknowledged as unresolved challenges.

how well four metrics derived from remotely sensed vegetation indices (hereafter referred to as proxies) and six remote sensing-based models capture spatial and temporal variations in annual GPP.Specifically, we used the FLUXNET La Thuile data set, which includes several times more sites (144) and site years (422) than previous studies have used.Our results show that remotely sensed proxies and modeled GPP are able to capture significant spatial variation in mean annual GPP in every biome except croplands, but that the percentage of explained variance differed substantially across biomes (10-80 %).The ability of remotely sensed proxies and models to explain interannual variability in GPP was even more limited.Remotely sensed proxies explained 40-60 % of interannual variance in annual GPP in moisturelimited biomes, including grasslands and shrublands.However, none of the models or remotely sensed proxies explained statistically significant amounts of interannual variation in GPP in croplands, evergreen needleleaf forests, or deciduous broadleaf forests.Robust and repeatable characterization of spatiotemporal variability in carbon budgets is critically important and the carbon cycle science community is increasingly relying on remotely sensing data.Our analyses highlight the power of remote sensing-based models, but also provide bounds on the uncertainties associated with these models.Uncertainty in flux tower GPP, and difference between the footprints of MODIS pixels and flux tower measurements are acknowledged as unresolved challenges.

Introduction
Terrestrial ecosystems sequester about 25 % (≈ 2-2.5 Pg C year −1 ) of the carbon emitted by human activities each year (Canadell et al., 2007).By comparison, terrestrial gross primary productivity (GPP) is roughly 120 Pg C year −1 and is the largest component flux of the global carbon cycle (Beer et al., 2010).Thus, even small fluctuations in GPP can cause large changes in the airborne fraction of anthropogenic carbon dioxide (Raupach et al., 2008).Terrestrial GPP also provides important societal services through provision of food, fiber and energy.Methods for quantifying dynamics in terrestrial GPP are therefore required to improve climate forecasts and ensure long-term security in services provided by terrestrial ecosystems (Bunn and Goetz, 2006;Schimel, 2007).
Two main approaches have been used to estimate spatial and temporal variability in GPP from remotely sensed data.In the first approach, spatiotemporal patterns in vegetation indices (VIs) are assumed to reflect spatial and temporal variation in GPP (Goward et al., 1985;Myneni et al., 1998;Zhou et al., 2001;Goetz et al., 2005;Bunn and Goetz, 2006).These studies do not estimate carbon fluxes (but see Jung et al., 2008).We refer to these metrics as remotely sensed "proxies" of GPP in this study.In the second approach, remote sensing data is used as input to models of GPP that fall into one of three basic groups: (i) light-use efficiency models (e.g., Potter et al., 1993;Prince and Goward, 1995;Running et al., 2004;Mahadevan et al., 2008); (ii) empirical models that use remotely sensed data calibrated to in situ eddy covariance measurements (e.g., Sims et al., 2008;Ueyama et al, 2010); and (iii) machine learning algorithms, which are also calibrated to in situ measurements (Yang et al., 2007;Xiao et al., 2010).A large number of studies have compared results derived from remote sensing-based models with in situ measurements (e.g., Turner et al., 2006;Heinsch et al., 2006;Yuan et al., 2007;Sims et al., 2008;Mahadevan et al., 2008;Xiao et al. 2010).However, all of these studies are based on relatively small in situ data sets and none have explicitly examined both spatial and temporal variations in remotely sensed proxies (e.g., Hashimoto et al., 2012) and modeled estimates with corresponding variations in in situ measurements of GPP.
In this study, we use data from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) to evaluate how well 10 different remote sensing models and proxies are able to explain geographic and interannual variation in annual GPP.Our analysis builds upon and extends previous efforts in three important ways.First, we examine spatial (across sites) and temporal (across years) variation separately, focusing on GPP at annual scale.Distinguishing between spatial and interannual variation is important because the drivers and magnitudes of geographic and interannual variation in GPP are different (Burke et al., 1997;Richardson et al., 2010a).Second, previous studies have examined results from only one or two models.The analysis we present here encompasses 10 different proxies and models that have not previously been systematically assessed and compared.Third, we use a data set that encompasses a much larger number of sites and siteyears than previous studies.Our analysis is therefore much more comprehensive than previous studies.
The selected proxies and models make very different assumptions about the underlying mechanisms and drivers of GPP (Table 1).A key goal of the work reported here is to assess how different assumptions and inputs influence remote sensing results.To accomplish this, our analysis addresses three questions: 1. How well do the selected remote sensing-based methods capture geographic (across sites) and interannual variation (across years) in annual GPP?
2. How does the performance of different methods vary across biomes?
3. Are methods that use daily or 8-day input data better at characterizing annual GPP relative to methods that use annual inputs?
By comparing results from the remote sensing-based methods against in situ measurements from field sites that encompass a wide range of biomes and climate regimes, our GPP is controlled by one of the above three proxies and mean annual precipitation or temperature.
Short-term fluctuations in GPP do not contribute to spatial variation in annual GPP.
One of the three proxies and mean annual temperature or precipitation.
3 for each biome.No constrain on spatial and temporal variability is imposed on weights and biases.
study not only aims to address the questions identified above, but also attempts to improve understanding of the processes and factors that control geographic and interannual variation in annual GPP.
2 Data and methods

FLUXNET data
Our analysis is based on measurements included in the FLUXNET La Thuile data set (http://www.fluxdata.org/SitePages/AboutFLUXNET.aspx).This data set contains measurements of net ecosystem exchange (NEE) and near surface meteorology for 247 sites encompassing approximately 850 site-years of data since 2000.The data set uses an empirical temperature response function to model ecosystem respiration (Reichstein et al., 2005), and estimates GPP as the residual of measured NEE and modeled respiration.The temperature response function is calibrated using nighttime data when winds are usually low and assumes that the calibrated relationship holds during daytime.It is worth noting that some site investigators are not in full agreement regarding the method used to model respiration in La Thuile data set.To address these concerns, efforts to refine and improve respiration estimates are underway.Until these revised data are available, however, the La Thuile data set is the only reference data set available for this type of study.Most importantly, the La Thuile data set has been widely used, including in a number of high-profile synthesis studies (e.g., Beer et al., 2010).We therefore believe that the GPP estimates used here are of sufficient quality to meet the needs of this study, although we recognize that it includes errors and uncertainties associated with modeled respiration, which can propagate to GPP estimates.To minimize these errors, we only included sites with high quality data and identified a subset of 176 sites with 515 site-years of data where each site-year satisfied two conditions: (i) more than 95 % of the days had daily GPP data, and (ii) the mean daily quality flag was more than 0.75 (Richardson et al., 2010b).Using land cover information (see Sect. 2.2), we also excluded sites where fewer than 20 % of pixels in 10.6 km 2 windows ( 7-by-7 window of 500 m MODIS pixels) centered over the site belonged to the same land cover type as the tower site.The final data set included 144 sites (Table S1 in the Supplement) and 422 site-years of data, spanned all of the major biomes and climate types, and included a range in annual GPP that varied from 200 to 4000 g C m −2 yr −1 (Table 2; Fig. 1).

MODIS data products
MODIS collection 5 land products are available from the Land Processes DAAC (https://lpdaac.usgs.gov)at 250, 500, and 1000 m spatial resolution, depending on the product (Justice et al., 2002).We computed the normalized difference vegetation index (NDVI), the enhanced vegetation index (EVI), and the land surface water index (LSWI) using nadir bidirectional reflectance distribution function adjusted reflectance (NBAR) data at 500 m spatial resolution and 8day time step (Schaaf et al., 2002).Information related to land cover and the timing and duration of the growing season at each 500 m pixel was obtained from the MODIS Land Cover Type and Land Cover Dynamics Products (Friedl et al., 2010;Zhang et al., 2006).We also used MODIS GPP (MOD17; Running et al., 2004), MODIS fraction of absorbed photosynthetically active radiation (FPAR) (MOD15; Myneni et al., 2002), and MODIS day and night land sur-face temperature (LST; Wan et al., 2002) data, which are all produced at 1000 m spatial resolution.Following the approach used in previous studies (Heinsch et al., 2006;Sims et al., 2008;Xiao et al., 2010), we extracted 500 and 1000 m MODIS products for 7-by-7 and 3by-3 pixel windows, respectively, centered on each site using the MODIS subsetting tool available at the ORNL DAAC for Biogeochemical Dynamics (http://daac.ornl.gov).We then selected the center pixel and all other pixels in the window with land cover labels equivalent to the land cover type at each flux tower.Figure 2 shows box plot for the number of pixels retained at each flux tower site in each biome.MODIS data were then averaged over the selected pixels to produce a single value for each MODIS product at each site at each time step.By using 3-by-3 km windows, we ensure that the tower is located in the window.More importantly, spatial averaging over pixels with similar land cover minimizes random variation in MODIS data and reduces errors associated with gridding artifacts (e.g., Tan et al., 2006) and land cover types that are different from the tower site (Garrity et al., 2011).

MODIS proxies of GPP
Remotely sensed data such as the growing season mean and integral of NDVI have been used as proxies for GPP in several previous studies (Tucker et al., 1981(Tucker et al., , 2001;;Myneni et al., 1998;Zhou et al., 2001).In this work we examined four different MODIS-based proxies of GPP (Table 1): (i) the growing period length (GPL), (ii) the growing season integral of EVI (EVI-area), (iii) the growing season mean NDVI, and (iv) the growing season mean EVI.GPL and EVI-area were obtained from the MODIS Collection 5 Land Cover Dynamics product (Ganguly et al., 2010).We included the GPL in our analysis because several studies have suggested that GPL is an important control on annual GPP (White et al., 1999;Barr et al., 2004;Churkina et al., 2005).GPL and EVI-area estimates were not extracted for evergreen broadleaf forest (EBF) sites because we assume that GPL is not a significant control on annual GPP in this biome.

GPP models based on MODIS data
We examined six remote sensing-based models in this study (Table 1): the MODIS GPP product (MOD17; Running et al., 2004), the temperature and greenness (TG) model (Sims et al., 2008), the vegetation photosynthesis and respiration model (VPRM) (Mahadevan et al., 2008), a non-parametric neural network model (e.g., Beer et al., 2010;Moffat et al., 2010), the MOD17 algorithm calibrated to tower GPP (e.g., Heinsch et al., 2006; hereafter referred to as "MOD17-Tower"), and regression models that use one of the four proxies and mean annual temperature or mean annual precipitation as predictors.Below we provide a brief description of each model (also see Table 1).
(i) MOD17.We obtained modeled 8-day estimates of GPP at each of the selected FLUXNET sites for the MODIS GPP product (MOD17A2; Running et al., 2004).The algorithm used to generate this product is based on light use efficiency and combines 8-day MODIS FPAR data with daily coarse resolution meteorological data and five biome-specific parameters to produce daily GPP estimates at 1 km spatial resolution (Table 1).
(ii) MOD17-Tower.The standard MOD17 product uses coarse resolution (1 • by 1 • ) PAR, temperature and vapor pressure deficit data.However, due to land surface heterogeneity and atmospheric variability, the three meteorological variables show a great deal of fine scale variability.Heinsch et al. (2006) showed that GPP estimated by the MOD17 algorithm is sensitive to the quality of meteorological forcing data.To minimize the errors due to the coarse resolution and low quality of meteorological data, we calibrated the MOD17 algorithm using fine spatial and temporal resolution, high quality meteorological data from the FLUXNET data set.Since model parameters are specific to a set of input data, we optimized the parameters of MOD17 algorithm by minimizing the sum of squared differences between daily tower measurements and GPP modeled with tower data (Heinsch et al., 2006).Following the same procedure that is used by the operational MOD17 algorithm (Zhao et al., 2005), we replaced all MODIS FPAR values that were not retrieved by the main MOD15 algorithm (i.e., missing values and those produced by the "backup" algorithm) by linear interpolation using adjacent good quality FPAR values.
(iii) VPRM.This model is also based on light use efficiency, but has significant differences from the MOD17 algorithm (Xiao et al., 2004).Following Mahadevan et al. (2008), we prescribed maximum, minimum and optimum temperatures in each biome.We then treated the half saturation point (PAR 0 ) and maximum light use efficiency (ε max ) as biomespecific parameters, and optimized them for each biome by minimizing the sum of the squared errors between daily modeled and observed GPP: where GPP VPRM is daily modeled GPP and GP P TOWER corresponds to measured GPP.We then randomly sampled the parameter space 1000 times and used the "trust-region" algorithm in MATLAB (MathWorks, 2009b) to find the vector [ε max , PAR 0 ] that minimized the cost function.2008) developed the TG model using MODIS data at 16-day time steps.In systems that exhibit rapid development and senescence, such as croplands, grasslands, savannas, and deciduous broadleaf forests, this relatively coarse temporal resolution reduces the TG model's ability to capture sharp transitions in phenology.We therefore used MODIS 8-day EVI and LST data to calculate GPP at 8-day resolution.Optimization of TG model parameters using tower GPP did not produce any significant differences in predicted GPP relative to the original model, and so we used the same parameters as originally described by Sims et al. (2008).Evergreen broadleaf sites were excluded because the TG model was not designed for this biome.
(v) Neural network model.Machine learning models that use meteorological and remote sensing data to predict carbon fluxes have been used in many recent studies (e.g., Xiao et al., 2010;Moffat et al., 2010).These models include no explicit biophysical structure, but are based on the assumption that functional relationships exist between the response (i.e., GPP) and predictor variables.We used a feed-forward neural network model with a single hidden layer and a sigmoid transfer function (MATLAB, 2009a).The model was estimated by minimizing the difference between predicted and observed GPP at daily time steps using the same input variables that were used to calibrate the MOD17-Tower model.
(vi) Regression models combining remote sensing proxies and climate predictors.We estimated regression models at annual time steps for each biome using two predictors: (1) a remotely sensed proxy (from Sect.2.3), and (2) mean annual temperature or precipitation (Garbulsky et al., 2010).The final model for each biome was based on the remotely sensed proxy and climate variable that explained the most variance in annual GPP in each biome.This model is referred to as "Proxy+Met".In CRO the model included EVI-area and mean annual temperature, in DBF and ENF it was based on GPL and mean annual temperature, and in EBF, GRA, and SSMF mean growing season EVI and mean annual precipitation.

Analysis
Our analysis uses estimates of annual GPP from the La Thuile database.Despite the size of this database, five biomes had fewer than 10 tower sites (savannas, woody savannas, open shrublands, closed shrublands, and mixed forest); we therefore pooled these into one group, labeled as SSMF.Savannas, woody savannas, open shrublands, and closed shrublands are all arid or semi-arid, where precipitation is a dominant control on primary productivity.Our analysis revealed that annual GPP at the mixed forests sites was more highly correlated with annual precipitation than with temperature.Thus, while the SSMF group includes sites with different plant functional types, variability in GPP at all of the sites in this group is largely controlled by water.
Our analysis explores both spatial and interannual covariance between in situ measurements and remotely sensed proxies and model-based estimates of annual GPP.To perform this analysis, it was important to distinguish random variability from ecologically meaningful variation in annual GPP data at each site.Daily GPP derived from eddycovariance measurements are generally assumed to include uncertainty on the order of 15-20 % (Falge et al., 2002;Hagen et al., 2006).However, summing daily GPP cancels random errors and reduces relative uncertainty in estimates of annual GPP compared to daily values (Falge et al., 2002;Hagen et al., 2006;Lasslop et al., 2010).Desai et al. (2008) report the interquartile range of annual GPP to be less than 10 % of the mean and Richardson (unpublished) estimates the uncertainty in annual GPP derived from eddy-covariance to be 5 %.Here we assume that uncertainty in annual GPP is ±5 % (±1 standard deviation).
We calibrated four models (the MOD17-Tower, neural network, Proxy+Met, and VPRM model) with tower data at biome level.To do this, we estimated separate models for each biome, where the parameters of each of the four models were optimized using tower GPP and meteorological data specific to each biome.In evaluating these four models we used a leave-one-site-out cross-validation method.This method allows the most efficient and objective use of available data while enabling independent calibration and evaluation of a model.Thus, in a biome with a total of n sites, we successively used n − 1 sites to calibrate model parameters and employed these parameters to predict GPP at the left-out nth site.
The first part of our analysis examines spatial covariance between remotely sensed estimates (or proxies) and in situ measurements of annual GPP.To this end, we first quantified the magnitude of spatial variance in annual tower GPP within each biome, and used this information to assess whether within-biome variance was sufficiently large relative to the uncertainty to provide meaningful information related to spatial variability in annual GPP.We then assessed the power of each remotely sensed proxy and model to explain spatial variation in annual tower GPP within each biome.Specifically, we compared variation in mean annual GPP across sites with the corresponding variation in each of the four remotely sensed proxies and mean annual GPP predicted by each of the models described in Sect.2.4.
To analyze interannual variation, we excluded sites with less than 3 years of GPP data.This resulted in a final data set composed of 302 site-years derived from 67 sites (Table 2).Also, because the magnitude of annual site anomalies tends to vary proportionally with the magnitude of mean annual GPP, we used relative annual anomalies for our analysis, which removes this effect.Specifically, the percent relative  annual anomaly was calculated as where RAA k (s, t) is the percent relative annual anomaly at site s in year t, A k (s, t) is the value of the variable k (in this case annual GPP), and MA k (s) is the annual mean of the variable k.Hereafter we refer to RAA k (s, t) as the relative annual anomaly.
Relative annual anomalies in tower GPP were compared with corresponding variations in the four remotely sensed proxies and annual GPP predicted from the five models described in Section 2.4.Note that we did not include results for the "Proxy+Met" model because interannual anomalies in annual temperature and precipitation were not significantly correlated with interannual anomalies in tower GPP.Finally, since large anomalies have high signal-to-noise ratios and are the main source of variance in interannual tower GPP, they provide a robust basis for assessing remote sensing proxies and models.To exploit this, we separately analyzed large anomalies, which we define here as those that exceeded ±10 % of mean annual GPP at each site (cf., Papale et al., 2006).

Spatial variation in mean annual GPP across sites
(i) Baseline characterization of spatial variability in tower GPP .Mean annual GPP varied from 1023-2240 g C m −2 year −1 across biomes.DBF had the lowest (321 g C m −2 year −1 ) and EBF had the highest standard deviation (913 g C m −2 year −1 ) in mean annual site GPP (Table 3).Among the four other biomes (CRO, ENF, GRA and SSMF), the standard deviation ranged between 400 and 600 g C m −2 year −1 (Table 3).DBF also had the lowest coefficient of variation (0.24), which was roughly half the magnitude observed in ENF (0.47), GRA (0.47) and SSMF (0.51).More importantly, spatial variation in mean annual site GPP in all biomes was significantly greater than average uncertainty (nominally ∼ 5 %; Table 3).The ratio of the standard deviation in annual GPP to average uncertainty was lowest (∼ 5) for DBF and highest for SSMF (∼ 10).
(ii) Spatial covariance between remotely sensed proxies and tower GPP.Remotely sensed proxies of GPP showed widely different ability to capture spatial variance in mean annual tower GPP, both within and between biomes.In CRO and DBF, only one of the four proxies (EVI-area and GPL, respectively) was significantly correlated with annual tower GPP.One or more proxies captured more than half the total variance in annual tower GPP in all biomes except CRO (Fig. 3).EVI-area was significantly correlated with annual tower GPP in five biomes, mean NDVI and EVI was significantly correlated in four biomes (ENF, EBF, GRA and SSMF), and GPL was significantly correlated with annual tower GPP in two biomes (DBF and ENF).Growing period length (GPL) was most highly correlated with annual tower GPP in one biome (DBF), EVI-area was most highly Fig. 4. R 2 , RMSE, MBE, and slope between modeled and measured mean annual GPP predicted from six models.TG model was not evaluated in EBF.RMSE is a measure of mean discrepancy between modeled and flux GPP.It does not distinguish between over-and under-prediction.MBE discriminates between over-and under-prediction and is a measure of mean bias between modeled and tower GPP.
correlated in two biomes (CRO and ENF), and mean EVI was most highly correlated in three biomes (EBF, GRA, and SSMF).
(iii) Spatial covariance between remote sensing-based models and tower GPP.The ability of the remote sensingbased models to capture spatial variation in annual GPP varied substantially within and between biomes.Overall, the "Proxy+Met" model provided the best overall prediction of mean annual tower GPP; mean annual site GPP predicted using this approach was significantly correlated with tower GPP in all six biomes (p < 0.05), and explained the largest variance in CRO (R 2 =0.38),DBF (R 2 = 0.47), GRA (R 2 =0.85) and SSMF (R 2 =0.70;Fig. 4).In ENF and EBF, tower GPP was most highly correlated with GPP predicted by the neural network model (R 2 =0.68 and 0.85 in ENF and EBF, respectively; Fig. 4), but GPP predicted by the "Proxy+Met" model captured nearly the same amount of variance (R 2 =0.63 and 0.82 in ENF and EBF, respectively).The RMSE and MBE for GPP modeled by the "Proxy+Met" model were among the lowest for all the six biomes (Fig. 4).Modeled GPP from the VPRM and neural network models also had low mean bias errors in one or more biomes, but the slope for least squares fits between modeled and tower GPP varied substantially across biomes for both of these models.

Temporal variation in annual site GPP across years
(i) Baseline characterization of interannual variation in tower GPP.Total interannual variance in GPP was dominated by years with large anomalies (Table 4).Average relative absolute anomalies were largest in GRA (17.5 %), followed by CRO (16.5 %) and SSMF (11.6 %).In the remaining three biomes (DBF, ENF and EBF), average relative absolute annual anomalies were less than 10 % and were lowest (8 %) in EBF (Table 4).The proportion of large anomalies was highest in CRO (67 %) and lowest in SSMF (26 %).Of the remaining four biomes, the proportion of year with large anomalies was 51 % in GRA, but less than one third of relative absolute annual anomalies were greater than 10 % in DBF, ENF and EBF (Table 4).
(ii) Covariance between interannual anomalies in remotely sensed proxies and tower GPP.Relative annual anomalies in growing season EVI and NDVI were significantly correlated with corresponding anomalies in tower GPP in EBF, GRA and SSMF (Fig. 5).Relative annual anomalies in growing season EVI showed marginally higher correlations with corresponding GPP anomalies in EBF (R 2 = 0.52) and GRA (R 2 = 0.64), and GPP anomalies in SSMF were most highly correlated with relative anomalies in NDVI (R 2 = 0.42; Fig. 5).No other proxies showed significant correlation with interannual anomalies in tower GPP, and agreement was especially poor in CRO, DBF and ENF, even for large anomalies in tower GPP.
(iii) Covariance between interannual anomalies in remotely sensed model predictions and tower GPP.In GRA, relative annual anomalies in GPP estimated by the TG, MOD17-Tower and neural network models explained substantial variance in corresponding tower GPP anomalies (R 2 ∼ = 0.6).In SSMF, the neural network model best explained the relative annual anomalies in tower GPP (R 2 ∼ = 0.7).In CRO, DBF, ENF and EBF, however, none of the modeled relative anomalies in annual GPP were significantly correlated with relative anomalies in tower GPP (Fig. 6).

Spatial variation in annual GPP
The results from this study show that remotely sensed proxies of GPP can successfully capture statistically significant and meaningful within-biome variation in GPP, but that the strength of the relationship is highly variable and depends on the biome and remotely sensed proxy.However, with the exception of ENF and GRA, proxies explained less than 50 % of the spatial variance in annual GPP.Thus, inferences regarding spatial patterns in GPP based on patterns observed in remotely sensed proxies should be made with caution.
Spatial covariance between remote sensing model predictions and tower-based annual GPP were similarly inconsistent; the majority of models explained less than 50 % of spatial variance in annual GPP.With the exception of croplands, the Proxy+Met model showed the best agreement with tower GPP.This result is consistent with the hypothesis that spatial variation in terrestrial GPP over large areas reflects an equilibrium response to climate (Burke et al., 1997;Richardson et al., 2010a).Recent studies have also suggested that light use efficiency model parameters should be calibrated to different climate types within biomes, thereby capturing spatial variation in ecosystem properties and processes (King et al., 2011).Our results appear to support this approach, and refined treatments that account for both temporal (interannual) and spatial (within-biome) variation in model parameters may help to resolve this issue.
Overall, the models had the weakest performance in CRO and DBF.Relatively weak performance of models in DBF at annual scale has also been noted in the context of dynamic ecosystem models (e.g., Schwalm et al., 2007).Most of the DBF sites included in this study are located in temperate regions where phenology co-varies with temperature and PAR, and exerts significant control on annual GPP across sites (Richardson et al., 2012).Across sites, one day of error in estimating spring and fall phenology can lead to errors of 12 g C m −2 in January to June GPP and 6 g C m −2 in July to December GPP estimates, respectively (Richardson et al., 2012).In LUE-based models such as those included in this analysis, interaction between PAR and phenology is primarily captured through APAR.Thus, the ability of LUE-based models to capture variations in GPP is closely tied with the accuracy of remotely sensed estimates (or surrogates) for FPAR.For a variety of reasons it appears that 8-day MODIS data (FPAR in MOD17,and EVI in VPRM and TG) do not consistently capture rapid phenological changes occurring over relatively short timescales in spring and fall, thereby introducing error to remote sensing-based estimates of annual GPP.Moreover, at instantaneous timescales temperature and vapor pressure deficit modulate leaf level photosynthesis (Farquhar et al, 1980).The LUE models we examined use daily (or 8-day) inputs and assume that leaf-level mechanisms hold at daily (or longer) timescales and are uniform   over large areas.However, whether and how leaf level processes scale to daily and longer timescales is an open question (Beer et al., 2010;Horn and Schulz, 2012), and some studies have observed that daily temperature and vapor pressure deficit exert only modest control on daily GPP (Gebremichael and Barros, 2006;Jenkins et al., 2007;Garbulsky et al., 2010).
In addition to the reasons mentioned above, the remote sensing methods tested here did not effectively explain spatial variation in annual GPP in crops, probably because agricultural practices that are not captured by remote sensing exert significant control on GPP in croplands.Specifically, application of fertilizers (Eugster et. al., 2010), variation in crop varieties (Moors et al., 2010), irrigation, and harvest practices significantly modify productivity in croplands (Suyker et al., 2004;Verma et al., 2005).These practices are not directly observable from remote sensing, and as a result, variation in productivity arising from these practices are not well-reproduced by remote sensing-based models (Zhang et al., 2008;Chen et al., 2011).

Interannual variation in GPP
Results from this work suggest that the ability of widely used remote sensing methods to explain interannual variation in GPP is relatively modest and varies significantly across biomes.In CRO, DBF, and ENF, relative annual anomalies in tower GPP were not significantly correlated with corresponding anomalies in remote sensing proxies and model predictions, even when anomalies less than ±10 % were excluded.This result suggests that important environmental drivers, biotic factors, and other unknown controls that influence interannual variability in GPP were not captured by the remote sensing proxies and models in these biomes.For example, moisture in the root zone is especially important and can affect annual GPP in both crops and seasonally dormant forests (Irvine et al., 2004;Zhang et al., 2006).Similarly, anomalies in spring phenology can have carry-over effects that influence GPP anomalies (e.g., Richardson et al., 2010b).Neither of these controls is directly observed or represented in the remote sensing proxies or models that we tested.In DBF high frequency variation in meteorological forcing has been shown to produce variation in GPP that accumulates over time and affects annual productivity (Medvigy et al., 2010).
On a more positive note, interannual anomalies in mean growing season greenness (EVI) and annual GPP were highly correlated in EBF.At the same time, relative annual anomalies in GPP predicted by the remote sensing models did not show comparable explanatory power at EBF sites.Thus, the additional complexity provided by the models not only failed to improve their performance but seemed to effectively cancel information provided by remote sensing.Several recent studies have documented large anomalies in greenness associated with drought in Amazon forests (Saleska et al., 2007;Samanta et al., 2010;Xu et al., 2011), which may significantly affect regional-to-global carbon budgets (Brando et al., 2010).The relationship between remotely sensed VIs and GPP at seasonal timescale is particularly complex in EBF (Huete et al., 2006).Despite these challenges, our results suggest that year-to-year variations in GPP may be partly driven by the corresponding variations in LAI.A number of EBF sites included in this study are located in subtropical Mediterranean locations where annual productivity is partly controlled by precipitation and large annual precipitation anomalies cause corresponding anomalies in VIs.
Anomalies in mean growing season EVI and NDVI explained ∼ 40-60 % of annual GPP anomalies in GRA and SSMF.In GRA, correlations between anomalies in mean EVI (and NDVI) and anomalies in GPP suggest that interannual variability in GPP in grasslands is tightly coupled to leaf area and supports the hypothesis that grasslands use LAI regulation to avoid moisture stress (Jenerette et al., 2009).Our results suggest that mean EVI and NDVI successfully capture the effect of moisture variability on GPP at GRA and SSMF sites, including moderate drought conditions when GPP can actually increase because of increases in LAI (Nagy et al., 2007;Mirzaei et al., 2008;Aires et al., 2008).
Finally, all the models included in this study assume that parameters such as light use efficiency are biome-specific and constant over time.VPRM and MOD17 specifically assume that variation in GPP can be explained by variation in FPAR and a small set of easily observable environmental variables (Table 1).Our results indicate that this assumption is not very robust, and that spatial and temporal variation (both within and across years) in key parameters may explain a significant portion of year-to-year variance in GPP.Models that use static or biome-specific parameters will not capture these dynamics (Polley et al., 2008;Stoy et al., 2009;Keenan et al., 2011) and therefore are not able to capture important sources of spatiotemporal variation in GPP.Moving forward, it may be possible to refine this weakness of remote sensing-based LUE models using complementary remote sensing metrics such as fluorescence or photochemicalbased reflectance indices that measure physiological properties of vegetation canopies that control photosynthesis (e.g., Guanter et al., 2012;Gamon et al., 1992).Recent studies have also suggested that lagged effects can significantly affect annual GPP (Gough et al., 2008;Marcolla et al., 2011).For example, Zielis et al. (2014) used long-term eddy covariance data collected at a spruce forest site to show that inclusion of meteorological data from the previous year significantly improved estimates of net ecosystem exchange, suggesting that next generation models need to include lagged effects and functional responses to climate forcing in previous years.

Challenges in comparing MODIS derived estimates with tower GPP
We identify two main challenges in a study like ours: landscape heterogeneity and uncertainty in tower GPP.Landscape heterogeneity is widely viewed to be an important factor that complicates interpretation of results from studies that couple flux data with data from MODIS.In this study, we accounted for landscape heterogeneity around tower sites using the MODIS land cover product.However, sub-pixel heterogeneity in land cover may still be a source of disagreement between observed and modeled fluxes.Furthermore, in biomes with strongly seasonal climates, sub-pixel heterogeneity can produce significant errors in remotely sensed phenology, which influences both observed and modeled primary productivity in many ecosystems.Emerging data sets and methods from Landsat for mapping both land cover and phenology (e.g., Melaas et al., 2013) at finer spatial resolution should address the issue of landscape heterogeneity and provide an improved basis for this type of analysis.Since year-to-year variations in GPP are relatively small, accurate characterization of uncertainty in tower GPP is important for interannual analysis.We used best available estimates of uncertainty in GPP.Estimating uncertainty in tower GPP is an active field of research and efforts are underway to improve our understanding of uncertainty in tower data.As more data become available from different sites and better characterization of uncertainty in GPP becomes possible, future analyses can factor in new and better estimates of uncertainty in tower GPP.

Conclusions
We draw two main conclusions from this work.First, the remote sensing models and proxies that we examined provide statistically significant and useful information related to spatial variation in annual GPP.Second, the remotely sensed proxies and modeled estimates of annual GPP only explained relatively modest amounts of variance in annual GPP across years.These conclusions are important for two main reasons.First, no previous study has explored these issues using a database as large and comprehensive as the FLUXNET La Thuile data set.Second, and more importantly, a large number of recent studies have used remote sensing to infer regional-to-global changes in GPP or net primary productivity.Many of these papers justify their conclusions based on previous studies that use models or proxies to explain spatial variance in annual GPP (or NPP) across large spatial scales or at seasonal timescales.The results from this study suggest the ability of remote sensing methods to explain spatial variance in annual GPP across widely different biomes should not be used to assume that remote sensing methods accurately capture variation in annual GPP within biomes.Similarly, the ability of remote sensing to capture seasonal variation in GPP should not be used to assume that remote sensing methods successfully capture variation in annual GPP across years.In both cases, the magnitude of variance is generally much larger in the former case (across biomes or within seasons) than it is in the latter case (within biomes or across years).
An additional important result from this work is that greater model complexity and higher temporal resolution did not improve the ability of models to explain spatial or temporal variance in annual GPP.Indeed, the simplest model ("Proxy+Met") explained the most spatial variance in annual tower GPP.Similarly, interannual variation in remotely sensed proxies explained as much or more interannual variance in GPP than any of the models.Spatial and temporal correlation between annual GPP and remote sensing proxies of total greenness (e.g., as measured by mean growing season EVI) was highest in moisture-limited biomes.In temperature-limited systems such as DBF and ENF, on the other hand, remotely sensed proxies showed statistically significant correlations with spatial variation in annual GPP, but almost no ability to explain interannual variation in GPP.

Fig. 1 .
Fig. 1.Location of 144 sites used in this study from the La Thuile data set.

Fig. 2 .
Fig. 2.Box plots showing the number of pixels in 7-by-7 pixel windows centered at each tower site whose land cover class matched the land cover corresponding to the tower sites (maximum agreement = 49).The pixel land cover classes were obtained from the MODIS Land Cover Dynamics Product and land cover at each tower site was obtained from the information provided in the La Thuile data set.Red line marks median and cross indicates outliers (> 3 standard deviation).

Fig. 3 .
Fig.3.R 2 between mean annual tower GPP and corresponding values from the four different remotely sensed proxies of GPP.GPL and EVI-area were not used in EBF.

Fig. 5 .
Fig.5.R 2 between relative interannual anomalies in tower GPP and the four proxies.GPL and EVI-area were not evaluated in EBF.

Table 1 .
'Remotely sensed proxies and models investigated in this study

Table 1 .
Summary of the remote sensing proxies and models investigated in this study.

Table 2 .
Total number of sites and site-years used in this study from the La Thuile data set.CRO, DBF, EBF, ENF and GRA are cropland, deciduous broadleaf forest, evergreen broadleaf forest, evergreen needleleaf forest and grassland, respectively.The category SSMF includes open and closed shrublands, savannas, woody savannas and mixed forest sites.Biome CRO DBF ENF EBF GRA SSMF TOTAL Mahadevan et al., 2008)to smooth the resulting MODIS time series.(iv)Temperatureand greenness (TG).Sims et al. ( To account for noise and missing data, we used quality assurance flags from the MODIS NBAR product to remove poor quality EVI and LSWI data and applied a locally weighted least squares www.biogeosciences.net/11/2185/2014/Biogeosciences, 11, 2185-2200, 2014 algorithm (

Table 3 .
Baseline statistics for annual tower GPP across sites in different biomes.

Table 4 .
Baseline statistics for relative annual anomalies in tower GPP in different biomes.