On estimating the gross primary productivity of Mediterranean grasslands under different fertilization regimes using vegetation indices and hyperspectral reﬂectance

. We applied an empirical modelling approach for gross primary productivity (GPP) estimation from hyperspectral reﬂectance of Mediterranean grasslands undergoing different fertilization treatments. The objective of the study was to identify combinations of vegetation indices and bands that best represent GPP changes between the annual peak of growth and senescence dry out in Mediterranean grasslands. In situ hyperspectral reﬂectance of vegetation and CO 2 gas exchange measurements were measured concurrently in un-fertilized (C) and fertilized plots with added nitrogen (N), phosphorus (P) or the combination of N, P and potassium (NPK). Reﬂectance values were aggregated according to their similarity ( r ≥ 90 %) in 26 continuous wavelength intervals (Hyp). In addition, the same reﬂectance values were resampled by reproducing the spectral bands of both the Sentinel-2A Multispectral Instrument (S2) and Landsat 8 Operational Land Imager (L8) and simulating the signal that would be captured in ideal conditions by either Sentinel-2A or Landsat 8.

Abstract. We applied an empirical modelling approach for gross primary productivity (GPP) estimation from hyperspectral reflectance of Mediterranean grasslands undergoing different fertilization treatments. The objective of the study was to identify combinations of vegetation indices and bands that best represent GPP changes between the annual peak of growth and senescence dry out in Mediterranean grasslands.
In situ hyperspectral reflectance of vegetation and CO 2 gas exchange measurements were measured concurrently in unfertilized (C) and fertilized plots with added nitrogen (N), phosphorus (P) or the combination of N, P and potassium (NPK). Reflectance values were aggregated according to their similarity (r ≥ 90 %) in 26 continuous wavelength intervals (Hyp). In addition, the same reflectance values were resampled by reproducing the spectral bands of both the Sentinel-2A Multispectral Instrument (S2) and Landsat 8 Operational Land Imager (L8) and simulating the signal that would be captured in ideal conditions by either Sentinel-2A or Landsat 8.
An optimal procedure for selection of the best subset of predictor variables (LEAPS) was applied to identify the most effective set of vegetation indices or spectral bands for GPP estimation using Hyp, S2 or L8. LEAPS selected vegetation indices according to their explanatory power, showing their importance as indicators of the dynamic changes occurring in community vegetation properties such as canopy water content (NDWI) or chlorophyll and carotenoids / chlorophyll ratio (MTCI, PSRI, GNDVI) and revealing their usefulness for grasslands GPP estimates.
For Hyp and S2, bands performed as well as vegetation indices to estimate GPP. To identify spectral bands with a potential for improving GPP estimates based on vegetation indices, we applied a two-step procedure which clearly indicated the short-wave infrared region of the spectra as the most relevant for this purpose. A comparison between S2and L8-based models showed similar explanatory powers for the two simulated satellite sensors when both vegetation indices and bands were included in the model.
Altogether, our results describe the potential of sensors on board Sentinel-2 and Landsat 8 satellites for monitoring grassland phenology and improving GPP estimates in support of a sustainable agriculture management.

Introduction
Mediterranean grasslands are highly biodiverse ecosystems, covering around 22 % of the European Union land area and providing important ecosystem services such as forage production (Bugalho and Abreu, 2008;Díaz-Villa et al., 2003). These ecosystems are subjected to large pressures under global change (Sala, 2000), namely by the increasing availability of nutrients (e.g. phosphorus, P, and nitrogen, N) due to human use of fertilizers and N deposition (Ceulemans et al., 2014;Galloway et al., 2004;Peñuelas et al., 2013) and by a decrease and shift in seasonal patterns of precipitation (Costa et al., 2012;Kovats et al., 2014). The contemporary changes in the supply of water and nutrients can affect species composition, biomass and phenology along the life cycle of annual grasslands (Harpole et al., 2007), compromising their productivity. In particular, the onset and duration of the senescence period, largely dependent on soil water availability, can be affected in Mediterranean grasslands with great impact on their functioning and consequences on gross primary productivity (GPP) (Aires et al., 2008a, b;Jongen et al., 2013b;Xu and Baldocchi, 2004). This uncertainty scenario increases the need for frequent monitoring of GPP along the growing season.
Using remote-sensing-based information to evaluate GPP brings important advantages, both from a scientific and a management point of view. Spectral retrievals collected from optical sensors on board remote platforms may provide information on many biophysical properties of vegetation and can be usefully employed for monitoring and modelling ecosystems GPP in a cost-and time-effective way (Schimel et al., 2015). Also, for land managers, the capability to make timely grassland management decisions may improve the use and sustainability of these ecosystems.
GPP estimation models integrating remote-sensing observations increased considerably in the last decades (Beer et al., 2010). Such models are generally based on the light use efficiency (LUE) concept (Monteith, 1972(Monteith, , 1977, which defines GPP as a function of the fraction of radiation absorbed by vegetation, which in turn depends on green leaf area and the efficiency by which light energy is used to fix carbon during photosynthesis (i.e. LUE) (Cheng et al., 2014;Yuan et al., 2014).
Based on this approach, a large amount of effort has been made to derive vegetation indices able to represent the green leaf area and LUE. The Normalized Difference Vegetation Index (NDVI) is widely used for its known linear relationship with the absorbed radiation (Fensholt et al., 2004;Joel et al., 1997;Myneni and Williams, 1994). However, some exceptions are reported in the literature. For example, in highly productive environments, such as grasslands, NDVI becomes easily saturated, not responding to increased leaf area and LUE, and the regression observed is no longer linear (Vescovo et al., 2012;Viña and Gitelson, 2005).
In annual grasslands, such as the Mediterranean, controls on ecosystem carbon balance are generally considered to be mainly related to the amount of green leaf area, while few LUE changes are expected (Gamon, 2015). Nonetheless, several studies reported a hysteresis in LUE in grasslands when the duration of the study encompassed the whole life cycle (Nestola et al., 2016;Pérez-Priego et al., 2015a).
The Photochemical Reflectance Index (PRI) is frequently adopted as a proxy of LUE  on the scale of leaf and canopy (Garbulsky et al., 2011). PRI in the short term represents the dynamic of the xanthophylls cycle (Peñuelas et al., 1995) which is related to thylakoid energization and hence to light harvesting by photosynthesis. In the long term, PRI was found to be correlated with the ratio of carotenoids to chlorophyll (Filella et al., 2004;Porcar-Castell et al., 2012) and hence to plant senescence, since chlorophyll degradation and N export is a distinctive process of leaf ageing (Thomas, 2013). However, PRI also shows some drawbacks, since it is largely affected by species identity, leaf age or environmental conditions (Peñuelas et al., 1995) and by sensor geometry and atmospheric factors (Moreno et al., 2012). Hence the performance of models integrating PRI is frequently below expectation (Pérez-Priego et al., 2015a).
As a result, other vegetation indices have been tested as alternatives to NDVI and PRI for GPP estimation. In a subalpine grassland, Rossini et al. (2012) obtained the best GPP estimate adopting the MERIS Terrestrial Chlorophyll Index (MTCI) (Dash and Curran, 2004), a proxy of chlorophyll, and PRI. In another study, in a subalpine grassland, Sakowska et al. (2014) found that the red-edge NDVI, a modified NDVI for which the infrared band is substituted with a red-edge band (Gitelson and Merzlyak, 1994), improved GPP estimates. In Mediterranean grasslands with different N and P fertilization levels, PRI, together with solar induced fluorescence, improved GPP estimates (Pérez-Priego et al., 2015a). In a semi-arid grassland, Vicca et al. (2016) observed that several vegetation indices, including NDVI and the Normalized Different Water Index (NDWI) (Gao, 1996), a proxy of vegetation water content, were able to capture the drought effect on GPP.
Altogether these results clearly indicate that, in spite of the usefulness of VIs in representing dynamic changes in biophysical properties of vegetation, further studies are needed to identify vegetation indices and the spectral regions that can be potentially interesting to estimate grassland GPP under different environmental constraints, such as nutrient availability.
The adoption of a specific model also depends frequently on the availability of remote-sensing products at suitable spatial and temporal scales. In the case of local-scale monitoring of managed grasslands, sensors with high spatial resolution will produce better results than sensors with coarse spatial resolution. In this study we opted for simulating data from Sentinel-2A MSI (Multi-Spectral Instrument) (hereafter named S2) and Landsat 8 OLI (Operational Land Imager) (hereafter named L8), for their spatial resolution (10-20 m for S2 and 30 m for L8), which is more suitable for representing the spatial heterogeneity of grasslands and hence better adapted to implementing management options from a precision agriculture perspective. L8 provides reflectance in seven bands, ranging from the visible to the short-wave infrared region (SWIR) (Loveland and Irons, 2016), but its main drawback is the long revisiting time of 16 days. The recently launched S2 covers the visible and near-infrared regions and also the SWIR in 13 bands with at worst a 5-day revisiting time for the combination of S-2A and S-2B platforms (Drusch et al., 2012).
Field collection of vegetation reflectance by hyperspectral sensors is less cost effective and more time consuming than satellite remote-sensing data but presents the advantage of providing reflectance in numerous high-resolution wavelengths (Porcar-Castell et al., 2015). Therefore, it can be usefully employed for identifying which wavelengths best reflect biophysical properties and the physiological status of vegetation Matthes et al., 2015) and point to regions of the spectra that are potentially interesting for GPP modelling, which until now have not been exploited by remote sensors. The high detail of spectral resolution (1 nm nominal) is a further advantage of hyperspectral measurements. In particular, it allows us to compare the performance of similar vegetation indices that are available from different satellite platforms and resample hyperspectral information to match spectral bands of different remote sensors.
A promising new technology is the use of space-borne imaging spectroscopy. The hyperspectral resolution of these images allows canopy properties to be identified with higher sensitivity than traditional vegetation indices. For example, the use of space-borne imaging spectroscopy was able to detect changes in canopy leaf area and water stress in a humid tropical forest, whereas NDVI and other vegetation indices failed .
The aim of this study was to identify combinations of vegetation indices and bands that better reflect GPP changes in the period comprised between the annual peak of growth and senescence dry out in Mediterranean grasslands that are subjected to different fertilization treatments. To achieve this goal, in situ hyperspectral measurements of vegetation reflectance were used to estimate GPP in a grassland northeast of Lisbon, Portugal, before and after the annual peak of growth which generally occurs in May, with large interannual differences (Jongen et al., 2011). A set of vegetation indices proposed in the literature were calculated and the performances of models used to estimate GPP based on linear combinations of vegetation indices and bands were compared.
Whenever a comparable spectral range was available for the S2 and L8, vegetation indices were also calculated to simulate the respective bands and the performances of GPP estimates based on remote platforms, and in situ hyperspectral measurements were compared.
The specific objectives of the study were (i) to test the impact of differing nutrient availability on GPP in Mediterranean grassland; (ii) to identify the set of vegetation indices to optimize GPP model in our experimental conditions; (iii) to compare the performance of GPP models employing only vegetation indices, spectral bands or a combination of both; and (iv) finally, to compare GPP models using spectral information obtained from hyperspectral sensors with similar models obtained from S2 and L8 platforms.

The study site
Our study was conducted in a semi-natural Mediterranean grassland at Companhia das Lezírias, an estate of approximately 15 000 ha, located north-east of Lisbon,Portugal (38 • 49 45.13 N,8 • 47 28.61 W). The grassland plant com-munity is composed of annual C 3 species. The climate is Mediterranean, with mild, wet winters and hot, dry summers. Long-term (1961Long-term ( -1990) mean annual rainfall is 709 mm. Mean annual temperature is 15.9 • C (INMG, 1991). Site topography is flat and the soil is a well-drained deep Haplic Arenosol (WRB, 2006).

Experimental design
The grassland studied is part of the Nutrient Network experiment (Borer et al., 2017;Seabloom et al., 2013). Plots (5 m × 5 m) were established in 2012, in a randomized block design. Factorial combinations of nitrogen (N), phosphorus (P), and potassium plus micronutrients (K), a total of eight treatments per block, including controls (C) with no added nutrients, were established. All nutrients were added at a rate of 10 g m −2 yr −1 . N was added as a slow-release urea (60-90 days), P was added as triple-super phosphate and K as potassium sulfate. Micronutrients (6 % Ca, 3 % Mg, 12 % S, 0.1 % B, 1 % Cu, 17 % Fe, 2.5 % Mn, 0.05 % Mo, and 1 % Zn) were added with K only once, at the start of the study to avoid possible micronutrient toxicity. In this study, only four fertilization treatments were considered: C, N, P and NPK. Each one of these treatments was repeated twice per block, and a total of 24 plots were measured (2 replicates × 4 treatments × 3 blocks).

Environmental measurements
Temperature, PAR and relative humidity were measured in situ using a VP-3 humidity temperature and vapour pressure sensor and QSO-S PAR Photon Flux sensor (Decagon Devices, Pullman, USA) logged every 30 min (EM50 data logger, Decagon Devices, Pullman, USA). Precipitation was recorded using a tipping bucket rain gauge (RG2, Delta-T Devices, Cambridge, UK). Soil water content (SWC) was continuously measured at a depth of 10 cm, which corresponds to the main rooting zone (Jongen et al., 2013a;Schenk and Jackson, 2002), using EC-5 soil moisture sensors (Decagon Devices, Pullman, USA). The rain gauge and soil sensors were connected to a CR1000 and AM16/32B multiplexer data logger (Campbell Scientific, Logan, USA).

Field measurements
2.4.1 NEP and R from a closed system IRGA Grassland net ecosystem productivity (NEP) was measured with a closed chamber (40 cm × 40 cm × 54 cm) of polymethylmethacrylate (3 mm thick) inserted into a permanent frame buried 5 cm into the soil. Radiation transmittance was higher than 95 %. The same chamber was covered with a reflective cloth for dark respiration (R) measurements. Air temperature inside the chamber was continuously monitored and PAR was measured at the beginning and end of measurements with a ceptometer (AccuPAR-LP80, Decagon De-vices, Inc. Pullman, WA, USA). Fans in the chamber ensured air circulation. The chamber was connected to an infrared gas analyser (LI-840, Li-Cor, Lincoln, NE, USA), measuring CO 2 and water vapour. Each measurement was no longer than 3 min. Fluxes were calculated based on the rate of change in CO 2 inside the chamber after an initial period of at least 10 s. Flux calculations and corrections for CO 2 water vapour dilution followed Pérez-Priego et al. (2015b). GPP was obtained by subtracting R from NEP at each measurement. All plots were measured between 11:00 and 13:00 UTC on clear-sky days, as close as possible to field spectroradiometric measurements. Measurements were taken during the 2016 growing season. Two field campaigns were carried out during vegetation growth, on day 1 (31 March-1 April) and day 2 (24-25 April) and two during the senescence phase, on day 3 (19-20 May) and day 4 (1-3 June).

Green plant area index and biomass
The plant area index (PAI), a measure of all above-ground plant structure, was indirectly measured with a linear PAR ceptometer (AccuPAR LP-80 Decagon Devices Inc., Pullman, WA, USA). The ceptometer measures the fraction of PAR intercepted by the canopy (fPAR) according to Eq. (1): where PAR i is the incoming PAR measured above the canopy and PAR t is the PAR transmitted through the canopy, measured below it. The fPAR was considered approximately equal to absorbed radiation, as the amount of reflected radiation in the PAR range is usually low (Gower et al., 1999). For each plot, 6-8 measurements above (PAR i ) and below (PAR t ) the canopy were taken and averaged.
The PAI is calculated by inversion of the Beer-Lambert law (Eq. 2): where K is the light extinction coefficient, which depends on the leaf angle distribution of the canopy and on the zenith angle of the probe. The first is considered spherically distributed (Jones, 1992), the second is calculated by the ceptometer considering the geographic coordinates and date and time of measurements. To avoid low solar zenith angles all measurements were taken around solar noon. As the growing season progressed, some species started to senesce. In order to estimate the fraction of PAR absorbed only by photosynthesizing components of the canopy ("green" fPAR and PAI, fPARgr and PAIgr respectively), fPAR and PAI were multiplied by a normalized (by scaling between 0 and 1) greenness index (GI, calculated as a ratio between the digital number values of green and the sum of red, green, and blue digital number values) derived from the analysis of digital pictures of the plots taken at each measurement day around solar noon (Cyber-shot DSC-W530, SONY), using the Phenopix R package (Filippa et al., 2016).
To determine above-ground productivity, a strip of vegetation (0.1 m × 1 m) within each plot was collected close to peak growth and biomass divided into functional types (legumes, forbs, graminoids) and dried in an oven until constant weight at 60 • C.

Hyperspectral measurements of vegetation reflectance
At each field campaign, hyperspectral observations of all plots were performed with a FieldSpec3 spectroradiometer (ASD Inc., Boulder, USA), which provides reflectance of vegetation in the range of 350-2300 nm. The spectral resolution (full width at half maximum) is 3 nm at 700 nm and 10 nm at 1400 and 2100 nm. The sampling interval is 1.4 nm for the spectral region of 350-1000 nm (visible and near infrared) and 2 nm for the spectral region of 1000-2500 nm (short-wave infrared). A white reference of known reflectance (Spectralon panel, Labsphere, Inc., North Sutton, USA) was used to normalize for variations in atmospheric conditions and to convert the measurements into absolute reflectance (Ref). Spectra were collected using a bare-fibre optical cable (with an instantaneous field of view of 25 • ) inserted into a pistol grip at approximately 90 cm above the canopy and a nadir view. Five spectra were recorded for each plot, each one representing an average of 25 observations. All measurements were conducted immediately after grassland gas exchange measurements, within 2 h around solar noon, to minimize the effects of shadowing and solar zenith changes.

Data analysis
All statistical analyses were performed using open-source R (R Core Team, 2017). We used the lme4 package (Bates et al., 2014) to perform linear mixed-effect analyses of the effect of the fertilization and control treatments on NEP, R, GPP and PAIgr. Treatment and date were the fixed effects and the block was the random effect. Conditions of homoscedasticity and normality were always verified by visual inspection of residuals. P values were obtained by likelihood ratio tests of the full model with the effect in question against the model without the effect in question. A Tukey test was used for posthoc comparison using the multcomp package (Hothorn et al., 2008).
The full spectra of vegetation reflectance retrieved from the Fieldspec was used to model GPP, after excluding noisy values in the range 1350-1400 and 1800-1950 nm. Our P = 1748 original explanatory variables are x 350 , . . . , x 2299 where x λ represents the reflectance in the narrowband [λ, λ+1] (nm) and our response variable is the GPP (µmol m −2 s −1 ). A total number of 96 observations were available (4 treat- ments × 2 replicates × 3 blocks × 4 dates). Since we have 1748 explanatory variables and just 96 observations, hence a high level of redundancy in our data, the number of predictors were first reduced by grouping variables that belong to intervals of wavelengths at which all variables are highly correlated. A hierarchical cluster analysis was performed to reduce the number of predictors from P = 1748 to p = 25 groups of contiguous variables named "bands". The basic idea is to aggregate contiguous and highly correlated individual 1 nm intervals of wavelength into broader wavelength bands. Two original predictors x λa , x λb are clustered together if their correlation coefficient r(x λa , x λb ) is larger than 0.90. Bands correspond to the largest contiguous intervals at which all pairs of original predictors satisfy that condition. If a band groups all original predictors between λ 1 and λ 2 , then it is represented by a new variable x [λ1,λ2] , which is the arithmetic mean of the original variables x λ1 , . . . , x λ2 . The procedure is repeated to obtain all bands that partition the full x 350 , . . . , x 2299 spectrum. Reflectance values were also resampled to simulate bands of Sentinel-2A MSI (S2) and Landsat 8 OLI (L8). Since each band of S2 or L8 has a spectral response which is not perfectly uniform, we use the spectral response function of each sensor (Barsi et al., 2014;ESA, 2018) to weigh the contribution of each original predictor. As a result, for each sensor and band [λ 1 , λ 2 ], we calculated the reflectance as a weighted mean of x λ1 , . . . , x λ2 , where the weights are given by the spectral response. The list of S2 and L8 bands used in this study is shown in Table 1.
Vegetation indices (VIs) ( Table 2) were calculated from hyperspectral (Hyp), or simulated S2 and L8 sensors (Table 1). The selected VIs were retrieved from the literature based on their relation to biophysical properties of vegetation affecting GPP. Given that the goal of the analysis is to determine the set of VIs and/or bands that best model GPP, we apply a data analysis method that identifies the best subset of single variables. This is distinct from principal component analysis (PCA) where dimensionality reduction is achieved through replacing variables by their linear combinations, which still involve all the variables. We adopted linear regression (MLR) to model the relation between our explanatory variables (bands and VIs) and the response variable (GPP). Although the expressiveness of non-linear models (e.g. in the field of machine learning) is stronger than MLR, we believe that linear models provide a clearer interpretation of the relation between predictors and GPP, while offering enough flexibility by including variables in a high-dimensional representation space. Moreover, and as discussed below, linear models allow us to derive confidence intervals for our results, apply statistical tests to compare models at a given significance level, and they are less prone to overfitting than complex non-linear models.
Since the number of observations are only roughly twice as large as the number of new explanatory variables, we performed variable selection and excluded those that do not contribute significantly to the goodness-of-fit of our model. Although the dimensionality of the problem is very large, it can be solved efficiently by the LEAPS algorithm (Furnival and Wilson, 1974) available through the R package leaps (Lumley, 2009). Unlike alternative heuristic approaches (Cadima et al., 2004), LEAPS returns the optimal subset of predictors according to a given criteria. In our analysis, the criteria was the adjusted R 2 , so LEAPS returns the sub-model with the highest-adjusted R 2 among all possible 2 p sub-models, where p is the number of predictors.
A nested approach was adopted to formally test which model better explained GPP. A preliminary test showed that better results were obtained with exponential regressions and therefore ln GPP was adopted as the response variable in all analyses. Tissue water content Peñuelas et al. (1993) The general model was ln GPP ∼ n j,1 vj , where v is vegetation indices (VIs) or optical bands (B) from Hyp grouping procedure or from simulated S2 or L8 data. The subset of ν j was selected by maximizing the adjusted R 2 among all possible combinations of predictors. The LEAPS procedure returns an optimal model that we called L. However, L may include variables which contribute only marginally for the overall adjusted R 2 . To further reduce the dimensionality of the predictors, we test sub-models of L (obtained by backwards stepwise selection of predictors) against the LEAPS optimal model L. When sub-models of L were found not to be significantly worse than L at a significance level alpha = 0.05, we then considered the most parsimonious of those sub-models as the optimal solution. A F test was used to perform those comparisons. The analysis was repeated separately for all vegetation indices (VIs) and bands (B) from Hyp, S2 or L8 data, obtaining an optimal model for each sensor. We performed an analysis of residuals for each selected model, which showed no evidence of violation of the linear model assumptions.
Besides determining the adjusted R 2 for the optimal model from the full sample, we applied a bootstrap procedure (N = 10 000 iterations) to estimate the distribution of the adjusted R 2 in the whole population (Ohtani, 2000). This allowed us to estimate quantiles (25-75 %) for adjusted R 2 and also compare the adjusted R 2 distributions among models. In particular, it permits an estimation of the probability that the model A has a higher adjusted R 2 than the alternative model B.
Two-step models were also used to investigate whether optical bands had the potential to improve models based only on vegetation indices (VIs). Toward that end, bands (B) were added to the optimal models obtained by the procedure described above denoted by Hyp-VIs, S2-VIs and L8-VIs (step 1). Using step 1 as the base model, we applied LEAPS to determine the subset of bands that maximized the overall adjusted R 2 . As before, we applied a F test (alpha = 0.05) to possibly reduce the number of bands in the optimal model. As a result, we defined the optimal two-step models: Hyp-VIs + B, S2-VIs + B and L8-VIs + B. Finally, for Hyp, S2 and L8, we performed a F test to compare the one-step optimal model with the correspondent two-step optimal model. A low p value for this F test indicates that the two-step model is significantly better and means that bands, in addition to vegetation indices, contribute for an improved modelling of GPP.

Conditions during the experimental period
During the period of measurements, from 31 March to 3 June 2016, the average daily PAR and temperature increased progressively, ranging from 630 to 1000 µmol m −2 s −1 and from 9.6 to 17 • C, respectively (Fig. 1a). Soil water content (SWC) (Fig. 1b) showed fluctuations according to rainfall events, ranging from 0.05 to 0.2 m 3 m −3 . During the experimental period, rainfall was concentrated in the first half of April and at the beginning of May. Along the experimental period rainfall recorded was 195 mm, corresponding to the 33 % of the whole year.

The effect of fertilization on plant area index and functional groups proportion
From the beginning to the end of the study period, PAI increased on average 4-fold from 1 to 4 (Fig. 2a). In all treatments, the increase in PAI was completed by 20 May and no further increase was observed in the last measurement (3 June). On the contrary, at the beginning of the experiment, PAIgr (Fig. 2b) showed an increasing tendency similar to PAI, but from 20 May onwards, the tendency changed and a decrease was observed, corresponding to the onset of grassland senescence. The fertilization treatments influenced both the PAI (P < 0.000) and the PAIgr (P < 0.000), being both significantly higher for treatments NPK and P than for treatment C (P < 0.001). No differences were observed between C and N treatments (P > 0.05). In both PAI and PAIgr the treatment P showed similar values to NPK, with the exception of the first measurement day (1 April). The grassland communities fertilized with NPK had a higher and earlier leaf area growth when compared to the other treatments.
Plant species composition has been measured every year since 2012 (pre-treatment) under an ongoing long-term nutrient addition experiment on this grassland site. In line with results from that study (Carla Nogueira et al., personal communication, 2017), the fertilization treatments influenced the functional composition of grasslands (Table 3). In the NPK treatment the percentage of graminoids was higher than in any of the other treatments. P treatment showed a higher percentage of legumes and in the C and N treatments forbs were the dominant functional group.

The effect of fertilization on GPP
The ability of grasslands to sequester atmospheric carbon dioxide was not affected by fertilization treatments. The   . 3a) and the GPP (Fig. 3c) did not reveal any statistically significant difference among treatments (P > 0.05). On the contrary, the rate of respiration (Fig. 3b, R) was affected by the fertilization treatment (P < 0.05), being on average higher for treatments NPK and P than C. CO 2 gas exchanges were influenced by the grassland life cycle and marked trends were observed along the measurement period. NEP showed an average drop of 74 %, shifting from 14.47 to 3.67 µmol m −2 s −1 , from 1 April (day 1) to 3 June (day 4) (P < 0.000). This decrease in NEP rate was particularly evident from the second to the third measurement day, after the annual peak of grassland growth was achieved (Fig. 3a). R also showed differences along the experimental period (P < 0.000) but the observed trend was different. R increased from the first to the second measurement day, from 8.22 to 13.65 µmol m −2 s −1 and then decreased toward the end of the experiment (Fig. 3b). GPP also changed significantly along the studied period (P < 0.000), q q q q q q q q q q q q q q q q decreasing from 25.72 µmol m −2 s −1 on 25 April (day 2) to 12.12 µmol m −2 s −1 on 3 June (day 4). A linear relationship was observed between GPP and fPARgr (Fig. 4); however the slope of the regression line changed along the experimental period and marked differences were observed between the vegetation growth (day 1 and 2) and the senescence phase (day 3 and 4) (Fig. 4).

Vegetation reflectance
The reflectance of vegetation (Ref) varied on average between 0 and 0.4 (Fig. 5a). The cluster analysis created 25 bands (Fig. 5b) based on the Ref similarity of contiguous wavelengths (r > 90 %). Bands were narrower in the visible region (350 to 750 nm) than in the NIR (750 to 1350 nm) and in the SWIR (1350 to 2300 nm) region. In particular, in the red-edge region, between 698 and 732 nm six different bands were identified, corresponding to a steep increase in reflectance observed in this region of the spectra. Table 4. Linear regressions established between lnGPP and vegetation indices (VI) selected for this study (see Table 2). Best regression is shown in bold.

Vegetation indices
When adopting wave bands obtained by cluster analysis (Fig. 5b) several vegetation indices were calculated ( Table 2).
The average values of the indices GNDVI, NDWI, PSRI and MTCI are shown in Fig. 5. Other indices are omitted from the figure for showing very similar trends to the ones represented (NDVI, NDVIre, WBI, CI) or not being significantly correlated with the response variable (PRI). All of them showed larger changes during the study period, particularly after 25 April (day 2), when the annual peak of growth was achieved. The GNDVI (Fig. 6a) showed small changes among treatments and dates, with a significant drop of 20 % observed from 1 April to 3 June (P < 0.000), and significant differences in the NPK and P treatments (P < 0.001) compared to C but differences were not evident anymore on 3 June (day 4). The MTCI (Fig. 6b) showed a large drop that was particularly evident after 25 April (day 2). At 1 April (day 1), the effect of fertilization was evident in treatments NPK and P compared to C (P < 0.001); however along the experimental period differences among treatments diminished and by 3 June (day 4) no differences among treatments were observed. The NDWI (Fig. 6c) showed a similar temporal trend with a marked decrease from 1 April to 3 June (P < 0.001). Also, for MTCI, the NPK and P treatments always showed higher values than C (P < 0.001), suggesting a positive impact of the higher nutrient availability on tissue water content. The PSRI had an opposite trend, showing on average a 3-fold increase from 1 April to 3 June (P < 0.000) and a tendency to lower values in fertilized treatments compared to C (P < 0.001 for NPK and P and P < 0.01 for N).
Significant regressions were established between GPP and all the vegetation indices considered (Table 4) with the exception of PRI. The NDWI was the index that explained the higher proportion of variability of GPP.  Figure 4. Relationship between GPP and fPARgr observed in grassland subjected to different fertilization treatment (C, N, NPK and P). Measurements taken during the vegetation growth are indicated as circles (corresponding to measurement days 1 and 2), while those realized during the senescence period are indicated as triangles (measurement days 3 and 4). Linear regression lines were fitted separately to the two periods (GPP = 14.48 fPARgr + 18.44, R 2 = 0.39, P < 0.001 and GPP = 30.08 fPARgr + 6.39, R 2 = 0.65, P < 0.001, respectively for the growth and the senescence period).

GPP estimates by multiple linear regression models
The LEAPS procedure selected VIs or bands as predictor variables that were retrieved from hyperspectral data (Hyp) or simulating Sentinel-2 MSI (S2) and Landsat 8 OLI (L8) sensors.
The Hyp and S2 models adopting only VIs as predictor variables (Hyp-VIs and S2-VIs) performed similarly, with a considerable overlap of adjusted R 2 (Table 5). On the contrary, the L8-VIs model showed a lower performance (lower adjusted R 2 ) than Hyp-VIs and S2-VI models (Table 5). Bootstrap results allowed us to conclude, at a confidence level of 90 %, that Hyp-VIs have higher adjusted R 2 than L8-VIs.
The selection of VIs in the Hyp-VIs and S2-VIs models exhibited a similar spectral pattern. Both models included PSRI and GNDVI. On the contrary, NDVI, the most frequently adopted index as green leaf area proxy was not included in the Hyp-VIs model but only in the S2-VIs. Two of the indices included in the Hyp model are related to water balance (WBI) and water tissue content (NDWI). The S2 model also includes MTCI, which represents chlorophyll a and N.
Models including only bands (-B) showed similar performances to respective models employing vegetation indices (-VIs). Only in the case of L8, where just one vegetation index (NDVI) was available, bands (L8-B) led to better modelling of GPP than vegetation indices (L8-VIs). Similar spectral patterns were also observed in the selection of bands for GPP estimate for all sensors (Hyp, S2, L8). A common pattern is the inclusion of bands in the SWIR region that are strongly represented in the Hyp-B (R 1951−2299 , R 1209−1327 , R 1328−1349 ), S2-B (B11) and L8-B (B6 and B7) models. The red-edge region of the spectra was also largely represented in the Hyp-B (R 724−732 , R 706−710 , R 702−705 , R 698−701 , R 716−723 ) and S2-B (B5, B6 and B7), underlining the importance of this region for vegetation reflectance.
The LEAPS two-step procedure allowed us to identify bands with the potential to improve the VI-based models, identifying regions of the spectra that are generally not adopted in vegetation indices. For both S2 and L8 the twostep model (VIs + B) significantly increased (P < 0.010) the Table 5. Best selection of linear models for a GPP estimate according to the general equation: lnGPPP ∼ n j,1 vj , where v is vegetation indices (VIs) or optical bands (B). Bands and vegetation indices are obtained from hyperspectral measurements grouped in clusters with 90 % similarity (Hyp) or resampled for simulating Sentinel-2 MSI (S2) and Landsat 8 OLI (L8) sensors. Formulation of vegetation indices is shown in Table 2. The order of the variables (most important first) reflects their importance in the model. The quantiles 25 % and 75 % of the adj R 2 are obtained from a bootstrap with 10 000 iterations. The two-step models add a selection of bands to the variables (VIs) selected at step 1. A low p value indicates that the step 2 model, including VIs and bands (step 2) is significantly better the corresponding step 1 model.

Model
Step one  performance of the model, while for Hyp the difference between Hyp-VIs and Hyp-VIs+B model, in spite of still being significant, is less marked (P < 0.05). The bootstrap proce-dure indicated that the probability of the Hyp-VIs + B being significantly better is 83 % when compared to S2-VIs+B and 81 % when compared to L8-VIs+B. On the contrary, the S2-VIs+B and L8-VIs+B models do not differ significantly. In all VIs+B models, bands in the SWIR region were included. The second region of the spectra that was more represented in the Hyp-B and Hyp-VIs + B model was the red edge.

The impact of fertilization treatment
The fertilization treatment influenced the growth rate and the composition of the herbaceous plots more than carbon sequestration. In line with a 5-year ongoing study at the same grassland site, the fertilization treatment resulted in differences in proportions of above-ground biomass and functional groups for NPK and P treatments compared to C, while the single addition of N had no effect. An earlier growth response was also observed in the NPK treatment. The higher percentage of graminoid species with on average higher growth rates compared to most forb species (Ansquer et al., 2009;Craine et al., 2001;Westoby et al., 2002) could explain early differences in PAI and PAIgr in this treatment. As leaf area is usually positively related to GPP (e.g. Aires et al., 2008b;Xu and Baldocchi, 2004), it would be expected that higher PAIgr in NPK treatments would induce increased GPP. However, confounding factors such as increased water demand associated with higher growth rates might have downscaled differences between treatments (e.g. Weisser et al., 2017).
The relationship GPP-fPARgr showed some differences along the experiment. The slope of the regression line was considerably lower during the vegetative growth than during the senescence period (Fig. 4), revealing the occurrence of marked changes in the effective LUE along the growth life cycle and confirming previous results (Nestola et al., 2016;Pérez-Priego et al., 2015a).
While the fertilization treatment had no impact on NEP or GPP, a higher R rate was observed at the measurement day 4 (3 June) in the NPK and P treatments. Differences can be ascribed to the higher PAI, and precipitation at the end of May, which must have stimulated soil respiration (Jarvis et al., 2007;Reichstein et al., 2003).

Best VIs for GPP estimation
The LEAPS procedure selected several indices as significant predictor variables for GPP in the Hyp-VIs and S2-VIs model ( Table 5). The vegetation indices selected in the Hyp-VIs and S2-VIs models are known to represent different properties of vegetation, specifically: the green fraction of the leaf area (GNDVI and NDVI), the chlorophyll a and N concentration (MTCI), the ratio carotenoids / chlorophyll (PSRI) and the tissue water content (NDWI, WBI). Each of these traits has a major role in GPP.
Among the vegetation indices selected in the multiple linear models, both the Hyp-VIs and the S2-VIs included PSRI (Merzlyak et al., 1999), which is generally applied to detect the occurrence of vegetation senescence. PSRI is able to capture changes in the carotenoids / chlorophyll ratio occuring during vegetation senescence since chlorophyll declines more rapidly than carotenoids (Merzlyak et al., 1999). In this study, PSRI increased in all treatments after 25 April, when the maximum peak of growth ( Fig. 2b) was achieved and close to the onset of canopy drying out. Another index known to be related to the carotenoids / chlorophyll ratio, the PRI (Filella et al., 2009), showed no correlation with GPP in our study. These results are in contrast with previous studies (Pérez-Priego et al., 2015a). However, a low performance of PRI in representing the carotenoids / chlorophyll ratio has been already observed in semi-arid grasslands (Vicca et al., 2016). In crops, a good agreement between PRI and pigment pools was observed at leaf level (Gitelson et al., 2017a) but not at stand level (Gitelson et al., 2017b). Differences in the last two studies were ascribed to changes in canopy structure (e.g. changes in leaf inclination angle) over the growing season.
The Hyp model also gave evidence of the importance of changes in canopy water content, as both NDWI (Gao, 1996) and WBI (Peñuelas et al., 1997) were included in the model. Considering changes observed in NDWI along the experiment and the good correlation observed between NDWI or WBI and GPP it is reasonable to assume that GPP is largely affected by the progressive senescence of vegetation Vescovo et al., 2012). In a previous study, Vicca et al. (2016) found that NDWI was able to estimate GPP in semi-arid grasslands better than other indices, allowing the effect of drought to be distinguished.
Other indices that are sensitive to changes in chlorophyll a concentration, MTCI (Dash and Curran, 2004) and GNDVI (Gitelson and Merzlyak, 1998) were also included in the model. The fertilization treatment resulted in an increase in MTCI during the first stage of the experiment in the NPK and P treatments, followed by a decrease observed in all treatments as the season progressed toward the end of the annual growth cycle. A similar trend was observed in a study by Pérez-Priego et al. (2015a) in which Mediterranean grasslands were subjected to fertilization with N or NP. The primary role of chlorophyll in photosynthesis is well known and justifies the positive relationship observed between GPP and MTCI. However, in the present study, no differences were observed in GPP among fertilized and non-fertilized treatments, suggesting that the expected increase in photosynthesis due to the increase in chlorophyll and nitrogen was constrained by other environmental and physiological factors.
Notably NDVI, the most frequently applied index in GPP estimates by LUE models (Yuan et al., 2014) was not selected in the Hyp-VIs model and showed a poorer coefficient of determination than other indices (e.g. NDWI). NDVI is expected to reflect changes in green leaf area, being generally linearly related to the fraction of photosynthetically absorbed radiation (Myneni and Williams, 1994). However, previous studies reported a saturation of NDVI and consequent lack of linearity in the regression in highly productive vegetation communities (Gianelle et al., 2009;Vescovo et al., 2012), such as grasslands, and sometimes other indices showed a better performance. For example, in grasslands subjected to water and nutrient stress, the NDVI green index (GNDVI), which adopts a green band instead of the red band of NDVI and is hence more sensitive to chlorophyll a concentration (Gitelson et al., 1996), showed a better performance than NDVI as proxy of leaf area (Cristiano et al., 2010;Gianelle et al., 2009). Also, in this study, the GNDVI explained a larger proportion of GPP variance than NDVI in the Hyp-VIs and S2-VIs models being selected in both and before NDVI in the S2-VIs model.
The indices selected by the LEAPS (i.e. NDVI, GNDVI, NDWI, MTCI, PSRI and WBI) also showed a highly significant relationship with GPP (Table 4) in simple regressions, explaining 63 % to 72 % of the variability observed. The functional convergence (Ollinger, 2011) of different traits participating in the photosynthetic process may have hampered results observed in the regression for each single vegetation index, showing a high degree of correlation for most of them (Table 4). However, the selection of several VIs, representative of different structural and functional traits in the multiple linear models and the lower performance observed in the L8 model, including solely the NDVI index, clearly in-dicate the importance of considering the contribution of different traits with different temporal dynamics to capture GPP temporal changes in models integrating vegetation indices.

Are spectral bands better GPP estimators than VIs?
Our results suggest a marginal improvement in GPP estimates obtained by adopting bands (Hyp-B, S2-B, Table 5) instead of vegetation indices (Hyp-VIs, S2-VIs, Table 5). However, a large impact was observed in the case of the L8-VIs+B model when compared with the L8-VIs model which included only NDVI (Table 5).
These results confirm previous studies comparing the explanatory power of VIs and bands in grasslands Matthes et al., 2015), also evidencing the importance of the selection of the proper spectral range and suggesting that the selection of the proper band is equally important to the mathematical formulation of vegetation indices for the explanatory power of spectral retrievals. In our study, the adopted approach assured a high correlation among wavelengths within a band maximizing their representativeness.
However, it cannot be disregarded that VIs offer, in comparison with spectral bands, the advantage of being more robust in representing vegetation features, since differences resulting from background reflectance, sun-sensor geometry and atmospheric effects are mitigated by normalization of spectral values (Glenn et al., 2008).
Our results also evidenced the importance of the SWIR region of the spectra, as bands in this region were selected in all one-and two-step models, which is rarely adopted in vegetation indices with few exceptions. The SWIR region is known to correlate with canopy water content (Casas et al., 2014). Studies investigating the potential of spectral bands to estimate canopy chlorophyll content and green fAPAR found that the SWIR region was strongly positively correlated with them in grasslands (Sakowska et al., 2016) and also GPP in a semi-arid savanna (Tagesson et al., 2015).
Bands in the red-edge region were also largely represented in the Hyp-B, S2-B and in the Hyp-VIs + B models. The red edge corresponds to the steep increase in reflectance at the boundary between the red region, where chlorophyll is absorbed, and the leaf scattering at the NIR region. Red-edge bands were successfully employed to estimate chlorophyll content in maize (Zhang and Zhou, 2017) and LAI in crops (Kira et al., 2017). For these reasons they were integrated into numerous VIs, such as MTCI and PSRI, which were also applied in this study. This explains the lack of red-edge bands in the second step of the S2 model (S2-VIs + B) but strong representation in the S2-B.

Satellite sensors as estimators of GPP
Differences in the selection of the vegetation indices among sensors had apparently no effect on the performance of the S2-VIs and Hyp-VIs models (Table 5), while the limited number of available vegetation indices for the L8 resulted in a lower performance of the model. Our results show that S2 and L8 spectral signatures are equally suitable for assessing GPP.
GPP estimates obtained by simulating S2 and L8 sensors showed similar performances in the -B and -VIs + B models, while when only VIs were adopted, the S2 model had clearly a better performance than L8. These results suggest a need for testing new vegetation indices adopting L8 bands. In agreement with our results, other studies comparing linear additive models showed a similar ability in estimating canopy cover and LAI adopting the S2 or L8 sensors (Korhonen et al., 2017).
An important difference between the S2 and L8 availability of wavebands is the lack of reflectance values in the rededge region for L8, which limited the possibility of computing VIs, such as MTCI and PSRI (Korhonen et al., 2017). However, the limitation imposed by the lack of bands in the red-edge region, had apparently more importance for the -VIs model, while differences in the performance of the model between S2 and L8 decreased for -B and VIs+B models.
In this study, the S2 and L8 data comparison was based only on the simulation of the respective bands and did not take into consideration other factors that possibly affected the spectral response of sensors such as sun-sensor viewing geometry (Tagesson et al., 2015). Nonetheless, in a recent study (Korhonen et al., 2017), the comparisons of satellite data from the two platforms showed no differences between S2 and L8 reflectance values in the NIR, SWIR1 and SWIR 2 bands. In other regions of the spectra, such as the green and blue bands, reflectance values were considerably smaller in the S2 than in the L8 but still proportional, suggesting that comparisons between S2 and L8 simulated bands can largely be representative of the actual differences obtained by the two remote platforms.
Our results confirm the importance of performing hyperspectral measurements. Indeed, inferential bootstrap results show that for the whole population, and with a probability of 80 %, the Hyp-VIs+B model is superior to the corresponding S2 and L8 models. The high resolution and the wide range of wave bands makes hyperspectral sensors unique in identifying regions of the spectra that are of high interest for representing different vegetation properties (Porcar-Castell et al., 2015).

Conclusions
In agreement with previous studies (Pérez-Priego et al., 2015a;Rossini et al., 2012;Vicca et al., 2016), our results clearly indicate the need to integrate spectral information into GPP models, representing both structural and functional traits of vegetation along the whole grasslands life cy-cle. Specifically, water content (NDWI), chlorophyll (MTCI, GNDVI) and the ratio of chlorophyll to carotenoids (PSRI) were indicated as best predictor variables for GPP estimates. Altogether these vegetation indices describe the loss of photosynthetic pigments and efficiency and drying out of vegetation, and when considered together they considerably improved GPP estimates in comparison with models that only adopt NDVI.
Our study also confirms the importance of hyperspectral in situ measurements for an exploratory analysis of the relationship between biophysical and optical properties of vegetation, providing a wide spectral range and high resolution of spectral retrievals.
The hyperspectral reflectance values, together with the two-step procedure adopted for the selection of predictor variables, also gave evidence to critical regions of the spectra, which were not included in the initial selection of vegetation indices but revealed their usefulness in estimating GPP. For example, the LEAPS two-step procedure evidenced which bands could significantly improve a -VIs model (step 1), identifying the red edge and SWIR regions of the spectra to be of major importance for improving GPP estimates. This information can be critical in the development of new spectral indices and sensors.
Our results also evidenced the potential of S2 and L8 sensor in assessing GPP, since models obtained by simulating bands from the two sensors showed similar performances. The possibility of using remote-sensing information for monitoring and modelling vegetation at a suitable spatial resolution, such as in the S2 and L8 sensors, allows for attempted vegetation monitoring and modelling in a cost-effective way, in support of sustainable agriculture management practices.
Author contributions. SC designed the experiment and performed field measurements together with JF. MC developed the variable selection code. CN and MdCC set up the experimental plots. SC prepared the manuscript with contributions from all the authors.
Competing interests. The authors declare that they have no conflict of interest.