Evaluating the potential of large-scale simulations to predict carbon fluxes of terrestrial ecosystems over a European Eddy Covariance network

This paper reports a comparison between largescale simulations of three different land surface models (LSMs), ORCHIDEE, ISBA-A-gs and CTESSEL, forced with the same meteorological data, and compared with the carbon fluxes measured at 32 eddy covariance (EC) flux tower sites in Europe. The results show that the three simulations have the best performance for forest sites and the poorest performance for cropland and grassland sites. In addition, the three simulations have difficulties capturing the seasonality of Mediterranean and sub-tropical biomes, characterized by dry summers. This reduced simulation performance is also reflected in deficiencies in diagnosed light-use efficiency (LUE) and vapour pressure deficit (VPD) dependencies compared to observations. Shortcomings in the forcing data may also play a role. These results indicate that more research is needed on the LUE and VPD functions for Mediterranean and sub-tropical biomes. Finally, this study highlights the importance of correctly representing phenology (i.e. leaf area evolution) and management (i.e. rotation–irrigation for cropland, and grazing–harvesting for grassland) to simulate the carbon dynamics of European ecosystems and the importance of ecosystem-level observations in model development and validation.


Introduction
Terrestrial ecosystems currently mitigate climate warming by sequestering in plants and soils a significant portion of anthropogenic carbon dioxide (CO 2 ) emissions, which are considered to be primarily responsible for the increase in global surface air temperature since the mid-20th century (IPCC, 2007). In particular, European terrestrial ecosystems have been reported to be a significant sink of CO 2 (Luyssaert et al., 2012). The mechanisms that drive net carbon uptake in Europe partially depend on changes in environmental conditions that occurred during recent years . In addition, the net carbon uptake is also affected by re-growth of young forests (Bellassen et al., 2011), land management practices Kutsch et al., 2010), nitrogen deposition (Churkina et al., 2010), and response to extreme climate events .
Land surface models (LSMs) were developed in the last 20 years with the aim to improve the simulation of surface energy fluxes (e.g. latent heat, sensible heat, and net radiation) and biochemical fluxes (CO 2 ) at a global scale. The LSMs now include representations of ecological and hydrological processes, in relation to the vegetation biomass and to the terrestrial water cycle. Examples of land surface or dynamic  (Bonan et al., 2011), CTESSEL (Boussetta et al., 2013), ISBA-A-gs (Calvet et al., 1998), LPJ , or ORCHIDEE . However, the differences in the model schemes and in their assumptions have led to considerable discrepancies between the different LSMs in both present climate simulations (Jung et al., 2007;Schwalm et al., 2010;Weber et al., 2009) and future climate scenarios (Friedlingstein et al., 2006). In this context, an extensive comparison between LSM outputs and in situ flux measurements can help highlighting the processes and parameters that trigger the main discrepancies (Friedlingstein et al., 2006;Gregory et al., 2009).
Most LSMs use a plant functional types (PFTs) classification to map the model parameters at a global scale. Thanks to the FLUXNET worldwide network of eddy covariance (EC) measurements (Aubinet et al., 2012), continuous multiannual EC data are now available for a wide range of ecosystems. The observational information has a highly relevant role in understanding the land-atmosphere CO 2 processes (Baldocchi, 2008) and in evaluating/validating carbon LSMs (Friend et al., 2007) at regional and global scale. For these reasons, EC data represent an essential resource to describe with greater confidence how ecosystem carbon fluxes (net ecosystem CO 2 exchange -NEE, gross primary production -GPP and ecosystem respiration -Reco) act in response to past and present climate changes in Europe (Gilmanov et al., 2007;Ciais et al., 2010). EC measurements have been extensively used for the development of LSMs. They are typically used for calibration/optimization and for the evaluation of LSM performance (Kuppel et al., 2012;Boussetta et al., 2013), or at global scale using data-oriented models . In addition, all information collected at EC flux towers, such as biomass and soil carbon content, can be helpful for benchmarking analysis. However, most of these studies focus on the parameterization of specific ecological response functions over few eddy covariance sites, mainly at forested locations, or on the evaluation of a sole model at regional or global scale.
In this work, three generic LSMs namely ORCHIDEE, ISBA-A-gs and C-TESSEL, forced by ERA-Interim surface atmospheric variables, are compared using European EC sites as reference. This experimental set-up aims at reproducing a realistic scenario for the operational Copernicus atmosphere monitoring service (http://atmosphere.copernicus.eu/) which will link state-of-the-art numerical weather prediction (NWP) products with state-of-the-art LSMs for the monitoring of terrestrial carbon exchanges at the global scale. Our aim is to assess the capability of basic model simulations to describe the carbon dynamics for a variety of ecosystems and to identify potential ways to improve these simulations. More specifically we test how well the simulations: (i) perform to describe the daily, seasonal and interannual variability of carbon fluxes; (ii) describe carbon dynamics for different ecosystems (forest, grassland and cropland) and climates; and (iii) describe the main ecological functions controlling carbon fluxes.

Material and methods
Carbon fluxes from the three LSMs (ORCHIDEE, ISBA-Ags and CTESSEL) are compared with the fluxes measured at 32 EC flux tower sites (Table 1), which represent the main European ecosystems. Table 2 reports the main characteristics of the three analyzed LSMs. All models were forced using the same meteorological data set, ECMWF ERA-I (see Sect. 2.2).

The ISBA-A-gs model
The ISBA-A-gs LSM (Calvet et al., 1998(Calvet et al., , 2004Gibelin et al., 2006;Calvet et al. 2008) is a CO 2 -responsive version of the ISBA (interactions between soil, biosphere, and atmosphere) (Noilhan and Planton, 1989;Noilhan and Mahfouf, 1996) model. It is part of the SURFEX (SURFace EXternalisée) platform developed by Météo-France to be used in operational NWP and hydrological models (Masson et al., 2013). Photosynthesis and respiration are simulated following the A-gs scheme (Calvet et al., 1998;Calvet, 2000). According to the model classification framework set out in Arora (2002), the photosynthesis model within ISBA-A-gs is based on a soil-vegetation-atmosphere transfer (SVAT) biochemical approach. The representation of photosynthesis is based on the model of Goudriaan et al. (1985) modified by Jacobs (1994) and Jacobs et al. (1996). This parameterization has the same formulation for C4 plants as for C3 plants, differing only by the input parameters. Moreover, the slope of the response curve of the light-saturated net rate of CO 2 assimilation to the internal CO 2 concentration is represented by the mesophyll conductance (g m ). Therefore, the value of the g m parameter is related to the activity of the Rubisco enzyme (Jacobs et al., 1996), while in the Farquhar model this quantity is represented by a maximum carboxylation rate parameter (V C,max ). The model also includes an original representation of the soil moisture stress; two different types of the plant response to drought are distinguished, for both herbaceous vegetation (Calvet, 2000) and forests (Calvet et al., 2004). Broadleaf forests, grasslands, and C4 crops (coniferous forests and C3 crops) follow a drought-tolerant (droughtavoiding) response to soil moisture stress. In order to obtain the CO 2 balance at the ecosystem scale, the A-gs model is coupled to an ecosystem respiration module with a dependency on soil moisture and surface temperature. Ecosystem respiration is described as a basal rate modulated as a function of soil moisture and temperature (Albergel et al., 2010). For most prognostic variables the time step of the model is 15 min with LAI and biomass calculated on a daily basis. The SURFEX version 6.2 is used with the "NIT" option of ISBA-A-gs, as in Szczypta et al. (2012). The simulations are performed several times per grid cell in order to simulate the various PFTs. The simulations corresponding to the tower site PFT are used.

The CTESSEL model
Carbon-TESSEL (CTESSEL) is the latest version of Hydrology-Tiled ECMWF scheme for Surface Exchange over Land model (H-TESSEL) Balsamo et al., 2011Balsamo et al., , 2009van den Hurk et al., 2000;Viterbo and Betts, 1999;Viterbo and Beljaars, 1995). It contains a carbon module that simulates photosyn-thesis and respiratory processes at the surface (Boussetta et al., 2013). Photosynthesis and canopy conductance dynamics follow the same scheme as ISBA-A-gs (Calvet et al., 1998;Calvet, 2000). Ecosystem respiration is described by empirical functions of vegetation type, soil moisture, soil temperature and snow depth (Boussetta et al., 2013;Normann et al., 1992). This carbon module does not have a prognostic land surface carbon pool as generally included in land ecosystem exchange models and all variables are simulated on a 30 min time step. Two vegetation types are selected per grid box as part of the tiling approach, namely, one for high vegetation and one for low vegetation. Spatial distribution of vegetation is prescribed by the Global Land Cover Characterization database (GLCC, Loveland et al. 2000) and phenology www.biogeosciences.net/11/2661/2014/ Biogeosciences, 11, 2661-2678, 2014  Parton et al. (1987) for soil respiration); Autotrophic respiration (Ruimy et al. 1996) No Krinner et al. (2005) follows a MODIS LAI (collection 5) derived climatology on a 8 day basis  rather than prognostic LAI estimates. As reported by Boussetta et al. (2013), parameters for a particular PFT were optimized by grouping a number of EC sites with the same PFT for the year 2006. These parameters are the unstressed mesophyll conductance (g * m ) and the reference respiration (R 0 ). Others parameters were taken from the literature (White et al., 2000;Calvet et al., 2000Calvet et al., , 2004. The EC sites used in the parameters optimization were not considered in the validation and comparison of CTESSEL with others models.

The ORCHIDEE model
ORCHIDEE is a dynamic global vegetation model which can be run either coupled to global or regional atmospheric circulation models, or forced by meteorological fields. OR-CHIDEE consists of three linked sub-modules ; the "SECHIBA" module (Ducoudré et al., 1993) is a land surface energy and water balance model with a 30 minute time step. The phenology (Botta et al., 2000) and carbon fluxes of terrestrial ecosystems are modelled in the "STOMATE" module (Viovy, 1997) that simulates the processes of photosynthesis, carbon allocation, litter decomposition, soil carbon dynamics, maintenance and growth respiration, and phenology on a daily time step, and long-term processes (on yearly time step) of vegetation dynamics including sampling establishment, light competition, and tree mortality are adapted according to LPJ . Photosynthesis at canopy level, and the instantaneous energy and water balance of vegetated and non-vegetated surfaces, are simulated by coupling leaf-level photosynthesis and stomatal conductance processes based on Ball et al. (1987) and Farquhar et al. (1980). Stomatal conductance is reduced by soil water stress (McMurtrie et al., 1990), as a function of soil moisture and root profiles. This reduction is done indirectly by reducing maximum Rubisco carboxylation rate (V C,max ) which then reduces stomatal conductance (g s ). Two soil water reservoirs are considered: a surface reservoir which refills in response to rain events and which is brought to zero during dry periods, and a deeper soil reservoir updated according to evaporation, root uptake, percolation and runoff. ORCHIDEE uses a tiled approach allowing the simulation of different PFTs within a grid cell, and the tiles of a grid cell share the same soil water reservoir. Here, we used the Olson classification to derive spatial distribution of vegetation, distinguishing 13 different PFTs (Vérant et al., 2004). Only the PFT that seems the nearest from the class attribute of the FLUXNET site is considered. For cropland we considered winter wheat for C3 and maize for C4; irrigation and harvesting were not activated.

Atmospheric forcing
All the simulations reported in this paper were forced with 3-hourly meteorological data extracted from the ECMWF-Interim (ERA-I) reanalysis (Dee et al., 2011) which covers the period from 1979 to present. The ERA-I data is available on a reduced Gaussian grid (N128) corresponding to a resolution of about 80 km. The ERA-I grid point nearest to the tower location has been selected for the forcing. The temperature, surface pressure, humidity and wind fields are instantaneous values representative of the lowest level of the atmospheric model corresponding to a height of 10 m above the surface. The incoming short-wave and long-wave radiation components, rainfall, and snowfall are provided on a 3-hourly basis. The instantaneous fields are linearly interpolated in time to match the time step of the LSM. Systematic increments in the meteorological data assimilation system cause a slight imbalance in the ECMWF atmospheric model and the first 9 h of the forecasted atmospheric variables cannot be used. Therefore, forecast intervals 9-12, 12-15, 15-18 and 18-21 h starting from the daily analyses at 00:00 and 12:00 UTC are used in this study. Twice daily forecast of fluxes and instantaneous fields are matched by verification time and concatenated, which results in an uninterrupted time series with a full diurnal cycle and a time resolution of 3 h. Precipitation is kept constant over the 3hourly interval, long-wave downward radiation is linearly interpolated and downward solar radiation is disaggregated in time making use of the solar angle, but conserving the 3hourly integral. Land surface variables like soil temperature, soil moisture, and snow depth are slow variables and are not taken from ERA-I but are the result of the time integration of the land surface scheme. The potential effect of the initial condition is eliminated by performing long cyclic runs to Biogeosciences, 11, 2661-2678, 2014 www.biogeosciences.net/11/2661/2014/ achieve equilibrium (i.e. to get a proper equilibrium between, e.g., soil moisture and the soil characteristics of the particular model).
The use of ERA-I data is subject to limitations that are largely documented for re-analyses (Simmons et al. 2009, Dee et al. 2011), but in extra-tropical continental areas precipitation and radiation forcing show good skill in reproducing observed climate (Szczypta et al., 2011;Balsamo et al., 2010). A comparison between in situ and ERA-I forcing variables and in situ meteorology data confirmed that the ERA-I global data set is a good proxy of local climatic conditions. It is well known that carbon models are sensitive to biases in climate forcing data (see e.g. Zhao et al., 2006). In our case, ERA-I precipitation, which is a major component of the forcing, turns out to be very competitive compared to, for example, GPCP products for mid-latitudes with sometimes a small benefit from bias correction Szczypta et al., 2011). The realism of our global forcing can also be seen in the derived products from offline land surface simulations in verification of soil moisture, snow, and runoff . The error in meteorological variables caused by the difference spatial resolution of the in situ data simulated fluxes from the LSMs is strictly related to spatial resolution of the model grid. Zhao et al. (2012, their Appendix B) studied the differences between a virtual 80 km × 80 km (ERA-I grid) flux with homogeneous vegetation type and a 1 km × 1 km (scaled SAFRAN grid) flux for the same vegetation type. They recognized that the uncertainty in gridded meteorological variables at fine spatial resolution is smaller than that caused by coarser resolutions (i.e. ERA-I). However they showed that for ORCHIDEE the model behaved similarly at the two grid scales.
In this study, ISBA-A-gs and ORCHIDEE provided simulations consistent with the local vegetation type by prescribing the PFT type to match local vegetation types. Regarding CTESSEL, the model vegetation type selection for a particular tower location was according to the model climate data set rather than to the vegetation type of the tower location. If the tower location is representative for a large area, it is likely to be the same, but this is not always the case. The evaluation of CTESSEL was also made with actual tower vegetation type (CTESSEL_obs). The overall error in simulating NEE, GPP and Reco coming from a satellite derived global map and the actual tower vegetation type is comparable (see Table S1 in the Supplement).

FLUXNET sites description and data elaboration
The eddy covariance technique allows measuring directly CO 2 , latent heat (LE) and sensible heat (H ) fluxes between the ecosystem and the atmosphere relative to an area (the footprint) of hundreds of metres around the EC tower depending on the tower and vegetation heights. The EC data are collected at high frequency (10 Hz) and converted to fluxes over 30 min or 1 h integration periods using standard method-ologies (Aubinet et al., 2012). Data gaps due to sensors malfunctioning or less than ideal turbulence conditions (Papale et al., 2006) are filled using the MDS method described in Reichstein et al. (2005). The two main components of the net carbon exchange (GPP and Reco) are estimated using the flux-partitioning technique based on the extrapolation of night-time flux observations with temperature dependent relations . The methodology is the one proposed and implemented in the European Fluxes Database (http://www.europe-fluxdata.eu) and used also in the context of the FLUXNET synthesis activities.
The geographical distribution of the EC sites covers the main PFTs as defined by the International Geosphere-Biosphere Programme (IGBP) existing in Europe. The selected sites include 8 croplands (CRO), 4 broadleaf deciduous forests (DBF), 2 evergreen broadleaf forests (EBF), 5 needleleaf evergreen forests (ENF) and 13 semi-natural and managed grassland locations (GRA) ( Table 1). Details about the flux EC sites, the years of available data and their characteristics as well-corresponding PFT and Köppen-Geiger climate class are provided in the Table 1.

Performance of the large-scale simulations
The verification protocol proposed in this study aims at assessing uncertainty within global monitoring systems. The global monitoring system requires global gridded climate forcing for implementing large-scale simulations and yet only in situ local data are available for model benchmarking and to evaluate model bias at local scales. Due to this wellknown scaling mismatch between forcing and benchmarks, possible misrepresentations must be considered from (1) local physiography (related to the field site, its aspects, and exposition), (2) local meteorological effects (e.g. local turbulence, presence of breeze, isolated precipitation), (3) the full complexity of the bio-geophysical processes controlling surface fluxes, and (4) the difference in spatial resolution between in situ and gridded climatological data. Therefore, it is important to realize that the evaluation reported here does not only include model errors, but also errors in the forcing and the potential mismatch between vegetation type in the footprint of the EC tower and the model vegetation type for that location. Consequently, it is an evaluation of an integrated system rather than the land-surface model only.
The analysis is based on 32 sites (Table 1), for a total of 164 site-years, selected on the basis of the data availability and to cover different PFTs and climate conditions. Only the sites containing at least one year of carbon flux data of good quality and only daily data with a percentage of gap-filled half hours less than 15 % were used in this analysis.  The simulation quality in predicting daily EC fluxes was performed using the following statistical parameters: correlation coefficient (CORR), efficiency (E) (Weglarczyk, 1998), root mean square error (RMSE) and bias: where O i and P i represent daily averaged measured and simulated fluxes, respectively.Ō andP denote their means. The E value can range from −∞ to 1 and a value close to 1 indicates a perfect match between simulated and observed data. A negative value occurs when the observed mean is a better predictor than the simulation. Positive values of the bias indicate an overestimation by the simulation and negative values indicate an underestimation.
In order to analyze the contribution of climate and PFT properties to the performance of the different simulations, we carried out a statistical analysis of the daily simulation outputs and EC measurements by grouping the EC sites by PFTs and Köppen-Geiger climate classes (Peel et al., 2007).
In addition, the seasonal and interannual variability were assessed using monthly means and annual cumulative flux anomalies, respectively. The flux anomalies were calculated the sites with at least 9 years of flux data as where NEEyr is the cumulative NEE for year (yr) and avg(NEE) is the average annual cumulative NEE for all the available years. For example a positive anomaly value for the year (yr) indicates that the carbon uptake is lower for this year following the convention that negative NEE corresponds to a carbon uptake by the ecosystem. Moreover, we analyzed the simulation capability to reproduce environmental response curves by comparing the observed and modeled GPP response to short-wave incoming radiation (R g ), to vapor pressure deficit (VPD) and the Reco response to air temperature (T a ) for forest sites.  Table 3 reports the scores of the three simulations (ISBA-A-gs, C-TESSEL and ORCHIDEE) for daily carbon fluxes (NEE, GPP and Reco), pooled by PFT. The distribution of the scores on a site × year basis is represented by box plots in Fig. 1. Overall, the best CORR values are obtained for the GPP of evergreen needleleaf forests (ENF, Table 3 and Fig. 1d), for the three simulations. The median ENF CORR values of the three simulations are comparable and the interquartile range is small. This means that the CORR values are closely distributed around the median values and that ENF forests sites show very similar annual CORR values.

Simulation performance by PFT
Also, the GPP of deciduous broadleaf forests (DBF) is represented well by the three simulations (Table 3). However, CTESSEL and ISBA-A-gs show a wide variation of the annual CORR values (Fig. 1b) and the outlier CORR values are much lower than the outliers for ENF forests.
For GPP, Table 3 shows that ORCHIDEE presents the best CORR score for ENF and DBF; CTESSEL presents the best CORR score for the other PFTs (EBF, GRA, CRO), while ISBA-A-gs shows the poorest performances.
For NEE, all simulations perform best at ENF sites (Fig. 1d) and the variability of the annual CORR is limited. Only ORCHIDEE always presents good NEE CORR values for DBF, in conjunction to relatively low RMSE values. For the other PFTs, the low NEE CORR values and the large interquartile ranges indicate that all the simulations have limited ability in predicting NEE.
For Reco, Table 3 indicates that the best CORR values are obtained by CTESSEL and ORCHIDEE over ENF sites. Figure 1d shows that the interquartile range is small for the three simulations, as for GPP. CTESSEL presents the best CORR values for ENF and CRO sites; ORCHIDEE for the EBF and GRA sites. Table 3 reports the number of sites for which the model efficiency (E) in simulating the fluxes is higher than 0.5. It can be assumed that such values of E indicate an acceptable level of performance since it means that the simulated fluxes explain more than 50 % of the variability of the observed data. While this never happens for EBF and CRO sites, the ENF sites present a rather large proportion of fair GPP (and to some extent Reco) simulations. For NEE this happens only once, for an ENF site CTESSEL simulation.

NEE simulation performance at the site level
Box plots in Fig. 2 show the distribution of the NEE annual CORR values for EC flux sites grouped by PFT for Cfb (temperate without dry season and with warm summer) Köppen's climate class. On average, ORCHIDEE shows the largest correlation values for all temperate DBF and ENF sites with a reduced variability across years (Fig. 2b, c). OR-CHIDEE also shows good correlation for croplands (Fig. 2a), except for the BE-Lon site. For grasslands, ISBA-A-gs and CTESSEL simulations tend to present better scores than OR-CHIDEE simulations (Fig. 2d, e). Across the temperate climate zone, the variability of CORR for grasslands and croplands can be related to the specific year-to-year site management which is not accounted for by the models: single crop sites (e.g. DK-Ris and IE-Ca1) or sites affected by crop rotation (e.g. BE-Lon, DE-Geb), and grazing (FR-Lq1, FR-Lq1) and harvesting (DE-Gri) for grasslands. Box plots in Fig. 3 show the performances in simulating NEE of CRO, EBF and GRA sites located in Csa (temperate with dry and hot summer) Köppen's climate class. The two CRO simulations present poor CORR scores, always negative for the ORCHIDEE and ISBA-A-gs simulations. For EBF and GRA sites, all simulations show the same scores for NEE.   Figure 4 reports the mean seasonal variation of carbon fluxes as derived from the three simulations and from EC flux towers measurements for forest PFT classes. This analysis was conducted considering only the forest sites that represent natural ecosystems without any notable anthropogenic impact during the considered period, grouped according to  Figure 5 shows the anomalies of NEE (Eq. 5) derived from the three simulations and the anomalies measured by EC flux towers for four EC sites: DE-Tha, DK-Sor and FR-Pue. ISBA-A-gs simulations show very similar pattern of interannual variability of NEE in agreement with EC data while the CTESSEL simulations show much more discrepancies than observed fluxes. All simulations represent NEE well in 2003 for the FR-Pue Mediterranean forest site. Moreover, ISBA-A-gs and CTESSEL are able to simulate NEE well for the other two sites in 2003.

Environmental controls of fluxes
During summer drought periods, GPP can be limited by a combination of reduced soil water content availability, high leaf temperature and vapor pressure deficit (VPD) stress, which impact stomatal conductance, photosynthetic activity and ecosystem respiration. Figure 6 shows an example of the seasonal GPP trend of two Cfb-DBF sites (DK-Sor and FR-Fon) and a Csa-EBF site (FR-Pue). Overall, OR-CHIDEE overestimates GPP for all sites. ISBA-A-gs and CTESSEL underestimate GPP for the two Cfb-DBF sites. For the Csa-EBF site, CTESSEL follow the daily GPP dynamics observed by EC while ISBA-A-gs overestimate GPP.
Plots in Fig. 7 show the GPP response curves to R g (Fig. 7a) and VPD (Fig. 7b) and the Reco response to T a (Fig. 7c) for the sites DK-Sor, FR-Fon and FR-Pue for July 2006. All simulations present some limitation in describing ecological functions controlling GPP for all sites (Fig. 7a, b). However, ORCHIDEE describes better GPP-VPD function for DK-Sor and FR-Fon. On the other hand, ORCHIDEE tends to overestimate VPD slopes for the FR-Pue EBF Mediterranean site where CTESSEL and ISBA-Ags show better results. As reported for the GPP-VPD function, ORCHIDEE tends to simulate better GPP-R g slopes for DK-Sor and FR-Fon sites, CTESSEL and ISBA-A-gs show better results for FR-Pue Mediterranean site. CTESSEL and ISBA-A-gs capture Reco-T a slope for FR-Fon site and OR-CHIDEE describe better Reco-T a slope for DK-Sor (Fig. 7c). All simulations present limitations in describing the Reco-T a of FR-Pue Mediterranean site showing higher Reco values and lower T a values for this period.

Comparison of in situ and ERA-I climatological forcing
In order to analyze the mismatch between in situ and ERA-I forcing variables, Fig. 8 shows the comparison for global radiation (R g , Fig. 8a), vapor pressure deficit (VPD, Fig. 8b), and air temperature (T a , Fig. 8c) for the sites DK-Sor, FR-Fon, and FR-Pue for 2006. It is clear from Fig. 8 that ERA-I shows a good correlation with the local observations, but inevitably there are random and systematic errors. Downward radiation, R g (Fig. 8a), is overestimated in the range between 50 W m −2 and 250 W m −2 and underestimated for values higher than 250 W m −2 . Both the random and systematic errors in ERA-I radiation are dominated by cloud errors. This will also affect the ratio of diffuse and direct radiation, which is not considered in this paper, although it is probably relevant for plant response. As shown in Fig. 8b, ERA-I tends to underestimate VPD for the DK-Sor and FR-Pue sites, which may be related to limitations of the tiling approach. With the tiling approach it is impossible to represent the atmospheric response to the local biome, because the atmosphere is assumed to be well mixed across tiles. In contrast a very good estimation of T a is shown by ERA-I for all sites (Fig. 8c).

GPP simulations
The evaluation of the three LSM simulations shows that they all represent the ENF GPP relatively well. This is particularly true for ORCHIDEE, which presents E values higher than 0.5 for all the ENF sites. The simulations also show www.biogeosciences.net/11/2661/2014/ Biogeosciences, 11, 2661-2678, 2014   good results for DBF sites (Table 3). Overall, CTESSEL and ORCHIDEE show better scores than ISBA-A-gs. OR-CHIDEE tends to overestimate GPP while ISBA-A-gs and CTESSEL tend to underestimate GPP. The over-or underestimation of GPP limits the capability of the simulations to represent both NEE and Reco fluxes. Reco depends to some extent on biomass and is therefore related to GPP. Errors in both GPP and Reco impact the diurnal and seasonal NEE cycles (Schwalm et al., 2010;Schaefer et al., 2012). Therefore, the NEE is particularly difficult to represent in large-scale simulations. A fair simulation of the daily NEE (E = 0.51) is obtained only for the DE-Tha ENF site by CTESSEL (Table 3). This site is characterized by an Oceanic-European climate (Grunwald and Bernhofer, 2007). The difference between CTESSEL and the other simulations can be related to the prior optimization of CTESSEL against EC flux data (Boussetta et al., 2013). Note, however, that the EC sites used in the optimization of CTESSEL are not considered in this study. The high variability of the DBF annual scores and the small or negative values of CORR for NEE and Reco (Fig. 1) can be explained by the climatic variability associated with the sites of this PFT class (Table 1). DBF forests are located in both sub-tropical (Cfa) and Oceanic-European (Cfb) climate regions with and without marked dry summer periods, respectively, and the carbon dynamics and ecosystem ecological responses are controlled by different factors (Migliavacca et al., 2011).

Scale issues
The mismatch between the large-scale atmospheric forcing and the local atmospheric variables impacts the model intercomparison. Another perturbing factor is the possible mismatch between the model PFT and the local characteristics of the vegetation.
The differences in simulating carbon fluxes between ISBA-A-gs and the other two models can be associated with the sensitivity to errors in atmospheric forcing, LAI modelling versus LAI climatology, and the photosynthesis module (Table 2). In ISBA-A-gs a photosynthesis-driven growth model is used to compute leaf biomass and LAI, and all the atmospheric variables influence LAI in ISBA-A-gs simulations. Therefore, errors in any of the atmospheric variables can have a marked impact on LAI (Szczypta et al., 2013). Moreover, in this study, ISBA-A-gs and and ORCHIDEE have a prognostic LAI while CTESSEL uses a satellitederived LAI climatology. Gibelin et al. (2008) have shown that ISBA-A-gs and ORCHIDEE present similar scores at temperate and high latitudes of the Northern Hemisphere, based on LSM simulations driven by locally observed atmospheric variables. Also CTESSEL presents a similar score in simulating fluxes deriving vegetation types from a satellite derived map or actual tower vegetation type (Table S1 in the supplement).

Mediterranean areas
The difficulties in predicting carbon fluxes of the Csa EBF forest sites considered in this study (FR-Pue and PT-Esp, Table 1) can be related to a misrepresentation of phenology (Szczypta et al., 2013). More specifically, uncertainties in the representation of the soil water stress characterizing the Csa climate affect the simulations of GPP and Reco (Migliavacca et al., 2011). Many environmental factors impact the diurnal and seasonal variability of carbon fluxes in water limited biomes (e.g. soil moisture, rain pulses). The poor estimations of NEE at these forest sites is related to the summer drought period with reduced soil water content availability, high air temperature and high VPD stress, which impact stomatal conductance and photosynthetic activity .

Phenology and agricultural practices
The improvements of LAI and phenology are believed to be a high priority in agreement with a previous study by Richardson et al. (2012). For monitoring applications, the model vegetation seasonality could be imposed by near-real time (NRT) remote-sensing products as it would force deciduous forest, grassland and cropland phenology and would help capturing parts of the land management practices (mainly grazing and harvesting). For wider applications, advance in understanding which factors (photoperiod, cold temperatures, and warm temperatures) regulate spring bud burst is needed and  . The grasslands and croplands management is crucial in the definition of carbon sink/source activity for these agro-ecosystems . In addition, cultivated land occupies about 50 % of earth's surface, and nearly 18 % of the cultivated land now receives supplementary water trough irrigation (IPCC, 2007). The NEE seasonality of croplands (Fig. 1a) and grasslands (Fig. 1e) is related to management (e.g. irrigation, fertilization, grazing, manure, or harvesting) as well as C3/C4 climate-driven dynamics. Models used in this study did not consider the agricultural practices. The lack of description of agricultural practices and of the crop dynamics in the models interferes with the soil water stress at the end of the growing period. This likely causes the negative value of correlation (CORR) observed in Fig. 1a and e. Moreover, Mediterranean grasslands are very sensitive to rain pulses in spring and in autumn, after a drought period. Therefore, local-scale model structures that describe ecosystem functionalities observed in situ are a major component of carbon fluxes in this biome. For example, the grassland management consists of either cutting or grazing practices and the carbon balance depends on the type and intensity of management Wohlfahrt et al., 2008). All models show negative correlation values for cropland sites located in the Mediterranean climate area (Csa, Fig. 3). As reported by Kutsch et al., (2010) these sites are irrigated and their phenology is driven by the presence of water in the soil and not by meteorological condition. Irrigation is not accounted for in the models. Moreover, cropland sites are mainly managed following yearly crop rotation schemes. Current models assume a single crop type and management.
Therefore, future efforts should focus on implementing new cropland and grassland management schemes into LSMs. A possible approach is to couple LSMs with existing or new models focused on cropland and grassland management. Alternatively, model development could be consolidated by the introduction of new modules, for example, land management such as rotation, irrigation, grazing and harvesting, which would help simulating the carbon uptake of cultivated areas in Europe. For instance, work is already underway to couple ORCHIDEE with the PASIM model (Vuichard et al., 2007;Ciais et al., 2010), which takes pasture management into account as well as STICS modules (Gervois et al., 2008), related to crops phenology.

Impact of drought on model parameters
The poor simulation performance during drought periods could be linked to an inadequate representation of LUE. Simulated LUE is controlled by the leaf-to-canopy scaling strategy and a small set of model parameters that defines the maximum potential GPP, such as ε max (light-use efficiency), V C,max (unstressed Rubisco catalytic capacity), J max (the maximum electron transport rate), or mesophyll conduc-tance. The temperature, humidity, and drought scaling factors determine temporal variability in simulated GPP, but the LUE parameters determine the magnitude of simulated GPP. To improve simulated GPP, model developers should focus first on improving the leaf-to-canopy scaling and the values of those model parameters that control the LUE. Moreover, understanding the functional relationship between soil-root characteristics and vegetation water uptake remains challenging, particularly to describe globally over broad time and space scales the short-term effects on GPP and Reco due to dry conditions (Migliavacca et al., 2011). Therefore, further efforts should be focused on the understanding of the most appropriate ecological functions able to describe the complexity of the plant eco-physiological responses (e.g. adaptation, mortality, defoliation) in dry conditions (van der Molen et al., 2011).

ERA-I forcing in offline LSM simulations
The comparison of in situ and ERA-I atmospheric forcing (Fig. 8) showed that ERA-I is a good proxy for in situ data although some limitations for R g and VPD are present. Quality and limitations of ERA-I are well documented by several authors, for example, Simmons et al. (2009), Dee et al. (2011), Szczypta et al. (2011 and Balsamo et al. (2010) reported that ERA-I showed a remarkable skill in predicting precipitation and radiation in extra-tropical areas.
Some of the errors in the ERA-I forcing will be related to a mismatch between the 80 km grid box of ERA-I and the vegetation type or land use at a particular location. ERA-I accommodates for land heterogeneity by using surface tiles which allow for interaction of a single atmospheric grid point with multiple land cover tiles. However, the terrain heterogeneity represented in this way, is not kept in the atmosphere, which is a clear limitation of the tiling approach (see e.g. Manrique-Sunen et al. (2013) for a study on the contrast between lake and land tiles, Baldocchi et al. (2014) for biome heterogeneity, and Vilà-Guerau de Arellano et al. (2014) for effects of broken clouds).
It is therefore suggested that ERA-I remains a very powerful data set to use as input of LSMs. Its uniqueness is related to the data availability at global scale at hourly time resolution. Moreover, the ERA-I data set is strongly constrained by millions of daily in situ and remote-sensing observations that are consistently assimilated using statistically optimal schemes and a modeling system that ensures internal coherence of the physical/dynamical fields. Therefore, the adoption of a meteorological forcing from re-analyses is attractive, particularly because it allows simulations on a global scale. The use of local observations to force the LSMs should be a possibility, but they are also affected by limitations such as missing data, instrumental biases and smallscale representativeness. For many applications, LSM simulations need continuous atmospheric forcing that can be provided by global atmospheric data sets as ERA-I. On the other www.biogeosciences.net/11/2661/2014/ Biogeosciences, 11, 2661-2678, 2014 hand, in situ observing networks are essential as "groundtruth" verification to study and monitor the quality of land carbon modelling systems.

Conclusions
This study aimed at evaluating the ability of large-scale simulations based on three different LSMs (C-TESSEL, ISBA-A-gs and ORCHIDEE) to simulate terrestrial carbon fluxes of European ecosystems over a range of climatic conditions and agricultural practices (grazing, crop rotation, irrigation). We used a multiple approach where different tests were applied to perform the comparison between model results and observations, including analysis of seasonal and interannual trends and ecological relationships. The results show a heterogeneous picture, with differences between models (Table 2), plant functional types (Fig. 1), climates and sites ( Figs. 2 and 3). It is shown that the best performance is obtained for ENF sites in continental and humid climate areas without a marked dry season. All simulations show some limitations in capturing the NEE seasonality for Mediterranean and sub-tropical ecosystems characterized by a dry summer season. This could be due to several environmental factors that control GPP during summer dry conditions with low water availability (Keenan et al., 2009). These results are in agreement with previous studies on the ORCHIDEE , ISBA-A-gs , and CTESSEL (Boussetta et al., 2013) models.
The model stress functions in Fig. 7 show large errors for the drought-affected sites, which suggest that these stress functions can be improved by using more appropriate equations and parameters in the LSMs. Also working at higher resolution and better biome characterization at the tile scale are expected to contribute to improvements in the products. Furthermore, all simulations show large errors in the description of grassland and cropland phenology. The improvement of phenology, of crop and grassland management and of the relationships between LAI, photosynthesis and environmental drivers, should be considered in future development of these models.
This study suggests that priority areas for model development are (i) improving the seasonal evolution of LAI, (ii) modeling the effects of crop and grassland management including irrigation, and (iii) describing changes in model parameter values in response to drought. Finally, this analysis confirmed the importance of the ecosystem-scale observations in model validation and development, suggesting also an integrated set of tests to compare simulations and measurements. With the establishment of long-term global monitoring networks such as ICOS (www.icos-infrastructure.eu) NEON (www.neoninc.org) and AmeriFlux (http://ameriflux. lbl.gov), the use of direct measurements, also in nearreal time, will provide a unique framework for these types of activity.
The Supplement related to this article is available online at doi:10.5194/bg-11-2661-2014-supplement.