Impacts of fertilization on grassland productivity and water quality across the European Alps: insights from a mechanistic model

Alpine grasslands sustain local economy providing fodder for livestock. Intensive fertilization is common to enhance 10 their yields, thus creating negative externalities on water quality that are difficult to evaluate without reliable estimates of nutrient fluxes. We apply a 1-D mechanistic ecosystem model, seamlessly integrating land-surface energy balance, soil hydrology, vegetation dynamics, and soil biogeochemistry aiming at assessing the grassland response to fertilization. We simulate the major water, carbon, nutrient, and energy fluxes of nine grassland plots across the broad European Alpine region. We provide an unprecedent interdisciplinary model evaluation confirming its performance against observed variables from 15 different datasets. Subsequently, we apply the model to test the influence of fertilization practices on grassland yields and nitrate (NO3) losses through leaching. Despite the generally low NO3 concentration in groundwater recharge, the variability across sites is remarkable, mostly, but not exclusively, dictated by elevation. In high-Alpine sites short growing seasons lead to less efficient nitrogen (N) uptake for biomass production. This combined with lower evapotranspiration rates results in higher amounts of drainage and NO3 leaching 20 to groundwater. The local soil hydrology has a crucial role in driving the NO3 use efficiency. The commonly applied fixed-threshold limit on fertilizer N input is suboptimal. We suggest that major hydrological and soil property differences across sites should be considered in the delineation of best practices or regulations for management. Using distributed maps informed with key soil and climatic attributes or systematically implementing integrated ecosystem models as shown here can contribute to achieving 25 more sustainable practices. 30 https://doi.org/10.5194/bg-2020-294 Preprint. Discussion started: 26 August 2020 c © Author(s) 2020. CC BY 4.0 License.


Introduction
Alpine grasslands are a vital resource for the European economy. They provide a variety of ecosystem services such as maintaining biodiversity, protecting soil, and offering recreational functions (Sala and Paruelo, 1997;Lamarque et al., 2011;Schirpke et al., 2017). Above all, they sustain economy providing fodder for livestock. To this purpose high yields are desired, as they correspond to increasing income for farmers, hence large amounts of fertilizers are applied every year. One of the most 35 important components of fertilizers is nitrogen. On the one hand nitrogen plays a crucial role as a nutrient contributing to the grasslands' growth. On the other hand, the surplus of nitrogen, mostly as NO3, is dispersed in the environment, either through surface runoff or as leachate to groundwater. Well-known water quality issues, such as eutrophication, result directly from intensive management of grasslands (e.g., Heathwaite, 1995;Peukert et al., 2014). In order to tackle the problem, the European Union (EU) approved the Nitrate Directive 91/676/EEC (EEC, 1991), which imposes limits on the yearly load of fertilizers 40 and restricts the fertilization season. In this frame, the EU countries should pursue grassland management strategies designed with the intention of maximizing the grassland yields without damaging the environment. The task is particularly daunting given the spatial heterogeneity across the wide Alpine grassland areas. Elevation is often a major constrain for the management in Alpine environment, being low-elevation sites generally more prone than high-elevation sites to intensive management, for both climatological and pragmatic reasons. Since the EU Nitrate Directive has been in effect, many studies attempted to 45 quantify the NO3 fluxes with various methods (e. g., Kronvang et al., 2009;Groenendijk et al., 2014). However, according to a questionnaire submitted to grassland experts across EU countries, Norway and Switzerland (Velthof et al., 2014), "nonacademic" quantification of NO3 fluxes is still mainly based on expert estimates or on prescribed values. Rarely models are applied. When they are, the most common approach is the simple empirical feed balance method, where grassland yields are estimated using statistical data on feed availability for ruminants and their feed requirements. The nutrients balance is 50 consequently estimated applying values derived from literature or direct measurements in sampled grass.
More complex modelling approaches for the quantification of nitrate losses from agricultural activity are emerging. Some European Projects have been promoted with the goal of assessing the state-of-the-art modelling tools in view of a harmonized assessment procedure across Europe. For example, the EUROHARP Project (Kronvang et al., 2009) compares available distributed models applied in different European countries. The analyzed models span from conceptual models such as 55 NLES_CAT (Simmelsgaard and Djurhuus, 1998) and MONERIS (Behrendt et al., 2003) to more complex and process-oriented models like SWAT and NL-CAT, born from the integration of ANIMO (Groenendijk et al., 2005), SWAP (Kroes and Dam, 2003), SWQN ) and SWQL . Another example is the GENESIS Project (Groenendijk et al., 2014), which compares 1-D mechanistic models integrating hydrology, vegetation dynamics, and soil biogeochemistry.
The performance of some models such as ARMOSA (Perego et al., 2013), CoupModel (Jansson, 2012) and EPIC (Williams 60 et al., 1984;Sohier et al., 2009) are compared to simulated nitrate leaching rates from lysimeters. What emerges from these studies is that mechanistic models are currently not used for the purpose of supporting management because they are thought to be difficult to operate, require a large amount of input data, and require complex parameterization (see also discussions in ( Tafteh and Sepaskhah, 2012;Phogat et al., 2013) or DAISY (Hansen, 1990;. More recently, ecosystem models were built combining different models, each highly specialized in a specific compartment of the ecosystem. The result leads to fully-80 integrated models such as ARMOSA (Perego et al., 2013), which is the combination of the hydrological model SWAP (Van Dam, 2000), STAMINA (Ferrara et al., 2010;Richter et al., 2010) for simulating the crops dynamics, and SOILN (Bergström et al., 1991) to represent the soil carbon and nitrogen cycles. The model SIM-STO (Pütz et al., 2018) was also created from the merging of SIMWASER (Stenitzer, 1988) and STOTRASIM (Feichtinger, 1998), respectively focused on water fluxes and nitrogen dynamics. RT-Flux-PIHM (Bao et al., 2017) combines a reactive transport (RT) model, a land-surface model 85 (Noah LSM, Chen et al., 2001;Shi et al., 2013) and the hydrological model PIHM (Shi et al., 2013). These models are usually performing well when applied with targeted applications. In most of the cases they are validated only against the data concerning one specific module that is the most important to solve the problem at hands. Hardly ever the models are concurrently confirmed against vegetation and soil water dynamics, energy and biogeochemical fluxes. Here, we push the envelope of integrated models by applying the mechanistic ecosystem model T&C-BG, which simulates the interactions among 90 vegetation dynamics, soil biogeochemistry, and hydrological fluxes. T&C-BG was recently developed (Fatichi et al., 2019) as a single fully integrated model and therefore does not require many of the pragmatic simplifications necessary to externally couple different models. We evaluate the model capabilities by applying a "common" (non-site specific) parametrization to nine managed grassland sites across the broad alpine region, derived from previous plot-scale studies (Fatichi et al., 2014;Fatichi and Pappas, 2017). First, we confirm the model against hydrological, energy, vegetation, and soil biogeochemical 95 observations according to the available data. Then, we run numerical experiments to estimate the grass yields and the losses https://doi.org/10.5194/bg-2020-294 Preprint. Discussion started: 26 August 2020 c Author(s) 2020. CC BY 4.0 License.
of NO3 under different fertilization regimes and compare results across the sites. Specifically, we investigate and answer the following questions: (1) Is a mechanistic ecosystem model provided with a "common" parameterization for Alpine managed grasslands able to reliably represent the ecosystem dynamics across 9 sites? 100 (2) How do grass productivity and leaching of NO3 change in response to different fertilization scenarios across the broad Alpine region?
(3) Can mechanistic models provide guidelines for fertilization regulation?
Answering these questions, beyond allowing the assessment of a state-of-the-art ecosystem model in reproducing multiple aspects of ecosystem functioning, also provides quantitative information that can be used to enhance best grassland 105 management practices.

Study sites and instrumentation
The study sites are located in European Alps and cover an altitude gradient ranging between 393 m a.s.l and 2160 m a.s.l. We used grassland sites in the Alpine region for which at least eddy covariance flux tower observations were available. For a 110 subset of sites, additional observations about leaf area index, soil moisture, grass productivity and water and nitrogen leaching were also available. Specifically, the sites of Torgnon (TOR) and Monte Bondone (MB) in Italy, Oensingen (OEN), Chamau (CHA) and Früebüel (FRU) in Switzerland, Stubai (STU) in Austria and Fendt (FEN), Rottenbuch (ROT) and Graswang (GRA) in Germany were used. The mean annual precipitation (MAP) across sites ranges between 850 mm and 1627 mm while mean annual temperature varies between 2.9 and 9.5 °C following the elevation gradient (Table 1). All these sites are 115 permanent grasslands ecosystems, mainly used for fodder production (Ammann et al., 2007;Hammerle et al., 2008;Vescovo and Gianelle, 2008;Kiese et al., 2018). Apart from the site TOR, which is unmanaged, the other grassland sites are managed with regular fertilizer applications and harvested with grass cuts at standard height (0.07 m).
Each site is equipped with a flux tower providing time series of micrometeorological variables, as well as energy (e.g., net radiation, latent heat, sensible heat) and carbon dioxide fluxes by means of the Eddy Covariance (EC) method (Aubinet et al., 120 2012). The flux tower data for all the sites but the German ones are retrieved from the FLUXNET-2015 database (Pastorello et al., 2020). German sites (FEN, ROT, GRA) belong to the Pre-Alpine Observatory of the TERENO network (Zacharias et al., 2011;Pütz et al., 2016;Kiese et al., 2018) where lysimeters are also installed in proximity of the flux-tower site (Fu et al., 2017(Fu et al., , 2019Zeeman et al., 2017;. For the stations FEN, ROT and GRA annual totals of lysimeter evapotranspiration (ET) and water leaching are retrieved for 125 the years 2012-2014 from Fu et al., (2017Fu et al., ( , 2019. The same study provides annual DOC-C and nitrate-N (N-NO3) https://doi.org/10.5194/bg-2020-294 Preprint. Discussion started: 26 August 2020 c Author(s) 2020. CC BY 4.0 License. concentrations in the groundwater recharge. The ScaleX campaign of 2015 (Wolf et al., 2017;Zeeman et al., 2019) provided additional data concerning harvested biomass and its carbon and N content.

T&C-BG model 130
We simulate the grassland ecosystem dynamics in each site with the fully-coupled terrestrial biosphere/ecosystem model T&C-BG, introduced by Fatichi et al. (2019), who provided a complete model description. T&C-BG combines a well-tested ecohydrological model (T&C, e.g., Fatichi et al., 2012aFatichi et al., , 2012bFatichi et al., 2014Fatichi et al., , 2015Manoli et al., 2018;Mastrotheodoros et al., 2020) with a soil biogeochemistry module, which represents soil carbon and nutrient dynamics as well as plant mineral nutrition and bilateral feedbacks between nutrient availability, plant growth and soil mineralization processes. The T&C-BG 135 model inputs include hourly flux-tower observations of microclimate (e.g., precipitation, air temperature, wind speed, relative humidity, shortwave radiation) and resolves the principal land-surface energy exchanges (e.g., net radiation, sensible and latent heat) which are interconnected with the main hydrological processes, such as evaporation and transpiration, infiltration, runoff, saturated and unsaturated zone water dynamics, groundwater recharge, as well as snow and ice hydrology dynamics. The soil hydrology module solves the 1-D Richard's equation in the vertical direction and uses a heat diffusion solution to compute the 140 soil temperature profile. A soil-freezing module has been also recently introduced (Yu et al., 2020). The vegetation module computes photosynthesis, respiration, vegetation phenology, carbon and nutrient budget including allocation to different plant compartments and tissue turnover.
The management of vegetation such as grass cuts and fertilization can be prescribed as part of model inputs. The soil biogeochemistry module accounts for the carbon and nutrient budgets of litter and soil organic matter. Mineral nitrogen (N), 145 phosphorous (P), and potassium (K) budgets and the nutrient leakage are also simulated. This study is particularly focused on NO3. Nitrate pool in the soil depends on the net immobilization/mineralization fluxes, nitrogen uptake, nitrogen leaching, and nitrification/denitrification fluxes, with the latter that are simulated with empirical functions of the amount of NO3 and environmental conditions (detailed description in Fatichi et al., 2019). NO3 transport process is not solved in the soil column, but leaching is assumed to occur at the bottom of the soil column and it is proportional to the water leakage and bulk NO3 150 concentration in soil water (Fatichi et al., 2019).
The model can be run in a distributed topographically complex domain (e.g., Mastrotheodoros et al., 2019, 2020) but here it is employed in its plot-scale version, which simply solves 1-D vertical exchanges. In summary, based solely on meteorological inputs, soil and vegetation parameters, T&C-BG simulate prognostically all the other variables, from energy, carbon, and water fluxes, to vegetation biomass, up to soil nutrient mineralization and leaching. All these processes interact with each other. 155

Model parametrization
The model allows the simulation of different vegetation types, but in this case only grass is included. In order to keep results general enough and maximize future model transferability across grassland sites, we adopt a common parametrization for all https://doi.org/10.5194/bg-2020-294 Preprint. Discussion started: 26 August 2020 c Author(s) 2020. CC BY 4.0 License. of the nine sites with few exceptions justified by elevation dependent parameters such as threshold temperature for leaf onset.
Vegetation parameter selection was based on previous experience with European grasslands (Fatichi et al., 2014;Fatichi and 160 Pappas, 2017), soil biogeochemistry parameters are currently fixed for all sites in absence of more specific information (see discussion in Fatichi et al. 2019). The site-specific soil content of clay, sand, and organic matter in each site is provided as input parameter to the model, which internally computes the hydraulic soil parameters by means of pedo-transfer functions (Saxton and Rawls 2006). To make results comparable, across all the sites we assume a soil depth equal to 1.4 m, corresponding to the depth of lysimeters, which is discretized into 16 soil layers of increasing depth from the surface to the bedrock. The 165 biogeochemistry active zone is fixed at 25 cm depth and the grass roots are assumed to be exponentially distributed with a maximum rooting depth of also 25 cm. We do not simulate the groundwater dynamics, but we compute the groundwater recharge as the water leaching at 1.40 m depth. The selected parameters are detailed in Table S1.
In each site we force the model with the local meteorological conditions, while atmospheric CO2 concentrations are assumed to follow the observed historical global trend (Keeling et al., 2009). Nutrients depositions in absence of local specific 170 information for each site are set using global maps of recently observed values for nitrogen and phosphorus (Galloway et al., 2004;Mahowald et al., 2008;Vet et al., 2014). We confirm the model performance against an ensemble of diverse variables. First, we compare simulated and observed net radiation, latent heat, sensible heat, and Gross Primary Production (GPP) using flux towers data. Second, we test the hydrological module comparing the effective soil saturation (e.g., normalized soil moisture) from the model and from local 175 measurements. Third, we confirm the vegetation module comparing the simulated biomass and/or Leaf Area Index (LAI) dynamics with data retrieved from literature wherever available (Ammann et al., 2007;Hammerle et al., 2008;Gilgen and Buchmann, 2009;Zeeman et al., 2010;Chang et al., 2013;Finger et al., 2013;Filippa et al., 2015;Prechsl et al., 2015).
Finally, we compare biogeochemical fluxes in terms of harvested nitrogen and carbon, and leaching of NO3 and DOC. The confirmation of the soil biogeochemistry module is possible only for the sites FEN, ROT and GRA, equipped with lysimeters, 180 even though difference in scale, soil properties, and timing of management between flux-tower footprint and lysimeters (Oberholzer et al., 2017;Mauder et al., 2018) should be accounted in the comparison.
Detailed manure input data for all the case studies were not available. To bypass this problem, whenever we did not have information about fertilization (i.e., in all sites except in the German ones), in the reference simulations we assume that the cut grass is left on the ground, thus guaranteeing a nutrient application, similarly to fertilization, and a nutrient budget without 185 major losses. While such a scenarios is unrealistic it allows to keep nutrient budgets in a dynamic equilibrium, without having to assume specific fertilization rates.
In FEN, ROT and GRA, the sites where both the flux tower and the lysimeters are co-located, grassland is managed slightly differently above the lysimeters and below the flux-tower. To validate model results we run multiple simulations, each fed with the corresponding management either of flux-tower or lysimeter plots. 190 Since the current initial conditions of the carbon and nutrients pools in the soil are unknown, as common in modeling studies, we spin-up carbon and nutrient pools running only the soil-biogeochemistry module for 1000 years using average climatic conditions with prescribed litter inputs taken from preliminary simulations with the soil-biogeochemistry module inactive.
Then we used the spun-up carbon and nutrient pools as initial conditions for the hourly-scale fully coupled simulation over the period for which hourly observations are available. This last operation is repeated two times which allows reaching a dynamic 195 equilibrium.

Numerical fertilization experiments
We set up numerical experiments to test the response of the study sites to different manure application regimes. First, we classify the sites based on elevation ranges and management: pre-Alpine/Intensive sites located at elevations lower than 800 m a.s.l. and intensively managed (CHA, OEN, FEN, ROT), Alpine/Extensive for sites with an altitude between 800 and 1000 200 m a.s.l. (GRA, STU, FRU) and high-Alpine/Extensive sites above 1000 m a.s.l. extensively managed, which include the two grasslands MB and TOR. It is common practice among farmers to fertilize at the beginning of the growing season and after each cut. The number of cuts and manure applications along the year decreases with elevation. Although the time intervals between cuts can be similar, the length and intensity of the winter dormant period ultimately determines growing season length and, in turn, the number of cuts, the volume of locally needed organic fertilizer and the opportunity to bring that out into the 205 field. We aligned the management strategy of our numerical experiments to these notions and used a classification of the sites into three groups. In our simulations manure is applied 6, 4, and 2 times and the grass is cut 5, 3 and 1 times, respectively, in pre-Alpine, Alpine and high-Alpine sites. Grass is cut at a height of 0.07 m, following common practice. Referring to literature (Ammann et al., 2007(Ammann et al., , 2012Merbold et al., 2014;Fu et al., 2017) we identified a broad range of possible N yearly loads applied in grasslands. For pre-Alpine sites this range spans between 50 kg ha -1 yr -1 (i.e., extensive management) and 500 kg 210 ha -1 yr -1 (highly intensive management). This upper limit intentionally exceeds the actual management practices and allows us to analyze what can happen if fertilization loads are increased. We computed the corresponding single application amount of manure based on C:N of manure ratio equal to 8.9, which is reported for the sites of FEN, ROT and GRA (Fu et al., 2017), and is also similar to values suggested in literature (Sommerfeldt et al., 1988;Nyamangara et al., 1999) or slightly lower (Zhu, 2007;Kumar et al., 2010). The input of P and K is assumed to be proportional to the N input based on assigned N:P and N:K 215 values guaranteeing non P-or K-limited conditions to the system. The specific manure amount computed for pre-Alpine sites is applied at the corresponding lower frequencies determined for Alpine and high-Alpine sites. The resulting annual loads in all the scenarios are reported in Table 2. For each fertilization scenario, in each site, we spin-up the system running the soilbiogeochemistry module for 1000 years under average climatic conditions and the specific management scenario. Thus, the initial conditions of each simulation correspond to the final state of the spin-up simulation run for each specific site and 220 management scenario. https://doi.org/10.5194/bg-2020-294 Preprint. Discussion started: 26 August 2020 c Author(s) 2020. CC BY 4.0 License.
In the result analysis, we focus our attention on the resulting N contribution to grass productivity and in the N lost as leaching at the bottom of the soil profile. The former represents one of the positive gain from agriculture as economic activity and the latter a negative externality into the environment, as most NO3-leaching will ultimately reach the groundwater storage and the 225 rivers. We evaluate the different management strategies bringing together the two indicators in an index computed as the ratio of harvested-N to N-NO3 leachate. The higher the value of the index the most favorable is the management strategy for both farmers and environment. Thus, we interpret this index as a proxy of the efficiency of the grassland in profiting from the N added in fertilization. We also analyze potential relations between such efficiency index and site-specific characteristics, such as the elevation and the percentage of precipitation that becomes groundwater recharge. 230

Model confirmation
The performance of the model in representing the energy and carbon fluxes measured from the flux towers is good across all the sites. The R 2 values of the model/observations comparison are reported in Table 3 and further goodness of fit metrics are reported in Table S2. The seasonality of the energy fluxes is well represented across all the study sites, as illustrated by the 235 pattern of latent heat shown in Figure 2.
We show the comparison between the observed and the simulated effective saturation of the soil ( Figure 3) to test the hydrological dynamics. The intra-annual pattern is generally well captured by the model. The average coefficient of determination R 2 equals 0.49 and the average RMSE is 0.10. Results highlight the pedo-climatic differences across the Alpine region. Some sites tend to reach field capacity quite easily (e.g., FEN, ROT and GRA), in other sites effective saturation is 240 often below 50% (CHA, OEN, STU and FRU). The high-Alpine sites TOR and MB fall in this second case, but with pronounced peaks of saturation when snow melt occurs.
A summary of the simulated site-specific variables in terms of water, energy and carbon fluxes is shown in Table 4. The total net radiation does not vary consistently with elevation. As a result of temperature constraints, latent heat decreases with increasing elevation, leading to lower ET in high-Alpine sites, as the higher values of the Bowen ratio for the high-Alpine sites 245 indicates. Phenology is also affected by elevation, the average day of the year when the simulated growing season starts (taken as the mean day when the biomass is higher than the biomass threshold at cut height) increases with increasing elevation. It spans from mid-March in the pre-Alpine site CHA to mid-May in the high-Alpine sites of MB and TOR. Consequently, the mean yearly GPP is higher where the growing season starts earlier. For instance, the resulting mean annual GPP in CHA is more than the double compared to TOR. However, the average GPP of the month July only, intended as a proxy for the GPP 250 in the maximum growth period, does not differ much across the sites, and lower values are rather indicative of water limitations, testifying similar levels of productivity in the peak of the summer regardless of the site elevation.
The model confirmation against grass biomass dynamics clearly shows that snow presence on the ground limits the growing season at higher altitudes ( Figure 4). The model responds to the inter-annual variability of the snow cover. In years with large 255 snow accumulation, the growing season starts later compared to years with a less persistent snow pack. For instance, the years 2011 and 2013 in TOR are characterized by lower and higher than average snow depth respectively. The following growing seasons is respectively anticipated and delayed in both simulations and observations.
In CHA, FRU and OEN the total simulated harvested biomass falls within, or is very close to, the range reported by observational studies. Considering the large variability in published biomass estimates across sites and even within the same 260 site, it is difficult to conclude if such differences are a model shortcoming or simply dictated by observation uncertainty. In OEN the LAI dynamics are also well captured. The simulated biomass and LAI patterns in FEN, ROT and GRA fit the more detailed field data for 2015 ( Figure S1). Also, the LAI in STU is simulated well, and the length of growing season and the range of variability of available observations are matched. However, there are discrepancies on the exact dynamics of grass cuts in several sites, most pronounced at STU. These are expected as grass cuts are prescribed at regular intervals in the model, 265 while they may occur irregularly in reality (for instance dictated by specific weather events) and they might also vary from year to year. The simulation in TOR matches quite well the magnitude of the grass biomass and the beginning of the growing season, but overestimates its length by approximately one month, which explains the major discrepancy between simulated and observed LAI and leaf-biomass.
Simulations of the lysimeter data in FEN, ROT and GRA provide the opportunity to test the soil biogeochemical dynamics 270 and especially nutrient leaching as well as biomass productivity. The harvested dry matter and, consequently, also harvested

Numerical fertilization experiments
The simulated NO3 concentrations in the leaching flux span two orders of magnitude ranging between 0.07 and 12 mg L -1 (Figure 6a), compared to an environmental limit of 50 mg L -1 in Europe (EEC, 1998). In this case we prefer comparing NO3 rather than NO3-N concentrations to the widely known threshold of 50 mgNO3 L -1 imposed by the European Directive. The high-Alpine site TOR exhibits the highest NO3 concentration, increasing almost monotonically with increasing nitrogen input 280 ( Figure 6a). The other high-Alpine site MB shows concentrations comparable to the other sites but with higher interannual variability. Except for TOR, in the range of N inputs between 30% and 50% of the maximum fertilization, the NO3 concentration in leaching does not increase monotonically with the input (Figure 6a).
The variability of NO3 leaching concentration is remarkable not only across sites belonging to different elevation classes, but also within the same elevation range. The largest variability among sites of similar elevation is in the N input range of 30 to 50% of the maximum N fertilization. For example, the difference in NO3 leaching concentration between STU and FRU at 50% of maximum fertilization is comparable to the difference between the leaching obtained in STU increasing the N-load from 50 to 100%. The same observation applies to the pre-Alpine sites of FEN and OEN.
Harvested N as a function of the nitrogen fertilization in the high-Alpine sites MB and TOR differs remarkably from the other sites (Figure 6b). At these high-Alpine sites, the harvested-N does not considerably increase with increasing N inputs, as grass 290 productivity is likely constrained by other factors than N. For the other sites, a clear increase of harvested N emerges for N fertilization spanning from 10% to about 40% of the maximum. For values higher than 40%, there is not much gain in simulated harvested N (and thus biomass) regardless of the increased nutrient availability.
When results are summarized in the harvested N versus N leaching space, this difference is more pronounced (Figure 6c).
Patterns of pre-Alpine and Alpine sites partly overlap, but occupy a distinct space away from high-Alpine sites. High-Alpine 295 sites show a limited range of variability of harvested N compared to the range of leaching NO3. Beyond a certain threshold of N input also the other pre-Alpine and Alpine sites show limited increase in harvested N. Also note that for lower N-input the NO3 leaching is to be relatively stable while harvested N and thus grass biomass increase. The N contributing to grass growth is one order of magnitude higher than the leaching N in the high-Alpine sites. In comparison, for pre-Alpine and Alpine sites it is two orders of magnitude higher testifying a closer N-cycle despite more intensive fertilization. 300 When the harvested N and the N concentration in groundwater recharge are combined in a ratio, we obtain an index, which depicts the efficiency of the fertilization practice ( Figure 6d). Ideally, this index should be maximized to obtain a win-win situation which optimizes grass yield and minimizes water quality issue. When plotting the index as a function of the percentage fertilization, the pre-Alpine and Alpine sites exhibit a range where the index is maximum. This range is in between 20% and 60%. Specifically, CHA, OEN, FEN GRA and FRU exhibit the highest value at 20% of the fertilization input (67-305 100 kg N ha -1 yr -1 ), ROT at 35% of the maximum load (175 kg N ha -1 yr -1 ) and STU at 50% of the maximum load (167 kg N ha -1 yr -1 ). The high-Alpine sites do not exhibit any optimum, but only a monotonically decreasing line, as N fertilization does not stimulate growth but rather increase NO3 leaching. beyond soil-element biogeochemistry. In the sites OEN, FRU, MB and TOR the high groundwater recharge also corresponds to high soil hydraulic conductivity (Ks), while in GRA Ks is relatively small ( Figure S2). On the contrary, in STU the groundwater recharge fraction is relatively small and the N-efficiency index is high despite a high hydraulic conductivity ( Figure S2).

Fully-integrated mechanistic ecosystem modelling: successes and limitations
While many ecosystem modeling applications have been discussed in literature, including detailed ecohydrological (e.g., Ivanov et al., 2008;Tague et al., 2013;Millar et al., 2017) and soil biogeochemistry applications (e.g., Parton et al., 1998;Kraus et al., 2014;Robertson et al., 2019), rarely, if ever, a single integrated model has been tested across different compartments and disciplines in concurrently reproducing surface energy budget, hydrological dynamics, vegetation 325 productivity, and nitrogen budget. Here, we raise the bar to challenge T&C-BG, in reproducing these processes across nine grassland sites in the broad Alpine region. Furthermore, to ensure future model transferability and avoid local tuning, we also use the same vegetation and soil biogeochemistry parameters across all sites, with few exceptions where an elevation or latitudinal dependence of a parameter should be preserved. Despite such an "average" parameterization, the model responds surprisingly well to the challenge as energy and carbon fluxes, soil hydrology, vegetation dynamics, and NO3 and DOC 330 leaching fluxes are all with realistic magnitude and similar to observations with few notable exceptions discussed below. Also feedback between compartments are realistic in the model, as it is the case of growing season length varying depending on the date of complete snow cover disappearance or the limitations in grass growth and thus LAI at low nitrogen availability. Overall, simulations suggest that a correct representation of phenology and thus length of the growing season is a fundamental aspect of model performance as peak of the season GPP is much more similar across sites than annual GPP values (Table 4). 335 Despite such a positive outcome, there are a number of uncertainties in both the observations and model simulation that are relevant to highlight. It is well-known that flux towers do not close the energy budget (e.g., Foken, 1998Foken, , 2008Wilson et al., 2002;Widmoser and Wohlfahrt, 2018;Mauder et al., 2020). The model generally overestimates the sensible heat compared to observations, thus suggesting that the missing energy is most likely attributable to sensible heat as supported by other studies (Mauder et al., 2006;Wohlfahrt et al., 2010;Liu et al., 2011) and justifies the lower R 2 for sensible heat compared to the other 340 fluxes. Observations of soil water content depend on the specific soil hydraulic properties and microtopography in the location where the sensor is installed and are also often subject to temporal drifts. The model represents a vertically explicit but spatially implicit average soil moisture over the tower footprint. For this reason, even though we normalize soil moisture using effective saturation, the comparison should be seen more in qualitative terms rather than attempting to reproduce exactly the observed soil moisture values. Most important, in terms of vegetation productivity, biomass data reported from different articles present 345 a remarkable variability despite referring to the same study site. This might depend on differences in sampling protocol and instrumentation (Zeeman et al., 2019) as well as natural spatial variability. In light of these uncertainties we do not dwell in explaining model to data differences as far as the long-term magnitude of observed biomass is similar. The only exception is the very significant underestimation of the simulated biomass compared to lysimeters measurements. We attribute large portion of this inconsistency to the well-documented lysimeter oasis/border effect, which generates crop yields and evapotranspiration 350 fluxes 10-20% larger than larger-scale observations (Oberholzer et al., 2017). As the model captures well the pattern of biomass and carbon fluxes in the flux-tower locations, it would be difficult to justify why biomass productivity should be much different a few hundreds of meter apart in the lysimeters under similar management (Fu et al. 2017). The model does not properly represent the inter-annual variability of harvested carbon and N, for example not capturing the higher productivity of 2013, a particularly productive year due to the reduced snow cover (Zeeman et al., 2017). One explanation can be that we input to the 355 model the same management strategy every year while local management vary from year to year. More generally, simulation results are affected by the spin-up process performed to initialize the carbon and nutrient pools in the system in absence of historical information. The induced stationarity might influence the soil organic carbon and nitrogen pools enrichment or depletion and generates discrepancies with observations. Other possible N losses in the environment such as ammonia volatilization or denitrification (N2) do not affect this discrepancies, being their average losses across the three sites in the 360 order of 0.74 kgN ha -1 yr -1 and 0.28 kgN ha -1 yr -1 respectively. Simulated NO3 concentration in groundwater recharge in FEN is underestimated compared to the observed values from lysimeters. While this can be likely due to model shortcoming, the model uncertainty is not the only factor contributing to this incongruence as also the discrepancy of management between the model and reality might play a role. Moreover, the model does not take into account preferential flows, which might be particularly pronounced in the lysimeters, (Schoen et al., 1999;365 Groh et al., 2015;Benettin et al., 2019;Shajari et al., 2019). Regardless of these existing differences, and keeping in mind that the exact value of the results is likely more uncertain that the comparison across scenarios/sites, we argue that results obtained with T&C-BG are more encouraging than discouraging and allowed us to explore the complex functioning of the overall ecosystem and to carry out virtual numerical experiments that can inform farmers and legislators.

Simulated responses to fertilization and management implications 370
A term of comparison for the simulated NO3 leaching concentrations is the limit of 50 mgNO3/L imposed on groundwater nitrate concentrations by the European Drinking Water Directive (98/83/EC, EEC, 1998). Results from the numerical experiments show that NO3 leaching concentrations are generally lower or much lower than this threshold, even under highly intensive management practices.
The EU Nitrate Directive 91/676/EEC (EEC, 1991) imposes the maximum N fertilization rate for stable managed grasslands 375 of 170 kgN ha -1 yr -1 . The Directive leaves room for flexibility to European countries, since they are allowed to introduce higher N fertilization loads if they demonstrate that grassland absorbs it efficiently. However, if the monitoring of surface and groundwater NO3 concentration reveal water quality issues, then EU can intervene with an infringement proceeding. As an example, Germany in 2017 re-adapted the national Fertilizer Ordinance of 2007 after an infringement proceeding by the EU (Kuhn, 2017). 380 The threshold of 170 kgN ha -1 yr -1 is computed on the basis of a NO3 input-output balance at the farm scale. In such a simple computation the NO3 losses in the environment are assumed to be 30% of the input (IPCC, 2000(IPCC, , 2006. These losses include both groundwater recharge and surface runoff. Although some countries have lowered this threshold to 10%, such as Ireland, or to 20%, such as Switzerland, there is evidence that the threshold is quite high compared to observed losses in grasslands (Eder et al., 2015). Our study confirms such a finding highlighting NO3 concentrations in the order of 0.1-12 mg L -1 , although 385 we only consider the NO3 losses with groundwater recharge and neglect those through surface runoff. The surface runoff contribution to NO3 losses is more likely to be higher in the high-Alpine sites, where the winter snowpack melting on a frozen soil favors surface runoff. However, there is evidence that NO3 losses through surface runoff are usually small compared to losses through groundwater recharge ( Jackson et al., 1973;Casson et al., 2008). Despite the noticeable inter-site variability, the simulated losses of NO3 into groundwater are generally lower than 10% of the N-input, even under highly intensive 390 fertilization scenarios (Figure 6c).
Among the N inputs explored in the analysis, the value which guarantees the maximization of the N-efficiency ratio is quite close or lower than the limit of 170 kgN ha -1 yr -1 in pre-Alpine and Alpine sites (Figure 6d). Thus, we confirm that such a limit is a reasonable upper threshold. However, an emerging aspect from the simulations is the large variability across sites in NO3 leaching and response to fertilization. While model uncertainties exist and we are unlikely to capture the exact magnitude of 395 all N-fluxes at all sites, this variability is likely underestimated rather than overestimated by the model due to simplifying assumptions and commonality of parameters. The NO3 leaching and the grassland yields depend on the degree at which nitrogen is limiting productivity versus other environmental factors (e.g., temperature), on the capability of vegetation in uptaking nutrients, and on the hydrological characteristics of the sites. A soil favoring water drainage or a fast snowmelt at the end of the winter generates larger amount of N leaching into groundwater. Most likely one factor does not exclude the other 400 ( Figure 7 and Figure S2), but the combination of both soil properties and hydrological regime drives the correlation between N leaching concentration and groundwater recharge fraction. The N uptake efficiency of grassland depends more on the nutrient demand with a longer growing season (Zeeman et al. 2017) higher temperatures and lack of water stress favoring the N demand, even though the importance of the different aspects are difficult to disentangle (Lü et al., 2014). Literature shows that also the richness of species composing the grassland might favor nutrient uptake (Tilman et al., 1996;Spehn et al., 2005;405 Niklaus et al., 2006), but this type of ecological feedback is not implemented in the model, which simply considers an "average species composition" as a representative grassland. In the simulations, the length of the growing season at OEN, GRA and FRU is comparable to the other pre-Alpine and Alpine sites (Table 4). However, simulated NO3 leaching is higher and the produced biomass is lower. This is especially true at low N fertilization levels. This further remarks the important role of local hydrology (Fig 6a-c) on N-cycle. 410 As local hydrological and soil conditions (and potentially also different site biogeochemical history and species compositionnot considered here) modify the grass response to N fertilization, disparities among farmers might emerge simply as a function of location. Farmers owing a field in an area characterized by higher groundwater recharge might be disadvantaged because losses of NO3 to groundwater could be higher and yields limited for the same amount of N fertilization. On the contrary, farmers owing fields in hydrological favorable sites might be advantaged. Such disparities should be taken into account in two 415 ways: first in delineating location specific fertilization limits and second in applying some sort of compensation for locations/regions that are disadvantaged. A good practice in this direction is the proof of ecological performance (PEP), introduced in Switzerland (Swiss Federal Council, 1998). Direct payments are transferred to farmers who join this program, which imposes a wealth of rules in order to preserve the environment. Among the requirements the computation of the farm NO3 balance needs to be also provided. 420 On the basis of modeling results, regulations based on a fixed-threshold for the whole European grasslands could be largely suboptimal. Even if we might underestimate losses because of model uncertainties and because we do not consider surface runoff, resulting NO3 losses are abundantly lower than 30% of the N-input.
Overall, there is still leverage for enhancing N fertilization before reaching the threshold assumed by the EU Directive.
However, we also showed that higher N-load do not necessarily correspond to higher yields, especially above certain 425 thresholds. Therefore, even though NO3 leaching is low, an effective increase of manure application might not be beneficial for grass yield and should be evaluated on the basis of the site-specific characteristics.
We suggest as a possible alternative and better strategy the definition of thresholds on fertilizer input based on a distributed mapping of the landscape. In absence of more precise information, agricultural areas could be classified based on the crop NO3 use efficiency, soil type, and hydrological characteristics. For instance, Klammler et al. (2013) provide and interesting example 430 of NO3 leaching mapping in a case study in Austria. Alternatively, mechanistic ecosystem models as used in this study could be employed to identify ranges of optimal fertilization levels for different sites, groups of sites, or even in a fully distributed manner. This process could be demanding in terms of resources as requires advanced expertise, sufficient data to constrain the models, and adequate modelling tools (Decrem et al., 2007) . However, with increasing computational capabilities and data availability to constrain model parameters, it is likely that mechanistic modeling approaches might become more popular and 435 an essential tool for fostering the mapping process and provide distributed information to refine environmental regulations.

Conclusion
We simulated the dynamics of nine managed grassland sites across the broad European Alpine region. We applied the mechanistic model T&C-BG which fully integrates land-surface mass and energy fluxes, soil hydrology, vegetation dynamics, and soil biogeochemistry. The model was confirmed to reproduce realistic magnitude and temporal dynamics of 440 ecohydrological variables across multiple compartments in an effort of model evaluation that goes beyond many of the existing attempts to confirm ecosystem models. The model was subsequently used to quantify the impact of different N fertilization scenarios, with focus on grass productivity and NO3 concentration in groundwater recharge. Simulations reveal that although groundwater recharge concentrations are relatively small and well below environmental policy limits, there is high variability across grasslands, also for sites located at similar elevation. Such variability is mainly driven by local environmental controls 445 on productivity that reflects in the grass capability to uptake nutrients and by the differences in the hydrological regime summarized as the fraction of precipitation that becomes recharge. We suggest that these factors should be taken into account by legislators while defining thresholds on fertilizer loads. Guidelines based on the site-specific rather than based on fixed https://doi.org/10.5194/bg-2020-294 Preprint. Discussion started: 26 August 2020 c Author(s) 2020. CC BY 4.0 License. thresholds across large regions would favor the maximization of grass yields while allowing preserving similar water quality targets. Fully-integrated mechanistic ecosystem models as employed here have a big potential as tools to construct these maps 450 under current and future climate scenarios.

Code availability
The model code is available at the link: https://hyd.ifu.ethz.ch/research-data-models/t-c.html (last access July 2020).

Author contribution 460
MB and SF conceived the ideas and discussions with MZ supported their development. MB and SF designed the experiments, MB ran simulations and developed the analyses. All authors contributed to the writing.

Competing interests
The authors declare that they have no conflict of interest.  Galloway, J. N., Dentener, F. J., Capone, D. G., Boyer, E. W., Howarth, R. W., Seitzinger, S. P., Asner, G. P., Cleveland, C. 575 C., Green,P. A.,Holland,E. A.,Karl,D. M.,Michaels,A. F.,Porter,J. H.,Townsend,A. R.,and Vörösmarty,C. J.,Nitrogen cycles: Past,present,and future,Biogeochemistry,70 (2) (Filippa et al., 2015). In all the subplots, we compare the simulated either biomass or LAI (black line) with observations (black dots), simulated snow depth (grey line) is also shown.  N input. As a reference the biggest marker indicates N input of 100%. The dotted lines represent the estimate of maximum NO3 losses assumed by EU regulations or nearby countries. They correspond to the values 17 and 51 kgN ha -1 yr -1 , i.e., 10% and 30% of the maximum allowed input 170 kgN ha -1 yr -1 . (d) N fertilization efficiency index computed as the ratio between the harvested N and N concentration in groundwater recharge as a function of nitrogen input. In all the subplots the colors represent the elevation class, i.e., pre-Alpine (red), Alpine (blue) and high-Alpine (yellow) sites. The colored area around the 880 markers and lines represent the 25 th and 75 th percentile of the internannual variability of the simulated variables. The vertical dashed bars represent the limit of 170 kgN ha -1 yr -1 imposed by the EU Nitrate Directive in each of the three classes. the percentage of yearly water recharge to groundwater over the yearly precipitation. In both plots the whiskers span the 25 th and 75 th percentile of the interannual variability.   Table 4. Simulated energy, water and carbon fluxes at each site. The mean annual evapotranspiration (ET), mean annual net radiation (Rn), mean annual Bowen ratio (Bo), mean annual gross primary production (GPP), mean gross primary production of July (July GPP) and the mean day of the year in which the growing season starts (Start growing season) are reported.