Nitrogen cycling in CMIP6 land surface models: progress and limitations

. The nitrogen cycle and its effect on carbon uptake in the terrestrial biosphere is a recent progression in earth system models.

Abstract. The nitrogen cycle and its effect on carbon uptake in the terrestrial biosphere is a recent progression in earth system models. As with any new component of a model, it is important to understand the behaviour, strengths, and limitations of the various process representations. Here we assess and compare five land surface models with nitrogen cycles that are used as the terrestrial components of some of the earth system models in CMIP6. The land surface models were run offline with a common spin-up and forcing protocol. We use a historical control simulation and two perturbations to assess the model nitrogen-related performances: a simulation with atmospheric carbon dioxide increased by 200 ppm and one with nitrogen deposition increased by 50 kgN ha −1 yr −1 . There is generally greater variability in productivity response between models to increased nitrogen than to carbon dioxide. Across the five models the response to carbon dioxide globally was 5 % to 20 % and the response to nitrogen was 2 % to 24 %. The models are not evenly distributed within the ensemble range, with two of the models having low productivity response to nitrogen and another one with low response to elevated atmospheric carbon dioxide, compared to the other models. In all five models individual grid cells tend to exhibit bimodality, with either a strong response to increased nitrogen or atmospheric carbon dioxide but rarely to both to an equal extent. However, this local effect does not scale to either the regional or global level. The global and tropical responses are generally more accurately modelled than boreal, tundra, or other high-latitude areas compared to observations. These results are due to divergent choices in the representation of key nitrogen cycle processes. They show the need for more observational studies to enhance understanding of nitrogen cycle processes, especially nitrogen-use efficiency and biological nitrogen fixation.

Introduction
The terrestrial carbon (C) cycle currently removes around a third of anthropogenic carbon emissions from the atmosphere (Friedlingstein et al., 2019;Le Quéré et al., 2018). Changes in this uptake will affect the allowable emissions  for targets such as limiting warming to 1.5 • C (Millar et al., 2017;Müller et al., 2016). Nitrogen (N) is required to synthesise new plant tissue (biomass) out of plant-assimilated C, in differing ratios across biomes and tissue types (McGroddy et al., 2004). Therefore, future projections of terrestrial C uptake are dependent on N availability, particularly under high atmospheric carbon dioxide (CO 2 ) concentrations (Arora et al., 2020;Meyerholt et al., 2020;Wieder et al., 2015b;Zaehle et al., 2014b). A key tool for projections of allowable emissions are earth system models (ESMs), which project the responses of the coupled earth system to perturbations in forcings (Anav et al., 2013;Arora et al., 2013;Friedlingstein et al., 2006;Jones et al., 2013). Of the ESMs that contributed results to the Fifth Phase of the Coupled Model Intercomparison Project (CMIP5; Taylor et al., 2012) only two, based on the same land component, included terrestrial N cycling (Thornton et al., 2009). A number of studies with stand-alone terrestrial biosphere models (Sokolov et al., 2008;Wårlind et al., 2014;Zaehle et al., 2010;Zhang et al., 2013) as well as post hoc assessments of CMIP5 projections suggest that predictions of terrestrial C uptake would decrease by 37 %-58 % if ESMs accounted for N constraints (Wieder et al., 2015b;Zaehle et al., 2014b).
Among the latest generation of models contributing results to CMIP6 (Eyring et al., 2016), at least 10 ESMs incorporate the N cycle (Arora et al., 2020). These models employ a range of assumptions and process formulations, reflecting divergent theory and significant knowledge gaps (Zaehle and Dalmonech, 2011). Initial results imply that the inclusion of a N cycle has reduced the spread of results across multiple ESMs . Since N availability is an important source of uncertainty for the C cycle , an assessment of the sensitivity of the N cycle in these models to changes in atmospheric CO 2 and N inputs is required. Because of the tight coupling of C and N dynamics, a direct evaluation of the N effects on simulated C cycle dynamics using conventional model benchmarking approaches (Collier et al., 2018;Luo et al., 2012) is challenging. More insights into the magnitude of a N effect can be gained by comparing model simulations against perturbation experiments that provide evidence for the responses of terrestrial ecosystems to changes in the C and N availability (Thomas et al., 2013;Wieder et al., 2019;Zaehle et al., 2010).
In this study, we test five land surface models (LSMs) employed in the latest generation of ESMs used in CMIP6. We use a set of standardised model forcing and protocol to simulate historical changes in the C and N balance, as well as the response to N and C perturbations. The perturbation ex-periments (described in the Methods section) are designed to approximate field experiments undertaken to understand the effects of elevated CO 2 or N (e.g. Ainsworth and Long, 2005;LeBauer and Treseder, 2008;Song et al., 2019). These simulations reveal the overall pattern of response of the model to these forcings. We use a range of observations from the literature and model-to-model comparisons to assess the behaviour and performance of the models. The approach of assessing ESM N cycles via their corresponding offline LSMs, driven by a standardised set of model forcing, has the advantage of making model projections directly comparable while giving a representative view of the latest N cycle developments.

Models
We ran simulations with five LSMs that are the land components of ESMs taking part in CMIP6. The key N process formulations are summarised in Table 1. A brief description of each model follows.
The Community Land Model version 4.5 (CLM4.5; Koven et al., 2013;Oleson et al., 2010) is used in the Euro-Mediterranean Centre on Climate Change coupled climate model (CMCC-CM2; Cherchi et al., 2019) and TaiESM1. The N component is described in Koven et al. (2013). CLM4 is the precursor to CLM4.5 and was the first N model for ESMs used in CMIP5 (Thornton et al., 2007(Thornton et al., , 2009. While the N cycling component of CLM4.5 is similar to CLM4, some features of CLM4.5, such as leaf physiological traits (Bonan et al., 2012), were modified, and there is a vertically resolved soil biogeochemistry scheme (Koven et al., 2013) as opposed to the single-layer box modelling scheme for C and N in CLM4.
The Community Land Model version 5 (CLM5; Lawrence et al., 2019) is used in the Community Earth System Model Version 2 (CESM2; Danabasoglu et al., 2020) and the Norwegian Earth System Model version 2 (NorESM2; Seland et al., 2020). CLM5 is the latest version of CLM and represents a suite of developments on top of CLM4.5. The N component is described in Fisher et al. (2010) and Shi et al. (2016). The key difference for the N cycle compared to CLM4 is the implementation of a C cost basis for acquiring N, derived from the Fixation and Uptake of Nitrogen (FUN) approach .
The JSBACH version 3.20 model (Goll et al., 2017) is used in the Max Planck Earth System Model version 1.2 (MPI-ESM; Mauritsen et al., 2019) and Alfred Wegener Institute Earth System Model (AWI-ESM). The N component is described in Goll et al. (2017).
The Joint UK Land Environment Simulator version 5.4 (JULES-ES; Best et al., 2011;Clark et al., 2011) is used in the UK Earth System Model (UKESM1; Sellar et al., 2020).

Forcing data and model initialisation
All model pools were spun-up to equilibrium forced by preindustrial conditions. This comprised of a constant atmospheric CO 2 concentration of 287.14 ppm, cycling global climate data at 0.5 • × 0.5 • resolution for the years 1901-1930 from the CRU-NCEP dataset version 7.0 (New et al., 2000), using constant 1860 land cover from the Hurtt et al. (2020) database, and 1860s nitrogen deposition from the Atmospheric Chemistry and Climate Model Intercomparison Project (Lamarque et al., 2013). Next, transient historical runs were performed for the 1861-1900 period with the same climate forcing as the spin-up but with time-varying atmospheric CO 2 concentrations from synthesised ice core and National Oceanic and Atmospheric Administration (NOAA) measurements, as well as annually varying land use from Hurtt et al. (2020). The N deposition is taken from the Atmospheric Chemistry and Climate Model Intercomparison Project (Lamarque et al., 2013). The simulations were then continued for 1901-2015 under all time-varying forcings, including climate.
The models applied their individual soil and vegetation spin-ups according to their respective conventions. The goal of the spin-up procedure is to obtain quasi-steady states of the ecosystem pools in relation to climate, avoiding drifting pool sizes due to a lack of equilibrium, especially for slowturnover soil organic matter pools. Because of differences among the models, pool sizes after spin-up are not expected to be identical.

Model experiments
In addition to the historical run described above (referred to hereafter as the "Control"), two experiments were performed for the period 1996-2015: increased CO 2 (+CO 2 ) and increased N (+N). These two experimental runs are compared to the corresponding 1996-2015 simulations from the unperturbed Control runs. Table S1 in the Supplement provides a summary of the experiments.
For the increased CO 2 experiment (+CO 2 ), the atmospheric CO 2 concentration was abruptly increased to constant 550 ppm. This is almost twice the pre-industrial atmospheric CO 2 of 280 ppm or a 200 ppm increase compared to the 1996 atmospheric CO 2 of ∼ 350 ppm, similar to free-air CO 2 enrichment experiments performed in the 1990s (Norby et al., 2005).
For the increased N experiment (+N), N deposition was abruptly increased by 50 kgN ha −1 yr −1 , which is roughly equivalent to what has been used in a number of forest N fertilisation trials (Thomas et al., 2013) and around 5-10 times higher than typical background N deposition (Zak et al., 2017).

Analytical framework
The response of the terrestrial productivity (and with it terrestrial C storage) to changes in the N cycle is in principle controlled by two components: (i) the net ecosystem balance of N, i.e. the difference between changes in ecosystem N inputs and N losses, which determines the change in the ecosystem N available for plant growth and immobilisation during litter and soil organic matter decomposition, and (ii) the ratio of carbon production per unit N availability, which can most effectively be described as the N-use efficiency of growth.
Because the individual processes and pools considered vary between the five models (Table 1), we use a simplified N budget to assess the annual change in the terrestrial N store ( N, including soil and plants): where N dep is the N deposition, BNF is the biological N fixation, and N loss is the N lost from gaseous, leaching, and other pathways, as declared by the models. This paradigm assumes that increased ecosystem N input from deposition or fixation enters the soil and then becomes available for plant uptake.
In a similar way, plant N uptake (N up ) could lead to reduced N losses, which would (assuming constant N inputs) result in an apparent increase in the ecosystem N capital. Note that crop fertilisation is not included here, as it is assumed to be equal in the three simulations. Whether and how this change in N capital affects plant growth is dependent on the magnitude of the change in plant N uptake, as well as the relationship between N up and NPP (whole-plant nitrogen-use efficiency, NUE; Zaehle et al., 2014a) where N up includes plant uptake of soil inorganic N of any origin, i.e. atmospheric deposition, fertilisation, decomposition of plant litter, or biological nitrogen fixation (BNF). NUE is the outcome of the product of tissue stoichiometry and fractional allocation of NPP to different tissue types and therefore varies with changes in the allocation fractions and tissue C : N ratio.

Observations for comparison
We compare the models to a range of observation-based metrics at global and regional scales, detailed in Table S2. Most of the numbers from the literature that we cite are based on Biogeosciences, 17, 5129-5148, 2020 https://doi.org/10.5194/bg-17-5129-2020 T. Davies-Barnard et al.: Nitrogen cycling in CMIP6 land surface models 5133 relatively small numbers of field studies upscaled or averaged to give an approximate global value with confidence intervals. No modification of spatial scale or averaging is done to values used, but where the CO 2 or N increase is specified, it is scaled to 200 ppm or 50 kg ha −1 yr −1 accordingly. While these upscaled values need to be interpreted with caution, in the absence of more robust comparators they are useful benchmarks that can provide real-world context in addition to field-scale comparisons and inter-model comparisons.
Where appropriate, comparisons are made at the climatedetermined region level (see Fig. S1 in the Supplement; Kottek et al., 2006).

Control run global C and N budgets
A range of pools and fluxes from the models compared to the closest comparable observation-based data show a good performance overall and emphasise similarities between the models at the global scale ( Fig. 1). For GPP, all the models compare well to the MTE data (Jung et al., 2011), and when the directly comparable time period is used (see Fig. S2), the models are all within the MTE range. The global GPP value is underlain by some regional variations between models (Figs. S2 and S3). Like GPP, the total ecosystem respiration (TER) is similar across all the models, and most of the models fall within the range of a top-down estimate by Ballantyne et al. (2017) (106 ± 12 GtC yr −1 ). However, the partitioning between the autotrophic and heterotrophic respiration differs (Fig. 1). Autotrophic respiration is overestimated in all the models (Luyssaert et al., 2007;Piao et al., 2010), while heterotrophic respiration is underestimated (Bond-Lamberty and Thomson, 2010). The heterotrophic value from Bond-Lamberty and Thomson (2010) was reduced by 33 % to account for root respiration in line with Bowden et al. (1993).
N inputs differ strongly between the models because of widely varying biological nitrogen fixation (BNF, Fig. 1). The other major input, N deposition, is a prescribed input with small variations resulting from differences in the landsea mask of the individual models. BNF, on the other hand, has a wide range among models. An upscaled meta-analysis of BNF covering the period of approximately 1990-2019 (Davies-Barnard and Friedlingstein, 2020) has a range of 52-141 TgN yr −1 and only one model is outside of that range. The three models with the highest BNF (JSBACH, CLM5, and JULES-ES) are three of the four models that use an NPPbased function (the fourth being CLM4.5). CLM5's processbased function uses a C cost of N acquisition where energy from NPP can produce N based on the work by . JULES-ES, JSBACH, and CLM4.5 use an empirical large-scale correlation with NPP (Cleveland et al., 1999). LPJ-GUESS, the lowest BNF model, also uses an empirical correlation from Cleveland et al. (1999), based on evapotranspiration rather than NPP. Thus, even BNF functions based on the same source (Cleveland et al., 1999) can have very different results (Wieder et al., 2015a), due to the large range of BNF functions within the source and differences in how they are implemented (Meyerholt et al., 2016). BNF dominates N input variability both because of a lack of process understanding to constrain model structures and the continued uncertainty in available observations.
Looking at the soil and vegetation C and N pools as well as the ratios between them, the models have a range of strengths and weaknesses, with no model falling within the observation-constrained range for all pools. However, due to variations in both the modelling and measurement of C and N within different soil depths, not too much emphasis should be placed on the pool comparisons shown in Fig. 1.

Modelled NPP responses to the +CO 2 experiment
The ensemble's global modelled response of NPP to +CO 2 concurs with a meta-analysis of NPP responses to +200 ppm CO 2 that suggests a positive response of 15.6±12.8 % (Song et al., 2019) (Table 2), with all models within that range. Other meta-analyses of productivity (for instance, aboveground woody biomass) changes associated with elevated CO 2 give higher ranges of response (Table 2). These other measures of productivity suggest a lower limit of around 12 %, which encompasses all but one of the models. However, models falling within the range of the observations may be indicative of biases and a lack of precision in the observational estimates rather than the fidelity with which the models can predict local and global response to elevated CO 2 .
CLM4.5 has a notably lower NPP response to +CO 2 than the other models ( Fig. 2), with the exception of areas where the absolute magnitude of NPP is very low and small absolute changes ( Fig. S4) already lead to large proportional changes. However, even in these regions, the absolute changes are consistently less than the other four models (Fig. S4). The low response in CLM4.5 is due to a lack of mechanisms to ameliorate N limitation when C supply increases, for instance via variable C : N ratios or increased BNF (as is the case for CLM5) (Fisher et al., 2018;Wieder et al., 2019). This strong limitation by the N cycle was a key reason why CESM and NorESM in CMIP5 had lower C uptakes in response to CO 2 compared to other carbon cycle ESMs .
Despite the seeming agreement of the NPP response to +CO 2 at the global scale, the regional patterns in response vary considerably for key biomes (Fig. 2). In high-latitude tundra areas, the +CO 2 response ranges from near zero (JULES-ES) via very low (CLM4.5, JSBACH, and LPJ-Guess) to high (CLM5). In most models, this region shows sparse vegetation cover and N availability, allowing for only a little increase in response to elevated CO 2 , whereas the increased BNF in CLM5 facilitates a response to increasing CO 2 levels. With the exception of JULES-ES, most models predict a large +CO 2 response in very dry ecosystems with marginal productivity.
The NPP response of the equatorial region overall (Table S3 and Fig. S1) to +CO 2 ranges from 5 % for CLM4.5 to 23 % for CLM5 and JSBACH. Looking at latitudinal averages (Fig. S4), we can see the overall patterns are consistent across most models, and while the percent change varies a lot, the absolute change in NPP shows considerable agreement between models, with the exception of CLM4.5. Model responses of NPP to +CO 2 in greater Amazonia, however, do not reach a consensus. Comparing the response in the Amazonia region with that of coastal regions of northern South America, the JSBACH response is lower, CLM5 and LPJ-GUESS higher, and JULES-ES and CLM4.5 are approximately the same. JSBACH's dip in +CO 2 NPP response at the Equator (compared to surrounding areas) can also be seen in the absolute values averaged by latitude (Fig. S4). The process responsible for this spatial pattern is currently unclear but may be associated with the strongly enhanced GPP simulated by the model for this region compared to observationderived estimates (Fig. S2).

Modelled NPP responses to the +N experiment
The response to +N in the models shows a binary distribution, with models exhibiting either a high (> 20 %) or low (< 3 %) response (Fig. 3) at the global scale. A meta-analysis of NPP responses to +50 kg N ha −1 yr −1 suggests a positive  response of 3 %-10.5 % (Song et al., 2019), but none of the models are within this range (Table 2). Other meta-analyses of productivity changes with increased N give higher ranges of response (7.5 %-35 %), encompassing three of the five models ( Table 2). As both a percent change and absolute change (see Fig. S5), JULES and JSBACH show much lower +N NPP response than the other models considered here. CLM4.5 has the highest response (24 %), on account of its high initial N limitation (Koven et al., 2013). The tundra biome +N response is high in CLM5 and JULES-ES and lower but present in LPJ-GUESS and CLM4.5 (Figs. 3 and S5). If low NPP is excluded, then the tundra mean response across models is 2 %-9 % (Table S3) and is much lower than the average of observations compiled by LeBauer and Treseder (2008) of 35 % (95 % confidence interval 12 %-64 %). There is a high response to +N in Africa and Australia in CLM4.5, CLM5, and LPJ-GUESS, despite aridity likely limiting increase in NPP in absolute, if not relative, terms but with insufficient observations to make meaningful comparisons. One area of agreement between the models is the lack of +N response of the Amazonian region (Fig. 3), which is consistent with observations which show just a 5 % +N response in tropical forests (Schulte-Uebbing and de Vries, 2018). However, when other tropical regions are included in the models, the +N NPP response rises to 17 %-20 % in LPJ-GUESS, CLM4.5, and CLM5, with JULES-ES and JSBACH remaining low (Table S3).

Comparison of NPP +N and +CO 2 responses
It might be anticipated that there would be a relationship between the +N and +CO 2 responses, as an ecosystem (model) that is less N limited could respond more strongly to increased atmospheric CO 2 . A lack of response to N fertilisation could indicate sufficient N supply and therefore a lacking constraint of N on the response of the vegetation to CO 2 , while a strong response to N fertilisation could indicate insufficient N supply and as a result a strong N limitation of the CO 2 response. We know that response to increased N supply is globally distributed (LeBauer and Treseder, 2008) and that C 3 plants, which make up the majority of vegetation worldwide, have a positive photosynthetic response to additional atmospheric CO 2 (Ainsworth and Long, 2005). However, there is evidence that the +CO 2 response would be limited by N availability (forest NPP response to additional atmospheric CO 2 is limited by N availability; Norby et al., 2010), and it is currently unknown whether +N would be similarly affected.
All the models are consistent with the hypothesis of either N or CO 2 fertilisation at grid cell level, but the effect does not necessarily scale to either the regional or global level. The prevalent grid cell level spatial trend is bimodal, with grid cells either having a strong sensitivity to +N or +CO 2 but not both (see Fig. 4). Comparing percent change emphasises the dichotomy of +N and +CO 2 effects, with most values clustered either near zero for +N or zero for +CO 2 , but Fig. S6 shows that there is no positive relationship or heterogeneous distribution in the absolute values either. The bias toward +CO 2 is clear for JSBACH and JULES-ES, with most values varying in +CO 2 sensitivity but not +N (Fig. 4, also seen in the absolute anomalies in Fig. S6). A slight tendency towards the reverse is true for CLM4.5, CLM5, and LPJ-GUESS, with more points having a strong +N response and a weaker +CO 2 response (Fig. 4). Altogether, LPJ-GUESS and CLM5 show the most areas with both +N and +CO 2 sensitivity. Wieder et al. (2019) found that there was a trade-off between +N impact and +CO 2 impact in CLM4, CLM4.5, and CLM5, and this seems to be true for our ensemble of models too.
The latitudinal distribution of response shows similarities across models, with high latitudes (shown in purple in Fig. 4) generally more +N sensitive and middle latitudes (red to orange in Fig. 4) more +CO 2 sensitive. While negative NPP values are present in both +N and +CO 2 simulations, they occur in different places, with negative NPP occurring in hot arid areas for +N and cold arid areas for +CO 2 (Figs. 2, 3, and 4). In hot arid areas, +N increases simulate GPP and plant growth but also plant respiration, which then exceed the additional productivity, giving a decrease in NPP. Such model behaviour has been noted before ; however, there is little evidence that such a process would occur in nature. The negative values in all models except CLM4.5 also appear to have a regional bias, with a small number of grid cells responding negatively to both +CO 2 and +N in CLM5, JSBACH, and JULES-ES in the subtropics and a larger number of negative values in the subtropics in LPJ-GUESS (Fig. 4). These arid areas appear to be not sensitive to +N nor +CO 2 , probably due to low water availability.
We can gain further insights by considering the relationship between responses to +CO 2 and +N by forest biome (Fig. 5). The ideal for the models is to be in the area where the observations for +N and +CO 2 intersect. Two of the models achieve this partially, JSBACH and CLM5, by having tightly clustered forest vegetation C (VegC) response to +N and forest NPP response to +CO 2 . The dichotomy between +N and +CO 2 NPP response is averaged out at this scale, and the models show little of the L-shaped relationship between the +N response and +CO 2 response seen at the grid cell level (Figs. 4 and 5).
According to collated N addition experiments, we would expect models to have biome-level variation in +N response (LeBauer and Treseder, 2008; Schulte-Uebbing and de Vries, 2018). Schulte-Uebbing and de Vries (2018) show that tropical forest +N VegC response is lowest and boreal and temperate forest response higher (Fig. 5). While LPJ-GUESS and CLM4.5 capture some variation between averaged biomes, none of the models have the biome responses in the correct order (Fig. 5). However, all the models except LPJ-GUESS tend toward a lower (tropical) +N response. LPJ-GUESS, however, is the only model to have the boreal +N response in the correct range. It is the boreal response that seems to be the main issue, as relative to both the temperate and tropical regions, most models show the boreal response as being lower, whereas most of the models have the correct relative +N response for the tropics and temperate regions. Therefore, although the global values of model response are acceptable, the relative spatial patterns show limitations in the reliability of all the models.

N budget responses to +N and +CO 2
The model responses in different components of the N budget reflect and affect their overall N sensitivity (Fig. 6). N inputs of BNF and N deposition and loss (we only consider the sum of leaching and gaseous loss so as to be consistent between models) are similar between all the models in the Control simulation (Fig. 6a). The uptake of N by vegetation varies more strongly between models, reflecting differing levels of N mineralisation and assumed N requirements for growth, as also reflected by the different amounts of C and N pools depicted in Fig. 1. Changes in the N budget components to +CO 2 and +N ( Fig. 6b and c) are not straightforwardly related to changes in productivity (Figs. 2 and 3). For instance, the weak response of NPP to +CO 2 in CLM4.5 would suggest only small changes in uptake compared to the other models (Figs. 2 and 6). However, the +CO 2 -induced changes in uptake in CLM4.5 are higher than that of LPJ-GUESS (Fig. 6b). Similarly, CLM5 has the largest increase in N balance for +CO 2 (Fig. 6b) amongst the models, but this does not correspond to a larger response of NPP (Fig. 2f) or uptake response to elevated CO 2 (Fig. 6b). Nevertheless, Fig. 6b reveals a number of important characteristics of the N cycle response to +CO 2 underlying the NPP response presented in Sect. 3.2. For all models except CLM5, which shows a strong response of BNF to elevated CO 2 , reduced N losses are an important reason for the increased N balance of the ecosystem, which facilitates an increase in NPP in the absence of changes in ecosystem stoichiometry. For all models except CLM5, plant N uptake under elevated CO 2 is more enhanced than the change in the N balance of the ecosystem, implying a net transfer of N from the soil to vegetation.
Conversely, the N uptake changes in JULES-ES and JS-BACH reflect their sensitivity of productivity to +N and +CO 2 (Figs. 2, 3, and 6). For JULES-ES, we can see that this is driven by changes in loss, particularly for +N, which leads to a much smaller increase in N balance in JULES-ES than the other models. In common with all the models, in JULES-ES the N loss term is a fixed fraction of the mineralisation flux and the soil N pool size. However, JSBACH has less than half the increase in N loss of JULES-ES in the +N simulation (Fig. 6c), low changes in BNF compared to other models (Fig. 7b), and almost no change in NUE (Fig. 7d). This suggests that in both JULES-ES and JSBACH there is effectively little unmet N demand in the Control scenario.
BNF responses to +CO 2 in the models differ in magnitude (Fig. 7a) and mostly are smaller than a meta-analysis of CO 2 manipulation suggests (Liang et al., 2016). Only responses of JULES-ES at the global scale and the boreal re- Figure 5. Average 1996-2005 model predictions of woody plant NPP responses to +CO 2 (y axis) and above-ground forest vegetation C pool size responses to nitrogen (N) addition (x axis) for each of the models (as labelled). Area outlined in yellow indicates synthesis of observed woody plant NPP responses to +CO 2 (Baig et al., 2015). Other coloured areas indicate biome-wise estimates of above-ground forest C change per added N (Schulte-Uebbing and de Vries, 2018). For +CO 2 , NPP is restricted to simulated vegetation with NPP > 0.2 kg C m −2 yr −1 to exclude non-forest areas. For +N, forest VegC in CLM5, CLM4.5, and LPJ-GUESS is taken from wood C and N, whereas all C and N is included for JULES-ES and JSBACH due to model output limitations. The biomes are allocated according to the Köppen-Geiger climate classification (Kottek et al., 2006). The lower limits for "Temperate" and "Boreal" +N are the same value. sponse of CLM5 are within the range of the meta-analysis of observations. CLM5 is a clear outlier, with a large increase in BNF. CLM5 takes a C cost approach to BNF, which is different to the other models (Table 1), and BNF can be acquired for a relatively fixed amount of C (Houlton et al., 2008); thus, when C availability increases under +CO 2 , the BNF in CLM5 increases. Fisher et al. (2018) conducted a parameter sensitivity analysis of both +CO 2 and +N fertilisation, which illustrates that both responses are sensitive to the maximum fraction of C from NPP which is available for fixation (a proxy for the fraction of N-fixing plants and their effi- Figure 6. Global averaged 1996-2005 biological nitrogen fixation (BNF), N deposition, N loss via gases and leaching, the balance of those three inputs/losses, and the plant N uptake of the models. The top panel represents the Control scenario, and the second and third panels represent the response to +CO 2 and +N perturbations (see Methods section). Note that the y-axis scale is 4× smaller for +CO 2 response than the Control or +N response. All changes are relative to a nominal N pool in the terrestrial biosphere. Gas and Leaching loss is therefore shown as a negative (a loss from that N pool) in the Control. In the +CO 2 and +N responses, a positive change in "Gas and leach" indicates less losses than in the Control scenario, and a negative change losses more than the Control. ciency). However, the correct parameterisation of this fraction of C available for fixation is not well known and further field studies are required. The BNF +CO 2 response in the other four models is determined by their simple empirical BNF equations (see Table 1) based on NPP or evapotranspiration. However, recent analysis suggests that simple empirical relationships cannot represent BNF well (Davies- . The model BNF responses to +N show one of two responses: a small increase in JULES-ES, CLM4.5, and JSBACH or a large decrease in CLM5 and LPJ-GUESS (Fig. 7b). The latter models capture the correct BNF sign of response to +N of a decrease according to the meta-analysis of Zheng et al. (2019), though the amplitude is too large. The former models estimate BNF as a function of NPP resulting in increased BNF whatever the source of the additional NPP is and even when there is sufficient N. Observational evidence  shows that BNF reduces when N is supplied from another source, and it is understood that this is because facultative (able to modulate) BNF reduces and obligate BNF is out-competed (Menge et al., 2009). Overall, there is little evidence for any of the BNF functions performing well, primarily due to a lack of robust model parameterisations and parameter values.
The NUE responses allow for comparison between models, though comparisons with observations are limited by a lack of field studies. All models have an increase in NUE with +CO 2 in line with the current theory of Walker et al. (2015), with the exception of JULES-ES in the boreal region (Fig. 7c). It is unclear why the boreal region is responding differently to both other regions in JULES-ES and other models, but the boreal region reduction in NUE under +CO 2 likely indicates excess N from mineralisation, possibly triggered by the combination of soil warming and increased atmospheric CO 2 . CLM4.5 has low NUE response to +CO 2 due to fixed C : N ratios, which allow for little change in NUE. The other models allow for either more allocation to wood or flexible C : N that results in the larger increases of NUE.
There is regional variation in model NUE responses to +N between biomes, but all the models in our ensemble reduce NUE in response to +N (Fig. 7d). CLM5 and LPJ-GUESS are distinct in their larger NUE response to +N compared to the other models but do not share the same geographical spread of response. There is little consistency between models as to which regions have the largest change in NUE. CLM5 has the largest NUE change in the temperate region, whereas in JULES it occurs in the boreal region. No empirical measurements are currently available for NUE response to +N. On the basis that scarcity encourages a more frugal use of scarce a resource, a hypothesis could be that NUE could decrease with increased N availability, as the models show. However, water-use efficiency suggests an alternative hypothesis, as it tends to reduce during drought (Yu et al., 2017). Overall, the large variations in signal and sign of BNF and NUE responses to +N treatment between models suggest that there is considerable uncertainty in our understanding.

Discussion
In this paper, we investigated the performance of five N-enabled land surface models that are part of currentgeneration earth system models used in the framework of CMIP6 (Eyring et al., 2016). These new N-enabled land surface models in CMIP6 reproduce key global carbon cycle metrics. Despite the importance of N availability for regional productivity, there is large and unconstrained uncertainty in the magnitude of the global and regional N fluxes (Fig. 1). We have focused on three general components of Nenabled models that affect the plant N uptake and eventual productivity: N inputs via BNF, NUE, and the N losses. We find that all three show considerable heterogeneity of response between models. Previous studies suggest that stoichiometric controls and the processing of soil organic matter are important for a realistic +CO 2 response (Zaehle et al., 2014a). These are essentially contributory factors to NUE, where we find large variation between models (Fig. 7). The lack of well-constrained observations for global and biomelevel NUE and N loss responses implies that these areas need more work. N loss is particularly challenging, as there are multiple pathways (leaching, flooding, gaseous loss, fire, land use change, etc.) and forms (N 2 O, N 2 , etc.) of loss, and each model represents these in different ways. More observational studies and syntheses of existing observations are needed to quantify the N cycle in different biomes. In particular, better constraints are needed for the N cycle response to perturbations.
All the models show a global average productivity response to increased atmospheric CO 2 commensurate with those recorded in field studies. However, the regional responses and mechanisms behind this response vary widely, resulting from the interaction of the instantaneous physiological response to elevated CO 2 (e.g. Ainsworth and Long, 2005, which is embedded in all five models; see Rogers et al., 2017), with limitations imposed by temperature, water, light, and nitrogen, as well as the response time of vegetation dynamics. For instance, in LPJ-GUESS and CLM5 the response to elevated CO 2 in semi-arid tropical ecosystems is smaller than that of temperate ecosystems or other models. This suggests a combined effect of water and nitrogen limitation on soil organic matter decomposition in these models and thus low nitrogen availability that is not compensated for by changes in BNF. Similarly, tundra and arctic responses to elevated CO 2 vary widely across the models and are associated with the representation of BNF. This large regional variance highlights the need for more comprehensive observational data to constrain responses to elevated CO 2 , partic-Biogeosciences, 17, 5129-5148, 2020 https://doi.org/10.5194/bg-17-5129-2020 ularly in under-sampled regions such as the high arctic and tropical semi-arid regions (Song et al., 2019). The growth response to N addition across models is more varied. Two of the five models (JULES-ES and JSBACH) have little productivity response to increased N availability, indicating that they do not have any significant limitation of the C cycle by N availability (Fig. 3). There are four substantial similarities between these two models (Table 1): (i) the use of NPP to determine BNF; (ii) a direct control of NPP by N availability, whereas photosynthetic C uptake (GPP) is not directly affected by N (Goll et al., 2017;Wiltshire et al., 2020); (iii) the use of dynamic (as opposed to prescribed) vegetation, where vegetation cover is determined by the climate input to the model; and (iv) the assumption that N availability in pre-industrial times was sufficient to sustain the C cycle everywhere on land, because observed present-day N limitation is a result of anthropogenic changes, most notably increased CO 2 (Goll et al., 2017).
The hypothesis of no pre-industrial N limitation is based on the assumption that, prior to industrial times, the conditions of natural terrestrial ecosystems were stable for sufficient time to permit any lack of N availability to be filled by BNF (Thomas et al., 2015). Consequently, the pre-industrial Control run with both N and C is very similar to the C-cycleonly version, and a C equilibrium is reached before a N equilibrium. The disjoint between the C and N equilibriums may lead to varying levels of simulated N availability and may affect the model responses to perturbations. While there is evidence for wide-spread (co-)limitation of NPP in recent decades (LeBauer and Treseder, 2008;Song et al., 2019;Vitousek and Howarth, 1991), there is insufficient data to test the hypothesis of no pre-industrial N limitation. A summary by Thomas et al. (2015) suggests reasons why pre-industrial productivity of terrestrial ecosystems was affected by ecosystem N availability, e.g. the presence of unavoidable losses to denitrification or the competitive exclusion of N-fixing species as ecosystems mature. The inability of JULES-ES and JSBACH, when initialised in the assumption that preindustrial N availability does not limit vegetation growth, to simulate observed N addition responses comparable to models without this assumption suggests that this may be an important component of the N cycle constraint on the global C cycle. No pre-industrial N limitation also drives other model decisions (such as N limitation not being incorporated into the GPP equation; see Table 1), which may further contribute to the models being under-sensitive to N compared to observations.
The models mostly represent changes in productivity from +N in high-latitude Northern Hemisphere regions less well than other parts of the world as a percentage, as covered in the results in Sect. 3.3, Fig. 5, and Table S3. While the low NPP of these regions makes them more likely to have high percentage increases, the mean polar +N response across the models is 8 %-59 %, which is broadly in the range of a meta-analysis of observations (12 %-64 %) (LeBauer and Treseder, 2008). But looking at the maps of response (Fig. 3), the model response is either too low or too high compared to the aforementioned observational range. High-latitude tundra is an important but difficult to model biome because of the potential for release of methane (Nauta et al., 2015), permafrost C and N release (Anisimov, 2007;Burke et al., 2012;O'Connor et al., 2010), albedo changes with vegetation expansion (Myers-Smith et al., 2011), and the difficulty in representing large amounts of C stored in soil. This complexity in C and N cycles is not always well understood or represented in models and therefore could limit the ability of models to provide accurate responses to perturbation. A fully integrated model that accounts correctly for all of these is not yet possible but is necessary to reduce uncertainties.
The greater Amazon basin is a critical area of interest for the future of the terrestrial carbon balance under climate change. Our simulations show that, for most models, NPP in this area increases with +CO 2 , but all the models find a small or no change in NPP with +N. These regions are thought to be phosphorus limited rather than N limited, due to depletion through weathering over long periods. This result supports the idea that favourable climate conditions cause a high leaf area index (LAI) in this part of the tropics, such that there is little margin for increased NPP from +N (Fisher et al., 2018). For +CO 2 , there is the potential for increased NPP because of either increase in NUE or decreases in N losses, giving productivity increase without an increase in LAI. Reducing the uncertainty in NPP response to +CO 2 is important, as the moist tropics represent a significant proportion of the world's above-ground biomass; therefore, the size of the overall terrestrial sink will be influenced by the CO 2 uptake in this biome.
This experimental setup considers +N and +CO 2 separately but not the combined effects. It cannot be assumed that the effect of both +N and +CO 2 on productivity are linearly additive. It has been shown elsewhere that LPJ-GUESS  and BIOME-BGC (Churkina et al., 2009) have a significant non-linear (synergetic) term between CO 2 and N deposition. An assessment of the combined effects of +N and +CO 2 may show a significantly different picture of model performance.
Part of the uncertainty in the models comes from the reanalysis climate dataset used to drive the models. CRU-NCEP was chosen for the good spatial and temporal coverage, but some biases exist in the data compared to climatologies such as WATCH (Weedon et al., 2011). Offline simulations driven by low forcing frequency (6 hourly) CRU-NCEP data significantly overestimate evapotranspiration in regions with convective rainfall types and thereby could affect stomatal conductance and photosynthesis (Fan et al., 2019). Responses to +N and +CO 2 may partially be shaped by other limiting factors such as water availability, which will be handled differently between models, limiting the insight into the exact processes that control model responses to change. This does not affect all the models equally, as some are known to be sensitive to the driving climatology. JSBACH, JULES-ES, and LPJ-GUESS may be particularly strongly affected due to their dynamic vegetation. Lawrence et al. (2019) show that CLM5 corresponds best to benchmarks with the GSWP3 forcing dataset (van den Hurk et al., 2016), and work with JULES shows that climate forcing is the biggest cause of variance of those considered .
As well as uncertainty in the models, the observational data also have uncertainties and limitations. Global benchmarks are approximate measures, as multifaceted process mechanics are integrated over large domains and generalised, e.g. over climate zones that are inherently variable. Of the limited global or regional observations available, many use interpolation or proxies such as satellite data to upscale relatively small amounts of direct observational data. The perturbed responses may also have uncertainties beyond the spread of the observed responses because of the small observation basis and potential biases in the geographical sampling. Therefore, they may suffer from leverage points and skew the data towards more accessible, higher-income, or higher-population areas, such as western Europe, which are not representative of where models are impacted most at the global scale. One of the +N global responses cited is based on 126 values from LeBauer and Treseder (2008) but may overestimate the global response by including high responses from young, tropical soils. The NPP response to +CO 2 response for woody plants total above-ground biomass (Fig. 5) is based on just 16 experiments (Baig et al., 2015), making the upscaling to the biome scale less reliable than if more data were available. These meta-analyses combine measurements from a range of time periods and places, and different conditions (e.g. gradual or instantaneous perturbations) and thus models run at a global scale cannot be expected to be entirely consistent. Hence statements about the marginal issues of model accuracy are unlikely to be robust as further observational constraints may alter the perspective.

Conclusions
This is the first systematic comparison of the responses to increased N (+N) and CO 2 (+CO 2 ) in LSMs with terrestrial N cycles contributing to CMIP6. The five models considered here yield fair overall agreement with global and tropical observations but are less robust in high-latitude regions.
The models are not equally sensitive to either +CO 2 or +N, with individual grid cells tending to respond to either +N or +CO 2 . However, at the regional and global scales this pattern is averaged away and there is little correlation. Within this ensemble there is clear distinction between models that show strong N limitation, e.g. CLM4.5, which has a low NPP response to +CO 2 , and models that show very weak N limitation, e.g. JULES-ES and JSBACH, which have a low NPP response to +N. The two models with intermediate N limitation (CLM5 and LPJ-GUESS) capture the global-scale response to +CO 2 and +N reasonably well. However, although CLM5 performs well by many metrics, it is an outlier compared to other models or observations as its BNF and the NUE response to CO 2 appear to be larger than supported by observations. Similarly, LPJ-GUESS captures NPP responses to +CO 2 and +N well at the global level but overestimates the vegetation C response to +N in forested tropical and temperate biomes.
The model initialisation with or without the assumption of sufficient N in pre-industrial times is a key determinant of the differences between the models. The presence of N limitation before the rise of atmospheric CO 2 levels is an important and challenging question to resolve. While further modern constraints on +N response may inform which approach is more realistic, understanding from reconstructions or other data sources could help resolve this question.
The wide range of empirical or semi-mechanistic representations for key processes such as BNF, NUE, and N loss shows how important further process understanding is for many parts of the N cycle. These parts of the models are influential, but because N cycle components are a recent addition to LSMs, fewer data are available to evaluate N cycle processes than for C cycle components. The addition of this representation of N limitation on C uptake is a big step forward in this generation of models, addressing the biggest systematic bias in future projections of land C sinks. However, it is now crucial to better constrain their behaviour at regional and process levels. Consequently, better observational constraints are required to understand whether models are working appropriately, even when the process understanding is improved.
Data availability. The research data supporting this publication are openly available from the University of Exeter's institutional repository at: https://doi.org/10.24378/exe.2624 .
Author contributions. SZ designed and coordinated the model simulations. The model simulations were run by VB, TDB, YF, RAF, HL, DP, BS, DW, and AJW. JM, TDB, and SZ analysed the data and made the figures. TDB wrote the paper with key contributions from all the authors: SZ, JM, PF, VB, YF, RAF, CDJ, HL, DP, BS, DW, and AJW.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. Authors acknowledge the work by Johannes Meyerholt (deceased 2020) that formed the basis of this paper.
Authors acknowledge funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no.  Review statement. This paper was edited by Alexey V. Eliseev and reviewed by Vivek Arora, Peter Thornton, and Joshua Fisher.