Improved representation of phosphorus exchange on soil mineral surfaces reduces estimates of phosphorus limitation in temperate forest ecosystems

. Phosphorus (P) availability affects the response of terrestrial ecosystems to environmental and climate change (e.g. elevated CO 2 ), yet the magnitude of this effect remains uncertain. This uncertainty arises mainly from a lack of quantitative 15 understanding of the soil biological and geochemical P cycling processes, particularly the P exchange with soil mineral surfaces, which is often described by a Langmuir sorption isotherm. We first conducted a literature review on P sorption experiments and terrestrial biosphere models (TBMs) using Langmuir isotherm. We then developed a new algorithm to describe the inorganic P exchange between soil solution and soil matrix based on the double-surface Langmuir isotherm and extracted empirical equations to calculate the sorption capacity and Langmuir 20 coefficient. We finally tested the conventional and new models of P sorption at five beech forest sites in Germany along a soil P stock gradient using the QUINCY (QUantifying Interactions between terrestrial Nutrient CYcles and the climate system) TBM. We found that the conventional (single-surface) Langmuir isotherm approach in most TBMs largely differed from

Abstract. Phosphorus (P) availability affects the response of terrestrial ecosystems to environmental and climate change (e.g., elevated CO 2 ), yet the magnitude of this effect remains uncertain. This uncertainty arises mainly from a lack of quantitative understanding of the soil biological and geochemical P cycling processes, particularly the P exchange with soil mineral surfaces, which is often described by a Langmuir sorption isotherm.
We first conducted a literature review on P sorption experiments and terrestrial biosphere models (TBMs) using a Langmuir isotherm. We then developed a new algorithm to describe the inorganic P exchange between soil solution and soil matrix based on the double-surface Langmuir isotherm and extracted empirical equations to calculate the sorption capacity and Langmuir coefficient. We finally tested the conventional and new models of P sorption at five beech forest sites in Germany along a soil P stock gradient using the QUINCY (QUantifying Interactions between terrestrial Nutrient CYcles and the climate system) TBM.
We found that the conventional (single-surface) Langmuir isotherm approach in most TBMs largely differed from P sorption experiments regarding the sorption capacities and Langmuir coefficients, and it simulated an overly low soil P-buffering capacity. Conversely, the double-surface Lang-muir isotherm approach adequately reproduced the observed patterns of soil inorganic P pools. The better representation of inorganic P cycling using the double-surface Langmuir approach also improved simulated foliar N and P concentrations as well as the patterns of gross primary production and vegetation carbon across the soil P gradient. The novel model generally reduces the estimates of P limitation compared with the conventional model, particularly at the low-P site, as the model constraint of slow inorganic P exchange on plant productivity is reduced.

Introduction
Nutrient availability is one of the key factors affecting the productivity of terrestrial ecosystems and their future carbon (C) balance (Fernández-Martínez et al., 2014;Wieder et al., 2015). Although nitrogen (N) is the main constraint of plant biomass responses to an elevated CO 2 (eCO 2 ) concentration in many terrestrial ecosystems, phosphorus (P) availability likely constrains the biomass responses to eCO 2 in major global biomes (Du et al., 2020;Elser et al., 2007;Lebauer and Treseder, 2008;Terrer et al., 2019). For instance, the tropical forests and forests grown on old soils are known to be strongly limited by P availability, and they might not be able to sequester additional C in the future as the CO 2 concentration continues to increase (Hubau et al., 2020;Jiang et al., 2020). While temperate and boreal forests are generally considered to be N limited (Lebauer and Treseder, 2008), recent studies have shown that a decreasing P nutritional status is concomitant with increasing atmospheric CO 2 (Penuelas et al., 2020;Jonard et al., 2015). These findings highlight the importance of representing C-P interactions in terrestrial biosphere models (TBMs) and including P cycle processes in the future global C balance predictions.
There has been a continuous effort to include P cycling processes into TBMs in the past decade (Goll et al., 2012(Goll et al., , 2017Sun et al., 2021;Thum et al., 2019;Yang et al., 2014;Zhu et al., 2019;Wang et al., 2010;Yu et al., 2018). Many current TBMs employ the scheme (Fig. 1a) developed by Wang et al. (2007) to describe soil geochemical processes. This model considers the soil inorganic P (Pi) as soluble Pi (P sol ), labile Pi (P lab ), sorbed Pi (P sorb ), occluded Pi (P ocl ), and primary Pi (P primary ), with few exceptions where labile and sorbed Pi pool are grouped into one pool (Zhu et al., 2016. The exchange between soil solution and soil matrix is described with a Langmuir adsorption isotherm assuming that P sol quickly exchanges with P lab on the mineral surface. The P lab pool also slowly exchanges with P sorb at a linear rate regardless of P sol concentrations, but this relationship is parameterized very differently across TBMs (Helfenstein et al., 2020).
Many modeling studies have emphasized the significance of biological P processes (Fleischer et al., 2019;Jonard et al., 2010;Yu et al., 2018;Wang et al., 2010) and underestimated the role of geochemical P processes. In these models, organic P recycling processes are the major fluxes, whereas the geochemical P fluxes are small (Sun et al., 2021). Particularly, the effect of (ad)sorption kinetics has seldom been discussed in previous modeling studies (Fleischer et al., 2019;Yang et al., 2014), although they are known to directly and strongly regulate P sol and P lab , thereby affecting P bioavailability (Frossard et al., 2000;Shen et al., 2011).
In this paper, we first conducted an extensive literature review of important soil characteristics affecting soil phosphorus sorption kinetics and then developed a model concept to reconcile measured P stocks with model simulations. In this new model, we applied the "two-surface" modification (Holford and Mattingly, 1976;McGechan and Lewis, 2002) to the conventional single-surface Langmuir isotherm to formulate a novel algorithm for Pi exchange with mineral surfaces, namely a double-surface Langmuir isotherm. We hypothesized that both P lab and P sorb exchange with P sol in the new model (Fig. 1b), following evidence from isotopic studies (Helfenstein et al., 2020;Frossard et al., 2011). We then compared and evaluated the performance of the novel and conventional models with measured soil Pi pools as well as foliar N and P concentrations for a gradient of soil P stocks (164-904 g P m −2 ) and availability in a similar cli-mate (mean annual temperature, MAT, 4.5-8 • C) and under similar vegetation conditions (mature German beech forests, 120-140 years). Lastly, we tested the sensitivity of the alternative Pi exchange schemes on ecosystem P and C cycling to changes in P cycling parameters, and we also tested their responses to changes in P availability (P fertilization), C availability (CO 2 fertilization), and changes in both P and C availability.

Literature review on the Langmuir isotherm
The two parameters in the Langmuir isotherm, the maximum sorption capacity (S max ) and the Langmuir coefficient (K m ), are statistically fitted from measured adsorption curves of batch sorption experiments (Barrow, 1978). Following the scheme of Lloyd et al. (2001), most TBMs adapted the constant-equilibrium assumption to calculate a partition coefficient, k p (Eq. 2), which is the ratio of change in the P sol concentration to the change in available P (P sol and P lab ). The partition coefficient affects the rate of P sol released from the P lab pool as P sol is consumed and, thus, eventually affects the soil P supply to plants and microbes.
To better understand the characteristics of phosphate exchange between solution and the soil mineral surface, we conducted a literature review of batch phosphate sorption experiments that describe the P sorption curves using a Langmuir isotherm, and we converted the lab-fitted Langmuir parameters (Q max , mg P per kg soil, and K L , L per mg P; see Sect. S2 in the Supplement) to the parameters used in TBMs, sorption capacity and the Langmuir coefficient (S max and K m , g P m −2 ; Eq. 1). From the batch phosphate sorption experiments found in the literature, we calculated the partition coefficient (k p ) following Eq. (2). Similarly, we reviewed the values of S max and K m in modeling studies and calculated k p accordingly. To showcase the difference in the Langmuir isotherm parameter values in TBMs and experiments as well as the differences between single-and double-surface Langmuir isotherms, we compared the exchangeable soil Pi curves of different TBMs' and batch experiments' data to the doublesurface Langmuir isotherm. We also simulated desorption experiments with the single-and double-surface Langmuir isotherms to demonstrate their different responses to P removal.
L. Yu et al.: Improved representation of phosphorus exchange 59 2.2 The QUINCY model QUINCY (QUantifying Interactions between terrestrial Nutrient CYcles and the climate system) is a terrestrial biosphere model of coupled C, N, and P cycles as well as energy and water processes . The model represents the growth of vegetation and the turnover of litter and soil organic matter at 30 min timescales, coupled with the calculation of the terrestrial energy and water budgets. Vegetation is represented by average individuals of plant functional types (here, a temperate broadleaved tree; see Sect. 2.3). Gross carbon uptake as well as leaf area and vegetation biomass and structure are directly influenced by nutrient availability through their effect on photosynthesis, tissue growth, allocation, and mortality. The model explicitly considers depth profiles (discretized into 15 layers with a default total soil column depth of 9.5 m) of soil temperature, moisture, and biogeochemical pools, representing litter and soil organic matter compartments, as well as inorganic forms of N and P. Vertical transport processes include diffusion and bioturbation. The phosphate exchange between soil solution and soil matrix is described using a conventional single-surface Langmuir isotherm, which is implemented the same way as in other TBMs. Different from other TBMs, QUINCY estimates S max and K m using soil texture and organic matter content .
To test our hypothesis that both P lab and P sorb exchange with P sol , we modified the QUINCY Pi pool structure to describe the Pi exchange between solution and the soil matrix ( Fig. 1b), using a double-surface Langmuir isotherm (Eq. 3) (Holford and Mattingly, 1976;McGechan and Lewis, 2002).
where hlp = S max,1 K m,1 The first sorption sites are responsible for the fast exchange (Eq. 3a) between P sol and P lab and have much lower bonding strength than the second sorption sites that are responsible for the slower exchange (Eq. 3b) between P sol and P sorb . We estimated two soil sorption maxima (S max,1 and S max,2 ; Eq. 3c) and calculated two Langmuir coefficients (K m,1 and K m,2 ; Eqs. 3d and 3e) based on batch experiment data from the literature review, assuming both single-and double-surface isotherms can be fitted with batch experiment data. The detailed derivation is described in the Supplement (Sect. S3).

Sites and data
We performed analysis at five mature beech forest stands in Germany: Bad Brückenau (BBR), Mitterfels (MTF), Vessertal (VES), Conventwald (COM), and Lüß (LUE) ( Table 1, based on Table 1 from Lang et al., 2017). Total soil P stocks (g P m −2 , up to 1 m depth) decrease strongly along the following gradient: BBR (904) > MTF (678) > VES (464) > COM (231) > LUE (164). Soil was sampled up to 1 m depth at each site, with layer depths of 5-10 cm, for the measurement of total C, N, organic and inorganic P, and other physiochemical properties, such as soil texture, pH, and oxalate-extractable Fe and Al. Modified Hedley fractionations (Tiessen and Moir, 2008) were conducted on soils from all depths for the measurements of labile Pi (P extractable with anionic resin and Pi extractable with 0.5 M NaHCO 3 ), sorbed Pi (0.1 M NaOH extractable Pi), occluded Pi (P residual, acid digestion), and primary P (1 M HCl extractable P). Beech leaves were sampled in July and August from five trees for the measurement of leaf N and P concentrations.

Model setup
To compare the performance of different phosphate exchange schemes, we applied the conventional single-surface Langmuir isotherm here (siLang; Fig. 1a) as well as the novel double-surface Langmuir isotherm (dbLang; Fig. 1b) as model variants. To further test the causes of differences between these model variants, we introduced a Control simulation, which serves as a reference run without P limitation, and a four-pool simulation (4pool; Fig. 1c), which serves as a special case of siLang that does not include slow Pi exchange. All four models employed Eq. (3) to calculate the S max and K m in the Langmuir isotherms. In siLang and 4pool, total sorption capacity (S max ; Eq. 3c) only refers to P lab (or P exchangeable ), whereas the two sorption maxima (S max,1 and S max,2 ) refer to P lab and P sorb , respectively, in dbLang. In the Control model, P sol was kept at concentrations not limiting plant uptake or soil organic matter (SOM) decomposition. All models were initialized with a 15-layer soil column of 9.5 m which has a decreasing SOM content as soil depth increases . As for the inorganic P pools, the initial values of the top 1 m of soil were prescribed from the Hedley fractionation measurements and extrapolated to the deeper soil assuming an increased fraction of primary P and a decreasing fraction of exchangeable soil Pi pools (P lab and P sorb ). The initialization of soil inorganic P pools deeper than 1 m is described in the Supplement (Sect. S4).  (siLang), where only the labile P pool (P lab ) is exchanging P with the soil solution (P sol ); (b) a double-surface Langmuir model (dbLang), where both P lab and P sorb are exchanging P with the soil solution but with different Langmuir parameters; and (c) a Langmuir model with only four inorganic P pools (4pool), where P lab and P sorb are combined to one pool (P exchangeable ) to which P sol can get adsorbed via Langmuir sorption.

Model experiment protocol
The models were spun up for 500 years with meteorology and other atmospheric forcing (atmospheric CO 2 as well as N and P deposition), which were randomly drawn from the years from 1901 to 1930. During the model spin-up, the P cycle was simulated dynamically, but the more stable Pi pools, i.e., P ocl , and P primary , were kept constant to ensure that all of the models initialized at the same field P status. After spinup, all models were run from 1901 to 2015 using the annual values for atmospheric CO 2 , N (Lamarque et al., 2010(Lamarque et al., , 2011 and P (Brahney et al., 2015;Chien et al., 2016) deposition, and the meteorology (Viovy, 2018) of the respective year. The C : N and C : P ratios of the slow SOM pool were cali-brated per site per soil depth to match the measured soil C : N and C : P ratios so that the simulated SOM cycling adequately represented the measured site condition. All of the other parameter values were either taken from Thum et al. (2019) or modified based on the development in this study, except for the maximum biological N fixation rate, which is set to be 2.5 kg N ha −1 yr −1 for all of the study sites.

Model evaluation
Trend analyses were carried out with the Mann-Kendall test (M-K test) of the Kendall R package (McLeod, 2011). In the Mann-Kendall test, the tau (τ ) value varies between −1 and 1, where −1 represents a decreasing trend and 1 repre-sents an increasing trend. The modeled soil profile against the measured soil profile was evaluated with a normalized rootmean-square ratio term, K nrmsr , which is modified to represent the average proportions between modeled and measured values (Yu et al., 2020b).
K i is the variable representing the ratio between simulated and measured values (in volumetric units) at the measured ith layer. A paired t test was conducted on the K nrmsr of all study sites between siLang and dbLang to verify if one model was statistically better than the other.

Sensitivity analysis
To evaluate the response of alternative Langmuir isotherms to changes in P cycling processes, we further tested the sensitivity of the siLang and dbLang models to the P cycling parameterization at one low-P site, Conventwald (COM), and one high-P site, Mitterfels (MTF), using a standard Latin hypercube design (LHS; Saltelli et al., 2004). We selected 16 parameters that directly control flux rates of inorganic or organic P cycling processes and varied each parameter by ± 20 % of its default value (Table S2) using LHS sampling of a uniform distribution in order to form a set of 1000 LHS samples. The model outputs were evaluated in terms of gross primary productivity (GPP), foliar N and P content, vegetation C stock, soil organic carbon (SOC) content, total soil organic P (Po) and inorganic P, and the ratio between P lab and exchangeable Pi. We measured parameter importance as the rank-transformed partial correlation coefficients (RPCCs) to account for potential nonlinearities in the association between model parameters and output (Zaehle et al., 2005;Saltelli et al., 2004).

Simulated CO 2 and P fertilization experiments
To test the effects of phosphate exchange schemes on environmental changes, we conducted a CO 2 fertilization model experiment, a P fertilization model experiment, and a CO 2 -P (CP) fertilization model experiment using the siLang, dbLang, and 4pool models at each study site.
In the CO 2 fertilization experiment, the atmospheric CO 2 concentration was increased by 200 ppm from 2006 to 2015. In the P fertilization experiment, 50 kg ha −1 KH 2 PO 4 , i.e., 1139.7 mg P m −2 , was added once to the soil as soluble phosphate on 16 September 2006.

Langmuir isotherm parameters in batch sorption experiments and TBMs
In batch sorption experiments, the maximum soil P sorption capacity S max ranges from 187 to 829 g P m −2 (25th quantile and 75th quantile values, median 390 g P m −2 ) and the Langmuir coefficient K m ranges from 0.21 to 4.5 g P m −2 (median 0.93 g P m −2 ); therefore, the calculated partition coefficient k p varies between 0.005 and 0.022 (median 0.01; Table 2, Fig. S1). Regarding the parameterization of S max and K m in TBMs, a few studies have directly retrieved values from sorption experiments (Wang et al., 2007;Yang et al., 2014), although most modeling studies estimate S max and K m based on soil types (Wang et al., 2010;Goll et al., 2012;Zhu et al., 2019) or soil texture . The two studies using lab-derived Langmuir parameters fitted well in the range of experimental values; however, for those modeling studies that estimate Langmuir parameters, only the QUINCY model  produced reasonable Langmuir parameters and k p values, while most other TBMs greatly overestimated K m , thereby generating much higher k p values than batch experiments (Table 2).
In Fig. 2a, it is also shown that most TBMs are at or below the lower range of curves from batch sorption experiment data (BED) except the two isotherms from QUINCY that are close to the BED-median curve. All of the other TBMs either have very low S max or overly high K m , which can both lead to flattened soil Pi curves, indicating that very low amounts of P are stored as exchangeable Pi in the soil, even at high P sol concentrations.
However, the responses to P removal are noticeably different between the single-and double-surface Langmuir models (siLang and dbLang) given the similar soil Pi curves (Fig. 2b). The P sol in siLang almost reached zero after a 30 d desorption experiment, whereas dbLang still maintained a P sol level of 0.2 g m −2 , suggesting a much stronger buffering capacity in dbLang compared with siLang. This is mainly because the replenishment of P sol in siLang is strongly limited by the rate of desorption from P sorb to P lab , but P sol directly exchanges with P sorb when the P is removed in dbLang.

Simulated and measured ecosystem properties
The siLang, dbLang, and 4pool models can adequately reproduce the measured SOC content as well as the SOM C : N and C : P ratios along soil profiles at all of the study sites after site-and depth-specific calibration (Fig. 3a, b, c; Table S1). The performance of the three models did not differ much regarding the simulation of SOM profiles due to our calibration strategy (Tables 2, S1). However, the novel dbLang model better reproduced the ratio between P lab and exchangeable Pi (Lab-to-Exchangeable P) than siLang (K nrmsr improvement of 0.130 ± 0.077, p<0.05) ( Table 3). The improve-  Figure 2. (a) Exchangeable soil inorganic P (P lab and P sorb ) curves based on different terrestrial biosphere model (TBM) parameters and batch experiment data (BED). In TBMs, exchangeable soil Pi is defined as the sum of P lab and P sorb . In BED, soil exchangeable Pi only refers to P lab in Eq. (1) due to a destructive experimental procedure that mobilizes the soil Pi well (Barrow and Shaw, 1979). For the conventional Langmuir model (siLang), P sorb is calculated as 9 : 8 (global average ratio between P sorb and P lab , reported in Yang et al., 2013) of P lab at all soluble P concentrations; for the double-surface Langmuir model (dbLang), P sorb is calculated following Eq. (3b). (b) Simulated desorption curves for single-and double-surface Langmuir isotherms. The two desorption curves start with the same exchangeable soil Pi (152.2 g P m −2 , P lab + P sorb ) and soluble Pi (0.769 g P m −2 , P sol ) content, and the same amount of P (2.5 g P m −2 ) is removed from soil solution every day. Both Langmuir isotherms are assumed to be in equilibrium at a daily time step. Table 3. Results for the paired t test between the normalized rootmean-square ratios (K nrmsr ; Table S1) of the siLang and dbLang models of measured soil profile properties at all study sites. The null hypothesis is that dbLang performs no differently than siLang (i.e., dbLang has the same K nrmsr values as siLang), and the hypothesis is rejected when the p value is smaller than 0.05 (values in bold with an asterisk). for Control, dbLang, siLang, and 4pool, respectively) were within the average range of measured values (24.32± 1.43 mg N g −1 DW). However, the simulated decreasing trend in the foliar N content (M-K test, τ = −0.6, −0.8, −1, −0.8 for Control, dbLang, siLang, and 4pool, respectively) along the soil P gradient was not found in the measured data (M-K test, τ = −0.2) (Fig. 4a). Decreasing trends in simulated foliar P content in all models (M-K test, τ = −0.8, −0.95, −1, −1 for Control, dbLang, siLang, and 4pool, respectively) were instead also seen in measurements, but the decreasing trend was much weaker (M-K test, τ = −0.6) (Fig. 4b). The simulated foliar P content was highest in the Control model (1.40 ± 0.09 mg P g −1 DW) and lowest in siLang (0.95 ± 0.21 mg P g −1 DW) at each study site, as Control is not P limited and the strongest P limitation occurs in siLang. The simulated difference in foliar P content across models reflects both plant P uptake and productivity (Figs. 4, S2-5). For example, at the P-rich BBR site, although the simulated GPP, leaf area index (LAI), aboveground C, and fine-root C for the four models were almost identical (Fig. S2), the foliar P content values of four models were different due to differing plant P uptake (0.99, 0.95, 0.89, 0.95 g P m −2 yr −1 for Control, dbLang, siLang, and 4pool, respectively; Fig. S2). In contrast, at the P-poor LUE site, the differences among siLang, dbLang and 4pool for GPP, LAI, and plant C were more pronounced than that for foliar P content, as the limited P uptake (0.67, 0.53, 0.55 g P m −2 yr −1 for Control, dbLang, and siLang, respectively; Fig. S2) drastically changed the plant C allocation (e.g., leaf-to-root ratio) and led to subsequent changes in plant properties (Figs. 4, S2).

Sensitivity analysis
Our sensitivity tests at the low-P COM site (231 g P m −2 ) and the high-P MTF site (678 g P m −2 ) showed a diverging effect of model choice and parameterization on selected ecosystem properties, such as GPP, plant C, and foliar P content (Figs. 5, S6). Both models were very consistent in maintaining the vertical pattern of the simulated ratio between labile and exchangeable Pi (Lab-to-Exchangeable P) across different parameterizations (Fig. S7). In other words, the better fit of dbLang in capturing the measured decreasing trend in Lab-to-Exchangeable P was robust against model parameters choice (Figs. S6, S7).
The P cycling parameters with the strongest effect on the selected ecosystem properties were very different between the two models and moderately different between the two sites ( Fig. 5, Table S3). In siLang, the parameters with the highest impact for most selected outputs (e.g., foliar P, GPP, and plant C, as in Fig. 5) were the absorption and desorption rates between P lab and P sorb (k abs and k des ), i.e., the slow exchange process, whereas they were the turnover rates and N : P ratios of slow and fast SOM pools (τ slow , τ fast , SOM_np, and microbial_np) in dbLang. One common feature of both models is that the low-P site (COM) was more affected by SOM_np, the maximum weathering rate (k_weath_mineral) and the maximum plant P uptake rate (vmax_uptake_p_p4) than the high-P site (MTF), inferring a tighter P cycle in a low-P ecosystem than in a high-P ecosystem.

Modeled ecosystem responses to C, P, and CP fertilizations
The simulated ecosystem responses to CO 2 fertilization, P fertilization, and CP fertilization differed greatly among study sites, models, and fertilization types due to the differences in soil P availability and model schemes of Pi exchange (Figs. 6, 7, S3-S5). At the P-rich BBR site, the simulated GPP, LAI, aboveground and fine-root C, and plant uptake of N and phosphate increased after CO 2 and CP fertilizations but did not change after P fertilization in all three models (siLang, dbLang, and 4pool; Fig. 6a). This indicates that the simulations at BBR were not limited by P, which is also supported by the small difference in the responses between the CO 2 and CP fertilization experiments. In contrast, at the Ppoor LUE site, the simulated responses to P and CP fertilization were much stronger than those after CO 2 fertilization in all three models (Fig. 6c), indicating a strong P limitation at LUE. This simulated P stress among models can be quantified when compared to the Control model (Fig. 6d). The siLang model simulated the lowest plant P uptake at all three sites with high, moderate and low P concentrations in soil. This difference in plant P uptake was not only reflected in the foliar P concentration at the high-P site but also in other vegetation properties at moderate-to low-P sites. The 4pool model, which is a special case of the single-surface Langmuir isotherm, simulated similar P stress to dbLang at high-and moderate-P sites, whereas it behaved similarly to siLang at the low-P site and simulated a high P stress (Fig. 6d).
The changes in the model P pools after CO 2 , P, or CP fertilizations further illustrated the effects of Pi exchange schemes on ecosystem responses under varying soil P availability (Fig. 7). In the CO 2 fertilization experiment, the chronic increase of CO 2 led to increases in plant biomass P at all study sites in all three mod-els (Fig. 7, top row). The concurrent increases in plant P pools were compensated for by decreases in labile and sorbed Pi as well as Po in the microbial pool (fast SOM). At the high-P BBR site, the increases in plant P (631.4 ± 1.4 mg P m −2 ), litter P (107.8 ± 0.8 mg P m −2 ), and slow Po in SOM (62.8 ± 0.3 mg P m −2 ) were similar in all models, and the mobilization from soil exchangeable Pi (−417.4 ± 16.1 mg P m −2 ) and fast Po SOM (−354.2 ± 2.5 mg P m −2 ) contributed similarly to P sources. Conversely, at the P-poor LUE site, dbLang simulated a significantly higher plant P increase (188.3 mg P m −2 ) than siLang (69.5 mg P m −2 ) and 4pool (18.5 mg P m −2 ), due to its stronger capacity to mineralize P from the microbial P pool. dbLang can maintain a much higher SOM pool in topsoil (Fig. 3a), as P sol is better buffered under low soil Pi In the P fertilization experiments, the fate of added P largely differed between siLang and dbLang, as the added P was preferably and quickly transferred to P sorb in dbLang compared with siLang (Fig. 7, bottom row).
In the CP fertilization experiments (Fig. 7, middle row), plant P did not gain more P at high-and moderate-P sites (BBR and VES) compared to the C fertilization, indicating both sites might not be P limited under eCO 2 . Surprisingly, microbes (fast SOM) also did not benefit from P addition on top of eCO 2 at BBR and VES, as most added P was transferred to the soil Pi pools. However, at the low-P LUE site (Fig. 7, last three columns), the combination of C and P fertilization produced higher plant P increases (928.5, 1202.3, and 971.1 mg P m −2 for 4pool, dbLang and siLang, respectively) than P fertilization alone in all three models, inferring a very strong P limitation on C sequestration at the low-P LUE site.

Model representation of soil Pi cycling and Langmuir sorption
The majority of terrestrial biosphere models (TBMs) describe the soil inorganic P (Pi) cycling processes with a similar structure of pools and fluxes (Fig. 1a or c), and they mostly describe the key soil Pi exchange process, i.e., the (ad)sorption and desorption between soil solution and matrix, using the Langmuir isotherm. In this study, we have shown that the values of Langmuir parameters in most TBMs, namely maximum P sorption capacity (S max ; Eq. 1) and the half-saturation Langmuir coefficient (K m ; Eq. 1), largely differ from those reported in the P sorption batch experiments (Table 2). This disagreement between TBMs and experiments leads to much lower exchangeable soil Pi (P lab and P sorb ) content in TBMs than experiments under the same soluble Pi (P sol ) conditions (Fig. 2a). This is in line with the much higher partition coefficient (k p ; Eq. 2) values in most TBMs compared with experiments (Table 2). Additionally, the reported S max values have a comparable size to the total soil Pi in experiments (Fig. S1), indicating that most of the soil-associated Pi could be exchanged with the soil solution at a certain point. This implies that the Langmuir isotherm can describe not only the fast exchange between P sol and P lab but also the slower exchange with P sorb . This hypothesis Figure 5. Partial correlation coefficient (pcc) values of eight main parameters of foliar P content, gross primary production (GPP), and vegetation C stock at the COM and MTF sites of the single-surface Langmuir model (siLang) and the double-surface Langmuir model (dbLang) in the Latin hypercube design (LHS) sensitivity runs. The eight parameters are as follows: the maximum plant P uptake rate (vmax_uptake_p_p4), the turnover rates of slow and fast soil organic matter (SOM) pools (tau_slow and tau_fast, respectively), the N : P ratio of the slow SOM pool (SOM_NP), the phosphate sorption capacity of fine soil (qmax_po4_silt), the correction coefficient of pH on Langmuir K m (km_po4_ph), and the absorption and desorption rate between P lab and P sorb (k_ adsorpt_po4 and k_desorpt_po4, respectively). The significance levels shown in figure (using asterisks) are as follows: p<0.001 ( * * * ), p<0.01 ( * * ), and p<0.05 ( * ).
is supported by the classic sorption model of Barrow (Barrow, 1978, 1983) and the isotopic exchange kinetics conceptual model (Fardeau, 1995;Frossard and Sinaj, 1997;Morel et al., 2000) that Pi ions located in the soil matrix are distributed along a continuum of solubility, with some being in rapid equilibrium with Pi ions in soil solution and some being in slow equilibrium. Following the concept of the soil P solubility continuum, we have developed the double-surface Langmuir isotherm (dbLang; Fig. 1b, Eq. 3) in the QUINCY TBM and parameterized it with batch sorption experiment data. The doublesurface Langmuir isotherm simulated a higher P-buffering capacity than the single-surface Langmuir isotherm (siLang) when P is removed from the soil (Fig. 2b), which is in agreement with the experimental results of Roberts and Johnston (2015). By comparing dbLang with the conventional siLang model and a special four-Pi-pool structure of siLang (4pool model, used in Zhu et al., 2016(4pool model, used in Zhu et al., , 2019, we further investigated the effects of different sorption-desorption schemes on soil P under the QUINCY TBM framework (Fig. 3). The model performance in simulating the relationship between soil labile and sorbed Pi pools (Lab-to-Exchangeable P) has been improved by dbLang at our study sites of varying soil P stocks, soil texture, and stoichiometry (Lang et al., 2017;Yu et al., 2020b). Such an improvement of dbLang with respect to reproducing the Lab-to-Exchangeable P ratio much better captures the vertical patterns of soil inorganic pools (Figs. 3d, e, f), which is also supported by statistical analysis that the measured soil P profiles were slightly better reproduced (Tables 3, S1). This was not simply caused by improvements in modeling the individual labile or sorbed Pi pool but rather by a better representation of the Pi exchange among P sol , P lab , and P sorb , as such an improvement is independent of site conditions and model Figure 6. Simulated changes in gross primary production (GPP), aboveground C, fine-root C, leaf area index (LAI), uptake of N and P, and foliar N and P content due to CO 2 , P, and CP fertilizations at the (a) BBR, (b) VES, and (c) LUE sites, and (d) the ratio of unfertilized scenarios of each model to the Control model. Bars are calculated as the percentage change in subplots (a) to (c), and the bars in subplot (d) are fractions representing how much the models deviate from a non-P-limited condition. parameterization (Figs. 3, S7). Additionally, the siLang and dbLang models in this study shared the same set of QUINCY parameters; therefore, they had the same model complexity but differed in model performance. However, the new scheme did require more input than other TBMs (listed in Table 2), as the new parameterizations were dependent on soil texture, pH, and extractable Al or Fe, whereas most other TBMs use prescribed values.
In this study, we have included the Hedley soil P data as part of the model validation, which, to our knowledge, has seldom been done before, although the Hedley soil P data (Yang and Post, 2011;Yang et al., 2013) have been widely used as initialization data for TBMs. The depth-explicit and quantitative evaluation of model Pi pools with field measurements provides us a unique tool to diagnose different sorption-desorption schemes, and it warrants future appli- Figure 7. Simulated temporal responses of model P pools (labile Pi; sorbed Pi; litter Po; Po in fast and slow soil organic matter (SOM) pools -fast-SOM_Po and slow-SOM_Po, respectively; and P in the plant -Veg_P) to CO 2 , P, and CP fertilizations at the BBR, VES, and LUE sites. In the C fertilization experiment, the atmospheric CO 2 concentration was increased by 200 ppm from 2006 to 2015. In the P fertilization experiment, 1139.7 mg P m −2 was added once to the soil as soluble phosphate on 16 September 2006. The CP fertilization is the combination of C and P fertilizations. Each point in the x axis represents a P budget change at the specific time, with the size change in all P pools equal to zero (C fertilization) or the added P (1139.7 mg P m −2 , P and CP fertilization). The change in a specific P pool at any given time is equal to the total change in related P fluxes since the fertilization (e.g., the plant P pool change relates to plant P uptake and P litterfall, the labile Pi pool change relates to P adsorption (and P absorption) (Fig. 1), the Po in the fast SOM pool change relates to litter and fast SOM decomposition, and so on).
cations in model-data comparison or model intercomparison studies.

Model performance and confidence
We have shown that dbLang performed better than siLang (and 4pool) in reproducing the measured soil Pi pools and foliar P content (Figs. 3,4; Table 3). Several observations suggest that the novel dbLang model produced more realistic GPP and vegetation C patterns, although there is no direct evidence. First of all, Lang et al. (2017) found that there are no trends in the measured tree heights nor stand biomass along our soil P gradient (Table 1). As all sites are unmanaged beech stands with a similar age structure, we can assume that GPP and vegetation C should also be similar between the sites in the absence of nutrient limitation. This was better reproduced using dbLang than siLang, as less P limitation is simulated using dbLang. Secondly, studies have shown that biological processes dominate the C-P interactions in P-poor beech forests (Bünemann et al., 2016;Pistocchi et al., 2018); this was supported by the sensitivity results of only dbLang (Fig. 5, Table S3), whereas slow sorption/desorption process dominated GPP and vegetation C regardless of P availability in siLang. Most importantly, siLang overestimated P lab and P sorb in topsoil but still greatly underestimated the plant productivity and biomass at low-P sites (COM and LUE). Collectively, these findings suggest that a double-surface Langmuir model of Pi sorption-desorption better described forest C and nutrient dynamics at these sites than the conventional single-surface Langmuir model. Yang et al. (2014) used the siLang scheme to study P cycling in P-limited tropical forests. They showed that a doubled rate of P transfer from P sorb to P lab leads to similar increases in plant productivity and biomass as a doubled P mineralization rate. This is in line with our sensitivity analysis of siLang, showing that slow sorption/desorption of P has similar impacts to SOM turnover on GPP and plant biomass (Fig. 5, Table S3). However, this seems unlikely in tropical forest sites depleted of soil Pi, and we argue that a dbLang model could also better capture such processes in tropical forest ecosystems.
From the perspective of Pi cycling, the parameterization of linear P exchange between P lab and P sorb in siLang lacked clear foundations and experimental evidence; hence, it largely differed among TBMs (Helfenstein et al., 2020). Similarly, we have shown that the exchange between P sol and P lab in most TBMs also largely varies and deviates from the reported ranges of experiments (Table 2). Most importantly, the assumption that P sol only exchanges with P lab has been heavily questioned by experimentalists (Helfenstein et al., 2020;Morel et al., 2000), and it also differs from earlier models (Barrow, 1983;Devau et al., 2009;van der Zee and Gjaltema, 1992). The double-surface Langmuir isotherm much better represents the conceptual sorption-desorption scheme that is supported by experimental data (Frossard et al., 2000;Frossard and Sinaj, 1997;Fardeau, 1995), and it also implicitly considers some slower Pi exchange processes that are currently ignored in TBMs, such as chelation and dissolution/precipitation.

Ecosystem responses to perturbations
The different Pi sorption-desorption schemes have not only led to different simulated Pi pool sizes but have also caused varying ecosystem responses to elevated CO 2 and P addition (Figs. 6, 7). The main difference between siLang and dbLang was the speed and extent of exchange between the soluble/labile and sorbed Pi pools. The added P was quickly transferred to the P lab in both models, but the speed of Pi transfer to P sorb was much faster in dbLang. This was because Pi directly transferred from P sol to P sorb in dbLang, whereas the transfer to P sorb only occurred from P lab and at a much slower rate in siLang. As a consequence, P sorb in dbLang stored more P after P addition, and it also released more P under higher CO 2 concentrations and elevated plant P demand. The fast transfer of P from P sol to P lab in both models is supported by evidence from 33 P tracer studies showing that the radioactivity of P sol and P lab converged quickly (<3 months) after 33 P tracer addition (Pistocchi et al., 2018). However, the speed and extent of transfer to P sorb cannot be easily confirmed with isotope signals because both 33 P and 32 P have short half-lives, but it is estimated to vary from days to weeks (Buehler et al., 2002;Helfenstein et al., 2021). Nevertheless, the mathematical and conceptual descriptions of isotopic exchange kinetics tend to support the dbLang model in that the P transfer to P sorb took place at a timescale of months rather than years in acidic soils (Bünemann et al., 2016;Frossard and Sinaj 1997;Helfenstein et al., 2020). Hou et al. (2019), using a data assimilation approach, also concluded that P transfer to P sorb might happen at a much faster rate than the conventional TBM parameter values suggest.
Our advance in capturing soil Pi exchange has altered the responses of QUINCY to changing P availability, particularly at low-P sites. The main difference in the soil Pi responses to P additions between siLang (also 4pool) and dbLang is that the novel dbLang model can store or release more P from the sorbed Pi pool at a faster rate than siLang. Particularly, at sites with high P stress (e.g., LUE), a further increase in P stress (e.g., CO 2 fertilization) led to an increased P mineralization in dbLang rather than increased P desorption in siLang (Fig. 7), as dbLang can establish a much larger (im-plicit) microbial pool (fast SOM) in the topsoil to mineralize P (Fig. 3a). It was confirmed in the EucFACE (Eucalyptus Free Air CO 2 Enrichment) experiment that both N and P mineralization increased by as much as 200 % under eCO 2 (Hasegawa et al., 2016). Additionally, evidence from isotopic studies has shown that biological processes, rather than geochemical processes, are dominant in both topsoil and subsoil of the P-poor LUE site (Pistocchi et al., 2018;Bünemann et al., 2016).

Limitations and future directions
The novel dbLang model much better represented soil Pi exchange and, thus, improved the C-P interactions in QUINCY, but there was a caveat when the P stress was high. At the Ppoor site (LUE), the GPP and vegetation C in dbLang were almost twice as high as those in siLang (Fig. S2a, c). However, such a huge release of P stress by dbLang was still not enough for LUE to reach similar plant productivity and biomass values to BBR or VES, as indicated by field evidence (Lang et al., 2017). In ecosystems with low soil P or high P stress, CO 2 fertilization enhances the organic P cycling processes, such as root exudation, phosphatase production, and microbial P mining (Ellsworth et al., 2017;Jiang et al., 2020;Lang et al., 2017;Pistocchi et al., 2020). Such responses in SOM cycling have not yet been implemented in QUINCY, leading to much lower GPP and plant biomass values at the low-P site than at other sites. These mechanisms have also not been described in other TBMs, leading to poor model performance in low-P ecosystems, e.g., Amazon forests (Fleischer et al., 2019) and Eucalyptus forests (Medlyn et al., 2016). Recent developments in soil models/modules have endeavored to improve the model description of SOM cycling by including parameters such as organomineral association process, explicit microbial dynamics, enzyme dynamics or allocation, and mycorrhizal association (Huang et al., 2021;Yu et al., 2020a;Ahrens et al., 2020;Wutzler et al., 2017;Tang and Riley, 2014;Sulman et al., 2014;Wang et al., 2013). We believe that coupling the novel dbLang scheme with a more mechanistic representation of SOM cycling will open the door to model the different forest ecosystem P cycling strategies (Lang et al., 2017) as well as to simulate the responses of low-P ecosystems to elevated CO 2 (Jiang et al., 2020).

Conclusions
In this study, we first reviewed the implementation of the soil Pi exchange process in terrestrial biosphere models and compared the model implementations of P sorption with batch sorption experiment data. We found that the parameterizations used by most TBMs strongly underestimate the soil's P sorption capacity and overestimate the half-saturation Langmuir coefficient compared with the experimental data. In the QUINCY model, such a formulation leads to a much lower soil P binding capacity in TBMs than in reality, causing the unrealistically large constraint of a slow Pi exchange process on GPP and plant biomass, regardless of the actual soil P availability. We presented a novel model scheme, based on a double-surface Langmuir isotherm, to describe the soil Pi exchange in better accordance with data from sorption experiments. This model parameterization better simulated the measured soil Pi pools and the GPP and plant C patterns of our study sites. It also better reproduced the topsoil SOM pools at the low-P site, thereby better simulating the responses of P pools/fluxes to elevated CO 2 than conventional TBMs. Thus, the novel double-surface Langmuir approach can serve as a better modeling tool to understand the ecosystem response to global change.
Code availability. The scientific part of the code is available under the GPL v3 license. The scientific code of QUINCY relies on software infrastructure from the MPI-ESM environment, which is subject to the MPI-M Software License Agreement in its most recent form. The source code is available online (QUINCY model, https://doi.org/10.17871/quincy-model-2019, Zaehle et al., 2019), but its access is restricted to registered users. Readers interested in running the model should request a username and password from the corresponding authors or via the Git repository.
Data availability. The data in the literature review are available upon request from the corresponding author.
Author contributions. LY designed the study. LY and SZ formulated the paper outline. LY performed the literature review, model development, and analyses. SC, LY, and SZ developed the QUINCY model that forms the basis of the analysis in this paper. All authors contributed to writing the paper.
Competing interests. At least one of the (co-)authors is a member of the editorial board of Biogeosciences. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements. Lin Yu was supported by the Swedish government-funded "Strategic Research Area Biodiversity and Ecosystems in a Changing Climate" (BECC) program. Silvia Cal-dararu was supported by the European Research Council (ERC) under the European Union's Horizon 2020 "Research and Innovation" program (QUINCY; grant no. 647204). The authors are grateful to Friederike Lang, Jaane Krüger, and other co-workers from the German Research Foundation (DFG) priority research program "Ecosystem Nutrition" (SPP1685), for measuring and sharing the data, and to Jan Engel, for technical assistance with developing the code.
Financial support. The article processing charges for this openaccess publication were covered by the Max Planck Society.
Review statement. This paper was edited by Aninda Mazumdar and reviewed by two anonymous referees.