Global patterns and drivers of phosphorus fractions in natural soils

. Most phosphorus (P) in soils is unavailable for direct biological uptake, as it is locked within primary or secondary mineral particles, adsorbed to mineral surfaces, or immobilized inside of organic material. Deciphering the composition of different P forms in soil is critical for

Abstract.Most phosphorus (P) in soils is unavailable for direct biological uptake, as it is locked within primary or secondary mineral particles, adsorbed to mineral surfaces, or immobilized inside of organic material.Deciphering the composition of different P forms in soil is critical for understanding P bioavailability and its underlying dynamics.However, widely used global estimates of different soil P forms are based on a dataset containing few measurements in which many regions or soil types are unrepresented.This poses a major source of uncertainty in assessments that rely on these estimates to quantify soil P constraints on biological activity controlling global food production and terrestrial carbon balance.To address this issue, we consolidated a database of six major soil P "forms" containing 1857 entries from globally distributed (semi-)natural soils and 11 related environmental variables.These six different forms of P (labile inorganic P (Pi), labile organic P (Po), moderately labile Pi, moderately labile Po, primary mineral P, and occluded P) were measured using a sequential P fractionation method.As they do not represent precise forms of specific discrete P compounds in the soil but rather resemble operational pools, we will now refer to them as P pools.In order to quantify the relative importance of 11 soil-forming variables in predicting soil P pool concentrations and then make further predictions at the global scale, we trained random forest regression models for each of the P pools and captured observed variation with R 2 higher than 60 %.We identified total soil P concentration as the most important predictor of all soil P pool concentrations, except for primary mineral P concentration, which is primarily controlled by soil pH and only secondarily by total soil P concentration.When expressed in relative values (proportion of total P), the model showed that soil pH is generally the most important predictor for proportions of all soil P pools, alongside the prominent influences of soil organic carbon, total P concentration, soil depth, and biome.These results suggest that, while concentration values of P pools logically strongly depend on soil total P concentration, the relative values of the different pools are modulated by other soil properties and the environmental context.Using the trained random forest models, we predicted soil P pools' distributions in natural systems at a resolution of 0.5 • × 0.5 • .Our global maps of different P pools in soils as well as the pools' underlying drivers can inform assessments of the role of natural P availability for ecosystem productivity, climate change mitigation, and the functioning of the Earth system.

Introduction
Phosphorus (P) is a key nutrient limiting plant growth across a wide range of ecosystems (Augusto et al., 2017;Elser et al., 2007;Hou et al., 2020).Soil is typically the major P source for plants in natural terrestrial ecosystems (Weihrauch and Opp, 2018).P supplied by the soil plays a vital role in determining the structures, functions, and processes in terrestrial ecosystems (Peltzer et al., 2010;Wardle et al., 2004).For example, soil P availability imposes a major constraint on plant productivity in terrestrial ecosystems worldwide (Augusto et al., 2017;Ellsworth et al., 2022;Elser et al., 2007;Hou et al., 2020Hou et al., , 2021) ) and affects modeled projections of terrestrial carbon cycle responses to climate change and increasing atmospheric carbon dioxide concentrations (Cunha et al., 2022;Fleischer et al., 2019;Goll et al., 2012).The size of soil P stocks is large compared to annual plant P requirements (Wang et al., 2018) and the amount of P stored in vegetation (Wang et al., 2018;Zhang et al., 2021).However, only a small proportion of soil P can be directly taken up by plants (Morel et al., 2014), with most P tightly sorbed to soil minerals, organic compounds, or organo-mineral complexes with a turnover time of centuries to millennia or longer (Helfenstein et al., 2020;Vitousek et al., 2010).Consequently, vegetation growth is often limited by P availability in ecosystems across the globe (Vitousek et al., 2010;Wardle et al., 2004).For these reasons, the investigation of P dynamics and P bioavailability in the soil requires the identification and separation of different soil P pools (Crews et al., 1995;Walker and Syers, 1976).
Our knowledge of the various pools of P existing in soils is largely based on soil chronosequences and climosequences that investigated how P is cycled during pedogenesis (Crews et al., 1995;Walker and Syers, 1976).These studies revealed that chemical weathering results in the release of P from primary minerals, after which it can be converted to organic P through biological uptake, sorbed to soil particles, or occluded within secondary minerals.The most commonly used procedures for the sequential fractionation of P in soils were developed by Hedley et al. (1982) and later modified by Tiessen and Moir (1993).This method exploits differences in solubility to separate different "forms" of P occurring in the soil.Though it cannot be used to identify specific discrete P compounds in the soil, this approach has proven indispensable for the study of soil P cycling and, as such, is widely used (Condron and Newman, 2011;Klotzbücher et al., 2019;Barrow et al., 2021).In addition to forming the basis for modeling soil P dynamics, these procedures yield operationally defined pools that are used to assess soil fertility and soil development (Wang et al., 2010(Wang et al., , 2022)).Several studies have called the validity of sequential extractions into question, pointing out that, while it is often assumed that pools from sequential extractions contain distinct forms of P, the reality is much more complex (Condron and Newman, 2011;Gu and Margenot, 2021;Klotzbücher et al., 2019).
Numerous studies have used data from P fractionations to explore drivers of spatial differences in soil P pools from local to global scales (e.g., Brucker and Spohn, 2019;Hou et al., 2018a;Yang and Post, 2011;Chen et al., 2015).Yang and Post (2011) compiled Hedley P pools data from 178 soil samples to explore P dynamics along a soil development gradient.Their results generally supported the conceptual model proposed by Walker and Syers (1976): the gradual decrease of primary mineral-bound P, the continual increase and eventual dominance of occluded P, and the overall decrease of total P as pedogenesis progresses.However, the conceptual model of Walker and Syers (1976) disagreed with the results of Yang and Post (2011), who found that labile Pi and moderately labile Pi (non-occluded P in Walker and Syers' model) formed a significant fraction of total P at every stage of pedogenic development.Augusto et al. (2017) compiled 1684 measurements of P pools that were taken worldwide using the Hedley fractionation method.This work found that total P content was a main factor in determining the concentrations of labile Pi and organic P pools.Almost concomitantly, Hou et al. (2018a) used a global dataset compiled from analyses of 802 soil samples to examine climate effects on the soil P cycle and P availability and found that soil labile Pi concentration decreased with increasing mean annual temperature, which was mainly due to decreasing soil organic P and primary mineral P with increasing temperature.Although those studies advanced our understanding of factors controlling the size of various soil P pools, their focus was largely contained to the effects of climatic factors or the soil weathering stage on a few select P pools, mainly labile Pi and organic P. Thus, we still lack a comprehensive understanding of the relationships between environmental drivers and the various soil P pools at a global scale.
Despite significant efforts to synthesize global Hedley soil P pool data, to our knowledge, only a single mapping of soil P fractions across natural terrestrial ecosystems exists, and this work was based on the upscaling of measurements taken from only 178 samples (Yang et al., 2013).These global estimates and associated maps of soil P pools have been used to explore global patterns of soil P supply and to estimate P availability in natural and managed systems (e.g., Ringeval et al., 2017;Sun et al., 2017).They have also been used to calibrate or initialize a range of global P models (Wang et al., 2010;Yang et al., 2014).However, the poor global coverage of the underlying data introduces significant uncertainty, potentially resulting in misinformed model predictions and assessments.
We recently developed a new global map of soil total P concentrations and explored the underlying drivers, taking advantage of improved data availability and the use of non-linear statistical modeling (He et al., 2021).Here, we constructed a database of soil P pools in 1857 globally distributed (semi-)natural soils collected from 274 published studies, 1 order of magnitude larger than the dataset used by Yang et al. (2013) (see comparison in Fig. S1 in the Supplement).Using this database, we trained random forest models to capture observed variations in Hedley P pool concentrations at the site level with two aims: (1) to quantify the relative importance of different drivers of spatial variation in each soil P pool and (2) to develop global distribution maps of various P pools at a spatial resolution of 0.5 • × 0.5 • using the calibrated random forest regression model.

Soil P fractionation terminology and procedure
In the present study, we use the word "pool" to indicate the concentrations quantified in each step during sequential fractionation and the word "proportion" to represent the size of a pool relative to total P. We try to avoid using "fraction" to describe different P forms anymore because it is easy to confuse with proportion.There is disagreement about how to interpret the different pools yielded by sequential fractionation (Gu and Margenot, 2019;Barrow et al., 2021;Klotzbücher et al., 2019;Condron and Newman, 2011;Helfenstein et al., 2020).Here, we adopt a widely used regime for understanding these pools, which correspond to different forms of soil P: the resin Pi pool represents the soil soluble Pi pool, which is immediately accessible to plants.The HCO − 3 Pi pool can be released by ligand exchange with bicarbonate ions; this pool is available to plants and persists only for short periods (e.g., a growing season).Due to their functional similarity, the resin and HCO − 3 Pi pools can be combined and used as an index of labile inorganic P (i.e., "available" P).The HCO − 3 Po pool represents labile Po that can be utilized by plants after mineralization.The OH − P (Pi and Po) pools mainly indicate moderately labile P that is bound to both amorphous and crystalline Al and Fe; these pools represent P that is moderately available to plants.The 1 M HCl Pi pool represents primary mineral P that is bound to calcium and that can be utilized by plants after it is released by weathering.And other P pools, such as residual P, are the least available to plants due to their particularly low solubility.
To integrate data from studies that use different interpretations, we consider a set of six simplified P pools (Fig. 1): labile Pi, labile Po, moderately labile Pi, moderately labile Po, primary mineral P, and occluded P. Labile Pi includes the resin Pi and HCO − 3 Pi pools; labile Po and moderately labile Po are organic pools extracted by carbonate and NaOH, respectively; moderately labile Pi is the NaOH Pi fraction; primary mineral P represents the 1 M HCl Pi pool; and occluded P includes any remaining P (Hou et al., 2018b).
We collected, filtered, and processed soil P pool data (see Sect. 2.2.) from the literature (Sect.S1 data source references in the Supplement).First, we added all measured P pools together to calculate total soil P, unless at least one pool had a missing value.In this case, we used the measured value of total soil P presented in that paper instead.Second, if phosphate was extracted using deionized water before the resin P extraction step, the labile Pi pool includes both resin and aqueous P. If the extraction procedure began by using a sodium bicarbonate solution instead of resin P, we classified HCO − 3 Pi as labile Pi.Third, the labile Po pool and the moderately labile Po pool represent the HCO − 3 -extracted Po and NaOHextracted pools, respectively.The raw data contained other organic P pools (e.g., Po extracted by sonication and NaOH or by hot, concentrated HCl), which we included as part of occluded P. Fourth, if occluded P was not reported, we calculated this pool's concentration by subtracting the sum of the five other pools from total P.

Data source and processing
We collected soil P pool data by aggregating all the publications that cited either one of two main references dedicated to Hedley's method (Hedley et al., 1982;Tiessen and Moir, 1993).We included all studies that reported data from (semi-)natural soils that supported primary vegetation or that had been reforested with a stand older than 10 years and no documented history of P fertilization.We excluded observations taken from pot experiments, mine zones, and intertidal zones, as P pools in these soils could be affected by factors different from those influencing (semi-)natural soils.Despite our best efforts, we cannot rule out that our database includes data collected from soils affected by undocumented anthropogenic activities in the past (e.g., P fertilization occurring before reforestation), particularly in western Europe and the eastern USA (e.g., De Schrijver et al., 2012).All data were collected at the plot scale.For data that included replicates within a plot or soil layer, average values were calculated.
To compile our database, we first combined the two existing global databases (Augusto et al., 2017;Hou et al., 2018b).Detailed information about the methods used to construct these datasets can be found in the original publications.We extracted observations from these two databases by selecting only unfertilized, uncultivated, and (semi-)natural soils.This yielded 1684 observations from 182 studies from the dataset developed by Augusto et al. (2017) and 802 observations from 99 studies from the dataset developed by Hou et al. (2018b).Next, we removed 375 duplicates, after which our dataset contained 2111 observations from 245 studies (Fig. S2).Because we use total soil P concentration as a predictor of soil P pools, we removed data that did not include total soil P (calculated as the sum of P pools or measured by a separate method) or that did not identify the concentration of at least one pool (e.g., labile Pi, labile Po, moderately labile Pi, moderately labile Po, primary mineral P, or occluded https://doi.org/10.5194/bg-20-4147-2023 Biogeosciences, 20, 4147-4163, 2023 P).In this step, 816 observations were removed, resulting in a dataset that included 1295 observations from 178 studies.Next, we added additional observations by compiling data from literature published after 2016, the final year included in the database compiled by Hou et al. (2018b).We used Google Scholar to search for studies published between 2016 and 8 August 2021 that referenced either Hedley et al. (1982) or Tiessen and Moir (1993).This search returned 701 publications citing Hedley et al. (1982) and 245 citing Tiessen and Moir (1993).From this set, we selected studies that presented soil P data collected using the fractionation method for (semi-)natural soils.The resulting 562 observations from 96 studies were added to our final dataset, which includes a total of 1857 observations collected from 729 sites from 274 studies (Sect.S1).
In addition to soil P pool concentration and site coordinates, our dataset contains site characteristics including climate variables (i.e., mean annual temperature (MAT), mean annual precipitation (MAP), and potential biome), soil physicochemical properties (e.g., soil organic carbon concentration (SOC), soil clay and sand content, and soil pH), and elevation (Table 1).The potential biome was identified using a global map of potential natural biomes (i.e., the global distribution of biomes that would exist in the absence of human activity) (Hengl et al., 2018).This categorization includes seven ecosystem types, including tropical forest, temperate forest, boreal forest, grassland, savanna, desert, and tundra.We did not include a parent material type because it can be inferred from soil total P concentration and other soil properties (e.g., soil texture and pH) (Augusto et al., 2017;He et al., 2021).Because soil age was rarely reported, we used USDA soil order identity as a proxy for three age classes: slightly, intermediately, and strongly weathered (Smeck, 1985;Yang et al., 2013).Among the 12 USDA soil orders, Entisols, Inceptisols, Histosols, Andisols, and Gelisols are classified as slightly weathered soils.Alfisols, Mollisols, Aridisols, and Vertisols are classified as intermediately weathered soils.Oxisols, Ultisols, and Spodosols are classified as strongly weathered soils (Yang et al., 2013;Smeck, 1985).Given that atmospheric P inputs are small (0.3 kg P ha yr −1 , on average) compared to soil P stocks (Mahowald et al., 2008;Wang et al., 2015) and are also highly uncertain over timescales relevant to soil development, we do not consider atmospheric inputs to be a predictor of P pools.As such, we did not include this information in our dataset.We extracted data from each publication as available.In cases in which relevant information was not reported, we extracted the missing data from gridded datasets (Table S1 in the Supplement) based on the geographic coordinates of the study sites.
In random forest modeling, correlated predictors can be substituted for each other so that the importance of correlated predictors will be shared, making each predictor's estimated importance smaller than its true value (Strobl et al., 2008).Thus, we did not include the soil total nitrogen content, as it is strongly correlated with SOC (r = +0.94),nor did we include the aridity index as it is strongly correlated with MAP (r = +0.72).We also did not include rarely reported variables that were included in the referenced studies (e.g., soil extractable aluminum and iron concentrations).

Statistical modeling
All statistical analyses and plotting were performed in the R environment (v.4.0.2) (R Core Team, 2018).The database includes some extreme values in each P pool.These values were likely observed in exceptional geological contexts (Porder and Ramachandran, 2013) or in special soils (e.g., very young volcanic soils).We included these extreme values in the shared version of the dataset.However, these values were excluded from data used in model training, as the extremely high values could have a large influence on modeled relationships between soil P pools and predictors.To this end, we only included values falling in the interval between 1 % and 99 % (Table 2).As we only generate predictions in the top 100 cm depth, the training of the model was done using observations in 0-100 cm.
We used random forest regression models (Breiman, 2001) to predict global patterns of distribution for individual soil P pools.It is a type of ensemble learning algorithm that combines multiple decision trees to make predictions.It reduces the risk of overfitting and improves the generalization performance by using random subsets of input variables and training data.The output is the average prediction of all the trees (James et al., 2013).All models included the same 11 predictors: MAT, MAP, potential biome, total P, soil depth, SOC, soil clay and sand content, soil pH, elevation, and soil weathering stage.The random forest analysis accounts for interactions and nonlinear relationships between predictors and is appropriate for handling the multicollinearity problem in the multivariate regression (Delgado-Baquerizo et al., 2017).We performed random forest regression analysis using the R package caret by applying the embedded R package ran-domForest version 3.1 (Liaw and Wiener, 2002) with an au-tomated mtry parameter.A 5-fold cross-validation was performed using the R package caret (v.6.0-86) (Kuhn, 2020) to evaluate model performance.The mean decrease in accuracy (%IncMSE) was used to evaluate the relative importance of each variable as a predictor of a soil P pool.The mean decrease in accuracy plot shows how the accuracy of the fitted model declines with the exclusion of a predictor.The greater the decline in accuracy, the more important the variable is for prediction.In this study, the importance measure was calculated for each tree and averaged across the forest (500 trees).Our model found that all 11 variables are important for predicting pool concentrations; thus, all were used as predictors as we developed the global distribution map.Partial dependence plots are a graphical technique used in machine learning to show how the value of a particular input variable affects the predictions of a model, while holding all other input variables constant at their average values in the training data (James et al., 2013).We used the partial_dependence function in the R package edarf version 1.1.1(Jones and Linder, 2016) to calculate the partial dependence of the response on an arbitrary dimensional set of continuous predictors from a fitted random forest model.
Finally, we applied the above trained models for each of the soil P pools to global databases of the 11 predictors to generate global predictions of each soil P pool.The gridded predictor variables used for the global prediction were all regridded to a spatial resolution of 0.5 • ×0.5 • (the original resolution can be found in Table S1).The predict function in the ranger package (Wright and Ziegler, 2017) can compute the standard error of a predicted value.To estimate standard errors based on out-of-bag predictions, we used the infinitesimal jackknife for bagging approach (Wager et al., 2014).We did not mask croplands or other areas heavily influenced https://doi.org/10.5194/bg-20-4147-2023 Biogeosciences, 20, 4147-4163, 2023 by human activity (e.g., urban areas), so pool concentrations predicted for these regions should be interpreted as the natural state prior to anthropogenic activity.Soil depth was used as a predictor, allowing models to predict soil P pool concentration for any given depth (Hengl et al., 2017).The partial dependence plot indicated that soil P pool concentration changed with soil depth in the top 50 cm but not in deeper layers (> 50 cm) (Fig. S3e).As such, we generated predictions at six standard depths for all soil P pool concentrations: 0, 10, 20, 30, 50, and 100 cm.Averages for a depth interval (e.g., 0-30 or 0-100 cm) can be derived by calculating the weighted average of the predictions within that interval (Hengl et al., 2017).

Characters of P pools in natural soils across the world
Our soil P pool database includes 1857 measurements from 729 geographically distinct sites and covers 6 continents, all major biomes, and all 12 USDA soil orders in terrestrial ecosystems (Fig. 2).The database includes pool concentrations measured in samples collected from the 0.5 cm to a depth of 450 cm, with 83 % of the measurements taken from the topsoil (0-30 cm).
From the global median values (Table 2), the largest pool among the six pools considered is the occluded P, accounting for more than 40 % of the soil total P, followed by the moderately labile pools (Pi and Po mainly bound to Al and Fe), accounting for about a quarter of total P.Primary mineral P (bound to calcium) accounted for a minor proportion (7.9 %) of soil total P; labile P pools (Pi and Po) represent the smallest proportions of total P (around 4 %, respectively).

Model performance of different P pools in soils
The random forest regression models explained 62 %, 64 %, 60 %, 83 %, 76 %, and 82 % of the variance in the concentrations of labile Pi, labile Po, moderately labile Pi, moderately labile Po, primary P, and occluded P, respectively (Fig. 3).Using the importance measure (%IncMSE), we identified total P concentration as the most important predictor for concentrations of soil labile Pi, labile Po, moderately labile Pi, moderately labile Po, and occluded P and soil pH as the most important predictor for soil primary P (Fig. 3).The random forest regression models explained 48 %, 58 %, 52 %, 64 %, 80 %, and 58 % of the variance in proportions of labile Pi, labile Po, moderately labile Pi, moderately labile Po, primary P, and occluded P, respectively (Fig. S4).Based on the importance measure, soil pH is generally the most important predictor for proportions of all soil P pools, with prominent influences of total P concentration, soil organic carbon, soil depth, and biome (Fig. S4).These results suggest that, while concentration values of P pools logically strongly depend on soil total P concentration, the relative values of the different pools are modulated by other soil properties and the environmental context.

Global patterns and drivers of P pools in natural soils
Our global predictions (Fig. 4) revealed that average values across all P pools were higher in slightly weathered soils compared to those in more weathered soils (Fig. 5a), reflecting the strong effect of the initial stages of soil development   on soil P depletion.While occluded P proportion increased with soil development, the proportions of labile and moderately labile P (Pi and Po) were fairly independent of soil weathering stage (Fig. 5b).Our global predictions also indicated that soil P pool concentrations varied substantially among different biomes.Lower P pool concentrations were found in warm and/or humid biomes (e.g., tropical forest and savanna), while higher P pool concentrations were found in northern cold biomes (e.g., tundra and boreal forest) (Fig. 5c).The spatial patterns of pool proportions were different from those of pool concentrations across biomes (Fig. 5d).For example, variation in the proportion of labile Pi was relatively small compared to the variation observed in labile Pi concentrations; moreover, the proportion of occluded P tended to increase in the transition from tundra and boreal forest to tropical forest and savanna (Fig. 5d).It should be noted that the mapped predictions of P pool concentrations across biomes (see Fig. 5c) are not consistent with the measured data (Fig. S5), which indicates that total soil P in tropical forests is higher than in any other biome.This result suggests a sampling bias due to overrepresentation of high total soil P sites in the tropical forest data.Partial dependence plots (Fig. S3) and the results of Pearson correlation analysis (Fig. 6) were generally consistent.Both analyses revealed that concentrations for all six pools were significantly and positively correlated with total P concentration.SOC was significantly and negatively correlated with primary mineral P concentration but positively correlated with the other five pool concentrations.MAT and MAP were significantly and negatively correlated with concentrations of all soil P pools.Soil pH was significantly and positively correlated with primary mineral P concentration but significantly and negatively correlated with concentrations of the other five P pools.The results of Pearson correlation analysis also indicated that P pool concentrations were well correlated with each other, except for primary mineral P; this pool was negatively correlated with labile Po and not correlated with moderately labile Po concentration.The partial dependence plot indicated the variation of P pool concentrations with increasing soil depth (Fig. S3e).We found a drastic decrease of P pools with soil depth in the top 50 cm of soil, which then became relatively stable at 50-100 cm soil depth.Labile and moderately labile P (both Pi and Po) concentrations also decreased with an increase in soil depth in the top 50 cm, while primary mineral P and occluded P concentrations generally increased with soil depth.As for the P pools' proportions, Pearson correlation analysis (Fig. 6) revealed that soil pH was positively correlated with the primary mineral P proportion and negatively correlated with the other five P pool proportions.Soil labile Po, moderately labile Pi, and moderately labile Po proportions decreased substantially with an increase in MAT, while the occluded P proportion increased with MAT.Soil labile Po, moderately labile Pi, and moderately labile Po proportions increased substantially with increasing total P concentration, while the soil labile Pi and occluded P proportions decreased substantially with total P concentration.
There are significant differences between our predictions and those made by Yang et al. (2013) (Fig. S6) in both the magnitude and the spatial patterns associated with most P pool concentrations.The two global estimates were only weakly to moderately correlated (Pearson correlation coefficients between 0.09 and 0.38) (Fig. 7).Yang et al.'s (2013) predictions are lower than ours for organic P, moderately labile Pi, primary mineral P, and occluded P concentrations (Table S2).Although average values for labile Pi concentrations estimated by Yang et al. (2013) were close to ours, they were only weakly correlated with each other (Pearson correlation coefficient of 0.09) (Fig. 7).

Improved mapping of different P pools in global natural soils
We trained random forest regression models using 11 variables to predict six soil P pools at different depths in (semi-)natural terrestrial ecosystems, resulting in significant improvements over earlier estimates (Yang et al., 2013).First, we used a new global map of total P concentrations in natural soils (He et al., 2021) as a predictor.Because total P is an important predictor and is highly correlated with all other P pools, a higher quality map of total soil P will also lead to improved predictions of other P pools.Further improvements in global P data availability will thus also be useful to improve maps of other P pools.Second, Yang et al. (2013) used a limited number (n = 178) of measurements of Hedley P pools across soils.Our database represents a nearly 10fold increase, which can better represent the heterogeneous https://doi.org/10.5194/bg-20-4147-2023Biogeosciences, 20, 4147-4163, 2023 The same meanings apply to the labile Po P., moderately labile Pi P., moderately labile Po P., primary P P., and occluded P P. Elevation is not included this plot, as it is not well correlated with P pool variation in our results.
conditions on Earth.Third, Yang et al. ( 2013) estimated P pool concentrations using total soil P concentrations, global soil order maps, and average proportions of various P pools for different soil orders.However, there are still considerable variabilities in P concentrations within any given soil order, though it could be a good predictor of P pool variation (Cross and Schlesinger, 1995;Yang and Post, 2011).Indeed, we found that soil orders were less informative than other environmental predictors.By including more predictors (e.g., SOC, climate, and soil pH), our model offers significant improvements for capturing the variation observed in soil P composition across the globe.The above-named technical improvements have made it possible to produce more accurate maps.For example, while Yang et al.'s (2013) global predictions indicated that the highest organic P concentrations were found in the temperate zone, our maps suggest they are in boreal forest and tundra.This is more consistent with general understanding of global soil organic matter distribution (Hengl et al., 2017).Differences between our estimates of different P pools and those presented by Yang et al. (2013) have significant implications for soil P availability to vegetation.The averages and median values of Yang et al.'s (2013) predicted soil organic P, moderately labile Pi, and occluded P concentrations were substantially lower than our estimates.Evidence suggests that soil organic P and moderately labile Pi remain bioavailable on timescales of days to months (Helfenstein et al., 2020;Augusto et al., 2017;Maharjan et al., 2018), while occluded P is bioavailable in the order of years to millennia (Hou et al., 2016;Wang et al., 2007).Thus, soil P availability might be larger than previously assumed in assessments based on estimates by Yang et al. (2013) (e.g., Sun et al. 2017).

Major drivers of different P pools in natural soils
Our results indicate that global variation in soil P pools is jointly controlled by total P concentration, soil pH, soil development, climatic factors, and soil depth.Given that our models explain >48% of the variance observed in P pools (concentration and proportion), our results suggest that edaphic properties and climatic factors play significant roles in the size and composition of different soil P pools globally.

Effects of total soil P concentration on P pools
We found that total soil P concentration was a prominent predictor of most soil P pools at the global scale and that total P was positively correlated with all P pool concentrations and Po pool proportions.This is consistent with findings at local (Turner and Blackwell, 2013) and global (Augusto et al., 2017;Hou et al., 2018a;Harrison, 1987) scales.Total soil P is influenced by multiple soil-forming factors (e.g., parent material P concentration, climate, soil organic carbon content, and soil texture) (He et al., 2021).Thus, total soil P provides an integrated measure of factors that regulate the size of the P pools.Moreover, this result is consistent with the emerging idea of substrate-based P cycling in natural ecosystems (Lang et al., 2016(Lang et al., , 2017)): soils with high total P content are usually also associated with a large primary mineral P pool.At these P-rich sites, plant and microbial communi-ties tend to promote P release from primary minerals, with subsequent biological and abiotic transformations resulting in high concentrations in all other P pools (Lang et al., 2016;He et al., 2021) and higher proportions of organic P (Hou et al., 2018c).In contrast, at P-poor sites, plant and microbial communities are more reliant on P recycling systems that promote the mineralization of Po by soil microbes (Achat et al., 2009;Marklein and Houlton, 2012) and the mobilization of moderately labile Pi or even occluded P (Augusto et al., 2017) to sustain the P supply.Therefore, soil P pool concentrations are expected to strongly co-vary with total soil P concentration.

Effects of soil pH on P pools
Consistent with previous studies (Hou et al., 2018c;Kruse et al., 2015;Oburger et al., 2011;Barrow et al., 2020), our results indicate that soil pH is an important predictor of P pool concentrations and proportions in natural soils globally.The relative importance of pH is unsurprising, since the sequential fractionation procedure is based on dissolving a soil sample in solutions of varying acidity/alkalinity.However, the observed pH effects also support the existing mechanistic understanding of the various P forms.The strong positive correlation of primary P and soil pH is expected because (1) the primary P pool is composed mainly of calcium phosphate/apatite, which is highly soluble at low pH but becomes https://doi.org/10.5194/bg-20-4147-2023 Biogeosciences, 20, 4147-4163, 2023 less soluble with increasing pH, and (2) soil pH declines with soil weathering intensity (Delgado-Baquerizo et al., 2020) (e.g., the highest values of soil pH are usually found in dry regions where chemical weathering rates are limited by water availability; Slessarev et al., 2016).Both factors affect the transformation of primary mineral P to other forms.Soil pH shows important but negative influences on the proportions of other soil P pools (i.e., proportions of labile Po, moderately labile Pi and Po, occluded P, and labile Pi).There are several possible explanations for these relationships.First, low soil pH values (< 5.0) inhibit soil microbial activities and the extracellular activity of phosphatase enzymes (Aciego Pietri and Brookes, 2008;Eivazi and Tabatabai, 1977;Xu et al., 2017).Thus, in acidic soils, more organic P (i.e., labile and moderately labile Po) may accumulate than in neutral soils.Second, decreasing soil pH is associated with the accumulation of Fe and Al oxides, which leads to enhanced adsorption of P (i.e., moderately labile Pi and Po).Third, pH tends to decrease as soil weathering advances and base cations are progressively washed out (Slessarev et al., 2016).As soils weather, occluded P accumulates.Therefore, the occluded pool proportion decreases with increasing pH.Fourth, increasing soil pH is associated with enhanced adsorption of dissolved Pi to Ca and Mg, reducing the amount of labile Pi available for plants and soil microorganisms (Fink et al., 2016;Gerke, 2015).This could explain the negative relationship between soil pH and the labile Pi proportion as identified in this study.But increasing soil pH in acidic soils favors soil microbial growth and phosphatase enzyme activity, which could increase P availability.These conflicting mechanisms may be responsible for the relative low importance of predicting the spatial variation of labile Pi proportion.

Effects of climate on P pools
Our global predictions indicated negative effects of climatic factors (i.e., MAT and MAP) on the soil P concentrations, which means a decrease in soil P concentrations, as MAT increases from northern cold biomes (e.g., tundra and boreal forest) to a warm tropical biome (e.g., tropical forest) or MAP increases from arid to humid regions.These results fit well with our understanding of broad P concentration variation with increasing weathering (Walker and Syers, 1976).Also, these results are expected because the main determining factor of soil P pool concentrations (i.e., soil total P concentration) shows a similar pattern (He et al., 2021).Interestingly, we found contrasting responses of the labile Pi pool's proportions along the MAT and MAP gradients.The positive correlations between labile Pi proportion and both MAT and MAP indicated that labile Pi concentration decreased slower than the soil total P as temperature and precipitation increased.This result supported the idea that biological systems evolved to retain soil labile Pi levels despite an overall decrease in total soil P as long as climate factors are favorable for biological activity.In strongly weathered soil with limited soil P stocks but otherwise optimal growing conditions, like in warm and humid tropical forests, the mineralization of Po and mobilization of moderately labile Pi or occluded P could contribute to maintaining high levels of labile Pi due to the high soil temperature for soil enzyme kinetics and abundant carbohydrate supply from photosynthesis fueling biological activity (Vitousek, 1984;Achat et al., 2009;Chacon et al., 2006;Liptzin and Silver, 2009).

Effects of soil development on P pools
The variations of P concentrations and proportions across weathering stages predicted by our model partially support Walker and Syers' (1976) theory based on soil chronosequences.While our results are consistent with the expectations from Walker and Syers' (1976) theory about the increase in the proportion of occluded P that occurs at the expense of primary and organic P during soil development, our results disagree with Walker and Syers' (1976) ideas regarding the evolution of the labile Pi and moderately labile Pi pools during soil development.The evolution of occluded P is commonly explained by the increase of Al and Fe oxide minerals and the decrease of soil pH; in addition to being fixed onto Fe and Al oxides, P that is released from primary minerals or mineralized from organic matter can be occluded by being adsorbed onto mineral surfaces or precipitating in poorly soluble secondary soil minerals (Crews et al., 1995;Quesada et al., 2010;Selmants and Hart, 2010).
In the Walker and Syers model, non-occluded inorganic P increases initially to a peak value and then declines to very low levels during pedogenesis.However, our results showed that labile Pi and moderately labile Pi (non-occluded P in Walker and Syers' (1976) model) formed significant proportions of the total P throughout all soil orders across weathering stages.This could be due to the coarse classification of weathering stages in our study, which may be insufficient to characterize the end members of the range.This explanation is supported by the small proportion of primary mineral P in the slightly weathered soil and the moderate amounts of primary P remaining in strongly weathered soils.In addition, the theory of P distributions along soil development stages stems largely from relatively isolated island locations in New Zealand (Walker and Syers, 1976) and Hawaii (Crews et al. 1995).However, in most other places in the world there is higher dust deposition from surrounding land masses, which is a source of primary P to even highly weathered soils (Vogel et al., 2021).Nevertheless, the contribution of dust deposition to primary P and other forms of P in soil remains unquantified in most land areas.

Effects of soil depth on P pools
We found that soil P pool concentrations varied significantly with soil depth.Total soil P concentration is often higher in topsoil than in subsoil due to biological uplift, which was reported by previous studies (Jobbágy and Jackson, 2001;Porder and Chadwick, 2009).The labile and moderately labile P (in both inorganic and organic pools) concentrations were higher in the topsoil, which can also be explained by biological uplift and highly available P inputs from plants and dust to the topsoil.In contrast, the primary P and occluded P concentrations in the topsoil were lower than in the subsoil.This can be explained by the fact that topsoil tends to be more weathered and developed than the subsoil (Achat et al., 2012;Chen et al., 2021).

Limitations and prediction uncertainty
In our database, some regions were underrepresented (e.g., northern Canada, middle and northern Asia, and inner Africa), which may result in low accuracy of the predicted values in those regions.In the tropics, high P soils were overrepresented, and the accuracy of predicted values in tropical regions may be quite low.Our database contains 4 times as many observations from surface mineral soils (0-30 cm) than it does from soils deeper than 30 cm.As such, the predicted concentrations of different P pools for deep soils may suffer from larger uncertainties.Finally, large portions of variation remain unexplained by our models, especially variation in soil labile Pi concentrations and proportions (40 % and 52 % unexplained, respectively), indicating that other significant factors were not accounted for in our modeling.These factors may include microbial processes, Fe and Al oxide concentrations, plant community composition, atmospheric deposition, and soil erosion (Kruse et al., 2015;Achat et al., 2016).These limitations highlight the need for additional measurements, particularly from underrepresented regions and the subsoil, as well as measurements of closely associated variables, especially those related to labile Pi.

Conclusion
Here, we compiled the largest database to date of different soil P pools.Using machine learning modeling, we quantified the relative importance of multiple predictors for estimating different soil P pools and estimated these pools at the global scale.Our results indicated that the global concentrations of soil labile Pi, labile Po, moderately labile Pi, moderately labile Po, and occluded P could be generally predicted mainly by the total soil P concentration, while primary P concentration was mainly predicted by soil pH and total soil P concentration.For predicting proportions of different P pools, soil pH, and to a lesser extent soil depth, SOC and total P were the most important predictors for all P pool proportions at the global scale.In addition, our results also revealed significant effects of climate and other edaphic factors on spatial variation in P pools.We concluded that edaphic properties and climatic factors were significant predictors of soil P pools, including concentration and proportion of total P.These findings represent a significant step towards improving understanding of global variations in different soil P pools.Our global maps of predictions of different P pools will be important to improving global models of the terrestrial P cycle.

Figure 2 .
Figure 2. Distribution of site-level training data (a).The database contains 1838 observations covering 12 USDA soil orders (b) and all major terrestrial biomes (c).

Figure 3 .
Figure 3. Relative importance of variables for predicting the concentration of soil P pools quantified using random forest models.The mean decrease accuracy (%IncMSE) indicates the relative importance of each variable for predicting soil P pools.SWS: soil weathering stage.

Figure 4 .
Figure4.Global maps of P pool concentrations at depths of 0-30 cm.Note that croplands and other heavily influenced areas were not masked from the maps, so soils in these areas can be used to represent soils without extensive anthropogenic activity.

Figure 5 .
Figure 5. Average concentrations of P pools and their proportions of total soil P concentration across soil weathering stages and biomes.Labile and moderately labile Po form the organic pool.The results are based on global estimates for 0-30 cm depth.Dry vegetation combines grassland and savanna biomes to simplify the figure.

Figure 6 .
Figure 6.Coefficients of Pearson correlations among proportions and concentrations of soil P pools.The results are based on the average global predictions in the top 30 cm of soils.Coefficients with P < 0.001 are shown in black and bold.Labile Pi P. indicates the labile Pi proportion.The same meanings apply to the labile Po P., moderately labile Pi P., moderately labile Po P., primary P P., and occluded P P. Elevation is not included this plot, as it is not well correlated with P pool variation in our results.

Figure 7 .
Figure 7. Relationship between our predicted P fraction concentrations and Yang et al.'s predictions.Panels (a)-(e) and (f) depict correlations between both sets of predictions for soil labile Pi, organic P, primary mineral P, moderately labile Pi, and occluded P, respectively.The dashed lines indicate the 1 : 1 line; the blue lines indicate the regression line.

Table 1 .
Summary of training data used to predict soil P pool concentrations.P 10 and P 90 indicate percentile ranks of 10 % and 90 %, respectively.Proportions from literature (PFL) and proportions from gridded maps (PFGMs) indicate proportions of measurements from the literature and extracted from global gridded maps, respectively.
MAT: mean annual temperature; MAP: mean annual precipitation; SOC: soil organic carbon.a PFL: proportion from literature.b PFGM: proportion from gridded map.PFL and PFGM indicate proportions of measurements from literature and extracted from global gridded maps, respectively.