Use of near-infrared spectroscopy to assess phosphorus fractions of different plant availability in forest soils

The analysis of soil phosphorus (P) in fractions of different plant availability is a common approach to characterize the P status of forest soils. However, quantification of organic and inorganic P fractions in different extracts is labor intensive and therefore rarely applied for large sample numbers. We therefore examined whether different P fractions can be predicted using near-infrared spectroscopy (NIRS). We used the Hedley sequential extraction method (modified by Tiessen and Moir, 2008) with increasingly strong extractants to determine P in fractions of different plant availability and measured near-infrared (NIR) spectra for soil samples from sites of the German forest soil inventory and from a nature reserve in southeastern China. The R² of NIRS calibrations to predict P in individual Hedley fractions ranged between 0.08 and 0.85. When these fractions were combined into labile, moderately labile and stable P pools, R² of calibration models was between 0.38 and 0.88 (all significant). Model prediction quality was higher for organic than for inorganic P fractions and increased with the homogeneity of soil properties in soil sample sets. Useable models were obtained for samples originating from one soil type in subtropical China, whereas prediction models for sample sets from a range of soil types in Germany were only moderately useable or not useable. Our results indicate that prediction of Hedley P fractions with NIRS can be a promising approach to replace conventional analysis, if models are developed for sets of soil samples with similar physical and chemical properties, e.g., from the same soil type or study site.


Introduction
Phosphorus (P) limits plant growth and ecosystem productivity in many parts of the world (e.g., Elser et al., 2007; Vitousek et al., 2010). Unlike nitrogen, the world's P stores for the production of fertilizer are finite and therefore will decrease in the future (Cordell et al., 2009; Edixhoven et al., 2013). It is therefore of paramount importance to use P as efficiently as possible in agricultural and forestry production systems. In several parts of Europe and other parts of the world, concerns have been expressed that a substantial proportion of forest stands may suffer from P limitation (Fox et al., 2011; Khanna et al., 2007; Lorenz et al., 2003). A number of hypotheses have been put forward to explain this phenomenon of widespread P limitation (e.g., nitrate uptake suppressing P uptake, or soil acidification reducing the rate of organic P mineralization and increasing inorganic P sorption capacity) (Mohren et al., 1986; Paré and Bernier, 1989; Gillespie and Pope, 1991). In the long-term development of forest ecosystems following disturbance, P (in contrast to N) can only diminish through processes such as erosion and timber harvest and therefore becomes limiting to biological activity. Consequently, increasing P deficiency is also a natural process of ecosystem retrogression (Walker et al., 1976; Wardle, 2009), which seems to be accelerated by anthropogenic activities. The increasing demand for forest biomass, such as for renewable energy, might exacerbate the P nutrition problem in many forests (Vanguelova et al., 2010). Therefore, suitable methods for monitoring the current status and the medium to long-term trends of P nutrition in forest ecosystems are urgently needed.
J. Niederberger et al.: Use of near-infrared spectroscopy to assess phosphorus fractions. Published by Copernicus Publications on behalf of the European Geosciences Union.
For forest soils, often only total P contents are measured. But total P is not a meaningful variable as it is not very sensitive to environmental or management influences over time and is not indicative of the amount of plant-available P in forest soils, where P can occur in many organic and inorganic forms of different availability (Khanna et al., 2007). Especially for forest soils, it is important to differentiate between organic and inorganic P forms, since organic P, although not directly available for plant uptake, plays a major role in tree nutrition (Turner et al., 2005; Shen et al., 2011; Rennenberg and Herschbach, 2013). Analytical approaches, such as the Hedley fractionation (Hedley et al., 1982a), have been developed to quantify organic and inorganic P in fractions of different plant availability. These fractions have been successfully applied to characterize the dynamic nature of P availability in forest ecosystems (e.g., Richter et al., 2006). Several authors discussed the usefulness of this commonly used sequential extraction method to determine P in fractions of distinctly different solubility (Cross and Schlesinger, 1995; Condron and Newman, 2011; Yang and Post, 2011). The ecological relevance of chemically derived Hedley fractions has been critically discussed in some cases (Turner et al., 2005) and it is still unclear which fractions are relevant for plant growth (Rennenberg and Herschbach, 2013). However, the application of this sequential extraction method in forest soils has proven to be useful. In contrast to agricultural soils, the slowly cycling P pool in forest soils contributes substantially to plant nutrition, whereas labile P pools stay relatively constant, indicating that higher amounts of P from more slowly cycling pools "feed" the labile P pool (Compton and Cole, 1998; Richter et al., 2006). However, the Hedley fractionation method is very time consuming, labor intensive and therefore costly, which renders it unsuitable for the analysis of large numbers of samples (Cécillon et al., 2009). Therefore, it would be useful if Hedley P fractions could be determined by means of an indirect and less expensive technique. For this purpose, the use of near-infrared spectroscopy (NIRS) may be a promising approach.
NIRS is a rapid and non-destructive analytical method routinely used in a wide range of fields, mostly in quality control in the chemical, forage or food industry. In the field of soil sciences, NIRS has been successfully applied to predict soil chemical and soil biological parameters like carbon (in the form of total C, SOC, microbial C and mineralizable C) or nitrogen (in the form of total N, microbial N and mineralizable N) contents (Chang et al., 2001; Ludwig et al., 2002; Zornoza et al., 2008) and was also applied to agricultural soils to predict available P (Olsen P, Mehlich III P) (e.g., Maleki et al., 2006; Dhawale et al., 2013). An overview of reflectance spectra for different soil properties is available in the global soil visible-NIR (near-infrared) spectral library (World Agroforestry Centre - ICRAF and ISRIC - World Soil Information, 2010).
The mineral fractions of soil, which commonly constitute the major portion of soils, are usually hardly visible to NIRS (Malley et al., 2004). Despite the complexity and heterogeneity of chemical and physical soil properties, the spectral information of soils is surprisingly uniform. For example, 30 soil types of the USDA soil classification system could be grouped into only five different spectra classes (Stoner and Baumgardner, 1981). Owing to the low dipole moment between P and oxygen, phosphates cannot be excited by infrared radiation and therefore cannot be detected directly with NIRS. However, P can be detected indirectly if it is bound organically or through correlation with other soil properties, such as the C and N contents mentioned above, that are detectable by NIRS.
To develop robust NIRS models, spectral data sets of reliable quality are essential. Several publications on NIRS deal with the influence of the quality of soil samples on spectral information. Stenberg et al. (2010) and Brunet et al. (2007) discussed the influence of ground soil samples in comparison with sieved samples (< 2 mm) and found increasing model quality for more homogeneous ground samples. The quality of the data with regard to the heterogeneity of samples was addressed by Abrams et al. (1987) and Shenk and Westerhaus (1993). Global models used for the prediction of forage quality in hay, where "global" means that the model is based on the largest sample population comprising a wide variety of hay samples and hay composition from many sites, produced predictions that were as good as local models derived from one single site. However, in other studies using plant material (pine needles), both improvements and reductions in model quality for more heterogeneous data sets (addressed as global) when compared with homogeneous (local) data were observed (Gillon et al., 1999). Reductions in model quality were observed for the prediction of C content in sample sets with high variation in chemical composition associated with high spectral variation. Since forest soil samples show high variation in chemical composition and spectral properties, building robust models for them might be challenging. The problem of detecting soil characteristics using NIRS methods is not new (Stoner and Baumgardner, 1981; Ben-Dor and Banin, 1995; Udelhofer et al., 2003; Cécillon et al., 2009; Genot et al., 2011). To the best of our knowledge, up to now only few publications (Guerrero et al., 2010; Sankey et al., 2008) have addressed the problem of NIRS model calibration and prediction quality for forest soil samples and none of these has focused on soil P.
Currently, there are only a few NIRS models available to predict soil P parameters. These include NIRS prediction models for Mehlich-III extractable P (Chang et al., 2001) and for total P and exchangeable P (Malley et al., 2004). Hitherto, there have been no studies focusing on the use of NIRS to predict organic and inorganic Hedley P fractions in forest soils. This study aimed to examine the general ability of NIRS methods to assess these P fractions. Specifically, we addressed the following hypotheses:
1. Hedley fractions of different P availability can be sufficiently well predicted through NIRS models. The criteria used to quantify the quality of NIRS models will be introduced in the "Material and methods" section.
2. The quality of predictions through NIRS models is higher for organic than for inorganic P fractions.
3. The quality of predictions of P fractions depends on the variability in phosphorus content and soil properties of soil sample data sets.

Soil samples
[...] WRB, 2006). The total P content covered by the BZE samples ranged from 58 to nearly 2800 mg kg⁻¹ (Table 1).
The other half of the soil samples originated from the international research project "Biodiversity and Ecosystem Functioning in China (BEF China)" (Bruelheide et al., 2011). The 294 soil samples from the BEF China project were collected from 27 so-called comparative study plots (CSPs, 30 × 30 m) in the Gutianshan National Nature Reserve, which has a size of 81.07 km² (mean distance between all CSPs = 3.4 km; min = 0.04 km; max = 8.98 km; Eichenberg, 2015). At all 27 CSPs, the soil type was a Cambisol (reference soil group after WRB, 2006) comparable to the "typical brown earth" after the German classification. Thus, the 294 BEF samples represented a more homogeneous sample set than the BZE samples. At each CSP, nine soil cores that were evenly distributed across the plot were collected. One bulk sample was then combined from the nine cores for each of the three depth increments (0-5, 5-10 and 10-20 cm), i.e., 81 samples in total. At 12 CSPs, an additional 37 samples were collected from the three topmost diagnostic soil horizons, predominantly Ah and Bw horizons (at one plot, the four topmost horizons were sampled), of representative soil profiles of the CSPs (ranging in depth between 47 and 120 cm). In addition, 176 samples (0-20 cm) were collected in four replicates from 44 so-called neighborhood diversity tree clusters, which were located in the proximity of some of the permanent CSPs. Each of the four samples per tree cluster was also a composite sample from three cores. Each composite sample represents different conditions within the cluster; they were collected at the base of individual trees belonging to different or the same tree species and in the center of a triangle between these trees. We cannot rule out a spatial correlation between these samples. Total P content of all BEF samples ranged from 26 to 330 mg kg⁻¹ (Table 1).
Both sample sets, BZE and BEF, contained samples that originated from different depths but the same location (soil cores). Since these samples might not have been spatially independent, the autocorrelation among samples could have influenced the quality of NIRS models. For example, the validation may not be truly independent if calibration and validation sample sets contain samples that are systematically correlated with each other. To address this issue, we calculated the Durbin-Watson coefficient (d) as a measure of autocorrelation between measured and predicted values and compared this coefficient between sample sets from all soil depths (containing samples from the same location) and from one soil depth only (spatially independent). This test did not indicate autocorrelation within our sample sets (d values between 1.75 and 2.23). In addition, we found no differences in the Durbin-Watson coefficient between all BZE samples (depth: 0-5 and 10-30 cm) and BZE sample sets from one depth only. Since there was no indication of autocorrelation between samples of different depths, we included all samples in our calibration and validation steps.
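The Durbin-Watson check described above can be illustrated with a minimal sketch. It assumes that d is computed from the residuals (measured minus predicted values) in a fixed sample order; the function name and the example values are hypothetical and are not taken from the study. Values of d near 2 indicate no first-order autocorrelation, values towards 0 indicate positive and values towards 4 negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d for an ordered sequence of residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between successive residuals
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    # Sum of squared residuals
    denominator = sum(r * r for r in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Hypothetical measured and NIRS-predicted P contents (mg per kg)
measured = [120.0, 95.0, 210.0, 180.0, 60.0, 150.0]
predicted = [110.0, 101.0, 200.0, 188.0, 66.0, 141.0]
residuals = [m - p for m, p in zip(measured, predicted)]
print(round(durbin_watson(residuals), 2))
```

Because the statistic depends on the order of the samples, the samples would need to be arranged so that neighbouring entries are the ones suspected of being correlated (for example, depth increments of the same core).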
The data for C and N content as well as the pH values of the BZE samples were provided by the Northwest German Research Station and the Forest Research Institute Baden-Württemberg (Table 2). Samples were prepared and measured according to the German forest science standard (BMVEL, 2009). Total soil C and N contents of the horizon-wise samples and the three depth increments of the BEF samples (Table 2) were determined after dry combustion (1150 °C) with an element analyzer (Elementar Vario EL III) (DIN ISO 10964). Since the material is non-calcareous, to[...] (2.5 mL : 1 g water solution to soil) with a blueline electrode and pH meter (Schott, Germany). All soil samples were dried at 40 °C for 24 h and sieved (< 2 mm). For C and N analyses, samples were finely ground. The intensity of sieving or grinding has an impact on the reproducibility and the distribution of extractable P over the fractionation process (Tiessen and Moir, 2008) and on the NIR spectra (Brunet et al., 2007). Because P extractability is more "natural" in sieved than in ground samples and because soil aggregates are ecologically important, we used < 2 mm sieved soil samples for the determination of P fractions.
Before measurement of the NIR spectra, the samples were ground to powder with a mixer mill MM400 (Retsch, Germany) and dried overnight at 40 °C.

Phosphorus fractionation
Soil samples were analyzed by the sequential extraction method according to Hedley et al. (1982b). We used 24 samples in one batch consisting of 23 individual samples including six replicate samples (as random quality check) and one in-house soil standard (as standard quality check). In the protocol applied to the samples, 0.5 g soil was extracted by different extractants with increasing strength (Fig. 1), starting with distilled water containing an anion exchange resin (resin; Dowex 1 × 8, 20-50 mesh, Sigma-Aldrich, Taufkirchen, Germany), followed by 0.5 M sodium bicarbonate (NaHCO3); 0.1 M sodium hydroxide (NaOH); 0.1 M sodium hydroxide with ultrasonic treatment (S-NaOH; ultrasonic bath RK510H, 35 kHz, 23 W L⁻¹, Bandelin, Berlin, Germany); 1 M hydrochloric acid (1 M HCl); and a final acid digestion (residual). Tiessen and Moir (2008) introduced an additional extraction step with hot concentrated hydrochloric acid (HCl conc.) before the final digestion. Our fractionation scheme considered all of these seven fractionation steps, but used 25 mL solution for the extractions and 2.2 mL concentrated nitric acid for the final acid digestion. The whole fractionation process is described in detail by Tiessen and Moir (2008).
Total P content as index value for the sum of all fractions was measured separately after nitric acid digestion, which is a standard method recommended by the German forest soil survey (BMVEL, 2009). Phosphorus was determined colorimetrically with the spectrophotometer UV mini-1240 by Shimadzu through the molybdate reactive P method described by Murphy and Riley (1962) and modified by John (1970), but keeping the ammonium molybdate concentration at 10 g L⁻¹. Every extract needs to be adjusted for pH, as development and stability of the blue phosphomolybdic complex is pH dependent (Drummond and Maher, 1995; Huang and Zhang, 2008; Tiessen and Moir, 2008). The P extracted within every step is only partly detectable as free phosphate. To determine the total phosphate, the NaHCO3, NaOH and HCl conc. extracts were oxidized with ammonium persulfate in an autoclave. The difference between total P after digestion (TP) and the free phosphate in the extract (Pi) provides the organically bound P (Po) (Rowland and Haygarth, 1997; Tiessen and Moir, 2008). The amounts of organically bound P in the resin and the 1 M HCl extracts were considered negligible (Tiessen and Moir, 2008) and therefore not analyzed. We summed up the values Pi of the NaOH and Pi of the S-NaOH fraction as well as Po of the NaOH and Po of the S-NaOH fraction because the ultrasonic treated NaOH step is chemically hardly different from the main NaOH fraction and a differentiation between these two fractions with NIRS did not seem meaningful. Similar considerations applied to the distinction between Pi and Po in the HCl conc. extract, since concentrated HCl is a strong oxidant and an additional oxidation with persulfate is chemically comparable. In our analysis, the Pi value of the HCl conc. extract lacked reproducibility among replicates compared to the TP value, which might indicate an incomplete reaction during the boiling procedure. In addition, several organic phosphates are unstable under these acidic conditions (Turner et al., 2006) and the results of Pi and Po of the HCl conc. fraction also lacked reproducibility within our standard. The total P content of the HCl conc. fractions showed good reproducibility and therefore was used for calibration.
In summary, we selected the following fractions to perform calibrations with NIRS: Pi resin, Pi NaHCO3, Po NaHCO3, Pi NaOH, Po NaOH, Pi 1 M HCl, P HCl conc., P residual. For the purpose of subsequent statistical analyses, the Hedley fractions were grouped by their solubility into labile (Pi resin, Pi and Po NaHCO3), moderately labile (1 M HCl, Pi and Po NaOH) and stable P pools (P HCl conc. and P residual) as commonly found in the literature (Cross and Schlesinger, 1994; Stevenson and Cole, 1999; De Schrijver et al., 2011; Yang and Post, 2011). Several tests were conducted to ensure that Hedley fractions were reproducible and accurately determined. One indicator of the reliability of the Hedley fractionation method is the recovery rate of total P in the sum of fractions, which was very good for the soils in this study (Fig. 2). Only for samples with high P contents, the sum of fractions may either systematically exceed the total P content or the amount of measured total P may be underestimated. Errors for samples with high P content could occur either due to necessary dilution steps for measuring fractions with very high P concentration, where small differences during colorimetric measurements lead to relatively high errors for the final value, or due to incomplete digestion of P with nitric acid. To avoid these sources of error, we performed repeated measurements of these samples, reduced the sample weight to 400 mg and increased the amount of acid to 3 mL. In addition, this problem occurred only in a very low proportion of samples (10 out of 576) and removing these samples from the data set did not change the calculated recovery rate.
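The grouping of fractions into pools and the recovery check can be sketched as follows. This is a minimal illustration, assuming that the recovery rate is the sum of all Hedley fractions expressed as a percentage of the separately measured total P; the dictionary keys, function names and example values are hypothetical.

```python
# Pool membership as described in the text
POOLS = {
    "labile": ("Pi_resin", "Pi_NaHCO3", "Po_NaHCO3"),
    "moderately_labile": ("Pi_NaOH", "Po_NaOH", "Pi_1M_HCl"),
    "stable": ("P_HCl_conc", "P_residual"),
}


def pool_sums(fractions):
    """Sum the individual Hedley fractions (mg per kg) into the three pools."""
    return {
        pool: sum(fractions[name] for name in names)
        for pool, names in POOLS.items()
    }


def recovery_rate(fractions, total_p):
    """Sum of all fractions as a percentage of separately measured total P."""
    if total_p <= 0:
        raise ValueError("total P must be positive")
    return 100 * sum(fractions.values()) / total_p


# Hypothetical sample (mg per kg); not data from the study
sample = {
    "Pi_resin": 10, "Pi_NaHCO3": 5, "Po_NaHCO3": 15,
    "Pi_NaOH": 40, "Po_NaOH": 120, "Pi_1M_HCl": 10,
    "P_HCl_conc": 60, "P_residual": 40,
}
print(pool_sums(sample))
print(recovery_rate(sample, total_p=300))
```

A recovery rate well above 100 % for a sample would correspond to the case discussed above, in which the sum of fractions exceeds the measured total P.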

Near-infrared spectroscopy
Molecules with dipole moments can be excited to stretching, bending and rotating oscillations by absorbing infrared radiation. While an excitation of fundamental oscillations occurs in the mid-infrared range of the light, the NIR spectrum of a substance shows overtone peaks and combination bands of molecular vibrations, mainly from bonds like O-H, C-H and N-H (Workmann, 2001; Malley et al., 2004; Cécillon et al., 2009). This makes NIRS especially suitable for organic compounds (Weyer, 1985; Miller, 2001). Phosphates and other inorganic P compounds are hardly detectable by NIRS due to the weak P-O dipole moment (Malley et al., 2004). The detection of nutrients such as P seems to be possible if they are correlated with other NIRS-detectable soil properties such as soil texture, water content, organic compounds, and mineralogy, in particular soil organic matter or organic carbon (Stenberg et al., 2010).
Each type of molecule has characteristic absorption bands for near-infrared radiation, but there is much interference and overlap between the overtone and combination oscillations. Therefore, molecules cannot be identified by single peaks. The whole spectral information or at least several regions of the spectrum are needed to determine a molecule (Workmann, 2001; Günzler and Gremlich, 2003; Roberts et al., 2004).
The spectral region of NIR radiation ranges from 800 to 2500 nm wavelength, directly following the visible light region. In infrared spectroscopy, the wave number ν in cm⁻¹, which is the reciprocal of the wavelength λ, is commonly used.
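The relationship between wavelength and wave number can be made concrete with a short sketch. Since 1 cm equals 10⁷ nm, a wavelength given in nm converts to a wave number in cm⁻¹ as 10⁷ divided by the wavelength; the function names below are illustrative only.

```python
NM_PER_CM = 1e7  # 1 cm = 10^7 nm


def wavelength_nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wave number in reciprocal cm."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return NM_PER_CM / wavelength_nm


def wavenumber_to_wavelength_nm(wavenumber_cm):
    """Convert a wave number in reciprocal cm to a wavelength in nm."""
    if wavenumber_cm <= 0:
        raise ValueError("wave number must be positive")
    return NM_PER_CM / wavenumber_cm


# Limits of the NIR region stated in the text
print(wavelength_nm_to_wavenumber(800))    # 12500.0
print(wavelength_nm_to_wavenumber(2500))   # 4000.0
```

The NIR region of 800 to 2500 nm therefore corresponds to roughly 12 500 to 4000 cm⁻¹.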
Since the O-H bond has a strong influence on near-infrared absorption (Schmitt, 2000; Malley et al., 2004), ground samples were dried at 40 °C and measured under constant humidity conditions. For each soil sample, five spectra were taken with a Fourier transform mid- and near-infrared combination instrument (Tensor 37, Brukeroptics, Ettlingen, Germany). Each spectrum is a mean of 64 individual scans over the range of 12 000 to 4000 cm⁻¹ wave numbers with a resolution of 16 cm⁻¹. Model development was done with the OPUS 6.5 spectroscopy software (Brukeroptics, Ettlingen, Germany). Before statistical analyses, a number of mathematical data pre-processing options were tested (Duckworth, 2004). The pre-processing options providing the best results were first derivative, vector normalization, or a combination of these two. These were used as data treatment before partial least square regressions were conducted and applied to a variety of frequency ranges to find the best fit with the value of the respective component; spacing was 1; several smoothing point values (5, 9, 13, 17, 21, 25) for the derivatives were also tested. Calibration was performed with cross-validation, a common approach for small data sets (Williams, 2001; Conzen, 2005). Here, a defined number of samples, in our case one sample, were stepwise excluded from the calibration process. The rest of the samples were used to predict the excluded samples. This procedure was performed until all samples were excluded once, and the best models to predict all samples were found (Conzen, 2005). For each data set, only 80 % of the samples were used for this procedure and the development of models; the remaining 20 % were left out for later independent validation of the models to test their prediction quality. This distribution of samples was carried out with an automatic function in the OPUS software (Conzen, 2005; Bruker Optik User Manual, 2006) selecting samples for each individual model equally over the whole range of each data set. The min and max values were always assigned to the calibration set and the samples second to min and max were always assigned to the validation set. To avoid confusion with the final validation process of independent samples, we refer to the cross-validation process as calibration.
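The sample allocation and the leave-one-out procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the selection algorithm of the OPUS software is not given in the text, so the sketch simply sorts samples by their reference value and sends every fifth one to the validation set while enforcing the stated rules for the extreme values. The function names are hypothetical.

```python
def split_calibration_validation(values, every=5):
    """Split sample indices into calibration and validation sets.

    Assumption (not the OPUS algorithm): samples are sorted by reference
    value and every `every`-th one, starting with the second smallest,
    goes to validation. The minimum and maximum stay in calibration and
    the second-smallest and second-largest go to validation.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least four samples")
    order = sorted(range(n), key=lambda i: values[i])
    validation = {order[pos] for pos in range(1, n, every)}
    validation.add(order[-2])      # second to max: always validation
    validation.discard(order[-1])  # max: always calibration
    calibration = [i for i in range(n) if i not in validation]
    return calibration, sorted(validation)


def leave_one_out(indices):
    """Yield (training indices, held-out index) pairs for cross-validation."""
    for held_out in indices:
        yield [i for i in indices if i != held_out], held_out


# Hypothetical reference values (mg per kg)
p_values = [58.0, 120.0, 340.0, 95.0, 410.0, 77.0, 260.0, 510.0, 180.0, 2800.0]
calibration, validation = split_calibration_validation(p_values)
for training, held_out in leave_one_out(calibration):
    pass  # fit a model on `training`, then predict sample `held_out`
```

With this scheme the validation share is only approximately 20 %, because the rule for the second-largest value can add one extra sample.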
Our initial intent was to generate a global prediction model for P concentration that integrated the whole variety of forest soil types. The aim was to test whether such a global model could be used to estimate P concentration of all BZE samples and for future inventories. Since this approach proved to be difficult, we also tried to develop NIRS models for a subset of the total BZE sample pool with reduced variation in soil properties including P. Therefore, we selected the BZE samples belonging to the German soil classification group brown earth. This was the only subset of soil types (using soil type as an indication of similar soil properties) in the BZE data set with a sufficient number of samples for NIRS model development. Additionally, we used soil samples from the above-mentioned BEF China project, which represented a data set from a small geographic region with similar soil properties.
To address the hypothesis that the predictive quality of NIRS models increases with the homogeneity of sample sets, we finally established calibrations for sample populations of varying heterogeneity (Fig. 3): 1. a full data set containing all soil samples with the greatest variability (n = 576); 2. the complete BZE data set with almost similar variability comprising a wide range of different soil types (n = 282); 3. the less variable BZE subset of brown earth soil samples (n = 84); 4. the BEF China sample set comprising soils from one soil type and from one small region (n = 294).
To assess whether the prediction of P by NIRS depended on direct detection of soil P, regardless of organic or inorganic form, or rather indirectly on properties of soil organic matter, we used regression analyses to see whether the quality of NIRS models to predict P could be explained by the relationships between P and soil C and N. As mentioned above, it was presumed that P bound to C or N could be modeled with better quality.
The BZE brown earth data set (set 3) with 82 samples was too small for a robust validation with external samples that were not used in the calibration. Still, we included it in our study, because it was the only meaningful and less variable subset of samples that could be separated from the whole BZE data set (set 2). The sizes of the sample sets 1 and 2 were comparable with those used in other studies to develop "global" models (Abrams et al., 1987; Shenk and Westerhaus, 1993; Gillon et al., 1999). Sample set 4 could be regarded as an extensive but local data set. A principal component analysis (Fig. 3) (OPUS 6.5 software) was carried out to assess spectral properties of the selected subsamples. The general spectral properties of the soil samples indicated that the BEF China sample set (set 4) was more homogenous than the BZE sample sets and clearly separated from the BZE sample set as a whole as well as from the subset BZE "brown earth". The spectral properties of the two subsets of BZE samples were not different.
To evaluate the prediction quality of NIRS models, the statistical parameters R² and the ratio of performance to deviation (RPD; the ratio of the standard deviation to the standard error of prediction) were used. The usefulness of these statistical parameters to characterize the quality of NIRS models has been discussed controversially (Chang et al., 2001; Williams, 2001; Malley et al., 2004; Zornoza et al., 2008). Especially for soil analysis, high values for RPD and R² are hardly achieved (Malley et al., 2004). To judge our RPD and R² values, we followed the classification suggested by Malley et al. (2004), who provided an overview and summary on this topic. We applied the following four levels to classify model quality:
- level A: R² > 0.95 and RPD > 4, excellent calibrations, useable for all applications;
- level B: R² between 0.90 and 0.95, RPD between 3 and 4, successful calibrations useable for most applications, some with caution, e.g., quality assurance;
- level C: R² between 0.80 and 0.90, RPD between 2.25 and 3, moderately successful calibrations, useable with caution, e.g., research;
- level D: R² between 0.70 and 0.80, RPD between 1.75 and 2.25, moderately useful calibrations, useable for screening of samples.
Calibrations with lower values might still be useful for rough screening purposes (Chang et al., 2001).
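The quality criteria above can be expressed as a short sketch. It rests on assumptions that the text does not specify: the standard error of prediction is taken here as the root mean square error between measured and predicted values, a model is assigned the highest level for which both R² and RPD reach the lower bounds, and values exactly on a boundary count as meeting it. Function names are hypothetical.

```python
import math


def rpd(measured, predicted):
    """Ratio of the SD of the reference values to the prediction error."""
    n = len(measured)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equally long sequences with n >= 2")
    mean = sum(measured) / n
    sd = math.sqrt(sum((m - mean) ** 2 for m in measured) / (n - 1))
    # Assumption: standard error of prediction taken as the RMSE
    sep = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    if sep == 0:
        raise ValueError("prediction error is zero")
    return sd / sep


# (level, minimum R², minimum RPD), ordered from best to worst
LEVELS = (("A", 0.95, 4.0), ("B", 0.90, 3.0), ("C", 0.80, 2.25), ("D", 0.70, 1.75))


def classify(r2, rpd_value):
    """Return the highest level whose lower bounds both criteria reach."""
    for level, min_r2, min_rpd in LEVELS:
        if r2 >= min_r2 and rpd_value >= min_rpd:
            return level
    return "below D"


print(classify(0.88, 2.83))  # C
print(classify(0.79, 2.16))  # D
```

Requiring both criteria is the stricter reading of the classification; a model with a high R² but a low RPD would be downgraded under this assumption.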
All statistical analyses shown in this study were performed with SPSS Version 20.

Soil phosphorus contents
Total P contents were higher and showed a wider range in the BZE samples than in the BEF China samples for all pools of P availability and individual fractions (Table 1). The subgroup brown earth was characterized by a large variation in total P content (130 to 2200 mg kg⁻¹; as the sum of all Hedley fractions), which had a smaller range of values compared to the pool of all BZE samples but, as indicated by the standard deviation, a similar variation in total P. The same applied to P in the labile, moderately labile and stable pools. The mean percentage of organic P, here calculated as the sum of all Po fractions without the HCl conc. fraction, was higher in the BEF China samples (57 %) than in the BZE and BZE brown earth samples (39 and 43 %). However, there was no difference in the range of organic P (10-70 % of the total) for the three data sets. The organic P fraction with the highest concentrations in all three data sets was the one extracted with NaOH (mean values for BZE 155 mg kg⁻¹, equivalent to 28 % of Pt; BZE brown earth 196 mg kg⁻¹, equivalent to 31 % of Pt; and BEF China 35 mg kg⁻¹, equivalent to 37 % of Pt). The P concentrations in the NaHCO3-extractable and the 1 M HCl-extractable Pi fractions in the BEF samples were so low that they fell below the detection limit of the Murphy and Riley (1962) method.
www.biogeosciences.net/12/3415/2015/ Biogeosciences, 12, 3415-3428, 2015

NIRS models for P fractions
Initially, NIRS models were calibrated for the complete sample set as well as for three different subsets (BZE, BZE brown earth and BEF China). The global model for all samples produced calibrations that were all below D level quality (R² = 0.08-0.68; RPD = 1.04-1.74). The BZE model, which was developed for samples from different soil types, showed a moderately useful calibration only for the P HCl conc. fraction, while all other fractions could not be calibrated sufficiently well (Table 3). For all fractions except the inorganic NaOH, better calibrations were achieved with the BZE brown earth subset than with the complete BZE data set. However, calibration models achieved only D level quality for two of the fractions (Po NaOH, P HCl conc.). The more homogenous BEF China data set yielded the best calibrations. Only the inorganic NaHCO3 and the inorganic 1 M HCl fractions were, owing to the very low contents, not calibrated successfully. Calibrations to predict the Pi resin, Po NaHCO3, Pi NaOH and P residual fractions produced models of level D quality, while calibrations for the organic NaOH and the P HCl conc. fractions yielded models of level C quality. Overall, the best calibration models were achieved for the BEF China data set, especially for the organic P fractions extracted with NaHCO3 and NaOH and for the P HCl conc. fraction (Table 3).
Models providing level D calibrations were subsequently tested with a validation data set comprising samples not included in the calibration. The validation of the level D calibration model of the BZE set for the HCl conc. P fraction was not successful. Validation of this fraction in samples of the more homogenous subset BZE brown earth yielded a model quality of level C and validation models for the P HCl conc. fraction in BEF soil samples achieved a level B quality (Fig. 4).
The validation of models for the Po NaHCO

NIRS models for P pools
Grouping of the Hedley fractions into labile, moderately labile and stable P fractions resulted in good models for the BEF China data set, while only the stable fractions of the other three data sets (BZE + BEF, BZE, BZE brown earth) could be predicted well with NIRS models (Fig. 5). Validation of the best calibration models for the stable P pool in the three data sets BZE + BEF, BZE and BZE brown earth yielded a level D model (R² = 0.79; RPD = 2.16) for the complete set (BZE + BEF) as well as the BZE data set (R² = 0.73; RPD = 1.91) and a level C model (R² = 0.88; RPD = 2.83) for the BZE brown earth data set.
Validations for the BEF China data sets achieved a level C quality for the labile and the moderately labile P pool, and level D quality for the stable P pool (Fig. 6).
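The assignment of a model to a quality level from its R² and RPD can be written as a small lookup. The thresholds below are an assumption made for illustration: they are not stated in this section, and were chosen only because they are consistent with the three validation results reported above for the stable P pool. The function name and the handling of values exactly on a boundary are likewise hypothetical.

```python
# Assumed lower bounds per quality level: (level, minimum R2, minimum RPD).
# These thresholds are an assumption, not taken from this section.
ASSUMED_LEVELS = [
    ("A", 0.95, 4.00),
    ("B", 0.90, 3.00),
    ("C", 0.80, 2.25),
    ("D", 0.70, 1.75),
]


def quality_level(r2, rpd):
    """Return the best level whose R2 and RPD minimums are both met.

    Returns None if the model falls below level D on either statistic.
    """
    for level, min_r2, min_rpd in ASSUMED_LEVELS:
        if r2 >= min_r2 and rpd >= min_rpd:
            return level
    return None


# Validation results for the stable P pool as reported in the text.
for name, r2, rpd in [("BZE + BEF", 0.79, 2.16),
                      ("BZE", 0.73, 1.91),
                      ("BZE brown earth", 0.88, 2.83)]:
    print(name, quality_level(r2, rpd))
```

Requiring both statistics to meet the threshold is a conservative choice; a model with a high R² but a low RPD is assigned to the lower level.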
To assess whether P content in Hedley fractions was determined indirectly through relationships with soil C and N, we compared NIRS model quality with the coefficient of determination of the relationships between P in Hedley fractions and soil total C and total N (Fig. 7). For organic P fractions, high coefficients of determination (R²) between P in Hedley fractions and total soil C or total N coincided with high R² values of NIRS models. In contrast, the P HCl conc. fraction showed the best model quality for all three data sets (R² = 0.72-0.85) but a very poor R² for the relationship with total C and total N. For inorganic P fractions of the BZE samples, R² values were low both for the relationships between P fractions and soil C and N and for the NIRS models (Fig. 7).
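The coefficient of determination of a simple linear regression between a P fraction and total C (or total N) equals the squared Pearson correlation of the two variables. The sketch below illustrates this calculation; the variable names and all values are hypothetical and do not reproduce data from the study.

```python
import statistics


def linear_r2(x, y):
    """Squared Pearson correlation, i.e. R2 of a simple linear regression."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical soil data for five samples.
total_c = [12.0, 25.0, 40.0, 55.0, 70.0]        # total C, g/kg
po_naoh = [30.0, 52.0, 85.0, 110.0, 150.0]      # organic fraction, mg/kg
p_hcl_conc = [210.0, 190.0, 230.0, 185.0, 215.0]  # HCl conc. fraction, mg/kg

print(f"Po NaOH vs. C:     R2 = {linear_r2(total_c, po_naoh):.2f}")
print(f"P HCl conc. vs. C: R2 = {linear_r2(total_c, p_hcl_conc):.2f}")
```

In this invented example the organic fraction follows total C closely while the HCl conc. fraction does not, which mirrors the kind of contrast plotted in Fig. 7.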

NIRS models for P fractions and pools
Calibration and validation models achieved good to satisfactory results for some but not all individual P fractions. Generally, the accuracy and reproducibility of the reference method (here Hedley fractionation) is of crucial importance for the quality of predictions with NIRS models (Williams, 2001). To increase reproducibility in our study, all samples were measured according to the same protocol in the same laboratory at the University of Freiburg. In addition, the strong agreement between total soil P and the sum of all individual fractions indicated a high reproducibility of the method. This reproducibility, however, does not provide information about the distribution of Pi and Po within individual fractions. For the individual fractions, it is widely acknowledged that significant errors may occur owing to variation in alkalinity or acidity in the extracts, leading to acid hydrolysis and precipitation of inorganic P. This can impair the accurate distinction between organic and inorganic forms of P and may thus lead to over- or underestimation of these forms within individual fractions (Magid et al., 1996; Condron et al., 2005; Turner et al., 2005; Tiessen and Moir, 2008; Condron and Newman, 2011). However, we achieved good reproducibility for Pi and Po in repeatedly analyzed samples for the NaHCO3 and NaOH fractionation steps. For example, the standard errors of 16 repeated measurements of the same sample were 5.0 % for Pi NaOH, 8.3 % for Po NaOH and 7.5 % for P total NaOH.
www.biogeosciences.net/12/3415/2015/ Biogeosciences, 12, 3415-3428, 2015
Besides these methodological limitations of the Hedley fractionation method, the level of P content in each fraction is also important. The P contents in the Pi NaHCO3 and 1 M HCl fractions of the soil samples from subtropical China were close to the detection limit. Therefore, the reference data for these two fractions could not be regarded as reliable, and hence no meaningful NIRS calibration models could be produced.
Using all Hedley fractions, the description of soil P availability can easily become confusing. Some fractions could not be calibrated owing to very low P concentrations, or could be calibrated only with very low prediction quality. In the case of the BEF China data set, grouping the fractions into labile, moderately labile and stable pools slightly increased the quality parameters R² and RPD, but the quality of models remained at the same level. These pools incorporated the inorganic NaHCO3 and 1 M HCl fractions, which could not be predicted individually. Because these fractions are included in the labile and moderately labile P pools, respectively, a reasonable prediction of the total P content of the BEF China samples may still be possible. While pooling these fractions does not reduce the work involved in the reference analysis of the Hedley fractionation, it considerably simplifies the modeling process, reducing the number of models from eight to three for each data set.
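The reduction from eight fraction models to three pool models amounts to summing the fractions of each sample before calibration. The sketch below assumes one common assignment of the eight fractions to the three pools. Only the placement of the inorganic NaHCO3 fraction in the labile pool and of the 1 M HCl fraction in the moderately labile pool follows directly from the text above; the remaining assignments, the dictionary keys and the sample values are assumptions for illustration.

```python
# Assumed assignment of the eight Hedley fractions to three pools.
# Only Pi_NaHCO3 -> labile and Pi_1M_HCl -> moderately labile are
# stated in the text; the other assignments are an assumption.
POOLS = {
    "labile": ["Pi_resin", "Pi_NaHCO3", "Po_NaHCO3"],
    "moderately_labile": ["Pi_NaOH", "Po_NaOH", "Pi_1M_HCl"],
    "stable": ["P_HCl_conc", "P_residual"],
}


def pool_fractions(sample):
    """Sum the P content (mg/kg) of the fractions belonging to each pool."""
    return {pool: sum(sample[name] for name in names)
            for pool, names in POOLS.items()}


# Hypothetical P contents of one soil sample in mg/kg.
sample = {
    "Pi_resin": 8.0, "Pi_NaHCO3": 1.5, "Po_NaHCO3": 20.5,
    "Pi_NaOH": 30.0, "Po_NaOH": 90.0, "Pi_1M_HCl": 2.0,
    "P_HCl_conc": 110.0, "P_residual": 60.0,
}

pools = pool_fractions(sample)
print(pools)
```

Fractions close to the detection limit, such as the two small inorganic values in this example, contribute little to their pool, so their measurement uncertainty has a smaller effect on the pooled reference value than on an individual calibration.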

Calibration of organic and inorganic P fractions
Whether P is in organic or inorganic form seemed to be important for the quality of predictions of Hedley fractions with NIRS. Regardless of the data set and sample origin, models predicting the organic P fractions performed better than models predicting the inorganic P fractions (Table 3). Similar results have been obtained for N (Ludwig et al., 2002). The superior quality of models for organic P fractions is related to the underlying mechanisms generating NIR spectra, because organic compounds can be excited more easily by the irradiation than inorganic ones. Spectral variation in relation to soil P content is often associated with organic matter or crystal water (Ben-Dor and Banin, 1995; Chang et al., 2001; Malley et al., 2005; Stenberg et al., 2010). Therefore, we also calibrated NIRS models for the sum of P in all organic fractions (data not shown). However, we could not observe any improvement in the prediction of the summed organic P in comparison with single organic fractions or the pools. This might be caused by the large diversity of organic P compounds (Turner, 2008). In contrast to organic P fractions, the relationships between inorganic P fractions and other NIRS-detectable soil properties were apparently not strong enough to aid the prediction of inorganic P using NIRS.
In our study, we were not able to identify spectral regions specific to a P signal, as was found in other studies (Malley et al., 2004). Therefore, we also assessed whether focusing on typical NIR spectral regions for C-H, N-H and O-H bonds could influence NIRS model quality. The organic residue attached to the phosphate group could be dominated by C-H, N-H or O-H bonds or a mixture of them. For this purpose we compared NIRS models based on optimized spectral regions (automated procedure by OPUS software), on the whole spectral range and on specific spectral regions that are known to represent C-H, N-H and O-H bonds (Conzen, 2005). We found that in all cases the spectral selection optimized by the OPUS software yielded superior models, followed by models covering the whole spectral range. Models for selected bonds were in all cases of substantially lower quality and are thus not presented in detail. Among these, the best results based on R² and RPD were obtained for O-H bonds for the Po NaHCO3 and P HCl conc. fractions, followed by models focusing on C-H bonds; the lowest quality was obtained for models focusing on N-H bonds.
There was no general pattern for the relationship between NIRS model quality (R²) for different P fractions and the coefficient of determination of relationships between soil C or N and P in these fractions. This indicated that NIRS models were based mostly on original properties of organic P compounds and that the predicted P content was not simply a function of soil organic matter content. This corroborates the underlying assumption of the Hedley fractionation that P in the individual organic fractions is of different availability owing to the different properties of the organic matter dominating the respective fraction. In turn, this indicates that total organic soil C and N can be used only to a limited extent to predict organic P in different fractions.

Homogeneity of data sets
The quality of predictions of P fractions depended on the homogeneity of individual data sets. The question of homogeneity has also been discussed in the context of global and local calibration models by Abrams et al. (1987) and Shenk and Westerhaus (1993), who found that global models, with a higher number of samples per data set, were potentially as accurate as more local calibrations for mixtures of hay consisting of different plant species and origins. Variable results were obtained for calibrations of total C, N and P between global and local models of heterogeneous plant material (Gillon et al., 1999). The global model, which combined all individual data sets of pine needle categories (needles on trees, fallen needles and litter), was more accurate than or similarly accurate to the "local" models for each category of needles (Gillon et al., 1999). However, for global models predicting carbon, a decline in model quality was observed. Also, a comparison of NIRS models for soil organic C, inorganic C and clay content showed decreasing accuracy from local to global calibration models (Sankey et al., 2008). In contrast, Brunet et al. (2007) found higher model quality for local than for global NIRS models predicting total C or total N. This improvement was stronger for ground samples (< 0.2 mm) than for sieved samples (< 2 mm). For our soil data sets, a similar increase in model quality with increasing homogeneity of sample origin was observed. The general observation that an increase in the range of values leads to an increase in the robustness of NIRS models (Abrams et al., 1987; Shenk and Westerhaus, 1993) did not apply in our study. Our results indicated that, regardless of the organic or inorganic origin or the availability of P, the quality of NIRS models to predict P was better for more homogeneous than for heterogeneous sets of soil samples. For example, by reducing the soil samples from the nationwide forest soil monitoring program to the subset BZE brown earth, we produced a more homogeneous subset that improved model calibrations for all but the inorganic NaOH fraction. However, an upgrade in model category (into category D) was observed only for the organic NaOH fraction. The most homogeneous BEF China sample set produced even better calibration models than the BZE subset, including improvements of model category (into D as well as from D to C). Since P is not "recognized" directly by near-infrared radiation but produces signals in the spectra that are related to P compounds, it appears that the signals for the same group of P compounds that relate to a specific Hedley fraction vary greatly among different soil types. For organic P, these groups of compounds are probably present in more than one fraction, and the various extracts are unlikely to be either exhaustive or unique with respect to the target compounds (Turner et al., 2003). To our knowledge, the chemical nature of the organic phosphorus within the operationally defined fractions is still poorly understood (Magid et al., 1996; Turner et al., 2005). In contrast, the inorganic P forms represented in the distinct P fractions are more specific in their chemical nature and well known (Stevenson and Cole, 1999; Tiessen and Moir, 2008). This variation of P forms within Hedley fractions may be greatly reduced by restricting the development of NIRS models to comparable soil types.
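The effect of sample-set homogeneity on calibration quality can be illustrated with a toy example. The sketch below assumes a single spectral signal and uses ordinary least squares as a simple stand-in for the multivariate NIRS calibrations used in the study; the two groups, their values and all names are invented. It shows only that a pooled ("global") calibration fits worse than group-specific ("local") calibrations when the relationship between signal and P content differs between groups.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope


def rmse(x, y, model):
    """Root mean square error of a fitted line on the given data."""
    a, b = model
    return math.sqrt(statistics.fmean(
        [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]))


# Hypothetical spectral signal (arbitrary units) and P content (mg/kg)
# for two soil groups with different signal-to-P relationships.
x_a, y_a = [1.0, 2.0, 3.0, 4.0, 5.0], [11.0, 19.0, 31.0, 39.0, 51.0]
x_b, y_b = [1.0, 2.0, 3.0, 4.0, 5.0], [22.0, 26.0, 29.0, 34.0, 38.0]

# Local calibrations: one model per group.
rmse_local_a = rmse(x_a, y_a, fit_line(x_a, y_a))
rmse_local_b = rmse(x_b, y_b, fit_line(x_b, y_b))

# Global calibration: one model for the pooled samples.
x_all, y_all = x_a + x_b, y_a + y_b
rmse_global = rmse(x_all, y_all, fit_line(x_all, y_all))

print(f"local A: {rmse_local_a:.2f}, local B: {rmse_local_b:.2f}, "
      f"global: {rmse_global:.2f}")
```

The example deliberately ignores the larger sample number of the pooled set, which in practice can offset part of this loss in accuracy.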

Conclusions
The approach to predict soil P Hedley fractions of different availability with NIRS can be regarded as successful only when certain limitations are considered. NIRS models for prediction of individual inorganic P fractions were at best moderately useful and were unsuitable for soil inventories. We found that the quality of NIRS models for prediction of P, for single fractions as well as for pools, depended on the homogeneity of sample sets. The homogeneity may be related not only to soil type (models improved after grouping by soil type in the BZE subset) but also to geology, which has a large influence on soil properties (the BEF China sample set, with uniform geology, yielded better calibration models than the BZE subset originating from different geologies). The approach may therefore be well suited for large experiments that require a large number of spatially and temporally replicated samples within well-defined soil types. For large-scale, grid-based soil inventories such as those conducted in the context of environmental monitoring programs for entire regions, like nationwide soil surveys, it appears that NIRS models to predict P fractions need to be developed for individual soil groups or other classes of samples according to specific soil properties. Because the development of NIRS models requires a considerable initial investment, these efforts are likely to pay off only for large sample numbers and in repeated inventories.
The Supplement related to this article is available online at doi:10.5194/bg-12-3415-2015-supplement.

Figure 1. Sequential extraction scheme of the Hedley fractionation modified by Tiessen and Moir (2008), grouped by pools of availability; the NaHCO3, NaOH and HCl conc. steps yield both organic and inorganic P for the respective fraction.

Figure 2. Relationship between the sum of recovered Hedley P fractions and independently determined total P (mg kg⁻¹) in soil samples.

Figure 3. Score plot of principal component analysis (total of five components derived; principal component 1 vs. principal component 2) for the three data sets BZE (white, set 2), BZE "brown earth" (grey, set 3) and BEF China (black, set 4).

Figure 4. Validation for the HCl conc. soluble P fraction of the three data sets BZE, BZE "brown earth" and BEF China (mg kg⁻¹); x axis: measured values, y axis: predicted values. Please note the difference in scale among graphs.

Figure 5. Calibration models for the four data sets BEF China + BZE, BZE, BZE "brown earth" and BEF China, pooled for labile, moderately labile (mod. labile) and stable phosphorus (mg kg⁻¹); x axis: measured values, y axis: predicted values. Please note the difference in scales.

Figure 6. Validation of BEF China calibration models with external data sets, pooled for labile, moderately labile (mod. labile) and stable phosphorus (mg kg⁻¹); x axis: measured values, y axis: predicted values.

Figure 7. Goodness of fit of calibration models (r²) for all individual P fractions versus the coefficient of determination (r²) of regressions between the P fractions and total organic carbon (a) or total nitrogen (b) in soils for all three data sets. Circles are organic fractions, squares are inorganic fractions and triangles are the HCl conc. fraction; black is BEF China, grey is BZE "brown earth" and white is BZE. The dashed line indicates the minimum model quality of r² = 0.7.

Table 1. Descriptive statistical parameters of soil P content (mg kg⁻¹) as total sum of Hedley P fractions and in pools of different plant availability, grouped by data sets used for NIRS modeling.

Table 2. Descriptive statistical parameters of carbon (g kg⁻¹), nitrogen (g kg⁻¹) and pH (H2O) in soil, grouped by data sets used for NIRS modeling.

Table 3. Quality parameters for NIRS model calibration (R², RPD) for all Hedley fractions in the three data sets: BZE, BZE "brown earth" and BEF China.
1 Model quality level D. 2 Model quality level C.