Interactive comment on “ A linear mixed model , with non-stationary mean and covariance , for soil potassium based on gamma radiometry ” by K . A

See our response to Comment 4 from Referee 1, ‘Since the model was fitted by likelihood not least-squares, a conventional R2 cannot be computed. To give the intuitive indication of the goodness of the model predictions that the referee suggests, we introduced a prediction adjusted coefficient of determination, R2 PA, see Equation (10) and Table 1. These show that almost 75% of the variability in the validation data is accounted for by the predictions from the models using the radiometric data as a covariate.’


Introduction
Soil information is costly and relatively sparse, so there is interest in methods to predict soil properties at unsampled sites from a set of sampled data.Generally the precision of such predictions can be improved if covariates, which reflect factors of soil formation, are incorporated into the predictor.There are various ways to do this.One of the most efficient, when direct observations of the target variable are distributed with reasonable coverage over the area of interest, is the empirical best linear unbiased predictor (E-BLUP) based on a linear mixed model (see below) in which the relationship between the target variable and the covariate is expressed as a linear function of the covariate, and the residual variation is expressed as a combination of a spatially correlated random effect (a random variable), and identically and independently distributed random error.The E-BLUP is, in effect, a combination of a regression-type prediction from the covariation and a kriging-type prediction from the random effect.This was discussed in the context of soil information by Lark et al. (2006), and the approach has been applied in various studies (e.g., Chai et al., 2008).In addition to the predictor, a prediction error variance is also computed which provides a measure of the uncertainty of the prediction.
In the standard LMM-E-BLUP approach to spatial prediction, as in all geostatistical methods, there is a necessary assumption of stationarity in the covariance of the random variation.For geostatistical prediction we require the covariance between the random effect at pairs of sites (including sites where we have data and sites where we require predictions).Since our data provide us, in effect, with a single realization of this random variable, the covariances cannot be obtained directly.Some generalization about the random variation is therefore necessary, so as to allow the parameters of the LMM to be estimated.Commonly we assume stationarity in the covariance (second-order stationarity).Under this assumption the variance of the random variable is the same at any two locations, s i and s j , and the correlation between the variable at the locations is a function of the lag vector between them, ρ(s i − s j ). (1) This assumption makes it possible to model the covariance of the random effect with some appropriate parametric function.
The parameters of this function are best estimated by residual maximum likelihood (REML).However, the assumption of stationarity in the covariance is not usually plausible when applied to properties of the soil, particularly across complex landscapes.It must first be noted that the assumption does not apply to data, but rather to a random function of which it is assumed that the data are a realization.However, behaviour of the data may indicate whether or not the assumption is plausible.When a variable appears substantially more heterogeneous in one part of the landscape than in another, or when the dominant spatial scale of variation in one part of the landscape differs from another, then this casts doubt on the plausibility of stationarity assumptions.This has been discussed by, for example, Voltz and Webster (1990) who examined variograms for soil properties over contrasting Jurassic strata in central England, included in a single data set, and found pronounced differences.Analysis of soil data sets using wavelet transforms (e.g., Lark and Webster, 2001) similarly cast doubt on the plausibility of stationarity assumptions.
Does the failure of this assumption matter?Lark (2009) compared two LMM for a soil data set.In one of these a conventional assumption of stationarity in the variance was made.In the second a non-stationary variance model was fitted (although the underlying autocorrelation was stationary).Lark (2009) showed that predictions made under these two models were very similar, but that the prediction error variances derived from the non-stationary model gave a much better description of the errors in predictions at validation sites.(If the autocorrelation appeared to be non-stationary, then this might affect the predictions themselves).
It would therefore seem worthwhile to attempt to model non-stationary covariation of soil properties as a basis for spatial prediction.For this reason Haskard and Lark (2009) presented a development and case study of the method of spectral tempering proposed by Pintore andHolmes (2004, 2005).This method is described in more detail below.Essentially it allows both the variance and autocorrelation of the property of interest to adapt locally in response to a set of covariates.Thus the variable might appear 'smoother' in some regions than another.Haskard and Lark (2009) also found that the prediction error variances based on this nonstationary model gave a better account of the uncertainty of predictions at validation sites than did a simpler stationary linear mixed model.
A covariate which has been widely used for the prediction of soil properties is passive gamma ray emissions from the earth's surface (McKenzie and Ryan, 1999;Wilford, 2008).
These emissions arise from the decay of particular elements in the upper 35 cm or so of the soil.Gamma rays are emitted over an energy spectrum, which can be partitioned into bands dominated by particular decays.Three bands which are widely used in soil studies are those associated with potassium, uranium and thorium.Much of the application of gamma radiometry for prediction of soil properties has taken place in Australia.Here the total airborne radiometric signal -or the ratios between potassium and thorium concentrations -relates to the age of the weathered material at the land surface and provides information on soil texture properties (Taylor et al., 2002).Radiometry has therefore proved a useful tool to map complex patterns of soil variation that arise from erosion and deposition of material over long periods of time.Rawlins et al. (2007) undertook a statistical analysis of the radiometric potassium signal from the soil surface in a part of eastern England, and showed that its variation could be attributed to a range of sources including the total potassium content of the soil determined from samples collected in the field.This suggested that gamma radiometry is a potentially useful source of soil information in the relatively young soils of the United Kingdom.This paper addresses the following question.If we have a data set on soil potassium as a basis for geostatistical prediction at unsampled sites, can we beneficially use airborne radiometric data to model both the mean and covariance of the target property in an appropriate LMM?By doing so, we may be able to improve both the precision of the predictions of soil potassium content and the validity of the prediction error variances.
2 The statistical model

The stationary model, estimation and prediction
The analyses in this paper are based on the linear mixed model (LMM) in which our data are assumed to be a realization of a random variate, with a single value at any location, where z is the n × 1 vector of observed values at locations s 1 ,s 2 ,...s n respectively, τ is a t × 1 vector of fixed effects, such as to allow for a smooth spatial trend or other external effects, with corresponding n × t design matrix X, u is a n × 1 vector of zero-mean random effects and e is an n × 1 vector of independent random errors, one element for each observation.In the application of the LMM to spatial data the random component u models the spatially-correlated component of variation and e is the so-called 'nugget effect' which incorporates uncorrelated measurement error and sources of variation in the target property that are uncorrelated over the shortest interval between observations.We assume that the random components u and e are mutually independent.The variance matrix of u is G (n × n) whose elements depend only on the sample locations under the assumption of a stationary covariance function, with a few parameters.These parameters are variances, and spatial parameters that characterize the spatial correlation function introduced in Equation (1) above.The random vector e has zero mean and covariance matrix R (n × n).The matrix R is diagonal when e denotes a nugget effect and R = σ 2 nugget I when it is assumed that the nugget effect has stationary variance σ 2 nugget .The variance matrix of the random vector z is denoted H = G + R.
Because the mean of z in Equation ( 2) is given by Xτ it is not necessarily assumed to be stationary, and the fixed effects could describe pronounced spatial trends.However, in standard applications of LMM to spatial data it is assumed that the covariance of z is stationary, as described above.In this study we used an isotropic exponential correlation function for the variable u.If the variance of u is σ 2 , then, with a stationary exponential model, element {i,j} of the covariance matrix G, which is the covariance between the values of z at the two locations s i and s j , is given by where φ is a distance parameter.Note that in this isotropic model the separation between the two locations is expressed as a (scalar) distance, but the model can be extended to an anisotropic case in two or more dimensions (Haskard et al., 2007).When such a stationary model has been specified then the variance parameters (σ 2 nugget , σ 2 and φ in this case) can be estimated by residual maximum likelihood (REML), and these estimates can then be used to obtain the best linear unbiased estimator (BLUE) of the fixed effect coefficients -τ in Equation (2) -by generalized least squares (details are given by Lark et al., 2006).
Once the parameters of the LMM are estimated then the empirical best linear unbiased predictor (E-BLUP) can be computed at unsampled sites where the fixed effects are known.Let there be p such sites, for which the fixed effects are contained in the p × t design matrix X p .We require the p × 1 vector of E-BLUPs which is where τ is the BLUE of the fixed effects vector τ , ũp is the E-BLUP of the spatially-correlated random effects vector u p at the prediction locations, and ẽp is the E-BLUP of the nugget effect, which is zero at unsampled locations.The E-BLUP ũp is given by where G po = Cov{u p ,u}, the elements of which can be computed given the REML estimates of the variance parameters, and . This shows that the E-BLUP at an unsampled location consists of a regression-type component (X p τ ) and a kriging-type component (ũ p ).
The PEV is the variance of the prediction error, Var{z p − z p }, which can be obtained from where G pp = Var{u p }, R pp = Var{e p } (again, obtained from the REML estimates of the variance parameters) and A is the coefficient matrix from the mixed model equations Again, more detail is provided by Lark et al. (2006), and the reader looking for a more extensive treatment is referred to Stein (1999)

The non-stationary model
In this paper we use the modified and extended version of the spatially adaptive spectral tempering procedure (Pintore andHolmes, 2004, 2005) that we introduced and described in full elsewhere (Haskard and Lark, 2009).We outline this procedure below.Consider a case where we are interested in a total of n t = n + n p locations, at n of which we have direct observations of our target variable as well as covariates for the fixed effects, and at the other n p of which we know the values of the covariates for the fixed effects and require predictions of the target variable because it has not been measured there.We may propose an initial stationary variance model for our variable, and from the parameters of this compute a n t × n t covariance matrix, C. A principal components analysis of this matrix yields what is known as its spectral decomposition where v 1 ,v 2 ,...,v nt are the n t eigenvectors, and λ 1 ,λ 2 ,...,λ nt are the corresponding eigenvalues.These latter constitute an empirical spectrum of the data.The eigenvectors can be regarded as a basis for the data, that is to say we can represent the variable as a weighted sum of the eigenvectors.If we examine the eigenvectors it can be seen that some correspond to high-spatial frequency (short-range) components of the variable where as others account for low frequency components, broader trends.The eigenvalues, as an empirical spectrum, describe how the overall variance of the variable is partitioned between these components.
Tempering is a method to modify the form of the spectrum, by raising its components to a power η.Tempering can therefore change both the overall variance of a variable, and its distribution between spatial scales.The latter is equivalent to a modification of the spatial autocorrelation of a variable.In spectral tempering the power η is modelled as a function of location η(s).This allows the covariance of the variable to change with location, to give a non-stationary covariance matrix where η(s i ,s j ) = 0.5η(s i ) + 0.5η(s j ).
In our modified spectral tempering procedure the function η(s) is used to adapt the local autocorrelation of the variable (we refer to this as the 'smoothness' of the variable), and the spatially correlated variance (we refer to this as the spatial variance) and the nugget variance are modelled separately, also as functions of spatial coordinates.The variance of the random effect at location s i under this non-stationary model is given by σ 2 κ(s i ), where σ 2 is the variance in the initial stationary model.Similarly the nugget variance at location s i is given by γ 2 (s i ).Clearly these functions must return positive values for all n t locations.Once some general parametric forms for the functions η(s), κ(s) and γ 2 (s) have been proposed the parameters are estimated by REML.The residual log-likelihood can be computed for any proposed set of parameters.First we obtain a provisional non-stationary covariance matrix C NS , by tempering the empirical spectrum obtained from initial stationary covariance matrix C.This is done with Equation (8).This preliminary matrix is then rescaled to a non-stationary correlation matrix, B, by dividing each element by the square-root of the product of the corresponding elements on the main diagonal, and then the non-stationary matrix of the random effect G NS is obtained by where κ = [κ(s 1 ),κ(s 2 ),...] T and σ 2 is the initial stationary variance.The matrix G NS , along with the non-stationary nugget variances in γ 2 , constitutes the non-stationary variance component of a linear mixed model.The resulting non-stationary variance parameters can be used to compute all the variance matrices required to compute the E-BLUP at the n p sites where we require predictions of the target variable, and its associated prediction error variance.
3 Soil data and their analysis

Stationary variance models
Our data are obtained from 890 locations in a region of approximately 2400 km 2 in north-east England, collected by the G-BASE project (Johnson et al., 2005) and described in detail by Rawlins et al. (2007).At each location we had data on total potassium content of the soil, K soil , (depth 35-50cm).We also had a corresponding value of the radiometric potassium variable, K Γ , extracted from the gamma emission spectrum measured by a 256-channel Picodas PGAM 1000 (Model 6.11) airborne spectroradiometer.The ground footprint corresponding to each measurement was an ellipse with a long axis length of approximately 200 m.
Five cores were taken at the centre and vertices of a square, length 20 m, centred at the nominal location of the data point.The cores were combined and the bulk sample then dried and disaggregated, sieved to pass 150 µm then coned and quartered and subsampled to produce a 50-g subsample which was ground in an agate planetary ball mill.A further subsample was then analysed for total content of a range of elements, including potassium, by wavelength dispersive Xray fluorescence spectrometry (XRFS), described in detail by Johnson et al. (2005).
We subdivided this data set at random into a set of 222 observations for modelling and 661 at which predictions would be obtained for validation.We restricted the size of the modelling data set to about a quarter of the available values since, if the data from which the predictions are computed were too dense, the prediction error variances would only be sensitive to the spatial covariance of the random effects over the shortest lags.Seven observations were removed because they took extreme values (large and zero).After exploratory analysis we decided to transform both the total potassium content and the gamma potassium values to natural logarithms, since under these transforms an ordinary least squares regression of logK soil on logK Γ gave residuals with a histogram (see Figure 1) which appears consistent with an assumption of a normally distributed random variable.The exploratory statistics of the residuals are also consistent with an assumption of normality.The coefficient of skewness is 0.13 and the excess kurtosis is only 0.18.Furthermore the mean and median (0.001 and -0.002 respectively) are very similar, and the first and third quartiles (-0.694 and 0.744 respectively) are symmetrical about the median.Figure 2 shows the scatter plot of the log-transformed variables, and Figure 3 shows a classified post-plot of the residuals, i.e. a plot of the coordinates of the sample points with the value of the residuals indicated by the size of the symbols.While an assumption of normality of the residuals is plausible from the histogram (Figure 1), it is apparent from Figure 2 that the scatter of the residuals appears larger when the radiometric potassium variable is small.This is not compatible with the assumption of a stationary variance, although it must be remembered that stationarity is not a property of data, but of the model that we postulate as underlying the data.Such exploratory analysis can therefore only be indicative.Figure 3 is suggestive of spatial dependence in the residuals, since those of similar size appear to be clustered.
Exploratory analysis also showed that there was no pronounced anisotropy in the residuals, so we elected to use an isotropic variance function.We also found that an exponen-tial stationary covariance function gave the best fit in a linear mixed model in which the fixed effect structure was a linear function of logK Γ .We used this stationary covariance function to obtain the E-BLUP and corresponding PEVs for logK soil at all the 661 validation sites.This stationary function was also used to provide an initial stationary empirical spectrum for spectral tempering to give a non-stationary variance model.
In addition we fitted a linear mixed model in which the only fixed effect was an overall mean, specifying a stationary exponential covariance function for the random effect.This was also used to compute the corresponding E-BLUP and PEVs of logK soil at the validation sites.This provides a basis for assessing the utility of K Γ as a covariate for spatial prediction of K soil .

Non-stationary variance models
In this study we hypothesized that the covariance of transformed K soil , as well as its local mean, could be predicted from K Γ .This hypothesis is compatible with our comments on Figure 2 above.We therefore proposed linear functions of logK Γ to obtain the expressions required for non-stationary variance models: η(s), κ(s) and γ 2 (s) so that, for example The overall variance models are labeled by a threecharacter code, with the three characters denoting, respectively, smoothness, spatial variance and nugget variance.In each position, 'L' denotes a linear function in logK Γ (two parameters to estimate), '1' denotes a constant only (one parameter to estimate), and '0' denotes fixing the function at the value of the initial stationary variance model (no parameters to estimate for this term).The option '0' was never considered for the spatial and nugget variance components.When selected for the smoothness it forces the variance model to be exponential with the same distance parameter throughout the region.The spatial variance and nugget variance may still be varied.
Note that the initial stationary model requires an estimate for the nugget variance and for the variance of the spatial variance, so can be seen to require estimation of two variance parameters, corresponding to model-code 011, conditional on the initial stationary variance model.(If we allow the method to estimate a common spatial variance and common nugget variance, with η(s) fixed at 1, it should yield those optimal values we obtained when fitting the initial stationary model.However, if η(s) is different from 1, whether constant or otherwise, different variance estimates would be optimal).
A fully non-stationary variance model (LLL) would have six parameters, conditional on the initial stationary empirical spectrum, by comparison to a stationary model with two (011).Models of intermediate complexity are also possible, so we require a method for model selection.In our previous paper (Haskard and Lark, 2009) we proposed a procedure to decide whether or not additional variance parameters are justified by the improved fit that they achieve.It is important to note that this is a procedure for comparing alternative variance models, it is assumed that the fixed effects structures of all the models are identical, for example a linear function of logK Γ .In this procedure we start with the full nonstationary variance model (six parameters).The stationary model can be regarded as a particular case of the full nonstationary model, nested within it, in which the parameters take particular values.There are various pathways from the full model to the stationary model through successively simpler models each of which is nested within the (more complex) models above it on the pathway.This is represented in a lattice diagram -see Figure 4, Haskard and Lark (2009).Having estimated the residual log-likelihood for all the models on the lattice, we may then consider in turn each alternative to the full model that is one step down each of the possible pathways.Whether the more complex model is justified can be decided by testing twice the log-ratio of the likelihoods of the two models against χ 2 with N degrees of freedom where the complex model has N more parameters than the simpler.If the null hypothesis is accepted the more complex form of the model is not significantly better than the one a step below it, and the step down the path is justified.By repeating this procedure down all the pathways until the null hypothesis for a particular comparison is rejected (P < 0.05), we may obtain a set of candidate models.These are not necessarily nested, if not they can be compared with respect to the Akaike information criterion, twice the number of parameters minus twice the log-residual likelihood, which will be smaller for the more parsimonious model (Akaike, 1973).
At each validation location we therefore had a prediction and PEV from each of: (i) a stationary model in which the only fixed effect is the overall mean, (ii) a stationary model in which the overall mean and coefficients of K Γ are fixed effects, and (iii) a set of non-stationary models in which the overall mean and coefficients of K Γ are fixed effects and the variance model has differing degrees of complexity.We computed a set of validation statistics from observations and predictions at these sites.In general if z(s i ) is the observed value of logK soil at the ith validation location out of n p , and Z(s i ) is the E-BLUP, given some particular LMM with σ 2 p the corresponding PEV, then the prediction error (P E) is and the standardized squared prediction error (SSP E) is We computed the mean values of P E and of P E 2 over the 661 validation sites.As a measure of the quality of predictions we computed what we call the prediction adjusted coefficient of determination, R 2 PA where where P E 2 and s 2 are, respectively the mean value of P E 2 and the sample variance of the validation data set.If the predictions accounted for all the variation in the validation data (i.e. they are all exact) then R 2 PA = 1.If the predictor is no better than the sample mean then R 2 PA = 0, and a very poor predictor (e.g. a biased one) could have R 2 PA < 0. The SSP E is a measure of the validity of the PEV.The expected value is 1 when the PEVs are reliable.However, it has been found (e.g., Lark, 2009) that the median SSP E which is a more robust statistic may be more useful than the mean for assessing PEVs, since it is less affected by a few validation points at which the SSP E is very large or small.Under an assumption of normal prediction errors the expected value of the median SSP E is 0.455.
We computed an empirical distribution of the mean and median SSP E under conditions where the same distribution of observation and validation sites are used and the data are a realization of a known spatial model in which the fixed effects are a linear function of the overall mean and logK Γ and the random effects have a stationary distribution with parameters equal to those fitted in our model (0,1,1).We computed 1001 realizations of this process.The central 95% of the distribution of the mean SSP E was 0.881 to 1.126, with median 0.996, while the central 95% of the distribution of the median SSP E was 0.375 to 0.545, with median 0.454.

Results
Figure 4 displays a hierarchy of non-stationary variance models that were fitted, with lines connecting models that are nested and therefore for which likelihood-ratio tests can be carried out.P -values are displayed on the connecting lines when they are less than 0.05.On the basis of residual log-likelihood, two candidate models were identified.The first was 01L with constant variance, with correlation fixed at exponential (i.e.not spatially adapting) and, with the nugget variance changing spatially, depending linearly on logK Γ .The second candidate model was L11, which keeps both the spatial variance and the nugget variance constant but allows the smoothness to adapt spatially as a linear function of logK Γ .However this required four parameters to be estimated, and while the two models L11 and 01L cannot be compared by a log-likelihood test because they are not nested, on the basis of the Akaike Information Criterion the latter was preferred.
Table 1 shows validation results for the E-BLUPs based on LMMs with stationary variance models, and a selection of cases with non-stationary models.Note first that there was a reduction in the mean PE 2 on using logK Γ as a fixed effect for prediction (R 2 PA increased from 0.67) but very little difference among the models with this fixed effect with respect to PEV and PEV 2 , for most of the models R 2 PA = 0.74 .In short there was little difference, with respect to the precision of the predictions, between E-BLUPs with logK Γ as a fixed effect and stationary and non-stationary variance models, even in non-stationary models where the smoothness (and so the autocorrelation) was adapted.However, there was an effect on the quality of the computed PEV.Note that the median SSP E for the BLUPs from the stationary model (with logK Γ as a fixed effect) was smaller than the 2.5 percentile, suggesting that the error variance was overestimated.This is consistent with results of Lark (2009) and Haskard and Lark (2009), who found that the median SSP E of E-BLUPs from stationary models indicated that the PEVs were overestimated.However, some of the non-stationary models for which validation results of the corresponding E-BLUPs are presented, show mean and median SSP E much closer to the expected values.

Discussion and conclusions
These results show that improvements in the precision of spatial prediction of soil potassium (log-transformed) of about 20% (mean square error) were achieved by using the radiometric potassium signal as a covariate.It is also seen that the PEV of E-BLUPs based on a model with a stationary variance structure, in which this covariate is used, appear to overestimate the uncertainty of the predictions.When E-BLUPs were computed using LMMs based on non-stationary covariance models the PEVs were better as judged by the mean and median SSP E. However, it must be noted that the LMM selected on the likelihood criteria and AIC (model 01L), only achieved a small improvement over the stationary model, as judged by the median SSP E. The best results on SSP E were obtained with model L11, in which the smoothness was adapted, but this model would not be selected on the fitting criteria alone, since its advantages over 01L with respect to the likelihood were not large enough to justify the inclusion of an additional parameter.
Our results therefore indicate that across this region some model for non-stationarity in the variance is appropriate, if we want to put reliance on the PEV of E-BLUPs.They also suggest that spatially adaptive tempering, with K Γ as a predictor for the tempering parameter η and local variances, has potential to improve modelling of the variance of soil potassium.Application of our approach to other landscapes would help to determine whether more complex, nonstationary variance models based on suitable covariates can improve prediction of soil properties at unsampled sites.
There are clearly questions for further research, prompted by the fact that the best non-stationary variance model (selected on likelihood criteria) did not give the best PEVs when these were validated on independent data.It may be that the dependence of the non-stationary parameters on the predictor could be better modelled than by the simple linear functions that we used here.Improving this modelling would require the development of some appropriate method for exploratory analysis of the data.While we might hope that the model selection on the likelihood criteria for the modelling data subset would identify the model which subsequently proves best on the validation data, it isby no means guaranteed.An alternative procedure for model selection might be developed, based on prediction efficacy with a separate validation data set.Alternatively a cross-validation or generalized cross-validation procedure (Marcotte, 1995) could be used for model selection.
To conclude, we have shown that the adapted and extended spectral tempering method that was presented by Haskard and Lark (2009)) can be applied to large two-dimensional data sets.We have shown that the use of gamma radiometric data can improve spatial prediction of a soil property, and that in principle using spectral tempering can improve modelling of the spatially-dependent variance of this property, and so the PEVs of the predictions.Further work remains to be done on the problem of model selection for non-stationary variance structures.

Figure captions 1 .
Figure captions1.Histogram of the residuals of an exploratory regression of logK soil on logK Γ .2. Scatterplot of logK soil against logK Γ .3. Postplot of the residuals of the exploratory regression of logK soil on logK Γ .4. Lattice diagram of alternative non-stationary covariance models.

Table 1 .
Mean and median squared predictions errors and standardized squared prediction errors for selected non-stationary variance models for the BGS potassium data.Stationary covariance; b Non-stationary covariance, nugget variance adapts; c Non-stationary covariance, smoothness adapts; d Nonstationary covariance, nugget and smoothness adapt; e Non-stationary covariance, spatial variance and smoothness adapt; f Non-stationary covariance, nugget and spatial variance and smoothness adapt. a