Modelling soil bulk density at the landscape scale and its contributions to C stock uncertainty

Soil bulk density ( Db) is a major contributor to uncertainties in landscape-scale carbon and nutrient stock estimation. However, it is time consuming to measure and is, therefore, frequently predicted using surrogate variables, such as soil texture. Using this approach is of limited value for estimating landscape-scale inventories, as its accuracy beyond the sampling point at which texture is measured becomes highly uncertain. In this paper, we explore the ability of soil landscape models to predict soil Db using a suite of landscape attributes and derivatives for both topsoil and subsoil. The models were constructed using random forests and artificial neural networks. Using these statistical methods, we have produced a spatially distributed prediction ofDb on a 100 m× 100 m grid, which was shown to significantly improve topsoil carbon stock estimation. In comparison to using mean values from point measurements, stratified by soil class, we found that the gridded method predicted Db more accurately, especially for higher and lower values within the range. Within our study area of the Midlands, UK, we found that the gridded prediction of Db produced a stock inventory of over 1 million tonnes of carbon greater than the stratified mean method. Furthermore, the 95 % confidence interval associated with total C stock prediction was almost halved by using the gridded method. The gridded approach was particularly useful in improving organic carbon (OC) stock estimation for fine-scale landscape units at which many landscape–atmosphere interaction models operate.


Introduction
Bulk density (D b ) is defined as the oven-dry mass per unit volume of a soil (IUSS Working Group, 2006).It dictates water and solute movement through the soil, can be indicative of soil quality for agriculture, and is vital for soil carbon and nutrient stock assessment (Bellamy et al., 2005;Ungaro et al., 2010;Martin et al., 2011).After the oceans, terrestrial ecosystems are the second largest store of carbon on earth, with the majority contained in soils (Batjes, 1996).These terrestrial carbon pools are highly susceptible to shortterm variation and are readily affected by anthropogenic influences such as land use changes.Consequently, they play a critical role in determining current and future global carbon budgets (Bellamy et al., 2005).Soil can either be a net sink or source of carbon (Janssens et al., 2005), and there is continuing debate as to its potential to mitigate atmospheric CO 2 emissions (Smith et al., 2005).The accuracy of soil carbon stock estimations is, therefore, of paramount importance.Dawson and Smith (2007) suggest that much of the error associated with carbon stock inventory in soils can be traced back to uncertainties in D b estimates, prompting further investigation into the methods for deriving these estimates.Furthermore, soil carbon content plays a crucial role in spatially distributed, integrated land-atmosphere process models such as JULES (Harrison et al., 2008).There is evidence that improvements to the soil C component in these types of models increases their response sensitivity to changes in soil stocks and processes.For instance, Jones et al. (2005) compared the outputs of a non-distributed soil C model to those from a multipool, distributed soil C model and found Published by Copernicus Publications on behalf of the European Geosciences Union.
that there was a difference in the magnitude of the feedback between climate and soil C when the distributed model was considered.Estimating the size of spatially distributed carbon pools requires a spatially distributed estimate of D b .
There are two principal approaches to estimating carbon stocks.One is to predict soil carbon concentrations across the landscape (often using geostatistics) and then combine these with a measure of D b and depth to predict the stock (Ungaro et al., 2010).The problem with this is that using mean D b values to convert predicted soil organic carbon (OC) concentrations into OC stocks (i.e. the failure to use spatially varying D b values) is flawed because important variations in individual soil types are omitted (Grimm et al., 2008).Alternatively, stock can be predicted directly across the landscape (Jones et al., 2005).The issue with this approach is that it cannot account for variations in the relationship between OC content and D b across the landscape, fixing this relationship at the point scale.This makes prediction at the landscape scale difficult, as at that scale soil properties are driven by physical environmental gradients and boundaries, such as topography, parent material and hydrologically effective rainfall.One of the most important recent research themes of international interest is the anticipated change in terrestrial carbon stock under changing climate and land use (Yu et al., 2012;Zaehle et al., 2007).By modelling D b using these changing landscape attributes, it can be viewed as spatially variable rather than as a fixed soil property.This may be an important consideration when predicting changes in soil carbon stocks over time, as both the soil carbon concentration and D b will vary with changes in climate and land use.Lastly, large datasets containing measurements of soil properties are scarce, prompting investigation into the possibility of making predictions using landscape variables.
Soils are formed through the combined effect of physical, chemical, biological and anthropogenic processes on soil parent material.These factors will affect soil formation in different ways across the landscape, resulting in the spatial variation observed in D b .Defined originally by Jenny (1941), these factors are soil, climate, organisms, relief, parent material, age and landscape position (SCORPAN).Today this information can be obtained from existing, large-scale soil maps, climatic data, land use/land cover maps, digital terrain models and their derivatives, parent material/geology, and landscape position.We can formalize the relationship between measured D b and the soil forming factors at the sampling location and in the surrounding landscape using statistical models (McBratney et al., 2003).These models are developed based on existing data and expert-or empirically derived soil-environmental relationships.They can then be used to predict D b within a landscape.
Recently, these principals have been applied to the prediction of both D b (Jalabert et al., 2010;Martin et al., 2009) and organic carbon stock (Wiesmeier et al., 2011;Grimm et al., 2008) at the point scale with considerable success.Methods commonly used to explicitly include landscape attributes in the modelling process are artificial neural networks (ANNs) (Keshavarzi et al., 2010) and random forests (RFs) (Prasad et al., 2006).
The objective of this study is to determine the efficacy of soil landscape models to predict D b for any given landscape, and we do so using a range of models.Our intent is not to determine the best modelling method, but rather to cover nonlinear (random forests and ANN) predictive methods to establish the feasibility of a landscape level prediction of D b .
Here, we consider a data poor environment (as the models do not include OC or soil textural properties as predictors) in which we rely on landscape-derived attributes.This allows us to produce spatial estimates of D b without interpolation and lets us consider the implications of spatial uncertainty for the wider modelling community.

Soil survey data
The soil data for this study were obtained from samples collected between 1970 and 1987 during the 1 : 25 000 and 1 : 50 000 soil mapping of England and Wales.The dataset has been described in detail by Hallett et al. (1998).Undisturbed 222 cm 3 soil cores were taken in triplicate using the methods detailed by Hodgson (1976); the D b and other soil measurements (organic carbon content, particle size fraction, textural class and depth of the horizons) were derived using methods described by Avery and Bascomb (1982).Due to limitations of computational power required to derive landscape attributes for the whole country, a subset of the data was selected from a limited area (a 18 150 km 2 region of the English Midlands) based on the relatively high density of samples (Fig. 1).The soils in the area are dominated by brown earths and surface water gleys, most of which have either a coarse or fine loamy texture, with some more clayey soils in the south of the region (McGrath and Loveland, 1992).The bedrock is dominated by undifferentiated argillaceous rocks with prominent areas of sandstone in the west and patches of limestone in both the north and south.The elevation ranges from −2 m to over 550 m.The spatial representation of soil data comes from the National Soil Map of England and Wales (NATMAP: Hallett et al., 1996), which is a 1 : 250 000 scale soil classification map.The classifications used in this study were at the association (many, homogenous groups) and great group (few, more heterogeneous groups) levels (Avery, 1980).

Topographic data
Although not usually applied to the modelling of D b , topographic model parameters are frequently used in digital soil mapping (McBratney et al., 2003) and have been specifically used to predict soil organic carbon concentration (Grimm et al., 2008).A 10 m resolution digital terrain model (DTM) was used to derive the following landscape attributes: elevation, slope, aspect, curvature (plan, profile and mean), SAGA wetness index (SWI) and sediment transport index (STI), all of which are commonly used topographic features in digital soil mapping (Wiesmeier et al., 2011).The SWI is based on the ratio of contributing upslope area per unit contour width and local slope angle (Böhner et al., 2001).The STI is based on unit stream power theory, where upslope contributing area is directly related to discharge (Moore and Burch, 1986).Classification algorithms were used to divide the landscape into 7 and 8 homogeneous topographic classes on the basis of curvature, slope and catchment size (Pennock et al., 1987), and slope, surface texture and local convexity respectively (Iwahashi and Pike, 2007).The derivation of these landscape attributes was carried out in ArcGIS 9.3 (ESRI, 2009).

Climatic data
The following climatic data were used as predictor variables: average annual rainfall (mm yr −2 ), accumulated temperature above 0 • C, median number of field capacity days (i.e. the number of days per year that the soil moisture content is above field capacity), annual average potential evapotranspiration (mm yr −2 ) and maximum potential soil moisture deficit (i.e. the water required to bring the whole soil profile back to field capacity, mm).The data were originally derived as 1971-2000 averages from monthly reports by the UK Meteorological Office, which provides information on weather for a 5 km × 5 km grid (Perry and Hollis, 2005).Average annual rainfall is the total of the monthly means per year and the accumulated temperature above 0 • C gives an effective daily temperature above 0 • C per month (Hallett and Jones, 1993).Evapotranspiration was calculated using the Penman-Monteith equation, as detailed in Hess (2000), while the potential soil moisture deficit (based on the balance of rainfall and evapotranspiration) was calculated using methods described by Jones and Thomasson (1985).The number of field capacity days is the median number of days per year that each soil type is calculated to be at or above field capacity based on water balance calculations (assuming free drainage) over the period 1970-2000(Jones and Thomassen, 1985).

Geology
Soil properties derive, in part, from in situ weathering of the parent material (Grimm et al., 2008), so a representation of geology is essential for a digital soil mapping approach.A 1 : 50 000 geological map was obtained from the British Geological Survey (BGS), which included the rock lexicon, giving the major rock units (available for download from http://edina.ac.uk/digimap) and the BGS rock classification scheme detailing the lithology of the bedrock.The distribution of bedrock, by rock classification scheme, is shown in Fig. 1c.We also used the same classification scheme to categorize superficial deposits (formerly known as drift), which represent the most recent geological deposits.Parent material was represented using the NATMAP 1 : 250 000 soil map (Hallett et al., 1996).

Land use
The land use (Fig. 1d) was represented by the Land Cover Map 2000 produced by the Centre for Ecology and Hydrology (CEH).We also produced a recoded land use map to reflect the land use at the time of the bulk density sampling.Satellite imagery was classified into a 25 m raster dataset which was subsequently aggregated into a ten-class, 1 km grid land cover map (Fuller et al., 2002).Previous studies have commonly only attempted to make predictions within a single land use such as agricultural soils (Katterer et al., 2006) or forest soils (Jalabert et al., 2010).When the region is heterogeneous, land use has proved to be an important determinant of D b (Hallett et al., 1998;Moreira et al., 2009).In this case, as land use was recorded when the D b samples were taken, the land cover map was recoded to reflect changes over time.

Soilscapes
To help evaluate the spatial performance of the models, results are assessed by "soilscape".Soilscapes are landscape units derived from expert knowledge based on the 300 soil associations that make up the National Soil Map (Soil Survey Staff, 1983;Mackney et al., 1983).Each association has a subgroup code (Avery, 1980) that identifies the diagnostic soil properties.From this, the soilscapes have been delineated by agglomerating the National Soil Map associations, resulting in 25 classes.Within these national classes, the soilscapes have been subdivided and grouped into homogenized regions based on similarities in drainage characteristics, texture and geology (Farewell et al., 2011).A description of each predictor variable used in this study, including their derivation and the number of classes or range of values in the study area, is shown in Table 1.

Data preprocessing
Models were built using 342 D b samples from the A horizon and 339 samples from the subsoil.Many studies differentiate between topsoil and subsoil by depth (De Vos et al., 2005;Katterer et al., 2006).However, the lower depth of the topsoil layer can vary from 15 cm (Bellamy et al., 2005) to 30 cm (Martin et al., 2009).The data used in this study were sampled by horizon, meaning that there was not a uniform sampling depth between points and the number of samples taken at a given location was dependent on soil morphology.As such, the A horizon, with an average depth of just over 22 cm, was used to represent the topsoil.The subsoil layer comprised various B horizons (predominantly Bw and Bg) and, on average, represented a horizon between 23 and 47 cm in depth.Of the original samples, the A horizon was split at random into 239 training and 103 validation samples, and the subsoil was split into 238 training and 101 validation samples.Models were built using the training data sampled for each horizon, then these models were applied to the validation data to provide an unbiased estimate of the predictive power of each model.

Statistical methods
In order to develop statistical relationships between a large number of landscape attributes and D b , it is necessary to apply statistical methods which can account for complex, non-linear interactions between variables.We have opted to test two distinct methods which have previously been successfully applied to the prediction of a range of soil properties including D b (Tranter et al., 2007), soil texture (Ließ et al., 2012) and near-infrared spectral reflectance (Rossel and Behrens, 2010).Both methods are suitable for datasets with numerous predictors, containing both categorical and continuous data.

Random forest
RF modelling has the potential to improve predictions made using classification and regression trees (CART) (Breiman, 2001).In essence, trees are constructed using a bootstrap of the entire dataset and the splits at each node are not made by the best predictor from the entire suite of input variables, but from the best of a randomly selected subset, which prevents overfitting (Liaw and Wiener, 2002).The model only requires two user-defined parameters: the number of trees in the forest (n tree ) and the number of variables tested at each node (m try ).The performance of the training model is assessed by predicting the mean square error (MSE) of the "out of bag" portion of the data at each tree, and then averaging over the entire forest:  Fuller et al., 2002). 8 Soil association Soils grouped to the association level (Avery, 1973) derived from the 1 : 250 000 scale National Soil Map of England and Wales (NATMAP; Hallett et al., 1996).

AT0 Annual
Average accumulated temperature above 0 • C derived from average monthly reports from the UK Meteorological Office on a 5 km × 5 km grid (Perry and Hollis, 2005).

2564-3871 • C
Parent material Soil parent material derived from a 1 : 250 000 scale National Soil Map of England and Wales (NATMAP; Hallett et al., 1996).
18 PSMD Potential soil moisture deficit related to the balance between rainfall and potential evapotranspiration (Jones and Thomasson, 1985) derived from average monthly reports from the UK Meteorological Office on a 5 km × 5 km grid (Perry and Hollis, 2005).

50-261 mm
PT Potential evapotranspiration is the amount of evaporation which would occur if water was not limited (Hess, 2000), derived from average monthly reports from the UK Meteorological Office on a 5 km × 5 km grid (Perry and Hollis, 2005).
480-708 mm yr −1 AAR Average annual rainfall derived from average monthly reports from the UK Meteorological Office on a 5 km × 5 km grid (Perry and Hollis, 2005).

RCS
Bedrock geology derived from 1 : 50 000 scale British Geological Survey rock classification scheme map, detailing bedrock lithology.

27
FCD MED Median number field capacity days derived from average monthly reports from the UK Meteorological Office on a 5 km × 5 km grid (Perry and Hollis, 2005).

Slope
Slope derived from a 10 m DEM (Childs, 2004).0-74.9SWI SAGA wetness index, a terrain-derived index of soil moisture derived from a 10 m DEM (Böhner et al., 2001).9.8-19.7 Aspect Aspect derived from a 10 m DEM (Childs, 2004).where ẑOOB i is the mean out of bag prediction for the i-th observation.RF modelling also provides a measure of fit comparable to the R 2 values of the other models.This "pseudo R 2 " is labeled the "percent variance explained" and is calculated using where σ 2 y is the total variance of the dependent variable calculated with n as the divisor, rather than n − 1 (Liaw and Wiener, 2002).The parameters were set to an n tree of 1000 and an m try of p/3, where p is the number of input variables.Liaw and Wiener (2002) suggest testing the m try value by both doubling and halving the default.Models can be sensitive to the m try parameter, as testing a greater number of variables at each split will increase the strength of the individual tree but also increase the correlation between trees in the forest.Here the optimal m try was determined using the tuneRF function (Ließ et al., 2012).Furthermore, the n tree value was increased from 500 (the default) to 1000 as recommended by Prasad et al. (2006).This number of trees is sufficiently large to stabilize errors, whilst not being too computationally demanding.An interesting feature of RF is its ability to rank predictor variables in order of importance, which is done by measuring how much the "out of bag" estimate error increases when data for a particular variable is "removed" from the analysis and the other variables are left intact.This is done on a tree-by-tree basis for the entire forest.The models were generated using the "randomForest" package (Liaw and Wiener, 2002) in the R statistical computing language (R Development Core Team, 2008).

Artificial neural networks
The principles of ANNs are well established (Bishop, 1995), with Maier and Dandy (2001) offering a practical guide for environmental modelling.The structure used here was a multilayer perceptron, a powerful predictive technique that is most commonly applied in soil science (Agyare et al., 2007).In this method, data are separated into a series of nodes, with similar nodes arranged into layers: typically, an input layer (containing the variables used for prediction), an output layer (where predictions are made) and, in-between, a single hidden layer which weights and transforms the data to extract meaningful relationships.For each model, the 239 samples used for developing the models were separated into a 75 : 25 percent split for training and testing respectively.As with the other models, the remaining 103 samples were used for independent validation.Splitting the data allowed the number of hidden nodes to be tested, which is important as the optimum number of nodes will differ depending on the problem at hand and the number of input variables.It is recommended that the number of hidden nodes should be half the number of input variables plus the number of output variables (which in our case was one) (Statsoft, Inc., 2011).Generally, adding more nodes will increase the performance of the model.However, this may lead to overfitting the data.To avoid this, the ANN uses a back-propagation algorithm (Rumelhart et al., 1986) to test the performance of the network on both training and testing datasets.Training the network should reduce the "error function" associated with predictions, such that when the error function of the testing dataset plateaus or increases, ANN overfitting is suggested.The error function for regression is the sum of squares error given by where N is the number of training cases, y i is the predicted value of the i-th case and t i is the observed value.Ideally, when the differences in the error function are negligible, the network with the fewest nodes is chosen.As the test dataset plays a role in developing the ANN infrastructure, a validation dataset is used to independently test the predictive power of the models.The best performing models were selected using values of R 2 and root-mean-square error (RMSE).ANNs can also rank variables in order of importance, although they use a different method from RFs.Here, data for each variable is replaced, in turn, by its mean value from the training data and the effect on the error function is recorded.The variables are then ranked by the amount their omission increases the overall error function (Lou and Nakai, 2001).The learning rate for the ANNs was set to 0.1 and the analysis was carried out using STATISTICA9 (StatSoft Inc., 2011).One issue arising when using ANNs for producing predictive maps is that they will not make predictions in areas where data differ from those of the training data.In other words, if not every category of, for example, land use is included in the training data, the final maps will leave blank areas when they encounter these missing categories as opposed to inferring the D b values from available data.While this leaves areas with missing predictions, it means the accuracy of the final map is not compromised.

Calculations of OC stock and associated variability
To illustrate the importance of D b for soil inventory, the variation in carbon stock estimation was calculated using measured, predicted and mean D b values.As a single, unweighted mean across a heterogeneous area would lead to biased results, the mean D b was calculated for each soil great group (Avery, 1980) and weighted by area.Using a mean D b value stratified by soil great group is an approach which is commonly used to represent the spatial variation of D b across the landscape (Grimm et al., 2008;Batjes, 1996).Carbon stock was calculated using where S is the soil organic carbon stock (t C ha −1 ), d is depth of the topsoil (m), OC is organic carbon concentration per unit mass of dry soil (kg C kg −1 ) and D b is soil bulk density (kg m −3 ).Note that within our calculations, depth is kept constant.To evaluate the uncertainty associated with carbon stock estimation, it is necessary to propagate the errors associated with both OC and D b measurements and predictions, while keeping depth constant (Schrumpf et al., 2011).The variance is given using the formula where σ OC and σ D b are the standard deviations of OC concentration and D b respectively, and covOC −D b is the covariance between the OC concentration and D b .We note here that the uncertainties in D b estimates in this case are derived from the model error (RMSE), not from the variability introduced by spatial or analytical variability (which has been considered elsewhere; Holmes et al., 2011).We are comparing here the effect of using a modelled D b (based on soil landscape attributes) for estimating C stocks to a D b estimate obtained through spatial aggregation (stratified approach).
In the stratified model, the standard deviation in D b is calculated using the measured D b samples within each soil great group.In the gridded model, the standard deviation of the D b is given by the RMSE of the predictive random forest model.As the standard deviation in OC is the same for both models and we found no spatial autocorrelation between D b sample points, we feel that this method provides a sufficiently robust estimate of OC stock variance.
In the gridded model, covariance was determined using the predicted D b values and the measured OC values.In the stratified model, the covariance between the mean great group D b and OC was determined using a non-linear mixed-effects model (Wutzler et al., 2008).This was to account for the random effects in the covariance between D b and OC across the soil great groups.To clarify, as there is a single D b value for each great group in the stratified model, there is no within group covariance.There is, however, covariance between soil great groups and OC across the study area, which is represented by the mixed-effects model.

Model performance
The results for the RF and ANN models for both topsoil and subsoil are shown in Table 2.For the A horizon, the best performing model was the RF, with ANN giving similar, albeit slightly inferior results in terms of predictive power.In the subsoil, neither of the models performed particularly well, with ANN, the best performing model, explaining just over 30 percent of the variation in D b .RF performed considerably less well, explain only 20 percent of the variation.It is interesting to note that although the model fit (in terms of R 2 values) is considerably worse for the subsoil than for the A horizon, the RMSE is lower in the subsoil models.This reflects the smaller variation between D b in subsoil horizons.

Predictor variables
Both modelling approaches have the ability to rank the predictor variables in order of importance.Although they do so in different ways, this allows us assess whether there are common predictors influencing the variation in D b .In the A horizon, the consistently important predictors are land use and soil great group or association.Climatic factors also feature as important predictors, with annual average temperature and median field capacity days shown to be significant for the RF and ANN models, respectively.The variation in the subsoil layers can be more attributed to a combination of soil association, parent material and bedrock geology.

Model performance
Random forests were able to describe D b most effectively, which is unsurprising as they are designed specifically for large, heterogeneous datasets containing a mixture of both continuous and categorical variables (Liaw and Wiener, 2002).Indeed, tree-based models have been used to successfully predict D b using a mix of landscape data and soil data (Martin et al. 2009).In terms of model performance, RF achieved similar results to a number of other studies, all of which have used textural properties as predictors (Tranter et al., 2007;De Vos et al., 2005;Heuscher et al., 2005).The ANN model also performed well for the A horizon.Previous studies (e.g.Minasny et al., 1999;Keshavarzi et al., 2010) have reported both high and low ANN performance.This can be attributed to the nature of the property being predicted.Wösten et al. (2001) suggest that generally, when there are more than three predictor variables and variables are subject to complex interactions, non-linear modelling techniques such as ANN and RF become necessary.This is clearly the case when predicting D b from landscape attributes.The poor performance of both techniques in the subsoil layer reflects the lower spatial variability of the subsoil D b (Braakhekke et al., 2013), meaning changes in landscape predictors exhibit relatively little influence.

Variable importance
It has been well established that OC content is usually the most important predictor when modelling D b .This is unsurprising as the relationship between the two has been well defined (Rawls, 1983)   modelling (Kaur et al., 2002).However, Calhoun et al. (2001) found that particle size distribution and OC generally explain no more than 60 percent of the variation in bulk density.Of particular interest here is the predictive power of the seldomused variables which represent a range of topographic, land use and climatic factors.The importance of putting D b in a landscape context is supported by the successful stratification of previous regression models by land use (Steller et al., 2008;Moreira et al., 2009) and parent material (Hallett et al., 1998;Calhoun et al., 2001).However, these factors have been explicitly included in the modelling process only relatively recently (Martin et al., 2009;Jalabert et al., 2010).
Of the landscape variables included, land use, parent material and soil classification are deemed to be consistently important predictors.The influence of soil class is unsurprising as, along with other attributes, soils are classified based on their textural properties.Using pre-existing soil maps is, in essence, a way of predicting using spatially distributed textural classes.The predictive power of land use will depend on the classification used and the resolution of the data layer.Previous prediction of D b using boosted regression trees by Buttner et al. (2000) has suggested that land use derived from the European CORINE map was the least influential of all their predictor variables, as these land use classes were too broad.However, more detailed, higher resolution land use information transpired to be the second most powerful explanatory variable, almost on a par with OC content (Jalabert et al., 2010).As land use was recorded at the time of sampling, the accuracy of the layer was not in question, and hence it proved to be an important predictor.To make use of use of the available land use data, the CEH Land Cover Map 2000 was recoded to reflect the land use at the time of sampling.This was important as, when used as a predictor without recoding, present-day land use categories were shown to be poor predictors of D b .This can probably be attributed to the fact that sampling of D b and the creation of the land use layer were approximately 30 yr apart, with significant changes over the intervening decades.
Parent material lithology is one of the leading predictors for three of the four models.This may be attributed to the presence of recently deposited material, such as alluvium, or slow draining or impermeable bedrock which are particularly influential for overlying soil formation (Hallett et al., 1998).Pertinently, a significant number of samples in this study were taken from alluvial plains, in which soil properties, such as D b , are closely related to the properties of the underlying alluvium, thereby promoting the influence of parent material as a significant predictor.In other areas with less alluvium, parent material may be less influential on D b .Predictably, parent material becomes a more influential predictor in subsoil horizons, which are less susceptible to climatic changes.Bedrock geology also becomes more influential below the A horizon.It is interesting that the climatic variables are such prominent predictors because they have a relatively low spatial resolution (5 km grid), in comparison with other predictor variables, although the link with some variables (e.g.field capacity) has clear physical significance.This suggests that improving the resolution of climatic predictors may improve model accuracy.The DTM-derived landscape attributes proved to be relatively poor predictors.Although Martin et al. (2011) mention including topographic predictors as a possible improvement for mapping OC stocks, they are not generally utilized.In similar work to model saturated hydraulic conductivity, landscape derivatives have offered some improvement to ANN models, but they cannot be used without other inputs; particularly at a regional scale (Agyare et al., 2007), this reflects the inclusion of elevation as a prominent predictor in the RF model.

Modelling without using measured soil properties
Mapping D b without point samples of soil properties is of interest for two reasons.Firstly, since the cost of largescale soil sampling can be prohibitive, the ability to use pre-existing or remotely sensed data would be desirable.As many countries already have soil, land use and geological maps at a variety of scales, it makes sense to see if further information can be extracted from them in the form of predictive models.Secondly, a key research theme in spatial mapping is the assessment soil carbon stocks because they relate to the global carbon budget (Bellamy et al., 2005;Tornquist et al., 2009;Wiesmeier et al., 2011).One issue of interest here is the lack of spatial representations of D b .Instead, mean D b values are used to convert modelled soil OC concentrations into soil OC stocks (Grimm et al., 2008).However, if variations in D b within individual soil types are not taken into account, significant errors in C stock estimation are possible.As datasets tend to be limited, and OC and D b are not always sampled together, being able to map D b accurately and independently of measured OC content would avoid circularity in modelling (i.e. using carbon content to predict D b which is then used to predict carbon stocks) and improve stock estimation at the same time.As we have found, most of the important predictor variables are categorical (land use, parent material).For the A horizon, we have found that both RF and ANN techniques can explain over 55 percent of the variation in D b .This result is significant because it shows that it is feasible to create a continuous surface of D b using landscape attributes alone.A spatial representation of D b across the landscape can be combined with a spatial representation of carbon concentration to give a more accurate estimate of C stocks and pools.At any given location, there will be an associated D b value, at an appropriate scale, which has been independently derived and which has an associated unambiguous error estimate.

Mapping D b across the landscape
For the A horizon, we have produced maps of D b for the topsoil of the entire study area using both ANN and RF (Fig. 2).Topsoil is generally considered to be the most important soil compartment in terms of soil carbon content, in part because OC concentration generally decreases with depth (Jones et al., 2005).Of the two methods, ANN gives a slightly wider range of predicted D b values than RF but still within the limits of the measured data reported within the National Soil Inventory of England and Wales (Loveland, 1990).Fewer than three percent of the samples in the National Soil Inventory had a D b lower than the minimum predicted value.In contrast, RF (Fig. 2b) provides more conservative estimates of D b , especially for the upper values.Despite this, the RF model was shown to have slightly more predictive power than the ANN model.Broadly speaking, the models agree on the spatial trends of D b distribution, most notably, areas of low D b in the north and at the westerly edge of the study area.The areas of missing data in the ANN model reflect missing data in the training dataset.Here the RF models will make predictions based on the available data.

Spatial performance
Spatially, there is broad agreement between the RF and ANN predictions, in terms of the areas of high and low D b .Figure 3 shows the individual performance of each model, in terms of prediction residuals as an average per soilscape.In the A horizon, the spatial variation in the relative performance of each statistical approach is very similar (Fig. 3b  and c).In terms of land use and soil group, the two most influential predictors of topsoil D b , both the RF and ANN models give their best predictions in areas of brown earths under arable land use.The areas across which both models appear to perform least well coincide with built-up areas dominated with Stagnogley soils.In the subsoil, the spatial patterns of model performance are also broadly similar for both the ANN and RF models.In relation to parent material, the best predicted regions coincide with areas of sandstone bedrock and superficial deposits containing siliceous stones while the worst performing areas overlie clay or soft mudstone.The spatial variation in model performance, can be used to inform any future sampling schemes, with an increased sample density in areas where a model is likely to underperform.

Stock estimation
To illustrate the potential improvement in OC stock estimation which could be achieved using the gridded surface of D b compared with using a stratified mean value (Mestdagh et al., 2009;Hanegraaf et al., 2009) points in the training data.Note that results for C stock calculations using model output were produced using a calibrated RF model that used the training dataset alone; the validation data was used solely to assess model performance.The average OC stocks calculated using each D b estimate are shown in Table 3, along with the difference between the estimated and measured mean OC value, expressed as a percentage of the mean measured value.The 5th and 95th percentile errors in measured OC stocks are also shown.The gridded surface refers to a map of RF-predicted D b values (Fig. 2b) produced as a raster grid with a cell size of 100 × 100 m across the entire study area.The main advantage of the gridded surface method over PTFs (Pedotransfer functions), which can be applied to individual points using measured soil property data for the point in question, is that the gridded method can be applied to the entire study area with the same quantifiable level of both performance and error estimation at all spatial locations.In contrast, the accuracy of predictions made using a PTF is hard to quantify beyond each sampling point.
Using the individual measured, point-based D b values gives an average OC content of 73.01 ± 0.56 t C ha −1 compared to an average value of 71.32 ± 0.61 t C ha −1 produced using the RF-predicted D b values and a value of 74.81 ± 0.70 t C ha −1 generated using great group mean D b value.Using the OC stock calculated with measured D b as a yardstick, the gridded estimate of D b yields a marginally better C stock estimate compared with using a single (mean) D b value.In this case, the RF predictions will underestimate D b , whereas using a stratified mean value will overestimate D b .The difference in the error associated with stock prediction using the gridded D b values compared to using the mean value of D b is particularly evident when predicting C stock levels in soils at the extremes of the expected range (i.e. the prediction errors for the 5th and 95th percentile OC stock values).The potential improvement in using the gridded estimate of D b is most evident in the 95th percentile, where using a stratified mean D b value will yield an error nearly two times larger.
To put the magnitude of the errors illustrated in Table 3 into context, Bellamy et al. (2005) suggest that the average annual rate of change in the OC content for UK topsoil is 0.67 g kg −1 yr −1 , which equates to approximately 1.79 t C ha −1 yr −1 .As the rate of change is comparable in magnitude to the error associated with prediction, it is clearly important to keep error to a minimum if stock changes are to be quantified accurately.The total soil OC inventory across the whole study area, calculated using both the stratified mean and gridded D b estimates, is shown in Table 4.There is a slight difference in the OC stock per unit area (0.6 t ha −1 ) which equates to a difference of over one million tonnes of carbon for this study area alone.The most notable difference between the stratified mean and gridded approaches to D b prediction is the error associated with prediction.The 95 % confidence interval associated with the stratified mean model is nearly twice as large as that of the gridded model.When estimating the total C stock within the study area, this translates to a difference of over 13 million t C −1 .
To further illustrate the potential of this method, carbon stocks were calculated for the landscape as a whole and for two selected individual soilscapes, using both the stratified measured mean and gridded predictions of D b .Soilscapes were selected to represent the range of D b values within the study area.Results are shown in Table 4.The two soilscapes of the Central Upland Spine of Northern England and the Central England Plateau show areas of relatively low and high D b , respectively.These regional differences in stock calculations, particularly in the Central Upland Spine of Northern England, highlight potential errors which can be introduced to a stock calculation by using a mean D b value, depending on the scale of the study.Moreover, the gridded model has a much greater predictive accuracy, with confidence bounds nearly two times smaller compared with the stratified mean model.The mean model produced similar stock predictions for both the entire study area and the se-lected soilscapes.This is a problem as, at the soilscape scale, the stratified mean model may either under-or overestimate carbon stocks.This issue does not affect the gridded model, because it is able to apply rules learned across the entire study region to identify areas of high and low bulk density, a key advantage when working at this scale.The soilscape scale is a scale at which errors in D b estimation have been shown to be highly significant to carbon stock inventory (Goidts et al., 2009).Estimating C stocks and changes, especially at finer spatial scales, requires the use of refined estimates of D b , which can be obtained using the types of landscape-scale models described in this paper.It is at these scales that many spatially distributed land-atmosphere interaction models such as JULES operate (Harrison et al., 2008).

Conclusions
It is possible to predict soil D b without relying on other sampled soil properties by using landscape derivatives, such as land use, geology and climatic data as predictors, if only for the topsoil.Of the two statistical modelling techniques tested, RF marginally provided the best results for the A horizon, while ANN performed better for the subsoil.In comparison to previous studies, which have attempted to predict D b from soil property data, the models constructed in this study were able to provide similar results, in terms of model performance, without using soil texture or OC content as predictors.The suite of landscape derivatives used was able to explain over 55 percent of the variation in topsoil D b in this study area.While it is not appropriate to suggest that this particular model could be used to predict D b in a landscape beyond the study area, the results show that the modelling approach can be used as a viable alternative to using soil property data.
The real advantage of this approach is the models' potential to improve soil carbon stock estimates at a landscape scale; it does not rely on point-scale measurements as explanatory variables.This means that it is possible to create a continuous, gridded surface of D b without interpolation which can be used in combination with continuous surfaces of predicted soil carbon content to improve estimations of carbon stock.In addition, the technique yields a more accurate measurement of the error associated with such predictions.In terms of carbon stock prediction, the gridded D b estimate offers a significant improvement in accuracy compared with using a stratified mean value of D b .In particular, this approach is valuable when applied at a sub-landscape regional scale, especially in data-poor areas.

Figure 1 .Fig. 1 .
Figure 1.Location and study area.a) Study location in relation to England and Wales.b) −74.8 to 66.4 Iwahashi Iwahashi landform classification uses a terrain classification algorithm based on slope, surface texture and local convexity (Iwahashi and Pike, 2007) derived from a 10 m DEM. 8 Pennock Pennock landform classification uses a terrain classification algorithm based on slope, curvature and catchment size (Pennock et al., 1987) derived from a 10 m DEM.7 STI Sediment transport index derived from a 10 m DEM.−67.4 to 0

BulkBulkFig. 2 .
Fig. 2. Predicted bulk density across the landscape obtained from models built using the training dataset.(A) Artificial neural network, and (B) random forest.

Fig. 3 .
Fig. 3. Spatial variation in model performance by soilscape.(a) The sample density for A horizon samples.(b) Average residuals for the ANN model prediction in the A horizon.(c) Average residuals for the RF model prediction in the A horizon.(d) The sample density for subsoil horizon samples.(e) Average residuals for the ANN model prediction in the subsoil horizon.(f) Average residuals for the RF model prediction in the subsoil horizon.

Table 1 .
Predictor variables used in the ANN and RF models.The variables are listed in order of importance for the RF model predicting A horizon D b .

Table 2 .
Modelling results (using the validation dataset) for random forest and artificial neural network models.The suffix "A" indicates the results are for the A horizon and the suffix "S" indicates the results are for the subsoil.The top five predictor variables are ranked in order of importance.

Table 3 .
Point estimates of OC stock.Average stock was calculated using Eq.(4).Of the prediction methods, "Measured" uses measured D b values, "Gridded" uses the gridded predicted D b values, and "Great group mean" uses the measured mean D b per soil great group.

Table 4 .
Carbon stock for the entire study area and by selected soilscape.