Local spatial structure of forest biomass and its consequences for remote sensing of carbon stocks

Advances in forest carbon mapping have the potential to greatly reduce uncertainties in the global carbon budget and to facilitate effective emissions mitigation strategies such as REDD+ (Reducing Emissions from Deforestation and Forest Degradation). Though broad-scale mapping is based primarily on remote sensing data, the accuracy of resulting forest carbon stock estimates depends critically on the quality of field measurements and calibration procedures. The mismatch in spatial scales between field inventory plots and larger pixels of current and planned remote sensing products for forest biomass mapping is of particular concern, as it has the potential to introduce errors, especially if forest biomass shows strong local spatial variation. Here, we used 30 large (8–50 ha) globally distributed permanent forest plots to quantify the spatial variability in aboveground biomass density (AGBD in Mg ha) at spatial scales ranging from 5 to 250 m (0.025–6.25 ha), and to evaluate the implications of this variability for calibrating remote sensing products using simulated remote sensing footprints. We found that local spatial variability in AGBD is large for standard plot sizes, averaging 46.3 % for replicate 0.1 ha subplots within a single large plot, and 16.6 % for 1 ha subplots. AGBD showed weak spatial autocorrelation at distances of 20–400 m, with autocorrelation higher in sites with higher topographic variability and statistically significant in half of the sites. We further show that when field calibration plots are smaller than the remote sensing pixels, the high local spatial variability in AGBD leads to a substantial “dilution” bias in calibration parameters, a bias that cannot be removed with standard statistical methods. Our results suggest that topography should be explicitly accounted for in future sampling strategies and that much care must be taken in designing calibration schemes if remote sensing of forest carbon is to achieve its promise.


Introduction
Forests represent the largest aboveground carbon stock in the terrestrial biosphere, and deforestation, forest degradation, and regrowth are globally important carbon fluxes (Pan et al., 2011).Our ability to predict future atmospheric CO 2 concentrations or to implement effective carbon emission mitigation strategies (e.g.REDD+; Agrawal et al., 2011) is limited by the accuracy of forest carbon stock estimates.The global monitoring of forest carbon stocks has thus come to the fore of the research agenda, with important implications in economics, policy and conservation (Gibbs et al., 2007).
Aboveground carbon stock estimates based on field inventories and on remote sensing approaches have led to substantial progress in mapping broad-scale forest carbon stocks (Asner et al., 2010;Baccini et al., 2012;Malhi et al., 2006;Saatchi et al., 2011).However, such carbon maps have substantial uncertainties (Mitchard et al., 2014).The most common approach to quantifying forest carbon stocks at regional and national scales is to first stratify the area of interest, and then to assign to each stratum a mean carbon density value estimated from ground measurements.This approach inherently overlooks extensive spatial variation in carbon density within strata, including variation related to forest degradation and regrowth, both crucial components of forest carbon fluxes (Harris et al., 2012;Lewis et al., 2009).Thus, recent studies have moved from classification approaches involving a discrete number of forest types toward approaches encompassing continuous spatial variation in forest structure and carbon density, often utilizing space-based and airborne sensing of vegetation (Asner et al., 2010(Asner et al., , 2013;;Goetz and Dubayah, 2011;Wulder et al., 2012).
Active remote sensing tools such as light detection and ranging (lidar) and synthetic aperture radar (SAR) are currently the best candidates for forest carbon mapping at broad spatial scales.One forthcoming spaceborne mission is of particularly interest, the P-band radar BIOMASS mission (scheduled for launch in 2020; Le Toan et al., 2011), as it will provide estimates of aboveground carbon and its annual changes in the world's forests.The products from this instrument will have a relatively coarse resolution (200 m) and will rely on ground data to train their inversion models and to evaluate the results.Hence, the quality of the resulting BIOMASS forest carbon map will depend crucially on the accuracy and suitability of the field data used.
The quality of a field-based model calibration and resulting products depends fundamentally on how well forest biomass density in pixels is represented by the field data.In space-based remote sensing of forest biomass, sensor footprints are often many times larger than field plots (Baccini et al., 2007).If forest biomass is uniform within pixel-sized areas, this mismatch in sample area will have little impact on calibration; however, if there is substantial local spatial variability in biomass, then small calibration plots will have large sampling errors.In general, as the sampling area decreases, the variability associated with any field biomass estimate increases, as does the associated sampling error.In addition, the remote sensing field of view often differs from the fieldbased view as a result of geolocalization errors, the conversion of a circular or ellipsoidal footprint into a square pixel, and the mismatch between the forest components measured in-situ and observed by the sensors.Side-looking radar observation is a typical example of such spatial mismatch with field-based tree stem measurements (Villard and Le Toan, 2014) and remote sensing of canopy structure versus fieldbased tree stem measurements is a common source of spatial mismatch in high-resolution remote sensing products (Mascaro et al., 2011).Such spatial mismatches may considerably increase errors during the model training and evaluation steps.There is thus a need to quantify these errors and test strategies to address them.
Here, we analysed spatially explicit forest census data from a global network of 30 large permanent plots (8-50 ha) in natural forests (Condit, 1998;Losos and Leigh, 2004) to quantify local variation in aboveground biomass density (AGBD) and explore its consequences for calibrating largefootprint remote sensing products (≥ 0.5 ha) with field data for smaller plots (Fig. 1; Table S1).Using these very large plots, we address three questions.(1) What is the local variability in AGBD for the most commonly used plot sizes, how does this variability scale with the area sampled, and how does it differ among sites, forest types, and continents?(2) Does local AGBD variability exhibit significant spatial structure (e.g.aggregation) and, if so, what is that structure (strength, spatial scales)?(3) What are the implications of the observed AGBD variability for the accuracy of remote sensing calibration equations when calibration plots are smaller than sensor footprints, and for different statistical procedures?

Field data
We used measurements in 30 large forest plots across three continents (8-50 ha each; Fig. 1, Table S1).In 28 of the plots, all free-standing trees ≥ 1 cm dbh (diameter measured at 130 cm above the ground or 50 cm above buttresses) were mapped, tagged, and identified taxonomically (Condit, 1998).In two additional plots, only trees ≥ 10 cm in dbh were included (Table S1).Trees < 10 cm dbh generally contribute less than 5 % of the total aboveground biomass (AGB) in mature tropical forests (Chave et al., 2003).AGB of each individual stem was estimated using regression models based on the measured individual diameter and the wood-specific gravity assigned to that species and site, or site-specific allometric equations (details in Table S1).We only used data for free-standing woody stems, and excluded lianas from our analyses for the few sites where these were censused.Lianas usually represent less than 5 % of the total AGB (e.g.Schnitzer et al., 2012).
Elevation ranges were computed for each site based on 5-20 m elevation maps generated from either field survey measurements (Condit, 1998) or high-resolution airborne lidar (in Paracou, Nouragues and Haliburton).Among 19 forest plots where elevation maps were available, the elevation range showed a strong and significant correlation with the mean of the standard deviation of elevation within 1 ha   S1.
subplots (Fig. S1).We therefore used the elevation range, a metric available over all sites, as an indicator of topographic variability.

Local spatial variability in AGBD
Each plot was gridded into subplots at spatial resolutions ranging from 5 to 250 m, to the extent feasible given the plot dimensions.Within each subplot, AGBD (Mg ha −1 ) was calculated by summing AGB estimates for all trees whose stems were located within the subplot and expressing this on a per ha basis.We quantified the local spatial variability in AGBD for subplots of area s (in ha) using the coefficient of variation of AGBD among subplots within sites, calculated as where µ is the mean AGBD in the plot, σ (s) is the standard deviation in AGBD computed from subplots of area s, and CV(s) is the coefficient of variation for plot area s in percent.A higher CV value indicates a higher relative spatial variability of AGBD (relative to the mean), and therefore greater random sampling error relative to the mean estimate when small subplots are used as samples to represent the full plot area.We focused on the CV at the 1 ha scale, denoted CV(1) in our examination of variation among sites.We evaluated whether CV(1) increased with AGBD among sites, and whether it increased with topographic variability as represented by the elevation range, in both cases using nonparametric Spearman rank correlations.We also tested whether CV(1) varied significantly among continents or forest types using nonparametric Kruskal-Wallis tests.
We examined the spatial scaling of variability with area both graphically and quantitatively with fitted functions.Specifically, we graphed CV(s) vs. plot area (s) on log scales, and fitted power functions to the relationship between the two.In the absence of spatial autocorrelation (i.e.given independence of each grid cell), the logarithm of CV(s) should decrease linearly with ln(s), with a slope of −1/2 , just as the standard error of the mean decreases with increasing sample size (that is, CV(s) = CV(1) √ s , thus log[CV(s)] = log[CV(1)] − 0.5 log[s]).Positive spatial autocorrelation will lead to a slower rate of decline in the CV with increasing sample size over relevant spatial scales, and negative spatial autocorrelation to a more rapid decline.We fitted power functions for the relationship of CV(s) to s through linear regression on the log-transformed variables, and tested whether 95 % confidence intervals of the fitted exponents (slopes) included the value −0.5 expected in the absence of autocorrelation.The confidence limits were calculated from the estimated standard error of the slope and Student's t distribution.

Local spatial structure in AGBD
We used empirical variograms to assess the spatial autocorrelation in AGBD for 20 × 20 m (0.04 ha), 50 × 50 m (0.25 ha) and 100 × 100 m (1 ha) subplots, with subplots created by gridding each plot as above.We calculated variograms with the following formula: where AGBD xi is the AGBD observed at location xi, d is a class of spatial distance between two locations and N is the number of pairs of observations, as implemented in the R package geoR (Ribeiro Jr. and Diggle, 2001).Distances between two subplots were based on the coordinates of the centre of each subplot.To make the variograms comparable among plots, we transformed the variance σ 2 (d) to a coefficient of variation with CV(d) = 100 • σ 2 (d)/µ, where µ is the mean AGBD of the plot.(1) Within each large mapped plot, a point is chosen to be the centre of both the simulated remote sensing footprints and the simulated calibration subplots; it is chosen randomly from all points for which the largest footprints and calibration plots are fully inside the mapped large plot.
(2) AGBD footprint is calculated within circular areas centred on this point, simulating the remote sensing footprint, for the listed sizes.
(3) AGBD subplot is calculated within square areas centred on this point, simulating calibration/validation plots, for the listed sizes.
We replicated this procedure 1000 times and then calculated the root mean squared error of AGBD subplot relative to AGBD footprint for each combination of areas in which the subplot area is less than or equal to the footprint area, and normalized by the mean AGBD footprint to obtain a measure of relative error specific to that combination of scales, ErrCV (see Eqs. 3-5).
To further investigate the spatial structure of AGBD within field plots, we used wavelet functions (Percival, 1995).Wavelet analysis decomposes the variance of a process on a scale-by-scale basis, thus it is very useful for study of a variable influenced by multiple processes operating simultaneously at different spatial scales (Detto and Muller-Landau, 2013).A plot of wavelet variance versus scale indicates which scales are important contributors to the total process variance.For example, global spatial variation in temperature could be decomposed into the sum of large-scale variation due to latitude and smaller-scale variation due to topography.In the absence of any spatial structure, the normalized wavelet variance (the wavelet variance divided by the vari-ance computed from the values of the quadrants) is 1 at all scales.A value greater than 1 at scale s indicates that the variance of the process at that specific scale is higher than expected under complete spatial randomness (spatial independence between observations), i.e. the scale-specific variation is spatially structured independently of the spatial variation occurring at larger and smaller scales.In contrast, a normalized wavelet variance of less than 1 indicates that the scale-specific variation is lower than would be expected under complete spatial randomness.Details of the methods for calculating the wavelet variances are given in the Supplement S1.
For each spatial scale, we then tested whether the scalespecific variation in AGBD among sites is explained by elevation range using Spearman's ρ correlation tests between the normalized wavelet variance and the elevation range.

Implications of local variability in AGBD for large-footprint remote sensing calibration
To assess the implications of local spatial variability in AGBD for remote sensing calibration, we explored the joint influence of field plot size and of footprint size of a hypothetical remote sensing observation on the sampling error associated with an AGBD estimate.We simulated different plot sizes and footprint sizes under the best-case scenario in which the remote sensing instrument was able to retrieve the exact value of AGBD as measured in field plots.Because the remote sensing field of view often differs from the fieldbased one, we simulated a spatial mismatch between the plot and footprint shape; for simplicity, we modelled the remote sensing pixels as circles and the calibration plots as squares.
More precise quantification of such spatial mismatch could be obtained using sensor-specific and 3-D simulation approaches.We simulated field plots of 0.04, 0.1, 0.25, 0.5, 1, 2 and 4 ha centred in remote-sensing circular footprints of 0.5, 1, 2 and 4 ha (Fig. 2).We then estimated the error associated with using the field plot to estimate AGBD in the footprint, henceforth referred to as sampling error.Note that this approach more generally attempts to assess the errors generated when sample measurements are extrapolated to a larger scale.Specifically, we calculated ErrCV as the ratio between the root mean square error (RMSE) and the mean AGBD within footprints (MAGBD) for each combination of areas in which the field plot area is less than or equal to the footprint area: where N is the number of simulations (1000 per combination), AGBD footprint,i is the AGBD within the remotesensing footprint (i.e. the circle) for the ith simulation, and AGBD subplot,i is the AGBD within the field subplot for that simulation.Five of our plots (the Haliburton plot and the four Ituri plots) were too small to accommodate a circular 4 ha footprint and were thus not included in the calculation of ErrCV at this scale.
To illustrate how this sampling error propagates into AGBD maps, we then fitted calibration equations from the combination of simulated remote sensing pixels and field calibration plots.For this exercise, we simulated square remote sensing pixels of 4 ha, thus mimicking the expected resolution of the BIOMASS mission's future products (Le Toan et al., 2011).Given the size of our field plots, we were able to simulate 60 such pixels (i.e. two pixels per plot for 30 plots).Within each simulated pixel, we assumed that a single randomly located field plot was available for calibration, for area of 0.01, 0.04, 0.25, 0.5, 1 or 2 ha (i.e. 60 calibration plots, one per 4 ha pixel).For each field plot scale we calculated the coefficients of an ordinary least squares (OLS) linear regression between the AGBD estimated in the calibration subplots of a given area and the simulated pixels.We changed the location of the subplots in each plot a thousand times and averaged the regression coefficients for each subplot size.
It is well-established in the statistical literature that random error in the independent variable, such as that which results from sampling error in field plots, leads to systematic underestimation of the OLS regression slope, a bias referred to as attenuation or regression dilution (Fuller, 1987).This phenomenon is easily understood as the OLS slope β is calculated as β = σ 2 (X, Y )/σ 2 (X), where σ 2 (X, Y ) is the covariance of X and Y and σ 2 (X) is the variance of X.If W is a measure of X with measurement error (that is, W = X +ε X ), then σ 2 (W ) > σ 2 (X) (Mcardle, 2003).Hence, the estimate of β tends to zero as the measurement error in X increases to infinity, a phenomenon referred to as the dilution bias.
Several methods have been proposed to correct for this bias (Carroll and Ruppert, 1996;Frost and Thompson, 2000;Smith, 2009).The method of moments estimator (Carroll and Ruppert, 1996;Fuller, 1987) assumes that a corrected slope, β MM , could be calculated from the observed slope, β, using a reliability ratio, R r , with where To estimate σ 2 (ε X ), the variance of the sampling error in X, we generated new estimates of X (here the AGBD of calibration plots) by bootstrapping over 0.01 ha (10 × 10 m) subplots the calibration plot (i.e. 100 bootstrapped values for each of the 60 calibration plots).The reliability ratio R r was estimated using the intraclass correlation coefficient (ICC), an accurate proxy for R r (Frost and Thompson, 2000), considering the bootstrapped values as repeated measures grouped by calibration plot units.ICC was estimated through a one-way analysis of variance of repeated measures considering the calibration plots as a factor.This approach was called "within subplot R r ".We also carried out a second reliability study based on additional subplots (i.e.replicates) established randomly inside the 4 ha pixels (in Supplement S2).
We evaluated two alternatives to OLS that have the potential to produce less bias in calibration equations.First, the reduced major axis (RMA) regression minimizes the sum of squared distances both horizontally (accounting for the error in X) and vertically (accounting for the error in Y ).Second, the nonparametric Theil-Sen estimator, also known as Sen's slope estimator or the single median method, is the median of all the slopes determined by all pairs of observations.Both methods have been proposed as preferred alternatives to OLS in remote sensing studies (Cohen et al., 2003;Fernandes and Leblanc, 2005;Mitchard et al., 2013;Ryan et al., 2012).
All analyses were performed using R version 3.0.2(R Development Core Team, 2013).The R code for the analyses is available on request from the first author.

Local spatial variability in AGBD
The coefficient of variation for AGBD at the 1 ha scale, CV(1), varied among sites (n = 30) from 5.1 (Haliburton, Canada) to 29.9 % (Palanan, Philippines), with a mean of 16.6 %, and a median of 15.2 % (Table S2).The best predictor of variation in CV(1) among plots was within-plot elevation range, that is, the difference between the highest and lowest elevation (Spearman's ρ = 0.70 and p < 10 −4 ; Fig. 3a).Thus, topographic variability, represented in the analyses by elevation range across the plot, explained considerable variation in AGBD variability among sites at the 1 ha scale.In contrast, CV(1) was not significantly correlated with mean AGBD (Spearman's correlation test, p = 0.15), and did not differ significantly among forest types (tropical, subtropical and temperate; Kruskal-Wallis test; p = 0.47) or among continents (Kruskal-Wallis test; p = 0.18).Asian tropical field plots tended to show higher biomass variability than other tropical field plots (median CV(1) of 24.4 and 14.3 % respectively), consistent with their higher average topographical variability (median elevation range of 90 m for Asian tropical plots and 24 m for tropical non-Asian plots).
Regressing the logarithm of CV(s) against ln(s), we found that in 15 of 30 sites the slope was significantly greater (less negative) than −1/2, suggesting significantly positive spatial autocorrelation in AGBD at the scales investigated.In contrast, in only two sites, the Ituri Edoro 1 plot in Democratic a Spatial variability (1 ha) and topography b Spatial scaling of AGBD variability q q q q q q q q q q q q q q q q q q q q 0 50 100 CV(d) (%) q 20x20 m² quadrats 50x50 m² quadrats 100x100 m² quadrats Republic of Congo and the Paracou plot in French Guiana (Fig. 3b, Table S2-3), the slope was significantly lower than −1/2 , suggestive of negative spatial autocorrelation.Sites with greater elevation range showed shallower fitted slopes (Spearman's ρ = 0.47 and p = 0.01).Such positive spatial autocorrelation means that extrapolation from 1 ha values under the assumption of no spatial autocorrelation will lead to a slight but systematic overestimation of CV(s) for areas (s) smaller than 1 ha, and underestimation for areas larger than 1 ha (Fig. S3).

Local spatial structure in AGBD
Variograms revealed only weak spatial autocorrelation of AGBD at 20, 50 and 100 m resolution over distances of 20-400 m (Figs. 4, S5).The average coefficient of variation for AGBD was only slightly higher between distant subplots than between neighbouring ones.Though these increases with distance were generally very small, they were statistically significant in half of the plots at 20 and 50 m resolution (Figs.S6-8), consistent with the results of the analysis of the slope of spatial variability with plot scale (see above), showing that even weak spatial aggregation may have an influence on the scaling of variability in AGBD.
Wavelet analyses also showed a relatively small departure from the complete spatial randomness (Figs. 5, S9).The average normalized wavelet variances at scales above ∼ 90 m were greater than 1, indicating that a substantial part of the spatial structure of AGBD occurs at these scales.Interestingly, many sites showed low variability at intermediate scales (25-75 m).The plots with greater elevation range were characterized by larger wavelet variances at scales > 100 m (Figs. 5, S9), suggesting that the large-scale variations are driven by topographic effects.

Implications of local spatial variability in AGBD for large-footprint remote sensing calibration
Field-based sampling error depended on both field plot and remote sensing footprint areas.For very small field subplots (0.1 ha and below), sampling error was due mostly to field sampling and was relatively insensitive to the footprint size (Fig. 6).For subplots and footprint size of 0.5 ha and larger, subplot area and footprint area had similar effects on the sampling error.versus square) was much higher for small calibration plots: when the field calibration plot area was equal to the footprint area (i.e. a ratio of 1; Fig. S10).Field-based sampling error resulted in systematic underestimation of calibration slopes, which could not be corrected through any currently available statistical approaches.The OLS regression slope was underestimated by an average of 54 % with 0.1 ha subplots and by 37 % with 0.25 ha subplots (Fig. 7a; see examples of fits in Fig. S11).The large sampling errors associated with small field plots caused large dilution biases (i.e.slope underestimation).Such dilution biases result in an underestimation of the variance in AGBD; in particular, application of the resulting calibration equations would produce systematic underestimation of AGBD in high AGB areas, and systematic overestimation in low AGBD areas.Alternatives to OLS models, such as reduced major axis (RMA) regression and the Theil-Sen estimator, corrected for at best half of this bias (Fig. 7b).Our bias correction approach, based on bootstrapping over spatial variability within subplots, outperformed the RMA and the Theil-Sen estimator for plots ≥ 0.25 ha, but remained too conservative ("within subplot R r " in Fig. 7b).The alternative reliability study approach involving replicate subplots did somewhat better, but requires greatly increased ground sampling effort (in the Supplement Table S2, Fig. S2).

Discussion
Given the pressing need to monitor global forest carbon stocks, ecologists and remote sensing experts need to pay careful attention to quantifying the errors associated with forest carbon estimates.Our results quantify large spatial variability in mean AGBD for plot sizes smaller than 0.25 ha (the mean CV was of 26 % at the 0.25 ha resolution; Table S2).This large local spatial variability in AGBD results in substantial sampling errors when small plots are used to estimate AGBD within larger areas, which in turn bias calibration equations based on such estimates.Many forest inventory plots are much smaller than 0.25 ha and are regularly used for calibrating coarser-resolution remote sensing products.Our findings suggest that using such small field plots to calibrate coarser-resolution remote sensing products is likely to cause strong systematic biases in carbon maps.

Local spatial variability and spatial structure of AGBD
We found that the coefficient of variation in AGBD averages ∼ 16.6 % at 1 ha, and scales roughly with s −1/2 , where s is the plot area.This present study confirms the findings of previous studies of individual sites or forest types (Baraloto et al., 2013;Chave et al., 2003;Holdaway et al., 2014;Keller et al., 2001;Wagner et al., 2010) and generalizes the results to many sites that encompass a wide range of forest types and topographical variation.We found that spatial variability of AGBD tended to be greater in hilly terrain, confirming that topography is a major driver of AGBD variability (e.g. de Expected sampling errors when the calibration/validation plots and the remote sensing footprint differ in shape and size.The remote sensing footprint is assumed circular, and subplots are assumed to be square to simulate the spatial mismatch between the remote sensing signal and the calibration plot (Fig. 2).The mean ErrCV in AGBD estimates across all sites (n = 30) is both given within the figure and illustrated by colours, and the range of ErrCV across sites is given in parentheses below the mean.Castilho et al., 2006;Detto et al., 2013).This is an important finding given that 23 % of the world's forests are on hilly terrain (Table S4).This result suggests that forest biomass maps in hilly areas have larger uncertainties, and that forest plot sampling designs should take topography into account (see below).We found no other systematic differences in AGBD variability among continents, among forest types or with mean AGBD.The higher AGBD variability found in our tropical Asian study sites compared with other tropical sites was probably due to their larger topographic variability.This finding is no accident of our study locations; remaining oldgrowth tropical forests in Asia are disproportionately located in topographically complex terrain, more so than on other continents (Table S4), probably because these areas have disproportionately escaped human disturbance.
Approximately half of the sites individually exhibited statistically significant spatial autocorrelation in AGBD.Decomposition of the variance in AGBD at different spatial scales using wavelet analyses confirmed spatial aggregation at scales > 100 m, and the role of topography in explaining aggregation at these scales (Fig. 5b).These results suggest that the weak spatial autocorrelation found in many plots is due to broad-scale topographic differences.In a previous scale-wise analysis of a 5000 ha area of moist tropical forest, Detto et al. (2013) likewise found strong wavelet coherence between canopy height (a proxy for AGBD) and topography at scales of 100-800 m.These scale-specific results are consistent with prior literature (reviewed in Detto et al., 2013) documenting how forest structure and biomass vary with topography (de Castilho et al., 2006;McEwan et al., 2011;Valencia et al., 2009).
In most plots, the wavelet analyses also revealed that spatial variability specific to scales of 25-75 m was lower (i.e. more uniformly distributed) than expected by chance.We hypothesize that this pattern may be associated with neighbourhood competition and gap-phase dynamics.That is, the forest can be thought of as a mosaic of patches of different age, reflecting time since the last disturbance (e.g.major treefall), with patch age strongly influencing AGBD (Moorcroft et al., 2001).Within such patches, biomass variation is reduced by the common time since disturbance, and also because local competition may cause large trees to be more evenly spaced than would be expected by chance (Lutz et al., 2013).This local uniformity is overlaid on the larger-scale topographic variation, and is evident only through scale-wise wavelet analyses that separate the two.

Field sampling error and remote sensing of carbon stocks
We showed that when field plots were very small (0.1 ha and below), the sampling error was due mostly to the contribution from field sampling, and was relatively insensitive to footprint area.Hence, with relatively high-resolution pixels such as in the Landsat (30 m) or ICESat/GLAS (Ice, Cloud, and land Elevation Satellite/Geoscience Laser Altimeter System) (∼ 70 m) products, sampling errors are likely to be very high if smaller plots are used or if spatial mismatches between the field and the sensor signal occur.This is because most of the AGBD variability is at the local scale so that a small difference between the areas sampled in the ground and by the sensor generates a large error.This is well illustrated by our finding that the error was much lower for large calibration plots even when the same ratio of calibration plot area to footprint area was maintained (Fig. S10).This reflects decreasing edge-to-area ratios for larger areas, which also provide other advantages for larger plots (see also Mascaro et al., 2011;Zolkos et al., 2013).
Our analyses show that the field-sampling strategy may result in a serious bias in model calibration of remote sensing products.When this bias is present, inversion models return AGBD values that are regressed to the mean of the calibration plots (Fig. 7a), and thus underestimate the true spatial AGBD variance.For instance, in a recent study that used 112 circular 0.13 ha plots to calibrate L-band radar products (Carreiras et al., 2012), the slope of an OLS regression was found to be underestimated by 86 % and the final AGBD map displayed a much lower variance than the map produced by Saatchi et al. (2011).The dilution bias is independent of the number of calibration plots; it depends only on the sampling error associated with these plots, which is determined largely by plot size.Though the mean AGBD of the calibration plots is inherently correctly predicted (Fig. 7a), the landscape mean AGBD and thus the landscape total AGBD will be correctly predicted only if the landscape mean is identical to the mean of the calibration plots.We found that the best way to diminish the dilution bias is to bootstrap over spatial variability using subplots within plots and to correct the estimated slope using these simulated "replicates".Some remote sensing studies have argued that alternatives to OLS regression such as RMA or the Theil-Sen estimator are good alternatives to OLS regression when errors occur in X (Cohen et al., 2003;Fernandes and Leblanc, 2005;Mitchard et al., 2013;Ryan et al., 2012).Here, we showed that these alternatives do not resolve the dilution bias and still provide strongly biased products.In theory, the dilution bias could be removed completely through Deming regression; however, this approach requires information on the ratio of the error variances in the two variables (Deming, 1944).The results we present here can assist in the estimation of error variances for field plots of different sizes.However, estimating error variances for remote sensing products -that is, their error in providing an estimate of the true value of AGBD -remains a challenge.

Implications for designing forest inventories and remote sensing calibration schemes
Our careful quantification of local spatial variability and spatial structure in AGBD should be useful for the design of national and regional forest inventories, as well as in remote sensing applications.Weak spatial autocorrelation at scales of less than 100 m suggests that there is generally no gain in representativeness from locating multiple small plots within a small area or footprint (≤ 100 m) when compared to establishing one larger plot in the same area.That is, because neighbouring small plots are on average almost as different as more distantly located small plots, expanding a single small plot provides similar information as adding another small plot nearby.A number of forest inventory designs use clusters of very small plots (≤ 0.04 ha); e.g. the US Forest Service Forest Inventory and Analysis program (Bechtold and Patterson, 2005).Based upon our results these cluster designs appear to have distinct disadvantages for calibrating remote sensing products as their small dimensions are below the resolution of most sensors, and their edge to area ratios are higher than single larger plots for the same total area.Although small plots may have practical advantages in time needed for field sampling and reduced equipment costs, these advantages should be carefully weighed against the disadvantages for biomass measurements.Such small plots may induce strong biases when used individually for calibrating coarser-resolution remote sensing products.
Our results reinforce the importance of topography as a factor that should be taken into account in designing forest inventories.AGBD variation at scales of > 100 m was strongly associated with topographic variation in our analyses as was also found in previous studies (Detto et al., 2013).This suggests that sampling should generally be stratified by topographic position (e.g.ridges, valleys and slopes), especially if landscape AGBD is to be estimated purely from a field-based approach.In contrast, where the aim of field sampling is to calibrate coarse-resolution remote sensing products, this might suggest that topographically complex areas should best be avoided to minimize sampling errors associated with local spatial variability.However, the gain from reducing such sampling errors would have to be weighed against the potential to bias the calibration sample if forests in topographically complex areas differ systematically in the relationship between remote sensing signals and AGBD.
The best way to avoid the dilution bias is to use calibration plots covering entire remote sensing pixels.For remote sensing tools with a resolution on the order of 4 ha, such as the planned BIOMASS mission, it is realistic to invest in a network of similarly sized field calibration plots.Though such field sampling is expensive, it would greatly improve the basis for mapping forest biomass, and its cost would remain small compared with the investment in the satellite itself.An alternative is to use a two-step approach in which a coarse-resolution remote sensing product is calibrated against a higher-resolution remote sensing product itself calibrated with field plots.For instance, airborne lidar may retrieve forest carbon stocks with an error of ca.10-15 % at 1 ha resolution (Mascaro et al., 2011;Zolkos et al., 2013).This compares favourably with errors from purely field-based estimates for 1 ha and smaller plots (Fig. 3).Errors in lidar-based estimates are expected to be even lower for larger areas, as random errors average out (Mascaro et al., 2011).Baccini and Asner (2013) found that using wallto-wall airborne lidar AGBD estimates to calibrate a 500 m resolution MODIS product led to much less error than using nested AGBD estimates from GLAS footprints (60-75 m resolution).This shows that even if the operational cost associated with lidar coverage is high, the use of lidar technology has the potential to greatly reduce the errors during the calibration step.In this case, care must be taken that errors are carefully and appropriately propagated through the two-stage calibration to the final map (Asner et al., 2013).
Future research should integrate the results of this study with information on other sources of error in order to assess the relative importance of field sampling errors to forest carbon estimation and make appropriate recommendations.Other important sources of error in forest carbon estimates include field measurement errors (Flores and Coomes, 2011;Larjavaara and Muller-Landau, 2013), biomass allometries (Chave et al., 2014(Chave et al., , 2004;;Molto et al., 2013), data cleaning procedures (Muller-Landau et al., 2014), and wood carbon content (Thomas and Martin, 2012).At the scale of forest inventories and calibration schemes, a major source of error is the uneven and nonrandom distribution of plots at broad spatial scales, an outstanding problem in the tropics where, for example, the central Amazon, the central Congo Basin, and swamp forests all remain insufficiently sampled.

Conclusions
Accurate measurements of forest carbon stocks are critical to reduce uncertainties in the global carbon budget and for the REDD (Reducing Emissions from Deforestation and Forest Degradation) programme.However, uncertainty associated with forest carbon maps remains poorly quantified (but for notable exceptions see Asner et al., 2013;Gonzalez et al., 2010;Mermoz et al., 2014).In this paper, we used a largescale global data set to illustrate that high local spatial variability in AGBD leads to large sampling errors when plots of standard sizes (e.g.0.1, 0.25, 1 ha) are used to estimate AGBD over larger areas (e.g. 4 ha, the expected resolution of BIOMASS products).We also show that remote sensing estimates of biomass density that rely on field data for calibration may be highly biased if such field-sampling errors are large.Such biases have previously been ignored by the remote sensing community and, as we show, can only be partially corrected by available statistical tools.Overall, our results strongly suggest that calibration of coarse-resolution remote sensing products to estimate forest carbon would benefit greatly from more investment in large forest plots that are large enough to encompass entire pixels.We hope that this contribution will stimulate further work on the propagation of field sampling errors to remote sensing products and that future studies will pay more careful attention to field sampling and calibration strategies.
The Supplement related to this article is available online at doi:10.5194/bg-11-6827-2014-supplement.

Figure 1 .
Figure 1.Geographical distribution of the 30 study sites (red points) included in the present study, relative to the global distribution of forest (green) from GLOBCOVER2009 (Bontemps et al., 2011), and the boundaries between temperate and subtropical areas (blue and orange dashed lines) and between subtropical and tropical areas (orange and green dashed lines) from Fischer et al. (2012).The four sites at Ituri (Democratic Republic of Congo) are represented by a single dot due to their proximity.Note that Fischer et al. (2012) classify the Yosemite site as subtropical, but we considered it as temperate due to its high elevation.Details on study sites are provided in TableS1.

Figure 2 .
Figure2.Schematic representation of the simulations used to assess expected errors when the calibration/validation plots and the remote sensing footprint differ in shape and size.(1) Within each large mapped plot, a point is chosen to be the centre of both the simulated remote sensing footprints and the simulated calibration subplots; it is chosen randomly from all points for which the largest footprints and calibration plots are fully inside the mapped large plot.(2)AGBD footprint is calculated within circular areas centred on this point, simulating the remote sensing footprint, for the listed sizes.(3)AGBD subplot is calculated within square areas centred on this point, simulating calibration/validation plots, for the listed sizes.We replicated this procedure 1000 times and then calculated the root mean squared error of AGBD subplot relative to AGBD footprint for each combination of areas in which the subplot area is less than or equal to the footprint area, and normalized by the mean AGBD footprint to obtain a measure of relative error specific to that combination of scales, ErrCV (see Eqs. 3-5).

Figure 3 .
Figure 3. Local spatial variability in AGBD as a function of topographic variability and of spatial scale.(a) The variability at the 1 ha scale, CV(1), was positively correlated with elevation range among plots (one point per site).(b) The variability declined with increasing spatial scale within each site (one dashed line per site) and in the cross-site mean (solid black line) and deviated from the slope of −0.5 (on log-log scales) expected in the absence of spatial autocorrelation in AGBD.Separate graphs for each individual site are provided in Fig. S3 and standardized CV measures within 4 ha subplots are shown in Fig. S4.

Figure 4 .
Figure 4. Spatial variograms of AGBD for three different spatial resolutions.Ensemble average variograms for AGBD in square subplots of size 20 × 20 m, 50 × 50 m and 100 × 100 m, with variances transformed into distance-specific coefficients of variation (CV(d)).Variograms for individual plots at each spatial resolution are shown in Fig. S5.Separate graphs for each site, with confidence intervals for the null hypothesis of no spatial correlation, are shown in Figs.S6-8.

Figure 5 .
Figure5.Scale-wise decomposition of spatial variation in AGBD and its relationship to elevation range.(a) The normalized wavelet variance of AGBD as a function of spatial scale for individual plots (coloured lines) and for the ensemble average across plots (solid black line).A wavelet variance at a given scale reflects the spatial structure of AGBD specific to that scale, with a value of 1 (solid grey line) indicating no spatial autocorrelation, lower values indicating negative spatial autocorrelation, and higher values positive spatial autocorrelation.Separate graphs for each site, with confidence intervals for the null hypothesis of no spatial correlation, are shown in Fig.S9.(b) Among-site Spearman's ρ correlation of the elevation range with the wavelet variance for different spatial scales.P values of Spearman's ρ correlation tests are provided within the panel.
Figure6.Expected sampling errors when the calibration/validation plots and the remote sensing footprint differ in shape and size.The remote sensing footprint is assumed circular, and subplots are assumed to be square to simulate the spatial mismatch between the remote sensing signal and the calibration plot (Fig.2).The mean ErrCV in AGBD estimates across all sites (n = 30) is both given within the figure and illustrated by colours, and the range of ErrCV across sites is given in parentheses below the mean.

Figure 7 .
Figure 7. Propagation of field sampling error to remote sensing products: the dilution bias.(a)The mean regression lines obtained from an OLS linear regression between the AGBD estimated within 4 ha pixels randomly established in large plots (n = 60, dependent variable) and variable-size subplots located within these pixels (independent variable) differ depending on subplot areas (see key), and are biased with respect to the true slope of 1 (slope dilution biases associated with each subplot area are provided in parentheses).All the lines cross at the mean AGBD over all sites.(b) Different potential correction methods (see key) result in improved estimates of the slopes, but still retain considerable bias.The points corresponding to the lines in panel (a) are shown with matching colours.The true slope of 1, i.e. the slope that would have been obtained without bias, is illustrated by the solid grey line.