An analysis of forest biomass sampling strategies across scales


Abstract. Tropical forests play an important role in the global carbon cycle as they store a large amount of carbon in their biomass. To estimate the mean biomass of a forested landscape, sample plots are often used, assuming that the biomass of these plots represents the biomass of the surrounding forest.
In this study, we investigated the conditions under which a limited number of sample plots conform to this assumption. To this end, the minimum sample size for predicting the mean biomass of tropical forest landscapes was determined by combining statistical methods with simulations of sampling strategies. We examined forest biomass maps of Barro Colorado Island (50 ha), Panama (50 000 km²), and South America, Africa, and Southeast Asia (3 × 10⁶ to 11 × 10⁶ km²).
The results showed that around 100 plots (1-25 ha each) are necessary for continent-wide biomass estimations if the sampled plots are randomly distributed. However, the locations of current inventory plots often do not meet this requirement, for example, because their sampling design is based on spatial transects along climatic gradients. We show that these nonrandom locations require a much higher sampling intensity (up to 54 000 plots for accurate biomass estimates for South America). The number of sample plots needed can be reduced by using large distances (5 km) between the plots within transects.
We also applied novel point pattern reconstruction methods to account for the aggregation of inventory plots in known forest plot networks. The results implied that current plot networks can have clustered structures that reduce the accuracy of large-scale estimates of forest biomass if no further statistical approach is applied. To establish more reliable biomass predictions across South American tropical forests, we recommend establishing more spatially randomly distributed inventory plots (minimum: 100 plots) and ensuring that analyses of inventory plot data consider their spatial characteristics. The precision of forest attribute estimates depends on the sampling intensity and strategy.

Introduction
For a better understanding of the global carbon cycle, reliable estimations of aboveground biomass (AGB) in vegetation have become increasingly important (Broich et al., 2009;Malhi et al., 2006;Marvin et al., 2014), especially for tropical forests, as they store more carbon in biomass than any other terrestrial ecosystem (Pan et al., 2011). Current biomass mapping approaches are based on forest field inventory plots (e.g., Chave et al., 2003;Lewis et al., 2004;Malhi et al., 2006;Mitchard et al., 2014) or remote sensing measurements (e.g., Asner et al., 2013;Avitabile et al., 2016;Baccini et al., 2012;Saatchi et al., 2015) and involve statistical approaches (e.g., Malhi et al., 2006) or vegetation modeling (e.g., Rödig et al., 2017). Remote-sensing-derived maps have a typical spatial resolution of 100-1000 m and capture the biomass of large landscapes or even entire continents (Asner et al., 2013;Avitabile et al., 2016;Baccini et al., 2012;Saatchi et al., 2011). In contrast, biomass maps based on field inventories have a higher resolution so that the local distribution in biomass can be described in detail.
However, the biomass estimation of large forest landscapes by field inventory plots (typically between 0.25 and 1 ha) poses several challenges in the tropics. Firstly, field inventory campaigns in species-rich, densely grown tropical forests are costly and labor intensive, resulting in a much smaller number of available plots than in temperate and boreal regions. Currently, tropical forests are sampled with less than one plot per 1000 km²: a density that is up to 15 times lower than in the temperate zone. For instance, the US national forest inventory includes more than 125 000 forest plots (Smith, 2002), which corresponds to 40 plots per 1000 km². In contrast, investigations of the South American Amazonian forest are often based on fewer than 500 forest plots (0.05 plots per 1000 km²; Mitchard et al., 2014), with a highly debated sampling error (Marvin et al., 2014;Mitchard et al., 2014;Saatchi et al., 2015).
Secondly, the establishment of forest plots is often limited, mainly for topographic, logistic, or political reasons (Houghton et al., 2009;Mitchard et al., 2014). Even if plots are representative of the landscapes (Anderson et al., 2009), extrapolations from clustered plot networks to larger scales can be biased (Fisher et al., 2008). Consequently, biomass estimations can include large uncertainties; for example, estimates of the total biomass of the Amazon (93 ± 23 PgC, based on 227 forest plots) include uncertainties of about 25 % (Malhi et al., 2006).
A first step to ensure reliable extrapolations of forest biomass from field plots to large scales is to determine how many plots would be necessary to accurately estimate mean biomass on a regional scale. Previous studies have suggested that, for regions of about 1000 ha, 10-100 sampled 1 ha plots would be necessary (Marvin et al., 2014). However, most investigations assume that plots or biomass are distributed randomly in space (Fisher et al., 2008;Keller et al., 2015;Marvin et al., 2014) and therefore do not consider a possible bias due to the choice of sampling strategy. The selected sampling design can significantly influence uncertainty and, consequently, the number of sample plots required (Clark and Kellner, 2012). A deeper understanding of how the sampling design and the plot size affect the number of plots required is still lacking.
In this study, we present a novel simulation approach for determining the number of plots necessary across scales, answering the following questions: (i) how many sample plots are necessary for forest biomass estimations in South America, and what is the role of the sampling strategy? (ii) What is the influence of scale on the sampling design?
More specifically, we analyze different sampling strategies for biomass in tropical forests at different scales: 50 ha (Barro Colorado Island, BCI), 50 000 km² (Panama; Asner et al., 2013), and 11 × 10⁶ km² (South America; Baccini et al., 2012). Following the "virtual ecologist" approach (Zurell et al., 2010), we use Monte Carlo simulations and analytical methods to investigate the plot size and sample size necessary for accurate biomass estimations. Furthermore, we simulate nonrandom sampling strategies that imitate transect measurements and real-world forest inventories.

Biomass maps at different scales
We focus on three forest biomass datasets for the South American tropical region covering different scales (Fig. 1). For the local-scale analysis, we used a 50 ha biomass map of the Barro Colorado Island forest in Panama at resolutions between 10 and 100 m. The map was based on the 2010 forest inventory (Condit et al., 2012), which included measurements of all trees with a stem diameter greater than 1 cm (Condit, 1998). The AGB per plot was determined using allometric relationships (see the Supplement of Knapp et al., 2018, for details).
Regional-scale analysis was carried out using a carbon density map of Panama that was derived from airborne light detection and ranging (lidar) measurements from 2012, in combination with field measurements and satellite measurements (Asner et al., 2013). The AGB values for this study were calculated by multiplying the carbon values by a factor of 2. We aggregated the AGB map from a 100 m resolution to resolutions of 200, 300, 400, and 500 m. When aggregated pixels covered a mixture of forest and non-forest areas, we assumed the non-forest areas to have a biomass of zero.
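A minimal sketch of the aggregation step, assuming the 100 m map is held as a NumPy array in t ha⁻¹ with non-forest cells stored as NaN. Coarse pixels are plain block means in which non-forest cells count as zero, as described above; dropping incomplete edge blocks and marking coarse pixels that contain no forest at all as NaN are our assumptions, not details taken from Asner et al. (2013). The function name `aggregate_agb` is hypothetical.

```python
import numpy as np


def aggregate_agb(agb, factor):
    """Aggregate an AGB raster (t/ha) by an integer factor (100 m -> 500 m: 5).

    Non-forest cells are NaN and count as zero biomass inside a coarse pixel;
    coarse pixels without any forest cell are returned as NaN (assumption).
    """
    rows = (agb.shape[0] // factor) * factor  # drop incomplete edge blocks
    cols = (agb.shape[1] // factor) * factor
    trimmed = agb[:rows, :cols]
    shape = (rows // factor, factor, cols // factor, factor)
    filled = np.nan_to_num(trimmed, nan=0.0).reshape(shape)  # non-forest -> 0
    has_forest = (~np.isnan(trimmed)).reshape(shape).any(axis=(1, 3))
    coarse = filled.mean(axis=(1, 3))  # block mean over all cells
    coarse[~has_forest] = np.nan  # entirely non-forest: outside the forest map
    return coarse
```

For the resolutions used here, `factor` would be 2, 3, 4, or 5 (200-500 m from the 100 m grid).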
At the continental scale, we used a biomass map covering South America, Africa, and Southeast Asia with a spatial resolution of 500 m (Baccini et al., 2012). The biomass values of this map represent aboveground vegetation biomass for the period 2008-2010 and were derived from a combination of MODIS data, lidar measurements, and field data. For our analysis, we combined this biomass map with a biome map (Dinerstein et al., 2017) and excluded all areas covered by grasslands, savannas, and shrublands as well as areas with an aboveground biomass of less than 25 t ha⁻¹. The remaining areas were then assigned to one of the following four tropical and subtropical forest biomes: (a) dry broadleaf forests, (b) moist broadleaf forests, (c) coniferous forests, and (d) mangroves.
Based on the continental forest biomass map of South America at 500 m resolution, we constructed an additional biomass map of South America with a 100 m resolution using two different downscaling approaches (for details, see Sect. S3 in the Supplement). The downscaling relationships were derived from the Panama map by upscaling this map from 100 to 500 m resolution.

Simulated sampling strategies
We investigated three different sampling strategies, (a) random sampling, (b) transect sampling, and (c) clustered sampling (Fig. 2), with different sample sizes. For example, for the analysis of the BCI forest, we divided the 50 ha biomass map into 200 square plots of 50 m × 50 m (note that results could differ slightly for circular plots; Lindsey et al., 1958). We then ran simulations with sample sets containing one sample (0.25 ha), two samples (0.50 ha), and so forth, up to a sample size of 100 samples (25 ha, half of the study area). For the large-scale investigations, we analyzed sample sizes of up to 5000 plots for Panama and 200 000 plots for South America.

Random sampling
Random sampling was analyzed using Monte Carlo simulations. For every map, we selected sample plots at randomly chosen positions (without replacement) until the sample set reached the desired sample size. Random sampling is the only strategy for which we can assume that the spatial autocorrelation of the map does not affect the analytical solution based on the central limit theorem (Sect. S1).
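For random sampling, the sample mean is approximately normal by the central limit theorem, so the probability P_n that the mean of n plots lies within ±10 % of the map mean can be approximated without simulation. The Python sketch below shows one common form of this approximation and is not necessarily the exact derivation of Sect. S1; the optional finite population correction for sampling without replacement and the function name `analytic_pn` are our additions.

```python
import math


def analytic_pn(mu, sigma, n, tol=0.10, N=None):
    """CLT approximation of P_n for simple random sampling.

    mu, sigma: mean and standard deviation of plot biomass over the map
    n: number of sampled plots; N: total number of plots in the map (optional)
    """
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    if N is not None and N > 1:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction
    if se == 0:
        return 1.0  # the whole map was sampled
    z = tol * mu / se
    # P(|mean - mu| <= tol * mu) = 2 * Phi(z) - 1 = erf(z / sqrt(2))
    return math.erf(z / math.sqrt(2))
```

Because z depends only on sigma/mu, the approximation is driven by the coefficient of variation of plot biomass and by n, not by the absolute biomass level.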

Transect sampling
Transect sampling mimics sampling strategies used whenever plots should cover different gradients (e.g., climate or soil gradients). In this case, field inventory plots are established along a straight line. In our simulation approach, we assume for simplicity north-south transects that start at a randomly selected position on the map. Within one transect, the plots are placed at regular distances of 0.5, 1, or 5 km. Whenever the transect reaches the southern end of the map, a new randomly selected north-south transect is started at the northern border.
The analysis of Panama was conducted by selecting plots of 1 ha (map with 100 m resolution). For South America, we selected plots of 25 ha (map with 500 m resolution). To explore whether the north-south climatic gradient influences the results, we also tested west-east instead of north-south transects.
However, the sampling performance remained similar (i.e., the probability of estimating the mean biomass accurately did not change considerably compared to north-south transects).
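The transect procedure can be sketched in Python as follows. This is a hypothetical illustration, not the authors' R code: the function name, skipping non-forest (NaN) cells, and the `max_steps` guard are assumptions. Distances are expressed in raster cells, so 5 km corresponds to 50 cells on the 100 m Panama map and 10 cells on the 500 m South America map.

```python
import numpy as np


def transect_sample(agb, n, spacing, rng, max_steps=1_000_000):
    """Draw n plot values along north-south transects (hypothetical sketch).

    agb: 2-D float biomass raster, rows ordered north -> south, NaN = non-forest
    spacing: distance between consecutive plots, in raster cells
    """
    n_rows, n_cols = agb.shape
    row, col = rng.integers(n_rows), rng.integers(n_cols)  # random first start
    values, steps = [], 0
    while len(values) < n:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("could not collect n forest plots; check the map")
        if row >= n_rows:  # transect reached the southern border:
            row, col = 0, rng.integers(n_cols)  # new column, start in the north
            continue
        value = agb[row, col]
        if not np.isnan(value):  # assumption: non-forest cells are skipped
            values.append(value)
        row += spacing  # next plot further south along the same transect
    return np.array(values)
```

The returned values can be fed into the same accuracy criterion (sample mean within ±10 % of the map mean) as the random samples.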

Clustered sampling
The clustered sampling approach mimics the spatial clustering of real-world field inventory networks. To this end, we reconstructed the spatial pattern of the plot networks of four studies that estimated forest biomass. After removing duplicate locations within the 500 m grid as well as plots located in grasslands, savannas, or shrublands (according to Dinerstein et al., 2017), the number of plots per network ranged between 23 and 167. To generate 1000 plot networks with spatial configurations similar to the original ones, we applied the method of pattern reconstruction (Wiegand et al., 2013; software "Pattern-Reconstruction"). This annealing method produces stochastic reconstructions of an observed point pattern that show the same spatial characteristics as the observed pattern, as quantified by several point pattern summary functions (for details, see Sect. S2).
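To illustrate the idea behind pattern reconstruction, the Python sketch below matches a single summary statistic, the histogram of pairwise inter-plot distances, using a greedy variant of annealing that only accepts point moves that lower the mismatch ("energy"). This is a deliberately simplified stand-in: the Pattern-Reconstruction software of Wiegand et al. (2013) combines several summary functions and edge corrections, and the function names, rectangular window, and bin choice here are hypothetical.

```python
import numpy as np


def distance_histogram(points, bins):
    """Normalized histogram of all pairwise inter-point distances."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)  # count each pair once
    hist, _ = np.histogram(dist[upper], bins=bins)
    return hist / len(upper[0])


def reconstruct_pattern(observed, window, bins, n_iter, rng):
    """Greedy reconstruction: relocate one point at a time, keep improvements.

    observed: (N, 2) array of plot coordinates inside the window
    window: (width, height) of a rectangular observation window
    Returns the reconstructed points plus the initial and final energy.
    """
    target = distance_histogram(observed, bins)
    low, high = np.zeros(2), np.asarray(window, dtype=float)
    current = rng.uniform(low, high, size=observed.shape)  # random start
    energy = initial = np.sum((distance_histogram(current, bins) - target) ** 2)
    for _ in range(n_iter):
        candidate = current.copy()
        i = rng.integers(len(candidate))
        candidate[i] = rng.uniform(low, high)  # move one point at random
        new_energy = np.sum((distance_histogram(candidate, bins) - target) ** 2)
        if new_energy < energy:  # accept only improvements
            current, energy = candidate, new_energy
    return current, initial, energy
```

Repeating the reconstruction with different random seeds yields an ensemble of plot networks that share the clustering of the observed network but differ in their exact locations.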

Determining the minimum sample size
For each map and each sample size, n, we calculated the sampling probability, P_n, which quantifies how often the mean of a sample matches the mean of the underlying "true" biomass distribution (within a given accuracy), as the relative frequency out of 1000 sample sets. For each sample set, the mean biomass (X_{i,n} in t ha⁻¹) was estimated, where i is the sample set number and n is the sample size. X_{i,n} was then compared with the "true" mean biomass, µ (t ha⁻¹), of the underlying biomass map. A sample set was considered accurate if X_{i,n} was within µ ± 10 %. The sampling performance can therefore be assessed as follows:

$$P_n = \frac{1}{1000} \sum_{i=1}^{1000} \mathbb{1}\left( \left| X_{i,n} - \mu \right| \le 0.1\,\mu \right), \qquad (1)$$

where 1(·) equals 1 if the condition holds and 0 otherwise. P_n typically increases with the sample size from 0 (no sample set represents the mean biomass) to 1 (all sample sets represent the mean biomass). We defined n_min as the minimum sample size, n, at which P_n reaches 90 %. The minimum sampling area, a_min, is calculated by multiplying the number of plots, n_min, by the plot size.
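A Monte Carlo estimate of P_n and n_min following these definitions might look like the Python sketch below. It assumes that the map has already been tiled into plots whose biomass values are stored in a 1-D array, so the "true" mean µ is the mean over all plots; the function names are hypothetical. Because P_n is estimated from a finite number of sample sets, the simple upward search for n_min can stop slightly early or late by chance.

```python
import numpy as np


def sampling_probability(plot_values, n, n_sets=1000, tol=0.10, rng=None):
    """Monte Carlo estimate of P_n: the share of n-plot sample sets whose
    mean lies within +/- tol of the true map mean mu."""
    rng = np.random.default_rng() if rng is None else rng
    plot_values = np.asarray(plot_values, dtype=float)
    mu = plot_values.mean()  # "true" mean biomass of the map
    hits = 0
    for _ in range(n_sets):
        # random plot positions, drawn without replacement
        sample = rng.choice(plot_values, size=n, replace=False)
        if abs(sample.mean() - mu) <= tol * mu:
            hits += 1
    return hits / n_sets


def minimum_sample_size(plot_values, target=0.90, n_max=None, **kwargs):
    """Smallest n whose estimated P_n reaches the target (90 % by default)."""
    n_max = len(plot_values) if n_max is None else n_max
    for n in range(1, n_max + 1):
        if sampling_probability(plot_values, n, **kwargs) >= target:
            return n
    return None  # target not reached within n_max plots
```

Multiplying the returned n_min by the plot area gives a_min.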

Local scale (Barro Colorado Island)
The analysis of the 50 ha biomass map (BCI) shows the expected result that samples with larger plot sizes produce more accurate biomass estimates (Fig. 3a). For instance, a randomly chosen 0.01 ha plot has a probability (P_n) of 5 % of representing the mean biomass of the whole BCI forest, but if the plot has an area of 1 ha, P_n reaches 40 %. The plot size also affects the minimum number of plots (n_min) required for reliable biomass estimates (i.e., estimates that have at least a 90 % chance of meeting the mean biomass of the original biomass map). For small plots (plot size ≤ 0.04 ha), n_min increases markedly (Fig. 3b): while only 11 one-hectare plots are needed to estimate the biomass, 176 plots are needed if the plot size is 0.04 ha (20 m × 20 m). However, the minimum total area of the samples (a_min) remains similar for plots of 0.25 ha and smaller (Table 1, BCI); i.e., sampling performance is the same whether the samples are taken from 29 plots of 0.25 ha each or 746 plots of 0.01 ha each, as about 7 ha is sampled in both scenarios. Therefore, the most efficient sampling strategy at the 50 ha scale would use 0.25 ha plots: larger plots would require a greater total sampling area (a_min), whereas smaller plots would simply increase the number of plots.
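The a_min values follow directly from a_min = A_plot × n_min; the short Python check below recomputes them from the BCI numbers quoted in this paragraph only (the remaining Table 1 entries are not reproduced here).

```python
# n_min values quoted for BCI in the text, keyed by plot size (ha)
n_min_bci = {1.0: 11, 0.25: 29, 0.04: 176, 0.01: 746}

# a_min = A_plot * n_min: total area that has to be inventoried (ha)
a_min_bci = {size: size * n for size, n in n_min_bci.items()}

for size, area in a_min_bci.items():
    print(f"{size:>5} ha plots: n_min = {n_min_bci[size]:>3}, a_min = {area:.2f} ha")
```

Plots of 0.01-0.25 ha all require about 7 ha in total, whereas 1 ha plots require 11 ha, which is why 0.25 ha plots are the most efficient choice at this scale.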

Regional scale (Panama)
Analyzing the biomass map of Panama (50 000 km²) with plot sizes between 1 and 25 ha (Table 1, Panama), we found that the minimum sample size ranges between 70 and 74 plots. In contrast to the BCI analysis, plot size has no marked influence on the minimum sample size. However, the total sampling area (a_min) increases from 70 to 1850 ha with increasing plot size. The most efficient sampling strategy at this scale is therefore to sample 70 plots of 1 ha each.

Continental scale (South America)
We found that the required number of samples does not depend on the total forest area of the biomes when samples are chosen randomly (Fig. 4). Mangrove forests (90 000 km², requiring a sample of 100 plots) are the least abundant biome in South America but require a similar number of samples to dry broadleaf forest (2 × 10⁶ km², requiring a sample of 104 plots). Furthermore, the minimum sample size does not increase compared to the Panama analysis (Table 1); for example, the required number of plots for South American moist broadleaf forest (46 plots at 500 m resolution) is even lower than for the Panama forest (74 plots at 500 m resolution, mainly consisting of moist broadleaf forest).
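One way to see why n_min barely depends on forest area under random sampling is the textbook sample size formula derived from the same normal approximation: n_0 = (z · CV / 0.1)², with z ≈ 1.645 for P_n = 90 %, reduced by a finite population correction that only matters when the number of candidate plots N is small. The Python sketch below assumes an illustrative CV of 0.5 and hypothetical values of N; it is not a result of this study, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_min_clt(cv, N, tol=0.10, p=0.90):
    """Approximate minimum number of random plots (normal approximation).

    cv: coefficient of variation of plot biomass; N: number of candidate plots
    """
    z = NormalDist().inv_cdf(0.5 + p / 2)  # two-sided: about 1.645 for p = 0.90
    n0 = (z * cv / tol) ** 2  # sample size for an infinite population
    return math.ceil(n0 / (1 + (n0 - 1) / N))  # finite population correction
```

With CV = 0.5, any large map (for example N = 5 × 10⁶ or 4.4 × 10⁷ candidate plots) gives the same n_min of 68, whereas a map with only 200 candidate plots needs fewer: the variability of plot biomass, not the forest area, sets the sample size.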
For the whole of the South American tropical forest (11 × 10⁶ km²), 74 plots of 25 ha are necessary to estimate the mean biomass with sufficient accuracy (Table 1, South America (500 m); for Africa and Southeast Asia, see Table S1 in the Supplement). This corresponds to a total sampling area (a_min) of about 18.5 km².
Using the downscaling approach D1 (see Sect. S3 for details), we found that about 70 one-hectare plots would be necessary to estimate the mean biomass of the South American tropical forest (Table 1, South America (100 m)). If we assume a much higher variation of biomass values than observed in the original map, the number of necessary 1 ha plots increases to 121.

Figure 3. Analysis of different random sampling strategies for the Barro Colorado Island forest (BCI, 50 ha). (a) Analytical results showing the number of plots and the probability (P_n) that the mean biomass of those plots reflects the mean biomass of the forest (for details, see Methods). We consider strategies using 0.01-1 ha plots (plot size, represented by line colors). The upper boundary (grey) marks sample sizes with at least a 90 % chance of meeting the mean biomass of the original biomass map. (b) Necessary number of plots, n_min, to estimate the biomass reliably (minimum sample size for which P_n ≥ 90 %). Shapes above the bars represent the necessary sampling area, a_min = A_plot × n_min.

Table 1. Analyzed forest biomass maps and the corresponding minimum sample size. The forest biomass maps for South America (11 000 000 km²; Baccini et al., 2012;Dinerstein et al., 2017), Panama (50 000 km²; Asner et al., 2013), and Barro Colorado Island (50 ha; Condit et al., 2012) and their random sampling performance are shown. Different resolutions of the maps led to different results. The minimum sample size refers to the necessary number of plots to accurately estimate the observed mean biomass of the forest (the mean of the samples does not deviate by more than 10 % from the observed mean biomass with a probability of at least 90 %). The last column shows the necessary sampling area, a_min = A_plot × n_min.


Transect sampling
The performance of nonrandom strategies was related to the spatial characteristics of the maps (Sect. S4, Fig. S3 in the Supplement). When the spatial clustering of the BCI forest biomass map is analyzed at the 50 m scale, the spatial biomass distribution is comparable to a random configuration; thus, the sampling design has no influence on the results for this local forest area. For Panama and South America, the biomass is distributed such that similar biomass values are more likely to be close to each other, which leads to biased estimates of the mean biomass if the samples are close to each other (e.g., transects with distances of 0.5 km between the plots). This results in differences between random sampling and transect sampling (Fig. 5a): compared to random sampling, transect samples show a lower probability (P_n) of estimating the mean biomass of the forest accurately, regardless of the sample size. For Panama, random samples based on 100 one-hectare plots reach P_n = 95 %, whereas transect samples reach less than 60 % (Fig. 5a).
The results show that, if the distances between the plots increase from 0.5 to 5 km, about 80 % fewer plots are necessary for accurate estimates. Larger distances between measurements within one transect make the strategy "more random", so it performs better. With distances of 5 km, Panama requires a total sampling area of 550 ha (instead of 70 ha with random sampling) to estimate the biomass of the 50 000 km² forest with sufficient precision (Fig. 5b). In summary, even with large distances between plots, transect sampling requires a greater sampling effort than random sampling.
For South America (11 × 10⁶ km²), transect sampling based on 100 plots (25 ha plot size) yields a probability, P_n, of less than 40 % (Fig. S4). With distances of 5 km, the minimum sample size increases by a factor of 140 compared to random sampling (Fig. S4), leading to a total sample area of about 2500 km².
Stratification into forest biomes does not markedly reduce the overall number of sample plots needed, since the sum of the plots needed for all single biomes (32 200 in total; Fig. 5d) is similar to the number needed for sampling the forest as a whole (36 000 plots; Fig. S5). However, in contrast to random sampling, the area of each biome affects the sampling effort: the large broadleaf biomes need about 10 times more transect samples than coniferous or mangrove forests.
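Global Moran's I is one standard way to quantify whether similar biomass values lie close together, as found here for Panama and South America; we use it only as an illustration, since the spatial analysis in Sect. S4 may rely on other statistics. The Python sketch below assumes a gap-free raster (no NaN cells) and binary rook-neighbour weights; the function name is hypothetical. Values near +1 indicate strong positive autocorrelation, values near 0 an approximately random arrangement, and negative values a checkerboard-like alternation.

```python
import numpy as np


def morans_i_rook(grid):
    """Global Moran's I for a complete 2-D grid with binary rook weights."""
    z = grid - grid.mean()  # deviations from the map mean
    # products of deviations for horizontally and vertically adjacent cells
    horiz = (z[:, :-1] * z[:, 1:]).sum()
    vert = (z[:-1, :] * z[1:, :]).sum()
    numerator = 2 * (horiz + vert)  # each neighbour pair counted both ways
    weight_sum = 2 * (z[:, :-1].size + z[:-1, :].size)  # S0, sum of weights
    return (grid.size / weight_sum) * numerator / (z ** 2).sum()
```

A smooth biomass gradient yields a clearly positive value, which is the situation in which closely spaced transect plots carry redundant information.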

Clustered sampling
Samples based on forest inventory plots are often influenced by accessibility, which leads to nonrandom plot locations; these are simulated under the clustered sampling approach. Here, we examine the biomass map of South America with the reconstructed point patterns PP1-PP4, based on the locations of existing inventories in South America (23-167 plots; see Methods). The results show that the probability (P_n) of estimating forest biomass accurately is considerably lower than for random samples (Fig. 6). All samples have less than a 45 % chance of reflecting the real mean biomass of South America. For clustered sampling, a greater number of samples does not per se lead to better biomass estimates; the positions of the plots therefore play a crucial role. Although PP4 combines many plots of PP1 and PP3, the stochastic sampling scheme based on the spatial aggregation of its plots does not capture the biomass distribution substantially better than those based on the single datasets alone. In summary, the simulation results demonstrate that nonrandom strategies such as transect sampling and clustered sampling differ considerably from random sampling, leading to increased sampling efforts and noticeably greater sampling uncertainties.

Discussion
Due to the large area of tropical forest, only a few parts of the forest can be investigated in detail. Effective sampling strategies for these forests are therefore important (Broich et al., 2009;Chave et al., 2004;Malhi et al., 2006;Marvin et al., 2014). The question of how many forest plots are necessary to predict forest biomass has not yet been fully answered. Thus far, sampling quality has often been determined on the assumption that samples are spatially randomly distributed (Fisher et al., 2008;Keller et al., 2015;Marvin et al., 2014). However, sampling at large scales in the tropics often does not fulfill this condition because, in many cases, random locations are difficult to access (Wang et al., 2012). In this study, we compared different sampling strategies for tropical forests across various scales and plot sizes, examining the probability of obtaining the correct biomass estimate and the associated minimum sample size. To this end, we analyzed random samples and compared them with simulated samples that are spatially nonrandom (transects and clustered networks). Please note that in this study we did not consider additional error sources, for instance those due to tree size measurements or allometric models, even though they are also known to influence biomass estimates.

Random sampling
Focusing on forests in South America, we showed that, independent of forest area, fewer than 100 randomly distributed 1 ha plots are necessary to estimate the mean biomass with sufficient precision. This result is in line with a study by Marvin et al. (2014) that predicted minimum sample sizes between 10 and 100 plots for forest regions in Peru (1-10 km²).
By testing plot sizes between 0.01 and 1 ha, we demonstrated that inventory plots should not be smaller than 0.25 ha, because the biomass of smaller plots is considerably more variable (reflected by a large increase in the coefficient of variation (CV)), which leads to a noticeably greater number of necessary sample plots. Although the CV of the biomass distribution increases with decreasing plot size for local forests (Réjou-Méchain et al., 2014;Wagner et al., 2010), there seem to be only small effects for larger landscapes. For Panama, we even found that the biomass distributions of the aggregated maps were more heterogeneous because forest and non-forest areas were averaged together. To estimate the minimum sample size for a particular forest region, it might be useful to explore biomass variability, for example, by using forest models (Zurell et al., 2010) or topography (Réjou-Méchain et al., 2014).
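The effect of plot size on the CV can be explored by aggregating a fine map into progressively larger plots. In the Python sketch below, the synthetic map is spatially uncorrelated (an assumption made for illustration), so the CV falls roughly with the square root of the number of cells per plot; in real, autocorrelated landscapes it falls more slowly, which is consistent with the small effects we found at larger scales. The function name is hypothetical.

```python
import numpy as np


def cv_by_plot_size(fine_map, factors):
    """CV of plot-mean biomass for square plots of factor x factor cells."""
    result = {}
    for f in factors:
        rows = (fine_map.shape[0] // f) * f  # drop incomplete edge plots
        cols = (fine_map.shape[1] // f) * f
        plots = fine_map[:rows, :cols].reshape(rows // f, f, cols // f, f)
        plot_means = plots.mean(axis=(1, 3))
        result[f] = plot_means.std() / plot_means.mean()
    return result
```

Since the required sample size scales with the squared CV, even a moderate drop in CV with plot size translates into a large drop in n_min.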
For large areas (tropical forests in South America, Africa, and Southeast Asia), we obtained minimum sample sizes of 74-103 plots (randomly distributed, 25 ha each) on each continent. We also tested larger plot sizes with a biomass map from Saatchi et al. (2011), but the results were similar (75-136 plots, 100 ha each). Furthermore, we tested smaller plot sizes by downscaling the South American biomass map to 100 m using relationships derived from the Panama forest biomass map (50 000 km² forest area). The analysis indicated that 70 randomly distributed 1 ha plots are sufficient for biomass estimates in South America at large scales (Table 1).

Figure 6. Clustered sampling of biomass in South America. We tested different clustered sampling strategies using reconstructed point patterns based on the locations of existing field plots in South America (PP1-PP4). The simulation was performed with the South America map at a resolution of 500 m (25 ha plot size). Results show the probability (P_n) of accurate sampling for the spatial clustering of each point pattern (black crosses; "accurate" means that the mean biomass of the sample does not deviate by more than 10 % from the mean biomass of the original map). The upper boundary (grey) marks sample sizes with at least a 90 % chance of estimating biomass accurately. As a reference, the results for random sampling are shown (red line).
Some remote-sensing-based biomass maps can miss empirically measured biomass patterns in tropical forests. Biomass maps might not capture the high variability of biomass in forests because they do not include fine-scale variation and saturate at high biomass values (Lu, 2006;Sellers, 1985). We addressed the issue of missing fine-scale variation by constructing an additional biomass map at 100 m resolution with much higher variation in biomass values than observed in the biomass map used. In this case, the number of 1 ha plots necessary for continental estimates of the South American tropical forest increased to 121 (instead of 70). Please note that we tested a simple downscaling procedure, so these initial findings should be interpreted with caution. In summary, we found that a higher variation in biomass values leads to a higher sampling effort (see Fig. S1).

Nonrandom sampling
Our analysis showed that sampling efforts change considerably if samples are not random in space. For South America, nonrandom samples of forests are less reliable and require substantially more plots to achieve accurate biomass estimates. This means that the necessary number of plots for nonrandom sampling strategies (as found in real-world inventories) cannot be assessed by Monte Carlo simulations that implicitly assume random samples (as in related studies, e.g., Chave et al., 2004;Fisher et al., 2008;Keller et al., 2015;Marvin et al., 2014). Instead, simulation procedures need to incorporate more advanced methods that account for aggregated plot placement.
We demonstrated that spatial autocorrelation affects sampling performance (Legendre and Fortin, 1989;Réjou-Méchain et al., 2014) if plots close to each other are more similar than plots located farther apart (positive autocorrelation). The results suggest that for larger regions, biomass tends to be more spatially clustered (e.g., high forest biomass occurs more frequently within the Amazon basin than in the surrounding landscape) because biomass varies along environmental gradients and for geographical reasons (Houghton et al., 2009). Therefore, the uncertainty of large-scale estimates might be more affected by the sampling design than that of local-scale estimates. However, biomass in small forested regions can also be spatially correlated (e.g., due to management or topography), so biases cannot be excluded whenever the sampling design is not random.
The sampling performance of current plot networks depends on an interplay of clustering and scattering of the inventory plots in certain areas. For instance, if inventory plots are more likely to be located in densely grown mature forests, the biomass is overestimated because the remaining plots cannot compensate for this bias ("majestic forest bias"; Malhi et al., 2002). We see high potential in point pattern analysis for determining critical levels of aggregation that can bias sampling estimates. Current plot networks might improve their sampling performance by combining subsamples of aggregated plot clusters while adding plots in poorly sampled regions.
Existing plot networks may provide better estimates than our "blind" sampling simulations suggest if additional information (e.g., climate and soil covariates) is used. Such covariates can be used, for example, to define weighting factors that improve the estimation of mean biomass. To analyze this issue, the pattern reconstruction approach used in this study could include additional criteria (ideally those also used for selecting the plots). If the covariates can be mapped over the entire study area, the pattern reconstruction approach can take the additional constraints into account and reject plot configurations that do not agree with these criteria.
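As one illustration of such weighting, a post-stratified estimator weights each stratum's sample mean by the stratum's mapped area share instead of by its number of plots. This is a standard survey-sampling technique that we swap in for illustration, not a method applied in this study; the function name, stratum labels, and areas below are hypothetical.

```python
import numpy as np


def post_stratified_mean(values, strata, stratum_area):
    """Area-weighted mean of per-stratum sample means.

    values: biomass of each sampled plot (t/ha)
    strata: stratum label of each plot (e.g., biome)
    stratum_area: dict label -> mapped area of the stratum (any unit)
    """
    values = np.asarray(values, dtype=float)
    strata = np.asarray(strata)
    total_area = sum(stratum_area.values())
    estimate = 0.0
    for label, area in stratum_area.items():
        in_stratum = strata == label
        if not in_stratum.any():
            raise ValueError(f"no sampled plot in stratum {label!r}")
        estimate += (area / total_area) * values[in_stratum].mean()
    return estimate
```

For example, two moist-forest plots (300 and 320 t ha⁻¹) and one dry-forest plot (100 t ha⁻¹) with area shares of 0.8 and 0.2 give 0.8 × 310 + 0.2 × 100 = 268 t ha⁻¹, compared with an unweighted mean of 240 t ha⁻¹. The estimator only helps if every stratum contains at least one plot, which is exactly where clustered networks can fall short.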

Conclusions
In summary, our study shows that the accuracy of biomass estimates derived from samples depends considerably on the sampling strategy. Inventories are highly relevant for studying forest structure and dynamics. For South America, we have shown that more spatially randomly distributed plots are beneficial for continent-wide biomass estimations. For a given sampled area, plot size should not fall below 0.25 ha, as the variability of biomass values will strongly increase (Chave et al., 2003;Clark et al., 2001;Keller et al., 2015;Réjou-Méchain et al., 2014) and tree-level measurement errors can dominate.
It is challenging to establish forest plots randomly across South America. On the one hand, mature tropical forests have high tree densities (Crowther et al., 2015), so measurements are more labor intensive. On the other hand, random plot locations may lead to large distances between the plots (Wang et al., 2012), making them more difficult to access and also resulting in higher efforts and costs.
Some studies combine field inventories with remote sensing data to estimate the biomass of large regions (Asner et al., 2013;Baccini et al., 2012;Rödig et al., 2017;Saatchi et al., 2011), as remote sensing can sample forest regions in a short time (Houghton et al., 2009;Schimel et al., 2015). The transect sampling results shown here could also inform airborne remote sensing campaigns that fly in straight lines over forest transects (e.g., comparable to Asner et al., 2013). The methods presented can be applied to any spatially clustered sampling technique. The sampling design matters not only for forest biomass estimations but also for other forest attributes (e.g., production). This should be considered when establishing forest plot networks.
Code and data availability. Biomass data of BCI (Condit et al., 2012), Panama (Asner et al., 2013), and the tropics (Baccini et al., 2012) are available in the corresponding references. The R code for sampling simulations is available upon request from the corresponding author.
Author contributions. JH, RF, and AH conceptualized the research; JH prepared the data and ran analyses. TW supported point pattern analysis. HJD contributed to analytical solutions. JH, RF, and AH prepared the first draft of the paper, and all the co-authors contributed substantially to subsequent versions, including the final draft.