Improving maps of forest aboveground biomass : A 1 combined approach using machine learning with a 2 spatial statistical model 3

1 Key Laboratory of Urban Environment and Health, Key Laboratory of Urban Metabolism of Xiamen, 7 Institute of Urban Environment, Chinese Academy of Sciences, CN 361021, China 8 2 University of Chinese Academy of Sciences, CN 100049, China 9 3 CSIRO, Waite Campus, Urrbrae, SA 5064, Australia 10 4 State Key Laboratory of Resources and Environmental Information System, Institute of Geographic 11 Sciences and Natural Resources Research, Chinese Academy of Sciences, CN 100049, China 12 5 Ningbo Urban Environment Observation and Research Station-NUEORS, Chinese Academy of 13 Sciences, CN 315800, China 14 6 Department of Geography, University of Hawai'i at Mānoa, Honolulu, HI 96822, USA 15 7 Department of Earth and Environmental Sciences, University of British Columbia, Kelowna, BC V1V 16 1V7, Canada 17 *These authors contributed equally to this work. 18

. Workflow for screening an optimal model. 150

Selection of variables and analysis of resulting spatial distribution 151
To create the plot-level model, we first identified predictor variables. Based on our previous work (Ren 152 et al., 2017), we selected plot-level environmental covariates including longitude and altitude, and forest 153 attribute variables including forest distribution density, DBH, H, tree stem volume, and forest age. 154 Pearson's correlation coefficient was used to investigate the correlation between these variables and the 155 true AGB of sample plots. 156 We then analyzed the spatial autocorrelation and spatial heterogeneity of AGB data from the selected 157 sample plots. We used Moran's I (Cliff and Ord, 1981), a commonly used global spatial autocorrelation 158 index, to evaluate spatial autocorrelation between the true AGBs of sample plots. The spatial stratified 159 heterogeneity (which refers to the within-strata variance being less than the between-strata variance; it 160 is ubiquitous in ecological phenomena, such as AGB) of the true AGB of sample plots was evaluated 161 by using a q-statistic generated by applying the GeogDetector model, which is a software tool proposed 162 by Wang et al. (2016) that analyzes spatial variation of the geographical strata of variables. First, we 163 used the K-means algorithm to obtain the strata of true AGB for preprocessing by GeogDetector. Next, 164 https://doi.org/10.5194/bg-2020-36 Preprint. Discussion started: 25 February 2020 c Author(s) 2020. CC BY 4.0 License.
we regarded the true AGB as Y, the strata of true AGB as X, and put them into the GeogDetector 165 model to obtain the q-statistic (Wang et al., 2010;Wang et al., 2016). 166

Split datasets 167
We used the leave-one-out cross-validation method to split the 30 sample plots into 30 sets, with each set 168 containing two groups of data: (1) validation data (the AGB of one plot) and (2) training data (the AGBs 169 and predictor variables of the other 29 plots), see Table B.3. The leave-one-out cross-validation method 170 assumes that, in a dataset containing n samples, each sample serves as a test sample with the other n − 1 171 samples serving as training samples. Thus, with n iterations, we can obtain n training datasets and n 172 validation datasets. 173

Model training 174
Seven models including three machine learning models [Figs. 3(a-3(c)], one spatial statistical model 175 The spatial statistical model (P-BSHADE) required AGB-related variables (reference series). In this 179 case study, we used the reference-plot AGB data as the variables. The allometric model (Qiu et al.,180 2018) was applied to obtain the AGB of each tree in each sample plot. Next, the reference-plot AGB 181 data consisted of the sum of the AGB of each tree. This method produces the P-BSHADE model shown 182 in Fig. 3(d). For the combined machine learning and spatial statistical models, the reference plot AGB 183 data in P-BSHADE were obtained from the results of the SVM [ Fig. 3(a)   and the three models that combine machine learning with the P-BSHADE model (a+d, b+d, c+d). 192

193
(1) Machine learning 194 SVM is a method of supervised learning in machine learning and is often used to solve classification 195 problems. The basic principle of SVM is to find a hyperplane in the feature space and separate the 196 positive and negative samples with the minimum misclassification rate (Hearst et al., 1998). RBF-ANN 197 is a three-layer neural network model, which includes an input layer, a hidden layer, and an output layer. 198 The transformation from input space to hidden space is nonlinear, whereas the transformation from 199 hidden space to output space is linear. The function of the hidden layer is to map the vector from the 200 indivisible low-dimensional linear state to the separable high-dimensional linear state, so as to greatly 201 https://doi.org/10.5194/bg-2020-36 Preprint. Discussion started: 25 February 2020 c Author(s) 2020. CC BY 4.0 License. accelerate the learning and convergence speed and avoid getting stuck in a local optimum (Elanayar and 202 Shin, 1994;Xia and Xiu, 2007). RF is a combination of tree predictors such that each tree depends on 203 the values of a random vector sampled independently and with the same distribution for all trees in the 204 forest. RF is an effective tool in prediction. Because of the Law of Large Numbers, RF does not overfit. 205 Injecting the right type of randomness means that RF makes accurate classifiers and regressors (Breiman, 206 2001). 207 The schematic function for machine learning is 208 y = ( , , , , , , , ) (1) 209 where y is the AGB of the th sample plot predicted by a machine learning model, (… ) is a machine 210 learning model represented by a function of , ( = 1, … ,4); and , , , , , , and , are the 211 central longitude, the mean DBH, the mean H, and the forest age of the th sample plot, respectively. A 212 specific description of the three machine learning models is given in section S1 of the Supplementary 213

Material. 214
(2) Spatial statistical model: P-BSHADE 215 P-BSHADE is an optimal linear unbiased estimation interpolation method based on the assumption of 216 the simultaneous existence of the spatial autocorrelation and heterogeneity of the target object. We use 217 it here to solve the problem of an unrepresentative sample imposed by the spatial location of a 218 convenient sample at the plot level. 219 The core of the model is to minimize the variances between predicted error and unbiased estimation. 220 The prediction process of the P-BSHADE model requires strong spatio-temporal coordination between 221 the predictive variable (forest AGB of target plots) and the reference series (reference forest AGB of 222 target plots), so as to realize the spatial interpolation of the predictive variable. The model is also a data 223 fusion approach that combines the observed samples with the reference series (related variable). 224 P-BSHADE is markedly different from the Kriging and Inverse Distance Weighting (IDW) algorithms. 225 Compared with Kriging and IDW, the application of P-BSHADE to forest AGB interpolation has 226 obvious advantages. The spatial distribution of forest AGB is also characterized by spatial 227 autocorrelation and heterogeneity, which have been taken into account in the P-BSHADE model.

a. Objective 241
The objective is to interpolate the AGB data of the target sample plot by using data acquired from other 242 sample plots. A theoretical description is 243 where is the AGB of the th sample plot estimated by the P-BSHADE model ( = 1 − 30, = 245

30);
is the true AGB of the th sample plot ( = 1 − 30, = 30); is the weight (contribution) 246 of the true AGB of the th sample plot to the AGB to be interpolated of the th sample plot (when = 247 1, = 2, 3, … , 30; when = 1, = 1, 3, 5, … ,30); is calculated by the true AGB of the -th 248 sample plot and the allometric model estimation of the AGB in the -th sample plot. 249 As expected, the estimates of the two properties in Eq. (2) are unbiased: 250 Minimum estimation variance is expressed as 252 where is the statistical expectation. The ratio between data from the target sample plot to those from other sample plots is one of the most 256 important inputs for estimating the ABG of the target sample plot and is an index of heterogeneity in 257 the AGB spatial distribution. The relationship between data from the target sample plot and from the 258 other sample plots is expressed as 259 In most cases, the AGB of any two plots are not equal, and the relationship between them can be 261 further expressed as the relative bias between the mathematical expectation of and . The main challenge in estimation is finding the weights that satisfy the unbiased condition and 270 that minimize estimation variance: 271 These weights can be calculated by minimizing the estimation variance and taking unbiasedness into 274 where is a Lagrange multiplier. The minimized variance in the estimation error can then be written 278 https://doi.org/10.5194/bg-2020-36 Preprint. Discussion started: 25 February 2020 c Author(s) 2020. CC BY 4.0 License.

281
The P-BSHADE model is a geospatial model because it has the following characteristics: 282 1. The P-BSHADE model is mainly based on the assumptions of spatial autocorrelation and spatial 283 heterogeneity of forest AGB. Therefore, before using P-BSHADE, we first applied the statistical test of 284 these two theoretical hypotheses (spatial autocorrelation test and spatial differentiation test) for forest 285

AGB. 286
2. The prediction process of the P-BSHADE model requires strong spatio-temporal coordination 287 between the predictive variable (forest AGB of target plots) and the reference sequence (reference 288 forest AGB of target plots), so as to spatially interpolate the predictive variable. 289

P-BSHADE is an optimal linear unbiased estimation interpolation method that considers temporal 290
and spatial heterogeneity. Spatial autocorrelation and heterogeneity of AGB data can be added into the 291 model based on prior knowledge (reference AGB data), following which the linear unbiased optimal 292 estimation of the target-plot AGB can be obtained by correcting data from a convenient sample plot. 293 Specifically, for example, the ratio of data from the target sample plot to that from other sample plots is 294 used [see 2.4.3(2)b section]. In the P-BSHADE model, this ratio plays a very important role in 295 estimating the forest AGB of the target plots. This ratio is a manifestation of the spatial heterogeneity 296 of AGB data. P-BSHADE takes into account the reality of the spatial distribution of AGB data and 297 emphasizes that the spatial distribution of AGB data is heterogeneous. 298 (3) Combination of machine learning and spatial statistical models 299 Considering the inherent advantages and disadvantages of P-BSHADE and machine learning, this study 300 investigates whether their combination can improve the accuracy of forest AGB estimates. Therefore, where is the predictive value of the different models, is the AGB of the th sample plot, and 324 is the number of training datasets. 325 We then used the calculated MAE, MRE, RMSE, and nRMSE to identify the optimal model. 326

Robustness of combined models 327
To evaluate the robustness of the combined machine learning and spatial statistical models, we selected 328 22 independent sample plots (see details in S1 and S3 of the Supplementary Material) and made 329 nondestructive measurements of each tree in July 2019. We repeated the workflow used for 330 constructing the plot-level model and evaluated the models. We then evaluated whether the combined 331 https://doi.org/10.5194/bg-2020-36 Preprint. Discussion started: 25 February 2020 c Author(s) 2020. CC BY 4.0 License. models produced higher accuracy than the plot-level models by using the accuracy-assessment indexes 332 (MAE, MRE, RMSE, and nRMSE). 333

Model application and upscaling 334
We treated the irregular polygon forest patches (2980 patches) of the Forest Management and Planning 335 Inventory (FMPI) as a homogenous sample plot and used the optimal plot-level model to upscale forest 336 AGB (see section S1 of the Supplementary Material). We then compared the upscaled forest AGB with 337 the AGB map obtained from the allometric model and calculated the MRE of AGB between the two 338 methods (see Eq. A.15 in section S1 of the Supplementary Material).

Spatial autocorrelation test 362
The spatial distribution of the true AGBs of the 30 sample plots displayed a pattern of aggregation (see 363 red regions in Fig. C.1, section S3

Spatial heterogeneity test 371
As shown in Table 1, the true AGBs of the sample plots were divided into three strata by using k-means 372 clustering. We then ran the GeogDetector model and obtained a value of 0.87 and a value less 373 than 0.01. These results indicate that the within-layer variances were far less than the sum of variances 374 among different strata. The results also suggest that the reference AGBs of the 30 sample plots were 375 associated with obvious spatially stratified heterogeneity. 376

Performance of plot-level models 377
We developed seven models for estimating AGB: three machine learning models (SVM, RBF-ANN, 378 and RF), one spatial statistical model (P-BSHADE), and three combined models that integrated each  We also explored the relationship between the observed and predicted AGBs in terms of 411 cross-validation results (Fig. 6). The quantity R 2 was calculated for the linear regression model applied 412 to the observed and predicted AGBs; R 2 for every model was greater than 0.9. Although P-BSHADE 413 had the highest R 2 , its distribution of dots in Fig. 6(d) differed quite significantly from the 1:1 line. Of 414 the seven models, the accuracy of RF & P-BSHADE was the highest and the distribution of dots in Fig.  415 6(g) was closest to the 1:1 line. Therefore, we concluded that RF & P-BSHADE was the optimal 416 model. We compared three machine learning methods with three corresponding combined machine learning 423 and spatial statistical methods by using differences in MAE, MRE, RMSE, and nRMSE during two 424 periods, 2012 and 2019 (Fig. 7). The results suggest that the combined models improved the accuracy 425 of single machine learning models during both years. This suggests that the combined methods are

Significance of the optimal AGB model at the plot-level 453
In the past, ecologists converted AGB estimates from forest sample plots into regional AGB estimates 454 by scaling up from the tree-level to the regional scale (Malhi et al., 2004). Plot-level AGB models 455 therefore link tree-level AGB models to regional-scale AGB models. Research by Chen et al. (2015)

Why a combined model outperforms a single machine learning or spatial statistical model 487
As expected, the prediction accuracies of the combined methods were higher than those of any single 488 method (either machine learning or spatial statistical). This may due to the advantages of machine 489 learning, which can compensate for the inherent defects of the P-BSHADE model, and vice versa. 490 On the one hand, the P-BSHADE model has its own merits: (1) It takes into account the spatial 491 https://doi.org/10.5194/bg-2020-36 Preprint. Discussion started: 25 February 2020 c Author(s) 2020. CC BY 4.0 License. autocorrelation and spatial heterogeneity of the distribution of the target objects, not only to solve the 492 difference between target objects caused by the different terrain or geographical location but also to 493 solve the problem of strong correlation between target objects with remote geographical locations due 494 to similar terrain condition.
(2) The P-BSHADE model calculates the covariance between objects by 495 using a reference sequence between objects (which means the reference AGB data between plots in our 496 study). This method is more reliable because it avoids the second-order stationary hypothesis (i.e., 497 when using the Kriging algorithm, semi-variograms need this hypothesis), which does not correspond 498 with the actual situation. (3) P-BSHADE regards strongly correlated plots as neighboring plots. 499 However, the P-BSHADE model is also handicapped by the fact that the founding assumption does not 500 conform to reality. The assumption is that estimated AGB is accurate in all sampling plots except the 501 target sampling plot. In other words, the premise behind using only the P-BSHADE model is that the 502 reference AGB data is accurate or strongly correlated with AGB. In reality, the AGB of each sampling 503 plot has a varying degree of uncertainty because it is obtained from the allometric model. Since the 504 P-BSHADE model combined with machine learning uses the results optimized by machine learning as 505 the reference series, it further improves the accuracy of AGB mapping. 506 Machine learning also has its advantages and disadvantages. As we described in the previous section 507 (4.2.2), machine learning has the advantage of being able to handle complex, potentially nonlinear 508 relationships between forest AGB and other variables. However, the initial samples of machine 509 learning are randomly selected, which may lead to differences in the results of each operation of the 510 model. In addition, machine learning uses the average value of all regression trees in the calculation, 511 which may result in overestimating the lower value and underestimating the higher value. As opposed 512 to machine learning, the P-BSHADE model takes into account the spatial autocorrelation and spatial 513 heterogeneity of forest AGB and of environmental covariates, and the bias of the observed values of the 514 sampling plots, which corresponds more to actual situations. A combined model takes the result of 515 machine learning as the reference series of P-BSHADE, so that the fitting process of the combined 516 model takes spatial relationships more into account than is the case for the single machine learning al., 2013). However, the methods used in these studies were adopted independently. Conversely, the 521 combination of machine learning and spatial statistics can improve the prediction accuracy of AGB 522 maps, which in turn can be used as criteria for improving the accuracy of LiDAR remote-sensing 523 technology and the results of ecological process models. Eventually, these improvements can promote 524 process-oriented projects that require dynamic AGB predictions for large-scale forests in different 525 forest management scenarios. 526 In addition, we compared the prediction accuracy of AGB mapping obtained by the combined spatial 527 statistical and machine learning models with that reported by recent studies using AGB plot-level 528 models. In the current literature on remote-sensing estimation of forest AGB, nRMSE, RMSE, and R 2 529 were commonly used as indexes for evaluating the prediction performance of models affected by 530 research sample size, data type, and forecasting methods (Fassnacht et al., 2014). In contrast, the 531 present study used four conventional indexes for evaluating prediction performance: nRMSE, RMSE, 532 MAE, and MRE. The criterion for model selection is to choose indexes summarized from sample 533 prediction (such as nRMSE), rather than choosing the goodness-of-fit R 2 (Babcock et al., 2015). Based 534 on calculated nRMSE indexes, the AGB prediction accuracy of the combined RF & P-BSHADE model 535 (11.13%) was higher than that obtained by Babcock et al. (2015) (33.91%) in Colorado, USA. In that 536 study, the authors used a combination of airborne LiDAR, a forest inventory database, and a Bayesian 537 spatial hierarchical framework model and introduced spatial random effects to compensate for the 538 residual spatial dependence and non-stationary model covariates. The AGB prediction accuracy of the 539 method developed in the current work was also greater than that obtained by Ioki et al.

Why RF & P-BSHADE method outperforms other combined methods 559
The three combined machine learning and spatial statistical methods produced more accurate AGB 560 predictions than any individual method. The accuracy of the RF & P-BSHADE and SVM & 561 P-BSHADE methods were significantly higher than that of the individual methods, but the RBF-ANN 562 & P-BSHADE method was only slightly higher. The accuracies of the combined methods depend on 563 the accuracy of the reference series (machine learning predicted result) (Xu et al., 2013). In other words, 564 the higher the accuracy of the predicted machine learning results, the higher the accuracy of the 565 combined method. Therefore, the different improvements offered by the three combined methods may 566 be attributed to the following two mechanisms: (1) the RF and SVM models are easier to use and 567 optimize than RBF-ANN (Raczko and Zagajewski, 2017). RBF-ANN is sensitive to hyper-parameters 568 and usually requires optimized parameters to obtain better fitting results. However, in the present study, 569 we used no optimized algorithms, such as genetic algorithms, to obtain parameters in the machine 570 learning model. Furthermore, the number of training samples determines the number of nodes in the 571 hidden layer of the RBF-ANN model, and the number of nodes significantly affects the prediction 572 accuracy. With only 30 training samples used in this study, the combined approach may have been 573 unable to strongly improve prediction accuracy. (2) RBF-ANN is more suitable for nonlinear stochastic 574 dynamic systems (Elanayar and Shin, 1994), whereas the relationship between AGB and environmental 575 covariates in this study is likely a monotonically increasing function. 576 https://doi.org/10.5194/bg-2020-36 Preprint. Discussion started: 25 February 2020 c Author(s) 2020. CC BY 4.0 License.

Comparing upscaling of RF&P-BSHADE with allometric model 577
We used FMPI data to upscale the optimal plot-level AGB model from plot level to region scale. 578 Because the allometric model offers a fast and simple calculation method, it has been used in many 579 studies as the basis for determining the benchmark map. Nevertheless, spatial heterogeneity caused by 580 multiple environmental covariates is not considered in the allometric model because potential errors in 581 the AGB estimate may be propagated and affect the accuracy of the regional AGB map. Although we 582 regarded the FMPI patches as homogeneous study units in the present study, the area of the forest 583 patches is significantly larger than that of the sample plots. Upscaling results will thus have large 584 uncertainties (see Figs. C.4, S3 of Supplementary Material) . The current study finds 585 that the relative percent difference in total AGB between RF & P-BSHADE and the allometric model 586 was 0.17%. Meanwhile, the relative error (RE) in AGB between the two models ranged from 0.04% to 587 99.8% with a MRE of 19.93%. This suggests that the two methods are similar in terms of overall 588 estimates of AGB but that the local spatial distribution of AGB differs. Differences in AGB spatial 589 distribution have been reported in many studies of AGB maps. Babcock et al. (2015) asserted that the 590 main reasons for the differences in the spatial distribution of AGB maps between different methods 591 include the following: (1) The structural framework of different research methods and schemes cannot 592 truly reflect actual forest growth.
(2) The model is usually a simplification of an ecological process and 593 ignores spatial heterogeneity at the regional scale.
(3) The model does not consider the influence of 594 multiple environmental covariates (vegetation, topography, and others) on forest growth in the region. 595

Conclusions 596
This paper proposes a method to integrate the advantages of machine learning and spatial statistics, 597 different datasets, and multiple environmental covariates to improve the accuracy of plot-level 598 AGB-estimation models. In this study, we explored the prediction performance of different AGB 599 models and found that the model that combines the Random Forest and P-BSHADE models 600 substantially improved estimates of forest AGB. Although data from the sample plots and harvested 601 trees were collected only from Eucalyptus forests in the Nanjing region of China, the proposed model 602 and the associated results can provide references for AGB mapping in other countries and in different 603 types of tropical forests. 604 https://doi.org/10.5194/bg-2020-36 Preprint. Discussion started: 25 February 2020 c Author(s) 2020. CC BY 4.0 License.

Data availability. 605
All data are included in the paper and Supplement. 606

Author Contributions 607
Y.R. designed the study. X.Z. carried out the data collection. S.D. carried out the analyses and 608 visualized the data. X.Z. and S.D. wrote the manuscript with help from Y.R. L.G., C.X., S.Z., Q.C., and 609 X.W. provided technical advice and guidance throughout the project implementation and paper-writing 610 stages. S.D. and X.Z. contributed equally to this work. 611

Competing interests. 612
The authors declare that they have no conflict of interest. 613