Predicting tree heights for biomass estimates in tropical forests – a test from French Guiana

The recent development of REDD+ mechanisms requires reliable estimation of carbon stocks, especially in tropical forests that are particularly threatened by global changes. Even though tree height is a crucial variable for computing aboveground forest biomass (AGB), it is rarely measured in large-scale forest censuses because it requires extra effort. Therefore, tree height has to be predicted with height models. The height and diameter of all trees over 10 cm in diameter were measured in 33 half-hectare plots and 9 one-hectare plots throughout northern French Guiana, an area with substantial climate and environmental gradients. We compared four different model shapes and found that the Michaelis–Menten shape was most appropriate for the tree biomass prediction. Model parameter values were significantly different from one forest plot to another, and this leads to large errors in biomass estimates. Variables from the forest stand structure explained a sufficient part of plot-to-plot variations of the height model parameters to improve the quality of the AGB predictions. In the forest stands dominated by small trees, the trees were found to have rapid height growth for small diameters. In forest stands dominated by larger trees, the trees were found to have the greatest heights for large diameters. The aboveground biomass estimation uncertainty of the forest plots was reduced by the use of the forest structure-based height model. It demonstrated the feasibility and the importance of height modeling in tropical forests for carbon mapping. When the tree heights are not measured in an inventory, they can be predicted with a height–diameter model and incorporating forest structure descriptors may improve the predictions.


Introduction
Tropical forests are an important and dynamic stock of carbon on earth; they account for 40 % of the carbon stored in the earth's vegetation (Gibbs et al., 2007). Accurate estimates of aboveground biomass (AGB) for tropical forests are needed to assess the spatial and temporal variation of these carbon stocks (Houghton et al., 2001). The AGB estimations have direct applications to forest management in light of the recent developments in the carbon market and REDD+ (IPCC, 2000; Gibbs et al., 2007). Though considerable plot measurements are occurring, models used to predict biomass are often rough and need further improvements to lower biases and uncertainties (Houghton et al., 2001; Chave et al., 2005). Today, AGB spatial extrapolation methods mostly rely on remote sensing data (Asner et al., 2010; Saatchi et al., 2011; Baccini et al., 2012). While very promising, these methods still require calibration points from well-known forest plot inventories (Lucas et al., 2002).
Forest census plots typically consist of various measurements of properties of all the individual trees encountered on a given surface. The diameters at breast height (DBH) are always measured, generally starting at 10 cm. Depending on the inventory effort, additional information such as a tree's height or species may be recorded. The AGB of a forest plot is the sum of the AGB of the trees belonging to this plot.
Tree AGB models use biological variables describing a tree to predict its individual AGB (Brown et al., 1989; Brown, 1997; Araujo et al., 1999). The widely used models use the tree DBH, the tree height, and the tree wood density, or wood specific gravity (WSG), to predict the tree's biomass (Chave et al., 2005). Among these variables, the DBH is measured in the field, and the effect of WSG on the plot AGB estimation is unclear for some authors (Molto et al., 2012).
Thus, for AGB prediction, tree height is a key variable that is generally not measured. Therefore, it must be predicted. In boreal forests, classical height models predict a tree height from its DBH for a given species (Sharma and Parton, 2007). However, the biodiversity of tropical regions prevents the use of height models that include a species effect. In the past, various large-scale height-DBH model shapes have been proposed (Huang et al., 1992), but their applications to large-scale tropical forests are rare (Brown et al., 1989; Feldpausch et al., 2011). The general objective of this paper is to explore the possibility of including additional information, such as forest stand structure and environmental variables, into the height-DBH model in order to build a flexible model that can be used for AGB estimations in different landscape contexts.
We used a data set from French Guiana consisting of 42 forest plots. These plot inventories are suitable for AGB assessments (IPCC, 2000). Measurements include the DBH (for trees above 10 cm), height, and species of each tree. The plots are situated in the northern part of French Guiana and were chosen to represent the contrasting landscapes of the region (Ferry et al., 2010; Baraloto et al., 2011; Gond et al., 2011). More specifically, we asked the following questions:
1. Which height-DBH model shape is both robust and convenient to use?
2. Do the height-DBH model parameters vary between sites? If so, do these variations affect the AGB predictions?
3. Can the forest plot stand structures and forest local environment explain the variability of the height-DBH model coefficients?
To reach the objective of creating a height-DBH model for AGB predictions, height models were evaluated on their ability to replace measured heights in the forest plot for AGB predictions. We used a tree AGB model developed in French Guiana. The AGB model used tree height, tree DBH and tree WSG to predict tree fresh AGB. It allowed the uncertainty from height and WSG predictions to be propagated through a Monte Carlo sampling process (Molto et al., 2012). To evaluate the performance of a height-DBH model, we predicted the AGBs of the trees using (1) measured heights and (2) predicted heights. The degradation of the precision of the AGB prediction between (1) and (2) gave us a measure of the performance of the height-DBH model.

French Guiana
The study was conducted in French Guiana. The climate of the region is equatorial, with two main seasons: a dry season from August to mid-November and a rainy season from December to April (often interrupted by a short drier period in March; Wagner et al., 2011). The relief comprises a hill system within a dense hydrographic network. Rainforests cover almost all the study area.

Forest plots
Inventory data came from two projects recently conducted in French Guiana. The sampled plots are typical of the Guiana Shield forests (Terborgh and Andresen, 1998). Dominant plant families include Lecythidaceae (Eschweilera), Caesalpiniaceae (Eperua), Chrysobalanaceae (Licania) and Sapotaceae. The tree species richness (DBH ≥ 10 cm) ranges from 130 to 200 species per hectare (ter Steege et al., 2000). A description of the forest plots is available in Supplement S2. These data represent a total of 9467 trees.
- Inventories from the AMALIN project (Baraloto et al., 2011): 33 plots spread in various landscapes and topographical contexts (ridges, plateaus, and lowlands). DBHs and tree heights were measured by a team of trained experts. Each plot consisted of two nested subplots (details in Baraloto et al., 2011): a 0.1 hectare area (trees with DBH ≥ 10 cm) nested in a 0.5 hectare area (trees with DBH ≥ 20 cm).
- Inventories from the BRIDGE project (Baraloto et al., 2010): 9 one-hectare plots where trees with DBH ≥ 10 cm were measured for DBHs and heights. Heights were measured with laser range finders or ropes when a climber could approach the tops of the trees.

Descriptors of the forest structure
We chose variables commonly used by foresters to describe the stand DBH structure: the basal area (in m² per hectare) and the relative frequencies of four classes of stem size (between 10 cm and 20 cm, 20 cm and 40 cm, 40 cm and 60 cm, and above 60 cm). These descriptors were computed from DBH census data only; thus, they are always available in standard forest inventories. Because they sum up to 1 in each forest plot, the relative frequencies of classes of stem size are not linearly independent. In order to have a variable matrix x_j of full rank (Eq. 8), we dropped the proportion of stems between 20 and 40 cm (Hastie et al., 2009).
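The descriptors above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a single sampling area per plot is assumed (the nested AMALIN design would need per-class area corrections), and the code is not taken from the original analysis.

```python
import math

# DBH class bounds in cm, as listed in the text; the last class is open-ended.
CLASS_BOUNDS = [(10, 20), (20, 40), (40, 60), (60, math.inf)]


def stand_structure(dbh_cm, plot_area_ha):
    """Return (basal area in m2 per ha, relative frequency of each DBH class)."""
    stems = [d for d in dbh_cm if d >= 10]
    # Cross-section of one stem: pi * r^2, with r = DBH / 2 converted from cm to m.
    basal_area = sum(math.pi * (d / 200.0) ** 2 for d in stems) / plot_area_ha
    freqs = [sum(lo <= d < hi for d in stems) / len(stems)
             for lo, hi in CLASS_BOUNDS]
    return basal_area, freqs


def covariate_row(basal_area, freqs):
    """Drop the 20-40 cm class: the four frequencies sum to 1, so keeping
    all of them would make the covariate matrix rank deficient."""
    return [basal_area, freqs[0], freqs[2], freqs[3]]
```

For a toy plot such as `stand_structure([10, 15, 25, 45, 70], 0.5)`, the four class frequencies sum to 1, which is exactly why one class has to be removed before the regression.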

Descriptors of the environment
We chose to work with mainstream, widely available environmental variables. Four of these were computed from a digital terrain model (DTM) with 90 m-sided square cells (NASA SRTM missions). (i) The drained area measures the surface of the hydraulic basin that flows through a cell. A low value indicates cells located close to the limit of two basins, whereas higher values indicate cells located downstream. (ii) The hydraulic altitude was computed from the third-order hydraulic system. The hydraulic altitude of a cell is its altitude above the closest stream of its hydraulic basin. Lower values (including 0) indicate that the forest plot is located in a potentially temporarily flooded area, while higher values indicate that the forest plot is located on a hilltop. (iii) The slope of each cell was computed with a 2-cell lag (180 m). (iv) The terrain ruggedness index (TRI) was computed with a 20-cell lag (1800 m) to catch the difference between flat and more mountainous landscapes. Two environmental variables were computed from the NASA TRMM rainfall data. One was the annual average rainfall in the last 10 years (in mm); the other was a dry season index (DSI), computed as the average number of months with rainfall below 100 mm (Wagner et al., 2012). The DSI quantifies the length of the annual hydraulic stress for trees. All maps and geographical information were computed with SAGA (Bock et al., 2004).
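As an illustration of the two rainfall descriptors, the following minimal sketch computes them from monthly rainfall totals. The input layout (one list of 12 monthly values per year) and the function names are assumptions made for this example; the actual TRMM processing chain is not described in the text.

```python
def dry_season_index(monthly_rain_mm, threshold_mm=100.0):
    """Average number of months per year with rainfall below the threshold."""
    dry_months = [sum(r < threshold_mm for r in year) for year in monthly_rain_mm]
    return sum(dry_months) / len(dry_months)


def mean_annual_rainfall(monthly_rain_mm):
    """Average of the yearly rainfall totals, in mm."""
    totals = [sum(year) for year in monthly_rain_mm]
    return sum(totals) / len(totals)
```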

Height-DBH model shapes
M1 (log-linear, Eq. 1) is a height-DBH model that has already been used for height-DBH modeling (Nogueira et al., 2008). Classically, the error term is additive and normal, but we used a multiplicative lognormal error to better address heteroscedasticity. The model may give negative values for DBH lower than 1, but this is not a problem since the DBH is larger than 10 in standard forest inventories. The model has no horizontal asymptote, but due to the log function, the increase in large DBH values is extremely slow.
M2 (log-log, Eq. 2) is a model that is frequently used in forest ecology (Brown et al., 1989; Feldpausch et al., 2011). However, the existence of factors limiting tree growth in height but not in DBH may lead to questions about its basic assumptions. This model is known for overestimating the height of the large trees (Feldpausch et al., 2011).
M3 (simplified Weibull, Eq. 3) is a non-linear model that is common in height-DBH relationship modeling (Fang and Bailey, 1998; Feldpausch et al., 2011). Its shape presents an oblique tangent line with slope α/β at (0, 0) and a horizontal asymptote H = α when the DBH is large.
M4 (Michaelis-Menten, Eq. 4) is a non-linear model that, while very common in chemistry, has rarely been employed to model height-DBH relationships (Huang et al., 1992). However, it presents all the required features: positive and increasing, with an oblique tangent line with slope β = α/γ at (0, 0) and a horizontal asymptote H = α when the DBH is large.
The model was re-arranged to ease the parameter inference (Eq. 5). All models have three parameters: two for the shape and one for the variance of the error term. In order to mechanistically increase the model uncertainty with height and DBH, the error term was modeled by a lognormal distribution. Keeping in mind that our objective was biomass prediction, each height-DBH observation was weighted by a proxy w_i of the biomass of each single tree (Eq. 6).
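Because Eqs. (1) to (5) are not reproduced in this text, the sketch below gives one plausible reading of the four shapes, inferred only from their verbal descriptions (tangent slope at the origin, horizontal asymptote). The exact parameterizations used by the authors may differ. The last function shows the re-arrangement obtained by substituting γ = α/β into the Michaelis-Menten form.

```python
import math


def m1_log_linear(dbh, alpha, beta):
    # Assumed form: H = alpha + beta * log(DBH); no horizontal asymptote.
    return alpha + beta * math.log(dbh)


def m2_log_log(dbh, alpha, beta):
    # Assumed form: log(H) = alpha + beta * log(DBH).
    return math.exp(alpha + beta * math.log(dbh))


def m3_weibull(dbh, alpha, beta):
    # Tangent slope alpha / beta at the origin, asymptote H = alpha.
    return alpha * (1.0 - math.exp(-dbh / beta))


def m4_michaelis_menten(dbh, alpha, gamma):
    # Tangent slope alpha / gamma at the origin, asymptote H = alpha.
    return alpha * dbh / (gamma + dbh)


def m4_reparam(dbh, alpha, beta):
    # Same curve with gamma replaced by alpha / beta, so that beta is
    # directly the slope of the tangent at the origin.
    return alpha * beta * dbh / (alpha + beta * dbh)
```

With α = 40 and γ = 20 (hence β = 2), `m4_michaelis_menten(50, 40, 20)` and `m4_reparam(50, 40, 2)` return the same height, which is a quick check that the two parameterizations describe the same curve.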
In each plot, the weights w_i were normalized so their sum is the number of observations. The models M1 to M4 were calibrated for each forest plot. The four model shapes are represented in Fig. 1. Parameter estimations were conducted using Markov chain Monte Carlo (MCMC) methods (see Supplement S1). After discarding a burn-in sample and a thinning of the chains, 1000 samples of the posterior distribution of each parameter were kept (for M4, the posterior distributions of the parameters are presented in Fig. 2). The models inferred independently in each forest plot are referred to as the "site-specific" height-DBH model.
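The weighting scheme can be illustrated as follows. Since Eq. (6) is not reproduced here, the proxy DBH² × H used below is a hypothetical stand-in, not the authors' definition; only the normalization (weights summing to the number of observations) and the lognormal error come from the text.

```python
import math


def biomass_proxy(dbh, height):
    # ASSUMPTION: hypothetical stand-in for Eq. (6), which is not shown in the text.
    return dbh ** 2 * height


def normalized_weights(proxies):
    """Rescale the raw weights so that they sum to the number of observations."""
    n = len(proxies)
    total = sum(proxies)
    return [n * w / total for w in proxies]


def weighted_lognormal_loglik(heights, medians, sigma, weights):
    """Weighted log-likelihood with a multiplicative lognormal error:
    log(H_i) ~ Normal(log(median_i), sigma)."""
    total = 0.0
    for h, m, w in zip(heights, medians, weights):
        z = (math.log(h) - math.log(m)) / sigma
        total += w * (-math.log(h * sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z)
    return total
```

With such a proxy, large trees, which carry most of the plot biomass, dominate the fit.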
The last panel presents the predicted height distribution of a tree with a DBH of 50 cm.

Height-DBH model shape selection
The AGB of each plot was computed. The AGB of a forest plot is the sum of the AGB of the trees from this plot divided by the surface of the plot, in Mg ha⁻¹. The tree AGB model predicts the mass of a tree from its DBH, height, and WSG (Eq. 7). Uncertainties from height predictions, WSG predictions, and tree AGB model parameters are propagated through Monte Carlo samples of their respective distributions (Molto et al., 2012).
Two different definitions of the height H_i of a tree i were used: field-measured heights and predicted heights. When the AGB of a tree was predicted (Eq. 7) with a height H_i predicted from one of the four height models, Monte Carlo samples of the predicted height H_i were generated from the posterior samples of the parameters and error term of the height-DBH model.
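The propagation of height uncertainty into the plot AGB can be sketched as below. This is a simplified illustration, not the authors' implementation: the tree AGB model (Eq. 7) is passed in as a user-supplied function because it is not reproduced in the text, WSG is treated as known, and the height model is assumed to be the re-arranged Michaelis-Menten form with a multiplicative lognormal error.

```python
import math
import random


def predicted_height(dbh, alpha, beta):
    # Re-arranged Michaelis-Menten curve (assumed form).
    return alpha * beta * dbh / (alpha + beta * dbh)


def plot_agb_samples(trees, posterior, tree_agb, area_ha, n_draws=1000, seed=0):
    """Monte Carlo samples of plot AGB per hectare.

    trees     : list of (dbh_cm, wsg) pairs
    posterior : list of (alpha, beta, sigma) posterior draws
    tree_agb  : function (dbh, height, wsg) -> tree AGB, supplied by the user
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_draws):
        # One posterior draw of the height model parameters per iteration.
        alpha, beta, sigma = rng.choice(posterior)
        total = 0.0
        for dbh, wsg in trees:
            # Lognormal error: the median height is multiplied by exp(epsilon).
            height = predicted_height(dbh, alpha, beta) * math.exp(rng.gauss(0.0, sigma))
            total += tree_agb(dbh, height, wsg)
        samples.append(total / area_ha)
    return samples
```

Running the same function with measured heights in place of `predicted_height` (and no error term) would give the reference AGB against which the predicted-height version is compared.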
The forest plot AGB distributions obtained from each height prediction were compared with the AGB distributions obtained from measured heights using the root mean squared error (RMSE) (Fig. 3). The selected height-DBH model is hereafter denoted M*. In addition, the selected model M* was calibrated on the entire data set without site effect. This model was called the "regional" model.
When the parameters α_p and β_p of each plot p were replaced by a log-linear combination of the variables x_j (Eq. 8), the parameter σ was unique, and we did not try to explain its plot-to-plot variation for identifiability reasons. The variables x_j were scaled so the coefficients of the log-linear combination could be compared to each other. We used the exponential function to constrain the values of the parameters α_p and β_p to be positive. In models M2, M3, and M4, the coefficients α and β are positive for physical reasons: the height is a positive value and the height increases with the DBH. In the model M1, the α parameter is not necessarily positive. The observations were weighted as before (Eq. 6). For algorithm details of the estimation of θ_α, I_α, θ_β, and I_β, see Supplement S1.
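The log-linear replacement of a plot-level parameter can be sketched as follows. Eq. (8) is not reproduced in this text, so the form below (an intercept plus indicator-gated coefficients, passed through an exponential) is an assumption consistent with the description; the intercept term and the z-score scaling are choices made for this illustration.

```python
import math
import statistics


def scale(values):
    """Centre and scale one covariate across plots (z-scores, an assumed choice)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def plot_parameter(intercept, thetas, indicators, x_scaled):
    """alpha_p (or beta_p) as exp(intercept + sum_j I_j * theta_j * x_jp).

    The exponential keeps the parameter positive; an indicator of 0
    switches the corresponding covariate off.
    """
    linear = intercept + sum(i * t * x for i, t, x in zip(indicators, thetas, x_scaled))
    return math.exp(linear)
```

With all indicators set to 0, the parameter reduces to `exp(intercept)` for every plot, which corresponds to a model without plot-level covariates.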

Environment
We used the method of Kuo and Mallick (1998) to select the variables x_j to be integrated in the final model. During parameter inference (see Supplement S1), an indicator i_α,j (respectively i_β,j) associated with each variable x_j for the parameter α (respectively β) can take two values: 1 indicates that the variable is kept in the model, and 0 indicates that the variable is not kept in the model (Eq. 8). Thanks to the indicators, the MCMC algorithm explored different combinations of variables.
To decide whether a variable x_j was kept in the model or not, we computed its percentage of presence in the explored models. This percentage was computed as the mean of the MCMC chain values of the indicator i_α,j (respectively i_β,j) after a burn-in removal and a thinning.
Usually, when the variables were ranked by this percentage, it showed a plateau followed by a rapid decrease. We aimed to keep the variables with a percentage of selection close to the value of the plateau. The selected variables involved in the replacement of α and β were not necessarily the same (Fig. 4).
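The post-processing of the indicator chains described above amounts to a mean over the retained MCMC iterations. A minimal sketch, with hypothetical function names and a user-chosen cut-off (the text selects the cut-off visually from the plateau, so no fixed value is implied):

```python
def inclusion_frequency(indicator_chain, burn_in, thin):
    """Share of retained MCMC iterations in which the indicator equals 1."""
    kept = indicator_chain[burn_in::thin]
    return sum(kept) / len(kept)


def select_variables(frequencies, cutoff):
    """Keep the variables whose inclusion frequency reaches the cut-off.

    frequencies : dict mapping variable name -> inclusion frequency
    cutoff      : threshold chosen by the analyst near the plateau value
    """
    return sorted(name for name, freq in frequencies.items() if freq >= cutoff)
```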

Variable selection
Because the environment has an obvious effect on the forest structure, we could not consider the structural and environmental variables in a single step. Thus, we first replaced the α and β coefficients by a linear combination of the stand structure variables only. The structural variables were selected using the method described above. The resulting model was called the "stand structure model" (Fig. 4, panels 1 and 3). Then, the environmental variables were added to the previous stand structure model. The variable selection procedure was run again, selecting the environmental variables only. In other words, an environmental variable was selected only if it caught variance that was not caught by the formerly selected structural variables (Fig. 4, panels 2 and 4).
As for the comparison of the model shapes, the heights predicted with the stand structure model and the environment model were used to compute the AGB of the plots. In terms of its ability to predict heights for AGB estimation, the best model including structure and possibly environmental variables is a compromise between the regional model (worst case) and the site-specific model (best case). The comparison of the AGB prediction RMSE allowed for quantification of how the variables describing the forest plots improved the regional model and how far the performances were from the site-specific model.
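The RMSE comparison can be illustrated with the short function below. How the Monte Carlo samples of the two AGB distributions were paired is not specified in the text; pairing them draw by draw, as done here, is an assumption of this sketch.

```python
import math


def rmse(predicted_agb, reference_agb):
    """Root mean squared error between paired AGB values (same units, e.g. Mg per ha)."""
    if len(predicted_agb) != len(reference_agb):
        raise ValueError("both sequences must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted_agb, reference_agb)]
    return math.sqrt(sum(squared) / len(squared))
```

Computed for each plot and each height model, this quantity gives the ranking discussed in the text: the closer the RMSE of a model is to that of the site-specific model, the less precision is lost by not fitting the height-DBH curve plot by plot.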

Model shape selection
Overall, we found that α and β coefficients were different from one site to another (Fig. 2 for model M4), showing that the height-DBH relationship varied between locations. The posterior distributions of both parameters α and β were negatively correlated (r = −0.81), suggesting that the forest properties they catch are not independent. Using the α, β, and σ coefficients of the site-specific M* model, the heights were predicted with each model in each forest plot for a tree of 50 cm DBH (Fig. 2). For the four model shapes, the 95 % CI of the RMSE distributions completely overlap (Fig. 3): we found no significant differences between the four shapes in terms of biomass prediction. We decided to focus on model M4 for two main reasons: (1) it has biologically meaningful coefficients (contrary to M1 and M2), and (2) it is easier to manipulate than M3 and its exponential function.

Environmental and structural variable selection
The selected structural variables explaining the observed variation of α were the basal area (negative effect), the proportion of small stems (strong negative effect), and the proportion of medium stems (negative effect). In addition, the slope (positive effect) and the rainfall (negative effect) were selected among the environmental variables (Fig. 4, Table 1).
The selected structural variables involved in the replacement of β were the proportion of small stems (strong positive effect) and the proportion of bigger stems (positive effect). The rainfall (positive effect) and the drained area (negative effect) were selected from the environmental variables (Fig. 4, Table 1) to complete the structural variables. The sets of variables selected for the replacement of α and β were not identical. The 95 % confidence intervals of all the selected coefficients excluded zero. The highest values were obtained for the proportion of small stems, highlighting its great explanatory power. Environmental variables had very weak effects.

AGB prediction
The RMSE of the model including structural and/or environmental variables was larger than the RMSE of the site-specific model and smaller than the RMSE of the regional model, labelled "universal" in Fig. 5. The RMSE of the model including environmental variables did not differ from the RMSE of the model using structural variables only (Fig. 5).

Discussion
Using a data set from diverse neotropical forests from French Guiana, we modeled the height-DBH relation using the Michaelis-Menten equation.The height-DBH relation varied between locations, which affected the AGB estimations.
We then demonstrated that part of the height-DBH relation variability could be explained by variables derived from the forest structure and, to a lesser extent, from descriptors of the local environment.

Model choice and parameter values in the site-specific model
The four models were not significantly different from each other in terms of predicting height and therefore predicting AGB (Figs. 1, 3). We believe that our particular data weighting (Eq. 6) was responsible for this closeness. It suggested that, with this weighting, one can use any of these four models to predict heights and then predict biomass. We also emphasize that the Michaelis-Menten model mathematical form is easy to handle as it has no exponential function. Though the exponential model has been used in the past, Feldpausch et al. (2012) found that the Weibull model was the most appropriate for biomass prediction (they did not consider the Michaelis-Menten model). We thus conclude that asymptotic models should be preferred.
Comparison with published allometric models is often difficult because most studies do not report the error parameter of their fitted models. Given that the height is often log-transformed (e.g., Feldpausch et al., 2011) to achieve linearity, the back-transformation requires the application of a correction factor. To take this into account, a simulated error term needs to be added to each log-scale model prediction before transforming back to the arithmetic scale.
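The back-transformation issue can be made concrete with a short sketch. For a normal error of standard deviation σ on the log scale, the mean on the arithmetic scale is exp(μ + σ²/2), not exp(μ); simulating the error term before exponentiating recovers this mean. The function names and the numerical values used in the usage note are illustrative only.

```python
import math
import random


def naive_back_transform(log_prediction):
    """exp() of the log-scale prediction: the median, which underestimates the mean."""
    return math.exp(log_prediction)


def lognormal_mean(log_prediction, sigma):
    """Analytical mean on the arithmetic scale for a normal error on the log scale."""
    return math.exp(log_prediction + 0.5 * sigma ** 2)


def simulated_heights(log_prediction, sigma, n_draws=10000, seed=0):
    """Add a simulated error term on the log scale, then transform back."""
    rng = random.Random(seed)
    return [math.exp(log_prediction + rng.gauss(0.0, sigma)) for _ in range(n_draws)]
```

For example, averaging `simulated_heights(math.log(30), 0.3)` gives a value close to `lognormal_mean(math.log(30), 0.3)` and above the naive 30 m, which is why σ has to be reported for a published model to be reusable.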
The α and β model parameters differed largely between forest plots (Fig. 2).This demonstrates that the height-DBH relationship was not the same in each plot, leading to contrasting height-DBH relationships and contrasting AGB values.
The α parameter represented the value of the horizontal asymptote for the largest DBH. This value was highly correlated with the maximum observed height in each forest plot (α = 1.06 × H_max, R² = 0.98, RSE = 6.7). This result has important practical consequences. While it is not reasonable to measure the height of all trees in large-scale inventories, it could be feasible to measure the 10 tallest trees or so to get the maximum height of a forest plot. Moreover, the maximum height of a forest plot is a direct output from lidar measurements. In either of these cases, the α parameter will not be predicted from environmental variables but will be estimated more or less directly. If the α parameter is known, the construction of the height-DBH model is simpler, more straightforward and more precise because it depends only on finding β.
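The practical shortcut suggested above can be sketched as follows: α is set from the maximum height with the reported ratio of 1.06, and only β is then fitted. The fitting procedure shown here (a grid search minimizing squared log-scale residuals, with β in metres of height per centimetre of DBH) is a simple stand-in chosen for this illustration, not the Bayesian inference used in the study.

```python
import math


def m4(dbh, alpha, beta):
    # Re-arranged Michaelis-Menten curve (assumed form).
    return alpha * beta * dbh / (alpha + beta * dbh)


def alpha_from_hmax(h_max, ratio=1.06):
    """Asymptote from the plot maximum height, using the ratio reported in the text."""
    return ratio * h_max


def fit_beta(dbh, heights, alpha, grid=None):
    """Grid search for beta with alpha fixed (log-scale least squares)."""
    if grid is None:
        # Hypothetical search range: 0.05 to 10 m of height per cm of DBH.
        grid = [0.05 * k for k in range(1, 201)]

    def sse(beta):
        return sum((math.log(h) - math.log(m4(d, alpha, beta))) ** 2
                   for d, h in zip(dbh, heights))

    return min(grid, key=sse)
```

A subsample of measured trees per plot would then be enough to calibrate β, or a constant regional β could be used when no heights other than the tallest trees are available.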
The β parameter represented the slope of the oblique tangent at (0, 0). The larger β is, the faster the trees reach the asymptote. β values showed less variation than α between forest plots (Fig. 2). This suggests that the parameter could be inferred at the region level, with no site effect on its value. However, because some plot-to-plot differences remained, we decided to test the forest structure and environmental effect on this parameter. If one aims to build a height-DBH model estimating α as suggested above, one could consider using a constant β parameter for simplicity.

Stand structure variables
The competition for light between trees has been identified as a major driver of the tree height trajectory (Clark, 1996; Guariguata and Ostertag, 2001; Luyssaert et al., 2008).
The proportion of small trees (10-20 cm DBH) has a strong positive effect on β together with a strong negative effect on α. In a forest patch with a high density of small trees, the tree competition causes trees to grow faster in height (Hummel, 2000). The small positive effect of the proportion of biggest trees (more than 60 cm DBH) on β also suggests that the presence of a large tree, limiting the light resource, also causes the smaller trees to grow faster in height.

Environmental variables
Because the environment has an obvious effect on the forest structure (Baraloto et al., 2011), we included the structural and environmental variables in the final model in two successive steps. The negative effect of rainfall on α is unexpected. Rainfall, which is related to the water availability, has largely been described as a positive driver of forest height (Koch et al., 2004; Ryan et al., 2006). The negative effect of the drained area on β indicates that the trees grow more slowly in height in a seasonally flooded or waterlogged terrain. This may be explained by (1) a greater light availability (Ferry et al., 2010) that is in turn linked to higher turnover rates (Madelaine et al., 2007; Ferry et al., 2010) and (2) higher mechanical constraints due to the lower soil stability in flooded areas (Gale and Barfod, 1999; Gale and Hall, 2001). The trade-offs among the variables within the α and β replacements and between the two replacements, together with the low values of the model coefficients, mean that we should interpret the highlighted patterns with caution.

Perspectives
In this study, we showed that part of the variability of the height-DBH relationship was successfully explained by the forest stand structure, expressed as the proportion of small trees (10-20 cm DBH). While basal area and rainfall were the most important variables at the global scale (Feldpausch et al., 2011), we did not find them crucial at the regional scale in French Guiana. Our study did not include any soil effects on the height-DBH relationship, though we interpreted the effect of highly drained areas as a possible indicator of soil instability. If available, information on soil parameters (such as physical properties or chemical composition) can improve the predictions of height (Aiba and Kitayama, 1999; Quesada et al., 2009; Feldpausch et al., 2011).
Tree species is known to be an important determinant of tree height (Poorter et al., 2005). However, tropical forests have such a high diversity (up to 200 species ha⁻¹; ter Steege et al., 2000) that individual species allometries cannot be inferred. Moreover, the use of a species-specific model for height prediction requires the species to be identified during the forest inventories. We wanted our method to be suitable for large-scale, quick inventories with no detailed species determination. Thus, we did not include the tree species as a height predictor.
To go further, sophisticated models could be developed incorporating more information at the tree level. For example, if the botanical information is available, a trait-based model may provide substantial improvement. The functional traits have proven to catch information on species biological properties that may be related to the height-DBH relationship (Baraloto et al., 2010; Hérault et al., 2011).
The model can now be used to predict the coefficients of a height-DBH model for the entire region of French Guiana.
The new AGB estimates using the new predicted height will help us understand the spatial patterns of AGB variations and produce more accurate carbon stock estimates.
The Supplement related to this article is available online at doi:10.5194/bg-11-3121-2014-supplement.

Figure 1. The four model shapes (black lines) adjusted in two very different forest plots. The grey points represent data.

Figure 4. Variable selection. The bars represent the % of presence of the variables in the model, computed from the posterior values of the indicators i_α,j and i_β,j (Eq. 8). The dotted lines indicate, in each selection process, which cut-off limit is chosen for the acceptance of a variable in the definitive models. The grey bar indicates the variables kept in the definitive model. The stand structure variables (first and third panels) were selected first. Then, keeping the selected structural variables, environmental variables were added to improve the model (second and last panels). Prop_X_Y: proportion of stems between X and Y cm; BA: basal area; TRI_20: terrain ruggedness index.

Figure 5. Box plots of the mean RMSE of the AGB predictions in 42 forest plots with tree heights predicted by four different height-DBH models: site-specific, universal, based on structural variables only, and based on structural variables completed with environmental variables.