Comment on bg-2020-460. Anonymous Referee #1. Referee comment on "Assessing MODIS Vegetation Continuous Fields tree cover product (collection 6): performance and applicability in tropical forests and savannas"

Bias in MODIS VCF is a known issue. The study by Adzhar et al. adds a new interpretation of the issue by comparing the latest version (Collection 6) of the satellite-based dataset with field data collected at ecological sites (24 forest and 24 savanna sites). The authors developed a simulation technique to address the scale discrepancy between the 100 m x 100 m sites and the 250 m x 250 m MODIS pixels, and extrapolated the derived relationship across the tropics. The study then analyzed the bias patterns of MODIS VCF per land-cover class using the MODIS land cover product. A major finding of the study is that MODIS VCF underestimates tree cover in tropical savannas and woody savannas, which has important implications for the carbon cycle and forest restoration potential. The paper is well written and easy to follow.


As the VCF product has progressed from Collection 1 to its current Collection 6, several validations using in-situ field data or higher-resolution remotely sensed data as a reference measurement have been carried out. These have been few and limited to sites within a biome (Montesano et al., 2009a), a region (Hansen et al., 2005; White et al., 2005), or a country (Gao et al., 2014; Sexton et al., 2013). The MODIS VCF product evaluated was the most recent collection available at the time (i.e. Hansen et al., 2005 and White et al., 2005 for Collection 3; Montesano et al., 2009a for Collection 4; and Gao et al., 2015 and Sexton et al., 2013 for Collection 5). To our knowledge, no such independent validation experiment has yet been conducted on Collection 6, which produces tree cover estimates in the same manner as Collection 5 but with improvements made to the upstream inputs to enhance its accuracy (DiMiceli, 2017).
The validations found that MODIS VCF may be less suitable for estimating tree cover in sparsely-vegetated areas. Huang & Siegert (2006) noted that MODIS VCF classified large areas of land as 'bare' where their land cover classification system identified it as sparsely-vegetated. Montesano et al. (2009) found that MODIS VCF data (Collection 4) overestimated cover in areas of low tree cover in taiga-tundra transition zones. Sexton et al. (2013) found that the Collection 5 product overestimated cover in areas of low cover (below 20 %) and underestimated in areas of higher tree cover, while Gao et al. (2015) found that MODIS VCF can only partially discriminate between tropical forest and non-forest, struggling in areas that have greater heterogeneity. What is clear from the history of these validation and comparison experiments is that MODIS VCF has accuracy issues in areas with low woody vegetation cover, which has implications when its tree cover estimates are treated as accurately representative of real-world conditions. Failure to accurately account for the product's difficulty in estimating low woody covers can, therefore, lead to miscalibrated models and estimations that do not reflect real-world conditions. This, in turn, has knock-on effects on environmental policy-making, conservation efforts, and future ecological research, especially in areas with vegetation cover types that are most prone to error.

Tropical savannas have woody covers that fall within the range particularly affected by the reported MODIS VCF errors. A large proportion of these savannas can be found in tropical developing countries (Boval and Dixon, 2012), and are predicted to be home to half of the world's population by 2050 (State of the Tropics, 2020). Tropical savannas are therefore highly vulnerable to anthropogenic change. In the face of a growing population, land fragmentation, and changing climate, a savanna's ability to maintain robust ecosystem functions is directly linked to the amount of woody cover present (Sankaran et al., 2006). As a result, the ability to accurately monitor the state, dynamics, and woody cover trends of tropical savannas is a vital part of understanding how and why savannas are changing in the tropics (Harris et al., 2012; Miles et al., 2006), while also improving modelled climate projections and vegetation dynamics for this complex biome.
https://doi.org/10.5194/bg-2020-460 Preprint. Discussion started: 2 February 2021 © Author(s) 2021. CC BY 4.0 License.
In this study, we validate the accuracy of MODIS VCF Collection 6 in tropical savannas and forests by comparing the tree cover percentage of the product to corresponding field data. We then characterise the observed bias in woody covers across both savanna and forest ecosystems and apply our corrections across the tropics to highlight the regions most likely to be affected by these inaccuracies in the MODIS VCF product.

2 Methods
We used the MODIS VCF Collection 6 product (spatial resolution of 250 m, DiMiceli, 2017) with tree cover values averaged across the years 2006 through to 2009 to reflect the range of the field data collection period.

The in-situ field data was sourced from the 'TROpical Biomes InTransition' project (TROBIT) (www.geog.leeds.ac.uk/TROBIT, Torello-Raventos et al., 2013). The field sites are 100 m x 100 m in size (except 3 sites from Ghana that are 50 m x 100 m). All the sites fall within the tropics, that is, within 23.5 degrees north and south of the equator.
The classification of the TROBIT sites as either 'forest' or 'savanna' is based on the parameters described in Torello-Raventos et al. (2013) and Veenendaal et al. (2015). A 'savanna' is a natural land cover that is not a forest, bare ground, or a body of water, with 'forest' being defined as woody vegetation with a mean height of or exceeding 6 m for trees with a diameter at breast height (dbh) exceeding 10 cm, and a tree cover of or exceeding ~60 %. There is some ambiguity in how 'savannas' and 'grasslands' are defined, with some modelling-based research treating the two biomes as different (Whitley et al., 2017) while studies based on plant functional traits group them together (Solofondranohatra et al., 2018; White et al., 2000). As there is some concern that MODIS VCF will struggle to pick up woody cover in areas with very sparse vegetation, in this paper we have decided to treat 'grasslands' as part of the savanna domain.
Canopy Area Index (CAI) is defined as the sum of the projected areas of individual tree canopies divided by the ground area. In the TROBIT project, the upper-stratum, mid-stratum, and subordinate-stratum crown areas are factored into the CAI values and were recorded using methods outlined in Torello-Raventos et al. (2013). Membership of a stratum is determined by the tree's dbh (upper-stratum: dbh > 10 cm, mid-stratum: 2.5 cm < dbh < 10 cm, and subordinate-stratum: dbh < 2.5 cm). However, CAI values do not account for within-site tree canopy distribution patterns and the overlap between individual tree canopies. We account for this by converting each CAI value into a probability distribution function incorporating the following two extreme scenarios: "enforced overlap", where the location probability of individual canopies increases linearly from 0 to 1 across a site; and "unenforced overlap", where individual canopies follow a uniform random distribution pattern and canopy overlap is not purposefully introduced (Fig. A2). We repeated this 1000 times per CAI measurement to determine the probability distribution of expected CAI for each site.
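The two overlap scenarios can be illustrated with a small Monte Carlo sketch. It is not the authors' implementation: the equal-sized circular crowns, the 1 m raster, and the names `simulate_cover` and `crown_radius_m` are assumptions made purely for illustration. Crown centres are drawn either uniformly (unenforced overlap) or from a density that rises linearly across the site (enforced overlap), and the fraction of ground actually covered is recorded for each repeat.

```python
import numpy as np

def simulate_cover(cai, enforce_overlap, site_m=100, crown_radius_m=3.0,
                   n_repeats=1000, seed=0):
    """Return n_repeats simulated canopy cover fractions for one CAI value.

    Assumptions (not from the paper): identical circular crowns and a
    1 m x 1 m raster of the site.
    """
    rng = np.random.default_rng(seed)
    crown_area = np.pi * crown_radius_m ** 2
    # Number of crowns whose summed area matches the requested CAI.
    n_crowns = int(round(cai * site_m ** 2 / crown_area))
    yy, xx = np.mgrid[0:site_m, 0:site_m] + 0.5  # centres of 1 m cells
    covers = np.empty(n_repeats)
    for i in range(n_repeats):
        u = rng.uniform(size=n_crowns)
        # Inverse-CDF sampling: a density rising linearly from 0 across the
        # site gives x = site * sqrt(u); a uniform density gives x = site * u.
        cx = site_m * (np.sqrt(u) if enforce_overlap else u)
        cy = site_m * rng.uniform(size=n_crowns)
        covered = np.zeros((site_m, site_m), dtype=bool)
        for x0, y0 in zip(cx, cy):
            covered |= (xx - x0) ** 2 + (yy - y0) ** 2 <= crown_radius_m ** 2
        covers[i] = covered.mean()
    return covers
```

Because overlapping crowns cover the same ground twice, the simulated cover is lower than the CAI itself, and it tends to be lower still when crowns are pushed towards one side of the site. The default of 1000 repeats mirrors the text but is slow in pure Python; a smaller `n_repeats` is enough to see the effect.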

Unlike CAI, which is the fraction of ground covered by tree crowns, the percent tree cover value from MODIS VCF is defined as "the portion of the skylight orthogonal to the surface which is intercepted by trees" (Hansen et al. 2002). To make the MODIS VCF tree cover value comparable to the tree cover derived from the CAI of a TROBIT site, we divided the MODIS VCF values by 0.8 as suggested by Hansen et al. (2002). Next, to account for the difference in size between the MODIS VCF pixel (250 m x 250 m) and the smaller field plot size (100 m x 100 m), we calculated the possible percent tree cover an area the size of a TROBIT field site could have, given the MODIS VCF percent tree cover for a MODIS-sized pixel. This was done for two extreme scenarios: "enforced clumping," where all the tree cover for the given MODIS VCF value is forcibly 'clumped' on one side of the pixel, or "unenforced clumping," where 'clumping' is not enforced and tree cover is distributed randomly within the pixel (Fig. A3). In short, the clumping scenarios introduce possible variations in percent cover due to the area and location mismatch between a TROBIT field site and a MODIS pixel. A probability distribution was generated for each MODIS VCF pixel by calculating percent tree cover values for 1000 samples (100 m x 100 m) randomly placed within the 250 m x 250 m MODIS VCF pixel.
The comparison covered four combinations of overlap and clumping: 1) unenforced overlap and unenforced clumping; 2) enforced overlap and unenforced clumping; 3) unenforced overlap and enforced clumping; 4) enforced overlap and clumping. Comparisons were conducted by fitting a function (Equation 1) in which VCF and C are the MODIS VCF pixel and TROBIT site probability distributions, respectively, and the remaining terms are optimised parameters.
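The pixel-to-plot sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the pixel is rasterised into 1 m cells, the cap at 100 % after dividing by 0.8 is an added safeguard, and the function name `plot_cover_distribution` is hypothetical.

```python
import numpy as np

def plot_cover_distribution(vcf_percent, enforce_clumping, pixel_m=250,
                            plot_m=100, n_samples=1000, seed=0):
    """Percent cover of plot-sized windows placed at random in one pixel."""
    rng = np.random.default_rng(seed)
    # Convert the MODIS VCF value to canopy cover by dividing by 0.8;
    # capping at 100 % is an assumption of this sketch.
    cover = min(vcf_percent / 0.8, 100.0) / 100.0
    n_cells = pixel_m * pixel_m
    n_tree = int(round(cover * n_cells))
    cells = np.zeros(n_cells, dtype=bool)
    cells[:n_tree] = True
    if not enforce_clumping:
        rng.shuffle(cells)  # tree cells scattered at random in the pixel
    # Column-major layout, so clumped cover fills the pixel from one side.
    pixel = cells.reshape((pixel_m, pixel_m), order="F")
    max_offset = pixel_m - plot_m
    rows = rng.integers(0, max_offset + 1, size=n_samples)
    cols = rng.integers(0, max_offset + 1, size=n_samples)
    return np.array([100.0 * pixel[r:r + plot_m, c:c + plot_m].mean()
                     for r, c in zip(rows, cols)])
```

With random placement the plot-sized windows all return values close to the pixel mean, whereas with enforced clumping the same pixel value can correspond to anything from bare ground to full cover at plot scale, which is the spread the two scenarios are meant to bracket.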
The function in Equation 1 is similar to a standard linear regression of logit-transformed data, accounting for maximum and minimum bounds of 0-100 % tree cover, with two of the optimised parameters (subscripted 1 and 2 in Equation 1) allowing for a non-symmetric transformation of tree cover. To account for the probability density of each point, we inferred the parameters in Equation 1 using a Total Least Squares Bayesian Inference technique with a Metropolis-Hastings Markov Chain Monte Carlo step. Priors were uninformed but physically bounded (i.e. three of the parameters, including the two subscripted 1 and 2, were constrained to be greater than 0) to assume an increasing relationship between MODIS VCF and C. Our logit-transformed data in Equation 1, along with the two parameters correcting for symmetry, allowed us to assume normally distributed model errors, and so our conditional probability of observations for a given parameter combination was described by a normal distribution (Gelman et al., 2013). Each combination was run over 10 chains, with 1000 warm-up iterations and 10,000 sampling iterations. [...] sampled from each of our 10 optimisation chains (50 in total) and masking out pixels with cover types not considered as 'forest' or 'savanna'.
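To make the inference step more concrete, the sketch below runs a single random-walk Metropolis-Hastings chain for a plain linear regression on logit-transformed cover, with a flat prior that is bounded so the slope stays positive. It is a simplified stand-in and not the authors' model: it uses an ordinary (vertical-residual) normal likelihood rather than Total Least Squares, omits the two asymmetry parameters of Equation 1, and the names `logit`, `log_posterior`, and `metropolis_hastings` are hypothetical.

```python
import numpy as np

def logit(p, eps=1e-6):
    """Logit transform of a cover fraction, clipped away from 0 and 1."""
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def log_posterior(theta, x, y):
    """Flat, bounded prior (slope > 0) plus a normal likelihood."""
    intercept, slope, log_sigma = theta
    if slope <= 0:  # physical bound: increasing relationship
        return -np.inf
    resid = y - (intercept + slope * x)
    return -0.5 * np.sum((resid / np.exp(log_sigma)) ** 2) - y.size * log_sigma

def metropolis_hastings(x, y, n_warmup=1000, n_samples=10000,
                        step=0.05, seed=0):
    """Return posterior samples of (intercept, slope, log_sigma)."""
    rng = np.random.default_rng(seed)
    theta = np.array([0.0, 1.0, 0.0])
    current = log_posterior(theta, x, y)
    kept = []
    for i in range(n_warmup + n_samples):
        proposal = theta + rng.normal(0.0, step, size=3)
        candidate = log_posterior(proposal, x, y)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        if i >= n_warmup:  # discard warm-up iterations
            kept.append(theta.copy())
    return np.array(kept)
```

Here `x` and `y` would be the logit-transformed MODIS VCF and field cover fractions. The warm-up and sampling lengths follow the numbers quoted in the text; running several such chains from different starting points would be the natural next step.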
We used the 500 m MODIS Land Cover Type (MCD12Q1) product to identify the areas of 'forest' and 'savanna' across the tropics in the MODIS VCF product. MCD12Q1 is widely used by the global land surface modelling community (e.g. Sellar et al., 2019; Wiltshire et al., 2020) and describes land cover in terms of 17 global land cover classes as per the International Geosphere-Biosphere Programme (IGBP, Table 3 in Sulla-Menashe and Friedl, 2018). The product is based on the same spectroradiometer (MODIS) and temporal resolution as the VCF product. Referring to the definition of 'savanna' of Veenendaal et al. (2015), the following land cover classes were chosen to represent 'savanna': Closed Shrubland, Open Shrubland, Woody [...]
For a more detailed land-cover-specific evaluation, we resampled the corrected 250 m MODIS VCF pixels to a 500 m grid and combined them with the MCD12Q1 product to construct land-cover-specific MODIS VCF tree cover frequency distributions (Fig. A5). Our tree cover correction by cover type (Fig. 3) for the four clumping/overlap regression combinations was then calculated by multiplying each cover type MODIS VCF frequency distribution [...]
MODIS VCF underestimates tree cover within the 23 % to 72 % range across all four combinations of enforced/unenforced overlap and clumping (black line, Fig. 1). Below 12 %, MODIS VCF tree cover values do not significantly disagree with TROBIT field data, and may instead be overestimating tree cover (50 % confidence, dashed line, Fig. 1). A similar pattern is seen when tree cover exceeds ~80 %: MODIS VCF does not differ significantly from TROBIT when there is no enforced overlap (i.e. when tree canopies are spaced randomly within the site - Fig. A2 left), but may overestimate tree cover when overlap is enforced (i.e. trees are clustering towards one side, increasing the degree of canopy overlap - Fig. A2 right).
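The resampling step described in the Methods, in which 250 m tree cover pixels are placed on the 500 m land cover grid and grouped by class, can be sketched as below. The text does not state which resampling method was used, so the 2 x 2 block mean is an assumption, as are the 5 % histogram bins and the function names `block_mean_2x2` and `class_histograms`.

```python
import numpy as np

def block_mean_2x2(vcf_250m):
    """Average each 2 x 2 block of 250 m pixels onto a 500 m grid."""
    rows, cols = vcf_250m.shape
    # Drop a trailing row or column if the grid size is odd.
    trimmed = vcf_250m[:rows - rows % 2, :cols - cols % 2]
    return trimmed.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))

def class_histograms(vcf_500m, land_cover, bin_edges=np.arange(0, 101, 5)):
    """Tree cover frequency distribution for each land cover class code."""
    return {int(code): np.histogram(vcf_500m[land_cover == code],
                                    bins=bin_edges)[0]
            for code in np.unique(land_cover)}
```

The land cover array is expected to hold one integer class code per 500 m cell, aligned with the resampled tree cover grid. Each returned histogram counts how many cells of that class fall in each tree cover bin.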
There is a clear difference in how accurately MODIS VCF estimates tree cover in forested areas (in green, Fig. 1) as opposed to areas identified as savannas (in orange, Fig. 1). [...] underestimates tree cover where tree cover exceeds 79 % (at the 95 % confidence interval) when neither overlap nor clumping is enforced, and overestimates where tree cover exceeds 58 % (at the 5 % confidence interval) when both overlap and clumping are enforced.

Global estimates of change in tropical tree cover
We assessed the impact of MODIS VCF's underestimation of intermediate tree covers across the tropics (IGBP forest and savanna land cover classes), using a correction based on the combined forest and savanna sites (black curve, Fig. 1) instead of using the savanna-only sites for a savanna-specific correction (orange curve, Fig. 1). This is because there were few TROBIT sites representing savanna with MODIS VCF tree cover values exceeding 40 %, and global land cover maps disagree on the distribution of savannas within the forest-savanna ecotone.

The distribution of tree cover change post-correction is similar across the four scenarios (Fig. 2), and the regions where all four scenarios agree on the direction of change (positive and negative) are substantial. However, there are some differences caused by the uncertainty introduced by different extents of overlap and clumping. While we see a significant increase in tree cover across all clumping-overlap combinations in many regions of tropical savannas and grasslands (Pennington et al., 2018), such as in the forest-savanna mosaics that surround Congolian rainforests, we do not see the same pattern in the Cerrado of Brazil. This is likely because the African forest-savanna regions fall within the range of MODIS VCF values that consistently undergo a positive correction (~30-50 %, see Fig. A4), while the Cerrado of Brazil does not.
We also see significant tree cover decrease in the Sahel post-correction in most or all of the scenarios, which runs counter to the results of Brandt et al. (2020) that found that tree cover was underestimated in the region.

This disparity may be explained by our lack of field sites in more arid regions. As these corrections were based on a limited number of sites in a limited number of regions, it is important to note that the maps shown in Figure 2 are not definitive. Instead, they should be used to identify areas where MODIS VCF estimates may be more or less reliable.

Change in tree cover within different vegetation classes in tropical ecosystems
When looking at our correction in more detail, we see that MODIS VCF significantly underestimates tree cover in all the IGBP land cover classes that we considered, regardless of overlap or clumping (95 % confidence interval) (Fig. 3). The most substantial and significant underestimation is in the classes 'woody savannas' and 'savannas'. The underestimation is the largest in woody savannas, except when clumping and overlap are enforced (in purple, Fig. 3). This is because the peak in the tree cover distribution for savannas aligns with where the correction for maximum overlap and clumping is the largest (i.e. at about 20 % tree cover, see Fig. A5), while the peak in cover distribution for woody savannas (30-70 %, Fig. A5) aligns with the cover range that undergoes the greatest correction (Fig. 4, Fig. A5) in the other clumping and overlap scenarios.
Open shrubland only shows a small underestimation of tree cover, despite having a tree cover range that closely resembles the range where MODIS VCF most underestimates tree cover (i.e. 20-60 % cover). The discrepancy may arise because IGBP class definitions do not entirely correspond with the tree cover percentage identified by MODIS VCF, or because the IGBP description of shrublands is not apt in terms of tree cover. For example, IGBP distinguishes 'shrublands' from 'savannas' based on a woody perennial height threshold of < 2 m, while MODIS VCF only observes canopies of trees exceeding 5 m in height. MODIS VCF tree cover in open shrubland is therefore likely to be lower than the IGBP tree cover (see Fig. A5), resulting in corrections that are more conservative.
We found significant increases in tree cover for 'forests' in every correction scenario, though net change is only significant (95 % confidence) when overlap is unenforced. This can be explained by the presence of both negative and positive corrections in the higher ranges of tree cover when overlap is enforced. Similarly, the net change is not significant across all clumping and overlap scenarios for the IGBP classes matching the lower ranges of tree cover (grassland, closed shrubland and open shrubland).

Discussion
While MODIS VCF is a powerful and accessible tool to map tree cover, our field data-based corrections indicate that the latest MODIS VCF collection 6 misses a substantial amount of woody cover, even when the uncertainty introduced by site canopy overlap and clumping within the MODIS VCF pixel is accounted for. Our maps (Fig. 2) highlight that this potential underestimation of woody cover is mainly occurring in tropical savannas. Moreover, the highest underestimation in the savanna classes occurs when there is no enforced overlap (i.e. when there is a uniform random distribution of trees), which is the most likely scenario for the TROBIT savanna plots as evidenced by work done by Veenendaal et al. (2015) to test the plots for complete spatial randomness.

Only minor indications of overlap were found. Woody savannas, as an example, may have their tree cover underestimated by up to 27 % (95 % confidence) when neither clumping nor overlap is enforced (in black, Fig. 3). If our results are representative of the tropics, then overall, MODIS VCF may be underestimating tropical tree cover by 5-22 % for unenforced clumping and overlap, or by 0-22 % when either clumping or overlap is enforced (5-95 % confidence).

An overestimation at the lower end of the cover (< 20 %) (Hansen et al., 2002; Sexton et al., 2013) and underestimation in the middle range of cover have been identified in validations of previous MODIS VCF collections (Gross et al., 2018; Yang and Crews, 2019). MODIS VCF only maps trees that are 5 m or taller (Hansen et al. 2003), while the TROBIT CAI includes all trees. This could explain the observed underestimation in the lower tree cover ranges for the savanna sites. However, the discrepancies we found between the percent tree cover ranges of the IGBP class descriptions (i.e. MCD12Q1 product) and the corresponding MODIS VCF suggest that the 5 m height threshold may not always apply. Unlike in MODIS VCF, an IGBP 'tree' has a minimum height of 2 m, and 'shrubs' are shorter. Therefore, MODIS VCF recording tree cover in the open and closed shrubland classes of MCD12Q1 (Fig. A5) indicates one of three things: that these shrublands contain trees tall enough to be measured by MODIS VCF (> 5 m), that MODIS VCF is able to detect shorter trees (< 5 m), or some combination of both scenarios. More importantly, for the MCD12Q1 savanna class, MODIS VCF yields a percent tree cover range that matches closely with the savanna class definition (between 10 % and 30 %), even though the MODIS VCF and IGBP tree thresholds are different. This would suggest that in areas classed as 'savanna', trees grow taller than 5 m, or, similar to the shrubland classes, MODIS VCF is picking up trees with heights below 5 m. Because of how our field reference CAI is derived, we were not able to apply a 5 m threshold to the reference data. As a result, we cannot conclusively link the 5 m threshold to our observed underestimation. We expect that the TROBIT CAI percent tree cover value will include tree cover that is not identifiable by MODIS VCF, and that our observed underestimation is not as extensive as shown.
More work needs to be done to evaluate how effective MODIS VCF is at distinguishing trees above the 5 m threshold.
Our results suggest that the biases found in the previous collections may have persisted in collection 6, despite the reported improvement in accuracy. This indicates that the biases introduced by binning the training data (Gerard et al. 2017) and using a CART (Classification and Regression Tree) model (Hanan et al., 2013) are inherent and still present within this version of MODIS VCF. Models calibrated using MODIS VCF (Brandt et al., 2017; Burton et al., 2019; Kelley et al., 2019, 2020) risk inheriting these biases and should therefore be validated using other sources of data. We suggest that while MODIS VCF gives a good overview of tree cover on a global scale, it should be re-calibrated before it is used as reference or training data. Special care should be taken in savannas, a biome that has long been noted as being challenging for Earth observation (EO) products to characterise, as solitary trees in the landscape tend to be missed by global tree cover products (Jung et al., 2006; Brandt et al., 2020). The poor performance of MODIS VCF in savannas in particular (Gaughan et al., 2013; Gross et al., 2018; Kumar et al., 2019) emphasises the importance of continuous independent validation and re-calibration of the product. The ecosystem functions of savannas can vary drastically with just a slight difference in tree cover (Gaughan et al., 2013), and even slight errors may create issues in how we interpret the state and dynamics of the biome, which in turn affects how the land is managed.
Work on forest restoration potential would also be impacted. Bastin et al. (2019), for example, used MODIS VCF to estimate tree cover in agricultural land. As this tree cover is likely to have been underestimated substantially, the derived available land space for replanting may be less than projected, with the restoration potential overestimated. However, our results also indicate an underestimated tree cover in woodier savannas and forests. Accounting for this, the restoration potential could actually be greater than anticipated, because the carrying capacity of a unit of land could be greater than previously thought. The MODIS VCF correction could also result in a more uniform cover distribution across regions, producing a more gradual transition between low-cover savannas and high-cover forests. This could have implications for work that, for example, uses MODIS VCF to study forest-savanna dynamics and bi-stability (Lasslop et al., 2018; Wuyts et al., 2017; Xu et al., 2016).
To ensure the appropriate use of the product, we suggest that where field data is available, the height threshold of MODIS VCF should be evaluated, and the product itself should be calibrated for use in the target region.

Alternatively, comparing MODIS VCF to other land cover maps or higher-resolution remotely sensed data is recommended (Gross et al., 2018; Lary and Lait, 2006), though without a large-scale effort to re-calibrate MODIS VCF, the question of how appropriate MODIS VCF is for use in both forests and savannas in the tropics will remain. By highlighting the extent to which MODIS VCF struggles to estimate tree cover in tropical forests and savannas, we hope to inform the future use of this product and improve its usability.

Conclusion
We found that MODIS VCF significantly underestimates tree cover in tropical forests and savannas, even when within-field site and field site-pixel variation are accounted for during validation. As MODIS VCF is a product that is commonly used in a wide variety of ecological research including vegetation modelling, estimating restoration potential, and identifying forest-savanna bimodality, we stress that more independent work on validating and re-calibrating is required before its tree cover estimates can be relied upon in the tropics.
[...] can be found in Fig. 1 of Torello-Raventos et al., 2013.
Figure A2. Visual representation of the effects of enforcing overlap within a (100 m x 100 m) TROBIT site with a given Canopy Area Index (CAI). Left: Overlap is not enforced, and individual crowns follow a uniform random distribution. Right: Overlap is enforced by linearly increasing the probability of a canopy being located more on one side of the site (i.e. here the right side of the site) than the other. This results in tree canopies 'overlapping' to a greater extent, which affects how accurately CAI represents actual canopy cover.