Quantifying the Importance of Antecedent Fuel-Related Vegetation Properties for Burnt Area using Random Forests

The seasonal and longer-term dynamics of fuel accumulation affect fire seasonality and the occurrence of extreme wildfires. Failure to account for their influence may help to explain why state-of-the-art fire models do not simulate the length and timing of the fire season or interannual variability in burnt area well. We investigated the impact of accounting for different timescales of fuel production and accumulation on burnt area using a suite of random forest regression models that included the immediate impact of climate, vegetation, and human influences in a given month, and tested the impact of various combinations of antecedent conditions in four productivity-related vegetation indices and in antecedent moisture conditions. Analyses were conducted for the period from 2010 to 2015 inclusive. We showed that the inclusion of antecedent vegetation conditions on timescales > 1 yr had no impact on burnt area, but inclusion of antecedent vegetation conditions representing fuel build-up led to an improvement of the global, climatological out-of-sample R² from 0.567 to 0.686. The inclusion of antecedent moisture conditions also improved the simulation of burnt area through its influence on fuel build-up, which is additional to the influence of current moisture levels on fuel drying. The length of the period which needs to be considered to account for fuel build-up varies across biomes; fuel-limited regions are sensitive to antecedent conditions over longer time periods (∼4 months) and moisture-limited regions are more sensitive to current conditions.


Data processing
There are gaps in the SWI, FAPAR, LAI, SIF, and VOD datasets in winter months at latitudes above ∼60° N, and in the austral winter for southern South America, due to high sun zenith angles for FAPAR, LAI, and SIF and because of snow cover and frozen soil for SWI and VOD (e.g. Moesinger et al., 2020). There are also sporadic missing values in these datasets caused by e.g. cloud cover. Missing values affect the calculation of antecedent conditions. These gaps were therefore filled using a two-step approach as in Forkel et al. (2017) for each location. First, persistent gaps, defined as months for which 50% or more of the observations across all years are missing, were filled using the minimum value observed at that location, for the given predictor interpolated to monthly intervals. Temporally static data (i.e. AGB) were recycled. Processing was carried out before averaging to provide monthly climatological time series.
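The first gap-filling step can be illustrated with a minimal sketch. It assumes a single location, a monthly series that starts in January and spans whole years, and NaN as the missing-value marker; the function name `fill_persistent_gaps` and the synthetic values are hypothetical, and the sketch does not cover the filling of the remaining sporadic gaps.

```python
import numpy as np

def fill_persistent_gaps(series, n_months=12, threshold=0.5):
    """Fill persistently missing calendar months with the location minimum.

    series: 1-D monthly values for one location (NaN = missing), assumed to
            start in January and to span whole years.
    """
    values = np.asarray(series, dtype=float).copy()
    by_year = values.reshape(-1, n_months)        # rows: years, columns: calendar months
    missing_fraction = np.isnan(by_year).mean(axis=0)
    persistent = missing_fraction >= threshold    # missing in 50% or more of the years
    location_min = np.nanmin(values)              # minimum observed at this location
    # Only missing entries in persistently missing calendar months are filled.
    fill_mask = np.isnan(by_year) & persistent[np.newaxis, :]
    by_year[fill_mask] = location_min
    return by_year.reshape(-1)

# Synthetic three-year example: January is always missing (persistent gap),
# while April of the second year is a sporadic gap that is left untouched here.
template = [np.nan, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
fapar = np.tile(template, 3)
fapar[12 + 3] = np.nan
filled = fill_persistent_gaps(fapar)
```

In this example the three January values are set to the location minimum (0.1), and the single sporadic gap remains NaN for a subsequent filling step.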
The influence of antecedent conditions that might affect fuel loads or fuel dryness, specifically vegetation properties and DD, on BA was investigated by using antecedent FAPAR, LAI, VOD, SIF, and DD data from up to two years before any given month (1M, 3M, 6M, 9M, 12M, 18M, 24M, where M denotes months). The large autocorrelation between predictor variables could impede the interpretation of the impacts of antecedent periods ≥ 1 yr. Thus, anomalies were computed by subtracting the seasonal cycle relative to the designated month, resulting in the following transformations: where X ∈ {FAPAR, LAI, VOD, SIF, DD}.
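A minimal sketch of how such antecedent predictors could be constructed is given below. The lag set follows the text, but the anomaly definition is an assumption made purely for illustration: for lags of 12 months or more, the value at the same seasonal phase within the most recent year (lag minus 12 months) is subtracted. This is one possible reading of "subtracting the seasonal cycle relative to the designated month" and may differ from the transformations actually used; the helper names are hypothetical.

```python
import numpy as np

LAGS = (1, 3, 6, 9, 12, 18, 24)  # antecedent months named in the text

def lagged(series, lag):
    """Return a copy in which element t holds the value from t - lag months."""
    series = np.asarray(series, dtype=float)
    if lag == 0:
        return series.copy()
    out = np.full(series.shape, np.nan)  # months without enough history stay NaN
    out[lag:] = series[:-lag]
    return out

def antecedent_predictors(series):
    """Build lagged predictors, using anomalies for lags of 12 months or more."""
    predictors = {}
    for lag in LAGS:
        shifted = lagged(series, lag)
        if lag >= 12:
            # ASSUMPTION (not from the source): remove the seasonal cycle by
            # subtracting the value at the same seasonal phase, lag - 12 months back.
            shifted = shifted - lagged(series, lag - 12)
        predictors[f"{lag}M"] = shifted
    return predictors

# A perfectly periodic synthetic series: its >= 12M anomalies vanish by construction.
series = np.tile(np.arange(12.0), 3)
preds = antecedent_predictors(series)
```

For a series with no interannual variability, the anomalies at 12M, 18M, and 24M are zero wherever they are defined, which is the intended effect of removing the yearly autocorrelation.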

Machine learning experiments
We used random forest (RF) regression to model the relationships between BA and the driver variables (predictors). RF is an ensemble learning approach in which multiple decision trees are constructed using a randomly sampled subset of training observations. The final model is the average result from all of the individual decision trees. RF regression is highly suited to investigating the emergent controls on fire because it is able to learn non-linear relationships in high-dimensional space (Archibald et al., 2009). By averaging over multiple decision trees, RFs also mitigate overfitting (Breiman, 2001).
We trained a number of different RF regression models to test explicit hypotheses about the importance of antecedent conditions on BA (see Table 2) using the defined hyperparameters on the climatological timeseries. The initial experiment (ALL) was run using the basic set of 15 predictor variables related to climate, vegetation, and human influences on fire (Table 1) and included both current and antecedent values of the four vegetation indices and DD, giving 50 predictor variables. A second experiment (TOP15) used only the 15 most important predictors from the ALL model, as a way of testing whether all the predictors were necessary and whether including so many predictors resulted in overfitting. All of the remaining experiments used
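The logic of the ALL and TOP15 experiments (train on all predictors, then retrain on the most important subset and compare out-of-sample skill) can be sketched with scikit-learn on synthetic data. Everything specific here is an assumption for illustration: the synthetic target, the hyperparameters, the 70/30 split, and the use of scikit-learn's impurity-based `feature_importances_` as the ranking criterion, which is not necessarily the importance measure used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_predictors = 2000, 6
X = rng.uniform(size=(n_samples, n_predictors))
# Synthetic target: non-linear in the first two predictors, the rest are noise.
y = np.sin(3 * X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=n_samples)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# "Full" experiment: all predictors (illustrative hyperparameters).
full = RandomForestRegressor(n_estimators=100, random_state=0)
full.fit(X_train, y_train)
r2_train = r2_score(y_train, full.predict(X_train))
r2_val = r2_score(y_val, full.predict(X_val))

# "Reduced" experiment: retrain using only the top_k most important predictors.
top_k = 2
top = np.argsort(full.feature_importances_)[::-1][:top_k]
reduced = RandomForestRegressor(n_estimators=100, random_state=0)
reduced.fit(X_train[:, top], y_train)
r2_val_reduced = r2_score(y_val, reduced.predict(X_val[:, top]))

print(f"train R2={r2_train:.3f}, validation R2={r2_val:.3f}, "
      f"reduced validation R2={r2_val_reduced:.3f}")
```

The gap between `r2_train` and `r2_val` gives a simple indication of overfitting, and comparing `r2_val` with `r2_val_reduced` indicates whether the discarded predictors carried useful information.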

Measuring predictor importance and relationships
Our goal is to determine the contribution of individual predictors (including antecedent states of these predictors) to model skill at predicting BA, and to examine the relationships between predictors and BA.
There is no unique way to measure the importance of a given predictor on model skill in predicting BA and it is particularly difficult to assign importance to individual predictors when there is a high degree of collinearity between them (Dormann et al.,

Maximally significant timescales were then calculated by weighting the antecedent months, a, using SHAP magnitudes: where X denotes the basis predictor variable for which to carry out the calculation (e.g. FAPAR).
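As an illustration of this weighting, the sketch below computes a SHAP-magnitude-weighted mean antecedent month for one location and one basis predictor. The functional form (a weighted mean of lags with |SHAP| as weights), the lag set, and the numerical values are assumptions chosen for illustration and are not taken from the study.

```python
import numpy as np

def weighted_mean_lag(lags, shap_magnitudes):
    """Mean antecedent month, weighted by absolute SHAP values."""
    lags = np.asarray(lags, dtype=float)
    weights = np.abs(np.asarray(shap_magnitudes, dtype=float))
    return float(np.sum(lags * weights) / np.sum(weights))

# Current month (0) plus the antecedent months named in the text.
lags = [0, 1, 3, 6, 9, 12, 18, 24]

# Hypothetical |SHAP| profiles for a single location.
single_peak = [0.1, 0.2, 0.9, 0.2, 0.1, 0.0, 0.0, 0.0]  # one dominant lag (3M)
two_peaks = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]    # current and 9M both dominant

lag_single = weighted_mean_lag(lags, single_peak)
lag_double = weighted_mean_lag(lags, two_peaks)
```

With a single dominant lag, the weighted mean stays close to that lag. With two equally dominant and widely separated lags, it falls midway between them at a month where neither signal is strong, which motivates excluding such locations.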
Locations with too many significant antecedent months were ignored in order to visualise resulting relationships more reliably; for example, if both the current (|SHAP X, |) and nine-month antecedent (|SHAP X 9M, |) magnitudes are dominant, the weighted mean month (according to Eq. (2)) would lie in between which is physically meaningless. We designed an algorithm to detect SHAP values that differ significantly from the baseline in order to mitigate this. Additionally, we also employed a range-based threshold, whereby locations were ignored using the mean BA at location , BA (based on all BA samples), if
We further used Accumulated Local Effects (ALE) plots (Apley and Zhu, 2020) to examine and interpret the coupled relationships fitted by the RF models. ALE plots are a more robust alternative to partial dependence plots (PDP) or individual conditional expectation (ICE) plots (Apley and Zhu, 2020; Molnar, 2020). We assessed the impact of each of the predictor locations and times. The causes of the inhomogeneities were explored using 2D ALE plots, which show the combined effect of two predictor variables on BA.
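To make the ALE idea concrete, the following is a minimal first-order (1D) ALE sketch for an arbitrary prediction function. It is not the implementation used in the study: the function name `ale_1d`, the quantile binning, the centring convention, and the toy linear model are all assumptions for illustration, and the 2D case is not covered.

```python
import numpy as np

def ale_1d(predict, X, feature, n_bins=10):
    """First-order accumulated local effects of one feature."""
    X = np.asarray(X, dtype=float)
    x = X[:, feature]
    # Quantile-based edges so that each bin holds a similar number of samples.
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    n_intervals = len(edges) - 1
    # Index of the interval each sample falls into (0 .. n_intervals - 1).
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_intervals - 1)
    local_effects = np.zeros(n_intervals)
    counts = np.zeros(n_intervals)
    for b in range(n_intervals):
        in_bin = bins == b
        if not in_bin.any():
            continue
        lower, upper = X[in_bin].copy(), X[in_bin].copy()
        lower[:, feature] = edges[b]
        upper[:, feature] = edges[b + 1]
        # Mean prediction change across the interval; other features keep their real values.
        local_effects[b] = np.mean(predict(upper) - predict(lower))
        counts[b] = in_bin.sum()
    ale = np.concatenate([[0.0], np.cumsum(local_effects)])
    # Centre so that the sample-weighted mean effect is approximately zero.
    interval_mean = 0.5 * (ale[:-1] + ale[1:])
    ale -= np.sum(interval_mean * counts) / counts.sum()
    return edges, ale

# Toy check with two correlated predictors and a known linear response.
rng = np.random.default_rng(0)
x0 = rng.uniform(size=500)
x1 = x0 + 0.1 * rng.normal(size=500)
X_demo = np.column_stack([x0, x1])
toy_model = lambda X: 2.0 * X[:, 0] + X[:, 1]
edges, ale = ale_1d(toy_model, X_demo, feature=0)
```

Because prediction differences are evaluated only within narrow intervals of the feature, the recovered slope for the first predictor is 2 despite its strong correlation with the second one; a partial dependence plot would instead evaluate the model at unrealistic predictor combinations.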

Results
The ALL model, which includes all 50 variables, achieves an R² of 0.884 for the training dataset and of 0.686 for the validation dataset (Fig. 1). Out-of-sample predictions on the validation set (Fig. 2b) show a similar geographic pattern to observed BA (Fig. 2a). However, overprediction in the validation set relative to observed BA is more widespread than underprediction (Fig. 2c). Nonetheless, there is no bias; the ALL model predicts a mean out-of-sample BA of 2.49 × 10⁻³ compared to the expected 2.48 × 10⁻³. The apparent overprediction is the result of plotting relative (as opposed to absolute) errors, which amplifies the fact that the ALL model does not predict very low BA accurately: BA predictions are no lower than 2.22 × 10⁻⁵, while the observed BA is 0 for 85.7% of samples. Generally, extreme BA is captured more poorly by the model than intermediate BA, leading to overprediction at low and underprediction at high BA values. More samples are over-predicted because there are more values with low BA than high BA, leading to many instances of slight overprediction balanced by few instances of comparatively large underprediction (Fig. S1, S2).
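The distinction between widespread overprediction and the absence of a mean bias can be illustrated with a small numerical example. The values below are synthetic and chosen only to mimic the situation described (mostly zero observed BA, a small positive floor on predictions, and one strongly underpredicted high value); they are not results from the study.

```python
import numpy as np

# Nine samples with zero observed BA and one sample with high BA.
observed = np.concatenate([np.zeros(9), [1.0e-2]])
# The model never predicts exactly zero and underpredicts the extreme value.
predicted = np.concatenate([np.full(9, 5.0e-4), [5.5e-3]])

mean_bias = predicted.mean() - observed.mean()
n_over = int(np.sum(predicted > observed))   # samples that are overpredicted
n_under = int(np.sum(predicted < observed))  # samples that are underpredicted

print(f"mean bias={mean_bias:.2e}, overpredicted={n_over}, underpredicted={n_under}")
```

Here nine of the ten samples are slightly overpredicted while the mean bias is zero, because the many small positive errors are balanced by one large negative error.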
Climate variables and fuel-related vegetation indices have the strongest influence on BA in the ALL model (Table S2)  that 56% of grid cells are better predicted in the ALL model compared to the CURR model (Fig. 3). However, there does not appear to be any regional patterning in the improvement caused by including antecedent vegetation conditions. The decrease in the R² for the training dataset is smaller (−0.064) than for the validation dataset, indicating that overfitting may be more of a problem in the CURR model than the ALL model. Compared to the ALL model, fuel-related vegetation properties are less important in the CURR model: VOD is the highest-ranked vegetation variable but is only fifth in importance (Table S2).
The fuel-related vegetation variables included in the TOP15 model are correlated with one another (Fig. S3), and there are high correlations between these four variables on specific antecedent timescales. This suggests it may be unnecessary to include all these variables to capture the influence of fuel build-up on BA. Comparison of the models which only include current and antecedent conditions for one fuel-related vegetation variable (15VEG_FAPAR, 15VEG_LAI, 15VEG_VOD, 15VEG_SIF) confirms this. These models all perform better than the CURR model (Fig. 1). The 15VEG_FAPAR model performs better than the TOP15 model and almost as well as the ALL model. LAI appears to be a better predictor, considered on its own, than either SIF or VOD (Fig. 1). The 15VEG_FAPAR model also has the lowest training-validation R² gap (0.192) of the four models, which suggests that it has the most generalisable relationships. The importance of including antecedent DD is borne  Current and antecedent states of both fuel-related vegetation properties and DD have different impacts on BA (Fig. 4).
Current FAPAR has a negative effect on BA (Fig. 4a), which is strongest at intermediate levels of FAPAR, while antecedent FAPAR has a positive effect on BA. The impact of antecedent FAPAR is strongest for the preceding 1 month but persists for up to 6 months; longer lags tend to produce results more similar to the current relationship because of autocorrelation at the yearly scale. Current DD has a positive effect on BA (Fig. 4b), while antecedent DD has a generally negative effect except if DD is preceding 1M relationship, the negative impact becomes gradually stronger with antecedent DD up to 9M. These relationships make intuitive sense: whereas high antecedent levels of FAPAR suggest that fuel availability is not a limiting factor, high FAPAR in the current month indicates that the vegetation has sufficient moisture to be actively growing and therefore is less likely to burn. In contrast, dry conditions in the current month promote fire whereas dry conditions in preceding months reduce  Fig. 5a, S4a), the relationship between current LAI (and VOD) and BA is initially positive and then flat (Fig. 5b, S4b). This Although it is informative to consider the impact of individual predictor variables on BA, the expression of these relationships in the real world is likely to be conditioned by interactions with other variables. For example, low values of current FAPAR are associated with high BA (Fig. 4a), but this association occurs only when antecedent FAPAR is high (Fig. 6)  confined to shrublands in Africa. However, there is a significant interaction between current DD and current FAPAR (Fig. S5), with positive reinforcement of their influence on BA when DD is high and FAPAR is low and a negative influence on BA when DD is high and FAPAR is high.
Increased BA for high DD and low FAPAR is consistent with strong drought-induced fire in low productivity environments of sub-Saharan Africa, northern Australia, and isolated regions bordering the African tropical rainforests. Decreased BA as a result of increased DD and increased FAPAR is likely a sign of high-productivity environments that are not fire-prone, despite occasional drought.
The timescales of both fuel build-up and fuel drying are influenced by fuel type and are therefore expected to vary across biomes. Current fuel-related vegetation properties, such as FAPAR (Fig. 7a), have an important effect in tropical regions, particularly dry tropical regions, but are less important in temperate forest regions. Antecedent FAPAR (Fig. 7b) is important in most regions, with the strongest influence from the antecedent 3-6 months. Current DD (Fig. 7c) is generally more important than antecedent DD, although the impact on BA varies geographically: tropical and boreal regions show decreased BA as a result of low current DD, while northwestern Australia, extra-tropical Africa, the Cerrado of Brazil, and the western USA experience increased burning as a result of low DD (Fig. S6). The timescale on which antecedent drought affects BA (Fig. 7d) is more variable than that for fuel-related vegetation properties, ranging from 1-3 months in boreal forests, parts of sub-Saharan Africa, and northern Australia, to ∼4 months in the tropics, and ∼6 months or longer in more arid regions.

Discussion
Many studies have shown that climate, vegetation properties, and human factors are all important predictors of BA (e.g. Aldersley et al., 2011; Bistinas et al., 2014; Forkel et al., 2017, 2019a). Although spatial variation in BA is predominantly the result of spatial variability in moisture and fuel load (Archibald et al., 2009), there is still poor understanding of the critical timescales for fuel build-up and fuel drying. We have shown that the inclusion of both antecedent vegetation conditions that influence fuel build-up and antecedent conditions that influence fuel drying is necessary for the accurate prediction of BA.
The length of the dry-day period in the current month has the largest impact on BA but antecedent DD can also be important, particularly in temperate regions.

The finding that fuel build-up on timescales longer than a year is not an important predictor of BA is surprising, given that fuel build-up as a result of fire suppression has been linked to large and catastrophic fires (e.g. in the United States; Marlon et al., 2012; Parks et al., 2015; Higuera et al., 2015). The failure to detect an influence of longer-term fuel build-up on BA probably reflects the short time interval (∼5 yrs) used for the analyses because of the requirement that all of the datasets were available for the same time period. The seasonal differences captured by our analyses may be unimportant in regions where long fire return times (or fire suppression) allow fuel build-up over longer periods. Wetter forests with long fire-return intervals may also be more affected by longer-term moisture deficits (Abatzoglou and Kolden, 2013) that are not captured in the limited time period analysed. It would be worthwhile re-examining the influence of longer timescales on BA in the future, when longer datasets are available.
We have shown that it is not necessary to include multiple fuel-related vegetation variables in order to predict BA, provided that both current and antecedent conditions are taken into consideration. According to our analyses, FAPAR is the best fuel-related vegetation predictor. However, all four fuel-related vegetation predictors produce reasonable results, and other predictors (e.g. VOD) have been found to be important in other studies (e.g. Forkel et al., 2017). Optimising the timescales of different fuel-related vegetation predictors suggests that FAPAR is most important on short timescales (current, 1M) and the other vegetation properties appear to be more useful on longer timeframes. The good performance of models including FAPAR is due to the fact that the parts of the world responding most strongly to FAPAR and FAPAR 1M tend to be fuel-limited, dry biomes accounting for the majority of global BA (Giglio et al., 2013). Therefore, globally averaged model performance metrics will tend to favour model predictor variables which best represent these dominant fire regimes.
Our analyses are impacted by the influence of previous fires on current vegetation conditions. Burnt grid cells could have a lower FAPAR, for example, as a result of prior burning in the current month. This is a problem, related to the temporal and spatial scales of the analysis, because we are solely interested in how pre-fire vegetation conditions affect BA. A monthly analysis cannot resolve processes that occur on the order of only a few days, while the impact of previous fires on spatially averaged vegetation properties is proportional to the fraction of the grid cell that is burnt. In savannah regions of Africa and northern Australia, where on average up to 10% of a 0.25° grid cell burns in any given month, this could have a significant effect on the averaged values of vegetation properties used in our analyses. Using a finer spatial scale for analysis would counteract the impact of spatial smoothing, allowing burnt pixels to be ignored and predictor values to be estimated only from unburnt cells. Using a finer temporal resolution would allow the calculation of predictor variables only up to the time of burning. In practice, however, the lack of accurate fire statistics at finer scales (Abatzoglou and Kolden, 2013) limits the temporal and spatial resolution that could usefully be achieved.
Although the limitation of the spatial (and temporal) resolution of the observations could impact the realism of the RF model, as could the omission of variables that affect fuel build-up, the consistency of the vegetation relationships shown by all the models including antecedent conditions indicates that processes related to fuel build-up are adequately represented by the chosen set of predictors.

Conclusions
The dependence of BA on multiple climatic and biophysical variables has been modelled using a random forest algorithm. We showed that FAPAR was the most significant vegetation variable, and dry-day period and maximum temperature the most significant climatic variables, influencing BA. The inclusion of antecedent relationships with FAPAR and dry-day period significantly improved model performance. Antecedent relationships to BA were consistently different from instantaneous relationships, with the spatiotemporal variation of fuel build-up processes apparent via the relative importance of these different lagged relationships. A clear contrast between fuel- and moisture-limited regions was identified using the spatial variation of the relationship between antecedent FAPAR and BA. Moisture-limited regions were more strongly affected by suppression of fire at instantaneous timescales, while fuel-limited regions were most affected by fuel build-up over the previous ∼4 months.
The instantaneous dry-day period was found to have the biggest effect on BA, suppressing or promoting fire based on the climate.