Topography-based statistical modelling reveals high spatial variability and seasonal emission patches in forest floor methane flux

Boreal forest soils are globally an important sink for methane (CH4), while these soils are also capable of emitting CH4 under favourable conditions. Soil wetness is a well-known driver of CH4 flux, and the wetness can be estimated with several terrain indices developed for the purpose. The aim of this study was to quantify the spatial variability of the forest floor CH4 flux with a topographybased upscaling method connecting the flux with its driving factors. We conducted spatially extensive forest floor CH4 flux and soil moisture measurements, complemented by ground vegetation classification, in a boreal pine forest. We then modelled the soil moisture with a random forest model using digital-elevation-model-derived topographic indices, based on which we upscaled the forest floor CH4 flux. The modelling was performed for two seasons: May–July and August–October. Additionally, we evaluated the number of flux measurement points needed to get an accurate estimate of the flux at the whole study site merely by averaging. Our results demonstrate high spatial heterogeneity in the forest floor CH4 flux resulting from the soil moisture variability as well as from the related ground vegetation. The mean measured CH4 flux at the sample points was −5.07 μmol m−2 h−1 in May–July and −8.67 μmol m−2 h−1 in August–October, while the modelled flux for the whole area was −7.42 and −9.91 μmol m−2 h−1 for the two seasons, respectively. The spatial variability in the soil moisture and consequently in the CH4 flux was higher in the early summer (modelled range from −12.3 to 6.19 μmol m−2 h−1) compared to the autumn period (range from −14.6 to −2.12 μmol m−2 h−1), and overall the CH4 uptake rate was higher in autumn compared to early summer. In the early summer there were patches emitting high amounts of CH4; however, these wet patches got drier and smaller in size towards the autumn, changing their dynamics to CH4 uptake. The mean values of the measured and modelled CH4 fluxes for the sample point locations were similar, indicating that the model was able to reproduce the results. For the whole site, upscaling predicted stronger CH4 uptake compared to simply averaging over the sample points. The results highlight the small-scale spatial variability of the boreal forest floor CH4 flux and the importance of soil chamber placement in order to obtain spatially representative CH4 flux results. To predict the CH4 fluxes over large areas more reliably, the locations of the sample points should be selected based on the spatial variability of the driving parameters, in addition to linking the measured fluxes with the parameters.

Abstract. Boreal forest soils are globally an important sink for methane (CH 4 ), while these soils are also capable of emitting CH 4 under favourable conditions. Soil wetness is a well-known driver of CH 4 flux, and the wetness can be estimated with several terrain indices developed for the purpose. The aim of this study was to quantify the spatial variability of the forest floor CH 4 flux with a topographybased upscaling method connecting the flux with its driving factors. We conducted spatially extensive forest floor CH 4 flux and soil moisture measurements, complemented by ground vegetation classification, in a boreal pine forest. We then modelled the soil moisture with a random forest model using digital-elevation-model-derived topographic indices, based on which we upscaled the forest floor CH 4 flux. The modelling was performed for two seasons: May-July and August-October. Additionally, we evaluated the number of flux measurement points needed to get an accurate estimate of the flux at the whole study site merely by averaging. Our results demonstrate high spatial heterogeneity in the forest floor CH 4 flux resulting from the soil moisture variability as well as from the related ground vegetation. The mean measured CH 4 flux at the sample points was −5.07 µmol m −2 h −1 in May-July and −8.67 µmol m −2 h −1 in August-October, while the modelled flux for the whole area was −7.42 and −9.91 µmol m −2 h −1 for the two seasons, respectively. The spatial variability in the soil moisture and consequently in the CH 4 flux was higher in the early summer (modelled range from −12.3 to 6.19 µmol m −2 h −1 ) compared to the autumn period (range from −14.6 to −2.12 µmol m −2 h −1 ), and overall the CH 4 uptake rate was higher in autumn compared to early summer. In the early summer there were patches emitting high amounts of CH 4 ; however, these wet patches got drier and smaller in size towards the autumn, changing their dynamics to CH 4 uptake. The mean values of the measured and modelled CH 4 fluxes for the sample point locations were similar, indicating that the model was able to reproduce the results. For the whole site, upscaling predicted stronger CH 4 uptake compared to simply averaging over the sample points. The results highlight the small-scale spatial variability of the boreal forest floor CH 4 flux and the importance of soil chamber placement in order to obtain spatially representative CH 4 flux results. To predict the CH 4 fluxes over large areas more reliably, the locations of the sample points should be selected based on the spatial variability of the driving parameters, in addition to linking the measured fluxes with the parameters.
E. Vainio et al.: Spatial variability in forest floor methane flux the largest natural CH 4 sink, boreal upland forests are considered to be a globally important terrestrial sink due to soil CH 4 oxidation by methanotrophs (Kirschke et al., 2013;Saunois et al., 2016). The sink role of upland forests is well in agreement with the current paradigm, where methanotrophy only occurs in oxic conditions, while methanogenesis requires anoxic conditions. However, CH 4 -producing methanogens are found to be universal also in well-drained upland soils (Angel et al., 2012), which is linked to the findings that methanogenesis can occur in anaerobic microenvironments within oxic soils (Angel et al., 2011;Angel et al., 2017).
As the availability of oxygen is the main controller for CH 4 dynamics, soil moisture and water table level are among the most important factors regulating CH 4 formation as well as consumption in soils. When soils become inundated with water, the environment often turns anoxic, thus creating favourable conditions for methanogenesis -however, there are likely notable time lags between the start of inundation and methanogenesis, complicating the analyses of dependencies between these processes. Consequently, upland boreal forest soils Matson et al., 2009;Savage et al., 1997), and even the whole forest ecosystems (Shoemaker et al., 2014), can shift from CH 4 consumption to CH 4 emission, or vice versa, following the soil water conditions. Besides soil moisture, temperature is known to be an important factor in regulating CH 4 fluxes, by controlling several microbial reactions, including methanogenesis and methanotrophy (Luo et al., 2013;Praeg et al., 2017;Yvon-Durocher et al., 2014). Similarly to microbial production of CH 4 , non-microbial CH 4 production in soil (Jugold et al., 2012;B. Wang et al., 2013) has also been linked to soil water conditions and temperature: the alternation of soil drying and re-wetting (Jugold et al., 2012), as well as high temperature (Jugold et al., 2012;B. Wang et al., 2013), enhances the non-microbial CH 4 emissions. Spahni et al. (2011) estimated global CH 4 emissions from occasionally wet mineral soils to be 58-93 Tg CH 4 yr −1 , accounting for 11 %-18 % of the global emissions (depending on the scenario). Annual CH 4 flux of upland sites is evaluated to range from −23 to 73 g CH 4 m −2 yr −1 (Treat et al., 2018). Nevertheless, aerated soils are generally considered to consume CH 4 , while CH 4 production via methanogenesis in occasionally wet mineral soils is neglected from most of the global models (Curry, 2007;Saunois et al., 2016). Furthermore, the division of ecosystems into upland and wetland sites is to some extent imprecise and thus subject to continuous discussion, as the intrinsic definition of "upland" may vary from study to study. Usually the concept of "upland" is relative to the surrounding topography, and there is no uniform limit for e.g. minimum elevation.
The upland forest CH 4 emission estimates are partly based on observations above forest canopies (Flanagan et al., 2020;Mikkelsen et al., 2011;Shoemaker et al., 2014). They further raise the question whether these CH 4 emissions originate only from the forest floor or whether trees, which have also been reported to emit CH 4 (e.g. Gauci et al., 2010;Machacova et al., 2016), contribute to the ecosystem-level flux. As the sources and sinks within the ecosystems are not adequately known or accounted for, forest floor CH 4 fluxes require revisit and thorough estimation in all the climatic zones, especially in the boreal zone, where the climate warming is pronounced, and at both "upland" and "lowland" sites with an emphasized focus on the local topography.
In order to precisely estimate the forest floor CH 4 flux variation, determining the variability of the driving parameters, i.e. particularly soil moisture, is needed. Airborne lidar (light detection and ranging) is an active remote-sensing method that can be used to observe the vegetation and terrain (Jaboyedoff et al., 2012) and that is very effective in forests (Korpela et al., 2009). Soil moisture is highly dependent on the terrain topography, like elevation and slope, and there are several digital elevation model (DEM)-derived digital terrain indices developed for estimating soil wetness (Ågren et al., 2014). When combining lidar-based measurements with the variables of interest measured on-site, it is possible to create landscape-scale maps of the studied variable, such as forest floor-soil CH 4 exchange (Kaiser et al., 2018;Sundqvist et al., 2015;Warner et al., 2019) or soil moisture (Kemppinen et al., 2018).
In this study, we used a relatively high number of measurement points (60 points in an area of ca. 10 ha) in order to fully cover the small-scale spatial variability in the CH 4 flux and its driving forces. Similar types of studies using chamber measurements are rarely based on more than 20 measurement points yet assuming that they are representative of a larger area. The aim of this study was (1) to quantify the spatial variation in the forest floor CH 4 exchange, (2) to quantitatively link small-scale spatial variability in the upland forest floor CH 4 exchange to the topography, soil moisture, and vegetation structure, and (3) to detect the potential CH 4 -emitting patches (hotspots). We combined the CH 4 flux data with the driving parameters to produce an upscaled ecosystem-scale forest floor CH 4 flux of the area. Only a few studies have applied a similar approach (Kaiser et al., 2018;Warner et al., 2019), among them Kaiser et al. (2018) in boreal coniferous forest, emphasizing the novelty of this research.

Site description and experimental design
In order to quantify the spatial variability, we conducted forest floor CH 4 flux and soil moisture measurements at 60 sample points covering an area of ca. 10 ha around the SMEAR II station (Station for Measuring Ecosystem-Atmosphere Relations) in Hyytiälä, southern Finland (61 • 51 N, 24 • 17 E; 160-180 m a.s.l.). The station is a combined ecosystem and atmospheric site in the ICOS (Integrated Carbon Observa-tion System) network. The measurements were performed during two growing seasons (2013 and 2014). The site was regenerated in 1962 by prescribed burning and sowing of Pinus sylvestris (Scots pine) (Hari and Kulmala, 2005). The mineral soils in the area are mostly podzols, while there are also some small peaty depressions and some areas with almost no topsoil on the bedrock (Ilvesniemi et al., 2009). The soil at the site is rather shallow (5-150 cm) on top of the bedrock (Hari and Kulmala, 2005). Annual mean temperature and precipitation in 1981-2010 were 3.5 • C and 711 mm, respectively (Pirinen et al., 2012).
To represent the heterogeneity in vegetation and soil moisture, we located 6 sample points in the highest area on top of the hill and 54 at all the wind directions from the hilltop (Fig. 1). The sample points were identified based on the cardinal and intermediate directions from the centre of the studied area (the main mast of SMEAR II), thus having eight sectors (north-north-east sector N-NE, north-east-east NE-E, etc.), accompanied by an Arabic numeral (1-9) depending on the distance from the centre of the study area (e.g. sample point SE-S-1 being the closest to the centre at the sector SE-S). The hilltop sample points are located in the sectors N-NE, NE-E, and E-SE, but they are labelled with the letter H instead of the directions.

Flux measurements
The flux measurements were conducted with non-steadystate non-flow-through static chambers (Livingston and Hutchinson, 1995) at each sample point, principally following the guidelines compiled in the ICOS protocol (Pavelka et al., 2018). The majority of the measurements were conducted with opaque aluminium or stainless steel chambers. The hilltop chambers were on average 0.027 m 3 , including the collar (depending on the collar height and the vegetation inside the chamber), covering a forest floor area of 0.40 × 0.29 m. The rest of the chambers were on average 0.102 m 3 , covering an area of 0.55 m × 0.55 m. Part of the measurements were conducted with transparent chambers made of FEP (fluorinated ethylene propylene) foil and PTFE (polytetrafluoroethylene) tape in order to test the effect of photosynthetically active radiation (PAR) on the CH 4 flux. The transparent Figure 1. Locations of the sample points (diamonds) at the study site. The hilltop sample points are coloured light green, and the rest are pink. The cartographic depth-to-water (DTW) index is showed on top of the aerial image, lighter colour indicating a higher DTW, i.e. drier soil. chamber measurements were always performed 15-155 min before the opaque chamber at the same sample points. As there was no significant difference in the flux between the chamber types, the data were merged (transparent chambers were used in 14 % of the measurements in the final data). All the chambers were equipped with a fan to ensure mixing of the chamber headspace air and a vent tube to minimize pressure disturbances inside the chamber. The collars were installed in May 2013 1 week before the beginning of the measurements (except for the hilltop chambers, which were installed already in 2002 - Pihlatie et al., 2007) at a depth of ca. 5 cm to avoid cutting of tree roots and to minimize the sideways diffusion in the soil affecting the flux (Hutchinson and Livingston, 2001). Fine quartz sand was added to the edges of the collars to ensure the installation.
The chambers were closed for 35-45 min, and five samples were taken during each closure. Small parts of the closures (10 % of the final data) were 75 min with seven samples, due to a separate study on N 2 O fluxes. The samples were taken with 65 mL syringes (BD Plastipak™, Becton, Dickinson and Company, New Jersey, USA), and samples of 20 mL were inserted into glass vials (12 mL, Labco Exetainer ® , Labco Limited, Wales, UK) after flushing the vial with the sample. The samples were stored in the dark at +5 • C before analyses with a gas chromatograph (GC) (7890A, Agilent Technologies, California, USA) equipped with a flame ionization detector (for details, see Pihlatie et al., 2013). The chamber headspace air temperature was also recorded (DT-612, CEM Instruments, Shenzhen Everbest Machinery In- Measurements from the hilltop sample points were conducted every 2-9 weeks between 21 March and 20 December 2013 and every 2-7 weeks between 10 April and 27 November 2014. The other 54 sample points were measured on average every 3-4 weeks between 29 May and 13 September 2013 and between 20 May and 10 December 2014. In practice, all the hilltop sample points were always measured during one day, while the rest of the sample points (or as many as possible) were measured during a 5 d period. In May-October, each sample point was measured on average every 22 d. Some of the sample points were measured significantly more often than others, each being measured 7-23 times during the 2-year campaign with a median of 13 measurements per sample point. The most active measurement period was June-August for both years.

Flux calculation
The procedure in flux calculations included (1) filtering outliers from raw concentration data, (2) calculating fluxes using linear and non-linear functions and estimating goodness-offit (GOF) parameters for the fluxes, (3) flagging the fluxes based on the method quantification limit (MQL; Corley, 2003), (4) applying GOF criteria to flux data, and (5) creating final flux data.
We removed the outliers from the CH 4 mixing ratio data by using a robust regression analysis that uses iteratively reweighted least squares with a bisquare weighting function (Holland and Welsch, 1977) and by setting a weight limit to 0.87 and discarding all points below this limit as outliers. The fluxes were calculated from the outlier-filtered raw data using both linear and exponential fits (for the calculation, see Pihlatie et al., 2013). The exponential fit parameters were based on 17th-order Taylor power series expansion (Kutzbach et al., 2007).
Firstly, decreasing CO 2 in an opaque chamber or CO 2 flux below the MQL indicate a possible problem with the measurement, e.g. leaking chamber, and thus these measurements were omitted. For the CH 4 fluxes that were above the MQL, the following GOF criteria must be met for the flux to be included in the final flux data: normalized root mean square error (NRMSE) below 0.2 and the coefficient of determination (R 2 ) above 0.7. The fluxes below the MQL were accepted in the final data as such, without applying the NRMSE and R 2 criteria, as neither of these GOF parameters works for close-to-zero fluxes. Furthermore, if the CH 4 mixing ratio was >10 ppm in the beginning of the closure, the flux was omitted. Finally, there was one exceptionally large CH 4 emission which was omitted from the final data set.
The MQL of the GC was 0.10 ppm for CH 4 and 151 ppm for CO 2 (calculated according to Corley, 2003). The CH 4 fluxes below the MQL were between −3.74 and +2.38 µmol m −2 h −1 for the larger chambers and between −0.146 and +0.244 µmol m −2 h −1 for the smaller chambers, calculated by the linear fit. While in general linear fit tends to underestimate the chamber fluxes (Pihlatie et al., 2013), regarding small fluxes, exponential fitting is more prone to errors and over-parameterization, and the relationship between linear and exponential flux values is more variable (Korkiakoski et al., 2017;Pedersen et al., 2010). Thus it is recommended to select between linear or non-linear fitting depending on the concentration data (Korkiakoski et al., 2017;Pedersen et al., 2010). We used linear fits for all the fluxes that were below the MQL.
After filtering the data, the final flux data included in total 723 measurements, of which 344 are from year 2013 and 379 from year 2014. There were 5-21 measurements from each sample point, with a median of 11 measurements per point. In the final data set, 467 fluxes were calculated with exponential fit and 256 with linear fit, of which 184 were below the MQL.

Environmental variables
We measured soil moisture (volumetric water content) in the A horizon (ca. 5 cm depth) manually at the sample points (except for the hilltop area) simultaneously with the flux measurements (ThetaProbe ML2x with HH2 Moisture Meter, Delta-T Devices Ltd, Cambridge, UK). The soil moisture was calculated as an average of three recordings at a sample point. In the hilltop area, the soil moisture was measured continuously with a Time Domain Reflectometer (TDR-100, Campbell Scientific Inc., Utah, USA).
Soil temperature in the A horizon was logged next to each sample point (except for the hilltop) eight times a day from June to October in 2013 and from April/May to October in 2014 with Thermochron iButton devices (Maxim Integrated Products, California, USA). In the hilltop area, the A horizon soil temperature was recorded automatically at five locations by silicon temperature sensors (KTY81-110, Philips, Netherlands) at 10 min intervals. In the analysis, we used daily average soil temperatures of the flux measurement days at each sample point. For May-June in 2013, when the iButtons were not yet installed, we used the hilltop soil temperature data, as the temperature was rather consistent at all the sample points (average temperature on 12-30 June 2013 at the sample points ranged from 10.8 to 13.4 • C).
In addition to the continuous recordings of soil temperature and moisture, we used air temperature at 4.2 m height (Pt100 sensors with radiation shields by Metallityöpaja Toivo Pohja) and precipitation (Vaisala FD12P weather sensor at 18 m height) measured at the SMEAR II station.

Ground vegetation of the sample points
The composition of ground vegetation in 54 sample points (all except the hilltop points) was described by estimating projection cover of each plant species with the help of a frame divided into 0.1 × 0.1 m sectors. Cover less than 5 % was marked as 3 %. To group the sample points based on their plant composition, we performed a two-way indicator species analysis (TWINSPAN), a divisive clustering method using TWINSPAN for Windows version 2.3 (Hill and Šmilauer, 2005). We used moss species as indicators because their distribution is generally more strongly related to soil moisture than the distribution of clonal vascular plants typical of boreal forests (e.g. Hokkanen, 2006). For a robust result we excluded species whose frequency was less than 3. Based on the plant species composition in 54 sample points, we created four vegetation groups. To visualize the variation within and between the groups, we performed canonical correspondence analysis (CCA) where vegetation groups from TWINSPAN were used as environmental variables. Before CCA, detrended correspondence analysis (DCA) was conducted to decide between linear and unimodal methods. Canoco 5.11 (Šmilauer and Lepš, 2014) was applied for both analyses.

Statistical analyses
Multiple-group comparisons were performed with one-way ANOVA and two-group comparisons with a t test, when Levene's test indicated equal variances, and distribution was normal or sample size large enough. When groups had unequal variances, we used Welch's ANOVA (analysis of variance) together with a Games-Howell test as a post hoc test. When comparing two groups with unequal variances, we used Welch's t test or Satterthwaite's approximation. When groups had equal variances but distribution was non-normal or sample size was very small, we used Kruskal-Wallis, followed by Bonferroni correction for pairwise comparisons.
Spearman's correlation coefficients were calculated to study the relationships between the CH 4 fluxes and the environmental/topographical parameters at the camber locations. Spearman's correlation was also performed between the CH 4 flux, soil moisture, and soil temperature time-series data, as the correlations were not linear. Soil temperature can increase both CH 4 emissions and uptake, by increasing the activity of the soil microbes, and thus we used absolute flux values when examining the effect of the temperature.
Welch's t test, Welch's ANOVA, and Kruskal-Wallis, accompanied by the post hoc tests, were performed with SPSS (IBM SPSS Statistics 24, New York, USA). Regular one-way ANOVA, Levene's tests, and correlation analyses were performed with MATLAB (R2017b/R2018b, MathWorks, Natick, Massachusetts, USA). The statistical analyses were assessed at a significance level of p<0.05.

Spatial drivers of the CH 4 flux
In order to find the spatial parameters connected to the CH 4 flux, we obtained the following spatial data sets for the study site: digital elevation model (DEM) (elevation model derived from airborne lidar scanning, National Land Survey of Finland, 2/2019), biomass of foliage (for pine, spruce, and broadleaf trees) and tree volumes (for pine, spruce, and birch) (16 × 16 m; Multi-source National Forest Inventory, Natural Resources Institute Finland, 2015; Mäkisara et al., 2019), subsoil types (basal deposit at a depth of 1 m) (16 × 116 m; superficial deposits 1 : 20 000/1 : 50 000, Geological Survey of Finland, 2015), and peated soil areas (16 × 16 m; topographic database, National Land Survey of Finland, 8/2018). In addition, we calculated the following topographic indices from the DEM: topographic wetness index (TWI; Beven and Kirkby, 1979), terrain ruggedness index (TRI; Riley et al., 1999), slope, and cartographic depth-to-water index (DTW; Murphy et al., 2007) (Appendix A, Figs. A1-A4). Before calculating the indices, the DEM was pre-processed to be hydrologically correct by using the TopoToolbox "carve" option (Schwanghart and Kuhn, 2010). TWI was calculated as a natural logarithm of the ratio between the specific catchment area (contributing area per unit contour length) and tangent of the local slope. The upslope catchment area was calculated using multiple flow direction algorithms (Freeman, 1991;Schwanghart and Kuhn, 2010), and local slope was calculated using adjacent points in the DEM. The calculations were made with TopoToolbox. Following the recommendations by Ågren et al. (2014), TWI was calculated from the coarse-resolution (16 m) DEM and scaled back to a finer 5 m grid with bilinear interpolation, since Ågren et al. (2014) found that TWI calculated from the coarse grid represented the soil moisture better than when calculated from a finer grid. The other indices were calculated from the DEM with 5 m resolution. TRI was calculated using the gdaldem program, which is part of the Geospatial Data Abstraction Library (GDAL). TRI describes the amount of elevation difference between adjacent cells in a DEM grid and hence presumably has a low value on flat hilltops and depressions. The flow channel networks in the study domain used for the DTW calculations were estimated from the DEM using a 1 ha flow initiation threshold (Ågren et al., 2014), and then the DTW values were calculated using the r.cost function of GRASS GIS, where the terrain slope raster map was used as a cost layer (e.g. Murphy et al., 2007). DTW can be considered to describe the elevation difference to the nearest open water location derived from the DEM.

Modelling the soil moisture and the CH 4 flux to the site
We used a random forest (RF) algorithm (Breiman, 2001) to upscale the soil moisture and CH 4 flux to the whole area for two time periods: May-July and August-October. The primary purpose of this upscaling was to get an accurate estimate of landscape-level forest floor CH 4 fluxes in a way that reflects the soil heterogeneity. We opted to do two static predictions for two separate periods instead of trying to capture the temporal variability, as we did not have enough temporal data from each sample point, and modelling temporal vari-ability of soil moisture has been shown to be difficult even with larger data sets than the one used here (Kemppinen et al., 2018). All 60 sample points, for which data were temporally averaged for the two separate periods (on average seven and five measurements for each point during May-July and August-October, respectively), were used in RF model development. RF is a machine-learning algorithm that can be used to generalize complex dependencies between driving variables and a target variable. Here, our RF models consisted of a large ensemble of regression trees, which were trained each with a separate random subsample of available data. The output from the RF model is an average of output from all the trees separately -and hence the algorithm applies the bootstrap aggregation (bagging) method, which decreases the noise of the prediction. The model for soil moisture was developed using four drivers (TWI, slope, DTW, and TRI) and for CH 4 flux using five drivers (soil moisture, TWI, slope, DTW, and TRI), selected based on the correlations (Appendix A, Table A1, Figs. A5-A6). MATLAB (R2018b, MathWorks, Natick, Massachusetts, USA) function TreeBagger was used for developing the RF models in this study. Each trained forest consisted of 300 regression trees, and a minimum of two samples were allowed in a split node. Values for these hyperparameters were selected based on initial testing using out-of-bag errors and the minimum number of observations per tree leaf was set to two, due to the limited amount of data available. However, the number of variables randomly sampled as candidates at each split was not changed from its default value (one-third of the total number of variables). The predictive performance of the developed RF models was evaluated using distance-blocked leave-one-out crossvalidation. In this method, one RF model is developed for each sample point location, and the training data consist of data measured further than the selected distance (here 30 m) from the sample point location in question, while the rest of the data (i.e. data originating from closer than 30 m) are utilized as independent validation data. This way the possible spatial autocorrelations in the data did not inflate the cross-validation metrics (Appendix A, Figs. A7-A9). Blocked cross-validation has been proposed as an appropriate cross-validation strategy for data showing e.g. spatial autocorrelations (Roberts et al., 2017), and it has been used in some prior flux upscaling exercises (Peltola et al., 2019). Statistical metrics used in evaluating model predictive performance included mean bias, fraction of variance explained by the model (R 2 ), Nash-Sutcliffe model efficiency (NSE), and root mean squared error (RMSE). The uncertainty of the upscaled soil moisture and CH 4 flux, however, was estimated by developing 100 RF models with a random subset (70 %) of available data, and the variability (standard deviation) over this ensemble was used as an uncertainty estimate. The uncertainty estimate describes the robustness of the soil moisture and CH 4 flux dependence on the drivers identified by the RF model but likely also the effect of the distribution of the sampling points. This approach is similar to the one used in e.g. Peltola et al. (2019) and Aalto et al. (2018).
After modelling the soil moisture to the whole study domain, the forest floor CH 4 flux was modelled as well using the produced soil moisture raster map along with the maps of topographic metrics used to develop the RF model for CH 4 flux.

Evaluating the representability of chamber measurements
The representability of CH 4 flux chamber measurements was evaluated by comparing the average of CH 4 fluxes measured at n chamber locations against the mean CH 4 flux modelled for the whole study domain with the RF modelling approach (Sect. 2.8). The aim of this analysis was to evaluate how many chamber measurement locations are needed to get an accurate estimate of landscape-level flux by only averaging over the measured chamber data without any upscaling with e.g. RF algorithms. The mean RF-modelled CH 4 flux is used as a reference as it accounts for the soil heterogeneity. The CH 4 fluxes measured at n chamber locations were selected at random and averaged. This was repeated for a maximum of 500 times for each n and mean absolute bias between these estimates, and the reference was calculated as a metric for the representability of n chamber locations.

Ground vegetation at the sample points
The most common vascular species growing in the sample points were V. vitis-idaea (48 out of 54 studied sample points), V. myrtillus (45 points), Equisetum sylvaticum, and L. borealis, followed by M. bifolium and T. europaea. The most prevalent mosses were P. schreberi (34 points), Polytrichum commune (29 points), Sphagnum spp. (28 points), D. polysetum, and H. splendens. The four vegetation groups were named based on their dominant mosses. (1) The Sphagnum group included 15 sample points that had over 50 % Sphagnum coverage (except for one point) and no P. schreberi. Distinctive species were also E. sylvaticum, Carex digitata, and P. commune, which all are typical of peatland forest. (2) The Sphagnum-Pleurozium group is an intermediary group between the swampy and drier forest areas, with some Sphagnum but also P. schreberi in all 13 of its sample points. Sample points in the remaining two groups do not include any Sphagnum but species characteristic of upland forests. Sample points in (3) the Pleurozium group have typically more than 10 % P. schreberi, while the sample points in (4) the Hylocomium group have less P. schreberi and usually more than 80 % H. splendens, which is related to slightly higher fertility. D. polysetum was most common in the Pleurozium group, and some of the Pleurozium group had high L. borealis coverage. V. vitis-idaea and V. myrtillus were common in all the groups -their coverage was highest in the Pleurozium group and lowest in the Sphagnum group. Finally, the hilltop sample points were put to the Pleurozium group based on their vegetation. The Pleurozium group has thus in total 25 sample points, and the Hylocomium group has 7. The soil moisture as well as the soil temperature and precipitation followed a similar temporal pattern in both measurement years, although the spring was a bit wetter in 2013 (Fig. 2). This difference was largely due to thicker snow cover and later snowmelt in 2013 (mean snow cover in December-February 37 cm; snowmelt in late April) compared to 2014 (mean snow cover in December-February 7 cm; snowmelt in mid-March). The measurement years were rather similar regarding the weather conditions, which allowed us to combine the measured data from 2 years.
The continuous measurements of the SMEAR II station show that the soil temperature in the A horizon was between 0.3 and 15.4 • C in 2013 and between −1.8 and 16.8 • C in 2014 (January-December) (Fig. 2). There was a steep increase in the soil temperature at the turn of April and May in both years, peaking in July-August (Fig. 2). Between May and October the soil temperature was 1.9-15.4 • C in 2013 and 2.7-16.8 • C in 2014. There was no significant difference between the years. Soil temperature was not as spatially variable as soil moisture (no statistically significant differences between sample points). The average soil temperature of the measurement days at the sample points was between 9.8 (±4.1 SD) • C (H6) and 13.1 (±1.8 SD) • C (W-NW-4).
Emissions of CH 4 were measured in total 63 times from 17 different sample points, corresponding to 9 % of the measurements and 28 % of the points. Most of the CH 4 -emitting sample points belonged to the Sphagnum group (11 out of 17), and the highest CH 4 emissions were detected from six sample points in the Sphagnum group (sample points SW-W-2-3, E-SE-4-6, and S-SW-4). No emissions were observed from the Hylocomium-group sample points.
The highest CH 4 emission was detected on 5 June 2013 from SW-W-3 (Sphagnum group), which was located at a water-filled patch (the water table was above the peat surface for most of the time). The CH 4 flux from this sample point ranged between −33.4 and 1080 µmol m −2 h −1 , the mean being 107 µmol m −2 h −1 . The highest emission is excluded from all further statistical analyses as an outlier; nevertheless, there was no indication of fault in the measurement. Consequently, without the highest emission, the mean CH 4 flux from all the sample points at the site in 2013-2014 was −5.69 (±14.9 SD) µmol m −2 h −1 , and the maximum was 212 µmol m −2 h −1 (n = 722). There was a significant positive correlation between the measured soil moisture and the CH 4 fluxes (r s = 0.70, p<0.001; n = 722). If we explore the years separately, the relationship was slightly stronger in 2014 than in 2013 (r s = 0.73 and r s = 0.62, respectively, p<0.001; n = 722). There was also a statistically significant positive correlation between the mean CH 4 flux and the mean soil moisture at the sample points (r s = 0.78, p<0.001; n = 60) (Fig. 3). The absolute values of the CH 4 flux and the daily mean soil temperature at the sample points had a weak positive correlation (r s = 0.22, p<0.001; n = 722), and the correlation was similar in early summer and late summer.
There were significant differences between the vegetation groups in both the soil moisture and the CH 4 flux, in both seasons (Fig. 4). The mean soil moisture was decreasing from the Sphagnum group to the Pleurozium group, and the differences between the groups were statistically significant, except between the two driest groups (Fig. 4a, b). The mean CH 4 flux decreased from the Sphagnum group to the Hylocomium group, indicating mean CH 4 emission from the Sphagnum-group sample points in May-July (Fig. 4c) and increasing CH 4 uptake from the Sphagnum-Pleurozium group to the Hylocomium group (p<0.05) (Fig. 4c, d).
When considering the spatial variation of the CH 4 flux at the site, the measured CH 4 fluxes differed markedly between the locations (groups of two to six sample points) (Fig. 5). Two sample point groups had a mean flux indicating CH 4 emissions: SW-W-1-3 (9.45 ± 35.8 SD µmol m −2 h −1 ) and E-SE-4-6 (7.18 ± 32.9 µmol m −2 h −1 ) (Fig. 5). However, all the median fluxes of the sample point groups were negative. Emissions were measured from nine groups. The strongest mean CH 4 uptake was measured from group SE-S-4-6 (−14.26 ± 10.4 µmol m −2 h −1 ). The mean CH 4 flux of all six hilltop points was −8.63 ± 4.11 µmol m −2 h −1 (Fig. 5). Between May and July the hilltop mean flux

Modelled soil moisture and CH 4 flux at the site
The performance of the RF model in predicting soil moisture variability in the study domain is outlined with the statistical metrics shown in Table 1 (see also Appendix Fig. A8). These metrics were calculated against independent validation data and hence are suitable for assessing the predictive performance of the model. R 2 values were 0.51 and 0.26 for May-July and August-October, indicating that the model was able to describe the spatial variation in soil moisture more accurately during the early summer period. The RMSE and mean bias in the model prediction were similar during both periods.
The modelled soil moisture of the whole studied area ranged in May-July from 0.11 to 0.79 m 3 m −3 and in August-October from 0.12 to 0.65 m 3 m −3 . The mean soil moisture in May-July was 0.25 m 3 m −3 (±0.12 SD) and in August-October 0.23 m 3 m −3 (±0.09 SD) ( Table 2). The mean of the modelled values at the sample point locations (0.33 and 0.29 m 3 m −3 in May-July and August-October, respectively) were similar to the measured mean values (0.33 and 0.28 m 3 m −3 in May-July and August-October, respectively), while the modelled averages for the whole area were slightly lower for both seasons (Table 2). Based on the RFmodelled soil moisture, the site was mostly rather dry in 2013-2014, with some wetter areas where the soil moisture was above 0.5 m 3 m −3 (Fig. 6). The modelling results indicated that the wet areas were wetter and wider in May-July (Fig. 6a) than in August-October (Fig. 6b), which follows from the measured soil moisture (see Sect. 3.2). In May-July 5 % of the area was wet (soil moisture >0.5 m 3 m −3 ), whereas in August-October only ca. 1 % of the area could be considered wet. The RF model predicted high soil moisture for topographical depressions and flat areas, which were specified by low DTW, slope, and TRI, and high TWI values. The soil moisture was spatially more variable in early summer than in autumn. The relative uncertainty of the upscaled soil moisture was on average 12 % and 9.6 % during May-July and August-October, respectively (Fig. 7). The uncertainty increased with the predicted soil moisture, yet during May-July the wettest locations (soil moisture above 0.65 m 3 m −3 ) had on average smaller relative uncertainty than the locations with intermediate (soil moisture between 0.4 and 0.65 m 3 m −3 ) wetness (5.5 % and 9.7 %, respectively). This indicates that the RF model was able to better constrain the moisture variability at wet depressions and at very dry locations than in areas with intermediate wetness, using the drivers used to develop the model (see Sect. 2.7).
Based on cross-validation, the agreement between the upscaled and measured CH 4 fluxes at the different chamber locations was moderate (R 2 = 0.26 and R 2 = 0.39 for May-July and August-October, respectively) ( Table 3; see also Appendix Fig. A9). Note that the upscaling had difficulties especially in representing the tails of the CH 4 flux distribution (strong uptake/emission), which has an influence on the cross-validation metrics. The mean bias (upscaled-measured) was 0.10 and −0.10 µmol m −2 h −1 for the May-July and August-October periods, respectively. The modelled CH 4 flux of the whole studied area was between −12.3 and 6.17 µmol m −2 h −1 in May-July, with a mean flux of −7.42 µmol m −2 h −1 (±3.26 SD) ( Table  2). In August-October, the flux ranged from −14.6 to −2.12 µmol m −2 h −1 , with a mean of −9.91 µmol m −2 h −1 (±2.73 SD) ( Table 2). The modelled fluxes resulted in stronger CH 4 uptake than averaging the flux measurements of 60 sample points ( Table 2). The upscaling demonstrated some CH 4 -emitting patches in the early summer (Fig. 8a), which shifted to CH 4 uptake in the autumn (Fig. 8b). The emission patches covered approximately 3 % of the study area, and the flux of the emitting areas was 0.029-6.19 µmol m −2 h −1 in May-July. Omitting the emis-  sion patches from calculation would result in ca. 4 % stronger mean CH 4 uptake in May-July. The soil moisture of the emitting cells was between 0.39 and 0.79 m 3 m −3 in May-July, with a mean of 0.60 m 3 m −3 . In autumn the emission patches were drier, the soil moisture of these areas being 0.38-0.65 m 3 m −3 with a mean of 0.48 m 3 m −3 (±0.065 SD), and the CH 4 flux was between −5.31 and −2.12 µmol m −2 h −1 with a mean of −3.67 µmol m −2 h −1 (±0.615 SD). The relative uncertainties of the upscaled forest floor CH 4 fluxes were larger at the wet depressions than in dry areas (Fig. 9); however, the absolute uncertainties were lower. The upscaled CH 4 flux uncertainty showed a similar spatial pattern during both periods. The lower absolute uncertainty at wet depressions was due to lower variability of the measured CH 4 fluxes between measurement locations at wet spots. Note that these uncertainty maps represent only the uncertainty stemming from the upscaling procedure with the RF model. Crossvalidation metrics presented above are better for evaluating the overall uncertainty, which includes e.g. uncertainties related to possibly biased sampling locations.
In order to evaluate the number of sample points needed to produce an accurate estimate of the landscape-level forest floor CH 4 flux, we compared mean CH 4 flux from 1 to 60 randomly picked sample points with the mean of the upscaled flux (Sect. 2.9). Based on this comparison, we state that with approximately 15-20 randomly selected sample points similar accuracy was achieved to averaging over all the sample points (Fig. 10). This finding held irrespective of the study period investigated.

Drivers and spatial variation of the CH 4 flux in the two seasons
We performed spatially extensive CH 4 flux measurements from the forest floor, covering different soil moisture conditions and vegetation types in the area, and combined the measured data with remote-sensing tools in order to unravel the spatial variation in the forest floor CH 4 flux within the terrain. We accomplished this by generating random forest (RF) models to map the soil moisture and the CH 4 flux at the boreal forest site. Our results demonstrate that even though the forest floor is mainly a sink of CH 4 , as expected for the mainly dry upland pine forest, the CH 4 flux at the site is highly heterogeneous. In our study, the soil moisture was the most important driving force in the spatial variability of the CH 4 fluxes, whereas soil moisture is highly driven by topography. In previous studies, soil moisture and TWI have also been identified as the main factors affecting the soil CH 4 fluxes from a spatial perspective (Kaiser et al., 2018;Warner et al., 2019). Furthermore, soil moisture and vegetation are strongly interconnected: while topography-driven soil moisture controls vegetation (Moeslund et al., 2013), vegetation can also affect soil moisture via e.g. evapotranspiration (Dunn and Mackay, 1995) or rooting strategies (Milly, 1997). As the soil moisture is affected by topography, vegetation, and soil properties, it can have high spatial variability (Rosenbaum et al., 2012). Based on our results, there was high variation in the CH 4 fluxes even within the sample point groups, which may be explained by different vegetation groups among adjacent sample points.
Our vegetation classification was mostly based on mosses, and the connection to soil moisture was expected. Consequently, the groups differed in their soil moisture, and, furthermore, in their CH 4 flux. However, while the soil moisture did not differ between the Pleurozium group and the Hylocomium group, there was a significant difference in the CH 4 fluxes in autumn, suggesting that the ground vegetation as such may affect the CH 4 flux. Some plant species of forest ground vegetation have been suggested to contribute to CH 4 exchange, and the effect may vary between species (Halmeenmäki et al., 2017;Maier et al., 2017;Praeg et al., 2017), yet the studies focusing on forest ground vegetation are still rare. Also, different tree species can affect the soil conditions and thus the forest floor CH 4 flux, for example through soil chemical properties or nutrient conditions (Reay et al., 2005). While soil moisture also undergoes short-term temporal changes resulting from weather conditions, ground vegetation is an important indicator of long-term soil conditions and their spatial variability. Thus it can be used as a rough in situ estimate of the CH 4 flux when planning the locations of the measurement plots, as demonstrated by this study. The Pleurozium group was the most prevalent vegetation type within our sample points, representing typical ground vegetation of boreal pine forests. The mean CH 4 flux of the sample points in the Pleurozium group was close to the modelled seasonal mean fluxes.
Even though the difference in mean soil moisture between the studied seasons was not large, the wet areas were wetter and wider in May-July than in August-October, resulting in CH 4 emissions in May-July, while in the dry areas the CH 4 uptake increased substantially between the seasons. This suggests that either the activity of CH 4 -oxidizing bacteria seems to increase towards autumn in the dry areas, which could be linked to soil temperature being at the highest level in August, or that the activity of methanogens in the deeper soil (or microsites) decreases due to drying. Previously reported results indicate that there is a local optimal soil moisture for CH 4 oxidation, and in boreal forest soils the oxidation of CH 4 decreases when the soil moisture increases   (Billings et al., 2000), whereas low soil water content as such does not remarkably decrease CH 4 uptake of boreal forests' mineral soils (Saari et al., 1998). The oxidation of CH 4 has been discovered to be at the lowest level in late spring and early summer and most effective in autumn by both the highaffinity and low-affinity methanotrophs (Reay et al., 2005). Similarly to our results, upscaling of CH 4 fluxes across hillslope transects in a temperate deciduous forest by Warner et al. (2019) demonstrated CH 4 emissions from low-elevation areas in early summer and uptake in late summer -moreover, the magnitudes of the upscaled fluxes were similar to our upscaling results. In contrast, Aaltonen et al. (2011) found the CH 4 uptake to be stronger in May-June compared to late summer and autumn at the hilltop of the SMEAR II site (in 2008) (Aaltonen et al., 2011).
Our results at the hilltop showed higher CH 4 uptake (mean −8.01 µmol m −2 h −1 in May-July and −10.3 µmol m −2 h −1 in August-October) compared to the previous forest floor CH 4 flux measurements from the same hilltop area, reporting a mean CH 4 flux of −7.0 µmol m −2 h −1 (between Au-gust 2006 and June 2007) (Skiba et al., 2009) and −4.6 µmol m −2 h −1 (between April and November in 2008) (Aaltonen et al., 2011). This may be explained by stronger CH 4 uptake due to drier years. Flux data processing techniques may also cause discrepancies between this and prior studies, as the widely used linear flux calculation method has been demonstrated to underestimate the CH 4 fluxes by on average 33 % in a chamber intercomparison study (Pihlatie et al., 2013). Skiba et al. (2009) and Aaltonen et al. (2011) used the linear flux calculation method, whereas here we mainly used a non-linear method. From a wider perspective, the mean CH 4 fluxes obtained from both the measurements (mean of all the sample points: May-July −5.07, August-October −8.67 µmol m −2 h −1 ) and the modelling (May-July −7.74, August-October −10.0 µmol m −2 h −1 ) in our study were in line with previously reported forest floor CH 4 fluxes from boreal and temperate coniferous forests (−0.62 to −15 µmol CH 4 m −2 h −1 , Jang et al., 2006).

Hotspots of CH 4 emissions
At the site, there was one anomalous water-filled sample point (SW-W-3), from which the CH 4 emissions were at the same level as emissions reported from a nearby fen site (Li et al., 2016;Rinne et al., 2007Rinne et al., , 2018. Even though the location of SW-W-3 had the highest water table level, some of the other sample points also had a water table at or close to the soil surface. However, we did not measure as high CH 4 emissions from any of the other sample points -excluding SW-W-3, the highest emission was 1 order of magnitude smaller than the highest emission from SW-W-3. Out of the 10 highest emissions measured, 7 were from SW-W-3 and the rest from sample points E-SE-4-6 in another peaty area. These substantially higher emissions from SW-W-3 compared to other sample points with high soil water content and equally high Sphagnum coverage may be related to the joint effect of these two factors. The Sphagnum mosses thrive in wet conditions, where CH 4 is also produced, and most of the Sphagnum mosses growing in Finland are demonstrated to support methanotrophic bacteria (Larmola et al., 2010), which therefore naturally reduces the potential CH 4 emissions from Sphagnum-covered wet areas. However, it may be that while the Sphagnum-associated methanotrophs may reduce the CH 4 emissions from many of the sample points, they may not be active during the highest water level at SW-W-3.
Our results indicate that the observed spatial hotspots of CH 4 emissions seem to be prone to temporal variation, depending on the soil water status, affecting also the size of these patches. The measurement years being drier than the long-term average (annual precipitation 576 mm in 2013 and 572 mm in 2014; average 711 mm for 1981-2010) suggests that the emission patches can be larger in wetter years. The temporal variability in soil moisture is mainly driven by meteorological forcing and is suggested to be greater at locations with intermediate soil moisture than at the extremely dry or wet locations and in the topsoil compared to deeper soil layers (Rosenbaum et al., 2012). In our study, the mean soil moisture decreased between the early summer and autumn in the two wettest vegetation groups but stayed at the same level in the two driest groups, while the mean CH 4 flux showed increasing uptake in all the vegetation groups between the two seasons. This demonstrates that the temporal changes in soil moisture affect mainly the wet areas of the forest in our study, while the driest areas tend to remain dry, probably due to the well-drained and shallow topsoil on top of a bedrock.
Based on our results, most of the wet plots were located in the areas with sandy or gravelly till as subsoil, while the areas with bedrock close to the soil surface (max. 1 m soil) had lower soil moisture. However, the sample points E-SE-4-6 were located in a depression with a ca. 0.6 m deep peat layer, which was on top of a bedrock area according to the subsoil map. Praeg et al. (2017) also reported bedrock type affecting the CH 4 flux, probably through soil properties, rooting of plants, plant species, and microbial composition -however, for better understanding more research is needed.

Representativeness of sample point locations
In our study, the great advantage is the large number of sample points, resolving the small-scale spatial variability of the forest floor CH 4 flux. Some previous studies have upscaled CH 4 flux using similar types of approaches across complex terrains with measurements from slope transects (Kaiser et al., 2018;Warner et al., 2019). The site studied here represents a typical commercial pine forest of the boreal areas, and the results are thus scalable to a large area of similar types of forests in the boreal zone. The mean CH 4 flux obtained from all the sample points was rather close to but slightly higher than the upscaled CH 4 flux obtained from the model, indicating that the sample points covered the spatial variation of both the soil moisture and CH 4 flux in the area. However, while the mean values can be insufficient to tell a full story (i.e. it misses the spatial and temporal variability), comparison of means is important when targeting an accurate landscape-scale CH 4 budget. Based on our results, we state that 15-20 sample points are needed to reliably cover an area of comparable size of a typical boreal commercial forest, demanding however carefully designed placement in order to cover the spatial heterogeneity. Yet an area with higher topographical variation may require more sample points. Our conclusion of the recommended number of sample points is slightly lower compared to the ICOS measurement protocol, which recommends having at least 25 points at a site when using manual chambers (Pavelka et al., 2018). In addi-tion, our results suggest that, in August-October, only three randomly picked sample points would be as representative a sampling of the whole area as 15-20 sample points in May-July, which was probably due to smaller spatial variability in the CH 4 flux in autumn compared to early summer.
Usually, if no upscaling is implemented, the mean flux of the measurements is reported, neglecting the effect of the placement of the measurement points. In this study, the CH 4 flux measurements resulted in smaller mean uptake than the spatially modelled estimate for the whole area, even though 60 sample points were used, which emphasizes the importance of upscaling. Soil or forest floor CH 4 fluxes are most often studied using ca. 4-10 soil chambers per study site (Billings et al., 2000;Lohila et al., 2016;Savi et al., 2016;Skiba et al., 2009;Sundqvist et al., 2015), although a couple of studies so far applied 20 or more chambers (Dinsmore et al., 2017;Matson et al., 2009;J. M. Wang et al., 2013;Warner et al., 2018). Upscaling or mean flux is therefore often based on the assumption that the soil conditions are rather homogenous over the area and/or that the heterogeneity is well represented by a small number of chambers (Sundqvist et al., 2015). The locations of the sample points should be selected based on the spatial variability of the driving parameters. This could be done e.g. by evaluating different topographic or remote-sensing-derived indices in the study area during the planning phase of a scientific experiment, so that the measurements cover the full range of flux drivers based on a priori knowledge. Together with long time series, it is critically important to cover the spatial variability within different ecosystems and link the CH 4 fluxes to landscape parameters in order to achieve more accurate estimations of CH 4 and other greenhouse gas fluxes over large areas. The vastly developed and increasingly common elevation-mapping methods can be highly practical for upscaling the CH 4 fluxes of different areas. Furthermore, Ueyama et al. (2018) found that wet CH 4 emission patches were important at a larch plantation and could have a strong contribution to the canopy-scale fluxes -in our case we cannot fully conclude the impact at the ecosystem scale, as the above-canopy fluxes were not included.

Upscaling of the CH 4 flux
The predictive performance of the RF model developed for CH 4 flux upscaling was in the same range as or lower than in some prior studies (Kaiser et al., 2018;Warner et al., 2019) using topographic data to upscale CH 4 fluxes. It must be noted, however, that direct comparison of cross-validation results between studies is hampered by the different crossvalidation techniques used, since the method used for crossvalidation has an influence on the results (Roberts et al., 2017). Nevertheless, the cross-validation results indicate that a significant proportion of the CH 4 flux variability was not explained by the RF model, suggesting that important predictors were missing from the RF model development. These likely include variables linked to plant activity and/or soil organic carbon storage, since these are related to the amount of substrates available for the microbes. Even though we created the model based on the correlations with several potential drivers (Sect. 2.7), all the examined variables were not available at fine enough resolution to be accurate for the sample points (e.g. soil type) or were not directly available for the whole area (e.g. soil temperature, vegetation type), and thus we cannot fully conclude that these drivers would not affect the spatial variability of the CH 4 flux and improve the model performance. Remote-sensing-derived indices (e.g. NDVI) might be helpful, but these are not available either at spatial scales used in this study or separately for the forest floor. Soil moisture and CH 4 flux were not measured at the same time at all sampling locations, and hence despite using temporally averaged data there may have been some apparent spatial variability in the temporally averaged data due to unsynchronized sampling (e.g. some locations measured more during rainy days and others during sunny days). This apparent variability cannot be explained with the static topographic properties and hence could partly explain the significant proportion of the CH 4 flux and soil moisture variability unexplained with the RF models.
Variability in the emissions from wet mineral soils has been estimated to explain most of the total interannual variability in CH 4 emissions globally (Spahni et al., 2011). Kaiser et al. (2018) reported that when the soil moisture was above 0.43 m 3 m −3 the soil was a source of CH 4 , while soil moisture below 0.38 m 3 m −3 resulted in a CH 4 sink. In our study, the soil moisture limit for the CH 4 emissions was similar, at 0.39 m 3 m −3 , while simultaneously (in May-July) areas with soil moisture as high as 0.73 m 3 m −3 indicated CH 4 uptake. Thus, in the upscaling method we used, the cells with high soil moisture had both uptake and emission CH 4 flux values, which ultimately results from the measured data (Appendix A, Fig. A8).
It should be noted, however, that the modelled fluxes represent only the average CH 4 flux spatial pattern during the two seasons, and hence they do not capture the short-term temporal variability in the CH 4 fluxes caused by rapid variability of soil moisture inflicted e.g. by rain. Hence, the soil moisture may be occasionally wetter than the modelled moisture at some locations of the study domain, and therefore larger areas can emit CH 4 during (short) wet periods. For instance, Rosenbaum et al. (2012) showed with spatially extensive and continuous soil moisture measurements that intense precipitation events were significantly altering the soil moisture spatial variability in their study. Based on the continuous soil moisture data measured at SMEAR II, there was a peak in soil moisture in mid-April (Fig. 2) during snowmelt, when we only measured the flux at the hilltop. Thus, presumably the CH 4 emissions were highest in the beginning of May, when the soil temperature started to increase and the soil moisture was still high, and thus the spring emissions may have been even higher than observed. In order to capture accurately the temporal variability and to avoid underestimation of the highest values, soil moisture should be monitored continuously with high spatial (hilltops, depressions, slopes) and temporal frequency for future upscaling research. Furthermore, comprehensive annual measurements of CH 4 flux are also needed, as the non-growing season fluxes are noted to have an important contribution to annual CH 4 flux, especially at upland sites (Treat et al., 2018).

Conclusions
The CH 4 fluxes of the boreal forest floor are spatially highly heterogeneous, including potential emission hotspots. Soil moisture and vegetation type are important drivers of the spatial variability of the CH 4 flux, and the spatial variability of these drivers should be taken into account already in the experimental planning. Furthermore, to obtain spatially reliable estimates, the fluxes should be upscaled using appropriate geospatial tools. Spatially extensive measurements and highresolution modelling can help to further improve our understanding of the CH 4 dynamics of forests. Moreover, resolving the CH 4 flux over a large spatial scale with high temporal frequency would be of great importance in order to reveal the variation between years and seasons. Eventually this should lead to a more precise global CH 4 budget.        Figure A5. Partial dependences between the soil volumetric water content (VWC) and its drivers selected for the modelling (slope, topographic wetness index TWI, cartographic depth-to-water index DTW, and terrain ruggedness index TRI). Figure A6. Partial dependences between the forest floor CH 4 flux and the driving parameters selected for the model (slope, topographic wetness index TWI, cartographic depth-to-water index DTW, terrain ruggedness index TRI, and soil volumetric water content).   Data availability. The upscaled CH 4 flux and soil moisture data, their uncertainties, the spatial drivers used for the model, as well as the sample point means of the measurements are available at Zenodo (https://doi.org/10.5281/zenodo.4382801, Vainio et al., 2020).
Author contributions. MP had the original idea of the study. EV conducted the field measurements, analysed the flux data, and was the main author of the paper. EST conducted the TWINSPAN analysis. OP, AJK, VK, and EV planned the modelling, and OP conducted the modelling and wrote the modelling parts of the paper. All the authors discussed and commented on the paper and contributed to the writing.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. For research facilities we thank the Academy of Finland Centre of Excellence (programmes 1118615, 307331, and 272041) and ICOS-Finland (grant 281255). We thank the technical staff of Hyytiälä SMEAR II station for technical assistance as well as Kira Ryhti, Hanna El-khouri, Mari Mäki, and Lucas Toniolo Junior for fieldwork assistance.
Open-access funding was provided by the Helsinki University Library.
Review statement. This paper was edited by Martin De Kauwe and reviewed by two anonymous referees.