Topography-based modelling reveals high spatial variability and seasonal emission patches in forest floor methane flux

Boreal forest soils are globally an important sink for methane (CH4), while these soils are also capable to emit CH4 under favourable conditions. Soil wetness is a well-known driver of CH4 flux, and the wetness can be estimated with several terrain indices developed for the purpose. The aim of this study was to quantify the spatial 15 variability of the forest floor CH4 flux with a topography-based upscaling method connecting the flux with its driving factors. We conducted spatially extensive forest floor CH4 flux and soil moisture measurements, complemented with ground vegetation classification, in a boreal pine forest. We then modelled the soil moisture with a Random Forest model using topography, based on which we upscaled the forest floor CH4 flux – this was performed for two seasons: May–July and August–October. Our results demonstrate high spatial heterogeneity in the forest floor CH4 flux, 20 resulting from the soil moisture variability, as well as on the related ground vegetation. The spatial variability in the soil moisture and consequently in the CH4 flux was higher in the early summer compared to the autumn period, and overall the CH4 uptake rate was higher in autumn compared to early summer. In the early summer there were patches emitting high amounts of CH4, however, these wet patches got drier and smaller in size towards the autumn, which was enough for changing their dynamics to CH4 uptake. The results highlight the small-scale spatial variability of the 25 boreal forest floor CH4 flux, and the importance of soil chamber placement in order to obtain spatially representative CH4 flux results. We recommend that a site of similar size and topographical variation would require 15–20 sample points in order to achieve accurate forest floor CH4 flux.


Site description and experimental design
In order to quantify the spatial variability, we conducted forest floor CH4 flux and soil moisture measurements at 60 sample points covering an area of ca. 10 hectares around the SMEAR II station (Station for Measuring  Atmosphere Relations) in Hyytiälä, southern Finland (61° 51' N, 24° 17' E; 160-180 m a.s.l.). The measurements were performed during two growing seasons (2013 and 2014). The site has been regenerated in 1962 by prescribed burning and sowing Pinus sylvestris (Scots pine) (Hari and Kulmala, 2005). The mineral soils at the area are mostly podzols, while there are also some small peaty depressions, and some areas with almost no topsoil on the bedrock (Ilvesniemi et al., 2009). The soil at the site is rather shallow (5-150 cm) on top of the bedrock (Hari and Kulmala, 85 2005). Annual mean temperature and precipitation in 1981-2010 have been 3.5 °C and 711 mm, respectively (Pirinen et al., 2012).
To represent the heterogeneity in vegetation and soil moisture we located six sample points on the highest area on 95 top of the hill, and 54 at all the wind directions from the hilltop (Fig. 1). The sample points were identified based on the cardinal and intermediate directions from the centre of the studied area (the main mast of SMEAR II), thus having eight sectors (north-north-east sector N-NE, north-east-east NE-E, etc.), accompanied by an Arabic numeral (1-9) depending on the distance from the centre of the study area (e.g. sample point SE-S-1 being the closest to the centre at the sector SE-S). The hilltop sample points are located at the sectors N-NE, NE-E, and E-SE, but they are labelled 100 with the letter H instead of the directions. https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License.

Flux measurements
The flux measurements were conducted with non-steady-state non-flow-through static chambers (Livingston and Hutchinson, 1995). The majority of the measurements were conducted with opaque aluminium or stainless steel chambers. The hilltop chambers were on average 0.027 m 3 including the collar (depending on the collar height and the vegetation inside the chamber), covering a forest floor area of 0.40 × 0.29 m. The rest of the chambers were on 110 average 0.102 m 3 covering an area of 0.55 m x 0.55 m. Part of the measurements were conducted with transparent chambers made of FEP (fluorinated ethylene propylene) foil and PTFE (polytetrafluoroethylene) tape, in order to test the effect of photosynthetically active radiation (PAR) on the CH4 flux. The transparent chamber measurements were always performed 15-155 minutes before the opaque chamber at the same sample points. As there was no significant difference in the flux between the chamber types, the data were merged (transparent chambers were used in 14% of 115 the measurements in the final data). All the chambers were equipped with a fan to ensure mixing of the chamber headspace air, and a vent-tube to minimize pressure disturbances inside the chamber. The collars were installed in May 2013 one week before the beginning of the measurements (except for the hilltop chambers, which were installed already in 2002 ) at the depth of ca. 5 cm to avoid cutting of tree roots and to minimize the sideways diffusion in the soil affecting the flux (Hutchinson and Livingston, 2001). Fine quartz sand was added to 120 the edges of the collars to ensure the installation.
The chambers were closed for 35-45 minutes and 5 samples were taken during each closure. Small part of the closures (10% of the final data) were 75 min with 7 samples, due to separate study on N2O fluxes. The samples were taken with 65 ml syringes (BD Plastipak™, Becton, Dickinson and Company, New Jersey, USA), and samples of 20 ml were inserted into glass vials (12 ml, Labco Exetainer®, Labco Limited, Wales, UK) after flushing the vial with the 125 https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License.
sample. The samples were stored in dark at +5 °C before analyses with a gas chromatograph (GC) (7890A, Agilent Technologies, California, USA) equipped with a flame ionization detector (for details see Pihlatie et al. 2013). The chamber headspace air temperature was also recorded (DT-612, CEM Instruments, Shenzhen Everbest Machinery Industry Co. Ltd., Shenzhen, China) during the measurements for the flux calculation.
Measurements from the hilltop sample points were conducted every 2-9 weeks between 21 March and 20 December 130 2013, and every 2-7 weeks between 10 April and 27 November 2014. The other 54 sample points were measured on average every 3-4 weeks between 29 May-13 September 2013 and 20 May-10 December 2014. Some of the sample points were measured significantly more often than others, each being measured 7-23 times during the two-yearcampaign with a median of 13 measurements per sample point. The most active measurement period was June-August for both years. 135

Flux calculation
The procedure in flux calculations included: 1) filtering outliers from raw concentration data, 2) flux calculation using linear and non-linear functions, and estimating goodness-of-fit (GOF) parameters for the fluxes, 3) flagging the fluxes based on method quantification limit (MQL; Corley, 2003), 4) applying GOF criteria to flux data, and 5) creating final flux data. 140 We removed the outliers from the CH4 mixing ratio data by using a robust regression analysis that uses iteratively reweighted least squares with a bisquare weighting function (Holland and Welsch, 1977), by setting a weight limit to 0.87 and discarded all points below this limit as outliers. The fluxes were calculated from the outlier-filtered raw data using both linear and exponential fit (for the calculation see Pihlatie et al. 2013). The exponential fit parameters were based on 17 th order Taylor power series expansion (Kutzbach et al., 2007). 145 Firstly, decreasing CO2 in opaque chamber or CO2 flux below the MQL indicate a possible problem with the measurement, e.g. leaking chamber, and thus these measurements were omitted. For the CH4 fluxes that were above MQL the following GOF criteria must be met for the flux to be included in the final flux data: normalized root mean square error (NRMSE) below 0.2 and the coefficient of determination (R 2 ) above 0.7. The fluxes below MQL were accepted in the final data as such, without applying the NRMSE and R 2 criteria, as neither of these GOF parameters 150 work for close-to-zero fluxes. Furthermore, if the CH4 mixing ratio was > 10 ppm in the beginning of the closure the flux was omitted. Finally, there was one exceptionally large CH4 emission which was omitted from the final data set.
The MQL of the GC was 0.10 ppm for CH4 and 151 ppm for CO2 (calculated according to Corley 2003). The CH4 fluxes below MQL were between −3.74 and +2.38 µmol m −2 h −1 for the larger chambers and between −0.146 and +0.244 µmol m −2 h −1 for the smaller chambers, calculated by the linear fit. While in general linear fit tends to 155 underestimate the chamber fluxes (Pihlatie et al., 2013), regarding small fluxes exponential fitting is more prone to errors and over-parameterization, and the relationship between linear and exponential flux values is more variable (Korkiakoski et al., 2017;Pedersen et al., 2010). Thus it is recommended to select between linear or non-linear fitting depending on the concentration data (Korkiakoski et al., 2017;Pedersen et al., 2010). We used linear fit for all the fluxes that were below MQL. 160 After filtering the data, the final flux data included in total 723 measurements, of which 344 from year 2013 and 379 from year 2014. There were 5-21 measurements from each sample point, with a median of 11 measurements per point. In the final data set, 467 fluxes were calculated with exponential fit and 256 with linear fit, of which 184 were below the MQL. https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License.

Environmental variables 165
We measured soil moisture (volumetric water content) in A-horizon (ca. 5 cm depth) manually at the sample points (except for the hilltop area), simultaneously with the flux measurements (ThetaProbe ML2x with HH2 Moisture Meter, Delta-T Devices Ltd, Cambridge, UK). The soil moisture was calculated as an average of three recordings at a sample point. At the hilltop area, the soil moisture was measured continuously with a Time Domain Reflectometer (TDR-100, Campbell Scientific Inc., Utah, USA). 170 Soil temperature in A horizon was logged next to each sample point (except for the hilltop) eight times a day from June to October in 2013 and from April/May to October in 2014 with Thermochron iButton devices (Maxim Integrated Products, California, USA). At the hilltop area, the A horizon soil temperature was recorded automatically at five locations by silicon temperature sensors (KTY81-110, Philips, Netherlands) at 10-minute intervals. In the analysis, we used daily average soil temperatures of the flux measurement days at each sample point. For May-June 175 in 2013, when the iButtons were not yet installed, we used the hilltop soil temperature data, as the temperature was rather consistent at all the sample points.
In addition to the continuous recordings of soil temperature and moisture, we used air temperature at 4.2 m height (Pt100 sensors with radiation shields by Metallityöpaja Toivo Pohja), and precipitation (Vaisala FD12P weather sensor at 18 m height) measured at the SMEAR II station. 180

Ground vegetation of the sample points
The composition of ground vegetation in 54 sample points (all except the hilltop points) was described by estimating projection cover of each plant species with the help of a frame divided into 0.1 x 0.1 m sectors. Cover less than 5% was marked to be 3%. To group the sample points based on their plant composition we performed a Two-way indicator species analysis (TWINSPAN), a divisive clustering method using TWINSPAN for Windows version 2.3 185 (Hill and Šmilauer, 2005). We used moss species as indicators because their distribution is generally more strongly related to soil moisture than the distribution of clonal vascular plants typical to boreal forests (e.g. Hokkanen, 2006).
For a robust result we excluded species which frequency was less than three. Based on the plant species composition in 54 sample points we created four vegetation groups. To visualize the variation within and between the groups we performed Canonical Correspondence Analysis (CCA) where vegetation groups from TWINSPAN were used as 190 environmental variables. Before CCA Detrended Correspondence Analysis (DCA) was conducted to decide between linear and unimodal methods. Canoco 5.11 (Šmilauer and Lepš, 2014) was applied for both analyses.

Statistical analyses
Multiple-group comparisons were performed with one-way ANOVA, and two-group-comparisons with t-test, when the Levene's test indicated equal variances, and distribution was normal or sample size large enough. When groups 195 had unequal variances, we used Welch's ANOVA (Analysis of Variance) together with Games-Howell test as a post hoc test. When comparing two groups with unequal variances, we used Welch's t-test, or Satterthwaite's approximation. When groups had equal variances, but distribution non-normal, or sample size very small, we used Kruskal-Wallis, followed by Bonferroni correction for pairwise comparisons.
The Spearman's correlation coefficients were calculated to study the relationships between the CH4 fluxes and the 200 environmental / topographical parameters at the camber locations. The Spearman's correlation was also performed between the CH4 flux, soil moisture, and soil temperature time series data, as the correlations were not linear. Soil https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License. temperature can increase both CH4 emissions and uptake, via increasing the activity of the soil microbes, and thus we used absolute flux values when examining the effect of the temperature.
Welch's t-test, Welch's ANOVA and Kruskal-Wallis, accompanied by the post hoc tests, were performed with SPSS 205 (IBM SPSS Statistics 24, New York, USA). Regular one-way ANOVA, Levene's tests, and correlation analyses were performed with MATLAB (R2017b / R2018b, MathWorks, Natick, Massachusetts, USA). The statistical analyses were assessed at a significance level of p < 0.05.

Modelling the soil moisture and the CH4 flux to the study area
In order to find the spatial parameters connected to the CH4 flux, we obtained the following spatial data sets for the Following the recommendations by Ågren et al. (2014), TWI was calculated from coarse resolution DEM (resolution 16 m), because TWI is not accurate in small spatial scales, whereas the other indices were calculated from DEM with 220 5 m resolution. The flow channel networks in the study domain used for the DTW calculations were estimated from the DEM using one-hectare flow initiation threshold (Ågren et al., 2014). The DEM was processed using Topotoolbox in MATLAB (Schwanghart and Kuhn, 2010).
We used Random Forest (RF) algorithm (Breiman, 2001) to upscale the soil moisture to the whole area for two time periods: May-July and August-October. This approach was selected after the initial testing with the RF model 225 revealed that the soil moisture was the greatest driving force of the CH4 flux, while other variables (e.g. temperature) were not affecting considerably. We opted to do two static predictions for two separate periods instead of trying to capture the temporal variability, as we did not have enough temporal data from each sample point, and modelling temporal variability of soil moisture has been shown to be difficult even with larger data sets than the one used here (Kemppinen et al., 2018). RF is a machine learning algorithm that can be used to generalise complex dependencies 230 between driving variables and a target variable. Here, our RF model consisted of a large ensemble of regression trees, which were trained each with a separate random subsample of available data. In the RF algorithm, each tree in a forest consists of split and leaf nodes: in split nodes, the training data is divided into two based on a value of a predictor variable (e.g. soil moisture above or below 0.5 m 3 m −3 ), whereas the leaf nodes determine the output from the tree. During the training, the data is split into two in split nodes and when there are less than n amount of data 235 points in the new split nodes they turn into leaf nodes. The output from the RF model is an average of output from all the trees separatelyand hence the algorithm applies the bootstrap aggregation (bagging) method, which decreases the noise of the prediction. The model was developed using four drivers (TWI, slope, DTW and TRI) for the soil moisture, selected based on the correlations (Appendix A, Table A1). MATLAB (R2018b, MathWorks, Natick, Massachusetts, USA) function TreeBagger was used for developing the RF models in this study. Each trained forest 240 https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License. consisted of 300 regression trees and minimum of two samples were allowed in a split node. Mean squared error was used as a metric for deciding the split criterion in split nodes.
The predictive performance of the developed RF model was evaluated using distance-blocked leave-one-out cross validation. In this method, one RF model is developed for each sample point, and the training data consists of data measured further than selected distance (here 30 meters) from the sample point in question, while the rest of the data 245 (i.e. data originating from closer than 30 meters) are utilised as independent validation data. This way the possible spatial autocorrelations in the data did not inflate the cross-validation metrics (Appendix A, Fig. A5). Blocked crossvalidation has been proposed to be appropriate cross-validation strategy for data showing e.g. spatial autocorrelations (Roberts et al., 2017), and it has been used in some prior flux upscaling exercises (Peltola et al., 2019). Statistical metrics used in evaluating model predictive performance included mean bias, fraction of variance explained by the 250 model (R 2 ), and root mean squared error (RMSE). The uncertainty of the upscaled soil moisture, however, was estimated by developing 100 RF models with a random subset (70%) of available data, and the variability over this ensemble was used as an uncertainty estimate. The uncertainty estimate describes the robustness of the soil moisture dependence on the drivers identified by the RF model. This approach is similar to the one used in e.g. Peltola et al. (2019) and (Aalto et al., 2018). 255 After modelling the soil moisture to the whole study domain, the forest floor CH4 flux dependency on the moisture was used to derive CH4 flux map (modelled CH4 flux). This was performed by estimating continuous joint probability density function (joint PDF) between the soil moisture and the CH4 flux using Kernel smoothing of the measurement data, and extracting flux values from the continuous joint PDF for each grid cell of the soil-moisture map. For each cell in the map, 100 CH4 flux values were extracted from the joint PDF, and the mean of these values was assigned 260 for that particular cell in the map. The uncertainty of the modelled CH4 flux was evaluated with the following method: the 100 RF models developed for estimating the soil moisture uncertainty were used together with the method above to extract CH4 flux values from the joint PDF. CH4 flux uncertainty was then evaluated as the standard deviation of the resulting 10 000 CH4 flux values (100 soil moisture values used to extract 100 values). This way we were able to account for the flux uncertainty stemming from soil moisture modelling uncertainty, as well as uncertainty stemming 265 from the variation in CH4 flux soil moisture dependence.
The four vegetation groups were named based on their dominant mosses. (1) Sphagnum-group included 15 sample points that had over 50% Sphagnum coverage (except for one point) and no P. schreberi. Distinctive species were also E. sylvaticum, Carex digitata, and P. commune that all are typical to peatland forest. (2) Sphagnum-Pleurozium-275 group is an intermediary group between the swampy and drier forest areas with some Sphagnum but also P. schreberi in all its 13 sample points. Sample points in the remaining two groups do not include any Sphagnum but species characteristic to upland forests. Sample points in (3) Pleurozium-group have typically more than 10% P. schreberi, https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License.
while the sample points in (4) Hylocomium-group have less P. schreberi and usually more than 80% H. splendens, which is related to slightly higher fertility. D. polysetum was most common in Pleurozium-group, and some of the 280 Pleurozium-group had high L. borealis coverage. V. vitis-idaea and V. myrtillus were common in all the groupstheir coverage was the highest in Pleurozium-group and the lowest in Sphagnum group. Finally, the hilltop sample points were put to the Pleurozium-group based on their vegetation. The Pleurozium-group has thus in total 25 sample points, and the Hylocomium-group has seven. represent relatively dry years. The soil moisture as well as the soil temperature and precipitation followed a similar temporal pattern in both measurement years, although the spring was a bit wetter in 2013 (Fig. 2). This difference was largely due to thicker snow cover and later snowmelt in 2013 (mean snow cover in December-February 37 cm; 295 snowmelt in late April) compared to 2014 (mean snow cover in December-February 7 cm; snowmelt in mid-March).

Soil
The measurement years were rather similar regarding the weather conditions, which allowed us to combine the measured data from two years. https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License.
Furthermore, there was a significant difference in the mean soil moisture between the seasons May-July and August-October (0.25 and 0.18 m 3 m -3 , respectively, SMEAR II continuous measurements, or 0.35 and 0.29 m 3 m -3 , 305 respectively, at the sample points on measurement days; p<0.0001). The mean soil moisture of the sample points differed between the two subsoil types (sandy/gravelly till 0.37 m 3 m −3 , bedrock with shallow moraine layer 0.26 m 3 m −3 ; p<0.001), while there was no significant difference in the soil temperature.
The continuous measurements of the SMEAR II station show that the soil temperature in A horizon was between 0.3 and 15.4 °C in 2013, and between -1.8 and 16.8 °C in 2014 (January-December) (Fig. 2). There was a steep increase 310 in the soil temperature at the turn of April and May in both years, peaking in July-August (Fig. 2). Between May-October the soil temperature was 1.9-15.4 °C in 2013, and 2.7-16.8 °C in 2014. There was no significant difference between the years. Soil temperature was not as spatially variable as soil moisture (no statistically significant differences between sample points). The average soil temperature of the measurement days at the sample points was between 9.8 °C (H6) and 13.1 °C (W-NW-4). 315
Emissions of CH4 were measured in total 63 times from 17 different sample points, corresponding to 9% of the measurements and 28% of the points. Most of the CH4-emitting sample points belonged to the Sphagnum-group (11 320 out of 17), and the highest CH4 emissions were detected from six sample points in the Sphagnum-group (sample points SW-W-2-3, E-SE-4-6, and S-SW-4). No emissions were observed from the Hylocomium-group sample points.
The highest CH4 emission was detected from SW-W-3 (Spaghnum-group), which was located at a water-filled patch (water table was   There were significant differences both in the soil moisture and in the CH4 flux between the vegetation groups (Fig.   4). The mean soil moisture was decreasing from the Sphagnum-group to the Pleurozium-group, and the differences 345 between the groups were statistically significant, except between the two driest groups (Fig. 4a). The mean CH4 flux decreased from the Sphagnum-group to the Hylocomium-group, indicating mean CH4 emission from the Sphagnumgroup sample points, and increasing CH4 uptake from the Sphagnum-Pleurozium-group to the Hylocomium-group (p<0.001) (Fig. 4b).

Modelled soil moisture and CH4 flux at the study area
The performance of the RF model to predict soil moisture variability in the study domain is outlined with the statistical metrics shown in Table 1. These metrics were calculated against independent validation data and hence are suitable for assessing the predictive performance of the model. R 2 values were 0.51 and 0.26 for May-July and August-October, indicating that the model was able to describe the spatial variation in soil moisture more accurately during 375 the early summer period. The RMSE and mean bias in the model prediction were similar during both periods. The modelled soil moisture of the studied area ranged in May-July from 0.11 to 0.79 m 3 m -3 , and in August-October from 0.12 to 0.65 m 3 m -3 . The mean soil moisture in May-July was 0.25 m 3 m -3 (± 0.12 SD) and in August-October 380 0.23 m 3 m -3 (± 0.091 SD). These values are relatively close to the average measured soil moisture at the sample point locations, even though the measured soil moisture was on average higher than the modelled moisture (Table 2). Based on the RF-modelled soil moisture, the site was mostly rather dry in 2013-2014, with some wetter areas where the soil moisture was above 0.5 m 3 m -3 (Fig. 6). The modelling results indicated that the wet areas were wetter and wider https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License.
in May-July (Fig. 6a) than in August-October (Fig. 6b), which follows from the measured soil moisture (see section 385 Soil moisture and temperature at the site). In May-July 5% of the area was wet (soil moisture >0.5 m 3 m -3 ), whereas in August-October only ca. 1% of the area could be considered wet. The RF model predicted high soil moisture for topographical depressions and flat areas, which were specified by low DTW, slope and TRI, and high TWI values.
The soil moisture was spatially more variable in early summer than in autumn. The uncertainty of the upscaled soil moisture was on average 0.028 m 3 m −3 and 0.023 m 3 m −3 during May-July and August-October, respectively (Fig.  390   7). The uncertainty increased with the predicted soil moisture, yet during May-July the wettest locations (soil

Figure 7. Uncertainty of the modelled soil moisture (volumetric water content, m 3 m -3 ) at the site in May-July (a) and August-October (b). The uncertainty was defined as the standard deviation of the results of individual trees of the Random Forest model. The red circles show the sample points.
The agreement between the upscaled and the measured CH4 fluxes at the different chamber locations was moderate 405 (R 2 =0.32 and R 2 =0.35 for May-July and August-October, respectively) ( Table 3). If two locations with high CH4 emissions were neglected from the comparison, then the R 2 value increased to 0.47 for the May-July period. The mean bias (upscaled−measured) was −1.33 and 0.10 µmol m −2 h −1 for the May-July and August-October periods, respectively. The modelled CH4 flux of the whole studied area was between −12.0 and 2.28 µmol m −2 h −1 in May-July, with a mean flux of −7.74 µmol m −2 h −1 (± 2.52 SD). In August-October, the flux ranged from −13.1 to -1.98 410 µmol m −2 h −1 , with a mean of −10.0 µmol m −2 h −1 (± 2.57 SD). The modelled fluxes resulted in stronger CH4 uptake than averaging the flux measurements of 60 sample points ( Table 2). The upscaling demonstrated some CH4 emitting patches in the early summer (Fig. 8a), which shifted to CH4 uptake in the autumn (Fig. 8b). The emission patches covered approximately 5% of the study area, and the flux of the emitting areas was 0.014-2.30 µmol m −2 h −1 in May-July. Omitting the emission patches from calculation would result in ca. 1% stronger mean CH4 uptake in May-July. 415 The soil moisture of the emitting cells was between 0.69-0.79 m 3 m -3 in May-July, with a mean of 0.72 m 3 m -3 . In autumn the emission patches were drier, the soil moisture of these areas being 0.43-0.65 m 3 m -3 with a mean of 0.50 m 3 m -3 , and the CH4 flux was between −4.23 and −1.98 µmol m −2 h −1 with a mean of −3.26 µmol m −2 h −1 .The uncertainties of the upscaled forest floor CH4 fluxes were smaller at the wet depressions than at dry areas (Fig. 9).
The upscaled CH4 flux uncertainty showed similar spatial pattern during both periods. The lower uncertainty at wet 420 depressions was due to lower variability of the measured CH4 fluxes between measurement locations at wet spots.
In order to evaluate the number of sample points needed to produce reliable upscaling results, we compared mean CH4 flux from 1-60 randomly picked sample points with the mean of the upscaled flux. With approximately 15-20 randomly selected sample points similar accuracy was achieved as using all the sample points (Fig. 10). 425 https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License.

Discussion
We performed spatially extensive CH4 flux measurements from the forest floor, covering different soil moisture conditions and vegetation types at the area, and combined the measured data with remote sensing tools in order to 440 unravel the spatial variation in the forest floor CH4 flux within the terrain. We accomplished this by generating a random forest (RF) model to map the soil moisture and the CH4 flux at the boreal forest site. Our results demonstrate that even though the forest floor is mainly a sink of CH4, as expected for the mainly dry upland pine forest, the CH4 flux at the site is highly heterogeneous. We observed patches with high CH4 emissions during early summer, and our model confirmed a higher potential for CH4 emissions in early summer than in autumn, revealing that the dynamics 445 of in particular the CH4-emitting sample points varied between the seasons. In addition to the emission patches, the results demonstrate that the uptake rates of the dry areas were also spatially and temporally variable.
In our study, the soil moisture was the most important driving force in the spatial variability of the CH4 fluxes, and it was thus selected as the basis of the model. In previous studies, soil moisture and TWI have also been identified as the main factors affecting the soil CH4 fluxes on a spatial perspective (Kaiser et al., 2018;Warner et al., 2019). 450 Topography-driven soil moisture, moreover, controls the vegetation (Moeslund et al., 2013). However, vegetation can also affect soil moisture via e.g. evapotranspiration (Dunn and Mackay, 1995) or rooting strategies (Milly, 1997), so the soil moisture and vegetation are highly interconnected. As the soil moisture is affected by topography, vegetation, and soil properties, it can have high spatial variability (Rosenbaum et al., 2012).
Our vegetation classification was mostly based on mosses, and the connection to soil moisture was expected. 455 Consequently, the groups differed in their soil moisture, and furthermore, in their CH4 flux. However, while the soil moisture did not differ between the Pleurozium-group and the Hylocomium-group, there was a significant difference in the CH4 fluxes, suggesting that the ground vegetation as such may affect the CH4 flux. Some plant species of forest ground vegetation have been suggested to contribute to CH4 exchange and the effect may vary between species https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License. (Halmeenmäki et al., 2017;Maier et al., 2017;Praeg et al., 2017), yet the studies focusing on forest ground vegetation 460 are still rare. Also different tree species can affect the soil conditions and thus the forest floor CH4 flux, for example through soil chemical properties or nutrient conditions (Reay et al., 2005). While soil moisture undergoes also shortterm temporal changes resulting from weather conditions, ground vegetation is an important indicator of long-term soil conditions and their spatial variability. Thus it can be used as a rough in situ estimate of the CH4 flux when planning the locations of the measurement plots, as demonstrated by this study. 465 Some previous studies have upscaled CH4 flux across complex terrains with measurements from slope transects (Kaiser et al., 2018;Warner et al., 2019). In our study, the site represents a typical commercial pine forest of the boreal areas, and the results are thus scalable to large area of similar type of forests in boreal zone. In our study, the great advantage is the large amount of sample points, resolving the small-scale spatial variability in a typical boreal pine forest. In this study, the mean CH4 flux obtained from all the sample points was rather close to the upscaled CH4 470 flux obtained from the model, indicating that the sample points covered the spatial variation (of the soil moisture and CH4 flux) well at the area. Based on our results, we state that 15-20 sample points are needed to reliably cover an area of comparable size of a typical boreal commercial forest, however, demanding carefully-designed placement in order to cover the spatial heterogeneity. Yet an area with higher topographical variation may require more sample points. In addition, our results suggest that, in August-October, only three randomly picked sample points would be 475 as representative sampling of the whole area as 15-20 sample points in May-Julythis was probably due to smaller spatial variability in the CH4 flux in autumn compared to early summer.
In our site, there was one anomalous water-filled sample point (SW-W-3), from which the CH4 emissions were at the same level as emissions reported from a nearby fen site (Li et al., 2016;Rinne et al., 2007Rinne et al., , 2018. Even though the location of SW-W-3 had the highest water table level, some of the other sample points had also water table at or 480 close to the soil surface. However, we did not measure as high CH4 emissions from any of the other sample pointsexcluding SW-W-3, the highest emission was one order of magnitude smaller than the highest emission from SW-W-3. Out of ten highest emissions measured, seven were from SW-W-3, and the rest from sample points E-SE-4-6 at another peaty area. These substantially higher emissions from SW-W-3 compared to other sample points with high soil water content and equally high Sphagnum coverage may be related to joint effect of these two factors. The 485 Sphagnum mosses thrive in wet conditions, where also CH4 is produced, and most of the Sphagnum mosses growing in Finland are demonstrated to support methanotrophic bacteria (Larmola et al., 2010), which therefore naturally reduces the potential CH4 emissions from Sphagnum-covered wet areas. However, it may be that while the Sphagnumassociated methanotrophs may reduce the CH4 emissions from many of the sample points, they may not be active during the highest water level at SW-W-3. 490 The Pleurozium-group was the most prevalent vegetation type within our sample points, representing typical ground vegetation of boreal pine forests. The mean CH4 flux of the sample points in the Pleurozium-group was close to the modelled seasonal mean fluxes. Kaiser et al. (2018) reported that when the soil moisture was above 0.43 m 3 m -3 the soil was a source of CH4, while soil moisture below 0.38 m 3 m -3 resulted in CH4 sink. In our study, the soil moisture limit for the CH4 emissions were much higher, at 0.68 m 3 m -3 , while in the same season (May-July), areas with soil 495 moisture as high as 0.73 m 3 m -3 indicated CH4 uptake. Thus, in the upscaling method we used, the cells with high soil moisture had both uptake and emission CH4 flux values, which ultimately results from the measured data (Appendix A, Fig. A6).
Variability in the emissions from wet mineral soils has been estimated to explain most of the total interannual variability in CH4 emissions globally (Spahni et al., 2011). Furthermore, Ueyama et al. (2018) found that wet CH4 500 emission patches were important at a larch plantation, and could have a strong contribution to the canopy-scale fluxes in our case we cannot fully conclude the impact at the ecosystem-scale, as the above-canopy fluxes were not included. Based on our results, the observed spatial hot spots of CH4 emissions seem to be prone to temporal variation, depending on the soil water status, affecting also the size of these patches. The measurement years being drier than the long term average (annual precipitation 576 mm in 2013 and 572 mm in 2014; average 711 mm for 1981-2010) 505 suggests that the emission patches can be larger on wetter years. The temporal variability in soil moisture is mainly driven by meteorological forcing, and is suggested to be greater at locations with intermediate soil moisture than at the extremely dry or wet locations, and in the topsoil compared to deeper soil layers (Rosenbaum et al., 2012). In our study, the mean soil moisture decreased between the early summer and autumn in the two wettest vegetation groups, but stayed at the same level in the two driest groups, while the mean CH4 flux showed increasing uptake in all the 510 vegetation groups between the two seasons. This demonstrates that the temporal changes in soil moisture affect mainly the wet areas of the forest in our study, while the driest areas tend to remain dry, probably due to the welldrained and shallow topsoil on top of a bedrock. Furthermore, the activity of CH4 oxidizing bacteria seems to increase towards autumn at the dry areas, which could be linked to soil temperature being at the highest level in August.
Previously reported results indicate that there is a local optimal soil moisture for CH4 oxidation, and in boreal forest 515 soils the oxidation of CH4 decreases when the soil moisture increases (Billings et al., 2000), whereas low soil water content as such does not remarkably decrease CH4 uptake of boreal forests' mineral soils (Saari et al., 1998).
Based on our results, most of the wet plots were located at the areas with sandy or gravelly till as subsoil, while the areas with bedrock close to the soil surface (max. 1 m soil) had lower soil moisture. However, the sample points E-SE-4-6 were located in a depression with ca. 0.6 m deep peat layer, which was on top of a bedrock area according to 520 the subsoil map. Praeg et al. (2017) have also reported bedrock type affecting the CH4 flux, probably through soil properties, rooting of plants, plant species and microbial compositionhowever, for better understanding more research is needed.
It should be noted, however, that the modelled fluxes represent only the average CH4 flux spatial pattern during the two seasons, and hence they do not capture the short-term temporal variability in the CH4 fluxes caused by rapid 525 variability of soil moisture inflicted e.g. by rain. Hence, the soil moisture may be occasionally wetter than the modelled moisture at some locations of the study domain, and therefore larger areas can be emitting CH4 during (short) wet periods. For instance, Rosenbaum et al. (2012) showed with spatially extensive and continuous soil moisture measurements that intense precipitation events were significantly altering the soil moisture spatial variability in their study. Based on the continuous soil moisture data measured at the SMEAR II, there was a peak in soil moisture 530 in mid-April (Fig. 2) during snowmelt, when we only measured the flux at the hilltop. Thus, presumably the CH4 emissions were highest in the beginning of May, when the soil temperature started to increase and the soil moisture was still high, and thus the spring emissions may have been even higher than observed. In order to capture accurately the temporal variability, and to avoid underestimation of the highest values, soil moisture should be monitored continuously with high spatial (hilltops, depressions, slopes) and temporal frequency for future upscaling research. 535 Furthermore, comprehensive annual measurements of CH4 flux are also needed, as the non-growing season fluxes are noted to have an important contribution to annual CH4 flux, especially at upland sites (Treat et al., 2018). https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License.
The upscaling model of the CH4 flux indicates CH4 emissions only in the beginning of the summer, whereas in August-October the upscaling shows only CH4 uptake for the site. In addition to the emissions turning to CH4 uptake towards autumn, the CH4 uptake increased substantially between the seasons. The oxidation of CH4 has been 540 discovered to be at the lowest level in late spring and early summer, and the most effective in autumn, by both the high affinity and low affinity methanotrophs (Reay et al., 2005). Similarly to our results, upscaling of CH4 fluxes across hillslope transects in a temperate deciduous forest by Warner et al. (2019) demonstrated CH4 emissions from low-elevation areas in early summer, and uptake in late summermoreover, the magnitude of the upscaled fluxes were similar to our upscaling results. Contrarily, Aaltonen et al. (2011) found the CH4 uptake to be stronger in May-545 June compared to late summer and autumn at the SMEAR II site (in 2008) (Aaltonen et al., 2011).
Our results at the hilltop showed higher CH4 uptake ( (Aaltonen et al., 2011). This may be explained by stronger CH4 550 uptake due to drier years. Flux data processing techniques may also cause discrepancies between this and prior studies, as the widely used linear flux calculation method has been demonstrated to underestimate the CH4 fluxes by on average 33% in a chamber inter-comparison study (Pihlatie et al., 2013). Skiba et al. (2009)  The CH4 flux measurements from the 60 sample points resulted in smaller mean uptake than the spatially modelled CH4 flux, which emphasizes the importance of upscaling. Usually, if no upscaling is implemented, the mean flux of 560 the measurements is reported, neglecting the effect of the placement of the measurement points. Soil or forest floor CH4 fluxes are most often studied by using ca. 4-10 soil chambers per study site (Billings et al., 2000;Lohila et al., 2016;Savi et al., 2016;Skiba et al., 2009;Sundqvist et al., 2015), although a couple of studies so far applied 20 or more chambers (Dinsmore et al., 2017;Matson et al., 2009;Wang et al., 2013b;Warner et al., 2018). Upscaling or mean flux is therefore often based on assumption that the soil conditions are rather homogenous over the area, and/or 565 that the heterogeneity is well represented by a small number of chambers (Sundqvist et al., 2015). The locations of the sample points should be selected based on the spatial variability of the driving parameters. Together with long time series, it is critically important to cover the spatial variability within different ecosystems, and link the CH4 fluxes to landscape parameters in order to achieve more accurate estimations of CH4 (and other GHG) fluxes over large areas. The vastly developed and increasingly common elevation-mapping methods can be highly practical for 570 upscaling the CH4 fluxes of different areas.

Conclusions
The CH4 fluxes of the boreal forest floor are spatially highly heterogeneous, including potential emission hotspots.
The spatial variability of soil moisture, vegetation, and the resulting CH4 flux should be taken into account already in the experimental planning, and the fluxes should be upscaled using appropriate geospatial tools. Spatially extensive 575 measurements and high-resolution modelling are essential to further improve our understanding on the CH4 dynamics https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License. of forests. Moreover, resolving the CH4 flux over large spatial scale with high temporal frequency would be of great importance in order to reveal the variation between years and seasons. Eventually this should lead to more precise global CH4 budget. The authors declare that they have no conflict of interest. 585

Data availability statement
Should our manuscript be accepted for publication with Biogeosciences, the data supporting the results will be archived in an appropriate public repository, and the data DOI will be included at the article.  https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License. Figure A3. Cartographic depth-to-water index (DTW) at the study area. The red circles show the sample plots. Figure A4. Terrain ruggedness index (TRI) at the study area. The red circles show the sample plots. https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License. Figure A5. Empirical semivariogram (dots) and variogram model (line) for the soil moisture in May-July and August-October. Figure A6. Continuous joint probability density function (joint PDF) between the soil moisture (m 3 m -3 ) and the CH4 flux was estimated using Kernel smoothing of the measurement data. This was used for extracting flux values for the soil moisture map grid cells, from which we obtained the upscaled CH4 flux map. https://doi.org/10.5194/bg-2020-263 Preprint. Discussion started: 10 September 2020 c Author(s) 2020. CC BY 4.0 License.