Evaluating alternative ebullition models for predicting peatland methane emission and its pathways via data-model fusion

. Understanding the dynamics of peatland methane (CH 4 ) emissions and quantifying sources of uncertainty in estimating peatland CH 4 emissions are critical for mitigating climate change. The relative contributions of CH 4 emission pathways through ebullition, plant-mediated transport, and diffusion together with their different transport rates and vulnerability to oxidation determine the quantity of CH 4 to be oxidized before leaving the soil. Notwithstanding their importance, the relative contributions of the emission pathways are highly uncertain. In 30 particular, the ebullition process is more uncertain and can lead to large uncertainties in modeled CH 4 emissions. To improve model simulations of CH 4 emission and its pathways, we evaluated two model structures: 1) the Ebullition Bubble Growth volume threshold approach (EBG) and 2) the modified Ebullition Concentration Threshold approach (ECT) using CH 4 flux and concentration data collected in a peatland in northern Minnesota, USA. When model parameters were constrained using observed CH 4 fluxes, the CH 4 emissions simulated by the EBG approach (RMSE 35 = 0.53) had a better agreement with observations than the ECT approach (RMSE = 0.61). Further, the EBG approach simulated a smaller contribution from ebullition but more frequent ebullition events than the ECT approach. The EBG approach yielded greatly improved simulations of pore water CH 4 concentrations, especially in the deep soil layers, compared to the ECT approach. When constraining the EBG model with both CH 4 flux and concentration data in model-data fusion, uncertainty of the modeled CH 4 concentration profiles was reduced by 78 40 to 86 % in comparison to constraints based on CH 4 flux data alone. The improved model capability was attributed to the well-constrained parameters regulating the CH 4 production and emission pathways. Our results suggest that the EBG modeling approach better characterizes CH 4 emission and underlying mechanisms. Moreover, to achieve the best model results both CH 4 flux and concentration data are required to constrain model parameterization. We illustrate the improvement from model structure by comparing ECT_F and EBG_F, which were calibrated using the observed CH 4 flux data. Then we compare results from EBG_F and EBG_FC to show the ability of pore water CH 4 concentration data to help constrain the parameters related to the CH 4 emission pathways. Model 335 performance was evaluated against the observed data using Root Mean Square Error (RMSE). Model uncertainties in pore water CH 4 concentrations were quantified as the standard deviation across all soil layers in each of model runs listed in Table 3.


Introduction
Methane (CH4) emissions from wetlands constitute roughly one third of the global CH4 budget (Denman et al., 2007;Saunois et al., 2020). Methane, following production under anoxic conditions, is stored belowground, oxidized into CO2 by methanotrophs, or emitted into the atmosphere as CH4. The emission of CH4 is a major concern given its sustained-flux global warming potential (SGWP) of 45 (Neubauer and Megonigal, 2015). Methane emissions from wetlands cannot be simply estimated from production rates, as more than 50 % of methane can be oxidized during transport to the atmosphere in various ecosystems (Conrad and Rothfuss, 1991;Teh et al., 2005;Segarra et al., 2015). The global wetland CH4 oxidation sink has been estimated to be 40-70 % of CH4 production and it can dominate wetland CH4 cycling (Megonigal et al., 2004). The oxidation sink depends substantially on the CH4 emission pathways due to their different oxidation rates (Blodau, 2002).

55
Methane is produced at depth in the soil and then transported to the atmosphere via three primary pathways: ebullition, plant-mediated transport, and diffusion. Ebullition is the least vulnerable to oxidation, as it allows CH4 to quickly ascend in a bubble that bypasses the aerobic and anaerobic zones (Epstein and Plesset, 1950). Gases such as CH4 can be released into the atmosphere by vascular plants (particularly sedges) after being transported through intercellular spaces (molecular diffusion) or aerenchymous tissues. Although plant-transported CH4 bypasses 60 aerobic zones of the soil, 20-90 % of plant-transported CH 4 can be oxidized in the rhizosphere or within the aerenchymous tissues where gaseous oxygen is present (Schipper and Reddy, 1996;Ström et al., 2005;Laanbroek 2010). Diffusive transport through the peat column is the slowest transport method and therefore, CH4 is most susceptible to oxidation as it spends the longest time transiting the aerobic and anaerobic zones (Chanton and Dacey, 1991;Megonigal et al., 2004). The relative importance of each pathway determines how much CH4 is oxidized before it leaves the soil. Uncertainties in the relative contributions of these pathways to CH4 emission can lead to large errors in the predictions of total CH4 emissions (Bridgham et al., 2013). Despite their importance, the relative contributions of the CH4 emission pathways have not been well quantified by either experimental or modeling approaches until recently (Ricciuto et al., 2021;Yuan et al., 2021).
Experimental data on the relative importance of CH4 emission pathways are limited due to spatiotemporal 70 heterogeneity and the difficulty in directly measuring the different pathways (Klapstein et al., 2014;Iwata et al., 2018). While most state-of-the-art Land Surface Models (LSMs) incorporate CH4 emission and differentiate the three transport pathways, information on the relative contribution of each pathway from modeling studies is still limited, and none of such studies has estimated the uncertainty or accuracy of the relative contributions of the emission pathways to net CH4 emission (Bridgham et al., 2013). Comparisons between modeling approaches and 75 empirical CH4 data suggest that emission pathways may not be well captured by LSMs. For example, plantmediated CH4 transport by vascular species measured at northern peatlands accounted for 30-98 % of the total CH4 emission (Shannon et al., 1996;Waddington et al., 1996), whereas model-estimated proportions in the similar ecosystems were all above 65 % (Tang et al., 2010;Wania et al., 2010). Empirical estimates also suggested that diffusion could range from 9 % to 60 % of the total CH4 flux (Barber et al., 1988;Shea et al., 2010;Iwata et al., the mismatch between simulated and observed CH4 concentrations in deep soil layers . This is because diffusion is described with Fick's Law and Henry's Law, which have been widely used and well tested, and plant-mediated pathway happens only within the rooting depth, which is typically shallow in wetlands with a high water table level (Iversen et al., 2018). Ebullition makes a significant contribution to the total CH 4 emissions in some wetlands (Christensen et al., 2003;Yu et al., 2014). However, this process has not been well incorporated into 95 most state-of-the-art LSMs. Mechanistically, CH4 ebullition occurs when the buoyancy force of a bubble exceeds the retention force. During ascent, the bubbles exchange gas with the surrounding pore water and some of the bubbles become trapped, allowing CH4 to re-dissolve or be oxidized within the confining layer. In modeling studies, ebullition is commonly estimated using the Ebullition Concentration Threshold (ECT) approach. In ECT, when the pore water CH4 concentration is larger than a defined threshold, the excess CH4 is directly released into the 100 atmosphere (Walter and Heimann, 2000;Zhuang et al., 2004;Wania et al., 2010;Riley et al., 2011;Xu et al., 2016).
However, this approach ignores the possibility of a CH4 bubble moving into a less saturated layer where it can subsequently dissolve and possibly be oxidized, potentially overestimating ebullition. Other methods for modeling ebullition include the Ebullition Pressure Threshold (EPT) or the Ebullition Bubble Growth volume threshold (EBG) to trigger ebullition (Tang et al., 2010;Zhang et al., 2012). For the EPT method, bubbles form when the CH4 105 concentration exceeds a certain threshold. The EBG method describes how temperature, pressure, and gas exchange alter the bubble volume and uses maximum bubble volume as a threshold to trigger ebullition events (Fechner-Levy and Hemond, 1996;Kellner et al., 2006;Zhang et al., 2012). Peltola et al. (2018) compared these modeling approaches and concluded that EBG should be incorporated into LSMs instead of ECT or EPT, given its most realistic representation of the observed temporal variability of CH4 emissions. However, the ability of the EBG 110 approach to represent the relative importance of CH4 emission pathways has not been evaluated against observations.
A more realistic projection of the emission pathways requires not only an improved model structure, but also more appropriate parameter values (Wania et al., 2010;Riley et al., 2011;Shi et al., 2018). Data-model fusion directly informs process-based models by synthesizing multisource data streams and thus can help determine 115 parameter values that lie within biophysically realistic ranges and reduce model uncertainty (Williams et al., 2009;Keenan et al., 2013;Shi et al., 2015a;Liang et al., 2018). Previous studies have found that sporadic measurements of net CH4 emissions were only useful to constrain a few model parameters and data assimilation with only CH4 emission (flux-based) data did not help reduce the uncertainties in emission pathways (Bridgham et al., 2013;. In our previous study, we found that monthly CH4 emission data could only constrain CH4 production-120 related parameters such as temperature sensitivity (Q10) and basal production rate of CH4 production . While direct measures of CH 4 emission pathways are rare, depth-specific pore water CH 4 concentration profiles can help elucidate the relative importance of CH4 emission pathways. Indeed, measured CH4 concentration profiles are critical for constraining the responsive parameters associated with CH4 emission pathways because in process-based CH4 models, all the three emission pathways are calculated based on the CH4 concentration in each 125 soil layer.
To date, few modeling studies have considered CH4 concentration data for structural improvement or parameter optimization (Zhuang et al., 2004;Wania et al., 2010;Riley et al., 2011;Zhu et al., 2014). In those studies that compared simulation results to observed pore water CH4 concentrations, the simulated concentration profiles did not agree well with observations, despite good agreements between simulated and observed CH4 emission data (Walter 130 and Heimann, 2000;Tang et al., 2010). Thus, when CH 4 emission pathway parameters are calibrated using only net CH4 flux data, models may not realistically represent CH4 production, oxidation, and emission pathways. The exclusion of concentration profile data results in poorly constrained model parameters due to equifinality, in which multiple combinations of parameters result in similar flux predictions. This can cause misunderstanding of the mechanisms of CH4 processes. It will be problematic to use these not-yet-well-calibrated parameter sets for climate 135 change predictions or extrapolating CH4 fluxes from the site level to larger spatial and temporal scales as these intermediate processes may have different responses to perturbations in climate.
To address these uncertainties, we evaluated the performance of two state-of-the-art methods for modeling ebullition, EBG and ECT, against the observed net CH4 fluxes and pore water CH4 concentration profiles in a northern Minnesota peatland. We also compared the strength of the flux-based data and pool-based data in 140 constraining the parameters using data-model fusion. We hypothesized that: (1) the EBG approach can reproduce the observed pore water CH4 profiles better than the ECT approach, given its more mechanical representations of bubble formation, gas exchange, and release; and (2) pore-water CH4 concentration data offer more information for model parameters to reduce the uncertainties in simulated CH4 emission and its pathways.

Site and measurements
The data we used to calibrate our model were collected from the Spruce and Peatland Responses Under Climatic and Environmental Change Experiment (SPRUCE), which is conducted in the 8.1-ha S1 bog in northern Minnesota in the USDA Forest Service Marcell Experimental Forest (N 47° 30.476', W 93° 27.162') to study the responses of northern peatlands to climate warming and elevated atmospheric CO2 concentration (Hanson et al., 2017a). The 150 mean annual temperature from 1961 to 2009 at the SPRUCE site was 3.4 °C, and the mean annual precipitation was 780mm (Sebestyen et al., 2011). The mean peat depth is 2-3 m (Parsekian et al., 2012 Deep peat warming began in June 2014, aboveground warming began in August 2015, and elevated CO 2 treatments began in June 2016. In this study, however, all observed data we used were only from ambient plots (no infrastructures and no warming treatment) for our research goals and we did not explore the warming effects on CH4 processes. Modeling CH4 emissions in response to warming and elevated CO2 at this experiment site can be found in 160 Yuan et al. (2021). A complete list of data streams used in this study is included in Table 1.

165
Environmental variables, including soil temperature, air temperature, relative humidity, wind speed, precipitation, and photosynthetically active radiation plots were used as model input data. Measurements of environmental variables in ambient plots started in 2011. Soil temperature and moisture in different layers, water table depth (Hanson et al., 2015a;Hanson et al., 2015b;Hanson et al., 2016b), carbon pools (leaf, wood, root, and peat soil, Hanson et al., 2018a;Hanson et al., 2018b;Norby et al., 2018), and community-scale fluxes, including 170 gross primary production (GPP), net ecosystem exchange (NEE), ecosystem respiration (ER), and CH4 flux data Hanson et al., 2016a;Hanson et al., 2017b) were used to calibrate the modeled water-heat balance and carbon cycle similarly as in our earlier studies Ma et al., 2017;Jiang et al., 2018).
CH4 fluxes and pore water CH4 concentrations were used for data assimilation. We averaged the data from all ambient plots measured on the same dates to represent the site-level CH4 emissions and concentration profiles and 175 variations among different ambient plots were not considered in this study. In total, 45 daily CH4 emission measurements were obtained from ambient plots from 2011-2016. In situ pore water CH 4 concentrations were measured monthly during the growing seasons in 2014-2016 (11 profiles in total) with the pore water samples collected from a series of piezometers permanently installed in the plots at 25, 50, 75, 100, 150, and 200 cm depths, respectively (Wilson et al., 2016). Piezometers consisted of a <1 cm diameter pipe that limited diffusion. Twenty-

180
four hours prior to sampling, piezometers were pumped dry and allowed to recharge naturally so that the sampled water would not have been in prolonged contact with the atmosphere prior to collection. A perforated stainless-steel tube was inserted into the peat to collect samples within 0-25cm depth. Samples were immediately filtered in the field through 0.7 µM Whatman glass-fiber filters and stored in pre-evacuated, septum-sealed glass vials. Phosphoric acid (1 mL, 20 %) was added to preserve each sample during shipment to Florida State University for analyzing CH4 185 concentrations.

Overview of TECO_SPRUCE
For this study, we used the process-based biogeochemistry model, TECO_SPRUCE (Terrestrial ECOsystem model at the SPRUCE site). The model was built with six major modules running at an hourly time step: canopy 190 photosynthesis, soil water dynamics, plant growth, soil thermal dynamics, soil carbon/nitrogen (N) transfer, and soil CH4 dynamics. A detailed description of these modules can be found in Weng and Luo (2008), Shi et al. (2015b), Huang et al. (2017), and Ma et al. (2017). Here we give a brief description of these modules but describe in detail how we calculated CH4 ebullition with the EBG and ECT approaches.
The canopy photosynthesis module was mainly derived from a two-leaf model. It couples surface energy, al., 1980) and the Ball and Berry stomatal conductance model (Ball et al., 1987). The soil water dynamic module has 10 soil layers and simulates water table level and soil moisture dynamics using rainfall, snowmelt, evapotranspiration, and runoff. Evaporative losses of water and associated latent heat are regulated by soil moisture in the first layer and atmospheric demand. Transpiration is determined by stomatal conductance and soil water 200 content of the layers with roots present. When precipitation exceeds water recharge to soil water holding capacity, runoff occurs. The water table level is estimated using a simple bucket model as described by Granberg et al. (1999).
The plant growth module calculates the allocation of photosynthesis carbon to individual plant pools (foliage, stem and root), plant growth, plant respiration, phenology, and carbon transfer to the litter and soil carbon pools. Leaf onset is regulated by growing degree days (GDD) and leaf senescence is determined by low temperature and/or dry ) models, we assume that CH4 production (Pro) within the catotelm is directly related to heterotrophic respiration from soil and litter (R h , g C m −2 h -1 ) via the following equation: where f CH 4 is an ecosystem-specific conversion scaler describing the fraction of anaerobically mineralized C atoms becoming CH4. The parameters fstp, fpH, and fred are environmental scalers, representing the effects of soil temperature, pH and redox potential, respectively on CH4 production. Total emission of CH4 from the soil to the atmosphere is calculated as the sum of CH4 ebullition from saturated soil layers, plant-mediated CH4 emissions from 225 all the soil layers, and the diffused flux from the first soil layer into the atmosphere. A sensitivity test was done to decide which parameters need to be optimized by data-model fusion , and more detailed descriptions on the CH4 module can be found in Ma et al. (2017). otherwise, CH4 in bubbles is added to the soil layer just above the water table and then continues to diffuse through the soil layers to the atmosphere. Below we describe these two methods in detail.

Ebullition approach based on the Concentration Threshold (TECO_SPRUCE_ECT)
With the concentration threshold approach, we assume that bubbles form when the CH4 concentration exceeds a certain threshold based on the equilibrium concentration defined by Henry's Law. Instead of using a constant value for the threshold, in this study, we allowed the threshold to fluctuate with atmospheric pressure, water column pressure, and soil temperature, following the method proposed by Wania et al. (2010). The maximum solubility of 245 CH4 at a given temperature was calculated using a statistical model used by Yamamoto et al. (1976) based on the empirical data: where V is the Bunsen solubility coefficient, defined as the volume of gas dissolved per volume of water at atmospheric pressure and a given temperature. The volume of CH 4 dissolved per volume of water was converted 250 into grams using the ideal gas law: where [CH 4 ] thre is the maximum concentration threshold (g C m -3 ), P is the sum of the atmospheric and hydrostatic pressures (Pa), V is the Bunsen solubility coefficient as in Eq. (2), the constant C is the atomic weight of carbon (12 g mol -1 ), the gas constant R is 8.3145 m 3 Pa K -1 mol -1 , and T is the temperature (K). Then the CH4 ebullition flux can 255 be calculated using the following equation: where E bu (z, t) is the ebullition flux of CH4 (g C h -1 ) to the lowest air layer, K ebu is a rate constant of 1.0 h -1 (Walter and Heimann 2000, Zhuang et al. 2004, Zhuang et al. 2006, and [CH 4 ](z, t) is the pore water CH4 concentration in soil depth z at model time step t. 260

Ebullition approach based on the Bubble Growth volume threshold (TECO_SPRUCE _EBG)
In contrast to the concentration threshold approach, the EBG approach uses bubble volume as a threshold to trigger ebullition events (Fechner-Levy and Hemond 1996) and it has been applied to model CH4 ebullition (Kellner et al., 2006;Zhang et al., 2012;Peltola et al., 2018). The total bubble volume in each soil layer is calculated and updated continuously based on the ideal gas law and Henry's law. In detail, if CH4 concentration exceeds the limit that the 265 water can withhold based on Henry's law, then excess CH4 is converted to a gaseous volume calculated using the predefined bubble CH4 mixing ratio (f). This gaseous volume is divided evenly into a certain number of bubbles (Nbub). Nbub is a unitless tuning parameter ranging between 5-500 in each 10 cm thick soil layer and 10-1000 in each 20 cm thick soil layer, which essentially controls the mass exchange rate between the gas volume and the pore water. The CH4 exchange between the stationary bubbles and the pore water (Qebu) is calculated using the equation

270
proposed by Epstein and Plesset (1950): where r is the radius of a bubble (m), Dw is the CH4 diffusion coefficient in water (m 2 s -1 ) calculated using the quadratic curve of observed diffusivities against temperatures (Broecker and Peng, 1974), Vw is the amount of water in this layer (m 3 ), cw is dissolved CH4 concentration in the pore water, and H cc is the dimensionless Henry solubility 275 of CH4 calculated following Sander (1999). P, R, and T are same as in Eq. (3). A negative value of Qebu indicates CH4 transfer from the bubbles back to the pore water. This reverse gas exchange mechanism has not been included in other ebullition methods but has been revealed as an important process in empirical studies (McGinnis et al., 2006;Rosenberry et al., 2006). The ebullition flux Ebu(z,t) is then calculated when the bubble volume at a certain depth (z) exceeds the volume threshold (Vmax) within the time step t: where Vmaxfraction is the free-phase gas-filled fraction of the pore space in the soil layer above which ebullition occurs, cb is the CH4 concentration in a bubble (mol m -3 ), VB is the total volume of all bubbles, and ΔVB is the change in the total volume due to the diffusive gas exchange in Eq. (5). The amount of CH4 in all bubbles after each time step is: where f is the predefined bubble CH4 mixing ratio as mentioned earlier, and V B ′ is the updated total bubble volume after each time step. Excess bubbles will be released into the lowest air layer within one time step unless they are trapped in the soil profile. To determine if a bubble will be trapped, we adopted an approach similar to Peltola et al. (2018), assuming that the probability for a bubble to be trapped within a certain soil layer is a predefined constant 290 number (bubprob), thus the bubbles formed in deeper layers would have a larger probability of being trapped during ascent. In contrast, all other ebullition modeling methods assume that no bubbles will get trapped.
The values of bubprob, f, and Vmaxfraction are dependent on the soil texture, porosity, water content, etc. and have been found to significantly affect the modeled CH4 fluxes, the layers where bubbles were formed, and the number of ebullition events (Zhang et al., 2012;Peltola et al., 2018). The tuning parameter, Nbub, however, has a minimal effect 295 on modeled ebullition . In this study, we used the empirical values measured from other sites or the values used in other models as the prior ranges of bubprob, f, and Vmaxfraction in our models (Table 2). Then we constrained these parameter values via data-model techniques so that the model estimation of ebullition process was more accurate.

Data-model fusion
We used the Markov Chain Monte Carlo (MCMC) method based on the Metropolis-Hasting algorithm (Metropolis et al. 1953) to optimize the posterior distribution of parameters and explore model uncertainty. Both the observed 305 data and simulated results were rescaled to a daily emission unit for comparison. The prior range for each parameter was assumed to be uniformly distributed, which indicates that all values within the range have equal likelihood. We also assumed that errors between observations and model simulations independently follow a normal distribution with a zero mean. The cost function weights the mismatch between observations and the modeled corresponding variables, represented by: where Z i (t) is the ith observation stream at time t, X(t) is the model simulated value, and σ i (t) is the standard deviation of observation error estimates.
The parameter space was explored for 50,000 iterations during the optimization process. The new parameter value at the current step was based on the accepted parameter in the previous step by a proposed distribution. The

315
current value was accepted if the observation-model difference was reduced or otherwise passed with a random probability. We ran five chains of 50,000 simulations and used the Gelman-Rubin statistic (Gelman and Rubin, 1992) to check the convergence of sampling chains. The first half of the accepted parameters were discarded as the burn-in period, and the second half were used for posterior analysis. More details on sampling and the cost function can be found in Xu et al. (2006).
Parameters directly regulating CH4 emission pathway and belowground dynamics and their prior ranges used for data assimilation are listed in Table 2. Specifically, we selected four parameters (i.e., fCH4, Q10pro, Omax and Tveg) from the TECO_SPRUCE_ECT and seven parameters (all the seven parameters in Table 2) from the TECO_SPRUCE_EBG during data assimilation. The prior ranges were determined by combining information from empirical measurements or modeling studies from peatland ecosystems. The in situ CH4 emission and pore water 325 CH4 concentration data from ambient plots (Table 1) were used as observations to constrain model parameters. In order to evaluate how a proper model structure and constrained parameter values help improve model-simulated CH4 emission pathways, we conducted four data assimilation runs with the TECO_SRUCE model, as shown in Table 3.  Table 3.

340
The CH4 production-related parameters, f CH 4 (fraction of anaerobically mineralized C atoms becoming CH4) and Q10pro (temperature sensitivity of CH4 production), were constrained using the CH4 emission flux data for both TECO_SPRUCE_ECT and TECO_SPRUCE_EBG (Table 4, Fig. 2a, b). However, the maximum likelihood estimates (MLEs) for parameters varied between the two models (Table 4), with f CH 4 slightly increased from 0.16 to 0.17 and Q10pro decreased from 3.0 to 2.69 in the EBG approach compared to the ECT approach. In contrast, Omax

345
and Tveg were not constrained in either model with large uncertainties in model-estimated CH4 oxidation and plant transport parameters (Table 4, Fig. 2c, d).

350
Of the three ebullition-related parameters used only in the EBG approach, when assimilating only the CH4 emission flux data, Vmaxfraction (maximum fraction of volume occupied by bubbles) was constrained with a unimodal shaped posterior distribution (Fig. 2g), f (CH 4 mixing ratio in bubbles) was edge hitting with a marginal distribution downward (Fig. 2e), and bubprob (probability that a bubble will get trapped at a certain soil layer) was constrained with a wide, slightly domed distribution (Fig. 2f).

Figure 2.
Posterior distribution of parameters that govern methane emission processes from data-model fusion.
Parameters are defined in Table 2

385
The ECT model constrained by CH4 flux data could not reproduce well the patterns of the observed pore water CH 4 concentrations, especially in the deep peat layers (RMSE = 0.77, Fig. 4, ECT_F). When calibrated by CH 4 flux data alone, the EBG approach for ebullition captured deep layer CH4 concentrations much better than the ECT approach (RMSE = 0.33, Fig. 4, EBG_F). The observed concentration profiles lay within the 95 % probability relatively large range of CH4 concentration profiles, especially in the deep peat layers, mainly due to the unconstrained T veg and bubprob controlling the plant transport and ebullition pathways, respectively (Fig. 2, EBG_F). Observation ECT_F

EBG_F EBG_FC
For the ECT approach, as described earlier, assimilating the observed CH4 flux could constrain parameters such as f CH 4 and Q10pro. However, when using both observed CH4 flux and concentration data to constrain the parameters of TECO_SPRUCE_ECT (i.e., the ECT_FC run), no parameter set was accepted within the observational uncertainty range, indicating that the ECT model structure failed to simultaneously simulate the dynamics of CH4 emissions and pore water CH4 concentrations.

410
In contrast to the ECT approach, incorporation of porewater CH4 concentration data in the EBG approach greatly improved parameter estimations. While Tveg and bubprob were not constrained by flux-based observation data alone (Table 4, Fig. 2, EBG_F), they were well constrained to a unimodal distribution when both CH4 flux and pore water CH4 concentration data streams were used in the data-model fusion (Table 4, Fig. 2, EBG_FC).
Compared to EBG_F, the parameter Tveg was well constrained to a very small range of 1.43 ± 0.46, and the 415 parameter bubprob was also well constrained to a range of 0.25 ± 0.015 with less uncertainty under EBG_FC (Table   4, Fig. 2). The parameter f CH 4 decreased from 0.17 in EBG_F to 0.15 in EBG_FC whereas Q10pro increased from 2.69 in EBG_F to 3.21 in EBG_FC (Table 4, Fig. 2). Moreover, the formerly constrained range of parameter f under EBG_F shifted from 0.11 ± 4.0 to 0.29 ± 0.46 when the pore water CH4 concentration information was added into data assimilation. All the emission pathway-related parameters (Tveg, bubprob, f, and Vmaxfraction) were well 420 constrained once the pore water CH4 concentration profile information was added to data-model fusion. However, incorporation of the porewater CH4 concentration data in data assimilation with the TECO_SPRUCE_EBG did not improve the constraint of Omax.
In terms of model's performance fitting observed CH4 emission patterns, the two parameterization methods for the EBG approach were comparable, with RMSE of 0.53 under EBG_F and RMSE of 0.52 under EBG_FC (Fig. 3c, 425 e). However, the simulated contributions from plant-mediated transport, diffusion, and ebullition by EBG_FC, which were 31.8 ± 4.9 %, 58.1 ± 5.1 %, and 9.9 ± 6.1 %, respectively (Fig. 3f), varied greatly from those simulated by EBG_F (Fig. 3d). The contribution from ebullition under EBG_FC was much less than that under EBG_F (Fig.   3d). CH4 flux and concentration data together reduced the uncertainty of the modeled CH4 concentration profiles by 78-86 % compared to the flux data alone for data-model fusion, with RMSE reducing from 0.33 in EBG_F to 0.12 430 in EBG_FC (Fig. 4). The uncertainty in modeled CH4 concentration profiles was decreased mainly due to the wellconstrained parameters regulating the CH 4 production and emission pathways (Fig. 2a, b, d-g).

Discussion
In this study, we evaluated two alternative model structures with two data streams, i.e., CH4 emission and pore water CH4 concentration data, in simulating peatland CH4 emission and its pathways.

(EBG) model
Previous studies suggested that the EBG method of modeling ebullition agreed better with the observed temporal variability in CH4 emissions (R 2 = 0.63) when compared with the ECT (R 2 = 0.56) and EPT (R 2 = 0.35) methods . We also found that the EBG-simulated CH 4 emissions (RMSE = 0.53) had a better agreement 440 with observations than the ECT method (RMSE = 0.61). Ebullition events simulated by EBG had a higher frequency but a smaller magnitude than those obtained from ECT, which is consistent with onsite minirhizotron observations of small bubbles around fine roots (Fig. S1). Although the ECT method was able to simulate a similar seasonal pattern of CH4 emissions as EBG, the mean annual CH4 emission was 17.8 % lower compared with the EBG method. Peltola et al. (2018) reported that the different ebullition modeling approaches simulated significantly 445 different amount of CH4 stored belowground and distinctly different distributions of CH4 along the soil profiles. In line with their results, we found that the ECT method produced much higher pore water CH4 concentrations than the EBG method, especially in the deep layers (Fig. 4).
Of the few modeling studies that compared results with observed belowground CH 4 concentration, Walter et al. (2000) simulated CH4 concentration with an early-generation ECT method. This method used a constant 450 concentration threshold that was tuned to match the observed concentration data, but they also found discrepancies with observed data (only CH4 concentrations within the first 50 cm soil were compared). Tang et al. (2010) compared the EPT method with the early-generation ECT method and found that EPT had an improved CH4 concentration profile, although a mismatch in the concentration profile remained, especially from the model that best reproduced observed CH4 emissions. The recently developed Land Surface Model (LSM) with a new microbialfunctional group-based CH4 module incorporated, i.e., the ELM_SRUCE model, used the modified ECT method for ebullition process, but incorporated the acetoclastic and hydrogenotrophic pathways for methanogenesis as well as anaerobic and aerobic oxidations (Ricciuto et al., 2021). This model could accurately predict the seasonal cycle of CH4 production and net fluxes, but CH4 concentrations in soil layers deeper than 1m were still not well simulated (Ricciuto et al., 2021) and led to different estimates of emission pathways (23.5 % for PMT, 15.0 % for diffusion, 460 and 61.5 % for ebullition) with our study (31.8 ± 4.9 % for PMT, 58.1 ± 5.1 % for diffusion, and 9.9 ± 6.1 % for ebullition). In our study, when training the modified ECT model with both CH4 emission and pore water concentration data, no parameter set was accepted, which suggested that the ECT method was not able to simultaneously reproduce both the magnitude of observed CH4 fluxes and the patterns of pore water CH4 concentrations, no matter the combinations of parameters used. In contrast, the EBG method could capture observed 465 CH4 emissions and the patterns of pore water CH4 concentration profiles simultaneously (Figs. 3 and 4).
Moreover, we found that although both the ECT and EBG methods were able to represent the general patterns of observed CH4 emissions, the flux-constrained parameter distributions varied between the two methods. For example, f CH 4 increased but Q10pro decreased in EBG compared to ECT (  (Williams et al., 2009). More studies are needed to further explore model structures and parameter optimization methods to best simulate CH4 production and emission processes and the underlying mechanisms.

Pool-based CH4 concentration data reduced the uncertainty of the emission pathways
good simulations of CH4 emissions did not necessarily reproduce a realistic pore water concentration profile (Figs. 3 and 4). By comparing the parameter posterior distributions trained by observed CH 4 emissions with and without observed pore water concentration profiles using the same model structure, TECO_SPRUCE_EBG, we revealed that CH4 emission data could constrain the CH4 production-related parameters fCH4 and Q10pro and ebullition-related parameter Vmaxfraction very well. The ebullition-related parameter f was edge hitting, but parameter bubprob and plant 480 transport-related parameter Tveg remained unconstrained, causing large uncertainty in simulated ebullition and plantmediated transport (Table 4). However, by training the model with both CH4 emission and pore water CH4 concentration data, the parameters regulating CH4 production, plant transport, and ebullition were all well constrained (Fig. 2). This is because the vertical concentration profile of CH4 is a balance between the dynamic CH4 production, oxidation and three emission pathways. The constrained parameters contributed to a more accurate 485 estimation of porewater CH 4 concentration (RMSE = 0.12) and better constrained emission pathways (Table 4, Fig.   4).
Previous studies have emphasized the importance of combining carbon-pool data with carbon-flux data to improve estimated ecosystem carbon exchange. For example, Richardson et al. (2010) reported the initial leaf pool size could not be constrained until biomass information was combined with flux data. Du et al. (2015) also found 490 that carbon flux data could constrain parameters reflecting instant responses to environmental changes such as temperature sensitivity, while pool-based data mainly contained information that could help constrain transfer coefficients. GPP/ER data could effectively constrain parameters that were directly related to flux data, such as the temperature sensitivity of heterotrophic respiration, the carbon allocation to leaves, and leaf turnover rate (Fox et al., 2009). In our study, the CH4 emission data mainly constrained parameters that represented instant responses to 495 temperature change (Q10pro) and input rate from the source pool (f CH 4 ). The pore water CH4 concentration data contributed to constraining the allocation rates of CH4 to the different emission pathways. Due to the different information contained between CH4 flux and concentration data, we highly recommend that both types of measurements should be made when possible, and that both data streams should be used when constraining CH4 models.

500
It needs to be noted that there is a large disagreement in simulated relative contribution by ebullition between CH4 flux data constrained models (i.e., 0.13 % by the ECT approach with a constant concentration threshold , 23.5 % by the modified ECT approach with varied concentration thresholds in our study, and 22.7 % by the EBG approach in our study, and the EBG approach constrained with both CH4 flux and concentration data (9.9 %) (Figure 3). This suggests the urgent need of observed data for separating these relative contributions in field 505 experiments, possibly through: 1) having continuous total emission flux measurement (Susiluoto et al., 2018), despite being hard to deploy and calibrate in the field; 2) separately measuring diffusive/plant-mediatedtransport/ebullition fluxes, despite being technically challenging; and 3) measuring belowground CH4 concentration profile as suggested in our study. At the SPRUCE experiment site, starting in the summer of 2022, two autochambers with footprint of 0.2m 2 will be deployed in each plot to measure CO2 and CH4 fluxes. Along with the continued CH4 profile measurement, these whole set of observations will provide the opportunity to further evaluate these discussed approaches for improving model simulations.

Broader impacts and implications
Large uncertainties exist in understanding future wetland CH4 emissions in response to projected climate change, which result from inappropriate model structure and insufficient parameterizations even after the uncertainties in 515 wetland areas are considered (Melton et al., 2013;Luo and Schuur, 2020). Decades of modeling research on CH4 has evolved to a stage that the emission pathways are explicitly calculated with various complexities, but determining the accuracy and uncertainty of individual pathways still requires more research (Xu et al., 2016).
Currently, models fail to reproduce diffusive fluxes by more than 40 %, mainly due to the lack of validations by the pore water CH4 concentrations (Tang et al., 2010;Riley et al., 2011). In LSMs, plant transport exclusively dominates 520 CH4 emission in all wetland types tested (Tang et al., 2010;Wania et al., 2010;Peltola et al., 2018). However, according to the experimental studies, each of the three emission pathways can dominate, depending on the wetland type, vascular plant coverage, and the height of the water table (Whiting andChanton, 1992, Shannon et al. 1996).
By assimilating empirical data of both CH4 flux and pore water CH4 concentration data, our data-model fusion study proposes a more reasonable model structure and more robust parameter estimates with greatly reduced uncertainties.

525
Our results also implicate barriers of current CH4 modeling studies and suggest future directions for both modeling and experimental efforts, namely: 1) the under-described CH4 processes in models and 2) the lack of observational data to constrain key CH4 processes in the models. More explicit CH4 processes are needed in modeling CH4 emission and its pathways. For example, in this study, the maximum aerobic oxidation rate (Omax) was always poorly constrained with wide, slightly domed distributions (Fig. 2c) regardless of what observation data 530 was being assimilated into the models. This poor constraint might partly result from the missing anaerobic oxidation process in the models. In current process-based models, much of the descriptions of CH4 dynamics in wetland soils are based on the premise that the oxidation of CH4 occurs only in aerobic environments (Wania et al., 2010;Riley et al., 2011;Bridgham et al., 2013). However, the anaerobic oxidation of CH4 may be an important sink for CH4, sometimes reducing emissions by over 50 % in experimental studies (Smemo and Yavitt, 2011;Gupta et al., 2013;Segarra et al., 2015). Recently, a microbial-functional-group-based CH4 model was developed accounting for both aerobic and anaerobic CH4 oxidations and this model has been validated against the concentration of CH4 and CO2 from incubation data (Xu et al., 2015). In Xu at el. model, 7 out of total 33 key CH4 process parameters controls CH4 oxidation and their values varies widely across different ecosystems and environmental conditions. Incorporation of anaerobic CH4 oxidation into LSMs may help improve the calculations of CH4 oxidation, if the uncertainties from 540 these CH4-oxidation-related parameters can be reduced.
While more comprehensive and process-based models for simulating all the processes or mechanisms involved in CH4 emissions are laudable, observations on such specific processes are critical to constrain parameters and reduce model uncertainty. Without sufficient data to evaluate such processes or to calibrate models, developing such complex models to explicitly simulate these processes could also introduce large uncertainties. Increased model complexity only contributes to the improved forecasting if parameters can be calibrated adequately by observed data (Famiglietti et al., 2021). If there were not enough observational data for model calibrations, increased complexity can lead to even worse forecast skills than the intermediate-complexity models (Shi et al., 2018;Famiglietti et al., 2021). Currently, similar to our model, many process-based biogeochemistry models (e.g., CLM, LPJ, TRPLEX, JULES, TEM) also use a parameter that varies with soil conditions to describe the potential ratio of CO2 becoming 550 CH4 (fCH4), which is due partly to the limitation of data availability. Another example of observational data hindering model development is the unconstrained parameters to calculate plant-mediated transport. Although new algorithms and parameters to calculate plant aerenchyma transport have been added to LSMs to represent this mechanism more realistically, the parameters such as tiller radius, number of tillers, cross section area of tillers, and the tiller porosity are highly idealized and poorly constrained (Wania et al., 2010;Riley et al., 2011). In the 555 TECO_SPRUCE model used in this study, the parameter Tveg was used as a proxy of the ability of the whole plant community (e.g., biomass and abundance) to emit CH4. Root growth was simulated by a phenological process using LAI and temperature and in situ fine root profile measurements were used as a proxy for vertical rooting distributions (Iversen et al., 2018). Tveg was well constrained by the observed CH4 emission and concentration data at a range of 1.43 ± 0.46, which indicated that the ability of the plant community to emit CH4 at this site was low 560 (compared to its prior knowledge of 0.01-15, Table 2). Empirical measurements of plant-mediated CH4 transport at the same study site supported our model results (Scott Bridgham, personal communications). This finding can also be explained given that the diversity and abundance of aerenchymous plants at our study site were low, similar to many other northern ombrotrophic bogs.

565
Understanding relative contributions of CH4 emission pathways is critical to mechanistically modeling future CH4 dynamics. Acknowledging that pore water CH4 concentration is the driving force for each emission pathway, we evaluated the ability of two ebullition modeling approaches to reproduce observed CH4 emissions and pore water concentration profiles at a large-scale manipulated experimental site in a northern Minnesota, USA peatland. The Ebullition Bubble Growth volume threshold approach (EBG) fits the observed CH4 emissions and CH4 570 concentration profiles much better than the modified Ebullition Concentration Threshold approach (ECT), especially for CH4 concentrations in the deeper soil layers. By assimilating the net CH4 emission and belowground CH4 concentration data into the models, we substantially reduced the uncertainties of modeled CH4 emissions from the involving emission pathways. While net CH4 efflux data are often the only data stream for CH4 model validations, we recommend that more attention be given to in situ measurements of the porewater CH4 concentrations and 575 assimilations of the concentration data for model parameterization. Since the relative ratio of the emission pathways (ebullition, plant-mediated transport and diffusion) determines how much CH4 is oxidized before leaving the soil, due to their different transport rate and vulnerability to oxidation, we also suggest that the EBG approach should be incorporated into Land Surface Models (LSMs) so that the projections of both CH4 emission and its transport processes are more realistic in response to climate change scenarios. Future studies should also include anaerobic Data availability. All data sets from this study are publicly available at project websites. Relevant measurements were obtained from the SPRUCE webpage (http://mnspruce.ornl.gov/), the archival ftp site (ftp://sprucedata.ornl.gov), or from the USDA Forest Service.

585
Author contributions. SM and YQL designed the project. SM, JJ, YYH, and XJL carried out modeling study. RMW, JPC, CMI, AM, PJH and SB provided experimental data for model evaluations and parameter optimization. SM, LFJ and YQL prepared the manuscript. All authors contributed to analyzing and interpreting the modeling results, and improving the manuscript.

590
Competing interests. The authors declare that they have no conflict of interest.
Du's help with discussing the data-model fusion techniques.