Modeling nitrous oxide emissions from agricultural soil incubation experiments using CoupModel

. Efforts to develop effective climate mitigation strategies for agriculture require methods to estimate nitrous oxide (N 2 O) emissions from soil. Process-based biogeochemical models have been often used for ﬁeld- and large-scale estimates, while the sensitivity and uncertainty of model applications to incubation experiments are less investigated. In this study, a process-oriented model (Coup-Model) was used to simulate N 2 O and CO 2 ﬂuxes and soil mineral nitrogen (N) contents in a short-term (43 d) factorial incubation experiment (16 treatments). A global sensitivity analysis (GSA) approach, “Morris screening”, was applied to quantify parameter sensitivity. The GSA suggested that a higher number of sensitive parameters was associated with N 2 O ﬂux estimates and that inter-treatment variations in parameter sensitivities were distinguished by soil moisture levels or NO − 3 content and residue types. Important parameters regarding N 2 O ﬂux estimates were linked to the decomposability of soil organic matter (e.g.,

Abstract. Efforts to develop effective climate mitigation strategies for agriculture require methods to estimate nitrous oxide (N 2 O) emissions from soil. Process-based biogeochemical models have been often used for field-and large-scale estimates, while the sensitivity and uncertainty of model applications to incubation experiments are less investigated. In this study, a process-oriented model (Coup-Model) was used to simulate N 2 O and CO 2 fluxes and soil mineral nitrogen (N) contents in a short-term (43 d) factorial incubation experiment (16 treatments). A global sensitivity analysis (GSA) approach, "Morris screening", was applied to quantify parameter sensitivity. The GSA suggested that a higher number of sensitive parameters was associated with N 2 O flux estimates and that inter-treatment variations in parameter sensitivities were distinguished by soil moisture levels or NO − 3 content and residue types. Important parameters regarding N 2 O flux estimates were linked to the decomposability of soil organic matter (e.g., organic C pool sizes) and the denitrification process (e.g., Michaelis constant and denitrifier respiratory rates). After calibration, the model better captured temporal variations and magnitude of gas fluxes and mineral N in unamended soils than in residue-amended soils. Low-magnitude daily and cumulative N 2 O fluxes were well simulated with mean errors (MEs) close to zero, but the model tended to underestimate N 2 O fluxes, as observed daily values increased by over 0.1 g N m −2 d −1 , in which the major mismatch was due to limited success of the model to describe the high emissions during the first few days after crop residue addition. A larger uncertainty was also seen in the magnitude of pulse emissions by the posterior simulations.
We also evaluated ancillary variables regarding N cycling, which indicated that more frequent measurements and additional types of observed data such as soil oxygen content and the microbial sources of emitted N 2 O are required to further evaluate model performance and biases. The major challenges for calibration were associated with high sensitivities of denitrification parameters to initial soil abiotic conditions and the instantaneous residue amendment. Model structure uncertainties and improved modeling practices in the context of incubation experiments were discussed.

Introduction
The potent greenhouse gas nitrous oxide (N 2 O) has been estimated to be responsible for about 7 % of the overall global radiative forcing by long-lived greenhouse gases (World Meteorological Organization, 2021). N 2 O emissions from the agricultural sector account for 60 %-70 % of the total anthropogenic emissions of this gas (Davidson and Kanter, 2014;Syakila and Kroeze, 2011). However, it is persistently difficult to reduce the uncertainties of N 2 O emission estimates, and one reason is associated with the high spatiotemporal variability in N 2 O (Kravchenko et al., 2017). To provide a scientific basis for developing achievable climate mitigation strategies, an improved understanding of N 2 O production in agricultural soils and quantification of N 2 O emissions are urgently needed.
N 2 O emissions from agricultural soils are driven by a suite of microbiological processes, among which nitrification and denitrification predominate as sources of N 2 O (Butterbach-Bahl et al., 2013). The factors directly regulating nitrification and denitrification activity are the availability of mineral nitrogen (N), oxygen, and degradable carbon (C) sources used by denitrifying organisms (Wijler and Delwiche, 1954). Indirect controls include soil temperature, moisture, pH, and soil texture. During nitrification, when ammonia (NH 3 , at equilibrium with ammonium NH + 4 ) is oxidized to nitrate (NO − 3 ), a small proportion of N may be lost as N 2 O (Firestone and Davidson, 1989). Nitrification mainly occurs in well-aerated soils with moderate water content (Goreau et al., 1980;Li et al., 1992;Parton et al., 1996). In contrast, denitrification is a microbial process that occurs under anaerobic conditions in which NO − 3 is reduced to gaseous N. Soil C substrates are electron donors for denitrification, but they are also a sink for oxygen that can lead to anaerobic microsites (Parkin, 1987) where NO − 3 is used as an electron acceptor for microbial respiration. Nitrifying and denitrifying bacteria are most active in producing N 2 O in environments with abundant N relative to assimilatory demands by other microorganisms or plants (Firestone and Davidson, 1989), as is often the case following soil amendment of fertilizers, manure, or crop residues.
Farming practices influence the interactions between microbial, physical, and chemical processes in the soil, and the uncertainties associated with predicting N 2 O emissions from agricultural soils amended with plant residues such as cover crops or green manure can be particularly high. The incorporation of crop residues can reduce NH 3 losses and enhance degradation compared to leaving residues at the soil surface, but at the same time the N input, elevated water holding capacity, and high local oxygen demand of residues may stimulate the development of anaerobic microsites and bacterial denitrification activity (Kravchenko et al., 2018;Kuzyakov and Blagodatskaya, 2015). Also, mechanical disturbances caused by tillage may influence soil properties (e.g., porosity, aggregate size distribution, solute and gas diffusivities) and microbial enzyme activities associated with N 2 O emissions (Grandy and Robertson, 2006).
The quantification of N 2 O emissions from agroecosystems based on field experiments often meets challenges associated with logistics, data gaps, and long-term resource support (e.g., man power, equipment, and research budget). Process-oriented biogeochemical models, e.g., DNDC (Li et al., 1992), DayCent (Parton et al., 1996), APSIM (Keating et al., 2003), and CoupModel (Jansson and Moon, 2001), can be a complementary approach to quantify N 2 O emission, partly addressing the above-mentioned challenges. When applying process-based models, available in situ measurements can be used to infer model parameters and allow simulation of soil N transformations and N 2 O emissions on temporal and spatial scales beyond the monitoring sites, but accurately simulating the magnitude of and temporal variability in N 2 O fluxes under contrasting field conditions remains a challenge. While models provide reasonable estimates of annual N 2 O emissions, they become less successful at finer time resolution (e.g., diurnal time steps), and this represents a barrier to evaluating the effects of agricultural land use and management on greenhouse gas emissions (Brilli et al., 2017). Such model errors are often attributed to physical and biogeochemical processes being inadequately represented, suggesting there is a need to improve process descriptions beyond parameter optimization (Abdalla et al., 2010;Brilli et al., 2017;Gaillard et al., 2018;Uzoma et al., 2015).
Process-based models attempt to reproduce the most relevant physical and biogeochemical processes through understanding grounded in the best available knowledge. Coup-Model, used in the current investigation, has a high level of detail on the interaction between biotic and abiotic processes and has adopted submodules of nitrification, denitrification, and gas transport from the model DNDC (Li et al., 2000;Norman et al., 2008). The description of N 2 O emissions, including the links between soil environmental factors and biological reactions, is based on a series of hypotheses and results generated from both field measurements and laboratory incubations studies (Li et al., 2000), and the algorithms and parameterization of microbial growth and decay dynamics were specifically supported by the latter. Hence, controlled laboratory experiments, in which the impact of ill-defined pedo-climatic conditions on model predictability can be minimized, may be useful to optimize the model prediction of N 2 O emissions following residue amendment.
The application of process-based models has often been challenged by the paucity of prior information and measurements compared to the model's demands, and this is also the case when applying a model to incubation experiments. One widely used model calibration method to bridge the gap between model requirements and available data, as well as to quantify parameter uncertainties, is generalized likelihood uncertainty estimation (GLUE) (Beven and Binley, 1992). During model calibration, uncertainty analysis can assess how robust the model reasonably describes the measurements and identify possible reasons to explain the model errors (U.S. Environmental Protection Agency, 2009). This may be facilitated by applying a global sensitivity analysis (GSA), which can rank the sensitivities of parameters so that the model calibration can focus on the relatively more sensitive parameters , and thereby the model's uncertainties can be more efficiently constrained. While model processes and performance have been extensively documented, in many studies N 2 O emissions alone were used to train and test the subroutines of nitrification and denitrification (Chen et al., 2008). Evaluation under controlled conditions and with ancillary measurements is noticeably lacking, which makes it difficult to assess the submodules related to C and N processes like, for example, decomposition, nitrification, and denitrification. Thus, a first step in understanding model performance may be an evaluation using new datasets that contain different variables linked to N cycling based on targeted laboratory experiments. To our knowledge, no previous study has attempted a systematic sensitivity and uncertainty analysis in the prediction of N 2 O emissions based on soil incubation experiments. The present study focused on the role of reactive C and N for N 2 O emissions and used simulations of targeted experiments to identify key drivers. Simulating N 2 O dynamics in a short-term laboratory study may be considered as zooming in on a single field operation, in this case the incorporation of crop residues by standard tillage operations.
For this work, we used CoupModel to simulate incubation experiments. CoupModel has a flexible setting of soil layer thickness down to a scale of millimeters and is proper for studying soil physical processes at the scale of incubation experiments. Besides, CoupModel has integrated modules that implement parameter calibration and uncertainty analysis (Jansson, 2012). The calibration datasets used in the model were obtained from a 43 d laboratory incubation using a factorial-based design with various crop residue practices and abiotic factors (Taghizadeh-Toosi et al., 2021). Specifically, our objectives were (i) to conduct a global sensitivity analysis for parameters in a CoupModel setup that represents soil N and C processes for incubation treatments; (ii) to calibrate the model and assess model uncertainty in the estimates of N 2 O and CO 2 emissions, soil ammonium, and nitrate under different residue applications; and (iii) to discuss any limitations in the model optimization with short-term incubation experiments and suggest directions for future model development. We hypothesized that the model is able to simulate the daily and cumulative N 2 O emissions under contrasting environments in incubated soil cores. Furthermore, we hypothesized that only a few parameters in a complex model could be constrained to an unambiguous solution with limited laboratory measurements. Our findings can thus help diagnose potential causes of model-measurement discrepancies concerning C and N processes and specify conditions for which model and data collection need to be improved.

Laboratory incubation experiment
In spring 2018, soil used for the experiment was collected from the 0-20 cm tilled layer at the Lönnstorp Field Station, Sweden. Red beets had been grown in the previous year with no cover crop during winter. The soil is sandy loam (61.8 % sand, 22.4 % silt, and 15.8 % clay) with a pH of 6.18, C content of 15 g kg −1 , and N content of 1.49 g kg −1 . After sampling, the soil was partially dried, stored at −20 • C, and thawed the day before being sieved to < 6 mm for use in the experiment.
Treatments were prepared under four different soil conditions including two moisture levels (i.e., 40 % or 60 % WFPS -water-filled pore space) and two nitrate content levels (i.e., no nitrate addition or addition of KNO 3 to 100 mg NO − 3 -N kg −1 dry weight soil). Soil cores were pre-pared by stepwise-packing 1 cm layers of soil to a density of 1.25 g cm −3 in cylinders with a height of 8 cm, at each step adding deionized water or a KNO 3 solution. Following moisture and nitrate adjustments, the soil was pre-incubated for 1 week at 15 • C. The experiment involved two different crop residues: red clover (RC) and winter wheat (WW). RC residues had a C/N ratio of 17.9 and a moisture level corresponding to 80 % of the fresh weight. The WW residues had a C/N ratio of 90.9, and the moisture content corresponded to 20 % of the fresh weight. WW residues had a higher proportion of lignin and ash (11.7 %) than RC residues (5.1 %). On Day 0 of the experiment, RC or WW residues were either mixed in an amount of 0.04 g DM cm −2 (DM signifies dry matter) into the soil from 0-4 cm depth and then repacked, or residues were placed as a layer at 4 cm depth; only results from the mixed treatments were used in the present study. Incubations with RC and WW took place sequentially, and therefore each residue treatment had its own set of unamended controls. Thus, in total 16 incubation treatments (i.e., 40 % vs. 60 % WFPS; ambient nitrate vs. nitrate addition; RC vs. WW; and soil amended with residues vs. unamended controls) were used for this modeling study. All cylinders were covered at both ends with perforated plastic caps and incubated at 15 • C for up to 43 d. Gas sampling for N 2 O and CO 2 flux measurements took place 10 times, i.e., on day 1, 3, 6, 9, 13, 16, 22, 29, 36, and 43. Gas concentrations were determined by gas chromatography. Additionally, nitric oxide (NO) fluxes were quantified in four selected treatments set up separately. Soil mineral N pools in all treatments were measured at four destructive samplings after 1, 6, 22, and 43 d of incubation. Further details about the experimental treatments, preparations, and analytical methods are given by Taghizadeh-Toosi et al. (2021).

CoupModel
This study used CoupModel v6.1, which can be downloaded from http://coupmodel.com, last access: 1 November 2021. A detailed description of CoupModel can be found in Jansson and Karlberg (2010). The main structure of the model is a one-dimensional vertical soil profile with user-defined layer thickness and subdivisions. The current setup of CoupModel includes a number of components, of which the following processes are linked to N 2 O emissions ( Fig. 1): (i) soil organic matter (SOM) decomposition and mineralization, (ii) nitrification and nitrifier growth, (iii) denitrification and denitrifier growth, and (iv) gas diffusion between soil layers and internal exchange of N trace gases between aerobic and anaerobic microsites. In the nitrification subroutine, Coup-Model accounts for response functions of soil temperature, soil moisture, mineral N concentration, and pH. For denitrification, each step in the chain of denitrification is explicitly simulated, and denitrifier activity is directly influenced by soil temperature, pH, nitrogen oxides, and anaerobic fraction. The anaerobic soil volume fraction is calculated using the "anaerobic balloon" concept, as implemented in the DNDC model (Li et al., 2000;Norman et al., 2008).

Simulation settings
The modeled soil profile consisted of a single soil layer with a depth of 4 cm for the control treatments and 4.2 cm for the treatments with residues, allowing a 2 mm increment as observed in the experiment. We only simulated the upper half of the 8 cm soil core to have proper gas boundary conditions since the model only allows external gas exchange at the upper side of the soil profile, whereas in the experiments both ends of the cylinder were exposed to air. For the water process, it was assumed that there was no evaporation from the surface and no vertical water flow across the lower boundary, and a constant temperature was set for the upper and lower boundaries, in accordance with incubation conditions. The model was initialized based on the measured soil water content, soil porosity, temperature, pH, total organic C and N, NO − 3 -N, and NH + 4 -N of the incubated soil cores. The dynamics of SOM were simulated with first-order kinetics using three pools (litter1, litter2, and humus). Considering there were no explicit pools designed for crop residue addition, we assigned the rapidly decomposable SOM and easily metabolized residue components (e.g., sugars and proteins) to litter1, the moderately decomposable SOM and structural residue materials (e.g., lignin and other fibers) to litter2, and the resistant SOM to humus. When initializing the organic matter pools, we took the soil pool sizes from published papers and assumed that they were already close to a steady state at the start of incubations (see the description in the following paragraphs), and hence the changes observed during incubation were assumed to be mainly caused by the decomposition of the residues introduced. For simulating gas transport, we selected the steady-state mode in which the oxygen content within the soil profile was derived from the balance between soil oxygen consumption and surface air diffusive supply, and the transient storage term of gases in the soil air was not considered here.
Calibration datasets. Measurements used for model calibration were N 2 O flux, CO 2 flux, NO − 3 -N content, and NH + 4 -N content. As the gas fluxes and mineral N content in the upper part with soil-residue mixtures and the lower part with bulk soil were not analyzed separately, it was assumed that soil C and N turnover in the lower, unamended part was identical to that of control treatments in order to create datasets for the residue-amended part for modeling. Specifically, the amounts of mineral N and gas fluxes recorded on individual sampling days in the controls were divided by 2 and subtracted from the values recorded in residue-amended soil.
Initial values.
(1) For mineral N, since the mineral N content in the unamended control soil changed little during incu-bation and the mineral N content in crop residues was negligible, the initial NH + 4 -N and NO − 3 -N values for the control and residue-amended treatments were taken as the measurement in control soil on day 1. (2) For soil moisture, for the control treatments, the initial volumetric water content was calculated from the water-filled pore space (WFPS) levels of 40 % or 60 % and the total porosity of 0.53. For the residue treatments, the initial volumetric water content was calculated from the moisture content of soil and crop residues (Taghizadeh-Toosi et al., 2021). (3) For organic matter pools, the partitioning of soil organic C between litter1, litter2, and humus was defined by the ratio 0.02 : 0.54 : 0.44 (Gijsman et al., 2002). For crop residues, the fraction of easily metabolized organic C was calculated from the lignin / N ratio: 0.85-0.013 (lignin / N) (Gijsman et al., 2002), and hence the organic C allocation between litter1 and litter2 had a ratio of 0.82 : 0.18 for RC and 0.55 : 0.45 for WW. The allocation of organic N to different pools followed the pattern of C and the C/N ratios (Table S5).
A summary of calibration datasets can be found in Table S1 of the Supplement, in which cumulative gas emissions were estimated by linear interpolation between sampling dates and integration of the area under emission curves, while average mineral N contents were calculated by dividing the integrated values by the sampling period. The results for nitrate in soil cores with residues were not included due to high uncertainty in the calculations that was probably caused by solute transport between the unamended and amended soil layers, as observed in a related incubation experiment using some of the same soil and residue treatments conducted by Lashermes et al. (2022).

Global sensitivity analysis
Given uncertain prior information, the study used Morris screening (Morris, 1991) for a global sensitivity analysis to identify parameters to which estimates of N 2 O fluxes were sensitive. We included seven input parameters related to the characteristics of soil and crop residues (i.e., soil porosity, crop residue porosity, soil pH, and organic C pool sizes), and the parameter ranges were defined around the mean values of measurements or estimations (Table S4). For parameters supported by measurements, i.e., soil porosity and pH, the ranges were within 25 % of the mean values to represent realistic microscale variations in the laboratory setup. The residue porosity was estimated with a bulk density of 0.18 g cm −3 and a dry density of 1.3 g cm −3 (Lam et al., 2008;Zhang et al., 2012), as well as with a wider uncertainty range of 40 % due to compressibility. Bulk soil porosity and crop residue porosity were used to calculate the soil-residue-mixture bulk density used in the model input, and this calculation is included in the Supplement. The ranges of organic pool fractions were derived from literature values for cultivated soil, in Figure 1. A conceptual diagram of major C and N processes in the current setup of CoupModel. The details of parameters and equations in each C or N process can be found in Table S2. which the humus pool size (SOC h ) had an uncertainty range of ca. 40 %, and the labile carbon pool size (SOC l1 ) varied within 2 orders of magnitude considering the marginal fraction. The range of crop residue organic pool sizes was taken from the constraints of the different estimated fractions of the two crop residues.
Besides the seven input parameters, we considered 45 process parameters involved in the relevant model processes. Part of the parameter ranges (e.g., t Q10a , d pHhrate , cn m ) adopted in the study were based on the most relevant applications of CoupModel, experimental studies, and other pro-cess models as shown in Table S4. Ranges for the remaining model-specific parameters, including those involved in nitrification and denitrification, could not be derived from the existing literature, and in the absence of better prior information, we adopted the default ranges set by the model. The Morris screening method is commonly used for sensitivity analysis based on an efficient sampling strategy for performing a randomized calculation of one-factor-at-a-time (OAT) sensitivity analysis. This method can be viewed as a compromise between a simple OAT approach and the more complex GSA methods (e.g., variance-based approaches), as it provides a good approximation of the global sensitivity measure of the parameters at an affordable computational cost. Furthermore, it was considered excessive and unnecessary in the present study to adopt a more detailed analysis given the limited available measurements.
The elementary effect (EE) was estimated by comparing the variation of the model's output y j with the variation of a given parameter θ i , according to Eq. (1). The number of iterations n was set to 50, and the optimal perturbation factor was set to 2/3 by dividing the input space into four levels (Morris, 1991). To allow comparison across outputs, the EE was then standardized using the standard deviation of the model factor and the standard deviation of the output (SEE, Eq. 2). The significance of the impact of parameters was tested by comparing the mean of the SEE of those parameters to twice the standard error (sem, Eq. 3) (Sin et al., 2009). If the input factor lies outside this range, it is said to have a significant effect on the output. The codes used in the analysis were adapted from Sin et al. (2009).
The GSA was performed to a measure of model performance, root mean square error (RMSE), for three variables: N 2 O flux, CO 2 flux, and soil NH + 4 , which had relatively complete measurement datasets. By applying the sensitivity analysis to the likelihood measure, the main factors that drive the model runs with a good fit to data could be identified (Ratto et al., 2001). The results from sensitivity analyses were further used to identify process parameters for inclusion in the uncertainty analysis due to their contribution to output variability.

Uncertainty analysis
Model calibration was conducted separately for each of the 16 treatments to give more flexibility in model parameterization. The calibration was carried out with reference to five measurement variables, namely N 2 O flux, CO 2 flux, NH + 4 content, NO − 3 content, and NO flux (only four treatments), using the GLUE method. The GLUE method does not seek the single best fit to the measured data but utilizes an ensemble of model simulations that represent equally good results using informal likelihood measures (also called acceptance criteria) to account for parameter equifinalities. In this study, we first described the entire ensemble of model runs as prior runs, and after applying selection criteria the selected ensemble of model runs was analyzed as posterior runs or behavioral runs. Based on the calculated sensitivity indices from Morris screening, a total of 26 process parameters were selected for calibration, in which the parameters with marginal SEEs were omitted and only one denitrifier growth parameter was kept in each step of the denitrification chain (see Table S4). These parameters were uniformly or log-uniformly distributed within the predefined ranges, from which 20 000 parameter sets were then randomly sampled for model runs. Out of these runs, whether a parameter set was accepted or not was based on the defined criteria, which in this study consisted of the coefficient of determination, R 2 , and mean error, ME. The latter is defined as ME = E (O i − S i ), where S i and O i are the simulated and observed data at each measurement time step.
The ME acceptance threshold of each variable was set to be around the average of daily measurements, taking into account the different magnitudes of each variable (see Table S3). The ME values of prior runs for some measurement variables often showed a skewed distribution deviating from 0, meaning that the model may have systematically overor underestimated these variables. While setting the same threshold on both sides rejected most of the prior model runs, the ME criterion on one side might be looser than the other. For N 2 O emissions with marked peak fluxes, a combination of R 2 and ME was used for the selection of posterior parameters to simulate the dynamics and magnitudes. An ensemble of ca. 50 posterior runs was selected with an acceptance rate of 0.25 % based on prior simulations. The uncertainties of model predictions were quantified within the limits and posterior probability distributions of parameters.
Finally, to investigate whether the treatment effects of soil moisture and nitrate levels could be represented by a common parameterization, we also calibrated process parameters against measurements combined with data from every four treatments with the same residue application including "40 % WFPS, −NO − 3 ", "40 % WFPS, +NO − 3 ", "60 % WFPS, −NO − 3 ", and "60 % WFPS, +NO − 3 ". The prior parameter ensembles used the same 20 000 parameter sets as the single-treatment calibration. Accordingly, the measurement datasets from the four treatments in each group were pooled, and thus a larger dataset for model evaluation was obtained. The procedure of selecting behavioral runs followed the aforementioned approach based on ME and R 2 . A diagram that describes the analysis workflow for this study is presented in Fig. 2. 3 Results

Sensitivity analysis
The results of Morris screening were evaluated by comparing the absolute SEEs concerning N 2 O flux, CO 2 flux, and soil NH + 4 for individual treatments. Figure 3a-c list the parameters ranking in the top five SEEs across the 16 treatments. In general, a higher number of sensitive parameters were found for N 2 O flux (18) than CO 2 flux (8) and NH + Figure 2. A schematic diagram of the workflow of the study to analyze parameter sensitivity and uncertainty of the CoupModel based on 16 treatments from an incubation experiment (see text for details). rameters were found for N 2 O flux. For CO 2 flux and NH + 4 , inter-treatment variations were found in the parameters relevant to soil moisture levels (e.g., p θ Low and θ wilt ), NO − 3 content (d growthNO3 ), or residue types (e.g., f e,l2 , f e,l1 , and cn m ). For N 2 O, the parameters exhibiting relatively high SEE values for most treatments belonged to categories of SOM decomposition and denitrification (Table S2), including d effNO , SOC h , d growthNO3 , cn m , and d hrateNxOy . The model input, SOC h , characterizing the partitioning of SOM pools in the simulation, was found crucial in 13 out of 16 treatments. d effNO represented the respiration of denitrifying bac-teria based on NO and showed relatively large elementary effects for almost all treatments by directly regulating the reduction step from NO to N 2 O. The parameter d growthNO3 described the loss of NO − 3 from the anaerobic nitrogen pool due to microbial growth. d hrateNxOy represented the N concentration for half the rate in the denitrification process and was also known as the Michaelis constant of the enzyme (see n13, n15, n17, n19, and n20 in Fig. 1).
Parameters that had the greatest impact on CO 2 emissions were concentrated in the following: SOC h , SOM decomposition rates (k l2 , k l1 ), and the corresponding efficiencies (f e,l2 , f e,l1 ). In addition, p θ Low and θ wilt , which controlled the lower limit of the soil moisture response function for the decomposition of organic matter (see Eq. 5.86 in Table S2), exhibited distinct influences for the treatments at the lower moisture level.
The main processes influencing the NH + 4 content of the soil were identified as SOM decomposition and denitrification, and influential parameters included cn m , SOC h , k l2 , and f e,l2 . The C/N ratio of microbes, cn m , has an influence on the mineral N content by changing the magnitude and direction of soil mineralization and/or immobilization of nitrogen (see n1-n7 and n11 in Fig. 1). Soil porosity (n soil ) had significant effects on some treatments, especially under higher moisture conditions. Besides, as a key intermediate of mineral nitrogen turnover, the content of NH + 4 was also influenced by denitrification-related parameters (e.g., d growthNO3 ).
The average SEEs across all treatments are shown in Fig. 3d-f. For the N 2 O flux, all parameters were located inside the wedge indicating that none of these parameters showed a significant effect across all treatments despite their significance in individual treatments. In contrast, we found that the other two variables, CO 2 flux and NH + 4 content, were significantly affected by 15 and 28 parameters, respectively. Moreover, all parameters showed nonlinear effects on the outputs as revealed by their non-zero standard deviations, which suggested that simulated C and N processes did not solely depend on individual parameters but also on their interactions.

Uncertainty analysis
3.2.1 Temporal dynamics of N 2 O flux, CO 2 flux, and mineral N The prior simulations generally showed that mean errors for gas emissions and soil mineral N largely deviated from zero, and these biases were reduced in the posterior simulations for most model outputs ( Fig. 4a-d). For N 2 O fluxes, most treatments amended with WW and corresponding controls did not show significant deviations from the observed fluxes. In contrast, in treatments amended with RC and corresponding controls, though the absolute MEs had been reduced, there were still significant deviations generally in the direction of underestimating the observed fluxes. For CO 2 emissions, 13 out of 16 treatments showed reduced mean errors in the behavioral models, and half of the treatments showed insignificant deviations from the observed fluxes. For soil NH + 4 content, there was a severe overestimation for most prior models, but this was alleviated by posterior models, and seven treatments showed insignificant deviations from zero after model calibration. For soil NO − 3 content in control treatments, the ME ranges of posterior runs were around zero, while negative or positive biases existed, especially the former.
In the experimental treatments with RC amendment, N 2 O emission rates were consistently low at 40 % WFPS but were markedly higher and peaked on day 3 at 60 % WFPS (Fig. 5a). The highest measured daily N 2 O flux was 1.4 g N m −2 d −1 in the RC treatment with NO − 3 addition at 60 % WFPS. Similar patterns were observed for CO 2 emission rates, with emission peaks at an early stage of incubation (day 1 or day 3), and then followed by a decline. In treatments amended with WW, N 2 O evolution rates were generally low compared to those with RC amendment. Higher rates were observed at 60 % WFPS and in NO − 3 -amended soil, but treatment effects were generally minor. For CO 2 evolution, higher rates were detected by day 1, but there was also a secondary peak after 1-2 weeks for the WW treatments (Fig. 5b). The control treatments of the WW residue incubations showed less CO 2 and N 2 O release compared to the control treatments of the preceding RC incubations. For soil mineral N, there was a consistent increase in NH + 4 -N content by day 1 after residue application compared to the controls, and both NH + 4 -N and NO − 3 -N showed smaller variations than gas fluxes (Figs. 5c-d and S6c-d). The simulated evolution of associated variables is depicted in Fig. 5a-d for treatments at 60 % WFPS and in Fig. S6a-d for treatments at 40 % WFPS, respectively, and the results are summarized below.
N 2 O. The posterior simulations (Figs. 5a and S6a) were able to represent the scenarios with low daily N 2 O emissions (10 −5 -10 −2 g N m −2 d −1 ), while simulations failed to capture the large emission peaks (e.g., 1.4 g N m −2 d −1 and 0.13 g N m −2 d −1 for the two RC treatments with NO − 3 amendment), or the emission dynamics were reasonably simulated (e.g., R 2 >0.4; see Table S3), but the peak values were lower than observed. The N 2 O fluxes obtained from the model tended to increase over time and generally agreed with the observed fluxes in the second part of the experiment. Compared with residue-amended soils, the model better described the magnitude and stable trend of N 2 O fluxes in control soils.
CO 2 . Overall, the posterior simulations mimicked the measured dynamics and magnitude of CO 2 emissions well (Figs. 5b and S6b). There were overestimations or underestimations by the model, most pronounced in the early stage of incubation. By day 14, a second peak of respiration was observed for the WW treatments but not captured by the posterior simulations.

NH +
4 . For RC residue treatments, an increase in NH + 4 was observed within the first day, followed by net N mineralization from the early-to mid-stage of the incubation period and then a decline. In contrast, in the posterior simulations for three of four treatments, the simulated NH + 4 corresponded to an enhanced net N mineralization throughout the incubation (Figs. 5c and S6c). Such a continuous increase in soil NH + 4 also existed in the prior simulations, which would not be radically altered in the posterior simulations by setting a stricter selection criterion for R 2 . For WW residue and control treatments the measured NH + 4 content was at the detection limit, and the magnitude of the simulated NH + 4 content was either in line with the measurements or a bit higher.
NO − 3 . The simulated daily NO − 3 content captured well the magnitude of the measurements and showed small variations throughout the incubation period for all control treatments (Figs. 5d and S6d). While in most of the control treatments except for the ones with high moisture and NO − 3 amendment, the observed NO − 3 levels remained stable or slightly increased during incubation. For the treatments amended with crop residues, the simulated NO − 3 content showed a noticeable decrease throughout the period, consistent with NO − 3 being utilized as a substrate of denitrification in the simulation. Though there was no explicit measurement of NO − 3 within the residue-amended layer in the present experiment, the average NO − 3 within the entire soil core in RC treatments showed net consumption followed by a rebound (Taghizadeh-Toosi et al., 2021), consistent with observations for the RC-amended layer in a comparable incubation experiment by Lashermes et al. (2022). Given that, the simulated NO − 3 content in residue treatments exhibiting a continuous decline was probably lower than the actual values.

Cumulative gas fluxes and average mineral N content
Model predictions of cumulative N 2 O fluxes and CO 2 fluxes for the 16 treatments were significant and strongly correlated with the observed fluxes ( Fig. 6  and NO − 3 content, the estimated slopes were also close to unity, and the deviations between prediction and measurement, signified by the relative RMSEs (rRMSEs), were 46 % and 11 %, respectively. For the average NH + 4 content in the low-range, simulated values were found to overestimate the measured data (Fig. 6c).
A regression of simulated cumulative N 2 O flux residuals against observed data confirmed that underestimations were strongly (R 2 : 0.92) associated with the magnitude of observed N 2 O fluxes (Fig. S1a). The negative slope of the regression indicated an underestimate of 0.83 g N 2 O-N for every 1 g of observed N 2 O-N per square meter. A regression The blue color shows ME values not significantly different from zero by one-sample t tests (significance level α = 0.05). Treatment indices 1-4 represent treatments for mix RC, 5-8 for control RC, 9-12 for mix WW, and 13-16 for control WW, where treatment conditions are, in order, "40 % WFPS, −NO − 3 ", "40 % WFPS, +NO − 3 ", "60 % WFPS, −NO − 3 ", and "60 % WFPS, +NO − 3 ". There are no measured data for nitrate in residue treatments.
of simulated cumulative N 2 O flux residual against the residuals of other variables revealed that underestimations were not strongly associated with the residuals of simulated NH + 4 and NO − 3 (Fig. S1c-d). However, we observed that clustering of residuals for mineral N existed, in which underestimations of cumulative N 2 O flux tended to occur when soil NH + 4 was overestimated and when soil NO − 3 was underestimated. Specifically, residuals for cumulative N 2 O flux and soil NO − 3 were simultaneously underestimated in 53 % of the posterior runs as revealed by scatter points falling in the third quadrant, and underestimations of N 2 O flux were accompanied by overestimations of soil NH + 4 in 41 % of the posterior runs by looking at scattering points in the fourth quadrant.

Calibration by multiple treatments
Increasing the number of calibration treatments resulted in lower uncertainty but, at the same time, reduced the performance of posterior models for some treatments (Fig. 7). Cumulative N 2 O fluxes were better simulated for the treatments with higher observed fluxes in each group, especially treatments at 60 % WFPS, but were overestimated for others with low observed fluxes. The regression between the mean simulated and measured N 2 O flux only accounted for 47 % of the variations in the data, much lower than the level of 96 % in the single-treatment calibration procedure (Table 1). Simulated CO 2 and NO − 3 were generally close to the observed data. In the same group, simulated CO 2 fluxes showed no difference between the two levels of soil NO − 3 but not for the levels of moisture. Simulated soil NH + 4 showed a good agree- ment with the measured data in the low range of NH + 4 content but had large model deviations for the four RC residue treatments (a log-scale is used in Figs. 6c and 7c), in which the R 2 was 0.50 in contrast to 0.98 in the single-treatment calibration procedure (Table 1). The statistics rRMSE and RMSE are more sensitive to outliers compared to mean error (ME), and regarding NH + 4 , their high values for NH + 4 (rRMSE of 411 % and RMSE of 0.50 in Table 1) would be reduced to 78 % and 0.01, respectively, after removing the four RC treatments.

Simulated oxygen status and N 2 O sources
Simulated oxygen content in the soil cores was close to that of the ambient air, with the modeled volumetric oxygen content in air-filled pores ranging from 19.5 % to 20 % throughout the incubation period independent of treatment (Table S7). Still, according to the model, denitrificationderived N 2 O accounted for 76 % to 100 % of the cumulative emissions on average (Table S7).
In simulations, the 0-4 cm soil layer was treated as one uniform compartment, and this could have influenced model predictions. To investigate whether increasing the vertical resolution would change the predicted oxygen availability, the soil profile was divided into five uniform layers. The results showed that underestimations of high N 2 O fluxes still existed after this change in the stratified representation of the model (Table S6). The simulated soil oxygen profiles were still predominantly aerobic for all treatments, as in the single-layer model, but did show stratification by depth as depicted in Fig. 8. The oxygen level was the lowest at the beginning of incubation and then showed an increase over the period studied, mirroring the trends in CO 2 flux. Despite the overall aerobic conditions in the soil, the large proportion of denitrification-derived N 2 O emissions was accompanied by the rapid growth of denitrifier biomass (data not shown).

Calibrated parameters
Although the number of parameters used for calibration had been reduced by Morris screening, more than half of the 26 calibrated parameters still exhibited random distributions within the predefined ranges (Fig. S4). High intercorrelations between posterior parameters may explain why some parameters could not be constrained to an unambiguous solution. But five parameters were showing marked variability between treatments in their posterior distributions as depicted in Fig. 9, where the prior ranges of these parameters are indicated in the ordinate.
In most treatments, the parameter d hrateNxOy , representing the half-saturation constant of N concentration in the denitrification process and also known as the Michaelis constant of the enzyme, was well constrained at the lower range of the parameter boundary within 50 mg N L −1 in contrast to the mean of 250 mg N L −1 in the prior range (Fig. 9). A low d hrateNxOy relative to the physical concentration of NO − 3 resulted in a pronounced response of denitrifying bacteria activity to substrate availability (see Eq. 6.44 in Table S2). In some treatments which had no NO − 3 addition, i.e., treatment 3, 7, 9, and 13, the parameter showed more diffuse distribution and higher medians compared to other treatments (Fig. 9). An enzyme with high d hrateNxOy relative to the con-centration of substrate is not normally saturated with substrate, and thus the rate of formation of product is substratelimited.
In more than half of the treatments, the posterior distribution of cn m , the microbial C/N ratio (cn m ) involved in calculations of mineralization and immobilization, was concentrated at around 10 on average. However, for some NO − 3amended treatments which usually had higher N 2 O emission rates, i.e., treatments 4, 6, 10, 12, and 14, the distribution of the calibrated microbial C/N ratio was not well constrained but similar to the prior distribution with medians up to 20.
The parameter representing the rate coefficient for the decay of the litter carbon pool, k l1 , generally showed higher values in WW treatments than controls, and its range for the high nitrate RC treatment at 60 % WFPS (treatment 4) was markedly higher than other treatments, indicating the faster decomposition of labile organic matter. Besides, the efficiency of NO-based denitrifier respiration, d effNO , showed a low-range distribution for treatment 4. Low d effNO values induced a high respiration rate of denitrifiers carrying out NO reduction (see Eq. 6.47 in Table S2). This treatment also exhibited a low-range distribution of the efficiency of SOM decomposition, f e,l1 , associated with a high fraction of CO 2 production.

Simulation of decomposition and denitrification processes
The sensitivity analyses, investigating the impact of parameter uncertainties on model predictions, showed the importance of the parameter interactions and associations between different processes in the model (Fig. 3a-f). Compared with CO 2 emissions and soil NH + 4 content, N 2 O emissions were controlled by a larger range of parameters related to decomposition and denitrification processes and had generally lower SEEs. According to Fig. 3a, reliable information about the size of humus and litter pools and their decomposition rates will be critical in reducing the uncertainty of simulated N 2 O emissions, and this could be particularly true where part of the soil organic matter is fresh input from residues. The results confirm the findings of previous experimental and modeling studies showing the importance of substrate heterogeneity for simulations of decomposition and denitrification processes (Brilli et al., 2017;Eusterhues et al., 2003;Sierra et al., 2011). However, other studies (Dungait et al., 2012;Schmidt et al., 2011) also indicate that the chemical structure of organic molecules alone may not determine their stability in soil, and environmental and biological controls (e.g., accessibility of the SOM to decomposers, abiotic reactions, and desorption) predominate, especially in the longer term.
In our model simulations, C and N in crop residues were allocated to two labile pools, but the allocation ratio did not greatly influence N 2 O emissions, soil respiration, or mineral N. This could be related to the fact that the overall C/N ratio of crop residues was kept constant as the sizes of organic matter pools changed in the sensitivity analysis. The influence of crop residues on N 2 O emissions may be better re- Table 1. Model evaluation of cumulative gas fluxes and average mineral N content from the single-treatment calibration procedure and the multi-treatment calibration procedure. The mean values of posterior models were compared with the observed data for 16 treatments. Units are valid for the statistics of ME and RMSE.  flected in other residue properties, e.g., the C/N ratio and solubility of individual substrates (Aulakh et al., 1991;Surey et al., 2020). Furthermore, it should be noted that the addition of labile carbon from crop residues does not affect the decomposition of native soil organic matter in the model (i.e., no priming effect), as the decomposition of organic matter in labile and recalcitrant pools is calculated separately in Coup-Model, as in other process-based models. The omission of a priming effect, the importance of which has been shown in field and laboratory studies (Kuzyakov, 2010), may cause models to underestimate the effects of crop residue composition on soil C and N turnover, particularly in connection with the initial burst of gas emissions.
In the present study, denitrifier growth parameters (e.g., d effNO , d growthNO3 ) showed considerable influence on the release of N 2 O in most treatments (Fig. 3a). Our results suggested that the influences of microbial activities on N 2 O emissions varied between different denitrification steps, and the denitrifier respiration for NO reduction showed a relatively larger and broader impact across treatments than other steps. The analysis of the two statistical measures σ and µ suggested that, rather than a single factor driving the model to become more "behavioral" in predicting N 2 O emissions, the collective effects of multiple parameters were more important because one single parameter could exhibit various SEEs as other parameters changed, represented by high variability (σ ) compared to the mean (µ) in Fig. 3d. For calibration of complex models, several combinations of different parameter values might give the same goodness-of-fit between model outputs and measured variables, which is defined as equifinality (Beven and Freer, 2001). The model sensitivity to such parameters is probably attenuated in the case of high-level equifinalities. Besides, the importance of the parameter interaction structure associated with equifinality could hinder the constraint of parameters and hence the reduction of uncertainty in N 2 O simulations when limited measurement data are available. For instance, the fraction of C mineralized to CO 2 , characterized by 1 − f e,l1 , and the decay rate of litter1 (k l1 ) have a product interaction regarding the production of CO 2 (see Eqs. 6.3 and 6.4 in Table S2). Also, the denitrifier growth rates (e.g., d growthNO3 ) and the half-saturation constant of nitrogen substrates (d hrateNxOy ) influence the loss of N from anaerobic N pools by invoking microbial growth via a quotient interaction (see Eq. 6.41 and Eq. 6.44 in Table S2). The parameters which have the greatest impact on soil respiration and NH + 4 content were associated with SOM composition (SOC h ) and decomposability (k l1 , k l2 , f e,l1 , f e,l2 ), suggesting that model uncertainty for soil respiration and soil NH + 4 could be greatly reduced if experimental data for either SOM composition or decay rates were available. For simulation of soil NH + 4 , information about the microbial C/N ratio (cn m ) and denitrifier growth parameters (e.g., d growthNO3 ) is also important because the availability of soil mineral N is closely associated with decomposition dynamics and its consumption by immobilization, nitrification, and denitrification (Lashermes et al., 2022). The influence of the soil wilting point on CO 2 emissions and soil NH + 4 content was larger under dry conditions, while soil porosity had a greater effect on these two variables under wet conditions. Soil porosity and wilting point are key set points of the soil moisture response function controlling the upper and lower bounds of the function, which implies that the measurement of soil hydraulic properties could reduce model uncertainty under contrasting soil moisture levels.

Model performance and possible explanations for deviations
Overall, the performance of posterior models varied between estimated variables and treatments. The timing and magni-tude of peak N 2 O emissions were more difficult to predict than those of CO 2 emissions. The GLUE calibration greatly reduced model uncertainty associated with N 2 O gas emissions, with an average reduction of 83 % in the posterior ME range for all treatments (Fig. 4a), whereas the model still had difficulties capturing high emissions observed in the incubation (Fig. 5a). This was probably in part due to model biases in describing abiotic and biotic properties but also due to uncertainties in the measurement dataset as discussed below (see Sect. 4.4). The model performance was generally consistent with previous attempts to simulate N 2 O emissions from agricultural soils where models have been able to reproduce the cumulative emissions but had the same difficulties in describing the actual dynamics (e.g., Fuchs et al., 2020;He et al., 2016). The evaluation of model bias with respect to the slope β 1 in the linear regression of model results vs. observations (Table 1) showed that the model had a better performance in its simulation of CO 2 fluxes and soil mineral contents than N 2 O fluxes. The major mismatch between the model and measurements was associated with the underestimation of N 2 O flux during the first 3 d of the experiment. Nevertheless, our model showed initial success by better capturing the timing of N 2 O for mix RC, particularly for "60 % WFPS, +NO − 3 " (R 2 >0.4, Fig. 5a) than other treatments. According to Gaillard et al. (2018), who evaluated the simulated N 2 O flux from three process-oriented models (DNDC, Day-Cent, and EPIC), an underestimation of 0.01-0.93 kg N 2 O-N ha −1 for every 1 kg of observed N 2 O-N ha −1 across these models was reported. When Fang et al. (2015) applied four different algorithms to an irrigated corn field, they found that all algorithms underestimated the four highest cumulative N 2 O fluxes among eight N fertilizer treatments and explained this with an underestimation of soil water content. In the present study, parameter-induced biases may have been an important source of simulation uncertainty as shown in Fig. 5, as some microbial coefficients were loosely constrained, with more denitrification-related parameters than other process parameters (Fig. 3). So far very few studies have reported parameterization and calibration of C and N cycling for incubation experiments. Our identified parameter sensitivities and constrained parameters could provide a reference for future modeling of similar systems. More discussion of parameter calibration is presented in Sect. 4.4.
The unamended soil treatments (controls) were well described by the model, indicating that the repacked and pre-incubated soil was a suitable representation of normal soil conditions. For amended soils, a rapid increase in soil NH + 4 was observed by day 1, while the modeled NH + 4 tended to have a delayed increase for most residue treatments (Fig. 5c). According to Lashermes et al. (2022), the red clover residues contained water-soluble N (WSN) corresponding to ca. 1 g N m −2 , which can account for the observed increase. Cutting and mixing residues into the soil probably accelerated the release and mineralization of N from the WSN pool compared to a real field situation (Angers and Recous, 1997), challenging the model that was calibrated to describe decomposition in natural environments (see Sect. 4.4 for more discussion). It also stresses the importance of realistic experimental setups for model parameterization under controlled laboratory conditions. The accumulating trend of NH + 4 simulated for the RC residue treatments was in contrast to the transient NH + 4 peaks observed (Fig. 5c), which indicates that the modeled NH + 4 release from residue decomposition was greater than the NH + 4 consumption by microbial immobilization and nitrification that was simulated. By setting stricter selection criteria of ME and R 2 regarding NH + 4 , such a trend could be avoided, but this was accompanied by decreased N 2 O emission rates owing to reduced net N mineralization (data not shown). In our study, the relationships between N 2 O flux residuals and the residuals for mineral N were weak and not significant, indicating that the N 2 O underestimation at high flux ranges may also be due to other factors.
Inaccurate estimation of proximal factors such as soil water content and temperature by the pedo-climatic subroutines has been put forward as a main cause of errors in simulating C and N emissions in many process-based models (Brilli et al., 2017), but in the present study, the soil water content and temperature were constant during incubation. Instead, a heterogeneous distribution of water within residue-amended soil cores could be a problem for the description of soil water content in the model because water retention capacity in the soil might be altered by the practice of adding crop residues. Lashermes et al. (2022) found that adding crop residues to soil increased the average WFPS of this layer from 60 % to 63 %. Also, Kravchenko et al. (2017) found that specific gravimetric moisture of plant residues in soil could vary in the range of 60 % to 220 % and that residues were characterized by high moisture even at low soil water contents. Hence, the main effect of crop residues on the abiotic soil environment is probably not the marginal change in the average soil moisture content but more likely the co-occurrence of elevated water content and labile C and N within the soil core. Residue fragments with high water retention capacity could represent microenvironments markedly different from those of the bulk soil and promote N 2 O emissions (Kravchenko et al., 2017).
Model results indicated that the simulated O 2 content of soil air at 0-4 cm depth changed only slightly during incubation and was close to the saturation partial pressure in the atmosphere owing to faster diffusion supply compared to soil respiration rates (Fig. 8). However, most O 2 consumption likely occurred in the microenvironment around residue particles. This is supported by observations of O 2 concentration profiles in soil using O 2 microsensors (Markfoged et al., 2011) and planar optodes (Kravchenko et al., 2017), which showed the aerated O 2 partial pressure in the soil matrix away from organic hotspots and steep gradients in O 2 between bulk soil and hotspots of manure and residues, respectively. Model simulations found that denitrification was the major process that produced N 2 O in the experiment, accounting for 76 % to 100 % of the total estimated N 2 O emissions. Parkin (1987) found that a thin water film, even as little as 20 µm, could be enough to deplete air and support denitrification at the surface of decaying litter, and it is thus possible that the observed high N 2 O fluxes were produced via denitrification despite a high average oxygen status within the soil. However, existing process-based models do not describe the heterogeneity in physical and biochemical processes caused by organic amendments, which may limit the ability of these models to reflect the microscale anaerobiosis and SOC availability, as well as to predict peak N 2 O emissions such as those observed in RC treatments (Fig. 5a). Some studies have explored methods to incorporate spatial variability into denitrification models, although conceptual frameworks considering effects of heterogeneity on greenhouse gas emissions have only in recent years emerged and gained attention (Sihi et al., 2020). Using a stochastic modeling approach, Parkin (1987) found that the patchy dispersion pattern of high denitrification microsites was a major factor influencing the overall rates of denitrification. Based on a parsimonious numerical model, Sihi et al. (2020) used probability distribution functions to represent soil microsite production and consumption of three greenhouse gases, which explained occasional observations of simultaneous N 2 O uptake (reduction) and CH 4 uptake (oxidation) that were not typically captured by other models. We suggest that model development should improve the representation of microscale processes in soil, for instance by parameterizing the distribution and extent of heterogeneity in, for example, organic amendments and clay content and by establishing the degree of anaerobiosis associated with organic hotspots and bulk soil separately.
The simultaneous underestimation of N 2 O and NO − 3 in model runs (Figs. 5d and S1) may have been due to an incomplete description of nitrate supply in the residue-amended 0-4 cm soil layer. In a separate incubation experiment using the same soil and experimental design and several of the same treatments, Lashermes et al. (2022) found that adding RC residue to the 0-4 cm soil layer induced a decrease in the NO − 3 content of the unamended 4-8 cm depth layer, indicating that NO − 3 removal in the upper, amended layer during denitrification caused a diffusive mass transfer between the two layers. In the current model framework, solute transport is only possible by convection (driven by water flow) and does not include diffusion driven by concentration gradients. Under field conditions, this spatial-scale infiltration is the main mechanism for solute transport between compartments, but for a period after organic amendments, diffusive NO − 3 supply from the bulk soil can be the most important source of electron acceptor for denitrification, as observed in earlier incubation studies Petersen et al., 1996). The current solute transport process may thus not be sufficient to properly simulate N 2 O production on small scales, especially under low flow rates or for short travel dis-tances where diffusive flux becomes increasingly important (Flury and Gimmi, 2018).
The differences in the posterior parameter distributions provided information about the simulation of C and N processes between treatments. For example, the microbial biomass C/N ratio (cn m ) was constrained within 10 in most treatments, consistent with observations that, on average, the C/N ratio of the soil microbial biomass varies between 6 and 10 (Xu et al., 2013) and does not easily change with litter quality (Spohn, 2015). Fungal cells typically have a C/N ratio ranging from 10 to 15, while bacteria range from 3.5 to 7 (Paul, 2007). In some treatments associated with extra NO − 3 input and high N 2 O emissions, however, the microbial C/N ratios in accepted runs exhibited relatively high values closer to the soil-residue-mixture C/N ratio. According to Eqs. (6.7) and (6.8) in Table S2, a relatively high cn m coincides with simulations of low humification (i.e., less labile C and N converted to recalcitrant matter), as well as intense N mineralization (i.e., more organic N in litter pools converted to NH + 4 ). At the same time, the relatively low values of estimated Michaelis constant d hrateNxOy suggested a high microbial affinity for soluble nitrogen oxides. These trends all promote N availability for denitrification, and we propose with reference to Lashermes et al. (2022) that C/N ratios above 10 were related to an inaccurate assumption that there was no diffusive supply of NO − 3 from the 4-8 cm soil layer. Including solute diffusion in the model may be able to change the posterior distributions of both parameters by better mimicking the mineral N supply.
Compared to control soils, C and N dynamics in residueamended soil, especially at the high moisture level (Fig. 5a), were difficult to simulate. In the experiment, rapid organic matter turnover in the residue-soil mixture was possibly caused by a high concentration of decomposer microorganisms associated with residue fragments. As discussed above, the rapid decay of organic matter (k l1 , k l2 ) and high CO 2 formation rate (f e,l1 , f e,l2 ) in the calibrated parameters in these amended treatments reflected the need to mobilize N for nitrification and denitrification processes to simulate observed N 2 O emissions from amended soils.
We also noted that the model deviations for N 2 O flux were not caused by the spatial resolution of the vertical soil profile, which has been a problem in some studies (e.g., Xing et al., 2011), as model performance concerning N 2 O prediction was not improved in the multi-layer model (Table S6) where the one-layer soil profile had been sub-divided into five layers for simulations. This was not surprising since structural uncertainty was not addressed in this way; for example, increasing the number of layers would not reflect the microscale processes associated with crop residue fragments and soil aggregates, nor would it address the missing description of solute diffusion between interfaces.

Treatment effects
We only investigated how the model responded to the specific change for two soil moisture and NO − 3 levels, but the results we obtained after calibrating the model against multiple treatments indicated that it is difficult to predict N 2 O emissions under varying soil environmental conditions using a common model parameterization (Fig. 7). In the multiple treatment calibration, similar cumulative N 2 O fluxes were simulated for treatments with the same NO − 3 level regardless of the soil moisture level, which was different from observations. For instance, in the experiment with RC treatments, higher N 2 O fluxes were associated with the higher WFPS level (60 %) rather than with the higher NO − 3 level but with a strong interaction between the two factors (Taghizadeh-Toosi et al., 2021). However, single-treatment calibration showed more success to simulate the effects of different soil moisture levels on N 2 O fluxes. In a recent study, Grosz et al. (2021) also found it difficult to describe treatments in incubation studies with three N 2 O models (DNDC, CoupModel, and DeNi) that responded to controlling factors in the same direction as measurements with frequencies from only 19 % to 67 %. In CoupModel, soil moisture effects have indirect effects on denitrification through decomposition, nitrification, and gas diffusion processes in which the soil moisture response function and its association with anaerobic conditions were accounted for. Hence, soil moisture may have had fewer or delayed effects on the N 2 O emissions in model applications than in soil incubations with non-uniform distribution of moisture and C and N sources. The model deviations in our study, obtained by calibration of multiple treatments, suggested that potential limitations in model assumptions or the description of mechanisms were more important for unsatisfactory model responses than parameterization. It should be noted, however, that only two moisture levels were included in the experimental setup, and in the absence of a more detailed moisture gradient, we were unable to draw more concrete conclusions about the model's performance in simulating soil moisture effects.
On the other hand, with the same type of residue amendment, treatments with higher cumulative N 2 O fluxes -either high soil moisture level or high N input -were better simulated by the model (Fig. 7a). This can be understood from the characteristics of the calibration dataset and selection criteria. The high flux samples represented only a minor fraction of the total samples (i.e., 40 sampling points) in each group but were higher than the rest of them by orders of magnitude (Fig. 5a). The application of the ME criterion mainly constrained model deviations for the high fluxes in each dataset and less so for minor fluxes. It may be argued that this limitation could be improved by applying more stringent additional criteria such as R 2 . However, this would reduce the acceptance rate or even refuse all posterior runs. Interestingly,  obtained similar results in a GLUE context by using the Nash-Sutcliffe-based likelihood and stormwater measurements with large internal variability, and they concluded that the choice of selection criteria should be based not only on their mathematical features but also by looking at the characteristics of the available data.

Improving modeling practice and experimental design
Some reasons for deviations between model simulations and experimental results were already discussed, but possibly other microbial and physiochemical processes are also not well described by current modeling practices. For example, enzymatic processes accounting for N 2 O production and reduction could change progressively during incubation, which is not reflected in fixed parameter values, as in this study. Khalil et al. (2005), coupling model and soil batch experiments, demonstrated the importance of an increase in the ability of denitrifiers to reduce N 2 O for correct simulation of N 2 O dynamics and variations in N 2 O / (N 2 O + N 2 ) ratios, especially when anaerobic conditions lasted more than 1 d. Microbial biomass and enzyme production can also vary under field conditions. Davidson et al. (2012) found that allowing model parameters to vary seasonally was more effective for modeling field data than calibrating parameters fixed at constant values to the entire annual dataset. In view of these reports, we tried to test the sensitivity of simulated N 2 O emissions to dynamic parameters using the default setup as a benchmark. Two parameters, d effNO and d effN2O , were selected to simulate an increase with time in the ability of denitrifiers to produce and reduce N 2 O, respectively, by assigning different values before and after day 3. The results (Fig. S7) show that the time course of N 2 O emissions could be manipulated, with an intensification of the N 2 O reduction activity over time changing the timing of an emission peak. It indicates that the use of fixed model parameters for short-term incubation experiments could lead to systematic errors if biotic properties such as the activity of enzymes or the affinity for a substrate of soil microbes vary significantly.
The rapid N mineralization in RC treatments during the early stage of incubation (Fig. 5c) may have been associated with disturbances in the experimental setup (see Sect. 4.2), which is beyond the description of the model framework. We conducted a simple sensitivity test with initial NH + 4 content on treatment 4 using default parameters and one calibrated parameter combination from the posterior ensemble. We found that by setting the initial NH + 4 content to different values between the observations on day 0 (value taken from the corresponding control) and day 1 to account for the rapid mineralization of WSN, simulated N 2 O in the early stage could be enhanced (Fig. S8), and the impact was linked to parameter values. When using process-based models to simulate such systems, the measurements taken shortly after experiment initialization can be more suitable model inputs compared to the values from the starting point, and thus frequent measurements in the early stage of experiments are required.
We also found that our capacity to evaluate model performance was limited by the data available for model estimation and calibration. Firstly, some model parameters were not assessed in the incubation experiment (e.g., soil and residue labile C content and microbial biomass), and their values were either estimated or determined by calibration. Proximal chemical analysis related to these properties of soil samples could help reduce model uncertainties. Secondly, the quality and temporal resolution of the measurements of controlling factors such as NO − 3 and NH + 4 were limited. We used mean values of recalculated measurements to compare with model simulations, in which any measurement error was included in parameter ranges. Having explicit measurements in each layer (i.e., soil-residue mixture and bulk soil layer) of incubated soil cores and having higher measurement frequencies of mineral N and gas emissions within the first 24 h of the experiment are promising tools to improve the reliability of experimental data for modeling. Thirdly, in soil N 2 O modeling, measuring more ancillary variables regarding N cycling (e.g., N 2 , NO) and using state-of-the-art techniques (e.g., 15 N gas flux methods) could help test the validity of model principles (Grosz et al., 2021). Although collecting all data types discussed here is not always possible or practical, we consider that reporting ancillary model outputs regarding N cycle processes even in the absence of observations, particularly the denitrification products, soil oxygen content, and anaerobic fraction, will aid model evaluation efforts, which was not done very often in previous studies. Accordingly, closer collaboration between modelers and experimentalists is essential to increase mutual understanding, enforce better modeling practices, and find new knowledge by merging data and modeling.

Conclusions
The current setup of CoupModel, when applied to results from an incubation study, indicated that parameters associated with the decomposition of SOM and denitrifier growth were important in regulating soil respiration and mineral N dynamics. A high level of parameter interaction and equifinality issues that exist in simulating N 2 O emissions has brought us challenges in quantifying parameter sensitivities and improve parameter constraints.
The parameters showing posterior distributions that differed from the prior distributions revealed specific modeled microbial processes between treatments and may be used as references behind observations. For example, in the treatments with NO − 3 addition, the relatively low values of d hrateNxOy in the posterior distribution suggested a high microbial affinity for soluble nitrogen oxides, accelerating microbial denitrification. More intense SOM decomposition was simulated in residue treatments compared to the "control" simulations.
The uncertainty analysis demonstrated that a model bias towards underestimating high-range daily and cumulative N 2 O fluxes was likely associated with a simplified description of microbial processes in mineral N dynamics. While the simulated soil respiration response to soil moisture was generally in line with the direction of measurement, the modeled N 2 O emissions were not as sensitive to the WFPS as the measured data, but further assessment of this requires experimental data with more moisture levels. We discussed potential model structure uncertainties based on current model outcomes (i.e., the prediction bias, oxygen profiles, N 2 O sources, and moisture effects) and the emerging knowledge in recent experimental studies. Several suggestions for model improvement were described, including the use of new parameters and equations to represent microscale heterogeneity, as well as a re-examination of the effects of soil moisture on denitrification processes with the assistance of more experimental data. Under the current modeling framework, allowing for dynamic microbial parameters in calibration and careful consideration of initial mineral N conditions may assist in better model representation of microbial and physiochemical processes in the context of incubation experiments, as well as N application in field conditions. Generally, we conclude that modeling N 2 O emissions in controlled experiments is useful to identify the need for prior knowledge in both basic (e.g., decomposability of SOM) and elaborate (e.g., denitrifier growth) aspects of the processbased model for reducing the uncertainty of N 2 O flux estimates. Moreover, we identified potential model deviations from observations and discussed future steps that may be required to assess their sources. We believe there is a need to revisit basic model assumptions and improve the experimental design that enables more effective modeling practices, model evaluation, and comparison.
Code availability. The codes and descriptions of the models are available from the authors on request.
Data availability. The laboratory experiment and the modeled gas flux data are available from the authors on request.
Author contributions. JZ and SOP conceived the study. JZ designed the modeling experiments and carried them out, with contributions from WZ and SOP. PEJ made a version of CoupModel available for the study and provided advice for model application. JZ prepared the manuscript and figures with contributions from all co-authors.