Parameter-induced uncertainty quantification of soil N 2 O , NO and CO 2 emission from Höglwald spruce forest ( Germany ) using the LandscapeDNDC model

K.-H. Rahn, C. Werner, R. Kiese, E. Haas, and K. Butterbach-Bahl Karlsruhe Institute of Technology, Institute for Meteorology and Climate Research, Atmospheric Environmental Research, Kreuzeckbahnstr. 19, 82467 Garmisch-Partenkirchen, Germany now at: Biodiversity and Climate Research Centre (BIK-F), Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, 60325 Frankfurt am Main, Germany

Abstract.Assessing the uncertainties of simulation results of ecological models is becoming increasingly important, specifically if these models are used to estimate greenhouse gas emissions on site to regional/national levels.Four general sources of uncertainty effect the outcome of processbased models: (i) uncertainty of information used to initialise and drive the model, (ii) uncertainty of model parameters describing specific ecosystem processes, (iii) uncertainty of the model structure, and (iv) accurateness of measurements (e.g., soil-atmosphere greenhouse gas exchange) which are used for model testing and development.
The aim of our study was to assess the simulation uncertainty of the process-based biogeochemical model Land-scapeDNDC.For this we set up a Bayesian framework using a Markov Chain Monte Carlo (MCMC) method, to estimate the joint model parameter distribution.Data for model testing, parameter estimation and uncertainty assessment were taken from observations of soil fluxes of nitrous oxide (N 2 O), nitric oxide (NO) and carbon dioxide (CO 2 ) as observed over a 10 yr period at the spruce site of the Höglwald Forest, Germany.By running four independent Markov Chains in parallel with identical properties (except for the parameter start values), an objective criteria for chain convergence developed by Gelman et al. (2003) could be used.
Our approach shows that by means of the joint parameter distribution, we were able not only to limit the parameter space and specify the probability of parameter values, but also to assess the complex dependencies among model parameters used for simulating soil C and N trace gas emissions.This helped to improve the understanding of the behaviour of the complex LandscapeDNDC model while simulating soil C and N turnover processes and associated C and N soil-atmosphere exchange.
In a final step the parameter distribution of the most sensitive parameters determining soil-atmosphere C and N exchange were used to obtain the parameter-induced uncertainty of simulated N 2 O, NO and CO 2 emissions.These were compared to observational data of an calibration set (6 yr) and an independent validation set of 4 yr.
The comparison showed that most of the annual observed trace gas emissions were in the range of simulated values and were predicted with a high certainty (Root-mean-squared error (RMSE) NO: 2.4 to 18.95 g N ha −1 d −1 , N 2 O: 0.14 to 21.12 g N ha −1 d −1 , CO 2 : 5.4 to 11.9 kg C ha −1 d −1 ).However, LandscapeDNDC simulations were sometimes still limited to accurately predict observed seasonal variations in fluxes.

Introduction
Trace gas emissions (N 2 O, NO and CO 2 ) from soils of terrestrial ecosystems are highly variable in space and time due to the interplay of climatic drivers (mainly rainfall and temperature) and various ecosystem processes involved in C and N transformation and associated production and consumption Published by Copernicus Publications on behalf of the European Geosciences Union.
of trace gases.Therefore, quantification of the annual sink or source strength of soil greenhouse gases (GHG) is still a challenge.For sound estimates at site scale, measurements are labour and cost intensive since they should be carried out at high temporal resolution covering full annual cycles (Kiese et al., 2005;Werner et al., 2006).For that reason quantification of soil GHG emission on regional/national scale cannot solely depend on measurements, but needs to follow an integrated measuring and modelling approach.In recent years, an increasing number of biogeochemical models were tested on site scale and, after sound validation, were applied in a coupled GIS model approach for regionalisation of soil GHG emissions (Del Grosso et al., 2006;Kesik et al., 2006;Pathak et al., 2005;Li et al., 2004;Salas et al., 2007;Potter et al., 1996;Butterbach-Bahl et al., 2001;Kiese et al., 2005;Werner et al., 2007).This approach is in line with the IPCC recommendations and requirements to develop improved inventories by use of biogeochemical models.However, the so-called Tier 3 approach includes not only up-scaling of GHG emissions, but also the obligation to perform uncertainty quantification of the simulation results.
Uncertainty of model predictions can be classified into four categories: (i) uncertainty of information used to initialise and drive the model (Vrugt et al., 2008;Wikle, 2003), (ii) uncertainty of model parameters (e.g., describing specific ecosystem processes) (Vrugt et al., 2003), (iii) uncertainty of the model structure (Refsgaard et al., 2006) and (iv) accurateness of measurements (e.g., soil-atmosphere greenhouse gas exchange), which are used for model improvement and development (e.g.van Oijen et al., 2005).Uncertainty estimates in many modelling studies that investigate the soilatmosphere exchange of trace gases only cover the assessment of uncertainty imposed by input data (e.g.Li et al., 2004;Werner et al., 2007;Winiwarter and Rypdal, 2001;Kiese et al., 2005).Due to the high complexity and large number of model parameters, work focused less on uncertainty related to model parameters as the computational demand of complex models is high and often model adaptations are required to allow application of statistical methods.
The Bayesian approach was increasingly used to quantify model parameter uncertainty on simulation results of process-based models in recent years.The Bayesian theorem was used for calibration and uncertainty assessment of parameters of dynamic process-based forest models mainly focusing on carbon turnover (van Oijen et al., 2005;Svensson et al., 2008;Klemedtsson et al., 2008) and more recently also for parameters involved in production, consumption, transport and emissions of soil GHGs (e.g.Lehuger et al., 2009).To our knowledge van Oijen et al. (2011) is the only study so far comparing four process-based biogeochemical forest models within a Bayesian model comparison framework.In contrast to such a model inter-comparison, the aim of this study is to provide deeper insights into the individual parameter uncertainty and calibration of the model LandscapeD-NDC and the subsequent uncertainty of simulated trace gas exchange.The parameter distribution, which was estimated after an objective multi-chain convergence check, was additionally tested on a validation dataset.
LandscapeDNDC is a process-oriented biogeochemical model, which simulates the biosphere-atmosphere exchange of greenhouse gases on the basis of the simulation of all major ecosystem C and N cycling processes (Haas et al., 2012;Werner et al., 2012).
We used a time series covering 10 yr of soil-atmosphere trace gas fluxes as observed continuously in sub-daily time resolution at the Höglwald spruce forest, Germany (e.g.Butterbach-Bahl et al., 2002;Wu et al., 2010) to assess the model parameter uncertainty of the LandscapeDNDC model.
Results of the Bayesian calibration approach can be used to gain insights into the complex parameter dependencies, to identify weaknesses in process descriptions and to narrow the range of likely model parameter values, which finally reduces uncertainty of the simulation results.

Model description and model parameter selection
The LandscapeDNDC model applied in this study is a derivate of the DNDC model family (Li et al., 1992(Li et al., , 2000) ) and was further developed from the MoBiLE model framework (Grote et al., 2009(Grote et al., , 2011)).LandscapeDNDC incorporates functions of DNDC (agricultural sites) and PnET-N-DNDC/Forest-DNDC (forest sites), which were initially set up to predict soil carbon and nitrogen biogeochemistry with a specific focus on the simulation of soil N trace gas emissions (Li et al., 2000;Stange et al., 2000;Butterbach-Bahl et al., 2001;Kiese et al., 2005;Kesik et al., 2005;Werner et al., 2007).LandscapeDNDC integrates different modules for describing soil environmental conditions (temperature, moisture, pH, nutrient availability and anaerobic volume fractions), soil-chemistry integrating microbial C and N turnover processes (mineralisation, nitrification and denitrification) and associated C and N trace gas emissions (e.g., N 2 O, NO and CO 2 ) as well as vegetation dynamics (Grote, 2007).It also offers a flexible initialisation of vegetation and soil properties and efficient multi-site calculations that ease regional applications as well as sensitivity and uncertainty studies (Haas et al., 2012).
Each module includes parameters derived from physical and chemical principals and laboratory measurements.In this study, we focus on the analysis of parameter-induced uncertainty quantification stemming from the soil-chemistry module describing all soil processes relevant for C and N trace gas production, consumption and transport, being crucial for the simulation of soil-atmosphere GHG exchange.Here, we do not consider model parameters of other modules e.g., for plant growth and soil water cycling modules in order to reduce complexity and degrees of freedom and to increase the efficiency of the calibration process.However, these modules were tested and calibrated in recent studies (e.g.Kiese et al., 2011) and are run using default parameters.
The soil-chemistry sub-module in total holds 67 parameters, mostly describing biological kinetics of nutrient turnover and transformation by growth and death of different types of microbes (e.g., nitrifiers and denitrifiers).Parameter values are generally derived from laboratory measurements and expert knowledge, if detailed information is not available.This introduces different levels of uncertainty, which need to be quantified and requires calibration.
The model parameters can only be estimated and optimised by an inverse calibration technique (cf.Vrugt et al., 2003), which compares model simulation output by using randomly selected model parameter vectors with measured observations.The observational data used was collected at the Höglwald spruce forest, Germany, covering the years 1994 to 1997 (Papen and Butterbach-Bahl, 1999;Gasche and Papen, 1999) and 2002 to 2003.The remaining observation period (years 2004 to 2007) was used for validation purpose and finally for assessing the prediction uncertainty.
Each parameter included into the uncertainty analysis adds a new dimension in the parameter space.Therefore, computational cost rises tremendously with the increasing number of parameters while efficiency of the calibration technique decreases.Furthermore, correlations among parameters become more likely by increasing the number of parameters.This subsequently leads to slower convergence rates (requiring additional iterations), as parameter vectors, which do not comply with these relations, are less likely to be accepted by the Bayesian algorithm (cf.Gilks et al., 1996).Additionally, a higher degree of freedom exist, i.e., parameter configurations producing similar outputs may not be unique.To avoid these obstacles we used a sensitivity analysis (Saltelli, 2008) developed by Morris (1991) prior to the Bayesian calibration method.This helps to restrict the analysis to the most influential parameters and to avoid over-fitting effects.
The method introduced by Morris is an efficient tool for parameter screening, since it can easily be implemented and computational demands are low (van Oijen et al., 2011).The method varies parameter values and finally produces a ranking of the model parameters based on their impact on the simulated model output of C and N trace gas emissions and soil moisture.
This procedure divides each parameter range in n (here n = 6) equidistant levels, starts with a random parameter vector using these levels and randomly changes one parameter after another to one of the other levels (1 iteration).Differences in model output are stored and used to rank the model parameters according to their influence on the simulation output.Since the trajectory of parameter changes per iteration is randomly selected m times (here m = 50), the method spans the parameter space better than a "one-parameter-at-a-time approach" (see Hamby, 1994).The model parameters, which produce largest differences (i.e., having highest sensitivity on the output variable), are regarded as the most influential ones.
We initialised and ran the model with specific site information (soil, vegetation and climate) of the Höglwald spruce forest to identify the most sensitive parameters of LandscapeDNDC affecting soil C and N fluxes.This approach does not require a comparison of simulated emission to measurements, since the sensitivity analysis only focuses on parameter-induced changes of model output.Parameter sensitivities were calculated separately for the output variables of soil N 2 O, NO and CO 2 emissions, which finally resulted in three different parameter-ranking lists.We selected the first 20 most influential parameters of any list, thereby considering the trade-off between over-parameterisation and under-representing significant processes.Due to close linkage of C and N cycling and in particular NO and N 2 O emission there was a good overlap of the most sensitive parameters.This led to a overall selection of 26 parameters (see Table 1).
We regressed the stored model output (a) to all parameters and (b) to the reduced parameter subset and compared the adjusted coefficient of determination R2 of both linear regressions (cf.van Oijen et al., 2011) to evaluate, whether the reduced parameter set accounts for most of the models behaviour.The results show that for N 2 O and CO 2 more than 90 % and for NO 65 % of the models linear behaviour is explained by the subset of the parameters.We regard these numbers to be sufficient for continuing the Bayesian calibration approach with the restricted parameter set and at the same time assure a balance with calibration efficiency, which will be reduced when introducing more parameters as already stated before.Following the selection of the most sensitive model parameters, the joint parameter distribution given the data was estimated by means of a Bayesian calibration.From this distribution, parameter values can be sampled to perform simulation runs and finally address the frequency distribution of simulation results.See Fig. 1 for an illustration of the workflow.

Bayesian calibration
In a standard frequency approach the parameter value is not regarded as a random variable.The used parameter value is either the true value or it is not (see Ellison, 1996).Therefore, a Bayesian approach is needed (Clark, 2005;van Oijen et al., 2005;Klemedtsson et al., 2008;Gelman et al., 2003;Reinds et al., 2008;Lehuger et al., 2009) since it models the parameter vector θ as a random vector, which allows a direct quantification of the probability of a certain parameter realisation/range.
The probability density of a parameter value θ given the measurement D (posterior) is: p(θ|D) .
( By using Bayes theorem, the posterior is proportional to the product of the likelihood p(D|θ) and the prior density p(θ): (2) The prior, describing the a priori knowledge on parameters, is determined by using literature data and biogeochemical principles to address the most likely parameter value and to constrain the range of a parameter.We use an uninformed prior (uniform distribution) ranging between provided minima and maxima for the given parameter as derived from expert knowledge or laboratory and field experiments.The likelihood, the only unknown term, describes the probability of a data realisation for a particular parameter vector.We assume the difference D − M between data D and model M to be normal distributed, hence the likelihood is (van Oijen et al., 2005): (3) Since this term cannot be solved analytically, a Metropolis algorithm (Metropolis et al., 1953) generates a Markov chain, which samples from the posterior distribution after convergence of the chain (see next section for convergence criteria).
Although the applied LandscapeDNDC model was run on daily time-step, for the Bayesian calibration, daily simulated as well as measured trace gas fluxes were aggregated to weekly means in order to avoid that minor temporal lags (1-2 days) between daily measured and simulated peak emissions penalise likelihood calculations.Using this approach (see also van Oijen et al., 2011) we find a balance for increasing acceptance-rates resulting in a more conservative estimation of the parameter uncertainties while still representing the temporal dynamic of C and N trace gas emissions.Furthermore, reduction of data by using weekly rather than daily fluxes, helps to prevent asymptotic collapse of the posterior and an overreliance on the information of the data (Arhonditsis et al., 2008).Although within the calibration procedure, model performance evaluation was based on weekly aggregated data the uncertainty quantification was done on a daily simulation time step.We also tested Bayesian calibration of parameters by using monthly aggregated data.However, this resulted in a significant flattening of the daily simulated Fig. 1.Schematic view of the workflow for assessing the uncertainty of simulated soil GHG emissions while using LandscapeDNDC.After reduction to influential parameters by means of a sensitivity analysis, the distribution of the model parameters was estimated using a Bayesian calibration.Subsequently, an uncertainty quantification of simulated emissions was carried out using 20 000 samples out of the 533 000 post burn-in realisations of the parameter distribution stored in a relational database.
emission pattern which was not the case with the parameter set of weekly aggregated data (Fig. 6).
In order to increase computation efficiency, we run the model in parallel for the six simulated calibration years on a High Performance Computing (HPC) Linux cluster.

Criteria to define convergence while using a multi-chain approach
As it is not possible to draw any statistical inference from the sampled parameter vectors if the Markov chain has not converged (Gilks et al., 1996), we used four independent Markov chains (differing only in the individual parameter starting points) and tested for convergence at each iteration step.When convergence was reached (end of "burn in phase"), the previous parameter samples were discarded and all following data were included in the further analysis.
To quantify convergence, Gelman et al. (2003) introduced the measure R which compares the variances of each chain (within sequence variance, Eq. 4) to the joint variance of all chains (between sequence variance, Eq. 5) In the process of convergence the measure R = to reach exactly 1.0, a threshold of 1.2 is introduced as the acceptance threshold (Kass et al., 1998).By using four chains, our implementation spreads the model to 24 CPUs (4 chains × 6 separate simulation years) using the Message Passing Interface (MPI).After 1000 iteration steps the Gelman/Rubin statistic was calculated and continuously updated until convergence (according to R) of chains.In our setup, burn-in of all parameters was completed after 31 656 iteration steps.After a visual inspection of the marginal distributions, we decided to continue the Markov chain, as one parameter (EFFAC) showed a bimodal distribution for two chains, but only one mode for the remaining chains.After additional 133 000 iterations the marginal posterior distribution of EFFAC had the same shape for all four chains.The Gelman Rubin Statistic was at the same time well below 1.1 for all parameters.
The acceptance-rates of the four chains ranged from 14.1 % to 15.8 % (using a step-width of 0.04).These are reasonable values taking into account the large dataset (6 yr of data in daily time resolution and 3 target variables: CO 2 , N 2 O and NO) and, therefore, a rather strict rejection step due to a narrow-shaped posterior (Arhonditsis et al., 2008;Clark, 2005;Rahn et al., 2011).

Effective data storage
The study design and computational setup lead to substantial amounts of data, which need to be efficiently handled within subsequent data analysis.For that reason a interface to a relational database was developed using Structured Query Language (SQL) which warranted a concurrent access and high data integrity.

Estimating simulated soil GHG flux distributions
In a second step the posterior distribution of the 26 parameters was used to quantify the uncertainty of LandscapeD-NDC simulations for soil N 2 O, NO and CO 2 emissions of the Höglwald Forest spruce site (see Fig. 1).For this, we used a total of 20 000 posterior-parameter vectors (posteriorsamples) by selecting every 26th parameter vector out of the 532 000 posterior-parameter samples of the four chains (133 000 for each chain) until 20 000 parameters were taken.We, thereby, reduced dependencies between parameter vectors of consecutive iterations (Kass et al., 1998;Toft et al., 2007), which arose as each parameter vector of the posterior distribution had been taken dependent on its predecessor during the calibration process.
Following the selection of the posterior-samples, we executed LandscapeDNDC with the parameter realisations for the calibration set (years 1994 to 1997 and 2002 to 2003) and an independent validation dataset (years 2004 to 2007).As a result, we obtained distributions (including associated uncertainty) of simulated soil N 2 O, NO and CO 2 emissions.
The root-mean-squared error (RMSE) is used to quantify the difference between measurements and simulations.Therefore, we defined the distance of measurements to the distribution of the simulations as the minimum of the distances between the measurements and the two boundaries of the credible interval or 0 whenever the measurement is within the range of the credible interval.The RMSE of the best simulation (RMSE(θ MAP )) is calculated using the common definition.

Posterior parameter distribution
An estimate for the posterior distribution of the 26 most sensitive LandscapeDNDC parameters for simulation of soil N 2 O, NO and CO 2 emissions was obtained by using Bayesian calibration technique and initial information on the likely range of the selected parameters.To illustrate common features of the obtained marginal posterior distributions, we present a subset of four model parameters (see Fig. 4 and Table 2).For each marginal histogram, 4×133 000 post burn-in chain steps were used.
In the first histogram (Fig. 4a) the marginal distribution of the parameter KN2O, the loss rate of N 2 O during nitrification, is displayed.The prior parameter uncertainty (SD prior = 0.026) was reduced substantially (SD post = 0.005) and the most probable value of the right-skewed distribution is in a narrow region between 0.002 and 0.008.
After the first 30 000 iterations, the marginal distribution of the parameter EFFAC (describing the partitioning of CO 2 and DOC production during microbial decomposition of organic matter) showed a bimodal shape for two out of the four chains.However, after 165 000 iterations the chains fi-nally all sampled from the right-skewed distribution shown in Fig. 4b.As a result, the convergence rate (time needed for convergence to the posterior distribution), was low and consequently the number of iterations was high.
Figure 4c displays the posterior distribution of the parameter KHDC L. This parameter is the decomposition constant for the labile humads pool (death microbial biomass).For this model parameter the posterior distribution is flat, i.e., all values across the explored range are of similar probability.Here, the uncertainty of the initial parameter could not be reduced significantly by the Bayesian calibration and only values approaching zero are less likely than others.
An example for a left-skewed distribution of a parameter is given in Fig. 4d, in this case of KRCL.KRCL is the decomposition constant for the labile litter pool.Although there is a tendency for higher values, smaller values can still occur depending on the values of the other 25 parameters.In conclusion, the uncertainty of the parameter KRCL is reduced, however, not as much as compared to KN2O.
A correlation analysis for the 26 selected parameters revealed for most pair-wise constellations no relevant correlations.This might be due to the large number of sampling points over the entire parameter space (see Fig. 9).Higher correlations in absolute appeared only between KMNO2 (Michaelis-Menten constant for NO 2 to NO 3 conversion during heterotrophic nitrification) and D NO (effective NO diffusion constant) with a correlation of −0.71, and between EFFAC and FNO3 U (fraction of microbial N-uptake as NO 3 ) with a correlation of 0.51.All other correlations were in the range of ±0.40, most of them between ±0.25 (Fig. 8).However, that does not fully exclude any relationship between parameters, since they are often of nonlinear character.Figure 2 shows that limiting the values of EFFAC to values <0.5 leads to a more bell shaped distribution of the parameter DRF (scaling factor for decomposition rate constants of SOM) around the value 0.042 (correlation between EFFAC/DRF = 0.32).At the same time smaller values of FTRANS (factor regulating microbial nitrate immobilisation and direct re-mineralisation to NH 4 ), FNO3 U, KRCR (decomposition constant for recalcitrant litter pool) and KMM DOC (Michaelis-Menten constant regulating growth of microbes in dependency of DOC substrate) become more likely, whereas for other parameters like MICRRESP (factor regulating CO 2 production during microbial metabolism in dependency of microbial C/N ratio), FRC (factor regulating microbial death depending on the availability of very labile and labile carbon) and KN2O (loss rate of N 2 O during nitrification) higher values occur more often, thus, get more likely.That also shows that restricting some parameters to a range of their most likely values can narrow the range of likely values of other parameters.
The heat map presented in Fig. 3, shows the relationship between KRCL and KMDC N. While the correlation between the two parameters is low (= 0.06), one can see that lower values of KRCL restrict the range of KMDC N to lower values.To capture all dependencies (compare Fig. 9) when estimating the distribution of model simulations, it is straightforward to use samples of the joint posterior parameter distribution, as the whole structure of parameter dependence is fully included.

Calibration set
In general, most of measured trace gas emissions of N 2 O, NO and CO 2 are within or close to the range of the simulated 99 % credible interval (cf.Gilks et al., 1996) (see for example Fig. 5. RMSE values for each year and each soilatmosphere flux are presented in Table 4).Based on the evaluation criteria, LandscapeDNDC was able to correctly simulate cumulative N 2 O and NO emissions in five and six out of six years, respectively (see Table 3).In two out of three Seasonal dynamics of NO measurements were reproduced for the years 1994, 1997 and partly for 2002, which resulted in low RMSE values for the credible interval (RMSE(CI): 2.40 to 3.20 g N ha −1 d −1 ) and when using the maximum posterior parameter vector θ MAP (RMSE(θ MAP ): 6.66 to 9.20 g N ha −1 d −1 ).Although in most of the remaining years the magnitude of measurements and simulations is similar, the temporal dynamic could not always be clearly reproduced.
N 2 O simulations especially suffer from the inability of the actual LandscapeDNDC version to simulate freezethaw pulse emissions (Papen and Butterbach-Bahl, 1999;Butterbach-Bahl et al., 2002;Wolf et al., 2010) in 1995, 1996, 1997and 2003 (RMSE(CI): up to 16.10 g N ha −1 d −1 ).Therefore, following simulation to measurement comparisons of N 2 O were restricted to periods being unaffected by freeze-thaw events.Nevertheless, cumulative statistics and RMSE statistics can be compared with or without freezethaw events in Tables 3 and 4. One can see that the RMSE is strongly reduced when neglecting frost-thaw emissions (e.g., RMSE(CI) reduced from 16.10 to 7.92 in 1996 and from 2.56 to 0.42 in 1997).Peak emissions of N 2 O (> 10 g N ha −1 d −1 ) in August 2002 could also not be reproduced by the model, although the model could comprehend the general increase of N 2 O emissions in the beginning of August (up to 7 g N ha −1 d −1 ).
CO 2 emissions were underestimated by at least 19 % and 7 % during August to November in 1995 and 1996.From May to June 1997, they were overestimated by at least 25 %.Note that only 1004 CO 2 observations were used for calibration, compared to 1890 and 2075 values for NO and N 2 O. Thus, CO 2 emissions were underweighted by a factor of approx.0.5 in the calibration process.

Validation set
To independently validate the behaviour of the parameterisation, we simulated soil-atmosphere trace gas emissions in Höglwald for 2004 until 2007, i.e., for a time period, which has not been used for the calibration of LandscapeD-NDC.The parameterisation of the model includes the same posterior-samples that have been used to simulate the emissions of the calibration set (1994 to 1997, 2002 to 2003) and to visualise model uncertainty.
For the validation set, LandscapeDNDC produced comparable results as for the calibration set.Cumulative NO

Discussion
Our work shows that the Bayesian calibration approach can successfully be implemented to estimate the posterior parameter distribution of a complex sub-module of a biogeochemi-cal model used for simulating soil N 2 O, NO and CO 2 fluxes at a spruce site of the Höglwald forest, Germany.The applicability of the illustrated method to complex ecological models was also demonstrated in previous studies (e.g.van Oijen et al., 2005;Svensson et al., 2008;Klemedtsson et al., 2008;Lehuger et al., 2009).
Bayesian calibration reduced the prior uncertainty (by up to 82 %) for 15 out of 26 parameters for simulating soilatmosphere exchange of C and N trace gases.For the remaining 11 parameters the calibration process achieved no significant reduction in parameter uncertainty.The flat shape of the distribution of these 11 parameters occurred because different parameter constellations can lead to similar model output.The underlying reasons for that cannot be further specified, as the parameter space is 26-dimensional and small changes in high-sensitive parameters may be compensated by changes of (many or all) remaining parameters.
Correlations between parameters were hardly found since only for two cases parameter correlation was significant (51 % and −71 %).This is most likely due to the fact, that interactions between parameters in a 26-dimensional parameter space are more complicated than linear relations between only 2 parameters, which was shown exemplarily for the parameters EFFAC and DRF.For some of the parameters (especially for those with a flat marginal distribution) this finding might also point to an overparametrisation of the model.The parameter distribution was estimated while running the model using the default parameter values for the remaining sub-modules (e.g., watercycle, plant growth).Hence, our estimation neglects dependencies between parameters of the soilchemistry submodule and parameters of other submodules.The posterior distribution is, thus, only estimated given the default values of the remaining modules.Changing these parameters and, therefore, the behaviour of the complex system will most likely change the distribution of the estimated parameters.Nevertheless, due to the complexity of Land-scapeDNDC including parameters of other modules would complicate and lengthen the approach tremendously.More additional parameters would increase the degrees of freedom and, thus, the number of iterations and time to fully explore the parameter space.Even though we focused only on a subset of all model parameters our results show that parameter uncertainty could only be reduced for a subset of selected parameters.This may indicate model over-parameterisation and Bayesian calibration can give valuable guidance which parameters and module functions need to be focused on to allow further model improvement.By simultaneously calibrating soil N 2 O, NO and CO 2 emissions, we use a multi-objective (here three objectives) framework, so that e.g., a worsening of CO 2 estimation can be compensated by an improvement in NO or N 2 O estimation.Gathering additional data (e.g., from different forests sites) may help to reduce uncertainty for these parameters.However, multiple parameter solutions do not affect the process of uncertainty estimation of soil-atmosphere gas fluxes modelled by LandscapeDNDC, as the posterior parameter solution is used (including all parameter constellations) to generate the distribution of simulated emissions.
The large number of parameters chosen, the complexity of the LandscapeDNDC model (simulating the entire C, N and water fluxes of terrestrial ecosystems), as well as a narrow shaped posterior distribution as a result of a detailed dataset (Arhonditsis et al., 2008;Rahn et al., 2011;Clark, 2005;van Oijen et al., 2011), reduces the acceptance-rate.Consequently, slow convergence rates of the chains were observed.While estimating parameter EFFAC (partitioning of CO 2 and DOC production during microbial decomposition of organic matter) two of the chains at first showed a bimodal shape, the remaining chains sampled distinctly different modes.Therefore, the number of required iterations to reach convergence was substantially higher compared to the other parameters.
Hence, it took 165 000 iterations per chain, which required in total approximately four months computation time.In particular for the parameter EFFAC running four independent chains in parallel, demonstrated to be a more reliable and also necessary procedure to guarantee proper sampling from the posterior distribution.
The knowledge of all complex parameter dependencies helps to understand and improve the reliability of future model simulations and additionally to quantify the uncertainty of the simulated gas fluxes (N 2 O, NO, CO 2 ) associated with model parameter uncertainty.As we use samples from the joint posterior distribution, we achieve more reliable uncertainty approximations of soil GHG exchange than by simply using samples of each marginal parameter distribution.
As we simultaneously calibrated the model parameters with data for three soil trace gas fluxes (N 2 O, NO and CO 2 ) spanning six observation years, the parameter calibration results are a compromise for all years and the respective gas fluxes.Hence, better model simulation results are very likely to be obtained if single years or only one out of the three trace gases would have been chosen.Since the model is just an expert representation of the "real world" one cannot expect that simulation results and flux observations for all years and all of N 2 O fluxes by LandscapeDNDC are generally in the same range as the measurements for both the calibration and the validation set.Nevertheless, due to the importance of freezethaw emissions for the annual N 2 O budget there is an urgent need to further develop and implement model algorithms describing underlying processes of freeze-thaw based N 2 O production and emission in/from soils (e.g.Wolf et al., 2011).Also with regard to soil NO and CO 2 fluxes we identified short-comes of the applied LandscapeDNDC.For example, higher NO emissions in the summer period in 1995 and 1996 were systematically underestimated, while soil CO 2 emissions tended to be overestimated in the end of spring and beginning of summer and underestimated in subsequent summer days.This points either towards insufficient process descriptions, which have already been suggested earlier (Stange et al., 2000), or to problems with model initialisation.We limited the calibration procedure to a subset of 26 most influential parameters describing C and N turnover and production, consumption and emission processes of N 2 O, NO and CO 2 in soils.To allow a more time efficient calibration, we excluded parameters describing soil water and vegetation dynamics.Nevertheless, the above-mentioned failures to accurately describe soil NO and CO 2 fluxes for all seasons point towards the necessity to recheck simulated soil water and vegetation dynamics in LandscapeDNDC.
However, in total the measurements of the calibration and the validation set were covered reasonably well, in particular if we consider that not all sources of errors (i.e., structural model error, input data error) were included.In order to achieve improved approximations of the uncertainty of N 2 O, NO and CO 2 emissions, a stochastic error term could be included in future research, e.g., by setting up a hierarchical Bayesian framework, to account for model missspecifications (Rahn et al., 2011;Arhonditsis et al., 2008).

Conclusions
Following the identification of the 26 most sensitive parameters out of the total set of 67 parameters used in the submodule soil-chemistry, describing soil emission of N 2 O, NO and CO 2 in the biogeochemical model LandscapeDNDC, we successfully implemented a Bayesian calibration to estimate the joint posterior distribution of the most influential model parameters.To ensure that the posterior distribution of parameters was assessed, we used a multi-chain approach and tested for convergence of the Markov chain by the objective criteria developed by Gelman et al. (2003).In contrast to the a priori assumption of a uniform distribution of parameter values over a given range the posterior parameter distribution showed a more distinct pattern, including all complex parameter dependencies.Bayesian calibration reduced the prior uncertainty (by up to 82 %) of 15 out of 26 parameters.This knowledge of the posterior probability distribution is of outstanding importance to guide future model development, e.g., to inform experimentalists which parameters need attention for further investigation.
The comparison of simulated soil N 2 O, NO and CO 2 emissions to measured flux data over the six observation years used in the calibration process showed high agreement.The same is true for independent validation data, including observations of four other years.Hence, we were able to quantify the parameter-induced uncertainty of the total simulated N 2 O, NO and CO 2 emission.However, further studies need to consider other uncertainties such as a model error in order to estimate the total uncertainty of simulated soil fluxes of N 2 O, NO and CO 2 .
In our study, freeze-thaw events could not be reproduced, as underlying processes are not included in the LandscapeD-NDC version used in this study.Since these events can potentially have a strong impact on the total annual soil N 2 O emission, future model development and implementation of freeze-thaw algorithms is intended.
Fig. 2. (A) Heat-map of 2-dimensional marginal distribution of EF-FAC and DRF (decomposition rate factor), the brighter the polygons, the higher the posterior value.(B) histogram of DRF using all values and histogram of DRF using only values of DRF, where EFFAC < 0.5.

Fig. 3 .
Fig. 3. Heat-map of 2-dimensional marginal distribution of KRCL (decomposition constant for labile litter pool) and KMDC N (factor for half optimum content of nitrogen in soil solution for denitrifier activity).Higher values of KRCL lead to a wider range of KMDC N.

Fig. 4 .
Fig. 4. Four typical histograms of marginal parameter distributions.The coloured density lines of two right-skewed (KN2O: loss rate of N 2 O during nitrification and EFFAC: describing the partitioning of CO 2 and DOC production during microbial decomposition of organic matter), a flat (KHDC L: decomposition constant of labile humads pool) and a left-skewed distribution (KRCL: decomposition constant of labile litter pool) were done by post burn-in samples of each individual chain, whereas the histograms are plotted using post burn-in samples of all chains.

Fig. 5 .
Fig. 5. Simulated fluxes (calibration set) versus measurements of NO, N 2 O and CO 2 fluxes at the spruce site of the Höglwald forest in the year 1997.The grey box highlights pulse emissions of N 2 O during soil freeze-thaw events.

Fig. 6 .
Fig. 6.Simulated fluxes (validation set) versus measurements of NO, N 2 O and CO 2 fluxes at the spruce site of the Höglwald forest in the year 2006.The grey box highlights pulse emissions of N 2 O during soil freeze-thaw events.

Fig. 7 .
Fig. 7. Simulated fluxes (validation set) versus measurements of NO, N 2 O and CO 2 fluxes at the spruce site of the Höglwald forest in the year 2007.

Table 1 .
Selected parameters being most influential for simulating soil-atmosphere trace gas fluxes (N 2 O, NO and CO 2 ) with LandscapeD-NDC.
1) www.biogeosciences.net/9/3983/2012/Biogeosciences, 9, 3983-3998, 2012 2 production during microbial metabolism in dependency of microbial C/N ratio NH4 DENIMAX maximum fraction of NH 4 available for auto-and heterotrophic nitrification PERTL fraction of labile litter, which can be reallocated into deeper soil layers PERTR fraction of recalcitrant litter, which can be reallocated into deeper soil layers PERTVL fraction of very labile litter, which can be reallocated into deeper soil layers PSL SC depth dependent factor for reallocation of organic matter into deeper soil layers SRB fraction of labile dead microbial biomass

Table 2 .
Summary of marginal posterior parameter distribution and prior ranges (column 2).Posterior SD and skewness were estimated, whereas the prior SD was analytically calculated.

Table 3 .
Summary of cumulated measured and simulated emissions of NO, N 2 O and CO 2 .Simulated fluxes were only cumulated if corresponding periods with observations were available.Values in Brackets are calculated after freeze-thaw events.