Sciences


Introduction
Parameter estimation of process-based models is a long-standing problem in ecology and evolution. Early proponents of process-based modeling have stressed the importance of deriving predictions from basic physical processes, with physical parameters that can be experimentally determined (Bossel, 1992). In practice, however, various reasons, ranging from time limitations to fundamental observability restrictions, mean that most process-based models include parameters for which direct measurements are not available (Hartig et al., 2012). These parameters need to be estimated inversely, meaning that they are adjusted by comparing model outputs to observed data.
To make this comparison, Bayesian methods have become increasingly popular in ecological research during the last decades (e.g. O'Hara et al., 2002; Clark, 2005; Purves et al., 2007; Higgins et al., 2012). In addition to their flexibility and their explicit treatment of parameter uncertainty, a particularly appealing property of Bayesian statistics is the possibility to combine existing information on likely parameter values with the information that is generated inversely (Hartig et al., 2012). Like other inverse parameterization approaches, Bayesian methods require the definition of a metric that measures how well model predictions fit the observed data. In non-statistical inversion approaches, such metrics are often called goal functions, objective functions or cost functions (e.g. Schröder and Seppelt, 2006). Bayesian approaches use a particular statistical metric, the probability of obtaining the observed data given the current model and parameter values, usually referred to as the likelihood. Most previous applications of Bayesian statistics to complex stochastic models derive this probability by making distributional assumptions about how observations vary around mean model predictions that are independent of the processes in the model, either ad hoc or based on the observed variance in the data (e.g. Martínez et al., 2011; van Oijen et al., 2013). This is usually justified by the idea that there is observation uncertainty or variability in environmental conditions that is not accounted for in the model. The approach of constructing likelihoods from such assumptions is the current state of the art, but it has a major limitation: many process-based ecological models are already stochastic and predict variability of certain model outputs. Additional process-based observer models that describe how field data were collected can easily be added (Zurell et al., 2009). In principle, one would therefore much rather use this mechanistically derived variability for deriving the likelihood, because this brings our ecological understanding into the expectations for the probabilities of observing particular deviations from mean model predictions. However, while theoretically always possible, this route was blocked in practice for most models because of the difficulty of making the necessary probability calculations technically tractable (Hartig et al., 2011).
This technical limitation has weakened in recent years. Simulation-based approximation techniques have been developed that allow treating any stochastic model in a formal statistical inference framework. Of those, Approximate Bayesian Computing (Beaumont, 2010) has arguably attracted most attention, but there are other approaches as well. Their common principle is very simple: what is needed for including a stochastic simulation model in a formal inferential framework is the likelihood p(D|M(φ)) for an outcome D to occur under a model M with parameters φ (Diggle and Gratton, 1984). Simulation-based likelihood approximations estimate this probability by generating draws from the stochastic model. Subsequently, different methods are used to approximate the likelihood or posterior (Hartig et al., 2011). Often, this involves comparing model output and observed data by means of data aggregations, also called patterns (Wiegand et al., 2004; Grimm and Railsback, 2012) or summary statistics (Beaumont, 2010; Wood, 2010). For brevity, we will refer to these methods in general simply as likelihood approximations, or, in the context of a Bayesian analysis, as approximate Bayesian methods.
The potential for likelihood approximations in ecology has been repeatedly stressed, but applications to community or population ecology are still rare (but see Jabot and Chave, 2009, 2011; May et al., 2013), and to our knowledge, there is no previous study that applies likelihood approximations to a computationally expensive, parameter-rich model of an ecological community. In this study, we use FORMIND, an individual-based model of tropical forest communities, to test the applicability of the likelihood approximations proposed by Wood (2010) to this model type. Using virtual field data that were generated from the model with known parameters, we test whether the method allows us to correctly identify all model parameters. We also examine how choice and aggregation (summary statistics) of the data affect the results of the inference. Finally, we apply the method to fit the model to field data from a tropical montane forest in Ecuador.

Forest gap dynamics and the FORMIND model
Forest ecosystems are locally highly dynamic. One of the most prominent drivers of these dynamics, particularly in the tropics, is natural disturbance, where large trees that have lost stability due to mortality or other factors fall and damage or kill other trees. This creates a dynamic mosaic of light-filled gaps in natural forests (Shugart, 1984; McCarthy, 2001). Within these gaps, pioneer species colonize first, until other species take over and continue the successional dynamics that are thought to be one part of the explanation for forest diversity (Kohyama, 1993).
Mechanistic forest models that describe the processes of gap formation and recovery have a long history in ecology (Pacala et al., 1996; Bugmann, 1996; Shugart, 1998; Huth and Ditzer, 2000). These models typically include several tree species with differ-
(Köhler, 2000). It has been applied for estimating forest succession, variability and disturbance impacts in various tropical locations around the world (e.g. Rüger et al., 2007; Köhler and Huth, 2010; Dislich and Huth, 2012; Gutiérrez and Huth, 2012). The simulation area (plot) in FORMIND, which can be of variable size (we use 1 ha throughout the paper), is subdivided into 20 m × 20 m grid cells. Tree individuals are assigned to one of these cells and interact with each other within this cell, but do not have an explicit spatial position within the cell. The model state is entirely described by the species or functional type, the size (measured as diameter at breast height, dbh), and the location (cell) of all trees. Other variables, such as tree height and crown dimensions, are derived through fixed allometric relationships.
At each time step (we use 5 yr time steps), the light climate for each tree is calculated from the position of all trees and their respective crowns in the same cell. Subsequently, establishment (light-dependent, stochastic), mortality (stochastic) and tree growth (light-dependent) act on the plot and on the individual trees, respectively. Important parameters in the model (Table 1) are recruitment and mortality rates, parameters that describe the size-specific maximum growth rates, and the allometric relationships that determine height and crown dimensions. The model is Markov in the sense that the probabilities for reaching any possible state in the next time step are entirely determined by the values of the state variables in the previous time step. Details of these processes, together with a more exact description of the model scheduling, are provided in the online Supplement (see also Köhler, 2000; Dislich et al., 2009).

Bayesian parameter estimation with simulation-based likelihood approximations
We use a Bayesian approach for parameter estimation. Hartig et al. (2011) recommend using Bayesian methods (or at least MCMCs) with simulation-based likelihood approximations, because MCMCs (unlike optimization approaches) are more robust towards variance in likelihood estimates generated by the approximation. They are also less sensitive to interactions between parameters that are to be expected in process-based models. In principle, however, one could use the likelihood approximation used in this study with an optimization algorithm in a maximum-likelihood framework as well.
There are a number of introductions to Bayesian statistics. A detailed reference is Gelman et al. (2003); for a shorter introduction see Ellison (2004). We therefore give only a brief summary here. The outcome of a Bayesian inference is a probability distribution P(φ|D_obs) for the parameters φ given the observed data D_obs. This distribution, called the posterior, is calculated as

$$P(\phi \mid D_\mathrm{obs}) = c \cdot p(D_\mathrm{obs} \mid \phi) \cdot p(\phi) \qquad (1)$$

where c is a normalization constant, the prior probability density p(φ) quantifies parameter uncertainties before comparing the model to the observed data, and the likelihood function p(D_obs|φ) describes the probability of obtaining the observed data conditional on a parameter set φ, which can, broadly speaking, be interpreted as a measure of fit.
Because our main concern in this paper is the approximation of the likelihood, we chose wide uniform (flat) priors for all parameters and fit types, which means that the posterior and likelihood are strictly proportional to each other across the possible prior range. Tables with the widths of these uniform priors are provided in the online Supplement. Given that we knew that the model reacts nonlinearly to many parameters, other uninformative prior choices would have been possible (e.g. Kass and Wasserman, 1996), but we felt that for this study it is more useful to ensure proportionality of likelihood and posterior, which makes the results easier to interpret.
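The proportionality of posterior and likelihood under flat priors can be illustrated with a minimal Python sketch. The function name `make_log_posterior` and the bounds are hypothetical and only serve the illustration; this is not the implementation used in the study.

```python
import numpy as np

def make_log_posterior(log_likelihood, lower, upper):
    """Combine a log-likelihood with independent uniform priors on [lower, upper].

    Inside the prior range the log-prior is a constant, so the log-posterior
    equals the log-likelihood up to an additive constant.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    log_prior_inside = -np.sum(np.log(upper - lower))  # constant prior density

    def log_posterior(phi):
        phi = np.asarray(phi, dtype=float)
        if np.any(phi < lower) or np.any(phi > upper):
            return -np.inf  # zero prior probability outside the range
        return log_likelihood(phi) + log_prior_inside

    return log_posterior
```

Differences in log-posterior between two parameter sets inside the prior range are then identical to the differences in log-likelihood, which is the property the flat priors were chosen for.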

Generating approximate likelihoods
The technical key novelty in this study is the definition of the likelihood p(D|φ). In "conventional" Bayesian or maximum likelihood studies, this conditional probability is obtained by formulating an error model that quantifies the probabilities of deviations between model predictions and observations (e.g. Van Oijen et al., 2005). This model may be mechanistically motivated, for example by knowledge about measurement uncertainties in obtaining the data. In practical situations, however, there are usually a number of error sources that interact, and error models are therefore typically either fixed ad hoc (van Oijen et al., 2013) or derived from the observed variability in the data (Martínez et al., 2011). Hence, "conventional" likelihoods are usually independent of the mechanisms in the process model that is fit.
The method that we apply here goes beyond such an independent error model towards an approach where both the mean model prediction and the probability of observing deviations from the mean are derived from the same stochastic ecological processes. This is particularly promising in systems where process stochasticity dominates observation errors. For many tropical forest inventories, this is the case: typical observation errors are known (Chave et al., 2004), and based on these results we can assume that, for small plots (1 ha), observation uncertainty is small compared to the local biomass variation due to successional dynamics (Chave et al., 2003). FORMIND simulations of the aforementioned successional dynamics triggered by gap formation explain the extent of this variability well (e.g. Köhler and Huth, 2010) and can therefore be used to generate statistical expectations for field observations such as, for example, plot biomass (Fig. 1). Several techniques have been suggested for approximating the likelihood p(D_obs|φ) from the variability of the stochastic simulation outputs. Most prominent is arguably the method of Approximate Bayesian Computing (ABC) (Csilléry et al., 2010; Beaumont, 2010), which has attracted much attention in recent years. However, as discussed in Hartig et al. (2011), there are a number of closely related methods that are currently not counted as examples of ABC, but that apply nearly the same principles. In this study, we apply a simulation-based likelihood approximation used by Wood (2010), called a "parametric likelihood approximation" in Hartig et al. (2011).
The principle of this method is to estimate p(D_obs|φ), for any φ desired, by fitting a parametric distribution to the output of the stochastic simulation, and estimating the probability of obtaining D_obs from this distribution (Fig. 2). We used a multivariate normal distribution because it fitted the simulation outputs well and allows a convenient estimation of the covariance structure, but normality is by no means a fundamental requirement. For this multivariate normal approximation, the likelihood of obtaining the observed data D_obs is

$$p(D_\mathrm{obs} \mid \phi) \approx \frac{1}{c \, |\Sigma_\mathrm{sim}(\phi)|^{1/2}} \exp\left( -\frac{1}{2} \left(D_\mathrm{obs} - \bar{d}_\mathrm{sim}(\phi)\right)^{T} \Sigma_\mathrm{sim}^{-1}(\phi) \left(D_\mathrm{obs} - \bar{d}_\mathrm{sim}(\phi)\right) \right) \qquad (2)$$

where c is $(2\pi)^{k/2}$, k is the dimension of D_obs, $\bar{d}_\mathrm{sim}(\phi)$ is the corresponding vector of mean simulation outputs, and $\Sigma_\mathrm{sim}^{-1}(\phi)$ is the inverse of the covariance matrix $\Sigma_\mathrm{sim}(\phi)$ of the simulation outputs, which summarizes the variability within each dimension of the model output as well as the correlations between those dimensions. Pseudocode for the entire parameter estimation algorithm is provided in the Supplement.
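As an illustration of the parametric likelihood approximation, the following Python sketch evaluates the logarithm of the multivariate normal density fitted to repeated simulation outputs. The simulator `toy_simulator`, the function name `synthetic_log_likelihood` and the number of replicates are hypothetical stand-ins chosen for this example; FORMIND itself is not involved, and the sketch is not the code used in the study.

```python
import numpy as np

def toy_simulator(phi, rng):
    """Hypothetical stochastic model returning a vector of 3 summary statistics."""
    mean = np.array([phi[0], phi[0] * phi[1], phi[1] ** 2])
    return mean + rng.normal(scale=[0.5, 1.0, 0.2])

def synthetic_log_likelihood(d_obs, phi, simulator, n_sim=200, rng=None):
    """Log of a multivariate normal likelihood estimated from n_sim model runs."""
    rng = np.random.default_rng() if rng is None else rng
    d_obs = np.asarray(d_obs, dtype=float)
    # Repeated stochastic simulations at the same parameter values: shape (n_sim, k)
    sims = np.array([simulator(phi, rng) for _ in range(n_sim)])
    d_mean = sims.mean(axis=0)                         # mean simulation output
    sigma = np.atleast_2d(np.cov(sims, rowvar=False))  # covariance of the outputs
    resid = d_obs - d_mean
    k = d_obs.size
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        return -np.inf  # degenerate covariance estimate, density not defined
    # Quadratic form computed with a linear solve instead of an explicit inverse
    maha = resid @ np.linalg.solve(sigma, resid)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + maha)
```

Because the mean and covariance are estimated from a finite number of runs, repeated calls at the same φ return slightly different values; this is the variance in the likelihood estimate that motivates the use of an MCMC rather than an optimizer.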

Representation of the data
As in ABC, it is desirable to represent the data used in Eq. (2) in a computationally efficient way. Therefore, we do not use the raw data (which would be in our case the size and identity of each tree on the plot), but some aggregation (summary statistics) of the data. Ideally, one would choose the lowest-dimensional aggregation of the original data for which there is no loss of information with respect to the inference (minimal sufficient statistics), but there is no generally accepted rule about how to find this representation (but see Fearnhead and Prangle, 2012; Blum et al., 2013). We therefore decided to use mainly two aggregations of forest data that have been frequently used for summarizing local inventory data in forest modeling. The first aggregation is the stem size distribution per PFT, which counts the number of tree individuals in (here) 10 cm size classes. The second is the size-specific mean growth, which quantifies the mean stem diameter growth for different tree size classes. We also experimented with other forest attributes or aggregations of the data (see Table 2).
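The two aggregations can be sketched in Python as follows. The function names, the number of size classes and the array layout are assumptions made for this illustration only; the actual inventory format and class limits used in the study are not specified here.

```python
import numpy as np

def size_class(dbh_cm, bin_width=10.0):
    """Index of the size class of each tree, e.g. class 1 covers 10 to <20 cm."""
    return np.floor(np.asarray(dbh_cm, dtype=float) / bin_width).astype(int)

def stem_size_distribution(dbh_cm, pft, n_pft, n_classes=10, bin_width=10.0):
    """Number of trees per plant functional type (rows) and size class (columns)."""
    counts = np.zeros((n_pft, n_classes), dtype=int)
    for p, c in zip(pft, size_class(dbh_cm, bin_width)):
        if 0 <= c < n_classes:  # trees beyond the last class are ignored here
            counts[p, c] += 1
    return counts

def mean_growth_per_class(dbh_cm, growth_cm, n_classes=10, bin_width=10.0):
    """Mean diameter growth per size class; NaN where a class holds no trees."""
    cls = size_class(dbh_cm, bin_width)
    growth_cm = np.asarray(growth_cm, dtype=float)
    out = np.full(n_classes, np.nan)
    for c in range(n_classes):
        in_class = cls == c
        if in_class.any():
            out[c] = growth_cm[in_class].mean()
    return out
```

The flattened count matrix and the growth vector, restricted to classes that are populated in all simulations, would then be concatenated into the data vector that enters the likelihood approximation.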

Posterior estimation
Subsequent posterior estimation based on the approximate likelihood was done with an adaptive Metropolis-Hastings MCMC (Haario et al., 2001). We always ran several chains and checked convergence visually and with Gelman-Rubin diagnostics (Gelman and Rubin, 1992; see Supplement for further details). As evaluating a single parameter combination with FORMIND typically required several seconds, posterior estimation cost substantial computing time. The exact number, length and burn-in of the chains are provided in the figure captions. Figure 2 summarizes the analysis method.
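A simplified sketch of an adaptive Metropolis sampler, in which the proposal covariance is re-estimated from the chain history after an initial phase, is given below in Python. The function name, the default settings and the naive recomputation of the covariance in every step are choices made for readability in this example; they are assumptions and do not reproduce the settings or implementation used in the study.

```python
import numpy as np

def adaptive_metropolis(log_post, start, n_iter=5000, adapt_start=500,
                        init_scale=0.1, eps=1e-8, rng=None):
    """Random-walk Metropolis with a proposal covariance adapted to the chain.

    log_post must return a finite value at `start`.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = len(start)
    s_d = 2.4 ** 2 / d                      # common scaling for d dimensions
    chain = np.empty((n_iter, d))
    current = np.asarray(start, dtype=float)
    current_lp = log_post(current)
    prop_cov = init_scale ** 2 * np.eye(d)  # fixed proposal before adaptation
    for t in range(n_iter):
        if t >= adapt_start:
            # Empirical covariance of all previous states, regularized by eps
            hist_cov = np.atleast_2d(np.cov(chain[:t], rowvar=False))
            prop_cov = s_d * (hist_cov + eps * np.eye(d))
        proposal = rng.multivariate_normal(current, prop_cov)
        proposal_lp = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[t] = current
    return chain
```

In an application, `log_post` would wrap the simulation-based likelihood approximation together with the prior, and several chains started from different points would be compared before interpreting the sample.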

Field data, model setup and analysis
We used two data sets to fit the parameters of the model: a "virtual" 1 ha inventory with three plant functional types (PFT1 = pioneer, PFT2 = mid-successional, PFT3 = late-successional) that was created from the FORMIND model itself (which has the advantage that the "true" parameter values are known), and a 5 ha forest inventory from a montane tropical rainforest in Ecuador that is described in Dislich et al. (2009). The purpose of the virtual dataset is to test the parameter estimation method for different data types in a situation where true parameters are known, while the data from Ecuador provide a realistic case study by testing the method in a situation that had previously been dealt with by informal model calibration.
To create the "virtual" inventory, we used a base parameterization that was adjusted to exhibit biomass values and successional patterns typical of a wet tropical lowland rainforest. With this setting, we simulated 1000 model runs, and created virtual datasets from the mean equilibrium values of these replicates for different types of output variables (summary statistics) such as biomass, stem diameter growth rates and stem size distributions. We also experimented with different numbers of parameters to be estimated. A summary of these options, labeled V1-V5, is provided in Table 2. While the question of sufficiency of different data types for the parameterization is a fundamental concern in this type of analysis (e.g. Jabot and Chave, 2009), the number of estimated parameters is more of a practical issue: we already knew that FORMIND exhibits interactions between parameters with respect to these outputs, which slows down posterior estimation and makes the analysis of the posterior distribution difficult.
For the fit of the model to field data in Ecuador, a tree-species grouping into seven PFTs was used that is described in detail in Dislich et al. (2009). Due to data availability, we used only the stem size distributions for the parameter estimation, which was labeled E1.

We concentrate here on the case V1 in Table 2 (detailed data, not all parameters under calibration). Results for the other cases are discussed briefly below. Detailed results are provided in the Supplement.

Marginal distributions
Figure 3a shows the estimated marginal posterior densities (Eq. 1) for the parameterization V1 in Table 2, which represent the probability assigned to the different possible values of the respective parameter. We find that most parameter values are retrieved correctly and show moderate uncertainties of the order of 20-50 % of the mean. In interpreting these plots, note that "marginal" means that we display the values of one particular parameter in the posterior sample without consideration of the corresponding values of the other parameters. This is important because, due to correlations between parameters, the marginal uncertainty of a parameter often appears substantially larger than it effectively is. We examine this in Sect. 3.1.2.

Correlations
Marginal distributions represent a cross-section of the posterior sample along a particular parameter axis, which neglects potential trade-offs between parameters with respect to the data that is used for the fit. Statistical models are usually designed in a way that such correlations are avoided wherever possible. For process-based models, the correspondence to concrete biological mechanisms is usually the main design criterion. It is therefore likely that such correlations will appear when estimating parameters of process-based models, as evidenced by Fig. 3b. Moreover, it is to be expected that the correlation structure depends on the data used to fit the model: less informative data will typically lead to more parameter combinations that can reproduce these data, affecting the correlation structure in the posterior sample. These expectations are largely confirmed by our results. We find strong negative correlations particularly for recruitment and mortality of early successional types, as one would expect, because, for those PFTs, increased mortality can be compensated for to some extent by increased recruitment. Also, we find that the correlation structure changes with the data types used. A detailed analysis of the correlation structure for the different data types (summary statistics) tested by us is provided in the Supplement.
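The difference between marginal and effective uncertainty can be made concrete with a short Python sketch that compares, for a posterior sample, the marginal standard deviation of each parameter with its standard deviation conditional on all other parameters. The function name is hypothetical, and the conditional value is computed under the assumption that the posterior is approximately multivariate normal, which need not hold for the posteriors of this study.

```python
import numpy as np

def marginal_and_conditional_sd(sample):
    """sample: array of posterior draws with shape (n_draws, n_params)."""
    cov = np.cov(sample, rowvar=False)
    marginal_sd = np.sqrt(np.diag(cov))
    # For a multivariate normal, the variance of parameter i given all other
    # parameters is the reciprocal of the i-th diagonal entry of the inverse
    # covariance (precision) matrix.
    conditional_sd = 1.0 / np.sqrt(np.diag(np.linalg.inv(cov)))
    corr = np.corrcoef(sample, rowvar=False)
    return marginal_sd, conditional_sd, corr
```

For two parameters with correlation ρ, the conditional standard deviation is the marginal one multiplied by sqrt(1 - ρ²), so a correlation of 0.9 in absolute value already shrinks the effective uncertainty to less than half of what the marginal plot suggests.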

Choice of data type and number of fitted parameters
For the parameter estimations V3-V5 in Table 2 that used less information (more aggregated model outputs or summary statistics), posterior parameter estimates were clearly wider than for our baseline V1 (Table 2; for details, see Supplement). It can therefore be concluded that all further aggregations of the data used in V1 lose information for the purpose of estimating the considered parameters (see also Wiegand et al., 2004). Similarly, increasing the number of fitted parameters (V2) increases the width of the posterior distribution. For all cases, the results indicate that coarser aggregations or more parameters under calibration lead to additional correlations between parameters with respect to the objective of reproducing the respective data type, with the consequence of wider marginal distributions and slower convergence of the MCMC algorithms (Table 2; see Supplement for details).

Reduction of predictive uncertainty
Based on the posterior distribution of parameter values, we may also estimate the posterior predictive uncertainty for output values of the model, that is, the uncertainty in the model output that results from the posterior uncertainty of the model parameters. We compare three cases: the inherent stochastic uncertainty of the model with the "true" parameters, the uncertainty resulting from parameters drawn from the prior distribution (i.e. before parameter estimation), and the uncertainty for parameters drawn from the posterior distribution (i.e. after parameter estimation). We see that the posterior predictive mean is similar to that of the "true" parameters, with predictive uncertainty only slightly larger than for the "true", fixed parameter value, which indicates that, for a single 1 ha plot, the output uncertainty generated by process stochasticity is of the same order of magnitude as the uncertainty originating from the parameters (Fig. 4). The prior predictive distribution, showing the predictions before calibration, is clearly biased towards smaller values. The reason is likely that many parameter combinations in the prior distribution, in particular those with high mortality, result in very low biomass values.
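The three cases can be mimicked with a toy example in Python. The model `toy_biomass`, its single mortality parameter and all numerical values are invented for this illustration and have no relation to FORMIND or to the estimates of this study; the sketch only shows how predictive distributions are obtained by running a stochastic model once per parameter draw.

```python
import numpy as np

def toy_biomass(mortality, rng):
    """Hypothetical stochastic model: biomass declines with mortality, plus noise."""
    return 300.0 * np.exp(-20.0 * mortality) + rng.normal(scale=10.0)

def predictive_draws(simulator, parameter_draws, rng):
    """Run the stochastic simulator once for every parameter draw."""
    return np.array([simulator(p, rng) for p in parameter_draws])

rng = np.random.default_rng(0)
n = 2000
# Case 1: fixed "true" parameter, only process stochasticity
true_runs = predictive_draws(toy_biomass, np.full(n, 0.02), rng)
# Case 2: parameters drawn from a wide uniform prior (assumed range)
prior_runs = predictive_draws(toy_biomass, rng.uniform(0.0, 0.2, n), rng)
# Case 3: parameters drawn from an assumed, much narrower posterior
posterior_runs = predictive_draws(toy_biomass, rng.normal(0.02, 0.003, n), rng)

for name, runs in [("true", true_runs), ("prior", prior_runs),
                   ("posterior", posterior_runs)]:
    print(f"{name:9s} mean = {runs.mean():6.1f}  sd = {runs.std():5.1f}")
```

In this toy setting the prior predictive mean is pulled towards low biomass by the many high-mortality draws, while the posterior predictive spread is only moderately larger than the purely stochastic spread, which is the qualitative pattern described above.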

Fit to Ecuadorian montane rain forest, E1
The results of the fit to field data in Ecuador (case E1 in Table 2) are displayed in Fig. 5. We show the marginal distributions for each parameter scaled to the prior range. As priors were set wide, but with typical ranges for a forest of that type in mind, this provides a visual estimate of the reduction of parameter uncertainty that is reached relative to a state in which no specific information about the plot is available: a distribution with a width of, e.g., 0.2 indicates that the prior uncertainty is reduced by 80 % by the chosen data type (static stem size distributions). Parameter correlations and unscaled marginal parameter estimates are provided in the Supplement, Figs. 13 and 12, respectively.

Discussion
Inverse parameter estimation of ecological models requires a metric that quantifies how well model predictions fit the observed data. Because of technical limitations, the current state of the art is to choose these metrics from expert knowledge or to derive them from field data. However, new statistical methods make it possible to generate goodness-of-fit metrics directly from any stochastic simulation model. More specifically, simulation-based likelihood approximations allow generating approximate likelihood functions that report the probability of obtaining a certain field observation from any stochastic ecological model. This technique provides a universal and unambiguous way to connect stochastic ecological models to field data. The present study is one of the first to apply this method to a large ecological model. We use a parametric likelihood approximation, proposed by Wood (2010), to fit FORMIND, a relatively complex individual-based forest gap model, to a range of different virtual field data sets created from the model as well as to real field data from an Ecuadorian tropical forest.

Validation of the method with virtual field data
Fitting the model to different virtual field data sets allowed us to assess uncertainty and bias of the fit for situations where true parameters were known. For the most detailed data (abundance and growth distributions, case V1 in Table 2), parameter values were largely unbiased, with correlations between a few of the parameters (Fig. 4, as well as Figs. 2 and 3 in the Supplement). With increasing level of aggregation (V3-V5), parameter values showed increasing correlations, bias and uncertainty. Note that it is important to take correlations into account when interpreting marginal parameter uncertainties such as those in Fig. 3a: if there are correlations between parameters, marginal uncertainties appear wider than in the multivariate correlation plots. This remains true for higher-order correlations, which are likely present for more aggregated data types such as V4 and V5, but which are not covered by our analysis. Judging the extent to which model parameters are constrained by the data only by the width of their marginal posterior distributions can therefore be misleading in the presence of strong correlations.
Correlations in the posterior indicate a trade-off between parameters with respect to the data that is used for the fit. For the data V1, for example, correlations occurred mostly within the parameters of the same PFT, with higher mortality requiring higher recruitment, with the strongest correlation for the early-successional type and the weakest correlation for the late-successional type. The explanation is that if the mortality of a species is increased, we have to increase recruitment or growth to maintain similar species numbers in the community. However, as V1 contained growth data, growth rates are tightly constrained, so the only option to maintain a similar species number is to increase recruitment. These effects are strongest for the pioneer species, which has the highest turnover. The fits to data V3 without growth data, on the other hand, also show correlations between mortality and growth, because growth is unconstrained for this data type.
It is an advantage of the Bayesian analysis (or rather the use of an MCMC) that these interactions can be made explicit and interpreted. Thinking about the reasons for correlations may be helpful to understand and improve the model structure, but a correlation in the posterior does not necessarily mean that two parameters share a particular ecological connection. It just means that changes in one of the parameters may be counterbalanced by the other to maintain the same value of the model output under consideration. Thus, correlations are connected to a data type, and they inform us which parameters cannot be fully constrained by this data type. For example, correlations and bias increase from V1 to V3, indicating that even for fitting recruitment, mortality and growth parameters only, static data such as stem size distributions do not provide sufficient information to constrain all parameters at once. Bias or correlations observed in the virtual test case seemed to originate predominantly from data limitations and not from problems with the simulation-based likelihood approximation. We have no indications that the parametric model (multivariate normal) used in the likelihood approximation created any problems or bias by not adequately summarizing model outputs, which would be theoretically possible. Due to the computational complexity of this study, however, it was not possible for us to make a more systematic analysis of this question, for example by using virtual replicates of the field data sets or less aggregated data types.

Fit to Ecuadorian field data
Only static data were available to us for fitting the FORMIND model to field data from a montane forest in Ecuador. Our previous analysis suggested that these data would not be sufficient to sensibly constrain all demographic parameters at once. To get ecologically interpretable results, we therefore fixed the recruitment parameters to the values used in Dislich et al. (2009), and calibrated mortality and growth parameters only.
Prior uncertainty was considerably reduced by these data (Fig. 5), suggesting that the data were relatively informative for constraining the parameters under calibration. Marginal posterior parameter estimates are similar to those that were derived by Dislich et al. (2009) with a combination of literature data, expert knowledge and calibration (see Supplement, Table 3 for exact values).
From the fits to the virtual field data V3 (Fig. 7, Supplement), we expected correlations in the posterior to occur mostly between parameters of the same PFT. We find those correlations, but we additionally find correlations particularly between the mortality parameters of some PFTs (Fig. 13, Supplement). To understand this, one has to know that the species grouping designed by Dislich et al. (2009) is hierarchical, consisting of 7 PFTs that were further divided into 4 growth groups with equal maximum diameter growth for the PFTs in each group, with the following relation between (PFT) and growth group: (1)-2, (2)-1, (3,4)-3, (5,6,7)-4. Diameter growth parameter 3, which is estimated lower, thus applies to the mid-successional PFTs 3 and 4, and diameter growth parameter 4, estimated higher, applies to the late-successional PFTs 5, 6 and 7. This hierarchical species grouping is mirrored in the correlation structure, with particularly strong correlations in the mortality parameters of PFTs that belong to the same growth group. Our interpretation of this pattern is that PFTs in the same growth group compete more strongly with each other than those that are in different growth groups.
Differences from the parametrization of Dislich et al. (2009) are particularly evident in the mortality parameters. Lower values were estimated for the mortality of the mid-successional PFTs 3 and 4, while mortality of the late-successional PFTs 5, 6 and 7 was estimated higher. This pattern is mirrored in the maximum diameter growth rates of mid-successional species. Thus, our study points to less pronounced differences between mid- and late-successional types than Dislich et al. (2009). We can only speculate about the reason for these differences. In general, one would think that the systematic parameter estimation is more reliable than the manual calibration by Dislich et al. (2009). However, although Dislich et al. (2009) calibrated to the same data, they also considered the fit of other model outputs such as total biomass, as well as expert opinions, for fixing the parameters. Expert opinion in particular would favor more pronounced differences in mortality rates between mid- and late-successional species due to ecological expectations, although specific empirical data on tree mortality or on maximum growth rates under full light were not available. Secondly, there are significant correlations between the parameters, which allow a similar fit to be obtained with a range of different parameter values. And finally, to reduce computing time, we used the model in this study at a lower temporal resolution (5 yr time steps) than Dislich et al. (2009), which can affect model dynamics and equilibrium distributions, meaning that slightly different parameter values would be estimated for the same model with a different temporal resolution.

Advantages compared to conventional calibration methods
Our results demonstrate that inverse parameter estimation with a likelihood function derived from the stochasticity in the model outputs is feasible and provides good results, even for a relatively complex and runtime-intensive ecological model. This is encouraging in itself, as it is neither trivial to calibrate a parameter-rich model with heterogeneous data in general, nor to address all the technical challenges of performing the simulation-based likelihood approximation. A valid question, however, is whether the gain is worth the effort. After all, our approach is connected with considerable computational and conceptual costs, and all we gain are parameter estimates that could probably also have been derived with conventional inversion methods such as parameter optimization. However, there are practical advantages of simulation-based likelihood approximations for ecological research that extend far beyond what we could demonstrate in this study. First of all, there is considerable interest in connecting models to large and heterogeneous data sources that become increasingly available (Luo et al., 2011; Hartig et al., 2012; Dietze et al., 2013). A practical problem in this context is that conventional methods provide no good answer for how to weight different data sources to construct a joint likelihood or objective function. Moreover, ecological processes nearly inevitably lead to correlations between those different data types, meaning that we would not expect errors to be independent, which poses a challenge for conventional methods.
Simulation-based likelihood approximations provide a natural answer to these problems. Assuming that the simulation model includes all major sources of stochasticity, likelihood approximations automatically weight the importance of different model outputs and account for correlations between them. In our study, we can see this in the combined fit of growth rates and stem size distributions, which required no weighting of these two patterns and automatically accounted for second-order correlations between them. Moreover, under conventional inverse parametrization procedures, one might see that a certain pattern is not well represented, but it is often difficult to decide whether this is a random or a systematic problem. Simulation-based likelihood approximations allow making a definite statement about the probability of observed patterns given the current model (parameters). Thus, we can use the full arsenal of statistical procedures, including Bayesian and frequentist model selection, to compare alternative ecological hypotheses. The possibility of such rigorous statistical tests for alternative process-based models will likely increase the acceptance of process-based models as a tool not only for representing and predicting, but also for statistically testing ecological knowledge.
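The way a parametric likelihood approximation weights several outputs and accounts for their correlations can be illustrated with a small sketch. The code below is not the implementation used in this study: `toy_simulator` is a hypothetical stand-in for one run of a stochastic model that returns two correlated summary statistics, and the multivariate normal form, the number of replicates `n_sim` and the seed are assumptions made for illustration. The simulated mean and covariance at a parameter value `phi` define the approximate likelihood of the observed statistics.

```python
import numpy as np


def toy_simulator(phi, rng):
    """Hypothetical stand-in for one stochastic model run.

    Returns two summary statistics that share a noise source,
    so they are correlated, as outputs of one model run would be.
    """
    shared = rng.normal()
    growth = phi + 0.5 * shared + 0.2 * rng.normal()
    stem = 2.0 * phi + 1.0 * shared + 0.5 * rng.normal()
    return np.array([growth, stem])


def synthetic_log_likelihood(observed, phi, n_sim=200, seed=0):
    """Multivariate normal log-likelihood with simulated mean and covariance."""
    rng = np.random.default_rng(seed)
    sims = np.array([toy_simulator(phi, rng) for _ in range(n_sim)])
    mu = sims.mean(axis=0)
    sigma = np.cov(sims, rowvar=False)  # includes cross-output covariance
    diff = observed - mu
    _, logdet = np.linalg.slogdet(sigma)
    # Solving with sigma applies the inverse covariance, which weights the
    # outputs by their simulated variability and correlation.
    quad = diff @ np.linalg.solve(sigma, diff)
    k = observed.size
    return float(-0.5 * (k * np.log(2.0 * np.pi) + logdet + quad))
```

No manual weights appear anywhere in this sketch: the relative influence of the two statistics follows entirely from the simulated covariance matrix. In an MCMC setting, a function of this kind would be evaluated once per proposed parameter value.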

Differences to ABC
The comments in the previous subsection apply to parametric and non-parametric likelihood approximation alike. However, it also seems interesting to discuss differences between the parametric likelihood approximation used in this study and the more widely used non-parametric approximation used in Approximate Bayesian Computation (ABC, Beaumont, 2010). As discussed in Hartig et al. (2011), unlike ABC, parametric likelihood approximations will nearly inevitably exhibit a certain amount of approximation error because it is unlikely that a simple distributional model can emulate model output distributions in all respects (particularly in the tails of the output distribution). Yet, the parametric approximation also has practical advantages. Many ecological models have to be run into equilibrium before predictions can be made. Once such a model is in equilibrium, more draws for the parametric approximation can be generated relatively cheaply, while a new run has to be started for each ABC step. In our example, the time required for the parametric approximation in one MCMC step was not much longer than for an ABC step, but the parametric approximation ensures a good acceptance probability. To reach the same acceptance probability with ABC, we would have to accept a relatively large ABC approximation error. This error may be corrected later, but the fact remains that for situations where the number of possible MCMC evaluations is fixed (complex models), both ABC and parametric approximations will have a non-negligible error. We conjecture that the balance could well be in favor of parametric approximations in situations such as the one encountered in this study.
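The trade-off between ABC acceptance probability and ABC approximation error mentioned above can be made concrete with a toy sketch. It is not taken from this study: `toy_summary` is a hypothetical one-dimensional summary statistic with unit-variance noise, and the tolerance values, trial count and seed are assumptions chosen for illustration. An ABC step accepts a simulation only if it falls within a tolerance `epsilon` of the observation, so a tighter tolerance (smaller approximation error) lowers the acceptance rate.

```python
import numpy as np


def toy_summary(phi, rng):
    """Hypothetical stand-in for one summary statistic from one model run."""
    return phi + rng.normal(scale=1.0)


def abc_acceptance_rate(observed, phi, epsilon, n_trials=2000, seed=1):
    """Fraction of single simulations lying within epsilon of the observation."""
    rng = np.random.default_rng(seed)
    sims = np.array([toy_summary(phi, rng) for _ in range(n_trials)])
    return float(np.mean(np.abs(sims - observed) <= epsilon))


# Same parameter value and same random draws, two tolerances.
RATE_TIGHT = abc_acceptance_rate(observed=0.0, phi=0.0, epsilon=0.1)
RATE_LOOSE = abc_acceptance_rate(observed=0.0, phi=0.0, epsilon=1.0)
```

Even at the true parameter value, the tight tolerance rejects most simulations in this sketch, whereas a parametric approximation would return a usable likelihood value at every step at the price of its distributional assumption.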

Conclusions
Our results suggest that likelihood approximations, in particular parametric likelihood approximations, are a promising route for the parametrization of stochastic ecological models. Using them is technically more challenging than the "traditional" Bayesian approach where likelihoods are based on phenomenological error models. The advantage, however, is that error models are based on the same ecological mechanisms as all other model predictions. Thus, they allow a more rigorous test of the mechanistic model assumptions, because the mechanisms have to explain both the mean and the variance in the data. Moreover, likelihood approximations account for the relative importance of, and correlations between, different data types predicted by the model, which makes them interesting when models have to be coupled to heterogeneous data. In this study, additional computational costs of the approach were moderate (factor 2-5) compared to a standard Bayesian approach due to the fact that the model had to be

Table 1. Important model parameters and their interpretation. The parameter $\mathrm{dbh}_m$ in the last line refers to the maximum diameter of a tree, which is a species- or PFT-specific parameter of the model (see Dislich et al., 2009).
) in a low-dimensional form so that the estimation particularly of $\Sigma_{\mathrm{sim}}^{-1}(\phi)$ can be achieved in

Fig. 1. Principle of statistical inference through stochastic simulation. (a) shows mean model predictions (black), standard deviation (gray) and extreme values (light gray) for the biomass of a 1 ha plot over 10 000 yr, starting from an empty plot. (b) shows the same mean equilibrium biomass (black) and two standard deviations (gray), but as a function of the mortality of the late-successional type PFT 3, all other parameters equal. By comparing the observed biomass from (a), which was created with a mortality rate of 0.005, with the predicted biomass for different mortality rates, we can infer the original value as well as a statistical uncertainty, without having to define a statistical model.
ent growth properties and light demands. For highly diverse systems such as tropical rainforests, species are usually grouped into plant functional types (PFTs), each of which represents a group of species with similar functional properties. Parameters and model predictions per plant functional type then represent a mean over the species that are represented by this type. Gap formation, that is, the death of large trees, maintains the modeled forest in a dynamic equilibrium. As a result, forest gap models do not merely predict a mean value for outputs such as biomass, species composition, or tree size distributions. Rather, they deliver samples of different possible values for these outputs and therefore allow assigning probabilities to different community or biomass states.