Articles | Volume 15, issue 19
Biogeosciences, 15, 5801–5830, 2018

Research article 04 Oct 2018

Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation

Istem Fer1, Ryan Kelly2, Paul R. Moorcroft3, Andrew D. Richardson4,5, Elizabeth M. Cowdery1, and Michael C. Dietze1
  • 1Department of Earth and Environment, Boston University, Boston, MA 02215, USA
  • 2RK Analytics, Durham, NC 27712, USA
  • 3Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
  • 4School of Informatics, Computing and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
  • 5Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ 86011, USA

Abstract. Data-model integration plays a critical role in assessing and improving our capacity to predict ecosystem dynamics. Similarly, the ability to attach quantitative statements of uncertainty to model forecasts is crucial for model assessment and interpretation and for setting field research priorities. Bayesian methods provide a rigorous data assimilation framework for these applications, especially for problems with multiple data constraints. However, the Markov chain Monte Carlo (MCMC) techniques underlying most Bayesian calibration can be prohibitive for computationally demanding models and large datasets. We employ an alternative method, Bayesian model emulation of sufficient statistics, that can approximate the full joint posterior density, is more amenable to parallelization, and provides an estimate of parameter sensitivity. The analysis used informative priors constructed from a meta-analysis of the primary literature and specified both model and data uncertainties, and it introduced novel approaches to correcting for autocorrelation across multiple data streams and to emulating the sufficient-statistics surface. We report the integration of this method within the ecological workflow management software Predictive Ecosystem Analyzer (PEcAn), and its application to, and validation with, two process-based terrestrial ecosystem models: SIPNET and ED2. In a test against a synthetic dataset, the emulator was able to retrieve the true parameter values. A comparison of the emulator approach with standard brute-force MCMC involving multiple data constraints showed that the emulator method constrained the parameters of the faster and simpler SIPNET model with performance comparable to the brute-force approach while reducing computation time by more than 2 orders of magnitude. The emulator was then applied to calibration of the ED2 model, whose complexity precludes standard (brute-force) Bayesian data assimilation techniques.
Both models were constrained by assimilating the observational data with the emulator method, which reduced the uncertainty around their predictions. Performance metrics showed increased agreement between model predictions and data. Our study furthers efforts toward reducing model uncertainties, showing that the emulator method makes it possible to calibrate complex models efficiently.
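
The abstract describes emulator-based calibration only at a high level. As a rough illustration of the general idea, the following minimal Python sketch (NumPy only) runs an "expensive" model at a small set of design points, fits a Gaussian-process emulator to the resulting sufficient statistic, and then runs MCMC against the cheap emulator instead of the model. Everything here is an assumption made for illustration, not PEcAn's implementation: the one-parameter toy model stands in for SIPNET or ED2, the sufficient statistic is taken to be the sum of squared errors for a single data stream with Gaussian errors, the GP uses a fixed RBF kernel, and the names `run_model`, `sufficient_stat`, and `emulate_ss`, the prior bounds, and all numeric settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Hypothetical toy model standing in for an expensive ecosystem model ---
x = np.linspace(0.0, 10.0, 50)
TRUE_THETA = 2.0
y_obs = TRUE_THETA * x + rng.normal(0.0, 1.0, size=x.size)  # synthetic data
n = y_obs.size
LOWER, UPPER = 0.0, 4.0  # assumed uniform prior bounds on theta


def run_model(theta):
    """Pretend this call is slow; in practice it would be a full model run."""
    return theta * x


def sufficient_stat(theta):
    """Sum of squared errors: all a Gaussian likelihood needs from a run."""
    return float(np.sum((y_obs - run_model(theta)) ** 2))


# 1. Run the expensive model only at a few (independent, parallelizable) design points.
design = np.linspace(LOWER, UPPER, 15)
ss_design = np.array([sufficient_stat(t) for t in design])


# 2. Fit a GP emulator (RBF kernel, fixed hyperparameters) to the standardized SS.
def rbf(a, b, length=1.0):
    d = np.subtract.outer(a, b)
    return np.exp(-0.5 * (d / length) ** 2)


ss_mean, ss_sd = ss_design.mean(), ss_design.std()
K = rbf(design, design) + 1e-6 * np.eye(design.size)  # small nugget for stability
weights = np.linalg.solve(K, (ss_design - ss_mean) / ss_sd)


def emulate_ss(theta):
    """GP posterior mean of SS at theta, clipped because SS cannot be negative."""
    pred = rbf(np.array([theta]), design) @ weights
    return max(float(pred[0]) * ss_sd + ss_mean, 0.0)


# 3. MCMC on the emulator: Metropolis step for theta, Gibbs step for precision tau.
def log_post(theta, tau):
    # Uniform prior on theta, so only the bounds matter.
    if not LOWER <= theta <= UPPER:
        return -np.inf
    return 0.5 * n * np.log(tau) - 0.5 * tau * emulate_ss(theta)


A0, B0 = 0.001, 0.001  # assumed vague Gamma(shape, rate) prior on tau
theta, tau = 1.0, 1.0
samples = []
for _ in range(6000):
    proposal = theta + rng.normal(0.0, 0.05)
    if np.log(rng.uniform()) < log_post(proposal, tau) - log_post(theta, tau):
        theta = proposal
    # Conjugate update: tau | theta ~ Gamma(A0 + n/2, rate = B0 + SS/2)
    tau = rng.gamma(A0 + n / 2, 1.0 / (B0 + emulate_ss(theta) / 2))
    samples.append(theta)

posterior = np.array(samples[1000:])  # discard burn-in
print(f"posterior mean {posterior.mean():.3f}, sd {posterior.std():.3f}")
```

In this sketch the model itself is evaluated only 15 times, while the 6000 MCMC iterations query the emulator, which illustrates why emulation can cut computation time so sharply; the design-point runs are also independent of one another, which is one reason the approach parallelizes well. A realistic application would additionally tune the GP hyperparameters, account for emulator uncertainty, and combine several data streams.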

Short summary
The computer models we use to understand and forecast ecosystem changes have multiple components that determine their outcomes. Because our capacity to observe these components is limited, they carry uncertainties that in turn affect our predictions. While techniques exist for reducing these uncertainties, computational and statistical barriers prevent them from being applied to every model. This research presents a method that lowers those barriers and allows us to improve model predictions.