Articles | Volume 13, issue 14
Biogeosciences, 13, 4291–4313, 2016

Research article 29 Jul 2016

Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms

Gianluca Tramontana1, Martin Jung2, Christopher R. Schwalm3, Kazuhito Ichii4,5, Gustau Camps-Valls6, Botond Ráduly1,7, Markus Reichstein2, M. Altaf Arain8, Alessandro Cescatti9, Gerard Kiely10, Lutz Merbold11,12, Penelope Serrano-Ortiz13, Sven Sickert14, Sebastian Wolf11, and Dario Papale1
  • 1Department for Innovation in Biological, Agro-food and Forest systems (DIBAF), University of Tuscia, Viterbo, 01100, Italy
  • 2Max Planck Institute for Biogeochemistry, Jena, 07745, Germany
  • 3Woods Hole Research Center, Falmouth, MA 02540, USA
  • 4Department of Environmental Geochemical Cycle Research, Japan Agency for Marine-Earth Science and Technology, Yokohama, 236-0001, Japan
  • 5Center for Global Environmental Research, National Institute for Environmental Studies, Tsukuba, 305-8506, Japan
  • 6Image Processing Laboratory (IPL), Universitat de València, Paterna (València), 46980, Spain
  • 7Department of Bioengineering, Sapientia Hungarian University of Transylvania, Miercurea Ciuc, 530104, Romania
  • 8School of Geography and Earth Sciences, McMaster University, Hamilton (Ontario), L8S4L8, Canada
  • 9European Commission, Joint Research Centre, Directorate for Sustainable Resources, Ispra, Italy
  • 10Civil & Environmental Engineering and Environmental Research Institute, University College, Cork, T12 YN60, Ireland
  • 11Department of Environmental Systems Science, Institute of Agricultural Sciences, ETH Zurich, Zurich, 8092, Switzerland
  • 12Mazingira Centre, Livestock Systems and Environment, International Livestock Research Institute (ILRI), 00100, Nairobi, Kenya
  • 13Department of Ecology, University of Granada, Granada, 18071, Spain
  • 14Computer Vision Group, Friedrich Schiller University Jena, 07743 Jena, Germany

Abstract. Spatio-temporal fields of land–atmosphere fluxes derived from data-driven models can complement simulations by process-based land surface models. Although a number of strategies for building empirical models from eddy-covariance flux data have been applied, a systematic intercomparison of these methods has been missing so far. In this study, we performed a cross-validation experiment for predicting carbon dioxide, latent heat, sensible heat and net radiation fluxes across different ecosystem types with 11 machine learning (ML) methods from four different classes (kernel methods, neural networks, tree methods, and regression splines). We applied two complementary setups: (1) 8-day average fluxes based on remotely sensed data and (2) daily mean fluxes based on meteorological data and a mean seasonal cycle of remotely sensed variables. The patterns of predictions from the different ML methods and experimental setups were highly consistent. There were systematic differences in performance among the fluxes, with the following ascending order: net ecosystem exchange (R² < 0.5), ecosystem respiration (R² > 0.6), gross primary production (R² > 0.7), latent heat (R² > 0.7), sensible heat (R² > 0.7), and net radiation (R² > 0.8). The ML methods predicted the across-site variability and the mean seasonal cycle of the observed fluxes very well (R² > 0.7), while the 8-day deviations from the mean seasonal cycle were not well predicted (R² < 0.5). Fluxes were better predicted at forested and temperate climate sites than at sites in extreme climates or in regions less well represented by the training data (e.g., the tropics). The evaluated large ensemble of ML-based models will be the basis of new global flux products.
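The split of performance into across-site variability, mean seasonal cycle, and deviations from that cycle can be illustrated with a small sketch. The abstract does not give the exact definitions, so everything below is an assumption for illustration: the function names (`r2`, `decompose`, `component_r2`), the use of the squared Pearson correlation as R², the additive decomposition, and the toy numbers are hypothetical and are not the study's code or data.

```python
from collections import defaultdict
from statistics import mean


def r2(obs, pred):
    """Squared Pearson correlation (assumed here as the R2 measure)."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    if var_o == 0 or var_p == 0:
        return float("nan")  # undefined when one series is constant
    return cov * cov / (var_o * var_p)


def decompose(records):
    """Split (site, period, value) records into three additive parts.

    site_mean[site]           mean of all values at the site
    seasonal[(site, period)]  mean for that period of year minus site mean
    anomaly[i]                value i minus site mean minus seasonal term
    """
    by_site = defaultdict(list)
    by_site_period = defaultdict(list)
    for site, period, value in records:
        by_site[site].append(value)
        by_site_period[(site, period)].append(value)
    site_mean = {s: mean(v) for s, v in by_site.items()}
    seasonal = {k: mean(v) - site_mean[k[0]] for k, v in by_site_period.items()}
    anomaly = [v - site_mean[s] - seasonal[(s, p)] for s, p, v in records]
    return site_mean, seasonal, anomaly


def component_r2(obs, pred):
    """R2 per component; obs and pred must list the same records in the same order."""
    o_site, o_seas, o_anom = decompose(obs)
    p_site, p_seas, p_anom = decompose(pred)
    sites, keys = sorted(o_site), sorted(o_seas)
    return {
        "across_site": r2([o_site[s] for s in sites], [p_site[s] for s in sites]),
        "seasonal": r2([o_seas[k] for k in keys], [p_seas[k] for k in keys]),
        "anomaly": r2(o_anom, p_anom),
    }


# Toy data: two sites, two periods of year, two years (made-up numbers).
obs = [("A", 0, 1.0), ("A", 1, 3.0), ("A", 0, 2.0), ("A", 1, 4.0),
       ("B", 0, 5.0), ("B", 1, 9.0), ("B", 0, 7.0), ("B", 1, 11.0)]
# Predictions that match site means and seasonal cycles but miss the anomalies.
pred = [("A", 0, 1.3), ("A", 1, 3.7), ("A", 0, 1.7), ("A", 1, 3.3),
        ("B", 0, 6.2), ("B", 1, 9.8), ("B", 0, 5.8), ("B", 1, 10.2)]

scores = component_r2(obs, pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the across-site and seasonal components are reproduced while the anomalies are uncorrelated with the observations, which mirrors in a simplified way the qualitative pattern reported in the abstract. With only two sites the across-site score is trivially perfect, so a realistic evaluation would need many sites.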

Short summary
We evaluated 11 machine learning (ML) methods and two complementary driver setups for estimating carbon dioxide (CO2) and energy exchanges between land ecosystems and the atmosphere. The results showed high consistency among the ML methods and a high capability to estimate the spatial and seasonal variability of the target fluxes. Performance was good across ecosystems, but limited for those in extreme environments (cold, hot) or less represented in the training data (tropics).