11 May 2022
Status: this preprint is currently under review for the journal BG.

Evaluation of gradient boosting and random forest methods to model subdaily variability of the atmosphere–forest CO2 exchange

Matti Kämäräinen1, Anna Lintunen2,3, Markku Kulmala3, Juha-Pekka Tuovinen4, Ivan Mammarella3, Juha Aalto1, Henriikka Vekuri4, and Annalea Lohila4,3
  • 1Weather and Climate Change Impact Research, Finnish Meteorological Institute, Helsinki, Finland
  • 2Institute for Atmospheric and Earth System Research/Forest Sciences, Faculty of Agriculture and Forestry, University of Helsinki, Helsinki, Finland
  • 3Institute for Atmospheric and Earth System Research / Physics, Faculty of Science, University of Helsinki, Helsinki, Finland
  • 4Climate System Research, Finnish Meteorological Institute, Helsinki, Finland

Abstract. Accurate estimates of the net ecosystem CO2 exchange (NEE) would improve the understanding of natural carbon sources and sinks and their role in regulating global atmospheric carbon. In this work, we use and compare the random forest (RF) and gradient boosting (GB) machine learning (ML) methods for predicting the year-round 6-hourly NEE over 1996–2018 in a pine-dominated boreal forest in southern Finland, and we analyze the predictability of the NEE. Additionally, the 6-hourly values were aggregated to weekly NEE values to obtain information about the longer-term behavior of the methods. Meteorological variables from the ERA5 reanalysis were used as predictors. A spatial and temporal neighborhood (predictor lagging) was used to give the models more data to learn from, which was found to improve the accuracy compared to using only the nearest grid cell and time step. Both ML methods can explain the temporal variability of the NEE at the observational site of this study with the meteorological predictors, but the GB method was more accurate. GB was more effective in separating the important predictors from the unimportant ones and showed no signs of overfitting despite many redundant variables. The accuracy of the GB (RF) method, here measured mainly using the cross-validated Pearson correlation coefficient between the model result and the observed NEE, was high (good), reaching a best-estimate value of 0.96 (0.94) and a root mean square error of 1.18 µmol m⁻² s⁻¹ (1.35 µmol m⁻² s⁻¹). We recommend using GB instead of RF for modeling the CO2 fluxes of ecosystems due to its better performance.
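As a rough illustration of three ideas named in the abstract (predictor lagging, scoring with the Pearson correlation coefficient and the root mean square error, and aggregation of 6-hourly values to weekly means), the following minimal Python sketch may help. It is not the authors' code and it fits no RF or GB model; the function names `add_lags`, `pearson_r`, `rmse`, and `block_means` and the constant `STEPS_PER_WEEK` are hypothetical, and only temporal lagging is shown, not the spatial neighborhood.

```python
import math

# Assumption: four 6-hourly steps per day, seven days per week.
STEPS_PER_WEEK = 4 * 7


def add_lags(series, lags):
    """Build one feature row per time step from lagged copies of a predictor.

    `lags` are non-negative step offsets (0 = current step). Time steps
    that lack a full history are dropped.
    """
    max_lag = max(lags)
    return [[series[t - lag] for lag in lags]
            for t in range(max_lag, len(series))]


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(a, b):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def block_means(values, block=STEPS_PER_WEEK):
    """Average consecutive non-overlapping blocks; an incomplete tail is dropped."""
    n_blocks = len(values) // block
    return [sum(values[i * block:(i + 1) * block]) / block
            for i in range(n_blocks)]
```

In a cross-validated setting, `pearson_r` and `rmse` would be applied to predictions made for time periods withheld from model fitting, and `block_means` would be applied to both the predicted and the observed series before scoring the weekly values.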

Status: open (until 22 Jun 2022)

Model code and software

Gradient boosting and random forest tools for modeling the NEE
Kämäräinen, M., Lintunen, A., Kulmala, M., Tuovinen, J., Mammarella, I., Aalto, J., Vekuri, H., and Lohila, A.

Latest update: 26 May 2022
Short summary
In this work, we present an alternative machine learning approach to model the exchange of carbon between the atmosphere and the study site, located in a boreal forest in southern Finland. The suggested method produces more accurate results than another method that has been widely used in this context. Accurate estimates of the carbon exchange are needed to better understand the role of forests in regulating atmospheric carbon and climate change.