Articles | Volume 18, issue 2
Biogeosciences, 18, 367–392, 2021
Biogeosciences, 18, 367–392, 2021

Research article 18 Jan 2021

Research article | 18 Jan 2021

Machine learning estimates of eddy covariance carbon flux in a scrub in the Mexican highland

Machine learning estimates of eddy covariance carbon flux in a scrub in the Mexican highland
Aurelio Guevara-Escobar1, Enrique González-Sosa2, Mónica Cervantes-Jiménez1, Humberto Suzán-Azpiri1, Mónica Elisa Queijeiro-Bolaños1, Israel Carrillo-Ángeles1, and Víctor Hugo Cambrón-Sandoval1 Aurelio Guevara-Escobar et al.
  • 1Facultad de Ciencias Naturales, Universidad Autónoma de Querétaro, Av. de las Ciencias s/n Juriquilla, CP. 76230, Querétaro, Querétaro, Mexico
  • 2Facultad de Ingeniería, Universidad Autónoma de Querétaro, Cerro de las Campanas s/n Las Campanas, CP. 76010 Querétaro, Querétaro, Mexico

Correspondence: Mónica Cervantes-Jiménez (


Arid and semiarid ecosystems contain relatively high species diversity and are subject to intense use, in particular extensive cattle grazing, which has favored the expansion and encroachment of perennial thorny shrubs into the grasslands, thus decreasing the value of the rangeland. However, these environments have been shown to positively impact global carbon dynamics. Machine learning and remote sensing have enhanced our knowledge about carbon dynamics, but they need to be further developed and adapted to particular analysis. We measured the net ecosystem exchange (NEE) of C with the eddy covariance (EC) method and estimated gross primary production (GPP) in a thorny scrub at Bernal in Mexico. We tested the agreement between EC estimates and remotely sensed GPP estimates from the Moderate Resolution Imaging Spectroradiometer (MODIS), and also with two alternative modeling methods: ordinary-least-squares (OLS) regression and ensembles of machine learning algorithms (EMLs). The variables used as predictors were MODIS spectral bands, vegetation indices and products, and gridded environmental variables. The Bernal site was a carbon sink even though it was overgrazed, the average NEE during 15 months of 2017 and 2018 was 0.78 gCm-2d-1, and the flux was negative or neutral during the measured months. The probability of agreement (θs) represented the agreement between observed and estimated values of GPP across the range of measurement. According to the mean value of θs, agreement was higher for the EML (0.6) followed by OLS (0.5) and then MODIS (0.24). This graphic metric was more informative than r2 (0.98, 0.67, 0.58, respectively) to evaluate the model performance. This was particularly true for MODIS because the maximum θs of 4.3 was for measurements of 0.8 gCm-2d-1 and then decreased steadily below 1 θs for measurements above 6.5 gCm-2d-1 for this scrub vegetation. In the case of EML and OLS, the θs was stable across the range of measurement. We used an EML for the Ameriflux site US-SRM, which is similar in vegetation and climate, to predict GPP at Bernal, but θs was low (0.16), indicating the local specificity of this model. Although cacti were an important component of the vegetation, the nighttime flux was characterized by positive NEE, suggesting that the photosynthetic dark-cycle flux of cacti was lower than ecosystem respiration. The discrepancy between MODIS and EC GPP estimates stresses the need to understand the limitations of both methods.

1 Introduction

Deserts and semideserts occupy more than 30 % of terrestrial ecosystems. In Mexico, almost 2×106 km2 (50 %) correspond to arid and semiarid ecosystems, mainly the Sonoran and the Chihuahuan deserts (Verbist et al., 2010). The Spanish-Criollo intrusion (1540–1640) brought new land use methods, but there is no evidence of additional landscape degradation from the central highlands to the northeastern frontier of New Spain until well into the 18th century (Butzer and Butzer, 1997). At the country scale, the extent of grasslands in Mexico declined, and the area of croplands and woody areas increased, rural–urban migration being an important driver of that transition (Bonilla-Moheno and Aide, 2020). The transition from grasslands to shrublands or scrub is linked to the extremely heavy grazing by domestic livestock (Wilcox et al., 2018).

Vegetation in the arid and semiarid ecosystems are mostly classified as rangelands. These are one of the most widely distributed landscapes on earth, incorporating a wide range of communities including grasslands, shrublands and savannah. Scrub is a xeric category of shrublands characterized by plants with small leaves; it is very thorny, and its biomass is distributed mainly to roots and leaves rather than the stems (Rzedowski, 1978; Wheeler et al., 2007; Zhang et al., 2017). Studies need to consider multiple sources of evidence and the driving processes of land use change in Mexico to aid in policy formulation and to identify regions that may provide important ecosystem services (Murray-Tortarolo et al., 2016).

On the other hand, photosynthesis contributes to carbon sequestration by moving carbon stock from the atmosphere to other pools or sinks, such as aboveground biomass, roots and soil organic matter (Booker et al., 2013). The role of vegetation in carbon sequestration on arid and semiarid ecosystems is less evident because the growth rate is low, and biomass partition above and below ground is different from that of temperate and tropical forests. Competitive interactions of arid plants at the community level are strongly influenced by rooting architecture and phenological growth (Zeng et al., 2008). Many plants in semiarid systems support a deep and wide root system as a drought adaptation but also for nutrient uptake (McCulley et al., 2004).

Recent time trends indicate that semiarid ecosystems regulate the terrestrial carbon sink and dominate its interannual variability (Piao et al., 2019; Scott et al., 2015; Zhang et al., 2020). This variability mainly results from the imbalance between two larger biogenic fluxes that constitute the net ecosystem exchange (NEE): the photosynthetic uptake of CO2 (gross primary production, GPP) and the respiratory release of CO2 (total ecosystem respiration, Reco). Radiation and water availability are important environmental drivers of NEE and thus GPP and Reco (Marcolla et al., 2017). However, other carbon fluxes contribute to the imbalance, such as fire and anthropogenic CO2 emissions (Järvi et al., 2019; Piao et al., 2019). Another atmospheric CO2 flux is that from soil inorganic carbon in arid and semiarid ecosystems (Soper et al., 2017). Calcium carbonates form in the soil at a relatively low rate of 5 to 150 kgCha-1yr-1; this carbon can return to the atmosphere, but it forms a carbon sink when carbonates are leached into the groundwater (Lal et al., 2004).

The methods used to explore the ecosystems and the understanding of their functioning are changing rapidly, particularly for arid and semiarid ecosystems (Goldstein et al., 2020; Ma et al., 2020; Xiao et al., 2019; Yao et al., 2020). There are many instruments and techniques for estimating carbon and water fluxes, but two stand out in the literature: eddy covariance (EC) and remote-sensing techniques. EC is a micrometeorological method that measures the ecosystem community NEE at short time intervals, representing a land surface smaller than 1 km2. The orbital remote sensors measure radiation emitted or reflected by the earth's surface using different algorithms, representing different traits of vegetation activity from large-scale areas. Both techniques are complementary, but an agreement between their estimates is important for regional, countrywide and global spatiotemporal monitoring of greenhouse gas inventories (Yona et al., 2020); ecological modeling; quantifying the interaction among the vegetation component and the hydrological component; energy and nutrient cycles; and others applications (Pasetto et al., 2018). Particularly, products from the Moderate Resolution Imaging Spectroradiometer (MODIS) have ample availability and have been extensively used to study land surface since 2000.

Gross primary production can be represented by a wide range of models, ranging in complexity from simple regression based on climatic forcing variables to complex models that simulate biophysical and ecophysiological processes (Anav et al., 2015). The MODIS MOD17 product uses a photosynthetic radiation conversion efficiency model (Running and Zhao, 2015), but a better relationship is reported with EC-derived GPP when the model uses vegetation indices calculated from the same MODIS platform (Ma et al., 2014; Wu et al., 2010). Although they are black box models in principle, recent modeling efforts report good agreement of GPP estimates obtained from machine learning (ML) algorithms or ensembles of models (Eshel et al., 2019; Joiner and Yoshida, 2020; Jung et al., 2020). Different machine learning algorithms are powerful because they can identify trends and patterns in big datasets and solve regression or classification problems.

To generate models of GPP, we measured EC fluxes during 2017–2018 in a thorny scrub with semiarid climate in the highlands of Mexico (Bernal site). Competing models were data-driven machine learning regression ensembles (EMLs) and ordinary-least-squares (OLS) regression, both using Daymet (Thornton et al., 2017) and MODIS datasets as explanatory variables. The MODIS GPP product was used as a baseline comparison. The second step was to use an EML model based on local data (Daymet and MODIS) from a site with EC instrumentation and similar vegetation to that of the Bernal's site and then use that model to predict GPP at the Bernal site. The site we used was Santa Rita from the Ameriflux network. While Santa Rita is in the Sonoran Desert, and Bernal is on the southern border of the Chihuahuan Desert, both have a similar climate and vegetation (Fig. 1). A good agreement between Bernal EC data and the predictions from the Santa Rita model would support the use of machine learning algorithms as a scale-up mechanism. This would be useful to the understanding of semiarid ecosystems and also improve current earth system models (Piao et al., 2019).

We measured the carbon flux for the vegetation in a semiarid site located at Bernal, Mexico. The phenology patterns in the region suggested that this site could be a carbon sink during the wet season or a carbon source during the dry season since some predominant species reproduce during winter and spring, particularly cacti, Acacia and Prosopis (mesquite). Furthermore, the Bernal site had a history of disturbance by overgrazing; this could decrease the GPP and even result in a positive carbon balance, thus being a carbon source. If the shrub vegetation in this site predominantly absorbed carbon during the measuring period, then this evidence would contribute to reinforce the reported importance of semiarid environments in the global carbon balance (Zhang et al., 2020). However, land ownership patterns and the balance between agricultural investments and land conservation will determine the absolute amount of carbon sequestered. Hopefully, the results of the present investigation would promote the idea that carbon sequestration is possible in scrubland and be incorporated in informed decisions and new policy.

2 Materials and methods

2.1 Site description

The study site (Bernal) is located at 20.717 N, 99.941 W and 2050 ma.s.l. in the municipality of Ezequiel Montes in Querétaro, where real-estate development, feedlot beef production, cheese and wine production associated with tourism, and automotive-industry development are very attractive options for landowners in the region. Bernal is located in a shallow valley oriented from north to south, approximately 15 to 20 km wide and opening to the south to the Río Lerma basin and then draining into the Pacific Ocean. The northern limit of the valley is surrounded by hill country and its characteristic dacitic dome 433 m in height (Aguirre-Díaz et al., 2013). Moisture-laden winds blow westward from the Gulf of Mexico, but the Sierra Gorda, located 60 km east of Bernal, casts a rain shadow over the area (Segerstrom, 1961).

Figure 1Localization and land use maps for the study site: (a) biogeographic arid and semiarid zones of southern North America relevant for the Bernal and Santa Rita sites; the black outline represents the state limit for Querétaro. (b) Heterogeneity of the normalized difference vegetation index (NDVI) surrounding the EC tower at Bernal during the peak of the growing season 2017 (DOY 257). (c) Land cover in the region of the Bernal site, according to the annual University of Maryland (UMD) classification (MCD12 MODIS product).

The Bernal site is private property, with grazing dairy cattle receiving additional concentrated feedstuffs under stall-feeding. Grazing was continuous, and water for livestock was only available in the feeding and milking area; there were no pasture divisions, and the perimeter fence was made of stone. These characteristics of the animal production model and the state of vegetation are representative of land management practices and the scrub vegetation of the region. However, the Bernal site suffered important changes in land use during 2019, and the scrub was suddenly cleared and converted into rainfed cropping.

The climate is arid with summer rains (BSk), with a mean annual rainfall of 476 mm and a mean annual temperature of 17.1 C (CICESE, 2015). Prevailing wind is from the east and northeast. The terrain is mostly flat; most grades are below 2 %. The soil has a clay loam texture, the class is a Vertisol with abundant subrounded basaltic stones without rocky outcrops, and the depth is greater than 0.6 m. Vegetation was less than 3 m in height with an overgrazed herbaceous stratum. Vegetation corresponds to secondary scrub with the dominant genera Acacia, Prosopis and different Cacti (Fig. 3). This site was classified as grassland by the MODIS land cover product.

For the scrub and tree species, the importance vegetation index (IVI) was determined following Curtis and McIntosh (1950) to assess the vegetation homogeneity. The IVI is the sum of relative dominance, relative density and relative frequency of the species present. Vegetation sample points were chosen according to the flux footprint of the eddy covariance tower (Fig. 2). For each plant in the vegetation sample points, two stem diameters, the number of individuals (abundance) and identity of each species were measured as well as the coverage, which is the horizontal projection of the aerial parts of the individuals on the ground, expressed as a percentage of the total area (Wilson, 2011).

Figure 2Source strength-weighted function at the Bernal site during 2017 and 2018 (gray points). Horizontally, plots show the peak of the function and increasing percentages of the flux footprint. The radial scale is along-wind distance from the sensor (m). The measurement height was 6 m. The blue points represent the vegetation sample plots used to calculate the importance vegetation index (IVI).


Figure 3Thorny scrub at Bernal, Querétaro, during the rainy season 2017. In the foreground is Cylindropuntia imbricata, a very thorny cactus; shrubs in the background are Prosopis laevigata mesquites.


2.2 Eddy covariance measurements

The micrometeorological EC technique measures at the plant community level NEE in a nondestructive way and continuously over time (Baldocchi, 2014). The negative CO2 fluxes corresponded to NEE, which is equivalent to NEP (net ecosystem production) but with opposite sign. The EC has advantages compared to other techniques that need to scale up measurements from the leaf, plant or soil levels up to ecosystems, especially when the vegetation is heterogeneous (Yepez et al., 2003). However, EC is an expensive technique, and data analysis and processing are complicated; also specific assumptions must be met regarding the terrain, vegetation and micrometeorological conditions, among other aspects (Richardson et al., 2019).

The fluxes were measured with the EC technique at a height of 6 m with the following instruments: a Biomet system (LI-COR Biosciences, USA) to measure H2O and CO2 fluxes using an IRGASON-EC-150 open-circuit analyzer, a CSAT3 sonic anemometer and a KH20 krypton hygrometer; these were connected to a CR3000 data logger (Campbell Scientific Inc., Logan, UT, USA). The relative humidity and air temperature were measured with an HMP155A probe (Vaisala Corporation, Helsinki, Finland), net radiation was measured with an NR-Lite2 radiometer (Kipp and Zonen BV Delft, the Netherlands), and the photosynthetic active radiation (PAR) was measured with a quantum sensor (SKP215; Skye Instruments, Llandrindod Wells, UK). Measurements of the soil heat flux was implemented with four self-calibrating HFP01SC plates at 80 mm depth and in four representative positions of the landscape (Hukseflux Thermal Sensors BV, Delft, The Netherlands). Three time domain reflectometry (TDR) probes (CS616) measured volumetric water content in the ground installed vertically, and two sets of TCAV (averaging soil thermocouple) probes measured the temperature at 60 and 40 mm depths and above the HFP01SC plates (Campbell Scientific Inc., Logan, UT, USA). The TE525 (Texas Electronics, Dallas, TX, USA) tipping-bucket rain gauge was installed at 1.2 m height and 3 m away from the tower. All these meteorological variables were measured every 5 s, and average values were stored every 30 min; rainfall was accumulated for the same time interval. Sensible (H) and latent (λE) heat fluxes were calculated by the EddyPro package (LI-COR Biosciences, USA).

2.3 Flux data processing

All EC data were collected at 10 MHz in the data logger and reported as micromoles of CO2 per square meter per second and processed with the EddyPro package to convert values into average fluxes of 30 min intervals. Only quality-flagged records were used to account for the CO2 flux (qc_co2_flux=0) according to the Mauder and Foken (2011) policy in the EddyPro program (LI-COR, 2019).

However, this quality-checking is not sufficient, especially in the case of CO2; therefore, data were postprocessed using the REddyProc package of R (R Development Core Team, 2009) to estimate the friction speed thresholds (u*), gap-fill data, and partition the NEE flux into its GPP and Reco components (Wutzler et al., 2018). The filled-in estimates of NEE (NEE_uStar_f), GPP (GPP_uStar_f) and Reco (Reco_uStar) were used when the u* was lower than a u* annual threshold, above which nighttime fluxes are considered valid. The annual u* threshold was 0.193 and 0.194 for 2017 and 2018. The difference between these thresholds and the 95 % u* threshold was small (0.033 m s−1). Appendix A presents the threshold means and confidence intervals calculated in the REddyProc package. Only 1050 half-hour records (9.3 %) had a u* below the annual mean u* threshold. The data with a flag equal to 0 were used for the variable NEE_uStar_fqc, as defined by REddyProc. Carbon dioxide flux data were time-integrated and converted to grams of C per square meter per day using the molar ratio of C. We only reported the continuous measurements of the exchange of CO2 for the period of April 2017 (DOY 89) to August 2018 (DOY 234) using the EC technique. Due to equipment malfunction and incomplete datasets, some periods of time were not considered. The measurement campaign presented here was not biased by wet winters since both years were characterized by a less-than-weak Niño–Niña.

2.4 Remotely sensed data

Data were requested to the National Aeronautics and Space Administration (NASA) via the Land Processes Distributed Active Archive Center (DAAC) using the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) to obtain spatial and temporal subsets for the Bernal and Santa Rita sites including daily surface reflectance (MOD09GA.006 and MODOCGA.006), daily daytime and nighttime land surface temperature (LST) (MOD11A2.006 and MYD11A2.006), 8 d leaf area index (LAI) and fraction of photosynthetically active radiation (FPAR) (MOD15A2H.006, MYD15A2H.006, MCD15A2H.006), 16 d enhanced vegetation index (EVI) (MYD13A1.006), 16 d gross primary production (GPP), and net photosynthesis (PsnNet) (MOD17A2H.006). The AppEEARS also unpacks and interprets the quality layers. Appendix B presents the details of each spectral band of MODIS. Data with less-than-good quality flags were deleted. Missing data were filled with splines, and a database with a 1 d time step was generated. This would smooth linear temporal–phenological evolution between any two successive remotely sensed data points (Eshel et al., 2019). Daily accumulated rainfall was requested using the Giovanni platform from the Goddard Space Flight Center (GSFC) DAAC; two products were used: the 3IMERGDF.006 from the Global Precipitation Measurement mission (GPM) and the 3B42.007 from the Tropical Rainfall Measuring mission (TRMM). Gridded weather parameters from the ORNL DAAC Daymet dataset were precipitation, shortwave radiation, maximum and minimum air temperature, and water vapor pressure. Daymet is a data product derived from a collection of algorithms interpolating and extrapolating daily meteorological observations (Thornton et al., 2017). Following Henrich et al. (2012) and Hill et al. (2006), daily reflectance bands of MODIS were used to compute several vegetation indices: red–green ratio index (RGRI), simple ratio (SimpleR), moisture stress (MoistS), disease stress index (DSI), green atmospherically resistant vegetation index (GARI), normalized difference vegetation index (NDVI), normalized difference water index (NDVI_w) and enhanced vegetation index (EVI); the corresponding equations are presented in Appendix B.

2.5 MODIS algorithm for GPP

Estimates of GPP are derived from data recorded by the MODIS sensor aboard the Terra and Aqua satellites. The efficiency (ε; g C MJ−1) with which vegetation produces dry matter is defined as the amount of solar energy stored by photosynthesis in a given period divided by the solar constant integrated over the same period (Monteith, 1972). Not all incident solar radiation is available for biomass conversion; only about 48 % is photosynthetically active (PAR; MJ m−2), and not all PAR is absorbed (Zhu et al., 2008). Thus, carbon exchange is mainly controlled by the amount of PAR absorbed by green vegetation (APAR) and modified by ε (Gitelson et al., 2015). The fraction of absorbed PAR (FPAR) is equal to APAR/PAR but can be represented by the NDVI spectral vegetation index produced by MODIS (Running 2004). The efficiency term ε is described as the product of different factors as a whole or as a part of the system (Monteith, 1972) but mostly those related to the efficiencies with which the vegetation intercepts the radiation and the efficiency of converting the intercepted radiation into biomass (Long et al., 2015). The MODIS algorithm that estimates GPP in the MOD17 product is (Running et al., 2004)


The ε term in the MODIS algorithm is represented by a maximum radiation conversion efficiency (εmax; kg C MJ−1) that is attenuated by suboptimal climatic conditions, mainly minimum air temperature (Tmin) and vapor pressure deficit (VPD). Two parameters for each, Tmin and VPD, are used to define attenuation scalars for general biome types. These parameters form linear functions between the scalars (Running and Zhao, 2015; Wang et al., 2013): daily minimum temperature at which ε=εmax and at which ε=0 and the daylight average VPD at which ε=εmax and at which ε=0. GPP is truncated on days when air temperature is below 0 C, or VPD is higher than 2000 Pa (Running and Zhao, 2015). Stress and nutrient constraints on vegetation growth are quantified by the limiting relation of leaf area in NDVI × PAR rather than constrained through ε (Running et al., 2004). However, the MODIS algorithm does not consider stomatal sensitivity to leaf-to-air vapor pressure across and within species and particularly between isohydric and anisohydric plant species (Grossiord et al., 2020).

(3) ε = ε max × T min scalar × VPD_scalar

The MOD17 user's guide presents a biome-property look-up table (BPLUT) with the parameters for each biome type and assumes that they do not vary with space or time (Running and Zhao, 2015). This aspect is important because εmax has the strongest impact on the predicted GPP of the MOD17 algorithm (Wang et al., 2013). The assumption is also important because the overstory and understory could be decoupled from each other and would intercept different amounts of light and have different water sources during the growing season (Scott et al., 2003). Light quantity and quality as diffuse light or sunflecks determine differences among understory species in their temporal response to gaps, involving acclimation and avoidance of photoinhibition (Pearcy, 2007). Another shortcoming is that few land cover classifications are incorporated into the MOD17 algorithm.

2.6 Santa Rita site dataset

The Santa Rita Experimental Range (SRER) is located in the western range of the Santa Rita Mountains in Arizona, USA (lat 31.8214, long 110.8661; 1120 ma.s.l.). Climate is BSk, with mean annual precipitation of 380 mm, temperature of 17.9 C and Ustic Torrifluvent soils. Established in 1903, SRER has a long history of experimental manipulations to enhance grazing potential for cattle (Glenn et al., 2015). Two Ameriflux sites are located in the SRER: Santa Rita Grassland (US-SRG) and Santa Rita Mezquite (US-SRM). We used EC data for the years 2013–2019 from US-SRM, which is a mesquite grass savanna (35 % mesquite canopy cover and mean canopy height above 2 m, 22 % grasses, and 43 % bare soil), although MODIS describes this site as open shrublands (Glenn et al., 2015; Scott et al., 2004). The US-SRM site is dominated by velvet mesquite (Prosopis velutina) and has a diversity of shrubs, cacti, succulents and bunch grasses (McClaran, 2003). This site was chosen because the vegetation and climate are similar to the Bernal site, and it was the closest EC instrumentation with data availability (Fig. 4).

Figure 4Semidesert grassland encroached by mesquite (Prosopis velutina) at Santa Rita, Arizona (US-SRM). Image credit: Russell Scott, 9 December 2016.

2.7 Modeling

Gross primary production estimated by EC at the Bernal site was modeled using OLS and EML. The explanatory variables were the remotely sensed data, the weather parameters and the vegetation indices (Appendix B). The OLS is a particular case of the generalized linear model, where the variation of a single response variable is explained by several independent variables. The OLS was fitted with the stepwise procedure; the final model included variables with a variance inflation factor (VIF) lower than 10 and a significance level of 0.05. Predictions of the EC GPP were obtained with the final model. Analysis and diagnostics were made with Minitab v 17 (Minitab LLC). Analysis of the OLS model can be used to determine agreement between methods of measurement, but it is sensitive to the range of values in the dataset, and its metrics – r, r2 and root mean square – do not provide information on the type of association (Bland and Altman, 2010). In this paper, we used another metric, the probability of agreement, to determine bias and agreement between model estimates and observed data (see below).

While OLS is a well-known algorithm, machine learning algorithms are emerging techniques that focus on the data structure and match that data onto models. The EML approach considers the different realizations of machine learning models and constructs an ensemble of models coming with the advantage of being more accurate than the predictions from the individual ensemble members. However, EML is computationally intensive, requiring nodes with purpose-built hardware such as multiple processors or reduced-precision accelerators. The nodes could be aggregated in computing clusters, which require storage, power and cooling redundancy.

A stack of EMLs was obtained with the H2O package of R (Hall et al., 2019). This package provides several algorithms that can contribute to a stack of ensembles using the AutoML (automatic machine learning) function: feedforward artificial neural network (DL), general linear models (GLMs), gradient-boosting machine (GBM), extreme gradient boosting (XGBoost), default distributed random forest (DRF) and extremely randomized trees (XRT). AutoML trains two stacked-ensemble models: one ensemble contains all the models, and the second ensemble contains just the best-performing model from each algorithm class or family; both ensembles should produce better models than any individual model from the AutoML run. The term AutoML implies data preprocessing, normalization, feature engineering, model selection, hyperparameter optimization and prediction analysis, including procedures to identify and deal with nonindependent and identically distributed observations and overfitting (Michailidis, 2018; Truong et al., 2019).

Machine learning has two elements for supervised learning: training loss and regularization. The task of training tries to find the best parameters for the model while minimizing the training loss function, the mean squared error for example. The regularization term controls the complexity of the model, helping to reduce overfitting. Overfitting becomes apparent when the model performs accurately during the training, but the accuracy is low during the testing. A good model needs extensive parameter tuning by running the algorithm many times to explore the effect on regularization and cross-validation accuracy (Mitchell and Frank, 2017). In this investigation, the function of training loss was the deviance, which is a generalization of the residual sum of squares driven by the likelihood. Deviance is a measure of model fit, and lower or negative values indicate better model performance (McElreath, 2020).

A stack of EML solutions was based on a random sample of the dataset for training the model. For the Bernal site, 85 % was used for training; for US-SRM 80 % was used. The AutoML function was run 20 times; each run added approximately 48 models to the leader board and ranked the best-performing models by their deviance. Each run splits the training data 10 times for k-fold cross-validation. The seed for an EML that is dependent on randomization was changed in every run. The stopping rule for each run was set at 100 s, and the maximum memory allocation pool for H2O was 100 GB in a single workstation with dual Xeon 2680 v4 processors and 128 GB of RAM. The H2O package was installed in a rocker/geospatial docker container (image available at, last access: 9 January 2021), which is a portable, scalable and reproducible environment (Boettiger and Eddelbuettel, 2017).

Two sets of predictions for the GPP at the Bernal site were obtained from the stacked ensemble. The first set of predictions was based on the 15 % of the Bernal site data reserved for testing. The second set of predictions was obtained by refeeding the US-SRM site model with the Bernal site explanatory variables. The first set of predictions would show the importance of local data to predict EC-based GPP. The second set of predictions would represent the suitability of off-site data to predict EC-based GPP. If the second scenario has good agreement, then an EML model could be used to represent wider areas of the ecosystem.

2.8 Variable importance

The variable importance within individual models was used to answer the question of which environmental variables were important for GPP prediction. For the “all-models ensemble” and “best-of-family ensemble” generated by AutoML, it is not possible to examine the variable importance or the contribution of the individual models to the stack ( 2017). Therefore, a weight (wi) was calculated using Eq. (4), which is adequate for other information criteria besides the Akaike weights; this weight is an estimate of the conditional probability that the model will make the best predictions on new data, considering the set of models (McElreath, 2020). Then, the importance of each variable (%) was multiplied by the model's weight (wi) and then added by variable to build the variable importance index. This index would measure how often a given variable was used on the leader board.

(4)wi=exp-12dWAICij=1mexp-12dWAICi(5)dWAIC=deviancei-deviance of top-performing model

2.9 Model agreement

Calibration and agreement between methods of measurement are different procedures. Calibration compares known quantities of the true value or measurements made by a highly accurate method (a gold standard) against the measurements of a new or a contending method. When two methods of measurement are compared, neither provides an unequivocally correct measurement because both have a measurement error, and the true value remains unknown (Bland and Altman, 2010). Stevens et al. (2015) proposes the probability of agreement (θs) as a plot metric to represent the agreement between two measurement systems across a range of plausible values. The θs method addresses some of the challenges of the accepted “limits agreement method” presented by Bland and Altman (2010). Besides the agreement plot, agreement is based on maximum likelihood bias parameters: α and β, quantifying the fixed bias and the proportional bias. If α=0, β=1 and σ1=σ2, then the two measurements are identical; σj is the measurement variation. The probability-of-agreement analysis was performed using the ProbAgreeAnalysis (, last access: 28 November 2020) in MATLAB 9.4 (MathWorks, Inc.). An arbitrary 1 gCm-2d-1 was considered to be a tolerable magnitude to conclude that agreement is sufficient enough to use either estimated GPP interchangeably. The reference measurement was the GPP obtained from EC data at the Bernal site and tested against the MODIS MOD17 model, the OLS predictions or each of the two sets of EML predictions. If the probability-of-agreement plot suggested disagreement between two measurement systems, then the predictions can be adjusted using

(6) Adjusted predictor = predictor - α / β
3 Results

3.1 Eddy covariance fluxes at Bernal

Dominant flow at the Bernal site was from the northeast (Fig. 2). Energy balance closure for this site had a slope of 0.72 and r2=0.92 (Fig. 5). Homogeneous sites of the Fluxnet network obtain closure percentages higher than 72 %, and for the Bernal site, the vegetation heterogeneity was important (see below). Average H was always negative during nighttime, but during some months of the dry and rainy season, λE was positive, particularly after dawn (Fig. 6), so as to allow nighttime evaporation from the soil or vegetation. However, during some months of the dry season, rainfall was low (Fig. 7); then the positive λE suggested that cacti could have an active gas exchange at that time.

Figure 5Closure of the surface energy balance from eddy covariance measurements averaged at 30 min between the turbulent fluxes (H+λE) and available fluxes (Rn-G). Data are from 30 March 2017 to 22 August 2018 at the Bernal site. The regression was y=23.02+0.72×, with adjusted r2=0.92. The diagonal line represents the 1:1 relation.


Figure 6Latent heat flux daily trend at Bernal during different months, emphasizing nighttime λE. Panel (a) shows months with low rainfall in the previous month and predominantly negative λE during nighttime (these months had low rainfall: 0.17, 11.3 and 0.33 mm rainfall for January, March and December). Panel (b) shows months with low rainfall in the previous month and positive λE after sunset (6.7, 7.3 and 20.7 mm rainfall for February, April and May). Panel (c) shows months during the rainy season with positive λE mainly due to soil wetness and antecedent rainfall of 43, 202, 9, 190 and 34 mm for June, July, August, October and November. Plot (c) is out of scale on the y axis for compatibility with the other plots.


Figure 7Monthly rainfall and land surface temperature (LST) during years (a) 2017 and (b) 2018 at the Bernal site. The LST values correspond to the 13:30 (LST day) and 01:30 (LST night) MODIS Aqua satellite overpasses.


Carbon dioxide absorptions had a diurnal behavior beginning at dawn and ending before sunset (Fig. 8). Nighttime flux was positive, indicating respiration, notwithstanding the presence of cacti. Although summer rains are characteristic of the climate at the Bernal site (Fig. 7), a negative NEE flux occurred in all measured months. The lowest CO2 flux was recorded in January and February 2017 and in May 2018 (Table 1). This behavior resulted from the phenology of the vegetation since most species lost their leaves in the dry season and was also due to the effect of low temperature. Within the rainy season, the flux of CO2 increased compared to the months of January to June. The correlation between NEE and precipitation was 0.45. When the sum of the precipitation of the current month and that of the previous month was considered, the correlation with NEE was 0.7, suggesting that continuous availability of soil moisture is important for the absorption of CO2 in this environment.

Figure 8Net ecosystem exchange (NEE) and photosynthetic active radiation (PAR) at the Bernal site in (a) 2017 and (b) 2018. Negative values in the CO2 flux indicate photosynthesis. The gray shadow is the standard error of the mean for each month at any given hour.


Table 1Daily average values of the net ecosystem exchange (NEE), gross primary productivity (GPP) and ecosystem respiration (Reco) in a scrub at the Bernal site. Negative values of NEE indicate photosynthetic absorption.

Download Print Version | Download XLSX

The scrub at Bernal was heterogeneous in botanical composition. A total of 24 species of cacti and shrub were identified; on average, each sampling plot had 10.3 species. The IVI was similar between all cacti (0.36±0.04), shrub legumes (0.38±0.04) and other shrubs (0.23±0.06) sampled. Most sampling plots were in areas of high flux frequency (Fig. 2). Cylindropuntia imbricata had the largest IVI, followed by Acacia farnesiana, Acacia schaffneri and Prosopis laevigata. The IVI of the herbaceous stratum represented by grasses was not characterized due to the state of overgrazing and the absence of reproductive structures in plants, which made measurement of their abundances, frequencies and dominance difficult. The grass genera present were Melinis, Chloris, Cynodon and Cenchrus, all corresponding to invasive C4 tropical grasses. Scrub species of higher IVI had a similar LAI (1.2), although the magnitude of the LAI of P. laevigata stood out (Table 2).

Table 2Importance value index (IVI) and leaf area index (LAI) of the main species present at the Bernal, Querétaro, study site.

a SEM: standard error of the mean.

Download Print Version | Download XLSX

3.2 Machine learning ensembles as predictors of eddy covariance GPP

In this section we describe the modeling with EML using local remotely sensed data from the Bernal site to predict GPP at the same site and then the agreement between EML GPP predictions and EC-derived GPP. The AutoML function generated 1031 models with an average deviance of 1.35, while the deviance of the leader model was 0.63 in the training dataset (Table 3). A total of 11 models of type GBM and five XGBoost models were in the top 30 models, along with the nine best-of-family ensembles and five all-models ensembles. The weighted variable importance on the leader board was higher for the LAI from the MOD15 and MCD15 products (17 % and 14 %). The PsnNet, EVI (MOD17), FPAR (MCD15), green atmospherically resistant vegetation index (GARI) and MODIS reflectance band 13 had an importance higher than 3 % (5.9 %, 5.4 %, 4.2 %, 3.6 % and 3.0 %). LAI (MCD13 and MOD13), PsnNet and the FPAR (MOD15) were the more important variables (20 %, 17 %, 13 % and 10 %) in the top nonstacked model, a GBM model that was ranked in fourth place.

Predictions of GPP in the testing dataset showed dispersion in the lower range of the scale of measurement, and the correlation was 0.94 (Fig. 9a). The final prediction of GPP for the whole dataset had a probability of agreement (θs) of 0.58±0.01 (parameter estimate and standard error), α=-0.0616±0.11 and β=1.0133±0.02, suggesting a good fit with low fixed and proportional bias (Fig. 9b). The probability of agreement decreased slightly at the lower and upper range of the scale of measurement (Fig. 9c), indicating that the EML model would predict GPP without increasing the bias, particularly in the range from 0 to 4 gCm-2d-1. However, the value of θs should be higher than 0.95 so as to consider EC measurements and EML to be interchangeable. The correlation of 0.99 (r2=0.98) for the data in Fig. 9b could be misleading as it would suggest a very good fit.

Using only the five of the more important variables named above to generate an EML resulted in an XBoost leader model with 2.73 deviance and a total of 1094 models. The top 30 models were 16 XBoost and GBM models, and the best-of-family ensembles started to show up in 12th place. Although the number of runs was the same (20), the AutoML function increased the number of produced models, but the smaller set of explanatory variables constrained the ability to identify features contributing to better models. Using another set of five randomly selected explanatory variables (one vegetation index and four MODIS bands) resulted in a leader model with 2.52 deviance out of 1024 models, but this time the leader was the best-of-the-family ensemble. Using only a few variables was considered to increase the deviance compared to the average deviance of 1.35 obtained during the training phase and using all available variables.

Table 3Leader board of EML models for the Bernal site, using 85 % of day observations as training dataset. NA denotes the outcome where the type of model was not present in the 30 top-performing models according to deviance.

Download Print Version | Download XLSX

Figure 9Agreement between predictions of GPP obtained with machine learning algorithms or derived from eddy covariance measurements at the Bernal site. (a) Example of one run of predictions of GPP in the test dataset from the Bernal site using the leader model of an ensemble of machine learning algorithms (EML); the test dataset was 15 % of data, with r=0.94. (b) Predictions for the complete dataset, with r=0.99; the diagonal line is the 1:1 agreement. (c) Function of probability of agreement using 1 gCm-2d-1 as a tolerable agreement between methods of estimation of GPP corresponding to plot (b). The horizontal axis (s) represents the magnitude of the measurement, the vertical axis is the probability of agreement for the measurement, and the red line is the confidence interval p<0.05.


An important question for modeling upscaling is the capacity to extrapolate results temporally and spatially; here we explored the latter, posing the following question: would predictions of GPP from EML for an EC site agree with EC observations from another site with “similar” environmental conditions? First, an EML solution was found, training 80 % of the Santa Rita dataset and obtaining a best-of-family ensemble with 0.23 deviance out of 634 trained models (hereafter this model is referred to as the Santa Rita model). Then, the environmental and remotely sensed data from the Bernal site were fed into the Santa Rita model; this would be an external validation dataset. However, agreement was not good; the mean value of θs was 0.15±0.01, with α=-1.0822±0.09 and β=0.58127±0.02. The value of θs was not constant across the range of measurement and decreased rapidly after 2 gCm-2d-1 (Fig. 10b). Because the bias was important, predictions were adjusted using Eq. (6), showing some improvement with r=0.78 (Fig. 10a). Comparing Figs. 5b and 6a, it is evident that an EML model extrapolation to other conditions is noisier, i.e., Santa Rita model trying to represent the ecosystem function at Bernal. Notwithstanding, some of the most important variables were shared by both EML ensembles: Bernal and Santa Rita; in the case of Santa Rita, LAI from MOD15, MYD15 and MCD15 had 35.0 %, 4.8 % and 3.1 % of variable importance, and the FPAR from MOD15 was 12 %.

Figure 10(a) Adjusted predictions of GPP for the complete dataset from the Bernal site using the leader model of the final ensemble of machine learning algorithms (EML) derived from the Santa Rita site compared to estimates of GPP from EC data, with r=0.78. (b) Respective function of probability of agreement using 1 gCm-2d-1 as a tolerable agreement between methods of estimation of GPP: EC data from the Bernal site and EML model for the Santa Rita site. The horizontal axis (s) represents the magnitude of the measurement, the vertical axis is the probability of agreement for the measurement, and the red line is the confidence interval p<0.05.


3.3 MODIS as predictor of eddy covariance GPP

MODIS is important because it overpasses every point of the earth every 1 or 2 d, and it implements a GPP product (MOD17) that has helped track the response of the biosphere to the environment since 2000. The product MOD17 has been validated against many EC sites, but few validation sites correspond to deserts and semideserts (Running et al., 2004). The GPP MOD17 underestimated the GPP derived from EC data at Bernal (Fig. 11a). In a similar bell-shaped distribution of θs, as in the case of the extrapolation of the Santa Rita site (Fig. 10b), here the θs was not constant across the range of measurement; mean θs was 0.24±0.13, with α=0.00047±0.087 and β=0.48749±0.02 (Fig. 11c). Adjusting MOD17 estimates with Eq. (6) improved the relationship, but note that the value of r (0.76) was the same for the original MOD17 and the adjusted MOD17 (Fig. 11b).

Figure 11(a) MOD17 GPP daily time step from the Bernal site versus estimates of GPP from EC data using the complete dataset at a daily time step derived from spline for MOD17 GPP (), with r=0.76, or 8 d composite estimates obtained by 8 d averages of EC GPP (), with r=0.76. (b) Adjusted predictions of GPP from MODIS. (c) Corresponding function of probability of agreement for Fig. 7a using 1 gCm-2d-1 as a tolerable agreement between methods of estimation of GPP: EC data from the Bernal site and MODIS MOD17. The horizontal axis (s) represents the magnitude of the measurement. The vertical axis is the probability of agreement for the measurement.


3.4 Prediction eddy covariance GPP with ordinary-least-squares multiple regression

OLS is a common estimation method for linear models, and here this model appeared as adequate, judging by the general distribution of predictions (Fig. 12a) and the probability-of-agreement plot (Fig. 12b). A total of 14 variables were included in the model, all of them with VIF values lower than 7.0 (Appendix C); the VIF statistic quantifies the severity of multicollinearity, and an acceptable threshold is 10. The most significant variables were the EVI from MYD13 and Daymet variables precipitation, shortwave radiation, and minimum and maximum temperatures (see Appendix B for variable details). Variables with high coefficient values were MODIS reflectance band 14 (9.17), the EVI from MYD13 (8.53) and the NDVI (3.9), while Daymet temperatures had small coefficients: 0.23 for maximum temperature and 0.17 for minimum temperature. The θs decreased slightly at the end of the measurement range; mean θs was 0.5±0.014, with α=0.18845±0.137 and β=0.94966±0.031. No correction for this model was calculated since α was close to 0, β was close to 1, and the EML model had a higher θs.

Figure 12(a) Ordinary-least-squares multiple-regression estimates of GPP for the complete dataset from the Bernal site versus estimates of GPP from EC data, with r=0.82. (b) Respective function of probability of agreement using 1 gCm-2d-1 as a tolerable agreement between methods of estimation of GPP: EC data from the Bernal site and OLS multiple regression. The horizontal axis (s) represents the magnitude of the measurement. The vertical axis is the probability of agreement for the measurement.


3.5 Agreement

The probability of agreement (θs) was the statistic to determine if the mean responses were in agreement. Comparing the confidence intervals (CIs) for θs, the best modeling approach was the EML because its CI (0.56–0.59) was different from that of the OLS model (0.47–0.52). More importantly, for the EML and OLS models, the values of θs had little variation across the range of GPP estimates. However, the CIs of the EML and OLS models were similar in their bias estimates α and β, and both models had no bias (p<0.05). Altogether, the best result was the EML ensemble using environmental and remotely sensed data corresponding to the same site, i.e., Bernal (Fig. 10c). This kind of EML would be useful for gap filling or the evaluation of GPP time series of the site that generated the model. Machine learning algorithms can fill gaps longer than 30 d (Kang et al., 2019).

The second option to estimate GPP, using the same datasets, was the multiple-regression OLS model (Fig. 12a). The multiple regression is straightforward, and here multicollinearity was not a problem. The EML ensemble and the OLS regression have the highest values of θs (0.58 and 0.5, respectively; Fig. 12b). Higher values for θs are desirable (>0.95), and this could be achieved by increasing the sample size, relaxing the tolerable magnitude for agreement (here it was set at 1 gCm-2d-1) or perhaps using different forcing variables.

The MODIS estimates were a third-best alternative since the mean θs was 0.24. In the present study, we used a spline to fill the data to a daily time step since the MCD17 is an 8 d composite product, but a similar result was obtained if the EC GPP was rescaled and compared to the original 8 d MCD17 data (Fig. 11a). The GPP from MODIS was an underestimate of EC GPP, and when the estimates were adjusted (Eq. 6, Fig. 11b) the performance was not better than the OLS (model not shown). The MODIS land cover classification represented this site as grassland (MCD12), and this could be another reason for the poor agreement besides the assumptions made in the MODIS algorithm regarding the ε and εmax parameters and the response of vegetation to VPD. Agreement of MODIS GPP is crucial because MODIS products are frequently used in country-wide assessments of the carbon cycle and can influence public policies.

The model with least agreement resulted when the EML ensemble generated from the Santa Rita site was used to predict GPP at Bernal. Machine learning models can make predictions, but their usefulness decreases when they are used outside the context of where they were built, while process-based mechanistic models have this ability. Although the AutoML function in H2O is designed to protect against overfitting using cross-validation runs (Michailidis, 2018), in our study, the Santa Rita model could not be generalized to represent the Bernal site GPP process, probably because the variables and features selected for Bernal or Santa Rita were different during the AutoML workflow.

The Santa Rita model was good at predicting GPP at that site, with a deviance of the leader model of 0.23, while at Bernal the deviance of the leader model was 0.63 (Table 3), indicating that the Santa Rita EML ensemble was at least as good a model as the EML at Bernal (not shown). The GPP time series for Santa Rita was about 4 times the size of the Bernal dataset, and therefore the deviance was lower. However, when the Santa Rita model was used with Bernal data, the mean θs was 0.16, indicating that the agreement was insufficient. Eyeballing the predictions in Figs. 9a, 10b and 11a and their corresponding correlation values (0.78, 0.76 and 0.82 for adjusted EML Santa Rita, adjusted MODIS and multiple-regression OLS), it could be argued that these models were comparable. However, their θs plots present a different perspective.

4 Discussion

4.1 Model agreement

The Santa Rita model had higher θs when it extrapolated for low GPP values for the Bernal site, suggesting that the Santa Rita model had a better skill when predicting GPP close to 0 and even negative GPP values (Fig. 10b). Although the Bernal and Santa Rita sites had similar vegetation and climate classifications, they are more than 1600 km apart, and rainfall monthly distribution is different. The GPP seasonal cycle at Bernal started in February and steadily increased to a maximum during July and August (Table 1). At Santa Rita, the GPP was low from January until mid-July (<0.5gCO2m-2d-1) and then increased sharply to a maximum level in mid-August (Joiner and Yoshida, 2020). What matters eventually for machine learning methods is how well the predictor space rather than geographic space is sampled (Jung et al., 2020). Incorporating data from more humid (semiarid) sites could improve the GPP predictions by a machine learning method. Not only more EC sites, but also sites representing a water availability gradient would be important for semiarid ecosystems, representing long spells and the influence of oceanic oscillations and monsoon rains.

All the models presented here used transient data to represent GPP, specifically at the 1 d time step. Besides the radiation-related variables, two sources of rainfall were used as forcing variables, but evapotranspiration (ET) was not used. The MOD16A2 version 6 is available as an 8 d gap-filled product and could be included in EML or OLS models. However, a more general representation of the carbon cycling could be achieved when including variables that represent annual or seasonal timescales of soil water, evaporation or precipitation (Scott and Biederman, 2019). Scott et al. (2015) suggest that real lags between precipitation and productivity that may impart legacy effects may also be partially masked by using ET as ET more carefully tracks productivity when soil moisture storage is accessed. The occurrence of off-season rainfall, dry spells and carryover effects could be parameterized as windowed events of a given duration. A window would be a period with distinct time boundaries; the window allows the grouping of records with similar features. The effect of the window at a given time could be represented as moving weights as the point in time in question is closer or farther from the window.

In our study, the metric to assess model agreement was the probability of agreement (θs) and their bias parameters. Many other metrics can be used to evaluate model performance such as the root mean square error, r, r2 or the model efficiency factor (MEF) presented by Nash and Sutcliffe (1970). In particular, the MEF is a step in the Fluxnet data processing pipeline (Pastorello et al., 2020). A MEF value close to 1 represents a high correlation and lower biases (Joiner and Yoshida, 2020). Therefore, the calculated MEF for the Bernal site EML (0.98) would suggest very good predictive performance, while the θs of 0.6 for this model indicates a more modest performance. The θs for a particular value of the measurand is the probability that the difference between two measurements made by different systems falls within an interval that is deemed to be acceptable (Stevens et al., 2015). In the present study we used 1 gCm-2d-1 as a critical value defining an acceptable difference. If this value is smaller, then the probability of agreement would decrease for this same model. In such a case, it would be less likely that the predictors agree considering that the θs takes into account both the difference in the function means at a given value of the measurand and the uncertainty in its estimation.

4.2 MODIS discrepancies

Different authors have reported discrepancies between MCD17 and EC estimates of GPP in semiarid regions. Examining MODIS discrepancies in these ecosystems is important because the errors induced by cloud cover are expected to be minimal, and other effects can be identified (Gebremichael and Barros, 2006). The GPP of MOD17 did not relate well (EC=0.11+0.17MODIS, r2=0.67) with estimates of EC GPP in semidesert vegetation of the Sahel (Tagesson et al., 2017). With data from different types of vegetation in the Heihe basin in China, MODIS17 overestimated the GPP from EC (EC=1.15+0.24MODIS, r2=0.68; Cui et al., 2016). For scrub sites in Mexico, the relation between GPP calculated from EC and MOD17 was not good (MODIS=383.82+0.467EC, r2=0.6; Delgado-Balbuena et al., 2018). In arid and semiarid ecosystems in China, optimizing parameters of the MODIS GPP model with site-specific data improved the estimate to explain 91 % of the variation in the GPP of the data observed by EC (Wang et al., 2019). These same authors propose improving the land use classification used by the MOD17 algorithm and recalibrating light use efficiency parameters to solve the GPP estimation problem. Gebremichael and Barros (2006) examined an open shrubland site in a semiarid region of Sonora, Mexico, and their analysis of the temporal evolution of the discrepancies with MODIS GPP suggested revisiting the light use efficiency parameterization, especially the functional dependence on VPD and PAR, and water stress or soil moisture availability.

The relationship between the GPP MODIS and the GPP EC presented in Sect. 3.3 is an approximation because the uncertainty in the respiration component must be considered. The empirical relationship between nocturnal NEE and soil temperature has been used to represent ecosystem respiration (Reco) in order to separate the processes that contribute to daytime NEE (Richardson and Hollinger, 2005; Wofsy et al., 1993). Nighttime NEE should be equal to the rates of autotrophic and heterotrophic respiration, while during daytime, NEE should be equal to the combined rates of carboxylation and oxidation of Rubisco, autotrophic respiration and heterotrophic respiration. Then, the GPP can be calculated as the difference between daytime NEE and Reco, estimated through its relationship with temperature (Goulden et al., 1996). In the present study, Reco was calculated based on soil and air temperature following the procedure of Reichstein et al. (2005) implemented in REddyProc (Wutzler et al., 2018). Although it is possible to measure or model the partition of respiration (Running et al., 2004; Wang et al., 2018), the presence of cacti complicates the calculation, assuming that all nighttime flux represents ecosystem respiration (Owen et al., 2016; Richardson and Hollinger, 2005). While soil respiration tends to be temperature-limited when soil moisture is nonlimiting in temperate ecosystems, in rangeland ecosystems the controls of soil CO2 efflux were photosynthesis, soil temperature and moisture (Roby et al., 2019). In our study, the instrumentation did not include measurements of plant or soil respiration partition to validate the Reco estimates.

A problem regarding data comparison from remote orbital sensors and terrestrial observations is that different quantities are fundamentally measured. MODIS measures the radiation reflected by the earth's surface in two spectral bands at 250 m spatial resolution per pixel, five bands at 500 m and 29 bands at 1000 m. The EC technique has a footprint to measure CO2 that varies dynamically in shape and size but is generally considered to be 1 km2; in this study the footprint radius was 600 m. To solve the scaling, MODIS products related to the carbon cycle have been validated with the EC technique and biometric measurements on several spatial scales using process-based ecosystem models and characterizing areas up to 47 km2 around the EC measuring tower (Cohen et al., 2003).

At Bernal, the vegetation was heterogeneous. This situation was represented by the heterogeneity in vegetation activity in the 4 pixels used, spanning about 0.2 units of NDVI at the peak of the season activity (Fig. 1). The standard error of the IVI differed by 1 order of magnitude among species. Although more important species had a lower standard error, their means were similar, indicating that they were equally abundant at all sampling plots. The higher importance of some species (Table 2) was explained by selective grazing–browsing behavior and the dispersion caused by cattle by either ingesting or transporting seeds or plant parts (Belayneh and Tessema, 2017). Regarding landscape heterogeneity, the tower fetch was predominantly from the northeast, capturing the less heterogeneous area of the site but also the more active, according to the NDVI (Fig. 1).

The thorny scrub examined had two vegetation layers: the overstory layer mainly consisting of mesquite, acacia and cacti and the understory layer that included grasses and herbs. Cattle preferentially graze the understory, and because they eat using their tongue, they will avoid browsing thorny species, unlike goats or deer, which use their lips. Without grazing management, overtime, the competition balance will favor bush species, resulting in encroaching, and the understory will be stressed; only unpalatable species or those with their growing meristems very close to the ground would survive. Representing the structure and functioning of these two layers using MODIS is possible (Liu et al., 2017). Recently, Hill and Guerschman (2020) presented a MODIS product derived from MCD43A4 to estimate the fractions of photosynthetic and nonphotosynthetic vegetation and the remaining fraction of bare soil. These developments could improve the MOD17 GPP estimates since its model represents a homogeneous single vegetation layer. All these considerations help to understand the low θs between MOD17 estimates of GPP and EC-derived GPP.

4.3 Forcing variables and machine learning

Forcing variables in this study were gridded meteorological data and MODIS spectral bands and products, but the variables mostly included in the ML algorithms were LAI and FPAR, accounting for 36.4 % and 54.9 % of the variable importance index at Bernal and Santa Rita, respectively, in their leader board. This result supported the view that CO2 fluxes can be represented by ML algorithms exclusively using remotely sensed data (Tramontana et al., 2016; Joiner and Yoshida, 2020). Using neural networks, Joiner and Yoshida (2020) showed that with only satellite reflectance from MODIS and top-of-atmosphere PAR, it is possible to capture a large fraction of GPP variability. They argue that vegetation indices may reduce the information content of the underlying reflectance when compressing the information from two or more bands into a single index. In our study, reflectance bands were not often included in the ML models. This was most likely because MOD09GA and MODOCGA provide estimates of surface reflectance uncorrected for the illumination and viewing geometry, while the MCD43 used in Joiner and Yoshida (2020) uses a surface bidirectional reflectance distribution function (BRDF) model that improves the quality of surface albedo retrievals. By propagating the BRDF correction in the MODIS processing pipeline, the vegetation indices would more likely relate better to the GPP. Future modeling efforts should use benchmark scenarios according to sets of forcing variables already identified as useful.

4.4 Carbon flux

Although the carbon balance in ecosystems is influenced by different factors such as soil type and amount of nutrients, the relationship with soil temperature and humidity is particularly strong (Anderson-Teixeira et al., 2011; Hastings et al., 2005). How much of the rainwater the system can retain or lose has been described as the leakiness of the system (Guerschman et al., 2009). More than immediate incident rainfall, the available soil moisture and its redistribution are important in semiarid ecosystems, including steam flow, preferential flow paths, hydraulic lift and others (Barron-Gafford et al., 2017). At Bernal, when the sum of the precipitation of the current month and that of the previous month was considered, the correlation with NEE was 0.7, suggesting that continuous availability of soil moisture is important for the absorption of CO2 in this environment. This result is consistent with other studies in which the relationship between the net productivity of the ecosystem (NEP) and precipitation is initially positive but is leveled from 1000 to 1500 mm annually (Xu et al., 2014). The hydraulic redistribution of water from moist (deeper) to drier soils through plant roots tended to increase modeled annual ecosystem uptake of CO2; this process was identified at US-SMR (Fu et al., 2018; Scott et al., 2003).

The leakiness is highly dependent upon vegetation fractional cover, the proportion of the surface occupied by bare soil and vegetation: photosynthetically active vegetation and nonphotosynthetically active vegetation such as litter, wood and dead biomass (Guerschman et al., 2009). All these vegetation fractions have a water storage capacity and can reduce the amount of effective rainfall available to plant roots. It is possible that the canopy of the bushes completely intercepted the rainfall in some months because the scrub can intercept up to 20 % of the precipitation, and its canopy storage capacity is 0.97 mm (Mastachi-Loza et al., 2010). Considering only the daily rain events greater than 5 mm, the correlation between precipitation and NEE rose to 0.72. In the present study, the interception of rain by vegetation surfaces was not calculated, but the results suggest that it would be important to explore the relationship between net precipitation and NEE.

The average NEE at a global level is -156±284gCm-2yr-1 (Baldocchi, 2014). The highest frequency among sites that measured NEE with EC occurs from 200 to 300 gCm-2yr-1, but in sites with biometric measurements, the peak occurs at 100 gCm-2yr-1 (Xu et al., 2014). Using the daily averages of Table 2, the average NEE during the measurement period was 0.78 gCm-2d-1 and annually would be 283.5 gCm-2yr-1. This result was higher than the annual values of the induced grassland and scrubland vegetation characterizing the Sonoran Desert plains (138 and 130 gCm-2yr-1; Hinojo-Hinojo et al., 2019). In New Mexico, NEE values measured with EC are between 35–50 gCm-2yr-1 in desert grassland and 344–355 gCm-2yr-1 in mixed coniferous forest (Anderson-Teixeira et al., 2011). In a dryer region, the sarcocaulescent scrubland of Baja California in Mexico, the NEE was 39 and 52 gCm-2yr-1 in 2002 and 2003, respectively (Hastings et al., 2005). The NEE measured here was within the range of NEE, 0.3±0.2kgCm-2yr-1, for grasslands and shrublands in Mexico (Murray-Tortarolo et al., 2016). Although the measurements of the present study had gaps and were compared with annual studies, we considered that the reported value of C was representative of the main season of growth of this type of scrub.

4.5 Ecosystem management

Overgrazing is an appreciation relative to the grazing productive system where the forage resource is overused; in a mixed shrub–grass ecosystem, such as Bernal, it usually refers to the understory. Overgrazing means that the plant regrowth is readily grazed, tillers and root reserves are lost, and eventually the plant may die. Although the Bernal site was overgrazed, the carbon fluxes indicated that the plant community was photosynthetically active in both dry and rainy seasons. It is fair to assume that water in the soil was not limiting for the deep-rooted bush species, and that was the reason why it was possible to maintain the photosynthetic function during rainless months. However, this primary production would not have tangible benefits for ranchers' production system since no edible biomass would be produced for the cattle. From the point of view of carbon capture, the system accumulated nonlabile biomass that would remain in the system for a longer time compared to a grassland ecosystem (although it would be necessary to determine the partition of said shrub biomass). However, the overgrazing condition affects the biomass of the understory roots and consequently the carbon pool in the soil.

In the short term, it can be thought that the estimated negative carbon flows had a favorable effect on the environmental agenda. As time passes, it is possible that the gaps between the individual shrubs of the overstory expand, and this would have had an effect on soil erosion. It is also possible that the water stored in the soil profile used by the bushes gradually decreases to the point of causing drought, changes in phenology and advancements of the desertification process. There are many opportunities for ecology conservation and livestock-oriented management; these may include controlled grazing or propagating native thornless shrub species. If ranchers do not identify a benefit in the vegetation, then they will be tempted to remove it as it occurred at the study site. Because of its wide coverage and availability, the MODIS GPP product accuracy is important in representing the carbon cycle, raising awareness and monitoring advancement of environmental decisions.

Although we found that EML was a good option for modeling the GPP of a site, what is really needed to evaluate the performance of semiarid ecosystems is a spatial representation of the carbon flux. This is a problem for an underrepresented area regarding instrumented EC towers. However, the EML could be designed to take into account the explanatory variables in a spatiotemporal continuity.

5 Conclusions

The best modeling approach was the ensemble of machine learning. The second option to estimate GPP was the multiple-regression OLS, and the third alternative were the MODIS estimates. Machine learning was a good option to predict GPP in the local context where it was generated; otherwise its performance was not good. This was demonstrated when using the EML from Santa Rita and trying to predict fluxes at Bernal. Nevertheless, a machine learning model would be useful for gap filling or the evaluation of GPP at the same site. The GPP estimates of a given model can be adjusted using the bias parameters of the probability of agreement to improve the relationship. Further research exploring the usefulness of EML to predict GPP in other contexts could use EML to calibrate mechanistic model parameters, hybrid approaches integrating EML and biologically meaningful mechanistic models, or EML representing the spatial variability of the landscape.

Appendix A:  

Table A1Threshold of velocity friction (u*) above which nighttime fluxes were considered valid. Estimates were obtained using the R package REddyProc. We used only the flux records with u* equal to or higher than the corresponding mean u* threshold for each year. Bound values are a 95 % confidence interval. NA: not applicable.

Download Print Version | Download XLSX

Appendix B:  

Table B1MODIS reflectance bands and products database.

n/a: not applicable.

Download Print Version | Download XLSX

Table B2Daily vegetation indexes computed using the MODIS reflectance bands described in Table 1.

Download Print Version | Download XLSX

Table B3Daymet meteorological database.

Download Print Version | Download XLSX

Table B4Precipitation data.

Download Print Version | Download XLSX

Appendix C:  

Table C1Analysis of variance for GGP derived from EC data at the Bernal site. Variable details are described in Appendix B.

a Standard error of coefficient. b Degrees of freedom. c Adjusted sum of squares. d Variance inflation factor. e Variable name denotes the band number and spectral bandwidth of MODIS (Moderate Resolution Imaging Spectroradiometer). f Vegetation indices: RGRI is red–green ratio index, and GARI is green atmospherically resistant vegetation index; details of formula are described in Appendix B. g Layers from MODIS products. EVIMYD is enhanced vegetation index from MYD13; PsnNet is photosynthesis product form MOD17; LstNgtMYD is nighttime land surface temperature emissivity from MYD11. h Variables obtained from Daymet daily dataset: DayTmin is minimum temperature; DayTmax is maximum temperature; Daysrad is shortwave radiation; Dayprc is precipitation. i Daily rainfall rate from 3B42 TRMM (Tropical Rainfall Measuring Mission).

Download Print Version | Download XLSX

Code and data availability

Database and programming code are available at (Guevara-Escobar et al., 2020).

Author contributions

AGE contributed to conceptualization, formal analysis, investigation, methodology, software, supervision, validation, visualization and writing of the original draft as well as review and editing. EGS was in charge of funding acquisition, project administration, and writing and review. MCJ contributed to investigation, methodology, validation, visualization and writing of the original draft as well as review, editing and communication. MEQB helped with investigation, validation, writing and review. HSA contributed with conceptualization, writing and review. ICÁ helped with investigation, writing and review. VHCS contributed with writing, review and funding acquisition.

Competing interests

The authors declare that they have no conflict of interest.


We thank NASA and Ameriflux for data availability. Data filtering was carried out by the cluster Abacus I, Centre of Applied Mathematics and High Performance Computing ABACUS-CINVESTAV.

Financial support

This research has been supported by CONACyT – SEMARNAT (grant no. 2014-1-(249407)).

Review statement

This paper was edited by Paul Stoy and reviewed by three anonymous referees.


Aguirre-Díaz, G. J., Aguillón-Robles, A., Tristán-González, M., Labarthe-Hernández, G., López-Martínez, M., Bellon, H., and Nieto-Obregón, J.: Geologic setting of the Peña de Bernal Natural Monument, Querétaro, México: An endogenous volcanic dome, Geosphere, 9, 557–571,, 2013. 

Anav, A., Friedlingstein, P., Beer, C., Ciais, P., Harper, A., Jones, C., Murray-Tortarolo, G., Papale, D., Parazoo, N. C., and Peylin, P.: Spatiotemporal patterns of terrestrial gross primary production: A review, Rev. Geophys., 53, 785–818,, 2015. 

Anderson-Teixeira, K. J., Delong, J. P., Fox, A. M., Brese, D. A., and Litvak, M. E.: Differential responses of production and respiration to temperature and moisture drive the carbon balance across a climatic gradient in New Mexico, Glob. Change Biol., 17, 410–424,, 2011. 

Baldocchi, D.: Measuring fluxes of trace gases and energy between ecosystems and the atmosphere–the state and future of the eddy covariance method, Glob. Change Biol., 20, 3600–3609,, 2014. 

Barron-Gafford, G. A., Sanchez-Cañete, E. P., Minor, R. L., Hendryx, S. M., Lee, E., Sutter, L. F., Tran, N., Parra, E., Colella, T., Murphy, P. C., Hamerlynck, E. P., Kumar, P., and Scott, R. L.: Impacts of hydraulic redistribution on grass–tree competition vs facilitation in a semi-arid savanna, New Phytol., 215, 1451–1461,, 2017. 

Belayneh, A. and Tessema, Z. K.: Mechanisms of bush encroachment and its inter-connection with rangeland degradation in semi-arid African ecosystems: a review, J. Arid Land, 9, 299–312,, 2017. 

Bland, J. M. and Altman, D. G.: Statistical methods for assessing agreement between two methods of clinical measurement, Int. J. Nurs. Stud., 47, 931–936, 2010. 

Boettiger, C. and Eddelbuettel, D.: An introduction to rocker: Docker containers 639 for r, ArXiv Prepr. ArXiv171003675, 640, 2017. 

Bonilla-Moheno, M. and Aide, T. M.: Beyond deforestation: Land cover transitions in Mexico, Agric. Syst., 178, 102734,, 2020. 

Booker, K., Huntsinger, L., Bartolome, J. W., Sayre, N. F., and Stewart, W.: What can ecological science tell us about opportunities for carbon sequestration on arid rangelands in the United States?, Glob. Environ. Change, 23, 240–251,, 2013. 

Butzer, K. W. and Butzer, E. K.: The `natural'vegetation of the Mexican Bajío: archival documentation of a 16th-century savanna environment, Quat. Int., 43, 161–172,, 1997. 

CICESE, C. de I. C. y de E. S. de E.: Base de datos climatológica nacional (CLICOM)., Villa Bernal, Querétaro, available at: (last access: 28 August 2019), 2015. 

Cohen, W. B., Maiersperger, T. K., Yang, Z., Gower, S. T., Turner, D. P., Ritts, W. D., Berterretche, M., and Running, S. W.: Comparisons of land cover and LAI estimates derived from ETM+ and MODIS for four sites in North America: A quality assessment of 2000/2001 provisional MODIS products, Remote Sens. Environ., 88, 233–255,, 2003. 

Cui, T., Wang, Y., Sun, R., Qiao, C., Fan, W., Jiang, G., Hao, L., and Zhang, L.: Estimating vegetation primary production in the Heihe River Basin of China with multi-source and multi-scale data, PloS One, 11, 1–20,, 2016. 

Curtis, J. T. and McIntosh, R. P.: The Interrelations of Certain Analytic and Synthetic Phytosociological Characters, Ecology, 31, 434–455,, 1950. 

Delgado-Balbuena, J., Yepez, E. A., Ángeles-Pérez, G., Aguirre-Gutierrez, C., Arredondo, T., Ayala-Niño, F., Bullock, S. H., Castellanos, A. E., Cueva, A., Figueroa-Espinoza, B., Garatuza-Payán, J., Hinojo-Hinojo, C., Maya-Delgado, Y., Méndez-Barroso, L., Oechel, W., Paz-Pellat, F., Pérez-Ruíz, E. R., Rodríguez, J. C., Rojas-Robles, N., Sánchez-Mejía, Z. M., Uuh-Sonda, J., Vargas, R., Verduzco, V. S., Vivoni, E. R., and Watts, C.: Flujos anuales de carbono en ecosistemas terrestres de México, in Estado Actual del Conocimiento del Ciclo del Carbono y sus Interacciones en México: Síntesis a 2018., p. 686, Texcoco, Estado de México, México., 2018. 

Eshel, G., Dayalu, A., Wofsy, S. C., Munger, J. W., and Tziperman, E.: Listening to the Forest: An Artificial Neural Network-Based Model of Carbon Uptake at Harvard Forest, J. Geophys. Res.-Biogeo., 124, 461–478,, 2019. 

Fu, C., Wang, G., Bible, K., Goulden, M. L., Saleska, S. R., Scott, R. L. and Cardon, Z. G.: Hydraulic redistribution affects modeled carbon cycling via soil microbial activity and suppressed fire, Glob. Change Biol., 24, 3472–3485,, 2018. 

Gebremichael, M. and Barros, A. P.: Evaluation of MODIS Gross Primary Productivity (GPP) in tropical monsoon regions, Remote Sens. Environ., 100, 150–166,, 2006. 

Gitelson, A. A., Peng, Y., Arkebauer, T. J., and Suyker, A. E.: Productivity, absorbed photosynthetically active radiation, and light use efficiency in crops: Implications for remote sensing of crop primary production, J. Plant Physiol., 177, 100–109,, 2015. 

Glenn, E. P., Scott, R. L., Nguyen, U., and Nagler, P. L.: Wide-area ratios of evapotranspiration to precipitation in monsoon-dependent semiarid vegetation communities, J. Arid Environ., 117, 84–95,, 2015. 

Goldstein, A., Turner, W. R., Spawn, S. A., Anderson-Teixeira, K. J., Cook-Patton, S., Fargione, J., Gibbs, H. K., Griscom, B., Hewson, J. H., and Howard, J. F.: Protecting irrecoverable carbon in Earth's ecosystems, Nat. Clim. Change, 10, 1–9,, 2020. 

Goulden, M. L., Munger, J. W., Fan, S., Daube, B. C., and Wofsy, S. C.: Measurements of carbon sequestration by long-term eddy covariance: Methods and a critical evaluation of accuracy, Glob. Change Biol., 2, 169–182,, 1996. 

Grossiord, C., Buckley, T. N., Cernusak, L. A., Novick, K. A., Poulter, B., Siegwolf, R. T., Sperry, J. S., and McDowell, N. G.: Plant responses to rising vapor pressure deficit, New Phytol., 226, 1550–1566, 2020. 

Guerschman, J. P., Hill, M. J., Renzullo, L. J., Barrett, D. J., Marks, A. S., and Botha, E. J.: Estimating fractional cover of photosynthetic vegetation, non-photosynthetic vegetation and bare soil in the Australian tropical savanna region upscaling the EO-1 Hyperion and MODIS sensors, Remote Sens. Environ., 113, 928–945,, 2009. 

Guevara-Escobar, A., González-Sosa, E., Cervantes-Jiménez, M., Suzán-Azpiri, H., Queijeiro-Bolaños, M. E., Carrillo-Ángeles, I., and Cambrón-Sandoval, V. H.: Machine learning estimates of eddy covariance carbon flux in a scrub in the Mexican highland, Zenodo,, 2020 

Hall, P., Gill, N., Kurka, M., Phan, W., and Bartz, A.: Machine Learning Interpretability with H2O Driverless AI, edited by: Bartz, A., Inc., California, US, 2019. 

Hastings, S. J., Oechel, W. C., and Muhlia-Melo, A.: Diurnal, seasonal and annual variation in the net ecosystem CO2 exchange of a desert shrub community (Sarcocaulescent) in Baja California, Mexico, Glob. Change Biol., 11, 927–939,, 2005. 

Henrich, V., Krauss, G., Götze, C., and Sandow, C.: Entwicklung einer Datenbank für Fernerkundungsindizes, vol. 45, AK Fernerkundung, Bochum, 2012. 

Hill, M., Held, A., Leuning, R., Coops, N., Hughes, D., and Cleugh, H.: MODIS spectral signals at a flux tower site: Relationships with high-resolution data, and CO2 flux and light use efficiency measurements, Remote Sens. Environ., 103, 351–368,, 2006. 

Hill, M. J. and Guerschman, J. P.: The MODIS Global Vegetation Fractional Cover Product 2001–2018: Characteristics of Vegetation Fractional Cover in Grasslands and Savanna Woodlands, Remote Sens., 12, 406,, 2020. 

Hinojo-Hinojo, C., Castellanos, A. E., Huxman, T., Rodriguez, J. C., Vargas, R., Romo-León, J. R., and Biederman, J. A.: Native shrubland and managed buffelgrass savanna in drylands: Implications for ecosystem carbon and water fluxes, Agr. Forest Meteorol., 268, 269–278,, 2019. 

Järvi, L., Havu, M., Ward, H. C., Bellucco, V., McFadden, J. P., Toivonen, T., Heikinheimo, V., Kolari, P., Riikonen, A., and Grimmond, C. S. B.: Spatial modeling of local-scale biogenic and anthropogenic carbon dioxide emissions in Helsinki, J. Geophys. Res.-Atmos., 124, 8363–8384,, 2019. 

Joiner, J. and Yoshida, Y.: Satellite-based reflectances capture large fraction of variability in global gross primary production (GPP) at weekly time scales, Agr. Forest Meteorol., 291, 108092,, 2020. 

Jung, M., Schwalm, C., Migliavacca, M., Walther, S., Camps-Valls, G., Koirala, S., Anthoni, P., Besnard, S., Bodesheim, P., Carvalhais, N., Chevallier, F., Gans, F., Goll, D. S., Haverd, V., Köhler, P., Ichii, K., Jain, A. K., Liu, J., Lombardozzi, D., Nabel, J. E. M. S., Nelson, J. A., O'Sullivan, M., Pallandt, M., Papale, D., Peters, W., Pongratz, J., Rödenbeck, C., Sitch, S., Tramontana, G., Walker, A., Weber, U., and Reichstein, M.: Scaling carbon fluxes from eddy covariance sites to globe: synthesis and evaluation of the FLUXCOM approach, Biogeosciences, 17, 1343–1365,, 2020. 

Kang, M., Ichii, K., Kim, J., Indrawati, Y. M., Park, J., Moon, M., Lim, J.-H., and Chun, J.-H.: New Gap-Filling Strategies for Long-Period Flux Data Gaps Using a Data-Driven Approach, Atmosphere, 10, 568,, 2019. 

Lal, R., Griffin, M., Apt, J., Lave, L., and Morgan, M. G.: Managing Soil Carbon, Science, 304, 393–393,, 2004. 

LI-COR, B.: Eddypro Software instruction manual, Nebraska, US, version 7.0, 2019. 

Liu, Y., Liu, R., Pisek, J., and Chen, J. M.: Separating overstory and understory leaf area indices for global needleleaf and deciduous broadleaf forests by fusion of MODIS and MISR data, Biogeosciences, 14, 1093–1110,, 2017. 

Long, S. P., Marshall-Colon, A., and Zhu, X.-G.: Meeting the Global Food Demand of the Future by Engineering Crop Photosynthesis and Yield Potential, Cell, 161, 56–66,, 2015. 

Ma, X., Huete, A., Yu, Q., Restrepo-Coupe, N., Beringer, J., Hutley, L. B., Kanniah, K. D., Cleverly, J., and Eamus, D.: Parameterization of an ecosystem light-use-efficiency model for predicting savanna GPP using MODIS EVI, Remote Sens. Environ., 154, 253–271, 2014. 

Ma, X., Migliavacca, M., Wirth, C., Bohn, F. J., Huth, A., Richter, R., and Mahecha, M. D.: Monitoring Plant Functional Diversity Using the Reflectance and Echo from Space, Remote Sens., 12, 1248,, 2020. 

Marcolla, B., Rödenbeck, C., and Cescatti, A.: Patterns and controls of inter-annual variability in the terrestrial carbon budget, Biogeosciences, 14, 3815–3829,, 2017. 

Mastachi-Loza, C. A., González-Sosa, E., Becerril-Piña, R., and Braud, I.: Pérdidas por intercepción en mezquite (Prosopis laevigata) y huizache (Acacia farnesiana) de la región semiárida del centro de México, Tecnol. Cienc. Agua, 1, 103–120, 2010. 

Mauder, M. and Foken, T.: Documentation and instruction manual of the eddy-covariance software package TK3, Universität Bayreuth Abteilung Mikrometeorologie, Bayreuth, Germany, 2011. 

McClaran, M.: A century of vegetation change on the Santa Rita Experimental Range, in: Santa Rita Experimental Range: 100 years (1903 to 2003) to accomplishments and contributions, edited by: McClaran, M. P., Ffolliott, P. F., and Edminister, C. B., Tech. Coords., USDA Forest service proceedings RMRS-P-30, Tucson, AZ, US, 16–33, 2003. 

McCulley, R., Jobbagy, E., Pockman, W., and Jackson, R.: Nutrient uptake as a contributing explanation for deep rooting in arid and semi-arid ecosystems, Oecologia, 141, 620–628,, 2004. 

McElreath, R.: Statistical Rethinking: A Bayesian Course with Examples in R and STAN, CRC Press, Boca Raton, FL, US, 2020. 

Michailidis, M.: How Driverless AI Prevents Overfitting and Leakage, Open Source Lead, AI ML, available at:, (last access: 7 July 2020), 2018. 

Mitchell, R. and Frank, E.: Accelerating the XGBoost algorithm using GPU computing, PeerJ Comput. Sci., 3, e127,, 2017. 

Monteith, J.: Solar radiation and productivity in tropical ecosystems, J. Appl. Ecol., 9, 747–766, 1972. 

Murray-Tortarolo, G., Friedlingstein, P., Sitch, S., Jaramillo, V. J., Murguía-Flores, F., Anav, A., Liu, Y., Arneth, A., Arvanitis, A., Harper, A., Jain, A., Kato, E., Koven, C., Poulter, B., Stocker, B. D., Wiltshire, A., Zaehle, S., and Zeng, N.: The carbon cycle in Mexico: past, present and future of C stocks and fluxes, Biogeosciences, 13, 223–238,, 2016. 

Nash, J. E. and Sutcliffe, J. V.: River flow forecasting through conceptual models part I – A discussion of principles, J. Hydrol., 10, 282–290, 1970. 

Owen, N. A., Choncubhair, Ó. N., Males, J., del Real Laborde, J. I., Rubio-Cortés, R., Griffiths, H., and Lanigan, G.: Eddy covariance captures four-phase crassulacean acid metabolism (CAM) gas exchange signature in Agave, Plant Cell Environ., 39, 295–309,, 2016. 

Pasetto, D., Arenas-Castro, S., Bustamante, J., Casagrandi, R., Chrysoulakis, N., Cord, A. F., Dittrich, A., Domingo-Marimon, C., El Serafy, G., and Karnieli, A.: Integration of satellite remote sensing data in ecosystem modelling at local scales: Practices and trends, Methods Ecol. Evol., 9, 1810–1821,, 2018. 

Pastorello, G., Trotta, C., and Canfora, E.: The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data, Sci. Data, 7, 225 pp.,, 2020. 

Pearcy, R. W.: Heterogeneous light environments, in: Handbook of Functional Plant Ecology, edited by: Pugnaire, I. F. and Valladares, F., CRC Press, Boca Ratón, FL, US, 270–314, 2007. 

Piao, S., Liu, Q., Chen, A., Janssens, I. A., Fu, Y., Dai, J., Liu, L., Lian, X., Shen, M., and Zhu, X.: Plant phenology and global climate change: Current progresses and challenges, Glob. Change Biol., 25, 1922–1940,, 2019. 

R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, available at:, (last access: 15 October 2015), 2009. 

Reichstein, M., Falge, E., Baldocchi, D., Papale, D., Aubinet, M., Berbigier, P., Bernhofer, C., Buchmann, N., Gilmanov, T., and Granier, A.: On the separation of net ecosystem exchange into assimilation and ecosystem respiration: review and improved algorithm, Glob. Change Biol., 11, 1424–1439,, 2005. 

Richardson, A. D. and Hollinger, D. Y.: Statistical modeling of ecosystem respiration using eddy covariance data: maximum likelihood parameter estimation, and Monte Carlo simulation of model and parameter uncertainty, applied to three simple models, Agr. Forest Meteorol., 131, 191–208,, 2005. 

Richardson, A. D., Hollinger, D. Y., Shoemaker, J. K., Hughes, H., Savage, K., and Davidson, E. A.: Six years of ecosystem-atmosphere greenhouse gas fluxes measured in a sub-boreal forest, Sci. Data, 6, 1–15,, 2019. 

Roby, M. C., Scott, R. L., Barron-Gafford, G. A., Hamerlynck, E. P., and Moore, D. J. P.: Environmental and Vegetative Controls on Soil CO2 Efflux in Three Semiarid Ecosystems, Soil Syst., 3, 6,, 2019. 

Running, S. W. and Zhao, M.: Daily GPP and annual NPP (MOD17A2/A3) products NASA Earth Observing System MODIS land algorithm, Univ. of Mont, Missoula, Mont., US, 2015. 

Running, S. W., Nemani, R. R., Heinsch, F. A., Zhao, M., Reeves, M., and Hashimoto, H.: A continuous satellite-derived measure of global terrestrial primary production, Bioscience, 54, 547–560,[0547:ACSMOG]2.0.CO;2, 2004. 

Rzedowski, J.: Vegetación de México: México, Editor. Limusa, 1978. 

Scott, R. L. and Biederman, J. A.: Critical Zone Water Balance Over 13 Years in a Semiarid Savanna, Water Resour. Res., 55, 574–588,, 2019. 

Scott, R. L., Watts, C., Payan, J. G., Edwards, E., Goodrich, D. C., Williams, D., and James Shuttleworth, W.: The understory and overstory partitioning of energy and water fluxes in an open canopy, semiarid woodland, Agr. Forest Meteorol., 114, 127–139,, 2003. 

Scott, R. L., Edwards, E. A., Shuttleworth, W. J., Huxman, T. E., Watts, C. and Goodrich, D. C.: Interannual and seasonal variation in fluxes of water and carbon dioxide from a riparian woodland ecosystem, Agr. Forest Meteorol., 122, 65–84,, 2004. 

Scott, R. L., Biederman, J. A., Hamerlynck, E. P., and Barron-Gafford, G. A.: The carbon balance pivot point of southwestern US semiarid ecosystems: Insights from the 21st century drought, J. Geophys. Res.-Biogeo., 120, 2612–2624, 2015. 

Segerstrom, K.: Geology of the Bernal-Jalpan Area, Estado de Queretaro, Mexico, Geol. Surv. Bull., 1104, 19–82, 1961. 

Soper, F. M., McCalley, C. K., Sparks, K., and Sparks, J. P.: Soil carbon dioxide emissions from the Mojave desert: Isotopic evidence for a carbonate source, Geophys. Res. Lett., 44, 245–251,, 2017. 

Stevens, N. T., Steiner, S. H., and MacKay, R. J.: Assessing agreement between two measurement systems: An alternative to the limits of agreement approach, Stat. Methods Med. Res., 26, 2487–2504,, 2015. 

Tagesson, T., Ardö, J., Cappelaere, B., Kergoat, L., Abdi, A., Horion, S., and Fensholt, R.: Modelling spatial and temporal dynamics of gross primary production in the Sahel from earth-observation-based photosynthetic capacity and quantum efficiency, Biogeosciences, 14, 1333–1348,, 2017. 

Thornton, P., Thornton, M., Mayer, B., Wei, Y., Devarakonda, R., Vose, R., and Cook, R.: Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 3, Oak Ridge National Lab. (ORNL), Oak Ridge, TN, US, 2017. 

Tramontana, G., Jung, M., Schwalm, C. R., Ichii, K., Camps-Valls, G., Ráduly, B., Reichstein, M., Arain, M. A., Cescatti, A., Kiely, G., Merbold, L., Serrano-Ortiz, P., Sickert, S., Wolf, S., and Papale, D.: Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms, Biogeosciences, 13, 4291–4313,, 2016.” 

Truong, A., Walters, A., Goodsitt, J., Hines, K., Bruss, C. B., and Farivar, R.: Towards automated machine learning: Evaluation and comparison of AutoML approaches and tools, in: Proceedings of the 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), Portland, OR, US, 1471–1479, 2019. 

Verbist, K., Santibañez, F., Gabriels, D., and Soto, G.: Atlas de zonas áridas de América Latina y El Caribe, Donoso, M.C. (Coord.), UNESCO-PHI and CAZALAC, Montevideo, Uruguay, 2010. 

Wang, H., Li, X., Ma, M., and Geng, L.: Improving estimation of gross primary production in dryland ecosystems by a model-data fusion approach, Remote Sens., 11, 225,, 2019. 

Wang, X., Ma, M., Li, X., Song, Y., Tan, J., Huang, G., Zhang, Z., Zhao, T., Feng, J., and Ma, Z.: Validation of MODIS-GPP product at 10 flux sites in northern China, Int. J. Remote Sens., 34, 587–599,, 2013. 

Wang, X., Wang, H., Li, X., and Ran, Y.: Photosynthesis (NPP, NEP, Respiration), Obs. Meas. Ecohydrol. Process., 1–30,, 2018. 

Wheeler, C. W., Archer, S. R., Asner, G. P., and McMurtry, C. R.: Climatic/edaphic controls on soil carbon/nitrogen response to shrub encroachment in desert grassland, Ecol. Appl., 17, 1911–1928,, 2007. 

Wilcox, B. P., Birt, A., Fuhlendorf, S. D., and Archer, S. R.: Emerging frameworks for understanding and mitigating woody plant encroachment in grassy biomes, Curr. Opin. Env. Sust., 32, 46–52,, 2018. 

Wilson, J. B.: Cover plus: ways of measuring plant canopies and the terms used for them, J. Veg. Sci., 22, 197–206,, 2011. 

Wofsy, S., Goulden, M., Munger, J., Fan, S.-M., Bakwin, P., Daube, B., Bassow, S., and Bazzaz, F.: Net exchange of CO2 in a mid-latitude forest, Science, 260, 1314–1317,, 1993. 

Wu, C., Niu, Z., and Gao, S.: Gross primary production estimation from MODIS data with vegetation index and photosynthetically active radiation in maize, J. Geophys. Res.-Atmos., 115, D12127,, 2010. 

Wutzler, T., Lucas-Moffat, A., Migliavacca, M., Knauer, J., Sickel, K., Šigut, L., Menzer, O., and Reichstein, M.: Basic and extensible post-processing of eddy covariance flux data with REddyProc, Biogeosciences, 15, 5015–5030,, 2018. 

Xiao, H., McDonald-Madden, E., Sabbadin, R., Peyrard, N., Dee, L. E., and Chadès, I.: The value of understanding feedbacks from ecosystem functions to species for managing ecosystems, Nat. Commun., 10, 1–10,, 2019. 

Xu, B., Yang, Y., Li, P., Shen, H., and Fang, J.: Global patterns of ecosystem carbon flux in forests: A biometric data-based synthesis, Global Biogeochem. Cy., 28, 962–973,, 2014. 

Yao, J., Liu, H., Huang, J., Gao, Z., Wang, G., Li, D., Yu, H., and Chen, X.: Accelerated dryland expansion regulates future variability in dryland gross primary production, Nat. Commun., 11, 1–10,, 2020. 

Yepez, E. A., Williams, D. G., Scott, R. L., and Lin, G.: Partitioning overstory and understory evapotranspiration in a semiarid savanna woodland from the isotopic composition of water vapor, Agr. Forest Meteorol., 119, 53–68,, 2003. 

Yona, L., Cashore, B., Jackson, R. B., Ometto, J., and Bradford, M. A.: Refining national greenhouse gas inventories, Ambio, 1–6,, 2020. 

Zeng, X., Zeng, X., and Barlage, M.: Growing temperate shrubs over arid and semiarid regions in the Community Land Model–Dynamic Global Vegetation Model, Global Biogeochem. Cy., 22, GB3003,, 2008. 

Zhang, A., Jia, G., Epstein, H. E., and Xia, J.: ENSO elicits opposing responses of semi-arid vegetation between Hemispheres, Sci. Rep., 7, 42281,, 2017. 

Zhang, L., Xiao, J., Zheng, Y., Li, S., and Zhou, Y.: Increased carbon uptake and water use efficiency in global semi-arid ecosystems, Environ. Res. Lett., 15, 034022,, 2020. 

Zhu, X.-G., Long, S. P., and Ort, D. R.: What is the maximum efficiency with which photosynthesis can convert solar energy into biomass?, Curr. Opin. Biotech., 19, 153–159,, 2008. 

Short summary
All vegetation types can sequester carbon dioxide. We compared ground measurements (eddy covariance) with remotely sensed data (Moderate Resolution Imaging Spectroradiometer, MODIS) and machine learning ensembles of primary production in a semiarid scrub in Mexico. The annual carbon sink for this vegetation type was −283.5 g C m−2 y−1; MODIS underestimated the primary productivity, and the machine learning modeling was better locally.
Final-revised paper