Eddy covariance carbon flux in a scrub in the Mexican highland

Vegetation fixes C in its biomass through photosynthesis or might release it into the atmosphere through respiration. 10 Measurements of these fluxes would help us understand ecosystem functioning. The eddy covariance technique (EC) is widely used to measure the net ecosystem exchange of C (NEE) which is the balance between gross primary production (GPP) and ecosystem respiration (Reco). Orbital satellites such as MODIS can also provide estimates of GPP. In this study, we measured NEE with the EC in a scrub at Bernal in Mexico, and then partitioned into gross primary production (GPP-EC) and Reco using the recent R package Reddyproc. Measurements of GPP-EC were related to the estimates from the MODIS satellite provided 15 in product MOD17A2H, which contains data of the gross primary productivity (GPP-MODIS). The Bernal site was a carbon sink despite it was an overgrazed site, the average NEE during fifteen months of 2017 and 2018 was -0.78 g C m d and the flux was negative in all measured months. The GPP-MODIS underestimated the ground data when representing the relation with a Theil-Sen regression: GPP-EC = 1.866 + 1.861 GPP-MODIS; an ordinary less squares regression had similar coefficients and the R was 0.6. Although cacti (CAM), legume shrubs (C3) and herbs (C3) had a similar vegetation index, the 20 nighttime flux was characterized by positive NEE suggesting that the photosynthetic dark-cycle flux of cacti was lower than Reco. The discrepancy among the GPP flux estimates stresses the need to understand the limitations of EC and remote sensors, while incorporating complementary monitoring and modelling schemes of nighttime Reco, particularly in the presence of species with different photosynthetic cycles.

Abstract. Arid and semiarid ecosystems contain relatively high species diversity and are subject to intense use, in particular extensive cattle grazing, which has favored the expansion and encroachment of perennial thorny shrubs into the grasslands, thus decreasing the value of the rangeland. However, these environments have been shown to positively impact global carbon dynamics. Machine learning and remote sensing have enhanced our knowledge about carbon dynamics, but they need to be further developed and adapted to particular analysis. We measured the net ecosystem exchange (NEE) of C with the eddy covariance (EC) method and estimated gross primary production (GPP) in a thorny scrub at Bernal in Mexico. We tested the agreement between EC estimates and remotely sensed GPP estimates from the Moderate Resolution Imaging Spectroradiometer (MODIS), and also with two alternative modeling methods: ordinary-leastsquares (OLS) regression and ensembles of machine learning algorithms (EMLs). The variables used as predictors were MODIS spectral bands, vegetation indices and products, and gridded environmental variables. The Bernal site was a carbon sink even though it was overgrazed, the average NEE during 15 months of 2017 and 2018 was −0.78 g C m −2 d −1 , and the flux was negative or neutral during the measured months. The probability of agreement (θ s) represented the agreement between observed and estimated values of GPP across the range of measurement. According to the mean value of θ s, agreement was higher for the EML (0.6) followed by OLS (0.5) and then MODIS (0.24). This graphic metric was more informative than r 2 (0.98, 0.67, 0.58, re-spectively) to evaluate the model performance. This was particularly true for MODIS because the maximum θs of 4.3 was for measurements of 0.8 g C m −2 d −1 and then decreased steadily below 1 θ s for measurements above 6.5 g C m −2 d −1 for this scrub vegetation. In the case of EML and OLS, the θ s was stable across the range of measurement. We used an EML for the Ameriflux site US-SRM, which is similar in vegetation and climate, to predict GPP at Bernal, but θ s was low (0.16), indicating the local specificity of this model. Although cacti were an important component of the vegetation, the nighttime flux was characterized by positive NEE, suggesting that the photosynthetic dark-cycle flux of cacti was lower than ecosystem respiration. The discrepancy between MODIS and EC GPP estimates stresses the need to understand the limitations of both methods. woody areas increased, rural-urban migration being an important driver of that transition (Bonilla-Moheno and Aide, 2020). The transition from grasslands to shrublands or scrub is linked to the extremely heavy grazing by domestic livestock (Wilcox et al., 2018).
Vegetation in the arid and semiarid ecosystems are mostly classified as rangelands. These are one of the most widely distributed landscapes on earth, incorporating a wide range of communities including grasslands, shrublands and savannah. Scrub is a xeric category of shrublands characterized by plants with small leaves; it is very thorny, and its biomass is distributed mainly to roots and leaves rather than the stems (Rzedowski, 1978;Wheeler et al., 2007;Zhang et al., 2017). Studies need to consider multiple sources of evidence and the driving processes of land use change in Mexico to aid in policy formulation and to identify regions that may provide important ecosystem services (Murray-Tortarolo et al., 2016).
On the other hand, photosynthesis contributes to carbon sequestration by moving carbon stock from the atmosphere to other pools or sinks, such as aboveground biomass, roots and soil organic matter (Booker et al., 2013). The role of vegetation in carbon sequestration on arid and semiarid ecosystems is less evident because the growth rate is low, and biomass partition above and below ground is different from that of temperate and tropical forests. Competitive interactions of arid plants at the community level are strongly influenced by rooting architecture and phenological growth (Zeng et al., 2008). Many plants in semiarid systems support a deep and wide root system as a drought adaptation but also for nutrient uptake (McCulley et al., 2004).
Recent time trends indicate that semiarid ecosystems regulate the terrestrial carbon sink and dominate its interannual variability (Piao et al., 2019;Scott et al., 2015;Zhang et al., 2020). This variability mainly results from the imbalance between two larger biogenic fluxes that constitute the net ecosystem exchange (NEE): the photosynthetic uptake of CO 2 (gross primary production, GPP) and the respiratory release of CO 2 (total ecosystem respiration, R eco ). Radiation and water availability are important environmental drivers of NEE and thus GPP and R eco (Marcolla et al., 2017). However, other carbon fluxes contribute to the imbalance, such as fire and anthropogenic CO 2 emissions (Järvi et al., 2019;Piao et al., 2019). Another atmospheric CO 2 flux is that from soil inorganic carbon in arid and semiarid ecosystems (Soper et al., 2017). Calcium carbonates form in the soil at a relatively low rate of 5 to 150 kg C ha −1 yr −1 ; this carbon can return to the atmosphere, but it forms a carbon sink when carbonates are leached into the groundwater (Lal et al., 2004).
The methods used to explore the ecosystems and the understanding of their functioning are changing rapidly, particularly for arid and semiarid ecosystems (Goldstein et al., 2020;Ma et al., 2020;Xiao et al., 2019;Yao et al., 2020). There are many instruments and techniques for estimating carbon and water fluxes, but two stand out in the literature: eddy covariance (EC) and remote-sensing techniques. EC is a micrometeorological method that measures the ecosystem community NEE at short time intervals, representing a land surface smaller than 1 km 2 . The orbital remote sensors measure radiation emitted or reflected by the earth's surface using different algorithms, representing different traits of vegetation activity from large-scale areas. Both techniques are complementary, but an agreement between their estimates is important for regional, countrywide and global spatiotemporal monitoring of greenhouse gas inventories (Yona et al., 2020); ecological modeling; quantifying the interaction among the vegetation component and the hydrological component; energy and nutrient cycles; and others applications (Pasetto et al., 2018). Particularly, products from the Moderate Resolution Imaging Spectroradiometer (MODIS) have ample availability and have been extensively used to study land surface since 2000.
Gross primary production can be represented by a wide range of models, ranging in complexity from simple regression based on climatic forcing variables to complex models that simulate biophysical and ecophysiological processes (Anav et al., 2015). The MODIS MOD17 product uses a photosynthetic radiation conversion efficiency model (Running and Zhao, 2015), but a better relationship is reported with EC-derived GPP when the model uses vegetation indices calculated from the same MODIS platform (Ma et al., 2014;Wu et al., 2010). Although they are black box models in principle, recent modeling efforts report good agreement of GPP estimates obtained from machine learning (ML) algorithms or ensembles of models (Eshel et al., 2019;Joiner and Yoshida, 2020;Jung et al., 2020). Different machine learning algorithms are powerful because they can identify trends and patterns in big datasets and solve regression or classification problems.
To generate models of GPP, we measured EC fluxes during 2017-2018 in a thorny scrub with semiarid climate in the highlands of Mexico (Bernal site). Competing models were data-driven machine learning regression ensembles (EMLs) and ordinary-least-squares (OLS) regression, both using Daymet (Thornton et al., 2017) and MODIS datasets as explanatory variables. The MODIS GPP product was used as a baseline comparison. The second step was to use an EML model based on local data (Daymet and MODIS) from a site with EC instrumentation and similar vegetation to that of the Bernal's site and then use that model to predict GPP at the Bernal site. The site we used was Santa Rita from the Ameriflux network. While Santa Rita is in the Sonoran Desert, and Bernal is on the southern border of the Chihuahuan Desert, both have a similar climate and vegetation (Fig. 1). A good agreement between Bernal EC data and the predictions from the Santa Rita model would support the use of machine learning algorithms as a scale-up mechanism. This would be useful to the understanding of semiarid ecosystems and also improve current earth system models (Piao et al., 2019).
We measured the carbon flux for the vegetation in a semiarid site located at Bernal, Mexico. The phenology patterns in the region suggested that this site could be a carbon sink during the wet season or a carbon source during the dry season since some predominant species reproduce during winter and spring, particularly cacti, Acacia and Prosopis (mesquite). Furthermore, the Bernal site had a history of disturbance by overgrazing; this could decrease the GPP and even result in a positive carbon balance, thus being a carbon source. If the shrub vegetation in this site predominantly absorbed carbon during the measuring period, then this evidence would contribute to reinforce the reported importance of semiarid environments in the global carbon balance (Zhang et al., 2020). However, land ownership patterns and the balance between agricultural investments and land conservation will determine the absolute amount of carbon sequestered. Hopefully, the results of the present investigation would promote the idea that carbon sequestration is possible in scrubland and be incorporated in informed decisions and new policy.

Site description
The study site (Bernal) is located at 20.717 • N, 99.941 • W and 2050 m a.s.l. in the municipality of Ezequiel Montes in Querétaro, where real-estate development, feedlot beef production, cheese and wine production associated with tourism, and automotive-industry development are very attractive options for landowners in the region. Bernal is located in a shallow valley oriented from north to south, approximately 15 to 20 km wide and opening to the south to the Río Lerma basin and then draining into the Pacific Ocean. The northern limit of the valley is surrounded by hill country and its characteristic dacitic dome 433 m in height (Aguirre-Díaz et al., 2013). Moisture-laden winds blow westward from the Gulf of Mexico, but the Sierra Gorda, located 60 km east of Bernal, casts a rain shadow over the area (Segerstrom, 1961).
The Bernal site is private property, with grazing dairy cattle receiving additional concentrated feedstuffs under stallfeeding. Grazing was continuous, and water for livestock was only available in the feeding and milking area; there were no pasture divisions, and the perimeter fence was made of stone. These characteristics of the animal production model and the state of vegetation are representative of land management practices and the scrub vegetation of the region. However, the Bernal site suffered important changes in land use during 2019, and the scrub was suddenly cleared and converted into rainfed cropping.
The climate is arid with summer rains (BSk), with a mean annual rainfall of 476 mm and a mean annual temperature of 17.1 • C (CICESE, 2015). Prevailing wind is from the east and northeast. The terrain is mostly flat; most grades are below 2 %. The soil has a clay loam texture, the class is a Vertisol with abundant subrounded basaltic stones without rocky outcrops, and the depth is greater than 0.6 m. Vegetation was less than 3 m in height with an overgrazed herbaceous stratum. Vegetation corresponds to secondary scrub with the dominant genera Acacia, Prosopis and different Cacti (Fig. 3). This site was classified as grassland by the MODIS land cover product.
For the scrub and tree species, the importance vegetation index (IVI) was determined following Curtis and McIntosh (1950) to assess the vegetation homogeneity. The IVI is the sum of relative dominance, relative density and relative frequency of the species present. Vegetation sample points were chosen according to the flux footprint of the eddy covariance tower (Fig. 2). For each plant in the vegetation sample points, two stem diameters, the number of individuals (abundance) and identity of each species were measured as well as the coverage, which is the horizontal projection of the aerial parts of the individuals on the ground, expressed as a percentage of the total area (Wilson, 2011).

Eddy covariance measurements
The micrometeorological EC technique measures at the plant community level NEE in a nondestructive way and continuously over time (Baldocchi, 2014). The negative CO 2 fluxes corresponded to NEE, which is equivalent to NEP (net ecosystem production) but with opposite sign. The EC has advantages compared to other techniques that need to scale up measurements from the leaf, plant or soil levels up to ecosystems, especially when the vegetation is heterogeneous (Yepez et al., 2003). However, EC is an expensive technique, and data analysis and processing are complicated; also specific assumptions must be met regarding the terrain, vegetation and micrometeorological conditions, among other aspects (Richardson et al., 2019).
The fluxes were measured with the EC technique at a height of 6 m with the following instruments: a Biomet system (LI-COR Biosciences, USA) to measure H 2 O and CO 2 fluxes using an IRGASON-EC-150 open-circuit analyzer, a CSAT3 sonic anemometer and a KH20 krypton hygrometer; these were connected to a CR3000 data logger (Campbell Scientific Inc., Logan, UT, USA). The relative humidity and air temperature were measured with an HMP155A probe (Vaisala Corporation, Helsinki, Finland), net radiation was measured with an NR-Lite2 radiometer (Kipp and Zonen BV Delft, the Netherlands), and the photosynthetic active radiation (PAR) was measured with a quantum sensor (SKP215; Skye Instruments, Llandrindod Wells, UK). Measurements of the soil heat flux was implemented with four self-calibrating HFP01SC plates at 80 mm depth and in four representative positions of the landscape (Hukseflux Thermal Sensors BV, Delft, The Netherlands). Three time domain reflectometry (TDR) probes (CS616) measured volumetric water content in the ground installed vertically, and two sets of TCAV (averaging soil thermocouple) probes measured the temperature at 60 and 40 mm depths and above the HFP01SC plates (Campbell Scientific Inc., Logan, UT, USA). The TE525 (Texas Electronics, Dallas, TX, USA) tipping-bucket rain gauge was installed at 1.2 m height and 3 m away from the tower. All these meteorological variables were measured every 5 s, and average values were stored every 30 min; rainfall was accumulated for the same time interval. Sensible (H ) and latent (λE) heat fluxes were calculated by the EddyPro package (LI-COR Biosciences, USA).

Flux data processing
All EC data were collected at 10 MHz in the data logger and reported as micromoles of CO 2 per square meter per second and processed with the EddyPro package to convert values into average fluxes of 30 min intervals. Only quality-flagged records were used to account for the CO 2 flux (qc_co2_flux = 0) according to the Mauder and Foken (2011) policy in the EddyPro program (LI-COR, 2019).
However, this quality-checking is not sufficient, especially in the case of CO 2 ; therefore, data were postprocessed using the REddyProc package of R (R Development Core Team, 2009) to estimate the friction speed thresholds (u * ), gap-fill data, and partition the NEE flux into its GPP and R eco components (Wutzler et al., 2018). The filled-in estimates of NEE (NEE_uStar_f), GPP (GPP_uStar_f) and R eco (Reco_uStar) were used when the u * was lower than a u * annual threshold, above which nighttime fluxes are considered valid. The annual u * threshold was 0.193 and 0.194 for 2017 and 2018. The difference between these thresholds and the 95 % u * threshold was small (0.033 m s −1 ). Appendix A presents the threshold means and confidence intervals calculated in the REddyProc package. Only 1050 half-hour records (9.3 %)  had a u * below the annual mean u * threshold. The data with a flag equal to 0 were used for the variable NEE_uStar_fqc, as defined by REddyProc. Carbon dioxide flux data were timeintegrated and converted to grams of C per square meter per day using the molar ratio of C. We only reported the continuous measurements of the exchange of CO 2 for the period of April 2017 (DOY 89) to August 2018 (DOY 234) using the EC technique. Due to equipment malfunction and incomplete datasets, some periods of time were not considered. The measurement campaign presented here was not biased by wet winters since both years were characterized by a less-thanweak Niño-Niña.

Remotely sensed data
Data were requested to the National Aeronautics and Space Administration (NASA) via the Land Processes Distributed Active Archive Center (DAAC) using the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) to obtain spatial and temporal AppEEARS also unpacks and interprets the quality layers. Appendix B presents the details of each spectral band of MODIS. Data with less-than-good quality flags were deleted. Missing data were filled with splines, and a database with a 1 d time step was generated. This would smooth linear temporal-phenological evolution between any two successive remotely sensed data points (Eshel et al., 2019). Daily accumulated rainfall was requested using the Giovanni platform from the Goddard Space Flight Center (GSFC) DAAC; two products were used: the 3IMERGDF.006 from the Global Precipitation Measurement mission (GPM) and the 3B42.007 from the Tropical Rainfall Measuring mission (TRMM). Gridded weather parameters from the ORNL DAAC Daymet dataset were precipitation, shortwave radiation, maximum and minimum air temperature, and water vapor pressure. Daymet is a data product derived from a collection of algorithms interpolating and extrapolating daily meteorological observations (Thornton et al., 2017). Following Henrich et al. (2012) and Hill et al. (2006), daily reflectance bands of MODIS were used to compute several vegetation indices: red-green ratio index (RGRI), simple ratio (SimpleR), moisture stress (MoistS), disease stress index (DSI), green atmospherically resistant vegetation index (GARI), normalized difference vegetation index (NDVI), normalized difference water index (NDVI_w) and enhanced vegetation index (EVI); the corresponding equations are presented in Appendix B.

MODIS algorithm for GPP
Estimates of GPP are derived from data recorded by the MODIS sensor aboard the Terra and Aqua satellites. The efficiency (ε; g C MJ −1 ) with which vegetation produces dry matter is defined as the amount of solar energy stored by photosynthesis in a given period divided by the solar constant integrated over the same period (Monteith, 1972). Not all incident solar radiation is available for biomass conversion; only about 48 % is photosynthetically active (PAR; MJ m −2 ), and not all PAR is absorbed (Zhu et al., 2008). Thus, carbon exchange is mainly controlled by the amount of PAR absorbed by green vegetation (APAR) and modified by ε (Gitelson et al., 2015). The fraction of absorbed PAR (FPAR) is equal to APAR/PAR but can be represented by the NDVI spectral vegetation index produced by MODIS (Running 2004). The efficiency term ε is described as the product of different factors as a whole or as a part of the system (Monteith, 1972) but mostly those related to the efficiencies with which the vegetation intercepts the radiation and the efficiency of converting the intercepted radiation into biomass (Long et al., 2015). The MODIS algorithm that estimates GPP in the MOD17 product is (Running et al., 2004) The ε term in the MODIS algorithm is represented by a maximum radiation conversion efficiency (ε max ; kg C MJ −1 ) that is attenuated by suboptimal climatic conditions, mainly minimum air temperature (T min ) and vapor pressure deficit (VPD). Two parameters for each, T min and VPD, are used to define attenuation scalars for general biome types. These parameters form linear functions between the scalars (Running and Zhao, 2015; Wang et al., 2013): daily minimum temperature at which ε = ε max and at which ε = 0 and the daylight average VPD at which ε = ε max and at which ε = 0. GPP is truncated on days when air temperature is below 0 • C, or VPD is higher than 2000 Pa (Running and Zhao, 2015). Stress and nutrient constraints on vegetation growth are quantified by the limiting relation of leaf area in NDVI × PAR rather than constrained through ε (Running et al., 2004). However, the MODIS algorithm does not consider stomatal sensitivity to leaf-to-air vapor pressure across and within species and particularly between isohydric and anisohydric plant species (Grossiord et al., 2020).
The MOD17 user's guide presents a biome-property lookup table (BPLUT) with the parameters for each biome type and assumes that they do not vary with space or time (Running and Zhao, 2015). This aspect is important because ε max has the strongest impact on the predicted GPP of the MOD17 algorithm (Wang et al., 2013). The assumption is also important because the overstory and understory could be decoupled from each other and would intercept different amounts of light and have different water sources during the growing season . Light quantity and quality as diffuse light or sunflecks determine differences among understory species in their temporal response to gaps, involving acclimation and avoidance of photoinhibition (Pearcy, 2007). Another shortcoming is that few land cover classifications are incorporated into the MOD17 algorithm.

Santa Rita site dataset
The Santa Rita Experimental Range (SRER) is located in the western range of the Santa Rita Mountains in Arizona, USA (lat 31.8214, long −110.8661; 1120 m a.s.l.). Climate is BSk, with mean annual precipitation of 380 mm, temperature of 17.9 • C and Ustic Torrifluvent soils. Established in 1903, SRER has a long history of experimental manipulations to enhance grazing potential for cattle (Glenn et al., 2015). Two Ameriflux sites are located in the SRER: Santa Rita Grassland (US-SRG) and Santa Rita Mezquite (US-SRM). We used EC data for the years 2013-2019 from US-SRM, which is a mesquite grass savanna (35 % mesquite canopy cover and mean canopy height above 2 m, 22 % grasses, and 43 % bare soil), although MODIS describes this site as open shrublands (Glenn et al., 2015;Scott et al., 2004). The US-SRM site is dominated by velvet mesquite (Prosopis velutina) and has a diversity of shrubs, cacti, succulents and bunch grasses (McClaran, 2003). This site was chosen because the vegetation and climate are similar to the Bernal site, and it was the closest EC instrumentation with data availability (Fig. 4).

Modeling
Gross primary production estimated by EC at the Bernal site was modeled using OLS and EML. The explanatory variables were the remotely sensed data, the weather parameters and the vegetation indices (Appendix B). The OLS is a particular case of the generalized linear model, where the variation of a single response variable is explained by several independent variables. The OLS was fitted with the stepwise procedure; the final model included variables with a variance inflation factor (VIF) lower than 10 and a significance level of 0.05. Predictions of the EC GPP were obtained with the final model. Analysis and diagnostics were made with Minitab v 17 (Minitab LLC). Analysis of the OLS model can be used to determine agreement between methods of measurement, but it is sensitive to the range of values in the dataset, and its metrics -r, r 2 and root mean square -do not provide information on the type of association (Bland and Altman, 2010). In this paper, we used another metric, the probability of agreement, to determine bias and agreement between model estimates and observed data (see below).
While OLS is a well-known algorithm, machine learning algorithms are emerging techniques that focus on the data structure and match that data onto models. The EML approach considers the different realizations of machine learning models and constructs an ensemble of models coming with the advantage of being more accurate than the predictions from the individual ensemble members. However, EML is computationally intensive, requiring nodes with purpose-built hardware such as multiple processors or reduced-precision accelerators. The nodes could be aggregated in computing clusters, which require storage, power and cooling redundancy.
A stack of EMLs was obtained with the H 2 O package of R (Hall et al., 2019). This package provides several algorithms that can contribute to a stack of ensembles using the AutoML (automatic machine learning) function: feedforward artificial neural network (DL), general linear models (GLMs), gradient-boosting machine (GBM), extreme gradient boosting (XGBoost), default distributed random forest (DRF) and extremely randomized trees (XRT). AutoML trains two stacked-ensemble models: one ensemble contains all the models, and the second ensemble contains just the best-performing model from each algorithm class or family; both ensembles should produce better models than any individual model from the AutoML run. The term AutoML implies data preprocessing, normalization, feature engineering, model selection, hyperparameter optimization and prediction analysis, including procedures to identify and deal with nonindependent and identically distributed observations and overfitting (Michailidis, 2018;Truong et al., 2019).
Machine learning has two elements for supervised learning: training loss and regularization. The task of training tries to find the best parameters for the model while minimizing the training loss function, the mean squared error for example. The regularization term controls the complexity of the model, helping to reduce overfitting. Overfitting becomes apparent when the model performs accurately during the training, but the accuracy is low during the testing. A good model needs extensive parameter tuning by running the algorithm many times to explore the effect on regularization and crossvalidation accuracy (Mitchell and Frank, 2017). In this investigation, the function of training loss was the deviance, which is a generalization of the residual sum of squares driven by the likelihood. Deviance is a measure of model fit, and lower or negative values indicate better model performance (McElreath, 2020).
A stack of EML solutions was based on a random sample of the dataset for training the model. For the Bernal site, 85 % was used for training; for US-SRM 80 % was used. The AutoML function was run 20 times; each run added approximately 48 models to the leader board and ranked the best-performing models by their deviance. Each run splits the training data 10 times for k-fold cross-validation. The seed for an EML that is dependent on randomization was changed in every run. The stopping rule for each run was set at 100 s, and the maximum memory allocation pool for H 2 O was 100 GB in a single workstation with dual Xeon 2680 v4 processors and 128 GB of RAM. The H 2 O package was installed in a rocker/geospatial docker container (image available at https://hub.docker.com/r/rocker/geospatial, last access: 9 January 2021), which is a portable, scalable and reproducible environment (Boettiger and Eddelbuettel, 2017).
Two sets of predictions for the GPP at the Bernal site were obtained from the stacked ensemble. The first set of predictions was based on the 15 % of the Bernal site data reserved for testing. The second set of predictions was obtained by refeeding the US-SRM site model with the Bernal site explanatory variables. The first set of predictions would show the importance of local data to predict EC-based GPP. The second set of predictions would represent the suitability of off-site data to predict EC-based GPP. If the second scenario has good agreement, then an EML model could be used to represent wider areas of the ecosystem.

Variable importance
The variable importance within individual models was used to answer the question of which environmental variables were important for GPP prediction. For the "all-models ensemble" and "best-of-family ensemble" generated by Au-toML, it is not possible to examine the variable importance or the contribution of the individual models to the stack (H 2 O.ai 2017). Therefore, a weight (w i ) was calculated using Eq. (4), which is adequate for other information criteria besides the Akaike weights; this weight is an estimate of the conditional probability that the model will make the best predictions on new data, considering the set of models (McElreath, 2020). Then, the importance of each variable (%) was multiplied by the model's weight (w i ) and then added by variable to build the variable importance index. This index would measure how often a given variable was used on the leader board.

Model agreement
Calibration and agreement between methods of measurement are different procedures. Calibration compares known quantities of the true value or measurements made by a highly accurate method (a gold standard) against the measurements of a new or a contending method. When two methods of measurement are compared, neither provides an unequivocally correct measurement because both have a measurement error, and the true value remains unknown (Bland and Altman, 2010). Stevens et al. (2015) proposes the probability of agreement (θs) as a plot metric to represent the agreement between two measurement systems across a range of plausible values. The θs method addresses some of the challenges of the accepted "limits agreement method" presented by Bland and Altman (2010). Besides the agreement plot, agreement is based on maximum likelihood bias parameters: α and β, quantifying the fixed bias and the proportional bias. If α = 0, β = 1 and σ 1 = σ 2, then the two measurements are identical; σj is the measurement variation. The probability-of-agreement analysis was performed using the ProbAgreeAnalysis (https://uwaterloo.ca/ business-and-industrial-statistics-research-group/software, last access: 28 November 2020) in MATLAB 9.4 (Math-Works, Inc.). An arbitrary 1 g C m −2 d −1 was considered to be a tolerable magnitude to conclude that agreement is sufficient enough to use either estimated GPP interchangeably. The reference measurement was the GPP obtained from EC data at the Bernal site and tested against the MODIS MOD17 model, the OLS predictions or each of the two sets of EML predictions. If the probability-of-agreement plot suggested disagreement between two measurement systems, then the predictions can be adjusted using Adjusted predictor = (predictor − α) /β 3 Results

Eddy covariance fluxes at Bernal
Dominant flow at the Bernal site was from the northeast (Fig. 2). Energy balance closure for this site had a slope of 0.72 and r 2 = 0.92 (Fig. 5). Homogeneous sites of the Fluxnet network obtain closure percentages higher than 72 %, and for the Bernal site, the vegetation heterogeneity was important (see below). Average H was always negative during nighttime, but during some months of the dry and rainy season, λE was positive, particularly after dawn (Fig. 6), so as to allow nighttime evaporation from the soil or vegetation. However, during some months of the dry season, rainfall was low (Fig. 7); then the positive λE suggested that cacti could have an active gas exchange at that time.
Carbon dioxide absorptions had a diurnal behavior beginning at dawn and ending before sunset (Fig. 8). Nighttime flux was positive, indicating respiration, notwithstanding the presence of cacti. Although summer rains are characteristic of the climate at the Bernal site (Fig. 7), a negative NEE flux occurred in all measured months. The lowest CO 2 flux was recorded in January and February 2017 and in May 2018 (Table 1). This behavior resulted from the phenology of the vegetation since most species lost their leaves in the dry season and was also due to the effect of low temperature. Within the rainy season, the flux of CO 2 increased compared to the months of January to June. The correlation between NEE and precipitation was −0.45. When the sum of the precipitation of the current month and that of the previous month was considered, the correlation with NEE was −0.7, suggesting that continuous availability of soil moisture is important for the absorption of CO 2 in this environment.
The scrub at Bernal was heterogeneous in botanical composition. A total of 24 species of cacti and shrub were identified; on average, each sampling plot had 10.3 species. The IVI was similar between all cacti (0.36 ± 0.04), shrub legumes (0.38±0.04) and other shrubs (0.23±0.06) sampled. Most sampling plots were in areas of high flux frequency (Fig. 2). Cylindropuntia imbricata had the largest IVI, followed by Acacia farnesiana, Acacia schaffneri and Prosopis laevigata. The IVI of the herbaceous stratum represented by grasses was not characterized due to the state of overgrazing and the absence of reproductive structures in plants, which made measurement of their abundances, frequencies and dominance difficult. The grass genera present were Melinis, Chloris, Cynodon and Cenchrus, all corresponding to in- vasive C 4 tropical grasses. Scrub species of higher IVI had a similar LAI (1.2), although the magnitude of the LAI of P. laevigata stood out (Table 2).

Machine learning ensembles as predictors of eddy covariance GPP
In this section we describe the modeling with EML using local remotely sensed data from the Bernal site to predict GPP at the same site and then the agreement between EML GPP predictions and EC-derived GPP. The AutoML function generated 1031 models with an average deviance of 1.35, while the deviance of the leader model was 0.63 in the training dataset (Table 3). A total of 11 models of type GBM and five XGBoost models were in the top 30 models, along with the nine best-of-family ensembles and five all-models ensembles. The weighted variable importance on the leader board was higher for the LAI from the MOD15 and MCD15 products (17 % and 14 %). The PsnNet, EVI (MOD17), FPAR (MCD15), green atmospherically resistant vegetation index (GARI) and MODIS reflectance band 13 had an importance higher than 3 % (5.9 %, 5.4 %, 4.2 %, 3.6 % and 3.0 %). LAI (MCD13 and MOD13), PsnNet and the FPAR (MOD15) were the more important variables (20 %, 17 %, 13 % and 10 %) in the top nonstacked model, a GBM model that was ranked in fourth place.   Predictions of GPP in the testing dataset showed dispersion in the lower range of the scale of measurement, and the correlation was 0.94 (Fig. 9a). The final prediction of GPP for the whole dataset had a probability of agreement (θs) of 0.58 ± 0.01 (parameter estimate and standard error), α = −0.0616 ± 0.11 and β = 1.0133 ± 0.02, suggesting a good fit with low fixed and proportional bias (Fig. 9b). The probability of agreement decreased slightly at the lower and upper range of the scale of measurement (Fig. 9c), indicating that the EML model would predict GPP without increasing the bias, particularly in the range from 0 to 4 g C m −2 d −1 . However, the value of θ s should be higher than 0.95 so as to consider EC measurements and EML to be interchangeable. The correlation of 0.99 (r 2 = 0.98) for the data in Fig. 9b could be misleading as it would suggest a very good fit.
Using only the five of the more important variables named above to generate an EML resulted in an XBoost leader model with 2.73 deviance and a total of 1094 models. The top 30 models were 16 XBoost and GBM models, and the best-of-family ensembles started to show up in 12th place. Although the number of runs was the same (20), the AutoML function increased the number of produced models, but the smaller set of explanatory variables constrained the ability to identify features contributing to better models. Using another set of five randomly selected explanatory variables (one vegetation index and four MODIS bands) resulted in a leader model with 2.52 deviance out of 1024 models, but this time the leader was the best-of-the-family ensemble. Using only a few variables was considered to increase the deviance compared to the average deviance of 1.35 obtained during the training phase and using all available variables.
An important question for modeling upscaling is the capacity to extrapolate results temporally and spatially; here we explored the latter, posing the following question: would predictions of GPP from EML for an EC site agree with EC observations from another site with "similar" environmental conditions? First, an EML solution was found, training 80 % of the Santa Rita dataset and obtaining a bestof-family ensemble with 0.23 deviance out of 634 trained models (hereafter this model is referred to as the Santa Rita model). Then, the environmental and remotely sensed data from the Bernal site were fed into the Santa Rita model; this would be an external validation dataset. However, agreement was not good; the mean value of θ s was 0.15 ± 0.01, with α = −1.0822 ± 0.09 and β = 0.58127 ± 0.02. The value of θ s was not constant across the range of measurement and decreased rapidly after 2 g C m −2 d −1 (Fig. 10b). Because the bias was important, predictions were adjusted using Eq. (6), showing some improvement with r = 0.78 (Fig. 10a). Comparing Figs. 5b and 6a, it is evident that an EML model extrapolation to other conditions is noisier, i.e., Santa Rita model trying to represent the ecosystem function at Bernal. Notwithstanding, some of the most important variables were shared by both EML ensembles: Bernal and Santa Rita; in the case of Santa Rita, LAI from MOD15, MYD15 and MCD15 had 35.0 %, 4.8 % and 3.1 % of variable importance, and the FPAR from MOD15 was 12 %.

MODIS as predictor of eddy covariance GPP
MODIS is important because it overpasses every point of the earth every 1 or 2 d, and it implements a GPP product (MOD17) that has helped track the response of the biosphere to the environment since 2000. The product MOD17 has been validated against many EC sites, but few validation sites correspond to deserts and semideserts (Running et al., 2004). The GPP MOD17 underestimated the GPP derived from EC data at Bernal (Fig. 11a). In a similar bell-shaped distribution of θ s, as in the case of the extrapolation of the Santa Rita site (Fig. 10b), here the θ s was not constant across the range of measurement; mean θ s was 0.24 ± 0.13, with α = 0.00047±0.087 and β = 0.48749±0.02 (Fig. 11c). Adjusting MOD17 estimates with Eq. (6) improved the relation-  ship, but note that the value of r (0.76) was the same for the original MOD17 and the adjusted MOD17 (Fig. 11b).

Prediction eddy covariance GPP with ordinary-least-squares multiple regression
OLS is a common estimation method for linear models, and here this model appeared as adequate, judging by the general distribution of predictions (Fig. 12a) and the probabilityof-agreement plot (Fig. 12b). A total of 14 variables were included in the model, all of them with VIF values lower than 7.0 (Appendix C); the VIF statistic quantifies the severity of multicollinearity, and an acceptable threshold is 10. The most significant variables were the EVI from MYD13 and Daymet variables precipitation, shortwave radiation, and minimum and maximum temperatures (see Appendix B for variable details). Variables with high coefficient values were MODIS reflectance band 14 (9.17), the EVI from MYD13 (8.53) and the NDVI (3.9), while Daymet temperatures had small coefficients: −0.23 for maximum temperature and 0.17 for minimum temperature. The θ s decreased slightly at the end of the measurement range; mean θ s was 0.5±0.014, with α = 0.18845±0.137 and β = 0.94966±0.031. No correction for this model was calculated since α was close to 0, β was close to 1, and the EML model had a higher θ s.

Agreement
The probability of agreement (θ s) was the statistic to determine if the mean responses were in agreement. Comparing  Respective function of probability of agreement using 1 g C m −2 d −1 as a tolerable agreement between methods of estimation of GPP: EC data from the Bernal site and EML model for the Santa Rita site. The horizontal axis (s) represents the magnitude of the measurement, the vertical axis is the probability of agreement for the measurement, and the red line is the confidence interval p < 0.05. the confidence intervals (CIs) for θ s, the best modeling approach was the EML because its CI (0.56-0.59) was different from that of the OLS model (0.47-0.52). More importantly, for the EML and OLS models, the values of θ s had little variation across the range of GPP estimates. However, the CIs of the EML and OLS models were similar in their bias estimates α and β, and both models had no bias (p < 0.05). Altogether, the best result was the EML ensemble using environmental and remotely sensed data corresponding to the same site, i.e., Bernal (Fig. 10c). This kind of EML would be useful for gap filling or the evaluation of GPP time series of the site that generated the model. Machine learning algorithms can fill gaps longer than 30 d (Kang et al., 2019).
The second option to estimate GPP, using the same datasets, was the multiple-regression OLS model (Fig. 12a). The multiple regression is straightforward, and here multicollinearity was not a problem. The EML ensemble and the OLS regression have the highest values of θs (0.58 and 0.5, respectively; Fig. 12b). Higher values for θs are desirable (> 0.95), and this could be achieved by increasing the sample size, relaxing the tolerable magnitude for agreement (here it was set at 1 g C m −2 d −1 ) or perhaps using different forcing variables.
The MODIS estimates were a third-best alternative since the mean θ s was 0.24. In the present study, we used a spline to fill the data to a daily time step since the MCD17 is an 8 d composite product, but a similar result was obtained if  the EC GPP was rescaled and compared to the original 8 d MCD17 data (Fig. 11a). The GPP from MODIS was an underestimate of EC GPP, and when the estimates were adjusted (Eq. 6, Fig. 11b) the performance was not better than the OLS (model not shown). The MODIS land cover classification represented this site as grassland (MCD12), and this could be another reason for the poor agreement besides the assumptions made in the MODIS algorithm regarding the ε and ε max parameters and the response of vegetation to VPD. Agreement of MODIS GPP is crucial because MODIS products are frequently used in country-wide assessments of the carbon cycle and can influence public policies.
The model with least agreement resulted when the EML ensemble generated from the Santa Rita site was used to predict GPP at Bernal. Machine learning models can make predictions, but their usefulness decreases when they are used outside the context of where they were built, while process-based mechanistic models have this ability. Although the Au-toML function in H 2 O is designed to protect against overfitting using cross-validation runs (Michailidis, 2018), in our study, the Santa Rita model could not be generalized to represent the Bernal site GPP process, probably because the variables and features selected for Bernal or Santa Rita were different during the AutoML workflow. The Santa Rita model was good at predicting GPP at that site, with a deviance of the leader model of 0.23, while at Bernal the deviance of the leader model was 0.63 (Table 3), indicating that the Santa Rita EML ensemble was at least as good a model as the EML at Bernal (not shown). The GPP time series for Santa Rita was about 4 times the size of the Bernal dataset, and therefore the deviance was lower. However, when the Santa Rita model was used with Bernal data, the mean θ s was 0.16, indicating that the agreement was insufficient. Eyeballing the predictions in Figs. 9a, 10b and 11a and their corresponding correlation values (0.78, 0.76 and 0.82 for adjusted EML Santa Rita, adjusted MODIS and multiple-regression OLS), it could be argued that these models were comparable. However, their θ s plots present a different perspective.

Model agreement
The Santa Rita model had higher θ s when it extrapolated for low GPP values for the Bernal site, suggesting that the Santa Rita model had a better skill when predicting GPP close to 0 and even negative GPP values (Fig. 10b). Although the Bernal and Santa Rita sites had similar vegetation and climate classifications, they are more than 1600 km apart, and rainfall monthly distribution is different. The GPP seasonal cycle at Bernal started in February and steadily increased to a maximum during July and August (Table 1). At Santa Rita, the GPP was low from January until mid-July (< 0.5 g CO 2 m −2 d −1 ) and then increased sharply to a maximum level in mid-August (Joiner and Yoshida, 2020). What matters eventually for machine learning methods is how well the predictor space rather than geographic space is sampled (Jung et al., 2020). Incorporating data from more humid (semiarid) sites could improve the GPP predictions by a machine learning method. Not only more EC sites, but also sites representing a water availability gradient would be important for semiarid ecosystems, representing long spells and the influence of oceanic oscillations and monsoon rains.
All the models presented here used transient data to represent GPP, specifically at the 1 d time step. Besides the radiation-related variables, two sources of rainfall were used as forcing variables, but evapotranspiration (ET) was not used. The MOD16A2 version 6 is available as an 8 d gapfilled product and could be included in EML or OLS models. However, a more general representation of the carbon cycling could be achieved when including variables that represent annual or seasonal timescales of soil water, evaporation or precipitation (Scott and Biederman, 2019). Scott et al. (2015) suggest that real lags between precipitation and productivity that may impart legacy effects may also be partially masked by using ET as ET more carefully tracks productivity when soil moisture storage is accessed. The occurrence of off-season rainfall, dry spells and carryover effects could be parameterized as windowed events of a given duration. A window would be a period with distinct time boundaries; the window allows the grouping of records with similar features. The effect of the window at a given time could be represented as moving weights as the point in time in question is closer or farther from the window.
In our study, the metric to assess model agreement was the probability of agreement (θ s) and their bias parameters. Many other metrics can be used to evaluate model per-formance such as the root mean square error, r, r 2 or the model efficiency factor (MEF) presented by Nash and Sutcliffe (1970). In particular, the MEF is a step in the Fluxnet data processing pipeline (Pastorello et al., 2020). A MEF value close to 1 represents a high correlation and lower biases (Joiner and Yoshida, 2020). Therefore, the calculated MEF for the Bernal site EML (0.98) would suggest very good predictive performance, while the θ s of 0.6 for this model indicates a more modest performance. The θ s for a particular value of the measurand is the probability that the difference between two measurements made by different systems falls within an interval that is deemed to be acceptable (Stevens et al., 2015). In the present study we used 1 g C m −2 d −1 as a critical value defining an acceptable difference. If this value is smaller, then the probability of agreement would decrease for this same model. In such a case, it would be less likely that the predictors agree considering that the θs takes into account both the difference in the function means at a given value of the measurand and the uncertainty in its estimation.

MODIS discrepancies
Different authors have reported discrepancies between MCD17 and EC estimates of GPP in semiarid regions. Examining MODIS discrepancies in these ecosystems is important because the errors induced by cloud cover are expected to be minimal, and other effects can be identified (Gebremichael and Barros, 2006). The GPP of MOD17 did not relate well (EC = 0.11 + 0.17MODIS, r 2 = 0.67) with estimates of EC GPP in semidesert vegetation of the Sahel (Tagesson et al., 2017). With data from different types of vegetation in the Heihe basin in China, MODIS17 overestimated the GPP from EC (EC = 1.15 + 0.24MODIS, r 2 = 0.68; Cui et al., 2016). For scrub sites in Mexico, the relation between GPP calculated from EC and MOD17 was not good (MODIS = 383.82 + 0.467EC, r 2 = 0.6; Delgado-Balbuena et al., 2018). In arid and semiarid ecosystems in China, optimizing parameters of the MODIS GPP model with sitespecific data improved the estimate to explain 91 % of the variation in the GPP of the data observed by EC (Wang et al., 2019). These same authors propose improving the land use classification used by the MOD17 algorithm and recalibrating light use efficiency parameters to solve the GPP estimation problem. Gebremichael and Barros (2006) examined an open shrubland site in a semiarid region of Sonora, Mexico, and their analysis of the temporal evolution of the discrepancies with MODIS GPP suggested revisiting the light use efficiency parameterization, especially the functional dependence on VPD and PAR, and water stress or soil moisture availability.
The relationship between the GPP MODIS and the GPP EC presented in Sect. 3.3 is an approximation because the uncertainty in the respiration component must be considered. The empirical relationship between nocturnal NEE and soil temperature has been used to represent ecosystem res-382 A. Guevara-Escobar et al.: Machine learning estimates of eddy covariance carbon flux piration (R eco ) in order to separate the processes that contribute to daytime NEE (Richardson and Hollinger, 2005;Wofsy et al., 1993). Nighttime NEE should be equal to the rates of autotrophic and heterotrophic respiration, while during daytime, NEE should be equal to the combined rates of carboxylation and oxidation of Rubisco, autotrophic respiration and heterotrophic respiration. Then, the GPP can be calculated as the difference between daytime NEE and R eco , estimated through its relationship with temperature (Goulden et al., 1996). In the present study, R eco was calculated based on soil and air temperature following the procedure of Reichstein et al. (2005) implemented in REddyProc (Wutzler et al., 2018). Although it is possible to measure or model the partition of respiration (Running et al., 2004;Wang et al., 2018), the presence of cacti complicates the calculation, assuming that all nighttime flux represents ecosystem respiration (Owen et al., 2016;Richardson and Hollinger, 2005). While soil respiration tends to be temperature-limited when soil moisture is nonlimiting in temperate ecosystems, in rangeland ecosystems the controls of soil CO 2 efflux were photosynthesis, soil temperature and moisture (Roby et al., 2019). In our study, the instrumentation did not include measurements of plant or soil respiration partition to validate the R eco estimates.
A problem regarding data comparison from remote orbital sensors and terrestrial observations is that different quantities are fundamentally measured. MODIS measures the radiation reflected by the earth's surface in two spectral bands at 250 m spatial resolution per pixel, five bands at 500 m and 29 bands at 1000 m. The EC technique has a footprint to measure CO 2 that varies dynamically in shape and size but is generally considered to be 1 km 2 ; in this study the footprint radius was 600 m. To solve the scaling, MODIS products related to the carbon cycle have been validated with the EC technique and biometric measurements on several spatial scales using process-based ecosystem models and characterizing areas up to 47 km 2 around the EC measuring tower (Cohen et al., 2003).
At Bernal, the vegetation was heterogeneous. This situation was represented by the heterogeneity in vegetation activity in the 4 pixels used, spanning about 0.2 units of NDVI at the peak of the season activity (Fig. 1). The standard error of the IVI differed by 1 order of magnitude among species. Although more important species had a lower standard error, their means were similar, indicating that they were equally abundant at all sampling plots. The higher importance of some species (Table 2) was explained by selective grazingbrowsing behavior and the dispersion caused by cattle by either ingesting or transporting seeds or plant parts (Belayneh and Tessema, 2017). Regarding landscape heterogeneity, the tower fetch was predominantly from the northeast, capturing the less heterogeneous area of the site but also the more active, according to the NDVI (Fig. 1).
The thorny scrub examined had two vegetation layers: the overstory layer mainly consisting of mesquite, acacia and cacti and the understory layer that included grasses and herbs. Cattle preferentially graze the understory, and because they eat using their tongue, they will avoid browsing thorny species, unlike goats or deer, which use their lips. Without grazing management, overtime, the competition balance will favor bush species, resulting in encroaching, and the understory will be stressed; only unpalatable species or those with their growing meristems very close to the ground would survive. Representing the structure and functioning of these two layers using MODIS is possible (Liu et al., 2017). Recently, Hill and Guerschman (2020) presented a MODIS product derived from MCD43A4 to estimate the fractions of photosynthetic and nonphotosynthetic vegetation and the remaining fraction of bare soil. These developments could improve the MOD17 GPP estimates since its model represents a homogeneous single vegetation layer. All these considerations help to understand the low θ s between MOD17 estimates of GPP and EC-derived GPP.

Forcing variables and machine learning
Forcing variables in this study were gridded meteorological data and MODIS spectral bands and products, but the variables mostly included in the ML algorithms were LAI and FPAR, accounting for 36.4 % and 54.9 % of the variable importance index at Bernal and Santa Rita, respectively, in their leader board. This result supported the view that CO 2 fluxes can be represented by ML algorithms exclusively using remotely sensed data (Tramontana et al., 2016;Joiner and Yoshida, 2020). Using neural networks, Joiner and Yoshida (2020) showed that with only satellite reflectance from MODIS and top-of-atmosphere PAR, it is possible to capture a large fraction of GPP variability. They argue that vegetation indices may reduce the information content of the underlying reflectance when compressing the information from two or more bands into a single index. In our study, reflectance bands were not often included in the ML models. This was most likely because MOD09GA and MODOCGA provide estimates of surface reflectance uncorrected for the illumination and viewing geometry, while the MCD43 used in Joiner and Yoshida (2020) uses a surface bidirectional reflectance distribution function (BRDF) model that improves the quality of surface albedo retrievals. By propagating the BRDF correction in the MODIS processing pipeline, the vegetation indices would more likely relate better to the GPP. Future modeling efforts should use benchmark scenarios according to sets of forcing variables already identified as useful.

Carbon flux
Although the carbon balance in ecosystems is influenced by different factors such as soil type and amount of nutrients, the relationship with soil temperature and humidity is particularly strong (Anderson-Teixeira et al., 2011;Hastings et al., 2005). How much of the rainwater the system can retain or lose has been described as the leakiness of the system (Guerschman et al., 2009). More than immediate incident rainfall, the available soil moisture and its redistribution are important in semiarid ecosystems, including steam flow, preferential flow paths, hydraulic lift and others (Barron-Gafford et al., 2017). At Bernal, when the sum of the precipitation of the current month and that of the previous month was considered, the correlation with NEE was −0.7, suggesting that continuous availability of soil moisture is important for the absorption of CO 2 in this environment. This result is consistent with other studies in which the relationship between the net productivity of the ecosystem (NEP) and precipitation is initially positive but is leveled from 1000 to 1500 mm annually (Xu et al., 2014). The hydraulic redistribution of water from moist (deeper) to drier soils through plant roots tended to increase modeled annual ecosystem uptake of CO 2; this process was identified at US-SMR (Fu et al., 2018;Scott et al., 2003).
The leakiness is highly dependent upon vegetation fractional cover, the proportion of the surface occupied by bare soil and vegetation: photosynthetically active vegetation and nonphotosynthetically active vegetation such as litter, wood and dead biomass (Guerschman et al., 2009). All these vegetation fractions have a water storage capacity and can reduce the amount of effective rainfall available to plant roots. It is possible that the canopy of the bushes completely intercepted the rainfall in some months because the scrub can intercept up to 20 % of the precipitation, and its canopy storage capacity is 0.97 mm (Mastachi-Loza et al., 2010). Considering only the daily rain events greater than 5 mm, the correlation between precipitation and NEE rose to −0.72. In the present study, the interception of rain by vegetation surfaces was not calculated, but the results suggest that it would be important to explore the relationship between net precipitation and NEE.
The average NEE at a global level is −156 ± 284 g C m −2 yr −1 (Baldocchi, 2014). The highest frequency among sites that measured NEE with EC occurs from −200 to −300 g C m −2 yr −1 , but in sites with biometric measurements, the peak occurs at −100 g C m −2 yr −1 (Xu et al., 2014). Using the daily averages of Table 2, the average NEE during the measurement period was −0.78 g C m −2 d −1 and annually would be −283.5 g C m −2 yr −1 . This result was higher than the annual values of the induced grassland and scrubland vegetation characterizing the Sonoran Desert plains (138 and 130 g C m −2 yr −1 ; Hinojo-Hinojo et al., 2019). In New Mexico, NEE values measured with EC are between 35-50 g C m −2 yr −1 in desert grassland and 344-355 g C m −2 yr −1 in mixed coniferous forest (Anderson-Teixeira et al., 2011). In a dryer region, the sarcocaulescent scrubland of Baja California in Mexico, the NEE was −39 and −52 g C m −2 yr −1 in 2002, respectively (Hastings et al., 2005. The NEE measured here was within the range of NEE, 0.3 ± 0.2 kg C m −2 yr −1 , for grasslands and shrublands in Mexico (Murray-Tortarolo et al., 2016). Although the measurements of the present study had gaps and were compared with annual studies, we considered that the reported value of C was representative of the main season of growth of this type of scrub.

Ecosystem management
Overgrazing is an appreciation relative to the grazing productive system where the forage resource is overused; in a mixed shrub-grass ecosystem, such as Bernal, it usually refers to the understory. Overgrazing means that the plant regrowth is readily grazed, tillers and root reserves are lost, and eventually the plant may die. Although the Bernal site was overgrazed, the carbon fluxes indicated that the plant community was photosynthetically active in both dry and rainy seasons. It is fair to assume that water in the soil was not limiting for the deep-rooted bush species, and that was the reason why it was possible to maintain the photosynthetic function during rainless months. However, this primary production would not have tangible benefits for ranchers' production system since no edible biomass would be produced for the cattle. From the point of view of carbon capture, the system accumulated nonlabile biomass that would remain in the system for a longer time compared to a grassland ecosystem (although it would be necessary to determine the partition of said shrub biomass). However, the overgrazing condition affects the biomass of the understory roots and consequently the carbon pool in the soil.
In the short term, it can be thought that the estimated negative carbon flows had a favorable effect on the environmental agenda. As time passes, it is possible that the gaps between the individual shrubs of the overstory expand, and this would have had an effect on soil erosion. It is also possible that the water stored in the soil profile used by the bushes gradually decreases to the point of causing drought, changes in phenology and advancements of the desertification process. There are many opportunities for ecology conservation and livestock-oriented management; these may include controlled grazing or propagating native thornless shrub species. If ranchers do not identify a benefit in the vegetation, then they will be tempted to remove it as it occurred at the study site. Because of its wide coverage and availability, the MODIS GPP product accuracy is important in representing the carbon cycle, raising awareness and monitoring advancement of environmental decisions.
Although we found that EML was a good option for modeling the GPP of a site, what is really needed to evaluate the performance of semiarid ecosystems is a spatial representation of the carbon flux. This is a problem for an underrepresented area regarding instrumented EC towers. However, the EML could be designed to take into account the explanatory variables in a spatiotemporal continuity. The best modeling approach was the ensemble of machine learning. The second option to estimate GPP was the multiple-regression OLS, and the third alternative were the MODIS estimates. Machine learning was a good option to predict GPP in the local context where it was generated; otherwise its performance was not good. This was demonstrated when using the EML from Santa Rita and trying to predict fluxes at Bernal. Nevertheless, a machine learning model would be useful for gap filling or the evaluation of GPP at the same site. The GPP estimates of a given model can be adjusted using the bias parameters of the probability of agreement to improve the relationship. Further research exploring the usefulness of EML to predict GPP in other contexts could use EML to calibrate mechanistic model parameters, hybrid approaches integrating EML and biologically meaningful mechanistic models, or EML representing the spatial variability of the landscape.  Table A1. Threshold of velocity friction (u * ) above which nighttime fluxes were considered valid. Estimates were obtained using the R package REddyProc. We used only the flux records with u * equal to or higher than the corresponding mean u * threshold for each year. Bound values are a 95 % confidence interval. NA: not applicable.   (Hill et al., 2006) Moisture stress (  Spectroradiometer). f Vegetation indices: RGRI is red-green ratio index, and GARI is green atmospherically resistant vegetation index; details of formula are described in Appendix B. g Layers from MODIS products. EVIMYD is enhanced vegetation index from MYD13; PsnNet is photosynthesis product form MOD17; LstNgtMYD is nighttime land surface temperature emissivity from MYD11. h Variables obtained from Daymet daily dataset: DayTmin is minimum temperature; DayTmax is maximum temperature; Daysrad is shortwave radiation; Dayprc is precipitation. i Daily rainfall rate from 3B42 TRMM (Tropical Rainfall Measuring Mission).