A new global estimate of surface turbulent fluxes,
latent heat flux (LE) and sensible heat flux (
Turbulent fluxes from the land surface to the atmosphere, particularly
sensible heat flux (
Alternatively, one can use a machine learning approach, such as an artificial neural network (ANN), trained on globally representative but imperfect estimates of the fluxes (such as those from models) to parameterize the nonlinear statistical relationships between remote sensing observations and surface fluxes. This approach has been successfully used for global soil moisture retrieval (Aires et al., 2012; Kolassa et al., 2013, 2016; Rodríguez-Fernández et al., 2015) and surface heat flux retrieval (Jiménez et al., 2009). Such ANNs require a target dataset for training. Climate model simulations of the relevant geophysical variable are usually used as the training dataset to facilitate subsequent data assimilation efforts (Aires et al., 2012; Kolassa et al., 2013, 2016). The downside of this approach, however, is that the fluxes estimated by the ANN often inherit some of the biases of the simulations used to train the network (Rodríguez-Fernández et al., 2015), even though some improvements can be achieved, such as a more realistic seasonal cycle informed by the seasonal cycle of the remote sensing data (Jiménez et al., 2009).
Previous studies show a strong relationship between the rate of photosynthesis and solar-induced fluorescence (SIF) observations and indicate that the plant fluorescence measurements can be a useful proxy for photosynthesis estimation (Flexas et al., 2002; Govindjee et al., 1981; Havaux and Lannoye, 1983; van Kooten and Snel, 1990; Krause and Weis, 1991; McFarlane et al., 1980; Toivonen and Vidaver, 1988; van der Tol et al., 2009). Recently, satellite observations of SIF have become available, opening new possibilities for the global monitoring of photosynthesis (Frankenberg et al., 2011, 2012, 2014; Guanter et al., 2012; Joiner et al., 2013; Schimel et al., 2015; Xu et al., 2015).
SIF observations from the Global Ozone Monitoring Experiment–2 (GOME-2) instrument have been shown to track the seasonal cycle of GPP better than typical high-resolution optically based vegetation index estimates (Guanter et al., 2012, 2014; Joiner et al., 2014; Walther et al., 2016). SIF has also been shown to be a pertinent indicator of vegetation water stress (Lee et al., 2013). Moreover, a near-linear relationship between monthly SIF retrievals and GPP has been found for different vegetation types, which suggests that SIF estimates can strongly constrain GPP retrievals (Frankenberg et al., 2011).
Recently, a new SIF product was developed from observations of the GOME-2
satellite using a new retrieval algorithm that disentangles three components
from multispectral observations (Joiner et al., 2013). SIF retrievals have been
shown not to be strongly affected by cloud contamination or by seasonal
variability in aerosol optical depth (Frankenberg et al., 2014). More
recently, remotely sensed SIF retrievals have been used to successfully
provide estimates of GPP in cropland and grassland ecosystems (Guanter et
al., 2014; Zhang et al., 2016a). SIF retrievals have also been integrated with
photosynthesis estimates from the National Center for Atmospheric Research
Community Land Model version 4 (NCAR CLM4), resulting in a significant
improvement of the photosynthesis simulation (Lee et al., 2015). As GPP
relates to plant transpiration through stomata regulation (Damour et al.,
2010; DeLucia and Heckathorn, 1989; Dewar, 2002), and transpiration water
fluxes dominate continental ET (Jasechko et al., 2013), the use of remotely
sensed SIF has the potential to also better constrain estimates of the
continental water (LE) and energy (
In this study, we develop an ANN approach to retrieve monthly estimates of
LE,
Moreover, to reduce the influence of errors in any single product, we introduce a Bayesian perspective to
generate the target dataset for the ANN. Multiple estimates of LE,
This new global product is named WECANN (Water, Energy, and Carbon Cycle with
Artificial Neural Networks). WECANN monthly estimates for the period
2007–2015 are provided on a 1
Characteristics of products used for training of ANN.
Characteristics of observations used as input in the WECANN product.
The inputs of WECANN include six remotely sensed variables introduced in
Sect. 2.2 and Table 2: SIF, net radiation, air temperature, soil moisture,
precipitation, and snow water equivalent (SWE). These are used to retrieve LE,
Four products are introduced in this section, and a triplet of them is used
for training of each of the LE,
The Global Land Evaporation Amsterdam Model (GLEAM) is a set of algorithms to
estimate terrestrial evapotranspiration using satellite observations (Martens
et al., 2017; Miralles et al., 2011a). GLEAM is a physically based model
composed of (1) a rainfall interception scheme, driven by rainfall and
vegetation cover observations; (2) a potential evaporation scheme, calculated
from the Priestley and Taylor (1972) equation and driven by satellite
observations; and (3) a stress factor attenuating potential evaporation,
based on a semiempirical relationship between microwave vegetation optical depth (VOD) observations and
root zone soil moisture estimates (based on a running water balance for
rainfall and assimilating satellite soil moisture). The data are provided at a
0.25
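For illustration, the Priestley and Taylor (1972) equation on which GLEAM's potential evaporation scheme builds can be written as LE_p = α Δ/(Δ + γ) (R_n − G). The sketch below is a minimal version of that classic form, not GLEAM's implementation: it assumes α = 1.26, a constant psychrometric constant γ ≈ 0.0665 kPa K⁻¹ (near sea-level pressure), and the FAO-56 expression for the slope Δ of the saturation vapor pressure curve. GLEAM's own coefficients and forcing may differ.

```python
import math


def slope_svp_kpa_per_k(t_air_c: float) -> float:
    """Slope of the saturation vapor pressure curve (kPa/K), FAO-56 form."""
    es = 0.6108 * math.exp(17.27 * t_air_c / (t_air_c + 237.3))  # kPa
    return 4098.0 * es / (t_air_c + 237.3) ** 2


def priestley_taylor_le(rn: float, g: float, t_air_c: float,
                        alpha: float = 1.26, gamma: float = 0.0665) -> float:
    """Potential latent heat flux (W m-2) from available energy Rn - G (W m-2).

    alpha: classic Priestley-Taylor coefficient (assumed 1.26 here).
    gamma: psychrometric constant (kPa/K), assumed constant.
    """
    delta = slope_svp_kpa_per_k(t_air_c)
    return alpha * delta / (delta + gamma) * (rn - g)


# Example: 140 W m-2 of available energy at 20 degC
le_p = priestley_taylor_le(rn=150.0, g=10.0, t_air_c=20.0)
```

Because Δ grows with temperature, the same available energy yields a larger potential LE in warmer conditions.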
The FLUXNET-MTE provides global surface fluxes at
0.5
The ECMWF Reanalysis
(ERA) is a global 4-D variational data assimilation (4D-Var) product that uses HTESSEL in
the forecast system. HTESSEL includes a surface runoff component and accounts for
spatially varying soil texture, unlike the earlier TESSEL model (Balsamo et
al., 2009). This is an offline model simulation in which HTESSEL is driven by
meteorological forcing output from the forecast runs. Photosynthesis in the
model is computed independently of LE (i.e., with its own canopy conductance),
so the carbon cycle does not interact with the water cycle at the
stomatal level, which introduces errors. We use LE,
The MODIS sensor is onboard
the sun-synchronous NASA satellites Terra (10:30 LT overpasses) and Aqua
(13:30 LT overpasses). It provides 44 global data products (Justice et
al., 2002) from 36 spectral bands spanning the visible, infrared, and thermal
infrared, which are used to monitor and understand land, ocean, and
atmospheric processes. The MODIS GPP/NPP project (MOD17) provides gross and net
primary production estimates covering the whole land surface and is useful
for analyzing the global carbon cycle and monitoring environmental change.
The MOD17 algorithm is based on a light-use efficiency approach proposed by
Monteith and Moss (1977), which states that GPP is proportional to the
product of incoming photosynthetically active radiation (PAR), fraction of
absorbed PAR (
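A light-use-efficiency model of the kind MOD17 is built on can be sketched as GPP = ε · fPAR · PAR. MOD17 additionally scales a maximum efficiency ε_max by attenuation scalars for low minimum temperature and high vapor pressure deficit (VPD). The function below is a minimal sketch with that structure; the ramp limits and the ε_max default are illustrative placeholders, not the biome-specific MOD17 parameter values.

```python
def ramp(x: float, lo: float, hi: float) -> float:
    """Linear ramp: 0 for x <= lo, 1 for x >= hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def lue_gpp(par: float, fpar: float, tmin_c: float, vpd_pa: float,
            eps_max: float = 1.0,
            tmin_range: tuple[float, float] = (-8.0, 10.0),
            vpd_range: tuple[float, float] = (650.0, 3000.0)) -> float:
    """Daily GPP (g C m-2 d-1) = eps_max * f(Tmin) * f(VPD) * fPAR * PAR.

    par: incident PAR (MJ m-2 d-1); eps_max: g C per MJ of absorbed PAR.
    All parameter defaults are illustrative placeholders.
    """
    t_scalar = ramp(tmin_c, *tmin_range)          # cold nights suppress GPP
    vpd_scalar = 1.0 - ramp(vpd_pa, *vpd_range)   # high VPD reduces efficiency
    return eps_max * t_scalar * vpd_scalar * fpar * par
```

With no temperature or VPD stress, the sketch reduces to the plain product ε_max · fPAR · PAR.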
Six sets of observations are used as input to the WECANN retrieval algorithm. These are selected in a way that provides necessary physical constraints on the estimates from the ANN. Table 2 lists the characteristics of each of the datasets, and they are briefly introduced in the following.
The GOME-2 instrument is an optical spectrometer onboard the Meteorological
Operational Satellite Program (MetOp-A and MetOp-B) satellites, which were
launched by the European Space Agency (ESA). GOME-2 was designed to monitor
atmospheric ozone profiles as well as other trace gases and water vapor
content. It senses Earth backscatter radiance and solar irradiance at a
40
Net radiation is the main control on sensible and latent heat fluxes in
wet environments and is closely related to PAR. The Clouds and Earth's
Radiant Energy System (CERES) is a suite of instruments that measure
radiometric properties of solar-reflected and Earth-emitted radiation from
the top of the atmosphere to Earth's surface, from three broadband channels
at 0.3–100
The Atmospheric Infrared Sounder (AIRS) is a high-spectral-resolution
spectrometer onboard the NASA Aqua satellite launched in 2002. It provides
hyperspectral (visible and thermal infrared) observations for monitoring
changes in the Earth's atmosphere and land surface, as well as for
improving weather prediction. The AIRS instrument was designed to obtain
atmospheric temperature and humidity profiles in 1 km layers of the
atmosphere. The accuracy of AIRS temperature observations is typically better
than 1
The ESA Climate Change Initiative (CCI) program soil
moisture (ESA CCI SM) is a multi-decadal (1978–2015) global
satellite-observed surface soil moisture product. It merges observations from
passive sensors (e.g., Scanning Multichannel Microwave Radiometer (SMMR),
Special Sensor Microwave/Imager (SSM/I), AMSR-E) and active ones (e.g., the
European Remote Sensing, ERS; Advanced Scatterometer, ASCAT), based on a
triple collocation (TC) error characterization (Dorigo et al., 2017; Liu et al.,
2011, 2012; Wagner et al., 2012). Here, we use daily data from the latest
version, v2.3. ESA CCI SM is provided on a
0.25
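Inputs such as ESA CCI SM are provided daily on a 0.25° grid, finer than the 1° monthly WECANN grid, so they must be aggregated in time and space. The exact aggregation procedure is not described in this section; the sketch below assumes simple NaN-aware block averaging of 4 × 4 cells (0.25° to 1°) followed by averaging the available days of a month. The helper names are hypothetical.

```python
import numpy as np


def block_average(field: np.ndarray, factor: int = 4) -> np.ndarray:
    """NaN-aware mean over non-overlapping factor x factor blocks of a 2-D grid."""
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    valid = np.isfinite(blocks)
    counts = valid.sum(axis=(1, 3))
    sums = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    # Blocks without any valid cell stay NaN instead of dividing by zero
    return np.divide(sums, counts, out=np.full(counts.shape, np.nan), where=counts > 0)


def monthly_coarse_mean(daily: np.ndarray, factor: int = 4) -> np.ndarray:
    """Average a (days, ny, nx) stack of fine daily fields into one coarse monthly field."""
    coarse = np.stack([block_average(day, factor) for day in daily])
    valid = np.isfinite(coarse)
    counts = valid.sum(axis=0)
    sums = np.where(valid, coarse, 0.0).sum(axis=0)
    return np.divide(sums, counts, out=np.full(counts.shape, np.nan), where=counts > 0)
```

A global 0.25° grid (720 × 1440 cells) divides evenly by 4, giving the 180 × 360 cells of a 1° grid.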
The Global Precipitation Climatology Project (GPCP) provides global daily
precipitation estimates at 1
The GlobSnow project, developed by ESA, provides long-term snow-related
variables: snow water equivalent (SWE) and areal snow extent (SE). It
combines microwave-based retrievals of snow information (including Nimbus-7
SMMR, DMSP F8/F11/F13/F17 SSM/I(S) observations) and ground-based station
data through a data assimilation process and provides the SWE and SE products
at different temporal resolutions: daily, weekly, and monthly (Pulliainen,
2006). Here, we use v2 of the daily L3A SWE product, which is posted on a
25 km
FLUXNET is a global network of regional tower networks that measure turbulent exchanges of water vapor, energy, and carbon dioxide between ecosystems and the atmosphere (Baldocchi et al., 2001). FLUXNET comprises over 750 sites on five continents. Measurements from the FLUXNET towers provide valuable information for evaluating satellite-based retrievals of surface fluxes. In this study, FLUXNET measurements from the FLUXNET 2015 dataset, the La Thuile Synthesis Dataset, and the Large-scale Biosphere–Atmosphere (LBA) experiment in Brazil are used for evaluation (details are provided in Sect. 4.2).
FLUXNET 2015 tier 1 and tier 2 data were retrieved from
From the LBA experiment in Brazil, we use data from sites in Rondônia at the edge of a deforested region (BR-Ji1 and BR-Ji2) and near São Paulo (BR-Sp1). As these records do not cover recent years, we instead compare against a 1999–2003 climatology of the fluxes. We note that interannual variability in the region (e.g., associated with El Niño and La Niña) can alter the seasonality and magnitude of the fluxes.
We also use data from the La Thuile Synthesis Dataset
(
A total of 85 sites from the three datasets, spanning broad climatic and biome gradients, are selected for evaluation of WECANN retrievals (Fig. S2 and Table S1). For AmeriFlux towers with measurements in both the FLUXNET 2015 and La Thuile datasets, we used the FLUXNET 2015 data. We only selected sites with at least 24 months of continuous measurements during 2007–2015. Sites that fall outside the WECANN land mask (Fig. S1), mostly in coastal regions, are excluded.
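The site screening rules above (at least 24 consecutive months of measurements within 2007–2015, and location inside the WECANN land mask) can be expressed as a small filter. This is a sketch with hypothetical function names and inputs; the preference for FLUXNET 2015 over La Thuile records at AmeriFlux towers is not modeled.

```python
from datetime import date


def longest_continuous_run(months: list[date]) -> int:
    """Length of the longest run of consecutive calendar months."""
    idx = sorted({d.year * 12 + (d.month - 1) for d in months})
    best = run = 0
    prev = None
    for m in idx:
        run = run + 1 if prev is not None and m == prev + 1 else 1
        best = max(best, run)
        prev = m
    return best


def select_sites(site_months: dict[str, list[date]], land_mask_sites: set[str],
                 first_year: int = 2007, last_year: int = 2015,
                 min_run: int = 24) -> list[str]:
    """Sites inside the land mask with >= min_run consecutive months in the period."""
    selected = []
    for site, months in site_months.items():
        in_period = [d for d in months if first_year <= d.year <= last_year]
        if site in land_mask_sites and longest_continuous_run(in_period) >= min_run:
            selected.append(site)
    return sorted(selected)
```

Months are mapped to a running index (year × 12 + month), so a December followed by the next January counts as consecutive.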
We use estimates of an independent water budget closure model across five
major basins to evaluate WECANN retrievals on regional scales (Aires, 2014;
Munier et al., 2014). ET estimates from the budget closure approach satisfy
a water budget closure with no residual; therefore, they can be used as a
reference to evaluate WECANN ET estimates on a basin scale. These basins
include the Amazon (4 680 000 km
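At its simplest, the basin water balance states P − ET − R − ΔS = 0, where R is the runoff depth and ΔS the change in water storage. The sketch below only solves this balance for ET as a residual, after converting an outlet discharge into a basin-mean depth. It is an illustration under that simplification: the approach of Aires (2014) and Munier et al. (2014) instead adjusts all budget terms according to their uncertainties so that the budget closes, which this sketch does not attempt. The discharge value in the usage example is hypothetical.

```python
def runoff_depth_mm(q_m3s: float, area_km2: float, days: int) -> float:
    """Convert mean outlet discharge (m3 s-1) over `days` to a basin-mean depth (mm)."""
    volume_m3 = q_m3s * days * 86_400.0
    return volume_m3 / (area_km2 * 1.0e6) * 1000.0


def budget_et_mm(p_mm: float, runoff_mm: float, storage_change_mm: float) -> float:
    """ET that closes P - ET - R - dS = 0 exactly (basin means, mm per month)."""
    return p_mm - runoff_mm - storage_change_mm


# Hypothetical 30-day month for a basin the size of the Amazon (4 680 000 km2)
r = runoff_depth_mm(q_m3s=200_000.0, area_km2=4_680_000.0, days=30)
et = budget_et_mm(p_mm=200.0, runoff_mm=r, storage_change_mm=-10.0)
```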
Architecture of the ANN layers. Input layer provides the matrix
We developed an ANN retrieval algorithm to estimate the surface fluxes (LE
and
The training step of the ANN aims at estimating the weights for each of the
neuron connections, such that the mismatch between the ANN outputs and target
estimates is minimized. For this, we used the mean squared error (MSE) as the
cost function and a backpropagation algorithm to adjust the ANN weights.
During the training, the network implicitly learns the coupling of the LE,
For the purpose of training, the target data are divided into three subsets: training, validation, and testing, constituting 60, 20, and 20 % of the target data, respectively. In each iteration, the training subset is used to estimate the weights in the network, and the convergence of the training towards the target data is checked using the validation subset. When the network weights begin to overfit the training data, the validation estimates start diverging from the target data and the training is stopped (early stopping). The weights from the last iteration before this divergence represent the final solution. The testing subset is used to assess the ANN performance after the training phase.
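A minimal NumPy sketch of the training scheme described above: one hidden layer, an MSE cost, backpropagation, a 60/20/20 random split, and early stopping on the validation subset. The tanh activation, plain full-batch gradient descent, learning rate, and patience are assumptions made for illustration; they are not necessarily the optimizer or settings used for WECANN. The synthetic data merely stand in for the six inputs and three outputs.

```python
import numpy as np


def split_indices(n, rng, fractions=(0.6, 0.2, 0.2)):
    """Randomly split n samples into training, validation, and testing indices."""
    idx = rng.permutation(n)
    n_train, n_val = int(fractions[0] * n), int(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def forward(w, x):
    """One tanh hidden layer followed by a linear output layer."""
    h = np.tanh(x @ w["W1"] + w["b1"])
    return h, h @ w["W2"] + w["b2"]


def mse(w, x, y):
    return float(np.mean((forward(w, x)[1] - y) ** 2))


def train_early_stopping(x, y, n_hidden=10, lr=0.05, max_iter=5000,
                         patience=50, seed=0):
    """Full-batch gradient descent with backpropagation and early stopping."""
    rng = np.random.default_rng(seed)
    tr, va, te = split_indices(len(x), rng)
    w = {"W1": rng.normal(0.0, 1.0 / np.sqrt(x.shape[1]), (x.shape[1], n_hidden)),
         "b1": np.zeros(n_hidden),
         "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, y.shape[1])),
         "b2": np.zeros(y.shape[1])}
    best_w = {k: v.copy() for k, v in w.items()}
    best_val, since_best = np.inf, 0
    for _ in range(max_iter):
        h, pred = forward(w, x[tr])
        # Gradient of the squared error summed over outputs, averaged over samples
        g_out = 2.0 * (pred - y[tr]) / len(tr)
        g_hid = (g_out @ w["W2"].T) * (1.0 - h ** 2)  # backpropagate through tanh
        grads = {"W2": h.T @ g_out, "b2": g_out.sum(axis=0),
                 "W1": x[tr].T @ g_hid, "b1": g_hid.sum(axis=0)}
        for k in w:
            w[k] -= lr * grads[k]
        val = mse(w, x[va], y[va])  # convergence check on the validation subset
        if val < best_val:
            best_val, since_best = val, 0
            best_w = {k: v.copy() for k, v in w.items()}
        else:
            since_best += 1
            if since_best >= patience:  # validation error keeps rising: stop
                break
    return best_w, mse(best_w, x[te], y[te])  # testing subset assesses the result


# Synthetic example: six inputs and three outputs (stand-ins for LE, H, and GPP)
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, (1000, 6))
y = np.column_stack([x[:, 0] + 0.5 * x[:, 1], x[:, 2] * x[:, 3],
                     np.tanh(x[:, 4] - x[:, 5])])
weights, test_mse = train_early_stopping(x, y)
```

Repeating the call with n_hidden from 1 to 15 and comparing the returned test-subset MSE mirrors the hidden-layer-size experiment described next.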
As an additional measure to avoid overfitting, we repeated the training for
several ANNs with an increasing number of neurons in the hidden layer (1 to
15). For one to five neurons, the
To train the ANN, we used LE,
One of the key issues in the design of an ANN to retrieve any geophysical variable is defining a good target dataset. One practice has been to use outputs from a land surface model as the target (Aires et al., 2005; Jiménez et al., 2013; Kolassa et al., 2013; Rodríguez-Fernández et al., 2015). However, all observations and models contain random errors and biases. Therefore, the ANN-based retrieval inherits some of the biases of the original target dataset, even if the ANN is able to correct some features of its target data (e.g., an imperfect seasonal cycle, as demonstrated by Jiménez et al., 2009). To address this issue, we use three datasets that are sufficiently independent for the training to learn from each dataset and benefit from all of them synergistically. We implement a pseudo-Bayesian training by probabilistically weighting the occurrence of each training dataset by its likelihood to define a target dataset. The three datasets are listed in Table 1 for each variable.
To define this prior distribution, we use the TC
technique. TC is a method to estimate the RMSE
(and, if desired, correlation coefficients) of three spatially and temporally
collocated measurements by assuming a linear error model between the
measurements, with errors that are mutually uncorrelated and uncorrelated with the true signal (McColl et al., 2014; Stoffelen, 1998). This methodology has
been widely used in error estimation of land and ocean parameters, such as
wind speed, sea surface temperature, soil moisture, evaporation,
precipitation,
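In the covariance notation of McColl et al. (2014), with Q the 3 × 3 covariance matrix of the collocated series, the error variance of product 1 is Q₁₁ − Q₁₂Q₁₃/Q₂₃ (and cyclically for products 2 and 3), expressed in that product's own units. The sketch below implements this form for one pixel's time series; treating negative error-variance estimates as missing (NaN) is an assumption consistent with discarding them. It assumes the linear error model with mutually uncorrelated errors that TC relies on.

```python
import numpy as np


def triple_collocation(x, y, z):
    """Covariance-based TC for three collocated series of the same variable.

    Returns (error_std, corr_with_truth), each an array of three values.
    Negative error-variance estimates (e.g., from short records) become NaN.
    """
    q = np.cov(np.vstack([x, y, z]))
    err_var = np.array([
        q[0, 0] - q[0, 1] * q[0, 2] / q[1, 2],
        q[1, 1] - q[0, 1] * q[1, 2] / q[0, 2],
        q[2, 2] - q[0, 2] * q[1, 2] / q[0, 1],
    ])
    rho2 = np.array([
        q[0, 1] * q[0, 2] / (q[0, 0] * q[1, 2]),
        q[0, 1] * q[1, 2] / (q[1, 1] * q[0, 2]),
        q[0, 2] * q[1, 2] / (q[2, 2] * q[0, 1]),
    ])
    err_std = np.where(err_var > 0, np.sqrt(np.clip(err_var, 0.0, None)), np.nan)
    return err_std, np.sqrt(np.clip(rho2, 0.0, 1.0))


# Synthetic check: three noisy, linearly rescaled versions of one "true" signal
rng = np.random.default_rng(42)
truth = rng.normal(size=200_000)
a = truth + rng.normal(0.0, 0.3, truth.size)
b = 2.0 * truth + 1.0 + rng.normal(0.0, 0.5, truth.size)
c = 0.5 * truth + rng.normal(0.0, 0.2, truth.size)
err_std, rho = triple_collocation(a, b, c)
```

The recovered error standard deviations are close to the 0.3, 0.5, and 0.2 used to generate the synthetic data, even though the three series have different offsets and scalings.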
Left column: annual average retrievals in 2011 for
Global patterns of seasonal average LE from WECANN in 2011:
Similar to Fig. 3 but for
Similar to Fig. 3 but for GPP instead of LE.
The TC-estimated errors for the surface fluxes and GPP are shown in
Figs. S4–S6. White regions represent missing retrievals or negative
error-variance estimates that were discarded because of an insufficient data record. For LE, high TC errors
are found in the Amazon rainforest and tropical Africa for GLEAM, in the Amazon
rainforest and the Sahel for ECMWF, on the Indian peninsula for FLUXNET-MTE, and
in the US Great Plains for ECMWF and FLUXNET-MTE. For
There are several likely causes for these errors. For the FLUXNET-MTE data, larger uncertainties may arise in regions sparsely covered by FLUXNET eddy covariance stations, as well as in regions where interception is a large component of the LE flux (Michel et al., 2016). For the GLEAM and ECMWF data, dense vegetation generally induces biases relative to the satellite observations, especially in tropical regions (Anber et al., 2015).
Finally, we use the TC-based RMSE estimates at each pixel to compute the a
priori probability (
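The exact expression for the a priori probability is not reproduced here. As a labeled assumption, the sketch below uses normalized inverse error variances, p_i = (1/σ_i²) / Σ_j (1/σ_j²), computed from each product's TC RMSE σ_i, and realizes the pseudo-Bayesian weighting by drawing each training sample's target from one of the three products with probability p_i. Function names are hypothetical.

```python
import numpy as np


def prior_probabilities(rmse):
    """Normalized inverse-error-variance weights; rmse has shape (..., 3)."""
    inv_var = 1.0 / np.asarray(rmse, dtype=float) ** 2
    return inv_var / inv_var.sum(axis=-1, keepdims=True)


def sample_targets(candidates, probs, rng):
    """Draw one target per sample from the three products with the given probabilities.

    candidates, probs: arrays of shape (n_samples, 3).
    Returns the selected target values and the index of the chosen product.
    """
    cum = np.cumsum(probs, axis=1)
    u = rng.random((candidates.shape[0], 1))
    choice = (u > cum).sum(axis=1)                        # first product with cum >= u
    choice = np.minimum(choice, candidates.shape[1] - 1)  # guard against round-off
    return candidates[np.arange(candidates.shape[0]), choice], choice
```

Under this weighting, a product whose TC RMSE is twice that of the others at a pixel is chosen four times less often there.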
In this section, we present and compare the retrievals of LE,
Figure 2 illustrates the annual global average and scatter plots of retrievals
vs. target estimates. The spatial patterns of the WECANN retrievals are
consistent with expectations. The average global values in 2011 are
38.33 W m
The seasonal variability and spatial patterns of the 2011 retrievals are
shown in Figs. 3–5. As expected, LE shows no variability over deserts such
as the Sahara and the Arabian Peninsula (Fig. 3). Wet tropical
forests exhibit subtle seasonal variability in LE. These spatial
differences in the seasonal cycle reflect changes in the radiation,
temperature, water availability during the dry season, soil nutrients, soil
type conditions, and leaf flushing (Anber et al., 2015; Morton et al.,
2014, 2016; Restrepo-Coupe et al., 2013; da Rocha et al., 2009; Saleska et
al., 2016). In contrast, seasonal variability dominated by radiation
availability is noticeable in wet midlatitude regions for both the Northern Hemisphere (NH) and
Southern Hemisphere (SH), i.e., East Asia, the eastern US, and the north and
east Australian coasts with over 60 W m
Seasonal variabilities in
In northern latitudes, the seasonal variability in GPP (Fig. 5) follows radiation availability in wet regions, peaking in summer, whereas in dry regions GPP peaks in spring, when both soil water availability and incoming radiation are high. A clear east–west transition conditioned by water availability is observed in the continental US. In the tropics and subtropics, the response is diverse. The Amazon rainforest exhibits high GPP throughout the year, with a peak between September and February in the wetter part of the basin, following the dry season, consistent with observations at eddy covariance towers near Manaus and Santarém (Restrepo-Coupe et al., 2013; da Rocha et al., 2009). Compared to LE, GPP shows substantial geographical variability in the Amazon because of strong variability in soil type, green-up, biodiversity, and rooting depth. In the drier part of the basin, water availability controls the seasonal cycle of photosynthesis, and GPP peaks in the wet season (DJFMA). In the Congo rainforest, GPP follows the region's four seasons (two wet and two dry), decreasing substantially during the dry spells. In Indonesia, GPP is steadier, with high values year round. Monsoonal climates over India, Southeast Asia, northern Australia, and Central–North America are well captured, with a rapid rise in GPP following water availability. The highest GPP values are observed in rainforests and, in JJA, in the agricultural US Great Plains. Northern latitude regions mainly exhibit substantial GPP in late spring and summer and small values during the rest of the year.
Correlation coefficient (
Comparison of the retrievals with eddy covariance observations of
LE,
Direct validation of the WECANN retrievals is challenged by the fact that no
global, error-free estimates of LE,
A summary of statistics across 85 FLUXNET sites is provided in Tables S2–S4.
Overall, WECANN performs better than the alternative global products. In
particular, WECANN has the highest correlation for 76 % of sites for LE,
58 % of sites for
Figure 6 summarizes the correlation coefficients presented in Tables S2–S4 for each group of plant functional types (PFTs). Each class has between 6 and 22 sites. For all three variables, WECANN has the highest mean correlation within each PFT class and the smallest spread in most classes.
Figure 7 shows the comparison of monthly WECANN retrievals and three other
global products' estimates with the tower estimates across five selected sites
that span a range of climatic and vegetation coverage conditions. At the
Oklahoma agricultural site (US-ARM),
At the Brasschaat, Belgium, site (BE-Bra) (Fig. 7b), WECANN captures LE,
including its seasonal cycle, very well, yet misses some of the
interannual variability. WECANN outperforms the other retrievals of LE and
GPP and captures the GPP seasonal cycle very well compared to other products,
which display a too-early GPP rise and overestimate summer GPP. Again, the SIF
data provide useful information that is independent of the environmental
information (radiation, temperature, vegetation indices) used by the other
retrieval schemes. All retrievals strongly underestimate the reported
eddy covariance
At the cold Finland site (FI-Hyy), WECANN captures the seasonal
cycle of GPP and LE very well, as well as
At the monsoonal grassland site of Santa Rita, AZ, WECANN correctly captures
the complex dynamics of
Finally, at the South African Mediterranean site, ZA-Kru, WECANN reproduces
some of the dynamics of the observed
Overall, across the different sites, the WECANN retrieval performs better
than other products, especially in terms of the seasonality of LE,
Difference between annual mean LE retrieved by WECANN and the three
target datasets
In this section, we compare the WECANN-based estimates to other datasets used
in the training to better understand how WECANN differs from those training
data. Figure 8 shows the comparisons for LE and indicates that our product
has a relatively similar
Similar to Fig. 8 but for
The differences in
Similar to Fig. 8 but for GPP instead of LE.
The comparison between the GPP estimates shows significant differences
(Fig. 10). WECANN compares the best against FLUXNET-MTE (
Finally, we compare annual anomalies of WECANN retrievals and the three
training datasets globally and in different climatic zones. The zones are
defined as polar (90–60
To further assess WECANN on regional scales, we analyze its capacity
to capture extreme events. We selected three major heat wave and
drought events that occurred within the period covered by the WECANN product:
the 2010 Russia heat wave, the 2011 Texas drought, and the 2012 US Corn
Belt drought. Figure 11 shows the mean monthly
anomalies, as a percentage of the mean values, for LE,
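The percentage anomalies shown for these events can be sketched as 100 × (x − x̄_m)/x̄_m, where x̄_m is the multi-year mean for the same calendar month. The sketch assumes that baseline is the mean over the whole record, including the event year; the exact reference period is not restated here, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_percent_anomalies(series: dict[tuple[int, int], float],
                              event_year: int) -> dict[int, float]:
    """series maps (year, month) -> value; returns {month: % anomaly in event_year}."""
    by_month = defaultdict(list)
    for (year, month), v in series.items():
        by_month[month].append(v)
    clim = {m: mean(vals) for m, vals in by_month.items()}  # multi-year monthly mean
    return {
        month: 100.0 * (v - clim[month]) / clim[month]
        for (year, month), v in series.items()
        if year == event_year and clim[month] != 0
    }
```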
Finally, an intense drought in the central US, particularly in the Corn Belt,
occurred in 2012 and reduced maize yields by about 25 % and increased
prices by 17–24 % (Boyer et al., 2013; USDA, 2013). By mid-September
2012 almost two-thirds of the continental US was covered by drought, and
different parts of the US Corn Belt were categorized as either D3 (extreme
drought) or D4 (exceptional drought) conditions. Figure 11 shows similar
patterns in LE,
We also assess the accuracy of WECANN ET retrievals using ET estimates from the independent water budget closure model introduced in Sect. 2.3.2. The analysis covers 2007 to 2010, the period of overlap between the WECANN retrievals and the water budget closure study. Figure 12 shows the relative absolute difference between ET estimates from WECANN and those from the water budget study for each of the five basins (mean value and 1 standard deviation across 48 months). Mean absolute differences range from a low of 5 % in the Amazon to 24 % in the Colorado, while the other three basins have mean differences of 9, 17, and 20 %. While these differences are low to moderate, the coarse spatial resolution of the WECANN product introduces differences in the spatial averaging used to obtain basin-level ET estimates. Moreover, the budget closure estimates use only a single runoff value (at the outlet of the considered basin) for the entire basin; therefore, large heterogeneous basins such as the Colorado and Mississippi have large associated uncertainties, as the outlet runoff does not correctly constrain the flux distribution over the entire basin. It is over those basins that the WECANN retrieval compares less favorably with these large-scale estimates. Downscaled versions of these estimates would further help in the evaluation of ET products.
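The per-basin statistic can be sketched as the mean and standard deviation, over the monthly series, of |ET_WECANN − ET_budget| / ET_budget expressed in percent. Normalizing by the budget closure estimate and using the sample standard deviation are assumptions of this sketch, and the function name is hypothetical.

```python
from statistics import mean, stdev


def relative_abs_diff_stats(et_product, et_reference):
    """Mean and sample SD (in %) of |ET_product - ET_ref| / ET_ref over months."""
    if len(et_product) != len(et_reference):
        raise ValueError("series must have the same length")
    rel = [100.0 * abs(p - r) / r for p, r in zip(et_product, et_reference)]
    return mean(rel), stdev(rel)
```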
One of the advantages of a statistical retrieval algorithm, in particular of
ANNs, is that the run time is extremely fast after the training step. This
enables us to characterize the uncertainty of the retrievals by propagating
the uncertainties in the input variables through the network. For this
purpose, we set up a 10 000-member bootstrap experiment and run the WECANN
retrieval after adding errors to the input variables. The errors are normally
distributed with a mean of zero and a standard deviation that depends on the input
variable. For SIF, air temperature, and soil moisture, we use the error
estimates or standard deviations reported in their associated products. These
errors vary in space and time, and we use the value associated
with each time and space data point. For net radiation, we use a constant
standard deviation of 34.58 W m
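The perturbation experiment can be sketched as a Monte Carlo loop: each ensemble member adds zero-mean Gaussian noise with the per-input standard deviation to the inputs of one pixel and month, runs the retrieval, and the spread of the outputs gives the uncertainty. In the sketch a hypothetical linear function stands in for the trained ANN, and the input values and standard deviations are illustrative, not the product error estimates.

```python
import random
from statistics import mean, pstdev, quantiles


def propagate_uncertainty(model, inputs, input_sd, n_members=10_000, seed=0):
    """Monte Carlo propagation of zero-mean Gaussian input errors through a model.

    model: callable mapping a list of inputs to a scalar retrieval.
    inputs, input_sd: input values for one pixel and month, and their error SDs.
    Returns (mean, SD, 2.5th percentile, 97.5th percentile) of the ensemble.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_members):
        perturbed = [x + rng.gauss(0.0, sd) for x, sd in zip(inputs, input_sd)]
        outputs.append(model(perturbed))
    cuts = quantiles(outputs, n=40)  # 39 cut points at 2.5 % steps
    return mean(outputs), pstdev(outputs), cuts[0], cuts[-1]


def toy_model(v):
    """Hypothetical linear stand-in for the trained network."""
    return 2.0 * v[0] + 3.0 * v[1]


m, sd, lo, hi = propagate_uncertainty(toy_model, [1.0, 2.0], [1.0, 0.5])
```

For this linear stand-in the output SD should approach sqrt(2² · 1² + 3² · 0.5²) = 2.5; for the nonlinear ANN the ensemble is what makes the spread computable, and the fast run time of a trained network is what makes 10 000 members affordable.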
Mean monthly anomalies (as a percentage of the mean value) for the three extreme heat wave and drought events.
Relative absolute difference between ET estimates of WECANN compared to modeled ET from basin-scale water budget closure. Markers show the mean, and whiskers show 1-standard-deviation intervals.
Annual mean estimates and uncertainty bounds of LE (top row),
Figure 13 shows the results of the bootstrap for each of the LE,
On a global scale the GPP ranges between a minimum of
117.15
The interannual variations in surface fluxes and GPP show distinct patterns.
For example, in the year 2015, which was an El Niño year, LE and GPP decreased notably, and
Satellite SIF observations are relatively new and have not been previously used to
estimate LE and
To do so, we trained two different ANNs with NDVI and EVI instead of SIF data
on each of the three variables (LE,
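For reference, the two optical indices substituted for SIF in this experiment are standard band-ratio indices computed from surface reflectances. The sketch uses the commonly used MODIS EVI coefficients (G = 2.5, C1 = 6, C2 = 7.5, L = 1); which NDVI and EVI products were used is not restated here.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from NIR and red reflectances."""
    return (nir - red) / (nir + red)


def evi(nir: float, red: float, blue: float,
        g: float = 2.5, c1: float = 6.0, c2: float = 7.5, l: float = 1.0) -> float:
    """Enhanced vegetation index; blue band corrects for aerosol influence."""
    return g * (nir - red) / (nir + c1 * red - c2 * blue + l)
```

Both indices respond to canopy greenness and structure rather than to photosynthetic activity itself, which is the contrast the SIF comparison examines.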
This study introduces a new statistical approach to retrieving global surface latent and sensible heat fluxes as well as gross primary productivity using remotely sensed observations on a monthly timescale. The methodology is developed based on an artificial neural network that uses six input datasets including solar-induced fluorescence, precipitation, net radiation, soil moisture, snow water equivalent, and air temperature. Moreover, a Bayesian approach is implemented to optimally integrate information from three target datasets for training the ANN, using triple collocation to calculate a priori probabilities for each of the three target datasets based on their uncertainty estimates.
The new global product, referred to as WECANN, is evaluated using target
datasets as well as FLUXNET tower observations. Comparison with the training
datasets shows that our retrieval has similar correlations with all three
products, while its RMSD is smallest relative to
FLUXNET-MTE for LE (RMSD
The retrieval maps indicate that LE,
We also assessed the impact of SIF on retrieval quality. Compared with optical vegetation indices, SIF performs better in regions where phenology and incident radiation are not the main drivers of flux variability and performs similarly elsewhere.
Evaluation against FLUXNET tower observations shows that WECANN
performs better than other global products.
LE and
We also assessed the performance of WECANN in capturing extreme heat wave and
drought events and showed that in the case of the 2010 Russia heat wave, 2011 Texas
drought, and 2012 US Corn Belt drought, WECANN properly captures the
extent of the anomalies in LE,
The WECANN product is publicly available for download at the Aura
Validation Data Center (AVDC) at Goddard Space Flight Center via
The authors declare that they have no conflict of interest.
Funding for this study was provided by NASA grant no. NNX15AB30G. Pierre Gentine
acknowledges funding from NSF CAREER award no. EAR-1552304 and NASA
grant no. 14-AIST14-0096. Diego Miralles and Pierre Gentine acknowledge funding from the Belgian
Science Policy Office (BELSPO) in the frame of the STEREO III program
project STR3S (SR/02/329). The WECANN product is hosted on the AVDC server, and we
would like to thank Michael M. Yan and Ghassan Taha for their help in this
regard. The authors would like to thank all the producers and distributors
of the data used in this study. We would like to thank the ECMWF team (Gianpaolo Balsamo and
Souhail Bousetta, in particular) for providing the ECMWF data. We also
thank NASA and Steven W. Running for providing the MODIS GPP estimates and
Johanna Joiner for the GOME-2 data. The GPCP 1DD data were provided by the
NASA/Goddard Space Flight Center's Mesoscale Atmospheric Processes
Laboratory, which develops and computes the 1DD as a contribution to the
GEWEX Global Precipitation Climatology Project. The MCD12C1 data product was
retrieved from the online data pool, courtesy of the NASA Land Processes
Distributed Active Archive Center (LP DAAC), USGS/Earth Resources
Observation and Science (EROS) Center, Sioux Falls, South Dakota,