Simultaneous assimilation of satellite and eddy covariance data for improving terrestrial water and carbon simulations at a semi-arid woodland site in Botswana

Introduction Conclusions References


Introduction
Terrestrial ecosystems are strongly interconnected with the climate system through the hydrological cycle by various processes, such as infiltration, runoff, evaporation and transpiration.In particular, latent heat flux (LHF), resulting from the sum of evaporation and transpiration, is an essential component of the surface energy balance and needed for understanding the global and local water balance.It is also a key quantity for understanding the physiological response of ecosystems to changes in climate, as LHF is related to the terrestrial carbon cycle through stomatal function and leaf size T. Kato et al.: Simultaneous assimilation of satellite and eddy covariance data (Campbell and Norman, 1998;Beerling and Berner, 2005).Information on the latent heat fluxes of terrestrial ecosystems can improve our understanding of ecosystem functioning and its potential response to changes in the Earth's climate, such as an increased frequency of droughts (IPCC, 2007).
To fill the gap between measurements of terrestrial ecosystem fluxes and eco-physiological theory as embodied in terrestrial ecosystem models, data assimilation techniques are becoming more widely used in biogeochemistry.The main application of such data assimilation systems is focused on the optimization of model process parameters, primarily against observations of the carbon cycle, e.g.atmospheric CO 2 concentration, carbon fluxes and pools (e.g.Rayner et al., 2005;Braswell et al., 2005;Williams et al., 2005;Knorr and Kattge, 2005).Recently, Barbu et al. (2011) have applied the simplified extended Kalman filter to integrate both the in situ soil wetness index (SWI) and satellite-derived leaf area index (LAI) into the ISBA-A-gs land surface model for French grasslands for continuous update of the modeled state variables.The authors achieved a significant improvement in the root-zone soil water content of around 13 % as compared to results from the prior model.When applying the ISBA-A-gs model to grasslands and croplands in France, Calvet et al. (2012) also found that the maximum available soil water capacity has a large impact on how well modeled results correlate with available agricultural statistics.Such studies provide, besides parameters optimized to fit model output to observations, a better understanding of the key processes controlling the ecosystem behaviour with regard to eco-physiological functioning and closely related ecosystem carbon cycling.
In this study, we use the fully variational Carbon Cycle Data Assimilation System (CCDAS), incorporating the terrestrial ecosystem and land surface model BETHY (Biosphere Energy-Transfer Hydrology; Knorr, 2000).CCDAS has been designed to estimate process parameters through assimilation of various data streams, mainly atmospheric CO 2 concentration from ground-based measurement stations and remotely sensed fraction of absorbed photosynthetically active radiation (FAPAR) on a global scale (Rayner et al., 2005;Scholze et al., 2007;Kaminski et al., 2012).CCDAS is based on a variational approach and makes use of the availability of the adjoint (1st derivative) model to optimize model process parameters.Furthermore, CCDAS is able to calculate posterior parameter uncertainties through use of the Hessian matrix (2nd derivative of the misfit function between model and data) and propagating these uncertainties through the model to several diagnostic quantities of interest, e.g. the net carbon flux.CCDAS has so far been applied for assimilation of atmospheric CO 2 concentration and FAPAR observations (Kaminski et al., 2012).Here, CCDAS is extended to assimilate LHF and to estimate further parameters related to the hydrological part of the model.LHF is calculated in conjunction with terrestrial carbon fluxes by BETHY, and the difference of the simulated values to LHF measured with eddy covariance (EC) systems is minimized as part of the assimilation scheme.The aim of this work is, therefore, to demonstrate the possibility of applying CCDAS to multiple data streams simultaneously, and to extend the application of CCDAS to the study of eco-hydrological processes.
Savannas are climatically characterized by a distinct seasonality of rainfall, i.e. a combination of a severe dry season and a moderate wet season.Savanna vegetation is adapted to dry conditions and usually composed of sparse trees and grasses, whose canopy does not close.These regions are potentially at risk from large changes in the seasonality of water availability as well as the total amount of available water caused by climate change.For example, Wang (2005) showed that the climate models consistently predicted less rainfall and consequently drier soils at the end of the 21st century over much of subtropical and temperate regions including savannas.
Numerous model studies have analysed the importance of various processes for the hydrological conditions in savanna ecosystems.For example, Kleidon and Heimann (1998) and later Ichii et al. (2009) highlighted the importance of rooting depth within land surface models, which deal with both LHF and carbon fluxes, assuming that ecosystems are maximizing their productivity under water-limited conditions.
To investigate eco-hydrological dynamics of ecosystems as a whole, the eddy covariance (EC) technology has been applied in various terrestrial ecosystems (Aubinet et al., 2000;Baldocchi et al., 2001) as a reliable way of measuring energy, water and carbon fluxes.However, compared to closed-forest ecosystems in the Northern Hemisphere, little attention has been given to savanna ecosystems, even though they cover approximately 20 % of terrestrial surface (Veenendaal et al., 2004).Recent efforts to conduct eddy covariance observations in savanna regions (Veenendaal et al., 2004) have enabled us to greatly improve our modeling capabilities, and to better understand eco-hydrological functioning in savannas and open canopy woodlands.
We apply CCDAS to simultaneously assimilate eddy covariance LHF and remotely sensed FAPAR observations at a single point for a semi-arid savanna site at Maun, Botswana, and investigate the effect of assimilating multiple data streams on the accuracy in both the simulated variables.In addition, we analyse the effect of the assimilation of the two data sets on simulated gross primary production (GPP), which is not assimilated.To our knowledge, this is so far the first study to assimilate eddy covariance data simultaneously with other data streams into a terrestrial ecosystem model using the adjoint-based gradient approach.

Site description and measurement data
We have selected a mopane tree woodland area at Maun, Botswana ( 23• 33 E, 19 • 54 S; 950 m a.s.l.; Veenendaal et al., 2004).With a canopy cover of 30-40 %, the plant community at the flux measurement site is dominated by the mopane tree (Colophospermum mopane), and the marginal understory consists of grasses with a canopy cover of at most 15 %, dominated by Panicum maximum, Schmidtia pappophoroides and Urochloa trichopus.The mean maximum temperature of the warmest month is 33.6 • C, the mean minimum temperature of the coldest month 7.1 • C, and mean annual precipitation 464 mm.There is a distinct dry season during the winter months from May to September.Substantial amounts of rainfall are normally limited to between December and March.
LHF and CO 2 flux measurements are conducted by the EC method using a 12.6 m high tower in the middle of a homogeneous tall mopane tree stand with a maximum canopy height of about 8 m (Veenendaal et al., 2004).Three-dimensional wind speed, humidity, and CO 2 concentration were logged with a frequency of 20 Hz, and fluxes are integrated into half hour means with the EdiSol software (Moncrieff et al., 1997).Eddy flux measurements are influenced by contributions from a basal source area, whose size and position is varying depending on the aerodynamic conditions: wind direction, friction velocity, atmospheric stability, etc.At this site, Veenendaal et al. (2004) estimated the mean 90 % fetch distance of the installed eddy correlation measurement system to be 520 m for daytime data in March 2000 by a footprint analysis according to Schuepp et al. (1990) and Kolle and Rebmann (2002).Assuming that a fetch radius of 520 m can be enlarged occasionally depending on aerodynamic conditions and that different directions are averaged over time, the time-integrated fetch area approaches that of the footprint of the FAPAR measurements (a rectangle with length and width of 4500 m centred on the tower site, see details on FAPAR data below).At this spatial scale, the terrain is still generally homogeneous, but exhibits patches of the size of a few hundreds of metres (see Google Earth images, November 2012).
Air temperature, shortwave radiation, and precipitation are also measured at the same tower, and are used to calibrate the climate input data, which is extracted from a global data set as described in the following paragraph.Data from 2000 and 2001 are used for assimilation.Due to missing halfhourly data, only 223 points of daily averaged LHF data out of 731 days for two years would have been be available for assimilation if we had restricted ourselves to complete diurnal measurement cycles.To get both a sufficient number of data points and avoid biases in daily averaged LHF values, we include days with up to four of the 48 half-hourly values missing, where the gaps were filled using an appropriate gap-filling scheme (see Appendix A in the Supplement).This procedure yields a total of 464 daily data points for the two selected years 2000 and 2001.
Input data of daily precipitation, daily minimum and maximum temperatures, and incoming solar radiation at the site are derived from a global gridded climate data set, generated through a combination of available monthly gridded and daily station data (R.Schnur, personal communication, 2010) by a method by Nijssen et al. (2001), using gridded data from the Summary of the Day Observations (Global CEAS), National Climatic Data Center and the latest updates of gridded data by Jones et al. (2001) and Chen et al. (2002).These data are then corrected using the local climatology measured at the eddy flux tower (Lloyd et al., 2004).This is done by deriving linear regression equations between daily minimum and maximum temperatures and incoming solar radiation from the global data set and the local measurements.Daily precipitation from the global data set is adjusted by multiplying the global data with a constant factor such that the total rainfall matches that of the local rainfall data.
The assimilated FAPAR observations are derived from the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) of the National Aeronautics and Space Administration (NASA) at a spatial resolution of 1.5 km (Gobron et al., 2006).The FA-PAR data are provided every 10 days as representative values over the period giving a total of 70 data points over the twoyear study period.3 by 3 pixel scenes centred around the position of the Maun flux site are used here.
We have chosen the Maun site for several reasons.First, two years of flux data, both LHF and carbon fluxes, measured by the EC technique during the SAFARI2000 campaign are available (Lloyd et al., 2004).Second, a flat topography and homogeneous land cover increase the representativeness of the EC measurements for larger areas such as the FAPAR footprint.There are also significantly fewer cases of cloudy conditions at a savanna site as compared to, for example, tropical forest sites.Third, the dominant land cover types with mopane trees, understory grasses, and patchy bare ground, which change their relative coverage seasonally, are potentially responsible for large amplitudes and distinct seasonality in LHF and other related quantities.This environment thus provides a welcome opportunity for testing and enhancing the CCDAS in an area with water-limited conditions and low productivity.

Carbon Cycle Data Assimilation System
CCDAS, in its standard global setup, combines the land biosphere model BETHY (Knorr, 2000) with the atmospheric tracer transport model TM2 (Heimann, 1995) and some background fluxes not computed by BETHY (fossil fuel and land use change emissions and ocean-atmosphere exchange fluxes) to simulate the terrestrial carbon cycle globally along with atmospheric CO 2 concentrations.It uses first and second derivatives to optimize internal model process parameters and subsequently derive posterior uncertainties on these parameters.In this study, we modify the version of CCDAS as described in Knorr et al. (2010) to assimilate daily LHF and FAPAR measurements at the Maun savanna site.BETHY calculates the energy balance (including LHF), photosynthesis (including FAPAR) and autotrophic respiration on an hourly time step, and phenology, hydrology and heterotrophic respiration on a daily time step using the beforementioned climate input data (Fig. 1).Two plant functional types (PFTs), namely tropical broadleaf deciduous tree with a warm-deciduous phenology and C4 grass (recognized as PFT 2 and 10 in the original BETHY model), are simulated for the Maun site with a fractional coverage of 0.7 and 0.3 for PFT 2 and 10, respectively.Detailed information about BETHY is given in the Appendix B in the Supplement.
LHF is calculated as the sum of three terms: transpiration through stomata, soil evaporation and direct evaporation of intercepted rain water on the leaves (Knorr, 1997).Soil evaporation is calculated by the Ritchie model (Ritchie, 1972), separating the evaporation scheme into two phases depending on soil wetness level.In phase 1 (very wet soil), the soil evaporation rate depends on equilibrium evaporation, while in phase 2 (drying soil), the cumulative evaporation is proportional to the square root of time with a proportionality factor called "desorptivity".
FAPAR is calculated as the vertical integral of absorption of photosynthetically active radiation by healthy green leaves divided by the difference between the incoming and outgoing radiation flux at the top and bottom of the canopy (Knorr, 1997).This integration is carried out by a two-flux scheme (Sellers, 1985), which takes into account soil reflectance, solar angle and amount of diffuse radiation.Equating satellite and model FAPAR means that given the same illumination conditions, the same number of photons enter the photosynthetic mechanism of the vegetation, even if some of the assumptions differ between BETHY and the model used to derive FAPAR (Gobron et al., 2000).
Differences between simulated LHF and FAPAR values and the observed data are minimized by optimizing process parameters of BETHY.Here, we only briefly summarise the main methodological aspects.There are two modes in our data assimilation process: a calibration mode and a diagnostic mode.In calibration mode, the optimal parameter set is derived from FAPAR and LHF observations by propagating the observational information in an inverse sense through a chain of models.The mismatch of modeled values to observations is defined as a cost function as explained in the following Sect.2.3, and model parameters are then calibrated through iterative parameter adjustment (using the gradient information provided by the adjoint model) until the cost function reaches a minimum.In diagnostic mode, the quantities of interest (i.e.LHF and FAPAR) and their uncertainties can be calculated from the optimized parameter vector and its uncertainty as derived in the calibration mode.When the model is run in diagnostic mode, the model is run forward with the optimized parameters and the various diagnostics of interest.For detailed information on the CCDAS methodology, we refer to Kaminski et al. (2003), Scholze (2003), Rayner et al. (2005), and Scholze et al. (2007).

Cost function and observational uncertainties
The cost function J (p) (p denotes the parameter vector) expresses the differences between simulated and observed quantities normalised by the uncertainty of each of the contributing observations, LHF and FAPAR, under the assumption of a Gaussian probability density distribution.It is formulated in a Bayesian form: where p is the parameter vector, p 0 is the prior parameter vector (subscript 0 denotes the prior value), and C p 0 is the uncertainty for the prior parameter vector p 0 in the form of a covariance matrix.e(p) and a(p) are modeled LHF and FAPAR values as a function of the parameter set p; e 0 and a 0 are the observations of LHF and FAPAR; and C e 0 and C a 0 express the uncertainties of the observations e 0 and a 0 .( ) T and ( ) −1 denote the transpose and inverse of matrices.J is minimized iteratively using derivative information calculated by the adjoint model.The Hessian matrix, the second derivative of J with respect to the parameters, is used to estimate posterior parameter uncertainties, using the mathematical property that the inverse of the Hessian matrix at the cost function minimum approximates the posterior parameter error covariance matrix.All derivative code is derived efficiently from the model's source code (see Kaminiski et al., 2003) by applying the automatic differentiation tool TAF (Transformation of Algorithms in Fortran; Giering and Kaminski, 1998).
The uncertainty of observational LHF is taken as the maximum of 10.0 W m −2 and 23 % of measured LHF, which is in analogy to the approach taken by Knorr and Kattge (2005).In this study, the 23 % threshold was derived from the energy imbalance at this site, which was calculated as the underestimation of the sum of daily averaged sensible and latent heat flux (SHF + LHF) compared to the sum of net radiation R n and soil heat flux G in the regression line: SHF + LHF = 0.77 (R n + G) − 12.2 Wm −2 , r 2 = 0.79.Although the energy imbalance should rather be regarded as a systematic error, we consider it here as a reasonable ad hoc value to define the observational uncertainty for the optimization, which corresponds to a random error.As in Knorr et al. (2010), the uncertainty of observational FAPAR is set to a constant value of 0.1 for all observations.

Eco-hydrological parameters
We select 24 parameters to be optimized against observed LHF and FAPAR data (Table 1).8 parameters refer to both PFT 2 and 10, the remaining only to one of the two.The first 14 parameters are related to physiology, and the next four to leaf phenology.The prior mean and uncertainty values for these 18 parameters are the same as those used in previous applications of CCDAS (Scholze et al., 2007;Knorr et al., 2010).The six new parameters (19-24) control stomatal aperture, energy balance, and water balance processes in BETHY.They are f Ci C3 and f Ci C4 , the ratio of CO 2 concentration inside and outside of leaf tissue for C3 and C4, respectively (Eq.A21 in the Supplement); C W , the ratio of maximum water supply rate from the roots relative to plantavailable soil moisture (Eq.A24); h 0 , a scaling factor of the relative dryness of air (Eq.A35); ĥ, a scaling factor of the relative humidity of air (Eq.A34); and W max , the maximum plant-available soil moisture.The latter is defined as , where d is rooting depth (in m), and W fc andW wilt are volumetric water content at field capacity and wilting point, respectively (in m 3 water per m 3 wet soil).Parameters 19-23 were introduced by Knorr and Heimann (2001) and are parameterised as in that study, with a relative uncertainty of 10 %.
For W max , we assume a prior value of 104 mm derived from Patterson (1990).A decreasing value of W max causes a decline of relative soil wetness, leading to decreased transpiration in BETHY.The values of W max strictly determine the area average across PFTs, while the PFT specific values are determined such that the average W max of all grass PFTs is 30 % of the average W max of all tree (or shrub) PFTs.In this study, PFT2 (tree) and PFT10 (C4 grass) cover a fraction of f 2 = 0.7 and f 10 = 0.3 of ground surface, respectively.From this condition, we obtain W max (tree) = W max /(f 2 + 0.3f 10 ) = 1.27W max , and W max (grass) = W max /(f 10 + f 2 /0.3) = 0.38 W max .Further information on parameters is given in the Supplement.

Experimental set-up
To investigate the impact of multiple data streams on the assimilation results, we perform three assimilation experiments: (1) assimilating only LHF, (2) assimilating only FA-PAR data, and (3) assimilating LHF and FAPAR data simultaneously (denoted combined assimilation).In addition, we run a prior simulation of the model with prior parameter values and no data assimilation.The assimilation experiment with only LHF data considers the first and second terms in Eq. ( 1), the one with only FAPAR considers the first and third terms, and combined assimilation all three terms.

Optimization and parameter uncertainty
The optimizations took 38, 43, and 41 iterations to converge to a minimum for LHF, FAPAR and combined assimilation, respectively.The total value of the cost function was reduced by a factor of between 1.1 and 5.7, while the norm of the gradient of the cost function was reduced by many orders of magnitude to a final value close to zero for all experiments (see Table 2).
The main metric to measure the impact of the observational constraint provided by the assimilated data stream is the relative uncertainty reduction.It is defined as 1−σ posterior /σ prior , where σ is one standard deviation of the respective parameter uncertainty before or after assimilation.It is shown in Table 1 and Fig. 2 along with prior parameter values and uncertainties.
Relative parameter uncertainties are reduced by more than 20 % for three, four, and six of the 24 parameters for LHF, FAPAR and combined assimilation, respectively.In general, www.biogeosciences.net/10/789/2013/Biogeosciences, 10, 789-802, 2013 parameters showing considerable uncertainty reductions are the maximum catalytic capacity of rubisco (V max 25 ; parameters 1 and 2), the expected length of drought periods tolerated before leaf shedding (τ W ; parameters 17 and 18), the standard ratio of CO 2 concentration inside and outside the leaf tissues for C3 plants (f Ci C3 ; parameter 19), and the maximum plant-available soil moisture (W max : parameter 24).These six parameters also change considerably during assimilation (Table 1).
Which parameters show high uncertainty reduction differs only slightly among the experiments (Table 1 and Fig. 2).However, the largest uncertainty reductions always occur for the combined assimilation where we simultaneously assimilate both data streams.While the water balance-related parameters W max and τ W are strongly constrained even by one data stream alone, V max 25 for PFT2 and ξ require the combination of both to be constrained relative to their prior.For V max 25 (PFT10) and f Ci C3 , we find that LHF delivers some constraint, but the addition of FAPAR increases the constraint considerably, even though FAPAR assimilation alone delivers almost no uncertainty reduction.
Notably, parameter 24 (W max ) changes substantially from a prior value of 104 mm upward to 323 mm for LHF assimilation, and downward to 58 mm for FAPAR assimilation, while for combined assimilation the posterior value of 129 mm is close to the prior.Some parameters, for which the uncertainty reduction is less than 20 %, also show relatively large deviations from their prior parameter value (Table 1).These are parameters related to the activation energies and Michaelis-Menten constants of the temperature dependency of enzyme kinetics, E V max and K C 25 , as well as the efficiency of electron transport, α q , for C3 vegetation, and the linear growth constant in LAI, ξ .For the other 14 parameters, both the posterior value and the uncertainty hardly change compared to the respective prior values.
The three posterior uncertainty covariance matrices contain additional valuable information about which parameters are constrained simultaneously by which data stream.If we express the matrix in terms of correlations, i.e. normalise by standard deviations, the parameter which shows the highest absolute correlations with other parameters is W max .These are shown in Table 3 as R p i, j .A positive uncertainty correlation of two parameters means that if we underestimate one parameter, we are likely to underestimate the other parameter, too, and vice versa.This means we have a strong constraint on the difference of the two parameters but not on their sum.Negative uncertainty correlation is associated with a strong constraint on their sum but a weak constraint on their difference.
A positive uncertainty correlation (> 0.25) is found for W max against V max 25 PFT10 for LHF, and against V max 25 PFT2 for FAPAR assimilation, but a negative one (< 0.25) against V max 25 PFT10 for the combined assimilation.In the combined case, W max is also strongly uncertainty correlated with τ W PFT10, while for the assimilation of single data stream, it is more strongly correlated to τ W PFT2 than to τ W PFT10. W max and f Ci C3 are also co-constrained with a positive correlation for the single data stream cases, but not for the combined case.The only additional cases with absolute values above 0.25 are R p (τ W PFT2, f Ci C3 ) = −0.52,R p (τ W PFT2, V max 25 PFT2) = − 0.27 and R p (τ W PFT10, V max 25 PFT10) = −0.30for LHF assimilation, and R p (f Ci C3 , V max 25 PFT2) = − 0.95, R p (τ W PFT10, V max 25 PFT10) = 0.31 and R p (τ W PFT10, V max 25 PFT2) = − 0.25 for the combined assimilation.

LHF and FAPAR
As a comparison to measurements reveals (Fig. 3), the prior simulation considerably underestimates LHF during the two wet seasons (except for some scattered points between November and April), but has slightly higher values than the observations during the dry period.LHF and combined assimilation cases show a reasonable seasonality of LHF with high values in the wet season, gradually declining during the dry season (April to October), with slightly lower values than the observations.However, FAPAR assimilation yields lower LHF values than the observations over almost the entire simulation period, although with some scattered high values in the wet season.This results in the highest root mean square error (RMSE) for FAPAR assimilation (27.3 Wm −2 ) compared to 14.3 and 23.5 Wm −2 for LHF and combined assimilation, respectively.
Prior FAPAR values show a high value ranging up to 0.69 during spring and summer (Fig. 4); however, the observations show much lower values with a distinct seasonality ranging between 0.11 and 0.39.After assimilation of LHF, the modeled FAPAR values are too high, ranging between 0.65 to 0.95 (RMSE 0.67, see Fig. 4) when compared to the observations.Modeled FAPAR values after FAPAR assimilation show excellent agreement with the observations (RMSE 0.06), with the simulated FAPAR values falling within the uncertainty range of the observed values over almost the entire period.For the combined assimilation, the modeled FAPAR values have a distinct seasonality with values larger than the observations during the wet period, but showing a good agreement with the observations at the end of the dry period in October (RMSE 0.20).

Simulation of carbon fluxes
A comparison with data of gross primary production (GPP) derived from the eddy covariance data is shown in Fig. 5.The LHF assimilation case gives the highest GPP among the three experiments, which is mainly an effect of the simulated large LAI values.FAPAR assimilation results in much lower simulated GPP than the observations, while matching satellitederived FAPAR rather well (Fig. 4).
Apart from a good fit of LHF for the combined case, the simulated GPP also shows a moderately good fit in seasonality, which is reflected in a lower RMSE of 22 Wm −2 compared to the FAPAR assimilation case (27 Wm −2 ).All in all, the LHF assimilation case gives the best agreement with eddy covariance-derived GPP.We must note, however, that LHF

Sensitivity analysis of the impact of prior W max values on the optimization
Because W max shows the highest relative uncertainty reduction of up to 95 % (Table 1, and Fig. 2), we investigate how much the choice of the prior mean and uncertainty of this parameter impacts the assimilation results.Little is known about W max for the Maun site, which is why we have assigned it a large prior uncertainty (100 % of prior mean value).In a series of sensitivity experiments, we assimilated both LHF and FAPAR observations with four different combinations of prior mean and uncertainty for W max .The prior mean values are 1, 1, 3 and 10 times, and uncertainties are 0.5, 1, 2 and 10 times the standard case (see Table 4).All four data assimilation experiments yield very similar posterior values of W max , despite large variations in priors.Also, relative uncertainty reduction of W max is large for all cases (Table 5), but least when the prior error is smallest (Experiment D), as expected.The pattern of uncertainty reduction across all parameters is also very similar between the four experiments (Table 5).The biggest difference here is for τ W PFT10, for which Experiment A shows a markedly higher uncertainty reduction.The posterior values of the other parameters also vary by less than 2 % between the sensitivity runs, except τ W PFT2 and 10 for Experiment D (102 and 12, respectively, against 94 and 14 for the other cases).Posterior maximum LAI and the components of the cost function (total, parameter, LHF and FAPAR part) also all lie within a narrow range among the experiments, as does the RMSE for LHF, FAPAR and GPP (Table 4).This suggests that the optimum solution we found in the default experiment seems to be robust against the choice of prior W max values.The most notable finding is that the agreement with measured GPP is best for Experiment D, which has a reduced prior uncertainty.

Performance of model and assimilation scheme
Compared to the observed annual maximum in LAI between 0.9 and 1.3 (Mantlana, 2002;Veenendaal et al., 2008), FA-PAR and combined assimilation lead to reasonable simulated annual maximum LAI values of 1.0 and 2.4, respectively.However, LHF assimilation gives a value of 5.2, which is at the environmental limit given by the parameter value for max (5.2, see Table 1) and is clearly too high.The optimal value of W max is found between 58 to 323 mm, depending on the data stream assimilated (Table 1).W max can be converted to rooting depth by dividing by the difference between volumetric soil moisture content at field capacity and wilting point, i.e. by the volumetric plantavailable soil moisture.Here, we assume a plant-available soil moisture of 0.12 m 3 m −3 , based on the World Inventory of Soil Emission Potentials 2.1 data base (Batjes, 1995), for the Maun site and obtain values for rooting depth of 2.7, 0.5 and 1.1 m for LHF, FAPAR and combined assimilation, respectively.The value for the combined case (1.1 m) is close to the reported rooting depth in such dry conditions.Schenk  and Jackson (2002) suggested that dry tropical savannas have on average a rooting depth of 1.4 m containing 95 % of the total ecosystem roots.Veenendaal et al. (2008) showed that the tall and short mopane trees at the Maun site rooted at least 1.0 m deep.However, they also indicated that the total root density of both mopane types as well as the fine root density of short mopane were concentrated in the upper soil fraction up to 0.2 m depth.However, we must note that the inferred rooting depth is the maximum depth from which trees obtain water throughout both wet and dry season.Therefore, a rooting depth of 2.7 m as inferred by LHF assimilation seems to be at least possible, even though higher than commonly assumed.The estimates of the other two assimilation cases are found to be consistent with observations.Even though we do not give an explicit error margin for observed GPP, we would expect at least the same relative uncertainty (around 25 %) as for LHF, and a minimum uncertainty of 1-2 g C m −2 day −1 given the negative outliers shown in Fig. 5.With this assumption, we find that all simulations give reasonable GPP values, but that the LHF assimilation clearly leads to the best agreement.
We generally find that the combined assimilation lies in between the two separate assimilation cases: While LHF assimilation increases FAPAR and LHF, FAPAR assimilation decreases both quantities, and for the combined case simulated FAPAR, LHF and also GPP are adjusted to a an intermediate state, which happens to be rather close to the prior.
Considering the observational uncertainties, we find that both the LHF-assimilated and the combined-assimilated models generally agree with LHF observations, but that LHF is underestimated by the FAPAR-assimilated model.The opposite applies to the simulation of FAPAR: The LHFassimilated model clearly overestimates FAPAR compared to the observations throughout the entire assimilation period, effectively simulating a semi-evergreen ecosystem (Fig. 4).At the same time, the model is capable of matching observed FAPAR after assimilating this quantity, but at the cost of reducing LHF and GPP somewhat below observations (Figs. 3  and 4).The reduction in these quantities is expected given the reduced LAI required to match a lower FAPAR compared to the LHF assimilation or the prior.We also find that the FAPAR-assimilated model shows a slight delay in the maximum FAPAR against the satellite data and that the combined assimilation still overestimates FAPAR during the wet season.
Since the intermediate state found by the combined assimilation maximizes overall agreement with observations, the assimilation can be considered successful.We have thus demonstrated the capability of CCDAS to assimilate multiple data streams in a technical sense.However, the more interesting question is what can be learned from the way the model is adjusted and from the match or mismatch between model and observations.To match the observed LHF, the model needs to increase its simulated LAI beyond the observational constraint provided by FAPAR.Because CCDAS in calibration mode explores the complete parameter space of BETHY, we know that no combination of parameters is able to fully match both data streams, and that matching the observed LHF is only possible by simultaneously ignoring constraints by observed FAPAR.
From a theoretical standpoint, we expect the time evolution of LHF to closely follow FAPAR.This is because dry-season LHF is dominated by transpiration, transpiration mostly driven by net radiation absorbed by the canopy (McNaughton and Jarvis, 1991), and leaves mostly absorb in the photosynthetically active range.Indeed, observed LHF is reduced approximately proportionally to FAPAR when going from wet to dry season.During the dry season (May-October), observed LHF is still about one quarter of typical values during the wet season, and observed FAPAR somewhat more than a quarter of wet-season values.Examining Fig. 3 reveals that the under-matching LHF for the FAPAR-assimilated case is mainly due to low values during the dry season, and that the model shows a stronger contrast in LHF between the two seasons than the observations.Further model evaluation through assimilation of EC data, for other sites or different years, will be necessary to identify the reason for this difference.
When evaluating the model against observations as described above, we need to keep in mind conceptual issues when producing EC-derived GPP, which respiration accounted for by an empirical model, and possible biases of the EC system indicated by the lack of full energy closure (see Sect. 2.3).Trudinger et al. (2007) indicated that results were biased significantly by temporally correlated and non-Gaussian noise in their intercomparisons of many assimilation methods for a simplified process-based terrestrial model, including the adjoint method used here .Although most assimilation schemes assume, as does CCDAS, uncorrelated and Gaussian distributed errors, eddy covariance data have been shown to be impacted by several types of systematic errors: energy imbalance as mentioned above, underestimation of ecosystem flux under stable atmospheric conditions on calm nights, missing data, and so on.For example, negative observation-based GPP from eddy covariance data in November 2000 (Fig. 5) indicates inadequate data processing in terms of the estimated daytime ecosystem respiration for calculating GPP by subtracting NEE from it or the gap-filling procedure for missing data.Currently, we can not avoid such systematic errors easily.We have, however, considered the size of the energy disclosure as a prior uncertainty for LHF observations as a pragmatic solution.
A further problem may be associated with insufficient cloud screening for the FAPAR product, which would affect observations mainly in the wet season.This needs to be taken into account when interpreting the slight delay between modeled and satellite FAPAR for the FAPAR-assimilation case.A possible wet-season bias, if corrected, would bring the FAPAR-assimilated model closer to the prior or combined case, decreasing the mismatch between observed and model LHF.We further believe that the relatively coarse spatial resolution of the FAPAR product may not always capture the local phenomena inside the footprint of the flux tower.
It is important to note that most of the discrepancies just discussed would not have been detected if either BETHY had been run only with fixed parameters, without the CCDAS calibration step, or only one data stream had been assimilated.Therefore, an important lesson is that the assimilation of multiple data streams in CCDAS, by virtue of exploring the full range of model parameterisations, is a robust method for identifying data biases as well as conceptual limitations of models, and as such clearly superior to the standard approach of model parameterisation and validation.In an earlier study (Knorr et al., 2010), CCDAS was used to explore a range of possible formulations for leaf phenology, with assimilation of FAPAR providing a rigorous test for the combined surface, eco-physiology and phenology model in CCDAS.

Constraint of parameters by eddy covariance LHF and satellite FAPAR data
The most important outcome of this data assimilation exercise is that the biggest constraint delivered by the data streams in question is on the parameter W max .This is not only because this parameter is constrained to a large degree by all assimilation experiments, but also because the maximum water holding capacity of the maximum soil volume that can be reached by the vegetation's root system is a fundamental quantity in these water-limited ecosystems.Since it is not possible to measure this quantity at anything but the plot scale, assimilation of LHF and indirect inference of W max via data assimilation has the potential to deliver estimates at the scale of hundreds of metres.In addition, assimilation of FAPAR vastly expands the potential range to cover whole ecoregions and the globe.
We also find that the inferred value of W max is generally robust against the choice of prior values of the same parameter, but depends on the data stream assimilated.LHF assimilation required high values of LAI that needed to be supported by ample soil moisture, and consequently inferred the highest value of W max .Assimilation of FAPAR, by contrast, yields values more consistent with observed rooting depth.Before the technique could be applied at larger scales, some issues regarding data bias and the simulated latent heat flux under water stress (i.e. during the dry season) need to be clarified.
However, even before these issues are finally resolved, the structure of the posterior error covariance matrix can yield valuable information for the design of such observing systems.First of all, the parameters constrained by either assimilation experiment tend to be the same.Further, W max and τ W have generally large positive uncertainty correlations.This can be explained in the following way: Higher W max will lead to more soil moisture in the dry season, but has the effect of closing stomata further compared to a situation with identical absolute soil moisture but lower W max (Eq.A24).To compensate this effect, which decreases both LHF and FA-PAR, the optimization reduces τ W -recall that τ W represents the expected length of drought periods tolerated before leaf shedding (Knorr et al., 2010).
We also note that LHF assimilation yields information simultaneously on the stomatal control and drought-driven phenology of the savanna trees, but not on both separately.LHF assimilation also can only constrain W max and photosynthetic capacity of the grass canopy together, or photosynthetic capacity and parameters of the drought-limited phenology for both PFTs.FAPAR assimilation further yields a positive uncertainty correlation between photosynthetic capacity and W max , but for the trees not the grasses.It appears that LHF contains more information on grass, and FAPAR on tree functions.The sign of the uncertainty correlation for W max against photosynthetic capacity also varies between FAPAR and LHF on the one hand, and the combined case www.biogeosciences.net/10/789/2013/Biogeosciences, 10, 789-802, 2013 on the other, where the correlation is negative.This indicates that the model operates in a different regime after the combined assimilation compared to the other two cases, where increases in W max and photosynthetic capacity have either opposing or concurrent effects on the agreement with the data streams assimilated.
For the FAPAR assimilation, we find that posterior uncertainties of W max were correlated more strongly with other parameters than for LHF assimilation, with large correlations in particular for photosynthetic capacity of the seasonally green trees, their drought-phenology response and their stomata control.This means that FAPAR resolves fewer processes than LHF.However, the combined case in general has fewer cases of simultaneous constraint of parameters than either of the cases with a single data stream, except for the correlation between W max and τ W for grasses (see above).It seems that the seasonal response of grasses is the dominant factor determining agreement with observations in this case.
Further insights can be gained from considering changes in parameter values from prior to posterior.For the LHF assimilation case, where simulated FAPAR greatly overestimates satellite observations, it is the small τ W value for PFT2 of 10 days which is mainly responsible for the large LAI values.τ W has much higher posterior values of 170 days and 94 days for the other two cases, leading to a more pronounced LAI seasonality and also a lower overall LAI.When setting τ W to such a small value we assume that the plants' water reserves are almost always sufficient for continued plant growth.The high reduction in the relative parameter uncertainty of τ W PFT2 and 10 by more than 29 %, which is also apparent in a previous study with the same phenology scheme assimilating only FAPAR at seven eddy flux sites (Knorr et al., 2010), suggests a strong constraint by the FAPAR observations on the phenology component of BETHY.
The relative uncertainty reduction for parameter V max 25 for the trees (41 %) as well as for parameter ξ (14 %) are substantially larger for the combined experiment compared to the single-data stream cases (see Table 1 and Fig. 2).This suggests that each data stream carries complementary information on photosynthesis and phenology such that the combined assimilation has the apparent strong constraint on specific parameters of plant productivity and leaf phenology.

Conclusions and implications
We present the first study that simultaneously assimilates latent heat fluxes as measured by the eddy covariance technique and satellite-derived FAPAR using variational data assimilation, here for a savanna site at Maun, Botswana.Simulated LHF and FAPAR show a reasonable seasonality for the case of assimilating the two data streams together.The optimization against both data streams leads to an average relative reduction in parameter uncertainty of more than 15 % for the 24 eco-hydrological parameters in CCDAS, compared to between 9 and 6 % for LHF and FAPAR assimilation alone.The important finding that FAPAR is able to constrain hydrological parameters is confirmed by a recent application of CCDAS at the global scale (Kaminski et al., 2012).The authors showed that by assimilating FAPAR, CCDAS was able to constrain not only hydrological parameters, but also estimates of the diagnostic quantities of soil moisture and evapotranspiration.Similar to the present study, the constraint was improved when atmospheric CO 2 concentrations were assimilated simultaneously (in variance to but comparable to the EC fluxes used here).
We thus further demonstrate the potential of multiple-data stream assimilation by a more detailed local case study.From our experience reported in this study, we conclude that the simultaneous assimilation of locally and globally available data streams is an ideal tool for the identification of biases between models and data that can help with the development of suitable bias models for global assimilation exercises (Kaminski et al., 2012).Locally available data can thus improve global-scale assimilation of continuous data streams of FAPAR from satellites, bridging local and global scales and thus furthering the goal of large-scale monitoring of such essential climate variables as root-zone soil moisture.
The approach of simultaneous assimilation of multi-data streams as presented here can be extended to include additional remote sensing products, for example using the surface soil moisture product from the Soil Moisture and Ocean Salinity (SMOS) mission (Kerr et al., 2001).Despite of SMOS's lower spatial resolution (35-50 km), this would allow a rigorous assessment of the consistency of multiple data streams, as done here for FAPAR and LHF, but for data streams available with global coverage from remote sensing.More importantly, the combined assimilation of FAPAR data with surface soil moisture from SMOS in CCDAS would lead to a more complete description of the hydrological properties, due to the sensitivity of BETHY-simulated FAPAR to soil moisture in the entire root zone, not only in the nearsurface layer.

Fig. 1 .
Fig. 1.Schematic diagram of the CCDAS structure.Ovals represent input and output data, and boxes represent calculation steps.Diagnostics are quantities of interest such as carbon fluxes computed by CCDAS."unc."stands for uncertainty and "param."for parameters.

Fig. 4 .
Fig. 4. Observed and simulated fraction of absorbed photosynthetically active radiation (FAPAR) for the years 2000-2001, at 10-day intervals.Observed FAPAR is derived from the SeaWiFS instrument of the National Aeronautics and Space Administration (NASA).The error bar of observed FAPAR (Obs) represent the data uncertainty used in CCDAS."Prior" is the unconstrained simulation; "LHF", "FAPAR" and "Combined" denote which data streams were used when assimilating.Root mean square errors (RMSEs) of simulated FAPAR against observations are 0.195, 0.675, 0.056 and 0.202 for Prior, LHF, FAPAR and Combined, respectively.

Fig. 5 .
Fig. 5. Daily observed and simulated gross primary production (GPP) for the years 2000-2001.Observed GPP (Obs), based on eddy covariance data, is estimated by subtracting net ecosystem exchange (NEE) from ecosystem respiration during daytime.Daytime ecosystem respiration is estimated from the relationship between night-time ecosystem respiration and soil temperature.Root mean square errors (RMSEs) of simulated GPP against observation-based values are 1.72, 1.24, 1.87 and 1.45 g C m −2 day −1 for prior, LHF, FAPAR and combined, respectively.

Table 1 .
List of parameters in prior run and posterior runs assimilating LHF, FAPAR, and the combination of LHF and FAPAR.Uncertainty reduction (Unc.red.) is calculated as posterior minus prior uncertainty divided by prior uncertainty.Top rows: physiology; middle: phenology; bottom: energy and water budgets.Units of parameters are V max in µmol (CO 2 ) m −2 s −1 ; k in µmol (air) m −2 s −1 ; α , T in µmol (CO 2 ) mol (air) −1o C −1 ; K C in µmol (CO 2 ) mol (air) −1 ; K O in mol (O 2 ) mol (air) −1 ; activation energies E in J mol −1 , τ W in days; C W 0 in mm h −1 , W max in mm; and others are unitless.Prior uncertainty represents one standard deviation, except for the log-normally distributed parameters denoted by ( * ), for which the analogous difference between mean and upper 67 percentile is given.

Table 2 .
Cost function contributions from parameters (Param.),latent heat flux (LHF), and fraction of absorbed photosynthetically active radiation (FAPAR), as well as the total cost function value and the norm of the gradient.
* Not counted for total cost function.

Table 3 .
Error correlation coefficient of W max to each model parameter under optimization process.

Table 4 .
Sensitivity analysis on the effect by a variety of prior means and uncertainties in W max to simulation results.Runs are conducted with assimilating both LHF and FAPAR simultaneously.Max LAI shows maximum leaf area index in monthly averaged value for two simulation years.C denotes the standard case.

Table 5 .
Uncertainty reduction for model parameters for the sensitivity experiments on the effect by prior W max values.