Interactive comment on “ Alternative methods to predict actual evapotranspiration illustrate the importance of accounting for phenology – Part 2 : The event driven phenology model ” by V . Kovalskyy

P5338 l. 29. Phenology was modelled interactively even in global models much before Pitman (2003), e.g. Lüdeke et al. (1994) and Kaduk and Heiman (1996a), and effects were explored (e.g. Kaduk and Heiman, (1996b) and Kindemann et al., (1996)). I suggest that if the authors wish to explore the historical development of phonological modelling, then they need to also consider earlier work much more comprehensively. Otherwise, I would recommend to rewrite this part such that it explains and contrasts the different approaches not attempting a historical perspective.


Introduction
Water flux from the land surface to the atmosphere from evaporation and transpiration is a key variable that describes the surface climate and links it to the functioning of ecosystems.ET is characterized by the volume of liquid water transformed into water vapor and by the energy (latent heat, LE) spent to effect this phase transition.On a global level, ET accounts for approximately 62 × 10 12 m 3 of water per year (Peixoto and Oort, 1992), but this volume is distributed unevenly in space and time.Over vegetated surfaces the number of factors like the precipitation regime, fraction vegetation cover, and changing canopy structure interact to greatly complicate the ET estimation.The UN Food and Agriculture Organization (FAO) recommended using the Penman-Monteith model (Monteith, 1965) to estimate "reference" evapotranspiration (ET 0 ) on croplands (Allen et al., 1998).The concept of ET 0 has been used extensively in agriculture since it can mimic the ET dynamics over cereal crops with fully developed canopies.Actual evapotranspiration (ET a ), however, poses a great challenge for monitoring and even a greater one for prediction due to the high variability in environmental conditions observed across the land surface (Kalma et al., 2008).
Researchers have developed numerous approaches to retrieve ET a .For flux tower data, the eddy covariance method relates rapid fluctuations in water vapor density to ET a (Suyker and Verma, 2009).Yet, point-based estimates do not capture the spatiotemporal variability of evapotranspiration, even in feasibly dense networks (Kalma et al., 2008).Remote sensing provides means to achieve a better Published by Copernicus Publications on behalf of the European Geosciences Union.
V. Kovalskyy and G. M. Henebry: The event driven phenology model estimation of actual ET in the spatially explicit manner.Monitoring of ET a is based on retrievals of land surface temperature, which closely follows the sensible heat flux at the land surface.The ET a is then derived from the energy balance equations (Kustas and Anderson, 2009).Also, using surface energy balance (Bastiaanssen et al., 1998;Allen et al., 2005;Su et al., 2005;Senay et al., 2007;Mu et al., 2007;Allen et al., 2007) or water balance (Verdin et al., 2002;Senay and Verdin, 2003), ET a models have had varying degrees of success in addressing spatiotemporal variation.Several attempts were made to use empirical machine learning techniques for ET modeling (Yang et al., 2006;Kaheil et al., 2008;Jung et al., 2010).However, these approaches were designed for monitoring purposes, only to be used in retrospective data analysis and hence offer little for prediction.
The principal challenge in remote estimation of ET a is to capture temporal canopy dynamics (Cleugh et al., 2007;Mu et al., 2007;Godfrey et al., 2007;Senay, 2008, Weiß andMenzel, 2008).Temporal changes in canopy are determined by the phenological development in specific vegetation types (Suyker and Verma, 2009) and therefore cannot be well represented by a model constant.Varying in space and in time, dynamics of canopy properties often correlates with weather variables, posing a challenge for modelers to separate their influences on canopy resistance to transpiration.The various trajectories of canopy dynamics composed possibly of multiple species within a limited area constitute a complex object of land surface phenology (LSP).Land surface phenology studies the spatiotemporal development of the vegetated land surface using remote sensing (de Beurs and Henebry, 2004), and sometimes called "remote sensing phenology" (Morisette et al., 2009).Several pioneering studies in land surface phenology (de Beurs and Henebry, 2004;Reed, 2006;Zhang et al., 2007;Stöckli et al., 2008;Xiao et al., 2009) point to the need to move beyond the conventional representation of LSPs as static trajectories of vegetation cover properties with negligible response to changing weather conditions.
Traditionally, hydrological models have used just one coefficient to represent canopy factor where the value of the coefficient stayed the same for the whole season (e.g., Manabe 1969;Weiß and Menzel, 2008).More recent land surface models (LSM) with hydrology modules typically use static climatologies (seasonal trajectories averaged over multiple years) of canopy parameters (Mitchell et al., 2004;Montaldo et al., 2005;Lawrence and Chase, 2007;Senay, 2008) for the estimation of ET a and other land surface fluxes.This approach is employed in a number of models, including MO-SAIC (Koster and Suarez, 1996), SAC (Koren et al., 2004), Noah LSM (Ek et al., 2003), MIROC (Hasumi and Emori, 2004), and many other LSMs.In smaller scale studies the progression sometimes simply runs as a curve fitted into prior observations (Montaldo et al., 2005) to represent phenology as a function of time.Despite their robustness, the static climatologies and time functions also introduce errors by ignor-ing interannual phenological variability and transients due to abrupt weather events (Milly et al., 2008;Wegehenkel, 2009).
An interactive approach to phenology modeling was first introduced in applied plant growth models (Pitman, 2003).Lüdeke et al. (1994), Kaduk andHeiman (1996a), andKindemann et al. (1996) developed basic interactive phenology modules and applied them in global terrestrial carbon cycle modeling (Kaduk and Heiman 1996b).Relying on proxy variables such as thermal time, duration of daylight, or accumulated precipitation, interactive phenology modules determine the start, end, and duration of the growing season using empirical thresholds (Dickinson et al., 1998;Foley et al., 2000;Reed et al., 2003).In some vegetation models, developers went beyond just dates and linked seasonal dynamics of leaf area index (LAI) to thermal time based on plant thermal response functions (Neitsch et al., 2002;Bondeau et al., 2007;Rötzer et al., 2010).In other works researchers started using multiple factors simultaneously to derive phenological trajectories (Jolly et al., 2005;Setiyono et al., 2007;Stöckli et al., 2008).Finally, the interactive approach has been extended to include the concept of event drivers with the first successful trials reported in the companion paper (Kovalskyy and Henebry, 2011).This concept stands apart from all traditional models that have either air temperature, insolation, precipitation, or other weather variables acting as a single continues factor determining only the phenological timing or only the shape of seasonal canopy trajectory.The event driven concept uses continuous weather factors to estimate phenological timing while further transforming them into discrete events -triggers of change in daily canopy dynamics.Hence, daily insolation, daily thermal time, precipitation, freezing temperatures, and heat stress can simultaneously contribute to timing and shape of phenological trajectories.The EDPM can simulate daily canopy dynamics from the actual weather data and, thus, it has the potential to replace climatologies in LSMs that still rely on a static approach to phenology.
In this paper we compare and contrast both static and interactive approaches to the modeling of land surface phenologies.The LSPs representing both approaches are evaluated via parameterizing simplified model of actual evapotranspiration -VegET -that is currently used operationally in the Famine Early Warning System (FEWS NET).We use the original implementation of the VegET model parameterized with static NDVI climatologies as a starting point.We then replace the static parameterization with (1) contemporaneous NDVI time series to server as a reference and, alternatively, (2) vegetation index (VI) trajectories produced by the interactive EDPM.Within this experiment, VegET with alternative phenological parameterizations produced daily ET a values during the growing season for maize, soybean, and grassland canopies.We compare each modeled ET a outcome with the best available references -ET measured at flux towers -to characterize performance of the interactive EDPM in the coupled scheme relative to typical solutions coming from climatologies.Specifically, the study aims to answer the following questions: (1) How does the interactive phenology differ from the static phenology?(2) If there are differences, then when and where are results from the interactive phenology significantly different from the static phenology?Analytical procedures used to answer these questions are described in detail in Sect.2.6 that provides the roadmap for the analyses we used.

Evapotranspiration model
VegET is a recent development in ET modeling; it uses water balance principles and remote sensing data to drive the evapotranspiration process (Senay, 2008).The model is simple and flexible enough to provide framework for our analysis.VegET uses the standard Penman-Monteith equation to address the influence of climatic factors within a single time step in one location for one vegetation type.Transition to a different set of vegetation and soil conditions is effected through two coefficients capturing canopy dynamics and ground water regime: where K s is a soil moisture stress coefficient computed from daily water balance (2) and K cp is a plant coefficient driven by phenology and distinct from K c , the traditional stage standardized crop coefficient recommended by the FAO (Allen et al., 1998). if where SW i is soil water content at the current step, MAD is the Maximum Allowable Depletion level.The rationale for using K cp instead of K c comes from multiple observations of linear relationships between K c and VIs (e.g., Hunsaker et al., 2003;Tasumi et al., 2005).In evaluating VegET performance, Senay (2008) used very simple transformations from NDVI to K cp based on the thresholds and observed variability of the NDVI derived from AVHRR data.Despite the coarse resolution of the sensor (1 km pixels), results using K cp showed improvement in sensitivity to canopy dynamics compared to K c and remarkable performance in capturing the actual ET (Senay, 2008): Pearson correlation coefficients of 0.87 and 0.88 for flux towers in South Dakota and Arizona, respectively.

Representation of phenologies
In his original paper, Senay (2008) derived K cp from long term averages of NDVI from AVHRR.Smoothed with the temporal three-point moving average filter, the resulting curve is an NDVI climatology that is presumed to produce minimal errors over the long term.A set of empirically derived thresholds was used to mark the beginning and end of the growing seasons.We saw several limitations to this approach.First, the NDVI is not consistent across AVHRR, MODIS, and other synoptic sensors due to differences in sensor spectral bandwidth and band placement (van Leeuwen et al., 2006;Kovalskyy et al., 2011b).These sensor differences may cause discrepancies in derived K cp , but the significance of these has yet to be determined.Therefore, it was important for this study to assess the sensitivity of the model to the NDVI derived from AVHRR versus MODIS sensors.Second, the VegET relies on expert knowledge about the seasonal NDVI dynamics and on published maximum and minimum values of K cp for a given vegetation type.This approach worked empirically, but for a potential improvement it is possible to invert (1) and estimate K cp from flux towers.Therefore, we examined closely the relationship between the vegetation index and the phenologically forced coefficient.
To deliver a better solution for K cp derivation, we use our newly-developed Event Driven Phenology Model (EDPM; Kovalskyy and Henebry, 2011) as an interactive alternative to the long-term averages used previously with VegET (Senay, 2008;Senay et al., 2009).The EDPM is data driven, but instead of a historical record of satellite observations, it incorporates flux tower observations with sequence modeling to simulate daily dynamics of a modeled variable (e.g., a vegetation index), depending on the phase of canopy development.The EDPM treats daily forcings as transient "events" that can potentially modify trajectories of canopy development.From collections of such events the model builds the phenological trajectories at daily steps.The EDPM has been successfully tested on flux-tower derived normalized difference vegetation index (TNDVI; Wittich and Kraft, 2008).To generate an LSP, the model represented the TNDVI value in the next step as conditioned on the current value, with ongoing events potentially modifying the current TNDVI value with change slope E as follows: where TNDVI is the vegetation index value, E is the stepchange coefficient (or slope produced by events), and t is the time step index.Detailed description of how the EDPM works is given in the companion paper (Kovalskyy and Henebry, 2011).
In this experiment, the EDPM used six kinds of forcings that can manifest as events during the growing season: (1) rain, (2) heat stress, (3) frost, (4) insufficient insolation, (5) adequate insolation, and (6) growing degree days.The impacts of these events depend on the vegetation type and ongoing phenophase.The internal phenological phase control module is responsible for autonomous estimation of key growing season dates.

Study sites
The study sites included rain-fed maize and soybean fields and grasslands located within the "corn belt" and "soy belt" of the central United States.Climatic particularities and the geographic settings of the belts produce strong ET gradients.
The northern tier has only 600 mm ET annually; whereas, at the southern end, the annual ET can reach 1000 mm.Maize and soybean are the two most prevalent crops across the region.For that reason, we chose two sites from the AmeriFlux network to represent croplands at the extremes of the region: Bondville, Illinois, to the east and Mead, Nebraska, to the west.A similar strategy was used for grasslands: the Fermi site in Illinois represented humid grassland and the Brookings site in South Dakota represented subhumid grassland.(We did not include in this study a site representing the arid end of the grassland spectrum.)We presumed that the responses of the grasses at each location were sufficiently similar -all "spring-green" -so as not to require different types of phenological patterns.

Data sources
The experiment devised for this study required microclimatological records from flux tower sites as well as satellite observed canopy states for the locations.Level 2 flux tower data were downloaded from http://ameriflux.ornl.gov/;specifically, the energy fluxes, microclimate records, and soil moisture for rain-fed agricultural sites and grassland sites.
After checking the data quality (by examining the consistency of records), we selected twelve growing seasons for the experiment (Table 1).

Data preparation
Estimation of the daily actual evapotranspiration with VegET model required us to calculate the reference evapotranspiration and the soil water stress.However, before the calculations were made, we had to reprocess the hourly records of each variable into daily time series of 2 m air temperature (daily sum).The reference evapotranspiration was calculated using the Penman-Monteith equation (Monteith, 1965).We used the AmeriFlux site descriptions to obtain values for soil permanent wilting point and water holding capacity at each site.Finally, we used descriptions of crops (Nielsen, 2002;Setiyono et al., 2007) and grasses (Henebry, 2003;Henebry, 2010) to obtain rooting depth profiles and critical soil water depletion levels.Based on these data we calculated the SW within the root layer.Daily dynamics of soil water stress coefficient K s were derived from SW records via (2) and stored along with daily ET 0 as common inputs for calculations of multiple estimates of actual evapotranspiration.
Next step was the calculation of K cp trajectories from satellite data.Climatologies from MODIS NDVI and AVHRR NDVI time series were transformed into the K cp coefficient for VegET model as specified in Senay (2008) to represent static LSP modeling approach.The same transformation method was used on the contemporaneous MODIS NDVI time series representing observations of the canopy condition during the modeled growing seasons.The 8 day composite values of contemporaneous NDVI were linearly connected to make up daily time series.These contemporaneous time series served as a benchmark for the VegET model performance since with K cp derived from contemporaneous observations the ET a estimation becomes retrospective.Such parameter coefficients should, in theory, produce better model outcomes despite gaps due to cloud cover.Comparing other predicted ET a against retrospective estimates should give an idea of how closely the two approaches to phenological predictions match with the best performance of the VegET model.
Representing the interactive LSP modeling approach, the EDPM produced the phenological forcings by simulating seasonal trajectories of TNDVI (see Kovalskyy and Henebry, 2011).Transformation to K cp was affected through the linear relationships between the observed TNDVI and K cp obtained from inverting ET a and soil moisture data from flux towers (Fig. 1a, b), yielding slopes of 1.22 for maize-soy cropland and 1.38 for mesic grassland.Both relationships retained substantial noise (RMSE of 0.23 and 0.34, respectively).However, the residuals at grassland sites were found to correlate well with vapor pressure deficit.Therefore, we used the polynomial fit (Fig. 1c) to model the residuals.When included into the TNDVI -K cp transformation process, the modeled residuals helped to reduce dramatically the spread of errors around the linear fit (RMSE = 0.26; Fig. 1d).Therefore, modeled residuals were used to transform the EDPM derived TNDVI into the K cp at the Fermi and Brookings sites resulting in specific pattern of K cp parameters for grassland in Fig. 2.
Figure 2 shows all combinations of VegET parameterization by vegetation factor K cp .
The parameter sets shown in Fig. 2 were organized as input feeds to the VegET to produce four alternative sets of evapotranspiration estimates: (1) ET-ED where the ET was obtained with the VegET parameterized by K cp from the EDPM2) ET-CA where the ET was derived with the VegET driven by K cp based on AVHRR climatologies; (3) ET-CM where the ET was derived with the VegET driven by K cp based on MODIS climatologies; and (4) ET-OB where the ET was derived with the VegET driven by K cp produced from retrospective MODIS NDVI time series contemporary with the modeled seasons.In addition, our analysis retained the most commonly used alternative: the reference evapotranspiration ET 0 (ET-PM).We report these results in Appendix A. In total we examined four ET a estimates that accounted for phenology and ET 0 that simply used the Penman-Monteith equation (Monteith, 1965) without accounting for seasonal changes in canopy conditions.
Five distinct sets of ET estimates were compared (1) against flux tower observations, and (2) among one another to see the difference across considered LSP modeling approaches.When making the comparisons, we were aware of variable footprint dynamics in eddy covariance records, footprint size differences between spaceborne sensors and flux tower instrumentation, geo-location issues related to remotely sensed products, spectral differences of the AVHRR and MODIS instruments, and many other sources of noise and uncertainty.All these issues can create deviations in the flux tower records as well as in the remotely sensed data; consequently, these deviations can also appear in model output because they were embedded into the data used for model inputs.Yet here, we follow in the steps of many others (Nagler et al., 2005;Cleugh et al., 2007;Mu et al., 2007;Senay 2007Senay , 2008;;Zhang et al., 2009)    to calibrate, refine, and validate their models.Thereby, this experiment gives a picture of the relative differences between six realizations of phenological forcings on VegET predictions.

Roadmap for analysis
We selected four procedures to evaluate the predictive performance of the alternative parameterizations of VegET: ( To analyze the differences in performance between the four alternative LSP parameterizations of the VegET as well as ET 0 , we first focus on analyzing the distributions of residuals (DOR).Here, the shape and location of the distribution relative to zero deviation describes the precision and accuracy of model output.Out of the six sets of results, the parameterization that produces an average difference from observations closest to zero is preferred.Also, analysis of residuals allows using the mean root square error (RMSE) as a metric of error spread.Lower RMSE means that the parameters from a particular source produce outcomes with higher accuracy.A better performing phenology representation yields a narrow symmetrical residual distribution centered on zero.We used Student's t-test to evaluate whether each residual distribution was significantly different from zero.However, since the t-test here could not take into account differences in variance, we needed to complement this analysis with the Kolmogorov-Smirnov two-sample test to refine distinctions amongst phenological parameterizations in the predictions made by the VegET.Given the number of the alternative ET a estimates, we had ten pairs for comparison and so, to reduce the risk of a Type I error, we used the Dunn-Sídak procedure to adjust the critical p-value in multiple comparisons (de Beurs and Henebry, 2005).
To assess differential performance across three broad phenophases (green-up, reproduction, and senescence), we used a simple nonparametric score F to show the chances of one model to produce a better estimate than another.The timing of growing season and three phenological phases was extracted from observed TNDVI dynamics for each crop/vegetation type following the approach of Viña et al. (2004).The scoring procedure assigned scores to ET a estimates based on residuals: given a pair of ET estimates and a specific observation, the estimate with smaller absolute residual would earn the score of 1 and the one with larger difference with observation would receive the score of 0. The total score, whether for a specific phenophase or the whole season, was calculated as follows: where n is the number of considered pairs of absolute deviations and f is the score (1 or 0 depending on whether the deviation is less than the reference deviation).A similar scoring approach is used in internal workings of K-S test (Press et al., 1986).This technique also aimed to highlight the temporal consistency in accuracy of different ET a estimates.Finally, we needed a measure of VegET performance that could summarize pros and cons of the different parameterizations.We were specifically interested in the ability of the alternative parameterizations to estimate (1) the total seasonal evapotranspiration and (2) the duration of season in days.The use of K cp time series derived from either MODIS or AVHRR climatologies implies fixed growing season trajectories and dates for all sites/locations, regardless of vegetation type.Unlike climatologies, the EDPM can simulate seasonal trajectories of K cp for individual crops and grassland also producing phenological transition dates such as start and end of season.We analyzed the differences between estimated and observed lengths of seasons as well as total seasonal ET to identify the better performer.However, the number of seasons considered for each crop (3 crops) or location (4 locations) was too few to draw statistically reliable inferences separately for each group out of the total 12 seasons.Therefore, we have included figures for each crop type and each parameterization source to illustrate how both the modeled total seasonal ET and the modeled growing season length differed from observations.
All four aspects of model performance were independently analyzed in the context of vegetation type and locations.Geographic differences in evapotranspiration regimes and the magnitude of daily ET a values between locations gave us another reason not to compare the distribution of estimates, but to examine the residuals instead.Vegetation type is another crucial factor potentially affecting parameterizations of VegET since crops and grassland differ dramatically in terms of phenology (Henebry, 2010).Therefore, we present differences in model performance stratified by location and by vegetation type.

Detecting bias in the outcomes
We calculated the differences from the observations mated ET a -observed ET a ) for estimated ET a derived the four main LSP parameterizations.Figure 3 displays the histograms of the DORs for each version of estimated ET a by vegetation type (A) by location (B).Central tendencies of obtained DORs are shown as the dotted lines in Fig. 3.The top row shows from EDPM forcing, which exhibits low bias in each realization having dotted lines close to 0. In contrast, the DORs for crops produced by static forcings (rows 2 and 3) clearly show overestimation bias.Slightly better alignments can be observed in row 4, show DORs produced by retrospective MODIS NDVI parameterization of VegET.The grassland exhibit bias for both climatologies, but also low accuracy resulting from high variability.
In support of this visual assessment of the DORs, the statistical analyses confirm significant in the estimates of ET produced by the four of VegET (Tables 2a  and b).However, both tables show that the t-scores were consistently lower residuals coming from the EDPM (Table 2a) which is evident of less significant bias coming from this interactive model.In Brookings, South Dakota, DOR from EDPM was not significantly different from reference, most probably due to wide spread of residuals and because of good accuracy the results.In Table 2b the situation for climatologies, with only the South Dakota grassland site showing non-significant bias in satellite-derived parameterizations.Yet, the biases from the EDPM forcings seen in this experiment were smaller than those produced by climatologies of canopy parameters (AVHRR and MODIS).Also in most cases, the EDPM outcomes had smaller bias than the VegET parameterized with contemporaneous observations of MODIS NDVI expected to be a reference of a better VegET performance.

Contrasting the distributions of residuals from the four sets of ET a estimates
The distinctions between biases were captured by the Kolmogorov-Smirnov tests that looked at the divergences between the entire DORs, not just the means.With Dunn-Sidak procedure, we adjusted the critical level of p-value to 0.0016, thereby keeping the overall probability of Type I error below 0.01.Detected differences were organized into diagrams (Fig. 4) showing exhaustive pairwise comparisons of residuals structured by vegetation type and locations.Across the vegetation types (top row of Fig. 4) the EDPM parameterizations stood apart from every other phenological parameterization.Similar situation appeared where residuals were stratified by location, except for the Brookings site.At that location the use of EDPM yielded a DOR that could not be distinguished from DORs coming from other parameterizations.The situation at Brookings was unique since the VegET produced no substantial bias regardless of phenological parameterization (Table 2b).Only the Penman-Monteith model substantially overestimated ET a at Brookings tower (Table A2, Appendix A).
The performance of VegET parameterized with long time averaged MODIS and AVHRR NDVI transformed intoK cp turned out to be indistinguishable between the instruments in all seven comparisons.Only in maize 4) were the DORs climatologies different from the DOR of retrospective MODIS derived K cp parameters.At the same time, the Kolmogorov-Smirnov test failed to distinguish between the DORs from reference ET and the DORs from ET estimates produced with AVHRR and MODIS climatologies in the soybean crop and at Mead, Nebraska.In total the VegET with different parameterizations went through 28 comparisons with Penman-Monteith model and produced different DORs in 23 cases (Table A2, Appendix A).These results together with the smaller biases make the VegET stand far apart from ET 0 but closer to the observed ET a values.Consequently, this distinction serves as an evidence of the crucial role of canopy conditions and phenology in seasonal variation of actual evapotranspiration.

Comparing overall accuracy of different realizations of VegET
In this section the Fig. 5 shows the root mean squared errors as measures of model accuracy structured by vegetation type and location.RMSEs of the EDPM parameterized VegET were smaller than the RMSEs of ET a estimates coming from climatologies (2 middle bars).The difference between these RMSEs reached the maximum of 1 mm per day (at Fermi, IL, Fig. 5), but in other cases it dropped as low as 0.1 mm per day (in soybean, Fig. 5a) and even no difference (in Bondville, Fig. 5b).Sometimes the RMSEs from MODIS climatologies were comparable to those from climatologies (in soybean and at Mead, NE, Fig. 5), but contemporaneous MODIS observations of NDVI in the VegET model produced ET a with smaller RMSEs than climatologies.Despite the uncertainty in estimation of phenological timing, the EDPM managed to predict canopy conditions for VegET almost as well as contemporaneous MODIS NDVI observations.The RMSE differences between the EDPM and retrospective MODIS NDVI forcings were negligible within the two crops and grew only up to 0.4 mm per day for the grassland in Fermi, Illinois.
Another important issue depicted in Fig. 5 is the variability of RMSE within one source of canopy parameterization.The forcing from retrospective MODIS NDVI manages to hold the RMSE within 1.2-1.6 level in all cases except for Fermi.ET estimates from climatologies had their RMSEs varying parallel to each other and inflating greatly in the grassland (Fermi, IL).EDPM forcings produced ET a with the most stable RMSE varying from 1.2 to 1.6 mm per day.Comparable or greater error levels were reached by Nagler et al. (2005), Cleugh et al. (2007), Mu et al. (2007), Kang et al. (2009), Zhang et al. (2009).This stable agreement with observations in the EDPM is achieved with through interactive capturing of phenological developments.Stability of RSME in VegET outcomes from the EDPM forcings (as seen in Fig. 5) is superior to forcings from climatologies and matches the one from K cp trajectories derived from contemporaneous NDVI.

Temporal aspects of VegET performance
The temporal aspect of the VegET performance with different parameterization remained hidden until this point.To disclose this detail we calculated F scores (4) showing the probability that one set of modeled ET a was closer to observation than another.Here the main intent was to identify segments of the season when one of the VegET arrangements performed better (or worse).The comparison of the ET 0 against all VegET arrangement was placed in the Fig. 1 of the Appendix A, showing that the climatologies were better only during the green-up and brown-down phases, while EDPM managed to give better results even during the entire season.
Figure 6 (Sections A and B) shows the chances of EDPM forcings to be more effective than the K cp parameters from other sources: AVHRR climatology, MODIS climatology and MODIS contemporaneous NDVI.The graphs reveal that EDPM produced parameters yielded higher F -scores than climatologies coming from AVHRR for maize during greenup and the reproductive phases.The chances that the EDPM was performing better were also high during the reproductive phase in soybeans.For the senescence, both sets of K cp parameters coming from MODIS produced ET a with somewhat better F -scores than the EDPM forcings.For crops (maize and soybeans) and for agricultural sites (Mead and Bondville), the F -scores of the EDPM coupled to VegET followed very close patterns during growing seasons: high scores when tested against AVHRR climatologies and somewhat lower scores against phenological parameters from MODIS.For grassland, however, the EDPM forcings produced a very stable (> 0.5) level of F -score when tested against all other sources of VegET parameterization.The only noticeable difference within grassland sites arose for the scores of the EDPM over contemporaneous MODIS NDVI derived forcings.

Assessment of impact from errors in daily estimates on total seasonal ET a
Figures 7 and 8 give a bigger picture, showing the consequences of biases in the VegET forcings as well as choices made for determining phenological parameters of growing seasons.When considering only the observed timing of a growing season, it became apparent that the overestimation of daily ET a by VegET climatologies results in additional 100 mm of ET per season on average for crops and somewhat less for grassland (Fig. 7).With observed ranges of seasonal ET between 400 and 700 an error of this magnitude can be considered quite substantial.The EDPM provided parameters with considerably smaller biases resulting in only an additional 50 mm of ET per season for crops (equivalent to ∼10 %), but it came short by 50 mm (or ∼10 %)in grassland.
Figure 8 shows that in addition to the overestimation of daily ET a , climatologies added extra days to the duration of a season.The extra time was more apparent in crops adding more than 80 days to the growing period of the year.The automatic phenological control module of the EDPM overestimated season durations in crops by less than 20 days on average.For grassland the EDPM underestimated the length of growing cycle by around 30 days.This last issue, however, was better handled by climatologies where they overestimate the length of growing period for grassland by 20 days.Finally, Figure 8 shows that the differences in observed and estimated lengths of growing season in retrospective MODIS NDVI were not as big as in climatologies but not as small as those of the EDPM.www.biogeosciences.net/9/161/2012/Biogeosciences, 9, 161-177, 2012

Phenology factor in evapotranspiration process
This modeling experiment highlighted not only the role of phenology in the evapotranspiration, but also showed the particular significance of phenological factor in time, space, and vegetation type.Clearly, the overall impact from phenology in ET over vegetation will always be relative to the dynamic range of changes caused by other factors.The best instance is presented by grassland sites where the dynamic range of physiological changes in the canopy is often overshadowed by the response of grasses to water stress.Consistent F -scores in Fig. A2 (Appendix A) for grassland during all three phenophases tell that VegET gives advantage over the P-M model mostly through its ability to incorporate the water stress.For crops, however, the phenological factor becomes the dominant source of advantage pushing the F -scores up during green-up and senescence.Therefore, in the systems where other factors have minor influences, phenology becomes the key driving force for evapotranspiration, second only to the weather.The particularities of phenological development and the interaction of phenology with the climate are also important as plant communities shape their growing cycles dynamically in response to current weather conditions.Capturing those particularities by the EDPM provided an advantage and a better idea about evapotranspiration not only on a daily basis but also when giving a seasonal summary.

Performance of VegET in point based estimation of actual evapotranspiration
The representation of phenology factor turned out to be the key issue of the VegET performance in capturing the temporal dynamics of ET a .In fact, it is fair to say here that the VegET output was at least as good as its phenological parameters.As a reference point, contemporaneous 8-day observations of NDVI from MODIS helped the model estimate ET a with accuracy that surpassed Penman-Monteith equation by at least 0.5 mm per day.This translates into five tons of water per hectare per day, which can be crucial for farmers trying to estimate plant water demand for irrigation.The data product we used as a reference is based on multiple satellite observations and is released only long after the observations occur.Accordingly, for forecasting purposes it is necessary to rely on long term averages of the phenological variable or some prognostic phenology model.The results of this experiment suggested that, in the case of climatologies, there was a loss of accuracy.However, the use of the event driven phenology model as a source of K cp parameter helped the VegET to give prognoses of ET a values that were at least as accurate as those produced using 8-day MODIS NDVI observations.The EDPM achieved this level of performance by capturing fine temporal details of canopy component K cp .
Most of these details were averaged and smoothed out in climatologies.The retrospective time series of MODIS NDVI appear to do a better job than climatologies, but the temporal details were lost because of missing observations due to clouds (Roy et al., 2006), 8-day release period and 16 day rolling compositing algorithm (Schaaf et al., 2002) that may have smoothed out larger temporal fluctuations in NDVI.
Looking at all the aspects of VegET performance in this experiment, we ranked the parameterization sources in a quantitative manner giving 1, 2, or 3 points to the source that performed best, second best, and third for each of the several evaluation criteria.Lower scores indicated better performance.
(a) For the smallest average bias, the EDPM ranked first, with retrospective MODIS NDVI second, and climatologies third.Although the Student t-test identified all biases as significant, the Kolmogorov-Smirnov tests suggested that the smaller residuals from the EDPM were significantly different from the residuals produced with other phenological parameterizations.Also in the analysis of DORs, the contemporaneous MODIS NDVI parameters stood in between the EDPM forcings (lowest biases) and the climatologies (highest biases).
(b) Accuracy assessment with RMSE and F -scores resulted in the contemporaneous MODIS NDVI taking the first position and the EDPM and climatologies placing second and third, respectively.Even though the retrospective MODIS NDVI forcings were better in F -scores, the EDPM was not too far behind and it had slightly smaller and stable RMSE.Climatologies, however, were considerably behind.
(c) Producing estimates of total seasonal ET and growing season duration, the EDPM outperformed contemporaneous MODIS NDVI and it was a clear winner for crops.However, the advantage (smaller differences in total seasonal ET a ) of the interactive model was not as obvious for grassland.The K cp parameters from climatologies placed third with similar differences with observed seasonal ET and durations of growing period.
The analysis conducted for this study would benefit from a year by year comparison of performance between the four arrangements of VegET during different phenophases.It could help reveal reasons for poor performance by climatologies during anomalous years with shifts in the timing of spring or late season droughts.Unfortunately a lack of complete temporal overlap and large distances between flux tower site locations prevented us from including such an analysis in this study.We also have to point out the smoothing applied to the climatologies, as prescribed by Senay (2008), may have disadvantaged produced phenologies relative to the locally trained EDPM driven by contemporary weather.At the same time, this study was meant to show that interactive capturing of fine temporal details in canopy development can bring the expected advantage to the VegET.Overall, this investigation demonstrated that, when parameterized by climatologies, the VegET lost sensitivity to ongoing shifts in phenological timing and to finer temporal fluctuations of canopy characteristics, especially in crops.Using empirical thresholds for determining the start and end of the growing season binds the original methodology to the settings of the original experiment.Transfer of the same threshold to different locations was often problematic for spectral indices such as the NDVI (Verstraete et al., 1996).Therefore, perhaps, even the retrospective MODIS NDVI could not capture the growing season duration using the original constraints proposed for the VegET (Senay, 2008).Relying on a different mechanism and incorporating multiple factors for capturing phenological parameters, the EDPM gave a more realistic response to the changing weather conditions and thereby yielded substantially smaller errors for crops as well as for grassland.Therefore, the overall ranking makes the EDPM-producedK cp the best choice for VegET parameterization out of the four evaluated in this investigation.

Addressing issues in the EDPM functioning encountered during the experiments
Several caveats should be disclosed here for the future use of the EDPM in the described coupling scheme.First of all, the correction for drift in inverted K cp on grassland sites that correlated with VPD appeared to bring no advantage to the EDPM and should not be used in future research or application.Further, EDPM predictions, compare to climatologies, require additional input data and computational effort.The forecast made by the EDPM will depend on the reliability of weather scenarios supplied to the model, but so will the reference evapotranspiration required by VegET.For other coupling schemes that do not use weather data already, deployment of the EDPM may be redundant unless the higher level of accuracy is an absolute requirement.Long term averages may deliver sufficient results for places with stable species composition and little to no interannual variation in the course of the growing season.Meanwhile, the EDPM can provide a better phenological parameterization to models of land surface processes, but one must consider that not every factor influencing phenology has yet been brought into the modeling framework.Furthermore, the EDPM was trained to simulate phenologies for only three vegetation types.The automatic estimation of growing season parameters (dates) still constitutes a considerable source of error.The novelty of the event driven approach to phenology may well present an obstacle for wider applications of the model.The EDPM has made its first steps in simulation experiments, revealing some problems related to the unsettled methodological issues discussed in the companion paper (Kovalskyy and Henebry, 2011) and to limitation in data resources for training and testing.These problems can be and will be resolved as more flux tower data flow to the archives of AmeriFlux and other microclimatological data networks.However, we do not propose here that all of the problems in ET forecasting can be solved with good phenological forcing, since it has only a relative impact on ET.Training the model on new data and refining the patterns of vegetation responses to different event types has the potential to improve accuracy of outcomes produced by the EDPM.Though, the consistency and quality of microclimate records -training materials -can pose an obstacle for addressing model performance issues.New types of events should be included in the framework to drive the curves of canopy dynamics of current and new vegetation types.The work should continue on enhancing the precision of automatic estimation of the phenological transition dates.

Assessing the application potential for the event driven phenology model
Despite known issues, the EDPM and VegET coupling scheme showed potential to be used in modeling of evapotranspiration over vegetated areas.The biases and error measures of the produced estimates were comparable to those encountered in other investigations (Nagler et al., 2005;Kang et al., 2009;Zhang et al., 2009).Stability of error levels across vegetation types and locations seen in this experiment makes this scheme attractive for spatially explicit estimation of actual ET.Narrow focus of the EDPM on vegetation types allows using maps of vegetation species and mix LSPs within areal units (here pixels, but potentially as polygons).The interactive approach of the EDPM is anticipated to produce more precise trajectories of canopy characteristics, capturing more finely resolved ET changes on daily and growing season bases.The inherent limitation for the VegET and EDPM scheme in capturing spatial details would be the relatively coarse spatial resolution of input weather data.However, using the built-in data assimilation scheme (see Kovalskyy and Henebry, 2011), the EDPM is expected to bring in the moderate spatial resolution MODIS NDVI observations to enhance the resolution of VegET model outcomes.Although the current small number of supported vegetation types limits the domain of application to croplands and grasslands of the central part of the United States, extension to other vegetation types should be possible, given the availability of appropriate quality flux tower data.The results of spatially explicit trials of the EDPM plus VegET scheme are to be reported in the forthcoming paper (Kovalskyy et al., 2011a).
This investigation has shown how multiple aspects of phenology affect evapotranspiration during the growing season.It provided statistical and graphical evidence that accounting for phenology improves the accuracy of ET a estimation by the VegET (Figs.A1 and A2 in Appendix A).The level of improvement, however, varies across sources of phenological parameters.We also found that when using climatologies the VegET overestimated total seasonal ET in two aspects.First, climatologies forced the model into overestimation of daily ET a during the actual growing season and therefore increase total seasonal ET.Second, climatologies overestimated durations of seasons, adding to the gap between estimates and observations of total ET flux during that period.With the standard deviation of more than 5 weeks within crops, it resulted in an additional 100 to 200 mm of ET per season, which can account for about 25 % of seasonal ET in drier western sites.Therefore, we conclude that when used with climatologies, the VegET showed only a modest sensitivity to variation in growing season weather, yet it can offer a benefit if no better alternative is available.Parameterization of the VegET with the EDPM-simulated K cp proved to be more advantageous in capturing the impact of phenology on ET than the one provided by the climatologies.The EDPM produced daily ET a with smaller and more consistent RMSE.It is possible, though, that some of the differences between the climatologies and the event driven model for grassland were due to the way of derivation of the canopy coefficient.Yet, the residuals produced by the EDPM were closer to zero for agricultural sites and differed substantially with distribution of residuals coming from VegET with long term averaged AVHRR or MODIS forcings.The EDPM reached the accuracy of VegET results comparable to the level achieved by parameters from contemporaneous MODIS NDVI.The overestimation of total seasonal evapotranspiration did not go beyond 15 %, even for the maize, while the automatic PTP estimation system still has potential for improvement.Hence, we conclude that the EDPM is a better option for phenological parameterization of land surface models than long-term averages of canopy properties.
Finally, this study has opened the door and established a precedent of the EDPM deployment in a coupling scheme to estimate a land surface flux that depends on vegetation dynamics.Even just for ET estimation/monitoring over vegetated surfaces, there is an array of models listed by Allen et al. (2007), Kalma et al. (2008), Kustas and Anderson (2009) that might be able to adopt the EDPM for parameterization of their regional applications.At this point, the encouraging results of the EDPM indicate a promising new approach to overcoming the challenge of addressing phenological factors in models of land surface processes.

Fig. 1 .
Fig. 1.Derivation of the phenological factor in evapotranspiration (K cp ) from TNDVI.(A) K cp -TNDVI relationship in grassland.(B) K cp -TNDVI relationship in cropland.(C) Modeling K cp residuals in grassland.(D)Relationship between observed and TNDVI in grassland with modeled residuals added.

Fig. 3 .
Fig. 3. Histograms of differences between modeled daily ET a estimates and flux tower observations structured by vegetation/crop type (A) and by location (B).Dashed line marks 0 difference point where histograms are expected to be symmetrically centered.

Fig. 4 .
Fig. 4. Diagram of differences between DORs revealed by Kolmogorov-Smirnov tests.In the top row DORs grouped by vegetation type; bottom row -by location.Dark grey color indicates significant difference with p value <0.01 between compared distributions; white is no significant difference between DORs; light grey is no comparison made.ET-ED is the ET obtained through VegET parameterized by K cp from EDPM; ET-CA is the ET derived via VegET driven by K cp from AVHRR based climatologies; ET-CM is the ET derived via VegET driven by K cp from MODIS based long term averages; ET-OB is the ET derived via VegET driven by K cp transformed from retrospective MODIS time series.

Fig. 5 .
Fig. 5. Root Mean Squared Errors produced by different evapotranspiration estimates: arranged (A) by vegetation/crop type and (B) by location.

Fig. 6 .Fig. 7 .
Fig. 6.Temporal details of VegET performance with different phenological forcings revealed by F -scores.Results arranged (A) by vegetation/crop type and (B) by location.

Fig. 8 .
Fig. 8. Implications from choices of methods of determining growing season parameters: (1) Differences between observed and estimated season duration from EDPM; (2) Differences between observed and estimated season duration from retrospective MODIS time series; (3) Differences between observed and estimated season duration from AVHRR based climatologies; and (4) Differences between observed and estimated season duration from MODIS based long term averages.Error bars show standard errors.

Fig. A1 .
Fig. A1.Root Mean Squared Errors produced by Penman-Monteith model: arranged (A) by vegetation/crop type and (B) by location.

Fig. A2 .
Fig. A2.Temporal details of Penman-Monteith model performance relative to the VegET with different phenological forcings revealed by F -scores.Results arranged (A) by vegetation/crop type and (B) by location.

Table 1 .
Arrangement of vegetation types, locations, and years.

Table 2a .
Presence of bias in VegET outcomes from different phenological parameterization sources: distributions are presented by vegetation types.
< 0.01 < 0.01 < 0.01 < 0.01 ET-ED is the ET obtained through VegET parameterized by K cp from EDPM; ET-CA is the ET derived via VegET driven by K cp from AVHRR based climatologies; ET-CM is the ET derived via VegET driven by K cp from MODIS based long term averages; ET-OB is the ET derived via VegET driven by K cp transformed from retrospective MODIS time series.ing two sources: (1) MODIS NBAR 0.5 km resolution product (2000-2009) at ftp://e4ftl01u.ecs.nasa.gov/MOTA/MCD43A4.005/; and (2) AVHRR 1.1 km resolution NDVI composites by USGS

Table 2b .
Presence of bias in VegET outcomes from different phenological parameterization sources: distributions are structured by locations.
ET-ED is the ET obtained through VegET parameterized by K cp from EDPM; ET-CA is the ET derived via VegET driven by K cp from AVHRR based climatologies; ET-CM is the ET derived via VegET driven byK cp from MODIS based long term averages; ET-OB is the ET derived via VegET driven by K cp transformed from retrospective MODIS time series.

Table A1 .
Presence of bias in reference evapotranspiration (Penman-Monteith) across crops and locations.

Table A2 .
Difference between DORs of reference evapotranspiration and various arrangements of VegET across crops and locations.< 0.001 < 0.001 < 0.001 < 0.001 ET-ED is the ET obtained through VegET parameterized by K cp from EDPM; ET-CA is the ET derived via VegET driven by K cp from AVHRR based climatologies; ET-CM is the ET derived via VegET driven by K cp from MODIS based long term averages; ET-OB is the ET derived via VegET driven by K cp transformed from retrospective MODIS time series.