Evaluating remote sensing of deciduous forest phenology at multiple spatial scales using PhenoCam imagery

. Plant phenology regulates ecosystem services at local and global scales and is a sensitive indicator of global change. Estimates of phenophase transition dates, such as the start of spring or end of fall, can be derived from sensor-based time series, but must be interpreted in terms of bio-logically relevant events. We use the PhenoCam archive of digital repeat photography to implement a consistent protocol for visual assessment of canopy phenology at 13 temperate deciduous forest sites throughout eastern North America, and to perform digital image analysis for time-series-based estimation of phenophase transition dates. We then compare these results to remote sensing metrics of phenophase transition dates derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) and Advanced Very High Resolution Radiometer (AVHRR) sensors. We present a new type of curve ﬁt that uses a generalized sigmoid function to estimate phenology dates, and we quantify the statistical uncertainty of phenophase transition dates estimated using this method. Results show that the generalized sigmoid provides estimates of dates with less statistical uncertainty than other curve-ﬁtting methods. Additionally, we ﬁnd that dates derived from analysis of high-frequency PhenoCam imagery have smaller uncertainties than satellite remote sensing metrics of phenology, and that dates derived from the remotely sensed enhanced vegetation index (EVI) have smaller uncertainty than those derived from the normalized difference vegetation index (NDVI). Near-surface time-series estimates for the start of spring are found to closely match estimates derived from visual assessment of leaf-out, as well as satellite remote-sensing-derived estimates of the start of spring. However late spring and fall phenology metrics exhibit larger differences between near-surface and remote scales. Differences in late spring phenology between near-surface and remote scales are found to correlate with a landscape metric of deciduous forest cover. These results quantify the effect of landscape heterogeneity when aggregating to the coarser spatial scales of remote sensing, and demonstrate the importance of accurate curve ﬁtting and vegetation index selection when analyzing and interpreting phenology time series.


4306
S. T. Klosterman et al.: Evaluating remote sensing of deciduous forest phenology specific phenophases such as budburst and leaf-out, sensors such as radiometers and digital cameras are now being used to create automated, high-frequency, phenological time series (Richardson et al., 2013b). Sensor-based data provide measurements that range from the local scale of site based observations to the global extent of satellite missions (Garrity et al., 2011;Huemmrich et al., 1999;Jenkins et al., 2007;Soudani et al., 2012). A key challenge in the interpretation of phenology derived from sensor time series is determining how they relate to plant biological events that an observer would recognize. Digital repeat photography of terrestrial ecosystems serves two purposes in this regard, supplying both a visually interpretable record, and, through image processing techniques, time-series data similar to those available from radiometers Sonnentag et al., 2012). Digital repeat photography can therefore serve as a bridge between the traditional practice of direct visual observation of organisms, and sensor-based estimates of phenology from near-surface and remote sensing data.
Digital repeat photography also makes consistent visual assessment of phenology possible over broad geographic ranges, as a single set of observers can view many sites with relative ease via digital image archives. In previous comparisons of local-to landscape-scale phenology, investigators were limited by the ground area a group of observers could feasibly cover on foot (Liang et al., 2011). At the continental scale, comparison of ground-based phenology indicators to remote sensing is limited by the geographic extent of any given mode of ground observation (White et al., 2009). Consequently, there is a knowledge gap in how time series of sensor data relate to the biological events of canopy phenology over a wide geographic range of sites.
This study applied quantitative analysis and visual assessment to a collection of digital repeat photography from a range of deciduous forests across eastern North America. The study sites exhibit diverse landscape characteristics, from a nearly pure deciduous broadleaf forest in Arkansas, to an urban stand of trees in Washington, DC. We compared an ensemble of previously presented and new methods for extracting dates from phenological time series, and quantified the statistical uncertainty of estimated dates. Building on an earlier comparison study by Hufkens et al. (2012), we also analyzed time-series data from the Moderate Resolution Imaging Spectroradiometer (MODIS), as well as the MODIS and Making Earth System Data Records for Use in Research Environments (MEASURES) phenology products, for comparison to near-surface estimates. This study aims to evaluate how visually assessed biological events correspond to sensor-based estimates of phenological dates. A complementary goal is to explore how near-surface metrics of deciduous canopy phenology in the spring and fall are related to landscape-scale metrics of remote sensing across diverse forest ecosystems.

Study sites
To characterize leaf phenology of temperate deciduous forests over a broad geographic area, we chose 13 sites in the eastern US and Canada, based on availability of near-surface camera observations (Fig. 1). A total of 81 site years of nearsurface and satellite remote sensing imagery were analyzed across all sites in spring, and 83 site years were analyzed in fall.
To characterize land cover at the study sites, 30 m resolution National Land Cover Database (NLCD) data were used for sites in the US, and Earth Observation for Sustainable Development of Forests (EOSD) data were used for sites in Canada (Table 1) (Vogelmann et al., 2001;Wulder et al., 2008). The MODIS Collection 5 Land Cover Type product classification at 500 m resolution was also used to characterize land cover .

Near-surface imagery: visual assessment of phenological transitions
The PhenoCam network is a continental-scale phenological observatory, spanning a wide range of biogeoclimatic zones and vegetation types, primarily in the United States. In addition to retrieving imagery from publicly available webcams with either hourly or half-hourly temporal resolution, the network consists of 85 cameras deployed following a standardized protocol, which upload half-hourly imagery to the PhenoCam server from 4 a.m. to 10 p.m. each day. Imagery and data products are available at the PhenoCam web page: http://phenocam.sr.unh.edu/webcam/. PhenoCam imagery was used to visually identify deciduous canopy transition dates for this study. Six observers looked through daily images and used a common protocol to identify the following dates for each site year of data: 1. when the majority of trees started leafing out 2. when the canopy reached full maturity 3. when the canopy first started to change color in the fall 4. when the canopy exhibited the brightest fall colors 5. when the majority of trees had lost all leaves.
To reduce inter-observer variability in visually assessed dates, the earliest and latest estimates of each date were discarded, and the remaining dates were averaged to provide a single date for each event. Using the median observation (not reported here) gave similar results to the mean.

Near-surface imagery: time-series estimates of phenological transitions
To automatically extract phenology transition dates from near-surface images, we defined regions of interest (ROIs) representing the deciduous canopy in the foreground at each site (shown in Fig. 2 We recorded images at some sites from before sunrise until after sunset. As shown by Sonnentag et al. (2012), images recorded at dawn and dusk, under very low levels of diffuse illumination and with no direct solar beam, tended to have much lower GCC values than those recorded at midday. We therefore excluded extremely dark images (low average values of R, G, or B DNs in the ROI) from further consideration. Because camera characteristics varied from site to site (e.g., color balance, maximum exposure time, automatic gain control, and internal image processing), it was necessary to manually determine the DN threshold used to discard imagery across sites. Final processing consisted of selecting the 90th percentile value from a 3-day moving window . To quantify dynamics in canopy redness in fall, the red chromatic coordinate (RCC) was calculated in the same way: (2)

Remote sensing data
We downloaded MODIS data for the 13 study sites through the MODIS web service (http://daac.ornl.gov/MODIS/ MODIS-menu/modis_webservice.html) for comparison to near-surface observations. Nadir bidirectional reflectance distribution function (BRDF) adjusted surface reflectances (NBAR) from the MCD43A4 product in the red, nearinfrared (NIR), and blue bands were used to characterize vegetation dynamics at 500 m spatial resolution (Schaaf et al., 2002(Schaaf et al., , 2011. NBAR measurements are based on surface reflectances taken from 16-day moving windows of MODIS data, and produced every 8 days. NBAR measurements were associated with the middle day of the 16-day compositing period from which the measurements were drawn (Zhuosen Wang and Crystal Schaaf, personal communication). The NBAR data were filtered to remove observations over urban areas, ice, or water using the MODIS MCD12Q1 Land Cover Type product, and remaining data were filtered to remove interference from snow using the MODIS BRDF albedo quality product (MCD43A2). Filtered NBAR reflectances were used to compute the enhanced vegetation index (NBAR-EVI), and the normalized difference vegetation index (NBAR-NDVI), which provide metrics of canopy greenness (Huete et al., 2002;Rouse et al., 1973): To account for inherent noise in MODIS data due to cloud cover, atmospheric interference, and uncertainty in the ground area measured by the MODIS sensor (Xin et al., 2013), we used median NBAR values taken over 3 × 3 windows of 500 m pixels centered on PhenoCam locations. The resulting time series were then smoothed using the median of a three-point moving window to remove spikes due to snowfall and other sources of noise that were not captured using the MCD43A2 product. We also analyzed GCC time series from MODIS NBAR, calculated according to Eq. (1). MODIS GCC time series suffered from lower quality than EVI and NDVI, with more noise and outliers, even after applying the quality control procedures used on those time series. Unfortunately, the lower quality of MODIS GCC time series caused relatively high statistical uncertainty in estimated phenophase transition dates. For example, the average statistical uncertainty (95 % confidence interval) for phenophase transition dates identified from MODIS GCC time series using the generalized sigmoid method (described below) was 17 days, twice as large as that for EVI time series. Because of this uncertainty, we do not report MODIS GCC results here. However, we note that Hufkens et al. (2012) used remote sensing data to calculate phenology dates with the excess greenness index, a spectral index similar to GCC, and obtained similar results to NDVI, an index used in this study.
In addition to the MODIS time-series data, we examined two operational phenology products derived from satellite remote sensing. The MODIS Land Cover Dynamics Product (MCD12Q2) provides annual phenophase transition dates and related growing season metrics at 500 m spatial resolution. To do this, the MCD12Q2 algorithm fits logistic functions (Eq. 5, below) to smooth and gap-fill time series of NBAR-EVI data, and reports the timing of local maxima and minima in the rate of change of curvature as phenophase transition dates (Ganguly et al., 2010;Zhang et al., 2003). In addition, we also used data from a 30-year archive of multi-sensor harmonized vegetation indices created as part of the National Aeronautics and Space Administration (NASA) MEASURES project (http://vip.arizona. edu). The MEASURES phenology product reports similar metrics to the MCD12Q2 algorithm, but has the advantage of nearly 20 additional years of historical data, using measurements from the Advanced Very High Resolution Radiometer (AVHRR). The MEASURES phenology data are produced at a spatial resolution of 0.05 degrees, or approximately 5 km for the region studied here.

Estimating dates from time-series data
We used three sigmoid-based methods and a data smoothing and interpolation method to explore different approaches for extracting dates from phenological time-series data. The simplest sigmoid-based method, hereafter called the simple sigmoid, has been widely used in the remote sensing community  Liang et al., 2011;Zhang et al., 2003): In Eq. (5), f (t) represents the modeled value of a vegetation index such as GCC, at time t. d defines the dormant season baseline value of greenness, c is the amplitude of increase or decrease in greenness, a controls the timing of increase or decrease, and b controls the rate of increase or decrease. Equation (5) was separately fit to spring and fall data for each site year to account for independent green-up and green-down dynamics, using the Matlab function lsqnonlin.
To account for the decreasing summer time greenness that is widely observed in vegetation indices prior to leaf senescence, Elmore et al. (2012) presented a modified double sigmoid model that adds a new parameter (m 7 ), thereby providing more accurate model representation of seasonal vegetation time-series data in many forest canopies: where the double sigmoid model in Eq. (6) is fit to entire years of vegetation index time series. In addition we also tested a more flexible approach, using a generalized sigmoid formula which introduced two additional parameters (q i and v i ), to allow different rates of increase near the lower and upper asymptotes of the sigmoid (Richards, 1959). Our implementation of this generalized sigmoid also accounts for nonlinear decrease in summertime greenness, as observed in many site years of data (parameters a 2 and b 2 ), as well as a changes in the dormant season value via parameter a 1 : Equation (6), hereafter referred to as the generalized sigmoid, was also fit to entire years of data. For each of the sigmoid models, phenological transition dates were estimated using local extrema in the rate of change of curvature k (Kline, 1998): Points where the curvature changes most rapidly occur at the beginning, middle, and end of seasonal transitions. In the simple and green-down sigmoids (Eqs. 5, 6), extrema in the curvature change rate were used to identify the start, middle, and end of spring (SOS, MOS, and EOS), following the  Figure 3. Example comparison of the simple sigmoid, green-down sigmoid, generalized sigmoid, and smoothing and interpolation approaches for 1 year of GCC and NBAR-EVI data. Phenology date estimates represent start of spring, middle of spring, and end of spring (SOS, MOS, and EOS, as shown in panel b). The simple sigmoid, green-down sigmoid, and generalized sigmoid models also have start of fall, middle of fall, and end of fall (SOF, MOF, and EOF, as shown in panel b). A single fall phenology date is identified from RCC using the smoothing and interpolation model (RCC max, as shown in panel g).
method proposed by Zhang et al. (2003). These points approximately correspond to 10, 50, and 90 % of amplitude in springtime greenness. A similar technique was used for the start, middle, and end of fall (SOF, MOF, and EOF). For the generalized sigmoid (Eq. 7), the third extreme in the curvature change rate was used to identify the end of spring. However, because this model allows asymmetric growth of the sigmoid function, the first two extrema were frequently found to occur significantly later than 10 and 50 % of amplitude in springtime greenness. Consequently, the start and middle of spring were identified as the times corresponding to 10 and 50 % amplitude between the dormant season and the end of spring values of greenness for the generalized sigmoid, with a similar approach used in fall.

S. T. Klosterman et al.: Evaluating remote sensing of deciduous forest phenology
To quantify uncertainty in estimates of transition dates from the sigmoid-based methods, we used the Jacobian matrix of parameter sensitivities (provided by the Matlab routine lsqnonlin) to calculate the parameter covariance matrix. The covariance matrix was then used in a Monte Carlo procedure to generate 100 samples of the parameter space, each of which was used to produce a new set of phenology dates. Monte Carlo ensembles were used to construct confidence intervals using the inner 95 % range for each phenology date, which we use here to quantify the statistical uncertainty of estimated dates.
Finally, in the smoothing and interpolation approach, timeseries data were first smoothed using the loess algorithm in Matlab, which reduces noise by estimating a local regression to a second-order polynomial at each point in the time series. The fraction of annual data used for the local regression was set to 0.1 for near-surface data and 0.2 for remote sensing data, to account for the different temporal resolutions of 3 and 8 days, respectively. After smoothing, cubic spline interpolation was applied to obtain a sub-daily resolution time series for estimating phenological transitions. Spring transition dates were identified as the times when greenness crossed 10, 50, and 90 % thresholds of the springtime amplitude in greenness. The smoothing and interpolation method was also applied to RCC time series in fall, where a single phenology date was identified as the fall maximum of the processed time series. To illustrate each of the date estimation methods, an example year of data from Arbutus Lake in 2009 is shown for both near-surface and satellite remote sensing data (Fig. 3), with model fits and date estimates for each approach.
To compare phenology transition dates derived from visual assessment, near-surface, and satellite remote sensing time series, the root mean square deviation (RMSD), bias, and r 2 statistic were computed for each phenological transition across all site years of data. These statistics quantify the magnitude of differences between corresponding dates from different methods, the average signed difference, and the degree of correlation, respectively.

Geographical and environmental patterns in phenology
To characterize geographical patterns and environmental drivers of phenology, we estimated linear regressions of phenology dates using two predictors: a location predictor consisting of site latitude and elevation, and a climate predictor consisting of average daily temperature and cumulative precipitation during the periods April-May for spring transitions, and September-November for fall transitions, using the DAYMET data set (http://daymet.ornl.gov).

Statistical uncertainty in date estimates
We used inter-observer variability from visual assessments and parameter uncertainty from curve fitting methods to calculate measures of the statistical uncertainty in phenology date estimates derived from near-surface digital photography. The average range of dates estimated from visual assessment was larger than the average 95 % confidence interval from curve fitting of GCC data for both spring phenology dates, particularly at the end of spring (Table 2). However in fall, inter-observer variability was smaller than the statistical uncertainty of curve fits for the middle and end of fall. These results suggest that dates derived from curve fitting analysis of greenness time series generally have less statistical uncertainty than visual assessment in the spring, although this is less pronounced in fall. Near-surface GCC data from PhenoCam provided estimates of phenology dates with less uncertainty (average 6day confidence interval, across methods and dates reported in Table 2) than NBAR-EVI and NBAR-NDVI data from satellite remote sensing (average 12-and 19-day confidence intervals, respectively). This may be due to the higher temporal resolution of near-surface data, which more effectively constrains parameter estimates. Since NBAR-EVI data were found to result in less uncertainty for remote sensing estimates of phenology than NBAR-NDVI, we focus on the use of NBAR-EVI in the following analysis.
Of the three sigmoid methods that we tested, the generalized sigmoid curve fit the time-series data with the lowest RMSD and produced the least uncertain date estimates in most cases, particularly using the near-surface data ( Table 2). Using NBAR-EVI data, the simple sigmoid function identified the middle of spring transition with the lowest uncertainty. However the green-down sigmoid and the generalized sigmoid curves resulted in more certain estimates for dates corresponding to the beginning and end of spring, respectively. The generalized sigmoid appears to provide the best overall functional representation of vegetation dynamics for NBAR-EVI in terms of certainty from the beginning to end of spring, likely because of its flexibility. Results presented hereafter therefore consider all of the time-series approaches described above, but emphasize the generalized sigmoid.

Comparison of visual assessment to estimates from near-surface time-series data
Phenological dates derived from visual assessment exhibited varying degrees of correspondence to dates identified using time-series data, depending on the date estimation method and seasonal transition ( Table 3). The start of spring was most closely associated with visual assessments for the date when the majority of trees started to leaf out (Fig. 4a).  Figure 4. Scatterplots of the comparison between visually assessed dates (y axis) and dates identified from near-surface GCC (x axis) using the generalized sigmoid method. Table 2. Statistical uncertainty in estimated phenology dates. Statistical uncertainty in sigmoid-based methods is calculated as the average width of inner 95 % confidence intervals for each phenology date. SOS, MOS, and EOS are start, middle, and end of spring. SOF, MOF, and EOF are start, middle and end of fall. Statistical uncertainty in visual assessment is calculated as the average length of time between earliest and latest assessments, after removing the minimum and maximum estimates from the raw data. All units are in days. with an RMSD of less than 10 days, with the generalized sigmoid yielding the lowest bias of 0 days. The visually assessed date of canopy maturity was less consistent with time-series estimates than the date of leaf-out. While correlations were generally good, with r 2 ranging from 0.45 to 0.73 across methods, all time-series estimates for this date were biased by about 10 days early with respect to visual assessment. For the generalized sigmoid method, the end of spring was less biased with respect to visual assessment for spring transitions that ended later in the year (Fig. 4b).
Greenness-derived estimates for the beginning and end of fall generally showed less agreement with dates derived from visual assessment than for spring phenology; fall estimates derived from greenness time series had larger average RMSD (23 and 16 days, respectively; Table 3) across methods than either of the spring dates (8 days for the start of spring and 13 days for the end of spring). Fall dates derived using the Table 3. Statistics comparing visual assessment to phenology derived from near-surface image processing. Visually assessed dates were compared to time-series methods as follows: the date when the majority of trees started leafing out was compared to SOS, the date of canopy maturity to EOS, the date of first color change to SOF, the date of brightest fall colors to MOF, and the date of leaf loss to EOF. Near-surface imagery dates were estimated from greenness, except for SOF, MOF, and EOF dates in the smoothing and interpolation approach, which were estimated from redness. Statistics include RMSD and bias in units of days, and r 2 for the comparison of corresponding dates and methods across all site years. Bias is calculated relative to time-series estimates, so a negative bias indicates that the corresponding visual assessment is later. 73 site years of data were used in this analysis. generalized sigmoid had comparable or lower RMSD than visually assessed dates from other curve fitting approaches, and indicated that estimates for end of fall were generally less biased with respect to visual assessment for timing of abscission when this occurred later in the calendar year (Fig. 4e). While the visually assessed start of color change in fall and the end of abscission were closer to greenness-derived metrics, timing of the brightest fall colors had similar RMSD with respect to date estimates from time series of both redness and greenness (Table 3).

Climate and geographical analysis of phenophase transitions
Phenology dates were moderately correlated with latitude ( Fig. 4), although for deciduous forests in eastern North America this relationship is confounded by site elevation according to Hopkins' law (Hopkins, 1919), as well as local weather (Richardson et al., 2006), both of which affect leaf phenology. We compared the effects of site location and local weather on phenology by calculating linear regression models of phenophase transition dates on site latitude and elevation, along with a separate regression model on average temperature and cumulative precipitation during the periods April-May for spring transitions, and September-November for fall transitions.
In spring, we found that SOS was delayed 1.9 ± 0.3 (regression slope ± standard error) days per degree latitude and 1.6 ± 0.5 days per 100 m in elevation, each about half of what Hopkins' law predicts, and similar to the delay of leaf unfolding of 1.1 to 3.4 days per 100 m elevation that was previously observed in a study of beech and oak trees in southern France (Vitasse et al., 2009), as well as 2.7 days per 100 m elevation in a study of a hardwood forest in New Hampshire (Richardson et al., 2006). In fall, the effect of latitude was more pronounced as EOF advanced 2.9 ± 0.3 days for each degree of latitude, but elevation was not significantly different from zero with an advance of 0.3 ± 0.5 days for each 100 m increase in elevation.
From the weather analysis, we found that SOS for deciduous trees advanced 3.5 ± 0.3 days for each 1.0 • C change in mean April-May temperature, which is within the range observed in experimental warming studies of deciduous tree leaf phenology of 1-7 days per 1 • C (Morin et al., 2010;Norby et al., 2003). SOS was relatively insensitive to precipitation, with a delay of 0.02 ± 0.01 days for each 1 mm change in cumulative precipitation. In fall, MOF was delayed 3.6 ± 0.4 days for each 1 • C change in average September-November temperature, but precipitation effects were again not significantly different from zero. These findings are consistent with studies indicating that eastern deciduous forest phenology is generally insensitive to observed variation in precipitation (Dragoni and Rahman, 2012); however, the effects of precipitation may influence fall phenology through soil water balance (Archetti et al., 2013). From the climate analysis we conclude that temperature has significant effects on deciduous forest phenology in the spring and fall, while precipitation does not. From geographical analysis we find that the timing of both spring and fall phenology correlates with latitude, but that only spring phenology correlates with elevation.

Comparison of near-surface and remote sensing phenology
The generalized sigmoid model, the time-series method with the least uncertainty, and the smoothing and interpolation approach, with the most flexibility, each produced an average RMSD of about 9 days between remote sensing and nearsurface imagery across the beginning, middle, and end of spring dates (Table 4, Figs. 5a-c). The magnitude (absolute value) of bias was low across all methods for the beginning and middle of spring, less than 1 week in nearly all cases ( Table 4). As spring progressed however, the signed bias between satellite remote sensing and near-surface phenology became more negative, indicating satellite remote sensing was later in comparison to near-surface phenology. The trend of a more negative bias for later spring phenology dates was not isolated to one particular method; across all methods and indices, dates derived from satellite remote sensing were 2 and 8 days later, on average, than near-surface metrics for the middle and end of spring, respectively (Table 4).
To examine whether landscape characteristics at individual sites played a role in the late spring bias, we calculated the fractional coverage of deciduous forest and mixed forest land cover types from NLCD data ( Table 1). The bias between results from near-surface GCC and satellite remote sensing NBAR-EVI with the generalized sigmoid method (Fig. 6) showed a significant pattern (r 2 = 0.73, p < 0.001) of less bias for sites that had greater fractions of deciduous or mixed forest coverage.
Time-series estimates of fall phenology from near-surface and satellite remote sensing generally differed more than spring dates; the average RMSD for fall dates was higher than spring in all methods and indices used for date estimation (Table 4). This is likely due to larger statistical uncertainty in estimated fall dates; GCC-, NBAR-EVI-and NBAR-NDVI-derived dates were roughly twice as uncertain as those in fall (Table 2). GCC-derived near-surface fall dates from the generalized sigmoid method were biased roughly a week earlier than dates from NBAR indices, which was characteristic of a negative bias for fall dates observed across most greenness time-series results, particularly for the middle and end of fall. In contrast, near-surface dates derived from redness, which best corresponded to the middle of fall date extracted from satellite remote sensing (Table 5), were consistently biased positively. Both greenness-and rednessderived near-surface dates had the lowest magnitude of bias at the MOF date, with several methods producing bias of less than a week.

Comparison of near-surface phenology to MCD12Q2 and MEASURES phenology products
The MCD12Q2 and MEASURES phenology products provide remotely sensed phenology estimates at different spatial scales than the analysis of NBAR data presented above. The NBAR analysis conducted for this study used MODIS data at an effective resolution of 1.5 km due to spatial windowing. In contrast, the MCD12Q2 phenology product is produced at 500 m resolution, and the MEASURES is produced at approximately 5 km resolution. In comparison to the MODIS NBAR data analyzed here, both of these phenology products exhibited similar signs in bias, but different magnitudes, relative to date estimates from near-surface time series. The coarse-resolution MEASURES spring dates exhibited a low average bias of less than 2 days at the beginning of spring, while the middle and end of spring dates were progressively biased later by an average of −9 and −17 days, respectively (Table 6), similar to the late spring bias presented above, but larger in magnitude. RMSDs between MEASURES and near-surface dates were also larger (by over a week for most transition dates) relative to dates from NBAR data. The MCD12Q2 product, encompassing the smallest land area of the three remote sensing analyses used here, showed qualitatively similar characteristics to the Table 4. Statistics comparing remote-sensing-derived to greenness-derived near-surface phenology. Statistics are computed as in Table 3. The time-series method indicated in the table was used for both near-surface and remote sensing date estimates. Bias is calculated relative to near-surface estimates, so a negative bias indicates that the corresponding remote sensing estimate is later. 81 site years of data were used in this analysis. coarse-scale MEASURES results, but with larger biases (Table 7). In consideration of the analysis presented above, results from MEASURES and MCD12Q2 indicate that satellite remote sensing results based on data with spatial resolutions that are intermediate between these two products, processed with the methods presented here, may result in better agreement with near-surface data.

Discussion
Phenological data available from near-surface and satellite remote sensing measurements present a large and growing resource for monitoring the interaction between global change and the biosphere, but involve significant challenges for analysis. For example, lack of standard protocols complicates determination of which biological events correspond to data-driven estimates of phenophase transitions in diverse and geographically dispersed ecosystems (White et al., 2009). Furthermore, while several methods exist for estimat-ing transition dates from time-series data, few studies provide concrete guidance regarding how to distinguish between these approaches (but see Cong et al., 2012), or quantify the uncertainty associated with various methods. This study compared an ensemble of date estimation methods to assess how near-surface metrics of deciduous forest phenology, here derived from high-frequency digital camera imagery, relate to both visual assessment of canopy status, and to landscape-scale estimates from satellite remote sensing platforms, across a range of temperate deciduous forests. Our results show that the choice of analysis method affects the certainty with which dates can be estimated at both nearsurface and remote scales. The choice of analysis method can also affect the RMSD, magnitude of bias, and in some cases the direction of bias when comparing near-surface phenology metrics to metrics derived from visual assessment and satellite remote sensing.

Comparison of PhenoCam curve fitting to visual assessment and remote sensing in spring
Time-series estimates of the start of spring at the near-surface scale are generally well correlated with visual assessments for the first appearance of leaves (Fig. 4a), the stage of leaf phenology which immediately follows budburst. This indicates that the SOS metric represents the release of ecodormancy in buds on deciduous trees, the stage of bud development at which limitations of environmental factors are removed (Basler and Korner, 2014). A significant outlier occurred, however, in the spring of 2007 at the Upper Buffalo Wilderness. Observers consistently identified the start of leaf-out as DOY 90, earlier than other years for this site. However after this early leaf-out, a spring frost delayed further leaf development (Gu et al., 2008), likely resulting in the later start of spring (DOY 120) identified by curve fitting analysis. Consistent with previous studies, estimates for the start of spring were also highly correlated for metrics derived from satellite remote sensing and near-surface remote sensing. Liang et al. (2011) noted that start of spring estimated from MODIS EVI time series matched the date of budburst directly observed on trees to within 2 days in a mixed deciduous-coniferous forest in Wisconsin, and Soudani et al. (2008) found a similar result for deciduous forests located Fractional forest cover is defined as the fraction of 30 meter resolution pixels in the deciduous 5 forest land cover class plus half of the fraction of pixels in the mixed forest class at each study 6 site. 7 Figure 6. Scatterplot of bias in the end of spring (EOS) between near-surface GCC date estimates and remote sensing NBAR-EVI estimates using the generalized sigmoid method. Fractional forest cover is defined as the fraction of 30 m resolution pixels in the deciduous forest land cover class plus half of the fraction of pixels in the mixed forest class at each study site.
throughout France. A recent study by Hmimina et al. (2013) also found good agreement between near-surface and satellite remote-sensing-derived estimates for the beginning of spring. However we found that data-driven estimates of later spring phenology from near-surface imagery, intended to represent the final stages of springtime leaf development, exhibited less correspondence to estimates derived from both visual assessments and satellite remote sensing.
The visually assessed date of leaf maturity was later than the end of spring date derived from near-surface GCC. At visually assessed maturity, leaves were dark green, whereas, at the GCC-derived end of spring date, leaves were bright yellow-green. The shift from brighter to darker green was associated with an increase in the relative brightness of the blue channel. Recent studies explored possible reasons for this, finding that GCC from tower-mounted cameras reached its springtime maximum 2 to 3 weeks before a suite of leaf and canopy physiological traits, including chlorophyll fluorescence, total chlorophyll concentration, leaf area and mass, nitrogen, carbon, and water content, and leaf area index (LAI; Keenan et al., 2014;Yang et al., 2014). Keenan et al. (2014) concluded that GCC reaches its peak as the effective LAI viewed from tower-mounted cameras saturates, and GCC becomes insensitive to further increases in LAI, but begins to decrease due to changes in leaf color.
Comparison of near-surface GCC and MODIS NBAR indices for late spring phenology shows that, across all date estimation approaches, metrics derived from MODIS were biased later by an average of 8 days (Table 4). Different spectral indices exhibit different temporal trajectories , and have been reported to correlate with different  Hufkens et al. (2012) noted that both the excess green index (a color index similar to the GCC used in this study) and NDVI from MODIS tended to saturate before EVI, and were insensitive to later changes in LAI. However, recent results have shown that EVI from satellite remote sensing has a 2-to 3-week temporal bias, similar to GCC from tower-mounted cameras, with respect to the suite of leaf physiology measurements mentioned above (Keenan et al., 2014). Further, recent work indicates that bias between end of spring phenology at the near-surface and landscape scales may not be caused by differences in vegetation index; Hmimina et al. (2013) found a similar late spring bias using NDVI from remote sensing and near-surface NDVI sensors. Camera fields of view are smaller than ground areas associated with satellite pixels. Consequently, GCC from cameras can only be expected to agree with satellite vegetation indices to the extent that the camera field of view represents the vegetation and land cover in the satellite pixel. To explore this, we conducted a land cover analysis, focusing on the source of bias found in this study (Fig. 6). Results from this analysis show that landscape composition affects the magnitude of the bias, where sites with a smaller proportion of deciduous and mixed forests tended to have estimates of end of spring phenology from satellite remote sensing that were systematically later than near-surface estimates. For the other phenological transitions (SOS, MOS, SOF, MOF, and EOF), the statistical relationship between this bias and fractional forest cover was not significant (p > 0.05).
Other researchers have explored the effect of vegetation heterogeneity on measurements of albedo across multiple sites (Cescatti et al., 2012), finding that more homogeneous sites produced better agreement between scales. However this study appears to be the first to document a linear correlation between forest coverage and temporal bias in canopy phenology between the organism and pixel scales, indicating a way that landscape characteristics may determine the fidelity of satellite remote sensing measurements.

Comparison of PhenoCam curve fitting to visual assessment and remote sensing in fall
In fall, variability between observers was smaller for the dates of brightest fall colors and leaf abscission than for the first signs of senescence (Table 2), indicating that fall colors associated with the middle of senescence, and the eventual loss of leaves, give the clearest visual indicators of fall phenology. Estimates using peak RCC from near-surface images matched visual assessments of the timing of leaf coloration with similar RMSD to GCC-based estimates of the middle of fall (Table 3), with peak RCC biased 3 days later and GCC biased 2-6 days earlier.
We found that the statistical uncertainty in curve fit estimates of fall dates was larger than that of spring dates (Table 2). This may be caused by within-canopy heterogeneity, with some trees senescing before others. This is exemplified in Fig. 2 where some trees are in advanced stages of senescence while others still have many green leaves. Integrating all of these trees into a single region of interest tends to cause a longer, more drawn out transition in fall than in spring (Fig. 3). This more gradual change leads to less well-defined extrema in the curvature change rate in Eq. (7) of GCC time series and greater statistical uncertainty in estimated fall dates than spring dates. Based on this alone, we would expect larger RMSD between camera-and satellite-derived dates in fall than spring. On a larger scale, variation in species composition and land cover type below the spatial resolution of MODIS also complicates the interpretation of NBAR-EVI and NBAR-NDVI measurements in fall (Cescatti et al., 2012;Dragoni and Rahman, 2012), similar to effects on nearsurface GCC and RCC .
To more accurately study spatial variation in fall phenology, and to further study the late spring bias in heterogeneous forested landscapes reported above, digital photography from cameras with larger fields of view, with ROIs that include more plants and plant functional types, should be obtained . Similarly, the use of multiple cameras at a single site or multiple regions of interest in individual images ) could be used in combination with mixture models that combine phenological information from diverse plant functional types. In parallel, direct visual assessments of organisms are needed to complement these measurements, thereby supporting biological interpretation of metrics derived from digital cameras and other sources of time-series data.

Remote sensing phenology products
While the simple sigmoid approach used here with NBAR data is identical to that used for the MCD12Q2 and MEA-SURES products, each of these products is based on data with different spatial resolution, leading to divergent results. The MCD12Q2 algorithm does not use the spatial averaging approach employed here, and therefore represents remote sensing measurements associated with individual 500 m pixels. Consequently, the MCD12Q2 data are more susceptible to gridding artifacts of remote sensing measurements and other sources of noise (see Fig. 1 in Xin et al., 2013). Spatial averaging, which accounts for the values in neighboring pixels, appears to improve the remote sensing representation of deciduous canopy phenology in comparison to nearsurface measurements: the simple sigmoid method applied to MODIS NBAR data here yielded results with generally lower RMSD and bias with respect to ground measurements, relative to the MCD12Q2 product (Tables 4 and 7). The larger land area of measurements used to derive the MEASURES phenology product also resulted in smaller biases with respect to near-surface phenology dates than MCD12Q2, although RMSDs were similar for spring and late fall phenology (Table 6).

Conclusions
This study used near-surface digital repeat photography to derive both visual assessment-and time-series-based estimates of leaf phenology, over a broad geographic range of temperate deciduous forests. To evaluate landscape-scale phenology metrics from both satellite remote sensing and near-surface metrics, a common framework of curve fitting methods was applied to estimate phenophase transition dates from both data sources. Results indicate that visual assessment of the start of leaf-out in spring was very similar to estimates of the start of spring from curve fitting, and across the jump in scale from near-surface to satellite remote sensing. However in later spring, study sites with more heterogeneous land cover exhibited greater differences between estimates of phenology from near-surface and satellite remote sensing. In particular, estimates of late spring phenology from satellite remote sensing were biased later relative to near-surface estimates, with progressively larger bias for ecosystems with lower fractional forest cover.
These results have broad implications for methods and models that simulate or estimate ecosystem services that depend on accurate monitoring of phenological events. For example, remote sensing data are used to infer the phenology of deciduous trees in ecosystem and earth system models (Lawrence et al., 2011;Medvigy et al., 2009). If an artificially late end of spring is detected in regions with smaller fractions of forest cover, this may lead to later attainment of full photosynthetic capacity in the modeled canopy, resulting in lower estimates of annual sums of net productivity in forest ecosystems (Goulden et al., 1996;Richardson et al., 2012). Near-surface imagery could be used in such ecosystems to separate phenological signals of diverse land cover types, for more accurate quantification of ecosystem services.
In addition to site heterogeneity, this study found that both the analysis methods and data sources for phenological time series affect the uncertainty associated with derived phenology dates. Dates derived from NBAR-EVI had less statistical uncertainty than dates calculated using NBAR-NDVI. Analysis methods with more flexibility for describing seasonal variation in vegetation greenness, particularly a generalized sigmoid method, resulted in lower uncertainty in estimated dates and better agreement with visual assessment of canopy phenology, demonstrating the importance of accurate functional representation of phenological time series for identification of phenophase transition dates.