Assessing the dynamics of vegetation productivity in circumpolar regions with different satellite indicators of greenness and photosynthesis

High latitude treeless ecosystems represent spatially highly heterogeneous landscapes with small net carbon fluxes and a short growing season. Reliable observations and process understanding are critical for projections of the carbon balance of climate sensitive tundra. Spaceborne remote sensing is the only tool to obtain spatially continuous and temporally resolved information on vegetation greenness and activity in remote circumpolar areas. However, confounding effects from persistent clouds, low sun elevation angles, numerous lakes, widespread surface inundation, and the sparseness of the vegetation render 5 it highly challenging. Productivity during the peak of the growing season importantly affects the total annual carbon uptake. Here, we conduct an extensive analysis of the timing of peak vegetation productivity as shown by satellite observations of complementary indicators of plant greenness and photosynthesis. The suite of indicators are: (1) MODIS-based vegetation indices (VIs) as proxies of the fraction of absorbed photosynthetically radiation; (2) VIs combined with estimates of absorbed photosynthetically active radiation (APAR); (3) sun-induced chlorophyll fluorescence (SIF) serving as a proxy for photosynthesis; 10 (4) vegetation optical depth (VOD), indicative of total water content; and (5) empirically upscaled modelled gross primary productivity (GPP). Averaged over the pan-Arctic we find a clear order of the annual peak as APAR < GPP < SIF < VIs / VOD. SIF as an indicator of photosynthesis is maximized around the time of highest annual temperatures. Model GPP peaks at a similar time like APAR. The time lag of the annual peak between APAR and instantaneous SIF fluxes indicates that the SIF data do contain information on light-use efficiency of tundra vegetation, but further detailed studies are necessary to verify 15 this. Delayed peak greenness compared to peak photosynthesis is consistently found across years and land cover classes. A particularly late peak of NDVI in regions with very small seasonality in greenness and a high amount of lakes probably originates from artefacts. Given the very short growing season in circumpolar areas, the average time difference in maximum annual photosynthetic activity and greenness/growth of 3 to 25 days (depending on the data sets chosen) is important and needs to be considered when using satellite observations as drivers in vegetation models. 20 1 Biogeosciences Discuss., https://doi.org/10.5194/bg-2018-196 Manuscript under review for journal Biogeosciences Discussion started: 23 April 2018 c © Author(s) 2018. CC BY 4.0 License.


Introduction
Landscapes in circumpolar regions are characterised by sparse vegetation, bare soil, rocks, large surface areas inundated by open water and a long snow-covered period.Despite large carbon amounts being stored in the often permanently frozen grounds, net fluxes of carbon between the land sur-face and the atmosphere are small and their CO 2 balance is close to neutrality (McGuire et al., 2012).Because of their strong sensitivity to environmental conditions, carbon exchange processes are highly variable in space and time (Olivas et al., 2011;Pirk et al., 2017;Lafleur and Humphreys, 2008;Welker et al., 2004) and an ecosystem might switch between being a carbon sink or source from year to year depending on the weather conditions (Huemmrich et al., 2010b).
Warming happens at accelerated rates compared to middle and lower latitudes (AMAP, 2012).The carbon budgets of both the tundra ecosystem and the Arctic boreal zone as a whole are undergoing major changes -with possibly strong feedbacks to climate (Pearson et al., 2013).The future evolution of the net ecosystem exchange (NEE), its component fluxes gross primary productivity (GPP) and respiration in Arctic landscapes is highly uncertain.Higher temperatures, the accompanying mineralization, and higher atmospheric CO 2 concentrations fertilise vegetation (Yi et al., 2014;Zhu et al., 2016;Welker et al., 2004).Accordingly, changes in species composition (Chapin et al., 1995) are observed and satellite records indicate a greening in large regions in the Arctic (Jia et al., 2003;Verbyla, 2008).This is interpreted as increased growth (Stow et al., 2004;Elmendorf et al., 2012;Huemmrich et al., 2010a;Chapin et al., 1995) or even as a woody encroachment into tundra (Racine et al. in Stow et al., 2004;Dass et al., 2016;Sturm et al., 2001).Yet, higher leaf mass and growth do not necessarily linearly translate into enhanced GPP in every case, as increased growth might also cause enhanced self-shading and lower nitrogen amounts per unit leaf area (Street et al., 2007;McFadden et al., 2003).A warmer climate might extend the snow-free period (Myneni et al., 1997), but there are contradicting indications of whether (Ueyama et al., 2013b;Lund et al., 2010;Kross et al., 2014) or not (Gamon et al., 2013;Oberbauer et al., 1998;López-Blanco et al., 2017;Lafleur and Humphreys, 2008) a longer growing season enhances seasonal carbon uptake and growth.Photosynthetic activity and plant growth further depend on soil moisture conditions (Gamon et al., 2013;Opała-Owczarek et al., 2018;Lafleur and Humphreys, 2008;Welker et al., 2004), therefore warming and shorter and shallower snow packs do not necessarily increase productivity (Zhang et al., 2008;Gamon et al., 2013;Yi et al., 2014;Huemmrich et al., 2010a, b;Parida and Buermann, 2014).Soil warming promotes thaw and stronger drainage.Heterogeneous respiration and carbon emissions into the atmosphere are stimulated in warmer soils at a lowered water table depth (Billings et al., 1982;Yi et al., 2014;Oechel et al., 1993;Huemmrich et al., 2010b;Commane et al., 2017).The balance between the photosynthetic carbon uptake and respirational losses is further modulated by permafrost disturbances (Cassidy et al., 2016).Polar treeless regions are spatially highly heterogeneous ecosystems (Welker et al., 2004) but have widespread full vegetation cover.NEE, GPP and respiration are governed by variable conditions re-garding the wetness and temperature, microtopography, geomorphology and type, and acidity of the soils (Kwon et al., 2006;Walker et al., 1998;Olivas et al., 2011;Emmerton et al., 2016;Pirk et al., 2017).It is not clear whether, where and when the land surface in Arctic tundra actually acts as a sink or source of CO 2 (Cahoon et al., 2012;McGuire et al., 2012) and what the direction and magnitude of changes in altered climatic conditions will be (Oechel et al., 1993;Billings, 1987;Sitch et al., 2007).This has given rise to extensive and long-term project studies of the Arctic like the Arctic-Boreal Vulnerability Experiment (ABoVE, https: //above.nasa.gov/about.html,last access: 22 August 2018) or the Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE, https://carve.jpl.nasa.gov/Missionoverview/,last access: 22 August 2018), both of which are not limited to CO 2 .
Observing carbon fluxes in these inaccessible and remote areas is difficult.Several long-term monitoring sites exist where phenological observations, spectral reflectance as well as gas flux measurements are conducted in situ, both under natural conditions and in manipulative experiments.Many studies can be found in the literature that evaluate eddy covariance (EC) or the chamber gas flux measurements with respect to the spatial patterns of NEE at a fixed point in time or in situ NEE integrated over the growing season and its variations between years (López-Blanco et al., 2017;Lund et al., 2010;Ueyama et al., 2013a;McFadden et al., 2003;Williams and Rastetter, 1999;Marushchak et al., 2013;Kross et al., 2014).However, only a few sites exist compared to temperate regions, and observations are usually not done in a continuous manner over the complete year but during individual measurement campaigns or dedicated periods during the growing season.Even if automated instrumentation can provide more continuous measurements throughout the year, it is still hampered by the difficulty of access in the case of equipment failure.Compared to more temperate sites, tundra poses additional challenges on the calculation of NEE and its component flux GPP and respiration (Pirk et al., 2017).Due to the small magnitudes of the net fluxes, different flux calculation methods might even differ in whether they indicate a source or a sink at a given time (Pirk et al., 2017).Snow and soil freezing can act as a barrier for gas exchange with the atmosphere and cause a temporal decoupling between the registration at the sensors and when the gas concentrations have actually been changed by the heterotrophic respiration in the soil (Arneth et al., 2006) or by the photosynthesis of evergreens under the snow (Starr and Oberbauer, 2003).Continuous illumination during the polar day challenges temperature-respiration relationships obtained from night-time data (Runkle et al., 2013).Further, the heterogeneity of the landscape poses limits to the spatial representativeness of the relationships between the carbon fluxes and meteorological and soil conditions that have been identified in situ (Pirk et al., 2017;Tuovinen et al., 2018).Therefore, in spatial up-scaling exercises (Ueyama et al., 2013a;Marushchak et al., 2013;Huemmrich et al., 2013;Tramontana et al., 2016), strong extrapolations are necessary.Yet, the modelling of the future evolution of the vegetation and carbon fluxes (including their timing and magnitude) in circumpolar areas requires an understanding of GPP and respiration as well as accurate spatially and temporally explicit observations of their drivers.
Satellite remote sensing can help in constraining the component flux of GPP and additionally in extending point observations to larger areas with repetitive coverage in time.Depending on the monitoring approach, different assets and limitations need to be considered to infer GPP.Optical reflectance measurements can give an indication of the abundance of green plant material and hence photosynthetic potential.From spectral observations of greenness, information can be inferred on the fraction of incident photosynthetically active radiation (PAR) that is absorbed (fAPAR) and can potentially be used for carbon fixation.Following the concept of the light-use efficiency of plant productivity by Monteith (1972), the amount of absorbed radiation (APAR, the product of fAPAR and the incident PAR) is an important determinant of spatial and seasonal variations in GPP along with the efficiency with which the absorbed energy is used in carbon fixation.Site-level studies have confirmed a highly linear relationship between APAR and GPP (Huemmrich et al., 2010a, b).Indeed, in the last decades, spatial extrapolations of in situ observations of carbon fluxes in tundra and peatland showed the skill of indicators of greenness (leaf area index -LAIand reflectance-based indices like the NDVI or the green ratio) as predictors for GPP and NEE (Ueyama et al., 2013a, b;Chadburn et al., 2017;McFadden et al., 2003;Williams and Rastetter, 1999;Street et al., 2007;Marushchak et al., 2013).At many sites, moss makes up 2 or 3 times the amount of biomass of vascular plants.However, its photosynthetic capacity is much lower (Yuan et al., 2014;Williams and Rastetter, 1999;Huemmrich et al., 2013;Zona et al., 2011), and its seasonality is often dissimilar (Gamon et al., 2013) as a consequence of their different sensitivities to environmental conditions (Zona et al., 2011).Microtopography affects moisture conditions, even within small elevation changes of about 1 m (Olivas et al., 2011;Gamon et al., 2013;Pirk et al., 2017).As a consequence, distinct spatial distributions of the plant functional types and highly variable patterns of photosynthetic light-use efficiency are observed.Vascular plants prefer lower, wetter places, and their growth increases biomass, productivity, NDVI and LAI.However, when the ground becomes drier, NDVI will increase, but actual productivity will decline (Olivas et al., 2011;Gamon et al., 2013;Buchhorn et al., 2013).Consequently, the wetness of the surface confounds the interpretation of spectral reflectance with respect to productivity, which is problematic, as soils are often watersaturated (Stow et al., 2004).Next to the confounding effect of moisture on spectral reflectance, changes in GPP have been observed as not necessarily translating into changes in the NDVI (Olivas et al., 2011).The dynamics of field mea-surements of the NDVI are not necessarily related to total photosynthetic phytomass but are rather driven by only parts of the live foliar component (Riedel et al., 2005).In addition to these challenges, spectral reflectance observations are affected by large signals from the background and shadows cast by the microtopography and vegetation itself.Snow and open water from the many rivers, small ponds and thaw lakes (globally, the highest abundance and areal coverage of lakes is between 55 and 75 • N, Verpoorter et al., 2014), and litter and dry plant materials influence the spectra to seasonally changing extents.Further, persistent cloud cover, low illumination and viewing geometry (Stow et al., 2004;Laidler and Treitz, 2003), and the relatively large pixel size compared to the high heterogeneity of the landscape render reflectancebased observations of circumpolar productivity difficult.
Recently, independent and complementary approaches to spectral reflectance have become available to remotely study vegetation dynamics.First, sun-induced chlorophyll fluorescence (SIF) is an electromagnetic signal emitted by chlorophyll as a "by-product" of photosynthesis.Because it is directly related to photosynthetic activity (e.g.Porcar-Castell et al., 2014) it is expected to give a more direct and accurate picture of actual photosynthesis (as compared to greenness or growth) and is much less affected than vegetation indices by open water, snow or background effects, the heterogeneity of the land surface, and plant functional types.However, the footprints of the sensors from which SIF measurements are available, which cover the course of several years, are very large and integrate over many different growing conditions.Further, the SIF signal is generally weak in the tundra regions due to the low vegetation abundance and photosynthetic rates, and in combination with low illumination angles, it is subject to high noise levels.
A second type of complementary satellite information lies in passive microwave remote sensing.Specifically, vegetation optical depth (VOD) is a radiometric variable describing the attenuation of microwave radiation emitted from the soil and the vegetation itself due to the water contained in the canopy.It can therefore be directly related to vegetation water content and biomass.VOD increases with vegetation density, but is strongly controlled by vegetation emissions in very dense vegetation (Liu et al., 2011).Depending on the wavelength of observation, the signal is sensitive to different depths in the canopy and objects of variable sizes (e.g.small objects like leaves versus large trunks or branches).Following Teubner et al. (2018), in moderately and sparsely vegetated areas, there is a chain of proportionalities from VOD to GPP.VOD indicates the total water content, which is related to leaf area and is, in turn, an important determinant of GPP.In their comprehensive study, Teubner et al. (2018) evaluated the temporal behaviour between different VOD data sets, the modelled GPP and SIF and found widespread high positive correlations both between the raw time series as well as patterns of anomalies globally.Although dynamics in tundra vegetation have not been explored explicitly, correla-tions between the VOD and GPP were consistently high in landscapes characterised by shrubs, grasses or sparse vegetation.Similarly, the highest correlations between phenological dates derived from the VOD and vegetation indices were obtained in low biomass regions (Jones et al., 2011).VOD observations are insensitive to cloud cover and to variations in day light, a strong advantage in the high latitudes of interest in our study.However, as for SIF, currently available satellite observations have a coarse spatial resolution compared to optical measurements.Further, the careful corrections of the effects of soil moisture, open water and frozen grounds, and snow and ice, among others are necessary in the retrieval, and it is therefore not clear whether VOD can be a useful parameter in evaluating the vegetation dynamics in the specific context of tundra.
Neither greenness, SIF nor VOD can be directly translated into the amount of carbon taken up through photosynthesis.Nevertheless, they all represent important observation-based driving variables for the modelling of the tundra carbon exchanges at the landscape scale and over the course of multiple years (e.g.Luus et al., 2017).Therefore, their ability to accurately represent the timing and relative changes of photosynthetic activity and growth is of key importance for realistic model estimates of carbon fluxes.In this study, we compare the timing of the peak growing season as indicated by several satellite vegetation indices, the VOD and SIF, in circumpolar treeless regions.We aim at analysing their complementary information content with respect to maximum greenness and photosynthetic activity -despite all aforementioned challenges -and relate them to environmental conditions.In addition, GPP empirically up-scaled from eddycovariance observations using satellite measurements of different variables describing the land surface and meteorological reanalysis data (Tramontana et al., 2016) is included in the study.In doing so, a comprehensive evaluation of several state-of-the-art satellite-based products is achieved in this study with a special focus on the timing of the peak growing season, as this represents the most important period with respect to total annual carbon uptake.In addition to the use of the broad array of complementary space-borne data sets, we perform this analysis for the total circumpolar pan-Arctic treeless regions, and it therefore represents an extension with respect to the majority of published tundra ecosystem studies that are mostly confined to specific regions like Alaska.
2 Methods and materials

Methods
The different vegetation proxies are evaluated at a 0.5 • spatial resolution and with daily sampling.A temporal running mean in a window of 16 days is applied to all data sets.The resulting data still contain values for every day in a year, but the effective temporal resolution corresponds to 16 days.
Gaps due to missing data are not aligned between data sets.The timing of the annual maximum is defined as the average day of year (DOY) of all days at which the values exceed the 95th quantile of all valid values of the time series in a year in a given pixel.Because of frequently low data quality and long and intermittent data gaps in those high latitude regions of interest, we mostly base our analysis on multi-year averages of the DOYs of annual maximum (henceforth referred to as the avg.peak).We test the uncertainties by bootstrapping each annual time series 50 times per year and pixel for selected vegetation proxies (DOYs sampled consistently between vegetation proxies, with a replacement, and restricted to DOYs 100-300).
We use tower eddy-covariance measurements as a test of consistency between satellite observations sampled at the towers and site-level conditions.

Environmental variables
Air temperature at a height of 2 m (t2m) every 6 h between 2007-2015 is obtained from ERAInterim reanalysis data (Dee et al., 2011) and aggregated to 16-day temporal resolution with daily sampling.
Daily global radiation (Rg) for the years 2007-2015 is obtained from measurements of the Clouds and the Earth's Radiant Energy System (CERES Ed4A, Wielicki et al., 1996;Doelling et al., 2013) onboard the Aqua and Terra satellites.From the 1 • spatial resolution product (the "SYN1deg-Day product" with all-sky surface shortwave downward fluxes and initial fluxes), we disaggregate to 0.5 • spatial resolution by bilinear interpolation.Subsequently, daily data are averaged in a daily moving window of 16 days.
We further include surface-soil moisture (SM, 0-10 cm depth in m 3 m −3 ) model results from the Global Land Evaporation Amsterdam Model (GLEAM) project (v3.1a, Miralles et al., 2011;Martens et al., 2017).GLEAM is provided at a daily temporal and 0.25 • spatial resolution for the years 2007-2015.For the analysis we aggregate them to a 0.5 • and 16-day resolution with daily sampling.In the case of moisture-related variables, we will also explore the timing of the annual minimum in order to get an indication of the potential moisture stress or confounding effects on reflectance measurements.Accordingly, the timing of the minima are defined as the average of all DOYs at which the values are below the 5th quantile of all valid values in a year in a given pixel.

Reflectance-based indices
We use MODIS reflectance measurements to obtain the enhanced vegetation index or EVI (Huete et al., 2002), the normalised difference vegetation index or NDVI (Tucker, 1979), and the near infrared reflectance of vegetation or NIRv (Bad-gley et al., 2017) as proxies of greenness for the years 2007-2015.These indices have been calculated from the nadir bidirectional reflectance distribution function Adjusted Reflectance (NBAR) from the MODIS MCD43C4v006 product (MCD43C4, 2017) at 0.05 • .This means that the reflectance values are modelled to a value as if they were viewed from directly above.After a quality check (only pixels with bidirectional reflectance distribution function -BRDF; Quality flags 0, 1 retained) and snow filter (all pixel values containing any snow removed) the data are aggregated to the 0.5 • spatial resolution and left at their native temporal resolution of 16 days with daily sampling.We will refer to these throughout the paper as EVI, NDVI and NIRv.
As the amount of incoming photosynthetically active radiation is proportional to the total downwelling shortwave radiation, we calculate an estimate of APAR as the product of global radiation and EVI (denoted EVI.Rg) or NDVI (denoted NDVI.Rg), both of which are assumed to be a valid approximation of fAPAR here.In the following we will refer to both of them together as APAR and will otherwise refer separately to EVI.Rg and NDVI.Rg.
We additionally include the MODIS vegetation index products of the NDVI and EVI from Aqua MYD13C1v006 (MYD13C1, 2017) and Terra MOD13C1v006 (MOD13C1, 2017) in the analysis.In contrast to EVI, NDVI and NIRv from the MCD43C4 data, those are obtained from reflectances with different viewing angles that do not necessarily correspond to nadir.Including them in the comparison can therefore help to get an idea of the influence of directional effects on the seasonality and of the consistency of the results.From the 0.05 • products generated with an 8-day frequency using a period encompassing the previous 8 and the following 8 days of acquisitions, we removed data that were not of good quality using the vegetation index (VI) quality indicator.We further remove pixel values that are flagged as cloudy, contain snow or ice or those that were not processed as indicated by the quality reliability flag.The remaining pixel values are aggregated to 0.5 • grid cells.Throughout the paper we will name these EVI.VIproduct and NDVI.VIproduct or refer to them together as the MODIS VIproduct.The data from the MODIS VIproduct are different from all other data sets in that they are sampled every 8 days, not daily.

Vegetation optical depth and land surface parameters
A data set of various land parameters simultaneously derived from passive microwave measurements of the AMSR-E onboard the Aqua and AMSR2 onboard the GCOM-W1 is used for the years 2007-2015 (v2, here called AMSR-E/2; Du et al., 2017a, b).The data records are combined but not continuous.In 2011 data are available until DOY277 and restart after that in 2012 only at DOY206.Because the peak growing season is covered in 2011, we use data from 2011 but not from 2012.Of the observations made in de-scending orbits with an equatorial crossing at 01:30 LT, we use the VOD and volumetric surface-soil moisture (0-1 cm depth in cm 3 cm −3 ) derived from X-band (10.7 GHz) as well as estimates of the fraction of open water.We use the descending orbit as retrievals are generally more accurate when vertical temperature gradients are low (Liu et al., 2011).The retrieval specifically accounts for the effects of open water on the VOD and surface-soil moisture (Du et al., 2017b).The accompanying quality flags are used to remove all pixel values observed under non-favourable conditions with respect to frozen soils, snow, ice or large areas of open water on the surface, very dense vegetation, precipitation, radio frequency interference, or microwave signal saturation.The daily files with the native 25 km resolution data in an Equal-Area Scalable Earth Grid (EASE-grid) projection are first filtered for quality, reprojected to 0.25 • longitude and latitude relative to WGS84, and are subsequently aggregated to 0.5 • .For temporal consistency we aggregate to 16 days with daily sampling, as in all other data sets.

Sun-induced chlorophyll fluorescence
Sun-induced chlorophyll fluorescence (SIF) as a proxy of photosynthetic activity is retrieved from GOME-2 measurements onboard Metop-A at 740 nm (Köhler et al., 2015, it will henceforth be called SIF GFZ) for January 2007 until the end of 2015.We remove the individual measurements that have unfavourable observational conditions, namely those that have an effective cloud fraction of more than 50 % (based on the FRESCO cloud mask that is provided with GOME2 L1b data, which are based on reflectances in the O2-A band; Wang et al., 2008), those that are measured before 08:00 or after 14:00 local solar time (which is important as in high latitudes during the solar day additional measurements in the evening are possible but subject to high noise) or under sun-zenith angles of more than 70 • , and those whose retrieval resulted in a residual sum of squares larger than 2 W m −2 sr −1 µm −1 .We also remove those SIF values for which no cloud fraction information is available.The individual remaining measurements are aggregated to the 0.5 • resolution based on the centre coordinates of a given footprint over the 16-day intervals, similar to the MODIS data, for each individual year to obtain a time series.We added to the comparison SIF data retrieved from GOME-2 with a slightly different method (Joiner et al., 2013, 2016, V26, henceforth SIF NASA).The individual measurements are filtered in the same way as for the SIF GFZ data set, except for the fact that the data are delivered filtered for an effective cloud fraction of smaller than 0.3 (we treat negative cloud fractions as zero).The SIF NASA uses a similar, but not identical method to obtain effective cloud fractions by using cloud radiance fractions at 865 nm (Joiner et al., 2012).We then average in the same way spatially and temporally as previously done for the years 2007-2015.
As SIF represents an instantaneous observation at a given time of the day and a comparison to the GPP seasonality would be hampered by the fact that GPP represents an average daily value (Zhang et al., 2018), additional comparisons are carried out for the SIF observations scaled to daily values (henceforth SIF.daily.intGFZ).Through a geometrical approximation of incoming PAR by the cosine of the sun-zenith angle, the correction to daily values is achieved by the multiplication of the instantaneous SIF by the ratio of the daily integrated (in 10 min steps) cosine of the sun-zenith angle and the cosine of the sun angle at the time of measurement.This correction is expected to account for the effects of seasonally and daily changing illumination.Caution is warranted for this correction, as it assumes that the same environmental conditions prevail over the entire day and may further amplify noise.
SIF can be approximated in a similar way as the GPP following Monteith (1972), as the product of fAPAR, PAR, which is approximated as cos(SZA), and the efficiency with which the energy is used in fluorescence emission.Hence, the comparison of SIF to vegetation indices is also more appropriate if one accounts for the illumination effects on SIF that are not included in the greenness indices.In that case, the SIF values are normalised by the cosine of the sun-zenith angle at the time of measurement and used in the analysis (henceforth SIF.cosSZA GFZ and SIF.cosSZA NASA).According to the SIF Monteith model, SIF.cos(SZA) therefore represents a convolution of canopy fAPAR and the efficiency of fluorescence emissions.
As a cross check of the plausibility of the GOME-2 SIF additional comparisons to SIF at 757 nm retrieved from OCO-2 are done using the OCO-2 SIF lite files (B8100r) from September 2014 to mid October 2017 (OCO-2 Science Team/Michael Gunson, 2017; Frankenberg et al., 2014;Sun et al., 2018).We filter all measurements taken with a sun-zenith angle of less than 70 • in nadir mode over regions whose IGBP land cover does not consist of water, forests or crops or is not urban or mosaic.Samplings of the OCO-2 SIF (henceforth OCO2) and the OCO-2 SIF corrected for illumination conditions through the division by the cos(SZA) in the same way as the GOME-2 measurements (henceforth OCO2.cosSZA;OCO2 and GOME-2 have different overpass times and therefore different instantaneous illuminations at the time of measurement) are averaged to a climatology based on 16-day averages sampled daily as a spatial average over different smaller regions of interest.The regional averaging is necessary, as OCO-2 is not continuously sampled like GOME-2.

Modelled GPP (FLUXCOM)
Another indicator of photosynthetic activity is provided by the modelled GPP simulations from the FLUXCOM initiative (http://www.fluxcom.org/products.html, last access: 20 August 2018 Tramontana et al., 2016).The relation-ships between land surface and environmental variables and land-atmosphere energy and carbon fluxes learned at FLUXNET eddy-covariance sites in the La-Thuile data set (http://fluxnet.fluxdata.org/data/la-thuile-dataset/,last access: 10 August 2018) are spatially up-scaled to the globe using a set of machine-learning techniques.FLUXCOM GPP is generated in two set-ups, the "remote sensing set-up (RS-GPP)" and the "meteorology + remote sensing (METGPP)" set-up.The former uses satellite-observed land surface conditions to estimate the GPP at the 8-daily temporal resolution and 1/12 • , and we use the years 2007-2015.The METGPP represents an ensemble of GPP, where the mean annual cycle of land surface conditions and additional information on actual meteorological conditions from reanalysis is used in the prediction at the 0.5 • and daily resolution.We restrict the METGPP data to the years 2007-2010, as simulation results from all ensemble members are available for those years.We aggregate to the 16-day averages sampled every day, consistent with the MODIS sampling in the MCD43C4v006 data.The RSGPP is linearly interpolated to daily values at 1/12 • .The subsequent aggregation to 0.5 • and running means over 16 days match the spatiotemporal resolution to all other data sets.Together RSGPP and METGPP are referred to as the modelled GPP.

Land cover
We use the ESA CCI land-cover classification and aggregate it to the broader classes of mossy, bare/sparse, grassy/herbaceous, woody, water covered and others.The ESA CCI provides a tool to convert discrete land-cover classes to continuous vegetation fractions (http://maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf, last access: 23 February 2018, and Poulter et al., 2015), and we use it to obtain land-cover fractions from the native 300 m pixels in 0.5 • pixels for the period 2008-2012 (Fig. A1 in Appendix A).The distribution of moss in these products is expected to be problematic, because it is a complicated class to characterise in terms of global land-cover products.Being partly based on regional maps with varying thematic detail in their legends, it is possible that this moss class in the ESA CCI is not always accurately identified over certain regions, explaining why moss cover is barely indicated in Siberia.

Eddy-covariance-derived GPP
We use eddy-covariance measurements from a subset of the FLUXNET2015 data (http://fluxnet.fluxdata.org/,last access: 29 July 2018; please note that FLUXCOM has been trained on the earlier La Thuile dataset).We chose sites based on data availability after 2007, situated north of 55 • and that were not characterised as forests (details on sites and siteyears used are provided in Table A1 in Appendix A).We use the daily GPP derived using night-time (NT) and daytime (DT) partitioning and aggregate it to 16 day-averages, masking out those days where less than 80 % of reliable measured and gap-filled NEE values were available (with variable Ustar threshold and reference selected on the basis of the model efficiency).Site-level air temperature (consolidated by ERA-Interim and the MDS algorithm) was filtered for good quality and gap-filling in the same way.
We also include the daily GPP derived from NEE by night-time partitioning (Reichstein et al., 2005) from measurements in Cherskiy, Russia (control site RU-Ch2; data courtesy Mathias Göckede, Max-Planck-Institute for Biogeochemistry, Jena, Germany), where only days are used in the average with short data gaps filled (less than 1 day).

The study area
Operational satellite-derived land-cover data sets exhibit substantial differences in the classes they assign to circumpolar regions.We compared the ESA CCI land cover, Glo-beLand30 by Chen et al. (2015) and the IGBP classification from MODIS MCD12C1 and found that the classification of the same area can range from barren to grasslands to open shrub lands depending on the chosen data set (not shown).A clear and generally accepted delineation of a class "tundra" is not given.We therefore define our study area based on tree cover as "polar treeless regions".
Global data on annual forest cover gain and loss have been provided by Hansen et al. (2013) based on Landsat images.We take 2009 as representative of the period of investigation.Based on information on the global tree cover in 2000, the yearly losses until 2009 and the gains until 2009 (assuming that the growth between 2000 and 2012 that is given in the data set is linear), global tree cover in 2009 is estimated.We aggregated from the original 30 arcsec resolution to 0.5 • .Regions with less than 5 % tree cover north of 55 • N are fixed as our study area (cf.Fig. 1).In the data from Hansen et al. (2013), a tree is defined to have a minimum height of 5 m, which is tall for circumpolar areas.The studied area will therefore include parts of the taiga-tundra transition zone, tundra and polar deserts.
The landscape in this study area exhibits a complex microtopography caused by polygons and is characterised by abundant (thaw) lakes.Though vegetation often fully covers the ground (Stow et al., 2004), vascular plant cover is sparse and the growing season and carbon uptake period, consisting of 1 to 3 months, are short in high-latitude tundra.The RGB images from Sentinel-2 for selected spots in Fig. 1 give examples of what tundra landscapes and the tundra-taiga transition can look like.Further, for small areas on the Alaska North Slope and at the bottom of the Taymyr Peninsula (the corresponding places are indicated in Fig. 1), Figs. 2 and 3

Timing of the annual peak in vegetation activity and greenness
The distribution of the timing of the annual maximum in polar treeless regions (Fig. 4, regionally and over years) shows a distinct order of the satellite vegetation proxies.All proxies indicate the highest plant activity and biomass after the summer solstice.While APAR is highest around DOY191 (10 July), the METGPP indicates the maximum photosynthetic activity at a similar time and the RSGPP 4 days later.It is the time when surface-soil moisture is almost at minimum, according to GLEAM (Fig. A3).The SIF GFZ peaks only about 1 week later (4 days later in the case of SIF.daily.int

Spatial patterns of the annual maxima of the satellite vegetation proxies and of the lags between them
There is considerable spatial variability in the timing of the maximum for each satellite indicator (Fig. 5).In areas close to the Date Line (i.e.easternmost Siberia and Alaska), most proxies peak slightly earlier than in northern Canada (mainland and islands) or the coasts of western and central Siberia (i.e.Taymyr Peninsula and regions of the Lena delta and the Laptev Strait).In general, the spatial pattern of the timing of the annual maximum of the satellite indicators closely corresponds qualitatively to the dynamics seen in air temperature and partly in the surface-soil moisture (GLEAM).Incoming light shows partly reversed patterns with an earlier maximum irradiance in northern Canada and western-central Siberia.
www.biogeosciences.net/15/6221/2018/Biogeosciences, 15, 6221-6256, 2018 We test whether the annual maximum is shifted systematically or whether there are spatial gradients in the peak lag between proxies and plot maps of the average lag between the peaks of selected proxies and NDVI.We take the NDVI because it is the most widely used vegetation index for productivity studies, both from the satellite as well as in ground-based observations, particularly in the polar tundra.Figure 6 confirms the general pattern of a shifted annual peak of NDVI as compared to NDVI.Rg, RSGPP, SIF GFZ, VOD and the similar timing to EVI over the entire study area.The NDVI.Rg, RSGPP, SIF GFZ, VOD and air temperature reach their annual maxima at significantly different times than the NDVI (Fig. A4). Figure 6 further shows that the lag is not homogeneous in space, but that it is largest in vast areas of polar desert in northern Canada, on Iceland and in the northern part of the Siberian Taymyr Peninsula.Interestingly, the tundra regions that exhibit the largest time difference of the annual maximum between the individual vegetation proxies and NDVI tend to correspond to those where the annual maximum is reached comparatively late (Fig. 5).In contrast to the denser vegetation cover on the Siberian coast, very sparse vegetation (i.e.devoid of shrubs or woody vegetation) characterises the northern Canadian regions, both the islands and the mainland as well as the northern part of the Taymyr Peninsula (cf.Fig. A1; Walker et al., 2005, their Fig. 1).In addition, there is a comparatively high amount of lakes and a high fraction of barren regions in the areas of large NDVI lags in the central Siberian coastal areas (close to the Laptev Strait), coastal Alaska North Slope and in mainland Canada northwest of Hudson Bay.Similar results are obtained for EVI.Rg, METGPP, SIF.daily.intGFZ (Fig. A5).Next to these general observations, the VOD indicates a much later peak in smaller but contiguous areas in northwestern Canada as well as in the lake-rich regions on the Siberian coast (again the coast close to the Laptev Strait).The NDVI lags to METGPP are generally slightly larger than those to the RSGPP.No outstanding region emerges for the lags when compared to SIF NASA.The illumination correction of SIF (SIF.cosSZA)reduces the time difference to the NDVI when compared to the instantaneous SIF, as expected.

Spatial patterns of peak timing and of peak lags to the NDVI in relation to environmental variables
Putting the annual maximum of the NDVI into relation with that of the environmental variables, partially similar spatial patterns to the vegetation proxies emerge (Fig. 6).More precisely, air temperature and soil moisture (GLEAM minimum) peak earlier everywhere and have the largest time difference to the NDVI in the sparsely vegetated areas on Iceland, north- It is further interesting to test whether a certain temporal relationship between the maximum in photosynthesis or greenness and the dynamics of the environmental conditions holds across years.In Fig. 7, the environmental variable with the highest absolute value of the rank correlation, with the timing of the maximum of the given satellite proxy across years in a spatial moving window, is displayed.The important role of energy-related variables for vegetation activity and growth, mostly temperature, is highlighted by the highest widespread correlations with temperature and radiation for RSGPP, SIF GFZ, EVI and NDVI, while moisture-related variables are most important only in about one third of the pixels.The VOD and NDVI.Rg do show a strong relationship, with the annual temperature maximum only in 25 % to 30 % of the pixels and about half of the pixels show a higher importance of soil moisture or open water on the surface.It is worth noting that the contiguous regions with higher relationships with moisture-related variables for RSGPP, EVI and NDVI are situated in northwestern Canada and parts of the Taymyr (Fig. A6).
Interestingly, apart from the strong correlative relationship of photosynthesis with temperature, the lags in the annual peak timing between the temperature and SIF GFZ (Fig. A7) are barely statistically significant.Peak lags between the RS-GPP and temperature are only significant in northern Canada and easternmost Siberia.Conversely, EVI, NDVI, VOD and in the largest parts of the study area also NDVI.Rg, peak at significantly different times in the year than air temperature.

Consistency of the annual peak lags between different land covers and across years
The fact that there is spatial variability in the shift between the annual peaks of the satellite proxies relative to the NDVI suggests that the proxies differ in how strongly they indicate the spatial gradients in the peak DOY.We test to what extent the shift of the annual maximum holds across different years and whether there is a dependency on the land cover.This behaviour between years is qualitatively similar to the RSGPP-NDVI and SIF GFZ-NDVI but is rather caused by the variability in the peak timing of temperature than of the NDVI (not shown).

Discussion
Despite the considerable challenges for remote sensing applications in high latitudes and inherent comparatively large variability in the timing of the annual maximum, the differences in the peak timing of families or groups of key satellite indicators of plant productivity appear to be ordered in the polar tundra.Absorbed energy (APAR, both EVI.Rg and NDVI.Rg) is maximised roughly 1 month after peak irradiance in early July.Regarding the modelled GPP and SIF as indicators of photosynthetic activity, there is a time lag between them of 4 days to 2.5 weeks, depending on the combination of data sets.The modelled GPP peaks at a similar time to APAR.SIF GFZ reaches its maximum 1 to 1.5 weeks after (21 July) at a very similar time to air temperature, but SIF NASA only reaches its maximum at the end of July (DOY 209, 28 July).Greenness (EVI and NIRv) culminates 3 weeks after APAR and 1.5 weeks after SIF GFZ.NDVI maximum is delayed an average of 3 more days.Peak vegetation water content is indicated by the VOD at a similar time as EVI and NIRv.The spread in space and across years is, however, large.Vegetation activity is highly (though not exclusively) temperature-driven in tundra (Stow et al., 2004;May et al., 2017;Chapin, 1987).In the beginning of the growing season, light is abundant and plants rely on rhizome nutrient and car- bohydrate reserves to rapidly increase photosynthetic activity (Arneth et al., 2006) and growth (Chapin, 1987) through exposed moss, lichens and evergreens after snow melt and the rapid leaf-out process of deciduous plants.The fact that the photosynthesis seasonal maximum is reached in close and significant (SIF GFZ and RSGPP) temporal agreement with air temperature adds plausibility to the observed patterns in the modelled GPP and SIF.As the time of favourable environmental conditions for growth is short, several plant types strongly build up their photosynthetic capacity until late in the growing season to make use of the available light and temperature (Rogers et al., 2017).At the time when greenness is at a maximum, photosynthetic rates are decreasing, as PAR is already strongly reduced and the temperature peak has also passed.The peak timing of the SIF before greenness might hence indicate that although photosynthetic potential (fAPAR) is not yet fully developed, plants profit from the higher amounts of light and maximal temperature still present in the year to reach peak photosynthetic rates in the second half of July.The prolonged build-up of photosynthates in plant tissue results in a delayed maximum of green biomass.The widespread agreement between greenness proxies and photosynthesis proxies in high corre-lation with the temperature maximum across years and space (Fig. 7) are also indicative of the coordinated dynamics of annual maximum photosynthetic activity and the resulting peak photosynthetic potential (fAPAR) with temperature.
The results of Chadburn et al. (2017) support this interpretation of the patterns in the satellite observations.In their site-level evaluation of carbon fluxes in the high latitudes in Earth system models, GPP always depends on temperature but is limited rather by the LAI in the first part of the growing season until the end of July.After that, GPP is more driven by light, which qualitatively agrees with the earlier photosynthesis peak seen in our results.Besides, at the site-level, gas flux measurements show a similar timing of the maximum GPP in the first half of July or mid-July (Emmerton et al., 2016) and at the time of the annual temperature peak (Kross et al., 2014;Welker et al., 2004).This is similar to Lafleur and Humphreys (2008), who report on the largest annual site-level NEE after the summer solstice near the annual temperature maximum between DOYs190-210 and the dominant role of gross ecosystem productivity (GEP) in driving these dynamics.
An interesting aspect of the general time lags between proxies is the 10-to 12-day time difference between peak www.biogeosciences.net/15/6221/2018/Biogeosciences, 15, 6221-6256, 2018 APAR and peak SIF GFZ.According to the Monteith model for SIF, the observation of a time difference of peak APAR and the peak SIF suggests that SIF might contain information on the temporal dynamics of the actual photosynthetic light-use-efficiency of tundra vegetation.Circum-arctic vegetation is adapted to low light intensities to also allow photosynthesis at low irradiance (Chapin, 1987;Rogers et al., 2017, and references therein).Consequently, photosynthesis will rapidly become light-saturated, a situation that calls for high levels of non-photochemical quenching in order to avoid photodamage and inhibition by excess energy.Under these conditions, the efficiencies of carbon fixation and fluorescence emission are positively correlated (Porcar-Castell et al., 2014).Consequently, our results indicate a potential benefit of also using SIF in modelling the photosynthetic carbon uptake in the circumpolar tundra for its apparent sensitivity to both APAR and photosynthetic light-use efficiency.
Although they did not report on results on GPP, Luus et al. (2017) found higher agreement between the modelled NEE and eddy-covariance-derived NEE when phenology is prescribed by SIF instead of EVI in the tundra in Alaska.
Although the modelled GPP, SIF GFZ and SIF NASA are indicators of photosynthetic activity and they peak closer to the annual temperature maximum than both the APAR and greenness indices, they do differ in peak timing as well.In the following, we test a suite of possible explanations from a physiological decoupling of SIF and GPP to artefacts originating from data processing and data characteristics.
1.The time difference of the yearly maximum of the GPP model and SIF GFZ cannot be fully explained.Since the SIF maximum is reached in close and statistically significant temporal agreement with air temperature, it might indicate that SIF shows a higher sensitivity of photosynthetic rates to temperature.The earlier peak of the modelled GPP might be explained by a possible higher sensitivity to radiation as it is challenging to model the effects of the water table depth and temperature acclimation.This is especially true for the MET-GPP that culminates slightly earlier than the RSGPP, which is driven by a mean seasonal cycle and greenness that is not temporally resolved.Generally, the model performance of the modelled GPP is reduced in extreme climates (Tramontana et al., 2016).Furthermore, the FLUXCOM GPP might not accurately represent the GPP in tundra due to the small size of training data.The FLUXCOM GPP is trained at the FLUXNET sites and according to Tramontana et al. (2016), there are 11 sites north of 55 • that are not classified as forest or temperate and serve the modelling of the GPP in our study area.
Five of them are located north of 65 • and the three training sites classified as arctic are all located in Alaska.
We attempted to get an indication of whether -compared to an in situ eddy-covariance-derived GPP and site-level temperature -there is a systematic shift of either the modelled GPP (for the 0.5 • grid box whose centre coordinates are closest to the flux tower) or the SIF GFZ (16-day averages of all GOME-2 footprints with centre coordinates within a radius of 30 km from the flux tower).Such comparisons of satellite observations with site-level measurements can be very helpful for the interpretation of the patterns seen from space, but here they remain inconclusive.There is agreement between the site-level and regional analyses with respect to the order of the modelled GPP before temperature, but the order regarding the time of the peak SIF GFZ and temperature is reversed (SIF GFZ earlier at the site level, but after temperature at the regional scale, Figs.A9, A10).The comparison is hampered by a number of issues, including the (i) mismatch of scale and spatial representativeness, (ii) limited temporal overlap of site data with satellite records, (iii) data quality of the NEE and temperature at the site level (e.g.Figs.A11-A14), and (iv) quality of partitioning at the site level (partly strongly different behaviour between partitioning methods; see Figs.A12, A14, though they mostly converge versus the peak growing season; see also Runkle et al., 2013;Parazoo et al., 2018), (v) temporal scale of daily integrated EC measurements and instantaneous SIF observations in the morning (Parazoo et al., 2018, who find similar effects in studying the start of the growing season).These issues clearly suggest that EC cannot always be used as the "truth" in evaluating satellite observations.At the same time these results strongly underline the observational problems in the tundra explained in the introduction and call for more in situ measurements that are well characterised and understood in order to interpret the signals seen from the satellite.
As we do not find a physiological explanation, artefacts in SIF might originate from seasonal cloud cover as well as from satellite over-pass time and explain the time difference in the annual peak between the modelled GPP and SIF GFZ.Clouds affect the SIF values, both physiologically at the leaf level and on its way from the canopy to the satellite.A possible bias of rather clear-sky instantaneous observations of SIF in the morning hours compared to the modelled GPP, which is an all-day measure, might occur (Parazoo et al., 2018).Although there have been no tests in tundra, in both model simulations (Frankenberg et al., 2012) and empirical analyses (Köhler et al., 2015) choosing different thresholds of cloud cover does not strongly affect temporal patterns of SIF, and a large fraction can still be detected in moderately thick clouds.In our tests (Fig. A15), stricter cloud filters tend to slightly enlarge the lag between the modelled GPP and SIF GFZ, while SIF NASA shows no change.Conversely, SIF GFZ peak time shows no sensitivity to over-pass time, while in SIF NASA, noon over passes indicate a slightly earlier peak than using measurements between 08:00 and 14:00 LST Although data availability becomes problematic in the case of OCO-2, resulting in discontinuous climatologies, there is mostly agreement with OCO-2 SIF when averaged from larger regions (Fig. A17).This suggests that the peak timing obtained from GOME-2 SIF GFZ observations is reliable.
2. There is also a relatively large inconsistency between SIF GFZ and SIF NASA.We conducted tests in which we strictly filtered for the same cloud fractions (0.3) and over-pass times (08:00-14:00 LST) in both data sets.We did not attempt a one-to-one allocation of individual soundings between data sets, and it must be noted that the cloud fraction is defined in a slightly different way between SIF GFZ and SIF NASA (see methods).Indeed, a stricter cloud filter of 0.3 instead of 0.5 delays the SIF GFZ peak slightly, bringing it into closer to but not reaching a full temporal agreement with the SIF NASA peak (Fig. A15).Strict filtering for midday over pass times advances (as expected; Parazoo et al., 2018) the annual peak, but for other combinations of over-pass filters no systematics appear.
We argue that the NASA data set is more prone to noise (for example, for retrievals over bright surfaces when there is partial snow cover) due to the generally lower absolute values that result from a narrower retrieval window and that this severely affects the identification of the annual peak.This is indicated by the large spread in Fig. 4, by the less pronounced spatial patterns in Fig. 6 and by the time series examples in Fig. A16.Further, the illumination correction amplifies noise in the time series.This is thus an example for the degradation of the signal by the division by cos(SZA) www.biogeosciences.net/15/6221/2018/Biogeosciences, 15, 6221-6256, 2018 and calls for caution in applying it.We conclude that the different cloud fractions can explain a part of the difference between the SIF GFZ and SIF NASA and argue that the remaining difference might be ascribed to the different noise levels in the two data sets that make a reliable peak identification in SIF NASA more difficult than in SIF GFZ and also lead to a larger spread.
In our results, the NDVI and the MODIS VIproducts are the latest greenness proxies and peak around DOY216 (4 August).The NDVI is a widely used indicator of productivity, and a comparison to both ground-based measurements and satellite observations with the AVHRR instrument shows mostly support for the NDVI peak in very late July or the beginning of August.The ground NDVI along a transect in Alaska by Stow et al. (2004) and Stow et al. (2004) agree with the MODIS NDVI in that the seasonal maximum is observed at DOY218 in the beginning of August.In a second year there is even a second peak at DOY230 in the groundbased NDVI.Huemmrich et al. (2010a) show a time series of ground-based NDVI in Alaska that reaches the peak about 2 weeks earlier (at DOY203) than the average MODIS NDVI in our results but remains high until the end of August.However, May et al. (2017) report on peak dates of in situ measured NDVI in Alaska roughly 1 week to 2 weeks earlier  and talk about the beginning of senescence after the first sunset in late July to the beginning of August.Finally, the satellite-based biweekly NDVI from the AVHRR instrument is shown to peak between 22 July and 4 August (Stow et al., 2004;Zhou et al., 2001).Still, the onset and the peak timing of MODIS-based NDVI have also been found to be inconsistent with ground-based observations of NDVI (Gamon et al., 2013), which might suggest partly questionable reliability of satellite NDVI.
While the reported ground observations were all conducted in Alaska, in our circumpolar results the NDVI largely agrees with the other vegetation indices of the EVI and NIRv, only peaking later in the north-easternmost parts of Canada (Figs. 6 and A5).In addition to the Taymyr and coastal North Slope Alaska, these are the same regions where the NDVI lags to all other proxies are also largest.According to the ESA CCI land cover, those regions are characterised by moss (Fig. 1).Moss often has no clear seasonal cycle of greenness, making a peak identification difficult.Moreover, vegetation is particularly sparse in the form of prostrate dwarf shrubs, and there are extensive barren areas with rich lake cover in those northern Canadian areas (Fig. A1; Walker et al., 2005, their Figs. 1, 2e and 3).This renders the reflectance-based observation particularly sensitive to background conditions, especially without a clear seasonality in greenness (Walker et al., 2005, their Fig. 2f).Confirmation for this hypothesis of the strong contamination of the NDVI signal is given by the sharp transition from the very large lags in northeastern mainland Canada (eastern Barren Grounds) to lower albeit still negative lags to the northwestern part of mainland Canada (Fig. 6 and A5, corresponding to the land-cover transition between bare and sparse in the western parts of the Barren Grounds to moss in the more easterly regions of the Barren Grounds in Fig. 1).Similar to the northern Canadian islands and northeastern mainland, northwestern Canada is characterised by many lakes (Walker et al., 2005), and the ESA CCI land-cover reports sparse vegetation with much moss and open water as well (Fig. A1).However, in these more western areas, vegetation changes to erect dwarf shrubs and graminoids instead (Walker et al., 2005, their Fig.3), which exhibit a clearer seasonality (Fig. A18, panel of northwestern Canada) than the very sparse vegetation in the eastern parts with prostrate shrubs (Fig. A18, northern mainland and islands of the Canadian Arctic Archipelago, northeastern Canada and Iceland).These are also less affected by increasing values at the beginning and at the end of the growing season that are even partially higher than the summer maximum, severely influencing the identification of the annual peak.We speculate that possible explanations for this might be an increasing effect of low SZA late in the growing season (Kobayashi et al., 2016), affecting low NDVI in particularly sparse vegetation heavily.The NDVI might also be strongly decreased by standing surface water (Gamon et al., 2013) from snow melt or intermittent precipitation that has not yet drained or evaporated until later in the growing season.Only upon drying will the NDVI increase due to the missing water absorption in the near infrared, and this might most strongly affect the trajectory of the NDVI in the sparsely vegetated regions with the largest peak lags.
The highest vegetation water content is indicated by the VOD at a very similar time to the peak values of the vegetation indices, which might corroborate the usefulness of VOD to indicate vegetation biomass in tundra ecosystems, too.Both vegetation indices and the VOD are sensitive to vegetation structure and density and VOD in addition to water content (Liu et al., 2011).Especially in low biomass regions -as applied for tundra -a linear relationship between the VOD and vegetation water content has been found (Teubner et al., 2018).The similarity of the VOD as an indicator of total aboveground biomass to the vegetation indices might also support our interpretation of delayed peak greenness compared to photosynthesis, which is due to the longer-lasting allocation of plant material and pigments and indicate that the proposed relationship by Teubner et al. (2018) between the VOD, vegetation water content, LAI and GPP does not fully hold in the tundra.Conversely, when looking at examples of the mean seasonality of the VOD in comparison to the other vegetation proxies for selected regions (Fig. A19), the VOD annual cycle appears broader and the peak less well defined than the one of the vegetation indices.This could indicate that the vegetation water content changes only slightly during the growing season, while the chlorophyll concentrations possibly independently exhibit more pronounced dynamics and affect the vegetation indices to a greater degree.Another possible factor contributing to the broader annual cycle is the VOD's relation to the water content of the total aboveground biomass (as opposed to green or photosynthetic biomass), including moss, woody components and litter.If persistently wet, moss might drive the VOD signal and the greenness signal, though to a lesser degree.There is a high sensitivity of moss to air humidity as a consequence of the absence of roots.Despite a wet peat layer, there might be several centimetres of dry moss material.In grassy land, consistently high emissions and a consequently lower VOD seasonality were ascribed to the contributions of litter and wet vegetation components (Grant et al., 2016).Similar mechanisms might also hold for the VOD observations in tundra, especially considering that carbon turnover is slow in the high latitudes.It is also not clear what the VOD signal on water saturated soil might be.Moreover, the retrieval of the VOD is strongly dependent on the representation of open water and soil moisture.Considering that the retrieval algorithms have not been calibrated for tundra-like conditions and that with the high heterogeneity regarding plant types and landscape components, it might be difficult to accurately separate the contributions of vegetation from soil and water, it is not clear what the VOD in tundra means.In future studies it would be useful to analyse a suite of different VOD products from different sensors and wavelengths along with in situ observations in order to understand whether greenness and vegetation water content are strongly coupled in tundra or not and to what extent different retrievals affect the results.
Overall it must be stated that gaps in the data and the short growing season with often small seasonality and high noise levels challenge the reliable identification of phenological dates in all data sets.

Conclusions
We analysed and compared satellite-based indicators of plant productivity with respect to the timing of their maximum in arctic treeless regions.Over the whole study area, peak productivity is generally reached in July with an order of APAR, culminating in the first half of July along with the modelled GPP and followed by SIF GFZ one week later in synchrony with the highest annual temperatures.SIF NASA is delayed by 1 week.The EVI and NIRv indicate maximum greenness in the end of July, along with VOD as a proxy for vegetation water content.NDVI and MODIS VIproducts peak only in the first week of August.We interpret this sequence as the growth and synthesis of leaf tissue and pigments even after optimal conditions for assimilation regarding light and temperature have passed.Peak photosynthesis occurs earlier, at a time when full photosynthetic potential has not yet developed but when light is still abundant and the temperature favourable.The largest lags between the NDVI and photosynthesis indicators are found in regions with particularly sparse vegetation and without a clear seasonality in spectral reflectance that can heavily be confounded by low sun angles and the high abundance of lakes.
To our knowledge, only few studies of tundra vegetation have been based on other variables observed than spectral reflectance (Luus et al., 2017;Parazoo et al., 2018).It was questioned a priori whether current satellite-based SIF data sets are useful for tundra vegetation considering the very large footprints, high susceptibility to noise and very small signals from the sparse vegetation.However, the spatial patterns of peak productivity of SIF are qualitatively similar to the ones seen in the modelled GPP and reflectance-based observations.Furthermore, the fact that the SIF maximum is reached in close temporal agreement with air temperature might indicate a benefit for photosynthesis from the highest temperatures.The general time difference between the proxies of APAR and SIF suggest that there is information on light-use-efficiency contained in the SIF observations.Still, further studies are needed to verify this.The results of our study confirm the important separation between indicators of greenness and photosynthesis and also highlight nonnegligible differences between data sets of the same indicators.Regarding data availability in the future, similar cross comparisons to the chlorophyll-carotenoid index (Gamon et al., 2016) and the photochemical reflectance index (Gamon et al., 1992) in tundra might still add additional complementary information on circumpolar vegetation dynamics.

S. Walther et al.: Assessing the dynamics of vegetation productivity in circumpolar regions
Appendix A Averaged over all sites and years (Fig. A9), site-GPP peaks about 10 days earlier than site-level air temperature, with both partitioning and the quality filter shifting the peak time up to 5 days.Time lags of very similar magnitudes are shown by the grid-cell modelled GPP.In this respect the patterns at the site level agree with the slightly earlier GPP-model peak, compared to the temperature peak seen in the regional and interannual analyses (Fig. 4).Conversely, while in the largescale pattern, SIF GFZ and SIF NASA peak close to but slightly later than the air temperature; at the site level, SIF GFZ indicates a maximum at a similar time to the modelled GPP and the EC-derived GPP (excluding spurious results from sites NO-Adv, DK-ZaF and DK-ZaH, where strong differences between the partitionings are apparent and/or an annual cycle in the SIF data are absent; Fig. A12) and peaks before air temperature.and 19), "grass/herbaceous" (classes 26, fractions of 13,16,19,[21][22][23][24][25]33), "woody" (shrubs and trees, classes 10-12, 14, 15, 17, 18, 20, fractions of 13, 16, 19, 21-25, 31-33), "water" (class 38, fractions of 31-33) and "other" (remaining classes and fractions).
For individual sites the patterns are less clear.Despite a frequent tendency for an earlier peak of the EC-GPP, the signs of the time lags between the peak EC-GPP and temperature are not consistent.For NO-Adv, RU-Cok, RU-Sam and SE-St1, partially strong dependencies of the peak DOY on the partitioning method and the quality filter are seen.The modelled GPP is maximised somewhat earlier than the sitelevel temperature, and at some sites the lags become very small.For the sites where the SIF time series show a growing season and not pure noise, which excludes DK-ZaF, DK-ZaH (Fig. A12) and NO-Adv, SIF GFZ reaches its maximum earlier on average than the site-level temperature, although there is a large spread in the magnitude of the lag of 3 to 26 days between sites.

Figure 1 .
Figure1.ESA CCI land cover in regions with less than 5 % tree cover according toHansen et al. (2013).Atmospherically corrected true colour images are from Sentinel-2, taken at different dates in 2017.For the region shown in each image the majority land cover is given and the tree cover percentage according toHansen et al. (2013).

Figure 2 .
Figure 2. Climatologies of different atmospheric and land surface variables for a small area on the Taymyr Peninsula in Russia, indicated in Fig. 1.Bold lines indicate the time period when air temperatures are above the freezing point (no axes displayed for temperature and incoming radiation).Vertical dashed lines indicate the time of the year when the Sentinel-2 images shown in the second panel were taken.Sentinel-2 images are atmospherically corrected and taken in 2017.

Figure 3 .
Figure 3. Climatologies of different atmospheric and land surface variables for a small area in the North Slope in Alaska, indicated in Fig. 1.Bold lines indicate the time period when air temperatures are above the freezing point (no axes displayed for temperature and incoming radiation).Vertical dashed lines indicate the time of the year when the Sentinel-2 images shown in the second panel were taken.Sentinel-2 images are atmospherically corrected and taken in 2017.

Figure 4 .
Figure 4. Distribution of the DOY of the peak of the different vegetation proxies over the study region and between years (spatial sampling matched between data sets for each year).Bars in the boxes indicate the median, stars indicate the mean and the numbers below the bars denote the spatiotemporal mean (standard deviation).Colours of the bars denote grouping of the different variables according to the families of fAPAR, APAR, the modelled GPP, SIF and VOD.

Figure 5 .
Figure 5. DOY of the annual maximum averaged over all years, as indicated by selected vegetation proxies.
Figure 8 (and Fig. A8, but here based on the actual time series and not the bootstrapped values) shows the peak lags as a function of land cover based on the ESA CCI for all years in the study www.biogeosciences.net/15/6221/2018/Biogeosciences, 15, 6221-6256, 2018

Figure 6 .
Figure 6.Average time difference of the maximum across years of selected vegetation proxies and the NDVI.
period separately.Peak lags to the NDVI per proxy are generally similar between land covers, although there is a tendency in several proxies (excluding the VIproducts of MxD13C1, METGPP, SIF NASA) towards larger lags in regions classified as moss.According to Fig.A1, this largely corresponds to the sparsely vegetated areas in northern Canada, also with high cover fractions of water and barren.The smallest lags of RSGPP and SIF GFZ are shown for the mixed shrub and tree land covers.There is also some variability between years, which is largest for the timing of the moisture-related variables of the maximum extent of open water and the GLEAM soil moisture minimum.Conversely, the variability in the peak lag per land cover is smaller for the vegetation proxies between years.The VOD is maximised 5 to 7 days earlier than the NDVI in tundra-like vegetation types, but it has no consistent relationship with the NDVI in forests and water.Interesting patterns are the large NDVI-t2m lags in 2015 in grass and herb areas and 2012 in mossy areas as well as the particularly small lag in 2013 in mixed shrub and tree areas.

Figure 7 .
Figure7.Frequency distribution of the pixel-wise Spearman's rank correlation between the peak DOY of the vegetation proxies across years and the peak DOY of environmental variables in spatial moving windows of 1.5 • (9 spatial pixels times 10 years at most, correlated only if more than 20 data points available).Plotted here is the variable with the highest absolute correlation without the filter for statistical significance.Full correlations have been calculated (no partial correlations).

Figure 8 .
Figure 8.Average of the time difference between the peaks of one selected variable per family (APAR, fAPAR, SIF, GPP, VOD) and the NDVI as a reference, weighted with fractional cover per vegetation type (based on ESA CCI) and per year.Each bar represents the average of 50 bootstrap samples of the time series per year, pixel and vegetation proxy; error bars indicate a multiple of 1.96 of the standard deviation across the 50 samples.

Figure A3 .
Figure A3.Distribution of the DOY of the peak of the different environmental variables over the study region and between years (spatial sampling matched between data sets for each year).Bars in the boxes indicate the median, stars indicate the mean and the numbers below the bars denote the spatial mean (standard deviation).

Figure A4 .
Figure A4.Average time difference of the yearly maximum across years between selected vegetation proxies and the NDVI based on 50 bootstrap samples of each year's time series.Shown are only statistically significant differences at a level of 5 %.

Figure A5 .
Figure A5.Average time difference of the maximum across years of selected vegetation proxies and the NDVI.

Figure A6 .
Figure A6.Spearman's rank correlation between the peak DOY of the vegetation proxies across years and the peak DOY of environmental variables in spatial moving windows of 1.5 • (9 spatial pixels times 10 years at most, correlated only if more than 20 data points available).Plotted here is the variable with the highest absolute correlation without filter for statistical significance.Full correlations have been calculated (no partial correlations).

Figure A8 .
Figure A8.Average of the time difference between the peaks of various variables and the NDVI as a reference per vegetation type (based on ESA CCI) and per year.The "avg.peak" represents the average of the lags in the individual years.

Figure A9 .
Figure A9.Difference in peak timing in the different indicators to site-level temperature over years at selected FLUXNET sites.Diamonds represent the average peak lag across all years and sites, while squares indicate the average peak time across years and sites, excluding NO-Adv, DK-ZaH and DK-ZaF (where either no seasonality in SIF or strong issues between different partitioning methods are apparent).

Figure A10 .Figure A16 .Figure A17 .Figure A18 .
Figure A10.Difference in peak timing of the different indicators to site-level temperature over years at selected FLUXNET sites.Diamonds represent the average peak lag across all years per site and variable.

Figure A19 .
Figure A19.Mean seasonal cycles of selected proxies averaged over smaller regions, as indicated in Fig. 1.
S.Walther et al.:Assessing the dynamics of vegetation productivity in circumpolar regions