Basic and extensible post-processing of eddy covariance flux data with REddyProc

With the eddy-covariance (EC) technique, net fluxes of carbon dioxide (CO2) and other greenhouse gases as well as water and energy fluxes can be measured at the ecosystem level. These flux measurements are a main source for understanding biosphere-atmosphere interactions and feedbacks by cross-site analysis, model-data integration, and up-scaling. The raw fluxes measured with the EC technique require an extensive and laborious data processing. While there are standard tools available 5 in open source environment for processing high-frequency (10 or 20 Hz) data into half-hourly quality checked fluxes, there is a need for more usable and extensible tools for the subsequent post-processing steps. We tackled this need by developing the REddyProc package in the cross-platform language R that provides standard CO2-focused post-processing routines for reading (half-)hourly data from different formats, estimating the uStar threshold, gap-filling, flux-partitioning, and visualizing the results. In addition to basic processing, the functions are extensible and allow easier integration in extended analysis than current 10 tools. New features include cross year processing and a better treatment of uncertainties. A comparison of REddyProc routines with other state-of the art tools resulted in no significant differences in monthly and annual fluxes across sites. Lower uncertainty estimates of both uStar and resulting gap-filled fluxes with the presented tool was achieved by an improved treatment of seasons during the bootstrap analysis. Higher estimates of uncertainty in day-time partitioning resulted from a better accounting of the uncertainty in estimates of temperature sensitivity of respiration. The provided routines can be easily installed, configured, 15 used, and integrated with further analysis. Hence the eddy covariance community will benefit from using the provided package, allowing easier integration of standard post-processing with extended analysis. 1 Biogeosciences Discuss., https://doi.org/10.5194/bg-2018-56 Manuscript under review for journal Biogeosciences Discussion started: 12 February 2018 c © Author(s) 2018. CC BY 4.0 License.


Introduction
The availability of ecosystem-level observations of net ecosystem exchange (NEE) of carbon dioxide (CO 2 ) and other gases and latent heat (LE) and sensible heat (H ) fluxes measured by the eddy covariance (EC) method (Aubinet et al., 2000) greatly advanced ecosystem understanding at site to global scales (Baldocchi et al., 2017).The EC method provides half-hourly or hourly records of turbulent fluxes between an entire ecosystem and the atmosphere.These data are derived from high-frequency measurements (10 or 20 Hz) of wind speed and direction together with measurements of air scalar characteristics such as CO 2 and water vapor concentration, and temperature.Methods to compute fluxes from high-frequency measurements, methods for the quality checks and quality assessment (QA/QC), and methods for the storage corrections have Published by Copernicus Publications on behalf of the European Geosciences Union.
T. Wutzler et al.: REddyProc been consolidated in recent decades (Rebmann et al., 2012;Foken et al., 2012;Aubinet et al., 2012) and are available as open-source software2 .Although measured continuously, the (half-)hourly EC data contain gaps due to instrument malfunctioning or records which are not representative of the ecosystem because of micrometeorological conditions under which the assumptions of the EC technique are not met (details in e.g., Foken and Wichura, 1996;Foken et al., 2004Foken et al., , 2012;;Göckede et al., 2004).Hence, (half-)hourly records are marked with different quality flags and need further extensive post-processing as described by Papale et al. (2006).
NEE records from periods with low friction velocity (u * ) (Aubinet et al., 2012) need to be detected and filtered out to avoid systematic biases in nighttime NEE (Papale et al., 2006).The screened flux time series with gaps need to be filled (Reichstein et al., 2005a) using the available flux data and meteorological measurements.Additional information can be obtained from NEE thanks to flux-partitioning methods that provide model estimates of gross primary production (GPP) and ecosystem respiration (R eco ) (Reichstein et al., 2005a).These gross fluxes are important to understand landatmosphere interactions.
All these post-processing steps need to be performed routinely for EC data.Hence, it is desirable to have automated and reproducible post-processing tools available that can be easily used, extended, and integrated into researchers' own workflow.For this purpose we have compiled all routines for the important CO 2 -focused post-processing steps in the REddyProc package in the free and cross-platform R language.The REddyProc package loads time series of quality-checked and storage-corrected fluxes and the basic set of meteorological variables and provides a software environment to perform u * threshold detection and filtering, gap filling, and partitioning.Furthermore, a series of other functionalities like data import routines and data visualization are provided.
The objectives of the paper are to 1. provide a reference that describes the methodology of the processing used in the REddyProc package, and 2. show that the obtained results do not differ systematically from results obtained with standard postprocessing implemented in the FLUXNET community (based on Papale et al., 2006;Reichstein et al., 2005a;Lasslop et al., 2010;Pastorello et al., 2017).
Appendix C explains abbreviations used.The first part of the paper (Sect.2) describes the post-processing methods.The second part (Sect.3) presents the benchmarks of the REddyProc implementation with standard postprocessing tools.It details differences in the implementa-tions and possible consequences in obtained results and aggregated fluxes.Appendices A-B provide an overview of the package with the general design, an example of the postprocessing, and links to resources that get readers started with post-processing their own data.

Methods of post-processing
The post-processing relies on half-hourly or hourly measurements of NEE and ancillary meteorological data of u * , global radiation (Rg), air or soil temperature (Tair, Tsoil), and vapor pressure deficit (VPD).The fluxes should be quality-checked and, if applicable, storage-corrected before their use in the package.
The post-processing follows a specific workflow: 1. determination and filtering of periods with low turbulent mixing (u * filtering), 2. replacement of missing data in the half-hourly/hourly records (gap filling), and 3. partitioning of NEE into the gross fluxes GPP and R eco (flux partitioning).
Usage of the REddyProc package follows this data postprocessing workflow (Fig. 1).The following sections explain the steps in more detail.

u * filtering
Determining periods with low turbulent mixing is a critical step in the EC data post-processing.Standard steady-state and integral turbulence characteristics tests in the initial processing exclude the most problematic records of H , LE, and CO 2 fluxes (Foken and Wichura, 1996).However, it is well known (summarized in Aubinet et al., 2012, chap. 5), that such a quality-checking strategy is not sufficient, especially in the case of CO 2 .Stable stratification that is present often during the nighttime dampens turbulence and leads to an underestimation of the nighttime NEE, i.e., the ecosystem respiration (van Gorsel et al., 2007).Massman and Lee (2002) proposed that unfavorable conditions could be detected by inspecting the relationship of nighttime NEE versus u * .Within a similar time period and similar environmental conditions respiration should not be dependent on the u * .At low u * values, a negatively biased respiration is measured.A heuristic class of methods, which is widely accepted, assumes that a threshold of u * can be established above which nighttime fluxes are considered valid.Hence, the u * threshold is the minimum u * above which respiration reaches a plateau (Fig. 2).This threshold is specific for each season of a site year.Uncertainties in the u * threshold estimate represent one of the largest uncertainty components in the postprocessing of NEE.There are at least two methods of estimating the u * threshold: the moving point method (Reichstein et al., 2005c;Papale et al., 2006), which is currently more routinely used, and the breakpoint detection method (Barr et al., 2013).

Moving point method for u *
The method of Papale et al. (2006) detects a plateau in the relationship of nighttime NEE versus u * among all records within a temperature subset by a moving point test of records binned into different u * bins.
The nighttime data (default: Rg < 10 W m −2 ) are split into different times of year, here called seasons, to account for differing surface roughness.Then the data of each season are split into default six temperature subsets of equal size (according to quantiles).Within each temperature subset, data are split into 20 about equally sized u * bins.The default moving point method, called Forward2, determines the threshold based on these u * bins.It checks for each bin if the mean NEE is higher than 0.95 times the mean of the following 10 bins.If this also holds true for the next bin, the mean u * of the bin is reported as the threshold.There are often subsets of data in which no clear threshold can be detected.Hence, there are quality criteria for whether the estimate of a given subset is used in subsequent aggregation.One quality criterion specifies that temperature and u * should not be correlated within the temperature subset; another requires a minimum number of valid records within a subset.Next, the u * estimates for different temperature classes and seasons (details in Sect.3.2.1)are aggregated to derive a robust u * estimate.Within one season, the median is taken across the estimates of different temperature subsets.Within 1 year, the maximum is taken across the associated seasons.
Records during the nighttime with u * smaller than the estimated threshold are flagged as invalid and are replaced in the subsequent gap-filling processing step.In addition, each half hour after records with u * smaller than the threshold is flagged to be invalid.

Breakpoint detection method for u *
Alternatively, breakpoint detection can be applied to the unbinned data, which avoids the sensitivity of the moving point method to the specifics of the binning schemes (Barr et al., 2013).REddyProc provides this method by estimating the breakpoint based on unbinned records within the seasons/temperature subsets using the segmented R-package (Muggeo, 2003(Muggeo, , 2008)).However, REddyProc differs from Barr et al. (2013) by keeping the same aggregating scheme of seasonal/temperature estimates to annual thresholds as with the moving point method.

Bootstrapping uncertainty of the u * threshold
Estimates of the u * threshold are often sensitive to the specifics of the combination of methods and the data, e.g., the binning, minimum number of records within a season or temperature subset, and criteria in aggregation.Therefore, a bootstrap (resampling with replacement) is applied to generate 200 artificial replicates of the dataset, and for each replicate the threshold is estimated (Efron and Tibshirani, 1986;Davison and Hinkley, 1997).The 5th, 50th, and 95th percentile of the estimates are reported as a range of threshold estimates.The subsequent post-processing steps of gap filling and partitioning are then repeated using those different thresholds to propagate the uncertainty of u * threshold estimation to derived quantities such as annual NEE, GPP, and R eco .

Gap-filling methods
After quality checks and u * filtering, the dataset of halfhourly NEE fluxes may contain up to 50 % gaps (sometimes this fraction is even higher, depending on the site conditions).For the benchmark datasets used in this paper, the percentage of gaps before u * filtering was on average 32 % and after estimates, respectively.Filling of gaps in half-hourly NEE data is necessary to obtain complete time series for the calculation of daily averages or balances such as monthly or seasonal sums.The following three gap-filling methods are implemented in REddyProc.

Look-up tables
In the look-up table (LUT) approach, the fluxes are binned by the meteorological conditions within a certain time window.Within the chosen time window and respective bin, each meteorological variable deviates less than a fixed margin to ensure similar meteorological conditions.The missing value of the flux is then calculated as the average value of the binned records and its uncertainty estimated from their standard deviation.
The original LUT of Falge et al. (2001) consisted of fixed periods over a year, while in REddyProc the meteorological conditions are sampled with a moving window around the gap to be filled.

Mean diurnal course
NEE fluxes have a mean diurnal course (MDC) that follows the course of the sun with only respiration during nighttime and a combination of respiration and photosynthesis during daytime.This autocorrelation of the fluxes is exploited by taking the average value at the same time of day within a moving time window of adjacent days (Falge et al., 2001).In REddyProc the same time of day also includes the fluxes of the adjacent hour (±1 h).
Though the MDC method only showed a medium performance in the gap-filling comparison for net carbon fluxes by Moffat et al. (2007), it has the advantage that this approach can be used even if no meteorological information is available.

Marginal distribution sampling
The so-called marginal distribution sampling (MDS) by Reichstein et al. (2005b) exploits the covariation of the fluxes with the meteorological variables and their temporal autocorrelation based on the two methods LUT and MDC described above.
The filling of each half-hourly NEE with the MDS algorithm depends on the availability of the meteorological data of Rg, Tair, and VPD.(1) If all three meteorological variables are available, LUT will be used with default margins of 50 W m −2 , 2.5 • C, and 5.0 hPa, respectively.(2) If Tair or VPD are missing, only the variable Rg will be used.(3) If no meteorology is available, the gaps are filled with MDC.Following a specific sampling procedure, the MDS algorithm increases the number of days in the vicinity of the gap until there are enough data points (at least two) for gap filling.A more detailed description with a flow diagram is provided in the Supplement.
The MDS algorithm is optimized for carbon dioxide and water fluxes and can also be used to estimate the uncertainty of the half-hourly fluxes.In the comparison of gapfilling methods by Moffat et al. (2007), the MDS algorithm performed well for different artificial gap scenarios ranging from single half-hours to several days.Due to its flexibility in dealing with missing meteorological input data and its fast and highly automated routines available as an online tool (BGC16, Sect.3), the MDS gap-filling method has been widely used.

Flux-partitioning methods
The gross fluxes of GPP into the land system and R eco out of the land system are the two opposing parts of NEE: NEE = R eco − GPP.Availability of GPP and R eco is pivotal as they are the two biggest terms of the carbon cycle (e.g., Chapin et al., 2006;Jung et al., 2011).Moreover, understanding their sensitivity to environmental drivers (e.g., ra-diation, temperature, and soil moisture) is important to interpret land-atmosphere interactions and to improve earth system models (Reichstein et al., 2012).Therefore several methods were developed to partition NEE into these two components (Reichstein et al., 2005c;Lasslop et al., 2010;Moffat, 2012;Wehr and Saleska, 2015;Desai et al., 2008;Stoy et al., 2006).
The two most widely used methods are the so-called nighttime partitioning and daytime partitioning (Reichstein et al., 2012).The nighttime partitioning (Reichstein et al., 2005c) relies on the temperature response function of nighttime NEE fluxes that are representative of R eco .It assumes that this relationship is also applicable to daytime data.The relationship is then used to predict R eco from measured temperature and GPP is computed as a difference between R eco and NEE.This method is currently the most widely used approach.Alternatively, the daytime partitioning (Lasslop et al., 2010) fits a model to observations of daytime NEE and global radiation, accounting for the effects of radiation and VPD on GPP as well as the effects of temperature on R eco .

Nighttime flux partitioning
The method of Reichstein et al. (2005c) estimates a temporally varying respiration-temperature relationship from nighttime data.First nighttime data are selected by a threshold of Rg < 10 W m −2 , which is congruent with the BGC online tool (BGC16, Sect.3), but differs from the 20 W m −2 reported in Reichstein et al. (2005c).Additionally, nighttime data are constrained between computed sunset and sunrise.
Next, temperature sensitivity, E 0 , of the Lloyd and Taylor (1994) relationship (Eq. 1) is estimated by fitting the model to successive 15-day periods of nighttime data, and the resulting E 0 series is aggregated to an annual estimate.
where T 0 is kept constant at −46.02 • C (Lloyd and Taylor, 1994) and where the reference temperature T Ref is 15 • C, which is congruent with the BGC online tool (BGC16, Sect.3), but differs from the 10 • C reported in Reichstein et al. (2005c).For robustness each fit is repeated on a trimmed dataset excluding records with residuals outside the 5 %-95 % residual distribution.The annual aggregate is the mean across the three valid estimates with the lowest uncertainty in the fit.Single estimates of E 0 are considered valid if there were a minimum of six records, temperature ranged across at least 5 • C, and estimates were inside the range of 30 to 450 K. Subsequently, the respiration at reference temperature, R Ref , is re-estimated from nighttime data using the annual E 0 temperature sensitivity estimate for 7-day windows shifted consecutively for 4 days.The estimated value is then assigned to the central time point of the 4-day period and linearly interpolated between periods.Hence, the obtained respiration-temperature relationship varies across time.
Finally, R eco is estimated for both day-and nighttime from the temporarily varying R eco -temperature relationship, and daytime GPP is computed as R eco -NEE.

Daytime flux partitioning
The method of Lasslop et al. (2010) models NEE using the common rectangular hyperbolic light-response curve (LRC) (Falge et al., 2001): where α (µmol CO 2 J −1 ) is the canopy light utilization efficiency and represents the initial slope of the light-response curve, β (µmol CO 2 m −2 s −1 ) is the maximum CO 2 uptake rate of the canopy at infinite Rg, and γ (µmol CO 2 m −2 s −1 ) is a term accounting for ecosystem respiration.The hyperbolic light-response curve is modified to account for the temperature dependency of respiration after Gilmanov et al. (2003) by setting respiration γ to the Lloyd and Taylor respiration model (Lloyd and Taylor, 1994) (Eq.1).Further, the constant parameter β in Eq. ( 2) is replaced by an exponential decreasing function (Körner, 1995) at higher VPD values (Eq.3).
where the VPD 0 threshold is 10 hPa in accordance with earlier findings at the leaf level (Körner, 1995), ignoring potential vegetation specific differences.Parameter T 0 in Eq. ( 1) was fixed as in the nighttime partitioning (Sect.2.3.1).Parameter T Ref was fixed in each window to the median temperature within the window.The other parameters (E 0 , R Ref , α, β 0 , k) of the model are estimated by the following steps.(1) A time-varying temperature sensitivity E 0 is estimated from nighttime data for a window shifted by 2 days.( 2) The E 0 estimates are smoothed across successive windows by fitting a Gaussian process (Rasmussen and Williams, 2006;Menzer et al., 2013)  Note, that contrary to the nighttime-based flux partitioning, both GPP and R eco are model predictions and do not add up exactly to observed NEE.

Benchmarking REddyProc post-processing steps
The post-processing steps' implementations of REddyProc were benchmarked with the post-processing tools widely used in the FLUXNET processing.Specifically, REddyProc (version 0.8.1) u * -filtering results were compared with results by a C implementation from Dario Papale (Papale et al., 2006), referred to here as DP06.Results of REddyProc (version 1.1.3)gap filling and flux partitioning were compared with results obtained by the 2016 version web-based tool provided by the Max Planck Institute for Biogeochemistry, Jena, best described in Reichstein et al. (2005a).The tool was accessed in 2016 (29 July 2016) and is hereafter referred to as BGC16.Here, annually and monthly aggregated values refer to the mean across all valid values in a month or a year, which can differ from real annual or monthly budgets in the presence of large gaps.The first section describes the dataset used for benchmarking for each processing step implemented in the package.Within each of the following sections for the processing steps, subsections describe differences in the code, report the results of benchmarking, and discuss them.The Supplement, additionally, provides more detailed results and statistics.

Dataset used for benchmarking
Data of 25 sites of the LaThuile FLUXNET dataset 3 , which have an open data policy, were used for benchmarking.The sites are located in different climate zones and belong to a variety of plant functional types (Table 1) to guarantee testing of different conditions (i.e., presence of snow, management such as cuts and crop rotation, sites disturbed) and ecosystem types (e.g., deciduous versus evergreen forests, grasslands and croplands).For each site the following variables were used: NEE already filtered for quality flags (Foken and Wichura, 1996), despiked and u * -filtered (Papale et al., 2006), random error of NEE computed as described by Reichstein et al. (2005a), Tair, Tsoil, Rg, and VPD.Moreover, NEE time series before the u * filtering and the u * data were downloaded from AMERIFLUX and the European Flux Database to test the u * threshold estimation.Finally, time series of gap-filled NEE (NEE f ) and GPP partitioned with the nighttime-based method (GPP NT ) (Reichstein et al., 2005a) were downloaded from the LaThuile dataset, while GPP partitioned with the daytime method (GPP DT ) was computed with BGC16.3.2 u * filtering: benchmarking with DP06 Estimation of the u * threshold by REddyProc using the default moving point method (Sect.2.1.1)was benchmarked to estimation based on Papale's DP06 C implementation (Papale et al., 2006).The benchmark applied a bootstrap sample of size 60 and recorded lower, median, and upper quantiles of 10 %, 50 %, and 90 % instead of the default 5 % and 95 % based on a larger sample size to save computing time.
The different estimates of the u * threshold have potential consequences for the inferred fluxes.To explore these consequences, we used the different resulting thresholds to mark gaps, gap-fill the data, and compute the annual NEE based on the gap-filled time series.NEE uncertainty was estimated by the difference between NEE based on the lower quantile u * and NEE based on the upper quantile u * estimate.

Differences in code
The biggest difference of REddyProc compared to DP06 is that REddyProc by default employs seasons that can span across years.With the within a year classification op-tion, which is also employed by DP06, records of December are associated with the same season as January and February of the same year.With the default continuous classification, seasons start the same as in DP06 by default in March, June, September, and December.However, December is treated in the same season as January and February of the next year to avoid discontinuities at year boundaries.The annual u * threshold is then applied according to those continuous seasons spanning year boundaries.For example, the processing of 2014 data would by default use data from winter 2014 (starting in December 2013) to autumn 2014 (ending in November 2014).REddyProc also allows more flexibility with the user-specified classification into seasons as explained below.
There are further slight differences between REddyProc and DP06.Both methods bin in a way such that the number of records in each bin is similar.If there are numerically equal u * values, they are sorted into the same bin, resulting in bins with unequal record numbers.In DP06 sometimes no records are sorted into the subsequent bins, hampering the moving point detection.Conversely, the binning with REddyProc ensures that there are a minimum number of records in all bins.This often results in fewer bins.Moreover, differing from DP06, REddyProc employs several more quality criteria.First, when comparing the threshold bin to NEE in the following bins, it makes sure that there are least three bins to infer a plateau in NEE.Next, when aggregating the thresholds of different temperature classes to season, it ensures that a threshold was found in at least 20 % of the temperature classes.For those seasons during which no threshold could be determined, the annual estimate is used.When there are too few records within a year, a single season comprising all records is used for threshold estimation.
Differently to DP06, REddyProc only resamples data within seasons instead of across the entire year during the bootstrap, in order to protect periods of a similar u * -NEE relationship and to avoid seasonal biases in resampling.

Benchmark results
The general relationship in the estimation of the u * threshold was retained between the two methods (Fig. 3), although individual threshold estimates differed.The exceptionally high threshold value of > 0.6 m s −1 for site FR-Pue was very probably an overestimate by DP06.However, one has to remember that each estimate has a high uncertainty, and the differences between the two methods were in the range of this uncertainty (Supplement).The estimate of the uncertainty of the u * thresholds with REddyProc was, however, only half of the uncertainty range estimated by DP06 (Supplement).This increased precision was mainly due to the modified bootstrapping scheme, which respects the u * seasons.When propagating the differences in u * to differences in annual NEE, there was no bias and decreased scatter across sites between all the methods (Fig. 4), despite the differences in u * threshold.The absolute differences in annual NEE between the methods were small (mostly < 20 gC m −2 yr −1 ), and mostly lower than half of the uncertainty range estimated from the bootstrap (Supplement).REddyProc estimates u * thresholds with roughly double the precision compared to DP06, due to its protecting of seasons during bootstrap (Supplement).

Discussion of u * threshold estimation
The agreement between NEE based on u * estimates of REddyProc moving point implementation and current FLUXNET standard post-processing (DP06) (Fig. 4) indicates that the sensitivity of NEE to the u * threshold estimate in the inferred ranges is low, which also explains the large uncertainty of the u * threshold estimate.One reason for the missing effect could be site selection of this study without many sites affected by advection, which show limited saturation of the NEE-u * dependence.Since in such cases the filtering does not work properly anyway, it should not change the conclusions for NEE.Hence, we infer that u * estimates www.biogeosciences.net/15/5015/2018/Biogeosciences, 15, 5015-5030, 2018 of both DP06 and REddyProc are appropriate due to the negligible effect on NEE sums.The agreement implies that both methods can be interchanged in studies that are based on aggregated values, such as annual carbon budgets or for upscaling, without the need to reprocess data.However, the increase in estimated precision, i.e., lower standard deviation, of the u * threshold estimate also yields an increase in estimated precision of the annual NEE by 50 % (Supplement).This will lead to improved accuracy and usability of EC measurements and any downstream, postprocessed data products in model-data integration studies.
While the default seasons and their aggregation are in line with previous approaches, REddyProc allows site-specific knowledge to be used to derive better threshold estimates.For example, if there is a disturbance such as harvest, the u * threshold is expected to change and a different threshold should be applied for filtering before and after the disturbance.In this case the user can define a season change at the harvest date and use season-specific threshold estimates instead of the annually aggregated estimate (Sect.B7 in Appendix B).

Gap filling: benchmark with BGC16
The gap-filling implementation of REddyProc was benchmarked with the BGC online tool (BGC16, Sect.3), which used pvWave code developed by Reichstein et al. (2005c).

Differences in code
Compared to the BGC16, the new implementation of the MDS algorithm in REddyProc was not limited to single years, but it filled the gaps with a window moving continuously over all years in the input data.This had the advantage of smoother gap filling over the end of the year, and this will especially be of interest for sites in which vegetation is not dormant during this time.This new feature led to different, probably more realistic gap-filled NEE values at the beginning and end of the year.
There were also slight differences in the sequence of window sizes between REddyProc and BGC16.For MDC, the window size with BGC16 had a few more intermediate day steps than REddyProc, which affected longer gaps with missing meteorology.The default meteorological variables and margins for LUT (see Sect. 4.2.2 above) were the same in both implementations.
While REddyProc restricts gap filling to the interpolation of gaps, BGC16 also restricted missing records in periods without measurements.

Benchmark results and discussion for gap filling
In the benchmark, REddyProc gap filling was run using the same measured NEE as input that passed the QA/QC routines and u * filtering.The annually aggregated values comprised both filled gaps and originally valid records.REddyProc gap-filling results agreed with the results of BGC16.A few discrepancies at a half-hourly timescale were found mostly during longer gaps due to the usage of fewer window sizes, as shown for the DE-Tha case (Fig. 5a).At an annually aggregated timescale, the agreement between methods was strong (R 2 = 0.99) (Fig. 5b).The outlier of site RU-Cok is due to the availability of only a few months of data for the whole year.While REddyProc filled gaps in the time period with available data, BGC16 extrapolated into the time before and after this period.The seasonal cycle was well reproduced at each site (Supplement).
The good agreement between NEE based on gap filling by REddyProc and gap filling by BGC16 (Fig. 5) implies that both gap-filling tools can be used interchangeably without the need to reprocess data.

Nighttime flux partitioning: benchmark BGC16
The nighttime-based flux partitioning was benchmarked to BGC16, which used pvWave code developed by Reichstein et al. (2005c).

Differences in code
The main features of the REddyProc implementation of the nighttime-based partitioning algorithm were very similar to BGC16, using a reference temperature of 15 • C and trimming the estimates of temperature sensitivity E 0 before aggregating them (Sect.2.3.1).REddyProc differed from BGC16 in computing the potential radiation that is used in subsetting the nighttime data to derive E 0 and R Ref (Reichstein et al., 2005c).While REddyProc used the exact solar time for the calculation of the potential radiation, where the sun culminates exactly at noon, BGC16 used the local winter time, which differs from the solar time depending on the location within the time zone.

Benchmark results and discussion for nighttime flux partitioning
Annual aggregated values of R eco predicted by REddyProc were in very good agreement (R 2 = 0.99; slope ≈ 1) with BGC16 as shown in Fig. 6 and in the Supplement.
In order to evaluate the effects of the differences introduced in the code described above, we also computed R eco by prescribing either E 0 , or a selection of nighttime data, or both from BGC16 output in REddyProc.Results are reported in the Supplement and showed that the most important factor affecting the R eco computed with REddyProc was the different selection of nighttime data, though the differences were almost negligible at an annual timescale.
The two implementations agreed very well for most sites at an annual timescale.Because of no systematic deviations across sites, the spatial upscaling of fluxes should not be affected by REddyProc implementation.However, for some sites, such as IT-Amp, the relative errors that are quite large indicate problems related to the selection of nighttime data and problems due to large gaps in the dataset.

Daytime flux partitioning: benchmark with BGC16
The daytime flux partitioning was benchmarked with results of the BGC online tool (BGC16, Sect.3), which is based on pvWave code developed by Lasslop et al. (2010) and used in the processing of the 2015 FLUXNET release (Pastorello et al., 2017).

Differences in code
BGC16 differed from REddyProc (Sect.2.3.2),mainly in aspects of separation of nighttime data, estimation of temperature sensitivity from nighttime data, uncertainty estimation, treatment of missing values, and optimization library code.
While for separating nighttime data REddyProc used the exact solar time, where the sun culminates exactly at noon, BGC16 used the local winter time.
For the estimation of temperature sensitivity E 0 from nighttime data, BGC16 used a reference temperature of 15 • C, instead of the median temperature inside the window.Hence, it estimated stronger correlations between parameters for windows with a different temperature range.Moreover, it omitted smoothing of the estimated E 0 across time, often leading to large fluctuations of the E 0 estimates across a few days (Supplement), larger estimates of its uncertainty, and differences in subsequent estimation of LRC parameters.
For uncertainty estimation, BGC16 relied on the curvature of the LRC fit's optimum instead of a bootstrap procedure.Hence, it could not take into account the uncertainty of E 0 estimated from nighttime data before the daytime LRC fit.Moreover, during interpolation of fluxes based on previous and subsequent valid estimates, the distance weights differed.While REddyProc assigned the estimates to the time of the mean of valid record in a window, BGC16 assigned it to the start of the third day, also if there were only valid data for the first day in the window.
For weighting the records in the LRC fit, BGC16 used the raw estimated NEE uncertainty of each record.It did not check for high leverage of spurious low NEE uncertainty estimates.Its estimates, therefore, were in some windows very strongly influenced by a few records, and failed if a NEE uncertainty estimate of zero was provided.Moreover, when there were missing values or values below zero in a given NEE uncertainty, it set all uncertainty to 1, while REddyProc filled the gaps by setting the missing uncertainty to the maximum of 20 % of respective NEE but at least 0.7 µmol CO 2 m −2 s −1 .
Treatment of missing values was not considered by BGC16 and assumed to be handled prior to the processing.Hence, it did not handle missing VPD values and did not retry the LRC fit without the VPD effect in order to also use records with missing VPD.Moreover, as described above, when there were missing values of NEE uncertainty, weighting records in the LRC fit were omitted.For compatibility with BGC16, the above code differences can be disabled in REddyProc.But differences in optimization library code and specifically the conditions of nonconvergence on scattered data could not be eliminated, which led to differences in results as shown in the following section.

Benchmark results for daytime partitioning
Annually GPP predictions of both implementations showed no significant bias across the test sites (Fig. 7), although there was some scatter among individual predictions.Similar scatter was observed when comparing the predictions of the default REddyProc options to the predictions with compatibility options.Most of the differences were caused by decreasing the unreasonable high influence of NEE records with small NEE uncertainty (Supplement).
The largest differences in aggregated fluxes between implementations were due to the extrapolation of fitted parameters to periods where no parameter fits were obtained.In many of these cases, there were fits at the boundaries of these periods, whose validity was questionable.Whether these fits passed the quality check or not had a large influence on the extrapolation and hence on the aggregated values.For example, at RU-Cok parameter estimates for valid periods agreed between implementations.However, no valid parameters could be obtained for winter months.While REddyProc reported missing values, BGC16 also reported GPP values based on summer parameterizations for periods further away from summer, which in turn led to higher annual GPP estimates.
Uncertainty estimates of gross fluxes approximately doubled with REddyProc due to the accounting for uncertainty in temperature sensitivity estimates from nighttime data (Supplement).

Discussion of daytime flux partitioning
Agreement between aggregated fluxes predicted by the daytime method and absence of bias for the test sites (Fig. 7) suggest that the methods can be used interchangeably for upscaling, although differences in results of influential sites can potentially propagate to differences in upscaled estimates.REddyProc provides a quality flag for the results of the daytime partitioning, which allows less reliable data to be excluded in upscaling studies.For the results associated with good quality flags, we have greater confidence in the REddyProc-based estimates.
The daytime flux partitioning is quite sensitive to the details of the LRC fit.Small changes in treatment of extreme or missing NEE uncertainty estimates or changes in preprocessing and treatment of missing values cause different estimates of LRC parameters and propagate to predicted fluxes of GPP and R eco .Although we put much effort in trying to reproduce the results of BGC16, we were not able to eliminate all differences, especially in the subtle details in the parameter optimization library codes.The differences in predicted half-hourly fluxes, however, average out across sites and across time (Supplement), making this issue less severe at larger scales.
The estimated uncertainties are even more sensitive.Both implementations occasionally produce unreasonably high outliers that affect the aggregated values.REddyProc, in general, estimates higher uncertainties of predicted fluxes because it accounts for uncertainty in temperature sensitivity.Note that the uncertainty introduced to annually aggregated fluxes due to flux partitioning is smaller than uncertainty due to an uncertain u * threshold estimate.Hence, differences or difficulties in uncertainty estimation caused by flux partitioning do affect conclusions of the overall uncertainty estimates to a lesser extent.

Conclusions
The REddyProc software provides a set of tools for the CO 2 -focussed post-processing of eddy covariance flux data including u * filtering, gap filling, and flux partitioning, and propagation of the uncertainty from the u * filtering to the gap-filled NEE and partitioned GPP and R eco .
The freely available R-package enables researchers to integrate the flux data processing into their own offline environment or work stream without the need of uploading data.This seamless integration allows overall workflow to be improved, processing routines to be sped up, and ultimately cleaner, reproducible scientific results to be generated.
The compatibility of the implemented methods with the available standard tools provides continuity of the data analysis when adopting REddyProc for processing EC data.REddyProc can closely reproduce results of the widely used BGC online tool (BGC16, Sect.3).
A number of enhancements provide more flexibility to the user in the processing of their data.For instance, the new processing allows multi-year data to be treated without breaks at annual boundaries that can significantly affect sites in the Southern Hemisphere or sites characterized by vegetation activity in winter.Another new feature of REddyProc is the flexibility to define different seasons for the application of the u * -filtering and gap-filling routines, which is important for sites with discontinuous surface cover associated with snowmelt, dry seasons, or harvest.
Sensitivity of the results to subtle details of the implementation, however, calls for caution when interpreting results.This is especially true for u * threshold estimation and the daytime flux partitioning, and especially for data with long gaps.
Continued integration of new methodological developments into the package will support research using EC data.We strive to provide new developments in a basic and extensible manner, while paying attention to compatibility with results of reference implementations.
In summary, research using (half-)hourly eddy covariance data can benefit from building blocks for standardized and extensible post-processing provided by REddyProc.

B2 Estimating the u * threshold distribution
The second step is the estimation of the distribution of u * thresholds to identify periods of low friction velocity (u * ), where NEE is biased low.Discarding periods with low u * is one of the largest sources of uncertainty in aggregated fluxes.Hence, several quantiles of the distribution of the uncertain u * threshold are estimated by a bootstrap.
The friction velocity, u * , needs to be in a column of the input dataset named "Ustar".
The subsequent post-processing steps will be repeated using the three quantiles of the u * distribution.They require a u * threshold to be specified for each season as well as a suffix to distinguish the outputs related to different thresholds.
For this example of an evergreen forest site, the same annually aggregated u * threshold estimate will be chosen for each of the seasons within a year.In order to distinguish the automatically generated columns, the column names of the estimation results are written for the variable uStarSuffixes.

B3 Gap filling
The second post-processing step is filling the gaps using information from the valid data.In this case, the same annual u * threshold estimate is used for each season, as described above, and the uncertainty will also be computed for valid records (FillAll).The screen output (not shown here) already shows that the u * filtering and gap filling was repeated for each given estimate of the u * threshold, i.e., column in uStarThAnnual, with marking 22 % to 38 % of the data as a gap.
For each of the different u * threshold estimates, a separate set of output columns of filled NEE and its uncertainty is generated, distinguished by the suffixes given with uStarSuffixes.Suffix "_f" denotes the filled value and "_fsd" the estimated standard deviation of its uncertainty.

B4 Partitioning net flux into GPP and R eco
The third post-processing step is partitioning the net flux (NEE) into its gross components GPP and R eco .The partitioning algorithm needs a precise criterion between nighttime and daytime.Therefore, geographical coordinates and the time zone need to be provided to allow the exact solar time of sunrise and sunset to be computed.Further, missing values in the meteorological data used need to be filled.Now we are ready to invoke the partitioning, here by the nighttime approach, for each of the several filled NEE columns.

B5 Estimating the uncertainty of aggregated results
First, the mean of the GPP across all the years is computed for each u * -scenario and converted from µmol CO 2 m −2 s −1 to gC m −2 yr −1 .The difference between these aggregated values is a first estimate of the uncertainty range in GPP due to uncertainty of the u * threshold.
For a better but more time-consuming uncertainty estimate, specify a larger sample of u * threshold values, peat the post-processing for each, and compute statistics from the larger sample of resulting GPP columns.This can be achieved by specifying a larger sequence of quantiles when calling sEstUstarThresholdDistribution in Sect.B2.The results still reside inside the sEddyProc class.To export them to an R Data.frame, the newly generated columns need to be appended to the columns with the original input data.Then this data.frame is written to a text file in a temporary directory.B7 Specifying seasons where the u * threshold differs With changing surface roughness, e.g., during harvest or leaf fall, the u * -NEE relationship can also change.Therefore the u * threshold needs to be re-estimated at different times of the year, called seasons.The default uses continuous seasons; for details see Sect.3.2.1.In order to yield results corresponding to DP06, the user can specify seasonFactor.v= usCreateSeasonFactorMonthWithinYear( EddyData.C$sDATA$sDateTime, startMonth= c(3,6,9,12)) as an argument to the routine sEstUstarThreshold.By default the annual aggregate of the season thresholds, i.e., maximum across seasons, is used to identify unfavorable conditions, but the seasonal estimates can also be used instead.Moreover, the users can also specify other user-defined seasons, e.g., when harvest dates are known (see package vignette DEGebExample).They can create a grouping by specifying exact starting days of the periods by the function usCreateSeasonFactorYdayYear, or they can provide a column with the data that indicate, e.g., the same group for two wet seasons.Each season is associated with the year corresponding to the center day between the first and last day of the season.
With all methods, there is a required minimum number of 160 records within a season.If there are too few records, the data of the seasons within a year are combined and the u * threshold for these seasons is set to the estimate obtained for the data of the entire year.

Figure 1 .
Figure 1.The workflow starts with importing the half-hourly (or hourly) data, in this example the year 1998 of the DE-Tha site.Next, a probability distribution of u * threshold is estimated for each season.Gap filling and flux partitioning are performed for several quantiles of this distribution for an estimate of uncertainty.Finally the results are exported.
using the mlegp R-package that also estimates uncertainty of the smoothed E 0 .Next, a prior respiration, R Ref , for reference temperature T Ref = 15 • C is re-estimated from nighttime data for each window with smoothed E 0 .(3) Parameters of the rectangular hyperbolic light-response curve (R Ref , α, β 0 , k) are fitted using only daytime data and the previously determined temperature sensitivity (E 0 ) for each window.(4) Finally, for each NEE record, GPP and R eco are estimated with the parameter set of the previous valid window and the parameters of the next valid window, and the two results are interpolated linearly by the time difference to the window centers.The T. Wutzler et al.: REddyProc Supplement reports necessary technical details about these steps.

)Figure 3 )Figure 4 .
Figure 3. u * thresholds derived using different methods deviate for single sites.The relationship across site years is retained as indicated by a regression (solid line with shaded uncertainty bound) close to the 1 : 1 line (dashed).

Figure 5 .
Figure 5. Predictions of NEE by REddyProc after gap filling agree with BGC16 both at half-hourly values (a), shown for the DE-Tha 1998 case, and annual means across sites (b).Larger quality flags are associated with larger window sizes.

FR)Figure 6 .
Figure 6.Predictions of annually aggregated ecosystem respiration, R eco , from REddyProc nighttime partitioning agree with the predictions by BGC16.

)Figure 7 .
Figure 7. Prediction of annually aggregated GPP from REddyProc daytime partitioning agree with BGC16 across sites.

FilledEddyData
u * filtering 60 % and 48 % for upper and lower u * threshold Concept of the u * filter: nighttime NEE at low u * is biased towards lower NEE values compared to cases with higher u * .Unbiased NEE should scatter around the same plateau because environmental conditions are similar.The u * threshold (dashed line), i.e., the value below which this bias is considered significant, is estimated by a moving point method on u * bins (crosses) across half-hourly records (circles).The example uses a subset of data from DE-Tha.

Table 1 .
Description of sites and times used for benchmarking REddyProc.
Appendix C: Abbreviations used repeatedly in the paper threshold estimation by Dario Papale (Sect.3) BGC16 2016 version of the online tool provided by the MPI-BGC in Jena (Sect.3) *