Multiple quality tests for analysing CO2 fluxes in a beech temperate forest

Abstract. Eddy covariance (EC) measurements are widely used to estimate the amount of carbon sequestered by terrestrial biomes. The decision to exclude an EC flux from a database (bad quality records, inadequate turbulence regime, footprint problems, ...) is an important step in the CO2 flux determination procedure. In this paper an innovative combination of existing assessment tests is used to give a relatively complete evaluation of the net ecosystem exchange measurements. For the 2005 full-leaf season at the Hesse site, the percentage of rejected half-hours is relatively high (59.7%), especially during night-time (68.9%). This result underlines the importance of the data gap filling method. The data rejection does not lead to a real improvement in the accuracy of the relationship between the CO2 fluxes and the climatic factors, especially during the night. The spatial heterogeneity of the soil respiration (on a site with a relatively homogeneous vegetation pattern) seems large enough to mask any increase in the goodness of fit of the ecosystem respiration measurements, with their dependence on soil temperature and water content, when the tests are used to reject EC data. However, the rejected data present some common characteristics. Their removal leads to an increase in the total amount of CO2 respired (24%) and photosynthesised (16%) during the 2005 full-leaf season. Consequently, the application of our combination of multiple quality tests is able to improve the inter-annual analysis. Its systematic application to large databases such as CarboEurope and FLUXNET appears to be necessary.


Introduction
Correspondence to: B. Longdoz (longdoz@nancy.inra.fr)
Carbon dioxide exchanges between terrestrial ecosystems and the atmosphere are of major importance for climate change and therefore for the future of the vegetation (Houghton et al., 1998). The quantification of CO2 fluxes at the ecosystem-atmosphere interface is one of the essential steps to improve our knowledge of the ecosystem carbon budget. The eddy covariance (EC) technique provides the opportunity to measure these fluxes directly. Sites equipped with EC systems are spread around the world (Baldocchi et al., 2001) with, at present, more than 400 stations (http://www-eosdis.ornl.gov/FLUXNET), some of which have been running continuously for more than 10 years. The EC technique is based on high frequency (10-20 Hz) records of wind speed components, sonic temperature, CO2 and H2O concentrations and includes a post-processing procedure with several methodological choices (Finnigan et al., 2003). The method requires periods with a developed atmospheric turbulent regime (Feigenwinter et al., 2004; Rebmann et al., 2005). For example, during quiet nights, the CO2 produced by the respiration of the ecosystem components can be stored in the canopy air or carried away horizontally by advection (Paw et al., 2000) and is then not registered by the EC system. For our temperate beech forest site (Hesse, France), corrections with canopy air storage measurements and a selection of the data without advection are performed, but they do not completely remove all the problems: short-term net CO2 flux fluctuations during night-time (Longdoz et al., 2004), without any biophysical explanation, are still observed. Instrumental anomalies, non-stationary conditions and a footprint extending outside our beech forest have therefore been considered as possible explanations for these observations. These problems lead to errors and to a propagation of uncertainties that can mask some properties of the biophysical processes.
The amount of data produced is so large that the visual detection and removal of inadequate data is impossible. The improvement of EC dataset quality by automatic procedures has become a real challenge for the EC scientific community (Richardson et al., 2006a; Papale et al., 2006). Different authors have presented several tests for the selection of EC data (Vickers and Mahrt, 1997; Wichura, 1996; Göckede et al., 2004) and several methods to fill the gaps existing in the dataset (Falge et al., 2001; Hui et al., 2004; Ruppert et al., 2006; Moffat et al., 2007). In this study, we combine most of the tests proposed for the CO2 flux. This innovative grouping is applied to the records from the Hesse site for the 2005 full-leaf season. This period presents reasonably standard climatic conditions (no extreme events). It is short enough to assume a relatively stable ecosystem response to environmental factors and long enough to provide a sufficient quantity of data (even after selection by the quality tests) to analyse these responses. The impact of our relatively complete combination of tests is evaluated by comparing the datasets including and excluding the records flagged by the tests. The analysis is performed on the relationships between CO2 fluxes and climatic factors and on the total fluxes accumulated during the 2005 full-leaf season.

Site
All the data used in the present analyses come from an experimental plot located in the state forest of Hesse (48°40′ N, 7°04′ E, north-east of France). This site belongs to the CarboEurope network. The climate is temperate, with a mean annual rainfall of 860 mm and a mean annual air temperature of 9.3 °C (30-year means, 1974-2003). The stand is composed mainly (90%) of beech (Fagus sylvatica). For the period considered in this paper (full-leaf season from 15 May to 14 October 2005) the trees were 39 years old and 17 m high (mean value). The LAI (5.1 m2 m−2) and tree density (2916 stems/ha) were relatively low compared to the previous years (mean LAI 7.3 m2 m−2) because of the thinning performed during the winter 2004-2005. The gentle slope (approximately 3%, going down in the north-east direction) is sufficient to induce advection during stable nights. The distance between the EC tower and the forest edge varies with wind direction from 390 m to 1610 m. The selected full-leaf season (2005) can be qualified as relatively normal when compared to the mean climate of the 30 previous years. The mean air temperature is slightly lower (16.0 °C versus 16.4 °C) and, even if the total amount of precipitation is higher (381 mm versus 241 mm), the cumulative global radiation is also larger (2957 MJ m−2 versus 1846 MJ m−2) for 2005. A more detailed description of the site can be found in Granier et al. (2000a, b).

Flux measurements
The net CO2 fluxes between the ecosystem and the atmosphere (Fc) were measured with an eddy covariance system composed of a sonic anemometer Solent R3 (Gill Instruments Ltd, Lymington, UK) and an infrared gas analyser (IRGA) Li-Cor 6262 (Li-Cor Inc., Lincoln, NE, USA). The anemometer, measuring the three components of the wind velocity and the sonic temperature (u, v, w, T) at 20 Hz, is located on a tower at 23.5 m above ground. The IRGA, measuring CO2 and H2O concentrations at 10 Hz, is located at ground level and analyses the air drawn from a sampling point close to the anemometer at a flow rate of 6 l min−1. A mass flow controller (Model 5850, Brooks, Veenendaal, Netherlands) controls the airflow. The computer acquires the data with the software Eddymeas (Kolle and Rebmann, 2007). To improve the u, v and w data, a correction for sonic anemometer angle of attack errors is performed (Nakaï et al., 2006). This error becomes significant when the angle between the wind vector and the horizontal plane exceeds 20° (threshold value depending on the sonic anemometer type). It is caused by transducer self-sheltering or by flow distortion induced by the anemometer frame. For each half-hour, the Fc fluxes are calculated from the high frequency w and CO2 concentration measurements using a block averaging operator (Finnigan et al., 2003) and the planar fit method for coordinate rotation (Wilczak et al., 2001). Finally, the frequency correction applied to Fc follows the procedure proposed by Aubinet et al. (2000).
The net ecosystem exchange (NEE) is obtained by summing Fc and the change in CO2 storage in the canopy air (Sc). Sc corresponds to the difference in the total amount of CO2 below the eddy covariance measurement height between the beginning and the end of the half-hour. This amount of CO2 is determined from a concentration profile estimated from measurements at 6 different heights (22 m, 10.4 m, 5.2 m, 2 m, 0.7 m, 0.2 m). These measurements are performed with an infrared gas analyser Li-Cor 6262 (Li-Cor Inc., Lincoln, NE). For each level, the concentration used for the Sc computation is the average of the values recorded during 10 s after purge. More information about the tubing, pumps and filters used is given in Granier et al. (2000b).
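The storage term and its addition to Fc can be sketched as follows. This is a minimal Python illustration and not the processing code used at Hesse: the trapezoidal integration, the extension of the lowest and highest measured concentrations to the ground and to the EC height, the use of molar densities in µmol m−3 and all function names are assumptions made for the example.

```python
# Sketch of NEE = Fc + Sc, with Sc derived from a six-level CO2 profile.
# Assumptions (not from the paper): trapezoidal integration, constant
# concentration below the lowest and above the highest inlet, and
# concentrations already expressed as molar densities (umol m-3).

PROFILE_HEIGHTS_M = [0.2, 0.7, 2.0, 5.2, 10.4, 22.0]  # inlet heights given in the text
EC_HEIGHT_M = 23.5                                     # anemometer height given in the text
HALF_HOUR_S = 1800.0


def column_co2(conc_umol_m3):
    """Integrate the CO2 profile from the ground to the EC height (umol m-2)."""
    z = [0.0] + PROFILE_HEIGHTS_M + [EC_HEIGHT_M]
    c = [conc_umol_m3[0]] + list(conc_umol_m3) + [conc_umol_m3[-1]]
    return sum(0.5 * (c[i] + c[i + 1]) * (z[i + 1] - z[i]) for i in range(len(z) - 1))


def storage_flux(conc_start, conc_end):
    """Sc (umol m-2 s-1): change in column CO2 over the half-hour."""
    return (column_co2(conc_end) - column_co2(conc_start)) / HALF_HOUR_S


def net_ecosystem_exchange(fc, sc):
    """NEE (umol m-2 s-1) as the sum of the turbulent flux and the storage change."""
    return fc + sc


if __name__ == "__main__":
    start = [16000.0] * 6  # hypothetical uniform profile at the start of the half-hour
    end = [16018.0] * 6    # hypothetical uniform profile at the end of the half-hour
    print(net_ecosystem_exchange(-5.0, storage_flux(start, end)))
```

With this sign convention a build-up of CO2 in the canopy air gives a positive Sc, which offsets part of a negative (uptake) Fc.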

Data check procedure
The eddy covariance data treatment used in this study includes several tests (Fig. 1) to detect flux sampling problems, periods with advection, low climatic stationarity or fluxes coming from outside the target forest plot. After removal of the periods corresponding to breakdown and maintenance of the EC and profile systems, the records are first flagged following Vickers and Mahrt (1997). The objective is to identify abnormalities that may result from instrumental or data recording problems coming from the anemometer or the IRGA (corresponding to the hard flags in Vickers and Mahrt). For each half-hour, the tests on the high frequency measurements of vertical wind velocity and CO2 concentration first identify the presence of spikes and, depending on their percentage, either flag the half-hour or remove the spikes. After that, a flag is activated if unrealistic values or discontinuities of mean or variance are detected. Finally, higher-moment statistics (skewness and kurtosis) and the standard deviation are computed and the half-hour is flagged if one of these parameters is outside the tolerable range. The criteria to define a spike, an unrealistic value, a discontinuity and a tolerable range for spike percentage, higher-moment statistics and standard deviation are based on threshold values. As proposed by Vickers and Mahrt (1997), the thresholds are empirically adjusted for the Hesse site by inspection of the frequency distributions of the parameters tested. The objective is to flag any record with obvious instrument problems. This threshold determination procedure, carried out for the Hesse 2005 full-leaf season, can be illustrated by the choice of the upper limit of the tolerable range for the kurtosis of the CO2 concentration (C) data, which is the most selective test (see results section). The upper limit of the kurtosis is set to 7.9 to be sure to flag half-hour records like the one presented in Fig. 2a.
Indeed, the kurtosis of this half-hour is 8.1 because of a few irregularities (too large to be considered as spikes) that occur at regular intervals, indicating their instrumental origin. This half-hour record has to be flagged. Moreover, we have not found any half-hour with a lower kurtosis that presents an apparent instrumental problem. For example, turbulence with varying intensity (Fig. 2b) can explain a kurtosis slightly lower than the threshold (7.7) and should not be flagged. Once the thresholds are set, the flagging procedure is automatic and, contrary to Vickers and Mahrt (1997), the flagged half-hours have not been individually analysed to verify the origin of the flux-sampling problem. Such an individual verification is not practically feasible for large datasets. Consequently, it is possible that a few correct fluxes are flagged, but this conservative procedure, which excludes a maximum of technical anomalies, is preferable. The last test to detect instrumental abnormalities is the verification of the airflow rate in the tubing transporting the air from the sampling point (at the top of the tower) to the EC IRGA. This rate is controlled and measured by a mass flow controller (Tylan 261, Tylan Corporation, Torrance, CA, USA). It is set to 6 l min−1, leading to a constant time lag of 4.8 s between the w and C measurements. The flag is activated when the airflow rate is more than 10% below or above the desired value. This range is chosen because the post-processing programme is able to correct any deviation of the time lag within this range.
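The kurtosis and airflow checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: kurtosis is taken here as the fourth standardised moment (equal to 3 for a Gaussian series), and the function and constant names are hypothetical rather than taken from the processing software.

```python
# Sketch of two instrumental checks: upper kurtosis limit and airflow tolerance.
# Assumption: kurtosis = m4 / m2^2 (not the "excess" kurtosis).

KURTOSIS_UPPER_LIMIT = 7.9   # Hesse 2005 threshold given in the text
FLOW_SETPOINT_L_MIN = 6.0    # nominal airflow given in the text
FLOW_TOLERANCE = 0.10        # +/- 10 % range given in the text


def kurtosis(series):
    """Fourth standardised moment of a non-constant series."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    return m4 / m2 ** 2


def kurtosis_flag(co2_series):
    """True when the half-hour CO2 record exceeds the upper kurtosis limit."""
    return kurtosis(co2_series) > KURTOSIS_UPPER_LIMIT


def flow_flag(flow_l_min):
    """True when the airflow deviates by more than 10 % from the set point."""
    return abs(flow_l_min - FLOW_SETPOINT_L_MIN) > FLOW_TOLERANCE * FLOW_SETPOINT_L_MIN


if __name__ == "__main__":
    record = [0.0] * 99 + [10.0]  # hypothetical record with one large irregularity
    print(kurtosis_flag(record), flow_flag(5.3))
```

Only the upper kurtosis limit is shown because it is the one discussed in the text; the other thresholds would follow the same pattern.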
Additional tests do not refer to instrumental abnormalities. The first one checks the Fc stationarity following the procedure presented by Foken and Wichura (1996). The Fc value determined for the half-hour period is compared to the mean of the six 5-min Fc values from the same period. The flag is activated when the difference between both values (due, for example, to changing weather conditions) is above 30%, that is, when the Fc data could not be used for fundamental research. The second test verifies whether the footprint area is located in the targeted ecosystem (young beech plots of the Hesse forest). The Schuepp model (Schuepp et al., 1990) modified by Soegaard et al. (2003) determines the Fc footprint area. The record is flagged when more than 10% of Fc comes from patches located outside the Hesse beech forest. The spatial resolution used corresponds to patches delimited by circles centred on the tower, with diameters that are multiples of 50 m, and by lines passing through the tower and separated from each other by an angle of 5°. The forest edge is determined with the Hesse land use map already used in Rebmann et al. (2005) and Göckede et al. (2007). The last test checks whether the turbulent regime is developed enough to apply the EC method (u* test). This u* test has been developed because different authors (Staebler et al., 2004; Aubinet et al., 2005) have shown that NEE estimated by summation of Fc and Sc can underestimate CO2 exchanges during periods with low turbulence (low friction velocity u*). Night-time Reco measurements clearly highlight this underestimation. At constant temperature, Reco drops when u* decreases. This observation has no apparent biophysical explanation. At the Hesse site, additional measurements have proven that some of the CO2 emitted during the night by the ecosystem components leaves the forest by horizontal advection when air mixing is limited.
This CO2 is not detected by the eddy covariance or profile concentration measurement systems, which explains the flux underestimation. We have established the u* threshold below which Reco is not correctly measured with a two-step procedure. In the first step, the temporal variability of Reco, mainly due to temperature fluctuations, is determined. Using data from periods without water stress and not flagged by the previous tests, the dependence of Reco on temperature (T) is fitted (regression algorithm presented in the following section) by a Q10 relationship (Black et al., 1996):

$$\mathrm{Reco} = \mathrm{Reco}_{T_{ref}} \, Q_{10}^{(T - T_{ref})/10}$$

where Q10 is the parameter reflecting the temperature sensitivity and Reco_Tref is the Reco value for a reference temperature T_ref. The temperature effect is then removed from Reco with this relationship to obtain a value Reco_s that should be relatively constant during non water stressed periods. The second step consists in distributing Reco_s into u* classes. For each class, we compute (Reco_s)_cl, the Reco_s class average, and (Reco_s)_hi, the Reco_s averaged over all the data with u* above the upper limit of the class. The objective is to detect the classes with a (Reco_s)_cl significantly (p<0.05) lower than (Reco_s)_hi (comparison procedure described in Sect. 2.5). Among these classes, the one with the highest u* is selected. The u* threshold then corresponds to the upper limit of this class.
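Two of the non-instrumental tests described in this section, the stationarity check and the u* threshold search, can be sketched as follows. This is a hedged Python illustration, not the authors' implementation. The assumptions are: the 30% difference is taken relative to the half-hour covariance, the significance test is a pooled-variance t statistic compared with a large-sample critical value (1.96) instead of an exact p-value, and all function names and class limits are hypothetical.

```python
import math
import statistics


def covariance(w, c):
    """Block-averaged covariance of two equally long series."""
    n = len(w)
    mean_w = sum(w) / n
    mean_c = sum(c) / n
    return sum((a - mean_w) * (b - mean_c) for a, b in zip(w, c)) / n


def stationarity_flag(w, c, n_sub=6, max_rel_diff=0.30):
    """Flag when the mean of the sub-period covariances departs from the full one."""
    full = covariance(w, c)
    size = len(w) // n_sub
    subs = [covariance(w[i * size:(i + 1) * size], c[i * size:(i + 1) * size])
            for i in range(n_sub)]
    mean_sub = sum(subs) / n_sub
    if full == 0.0:
        return mean_sub != 0.0
    return abs(mean_sub - full) / abs(full) > max_rel_diff


def significantly_lower(a, b, t_crit=1.96):
    """Pooled-variance t statistic; t_crit is a large-sample approximation."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    if pooled == 0.0:
        return statistics.mean(a) < statistics.mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t < -t_crit


def ustar_threshold(ustar, reco_s, edges):
    """Upper limit of the highest u* class whose mean Reco_s is significantly
    lower than the mean Reco_s of all data above that class (None if no class)."""
    threshold = None
    for lo, hi in zip(edges[:-1], edges[1:]):  # classes in ascending order
        in_class = [r for u, r in zip(ustar, reco_s) if lo <= u < hi]
        above = [r for u, r in zip(ustar, reco_s) if u >= hi]
        if len(in_class) < 2 or len(above) < 2:
            continue
        if significantly_lower(in_class, above):
            threshold = hi
    return threshold
```

Because the classes are scanned in ascending order, the last class that passes the test is the one with the highest u*, so its upper limit is returned as the threshold.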

Datasets
Two datasets are established, both compiling the main micrometeorological variables (global radiation, air and soil temperature, photosynthetic photon flux density, soil water content, air humidity, friction velocity, ...) and the NEE values for the 7344 half-hours between 15 May and 14 October 2005. In the first dataset, the gaps in the NEE correspond only to the breakdown and maintenance periods. This dataset is called DSIFR (DataSet Including Flagged Records) in the following (Fig. 1). In the other dataset, the records flagged by the tests are, in addition, removed and replaced by gaps (DataSet Excluding Flagged Records, DSEFR, Fig. 1). Two different partitionings of the datasets are applied during the analysis. The partitioning between night and day is based on global radiation (Rg), with night records corresponding to Rg below 3 W m−2. During night-time, NEE corresponds to ecosystem respiration (Reco) and during daytime to the sum of gross primary productivity (GPP) and Reco. The second division aims at isolating the periods without soil water stress for Reco. As soil respiration usually represents the major part of Reco (Law et al., 1999; Longdoz et al., 2000) and because Ngao (2005) has demonstrated that soil respiration in the Hesse forest is limited by water depletion when the soil water content in the first 10 cm (SWC) is below 0.2 m3 m−3, we have adopted this threshold for the dataset partitioning.
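The construction of the two datasets and the two partitionings can be illustrated with a short Python sketch. The thresholds come from the text; the representation of gaps as `None` and all names are assumptions made for the example.

```python
# Sketch of the DSIFR/DSEFR construction and of the two partitionings.
# Assumption: a gap is represented by None in a list of half-hourly NEE values.

NIGHT_RG_MAX_W_M2 = 3.0   # night when global radiation is below this value
SWC_STRESS_LIMIT = 0.2    # soil water stress when SWC (m3 m-3) is below this value


def is_night(rg):
    """Night/day partitioning based on global radiation (W m-2)."""
    return rg < NIGHT_RG_MAX_W_M2


def is_water_stressed(swc):
    """Soil water stress partitioning based on SWC in the first 10 cm."""
    return swc < SWC_STRESS_LIMIT


def exclude_flagged(nee_dsifr, flags):
    """Build the DSEFR series by replacing flagged DSIFR records with gaps."""
    return [None if flagged else value for value, flagged in zip(nee_dsifr, flags)]


def gap_fraction(series):
    """Fraction of half-hours without a usable NEE value."""
    return sum(value is None for value in series) / len(series)


if __name__ == "__main__":
    dsifr = [2.1, None, 3.4, -8.0]        # hypothetical NEE values, one breakdown gap
    flags = [True, False, False, True]    # hypothetical quality flags
    dsefr = exclude_flagged(dsifr, flags)
    print(dsefr, gap_fraction(dsifr), gap_fraction(dsefr))
```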

Statistical analysis and regression
All the statistical analyses and non-linear regressions are performed with the Statgraphics Plus software (Statistical Graphics Corp., Herndon, VA, USA). A t-test is used to compare two data samples and determine whether they are significantly different. For the t-test, the threshold for a significant difference is set to p<0.05 and the variances of the two samples are assumed to be equal. ANOVA (F-test) is used for multi-group comparisons with the same p threshold value. The non-linear regressions are used to determine the relationships between meteorological variables and fluxes. The algorithm is an iterative procedure determining the parameters that minimize the residual sum of squares (Marquardt method). For some of the regressions with a single independent variable, it is useful to reduce the possible impacts of other variables and of spatial heterogeneity. In these cases, the regressions are performed with bin-averaged fluxes, where the averages are computed for classes of the independent variable. The averages are weighted with weights proportional to the reciprocals of the squared standard errors (Murtaugh, 2007).

Results-discussion

Tests control results
Among the 7344 half-hours treated, only 2.1% correspond to total breakdown (mainly electricity failure) or maintenance of the EC system and 2.6% to specific maintenance or malfunctioning of the profile sampling system. On the remaining Fc data (7001 values), flags are activated on 4038 half-hours (55.0% of the total period). Consequently, following our procedure, 59.7% of the total period is not acceptable in the DSEFR. This is a relatively high percentage but not very surprising with regard to the proportions already presented in the literature. Rebmann et al. (2005) flagged 28% of the Hesse Fc for the summer 2000 with the stationarity test alone. Even if the flagging procedure is not completely identical, Vickers and Mahrt (1997) flagged, during their measurement campaigns, one-third of the records with the variance discontinuity test and one-half because of too large a kurtosis. In our procedure, the causes of the Fc flags can be divided into six categories: anemometer, CO2 IRGA and mass flow controller malfunctioning, lack of stationarity, too large footprint and too low u*. The percentages of Fc flags due to each category are presented in Table 1. The values given for the last three tests are computed from a dataset where the half-hours flagged by the first three tests on anomalies have been excluded. Three tests appear to be highly restrictive: CO2 IRGA anomalies, u* and stationarity. The percentage of stationarity flags is lower than that estimated by Rebmann et al. (2005) for the Hesse summer 2000 (28%), but their data were not previously selected through the CO2 anomalies test as here. The sum of all the half-hours flagged by the different tests (4296) exceeds the total value given in Table 1 (4038) because of multi-flagged data. To go one step further in the analysis and according to Sect. 2.3, we subdivided the CO2 IRGA flag into seven tests: spikes, mean discontinuity, variance discontinuity, absolute limits, skewness, kurtosis and standard deviation.
The highest percentage of bad data is obtained with the kurtosis test (Table 2), with flags on 29.9% of the total period. This predominance of flag activation by the kurtosis test was also observed by Vickers and Mahrt (1997) on their own measurements and perhaps comes from the fact that high-moment values are probably affected by an intermittent turbulent regime in addition to instrumental anomalies. As for Table 1, the sum of all the half-hours flagged by the different CO2 IRGA tests (3262) exceeds the total value (2297) because of multi-flagged data. Only two half-hours are flagged by the spike test. Indeed, the sudden variations in the records are often longer than the maximum width of what can be considered as a spike (4 points, equivalent to 0.4 s). These variations are taken into account in the higher-moment statistics, which explains their relatively high percentages of flags.
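The percentages quoted in this section can be cross-checked with a short calculation. The only assumption made here is that the 55.0% refers to the 7344 half-hours of the full period, which is the reading consistent with the 59.7% total; the last two differences are the surplus flag counts caused by half-hours flagged by more than one test.

```latex
\frac{4038}{7344} \approx 55.0\,\%, \qquad
2.1\,\% + 2.6\,\% + 55.0\,\% = 59.7\,\%, \qquad
4296 - 4038 = 258, \qquad
3262 - 2297 = 965
```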

Friction velocity thresholds
Before applying the u* test, the threshold has to be determined (see Sect. 2.3). The u* test is used to exclude records only from DSEFR (Fig. 1), but we have determined the u* threshold for the two datasets to estimate the impact on this value of all the other tests presented here (applied before the u* one); at a few EC sites the u* test is used without first applying other tests. Because the Q10 relationship fit depends on the available dataset, the u* threshold determination procedure is applied separately to DSIFR and DSEFR. The Q10 relationship fits on the Reco data (night NEE) are performed excluding soil water stress periods (Reco_s representing 17% of the total period). The bin-average is used to erase the impact of the Reco_s values affected by horizontal advection. We tested soil temperatures at 10 and 5 cm depth as the independent variable (Table 3). The 10 cm depth soil temperature (Ts10) is chosen because of its better general ability to explain the Reco variations. This is probably due to its superior ability to take into account the spatial heterogeneity, because the 10 cm depth soil temperature value is an average of six measurements while the 5 cm temperature is measured with only one sensor. [...] This result suggests that the u* threshold is relatively constant with time. The u* filter is applied to both night and daytime and rejects respectively 31.3% and 9.0% of the half-hours, for a total of 12.3%, but a part of these half-hours were already flagged by another test. Thus, when the u* filter is the last test employed, it is responsible for an increase of the gap fraction from 50.3% to 59.7% in DSEFR. This increase is especially large for night-time (from 50.9% to 68.9%), illustrating the presence of advection events.
Even if our conservative way of setting the thresholds for the Vickers and Mahrt (1997) instrumental malfunctioning tests can be responsible for a slight overestimation of the percentage of data removed in DSEFR, its high value, especially during night-time (68.9%), stresses the importance of the data gap filling method (Moffat et al., 2007).

Temporal variability of ecosystem respiration
The temporal variability of Reco is usually attributed to temperature and soil water content variations (Carlyle and Ba Than, 1988; Richardson et al., 2006b). To separate the influence of these two environmental factors, the periods without any soil water stress (Sect. 2.4) are selected to analyse the temperature effect with a Q10 function. The regression is performed with Ts10 as explained above. The residuals of this regression, computed on all the Reco values, are then used to simulate the SWC impact with a Gompertz function with two parameters, a and b. The complete Reco function (product of the Q10 and Gompertz functions) is compared to the data to indicate the degree of temporal variability explained by Ts10 and SWC. When this procedure is applied to the DSIFR and DSEFR half-hour data, the determination coefficients (r2) are very low (<0.01 and 0.02, respectively). This can reflect the fact that the Q10 and Gompertz functions are not the perfect parameterisation to simulate ecosystem respiration, but this explanation is not sufficient to justify such low r2 values. The other possibility is the existence of factors other than Ts10 and SWC as major causes of the temporal variability of Reco. The time evolution of the microbial population, of the available soil carbon content or of the root and aerial biomass can be invoked to explain the temporal variability of Reco. However, they vary too slowly compared to the short-term variations of the Reco measurements (Fig. 3). A change in footprint seems to be a more likely candidate as it could happen between two consecutive half-hours. However, this hypothesis implies a noteworthy ecosystem spatial heterogeneity, which is apparently not the case for the Hesse site, which has a reasonably homogeneous vegetation type and age.
Biogeosciences, 5, 719-729, 2008, www.biogeosciences.net/5/719/2008/
To give some indication of the possible impact of footprint changes on the Reco temporal variability, we select measurements for a narrow range of Ts10 (from 11.5 °C to 12.5 °C), excluding soil water stress periods. These measurements are compared according to the geographical sector they come from. This comparison cannot completely replace a full analysis combining a footprint model and a detailed map of soil respiration, but this map is not yet available. Moreover, the subdivision into patches with homogeneous soil respiration that would result from such an analysis would probably lead, in the DSEFR case, to too few data per patch to investigate the temperature and soil water impacts for many patches. Our Reco comparison between the geographical sectors shows differences. The most evident one appears when the East-Southeast sector (wind direction between 75° and 155°) is compared to the sector including the other wind directions. The ecosystem in the East-Southeast sector (mean Reco=6.01 µmol m−2 s−1) produces significantly more CO2 (p=0.034) than the ecosystem outside this zone (mean Reco=3.99 µmol m−2 s−1). When sectors of 45° width are defined, the ANOVA performed on 5 groups (not enough data between 225° and 360°) gives a statistically significant difference between the 5 means (p=0.029). The large Reco disparity found with this ANOVA (up to almost 4 µmol m−2 s−1) is of a sufficient order of magnitude to potentially explain many of the short-term Reco variations. One of the possible causes of this heterogeneity could be the dependence of soil respiration on the carbon to nitrogen content ratio (C/N). Ngao (2005) has demonstrated this dependence, but it could only be fully implicated if the soil C/N map (work in progress) shows a spatial heterogeneity in agreement with the results of the geographical sector analysis.
To overcome the spatial heterogeneity problem in the study of the Ts10 and SWC influences on Reco, we use the bin-average technique. The bin-averaged Q10 (with T_ref=10 °C) and Gompertz regressions are presented in Figs. 4 and 5. Bin-averaging clearly improves the goodness of fit (Table 4), with Ts10 being the main explanatory factor for the Reco temporal variability, as usually found (Richardson et al., 2006b). The Reco10 values (Table 4) are higher than the values of the equivalent parameter found for soil respiration (Rs10 from 1.5 to 2.4 µmol m−2 s−1, Ngao, 2005), as expected because of the CO2 production by leaves and aerial wood. However, Rs10 represents 65% to 36% of the CO2 sources according to the soil plot investigated. The percentages for the less productive soil plots are low compared to the mean for European forests (Janssens et al., 2001), but they are perhaps not representative of the fluxes measured by the EC system. Contrary to Reco10, the Q10 value is lower in our study (Table 4) than the Q10 estimated for soil (2.55, Ngao, 2005). This is consistent with the contribution of aerial biomass to Reco, which includes sources less sensitive to Ts10 than the soil.
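The Reco parameterisation used in this section (a Q10 temperature response multiplied by a Gompertz response to SWC) can be sketched in Python. The Q10 part follows the usual definition with T_ref = 10 °C. The Gompertz expression used below, exp(-exp(a - b·SWC)), is a common two-parameter form and is an assumption of this sketch, as are the function names and the parameter values in the usage lines; none of them are fitted values from this study.

```python
import math


def q10_response(ts10, reco_ref, q10, t_ref=10.0):
    """Reco at soil temperature ts10 (deg C) for a given Q10 and reference value."""
    return reco_ref * q10 ** ((ts10 - t_ref) / 10.0)


def gompertz_response(swc, a, b):
    """Assumed two-parameter Gompertz limitation, between 0 and 1."""
    return math.exp(-math.exp(a - b * swc))


def reco_model(ts10, swc, reco_ref, q10, a, b, t_ref=10.0):
    """Complete Reco function: product of the Q10 and Gompertz responses."""
    return q10_response(ts10, reco_ref, q10, t_ref) * gompertz_response(swc, a, b)


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    print(reco_model(ts10=15.0, swc=0.25, reco_ref=3.0, q10=2.0, a=2.0, b=20.0))
```

With b positive, the Gompertz factor approaches 1 at high SWC and decreases as the soil dries, so the product reduces to the Q10 response when there is no water stress.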
The filtering by the tests does not lead to an increase of the coefficient of determination of the regressions (there is even a decrease), for both the bin-averaged and the simple cases. The moderate validity of the relationships used to describe the ecosystem respiration and fill the night gaps in the database implies that any overestimation of the data rejection rate during the night could lead to an increase of the uncertainty of the seasonal or annual Reco (and then NEE) estimation. In this context, the use of the u* filter to detect advection events could perhaps be improved (Ruppert et al., 2006). Nevertheless, the fact that the regression curves for DSIFR and DSEFR give differences in Reco that range from 19.8% to 34.4% in the 5 °C-20 °C Ts10 interval proves that the data eliminated by this filtering are not evenly distributed.
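The bin-average technique with weights proportional to the reciprocals of the squared standard errors can be illustrated as follows. This Python sketch only builds the class averages and the weighted residual sum of squares that a regression would minimise; it does not implement the Marquardt algorithm. Skipping classes with fewer than two points or with a zero standard error is a choice made for this example, and all names are hypothetical.

```python
import math
import statistics


def bin_statistics(x, y, edges):
    """Return (mean x, mean y, standard error of y) for each class with >= 2 points."""
    bins = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        pairs = [(xi, yi) for xi, yi in zip(x, y) if lo <= xi < hi]
        if len(pairs) < 2:
            continue
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        std_error = statistics.stdev(ys) / math.sqrt(len(ys))
        bins.append((statistics.mean(xs), statistics.mean(ys), std_error))
    return bins


def weighted_rss(bins, model):
    """Residual sum of squares of the class averages, weighted by 1 / SE^2."""
    total = 0.0
    for x_mean, y_mean, std_error in bins:
        if std_error == 0.0:
            continue  # weight undefined; class skipped in this sketch
        total += (y_mean - model(x_mean)) ** 2 / std_error ** 2
    return total


if __name__ == "__main__":
    ts10 = [11.0, 11.5, 12.0, 14.0, 14.5, 15.0]   # hypothetical temperatures
    reco = [3.0, 4.5, 3.6, 4.0, 5.5, 4.9]         # hypothetical fluxes
    classes = bin_statistics(ts10, reco, [10.0, 13.0, 16.0])
    print(weighted_rss(classes, lambda t: 3.0 * 2.0 ** ((t - 10.0) / 10.0)))
```

Classes with a small standard error weigh more in the fit, which reduces the influence of classes whose average is dominated by scattered half-hours.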

Temporal variability of gross primary productivity and net ecosystem exchange
In the two datasets, the GPP is calculated for the daytime half-hours with valid NEE measurements (91% for DSIFR and 46.7% for DSEFR). For each time step, the Reco simulated with the parameterisation presented in the previous section is subtracted from the NEE measurement to obtain GPP. The existence of two Reco parameter sets (DSIFR and DSEFR) leads to two GPP time series. The main factor influencing GPP is the photosynthetic photon flux density (PPFD, µmol m−2 s−1). This influence is parameterised with the Michaelis-Menten relationship adapted by Falge et al. (2001):

$$\mathrm{GPP} = \frac{\alpha \, \mathrm{PPFD}}{1 - \dfrac{\mathrm{PPFD}}{2000} + \dfrac{\alpha \, \mathrm{PPFD}}{\mathrm{GPP}_{2000}}}$$

where α is the ecosystem quantum yield (µmol m−2 s−1) and GPP_2000 (µmol m−2 s−1) is the GPP for a PPFD equal to 2000 µmol m−2 s−1. The r2 values of the regressions of this relationship on half-hour data are 0.57 (DSIFR) and 0.52 (DSEFR), thus much higher than the Reco-Ts10 ones in the same conditions (<0.01 and 0.02). The lower dispersion of the experimental points around the parameterisation curve is probably due to the lower spatial variability of GPP compared to soil respiration, which comes from the lower spatial variability of the vegetation characteristics compared to the soil ones. To improve the GPP-PPFD relationship, the bin-average method is also implemented. This masks the spatial variability and the possible control by other environmental factors like air temperature (Ta) or vapour pressure deficit (VPD). The influence of PPFD on GPP then appears extremely clearly (Fig. 6, Table 5). The residuals of this parameterisation do not show any dependence on Ta or VPD. This is not surprising in view of the 2005 full-leaf season climate (relatively humid and temperate). As for Reco, the data selection with the assessment tests does not improve the quality of the regression. This is demonstrated by the DSEFR r2 being lower than the DSIFR one (Table 5).
However, the difference between the parameters obtained with these regressions gives a significant variation in GPP between the DSIFR and DSEFR cases whatever the PPFD. The difference ranges from 7.0 to 38.3% when PPFD varies from 2050 to 0 µmol m−2 s−1. This result suggests that the data eliminated by the filtering possess a common feature. The GPP values generated by DSIFR and DSEFR are respectively -1246.7 and -1440.6 gC m−2, which corresponds to an assimilation increase of 193.9 gC m−2 (15.6%). However, the impact of the environmental factors is larger (27.3%, 52.9 gC m−2). For NEE, the tests induce a sequestration rise of 35.8 gC m−2 (6.1%). Compared to the results for GPP and Reco, the effect of the environmental factors is higher (98.2%) because this effect is cumulative when GPP is added to Reco to obtain NEE, in contrast to the part induced by the choice of the parameter set (DSIFR or DSEFR). It is important to note that the impacts of the quality tests on the different CO2 fluxes have the same order of magnitude as the expected year-to-year variations in NEE and, especially, in Reco and GPP, with regard to the values already published for forests under a similar climate (Aubinet et al., 2002; Carrara et al., 2003). Therefore, the application of the quality tests is able to strongly influence the inter-annual analysis. This conclusion is still valid even if we do not take into account the data gap filling method, considering that an NEE variation of 38.7 gC m−2 results only from the gap characteristics.
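The GPP computation and the light response used in this section can be sketched in Python. The sketch assumes the Michaelis-Menten form usually attributed to Falge et al. (2001), in which the curve passes through GPP_2000 at a PPFD of 2000 µmol m−2 s−1, and the sign convention of this paper (negative values for uptake). The function names and the parameter values in the usage lines are hypothetical.

```python
def gpp_from_nee(nee, reco):
    """GPP obtained by subtracting the simulated Reco from the measured NEE."""
    return nee - reco


def gpp_light_response(ppfd, alpha, gpp_2000):
    """Assumed Michaelis-Menten light response, equal to gpp_2000 at PPFD = 2000."""
    return alpha * ppfd / (1.0 - ppfd / 2000.0 + alpha * ppfd / gpp_2000)


if __name__ == "__main__":
    # Hypothetical parameters, negative because uptake is counted as negative.
    for ppfd in (0.0, 500.0, 1000.0, 2000.0):
        print(ppfd, gpp_light_response(ppfd, alpha=-0.05, gpp_2000=-30.0))
```

Fitting this function separately to the DSIFR and DSEFR series gives two parameter sets, which is how the two GPP totals compared in the text arise.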

Conclusions
The different tests presented in this paper are rarely applied together and systematically to large datasets. The results for the 2005 Hesse full-leaf season give an overview of their possible contribution. The high percentage of flagged data underlines the importance of continuing the work on data selection and data gap filling methods, especially for night-time. The strict data selection does not modify the u* threshold.
One of the expected contributions of the quality tests was the reduction of the unexplained short-term Reco fluctuations. This was not achieved because of the large spatial heterogeneity of Reco. Even for a site with homogeneous vegetation like Hesse, the Reco temporal variation should probably be studied from the point of view of the spatial heterogeneity of respiration before focusing on data quality. In this context, the way to proceed seems to be to first apply a footprint model combined with a soil respiration map before selecting the data and studying the inter-annual variability.
The tests apparently have an impact on the dataset properties. On the one hand, the data elimination changes the relationship between the CO2 fluxes and the environmental factors. On the other hand, the general features of the gaps differ when the quality tests are applied. Consequently, the gap filling by the parameterisations, using the environmental factors as independent variables, produces different Reco and GPP values for DSIFR and DSEFR. Even if this conclusion should be confirmed with other parameterisations or models, the total Reco, GPP and NEE for the 2005 full-leaf season vary by 24, 16 and 6%, respectively, with a larger GPP, more CO2 produced by respiration processes and a higher net sequestration when the tests are applied. The combination of these tests has the potential to influence the inter-annual analysis of the CO2 fluxes. Their systematic application to large databases such as those from the CarboEurope and FLUXNET experiments seems necessary.