Biogeosciences

Quality control of CarboEurope flux data – Part 1: Coupling footprint analyses with flux data quality assessment to evaluate sites in forest ecosystems

We applied a site evaluation approach combining Lagrangian Stochastic footprint modeling with a quality assessment approach for eddy-covariance data to 25 forested sites of the CarboEurope-IP network. The analysis addresses the spatial representativeness of the flux measurements, instrumental effects on data quality, spatial patterns in the data quality, and the performance of the coordinate rotation method. Our findings demonstrate that application of a footprint filter could strengthen the CarboEurope-IP flux database, since only one third of the sites are situated in truly homogeneous terrain. Almost half of the sites experience a significant reduction in eddy-covariance data quality under certain conditions, though these effects are mostly restricted to a small portion of the dataset. Reductions in data quality of the sensible heat flux are mostly induced by characteristics of the surrounding terrain, while the latent heat flux is subject to instrumentation-related problems. The Planar-Fit coordinate rotation proved to be a reliable tool for the majority of the sites using only a single set of rotation angles. Overall, we found a high average data quality for the CarboEurope-IP network, with good representativeness of the measurement data for the specified target land cover types.


Introduction
Continuous monitoring of fluxes between biosphere and atmosphere using the eddy-covariance technique (e.g. Aubinet et al., 2000; Baldocchi et al., 2000) has become an important tool to improve understanding of the role of different types of ecosystems as sources or sinks for greenhouse gases, with a particular focus on CO2. Datasets gathered by extensive networks such as FLUXNET (Baldocchi et al., 2001), CarboEurope and Ameriflux allow studying detailed ecosystem functions across biomes (e.g. Law et al., 2002; Hibbard et al., 2005; Reichstein et al., 2007b; Yuan et al., 2007) and ecosystem responses to climate anomalies and interannual variability (e.g. Wilson et al., 2002; Ciais et al., 2005; Reichstein et al., 2007a), or conducting inverse modeling studies to constrain carbon fluxes (e.g. Gerbig et al., 2003; Wang et al., 2007). Theoretical assumptions restrict the application of the eddy-covariance technique to conditions that often cannot be strictly met (e.g. Kaimal and Finnigan, 1994; Lee et al., 2004), especially at sites placed in topographically challenging terrain or surrounded by a heterogeneous land cover structure. Therefore, to ensure provision of high quality data for the growing community of flux data users, sophisticated protocols for the processing of eddy-covariance measurements (e.g. Mauder et al., 2006; Papale et al., 2006; Mauder et al., 2007) as well as for quality assessment and quality control (QA/QC) have been established.
In most of the studies using eddy-covariance databases on regional to continental scales, each site of the network is used to represent a certain biome type. Since for most sites the scale of horizontal variation is 1 km or less (e.g. Schmid, 2002), a distance that will often be exceeded by the fetch of the flux measurements (e.g. Jegede and Foken, 1999), footprint analyses (e.g. Schuepp et al., 1990; Horst and Weil, 1994; Schmid, 1994; Leclerc et al., 1997; Rannik et al., 2000; Kljun et al., 2002) have to be performed to test under which conditions the assumption of spatial representativeness is valid.
It has been demonstrated that landscape heterogeneity has a significant impact on measurement and interpretation of atmospheric data at a single point in space. Panin et al. (1998) and Panin and Tetzlaff (1999) related energy balance closure problems due to underestimation of turbulent fluxes to length scales of landscape inhomogeneity. Schmid and Lloyd (1999) showed potential errors in flux measurements over inhomogeneous terrain due to non-representative footprints, a sensor location bias that had to be taken into account to derive ecosystem scale fluxes from local measurements. In a modeling study,  demonstrated that subgrid scale heterogeneity can alter landscape scale fluxes by up to 300%. Kim et al. (2006) demonstrated the value of flux footprints to scale up flux data from local to landscape scales in order to link terrestrial observations to remote sensing data. Göckede et al. (2004) combined a flux data quality assessment approach with analytic footprint modeling (Schmid, 1994, 1997) to identify correlations between flux data quality and characteristics of the terrain surrounding the flux site. Their approach was successfully applied by Rebmann et al. (2005) on 18 forest sites of the CARBOEUROFLUX network.
Correspondence to: M. Göckede (mathias.goeckede@oregonstate.edu)
This study presents an update of the study by Rebmann et al. (2005), extending the work to the larger number of 25 CarboEurope-IP sites, which cover a wide range of forest ecosystems, climate zones, and management regimes. The original site evaluation approach has been replaced here by an improved version that builds on a more sophisticated Lagrangian Stochastic footprint algorithm and a more realistic approach to obtain areally-averaged surface roughness lengths (Hasager and Jensen, 1999) as input for the model. The extended site evaluation scheme includes (i) a quality assessment of the eddy-covariance fluxes and spatial structures in data quality, (ii) analysis of the spatial representativeness in terms of land cover structure and (iii) assessment of the performance of the applied Planar-Fit coordinate rotation procedure (Wilczak et al., 2001). Combination of these measures identifies potential problems connected to terrain characteristics and instrumental setup that reduce data quality. Including these findings as quality flags into eddy-covariance databases strengthens data reliability, allowing the user community to filter out measurements that do not meet the standards required for their specific studies.
Biogeosciences, 5, 433-450, 2008 www.biogeosciences.net/5/433/2008/

Data
A total of 25 forested flux measurement sites of the CarboEurope-IP network were analyzed in the context of this study. Site names and main characteristics are listed in Table 1. Flux data processing for all participating sites followed the concept proposed by Aubinet et al. (2000), which has been further refined by Mauder et al. (2006, 2007). Original flow fields were rotated according to the planar fit method (Wilczak et al., 2001), and subsequently the Moore- (Moore, 1986), Schotanus- (Liu et al., 2001), and WPL-corrections (Webb et al., 1980) were performed. The flux data quality assessment is described in detail in Sect. 3.3. No friction velocity threshold (u* criterion) was applied to filter out measurements with low turbulence intensities.
For the 25 sites that participated in this study, on average three months of flux data (minimum two months, maximum five months, Table 2) were provided (76 months of total raw data). The majority of the data (93%, or 71 months) was sent as raw data files by the cooperating research teams and uniformly processed with the flux processing software at the University of Bayreuth (TK2, Mauder et al., 2008). At the two sites that provided processed fluxes instead of eddy-covariance raw data for this study, data processing protocols exactly matched that of the TK2, ensuring a uniform data processing for the entire study. The average data loss due to gaps in the raw data was 9.2% (minimum 0.3%, maximum 37.6%). For the majority of the sites (14), the data loss due to gaps was below 5%.
Of the 25 participating sites, 24 provided the land cover maps required to analyze the spatial representativeness of the eddy-covariance fluxes (for CH-Lae, the land cover map was not compatible with the format needed for this analysis). These maps had a mean grid resolution of 40 m (minimum 10 m, maximum 150 m) and an average number of land cover classes of 10 (minimum 2, maximum 34). See Ta

Site characterization methodology

Source area analysis
The footprint algorithm applied for this study builds on the Thomson (1987) LS trajectory model of Langevin type (e.g. Wilson et al., 1983; Wilson and Sawford, 1996), which is operated forward in time. The exact formulation of the footprint algorithms, the definition of the flow statistics and the effect of stability on the profiles are described in . For each model run, 5×10⁴ particles were released from a height equal to 0.01 times the canopy height. A flux footprint can be derived by integrating their trajectories up to the upwind distance accounting for approximately 90% of the total flux. To enable footprint analyses based on 30-min averaged fluxes over periods of months for a large number of sites, source weight functions were pre-calculated and stored for more than 10 000 different combinations of measurement height, roughness length and atmospheric stability.
Each 30-min source weight function is projected onto a discrete grid representing the area surrounding the flux tower, with the assigned weighting factors representing the influence of each grid cell on the actual measurement. For each of these grid cells the land cover type has been defined, so that the land cover composition for the specific measurement can be obtained by summing up the weighting factors sorted by land cover. In a similar manner, quality flags for the eddy-covariance fluxes (see Sect. 3.3) are projected into the area and stored in a database. An overall quality flag per cell can be determined statistically after the complete observation period has been processed (see Göckede et al., 2004, 2006), and based on these statistics maps of flux data quality can be produced. The quality measures analyzed in this study are the QA/QC flags for momentum flux, sensible heat flux, latent heat flux, CO2 flux, and the mean values of the vertical wind component before and after performing the Planar-Fit rotation (Wilczak et al., 2001).
Summing up all 30-min source weight functions yields the relative influence of each part of the surrounding terrain on the fluxes measured, the so-called footprint climatology (Amiro, 1998). These concepts of linking footprint results with flux data attributes are described in detail in Göckede et al. (2004, 2006).
The evaluation of the land cover composition within the footprint of the measurements is a centerpiece of this study. Each of the sites in networks such as CarboEurope-IP has been set up to monitor a specific type of land cover, or a combination of two or more land cover types. Flux contributions from land cover types other than the specified target introduce a systematic bias to the measurements, altering the "true" signal from the target land cover type and thus increasing the scatter in biome intercomparison studies. Therefore, the source area of the fluxes measured has to be known to evaluate how well the measurements represent the specified target land cover. The approach presented herein determines the overall representativeness of a site with respect to the specified target land cover type, and identifies possible problematic wind sectors or meteorological situations that fail to produce representative data. For each of the 30-min averaged fluxes of a processed dataset, our approach determines the flux contribution of the specified target land cover type, and the results are transferred to a database. Based on this information, individual thresholds for a minimum required flux contribution can be defined for different studies to filter out measurements failing to meet that requirement. After intensive pre-analyses, we found the following classification into four groups the most practicable to characterize individual 30-min measurements and facilitate the overall site evaluation:
-Homogeneous measurements: 95% or more of the flux is emitted by the specified target land cover type. Systematic bias by flux contributions from other land cover types is negligible. Sites with a high percentage of homogeneous measurements reliably represent their specified target land cover type, and are thus ideal for across-biome studies.
-Representative measurements: 80 to 95% of the flux is emitted by the target land cover type. We chose the 80% threshold to limit possible disturbing influences of non-target land cover types on the measured fluxes to a low level. At the same time, the threshold is relaxed enough to acknowledge the fact that the vast majority of FLUXNET sites have to deal with heterogeneous land cover structures to a certain degree, and ideal homogeneity in the fetch is very rare.
-Acceptable measurements: 50 to 80% of the flux is emitted by the target land cover type. Though still dominant, the fluxes from the target areas are diluted significantly. Use for across-biome studies is not recommended, but may be valid depending on study objectives.
-Disturbed measurements: less than 50% of the flux is emitted by the target land cover type. These measurements are dominated by flux contributions which, strictly speaking, should be regarded as disturbances. These data are therefore not valid for across-biome studies.
Using the above categories, the sites within a flux network can be grouped according to the usefulness of a footprint filter to assure the representativeness of the measurement data for the specified target land cover type (see Sect. 4.1).
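The weighting and classification steps described above can be illustrated with a short Python sketch. It is not the processing code used in this study: the function names, the array layout (one gridded source weight function and one land cover grid of identical shape) and the handling of the class boundaries at exactly 95%, 80% and 50% are assumptions made for illustration only.

```python
import numpy as np


def target_contribution(weights, land_cover, target_classes):
    """Percent of the total footprint weight located on target land cover cells.

    weights        : 2-D array, source weight function of one 30-min flux
    land_cover     : 2-D array of land cover class codes, same shape as weights
    target_classes : iterable of class codes forming the target land cover
    """
    weights = np.asarray(weights, dtype=float)
    land_cover = np.asarray(land_cover)
    if weights.shape != land_cover.shape:
        raise ValueError("weights and land_cover must have the same shape")
    total = weights.sum()
    if total <= 0:
        raise ValueError("source weight function has no positive weight")
    on_target = np.isin(land_cover, list(target_classes))
    return 100.0 * weights[on_target].sum() / total


def classify_measurement(contribution_percent):
    """Assign one of the four categories listed above (boundaries assumed inclusive)."""
    if contribution_percent >= 95.0:
        return "homogeneous"
    if contribution_percent >= 80.0:
        return "representative"
    if contribution_percent >= 50.0:
        return "acceptable"
    return "disturbed"


# Hypothetical 2 x 3 grid: class 1 is the target, class 2 a non-target patch.
example_weights = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.1]]
example_cover = [[1, 1, 1], [1, 2, 1]]
share = target_contribution(example_weights, example_cover, {1})
print(round(share, 1), classify_measurement(share))
```

Storing the contribution itself, rather than only the category, keeps the thresholds adjustable for studies with different requirements.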

Flux data quality assessment
All fluxes were checked for their quality according to a scheme proposed by Foken and Wichura (1996) in the revised version as presented by . These tests are based on the analysis of high-frequency raw data of vertical and longitudinal wind components, air temperature, and water and CO2 concentrations. Quality ratings assigned for stationarity of the flow and the development of the turbulent flow field (using the so-called integral turbulence characteristics) are combined to yield the overall quality flag of the data, with individual results for the fluxes of momentum, sensible and latent heat, and CO2. The deviation of the mean vertical wind component from zero forms a separate quality measure. The stationarity of the flow is tested by comparing 30-min covariances with the linear average of six 5-min covariances obtained for the same period (Foken and Wichura, 1996). Good agreement between both values indicates stationary conditions, e.g. deviations lower than 15% are rated with flag 1.
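The stationarity test can be sketched in a few lines of Python. This is a minimal illustration, assuming that the deviation is expressed relative to the 30-min covariance and that the raw series can be split into six equally long segments; the function names are hypothetical, and only the 15% limit for flag 1 is taken from the text.

```python
import numpy as np


def covariance(a, b):
    """Covariance of two series after removing their block means."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))


def stationarity_deviation(w, c, n_sub=6):
    """Relative deviation (%) between the mean of n_sub short-interval
    covariances and the covariance over the whole averaging interval.

    w : vertical wind samples of one 30-min interval
    c : samples of the transported quantity (e.g. temperature, CO2)
    """
    w = np.asarray(w, dtype=float)
    c = np.asarray(c, dtype=float)
    if w.shape != c.shape or w.ndim != 1:
        raise ValueError("w and c must be 1-D series of equal length")
    cov_full = covariance(w, c)
    if cov_full == 0.0:
        raise ValueError("covariance of the full interval is zero")
    cov_sub = np.mean([
        covariance(ws, cs)
        for ws, cs in zip(np.array_split(w, n_sub), np.array_split(c, n_sub))
    ])
    return 100.0 * abs((cov_sub - cov_full) / cov_full)


def passes_flag_1(deviation_percent):
    """Deviations lower than 15% are rated with flag 1."""
    return deviation_percent < 15.0
```

A trend shared by both series inflates the 30-min covariance but hardly affects the 5-min covariances, which is why a large deviation points to non-stationary conditions.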
Integral turbulence characteristics are based on the flux-variance similarity (e.g. Obukhov, 1960; Wyngaard et al., 1971), and can be expressed as functions of stability of stratification (Panofsky et al., 1977; Foken et al., 1991) in case of undisturbed turbulence (Kaimal and Finnigan, 1994; Arya, 2001). To use them as a measure of flux data quality, values parameterized with standard functions (Thomas and Foken, 2002) are compared to the measurement results. Close agreement between both values indicates a well developed turbulent flow field, resulting in a good quality rating (e.g. differences are lower than 15% for class 1). Deviations, which may be caused by disturbances in the turbulent flow field like obstacles in the fetch or flow distortion caused by the instrument setup, are penalized by lower quality flags (differences for class 9 are >1000%). Specifics on the classification are given in . In the study presented, integral turbulence characteristics are only considered for the horizontal and vertical wind components, since for the temperature scalar, the parameterization is only valid for unstable stratification, and no valid parameterizations exist for the water vapor and CO2 scalars (see also Rebmann et al., 2005; Göckede et al., 2006).
Details on the combination of the ratings for stationarity and integral turbulence characteristics to form the overall flux quality rating are given in . Quality flags on a scale from one (best) to nine (worst) are assigned to each flux measurement. As a short guideline, fundamental research should be restricted to measurements with the highest data quality (classes one to three), while classes four to six indicate good quality, and data can be used for calculating solid averages or long-term budgets of the exchange fluxes. Classes seven and eight may be included for averaging purposes at the user's discretion, but should be checked with care for plausibility because of significant deviations from the basic theoretical assumptions for the eddy-covariance technique. Measurements rated with quality class nine should be discarded in any case. This classification does not include a test for the presence of advective fluxes, which can act as a large selective systematic error in the measurements and introduce large uncertainties into long-term budgets in addition to small errors in half-hourly measurements.
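For users who filter a database by these flags, the guideline above can be written as a small lookup. The following Python sketch is only an illustration; the function name and the wording of the returned labels are assumptions, while the class boundaries follow the guideline in the text.

```python
def recommended_use(flag):
    """Map an overall quality flag (1 = best, 9 = worst) to the usage guideline."""
    flag = int(flag)
    if not 1 <= flag <= 9:
        raise ValueError("quality flags range from 1 to 9")
    if flag <= 3:
        return "fundamental research"
    if flag <= 6:
        return "averages and long-term budgets"
    if flag <= 8:
        return "averaging at the user's discretion, check plausibility"
    return "discard"


# Keep only half-hourly values suitable for long-term budgets (hypothetical data).
flags = [1, 4, 7, 9, 2, 6]
budget_ok = [f for f in flags if f <= 6]
```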

Spatial structures in the data quality flags
Based on the footprint results, this approach links the quality flags for momentum flux, sensible and latent heat fluxes, and CO2 flux with characteristics of the surrounding terrain. Comparison of spatial structures in the data quality between individual sites is not straightforward, since local terrain characteristics and climate conditions may vary significantly. To facilitate an overall evaluation and allow site intercomparisons, we differentiate between three kinds of spatial effects in the data quality: isolated effects, multidirectional effects and instrumental effects. The abundance of each of those effects can be used for site classification (see Sect. 4.2).
Isolated effects on the data quality consist of a narrow wind sector with reduced data quality in an otherwise high-quality region of the map. Usually, due to the restriction to a specific wind direction and stability regime, the total number of lower-quality measurements is rather low, so that they could easily be overlooked if not coupled to footprint results as done in this approach. A perfect example of an isolated effect in the data quality was found for the Wetzstein site (DE-Wet, Fig. 1). This site received an outstanding overall quality rating for all fluxes, except for a narrow wind sector around 110° where the quality of all four fluxes analyzed was significantly reduced during stable to neutral stratification. In this case, the effect can most probably be attributed to flow distortion induced by the instrumentation setup, as it occurs independent of the type of flux. However, since wind direction from this sector is very rare at the Wetzstein site (1.8% for 2002-2006 for 105-115° over all stability regimes), this effect does not have a significant impact on the overall site performance.
For multidirectional effects, the median data quality is significantly lower than average for a specific stability of stratification or flux. In contrast to the isolated effect, a reduced data quality is found for several wind sectors, or is observed independent of the wind direction. Multidirectional effects are in most cases caused by instrumentation effects or regional flow patterns, rather than by the local scale characteristics of the surrounding terrain. A very good example of a multidirectional effect on the data quality was found for the Danish Sorø site (DK-Sor). At Sorø, all fluxes analyzed were of very good overall quality independent of the stability of stratification, except for the latent heat flux during stable stratification (Fig. 2). Since the CO2 flux was not affected by this reduction in data quality, both terrain influences and general instrumental problems of infra-red gas analyzer (IRGA) or sonic anemometer can be ruled out as a possible cause of the problem. One plausible explanation is that the tubing of the closed-path IRGA system affects water vapor transport to the sensor during nighttime measurements (see also Ibrom et al., 2007), altering the quality of the latent heat flux, but not of the CO2 flux. Other multidirectional effects identified in the context of this study were caused by extremely low turbulence intensities during stable stratification which did not pass the test for a well-developed turbulent flow field (FI-Hyy), or by a regional wind climatology with channeled flow that covered certain wind sectors only during transitional periods with non-stationary flow conditions (CH-Lae, ES-LMa).
linked to the IRGAs. As an example, Fig. 3 shows the data quality map for the latent heat flux during stable stratification at Sodankylä (FI-Sod), where a METEK USA1 sonic anemometer is installed. For this site, during stable stratification quality maps for all fluxes were structured into 3 sectors of 120 degrees each, with good data quality in the north, east-southeast and southwest, and a reduced quality in between. These results indicate that there is a flow distortion induced by the geometry of the METEK USA1 sensor head, slightly compromising the quality of the measurements.
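A strongly simplified way to screen a dataset for directional patterns of this kind is to bin the quality flags by wind direction and compare the median flag per sector. The Python sketch below illustrates that idea only; it is not the footprint-based mapping used in this study, which projects the flags onto a spatial grid. The sector width of 30 degrees, the function names and the use of a median flag above 6 as a marker of reduced quality are assumptions for illustration.

```python
import numpy as np


def sector_median_flags(wind_dir_deg, flags, sector_width=30):
    """Median quality flag per wind sector, keyed by the sector's lower bound (deg)."""
    if 360 % sector_width != 0:
        raise ValueError("sector_width must divide 360")
    wind_dir = np.asarray(wind_dir_deg, dtype=float) % 360.0
    flags = np.asarray(flags, dtype=float)
    if wind_dir.shape != flags.shape:
        raise ValueError("wind_dir_deg and flags must have the same length")
    n_sectors = 360 // sector_width
    index = np.floor(wind_dir / sector_width).astype(int) % n_sectors
    medians = {}
    for s in range(n_sectors):
        selected = flags[index == s]
        if selected.size:  # skip sectors without any measurement
            medians[s * sector_width] = float(np.median(selected))
    return medians


def reduced_quality_sectors(medians, limit=6.0):
    """Lower bounds of all sectors whose median flag exceeds the limit."""
    return sorted(start for start, value in medians.items() if value > limit)
```

One reduced sector in an otherwise good dataset would hint at an isolated effect, many reduced sectors at a multidirectional effect, and a regular angular pattern at an instrumental effect. Running the function separately for each stability class and each flux keeps these cases apart.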

Evaluation of the Planar-Fit coordinate rotation method
For this study, we applied the Planar-Fit coordinate rotation method (Wilczak et al., 2001) for all sites to adapt the orientation of the measured wind regime to the requirements of eddy-covariance flux data processing (e.g. a mean vertical wind velocity of zero). Only one set of coordinate rotation angles was applied at each site to avoid systematic errors in long-term mass balances introduced by transient horizontal flux divergences below the sensor height for sensors mounted above tall canopies (Finnigan et al., 2003). Consequently, in many cases the rotated mean vertical wind velocity still deviates from the ideal value of zero in some of the wind sectors, because the vertical wind field is often curved instead of being an ideal plane. For a detailed analysis of individual sites, maps of the mean vertical wind velocity before and after rotation can be compared to evaluate the performance of the Planar-Fit method. In the example of the Hesse site (FR-Hes, Fig. 4), the tilted unrotated wind field (maximum mean vertical velocity 0.15 m s⁻¹) was rotated into a balanced plane with absolute residues below 0.03 m s⁻¹. For site intercomparison, the maximum absolute value of the vertical wind velocity after rotation can be used to evaluate the performance of the Planar-Fit coordinate rotation method.
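The idea of fitting a single plane to the mean wind field and inspecting what remains can be sketched as follows. This Python example is a simplification under stated assumptions: it fits the plane w = b0 + b1 u + b2 v to the 30-min mean wind components by least squares and takes the regression residual as a stand-in for the mean vertical wind after rotation, which is a reasonable approximation only for small tilt angles. The full method derives rotation angles from the fitted coefficients and rotates all three wind components. Function names are hypothetical.

```python
import numpy as np


def fit_plane(u_mean, v_mean, w_mean):
    """Least-squares coefficients (b0, b1, b2) of w = b0 + b1*u + b2*v,
    fitted to the mean wind components of many averaging intervals."""
    u, v, w = (np.asarray(a, dtype=float) for a in (u_mean, v_mean, w_mean))
    design = np.column_stack([np.ones_like(u), u, v])
    coeffs, *_ = np.linalg.lstsq(design, w, rcond=None)
    return coeffs


def residual_vertical_wind(u_mean, v_mean, w_mean, coeffs):
    """Mean vertical wind left after removing the fitted plane (small-tilt approximation)."""
    u, v, w = (np.asarray(a, dtype=float) for a in (u_mean, v_mean, w_mean))
    b0, b1, b2 = coeffs
    return w - (b0 + b1 * u + b2 * v)


def max_abs_residual(u_mean, v_mean, w_mean, coeffs):
    """Largest absolute residual, usable for comparing sites."""
    return float(np.max(np.abs(residual_vertical_wind(u_mean, v_mean, w_mean, coeffs))))
```

Averaging the residuals per wind sector instead of taking the overall maximum would show where the flow field departs from a plane, in the spirit of the maps compared above.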

Representativeness for the specified target land cover
We analyzed the representativeness for the specified land cover type for 24 forested CarboEurope sites. Please note that while the classification of individual 30-min averages follows the scheme proposed in Sect. 3.2, for the overall site evaluation it has to be determined what percentage of the total dataset exceeds one of those thresholds, which can be chosen by the user. Three of those thresholds were tested herein.
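The site-level measure described here, the percentage of 30-min values reaching a given flux contribution from the target land cover, can be computed as in the following Python sketch. The function name and the treatment of missing values are assumptions, and a value equal to a threshold is counted as reaching it.

```python
import numpy as np


def share_exceeding(contributions, thresholds=(50.0, 80.0, 95.0)):
    """Percent of 30-min measurements whose target land cover contribution (%)
    reaches each threshold; missing values (NaN) are ignored."""
    values = np.asarray(contributions, dtype=float)
    values = values[~np.isnan(values)]
    if values.size == 0:
        raise ValueError("no valid flux contributions supplied")
    return {t: float(100.0 * np.mean(values >= t)) for t in thresholds}


# Hypothetical target contributions (%) of five half-hourly measurements.
shares = share_exceeding([100.0, 96.0, 85.0, 70.0, 40.0])
```

Reporting all three shares side by side lets the user pick the threshold that matches the requirements of a specific study.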
The vast majority (for most sites all) of the 30-min measurements analyzed in this study were dominated by flux emitted from the specified target land cover type (flux contribution of a 30-min averaged flux >50%). The maximum percentage of measurements with target area flux contributions lower than 50% in the dataset for an individual site was 3.3%. However, application of this threshold is not suitable for an overall site evaluation, because with a possible 49% of flux emitted by other sources there is still a significant source of uncertainty that might compromise the representativeness. Checking for the most rigid quality measure, the test for homogeneous source areas (95% or more of the flux is emitted by the target land cover type), we found that 10 of the 24 sites had 50% or more of their 30-min measurements falling into this category. These sites, which perfectly represent their specified target land cover type, are FR-Pue ( Table 3).
The most suitable threshold defined in Sect. 3.2 to perform an overall site evaluation is the percentage of 30-min measurements exceeding the threshold of 80% flux contribution from the target land cover type (representative measurements). For a better overview of the results, we grouped the 24 sites into four different categories according to the percentage of representative measurements in the total dataset. See also  contribution from the target land cover type will significantly strengthen the data for further use.
-50% to 60% of data exceed the 80% threshold: 3 sites (FI-Hyy, IT-PT1, IT-SRo). Most measurements are still representative for the target land cover type, but small scale heterogeneities close to the tower that slightly influence the measurements during all stratification cases reduce the flux percentage of the target land cover type. During stable stratification, short fetches in one or several wind sectors lead to a large influence of non-target land cover types. These datasets need a detailed footprint analysis before uploading into the database.
-Less than 50% of data exceed the 80% threshold: 2 sites (BE-Bra, DK-Sor). The representativeness of these sites for the specified target land cover types is significantly reduced, due to various reasons. Details on these sites are listed further below.
For Brasschaat (BE-Bra, Fig. 5), the percentage of measurements exceeding the 80% threshold of flux contribution from the target land cover (forest) is only 42.5%. The general situation is similar to the three sites listed in the third category above (50-60% of representative data), with a small scale heterogeneity close to the tower, but significant reductions of the representativeness only to be found during stable stratification. However, the fetch is extremely short, especially in the main wind direction around west. Measurements during stable stratification should generally be discarded except for easterly winds. Application of a footprint filter based on the results by Rebmann et al. (2005) at this site increased the contribution of the target vegetation to >90% (Nagy et al., 2006). Because stable stratification is predominantly a nighttime phenomenon at this site, the footprint filter strongly affected the nighttime fluxes (on average more than 10% reduction), while it had virtually no effect on the daytime fluxes.
Two major factors compromise the representativeness of the Sorø (DK-Sor, Fig. 6) data for the target land cover type deciduous forest: first, the deciduous forest is interspersed by a large number of small heterogeneities such as other forest types or grassland patches, and second, the fetch is rather short, especially in the main wind directions to the east and to the west. Consequently, only 9.5% of all measurements exceed the threshold of 80% flux contribution from the target land use type (however, still 99.3% exceed the threshold of 50%). These results have to be interpreted with care, since the wind climatology within the 2½-month period in summer 2002 used for this study has more pronounced maxima to the east and west than usually found at this site. In addition, most of the large grassland patches, or clearings, within the forest have characteristics very close to those of the target land cover type deciduous forest. A more representative wind climatology and inclusion of the clearings into the target area would lead to a significant increase of measurements truly representative for the specified target land cover type. Since the quite detailed land cover map provided for this site may have biased the results, a different treatment of the land cover data may change the representativeness of the flux data significantly. The footprint issue for this site has previously been dealt with by Jensen (2000, 2005).
Sites that have "footprint problems", i.e. a high percentage of flux contributions emitted from outside their defined target land cover type, can however be used without restrictions in applications where the exact source of the fluxes is not important. Such studies could include for example the analysis of interannual variability, monitoring of disturbance effects, and parameterization and validation of models with a low horizontal resolution compared to the footprint of the measurements.

Evaluation of spatial structures in flux data quality
For a general overview of average data qualities assigned by the  approach, Fig. 7 presents the frequency distributions of the median data quality found for the 25 sites involved in this study. Please note that due to software related problems, the CO2 flux for the site ES-LMa could not be evaluated in the context of this study, so only 24 sites are considered for the distributions of median data quality for CO2 in Fig. 7.
For most of the spatial maps of the momentum flux data quality, the quality ratings were in the excellent range between classes 1 and 3, indicating no significant disturbance effects on the measurements. For all sites, no significant reduction (mean QC flag greater than 6) of the data quality was observed during unstable and neutral stratification. reduction of overall data quality during stable stratification for three more sites (ES-LMa, FI-Hyy, FR-Pue).
The average data quality for the sensible heat flux was found to be lower than for the momentum flux, mainly due to a failed stationarity test for the temperature measurements. For 13 sites (BE-Bra, DE-Hai, DE-Wet, ES-LMa, FR-Hes, FR-LBr, FR-Pue, IT-Ren, IT-Ro1, IT-SRo, NL-Loo, PT-Mi1, UK-Gri), isolated effects were observed that occurred during one or two stability regimes. Since all stability regimes are affected, these effects are likely to be caused by heterogeneities in the source strength of the sensible heat flux in the surrounding area, instead of systematic errors in e.g. instrument setup, flux processing or quality flag assignment. Three more sites experienced a multidirectional reduction of the sensible heat flux data quality, two of those during stable stratification (FI-Hyy, IT-Col), and one (CH-Lae) during neutral stratification.
For the latent heat flux, overall data quality was frequently found to be only moderate (classes 3 to 6). For four sites, isolated disturbances were observed (DE-Wet, ES-LMa, FI-Sod, IT-SRo), most of those (except IT-SRo) restricted to a single stability class. Multidirectional reduction in data quality occurred mostly during stable stratification at 11 sites (CH-Lae, CZ-BK1, DK-Sor, FI-Hyy, FR-LBr, FR-Pue, IT-Col, IT-Ren, IT-Ro1, PT-Mi1, UK-Gri). For two of these sites, the data quality was significantly reduced for all stratification regimes: CH-Lae, which is situated on a mountain slope, and UK-Gri for which wintertime data were analyzed. The observed patterns suggest that reduced data quality of the latent heat flux can be attributed to a large extent to measurement problems such as water in the tubing systems of closed-path IRGAs, or open-path analyzers with liquid precipitating on the windows of the measurement path during rain or fog events, while terrain effects only play a minor role. A reduction of tubing length and regular cleaning of the tubing can help to improve the quality for closed-path systems.
The overall data quality of the CO2 flux was mostly found to be good to very good (classes 1 to 3). For five sites (DE-Wet, FR-Hes, FR-LBr, IT-SRo, UK-Gri), isolated effects were detected that reduced the data quality in one or two stratification regimes. Multidirectional problems were found at three sites (CH-Lae, FI-Hyy, PT-Mi1). Possible explanations for the reduced data quality in these cases include steep topography and channeled flow (CH-Lae) or high frequency of occurrence of low turbulence intensities (FI-Hyy). As for the sensible heat flux, data quality of the CO2 flux seems to be influenced mostly by smaller scale variations of source strength in the surrounding terrain, while sensor problems play a minor role.
Instrumental effects on the spatial structures of data quality were only observed for the two sites that used METEK USA1 sonic anemometers, FI-Sod and IT-Roc (all other sites installed Gill instruments). In the case of FI-Sod, all quality maps were clearly structured into 3 sectors with 120 degrees each during stable stratification (see Fig. 3 for details), with the strongest distinction between the sectors found for latent heat flux and CO2 flux. At the IT-Roc site, the influence of the sonic structures could clearly be observed in the spatial structures of the averaged vertical wind component after applying the Planar-Fit coordinate rotation method. Both results indicate that the geometry of the METEK USA1 sensor head influences the wind field, which may have an effect on the data quality under certain conditions. The problem may be caused by an ineffective head correction .
To facilitate a site intercomparison and overall network evaluation, the results for the four different fluxes and the instrumental effects listed above can be integrated into four major categories (Table 4).

Results of the Planar-Fit coordinate rotation method
The frequency distribution of the maximum value of the remaining residues after performing the rotation is displayed in Fig. 8. The results demonstrate that the application of the Planar-Fit coordinate rotation method was successful in the majority of the cases. For 20 of the 25 sites, the maximum of the residue of the mean vertical wind velocity did not exceed the threshold of 0.06 m s⁻¹. Because in most of those cases the area with the maximum residue was restricted to a very small part of the terrain covered by the accumulated source weight function, the total effect of these deviations from the ideal value of zero should be insignificant for the flux processing.
For three of the sites analyzed, Planar-Fit could not provide a rotated flow field with residues low enough to be considered insignificant (vertical velocity residue ≥ 0.10 m s⁻¹; CH-Lae, FR-LBr, IT-Ren), and two more sites (DE-Tha, PT-Mi1) were close to that threshold. In all cases, the distortion of the streamlines of the averaged flow cannot be removed with a single set of rotation angles. For two of those sites (CH-Lae, IT-Ren), this is a consequence of complex mountainous terrain, and for DE-Tha the curved streamlines in a narrow wind sector to the east of the tower can likewise be attributed to the hilly topography. For FR-LBr, the distorted streamlines are caused by steep gradients in surface roughness due to a significant step change in vegetation height, such as a forest edge close to the tower position. In the case of the PT-Mi1 site, which is situated in rather open savanna forest, possible explanations include a distortion of the wind field by individual trees close to the tower, or a flow distortion by the inlet tube of the IRGA system, which partly obstructs the sonic anemometer in the disturbed wind sectors.
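The residue check described above can be illustrated with a minimal sketch. It assumes the commonly used planar-fit regression of the mean vertical velocity on the two horizontal components, w = b0 + b1 u + b2 v, fitted once for the whole dataset, and then reports the largest absolute mean residual of w within any wind sector. The function names, the 30 degree sector width, the direction convention, and the synthetic data are illustrative assumptions; the study itself evaluates residues on footprint-based spatial maps, which this sketch does not reproduce.

```python
import math

def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    solution = []
    for k in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][k] = b[i]  # replace column k by the right-hand side
        solution.append(det(m) / d)
    return solution

def planar_fit(u, v, w):
    """Least-squares fit of w = b0 + b1*u + b2*v over all averaging intervals."""
    n = len(u)
    su, sv, sw = sum(u), sum(v), sum(w)
    suu = sum(x * x for x in u)
    svv = sum(y * y for y in v)
    suv = sum(x * y for x, y in zip(u, v))
    suw = sum(x * z for x, z in zip(u, w))
    svw = sum(y * z for y, z in zip(v, w))
    normal = [[n, su, sv], [su, suu, suv], [sv, suv, svv]]
    return solve3(normal, [sw, suw, svw])

def max_sector_residue(u, v, w, coeffs, sector_width=30.0):
    """Largest absolute mean residual of w within any direction sector."""
    b0, b1, b2 = coeffs
    sums = {}
    for x, y, z in zip(u, v, w):
        # Direction of the flow vector in sensor coordinates (illustrative convention).
        direction = math.degrees(math.atan2(y, x)) % 360.0
        key = int(direction // sector_width)
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + z - (b0 + b1 * x + b2 * y), count + 1)
    return max(abs(total / count) for total, count in sums.values())

# Synthetic example: a tilted plane sampled from 12 flow directions at two speeds.
u, v, w = [], [], []
for k in range(12):
    angle = math.radians(15.0 + 30.0 * k)
    for speed in (2.0, 4.0):
        u.append(speed * math.cos(angle))
        v.append(speed * math.sin(angle))
        w.append(0.02 + 0.05 * u[-1] - 0.03 * v[-1])

coeffs = planar_fit(u, v, w)
residue = max_sector_residue(u, v, w, coeffs)
```

For data that lie exactly on a plane, the residue vanishes. Adding an offset to w in a single sector leaves a residue in that sector that a single set of coefficients cannot remove, which is the situation described for the five problematic sites.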

Discussion
Interpretation of the results has to consider that the site evaluation approach applied here was developed as a practical and easy-to-use tool that allows extensive network studies such as the present one to be conducted. With an average of close to 4000 Lagrangian Stochastic footprints to be calculated for each of the 25 participating sites, generalizations like the use of pre-calculated source weight functions had to be adopted to reduce processing time. Consequently, adapting the flow statistics for the footprint computation to specific characteristics of the individual sites was not possible in the context of this study. Since any footprint model can only be as good as the description of the underlying turbulent flow conditions, this simplification introduces additional scatter into the results (e.g. Göckede et al., 2007). The performance of footprint models is further compromised by complex topographical conditions and step changes in surface properties (e.g. Schmid and Oke, 1988; Klaassen et al., 2002; Leclerc et al., 2003; Foken and Leclerc, 2004), which alter the atmospheric flow conditions. Consequently, the applied method may introduce additional uncertainty especially at sites with fine scale heterogeneities in land cover structure and frequent transitions from, e.g., arable land to tall forest. Overall, the uncertainty introduced by the footprint modeling mainly affects quantitative results like the percentage flux contributions from different types of land cover. Effects of footprint uncertainty on the predicted flux contributions depend on the relative position of heterogeneities to the tower location and on the local wind climatology, so that it is not possible to provide a general error estimate. Releasing particles only close to the ground in the setup of the Lagrangian Stochastic footprint model introduces an additional bias into the footprint computations, since the vertical structure of sources and sinks may be different for each of the fluxes (e.g. CO2 flux vs. latent heat flux), and may also change over time (e.g. daily course of CO2 assimilation in the upper canopy). Sensitivity analyses for the particle release height carried out for the same LS footprint model as applied herein (Rannik et al., 2003) demonstrate that elevated sources result in a shorter fetch and a more pronounced peak of the source weight function, as compared to sources on the ground. Therefore, the footprint calculations for this study can be regarded as conservative estimates, i.e. they tend to overestimate the fetch rather than underestimate it.
Since for most sites the fraction of the target land cover decreases with increasing fetch distance, this implies that the results for target area representativeness also tend to be conservative. Effects on qualitative findings are expected to be insignificant, since the spatial maps of data quality are mainly determined by the quality flags for the fluxes and by the meteorological boundary conditions. Changing the setup of the footprint model would alter the scales of those maps, but not the spatial patterns of areas with high or low overall data quality. See Göckede et al. (2006) for a more detailed discussion.
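The link between fetch length and target area representativeness can be illustrated with a small sketch. It assumes that the source weight function and the land cover map are available on the same grid, and computes the percentage flux contribution of the target land cover as the weighted share of matching cells. The function name, the one-row grids, and the weights are hypothetical illustration values, not output of the footprint model used in this study.

```python
def target_contribution(weights, land_cover, target):
    """Percentage of the total source weight that falls on cells of the target type."""
    total = 0.0
    on_target = 0.0
    for weight_row, cover_row in zip(weights, land_cover):
        for weight, cover in zip(weight_row, cover_row):
            total += weight
            if cover == target:
                on_target += weight
    if total == 0.0:
        raise ValueError("source weight function is empty")
    return 100.0 * on_target / total

# Hypothetical transect away from the tower: forest close by, cropland farther out.
land_cover = [["forest", "forest", "forest", "crop", "crop"]]
# Elevated sources: shorter fetch and a more pronounced peak near the tower.
elevated_release = [[0.50, 0.30, 0.15, 0.05, 0.00]]
# Near-ground sources, as assumed in the model setup: longer fetch.
ground_release = [[0.30, 0.25, 0.20, 0.15, 0.10]]

short_fetch_pct = target_contribution(elevated_release, land_cover, "forest")
long_fetch_pct = target_contribution(ground_release, land_cover, "forest")
```

With the target land cover concentrated near the tower, the longer fetch yields the lower target contribution, which is why near-ground release gives a conservative estimate of representativeness in this configuration.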
Even though considerable effort was invested in this study to ensure uniform processing of the eddy-covariance raw data of each participating site, a site intercomparison is still compromised by differences in the provided data material. We decided to rely on the experience of the cooperating research groups to pick a dataset that best represents the local conditions and avoids large data gaps and/or periods with sensor malfunctions. Consequently, the provided datasets vary in length and season covered. All data were considered for this study to optimize the results for each individual site, but for site intercomparison the increased representativeness of results based on a larger dataset, or the higher abundance of stable stratification cases with lower data quality during wintertime, has to be considered. Also, interpretation of data quality maps such as those shown in Fig. 2 must take into account that weighting according to the wind climatology is considered only for the accumulated source weight function (white lines), but not for the colored background indicating the median data quality. This implies that a sector with reduced data quality may vary in significance for the overall site evaluation, depending on whether it falls within the main wind direction or within a less frequented wind sector.
This study did not apply any kind of data filtering such as a friction velocity threshold (u* filter, e.g. Massman and Lee, 2002; Gu et al., 2005) to exclude situations with low turbulence intensities at nighttime, as is the normal procedure for the selection of data uploaded to the CarboEurope-IP database. Therefore, results presented herein may vary for the individual sites if the analysis were based only on submitted data, depending on the amount of data excluded by u* filtering. Since low data quality is usually correlated with low turbulence intensities, application of a u* filter when uploading the data into the database has the potential to solve many of the data quality problems during stable stratification that are identified by the approach presented herein. However, for 17 out of the 25 sites analyzed here, the fraction of data flagged "bad" (flags in the range 7 to 9) was reduced by only 30 percent or less after application of a u* filter with a threshold as recommended by the CarboEurope database. For these sites, application of a u* filter would have no or very limited impact on the findings presented. Significantly better results in the maps showing spatial effects on average data quality would probably be obtained for two of the analyzed sites, FI-Hyy and FR-Pue, where the fraction of bad data was reduced by more than 70 percent after application of the u* filter.
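The comparison of bad-data fractions with and without a u* filter can be sketched as follows. The sketch assumes that each averaging interval carries a friction velocity and a quality flag, and it interprets the reduction as the relative change in the fraction of flags 7 to 9 among the retained data; this interpretation, the function names, and the threshold of 0.1 m s⁻¹ are assumptions made for illustration only.

```python
def bad_fraction(flags):
    """Share of records whose quality flag is in the 'bad' range 7 to 9."""
    if not flags:
        return 0.0
    return sum(1 for flag in flags if 7 <= flag <= 9) / len(flags)

def ustar_filter_effect(records, ustar_threshold):
    """Compare the bad-data fraction before and after a u* filter.

    records: iterable of (ustar, flag) pairs, one per averaging interval.
    Returns (fraction_before, fraction_after, relative_reduction_percent).
    """
    records = list(records)
    before = bad_fraction([flag for _, flag in records])
    # Keep only intervals with sufficiently developed turbulence.
    kept = [flag for ustar, flag in records if ustar >= ustar_threshold]
    after = bad_fraction(kept)
    reduction = 100.0 * (before - after) / before if before > 0 else 0.0
    return before, after, reduction

# Hypothetical records: (u* in m/s, quality flag on the 1 to 9 scale).
records = [(0.05, 8), (0.08, 7), (0.12, 9), (0.30, 2),
           (0.45, 1), (0.50, 7), (0.60, 3), (0.25, 4)]
before, after, reduction = ustar_filter_effect(records, ustar_threshold=0.1)
```

In this made-up example, two of the four bad records occur at u* above the threshold and survive the filter, so the bad fraction drops only moderately. This mirrors the situation at sites where low data quality is not confined to low-turbulence periods.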
The different characteristics of the map material compromise a comparison between the sites, as the quality of the land cover information also influences the accuracy of the footprint analysis, and the various levels of detail may lead to biased conclusions about representativeness. For example, a forest with a large number of small scale heterogeneities such as roads and small clearings would be described completely differently by a map read out from topographical maps at a resolution of 100 m than by a map produced from remote sensing data with a resolution of 15 m. In the former case, the implicit majority filter would make most of the heterogeneities disappear, resulting in a description of a homogeneous forest, while the remote sensing map is capable of resolving the heterogeneities. On the other hand, a coarse resolution may also enhance the influence of heterogeneities which are just large enough to form the majority of a pixel, and therefore decrease the level of representativeness in the footprint analysis. Consequently, the choice of the map resolution has a significant impact on the footprint analysis, with the direction of the bias dependent on the specific land cover structure. The number of land cover types distinguished might also have a large impact on the footprint analysis, since an area with patches of different forest types might appear as homogeneous forest in a simpler map, and thus alter the target area significantly.
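Both directions of the resolution bias can be reproduced with a minimal sketch of a majority filter. It assumes that the coarse map assigns to each pixel the most frequent class of the fine cells it covers; the grids, class letters, and function names are hypothetical and do not correspond to any site map used in this study.

```python
from collections import Counter

def aggregate_majority(grid, factor):
    """Coarsen a land cover grid: each factor x factor block gets its most frequent class."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            # Ties go to the class encountered first in row-major order.
            coarse_row.append(Counter(block).most_common(1)[0][0])
        coarse.append(coarse_row)
    return coarse

def cover_fraction(grid, label):
    """Fraction of grid cells that carry the given class label."""
    cells = [cell for row in grid for cell in row]
    return cells.count(label) / len(cells)

# Hypothetical 6 x 6 fine map: forest (F) crossed by a one-cell-wide road (R).
fine_road = [list("FRFFFF") for _ in range(6)]
# Hypothetical clearing (C) covering 5 of the 9 cells in the upper-left block.
fine_clearing = [list("CCFFFF"), list("CCFFFF"), list("CFFFFF"),
                 list("FFFFFF"), list("FFFFFF"), list("FFFFFF")]

coarse_road = aggregate_majority(fine_road, 3)
coarse_clearing = aggregate_majority(fine_clearing, 3)
```

In the first case the road disappears and the coarse map shows pure forest. In the second case the clearing, which just forms the majority of one block, grows from 5 of 36 fine cells to one of four coarse pixels.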
For the PT-Mi1 site, we ran two site analyses to highlight this aspect: The original model run was based on the land cover map with a horizontal resolution of 100 m, which had been read out manually from topographical maps. The updated site analysis (Siebicke, 2007) used a map with 10 m resolution, which had been produced based on an intensive on-site vegetation survey and aerial photographs. Comparison of the maps demonstrated that, in a landscape characterized by small scale heterogeneities, a coarse resolution may significantly distort the coverage relationships between land cover types through the implicit majority filter, e.g. turning a small creek into a 100 m wide stream. In addition, the more realistic representation of the landscape in the high resolution map may also influence the choice of the target land cover type. In the case of PT-Mi1, the 10 m resolution map showed a large number of individual trees and tree groups interspersed with grass-covered areas (open savanna, or Montado), while the coarser map suggested patches of closed forest and large clearings. Regarding the Montado as the "true" target area and using the updated land cover map raised the average flux contribution from the target area from 71% to 99%, and consequently altered the overall site evaluation from "serious fetch problems" to "perfect fetch conditions". This example emphasizes that the choice of the target area and the quality of the underlying vegetation map have a significant influence on the results of the footprint-based quality evaluation procedure presented herein, which has to be considered especially for site intercomparisons. Moreover, the approach presented only distinguishes between target area and other land cover types when assessing the representativeness of the measurements, without taking into account ecophysiological differences between the different land cover types.
In practice, however, it does matter whether, for example, measurements intended to monitor mature forest are influenced to a certain degree by a neighboring lake or by an adjacent patch of younger forest, but the interpretation of the impact of such differences on fluxes at the individual sites is beyond the scope of this study. Due to the sources of scatter or systematic bias listed above, the results presented herein should not be used to rank the sites of the network, or to label them as suitable or unsuitable for providing valuable data for CarboEurope-IP. This study focuses on quality aspects of micrometeorological measurements, while ecological importance and available infrastructure also play a major role for site selection in a network. At sites receiving a low quality rating in one or more of the categories tested, the specific situations in which data quality is significantly reduced should be closely examined, as performed e.g. by Nagy et al. (2006) at the BE-Bra site. Our findings highlight effects reducing data quality at individual sites as well as network-wide patterns, distinguishing between sensor-related quality reductions that can be remedied and effects induced by characteristics of the surrounding terrain that have to be flagged and filtered out in the database.
Results computed with the footprint-based quality assessment approach presented herein could be used to establish a network-wide quality flagging procedure for eddy-covariance measurements that is consistent for all sites. Uniform QA/QC flags based on objective and transparent criteria would significantly improve database reliability for a growing user community, as they allow filtering the data by criteria like momentum flux data quality or the percentage flux contribution of the specified target land cover type. Since establishment of a uniform flagging procedure is complicated by the use of a variety of flux processing software tools in the CarboEurope-IP network, this approach could provide an alternative in the form of lookup tables which would be part of the post-processing once data are uploaded into the database. To obtain such lookup tables, the results obtained with the approach presented would have to be binned into classes of e.g. wind direction and stability of stratification, with the number and definition of the binning parameters as well as their class definitions left to the database manager. For each combination of classes, an averaged flux contribution of the target area and a list of parameters describing expected flux data quality (e.g. median flag, or percentage of low quality measurements) can be computed. Linking those tables to the database would allow assigning additional information on data quality to each individual 30 min averaged flux, allowing users to ensure that the data meet the requirements for their specific studies.
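A lookup table of this kind could be built along the following lines. The sketch assumes 30 degree wind sectors, three named stability classes, and flags of 7 or higher as "low quality"; the record layout, the function names, and all numbers are hypothetical choices of the kind that would be left to the database manager.

```python
from collections import defaultdict
from statistics import mean, median

def sector_of(wind_dir, sector_width=30):
    """Index of the wind sector that contains wind_dir (degrees)."""
    return int((wind_dir % 360) // sector_width)

def build_lookup(records, sector_width=30):
    """Bin site evaluation results by wind sector and stability class.

    records: iterable of (wind_dir, stability_class, flag, target_pct).
    Returns {(sector, stability_class): summary dict}.
    """
    groups = defaultdict(list)
    for wind_dir, stability, flag, target_pct in records:
        groups[(sector_of(wind_dir, sector_width), stability)].append((flag, target_pct))
    table = {}
    for key, rows in groups.items():
        flags = [flag for flag, _ in rows]
        table[key] = {
            "n": len(rows),
            "target_pct": mean([pct for _, pct in rows]),
            "median_flag": median(flags),
            # Assumption: flags 7 and above count as low quality.
            "low_quality_pct": 100.0 * sum(1 for f in flags if f >= 7) / len(flags),
        }
    return table

def annotate(table, wind_dir, stability, sector_width=30):
    """Quality information for one 30 min flux, or None if the class is empty."""
    return table.get((sector_of(wind_dir, sector_width), stability))

# Hypothetical results: (wind direction, stability class, flag, target contribution in %).
records = [(10, "stable", 8, 90.0), (20, "stable", 7, 80.0),
           (25, "stable", 2, 100.0), (200, "unstable", 1, 95.0)]
table = build_lookup(records)
info = annotate(table, wind_dir=15, stability="stable")
```

A database user could then reject, for example, all fluxes whose class has a median flag above a chosen limit or a target contribution below a chosen percentage, without rerunning the footprint analysis.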

Summary
This study applied the site evaluation approach by Göckede et al. (2006), which combines Lagrangian Stochastic footprint modeling with a quality assessment approach for eddy-covariance data, to 25 forested sites of the CarboEurope-IP network. The analysis focused on the representativeness of the datasets for the specified target land cover type, spatial patterns in the data quality of momentum flux, sensible and latent heat flux and CO2 flux, the performance of the Planar-Fit coordinate rotation method (Wilczak et al., 2001), and instrumentation effects on data quality.
We did not find systematic differences in overall data quality depending on the three different types of infrared gas analyzers and four different types of sonic anemometer installed at the analyzed sites, but METEK USA1 anemometers seem to introduce spatial patterns in the data quality due to the geometry of the sensor, which may be attributed to an ineffective head correction. Most of the sites represented their specified target land cover type very well, but only ten of the 25 analyzed sites (40%) are situated in almost homogeneous terrain, while for two sites measurements are significantly compromised by a heterogeneous land cover structure. A footprint filter is recommended as additional information in the CarboEurope-IP database to indicate the level of representativeness of each stored flux measurement. Twelve of the 25 sites (48%, Table 2) experienced a significant reduction in eddy-covariance data quality under certain conditions, mostly restricted to a small wind sector and a specific flux and/or atmospheric stability. Data quality was highest for the momentum flux and the CO2 flux. Our results indicate that the quality of the sensible heat flux was reduced mostly by terrain effects, such as small-scale heat sources that deviate significantly from their environment in terms of source strength, while instrumentation effects did not seem to be important for this parameter. For the latent heat flux, on the other hand, data quality reductions were caused by general instrumentation-related problems rather than by effects induced by the characteristics of the surrounding terrain. The Planar-Fit coordinate rotation could be applied successfully at 20 sites, while for the remaining five sites complex mountainous terrain or significant changes in vegetation height induced a distortion of the flow field that could not be corrected for.
For those sites, the use of a coordinate rotation for each averaging interval may be a more appropriate method to adapt the wind field for eddy-covariance data processing.
Overall, our quality evaluation results for the CarboEurope-IP network demonstrated a high average data quality, and good representativeness of the measurement data for the specified target land cover types. The study demonstrates that the site evaluation approach applied here, or the application of footprint approaches in general, provides a valuable tool to identify measurement problems connected to terrain characteristics and instrumental setup that reduce data quality. The reliability of eddy-covariance databases for the user community can be significantly strengthened by using these results as part of the database, allowing data that do not meet the required standards to be flagged and filtered out.