Technical Note: Flagging inconsistencies in flux tower data
Martin Jung
Jacob Nelson
Mirco Migliavacca
Tarek El-Madany
Dario Papale
Markus Reichstein
Sophia Walther
Thomas Wutzler
Abstract. Global collections of synthesized flux tower data such as FLUXNET have accelerated scientific progress beyond the eddy covariance community. However, remaining issues in FLUXNET data pose challenges for users, particularly for multi-site synthesis and modeling activities.
Here we present complementary consistency flags (C2F) for flux tower data, which rely on multiple indications of inconsistency among variables, along with a methodology to detect discontinuities in time series. The C2F relates to carbon and energy fluxes as well as to core meteorological variables, and consists of: (1) flags for daily data values, (2) flags for entire site variables, (3) flags at time stamps that mark large discontinuities in the time series. The flagging is primarily based on combining outlier scores from a set of predefined relationships among variables. The methodology to detect break points in the time series is based on a non-parametric test for the difference of distributions of model residuals.
Applying C2F to the FLUXNET 2015 dataset reveals that: (1) Among the considered variables, gross primary productivity and ecosystem respiration data were flagged most frequently, in particular during rain pulses under dry and hot conditions. This information is useful for modelling and analysing ecohydrological responses. (2) There are elevated flagging frequencies for radiation variables (shortwave, photosynthetically active, and net). This information can improve the interpretation and modelling of ecosystem fluxes with respect to issues in the driver data. (3) The majority of long-term sites show temporal discontinuities in the time series of latent energy, net ecosystem exchange, and radiation variables. This should be useful for carefully assessing results on interannual variations and trends of ecosystem fluxes.
The C2F methodology is flexible for customization and allows varying the desired strictness of consistency. We discuss limitations of the approach that can serve as starting points for future improvements.
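For readers who want a concrete picture of the discontinuity detection summarized in the abstract, the following is a minimal sketch: it assumes daily model residuals and uses a two-sample Kolmogorov-Smirnov test as the non-parametric test; the model, window length, and test actually used in C2F may differ.

```python
# Minimal sketch of flagging a discontinuity in a daily flux time series:
# compare the distributions of model residuals before and after a candidate
# break point with a non-parametric two-sample test. This is only an
# illustration, not the exact C2F procedure.
import numpy as np
from scipy.stats import ks_2samp

def discontinuity_pvalue(residuals, break_idx, window=365):
    """Two-sample KS test on residual windows before and after break_idx."""
    before = residuals[max(0, break_idx - window):break_idx]
    after = residuals[break_idx:break_idx + window]
    if len(before) < 30 or len(after) < 30:  # require enough daily values
        return np.nan
    _, p_value = ks_2samp(before, after)
    return p_value

# Synthetic daily residuals with an artificial level shift after day 1000
rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, 2000)
residuals[1000:] += 1.5
print(discontinuity_pvalue(residuals, 1000))  # very small p-value -> flagged
```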
Status: open (until 04 Oct 2023)
RC1: 'Comment on bg-2023-110', Dennis Baldocchi, 10 Sep 2023
Technical Note: Flagging inconsistencies in flux tower data
This is definitely a technical note, and as long as the editors of this journal are willing to consider it, I will give it due diligence as a referee. Otherwise, this document could be used as a white paper or grey literature to supplement processing on the ICOS or Fluxnet web sites.
I appreciate the utility of having a set of well-defined and community-accepted flags for the data, as data have a certain quality tied to time and place. Sadly, we are aware that many data users who are not involved in the details and rigors of the measurements, processing, and interpretation often ignore these flags. But there is no harm in producing and providing them.
What is unique and distinct here is the production of a complementary set of consistency flags (C2F) for flux tower data, which rely on multiple indications of inconsistency among variables, along with a methodology to detect discontinuities in time series. I am a fan of multiple constraints, so I am willing to read the case before me.
As I read this work, I have some disagreements with the conditions they may flag.
For example, it is stated that the most frequent flags were associated with photosynthesis and respiration during rain pulses under dry and hot conditions. I have spent a good part of my career studying such pulses. They are real and sustained following rain events. To remove them is faulty and will cause biases in sums. Yes, I concur that during the rain event itself data should be flagged when sensors are wet. But following the rain, huge amounts of respiration can and will occur.
2 Materials and methods
As I read this paper back and forth, I wonder about the wisdom of its organization. In the Methods section there are 8 figures or so. This seems more like a Results and Discussion. I also suspect much of the excess material could be in an Appendix or Supplemental material. For a Technical Note, this paper is really long and excessive.
Section 2.2.1. It is important to benchmark and monitor the relation between PAR and Rg. It is our experience that quantum sensors tend to drift over time, if not frequently calibrated. PAR is used to upscale fluxes with remote sensing and if those relationships are built on faulty values of PAR, the derived products in time, space and trends will be in error. Hence, looking at these flags can have important implications. Too often this issue has been overlooked, so it would be important to know the consequences of improving these data.
The relation between Rn and Rg is important to examine, but realize it will change with season as albedo and surface temperature change. So be careful, and do not make your flags using one annual dataset for the site.
Comparing GPP and Reco from day vs. night partitioning methods may be interesting, but I am not sure which is right. We know there is down-regulation of dark respiration during the day, and it is hard to measure reliable CO2 fluxes at night under stable conditions and with tall vegetation, appreciable storage, and/or sloping terrain. These can be points of reference, and maybe the daily sum is better than hour-by-hour measurements, as some errors cancel.
I have learned from Dario the value of plotting CO2 flux vs u* and developed a matlab subroutine to do so. The threshold can be uncertain as u* has some autocorrelation with the flux. Of course we don’t want to set high thresholds as they are based on a diminishing number of data points as high u* values are rare compared to low ones.
I see one of the constraints is LE + H vs Rn. What about G or storage in the water column? I think these tests are only instructional. We know that there are many differences in the sampling areas and representativeness of radiation and fluxes. It is dangerous to indict one or the other. And with wetlands it is really hard to measure water storage. We have a data set with nearly closed energy balance; then we flooded the system and it all went to hell. Same sensors, same processing, ideal fetch and site. Water just moves heat in and out, and it is hard as hell to sample it well enough.
I must admit I am having a problem coming up with a salient point of this paper and how it will help me do better. I am at the point where an outlier score is proposed. It seems OK, but it is a lot like college ratings, which depend upon an arbitrary set of metrics and scores.
I often advise using the set of sites that helps you ask and answer the questions you are asking, relating to climate, function, and structure. Just because these sites and data are in the FLUXNET database does not mean we have to use them all. Maybe this should be the point of this paper.
Figure 1. It is a comparison between machine learning and flux data. Not sure what I am to learn and extract here. Which is right or wrong? Machine learning ultimately is a fancy least-squares fit to a bunch of transfer functions and nodes.
Here is a set of data comparing annual carbon fluxes with machine learning methods from my sites. They are almost indistinguishable from the direct flux measurements they are derived from. In this case we know our site and develop the machine learning model with the most appropriate and representative biophysical forcings. In the figure given in this paper, I have no idea how appropriate the machine learning model may be for this situation, as the answer is based on independent variables they chose to use or omit.
Regarding the comparison of radiometers, I know that during some seasons our guy wires may shade the quantum sensor at certain sun angles. Surely those data are not fit for use, and I hope such a method may help detect these biases and errors.
Figure 2 seems to be a nice case study to show the attributes of your ideas. Maybe start with that one first. It is clear and more understandable, as we know PAR and Rg are closely related. So when there are differences it can help us think about why and which is more plausible and better.
Fig 3. Maybe I am just tired, or thick, but I don’t follow the logic and rationale of the flag for light used for GPP. It would only give me pause on the accuracy of the machine learning calculations, but not the eddy fluxes.
Fig 5. I am trying to get my head around the issue of the comparison of the daytime vs nighttime methods. Again, I would not argue one is better than the other. Personally, I like the idea of multiple constraints and of seeing whether the two methods are converging, for confidence more than anything. Not sure what you all are doing, but in the early days working with Eva Falge, we estimated respiration during the day by extrapolation of the CO2 flux vs light response curve. Now one of the limits is basing a regression and extrapolation on only a few points when the response function is linear, and the fact that during the sunrise-sunset period steady-state conditions don't hold. It is for these reasons that I argue against one being better or worse, but if they both converge at least we may assume the fluxes are good enough.
The reality is that pulses due to rain or insects passing through the path of the IRGA or sonic are problematic. Or those from electrical noise (a rarity today). We also see problems with CO2 fluxes over open water as there is a covariance with w and RSSI of the sensor that yields fluxes in the wrong direction and that are not physical. Those should be filtered. But I don’t hear about that here.
Fig 6 seems to align with my suggestions that some sites may not be the best for some analyses and just toss them. Nothing lost as we oversample in many situations.
Fig 7. Curious as to why there is a systematic jump in LE. Eddy covariance should be immune to such a jump, as we are doing mean removal. So even if sensors change and are properly calibrated, we should not expect such a marked difference. This is not like comparing two separate sensors, which can have offsets.
Fig 8. Illustration of the outlier score. This is needed to support the method described here. Has taken a long time to get to this point. Line 350!
Results
Fig 9 demonstrates the point of this method. As expected, met variable values tend to have few outliers.
Fig 10 provides a needed diagnostic as to when data may be rejected
Fig 11. Would think this would be a function of open vs closed path sensors
Fig 13. The jumps in NEE seem to be associated with site management. So Know Thy Site. Just don't blindly process long term data. This is why we have phenocams at our tower, to look at the vegetation when things are 'weird'.
Jumps in sensors can, will, and do happen. This is why we make big efforts to write notes and log our sensor systems. Users have to remember caveat emptor, use the data wisely, and when there are jumps look for the reasons rather than misinterpret the data. We data providers can't hand-hold all users. They must do due diligence when using the data too. Getting back to my point, one should not use all the data. Use what is best and most fitting.
Fig 14. Interesting
Discussion
Factors for potential false positive and false negative flagging
Glad to see something on this. But it leaves begging the point I make that respiration pulses are real.
Detection and interpretation of discontinuities in the time series
As I have mentioned, these are expected with long-term sites as management can make changes. The site history needs to be considered too.
4.3.1 Flagged data points
I have already made my point about the danger of flagging rain pulses that are real. We have studied this with eddy fluxes, chambers, soil probes and they are consistent.
4.3.2 Flagged discontinuities in time series
It is reasonable to flag discontinuities, but aren’t they flagged already?
Concluding points
I find this paper on the opaque side. It is a slog to read through, very engineering in spirit, style and narrative.
I must confess given the energy and time to write any paper, this is one I would not have spent writing.
I am missing the ‘so what’ message and being convinced I need to apply another set of flags to what I am already doing or what is being done in fluxnet, especially something that is automated and may not be applicable for the sites I may need in my synthesis.
The scoring method seems on the arbitrary side and reminds me of the scoring system for the ‘best’ world universities. Each scoring system yields a different ranking and group. I suspect this would apply to the application of this method, too.
I want to know how often this automated method suffers from type 2 errors, calling an error when there really isn’t one. This concern also revolves around my complaints about flagging real respiration rain pulses. These pulses are real and sustained and should not be flagged (except for the period when the sensors are wet).
At this point I really feel it is up to the editor whether or not they are interested in publishing such a paper. My suspicion is that it may not be cited much, but again I may be wrong, as I look at the data from a different perspective, being a data generator and knowing what to believe and accept as reasonable.
AC1: 'Reply on RC1', Martin Jung, 22 Sep 2023
We would like to thank Dennis Baldocchi for his critical assessment of our manuscript, which helps to improve clarity in the revised version. There are three main issues that have emerged from his review that we would like to clarify:
- Relevance, purpose, and clarity
Thankfully, FLUXNET has become a data stream to validate and calibrate process-based land surface models, machine learning models, and retrievals from remote sensing. These synergistic global data streams feed global assessments of ecosystems and climate, and associated scientific progress. Scientists leveraging flux tower data from the full network are often non-eddy covariance experts, and take the data for granted, partly because available data quality flags are limited to a few variables and are limited in their scope.
The purpose of our work is to provide a methodological framework and tool to assess inconsistencies in flux tower data according to a set of well-defined criteria. The resulting complementary consistency flags can be used additionally in network-wide synthesis studies by non-EC experts, e.g. for analyzing the robustness of results to the inclusion of questionable data points. The tool can assist in and accelerate data quality checking done individually by PIs or by centralized processing – it cannot replace current practices, but it can catch issues that occasionally slip through and will eventually help to save time. We have demonstrated in the manuscript that the obtained flagging patterns point to issues in the data that are not really surprising to PIs but which can have very large relevance for important scientific questions if those data issues were not flagged or were ignored. Since no such flags have been available so far, we think that our framework and tool to assess inconsistencies in flux tower data is a useful contribution to the community.
We apologize for the apparent lack of clarity in framing the paper and we aim at improving this aspect in the revision. The presented material is very technical by nature. We tried to make it accessible by including educational examples and many figures in the methods section, while bundling details in the last subsection to keep the clarity of the overall framework. It is clearly a delicate balance that we are willing to improve according to reviewer suggestions in the revised version. We thank the reviewer for his comments which help us to improve the framing and clarity of the revised paper.
- Respiration rain pulse response
We apologize for an apparent misunderstanding that we clarify in the revised version of the manuscript. Clearly, rain induced respiration pulses are real, interesting, and relevant phenomena evident in the measured NEE. However, flux partitioning methods struggle to work accurately under these conditions of abrupt change in ecosystem sensitivity to moisture, temperature, and light such that derived GPP and Reco estimates can be inaccurate. We hope that flagging GPP and Reco, when they appear to be inconsistent, helps for a better interpretation and usage of GPP and Reco data, motivates and helps in the development of improved flux partitioning methods, and helps in better understanding respiration rain pulses.
- Methodological design
We apologize for the apparent lack of clarity of the methodological rationale and advantages, which we aim to improve in the revised version of the manuscript. The method follows a clearly defined logic for identifying inconsistencies in flux tower data based on a set of criteria (constraints) that are conceptually and empirically justified (see Table 2 in the manuscript), along with a careful conceptual distinction between soft and hard constraints. The strictness parameter relates to the boxplot rule and thus has an interpretable basis that is familiar to everyone with a basic understanding of statistics. The methodology is fully automated and there is no data screening based on subjective visual inspection. The methodology is set up to be flexible in terms of modifying the criteria (constraints) and strictness – a key advantage for tailoring the methodology to specific questions and applications, since "there is no free lunch".
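To make the boxplot-rule basis of the strictness parameter concrete, here is a minimal sketch; the parameter name n_iqr, the rule requiring two concurring soft constraints, and all variable names are illustrative assumptions rather than the tool's actual interface or combination logic.

```python
import numpy as np

def outlier_score(residuals, n_iqr=3.0):
    """Boxplot-rule score: distance (in IQR units) beyond the n_iqr fences.
    Values <= 0 lie inside the fences; larger values are stronger outliers."""
    q1, q3 = np.nanpercentile(residuals, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - n_iqr * iqr, q3 + n_iqr * iqr
    return np.maximum((lower - residuals) / iqr, (residuals - upper) / iqr)

def flag_daily_values(hard_scores, soft_scores):
    """Illustrative combination rule: flag a day if any hard constraint is
    violated, or if at least two soft constraints indicate an outlier."""
    hard_flag = np.any(np.stack(hard_scores) > 0, axis=0)
    soft_flag = np.sum(np.stack(soft_scores) > 0, axis=0) >= 2
    return hard_flag | soft_flag
```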
Below we reply point by point to the comments.
This is definitely a technical note, and as long as the editors of this journal are willing to consider it, I will give it due diligence as a referee. Otherwise, this document could be used as a white paper or grey literature to supplement processing on the ICOS or Fluxnet web sites.
See major point 1).
I appreciate the utility of having a set of well-defined and community-accepted flags for the data, as data have a certain quality tied to time and place. Sadly, we are aware that many data users who are not involved in the details and rigors of the measurements, processing, and interpretation often ignore these flags. But there is no harm in producing and providing them.
We appreciate the positive comment on the general usefulness of flags for eddy covariance data users. We hope that a clear and peer-reviewed documentation raises the awareness of the users and can facilitate a more appropriate usage of the data.
What is unique and distinct here is the production of a complementary set of consistency flags (C2F) for flux tower data, which rely on multiple indications of inconsistency among variables, along with a methodology to detect discontinuities in time series. I am a fan of multiple constraints, so I am willing to read the case before me.
As I read this work, I have some disagreements with the conditions they may flag.
See below for responses on specific conditions.
For example, it is stated that the most frequent flags were associated with photosynthesis and respiration during rain pulses under dry and hot conditions. I have spent a good part of my career studying such pulses. They are real and sustained following rain events. To remove them is faulty and will cause biases in sums. Yes, I concur that during the rain event itself data should be flagged when sensors are wet. But following the rain, huge amounts of respiration can and will occur.
See major point 2).
2 Materials and methods
As I read this paper back and forth, I wonder about the wisdom of its organization. In the Methods section there are 8 figures or so. This seems more like a Results and Discussion. I also suspect much of the excess material could be in an Appendix or Supplemental material. For a Technical Note, this paper is really long and excessive.
See major point 3). We can agree to move, e.g., methodological details to the supplementary material based on the reviewers' and editor's advice. We initially kept the details in the manuscript because we considered this appropriate for a 'Technical Note'.
Section 2.2.1. It is important to benchmark and monitor the relation between PAR and Rg. It is our experience that quantum sensors tend to drift over time, if not frequently calibrated. PAR is used to upscale fluxes with remote sensing and if those relationships are built on faulty values of PAR, the derived products in time, space and trends will be in error. Hence, looking at these flags can have important implications. Too often this issue has been overlooked, so it would be important to know the consequences of improving these data.
We agree with the reviewer that issues in PAR data could easily propagate into and deteriorate downstream analyses, including the flux partitioning. We hope that our C2F flags will contribute to a better usage of PAR data from FLUXNET and that they could also be implemented in the first steps of the data processing.
The relation between Rn and Rg is important to examine, but realize it will change with season as albedo and surface temperature change. So be careful, and do not make your flags using one annual dataset for the site.
We agree with the reviewer on these conceptual issues of the Rn-Rg relationship, which we had mentioned explicitly in Table 2. For these reasons, the relationship was classified as a soft constraint, meaning that additional constraints need to indicate outliers for the same data points to cause flagging. The median correlation between daily Rn and Rg is about 0.95, which provides an empirical justification for using this constraint to derive inconsistency flags for radiation variables.
Comparing GPP and Reco from day vs. night partitioning methods may be interesting, but I am not sure which is right. We know there is down-regulation of dark respiration during the day, and it is hard to measure reliable CO2 fluxes at night under stable conditions and with tall vegetation, appreciable storage, and/or sloping terrain. These can be points of reference, and maybe the daily sum is better than hour-by-hour measurements, as some errors cancel.
We agree with the reviewer on these points and will mention these factors as potential sources of inconsistencies between flux partitioning methods in the revised version of the manuscript. As explained above, the aim of these flags is not to identify which method is correct but to identify data with multiple indications of inconsistency that suggest issues, so that the user can choose whether to include these data, e.g. for modeling activities. Inconsistency between the results of partitioning methods is one of these cases.
I have learned from Dario the value of plotting CO2 flux vs u* and developed a matlab subroutine to do so. The threshold can be uncertain as u* has some autocorrelation with the flux. Of course we don’t want to set high thresholds as they are based on a diminishing number of data points as high u* values are rare compared to low ones.
Due to the reasons outlined by the reviewer, we used the u* uncertainty quantified according to Pastorello et al. 2020 (which is an extension of Papale et al. 2006 and more robust to noise) as a soft constraint for assessing NEE.
I see one of the constraints is LE + H vs Rn. What about G or storage in the water column? I think these tests are only instructional. We know that there are many differences in the sampling areas and representativeness of radiation and fluxes. It is dangerous to indict one or the other. And with wetlands it is really hard to measure water storage. We have a data set with nearly closed energy balance; then we flooded the system and it all went to hell. Same sensors, same processing, ideal fetch and site. Water just moves heat in and out, and it is hard as hell to sample it well enough.
We agree with the reviewer that accounting for storage is important for assessing the energy balance constraint for diurnal data, and that flooding and the associated lateral transport of energy would violate the assumption behind the LE+H vs Rn constraint.
Our approach is based on daily integrals, where for most sites and conditions the ground heat storage change G is quantitatively very small compared to the magnitude of daily Rn or LE+H and their uncertainties. Furthermore, daily mean G tends to scale with daily Rn, which further reduces the effect of neglecting G on the performance of the linear fit. This is illustrated by the median correlation for this constraint being 0.95, which provides an empirical justification. At sites where non-trivial variations of G would contribute to a weaker correlation between LE+H and Rn (e.g. wetland sites), the outlier score should not be impacted, because outliers are identified relative to the variance of the residuals (which would be larger). For better comparability among sites, we had decided not to account for G because it is comparatively frequently missing. While we hope we could make a convincing case for keeping this energy balance constraint, we agree with the conceptual concerns raised by Dennis and will consider it as a soft rather than a hard constraint in the revised version. Thank you again for the constructive comment, which helps improve the methodology.
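For concreteness, a minimal sketch of a residual-relative outlier score for the daily LE+H vs Rn constraint, assuming an ordinary least-squares fit and the illustrative n_iqr parameter from the sketch above; the actual fitting and scoring details in C2F may differ.

```python
import numpy as np

def energy_balance_outlier_score(le, h, rn, n_iqr=3.0):
    """Fit daily LE+H against Rn and score residuals with the boxplot rule.
    Because scores are relative to the residual spread, sites with larger
    scatter (e.g. wetlands with unmeasured heat storage) are not flagged more
    often per se."""
    y = le + h
    mask = np.isfinite(y) & np.isfinite(rn)
    slope, intercept = np.polyfit(rn[mask], y[mask], 1)
    residuals = y - (slope * rn + intercept)
    q1, q3 = np.nanpercentile(residuals[mask], [25, 75])
    iqr = q3 - q1
    return np.maximum(residuals - (q3 + n_iqr * iqr),
                      (q1 - n_iqr * iqr) - residuals) / iqr
```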
I must admit I am having a problem coming up with a salient point of this paper and how it will help me do better. I am at the point where an outlier score is proposed. It seems ok, but it is a lot like the college ratings, that depend upon an arbitrary set of metrics and scores.
See major points 1) and 3). Essentially, the objectives of the paper are to present the methodology/tool and to show the usefulness of the flags by pointing to data issues in FLUXNET2015 that are relevant for important scientific questions. We will make this clearer in the revised version of the manuscript.
I often advise using the set of sites that helps you ask and answer the questions you are asking, relating to climate, function, and structure. Just because these sites and data are in the FLUXNET database does not mean we have to use them all. Maybe this should be the point of this paper.
Indeed, we hope we can contribute to a better selection of data for specific FLUXNET-based analyses, for example those involving modeling and empirical upscaling. We will try to make that clearer in the revised version of the manuscript. See also major point 1).
Figure 1. It is a comparison between machine learning and flux data. Not sure what I am to learn and extract here. Which is right or wrong? Machine learning ultimately is a fancy least-squares fit to a bunch of transfer functions and nodes.
With Figure 1 we aimed at illustrating in an educational way how the outlier score works and behaves, in particular in the context of heteroscedasticity. We will improve the clarity of this part in the revised manuscript. See also major point 3).
Here is a set of data comparing annual carbon fluxes with machine learning methods from my sites. They are almost indistinguishable from the direct flux measurements they are derived from. In this case we know our site and develop the machine learning model with the most appropriate and representative biophysical forcings. In the figure given in this paper, I have no idea how appropriate the machine learning model may be for this situation, as the answer is based on independent variables they chose to use or omit.
We thank the reviewer for this important point and for sharing his results with us. The median correlation between machine-learning-based predictions (cross-validated within site) and the observations varies between 0.93 and 0.99 depending on the target variable (Table 2), which suggests that the chosen predictor variables listed in section 2.4.2 are appropriate. We had invested effort in designing and calculating a set of water availability metrics for improved modelling of water stress effects that are typically more difficult to obtain (section xx). The cross-validated model predictions as well as the performance metrics are by default stored in the tool and data and accessible to the interested user. We would like to emphasize again that the outlier score is always calculated with reference to the spread of the residuals, which means that a bad model, and thus large absolute residuals, would not by itself cause elevated outlier scores (discussed in section 2.4.2). Furthermore, the machine learning constraints are considered soft constraints, i.e. multiple indications of inconsistency are needed to cause flagging. We will improve the clarity on these aspects in the revised version of the manuscript.
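A minimal sketch of how such a within-site, cross-validated machine-learning soft constraint could be set up; the random forest model, the five-fold split, and gap-free inputs are illustrative assumptions, not the configuration used in the manuscript.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

def ml_constraint_scores(X, y, n_iqr=3.0):
    """Predict a daily flux from meteorological drivers with within-site
    cross-validation and score the residuals with the boxplot rule, so a
    generally poor model (large but uniform residuals) does not by itself
    produce outliers. Assumes gap-free X and y."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    y_hat = cross_val_predict(model, X, y, cv=KFold(n_splits=5, shuffle=False))
    residuals = y - y_hat
    q1, q3 = np.nanpercentile(residuals, [25, 75])
    iqr = q3 - q1
    return np.maximum(residuals - (q3 + n_iqr * iqr),
                      (q1 - n_iqr * iqr) - residuals) / iqr
```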
Regarding the comparison of radiometers, I know that during some seasons our guy wires may shade the quantum sensor at certain sun angles. Surely those data are not fit for use, and I hope such a method may help detect these biases and errors.
Good point. We will add this to the discussion.
Figure 2 seems to be a nice case study to show the attributes of your ideas. Maybe start with that one first. It is clear and more understandable, as we know PAR and Rg are closely related. So when there are differences it can help us think about why and which is more plausible and better.
Thank you for your suggestion! We will consider this carefully for the revised version of the manuscript.
Fig 3. Maybe I am just tired, or thick, but I don’t follow the logic and rationale of the flag for light used for GPP. It would only give me pause on the accuracy of the machine learning calculations, but not the eddy fluxes.
SW_IN is an input of the daytime-based flux partitioning method, which fits a light-response curve. Therefore, GPP_DT is affected by errors in SW_IN, and for this reason GPP_DT gets flagged when SW_IN is flagged (see also Fig. 4). We apologize if that was not clear enough and will improve this aspect in the revisions.
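As a simplified picture (not the actual daytime partitioning algorithm, which also includes temperature and VPD dependencies), the sketch below shows how a bias in SW_IN shifts the fitted parameters of a light-response curve and hence the derived GPP; all values are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit

def light_response(sw_in, alpha, gpp_max, reco):
    """Rectangular hyperbola for NEE (negative = uptake) plus respiration."""
    gpp = alpha * gpp_max * sw_in / (alpha * sw_in + gpp_max)
    return -gpp + reco

rng = np.random.default_rng(1)
sw_in = rng.uniform(0.0, 1000.0, 500)                     # synthetic SW_IN
nee = light_response(sw_in, 0.05, 30.0, 5.0) + rng.normal(0.0, 1.0, 500)

params_true, _ = curve_fit(light_response, sw_in, nee, p0=[0.05, 30.0, 5.0])
params_biased, _ = curve_fit(light_response, 0.8 * sw_in, nee, p0=[0.05, 30.0, 5.0])
print(params_true)    # close to the generating parameters
print(params_biased)  # biased SW_IN -> biased light-response parameters
```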
Fig 5. I am trying to get my head around the issue of the comparison of the daytime vs nighttime methods. Again, I would not argue one is better than the other. Personally, I like the idea of multiple constraints and of seeing whether the two methods are converging, for confidence more than anything. Not sure what you all are doing, but in the early days working with Eva Falge, we estimated respiration during the day by extrapolation of the CO2 flux vs light response curve. Now one of the limits is basing a regression and extrapolation on only a few points when the response function is linear, and the fact that during the sunrise-sunset period steady-state conditions don't hold. It is for these reasons that I argue against one being better or worse, but if they both converge at least we may assume the fluxes are good enough.
The main intention of Figure 5 was to illustrate how the flagging works and that we can diagnose which constraints have contributed to or caused flagging. In addition, we wanted to illustrate how respiration pulses evident in NEE (and not flagged) are associated with inconsistencies between GPP and Reco partitioning methods, such as systematically negative GPP_NT. If that figure adds more confusion than clarity, we will consider moving it to the supplementary material. See also major point 3).
The reality is that pulses due to rain or insects passing through the path of the IRGA or sonic are problematic. Or those from electrical noise (a rarity today). We also see problems with CO2 fluxes over open water as there is a covariance with w and RSSI of the sensor that yields fluxes in the wrong direction and that are not physical. Those should be filtered. But I don’t hear about that here.
Thank you for these points. We will fold those into the discussion.
Fig 6 seems to align with my suggestions that some sites may not be the best for some analyses and just toss them. Nothing lost as we oversample in many situations.
Agree.
Fig 7. Curious as to why there is a systematic jump in LE. Eddy covariance should be immune to such a jump, as we are doing mean removal. So even if sensors change and are properly calibrated, we should not expect such a marked difference. This is not like comparing two separate sensors, which can have offsets.
According to the BADM, the break coincides with a major change in instrumentation, including a change of the gas analyzer, the sonic anemometer, and the measurement height (see also Fig. 13). It is an example of how these tests could also help the PI identify at least a gap in the communicated metadata about the setup.
Fig 8. Illustration of the outlier score. This is needed to support the method described here. Has taken a long time to get to this point. Line 350!
We will consider moving it up in the revised manuscript. The trade-off between methodological details and overall clarity seems very delicate. See also major point 3).
Results
Fig 9 demonstrates the point of this method. As expected, met variable values tend to have few outliers.
This is mostly correct, although radiation variables also show comparatively frequent flagging.
Fig 10 provides a needed diagnostic as to when data may be rejected
It was intended to illustrate issues of currently existing flux-partitioning methods in particular for dry and rain pulse conditions.
Fig 11. Would think this would be a function of open vs closed path sensors
Interesting hypothesis! We’ll look into this for the revised version.
Fig 13. The jumps in NEE seem to be associated with site management. So Know Thy Site. Just don't blindly process long term data. This is why we have phenocams at our tower, to look at the vegetation when things are 'weird'.
Agree. Management is clearly an important factor that can cause flags for temporal discontinuities, as shown in Figure 13 and discussed in section 4.1.2. Changes in instrumentation or processing by the PI seem to be another, and apparently the more important, cause of temporal discontinuities (see Figure 13 and its caption, and section 4.1.2). As discussed in section 4.2.1, the purpose of these flags is to make users aware of temporal discontinuities; the decision of how to interpret or deal with them is left to the user according to their requirements.
Jumps in sensors can, will, and do happen. This is why we make big efforts to write notes and log our sensor systems. Users have to remember caveat emptor, use the data wisely, and when there are jumps look for the reasons rather than misinterpret the data. We data providers can't hand-hold all users. They must do due diligence when using the data too. Getting back to my point, one should not use all the data. Use what is best and most fitting.
While we agree in principle, we think that we should try our best to facilitate an appropriate usage of FLUXNET data by non-EC experts (in particular when large numbers of sites are used in an automatic ingestion system) to ultimately enable a wide and solid use of the data. Since we are addressing this problem explicitly, we hope our flags will be useful for the community of data users. A conceptual advantage of our system compared to visual expert judgement is that the approach is standardized and automated, avoids subjectivity, and avoids potential confirmation bias in selecting or filtering out data.
Fig 14. Interesting
Thank you.
Discussion
Factors for potential false positive and false negative flagging
Glad to see something on this. But it leaves begging the point I make that respiration pulses are real.
See major point 2).
Detection and interpretation of discontinuities in the time series
As I have mentioned, these are expected with long-term sites as management can make changes. The site history needs to be considered too.
Agreed. The flags can help point to management effects on fluxes, as discussed in sections 4.1.2 and 4.2.1, and the user can consult the BADMs and the PI to find out what happened specifically. Likewise, changes in instrumentation unfortunately seem to play another major role. This demonstrates once more the importance of the management and instrumentation change data that are often not shared. If such data were systematically available, an attribution of the temporal discontinuities could be facilitated in a future version of C2F. We will clarify this in the new version.
4.3.1 Flagged data points
I have already made my point about the danger of flagging rain pulses that are real. We have studied this with eddy fluxes, chambers, soil probes and they are consistent.
See major point 2).
4.3.2 Flagged discontinuities in time series
It is reasonable to flag discontinuities, but aren’t they flagged already?
No, they are not. Therefore we think our methodology and tool will be useful.
Concluding points
I find this paper on the opaque side. It is a slog to read through, very engineering in spirit, style and narrative.
See major points 1) and 3). We'll try to improve.
I must confess given the energy and time to write any paper, this is one I would not have spent writing.
See major point 1).
I am missing the ‘so what’ message and being convinced I need to apply another set of flags to what I am already doing or what is being done in fluxnet, especially something that is automated and may not be applicable for the sites I may need in my synthesis.
See major point 1).
The scoring method seems on the arbitrary side and reminds me of the scoring system for the ‘best’ world universities. Each scoring system yields a different ranking and group. I suspect this would apply to the application of this method, too.
Since we aim at flagging inconsistencies in flux tower data based on a clear rationale and a set of conceptually and empirically justified criteria we think the analogy to college ratings does not apply here. See also major point 3).
I want to know how often this automated method suffers from type 2 errors, calling an error when there really isn’t one.
We fully agree with the reviewer that this would be relevant to know. To facilitate such an analysis we would need either labels for the real data or synthetic data; we would love to have them but do not. The error rate is clearly a function of the chosen niqr threshold (see discussion in section 4.1.1): when increasing niqr (allowing for looser consistency), such errors will decrease. We encourage future efforts to establish such a benchmark data set so that type 1 and type 2 errors can be objectively evaluated and the method ultimately improved (conclusion section, lines 772-774).
This concern also revolves around my complaints about flagging real respiration rain pulses. These pulses are real and sustained and should not be flagged (except for the period when the sensors are wet).
See major point 2).
At this point I really feel it is up to the editor whether or not they are interested in publishing such a paper. My suspicion is that it may not be cited much, but again I may be wrong, as I look at the data from a different perspective, being a data generator and knowing what to believe and accept as reasonable.
See major point 1). We hope we could convince the reviewer and the editor of the value of our contribution for the scientific community. Classic papers on eddy covariance data processing, such as Papale et al. 2006, Reichstein et al. 2005, Desai et al. 2008, Lasslop et al. 2010, Wutzler et al. 2018, and Pastorello et al. 2020, have each accumulated hundreds to thousands of citations so far, implying that advances in this domain are likely to have impact.
Citation: https://doi.org/10.5194/bg-2023-110-AC1