Technical note: Enhancement of float-pH data quality control methods: A case study in the Subpolar Northwestern Atlantic region
Cathy Wimart-Rousseau
Tobias Steinhoff
Birgit Klein
Henry Bittig
Arne Körtzinger
Abstract. Since a pH sensor suitable for the demanding autonomous measurement platform of BGC-Argo floats became available, the marine CO2 system can be observed by these floats independently and continuously. This opens the possibility of detecting variability and long-term changes in interior-ocean inorganic carbon storage and of quantifying the ocean sink for atmospheric CO2. In combination with a second parameter of the marine CO2 system, pH can be a useful tool to derive the surface-ocean CO2 partial pressure (pCO2).
The large spatiotemporal variability of the marine CO2 system requires sustained observations to decipher trends and short-term events (e.g., river discharge, phytoplankton blooms), but it also places high demands on the quality control of float-based pH measurements. Because sensor offsets or drifts can occur, and the interpretation of observed changes depends on accurate data, a consistent and rigorous procedure to process and quality-control the data has been established. By applying the standardized routines of the Argo data management to pH measurements from a pH/O2 float pilot array in the subpolar North Atlantic Ocean, we investigate the uncertainties and the lack of objective criteria associated with these routines, notably the choice of the reference method for the pH correction (CANYON-B or LIPHR) as well as the reference depth for this correction. For the studied float array, significant differences of ca. 0.02 pH units are observed between the two reference methods that can be used to correct float-pH data. Through comparison against discrete pH data from water samples, an assessment of the adjusted float-pH data quality is presented. The results reveal noticeable near-surface discrepancies of > 0.01 pH units. In the context of converting surface-ocean pH measurements into pCO2 data for the purpose of deriving air-sea CO2 fluxes, we conclude that the minimum accuracy requirement of 0.01 pH units (equivalent to the minimum pCO2 accuracy of 10 µatm for potential future inclusion in the SOCAT database) is not systematically achieved in the upper ocean.
While the limited dataset and regional focus of our study provide only one showcase, the results still call for an additional independent pH reference in the surface ocean. We therefore propose a way forward to enhance the float-pH quality control procedure. In our analysis, the current philosophy of correcting pH data against climatological reference data at a single depth in the deep ocean appears insufficient to assure adequate data quality in the surface ocean. Ideally, an additional reference point should be taken at or near the surface, where the resulting pCO2 data are of the highest importance for monitoring the air-sea exchange of CO2; such a reference could very significantly augment the impact of the current observation network.
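To illustrate the pH-to-pCO2 conversion and the 0.01 pH / 10 µatm equivalence referred to above, here is a minimal sketch using the PyCO2SYS Python package; all values (total alkalinity, pH, salinity, temperature) are illustrative assumptions, not data from this study.

```python
import PyCO2SYS as pyco2

# Surface-ocean pH plus a second carbonate-system parameter (here an
# assumed total alkalinity of 2300 µmol/kg) yields pCO2. All values are
# illustrative, not data from this study.
base = pyco2.sys(par1=2300.0, par1_type=1,   # total alkalinity [µmol/kg]
                 par2=8.05, par2_type=3,     # pH (total scale)
                 salinity=34.8, temperature=8.0)
print(base["pCO2"])  # pCO2 at input conditions [µatm]

# Perturb pH by the 0.01 accuracy target: the implied pCO2 error is of
# order 10 µatm, consistent with the equivalence stated in the abstract.
pert = pyco2.sys(par1=2300.0, par1_type=1, par2=8.04, par2_type=3,
                 salinity=34.8, temperature=8.0)
print(pert["pCO2"] - base["pCO2"])
```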
Status: final response (author comments only)
- RC1: 'Comment on bg-2023-76', Brendan Carter, 28 Jun 2023
Wimart-Rousseau et al. have put together an interesting new data set and float-cruise comparison. They also review float adjustment algorithms, adjustment reference depths, and choices regarding how and when adjustments are updated. Their findings show significant offsets between their floats and discrete samples collected from a nearby cruise. They arrive at a plausible set of conclusions and suggest several useful measures. The resulting paper could be a useful contribution to the literature, but I believe it should nevertheless be returned to the authors for revision for the following reasons:
- This is too long for a technical note, which, to my reading of the journal policies, is limited to “a few pages.” This is 21 pages (before references) in the review format. Also, too much of the discussion is qualitative rather than quantitative and did not "feel" technical. This paper needs to be reworked as a full-length paper or shortened and focused.
- The discussion about adjustment update methodologies was ultimately unconvincing. I feel it could be reduced to a quick comment that it would be useful to have a community consensus regarding how this is done, which I don't feel needs much justification. Alternatively, the authors could rework and/or expand upon their rationale for their preferred method in a full paper.
- The quantitative aspects seemed, in places, potentially incorrect. Other float-to-pCO2 comparisons have not shown offsets at the surface as large as are found in this study (see discussion in https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2022AV000722, also confirmed by some unpublished community studies, and compare to the pCO2 offset implied by the pH offset observed herein of ~50 uatm). Worryingly, the correlation shown herein of the upper-ocean discrete-to-float pH offset to temperature is comparable in magnitude to the sensitivity of pH itself to temperature, so I would urge a double check on the pH temperature adjustments made in this comparison (see the sketch after this list). If they were performed correctly, then the calibration of the pH probe sensitivity to temperature seems to be nearly 100% in error (which might be the authors’ point, though it is odd then that this is not seen elsewhere). The authors’ point that the North Atlantic is a worst-case scenario in many respects is accurate, but it is not an especially problematic location from the perspective of vertical temperature gradients. The authors’ comment regarding the differences between the algorithm estimates is supported by a listed standard deviation of the algorithm differences of 0.051… for reasons given in line-by-line comments I believe this represents an order-of-magnitude error in the listed value or an indication that the standard deviation is not the appropriate statistic in this case. If my suspicions here are correct, and the true standard deviation is <0.01, as it appears from the histogram, then this is roughly consistent with the published algorithm uncertainties at depth when comparing two (only arguably independent) algorithm estimates. This would then give us no reason to doubt the earlier Williams et al. uncertainty propagation.
- Somewhat recent global pH algorithm updates (ESPER-LIR and ESPER-NN) were omitted from the analysis. Presumably this is because they are not yet included in SAGE, but including them should nevertheless be helpful for this discussion because the updated algorithms take a different approach to several of the algorithm limitations discussed in this paper and are slated to become incorporated into SAGE (to my understanding). For the algorithms that were used, important information was not provided (that I saw) regarding the use of an optional adjustment for ocean acidification. Also, it is arguably appropriate to keep the "spectrophotometric" pH adjustment (as the authors chose to do), but it should be pointed out that this limits the comparability of the two algorithms that were considered (unless such an adjustment was applied to CANYON-B estimates?). As a note here, we (Ocean Carbonate System Intercomparison Forum) are drafting a paper that is taking a step back from the recommendation that this adjustment should always be applied, given the finding that the apparent slope is not present in a subset of cruise measurements made with purified dye. The recommendation is to instead treat uncertainty in the comparability of pH and DIC + TA as a component of uncertainty in whatever calculation is being attempted... this recommendation would support the overall thesis of this paper that we need to be cautious about the uncertainty that we claim we can achieve from float-based pCO2. A related comment: this paper uses the term "correction" when in many cases I believe "adjustment" should be used. The distinction I draw is that when we believe we understand the reason for the apparent offset it is appropriate to call it a correction; I would use adjustment in all other cases.
- Some of the figures are missing information, and the writing is difficult to understand in a few places (but excellent in many other parts). Some of the notation is inconsistent (see line by line comments). There is a comparison to “weather” and “climate” quality data thresholds, which I argue below is inappropriate. The discrete pH measurement uncertainty is unrealistically low.
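To make the temperature-conversion check urged above concrete, here is a minimal sketch using PyCO2SYS with assumed values (not the study's data); the pH_out key follows PyCO2SYS's convention for results at output conditions.

```python
import PyCO2SYS as pyco2

# Hypothetical check of a pH-temperature conversion: re-express a pH
# measured at laboratory temperature (25 °C) at an assumed in situ
# temperature (5 °C), using an assumed total alkalinity as the second
# parameter. Values are illustrative, not the study's data.
res = pyco2.sys(par1=2300.0, par1_type=1,    # total alkalinity [µmol/kg]
                par2=7.90, par2_type=3,      # pH at analysis temperature
                salinity=34.8,
                temperature=25.0,            # input (lab) temperature
                temperature_out=5.0)         # output (in situ) temperature
# Expected shift is roughly -0.015 pH per °C, i.e. about +0.3 here; an
# apparent float-discrete offset of similar size to this term would
# suggest the conversion step deserves a second look.
print(res["pH_out"] - 7.90)
```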
Ultimately I was left uncertain of what to make of the results. This is perhaps a useful outcome for a paper that is arguing that we need to be less optimistic about the quality of the data generated by certain approaches, but I feel the authors should take advantage of a revision to address the issues noted above and below, should double-check for potential errors in the presentation and the analysis (particularly the pH temperature and pressure conversions), should more rigorously assess how the algorithm estimate variability compares to the expected variability in a statistical sense, and should shorten the paper and distill the main ideas (particularly if it is to be kept as a technical note). That said, I tend to agree with all of the conclusions made by the authors, and their presentation did seem to support the conclusions. The subject matter is important and the statements in the conclusions are worth making to the community, so I hope the authors resubmit this paper.
Line-by-line comments
1: “this” refers to a subject that has not yet been defined
6: “decipher punctual events” is awkward phrasing. Suggest “Measure the impacts of short-term events”
8: This is a matter of personal preference so please feel free to ignore this comment, but I feel this sentence is written backwards. It is shorter and easier to read when written as, e.g., “Quality control is needed to correct sensor offsets or drifts.” A recommendation a past advisor gave me is to establish the subject and verb of a sentence early in the sentence so the readers know what the sentence is about. The subject and verb are among the last words in this sentence, and this is a common element of many of the most difficult sentences in this paper.
9: Again, this feels backwards
12: LIRPH should be LIPHR or LIR-pH
32: This sentence is correct, but it is probably better to say “ocean acidity will increase” because “alkalinity” will not decrease by this mechanism.
49: The majority of what we know about surface ocean pCO2 arguably comes from ships of opportunity, which deserve greater mention in this discussion.
58: Maurer et al. show large pH sensor adjustments, suggesting the pre-deployment calibration is not a major factor since it is seldom used… except perhaps for characterizing the dependence on, e.g., pressure.
71: Missing ESPER-NN and ESPER-LIR (which are updates to LIR).
98: the “claimed” pH accuracy…
104: worth pointing out that this is a single point calibration that doesn’t reflect the conditions often found at the reference adjustment depth for pH.
120: the GLODAP data are not directly used in the float pH correction procedure in most instances.
135: this is not a CRM for pH, and there is no certified value
137: on what basis is it assigned this very low pH uncertainty given that the best methods are typically assigned an uncertainty of 0.01-0.007? The uncertainties in the calculations from DIC and TA are also quite large, and the uncertainties from the conversion to in situ conditions are thought to be large and poorly known. This claim might be justifiable if it is expressed in terms of reproducibility, but uncertainty has a more expansive definition than reproducibility.
152: This caveat should go earlier.
200: note
230: and in
236: measure pH
250: Figure 4 does not obviously support this statement without further explanation
256: what does the "respectively" refer to? Are these different depths? Estimate methods?
Figure 4: You should indicate whether LIR-pH estimates are using the OA adjustment option. I believe SAGE usually omits this adjustment, which might explain why LIR-pH is high relative to CANYON-B in the North Atlantic and low in the North Pacific. That said, as you correctly point out, LIR-pH uses a globally uniform OA adjustment that varies only by density, whereas CANYON-B uses an empirical local fit (that, I argue elsewhere, may erroneously project interannual variability forward and backward in time). Carter et al. 2021 attempt to resolve these issues, and this technical note would be more useful if it also included a comparison with the ESPER routine estimates. These routines are slated to become incorporated into SAGE. (Note, you’ll probably still find large differences between ESPER pH estimates and LIR-pH estimates, particularly in the North Atlantic, which appear to be attributable to the omission of depth as a predictor variable from ESPER-pH… i.e., the values from ESPER-LIR are much more similar to LIR-pH (and ESPER-NN) when depth is included as a predictor... future updates to ESPER-LIR will likely have depth as an optional predictor to minimize the discontinuity that we'll see if and when the transition from LIPHR to ESPER-LIR is implemented.)
Figure 4c: The y axis is not labeled. Also, there may be a missing 0 in the std value?: it is only possible for the std to exceed the 5th and 95th percentiles if there are a small number of extreme outliers (that should likely have been omitted). This is among the most important numbers in this manuscript, to my thinking. If it is much smaller, then I'd ask whether the std is indeed much larger than we’d expect from a comparison of two algorithm reference adjustments. (A toy numerical illustration of the outlier effect follows.)
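A minimal synthetic illustration of how a few outliers can inflate the std far beyond the 5th-95th percentile spread (invented numbers, not the manuscript's data):

```python
import numpy as np

# Bulk algorithm differences with std ~0.005, plus a handful of extreme
# outliers: the std is inflated by roughly an order of magnitude while
# the 5th/95th percentiles stay near the bulk spread.
rng = np.random.default_rng(0)
diffs = rng.normal(0.0, 0.005, 1000)      # bulk distribution of differences
diffs[:5] = [0.5, -0.6, 0.7, -0.5, 0.8]   # a few extreme outliers
print(np.std(diffs))                       # inflated, ~0.04
print(np.percentile(diffs, [5, 95]))       # still roughly within ±0.01
```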
261: from the way the
264: is the noise uniform with depth? If not, then it should be handled by averaging the adjustment across a depth range. If it is, then should it be removed by adjusting every profile independently?
266: “corrected for” should be “removed”
Figure 5: The use of 2 y axes is very confusing here and defeats the purpose of being able to compare the adjustments to one another, though the goal of increasing visibility makes sense. It would probably be better to use a single y axis, which would allow the plot to focus in on the 0.04 to 0.06 range. The greater vertical resolution should also improve visibility. If the different methods are difficult to distinguish on this unified scale, then that suggests that the methods aren’t different enough to worry about on this scale. What are the numbers in the upper right? (it’s not hard to guess, but it is better if it is spelled out)
278: profile-by-profile or cycle-by-cycle? Either is fine but be consistent. Also, doesn’t it remove too much sensor noise, or am I misunderstanding?
282: “has to” is an overstatement as this is not currently done by many data centers (this is another case where the verb is among the last words in the sentence). Also, you have not completely made the case against the cycle-by-cycle approach or the SAGE approach. The basis for the SAGE approach, I believe, was based upon the first principles assumption that the reference potential jumps with discrete events and that the wiggles around the jumps reflect variability in the algorithm estimates or short term sensor noise. This hypothesis would need to be discounted to really make the case effectively.
293: discontinuities are not a problem for QC. The statistics still work fine. They are perhaps a problem for studies examining biogeochemical variability over time for a specific profiling float, but these studies would also be challenged by excessively smoothed transitions if a reference potential jump occurred. I believe the authors have a strong case to make here, but this presentation leaves me more confused than convinced. I’d urge them to instead focus on the consequences for a common biogeochemical analysis that would be affected by discontinuities (leading to, e.g., discontinuities in DIC vs. time… though even then a clearly visible discontinuity in DIC might be preferable to a smooth-seeming but equally spurious excursion in DIC as the smoothed adjustment factor catches up to the true adjustment factor). These arguments lead me to the belief that the best approach would be a 1000-1500 m average adjustment applied cycle-by-cycle (sketched below).
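A minimal sketch of such a cycle-by-cycle averaged adjustment, assuming simple per-cycle arrays of pressure, sensor pH, and reference pH (names are illustrative):

```python
import numpy as np

# For each float cycle, average the reference-minus-sensor pH offset over
# the 1000-1500 dbar window and apply that single adjustment to the whole
# profile of that cycle.
def cycle_adjustment(pressure, ph_sensor, ph_reference,
                     pmin=1000.0, pmax=1500.0):
    """Mean adjustment over the reference depth window for one cycle."""
    mask = (pressure >= pmin) & (pressure <= pmax)
    return np.nanmean(ph_reference[mask] - ph_sensor[mask])

# Usage for one cycle:
# ph_adjusted = ph_sensor + cycle_adjustment(pressure, ph_sensor, ph_ref)
```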
289: Figure A1 is critical for your argument. It needs to be brought into the main text if this section is retained in the final manuscript. It needs to be well-explained and your rationale for preferring the smoothed adjustments over alternatives needs to be more strongly and thoroughly defended. The rationale should go quantitatively beyond “In our view, the sensor rather shows undulations in response with smooth and less smooth phases.”
293: others would argue that these are correcting biases of that magnitude
Figure A1: would be more useful if pHTotalInsitu were plotted as the difference from the algorithm estimate. Also, it is unclear if the parking depth pH value has any adjustments applied and, if so, on what basis. Finally, the indication of whether pH is “total scale pH” is inconsistent in these figures. Based on other conversations with people who have strong opinions about these things, my recommendation is to universally use pHT to indicate total scale pH.
302: Yikes! Variants on this experiment have been performed by several members of the community, and none, to my recollection, saw consistent differences this large.
302: There is a growing sense in the community that bottle pH samples are not well preserved even when following SOPs for DIC and AT storage. Is this possibly a discrete sample issue? Do you have some at-sea measurements to compare with?
302: the two numbers in this figure only make sense after looking at figure 6, and it should be clear from the text alone
6D: the y axis is unlabeled and missing many negative signs (it looks like the figure was just cut off on the left). Far more worrying, the delta pH is about the same magnitude as the change in pH expected from changes in temperature. It is unlikely that there is a 100% uncertainty in the calculated pH change with temperature, which suggests a simpler possible explanation: the pH vs. temperature correction was performed incorrectly in this comparison. Were the discrete pH values recalculated at the in situ temperature and pressure? When you say “discrete water temperature”, is this the temperature at the time of bottle triggering or the laboratory temperature at the moment of analysis?
324: SOOP-pH is not a mature effort to my understanding. Some comments on the SOOP-pH methods and QC practices are warranted. SOOP-pCO2 comparisons, to my understanding, are showing much more modest implied float offsets.
328: needs
Table 2: A standard deviation of 0.5 salinity units is quite large, is it not? Usually this represents an offset of hundreds of km or more in the surface of the North Atlantic except in areas of very strong salinity gradients near coasts. A mean of 0.4 is just as worrisome. Am I misreading this?
364: Climate and weather goals are specifically formulated based on needs for characterizing (paraphrased) “changes in carbonate ion concentrations.” This means they are related to precision and not accuracy. This analysis is concerned with accuracy to a much greater extent than precision, so this is not an apples to apples comparison. A better comparison would be how the offset varies over time for a given float or varies between two or more floats in the same location.
379: The TA uncertainty is also far more important if we become interested in DIC.
385: The situation becomes worse still when including uncertainties in the carbonate system constants (Orr et al., 2018)… however, again, these additional uncertainties should be relatively consistent over time, and “weather” and “climate” relate to precision. Many of the concerns raised herein (and the concern over carbonate constants) fall away when you begin to examine variability in the float observations over time.
Citation: https://doi.org/10.5194/bg-2023-76-RC1
- AC1: 'Reply on RC1', Cathy Wimart-Rousseau, 29 Aug 2023
- RC2: 'Comment on bg-2023-76', Anonymous Referee #2, 28 Jul 2023
This paper explores the impact of different calibration choices on the resulting data accuracy for pH data collected by autonomous sensors on Argo floats in the NW Atlantic including comparisons with independent data in the region. The presentation is detailed and will be of high interest to groups seeking to quality control and use data from pH sensors on floats and other autonomous vehicles. The recommendations for choices to make and issues that remain to be solved are helpful. I have a few suggestions that I think will improve the paper and enhance how useful it is to the broader community.
Moderate comments
Breakpoint nodes: The existence and use of breakpoint nodes in pH sensor calibration may be well known in the pH community already, but not to researchers focusing on other sensors. Some discussion of this phenomenon and its likely causes would help this paper be useful to a broader community who may be coming to pH sensor calibration from other sensors. What is the likely cause of these sudden jumps? Discussion of their likely cause would help support the authors’ recommendation that the jumps should be gradual rather than sudden. In particular, I think it would be useful to demonstrate that the sudden jumps are directly related to changing sensor response rather than to apparent discontinuities in the reference value derived from CANYON-B or LIR; a rough diagnostic is sketched below. The authors state that these swings in the record are mostly interpreted as changes occurring in the sensor, but this could be supported with the available data. Figure A1 may be partly showing this, but I was unsure if the parking depth pH data had already been corrected or was raw. A figure (perhaps supplemental) showing the raw pH values and the CANYON-B and LIR values (on different scales) at the relevant pressures would make clear which signal contains the apparent discontinuities.
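As a rough sketch of that diagnostic, assuming per-cycle series of raw parking-depth pH and the corresponding algorithm reference (hypothetical arrays and helper):

```python
import numpy as np

# If the largest step appears in the raw sensor series but not in the
# CANYON-B/LIPHR reference series at the same pressure, the discontinuity
# reflects a change in sensor response rather than in the reference.
def largest_jump(series, window=5):
    """Largest step between medians of adjacent windows along a cycle series."""
    steps = [np.nanmedian(series[i:i + window]) -
             np.nanmedian(series[i - window:i])
             for i in range(window, len(series) - window)]
    return np.nanmax(np.abs(steps))

# Compare largest_jump(ph_raw_parking) against largest_jump(ph_reference):
# a breakpoint node is justified when the former clearly dominates.
```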
Section 3.2.1: The authors make the point that the crossover between the float and discrete data is extremely close in time and space. I suggest that they compare temperature/salinity between the float and discrete profiles. This would add evidence that the same water masses were sampled by both.
First paragraph of section 3.1.1: Very interesting discussion, but I think you could be clearer about whether the main reason for the big offsets between using 1500 and 1960 dbar as the reference depth in this region is the occasional large convection depths or simply offsets in the reference pH values between these two depths, i.e., depth-based accuracy differences in CANYON-B or LIR.
Line 125: That deployment cast pH measurements are an ineffective means of calibrating float pH sensors seems very important to me. Suggest that you support this statement more strongly. The two papers cited are on oxygen optodes rather than pH sensors. How large a drift rate is typically observed at the start of deployment? When does this change? Could this be overcome with a conditioning period before deployment?
Minor comments
Internal waves: The authors take care to remove the effect of internal waves on the discrete – float data comparison. I suspect that this effect is implicitly removed in the calculation of reference pH at a specific depth by CANYON-B and LIR, but it would be useful to state that explicitly.
Line 60: suggest changing “qualification” to “quality control”.
Line 62: suggest adding citations for the suggested procedures.
Line 99: suggest rephrasing. Your paper is evaluating the accuracy range after data adjustment.
Line 106: suggest adding a few more details. Were the adjustments based on air-calibration or surface climatologies or …?
Line 200: “not” should be “note”
Lines 246-247: suggest clarifying what is meant by “keep the adjustment”. I wasn’t sure if you meant that you did apply the optional CANYON-B pH data adjustment to align with spectrophotometric data or you did not.
Line 332: I think “as” should be “than”
Line 394: remove comma after both
Figure 6d: Add y-axis labels and negative signs to all y-axis labels.
Citation: https://doi.org/10.5194/bg-2023-76-RC2
- AC2: 'Reply on RC2', Cathy Wimart-Rousseau, 29 Aug 2023