This work is distributed under the Creative Commons Attribution 4.0 License.
Assessing improvements in global ocean pCO2 machine learning reconstructions with Southern Ocean autonomous sampling
Thea Hatlen Heimdal
Galen A. McKinley
Adrienne J. Sutton
Amanda R. Fay
Lucas Gloege
Abstract. The Southern Ocean plays an important role in the exchange of carbon between the atmosphere and oceans, and is a critical region for the ocean uptake of anthropogenic CO2. However, estimates of the Southern Ocean air-sea CO2 flux are highly uncertain due to limited data coverage. Increased sampling in winter and across meridional gradients in the Southern Ocean may improve machine learning (ML) reconstructions of global surface ocean pCO2. Here, we use a Large Ensemble Testbed (LET) of Earth System Models and the pCO2-Residual reconstruction method to assess improvements in pCO2 reconstruction fidelity that could be achieved with additional autonomous sampling in the Southern Ocean added to existing Surface Ocean CO2 Atlas (SOCAT) observations. The LET allows us to robustly evaluate the skill of pCO2 reconstructions in space and time through comparison to ‘model truth’. With only SOCAT sampling, Southern Ocean and global pCO2 are overestimated, and thus the ocean carbon sink is underestimated. Incorporating Uncrewed Surface Vehicle (USV) sampling increases the spatial and seasonal coverage of observations within the Southern Ocean, leading to a decrease in the overestimation of pCO2. A modest number of additional observations in southern hemisphere winter and across meridional gradients in the Southern Ocean improves reconstruction bias and root-mean squared error (RMSE) by as much as 65 % and 19 %, respectively, as compared to using SOCAT sampling alone. Lastly, the large decadal variability of air-sea CO2 fluxes shown by SOCAT-only sampling may be partially attributable to undersampling of the Southern Ocean.
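For readers unfamiliar with the testbed evaluation, the following is a minimal sketch (not the authors' code) of the skill metrics the abstract reports: bias and RMSE of a reconstructed pCO2 field against the testbed's 'model truth'. The array shapes and synthetic example are illustrative assumptions.

```python
import numpy as np

def reconstruction_skill(pco2_recon, pco2_truth):
    """Bias and RMSE of a reconstructed pCO2 field against model truth.

    Both inputs are arrays of matching shape (e.g. time x lat x lon),
    with NaN marking land or unsampled cells.
    """
    diff = pco2_recon - pco2_truth
    bias = np.nanmean(diff)                 # positive bias: pCO2 overestimated,
                                            # hence carbon sink underestimated
    rmse = np.sqrt(np.nanmean(diff ** 2))
    return bias, rmse

# Synthetic fields standing in for one ensemble member (illustrative only)
rng = np.random.default_rng(0)
truth = 360.0 + 10.0 * rng.standard_normal((12, 180, 360))
recon = truth + 2.0 + 5.0 * rng.standard_normal(truth.shape)  # biased high
print(reconstruction_skill(recon, truth))
```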
Status: final response (author comments only)
RC1: 'Comment on bg-2023-160', Anonymous Referee #1, 09 Nov 2023
General comments:
Heimdal et al. assess how much our estimate of the annual ocean CO2 sink would improve if we had regular autonomous sampling of the partial pressure of CO2 in surface seawater (pCO2) from Saildrones in the Southern Ocean. To do this, observing system simulation experiments were carried out with a Large Ensemble Testbed (LET) of Earth System Models and the authors’ observation-based method for estimating the ocean CO2 sink. In other words, they evaluate the influence of the availability of observations on the performance of their method.
Such analysis is particularly relevant because we are increasingly relying on observation-based methodologies to estimate the annual ocean CO2 sink, and these methods therefore need to be evaluated more systematically. There is also an alarming decline in the ocean CO2 observing capacity that autonomous platforms (such as Saildrone) should help to counter.
However, (i) I have some reservations about the way the authors have used their method, (ii) I would expect a more quantitative estimate of the magnitude of the differences potentially detected between the different experiments performed, and (iii) the results and discussion of the decadal variability of the oceanic CO2 sink need to be improved (see specific comments). As a result, the paper is likely to be a significant scientific contribution with some revisions.
Specific comments:
1) To calculate the residual pCO2, the authors used the equation in line 160. In this equation, the term pCO2mean came from “surface ocean pCO2 […] using all 1°x1° grid cells from the testbed” (line 162). But in the original publication of the methodology (equation 2 in Bennington et al., 2022a), the term pCO2mean comes from an initial reconstruction of pCO2. Therefore, in this submitted manuscript, by using model outputs instead of an initial reconstruction of pCO2 fields, the authors assumed that their method would be able to perfectly reconstruct the long-term average pCO2 at each grid cell. It seems to me that to obtain a more accurate evaluation of their method (i.e., more accurate observing system simulation experiments) the authors should follow the steps as they were originally published. If this is not possible, could the authors explain why and demonstrate that this assumption does not influence their results?
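To make the point of this comment concrete, here is a hedged sketch of the decomposition at issue, written in the Takahashi et al. (2002) thermal form on which the pCO2-Residual method builds; the key choice the reviewer flags is whether pCO2_mean comes from the testbed itself (as in the manuscript) or from an initial reconstruction (as in Bennington et al., 2022a). Function and variable names are illustrative, not the paper's code.

```python
import numpy as np

def pco2_residual(pco2, sst, pco2_mean, sst_mean):
    """Remove the temperature-driven component of pCO2 at each grid cell.

    pco2, sst:            time x lat x lon arrays
    pco2_mean, sst_mean:  lat x lon long-term means; taking pco2_mean from the
                          testbed itself (rather than from an initial
                          reconstruction) assumes the method recovers the
                          long-term mean perfectly, as the comment notes.
    """
    # Thermal component, Takahashi et al. (2002): 0.0423 / degC sensitivity
    pco2_thermal = pco2_mean * np.exp(0.0423 * (sst - sst_mean))
    return pco2 - pco2_thermal  # non-thermal 'residual' used as the ML target
```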
2) To calculate the net sea–air CO2 flux, the authors used (line 272): “EN4.2.2 salinity (Good et al., 2013), SST and ice fraction from NOAA Optimum Interpolation Sea Surface Temperature V2 (OISSTv2) (Reynolds et al., 2002), and surface winds and associated wind scaling factor from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 sea level pressure (Hersbach et al., 2020)”. But as mentioned in line 95: “The goal here is to assess the accuracy with which an ML algorithm can reconstruct the ‘model truth’”. Therefore, I would expect the model outputs (some of which have already been used for pCO2 reconstruction) to be used to calculate the CO2 flux, rather than observational data which may have different variabilities and/or trends to those simulated. The authors could then compare this calculated CO2 flux to the simulated CO2 flux (and not to a "model truth" CO2 flux from simulated pCO2 fields mixed with observational data). This is particularly important when the authors are discussing the ability of their method to reproduce CO2 flux variability (see my next comment).
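For concreteness, a minimal sketch of the bulk flux computation this comment refers to; the reviewer's suggestion amounts to sourcing SST, salinity, winds, and ice fraction from the model outputs instead of the observational products listed. The wind-speed-squared form and its coefficient follow Wanninkhof-type parameterizations, and the Schmidt number correction is omitted for brevity; this is a simplified illustration, not the paper's exact formulation.

```python
def co2_flux(pco2_ocean, pco2_atm, wind_speed, solubility, ice_fraction,
             a=0.251):
    """Net sea-air CO2 flux (positive = outgassing to the atmosphere).

    pco2_ocean, pco2_atm: uatm;  wind_speed: m/s;  solubility: CO2 solubility
    (e.g. mol m-3 uatm-1 from SST and salinity);  ice_fraction: 0-1.
    a: wind scaling coefficient, which depends on the wind product used,
       hence the "wind scaling factor" mentioned in the manuscript.
    """
    k = a * wind_speed ** 2                       # gas transfer velocity
    return k * solubility * (pco2_ocean - pco2_atm) * (1.0 - ice_fraction)
```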
3) The authors write (line 531): “The SOCAT baseline demonstrates a weakening of the global and Southern Ocean carbon sink in the 2000s (Figs. 10, S12), which is in agreement with various data products using real-world SOCAT data”. The weakening of the Southern Ocean carbon sink occurred in the 1990s (Le Quéré et al., 2007), while a reinvigoration of the sink was observed during the 2000s (Landschützer et al., 2015). The authors therefore need to revise their text. More importantly, this study focuses on the ability of the authors' method to reproduce "model-truth" variability and not the "real-world" variability. Consequently, I would suggest calculating certain metrics of variability (for example, the size of decadal variability or trends) from simulated CO2 fluxes (and not from recalculated "model-truth" CO2 fluxes, see my previous comment) and comparing these values with those obtained when reconstructed CO2 fluxes are used. Otherwise, the comparison assumes that all models perfectly reproduce the variability of the 'real world', which might not be the case.
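One possible implementation of the suggested diagnostic, as a sketch: compute a decadal variability metric (here, the linear trend within each complete decade) from both the simulated and the reconstructed flux series of each ensemble member, then compare the two. This illustrates the reviewer's suggestion and is not the authors' analysis.

```python
import numpy as np

def decadal_trends(years, annual_flux):
    """Linear trend of annual-mean CO2 flux within each complete decade."""
    years = np.asarray(years)
    flux = np.asarray(annual_flux)
    trends = {}
    first = int(years.min()) // 10 * 10
    for start in range(first, int(years.max()) + 1, 10):
        mask = (years >= start) & (years < start + 10)
        if mask.sum() == 10:                     # only complete decades
            trends[start] = np.polyfit(years[mask], flux[mask], 1)[0]
    return trends  # slope in flux units per year, keyed by decade start

# Compare decadal_trends(years, simulated_flux) against
# decadal_trends(years, reconstructed_flux), member by member.
```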
4) The LET has 75 members (i.e., simulations). For each experiment, the values given in the manuscript and in the figures are for the most part averages calculated over the 75 members of the ensemble. But no information is given on the dispersion (or confidence interval) around these averages. It is therefore not possible to assess whether the differences mentioned between the experiments are significant or not.
For example:
- The interpretation of Figure 5 (line 335): “The ‘one-latitude’ ‘high-sampling’ run ‘x13_10Y_J-A’ (44,250 observations) show similar bias or is outperformed by all ‘zigzag’ runs as well as the ‘one-latitude’-runs that restrict sampling to southern hemisphere winter months (i.e., ‘x5_5Y_W’ and ‘x13_10Y_W’).” How similar or superior is the performance? Is it true for all members?
- Line 346: “Run ‘Z_x10_5Y_W’, which has the lowest bias out of the ‘zigzag’ runs (Fig. 5), shows improvement even further back in time, until the beginning of the testbed period (Fig. S6).” Is it really significant?
I would therefore suggest not only reporting the averages over the 75 members, but also taking advantage of the study of the spread around these averages.
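A sketch of the spread reporting suggested here, assuming per-member skill metrics are available: report the 75-member mean together with a percentile range, so that differences between sampling experiments can be judged against the ensemble dispersion.

```python
import numpy as np

def ensemble_summary(metric_per_member):
    """Mean and 2.5-97.5 percentile range of a metric over ensemble members."""
    m = np.asarray(metric_per_member)            # shape: (75,) in the LET
    lo, hi = np.percentile(m, [2.5, 97.5])
    return {"mean": m.mean(), "2.5%": lo, "97.5%": hi}

# e.g. with bias_socat and bias_zigzag as length-75 arrays of per-member
# biases: heavily overlapping ranges would suggest the difference between
# experiments is not robust across members.
```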
Technical corrections:
5) Line 37, Please explain the acronym "fCO2". Note that the term "pCO2" is also used in the manuscript. Although understandable to researchers working on this topic, it is less clear to a wider audience. Therefore, authors should be more careful about the terms they use, especially in the Abstract and Introduction sections.
6) Line 178, Why aren't the number of decision trees and depth levels different for each reconstruction?
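If per-reconstruction tuning were desired, a sketch of what it could look like, using scikit-learn's gradient-boosted trees as a stand-in for the paper's tree-based learner (an assumption, since the manuscript's exact model setup is not quoted here):

```python
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

def tune_per_member(X_train, y_train):
    """Search tree count/depth per reconstruction instead of fixing them."""
    grid = {"max_depth": [4, 6, 8], "max_iter": [200, 500, 1000]}
    search = GridSearchCV(HistGradientBoostingRegressor(), grid, cv=3)
    search.fit(X_train, y_train)
    return search.best_estimator_
```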
7) Line 188, After reading this sentence, I wasn't sure whether the authors were always going to use the "unseen" values. Could the authors be clearer?
8) Line 203: “2) potential future meridional USV observations (‘zigzag’ track)”. Are they realistic? I found some elements of response later in the text, but it would be good to know here whether all the experiments are realistic or not.
9) Table 1: I suggest replacing table 1 with table S1. This is because the information in table 1 is repeated in table S1, and table S1 contains important values that the reader should be able to access easily.
10) Line 268, Why not use the same method across all models to calculate pCO2atm? Do all the values obtained take into account the contribution of water vapor pressure?
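For reference, a hedged sketch of the conversion behind this question: atmospheric pCO2 from the dry-air mole fraction xCO2 and sea level pressure, with the water vapor pressure term computed from the Weiss and Price (1980) formula. Units and input names are assumptions for illustration.

```python
import numpy as np

def pco2_atm(xco2_ppm, slp_atm, sst_kelvin, salinity):
    """pCO2_atm (uatm) = xCO2 * (P_sl - pH2O), with pressures in atm."""
    t = sst_kelvin
    # Saturation water vapor pressure over seawater, Weiss and Price (1980)
    ln_ph2o = (24.4543 - 67.4509 * (100.0 / t)
               - 4.8489 * np.log(t / 100.0) - 0.000544 * salinity)
    ph2o = np.exp(ln_ph2o)
    return xco2_ppm * (slp_atm - ph2o)
```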
11) Line 293: “where algorithm generally overestimates pCO2”. This is not the case for the Atlantic sector of the Southern Ocean.
12) Figure 3, colour scale: The colour scales need to be harmonised. In panel a, a white colour means a good value, whereas in panel b, it means a bad value.
13) Figure 3, line 301 to 307: All this information is already present in the text. Please write shorter figure captions. This is a general comment, not just on figure 3.
14) Line 318 and wherever necessary in the text: “…where the baseline reconstruction…” Please, use the expression "SOCAT baseline" that was introduced in the method section.
15) Line 384, Please delete the reference to "bias". This was introduced in the previous section.
16) Line 493, why not exclude the hypothetical data points that would be covered by sea ice?
17) Figure 10: The figure starts in 1985 and not 1982, why?
18) Figure S3: Because you focused on the open ocean (line 123), non-open-ocean data should be removed as they were not used for the training, is that right? Does this drastically modify the data availability, and does it explain why better results are obtained from 1990?
Citation: https://doi.org/10.5194/bg-2023-160-RC1
RC2: 'Comment on bg-2023-160', Anonymous Referee #2, 15 Nov 2023