Reconstructing ocean carbon storage with CMIP6 Earth system models and synthetic Argo observations

Turner, Katherine E.; Smith, Doug M.; Katavouta, Anna; Williams, Richard G.

doi:https://doi.org/10.5194/bg-20-1671-2023

Articles | Volume 20, issue 8

https://doi.org/10.5194/bg-20-1671-2023

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

https://doi.org/10.5194/bg-20-1671-2023

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Articles | Volume 20, issue 8

Research article

| Highlight paper

|

28 Apr 2023

Research article | Highlight paper |

| 28 Apr 2023

Reconstructing ocean carbon storage with CMIP6 Earth system models and synthetic Argo observations

Katherine E. Turner, Doug M. Smith, Anna Katavouta, and Richard G. Williams

Download

Final revised paper (published on 28 Apr 2023)
Supplement to the final revised paper
Preprint (discussion started on 18 Aug 2022)

Interactive discussion

Status: closed

RC1:
'Comment on bg-2022-166', Anonymous Referee #1, 08 Sep 2022
The manuscript “Reconstructing ocean carbon storage with CMIP6 models and synthetic Argo observations” by Turner et al. is well written and easy to follow. The figures are well chosen, good to read, and facilitate the understanding of the study. The methods are sound and well presented, although some precisions at a few points would be needed, I think.

However, I am somewhat concerned about the novelty and significance of this work for this journal, although this might well be due to a misunderstanding of the methods. To be sure that no misunderstandings have occurred, I am first summarizing the paper very(!) briefly in my own words. If the authors detect significant flaws, please consider them when reading my following comments.

As far as I understand, the here presented method builds on the assumption that the DIC (averaged over the first 100 m in the ocean) at a point x,y can be expressed as the average DIC from 1955 to 2014 at this point x,y plus a delta DIC. This delta DIC can, for any moment in time, be approximated by a linear combination of delta pCO2, delta T, and delta S at locations more or less close to point delta x,y.

My comments are:

Long-term trend

To me, it seems somewhat trivial that the long-term trajectory of DIC in the first 100 m, a depth that is more or less representative of the average mixed layer depth, is mainly determined by the atmospheric pCO2. The provided BATS example is an excellent example (blue dashed line in Figure 7b). By knowing also temperature, which affects solubility, and salinity, which is closely related to alkalinity, at only one point close to the point of interest, slight deviations of this long-term trend can be represented, as well as inter-annual and decadal variability that is not due to changes in primary production or remineralization can be represented. This is nicely represented by the HOT example (blue dashed line in Figure 7c). As the authors describe in their paper, the underlying mechanisms are well-understood: air-sea CO2 flux due to increasing atmospheric pCO2, decreased solubility with increasing temperatures, and more DIC with increasing alkalinity. Therefore, I think that the predictability of DIC in the mixed layer from pCO2, T, and S as well as a background DIC, is not surprising.

Variability (detrended signal)

When the detrended response is looked at, pCO2, which is responsible for the largest part of the trend, should not play any role anymore as annual globally averaged pCO2 has little variability. At this point, the RMSE improvement seems to drop (Figures 5d-f and 6d-f). It remains mainly high in the relatively slow and calm subtropical gyres and becomes smaller in the more dynamical regions like the tropical Pacific, eastern upwelling regions, or the Southern Ocean. In these regions, variability in close-to-surface DIC is likely influenced by a variability in the circulation and hence the upwelling of nutrients and carbon. The negative correlation between DIC and T in the tropical eastern Pacific (2d) due to El Nino is a good example. It would be nice to discuss and analyze a little bit more, why these regions are less predictable. As these regions are usually also the regions with the highest variability in surface ocean DIC, I was wondering how the statement that the RMSE of the detrended signal was reduced by 60% was calculated. Did you calculate the area-averaged eta from Figures 5e and 6e? Or did you compare the detrended true signal of globally averaged DIC to the predicted signal of globally averaged DIC? I am not sure which way is better as both have potential pitfalls. Taking the area-weighted average of eta gives potentially to strong emphasize to regions with little variability, as the subtropical gyres, that contribute only little to the global ocean DIC signal. However, by taking the area-averaged DIC first, errors in different directions may globally compensate. Maybe it would be best to use the local eta at each point x,y and weight them by the true DIC variability at that point x,y?

Significance of the first 100 m for the long-term ocean carbon sink

Another point that I am not sure about is the importance of this analysis for the global ocean carbon sink. How much of the additional DIC is in the first 100 m? The Global Carbon Budget 2021 (https://essd.copernicus.org/articles/14/1917/2022/essd-14-1917-2022.html) estimates that from 1960 to 2020, 115 Pg C has entered the global ocean. Recent estimates from observationally constrained ESM output suggests that it might be a bit more. Thus from 1960 to 2014, the ocean has taken up very approximately ~100 Pg C. The upper ocean carbon inventory shoes an uptake of ~14 Pg C (tried to read that number from Figure 7a). Thus, the upper ocean is ‘only’ responsible for 14% of the global ocean carbon sink. Wouldn’t it be more important to quantify how much is transferred from the surface ocean to the interior ocean, i.e., subducted below the mixed layer (https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/gbc.20092, https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2015GL065073). Recent studies indicate that salinity and temperature might indeed be able to reproduce the interior ocean Cant (https://www.science.org/doi/10.1126/sciadv.abd5964, https://www.nature.com/articles/s41586-020-2360-3, https://bg.copernicus.org/articles/18/2221/2021/bg-18-2221-2021.html#&gid=1&pid=1, and https://bg.copernicus.org/preprints/bg-2022-134/). However, the interior ocean transport might make this much harder. The authors mention that this will happen in the next step, but I think it could/should also be incorporated in here if(!) the long-term trend remains a focus of this study.

Significance of the first 100 m for the variability of the ocean carbon sink

Figure 7a suggests, however, that the fist 100 m might be very interesting for the inter-annual or decadal variability. In Figure 7a, you show the variability of the upper ocean 100 m DIC. How much of this is caused by the air-sea CO2 flux and how much by transport to the interior ocean. Could the variability in the air-sea CO2 flux maybe largely be derived from variability in the ocean T and S? Or in the variability of subduction on these time-scales? Can your method provide an estimate of the subduction of DIC on an annual basis using the surface ocean air-sea CO2 flux estimates in combination with your estimates? The example of HOT in NorESM is striking as it suggests that the upper-ocean has taken up almost no carbon between 1990 and 2014 despite an increase in atmospheric pCO2.

Proof-of-concept without application to real data

After having read through the manuscript, I was really disappointed that this new method is not yet applied to observations right now. There is much potential, and I understand the idea of making two publications, one for ‘proof-of-concept’ and one for the application, but now it seems as if something is missing in this paper. If no institutional, or PhD-related restrictions exists, I would recommend to also present the application here.

My overall recommendation would therefore be to de-emphasize the long-term trend, focus on the variability, apply the method to observations, compare it to estimates of the air-sea CO2 flux, and try to address the subject of the decadal and inter-annual variability of the ocean carbon sink.

As you already have the models, you might even be able to draw conclusions why and where models do not capture the variability. Are the regions with the largest variability in the air-sea CO2 flux also the regions with the largest variability in the upper ocean DIC?

If the focus is more on the detrended signal, the subsurface ocean would not fit anymore and could be left for another study.

While this sounds somewhat negative, I am in strong favor of publishing this manuscript in Biogeoscience but with an application to observations. If not applied, maybe GMD is more suited for 'proof-of-concept' studies? But that is an editorial decision.

Minor comments:

Line 22, page 1: While the Global Carbon Budget summarizes the numbers, I prefer to give credit to the original studies (for example: https://bg.copernicus.org/articles/10/2169/2013/, https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2013GB004739, and recently https://bg.copernicus.org/preprints/bg-2022-134/)

Line 25, page 2: Gruber et al. is about compound extreme events and rather summarizes literature. I would recommend to replace it by this earlier study on global ocean acidity extremes (https://bg.copernicus.org/articles/17/4633/2020/) and this regional study (https://www.nature.com/articles/s43247-021-00254-z) that provides one example in detail.

Line 57: I think this is the first time that you use the term ‘upper ocean’. Maybe define it here.

Lines 60-66 and lines 99-100: Would suggest to add the Southern Ocean studies (https://www.science.org/doi/10.1126/sciadv.abd5964, https://www.nature.com/articles/s41467-022-27979-5), and one study from the Arctic Ocean (https://www.nature.com/articles/s41586-020-2360-3) that show how surface observations (and variations) of S and T can alter the CO2 uptake. Or maybe keep that for later in the Discussion when you describe the potential to go below the surface.

Line 112: Does the method account for inter-dependencies between the predictors? Sorry about the question, but I am not very familiar with this kind of method.

Line 132-133: Just wanted to say that I really like that you provided the regridding method here in such detail. Often it is not possible to reproduce data because the regridding software is not named.

Line 141: Something is wrong in the sentence, I think.

Lines 147-154: Just to be sure that I understand it right. When you use the ARGO 2002 coverage, you assume that you have data at each of these points for all years from 1955-2014? Is that right? If yes, how do changes in the temporal coverage effect time series as shown in Figure 7? If you have little data for the first 30 years and then more, you should get a better representation of the variability over time, right? Figure 7c seems to suggest that there is only 1 observation for the blue line, but that means one observation per year in the vicinity of the HOT station, right?

Section 3: I found this section a little bit difficult to follow. What could help (maybe it doesn’t), is to look only at the long-term signal and only at the detrended for the correlations. Than the long-term should give you a correlation between pCO2, T, and DIC. T and DIC because the ocean is warming and taking up carbon (but one does not cause the other) and the other between pCO2 in the atmosphere that causes DIC in the ocean to increase. For the detrended signal, I would expect no correlation between pCO2 and DIC as the pCO2 trend is gone. T and DIC should be negatively correlated (like shown in Figure 3b). And salinity should be positively correlated. A little bit along the lines of Figure 2 in https://www.nature.com/articles/s41467-022-32120-7

General: Did you think about including the area of sea ice in each cell as well? As this blocks the air-sea gas exchange to some extent, it might be a powerful predictor in the high-latitude oceans. Just a guess.

Section 6: While I completely understand the reason to only choose Nor-ESM to make these tests, I would be curious how other models perform.
Citation: https://doi.org/10.5194/bg-2022-166-RC1
- AC1: 'Reply on RC1', Katherine Turner, 28 Oct 2022
  
  We would like to thank the reviewer for their thoughtful and detailed comments. We have attached our full response to both reviewers as a supplementary PDF.
  
  Citation: https://doi.org/10.5194/bg-2022-166-AC1
RC2:
'Comment on bg-2022-166', Anonymous Referee #2, 24 Sep 2022

In this study Turner et al. reconstruct the ocean carbon storage from first-order relationships between temperature, salinity and atmospheric CO2 and Dissolved Inorganic Carbon (DIC). The authors use a set of CMIP6 models to assess these relationships and estimate the covariance fields. The inferred statistical relations are then used to reconstruct the carbon storage from hydrographic pseudo-observations. In order to test the capabilities of this approach, two sampling methods are proposed: 1) a complete coverage using CMIP6 co-located observations and 2) Irregular sampling consistent with Argo profiles. While co-located observation (1) is taken as a sensitivity test, the irregular Argo-style observations (2) show the potential of this method to use real Argo measurements to reconstruct the carbon storage. Both of the sampling methods offer a significant improvement compared to the reconstruction based solely on the climatological mean. The study is well-written, presented in a structured logical manner and the results and their implications are easy to follow.

This study represents a significant advancement within the field, offering a powerful method to understand both the spatial and temporal variability of DIC. This method could not only be used to reconstruct the carbon storage from real hydrographic measurements but also, the resulting DIC fields could be used to identify differences between linear and non-linear mapping methods as well as explore the differences in the processes that affect the DIC between different models. Based on the aforementioned, I recommend the publication of this study. Here are just some suggestions I believe would add value to the publication:

General Comments:

- There is an emphasis throughout the manuscript on the application of this method to reconstruct ocean carbon storage from real-world Argo observations. I was expecting such reconstruction at the end of the paper and a comparison to existing reconstructions (such as GLODAP). I understand that this paper is intended as a "proof of concept" study, followed by another one regarding its "real-world applications". If this is decided as the final form of the study, without including the reconstruction using real Argo measurements, I think the potential application to Argo measurements should be de-emphasized throughout the text.

- The results in this study are focused on the top 100m, however, I couldn't find in the text the motivation to choose such a horizon. Is it because the mixed layer drives the variability in carbon sink? Or, do the statistical relationships between temperature and salinity and DIC break down in the interior ocean?

- Since it is possible to separate the detrended covariance fields into pCO2 and non-pCO2 terms, if I understood it correctly, it would also be possible to reconstruct the preindustrial DIC and thus calculate the Anthropogenic Carbon fields as the difference between Total and the Preindustrial DIC. If this is possible, it should be mentioned in the paper as possible applications of the method, as there is a significant interest in the community regarding the drivers of the variability of anthropogenic carbon sink (e.g. Gruber et al., 2018: 10.1146/annurev-marine-121916-063407; Gruber et al. 2019: 10.1126/science.aau5153; DeVries 2017: 10.1038/nature21068).

- When reconstructing the carbon time-series from Argo-style observations in Section 6, two observation locations are used, one at year 2002 (with very sparse observations) and another at year 2015 (with many more observations). As far as I understand, the entire time-serie (1955-2014) is reconstructed based on the spatial distribution of Argo observation locations in 2002 and 2015, but the number of locations is constant in time. E.g. carbon storage in 1960s is reconstructed from observations that did not exist in the real world. Why not account for both the temporal and spatial distribution of the observation locations? This would give a true insight into the potential of the real-world Argo measurements to reconstruct the time series of carbon storage.

Comments by Line:

L150: What does it mean "6 months of profiles"? One profile at least in each of the 6 months? 6 profiles in total?

L198: I think there should be a full stop after pCO2 instead of a coma to make the sentence less confusing.

L199: Should DIC' and T' be in italics?

Figure 4: Would it be possible to use the same colorscale? This would make the comparison of the magnitude of the coefficients more intuitive.

L378: Do you mean Fig 6a and d, instead of a and c?

Citation: https://doi.org/10.5194/bg-2022-166-RC2
- AC2: 'Reply on RC2', Katherine Turner, 28 Oct 2022
  
  We would like to thank the reviewer for their thoughtful and detailed comments. We have attached our full response to both reviewers as a supplementary PDF.
  
  Citation: https://doi.org/10.5194/bg-2022-166-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Reconsider after major revisions (04 Nov 2022) by Peter Landschützer

AR by Katherine Turner on behalf of the Authors (15 Jan 2023) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (24 Jan 2023) by Peter Landschützer

RR by Anonymous Referee #1 (09 Feb 2023)

Suggestions for revision or reasons for rejection

I want to thank the authors for their extensive review, their detailed answers to all major and minor comments, and their patience with me when I did not understand everything. The extension of the analysis to the upper 2000 m are a great plus and after the response, I entirely agree that it is best to apply the method to the ARGO data in a separate study.

However, I have two last outstanding major question with respect to the Methods that need to be clarified before publication and that could substantially improve the results. Although I have clicked 'major revisions', I do not want you to see this as a 'traditional' major revisions. The manuscript is in great shape but I think the two major points would make it much better. The first point would have to be adressed in text if the model drift was accounted for and needs additional analysis if it was not. The second point could be adressed by changing the text, but I believe an additional analysis might substantially strengthen the study.

1) I could not find information if the model output was detrended using the pre-industrial control output. Trends are usually small to negligible close to the ocean surface due to exchanges with the atmosphere but can be substantial in the subsurface ocean. If such drifts exist and vary across the models in the ensemble, it might be almost impossible to obtain the optimal weights from these models. Such a drift may well cause the bipolar optimal coefficient pattern for T and S that is shown in Figure 7. If the drift is not accounted for, I think it has to be accounted for. The best way is probably to fit a spline to the pre-industrial control run and to remove it from the historical simulation. In this way, the inter-annual or decadal variability, which are likely similar in the piControl and historical run, is not removed with the drift.

2) It is somehow concerning that problems arise for MPI-ESM1.2-LR because of the higher spatial resolution. I believe that a mistake was made. Following Séférian et al. (2020) (https://link.springer.com/article/10.1007/s40641-020-00160-0), MPI-ESM1.2-LR has one of the coarsest resolutions. If the MPI-ESM1.2-LR resolution is indeed coarser, the argument would have to be altered. At the moment, the weak performance of the DIC reconstruction under MPI-ESM1.2-LR is discussed essential weakness of this method when being applied to observations and regions with lots of mesoscale processes. However, this might be wrong if MPI-ESM1.2-LR was not highly resolved. From my experience, MPI-ESM1.2-LR is more likely an outlier and may not affect the performance of the methods when applied to observations. I believe an adjustment of the Discussion is absolutely necessary. However, I believe the authors could significantly increase trust in the reconstruction method, and I believe the method merits it, by including GFDL-ESM4 (1/2° resolution) and GFDL-CM4 (1/4° resolution). Both model versions are highly resolved compared to other models and GFDL-ESM4 is the best performing model with respect to historical carbon uptake (Annex in Terhaar et al., 2022). It would take a large amount of work and I hence cannot ask for it but I think it would be a worth it.

In addition, I have one comment that is neither major nor minor:

1) Section 3.1 seems to be complicating the message. The co-evolving trends of atmospheric CO2 and warming lead to correlations that are not related to the effect of warming on the ocean system, i.e., changes in circulation and solubility. In fact, the ensemble coefficients in Figure 4 show this as the positive correlation from temperature is entirely accounted for by the increase in pCO2 and the pCO2 coefficient, whereas the T coefficient accounts for the solubility. I believe that Figure 4 really shows the strength of the approach and merits to be a centerpiece of the results.
I am not an author of this study but would seriously consider removing section 3.1, as it confused me more than it helped. However, I am only one person and others may disagree. And as nothing is wrong with the section, I only wanted to voice my opinion and do not want to ask for changes.

Please find below some minor comments:

1) Figure 8: It seems that the reconstruction performs best, where most carbon is stored (North Atlantic and Southern Ocean) and performs poorly where little to no carbon is stored, for example in the deep North Pacific. Thus, the maps in Figure 8 might suggest that the reconstruction performs poorer than it does. Maybe it would be better to show differences between the reconstructed and the true DIC, like Figure 9.
2) Line 3 of the abstract: I would maybe highlight here that the temperature and salinity coverage is much less sparse than the carbon observation coverage. The contrast between both coverages is what this is all about.
3) Line 20: Should probably be Terhaar et al. (2022) and not 2020.
4) Line 28: would replace ‘sufficient’ by ‘enough’. The coverage of, 1.5% is likely not sufficient (L. Gloege, G. A. McKinley, P. Landschützer, A. R. Fay, T. L. Frölicher, J. C. Fyfe, T. Ilyina, S. Jones, N. S. Lovenduski, K. B. Rodgers, S. Schlunegger, Y. Takano, Global Biogeochem. Cycles, in press, doi:https://doi.org/10.1029/2020GB006788.)
5) Lines 38 to 47: This paragraph should probably also include reference to the application of the eMLR* method by Gruber et al. (2019) (https://www.science.org/doi/10.1126/science.aau5153)
6) Lines 56-57: What exactly are these well-understood relationships? Maybe worth to elaborate a bit more.
7) Line 59: Maybe worth to reference Weiss et al. (1974). (https://www.sciencedirect.com/science/article/pii/0304420374900152)
8) Line 234-235: I believe that this sentence is not correct. Fig. 2 is rather a combination of the parallel increase in atm CO2 and temperature and the solubily effect of T on pCO2.
9) Line 336: How many regions are ‘most regions’.

Hide

ED: Publish subject to minor revisions (review by editor) (14 Feb 2023) by Peter Landschützer

AR by Katherine Turner on behalf of the Authors (09 Mar 2023) Author's response Author's tracked changes Manuscript

ED: Publish as is (17 Mar 2023) by Peter Landschützer

AR by Katherine Turner on behalf of the Authors (06 Apr 2023) Manuscript

Download

Article (3516 KB)
Full-text XML

Co-editor-in-chief

The paper by Turner and co-authors tackles the very timely question on how well we can reconstruct carbon inventories given the sparse observations. Using an Ensemble Optimal Interpolation approach and synthetic observations, the authors show that a large fraction of ocean carbon and its variability can be reconstructed using temperature and salinity measurements in the top 100 meter, however, reconstruction skill decreases when the top 2000 meters are considered. The authors propose a new way to use sparse observations to better understand historical carbon cycle changes, i.e., an important quantity in light of future changes driven by man-made emissions.