2nd review of Gregor et al: Interannual drivers of the seasonal cycle of CO2 in the Southern Ocean
Response to previous concerns:
Previously, I raised three main concerns. The first related to the very confusing presentation of variability and trends, the second to the missing evaluation of uncertainties (from various sources), and the third to the choice of time periods compared in the text.
In their revised manuscript, the authors have indeed taken the concerns raised by this reviewer into consideration. However, as I will outline below, they still fail to communicate the uncertainty of their analysis clearly and thereby still present their results overconfidently.
First, though, the authors have done a good job of clarifying when they consider trends and when variability, and which time period is considered. This makes the manuscript much easier to read. Exceptions still exist, however, e.g. the abstract line 21 (where the authors talk about “interannual trends”) and page 21 line 70, where the authors state “… summer drivers may explain the inter-annual variability in the decadal trends”. These statements need clarification (see my previous review).
Coming back to the uncertainty analysis: the authors now provide an error assessment, which is a great leap forward. However, when discussing the results, the new uncertainty estimates are often mentioned but not properly taken into consideration. To give a concrete example: uncertainties are still included only in a hand-wavy manner in figure 4, which makes me doubt whether the observed short-term variability is real or just statistical noise, given that these fluctuations are often on the order of 2-3 µatm (visual assessment based on figure 4). Figure 2 likewise suggests errors larger than the displayed differences between methods. Another example is figures 5 and 6. Here the authors do add the uncertainty, but fail to properly discuss the limitation that the majority of the SO variability is insignificant, apart from a few regions. Instead, the authors assume that the significant regional drivers are representative of the entire region. Furthermore, the authors mention uncertainty only in the first sentence of each section, and it is then not clear whether it is properly taken into consideration when discussing the drivers (see specific comments below).
Regarding my third point, the authors still do not make a strong enough case for the periods they consider. They refer in the text to figure 4, but visually it is not obvious why these periods have been chosen. I am aware that this is a new argument (previously I only criticized the length of the periods). This can, however, easily be solved by adding a sentence or two explaining why these periods were chosen (be it based on some metric or a subjective choice).
In summary, I think the manuscript has improved, but overall, issues remain regarding the uncertainty of the analysis. This really pains me because I do think that the paper is important and I do very much like the approach based on looking at anomalous periods (rather than linear trends). Also, an assessment based on 2 novel methods is a welcome addition to assessment papers such as the recently published Ritter et al. (GRL) Southern Ocean SOCOM trend comparison. I don’t think that (many of) the conclusions drawn would fall based on the error assessment, but in a data-sparse region like the Southern Ocean, where all methods rely on heavy data extrapolation, the uncertainty must be at the forefront of any variability, trend or process study.
Based on the revision, I cannot recommend the manuscript for publication. Instead, I would like to see a revised manuscript in which the authors discuss their results fairly in light of the uncertainty they are facing. In addition, I suggest the authors check the remaining editorial issues. I am convinced that after this step (plus some minor comments below) the paper can become acceptable for publication in BG.
In general: Many editorial issues need to be fixed. E.g. in many instances commas are missing, the authors switch between present tense and past tense (e.g. in the abstract) and figure 7 is labelled figure 9. I will not list them all here, but rather suggest professional text editing.
Abstract line 9-10: (a) very minor but SOCAT includes fCO2 not pCO2 and (b) “… ship measurements of pCO2 (SOCAT) …” really is a clumsy way to introduce SOCAT. Firstly, the abbreviation SOCAT needs to be defined (Surface Ocean CO2 Atlas) and secondly, what about LDEO? This database equally includes pCO2 ship measurements.
Abstract line 13: “… nine regions defined by basin …” – at this time you have not mentioned the Southern Ocean so the reader gets the impression you talk about the actual basins.
Abstract line 15: delta pCO2 is not defined (i.e that you mean the difference between ocean and atmosphere – it may as well be the seasonal difference).
Abstract line 21: “Interannual trends” - see my previous assessment
Abstract line 22: “… chlorophyll-a variability where the latter had high mean seasonal concentrations.” It is not clear what the authors are trying to say here.
Introduction line 32: “accurately measure” – I suggest “accurately quantify”. Measurements of any quantity have reached high accuracy. The interpretation through interpolation methods (such as this study, hence the necessity of an uncertainty estimate) suffers from lower accuracy.
Introduction lines 34 and 35: “Empirical models provide an interim solution to this challenge until prognostic ocean biogeochemical models are able to represent the Southern Ocean CO2 seasonal cycle accurately” – it is not clear from the context why the seasonality is suddenly important here.
Introduction line 37: “source in the 1990’s” – I am not aware of any study that suggests the Southern Ocean was a source in the 1990’s. Studies of Le Quéré and Lovenduski only suggest a saturation of the sink. Do you refer to a specific region or a specific season or both?
Introduction line 44: Not all proxies in the literature are satellite proxies.
Introduction line 47: The Landschutzer et al 2015 paper focuses on 2002-2011 and not 2000-2010
Introduction line 56-57: Additionally, the Xue et al 2015 paper suggests the same trends based on observations south of Tasmania and should be cited.
Page 3 line 92: It is “Self-Organizing Map” – i.e. singular not plural.
Page 4 line 2: “v2.2” This is not the SOM-FFN version the authors refer to, but the version of the database where the data are stored.
Results: See also main comments above.
Page 4 line 18: “comparing the different products is beyond the scope of this study” is a clumsy formulation. The authors do compare products here, namely pCO2 products. The phrase should rather say that comparing proxies is beyond the scope of this study.
Page 6 line 73: The first sentence is not necessary – of course you discuss the results in the results section
Figures 2 and 4: Add uncertainty alongside the lines, not as numbers. It is difficult to compare lines with numbers. At the moment, it looks like the authors are trying to highlight a difference between methods in Figure 2 that is not statistically significant (given the Ew and Eb numbers), as well as amplitude anomalies (green and blue) in Figure 4 that are likewise not significant based on the Eb. This is very confusing. So, my question to the authors is: Can you actually say, with absolute certainty, that (a) any of the 3 methods is at any given point in time statistically significantly different from any of the other methods, and (b) that anomalies are, with absolute certainty, the result of environmental conditions and not simply the result of internal variability? Based on the evidence presented, I doubt you can.
Page 8 line 10: The authors missed my point in the first review round. I noted that I have not seen any evidence that the CLUSTERING step is causing the difference. I am well aware that there is a difference and I do trust the authors with their assessment that the difference comes from the SOM method, but in neither of the papers have I seen any evidence that it is in fact the CLUSTERING step that is responsible for the mismatch. Many people are using the Landschutzer product, hence such an assessment of the cluster-based mismatch would be very valuable to the community. So, in summary: it is not enough to point at a difference plot and jump to mechanistic conclusions. The authors should rather add a more in-depth analysis, also comparing the products to actual observations, if they want to make such a conclusive statement.
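To illustrate the kind of check I have in mind for question (a), here is a minimal sketch. It assumes that each method's time series comes with a per-time-step uncertainty, that the errors of the two methods are independent, and that a factor k = 2 approximates a 95% threshold. The function name, the threshold and the example values are my own illustrative choices and are not taken from the manuscript.

```python
import numpy as np

def methods_differ(series_a, series_b, err_a, err_b, k=2.0):
    """Flag time steps where two estimates differ by more than k combined sigmas.

    Assumption (not from the manuscript): the errors of the two methods are
    independent, so they are combined in quadrature.
    """
    series_a = np.asarray(series_a, dtype=float)
    series_b = np.asarray(series_b, dtype=float)
    err_a = np.asarray(err_a, dtype=float)
    err_b = np.asarray(err_b, dtype=float)
    combined = np.sqrt(err_a**2 + err_b**2)
    return np.abs(series_a - series_b) > k * combined

# Hypothetical pCO2 values (µatm) for two methods, each with 2 µatm uncertainty
method_a = [350.0, 352.0, 355.0]
method_b = [351.0, 358.0, 356.0]
flags = methods_differ(method_a, method_b, err_a=2.0, err_b=2.0)
```

In this made-up example only the middle time step, where the methods differ by 6 µatm, exceeds the combined threshold of about 5.7 µatm. The same uncertainties could be drawn as shaded bands around the lines (e.g. with matplotlib's `fill_between`) so that readers can make this comparison by eye.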
Page 9 line 21: Is the MIZ now in- or excluded? Later on, it is mentioned again. And if it is excluded, then why mention it at all?
Page 9 line 34: Figure 3 a-d
Page 10 line 54: Now the MIZ is discussed again – very confusing
Page 10 line 65: Only 6 are shown? Why? Is the MIZ in or out? It seems that it is ignored in the figures but added to the text. This is misleading the reader.
Page 11 line 88: “however, our confidence in the changing trend is low due to lack of coherence between methods (Figure 2a,b) and only three years of data, with little data in 2014.” – This statement is a bit of a surprise. Here the authors highlight that their trends are uncertain, but in what follows they discuss the short-term IAV as if only little uncertainty were present. In contrast, Rodenbeck et al 2015, Landschutzer et al 2015 and Ritter et al 2017 show that trends are more robust among methods than IAV. Please explain or expand.
Page 13 line 35: “The mean of the method anomalies for each transition is then taken. These anomalies are considered significant if the absolute estimate of the anomaly is larger than the standard deviation between the methods for each period” – all fine, but I am puzzled why one uncertainty estimate is in the methods section and the other is in the appendix.
Page 13 line 47: The authors mention here that the uncertainty estimate masks out large regions. They also rightly point out that there are other regions that are not masked and that those are considered. I do agree with the authors’ driver assessment in what follows, but now my question: based on the assessment of the fewer, significant regions, to what extent can one assume that the identified drivers also drive the variability of the larger, insignificant part of the SO? I don’t think one can with absolute certainty.
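For clarity, this is how I read the quoted criterion, written as a minimal sketch. It assumes the method anomalies are stacked along the first axis of an array and that the sample standard deviation (ddof=1) is meant; the manuscript does not specify this, and the function name and example numbers are hypothetical.

```python
import numpy as np

def significant_anomaly_mask(anomalies):
    """Return the multi-method mean anomaly and a significance mask.

    anomalies: array of shape (n_methods, ...), one anomaly field per method.
    An anomaly counts as significant where |mean| exceeds the between-method
    standard deviation. Assumption: sample standard deviation (ddof=1).
    """
    anomalies = np.asarray(anomalies, dtype=float)
    mean_anomaly = anomalies.mean(axis=0)
    spread = anomalies.std(axis=0, ddof=1)
    return mean_anomaly, np.abs(mean_anomaly) > spread

# Hypothetical anomalies (µatm): 3 methods, 4 grid cells
example = np.array([
    [2.0, 0.5, -3.0, 0.1],
    [2.5, -0.5, -2.0, 0.3],
    [1.5, 0.3, -4.0, -0.4],
])
mean_anomaly, mask = significant_anomaly_mask(example)
```

With only three methods the between-method standard deviation is itself poorly constrained, which is one more reason to state explicitly in one place how both uncertainty estimates are computed.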
Page 15 line 99: “However, seasonal – regional analysis shows that the observed relationship between pCO2 and SST is counterintuitive (Figure 5a-c,g-i). On this basis we propose that SST is not a driver of pCO2 in winter.” – Hold on here: Firstly, this is not a new proposal but has been e.g. shown by Takahashi et al 2002. Secondly, despite temperature not being the driver, the solubility relation still exists, it is simply not dominating the variability (see e.g. Figure 3 of Landschutzer 2015). Thirdly, 2-3 lines earlier the authors mention that they propose changes in wind stress as an alternative hypothesis to Landschutzer 2015, but this is exactly the point of the Landschutzer paper, that changes in the wind pattern and thereby changes in wind stress and upwelling caused the reinvigoration of the SO sink (see again e.g. Figure 3 in Landschutzer et al). The main difference is that these authors have not done the analysis for winter separately.
Page 16 lines 28 onward: The authors talk about correlations, but based on the visual comparison it is not easy to verify this assessment. It would help to add an actual correlation plot, or adjust the colour scheme.
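As a sketch of what could feed such a correlation plot: a Pearson correlation computed along the time axis for each grid cell, which can then be mapped directly. The array layout (time first, grid cells second), the function name and the example values are assumptions of mine, not the authors' procedure.

```python
import numpy as np

def pointwise_correlation(x, y):
    """Pearson correlation along the time axis for each grid cell.

    x, y: arrays of shape (n_time, n_cells), e.g. pCO2 anomalies and
    anomalies of a candidate driver. A cell with a constant series gives
    NaN (division by zero) and should be masked before plotting.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xa = x - x.mean(axis=0)
    ya = y - y.mean(axis=0)
    denom = np.sqrt((xa**2).sum(axis=0) * (ya**2).sum(axis=0))
    return (xa * ya).sum(axis=0) / denom

# Hypothetical example: 3 time steps, 2 grid cells
pco2 = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
driver = np.array([[2.0, 3.0], [4.0, 2.0], [6.0, 1.0]])
r = pointwise_correlation(pco2, driver)
```

In this toy case the first cell is perfectly correlated and the second perfectly anti-correlated. A map of r, ideally restricted to the regions that pass the significance test, would let the reader verify the stated correlations directly.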
Page 16 line 35: “Looking more specifically at the significant variability…” – Now I am completely lost. Do the authors now, as they state in the beginning, only consider significant regions or not? This statement suggests that they did not but start doing so now.