The manuscript “Using BGC-Argo floats for the assessment of marine biogeochemical models: a case study with CMEMS global forecast system” by Mignot et al. proposes 22 metrics for the assessment of biogeochemical models and applies them to a single model. As such, the analysis is a welcome and comprehensive application of ocean BGC-Argo observations, but it is done in a vacuum, without reference to previous or alternative modeling efforts. While this approach is fine from a technical report documentation perspective, it does not fit the standard of a scientific research paper. In its present form, it would seem more appropriate for “Geoscientific Model Development” than “Biogeosciences”. The null hypothesis for establishing that the model is “good” should be defined. Also, there are some really interesting statements on the value of and need for BGC-Argo observations in the conclusions that are completely unsupported by the body of the manuscript. If the authors want to bring some of this Appendix material into the manuscript body so as to support these conclusions (and leave the focus on just the current model), that would also be an appropriate means of turning the paper from a technical report on diagnostics into a scientific research paper. Finally, the paper includes a multitude of language mistakes, which I have tried to rectify in my technical comments.
1-16 – “a major tool” should be “major tools”
2-1 – “or” should be “and”. Also, is there a difference between “These metrics” and “The metrics” in the sentence before? If not, “. These metrics” should be “and”
2-3 – “suggest” seems an odd word here given that nearly all scientific papers display plots. Perhaps “suggest” should be “recommend as a community standard”
2-7 – “had” should be “has”
2-14 – No, numerical simulations are not necessary “to monitor these ongoing changes”. Instead, the authors could say, “to contextualize monitoring of ongoing changes”
2-23 – remove “being”. Also, the attribution here with “mostly” is overconfidently placed on lack of BGC understanding. In many instances, lack of understanding of the physics and lack of characterization of the forcing are bigger issues than the BGC parameterization.
2-25 – add comma before “and”.
2-30 – add “a” before “few”
3-3 – The list should reflect back to the same part of the sentence, not three different parts. If reflecting back to “to test their”, then it should be “to test their predictive skills, ability to reproduce BGC processes, and confidence intervals on model predictions” or if these are separate statements reflecting back to “their” and “to”, then “to test their predictive skills and ability to reproduce BGC processes and estimate confidence intervals on model predictions”
3-11 – “All these datasets neither have a” should be “These datasets have neither”
3-12 – remove “can”
3-23 – “so far essentially sampled” should be “well sampled only”
3-24 – Add “the” before “regional”
3-24 – remove comma before “large”
3-25 – remove comma before “like”, and replace “or” with “and”
4-4 – “represent” should be “represents”. Also, while this statement may be true in terms of quantity of data for a few parameters, it is not true in terms of either accuracy or comprehensiveness. Just because you can derive an estimate of SiO4 from an O2 sensor does not mean the dataset is better than actually measuring O2 from a Winkler titration, much less using that value to extrapolate SiO4.
4-6 – I do not know what “interoperability” means in this context. Is it something about the inherent environmental variability, or the measurement uncertainty?
4-8 – I don’t know what “separately” is being used for here. Are the authors saying that they are initializing the model with “WOA/WOD” and then evaluating performance separately with BGC-Argo? Or that the initialization of the model is done independently from WOA/WOD and then both WOA/WOD and BGC-Argo are used for independent evaluation?
4-18 – The sentence “We expect that the methodology employed here (from the data handling to the use of assessment metrics) would be useful and informative for other research teams interested in model evaluation with BGC-Argo floats.” belongs in the discussion/conclusions, not in the introduction.
4-27 – “them” should be “these metrics”
4-29 – “. These metrics” should be “and”
4-31 to 5-5 – Again, the sentences beginning “Further, our validation framework could...” to “… is not addressed in this study” belong in the discussion/conclusions, not in the introduction.
4-33 – This sentence needs a lot of work. The authors could try adding “have” before “demonstrated”, “flux” before “calculation” and “the” before “basin”, remove the commas and remove “of mass fluxes and process rates” and see if it makes sense.
5-3 – Use of the word “arduous” seems odd here. Whether something is hard to do is not necessarily relevant. More relevant is whether the effort is warranted. Would the result be too uncertain to be robust?
5-7 – “follow: s” Should be “follows. S”
5-26 – “variable” should be “variables,”
5-32 – It would be helpful to cite the WCRP standard here for essential climate variables, e.g., Bojinski et al., 2014, “The concept of essential climate variables in support of climate research, applications, and policy”, BAMS
6-1 – Unclear what is intended for “highest” here. Is it “highest quality” or “highest density” or something else?
6-11,12 – “points” should be “point”
6-18 – So this means that the low values are biased high as the chance of a low positive value includes the possibility of the value being zero. How big is this problem? What fraction of the data had to be adjusted to zero?
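To illustrate the kind of number I am asking for here, a minimal sketch follows. The synthetic values (a true concentration of 0.02 with sensor noise of 0.05) and the function name `clipping_summary` are purely hypothetical and are not taken from the manuscript; the point is only that the fraction of adjusted data and the resulting high bias are both straightforward to report.

```python
import random
import statistics


def clipping_summary(values):
    """Return (fraction of values set to zero, shift in the mean caused by clipping)."""
    if not values:
        raise ValueError("values must not be empty")
    # Setting negative values to zero is the adjustment questioned in the comment above.
    clipped = [max(v, 0.0) for v in values]
    n_adjusted = sum(1 for v in values if v < 0.0)
    fraction_adjusted = n_adjusted / len(values)
    # A positive shift means the clipped dataset is biased high relative to the raw data.
    mean_shift = statistics.fmean(clipped) - statistics.fmean(values)
    return fraction_adjusted, mean_shift


# Hypothetical low-concentration measurements: small true value plus Gaussian sensor noise.
rng = random.Random(0)
synthetic = [0.02 + rng.gauss(0.0, 0.05) for _ in range(10_000)]

fraction_adjusted, mean_shift = clipping_summary(synthetic)
print(f"fraction set to zero: {fraction_adjusted:.2%}")
print(f"high bias introduced in the mean: {mean_shift:.4f}")
```

Reporting these two quantities per variable, for the real float data, would let the reader judge whether the adjustment matters.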
6-23 – “floats” should be “float”
6-24 – add comma after “salinity”
6-25 – there should be a statement here on the carbon system data source that is used for the training of the algorithm… eventually the skill has to be traced back to the GLODAP or other data source.
6-34:7-2 – The authors should note that whether or not it is “reasonable” to draw these conclusions is also entirely reliant on both the BGC Argo data and the model capturing the underlying environmental variability.
7-8 – remove comma
7-10 – remove “it” after “and”
7-19 – what is the advantage, if there is one, of saving only weekly and then recreating the daily values with interpolation? Is this to speed the model or otherwise reduce data size?
7-26 – remove “values”. Again, is there an advantage of calculating output offline? Are CO2 fluxes calculated online and saved out? Perhaps it would be better to move this to the next section where the CO2 flux calculation is discussed.
7-32 – add “space to” between “and” and “the”
8-3 – The bias in MLD is provided, but what is the average MLD that would allow me to know the % bias?
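For concreteness, the quantity I am after is simply the bias normalized by the mean observed MLD. A minimal sketch is given below; the function name `percent_bias` and the example depths are hypothetical and do not come from the manuscript.

```python
def percent_bias(model, observed):
    """Return (mean bias, bias as a percentage of the mean observed value)."""
    if len(model) != len(observed) or not observed:
        raise ValueError("model and observed must be non-empty and of equal length")
    n = len(observed)
    mean_observed = sum(observed) / n
    # Mean of the matched model-minus-observation differences.
    bias = sum(m - o for m, o in zip(model, observed)) / n
    return bias, 100.0 * bias / mean_observed


# Hypothetical matched mixed-layer depths in metres (model vs. float).
bias, pct = percent_bias([60.0, 110.0], [50.0, 100.0])
print(f"bias = {bias:.1f} m, i.e. {pct:.1f}% of the mean observed MLD")
```

Stating the mean observed MLD alongside the bias would be enough for the reader to do this arithmetic.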
8-28 – This is a strange phrasing. It sounds from this that acidification does not impact the subsurface down to 200 m, only the “surface” and the 200-400 m range. Why not just say that acidification is expected to have its largest impact in the upper 400 m and then, separately, that the present analysis chooses the 200-400 m range of Kwiatkowski? Presumably the surface and 200-400 m ranges are shown to highlight different signals rather than to suggest the area in between is unimportant. This should be clarified.
9-12 – I would replace “first level” with “most simple but indirect level”, 9-13 – replace “of” with “associated with” and 9-17 – replace “second level” with “more process level” since the “second level” isn’t being pursued.
9-22:9-26 – A brief statement and reference on the motivation for providing these mesopelagic estimates is warranted. Also, is there a reference or other rationale for this choice of varying depth range? This MLD-1000 m variable depth definition would seem to include the part of the euphotic zone below the mixed layer as “mesopelagic”, at least during the growing season. I would have thought the area below the mixed layer within the euphotic zone to look more like the surface than the mesopelagic, or “twilight zone”. A constant 200-1000 m range would have been easier to interpret, particularly against the 200-400 m definition for pH.
9-33 – remove second “processes” and end sentence after “production”
10-10 – This sentence is very misleading. The vertical supply of NO3 to the surface is accompanied by remineralized DIC, which is the reverse of the biological carbon pump. This sentence should be reworded.
20-32 – Why define a biased average for O2 300? Shouldn’t the average oxygen between 250-300 be referred to as O2 275? Why not use the same 200-400 definition as pH? Or 250-350?
11-2 – Similarly, why define O2 1000 as O2 950-1000? Should this be O2 975, or alternatively, defined as 950-1050? Do the floats only go down to 1000 m? This would seem a reasonable justification if it were the case, since gradients at this depth tend to be weak, but it still wouldn’t explain the odd 250-300 definition.
12-12 – “on a climatological level” should be “as a climatology”
12-13 – what is the purpose of “etc..”? “imposes” should be “requires”
12-14 – “in a climatological way” should be “as a climatology”
12-19 – Why is “Biogeochemical-Argo Planning Group, 2016” in parentheses here? Was this means of gridding a recommendation from this group? If so, please be explicit.
12-21 – “clarity” should be “clarity in visualization” or “simplicity in visualization”
12-26 – Add “While” before “Taylor” and replace “but” with a comma in the next sentence. That would make it clearer that you are introducing a new topic rather than simply revisiting how great the first three presentation methods are.
12-33 – “for” should be “in”
13-7 – The sentence “Examples of the diagnostic plots described in section 4 in combination with the metrics defined in Section 3 are shown.” seems redundant with the orientation statement in the introduction section and should be removed.
13-16:13-25 – The null hypothesis that the reader should use to define “well represented” is not clear. Isn’t much or all of this fidelity due to the initial condition derived through the assimilation? I am not sure what to take from this. Is there an “unassimilated” version of the model with which the assimilation should be compared? Or a previous generation model? Or other unassimilative models such as CMIP6? Or is the objective just to show the broad contrast in pattern agreement between model and observations across variables? Why is pH so poorly predicted?
14-1:14-4 – This discussion of the value of Taylor diagrams is very superficial and somewhat misleading. The presentation here certainly shows what patterns and variability in different variables are relatively well reproduced, but whether this should inform future model development priorities entirely depends on the intended use of the model and associated requirements. Further, the most common scientific use of Taylor diagrams is the comparison of the same metric across models so that one can quantify the improvements.
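As a sketch of the comparative use I have in mind, the three statistics summarized by a Taylor diagram (correlation, normalized standard deviation, and normalized centered RMSD) can be computed for two model versions against the same observations and placed side by side. The function name `taylor_statistics` and all numbers below are hypothetical placeholders, not values from the manuscript.

```python
import math


def taylor_statistics(model, observed):
    """Return the statistics summarized by a Taylor diagram for matched data pairs."""
    if len(model) != len(observed) or len(observed) < 2:
        raise ValueError("need at least two matched model/observation pairs")
    n = len(observed)
    mean_m = sum(model) / n
    mean_o = sum(observed) / n
    anom_m = [m - mean_m for m in model]
    anom_o = [o - mean_o for o in observed]
    std_m = math.sqrt(sum(a * a for a in anom_m) / n)
    std_o = math.sqrt(sum(a * a for a in anom_o) / n)
    if std_m == 0.0 or std_o == 0.0:
        raise ValueError("standard deviation is zero; statistics are undefined")
    correlation = sum(a * b for a, b in zip(anom_m, anom_o)) / (n * std_m * std_o)
    # Centered RMSD: the mean bias is removed before differencing.
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(anom_m, anom_o)) / n)
    return {
        "correlation": correlation,
        "normalized_std": std_m / std_o,
        "normalized_crmsd": crmsd / std_o,
    }


# Hypothetical observations and two hypothetical model versions.
observations = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "reference run": [1.5, 1.5, 3.5, 3.5],
    "new run": [1.1, 2.2, 2.9, 4.1],
}
for name, values in candidates.items():
    stats = taylor_statistics(values, observations)
    print(name, {key: round(val, 3) for key, val in stats.items()})
```

Presenting each metric this way, against a reference simulation of the authors' choosing, would turn the diagram from a description into a quantification of improvement.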
14-25 – Without a frame of reference, it is not at all clear whether the model is good or bad. As in the case of the Taylor diagrams, it seems like the analysis is being done in a vacuum without any awareness of other modeling efforts. There is also a lack of appreciation of the satellite-derived estimate for this metric.
18-18 – The conclusions “Here, we showed that the spatial maps of model-observations comparison are also informative a posteriori, with respect to the network design, as they highlight sensitive areas where BGC-Argo observations are critical and where sustained BGC-Argo observations are required to better constrain the model. These maps correspond to the regions where the model uncertainty (see RMSD spatial maps in Figs. A22-A44) is the highest, i.e., the Equatorial belt with respect to the carbonate system variables, the Southern Ocean with respect to the nutrients and the DCM variables, and the western boundary currents and OMZs with respect to oxygen.” are very interesting scientific research conclusions but are not at all discussed in the body of the manuscript. This is totally unacceptable. The paper cannot bring in unsupported information at the conclusion stage referencing Appendix material. The authors need to show this or restate these conclusions as hypotheses for future work.