the Creative Commons Attribution 4.0 License.
Ocean models as shallow sea oxygen deficiency assessment tools: from research to practical application
Abstract. Oxygen is a key indicator of ecosystem health and part of environmental assessments used as a tool to achieve a healthy ocean. Oxygen assessments are mostly based on monitoring data that are spatially and temporally limited, although monitoring efforts have increased. This leads to an incomplete understanding of the current state and ongoing trends of the oxygen situation in the oceans. Ocean models can be used to overcome spatial and temporal limitations and provide high-resolution 3D oxygen data but are rarely used for policy-relevant assessments. In the Baltic Sea, where environmental assessments have a long history and which is known for the world’s largest permanent hypoxic areas, ocean models are not routinely used for oxygen assessments. Especially for the increasingly observed seasonal oxygen deficiency in its shallower parts, current approaches cannot adequately reflect the high spatio-temporal dynamics. To develop a suitable shallow water oxygen deficiency assessment method for the western Baltic Sea, we first evaluated the benefits of a refined model resolution. Second, we integrated model results and observations by retrospectively fitting the model data to the measured data using several correction functions as well as a correction factor. Despite their capability to reduce the model error, none of the retrospective correction functions applied led to consistent improvements. One reason is probably the heterogeneity of the measurement data used, which are not consistent in their temporal and vertical resolution. Using the Arkona Basin as an example, we show a potential future approach where only high temporal and/or vertical resolution station data are integrated with model data to provide a reliable and ecologically relevant assessment of oxygen depletion with a high degree of confidence and transparency.
By doing so we further aim to demonstrate strengths and limitations of ocean models and to assess their applicability for policy-relevant environmental assessments.
Status: closed
RC1: 'Comment on bg-2023-152', Anonymous Referee #1, 27 Nov 2023
Overall Statements:
The manuscript “Ocean models as shallow sea oxygen deficiency assessment tools: from research to practical application” by Sarah Piehl, Rene Friedland, Thomas Neumann, and Gerald Schernewski aims to show how the results of a coupled 3D physical-biogeochemical model can be improved so that they provide reliable information to support environmental measures. The authors use very heterogeneous data in the southwestern Baltic Sea to show how this attempt fails. However, a much higher resolution data set in one of the Baltic Sea basins can actually improve the model results there. I have the impression that the first experiment was designed in such a way that no good results were possible. It is described that sometimes contradictory data were recorded at neighbouring locations. It is clear that the model results cannot be improved with such data. In-depth data quality control and a plausibility check in advance would have been appropriate here.
The present manuscript also contains technical errors: data from at least one station is used for both optimisation and validation. Furthermore, one definition of the error metric is incorrect. The methods used appear to have been chosen arbitrarily without providing a justification.
The figures are often of poor quality and are inadequately described in the captions. Some sentences in the text are incorrect or abbreviated. References are missing.
Overall, a retrospective correction of the model results is presented, which is already questionable in its approach. The reader would expect an attempt to be made to improve parts of the model in such a way that quality-tested measured oxygen values are matched. Data assimilation would also be more in line with the current state of modelling.
The take-home message of the manuscript is that it is necessary to use a more dense data network to obtain good modelling results. This is not a new scientific insight.
I see a way out of this manuscript to deliver a good scientific contribution by using the data of the Arkona Basin to recognise and improve deficits of the model. This requires an in-depth error analysis that not only sheds light on statistical relations, but also examines individual time periods and developments in the model results and the high-resolution data.
Detailed remarks:
L32: Rosenberg: reference missing.
L105: Please insert a chapter on data sets used.
L106: as oxygen dynamics are the focus, the model description should contain the description of the benthic and sediment model system.
L110: “well”.
L118: cite correctly.
L138: Fig. 1 poor quality. Station numbers are barely recognisable.
L158: no profiles in Fig. 1.
L159: Fredland: missing reference
L177: wrong definition, if P(bar) and O(bar) represent the means.
L197ff: please write in more detail. I do not know if a distance between observation and corrected simulation was taken for a simulation cell/time nearest to the observation.
L213: show the position of the MARNET station used.
L220: please explain why you switch to a multiplicative correction.
L224: please say at this point of the manuscript, that the following sections use uncorrected results.
L234: please use unified nomenclature: south-western BS (Like in Tab 1).
L235: AE>0 does not fit to the corresponding profiles.
L247: are the r values for the Kiel Bay correct?
L259: Fig. 3: poor quality. Too high information density.
L260ff: introduce Fig. 3. Which model data is used? Explain the boxes and the lines (percentiles ..)
L267: much stronger model scatter for TF0012.
L269: TF0010 is here used for validation. Fig 1 shows this station with a white dot (for model correction).
L269: Tab 2: please explain which data is compared here. Monthly means? Same location/time?
L273: Fig 4 caption: where are the observations (black)?
L280: also negative values appear.
L304: Fig 5: identify the different basins.
L315: please say model performance is poor.
L316: this is a very interesting section. Please introduce these experiments in more detail: How are the different components (distance to shore, ..) calculated? Did you set in (5) ai,bi,ci = 0 for calculating the distance to shore component only?
L320: you use different correlation functions. Here it is R2. In Tab 2 you use r (Pearson’s correlation coefficient), and in Tab 3 the Spearman rank correlation. Please explain your choice.
L320: state that y-axes have different numbers.
L348: I cannot see an improvement of 0.28.
L354: Fig 7: please insert basin names.
L357: I do not understand the categories: spatial near-bottom oxygen, near bottom oxygen and oxygen profiles. Maybe it’s only to adapt the nomenclature to Tab 3.
L375: please also discuss “all basins” in Fig 3.
L379: standard error decreases with increasing number of observations. This is by definition.
L384: please explain the boxes, the lines and the dots in Fig 8. Why do we have standard errors for each month? Why not one standard error for each month?
L396: “oxygen concentration” in the Arkona Bay?
L396: please give a motivation to use this simple multiplicative correction function here.
L408: please explain MARNET Function and MARNET Factor.
L433: which baseline?
L437: please discuss vertical resolution too.
L480: show the position of this single station.
L501: “other studies” give examples.
L514: “three different thresholds were used and averaged” – this appears indeed senseless.
L517: I would expect an overestimation of oxygen concentrations if the thickness of the model’s deepest layer is larger than 4 m.
L518: “daily resolution”: is this the time step of the model or the resolution of the results which are analysed?
L522: Dias et al: reference missing.
L532: also discuss the possibility to use a retrospective adaption for projections.
L539: the transparency is discussed above in the opposite direction.
L541: in the frame of OSPAR work such exercises were done too.
L569: “improved the representation of the oxygen dynamics”.
L576: discuss also the possibility of data assimilation.
L598ff: please help the reader to recognise each reference (line distance or indent).
Citation: https://doi.org/10.5194/bg-2023-152-RC1
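The remark on L379 (the standard error decreases with the number of observations by definition) can be illustrated with a minimal sketch. It assumes the usual standard error of the mean, s / sqrt(n); the oxygen values and the function name `standard_error` are hypothetical and are not taken from the manuscript.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical near-bottom oxygen concentrations (mg/l) at one station.
few_observations = [1.0, 2.0, 3.0, 4.0]
# The same values sampled four times as often: similar spread, larger n.
many_observations = few_observations * 4

print("n = 4: ", standard_error(few_observations))
print("n = 16:", standard_error(many_observations))
```

With a comparable spread, the larger sample yields the smaller standard error, so a month with more observations will show a lower value regardless of how well the station is represented.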
AC3: 'Reply on RC1', Sarah Piehl, 11 Jan 2024
We would like to thank the reviewer for the careful review of our manuscript. It made us realize that the manuscript in its current form lacks a clear presentation of the background, rationale and objectives, as all three reviewers wondered why we did not consider improving the model itself. A critical evaluation and presentation of the suitability of the model was not the aim of this study. We use the model in our study to provide a solution to a practical problem. For the Baltic Sea, the MOM-ERGOM modelling system has been calibrated for decades on mass balances and has been validated for key eutrophication parameters (e.g., Eilola et al. 2011, Meier et al. 2018), so that it runs stably in the entire Baltic Sea in the long term, with a focus on the western Baltic Sea (Friedland et al. 2012, Schernewski et al. 2015). Reliable long-term methods are also required for oxygen assessments within the framework of HELCOM.
The background of our study is that current oxygen assessments in the western Baltic Sea are based on sparse, heterogeneous observational data. These data are further spatially and temporally averaged, which leads to highly unreliable assessments of the oxygen situation (e.g., BLANO 2018). Models have already been used to address several water quality management issues in the Baltic Sea (e.g., HELCOM 2013, Schernewski et al. 2015, Meier et al. 2018, Friedland et al. 2021) but are not yet an integral part of assessments, although they are already suitable for a combined approach which brings together observational and modelling data. The Baltic Sea is ideal for testing such an approach because, as explained above, the framework conditions of the model system are in place.
Based on the reviewer's summary, the aim of our study does not seem clear, so we will adapt the manuscript to clarify that we are not aiming to support environmental measures, but to supplement environmental monitoring data with the help of model data in order to improve environmental assessments. Here we tested two approaches to combine both data types (using the correction function or the correction factor). We see that, given the background of the study, the comparison of the two different spatial model resolutions is actually a side aspect and misleading. We therefore suggest shortening it and moving parts to the supplementary information.
We appreciate the comments on the technical issues and will address them accordingly. The reviewer's impression that contradictory measurements were used is due to improper wording in the manuscript. The number of measurements at a station plays a major role in the spatial comparison (where an average over a longer period of time is evaluated): high or low values have a strong influence at stations with few data, whereas they tend to average out at stations with many data. However, this has no influence on the use of the data for the correction function (no averaging of oxygen values), as we have performed data quality and plausibility checks, which we will emphasize more clearly in the manuscript.
We will also revise the introduction and discussion sections to better clarify our main messages: first, that a more reliable oxygen assessment is possible if observational data are combined with modelling data providing high-resolution information; second, that practical applications rely on observational data, so modelling results need to be adjusted accordingly; and third, that in order to apply an integrated approach of observational and modelling data in the future, the monitoring network needs to be reconsidered to meet the requirements for such an approach.
References:
BLANO 2018: https://mitglieder.meeresschutz.info/files/meeresschutz/berichte/art8910/zyklus18/Nationale_Indikatoren_Ostsee_Anlage_1.pdf
Eilola et al. 2011: https://doi.org/10.1016/j.jmarsys.2011.05.004
Friedland et al. 2021: https://doi.org/10.3389/fmars.2021.596126
Friedland et al. 2012: https://doi.org/10.1016/j.jmarsys.2012.08.002
HELCOM 2013: https://helcom.fi/wp-content/uploads/2019/08/Eutorophication-targets_BSEP133.pdf
Meier et al. 2018: https://doi.org/10.3389/fmars.2018.00440
Schernewski et al. 2015: https://doi.org/10.1016/j.marpol.2014.09.002
Answers to detailed remarks:
We thank the reviewer for the detailed remarks, which will improve the manuscript, and we will accept them accordingly.
Below are more detailed answers to specific questions/remarks:
L32: Rosenberg: reference missing.
We guess that R1 is referring to “Diaz and Rosenberg, 2008”, which is included in the reference list.
L105: Please insert a chapter on data sets used.
As suggested, we will add a chapter on the data sets used.
L106: as oxygen dynamics are the focus, the model description should contain the description of the benthic and sediment model system.
The recent version of the biogeochemical model ERGOM is described in Neumann et al. (2022, https://gmd.copernicus.org/articles/15/8473/2022/). The oxygen dynamics were analyzed in depth by Börgel et al. (2023, https://www.frontiersin.org/articles/10.3389/fmars.2023.1174039/full). Nevertheless, we agree with the reviewer and a more detailed description of the oxygen-relevant processes will be added.
L158: no profiles in Fig. 1.
We agree with the reviewer and we will change Fig. 1 accordingly. The profiles are shown as white dots in the figure. To be clearer, a proper description will be added.
L177: wrong definition, if P(bar) and O(bar) represent the means.
We thank the reviewer for discovering this error and will address it accordingly.
L197ff: please write in more detail. I do not know if a distance between observation and corrected simulation was taken for a simulation cell/time nearest to the observation.
We agree with the reviewer and suggest including an improved description of the correction function in the revised manuscript.
L220: please explain why you switch to a multiplicative correction.
See the answer to the comment on L396 below.
L235: AE>0 does not fit to the corresponding profiles.
The statement refers to the near-bottom oxygen concentrations, which will be made clear in the revised version.
L247: are the r values for the Kiel Bay correct?
The values have been checked and are correct. One has to consider that here again only the near-bottom oxygen concentrations are analyzed (Figure 2).
L267: much stronger model scatter for TF0012.
We agree with the reviewer and will delete the sentence.
L269: TF0010 is here used for validation. Fig 1 shows this station with a white dot (for model correction).
This is a misunderstanding. The description should refer to the next closest red dot. The figure will be improved according to the reviewer comments.
L269: Tab 2: please explain which data is compared here. Monthly means? Same location/time?
The multi-annual mean near-bottom oxygen concentrations (2010-2019) at the same location and time. This information will be added.
L273: Fig 4 caption: where are the observations (black)?
The representation of the observations will be adjusted in the figure.
L280: also negative values appear.
There are no negative RMSE values in either Table 1 or Table 2. Is something else meant?
L316: this is a very interesting section. Please introduce these experiments in more detail: How are the different components (distance to shore, ..) calculated? Did you set in (5) ai,bi,ci = 0 for calculating the distance to shore component only?
We agree with the reviewer and suggest enhancing the description of the results in the revised manuscript. The reviewer is right: here we computed the change in model skill using only one (instead of all) of the predictor variables from eq. (5), which means that the coefficients ai, bi, ci, di corresponding to the unused predictor variables were set to 0.
L320: you use different correlation functions. Here it is R2 . In Tab 2 you use r (Pearson’s correlation coefficient), and in Tab 3 the Spearman Rank correlation. Please explain your choice.
We thank the reviewer for pointing out the differences. Indeed, Tab 3 should show Pearson’s correlation coefficient, which will be corrected. We will replace the coefficient of determination with Pearson’s r accordingly.
L348: I cannot see an improvement of 0.28.
We thank the reviewer for pointing this out. In fact, Table 3 was not updated in the latest version; the difference of 0.28 in the Pomeranian Bay can be seen when Pearson's correlation coefficient is used.
L357: I do not understand the categories: spatial near-bottom oxygen, near bottom oxygen and oxygen profiles. Maybe it’s only to adapt the nomenclature to Tab 3.
Table 3 refers to the data used for correction; here the three different analyzed aspects of model performance are meant. We will improve the nomenclature accordingly.
L384: please explain the boxes, the lines and the dots in Fig 8. Why do we have standard errors for each month? Why not one standard error for each month?
The reviewer is right. It should be "Average monthly standard error for the years 2010-2019" in the figure legend. Figure 8 will be revised accordingly.
L396: “oxygen concentration” in the Arkona Bay?
For the MARNET station. The information will be added accordingly.
L396: please give a motivation to use this simple multiplicative correction function here.
The reviewer is right. A simple correction factor is easier to understand and more transparent for users. This is why we included this second approach. We will add this motivation accordingly.
L433: which baseline?
Territorial sea baseline. Will be changed accordingly.
L514: “three different thresholds were used and averaged” – this appears indeed senseless.
We agree with the reviewer, but this is the current approach by HELCOM. The reason for the HELCOM approach is the aggregation to EQR values at the end. As some years and areas have no oxygen deficiency situation with conditions <2 mg/l, it was decided to use and aggregate the different threshold values discussed so far (2, 4 and 6 mg/l), as EQRs are difficult to calculate from zero values.
L517: I would expect an overestimation of oxygen concentrations if the thickness of the model’s deepest layer is larger than 4 m.
Our model system has vertical layer thicknesses of no more than 2 m in the western Baltic Sea. To a large extent, the bottom-most cells are even thinner, as we work with only partly filled bottom cells to allow better inflow dynamics.
L518: “daily resolution”: is this the time step of the model or the resolution of the results which are analysed?
The analysis time step. The time step of the model is 10 minutes. The sentence will be changed to be clearer.
L522: Dias et al: reference missing.
Indeed, there is a typo: it should be Diaz et al. 2011, for which the reference is provided.
L539: the transparency is discussed above in the opposite direction.
We thank the reviewer for pointing this out and will clarify that we mean the approach utilizing the simple correction factor.
L541: in the frame of OSPAR work such exercises were done too.
We thank the reviewer for pointing out the OSPAR work and will add the reference.
L576: discuss also the possibility of data assimilation.
We see that data assimilation for forecasting products is another interesting use case. But for the conclusion we think this would open up a new topic which is beyond the focus of the manuscript, and we therefore do not want to include it.
Citation: https://doi.org/10.5194/bg-2023-152-AC3
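The remarks on L177 and L320 turn on the difference between the bias, the (root) mean square error and Pearson's correlation coefficient. The following is a minimal sketch using the textbook definitions, with P the model predictions and O the observations. These are assumed standard forms, not the manuscript's Eq. 3, and the sample values are hypothetical.

```python
import math


def bias(pred, obs):
    """Mean of (P_i - O_i); errors of opposite sign cancel."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def mse(pred, obs):
    """Mean of squared differences (P_i - O_i)^2."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Square root of the mean square error, in the units of the data."""
    return math.sqrt(mse(pred, obs))


def pearson_r(pred, obs):
    """Pearson's correlation coefficient, using the means P(bar) and O(bar)."""
    n = len(obs)
    p_mean = sum(pred) / n
    o_mean = sum(obs) / n
    cov = sum((p - p_mean) * (o - o_mean) for p, o in zip(pred, obs))
    var_p = sum((p - p_mean) ** 2 for p in pred)
    var_o = sum((o - o_mean) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical modelled and observed oxygen concentrations (mg/l).
model = [6.0, 5.0, 3.0, 2.0]
observed = [5.0, 5.0, 4.0, 2.0]

print("bias:", bias(model, observed))
print("MSE: ", mse(model, observed))
print("RMSE:", rmse(model, observed))
print("r:   ", pearson_r(model, observed))
```

In this example the bias is zero although the model misses two of the four values by 1 mg/l, which is why the bias and the RMSE are usually reported together.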
RC2: 'Comment on bg-2023-152', Anonymous Referee #2, 06 Dec 2023
In their manuscript, Piehl et al compare the results of a coupled circulation-biogeochemical model of the Baltic Sea with observations to assess the ability of the model to represent oxygen variability, in particular hypoxic/low oxygen conditions, in the southwestern Baltic Sea. The goal is to use the model in environmental assessments of the Baltic Sea. An observation-based model correction is tested. The authors find that both increased horizontal resolution and applied correction can improve the model performance. They conclude that despite some mismatch with observations the model is appropriate for use in oxygen deficiency assessments.
The title of the manuscript is a bit misleading. The study is limited to model/observations comparison and lacks scientific insights that could be valuable to develop the oxygen deficiency assessment method of the western Baltic Sea. The correction method is an interesting exercise to see where the model fails but should be used to correct the model rather than its results. If the model overestimates oxygen depletion in some regions then oxygen consumption may be overestimated and should be corrected. With the assessment method in mind, the authors should propose metrics (with examples), based on model simulations, that would be useful additions to the current observations used in the environmental assessments of the Baltic Sea. In its current form, the manuscript is more like a technical report and therefore of somewhat limited interest to the broader Biogeosciences community.
Specific comments:
L98-104: There seems to be a disconnect between the proposed work (assess/improve model agreement with observations) and the objective of the study (provide a high resolution monitoring system)
Figure 2: Is this a multi annual summer average?
L270-271: This is a bit optimistic, Fig 4 shows a large discrepancy in simulated subsurface oxygen
Section 3.2: Here you correct your model with a set of data and then assess the correct model with the same set of data. A more robust assessment of the model skills would be to correct the model with half of the dataset and then compare the corrected model with the second part of the observations.
L365-368: Figure 3b indicates an underestimation of near bottom oxygen at the Bay of Mecklenburg, which seems to contradict this statement
L412-414: The case study was mainly an assessment of the model skills and not the development of a method for a high resolution monitoring and assessment system. To reach this goal you first need to assess the model and then use its results to develop metrics that are robust and useful addition to the assessment
L421-422: Of course, this is controlled by air-sea gas exchange
L498-499: This is somewhat contradictory to what was said before, the model has some limitations
L532-535: The model assessment could be used to understand the source of the discrepancies and ultimately to improve the model
L552: Computational power remains an issue, especially for highly resolved models
Minor comments/typos:
L46: Here and elsewhere: "monitoring data" is not the proper terminology, please rephrase
L47: remove monitoring
L49: the quality and thus the reliability...
L51: Nevertheless, data acquisition is limited...
L88: models provide an approximation of in-situ conditions and will never perfectly align with observations.
This sentence is redundant with the previous one, though.
L96: However, due to the coarse horizontal resolution of the model (~5km), nearshore areas...
L98: resolution (~2km) to investigate...
L126: km more appropriate than nautical miles?
vertically resolved z-layers ... from 0.5 at the surface to 2m at the bottom in the deepest areas.
L128: Figure 1 does not mention the model. Is the red square on the left panel the limits of the model grid, and does the left panel show the model bathymetry? Or does the model grid cover the entire Baltic Sea?
L135: what is the time unit?
Equation 3: The formula is the mean square error, please correct. Do you mean the Mean Absolute Error(MAE)? Why not using the bias?
Figure 3: - Can you mention in the caption to refer to Fig 1 for the station locations
- Can you move the xtick labels leftward to align with the vertical line?
Figure 7: Although the color scale is the same, the color of the observation points varies from Figure 2. Can you explain? The captions are the same.
L558: Can you add this reference in the References list?
Citation: https://doi.org/10.5194/bg-2023-152-RC2
AC2: 'Reply on RC2', Sarah Piehl, 11 Jan 2024
We would like to thank the reviewer for the careful review of our manuscript. It made us realize that the manuscript in its current form lacks a clear presentation of the background, rationale and objectives, as all three reviewers wondered why we did not consider improving the model itself. A critical evaluation and presentation of the suitability of the model was not the aim of this study. We use the model in our study to provide a solution to a practical problem. For the Baltic Sea, the MOM-ERGOM modelling system has been calibrated for decades on mass balances and has been validated for key eutrophication parameters (e.g., Eilola et al. 2011, Meier et al. 2018), so that it runs stably in the entire Baltic Sea in the long term, with a focus on the western Baltic Sea (Friedland et al. 2012, Schernewski et al. 2015). Reliable long-term methods are also required for oxygen assessments within the framework of HELCOM.
The background of our study is that current oxygen assessments in the western Baltic Sea are based on sparse, heterogeneous observational data. These data are further spatially and temporally averaged, which leads to highly unreliable assessments of the oxygen situation (e.g., BLANO 2018). Models have already been used to address several water quality management issues in the Baltic Sea (e.g., HELCOM 2013, Schernewski et al. 2015, Meier et al. 2018, Friedland et al. 2021), but are not yet an integral part of assessments, although they are already suitable for a combined approach which brings together observational and modelling data. The Baltic Sea is ideal for testing such an approach because, as explained above, the framework conditions of the model system are in place.
We agree with the reviewer that the title of the manuscript and the use of the terms assessment and monitoring are sometimes misleading, and we will address this accordingly. Further, the aim of our study does not seem clear, so we will adapt the manuscript to clarify that we are aiming to supplement environmental monitoring data with the help of model data. A high-resolution monitoring product is needed as a basis for more reliable and meaningful oxygen assessments. Scientific insight into what oxygen deficiency metrics could look like was already provided in a previous study (Piehl et al. 2023) and is thus not the focus here.
We see that, given the background of the study, the comparison of the two different spatial model resolutions is actually a side aspect that leads to the article being perceived as a technical report. We therefore suggest shortening this aspect and focusing more on the two approaches to combine observations with model data. We see the added value of our manuscript for the BGS community in showing a way to generate a high-resolution, reliable data basis for policy-relevant oxygen assessments by combining model results and observational data.
We will also revise the introduction and discussion sections to better clarify our main messages: first, that a more reliable oxygen assessment is possible if observational data are combined with modelling data to provide high-resolution information; second, that practical applications rely on observational data, so modelling results need to be adjusted accordingly; and third, that in order to apply an integrated approach of observational and modelling data in the future, the monitoring network needs to be reconsidered to meet the requirements for such an approach.
References:
BLANO 2018: https://mitglieder.meeresschutz.info/files/meeresschutz/berichte/art8910/zyklus18/Nationale_Indikatoren_Ostsee_Anlage_1.pdf
Eilola et al. 2011: https://doi.org/10.1016/j.jmarsys.2011.05.004
Friedland et al. 2021: https://doi.org/10.3389/fmars.2021.596126
Friedland et al. 2012: https://doi.org/10.1016/j.jmarsys.2012.08.002
HELCOM 2013: https://helcom.fi/wp-content/uploads/2019/08/Eutorophication-targets_BSEP133.pdf
Meier et al. 2018: https://doi.org/10.3389/fmars.2018.00440
Piehl et al. 2023: https://doi.org/10.1007/s10666-022-09866-x
Schernewski et al. 2015: https://doi.org/10.1016/j.marpol.2014.09.002Answers to specific comments:
We thank the reviewer for the specific comments/suggestions which will improve the manuscript.L98-104: There seem to be a disconnect between the proposed work (assess/improve model agreement with observations) and the objective of the study (provide a high resolution monitoring system)
The reviewer is right that we cannot name it monitoring system and will change it accordingly.Figure 2: Is this a multi annual summer average?
No, this is the multi-annual average from 2010-2019 including data from all months. For this spatial comparison it was necessary to include a long time-period and all months in order to reach this density of observational data points.L270-271: This is a bit optimistic, Fig 4 shows a large discrepancy in simulated subsurface oxygen
The statement is based not only on Fig 4 but also on all other profiles in Fig S2 and the Pearson’s correlation coefficient. But the reviewer is right that especially in the lower, near-bottom water layers the discrepancy is large for some stations in some months. Will be addressed accordingly.Section 3.2: Here you correct your model with a set of data and then assess the correct model with the same set of data. A more robust assessment of the model skills would be to correct the model with half of the dataset and then compare the corrected model with the second part of the observations.
This was already done in part, e.g. by computing the correction function from the near-bottom observations and then checking the model skill for the whole water column. However, we can adjust the posterior correction of the modeled oxygen concentrations by splitting the observational dataset into two parts: one will be used to compute the correction function and the other to compute the skill of the corrected oxygen concentrations.
L365-368: Figure 3b indicates an underestimation of near bottom oxygen at the Bay of Mecklenburg, which seems to contradict this statement
The reviewer is right, but this discrepancy arises because the spatial comparison in Table 3 used a higher number of observational data (dots in Fig. 2/7) to calculate the error metrics. In Table 3 only three stations for each basin are shown. We hope to clarify those misunderstandings by adding a short paragraph on “data sets used” (as suggested by reviewer #1) in the Materials and Methods section.
L412-414: The case study was mainly an assessment of the model skills and not the development of a method for a high resolution monitoring and assessment system. To reach this goal you first need to assess the model and then use its results to develop metrics that are robust and useful addition to the assessment
The reviewer is correct and, as mentioned in the general response above, we will address the issue in the manuscript and ensure that terms/sentences are changed or deleted accordingly.
L421-422: Of course, this is controlled by air-sea gas exchange
Will be re-phrased accordingly.
L498-499: This is somewhat contradictory to what was said before, the model has some limitations
Will be re-phrased accordingly.
L532-535: The model assessment could be used to understand the source of the discrepancies and ultimately to improve the model
We hope that our general answer above has clarified our aims and why we have not pursued this idea further.
L552: Computational power remains an issue, especially for highly resolved models
We agree that our statement needs to be more nuanced and will change it accordingly.
Answers to minor comments:
We thank the reviewer for the minor comments/suggestions which will improve the manuscript and we will accept them accordingly.
Below are further answers to specific questions:
L126: km more appropriate than nautical miles?
The corresponding kilometers will be added.
vertically resolved z-layers ... from 0.5 at the surface to 2m at the bottom in the deepest areas.
Will be changed accordingly.
L128: Figure 1 does not mention the model, is the red square on the left panel the limits of the model grid and does the left panel show the model bathymetry? or does the model grid cover the entire Baltic Sea?
The reviewer is right and we will change the sentence and Figure 1 accordingly. This is the official bathymetry of the Baltic Sea as provided by the IOW. Apart from minor adjustments, we also use this for the models. The model grid covers the entire Baltic Sea.
L135: what is the time unit?
Will be changed accordingly to “kg P/km2/a”
Equation 3: The formula is the mean square error, please correct. Do you mean the Mean Absolute Error (MAE)? Why not using the bias?
We thank the reviewer for pointing out this mistake. It should indeed be the bias and will be corrected accordingly.
Figure 7: Although the color scale is the same, the color of the observations points vary from Figure 2, can you explain? The captions are the same.
We thank the reviewer for pointing out this mistake. Will be addressed accordingly.
Citation: https://doi.org/10.5194/bg-2023-152-AC2
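The split-sample check described in this reply (use one part of the observations to compute the correction and the other part to compute the skill) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the helper names (`bias`, `rmse`, `split_pairs`) are hypothetical, and a constant offset stands in for the correction function of the manuscript, which is not reproduced here.

```python
import math
import random


def bias(model, obs):
    """Mean of (model - observation); positive means the model reads too high."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def rmse(model, obs):
    """Root mean square error between model values and observations."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def split_pairs(pairs, fraction=0.5, seed=0):
    """Randomly split (model, obs) pairs into a calibration and a validation part."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Synthetic oxygen pairs in mg/l (assumption: the model is about 1.5 mg/l too high).
rng = random.Random(42)
pairs = []
for _ in range(200):
    observed = rng.uniform(1.0, 10.0)
    modelled = observed + 1.5 + rng.gauss(0.0, 0.5)
    pairs.append((modelled, observed))

calibration, validation = split_pairs(pairs)

# Stand-in correction: a constant offset estimated on the calibration part only.
offset = bias([m for m, _ in calibration], [o for _, o in calibration])

# Skill is computed on the validation part, which was not used for the correction.
validation_model = [m for m, _ in validation]
validation_obs = [o for _, o in validation]
corrected = [m - offset for m in validation_model]

rmse_before = rmse(validation_model, validation_obs)
rmse_after = rmse(corrected, validation_obs)
print(f"offset={offset:.2f} mg/l, RMSE before={rmse_before:.2f}, after={rmse_after:.2f}")
```

Because the validation part never enters the estimate of the offset, an improvement in RMSE on that part is not an artefact of fitting and evaluating on the same observations.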
-
AC2: 'Reply on RC2', Sarah Piehl, 11 Jan 2024
-
RC3: 'Comment on bg-2023-152', Anonymous Referee #3, 07 Dec 2023
I think this analysis was a rigorous test of model performance in a region of the Baltic Sea. The motivation for the effort was also worthy in that we do need to improve model simulations in the shallower, perhaps more dynamic regions of the coastal zone. Where I think the analysis fell short was in the way the model was evaluated. There are two challenges I have for the authors:
(1) Why were the data aggregated into seasonal means for a long time period and compared to similarly aggregated model means? I can understand that this is one reasonable test of model behavior (getting the mean seasonal cycle correct), but for these models to be useful in scenarios of climate change and nutrient abatement, they need to do better than a seasonal climatology. It is also possible that aggregating data into a climatology might obscure how well the model is doing at capturing particular events (e.g., wet years, heat waves). I am not sure if I wish for the authors to return to the output and data and do a more time-specific comparison, but it would certainly provide another useful test of the model.
(2) The approach for model correction described in this paper is not incorrect, but it is insufficient. To try and post-correct the model with fixed parameters (e.g., distance from shore) seems counterintuitive; surely the more likely reasons for model-data mismatch are an inadequacy in the process rates, which are time and space varying. I think the paper would have been much more effective if the authors did sensitivity tests on some key rate processes (primary production, background diffusivity) to see if that helped improve model performance. This would be an avenue by which the authors might see more meaningful improvements in RMSE and other error metrics.
Below are more specific comments:
(1) Line 99: Is it better to call this "disagreement"? Error implies one is wrong, and the observations, as you correctly point out, have limitations.
(2) Line 110: "wel" should be "well"
(3) Line 118: Just write "as described by Kuznetsov and Neumann (2013)" so you don't need to write the names twice.
(4) Line 121: I am not sure what this means. Do you just mean that the main currency of metabolism is oxygen or carbon, and it is translated to N and P pools through stoichiometry?
(5) Line 126: make "miles" singular
(6) Line 148: should be "emphasizes"
(7) Line 148: Delete the comma after "Both"
(8) Line 158: Do you mean station with vertical oxygen profiles? There is no data in Fig 1
(9) Line 159: Can you offer a little more explanation here? Do I understand that near-bottom values were estimated from near-surface? Presumably, there is uncertainty with this that should be reported here.
(10) line 166: My guess is that you don't really care that the models reproduce a monthly oxygen climatology, but rather you want the models to capture other forms of variability. So why the aggregation here? This is related to one of my main general comments.
(11) Line 198: polynomials?
(12) Line 212: Can you specify what the temporal resolution is here?
(13) Lines 234-243: This writing seems to overlook the fact that the inner part of the Bay of Mecklenburg is an area where the model substantially overestimates DO. This should be addressed.
(14) Figure 3: There does seem to be a case where the models keep oxygen low well into November (Panels a, b, c, d) , but the observations don't show this, why might that be? This is a fairly season-specific problem that could be discussed in detail from a mechanistic standpoint.
(15) Line 310: I think you could more clearly point out that RMSE is more often > 2 mg/L for these comparisons. This is rather high, and worth highlighting.
(16) Figure 6 and related text: I would argue that the relative differences in RMSE between variables (purple dots) are pretty small (0.02 to 0.2) relative to the overall RMSE (~2). This would tell me that there are other factors that are more important in driving the discrepancies, likely biogeochemical or hydrodynamic processes in the model. This is not really discussed at all in the paper and is worth more discussion.
(17) Figure 7: It would probably be more effective to show these as "differences from corrected" and forget the observations. This would make it clearer to see where the improvements occurred. It seems that only the deep Arkona basin was really improved by the model.
(18) Line 386: should be "analysis"
(19) Line 339: This sentence is worded really oddly. How about "Since model skill was not equally good...."
Citation: https://doi.org/10.5194/bg-2023-152-RC3
-
AC1: 'Reply on RC3', Sarah Piehl, 11 Jan 2024
We would like to thank the reviewer for the careful review of our manuscript. The review made us realize that the manuscript in its current form lacks a clear presentation of the background, rationale and objectives, as all three reviewers wondered why we did not consider improving the model itself. A critical evaluation and presentation of the suitability of the model was not the aim of this study. We use the model in our study to provide a solution to a practical problem. For the Baltic Sea, the MOM-ERGOM modelling system has been calibrated for decades on mass balances and has been validated for key eutrophication parameters (e.g., Eilola et al. 2011, Meier et al. 2018), so that it runs stably over the long term in the entire Baltic Sea, with a focus on the western Baltic Sea (Friedland et al. 2012, Schernewski et al. 2015). Reliable long-term methods are also required for oxygen assessments within the framework of HELCOM.
The background of our study is that current oxygen assessments in the western Baltic Sea are based on sparse, heterogeneous observational data. These data are further averaged spatially and temporally, which leads to highly unreliable assessments of the oxygen situation (e.g., BLANO 2018). Models have already been used to address several water quality management issues in the Baltic Sea (e.g., HELCOM 2013, Schernewski et al. 2015, Meier et al. 2018, Friedland et al. 2021), but are not yet an integral part of assessments, although they are already suitable for a combined approach that brings together observational and modelling data. The Baltic Sea is ideal for testing such an approach because, as explained above, the framework conditions of the model system are in place.
Further, the aim of our study does not seem to have been clear, so we will adapt the manuscript to clarify that we aim to supplement environmental monitoring data with model data. A high-resolution monitoring product is needed as a basis for more reliable and meaningful oxygen assessments.
We chose 10 years for our study in order to have an appropriate sample size for each month, given that the observations generally have low data coverage. The data were aggregated so as to correspond to the current assessment approaches for oxygen deficiency. Scenario simulations and capturing particular events were not the focus of this study. The latter in particular can become a challenge for oxygen assessments and needs to be addressed in future studies.
We will also revise the introduction and discussion sections to better clarify our main messages: first, that a more reliable oxygen assessment is possible if observational data are combined with modelling data to provide high-resolution information; second, that practical applications rely on observational data, so modelling results need to be adjusted accordingly; and third, that in order to apply an integrated approach of observational and modelling data in the future, the monitoring network needs to be reconsidered to meet the requirements of such an approach.
References:
BLANO 2018: https://mitglieder.meeresschutz.info/files/meeresschutz/berichte/art8910/zyklus18/Nationale_Indikatoren_Ostsee_Anlage_1.pdf
Eilola et al. 2011: https://doi.org/10.1016/j.jmarsys.2011.05.004
Friedland et al. 2021: https://doi.org/10.3389/fmars.2021.596126
Friedland et al. 2012: https://doi.org/10.1016/j.jmarsys.2012.08.002
HELCOM 2013: https://helcom.fi/wp-content/uploads/2019/08/Eutorophication-targets_BSEP133.pdf
Meier et al. 2018: https://doi.org/10.3389/fmars.2018.00440
Piehl et al. 2023: https://doi.org/10.1007/s10666-022-09866-x
Schernewski et al. 2015: https://doi.org/10.1016/j.marpol.2014.09.002
Answers to more specific comments:
We thank the reviewer for the specific comments/suggestions which will improve the manuscript and we will accept them accordingly.
Below are further answers to specific questions:
(4) Line 121: I am not sure what this means. Do you just mean that the main currency of metabolism is oxygen or carbon, and it is translated to N and P pools through stoichiometry?
The main biogeochemical fluxes are related to C, N and P. All processes that change them produce or consume oxygen, which is calculated from the physical and chemical equations and fluxes. A more detailed description will be added, also in response to the other reviewers' comments.
(8) Line 158: Do you mean station with vertical oxygen profiles? There is no data in Fig 1
Yes, will be addressed accordingly.
(9) Line 159: Can you offer a little more explanation here? Do I understand that near-bottom values were estimated from near-surface? Presumably, there is uncertainty with this that should be reported here.
We agree that the description is confusing and we will address it accordingly. As reviewer #1 recommended, we will insert a chapter on the data sets used.
(10) line 166: My guess is that you don't really care that the models reproduce a monthly oxygen climatology, but rather you want the models to capture other forms of variability. So why the aggregation here? This is related to one of my main general comments.
We chose 10 years for our study in order to have an appropriate sample size for each month, given that the observations generally have low data coverage. The data were aggregated so as to correspond to the current assessment approaches for oxygen deficiency (see also our answer to the general comment above).
(11) Line 198: polynomials?
Yes, will be changed accordingly.
(12) Line 212: Can you specify what the temporal resolution is here?
We will add that the MARNET buoy data are available at hourly resolution.
(13) Lines 234-243: This writing seems to overlook the fact that the inner part of the Bay of Mecklenburg is an area where the model substantially overestimates DO. This should be addressed.
We assume you mean underestimation of DO? Will be addressed accordingly.
(14) Figure 3: There does seem to be a case where the models keep oxygen low well into November (Panels a, b, c, d), but the observations don't show this, why might that be? This is a fairly season-specific problem that could be discussed in detail from a mechanistic standpoint.
The comparison of the model results with salinity measurements indicates that the model overestimates stratification, which could lead to too weak vertical oxygen transport in the late autumn and winter months. Will be discussed accordingly.
Citation: https://doi.org/10.5194/bg-2023-152-AC1
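The point discussed under comment (10), that a multi-annual monthly mean can hide an individual low-oxygen event, can be made concrete with a small arithmetic sketch. All numbers are synthetic and chosen only for illustration; the 2 mg/l threshold and the variable names are assumptions, not values or identifiers from the manuscript.

```python
from statistics import mean

# Assumed illustrative threshold for oxygen deficiency, in mg/l.
THRESHOLD_MG_L = 2.0

# Synthetic near-bottom oxygen for one calendar month in ten successive years (mg/l).
# One year (2016) contains a pronounced low-oxygen event.
monthly_bottom_oxygen = {
    2010: 5.1, 2011: 4.8, 2012: 5.4, 2013: 4.9, 2014: 5.2,
    2015: 5.0, 2016: 1.2, 2017: 4.7, 2018: 5.3, 2019: 4.9,
}

# Aggregated view: one climatological value for the whole decade.
climatological_mean = mean(monthly_bottom_oxygen.values())

# Time-specific view: years in which the value fell below the threshold.
event_years = [
    year for year, value in monthly_bottom_oxygen.items() if value < THRESHOLD_MG_L
]

print(f"climatological mean: {climatological_mean:.2f} mg/l")
print(f"years below {THRESHOLD_MG_L} mg/l: {event_years}")
```

In this synthetic case the decadal mean stays well above the threshold although one year falls below it, which is why a time-specific comparison tests something the aggregated comparison cannot.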
Status: closed
-
RC1: 'Comment on bg-2023-152', Anonymous Referee #1, 27 Nov 2023
Overall Statements:
The manuscript “Ocean models as shallow sea oxygen deficiency assessment tools: from research to practical application” by Sarah Piehl, Rene Friedland, Thomas Neumann, and Gerald Schernewski aims to show how the results of a coupled 3D physical-biogeochemical model can be improved so that they provide reliable information to support environmental measures. The authors use very heterogeneous data in the southwestern Baltic Sea to show how this attempt fails. However, a much higher resolution data set in one of the Baltic Sea basins can actually improve the model results there. I have the impression that the first experiment was designed in such a way that no good results were possible. It is described that sometimes contradictory data was recorded at neighbouring locations. It is clear that the model results cannot be improved with such data. In-depth data quality control and a plausibility check in advance would have been appropriate here.
The present manuscript also contains technical errors: data from at least one station is used for both optimisation and validation. Furthermore, one definition of the error metric is incorrect. The methods used appear to have been chosen arbitrarily without providing a justification.
The figures are often of poor quality and are inadequately described in the captions. Some sentences in the text are incorrect or abbreviated. References are missing.
Overall, a retrospective correction of the model results is presented, which is already questionable in its approach. The reader would expect an attempt to be made to improve parts of the model in such a way that quality-tested measured oxygen values are matched. Data assimilation would also be more in line with the current state of modelling.
The take-home message of the manuscript is that it is necessary to use a more dense data network to obtain good modelling results. This is not a new scientific insight.
I see a way out of this manuscript to deliver a good scientific contribution by using the data of the Arkona Basin to recognise and improve deficits of the model. This requires an in-depth error analysis that not only sheds light on statistical relations, but also examines individual time periods and developments in the model results and the high-resolution data.
Detailed remarks:
L32: Rosenberg: reference missing.
L105: Please insert a chapter on data sets used.
L106: as oxygen dynamics are the focus, the model description should contain the description of the benthic and sediment model system.
L110: “well”.
L118: cite correctly.
L138: Fig. 1 poor quality. Station numbers are barely recognisable.
L158: no profiles in Fig. 1.
L159: Friedland: missing reference
L177: wrong definition, if P(bar) and O(bar) represent the means.
L197ff: please write in more detail. I do not know if a distance between observation and corrected simulation was taken for a simulation cell/time nearest to the observation.
L213: show the position of the MARNET station used.
L220: please explain why you switch to a multiplicative correction.
L224: please say at this point of the manuscript, that the following sections use uncorrected results.
L234: please use unified nomenclature: south-western BS (Like in Tab 1).
L235: AE>0 does not fit to the corresponding profiles.
L247: are the r values for the Kiel Bay correct?
L259: Fig. 3: poor quality. Too high information density.
L260ff: introduce Fig. 3. Which model data is used? Explain the boxes and the lines (percentiles ..)
L267: much stronger model scatter for TF0012.
L269: TF0010 is here used for validation. Fig 1 shows this station with a white dot (for model correction).
L269: Tab 2: please explain which data is compared here. Monthly means? Same location/time?
L273: Fig 4 caption: where are the observations (black)?
L280: also negative values appear.
L304: Fig 5: identify the different basins.
L315: please say model performance is poor.
L316: this is a very interesting section. Please introduce these experiments in more detail: How are the different components (distance to shore, ..) calculated? Did you set in (5) ai,bi,ci = 0 for calculating the distance to shore component only?
L320: you use different correlation functions. Here it is R2 . In Tab 2 you use r (Pearson’s correlation coefficient), and in Tab 3 the Spearman Rank correlation. Please explain your choice.
L320: state that y-axes have different numbers.
L348: I cannot see an improvement of 0.28.
L354: Fig 7: please insert basin names.
L357: I do not understand the categories: spatial near-bottom oxygen, near bottom oxygen and oxygen profiles. Maybe it’s only to adapt the nomenclature to Tab 3.
L375: please also discuss “all basins” in Fig 3.
L379: standard error decreases with increasing number of observations. This is by definition.
L384: please explain the boxes, the lines and the dots in Fig 8. Why do we have standard errors for each month? Why not one standard error for each month?
L396: “oxygen concentration” in the Arkona Bay?
L396: please give a motivation to use this simple multiplicative correction function here.
L408: please explain MARNET Function and MARNET Factor.
L433: which baseline?
L437: please discuss vertical resolution too.
L480: show the position of this single station.
L501: “other studies” give examples.
L514: “three different thresholds were used and averaged” – this appears indeed senseless.
L517: I would expect an overestimation of oxygen concentrations if the thickness of the model’s deepest layer is larger than 4 m.
L518: “daily resolution”: is this the time step of the model or the resolution of the results which are analysed?
L522: Dias et al: reference missing.
L532: also discuss the possibility to use a retrospective adaption for projections.
L539: the transparency is discussed above in the opposite direction.
L541: in the frame of OSPAR work such exercises were done too.
L569: “improved the representation of the oxygen dynamics”.
L576: discuss also the possibility of data assimilation.
L598ff: please help the reader to recognise each reference (line distance or indent).
Citation: https://doi.org/10.5194/bg-2023-152-RC1
-
AC3: 'Reply on RC1', Sarah Piehl, 11 Jan 2024
We would like to thank the reviewer for the careful review of our manuscript. The review made us realize that the manuscript in its current form lacks a clear presentation of the background, rationale and objectives, as all three reviewers wondered why we did not consider improving the model itself. A critical evaluation and presentation of the suitability of the model was not the aim of this study. We use the model in our study to provide a solution to a practical problem. For the Baltic Sea, the MOM-ERGOM modelling system has been calibrated for decades on mass balances and has been validated for key eutrophication parameters (e.g., Eilola et al. 2011, Meier et al. 2018), so that it runs stably over the long term in the entire Baltic Sea, with a focus on the western Baltic Sea (Friedland et al. 2012, Schernewski et al. 2015). Reliable long-term methods are also required for oxygen assessments within the framework of HELCOM.
The background of our study is that current oxygen assessments in the western Baltic Sea are based on sparse, heterogeneous observational data. These data are further averaged spatially and temporally, which leads to highly unreliable assessments of the oxygen situation (e.g., BLANO 2018). Models have already been used to address several water quality management issues in the Baltic Sea (e.g., HELCOM 2013, Schernewski et al. 2015, Meier et al. 2018, Friedland et al. 2021) but are not yet an integral part of assessments, although they are already suitable for a combined approach that brings together observational and modelling data. The Baltic Sea is ideal for testing such an approach because, as explained above, the framework conditions of the model system are in place.
Based on the reviewer's summary, the aim of our study does not seem to have been clear, so we will adapt the manuscript to clarify that we are not aiming to support environmental measures, but to supplement environmental monitoring data with model data in order to improve environmental assessments. Here we tested two approaches to combine both data types (using the correction function or the correction factor). We see that, given the background of the study, the comparison of the two spatial model resolutions is actually a side aspect and misleading. We therefore suggest shortening it and moving parts to the supplementary information.
We appreciate the comments on the technical issues and will address them accordingly. The reviewer's impression that contradictory measurements were used is due to improper wording in the manuscript. The number of measurements at a station plays a major role in the spatial comparison (where an average over a longer period is evaluated): individual high or low values carry a lot of weight at stations with few data, whereas they tend to average out at stations with many data. However, this does not affect the use of the data for the correction function (no averaging of oxygen values), as we performed data quality and plausibility checks, which we will emphasize more clearly in the manuscript.
We will also revise the introduction and discussion sections to better clarify our main messages: first, that a more reliable oxygen assessment is possible if observational data are combined with modelling data providing high-resolution information; second, that practical applications rely on observational data, so modelling results need to be adjusted accordingly; and third, that in order to apply an integrated approach of observational and modelling data in the future, the monitoring network needs to be reconsidered to meet the requirements of such an approach.
References:
BLANO 2018: https://mitglieder.meeresschutz.info/files/meeresschutz/berichte/art8910/zyklus18/Nationale_Indikatoren_Ostsee_Anlage_1.pdf
Eilola et al. 2011: https://doi.org/10.1016/j.jmarsys.2011.05.004
Friedland et al. 2021: https://doi.org/10.3389/fmars.2021.596126
Friedland et al. 2012: https://doi.org/10.1016/j.jmarsys.2012.08.002
HELCOM 2013: https://helcom.fi/wp-content/uploads/2019/08/Eutorophication-targets_BSEP133.pdf
Meier et al. 2018: https://doi.org/10.3389/fmars.2018.00440
Schernewski et al. 2015: https://doi.org/10.1016/j.marpol.2014.09.002
Answers to detailed remarks:
We thank the reviewer for the detailed remarks which will improve the manuscript and we will accept them accordingly.
Below are more detailed answers to specific questions/remarks:
L32: Rosenberg: reference missing.
We assume that the reviewer is referring to “Diaz and Rosenberg, 2008”, which is included in the reference list.
L105: Please insert a chapter on data sets used.
As suggested, we will add a chapter on the data sets used.
L106: as oxygen dynamics are the focus, the model description should contain the description of the benthic and sediment model system.
The recent version of the biogeochemical model ERGOM is described in Neumann et al. (2022, https://gmd.copernicus.org/articles/15/8473/2022/). The oxygen dynamics were analyzed in depth by Börgel et al. (2023, https://www.frontiersin.org/articles/10.3389/fmars.2023.1174039/full). Nevertheless, we agree with the reviewer and a more detailed description of the oxygen-relevant processes will be added.
L158: no profiles in Fig. 1.
We agree with the reviewer and we will change Fig. 1 accordingly. The profiles are shown as white dots in the figure. For clarity, a proper description will be added.
L177: wrong definition, if P(bar) and O(bar) represent the means.
We thank the reviewer for discovering this error and will address it accordingly.
L197ff: please write in more detail. I do not know if a distance between observation and corrected simulation was taken for a simulation cell/time nearest to the observation.
We agree with the reviewer and suggest including an improved description of the correction function in the revised manuscript.
L220: please explain why you switch to a multiplicative correction.
See answer on comment to L396 below.
L235: AE>0 does not fit to the corresponding profiles.
The statement refers to the near-bottom oxygen concentrations, which will be made clear in the revised version.
L247: are the r values for the Kiel Bay correct?
The values have been checked and are correct. One has to consider that here again only the near-bottom oxygen concentrations are analyzed (Figure 2).
L267: much stronger model scatter for TF0012.
We agree with the reviewer and will delete the sentence.
L269: TF0010 is here used for validation. Fig 1 shows this station with a white dot (for model correction).
This is a misunderstanding. The description should refer to the next closest red dot. The figure will be improved according to the reviewer comments.
L269: Tab 2: please explain which data is compared here. Monthly means? Same location/time?
The multi-annual mean near-bottom oxygen concentrations (2010-2019) at the same location and time. Information will be added.
L273: Fig 4 caption: where are the observations (black)?
Representation of observations will be adjusted in the figure.
L280: also negative values appear.
There are no negative values in either Table 1 or Table 2 considering the RMSE. Is something else meant?
L316: this is a very interesting section. Please introduce these experiments in more detail: How are the different components (distance to shore, ..) calculated? Did you set in (5) ai,bi,ci = 0 for calculating the distance to shore component only?
We agree with the reviewer and suggest enhancing the description of the results in the revised manuscript. The reviewer is right: here we computed the change in model skill when using only one (instead of all) of the predictor variables from eq. (5), which means that the coefficients ai, bi, ci, di corresponding to the unused predictor variables were set to 0.
L320: you use different correlation functions. Here it is R2 . In Tab 2 you use r (Pearson’s correlation coefficient), and in Tab 3 the Spearman Rank correlation. Please explain your choice.
We thank the reviewer for pointing out the differences. Indeed, Tab 3 should show the Pearson’s correlation coefficient, which will be corrected. We will replace the coefficient of determination with Pearson’s r accordingly.
L348: I cannot see an improvement of 0.28.
We thank the reviewer for pointing this out. In fact, Table 3 has not been updated in the latest version, and the difference in the Pomeranian Bay of 0.28 can be seen when the Pearson's correlation coefficient is used.
L357: I do not understand the categories: spatial near-bottom oxygen, near bottom oxygen and oxygen profiles. Maybe it’s only to adapt the nomenclature to Tab 3.
Table 3 refers to the data used for correction, whereas here the three analyzed aspects of model performance are meant. We will improve the nomenclature accordingly.
L384: please explain the boxes, the lines and the dots in Fig 8. Why do we have standard errors for each month? Why not one standard error for each month?
The reviewer is right. It should be "Average monthly standard error for the years 2010-2019" in the figure legend. Figure 8 will be revised accordingly.
L396: “oxygen concentration” in the Arkona Bay?
For the MARNET station. The information will be added accordingly.
L396: please give a motivation to use this simple multiplicative correction function here.
The reviewer is right. A simple correction factor is easier to understand and more transparent for users. This is why we included this second approach. We will add it accordingly.
L433: which baseline?
Territorial sea baseline. Will be changed accordingly.
L514: “three different thresholds were used and averaged” – this appears indeed senseless.
We agree with the reviewer, but this is the current approach by HELCOM. The reason for the HELCOM approach is the aggregation to EQR values at the end. As some years and areas have no oxygen deficiency situation with conditions <2 mg/l, it was decided to use and aggregate the different threshold values discussed so far (2, 4 and 6 mg/l), as EQRs are difficult to calculate from zero values.
L517: I would expect an overestimation of oxygen concentrations if the thickness of the model’s deepest layer is larger than 4 m.
Our model system has vertical layer thicknesses of not more than 2 m in the western Baltic Sea. To a large extent, the bottom-most cells are even thinner, as we work with only partly filled bottom cells to allow better inflow dynamics.
L518: “daily resolution”: is this the time step of the model or the resolution of the results which are analysed?
The analysis time step. The time step of the model is 10 minutes. The sentence will be changed to make this clearer.
L522: Dias et al: reference missing.
Indeed there is a typo; it should be Diaz et al. 2011, for which the reference is provided.
L539: the transparency is discussed above in the opposite direction.
We thank the reviewer for pointing this out and will clarify that we mean the approach utilizing the simple correction factor.
L541: in the frame of OSPAR work such exercises were done too.
We thank the reviewer for pointing out the OSPAR work and will add the reference.
L576: discuss also the possibility of data assimilation.
We see that data assimilation for forecasting products is another interesting use case. But for the conclusion we think this would open up a new topic which is beyond the focus of the manuscript, and we therefore do not want to include it.
Citation: https://doi.org/10.5194/bg-2023-152-AC3
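The reasoning given in the answer on L514 (aggregate over the 2, 4 and 6 mg/l thresholds so that years without conditions below 2 mg/l do not produce a zero value) can be illustrated with a minimal sketch. The metric used here, the number of days below each threshold averaged over the three thresholds, is an assumption made for illustration; the actual HELCOM indicator and its conversion to EQR values are not reproduced, and the function names and the daily series are hypothetical.

```python
# Threshold values named in the reply, in mg/l.
THRESHOLDS_MG_L = (2.0, 4.0, 6.0)


def days_below(daily_oxygen, threshold):
    """Count the days on which near-bottom oxygen is below the threshold."""
    return sum(1 for value in daily_oxygen if value < threshold)


def aggregated_deficiency_days(daily_oxygen, thresholds=THRESHOLDS_MG_L):
    """Return the per-threshold day counts and their mean over all thresholds."""
    counts = {t: days_below(daily_oxygen, t) for t in thresholds}
    mean_days = sum(counts.values()) / len(counts)
    return counts, mean_days


# Synthetic daily near-bottom oxygen (mg/l) that never drops below 2 mg/l.
series = [8.0, 7.0, 6.5, 5.5, 4.5, 3.5, 3.0, 4.2, 5.8, 7.1]

counts, mean_days = aggregated_deficiency_days(series)
print(counts)      # day counts per threshold
print(mean_days)   # aggregated value, non-zero although the 2 mg/l count is zero
```

With the 2 mg/l threshold alone this series would yield zero days; the aggregate over the three thresholds stays above zero and can therefore still be ranked or normalised.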
-
RC2: 'Comment on bg-2023-152', Anonymous Referee #2, 06 Dec 2023
In their manuscript, Piehl et al. compare the results of a coupled circulation-biogeochemical model of the Baltic Sea with observations to assess the ability of the model to represent oxygen variability, in particular hypoxic/low oxygen conditions, in the southwestern Baltic Sea. The goal is to use the model in environmental assessments of the Baltic Sea. An observation-based model correction is tested. The authors find that both increased horizontal resolution and applied correction can improve the model performance. They conclude that despite some mismatch with observations the model is appropriate for use in oxygen deficiency assessments.
The title of the manuscript is a bit misleading. The study is limited to model/observations comparison and lacks scientific insights that could be valuable to develop the oxygen deficiency assessment method of the western Baltic Sea. The correction method is an interesting exercise to see where the model fails but should be used to correct the model rather than its results. If the model overestimates oxygen depletion in some regions then oxygen consumption may be overestimated and should be corrected. With the assessment method in mind, the authors should propose metrics (with examples), based on model simulations, that would be useful additions to the current observations used in the environmental assessments of the Baltic Sea. In its current form, the manuscript is more like a technical report and therefore of somewhat limited interest to the broader Biogeosciences community.
Specific comments:
L98-104: There seems to be a disconnect between the proposed work (assess/improve model agreement with observations) and the objective of the study (provide a high resolution monitoring system)
Figure 2: Is this a multi annual summer average?
L270-271: This is a bit optimistic, Fig 4 shows a large discrepancy in simulated subsurface oxygen
Section 3.2: Here you correct your model with a set of data and then assess the corrected model with the same set of data. A more robust assessment of the model skill would be to correct the model with half of the dataset and then compare the corrected model with the second half of the observations.
L365-368: Figure 3b indicates an underestimation of near-bottom oxygen in the Bay of Mecklenburg, which seems to contradict this statement
L412-414: The case study was mainly an assessment of the model skill and not the development of a method for a high resolution monitoring and assessment system. To reach this goal you first need to assess the model and then use its results to develop metrics that are robust and useful additions to the assessment
L421-422: Of course, this is controlled by air-sea gas exchange
L498-499: This is somewhat contradictory to what was said before, the model has some limitations
L532-535: The model assessment could be used to understand the source of the discrepancies and ultimately to improve the model
L552: Computational power remains an issue, especially for highly resolved models
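The split-sample test suggested in the comment on Section 3.2 can be sketched in a few lines. This is only an illustration under stated assumptions: a simple linear correction (observation ≈ a + b · model) stands in for the manuscript's correction functions, which are not reproduced here, and all function names and oxygen values are hypothetical.

```python
import math
import random


def fit_linear(x, y):
    """Ordinary least-squares fit y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rmse(pred, obs):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def split_sample_check(model, obs, seed=0):
    """Fit the correction on one half of the pairs, score it on the other half."""
    idx = list(range(len(model)))
    random.Random(seed).shuffle(idx)      # reproducible random split
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    a, b = fit_linear([model[i] for i in train], [obs[i] for i in train])
    test_obs = [obs[i] for i in test]
    raw = rmse([model[i] for i in test], test_obs)
    corrected = rmse([a + b * model[i] for i in test], test_obs)
    return raw, corrected


# Synthetic, hypothetical oxygen values (mg/L): observations follow an exact
# linear relation to the model, so the held-out correction should be near-perfect.
model_o2 = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
obs_o2 = [0.8 * m + 1.0 for m in model_o2]
raw_rmse, corrected_rmse = split_sample_check(model_o2, obs_o2)
```

With real data the held-out RMSE of the corrected values would not drop to zero. The point of the split is that any improvement is measured on observations the correction has not seen.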
Minor comments/typos:
L46: Here and elsewhere: "monitoring data" is not the proper terminology, please rephrase
L47: remove monitoring
L49: the quality and thus the reliability...
L51: Nevertheless, data acquisition is limited...
L88: models provide an approximation of in-situ conditions and will never perfectly align with observations.
This sentence is redundant with the previous one, though.
L96: However, due to the coarse horizontal resolution of the model (~5km), nearshore areas...
L98: resolution (~2km) to investigate...
L126: km more appropriate than nautical miles?
vertically resolved z-layers ... from 0.5 at the surface to 2m at the bottom in the deepest areas.
L128: Figure 1 does not mention the model. Is the red square on the left panel the limit of the model grid, and does the left panel show the model bathymetry? Or does the model grid cover the entire Baltic Sea?
L135: what is the time unit?
Equation 3: The formula is the mean square error, please correct. Do you mean the Mean Absolute Error (MAE)? Why not use the bias?
Figure 3:
- Can you mention in the caption to refer to Fig 1 for the station locations?
- Can you move the xtick labels leftward to align with the vertical line?
Figure 7: Although the color scale is the same, the colors of the observation points differ from Figure 2. Can you explain? The captions are the same.
L558: Can you add this reference in the References list?
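The distinction raised in the comment on Equation 3 can be made concrete with a short sketch. It uses the standard textbook definitions of bias, MAE, MSE and RMSE, not the manuscript's own equation, and the function name and oxygen values are hypothetical.

```python
import math


def error_metrics(model, obs):
    """Return bias, MAE, MSE and RMSE of model values against observations."""
    if len(model) != len(obs) or not model:
        raise ValueError("model and obs must be non-empty and equally long")
    diffs = [m - o for m, o in zip(model, obs)]   # signed model-minus-observation
    n = len(diffs)
    bias = sum(diffs) / n                  # keeps the sign; errors can cancel
    mae = sum(abs(d) for d in diffs) / n   # mean absolute error
    mse = sum(d * d for d in diffs) / n    # mean square error
    return {"bias": bias, "mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


# Hypothetical oxygen concentrations in mg/L, for illustration only.
model_o2 = [6.0, 4.0, 2.0, 8.0]
obs_o2 = [5.0, 5.0, 3.0, 7.0]
metrics = error_metrics(model_o2, obs_o2)
```

In this example the over- and underestimations cancel, so the bias is zero while MAE, MSE and RMSE are all one. This is why the bias is usually reported alongside an absolute or squared error metric rather than in place of it.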
Citation: https://doi.org/10.5194/bg-2023-152-RC2
AC2: 'Reply on RC2', Sarah Piehl, 11 Jan 2024
We would like to thank the reviewer for the careful review of our manuscript. It made us realize that the manuscript in its current form lacks a clear presentation of the background, rationale and objectives, as all three reviewers wondered why we did not consider improving the model itself. A critical evaluation and presentation of the suitability of the model was not the aim of this study. We use the model in our study to provide a solution to a practical problem. For the Baltic Sea, the MOM-ERGOM modelling system has been calibrated for decades on mass balances and validated for key eutrophication parameters (e.g., Eilola et al. 2011, Meier et al. 2018), so that it runs stably over the long term in the entire Baltic Sea, with a focus on the western Baltic Sea (Friedland et al. 2012, Schernewski et al. 2015). Reliable long-term methods are also required for oxygen assessments within the framework of HELCOM.
The background of our study is that current oxygen assessments in the western Baltic Sea are based on sparse, heterogeneous observational data. These data are further averaged spatially and temporally, which leads to highly unreliable assessments of the oxygen situation (e.g., BLANO 2018). Models have already been used to address several water quality management issues in the Baltic Sea (e.g., HELCOM 2013, Schernewski et al. 2015, Meier et al. 2018, Friedland et al. 2021), but are not yet an integral part of assessments, although they are already suitable for a combined approach that brings together observational and modelling data. The Baltic Sea is ideal for testing such an approach because, as explained above, the framework conditions of the model system are in place.
We agree with the reviewer that the title of the manuscript and the use of the terms assessment and monitoring are sometimes misleading, and we will address this accordingly. Further, the aim of our study does not seem to be clear, so we will adapt the manuscript to clarify that we aim to supplement environmental monitoring data with model data. A high-resolution monitoring product is needed as a basis for more reliable and meaningful oxygen assessments. Scientific insight into what an oxygen deficiency metric could look like was already provided in a previous study (Piehl et al. 2023) and is thus not the focus here.
We see that, against the background of the study, the comparison of the two spatial model resolutions is a side aspect that leads to the article being perceived as a technical report. We therefore suggest shortening this aspect and focusing more on the two approaches to combining observations with model data. We see the added value of our manuscript for the BGS community in showing a way to generate a high-resolution, reliable data basis for policy-relevant oxygen assessments by combining model results and observational data.
We will also revise the introduction and discussion sections to better clarify our main messages: first, that a more reliable oxygen assessment is possible if observational data are combined with modelling data to provide high-resolution information; second, that practical applications rely on observational data, so modelling results need to be adjusted accordingly; and third, that applying an integrated approach of observational and modelling data in the future requires the monitoring network to be reconsidered so that it meets the requirements of such an approach.
References:
BLANO 2018: https://mitglieder.meeresschutz.info/files/meeresschutz/berichte/art8910/zyklus18/Nationale_Indikatoren_Ostsee_Anlage_1.pdf
Eilola et al. 2011: https://doi.org/10.1016/j.jmarsys.2011.05.004
Friedland et al. 2021: https://doi.org/10.3389/fmars.2021.596126
Friedland et al. 2012: https://doi.org/10.1016/j.jmarsys.2012.08.002
HELCOM 2013: https://helcom.fi/wp-content/uploads/2019/08/Eutorophication-targets_BSEP133.pdf
Meier et al. 2018: https://doi.org/10.3389/fmars.2018.00440
Piehl et al. 2023: https://doi.org/10.1007/s10666-022-09866-x
Schernewski et al. 2015: https://doi.org/10.1016/j.marpol.2014.09.002
Answers to specific comments:
We thank the reviewer for the specific comments/suggestions, which will improve the manuscript.
L98-104: There seems to be a disconnect between the proposed work (assess/improve model agreement with observations) and the objective of the study (provide a high resolution monitoring system)
The reviewer is right that we cannot call it a monitoring system, and we will change this accordingly.
Figure 2: Is this a multi annual summer average?
No, this is the multi-annual average from 2010-2019 including data from all months. For this spatial comparison it was necessary to include a long time period and all months in order to reach this density of observational data points.
L270-271: This is a bit optimistic, Fig 4 shows a large discrepancy in simulated subsurface oxygen
The statement is based not only on Fig 4 but also on all other profiles in Fig S2 and on Pearson’s correlation coefficient. But the reviewer is right that, especially in the lower, near-bottom water layers, the discrepancy is large for some stations in some months. This will be addressed accordingly.
Section 3.2: Here you correct your model with a set of data and then assess the corrected model with the same set of data. A more robust assessment of the model skill would be to correct the model with half of the dataset and then compare the corrected model with the second half of the observations.
This was partly done already, e.g. by computing the correction function from the near-bottom observations and then checking the model skill for the whole water column. But we can adjust the posterior correction of the modelled oxygen concentrations by separating the observational dataset into two parts: one will be used to compute the correction function and the other to compute the skill of the corrected oxygen concentrations.
L365-368: Figure 3b indicates an underestimation of near-bottom oxygen in the Bay of Mecklenburg, which seems to contradict this statement
The reviewer is right, but this discrepancy arises because the spatial comparison in Table 3 used a larger number of observations (dots in Fig 2/7) to calculate the error metrics. In Table 3 only three stations for each basin are shown. We hope to clarify those misunderstandings by adding a short paragraph on “data sets used” (as suggested by reviewer #1) in the Materials and Methods section.
L412-414: The case study was mainly an assessment of the model skill and not the development of a method for a high resolution monitoring and assessment system. To reach this goal you first need to assess the model and then use its results to develop metrics that are robust and useful additions to the assessment
The reviewer is correct and, as mentioned in the general response above, we will address the issue in the manuscript and ensure that terms/sentences are changed or deleted accordingly.
L421-422: Of course, this is controlled by air-sea gas exchange
Will be re-phrased accordingly.
L498-499: This is somewhat contradictory to what was said before, the model has some limitations
Will be re-phrased accordingly.
L532-535: The model assessment could be used to understand the source of the discrepancies and ultimately to improve the model
We hope that our general answer above has clarified our aims and why we have not pursued this idea further.
L552: Computational power remains an issue, especially for highly resolved models
We agree that our statement needs to be more nuanced and will change it accordingly.
Answers to minor comments:
We thank the reviewer for the minor comments/suggestions which will improve the manuscript and we will accept them accordingly.
Below are further answers to specific questions:
L126: km more appropriate than nautical miles?
The corresponding kilometers will be added.
vertically resolved z-layers ... from 0.5 at the surface to 2m at the bottom in the deepest areas.
Will be changed accordingly.
L128: Figure 1 does not mention the model. Is the red square on the left panel the limit of the model grid, and does the left panel show the model bathymetry? Or does the model grid cover the entire Baltic Sea?
The reviewer is right and we will change the sentence and Figure 1 accordingly. This is the official bathymetry of the Baltic Sea as provided by the IOW. Apart from minor adjustments, we also use this for the models. The model grid covers the entire Baltic Sea.
L135: what is the time unit?
Will be changed accordingly to “kg P/km2/a”.
Equation 3: The formula is the mean square error, please correct. Do you mean the Mean Absolute Error (MAE)? Why not use the bias?
We thank the reviewer for pointing out this mistake. Indeed, it should be the bias, and this will be corrected accordingly.
Figure 7: Although the color scale is the same, the colors of the observation points differ from Figure 2. Can you explain? The captions are the same.
We thank the reviewer for pointing out this mistake. Will be addressed accordingly.
Citation: https://doi.org/10.5194/bg-2023-152-AC2
RC3: 'Comment on bg-2023-152', Anonymous Referee #3, 07 Dec 2023
I think this analysis was a rigorous test of model performance in a region of the Baltic Sea. The motivation for the effort was also worthy in that we do need to improve model simulations in the shallower, perhaps more dynamic regions of the coastal zone. Where I think the analysis fell short was in the way the model was evaluated. There are two challenges I have for the authors:
(1) Why were the data aggregated into seasonal means for a long time period and compared to similarly aggregated model means? I can understand that this is one reasonable test of model behavior (getting the mean seasonal cycle correct), but for these models to be useful in scenarios of climate change and nutrient abatement, they need to do better than a seasonal climatology. It is also possible that aggregating data into a climatology might obscure how well the model captures particular events (e.g., wet years, heat waves). I am not sure if I wish for the authors to return to the output and data and do a more time-specific comparison, but it would certainly provide another useful test of the model.
(2) The approach for model correction described in this paper is not incorrect, but it is insufficient. To try and post-correct the model with fixed parameters (e.g., distance from shore) seems counterintuitive; surely the more likely reasons for model-data mismatch are an inadequacy in the process rates, which are time and space varying. I think the paper would have been much more effective if the authors did sensitivity tests on some key rate processes (primary production, background diffusivity) to see if that helped improve model performance. This would be an avenue by which the authors might see more meaningful improvements in RMSE and other error metrics.
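The concern in challenge (1), that a climatology can hide event-scale mismatch, can be illustrated with a toy example. The values are hypothetical and are not taken from the manuscript. The model places a low-oxygen July in the wrong year, which leaves the monthly climatology unchanged but shows up in a time-specific comparison.

```python
import math


def monthly_climatology(series):
    """Average values by calendar month; series holds (year, month, value)."""
    sums, counts = {}, {}
    for _, month, value in series:
        sums[month] = sums.get(month, 0.0) + value
        counts[month] = counts.get(month, 0) + 1
    return {month: sums[month] / counts[month] for month in sums}


def rmse(pred, obs):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


# Hypothetical near-bottom oxygen (mg/L): the observed low-oxygen event is in
# July 2018, while the model produces it in July 2019.
obs = [(2018, 7, 2.0), (2018, 8, 6.0), (2019, 7, 6.0), (2019, 8, 6.0)]
model = [(2018, 7, 6.0), (2018, 8, 6.0), (2019, 7, 2.0), (2019, 8, 6.0)]

clim_obs = monthly_climatology(obs)
clim_model = monthly_climatology(model)
months = sorted(clim_obs)

# Comparison after aggregation: both July means are 4.0, so the error vanishes.
clim_rmse = rmse([clim_model[m] for m in months], [clim_obs[m] for m in months])

# Time-specific comparison: the mistimed event produces a clear error.
paired_rmse = rmse([v for _, _, v in model], [v for _, _, v in obs])
```

Here the climatological RMSE is zero while the time-specific RMSE is about 2.8 mg/L, so agreement on the mean seasonal cycle says little about whether individual events are captured.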
Below are more specific comments:
(1) Line 99: Is it better to call this "disagreement"? Error implies one is wrong, and the observations, as you correctly point out, have limitations.
(2) Line 110: "wel" should be "well"
(3) Line 118: Just write "as described by Kuznetsov and Neumann (2013)" so you don't need to write the names twice.
(4) Line 121: I am not sure what this means. Do you just mean that the main currency of metabolism is oxygen or carbon, and it is translated to N and P pools through stoichiometry?
(5) Line 126: make "miles" singular
(6) Line 148: should be "emphasizes"
(7) Line 148: Delete the comma after "Both"
(8) Line 158: Do you mean station with vertical oxygen profiles? There is no data in Fig 1
(9) Line 159: Can you offer a little more explanation here? Do I understand that near-bottom values were estimated from near-surface? Presumably, there is uncertainty with this that should be reported here.
(10) line 166: My guess is that you don't really care that the models reproduce a monthly oxygen climatology, but rather you want the models to capture other forms of variability. So why the aggregation here? This is related to one of my main general comments.
(11) Line 198: polynomials?
(12) Line 212: Can you specify what the temporal resolution is here?
(13) Lines 234-243: This writing seems to overlook the fact that the inner part of the Bay of Mecklenburg is an area where the model substantially overestimates DO. This should be addressed.
(14) Figure 3: There does seem to be a case where the models keep oxygen low well into November (Panels a, b, c, d) , but the observations don't show this, why might that be? This is a fairly season-specific problem that could be discussed in detail from a mechanistic standpoint.
(15) Line 310: I think you could more clearly point out that RMSE is more often > 2 mg/L for these comparisons. This is rather high, and worth highlighting.
(16) Figure 6 and related text: I would argue that the relative differences in RMSE between variables (purple dots) are pretty small (0.02 to 0.2) relative to the overall RMSE (~2). This would tell me that there are other factors that are more important in driving the discrepancies, likely biogeochemical or hydrodynamic processes in the model. This is not really discussed at all in the paper and is worth more discussion.
(17) Figure 7: It would probably be more effective to show these as "differences from corrected" and forget the observations. This would make it clearer to see where the improvements occurred. It seems that only the deep Arkona basin was really improved by the model.
(18) Line 386: should be "analysis"
(19) Line 339: This sentence is worded really oddly. How about "Since model skill was not equally good...."
Citation: https://doi.org/10.5194/bg-2023-152-RC3
AC1: 'Reply on RC3', Sarah Piehl, 11 Jan 2024
We would like to thank the reviewer for the careful review of our manuscript. It made us realize that the manuscript in its current form lacks a clear presentation of the background, rationale and objectives, as all three reviewers wondered why we did not consider improving the model itself. A critical evaluation and presentation of the suitability of the model was not the aim of this study. We use the model in our study to provide a solution to a practical problem. For the Baltic Sea, the MOM-ERGOM modelling system has been calibrated for decades on mass balances and validated for key eutrophication parameters (e.g., Eilola et al. 2011, Meier et al. 2018), so that it runs stably over the long term in the entire Baltic Sea, with a focus on the western Baltic Sea (Friedland et al. 2012, Schernewski et al. 2015). Reliable long-term methods are also required for oxygen assessments within the framework of HELCOM.
The background of our study is that current oxygen assessments in the western Baltic Sea are based on sparse, heterogeneous observational data. These data are further averaged spatially and temporally, which leads to highly unreliable assessments of the oxygen situation (e.g., BLANO 2018). Models have already been used to address several water quality management issues in the Baltic Sea (e.g., HELCOM 2013, Schernewski et al. 2015, Meier et al. 2018, Friedland et al. 2021), but are not yet an integral part of assessments, although they are already suitable for a combined approach that brings together observational and modelling data. The Baltic Sea is ideal for testing such an approach because, as explained above, the framework conditions of the model system are in place.
Further, the aim of our study does not seem to be clear, so we will adapt the manuscript to clarify that we aim to supplement environmental monitoring data with model data. A high-resolution monitoring product is needed as a basis for more reliable and meaningful oxygen assessments.
We chose 10 years for our study in order to have an appropriate sample size for each month, given that the observations generally have low data coverage. The data aggregation was carried out so that it corresponds to the current assessment approaches for oxygen deficiency. Scenario simulations and the capture of particular events were not the focus of this study. The latter in particular can become a challenge for oxygen assessments, which needs to be addressed in future studies.
We will also revise the introduction and discussion sections to better clarify our main messages: first, that a more reliable oxygen assessment is possible if observational data are combined with modelling data to provide high-resolution information; second, that practical applications rely on observational data, so modelling results need to be adjusted accordingly; and third, that applying an integrated approach of observational and modelling data in the future requires the monitoring network to be reconsidered so that it meets the requirements of such an approach.
References:
BLANO 2018: https://mitglieder.meeresschutz.info/files/meeresschutz/berichte/art8910/zyklus18/Nationale_Indikatoren_Ostsee_Anlage_1.pdf
Eilola et al. 2011: https://doi.org/10.1016/j.jmarsys.2011.05.004
Friedland et al. 2021: https://doi.org/10.3389/fmars.2021.596126
Friedland et al. 2012: https://doi.org/10.1016/j.jmarsys.2012.08.002
HELCOM 2013: https://helcom.fi/wp-content/uploads/2019/08/Eutorophication-targets_BSEP133.pdf
Meier et al. 2018: https://doi.org/10.3389/fmars.2018.00440
Piehl et al. 2023: https://doi.org/10.1007/s10666-022-09866-x
Schernewski et al. 2015: https://doi.org/10.1016/j.marpol.2014.09.002
Answers to more specific comments:
We thank the reviewer for the specific comments/suggestions which will improve the manuscript and we will accept them accordingly.
Below are further answers to specific questions:
(4) Line 121: I am not sure what this means. Do you just mean that the main currency of metabolism is oxygen or carbon, and it is translated to N and P pools through stoichiometry?
The main biogeochemical fluxes are related to C, N and P. All the processes that change them result in a production or consumption of oxygen, which is calculated from the physical and chemical equations and fluxes. A more detailed description will be added, also in response to the other reviewers' comments.
(8) Line 158: Do you mean station with vertical oxygen profiles? There is no data in Fig 1
Yes, will be addressed accordingly.
(9) Line 159: Can you offer a little more explanation here? Do I understand that near-bottom values were estimated from near-surface? Presumably, there is uncertainty with this that should be reported here.
We agree that the description is confusing and we will address it accordingly. As reviewer #1 recommended, we will insert a chapter on the data sets used.
(10) line 166: My guess is that you don't really care that the models reproduce a monthly oxygen climatology, but rather you want the models to capture other forms of variability. So why the aggregation here? This is related to one of my main general comments.
We chose 10 years for our study in order to have an appropriate sample size for each month, given that the observations generally have low data coverage. The data aggregation was carried out so that it corresponds to the current assessment approaches for oxygen deficiency (see also our answer to the general comment above).
(11) Line 198: polynomials?
Yes, will be changed accordingly.
(12) Line 212: Can you specify what the temporal resolution is here?
We will add that the MARNET buoy data are available in hourly resolution.
(13) Lines 234-243: This writing seems to overlook the fact that the inner part of the Bay of Mecklenburg is an area where the model substantially overestimates DO. This should be addressed.
We assume the reviewer means an underestimation of DO? This will be addressed accordingly.
(14) Figure 3: There does seem to be a case where the models keep oxygen low well into November (Panels a, b, c, d), but the observations don't show this. Why might that be? This is a fairly season-specific problem that could be discussed in detail from a mechanistic standpoint.
The comparison of the model results with salinity measurements indicates an overestimation of the stratification in the model, which could lead to vertical oxygen transport that is too weak in the late autumn and winter months. Will be discussed accordingly.
Citation: https://doi.org/10.5194/bg-2023-152-AC1