As already noted in my first review, I really appreciate the very thorough and exhaustive attempt to investigate the sensitivity of a biogeochemical model that will, at a later stage, be used for projections of climate change and other applied tasks. This paper could provide important information on global biogeochemical model sensitivity with respect to structural uncertainty, even for users not applying this particular model. However, for this purpose I so far find the discussion of results somewhat incomplete. I think discussing the outcome of these experiments against the background of earlier studies that also deal with global model uncertainty and performance would enhance the paper's impact and put this work into a broader context.
For example, I am not convinced that the results indicate a larger importance of structural sensitivity compared to uncertainties related to physics or parameters. In my impression, there is so far little evidence (with respect to the metrics applied) that the ensemble mean or median performs much better than the default model (see below, point B). I think this should be discussed more, e.g. against the background of the study by Kwiatkowski et al. (2014), who, when comparing six models (among them MEDUSA) against surface tracers such as DIN and Chl, showed that "No model is shown to consistently outperform all other models across all metrics." Likewise, Galbraith et al. (2016) and Kriest (2017) found that models of varying complexity performed quite similarly with respect to metrics such as RMSE. The importance of parametric uncertainty has been addressed in several studies, even on a global scale. There are also many studies that deal with the uncertainty due to physics (starting with the "classic" study by Najjar et al., 2007) and show a large impact of circulation or of phenomena at smaller (meso) scales.
Secondly, different equations, as applied in this study, necessarily result in different nutrient or food affinities of the plankton, as briefly discussed by the authors; therefore, to some extent the structural sensitivity also includes some parametric sensitivity. Again, this might be worth discussing (see below, point A, and the specific comments for p15).
I would thus recommend an overhaul of the discussion section, as already suggested by Reviewer #1, to give this paper a wider impact and "to address the big picture".
(A) Authors' response to my comment 2 on the fitting procedure for the different structural forms
"However, we are trying to capture the whole range of nutrient and phytoplankton at all the different region, [...]"
As shown in Figures 3, 9 and 14, the full range of observed and simulated nutrient concentrations is ~0-10 mmol/m3, and the range of Chl is ~0-1 (PAP, Cariaco), ~0-0.3 (ALOHA and BATS) and ~0-6 (L4), i.e. more than an order of magnitude lower than the range used for curve fitting. I suggest stating the range over which the functions were fitted in the paper (methods section).
Further, in their response the authors state that
"Suppose we are optimising the nutrient uptake on the similar range of station BATS and ALOHA (with maximum nitrogen and phytoplankton concentration of 5 mmol N m- 3, shown on Figure 3, although at stations like Cariaco, PAP, and L4, we may see nitrogen larger than 5 mmol N m-3), the functions still deviate at low nitrogen and phytoplankton concentration."
This is very important information, and I suggest mentioning it in the paper, together with a full description of how the fitting was carried out. (Was the range 0-100 discretized, and if so, into how many bins? Were the discrete values distributed evenly over the range?)
I still think this issue (fitting different functional forms, and the side effects on affinities and rates in different oceanic regimes) merits a more in-depth discussion and a more detailed presentation.
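To make this point concrete, consider the following minimal sketch (my own illustration with hypothetical parameter values and function names, not the authors' actual fitting code): a sigmoidal (Holling type III) grazing response is fitted to a hyperbolic (Holling type II) one by non-linear least squares, with the maximum rate held fixed, once over 0-100 and once over 0-5 mmol N m-3.

```python
# Minimal sketch (hypothetical parameters): how the fitting range affects the
# behaviour of the calibrated function at low prey concentrations.
import numpy as np
from scipy.optimize import curve_fit

g_max, k2 = 1.0, 0.5                 # assumed maximum grazing rate and half-saturation

def holling2(P, k):                  # hyperbolic response, g_max held fixed
    return g_max * P / (k + P)

def holling3(P, k):                  # sigmoidal response, g_max held fixed
    return g_max * P**2 / (k**2 + P**2)

for p_max in (100.0, 5.0):           # full fitting range vs. a narrower, more realistic one
    P = np.linspace(0.0, p_max, 200)
    k3, _ = curve_fit(holling3, P, holling2(P, k2), p0=[k2])
    P_low = 0.1                      # compare the two curves at a low prey concentration
    print(f"fit over 0-{p_max:g}: k3 = {k3[0]:.3f}, "
          f"G2({P_low}) = {holling2(P_low, k2):.3f}, G3({P_low}) = {holling3(P_low, k3[0]):.3f}")
```

The fitted half-saturation constant, and hence the grazing rate at low prey concentrations, depends on the range over which the curves are matched; this is exactly the kind of implicit parametric side effect I would like to see documented and discussed.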
(B) I find the analysis of model performance for the different structural types (ensemble mean or median vs. default) somewhat biased and incomplete: for example, Table 3 shows that, out of the 51 criteria listed there for r, RMSE and bias for DIN and Chl (profile, surface, integrated), the default model is the best model with respect to bias and also r, compared to the ensemble mean or median. It also outperforms the mean and median at BATS and, with all three criteria combined, is as good as the ensemble mean at ALOHA. Also, for RMSE some of the values do not seem to provide clear evidence that the ensemble mean or median is better: for example, at ALOHA the RMSE for DIN and Chl (profile or surface) is very similar among the different models.
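For transparency, the three criteria I refer to above can be written down explicitly; the following is a minimal sketch with made-up numbers (not data from the paper), only to make clear which quantities are compared between the default run and the ensemble mean/median:

```python
# Minimal sketch (synthetic numbers): correlation, RMSE and bias of the default
# model and of the ensemble mean/median against a set of observations.
import numpy as np

obs      = np.array([0.10, 0.25, 0.40, 0.80, 1.20])   # hypothetical observations
default  = np.array([0.12, 0.20, 0.45, 0.70, 1.30])   # hypothetical default-model output
ensemble = np.array([[0.15, 0.30, 0.35, 0.60, 1.10],  # hypothetical ensemble members
                     [0.05, 0.20, 0.50, 0.90, 1.40],
                     [0.20, 0.35, 0.30, 0.75, 1.00]])

def metrics(model, obs):
    r    = np.corrcoef(model, obs)[0, 1]               # correlation coefficient
    rmse = np.sqrt(np.mean((model - obs) ** 2))        # root-mean-square error
    bias = np.mean(model - obs)                        # mean bias
    return r, rmse, bias

for name, model in [("default", default),
                    ("ensemble mean", ensemble.mean(axis=0)),
                    ("ensemble median", np.median(ensemble, axis=0))]:
    print(name, metrics(model, obs))
```

A lower RMSE of the ensemble mean does not imply that it also has the smaller bias or the higher correlation, which is why I would like to see all three criteria weighed against each other in the discussion.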
Specific comments:
p1, line 6-7: "that describe the key biogeochemical processes" - I suggest rephrasing this as "that describe some key biogeochemical processes", as, on a larger scale, phytoplankton light affinity and the recycling and sinking of organic matter may also be regarded as key processes.
p1, line 13: "Changing mortality" - which mortality: both zooplankton and phytoplankton?
p1, line 14-15: "The RMSEs between in situ observations and the ensemble mean and median are mostly reduced compared to the default model output." - See above, point B. Why not address the bias or correlation coefficient here, and in the discussion?
p2, Introduction: This is probably just a minor point, but usually global models are NPZD models, with detritus (and/or DOP) being an important component in the recycling and vertical redistribution of nutrients.
p2, line 17ff: "Moreover, in order to investigate the effect of global climate change and anthropogenic activities in the ocean, marine biogeochemical models are now being embedded into earth system models." - Simple marine biogeochemical models were first embedded into GCMs in the 1990s, and into ESMs possibly around 2005 (e.g., UVic; Schmittner et al., 2005). Perhaps it would be better to rephrase this as "[...] activities in the ocean, MORE COMPLEX marine biogeochemical models are now being embedded into earth system models."
p3, line 14: "A few studies have investigated the effects of biogeochemical process formulations, e.g. Yool et al. (2011) has demonstrated [...]" - Perhaps better "A few studies have investigated the effects of biogeochemical process formulations. For example, Yool et al. (2011) have demonstrated [...]" ?
p4, section 2.1: I suggest giving the range of nutrient concentrations over which the functions were fitted, and some details of the fitting procedure, here (see A).
p5, line 5 "which are high quality food sources" - In the model? In reality? Is this of importance here, and if so, how is it embedded in the model?
p5, line 12: "These functions both become constant at a maximum grazing rate." - Perhaps better: "These functions approach a maximum grazing rate at high concentrations of prey."
p5, line 25: "as shown on Fig. 1(b)." - Fig 1(c)?
p5, line 27ff: "integrated over the range of prey density" - Give the range here (see p4, section 2.1).
p6, line 5: "From a previous 3-D MEDUSA run, in the oligotrophic regions show" - skip "in".
p6, line 9: "MEDUSA also contains both slow and fast detritus sinking factors." - What does this mean? Sinking speeds? Remineralisation length scales? Perhaps better "MEDUSA also parameterises slow and fast sinking detritus" ?
p6, line 14f: " We chose a lower sinking rate of 0.1 m day to prevent depletion of state variables particularly at the shallower stations." - Does this mean sinking organic matter is buried at the sea floor? If so, I would mention it here.
p7, line 18f: "Our 1D model uses these same 63 depth levels vertical resolution." - The same as the global (MEDUSA) model? Then, on p8, line 9: "The model is simulated at 37 depth levels, [...]" - I am a bit confused - How many levels did the 1D setup have - 63 or 37?
p7, line 18f: "we use the integrated nitrogen over 200 m (integrated nitrogen / depth)" - is this meant to be "dissolved inorganic nitrogen averaged over the upper 200 m"? I would suggest to refer to "averaged" when appropriate (also in some of the figure captions); Additionally, if nutrients are mean it should be "nitrate" or DIN or "dissolved inorganic nitrogen", because "nitrogen" alone can also include the organic forms.
p 9, line 25: "Chlorophyll and nitrogen profile 10 year means"? What does this mean? "Observed mean profiles of chlorophyll and DIN"?
p15, lines 15ff: "In order to maintain phenomenological similarity, these functions are calibrated using non-linear least squares, while keeping the maximum process rates fixed." - To me it seems as if the phenomenological similarity is maintained with respect to the rates integrated over the nutrient and chlorophyll range considered, i.e. 0-100 mmol/m3 or mg/m3. This could downweight the affinity at very low nutrient or chlorophyll concentrations (see above, A). Thus, phenomenological similarity is maintained with regard to a certain criterion. As stated by the authors in their reply, fitting the curves over a narrower range [0-5] results in almost the same curves, which would strengthen the above statement of phenomenological similarity. I suggest adding a few sentences about this here.
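As I read the method (this is my interpretation, not necessarily the authors' exact implementation), the calibration criterion is of the form

```latex
\min_{k} \sum_{i} \Big[ f_{\mathrm{default}}(N_i) \;-\; f_{\mathrm{alt}}(N_i;\, r_{\max},\, k) \Big]^2,
\qquad N_i \in [0, 100]\ \mathrm{mmol\,m^{-3}},
```

with the maximum rate r_max held fixed, so that the points at high concentrations dominate the sum and the behaviour near zero concentration carries comparatively little weight. Making this criterion explicit in the paper would clarify in which sense phenomenological similarity is maintained.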
p15, lines 19ff: "Applying structural sensitivity in the 1-D framework has also allowed a large parameter space of concurrent variations to be explored for several different oceanographic regions, and with minimal computational cost." - How is the parameter space explored, particularly when considering the above statement about phenomenological similarity?
p15, line 24: "have" -> has
p15, line 25: "This is expected as at low concentrations, using the G2 function would graze more phytoplankton, as shown on Fig. 1(b)." - Again, here the effect of the fitting procedure for the different functions shows up: perhaps adjusting the k's of the different functions such that they become similar at very low concentrations might have led to similar results at the oligotrophic stations BATS and ALOHA? Thus, the effect of different functional forms seems to be mingled with the effect of parametric uncertainty. Of course, this cannot be avoided entirely, but I think it could be discussed more.
p16, lines 19ff: "At most of the stations, the ensemble mean produced lower RMSE compared to the default run, suggesting that the structural ensemble with a wide range of predictions covering the in situ observations, is likely to produce a mean field closer to the observation, than a single-structure model." - I am still not fully convinced about this - see my above comment B.
p18, line 8: "widely used in the community" - In which community?
References:
Galbraith et al. (2016), Complex functionality with minimal computation: Promise and pitfalls of reduced-tracer ocean biogeochemistry models. Journal of Advances in Modeling Earth Systems, doi:10.1002/2015MS000463
Kriest (2017), Calibration of a simple and a complex model of global marine biogeochemistry, Biogeosciences, doi:10.5194/bg-14-4965-2017
Najjar et al. (2007), Impact of circulation on export production, dissolved organic matter, and dissolved oxygen in the ocean: Results from Phase II of the Ocean Carbon-cycle Model Intercomparison Project (OCMIP-2), Global Biogeochemical Cycles, doi:10.1029/2006GB002857
Schmittner et al. (2005), A global model of the marine ecosystem for long-term simulations: Sensitivity to ocean mixing, buoyancy forcing, particle sinking, and dissolved organic matter cycling, Global Biogeochemical Cycles, doi:10.1029/2004GB002283 |