Review of "Modelling polar marine ecosystem functions guided by bacterial physiological and taxonomic traits" by H. H. Kim et al.
For the revised ms the authors expanded their model from 0D to 1D (3 layers). The ms has greatly improved, mostly through the inclusion of the model equations (although these are apparently incomplete, see below), whose omission had made it impossible for me to understand the model structure in the previous round. The parameter estimation has much improved and its description is now adequate. But even after the long time it has taken the authors to prepare the revised ms, it still leaves a strong impression of sloppiness. Several sentences are simply incomprehensible, and little attention seems to have been paid to the readability, correctness, and design of some of the figures. Only some of the changes to the previous ms are highlighted. The model description in the main text is still very unclear, and this also applies to the concept of the bacterial modes. Nevertheless, having seen the equations, I find the study to be much better than I had feared based on the original ms. After another major revision or two, I now think it could become a useful contribution.
One of the remaining problems is the confusion of assumptions and results. The authors mention in the response letter (introductory para, point 4) that the "larger cell-specific BP and SDOC uptake rates of HNA cells than those of LNA cells" indicate the robustness of their analysis. This finding is also referred to in the results and discussion sections (lines 276, 384–385). But this is a model assumption, not a result: "maximum bacterial growth rate of the HNA group (μHNA, d-1) was ensured to be optimized to be higher than that of the LNA group (μLNA, d-1)" (lines 168–169).
The authors have now clarified that their model considers flexible (Chl:)C:N:P stoichiometry, but this is mentioned only in the equations and the supplement. This information must be provided in the main text, e.g., under Sections 2.1 or 2.2 or a new 2.x section, as this information is quite crucial for understanding the model design. The statement that the model has 12 state variables (line 81) is simply wrong (I counted 32). This misinformation had led me to conclude that the model was based on a fixed-stoichiometry approach in my previous review. Fig. 1 has been amended regarding the flows of inorganic nutrients to phytoplankton. But it still remains a source of confusion. Fig. 1 shows two compartments, "Higher level" and "RDOM", which do not have corresponding differential equations, so the authors should either add the missing equations or modify Fig. 1 to clarify what these are (this applies also to Fig. 5).
The sentence "Total (bulk) bacterial production (BP; BP = BPHNA + BPLNA) was constrained by observations, and therefore, the group-specific production (BPHNA and BPLNA, mmol C m-3 d-1) was determined during optimization:" (lines 99–100) is unclear. Does this mean that Eqs. (3) and (4) apply only during the optimization? How do you calculate BP_HNA and BP_LNA when not optimizing?
On line 104, you state that "The modelling framework consisted of a dynamic (mechanistic) part and a data-driven part (Figure 2)" but Fig. 2 is about the data assimilation scheme and does not show or mention dynamic and data-driven parts.
On lines 109–110, you introduce fmodes as functional modes, but even after reading the whole ms several times, it remains unclear what these are, e.g., which functions the fmodes describe. Since the fmodes are used later on in the statistical analysis, they should be explained clearly.
On lines 310–311, you write: "These results suggest a clear link between the modelled ecosystem functions and observed bacterial taxonomic (modes) and physiological (fHNA) traits observations." This, together with the absence of any significant relations for fmodes, seems to indicate that the functions (not described in the ms) of the functional modes were chosen inappropriately.
The authors moved from a 0D to a 1D setup for the model but do not provide any information about the 1D setup except the depth levels. No indication about vertical mixing is given in the equations either, so they must be considered incomplete. Also, I am not convinced that, given the shallow model domain (20 m), a 1D design provides a significant advantage over 0D. But again, essential information is missing to allow a firm judgement, e.g., the depth of the mixed layer and its seasonal variations. If the mixed layer is usually deeper than 20 m at the modelled site, then a 1D model offers no advantage over a 0D model. Also, no information is provided regarding the vertical geometry (are the three layers of the same height?) or the mixing scheme (implicit, explicit, positive definite, etc.). The authors should also indicate how the mixing coefficients were obtained or calculated. The reference to Kim et al. (2021) is insufficient, as this has not been published. The authors should just add a short section describing the vertical configuration and modify the equations accordingly.
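To illustrate the kind of term I would expect to see in a 1D setup (a generic sketch in my own notation, not the authors' formulation; $X$, $\mathrm{SMS}$, $K_z$, $\Delta z$, and the layer index $k$ are all assumed symbols), each state variable would normally carry a vertical mixing term in addition to its biological sources and sinks:

```latex
% Continuous form: biological sources minus sinks plus vertical diffusion
\frac{\partial X}{\partial t}
  = \mathrm{SMS}(X)
  + \frac{\partial}{\partial z}\left( K_z \frac{\partial X}{\partial z} \right)

% One possible discretization for layer k of thickness \Delta z_k,
% with diffusivities K_{k \pm 1/2} defined at the layer interfaces
\frac{dX_k}{dt}
  = \mathrm{SMS}(X_k)
  + \frac{1}{\Delta z_k}\left[
      K_{k+1/2}\,\frac{X_{k+1} - X_k}{\Delta z_{k+1/2}}
    - K_{k-1/2}\,\frac{X_k - X_{k-1}}{\Delta z_{k-1/2}}
    \right]
```

Whatever form the authors actually use, the layer thicknesses, the interface diffusivities, and the boundary conditions at the surface and at 20 m would need to be stated alongside it.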
The concept of the bacterial modes remains rather confusing. Since this is one of the main foundations of the present study, this must be clarified. My main problem with Fig. 6 is still that the explanation of the modes (also in the authors' response letter) does not seem to match what is shown in the panels. For example, Candidatus Pelagibacter ubique is supposed to dominate mode 6 (Fig. 6c) and C. Thioglobus singularis should dominate mode 1 (Fig. 6e). However, the relative abundance of C. T. singularis in mode 1 never exceeds 0.25, whereas C. P. ubique has relative abundances between 0.25 and 0.35 in mode 1, according to panel c, so it appears that both modes 1 and 6 are dominated by C. P. ubique. I did go through Bowman et al. (2017) but could not find an explanation there either.
The description of the data assimilation and parameter optimisation has become much more accessible by the added explanations. Still, several points remain unclear.
On lines 166–167, you write "... group-specific bacterial model parameters were optimized in the direction to properly represent the dynamics associated with each group ..." I do not understand what this means, even with the explanation in the next sentence, which describes a constraint imposed on the maximum bacterial growth rates.
On lines 187–188, you write "When converting Chl to phytoplankton C (N) biomass, the maximum Chl to N ratio was used along with other reference ratios ..." It remains unclear why you use the maximum (rather than, e.g., an average) Chl:N ratio and what the other reference ratios are. This must be clarified. Also, throughout the ms, you mostly refer to C biomass, so it is unclear where, when, or why you convert to N biomass here.
On lines 198–199, you refer to "... normalized costs of individual data types (J’m) ..." The J'm seem to be indicated in Table 2 but they are never defined.
On lines 205–213, you refer to the depth of the mixed layer as affecting the calculation of the target error. Besides the lack of information on the mixed-layer depth, it is also unclear whether the mixed layer was always deeper than 20 m, or whether you always applied the same CV throughout the whole model domain. Please explain clearly.
On lines 240–241, "... 5-7 constrained parameters and 3-6 optimized parameters ..." Please clearly explain how you define and determine constrained and optimized parameters and how they differ. Also, explain CS and OP in Tables S2–S6.
On lines 254–255, you write: "However, the model skill for HNA biomass slightly degraded in the climatological model (Figure 3b), with lower correlations and normalized standard deviation and higher RMSD than the four years together (Figure 3a)." Since each year was optimized individually, this result was to be expected. What could be more informative is a comparison with simulations for the individual years but with the same parameter set, e.g., the most portable one. This could provide insight into the influence of parameter differences compared to that of different boundary conditions and forcings between the different years.
The following sentences are incomprehensible and must be corrected. I could not figure out what you wanted to say here:
Lines 273–275: "C stocks and flows averaged over the growth (Figure 5) and normalized by NPP (normalized by NPP in 1-day for C stocks; Figure S9) season for each year summarized an annual snapshot of the group-specific bacterial dynamics."
Lines 283–284: "NO3, POC, and SDOC in unassimilated years were modelled to values comparable to those in other assimilated years (Figure 5)."
Line 399: "Modelled nutrient stocks were above detect limits and indicated the lack of macronutrient limitations."
Fig. 3: The caption for panel a (2010 - 2013) is confusing. The simulations also cover 2014 and this caption gives the impression that the simulations went through 2010–2013 continuously, which is not what you did. You should come up with a better caption. Also, as mentioned above, a third panel showing results for different years with a single parameter set could be useful. The last sentence of the caption seems to make no sense, since the x-axes of both panels are the same.
Fig. 4: The numbers and letters are very hard to read and often overlap. Maybe rearrange in 4 columns (2 for the means and 2 for the CVs)? The units in (b) are wrong, the CV is dimensionless.
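For reference, assuming the CV in the ms follows the usual definition (the symbols $s$ and $\bar{x}$ are my own notation for the standard deviation and the mean of the quantity shown), the units cancel:

```latex
% Coefficient of variation: standard deviation divided by the mean.
% Both carry the units of x (e.g. mmol C m^{-3}), so the ratio has none.
\mathrm{CV} = \frac{s}{\bar{x}},
\qquad
[\mathrm{CV}] = \frac{[x]}{[x]} = 1
```

If the values in panel (b) are given as percentages, "%" can be indicated, but no physical unit should appear.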
Fig. 5: The caption says that the panels show C stocks and flows in units of mmol C m-2 and mmol C m-2 d-1 but the panels also show NH4, NO3, and PO4, so the associated numbers must have different units. The caption explains the numbers in the first rows and the numbers in parentheses but not the numbers in the second rows next to the arrows. Then it says that "N and P flows, as well as the flows smaller than 0.01 mmol C m-3 d-1, are omitted." but the panels show arrows from and to NH4 and to the inorganic nutrient compartments.
Fig. 8 suffers from the same problems as Fig. 4 (% numbers are also dimensionless). In addition, the first rows in (b) should be left out as they are always 0 by definition.