This is the second round of review for Koelher et al. I thank the authors for addressing the reviewers’ comments in detail. I think the authors have done good work revising the manuscript, and this has made it more robust and streamlined. I still have a few minor comments for the authors to consider. Please note that the line numbers below refer to the manuscript version with track changes; apologies for any confusion.
- Figure 2 and Table 3: why is a linear relationship assumed? How do the residuals look (any patterns)? In many cases a non-linear relationship seems more appropriate, for example GAMs with a restricted k to limit the flexibility of the smoother. For example, in Fig. 4c the observed SR does not seem to follow a linear relationship: the residuals are not homogeneously distributed, and all observed SR values at low salinity lie below the regression line.
- Incidence (e.g. L83 and throughout): the term “incidence” is not defined. Is an incidence an observation or occurrence? If so, why is it contrasted with “presence” data (e.g. L460)? I found the term confusing at times, notably in the calculation of the % incidence of undetected species. You could define “incidence information” the first time the term is used.
- L81: how are coastal areas defined? By distance from the shore? For the IBTS data, which criteria are used to classify a sample as coastal?
- L32-33: the first and second parts of the sentence do not connect logically; consider splitting and rephrasing.
- L130 “Geographical delineation”. I might have missed it, but which geographical delineation are you referring to?
- L165: Is this done per sub-basin?
- Table 2 “species incidence” and footnote a. Is this the sum of all samples’ unique species richness? (Referring back to comment earlier on incidence). Footnote c also needs clarification.
- Table 3: So many relationships are tested that the individual p-values have little meaning. You could consider adjusting the p-values for multiple comparisons, or removing them altogether; the R2 on its own can indicate whether there is a clear pattern or not. I am not a big fan of p-values, but would of course understand if the authors want to keep them.
- L275: instead of “exist” maybe “present in the sub-basins”
- Fig. 1: could you increase the height of the figure? The small height gives the wrong impression that all curves have reached the asymptote.
- L438: More concretely, how many species does the IC suggest are probably occurring but not observed? 1-2 species still not identified? More?
- L445: this sentence could be improved: “which are only more rarely present”?
- L451: Based on your results, I would say that SR was also well described for rare species, and that only a few extremely rare species might still be missing from the list.
- L462: “50-90%”: where does this range come from? It should perhaps be reported in the Results section. Do you mean that 50-90% "are found" (= are occurring in the area) or that 50-90% were "observed in the samples"? If the latter, then shouldn’t it be “at least 50-90% …"?
- L516: "expected" instead of "indicated"? You did not test this.
- SC: for simplicity, consider writing "sample coverage" in full throughout the text, as the abbreviation is not used consistently.
- L545: do you rather mean “functionally-driven” differences?
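
To illustrate the Table 3 comment on multiple comparisons, here is a minimal Python sketch of one possible correction, the Holm step-down adjustment. This is only an example under my own assumptions: I do not know which software the authors use, other corrections (e.g. false discovery rate methods) may suit the analysis better, and the function name `holm_adjust` and the p-values below are hypothetical, not taken from the manuscript.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        value = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value can never be smaller
        # than the adjusted value of a smaller raw p-value.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values, for illustration only.
example_p = [0.01, 0.04, 0.03]
print(holm_adjust(example_p))
```

With many relationships tested, several raw p-values just below 0.05 would typically no longer fall below that threshold after such an adjustment, which is why reporting R2 alone may be the simpler option.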