The authors have revised their manuscript in accordance with the referees' comments, but have not adequately responded to all of the problems identified. The long list of comments below indicates that serious issues remain. I have particular problems with some of the methods and analyses, which are below the standard of such a high-ranking journal. In addition, much of the interpretation and many of the conclusions rest on the inclusion of results from the first days of the mesocosm (MC) experiment. These results are largely influenced by the sudden decrease in pH, which does not occur naturally. The authors have not responded to this issue. In general, analyses should be restricted to the end of the mesocosm period, when the animals are acclimated. In my opinion, some of the conclusions are therefore faulty. Carry-over effects (size) are also not discussed. In summary, I still cannot recommend publication of the manuscript in Biogeosciences.
The authors have misunderstood the original comment by one referee. The point is not whether there is seasonal variation in pH. The referee questions whether it makes sense to conduct the experiments at a time point at which the pH is minimal. As it stands, a community acclimated to a high pH (8.15) is subjected to a sudden decrease of more than 0.6 units. This might cause unwanted treatment effects, such as a comparison of a non-acclimated population at reduced pH with an acclimated population at ambient conditions. As one of the referees has pointed out, the development of female size in the mesocosms points to such acclimation effects by showing a delayed size increase in the low pH mesocosms compared to the ambient mesocosms. This brings cohort development out of tune during the first days and, thus, the developing populations in each of the mesocosms experience different combinations of pH, T and food. The rates measured therefore cannot be interpreted against the background of pH alone. Under such conditions, maternal effects induced by pH can hardly be analysed; this requires that a new cohort/generation has developed. That animals can experience different pH through vertical migration does not change much, because intermittent exposure may differ from constant exposure and, in addition, gives the animals a choice. (calculate how long this would take…) The authors have not responded to this important issue raised in the original review.
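To put the size of this pH step into perspective, the corresponding change in hydrogen ion concentration can be worked out directly from the definition pH = -log10[H+]. The sketch below is only an illustration: the function name is mine, and the end value of 7.55 is simply 8.15 minus the 0.6 units mentioned above (the actual decrease was stated as more than 0.6 units).

```python
def hydrogen_ion_fold_change(ph_start, ph_end):
    """Fold change in [H+] when pH moves from ph_start to ph_end.

    Follows from pH = -log10([H+]); activity coefficients are ignored.
    """
    return 10.0 ** (ph_start - ph_end)


# Illustrative values: start pH from the text, end pH = start minus 0.6 units.
fold = hydrogen_ion_fold_change(8.15, 7.55)
print(f"[H+] increases by a factor of about {fold:.1f}")
```

A step of 0.6 pH units thus corresponds to roughly a fourfold increase in [H+], imposed on the animals within a very short time.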
Estimates of egg production are apparently based on the bulk incubation of only 17 females. This is well below the recommended number (Laabir et al. 1995, Runge & Roff 2000) for this type of experiment. The volume-copepod ratio is also poor (200 mL are recommended). In general, egg production can be variable, and in the present work no variability estimates are given (5-6 replicates are recommended; Runge & Roff 2000) (Page 5, line 26). The authors also argue that they have replication in time. This is not correct, because the size changes indicate the development of cohorts in the respective mesocosms, so in principle animals with different histories are compared. This is critical because cohorts developed differently in the various mesocosms (e.g., delayed cohort development in the low pH mesocosms). Thus, the cohorts also have a different history regarding factors other than pH. Replication in time is therefore doubtful.
During egg production, the pH increased in several bottles. This is unexpected considering the dim-light conditions and the 17 animals added. How is this explained? In the egg hatching experiments there was no outgassing, but there were partly large changes in pH in both directions. Any explanations? (Page 5, line 30).
Formalin shrinkage, for instance, also depends on temperature at sampling (Kapiris et al. 1997). Thus, length measurements of fixed animals should be used in comparisons only if these kinds of effects are known. I assume little is known about similar effects of RNAlater (Page 6, line 7).
The authors argue in their response that ‘contamination’ by eggs in the incubation water was low because females stay deep and eggs sink out. However, nauplii normally have very shallow distributions and might have biased the counts; according to the description, counts were corrected in such cases (Page 6, lines 26-32). The sentence in line 26 does not make sense to me; ‘calculated … only’ implies that introduced females and eggs were included. So why ‘only’?
Excluding nauplii > stage 4 is very subjective, especially since delayed hatching/development of nauplii occurs, so that such a procedure affects treatments differently. Thus, there are many assumptions concerning numbers of eggs and nauplii inherent in the methods. The authors have also not responded to the issue that the egg transplant study transfers eggs from low pH into high pH for some of the low pH mesocosms, but into basically similar pH for the high pH mesocosms. Similar to the initial setup, individuals (this time embryos) from the low pH treatment are subjected to large changes, but the individuals from the high pH treatment are not. Thus, maternal effects are tested in the low pH treatments only (Page 6, line 32).
Food characterized by TPC < 55 µm, C:N: It is correct that this size fraction includes food particles > 10 µm. However, the dynamics of the two size fractions do not correlate; e.g., small zooplankton < 10 µm can cause a high TPC (as shown in Paul et al. 2015 for all mesocosms; roughly only 25% of TPC is > 10 µm), but do not contribute much to the nutrition of zooplankton. The authors are referred to the papers of Dam et al. for the predictive power of such fractionation. Moreover, a filtration pressure of 200 mbar, as used in the present study, will likely destroy important food organisms in post-bloom conditions such as ciliates (Taylor & Lean 1981, Broglio et al. 2003) and lead to an underestimate of food concentrations. Thus, TPC is a poor predictor in the present case; in addition, size-selective feeding, which is common under the described environmental conditions, further complicates the straightforward interpretation made by the authors. If TPC < 10 µm is available, the authors could have used TPC < 55 µm minus TPC < 10 µm. All of this applies to C:N as well (Page 7, line 23, and response to p. 188548, line 8).
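The difference approach suggested above is trivial to compute. A minimal sketch follows; the function name and the numbers are hypothetical and chosen only so that about 25% of TPC falls above 10 µm, as in the rough figure quoted above. They are not data from the manuscript or from Paul et al. (2015).

```python
def tpc_10_to_55(tpc_lt_55, tpc_lt_10):
    """TPC in the 10-55 µm fraction, obtained by difference.

    Both inputs must be in the same units (e.g. µmol C per litre).
    """
    if tpc_lt_10 > tpc_lt_55:
        raise ValueError("TPC < 10 µm cannot exceed TPC < 55 µm")
    return tpc_lt_55 - tpc_lt_10


# Hypothetical values for illustration only.
total = 40.0   # TPC < 55 µm
small = 30.0   # TPC < 10 µm
large = tpc_10_to_55(total, small)
print(large, large / total)   # 10-55 µm fraction and its share of TPC < 55 µm
```

Using this fraction (and the corresponding C:N) as predictor would at least remove the part of TPC that is dominated by particles < 10 µm.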
Why 20 eggs? It is recommended to use at least 30 eggs (Laabir et al. 1996, Runge & Roff 2000). Thus, more than one egg sample does not meet the standards, and day 24 needs to be excluded from the statistical analyses (response to the authors' reply on p. 188546, line 17 in the original manuscript).
Excluding only day 3 from the analyses of length versus environmental conditions is not sufficient. During the first 10 days of the experiment, T was < 11 degrees. Development time increases exponentially with decreasing temperature (e.g., the work on generation time by McLaren et al.), roughly doubling the G-time given by the authors in their response. These estimates are based on food-replete conditions, so a further increase in G-time is expected for the mesocosms. It is therefore expected that the copepodite stages alone already take more than 17 days to mature during the initial half of the experiment (Page 8, line 15). Thus, length measurements of females beyond day 17 largely represent a mix of females with different histories. Moreover, predators are excluded and the life expectancy of ‘old’ females is likely long (they can be present for weeks), extending their inclusion in size estimates far beyond the first 3 days. This cannot be ignored, especially since the females in the low pH mesocosms were small from the beginning.
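The order of magnitude of this temperature effect can be illustrated with a generic Q10 rule. This is a sketch under stated assumptions, not necessarily the formulation used in the work cited above: the Q10 values are placeholders, and I assume here that the authors' G-time refers to 17 degrees while the animals initially experienced about 11 degrees.

```python
def development_time(d_ref, t_ref, t, q10):
    """Scale a development time d_ref (at t_ref, deg C) to temperature t.

    Generic Q10 rule: rates change by a factor q10 per 10 degrees,
    so development time grows exponentially as temperature drops.
    """
    return d_ref * q10 ** ((t_ref - t) / 10.0)


# Relative development time at 11 degrees versus a 17 degree reference.
for q10 in (2.0, 3.0):   # placeholder Q10 values, not fitted to Acartia
    ratio = development_time(1.0, 17.0, 11.0, q10)
    print(f"Q10 = {q10}: development takes {ratio:.2f} times as long")
```

With these placeholder values the development time at 11 degrees is about 1.5 to 1.9 times that at 17 degrees, before any additional delay due to food limitation is taken into account.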
Reproduction: While it is acceptable that the authors do not present the weight-specific egg production, the effect of size on body weight needs to be discussed because it can affect EPR. The error bars of the length measurements are missing (n=17) (Page 9, line 13; response to 188550, line 6).
Length analysis: If I understand correctly, these analyses (Table 2) are based on the whole experimental period, including days 10 to 45. As outlined above, the females from days 10 to 20 have a history largely outside the mesocosms and should be excluded. Tables 1 and 2 are confusing, as Table 1 says that only TPC was used in the LMM/GLMM for length (Page 9, line 18; response to 188550, line 6).
Maternal effects: Apparently, all measurements of egg hatching success are included in the analysis (this needs to be described in the MatMeth). The original results (hatching in Baltic water) are not described. It looks as if much of the relationship to pCO2 is driven by the first-day results, in which stressed animals (MC 3 and 8, just adjusted to low pH) are compared to acclimated animals kept in a basically similar environment with a pH around 8. The conclusion of a threshold is erroneously based on these days. Diatoms (Skeletonema) were present during the initial phase of the mesocosms and could, in combination with pH, have contributed to low hatching as well. Maternal effects (animals with at least some history in the respective treatments) could be tested from day 16 onwards, but this would also require estimates of within-treatment variability for each measurement (nevertheless, no effect was found for these days!). The assumption in the GLMM that all animals share the same history is basically wrong. The females tested are a mix of old and newly maturing animals, and the population is constantly changing, but potentially not at the same pace (e.g., potential effects of delayed development) (Page 10, line 12).
This conclusion is not logical: a negative effect of pCO2 on the ratio mesocosms/Baltic would imply that hatching in the maternal environment is lower than in the Baltic with increasing CO2, which contrasts with the statement given here (Page 10, line 18).
The authors have not responded to the referee’s suggestion to include a proper analysis of history. In my opinion, the threshold conclusion on egg hatching is based on the results of the first days, which are basically inappropriate for analysing ‘adaptive maternal’ effects, as the history of the females is largely outside the mesocosms. The vanishing of the effects with time argues against this conclusion (Page 11, line 2).
When food is limiting, interactive effects of food and temperature exist for length and egg production. Whether or not this is of interest to the authors, it should be included in the analyses, as T was not constant (Page 11, line 2).
Where does the negative correlation with TPC originate from? The authors could include some ideas on the effects of changing phytoplankton communities.
The discussion of transgenerational effects is redundant because the history of the females in the mesocosms is unknown for large parts of the experiment. Only in the second half of the experiment is there a chance that a new generation contributes to the population (this coincides with the disappearance of negative pCO2 effects). However, the population is likely a mix of several generations (Page 12, line 21).
The authors have not responded to the referees' objections that the smaller size in the lowest pH MC was carried over from the field, and that any delayed development due to the initial large decrease in pH (which is not experienced as such by natural populations!) shifted the conditions under which new females developed towards more unfavourable ones (low food, higher T; again interactive effects), which have a strong influence on size. Under such conditions, mesocosms cannot be compared for pH effects alone (Page 14, line 1).
The estimates of G-time are erroneous. Dzierzbicka et al. (2009) used an average of very different species (A. clausi, A. longiremis) for their estimates of Acartia spp. in a modelling study (not A. bifilosa), one of which is a polar copepod (A. longiremis). These estimates assume food-replete conditions, which do not apply to the mesocosms. The initial T conditions were lower than 17 degrees. The G-time given by the authors is thus likely a considerable underestimate. Error bars (n=17) should be presented, and the females are largely a mix of generations rather than separate generations.