Reply on RC2

Raberg et al. investigate the relative abundance of branched GDGTs in a number of high-latitude lakes, and compare those distributions to previously published datasets from the tropics and mid-latitude sites. Through this work, they empirically derive global calibrations of the brGDGT distributions to temperature, salinity, and pH for use in paleoclimate reconstruction. The authors provide a comprehensive analysis of the existing and new data and develop a number of new indices to quantify the distributions of brGDGT abundances, some of which improve our understanding of how the lipid structures vary in response to environmental conditions. Overall, the authors have done a very good job and the new methods will be of wide interest to organic geochemists and the paleoclimate community. I recommend publication with minor but important revisions that I hope will improve the manuscript.

sets that ultimately proved useful.

We thank the reviewer for this comment. The structural sets were formed solely to isolate each structural variable in turn. The approach was motivated by observations showing the relationship of methylation number with temperature and cyclization number with pH, but we decided to build a framework to systematically examine all structural variations in turn. We will clarify this in the text.

We understand that due to this systematic approach, we generate a large number of structural sets and associated nomenclature. However, a major goal of this publication is to outline the new structural set framework in full. While the Meth and Cyc-Isom sets appear to be the most important for this dataset, for example, subsequent research may find other sets to be more relevant. At this early stage of method development, we therefore opted to show and discuss the complete structural set framework.
2) The authors suggest that the 'best' temperature calibration is based on a set of methylated brGDGTs (the "meth" set, maybe not the ideal name) to the average temperature of the months above freezing. It would be worth some more thinking and text about what "best" means. Ultimately, these calibrations will require extensive 'field testing' in different lacustrine environments to determine what works and what doesn't (just because a calibration 'works' with surface sediment does not mean it will produce meaningful downcore reconstructions). And what makes one calibration 'best'? Although the calculations of fractional abundances within brGDGT "sets" are a new approach and are discussed at length in the paper, ultimately the best temperature regression (as measured by RMSE) uses the traditional method with all 15 brGDGTs to calculate the abundances. Lastly, although the 'months above freezing' is meaningful in regions with strong temperature seasonality, it is not a particularly meaningful concept in low-latitude areas where there is no monthly variation; thus, the calibration target might not be 'best' in these regions. It would be worth it to consider these issues more in the conclusions.

For the first portion of this comment, we refer the editor to Reviewer 1's similar comment and our reply, which is reproduced below:
"We thank the reviewer for this concern. We agree that "better" and "best" are ambiguous terms for evaluating our results that should be avoided. It is more accurate to say that our calibrations have some advantages in comparison to previous ones. In terms of R², the structural set calibrations perform comparably well to those derived using the standard approach (slightly worse for temperature and pH, slightly better for conductivity). The main advantage of using the structural set calibrations is that they can perform this well using a smaller number of compounds chosen to isolate variations in a single structural variable. In theory, this selective approach could provide some protection against the unwanted influences of other environmental parameters on different compounds (e.g. L618-621). Furthermore, subset-specific calibrations using only the most abundant compounds make it easier to apply brGDGT proxies to organic-lean samples, where only some (major) compounds are typically present above detection limit. However, we fully recognize the need to test these calibrations in paleoclimate archives, ideally with independent proxies for comparison. Such tests will be the subject of future work from our group, and we hope from the broader paleoclimate community as well. We have included the Full set temperature calibration (Eqn. 11) for this reason, but will emphasize this point in the text."

We agree that Months Above Freezing (MAF) is only meaningful for cold sites, especially those with strong seasonality. At lower latitudes, where MAF is identical to MAT, the distinction is meaningless. In such cases, regional calibrations using lakes that do not go below freezing may indeed be more desirable. However, we note that our high-latitude sites from the Canadian Arctic overlap with high-altitude sites from East Africa in almost every MAF plot (e.g. Fig. 7).

This result suggests that using MAF instead of MAT allows data from high-latitude regions to be leveraged for calibrations that are relevant in the tropics. As noted in the text, temperatures reconstructed using the MAF calibrations might be interpretable as MAT in warm or low-latitude regions. We therefore suggest using the MAF calibrations in paleoclimate applications and equating the resulting MAF temperatures with MATs where appropriate.
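
To illustrate the distinction between MAF and MAT discussed above, the following is a minimal sketch rather than the code used in the manuscript. It assumes MAF is the unweighted mean of the monthly mean air temperatures that exceed 0 °C; the function names and the monthly values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def mean_annual_temperature(monthly_temps_c):
    """MAT: unweighted mean of the 12 monthly mean air temperatures (degC)."""
    return mean(monthly_temps_c)

def months_above_freezing(monthly_temps_c):
    """MAF: mean of the monthly mean temperatures that are above 0 degC.

    Assumption: months are weighted equally. Returns None when no month
    is above freezing (a convention chosen for this sketch).
    """
    warm_months = [t for t in monthly_temps_c if t > 0.0]
    return mean(warm_months) if warm_months else None

# Hypothetical monthly means, not data from the manuscript.
arctic_site = [-30, -28, -25, -15, -5, 3, 8, 6, 1, -8, -18, -26]
tropical_site = [24, 24, 25, 25, 24, 23, 23, 23, 24, 24, 24, 24]

for name, temps in [("arctic", arctic_site), ("tropical", tropical_site)]:
    print(name, mean_annual_temperature(temps), months_above_freezing(temps))
```

For the hypothetical tropical site no month falls below freezing, so MAF and MAT coincide, whereas for the hypothetical Arctic site only four months contribute to MAF.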
3) The fit to MAF is interesting in that most sediment trap studies suggest brGDGT fluxes peak during events. Admittedly, "most" means only 2-3 studies, and it is not clear to me that those results can be reconciled with a good fit to MAF. Does the improved fit to MAF imply that seasonal production and fluxes of brGDGTs are indeed biased toward summer, and if so, what are we missing from sediment trap studies? If calibrations are done to 'shoulder season' temperatures, are the calibrations worse? The authors briefly discuss these issues, but it would be interesting to dig a bit deeper.

We thank the reviewer for this comment and agree that much remains to be reconciled between sediment trap studies and calibration efforts. With sediment trap (and sub-annually resolved water filtrate) studies showing brGDGT production to vary temporally and to be tied to numerous potential influences, it is somewhat remarkable that the pool of brGDGTs in a lake surface sediment relates to an average air temperature at all. We, too, noted that MAF is probably roughly equal to shoulder-season temperature in most cases. Unfortunately, it proved difficult to standardize a "shoulder-season" across this varied dataset. We tried calibrating against the first and last positive-degree months, or the first and last two, but our monthly temperature resolution proved to be too coarse. Additionally, if shoulder-season temperatures are relevant, it is likely due to overturning events and/or seasonal nutrient fluxes occurring at these times, which will vary widely between lakes. We thus abandoned these efforts. However, we will expand our discussion to include this subject in the text.
4) I have some concerns about the section starting on line 580 that develops calibrations for dissolved oxygen. Although this is inconclusive, I question whether the environmental data are good enough to meaningfully address this. Is DOmean the average of the entire water column? And over what seasons? Calculating this variable is a fraught exercise without access to each lake's actual DO profiles, many of which do not appear to exist.

We agree with the reviewer's concerns, and echo them in the discussion at the end of the section. Given the increasing evidence of a dissolved oxygen influence, we believe the section on DO will be of interest to readers and will highlight the need for better DO data. However, given that the results are inconclusive, we will discuss with the editor the possibility of moving the technical aspects of this discussion to the Supplement.
5) Several of the indices and calibrations utilize fractional abundances of the bicyclized penta- and/or hexamethylated compounds that are often not abundant in sufficient quantities to be accurately measured in many of the published datasets. It is not clear how (a) the authors treated those sites in their calibrations, (b) what their limits of detection are for the different compounds in their new sites and how those were determined, and (c) what they suggest to do at sites where these compounds (chiefly IIc, IIc', IIIb, IIIb', IIIc, IIIc') are below detection levels. This could create problems for some of the sets that isolate these structural groups.

We treated compounds below the detection limit as having an absolute abundance of zero. This assumption led some sites to have compounds with FA = 0 and/or FA = 1. All of these FAs are plotted as such (e.g. Fig. 7c) and tend to be associated with noisier trends (presumably due in part to the lower abundances). Removing these sites from the dataset would have removed valuable points from the stronger trends (e.g. Fig. 7a), upon which our calibrations primarily rely. For subset-specific calibrations (e.g. Eqns. S1-S9), we did not face this issue and removed samples with any FA = 1.
We will add the above discussion, along with a description of our own detection limit, to the Methods section. We will also stress the potential applications for low-abundance samples in the Discussion.
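
The treatment of compounds below the detection limit described above can be sketched as follows. This is an illustrative example under stated assumptions, not the code used for the manuscript: the function name, the use of `None` to flag a non-detect, the grouping of compounds, and the abundance values are all hypothetical.

```python
def fractional_abundances(abundances, set_members):
    """Fractional abundance (FA) of each compound within one set.

    abundances: dict mapping compound name -> absolute abundance,
        with None (or a missing key) meaning "below detection limit".
    set_members: names of the compounds that make up the set.
    Non-detects are given an absolute abundance of zero, as described
    in the reply above.
    """
    values = {c: (abundances.get(c) or 0.0) for c in set_members}
    total = sum(values.values())
    if total == 0:
        return None  # nothing in this set was detected at this site
    return {c: v / total for c, v in values.items()}

# Hypothetical grouping and values, for illustration only.
example_set = ["IIIa", "IIIb", "IIIc"]
abundant_site = {"IIIa": 80.0, "IIIb": 15.0, "IIIc": 5.0}
lean_site = {"IIIa": 12.0, "IIIb": None, "IIIc": None}

print(fractional_abundances(abundant_site, example_set))
print(fractional_abundances(lean_site, example_set))
```

At the hypothetical organic-lean site only one compound is detected, so its FA is 1 and the others are 0, which is the situation that produces the FA = 0 and FA = 1 points mentioned above.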
6) Line 432. The text below highlights a challenge in working with these data: the environmental variables themselves are strongly correlated. This makes it difficult to conclusively support some of the statements in this paragraph, for instance "temperature may therefore play a secondary role". This assumes that temperature actually does play a role, rather than that temperature is correlated to pH and conductivity, and therefore is (spuriously) correlated to the brGDGTs. It might be worth considering application of multivariate statistical methods that take into account collinearity among the predictors, such as redundancy analysis, to address this issue.

We agree with the reviewer: correlations between environmental variables pose an inherent challenge to this type of work. We will add a correlation table for the environmental variables in the Supplement and expand our discussion in this section and elsewhere in the text. We further agree that redundancy analysis and other multivariate statistical methods would be a valuable extension of this work and will be the subject of a future publication.
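
As a rough sketch of what such a correlation table involves, the example below computes pairwise Pearson correlation coefficients between environmental variables. It is an assumption-laden illustration only: the variable names, the values, and the choice of Pearson's r are hypothetical and do not represent the manuscript's data or final method.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def correlation_table(variables):
    """Pearson r for every unordered pair of variables in a dict."""
    return {
        (a, b): pearson_r(variables[a], variables[b])
        for a, b in combinations(variables, 2)
    }

# Hypothetical site-level values, for illustration only.
environment = {
    "MAF": [2.0, 5.0, 9.0, 14.0, 20.0],
    "pH": [6.1, 6.8, 7.0, 7.9, 8.4],
    "log_conductivity": [1.2, 1.5, 1.9, 2.4, 2.9],
}

for pair, r in correlation_table(environment).items():
    print(pair, round(r, 2))
```

When predictors covary as strongly as in this made-up example, a correlation between a brGDGT fractional abundance and any one of them cannot by itself identify which variable is the underlying driver.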
Minor things: Line 59. Why "however"? The shifts in community composition can track shifts in environmental conditions.

We use "however" to indicate that changes in community composition could affect brGDGT distributions independently of a physiological response.
Line 74. As stated above, note that, at many sites, the highly cyclized brGDGTs are often below detection levels, such that equation 2 is reduced to the fractional abundances of only the most abundant compounds.

We will include a discussion of this point either here or later in the text.
Line 138-139. How many sites had water chemistry data?

We will include this information.
Line 220. Did the logarithmic scaling result in a normal distribution of these environmental variables?

Logarithmic scaling greatly improved the normality of these variables, resulting in a much more even spread of the data, but they were not normally distributed in the end (they did not pass a Shapiro-Wilk normality test). We will include this information.
Line 301. Here and elsewhere throughout the text, it would be best to provide p values for the correlations you report.

Due to the large number of R² values discussed in the text, we found that including accompanying p-values made the text overwhelming and more difficult to follow. As the vast majority of p-values were < 0.01, we opted to only report p-values ≥ 0.01. This explanation is included in the Methods.
Line 398. There is some thinking that a substitution closer to the head group would be more influential on membrane fluidity (structural disruption closer to the surface of the compound). Your results suggest this is a minor effect.

We thank the reviewer for this interesting comment. We would appreciate if the reviewer could provide some references to us so we could explore this topic in our dataset.
Line 427. This does not suggest they have similar influences on brGDGTs. Rather, it suggests that conductivity and temperature covary in the environment.

We thank the reviewer for pointing this out. We agree and will change this text.
Line 433. Could you cite a table or similar for this statement?

We will add a reference to the supplemental figures containing the relevant statistical values.
Line 471. Similarly, how did you assess that ring number is "primarily" correlated with pH rather than conductivity? Based on an r2 difference of 0.04?

We agree that an R² difference of 0.04 is small. However, this (or a comparable) difference was apparent across almost all 15 compounds (Fig. 9 vs. Fig. S20).
Line 499. Regressions of the temperature variables onto what? Which brGDGT formulations?

We will change the sentence beginning on L499 to read: "For each structural set and subset defined in Section 3, we performed regressions of brGDGT FAs against five temperature variables to generate multiple global-scale calibrations."
Line 505. A challenge here is that MAT = MAF in the low latitudes. The main reason for the MAF formulation is to account for the additional complexity introduced by a global calibration that includes high latitude systems. It would be worthwhile to reemphasize this here.

We will reemphasize this nuance in the text.
Line 540. This would make some sense, since MAT = MAF in the sites included in the Russell et al. (2018) calibrations.

Agreed.
Line 541. "On a regional scale"... It would be worthwhile to mention the relatively unique (highly alkaline) nature of these lakes.

We thank the reviewer for noting this and will include it in the text.
Related to this, it appears from the text that a very simple formulation, MBT'5Me, performs nearly as well as more mathematically complicated (e.g. equation 10) versions. Is this correct? Lastly, although a high r2 is great, the RMSE is in some ways more important to climate reconstruction. It might be good to include a plot showing the RMSE, similar to figure 10A.

We thank the reviewer for these suggestions. We will explore the option of introducing the equations alongside the figures.

It is true that MBT'5Me performs nearly as well as Eq. 10. This is perhaps not surprising, as the Meth FAs in Eq. 10 were inspired by the same underlying principle as the MBT'5Me index: that methylation number varies in response to temperature. We do not suggest abandoning the MBT'5Me index, but we present the Meth structural set as a more targeted approach to isolating the underlying connection between methylation number and temperature.
We thank the reviewer for pointing out that a plot similar to Fig. 10A comparing the fits using RMSE rather than R² will be of interest to the paleoclimate community. We will explore options for including such a plot, either as part of the main text figures (Fig. 10 and 11) or in the Supplement.
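
To make the difference between the two fit statistics concrete, the sketch below computes both R² and RMSE for a simple linear calibration. It is only an illustration under stated assumptions: the index and temperature values are hypothetical, the fit is ordinary least squares on a single predictor, and RMSE is taken here as the square root of the mean squared residual (other conventions divide by the residual degrees of freedom).

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r2_and_rmse(x, y):
    """Return (R^2, RMSE) of the least-squares line through (x, y)."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / len(y))

# Hypothetical index values and temperatures (degC), for illustration only.
index = [0.1, 0.2, 0.3, 0.4, 0.5]
temps_narrow = [2.0, 4.5, 5.5, 8.5, 9.5]
temps_wide = [2.0 * t for t in temps_narrow]  # same scatter pattern, wider range

r2_narrow, rmse_narrow = r2_and_rmse(index, temps_narrow)
r2_wide, rmse_wide = r2_and_rmse(index, temps_wide)
print(r2_narrow, rmse_narrow)
print(r2_wide, rmse_wide)
```

The two hypothetical calibrations share the same R², yet the second has twice the RMSE in degrees, which is why RMSE carries information for reconstructions that R² alone does not.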
Line 674. This is correct in a way, but the use of structural sets does not ultimately improve calibration performance. Correct?

We will clarify that the structural sets "improve compound-specific correlations with environmental parameters..." in line 674. It is true that they perform comparably well to traditional methods in terms of R². However, we believe the fact that the structural sets can perform this well while isolating only the relevant structural changes is an "improvement". This point is currently ambiguous in the text, however. We thank the reviewer for pointing this out and will revise accordingly.
Line 692, could add a clause of "particularly in high-latitude environments".

We will incorporate this helpful suggestion.
The paper refers to papers by Sosa et al. (2020, 2020a) that are not in the references. Please provide the correct citations.