the Creative Commons Attribution 4.0 License.
Reviews and syntheses: Opportunities for robust use of peak intensities from high-resolution mass spectrometry in organic matter studies
William Kew
Allison Myers-Pigg
Christine H. Chang
Sean M. Colby
Josie Eder
Malak M. Tfaily
Jeffrey Hawkes
Rosalie K. Chu
Download
- Final revised paper (published on 29 Oct 2024)
- Supplement to the final revised paper
- Preprint (discussion started on 17 Oct 2022)
- Supplement to the preprint
Interactive discussion
Status: closed
-
RC1: 'Comment on kew et al _ egusphere', Anonymous Referee #1, 15 Nov 2022
Comments on: Use and misuse of peak intensities for HRMS on OM studies: opportunities for robust usage. Kew et al. https://doi.org/10.5194/egusphere-2022-1105
General comments
The authors present a theoretical review of the factors affecting the relationship between the absolute quantity of a compound and the analytical response measured by high-resolution mass spectrometry, as well as an experiment showing the differences in response factor for standard compounds in (negative ion) ESI-MS when measured in different matrices. They then discuss the ramifications of two facts, namely that equal quantities of different compounds in a sample can produce different mass peak intensities, and that the same quantity of a compound may have a variable response in different samples depending on matrix, for the treatment of HRMS data from DOM analysis with statistical approaches designed for population ecology. As my expertise is mostly in the area of MS, I will focus most of my comments on those sections.
The theoretical factors that the authors describe governing the response and identification of compounds in MS analysis are well known. Within the LC/MS community it is well known that one should not compare apples with pears, and that even comparing apples may be tricky, as quantitation is difficult and often semi-quantitative. Not understanding the confounding factors can lead to over-interpretation of LC/MS data. Whether this message is something that (still) needs to be learned within the DOM community is unclear to me. I hope a reviewer from that community has more to say about that. The experiment with a set of standard compounds measured in different matrices is a nice illustration of the effect of ion suppression by matrix, but it is probably not necessary for the message of the manuscript, as it is mostly a textbook experiment with an accordingly predictable outcome. The more novel part of this manuscript lies in the discussion of how these quantitative errors and uncertainties might influence the data treatment and outcomes. However, as I indicate in the comments below, I do not think that the in silico data set was generated with sound choices. Below I list several specific comments and questions to be addressed. In general, my recommendation is to shorten and simplify the descriptions of the factors governing quantitative response in (LC-)MS and to focus the manuscript on the consequences for data treatment. A welcome addition would be a discussion of how to remedy the problems. As this probably entails more than major revisions, I recommend rejection at this time.
Specific comments
The second paragraph (lines 49 to 55) can be shortened and incorporated in the third paragraph, specifically in between line 67 and 68.
Lines 75 to 76: peak intensity actually is proportional to differences in concentration for a given compound if the conditions are kept the same; otherwise, no one would be able to produce a standard curve. However, the response factor (response per amount) may differ from compound to compound, and for a given compound depending on matrix and other factors. This should be rephrased.
Line 96 to 99: move this section to the end of this section 2.3 as this is a concluding remark.
Lines 101 to 112: The formation of (mixed) dimers and trimers should also be mentioned, as well as in-source fragmentation. There are several nice reviews on ion suppression; it would be good to reference a few here. The influence of the choice of ionization mode (+ vs. - ionization; APCI vs. ESI) should also be discussed. A discussion of the use or (mis)use of internal standards would also be a nice addition.
Line 114 to 122: the issue described here is in fact not an ionization bias, but the inability to separate isobaric species. It simply describes the fact that when analyzing a complex (DOM) sample, any peak in any MS1 spectrum can in fact consist of the signal of multiple compounds with identical m/z (within the mass accuracy specifications) and is therefore a cumulative response of those compounds. This would still be the case even if the ionization of each compound was perfect at 100% efficiency.
Section 2.3: The effect of dilution of any compound by an abundant matrix should be discussed. In trapping instruments, the fill rate of the trap (or ion target) is often a programmable parameter. If a compound present at a given quantity is the most abundant compound there, then the trap is mostly filled with that compound. If the compound is present in the same quantity but together with a lot of matrix ions, the matrix ions will take up part or most of the available space in the trap, effectively 'diluting' the compound and thereby leading to underestimation of its actual quantity. Maybe a better title for this section would be "ion transmission and collection".
Lines 172 to 251: Section 3, in my opinion, can be removed in its entirety. These are predictable textbook experiments, and whatever extra information is discussed can be incorporated in section 2.
Lines 331 to 354: generation of the in silico data set. I find the use of errors well above 1 (like 1.5) debatable. Although ion enhancement does happen, it is very rare and seldom that pronounced. Even in Fig. 4, the increase shown in panel C from 0 to 2 ppm matrix added is attributed to the addition of endogenous compounds from the matrix. So is this realistic? What would happen if you made the error non-Gaussian?
Line 375: Again a comment about the choices underlying the in silico data set: ionization efficiencies do not randomly vary for a compound across samples. In general, more complex samples with more matrix will have lower ionization efficiency for all compounds in that sample. The deviations are not truly random.
Lines 385-408: While some of the other sections are wordy, the authors move very quickly through the consequences for the application of the statistical models. Some parts were truly unreadable to me as a person not involved in statistics and models. The concept of mean trait values and Figure 11 are hardly introduced and poorly explained.
Lines 483 to 489: After reading the entire manuscript, the conclusion that the ecological metrics actually perform quite well came as a bit of a surprise to me. The tone of the manuscript as a whole is quite negative towards these concepts.
Fig. 4: What does relative intensity mean here? Relative to what? As no bar ever reaches 100%, is the response of the compound with no matrix or SRFA added 100%? Please clarify. Which of the differences between bars are statistically significant? However, I recommend removing this figure along with section 3.
Fig. 8: Panels A, B, and D are data clouds, which seems a logical outcome of the way errors are assigned in the in silico data set. But why the convergence to 0 for the true-to-observed difference in panel C? I do not see the strong resemblance between panels A and C described in line 383 of the text.
Citation: https://doi.org/10.5194/egusphere-2022-1105-RC1
-
AC1: 'Reply on RC1', James Stegen, 06 Apr 2023
Reviewer 1:
Thank you for the helpful suggestions. Below we provide a summary of how we plan to address each comment in the revised manuscript. Our responses are in bold text. We look forward to your evaluation of these planned revisions.
General comments
The authors present a theoretical review of the factors affecting the relationship between the absolute quantity of a compound and the analytical response measured by high-resolution mass spectrometry, as well as an experiment showing the differences in response factor for standard compounds in (negative ion) ESI-MS when measured in different matrices. They then discuss the ramifications of two facts, namely that equal quantities of different compounds in a sample can produce different mass peak intensities, and that the same quantity of a compound may have a variable response in different samples depending on matrix, for the treatment of HRMS data from DOM analysis with statistical approaches designed for population ecology. As my expertise is mostly in the area of MS, I will focus most of my comments on those sections.
The theoretical factors that the authors describe governing the response and identification of compounds in MS analysis are well known. Within the LC/MS community it is well known that one should not compare apples with pears, and that even comparing apples may be tricky, as quantitation is difficult and often semi-quantitative. Not understanding the confounding factors can lead to over-interpretation of LC/MS data. Whether this message is something that (still) needs to be learned within the DOM community is unclear to me. I hope a reviewer from that community has more to say about that.
Thank you for these comments. While we agree that quantitation has been discussed within the context of LC-MS, our survey of direct infusion DOM literature indicates a need to renew this discussion with respect to the analysis and interpretation of direct infusion FTICR-MS data. We plan to maintain the vision of the manuscript, as our literature review indicates there is a need for increased awareness of challenges and pitfalls across the DOM community. We observed that ~10 years ago, DOM manuscripts often included discussion points related to the issues of using peak intensities. More recent DOM manuscripts rarely mention or evaluate these issues, even if they are comparing apples to pears. We believe the current state of the field, in which methods using peak intensities have been used in previous papers and authors are picking up those methods and applying them without necessarily understanding the pitfalls, necessitates the comprehensive treatment presented in our manuscript.
The experiment with a set of standard compounds measured in different matrices is a nice illustration of the effect of ion suppression by matrix, but it is probably not necessary for the message of the manuscript, as it is mostly a textbook experiment with an accordingly predictable outcome.
We plan to keep these analyses because while they may be textbook for mass spectrometry experts, we believe that many folks using high resolution mass spectrometry (HRMS) data would still benefit from these examples, particularly users of HRMS data whose formal training is in other disciplines such as ecology or biogeochemistry.
The more novel part of this manuscript lies in the discussion of how these quantitative errors and uncertainties might influence the data treatment and outcomes. However, as I indicate in the comments below, I do not think that the in silico data set was generated with sound choices.
Please see our plans below related to modifications to the in silico simulations.
Below I list several specific comments and questions to be addressed. In general, my recommendation is to shorten and simplify the descriptions of the factors governing quantitative response in (LC-)MS and to focus the manuscript on the consequences for data treatment. A welcome addition would be a discussion of how to remedy the problems. As this probably entails more than major revisions, I recommend rejection at this time.
While we agree with many of the suggestions from the reviewer, we believe that the recommended major structural revisions are related to the interpretation of LC-MS data rather than the focus of our manuscript, which is direct infusion (FTICR)-MS data. We believe that many of the sections regarding quantitative response which the reviewer recommends removing or shortening would directly benefit the FTICR-MS community. Thus, we prefer to maintain the overall manuscript length and structure. Many HRMS data users have limited formal training in mass spectrometry. In its current form, our manuscript provides an overview of the issues to be aware of as well as an immediate solution in the form of a simulation model. We also provide ideas for additional solutions (e.g., hierarchical modeling).
We will also note that Reviewer 2 does not necessarily agree with all our interpretations of the experimental data. This demonstrates a clear need for further discussion to reconcile how data are interpreted. In other words, what may seem “textbook” to some is not so simple to others. The only way to advance the field is direct sharing, discussing, and improving research outcomes via primary literature.
We also emphasize that the manuscript is not focused on LC-MS data. We are primarily focused on direct infusion (FTICR)-MS data. We plan to edit the manuscript to make this point clearer, especially in the early parts of the paper.
Specific comments
The second paragraph (lines 49 to 55) can be shortened and incorporated in the third paragraph, specifically in between line 67 and 68.
We will make the suggested edits.
Lines 75 to 76: peak intensity actually is proportional to differences in concentration for a given compound if the conditions are kept the same; otherwise, no one would be able to produce a standard curve. However, the response factor (response per amount) may differ from compound to compound, and for a given compound depending on matrix and other factors. This should be rephrased.
We plan to rephrase as suggested and be more direct in terms of linking the statements to studies using direct infusion mass spectrometry.
Line 96 to 99: move this section to the end of this section 2.3 as this is a concluding remark.
We plan to make the suggested modifications.
Lines 101 to 112: The formation of (mixed) dimers and trimers should also be mentioned, as well as in-source fragmentation. There are several nice reviews on ion suppression; it would be good to reference a few here. The influence of the choice of ionization mode (+ vs. - ionization; APCI vs. ESI) should also be discussed. A discussion of the use or (mis)use of internal standards would also be a nice addition.
These are certainly important considerations, though there are tradeoffs to account for in terms of adding more paragraphs of text on these details. Our plan is to find middle ground by referencing some review papers as suggested, and briefly noting that there are many technical decisions to make when performing direct infusion mass spectrometry (e.g., ionization mode and the use of internal standards). Our intent is to help the reader be aware of the myriad decisions one must make in terms of how to set up and run mass spectrometry analyses and how to prepare samples. Many of these decisions are tied to goals of a given study. Providing overall guidance on instrument setup and sample preparation is beyond the scope of the current manuscript. In turn, we feel that providing some brief pointers to key decisions and referencing review papers on ion suppression will efficiently convey the key information while staying within scope.
Line 114 to 122: the issue described here is in fact not an ionization bias, but the inability to separate isobaric species. It simply describes the fact that when analyzing a complex (DOM) sample, any peak in any MS1 spectrum can in fact consist of the signal of multiple compounds with identical m/z (within the mass accuracy specifications) and is therefore a cumulative response of those compounds. This would still be the case even if the ionization of each compound was perfect at 100% efficiency.
We plan to change the title of this subsection to ‘Ionization Effects and Isomeric Species.’ We will also include more direct/plain language, similar to how the reviewer summarized the key issue/points here, on both isomers and near-isobars.
Section 2.3: The effect of dilution of any compound by an abundant matrix should be discussed. In trapping instruments, the fill rate of the trap (or ion target) is often a programmable parameter. If a compound present at a given quantity is the most abundant compound there, then the trap is mostly filled with that compound. If the compound is present in the same quantity but together with a lot of matrix ions, the matrix ions will take up part or most of the available space in the trap, effectively 'diluting' the compound and thereby leading to underestimation of its actual quantity. Maybe a better title for this section would be "ion transmission and collection".
We plan to change the subsection title to that suggested by the reviewer and edit the text to clarify the key points, similar to the reviewer’s summary here.
Lines 172 to 251: Section 3, in my opinion, can be removed in its entirety. These are predictable textbook experiments, and whatever extra information is discussed can be incorporated in section 2.
As discussed in response to earlier reviewer comments, we strongly prefer to retain this section. While mass spectrometrists may not be surprised by results shown in this section, the intended audience for this paper is much broader. We posit that the average ecology- or biogeochemistry-minded data user will not necessarily be aware of the issues or patterns addressed in this section. A key goal of our paper is to speak to researchers across the continuum, from ecologists to mass spectrometrists. At both ends of that continuum, researchers will find some of our examples very simple and ‘textbook’ in direct relation to their domain of training. However, few researchers are trained in all the concepts covered in our paper, and thus the paper provides common ground for anyone working with direct infusion mass spectrometry, regardless of formal training domains.
Lines 331 to 354: generation of the in silico data set. I find the use of errors well above 1 (like 1.5) debatable. Although ion enhancement does happen, it is very rare and seldom that pronounced. Even in Fig. 4, the increase shown in panel C from 0 to 2 ppm matrix added is attributed to the addition of endogenous compounds from the matrix. So is this realistic? What would happen if you made the error non-Gaussian?
We understand the questions raised by the reviewer and plan to address the reviewer’s suggestion (and those of Reviewer 2) by re-running the entire suite of analyses incorporating the suggested modifications. We acknowledge that 1) there are multiple ways to set up the simulations, 2) we expect future developments in this vein, and 3) we are quite sure that different researchers will suggest and prefer different implementations of multiple aspects of the model. Part of the reason that many valid setups exist is that we are simulating phenomena that are not deeply characterized and may never be truly known. However, we would like to emphasize that we believe that our simulation model’s primary value lies not in the particular conditions simulated in the example presented in our paper, but rather in its ability to be adjusted to more closely match a dataset of interest, thus providing a flexible framework by which mass spectrometry users can evaluate their own data. To this end, we have provided the code used to produce the simulation model in a publicly available format for general use. While we plan to accommodate the reviewer’s suggestions here, we would request future editor support in avoiding situations involving repeated requests for different model setups which, while interesting, do not necessarily demonstrate additional value for our simulation model. In return, we will more clearly highlight the general utility of the model in the revised manuscript. A great follow-up manuscript would be a deep exploration of how the ecological metrics respond to different assumptions in the model, as well as extending to additional ecological metrics as pointed to in the manuscript.
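As a minimal sketch of the kind of sensitivity check the reviewer asks about (parameter values are hypothetical and are not the manuscript's actual simulation setup), one can apply Gaussian versus right-skewed multiplicative errors to simulated peak intensities and compare the effect on a simple ecological metric such as Shannon diversity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" intensities for 1000 peaks (lognormal is only an
# illustrative choice, not the manuscript's model).
n_peaks = 1000
true = rng.lognormal(mean=10.0, sigma=1.0, size=n_peaks)

# Multiplicative per-peak errors: a Gaussian centered on 1 (clipped to
# stay positive) vs. a right-skewed lognormal alternative.
gauss_err = np.clip(rng.normal(1.0, 0.5, n_peaks), 0.01, None)
skew_err = rng.lognormal(mean=0.0, sigma=0.5, size=n_peaks)

def shannon(intensities):
    """Shannon diversity computed from relative intensities."""
    p = intensities / intensities.sum()
    return float(-(p * np.log(p)).sum())

h_true = shannon(true)
h_gauss = shannon(true * gauss_err)
h_skew = shannon(true * skew_err)
print(h_true, h_gauss, h_skew)
```

Swapping in other error distributions requires changing only one line, which makes this style of test straightforward to repeat for any error model a reader considers more realistic.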
Line 375: Again a comment about the choices underlying the in silico data set: ionization efficiencies do not randomly vary for a compound across samples. In general, more complex samples with more matrix will have lower ionization efficiency for all compounds in that sample. The deviations are not truly random.
In summary, we plan to include some new text in the manuscript outlining additional ideas for simulation model development, such as the idea suggested here. Below, we provide more rationale for this plan.
As noted in the manuscript, the presented simulation model is intended as a first look at what biases and/or issues may arise when peak intensities are used to calculate common ecological metrics. We feel it is important to keep the scope of the modeling as constrained as possible while still providing informative outcomes. While the point made here by the reviewer is interesting, the practical question is whether we should expand the scope of the simulation model to account for it, or if we should provide this in the manuscript text as an example of future development needs.
More specifically, we ran the model with either ~100 or ~1000 peaks within each sample. Having more peaks in a sample is, we believe, what the reviewer refers to as 'more matrix.' In this case, we could make the ionization efficiency of all peaks inversely proportional to the number of peaks in a sample. We believe that this change would have a limited impact on the ecological metrics because all samples in a given simulation run have about the same number of peaks. Where this change could have more influence is in comparisons across samples that vary widely in the number of peaks (e.g., comparing one sample with 100 peaks vs. one sample with 10,000 peaks). In theory, we could pursue this direction and implement a function that decreases ionization efficiency for all peaks in a sample based on how many peaks are in that sample. However, the extent to which this function would accurately model physical phenomena is unclear, and this change would also add significant scope to the simulation component of the manuscript. For instance, we would need to add another large set of figures across both of our current scenarios and include sensitivity analyses for how the effect of this mechanism changes depending on details of the function.
We are reluctant to expand the simulation scope in this way, which will add numerous figures and substantial length to the (already very long) manuscript. In addition, we are not fully convinced that more peaks in a sample will, by itself, decrease ionization efficiency. Peak intensities may decrease, but that is because the concentration of each peak/molecule must be lower if there are more kinds of peaks/molecules and total dissolved carbon concentration is kept constant. Decreasing peak intensity due to lower concentration is not the same as decreasing ionization efficiency.
Given the above considerations, we plan to include some new text in the manuscript laying out additional ideas for simulation model development, such as that suggested here.
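To illustrate why a per-sample, global suppression factor would leave within-sample relative intensities (and hence many within-sample ecological metrics) unchanged, consider this sketch; the suppression function here is hypothetical, not an implemented part of the manuscript's model:

```python
import numpy as np

rng = np.random.default_rng(1)

def efficiency(n_peaks, k=1000.0):
    # Hypothetical suppression: efficiency falls as the number of
    # co-ionizing peaks ("matrix") grows.
    return k / (k + n_peaks)

n = 1000
true = rng.lognormal(10.0, 1.0, n)
observed = true * efficiency(n)  # same factor applied to every peak

# Relative intensities within the sample are unaffected by a global factor.
rel_true = true / true.sum()
rel_obs = observed / observed.sum()
print(np.allclose(rel_true, rel_obs))  # True
```

Only comparisons across samples with very different peak counts would be affected, because `efficiency(100)` and `efficiency(10000)` differ substantially while the within-sample composition does not.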
Lines 385-408: While some of the other sections are wordy, the authors move very quickly through the consequences for the application of the statistical models. Some parts were truly unreadable to me as a person not involved in statistics and models. The concept of mean trait values and Figure 11 are hardly introduced and poorly explained.
Our inference is that the primary challenge here is with the mean trait values. In turn, we will add further explanation of how those values are calculated, along with some tangible examples of real-world metrics. We will also have someone from outside the author team who is not familiar with the 'mean trait' concept review our updated text.
Lines 483 to 489: After reading the entire manuscript, the conclusion that the ecological metrics actually perform quite well came as a bit of a surprise to me. The tone of the manuscript as a whole is quite negative towards these concepts.
Based on this suggestion, as well as similar comments from Reviewer 2, we plan to edit the text to have a more balanced or neutral tone leading up to the simulation-based evaluation.
Fig. 4: What does relative intensity mean here? Relative to what? As no bar ever reaches 100%, is the response of the compound with no matrix or SRFA added 100%? Please clarify. Which of the differences between bars are statistically significant? However, I recommend removing this figure along with section 3.
As discussed above, we greatly prefer to retain this section and this figure. By ‘relative intensity,’ we mean that the data were scaled to the largest signal in any replicate from that series of spectra. We are combining replicates to show the mean values with a 95% confidence interval, which is why the bars do not always reach 100%. These details will be added to the manuscript for clarification.
The primary value of the figure is to show how observed peak intensities deviate from expectations and assumptions made by ecological metrics commonly applied to direct infusion mass spectrometry data. As such, we prefer to not add details of statistical significance, beyond providing the 95% confidence intervals, as adding this information would introduce significant complexity to the existing figure, likely contributing to more confusion than clarity.
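The scaling described above can be written down concretely. With toy replicate values (not the paper's data), relative intensity and the reason mean bars fall short of 100% look like this:

```python
import numpy as np

# Toy replicate spectra for one series: rows = replicates, cols = peaks.
spectra = np.array([[4.0e6, 1.2e6],
                    [3.6e6, 1.0e6],
                    [4.2e6, 1.1e6]])

scale = spectra.max()          # largest signal in any replicate of the series
rel = 100.0 * spectra / scale  # percent relative intensity

mean = rel.mean(axis=0)
# Normal-approximation 95% CI half-width (a t-interval would need scipy).
ci = 1.96 * rel.std(axis=0, ddof=1) / np.sqrt(rel.shape[0])
print(mean, ci)
```

Because the plotted bar is the mean over replicates while the scale is the per-series maximum, even the peak that defines the scale averages below 100%.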
Fig. 8: Panels A, B, and D are data clouds, which seems a logical outcome of the way errors are assigned in the in silico data set. But why the convergence to 0 for the true-to-observed difference in panel C? I do not see the strong resemblance between panels A and C described in line 383 of the text.
We will add more explanation for why the data converge to zero in the middle of the data cloud in panel C. The point about lack of resemblance is due to an error in which the figure labels were misplaced (i.e., panel B should be C and vice versa); we will correct this error and thank the reviewer for catching it.
Citation: https://doi.org/10.5194/egusphere-2022-1105-AC1
-
RC2: 'Comment on egusphere-2022-1105', Anonymous Referee #2, 20 Mar 2023
Given the growing literature on the application of ecological analyses to high-resolution mass spectrometry, a careful assessment of the bias, flaws, and assumptions is timely and a welcome contribution to the expanding subdiscipline. In particular, the suggestion to expand modeling approaches, including machine learning or hierarchical modeling, to quantify the magnitude of errors is an important one for future research.
Meanwhile, the empirical results in this study show that many ecological metrics derived from the peak intensities provide valid patterns. This is an important result and could be highlighted alongside papers demonstrating the acquisition of quantitative data from HRMS (e.g., Kruve et al., 2020; Groff et al., 2022).
On line 483, it is stated that HRMS has many weaknesses, just like any analytical platform. Most practitioners of HRMS would agree that the biases presented in this paper are present (as reviewed in Urban et al., 2016; Kujawinski et al., 2010). Many of these biases (Viera-Silva et al., 2019) exist with other compositional data as well (unlike the statement on lines 73-77), such as microbiome data, and have prompted solutions, such as those reviewed in Gloor et al. (2017), including the creation of internal standards (Hardwick et al., 2018).
Together, these two foci highlight the most problematic aspect of this paper: The tone is much too negative to accurately reflect the reality of the (very common) use of peak intensities. A more balanced and contrasted view should be taken so as not to alienate specialist readers or mislead those less well-versed. The tone leaves the feeling that the bulk of the literature in the past decade is highly flawed and not to be trusted. This is not impossible, but if this is what the authors are trying to convey, the analysis must be made much more robust. I suggest the authors change the tone to ensure that the key messages (the utility of ecological metrics and some of their drawbacks) are most effectively conveyed.
I am also concerned that some of the assumptions of the empirical work have significant technical flaws. This may be improved by providing greater transparency for the selection choices.
1) The errors of their simulation model. How can a random selection between 0 and 100 for the simulated errors be justified (lines 352 and 369)? Why is 0 included in the random selection? The decision for this range should be motivated by actual evidence, such as from the experiments measuring variation in peak intensities of analytes of known concentration. Examining Fig. 4b, it looks like a better error selection would be between 1 and 8. Without further justification, the results of the in silico simulation model appear quite arbitrary.
2) Peak intensities are normally distributed in HRMS data (e.g. He et al., 2020). The way the authors generate random intensities does not reproduce the normal distribution in peak intensities.
3) The number of peaks. The simulation models use either 100 or 1000 peaks. These are not environmentally relevant. Most environmental studies have several thousand peaks, and the authors nicely and unequivocally show in Fig. S2 that, in that case, there is absolutely no bias in the calculation of ecological metrics; that is, the observed vs. true R2 values approximate 1. For this reason, any simulations with small numbers of peaks are misleading and not relevant to most studies.
4) Why these specific standards? What are their features beyond just an absence in natural DOM? I think there needs to be a description of what makes the molecular structures, ionization properties, etc. of these analytes appropriate spike-ins.
5) No evidence is presented for why these higher analyte concentrations (>200 ppb), or relative intensities of individual peaks >1%, will ever be realistic. I similarly do not understand why the summed relative intensities exceed 100% in Fig. 4a in the absence of SRFA.
There are also several strong statements that I do not believe are sufficiently supported by the scientific evidence presented in the paper. These are on line 245 “Strategies to use calibration curves will fail” and line 324 “The previous sections show that between-peak changes in peak intensity do not accurately reflect between-peak changes in abundance”.
Fig. 4 seems to show that the relative intensity scales with concentration. The authors could test this by fitting a nonlinear model or GAM for each analyte, or a single model in which the slope varies with m/z (for example). Without having attempted such an analysis, it is difficult to understand how this statement is supported. Figure 4 also nicely shows that quantitative comparisons between peaks can be achieved. As shown in Figure 4a, at low concentrations of the three different molecules (<100 ppb), the signal intensities seem statistically indistinguishable. A similar result is seen, especially in the MeOH matrix, at higher concentrations of SRFA that effectively dilute the analytes to representative concentrations. These results suggest that the analytes are performing quantitatively.
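The per-analyte fit the reviewer proposes could be sketched, for example, as a simple power-law (log-log linear) calibration standing in for a full GAM. The calibration values below are purely illustrative, not the paper's data:

```python
import numpy as np

# Hypothetical calibration series for one analyte: spiked concentration
# (ppb) vs. measured relative intensity (%); values are illustrative only.
conc = np.array([25.0, 50.0, 100.0, 200.0, 400.0])
rel_int = np.array([2.1, 4.0, 7.5, 13.0, 21.0])

# Power-law fit: log(intensity) = intercept + slope * log(concentration).
slope, intercept = np.polyfit(np.log(conc), np.log(rel_int), 1)
pred = np.exp(intercept) * conc ** slope

# A slope below 1 indicates a sub-linear (saturating) response.
print(slope)
```

A fitted slope near 1 would support proportional scaling; systematic deviation from 1 would quantify the saturation effects discussed in the manuscript.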
Additional comments:
Line 60: The authors should cite the reference of the first use of 21T FT-ICR-MS (Smith et al., 2018).
L195-205 – This point is already made on L122
Figure 4- relative intensity is not explained. Relative to what?
Thank you for your thoughtful consideration of these points.
Refs:
Kruve, A. (2020). Strategies for drawing quantitative conclusions from nontargeted liquid chromatography–high-resolution mass spectrometry analysis.
Groff, L. C., Grossman, J. N., Kruve, A., Minucci, J. M., Lowe, C. N., McCord, J. P., ... & Sobus, J. R. (2022). Uncertainty estimation strategies for quantitative non-targeted analysis. Analytical and Bioanalytical Chemistry, 414(17), 4919-4933.
Urban, P. L. (2016). Quantitative mass spectrometry: an overview. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2079), 20150382.
Kujawinski, E. B. (2002). Electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (ESI FT-ICR MS): characterization of complex environmental mixtures. Environmental Forensics, 3(3-4), 207-216
Vieira-Silva, S., Sabino, J., Valles-Colomer, M., Falony, G., Kathagen, G., Caenepeel, C., ... & Raes, J. (2019). Quantitative microbiome profiling disentangles inflammation-and bile duct obstruction-associated microbiota alterations across PSC/IBD diagnoses. Nature microbiology, 4(11), 1826-1831.
Gloor, G. B., Macklaim, J. M., Pawlowsky-Glahn, V., & Egozcue, J. J. (2017). Microbiome datasets are compositional: and this is not optional. Frontiers in microbiology, 8, 2224.
Hardwick, S. A., Chen, W. Y., Wong, T., Kanakamedala, B. S., Deveson, I. W., Ongley, S. E., ... & Mercer, T. R. (2018). Synthetic microbe communities provide internal reference standards for metagenome sequencing and analysis. Nature communications, 9(1), 3096.
He, C., et al. (2020). In-house standard method for molecular characterization of dissolved organic matter by FT-ICR mass spectrometry. ACS Omega, 5(20), 11730-11736.
Smith, D. F., Podgorski, D. C., Rodgers, R. P., Blakney, G. T., & Hendrickson, C. L. (2018). 21 tesla FT-ICR mass spectrometer for ultrahigh-resolution analysis of complex organic mixtures. Analytical chemistry, 90(3), 2041-2047.
Citation: https://doi.org/10.5194/egusphere-2022-1105-RC2
-
AC2: 'Reply on RC2', James Stegen, 06 Apr 2023
Reviewer 2:
Thank you for the helpful suggestions. Below we provide a summary of how we plan to address each comment in the revised manuscript. Our responses are in bold text. We look forward to your evaluation of these planned revisions.
Given the growing literature on the application of ecological analyses to high-resolution mass spectrometry, a careful assessment of the biases, flaws, and assumptions is timely and a welcome contribution to the expanding subdiscipline. In particular, the suggestion to expand modeling approaches, including machine learning or hierarchical modeling, to quantify the magnitude of errors is an important one for future research.
Thank you for the encouraging remarks.
Meanwhile, the empirical results in this study show that many ecological metrics derived from the peak intensities provide valid patterns. This is an important result and could be highlighted alongside papers demonstrating the acquisition of quantitative data from HRMS (e.g., Kruve, 2020; Groff et al., 2022).
Per this suggestion, we plan to include additional references and text that relate to our results. Because some of the listed references are related to LC-MS, we emphasize that our work is focused on direct infusion HRMS and not LC-MS, and we will add text to clarify this point.
On line 483, it is stated that HRMS has many weaknesses, just like any analytical platform. Most practitioners of HRMS would agree that the biases presented in this paper are present (as reviewed in Urban, 2016; Kujawinski et al., 2010). Many of these biases also exist in other compositional data, such as microbiome data (Vieira-Silva et al., 2019), contrary to the statement on lines 73-77, and have motivated solutions such as those reviewed in Gloor et al. (2017), including the creation of internal standards (Hardwick et al., 2018).
We plan to include these references as well as relevant perspectives and solutions developed in other analytical domains focused on similar data structures. We will also include some text discussing important differences between HRMS and microbiome data that are relevant to solutions.
Together, these two foci highlight the most problematic aspect of this paper: the tone is much too negative to accurately reflect the reality of the (very common) use of peak intensities. A more balanced and nuanced view should be taken so as not to alienate specialist readers or mislead those less well-versed. The tone leaves the feeling that the bulk of the literature in the past decade is highly flawed and not to be trusted. This is not impossible, but if this is what the authors are trying to convey, the analysis must be made much more robust. I suggest the authors change the tone to ensure that the key messages (the utility of ecological metrics and some of their drawbacks) are most effectively conveyed.
The other reviewer had a similar comment. We plan to edit the text to achieve a more balanced, yet rigorous tone.
I am also concerned that some of the assumptions of the empirical work have significant technical flaws. This may be improved by providing greater transparency for the selection choices.
1) The errors of their simulation model. How can a random selection between 0 and 100 for the simulated errors be justified (lines 352 and 369)? Why is 0 included in the random selection? The decision for this range should be motivated by actual evidence, such as from the experiments measuring variation in peak intensities of analytes of known concentration. Examining Fig. 4b, it looks like a better error selection would be between 1-8. Without further justification, the results of the in silico simulation model appear quite arbitrary.
The reason for including zero is to allow molecules to drop below the limit of detection of the instrument, such as in the cases of very poor ionization given either their inherent chemistry and/or interactions with other molecules in the sample/matrix. We plan to address the reviewer’s concern in two ways. First, per reviewer suggestions, we will use the empirical data to guide the range of simulated errors and contrast those results with those previously generated. Second, we will be more explicit about our rationale for including zero in the range of possible errors.
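To make the role of zero concrete, a minimal sketch of this style of error simulation (illustrative only: the log-normal shape of the true abundances and the uniform-integer error factor are assumptions for the example, not the manuscript's actual code):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" relative abundances for 1000 peaks; a log-normal
# shape is assumed here purely for illustration.
true_abundance = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# Error factor drawn uniformly from the integers 0..100, mirroring the
# "random selection between 0 and 100" discussed above. A draw of 0
# sends a peak below the detection limit (observed intensity of 0),
# which is the stated rationale for including zero in the range.
error = rng.integers(0, 101, size=true_abundance.size)
observed = true_abundance * error

n_below_detection = int(np.sum(observed == 0))
print(f"peaks lost below detection: {n_below_detection} of {observed.size}")
```

Drawing the error range from empirical data instead, as suggested, only changes the two arguments to `rng.integers`.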
2) Peak intensities are normally distributed in HRMS data (e.g. He et al., 2020). The way the authors generate random intensities does not reproduce the normal distribution in peak intensities.
We plan to address this comment by providing evidence, including from our own empirical data, that peak intensity distributions are very heterogeneous. We note that the data shown in He et al 2020 is for a single standard (SRFA) which is representative of heavily processed DOM, and is not representative of all HRMS data for natural organic matter. Further, it isn’t that peak intensities are normally distributed in those spectra, but that the spectra appear to have a (skewed) normal distribution of peak intensity vs. peak mass, especially when looking only at the most abundant ion at each nominal mass. We will explicitly raise the need for follow-on simulation efforts to test the sensitivity of the results to peak intensity distributions.
As stated previously, we believe that, rather than specific results from a single analysis with example parameters, the power of our simulation model lies in its ability to be adjusted to more closely match a dataset of interest, thus providing a more quantitative tool for the interpretation of mass spectrometry data. Thus, we plan to highlight that researchers can examine peak intensity distributions for a dataset of interest and modify the simulation model assumptions/setup to reflect those distributions.
3) The number of peaks. The simulation models use either 100 or 1000 peaks. These are not environmentally relevant. Most environmental studies have several thousand peaks, a regime in which the authors nicely and unequivocally show in Fig. S2 that there is absolutely no bias in the calculation of ecological metrics, i.e., the observed vs. true R2 values approximate 1. For this reason, any simulations with small numbers of peaks are misleading and not relevant to most studies.
We plan to edit the text to emphasize the points made here by the reviewer. However, with respect to the 100 and 1000 peak simulations, we would like to note that real datasets vary tremendously in the number of peaks being used for analyses. There are many reasons for large variation in the number of peaks actually used in the calculation of ecological metrics. For instance, some researchers may only examine peaks with assigned formulas and/or peaks observed consistently across technical replicates. It is important for researchers to be aware that biases and uncertainty will increase as the number of peaks used decreases. We will specifically emphasize two key points: (1) Sample-to-sample changes in the value of ecological metrics can be interpreted with increasing confidence as the number of peaks increases, as shown in Fig. S2. Qualitative gradients are, therefore, more robust with more peaks. (2) The absolute magnitude of some ecological metrics, however, is shifted away from its true magnitude even when there are large numbers of peaks. Quantitative comparisons from one dataset to another may not, therefore, be valid. Thus, based on both points, we feel that the continued inclusion of the analyses, figures, and discussion informed by simulations with 100 or 1000 peaks is warranted.
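Readers can check the sensitivity of a metric to peak count with a toy simulation of this kind (all choices here, log-normal true intensities, a uniform-integer error factor, and Shannon diversity as the example metric, are illustrative assumptions, not the manuscript's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

def shannon(x):
    """Shannon diversity of an intensity vector (zero peaks ignored)."""
    p = x[x > 0]
    p = p / p.sum()
    return -np.sum(p * np.log(p))

# Simulate 50 samples along a compositional gradient (varying sigma)
# and correlate the metric computed on true vs. error-perturbed peaks.
for n_peaks in (100, 1000, 10000):
    true_vals, obs_vals = [], []
    for _ in range(50):
        sigma = rng.uniform(0.5, 2.0)               # sample-to-sample gradient
        true = rng.lognormal(0.0, sigma, size=n_peaks)
        error = rng.integers(0, 101, size=n_peaks)  # 0 = below detection
        true_vals.append(shannon(true))
        obs_vals.append(shannon(true * error))
    r2 = np.corrcoef(true_vals, obs_vals)[0, 1] ** 2
    print(n_peaks, round(r2, 3))
```

Swapping in the peak counts, intensity distribution, and metrics of a dataset of interest is the kind of user-driven adjustment of the simulation model described in the reply above.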
4) Why these specific standards? What are their features beyond just an absence in natural DOM? I think there needs to be a description of what makes the molecular structures, ionization properties, etc. of these analytes appropriate spike-ins.
We plan to include more details behind the rationale for our selections. In short, we selected commercially available natural products that could represent (a) types of organic matter present or possibly present in real samples, or (b) precursors to natural organic matter. In addition, the selected standards collectively span a diverse range of chemistries and are all amenable to the instrument type and setup (i.e., negative mode electrospray ionization FTICR-MS).
5) No evidence is presented that these higher analyte concentrations (>200 ppb), or relative intensities of individual peaks >1%, will ever be realistic. I similarly don’t understand why the summed relative intensities exceed 100% in Fig. 4a in the absence of SRFA.
We plan to address this comment in two pieces.
First, we will provide rationale for the concentrations used, state our assumptions, and clarify why some peaks exceed 1% relative intensity. In summary, we selected a range of concentrations that would be observable via our instrumentation. Pushing to lower concentrations would likely lead to molecules falling below our limit of detection. We assume the overall patterns and associated inferences are robust as the point was to vary concentrations and matrices. Lastly, relative intensity is scaled to the peak with the highest intensity. Some peaks must, therefore, have relative intensity > 1% (e.g., the peak with the highest intensity will have a relative intensity of 100%).
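The normalization convention at issue can be shown in a few lines (the intensity values below are made up for illustration):

```python
import numpy as np

intensities = np.array([2.0e8, 5.0e7, 1.2e7, 4.0e6])  # made-up raw peaks

# Base-peak normalization: scale to the most intense peak. The base
# peak is then 100%, and the *sum* of relative intensities exceeds
# 100% by construction.
rel_base = 100.0 * intensities / intensities.max()

# Total-ion (sum) normalization, by contrast, always sums to 100%.
rel_sum = 100.0 * intensities / intensities.sum()

print(rel_base)        # base peak = 100.0
print(rel_base.sum())  # > 100
print(rel_sum.sum())   # = 100.0
```

Under base-peak scaling, a summed relative intensity above 100% (as in Fig. 4a) is therefore expected, not an error.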
Second, we will add text that clarifies why summed relative intensities exceed 100% in Fig. 4a. This is due to how the data were normalized and we believe that clarifying the approach will resolve any concerns.
There are also several strong statements that I do not believe are sufficiently supported by the scientific evidence presented in the paper. These are on line 245 “Strategies to use calibration curves will fail” and line 324 “The previous sections show that between-peak changes in peak intensity do not accurately reflect between-peak changes in abundance”.
We plan to add text that helps tie these statements more directly to the data shown. While we feel these statements are valid and important outcomes of our study, we acknowledge this comment along with previous reviewer comments and plan to soften the language accordingly.
We also note that Reviewer 1 appears to agree with our interpretations and feels that the data shown are too basic to warrant publication. They suggest removing this entire section of the manuscript because it is ‘textbook.’ Reviewer 2 interprets the data differently, however, which indicates there is no coherence in how researchers view and use peak intensity data from direct infusion HRMS. This emphasizes the importance of retaining this section of the manuscript and, more generally, of continuing to publish the type of empirical studies we provide to further the literature-based discussion.
Fig. 4 seems to show that the relative intensity scales with concentration. The authors can predict this by a nonlinear model or GAM for each analyte, or with a single model where the slope varies with the m/z (for example). Without having attempted such an analysis it is difficult to understand how this statement is supported.
We plan to include the idea of using nonlinear, GAM, and/or mixed effects modeling to account for peak-to-peak variation in the quantitative link between intensity and concentration. However, we do not plan to include these models as part of our revisions, as they are beyond the scope of our manuscript, which is to present an overview and general toolkits for method evaluation. We note that even if we were able to successfully fit such models to our empirical data, models would need to be developed for each of the tens of thousands of peaks across all natural organic matter. These models would also need to account for shifts in the intensity-concentration relationships across matrices. Showing that we can fit reasonable models for a few standards does not imply that this would work for all organic molecules. We also note that the simulations indicate that, in many cases, we do not need to model or otherwise explicitly derive quantitative relationships between intensity and concentration because the ecological metrics perform well without that information.
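For readers who want to try the reviewer's suggestion, a per-analyte power-law calibration (a simple stand-in for the nonlinear/GAM models mentioned; the concentrations and intensities below are invented, not measurements from this study) could look like:

```python
import numpy as np

# Invented calibration points for a single analyte: spiked
# concentration (ppb) vs. measured peak intensity (arbitrary units).
conc = np.array([10.0, 50.0, 100.0, 200.0, 500.0])
intensity = np.array([1.1e6, 4.8e6, 8.5e6, 1.4e7, 2.2e7])

# Fit intensity = a * conc**b by linear regression in log-log space.
b, log_a = np.polyfit(np.log(conc), np.log(intensity), 1)

def predict_intensity(c):
    """Forward model: expected intensity at concentration c (ppb)."""
    return np.exp(log_a) * c ** b

def estimate_conc(i):
    """Invert the fitted calibration to estimate concentration."""
    return (i / np.exp(log_a)) ** (1.0 / b)
```

The scalability concern in the reply stands: a separate fit of this kind, with matrix as a covariate, would be needed for every peak of interest.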
Figure 4 also nicely shows that quantitative conclusions can be drawn between peaks. As shown in Figure 4a, at low concentrations of the three different molecules (<100 ppb), the signal intensities seem statistically indistinguishable. A similar result is seen, especially in the MeOH matrix, at higher concentrations of SRFA that effectively dilute the analytes to representative concentrations. These results suggest that the analytes are performing quantitatively.
We plan to edit the text to highlight that there are certain situations, such as in the case of low concentration mentioned by the reviewer, in which between-peak comparisons of intensity could carry valid information. However, our data show that there is inconsistency in how well differences in peak intensity map to differences in concentration. It may be that there is a clearer link at low concentration, but in natural samples that vary widely in complexity, freshness, etc., the true concentrations of each peak are unknown. Put another way, one could not tell if a low-signal ion is due to a high-concentration analyte with poor ionization efficiency, or due to a low concentration analyte with high ionization efficiency, nor if a high-signal ion consisted of a single high concentration analyte or many low-concentration isomeric analytes. Thus, we have no way of knowing which peaks have intensities that carry valid information and those that do not. As such, we cannot generally recommend using between-peak differences in intensity to infer between-peak shifts in concentration. We agree this is an interesting direction for follow-on work and will add text pointing to the need for additional empirical evaluation to pinpoint situations in which between-peak comparisons of intensity carry valid quantitative information.
More generally, we again highlight the contrast between the two reviewers on these topics. Reviewer 1 feels our empirical data show outcomes that are ‘textbook’ and thus are not needed because it is well-known that peak intensities cannot be used quantitatively. Reviewer 2 feels our data support the quantitative use of peak intensities. Clearly, there are diverging opinions across the research community, highlighting the importance of this section of the paper. We need to come to a resolution on these topics as a research community and that can only happen by presenting, discussing, and improving evidence.
Additional comments:
Line 60: The authors should cite the reference of the first use of 21T FT-ICR-MS (Smith et al., 2018).
We will cite the pertinent 21T papers.
L195-205 – This point is already made on L122
We will minimize duplication.
Figure 4- relative intensity is not explained. Relative to what?
We will clarify this (and see comments above).
Thank you for your thoughtful consideration of these points.
Refs:
Kruve, A. (2020). Strategies for drawing quantitative conclusions from nontargeted liquid chromatography–high-resolution mass spectrometry analysis.
Groff, L. C., Grossman, J. N., Kruve, A., Minucci, J. M., Lowe, C. N., McCord, J. P., ... & Sobus, J. R. (2022). Uncertainty estimation strategies for quantitative non-targeted analysis. Analytical and Bioanalytical Chemistry, 414(17), 4919-4933.
Urban, P. L. (2016). Quantitative mass spectrometry: an overview. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2079), 20150382.
Kujawinski, E. B. (2002). Electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (ESI FT-ICR MS): characterization of complex environmental mixtures. Environmental Forensics, 3(3-4), 207-216.
Vieira-Silva, S., Sabino, J., Valles-Colomer, M., Falony, G., Kathagen, G., Caenepeel, C., ... & Raes, J. (2019). Quantitative microbiome profiling disentangles inflammation-and bile duct obstruction-associated microbiota alterations across PSC/IBD diagnoses. Nature microbiology, 4(11), 1826-1831.
Gloor, G. B., Macklaim, J. M., Pawlowsky-Glahn, V., & Egozcue, J. J. (2017). Microbiome datasets are compositional: and this is not optional. Frontiers in microbiology, 8, 2224.
Hardwick, S. A., Chen, W. Y., Wong, T., Kanakamedala, B. S., Deveson, I. W., Ongley, S. E., ... & Mercer, T. R. (2018). Synthetic microbe communities provide internal reference standards for metagenome sequencing and analysis. Nature communications, 9(1), 3096.
He, C., et al. (2020). In-house standard method for molecular characterization of dissolved organic matter by FT-ICR mass spectrometry. ACS Omega, 5(20), 11730-11736.
Smith, D. F., Podgorski, D. C., Rodgers, R. P., Blakney, G. T., & Hendrickson, C. L. (2018). 21 tesla FT-ICR mass spectrometer for ultrahigh-resolution analysis of complex organic mixtures. Analytical chemistry, 90(3), 2041-2047.
Thank you for the helpful references.
Citation: https://doi.org/10.5194/egusphere-2022-1105-AC2